On this page you can find our offerings for Master’s Projects and Master and Bachelor Research Projects in the realm of data mining and machine learning for (vocational) education ecosystems for the autumn semester 2026. Please note that this list of projects is not exhaustive and will be updated over the coming days with further exciting projects.
Last update: 11.05.2026
How to apply
Please apply via our student project application form. You will need to specify which project(s) you are interested in, why you are interested, and whether you have any relevant experience in this area. To access the form, you need to log in with your EPFL email address. If you would like more information on our research, do not hesitate to contact us! Students interested in doing a project are encouraged to read the Thesis & Project Guidelines, which explain what you can expect of us and what we expect from students.
The student project application form will remain open for submissions until the late deadline of 15 August 2026. Applications will be reviewed on an ongoing basis, so we strongly encourage you to apply as soon as possible. Early applicants will be considered starting in calendar week 21, with the early deadline set for 14 May 2026:
- Early deadline (first round): 14.05.2026
- First contact with supervisors: 18.05.2026 – 24.05.2026
- Late deadline: 15.08.2026
External students: Non-EPFL students are kindly requested to get in touch with the project supervisor(s) by e-mail.
Please note that the list of topics below is not exhaustive. If you have other ideas or proposals, you are welcome to contact a senior member of the lab to discuss a tailor-made topic.
Project 1: Hierarchical Multi-Agent Systems for Simulated Learners in Inquiry-Based Environments
Large Language Models (LLMs) are increasingly used as simulated learners to support the development and evaluation of educational technologies. A central challenge is aligning agent behaviour with authentic student behaviour in inquiry-based environments such as Beer’s Law Lab, a virtual lab where students investigate Beer’s Law by varying solution characteristics and observing light absorbance. Here the action space is continuous, which amplifies the data problem: the space of plausible behaviours is vast and only sparsely covered by available student traces. Prior work on LLM-based simulated learners has not addressed the design of architectures for handling such continuous action spaces.
Objectives:
- Design a hierarchical multi-agent system consisting of small specialised models for individual environment components (e.g., preparing solutions, setting path length, recording measurements), coordinated by a higher-level orchestrator agent.
- Investigate whether decomposing student behaviour into component-level skills improves behavioural alignment with real student data compared to a single LLM agent baseline.
- Determine whether fine-tuning is required for lower-level skill models or whether in-context learning suffices, and identify effective data sources for training.
- Evaluate the full system against a single-agent baseline on alignment metrics (behavioural similarity to held-out student traces) and task-level outcomes.
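As a rough illustration of the hierarchical design described in the objectives above, the following sketch wires three stub skill agents to a rule-based orchestrator. Everything here is an assumption made for illustration: the function names, the state dictionary, the fixed parameter values, and the rule-based `choose_skill` are hypothetical, and in the actual project the skills would be small specialised models and the orchestrator a higher-level agent. The absorbance computation uses the Beer-Lambert relation with a made-up molar absorptivity and is not the lab simulator's model.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
# A skill agent maps the current lab state to a state update.
SkillAgent = Callable[[State], State]

EPSILON = 1.0  # hypothetical molar absorptivity, chosen only for the example


def prepare_solution(state: State) -> State:
    # Stub: a real skill model would pick a continuous concentration.
    return {"concentration": 0.5}


def set_path_length(state: State) -> State:
    # Stub: a real skill model would pick a continuous path length.
    return {"path_length": 1.0}


def record_measurement(state: State) -> State:
    # Beer-Lambert relation: absorbance = epsilon * path length * concentration.
    return {"absorbance": EPSILON * state["path_length"] * state["concentration"]}


class Orchestrator:
    """Chooses which component-level skill acts next (rule-based stand-in)."""

    def __init__(self, skills: Dict[str, SkillAgent]) -> None:
        self.skills = skills

    def choose_skill(self, state: State) -> str:
        if "concentration" not in state:
            return "prepare_solution"
        if "path_length" not in state:
            return "set_path_length"
        return "record_measurement"

    def run(self, steps: int) -> Tuple[List[str], State]:
        state: State = {}
        trace: List[str] = []
        for _ in range(steps):
            name = self.choose_skill(state)
            state.update(self.skills[name](state))
            trace.append(name)
        return trace, state


orchestrator = Orchestrator({
    "prepare_solution": prepare_solution,
    "set_path_length": set_path_length,
    "record_measurement": record_measurement,
})
trace, final_state = orchestrator.run(steps=3)
```

The recorded `trace` of skill names is the kind of sequence that could later be compared against held-out student traces, although the choice of similarity metric is left open by the project description.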
Requirements:
- Interest in: Multi-Agent Systems, Reinforcement Learning, NLP, Educational Technology
- Skills: Python, basic Reinforcement Learning, familiarity with Hugging Face TRL
Level: Master
Supervision: Bahar Radmehr (PhD student)
Project 2: Are Students Lazy Experts? Reframing LLM Alignment as Learning When and How to Depart from the Default Policy
Large Language Models (LLMs) used as simulated learners must be aligned not just to a generic “student” but to specific behavioural profiles capturing the diversity of real student behaviour. Standard alignment approaches such as DPO and GRPO adjust the policy globally and are constrained by a KL penalty against a reference model. This works well when the target profile is close to the model’s default, but struggles with distant profiles, such as students who perform weak or cluttered explorations or who progress unusually slowly. These are exactly the profiles whose presence is essential for a realistic simulated-learner population. Despite this need, no alignment method has been designed with simulated learners in mind or offers explicit control over where and how the policy should deviate from its default behaviour.
Objectives:
- Develop an alignment method that keeps the default policy intact and learns a lightweight gating policy that decides, at each step, whether to act from the default or from a profile-specific deviation head.
- Investigate whether selective deviation improves profile recovery on distant profiles compared to standard alignment approaches under matched compute.
- Design and evaluate a deviation budget parameterised as a function of profile distance, exploring whether this relationship can be learned from limited student data rather than hand-tuned.
- Assess whether selective deviation preserves default-policy behaviour on profile-irrelevant aspects (e.g., valid action generation, format adherence), reducing the capability degradation typically seen when lowering the KL penalty.
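To make the gating idea in the objectives above more concrete, here is a minimal sketch of a policy wrapper that decides at each step whether to act from the default policy or from a deviation head, subject to a deviation budget. All names (`GatedPolicy`, `budget_from_distance`), the string-valued states and actions, and the linear mapping from profile distance to budget are assumptions made for illustration. In the project, the gate and the budget function would be learned, and the policies would be LLMs rather than plain callables.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

Policy = Callable[[str], str]


def budget_from_distance(distance: float, max_steps: int) -> int:
    """Hypothetical hand-set mapping: farther profiles get more deviations.

    `distance` is assumed to be normalised to [0, 1]; values outside are clamped.
    """
    clamped = min(max(distance, 0.0), 1.0)
    return int(round(clamped * max_steps))


@dataclass
class GatedPolicy:
    default_policy: Policy          # left untouched by alignment
    deviation_head: Policy          # profile-specific behaviour
    gate: Callable[[str], float]    # probability of deviating in a given state
    budget: int                     # remaining deviations in this episode

    def act(self, state: str, rng: random.Random) -> Tuple[str, bool]:
        """Return (action, deviated) for one step."""
        deviate = self.budget > 0 and rng.random() < self.gate(state)
        if deviate:
            self.budget -= 1
            return self.deviation_head(state), True
        return self.default_policy(state), False


rng = random.Random(0)
policy = GatedPolicy(
    default_policy=lambda s: "default",
    deviation_head=lambda s: "deviate",
    gate=lambda s: 1.0,  # always wants to deviate, so only the budget limits it
    budget=budget_from_distance(0.4, max_steps=5),
)
actions = [policy.act("state", rng)[0] for _ in range(5)]
```

Because the default policy is only wrapped and never modified, any step where the gate stays closed reproduces default behaviour exactly, which is the property the last objective asks to verify.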
Requirements:
- Interest in: LLM Alignment, Reinforcement Learning, Simulated Learners, Educational Technology
- Skills: Python, basic RL, familiarity with Hugging Face TRL
Level: Master
Supervision: Bahar Radmehr (PhD student)
Project 3: Disentangling Prompt and Knowledge Uncertainty in LLM Reasoning for Interactive Feedback
Large Language Models (LLMs) are increasingly used to provide students with feedback and support follow-up interaction, but they can produce fluent responses that are incorrect, misleading, or based on flawed reasoning. This uncertainty may come from an ambiguous or underspecified prompt, also known as aleatoric uncertainty, or from missing knowledge, also known as epistemic uncertainty. However, current uncertainty-estimation methods rarely examine where uncertainty appears inside the reasoning traces of thinking models. The goal of this project is to adapt uncertainty-estimation methods to LLM reasoning traces in student-facing feedback systems, identify uncertain reasoning steps, distinguish prompt-related from knowledge-related uncertainty, and communicate this information to learners to support more informed use and appropriate trust in the model’s output.
Objectives:
- Develop a controlled evaluation setup that uses variations of student questions, feedback prompts, and available course context to separate prompt-related uncertainty from knowledge-related uncertainty in reasoning traces.
- Adapt existing uncertainty methods to reasoning steps, using signals such as token entropy, top-token margin, and hidden-state features.
- Build lightweight predictors that estimate the uncertainty type and provide a test-time uncertainty score.
- Evaluate whether targeted interventions, such as structured prompting or added evidence, improve feedback quality and interactive response correctness.
- Explore how to communicate uncertainty to students through confidence levels, likely causes, and suggested next actions.
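Two of the signals named in the objectives above, token entropy and top-token margin, can be computed directly from a next-token probability distribution. The sketch below shows one possible way to compute them and aggregate them over the tokens of a single reasoning step. The function names and the choice of aggregation (mean entropy, minimum margin) are assumptions for illustration, not a method prescribed by the project.

```python
import math
from typing import Dict, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def top_token_margin(probs: Sequence[float]) -> float:
    """Gap between the two most likely tokens; a small gap suggests uncertainty."""
    ranked = sorted(probs, reverse=True)
    if len(ranked) < 2:
        return ranked[0] if ranked else 0.0
    return ranked[0] - ranked[1]


def step_uncertainty(step_token_probs: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Aggregate token-level signals over one reasoning step (illustrative choice)."""
    entropies = [token_entropy(p) for p in step_token_probs]
    margins = [top_token_margin(p) for p in step_token_probs]
    return {
        "mean_entropy": sum(entropies) / len(entropies),
        "min_margin": min(margins),
    }


# Toy reasoning step with two tokens: one ambiguous, one confident.
summary = step_uncertainty([[0.5, 0.5], [0.9, 0.1]])
```

Features like these could serve as inputs to the lightweight predictors mentioned in the objectives. Note that they do not by themselves separate prompt-related from knowledge-related uncertainty.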
Project 4: Fading AI Scaffolding for Reflective Writing Skill Transfer
Objectives:
- Design a reflective writing system that implements staged reduction of AI support (e.g., from full conversational scaffolding to minimal prompts to no support).
- Investigate how learners perceive and respond to the gradual withdrawal of scaffolding, including the strategies they develop to maintain reflection quality.
- Examine whether fading support increases learners’ metacognitive awareness and sense of ownership over their reflections.
- Develop and evaluate NLP models that provide reflective writing support.
- Conduct a user study (likely qualitative, e.g., multi-session use with think-aloud protocols and interviews) to inform the design of future large-scale classroom evaluations.
Requirements:
Interest in: LLMs, writing support with AI, human-computer interaction, educational technology
Skills: NLP and machine learning (knowing front-end is a bonus)
Level: Master
Supervision: Seyed Parsa Neshaei (PhD Student)
Project 5: In-the-Moment Capture for Enhancing Reflective Writing
Objectives:
- Design a tool that enables learners to capture brief in-situ inputs (e.g., text, voice, or other modalities) during learning experiences.
- Integrate captured data into a structured reflective writing interface, augmented with AI-based support.
- Analyze how learners engage with the capture tool, including the types of inputs they produce (e.g., observational, emotional, or question-based).
- Develop and evaluate NLP models that provide reflective writing support.
- Conduct qualitative or mixed-method studies (depending on access and classroom availability) to inform future controlled experiments in authentic educational settings.
Requirements:
Interest in: LLMs, writing support with AI, human-computer interaction, educational technology
Skills: NLP and machine learning (knowing front-end is a bonus)
Level: Master
Supervision: Seyed Parsa Neshaei (PhD Student)
Project 6: Human-in-the-Loop Scenario Authoring for Diagnostic Reasoning in PharmaSim
Generative AI makes it easy to create scenario-based learning (SBL) experiences, but fully AI-generated scenarios often lack pedagogical control, transparency, and alignment with learning goals. This project explores how teachers can be supported in authoring diagnostic reasoning scenarios through a structured human-in-the-loop workflow rather than one-shot generation.
The project proposes a PharmaSim-specific scenario authoring tool that enables pharmacy teachers to create diagnostic client cases, including client profiles, symptoms, possible causes, pedagogical strategies, and transfer scenarios. The system guides educators through pedagogically meaningful authoring steps while exposing intermediate representations that can be inspected, refined, and validated.
Objectives:
- Design a scenario authoring workflow for creating diagnostic reasoning scenarios in vocational pharmacy education.
- Develop structured authoring components for:
  - client cases and contextual information
  - possible causes and likelihoods
  - pedagogical scaffolding strategies
  - transfer scenarios
  - interaction and information-release logic
- Implement AI-supported generation of intermediate scenario representations that teachers can iteratively refine.
- Support multiple pharmacist-agent pedagogical styles, including structuring, problematizing, hybrid, and no-scaffolding modes.
- Investigate whether structured human-in-the-loop authoring improves pedagogical alignment, controllability, and diversity of generated scenarios.
Requirements:
Interest in: Generative AI, Human-Computer Interaction, Educational Technology, Scenario-Based Learning
Skills: Unity and C#, familiarity with LLMs or prompt engineering, experience with UI/UX implementation or conversational systems is a bonus
Level: Master
Supervision: Fatma Betül Güreş (PhD student)
Project 7: Exploring Reasoning Strategies for LLM-Based Problem Generation
Project 8: Cross-Lingual On-Policy Self-Distillation for Educational Tasks
Large Language Models (LLMs) used in educational contexts—such as providing feedback, identifying misconceptions, and interactive tutoring—show significant performance gaps between high-resource (e.g., English) and low-resource languages. Existing cross-lingual transfer methods focus on reasoning or general NLU tasks, while educational tasks (tutoring, feedback generation, explanation) remain understudied. Educational tasks present unique challenges: multiple acceptable responses, cultural appropriateness, and pedagogical scaffolding—properties that standard methods for reasoning transfer do not address. Despite this need, no alignment or distillation method has been designed with educational tasks in low-resource languages in mind or offers explicit control over where and how the model should deviate from high-resource behavioral norms.
Objectives:
- Implement two variants of on-policy self-distillation for pedagogical transfer:
  - Variant A (Language-as-Privilege): The student generates low-resource output; the teacher is the same model conditioned on English input. Compute the token-level advantage from log-probability differences.
  - Variant B (Feedback Self-Distillation): The student generates low-resource output; the model generates English feedback evaluating that output; the teacher conditions on the input plus the English feedback.
- Develop entropy-aware divergence weighting (forward KL for uncertain tokens, reverse KL for confident tokens) that respects pedagogical flexibility, recognizing that a good tutoring response in Hindi may not be a literal translation of the English response due to cultural norms for politeness, indirectness, and scaffolding.
- Compare against multiple baselines (zero-shot, translate-test, supervised fine-tuning, and prior cross-lingual distillation methods) on two dimensions: (a) task performance and data efficiency, and (b) preservation of pedagogical quality (feedback appropriateness, cultural sensitivity, dialogue naturalness) in low-resource languages.
- Assess whether the method avoids degrading high-resource language performance (preventing catastrophic forgetting of English pedagogical capabilities), a known risk when lowering KL constraints in alignment methods.
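The two computational ingredients mentioned in the objectives above, a token-level advantage from log-probability differences and an entropy-aware choice between forward and reverse KL, can be sketched as follows. This is only an illustration under stated assumptions: "forward KL" is taken to mean KL(teacher || student) and "reverse KL" to mean KL(student || teacher), uncertainty is judged by the student's entropy against a hand-set `threshold`, and a hard switch is used instead of a smooth weighting. None of these choices is specified in the project description, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def token_advantages(
    teacher_logprobs: Sequence[float], student_logprobs: Sequence[float]
) -> List[float]:
    """Per-token advantage: how much more likely the teacher finds each sampled token."""
    return [t - s for t, s in zip(teacher_logprobs, student_logprobs)]


def entropy_aware_divergence(
    student: Sequence[float], teacher: Sequence[float], threshold: float
) -> float:
    """Forward KL where the student is uncertain, reverse KL where it is confident."""
    if entropy(student) > threshold:
        return kl(teacher, student)  # forward KL (assumed convention)
    return kl(student, teacher)      # reverse KL (assumed convention)


# Log-probabilities of the tokens the student actually sampled (toy values).
advantages = token_advantages([-1.0, -2.0], [-1.5, -1.0])

teacher_dist = [0.9, 0.1]
uncertain_loss = entropy_aware_divergence([0.5, 0.5], teacher_dist, threshold=0.5)
confident_loss = entropy_aware_divergence([0.99, 0.01], teacher_dist, threshold=0.5)
```

In this toy case the first token receives a positive advantage (the teacher prefers it more than the student does) and the second a negative one. The uncertain position is penalised with the forward direction, which pushes the student to cover the teacher's mass, and the confident position with the reverse direction.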
Requirements:
Interest in: LLM Alignment, Cross-Lingual Transfer, Educational Technology, Pedagogical NLP
Skills: Python, PyTorch, Hugging Face Transformers, familiarity with RL concepts
Level: Master
Supervision: Jiaxu Zhao (postdoc)