On this page you can find our offerings for Master's Projects and Master's and Bachelor's Research Projects in data mining and machine learning for (vocational) education ecosystems for the autumn semester 2026. Please note that this list of projects is not exhaustive and will be updated over the coming days with further exciting projects.
Last update: 29.04.2026
How to apply
Please apply via our student project application form. You will need to specify which project(s) you are interested in, why you are interested, and whether you have any relevant experience in this area. To access the form, you need to log in with your EPFL email address. If you would like to receive more information on our research, do not hesitate to contact us! Students who are interested in doing a project are encouraged to have a look at the Thesis & Project Guidelines, which explain what you can expect from us and what we expect from students.
The student project application form will remain open for submissions until the late deadline of 15 August 2026. Applications will be reviewed on an ongoing basis, so we strongly encourage you to apply as soon as possible. Early applicants will be considered starting in calendar week 21, with the early deadline set for 14 May 2026:
- Early deadline (first round): 14.05.2026
- First contact with supervisors: 18.05.2026 – 24.05.2026
- Late deadline: 15.08.2026
External students: Non-EPFL students are kindly requested to get in touch with the supervisors of the project(s) they are interested in by e-mail.
Please note that the list of topics below is not exhaustive. If you have other ideas or proposals, you are welcome to contact a senior member of the lab to discuss possibilities for a tailor-made topic.
Project 1: Hierarchical Multi-Agent Systems for Simulated Learners in Inquiry-Based Environments
Large Language Models (LLMs) are increasingly used as simulated learners to support the development and evaluation of educational technologies. A central challenge is aligning agent behaviour with authentic student behaviour in inquiry-based environments such as Beer’s Law Lab, a virtual lab where students investigate Beer’s Law by varying solution characteristics and observing light absorbance. In this environment, the action space is continuous, which amplifies the data problem: the space of plausible behaviours is vast and only sparsely covered by available student traces. Prior work on LLM-based simulated learners has not addressed the design of architectures for handling such continuous action spaces.
Objectives:
- Design a hierarchical multi-agent system consisting of small specialised models for individual environment components (e.g., preparing solutions, setting path length, recording measurements), coordinated by a higher-level orchestrator agent (see the sketch after this list).
- Investigate whether decomposing student behaviour into component-level skills improves behavioural alignment with real student data compared to a single LLM agent baseline.
- Determine whether fine-tuning is required for lower-level skill models or whether in-context learning suffices, and identify effective data sources for training.
- Evaluate the full system against a single-agent baseline on alignment metrics (behavioural similarity to held-out student traces) and task-level outcomes.
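To give a feel for the intended decomposition, here is a minimal Python sketch of the orchestrator/skill split. Everything in it is an illustrative assumption: the class names, the Beer’s Law constants, and especially the fixed round-robin routing, which merely stands in for the LLM-driven decisions the project would develop.

```python
# Minimal sketch of the orchestrator / skill-agent decomposition.
# All names and constants are hypothetical; in the project each agent
# would be backed by a small LLM policy rather than hard-coded rules.
from dataclasses import dataclass, field

@dataclass
class LabState:
    concentration: float = 0.1   # mol/L (hypothetical starting value)
    path_length: float = 1.0     # cm
    log: list = field(default_factory=list)

class SolutionAgent:
    """Specialised skill model: prepares/adjusts the solution."""
    def act(self, state: LabState) -> LabState:
        state.concentration = round(state.concentration + 0.05, 3)
        state.log.append(("set_concentration", state.concentration))
        return state

class PathLengthAgent:
    """Specialised skill model: sets the cuvette path length."""
    def act(self, state: LabState) -> LabState:
        state.path_length = 0.5
        state.log.append(("set_path_length", state.path_length))
        return state

class MeasurementAgent:
    """Specialised skill model: records absorbance (Beer's Law: A = eps * c * l)."""
    EPSILON = 1.5  # hypothetical molar absorptivity
    def act(self, state: LabState) -> LabState:
        absorbance = self.EPSILON * state.concentration * state.path_length
        state.log.append(("record_measurement", round(absorbance, 3)))
        return state

class Orchestrator:
    """Higher-level agent deciding which skill to invoke at each step.
    A fixed round-robin stands in for the LLM's decision here."""
    def __init__(self, skills):
        self.skills = skills
    def run(self, state: LabState, steps: int) -> LabState:
        for t in range(steps):
            state = self.skills[t % len(self.skills)].act(state)
        return state

trace = Orchestrator([SolutionAgent(), PathLengthAgent(), MeasurementAgent()]).run(LabState(), 6)
print(trace.log)  # the action log that would be compared to held-out student traces
```

The appeal of the decomposition is that each skill agent only needs to master one slice of the continuous action space, while the orchestrator learns when to invoke which skill.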
Requirements:
- Interest in: Multi-Agent Systems, Reinforcement Learning, NLP, Educational Technology
- Skills: Python, basic Reinforcement Learning, familiarity with Hugging Face TRL
Level: Master
Supervision: Bahar Radmehr (PhD student)
Project 2: Are Students Lazy Experts? Reframing LLM Alignment as Learning When and How to Depart from the Default Policy
Large Language Models (LLMs) used as simulated learners must be aligned not just to a generic “student” but to specific behavioural profiles capturing the diversity of real student behaviour. Standard alignment approaches such as DPO and GRPO adjust the policy globally and are constrained by a KL penalty against a reference model. This works well when the target profile is close to the model’s default, but struggles with distant profiles, such as students who explore weakly or in a cluttered way, or who progress unusually slowly; yet these are exactly the profiles whose presence is essential for a realistic simulated-learner population. Despite this need, no alignment method has been designed with simulated learners in mind or offers explicit control over where and how the policy should deviate from its default behaviour.
Objectives:
- Develop an alignment method that keeps the default policy intact and learns a lightweight gating policy that decides, at each step, whether to act from the default or from a profile-specific deviation head (see the sketch after this list).
- Investigate whether selective deviation improves profile recovery on distant profiles compared to standard alignment approaches under matched compute.
- Design and evaluate a deviation budget parameterised as a function of profile distance, exploring whether this relationship can be learned from limited student data rather than hand-tuned.
- Assess whether selective deviation preserves default-policy behaviour on profile-irrelevant aspects (e.g., valid action generation, format adherence), reducing the capability degradation typically seen when lowering the KL penalty.
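As one plausible reading of the first objective, the sketch below implements a soft version of the gate in plain PyTorch. The linear heads standing in for the frozen default policy and the trainable deviation head, the sigmoid gate relaxing the discrete deviate-or-not decision, and the quadratic budget penalty are all assumptions for illustration, not the method the project commits to.

```python
# Illustrative soft-gated "selective deviation" policy (assumed design).
# Linear heads stand in for the frozen default LLM policy and the
# trainable profile-specific deviation head.
import torch
import torch.nn as nn

class SelectiveDeviationPolicy(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.default_head = nn.Linear(hidden_dim, vocab_size)    # frozen default
        self.deviation_head = nn.Linear(hidden_dim, vocab_size)  # trainable
        self.gate = nn.Linear(hidden_dim, 1)                     # trainable
        for p in self.default_head.parameters():
            p.requires_grad = False  # keep the default policy intact

    def forward(self, h: torch.Tensor):
        """h: (batch, hidden_dim) hidden state at the current step."""
        g = torch.sigmoid(self.gate(h))  # P(deviate | state), soft relaxation
        logits = (1 - g) * self.default_head(h) + g * self.deviation_head(h)
        return logits, g

policy = SelectiveDeviationPolicy(hidden_dim=16, vocab_size=100)
h = torch.randn(2, 16)
logits, gate = policy(h)

# Deviation budget: penalise the average gate for straying from a target
# rate b, which could itself be a learned function of profile distance.
b = 0.1
budget_loss = (gate.mean() - b).pow(2)
print(logits.shape, gate.squeeze(-1).tolist(), budget_loss.item())
```

Because only the gate and the deviation head receive gradients, default-policy behaviour on profile-irrelevant steps is preserved by construction, which is the property the fourth objective asks you to verify.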
Requirements:
- Interest in: LLM Alignment, Reinforcement Learning, Simulated Learners, Educational Technology
- Skills: Python, basic RL, familiarity with Hugging Face TRL
Level: Master
Supervision: Bahar Radmehr (PhD student)
Project 3: Disentangling Prompt and Knowledge Uncertainty in LLM Reasoning for Interactive Feedback
Large Language Models (LLMs) are increasingly used to provide students with feedback and support follow-up interaction, but they can produce fluent responses that are incorrect, misleading, or based on flawed reasoning. This uncertainty may stem from an ambiguous or underspecified prompt (aleatoric uncertainty) or from missing knowledge (epistemic uncertainty). However, current uncertainty-estimation methods rarely examine where uncertainty appears inside the reasoning traces of thinking models. The goal of this project is to adapt uncertainty-estimation methods to LLM reasoning traces in student-facing feedback systems: identify uncertain reasoning steps, distinguish prompt-related from knowledge-related uncertainty, and communicate this information to learners to support more informed use and appropriate trust in the model’s output.
Objectives:
- Develop a controlled evaluation setup that uses variations of student questions, feedback prompts, and available course context to separate prompt-related uncertainty from knowledge-related uncertainty in reasoning traces.
- Adapt existing uncertainty methods to reasoning steps, using signals such as token entropy, top-token margin, and hidden-state features (see the sketch after this list).
- Build lightweight predictors that estimate the uncertainty type and provide a test-time uncertainty score.
- Evaluate whether targeted interventions, such as structured prompting or added evidence, improve feedback quality and interactive response correctness.
- Explore how to communicate uncertainty to students through confidence levels, likely causes, and suggested next actions.
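To make the step-level signals concrete, the sketch below computes two of the signals named above, token entropy and top-token margin, from per-token logits. The toy shapes, the random logits, and the hand-specified step spans are assumptions; in the project, step boundaries would come from parsing real reasoning traces.

```python
# Step-level uncertainty signals computed from per-token logits.
# Shapes, the random logits, and the step spans below are assumptions.
import torch

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the next-token distribution at each position.
    logits: (seq_len, vocab_size) for one reasoning trace."""
    logp = torch.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)  # (seq_len,)

def top_token_margin(logits: torch.Tensor) -> torch.Tensor:
    """Probability gap between the top-1 and top-2 tokens; a small margin
    suggests the model was torn between continuations at that position."""
    probs = torch.softmax(logits, dim=-1)
    top2 = probs.topk(2, dim=-1).values  # (seq_len, 2)
    return top2[:, 0] - top2[:, 1]

# Aggregate token-level signals over the token span of each reasoning step
# (in practice, step boundaries would come from parsing the trace).
logits = torch.randn(12, 50)   # toy trace: 12 tokens, vocabulary of 50
steps = [(0, 5), (5, 12)]      # hypothetical step spans
for i, (a, b) in enumerate(steps):
    ent = token_entropy(logits[a:b]).mean().item()
    mar = top_token_margin(logits[a:b]).mean().item()
    print(f"step {i}: entropy={ent:.3f}, margin={mar:.3f}")
```

Per-step scores like these would feed the lightweight predictors of uncertainty type and, ultimately, the learner-facing confidence communication explored in the last two objectives.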