Student Projects

On this page you can find our offerings for Master’s Projects and Master and Bachelor Research Projects in the realm of data mining and machine learning for (vocational) education ecosystems for the autumn semester 2025. Please note that this list of projects is not exhaustive and will be updated over the coming days with further exciting projects.

Last update: 02.07.2025

How to apply

Please apply via our student project application form. You will need to specify which project(s) you are interested in, why you are interested, and whether you have any relevant experience in this area. To access the form, you need to log in with your EPFL email address. If you would like to receive more information on our research, do not hesitate to contact us! Students who are interested in doing a project are encouraged to have a look at the Thesis & Project Guidelines, which explain what you can expect from us and what we expect from students.

The student project application form will remain open for submissions until the late deadline of 15 August 2025. Applications will be reviewed on an ongoing basis, so we strongly encourage you to apply as soon as possible. Early applicants will be considered starting in calendar week 21, with the early deadline set for 23 May 2025:

  • Early deadline: 23.05.2025
  • First contact with supervisor: between 26.05.2025 and 03.05.2025
  • Late deadline: 15.08.2025

External students: Non-EPFL students are kindly requested to get in touch with the project supervisor(s) by e-mail.

Please note that the list of topics below is not exhaustive. If you have other ideas or proposals, you are welcome to contact a senior member of the lab to discuss a tailor-made topic.

Project 1: Improving Tool Use and Reasoning in Small Language Models for Interactive Feedback via RLHF

Tool-augmented large language models (LLMs) hold promise for delivering interactive, personalized feedback in educational settings. However, even with structured fine-tuning, smaller open-source models often struggle with selecting appropriate tools and generating factually grounded responses, leading to hallucinations, irrelevant output, or misapplied tool usage.

While supervised fine-tuning improves the actionability and relevance of responses, gains in tool relevance, correctness, and multi-step reasoning remain limited. To address this, we aim to incorporate Reinforcement Learning from Human Feedback (RLHF) to optimize models for more faithful tool use, accurate reasoning, and higher-quality responses.

Objectives:
  • Design a custom RLHF pipeline for tool-augmented models, including a reward model that scores tool relevance, factual accuracy, and reasoning quality.
  • Improve performance on reasoning, tool selection, and correctness, particularly in smaller models (≤8B) where these capabilities are underdeveloped.
  • Evaluate the impact of RLHF and compare it with supervised fine-tuning.
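
As one illustration of the first objective, the three criteria could be folded into a single scalar reward. The sketch below is only an assumption about how such a combination might look: the RewardScores container, the weights, and the combined_reward helper are hypothetical names, and in practice the individual scores would come from a trained reward model rather than being supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class RewardScores:
    """Per-response scores in [0, 1] for the three criteria named in the objectives."""
    tool_relevance: float
    factual_accuracy: float
    reasoning_quality: float


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"tool_relevance": 2.0, "factual_accuracy": 2.0, "reasoning_quality": 1.0}


def combined_reward(scores: RewardScores, weights: dict = WEIGHTS) -> float:
    """Weighted mean of the three scores; the result stays in [0, 1]."""
    total_weight = sum(weights.values())
    weighted_sum = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted_sum / total_weight


# A response that picked the right tool but reasoned poorly about its output.
example = RewardScores(tool_relevance=1.0, factual_accuracy=0.5, reasoning_quality=0.0)
print(round(combined_reward(example), 3))
```

A weighted mean keeps the three criteria separately inspectable, which may help when comparing RLHF against supervised fine-tuning on each dimension.
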
Requirements:
  • Interest in: Natural Language Processing, Human-AI Feedback, Educational Technology
  • Skills: Python, Hugging Face Transformers, Machine Learning

Level: Master

Supervision: Fares Fawzi (PhD student)

Project assigned: Project 2: Learning Contextualized Vector Representations of Self-Regulated Learning Behavior

Understanding student behaviors in digital learning environments that require self-regulated learning (SRL), in which learners must plan, monitor, and adapt their learning, is crucial for supporting them. Recent advances in NLP may help address this challenge. E2VEC, a recent work in educational data mining, created contextualized vector representations of tokenized student log data (navigation events) from digital learning environments. Such representations of learning behavior have the potential to reveal temporal and semantic patterns of SRL behaviors that are not captured by established methods.
 
In this ML4ED semester project, you will explore cornerstone NLP techniques such as GloVe and BERT for generating vector representations of primary school students’ behavior in a project-based learning unit. You will have access to a longitudinal dataset of real students in a naturalistic learning context. The goal is to shed light on the advantages and disadvantages of this analytical pipeline for understanding student learning, for example through embedding analysis.
 
The project will focus on:
  • generalizable tokenization of log stream events
  • methods to pre-train and fine-tune contextualized vector representations
  • prediction of student characteristics from the representations
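
To make the first focus point more concrete, the following minimal sketch turns a log stream into per-student token sequences that an NLP model could consume. It is not the E2VEC procedure: the log records, the event names, the time-gap thresholds, and the gap_token and tokenize helpers are all hypothetical and only illustrate the general idea.

```python
from datetime import datetime

# Hypothetical log records: (student_id, ISO timestamp, event type).
LOG = [
    ("s1", "2025-03-01T09:00:00", "open_page"),
    ("s1", "2025-03-01T09:00:20", "scroll"),
    ("s1", "2025-03-01T09:12:00", "submit_answer"),
    ("s2", "2025-03-01T10:00:00", "open_page"),
]


def gap_token(seconds: float) -> str:
    """Bin the time since the previous event into a coarse token (assumed thresholds)."""
    if seconds < 60:
        return "<GAP_SHORT>"
    if seconds < 600:
        return "<GAP_MEDIUM>"
    return "<GAP_LONG>"


def tokenize(log):
    """Return {student_id: [tokens]} with time-gap tokens between consecutive events."""
    sequences = {}
    last_seen = {}
    # Sort by student and timestamp so each sequence is in chronological order.
    for student, timestamp, event in sorted(log, key=lambda r: (r[0], r[1])):
        moment = datetime.fromisoformat(timestamp)
        tokens = sequences.setdefault(student, [])
        if student in last_seen:
            tokens.append(gap_token((moment - last_seen[student]).total_seconds()))
        tokens.append(event.upper())
        last_seen[student] = moment
    return sequences


print(tokenize(LOG)["s1"])
```

Encoding pauses as their own tokens is one possible way to keep temporal information in the sequence without tying the vocabulary to a specific platform.
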
Requirements:
  • Proficiency in: Python, NLP (transformers), Machine Learning
  • Interest in: Learning Sciences, Student Modeling and NLP for Education

Level: Master

Supervision: Dominik Glandorf (PhD Student)

Project 3: Exploring Retrieval and NLP Approaches for Reflective Writing Support

Reflective writing fosters metacognitive and self-regulated learning, yet students often struggle to articulate meaningful reflections. This project investigates how modern NLP techniques—such as retrieval-augmented generation (RAG), topic modeling, and large language models—can support metacognitive writing, e.g., reflective writing, in educational settings.

You will begin by reviewing relevant retrieval and generation approaches and apply selected models to a dataset of student reflections. Models will be evaluated based on how well they retrieve meaningful prior reflections and generate helpful suggestions, using a combination of evaluation methods. Promising approaches may be further developed into tools or prototypes for educational use.

The project will focus on:

  • Reviewing NLP approaches for reflective and educational text

  • Applying and comparing models (e.g., RAG, BERTopic, LLMs)

  • Evaluating reflection retrieval and suggestion quality

  • Exploring potential educational applications or interfaces

    … with the possible goal of an NLP submission (e.g., ACL, EMNLP, etc.)

Requirements:

  • Proficiency in: Python, NLP (e.g., Transformers, topic modeling, retrieval)
  • Interest in: Learning Sciences, Reflective Writing, NLP for Education

Level: Master

Supervision: Seyed Parsa Neshaei (PhD student)

Project assigned: Project 4: Exploring AI-Supported Collaborative Reflection in Learning Environments

Reflection is a key component of deep learning and metacognitive growth, but it doesn’t always occur in isolation or through writing alone. This project explores how collaborative and group-based reflection can be supported through digital and AI-powered tools, expanding the traditional view of reflection as solely a written, individual activity.

You will begin by reviewing recent research on collaborative reflection and its mediation through digital platforms. The project may involve prototyping or evaluating modalities beyond text, such as voice-based or multimodal reflection, on-the-fly concept mapping, or even immersive reflection support using AR/VR technologies. The goal is to investigate how technology can enhance shared reflective processes in educational contexts. The direction of the project can include system design, empirical analysis, developing necessary ML/NLP models, and technical prototyping.

The project will focus on:

  • Reviewing research on collaborative and multimodal reflection + AI-mediated reflection beyond written modalities
  • Developing necessary ML/NLP models for the backbone of the system
  • Designing and/or evaluating interaction methods + investigating educational applications of shared reflection tools

… with the possible goal of an HCI (e.g., CHI) or an AI in education (e.g., AIED) submission.

Requirements:

  • Proficiency in: basic ML/NLP (bonus: front-end development experience)
  • Interest in: Learning Sciences, HCI, interaction design, AI in Education

Level: Master

Supervision: Seyed Parsa Neshaei (PhD student)

Project 5: Federated RLHF for LLM Training on an AI Learning Platform

This project lies at the intersection of privacy-preserving machine learning, LLM training, and educational technology. Large Language Models (LLMs), such as GPT-4 and LLaMA, have transformed AI-driven education, but tailoring these models to align with user preferences while ensuring data privacy remains a critical challenge. Federated Learning (FL) combined with Reinforcement Learning from Human Feedback (RLHF) offers a unique solution to this problem, enabling distributed, privacy-conscious model training.

This project focuses on extending a current Federated RLHF training implementation and integrating it in Scholé, a spinoff AI learning platform from the ML4ED lab designed for context-driven, job-relevant education. Building on the foundation of FL frameworks, the project will explore how iterative feedback loops can be integrated across diverse user groups to refine the LLM’s capabilities. By leveraging user interactions from Scholé, the aim is to develop a system that learns collaboratively without sharing sensitive data, fostering trust and personalization in educational AI systems.

We will also evaluate Federated RLHF’s impact on alignment, privacy preservation, and model performance using both qualitative and quantitative metrics.
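
For readers new to Federated Learning, the core idea of learning collaboratively without sharing raw data can be sketched with federated averaging (FedAvg), a common aggregation scheme. Whether the existing implementation uses FedAvg is not stated here, so this is an assumption for illustration only; the federated_average helper and the toy parameter dictionaries are hypothetical.

```python
def federated_average(client_updates):
    """Average client parameters, weighted by each client's number of local examples.

    client_updates: list of (num_examples, {param_name: [float, ...]}) tuples.
    Only parameters leave the clients; the raw user interactions stay local.
    """
    total_examples = sum(n for n, _ in client_updates)
    averaged = {}
    for num_examples, params in client_updates:
        share = num_examples / total_examples
        for name, values in params.items():
            acc = averaged.setdefault(name, [0.0] * len(values))
            for i, value in enumerate(values):
                acc[i] += value * share
    return averaged


# Two hypothetical clients with different amounts of local feedback data.
clients = [
    (1, {"w": [0.0, 2.0]}),
    (3, {"w": [4.0, 2.0]}),
]
print(federated_average(clients))
```

In a real setup, a framework such as Flower or FedML would handle communication and aggregation, and the parameters would be model weights or adapter weights rather than short lists.
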

Requirements:

  • Interest in: Federated Learning, Reinforcement Learning, Large Language Models, Privacy-preserving AI
  • Proficiency in: Python; Machine Learning frameworks (e.g., PyTorch, TensorFlow); Hugging Face Transformers
  • Bonus: Experience with Federated Learning frameworks (e.g., Flower, FedML); knowledge of RLHF

Further details

This semester project is aimed at one MSc student (semester project or thesis) with strong technical skills. The student will be supervised by Vinitra Swamy (PostDoc), Paola Mejia Domenzain (PostDoc), and Maxime Perrot (Engineer). This project is aligned with a spinoff initiative from the ML4ED Lab called Scholé AI.

Project assigned: Project 6: Usability Studies for Enhancing Scholé’s User Experience

This project sits at the intersection of Human-Computer Interaction (HCI), educational technology, and digital humanities. Scholé, an AI learning platform, strives to improve its usability and accessibility to better engage learners and educators. By integrating insights from digital humanities and HCI, this project seeks to design user studies and implement enhancements that align with human-centric design principles.

The focus will be on conducting structured usability studies, analyzing platform pain points, and iterating on design modifications to foster inclusivity and user engagement. This project also emphasizes interdisciplinary methods by incorporating theories and practices from digital humanities, ensuring the design process reflects diverse learner needs.

We will evaluate usability improvements through mixed-methods research, combining qualitative user feedback with quantitative usability metrics, and propose actionable recommendations for platform enhancements.

Requirements

  • Interest in: Human-Computer Interaction and Usability Testing
  • Bonus: Experience in interaction design, front-end development, or conducting user studies

Further details

This semester project is tailored for a digital humanities student, learning science student, or HCI enthusiast. The student will be supervised by Paola Mejia (PostDoc), Vinitra Swamy (PostDoc), and Maxime Perrot (Engineer). This project is aligned with a spinoff initiative from the ML4ED Lab called Scholé AI.

 

Project 7: Empower TAs with Pedagogical Chatbot

With advancements in generative AI, integrating AI-generated feedback in education offers scalability but raises significant challenges regarding its correctness and pedagogical quality. Automated feedback systems that deliver AI-generated feedback without additional oversight could harm the learning process by diverting students’ efforts from the intended learning goals, wasting time, and diminishing motivation. Human-AI hybrid feedback systems are a robust way to address these concerns. Such collaboration between AI and human experts can take place at different levels, e.g., by incorporating fine-grained assessment rubrics, making AI-generated feedback more aligned with learning goals, or enabling human experts to oversee and refine AI-generated feedback before delivery.
EPFL utilizes Ed Discussion, an interactive platform facilitating course-related communications, including exercise/project Q&A discussions. EPFL students widely use these forums. However, delayed responses from TAs or peers often reduce communication effectiveness, leading to student frustration. In addition, TAs often lack a pedagogical background and struggle to write answers that are practical and encouraging for their students. To overcome these challenges, our lab, in collaboration with the EPFL Center for Digital Education (CEDE), has started an effort to assist and empower TAs by utilizing generative AI tools.
This project aims to investigate various scenarios for implementing a hybrid human-AI approach based on state-of-the-art large language models and generative AI techniques, develop pilot conversational AI systems, integrate them into Ed Discussion, and evaluate them on real EPFL student interactions.
 
Requirements:
  • An interest in Educational Technology, Natural Language Processing, Conversational AI
  • Proficiency in Python; Hugging Face Transformers; Machine Learning
  • Optional: Experience with fine-tuning LLMs

Level: Master

Supervision: Tanya Nazaretsky (Postdoc)

Project 8: Evaluating the Robustness of LLM-Based In-Context Guidance

As Large Language Models (LLMs) are increasingly deployed in interactive settings, a key question arises: can LLMs provide effective guidance without leaking task-critical information or solutions? This project investigates the robustness and alignment of LLM-generated instructional content in agent-based simulations, focusing on the teacher–student paradigm as a testbed for probing the limits of in-context learning and instruction-following behavior.

The primary goal is to design and implement a simulation framework in which two LLM agents, a teacher and a student, interact iteratively. The teacher agent is tasked with providing contextually helpful guidance without revealing the full solution.

This project will explore (1) the robustness of teacher models, (2) the effectiveness of in-context guidance in task progression without direct answer disclosure, (3) the design space of agentic LLM interactions, including role conditioning and memory management, and (4) the emergent behaviors of student agents under varying instruction-following constraints.

The resulting framework will serve as a platform to systematically evaluate alignment, leakage risks, and pedagogical robustness across model scales and prompting strategies.
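
The teacher–student interaction described here can be pictured as a simple loop with a leakage check on every hint. The sketch below is a toy stand-in under strong assumptions: the two agents are hard-coded stub functions rather than LLMs, and leaks_solution only detects a verbatim copy of the solution, which is far weaker than what a real evaluation would need. All function names are hypothetical.

```python
def leaks_solution(hint: str, solution: str) -> bool:
    """Naive check: flag a hint that contains the solution verbatim (case-insensitive)."""
    return solution.strip().lower() in hint.lower()


def run_dialogue(teacher, student, task, solution, max_turns=3):
    """Alternate teacher hints and student attempts; stop on leakage or success."""
    transcript = []  # list of (hint, leaked) pairs
    attempt = ""
    for _ in range(max_turns):
        hint = teacher(task, attempt)
        leaked = leaks_solution(hint, solution)
        transcript.append((hint, leaked))
        if leaked:
            break
        attempt = student(task, hint)
        if attempt.strip() == solution.strip():
            break
    return transcript, attempt


# Stub agents standing in for LLM calls (assumption: real agents would query a model).
def stub_teacher(task, last_attempt):
    if not last_attempt:
        return "Try isolating x on one side."
    return "Check your arithmetic: divide both sides by 2."


def stub_student(task, hint):
    return "x = 4" if "divide" in hint else "x = 8"


transcript, final_attempt = run_dialogue(stub_teacher, stub_student, "Solve 2x = 8", "x = 4")
print(len(transcript), final_attempt)
```

Keeping the agents as plain callables means the same loop could later wrap different models or prompting strategies without changing the evaluation logic.
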

Requirements:

  • Interest in: Large Language Models, Robustness & Alignment, Multi-agent Simulation, In-context Learning
  • Proficiency in: Python; ML/NLP libraries
  • Bonus: Research experience in NLP, agent-based architectures

Further details: This semester project is intended for one MSc student (semester project or thesis) with a technical background in NLP. The project will involve designing and building the framework from scratch. The student will be supervised by PhD student Marta Knežević.

Project assigned: Project 9: Analyzing student experimentation behaviors in inquiry-based science simulations

Inquiry-based learning is one of the most effective approaches in science education. It enables students to experiment like real scientists and develop a deeper understanding of scientific phenomena. To support this approach, interactive simulations are often used. By analyzing students’ logged actions within these simulations, we can trace their learning pathways and identify the experimentation strategies they employ.
In this semester project, you will explore multiple datasets containing students’ log data from various inquiry-based experimentation tasks. These datasets differ in simulation context, students’ age and educational background, as well as the nature of the tasks and goals of the experimentation.
The goal of the project is to identify similarities and differences across the datasets in terms of the experimentation strategies applied and their influence on students’ conceptual learning outcomes.
 
Requirements:
  • Interest in: Learning Sciences, Data Science, Education
  • Proficiency in: Python; Machine Learning (mainly, clustering)
Level: Master
Supervision: Kate Shved (PhD Student)

Project assigned: Project 10: Video Lecture Content Analysis with Vision-Language Models for Learner Modeling

Video lectures are a core component of Massive Open Online Courses and Flipped Classrooms. While transcripts, OCR of slide content, and time-stamped tables of contents have improved accessibility, the recent breakthroughs in vision-language models (VLMs) unlock deeper analysis of their didactic structure. Integrating this fine-grained VLM analysis with learner viewing behavior creates exciting opportunities to model cognitive knowledge acquisition and metacognitive strategies from video watching.
This ML4ED project provides access to an extensive dataset of over 1,000 video lectures and millions of associated user interactions. The primary goal is to develop a generalizable method for characterizing the didactic structure within this video content. You will leverage APIs and EPFL’s compute infrastructure to perform inference with large foundation models and fine-tune them using Parameter-Efficient Fine-Tuning (PEFT) techniques. The project involves both human evaluation and quantitative assessment of the predictive power of the created video descriptions to anticipate user behaviors, such as pausing or skipping segments.
The project will focus on:
  • Identification of a suitable taxonomy for the pedagogical description of educational video content
  • Assessing methods to employ VLMs for the automated coding of video content
  • Prediction of user interactions based on video content
Requirements:
  • Proficiency in: Python, LLMs and NLP
  • Interest in: Learning Sciences, Educational Data Mining
Level: Master
Supervision: Dominik Glandorf (PhD student)