Student Projects – Spring 2024

On this page you can find our offerings for Master’s Projects and Master and Bachelor Research Projects in the realm of data mining and machine learning for (vocational) education ecosystems for spring 2024.

Last update: 08.11.2023

How to apply

Please apply via our student project application form. You will need to specify which project(s) you are interested in, why you are interested, and whether you have any relevant experience in this area. To access the form, you need to log in with your EPFL email address. If you would like to receive more information on our research, do not hesitate to contact us! Students who are interested in doing a project are encouraged to have a look at the Thesis & Project Guidelines, which explain what you can expect from us and what we expect from students.

We process applications in two rounds. We collect applications for projects for the first round until 01.12.2023. We will get back to you about your application in the first week after the deadline. If you do not hear from us during the indicated period, this unfortunately means that we do not have the capacity to supervise your project.

If you miss the deadline, you can still apply for projects that remain vacant after the first round. We will leave the student project application form open for late applications until 26.01.2024. Applications submitted after the first deadline will be reviewed on an ongoing basis. However, we strongly recommend that you apply as early as possible, as we expect many projects to be taken after the first round.

External students: Non-EPFL students are kindly requested to get in touch with the supervisors of the project(s) by e-mail.

Early deadline (first round): 01.12.2023

First contact with supervisors: 04.12.2023 – 08.12.2023

Late deadline: 26.01.2024 (contact with supervisors on an ongoing basis)

Please note that the list of topics below is not exhaustive. If you have other ideas or proposals, you are welcome to contact a senior member of the lab to discuss possibilities for a tailor-made topic.

Project 1: Fair Recommendation via Knowledge Graph Reasoning

Equipping recommender systems (RS) with knowledge graphs (KGs) and/or reinforcement learning (RL) agents is a common practice to increase their predictive performance and to propose more personalized recommendations. Moreover, thanks to their structure and explicit relations between entities, KGs have also been used to enhance the explainability of RS predictions. Recently, new approaches to explainable recommendation via RL-based path reasoning on KGs have been explored in order to provide intuitive explanations to RS users. The fairness of the recommendations produced by these graph reasoning-based models has been studied but not yet addressed.

At ML4ED, we are currently developing an explainable RS for courses based on graph reasoning. The objective of this project is to study the fairness of our approach and to propose solutions that make our system fairer. The project will focus on fairness with respect to demographic variables, mainly gender and ethnicity.
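As a concrete starting point, the fairness of a recommender can be quantified by comparing recommendation quality across demographic groups. The sketch below (the function and variable names are illustrative, not part of our codebase) computes the gap between the best- and worst-served groups for any per-user quality metric, such as NDCG@k:

```python
from collections import defaultdict

def group_fairness_gap(scores, groups):
    """Gap between the best- and worst-served demographic groups.

    scores: dict mapping user -> recommendation quality (e.g. NDCG@k)
    groups: dict mapping user -> demographic group label
    """
    by_group = defaultdict(list)
    for user, score in scores.items():
        by_group[groups[user]].append(score)
    group_means = [sum(v) / len(v) for v in by_group.values()]
    return max(group_means) - min(group_means)
```

A gap near zero indicates that all groups receive recommendations of similar quality; mitigation approaches from the literature then aim to shrink this gap without sacrificing overall accuracy.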

The project schedule will be the following:

  1. Understanding the problem and existing solutions. “How to measure and study the fairness of recommender systems? How to address the lack of fairness of Graph Reasoning-based recommender systems?”
    1. Literature review: reading relevant research papers
    2. Getting familiar with our current solution
  2. Propose an extension of our current solution based on the literature review
  3. Implement solutions
  4. Aim for a conference/workshop submission in mid-2024


  • Required Skills: Python & Machine Learning
  • Preferred Skills: Experience with PyTorch, Recommender Systems and Reinforcement Learning

This semester project is aimed at one MSc student, who will work with Jibril Frej (Postdoc).

Project 2: Identifying Anomalous Student Behavior Through Clickstream Data Analysis

The objective of this project is to analyze clickstream data from online learning platforms and develop models to predict students’ behavior. The focus will be on identifying students whose behavior deviates significantly from the predicted patterns and analyzing these anomalous behaviors. By comparing these behaviors to known modes of behavior within online learning platforms, such as gaming or dropout, we aim to uncover patterns and potential insights for improving student engagement and success.

The first stage of this project will involve processing large amounts of clickstream data from various online learning platforms for analysis, and fitting a variety of models to predict students’ next action, including simple regression models, recurrent neural networks, and transformer-based deep-learning models. The second stage of this project will examine the behavior of the most anomalous students by researching and implementing existing methods for detecting known types of behavior within online learning platforms, and comparing the anomalous students’ behavior to these known modes, such as gaming or dropout.
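To illustrate the idea behind the first stage, the sketch below implements the simplest possible next-action predictor: a smoothed bigram model whose average surprisal (negative log-likelihood) on a student's sequence can serve as an anomaly score. All names are illustrative and not taken from any existing codebase:

```python
import math
from collections import defaultdict, Counter

class BigramActionModel:
    """Predicts each clickstream action from the previous one."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sequences):
        for seq in sequences:
            self.vocab.update(seq)
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1
        return self

    def prob(self, prev, nxt):
        # Add-one smoothed conditional probability P(nxt | prev).
        total = sum(self.counts[prev].values()) + self.smoothing * len(self.vocab)
        return (self.counts[prev][nxt] + self.smoothing) / total

    def surprisal(self, seq):
        # Average negative log-likelihood; high values suggest anomalous behavior.
        pairs = list(zip(seq, seq[1:]))
        return -sum(math.log(self.prob(p, n)) for p, n in pairs) / len(pairs)
```

Students whose sequences have unusually high surprisal under the fitted model would be the candidates examined in the second stage.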


  • Proficiency in data preprocessing, machine learning algorithms, and predictive modeling.
  • Strong programming skills in Python and experience with relevant libraries/frameworks, such as pandas and PyTorch/TensorFlow/Keras.

This project is aimed at one MSc student with strong technical skills. The student will be supervised by Ethan Prihar (Postdoc).

Project 3: Emotional State Prediction of Online Learners from Click-stream Data using Transformers

Predicting the emotional states of learners in an online educational setting has become increasingly crucial, as it helps in providing personalized interventions and support to improve learners’ academic outcomes and experiences. In this project, we aim to leverage students’ click-stream data from within an online learning platform to predict their emotional states. So far, work done in this field has relied on rule-based or recurrent deep-learning models. This project seeks to investigate the effectiveness of using transformer-based deep-learning models to predict students’ emotional states.

In this project, you will start by reviewing existing literature and fitting baseline models. Then, you will design and test a suite of transformer-based models. This will involve processing the clickstream data in multiple ways to determine which is most effective, and determining the best model architecture. After this, using the best model, you will identify the patterns that are most predictive of student behaviour and derive other valuable insights.
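Transformer models consume fixed-length sequences of integer tokens, so one of the preprocessing decisions mentioned above is how to encode raw click actions. A minimal, hypothetical encoding scheme (vocabulary building plus padding and truncation) might look like:

```python
def build_vocab(sequences, pad="<pad>", unk="<unk>"):
    """Map each distinct click action to an integer id; ids 0/1 are reserved."""
    vocab = {pad: 0, unk: 1}
    for seq in sequences:
        for action in seq:
            vocab.setdefault(action, len(vocab))
    return vocab

def encode(seq, vocab, max_len):
    """Truncate to max_len, map unseen actions to <unk>, pad with <pad>."""
    ids = [vocab.get(a, vocab["<unk>"]) for a in seq[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

Alternative representations, for instance adding time-gap tokens or aggregating actions into session-level features, are exactly the kind of variations the project would compare.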


  • Interest in: Recurrent Neural Networks, Transformers and Model interpretation.
  • Proficiency in: Python; Deep Learning Algorithms; Jupyter Notebooks; PyTorch / TensorFlow / Keras.

This project is aimed at one MSc student with intermediate to advanced technical skills. The student will be supervised by Ethan Prihar (Postdoc). The project will span a full semester.

Project 4: LLMs as Human-Centric Explainers

This semester project is focused on the intersection of large language models (e.g., GPT-4, LLaMA) and explainable AI, towards trustworthy, human-centric models. We aim to prompt LLMs based on social science theory to generate natural language explanations of black-box model behavior. The prompting strategies will be designed from literature in the psychology of explanations and different scientific explanatory models (DN model, causal chains, abductive reasoning, contrastive explanations).
We will also examine different granularities of information (asking LLMs to imitate post-hoc explainers, asking LLMs to act as an explainer, instruction / example-based prompting) and different modalities of input (in the education setting, we’ll use time-series data of problems, videos, and forum posts). We will then conduct a large scale human evaluation to validate the explanation quality.
  • an interest in: Large Language Models, Explainable AI, User Studies
  • proficiency in: Python; NLP; Neural Networks; Jupyter Notebooks
Further details
This semester project is aimed at one MSc student with strong technical skills. The student will be supervised by Vinitra Swamy (PhD).

Project 5: Overconfidence vs. Underconfidence Effect

The main goal of this project is to investigate how over-confident and under-confident students behave on interactive simulations, and how certain patterns of behaviour can lead to conceptual understanding gains, or to the reinforcement of misconceptions based on the students’ state of mind.

Specifically, we investigate how students in vocational education classrooms use open-ended learning environments (OELEs). To that end, we guide students’ interactions by giving them a problem which indirectly requires them to extract the mathematical models behind these environments. We then observe how they make use of inquiry principles and whether they are efficient in their investigation.

To assess what students have learnt from this activity, each of them is given a pre-test collecting their prior knowledge about the topic as well as their confidence about it, and a post-test collecting information about what they have learnt and how confident they feel about it.

In previous studies, we have observed that some students tend to misjudge their ability to answer the questions. For example, over-confident students express high levels of confidence in their abilities but then perform poorly on the post-test. This overestimation of their own competence can lead to a mismatch between their self-assessment and their actual performance. On the other hand, under-confident students express low confidence in their abilities but perform better than they expect on the post-test. These students tend to underestimate their own competence, which can sometimes lead to self-doubt or anxiety.

The goal of this study will be to understand what drives students to be under-/over-confident, and how it affects their learning experience. Specifically, we want to investigate:

  1. What are the learning, behavioural, and prior markers linked to an over-/under-estimation of confidence?
  2. Can we train a model which predicts the post-test result and under-/over-confidence at the same time?
  3. Can we train a model which uses post-test predictions to make predictions about under-/over-estimation of confidence?

To answer those questions, you will need to build different models, all predicting over-/under-estimation of confidence based on behavioural features. Leveraging the parameters of your models as well as the nature of the prediction, we will investigate what this means in terms of learning preferences and behaviour.
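Before any model is trained, the target variable itself has to be derived from the pre-/post-test data. A minimal, hypothetical labelling rule, assuming confidence and post-test score are both normalised to [0, 1], could be:

```python
def confidence_label(confidence, post_test, margin=0.15):
    """Label a student from self-reported confidence vs. post-test score.

    `margin` is an illustrative tolerance; in practice it would be chosen
    (or replaced by a statistical criterion) based on the data.
    """
    if confidence - post_test > margin:
        return "over-confident"
    if post_test - confidence > margin:
        return "under-confident"
    return "calibrated"
```

These labels, paired with behavioural features, would form the training data for the predictive models in questions 2 and 3.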

In this project, you will be co-supervised by two PhD students: Kate Shved, a learning science researcher, and Jade Mai Cock, an AI-for-education researcher.

Project 6: Job Market-based Course Recommendation

Most course recommendation systems do not take the job market into account when recommending courses. In this project, we aim to improve our course recommendation system, which uses reinforcement learning for sequential recommendation, so as to maximise job market attractiveness.

The project will revolve around the following key objectives:
  • Axiomatic Approach: The idea is to seek a set of desirable properties, expressed mathematically as formal constraints, to guide the search for an optimal solution. In our case, you would employ axiomatic principles to enhance the robustness and reliability of the functions we use to compute a learner’s attractiveness on the job market.
  • Reinforcement Learning: Improve our current reinforcement learning approaches based on Q-learning and SARSA.
  • Evaluation: Our current evaluation methodology focuses on the job market only, but it should also be extended to the courses themselves.
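For reference, the Q-learning component mentioned above boils down to the classic tabular update Q(s,a) ← Q(s,a) + α(r + γ·max Q(s′,·) − Q(s,a)). The sketch below applies it to a toy, made-up "course sequence" environment; it illustrates the algorithm only and is not our actual system:

```python
import random
from collections import defaultdict

def q_learning(step, start, actions, episodes=300, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning; `step(s, a)` returns (next_state or None, reward)."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: {a: 0.0 for a in actions})
    for _ in range(episodes):
        s = start
        while s is not None:
            # Epsilon-greedy action selection.
            a = rng.choice(actions) if rng.random() < eps else max(Q[s], key=Q[s].get)
            s2, r = step(s, a)
            future = 0.0 if s2 is None else gamma * max(Q[s2].values())
            Q[s][a] += alpha * (r + future - Q[s][a])
            s = s2
    return Q

# Toy environment: three course slots to fill; recommending a course that
# matches the learner's target job yields reward 1, any other course yields 0.
def toy_step(state, action):
    reward = 1.0 if action == "matching_course" else 0.0
    nxt = state + 1 if state + 1 < 3 else None
    return nxt, reward
```

SARSA differs only in using the action actually taken in s′ (rather than the max) in the update target.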
Project Schedule:
  1. Problem Comprehension and Existing Solution Analysis:
    Conduct a literature review to understand current models in job market-driven course recommendation systems.
    Understand our existing recommendation system.
  2. Solution Formulation: Propose an extension of our current solution based on the literature review and on the axiomatic approach.
  3. Solution Implementation: Implement the solution.
  4. Conference Presentation Preparation: Prepare to showcase the project findings and developments at a mid-2024 academic conference or workshop.
  • Required Skills: Proficiency in Python & foundational Machine Learning concepts.
  • Preferred Skills: Prior experience with PyTorch, familiarity with Recommender Systems, and a deep understanding of Reinforcement Learning methodologies.
This project is designed for a dedicated MSc student and will be conducted under the guidance of Postdoc Jibril Frej.

Project 7: Aligning RL Agents’ Policies with Students’ Diverse Behaviors via LLM-based Reward Design

The focus of this semester project lies at the intersection of large language models (e.g., GPT-4, LLaMA) and reward design in reinforcement learning, towards models of students’ heterogeneous fine-grained behavior. Such computational models offer significant potential in educational technology, serving purposes such as teacher training, designing simulated peers, and designing and testing pedagogical interventions. One promising avenue for achieving such computational models is training RL agents that take the role of learners in educational environments and showcase behaviors aligned with different subtypes of students’ behavior.
In this project, we will begin by identifying different types of student behavior within a Scenario-Based Learning environment using time series clustering methods. Once these behavior types are identified, our goal is to experiment with various prompting strategies to encourage a large language model (LLM) like GPT-4 to act as a proxy reward function for the desired behavior. In the final step, we will assess the performance of this reward design method in representing the desired behavior, comparing it with other reward functions learned through supervised learning. Ultimately, our project aims to develop a range of reward functions that can guide reinforcement learning agents to exhibit behaviors similar to specific student behavior subtypes.
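To make the "LLM as proxy reward function" step concrete, a minimal sketch might wrap prompt construction and response parsing as below. The function names, prompt wording, and 0-10 scale are all illustrative assumptions, and the actual LLM call is deliberately left out:

```python
import re

def build_reward_prompt(behavior_type, trajectory):
    """Ask the LLM how well a trajectory matches a target behavior subtype."""
    return (
        "You observe a student in a scenario-based learning environment.\n"
        f"Target behavior subtype: {behavior_type}\n"
        f"Observed actions: {', '.join(trajectory)}\n"
        "On a scale from 0 to 10, how well does this trajectory match the "
        "target behavior? Answer with a single number."
    )

def parse_reward(llm_response, scale=10.0):
    """Extract the first number from the LLM's reply and map it to [0, 1]."""
    m = re.search(r"-?\d+(?:\.\d+)?", llm_response)
    if m is None:
        return 0.0  # unparseable reply: fall back to zero reward
    return max(0.0, min(1.0, float(m.group()) / scale))
```

The resulting scalar can then be plugged into the RL training loop as the reward and compared against reward functions learned through supervised learning.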

  • an interest in: Large Language Models, Reward Design in Reinforcement Learning
  • proficiency in: Python; NLP; Reinforcement Learning; Jupyter Notebooks
This semester project is aimed at one MSc student with strong technical skills. The student will be supervised by Bahar Radmehr (PhD).


Project 8: Data Augmentation in the Context of Multi-label Text Classification

In recent years, we have seen a dramatic increase in the rich data collected from technologically-enhanced learning environments. Machine learning (ML) can effectively analyze these data and inform the design of data-driven mechanisms to support learners, teachers, and other stakeholders. However, the process most often requires large labeled datasets, and labeling data is especially challenging in education. It usually requires experts’ involvement and demands high pedagogical knowledge and expertise.

Data augmentation is an ML approach used to increase the amount of data by adding slightly modified copies of already existing data, or synthetic data newly created from existing data or even from scratch.

In this project, we aim to study how to employ state-of-the-art Large Language Models (e.g. GPT-4) and prompting strategies for data augmentation in the context of multi-label text classification. In agreement with the student, possible projects may involve the following:

  • Applying and comparing several text augmentation methods for multi-label classification.
  • Testing the validity of the algorithm in a real-life educational scenario.
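As a point of comparison for the LLM-based methods, classic lightweight augmentation techniques (in the spirit of "easy data augmentation") simply perturb the text while keeping the label set unchanged. A minimal, hypothetical random-deletion variant:

```python
import random

def random_deletion(text, p=0.1, rng=None):
    """Drop each word with probability p; always keep at least one word."""
    rng = rng or random.Random(0)
    words = text.split()
    if len(words) <= 1:
        return text
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept) if kept else rng.choice(words)

def augment_dataset(examples, n_copies=2, p=0.1, seed=0):
    """examples: list of (text, label_set); labels are carried over unchanged."""
    rng = random.Random(seed)
    out = list(examples)
    for text, labels in examples:
        for _ in range(n_copies):
            out.append((random_deletion(text, p, rng), labels))
    return out
```

LLM-based augmentation would replace `random_deletion` with model-generated paraphrases; comparing both in terms of downstream multi-label classification performance is one of the directions listed above.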


  • an interest in: Large Language Models.
  • proficiency in: Python & foundational Machine Learning concepts; NLP; Jupyter Notebooks

This semester project is aimed at one MSc student with strong technical skills. The student will be supervised by Tatjana Nazaretsky (Postdoc).

Project 9: Learning Science vs. Machine Learning?

Is learning science necessary to build our algorithms? Is machine learning informing learning science? Are both fields benefitting from each other?

In this project, you will focus on learning science and machine learning papers which have tried to predict/understand student success in different open-ended learning environments (OELEs). For each of these OELEs, you will:

  1. Build a “human-made decision tree” solely based on learning science discoveries
  2. Build a “human-made decision tree” solely based on machine learning discoveries
  3. Build a “human-made decision tree” based on both learning science and machine learning discoveries
  4. Re-implement the best-performing machine learning algorithms

You will then compare the four algorithms to understand where learning science helps machine learning, and where machine learning helps learning science.

This work will require a deep dive into the literature on both the learning science and the machine learning side, as well as solid analytical skills to build the decision trees in a smart and well-thought-out way.

This semester project is aimed at one MSc student with strong technical skills. The student will be supervised by Jade Mai Cock (PhD).