On this page you can find our offerings for Master's Projects and Master's and Bachelor's Research Projects in data mining and machine learning for (vocational) education ecosystems for the autumn semester 2026. Please note that this list of projects is not exhaustive and will be updated over the coming days with further exciting projects.
Last update: 29.04.2026
How to apply
Please apply via our student project application form. You will need to specify which project(s) you are interested in, why you are interested, and whether you have any relevant experience in this area. To access the form, you need to log in with your EPFL email address. If you would like to receive more information on our research, do not hesitate to contact us! Students who are interested in doing a project are encouraged to have a look at the Thesis & Project Guidelines, which explain what you can expect from us and what we expect from students.
The student project application form will remain open for submissions until the late deadline of 15 August 2026. Applications will be reviewed on an ongoing basis, so we strongly encourage you to apply as soon as possible. Early applicants will be considered starting in calendar week 21, with the early deadline set for 14 May 2026:
- Early deadline (first round): 14.05.2026
- First contact with supervisors: 18.05.2026 – 24.05.2026
- Late deadline: 15.08.2026
External students: Non-EPFL students are kindly requested to get in touch with the supervisors of the project(s) they are interested in by e-mail.
Please note that the list of topics below is not exhaustive. If you have other ideas or proposals, you are welcome to contact a senior member of the lab to discuss possibilities for a tailor-made topic.
Project 1: Hierarchical Multi-Agent Systems for Simulated Learners in Inquiry-Based Environments
Large Language Models (LLMs) are increasingly used as simulated learners to support the development and evaluation of educational technologies. A central challenge is aligning agent behaviour with authentic student behaviour in inquiry-based environments such as Beer’s Law Lab, a virtual lab where students investigate Beer’s Law by varying solution characteristics and observing light absorbance. In this environment, the action space is continuous, which amplifies the data problem: the space of plausible behaviours is vast and only sparsely covered by available student traces. Prior work on LLM-based simulated learners has not addressed the design of architectures for handling such continuous action spaces.
Objectives:
- Design a hierarchical multi-agent system consisting of small specialised models for individual environment components (e.g., preparing solutions, setting path length, recording measurements), coordinated by a higher-level orchestrator agent (see the sketch after this list).
- Investigate whether decomposing student behaviour into component-level skills improves behavioural alignment with real student data compared to a single LLM agent baseline.
- Determine whether fine-tuning is required for lower-level skill models or whether in-context learning suffices, and identify effective data sources for training.
- Evaluate the full system against a single-agent baseline on alignment metrics (behavioural similarity to held-out student traces) and task-level outcomes.
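To give a feel for the intended decomposition, here is a minimal Python sketch of the orchestrator/skill split. Everything in it is an illustrative assumption: the class names, the Beer’s Law constants, and especially the fixed round-robin routing, which merely stands in for the LLM-driven decisions the project would develop.

```python
# Minimal sketch of the orchestrator / skill-agent decomposition.
# All names and constants are hypothetical; in the project each agent
# would be backed by a small LLM policy rather than hard-coded rules.
from dataclasses import dataclass, field

@dataclass
class LabState:
    concentration: float = 0.1   # mol/L (hypothetical starting value)
    path_length: float = 1.0     # cm
    log: list = field(default_factory=list)

class SolutionAgent:
    """Specialised skill model: prepares/adjusts the solution."""
    def act(self, state: LabState) -> LabState:
        state.concentration = round(state.concentration + 0.05, 3)
        state.log.append(("set_concentration", state.concentration))
        return state

class PathLengthAgent:
    """Specialised skill model: sets the cuvette path length."""
    def act(self, state: LabState) -> LabState:
        state.path_length = 0.5
        state.log.append(("set_path_length", state.path_length))
        return state

class MeasurementAgent:
    """Specialised skill model: records absorbance (Beer's Law: A = eps * c * l)."""
    EPSILON = 1.5  # hypothetical molar absorptivity
    def act(self, state: LabState) -> LabState:
        absorbance = self.EPSILON * state.concentration * state.path_length
        state.log.append(("record_measurement", round(absorbance, 3)))
        return state

class Orchestrator:
    """Higher-level agent deciding which skill to invoke at each step.
    A fixed round-robin stands in for the LLM's decision here."""
    def __init__(self, skills):
        self.skills = skills
    def run(self, state: LabState, steps: int) -> LabState:
        for t in range(steps):
            state = self.skills[t % len(self.skills)].act(state)
        return state

trace = Orchestrator([SolutionAgent(), PathLengthAgent(), MeasurementAgent()]).run(LabState(), 6)
print(trace.log)  # the action log that would be compared to held-out student traces
```

The appeal of the decomposition is that each skill agent only needs to master one slice of the continuous action space, while the orchestrator learns when to invoke which skill.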
Requirements:
- Interest in: Multi-Agent Systems, Reinforcement Learning, NLP, Educational Technology
- Skills: Python, basic Reinforcement Learning, familiarity with Hugging Face TRL
Level: Master
Supervision: Bahar Radmehr (PhD student)
Project 2: Are Students Lazy Experts? Reframing LLM Alignment as Learning When and How to Depart from the Default Policy
Large Language Models (LLMs) used as simulated learners must be aligned not just to a generic “student” but to specific behavioural profiles capturing the diversity of real student behaviour. Standard alignment approaches such as DPO and GRPO adjust the policy globally and are constrained by a KL penalty against a reference model. This works well when the target profile is close to the model’s default, but struggles with distant profiles, such as students who explore weakly or in a cluttered way, or who progress unusually slowly; yet these are exactly the profiles whose presence is essential for a realistic simulated-learner population. Despite this need, no alignment method has been designed with simulated learners in mind or offers explicit control over where and how the policy should deviate from its default behaviour.
Objectives:
- Develop an alignment method that keeps the default policy intact and learns a lightweight gating policy that decides, at each step, whether to act from the default or from a profile-specific deviation head (see the sketch after this list).
- Investigate whether selective deviation improves profile recovery on distant profiles compared to standard alignment approaches under matched compute.
- Design and evaluate a deviation budget parameterised as a function of profile distance, exploring whether this relationship can be learned from limited student data rather than hand-tuned.
- Assess whether selective deviation preserves default-policy behaviour on profile-irrelevant aspects (e.g., valid action generation, format adherence), reducing the capability degradation typically seen when lowering the KL penalty.
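As one plausible reading of the first objective, the sketch below implements a soft version of the gate in plain PyTorch. The linear heads standing in for the frozen default policy and the trainable deviation head, the sigmoid gate relaxing the discrete deviate-or-not decision, and the quadratic budget penalty are all assumptions for illustration, not the method the project commits to.

```python
# Illustrative soft-gated "selective deviation" policy (assumed design).
# Linear heads stand in for the frozen default LLM policy and the
# trainable profile-specific deviation head.
import torch
import torch.nn as nn

class SelectiveDeviationPolicy(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.default_head = nn.Linear(hidden_dim, vocab_size)    # frozen default
        self.deviation_head = nn.Linear(hidden_dim, vocab_size)  # trainable
        self.gate = nn.Linear(hidden_dim, 1)                     # trainable
        for p in self.default_head.parameters():
            p.requires_grad = False  # keep the default policy intact

    def forward(self, h: torch.Tensor):
        """h: (batch, hidden_dim) hidden state at the current step."""
        g = torch.sigmoid(self.gate(h))  # P(deviate | state), soft relaxation
        logits = (1 - g) * self.default_head(h) + g * self.deviation_head(h)
        return logits, g

policy = SelectiveDeviationPolicy(hidden_dim=16, vocab_size=100)
h = torch.randn(2, 16)
logits, gate = policy(h)

# Deviation budget: penalise the average gate for straying from a target
# rate b, which could itself be a learned function of profile distance.
b = 0.1
budget_loss = (gate.mean() - b).pow(2)
print(logits.shape, gate.squeeze(-1).tolist(), budget_loss.item())
```

Because only the gate and the deviation head receive gradients, default-policy behaviour on profile-irrelevant steps is preserved by construction, which is the property the fourth objective asks you to verify.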
Requirements:
- Interest in: LLM Alignment, Reinforcement Learning, Simulated Learners, Educational Technology
- Skills: Python, basic RL, familiarity with Hugging Face TRL
Level: Master
Supervision: Bahar Radmehr (PhD student)
Project 3: Disentangling Prompt and Knowledge Uncertainty in LLM Reasoning for Interactive Feedback
Large Language Models (LLMs) are increasingly used to provide students with feedback and support follow-up interaction, but they can produce fluent responses that are incorrect, misleading, or based on flawed reasoning. This uncertainty may stem from an ambiguous or underspecified prompt (aleatoric uncertainty) or from missing knowledge (epistemic uncertainty). However, current uncertainty-estimation methods rarely examine where uncertainty appears inside the reasoning traces of thinking models. The goal of this project is to adapt uncertainty-estimation methods to LLM reasoning traces in student-facing feedback systems: identify uncertain reasoning steps, distinguish prompt-related from knowledge-related uncertainty, and communicate this information to learners to support more informed use and appropriate trust in the model’s output.
Objectives:
- Develop a controlled evaluation setup that uses variations of student questions, feedback prompts, and available course context to separate prompt-related uncertainty from knowledge-related uncertainty in reasoning traces.
- Adapt existing uncertainty methods to reasoning steps, using signals such as token entropy, top-token margin, and hidden-state features (see the sketch after this list).
- Build lightweight predictors that estimate the uncertainty type and provide a test-time uncertainty score.
- Evaluate whether targeted interventions, such as structured prompting or added evidence, improve feedback quality and interactive response correctness.
- Explore how to communicate uncertainty to students through confidence levels, likely causes, and suggested next actions.
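To make the step-level signals concrete, the sketch below computes two of the signals named above, token entropy and top-token margin, from per-token logits. The toy shapes, the random logits, and the hand-specified step spans are assumptions; in the project, step boundaries would come from parsing real reasoning traces.

```python
# Step-level uncertainty signals computed from per-token logits.
# Shapes, the random logits, and the step spans below are assumptions.
import torch

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the next-token distribution at each position.
    logits: (seq_len, vocab_size) for one reasoning trace."""
    logp = torch.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)  # (seq_len,)

def top_token_margin(logits: torch.Tensor) -> torch.Tensor:
    """Probability gap between the top-1 and top-2 tokens; a small margin
    suggests the model was torn between continuations at that position."""
    probs = torch.softmax(logits, dim=-1)
    top2 = probs.topk(2, dim=-1).values  # (seq_len, 2)
    return top2[:, 0] - top2[:, 1]

# Aggregate token-level signals over the token span of each reasoning step
# (in practice, step boundaries would come from parsing the trace).
logits = torch.randn(12, 50)   # toy trace: 12 tokens, vocabulary of 50
steps = [(0, 5), (5, 12)]      # hypothetical step spans
for i, (a, b) in enumerate(steps):
    ent = token_entropy(logits[a:b]).mean().item()
    mar = top_token_margin(logits[a:b]).mean().item()
    print(f"step {i}: entropy={ent:.3f}, margin={mar:.3f}")
```

Per-step scores like these would feed the lightweight predictors of uncertainty type and, ultimately, the learner-facing confidence communication explored in the last two objectives.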