Student Projects – Spring 2022

We are offering a range of Master Projects and Semester Projects in data mining and machine learning for (vocational) education ecosystems.

To apply, please send an e-mail to the contact person, mentioning the topic of interest and attaching your grade transcripts. If you would like more information on our research, do not hesitate to contact us!

Please note that the list of topics below is not exhaustive. If you have other ideas or proposals, you are welcome to contact a senior member of the lab to discuss the possibility of a tailor-made topic.

Interactive Simulations

Image courtesy: University of Colorado Boulder

Interactive simulations can often be a helpful learning tool, providing a safe environment in which learners can freely explore and experiment without the dangers, risks, and constraints of real-world setups. When appropriately used by teachers and students, such tools can especially facilitate the understanding of concepts that are normally difficult to grasp. Their instantaneous feedback, such as visualizations and animations, can support students’ comprehension by making visible mechanisms that would otherwise remain hidden. Rather than memorizing a set of equations that might make no sense to them, students have the opportunity to integrate the different concepts at play in greater depth.

Vocational training for many professions could benefit from the targeted use of interactive simulations and/or learning games. Unfortunately, inquiry learning, which these platforms foster, is notoriously hard. To help students learn more efficiently, we need to scaffold their interactions and prompt them with adapted hints that optimise their learning experience. To that end, we first need to understand which types of interactions lead to which learning outcomes: reinforcement of misconceptions, strong learning gains, de-learning, and so on.

We also need to make sure that, if these interactions are used to classify users, they behave similarly across demographic groups. In particular, this project would put you in charge of mining the strategies of different groups of students and linking them to different learning outcomes. If you are interested in data mining on real-life data to improve the learning experience of vocational students, please contact us!
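
One simple way to check whether an interaction-based classifier behaves similarly across groups is to compare its accuracy per demographic group. A large gap suggests the patterns it relies on do not serve every group equally well. The sketch below assumes three inputs from the project's data: true labels, predicted labels, and a group label per student. The function names are hypothetical, and accuracy parity is only one of several possible fairness criteria.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def per_group_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                       groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy of the predictions, computed separately for each demographic group."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true: Sequence[int], y_pred: Sequence[int],
                 groups: Sequence[Hashable]) -> float:
    """Largest accuracy difference between any two groups (0.0 means parity)."""
    acc = per_group_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())


# Toy example with made-up labels for two hypothetical groups "A" and "B".
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "B", "B", "B"]
print(per_group_accuracy(y_true, y_pred, groups))
print(accuracy_gap(y_true, y_pred, groups))
```

Here group A is classified perfectly while group B gets two out of three right. That kind of disparity is worth investigating before using such a classifier to trigger interventions.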

Keywords: Interactive Simulations, Data Mining, Fairness

Preferred skills: Experience with programming, data science, machine learning, and interest in education.

Useful tools: Libraries commonly used in data science (Jupyter Notebooks, Numpy, Pandas, Scikit-Learn, Seaborn/Matplotlib, etc.)

Level: for 1 Master student

Contact: Jade Cock

Digital game-based learning

Digital game-based learning offers an innovative alternative to traditional instructional approaches. Through gamification, these applications aim to convey learning content in a more playful way, capitalizing on the idea that such approaches may have a positive impact on learning motivation and engagement. In this context, we use machine learning to analyze player traces (the time-series data generated by players’ click-streams) as they progress through the game. Since the end goal is education, we feed these traces to predictive models in order to relate in-game behaviors to learning outcomes. One possible application of such models is to identify, early on, players who struggle to understand the game’s concepts and to trigger a personalized in-game intervention that addresses their knowledge gaps.

The laboratory currently has usable datasets for two games that assess players’ algorithmic and computational thinking skills. Both games are based on the idea of choice-based assessment: they let players make choices about how and when to learn, when to gather feedback, when to challenge their understanding, etc. We then treat these choices as a proxy for players’ internal cognitive state and investigate their relationship with some metric of in-game or out-of-game performance. We currently propose a research track on player trace encoding. The goal of students’ projects in this track is to propose, implement, and evaluate encodings of in-game behaviors that generalize well across game-based learning environments and yield good predictive performance when fed to machine learning models that can handle time-series data (e.g., recurrent neural networks).
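
To illustrate what a player trace encoding can look like, the sketch below turns a trace of (timestamp, action) pairs into a fixed-length sequence of feature vectors that a recurrent neural network could consume. Each step becomes a one-hot vector over action types, followed by the time elapsed since the previous action. This is only one simple baseline encoding. The action vocabulary (`ACTIONS`) and the function name are hypothetical and do not reflect the lab's games or datasets.

```python
from typing import List, Sequence, Tuple

# Hypothetical action vocabulary for a choice-based assessment game.
ACTIONS = ["learn", "feedback", "challenge", "submit"]


def encode_trace(trace: Sequence[Tuple[float, str]], max_len: int) -> List[List[float]]:
    """Encode a (timestamp, action) trace as a (max_len, len(ACTIONS) + 1) matrix.

    Each row is a one-hot vector over ACTIONS followed by the time elapsed since
    the previous action (0.0 for the first action). Longer traces are truncated
    and shorter ones are zero-padded so that traces can be batched together.
    """
    width = len(ACTIONS) + 1
    encoded: List[List[float]] = []
    prev_timestamp = None
    for timestamp, action in trace[:max_len]:
        row = [0.0] * width
        row[ACTIONS.index(action)] = 1.0  # raises ValueError for unknown actions
        row[-1] = 0.0 if prev_timestamp is None else timestamp - prev_timestamp
        prev_timestamp = timestamp
        encoded.append(row)
    # Padding rows are all zeros, so they never contain a one-hot entry.
    encoded.extend([0.0] * width for _ in range(max_len - len(encoded)))
    return encoded


trace = [(0.0, "learn"), (5.0, "challenge"), (7.5, "feedback")]
for row in encode_trace(trace, max_len=4):
    print(row)
```

The research question in this track is what should replace or augment the plain one-hot part, so that the encoding carries meaning across different game-based learning environments rather than being tied to one game's action names.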

For more details on the project, please do not hesitate to contact us!

Keywords: Digital Game-Based Learning; Player Trace Encodings; AI for Education; Time-Series

Preferred Skills: Proficiency in Python; Experience with Machine Learning Algorithms; Interest in Education

Useful Tools: Python libraries for data science (Numpy, Tensorflow, etc.)

Level: for 1 Master student

Contact: Lucas Ramirez

Topic recommender system


Educational recommender systems support students in filtering the many alternatives available to them (e.g., courses, resources). The aim of this project is to recommend the next topic students should revise on an educational platform called Lernnavi.

Lernnavi is a tool for promoting some of the basic mathematics study skills taught in high school or secondary school. It covers a wide range of mathematical topics, from fractions to equations and functions. The master student’s task will be to create a recommender system that guides students as they navigate between topics. For example, if a student struggles with polynomial functions, the recommender system might suggest reading the theory or revising linear functions. The master student will (1) create a recommender system (possibly using reinforcement learning) and (2) explain the agent’s recommendations (e.g., why did the agent suggest revising the theory?).
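
To give a feel for tasks (1) and (2), the sketch below treats each topic as an arm of a multi-armed bandit, a simple special case of reinforcement learning, and returns a short explanation alongside each recommendation. It is deliberately minimal and rests on hypothetical assumptions:

- The reward is an observed learning gain after revising a topic.
- The class name and topic names are invented for illustration.
- A bandit ignores prerequisites between topics and the student's evolving state, which a real system on Lernnavi would likely need to model.

```python
import random
from typing import Dict, Iterable, Optional, Tuple


class EpsilonGreedyTopicRecommender:
    """Epsilon-greedy bandit: each topic is an arm, rewarded by an observed learning gain.

    Hypothetical illustration only; the reward definition is an assumption.
    """

    def __init__(self, topics: Iterable[str], epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {t: 0 for t in topics}
        self.values: Dict[str, float] = {t: 0.0 for t in self.counts}

    def recommend(self) -> Tuple[str, str]:
        """Return the next topic together with a human-readable reason."""
        if self.rng.random() < self.epsilon:
            topic = self.rng.choice(list(self.values))
            return topic, "exploration: topic picked at random to gather more evidence"
        topic = max(self.values, key=self.values.get)
        return topic, f"highest average learning gain observed so far ({self.values[topic]:.2f})"

    def update(self, topic: str, reward: float) -> None:
        """Update the running mean reward of a topic after the student revised it."""
        self.counts[topic] += 1
        self.values[topic] += (reward - self.values[topic]) / self.counts[topic]


rec = EpsilonGreedyTopicRecommender(
    ["fractions", "linear functions", "polynomial functions"], epsilon=0.0)
rec.update("linear functions", 0.6)
rec.update("linear functions", 0.8)
rec.update("fractions", 0.3)
print(rec.recommend())
```

Returning a reason with every recommendation is the simplest possible answer to task (2). Richer explanations, such as pointing to a weaker prerequisite topic, would require a richer model of the student.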

Keywords: Recommender Systems, Reinforcement Learning, Education

Preferred Skills: Machine learning knowledge (reinforcement learning and recommender systems); Proficiency in data analysis with Python.

Useful Tools: Tensorflow RecSys, Jupyter Notebooks.

Level: for 1 Master student (Master thesis students encouraged to apply)

Bonus: basic German

Contact: Paola Mejia

Machine Learning for Massive Open Online Course Platforms

Image modified from: Coursera

Over the last few years, Massive Open Online Course (MOOC) platforms have been providing life-changing educational opportunities to millions of individuals, with unrestricted access to any course of their choice, anytime, anywhere. Notable examples include Udemy, Coursera, edX, and Udacity. Top-tier universities, such as EPFL, now offer a wide range of MOOCs as well, giving students a way to learn in a setting similar to an online class, but with a loosely structured schedule and the opportunity to interact with a large number of peers from around the world. However, scaling online education to these numbers presents core challenges, such as classes that are hard to manage and an overwhelming number of content alternatives.

The goal of students’ projects in this area is to analyze how individuals interact on MOOC platforms and to provide them with timely support services. These activities are closely related to the lab’s research. That research ranges from personalized recommendation (e.g., resources you might be interested in or peers who may have similar interests) to dropout/success prediction aimed at anticipating whether and why a student will not complete (pass) a course. The data set includes millions of clickstream records left by students from all over the world in hundreds of MOOCs offered by EPFL professors on platforms like Coursera, edX, and Courseware. Student projects in this area will combine technical and pedagogical elements to relate clickstream behavior to numerical indicators of students’ learning.
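
As a simple example of turning raw clickstreams into numerical indicators, the sketch below counts, per student and per course week, how often each tracked event type occurs. Per-week matrices like these are a common input format for dropout or success prediction models. The record layout (student ID, Unix timestamp, event type), the event names, and the function name are hypothetical assumptions, not the schema of the EPFL datasets.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

SECONDS_PER_WEEK = 7 * 24 * 3600

# Hypothetical event types; real MOOC logs use platform-specific names.
DEFAULT_EVENTS = ("video_play", "problem_submit")


def weekly_counts(records: Iterable[Tuple[str, float, str]], course_start: float,
                  n_weeks: int,
                  event_types: Sequence[str] = DEFAULT_EVENTS) -> Dict[str, List[List[int]]]:
    """Map each student to an (n_weeks x len(event_types)) matrix of event counts.

    Records outside the course window or of untracked types are not counted, but
    a student whose records are all ignored still gets an all-zero matrix.
    """
    column = {e: i for i, e in enumerate(event_types)}
    counts: Dict[str, List[List[int]]] = defaultdict(
        lambda: [[0] * len(event_types) for _ in range(n_weeks)])
    for student, timestamp, event in records:
        matrix = counts[student]  # creates the all-zero matrix on first sight
        week = int((timestamp - course_start) // SECONDS_PER_WEEK)
        if event in column and 0 <= week < n_weeks:
            matrix[week][column[event]] += 1
    return dict(counts)


records = [
    ("s1", 0, "video_play"),
    ("s1", 3600, "video_play"),
    ("s1", SECONDS_PER_WEEK + 60, "problem_submit"),
    ("s2", 120, "forum_post"),
]
print(weekly_counts(records, course_start=0, n_weeks=2))
```

Hand-engineered counts of this kind are a typical baseline against which learned representations, such as autoencoder features, can be compared.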

We will inspect the extent to which these indicators can serve as predictive features when fed into the related machine learning models. We will also assess how well this predictive power transfers (generalizes) across courses and different types of classrooms, such as flipped classrooms. In a flipped classroom, students complete pre-class learning activities before meeting the teacher and their peers for in-person discussion and assessment. For these pre-class activities, teachers often provide videos and other digital content through an online learning platform. We will also examine how engineered MOOC features compare to latent features generated by autoencoders (representation learning), and what this implies for the transferability of the trained model embeddings. These models will be evaluated and discussed with attention to performance, interpretability, and impact.

For more details on the project, please do not hesitate to contact us!

Keywords: Neural Networks, Autoencoders, Representation Learning, Clickstream Analysis, Recommender Systems.

Preferred skills: Proficiency in Python; Experience with Machine Learning Algorithms.

Useful tools: Jupyter Notebooks, PyTorch, Scikit-Learn, Tensorflow, FastText.

Level: for 1 Master student

Contact: Vinitra Swamy

Responsible Machine Learning for Education

Image modified from Olga Kononok/Shutterstock

Educational technologies and platforms increasingly integrate predictive models to provide data-driven support to students, instructors, and other educational stakeholders, with machine learning often at the core of these models. For instance, cognitive tutors help students master skills by adaptively providing them with learning materials. Student support systems assist or flag students based on how likely they are to disengage, fail an exam, or experience certain affective states. Other examples include early dropout prediction and course recommendation. As these predictive models play an increasingly important role in the educational landscape, it becomes essential to explore how they deal with beyond-accuracy aspects like ethics, fairness, transparency, explainability, and accountability. The laboratory is currently interested in fairness and explainability in student success models for MOOCs and flipped courses (other systems are under consideration).

  • Explainability Track. The goal of students’ projects in this track is to develop new neural network design paradigms that make the predictions of machine learning models deployed in education more explainable and transparent, allowing teachers and students to better react to the models’ decisions. To this end, several methods (e.g., LIME, SHAP, Shapley Values) have been investigated in the generic machine learning domain to provide explanations alongside the predictions of black-box models. Student projects will be devoted to two directions. The first is extending these explainability techniques, implemented in the context of EPFL MOOC clickstream data, to flipped-classroom settings. The second is exploring the design of explainability-oriented training objectives and loss functions for deep learning models tailored to the educational domain.
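
Several of the methods named above build on Shapley values. The sketch below computes exact Shapley values for a single prediction of a small model by enumerating all feature coalitions, replacing features outside a coalition with a baseline value. This is a minimal illustration under stated assumptions: the function name, the toy model, and its feature names are hypothetical. The enumeration is also exponential in the number of features, which is why practical tools rely on approximations or model-specific algorithms for realistic models.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet


def shapley_values(predict: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley value of every feature for one prediction.

    A coalition S is evaluated by keeping the instance's values for features in S
    and using the baseline values for all other features.
    """
    features = list(instance)
    n = len(features)

    def value(coalition: FrozenSet[str]) -> float:
        x = {f: instance[f] if f in coalition else baseline[f] for f in features}
        return predict(x)

    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical success-score model with an interaction between two features.
def toy_model(x: Dict[str, float]) -> float:
    return 0.5 * x["videos_watched"] + 0.2 * x["videos_watched"] * x["quizzes_passed"]


student = {"videos_watched": 4.0, "quizzes_passed": 3.0}
reference = {"videos_watched": 0.0, "quizzes_passed": 0.0}
print(shapley_values(toy_model, student, reference))
```

A useful sanity check is the efficiency property. The values sum to the difference between the model's output for the instance and for the baseline. Here that is 3.2 + 1.2 = 4.4, with the interaction term split evenly between the two features.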

For more details on the project, please do not hesitate to contact us!

Keywords: Deep Learning, Natural Language Processing (NLP), Natural Language Generation (NLG), Interpretable AI, Fairness, Bias.

Preferred skills: Good Background in Statistics; Proficiency in Python; Experience with Machine Learning Algorithms; Proficiency in Natural Language Processing.

Useful tools: Jupyter Notebooks, Scikit-Learn, Tensorflow, PyTorch, SpaCy, NLTK, SQL.

Level: for 1 Master student

Contact: Vinitra Swamy