On this page you can find our offerings for Master Projects and Semester Projects in the realm of data mining and machine learning for (vocational) education ecosystems.
Last update: 8th November 2022
How to apply
Please apply via our Student Project Application Form. You will need to specify which project(s) you are interested in, why you are interested, and whether you have any relevant experience in this area. To access the form, you need to log in with your EPFL email address. If you would like to receive more information on our research, do not hesitate to contact us! Students who are interested in doing a project are encouraged to have a look at the Thesis & Project Guidelines, which explain what you can expect from us and what we expect from students.
We process applications in two rounds. We collect applications for the first round until 02.12.2022. We will get back to you about your application in the first week after the deadline. If we do not get back to you during the indicated period, this means that we unfortunately do not have the capacity to supervise your project.
If you miss the deadline, you can still apply for projects that are vacant after the first round. We will leave the student project application form open for late applications until 21.01.2023. Applications submitted after the first-round deadline will be reviewed on an ongoing basis. However, we strongly recommend that you apply as quickly as possible, as we expect many projects to be taken after the first round.
External students: Non-EPFL students are kindly requested to get in touch with the supervisors of the project(s) by email.
Early deadline (first round): 02.12.2022
First contact with supervisors: 05.12.2022 – 24.12.2022
Late deadline: 21.01.2023 (contact with supervisors on an ongoing basis)
Please note that the list of topics below is not exhaustive. If you have other ideas or proposals, you are welcome to contact a senior member of the lab to discuss possibilities for a tailor-made topic.
Project 1: Student Profile Creation for Data Augmentation PROJECT TAKEN
Interactive simulations can be a helpful learning tool, providing a safe environment in which learners can freely explore and experiment without the dangers, risks and constraints of real-world setups. Unfortunately, they are quite complex to use for inexperienced students, who can easily get lost. Therefore, to offer an optimal learning experience, these platforms require adaptive guidance. Implementing this requires: 1) predicting which students will struggle using the simulation, 2) understanding which strategies are efficient and which are not, and 3) evaluating the effect of offering adaptive guidance in classrooms.
In this project, you will contribute to the first part: predicting which students will struggle using the simulation. Our datasets tend to be small, and the students in them tend to have diverse prior backgrounds, which makes it hard for an algorithm to learn general patterns. Indeed, depending on a student's knowledge, limited interaction might mean that the student is trying to confirm what they already know, or it might be a sign of complete confusion. To mitigate this issue, you will combine principles from learning theory with classic clustering algorithms to identify different student types. Then, you will use learning science knowledge to build “student prototypes” in the form of probabilistic (likely Markov) models, which will be used to augment the data and create a more representative dataset. These probabilistic models will be compared against “behaviour shuffling” in terms of their effect on classification performance.
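To make the idea of a Markov "student prototype" more concrete, here is a minimal sketch under stated assumptions: each student is represented as a sequence of discrete actions, one first-order Markov chain is fitted per cluster, and synthetic sequences are sampled from it. The action names, the example sequences, and the helper functions `fit_markov_chain` and `sample_sequence` are hypothetical and not taken from the lab's code or data.

```python
import random
from collections import defaultdict


def fit_markov_chain(sequences):
    """Estimate first-order transition probabilities from action sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    transitions = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        transitions[state] = {s: c / total for s, c in next_counts.items()}
    return transitions


def sample_sequence(transitions, start, max_length, rng):
    """Sample one synthetic sequence; stops early at a state with no successors."""
    seq = [start]
    while len(seq) < max_length:
        options = transitions.get(seq[-1])
        if not options:
            break
        states = list(options)
        weights = [options[s] for s in states]
        seq.append(rng.choices(states, weights=weights, k=1)[0])
    return seq


# Hypothetical interaction sequences for one student cluster (illustrative only).
cluster_sequences = [
    ["explore", "explore", "test", "answer"],
    ["explore", "test", "test", "answer"],
    ["explore", "test", "answer"],
]

transitions = fit_markov_chain(cluster_sequences)
rng = random.Random(0)  # fixed seed so the augmentation is reproducible
synthetic = [sample_sequence(transitions, "explore", 6, rng) for _ in range(5)]
```

The synthetic sequences could then be added to the training set of the struggle classifier; whether a first-order chain is expressive enough is exactly the kind of question the project would investigate.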
- Python (classic data science, data visualisation and machine learning libraries)
- Interest in learning science
This semester project is aimed at one MSc student or an ambitious BSc student, who will work with Jade-Mai Cock.
Project 2: Active Learning for Text Classification PROJECT TAKEN
In the field of education, small and unlabeled data sets are a challenge for supervised Machine Learning (ML) methods. In addition, labeling data is expensive and may require a high degree of knowledge. Fortunately, integrating teachers in the loop could address these issues.
Active learning (AL) has gained significant attention for improving Machine Learning (ML) models’ performance in situations where labeled data is not abundant. AL is a “human-in-the-loop” framework in which the model selects the unlabeled samples that should be labeled to increase the confidence of its predictions.
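As an illustration of how a model can select samples for labeling, the sketch below implements uncertainty sampling, one common AL query strategy, for the multi-label case. It assumes the model outputs an independent probability per label and scores each unlabeled sample by its mean binary entropy. The function names and the probability values are hypothetical, and this is only one of several strategies the project might compare.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_queries(probs, k):
    """Return the indices of the k most uncertain unlabeled samples.

    probs: list of rows, one per sample, holding per-label probabilities.
    A sample's uncertainty is the mean binary entropy over its labels.
    """
    scores = [sum(binary_entropy(p) for p in row) / len(row) for row in probs]
    ranked = sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical per-label probabilities for four unlabeled texts, three labels.
probs = [
    [0.95, 0.02, 0.90],  # mostly confident
    [0.55, 0.48, 0.60],  # uncertain on every label
    [0.80, 0.10, 0.50],  # uncertain on one label
    [0.99, 0.99, 0.01],  # very confident
]

queries = select_queries(probs, k=2)  # samples to hand to the teacher
```

In a full loop, the teacher would label the selected samples, the model would be retrained, and the selection step would be repeated.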
In this project, our aim is to study AL methods for multi-label text classification. In agreement with the student, possible projects may involve the following:
- Applying and comparing several AL methods for multi-label classification.
- Testing the validity of the algorithm in a real-life educational scenario.
- Previous experience with transformer-based models (Hugging Face). The student should know the basics (https://huggingface.co/course/).
- Good programming skills (Python).
- Basic German is a plus!
This semester project is aimed at one MSc student, who will work with Jibril Frej and Paola Mejia.
Project 3: Multi-variate Time Series Clustering PROJECT TAKEN
Educational platforms (e.g., Duolingo) have gained popularity due to their accessibility and ability to support a large number of students. Learning analytics can provide insight into how the students are using these platforms. Understanding student behavior helps to improve the platform and provide a better learning experience.
In this project, our aim is to apply innovative methods of multi-variate time series clustering to identify student behavior. In particular, the student will have access to the data from a controlled experiment with a pre- and post-test as well as a pre- and post-qualitative survey. In agreement with the student, possible projects may involve the following:
- Exploring different techniques to manipulate multi-dimensional data, including Self-Organizing Maps (SOMs).
- Using clickstream data to identify learning strategies (self-regulated learning).
- Good programming skills (Python).
- Having taken MLBD or taking it next semester is a plus!
- Basic German is a plus (but not necessary at all)!
This semester project is aimed at one MSc student, who will work with Paola Mejia.
Project 4: Robust Machine Learning for Counterfactual Explanations PROJECT TAKEN
In human-centric deep learning, providing explanations for why a prediction was made is particularly important. In previous work from the ML4ED lab, we’ve seen counterfactuals generated by contrastive and traditional methods choose a varied collection of features as important across a collection of data points. For each data point, the features chosen often reflect small decision boundary differences; a passing student could be a failing student with very minimal differences in feature values. This leads to the question: is it the model that has such an unstable decision boundary, or is it the explainability method?
Robustly training ML models has shown promising results in generating more distinct decision boundaries, improved generalizability, and clearer interpretations. The student will be examining the implications of robustly trained models for explanation generation (starting with counterfactuals and moving to other post-hoc methods like LIME or SHAP). In this project, the student will work towards answering a few questions: 1) What are the implications of training robust models in the space of ML for education? 2) Can these robustly trained models generate higher quality (less variable and more useful) explanations?
- An interest in: Neural Networks, Clickstream Analysis, Explainable AI, ML Robustness.
- Proficiency in: Python; Machine Learning Algorithms; Jupyter Notebooks; PyTorch / TensorFlow.
This semester project is aimed at one MSc student with strong technical skills. The student will work with Vinitra Swamy.
Project 5: Designing Human-Centered and Intelligent Chatbots for Education PROJECT TAKEN
A promising way to support students in learning metacognitive skills, and to enable teachers to convey them to large classes, is the use of adaptive technology-based applications in a student’s learning journey. Recent advances in Machine Learning and Natural Language Processing are promising here, e.g., for providing students with adaptive skill feedback during a writing task or with individual tutoring in the form of a dialogue-based chatbot.
In this project, students should build upon recent advances in the field of Machine Learning and Natural Language Processing to conceptualize and design new learning innovations in the field of conversational agents (aka chatbots). The aim is to design new forms of conversational learning interactions and experiences to improve students’ self-regulated learning. Hence, the task of this project will be to design, build and evaluate a novel chatbot to conduct reflective exercises at vocational schools in Switzerland. The objective is to research the design of a chatbot for self-regulated learning and its effects on students’ reflection skills and their learning perception. By combining human-centered design and state-of-the-art modeling techniques, the student will work closely with the supervisor to solve a real-world educational problem. The student can rely on a set of already existing data, code, tools, and knowledge to work together with the supervisor on a novel learning tool that will be deployed in a school.
- An interest in: Intelligent Writing Support Systems, Conversational Agents, Human-Computer Interaction, Natural Language Processing
- Preferred skills: Python, Data Analysis, Machine Learning, Human-Centered Design, Natural Language Processing
- Python libraries for NLP and ML (e.g., spaCy, NLTK) and frameworks for web tool development (e.g., React)
This semester project is aimed at one MSc student who will work with Thiemo Wambsgans.
Project 6: Explainable Course Recommendation via Knowledge Graph Reasoning PROJECT TAKEN
Equipping Recommender Systems (RS) with Knowledge Graphs (KGs) and/or Reinforcement Learning (RL) agents is a common practice for increasing their predictive performance and proposing more personalized recommendations. In addition, thanks to their structure and explicit relations between entities, KGs have been used to enhance the explainability of RS predictions. Recently, new approaches to explainable recommendation via RL-based path reasoning on KGs have been explored in order to provide intuitive explanations to RS users.
The objective is to provide the first RL models for explainable course recommendation using skill/competence Knowledge Graph Reasoning. The novelty lies at the intersection of integrating skills (from the KG) and applying RL and KGs to explainable recommendation. This project is the first step towards the development of explainable course path recommendation.
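To illustrate what a path-based explanation on a skill KG can look like, here is a minimal sketch on a hypothetical toy graph. It uses a plain breadth-first search in place of the RL agent described above, so it only shows the form of the output (a chain of entities and relations linking a student to a recommended course), not the method this project would develop. All entity names, relation names, and helper functions are made up for illustration.

```python
from collections import deque

# Hypothetical toy KG as (head, relation, tail) triples.
triples = [
    ("student_a", "completed", "intro_python"),
    ("intro_python", "teaches", "programming"),
    ("machine_learning", "requires", "programming"),
    ("machine_learning", "teaches", "model_evaluation"),
]


def build_graph(triples):
    """Adjacency list with inverse edges so paths can traverse both directions."""
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, []).append((rel, tail))
        graph.setdefault(tail, []).append((rel + "^-1", head))
    return graph


def explain(graph, source, target, max_hops=4):
    """Return the shortest entity/relation path from source to target, or None."""
    queue = deque([(source, [source])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if (len(path) - 1) // 2 >= max_hops:
            continue  # do not expand beyond the hop budget
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [rel, nxt]))
    return None


graph = build_graph(triples)
path = explain(graph, "student_a", "machine_learning")
```

Here the path reads as "student_a completed intro_python, which teaches programming, which machine_learning requires". An RL-based approach would instead learn a policy that walks such paths, so that the recommendation and its explanation are produced together.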
- Required Skills: Experience with Python
- Preferred Skills: Experience with PyTorch, Recommender Systems and Reinforcement Learning
- Bonus Skills: Chinese
This semester project is aimed at one MSc student, who will work with Jibril Frej.