You will find below a list of available bachelor semester projects, master’s semester projects, and a master’s thesis (PDM). If you are interested in a specific project, please get in touch with the person listed under “Contact”, mentioning in the subject of the email the title of the project.
For all these projects, you will receive a number of ECTS credits that depends on the type of project and the program. Working on these projects is not remunerated. The projects can be done in either the Fall or the Spring semester.
You can also work on a non-credited, part-time, remunerated project as an Assistant. Working on these projects is subject to the EPFL rules on the maximum allowed weekly hours.
(If you are not an EPFL student, you can apply to the open internship at Idiap.)
- Cognitive architecture for assistive robots
- Robotics: Diffusing Elementary Dynamics Actions
- Robotics: Ergodic Search for Stiffness Trajectories
- Adapting foundational models in Radiology to small-scale datasets
- Computer aided diagnostic tool for characterizing poorly understood disease of small and medium arteries
- Grading inflammatory eye diseases with longitudinal data and multi-task learning
- Pseudo labeling of Sensitive Attributes in Medical Imaging for Demographically Fair Machine Learning
- Towards Scalable and Interpretable Foundation Models for Sleep EEG and Polysomnography
- Design of robotic teleoperation interface
- Gaze-Guided Shared Autonomy for Object Manipulation
- Human body tracking with signed distance fields
- Deep learning for a portraitist robot application
- Ergodic drawing for a robot manipulator
- Text flourishing for a robot writer: a learning and optimization approach
- Scaling Pre-training for Gaze Following
- Spatio-temporal Modeling of Human Behavior in Videos
- Research Spotlight: Crafting An Interactive Webpage Template for Showcasing Science
- Person Head Detection
- From Gaze Following to Joint Attention
- Parametric Gaze Following
- Pathological speech enhancement
- Pathological speech detection in adverse environments
- Automatic speech recognition of air-traffic communication using grammar
- Error correction in speech recognition using large pre-trained language models
- Speech/Music Classification
- Speaker identification enhanced by the social network analyser
- Automatic named entity recognition from speech
- Ergodic control for robot exploration
- Punctuation restoration on automatic speech recognition output
- Social media and crowdsourcing for social good
- Understanding the robustness of machine learning models on underspecified tasks
- Audiovisual person recognition
- Tensor trains for human-guided optimization in robotics applications
- An open-source framework for the quantification of Urban Heat Islands in Switzerland
- Automatic identification of flight information from speech
- Understanding generalization in deep learning
- Using large pretrained language models in speech recognition
Robotics: Diffusing Elementary Dynamics Actions
Description: The project will use a diffusion policy as a high-level model-predictive controller operating at a relatively low frequency. This controller will activate feedforward actions consistent with the theory of Elementary Dynamic Actions (EDA), inspired by principles of human motor control. Using this framework, the student will test hypotheses about the structure and organization of human motor control. In this project, the student will collect motion and force data during contact interactions (using an OptiTrack motion-capture system and a Bota Systems force-torque sensor), use Idiap's high-performance computing (HPC) grid to train models, and evaluate the controller on a Franka robot.
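To make the two-rate structure concrete, here is a minimal, self-contained sketch (plain NumPy, not the project's actual stack): a slow policy, standing in for the trained diffusion policy, periodically emits EDA parameters (virtual equilibrium point, stiffness, damping), while a fast impedance loop turns them into forces on a point mass. All numbers and names are illustrative.

```python
# Minimal sketch of the two-rate controller: a slow high-level policy updates
# EDA parameters, while a fast impedance loop turns them into forces.
# A trained diffusion policy would replace `slow_policy` below.
import numpy as np

def slow_policy(x):
    """Stand-in for a diffusion policy: propose an EDA (x0, K, B)."""
    x0 = np.array([1.0])          # virtual equilibrium point (hypothetical target)
    K, B = 50.0, 10.0             # stiffness and damping of the mechanical impedance
    return x0, K, B

dt, slow_period = 0.001, 0.1      # 1 kHz inner loop, 10 Hz policy updates
x, v, m = np.zeros(1), np.zeros(1), 1.0
for step in range(2000):
    if step % int(slow_period / dt) == 0:
        x0, K, B = slow_policy(x)             # low-frequency "MPC" update
    f = K * (x0 - x) - B * v                  # impedance law: feedforward EDA action
    v += (f / m) * dt                         # integrate point-mass dynamics
    x += v * dt
print(f"final position: {x[0]:.3f} (target {x0[0]:.1f})")
```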
Keywords: Robotics, human motor control, bio-inspiration, controls, diffusion, contact-rich manipulation
Type: 80% software, 20% hardware
Prerequisites: Python, experience with robotics, and a solid mathematical foundation. Relevant programs include mechanical/electrical engineering, computer science, or a related field.
Research program:
References:
- Hermus, J., Doeringer, J., Sternad, D., & Hogan, N. (2020). Separating neural influences from peripheral mechanics: the speed-curvature relation in mechanically constrained actions. Journal of Neurophysiology, 123(5), 1870-1885. PMID: 32159419. [Link]
- Chi, C., Xu, Z., Feng, S., Cousineau, E., Du, Y., Burchfiel, B., Tedrake, R., & Song, S. (2025). Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, 44(10-11), 1684-1704. [Link]
- Nah, M. C., Lachner, J., & Hogan, N. (2024). Robot control based on motor primitives: A comparison of two approaches. The International Journal of Robotics Research, 43(12), 1959-1991. [Link]
Level: master project (full-time) or semester project
Contact: James Hermus ([email protected])
Robotics: Ergodic Search for Stiffness Trajectories
Description: In many robotics applications, the stiffness of a controller is typically chosen by the controls engineer. This is partly because exploring the parameter space of robot stiffness is time-consuming, and exhaustive parameter sweeps are challenging due to long and variable system time constants. This work will investigate the potential of using ergodic control to generate trajectories of stiffness parameters in order to identify optimal stiffness and ensure stability for an unstable contact task.
Keywords: Robotics, optimization, ergodic control, variable impedance control, contact-rich manipulation, geometry
Type: 80% software, 20% hardware
Prerequisites: Python, experience with robotics, and a solid mathematical foundation. Relevant programs include mechanical/electrical engineering, computer science, or a related field.
Research program:
References:
- Bilaloglu, C., Löw, T., & Calinon, S. (2025). Tactile ergodic coverage on curved surfaces. IEEE Transactions on Robotics, 41, 1421–1435. [PDF Link]
- Tessari, F., & Hogan, N. (2025). Muscle stiffness, not force, limits human force production. bioRxiv. [Preprint Link]
Level: Semester project, Master project (full-time)
Contact: James Hermus ([email protected])
Adapting foundational models in Radiology to small-scale datasets
Adapting foundational models for knowledge transfer in medical imaging.
Description: Foundational models (FMs), resulting from the combination of large datasets and deep learning algorithms, are now capable of extracting valuable insights from complex data, with great potential for integration into various domains, including healthcare [1]. However, before deploying these models in real-world clinical settings, and to demonstrate local clinical validity, it is crucial to identify data or concept drifts, newly emerging biases, and performance degradation on unseen datasets [2]. Optimal fine-tuning of FMs on small to medium-sized datasets is an ongoing line of research that will facilitate the use and continuous updating of these models in real clinical settings [3]. Adapting FMs for radiologic applications also highlights the necessity for standardized evaluation benchmarks [4]. Recent studies have shown that, thanks to the knowledge acquired during training on large datasets, fine-tuning these large models often requires only a minimal amount of high-quality data.
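One widely used family of lightweight adaptation modules is low-rank adapters (LoRA). The PyTorch sketch below wraps a frozen linear layer of a hypothetical FM with a trainable low-rank update; layer sizes and rank are illustrative, not taken from any specific model.

```python
# Minimal sketch of a "lightweight module" for fine-tuning: a LoRA-style
# low-rank adapter around a frozen linear layer of a foundation model.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen path + trainable low-rank update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))              # only A and B receive gradients
```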
Goals:
- Survey the state-of-the-art in foundation models (FMs) to develop an evaluation framework for fine-tuning approaches in medical imaging that are trustworthy and reproducible.
- Develop and test adaptive fine-tuning methods that add lightweight modules to adjust learned features, enabling FMs to generalize effectively to unseen datasets.
Research program: AI for Life
Prerequisites:
- Interest in working in the healthcare domain, and with medical doctors
- Basics of machine learning and deep learning
- Previous experience working in this domain is not mandatory
- English is the main working language
- Good command of python programming, machine learning, and deep learning libraries (e.g. Pytorch), and linux/shell programming
References:
- [1] Sun K, Xue S, Sun F, et al. Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions. Published online December 3, 2024. doi:10.48550/arXiv.2412.02621
- [2] Lekadir K, Frangi AF, Porras AR, et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ. 2025;388:e081554. doi:10.1136/bmj-2024-081554
- [3] Dutt R, Ericsson L, Sanchez P, Tsaftaris SA, Hospedales T. Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity. Published online June 10, 2024. doi:10.48550/arXiv.2305.08252
- [4] Paschali M, Chen Z, Blankemeier L, et al. Foundation Models in Radiology: What, How, Why, and Why Not. Radiology. 2025;314(2):e240597. doi:10.1148/radiol.240597
Level: Master project or Master thesis
Contact: André Anjos, [email protected]
Computer aided diagnostic tool for characterizing poorly understood disease of small and medium arteries
Exploiting foundational models to improve characterization in CT of poorly understood blood vessel disease.
Description: Fibromuscular dysplasia (FMD) is a rare and under-recognized disease of the blood vessels [1]. When poorly treated, the disease may cause renovascular hypertension, stroke, and cranial-nerve palsies [2]. Computed tomography angiography (CTA) has become a promising imaging technique to detect the disease and reconstruct the renal vascular system of patients [3]. However, the lack of defined clinical diagnostic criteria is one of the main challenges in the diagnosis of FMD [4]. In particular, the subjective visual interpretation of CTA can lead to "perceptual errors" during the diagnostic process, missing the detection of the disease. Medical image analysis of CTA could provide an automated and objective assessment of the renal arteries. Through the extraction of high-order complex features from patients' CTA scans, clinicians could improve their understanding and characterization of the disease [7].
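As a possible classical starting point (an assumption, not a prescribed method for the project), tubular structures such as arteries can be enhanced with a Frangi vesselness filter before any learning-based segmentation; the file name and threshold below are placeholders.

```python
# Minimal sketch of a classical tubular-structure baseline: Frangi vesselness
# filtering of a CTA volume, followed by a crude threshold.
import numpy as np
from skimage.filters import frangi

volume = np.load("cta_volume.npy")            # placeholder: 3D CTA array (z, y, x)
vesselness = frangi(volume, sigmas=range(1, 6), black_ridges=False)
rough_mask = vesselness > 0.05                # crude candidate-vessel segmentation
print(f"candidate vessel voxels: {rough_mask.sum()}")
```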
Goals:
- Develop an interpretable computer-aided diagnostic tool for detecting FMD findings in renal arteries from CTA scans, in collaboration with Idiap Research Institute and Lausanne University Hospital (CHUV).
- Perform semantic segmentation of renal arteries from 3D CTA scans using both standard tubular segmentation and advanced 3D deep learning methods. Classify FMD lesions across vessel regions by training machine learning models on segmented arteries, ensuring explainability and clinical interpretability throughout the process.
Research program: AI for Life
Prerequisites:
- Interest in working in the healthcare domain, and with medical doctors
- Basics of machine learning and deep learning
- Previous experience working in this domain is not mandatory
- English is the main working language
- Good command of python programming, machine learning, and deep learning libraries (e.g. Pytorch), and linux/shell programming
References:
- [1] Plouin, P. F., Baguet, J. P., Thony, F., Ormezzano, O., Azarine, A., Silhol, F., … & Arcadia Investigators. (2017). High prevalence of multiple arterial bed lesions in patients with fibromuscular dysplasia: the ARCADIA Registry (Assessment of Renal and Cervical Artery Dysplasia). Hypertension, 70(3), 652-658.
- [2] Slovut, D. P., & Olin, J. W. (2004). Fibromuscular dysplasia. New England Journal of Medicine, 350(18), 1862-1871.
- [3] Vasbinder, G. B. C., Nelemans, P. J., Kessels, A. G., Kroon, A. A., Maki, J. H., Leiner, T., … & Renal Artery Diagnostic Imaging Study in Hypertension (RADISH) Study Group. (2004). Accuracy of computed tomographic angiography and magnetic resonance angiography for diagnosing renal artery stenosis. Annals of internal medicine, 141(9), 674-682.
- [4] Gornik, H. L., Persu, A., Adlam, D., Aparicio, L. S., Azizi, M., Boulanger, M., … & Plouin, P. F. (2019). First international consensus on the diagnosis and management of fibromuscular dysplasia. Vascular medicine, 24(2), 164-189.
- [5] Wittmann, B., Wattenberg, Y., Amiranashvili, T., Shit, S., & Menze, B. (2025). vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation. In Proceedings of the Computer Vision and Pattern Recognition Conference (pp. 20874-20884).
Level: Master project or Master thesis
Contact: André Anjos, [email protected]
Grading inflammatory eye diseases with longitudinal data and multi-task learning
Multi-task learning for the automated grading of retinal inflammation in dynamic imaging.
Description: Uveitis is one of the leading causes of blindness in industrialized countries [1]. It is a group of complex and sight-threatening inflammatory disorders affecting the eye, with profound implications for patients [2]. Fluorescein angiography (FA) is the gold standard to assess eye inflammation, as it is the only clinical method that allows the evaluation of the function and integrity of the blood-retinal barrier [3]. The interpretation of FA requires years of clinical experience, as it is intrinsically challenging (dynamic imaging) [4]. Although a standard grading scale for retinal inflammation exists, it is not routinely implemented due to its complexity and number of findings. A standardized and automatic assessment of retinal inflammation on FA could be useful for personalized medicine and to predict treatment response [5].
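A minimal sketch of the multi-task idea, assuming a shared encoder with one grading head per inflammatory sign; the backbone, number of signs, and grade scale are illustrative choices, not those of the cited study.

```python
# Minimal sketch of a multi-task grading model: shared image encoder with one
# grading head per inflammatory sign.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiTaskGrader(nn.Module):
    def __init__(self, n_signs: int = 4, n_grades: int = 5):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()                    # keep 512-d features
        self.encoder = backbone
        self.heads = nn.ModuleList(
            [nn.Linear(512, n_grades) for _ in range(n_signs)]
        )

    def forward(self, x):
        z = self.encoder(x)
        return [head(z) for head in self.heads]        # one logit vector per sign

model = MultiTaskGrader()
logits = model(torch.randn(2, 3, 224, 224))
loss = sum(nn.functional.cross_entropy(l, torch.randint(0, 5, (2,))) for l in logits)
```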
Goals:
- Design and validate multi-task machine learning models for the automatic detection and continuous grading of retinal inflammation from FA images and video.
- Identify and assess novel imaging biomarkers of inflammatory eye diseases using objective and interpretable features to improve diagnosis and patient management.
Research program: AI for Life
Prerequisites:
- Interest in working in the healthcare domain, and with medical doctors
- Basics of machine learning and deep learning
- Previous experience working in this domain is not mandatory
- English is the main working language
- Good command of python programming, machine learning, and deep learning libraries (e.g. Pytorch), and linux/shell programming
References:
- [1] O. M. Durrani, N. N. Tehrani, J. E. Marr, P. Moradi, P. Stavrou, and P. I. Murray, "Degree, duration, and causes of visual loss in uveitis", Br. J. Ophthalmol., vol. 88, no. 9, pp. 1159–1162, Sep. 2004, doi: 10.1136/bjo.2003.037226.
- [2] T. Tsirouki et al., "A Focus on the Epidemiology of Uveitis", Ocul. Immunol. Inflamm., vol. 26, no. 1, pp. 2–16, Jan. 2018, doi: 10.1080/09273948.2016.1196713.
- [3] A. D. Dick et al., "Guidance on Noncorticosteroid Systemic Immunomodulatory Therapy in Noninfectious Uveitis: Fundamentals Of Care for UveitiS (FOCUS) Initiative", Ophthalmology, vol. 125, no. 5, pp.
- [4] Spaide, R. F., Klancnik, J. M., & Cooney, M. J. (2015). Retinal vascular layers imaged by fluorescein angiography and optical coherence tomography angiography. JAMA ophthalmology, 133(1), 45-50.
- [5] Amiot, V., Jimenez-del-Toro, O., Guex-Crosier, Y., Ott, M., Bogaciu, T. E., Banerjee, S., … & Anjos, A. (2025). Automatic transformer-based grading of multiple retinal inflammatory signs in uveitis on fluorescein angiography. Computers in Biology and Medicine, 193, 110327.
Level: Master project or Master thesis
Contact: André Anjos, [email protected]
Pseudo labeling of Sensitive Attributes in Medical Imaging for Demographically Fair Machine Learning
Pseudo labeling of Sensitive Attributes in Medical Imaging for Demographically Fair Machine Learning.
Description: Demographic fairness is essential for ensuring equitable Machine Learning (ML) applications that provide equal access to all communities, regardless of race, gender or ethnicity. In contrast, biased ML systems that may prioritize privileged groups can lead to significant injustices, neglecting underrepresented communities. This issue is particularly critical in domains such as healthcare, where ML decisions directly impact human lives. A key solution is ensuring access to diverse and representative datasets that include demographic attributes.
Data scarcity in healthcare is a significant obstacle to the widespread development and deployment of fair ML systems. Two key sources of this problem, among many, are the limited access to healthcare services in underdeveloped regions, such as parts of South Africa, and the high cost of labor for data labeling efforts, as seen in countries like Switzerland. These limitations result in limited availability of properly annotated data and make it difficult to develop fair ML systems. Given how valuable data is in healthcare, past and present data sources should be augmented with sensitive attributes when these are missing.
Goals: The main purpose of this study is to develop ML models that perform pseudo-labeling of sensitive attributes in medical imaging datasets. Specifically, the student is expected to train ML systems that map medical images to attributes such as gender, age, or other sensitive attributes, so that the resulting models can be used for pseudo-labeling in medical imaging datasets that lack such demographic information. To achieve this, the study focuses on two main tasks, defined as follows (a minimal pseudo-labeling sketch is given after the list):
- Train ML models on medical imaging datasets that include sensitive attributes, such as gender, race or age, to learn the mapping between data and labels. Perform in- and cross-dataset experimentation to validate the efficiency of pseudo-labeling. Existing studies have explored age estimation from chest X-ray images [1], [2], [3], and these can serve as guidelines for developing pseudo-label prediction models.
- Utilize the ML models developed in Task 1 to pseudo-label medical imaging datasets that lack sensitive attribute annotations. Assess the robustness of these pseudo-labels by developing fair ML systems that incorporate demographic information. Bias mitigation techniques, including pre-processing [4] and in-processing [5] approaches, can be leveraged to integrate pseudo-labeling into the model development process.
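A minimal sketch of the pseudo-labeling step in Task 2, assuming `attribute_model` is the classifier trained in Task 1 and `unlabeled_loader` iterates over the unannotated dataset; the confidence threshold is an illustrative choice.

```python
# Minimal sketch: use an attribute classifier trained in Task 1 to pseudo-label
# an unannotated dataset, keeping only confident predictions.
import torch

@torch.no_grad()
def pseudo_label(attribute_model, unlabeled_loader, threshold: float = 0.9):
    attribute_model.eval()
    kept = []
    for images, ids in unlabeled_loader:
        probs = torch.softmax(attribute_model(images), dim=1)
        conf, labels = probs.max(dim=1)
        for i, c, l in zip(ids, conf, labels):
            if c >= threshold:                 # discard uncertain attributes
                kept.append((i, int(l), float(c)))
    return kept                                # (image id, attribute, confidence)
```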
Research Program: AI for Everyone
Prerequisites:
- Interest in working in the healthcare domain, and with medical doctors
- Basics of machine learning and deep learning
- Previous experience working in this domain is not mandatory
- English is the main working language
- Good command of python programming, machine learning, and deep learning libraries (e.g. Pytorch), and linux/shell programming
References:
- [1] Ieki, H., Ito, K., Saji, M., Kawakami, R., Nagatomo, Y., Takada, K., … & Komuro, I. (2022). Deep learning-based age estimation from chest X-rays indicates cardiovascular prognosis. Communications Medicine, 2(1), 159.
- [2] Mitsuyama, Y., Matsumoto, T., Tatekawa, H., Walston, S. L., Kimura, T., Yamamoto, A., … & Ueda, D. (2023). Chest radiography as a biomarker of ageing: artificial intelligence-based, multi-institutional model development and validation in Japan. The Lancet Healthy Longevity, 4(9), e478-e486.
- [3] Azarfar, G., Ko, S. B., Adams, S. J., & Babyn, P. S. (2024). Deep learning-based age estimation from chest CT scans. International Journal of Computer Assisted Radiology and Surgery, 19(1), 119-127.
- [4] Chai, J., & Wang, X. (2022). Fairness with adaptive weights. In International conference on machine learning (pp. 2853-2866). PMLR.
- [5] Amini, A., Soleimany, A. P., Schwarting, W., Bhatia, S. N., & Rus, D. (2019). Uncovering and mitigating algorithmic bias through learned latent structure. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 289-295).
Level: Master project or Master thesis
Contact: André Anjos, [email protected]
Towards Scalable and Interpretable Foundation Models for Sleep EEG and Polysomnography
Towards Scalable and Interpretable Foundation Models for physiological signals recorded during human sleep.
Description: Sleep recordings comprise neurophysiological signals such as EEG, EMG, ECG, and respiratory signals. They are widely used in neuroscience, clinical diagnosis, and brain-computer interface (BCI) applications. Labeled datasets, especially for EEG, are scarce and expensive to collect at scale. Self-supervised learning (SSL) offers a powerful alternative by leveraging unlabeled data to pretrain models that can generalize across subjects, cohorts, and electrode setups.
This project explores contrastive learning, masked prediction, and transfer learning techniques to build robust physiological signal encoders. Using large-scale public and anonymized hospital datasets, the student will train self-supervised models and evaluate their transferability across datasets and clinical downstream tasks (sleep stage prediction, neurological disease prediction).
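A minimal sketch of the masked-prediction idea on multichannel EEG, with a toy convolutional encoder standing in for the project's actual architecture; patch size, masking ratio, and signal dimensions are illustrative.

```python
# Minimal sketch of masked-prediction pretraining on multichannel EEG: random
# time patches are zeroed out and the model is trained to reconstruct them.
import torch
import torch.nn as nn

def mask_patches(x, patch: int = 50, ratio: float = 0.3):
    """x: (batch, channels, time). Returns masked input and boolean time mask."""
    b, c, t = x.shape
    mask = torch.rand(b, t // patch) < ratio
    mask_full = mask.repeat_interleave(patch, dim=1)[:, :t]   # (batch, time)
    x_masked = x.clone()
    x_masked[mask_full.unsqueeze(1).expand_as(x)] = 0.0
    return x_masked, mask_full

encoder = nn.Sequential(                      # toy 1D conv encoder-decoder
    nn.Conv1d(8, 32, 7, padding=3), nn.GELU(), nn.Conv1d(32, 8, 7, padding=3)
)
x = torch.randn(4, 8, 3000)                   # e.g. 30 s of 8-channel EEG at 100 Hz
x_masked, mask = mask_patches(x)
recon = encoder(x_masked)
loss = ((recon - x) ** 2)[mask.unsqueeze(1).expand_as(x)].mean()
loss.backward()                               # reconstruction loss on masked parts
```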
Goals:
- Develop self-supervised pretraining strategies for polysomnographic and EEG data.
- Integrate architectural modules to capture inter-channel dynamics.
- Fine-tune pretrained encoders on supervised tasks such as sleep staging and seizure detection.
- Evaluate cross-dataset generalization and zero-shot performance.
Research program: AI for Life
Prerequisites:
- Interest in working in the healthcare domain, and with medical doctors
- Basics of machine learning and deep learning
- Previous experience working in this domain is not mandatory
- English is the main working language
- Good command of python programming, machine learning, and deep learning libraries (e.g. Pytorch), and linux/shell programming
References:
- [1] Lawhern, V. J., et al. (2018). EEGNet: A compact convolutional neural network for EEG-based brain-computer interfaces. Journal of Neural Engineering, 15(5), 056013. https://doi.org/10.1088/1741-2552/aace8c
- [2] Craik, A., He, Y., & Contreras-Vidal, J. L. (2019). Deep learning for EEG classification tasks: A review. Journal of Neural Engineering, 16(3), 031001. https://doi.org/10.1088/1741-2552/ab0ab5
- [3] Schirrmeister, R. T., et al. (2017). Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping, 38(11), 5391–5420. https://doi.org/10.1002/hbm.23730
- [4] Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- [5] Montavon, G., Samek, W., & Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73, 1–15. https://doi.org/10.1016/j.dsp.2017.10.011
- [6] Thapa, R., et al. (2025). A Multimodal Sleep Foundation Model Developed with 500K Hours of Sleep Recordings for Disease Predictions. https://doi.org/10.1101/2025.02.04.25321675
Level: Master project or Master thesis
Contact: André Anjos, [email protected]
Design of robotic teleoperation interface
Description: This project aims to design a teleoperation interface for the Lio robot (https://www.fp-robotics.com/products/lio/) to allow operating it on the fly. The interface should balance ease of use and expressiveness (which functions it allows the operator to express). Its goal is twofold: making it easy to demonstrate Lio's capabilities in demos to the public, and allowing Lio to be teleoperated in design studies with caregivers and potential users in the context of assistive robotics.
The project will involve implementing the robot functions in the back end as well as a web frontend running on a tablet or a phone. If time permits, the system will be tested in both simulated and real-world settings.
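A minimal sketch of one back-end function, assuming Lio exposes a standard ROS 2 velocity interface; the topic name is an assumption, not Lio's documented API.

```python
# Minimal sketch of a back-end differential navigation command for a web
# frontend, assuming a standard ROS 2 Twist interface on the robot.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist

class TeleopBackend(Node):
    def __init__(self):
        super().__init__("lio_teleop_backend")
        self.cmd_pub = self.create_publisher(Twist, "/cmd_vel", 10)  # assumed topic

    def drive(self, linear: float, angular: float):
        """Differential navigation command, to be called from the web frontend."""
        msg = Twist()
        msg.linear.x = linear
        msg.angular.z = angular
        self.cmd_pub.publish(msg)

def main():
    rclpy.init()
    node = TeleopBackend()
    node.drive(0.2, 0.0)          # e.g. slowly forward
    rclpy.spin_once(node, timeout_sec=0.1)
    rclpy.shutdown()
```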
Goals:
- Develop back-end functions for Lio: navigation (both to set points and differential), manipulation (both to poses and differential 6D), and speech
- Implement frontend web interface and connect it to the backend.
- Implement frontend functions: navigation, manipulation, visualization, speech, animation, etc.
- (For Master-level projects) Conduct a user study to evaluate the expressiveness and ease of use of the interface
Research Program: Human-AI Teaming
Prerequisites:
- Proficiency in Python and at least one web language
- Experience with ROS 2 is a plus
- Familiarity with Linux systems
- Experience in robotics (e.g., control, inverse kinematics, perception) is a plus
Level: Bachelor / Master
Contact: Emmanuel Senft, [email protected]
Gaze-Guided Shared Autonomy for Object Manipulation
Description: This project aims to enhance shared autonomy in object manipulation by incorporating eye gaze as an additional source of user intent. Building on an existing joystick-based shared autonomy framework that allows users to switch robot trajectories through correction plane projections, this project will investigate how gaze direction can be used to predict intent earlier and more accurately. The integration of gaze is especially valuable in cluttered environments, where joystick input alone may not be sufficient to infer the user's desired target.
The project will involve implementing gaze tracking, fusing gaze data with joystick-based correction input, and evaluating the effectiveness of this multimodal intent prediction. If time permits, the system will be tested in both simulated and real-world settings.
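A minimal sketch of the fusion idea: each candidate object is scored by how well it aligns with the gaze ray and with the joystick deflection, and the scores are normalized into target probabilities. The weighting and softmax temperature are illustrative choices.

```python
# Minimal sketch of multimodal intent prediction: combine gaze-ray alignment
# with joystick-direction alignment into a probability over candidate targets.
import numpy as np

def intent_scores(objects, eye_pos, gaze_dir, joy_dir, w_gaze=0.7, temp=0.1):
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    joy_dir = joy_dir / (np.linalg.norm(joy_dir) + 1e-9)
    scores = []
    for obj in objects:
        to_obj = obj - eye_pos
        to_obj = to_obj / np.linalg.norm(to_obj)
        # gaze term: 3D alignment; joystick term: alignment in the joystick plane
        s = w_gaze * (gaze_dir @ to_obj) + (1 - w_gaze) * (joy_dir @ to_obj[:2])
        scores.append(s)
    scores = np.array(scores) / temp
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()                    # probability of each target

objects = [np.array([1.0, 0.2, 0.0]), np.array([1.0, -0.5, 0.3])]
p = intent_scores(objects, np.zeros(3), np.array([1.0, 0.1, 0.0]), np.array([0.0, 1.0]))
```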
Goals:
- Integrate a gaze tracking system into an existing shared autonomy framework.
- Develop a method to fuse gaze direction with joystick input to enhance intent prediction.
- Adapt the trajectory switching logic to respond to multimodal intent signals.
- Evaluate system performance in simulation and (if possible) on a real robot.
- (For Master-level projects) Conduct a user study comparing joystick-only vs gaze-enhanced shared autonomy.
Research Program: Human-AI Teaming
Prerequisites:
- Proficiency in Python
- Experience with ROS 2
- Familiarity with Linux systems
- Experience in robotics (e.g., control, inverse kinematics, perception) is a plus
- Basic understanding of gaze tracking or HRI
Level: Bachelor / Master
Contact: Emmanuel Senft, [email protected]
Human body tracking with signed distance fields
Description: Signed distance fields (SDFs) are popular implicit shape representations in robotics. Most often, SDFs are used to represent rigid objects. However, they can also be used to represent general kinematic chains, such as articulated objects, robots, or humans. SDFs provide a continuous and differentiable representation that can easily be combined with learning, control, and optimization techniques. This project aims to explore the SDF representation of the human body based on state-of-the-art detection, tracking, and skeleton extraction techniques. The developed SDF representation can be used for human-robot interaction or transferring manipulation skills from humans to robots.
Goals: The human skeleton can be detected and tracked through images or videos using pre-trained vision models, and SDFs can be reconstructed by leveraging the SMPL-X model, a realistic 3D model for the human body based on skinning and blend shapes. This project proposes to utilize these techniques to build the SDF for the human body and then apply it to robot manipulation tasks.
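A minimal sketch of the representation, approximating body links by spheres attached to tracked joints; SMPL-X would replace this crude approximation with an accurate body surface.

```python
# Minimal sketch of an SDF for an articulated body: each link is approximated
# by spheres at tracked skeleton joints, and the body SDF is the minimum over
# all sphere SDFs (negative inside, positive outside).
import numpy as np

def body_sdf(query, joint_positions, radii):
    """query: (3,) point; joints: (J, 3); radii: (J,). Returns signed distance."""
    d = np.linalg.norm(joint_positions - query, axis=1) - radii
    return d.min()

joints = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.5]])   # e.g. pelvis and chest
radii = np.array([0.15, 0.12])
print(body_sdf(np.array([0.0, 0.0, 1.05]), joints, radii))  # inside -> negative

# The SDF gradient (direction away from the closest sphere) can serve as a
# repulsion direction for collision avoidance in control and optimization.
```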
Research Program: Human-AI Teaming
Prerequisites: Machine learning, computer vision, programming in Python or C++
References:
- Li, Y., Zhang, Y., Razmjoo, A. and Calinon, S. (2024). Representing Robot Geometry as Distance Fields: Applications to Whole-body Manipulation. In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA), pp. 15351-15357
- Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black (2019). Expressive Body Capture: 3D Hands, Face, and Body from a Single Image. In CVPR
Level: Bachelor/Master (semester project or PDM)
Contact: Sylvain Calinon, [email protected]
Deep learning for a portraitist robot application
Description: Text-driven generation of caricatures in the form of drawing strokes. This project aims to explore the use of generative deep learning techniques based on image diffusion for a robot portrait-drawing application.
Goals:
- Most generative deep learning techniques produce raster images, but a few have explored vector graphics as the output format, guided by text prompts for the rendering. This project will investigate and compare these techniques in the context of a robot portrait-drawing application. The project will be implemented with a 6-axis UFactory Lite-6 robot available at Idiap.
Research Program: Human-AI Teaming
Prerequisites: Deep learning, programming in Python or C++
References:
- SVGDreamer: Text Guided SVG Generation with Diffusion Model
- DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
- VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
- SVG Differentiable Rendering: Generating vector graphics using neural networks
Level: Bachelor/Master (semester project or PDM)
Contact: Sylvain Calinon, [email protected]
Ergodic drawing for a robot manipulator
Description: This project aims to generate trajectories for a drawing robot by using the principle of ergodicity.
Goals: An optimal control approach combining an iterative linear quadratic regulator (iLQR) and a cost on ergodicity will be investigated for trajectory optimization. The project will also investigate the use of electrostatic halftoning or repulsive curves as an initialization process.
The project will be implemented with a 6-axis UFactory Lite-6 robot available at Idiap.
Research Program: Human-AI Teaming
Prerequisites: Linear algebra, programming in Python or C++
References:
- Löw, T., Maceiras, J. and Calinon, S. (2022). drozBot: Using Ergodic Control to Draw Portraits. IEEE Robotics and Automation Letters (RA-L), 7:4, 11728-34
- Calinon, S. (2023). Learning and Optimization in Robotics – Lecture notes (Section 9 on ergodic control)
- Robotics codes from scratch (RCFS)
- drozBot, the portraitist robot
Level: Bachelor/Master (semester project or PDM)
Contact: Sylvain Calinon, [email protected]
Text flourishing for a robot writer: a learning and optimization approach
Description: This project aims to generate trajectories for a robot to embellish text in an automatic manner.
Goals: An optimal control approach based on iterative linear quadratic regulator (iLQR) will be investigated for trajectory optimization. First, the problem will be approached by designing an algorithm for automatically placing a set of ellipses above and below the words to be flourished, by considering the empty spaces available based on the surrounding texts. A path optimization algorithm will then be created to generate movements by using the ellipses as guides (possibly formulated as virtual mass and gravitational forces). The objectives will be designed by transforming aesthetic guidelines for artists into a set of cost functions that can be used in optimal control. The project will be implemented with a 6-axis UFactory Lite-6 robot available at Idiap.
Research Program: Human-AI Teaming
Prerequisites: Linear algebra, programming in Python or C++
References:
- Calinon, S. (2023). Learning and Optimization in Robotics – Lecture notes
- Robotics codes from scratch (RCFS)
Level: Bachelor/Master (semester project or PDM)
Contact: Sylvain Calinon, [email protected]
Cognitive architecture for assistive robots
Description: Controlling a robot for human interactions is a challenge. For decades, multiple research groups have developed cognitive architectures to allow robots to achieve higher-level interaction with people. In this project, you will work on the Lio robot and integrate it with the DIARC architecture from the HRILab at Tufts University.
Goals:
- Getting familiar with DIARC
- Creating a Lio agent profile
- Connecting DIARC to ROS
- Developing human-robot interaction demos
Research Program: Human-AI Teaming
Prerequisites: Good command of Java/Python and basics of Linux. Experience in robotics/ROS is a plus.
Level: Bachelor/ Master
Contact: Emmanuel Senft, [email protected], Jean-Marc Odobez, [email protected]
Scaling Pre-training for Gaze Following
Description: Understanding where a person is looking, or gaze following, is vital for a variety of applications including autonomous driving, human-computer interaction and medical diagnosis. Existing models for gaze following are typically trained in a supervised manner on small, manually annotated datasets. The goal of the project is to perform pre-training on large video datasets by leveraging pseudo annotations from strong gaze following models. We also aim to investigate incorporating weak supervision from auxiliary labels to enhance the learned representations.
Goals:
- After curating diverse video datasets, generate pseudo-annotations for the curated dataset by leveraging strong gaze following models such as [Tafasca et al., 2023]
- Use this dataset to pre-train gaze following models on the pseudo annotations
- Fine-tune and evaluate the pre-trained gaze following models on annotated video datasets
Research Program: AI for Life
Prerequisites: The project will involve programming in Python using the Pytorch library. Knowledge of deep learning will be required, ideally from previous courses.
References:
- [Tafasca et al., 2023] Samy Tafasca, Anshul Gupta and Jean-Marc Odobez. (2023). ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
Level: Master (semester or master project). Can be done by multiple students.
Contact: Dr. Jean-Marc Odobez, [email protected]
Spatio-temporal Modeling of Human Behavior in Videos
Description: In this project, the primary objective is to explore various spatio-temporal models to derive effective video representations for tasks related to facial behavior, such as head gesture and facial expression recognition. These tasks require rich spatio-temporal representations, yet current methods mostly rely on hand-crafted features. The goal is therefore to train video encoders in an end-to-end manner so that they extract effective spatio-temporal features as input to the facial task heads.
Furthermore, various facial behavior tasks can be jointly learned through weakly-supervised learning. Thus, in this project, there is potential to develop a method that trains these tasks jointly using pseudo-annotations extracted for the videos.
Goals:
- Use a face tracking tool to extract facial clips
- Implement spatio-temporal models and fine-tune
- Evaluate the models on human behavior benchmarks, such as CelebV-HQ, CMU-MOSEI, MEAD, CCDb-HG, etc.
Research Program: AI for Life
Prerequisites:
- Proficiency in the Python programming language
- Familiarity with deep learning and the PyTorch library
- Knowledge of computer vision would be advantageous
- A passion for modeling real-world problems and understanding human behavior!
References:
- Head gesture recognition demo
- Video representation for human behavior: MARLIN: https://arxiv.org/abs/2211.06627
Level: semester research project (master), master project (PDM)
Contact: Dr. Jean-Marc Odobez, [email protected]
Research Spotlight: Crafting An Interactive Webpage Template for Showcasing Science
Description: In today's world, simply getting your research papers published in journals or conferences is no longer sufficient for a scientist. The emphasis has shifted towards promoting and sharing one's work via social media, websites, and interactive demonstrations. With this in mind, our project aims to create a versatile webpage template tailored for showcasing research papers effectively. This template will prioritize aesthetics, organized sections, and the ability to incorporate various types of content, such as text, images, videos, and interactive elements like sliders for visualizing how results change with specific parameters. We will investigate the adaptation of existing templates and the development of novel components designed for interactive model demonstrations.
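A minimal sketch of such an interactive component, using Gradio with a slider-controlled parameter; `blur` is a stand-in for a real model inference function.

```python
# Minimal sketch of an interactive demo widget that could be embedded in the
# template: a Gradio interface with a slider controlling a parameter.
import gradio as gr
from scipy.ndimage import gaussian_filter

def blur(image, sigma):
    # stand-in for model inference; here, just blur the uploaded image
    return gaussian_filter(image, sigma=(sigma, sigma, 0))

demo = gr.Interface(
    fn=blur,
    inputs=[gr.Image(), gr.Slider(0, 10, value=2, label="sigma")],
    outputs=gr.Image(),
)
# demo.launch()   # serves the widget locally; can be embedded in a webpage
```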
Goals:
- Develop a generic feature-rich webpage template for showcasing research papers
- Experiment with tools to demonstrate machine learning models interactively (e.g. Streamlit, Gradio) and evaluate their potential integration in the template
Research Program: AI for Life
Prerequisites: Proficiency in web development is required.
References:
- MultiMAE | Multi-modal Multi-task Masked Autoencoders – https://multimae.epfl.ch/
- Gradio – https://www.gradio.app/
Type: Semester Project (Research Project)
Level: Bachelor or Master
Contact: Dr. Jean-Marc Odobez, [email protected]
Person Head Detection
Description: Numerous models are readily available for face detection, but there's a noticeable gap when it comes to detecting the entire head of a person, which is crucial for certain applications. While many applications focus solely on identifying faces, some rely on information from the entire head. Examples of such applications include head pose estimation and gaze tracking, which is the specific focus of this project.
The primary objective is to curate a collection of publicly accessible datasets containing annotations for heads and use them to train a cutting-edge object detection model. This model will be designed to accurately locate people's heads within images. Additionally, we will explore various augmentation techniques to enhance its performance. It's important to highlight that the resulting model is expected to cater to a broad spectrum of users across diverse applications.
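A minimal sketch of the fine-tuning step with Ultralytics YOLOv8; `heads.yaml` is a hypothetical dataset configuration pointing to the unified head annotations, and the image name is a placeholder.

```python
# Minimal sketch: fine-tune a pretrained YOLOv8 detector on head bounding
# boxes gathered from the unified datasets.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                             # pretrained detector
model.train(data="heads.yaml", epochs=50, imgsz=640)   # fine-tune on head boxes
results = model("group_photo.jpg")                     # inference on a test image
```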
Goals:
- Compile multiple available datasets containing head annotations and convert their labels to a unified format
- Train and evaluate an object detector (e.g. Yolov8, DETR), and experiment with data augmentation strategies to maximize performance (especially for extreme head poses)
- Package the model and inference pipeline in a clean and modular codebase that makes it easy for the end-user to run
Research Program: AI for Life
Prerequisites: Proficiency in Python programming (including the standard scientific libraries, e.g. numpy, pandas) is expected. Knowledge of deep learning and the PyTorch framework is desired, but not required.
References:
- Shao, Shuai, et al. "CrowdHuman: A benchmark for detecting human in a crowd." arXiv preprint arXiv:1805.00123 (2018).
- Terven, Juan, and Diana Cordova-Esparza. "A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond." arXiv preprint arXiv:2304.00501 (2023).
- Carion, Nicolas, et al. "End-to-end object detection with transformers." European Conference on Computer Vision. Cham: Springer International Publishing, 2020.
Type: Semester Project (Research Project)
Level: Bachelor or Master
Contact: Dr. Jean-Marc Odobez, [email protected]
From Gaze Following to Joint Attention
Description: Gaze is an important marker for non-verbal communication that is indicative of a person's visual attention. It is also a proxy measure of cognition and can be used to evaluate a subject's state of mind, intentions, and preferences among other things. As such, it has attracted high interest over the years from different research communities ranging from psychology to neuroscience.
Here, we are specifically interested in the gaze following task, defined as the prediction of the 2D pixel location where a person in the image is looking. It may also involve predicting a binary flag that indicates whether the subject is looking inside or outside the image frame. The first step of this project is to train and evaluate a state-of-the-art transformer-based gaze following model on a new and challenging dataset. The idea is to evaluate not only gaze following performance using standard metrics but also joint attention between people using a post-processing approach based on the predicted gaze points. In the second stage, we will look to extend the network architecture to predict joint attention in an end-to-end manner. The resulting model will serve to pseudo-annotate a large-scale video dataset to highlight potential segments of joint attention for sampling and further manual annotation.
This work is part of a collaboration with the Language Acquisition and Diversity Lab of the University of Zurich, headed by Prof. Sabine Stoll.
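A minimal sketch of the post-processing idea for joint attention: two people are considered to attend jointly when their predicted gaze points fall close together. The distance threshold, in normalized image coordinates, is an illustrative choice.

```python
# Minimal sketch: detect joint attention by checking whether predicted gaze
# points of different people in the same frame nearly coincide.
import numpy as np

def joint_attention_pairs(gaze_points, thresh=0.05):
    """gaze_points: (N, 2) predicted gaze targets for N people in one frame."""
    pairs = []
    for i in range(len(gaze_points)):
        for j in range(i + 1, len(gaze_points)):
            if np.linalg.norm(gaze_points[i] - gaze_points[j]) < thresh:
                pairs.append((i, j))
    return pairs

print(joint_attention_pairs(np.array([[0.41, 0.52], [0.43, 0.50], [0.90, 0.10]])))
```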
Goals:
- Train a gaze-following model on a new dataset and evaluate gaze-following and joint attention performance
- Extend the architecture to predict both the gaze point and joint attention simultaneously and compare with the baseline
Research Program: AI for Life
Prerequisites: The project will involve programming in Python using the Pytorch library. Knowledge of deep learning will be required, ideally from previous courses and projects.
References:
- Tafasca, Samy, Anshul Gupta, and Jean-Marc Odobez. "Sharingan: A Transformer-based Architecture for Gaze Following." arXiv preprint arXiv:2310.00816 (2023)
- Fan, Lifeng, et al. "Inferring shared attention in social scene videos." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018
Type: Semester Project (Research Project)
Level: Bachelor or Master
Contact: Dr. Jean-Marc Odobez, [email protected]
Parametric Gaze Following
Description: The gaze following task in computer vision is defined as the prediction of the 2D coordinates where a person in an image is looking. Previous research efforts cast the problem as a heatmap prediction and consider the point of maximum intensity to be the predicted gaze point. This formulation has the benefit of enabling the model to highlight different potential gaze targets when the scene does not contain enough information to be conclusive. However, aside from the argmax, it is relatively difficult to leverage such heatmaps to automatically retrieve more information about the distribution they represent (e.g. the different modes, the weight and variance of each mode, etc.). The goal of this project is to explore a different formulation of the gaze following task where we predict a parametric probability distribution instead of heatmap pixels. Preliminary experiments in this direction have shown promising results.
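A minimal sketch of such a parametric head, predicting an isotropic Mixture of Gaussians over 2D gaze locations and training with the mixture negative log-likelihood; feature sizes and the number of components are illustrative.

```python
# Minimal sketch of a parametric gaze head: predict mixture weights, 2D means
# and (isotropic) log-stds, and train with the negative log-likelihood.
import torch
import torch.nn as nn

class GazeMDN(nn.Module):
    def __init__(self, feat_dim=512, n_comp=3):
        super().__init__()
        self.n = n_comp
        self.head = nn.Linear(feat_dim, n_comp * 4)   # weight, mean_x, mean_y, log_std

    def forward(self, feat):
        p = self.head(feat).view(-1, self.n, 4)
        log_w = torch.log_softmax(p[..., 0], dim=-1)
        mu, log_std = p[..., 1:3], p[..., 3]
        return log_w, mu, log_std

def mdn_nll(log_w, mu, log_std, target):
    # log N(target | mu, std^2 I) per component, then log-sum-exp over mixture
    var = torch.exp(2 * log_std)
    sq = ((target.unsqueeze(1) - mu) ** 2).sum(-1)
    log_prob = -0.5 * sq / var - 2 * log_std - torch.log(torch.tensor(2 * torch.pi))
    return -(torch.logsumexp(log_w + log_prob, dim=-1)).mean()

model = GazeMDN()
log_w, mu, log_std = model(torch.randn(8, 512))
loss = mdn_nll(log_w, mu, log_std, torch.rand(8, 2))
loss.backward()
```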
Goals:
- Investigate ideas to cast gaze-following as the prediction of a parametric probability distribution (e.g. Mixture of Gaussians) instead of a heatmap
- Propose new performance metrics capturing more information about the distribution compared to point-based metrics
Research Program: AI for Life
Prerequisites: The project will involve programming in Python using the Pytorch library. Knowledge of deep learning will be required, ideally from previous courses and projects
References:
- Tafasca, Samy, Anshul Gupta, and Jean-Marc Odobez. âChildPlay: A New Benchmark for Understanding Childrenâs Gaze Behaviour.â Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
- Chong, Eunji, et al. âDetecting attended visual targets in video.â Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
Type: Master's Thesis (Master's Project)
Level: Master
Contact: Dr. Jean-Marc Odobez, [email protected]
Pathological speech enhancement
Description: Speech signals recorded in an enclosed space by microphones placed at a distance from the source are often corrupted by reverberation and background noise, which degrade speech quality, impair speech intelligibility, and decrease the performance of automatic speech recognition systems. Speech enhancement approaches to mitigate these effects have been devised for neurotypical speakers, i.e., speakers without any speech impairments. However, pathological conditions such as hearing loss, head and neck cancers, or neurological disorders, disrupt the speech production mechanism, resulting in speech impairments across different dimensions. This project will contribute to our efforts to understand the performance of state-of-the-art approaches for pathological signals and develop appropriate approaches targeting pathological speech.
Goals:
- Set up datasets of interest
- Implement existing approaches and/or get familiar with existing implementations
- Examine the performance of various approaches for pathological speech signals
- If relevant and time permits, develop novel approaches targeting pathological speech
Research Program: AI for Life
Prerequisites: Python programming; basic knowledge of machine learning
Level: Bachelor/Master
Contact: Ina Kodrasi, [email protected]
Pathological speech detection in adverse environments
Description: Various conditions of brain damage may disrupt the speech production mechanism, resulting in motor speech disorders that encapsulate altered speech production in different dimensions. To diagnose motor speech disorders, we have developed automatic speech processing approaches. Such approaches however can fail to cope with realistic clinical constraints, i.e., the presence of noise and reverberation when recording speech in clinical settings. This project will contribute to the efforts made in our group to understand the performance of state-of-the-art approaches in adverse environments and develop appropriate approaches targeting such scenarios.
Goals:
- Set up datasets of interest
- Implement existing approaches and/or get familiar with existing implementations
- Examine the performance of various approaches in adverse environments
- If relevant and time permits, develop novel approaches targeting adverse scenarios
Research Program: AI for Life
Prerequisites: Python programming; basic knowledge of machine learning
Level: Bachelor/Master
Contact: Ina Kodrasi, [email protected]
Automatic speech recognition of air-traffic communication using grammar
Description: Current state-of-the-art speech-to-text systems (i.e., automatic speech recognition (ASR) engines) applied to air-traffic control exploit statistical language models, which require large amounts of textual data for training. Nevertheless, Air Traffic Control Officers (ATCOs) are required to strictly follow the phraseology (i.e., standardised by the International Civil Aviation Organization, ICAO), and thus a context-free grammar (CFG) can be used to model sequences of words generated by ATCOs. The goal of this project is to explore how traditional concepts of statistical language modeling can be enriched by the standardised phraseology (i.e., modeled by CFG-based language modeling).
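A minimal sketch of how a toy fragment of ICAO-like phraseology can be expressed as a CFG with NLTK; the grammar below is illustrative, not the actual standard phraseology.

```python
# Minimal sketch: a toy CFG fragment of ATC phraseology, used to enumerate
# legal utterances and to parse/score ASR output.
import nltk
from nltk.parse.generate import generate

grammar = nltk.CFG.fromstring("""
S -> CALLSIGN COMMAND
CALLSIGN -> AIRLINE NUMBER
AIRLINE -> 'swiss' | 'lufthansa'
NUMBER -> 'one' 'two' 'three'
COMMAND -> 'climb' 'flight' 'level' LEVEL | 'contact' 'tower'
LEVEL -> 'three' 'four' 'zero'
""")

for words in generate(grammar, n=5):          # enumerate legal ATC utterances
    print(" ".join(words))

parser = nltk.ChartParser(grammar)            # can also parse ASR hypotheses
```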
Goals:
- Develop a baseline automatic speech recognition engine in the Kaldi framework, suited for air-traffic controllers
- Explore the use of a CFG-based language model in ASR to model word sequences (i.e., replacing or enriching the statistical language model)
- Compare the performance of the new language model on ASR tasks
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning
References:
- Oualil, et al, A Context-Aware Speech Recognition And Understanding System For Air Traffic Control Domain
- Oualil, et al, Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Error correction in speech recognition using large pre-trained language models
Description: The aim of this work is to find out whether large pre-trained language models can be used to correct errors in the transcription of spoken speech. The student will run a standard speech test set through one or more publicly available speech transcription models and then investigate how well the language models can correct errors: Does the overall error rate matter? Are there classes of errors that are better fixed? Is it better to use a traditional language model (e.g. LLaMA) or a conversational one (e.g. Alpaca)?
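A minimal sketch of the related re-scoring idea: score each ASR hypothesis with a pre-trained causal LM and keep the one with the lowest per-token negative log-likelihood; GPT-2 stands in for whatever model is eventually used.

```python
# Minimal sketch: re-score ASR N-best hypotheses with a pre-trained causal LM
# and pick the most plausible one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def lm_score(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    return lm(ids, labels=ids).loss.item()    # mean negative log-likelihood

nbest = ["the flight was delayed", "the fright was the laid"]
print(min(nbest, key=lm_score))               # hypothesis preferred by the LM
```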
Goals:
- Familiarize with speech recognition engines, available at Idiap
- Focus on the application of language models in the speech recognition framework (including their use for re-scoring N-best hypotheses)
- Explore large language models for deployment in post-processing speech recognition output
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Speech/Music Classification
Description: Classifying sound into speech, music and possibly noise is important for systems based on statistical modeling. Statistical models are usually trained on a large database of input signals containing various sounds. In both the training process and the testing process it is advantageous to exclude segments containing non-speech sounds to improve the accuracy of the model. This project will develop a classifier discriminating speech from music and potentially also from noise. You will first analyze existing approaches to speech/music classification and evaluate their efficiency and accuracy using conventional metrics for binary classification. You will then propose your own classifier or improve an existing one.
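A minimal sketch of a classical baseline: mean and standard deviation of MFCCs per segment, fed to a logistic-regression speech/music classifier; the file names are placeholders.

```python
# Minimal sketch of a classical speech/music baseline: summary MFCC features
# per segment plus a logistic-regression classifier.
import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression

def segment_features(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

X = np.array([segment_features(p) for p in ["speech1.wav", "music1.wav"]])
y = np.array([0, 1])                          # 0 = speech, 1 = music
clf = LogisticRegression().fit(X, y)
```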
Goals:
- Familiarize with voice activity detectors, or existing speech/music detectors available publicly or at Idiap
- Develop a new speech/music classifier
- Evaluate the new technology against the baseline on well-established data
Research Program: AI for Life
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning
References:
- Banriskhem K. Khonglah: Speech/music classification using speech-specific features, Digital Signal Processing, Volume 48, January 2016, Pages 71-83
- Mrinmoy Bhattacharjee: Time-Frequency Audio Features for Speech-Music Classification
- Toni Hirvonen: Speech/Music Classification of Short Audio Segments, 2014 IEEE International Symposium on Multimedia
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Speaker identification enhanced by the social network analyser
Description: The project will build, test, and combine technologies associated with the ROXANNE platform, leveraging open-source tools (e.g. SpeechBrain and SocNetV), to demonstrate their strength in improved identification of persons. The project definition can be adapted towards other modalities (e.g. estimating authorship attribution from text, or detecting persons using face identification technology).
Goals:
- Build a baseline automatic speaker identification engine, using either an open-source tool (such as SpeechBrain) or the one available at Idiap, and test it on target (simulated) data related to lawful investigation
- Build a baseline graph/network analysis tool with basic functionalities such as centrality or community detection (many open-source tools can be exploited here as well) and test it on the simulated data
- Study a combination of information extracted by speech and network analysis technologies to eventually improve the person identification
Research Program: Sustainable and Resilient Societies
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
References:
- Mael Fabien, et al., ROXANNE Research Platform: Automate criminal investigations
- ROXANNE project website
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Automatic named entity recognition from speech
Description: The project will improve detection and recognition of named entities (e.g. names, places, locations) automatically from speech. Currently, two independent technologies are used, namely automatic speech recognition (i.e. usually evaluated to minimise a word error rate) and named entity recogniser. The goal of this project is to efficiently combine these two modules, while leveraging state-of-the-art open source tools such as SpeechBrain or BERT.
Goals:
- Get familiarized with the baseline speech recognition module developed in ROXANNE
- Get familiarized with a baseline entity extractor module
- Apply an end-to-end framework to train both modules together and compare its performance with independently trained modules
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
References:
- Mael Fabien, et al, ROXANNE Research Platform: Automate criminal investigations
- Mael Fabien, et al., BertAA: BERT fine-tuning for Authorship Attribution
- ROXANNE project website
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Ergodic control for robot exploration
Description: Ergodic control can be exploited in a range of robotics problems requiring the exploration of regions of interest, e.g. when the available sensing information is not accurate enough for a standard controller, but can guide the robot towards promising areas. In a collaborative task, it can also be used when the operator's input is not accurate enough to fully reproduce the task, which then requires the robot to explore around the requested input (e.g., a point of interest selected by the operator). For picking and insertion, it can be applied to move around the picking/insertion point, thereby facilitating the prehension/insertion. It can also be employed for active sensing and localization (either performed autonomously, or with help from the operator). Further information.
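A minimal sketch of the Spectral Multiscale Coverage idea in 2D: the agent is steered so that the Fourier coefficients of its empirical coverage match those of a target density on the unit square. All parameters are illustrative.

```python
# Minimal sketch of Spectral Multiscale Coverage (SMC) for a point agent on
# the unit square: steer to match coverage coefficients to a target density.
import numpy as np

K = 8                                          # Fourier basis functions per axis
ks = np.array([(i, j) for i in range(K) for j in range(K)], dtype=float)
lam = (1 + np.linalg.norm(ks, axis=1) ** 2) ** -1.5   # Sobolev-type weights

def basis(x):                                  # cosine basis on [0,1]^2
    return np.cos(np.pi * ks[:, 0] * x[0]) * np.cos(np.pi * ks[:, 1] * x[1])

def grad_basis(x):
    gx = -np.pi * ks[:, 0] * np.sin(np.pi * ks[:, 0] * x[0]) * np.cos(np.pi * ks[:, 1] * x[1])
    gy = -np.pi * ks[:, 1] * np.cos(np.pi * ks[:, 0] * x[0]) * np.sin(np.pi * ks[:, 1] * x[1])
    return np.stack([gx, gy], axis=1)

# target density: Gaussian blob at (0.7, 0.3); coefficients via Monte Carlo
samples = np.clip(np.random.randn(5000, 2) * 0.1 + [0.7, 0.3], 0, 1)
w = np.array([basis(s) for s in samples]).mean(axis=0)

x, c, dt, umax = np.array([0.1, 0.1]), np.zeros(len(ks)), 0.01, 1.0
for t in range(1, 3000):
    c += (basis(x) - c) / t                    # running coverage coefficients
    b = grad_basis(x).T @ (lam * (c - w))      # ergodic feedback direction
    x = np.clip(x - umax * dt * b / (np.linalg.norm(b) + 1e-9), 0, 1)
```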
Goal:
- To study the pros and cons of Spectral Multiscale Coverage and Heat Equation Driven Area Coverage to solve robot manipulation problems
Research Program: Human-AI Teaming
Prerequisites: Control theory, signal processing, programming in Python, C++ or Matlab/Octave
References:
- G. Mathew and I. Mezić (2009). Spectral multiscale coverage: A uniform coverage algorithm for mobile sensor networks. In Proc. IEEE Conf. on Decision and Control.
- S. Ivić, B. Crnković, and I. Mezić (2017). Ergodicity-based cooperative multiagent area coverage via a potential field. IEEE Trans. on Cybernetics.
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Punctuation restoration on automatic speech recognition output
Description: The goal of the project is to train a model to post-process automatic speech recognition (ASR) output and add punctuation marks (and capitalization, for the next level of difficulty). This will improve the readability of ASR output and potentially make it more useful for downstream tasks, such as dialogue systems and language analysis.
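A minimal sketch of a common formulation, casting punctuation restoration as token classification with one label per word; the model choice and label set are illustrative.

```python
# Minimal sketch: punctuation restoration as token classification over
# unpunctuated ASR output (label = punctuation following each token).
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "PERIOD", "COMMA", "QUESTION"]
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)
enc = tok("hello how are you i am fine", return_tensors="pt")
logits = model(**enc).logits                  # (1, tokens, num_labels)
# fine-tune with cross-entropy against word-aligned gold punctuation labels
```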
Goals:
- Get acquainted with the problem, available data, success metrics, machine learning frameworks
- Program a simpler system predicting just sentence ends/full stops; improve and extend the predictions to other punctuation marks; for extra difficulty, learn to predict capital letters
- Test and evaluate on a couple of languages, real scenarios
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
References:
- Yi, et al. Adversarial Transfer Learning for Punctuation Restoration
- Pais, et al., Capitalization and punctuation restoration: a survey
- Nanchen, et al. Empirical Evaluation and Combination of Punctuation Prediction Models Applied to Broadcast News
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Social media and crowdsourcing for social good
Description: The student will contribute to a multidisciplinary initiative for the use of social media and mobile crowdsourcing for social good. Several projects are available. Students will work with social computing researchers who collaborate with academics in other countries, both in Europe and the Majority World.
Goal:
- Social media analytics
- Visualization of social and crowdsourced data
- Smartphone apps for mobile crowdsourcing
Research Program: AI for Everyone
Prerequisites: Interest and/or experience in one or more of these areas: data analysis, machine learning, data visualization, phone apps, social media, natural language processing, computer vision
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Understanding the robustness of machine learning models on underspecified tasks
Description: The performance of deep learning models can quickly degrade when they are used on test data beyond their training distribution. In recent work [1], we have observed intriguing patterns in the "in-distribution" vs. "out-of-distribution" performance of various models. In particular, there sometimes exists a tradeoff between the two, which evolves during training and fine-tuning. It is not clear, however, what impact the pre-training and fine-tuning stages have. This project will contribute to the efforts to understand this topic. One of the objectives is to contribute concretely to a high-quality publication co-authored with other members of our research group.
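A minimal sketch of the experimental loop: evaluate the same sequence of checkpoints on an in-distribution and an out-of-distribution test set and record both accuracies to reveal possible tradeoffs; loaders and checkpoint paths are placeholders.

```python
# Minimal sketch: track ID vs OOD accuracy of a model across training
# checkpoints to study robustness tradeoffs.
import torch

@torch.no_grad()
def accuracy(model, loader):
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

def track_robustness(model, checkpoints, id_loader, ood_loader):
    curve = []
    for path in checkpoints:                       # e.g. one file per epoch
        model.load_state_dict(torch.load(path))
        curve.append((path, accuracy(model, id_loader), accuracy(model, ood_loader)))
    return curve                                   # plot ID vs OOD accuracy
```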
Goals:
- Select datasets of interest and train models with existing code
- Examine the performance of various models under various hyper-parameters, numbers of epochs, pre-training/fine-tuning options, etc. Develop model selection strategies to identify robust models
- Prepare results, visualizations, and analyses of experiments suitable for a scientific publication
Prerequisites: Solid programming background and experience with deep learning libraries (e.g. Pytorch)
References:
- ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets (https://arxiv.org/abs/2209.00613)
- The Evolution of OOD Robustness Throughout Fine-Tuning (https://arxiv.org/abs/2106.15831)
Level: Bachelor/ Master
Contact: Damien Teney, [email protected]
Audiovisual person recognition
Description: Audiovisual person identification systems combine two biometric modalities and achieve very good results, as shown in Idiap's submission to NIST SRE 2019. The student will be able to use most of Idiap's scripts, mainly the audio-related part. Fusion scripts for combining audio and visual systems can also be shared. Two approaches can be considered: either develop the two systems separately and then experiment with fusion, or attempt to build a single person identification system taking both audio and visual embedding representations as input.
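For the first approach, the usual baseline is score-level (late) fusion: normalize each modality's scores so they are comparable, then combine them with a weight tuned on development data. A minimal sketch, with made-up scores in place of the real audio and visual systems' outputs:

```python
import numpy as np

def zscore(s):
    """Normalize scores so the two modalities are on a comparable scale."""
    return (s - s.mean()) / s.std()

def fuse(audio_scores, visual_scores, alpha=0.5):
    """Weighted sum of normalized per-candidate scores; alpha would be
    tuned on a development set in the real system."""
    return alpha * zscore(audio_scores) + (1 - alpha) * zscore(visual_scores)

# Hypothetical scores of 4 enrolled identities for one test recording.
audio  = np.array([2.1, -0.3, 0.8, -1.5])
visual = np.array([0.9,  1.2, 0.1, -0.7])
fused = fuse(audio, visual, alpha=0.6)
print("Predicted identity:", int(np.argmax(fused)))
```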
Research Program: Sustainable and Resilient Societies
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
References:
- NIST SRE 2019
- The 2019 NIST Audio-Visual Speaker Recognition Evaluation
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Tensor trains for human-guided optimization in robotics applications
Description: This project extends Tensor Train for Global Optimization (TTGO) to a human-guided learning strategy. Learning and optimization problems in robotics are characterized by two types of variables: task parameters representing the situation that the robot encounters (typically related to environment variables such as locations of objects, users or obstacles); and decision variables related to actions that the robot takes (typically related to a controller acting within a given time window, or to the use of basis functions to describe trajectories in control or state spaces). In TTGO, the density function is modeled offline using a tensor train (TT) that learns the structure between the task parameters and the decision variables, and then allows conditional sampling over the task parameters with priority for higher-density regions. Further information: https://sites.google.com/view/ttgo
Goals:
- Test whether the original autonomous learning strategy of TT-Cross can be extended to a human-guided learning strategy, by letting the user sporadically specify task parameters or decision variables within the iterative process. The first case can provide a scaffolding mechanism for robot skill acquisition; the second can let the robot ask for help in specific situations.
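The conditional-sampling step described above can be illustrated without any tensor-train machinery: given a discretized joint density over a task parameter x and a decision variable y, fix x and sample y in proportion to the (sharpened) conditional density. A toy numpy sketch, with a hand-built density standing in for the learned TT model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint density p(x, y) on a 50x50 grid, standing in for the TT model.
xs = np.linspace(-2, 2, 50)
ys = np.linspace(-2, 2, 50)
X, Y = np.meshgrid(xs, ys, indexing="ij")
density = np.exp(-((Y - np.sin(2 * X)) ** 2) / 0.1)

def sample_decision(x_value, beta=4.0, n=5):
    """Condition on task parameter x, then sample decision variables y
    with priority for high-density regions (beta > 1 sharpens)."""
    i = np.abs(xs - x_value).argmin()  # nearest grid cell for x
    p = density[i] ** beta             # prioritized conditional over y
    p /= p.sum()
    return rng.choice(ys, size=n, p=p)

print(sample_decision(x_value=0.5))
```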
Research Program: Human-AI Teaming
Prerequisites: Linear algebra, optimization, programming in Python
Reference:
- Shetty, S., Lembono, T., Löw, T. and Calinon, S. (2023). Tensor Train for Global Optimization Problems in Robotics. arXiv:2206.05077.
https://sites.google.com/view/ttgo
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
An open-source framework for the quantification of Urban Heat Islands in Switzerland
Description: Cities throughout the world are overheating in summer, with adverse effects on the health of citizens. Due to the highly mineral nature of the built environment, the scarcity of nature in cities, and the anthropogenic heat released in the streets, temperatures will continue to increase with climate change. With physically-based simulation tools, we can predict hot spots and evaluate scenarios for the mitigation of urban heat islands. While such tools exist, a framework based on open data and easily accessible to researchers, practitioners and citizens is a must-have to raise awareness and move towards efficient heat-island mitigation measures.
Goals:
- Build on an open-source framework in Python or in C++ to go from Swiss open-datasets to equivalent inputs for CitySim and ENVI-met
- Introduce the Physiological Equivalent Temperature (PET) as an indicator of Urban Comfort in the open-source CitySim software
- Demonstrate the application of scenarios on two case studies representative of the Swiss landscape, quantifying improvement measures
- Compare the results obtained with CitySim and ENVI-met, and conclude on the advantages and disadvantages
Research Program: Sustainable and Resilient Societies
Prerequisites: Basic energy balance and thermodynamics knowledge; Basic scripting or programming skills (no Matlab).
References:
- Coccolo, Silvia, Jérôme Kämpf, Jean-Louis Scartezzini, and David Pearlmutter. "Outdoor Human Comfort and Thermal Stress: A Comprehensive Review on Models and Standards". Urban Climate 18 (December 2016): 33-57. https://doi.org/10.1016/j.uclim.2016.08.004
- Coccolo, Silvia, David Pearlmutter, Jerome Kaempf, and Jean-Louis Scartezzini. "Thermal Comfort Maps to Estimate the Impact of Urban Greening on the Outdoor Human Comfort". Urban Forestry & Urban Greening 35 (October 2018): 91-105. https://doi.org/10.1016/j.ufug.2018.08.007
- Master thesis of Giuliano Di Pascalis: Urban Heat Island and Pedestrian Comfort in Switzerland. https://github.com/G-DePascalis/UHI_CH_mp
- Master semester project of Yoan Codjia: Urban Heat Island and Pedestrian Comfort in Valais. https://github.com/Itsokyz/UHI_Valais_Citysim
Level: Master semester project
Contact: Jérôme Kämpf, [email protected]
Automatic identification of flight information from speech
Description: Current approaches to automatic recognition of call-signs from speech combine conventional automatic speech recognition (i.e. speech-to-text) with entity recognition (i.e. text-to-call-sign) technologies. This project will develop a unified module (e.g. an adaptation of well-known BERT models) that allows a direct mapping from speech to call-signs.
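To illustrate the text-to-call-sign half of today's pipeline (the part the unified model would absorb), here is a toy rule-based extractor that maps spoken ATC words to an ICAO-style call-sign; the telephony-designator table is a small illustrative fragment, not the official list:

```python
# Toy fragment of the airline telephony-designator table (illustrative only).
AIRLINES = {"swiss": "SWR", "lufthansa": "DLH", "speedbird": "BAW"}
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "niner": "9"}

def extract_callsign(words):
    """Scan an ASR transcript for an 'airline + spoken digits' pattern."""
    words = [w.lower() for w in words]
    for i, w in enumerate(words):
        if w in AIRLINES:
            digits = []
            for nxt in words[i + 1:]:
                if nxt in DIGITS:
                    digits.append(DIGITS[nxt])
                else:
                    break
            if digits:
                return AIRLINES[w] + "".join(digits)
    return None

print(extract_callsign("swiss one two three contact geneva".split()))
# -> SWR123
```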
Goals:
- Get familiar with a baseline of speech recognition module for Air Traffic Control (ATC)
- Get familiar with a baseline of concept-extractor module for ATC
- Apply an end-to-end framework to train both modules together and compare its performance with that of independently trained modules
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning
References:
- Martin Kocour, et al, Boosting of contextual information in ASR for air-traffic call-sign recognition
- Zuluaga, et al: Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
- ATCO2 project
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Understanding generalization in deep learning
Description: State-of-the-art approaches in machine learning are based on deep learning. The reasons for its success are however still poorly understood. Most existing work on the topic has focused on the effects of gradient-based optimization. Interestingly though, even randomly-initialized networks encode inductive biases that mirror some properties of real-world data. This project will contribute to the efforts made in our research group to understand the success of deep learning. This project will emphasize theoretical or practical contributions depending on the student's interests. One of the objectives is to contribute to a high-quality publication co-authored with other members of our research group, and provide the student with training in rigorous research practices.
Goals:
- Select datasets of interest and train various architectures on these
- Implement methods or use existing code from recent publications to understand the interplay of various properties of data vs. architectures
- Prepare results, visualizations, and analyses of experiments suitable for a scientific publication
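As one concrete instance of the random-initialization observation in the description (echoing the first reference below), a network can be "trained" by pure guess-and-check over random initializations, so that any above-chance accuracy comes solely from the architecture's inductive bias plus selection. A toy PyTorch sketch on a made-up 2D task:

```python
import torch

# Toy binary task: label points by the sign of the product of coordinates.
torch.manual_seed(0)
x = torch.randn(512, 2)
y = (x[:, 0] * x[:, 1] > 0).float()

def accuracy(net):
    with torch.no_grad():
        pred = (net(x).squeeze() > 0).float()
    return (pred == y).float().mean().item()

def random_net():
    return torch.nn.Sequential(
        torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

# "Training" by guess-and-check over random initializations: no gradients,
# only the inductive bias of the architecture plus selection.
best = max((random_net() for _ in range(200)), key=accuracy)
print(f"best random-init accuracy: {accuracy(best):.2f}")
```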
Prerequisites: Solid programming background and experience with deep learning libraries (e.g. Pytorch)
References:
- Loss Landscapes are All You Need (https://openreview.net/forum?id=QC10RmRbZy9)
- Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning (https://arxiv.org/abs/2207.02598)
Level: Bachelor/ Master
Contact: Damien Teney, [email protected]
Using large pretrained language models in speech recognition
Description: The aim of this project is to measure how large language models perform in their original application area: automatic speech recognition. The student will run a standard speech dataset through one of the speech recognition models (e.g., publicly available or internal at Idiap), score the outputs with the language models, and combine the scores to refine the transcriptions. The result should be a verdict on the influence of model size (are the big ones really needed?), a comparison of different models (is GPT better than a same-size LLaMA?), and an evaluation of the usefulness of retraining the language model, which is nowadays easy even on a single GPU.
Goals:
- Familiarize yourself with the speech recognition engines available at Idiap
- Focus on the application of language models in the speech recognition framework (including their use for re-scoring N-best hypotheses)
- Explore large language models for deployment in speech recognition
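The re-scoring step in the second goal reduces to combining, per hypothesis, the recognizer's score with a language-model score and re-ranking. A minimal sketch, assuming per-hypothesis log-probabilities have already been computed (the lm_logp values would come from scoring each hypothesis with GPT/LLaMA):

```python
def rescore_nbest(hyps, lm_weight=0.5):
    """Re-rank N-best ASR hypotheses by interpolating the ASR score with
    a language-model score (both log-probabilities)."""
    return max(hyps, key=lambda h: h["asr_logp"] + lm_weight * h["lm_logp"])

# Hypothetical 3-best list for one utterance.
nbest = [
    {"text": "recognize speech",   "asr_logp": -4.1, "lm_logp": -9.0},
    {"text": "wreck a nice beach", "asr_logp": -3.9, "lm_logp": -16.5},
    {"text": "recognise peach",    "asr_logp": -4.4, "lm_logp": -13.2},
]
best = rescore_nbest(nbest)
print(best["text"])  # -> recognize speech
```

The lm_weight interpolation factor, like the choice of language model, would be tuned on a development set.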
Research Program: Human-AI Teaming
Prerequisites: Python programming, Shell programming, basic knowledge of machine learning.
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]