You will find below a list of available bachelor semester projects, master’s semester projects, and a master’s thesis (PDM). If you are interested in a specific project, please get in touch with the person listed under “Contact”, mentioning in the subject of the email the title of the project.
For all these projects, you will receive a number of ECTS credits that depends on the type of project and the program. Working on these projects is not remunerated. The projects can be done in either the Fall or the Spring semester.
You can also work on a non-credited, part-time, remunerated project as an Assistant. Working on these projects is subject to the EPFL rules on the maximum allowed weekly hours.
(If you are not an EPFL student, you can apply to the open internship at Idiap.)
- Biological networks for language processing
- Automated segmentation of high-content fluorescent microscopy data
- Cultural bias in cross-lingual transfer
- Using large pretrained language models in speech recognition
- Understanding generalization in deep learning
- Grading of inflammatory diseases affecting blood vessels in the eye
- Automatic identification of flight information from speech
- An open-source framework for the quantification of Urban Heat Islands in Switzerland
- Swiss Alpine Lakes & Citizen Science
- Tensor trains for human-guided optimization in robotics applications
- Multi-spectral image unmixing for rapid and automated image annotation
- Deep learning data annotation pipeline for scene understanding towards pedestrian guidance
- Audiovisual person recognition
- Understanding the robustness of machine learning models on underspecified tasks
- Social media and crowdsourcing for social good
- Punctuation restoration on automatic speech recognition output
- Ergodic control for robot exploration
- Clinically interpretable computer aided diagnostic tool using multi-source medical data
- Wavelets as basis functions for applications in robotics
- End-to-end Robotic Manipulation from Verbal Commands
- Assessing radiomics feature stability and discriminative power in 3D and 4D medical data
- Automatic named entity recognition from speech
- A human-centered approach to understand local news consumption
- Data-driven identification of prognostic tumor subpopulations from single-cell RNA sequencing data
- Development of epigenetic biomarkers for chronic pain stratification
- Compartment-specific mRNA metabolism in MNs and ACs in ALS pathogenesis
- A robot manipulator writing texts using a pen
- Speaker identification enhanced by the social network analyser
- Speech/Music Classification
- Error correction in speech recognition using large pre-trained language models
- Automatic speech recognition of air-traffic communication using grammar
- Pathological speech detection in adverse environments
- Pathological speech enhancement
Biological networks for language processing
Description
Biological spiking neural networks are interesting from a scientific (evolution) point of view, as well as from a technical one. In the latter sense, spiking networks offer advantages over artificial ones in terms of recurrence, coding and power consumption. Recently we have shown that spiking neurons can be combined freely with artificial ones in the same architecture, showing promise in speech processing tasks. The objective of the project is to extend this work towards processing of discrete entities such as words, at the level of language rather than audio. Some progress has already been made in the literature with the likes of “Spikeformer”, a spiking equivalent of the transformer.
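As a rough illustration of how spiking and artificial components can coexist in one trainable model, the sketch below combines a simplified leaky integrate-and-fire layer (trained with a surrogate gradient) with an artificial linear readout. It assumes PyTorch and is only a toy version of the recipe in the reference below, not the project's actual architecture.

```python
# Minimal sketch (assumptions: PyTorch; simplified LIF dynamics, not the exact model of the reference).
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, boxcar surrogate gradient in the backward pass."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out * (v.abs() < 0.5).float()

class LIFLayer(torch.nn.Module):
    """Leaky integrate-and-fire layer that can be mixed freely with artificial layers."""
    def __init__(self, n_in, n_out, beta=0.9):
        super().__init__()
        self.fc = torch.nn.Linear(n_in, n_out)
        self.beta = beta

    def forward(self, x):  # x: (batch, time, n_in)
        v = x.new_zeros(x.size(0), self.fc.out_features)
        spikes = []
        for t in range(x.size(1)):
            v = self.beta * v + self.fc(x[:, t])   # leaky integration of the input current
            s = SpikeFn.apply(v - 1.0)             # spike when the membrane potential crosses 1
            v = v * (1.0 - s)                      # reset the membrane after a spike
            spikes.append(s)
        return torch.stack(spikes, dim=1)

# Hybrid model: spiking hidden layer followed by an artificial linear readout.
lif, readout = LIFLayer(40, 128), torch.nn.Linear(128, 10)
x = torch.randn(8, 100, 40)                 # e.g. 100 frames of 40-dim speech features
logits = readout(lif(x).sum(dim=1))         # spike counts -> class logits, trainable end-to-end
```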
Goals
1. Investigate the issues around using spiking for discrete outputs.
2. Show that predominantly spiking networks can replace, for example, recurrent artificial components.
Research Program: AI for Life
Prerequisites
The project will involve programming in python using the Pytorch library. Some knowledge of deep learning will be required, ideally from previous courses.
Reference
- Alexandre Bittar and Philip N. Garner. A surrogate gradient spiking baseline for speech command recognition. Frontiers in Neuroscience, 16, August 2022. http://dx.doi.org/10.3389/fnins.2022.865897
Level: Master
Contact: Phil Garner, [email protected]
Automated segmentation of high-content fluorescent microscopy data
Description
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive and incurable neurodegenerative disease. The early events underlying the disease remain poorly understood. As a dramatic consequence, no effective treatment has been developed. We previously found that the molecular events leading to ALS start during early development. It remains however unknown how and when these affect individual cell behaviour. This project aims to study how molecular biology shapes cellular morphology at an early stage of ALS by integrating longitudinal cellular imaging with genomic data, and it involves a close collaboration with the experimental laboratory of Professor Rickie Patani, Francis Crick Institute/UCL.
Goals
The goal of the project is to develop an image analysis pipeline to extract and analyse single-cell phenotypic measurements from large-scale time-lapse fluorescence imaging data from astrocytes and motor neurons in culture. Specifically, it will involve
1) expansion of existing image analysis modules to obtain robust single-cell readouts from longitudinal images; and
2) development of statistical models to identify cellular trajectories associated with early stages of ALS using the phenotypic features obtained in 1).
Research Program: AI for Life
Prerequisites
Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in image processing and analysis, and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Cultural bias in cross-lingual transfer
Description
Multilingual transformers have proven to be successful at cross-lingual transfer for multiple tasks in Natural Language Processing (NLP). In such a setup, a multilingual transformer model is fine-tuned on a given source language, for which annotations exist, and the resulting model is tested on a task in a target language given only a few annotated examples or none at all. The standard approach to multilingual dataset creation for semantic tasks is translation from an existing (English) dataset, which raises concerns regarding the suitability of such datasets, because translations, especially into culturally diverse languages, may break certain relations, such as perceived causal relations in social behaviour, or moral values. Further information.
Goals
- Estimate the level of cross-lingual transfer of cultural biases on existing datasets.
- Create datasets for the purpose of estimating cross-lingual transfer of cultural biases.
- If relevant and time permits, develop ways to mitigate this effect.
Research Program: AI for Everyone
Level: Bachelor/ Master
Contact: Lonneke van der Plas, [email protected]
Using large pretrained language models in speech recognition
Description
The aim of this project is to measure how large language models perform in their native cradle – automatic speech recognition. The student will run a standard speech dataset through one of the speech recognition models (e.g., publicly available or internal at Idiap), score the outputs with the language models, and combine the scores to refine the transcriptions. The result should be a verdict on the influence of the model size (are the big ones really needed?), a comparison of different models (is GPT better than the same-size LLaMA?) and an evaluation of the usefulness of retraining the language model, which is easy today even on a single GPU.
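As a rough illustration of the re-scoring idea, the sketch below scores N-best hypotheses with a causal language model and interpolates the result with the ASR scores. It assumes the HuggingFace transformers library with GPT-2; the hypotheses, acoustic scores and interpolation weight are made-up placeholders for real ASR output.

```python
# Minimal N-best rescoring sketch (assumptions: HuggingFace transformers, GPT-2; placeholder hypotheses).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def lm_logprob(text):
    """Approximate log-probability of a hypothesis under the language model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss          # mean negative log-likelihood per token
    return -loss.item() * ids.size(1)

# (ASR/acoustic score, hypothesis) pairs from an N-best list (placeholders)
nbest = [(-12.3, "flight two one zero cleared to land"),
         (-12.1, "flight to one zero cleared to land")]

alpha = 0.5  # interpolation weight between ASR and LM scores (to be tuned on development data)
best = max(nbest, key=lambda h: (1 - alpha) * h[0] + alpha * lm_logprob(h[1]))
print(best[1])
```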
Goals
- Familiarize yourself with the speech recognition engines available at Idiap
- Focus on the application of language models in the speech recognition framework (including their use for re-scoring N-best hypotheses)
- Explore large language models and their deployment in speech recognition.
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Understanding generalization in deep learning
Description
State-of-the-art approaches in machine learning are based on deep learning. The reasons for its success are however still poorly understood. Most existing work on the topic has focused on the effects of gradient-based optimization. Interestingly though, even randomly-initialized networks encode inductive biases that mirror some properties of real-world data. This project will contribute to the efforts made in our research group to understand the success of deep learning. This project will emphasize theoretical or practical contributions depending on the student’s interests. One of the objectives is to contribute to a high-quality publication co-authored with other members of our research group, and provide the student with training in rigorous research practices.
Goals
- Select datasets of interest and train various architectures on these.
- Implement methods or use existing code from recent publications to understand the interplay of various properties of data vs. architectures.
- Prepare results, visualizations, and analyses of experiments suitable for a scientific publication.
Prerequisites
Solid programming background and experience with deep learning libraries (e.g. Pytorch)
References
- Loss Landscapes are All You Need (https://openreview.net/forum?id=QC10RmRbZy9)
- Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning (https://arxiv.org/abs/2207.02598)
Level: Bachelor/ Master
Contact: Damien Teney, [email protected]
Grading of inflammatory diseases affecting blood vessels in the eye
Description
Fluorescein angiography is the only clinical method to evaluate the function and integrity of the blood-retinal barrier. Using real hospital data, we aim to detect and grade inflammatory diseases affecting blood vessels in the eye. Through computer vision and machine learning approaches the student will identify novel biomarkers that could improve patient management and care. This project is a collaboration with a multi-centric medical team to identify promising new leads in the research of this field. Challenges include the segmentation and registration of (retinal) fundus angiography data, and the grading of diseased patients.
Goals of the project
- To develop a system for the detection and grading of inflammatory eye diseases in medical images and video.
- Validate the proposed approach and compare results to the state-of-the-art techniques.
- Work together with clinical experts on improving the current understanding of the disease.
Research Program: AI for Life
Prerequisites
Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Tugal-Tutkun, I., Herbort, C. P., Khairallah, M., & Angiography Scoring for Uveitis Working Group (ASUWOG). (2010). Scoring of dual fluorescein and ICG inflammatory angiographic signs for the grading of posterior segment inflammation (dual fluorescein and ICG angiographic scoring system for uveitis). International ophthalmology, 30, 539-552.
Level: Master
Contact: Andre Anjos, Oscar Jimenez-del-Toro, [email protected]
Automatic identification of flight information from speech
Description
Current approaches toward automatic recognition of call-signs from speech combine conventional automatic speech recognition (i.e. speech-to-text) with entity recognition (i.e. text-to-call-sign) technologies. This project will develop a unified module (e.g. an adaptation of well-known BERT models) that allows a direct mapping from speech to the call-sign.
Goals
- Get familiar with a baseline speech recognition module for Air Traffic Control (ATC)
- Get familiar with a baseline concept-extractor module for ATC
- Apply an end-to-end framework to train both modules together and compare its performance with independently trained modules.
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning
References
- Martin Kocour, et al, Boosting of contextual information in ASR for air-traffic call-sign recognition
- Zuluaga, et al: Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
- ATCO2 project
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
An open-source framework for the quantification of Urban Heat Islands in Switzerland
Description
Cities throughout the world are overheating in summer, with adverse effects on the health of citizens. Due to the highly mineral nature of the built environment, the scarcity of nature in cities, and the anthropogenic heat released in the streets, temperatures will continue to increase with climate change. With physically-based simulation tools we can predict hot spots and evaluate scenarios for the mitigation of urban heat islands. While such tools exist, a framework based on open data and easily accessible to researchers, practitioners and citizens is a must-have to raise awareness and move towards efficient heat-island mitigation measures.
Goals
- Build an open-source framework in Python (or any other language except Matlab) to go from Swiss open datasets to indicators related to the Urban Heat Island effect,
- Introduce the Physiological Equivalent Temperature (PET) and the Universal Thermal Climate Index (UTCI) as indicators of Urban Comfort in our simulation tool,
- Demonstrate the application of scenarios on three case studies representative of the Swiss landscape, quantifying improvement measures.
Research Program: Sustainable and Resilient Societies
Prerequisites
Basic energy balance and thermodynamics knowledge; Basic scripting or programming skills (no Matlab).
References
- Coccolo, Silvia, Jérôme Kämpf, Jean-Louis Scartezzini, and David Pearlmutter. ‘Outdoor Human Comfort and Thermal Stress: A Comprehensive Review on Models and Standards’. Urban Climate 18 (December 2016): 33–57. https://doi.org/10.1016/j.uclim.2016.08.004.
- Coccolo, Silvia, David Pearlmutter, Jerome Kaempf, and Jean-Louis Scartezzini. ‘Thermal Comfort Maps to Estimate the Impact of Urban Greening on the Outdoor Human Comfort’. Urban Forestry & Urban Greening 35 (October 2018): 91–105. https://doi.org/10.1016/j.ufug.2018.08.007.
Level: Master
Contact: Jérôme Kämpf, [email protected]
Swiss Alpine Lakes & Citizen Science
Description
The 2000Lakes initiative aims to catalog the microbial diversity in Swiss alpine lakes while developing a network of citizen science and stakeholders. We are looking for several motivated students interested in alpine science, data science, and human-centered research to develop a master's thesis or semester project on this topic. This project offers the possibility of contributing to an innovative approach to scientific research.
Goals
- Develop creative actions to inform and consolidate a network of stakeholders engaged with biodiversity in alpine lakes.
- Develop computational tools (using data visualization, social media, media archives) to support interaction with citizens and stakeholders.
- Participate in fieldwork and publications in the field of citizen science.
Research Program: AI for Everyone
Prerequisites
Interest and/or experience in one or more of these areas: social media, citizen science, community organizing, data visualization, data analysis, machine learning
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Tensor trains for human-guided optimization in robotics applications
Description
This project extends Tensor Train for Global Optimization (TTGO) to a human-guided learning strategy. Learning and optimization problems in robotics are characterized by two types of variables: task parameters representing the situation that the robot encounters (typically related to environment variables such as locations of objects, users or obstacles); and decision variables related to actions that the robot takes (typically related to a controller acting within a given time window, or the use of basis functions to describe trajectories in control or state spaces). In TTGO, the density function is modeled offline using a tensor train (TT) that learns the structure between the task parameters and the decision variables, and then allows conditional sampling over the task parameters with priority for higher-density regions. Further information.
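To make the conditioning step concrete, here is a minimal numpy sketch of a two-variable tensor train in which a user-specified task parameter is fixed and decision variables are sampled with priority for higher-density regions. The cores are random stand-ins; in TTGO they would be fitted with TT-Cross.

```python
# Minimal sketch of conditioning and sampling with a 2-variable tensor train (numpy only;
# the cores below are random stand-ins for a TT fitted with TT-Cross as in TTGO).
import numpy as np

rng = np.random.default_rng(0)
R, N1, N2 = 4, 50, 60                        # TT rank, grid sizes for task/decision variables
G1 = rng.random((N1, R))                     # first core: task parameter x1
G2 = rng.random((R, N2))                     # second core: decision variable x2
# unnormalized density: p(x1, x2) ~ (G1 @ G2)[x1, x2]

def sample_decision(x1_index, n_samples=5, temperature=1.0):
    """Condition on a user-specified task parameter and sample decision variables,
    giving priority to higher-density regions (temperature < 1 sharpens the prior)."""
    p = G1[x1_index] @ G2                    # conditional slice p(x2 | x1), up to a constant
    p = np.maximum(p, 0.0) ** (1.0 / temperature)
    p /= p.sum()
    return rng.choice(N2, size=n_samples, p=p)

print(sample_decision(x1_index=10, temperature=0.5))
```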
Goals
The goal is to test whether the original autonomous learning strategy of TT-Cross can be extended to a human-guided learning strategy, by letting the user sporadically specify task parameters or decision variables within the iterative process. The first case can be used to provide a scaffolding mechanism for robot skill acquisition. The second case can be used for the robot to ask for help in specific situations.
Research Program: Human-AI Teaming
Prerequisites
Linear algebra, optimization, programming in Python
Reference
- Shetty, S., Lembono, T., Löw, T. and Calinon, S. (2023). Tensor Train for Global Optimization Problems in Robotics. arXiv:2206.05077.
https://sites.google.com/view/ttgo
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Multi-spectral image unmixing for rapid and automated image annotation
Description
Object segmentation and identification methods often rely on the availability of large annotated image libraries. While such libraries are widely available for everyday image scenes, many applications in industry, science and medicine lack similar data because of their unique and specialized nature. The student will implement and characterize the potential of imaging with multi-spectral (colored) illumination patterns to facilitate object annotation in complex scenes. The project will involve the use of a custom hardware imaging setup consisting of a digital camera with triggered multi-color light sources, collecting images of objects, and implementing computational imaging algorithms for spectral unmixing and image segmentation.
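For intuition, the per-pixel spectral unmixing step can be posed as a small non-negative least-squares problem, as in the hedged numpy/scipy sketch below; the mixing matrix and image stack are placeholders for data from the actual LED/camera setup.

```python
# Minimal sketch of per-pixel spectral unmixing by non-negative least squares
# (assumptions: numpy/scipy; the mixing matrix M would come from calibration of the LEDs/camera).
import numpy as np
from scipy.optimize import nnls

n_channels, n_sources = 5, 3                          # e.g. 5 LED/camera channels, 3 spectral sources
M = np.abs(np.random.randn(n_channels, n_sources))    # columns: spectral signatures (placeholder)
image = np.abs(np.random.randn(64, 64, n_channels))   # multi-spectral image stack (placeholder)

abundances = np.zeros(image.shape[:2] + (n_sources,))
for i in range(image.shape[0]):
    for j in range(image.shape[1]):
        # solve min ||M a - pixel||^2 subject to a >= 0 for each pixel
        abundances[i, j], _ = nnls(M, image[i, j])

# abundance maps can then be thresholded to produce per-source segmentation masks
masks = abundances > 0.5
```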
Goals
- Implement a multi-spectral image acquisition protocol using triggered LEDs of various wavelengths to acquire images of objects in a lab setting.
- Implement a spectral unmixing algorithm to segment objects in images.
- Depending on progress, deploy the method in a light microscope for imaging biological samples.
Research Program: AI for Life
Prerequisites
Signal processing/image processing, Introduction to machine learning, Python programming.
References
- Jaques, E. Pignat, S. Calinon and M. Liebling, “Temporal Super-Resolution Microscopy Using a Hue-Encoded Shutter,” Biomedical Optics Express, 10(09):4727-4741, 2019
- Jaques, L. Bapst-Wicht, D.F. Schorderet and M. Liebling, “Multi-Spectral Widefield Microscopy of the Beating Heart through Post-Acquisition Synchronization and Unmixing,” IEEE International Symposium on Biomedical Imaging (ISBI 2019), pp. 1382-1385, 2019
Level: Master
Contact: Michael Liebling, [email protected]
Deep learning data annotation pipeline for scene understanding towards pedestrian guidance
Description
This project takes place within the framework of the Biped project, which designs and develops computer vision and machine learning algorithms to improve the assisting system commercialized by Biped-AI. One important part of the Biped project consists of collecting video data from users walking in indoor and outdoor environments. These data will be annotated and later used to train neural networks aimed at understanding the environment, especially for detecting obstacles, people and objects (e.g. cars, bicycles, benches), as well as recognizing the main scene elements like the ground or walls, all in order to identify a safe pathway. Further information.
Goals
- Study the panoptic segmentation principle and the main state-of-the-art deep networks for this task, such as EfficientPS and Mask2Former, as well as the Segment Anything Model (SAM), to become acquainted with the models;
- Apply these models on a small set of relevant images and evaluate their suitability to annotate relevant items (obstacles, people, cars, ground, and other elements);
- Define an annotation protocol and develop a graphical interface using existing tools for labelling project images, e.g. by assigning or refining annotations suggested by the pre-trained networks;
Research Program: AI for Everyone
Prerequisites
Good command of Python, basics of Linux, deep learning background, basics of computer vision.
References
- SAM: segment anything model. https://segment-anything.com/.
- http://panoptic.cs.uni-freiburg.de/.
- https://github.com/facebookresearch/Mask2Former.
Level: Bachelor/ Master
Contact: Jean-Marc Odobez, [email protected]
Audiovisual person recognition
Description
Audiovisual person identification systems combine two biometric modalities that lead to very good results, as shown in Idiap’s submission to NIST SRE2019. The student will be able to use most of Idiap’s scripts, mainly the audio-related part. Fusion scripts for combining audio and visual systems can also be shared. One of two approaches can be considered: either develop the audio and visual systems separately and then experiment with fusion, or attempt to build a single person identification system taking both audio and visual embedding representations as input.
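The two approaches can be sketched as follows (numpy only; the embeddings and scores are random placeholders for the real audio and face systems).

```python
# Minimal sketch of the two fusion strategies mentioned above (numpy only; placeholder data).
import numpy as np

rng = np.random.default_rng(0)

# (a) score-level fusion: combine calibrated per-modality similarity scores
audio_scores = rng.normal(size=100)          # e.g. PLDA / cosine scores from the audio system
visual_scores = rng.normal(size=100)         # scores from the face recognition system
w = 0.6                                      # fusion weight, tuned on a development set
fused = w * audio_scores + (1 - w) * visual_scores

# (b) embedding-level fusion: concatenate modality embeddings and train a single classifier
audio_emb = rng.normal(size=(100, 192))      # e.g. speaker embeddings
visual_emb = rng.normal(size=(100, 512))     # e.g. face embeddings
joint = np.concatenate([audio_emb, visual_emb], axis=1)   # input to a single person-ID model
```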
Research Program: Sustainable and Resilient Societies
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
References
- The 2019 NIST Audio-Visual Speaker Recognition Evaluation
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Understanding the robustness of machine learning models on underspecified tasks
Description
The performance of deep learning models can quickly degrade when used on test data beyond their training distribution. In recent work [1], we have observed intriguing patterns in the “in-distribution” vs. “out-of-distribution” performance of various models. In particular, there sometimes exists a tradeoff between the two, which evolves during training and fine-tuning. It is not clear, however, what impact the pre-training and fine-tuning stages have. This project will contribute to the efforts to understand this topic. One of the objectives is to concretely contribute to a high-quality publication co-authored with other members of our research group.
Goals
- Select datasets of interest and train models with existing code.
- Examine the performance of various models under various hyper-parameters, numbers of epochs, pre-training/fine-tuning options, etc. Develop model selection strategies to identify robust models.
- Prepare results, visualizations, and analyses of experiments suitable for a scientific publication.
Prerequisites
Solid programming background and experience with deep learning libraries (e.g. Pytorch)
References
[1] ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets (https://arxiv.org/abs/2209.00613)
[2] The Evolution of OOD Robustness Throughout Fine-Tuning (https://arxiv.org/abs/2106.15831)
Level: Bachelor/ Master
Contact: Damien Teney, [email protected]
Social media and crowdsourcing for social good
Description
The student will contribute to a multidisciplinary initiative for the use of social media and mobile crowdsourcing for social good. Several projects are available. Students will work with social computing researchers who collaborate with academics in other countries, both in Europe and in the Majority World.
Goals
- Social media analytics
- Visualization of social and crowdsourced data
- Smartphone apps for mobile crowdsourcing
Research Program: AI for Everyone
Prerequisites
Interest and/or experience in one or more of these areas: data analysis, machine learning, data visualization, phone apps, social media, natural language processing, computer vision
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Punctuation restoration on automatic speech recognition output
Description
The goal of the project is to train a model to post-process automatic speech recognition (ASR) output and add punctuation marks (and capitalizations for the next level of difficulty). This will improve the readability of the ASR output and potentially make it more useful for downstream tasks, such as dialogue systems and language analysis.
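One common way to frame the task is token classification, where each word is labelled with the punctuation mark that follows it. The sketch below (plain Python, illustrative label set and example sentence) shows how punctuated reference text could be turned into such training pairs for any sequence model.

```python
# Minimal sketch of framing punctuation restoration as token classification (illustrative label set).
LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}

def to_token_labels(text):
    """Each word gets the label of the punctuation mark that follows it (or 'O')."""
    pairs = []
    for raw in text.split():
        word, label = raw, "O"
        if raw and raw[-1] in LABELS:
            word, label = raw[:-1], LABELS[raw[-1]]
        # a capitalization label could be added here for the next difficulty level
        pairs.append((word.lower(), label))
    return pairs

print(to_token_labels("Hello, how are you? I am fine."))
# [('hello', 'COMMA'), ('how', 'O'), ('are', 'O'), ('you', 'QUESTION'),
#  ('i', 'O'), ('am', 'O'), ('fine', 'PERIOD')]
```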
Goals
- Get acquainted with the problem, available data, success metrics, machine learning frameworks
- Program a simple system predicting just sentence ends/full stops; improve it to predict other punctuation marks; for extra difficulty, learn to predict capital letters
- Test and evaluate on a couple of languages and real scenarios
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
References
- Yi, et al. Adversarial Transfer Learning for Punctuation Restoration
- Pais, et al., Capitalization and punctuation restoration: a survey
- Nanchen, et al., Empirical Evaluation and Combination of Punctuation Prediction Models Applied to Broadcast News
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Ergodic control for robot exploration
Description
Ergodic control can be exploited in a range of robotics problems requiring the exploration of regions of interest, e.g. when the available sensing information is not accurate enough for a standard controller, but can guide the robot towards promising areas. In a collaborative task, it can also be used when the operator’s input is not accurate enough to fully reproduce the task, which then requires the robot to explore around the requested input (e.g., a point of interest selected by the operator). For picking and insertion, it can be applied to move around the picking/insertion point, thereby facilitating the prehension/insertion. It can also be employed for active sensing and localization (either performed autonomously, or with help from the operator). Further information.
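As background, ergodic controllers such as Spectral Multiscale Coverage drive the time-averaged trajectory statistics towards a target distribution by descending an ergodic metric defined over Fourier coefficients. The numpy sketch below computes this metric on a 1D toy example (basis normalization simplified; the distribution and trajectory are placeholders).

```python
# Minimal sketch of the ergodic metric used by SMC-style controllers (numpy, 1D domain [0, 1]).
import numpy as np

K = 10                                         # number of Fourier basis functions
x = np.linspace(0, 1, 500)
dx = x[1] - x[0]
target = np.exp(-0.5 * ((x - 0.7) / 0.1) ** 2) # region of interest around x = 0.7 (placeholder)
target /= target.sum() * dx                    # normalize to a probability density

# Fourier coefficients of the target distribution and of a candidate trajectory
phi = np.array([(target * np.cos(np.pi * k * x)).sum() * dx for k in range(K)])
traj = np.random.rand(200)                     # trajectory samples in [0, 1] (placeholder)
c = np.array([np.mean(np.cos(np.pi * k * traj)) for k in range(K)])

lam = (1.0 + np.arange(K) ** 2) ** -1.0        # weights emphasizing low spatial frequencies
ergodicity = np.sum(lam * (c - phi) ** 2)      # SMC-style controllers descend this metric
print(ergodicity)
```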
Goals
To study the pros and cons of Spectral Multiscale Coverage and Heat Equation Driven Area Coverage to solve robot manipulation problems.
Research Program: Human-AI Teaming
Prerequisites
Control theory, signal processing, programming in Python, C++ or Matlab/Octave
References
- Mathew and I. Mezic (2009). Spectral multiscale coverage: A uniform coverage algorithm for mobile sensor networks. In Proc. IEEE Conf. on Decision and Control.
- Ivić, B. Crnković, and I. Mezić (2007). Ergodicity-based cooperative multiagent area coverage via a potential field. IEEE Trans. on Cybernetics.
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Clinically interpretable computer aided diagnostic tool using multi-source medical data
Description
The fight against many rare diseases would benefit from automated image analysis tools that improve our understanding of them. One such rare disease is fibromuscular dysplasia (FMD), an under-recognized disease of the blood vessels. Challenges include the segmentation of the renal artery from larger 3D volumes, and the classification of FMD versus healthy patients. The main tasks of the project include: literature review; medical image analysis, i.e., segmentation of 3D tubular structures in real computed tomography images; deep-learning-based disease detection; and proposing novel approaches to improve the understanding of this disease.
Goals
- Improve characterization of the renal artery in computed tomography scans.
- Build an interpretable machine learning system using clinical imaging data for FMD
- Develop – together with clinical experts – a computer aided diagnostic tool for this disease.
Research Program: AI for Life
Prerequisites
Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Bruno, R. M., Mischak, H., & Persu, A. (2020). Multi-omics applied to fibromuscular dysplasia: first steps on a new research avenue. Cardiovascular research, 116(1), 4-5.
Level: Bachelor/ Master
Contact: Andre Anjos, Oscar Jimenez-del-Toro, [email protected]
Wavelets as basis functions for applications in robotics
Description
Signals can be encoded as a weighted superposition of basis functions, which act as a dictionary of simpler signals. The dictionary can be any set of basis functions, including radial basis functions (RBFs), Fourier basis functions, or Bernstein basis functions (used for Bézier curves). Basis functions can be used to encode trajectories, whose input is a 1D time variable and whose output can be multidimensional. Basis functions can also be used to encode signals generated by multivariate inputs. For example, a Bézier surface uses two input variables to cover a spatial range and generates an output variable describing the height of the surface within this rectangular domain. Further information.
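As a concrete example of the encoding described above, the numpy sketch below fits the weights of an RBF dictionary to a 1D trajectory by least squares; a wavelet dictionary would simply replace the matrix Phi while the fit stays identical.

```python
# Minimal sketch of encoding a trajectory as a weighted superposition of basis functions (numpy only).
import numpy as np

t = np.linspace(0, 1, 200)                     # 1D time input
x = np.sin(2 * np.pi * t) + 0.5 * t            # demonstration trajectory (placeholder)

centers = np.linspace(0, 1, 15)                # RBF centers
Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * 0.03 ** 2))  # (200, 15) dictionary

w, *_ = np.linalg.lstsq(Phi, x, rcond=None)    # weights of the superposition
x_hat = Phi @ w                                # reconstructed trajectory
print(np.max(np.abs(x - x_hat)))               # reconstruction error
```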
Goals
The project extends the above approach to wavelet basis functions and studies the properties of wavelets in the context of robot manipulation skills. Wavelets encompass both spatial and spectral properties, which makes them a good candidate to encode functions at different resolutions. The approach will be tested in the context of signed distance functions (see the second reference below).
Research Program: Human-AI Teaming
Prerequisites
Signal processing, programming in Python, C++ or Matlab/Octave
References
- Calinon, S. (2019). Mixture Models for the Analysis, Edition, and Synthesis of Continuous Time Series. Bouguila, N. and Fan, W. (eds). Mixture Models and Applications, pp. 39-57. Springer.
- Li, Y., Zhang, Y., Razmjoo, A. and Calinon, S. (2023). Learning Robot Geometry as Distance Fields: Applications to Whole-body Manipulation. ArXiv 2307.00533.
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
End-to-end Robotic Manipulation from Verbal Commands
Description
With the recent advances in machine learning, new end-to-end networks can cover the whole process from natural language understanding to command grounding in visual inputs to robot motion (e.g., https://cliport.github.io/). This project aims to build on these advances to develop new learning architectures to control assistive robots from verbal commands. In this project, you will develop deep learning methods to control robots from visual and voice inputs. You will re-implement state-of-the-art ML systems and extend them to control robots both in simulation and on real platforms. Further information.
Goals
- Analyze the state of the art of end-to-end methods for robot manipulation.
- Deploy existing methods (e.g., https://cliport.github.io/) in simulation.
- Test the algorithm on real hardware.
Research Program: Human-AI Teaming
Prerequisites
Good command of Python, good experience with deep learning systems (e.g., pytorch), basics of Linux. Experience in robotics (inverse kinematics, control, or system architecture) would be a plus.
References
Shridhar, Mohit, Lucas Manuelli, and Dieter Fox. “Cliport: What and where pathways for robotic manipulation.” Conference on Robot Learning. PMLR, 2022.
Level: Bachelor/ Master
Contact: Emmanuel Senft, [email protected] , Jean-Marc Odobez, [email protected]
Assessing radiomics feature stability and discriminative power in 3D and 4D medical data
Description
Radiomic features obtained from medical images and video can objectively quantify relevant information present in clinical studies. However, recent studies have shown that some of these features can be unstable and redundant, as features can be sensitive to variations in acquisition details. Therefore, reproducibility and discriminative power cannot be treated in isolation when identifying the best features, i.e., those with a higher tolerance towards such influences. Challenges include determining the stability of radiomics features against parameter variations during acquisition, as well as across different time points between patient studies.
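One simple stability screen, sketched below with numpy on placeholder data, is to compute a within-patient coefficient of variation across repeated acquisitions and keep only features below a threshold; more formal measures such as the intraclass correlation coefficient could replace it.

```python
# Minimal sketch of a stability screen for radiomics features across repeated acquisitions
# (numpy only; the feature matrix is a random placeholder for real extracted features).
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_acquisitions, n_features = 30, 4, 100
X = rng.normal(size=(n_patients, n_acquisitions, n_features))   # feature values per acquisition

# within-patient coefficient of variation: low values indicate stable, reproducible features
cv = np.abs(X.std(axis=1) / (X.mean(axis=1) + 1e-8)).mean(axis=0)
stable = np.where(cv < 0.1)[0]                 # candidate features to keep (threshold is illustrative)
print(len(stable), "features pass the stability screen")
```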
Goals
- Implementation and analysis of radiomic features extracted from 3D and 4D medical data
- Identifying the most relevant features according to their variability and stability in different radiological tasks
- Proposing novel approaches to mitigate biases and limitations of these features in a real clinical scenario.
Research Program: AI for Life
Prerequisites
Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Jimenez-del-Toro, O., Aberle, C., Bach, M., Schaer, R., Obmann, M. M., Flouris, K., … & Depeursinge, A. (2021). The discriminative power and stability of radiomics features with computed tomography variations: task-based analysis in an anthropomorphic 3D-printed CT phantom. Investigative radiology, 56(12), 820-825.
Level: Bachelor/ Master
Contact: Andre Anjos, Oscar Jimenez-del-Toro, [email protected]
Automatic named entity recognition from speech
Description
The project will improve the detection and recognition of named entities (e.g. names, places, locations) automatically from speech. Currently, two independent technologies are used, namely automatic speech recognition (usually evaluated to minimise word error rate) and named entity recognition. The goal of this project is to efficiently combine these two modules, while leveraging state-of-the-art open source tools such as SpeechBrain or BERT.
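A minimal cascaded baseline could look like the sketch below, which assumes the HuggingFace transformers pipelines (their default models stand in for the Idiap/SpeechBrain ones, and the audio file name is hypothetical).

```python
# Minimal sketch of the current cascaded baseline (assumptions: HuggingFace transformers pipelines;
# default pipeline models stand in for the Idiap/SpeechBrain modules mentioned above).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition")        # speech -> text
ner = pipeline("ner", aggregation_strategy="simple")  # text -> named entities

transcript = asr("call_recording.wav")["text"]        # hypothetical audio file
for entity in ner(transcript):
    print(entity["entity_group"], entity["word"], entity["score"])
```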
Goals
- Get familiar with the baseline speech recognition module developed in ROXANNE
- Get familiar with a baseline entity extractor module
- Apply an end-to-end framework to train both modules together and compare its performance with independently trained modules.
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
References
- Mael Fabien, et al, ROXANNE Research Platform: Automate criminal investigations
- Mael Fabien, et al., BertAA: BERT fine-tuning for Authorship Attribution
- ROXANNE project website
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
A human-centered approach to understand local news consumption
Description
The project aims to design and implement a framework to study the consumption of local news in the European multicultural context. The project will include a combination of research methods for experimental design and data analysis, and will be done in the context of the AI4Media European project, a European Excellence Center for Media, Society, and Democracy.
Goals
The specific goals of the project include: literature review; identification of news sources; mixed-method experimental design; experiments and data analysis; and writing.
Research Program: AI for Everyone
Prerequisites
Interest and/or experience in one or more of these areas: data analysis, machine learning, data visualization, phone apps, social media, natural language processing, computer vision
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Data-driven identification of prognostic tumor subpopulations from single-cell RNA sequencing data
Description
This project is part of a larger one aiming to integrate single-cell sequencing data with imaging data in order to develop accurate machine learning methods to identify tumor subpopulations. It involves a close collaboration with the Department of Oncology UNIL CHUV, headed by Prof. Olivier Michielin, and the Novartis Institute for Biomedical Research. Accumulating evidence shows aberrant mRNA metabolism in cancer; however, relatively little is known about the impact of genetic mutations on mRNA metabolism in cancers and how this confers resistance to therapy.
Goals
In this project, the student will develop and implement bioinformatics pipelines to study alternative splicing and polyadenylation from single-cell transcriptomes of BRAF-inhibitor-resistant melanoma. This will then serve to test whether combining measurements of gene and alternative 3′ UTR expression enables the identification of subtle subpopulations that confer drug resistance.
Research Program: AI for Life
Prerequisites
Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Development of epigenetic biomarkers for chronic pain stratification
Description
Chronic pain is a major health care problem that affects millions of people worldwide. It has been demonstrated that complex interactions between biological, psychological, environmental, and social factors may influence pain chronicization. Therefore, epigenetic factors may be the trigger that explains the transition from acute to chronic pain and chronic pain maintenance. However, little is known about the influence of these biopsychosocial factors on epigenetic modifications in a population of chronic musculoskeletal pain patients following an orthopedic trauma. This project will analyze whole-genome methylation levels in a population of chronic pain patients and healthy controls through the prism of specific biological (age, medication) and psychological (anxiety/depression) factors.
Goals
This biological project will undertake bioinformatic analyses of methylation sites on the whole genome to identify specific genes that may be involved in the transition from acute to chronic pain. This project will be carried out in collaboration with the medical research group at the Clinique romande de réadaptation (CRR, Bertrand Léger), where the student is expected to spend 20% of their time.
Research Program: AI for Life
Prerequisites
Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Compartment-specific mRNA metabolism in MNs and ACs in ALS pathogenesis
Description
This project is part of a larger one aiming to study how molecular biology shapes cellular morphology at an early stage of Amyotrophic Lateral Sclerosis (ALS) by integrating longitudinal cellular imaging with genomic data. It involves a close collaboration with the experimental laboratory of Prof. Rickie Patani, Francis Crick Institute/UCL. We recently uncovered cytoplasmic accumulation of aberrant intron-retaining transcripts (IRTs) as the earliest detectable molecular phenotype in ALS [1–4]. The mechanisms that control RNA binding protein mislocalization, the molecular hallmark of ALS, have yet to be elucidated, and it remains unknown whether this early aberrant cytoplasmic IRT accumulation relates to protein mislocalization, ER stress, mitochondrial depolarisation, oxidative stress, synaptic loss and cell death.
Goals
1) to study the temporal and spatial dynamics of intronic and 3′ UTR sequences in developing MNs and ACs derived from ALS-mutant and control iPSC cell lines using time-resolved RNA-sequencing data from nuclear and cytoplasmic fractions;
2) to characterise the sequence features of cytoplasmic and nuclear IRTs and 3′ UTRs;
3) to develop an mRNA subcellular localisation model using machine learning methods.
Research Program: AI for Life
Prerequisites
Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
References
- Luisier, R. et al. Intron retention and nuclear loss of SFPQ are molecular hallmarks of ALS. Nat. Commun. 9, 2010 (2018).
- Tyzack, G. E. et al. Widespread FUS mislocalization is a molecular hallmark of amyotrophic lateral sclerosis. Brain 142, 2572–2580 (2019).
- Hall, C. E. et al. Progressive Motor Neuron Pathology and the Role of Astrocytes in a Human Stem Cell Model of VCP-Related ALS. Cell Rep. 19, 1739–1749 (2017).
- Tyzack, G. E. et al. Aberrant cytoplasmic intron retention is a blueprint for RNA binding protein mislocalization in VCP-related amyotrophic lateral sclerosis. Brain vol. 144 1985–1993 Preprint at https://doi.org/10.1093/brain/awab078 (2021).
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
A robot manipulator writing texts using a pen
Description
This project proposes to develop a robot that can take any text as input and write the corresponding sequence of letters on a piece of paper. In order to look natural, the use of a typeface vector font such as Hershey will be investigated as a starting point. Such a vector font format can be employed to represent alphabet characters by a set of strokes forming the skeleton of the letters instead of their outlines (as in conventional font formats). The project will be implemented with a 6-axis UFactory Lite-6 robot (https://www.ufactory.cc/lite-6-collaborative-robot), available at Idiap. Further information.
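To give an idea of the control side, the sketch below tracks a few 2D stroke waypoints with a toy planar arm using Jacobian pseudo-inverse inverse kinematics (numpy only); the real project would use the Lite-6 kinematics and waypoints extracted from a Hershey font.

```python
# Minimal sketch of following 2D letter strokes with a toy 3-link planar arm (numpy only;
# the stroke waypoints are placeholders for points extracted from a Hershey font).
import numpy as np

L = np.array([0.3, 0.25, 0.15])                     # link lengths of the toy planar arm

def fk(q):
    """End-effector position of the planar arm."""
    angles = np.cumsum(q)
    return np.array([np.sum(L * np.cos(angles)), np.sum(L * np.sin(angles))])

def jacobian(q):
    angles = np.cumsum(q)
    J = np.zeros((2, 3))
    for i in range(3):
        J[0, i] = -np.sum(L[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(L[i:] * np.cos(angles[i:]))
    return J

stroke = [np.array([0.4, 0.1]), np.array([0.4, 0.3]), np.array([0.5, 0.2])]  # toy stroke waypoints
q = np.array([0.3, 0.3, 0.3])
for target in stroke:                                # track each stroke waypoint in turn
    for _ in range(100):                             # Jacobian pseudo-inverse IK iterations
        e = target - fk(q)
        J = jacobian(q)
        q = q + np.linalg.pinv(J) @ e * 0.1
print(fk(q), "should be close to", stroke[-1])
```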
Goals
Several aspects will have to be considered in the project, such as placement and segmentation of the text on the page, the planning and control approach using inverse kinematics, as well as estimation of the writing result through the camera embedded within the robot arm.
Research Program: Human-AI Teaming
Prerequisites
Linear algebra, programming in Python or C++
References
- Hershey Fonts – https://pypi.org/project/Hershey-Fonts/, https://en.wikipedia.org/wiki/Hershey_fonts
- Robotics codes from scratch (RCFS) – https://rcfs.ch/
- Calinon, S. (2023). Learning and Optimization in Robotics – Lecture notes. – https://rcfs.ch/doc/rcfs.pdf
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Speaker identification enhanced by the social network analyser
Description
The project will build, test and combine technologies associated with the ROXANNE platform by leveraging open source tools (e.g. SpeechBrain and SocNetV) to demonstrate their strength in improving person identification. The project definition can be adapted towards other modalities (e.g. estimating authorship attribution from text, or detecting persons using face identification technology).
Goals
- Build a baseline automatic speaker identification engine using an open source tool (such as SpeechBrain) or the one available at Idiap, and test it on target (simulated) data related to lawful investigation.
- Build a baseline graph/network analysis tool with basic functionalities such as centrality or community detection (many open source tools can be exploited here as well) and test it on the simulated data
- Study a combination of the information extracted by speech and network analysis technologies to eventually improve person identification.
Research Program: Sustainable and Resilient Societies
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
References
- Mael Fabien, et al, ROXANNE Research Platform: Automate criminal investigations,
- ROXANNE project website
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Speech/Music Classification
Description
Classifying sound into speech, music and possibly noise is important for systems based on statistical modeling. Statistical models are usually trained on a large database of input signals containing various sounds. In both the training process and the testing process it is advantageous to exclude segments containing non-speech sounds to improve the accuracy of the model. This project will develop a classifier discriminating speech from music and potentially also from noise. You will first analyze existing approaches to speech/music classification and evaluate their efficiency and accuracy using conventional metrics for binary classification. You will then propose your own classifier or improve an existing one.
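A minimal baseline in the spirit of the first reference below could extract MFCCs and train a standard classifier, as in this hedged sketch (assumes librosa and scikit-learn; synthetic noise/harmonic clips stand in for real speech and music recordings).

```python
# Minimal sketch of an MFCC + SVM speech/music baseline (assumptions: librosa, scikit-learn;
# synthetic signals stand in for real speech and music data).
import numpy as np
import librosa
from sklearn.svm import SVC

sr = 16000
rng = np.random.default_rng(0)

def mfcc_mean(y):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

# placeholder "speech" (noise bursts) and "music" (sustained harmonics) clips of 1 s each
speech = [rng.normal(size=sr) for _ in range(10)]
music = [np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
         + 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
         + 0.01 * rng.normal(size=sr) for _ in range(10)]

X = np.array([mfcc_mean(y) for y in speech + music])
y = np.array([0] * 10 + [1] * 10)              # 0 = speech, 1 = music

clf = SVC().fit(X, y)                          # frame-level models would replace this on real data
print(clf.score(X, y))
```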
Goals
- Familiarize yourself with voice activity detectors and existing speech/music detectors available publicly or at Idiap
- Develop a new speech/music classifier
- Evaluate the new technology against the baseline on well-established data.
Research Program: AI for Life
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning
References
- Banriskhem K. Khonglah: Speech/music classification using speech-specific features, Digital Signal Processing, Volume 48, January 2016, Pages 71-83
- Mrinmoy Bhattacharjee: Time-Frequency Audio Features for Speech-Music Classification
- Toni Hirvonen: Speech/Music Classification of Short Audio Segments, 2014 IEEE International Symposium on Multimedia
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Error correction in speech recognition using large pre-trained language models
Description
The aim of this work will be to find out whether large pre-trained language models can be used to correct errors in speech transcriptions. The student will run a standard speech dataset through one or more publicly available speech transcription models and then investigate how well the language models are able to correct errors: Does the overall error rate matter? Are there classes of errors that are better fixed? Is it better to use a traditional language model (e.g. LLaMA) or a conversational one (e.g. Alpaca)?
Goals
- Familiarize yourself with the speech recognition engines available at Idiap
- Focus on the application of language models in the speech recognition framework (including their use for re-scoring N-best hypotheses)
- Explore large language models and their deployment to post-process speech recognition output.
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Automatic speech recognition of air-traffic communication using grammar
Description
Current state-of-the-art speech-to-text systems (i.e. automatic speech recognition, ASR) applied to air-traffic control exploit statistical language models, which require large amounts of textual data for training. Nevertheless, Air Traffic Control Officers (ATCOs) are required to strictly follow a standardised phraseology (defined by the International Civil Aviation Organization, ICAO), and thus a context-free grammar (CFG) can be used to model the sequences of words generated by ATCOs. The goal of this project is to explore new ways in which traditional concepts of statistical language modeling can be enriched by the standardised phraseology (i.e. modeled by CFG-based language modeling).
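For illustration, the sketch below defines a tiny toy ICAO-style grammar with NLTK and enumerates the phrases it generates; it is only an illustrative fragment, not real phraseology coverage, but such a grammar could be compiled into a constrained or rescoring language model.

```python
# Minimal sketch of a toy ICAO-style context-free grammar (assumptions: NLTK; illustrative fragment only).
import nltk
from nltk.parse.generate import generate

grammar = nltk.CFG.fromstring("""
S -> CALLSIGN COMMAND
CALLSIGN -> AIRLINE DIGIT DIGIT
AIRLINE -> 'swiss' | 'lufthansa'
DIGIT -> 'one' | 'two' | 'three'
COMMAND -> 'descend' 'flight' 'level' DIGIT DIGIT | 'contact' 'tower'
""")

for sent in generate(grammar, n=5):            # enumerate a few phrases licensed by the grammar
    print(" ".join(sent))
# such a grammar can be compiled into a finite-state LM or used to constrain/rescore ASR hypotheses
```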
Goals
- Develop a baseline automatic speech recognition engine in the Kaldi framework suited for air-traffic controllers
- Explore the use of a CFG-based language model in ASR to model sequences of words (i.e. replacing or enriching the statistical language model)
- Compare the performance of the new language model on ASR tasks
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning
References
- Oualil, et al, A Context-Aware Speech Recognition And Understanding System For Air Traffic Control Domain
- Oualil, et al, Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Pathological speech detection in adverse environments
Description
Various conditions of brain damage may disrupt the speech production mechanism, resulting in motor speech disorders that encompass altered speech production across different dimensions. To diagnose motor speech disorders, we have developed automatic speech processing approaches. Such approaches, however, can fail to cope with realistic clinical constraints, i.e., the presence of noise and reverberation when recording speech in clinical settings. This project will contribute to the efforts made in our group to understand the performance of state-of-the-art approaches in adverse environments and to develop appropriate approaches targeting such scenarios.
Goals
- Set up datasets of interest.
- Implement existing approaches and/or get familiar with existing implementations.
- Examine the performance of various approaches in adverse environments.
- If relevant and time permits, develop novel approaches targeting adverse scenarios.
Research Program: AI for Life
Prerequisites
Python programming; basic knowledge of machine learning
Level: Bachelor/Master
Contact: Ina Kodrasi, [email protected]
Pathological speech enhancement
Description
Speech signals recorded in an enclosed space by microphones placed at a distance from the source are often corrupted by reverberation and background noise, which degrade speech quality, impair speech intelligibility, and decrease the performance of automatic speech recognition systems. Speech enhancement approaches to mitigate these effects have been devised for neurotypical speakers, i.e., speakers without any speech impairments. However, pathological conditions such as hearing loss, head and neck cancers, or neurological disorders, disrupt the speech production mechanism, resulting in speech impairments across different dimensions. This project will contribute to our efforts to understand the performance of state-of-the-art approaches for pathological signals and develop appropriate approaches targeting pathological speech.
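As a simple point of comparison for state-of-the-art enhancers, the sketch below applies classical spectral subtraction to a synthetic noisy signal (numpy/scipy; the signal is a placeholder for real pathological speech recordings, and reverberation is omitted).

```python
# Minimal sketch of a classical spectral-subtraction enhancer (assumptions: numpy/scipy; synthetic signal).
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)        # placeholder for a speech signal
noisy = clean + 0.3 * rng.normal(size=fs)                   # additive noise only (reverberation omitted)

f, t, X = stft(noisy, fs=fs, nperseg=512)
noise_psd = np.mean(np.abs(X[:, :5]) ** 2, axis=1, keepdims=True)   # noise estimate from first frames
gain = np.sqrt(np.maximum(1.0 - noise_psd / (np.abs(X) ** 2 + 1e-12), 0.05))  # power subtraction, floored
_, enhanced = istft(gain * X, fs=fs, nperseg=512)
```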
Goals
- Set up datasets of interest.
- Implement existing approaches and/or get familiar with existing implementations.
- Examine the performance of various approaches for pathological speech signals.
- If relevant and time permits, develop novel approaches targeting pathological speech.
Research Program: AI for Life
Prerequisites
Python programming; basic knowledge of machine learning
Level: Bachelor/Master
Contact: Ina Kodrasi, [email protected]