You will find below a list of available bachelor semester projects, master’s semester projects, and a master’s thesis (PDM). If you are interested in a specific project, please get in touch with the person listed under “Contact”, mentioning in the subject of the email the title of the project.
For all these projects, you will receive a number of ECTS credits that depends on the type of project and the program. Working on these projects is not remunerated. The projects can be done in either the Fall or the Spring semester.
You can also work on a non-credited, part-time, remunerated project as an Assistant. Working on these projects is subject to the EPFL rules on the maximum allowed weekly hours.
(If you are not an EPFL student, you can apply to the open internship at Idiap.)
- Biological networks for language processing
- Automated segmentation of high-content fluorescent microscopy data
- Cultural bias in cross-lingual transfer
- Using large pretrained language models in speech recognition
- Understanding generalization in deep learning
- Grading of inflammatory diseases affecting blood vessels in the eye
- Automatic identification of flight information from speech
- An open-source framework for the quantification of Urban Heat Islands in Switzerland
- Swiss Alpine Lakes & Citizen Science
- Tensor trains for human-guided optimization in robotics applications
- Multi-spectral image unmixing for rapid and automated image annotation
- Deep learning data annotation pipeline for scene understanding towards pedestrian guidance
- Audiovisual person recognition
- Understanding the robustness of machine learning models on underspecified tasks
- Social media and crowdsourcing for social good
- Punctuation restoration on automatic speech recognition output
- Ergodic control for robot exploration
- Clinically interpretable computer aided diagnostic tool using multi-source medical data
- Wavelets as basis functions for applications in robotics
- End-to-end Robotic Manipulation from Verbal Commands
- Assessing radiomics feature stability and discriminative power in 3D and 4D medical data
- Automatic named entity recognition from speech
- A human-centered approach to understand local news consumption
- Data-driven identification of prognostic tumor subpopulations from single-cell RNA sequencing data
- Development of epigenetic biomarkers for chronic pain stratification
- Compartment-specific mRNA metabolism in MNs and ACs in ALS pathogenesis
- A robot manipulator writing texts using a pen
- Speaker identification enhanced by the social network analyser
- Speech/Music Classification
- Error correction in speech recognition using large pre-trained language models
- Automatic speech recognition of air-traffic communication using grammar
- Pathological speech detection in adverse environments
- Pathological speech enhancement
Biological networks for language processing
Description
Biological spiking neural networks are interesting from a scientific (evolution) point of view, as well as from a technical one. In the latter sense, spiking networks offer advantages over artificial ones in terms of recurrence, coding and power consumption. Recently we have shown that spiking neurons can be combined freely with artificial ones in the same architecture, showing promise in speech processing tasks. The objective of the project is to extend this work towards processing of discrete entities such as words, at the level of language rather than audio. Some progress has already been made in the literature with the likes of “Spikeformer”, a spiking equivalent of the transformer.
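As a rough illustration of how spiking and artificial components can coexist in one trainable model, the sketch below combines a simplified leaky integrate-and-fire layer (trained with a surrogate gradient) with an artificial linear readout. It assumes PyTorch and is only a toy version of the recipe in the reference below, not the project's actual architecture.

```python
# Minimal sketch (assumptions: PyTorch; simplified LIF dynamics, not the exact model of the reference).
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass, boxcar surrogate gradient in the backward pass."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out * (v.abs() < 0.5).float()

class LIFLayer(torch.nn.Module):
    """Leaky integrate-and-fire layer that can be mixed freely with artificial layers."""
    def __init__(self, n_in, n_out, beta=0.9):
        super().__init__()
        self.fc = torch.nn.Linear(n_in, n_out)
        self.beta = beta

    def forward(self, x):  # x: (batch, time, n_in)
        v = x.new_zeros(x.size(0), self.fc.out_features)
        spikes = []
        for t in range(x.size(1)):
            v = self.beta * v + self.fc(x[:, t])   # leaky integration of the input current
            s = SpikeFn.apply(v - 1.0)             # spike when the membrane potential crosses 1
            v = v * (1.0 - s)                      # reset the membrane after a spike
            spikes.append(s)
        return torch.stack(spikes, dim=1)

# Hybrid model: spiking hidden layer followed by an artificial linear readout.
lif, readout = LIFLayer(40, 128), torch.nn.Linear(128, 10)
x = torch.randn(8, 100, 40)                 # e.g. 100 frames of 40-dim speech features
logits = readout(lif(x).sum(dim=1))         # spike counts -> class logits, trainable end-to-end
```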
Goals
1. Investigate the issues around using spiking for discrete outputs.
2. Show that predominantly spiking networks can replace, for example, recurrent artificial components.
Research Program: AI for Life
Prerequisites
The project will involve programming in python using the Pytorch library. Some knowledge of deep learning will be required, ideally from previous courses.
Reference
- Alexandre Bittar and Philip N. Garner. A surrogate gradient spiking baseline for speech command recognition. Frontiers in Neuroscience, 16, August 2022. http://dx.doi.org/10.3389/fnins.2022.865897
Level: Master
Contact: Phil Garner, [email protected]
Automated segmentation of high-content fluorescent microscopy data
Description
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive and incurable neurodegenerative disease. The early events underlying the disease remain poorly understood. As a dramatic consequence, no effective treatment has been developed. We previously found that the molecular events leading to ALS start during early development. It remains however unknown how and when these affect individual cell behaviour. This project aims to study how molecular biology shapes cellular morphology at an early stage of ALS by integrating longitudinal cellular imaging with genomic data, and it involves a close collaboration with the experimental laboratory of Professor Rickie Patani, Francis Crick Institute/UCL.
Goals
The goal of the project is to develop an image analysis pipeline to extract and analyse single-cell phenotypic measurements from large-scale time-lapse fluorescence imaging data from astrocytes and motor neurons in culture. Specifically, it will involve
1) expansion of existing image analysis modules to obtain robust single-cell readouts from longitudinal images; and
2) development of statistical models to identify cellular trajectories associated with early stages of ALS using the phenotypic features obtained in 1).
Research Program: AI for Life
Prerequisites
Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in image processing and analysis, and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Cultural bias in cross-lingual transfer
Description
Multilingual transformers have proven to be successful at cross-lingual transfer for multiple tasks in Natural Language Processing (NLP). In such a setup, a multilingual transformer model is fine-tuned on a given source language, for which annotations exist, and the resulting model is tested on a task in a target language given only a few annotated examples or none at all. The standard approach to multilingual dataset creation for semantic tasks is translation from an existing (English) dataset, which raises concerns regarding the suitability of such datasets, because translations, especially into culturally diverse languages, may break certain relations, such as perceived causal relations in social behaviour, or moral values. Further information.
Goals
- Estimate the level of cross-lingual transfer of cultural biases on existing datasets.
- Create datasets for the purpose of estimating cross-lingual transfer of cultural biases.
- If relevant and time permits, develop ways to mitigate this effect.
Research Program: AI for Everyone
Level: Bachelor/ Master
Contact: Lonneke van der Plas, [email protected]
Using large pretrained language models in speech recognition
Description
The aim of this project is to measure how large language models perform in their native cradle – automatic speech recognition. The student will run a standard speech dataset through one of the speech recognition models (e.g., publicly available or internal at Idiap), score the outputs with the language models, and combine the scores to refine the transcriptions. The result should be a verdict on the influence of the model size (are the big ones really needed?), a comparison of different models (is GPT better than the same-size LLaMA?) and an evaluation of the usefulness of retraining the language model, which is easy today even on a single GPU.
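As a rough illustration of the re-scoring idea, the sketch below scores N-best hypotheses with a causal language model and interpolates the result with the ASR scores. It assumes the HuggingFace transformers library with GPT-2; the hypotheses, acoustic scores and interpolation weight are made-up placeholders for real ASR output.

```python
# Minimal N-best rescoring sketch (assumptions: HuggingFace transformers, GPT-2; placeholder hypotheses).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def lm_logprob(text):
    """Approximate log-probability of a hypothesis under the language model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss          # mean negative log-likelihood per token
    return -loss.item() * ids.size(1)

# (ASR/acoustic score, hypothesis) pairs from an N-best list (placeholders)
nbest = [(-12.3, "flight two one zero cleared to land"),
         (-12.1, "flight to one zero cleared to land")]

alpha = 0.5  # interpolation weight between ASR and LM scores (to be tuned on development data)
best = max(nbest, key=lambda h: (1 - alpha) * h[0] + alpha * lm_logprob(h[1]))
print(best[1])
```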
Goals
- Familiarize yourself with the speech recognition engines available at Idiap
- Focus on the application of language models in the speech recognition framework (including their use for re-scoring N-best hypotheses)
- Explore large language models and their deployment in speech recognition.
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Understanding generalization in deep learning
Description
State-of-the-art approaches in machine learning are based on deep learning. The reasons for its success are however still poorly understood. Most existing work on the topic has focused on the effects of gradient-based optimization. Interestingly though, even randomly-initialized networks encode inductive biases that mirror some properties of real-world data. This project will contribute to the efforts made in our research group to understand the success of deep learning. This project will emphasize theoretical or practical contributions depending on the student’s interests. One of the objectives is to contribute to a high-quality publication co-authored with other members of our research group, and provide the student with training in rigorous research practices.
Goals
- Select datasets of interest and train various architectures on these.
- Implement methods or use existing code from recent publications to understand the interplay of various properties of data vs. architectures.
- Prepare results, visualizations, and analyses of experiments suitable for a scientific publication.
Prerequisites
Solid programming background and experience with deep learning libraries (e.g. Pytorch)
References
- Loss Landscapes are All You Need (https://openreview.net/forum?id=QC10RmRbZy9)
- Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning (https://arxiv.org/abs/2207.02598)
Level: Bachelor/ Master
Contact: Damien Teney, [email protected]
Grading of inflammatory diseases affecting blood vessels in the eye
Description
Fluorescein angiography is the only clinical method to evaluate the function and integrity of the blood-retinal barrier. Using real hospital data, we aim to detect and grade inflammatory diseases affecting blood vessels in the eye. Through computer vision and machine learning approaches the student will identify novel biomarkers that could improve patient management and care. This project is a collaboration with a multi-centric medical team to identify promising new leads in the research of this field. Challenges include the segmentation and registration of (retinal) fundus angiography data, and the grading of diseased patients.
Goals of the project
- To develop a system for the detection and grading of inflammatory eye diseases in medical images and video.
- Validate the proposed approach and compare results to the state-of-the-art techniques.
- Work together with clinical experts on improving the current understanding of the disease.
Research Program: AI for Life
Prerequisites
Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Tugal-Tutkun, I., Herbort, C. P., Khairallah, M., & Angiography Scoring for Uveitis Working Group (ASUWOG). (2010). Scoring of dual fluorescein and ICG inflammatory angiographic signs for the grading of posterior segment inflammation (dual fluorescein and ICG angiographic scoring system for uveitis). International ophthalmology, 30, 539-552.
Level: Master
Contact: Andre Anjos, Oscar Jimenez-del-Toro, [email protected]
Automatic identification of flight information from speech
Description
Current approaches toward automatic recognition of call-signs from speech combine conventional automatic speech recognition (i.e. speech-to-text) with entity recognition (i.e. text-to-call-sign) technologies. This project will develop a unified module (e.g. an adaptation of well-known BERT models) that allows a direct mapping from speech to the call-sign.
Goals
- Get familiar with a baseline speech recognition module for Air Traffic Control (ATC)
- Get familiar with a baseline concept-extractor module for ATC
- Apply an end-to-end framework to train both modules together and compare its performance with independently trained modules.
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning
References
- Martin Kocour, et al, Boosting of contextual information in ASR for air-traffic call-sign recognition
- Zuluaga, et al: Contextual Semi-Supervised Learning: An Approach To Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems
- ATCO2 project
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
An open-source framework for the quantification of Urban Heat Islands in Switzerland
Description
Cities throughout the world are overheating in summer, with adverse effects on the health of citizens. Due to the highly mineral nature of the built environment, the scarcity of nature in cities, and the anthropogenic heat released in the streets, temperatures will continue to increase with climate change. With physically-based simulation tools we can predict hot spots and evaluate scenarios for the mitigation of urban heat islands. While such tools exist, a framework based on open data and easily accessible to researchers, practitioners and citizens is a must-have to raise awareness and move towards efficient heat-island mitigation measures.
Goals
- Build an open-source framework in Python (or any other language except Matlab) to go from Swiss open datasets to indicators related to the Urban Heat Island effect,
- Introduce the Physiological Equivalent Temperature (PET) and the Universal Thermal Climate Index (UTCI) as indicators of Urban Comfort in our simulation tool,
- Demonstrate the application of scenarios on three case studies representative of the Swiss landscape, quantifying improvement measures.
Research Program: Sustainable and Resilient Societies
Prerequisites
Basic energy balance and thermodynamics knowledge; Basic scripting or programming skills (no Matlab).
References
- Coccolo, Silvia, Jérôme Kämpf, Jean-Louis Scartezzini, and David Pearlmutter. ‘Outdoor Human Comfort and Thermal Stress: A Comprehensive Review on Models and Standards’. Urban Climate 18 (December 2016): 33–57. https://doi.org/10.1016/j.uclim.2016.08.004.
- Coccolo, Silvia, David Pearlmutter, Jerome Kaempf, and Jean-Louis Scartezzini. ‘Thermal Comfort Maps to Estimate the Impact of Urban Greening on the Outdoor Human Comfort’. Urban Forestry & Urban Greening 35 (October 2018): 91–105. https://doi.org/10.1016/j.ufug.2018.08.007.
Level: Master
Contact: Jérôme Kämpf, [email protected]
Swiss Alpine Lakes & Citizen Science
Description
The 2000Lakes initiative aims to catalog the microbial diversity in Swiss alpine lakes while developing a network of citizen science and stakeholders. We are looking for several motivated students interested in alpine science, data science, and human-centered research to develop a master's thesis or semester project on this topic. This project offers the possibility of contributing to an innovative approach to scientific research.
Goals
- Develop creative actions to inform and consolidate a network of stakeholders engaged with biodiversity in alpine lakes.
- Develop computational tools (using data visualization, social media, media archives) to support interaction with citizens and stakeholders.
- Participate in fieldwork and publications in the field of citizen science.
Research Program: AI for Everyone
Prerequisites
Interest and/or experience in one or more of these areas: social media, citizen science, community organizing, data visualization, data analysis, machine learning
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Tensor trains for human-guided optimization in robotics applications
Description
This project extends Tensor Train for Global Optimization (TTGO) to a human-guided learning strategy. Learning and optimization problems in robotics are characterized by two types of variables: task parameters representing the situation that the robot encounters (typically related to environment variables such as locations of objects, users or obstacles); and decision variables related to actions that the robot takes (typically related to a controller acting within a given time window, or the use of basis functions to describe trajectories in control or state spaces). In TTGO, the density function is modeled offline using a tensor train (TT) that learns the structure between the task parameters and the decision variables, and then allows conditional sampling over the task parameters with priority for higher-density regions. Further information.
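To make the conditioning step concrete, here is a minimal numpy sketch of a two-variable tensor train in which a user-specified task parameter is fixed and decision variables are sampled with priority for higher-density regions. The cores are random stand-ins; in TTGO they would be fitted with TT-Cross.

```python
# Minimal sketch of conditioning and sampling with a 2-variable tensor train (numpy only;
# the cores below are random stand-ins for a TT fitted with TT-Cross as in TTGO).
import numpy as np

rng = np.random.default_rng(0)
R, N1, N2 = 4, 50, 60                        # TT rank, grid sizes for task/decision variables
G1 = rng.random((N1, R))                     # first core: task parameter x1
G2 = rng.random((R, N2))                     # second core: decision variable x2
# unnormalized density: p(x1, x2) ~ (G1 @ G2)[x1, x2]

def sample_decision(x1_index, n_samples=5, temperature=1.0):
    """Condition on a user-specified task parameter and sample decision variables,
    giving priority to higher-density regions (temperature < 1 sharpens the prior)."""
    p = G1[x1_index] @ G2                    # conditional slice p(x2 | x1), up to a constant
    p = np.maximum(p, 0.0) ** (1.0 / temperature)
    p /= p.sum()
    return rng.choice(N2, size=n_samples, p=p)

print(sample_decision(x1_index=10, temperature=0.5))
```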
Goals
The goal is to test whether the original autonomous learning strategy of TT-Cross can be extended to a human-guided learning strategy, by letting the user sporadically specify task parameters or decision variables within the iterative process. The first case can be used to provide a scaffolding mechanism for robot skill acquisition. The second case can be used for the robot to ask for help in specific situations.
Research Program: Human-AI Teaming
Prerequisites
Linear algebra, optimization, programming in Python
Reference
- Shetty, S., Lembono, T., Löw, T. and Calinon, S. (2023). Tensor Train for Global Optimization Problems in Robotics. arXiv:2206.05077.
https://sites.google.com/view/ttgo
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Multi-spectral image unmixing for rapid and automated image annotation
Description
Object segmentation and identification methods often rely on the availability of large annotated image libraries. While such libraries are widely available for everyday image scenes, many applications in industry, science and medicine lack similar data because of their unique and specialized nature. The student will implement and characterize the potential of imaging with multi-spectral (colored) illumination patterns to facilitate object annotation in complex scenes. The project will involve the use of a custom hardware imaging setup consisting of a digital camera with triggered multi-color light sources, collecting images of objects, and implementing computational imaging algorithms for spectral unmixing and image segmentation.
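For intuition, the per-pixel spectral unmixing step can be posed as a small non-negative least-squares problem, as in the hedged numpy/scipy sketch below; the mixing matrix and image stack are placeholders for data from the actual LED/camera setup.

```python
# Minimal sketch of per-pixel spectral unmixing by non-negative least squares
# (assumptions: numpy/scipy; the mixing matrix M would come from calibration of the LEDs/camera).
import numpy as np
from scipy.optimize import nnls

n_channels, n_sources = 5, 3                          # e.g. 5 LED/camera channels, 3 spectral sources
M = np.abs(np.random.randn(n_channels, n_sources))    # columns: spectral signatures (placeholder)
image = np.abs(np.random.randn(64, 64, n_channels))   # multi-spectral image stack (placeholder)

abundances = np.zeros(image.shape[:2] + (n_sources,))
for i in range(image.shape[0]):
    for j in range(image.shape[1]):
        # solve min ||M a - pixel||^2 subject to a >= 0 for each pixel
        abundances[i, j], _ = nnls(M, image[i, j])

# abundance maps can then be thresholded to produce per-source segmentation masks
masks = abundances > 0.5
```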
Goals
- Implement a multi-spectral image acquisition protocol using triggered LEDs of various wavelengths to acquire images of objects in a lab setting.
- Implement a spectral unmixing algorithm to segment objects in images.
- Depending on progress, deploy the method in a light microscope for imaging biological samples.
Research Program: AI for Life
Prerequisites
Signal processing/image processing, Introduction to machine learning, Python programming.
References
- Jaques, E. Pignat, S. Calinon and M. Liebling, “Temporal Super-Resolution Microscopy Using a Hue-Encoded Shutter,” Biomedical Optics Express, 10(09):4727-4741, 2019
- Jaques, L. Bapst-Wicht, D.F. Schorderet and M. Liebling, “Multi-Spectral Widefield Microscopy of the Beating Heart through Post-Acquisition Synchronization and Unmixing,” IEEE International Symposium on Biomedical Imaging (ISBI 2019), pp. 1382-1385, 2019
Level: Master
Contact: Michael Liebling, [email protected]
Deep learning data annotation pipeline for scene understanding towards pedestrian guidance
Description
This project takes place within the framework of the Biped project, which designs and develops computer vision and machine learning algorithms to improve the assisting system commercialized by Biped-AI. One important part of the Biped project consists of collecting video data from users walking in indoor and outdoor environments. These data will be annotated and later used to train neural networks aimed at understanding the environment, especially for detecting obstacles, people and objects (e.g. cars, bicycles, benches), as well as recognizing the main scene elements like the ground or walls, all in order to identify a safe pathway. Further information.
Goals
- Study the panoptic segmentation principle and the main state-of-the-art deep networks for this task, such as EfficientPS and Mask2Former, as well as the Segment Anything Model (SAM), to become acquainted with the models;
- Apply these models on a small set of relevant images and evaluate their suitability to annotate relevant items (obstacles, people, cars, ground, and other elements);
- Define an annotation protocol and develop a graphical interface using existing tools for labelling project images, e.g. by assigning or refining annotations suggested by the pre-trained networks;
Research Program: AI for Everyone
Prerequisites
Good command of Python, basics of Linux, deep learning background, basics of computer vision.
References
- SAM: segment anything model. https://segment-anything.com/.
- http://panoptic.cs.uni-freiburg.de/.
- https://github.com/facebookresearch/Mask2Former.
Level: Bachelor/ Master
Contact: Jean-Marc Odobez, [email protected]
Audiovisual person recognition
Description
Audiovisual person identification systems combine two biometric modalities that lead to very good results, as shown in Idiap’s submission to NIST SRE2019. The student will be able to use most of Idiap’s scripts, mainly the audio-related part. Fusion scripts for combining audio and visual systems can also be shared. One of two approaches can be considered: either develop the audio and visual systems separately and then experiment with fusion, or attempt to build a single person identification system taking both audio and visual embedding representations as input.
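The two approaches can be sketched as follows (numpy only; the embeddings and scores are random placeholders for the real audio and face systems).

```python
# Minimal sketch of the two fusion strategies mentioned above (numpy only; placeholder data).
import numpy as np

rng = np.random.default_rng(0)

# (a) score-level fusion: combine calibrated per-modality similarity scores
audio_scores = rng.normal(size=100)          # e.g. PLDA / cosine scores from the audio system
visual_scores = rng.normal(size=100)         # scores from the face recognition system
w = 0.6                                      # fusion weight, tuned on a development set
fused = w * audio_scores + (1 - w) * visual_scores

# (b) embedding-level fusion: concatenate modality embeddings and train a single classifier
audio_emb = rng.normal(size=(100, 192))      # e.g. speaker embeddings
visual_emb = rng.normal(size=(100, 512))     # e.g. face embeddings
joint = np.concatenate([audio_emb, visual_emb], axis=1)   # input to a single person-ID model
```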
Research Program: Sustainable and Resilient Societies
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
References
- The 2019 NIST Audio-Visual Speaker Recognition Evaluation
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Understanding the robustness of machine learning models on underspecified tasks
Description
The performance of deep learning models can quickly degrade when used on test data beyond their training distribution. In recent work [1], we have observed intriguing patterns in the “in-distribution” vs. “out-of-distribution” performance of various models. In particular, there sometimes exists a tradeoff between the two, which evolves during training and fine-tuning. It is not clear, however, what impact the pre-training and fine-tuning stages have. This project will contribute to the efforts to understand this topic. One of the objectives is to concretely contribute to a high-quality publication co-authored with other members of our research group.
Goals
- Select datasets of interest and train models with existing code.
- Examine the performance of various models under various hyper-parameters, numbers of epochs, pre-training/fine-tuning options, etc. Develop model selection strategies to identify robust models.
- Prepare results, visualizations, and analyses of experiments suitable for a scientific publication.
Prerequisites
Solid programming background and experience with deep learning libraries (e.g. Pytorch)
References
[1] ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets (https://arxiv.org/abs/2209.00613)
[2] The Evolution of OOD Robustness Throughout Fine-Tuning (https://arxiv.org/abs/2106.15831)
Level: Bachelor/ Master
Contact: Damien Teney, [email protected]
Social media and crowdsourcing for social good
Description
The student will contribute to a multidisciplinary initiative for the use of social media and mobile crowdsourcing for social good. Several projects are available. Students will work with social computing researchers who collaborate with academics in other countries, both in Europe and in the Majority World.
Goals
- Social media analytics
- Visualization of social and crowdsourced data
- Smartphone apps for mobile crowdsourcing
Research Program: AI for Everyone
Prerequisites
Interest and/or experience in one or more of these areas: data analysis, machine learning, data visualization, phone apps, social media, natural language processing, computer vision
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Punctuation restoration on automatic speech recognition output
Description
The goal of the project is to train a model to post-process automatic speech recognition (ASR) output and add punctuation marks (and capitalizations for the next level of difficulty). This will improve the readability of the ASR output and potentially make it more useful for downstream tasks, such as dialogue systems and language analysis.
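One common way to frame the task is token classification, where each word is labelled with the punctuation mark that follows it. The sketch below (plain Python, illustrative label set and example sentence) shows how punctuated reference text could be turned into such training pairs for any sequence model.

```python
# Minimal sketch of framing punctuation restoration as token classification (illustrative label set).
LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}

def to_token_labels(text):
    """Each word gets the label of the punctuation mark that follows it (or 'O')."""
    pairs = []
    for raw in text.split():
        word, label = raw, "O"
        if raw and raw[-1] in LABELS:
            word, label = raw[:-1], LABELS[raw[-1]]
        # a capitalization label could be added here for the next difficulty level
        pairs.append((word.lower(), label))
    return pairs

print(to_token_labels("Hello, how are you? I am fine."))
# [('hello', 'COMMA'), ('how', 'O'), ('are', 'O'), ('you', 'QUESTION'),
#  ('i', 'O'), ('am', 'O'), ('fine', 'PERIOD')]
```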
Goals
- Get acquainted with the problem, available data, success metrics, machine learning frameworks
- Program a simple system predicting just sentence ends/full stops; improve it to predict other punctuation marks; for extra difficulty, learn to predict capital letters
- Test and evaluate on a couple of languages and real scenarios
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
References
- Yi, et al. Adversarial Transfer Learning for Punctuation Restoration
- Pais, et al., Capitalization and punctuation restoration: a survey
- Nanchen, et al., Empirical Evaluation and Combination of Punctuation Prediction Models Applied to Broadcast News
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Ergodic control for robot exploration
Description
Ergodic control can be exploited in a range of robotics problems requiring the exploration of regions of interest, e.g. when the available sensing information is not accurate enough for a standard controller, but can guide the robot towards promising areas. In a collaborative task, it can also be used when the operator’s input is not accurate enough to fully reproduce the task, which then requires the robot to explore around the requested input (e.g., a point of interest selected by the operator). For picking and insertion, it can be applied to move around the picking/insertion point, thereby facilitating the prehension/insertion. It can also be employed for active sensing and localization (either performed autonomously, or with help from the operator). Further information.
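As background, ergodic controllers such as Spectral Multiscale Coverage drive the time-averaged trajectory statistics towards a target distribution by descending an ergodic metric defined over Fourier coefficients. The numpy sketch below computes this metric on a 1D toy example (basis normalization simplified; the distribution and trajectory are placeholders).

```python
# Minimal sketch of the ergodic metric used by SMC-style controllers (numpy, 1D domain [0, 1]).
import numpy as np

K = 10                                         # number of Fourier basis functions
x = np.linspace(0, 1, 500)
dx = x[1] - x[0]
target = np.exp(-0.5 * ((x - 0.7) / 0.1) ** 2) # region of interest around x = 0.7 (placeholder)
target /= target.sum() * dx                    # normalize to a probability density

# Fourier coefficients of the target distribution and of a candidate trajectory
phi = np.array([(target * np.cos(np.pi * k * x)).sum() * dx for k in range(K)])
traj = np.random.rand(200)                     # trajectory samples in [0, 1] (placeholder)
c = np.array([np.mean(np.cos(np.pi * k * traj)) for k in range(K)])

lam = (1.0 + np.arange(K) ** 2) ** -1.0        # weights emphasizing low spatial frequencies
ergodicity = np.sum(lam * (c - phi) ** 2)      # SMC-style controllers descend this metric
print(ergodicity)
```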
Goals
To study the pros and cons of Spectral Multiscale Coverage and Heat Equation Driven Area Coverage to solve robot manipulation problems.
Research Program: Human-AI Teaming
Prerequisites
Control theory, signal processing, programming in Python, C++ or Matlab/Octave
References
- Mathew and I. Mezic (2009). Spectral multiscale coverage: A uniform coverage algorithm for mobile sensor networks. In Proc. IEEE Conf. on Decision and Control.
- Ivić, B. Crnković, and I. Mezić (2007). Ergodicity-based cooperative multiagent area coverage via a potential field. IEEE Trans. on Cybernetics.
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Clinically interpretable computer aided diagnostic tool using multi-source medical data
Description
The fight against many rare diseases would benefit from automated image analysis tools that improve our understanding of them. One such rare disease is fibromuscular dysplasia (FMD), an under-recognized disease of the blood vessels. Challenges include the segmentation of the renal artery from larger 3D volumes, and the classification of FMD versus healthy patients. The main tasks of the project include: literature review; medical image analysis, i.e., segmentation of 3D tubular structures in real computed tomography images; deep-learning-based disease detection; and proposing novel approaches to improve the understanding of this disease.
Goals
- Improve characterization of the renal artery in computed tomography scans.
- Build an interpretable machine learning system using clinical imaging data for FMD
- Develop – together with clinical experts – a computer aided diagnostic tool for this disease.
Research Program: AI for Life
Prerequisites
Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Bruno, R. M., Mischak, H., & Persu, A. (2020). Multi-omics applied to fibromuscular dysplasia: first steps on a new research avenue. Cardiovascular research, 116(1), 4-5.
Level: Bachelor/ Master
Contact: Andre Anjos, Oscar Jimenez-del-Toro, [email protected]
Wavelets as basis functions for applications in robotics
Description
Signals can be encoded as a weighted superposition of basis functions, which act as a dictionary of simpler signals. The dictionary can be any set of basis functions, including radial basis functions (RBFs), Fourier basis functions, or Bernstein basis functions (used for Bézier curves). Basis functions can be used to encode trajectories, whose input is a 1D time variable and whose output can be multidimensional. Basis functions can also be used to encode signals generated by multivariate inputs. For example, a Bézier surface uses two input variables to cover a spatial range and generates an output variable describing the height of the surface within this rectangular domain. Further information.
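As a concrete example of the encoding described above, the numpy sketch below fits the weights of an RBF dictionary to a 1D trajectory by least squares; a wavelet dictionary would simply replace the matrix Phi while the fit stays identical.

```python
# Minimal sketch of encoding a trajectory as a weighted superposition of basis functions (numpy only).
import numpy as np

t = np.linspace(0, 1, 200)                     # 1D time input
x = np.sin(2 * np.pi * t) + 0.5 * t            # demonstration trajectory (placeholder)

centers = np.linspace(0, 1, 15)                # RBF centers
Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * 0.03 ** 2))  # (200, 15) dictionary

w, *_ = np.linalg.lstsq(Phi, x, rcond=None)    # weights of the superposition
x_hat = Phi @ w                                # reconstructed trajectory
print(np.max(np.abs(x - x_hat)))               # reconstruction error
```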
Goals
The project extends the above approach to wavelet basis functions and studies the properties of wavelets in the context of robot manipulation skills. Wavelets encompass both spatial and spectral properties, which makes them a good candidate to encode functions at different resolutions. The approach will be tested in the context of signed distance functions (see the second reference below).
Research Program: Human-AI Teaming
Prerequisites
Signal processing, programming in Python, C++ or Matlab/Octave
References
- Calinon, S. (2019). Mixture Models for the Analysis, Edition, and Synthesis of Continuous Time Series. Bouguila, N. and Fan, W. (eds). Mixture Models and Applications, pp. 39-57. Springer.
- Li, Y., Zhang, Y., Razmjoo, A. and Calinon, S. (2023). Learning Robot Geometry as Distance Fields: Applications to Whole-body Manipulation. ArXiv 2307.00533.
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
End-to-end Robotic Manipulation from Verbal Commands
Description
With the recent advances in machine learning, new end-to-end networks can cover the whole process from natural language understanding to command grounding in visual inputs to robot motion (e.g., https://cliport.github.io/). This project aims to build on these advances to develop new learning architectures to control assistive robots from verbal commands. In this project, you will develop deep learning methods to control robots from visual and voice inputs. You will re-implement state-of-the-art ML systems and extend them to control robots both in simulation and on real platforms. Further information.
Goals
- Analyze the state of the art of end-to-end methods for robot manipulation.
- Deploy existing methods (e.g., https://cliport.github.io/) in simulation.
- Test the algorithm on real hardware.
Research Program: Human-AI Teaming
Prerequisites
Good command of Python, good experience with deep learning systems (e.g., pytorch), basics of Linux. Experience in robotics (inverse kinematics, control, or system architecture) would be a plus.
References
Shridhar, Mohit, Lucas Manuelli, and Dieter Fox. “Cliport: What and where pathways for robotic manipulation.” Conference on Robot Learning. PMLR, 2022.
Level: Bachelor/ Master
Contact: Emmanuel Senft, [email protected] , Jean-Marc Odobez, [email protected]
Assessing radiomics feature stability and discriminative power in 3D and 4D medical data
Description
Radiomic features obtained from medical images and video can objectively quantify relevant information present in clinical studies. However, recent studies have shown that some of these features can be unstable and redundant, as features can be sensitive to variations in acquisition details. Therefore, reproducibility and discriminative power cannot be treated in isolation when identifying the best features, i.e., those with a higher tolerance towards such influences. Challenges include determining the stability of radiomics features against parameter variations during acquisition, as well as across different time points between patient studies.
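One simple stability screen, sketched below with numpy on placeholder data, is to compute a within-patient coefficient of variation across repeated acquisitions and keep only features below a threshold; more formal measures such as the intraclass correlation coefficient could replace it.

```python
# Minimal sketch of a stability screen for radiomics features across repeated acquisitions
# (numpy only; the feature matrix is a random placeholder for real extracted features).
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_acquisitions, n_features = 30, 4, 100
X = rng.normal(size=(n_patients, n_acquisitions, n_features))   # feature values per acquisition

# within-patient coefficient of variation: low values indicate stable, reproducible features
cv = np.abs(X.std(axis=1) / (X.mean(axis=1) + 1e-8)).mean(axis=0)
stable = np.where(cv < 0.1)[0]                 # candidate features to keep (threshold is illustrative)
print(len(stable), "features pass the stability screen")
```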
Goals
- Implementation and analysis of radiomic features extracted from 3D and 4D medical data
- Identifying the most relevant features according to their variability and stability in different radiological tasks
- Proposing novel approaches to mitigate biases and limitations of these features in a real clinical scenario.
Research Program: AI for Life
Prerequisites
Data analysis, machine learning, computer vision, programming (at least Python, and shell scripting languages for Linux)
Reference
- Jimenez-del-Toro, O., Aberle, C., Bach, M., Schaer, R., Obmann, M. M., Flouris, K., … & Depeursinge, A. (2021). The discriminative power and stability of radiomics features with computed tomography variations: task-based analysis in an anthropomorphic 3D-printed CT phantom. Investigative radiology, 56(12), 820-825.
Level: Bachelor/ Master
Contact: Andre Anjos, Oscar Jimenez-del-Toro, [email protected]
Automatic named entity recognition from speech
Description
The project will improve the detection and recognition of named entities (e.g. names, places, locations) automatically from speech. Currently, two independent technologies are used, namely automatic speech recognition (usually evaluated to minimise word error rate) and named entity recognition. The goal of this project is to efficiently combine these two modules, while leveraging state-of-the-art open source tools such as SpeechBrain or BERT.
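A minimal cascaded baseline could look like the sketch below, which assumes the HuggingFace transformers pipelines (their default models stand in for the Idiap/SpeechBrain ones, and the audio file name is hypothetical).

```python
# Minimal sketch of the current cascaded baseline (assumptions: HuggingFace transformers pipelines;
# default pipeline models stand in for the Idiap/SpeechBrain modules mentioned above).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition")        # speech -> text
ner = pipeline("ner", aggregation_strategy="simple")  # text -> named entities

transcript = asr("call_recording.wav")["text"]        # hypothetical audio file
for entity in ner(transcript):
    print(entity["entity_group"], entity["word"], entity["score"])
```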
Goals
- Get familiar with the baseline speech recognition module developed in ROXANNE
- Get familiar with a baseline entity extractor module
- Apply an end-to-end framework to train both modules together and compare its performance with independently trained modules.
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
References
- Mael Fabien, et al, ROXANNE Research Platform: Automate criminal investigations
- Mael Fabien, et al., BertAA: BERT fine-tuning for Authorship Attribution
- ROXANNE project website
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
A human-centered approach to understand local news consumption
Description
The project aims to design and implement a framework to study the consumption of local news in the European multicultural context. The project will include a combination of research methods for experimental design and data analysis, and will be done in the context of the AI4Media European project, a European Excellence Center for Media, Society, and Democracy.
Goals
The specific goals of the project include: literature review; identification of news sources; mixed-method experimental design; experiments and data analysis; and writing.
Research Program: AI for Everyone
Prerequisites
Interest and/or experience in one or more of these areas: data analysis, machine learning, data visualization, phone apps, social media, natural language processing, computer vision
Level: Bachelor/ Master
Contact: Daniel Gatica-Perez, [email protected]
Data-driven identification of prognostic tumor subpopulations from single-cell RNA sequencing data
Description
This project is part of a larger one aiming to integrate single-cell sequencing data with imaging data in order to develop accurate machine learning methods to identify tumor subpopulations. It involves a close collaboration with the Department of Oncology UNIL CHUV, headed by Prof. Olivier Michielin, and the Novartis Institute for Biomedical Research. Accumulating evidence shows aberrant mRNA metabolism in cancer; however, relatively little is known about the impact of genetic mutations on mRNA metabolism in cancers and how this confers resistance to therapy.
Goals
In this project, the student will develop and implement bioinformatics pipelines to study alternative splicing and polyadenylation from single-cell transcriptomes of BRAF-inhibitor-resistant melanoma. This will then serve to test whether combining measurements of gene and alternative 3′ UTR expression enables the identification of subtle subpopulations that confer drug resistance.
Research Program: AI for Life
Prerequisites
Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Development of epigenetic biomarkers for chronic pain stratification
Description
Chronic pain is a major health care problem that affects millions of people worldwide. It has been demonstrated that complex interactions between biological, psychological, environmental, and social factors may influence pain chronicization. Therefore, epigenetic factors may be the trigger that explains the transition from acute to chronic pain and chronic pain maintenance. However, little is known about the influence of these biopsychosocial factors on epigenetic modifications in a population of chronic musculoskeletal pain patients following an orthopedic trauma. This project will analyze whole-genome methylation levels in a population of chronic pain patients and healthy controls through the prism of specific biological (age, medication) and psychological (anxiety/depression) factors.
Goals
This biological project will undertake bioinformatic analyses of methylation sites on the whole genome to identify specific genes that may be involved in the transition from acute to chronic pain. This project will be carried out in collaboration with the medical research group at the Clinique romande de réadaptation (CRR, Bertrand Léger), where the student is expected to spend 20% of their time.
Research Program: AI for Life
Prerequisites
Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
Compartment-specific mRNA metabolism in MNs and ACs in ALS pathogenesis
Description
This project is part of a larger one aiming to study how molecular biology shapes cellular morphology at an early stage of Amyotrophic Lateral Sclerosis (ALS) by integrating longitudinal cellular imaging with genomic data. It involves a close collaboration with the experimental laboratory of Prof. Rickie Patani, Francis Crick Institute/UCL. We recently uncovered cytoplasmic accumulation of aberrant intron-retaining transcripts (IRTs) as the earliest detectable molecular phenotype in ALS [1–4]. The mechanisms that control RNA binding protein mislocalization, the molecular hallmark of ALS, have yet to be elucidated, and it remains unknown whether this early aberrant cytoplasmic IRT accumulation relates to protein mislocalization, ER stress, mitochondrial depolarisation, oxidative stress, synaptic loss and cell death.
Goals
1) to study the temporal and spatial dynamics of intronic and 3′ UTR sequences in developing MNs and ACs derived from ALS-mutant and control iPSC cell lines using time-resolved RNA-sequencing data from nuclear and cytoplasmic fractions;
2) to characterise the sequence features of cytoplasmic and nuclear IRTs and 3′ UTRs;
3) to develop an mRNA subcellular localisation model using machine learning methods.
Research Program: AI for Life
Prerequisites
Candidates should have strong mathematical and computational skills. Candidates should be familiar with Python/R, and with the Linux environment. Experience in sequencing data and machine learning is an asset. Candidates do not necessarily have to have a biological background but should have a strong desire to directly work with experimental biologists.
References
- Luisier, R. et al. Intron retention and nuclear loss of SFPQ are molecular hallmarks of ALS. Nat. Commun. 9, 2010 (2018).
- Tyzack, G. E. et al. Widespread FUS mislocalization is a molecular hallmark of amyotrophic lateral sclerosis. Brain 142, 2572–2580 (2019).
- Hall, C. E. et al. Progressive Motor Neuron Pathology and the Role of Astrocytes in a Human Stem Cell Model of VCP-Related ALS. Cell Rep. 19, 1739–1749 (2017).
- Tyzack, G. E. et al. Aberrant cytoplasmic intron retention is a blueprint for RNA binding protein mislocalization in VCP-related amyotrophic lateral sclerosis. Brain vol. 144 1985–1993 Preprint at https://doi.org/10.1093/brain/awab078 (2021).
Level: Bachelor/ Master
Contact: Raphaëlle Luisier, [email protected]
A robot manipulator writing texts using a pen
Description
This project proposes to develop a robot that can take any text as input and write the corresponding sequence of letters on a piece of paper. In order to look natural, the use of a typeface vector font such as Hershey will be investigated as a starting point. Such a vector font format can be employed to represent alphabet characters by a set of strokes forming the skeleton of the letters instead of their outlines (as in conventional font formats). The project will be implemented with a 6-axis UFactory Lite-6 robot (https://www.ufactory.cc/lite-6-collaborative-robot), available at Idiap. Further information.
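To give an idea of the control side, the sketch below tracks a few 2D stroke waypoints with a toy planar arm using Jacobian pseudo-inverse inverse kinematics (numpy only); the real project would use the Lite-6 kinematics and waypoints extracted from a Hershey font.

```python
# Minimal sketch of following 2D letter strokes with a toy 3-link planar arm (numpy only;
# the stroke waypoints are placeholders for points extracted from a Hershey font).
import numpy as np

L = np.array([0.3, 0.25, 0.15])                     # link lengths of the toy planar arm

def fk(q):
    """End-effector position of the planar arm."""
    angles = np.cumsum(q)
    return np.array([np.sum(L * np.cos(angles)), np.sum(L * np.sin(angles))])

def jacobian(q):
    angles = np.cumsum(q)
    J = np.zeros((2, 3))
    for i in range(3):
        J[0, i] = -np.sum(L[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(L[i:] * np.cos(angles[i:]))
    return J

stroke = [np.array([0.4, 0.1]), np.array([0.4, 0.3]), np.array([0.5, 0.2])]  # toy stroke waypoints
q = np.array([0.3, 0.3, 0.3])
for target in stroke:                                # track each stroke waypoint in turn
    for _ in range(100):                             # Jacobian pseudo-inverse IK iterations
        e = target - fk(q)
        J = jacobian(q)
        q = q + np.linalg.pinv(J) @ e * 0.1
print(fk(q), "should be close to", stroke[-1])
```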
Goals
Several aspects will have to be considered in the project, such as placement and segmentation of the text on the page, the planning and control approach using inverse kinematics, as well as estimation of the writing result through the camera embedded within the robot arm.
Research Program: Human-AI Teaming
Prerequisites
Linear algebra, programming in Python or C++
References
- Hershey Fonts – https://pypi.org/project/Hershey-Fonts/, https://en.wikipedia.org/wiki/Hershey_fonts
- Robotics codes from scratch (RCFS) – https://rcfs.ch/
- Calinon, S. (2023). Learning and Optimization in Robotics – Lecture notes. – https://rcfs.ch/doc/rcfs.pdf
Level: Bachelor/ Master
Contact: Sylvain Calinon, [email protected]
Speaker identification enhanced by the social network analyser
Description
The project will build, test and combine technologies associated with the ROXANNE platform by leveraging open source tools (e.g. SpeechBrain and SocNetV) to demonstrate their strength in improving person identification. The project definition can be adapted towards other modalities (e.g. estimating authorship attribution from text, or detecting persons using face identification technology).
Goals
- Build a baseline automatic speaker identification engine using an open source tool (such as SpeechBrain) or the one available at Idiap, and test it on target (simulated) data related to lawful investigation.
- Build a baseline graph/network analysis tool with basic functionalities such as centrality or community detection (many open source tools can be exploited here as well) and test it on the simulated data
- Study a combination of the information extracted by speech and network analysis technologies to eventually improve person identification.
Research Program: Sustainable and Resilient Societies
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning.
References
- Mael Fabien, et al, ROXANNE Research Platform: Automate criminal investigations,
- ROXANNE project website
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Speech/Music Classification
Description
Classifying sound into speech, music and possibly noise is important for systems based on statistical modeling. Statistical models are usually trained on a large database of input signals containing various sounds. In both the training process and the testing process it is advantageous to exclude segments containing non-speech sounds to improve the accuracy of the model. This project will develop a classifier discriminating speech from music and potentially also from noise. You will first analyze existing approaches to speech/music classification and evaluate their efficiency and accuracy using conventional metrics for binary classification. You will then propose your own classifier or improve an existing one.
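A minimal baseline in the spirit of the first reference below could extract MFCCs and train a standard classifier, as in this hedged sketch (assumes librosa and scikit-learn; synthetic noise/harmonic clips stand in for real speech and music recordings).

```python
# Minimal sketch of an MFCC + SVM speech/music baseline (assumptions: librosa, scikit-learn;
# synthetic signals stand in for real speech and music data).
import numpy as np
import librosa
from sklearn.svm import SVC

sr = 16000
rng = np.random.default_rng(0)

def mfcc_mean(y):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

# placeholder "speech" (noise bursts) and "music" (sustained harmonics) clips of 1 s each
speech = [rng.normal(size=sr) for _ in range(10)]
music = [np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
         + 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
         + 0.01 * rng.normal(size=sr) for _ in range(10)]

X = np.array([mfcc_mean(y) for y in speech + music])
y = np.array([0] * 10 + [1] * 10)              # 0 = speech, 1 = music

clf = SVC().fit(X, y)                          # frame-level models would replace this on real data
print(clf.score(X, y))
```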
Goals
- Familiarize yourself with voice activity detectors and existing speech/music detectors available publicly or at Idiap
- Develop a new speech/music classifier
- Evaluate the new technology against the baseline on well-established data.
Research Program: AI for Life
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning
References
- Banriskhem K. Khonglah: Speech/music classification using speech-specific features, Digital Signal Processing, Volume 48, January 2016, Pages 71-83
- Mrinmoy Bhattacharjee: Time-Frequency Audio Features for Speech-Music Classification
- Toni Hirvonen: Speech/Music Classification of Short Audio Segments, 2014 IEEE International Symposium on Multimedia
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Error correction in speech recognition using large pre-trained language models
Description
The aim of this work will be to find out whether large pre-trained language models can be used to correct errors in speech transcriptions. The student will run a standard speech dataset through one or more publicly available speech transcription models and then investigate how well the language models are able to correct errors: Does the overall error rate matter? Are there classes of errors that are better fixed? Is it better to use a traditional language model (e.g. LLaMA) or a conversational one (e.g. Alpaca)?
Goals
- Familiarize yourself with the speech recognition engines available at Idiap
- Focus on the application of language models in the speech recognition framework (including their use for re-scoring N-best hypotheses)
- Explore large language models and their deployment to post-process speech recognition output.
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Automatic speech recognition of air-traffic communication using grammar
Description
Current state-of-the-art speech-to-text systems (i.e. automatic speech recognition, ASR) applied to air-traffic control exploit statistical language models, which require large amounts of textual data for training. Nevertheless, Air Traffic Control Officers (ATCOs) are required to strictly follow a standardised phraseology (defined by the International Civil Aviation Organization, ICAO), and thus a context-free grammar (CFG) can be used to model the sequences of words generated by ATCOs. The goal of this project is to explore new ways in which traditional concepts of statistical language modeling can be enriched by the standardised phraseology (i.e. modeled by CFG-based language modeling).
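For illustration, the sketch below defines a tiny toy ICAO-style grammar with NLTK and enumerates the phrases it generates; it is only an illustrative fragment, not real phraseology coverage, but such a grammar could be compiled into a constrained or rescoring language model.

```python
# Minimal sketch of a toy ICAO-style context-free grammar (assumptions: NLTK; illustrative fragment only).
import nltk
from nltk.parse.generate import generate

grammar = nltk.CFG.fromstring("""
S -> CALLSIGN COMMAND
CALLSIGN -> AIRLINE DIGIT DIGIT
AIRLINE -> 'swiss' | 'lufthansa'
DIGIT -> 'one' | 'two' | 'three'
COMMAND -> 'descend' 'flight' 'level' DIGIT DIGIT | 'contact' 'tower'
""")

for sent in generate(grammar, n=5):            # enumerate a few phrases licensed by the grammar
    print(" ".join(sent))
# such a grammar can be compiled into a finite-state LM or used to constrain/rescore ASR hypotheses
```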
Goals
- Develop a baseline automatic speech recognition engine in the Kaldi framework suited for air-traffic controllers
- Explore the use of a CFG-based language model in ASR to model sequences of words (i.e. replacing or enriching the statistical language model)
- Compare the performance of the new language model on ASR tasks
Research Program: Human-AI Teaming
Prerequisites
Python programming, Shell programming, basic knowledge of machine learning
References
- Oualil, et al, A Context-Aware Speech Recognition And Understanding System For Air Traffic Control Domain
- Oualil, et al, Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition
Level: Bachelor/ Master
Contact: Petr Motlicek, [email protected]
Pathological speech detection in adverse environments
Description
Various conditions of brain damage may disrupt the speech production mechanism, resulting in motor speech disorders that encompass altered speech production across different dimensions. To diagnose motor speech disorders, we have developed automatic speech processing approaches. Such approaches, however, can fail to cope with realistic clinical constraints, i.e., the presence of noise and reverberation when recording speech in clinical settings. This project will contribute to the efforts made in our group to understand the performance of state-of-the-art approaches in adverse environments and to develop appropriate approaches targeting such scenarios.
Goals
- Set up datasets of interest.
- Implement existing approaches and/or get familiar with existing implementations.
- Examine the performance of various approaches in adverse environments.
- If relevant and time permits, develop novel approaches targeting adverse scenarios.
Research Program: AI for Life
Prerequisites
Python programming; basic knowledge of machine learning
Level: Bachelor/Master
Contact: Ina Kodrasi, [email protected]
Pathological speech enhancement
Description
Speech signals recorded in an enclosed space by microphones placed at a distance from the source are often corrupted by reverberation and background noise, which degrade speech quality, impair speech intelligibility, and decrease the performance of automatic speech recognition systems. Speech enhancement approaches to mitigate these effects have been devised for neurotypical speakers, i.e., speakers without any speech impairments. However, pathological conditions such as hearing loss, head and neck cancers, or neurological disorders, disrupt the speech production mechanism, resulting in speech impairments across different dimensions. This project will contribute to our efforts to understand the performance of state-of-the-art approaches for pathological signals and develop appropriate approaches targeting pathological speech.
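As a simple point of comparison for state-of-the-art enhancers, the sketch below applies classical spectral subtraction to a synthetic noisy signal (numpy/scipy; the signal is a placeholder for real pathological speech recordings, and reverberation is omitted).

```python
# Minimal sketch of a classical spectral-subtraction enhancer (assumptions: numpy/scipy; synthetic signal).
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)        # placeholder for a speech signal
noisy = clean + 0.3 * rng.normal(size=fs)                   # additive noise only (reverberation omitted)

f, t, X = stft(noisy, fs=fs, nperseg=512)
noise_psd = np.mean(np.abs(X[:, :5]) ** 2, axis=1, keepdims=True)   # noise estimate from first frames
gain = np.sqrt(np.maximum(1.0 - noise_psd / (np.abs(X) ** 2 + 1e-12), 0.05))  # power subtraction, floored
_, enhanced = istft(gain * X, fs=fs, nperseg=512)
```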
Goals
- Set up datasets of interest.
- Implement existing approaches and/or get familiar with existing implementations.
- Examine the performance of various approaches for pathological speech signals.
- If relevant and time permits, develop novel approaches targeting pathological speech.
Research Program: AI for Life
Prerequisites
Python programming; basic knowledge of machine learning
Level: Bachelor/Master
Contact: Ina Kodrasi, [email protected]