iGH project proposals

Open for MSc semester projects, Interns and PhDs

The projects are modular and can be divided up into smaller chunks for shorter student stays.

Fall 2021: several Master thesis/semester projects OPEN!

Others are also available; just email me to meet up 🙂

UPDATED: 06 July 2021

DYNAMIC diagnostic algorithms for resource-limited settings

See video

The WHO has developed a set of algorithms to guide clinicians through structured consultations in resource-limited settings. However, these algorithms are static, generic, rule-based decision trees printed on paper that are derived from outdated data or extrapolated from non-representative populations.
MedAL is a suite of MEDical ALgorithms to develop data-driven versions of these paper guidelines. More specifically, MedAL-ai (which you will develop in this project) aims to adapt decision trees to changing trends in local data so as to improve health outcomes and reduce resource consumption. The static version of such a tool (ePOCT without data-driven algorithms) is currently fully functional and will be deployed in Tanzania and Rwanda in 2021 to guide over a million consultations.
You will work with the clinical and IT teams of a large interdisciplinary project (DYNAMIC) to build a novel pipeline that allows clinicians to generate, explore and implement interpretable machine learning algorithms derived from the data collected with the iPOCT tool. A prototype of the platform is already functional. During this project, you will be closely supported and co-supervised by a medical doctor and guided by the requirements of the clinical and IT teams.

Keywords: multidisciplinary, supervised ML, interpretable ML, visualisations, intelligent imputation, python, java, web-based dashboard development.


  • Cécile Trottet (MSc Thesis)
  • Co-supervisor: Thijs Vogels (PhD student)


  • Liamarcia Bifano (MSc Thesis)
  • Jonathan Donz (MSc Semester project)
  • Theo Bian (MSc thesis)
  • Antoine Basseto (BSc Semester project)
Detecting and visualising patterns in medical data to guide targeted interventions and medical training

(clinical, epidemiology)
Emerging and re-emerging infectious diseases (e.g. COVID-19, Ebola, Dengue, Zika, Chikungunya) are defined by their unexpectedness.
As early detection of these novel patterns of disease is critical to their control, we propose to automate the detection of epidemiologically relevant patterns that could better guide and scale interventions, and alert clinicians to evolving trends.

A proof-of-concept clustering visualisation platform has been developed and integrated into MedAL-ai (cf. first project). You will build on this tool to explore various metrics and approaches for semi-supervised and unsupervised anomaly detection. During this project, you will be closely supported and co-supervised by a medical doctor and guided by the requirements of the clinical team.
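As a purely illustrative starting point, unsupervised anomaly detection on a time series of case counts can be as simple as comparing each new value with a trailing window. The sketch below is not part of the existing platform; the function name `flag_anomalies`, the window size, the threshold and the example counts are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=4, threshold=3.0):
    """Return indices whose value deviates strongly from the trailing window.

    A value is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the preceding `window` values.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: flag any departure from it.
            if counts[i] != mu:
                flagged.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical weekly case counts with one obvious spike at index 7.
weekly_cases = [12, 14, 11, 13, 12, 15, 13, 41, 14, 12]
print(flag_anomalies(weekly_cases))
```

A rolling z-score like this is only a baseline: the weeks right after a spike are compared against a window that contains it, which inflates the standard deviation and can hide follow-up anomalies. More robust statistics or learned models are the kind of alternatives the project would explore.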

Keywords: multidisciplinary, unsupervised ML, interpretable ML, visualisations++, intelligent imputation, python, web-based dashboard development.


  • Gasser Elbanna (MSc Intern)


  • Zeineb Sahnoun (MSc Semester project)
  • Kuan Tung (MSc Semester project)
DeepChest: Lung Ultrasound for diagnosis and risk stratification in COVID-19


Lung ultrasound (LUS) is a non-invasive, pragmatic and time-tested tool for evaluating and discriminating respiratory pathology at the bedside. Indeed, when LUS is interpreted by expert clinicians, its predictive accuracy can match that of CT scanners.
Recently, ultrasound-on-a-chip technology has made LUS extremely portable (pluggable into a mobile phone) and cheap enough (USD 2,000 vs USD 30,000+) for use in resource-limited settings. This makes it particularly useful in COVID-19, where its portability enables decentralised respiratory evaluations (at home or in triage centres rather than in the hospital) and simple inter-patient disinfection.

However, while acquisition may be simple, interpretation is comparatively challenging, prone to subjective bias and lacking in standardisation.

Thus, we propose DeepChest, a deep learning approach that leverages a transformer architecture to address variable missingness for robust patient-level aggregation of individual images.

You will have a cleaned dataset of ultrasound images and videos and clinical tabular data. Your tasks will be to extend these methods to video and to explore ways to make the outputs interpretable to clinicians (saliency mapping, Grad-CAM, etc.). You will be co-supervised by a medical doctor and an ML PhD student.

Keywords: interdisciplinary, deep learning, computer vision, BERT, image analysis, tabular data, prognostication, diagnosis


  • Pablo Canas (MSc Thesis)
  • Amandine Evard (MSc Thesis)
  • Lilia Ellouz (MSc Semester)
  • Jalel Zghonda (MSc Semester)
  • Co-supervisor: Jean-Baptiste Cordonnier (PhD student)


  • Mariko Makhmutova (MSc Intern)
  • Nicolas Martinod (MSc Semester project)
  • Hugo Schmutz (MSc Thesis)
Collaborative privacy: Empowering clinical research in Africa through secure, incentivised crowdsourcing

(clinical, privacy, ethics, theoretical+practical)
Sub-Saharan Africa suffers over a quarter of the global burden of disease, but despite this concentration of critical health information, the region currently produces less than 1% of global medical publications. In many clinics, data collection is still paper-based and clinicians are reluctant to collaborate with other parties due to important ethical and privacy concerns for their patients and the ownership of their intellectual research property.

You will explore approaches in distributed and federated learning to build a platform able to crowdsource models from multiple parties in a way that incentivises fair collaboration and high-quality interoperable data collection. Your models will be integrated into a prototype mobile data-entry application (already developed). You will be supported by an interdisciplinary team from EPFL (clinical and ML).
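To make the idea of crowdsourcing a model from multiple parties concrete, here is a minimal sketch of federated averaging (FedAvg), one standard federated learning technique in which each party trains locally and shares only model parameters, never patient records. It is not the project's actual method; the function name `federated_average` and the example clinics are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Combine locally trained parameter vectors into one global vector.

    Each client's parameters are weighted by the number of local samples
    it trained on, as in the FedAvg aggregation step.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one sample count is required per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Two hypothetical clinics: the second holds three times as much data,
# so its parameters pull the global model three times as strongly.
clinic_a = [1.0, 2.0]
clinic_b = [3.0, 4.0]
global_model = federated_average([clinic_a, clinic_b], [1, 3])
print(global_model)
```

Weighting by sample count is a design choice with fairness implications: it favours large contributors, which is one reason incentive schemes for smaller clinics are an open question in this project.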

In collaboration with the DeAI project

Keywords: distributed ML, federated learning, privacy, predictive models (theory and practical), digital decolonialisation.


  • David Roschewitz (MSc Thesis)
  • Marcel Torné Villasevil (BSc Semester)
  • Frédéric Berdoz (MSc Semester)
  • Martin Beaussart (MSc Semester)


  • Felix Grimberg (MSc Thesis, the return!)
  • Felix Grimberg (MSc Semester project)
  • Mahmoud Hegazy (Intern)
  • Mahmoud Said (MSc Semester project)
  • Mohamed Ndoye (MSc Semester project)
  • Nikita Filipov (MSc Semester project)


  • Felix published a conference paper at MICCAI 2020 and another one is coming soon…
  • Felix also presented at ASTMH 2020 (the biggest tropical/global health conference in the world!)
Pneumoscope: an intelligent stethoscope for the diagnosis and prognosis of lung disease


Respiratory disease was a global crisis well before COVID-19. In children, it remains the leading cause of preventable death and also the primary context for antibiotic misuse.

Lung auscultation is an established clinical exam in the assessment of respiratory disease, but interpretation is highly subjective with considerable inter-user bias and poor accuracy.  

This project aims to use deep learning to identify the acoustic signatures of physiological and pathological lung sounds collected with a digital stethoscope.  

This is part of a close partnership with HUG, Geneva, who are coordinating the effort and have developed a novel stethoscope for data capture. You will work in a large multidisciplinary team, co-supervised by medical and ML experts.

Keywords: interdisciplinary, deep learning, sound analysis, tabular data, diagnostics, prognostication


  • Co-supervisor: Tatjana Chavdarova (Postdoc)
  • Coordinator: Jonathan Dönz (MSc Intern)
  • External Advisor: Daniel Müller
  • Julien Heitmann (MSc Thesis)
  • Asli Yörüsün (MSc Semester)
  • Dimitar Bajraktarov (BSc Semester)


  • Edoardo Holzl (Engineer)
  • Juliane Dervaux (MSc Thesis)
  • Deeksha MS (Intern)
CUMULATOR: a tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare

(environmental science, clinical)
A single training run of a complex deep learning transformer model for natural language processing can emit 15 million kilograms of CO2eq: equivalent to 15’000 (return!) flights between London and Paris. Perhaps surprisingly, there is currently little research in sustainable AI and only a few tools to monitor the environmental impact of ML projects. To this end, we developed:

  1. CUMULATOR: a fully operational open-source API to quantify and report the carbon footprint of ML models, and
  2. a Carbon Impact Statement Protocol to report the carbon footprint of research projects.

CUMULATOR has been integrated within iPOCT (cf. the first project proposal) as part of a large-scale medical analysis platform to be deployed in Tanzania and Rwanda in 2020.

You will explore approaches to improve and personalise CUMULATOR’s estimations of the carbon footprint and optimise accuracy/time/carbon footprint efficiency trade-offs on distributed ML training methods.
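For orientation, a carbon footprint estimate of a computation typically multiplies the energy consumed by the carbon intensity of the electricity used. The sketch below illustrates that arithmetic only; it is not CUMULATOR's API, and the function name and all numbers (power draw, duration, grid intensity) are assumptions chosen for the example.

```python
def estimate_footprint_kg(power_watts, hours, grid_g_co2eq_per_kwh):
    """Estimate emissions in kg CO2eq for a computation.

    energy (kWh)      = power (W) * time (h) / 1000
    emissions (kg)    = energy (kWh) * intensity (g CO2eq/kWh) / 1000
    """
    energy_kwh = power_watts * hours / 1000
    return energy_kwh * grid_g_co2eq_per_kwh / 1000


# Hypothetical run: a 250 W GPU training for 10 hours on a grid
# emitting 400 g CO2eq per kWh.
footprint = estimate_footprint_kg(250, 10, 400)
print(f"{footprint:.2f} kg CO2eq")
```

Personalising such an estimate means replacing the fixed constants with measured values, for example the actual hardware power draw or the carbon intensity of the local grid at the time of training.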

Keywords: machine learning, sustainable AI, carbon footprint


  • OPEN!


  • Tristan Trébaol (MSc Semester project)


  • Tristan presented CUMULATOR at ASTMH 2020 (the biggest tropical/global health conference in the world!)
Understanding Verbal Autopsies

(Epidemiology, Clinical)
In resource-limited settings with poor access to health care, many people die in their communities without having received a medical examination, and burials may even proceed without formal registration. Thus, these deaths are missed in a country’s vital statistics, leaving gaps in the surveillance of new disease outbreaks and resulting in a misrepresentation of a community’s health needs.
To capture these vital events, the WHO has formulated a standardised questionnaire that aims to infer the cause of death from structured interviews with the deceased’s family when deaths are reported during census efforts. However, the process of evaluating this data is laborious and requires many hours of work from medical doctors. The WHO has complemented these questionnaires with several decision trees to help guide clinicians or automate the estimation. Unfortunately, these decision trees are static and generic, unable to adapt to changing epidemiology (e.g. COVID) or new environments.

As a first step in this project, we propose that you explore unsupervised/clustering methods to evaluate the outputs of the decision trees to help clinicians detect evolving patterns in this critical data.
You will be supervised by a specialist in verbal autopsies in Tanzania, a medical doctor and Prof Jaggi.

Keywords: unsupervised machine learning, clustering, tabular data, NLP (possibly).


  • We’re waiting for the data before opening this project…


What if… An Interactive Global COVID Policy Simulator

(epidemiology, public health policy)
As COVID-19 sweeps across the globe, the world debates the efficacy of the various combinations and timings of public health policies implemented by governments.
Was it too early? Too late? Poorly adapted to the local culture and resources? How does the efficacy of a policy in one country compare to another?
For instance, social distancing is a luxury many people cannot afford in overcrowded, multi-generational informal dwellings, and their ability to adhere to policies is more quickly fatigued or supplanted by the mounting economic risks of staying at home.

One size certainly does not fit all. We have developed a set of models that explore the effect of the type and stringency of COVID-19 policies across the globe.

In this project, you will work on the What if… Interactive Global COVID Policy Simulator: a web-based interactive platform that allows the general public to toggle between various policy combinations and visualise the predicted effect of these hypothetical scenarios in various countries, based on the available data.

Keywords: COVID-19, R0, visualisations, LSTM, machine learning, Python, Java


  • Co-supervisor: Prakhar Gupta (PhD student)
  • Ridha Chahed (MSc Intern)
  • Giorgio Mannarini (MSc Semester)
  • Francesco Posa (MSc Semester)
  • Kseniia Shevchenko (MSc Semester)
  • Andrea Pinto (BSc Semester project)
  • Advisor: Tatjana Chavdarova (Postdoc)


  • Thierry Bossy (MSc thesis)
  • Lucas Massemin (Research Assistant)
  • Pablo Canas (MSc Semester)


  • We are XPrize finalists! Join us at the AI4Good global summit
  • Thierry presented What If…? at ASTMH 2020 (the biggest tropical/global health conference in the world!)
  • Paper(s) coming soon…
Detecting spatiotemporal associations between remotely sensed satellite data and environmental infectious diseases

Many infectious diseases that exhibit spatial and temporal trends have been linked to the environment. Remotely sensed data are a valuable source for predicting infectious disease dynamics because of their free availability, global coverage, and continued improvements in spatial and temporal resolution.

This project involves building upon an existing platform that enables pattern detection in health data from patients in Tanzania and Rwanda using unsupervised machine learning (see project 2). Your role will be to incorporate remotely sensed data into this platform so that it can be used to derive spatiotemporal associations with environmental parameters and to build predictive models that forecast spikes in infectious diseases.

More specifically, you will assess multiple remotely sensed data sources and select the most appropriate ones to match the spatial and temporal resolution of the available clinical data (candidate datasets include MODIS, Landsat 8, Sentinel-2, ASTER Global Digital Elevation Model and Global Urban Footprint). During this project, you will be closely supported and co-supervised by a clinician, an environmental epidemiologist and a data scientist.

Keywords: infectious diseases, spatiotemporal modeling, environment, remote sensing.


  • Merged with 2nd project in this list

Co-supervisor: Dr Alexandra Kulinkina

Ebola modeling

This project is a collaboration with the IDDO (Infectious Disease Data Observatory, via the University of Oxford), which has expertly curated the biggest Ebola dataset in the world.

We apply machine learning to explore clinical insights and specifically to compare the performance of centralised and decentralised learning.

Keywords: interdisciplinary, tabular data, diagnostics, prognostication


  • Aiyu Liu (MSc Thesis)


  • Ridha Chahed (MSc Semester Project)