iGH project proposals

Open for MSc semester projects, Interns and PhDs

The projects are modular and can be divided up into smaller chunks for shorter student stays.

UPDATED: 21 Sept 2020

DYNAMIC diagnostic algorithms for resource-limited settings

The WHO has developed a set of algorithms to guide clinicians through structured consultations in resource-limited settings. However, these algorithms are static, generic, rule-based decision trees printed on paper that are derived from outdated data or extrapolated from non-representative populations.
iPOCT (intelligent point of care test) is an electronic version of these paper guidelines, which aims to better adapt decision trees to changing trends in local data in such a way that it improves health outcomes and reduces resource consumption. The static version of iPOCT (ePOCT without data-driven algorithms) is currently fully functional and will be deployed in Tanzania and Rwanda in 2020 to guide over a million consultations.
You will work with the clinical and IT teams of a large interdisciplinary project (DYNAMIC) to build a novel pipeline that allows clinicians to generate, explore and implement interpretable machine learning algorithms derived from the data collected with the iPOCT tool. A prototype of the platform is already functional. During this project, you will be closely supported and co-supervised by a medical doctor and guided by the requirements of the clinical and IT teams.

Keywords: multidisciplinary, supervised ML, interpretable ML, visualisations, intelligent imputation, python, java, web-based dashboard development.

Currently working on this:

  • Jonathan Donz (MSc Semester project)
  • Theo Bian (MSc thesis)
  • Antoine Basseto (BSc Semester project)

Previous students

  • Liamarcia Bifano (MSc Thesis)

Detecting and visualising patterns in medical data to guide targeted interventions and medical training

(clinical, epidemiology)
Emerging and re-emerging infectious diseases (e.g. COVID-19, Ebola, Dengue, Zika, Chikungunya) are defined by their unexpectedness.
As early detection of these novel patterns of disease is critical to their control, we propose to automate the detection of epidemiologically relevant patterns that could better guide and scale interventions, and alert clinicians to evolving trends.

A proof-of-concept clustering visualisation platform has been developed and integrated into iPOCT (cf first project). You will build on this tool to explore various metrics and approaches for semi-supervised and unsupervised anomaly detection. During this project, you will be closely supported and co-supervised by a medical doctor and guided by the requirements of the clinical team.
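
As a rough illustration of what unsupervised anomaly detection on surveillance data can look like, the sketch below flags weeks whose case count rises far above the preceding weeks, using a rolling z-score. It is only one simple baseline, not the method used in the existing platform; the function name, window length, threshold and case counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=8, threshold=3.0):
    """Return indices whose count is unusually high versus the preceding window.

    counts: list of weekly case counts (hypothetical input format).
    window and threshold are illustrative defaults, not tuned values.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)  # sample standard deviation of the baseline
        if sigma == 0:
            # Perfectly flat baseline: treat any increase as a deviation.
            if counts[i] > mu:
                flagged.append(i)
            continue
        z = (counts[i] - mu) / sigma
        if z > threshold:  # one-sided: only upward spikes are flagged
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Made-up weekly counts with a single spike at index 10.
    weekly_cases = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15, 41, 14]
    print(flag_anomalies(weekly_cases))  # prints [10]
```

A rolling baseline adapts to slow seasonal drift, but it will miss gradual increases and is sensitive to the choice of window, which is part of what the project would explore.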

Keywords: multidisciplinary, unsupervised ML, interpretable ML, visualisations++, intelligent imputation, python, web-based dashboard development.

Currently working on this:

  • Kuan Tung (MSc Semester project)

Previous students:

  • Zeineb Sahnoun (MSc Semester project)

EQUIPOIZER: An open-source tool to minimise harm in clinical trials

(clinical ethics)
In a clinical trial comparing two treatments, the question of which patients receive which treatment is a critical ethical issue. Such trials are guided by the principle of clinical “equipoise”: the state of genuine uncertainty about whether or not a proposed treatment will be beneficial. That is, only when we are genuinely uncertain of the benefit can it be ethical to perform a clinical trial and randomise patients into different treatment groups. However, as a trial progresses and information accumulates, equipoise is eroded as benefits and harms become measurable. In this project, you will build a platform that monitors trial outcomes and adjusts the number of patients randomised to each arm so as to maintain both the equipoise and the statistical power of the clinical trial whilst minimising unnecessary harm.
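
To make the idea of adjusting allocation as evidence accumulates more concrete, here is a minimal sketch of one well-known family of approaches, Bayesian response-adaptive randomisation. It estimates the probability that arm A is better than arm B from Beta posteriors and uses that as the allocation probability, clipped so that neither arm is starved of patients. This is an assumption about how such a tool could work, not the EQUIPOIZER design; the function name, the uniform prior and the `min_alloc` floor are hypothetical choices.

```python
import random


def allocation_probability(successes_a, failures_a, successes_b, failures_b,
                           draws=10_000, min_alloc=0.2, seed=0):
    """Probability of assigning the next patient to arm A.

    Uses a Beta(1, 1) (uniform) prior on each arm's success rate and a
    Monte Carlo estimate of P(rate_A > rate_B). The result is clipped to
    [min_alloc, 1 - min_alloc] so both arms keep recruiting, which is one
    simple (hypothetical) way to protect statistical power.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    wins_a = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + successes_a, 1 + failures_a)
        rate_b = rng.betavariate(1 + successes_b, 1 + failures_b)
        if rate_a > rate_b:
            wins_a += 1
    prob_a_better = wins_a / draws
    return min(max(prob_a_better, min_alloc), 1 - min_alloc)


if __name__ == "__main__":
    # No data yet: allocation stays close to 50/50.
    print(allocation_probability(0, 0, 0, 0))
    # Made-up interim data strongly favouring arm A: capped at 1 - min_alloc.
    print(allocation_probability(40, 10, 10, 40))
```

The clipping floor is the design lever here: a higher floor preserves power and equipoise for longer, while a lower floor moves patients to the apparently better arm sooner.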

Keywords: interdisciplinary, you can focus on the theory or practical part (as you like).

Currently working on this:

  • OPEN!

DeepChest: Lung Ultrasound for diagnosis and risk stratification in COVID-19


Lung ultrasound (LUS) is a non-invasive, pragmatic and time-tested tool for evaluating and discriminating respiratory pathology at the bedside. Indeed, when LUS is interpreted by expert clinicians, its predictive accuracy can match CT scanners.
Recently, ultrasound-on-a-chip technology has made LUS extremely portable (pluggable into a mobile phone) and cheap enough (USD 2,000 vs USD 30,000+) for use in resource-limited settings. This makes it particularly useful in COVID-19, where its portability enables decentralised respiratory evaluations (at home or in triage centres rather than in the hospital) and simple inter-patient disinfection.

However, while acquisition may be simple, interpretation is comparatively challenging: it is prone to subjective bias and lacks standardisation.

Thus, we propose DeepChest, a deep learning approach that leverages a transformer architecture to handle variable missingness and robustly aggregate individual images at the patient level.

You will have a cleaned dataset of ultrasound images and videos and clinical tabular data. Your tasks will be to extend these methods to video and to explore ways to make the outputs interpretable to clinicians (saliency mapping, Grad-CAM, etc.). You will be co-supervised by a medical doctor and an ML PhD student.

Keywords: interdisciplinary, deep learning, computer vision, BERT, image analysis, tabular data, prognostication, diagnosis

Currently working on this:

  • Mariko Makhmutova (Intern, current)
  • OPEN!!!

Previous students

  • Hugo Schmutz (MSc Thesis)

Co-supervisor: Jean-Baptiste Cordonnier (PhD student)

Collaborative privacy: Empowering clinical research in Africa through secure, incentivised crowdsourcing

(clinical, privacy, ethics, theoretical+practical)
Sub-Saharan Africa suffers over a quarter of the global burden of disease, but despite this concentration of critical health information, the region currently produces less than 1% of global medical publications. In many clinics, data collection is still paper-based and clinicians are reluctant to collaborate with other parties due to important ethical and privacy concerns for their patients and the ownership of their intellectual research property.

You will explore approaches in distributed and federated learning to build a platform able to crowdsource models from multiple parties in a way that incentivises fair collaboration and high-quality interoperable data collection. Your models will be integrated into a prototype mobile data-entry application (already developed). You will be supported by an interdisciplinary team from EPFL (clinical and ML).
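
For readers new to federated learning, the sketch below shows the aggregation step of federated averaging (FedAvg), a standard baseline in which each party trains locally and only model parameters, weighted by local sample count, are combined centrally. It is a simplified illustration and not the platform's actual protocol: the function name and the clinic numbers are hypothetical, and plain averaging on its own provides neither formal privacy guarantees nor incentive mechanisms.

```python
def federated_average(client_updates):
    """Combine locally trained parameters into one global parameter vector.

    client_updates: list of (weights, n_samples) tuples, where weights is a
    list of floats of equal length for every client (hypothetical format).
    Each client's contribution is weighted by its number of local samples.
    """
    total_samples = sum(n for _, n in client_updates)
    if total_samples <= 0:
        raise ValueError("at least one client must contribute samples")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, n_samples in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for j, w in enumerate(weights):
            averaged[j] += w * n_samples / total_samples
    return averaged


if __name__ == "__main__":
    # Two made-up clinics: raw patient records never leave either site.
    clinic_a = ([1.0, 2.0], 100)
    clinic_b = ([3.0, 4.0], 300)
    print(federated_average([clinic_a, clinic_b]))  # prints [2.5, 3.5]
```

Weighting by sample count favours large contributors, which is exactly where questions of fairness and incentives between parties start to matter.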

Keywords: distributed ML, federated learning, privacy, predictive models (theory and practical), digital decolonialisation.

Currently working on this:

  • Felix Grimberg (MSc Thesis, the return!)
  • Nicolas Martinod (MSc Semester project)

Previous students:

  • Felix Grimberg (MSc Semester project)
  • Mahmoud Hegazy (Intern)
  • Mahmoud Said (MSc Semester project)
  • Mohamed Ndoye (MSc Semester project)
  • Nikita Filipov (MSc Semester project)

AWARDS: Felix published a conference paper at MICCAI 2020. Join us!

Co-supervisor: Sai Praneeth Karimireddy (PhD student)

Pneumoscope: an intelligent stethoscope for the diagnosis and prognosis of lung disease

details coming…

Keywords: interdisciplinary, deep learning, sound analysis, tabular data, diagnostics, prognostication

Currently working on this:

  • Juliane Dervaux (MSc Thesis student, current)
  • Deeksha MS (Intern, current)
  • OPEN

CUMULATOR: a tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare

(environmental science, clinical)
A single training run of a complex deep learning transformer model for natural language processing can emit 15 million kilograms of CO2eq: equivalent to 15,000 (return!) flights between London and Paris. Perhaps surprisingly, there is currently little research in sustainable AI and only a few tools to monitor the environmental impact of ML projects. To this end, we developed:

  1. CUMULATOR: a fully operational open-source API to quantify and report the carbon footprint of ML models, and
  2. a Carbon Impact Statement Protocol to report the carbon footprint of research projects.

CUMULATOR has been integrated within iPOCT (cf the first project proposal) as part of a large scale medical analysis platform to be deployed in Tanzania and Rwanda in 2020.

You will explore approaches to improve and personalise CUMULATOR’s estimations of the carbon footprint and optimise accuracy/time/carbon footprint efficiency trade-offs on distributed ML training methods.
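
As background, one common way to estimate the footprint of a computation is to multiply the energy it consumes by the carbon intensity of the local electricity grid. The sketch below shows that arithmetic only; it is not CUMULATOR's API, and the function name, the PUE (data-centre overhead) factor and all numbers are illustrative placeholders rather than measurements.

```python
def training_footprint_kg(power_watts, hours, carbon_intensity_g_per_kwh, pue=1.0):
    """Estimate emissions of a training run in kilograms of CO2eq.

    power_watts: average power draw of the hardware (assumed known).
    hours: wall-clock duration of the run.
    carbon_intensity_g_per_kwh: grams of CO2eq emitted per kWh of electricity.
    pue: power usage effectiveness, a multiplier for cooling and other overhead.
    """
    energy_kwh = power_watts / 1000 * hours * pue
    grams = energy_kwh * carbon_intensity_g_per_kwh
    return grams / 1000


if __name__ == "__main__":
    # Placeholder values: a 250 W device for 10 hours on a 400 g/kWh grid.
    print(training_footprint_kg(250, 10, 400))  # prints 1.0
```

Each input is itself an estimate, so personalising them (actual hardware draw, local grid mix, data-centre overhead) is where most of the accuracy gains would come from.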

Keywords: machine learning, sustainable AI, carbon footprint

Currently working on this:

  • OPEN!

Previous students

  • Tristan Trébaol (MSc Semester project)

AWARDS: Tristan will be presenting CUMULATOR at ASTMH 2020 (the biggest tropical/global health conference in the world!)

Understanding Verbal Autopsies

(Epidemiology, Clinical)
In resource-limited settings with poor access to health care, many people die in their communities without ever having received a medical examination, and burials may even proceed without formal registration. Thus, these deaths are missed in a country’s vital statistics, leaving gaps in the surveillance of new disease outbreaks and misrepresenting a community’s health needs.
To capture these vital events, the WHO has formulated a standardised questionnaire that aims to infer the cause of death from structured interviews with the deceased’s family when deaths are reported during census efforts. However, the process of evaluating this data is laborious and requires many hours of work from medical doctors. The WHO has complemented these questionnaires with several decision trees to help guide clinicians or automate the estimation. Unfortunately, these decision trees are static and generic, unable to adapt to changing epidemiology (e.g. COVID-19) or new environments.

As a first step in this project, we propose that you explore unsupervised/clustering methods to evaluate the outputs of the decision trees to help clinicians detect evolving patterns in this critical data.
You will be supervised by a specialist in verbal autopsies in Tanzania, a medical doctor and Prof Jaggi.

Keywords: unsupervised machine learning, clustering, tabular data, NLP (possibly).

Currently working on this:

  • OPEN!


What if… An Interactive Global COVID Policy Simulator

(epidemiology, public health policy)
As COVID-19 sweeps across the globe, the world debates the efficacy of the various combinations and timings of public health policies implemented by governments.
Was it too early? Too late? Poorly adapted to the local culture and resources? How does the efficacy of a policy in one country compare to another?
For instance, social distancing is a luxury many people cannot afford in overcrowded, multi-generational informal dwellings, and their ability to adhere to policies is more quickly fatigued or supplanted by the mounting economic risks of staying at home.

One size certainly does not fit all. We have developed a set of models that explore the effect of the type and stringency of COVID-19 policies across the globe.

In this project, you will work on the What if… Interactive Global COVID Policy Simulator: a web-based interactive platform that allows the general public to toggle between various policy combinations and visualise the predicted effect of these hypothetical policies in various countries, based on the available data.

Keywords: COVID-19, R0, visualisations, LSTM, machine learning, Python, Java

Currently working on this:

  • Thierry Bossy (MSc thesis)
  • Lucas Massemin (Intern)
  • Andrea Pinto (BSc Semester project)
  • OPEN!


AWARDS:

  • We are XPrize finalists! Join us at the AI4Good global summit
  • Thierry will be presenting What If…? at ASTMH 2020 (the biggest tropical/global health conference in the world!)

Co-supervisor: Prakhar Gupta (PhD student)

Detecting spatiotemporal associations between remotely sensed satellite data and environmental infectious diseases

Many infectious diseases that exhibit spatial and temporal trends have been linked to the environment. Remotely sensed data are a valuable source for predicting infectious disease dynamics because of their free availability, global coverage, and continued improvements in spatial and temporal resolution.

This project involves building upon an existing platform that enables pattern detection in health data from patients in Tanzania and Rwanda using unsupervised machine learning (see project 2). Your role will be to incorporate remotely sensed data into this platform so that it can be used to derive spatiotemporal associations with environmental parameters and to build predictive models that forecast spikes in infectious diseases.

More specifically, you will assess multiple remotely sensed data sources (e.g. MODIS, Landsat 8, Sentinel-2, ASTER Global Digital Elevation Model, Global Urban Footprint) and select those that best match the spatial and temporal resolution of the available clinical data. During this project, you will be closely supported and co-supervised by a clinician, an environmental epidemiologist and a data scientist.
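
One simple starting point for a temporal association is a lagged correlation: an environmental variable (for example rainfall) is correlated with case counts shifted by 0, 1, 2, ... time steps to see which delay aligns best. The sketch below is a minimal illustration under that assumption, not the project's method; the function names and both series are made up, and a real analysis would also need to handle seasonality, spatial structure and confounding.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # correlation undefined for a constant series; report 0
    return cov / (spread_x * spread_y)


def best_lag(env, cases, max_lag):
    """Correlate env[t] with cases[t + lag] for each lag in 0..max_lag.

    Returns (lag, correlation) for the lag with the highest correlation.
    """
    if len(env) != len(cases):
        raise ValueError("series must have the same length")
    if max_lag > len(env) - 2:
        raise ValueError("max_lag leaves too few points to correlate")
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = pearson(env[:len(env) - lag], cases[lag:])
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


if __name__ == "__main__":
    # Made-up series: cases follow rainfall with a two-step delay.
    rainfall = [0, 5, 1, 8, 2, 9, 0, 4, 3, 7]
    cases = [1, 1, 0, 5, 1, 8, 2, 9, 0, 4]
    lag, r = best_lag(rainfall, cases, max_lag=4)
    print(lag, round(r, 3))
```

In practice the remotely sensed series would first have to be resampled to the same time step and spatial unit as the clinical data, which is why matching resolutions comes first.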

Keywords: infectious diseases, spatiotemporal modeling, environment, remote sensing.

Currently working on this:

  • OPEN!

Co-supervisor: Dr Alexandra Kulinkina

Ebola modeling

A collaboration with the IDDO (Infectious Diseases Data Observatory, via the University of Oxford), which has expertly curated the biggest Ebola dataset in the world.

Keywords: interdisciplinary, tabular data, diagnostics, prognostication

Currently working on this:

  • Ridha Chahed (MSc Semester Project)