iGH project proposals

Open for MSc semester projects, Interns and PhDs

The projects are modular and can be divided up into smaller chunks for shorter student stays. 



Updated July 2022

(with Unisanté)

See video

The WHO has developed a set of algorithms to guide clinicians through structured consultations in resource-limited settings. However, these algorithms are static, generic, rule-based decision trees printed on paper, derived from outdated data or extrapolated from non-representative populations.
MedAL is a suite of MEDical ALgorithms to develop data-driven versions of these paper guidelines. More specifically, MedAL-ai (which you will help develop in this project) aims to better adapt decision trees to changing trends in local data in such a way that it improves health outcomes and reduces resource consumption. The static version of such a tool (ePOCT without data-driven algorithms) is currently deployed in Tanzania and Rwanda and will guide over a million consultations.
You will work with the clinical and IT teams of a large interdisciplinary project (DYNAMIC) to build a novel pipeline that allows clinicians to generate, explore and implement interpretable machine learning algorithms derived from the data collected with the ePOCT tool.

Keywords: multidisciplinary, supervised ML, interpretable ML, visualisations, intelligent imputation, python, javascript, react native, web-based dashboard development.

(with Swiss TPH)

Emerging and re-emerging infectious diseases (e.g. COVID-19, Ebola, Dengue, Zika, Chikungunya, etc.) are defined by their unexpectedness.
As early detection of these novel patterns of disease is critical to their control, we propose to automate the detection of epidemiologically relevant patterns that could better guide and scale interventions, and alert clinicians to evolving trends.

A proof-of-concept clustering visualisation platform has been developed and integrated into medAL-ai (cf. first project). You will build on this tool to explore various metrics and approaches for semi-supervised and unsupervised anomaly detection. During this project, you will be closely supported and co-supervised by a medical doctor and guided by the requirements of the clinical team.
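
As a purely illustrative starting point for the anomaly detection mentioned above, the sketch below flags weeks whose case count rises well above a rolling baseline (a simple z-score rule). This is an assumption made for illustration and not the method used in medAL-ai; the function name `flag_anomalous_weeks`, the window length, the threshold and the example counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalous_weeks(counts, window=5, threshold=3.0):
    """Return indices of weeks whose count exceeds the rolling baseline.

    Hypothetical helper: a week is flagged when its count lies more than
    `threshold` sample standard deviations above the mean of the
    preceding `window` weeks.
    """
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        spread = stdev(baseline)
        if spread == 0:
            # Flat baseline: a z-score is undefined, so skip this week.
            continue
        z_score = (counts[i] - mean(baseline)) / spread
        if z_score > threshold:
            flagged.append(i)
    return flagged


# Made-up weekly case counts with one obvious spike at index 6.
weekly_cases = [10, 12, 11, 9, 10, 11, 40, 10]
print(flag_anomalous_weeks(weekly_cases))
```

A rule like this only catches univariate spikes; the project itself targets richer semi-supervised and unsupervised approaches on clinical data.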

Keywords: multidisciplinary, unsupervised ML, interpretable ML, visualisations++, intelligent imputation, python, web-based dashboard development.

(with MSF)

Contact me for internship/MSc Thesis opportunities with the MSF eCARE team.

Tasks are similar to the first 2 projects.

Keywords: multidisciplinary, supervised and unsupervised ML, interpretable ML, visualisations++, python, web-based dashboard development (Power BI)

(with MSF)

Contact me for internship/MSc Thesis opportunities with the MSF Antibiogo team.

Keywords: computer vision, decentralised and federated learning, python, full-stack dev, javascript



Lung ultrasound (LUS) is a non-invasive, pragmatic and time-tested tool for evaluating and discriminating respiratory pathology at the bedside. Indeed, when LUS is interpreted by expert clinicians, its predictive accuracy can match that of CT scanners.
Recently, ultrasound-on-a-chip technology has made LUS extremely portable (pluggable into a mobile phone) and cheap enough (USD 2,000 vs USD 30,000+) for use in resource-limited settings. This makes it particularly useful in COVID-19, where its portability enables decentralised respiratory evaluations (at home or in triage centres rather than in the hospital) and simple inter-patient disinfection.

However, while acquisition may be simple, interpretation is comparatively challenging, prone to subjective bias, and lacks standardisation.

Thus, we propose DeepChest, a deep learning approach leveraging transformer architecture for robust patient-level aggregation of individual images.

You will have a cleaned dataset of ultrasound images and videos and clinical tabular data. Your tasks will be to extend these methods to video and to explore ways to make the outputs interpretable to clinicians (saliency mapping, Grad-CAM, etc.).

Keywords: interdisciplinary, deep learning, computer vision, BERT, image analysis, tabular data, prognostication, diagnosis

(with HUG, Terre des Hommes)

Respiratory disease was a global crisis well before COVID-19. In children, it remains the leading cause of preventable death and also the primary context for antibiotic misuse.

Lung auscultation is an established clinical exam in the assessment of respiratory disease, but interpretation is highly subjective with considerable inter-user bias and poor accuracy.  

This project aims to use deep learning to identify the acoustic signatures of physiological and pathological lung sounds collected with a digital stethoscope.  

This is part of a close partnership with HUG, Geneva, who are coordinating the effort and have developed a novel stethoscope for data capture. You will work in a large multidisciplinary team, co-supervised by medical and ML experts.

Keywords: interdisciplinary, deep learning, sound analysis, tabular data, diagnostics, prognostication

(with MLO)

Sub-Saharan Africa suffers over a quarter of the global burden of disease, but despite this concentration of critical health information, the region currently produces less than 1% of global medical publications. In many clinics, data collection is still paper-based and clinicians are reluctant to collaborate with other parties due to important ethical and privacy concerns for their patients and the ownership of their intellectual research property.

You will explore approaches in distributed and federated learning to build a platform able to crowdsource models from multiple parties in a way that incentivises fair collaboration and high-quality interoperable data collection. Your models will be integrated into a prototype mobile data-entry application (already developed). You will be supported by an interdisciplinary team from EPFL (clinical and ML).
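
To make the idea of learning from multiple parties without pooling their data concrete, here is a minimal sketch of federated averaging (FedAvg), a common baseline in federated learning. It is an assumption made for illustration and not necessarily the approach this project will adopt; the function name `federated_average` and the toy numbers are hypothetical. Each party trains locally and shares only its model parameters, which are combined in proportion to the number of samples each party holds.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-client parameter vectors into one global vector.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes: number of local training samples held by each client.
    """
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("At least one client must hold training samples.")
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("All clients must share the same model shape.")
        for j, value in enumerate(weights):
            # Each client contributes in proportion to its data volume.
            averaged[j] += value * size / total
    return averaged


# Two hypothetical clinics: the second holds three times more records.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Weighting by sample count is only one possible choice; schemes that reward data quality or fairness between contributors would replace that weighting step.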

In collaboration with the DeAI project

Keywords: distributed ML, federated learning, privacy, predictive models (theory and practical), digital decolonialisation.

(with Swiss TPH)

Many infectious diseases that exhibit spatial and temporal trends have been linked to the environment. Remotely sensed data are a valuable source for predicting infectious disease dynamics because of their free availability, global coverage, and continued improvements in spatial and temporal resolution.

This project involves building upon an existing platform that enables pattern detection in health data from patients in Tanzania and Rwanda using unsupervised machine learning (see project 2). Your role will be to incorporate remotely sensed data into this platform so that it can be used to derive spatiotemporal associations with environmental parameters and to build predictive models that forecast spikes in infectious diseases.

More specifically, you will assess multiple remotely sensed data sources and select those that best match the spatial and temporal resolution of the available clinical data (e.g. MODIS, Landsat 8, Sentinel-2, ASTER Global Digital Elevation Model, Global Urban Footprint, etc.). During this project, you will be closely supported and co-supervised by a clinician, an environmental epidemiologist and a data scientist.

Keywords: infectious diseases, spatiotemporal modeling, environment, remote sensing.

(with MLO)

We have made a prototype app to track sitting posture in real time using keypoint detection.

Help us launch a clinical trial to test its utility.

Keywords: interdisciplinary, full stack app dev, clinical trial

(with the Infectious Disease Data Observatory)

Collaboration with the IDDO (Infectious Disease Data Observatory, via the University of Oxford) who have expertly curated the biggest Ebola dataset in the world.

We apply machine learning to explore clinical insights and specifically to compare the performance of centralised and decentralised learning.

Keywords: interdisciplinary, tabular data, diagnostics, prognostication, decentralised learning

(with SwissRE foundation)

Video: HERE

As COVID-19 sweeps across the globe, the world debates the efficacy of the various combinations and timings of public health policies implemented by governments.
Was it too early? Too late? Poorly adapted to the local culture and resources? How does the efficacy of a policy in one country compare to another?
For instance, social distancing is a luxury many people cannot afford in overcrowded, multi-generational informal dwellings, and their ability to adhere to policies is more quickly fatigued or supplanted by the mounting economic risks of staying at home.

One size certainly does not fit all. We have developed a set of models that explore the effect of the type and stringency of COVID-19 policies across the globe.

In this project, you will work on the What if… Interactive Global COVID Policy Simulator: a web-based interactive platform that allows the general public to toggle between various policy combinations and visualise the predicted effect of these hypothetical policies in various countries, based on the available data.

Keywords: COVID-19, R0, visualisations, LSTM, machine learning, Python, Javascript

A single training run of a complex deep learning transformer model for natural language processing can emit 15 million kilograms of CO2eq: equivalent to 15’000 (return!) flights between London and Paris. Perhaps surprisingly, there is currently little research in sustainable AI and only a few tools to monitor the environmental impact of ML projects. To this end, we developed:

  1. CUMULATOR: a fully operational open-source API to quantify and report the carbon footprint of ML models, and
  2. a Carbon Impact Statement Protocol to report the carbon footprint of research projects.

You will explore approaches to improve and personalise CUMULATOR’s estimations of the carbon footprint and optimise accuracy/time/carbon footprint efficiency trade-offs on distributed ML training methods.
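
For intuition, a first-order carbon footprint estimate multiplies the energy a training run consumes by the carbon intensity of the local electricity grid. The sketch below illustrates that arithmetic only; it is not CUMULATOR's API, and the function name `estimate_carbon_footprint_kg` as well as every number in the example (hardware power draw, duration, grid intensity) are hypothetical assumptions.

```python
def estimate_carbon_footprint_kg(power_watts, hours, intensity_g_per_kwh):
    """Rough CO2eq estimate (in kg) for a single training run.

    power_watts: average power draw of the hardware (assumed constant).
    hours: wall-clock duration of the run.
    intensity_g_per_kwh: grid carbon intensity in gCO2eq per kWh.
    """
    if power_watts < 0 or hours < 0 or intensity_g_per_kwh < 0:
        raise ValueError("Inputs must be non-negative.")
    energy_kwh = power_watts * hours / 1000.0
    # Convert grams of CO2eq to kilograms.
    return energy_kwh * intensity_g_per_kwh / 1000.0


# Illustrative values: one 250 W accelerator running for 24 h on a grid
# emitting 400 gCO2eq per kWh.
print(estimate_carbon_footprint_kg(250, 24, 400))
```

Personalising such an estimate means replacing the constant inputs with measured power draw and with the grid intensity at the time and place of the computation.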

Keywords: machine learning, sustainable AI, carbon footprint

(with Ifakara Health Institute, Tanzania)

In resource-limited settings with poor access to health care, many people die in their communities without having received a medical examination, and burials may even proceed without formal registration. Thus, these deaths are missed in a country’s vital statistics, leaving gaps in the surveillance of new disease outbreaks and resulting in a misrepresentation of a community’s health needs.
To capture these vital events, the WHO has formulated a standardised questionnaire that aims to infer the cause of death from structured interviews of the deceased’s family when deaths are reported during census efforts. However, the process of evaluating this data is laborious and requires many hours of work from medical doctors. The WHO has complemented these questionnaires with several decision trees to help guide clinicians or automate the estimation. Unfortunately, these decision trees are static and generic, unable to adapt to changing epidemiology (e.g. COVID) or new environments.

As a first step in this project, we propose that you explore unsupervised/clustering methods to evaluate the outputs of the decision trees to help clinicians detect evolving patterns in this critical data.
You will be supervised by a specialist in verbal autopsies in Tanzania, a medical doctor and Prof. Jaggi.

Keywords: unsupervised machine learning, clustering, tabular data, NLP (possibly).

  • We’re waiting for the data before opening this project…



EPFL/Exchange students: apply with this form

AIMS/AMMI students: apply via the ML African Academic Exchange Programme

Feel free to contact me for further information 🙂