Student Projects

*** NEW *** (Jan27_2026)

Master’s Thesis Project

Investigating Genomic Data Transformations for Predicting Disease Outcomes using Machine Learning

Keywords : Genomics, Disease Classification, Deep Learning

Description

Genomic data is large in scale, often comprising millions of features (single nucleotide polymorphisms, SNPs). Each feature, although encoded numerically, represents a categorical change in DNA: 0 for wild type (same as the reference), 1 for heterozygous (mutation on a single chromosome), and 2 for homozygous (mutation on both chromosomes). This high dimensionality and categorical encoding make raw genomic data poorly suited to disease prediction with machine learning and deep learning models. To keep analyses tractable and reduce redundancy, current methods perform aggressive dimensionality reduction (e.g., principal component analysis) or restrict the feature set to a subset of variants chosen without any biological rationale (e.g., the most variable).
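
As a minimal illustration of the encoding and of the variance-based filtering described above, the following Python sketch counts non-reference alleles per genotype and keeps the most variable SNPs. The function names, the toy genotypes and the cut-off k are illustrative assumptions, not part of the project; real analyses would work on far larger matrices with dedicated tools.

```python
from statistics import pvariance


def encode_genotype(alleles, ref):
    """Count non-reference alleles in a diploid genotype (0, 1 or 2)."""
    return sum(1 for allele in alleles if allele != ref)


def most_variable_snps(matrix, k):
    """Return the sorted column indices of the k SNPs with the highest variance."""
    n_snps = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_snps)]
    ranked = sorted(range(n_snps), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): 4 individuals x 3 SNPs with reference alleles A, C, G.
refs = ["A", "C", "G"]
raw = [
    [("A", "A"), ("C", "T"), ("G", "G")],
    [("A", "A"), ("T", "T"), ("G", "G")],
    [("A", "G"), ("C", "C"), ("G", "G")],
    [("A", "A"), ("C", "T"), ("G", "G")],
]

# Encode every genotype as 0 (wild type), 1 (heterozygous) or 2 (homozygous).
genotypes = [[encode_genotype(g, r) for g, r in zip(row, refs)] for row in raw]

# Keep the two most variable SNPs; the third SNP never varies and is dropped.
selected = most_variable_snps(genotypes, k=2)
```

Selecting by variance alone is exactly the kind of filter that ignores biology, which is the limitation the project sets out to address.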

This research will investigate the transformation of genotype data into compact, biologically meaningful features suitable for disease prediction, by leveraging biological databases and transformations. It will also investigate the transformation of genomic data using foundational DNA models, which have shown promise at capturing population-level patterns in DNA. Recent models, e.g. AlphaGenome, promise to revolutionise such analyses by transforming genomic data into continuous biological measurements. Transforming genomic data using polygenic risk scores (PRS) or foundational DNA models could therefore improve genomic disease prediction with machine learning and deep learning techniques.

Requirements:

We are looking for a motivated student to participate in this interdisciplinary project. The student should have data science and programming skills in Python and previous experience with machine learning (PyTorch, scikit-learn). Experience with genomic data types, bioinformatics tools and GitHub is valuable but not required.

Supervision:  Barry Ryan & Jacques Fellay

Expected duration: 3-4 months
If you are interested, please contact Barry Ryan ([email protected]).

********************************************************************

*** NEW *** (Jan29_2026)

Project Description: Enhancing Cardiovascular Disease Prediction by Integrating Rare Variants into Polygenic Risk Scores 

Primary Goal: This project aims to improve the predictive power of polygenic risk scores (PRS) for cardiovascular disease (CVD) by integrating information from rare genetic variants. The current standard for PRS relies predominantly on common variants; this research will investigate whether incorporating rare, high-impact variants can lead to a more accurate risk assessment tool.
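
For orientation, a PRS is commonly computed as a weighted sum of an individual's allele counts, with per-variant weights taken from association summary statistics. The Python sketch below shows that calculation only; the effect sizes and the genotype are made-up illustrative values, and the function name is an assumption rather than part of any project codebase.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele counts (0/1/2) and per-variant effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


# Hypothetical per-variant effect sizes (e.g. log odds ratios) for three variants.
betas = [0.5, -0.25, 1.0]

# Hypothetical genotype of one individual at the same three variants.
individual = [2, 1, 0]

score = polygenic_risk_score(individual, betas)
```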

Objectives:

  1. Construct a baseline CVD PRS using established common genetic variants from the UK Biobank.
  2. Identify and annotate rare genetic variants within the same cohort.
  3. Explore and implement statistical methods to integrate these rare variants into the PRS model.
  4. Evaluate whether the new integrated model outperforms the baseline PRS in predicting cardiovascular disease.
  5. (If time allows) Explore the generalizability of the model by testing it on data from the Swiss HIV Cohort Study (SHCS).
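
The comparison in objective 4 can be sketched as follows, assuming the area under the ROC curve (AUC) as the metric and a simple additive rare-variant burden term as the integration method. Both are assumptions chosen for illustration; the project leaves the choice of statistical method open. All values below are toy data, not results.

```python
def pair_score(case_score, control_score):
    """1 if the case outranks the control, 0.5 on a tie, 0 otherwise."""
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0


def auc(scores, labels):
    """Probability that a random case scores higher than a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    total = sum(pair_score(c, k) for c in cases for k in controls)
    return total / (len(cases) * len(controls))


# Toy cohort (hypothetical): 1 = CVD case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
common_prs = [0.9, 0.4, 0.2, 0.5, 0.3, 0.1]

# Number of rare high-impact variants carried by each individual (hypothetical).
rare_burden = [0, 1, 0, 0, 0, 0]

# Assumed weight for the burden term; in practice it would be estimated from data.
rare_weight = 0.5
integrated = [p + rare_weight * b for p, b in zip(common_prs, rare_burden)]

baseline_auc = auc(common_prs, labels)
integrated_auc = auc(integrated, labels)
```

In a real evaluation the two models would be compared on held-out individuals, with confidence intervals, rather than on the data used to fit the weights.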

Datasets: The primary dataset for development and testing will be the UK Biobank. The Swiss HIV Cohort Study (SHCS) will serve as an optional external dataset for validation, offering a unique use case in a specific patient population.

Outcome: The project will deliver a validated methodology for rare variant integration, a comparative analysis of model performance, and a finalized, well-documented codebase. The results will determine the value of rare variants for enhancing CVD risk prediction.

Requirements: We are looking for a motivated student to participate in this interdisciplinary project. The student should have data science and programming skills in Bash and R. Experience with genomic data types, bioinformatics tools and Docker is valuable but not required.

Supervision:  Simon Boutry, Christian Thorball & Jacques Fellay

Expected duration: 3-4 months

If you are interested, please contact Simon Boutry ([email protected]).

**********************************************

Master Thesis Projects

Master Thesis Projects may start once the 90 credits of the Master cycle have been obtained.

Projects should last 17 weeks – they are worth 30 credits.

Master Thesis Projects must be done individually.

Semester Projects and Lab Immersions

LST and BIOING students may do one semester project and several lab immersions during their Master studies.

Semester projects are worth 12 credits; lab immersions are worth 8 credits.