Swiss Data Science Center

This page lists the Swiss Data Science Center projects available to EPFL students. The SDSC is a joint venture between EPFL and ETH Zurich. Its mission is to accelerate the adoption of data science and machine learning techniques within academic disciplines of the ETH Domain, the Swiss academic community at large, and the industrial sector. In particular, it addresses the gap between those who create data, those who develop data analytics and systems, and those who could potentially extract value from it. The center is composed of a large multi-disciplinary team of data and computer scientists, and experts in select domains, with offices in Lausanne and Zurich.

Projects – Spring 2021

It may be possible to shorten a thesis project into a semester project, or to extend a semester project so that it is suitable for a thesis. If any of the present or past projects interests you, please feel free to contact us. We always look forward to meeting motivated and talented students who want to work on exciting projects.

Laboratory:
Swiss Data Science Center

Type:
Master Project

Description:
Variational autoencoders [1,2] are unsupervised deep learning models that learn low-dimensional latent representations of the input data. Previous work has shown that these latent representations capture the most relevant features of the data, which can be used directly for physical understanding and also as input to other machine learning tasks such as clustering, forecasting or extreme event detection. Here, we will mostly focus on understanding the latent representations of physical systems, such as the Lorenz attractor or data representing climate systems. The goal is to disentangle the representations, so that each latent feature ideally captures one driver of the dynamics.
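As a rough illustration of the training objective introduced in [1], the sketch below computes the two terms of a variational autoencoder loss with NumPy: a reconstruction error and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. The function names are hypothetical, and the squared-error reconstruction term is an assumption (it corresponds to a Gaussian decoder with fixed variance, up to scaling and constants). An actual project would use PyTorch modules for the encoder and decoder.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def vae_loss(x, x_recon, mu, logvar):
    """Batch-averaged negative ELBO, up to constants.

    Assumption: squared error is used as the reconstruction term.
    """
    x = np.asarray(x, dtype=float)
    x_recon = np.asarray(x_recon, dtype=float)
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return float(np.mean(recon + gaussian_kl(mu, logvar)))
```

The KL term vanishes when the posterior equals the prior and grows as the encoder moves away from it, which is the pressure that keeps the latent space compact.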

Goals/benefits:

  • Working with machine learning and deep learning libraries in Python (pandas, scikit-learn, PyTorch)
  • Becoming familiar with the analysis of time series (power spectra)
  • Advancing research on an interdisciplinary problem
  • Possibility to publish a research paper

Prerequisites:

  • Machine learning and deep learning (advanced or intermediate skills)
  • Python (advanced skills)
  • Interest in interdisciplinary applications

Deliverables:

  • Well-documented code
  • Written report and oral presentation

References:
[1] D. Kingma, M. Welling, “Auto-encoding variational Bayes”, 2013

[2] I. Higgins, D. Amos, D. Pfau, S. Racaniere, L. Matthey, D. Rezende, A. Lerchner, “Towards a definition of disentangled representations”, 2018

Contact:
Eniko Székely ([email protected])
Natasha Tagasovska ([email protected])

Description: 
Dynamical systems such as the climate are highly nonlinear. Although the observations are high-dimensional, most of the dynamics is captured by a small number of physically meaningful patterns. In this project we will apply unsupervised dimension reduction techniques for feature extraction, specifically Nonlinear Laplacian Spectral Analysis (NLSA) [1], to extract features from potential vorticity at stratospheric levels. NLSA uses information on the past trajectory of the data, which allows us to extract representative temporal and spatial patterns. We will compare the results with linear techniques such as Principal Component Analysis.
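To make the idea of using the past trajectory concrete, the sketch below shows a time-lagged (delay) embedding followed by Principal Component Analysis computed through an SVD. This is only the linear baseline applied to lagged data, not NLSA itself, which additionally builds a kernel on the embedded data and uses Laplacian eigenfunctions [1]. The function names, the window length `q` and the synthetic sine signal are assumptions made for illustration.

```python
import numpy as np


def delay_embed(x, q):
    """Stack q consecutive snapshots into one vector per time step.

    x: array of shape (n_samples,) or (n_samples, n_features)
    returns: array of shape (n_samples - q + 1, q * n_features)
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0] - q + 1
    if n < 1:
        raise ValueError("q must not exceed the number of samples")
    # Block i holds the series shifted by i time steps.
    return np.hstack([x[i:i + n] for i in range(q)])


def pca(x, n_components):
    """PCA via SVD of the centered data matrix.

    returns: (scores, components, explained variance ratio)
    """
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    ratio = s**2 / np.sum(s**2)
    return scores, vt[:n_components], ratio[:n_components]


# Synthetic example: a single sine wave, embedded with a window of 10 lags.
t = np.arange(200) * 0.1
embedded = delay_embed(np.sin(t), q=10)
scores, components, ratio = pca(embedded, n_components=2)
```

For a pure sinusoid, the lagged columns are all combinations of one sine and one cosine, so two components are enough to describe the embedded data. Real climate fields would need more components and careful choice of the window length.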

Goals/benefits:

  • Working with machine learning libraries in Python (pandas, scikit-learn)
  • Advancing research on an interdisciplinary problem
  • Possibility to publish a research paper

Prerequisites:

  • Linear algebra
  • Machine learning (intermediate skills)
  • Python (intermediate skills)
  • Interest in interdisciplinary applications

Deliverables:

  • Well-documented code
  • Written report and oral presentation

References:

  1. D. Giannakis and A.J. Majda. Nonlinear Laplacian Spectral Analysis: Capturing intermittent and low-frequency spatiotemporal patterns in high-dimensional data, Statistical Analysis and Data Mining, 2012
  2. E. Székely, D. Giannakis, A.J. Majda. Extraction and predictability of coherent intraseasonal signals in infrared brightness temperature data, Climate Dynamics, 2016
  3. M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering, NeurIPS, 2001

Contact: 
Eniko Székely ([email protected])
Raphaël de Fondeville ([email protected])