Student projects

At LIONS, we are concerned with optimized information extraction from signals or data volumes. We therefore develop mathematical theory and computational methods for information recovery from highly incomplete data.

We divide our research into two synergistic theory thrusts: information scalable optimization and data acquisition, and learning theory and methods for low-dimensional signal models.

Our team of Postdocs and Phd students work on different aspects of these interesting and challenging problems, from theoretical analysis of methods and algorithms to efficient implementations to solve real-world problems.

Choosing our laboratory for your project will enable you to work in an exciting and stimulating environment, allow you to quickly learn state-of-the-art mathemathical and algorithmic tools and deliver an outstanding project.

Below, you can find some project that our team is currently working on and would appreciate your collaboration. You are also welcome to come with your own project topic that matches with our expertise and interests.

Please contact Prof. Volkan Cevher for further information on the projects.

Below projects are reserved for master students only. PhD students should consult LIONS open position page.

Open Projects

Even though multiplicative and higher-order correlations are ubiquitous in machine learning and related fields, their inductive biases are not well understood yet. Higher-order polynomials have demonstrated success when trained on specific tasks, however what is the inductive bias of the polynomial structure when untrained? The seminal study of ‘Deep Image Prior’ [1] sheds light on feed-forward neural networks, however a related study is yet to be conducted for polynomial neural networks. The study of ‘Multiplicative Interactions and Where to Find Them’ [2] proves that second-order interactions enlarge the hypothesis space of feed-forward networks, hence the untrained polynomial networks should substantially improve the results of untrained feed-forward networks. An additional focus of the project is to illustrate the different inductive biases of polynomial network variants [3]. 
[1] ‘Deep Image Prior’, Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky, CVPR 2018.
[2] ‘Multiplicative Interactions and Where to Find Them’, Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu, ICLR 2020.
[3] ‘Π−nets: Deep Polynomial Neural Networks’, Grigorios G. Chrysos, Stylianos Moschoglou, Giorgos Bouritsas, Yannis Panagakis, Jiankang Deng, Stefanos Zafeiriou, CVPR 2020.
If interested please contact Dr Grigorios Chrysos for details.
Using polynomial expansion, such Taylor series, is a well-established tool in functional analysis for approximating functions. Even though in machine learning, higher-order correlations (and even polynomial expansions) have emerged for approximating functions, their properties have not been studied yet. Specifically: 
a) the initialization of the network is a crucial component for learning (deep) feed-forward neural networks [1], [2]. There has been no theoretical analysis to date on the initialization of polynomial neural networks.
b) terms of high-degree might be unstable outside the [-1, 1] value range. A study of the stability in such cases will be significant. New normalization schemes [3] might be beneficial for polynomial neural networks. 
[1] ‘Understanding the difficulty of training deep feedforward neural networks’, Xavier Glorot, Yoshua Bengio, AISTATS 2010.
[2] ‘Delving deep into rectifiers: Surpassing human-level performance on imagenet classification’, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, ICCV 2015.

[3] ‘Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift’, Sergey Ioffe, Christian Szegedy, ICML 2015.

If interested please contact Dr Grigorios Chrysos for details.

Langevin dynamics is a first-order sampling algorithm that has gained a lot of attention in the last years. Its similarity to gradient descent in optimisation makes us wonder whether we can transpose existing technique in optimization to sampling. In particular, we would like to design an adaptive version of Langevin dynamics, in the similar fashion as AdaGrad, and prove convergence guarantees with similar improvement as in optimization.

If interested please contact Paul Rolland for details.

This projects aims to find a robust optimization method that will perform well in many scenarious for deep learning. A student should be already familiar how to train neural networks.

If interested please contact Yurii Malitskyi for details

This projects aims to create a Python library that collects many real world variational inequality (saddle point problem) instances and some popular solvers.

If interested please contact Yurii Malitskyi for details

It has been empirically shown that carefully ordering the training examples improve the learning speed of SGD based algorithms. There very few works that attempt to provide some theoretical insights into this phenomenon. In this project, we aim to investigate (both theoretically and empirically) the curriculum strategies for adaptive optimization methods. 

If interested please contact Parameswaran Kamalaruban for details.

  1. Low-power analog-to-digital conversion for multichannel neural recording
  2. Hardware-friendly selective feature extractors for neural classification tasks
  3. Ultra-wideband (UWB) wireless data transfer for neural recording implants

If interested, please submit your applications to [email protected] with your CV, statement of results and a short paragraph describing your motivation and background to undertake the project.

Details of the above projects can be found here

Machine learning classifiers working on high dimensional data exhibit susceptibility to so called “adversarial examples”, wildly misidentifying inputs presented to them when modified with specifically crafted  small perturbations. While part of this behavior has been explained by the concentration of measure phenomenon and types of features learned by the classifiers, there seem to be additional drivers of adversarial risk not yet captured by current frameworks. This project – which can be adapted to either Masters or Semester projects – investigates these drivers, aiming at mathematical analysis, the creation of toy problems and benchmarks which can be used to gauge the sources of adversarial susceptibility a given model contains, as well as their severity. This is a challenging project and for both master or semester projects, you should be comfortable with Julia, pytorch or tensorflow, as well as have general background on the field.

If interested please contact Igor Krawczuk for details

Characterising novel circuit elements and developing compact behavioural models is an important step before they can be used in circuit design. However creating and tuning these models can be a tedious process. This project is a first exploration of using a SPIRAL inspired algorithm for the synthesis of compact models by synthesising SPICE circuit models.

If interested please contact Igor Krawczuk for details

This is part of a large scale project concerned with studying way to fully automate the design and layouting of ASICS. Possible projects touch on reinforcement learning, generative models, representation learning and theoretical work on how to represent and decompose action and state spaces.

If interested please contact Igor Krawczuk for details

Neuromorphic  machine  learning  hardware  uses device physics  like  the  switching  dynamics  of memristive devices or the charge dynamics of a capacitor to perform computation. To estimate the efficacy of neuromorphic machine learning hardware, one needs to marry together two computationally intensive tasks: we need to run machine learning algorithms on problems  representative of those faced in reality, while at the same time modeling the device physics used to perform the computations. This is somewhat unique workload since the total  training  time  will  be  orders  of  magnitude  longer  than  what  is  easily  feasible  in  circuit simulations, while possibly depending on the interactions of varying subsets of individual devices. This means that we will need to mix both abstract and detailed models, since global abstraction into behavioral or probabilistic models is risky.  To accommodate this special need, we  are developing a multi-model simulation framework.

This framework is is intended to minimize the computational load for the physical simulations by properly adapting the model to the desired accuracy and scale well onto GPU and cluster environments. In this project you will contribute to this framework by collecting existing device  models  for memristive devices from the literature and implement them as compute kernels. The project complexity can be scaled to both Master and Semester projects. Estimated workload breakdown: 

30 % literature survey

50 % model selection and implementation in Julia 

20 % Integration into framework and verification

If interested please contact Igor Krawczuk for details

Neural network models are increasingly deployed in safety critical and data sensitive  settings, for example in medical diagnostics. This project concerns the study of the interactions of various strategies to learn disentangled latent representations with interpretability and techniques meant to help user understand and reason about the features used by the network to give a certain output, their sensitivity to various forms of adversarial perturbations and distributional shifts as well as their susceptibility to attacks like the model inversion attack which is a concern when private data is used to train widely distributed models.

If interested please contact Igor Krawczuk for details

Details of the project can be found here 

If interested please contact Thomas Sanchez for details

We consider the problem of finding a global maximizer (or minimizer) of an unknown objective function from point evaluations. Bayesian optimization prescribes a prior belief over the possible objective functions and then sequentially refines this model as data are observed.

The student will develop new methods for Bayesian optimization that are both theoretically well founded and practically applicable.The methods will be tested on various real-world problems such as finding a good set of hyperparameters for deep neural networks or discovering molecules with good chemical properties.

Keywords: Bayesian optimization, Gaussian process, Multi-armed bandits

Requirements: Python, machine learning, familiarity with multi-armed bandits and submodular optimization is a plus.

References: [4], [1], [3], [2]


[1] Ilija Bogunovic, Jonathan Scarlett, and Volkan Cevher. Time-Varying Gaussian Process Bandit Optimization. In International Conference on Artifcial Intelligence and Statistics (AISTATS), 2016.

[2] Ilija Bogunovic, Jonathan Scarlett, Andreas Krause, and Volkan Cevher. Truncated Variance Reduction: A Uni_ed Approach to Bayesian Optimization and Level-Set Estimation. In Conference on Neural Information Processing Systems (NIPS), 2016.

[3] Josip Djolonga, Andreas Krause, and Volkan Cevher. High-dimensional gaussian process bandits. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 1025-1033. Curran Associates, Inc.,2013.

[4] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P Adams, and Nando de Freitas. Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1):148-175, 2016.

The high-throughput screening of large databases of novel materials candidates constitutes a central goal of the recently awarded MARVEL NCCR grant. Given a training set of compounds with pre-calculated quantum mechanical properties, we seek to construct supervised machine learning models that accurately infer the corresponding properties for similar materials with correctness guarantees [1-4].


[1] A. P. Bartók, R. Kondor and G. Csányi , On representing chemical environments, Phys. Rev.., vol. 87, pp. 184115-1–184115-16, 2013.

[2] S. De, A. P. Bartók, G. Csányi and M. Ceriotti, Comparing molecules and solids across structural and alchemical space, Phys. Chem. Chem. Phys., vol. 20, pp. 13754–13769, 2016.

[3] Y. Nesterov, Introductory lectures on convex optimization. A basic course. Kluwer Academic Publishers,Boston, MA, 2004.

[4] M. Rupp, Machine Learning for Quantum Mechanics in a Nutshell, Quantum Chemistry, vol. 115(16), pp. 1058–1073, 2015.

Submodular functions are set functions, i.e., functions defined on the power set 2V of a given ground set V,, that satisfy a diminishing return property. They occur naturally in several applications in computer vision, machine learning, economics and theoretical computer science. Submodular functions have very interesting links to convex analysis and can be minimized exactly and maximized approximately by efficient algorithms with provable guarantees.

The objective of this project is to develop methods to improve on the theoretical and/or practical efficiency of submodular minimization/maximization algorithms in offline or streaming settings.

Requirements: MATLAB, familiarity with Submodular and Convex Optimization, and Machine Learning.


[1] Fujishige, Satoru. \Submodular functions and optimization”. Vol. 58. El-sevier, 2005.

[2] Bach, Francis. \Learning with submodular functions: A convex optimization perspective.” arXiv preprint arXiv:1111.6453 (2011).

[3] Iyer, Rishabh, and Je_ Bilmes. \Polyhedral aspects of Submodularity, Convexity and Concavity.” arXiv preprint arXiv:1506.07329 (2015).

Bundle methods for non-smooth optimisation were widely used in the literature [1,2,3]. The main idea of this method is to construct a piecewise linear approximation to the objective function. This construction is based on the sub-gradient of the previous iteration.

The objective of this project is to develop stochastic bundle methods and investigate their convergence, for solving non-smooth optimization of minimizing the sum of two functions f+g, where f is proximable and g can be decomposition into the finite sum of g_i.


[1] Yu Du, Andrzej Ruszczynski, Rate of Convergence of the Bundle Method

[2] Alexander J. Smola, S.V. N. Vishwanathan, Quoc V. Le, Bundle Methods for Machine Learning.

[3] C. H. Teo, S.V. N. Vishwanathanm A. Smola, Quoc. V.le, Bundle Methods for Regularized Risk Minimization, Journal of Machine Learning Research 11 (2010) , 311-365

Closed projects

We are interested in reducing the scan times in Magnetic Resonance Imaging (MRI) using compressive sensing techniques. For this purpose, we will design sampling mechanisms via a new learning-based approach, which unifies learning theory and the sampling theory. We also consider extensions to multi-coil MRI and provide demonstrations with real scans.

Keywords: Compressive Sensing, Fourier Transform, Image Reconstruction.

Requirements: MATLAB, Signal/Image Processing knowledge, convex and combinatorial optimization

In the inverse reinforcement learning (IRL) setting, the reward of the environment is not defined. Instead, the agent is provided with a set of expert demonstrations, and the agent tries to recover a reward function such that the expert behavior is optimal. Typically in IRL, the transition dynamics of the environment are assumed to be known. And only a few recent works attempt to learn both the reward and the transition dynamics via little interaction with the environment or in a truly-batch manner. In this project, we revisit this problem more systematically in light of new advancements in model-based RL. 

Person of  contact was Parameswaran Kamalaruban

In the inverse reinforcement learning (IRL) setting, the reward of the environment is not defined. Instead, the agent is provided with a set of expert demonstrations, and the agent tries to recover a reward function such that the expert behavior is optimal. Consider a setting where expert demonstrations are provided for only a few environments. Based on these demonstrations, we aim to learn a policy that is robust against a larger class of environments (MDPs).

Person of contact was  Parameswaran Kamalaruban

With the massive increase of available data for machine learning research, designing optimization algorithms that could handle large volumes of data is getting more and more important. For that purpose, we are interested in developing convex optimization algorithms that can be used in distributed environment.

In this project, the algorithms will be implemented and tested using Google’s machine learning library TensorFlow.

Keywords: convex optimization, distributed optimization, tensorflow

Requirements: Python, C++, familiarity with optimization is a plus

Details of the project can be found here: Neural Network Based Hardware Decoder Design

“Although this may seem a paradox, all exact science is dominated by the idea of approximation.”

–Bertrand Russell (1872–1970)

We are interested in developing approximation algorithms with provable estimation guarantees for structured sparse representations of spatial images and temporal data streams.

Students with experience in Hadoop MapReduce and/or Cuda programming are given priority.

Keywords: approximation guarantee, dynamic programming, greedy algorithm, submodularity, union-of-subspace models, randomized algorithms

Requirements: Matlab, C/C++

We are interested in developing algorithms that enable robust decompositions of large-scale data matrices for learning applications.

Keywords: randomized algorithms, fixed-rank approximation problem, approximation guarantees

Requirements: Matlab or C/C++

We are interested in studying non-iid, parametric and non-parametric prior distributions with almost sure compressibility properties. This project pays particular attention to exchangeable distributions defined over binary and quad-trees.

Keywords: compressible priors, order statistics, almost sure convergence, De Finetti’s theorem

Requirements: Matlab, C/C++

We are interested in the application of extremal combinatorial objects in compressive sensing problems, such as randomness extractors and expanders. This project pays particular attention to a recent development in information theory: polar codes.

Keywords: randomness extractors, expanders, polar codes, compressive sensing

Requirements: Matlab or C/C++

We are interested in characterizing the (set) restricted singular values of pseudo-random matrices.

Keywords: matrix martingales, tail bounds, concentration-of-measures

Requirements: Matlab, C/C++

We are interested in making inferences from biological data in order to understand the processes that cause a disease, potentially leading to the identification of therapies.

Within the machine learning framework of supervised linear regression, we want to exploit structured sparse models to both improve prediction performance and interpretability of the results. Structured sparse model encompass models such as hierarchical trees, group structures or union of subspaces model, which offer the possibility of estimating a linear model that can be more easily interpreted by clinicians.

Our aim is to develop and characterize new algorithms and structures to be applied to cancer data and neuroimaging data. This project had both theoretical and computational aspects.

In the classical Markowitz portfolio optimization, one is interested in allocating a fixed amount of capital among a set of assets so as to minimize the risk/variance of the portfolio return, subject to a target empirical return. Assuming that the true mean and covariance of the set of assets are known, the Markowitz portfolio problem has a closed form solution. In practice though, this assumption is unrealistic and we are usually restricted to some empirical estimates of the mean and the covariance from recorded data. Unfortunately, such compromise introduces several problems: in general, financial data are well-known to be non-stationary (which limits the amount of meaningful data in the estimates) while classic ML techniques are robust only in the asymptotic sense where a vast amount of data is required for a good estimate. We aim to improve on these approaches by exploiting other properties of the data, such as the sparsity in the covariance matrix or other low-dimensional assumptions.

During this project, the student learn modern techniques in sparse covariance matrix estimation as well as new scalable optimization algorithms. Our aim is to develop and characterize covariance estimators from a limited amount of data, leveraging sparse promoting and other low-dimensional models in the estimation. As part of the project, the student applies this knowledge to real world data and track progress on the performance of the designed portfolios. This project has both theoretical and computational aspects.

Recovering a large matrix from a small subset of its entries is a challenging problem arising in many real applications, such as recommender systems. Many existing approaches formulate this problem as a general low-rank matrix approximation problem, which may be used to impute unobserved entries. This approach leads to good performance even when very few are known.

We are interested in bandit optimization in this setting which introduces the exploration/exploitation tradeoff, where we actively decide on the sampling positions to optimize a given convex criteria.

We propose a dimensionality reducing matrix design based on training data with constraints on its Frobenius norm and number of rows. Our design criteria is aimed at preserving the distances between the data points in the dimensionality reduced space as much as possible relative to their distances in original data space. This approach can be considered as a deterministic Bi-Lipschitz embedding of the data points.

We introduce a scalable learning algorithm, dubbed AMUSE, and provide a rigorous estimation guarantee by leveraging game theoretic tools. We also provide a generalization characterization of our matrix based on our sample data. We use compressive sensing problems as an example application of our problem, where the Frobenius norm design constraint translates into the sensing energy.

Link to publication.

We consider the problem of solving large-scale optimization problems with matrix decision variables. For such problems, classical convex optimization algorithms scale poorly even at moderate data sizes. On the other hand, through cleverly employing the structures of real-world problems, non-convex alternating minimization methods succeed at providing efficient and accurate solutions, albeit sometimes without theoretical guarantees.

In this project, the student will develop novel alternating minimization algorithms, and examine their theoretical convergence. The student will test the applicability of the new methods on various real-world problems including phase retrieval and matrix completion.

Keywords: Alternating minimization, Phase retrieval, Matrix completion.

Requirements: Matlab or Python, basic knowledge in optimization such as gradient descent and its convergence rate, familiarity with phase retrieval or matrix completion is a plus.

References: [1], [2], [3], [4].


[1] Emmanuel J Candes, Xiaodong Li, and Mahdi Soltanolkotabi. Phase retrieval via wirtinger ow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985-2007, 2015.

[2] David E Carlson, Volkan Cevher, and Lawrence Carin. Stochastic spectral descent for restricted boltzmann machines. In AISTATS, 2015.

[3] Moritz Hardt. Understanding alternating minimization for matrix completion. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 651-660. IEEE, 2014.

[4] Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 665-674. ACM, 2013.