Student Projects

If you are interested in working with us, here are some additional projects that we would be happy to work on!

Combinatorial Learning Theory: Can we learn algorithms with polynomially many samples?

In this project, we formalize algorithms through Turing machines and consider the hypothesis space H of all halting Turing machines (i.e., all computable functions). In our learning scenario, we observe computational traces of an unknown ground-truth Turing machine T*. That is, for a given input x, we observe the computation of T* on the tape until the final output T*(x) is reached. In other words, we observe not only the input-output pair (x, T*(x)), as in classical learning theory, but also the Chain of Thought (CoT) leading to the final answer. We denote this CoT by T*[x]. The question we are interested in is whether we can correctly identify T* from a number of samples polynomial in the number of states of T*. We will also consider relaxations and generalizations of this problem with finite-state automata, as well as some interesting learning theory and grammatical inference papers from the last century.
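
To illustrate the observation model, here is a minimal Python sketch of a toy Turing machine simulator that records the full trace T*[x] alongside the output. The machine, its encoding, and all names are our own illustrative choices, not part of the project itself.

```python
# Sketch of the observation model: for an input x, the learner sees the whole
# computational trace T*[x] (a sequence of configurations), not just (x, T*(x)).

def snapshot(tape, blank="_"):
    """Render the currently written portion of the tape as a string."""
    if not tape:
        return blank
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, blank) for i in range(lo, hi + 1))

def run_with_trace(delta, x, start="q0", halt="qH", blank="_", max_steps=10_000):
    """Run the machine given by `delta` on input `x` and record its trace.

    `delta` maps (state, read_symbol) -> (next_state, written_symbol, move in {-1, +1}).
    Returns (trace, output); the trace plays the role of the CoT T*[x].
    """
    tape = dict(enumerate(x))
    state, head, trace = start, 0, []
    for _ in range(max_steps):
        trace.append((state, head, snapshot(tape, blank)))
        if state == halt:
            break
        state, tape[head], move = delta[(state, tape.get(head, blank))]
        head += move
    return trace, snapshot(tape, blank).strip(blank)

# Toy example: a two-state machine that flips every bit and then halts.
delta = {("q0", "0"): ("q0", "1", +1),
         ("q0", "1"): ("q0", "0", +1),
         ("q0", "_"): ("qH", "_", +1)}
trace, output = run_with_trace(delta, "0110")
print(output)        # "1001"
print(len(trace))    # number of configurations the learner gets to observe
```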

The project will require familiarity with computability theory, Turing machines, and finite-state automata. The project will also involve combinatorial and algorithmic reasoning.

Contact person: Hristo Papazov

In-context learning as approximate Bayesian inference

A significant body of recent work has sought to demystify the in-context learning (ICL) capabilities of transformers by framing them as implicit optimization algorithms. For canonical tasks like in-context linear regression, compelling evidence suggests that a transformer’s forward pass can emulate one or more steps of Gradient Descent (GD) on a least-squares objective. This has led to a powerful and widely cited “ICL as GD” hypothesis. This project challenges the generality of the GD hypothesis and aims to show that, for more complex statistical problems and architectures, transformers learn more sophisticated mechanisms, closer to approximate Bayesian inference.
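
To make the hypothesis concrete, here is a minimal NumPy sketch of what a single GD step on the in-context least-squares objective computes; the toy data and all names are assumed for illustration and do not come from any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 5
w_star = rng.normal(size=d)                     # hypothetical ground-truth weights
X = rng.normal(size=(n, d))                     # in-context inputs x_1, ..., x_n
y = X @ w_star + 0.01 * rng.normal(size=n)      # in-context labels y_i = <w*, x_i> + noise
x_query = rng.normal(size=d)                    # query whose label must be predicted

# One GD step from w = 0 on L(w) = (1/2n) ||Xw - y||^2 with step size eta:
# grad L(0) = -(1/n) X^T y, so w_1 = (eta/n) X^T y.
eta = 0.5
w_one_step = (eta / n) * X.T @ y

pred_gd = x_query @ w_one_step
pred_ls = x_query @ np.linalg.lstsq(X, y, rcond=None)[0]   # full least-squares solution
print(pred_gd, pred_ls)

# Note: pred_gd = (eta/n) * sum_i y_i <x_i, x_query>, an inner-product aggregation
# over the context, which is exactly the kind of computation a (linear) attention
# layer can implement; this is the core of the "ICL as GD" reading.
```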

Contact person: Francesco D’Angelo

The Teaching Set Conjecture

In this project, we will tackle the combinatorial open problem posed by Simon and Zilles at COLT 2015. The setting is the following: there is a finite domain X and a set C of binary functions (f : X -> {0,1}) with VC dimension d. We say that a subset A of X is a teaching set for f in C if f restricted to A differs from every other function g in C restricted to A. The Teaching Set Conjecture posits that there is a function f in C with a teaching set of size O(d). The best known upper bound at the moment is O(d^2), given by Wang et al. at COLT 2017. Recently, at COLT 2025, Zhivotovskiy et al. showed that a greedy strategy cannot resolve the conjecture.
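
To make the definitions concrete, here is a small brute-force Python sketch that computes a minimum-size teaching set for each function in a tiny class. The threshold class and all names are our own toy choices for illustration; this is not an approach to the conjecture itself.

```python
from itertools import combinations

def min_teaching_set(f, C, X):
    """Return a smallest subset A of X on which f differs from every other g in C."""
    others = [g for g in C if g != f]
    for k in range(len(X) + 1):
        for A in combinations(X, k):
            # A is a teaching set if every other g disagrees with f somewhere on A.
            if all(any(f[x] != g[x] for x in A) for g in others):
                return set(A)

# Functions are encoded as dicts X -> {0, 1}; here C is the class of thresholds on X.
X = [0, 1, 2]
C = [{x: int(x >= t) for x in X} for t in range(len(X) + 1)]   # VC dimension 1
for f in C:
    print(f, min_teaching_set(f, C, X))
```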

The project will involve creative combinatorial thinking and will allow the student to explore the fundamentals of learning theory.

Contact person: Hristo Papazov