If you are interested in working with us, here are some additional projects we would be happy to work on with you!
- Combinatorial Learning Theory: Can we learn algorithms with polynomially many samples?
In this project, we formalize algorithms as Turing machines and consider the hypothesis space H of all halting Turing machines (i.e., machines computing total computable functions). In our learning-theory scenario, we observe computational traces of an unknown ground-truth Turing machine T*. That is, for a given input x, we observe the computations of T* on the tape until the final output T*(x) is reached. In other words, we observe not only the input-output pair (x, T*(x)), as in classical learning theory, but also the Chain of Thought (CoT) leading to the final answer. We denote this CoT by T*[x]. The question we are interested in is whether we can correctly identify T* with a number of samples polynomial in the number of states of T*. We will consider relaxations and generalizations of this problem for finite-state automata, and we will also study some interesting learning-theory and grammatical-inference papers from the last century.
The project requires familiarity with computability theory, Turing machines, and finite-state automata, and involves combinatorial and algorithmic reasoning.
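To make the notion of a computational trace concrete, here is a minimal sketch (the simulator, its function names, and the example machine are our own illustrative choices, not part of the project): a tiny Turing-machine simulator that records every configuration visited on input x, so that the recorded list of configurations plays the role of the CoT T*[x].

```python
# Minimal Turing-machine simulator that records the full computational
# trace (the "Chain of Thought" T*[x]) rather than only the output T*(x).
# The example machine is illustrative: it computes the successor of a
# number written in unary (it appends one '1').

BLANK = "_"

def run_with_trace(delta, start, accept, tape_str, max_steps=1000):
    """Run a TM given by transitions delta[(state, symbol)] =
    (new_state, new_symbol, move), move in {-1, +1}.
    Returns (final_tape, trace), where trace lists every configuration."""
    tape = dict(enumerate(tape_str))
    state, head = start, 0
    trace = []
    for _ in range(max_steps):
        symbol = tape.get(head, BLANK)
        trace.append((state, head, symbol))  # one observed configuration
        if state == accept:
            break
        state, tape[head], move = delta[(state, symbol)]
        head += move
    cells = [tape[i] for i in sorted(tape)]
    return "".join(cells).strip(BLANK), trace

# Successor machine: scan right over the 1s, write one more 1, halt.
delta = {
    ("scan", "1"): ("scan", "1", +1),
    ("scan", BLANK): ("halt", "1", +1),
}

output, trace = run_with_trace(delta, "scan", "halt", "111")
print(output)      # '1111'
print(len(trace))  # 5 configurations observed on this input
```

Note that the length of the trace, not just the final output, is part of the observation: the learner sees how long T* computes on each input.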
Contact person: Hristo Papazov
- Optimization aspects of RNNs and state space models
Contact person: Aditya Varre
- Longer or More?
Language models are trained on a large corpus of text, whose size is measured by the number of tokens (T). This count can be decomposed as the product of two quantities, the number of documents (D) and the length of each document (L), in contrast to the classical notion of sample size (S). However, since tokens from the same document are correlated with one another, the i.i.d. assumption central to generalization theory fails, and the relationship between (T) and (S) is unclear. In this project, we will study how (D) and (L) influence learning in order to define an effective sample size (S). The main goal is to identify scaling laws with respect to (D) and (L), instead of (T), using baby language models on real-world data and in synthetic settings such as Markovian languages.
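As a sketch of the synthetic setting, the snippet below generates a "Markovian language" corpus under a fixed token budget T = D * L, so the trade-off between the number of documents (D) and their length (L) can be varied while T stays constant. The vocabulary size, transition matrix, and function names are our own illustrative choices.

```python
import numpy as np

def sample_corpus(D, L, P, pi, rng):
    """Sample D documents of L tokens each from a first-order Markov
    chain with row-stochastic transition matrix P and initial law pi."""
    V = len(pi)
    docs = []
    for _ in range(D):
        doc = [rng.choice(V, p=pi)]
        for _ in range(L - 1):
            # Tokens within a document are correlated through the chain.
            doc.append(rng.choice(V, p=P[doc[-1]]))
        docs.append(doc)
    return docs

rng = np.random.default_rng(0)
V = 4
P = rng.dirichlet(np.ones(V), size=V)  # random transition matrix
pi = np.ones(V) / V                    # uniform initial distribution

T = 1024  # fixed token budget
for D in [8, 32, 128]:                 # same T, different (D, L) splits
    L = T // D
    corpus = sample_corpus(D, L, P, pi, rng)
    print(f"D={D:4d}  L={L:4d}  tokens={sum(len(d) for d in corpus)}")
```

Training the same baby model on each split (same T, different D/L ratio) is the kind of controlled comparison the project has in mind.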
Contact person: Oguz Kaan Yüksel
- Interplays between approximate second-order optimizers and momentum
Contact person: Aditya Varre
- Arrow of time in algorithmic tasks
The recent success of transformers in natural language processing has led to research on their reasoning capabilities. This line of research usually focuses on how learning occurs in transformers trained from scratch on a specific algorithmic task. Surprisingly, even on a task as simple as addition, training transformers from scratch does not succeed, and non-trivial modifications are required. These modifications are task-specific and take the form of either modifying the data (e.g., its ordering, or adding metadata for the transformer) or modifying architectural components such as the positional encoding. In the addition task in particular, writing digits in reverse order helps transformers. In this project, we aim to develop a general training procedure that can handle different algorithmic tasks by considering generalized orderings of the data. The primary objective is to benchmark this training procedure on various algorithmic tasks and compare it with solutions from the literature.
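To illustrate the data-ordering trick mentioned above, here is a small sketch of the reversed-digit format for the addition task (the exact formatting conventions are our own illustrative choice): reversing the answer's digits puts the least-significant digit first, aligning each output digit with the operand digits and carry it depends on, which matches the left-to-right generation order of an autoregressive model.

```python
# Two ways to serialize an addition example for an autoregressive model.

def plain_example(a, b):
    # Standard ordering: most-significant answer digit comes first, but it
    # depends on carries from all later digits, which the model has not
    # yet "seen" when it must emit it.
    return f"{a}+{b}={a + b}"

def reversed_example(a, b):
    # Reversed ordering: least-significant digit comes first, matching the
    # order in which carries are actually computed.
    return f"{a}+{b}={str(a + b)[::-1]}"

print(plain_example(57, 86))     # 57+86=143
print(reversed_example(57, 86))  # 57+86=341
```

Generalized orderings of the data, in the sense of this project, would subsume such task-specific reversals as special cases.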
Contact person: Oguz Kaan Yüksel
- In-context learning as approximated Bayesian inference
A significant body of recent work has sought to demystify the in-context learning (ICL) capabilities of transformers by framing them as implicit optimization algorithms. For canonical tasks like in-context linear regression, compelling evidence suggests that a transformer's forward pass can emulate one or more steps of Gradient Descent (GD) on a least-squares objective. This has led to a powerful and widely cited "ICL as GD" hypothesis. This project challenges the generality of the GD hypothesis and aims to show that, for more complex statistical problems and architectures, transformers learn more sophisticated mechanisms.
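The construction behind the "ICL as GD" hypothesis can be sketched in a few lines (variable names are ours; this mirrors, but does not reproduce, constructions from the literature): one GD step from w = 0 on the in-context least-squares loss yields a prediction that is exactly an attention-like weighted sum of the context labels.

```python
import numpy as np

# One GD step on L(w) = (1/2n) * sum_i (w.x_i - y_i)^2, started at w = 0,
# gives w1 = (eta/n) * X^T y, so the query prediction is
#     y_hat = (eta/n) * sum_i y_i * <x_i, x_q>,
# a weighted sum of context labels that a linear attention layer can compute.

rng = np.random.default_rng(0)
n, d = 16, 4
w_star = rng.normal(size=d)   # ground-truth regression vector
X = rng.normal(size=(n, d))   # in-context inputs x_1, ..., x_n
y = X @ w_star                # noiseless in-context labels
x_q = rng.normal(size=d)      # query input

eta = 0.1
w1 = eta / n * X.T @ y        # one GD step from w = 0
gd_pred = w1 @ x_q

# Equivalent "attention-like" form: label-weighted inner products.
attn_pred = eta / n * sum(y_i * (x_i @ x_q) for x_i, y_i in zip(X, y))

print(np.isclose(gd_pred, attn_pred))  # True: the two forms coincide
```

The project asks what replaces this tidy equivalence once the statistical problem or the architecture moves beyond the linear setting.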
Contact person: Francesco d’Angelo