If you are interested in working with us, here are some additional projects that we would be happy to work on!
- Understanding the representations learned by contrastive learning
Self-supervised learning methods based on contrastive learning perform well by learning representations that are useful for downstream tasks. Some recent works justify the quality of these representations through the recovery of latent generative models, multi-view hypotheses on the data, or empirical properties such as alignment and uniformity (https://arxiv.org/abs/2102.08850, https://arxiv.org/abs/2005.10242, https://arxiv.org/abs/2006.05576). In this project, we seek to experimentally study the types of representations learned with contrastive learning and check the validity of the proposed mechanisms. Depending on the outcome of these investigations, theoretical or empirical follow-up work on contrastive learning motivated by the different views mentioned is possible (e.g., a study of novel target spaces for CL/SSL).
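As one concrete entry point, the alignment and uniformity properties of https://arxiv.org/abs/2005.10242 can be measured directly on an encoder's embeddings. Below is a minimal PyTorch sketch; the embeddings are random stand-ins for real encoder outputs:

```python
import torch

def alignment(z1, z2, alpha=2):
    # z1, z2: L2-normalized embeddings of two views of the same images, shape (N, D).
    # Lower is better: positive pairs should map close together.
    return (z1 - z2).norm(dim=1).pow(alpha).mean()

def uniformity(z, t=2):
    # z: L2-normalized embeddings, shape (N, D).
    # Measures how uniformly the embeddings spread over the unit hypersphere.
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()

# Usage with random stand-in embeddings:
z1 = torch.nn.functional.normalize(torch.randn(256, 128), dim=1)
z2 = torch.nn.functional.normalize(z1 + 0.1 * torch.randn(256, 128), dim=1)
print(alignment(z1, z2).item(), uniformity(z1).item())
```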
For more info please contact Oguz.
- Understanding the training instability of transformers
Transformer architectures are known to be extremely unstable to train and require many tricks to overcome this problem (e.g., see Section 2.5, "Training Processes," in "OPT: Open Pre-trained Transformer Language Models", https://arxiv.org/abs/2205.01068). In this project, we want to understand the reasons behind this instability and how to mitigate it.
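To make the setting concrete, here is a minimal PyTorch sketch (model and data are hypothetical placeholders) of two widely used stabilization tricks, learning-rate warmup and gradient clipping, together with logging of the gradient norm, whose spikes often precede loss divergence:

```python
import torch

def train_step(model, batch, optimizer, step, warmup_steps=2000,
               base_lr=1e-4, clip_norm=1.0):
    # Linear warmup: small steps early in training, when instability is worst.
    lr = base_lr * min(1.0, (step + 1) / warmup_steps)
    for group in optimizer.param_groups:
        group["lr"] = lr

    optimizer.zero_grad()
    loss = model(batch)  # assume the model returns a scalar loss on this batch
    loss.backward()

    # Clip the global gradient norm; the returned (pre-clipping) norm is a
    # useful quantity to log when studying instability.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    return loss.item(), grad_norm.item()
```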
For more info please contact Maksym.
- Understanding BYOL
“Bootstrap your own latent: A new approach to self-supervised Learning” (https://arxiv.org/abs/2006.07733) and its variations (e.g., SimSiam, https://arxiv.org/abs/2011.10566; DINO, https://arxiv.org/abs/2104.14294) have become very popular in self-supervised learning. However, the reasons behind their success are still not fully understood. In particular, it is unclear what the role of the moving-average encoder is and why representation collapse does not occur. The goal of this project is to clarify the role of the various (seemingly ad hoc) design decisions introduced in these methods and to arrive at a unifying picture of them.
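For concreteness, here is a minimal PyTorch sketch of the two ingredients in question, the exponential-moving-average (EMA) target encoder and the stop-gradient; the tiny networks and random views are placeholders, not the architectures used in these papers:

```python
import copy
import torch
import torch.nn.functional as F

online = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU(),
                             torch.nn.Linear(32, 16))
predictor = torch.nn.Linear(16, 16)
target = copy.deepcopy(online)      # target starts as a copy of the online net
for p in target.parameters():
    p.requires_grad_(False)         # no gradients ever flow into the target

opt = torch.optim.SGD(list(online.parameters()) + list(predictor.parameters()), lr=0.1)

def byol_step(v1, v2, tau=0.99):
    # v1, v2: two augmented views of the same batch, shape (N, 32)
    p = F.normalize(predictor(online(v1)), dim=1)
    with torch.no_grad():           # stop-gradient on the target branch
        z = F.normalize(target(v2), dim=1)
    loss = (2 - 2 * (p * z).sum(dim=1)).mean()  # normalized MSE between branches
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():           # EMA update of the target encoder
        for pt, po in zip(target.parameters(), online.parameters()):
            pt.mul_(tau).add_(po, alpha=1 - tau)
    return loss.item()

# Usage with random stand-ins for two views:
print(byol_step(torch.randn(8, 32), torch.randn(8, 32)))
```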
For more info please contact Maksym.
- Understanding the order of learning of training examples
It has been observed (e.g., in https://arxiv.org/abs/1706.05394 and https://arxiv.org/abs/2007.00151) that machine learning models (including deep networks) first learn to correctly classify simple examples (e.g., typical or common ones) and only later more complex ones (e.g., mislabeled or rare examples). In this project, we want to explore how the order in which training examples are learned depends on the optimization algorithm (SGD with different step sizes, sharpness-aware minimization, etc.), and how we can prioritize the learning of rare examples over mislabeled ones, which are usually hard to distinguish.
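As a starting point for such experiments, one could record, for each training example, the first epoch at which it is classified correctly and compare this ordering across optimizers. A minimal PyTorch sketch, assuming a dataset that also yields example indices and a hypothetical train_epoch routine for the optimizer under study:

```python
import torch

def track_learning_order(model, loader, num_examples, num_epochs, train_epoch):
    # first_learned[i] = first epoch at which example i is predicted correctly
    # (-1 if never learned within the budget).
    first_learned = torch.full((num_examples,), -1, dtype=torch.long)
    for epoch in range(num_epochs):
        train_epoch(model, loader)        # one pass of the optimizer under study
        model.eval()
        with torch.no_grad():
            for x, y, idx in loader:      # assumes the loader yields indices too
                correct = model(x).argmax(dim=1).eq(y)
                newly = correct & (first_learned[idx] == -1)
                first_learned[idx[newly]] = epoch
        model.train()
    return first_learned
```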
For more info please contact Maksym.