EPFL-CIS & RIKEN-AIP Joint Workshop on Machine Learning

Featuring rising stars in the field from Switzerland and Japan | September 7 – 8, 2022 | Hybrid

Machine learning has been the main driving force for bringing artificial intelligence into the real world.
 
This workshop aims to bring together students and young researchers from EPFL-CIS and RIKEN-AIP and discuss the latest achievements and challenges of machine learning.
 
This event is part of a series of institutional exchanges held since 2020 between RIKEN-AIP and EPFL-CIS, based on a Memorandum of Understanding between the two institutions that aims to establish a long-term relationship. Further joint events will follow in preparation for the in-person visit of a RIKEN-AIP delegation to EPFL in 2023.
 

Date: September 7 and 8, 2022
Location: Hybrid. EPFL speakers and audience on-site (Room TBC); RIKEN speakers will join online.
The event can be followed online or on-site.

Registration zoom link

Registration for on-site (EPFL)

*As the on-site event is reserved for the EPFL community, please register with your @epfl.ch email address.

Program

 DAY 1: Sep. 7 

Adversarial robustness: from basic science to applications

Abstract
When we deploy models trained by standard training (ST), they work well on natural test data. However, those models cannot handle adversarial test data (also known as adversarial examples) that are algorithmically generated by adversarial attacks. An adversarial attack is an algorithm that applies specially designed tiny perturbations to natural data to transform them into adversarial data, in order to mislead a trained model into giving wrong predictions. Adversarial robustness aims at improving the robust accuracy of trained models against adversarial attacks, which can be achieved by adversarial training (AT). What is AT? Given the knowledge that the test data may be adversarial, AT carefully simulates some adversarial attacks during training. Thus, the model has already seen many adversarial training data in the past, and hopefully it can generalize to adversarial test data in the future. AT has two purposes: (1) correctly classify the data (same as ST) and (2) make the decision boundary thick so that no data lie near the decision boundary. In this talk, I will introduce how to leverage adversarial attacks/training for evaluating/enhancing the reliability of AI-powered tools.
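As a rough illustration of the "tiny perturbation" idea described in the abstract, the sketch below applies a one-step sign-gradient attack (in the style of the fast gradient sign method) to a toy logistic model. The model weights, the input, and the budget `eps` are hypothetical values chosen for illustration; this is not necessarily the attack or the model discussed in the talk.

```python
import math

def predict_proba(w, b, x):
    """Probability of class 1 under a toy logistic model (hypothetical)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(g):
    """Return -1, 0, or 1 depending on the sign of g."""
    return (g > 0) - (g < 0)

def sign_gradient_attack(w, b, x, y, eps):
    """Shift each feature by eps in the direction that increases the loss."""
    p = predict_proba(w, b, x)
    # For the logistic cross-entropy loss, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and a correctly classified natural input with label 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1
x_adv = sign_gradient_attack(w, b, x, y, eps=0.25)

print(predict_proba(w, b, x) > 0.5)      # natural input: predicted class 1
print(predict_proba(w, b, x_adv) > 0.5)  # perturbed input: prediction flips
```

Adversarial training, in this simplified picture, would generate such perturbed inputs during training and fit the model on them with their original labels.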

Bio
Jingfeng Zhang is a researcher at RIKEN-AIP in the “Imperfect Information Learning Team” supervised by Prof. Masashi Sugiyama. Prior to RIKEN-AIP, Jingfeng obtained his Ph.D. degree (in 2020) under Prof. Mohan Kankanhalli at the School of Computing of the National University of Singapore, and his Bachelor’s degree (in 2016) at Taishan College, Shandong University, China. Jingfeng is the recipient of Strategic Basic Research Programs ACT-X 2021-2023 funding, JSPS Grants-in-Aid for Scientific Research (KAKENHI), Early-Career Scientists 2022-2023, and the RIKEN Ohbu Award 2022. Jingfeng serves as a reviewer for prestigious ML conferences such as ICLR, ICML, and NeurIPS. Jingfeng’s long-term research interest is making artificial intelligence safe for human beings.

Catastrophic overfitting is a bug but also a feature

Abstract
Despite clear computational advantages in building robust neural networks, adversarial training (AT) using single-step methods is unstable as it suffers from catastrophic overfitting (CO): Networks gain non-trivial robustness during the first stages of adversarial training, but suddenly reach a breaking point where they quickly lose all robustness in just a few iterations. Although some works have succeeded at preventing CO, the different mechanisms that lead to this remarkable failure mode are still poorly understood. In this work, however, we find that the interplay between the structure of the data and the dynamics of AT plays a fundamental role in CO. Specifically, through active interventions on typical datasets of natural images, we establish a causal link between the structure of the data and the onset of CO in single-step AT methods. This new perspective provides important insights into the mechanisms that lead to CO and paves the way towards a better understanding of the general dynamics of robust model construction.

Bio
Guillermo Ortiz-Jimenez is a fourth-year PhD student at EPFL working under the supervision of Pascal Frossard. His research focuses on understanding deep learning using empirical methods with a focus on robustness and generalization. During his PhD, Guillermo has visited the University of Oxford as part of the ELLIS PhD program, where he is co-supervised by Philip Torr. He is currently a research intern at Google in Zurich. Before starting his PhD, Guillermo received his MSc. in Electrical Engineering from TU Delft, Netherlands, and his BSc. in Telecommunications Engineering from Universidad Politécnica de Madrid, Spain. He ranked first at both institutions.

Exact Statistical Inference for the Wasserstein Distance by Selective Inference

Abstract
The Wasserstein distance (WD), a metric used to compare probability distributions, has attracted significant attention and is increasingly used in statistics and machine learning. When the WD calculated from noisy data is used for decision-making problems, it is necessary to quantify its statistical reliability, e.g., in the form of a confidence interval (CI). Several methods have been proposed in the literature, but almost all of them are based on asymptotic approximation and do not have finite-sample validity. In this study, we propose an exact (non-asymptotic) inference method for the WD inspired by the concept of conditional Selective Inference (SI). We will show that, by conditioning on the optimal coupling, the exact sampling distribution of the WD can be derived, which enables us to construct a CI with a finite-sample coverage guarantee.

Bio
Vo Nguyen Le Duy is currently a PhD student at Nagoya Institute of Technology and a Junior Research Associate at the RIKEN Center for Advanced Intelligence Project, Japan. He received the B.S. degree from the Danang University of Science and Technology, Vietnam, in 2017, and the M.S. degree from Nagoya Institute of Technology, Japan, in 2020. His research interests include machine learning, data mining, and statistical data analysis.

Error rates for kernel methods under source and capacity conditions

Abstract
We investigate the rates of decay of the prediction error for kernel methods under the Gaussian design and source/capacity assumptions. For kernel ridge regression, we derive all the observable rates, and characterize the regimes in which each holds. In particular, we show that the decay rate may transition from a fast, noiseless rate to a slow, noisy rate as the sample complexity is increased. For noiseless kernel classification, we derive the rates for two standard classifiers, margin-maximizing SVMs and ridge classifiers, and contrast the two methods. In both cases, the derived rates also describe to a good degree the learning curves of a number of real datasets. This is joint work with Bruno Loureiro, Florent Krzakala and Lenka Zdeborová.

Bio
Hugo Cui is currently a PhD student in the Statistical Physics of Computation lab at EPFL, Switzerland. He previously studied theoretical physics at ENS Paris.

Domain Generalization via Adversarially Learned Novel Domains

Abstract
We focus on the domain generalization task, which aims to learn a model that generalizes to unseen domains by utilizing multiple training domains. More specifically, we follow the idea of adversarial data augmentation, which aims to synthesize and augment training data with “hard” domains for improving the model’s domain generalization ability. Previous works augment training data only with samples similar to the training data, resulting in limited generalization ability. We propose a novel adversarial data augmentation method, termed GADA (Generative Adversarial Domain Augmentation), which employs an image-to-image translation model to obtain a distribution of novel domains that are semantically different from the training domains, and, at the same time, hard to classify. Evaluation and further analysis suggest that adversarial data augmentation with semantically different samples leads to better domain generalization performance.

Bio
Yu Zhe was born in Anhui, China, in 1994. He received the B.S. degree in computer science from the Guangdong University of Foreign Studies, China, in 2016, and the M.S. degree in computer science from Kanazawa University, Japan, in 2019. He is currently pursuing the Ph.D. degree in computer science at the University of Tsukuba, Japan.

Neural Network Loss Landscapes: Symmetry-Induced Saddles and the Global Minima Manifold

Abstract
In this talk, I will present a combinatorial analysis of the number of critical manifolds in neural networks as a function of overparameterization by exploiting the permutation-symmetry between the hidden layer neurons. I will first introduce a saddle point type, the so-called symmetry-induced (SI) saddles, emerging from a redundant arrangement of neurons in deep neural networks. Then I will describe the precise geometry of the global minima manifold in a teacher-student setting by giving its number of components and flatness. Similarly, counting the possible arrangements of neuron groups inside a neural network, I will also give the number of SI saddle manifolds in terms of the student and teacher widths. Our analysis shows that overparameterization gradually smoothens the landscape due to a faster scaling of the number of global minima components than the number of SI saddle manifolds for wide networks. However, in mildly overparameterized networks, the loss landscape exhibits roughness and spurious local minima near the saddle manifolds. In this regime, we empirically show that gradient-based optimizers fail to find a zero-loss solution for a fraction of random initializations.

Bio
I am a doctoral researcher at EPFL in Computer and Communication Sciences, supervised by Clément Hongler and Wulfram Gerstner. My research focus is the theory of deep learning, in particular, neural network landscapes, training dynamics, mild overparameterization, generalization, random features, and kernel methods. I also explored out-of-distribution generalization during my internship at Meta AI. Before my Ph.D., I studied Electrical-Electronics Engineering and Mathematics double-major at Koç University in Istanbul. Before that, I competed in International Mathematical Olympiads.

Expert advice problem with noisy low rank loss

Abstract
We consider the expert advice problem with a low rank but noisy loss sequence, where the loss vector on each round is composed of a low rank part and a noisy part. This is a generalization of the works of Hazan et al. (2016) and Barman et al. (2018), where the former only treats noiseless loss and the latter assumes that the low rank structure is known in advance. We propose an algorithm that, during the learning process, reconstructs the kernel of the low rank part under the assumptions that the low rank loss is noisy and that there is no prior information about the low rank structure. With this kernel, we obtain a satisfactory regret bound. Moreover, in experiments, the proposed algorithm performs better than Hazan’s algorithm and the Hedge algorithm.
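For readers unfamiliar with the expert advice setting, the following is a minimal sketch of the Hedge (multiplicative weights) baseline named in the abstract. It is not the proposed low rank algorithm; the loss sequence and the learning rate `eta` are hypothetical values chosen for illustration.

```python
import math

def hedge(loss_rounds, eta):
    """Run Hedge on a sequence of loss vectors with entries in [0, 1].

    Returns the learner's cumulative expected loss and the final weights.
    """
    n_experts = len(loss_rounds[0])
    weights = [1.0] * n_experts
    total_loss = 0.0
    for losses in loss_rounds:
        norm = sum(weights)
        probs = [w / norm for w in weights]
        # Expected loss of following an expert drawn from probs.
        total_loss += sum(p * l for p, l in zip(probs, losses))
        # Exponentially down-weight experts that incurred loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return total_loss, weights

# Toy sequence: expert 0 never errs, expert 1 always errs.
rounds = [[0.0, 1.0]] * 20
total_loss, weights = hedge(rounds, eta=0.5)
best_expert_loss = min(sum(r[i] for r in rounds) for i in range(2))
regret = total_loss - best_expert_loss
print(round(regret, 3))
```

The regret printed at the end compares the learner's cumulative loss with that of the best single expert in hindsight, which is the quantity the regret bound in the abstract refers to.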

Bio
LIU, Yaxiong is currently a postdoctoral researcher in the Computational Learning Team (led by Prof. Hatano) at RIKEN AIP. He received his Dr. Sci. degree from Kyushu University in 2022. His research interests include online learning and its applications to privacy.

Beyond spectral gap: the role of the topology in decentralized learning

Abstract
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. We consider the setting in which all workers sample from the same dataset, and communicate over a sparse graph (decentralized). In this setting, current theory fails to capture important aspects of real-world behavior. First, the ‘spectral gap’ of the communication graph is not predictive of its empirical performance in (deep) learning. Second, current theory does not explain that collaboration enables larger learning rates than training alone. In fact, it prescribes smaller learning rates, which further decrease as graphs become larger, failing to explain convergence in infinite graphs. This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution. We quantify how the graph topology influences convergence in a quadratic toy problem and provide theoretical results for general smooth and (strongly) convex objectives. Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.

Bio
Hadrien Hendrikx is a post-doc in the MLO team from EPFL, working with Martin Jaggi. Before that, he completed a Ph.D. at Inria Paris, under the supervision of Francis Bach and Laurent Massoulié. His research focuses on optimization for machine learning, and on decentralized methods in particular.

 DAY 2: Sep. 8

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs

Abstract
The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for orthogonal input vectors, a precise description of the gradient flow dynamics of training one-hidden-layer ReLU neural networks for the mean squared error at small initialisation. In this setting, despite non-convexity, we show that the gradient flow converges to zero loss and characterise its implicit bias towards minimum variation norm. Furthermore, some interesting phenomena are highlighted: a quantitative description of the initial alignment phenomenon and a proof that the process follows a specific saddle-to-saddle dynamics.

Bio
Etienne Boursier completed his PhD, entitled “Statistical Learning in a strategical environment”, at ENS Paris-Saclay in September 2021 under the supervision of Vianney Perchet. During his PhD, he studied multi-agent learning, combining (online) learning with game theoretical tools. In particular, he mainly focused on the problem of Multiplayer Multi-armed bandits, but also worked on other bandit-related problems, Social Learning and the Utility/Privacy trade-off. Since October 2021, he has been a postdoc in the Theory of Machine Learning Lab led by Nicolas Flammarion at EPFL. He is currently focusing on multitask/meta-learning and on providing theoretical insights into the empirical success of nonlinear neural networks.

Structure Search for Tensor Network Learning

Abstract
Recent works put much effort into tensor network structure search (TN-SS), aiming to select good tensor network structures involving TN-ranks, formats, and so on, for improving the performance of TNs on machine learning tasks. In this presentation, we will focus on the TN-SS problem and talk about the following three questions: 1) what is TN-SS, and what motivates the studies on this issue; 2) how to resolve it, and what we need to pay for the search; 3) how much benefit can we achieve from TN-SS for machine learning? We will first show that TN-SS can be modeled as a combinatorial optimization problem. Then, two search algorithms, TNGA and TNLS, will be introduced to solve the problem. Last, several applications will be discussed to demonstrate the potential benefit of TN-SS for machine learning. Related works were published in (Li et al., ICML’20, ICML’22).

Bio
Dr. Chao Li has been an indefinite-term research scientist with the AIP center, RIKEN institute, since 2021. Before that, he was a post-doctoral researcher with RIKEN-AIP from 2018 to 2020. He obtained his bachelor’s and Ph.D. degrees at Harbin Engineering University (HEU) in China in 2006 and 2017, respectively. He regularly serves as a (senior) reviewer for ICML, NeurIPS, IJCAI, AAAI, and so on. His research interests include tensor networks and machine learning.

Proximal Point Imitation Learning

Abstract
This work develops new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning (IL) with linear function approximation without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing, for online and offline IL, respectively. Thanks to PPM, we avoid nested policy evaluation and cost updates for online IL appearing in the prior literature. In particular, we do away with the conventional alternating updates by the optimization of a single convex and smooth objective over both cost and Q-functions. When solved inexactly, we relate the optimization errors to the suboptimality of the recovered policy. As an added bonus, by re-interpreting PPM as dual smoothing with the expert policy as a center point, we also obtain an offline IL algorithm enjoying theoretical guarantees in terms of required expert trajectories. Finally, we achieve convincing empirical performance for both linear and neural network function approximation.

Bio
I am currently an ELLIS PhD student at EPFL advised by Volkan Cevher and co-advised by Gergely Neu, working on optimization methods for reinforcement and imitation learning. Previously, I obtained my MSc in Computational Science from EPFL and my BSc in Physics Engineering from Politecnico di Torino.

Reinforcement Learning via Symmetries of Dynamics

Abstract
Offline reinforcement learning (RL) leverages large datasets to train policies without interactions with the environment. The learned policies may then be deployed in real-world settings where interactions are costly or dangerous. Current algorithms overfit to the training dataset and, as a consequence, perform poorly when deployed to out-of-distribution generalizations of the environment. We aim to address these limitations by learning a Koopman latent representation which allows us to infer symmetries of the system’s underlying dynamics. The latter is then utilized to extend the otherwise static offline dataset during training; this constitutes a novel data augmentation framework which reflects the system’s dynamics.

Bio
Currently I am a Research Scientist at RIKEN AIP with a focus on generalization in Reinforcement Learning (RL). My background, however, is in mathematical physics: I completed my Master’s at ETH Zurich and then my PhD at the Max-Planck Institute for Physics in Munich with a focus on String theory. During my first postdoc at the University of Tokyo (Kavli IPMU), my research interests shifted towards the vibrant field of AI. These days my main ambition is to evolve RL algorithms so that they can generalize to new tasks.

Lipschitz Function Approximation using DeepSpline Neural Networks

Abstract
In this talk, we investigate NNs with prescribed bounds on the Lipschitz constant. One possibility to obtain Lipschitz-constrained NNs is to impose constraints on the architecture. Here, it turns out that this significantly limits the expressivity if we use the popular ReLU activation function. In particular, we are unable to represent even simple continuous piecewise linear functions. In contrast, using learnable linear splines instead fixes this problem and leads to maximal expressivity among all component-wise activation functions. From the many possible applications of Lipschitz-constrained NNs, we discuss one in more detail to see that the theoretical observations also translate into improved performance.

Bio
Sebastian Neumayer studied mathematics at TU Kaiserslautern and received his PhD in 2020 at TU Berlin under the supervision of Gabriele Steidl. Currently, he is a postdoctoral researcher in the Biomedical Imaging Group at EPFL. His main research interests are centered around convex analysis, inverse problems, and theoretical aspects of neural networks. In the past months, he has focused on studying stability properties of neural networks and designing new network architectures.

A method to construct exponential family by representation theory

Abstract
Exponential families play an important role in the field of information geometry, statistics and machine learning. By definition, there are infinitely many exponential families. However, only a small part of them is widely used. We want to give a framework to deal with these “good” families systematically. In light of the observation that the sample spaces of most of them are homogeneous spaces of certain Lie groups, we proposed a method to construct exponential families on homogeneous spaces by taking advantage of representation theory in [1]. This method generates widely used exponential families such as normal, gamma, Bernoulli, categorical, Wishart, von Mises-Fisher, and hyperboloid distributions. In this talk, we will explain the method and its properties. [1] K. Tojo, T. Yoshino, A method to construct exponential families by representation theory, arXiv:1811.01394

Bio
Koichi Tojo is a postdoctoral researcher in the Mathematical Science Team at RIKEN AIP. He received his Ph.D. (Mathematical Science) under Prof. Taro Yoshino from the University of Tokyo in 2018. His research interests include representation theory, Lie group theory, information geometry and machine learning.

The Topological BERT: Transforming Attention into Topology for Natural Language Processing

Abstract
In recent years, the introduction of the Transformer models sparked a revolution in natural language processing (NLP). BERT was one of the first text encoders using only the attention mechanism without any recurrent parts to achieve state-of-the-art results on many NLP tasks. This talk introduces a text classifier using topological data analysis. We use BERT’s attention maps transformed into attention graphs as the only input to that classifier. The model can solve tasks such as distinguishing spam from ham messages, recognizing whether a sentence is grammatically correct, or evaluating a movie review as negative or positive. It performs comparably to the BERT baseline and outperforms it on some tasks. Additionally, we propose a new method to reduce the number of BERT’s attention heads considered by the topological classifier, which allows us to prune the number of heads from 144 down to as few as ten with no reduction in performance. Our work also shows that the topological model displays higher robustness against adversarial attacks than the original BERT model, which is maintained during the pruning process. To the best of our knowledge, this work is the first to confront topological-based models with adversarial attacks in the context of NLP.

Bio
Raphael Reinauer is currently a Postdoctoral Fellow in the Laboratory for Topology and Neuroscience at the École Polytechnique Fédérale de Lausanne in Switzerland. He completed his Ph.D. in Mathematics at the University of Münster in 2020 under the supervision of Michael Joachim. His research focuses on applied topological data analysis, machine learning, and natural language processing.

Machine Learning Approaches for EEG-derived Early-onset Dementia Neuro-biomarker Development

Abstract
Brain-computer interface (BCI) and efficient machine learning (ML) algorithms belonging to the so-called ‘AI for social good’ domain contribute to improving the well-being of patients with limited mobility or communication skills. We will review our recent results from a project focusing on developing a dementia digital neuro-biomarker for early-onset prognosis of a possible cognitive decline utilizing a passive BCI approach. We will report findings from elderly volunteer pilot study groups, analyzing EEG responses in classical short-term memory evaluating oddball paradigms, reminiscent images, and emotional evaluation implicit learning tasks. Results from feature engineering approaches based on signal complexity/criticality and on information geometry employing Riemannian geometry tools, as well as from end-to-end trained machine learning models, will be discussed. The reported pilot studies showcase the vital application of artificial intelligence (AI) for early-onset mild cognitive impairment (MCI) prediction in the elderly.

Bio
Tomasz (Tomek) M. RUTKOWSKI is an applied AI/ML research scientist at the RIKEN Center for Advanced Intelligence Project (AIP), a research fellow at The University of Tokyo, Japan and Nicolaus Copernicus University in Torun, Poland. Tomasz’s research interests include computational neuroscience, primarily passive brain-computer interfacing (BCI) dementia biomarkers elucidation, computational modeling of brain processes, and AI for social good applications. More information and publications are available at http://tomek.bci-lab.info/

Co-organized

Contact

Any questions regarding the event? Please contact us:
[email protected]