“Neural Network Loss Landscapes: Symmetry-Induced Saddles and the Global Minima Manifold”
September 7, 2022 | Time 12:30 CET
In this talk, I will present a combinatorial analysis of the number of critical manifolds in neural networks as a function of overparameterization, exploiting the permutation symmetry between hidden-layer neurons. I will first introduce a type of saddle point, the so-called symmetry-induced (SI) saddles, which emerge from a redundant arrangement of neurons in deep neural networks. I will then describe the precise geometry of the global minima manifold in a teacher-student setting by giving its number of components and its flatness. Similarly, by counting the possible arrangements of neuron groups inside a neural network, I will give the number of SI saddle manifolds in terms of the student and teacher widths. Our analysis shows that overparameterization gradually smooths the landscape: for wide networks, the number of global minima components scales faster than the number of SI saddle manifolds. In mildly overparameterized networks, however, the loss landscape is rough and exhibits spurious local minima near the saddle manifolds. In this regime, we show empirically that gradient-based optimizers fail to find a zero-loss solution for a fraction of random initializations.
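The permutation symmetry mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the talk and makes several assumptions: a two-layer network without biases, a tanh activation, and hypothetical helper names (`forward`, `permute`, `split_neuron`). The sketch checks that reordering hidden neurons leaves the output unchanged, and that duplicating a neuron while splitting its outgoing weight gives a wider network computing the same function, which is one plausible reading of the "redundant arrangement of neurons" behind SI saddles.

```python
import math
import random


def forward(x, W, a):
    """Two-layer network (assumed form): sum_i a[i] * tanh(W[i] . x)."""
    return sum(
        a_i * math.tanh(sum(w * x_j for w, x_j in zip(w_i, x)))
        for w_i, a_i in zip(W, a)
    )


def permute(W, a, perm):
    """Reorder hidden neurons: incoming rows and outgoing weights move together."""
    return [W[p] for p in perm], [a[p] for p in perm]


def split_neuron(W, a, i, mu):
    """Duplicate neuron i; the two copies share its outgoing weight as
    mu * a[i] and (1 - mu) * a[i]. Illustrative assumption, not the talk's code."""
    W_new = W + [list(W[i])]
    a_new = a[:i] + [mu * a[i]] + a[i + 1:] + [(1 - mu) * a[i]]
    return W_new, a_new


rng = random.Random(0)
d, m = 3, 4  # input dimension and hidden width (arbitrary choices)
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
a = [rng.gauss(0, 1) for _ in range(m)]
x = [rng.gauss(0, 1) for _ in range(d)]

y = forward(x, W, a)

# Any reordering of the hidden neurons gives the same function.
W_p, a_p = permute(W, a, [2, 0, 3, 1])
y_perm = forward(x, W_p, a_p)

# A width-(m + 1) network with a duplicated neuron also gives the same function.
W_s, a_s = split_neuron(W, a, 1, 0.3)
y_split = forward(x, W_s, a_s)

# Number of neuron orderings of a width-m hidden layer.
n_perms = math.factorial(m)

print(math.isclose(y, y_perm, abs_tol=1e-12))
print(math.isclose(y, y_split, abs_tol=1e-12))
print(n_perms)
```

Each parameter vector with distinct neurons therefore has `m!` functionally equivalent copies, which is why counting arrangements of neurons is a natural way to count critical manifolds.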
I am a doctoral researcher in Computer and Communication Sciences at EPFL, supervised by Clément Hongler and Wulfram Gerstner. My research focuses on the theory of deep learning, in particular neural network landscapes, training dynamics, mild overparameterization, generalization, random features, and kernel methods. I also explored out-of-distribution generalization during my internship at Meta AI. Before my Ph.D., I completed a double major in Electrical-Electronics Engineering and Mathematics at Koç University in Istanbul. Before that, I competed in International Mathematical Olympiads.