Optimization theories of neural networks with its statistical perspective
Prof. Taiji Suzuki, The University of Tokyo and AIP-RIKEN, Japan
Wednesday, Oct. 20, 2021, 10:00am-11:00pm (CEST)
In this talk, I discuss some optimization theories of deep learning and their impact on generalization ability. First, I present a deep learning optimization framework based on noisy gradient descent in an infinite-dimensional Hilbert space (gradient Langevin dynamics), and show generalization error and excess risk bounds for the solution obtained by this optimization procedure. Unlike existing approaches such as the neural tangent kernel and mean-field analysis, the proposed framework can handle finite- and infinite-width networks simultaneously. It can be shown that deep learning can avoid the curse of dimensionality in a teacher-student setting, and eventually achieve a better excess risk than kernel methods.
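As a rough illustration of the noisy gradient descent idea, the sketch below runs a discretized gradient Langevin update on a small finite-dimensional least-squares problem. It is not the infinite-dimensional framework of the talk: the toy data, the function name `langevin_gd`, the step size `step`, and the inverse temperature `beta` are all assumptions chosen for illustration.

```python
import numpy as np

def langevin_gd(grad, w0, step=1e-2, beta=1000.0, n_steps=2000, seed=0):
    """Discretized gradient Langevin dynamics (hypothetical helper).

    Update: w <- w - step * grad(w) + sqrt(2 * step / beta) * xi,
    where xi is standard Gaussian noise and beta is the inverse temperature.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        xi = rng.standard_normal(w.shape)
        w = w - step * grad(w) + np.sqrt(2.0 * step / beta) * xi
    return w

# Toy regression data (assumed, not from the talk).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * data_rng.standard_normal(200)
lam = 1e-3  # small ridge penalty

def grad(w):
    # Gradient of the regularized empirical squared loss.
    return X.T @ (X @ w - y) / len(y) + lam * w

w_hat = langevin_gd(grad, np.zeros(5))
print(np.linalg.norm(w_hat - w_true))
```

With a large `beta` the injected noise is small and the iterate stays near the regularized least-squares solution; a smaller `beta` spreads the iterates more widely around it.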
Next, I present a particle-type optimization technique for two-layer neural networks in the mean-field regime. The proposed method, called particle dual averaging (PDA), generalizes the dual averaging method of finite-dimensional convex optimization to optimization over probability distributions, and is justified by a quantitative global convergence theory. In addition, I present a stochastic dual coordinate ascent version of PDA. Unlike PDA, it achieves exponential convergence in terms of the number of outer loops.
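For orientation, here is a minimal sketch of the classical finite-dimensional dual averaging method that PDA is said to generalize. It is not PDA itself, which operates on probability distributions represented by particles. The function name `dual_averaging`, the scaling `gamma * sqrt(t)`, and the quadratic test objective are assumptions made for this example.

```python
import numpy as np

def dual_averaging(grad, dim, n_steps=5000, gamma=1.0):
    """Dual averaging with a squared-norm regularizer (hypothetical helper).

    Keeps the running sum of gradients g_sum and sets
    x_{t+1} = argmin_x <g_sum, x> + (gamma * sqrt(t) / 2) * ||x||^2
            = -g_sum / (gamma * sqrt(t)).
    Returns the average of the iterates x_1, ..., x_T.
    """
    g_sum = np.zeros(dim)
    x = np.zeros(dim)
    x_avg = np.zeros(dim)
    for t in range(1, n_steps + 1):
        x_avg += (x - x_avg) / t           # running average of iterates
        g_sum += grad(x)                   # accumulate (dual) gradient sum
        x = -g_sum / (gamma * np.sqrt(t))  # closed-form proximal step
    return x_avg

# Toy convex objective f(x) = 0.5 * ||x - c||^2 with minimizer c (assumed).
c = np.array([1.0, -1.0])
x_bar = dual_averaging(lambda x: x - c, dim=2)
print(x_bar)
```

The averaged iterate approaches the minimizer `c` at the slow sublinear rate typical of dual averaging, which is the kind of rate the stochastic dual coordinate ascent variant mentioned above is meant to improve on.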
Finally, if time permits, I will discuss the generalization error of preconditioned ridgeless regression in the overparameterized regime. In particular, I will discuss the optimal preconditioner for both the bias and the variance, and how it depends on the label noise and the shape of the signal.
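To make the object of study concrete, the sketch below computes a preconditioned ridgeless (interpolating) solution in an overparameterized linear model. It assumes a symmetric positive definite preconditioner `P` and uses the closed form `P X^T (X P X^T)^{-1} y`, which is the limit of preconditioned gradient descent started from zero. The data, the diagonal choice of `P`, and the function name are assumptions; the optimal preconditioner discussed in the talk is not reproduced here.

```python
import numpy as np

def preconditioned_interpolator(X, y, P):
    """Interpolating solution theta = P X^T (X P X^T)^{-1} y.

    Assumes P is symmetric positive definite and X has full row rank (n < d).
    """
    XP = X @ P
    return XP.T @ np.linalg.solve(XP @ X.T, y)

# Overparameterized toy problem: n = 20 samples, d = 100 features (assumed).
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

P = np.diag(rng.uniform(0.5, 2.0, size=d))  # hypothetical diagonal preconditioner
theta_P = preconditioned_interpolator(X, y, P)
theta_I = preconditioned_interpolator(X, y, np.eye(d))  # identity: minimum-norm solution

print(np.abs(X @ theta_P - y).max())  # both solutions fit the training data
print(np.linalg.norm(theta_P - theta_I))
```

Both solutions interpolate the training labels, but they differ off the training set, which is why the choice of `P` affects the bias and variance of the resulting predictor.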
Taiji Suzuki is currently an Associate Professor in the Department of Mathematical Informatics at the University of Tokyo. He also serves as the team leader of the “Deep learning theory” team at AIP-RIKEN.
He received his Ph.D. degree in information science and technology from the University of Tokyo in 2009. His research interests broadly cover statistical learning theory for deep learning, kernel methods, and sparse estimation, as well as stochastic optimization for large-scale machine learning problems. He has served as an area chair of premier conferences such as NeurIPS, ICML, ICLR, and AISTATS, and as a program chair of ACML.
He received the Outstanding Paper Award at ICLR in 2021, the MEXT Young Scientists’ Prize, the Outstanding Achievement Award in 2017 from the Japan Statistical Society, the Outstanding Achievement Award in 2016 from the Japan Society for Industrial and Applied Mathematics, and the Best Paper Award in 2012 from IBISML.