“Representation power and optimization ability of neural networks”
Friday, March 10, 2023 | Time 10:00am CET
In this presentation, I will provide an overview of recent developments in our theoretical work on the representation power and optimization ability of neural networks. In the first half, I will present a nonparametric estimation analysis of transformer networks in a sequence-to-sequence problem. Transformer networks are the fundamental model behind recent large language models. They can handle long input sequences and avoid the curse of dimensionality even with variable input dimensions. We show that they can adapt to the smoothness of the true function, even when the smoothness with respect to each coordinate varies across inputs. In the second half, we consider mean field Langevin dynamics for optimizing mean field neural networks. We present a convergence analysis of the space-time discretized dynamics with a stochastic gradient approximation.
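To give a concrete picture of the second topic, the following is a minimal, hypothetical sketch (not the speaker's exact algorithm or setup) of space-time discretized mean field Langevin dynamics: each "particle" holds the parameters of one neuron of a two-layer network, and the update is noisy stochastic gradient descent, where the Gaussian noise scale is tied to an assumed entropic regularization strength `lam`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D regression data (illustrative choice, not from the talk)
X = rng.uniform(-2, 2, size=200)
y = np.sin(X)

N = 50       # number of particles (neurons)
eta = 0.05   # step size (time discretization)
lam = 1e-3   # assumed entropic regularization / temperature parameter
theta = rng.normal(size=(N, 3))  # each row: (a_j, w_j, b_j)

def predict(theta, x):
    """Mean field two-layer network: average over particles."""
    a, w, b = theta[:, 0], theta[:, 1], theta[:, 2]
    return np.tanh(np.outer(x, w) + b) @ a / N

def loss(theta):
    return 0.5 * np.mean((predict(theta, X) - y) ** 2)

loss0 = loss(theta)
for step in range(500):
    # Minibatch -> stochastic gradient approximation
    idx = rng.choice(len(X), size=32, replace=False)
    xb, yb = X[idx], y[idx]
    a, w, b = theta[:, 0], theta[:, 1], theta[:, 2]
    h = np.tanh(np.outer(xb, w) + b)        # (batch, N)
    resid = h @ a / N - yb                  # (batch,)
    # Per-particle gradients of the empirical risk (first-variation form)
    ga = resid @ h / len(xb)                # (N,)
    common = resid[:, None] * (1 - h**2) * a
    gw = (common * xb[:, None]).mean(axis=0)
    gb = common.mean(axis=0)
    grad = np.stack([ga, gw, gb], axis=1)
    # Langevin step: gradient + weight decay from the regularizer + noise
    theta += -eta * (grad + lam * theta) \
             + np.sqrt(2 * eta * lam) * rng.normal(size=theta.shape)

print(loss0, loss(theta))
```

In this formulation, the injected noise corresponds to the entropy term in the regularized objective over particle distributions, and the convergence analysis in the talk concerns how the discretized, stochastic-gradient version of this dynamics tracks its continuous-time mean field limit.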
Taiji Suzuki is currently an Associate Professor in the Department of Mathematical Informatics at the University of Tokyo. He also serves as the leader of the “Deep learning theory” team at RIKEN AIP. He received his Ph.D. degree in information science and technology from the University of Tokyo in 2009. He worked as an assistant professor in the Department of Mathematical Informatics at the University of Tokyo between 2009 and 2013, and then as an associate professor in the Department of Mathematical and Computing Science at the Tokyo Institute of Technology between 2013 and 2017. He has broad research interests in statistical learning theory for deep learning, kernel methods and sparse estimation, and stochastic optimization for large-scale machine learning problems. He has served as an area chair of premier conferences such as NeurIPS, ICML, ICLR, and AISTATS, and as a program chair of ACML. He received the Outstanding Paper Award at ICLR in 2021, the MEXT Young Scientists’ Prize, and the Outstanding Achievement Award from the Japan Statistical Society in 2017.