From statistical physics theory to machine learning algorithms: how to beat neural scaling laws through data pruning
Monday October 3, 2022 | 6pm
Statistical mechanics theory and neural network experiments have long enjoyed fruitful interactions spanning the fields of neuroscience and machine learning alike. These interactions have provided both conceptual insights into neural network function and engineering insights into how to improve network performance. We will review some of our recent work in this area and then focus on one recent story involving neural scaling laws and how to beat them. Neural scaling experiments reveal that the error of many neural networks falls off as a power law with network size, dataset size, or compute. Such power laws have motivated significant societal investments in large-scale model training and data collection efforts. Inspired by statistical mechanics calculations, we show both in theory and in practice how we can beat neural power-law scaling with respect to dataset size, sometimes achieving much better exponential scaling instead, by collecting small, carefully curated datasets rather than large random ones. This suggests that a promising path toward more resource-efficient machine learning may lie in the creation of carefully selected foundation datasets capable of training many different models.
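The abstract's central idea (keep a small, carefully curated subset of the data rather than a large random one) can be sketched in a few lines. The scaling constants, the random "difficulty" scores, and the keep fraction below are illustrative assumptions for exposition, not the talk's actual metrics or fitted values:

```python
import math
import random

random.seed(0)

# Power-law vs. exponential scaling of test error with dataset size n.
# Constants a, alpha, b are illustrative assumptions, not fitted values.
def powerlaw_error(n, a=1.0, alpha=0.5):
    return a * n ** (-alpha)

def exponential_error(n, a=1.0, b=1e-3):
    return a * math.exp(-b * n)

# Toy dataset: each example carries a "difficulty" score, e.g. the margin of
# a cheap probe model (a hypothetical stand-in for the curation signal).
n_examples = 10_000
scores = [random.random() for _ in range(n_examples)]

def prune_keep_hardest(scores, keep_frac):
    """Return indices of the hardest keep_frac of examples (highest score)."""
    k = int(len(scores) * keep_frac)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

kept = prune_keep_hardest(scores, keep_frac=0.2)
print(len(kept))  # 2000 curated examples instead of 10000 random ones
```

At large n the exponential curve falls below any power law, which is the sense in which curated datasets can "beat" neural scaling laws; the pruning rule here is only a minimal sketch of selecting by a difficulty score.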
Surya Ganguli triple-majored in physics, mathematics, and EECS at MIT, completed a PhD in string theory at Berkeley, and did a postdoc in theoretical neuroscience at UCSF. He is now an associate professor of Applied Physics at Stanford, where he leads the Neural Dynamics and Computation Lab, and a Research Scientist at Meta AI.
His research spans the fields of neuroscience, machine learning, and physics, focusing on understanding and improving how both biological and artificial neural networks learn striking emergent computations. He has been awarded a Swartz Fellowship in computational neuroscience, a Burroughs Wellcome Career Award, a Terman Award, a NeurIPS Outstanding Paper Award, a Sloan Fellowship, a James S. McDonnell Foundation Scholar Award in human cognition, a McKnight Scholar Award in neuroscience, a Simons Investigator Award in the mathematical modeling of living systems, and an NSF CAREER Award.