Reinforcement Learning and the Brain - LCN - EPFL

Humans and animals learn by trial and error to repeat rewarded behavior and to avoid actions with unpleasant consequences. Reinforcement learning is a computational framework for studying this kind of learning.

Simple reinforcement learning methods are well-established models of learning in stimulus-response-reward (operant conditioning) experiments; mismatches between actual and expected reward (reward-prediction errors) are believed to be signalled by dopamine neurons; and dopamine is known to modulate synaptic plasticity as a third factor in NeoHebbian plasticity rules.
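The interplay of a reward-prediction error and a third-factor plasticity rule can be illustrated with a minimal sketch for a single synapse. It assumes discrete time steps, a Rescorla-Wagner style prediction error delta = r - V as the third factor, and a Hebbian eligibility trace that decays with a time constant `tau_e`. The function names, the demo, and all parameter values are illustrative assumptions, not the LCN's models.

```python
# Minimal sketch of a NeoHebbian three-factor rule for one synapse.
# Assumptions (not from the source): discrete time steps, a Rescorla-Wagner
# style reward prediction error as the third factor, illustrative parameters.


def reward_prediction_error(reward: float, expected: float) -> float:
    """delta = r - V: positive when the outcome is better than expected."""
    return reward - expected


def update_expectation(expected: float, reward: float, alpha: float = 0.5):
    """Rescorla-Wagner update of the expected reward V. Returns (new V, delta)."""
    delta = reward_prediction_error(reward, expected)
    return expected + alpha * delta, delta


def three_factor_step(w, e, pre, post, modulator, eta=0.1, tau_e=5.0, dt=1.0):
    """One Euler step of a hypothetical three-factor rule.

    Factors 1 and 2 (pre- and postsynaptic activity) only drive the
    eligibility trace e, which decays with time constant tau_e.
    Factor 3 (a dopamine-like modulator) gates whether the trace is
    converted into an actual weight change.
    """
    e = e + dt * (-e / tau_e + pre * post)
    w = w + eta * modulator * e
    return w, e


def run_delayed_reward_demo(steps: int = 4, reward_step: int = 3):
    """Pre/post coincidence at step 0, reward delivered only at reward_step."""
    w, e, expected = 0.0, 0.0, 0.0
    for t in range(steps):
        activity = 1.0 if t == 0 else 0.0
        if t == reward_step:
            expected, delta = update_expectation(expected, reward=1.0)
        else:
            delta = 0.0  # no reward given or expected here, so the modulator is zero
        w, e = three_factor_step(w, e, activity, activity, delta)
    return w, e, expected


if __name__ == "__main__":
    w, e, expected = run_delayed_reward_demo()
    print(f"trace={e:.3f} weight={w:.4f}")
```

Running the script prints `trace=0.512 weight=0.0512`: the Hebbian coincidence at step 0 leaves a trace that has decayed to 0.512 when the reward arrives at step 3, and only then does the weight change. Calling `update_expectation` repeatedly with the same reward makes delta shrink (1, 0.5, 0.25, ... for `alpha=0.5`), so a fully predicted reward no longer drives plasticity.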

Much less is known about:

The biological mechanisms that underlie the reinforcement of actions that lead to reward only days or months later (latent learning).
Reinforcement learning in settings where, at any moment, the individual observes only a fraction of the behaviorally relevant state of the environment (partial observability).
Interactions of the reward system with an individual’s knowledge about the environment (the agent’s model of the world).
The factors that favor exploratory behavior over exploitative repetition of previously rewarded actions (curiosity).
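For the curiosity question in the last item, one common modeling device in reinforcement learning is to add an exploration bonus for rarely tried options. The sketch below uses a hypothetical count-based bonus `beta / sqrt(N + 1)` on a three-armed bandit with deterministic rewards; the bonus form, reward values, and parameter names are assumptions for illustration, not the LCN's model of curiosity.

```python
# Minimal sketch of curiosity as a count-based novelty bonus in a bandit task.
# Assumptions (not from the source): the bonus form beta / sqrt(N + 1),
# deterministic rewards, and a simple incremental value update.
import math


def novelty_bonus(count: int, beta: float) -> float:
    """Hypothetical bonus that is largest for options never tried before."""
    return beta / math.sqrt(count + 1)


def run_bandit(rewards, steps, beta, alpha=0.5):
    """Greedy choice on value + novelty bonus; returns the sequence of actions."""
    n_actions = len(rewards)
    values = [0.0] * n_actions  # learned expected reward per action
    counts = [0] * n_actions    # how often each action was taken
    choices = []
    for _ in range(steps):
        scores = [values[a] + novelty_bonus(counts[a], beta) for a in range(n_actions)]
        # max() returns the first maximal index, so ties go to the lowest action.
        action = max(range(n_actions), key=lambda a: scores[a])
        reward = rewards[action]
        values[action] += alpha * (reward - values[action])  # prediction-error update
        counts[action] += 1
        choices.append(action)
    return choices


if __name__ == "__main__":
    rewards = [0.5, 0.0, 1.0]  # action 2 is best, but action 0 is tried first
    print("no bonus:  ", run_bandit(rewards, steps=6, beta=0.0))
    print("with bonus:", run_bandit(rewards, steps=6, beta=1.0))
```

With `beta=0.0` the agent keeps repeating the first rewarded action (`[0, 0, 0, 0, 0, 0]`) and never discovers the better action 2; with `beta=1.0` the bonus pulls it toward untried actions and it settles on action 2 (`[0, 1, 2, 2, 2, 2]`). This is only one simple way to trade exploitation against exploration, and it does not model the distinction between novelty and surprise.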

At the LCN we address such questions on three different levels. First, at the level of synaptic plasticity rules and biological neural networks, we study how reinforcement learning methods are implemented in brains. Second, we analyse reward-driven learning, decision-making and behavior in different species to find reinforcement learning algorithms that replicate this behavior and correlate with measured brain signals. Third, on a more abstract level, we investigate reinforcement learning models of animal and human cognition, planning and learning.

Recent papers from the LCN

H.A. Xu, A. Modirshanechi, M.P. Lehmann, W. Gerstner, M.H. Herzog (2021)
Novelty is not Surprise: Human exploratory and adaptive behavior in sequential decision-making
PLoS Computational Biology, 17, e1009070

M. Lehmann, H.A. Xu, V. Liakoni, M. Herzog, W. Gerstner, K. Preuschoff (2019)
One-shot learning and behavioral eligibility traces in sequential decision making
eLife, 8, e47463

M. Martinolli, W. Gerstner, A. Gilra (2018)
Multi-Timescale Memory Dynamics Extend Task Repertoire in a Reinforcement Learning Network With Attention-Gated Memory
Frontiers in Computational Neuroscience, 12

W. Gerstner, M. Lehmann, V. Liakoni, D. Corneil, J. Brea (2018)
Eligibility Traces and Plasticity on Behavioral Time Scales: Experimental Support of NeoHebbian Three-Factor Learning Rules
Frontiers in Neural Circuits, 12

D. Corneil, W. Gerstner, J. Brea (2018)
Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1049-1058

N. Frémaux, H. Sprekeler, W. Gerstner (2013)
Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons
PLoS Computational Biology, 9(4), 1-21

N. Frémaux, H. Sprekeler, W. Gerstner (2010)
Functional Requirements for Reward-Modulated Spike-Timing-Dependent Plasticity
Journal of Neuroscience, 30(40), 13326-13337