Computational Neuroscience Seminar - LCN


18.09.09 Friday, 12h15, BC 01

Henning Sprekeler, Laboratory for Computational Neuroscience, EPFL

Reward-modulated spike timing-dependent plasticity requires a reward prediction system

Abstract:
It is commonly believed that synaptic plasticity is the major neuronal correlate of learning and memory. A prominent type of synaptic plasticity is spike timing-dependent plasticity (STDP), which depends on the exact timing of pre- and postsynaptic spikes. STDP is the spiking correlate of Hebbian learning, and most mathematical models of STDP have the form of unsupervised learning rules. Although such learning rules are capable of establishing basic sensory representations, they do not take into account signals that indicate behavioral relevance and are therefore not suitable for behavioral learning.

A number of recent modelling studies have extended STDP to a three-factor rule, which takes behavioral reward signals into account in addition to pre- and postsynaptic activity. Most of these learning rules are ad hoc models inspired by biological findings. Here, we ask whether, and under which conditions, these learning rules do what they are supposed to do: do they increase the expected reward received by the animal? Using analytical arguments illustrated by simulations, we show that reward-modulated STDP fails on many reward learning tasks unless it is modulated not by a pure reward signal but by the discrepancy between the reward actually received and an internal prediction of the expected reward. This prediction has to be task-specific. We conclude that, if the brain uses reward-modulated STDP for behavioral learning, a separate prediction system is required that transforms behavioral rewards into reward prediction errors.
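
The core of this argument can be illustrated with a small numerical sketch. It is not the speaker's model or simulation: it is a toy under stated assumptions, in which a scalar `e` stands in for the STDP term accumulated over one trial, `r` is the trial reward, and `r_pred` is a hypothetical running-average reward prediction. All names and numbers are made up for illustration. The sketch relies only on the identity <(r - b) e> = Cov(r, e) + (<r> - b) <e>, which holds for any baseline b that is independent of e: with a pure reward signal (b = 0) the average update contains the term <r><e>, which does not depend on how reward and plasticity co-vary, whereas a baseline close to <r> removes it.

```python
import random


def mean_updates(n_trials=20000, alpha=0.01, seed=0):
    """Return the average per-trial weight change for two modulation signals.

    Toy model (an assumption, not taken from the talk):
      e = 1 + z           stand-in for the STDP term; its nonzero mean
                          plays the role of an unsupervised drift
      r = 5 - 0.5 * z     reward is lower when e is above its mean,
          + small noise   so Cov(r, e) = -0.5 while <r><e> = 5
    """
    rng = random.Random(seed)
    r_pred = 0.0    # hypothetical running prediction of the expected reward
    sum_pure = 0.0  # accumulates r * e            (pure reward modulation)
    sum_rpe = 0.0   # accumulates (r - r_pred) * e (prediction-error modulation)
    for _ in range(n_trials):
        z = rng.gauss(0.0, 1.0)
        e = 1.0 + z
        r = 5.0 - 0.5 * z + rng.gauss(0.0, 0.1)
        sum_pure += r * e
        sum_rpe += (r - r_pred) * e
        # Update the prediction only after it has been used on this trial,
        # so that it stays independent of the current e.
        r_pred += alpha * (r - r_pred)
    return sum_pure / n_trials, sum_rpe / n_trials


if __name__ == "__main__":
    pure, rpe = mean_updates()
    print("mean update, pure reward:      ", pure)
    print("mean update, prediction error: ", rpe)
```

In this toy setting the pure-reward rule drives the weight upward on average even though reward and `e` are negatively correlated, while the prediction-error rule follows the sign of the covariance. The same identity suggests why a single shared prediction would not be enough when several tasks have different mean rewards: the residual term (<r> - b) <e> would then remain nonzero for each individual task.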
