Computational Neuroscience Seminar - LCN
18.09.09 Friday, 12h15,
BC 01
Henning Sprekeler, Laboratory for Computational Neuroscience, EPFL
Reward-modulated spike timing-dependent plasticity requires a reward prediction system
Abstract:
It is commonly believed that synaptic plasticity is the major neuronal
correlate of learning and memory. A prominent type of synaptic
plasticity is spike timing-dependent plasticity (STDP), which depends
on the exact timing of pre- and postsynaptic spikes. STDP is the
spiking correlate of Hebbian learning, and most mathematical models of
STDP take the form of unsupervised learning rules. Although such
learning rules can establish basic sensory representations, they do
not take into account signals that indicate behavioral relevance and
are therefore unsuitable for behavioral learning.
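The timing dependence described above can be illustrated with a pair-based STDP rule that uses the exponential learning window common in models. This is a minimal sketch. The amplitudes and time constants (A_PLUS, A_MINUS, TAU_PLUS, TAU_MINUS) and the all-to-all spike pairing are assumptions chosen for illustration, not parameters from the talk. The rule depends only on pre- and postsynaptic spike times, which makes it unsupervised: no reward or other behavioral signal enters.

```python
import math

# Hypothetical parameters, for illustration only (not from the talk).
A_PLUS = 0.01     # maximal potentiation per spike pair
A_MINUS = 0.012   # maximal depression per spike pair
TAU_PLUS = 20.0   # potentiation time constant (ms)
TAU_MINUS = 20.0  # depression time constant (ms)


def stdp_window(dt: float) -> float:
    """Weight change for one spike pair, with dt = t_post - t_pre in ms."""
    if dt > 0:  # pre before post -> potentiation
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:  # post before pre -> depression
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0


def total_weight_change(pre_spikes, post_spikes) -> float:
    """Sum the window over all pre/post spike pairs (all-to-all pairing)."""
    return sum(stdp_window(t_post - t_pre)
               for t_pre in pre_spikes
               for t_post in post_spikes)
```

For example, a presynaptic spike at 0 ms followed by a postsynaptic spike at 10 ms gives a positive change (about 0.006 with these parameters), while reversing the order gives a negative one.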
A number of recent modelling studies have extended STDP to a
three-factor rule that takes behavioral reward signals into account
in addition to pre- and postsynaptic activity. Most of these learning
rules are ad hoc models inspired by biological findings. Here, we
address the question of whether, and under which conditions, these
learning rules do what they are supposed to do: Do they increase the
expected reward received by the animal? Using analytical arguments, illustrated by
simulations, we show that reward-modulated STDP fails on many reward
learning tasks unless it is modulated not by the raw reward signal
but by the discrepancy between the reward actually received and
an internal prediction of the expected reward. This prediction has to
be task-specific. We conclude that, if the brain uses reward-modulated
STDP for behavioral learning, a separate prediction system is required
that transforms behavioral rewards into reward prediction errors.
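A toy calculation can show why a reward prediction matters. Suppose the Hebbian STDP term of one trial is summarized by an eligibility trace e, and the weight change is proportional to (R - b) * e, where R is the reward and b is a baseline. The expected update is then E[(R - b) e] = Cov(R, e) + (E[R] - b) E[e]. Only the covariance term reflects how reward depends on synaptic activity. If e has a non-zero mean, as it can for unsupervised STDP, the second term adds a reward-independent drift that can dominate, and it vanishes only when b equals the expected reward. The sketch below uses a hypothetical single-synapse model with Gaussian eligibility traces and an assumed linear dependence of reward on e. All constants are invented for illustration and are not the model from the talk. The baseline b is learned as a running average of past rewards, standing in for a reward prediction system.

```python
import random

# Hypothetical constants, chosen only to illustrate the argument.
MU_E = 0.5    # mean eligibility per trial (> 0: net potentiation bias)
R0 = 1.0      # reward offset
C = -0.2      # reward decreases with e, so Cov(R, e) = C * Var(e) < 0
ETA = 1e-3    # synaptic learning rate
ALPHA = 0.01  # learning rate of the reward prediction b


def mean_weight_change(use_prediction: bool, n_trials: int = 100_000,
                       seed: int = 0) -> float:
    """Average per-trial weight change under a toy reward-modulated STDP rule.

    dw = ETA * (R - b) * e, with b = 0 (pure reward modulation) or
    b = running average of past rewards (reward prediction error).
    """
    rng = random.Random(seed)
    b = 0.0  # reward prediction; stays 0 when use_prediction is False
    total_dw = 0.0
    for _ in range(n_trials):
        e = rng.gauss(MU_E, 1.0)                   # eligibility trace of this trial
        reward = R0 + C * e + rng.gauss(0.0, 0.1)  # noisy reward
        total_dw += ETA * (reward - b) * e         # three-factor update
        if use_prediction:
            b += ALPHA * (reward - b)              # update prediction after the trial
    return total_dw / n_trials


if __name__ == "__main__":
    print("pure reward:     ", mean_weight_change(use_prediction=False))
    print("prediction error:", mean_weight_change(use_prediction=True))
```

Under these assumptions, reward is negatively correlated with the eligibility trace, so reward-driven learning should weaken the synapse on average. Pure reward modulation (b = 0) instead gives a positive average change of roughly ETA * 0.25. The learned prediction gives roughly ETA * (-0.2), which has the sign of the covariance term. If several tasks with different mean rewards were interleaved, a single running average would presumably not cancel the drift within each task. That is one way to read the requirement that the prediction be task-specific.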