Dopamine encoding of novelty facilitates efficient uncertainty-driven exploration.

Wang Y
Lak A
Manohar S
Bogacz R

A trial-and-error process is often necessary to determine the most rewarding action in a certain context. Determining how many resources should be allocated to acquiring information (“exploration”) and how much to utilising existing information to maximise reward (“exploitation”) is key to overall effectiveness. We propose a theory whereby a deep brain area (the basal ganglia) uses an algorithm to optimally allocate resources between exploration and exploitation. We also test our theory with experimental results and assess the performance of this algorithm.

Scientific Abstract

When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but also put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We utilised electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared, in simulation, the performance of directed exploration strategies inspired by our basal ganglia model with that of other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy. The exploration strategies inspired by the basal ganglia model can achieve overall superior performance in simulation, and fitting them to behavioural data gave results qualitatively similar to those obtained by fitting more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation which efficiently drives exploration in reinforcement learning.
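For readers unfamiliar with the UCB family mentioned in the abstract, the sketch below illustrates the general idea of uncertainty-driven exploration on a simple multi-armed bandit. It is the textbook UCB1 rule, not the basal ganglia model proposed in the paper. The function name `ucb1_bandit`, the Gaussian reward assumption, and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def ucb1_bandit(true_means, n_trials=2000, noise_sd=1.0, c=2.0, seed=0):
    """Run a textbook UCB1 agent on a Gaussian bandit (illustrative only).

    true_means: assumed mean reward of each arm (hypothetical values).
    c: scale of the exploration bonus; c = 2 gives the classic UCB1 form.
    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # how often each arm has been chosen
    means = [0.0] * n_arms   # running estimate of each arm's mean reward

    for t in range(1, n_trials + 1):
        if t <= n_arms:
            # Try every arm once so that each count is non-zero.
            arm = t - 1
        else:
            # Score = estimated mean + bonus that shrinks as the arm is sampled.
            scores = [
                means[a] + math.sqrt(c * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=scores.__getitem__)

        reward = rng.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
        # Incremental update of the sample mean.
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts, means


if __name__ == "__main__":
    counts, means = ucb1_bandit([0.0, 0.5, 2.0])
    print("pulls per arm:", counts)
    print("estimated means:", [round(m, 2) for m in means])
```

The bonus term plays the role of an uncertainty estimate: arms that have been sampled rarely receive a larger bonus and are therefore explored, while well-sampled arms are chosen mainly on their estimated mean. UCB1 is usually analysed for bounded rewards, so applying it to Gaussian rewards here is a simplification.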

Two figures: the top is a schematic of a circuit, with nodes drawn as circles and the connections between them as lines. The bottom is a plot showing a curved line in red and individual data points in blue.
D1/D2 receptor-expressing neurons are involved in direct and indirect striatal pathways in the basal ganglia, respectively, and dopamine has opposite effects on the two pathways (top). With dopaminergic neurons encoding novelty (bottom), the circuit dynamics give rise to representation of posterior uncertainty about mean reward levels (𝜎), which is used to effectively modulate explorative behaviour.

2024. PLoS Comput Biol, 20(4):e1011516.
