Learning the payoffs and costs of actions

How do we decide what to do in any given situation? We need to work out the payoffs and costs of our actions: what we stand to gain, and what it might cost us. But what exactly is the mechanism in the brain that weighs these payoffs against the costs?

This article is an accessible summary of Möller & Bogacz 2019.

We strongly suspect that the mechanism for evaluating payoffs and costs lies in the basal ganglia, a set of structures beneath the cortex that are involved in initiating and inhibiting movements. The possible payoffs and costs of a particular action are thought to be ‘encoded’ here, in the connections from other parts of the brain onto two distinct populations of brain cells (neurons). These neurons are also influenced by the ‘dopaminergic signal’, which carries information about our motivational state: neurons in the ‘Go’ pathway are excited by dopamine, whereas neurons in the ‘No-Go’ pathway are inhibited by it. By shifting the balance between the two pathways, dopamine inclines us either to take a particular action or to avoid it.

For example, say the opportunity arises to pick an apple from a tree. The payoff is the nutrients gained from eating the apple; the costs are the effort of climbing the tree and the risk that comes with it. Nutrients are only really valuable, however, if we’re hungry. When we are hungry, the payoff is weighted more heavily than the costs; when we’re not, the costs outweigh the payoff. This is how our motivational state shapes our decisions.
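For readers who like to see the arithmetic, here is a toy sketch of how a motivational signal could tip the balance between the two pathways. It is not the model from the paper: the function name `action_value`, the linear weighting and all the numbers are illustrative assumptions. The Go value stands for the learned payoff, the No-Go value for the learned cost, and a hypothetical "dopamine level" between 0 and 1 stands for how hungry we are.

```python
def action_value(go: float, nogo: float, dopamine: float) -> float:
    """Toy weighting of payoff against cost (illustrative assumption, not the paper's equation).

    go       -- learned payoff of the action (e.g. nutrients from the apple)
    nogo     -- learned cost of the action (e.g. effort and risk of climbing)
    dopamine -- hypothetical motivational level in [0, 1]; higher means hungrier
    """
    # More dopamine boosts the Go contribution and dampens the No-Go contribution.
    return dopamine * go - (1.0 - dopamine) * nogo


APPLE_PAYOFF = 2.0  # made-up numbers, for illustration only
CLIMB_COST = 1.0

for label, dopamine in [("hungry", 0.8), ("not hungry", 0.2)]:
    value = action_value(APPLE_PAYOFF, CLIMB_COST, dopamine)
    decision = "pick the apple" if value > 0 else "leave it"
    print(f"{label}: value = {value:.1f} -> {decision}")
```

With these numbers the script prints `hungry: value = 1.4 -> pick the apple` and `not hungry: value = -0.4 -> leave it`: the same apple and the same tree lead to opposite decisions purely because the motivational signal has changed.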

But how exactly does the brain work out what the payoffs and costs of particular actions are? How does it form the two sets of connections that make up the ‘Go’ and ‘No-Go’ pathways? The paper proposes that payoffs and costs are learned through reward prediction errors: differences between the outcome we expected and the outcome we actually got. To go back to our example, the first time you see an apple, you may overestimate its value. The next time you see one, you will have adjusted your estimate of its value based on that previous experience. Similarly, you refine your estimate of how much effort it will take to get the apple. The surprise you feel when a prediction turns out to be wrong is what triggers the learning.
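Learning from prediction errors can be written as a single update: the estimate moves part of the way towards what actually happened, in proportion to the surprise. The sketch below uses the textbook delta rule purely as an illustration; the learning rate of 0.5 and the apple values are made-up assumptions, not parameters from the paper.

```python
def update_estimate(estimate: float, outcome: float, learning_rate: float = 0.5) -> float:
    """One step of a simple delta rule (illustrative, not the paper's exact rule)."""
    prediction_error = outcome - estimate  # the 'surprise': what happened minus what was expected
    return estimate + learning_rate * prediction_error


estimate = 10.0    # first impression: the apple's value is overestimated
true_value = 4.0   # what eating the apple is actually worth (made-up number)

history = []
for encounter in range(3):
    estimate = update_estimate(estimate, true_value)
    history.append(estimate)

print(history)  # [7.0, 5.5, 4.75]
```

Each encounter halves the gap between the estimate and the true value. Once the prediction is accurate, the prediction error is zero and the estimate stops changing, which is the sense in which surprise drives learning.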

These prediction errors modify the connections (the ‘weights’) onto the ‘Go’ and ‘No-Go’ neurons in different ways, so that over repeated experiences the two pathways come to reflect an action’s payoffs and its costs. This is brain plasticity at work: the brain adjusts its connections according to what happens in various situations. Without this learning, we would have a hard time making good decisions.

The paper also shows why it matters that the payoff and the cost of an action are usually experienced at different moments. You might find that picking the apple takes more effort than you expected; seconds later, you might discover that it tastes even better than you thought it would. The brain thus receives two distinct pieces of information, and it learns about positive and negative consequences separately by exploiting the fact that payoffs and costs seldom arrive at the same time. If they did, they would cancel each other out and there would be no learning.
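To see why timing matters, here is a deliberately simplified sketch. It is an assumption of this illustration, not the plasticity rule from the paper, that the surprise at the moment of effort updates only the No-Go weight and the surprise at the moment of tasting updates only the Go weight. The function name `learn`, the learning rate and the numbers are all hypothetical.

```python
def learn(payoff: float, cost: float, trials: int = 50,
          alpha: float = 0.2, separate: bool = True) -> tuple[float, float]:
    """Toy learner with a Go weight (payoff) and a No-Go weight (cost).

    Illustrative assumption: when payoff and cost arrive at different moments,
    the error at the effort moment updates only No-Go and the error at the
    tasting moment updates only Go.
    """
    go, nogo = 0.0, 0.0
    for _ in range(trials):
        if separate:
            # Moment 1: the climb. Negative if the effort was harder than expected.
            cost_error = nogo - cost
            nogo -= alpha * cost_error   # harder than expected -> No-Go weight grows
            # Moment 2: the taste. Positive if the apple was better than expected.
            payoff_error = payoff - go
            go += alpha * payoff_error   # tastier than expected -> Go weight grows
        else:
            # Payoff and cost arrive together, so only their net outcome is observed.
            net_error = (payoff - cost) - (go - nogo)
            go += alpha * net_error
            nogo -= alpha * net_error
    return go, nogo


print(learn(payoff=3.0, cost=2.0, separate=True))   # close to (3.0, 2.0)
print(learn(payoff=2.0, cost=2.0, separate=False))  # (0.0, 0.0)
```

When the two experiences are separate, the Go weight converges towards the payoff and the No-Go weight towards the cost. When an equal payoff and cost arrive together, the net prediction error is zero on every trial and nothing is learned. Even when they differ, this toy learner only picks up their difference, so the two weights cannot come to represent payoff and cost individually.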

Rather than proposing an entirely new account of how the basal ganglia learn, the paper unifies two existing models. The first describes what the basal ganglia learn: reasons to approach, and reasons to avoid, frequently arising opportunities, with payoffs and costs weighed up according to motivational state. The second sets out plasticity rules for the basal ganglia, describing how that learning takes place biologically. But how does the brain then put what it has learned to use? Our research brings the two models together, providing a flexible model of learning and decision-making that takes into account both the motivational state and the learned representations of payoffs and costs.

Summary by Jacqueline Pumphrey.