A Normative Account of Confirmation Bias During Reinforcement Learning.
Confirmation bias refers to our tendency to pay more attention to information that is consistent with our past experience or beliefs. Here, we used computer simulations to investigate whether and how confirmation bias can be useful in certain situations. We show that confirmation bias can enable more accurate decisions in the face of noise, because it leads to higher confidence in value estimates, making choices less likely to be swayed by random variability.
Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that, in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the value estimate of the unchosen option. Here, we simulate performance on a multi-armed bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: confirmatory biases allow the agent to harvest more reward than an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because, on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward maximizing if decisions are made with finite computational precision.
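The asymmetric updating rule described above can be sketched in a short simulation. This is a minimal illustration, not the paper's exact implementation: it assumes a two-armed Bernoulli bandit, softmax choice (inverse temperature as decision noise), full counterfactual feedback for the unchosen arm, and hypothetical learning-rate values. The chosen option is updated with a larger learning rate after positive prediction errors, while the unchosen option has the asymmetry reversed.

```python
import math
import random

def run_bandit(alpha_conf, alpha_disconf, beta=3.0,
               p_reward=(0.7, 0.3), n_trials=1000, seed=0):
    """Simulate confirmatory-bias Q-learning on a two-armed Bernoulli bandit.

    alpha_conf weights prediction errors that confirm the choice
    (positive PE on the chosen arm, negative PE on the unchosen arm);
    alpha_disconf weights disconfirming errors. Returns mean reward.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial value estimates for the two arms
    total = 0.0
    for _ in range(n_trials):
        # softmax choice: beta controls how noisy decisions are
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        chosen = 0 if rng.random() < p0 else 1
        other = 1 - chosen

        reward = 1.0 if rng.random() < p_reward[chosen] else 0.0
        total += reward

        # chosen arm: larger step for positive (confirming) errors
        pe_c = reward - q[chosen]
        q[chosen] += (alpha_conf if pe_c > 0 else alpha_disconf) * pe_c

        # unchosen arm: counterfactual outcome, asymmetry reversed
        cf_reward = 1.0 if rng.random() < p_reward[other] else 0.0
        pe_u = cf_reward - q[other]
        q[other] += (alpha_disconf if pe_u > 0 else alpha_conf) * pe_u
    return total / n_trials

# confirmatory agent vs. unbiased agent (same mean learning rate)
biased = run_bandit(alpha_conf=0.3, alpha_disconf=0.1)
unbiased = run_bandit(alpha_conf=0.2, alpha_disconf=0.2)
```

Comparing `biased` and `unbiased` across many seeds and noise levels (varying `beta`) is one way to reproduce the qualitative result: the biased rule inflates the value gap between the arms, so noisy decisions more often land on the better arm.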