Confirmation bias optimizes reward learning

Tor Tarantola; Tomas Folke; Annika Boldt; Omar D. Pérez; Benedetto De Martino

doi:10.1101/2021.02.27.433214

ABSTRACT

Confirmation bias, the tendency to overweight information that matches prior beliefs or choices, has been shown to manifest even in simple reinforcement learning. In line with recent work, we find that participants in a reward-learning task learned significantly more from choice-confirming outcomes than from disconfirming ones. What is less clear is whether such asymmetric learning rates benefit the learner. Here, we combine data from human participants and artificial agents to examine how confirmation-biased learning might improve performance by counteracting decisional and environmental noise. We evaluate one potential mechanism for such noise reduction: visual attention, a demonstrated driver of both value-based choice and predictive learning. Surprisingly, visual attention showed the opposite pattern to confirmation bias: participants were most likely to fixate on “missed opportunities”, which slightly dampened the effects of the confirmation bias we observed. Several million simulated experiments with artificial agents showed this bias to be a reward-maximizing strategy compared with several alternatives, but only if disconfirming feedback is not completely ignored, a condition that visual attention may help to enforce.
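The asymmetric learning rates described in the abstract can be illustrated with a minimal sketch. It is not the authors' model. It assumes a two-armed bandit with feedback shown for both options, a delta-rule (Rescorla-Wagner style) update, and a softmax choice rule. The names `update_values`, `choose`, `simulate`, `alpha_conf`, `alpha_disc`, and `beta`, along with all parameter values, are hypothetical and chosen only for illustration. An outcome is treated as choice-confirming when the chosen option does better than expected or the unchosen option does worse than expected.

```python
import math
import random


def update_values(q, chosen, rewards, alpha_conf, alpha_disc):
    """Return new value estimates after one trial with feedback on both options.

    Assumption: confirming = positive prediction error for the chosen option,
    or negative prediction error for the unchosen option.
    """
    new_q = list(q)
    for option, reward in enumerate(rewards):
        delta = reward - q[option]  # prediction error
        if option == chosen:
            confirming = delta > 0
        else:
            confirming = delta < 0
        alpha = alpha_conf if confirming else alpha_disc
        new_q[option] = q[option] + alpha * delta
    return new_q


def choose(q, beta, rng):
    """Softmax choice between two options; beta is the inverse temperature."""
    p_first = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
    return 0 if rng.random() < p_first else 1


def simulate(alpha_conf, alpha_disc, n_trials=200,
             p_reward=(0.7, 0.3), beta=5.0, seed=0):
    """Run one hypothetical agent and return its mean reward per trial."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # illustrative starting values
    total = 0.0
    for _ in range(n_trials):
        chosen = choose(q, beta, rng)
        # Binary rewards drawn independently for each option (an assumption).
        rewards = [1.0 if rng.random() < p else 0.0 for p in p_reward]
        total += rewards[chosen]
        q = update_values(q, chosen, rewards, alpha_conf, alpha_disc)
    return total / n_trials


if __name__ == "__main__":
    # Compare a biased agent, an unbiased agent, and one that ignores
    # disconfirming feedback entirely (alpha_disc = 0).
    for label, a_conf, a_disc in [("biased", 0.3, 0.1),
                                  ("unbiased", 0.2, 0.2),
                                  ("ignores disconfirming", 0.3, 0.0)]:
        print(label, simulate(a_conf, a_disc))
```

Setting `alpha_disc` to zero corresponds to the case the abstract flags, in which disconfirming feedback is completely ignored. A single seeded run like this says little on its own; any comparison between agents would need many runs across environments and noise levels.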

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

+ Equal contribution
Removed the middle name of one of the authors, at the request of that author. Added ORCID to one author.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.