Unlocking a new dimension in the speed–accuracy trade-off

Why do we sometimes spend too much time on seemingly impossible-to-solve tasks instead of just moving on? Masís et al. provide a new perspective on the speed–accuracy trade-off (SAT), showing that, although prolonging deliberation looks suboptimal in the short run, it is a long-term investment that helps organisms reach proficient performance more rapidly.

Goal-directed agents operate within finite segments of time and, therefore, must constantly weigh the benefits of deliberating for longer against the costs of acting too slowly. When information about the available alternatives is noisy, taking more time to deliberate allows extra evidence to be considered, increasing the likelihood of correctly identifying the most gratifying course of action. However, prolonging deliberation is also costly because inaction delays gratification and increases the chances of foregoing other opportunities [1]. Understanding how biological organisms manage this so-called 'SAT' has been a central goal in the cognitive and neural sciences [2]. A recent study by Masís and colleagues [3] reconceptualises how the SAT might be solved in biological brains by proposing that taking more time to deliberate confers a previously overlooked long-term benefit: it enhances learning and enables an organism to master a task more quickly.
Influential research in statistics has offered a normative solution to the SAT [4]. Specifically, when deciding between two alternatives in a stationary statistical environment, agents maximise their reward rate by accumulating the relative evidence in favour of one alternative over the other, up until the relative accumulated evidence reaches an optimally placed criterial level (or response criterion). In turn, this optimal response criterion is a function of the throughput of information (the signal-to-noise ratio; SNR), which is influenced by the amount of external and internal (processing) noise that impinges on decision-relevant information. Thus, in the face of insurmountable noise, the normative recipe dictates setting the response criterion at a low level to avoid wasting too much time on a seemingly impossible task. Conversely, when the incoming evidence is highly informative, the normative procedure considers it worthwhile to take more time to gather evidence and, thus, prescribes a higher response criterion [2]. Subsequent work in mathematical psychology and decision neuroscience formulated computational models of noisy evidence accumulation that mimic the statistically optimal procedure for decisions between two alternatives [5]. In parallel, empirical research has assessed the extent to which humans and other animals performing simple perceptual decisions maximise reward rate by optimally adjusting their response criterion to the levels of external and internal noise [6]. These assessments have favoured the view that, after extensive learning, performance approximates the reward-rate-maximising standard. However, the way agents learn to optimally set their response criterion is not well understood. Masís et al. [3] set out to fill this gap by examining what happens in the 'dark matter' training period that precedes the stage of proficient performance.
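As a rough illustration of the normative recipe described above, the following Python sketch uses the standard closed-form expressions for the error rate and mean decision time of a drift-diffusion accumulator with symmetric bounds, and searches a grid for the response criterion that maximises reward rate. The parameter values (unit noise, a 2 s delay between trials, the two drift values, the grid resolution) and all function names are assumptions chosen for illustration; they are not taken from Masís et al. [3].

```python
import math


def error_rate(drift, threshold, noise=1.0):
    """Probability of hitting the wrong bound in a symmetric drift-diffusion model."""
    return 1.0 / (1.0 + math.exp(2.0 * drift * threshold / noise**2))


def mean_decision_time(drift, threshold, noise=1.0):
    """Mean time for accumulated evidence to reach either bound."""
    return (threshold / drift) * math.tanh(drift * threshold / noise**2)


def reward_rate(drift, threshold, noise=1.0, delay=2.0):
    """Correct responses per unit time; `delay` (assumed) lumps together all
    non-decision time between successive trials."""
    correct = 1.0 - error_rate(drift, threshold, noise)
    return correct / (mean_decision_time(drift, threshold, noise) + delay)


def best_threshold(drift, noise=1.0, delay=2.0, grid_max=3.0, steps=3000):
    """Grid search for the reward-rate-maximising response criterion."""
    candidates = [grid_max * (i + 1) / steps for i in range(steps)]
    return max(candidates, key=lambda z: reward_rate(drift, z, noise, delay))


# Illustrative (assumed) drift values standing in for a low and a moderate SNR.
low_snr_threshold = best_threshold(drift=0.1)
moderate_snr_threshold = best_threshold(drift=1.0)
print(low_snr_threshold, moderate_snr_threshold)
```

With these assumed settings, the criterion found for the low drift value is markedly lower than that for the moderate one, in line with the prescription to respond quickly when the task seems close to impossible.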
In one experiment, the authors reported that rats performing a free-response binary choice (visual recognition) task unsurprisingly began with a limited ability to perform the task at hand, which formally translates into a low SNR. Surprisingly, however, and contrary to the reward-rate optimal strategy, which requires fast responses under low SNR conditions, rats responded too slowly during these early trials.
The authors entertained the intriguing hypothesis that this sluggish and seemingly suboptimal early behaviour could have a normative explanation. Specifically, it is well established that, as agents perform more experimental trials, their SNR gradually improves, a process dubbed 'perceptual learning' [7]. Masís et al. [3] implemented perceptual learning in an artificial recurrent neural network (RNN) that gradually improves the quality of information encoding (the SNR) based on error-corrective feedback. Through highly sophisticated mathematical derivations, the authors demonstrated that as the amount of accumulated evidence preceding incorrect choices increases, the RNN reaches the asymptotic SNR more rapidly. Thus, setting a high response criterion during the early training trials can be regarded as a long-term investment, a strategy that foregoes short-term reward-rate optimisation to swiftly catapult rats to the stage of proficient task performance.
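The investment logic can be made concrete with a deliberately simplified toy, which is not the RNN of Masís et al. [3]. The sketch below assumes, purely for illustration, that the per-trial improvement in drift (a stand-in for SNR) is proportional to the mean deliberation time on that trial, and compares how much total task time two fixed response criteria need to reach 90% of an assumed asymptotic drift. The update rule, learning rate, delay, criteria, and function names are all hypothetical.

```python
import math


def mean_decision_time(drift, threshold, noise=1.0):
    """Mean time for a symmetric drift-diffusion accumulator to reach a bound."""
    return (threshold / drift) * math.tanh(drift * threshold / noise**2)


def time_to_proficiency(threshold, drift0=0.1, drift_max=1.0,
                        learning_rate=0.05, delay=2.0, target=0.9):
    """Total elapsed task time until drift reaches `target * drift_max`
    when the agent keeps a fixed response criterion throughout training."""
    drift, elapsed = drift0, 0.0
    while drift < target * drift_max:
        deliberation = mean_decision_time(drift, threshold)
        # ASSUMPTION (toy rule): the learning increment scales with
        # deliberation time and shrinks as drift approaches its asymptote.
        drift += learning_rate * deliberation * (drift_max - drift)
        elapsed += deliberation + delay
    return elapsed


elapsed_low_criterion = time_to_proficiency(threshold=0.2)
elapsed_high_criterion = time_to_proficiency(threshold=1.0)
print(elapsed_low_criterion, elapsed_high_criterion)
```

Under these assumptions the higher criterion reaches the target in less total task time, because each slow trial delivers a larger learning increment. This outcome follows directly from the assumed update rule and should not be read as independent evidence for it.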
In two further experiments, Masís et al. [3] sought to corroborate the hypothesis that slow responses during learning subserve long-term reward accrual. In a second experiment, rats were forced to respond either more slowly or more quickly than their average latencies. The slow group improved its SNR more rapidly and ended up achieving a higher reward rate, indicating that longer deliberation promotes more rapid learning. To ensure that these results truly reflect a strategic investment and do not just coincidentally fall out from more naive policies, the authors showed in a third experiment that when the task is unlearnable, rats opt for a low response criterion and the reward-rate optimal strategy from the beginning. This suggests that the speed of responses is indeed subject to strategic control depending on whether foregoing short-term reward rate is a sensible investment or not.

Trends in Cognitive Sciences

Taken together, these findings unlock a new dimension in the SAT. Prolonging deliberation has always been thought to confer a local benefit in the amount of evidence that is available for a single decision, at the expense of delaying gratification. Masís et al. [3] move beyond this myopic cost-benefit conceptualisation, proposing that prolonging deliberation has far-reaching benefits. That is, engaging for longer when you are still learning the ropes of a new task might appear fruitless, but is in fact a more effective learning strategy. Future work could examine whether this insight also applies to humans, who often learn new tasks via verbal instructions and high-level abstraction and reasoning rather than by extensive trial and error, or whether the slowing down observed during the initial stage of training mainly supported memory encoding, a vital element in visual recognition tasks.
Normative theories have been a constant source of inspiration and a starting point for cognitive and behavioural scientists, who seek to understand whether the behaviour of biological organisms complies with normative ideals. However, biological organisms strive to minimise costs (e.g., mental effort [8] or metabolic expenditure [9]) and to reap benefits that are irrelevant to nonbiological normative agents. Understanding the way normative trade-offs, such as speed versus accuracy or exploration versus exploitation, pan out within the stochastic and plastic reality of biological brains may help us understand behavioural suboptimalities in a new light [10].