Evidence accumulation in changing environments: linking normative computation and neural implementation

Several models posit that perceptual decisions under uncertainty result from the lossless temporal accumulation of momentary ‘evidence’ for alternative world states. An often-overlooked challenge of perceptual decisions in natural environments is that the world state can undergo hidden changes. This requires adaptive tuning of the accumulation process to suit the statistics of the changes. We assessed the behavior of human decision-makers performing a perceptual choice task with state changes, compared the behavior with the normative accumulation process for this task, and unraveled the underlying large-scale neural mechanisms with magnetoencephalography (MEG). Observers’ choices were consistent with those of the normative model. Both gave especially strong weight to evidence samples that indicated a high probability of a state change. Choice-specific preparatory activity in movement-selective regions of motor and parietal cortex exhibited the same sensitivity to change-point probability as the normative model and the human observers, and encoded the model’s decision variable in a near-categorical fashion. These features qualitatively distinguished human behavior from simpler decision algorithms (e.g., drift diffusion or leaky accumulation) but were all reproduced by a biophysically inspired attractor model of decision-making. We propose that attractor dynamics in decision-related cortical activity approximate normative evidence accumulation in changing environments.


Task and modelling
Seventeen human participants viewed rapid sequences of evidence samples (dot locations; onset asynchrony = 0.4 s) that were generated by one of two noisy sources. The generative source could change at any time within a trial with low probability (the hazard rate H, set to 0.08). The participants' task was to report, via left- or right-handed button press, which source was 'active' at the end of each trial (Figure 1a).
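The generative process of the task can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian form of the two sources, their means (-mu and +mu) and their common standard deviation (sigma) are hypothetical and are not given in the text; only the hazard rate (H = 0.08) and the 12 samples per trial come from the description of the task and its analysis.

```python
import random

def simulate_trial(n_samples=12, hazard=0.08, mu=1.0, sigma=1.0, rng=None):
    """Generate one trial: the hidden source per sample and the noisy samples.

    ASSUMPTION (not from the text): each source emits Gaussian samples with
    mean -mu (left source) or +mu (right source) and common s.d. sigma.
    """
    rng = rng or random.Random()
    state = rng.choice([-1, 1])           # initial source, equiprobable
    states, samples = [], []
    for _ in range(n_samples):
        # From the second sample on, the source switches with probability H.
        if states and rng.random() < hazard:
            state = -state
        states.append(state)
        samples.append(rng.gauss(state * mu, sigma))
    return states, samples

def llr(x, mu=1.0, sigma=1.0):
    """Log-likelihood ratio log[p(x | right) / p(x | left)] for the assumed
    Gaussian sources; positive values favor the right source."""
    return 2.0 * mu * x / sigma ** 2
```

The correct response on a trial is the sign of the last entry of `states`, i.e., the source that is active when the sequence ends.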

Normative model
The normative evidence accumulation process for this task is the following (Glaze, Kable & Gold, 2015):
$$L_n = \psi_n + \mathrm{LLR}_n \tag{1}$$
$$\psi_n = L_{n-1} + \log\left[\frac{1-H}{H} + \exp(-L_{n-1})\right] - \log\left[\frac{1-H}{H} + \exp(L_{n-1})\right] \tag{2}$$
where $\mathrm{LLR}_n$ is the strength of the signed evidence provided by sample n, $L_n$ is the signed belief (the decision variable) after accumulating that sample, and $\psi_n$ is the prior belief (a transformation of $L_{n-1}$ that depends on H). We derived quantitative expressions from this model for the change-point probability p(CP) associated with each sample and the observer's uncertainty at sample onset (Figure 1b,c), both of which typically increase around the time of a change in the generative source. We simulated ten million samples given the task generative statistics, passed these through the normative accumulation process, and fit a linear model of the resulting belief updates (Equation 3). This model accounted for 99.8% of the variance in belief updating. Moreover, both p(CP) and uncertainty, but particularly the former, significantly modulated the degree to which new evidence led to belief change: when a new sample was either indicative of a change-point or encountered in a state of uncertainty, that sample elicited stronger belief change (cf. Nassar et al., 2010).
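As a minimal sketch, the accumulation rule of Glaze, Kable & Gold (2015) can be written as follows: the prior ψ is a non-linear, H-dependent transformation of the previous belief, and the new belief is that prior plus the incoming LLR. The function names and the starting belief of zero are illustrative choices, not taken from the text.

```python
import math

def prior_from_belief(L_prev, H):
    """Prior belief psi_n as a function of the previous belief L_{n-1}:
    psi = L + log[(1-H)/H + exp(-L)] - log[(1-H)/H + exp(L)].
    Note: math.exp overflows for |L| above roughly 700."""
    odds = (1.0 - H) / H
    return (L_prev
            + math.log(odds + math.exp(-L_prev))
            - math.log(odds + math.exp(L_prev)))

def accumulate(llrs, H=0.08, L0=0.0):
    """Pass a sequence of LLRs through the normative update.

    Returns a list of (psi_n, L_n) pairs, one per sample.
    L0 = 0 (no initial bias) is an assumption of this sketch.
    """
    L, trace = L0, []
    for x in llrs:
        psi = prior_from_belief(L, H)   # discount the old belief by H
        L = psi + x                     # add the new evidence
        trace.append((psi, L))
    return trace
```

Three limiting cases make the behavior of the transformation concrete. With H = 0.5 the prior is always zero (past evidence is uninformative); as H approaches 0 the prior approaches the previous belief (perfect accumulation); and for strong beliefs the prior saturates at log[(1-H)/H], which is about 2.44 for H = 0.08.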

Model fits and psychophysical kernels
We fit a version of the normative model to participants' "left" vs. "right" choices that allowed for subjectivity in the generative task statistics, as well as a possible bias in how the observers weighted evidence samples that were (in)consistent with their existing beliefs. We also fit a model with the same subjectivity and bias terms but in which the non-linear belief updating prescribed by Equation 2 was replaced by a linear leak term, as has been commonly invoked in previous studies of decision-making in both static and changing environments.
The consistency between the model fits and human behavior was assessed through logistic regression (Equation 4), in which the set of 12 LLR terms (β1, one per sample position i) estimates the standard psychophysical kernel, i.e., the weight on choice of the evidence presented at each sample position. As per Equation 3, the two sets of interaction terms (β2 and β3) estimate the modulation of evidence weighting by p(CP) and uncertainty (Figure 1d).
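A regression of this kind can be sketched as below, assuming a design in which each trial contributes its per-sample LLRs plus the products of each LLR with that sample's p(CP) and uncertainty. Which sample positions carry interaction terms, how the regressors are scaled, and the optimizer are not specified in the text; the helper names, the inclusion of all positions, and the plain gradient ascent used here are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def design_row(llrs, cpp, unc):
    """Regressors for one trial (hypothetical layout):
    [LLR_i] + [LLR_i * p(CP)_i] + [LLR_i * uncertainty_i]."""
    return (list(llrs)
            + [l * c for l, c in zip(llrs, cpp)]
            + [l * u for l, u in zip(llrs, unc)])

def fit_logistic(rows, choices, n_iter=200, lr=1.0):
    """Maximum-likelihood logistic regression by gradient ascent.

    rows: list of regressor lists; choices: 0 (left) or 1 (right).
    Returns weights with the intercept at index 0. The step size lr suits
    roughly unit-variance regressors and may need tuning otherwise.
    """
    k, n = len(rows[0]), len(rows)
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, choices):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = y - sigmoid(z)          # gradient of the log-likelihood
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w
```

Under this layout, the first block of fitted weights (after the intercept) is the psychophysical kernel, and the second and third blocks are the p(CP) and uncertainty modulations, respectively.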

Attractor modelling
Choice behavior was also simulated using a reduction of a biophysical circuit model for decision-making that exhibits attractor dynamics (Wang, 2002). The reduced version (Roxin & Ledberg, 2008) is described by the diffusion of a decision variable X in the double-well potential φ, where µ is proportional to the differential stimulus input (in our case, LLR), and a and b shape the potential. We adjusted the parameters so that the model's kernels approximated the average kernels from our sample. Indeed, the kernels generated by the chosen parameter set were qualitatively highly similar to those observed for the normative model and human observers, including strong sensitivity to p(CP) (Figure 1e).
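The following sketch illustrates diffusion in a double-well potential with Euler-Maruyama integration. The specific quartic potential used here, φ(X) = -µX - aX²/2 + bX⁴/4, is an assumed, commonly used form and may differ in its coefficients from the one intended above; the noise level sigma, the input gain, the time step and the number of steps per sample are likewise hypothetical.

```python
import math
import random

def drift(x, mu, a, b):
    """-dphi/dX for the ASSUMED potential phi(X) = -mu*X - a*X**2/2 + b*X**4/4."""
    return mu + a * x - b * x ** 3

def simulate_attractor(llrs, a=1.0, b=1.0, sigma=0.5, gain=1.0,
                       dt=0.01, steps_per_sample=40, x0=0.0, rng=None):
    """Integrate dX = -phi'(X) dt + sigma dW while each sample is on screen.

    mu is set to gain * LLR for the duration of each sample.
    Returns the value of X at the end of each sample.
    """
    rng = rng or random.Random()
    x, trace = x0, []
    for llr in llrs:
        mu = gain * llr
        for _ in range(steps_per_sample):
            noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            x += drift(x, mu, a, b) * dt + noise
        trace.append(x)
    return trace
```

In this sketch, the simulated choice would be read out as the sign of the last element of the returned trace. With a, b > 0 and small µ the potential has two wells, so X tends to settle near one of two values and only strong contrary evidence (or noise) moves it across the barrier.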

MEG
We acquired 275-sensor MEG data while participants performed the above decision-making task, as well as two 'localizer' tasks that did not entail decisions in the presence of change-points, but rather: (i) a delayed hand movement task to isolate effector-selective motor preparatory activity; and (ii) passive viewing of the stimulus samples to isolate sensory cortical responses in the absence of decision-related activity (not reported here). During the delayed hand movement task, participants viewed a verbal cue ('LEFT'/'RIGHT') and executed the corresponding response (left- or right-handed button press) upon presentation of a 'Go' cue, after a blank delay interval. Single-trial MEG data for all tasks were submitted to time-resolved spectral analysis.

Delayed hand movement localizer
Data from the motor task were used to construct a filter for isolating activity across MEG sensors and frequencies related specifically to choice-selective motor preparation. Using lateralized MEG data from the delay period of the motor localizer, we computed a set of spatiospectral weights that discriminated between the two motor responses that were prepared on different trials. We then computed the dot product of these weights and the time-resolved lateralized data from the main task to yield a dynamic signature of choice-selective motor preparation during decision formation. For each evidence sample i, time point t and frequency f, we then regressed this motor preparation signal onto computational variables from the normative model fits that combine to generate the updated belief state after accumulating sample i (cf. Equation 3). We found that the motor preparation signal significantly encoded all terms (Figure 2a), indicating that it faithfully reflects the normative evidence accumulation process prescribed by Equations 1 and 2.
We also characterized the shape of the relationship between the motor signal and the model-derived decision variable L by examining the slope of a unit square sigmoid function that was fit to the data. This function takes the form of a regular sigmoid when the slope is > 0, which captures the shape of the L-to-ψ mapping in the normative model (Equation 2) and is therefore what we expect to find for the motor signal if it truly encodes the evolving belief state. By contrast, the function takes the form of an inverse-sigmoid when the slope is < 0; this approximates the shape of the relationship between L and the most recently encountered LLR, and is therefore what we expect to find for a neural signal that encodes momentary sensory evidence. We found that the function for the motor signal was clearly sigmoidal (fitted slope > 0). The same analysis applied to the attractor model simulations also yielded a function that was highly similar in shape (Figure 2b), suggesting that the attractor can approximate both the behavior and neural dynamics of the normative and human observers.

Source-level analysis of ROIs along the sensory-motor pathway
We used individual structural MRIs and adaptive spatial filtering ('linear beamforming') to reconstruct the MEG activity of 13 cortical regions of interest (ROIs) that spanned the sensory-motor pathway for the decision-making task. These ROIs included retinotopically organized areas of occipital, temporal, and intraparietal cortex (Wang et al., 2015), as well as regions of posterior parietal and (pre-)motor cortex exhibiting hand movement-selective activity lateralization (de Gee et al., 2017). The informational content of the lateralized signal (LI) within each ROI was interrogated using a model comparison approach. We constructed two linear models reflecting pure encoding of either sensory evidence or belief (Equations 7 and 8). Both models were fit to power lateralization values (i.e., right minus left hemisphere per ROI) across time and frequency, and we computed the difference in Bayesian Information Criterion (BIC) scores between them, such that a positive score indicates a better fit for the belief encoding model while a negative score indicates a better fit for the evidence encoding model. Consistent with the motor localizer results, we found belief encoding in M1, premotor cortex and a movement-selective area at the junction of the anterior intraparietal and postcentral sulcus. By contrast, more posterior cortical areas (from V1 extending both dorsally and ventrally) were better characterized as encoding the momentary sensory evidence (Figure 3). However, V1 exhibited several characteristics consistent with decision-related feedback that were not evident in posterior intraparietal sulcus (IPS0-3) and ventro-lateral occipital ROIs: near-categorical encoding of belief state about 0.5 s after sample onset, a prolonged intrinsic timescale, and strong choice-predictive intrinsic fluctuations.
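The model comparison can be illustrated with a small sketch in which each encoding model is reduced to an ordinary least-squares regression of the lateralization signal on a single predictor plus an intercept. This single-predictor form is an assumption made for illustration; the exact regressors of the two models are not reproduced here. The BIC is computed, up to an additive constant, as n·ln(RSS/n) + k·ln(n) for Gaussian residuals.

```python
import math
import random

def ols_rss(x, y):
    """Residual sum of squares of the least-squares line y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def bic(rss, n, k):
    """BIC of a Gaussian linear model with k free parameters (constant dropped)."""
    return n * math.log(max(rss, 1e-300) / n) + k * math.log(n)

def delta_bic(evidence, belief, lateralization):
    """BIC(evidence model) - BIC(belief model).

    Positive values favor belief encoding, negative values favor evidence
    encoding, matching the sign convention described in the text.
    """
    n = len(lateralization)
    bic_evidence = bic(ols_rss(evidence, lateralization), n, k=2)
    bic_belief = bic(ols_rss(belief, lateralization), n, k=2)
    return bic_evidence - bic_belief
```

Because both models have the same number of parameters in this sketch, the BIC difference depends only on the ratio of their residual sums of squares; the penalty term matters only if the two models differ in complexity.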