FEF Biases the Persistence of Expectation

Expectations arise from past experience. How and where in the brain this happens is not well understood. Here, we used fMRI-guided HD-tDCS in combination with a new computational model and EEG to investigate the role of the right frontal eye field (rFEF) in the formation of expectations. Human participants performed a free choice saccade task before, during and after stimulation. Participants formed persistent choice biases based on choice history despite randomness in the task. Our model, a distributed Ornstein-Uhlenbeck process embedded in a reinforcement learning framework, allowed us to quantify the build-up of expectations underlying choice bias. Anodal (cathodal) stimulation increased (decreased) the influence of trial history on expectation. This effect was reversed post-stimulation. Contrasting pre- vs post-stimulation EEG showed that the power of alpha and theta oscillations depended on the stimulation polarity, the amount of time that had passed since the previous choice, and the degree to which expectation biased the subsequent choice. This suggests that the neural activity giving rise to low-frequency oscillations in FEF plays an active role in shaping how expectations form and persist.


Introduction
Random sequences lead to the persistence of expectations that are maladaptive in random experimental settings (e.g., Jarvik, 1951). Persistent expectations have been proposed to fall out of a mechanism that sequentially optimizes learning to the statistics of the world in the presence of computational cost (Yu and Cohen, 2008). How expectations arise in the brain is poorly understood. Computational models of choice have suggested that baseline information prior to the onset of a stimulus is sequentially updated across choices to reflect how expectations change with experience (Carpenter and Williams, 1995; Yu and Cohen, 2008). Here, we combine EEG and reinforcement learning of stochastic baseline activity with a novel noninvasive method (fMRI-guided high-definition transcranial direct current stimulation [HD-tDCS]) to assess how FEF, a region of the brain that has been implicated in sequential choice effects (Soltani et al, 2013), influences the formation and persistence of expectation.

Keywords: Stochastic Processes; EEG; HD-tDCS; Expectation; FEF

Figure 1: Free Choice Saccade Task and HD-tDCS. (A) Single trial schematic. Each trial began with a variable delay period before the onset of two choice targets, presented with an asynchrony of 0, ±16, ±33, ±66, or ±99 ms. Positive asynchrony values denote the rightward target appearing prior to the left, and vice versa. Participants were instructed to direct a saccade to either target as fast as possible without anticipating. (B) Center-out FEF localizer task. Saccades were performed clockwise during gradient EPI, and the resulting BOLD signal (C) was overlaid on a 3D reconstruction of the neuroanatomy to guide HD-tDCS electrode placement (D).

Model-based analysis of Behaviour
In the free choice saccade task (Fig 1A), the asynchrony between targets and the fixation interval were randomized across trials. Thus, the optimal strategy for fastest saccadic selection is to ignore any expectations built from the outcome of previous trials. Despite this, reaction times were sequentially dependent, with sequences of repeated choices to the same direction asymmetrically biasing early responses, and alternations biasing late responses (Fig 2C).
To model how the buildup of expectations influences subsequent reaction times, we fixed the best-fitting sensory parameters from a bounded integration process, and allowed a distributed OU process capturing choice urgency (see Methods) to remain free when fitting to cumulative reaction time distributions. Model fits show that the repetition of choice direction led to a progressive decrease in early reaction times, while alternating choices led to a progressive decrease in later reaction times. Because the sensory parameters were fixed, the influence of choice history on early and late reaction times can be quantitatively accounted for by the distributed OU process alone. We conclude that repeating choices increased the rate of formation and the magnitude of expectation, while alternating choices reduced them.

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0

Expectations are dissociably biased by HD-tDCS
Following our model-based analysis, we assessed the influence of stimulation on the sequential buildup of expectation by taking the KL divergence of early and anticipatory reaction times (-250 to 150 ms) from the cumulative distribution of anodal and cathodal stimulation, sorted by the number of choice repetitions in a row. Fig 3A shows a progressive decrease in early reaction times (first half of the distribution) following anodal stimulation, as denoted by an increase in the median KL divergence. Fig 3C shows that this sequential change is negated in the post-stimulation period. Fig 3B shows the opposite relationship during cathodal stimulation, with a sequential increase in early repetition reaction times compared to pre-stimulation that is reversed in the post-stimulation period (Fig 3D).
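The sorting and divergence steps described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: the function names, the 25 ms bin width, the smoothing constant `eps`, and the direction of the divergence (stimulation relative to pre-stimulation) are all assumptions made for the example.

```python
import numpy as np

def repetition_run_lengths(choices):
    """For each trial, count how many consecutive times the choice has
    repeated the previous one (0 on the first trial or after an alternation)."""
    choices = np.asarray(choices)
    runs = np.zeros(len(choices), dtype=int)
    for t in range(1, len(choices)):
        runs[t] = runs[t - 1] + 1 if choices[t] == choices[t - 1] else 0
    return runs

def binned_kl(rt_p, rt_q, edges, eps=1e-9):
    """KL divergence D(P || Q) between two reaction-time samples after
    binning them on common edges. eps avoids log(0) in empty bins."""
    p, _ = np.histogram(rt_p, bins=edges)
    q, _ = np.histogram(rt_q, bins=edges)
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def kl_by_repetitions(rt_stim, runs_stim, rt_pre, runs_pre, edges, max_run=3):
    """Divergence of stimulation RTs from pre-stimulation RTs,
    computed separately for each repetition count 0..max_run."""
    rt_stim, rt_pre = np.asarray(rt_stim), np.asarray(rt_pre)
    return {
        n: binned_kl(rt_stim[runs_stim == n], rt_pre[runs_pre == n], edges)
        for n in range(max_run + 1)
    }

# Assumed binning of the -250 to 150 ms window in 25 ms steps.
EDGES_MS = np.arange(-250, 151, 25)
```

Calling `kl_by_repetitions` with the reaction times and run lengths of a stimulation block and a pre-stimulation block returns one divergence per repetition count, which is the quantity that would be tracked across increasing run lengths.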

Role of low-frequency oscillations in persistent expectation
To contrast the effects of anodal and cathodal stimulation in the brain, we recorded EEG at the center and surround electrodes before and after stimulation. Time-frequency subtractions prior to the repetition (Fig 4A) and alternation (Fig 4B) reveal that the power of pre-choice alpha and theta oscillations was modulated by the condition (anodal vs cathodal), the time relative to fixation and target onset, and the previous choice. Coupled with our behavioural analysis, this suggests a [...]

Figure 4: [...] (bottom). TFRs were computed relative to fixation onset (left) and target onset (right), with the color map denoting relative frequency power Z scores (cathodal higher in yellow, anodal higher in blue).

Conclusions
Our work demonstrates a role for FEF in biasing how expectations form and persist within and across saccadic choices. Future work will exploit the repeated sessions by each participant to analyze test-retest reliability in response to HD-tDCS, and assess individual heterogeneity through behaviour and single-subject anatomical current reconstructions.

Free Choice Task
Participants performed a free choice saccade task (Fig 1A), during which saccades were directed as fast as possible (without anticipating) to either of two choice targets presented asynchronously. The magnitude of asynchrony and the length of the fixation interval were randomized. The task was performed in blocks of 90 trials. Eight participants performed 10 sessions, each of which contained 5 blocks of pre-stimulation behaviour, 5 blocks of stimulation, and blocks of post-stimulation behaviour.
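As an illustration of the randomization, the following sketch generates the trial schedule for one 90-trial block. The asynchrony values are those listed in the Figure 1 caption; the uniform sampling, the fixation-interval range, and the names `ASYNCHRONIES_MS` and `make_block` are assumptions for the example, since the source does not specify them.

```python
import numpy as np

# Target-onset asynchronies in ms (positive: rightward target appears first).
ASYNCHRONIES_MS = np.array([-99, -66, -33, -16, 0, 16, 33, 66, 99])

def make_block(n_trials=90, fixation_range_ms=(500.0, 1500.0), seed=0):
    """Draw an independent asynchrony and fixation interval for every trial.
    fixation_range_ms is a hypothetical range, not taken from the study."""
    rng = np.random.default_rng(seed)
    return {
        "asynchrony_ms": rng.choice(ASYNCHRONIES_MS, size=n_trials),
        "fixation_ms": rng.uniform(*fixation_range_ms, size=n_trials),
    }
```

Because every trial is drawn independently, nothing in the schedule predicts the next trial, which is why any history-dependent bias in the choices is unrelated to the task statistics.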

fMRI-guided HD-tDCS
To target the site of stimulation, we localized the right frontal eye field using a saccade task during functional MRI. BOLD signals were collected using a gradient echo planar imaging sequence, and preprocessed with BrainVoyager QX 2.6. A T1-weighted anatomical scan was also performed, and a 3D reconstruction was overlaid with the functional data using Brainsight, which guided electrode placement prior to each session.
A 4×1 center-surround electrode configuration was used to minimize current spread outside of rFEF. The central electrode was placed on the scalp directly over the cortical localization. Stimulation was administered for 21 minutes (20 minutes on, plus 30 seconds each of ramp up and ramp down) with a current strength of 2 mA. Eight participants underwent 10 sessions of stimulation, with sessions alternating between anodal (depolarizing) and cathodal (hyperpolarizing) stimulation.

EEG Analysis
We computed the surface Laplacian of the central electrode relative to the four surround electrodes, and contrasted the time-frequency response of the pre- and post-stimulation periods relative to fixation and target onset. EEG data were acquired from the same electrodes that delivered the stimulation using a 16-channel EEG amplifier (V-AMP, Brain Products), sampled at 2 kHz, and band-pass filtered between 1 and 45 Hz. The time-frequency response was computed using a wavelet decomposition (Tallon-Baudry et al, 1997).
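The two steps above can be sketched with NumPy alone. This is a simplified illustration under stated assumptions: the Laplacian is approximated as the center signal minus the mean of the four surround signals, the wavelet is a complex Morlet with a fixed ratio of 7 between center frequency and spectral width, and the function names are hypothetical.

```python
import numpy as np

def center_surround_laplacian(center, surround):
    """Approximate the surface Laplacian at the center electrode.
    center: (n_samples,), surround: (4, n_samples)."""
    return np.asarray(center) - np.asarray(surround).mean(axis=0)

def morlet_power(signal, fs, freqs, ratio=7.0):
    """Time-frequency power via convolution with complex Morlet wavelets.
    The signal must be longer than the longest wavelet (lowest frequency)."""
    signal = np.asarray(signal, dtype=float)
    power = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        sigma_t = ratio / (2.0 * np.pi * f)            # temporal width in s
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1.0 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power
```

A pre- vs post-stimulation contrast would then subtract the power maps of the two periods after aligning them to fixation or target onset.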

Reinforced Gated Accumulator
Our model is composed of three components: an Ornstein-Uhlenbeck process representing the buildup of choice urgency, a diffusion process representing sensory integration, and an actor-critic RL module describing across-trial dynamics (Fig 2A). The diffusion process is gated by the OU process. The model is distributed across many dimensions to analytically capture the form of empirical reaction time distributions.

Distributed Ornstein-Uhlenbeck Model
We propose a continuous-time stochastic process to describe the influence of dynamic baseline activity on reaction times during the fixation interval. The model describes saccadic choice as two processes: a pre-target process that represents the growing urgency to receive sensory information, and a subsequent drift-diffusion process that represents sensory integration. An Ornstein-Uhlenbeck (OU) process is a simple stochastic model for a process that reverts to a long-run mean. A distributed OU model is composed of many individual processes that are driven to a common equilibrium acting as an order parameter (Haken, 1986).

Let $X_1, X_2, \ldots, X_K$ be a set of stochastic processes that evolve in time as individual realizations of an OU process, where $X_i = x_1, x_2, \ldots, x_t, \ldots, x_n$. Individual realizations satisfy the stochastic differential equation

$$dx_t = \theta(\mu - x_t)\,dt + \sigma\,dW_t$$

where $\theta$ is the drift rate, $\mu$ is the mean value the realization reverts to, and $\sigma$ scales the noise. $W$ is a standard Wiener process, whose increments are independent and Gaussian, $W_{t+s} - W_t \sim \mathcal{N}(0, s)$. The sample path for a single Ornstein-Uhlenbeck process $x_1, x_2, \ldots, x_t$ can be computed analytically using a scaled, time-transformed Wiener process (Doob, 1942).
We want to solve for the distribution of values $X$ for all of $t$. We assume that one independent stochastic process $x_1, x_2, \ldots, x_n$ operates under mean-reversion order parameters that represent the influence of choice history.
Here, the global stochastic properties $\theta$, $\mu$, and $\sigma$ act as order parameters that can fully describe the evolution of the system under the mean-field approximation. $\mu$ encodes a spatial prediction by setting the relative threshold-baseline difference for subsequent bounded integration, while $\theta$ encodes a temporal prediction by setting how fast this spatial prediction reaches equilibrium. $\sigma$ describes shared variance between the two predictions.
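To make the roles of the three parameters concrete, the sketch below simulates many independent OU realizations that share the same $\theta$, $\mu$, and $\sigma$. It uses the standard exact discretization of an OU process rather than the analytic mean-field solution referred to in the text, and the function name and parameter values are chosen only for illustration.

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, dt, n_steps, n_processes, seed=0):
    """Simulate n_processes OU paths with the exact one-step transition:
    x[t+dt] = mu + (x[t] - mu) * exp(-theta*dt) + Gaussian noise."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-theta * dt)
    # Standard deviation of the noise accumulated over one step of size dt.
    step_sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * theta))
    x = np.empty((n_processes, n_steps + 1))
    x[:, 0] = x0
    for t in range(n_steps):
        noise = rng.standard_normal(n_processes)
        x[:, t + 1] = mu + (x[:, t] - mu) * decay + step_sd * noise
    return x

# Illustrative values only: larger theta reaches the equilibrium mu sooner.
paths = simulate_ou(theta=5.0, mu=1.0, sigma=0.5, x0=0.0,
                    dt=0.01, n_steps=300, n_processes=20000)
```

Across realizations, the mean relaxes toward $\mu$ at rate $\theta$, and the spread settles at the stationary variance $\sigma^2 / (2\theta)$, so the population state at target onset depends on how long the fixation interval has lasted.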

Bounded Integration
We assume that each choice process evolves with equation 3 under mean-field Gaussian initial conditions to provide a threshold-baseline difference. Upon target presentation, we now have many bounded integration processes integrating sensory information with average rate $r$, and a threshold-baseline difference unique to the initial condition of each choice process. We describe a population readout of all choice processes encoding this information as the sum of an extended integration model (Nakahara et al, 2006) (4), where $s(i)$ is the threshold-baseline difference for process $i$, $\mu(r)$ is the average rate of sensory integration, and $\sigma(r)$ is the standard deviation of sensory integration. $P_T(j)$ is the reaction time probability at time $t$. Thus, many processes responsive to visual information at the position of the target integrate sensory information at a constant average rate (subject to fluctuations), each starting from the initial condition set by its baseline at the time of target onset.
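The idea of a population readout can be illustrated with a simplified stand-in for the extended integration model, whose exact form is not reproduced here. The sketch assumes a LATER-style linear rise to threshold: each process covers its own threshold-baseline difference at a rate drawn once per trial from a Gaussian, and the population reaction-time distribution is obtained by pooling all processes. The function name and all numerical values are hypothetical.

```python
import numpy as np

def population_rts(distances, rate_mean, rate_sd, seed=0):
    """Reaction time of each process under a linear rise to threshold.
    distances: threshold-baseline difference s(i) for each process.
    Rates are Gaussian; non-positive rates never reach threshold and are dropped."""
    rng = np.random.default_rng(seed)
    distances = np.asarray(distances, dtype=float)
    rates = rng.normal(rate_mean, rate_sd, size=distances.shape)
    valid = rates > 0
    return distances[valid] / rates[valid]

# Gaussian initial conditions for the threshold-baseline difference (assumed values).
rng = np.random.default_rng(42)
s = np.clip(rng.normal(1.0, 0.1, size=5000), 0.05, None)
rts = population_rts(s, rate_mean=5.0, rate_sd=1.0)
```

In this simplified picture, a baseline that sits closer to threshold at target onset shortens the distance each process has to travel, which shifts the pooled distribution toward earlier reaction times.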

Actor-Critic
We simulate an actor-critic framework, in which expectations of choice direction and timing are embedded in the OU urgency signal and updated across trials. Following each choice, a critic estimating the elapsed time of choice $\theta$ and choice $\mu$ updates the OU process $S_i$ (actor) based on the temporal difference error (Sutton and Barto, 1998) from the previous state $S_{i-1}$, and the magnitude of asynchrony between the two targets $R$.

Model-Fitting and RT Analysis
Model-fitting was performed by sampling from the resulting distribution of equation 6, and minimizing an Akaike Information Criterion over cumulative reaction time distributions using a Bayesian adaptive direct search algorithm (Acerbi and Ma, 2018). Reaction time analysis for visualization and stimulation comparison was computed as the Kullback-Leibler divergence of binned data.