Distinct cortico-striatal connections with subthalamic nucleus underlie facets of compulsivity

The capacity to flexibly respond to contextual changes is crucial to adapting to a dynamic environment. Compulsivity, or behavioural inflexibility, consists of heterogeneous subtypes with overlapping yet discrete neural substrates. The subthalamic nucleus (STN) mediates the switch from automatic to controlled processing to slow, break or stop behaviour when necessary. Rodent STN lesions or inactivation are linked with perseveration or repetitive, compulsive responding. However, few studies have examined the relationship between latent STN-centric neural networks and compulsive behaviour in healthy individuals. We therefore aimed to characterize the relationship between measures of compulsivity (goal-directed and habit learning, perseveration, and self-reported obsessive-compulsive symptoms) and the intrinsic resting state network of the STN. We scanned 77 healthy controls using a multi-echo resting state functional MRI sequence analyzed using independent components analysis (ME-ICA) with enhanced signal-to-noise ratio to examine small subcortical structures. Goal-directed model-based behaviour was associated with higher connectivity of STN with medial orbitofrontal cortex (mOFC) and ventral striatum (VS), and more habitual model-free learning was associated with STN connectivity with hippocampus and dorsal anterior cingulate cortex (ACC). Perseveration was associated with reduced connectivity between STN and premotor cortex and, finally, higher obsessive-compulsive inventory scores were associated with reduced STN connectivity with dorsolateral prefrontal cortex (dlPFC). We highlight unique contributions of diffuse cortico-striatal functional connections with STN in dissociable measures of compulsivity. These findings are relevant to the development of potential biomarkers of treatment response in neurosurgical procedures targeting the STN for neurological and psychiatric disorders.


Introduction
The capacity to flexibly adapt to dynamic environments is a crucial component of optimal daily functioning. The development and emergence of rigid or inflexible behavioural patterns is dimensionally relevant across multiple psychiatric disorders, including addiction and obsessive-compulsive disorder. The construct of compulsivity describes this tendency towards repetitive, deleterious behaviours that persist despite negative consequences (Robbins, Gillan, Smith, de Wit, & Ersche, 2012). Compulsivity can be deconstructed into several components, each detailing distinct cognitive contributions to the behaviour and associated with overlapping yet distinct neural substrates.
The subthalamic nucleus (STN) is a major relay structure in the indirect pathway of the basal ganglia crucially involved in the switch between automatic and controlled processing and the balance between inhibition and executive control (Jahanshahi, 2013). The STN receives afferents from cortical regions involved in executive control (Haynes & Haber, 2013), allowing hyper-direct control of basal ganglia output based on frontal innervations. Direct cortical projections to STN, particularly from the right inferior frontal cortex (Aron, Behrens, Smith, Frank, & Poldrack, 2007), can usurp the cortico-basal ganglia loops (Balaz, Bockova, Rektorova, & Rektor, 2011) to slow, break or stop responding (Aron et al., 2007), with the STN responding to stop cues whether actions are cancelled or not (Schmidt, Leventhal, Mallet, Chen, & Berke, 2013). In rodents, STN and medial prefrontal cortex (PFC) disconnection via contralateral lesions (Chudasama, Baunez, & Robbins, 2003) and STN lesion, stimulation and inactivation (Baunez & Lardeux, 2011) enhance perseveration, a repetitive, compulsive form of responding. Deep brain stimulation (DBS) to the STN in humans provides insight into the role of the STN in behaviour, cognition and disease states. DBS is delivered via electrodes inserted into grey or white matter and uses high frequency stimulation to modulate network activity or pathological oscillatory activity. STN DBS is effective for the symptomatic management of Parkinson's disease (PD). Impairments in task switching in PD are improved by ventral (but not dorsal) STN DBS (Greenhouse, Gould, Houser, & Aron, 2013), implicating limbic and associative rather than motoric STN. Furthermore, STN hyperactivity in PD is associated with more habitual behaviour as measured by random number generation that requires habit suppression (Obeso et al., 2011), which is improved by STN DBS in this group (Witt et al., 2004).
STN DBS targeting more limbic and associative regions has also been shown to be effective in the management of obsessive-compulsive disorder (OCD) characterized by impairments in behavioural flexibility such as enhanced habitual responding and impaired set shifting behaviours (Fineberg et al., 2015). Together these findings implicate the STN in habitual or inflexible behaviour modulation.
Recent computational models suggest parallel, interactive and dissociable systems of behavioural control: a fast, reactive and model-free system that relies on habitual learning in which previously reinforced behaviours are repeated; and a slower, deliberative model-based system for more flexible goal-directed behaviour that takes into account the task structure or internalized task model. The relative influence of each system on choice has been assessed with a two-step task, demonstrating concurrent use of both systems in healthy functioning (Daw, Gershman, Seymour, Dayan, & Dolan, 2011), and a tendency towards habitual, model-free learning in methamphetamine addiction, binge eating disorder and obsessive-compulsive disorder (Voon et al., 2014). The ventral striatum (VS) has been implicated as a key node in both systems (Daw et al., 2011; Morris et al., 2015). The medial orbitofrontal cortex (mOFC) (Morris et al., 2015) and dorsolateral prefrontal cortex (dlPFC) (Smittenaar, FitzGerald, Romei, Wright, & Dolan, 2013) have been implicated in the model-based, goal-directed system. The two-step task also provides a measure of perseveration. Whereas habitual behaviours are defined as repeated choices of previously reinforced behaviours and are hence outcome sensitive, perseverative behaviours involve repetition of behaviour irrespective of the outcome. The neural correlates of perseverative behaviours are less well understood.
Here we aimed to characterize the latent resting state network of the STN and its relationship with inter-individual variability in measures of behavioural inflexibility in healthy individuals. We hypothesize that lower goal-directed behaviours are associated with lower functional connectivity between the STN and medial OFC and dlPFC.

Materials and methods

Participants
Healthy volunteers were recruited through community-based advertisements in East Anglia. Psychiatric disorders were screened for with the Mini International Neuropsychiatric Interview (Sheehan et al., 1998). Subjects were excluded if they had a major psychiatric disorder, substance addiction or medical illness or were on psychotropic medications. Subjects were included if they were 18 years of age or over and had no history of regular or current use of other substances. All participants completed the National Adult Reading Test (Nelson, 1982) to assess verbal IQ. We used the self-reported Obsessive Compulsive Inventory-Revised (OCI) (Foa, Kozak, Salkovskis, Coles, & Amir, 1998), which measures subjective distress related to obsessive and compulsive thoughts and behaviours. Participants completed the behavioural measures and resting state functional MRI on the same day, with no more than 4 h between them. Participants provided written informed consent and were compensated for their time. The study was approved by the University of Cambridge Research Ethics Committee.

Model-free model-based task
We employed a two-step choice task (Daw et al., 2011) shown to elicit engagement of goal-directed (model-based) and habitual (model-free) learning systems, as well as perseveration (p). The task involved two stages. At stage 1, participants were offered a choice between two stimuli, each leading with a fixed probability to one of two states at stage 2. At stage 2, participants were offered another choice between two stimuli, each leading, with differing probabilities, to monetary reward. The probability of reward shifted slowly over the course of the task. Participants received extensive, self-paced training, including practice trials demonstrating the concepts of stage transitions and probability, lasting 15-20 min. Choice of one stimulus at stage one led to one of two stimulus-pairs at stage two with a fixed probability (P = .70 or .30). Choice of the other stimulus led to the same stage-two stimulus-pairs but with the opposite fixed probabilities (P = .30 or .70). Choice of a stimulus at stage two led to reward with an independently varying probability (between P = .25 and .75). Participants had 2 s to make a decision and the transition between stages lasted 1.5 s. The chosen stimulus at stage one remained on the screen during stage two of that trial as a reminder. Participants completed 201 trials divided into three sessions. The outcome was an image of £1. Habit learning was modeled using a model-free reinforcement learning algorithm, whereas the goal-directed learning algorithm takes into account the state transitions. A weighting factor (w) was calculated for each individual, capturing the relative contribution of either habitual model-free (w = 0) or goal-directed model-based (w = 1) learning. Perseveration (p) provides a measure of the tendency to select the same first-stage choice irrespective of outcome. The task was programmed with Matlab 2011a.
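The task structure described above can be sketched in a few lines of Python. This is only an illustrative simulation, not the original Matlab task code: the function names, the randomly choosing agent, the Gaussian random walk and its step size (`DRIFT_SD`) are assumptions, since the text states only that reward probabilities shift slowly between .25 and .75.

```python
import random

P_COMMON = 0.70               # fixed transition probability given in the text
REWARD_BOUNDS = (0.25, 0.75)  # range of the drifting reward probabilities
DRIFT_SD = 0.025              # assumed step size of the random walk (not from the text)


def transition(stage1_choice, rng):
    """Return the stage-two state (0 or 1) reached after a stage-one choice (0 or 1)."""
    common = rng.random() < P_COMMON
    return stage1_choice if common else 1 - stage1_choice


def drift(p, rng):
    """Take one Gaussian random-walk step and keep p inside REWARD_BOUNDS."""
    lo, hi = REWARD_BOUNDS
    p += rng.gauss(0.0, DRIFT_SD)
    if p < lo:
        p = 2 * lo - p   # reflect off the lower bound
    elif p > hi:
        p = 2 * hi - p   # reflect off the upper bound
    return min(hi, max(lo, p))


def simulate(n_trials=201, seed=0):
    """Simulate a randomly choosing agent; returns a list of (a1, s2, a2, reward)."""
    rng = random.Random(seed)
    # reward_probs[state][action] for the two stage-two states
    reward_probs = [[rng.uniform(*REWARD_BOUNDS) for _ in range(2)] for _ in range(2)]
    trials = []
    for _ in range(n_trials):
        a1 = rng.randrange(2)              # stage-one choice
        s2 = transition(a1, rng)           # common (70%) or rare (30%) transition
        a2 = rng.randrange(2)              # stage-two choice
        reward = 1 if rng.random() < reward_probs[s2][a2] else 0
        trials.append((a1, s2, a2, reward))
        # all four reward probabilities drift independently after every trial
        reward_probs = [[drift(p, rng) for p in row] for row in reward_probs]
    return trials
```

Calling `simulate()` yields 201 trials, matching the task length reported above; replacing the random choices with a learning agent would turn this into a generative model of behaviour.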

Computational modeling
This task had three states: the stage-one state A ($s_A$) and the stage-two states B and C ($s_B$ and $s_C$). Each state had two actions: $a_A$ and $a_B$. Model-free learning was modeled using a SARSA($\lambda$) temporal difference (TD) algorithm in which each choice is based on a predicted long-run value [$Q_{TD}(s,a)$] for each action $a$ at each stage $s$. The TD reward prediction error ($\delta$) informs subsequent predictions. For each trial ($t$), the stage-one state $s_{1,t}$ ($s_A$) requires the choice of an action $a_{1,t}$. The stage-two state $s_{2,t}$ ($s_B$ or $s_C$) also requires the choice of an action $a_{2,t}$, leading to a reward $r_{2,t}$ (£1 or £0). After each stage $i$ (1, 2) of each trial $t$, a prediction error $\delta_{i,t}$ occurs and updates the value $Q_{TD}$ of the state $s_{i,t}$ and action $a_{i,t}$ just visited. The action value of stage one is updated depending on the value after the stage-two state, $Q_{TD}(s_{2,t}, a_{2,t})$. $r_{1,t} = 0$ because no reward is received at this stage, and $r_{2,t}$ then updates the value at the second stage. The terminal value $Q_{TD}(s_{3,t}, a_{3,t}) = 0$. A separate parameter captures the learning rate for the update at each stage ($\alpha_1$, $\alpha_2$). The stage-one action value is updated by the stage-one prediction error and, at the end of each trial when $r_{2,t}$ is received, by the stage-two prediction error: $Q_{TD}(s_{1,t}, a_{1,t}) = Q_{TD}(s_{1,t}, a_{1,t}) + \alpha_1 \lambda \delta_{2,t}$. The extent of this update is also determined by the eligibility trace parameter $\lambda$. At stage one, the model-based reinforcement learning algorithm calculated the value of each action ($Q_{MB}$) from the probabilities that the action would lead to each stage-two state ($P(s_B \mid s_A, a_A) = .70$; $P(s_B \mid s_A, a_B) = .30$; and conversely for $s_C$) and the values of those states. Thus, for each action $a_j$ ($j$ = A, B), $Q_{MB}$ is the sum of the stage-two state values weighted by these transition probabilities. The stage-two value is equivalent to the model-free value of the optimal action, as model-free and model-based values coincide at the end state.
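For reference, the verbal description above corresponds to update rules of roughly the following form. This is a sketch that assumes the standard two-step formulation of Daw et al. (2011), which the task and symbols above match; the explicit prediction-error definition and the maximum over stage-two actions are taken from that formulation rather than stated in this text.

```latex
\begin{align}
\delta_{i,t} &= r_{i,t} + Q_{TD}(s_{i+1,t}, a_{i+1,t}) - Q_{TD}(s_{i,t}, a_{i,t}) \\
Q_{TD}(s_{i,t}, a_{i,t}) &= Q_{TD}(s_{i,t}, a_{i,t}) + \alpha_i \, \delta_{i,t} \\
Q_{TD}(s_{1,t}, a_{1,t}) &= Q_{TD}(s_{1,t}, a_{1,t}) + \alpha_1 \lambda \, \delta_{2,t} \\
Q_{MB}(s_A, a_j) &= P(s_B \mid s_A, a_j) \max_{a} Q_{TD}(s_B, a)
                  + P(s_C \mid s_A, a_j) \max_{a} Q_{TD}(s_C, a)
\end{align}
```

With $r_{1,t} = 0$ and $Q_{TD}(s_{3,t}, a_{3,t}) = 0$, the first line reduces to $\delta_{1,t} = Q_{TD}(s_{2,t}, a_{2,t}) - Q_{TD}(s_{1,t}, a_{1,t})$ at stage one and $\delta_{2,t} = r_{2,t} - Q_{TD}(s_{2,t}, a_{2,t})$ at stage two.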
For each stage-one action, a net action value ($Q_{net}$) is calculated as the weighted sum of the model-free and model-based values. Here, $w$ is a weighting parameter: higher $w$ (towards $w = 1$) indicates greater reliance on model-based learning strategies, while lower $w$ (towards $w = 0$) indicates greater reliance on model-free strategies. At stage two, $Q_{net} = Q_{TD}$. For each stage, the probability of a choice is calculated by applying the softmax equation to $Q_{net}$, with higher values of the softmax parameter indicating more reliable choices. $P$ accounts for perseveration ($P > 0$) or switching ($P < 0$) of choices in stage one. rep($a$) acts as a binary indicator: it has a value of 1 if $a$ is a stage-one action and $a = a_{1,t-1}$, and otherwise equals 0.
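The weighting and choice rule described above can be illustrated with a short Python sketch. It is not the authors' fitting code: the function names are hypothetical, and the inverse-temperature argument `beta` is an assumed name for the softmax parameter, which the text does not name. The perseveration term is added to the value of the repeated action inside the softmax, following the usual form of this model.

```python
import math


def net_values(q_mb, q_td, w):
    """Weighted sum of model-based and model-free stage-one values (w = 1: fully model-based)."""
    return [w * mb + (1.0 - w) * td for mb, td in zip(q_mb, q_td)]


def choice_probabilities(q_net, beta, persev=0.0, prev_action=None):
    """Softmax over actions.

    `persev` (P in the text) is added to the action that repeats the previous
    stage-one choice; pass prev_action=None at stage two or on the first trial.
    """
    logits = []
    for action, q in enumerate(q_net):
        rep = 1.0 if (prev_action is not None and action == prev_action) else 0.0
        logits.append(beta * (q + persev * rep))
    top = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `choice_probabilities([0.5, 0.5], beta=3.0, persev=0.2, prev_action=1)` assigns more than half of the probability to action 1, whereas a negative `persev` favours switching.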

Resting state functional MRI
We employed a novel multi-echo resting state functional magnetic resonance imaging (fMRI) acquisition and analysis with four-fold greater signal compared to noise (Kundu et al., 2012, 2013). Anatomical images were also acquired with a T1-weighted magnetization prepared rapid gradient echo (MPRAGE) sequence (176 × 240 FOV; 1-mm in-plane resolution; inversion time, 1100 msec). Functional data were denoised using multi-echo independent component analysis (ME-ICA v2.5 beta10; http://afni.nimh.nih.gov). Data were decomposed into independent components with FastICA. Blood oxygen level dependent (BOLD) percent signal change is linearly proportional to echo time (TE). Thus, independent components that scaled strongly with TE, and were therefore assigned high Kappa scores, were retained as BOLD data (Kundu et al., 2012). TE-independent components, identified by the pseudo-F-statistic Rho, represent non-BOLD artefacts and were removed by projection. This robustly denoises data for motion, physiological and scanner artefacts based on physical principles (Kundu et al., 2013). Denoised echo planar images were coregistered to their anatomical MPRAGE image and normalized to the Montreal Neurological Institute (MNI) template. For correlations with behavioural measures, but not baseline mapping, spatial smoothing was performed with a Gaussian kernel (full width half maximum = 6 mm).
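The TE-dependence principle above can be illustrated with a deliberately simplified, single-voxel Python sketch. It is not the ME-ICA implementation: the function names are hypothetical, the echo times in the usage note are made-up illustrative values (the acquisition parameters are not given here), and the component-level Kappa and Rho scores are, roughly, summaries of such per-voxel statistics across voxels. The sketch compares a model in which signal change scales with TE against one in which it is constant across echoes.

```python
def pseudo_f(y, x):
    """Pseudo-F for the one-parameter, no-intercept model y = b * x."""
    n = len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))   # residual sum of squares
    total = sum(yi * yi for yi in y)                        # uncentred total sum of squares
    return (total - sse) * (n - 1) / max(sse, 1e-12)        # guard against a perfect fit


def te_dependence_scores(pct_change, echo_times):
    """Return (F for TE-dependent model, F for TE-independent model) for one voxel."""
    f_te_dependent = pseudo_f(pct_change, echo_times)
    f_te_independent = pseudo_f(pct_change, [1.0] * len(echo_times))
    return f_te_dependent, f_te_independent
```

With hypothetical echo times of 12, 28, 44 and 60 ms, a signal change that grows with TE (for example 0.25, 0.55, 0.89, 1.19) yields a much larger TE-dependent F, whereas a flat profile (0.51, 0.49, 0.51, 0.49) yields a larger TE-independent F.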
Functional connectivity was computed using a seed-driven approach using the CONN-fMRI Functional Connectivity toolbox (Whitfield-Gabrieli & Nieto-Castanon, 2012) for Statistical Parametric Mapping (SPM). Functional data were temporally band-pass filtered (.008 < frequency < .09 Hz). Significant principal components of white matter and cerebrospinal fluid were removed. For correlations with behavioural measures of compulsivity, STN seed-to-whole brain connectivity maps were computed and entered into second level correlation analysis controlling for age and gender. For the w and P scores we further controlled for the variance related to the other variable as a covariate of no interest, to account for multiple comparisons and highlight the unique contributions of each. The STN region of interest (ROI) provided by Wake Forest University PickAtlas (Maldjian, Laurienti, Kraft, & Burdette, 2003) was used as the STN seed. This has the same centre of mass as a previously used STN ROI based on task-based fMRI (Aron & Poldrack, 2006; Aron et al., 2007) (10, −14, −4 for right STN). Cluster extent threshold correction was used for correlations with behaviour, calculated at 15 voxels at p < .001 whole brain uncorrected, correcting for multiple comparisons at p < .05 assuming an individual-voxel Type I error of p = .01 (Slotnick, Moo, Segal, & Hart, 2003). Due to the possibility of mixed signals arising from adjacent structures, we also examined the adjacent substantia nigra (SN) as a seed region to ensure specificity of the current findings to STN. Thus, the same correlation for w was performed for SN-to-whole brain functional connectivity maps.
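The two analysis levels described above, seed-to-voxel connectivity followed by a second-level correlation that controls for covariates, can be sketched as follows. This is a minimal pure-Python illustration and not the CONN toolbox implementation: the function names are hypothetical, filtering and nuisance regression are omitted, and the covariate adjustment uses the standard recursive partial-correlation formula as a stand-in for the toolbox's regression model.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, limit=0.999999):
    """Fisher r-to-z transform, clipping r so that atanh stays finite."""
    return math.atanh(max(-limit, min(limit, r)))


def seed_connectivity(seed_ts, voxel_ts_list):
    """First level: Fisher-z correlation of the seed time series with each voxel."""
    return [fisher_z(pearson(seed_ts, v)) for v in voxel_ts_list]


def partial_corr(x, y, covariates):
    """Second level: correlation of x and y controlling for a list of covariates."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[-1], covariates[:-1]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

In this sketch, `x` would hold one connectivity value per participant (for example STN to a given voxel), `y` the behavioural score such as w, and `covariates` the per-participant age, gender and, for w, the perseveration score.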

Participant characteristics
We acquired resting state fMRI data from 77 healthy controls.

Compulsivity measures
Table 1 shows the results of the correlations between STN seed-to-whole brain functional connectivity beta maps and the measures of interest, including both positive and negative correlations. The weighting factor, w, which describes the relative contribution of either habitual (model-free, MF, w = 0) or goal-directed (model-based, MB, w = 1) learning tendencies, was positively correlated with STN connectivity with left VS and mOFC. These regions are illustrated in Fig. 1, alongside a plot of their functional connectivity with STN against w. In addition, w correlated negatively with STN connectivity with left hippocampus, dorsal anterior cingulate cortex (ACC) and medial parietal cortex (statistics in Table 1).
To examine the specificity of these correlations for STN, rather than adjacent structures, we examined the adjacent SN. We found no similar pattern for SN functional connectivity and its relationship with w, suggesting that the current findings for STN were not driven primarily by signals from adjacent structures (Supplementary Table 1). To further confirm this, functional connectivity between the adjacent SN and the regions implicated for STN and w (VS, medial OFC, dorsal ACC and hippocampus) was computed and correlated with w. No significant correlations with w were observed for SN connectivity with any of these regions (see supplementary materials).
For comparison purposes we also investigated perseveration, which was associated with reduced connectivity between STN and left premotor cortex and left insula (Fig. 1). Higher OCI scores were associated with lower connectivity between STN and right dlPFC and left inferior parietal cortex (Fig. 2).

Table 1 – Statistics for the bilateral subthalamic nucleus (STN) seed-to-whole brain connectivity positive and negative correlations with measures of compulsivity. Cluster extent threshold correction of 15 voxels at p < .001 whole brain uncorrected was used. Abbreviations: Z, Z score; xyz, peak voxel coordinates; w, weighting of model-based (w = 1) and model-free (w = 0) learning; OCI, obsessive compulsive inventory; OFC, orbitofrontal cortex; ACC, anterior cingulate cortex; PFC, prefrontal cortex; IFC, inferior frontal cortex.

Discussion
We illustrate the relationships between intrinsic resting state functional connectivity of the STN and behavioural measures of compulsivity across a relatively large sample of healthy volunteers. Higher connectivity of STN with medial OFC and left VS was associated with more model-based goal-directed learning, whereas more model-free habitual learning implicated STN connectivity with dorsal ACC and left hippocampus. Furthermore, perseveration was associated with STN connectivity with premotor cortex and insula, whereas higher self-reported obsessive-compulsive scores were associated with lower connectivity between STN and right dlPFC and left inferior parietal cortex. We highlight unique neural couplings of the STN, contributing to distinct measures of compulsivity.

Fig. 1 – The two-step model-based model-free learning task is depicted on the left. A stimulus chosen at stage 1 (S1) led with 70/30% probability to one of two states (pink or blue in the schematic image) at stage 2 (S2). Choice of a stimulus at S2 led, with varying probability, to reward or no reward. Subthalamic nucleus (STN) connectivity with whole brain was computed and correlated with w, the relative contribution of model-free (w = 0) or model-based (w = 1) learning tendencies derived from the task. The y axis represents the functional connectivity between STN and a given region, and the x axis is the behavioural measure of w (top) or perseveration (bottom). STN connectivity with VS and mOFC positively correlated with w (top) and STN connectivity with premotor cortex and insula negatively correlated with perseveration (bottom). Displayed at p < .005 whole brain uncorrected for illustration on standard MNI template.

Fig. 2 – Subthalamic nucleus connectivity and compulsivity. Subthalamic nucleus (STN) connectivity with whole brain was computed and correlated with the obsessive compulsive inventory. Abbreviation: DLPFC, dorsolateral prefrontal cortex. Displayed at p < .005 whole brain uncorrected for illustration on standard MNI template.

The relationship between model-basedness and STN connectivity with OFC and VS dovetails with several studies implicating this cortico-striatal pathway in model-based learning. Model-based behaviour has been associated with higher grey matter volume in the medial OFC (Voon et al., 2014) and the reward prediction errors used to guide both model-based and model-free behaviour are encoded by the VS (Daw et al., 2011). Furthermore, we have previously demonstrated that higher functional connectivity between medial OFC and VS is associated with greater model-based learning tendencies using the same task (Morris et al., 2015).
In contrast, greater habitual model-free learning was associated with greater connectivity of the STN with dorsal ACC and hippocampus. The neural correlates of model-freeness have been less well established. Previous studies assessing habitual behaviour in humans have implicated the putamen and premotor cortex using the 'slips of action' task and the supplementary motor area (SMA) using the current two-step task (Morris et al., 2015). Traditionally, there has been a dissociation between dorsal striatal habit and hippocampal declarative or cognitive memories driving behaviour (Broadbent, Squire, & Clark, 2007; Packard, Cahill, & McGaugh, 1994; Wingard & Packard, 2008). However, the hippocampus has been shown to encode reward prediction (Tanaka et al., 2004), which is necessary for the reinforcement learning that drives model-free behaviour (Glascher, Daw, Dayan, & O'Doherty, 2010). The dorsal ACC receives extensive projections from the dopaminergic midbrain and is also implicated in reward prediction and prediction error for guiding reinforcement-driven behaviour (Holroyd & Yeung, 2012; Kennerley, Walton, Behrens, Buckley, & Rushworth, 2006). Links between the STN and dorsal ACC have been exemplified by studies in PD patients, which show that STN DBS reduces cerebral blood flow in the dorsal ACC (Ballanger et al., 2009; Thobois et al., 2007). STN DBS affects habitual behaviour, as measured by the generation of a sequence of random numbers (requiring habit suppression), although DBS has been shown to both improve (Witt et al., 2004) and impair (Thobois et al., 2007) performance on this task. STN DBS has also been shown to consistently hasten responding in the context of conflict or competing responses related to mesial prefrontal theta activity (Cavanagh et al., 2011). In the context of habit learning, conflict resolution may be relevant in resolving choices that involve switching between strategies.
Thus, the STN may mediate the shift between automatic habit learning from enhanced reliance on previously encoded reward prediction mediated via dorsal ACC and hippocampal structures to controlled goal-directed learning via the representation of goals in the medial OFC to flexibly guide responding.
Both w and perseveration capture similar repeated choices but are dissociated as a function of relevance of previously learned outcomes. We implicate a relationship between perseveration and STN connectivity with premotor cortex, a region responsible for action ownership and recognition (Ehrsson, Spence, & Passingham, 2004; Rizzolatti, Fadiga, Gallese, & Fogassi, 1996). Changes in perseveration for reward (Albuquerque et al., 2014; Herzog et al., 2009; Houeto et al., 2002) are observed following STN DBS in PD. Thus, whereas habit learning implicates regions involved in the encoding of reward prediction, perseveration implicates motor preparatory regions. Finally, higher obsessive-compulsive inventory scores were associated with weaker connectivity between STN and a fronto-parietal executive network including dorsolateral PFC, a network crucial for cognitive and attentional flexibility and shifting and implicated in OCD (Fineberg et al., 2015).
We chose to examine resting state neural properties rather than task-based properties for several reasons. Firstly, understanding the resting and latent neural network provides insight into the default or intrinsic function of the network as a whole, without perturbation by cognition, which may differ on an interindividual basis. As such, two levels of interindividual variability are possible: variability within the intrinsic network itself; and variability in the way in which that network is recruited during a task. This distinction certainly requires further exploration and delineation. However, understanding the baseline characteristics of neural networks, before any network recruitment by task demand, is key. Furthermore, resting state fMRI data are quicker and easier to collect than task fMRI data, features that are crucial in clinical settings. As the current study is of relevance to clinicians interested in STN DBS, we use a tool that is accessible to clinical work. This technique can therefore be expanded to other areas of clinical interest, for example pre-surgical mapping studies based on behavioural or cognitive faculties of particular importance. While we employ a technique that improves signal compared to noise for examining small structures, there are certainly still limitations to the use of 3T fMRI for examining such small regions, where the signal can be mixed or contaminated by adjacent structures. We aimed to address this by showing that the observed findings did not arise primarily from the adjacent SN.
Together the findings highlight unique contributions of diffuse cortico-striatal functional connections with STN to dissociable measures of compulsivity. These observations are particularly relevant to the impact of STN DBS on behavioural inflexibility in neurological and psychiatric disorders and may potentially act as biomarkers of treatment response.