Cortico-striatal white-matter connectivity underlies the ability to exert goal-directed control

The balance between goal-directed and habitual control has been proposed to determine the flexibility of instrumental behavior, in both humans and animals. This view is supported by neuroscientific studies that have implicated dissociable neural pathways in the ability to flexibly adjust behavior when outcome values change. A previous Diffusion Tensor Imaging study provided preliminary evidence that flexible instrumental performance depends on the strength of parallel corticostriatal white-matter pathways previously implicated in goal-directed and habitual control. Specifically, estimated white-matter strength between caudate and ventromedial prefrontal cortex correlated positively with behavioral flexibility, and posterior putamen – premotor cortex connectivity correlated negatively, in line with the notion that these pathways compete for control. However, the sample size of the original study was limited and so far, there have been no attempts to replicate these findings. In the present study, we aimed to conceptually replicate these findings by testing a large sample of 205 young adults to relate cortico-striatal connectivity to performance on the slips-of-action task. In short, we found only positive neural correlates of goal-directed performance, including striatal connectivity (caudate and anterior putamen) with the dorsolateral prefrontal cortex. However, we failed to provide converging evidence for the existence of a neural habit system that puts limits on the capacity for flexible, goal-directed action. We discuss the implications of our findings for dual-process theories of instrumental action.


Introduction
Many of our actions are repeated every day in the same context, making them highly prone to become habitual (Wood & Rünger, 2016). According to dual-systems accounts, separate habitual and goal-directed systems both contribute to instrumental action control (de Wit & Dickinson, 2009). While the goal-directed system flexibly selects actions based on the anticipation of outcomes that one is motivated to obtain, the habit system automatically activates actions based on learned stimulus-response (S-R) associations. This enables the habit system to select actions quickly and efficiently, but it comes at the cost of flexibility when goals change. The strongest evidence for dual underlying processes comes from brain lesioning studies in animals that show that dissociable cortico-striatal pathways contribute to goal-directed and habitual action (Balleine, 2019). While translational research in humans has also convincingly implicated a cortico-striatal pathway, involving caudate and (dorsolateral and ventromedial) prefrontal cortex, in goal-directed control (Balleine & O'Doherty, 2010), there is less evidence for a separate, neural habit pathway. A notable exception is a prominent functional Magnetic Resonance Imaging (fMRI) study that indirectly implicated activity in the posterior putamen in (overtrained) habitual behavior (Tricomi et al., 2009). However, a recent well-powered registered report failed to replicate this finding (Gera, Bar Or, et al., 2023). Another study used structural MRI to provide correlational evidence for a dual-system neural architecture, by relating dissociable white-matter neural pathways to behavioral flexibility (de Wit et al., 2012). While the evidence appears compelling, the limited sample size of that study and the lack of replication raise the question whether this is a robust finding. To address this, we conducted the present conceptual replication study.
To discriminate goal-directed from habitual control of instrumental actions, researchers have developed experiments evaluating the sensitivity of instrumental performance to the relationship between actions and outcomes (contingency degradation) or sensitivity of the action to changes in the desirability of the outcome. The latter, outcome devaluation, is the most widely used assay of the balance between habitual and goal-directed control. To illustrate, in the animal version of this assessment, animals first learn to press a lever to obtain a reward (e.g., food pellets) during the instrumental learning phase. Subsequently, this outcome (food pellet) is devalued through satiation or by pairing it with Lithium Chloride-induced nausea, and the animal is again given the opportunity to press the lever in an extinction test. Following limited training, animals will immediately reduce lever pressing when the outcome is no longer valuable, indicating that performance is under goal-directed control. In contrast, more extensive training can cause a shift to habitual control, with rats continuing to press the lever even though the outcome is no longer a desirable goal (Dickinson, 1985). Importantly, brain lesioning studies in animal research have shown that distinct but interacting cortico-striatal pathways underlie goal-directed and habitual control, thereby supporting dual-system accounts of action control. For example, lesioning prelimbic cortex and dorsomedial striatum leads to habitual performance, even after limited training. In contrast, lesioning the infralimbic cortex and dorsolateral striatum renders performance goal-directed, even after extensive training (Balleine, 2019).
To study action control in humans, the outcome-devaluation paradigm was translated to computerized tasks (see for review O'Doherty et al., 2017). A specific version of this task that was used in the aforementioned structural MRI study is the 'slips-of-action' task (de Wit et al., 2012). This task was developed to model 'slips of action' in daily life, which are triggered by contextual stimuli through learned stimulus-response associations even when not in line with one's current intentions (e.g., turning right towards the office, when one's current goal is to go into town). Specifically, it assesses the ability of participants to withhold learned instrumental responses (left and right key presses) to various discriminative stimuli (visual icons) when the outcome (visual icons that are worth points) contingent on the response is no longer valuable (while continuing to respond for still-valuable outcomes). Failures to suppress learned responses towards devalued outcomes are referred to as 'slips of action' and are interpreted as evidence for reliance on habits. Using this task, de Wit and colleagues (2012) showed that vulnerability to slips of action was related to the strength of structural white-matter tracts. Specifically, whereas estimated white-matter connectivity between caudate and ventromedial prefrontal cortex (vmPFC) correlated positively with performance on the slips-of-action task, connectivity between the posterior putamen and premotor cortex correlated negatively (de Wit et al., 2012). We interpreted this as evidence that these pathways play a role in goal-directed control and habitual control, respectively, and compete to some extent for action control. Therefore, this study suggested a similar neural architecture as in animals. Furthermore, it lent credence to the idea that there are individual differences in the tendency to rely on habitual control, due to strong habitual control and/or weak goal-directed control. In line with previous efforts to relate traits (e.g., neuroticism) to white-matter connectivity (Zhao et al., 2021), 'habit tendency' (or 'propensity') is conceptualized here as a relatively stable trait.
Since then, habit tendency has been suggested to be a transdiagnostic mechanism for developing and maintaining addictive and compulsive behaviors (Robbins et al., 2012). Indeed, previous case-control studies with outcome-devaluation tasks and the related two-step task (Daw et al., 2011) have found disruptions in the balance between habitual and goal-directed control in patients suffering from addiction (e.g., Ersche et al., 2016; Sjoerds et al., 2013), obsessive-compulsive disorder (e.g., Gillan et al., 2011), binge-eating disorder (Voon et al., 2015), and Gilles de la Tourette (Delorme et al., 2016), amongst others (but see for critical assessments, e.g., Hogarth, 2020; Watson & de Wit, 2018). Furthermore, interindividual variability in habit tendency has been related to a spectrum of compulsive behaviors in nonclinical populations (Gillan et al., 2016; Snorrason et al., 2016).
However, the view that competing dual processes underlie human action control has not gone unchallenged. The original demonstration of habit formation due to extensive training in humans (Tricomi et al., 2009) has not been replicated by more recent studies (de Wit et al., 2018; Gera, Bar Or, et al., 2023; Pool et al., 2022; Watson, Gladwin, et al., 2022). Furthermore, action slips may also arise as a consequence of failures to learn the associations between discriminative stimuli and the signaled outcomes, which has been argued to contribute to impairments observed in, e.g., addictive populations (Hogarth, 2020). In light of the ongoing debate on the role of habits in human action control, and the limited amount of neuroscientific evidence to support this view in humans, the present study was conducted to conceptually replicate and extend our previous investigation of white-matter connectivity in relation to the vulnerability to slips of action (de Wit et al., 2012).
To this end, we trained and tested a large sample of young adults (n = 205) on the slips-of-action task and related the ability to selectively respond for valuable outcomes (while withholding responses for devalued outcomes) to corticostriatal connectivity from the three original striatal seed regions: the caudate, anterior putamen, and posterior putamen. Because the main associations between connectivity and behavior reported in our previous study (de Wit et al., 2012) were observed on the 'standard' bi-conditional discrimination type (in which different pictures function as discriminative stimuli and outcomes), we used a version of the task that only has standard discriminations (for a detailed discussion of the differences with the original task, see Methods and Discussion). We also performed additional analyses with the subset of participants who achieved perfect S-O knowledge during training, to remove noise due to slips that arise from failures to acquire contingency knowledge (de Houwer et al., 2018). Based on the previous findings, we hypothesized that the ability to flexibly base responding on the current value of the outcome would be positively associated with white-matter connections between the vmPFC and caudate, and negatively with white-matter tract strength between the premotor cortex and the posterior putamen.

Materials & Methods
The current study was part of the 'Population Imaging' project (PIoP2) of the Spinoza Centre for Neuroimaging, in which several researchers collaborated to collect data from a large number of participants (Snoek et al., 2021). As such, the sample size was not determined specifically for the purpose of this study.

Participants
A total of 243 community-dwelling young adults participated in this study. They were recruited using flyers and word of mouth. Participants were sampled from the University of Amsterdam student population based on convenience and not equal probability (i.e., convenience sampling). Participants had to be between 18 and 25 years of age, and they had to pass the standard MRI safety screening. The data of 38 participants were excluded, the reasons for which are listed in Table 1. The remaining 205 participants who were included in all analyses ranged in age from 18 to 25 years (M = 21.40, SD = 1.82; 85 males, 20 left-handed). Educational level varied from secondary school to university, but the majority of the participants (176) were students at the bachelor or master level. Participants received a financial compensation of 50 euros for their time. They gave informed consent before participation.
All procedures were executed in compliance with relevant laws and institutional guidelines and approved by the local ethics committee (2017-EXT-7568).

Study procedure
Participants were tested at the University of Amsterdam. They took part in a four-hour procedure, of which one hour was spent inside the MRI scanner. Besides the structural scans reported here, several tasks were performed while fMRI was recorded, and additional tasks and questionnaires were performed outside the scanner. The order of measurements was counterbalanced over participants in such a way that four possible orders were created in which all elements of the study could be presented. The current study only focused on the Slips-of-Action Task (similar to Worbe et al., 2015; see next section for a detailed description), which was performed outside the scanner. In two of the task orders, the Slips-of-Action task (SoAT) was presented as the second task after the start of the test session (early SoAT group). In the other two task orders, the SoAT was presented as the second task of the second half of the test session, after participants had already undergone ~two hours of testing, including an hour in the MRI scanner (late SoAT group). Next, a subset of participants chose to participate in a follow-up investigation of the automatization of a real-life routine (in relation to white-matter connectivity), the data of which have been reported by van de Vijver et al. (2023).

Training phase
Participants had to learn by trial-and-error for each of six stimuli (S) with which outcome (O) they were related and which of two response buttons (R) would result in obtaining this outcome (Figure 1a). Specifically, on each trial a box was presented with a symbol of a fruit (S) on the front. If the correct button was pressed, the box opened and the fruit (O) inside the box was displayed for 1000 ms. This was accompanied by the reception of one or more points depending on the reaction time of the participant: 5 points were gained within the first second, 4 points after 1-1.5 sec, 3 points after 1.5-2 sec, 2 points after 2-2.5 sec and 1 point if the response time was longer than 2.5 sec. If the incorrect button was pressed, the box was empty, and no points were gained. Trials were separated by an ITI of 1500 ms.
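The reaction-time-dependent reward schedule of the training phase can be illustrated with a minimal Python sketch. The function name is hypothetical, and treating the upper bound of each time bin as inclusive is an assumption, because the text does not state how responses falling exactly on a bin boundary were scored.

```python
def points_for_response(rt_seconds: float, correct: bool) -> int:
    """Points earned on one training trial, following the schedule in the text."""
    if not correct:
        return 0  # incorrect button: the box is empty and no points are gained
    # (upper bound of the reaction-time bin in seconds, points awarded).
    # Inclusive upper bounds are an assumption of this sketch.
    schedule = [(1.0, 5), (1.5, 4), (2.0, 3), (2.5, 2)]
    for upper_bound, points in schedule:
        if rt_seconds <= upper_bound:
            return points
    return 1  # response slower than 2.5 s


example_points = [points_for_response(rt, True) for rt in (0.8, 1.2, 1.7, 2.3, 3.0)]
```

Here `example_points` holds one value per time bin, from the fastest to the slowest response.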
The six stimulus fruits were associated with six unique, different outcome fruits. Three stimuli were mapped to the left and three to the right response button. The training consisted of 96 trials in which all stimuli were presented 16 times. Trial order was randomized per 12 trials (including two presentations of each stimulus).
There is one major difference between the current task version and the one used in the study that we aim to conceptually replicate. We use a more recent version that contains three standard instrumental discriminations, whereas the original task contained one standard discrimination (e.g., lemon: left --> pineapple; orange: right --> banana), one congruent (e.g., apple: left --> apple; cherries: right --> cherries), and one incongruent (e.g., coconut: left --> pear; pear: right --> coconut). The incongruent discrimination has been argued to give rise to habitual performance (de Wit et al., 2007). However, due to the response conflict that is provoked by incongruent outcomes, the interpretation of incongruent performance and its neural correlates is more challenging than that of standard performance. For this reason, the task was adjusted to contain only standard discriminations (Worbe et al., 2015), and that version has been used widely since then to study both the role of habit tendency in psychopathologies (Delorme et al., 2016; Ersche et al., 2016) and its neural correlates (Watson et al., 2018). Importantly, the finding that we aim to replicate actually pertained to the relation between performance on the standard (and not congruent/incongruent) discriminations and cortico-striatal connectivity.

Slips-of-action test
Each block of the SoAT was preceded by the presentation of all six outcome fruits for 10 sec (Figure 1b). Two of the outcomes were no longer valuable, as indicated by a red cross. Participants were again presented with boxes with the stimulus fruits on the front and instructed to press the correct response for each box (Figure 1c). However, they should abstain from pressing a button when the outcome contained inside the box was currently not valuable. The presentation time of the boxes, and thus the response window, was now limited to 1000 ms, again separated by a 1500 ms intertrial interval. This part of the task consisted of short blocks of 12 trials, including two presentations of each stimulus fruit. Before each block, two new fruits were devalued. Of the devalued fruits, one was always associated with the left response button, the other with the right response button. Given this restriction, all possible combinations of outcome fruits featured once in the test, resulting in nine blocks in total. Trial order was randomized per block. Outcome fruits were not displayed in this phase (i.e., nominal extinction), but participants were instructed that a correct button press now resulted in gaining one point, a button press for a box containing a devalued outcome in losing one point, and when the incorrect button was pressed, or no button was pressed at all, there were no points gained or lost.
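The block structure of the test phase follows from combining each of the three left-button outcomes with each of the three right-button outcomes. The Python sketch below illustrates this design; the outcome labels (O1 to O6) and the function name are hypothetical, and the order in which the nine blocks were presented is not specified in the text, so the sketch simply keeps the enumeration order.

```python
import itertools
import random

# Hypothetical labels: O1-O3 are obtained with the left button and
# O4-O6 with the right button. The actual fruit assignment is not given.
LEFT_OUTCOMES = ["O1", "O2", "O3"]
RIGHT_OUTCOMES = ["O4", "O5", "O6"]


def build_test_blocks(seed=None):
    """Return nine blocks, each devaluing one left and one right outcome."""
    rng = random.Random(seed)
    blocks = []
    # 3 left x 3 right outcomes yield the nine unique devaluation pairs.
    for devalued in itertools.product(LEFT_OUTCOMES, RIGHT_OUTCOMES):
        # Two presentations of each of the six stimuli (identified here by
        # the outcome they signal) give 12 trials per block.
        trials = (LEFT_OUTCOMES + RIGHT_OUTCOMES) * 2
        rng.shuffle(trials)  # trial order is randomized within each block
        blocks.append({"devalued": set(devalued), "trials": trials})
    return blocks


blocks = build_test_blocks(seed=1)
```

Under this design, each block contains eight trials with still-valuable outcomes and four trials with devalued outcomes, for 72 valuable and 36 devalued trials across the test.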

Explicit tests of knowledge of S-R-O relations
After the SoAT, participants were shown all stimulus fruits one at a time and asked to indicate which (response) direction was associated with each fruit (test of S-R knowledge). Next, the same question was asked for all outcome fruits (test of O-R knowledge), and, finally, participants had to indicate for each outcome fruit with which stimulus fruit it was paired (test of S-O knowledge). For each of the 18 questions, participants also had to indicate their certainty on a scale of 0 to 100. A final multiple-choice question assessed which strategy participants applied during the test phase when viewing the screens showing the valuable and devalued outcomes. The response options were that they focused on the pictures on the inside of the box they no longer needed to press for, the pictures on the inside they still needed to press for, the pictures on the outside they no longer needed to press for, or the pictures on the outside they still needed to press for.
Participants received extensive instructions about both the training phase and the SoAT before starting the real task. They were also informed that the two best-performing participants would receive a financial bonus.

Statistical analyses of behavior
Data of participants who did not reach an accuracy level of 75% in the last two training blocks (learning phase criterion) or who had a SoAT difference score of zero or lower (test phase criterion, explained below) were excluded from all analyses (see Table 1). For the 205 included participants, we first examined how well they learned the correct responses during training. To this end, average accuracy scores and reaction times (RT) from the training phase were entered into separate repeated-measures ANOVAs with within-subject factor Block (each set of 12 trials, 8 blocks).
To investigate 'slips of action' during the SoAT, we first directly compared the response rates between conditions (valuable and devalued) with a t-test. Next, individual SoAT difference scores were computed by subtracting response rates for devalued outcomes from response rates for valuable outcomes. Thus, lower SoAT difference scores reflect a tendency to rely more on habitual control and higher scores reflect more goal-directed control. To find out whether the level of performance during training was related to test performance, we correlated training accuracy with this SoAT difference score using a Spearman rank correlation. We also examined participants' accuracy on the explicit questions about S-R, R-O, and S-O knowledge. To investigate to what extent variability in the SoAT difference scores directly related to S-O knowledge, we computed the Spearman correlation between the two measures.
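As an illustration of the difference score, the following Python sketch computes response rates as percentages of trials on which a button was pressed and subtracts the devalued rate from the valuable rate. The function names and the example counts are hypothetical; expressing response rates as percentages is an assumption of this sketch.

```python
def response_rate(responded):
    """Percentage of trials on which the participant pressed a button."""
    return 100.0 * sum(responded) / len(responded)


def soat_difference_score(valuable_responded, devalued_responded):
    """Response rate for valuable outcomes minus that for devalued outcomes.

    Higher scores indicate more selective (goal-directed) responding;
    scores near zero indicate frequent slips of action.
    """
    return response_rate(valuable_responded) - response_rate(devalued_responded)


# Hypothetical participant: responded on 63 of 72 valuable trials
# and on 9 of 36 devalued trials.
valuable = [True] * 63 + [False] * 9
devalued = [True] * 9 + [False] * 27
score = soat_difference_score(valuable, devalued)  # 87.5 - 25.0 = 62.5
```

A participant with a score of zero or lower would not meet the test phase criterion described above.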
Finally, we checked whether the behavioral patterns that were found for the training and SoAT remained the same when focusing on the group with perfect S-O knowledge only. For all ANOVAs, Greenhouse-Geisser corrections were applied when necessary, but uncorrected degrees of freedom are reported.

MRI data preprocessing
MRI data were analyzed using FSL (Jenkinson et al., 2012; Smith et al., 2004) and custom written Matlab scripts (The MathWorks, Natick, MA, USA). We followed the common FSL pipeline for processing of diffusion-weighted data, which consists of (1) eddy correction, (2) brain shape extraction from the b0-image of the eddy-corrected data ("Betcrawler"), and (3) diffusion-tensor fitting at each voxel ("dtifit"). This final step results in the FA values per voxel. Brain shapes were also extracted from the T1 scans. Finally, we used "Flirt" to compute transformation matrices between (1) diffusion-tensor imaging (DTI) space and T1 space, (2) DTI and standard 2mm MNI (Montreal Neurological Institute) space, and (3) T1 and 2mm MNI space.

Probabilistic tractography
The distributions of principal diffusion directions at each voxel that are necessary for probabilistic tractography were estimated for each participant with repeated sampling in their individual DTI space ("bedpostX"). Probabilistic tractography also requires the definition of seed masks from which the tracts are sampled. We acquired bilateral masks of the caudate nucleus and putamen from the Harvard-Oxford subcortical atlas in FSL. The putamen masks were split at y=2 to create separate masks of anterior and posterior putamen, respectively. Next, for each participant the three bilateral masks were registered to their individual DTI space. To minimize the impact of spatial smoothing caused by this procedure, the resulting masks were thresholded at 0.8 and voxels with nonzero values in more than one mask were attributed to only one mask.
Subsequently, the connectivity distribution from these seed masks was estimated using "probtrackX". This procedure computes the probabilities of white-matter connections between each voxel in a seed mask and all other voxels in the brain (5000 paths per seed voxel, using only standard options). Per seed voxel, it results in values for all other brain voxels that represent the statistical likelihood of a white-matter tract existing between the seed voxel and the target voxel. Note that whereas the tracts are estimated from seed voxels to other voxels in the brain, the likelihood values do not provide information on the direction of information flow in the estimated connection. The probability distributions from all seed voxels in a mask were combined in FSL. We corrected the resulting values per voxel for the number of paths estimated and the number of voxels in the seed mask to correct for differences in mask sizes (and, thus, number of paths estimated) between masks and between individuals. Finally, tract probability maps were transformed from individual DTI space to standard 2mm MNI space and smoothed using an isotropic 4 mm Gaussian kernel.
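The correction for mask size can be sketched as follows. This is only an illustration under the assumption that the correction amounts to dividing each voxel's combined path count by the total number of paths sampled from the mask (number of seed voxels times 5000 paths per seed voxel); the exact formula is not given in the text, and the function name and example values are hypothetical.

```python
PATHS_PER_SEED_VOXEL = 5000  # number of paths sampled per seed voxel (from the text)


def normalize_tract_counts(path_counts, n_seed_voxels,
                           paths_per_seed_voxel=PATHS_PER_SEED_VOXEL):
    """Scale combined path counts by the total number of paths sampled.

    Assumption: dividing by (seed voxels x paths per seed voxel) makes
    values comparable across masks and individuals with different mask sizes.
    """
    total_paths = n_seed_voxels * paths_per_seed_voxel
    return [count / total_paths for count in path_counts]


# Hypothetical seed mask of 200 voxels, i.e., 1,000,000 sampled paths in total.
normalized = normalize_tract_counts([0, 250_000, 1_000_000], n_seed_voxels=200)
```

With this scaling, a value of 1.0 would mean that every path sampled from the mask passed through the target voxel.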

Statistical analyses of the relation between Slips-of-Action Test behavior and structural brain data
To investigate how white-matter tract values were related to SoAT behavior, we used the same approach as in de Wit et al. (2012) but with a more stringent statistical threshold. For each separate seed mask, we computed the Spearman correlation between the SoAT difference (ranked) score and the tract value per voxel. Only voxels in which at least 60% of the participants had a tract value higher than zero were included. To correct for multiple comparisons, only voxels with p-values < 0.001 were considered significant, in line with other studies examining the relation between structural brain connectivity and a behavioral measure at the whole-brain level (e.g., Delorme et al., 2016; Huynh et al., 2021; L. S. Morris et al., 2016; van de Vijver et al., 2023), and only clusters of >20 significant voxels (160 mm³) are reported (in line with de Wit et al., 2012; van de Vijver et al., 2016, 2023; van den Brink et al., 2014). The application of a popular Monte Carlo method that has been created to compute appropriate cluster thresholds to correct for multiple comparisons at the whole-brain level when analyzing MRI data (Slotnick, 2017; Slotnick et al., 2003; recently used in, e.g., Huynh et al., 2021; van de Vijver et al., 2023) confirmed that this cluster threshold is indeed appropriate for the current analyses. The whole-brain voxel threshold was set at p < .001, which was applied to the complete 91 × 109 × 91 MNI voxel matrix with a voxel size of 2 × 2 × 2 mm and smoothed with a 4-mm FWHM Gaussian kernel (in line with the smoothing that was applied during processing of the DTI data).
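The voxel inclusion rule and the rank correlation can be illustrated with a minimal, dependency-free Python sketch. It is not the analysis code used in the study: all function names and the toy data are hypothetical, and the significance test (p < .001) and the cluster-size threshold are deliberately left out.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    denom = sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / denom if denom > 0 else float("nan")


def voxelwise_rho(scores, tract_values, min_fraction=0.6):
    """Correlate behavior with tract values at each included voxel.

    tract_values[v][p] is the tract value of participant p at voxel v.
    A voxel is included only if at least `min_fraction` of participants
    have a tract value above zero.
    """
    results = {}
    for voxel, values in enumerate(tract_values):
        if sum(v > 0 for v in values) / len(values) >= min_fraction:
            results[voxel] = spearman_rho(scores, values)
    return results


# Toy data: five participants, three voxels.
scores = [10, 20, 30, 40, 50]
tracts = [
    [0.1, 0.2, 0.3, 0.4, 0.5],  # increases with the behavioral score
    [0.0, 0.0, 0.0, 0.1, 0.2],  # only 40% nonzero, so excluded
    [0.5, 0.4, 0.3, 0.2, 0.1],  # decreases with the behavioral score
]
rho_by_voxel = voxelwise_rho(scores, tracts)
```

In the toy data, the second voxel is dropped by the 60% rule before any correlation is computed, which mirrors how sparsely connected voxels are kept out of the whole-brain analysis.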
We also examined whether relations between tract values and behavior could be explained by local differences in FA values rather than differences in specific connections. To this end, we correlated SoAT difference scores with FA values per voxel and applied the same statistical thresholding. None of the clusters demonstrating a correlation between SoAT behavior and tract values overlapped with a cluster demonstrating a correlation between SoAT behavior and FA values. Note that the brainstem and cerebellum were not (completely) in the field of view for some participants. Clusters located in these brain areas are therefore not reported, nor is a cluster at the border of the occipital pole.
The explicit knowledge of S-R, R-O, and S-O associations as reported in the questionnaires confirmed that participants learned the correct relations. After the SoAT, S-R knowledge was on average 97.6% (SD 7.1; perfect score in 182/205 participants), O-R knowledge 90.9% (SD 15.3; perfect score in 136/205 participants), and S-O knowledge 82.3% (SD 25.1; perfect score in 118/205 participants). Thus, the majority of participants were well aware of the relations between stimuli, responses, and outcomes. As expected, SoAT difference scores were significantly related to explicit S-O knowledge, rS = .652, p < .001.
We also assessed participants' strategies during the test phase. Out of 204 participants (1 participant did not fill out this question), 151 participants reported focusing on the fruit cues on the outside of the box (S) in response to which they no longer needed to press, 20 focused on the fruit cues on the outside in response to which they still needed to press, 27 on the fruit outcomes on the inside of the box (O) for which they no longer needed to press, and 6 on the outcomes on the inside for which they still needed to press. Thus, while the instruction screen demonstrated which outcomes were valuable and devalued during the SoAT, 73.7% of the participants appear to have immediately translated this to the discriminative stimuli that signaled those outcomes (and most focused on the ones signaling devalued outcomes towards which they should inhibit their responses). In the group with perfect knowledge, this percentage was even higher, namely 83.1%.

<< Figure 2 >>
Because participants performed the SoAT at different moments during the test session, we examined the possible impact of fatigue on task performance. We first compared training accuracies and RTs between the early and late SoAT groups. In the whole group of participants (N = 205) there were no differences between the groups in the learning phase (p-values > .17). Next, we investigated possible differences in the SoAT test phase. There was no difference between groups in the response rates for the still valuable outcomes, but the response rates for devalued outcomes were significantly higher for the late (N = 78; M 36.11, SD 21.08) compared to the early group (N = 118; M 30.53, SD 18.88), t(203) = -1.990, p = 0.048. Importantly, there was no significant difference between the groups in the SoAT difference scores that were used to investigate the relation with the structural brain data (p-values > .2). S-O knowledge also tended to be higher in the early (M .85, SD .23) than the late group (M .78, SD .28), although this difference was only marginally significant, t(164.391) = 1.882, p = 0.062.
Of the 118 participants with perfect S-O knowledge that we included in our extra analyses, 71 were in the early condition, while 41 were in the late condition, suggesting that even if there were effects of fatigue, many participants in the late group still performed well. We also checked whether there were differences in performance between the early and late participants in the group with 100% S-O knowledge, but this was not the case (training RTs p = 0.084, all other p-values > .3).

White-matter tracts related to Slips-of-Action Test performance in all participants.
To investigate the relationship between action control and anatomical corticostriatal connections, we correlated performance on the SoAT with structural brain connectivity from the caudate nucleus, the anterior putamen, and the posterior putamen to the rest of the brain. All findings surviving the whole-brain level statistical threshold can be found in Table 2 and Figure 3. SoAT performance, with higher difference scores reflecting goal-directed performance, correlated positively with tract probabilities between the anterior putamen and the dorsolateral prefrontal cortex (dlPFC; Figure 3a). Moreover, goal-directed control was positively correlated with white-matter tract strength between all three striatal seed regions and the (superior) inferior fronto-occipital fasciculus, a large white-matter tract of the human cerebrum, and the primary (sensori)motor cortex. Positive correlations were also found between SoAT performance and connections between the caudate nucleus and the presupplementary motor area, and between the posterior putamen and corpus callosum. Additionally, white-matter bundles between the caudate nucleus and the lateral occipital cortex, angular gyrus and superior longitudinal fasciculus correlated positively with goal-directed control. Finally, SoAT difference scores correlated with tracts linking the anterior putamen with the lateral occipital cortex and the inferior and superior temporal gyrus. Contrary to our hypotheses, however, at the whole-brain level we did not find positive correlations between SoAT difference scores and caudate-vmPFC tracts, nor negative correlations with posterior putamen-premotor tracts. In fact, no significant negative correlations were observed at all.

White-matter tracts related to Slips-of-Action Test performance in participants with perfect S-O knowledge.
We also examined these relations in the subset of participants (N = 118) who achieved perfect S-O knowledge during training (Table 3 and Figure 4). The goal of this analysis was to exclude participants who did not learn all S-O contingencies, which could lead to erroneous responses during the test phase due to a failure to learn the contingencies in the first place. In the participants with perfect S-O knowledge, SoAT difference scores again correlated positively with tracts between the dlPFC and the anterior putamen, as well as caudate nucleus seeds. This finding lends more credence to the idea that this tract subserves goal-directed performance. SoAT difference scores were also positively related to a tract between the anterior putamen seed and the middle temporal gyrus. We still did not observe any negative correlations with cortico-striatal connectivity, supporting the robustness of our current findings. Overall, this analysis revealed a lower number of significant tracts, but that is not surprising given reduced power with this smaller sample.

Discussion
The present study aimed to replicate a previous structural MRI investigation that implicated competing dual cortico-striatal pathways in behavioral flexibility in humans (de Wit et al., 2012). In short, the current results largely deviate from those previous findings. While the previous study reported a positive association between flexible goal-directed test performance and white-matter tract strength seeding from caudate to ventromedial prefrontal cortex, we found white-matter strength from the caudate (and anterior putamen) to the dorsolateral prefrontal cortex (dlPFC), as well as several other cortico-striatal tracts, to be positively correlated with goal-directed control. Moreover, in contrast to the negative association between flexibility after outcome devaluation and structural connectivity between the posterior putamen and premotor cortex reported by de Wit et al. (2012), we did not find any negative associations. Therefore, the present replication study, with a larger sample (205 as opposed to 23 participants) and more stringent neuroimaging threshold criteria, does not provide evidence for the existence of a neural habit system that constrains the capacity for flexible, goal-directed action. These findings will be discussed in more detail, as well as their implications for dual-process/system theories.
We will first take a more detailed look at the positive correlates of performance on our outcome-revaluation paradigm, the slips-of-action task (SoAT). We found that goal-directed performance was related to striatal connectivity with several cortical areas, including the dorsolateral prefrontal cortex (dlPFC), primary motor cortex, lateral occipital cortex, inferior and superior temporal gyrus, and pre-supplementary motor area. In a next step, we ran additional analyses on the subsample of participants who had achieved perfect S-O knowledge during training. This was to eliminate noise due to slips of action towards devalued outcomes being driven by imperfect knowledge of the outcome signaled by each discriminative stimulus (i.e., the S-O contingencies) rather than by the ability to exert goal-directed control at the choice point. In this subsample, the most prominent cortico-striatal tract was between the anterior putamen and caudate regions and the dlPFC, and another tract was between anterior putamen and middle temporal gyrus.
Several aspects of these positive correlates stand out. First, all positive cortico-striatal correlates in the subsample with perfect S-O knowledge involved the caudate (as well as anterior putamen), which was also implicated in goal-directed control by the original structural MRI study (de Wit et al., 2012), as well as by fMRI outcome-revaluation studies (Gera, Bar Or, et al., 2023; R. W. Morris et al., 2015; van Timmeren et al., 2023; Watson et al., 2018). Second, we found that interindividual differences in goal-directed performance were associated with striatal (caudate and anterior putamen) connectivity strength with the dlPFC. This region has been associated with abstract cognitive control functions (reviewed by Petrides, 2005) that may be important to perform in a goal-directed manner on the SoAT, including implementing cognitive control (Ridderinkhof, Ullsperger, et al., 2004), cognitive flexibility (Ridderinkhof, van den Wildenberg, et al., 2004), working memory (Barbey et al., 2013), and response inhibition (Miyake et al., 2000). More directly related to goal-directed control, findings suggest that the dlPFC is involved in the encoding of action-specific value comparisons (R. W. Morris et al., 2014), action-outcome representations (McNamee et al., 2015) and state prediction errors key for model-based learning (Gläscher et al., 2010). Related to this last result, disrupting dlPFC functioning with transcranial magnetic stimulation has been shown to disrupt model-based decision making (Smittenaar et al., 2013), suggesting a causal role in goal-directed action. The dlPFC has also been implicated in goal-directed control by fMRI studies using outcome devaluation paradigms. Gera et al. (2023) found that dlPFC activity was elevated early in training, but only in a subset of participants who remained sensitive to outcome devaluation following extensive training, a pattern that was absent in habitual participants. Moreover, Watson et al. (2018) showed that during successful test trials of the SoAT, dlPFC activation correlated with individual differences in goal-directed performance. Combined with our current findings, these results suggest that dlPFC functioning, through its connection with the striatum, plays a crucial role in enabling the ability to control behaviour in a goal-directed manner.
A notable absence from the positive correlates, on the other hand, is the ventromedial PFC (vmPFC). This contrasts not only with the original white matter study (de Wit et al., 2012), but also with several structural (Piray et al., 2016; van Steenbergen et al., 2017) and functional (reviewed by O'Doherty, 2011) MRI studies that have implicated the vmPFC in goal-directed control. The vmPFC plays a central role in supporting goal-directed actions by representing action-values (Gläscher et al., 2009; O'Doherty, 2011). Using event-related fMRI designs, previous outcome-devaluation studies found vmPFC activity during the performance of a goal-directed action in the test phase (de Wit et al., 2009; Valentin et al., 2007; van Timmeren et al., 2023) or in relation to the anticipation of an upcoming reward during training (Tricomi et al., 2009). Therefore, while those fMRI studies compared activity at the choice point across participants, offering a more direct window onto the neural correlates of value-based decision making (reflected by vmPFC activity), our study indicates that individual differences in the tendency to make action slips are not associated with structural striatal-vmPFC connectivity.
Importantly, while we found evidence for positive correlates of behavioral sensitivity to outcome devaluation, we did not replicate the previously observed negative correlation with connectivity between the posterior putamen and premotor cortex, nor did we find any other negative correlations. This was also the case in the subsample with perfect S-O knowledge, supporting the robustness of our current findings. We therefore do not replicate the original evidence for a habit system that (co-)determines behavioral flexibility. It seems unlikely that this failure to replicate is due to a difference in the task design. The task we adopted was the same, except that it only included the 'standard' discrimination, while the version used by de Wit et al. (2012) additionally included a congruent and an incongruent discrimination. However, the critical findings that we aimed to replicate concerned the correlation with performance on trials of the 'standard' discrimination exclusively. In light of our current findings, it is also relevant to point out that, in the original study that we aimed to replicate, the evidence for this negative correlation was already weaker than that for the positive correlation (R² = .23 and .50, respectively). In any case, our failure to find structural evidence for neural dual processes underlying performance on our outcome-revaluation paradigm raises concerns about the robustness of this finding, especially given the fact that our present study included a much larger sample than the original study and employed more stringent threshold criteria (but see Marek et al., 2022).
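As a rough illustration of how imprecise correlations from a small sample can be, the sketch below converts the two reported R² values into correlation coefficients and computes approximate 95% confidence intervals with the Fisher z transformation. It assumes that both original correlations were simple bivariate correlations based on all 23 participants, which is an assumption on our part, and it does not account for the fact that peak-voxel values are selected post hoc. The function name is ours.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def r_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for a correlation r observed
    in n participants, using the Fisher z transformation."""
    z = atanh(r)                              # Fisher z of the observed r
    se = 1 / sqrt(n - 3)                      # standard error on the z scale
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - q * se), tanh(z + q * se)


n_original = 23  # assumed: all participants of the original study
# R² values from the text; the signs follow the reported directions.
for label, r in (("negative", -sqrt(0.23)), ("positive", sqrt(0.50))):
    low, high = r_confidence_interval(r, n_original)
    print(f"{label}: r = {r:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Under these assumptions, both intervals are wide, and the interval for the negative correlation extends close to zero, which fits the observation that the evidence for the negative correlation was the weaker of the two.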
The theoretical framework for this line of research into action control is provided by dual-process theories, and more specifically the notion that the ability to adjust instrumental behavior when our goals change depends on competition between a goal-directed and a habitual system (Balleine & Dickinson, 1998; Balleine & O'Doherty, 2010; Dickinson, 1985). Support for this framework has come from research in animals, with lesioning studies implicating distinct brain regions in goal-directed and habitual control when outcomes are revalued (Balleine, 2019). But is human action control similarly supported by competing dual neural pathways? Our findings certainly do not provide evidence for this view. An alternative dual-process account is that habitual and goal-directed processes or pathways do not just compete but instead work in a more integrated manner. An example of this is the associative-cybernetic model (de Wit & Dickinson, 2009; Dickinson, 2012; Dickinson & Balleine, 1994), where the habit pathway plays a pivotal role in response selection via S-R associations but is linked to an associative memory in which a representation of that action is linked to a representation of its outcome. When that outcome is rewarding, it will lead to activity in an incentive system that boosts the motor system, enhancing the likelihood that the selected response is performed. Conversely, if the outcome is devalued, then the motor system will be inhibited, and it becomes less likely that the subject performs the response that was selected via the habit pathway. Importantly, this version of a dual-process/system account would not necessarily predict that individual differences in habitual control are reflected in negative correlations with structural pathways, as the habit pathway also plays a role in goal-directed control.
Yet another take on our findings is a purely goal-directed account, according to which individual differences in flexibility of instrumental behavior, at least in our paradigm, are driven solely or mostly by the extent to which people are able to exert goal-directed control. Although learned associations that (in)directly link the discriminative stimuli and responses clearly mediate the slips of action on this paradigm, their strength may not differ (as) significantly among participants and may therefore not be the primary factor driving individual variation in flexibility. This goal-directed account seems to align well not only with our present findings, but also with previous neuroimaging studies in humans. For instance, the finding by Tricomi et al. (2009) that posterior putamen activity during habit learning increased over days was observed across participants, but was not related to variation in test performance. Two recent studies found a similar trend towards increasing activity of the posterior putamen over training across participants (Gera, Bar Or, et al., 2023; van Timmeren et al., 2023), although only when zooming in on the posterior part of the putamen in exploratory analyses. Perhaps more tellingly, when Gera et al. related brain activity to performance on the outcome-revaluation test, they once again only found evidence for positive correlates (Gera, Bar Or, et al., 2023). One other study that related neural activity during training to test performance revealed regions of the premotor cortex and cerebellum as negative predictors but found no evidence for posterior putamen involvement. In fact, so far, devaluation sensitivity has been mainly linked to activity in brain regions implicated in goal-directed control, and attentional and conflict monitoring, such as the anterior caudate, insula, anterior cingulate cortex, dlPFC and lateral OFC (Gera, Bar Or, et al., 2023; Howard et al., 2020; Perkes et al., 2023; van Timmeren et al., 2023; Watson et al., 2018). Thus, identifying a reliable neural signature of habit learning and expression remains challenging. As noted by Gera et al. (2023), this difficulty could stem from greater variations among individuals in the goal-directed system than in the habitual system, or from variations within the goal-directed system having a more pronounced impact on behavior.
It has proven challenging not only to find evidence for an independent habit system at the neural level, but also to capture habits behaviorally as a function of overtraining in humans. While Tricomi et al. (2009) demonstrated habits as a function of behavioral repetition, this finding has not been replicated since, despite multiple published attempts. This includes a paper reporting five failed attempts to find an effect of overtraining on behavioral flexibility (de Wit et al., 2018), using different existing outcome-revaluation paradigms (including the slips-of-action test and Tricomi et al.'s paradigm), followed by another failed attempt with a novel 'symmetrical outcome-revaluation task' (Watson, Gladwin, et al., 2022). Furthermore, Pool et al. (2022) recently conducted a well-powered (N=306 as opposed to the original 32) registered report of Tricomi et al.'s study, which also failed to replicate the original evidence for habits as a function of overtraining. Those authors argued that a possible explanation for this could be that the replication study was conducted outside the scanner, in contrast to Tricomi et al. (2009). However, the recently published pre-registered MRI study of Gera and colleagues (N = 123) casts doubt on that account, by once again not showing any effect of overtraining on behavioral flexibility even when conducted inside an MRI scanner (Gera, Bar Or, et al., 2023).
To summarize, the translation of the dual-process model from animals to humans proves to be challenging. Unlike in animal research, many attempts to capture habits in humans using outcome-revaluation tasks have failed, both behaviorally and in terms of their neural signature. As it stands, findings from the neuroimaging literature with outcome-revaluation paradigms are more in line with the idea that behavioral flexibility in humans is primarily driven by (interindividual variations in) goal-directed control than with a dual-process view. This could reflect a fundamental difference between animals and humans. Alternatively, the outcome-revaluation tasks we currently adopt in the lab may fail to capture the contribution of habitual control to instrumental actions. One reason for the lack of overtraining effects in particular could be that instrumental training is either too short or too long, with a recent study suggesting that participants rapidly rely on S-R associations during instrumental discrimination training, where correct responses are reliably signaled by discriminative stimuli (van Timmeren et al., 2023). In support of this argument, Gera et al. (2023) also observed a quick development of habitual responding in the majority of their participants. If habit formation is rapid in these tasks, then adding more sessions beyond the initial training may not further increase S-R habit strength. Another potential issue with current paradigms is the clear break between the training phase and the subsequent test, which disrupts the flow of responding and may provoke a return to a more goal-directed mode. Furthermore, the explicit instruction of changed outcome values prior to each test block may facilitate the spontaneous formation of novel S-R rules, such that the changed outcome values inform new rules that can be readily applied during the ensuing test phase. Indeed, such S-R rules have previously been shown to significantly improve performance on this task (Verhoeven et al., 2018). Finally, humans may in general be very capable of exerting goal-directed control and thus adjusting learnt responses under those circumstances, certainly in relatively simple tests without distraction or other tasks taxing cognitive resources.
In recent years, researchers have started to use alternative ways to capture and reveal habits (Watson, O'Callaghan, et al., 2022). For example, taxing the goal-directed system by imposing a cognitive load may be a promising direction for future research into habits (Otto et al., 2013), as may focusing on more subtle effects on reaction time. In line with this idea, Luque et al. (2019) showed that reaction time switch cost increases as a function of training. Along the same lines, Hardwick et al. (2019) used an elegant task that manipulated the response window, thereby unmasking habitual responses as a function of repetition after extensive practice. Another option is to exploit well-established real-life S-R associations in an experimental setting (Ceceli et al., 2019), even though this of course limits the degree of experimental control. Finally, more ecologically valid settings may hold promise for controlled investigation of habits and their neural basis, for example through the use of gamified smartphone apps (Banca et al., 2020; Gera, Barak, et al., 2023). Relatedly, the participants in the study reported here were invited to take part in a follow-up study, in which we related individual differences in cortico-striatal connectivity to the subjective automatization of a real-life, daily routine of pill intake during three consecutive weeks (van de Vijver et al., 2023). That study implicated the dlPFC in plan enactment (i.e., pill intake adherence), and also showed that caudate-vmPFC and ACC connectivity was negatively related to perceived automatization of this novel routine, in line with studies implicating this pathway in goal-directed control. While these findings converge with research on habits in the lab, this first attempt to study the neural basis of habits 'in the wild' also failed to provide evidence for a neural habit region. Therefore, we can conclude that it is not trivial to capture habits behaviorally, and further research is needed to determine whether this is due to a lack of sensitivity of existing paradigms or to a modest contribution of habitual control processes to the flexibility of human action control.
Our findings and recent related neuroimaging studies raise questions for research into 'habit tendency' in, e.g., addiction and obsessive-compulsive disorder. This tendency to rely on simple S-R associations for action control may indeed be primarily driven by impairments in goal-directed action control and related cognitive control functions, and not by the strength of S-R habit formation per se. This does not, however, undermine the view that habit-based interventions hold great promise to support adaptive behavior change, as these would be expected to reduce the need for effortful goal-directed control.
Our findings should be interpreted in light of several limitations. Recruitment of participants for this study was based on convenience sampling of young adults aged 18 to 25 years, limiting the generalizability of our findings. Another limitation is that behavioral performance on the outcome devaluation task may have been impacted by state factors. For example, previous research has found that behavior becomes less sensitive to outcome devaluation under stress (Quaedflieg et al., 2019; Schwabe & Wolf, 2009, 2010) and following sleep deprivation (Chen et al., 2017). Thus, a bad night's sleep or a stressful event may have incidentally influenced task performance, adding noise to our assessment of goal-directed control. Future studies into the relationship between structural brain connectivity and goal-directed action should therefore use repeated behavioral measurements to obtain a more reliable read-out. More generally, it would be valuable for our field to focus attention on the test-retest reliability of outcome devaluation paradigms, which (to the best of our knowledge) has never been evaluated.

Conclusion
Dual goal-directed and habitual processes are argued to underlie human instrumental behavior and determine the ability to flexibly adjust it when goals change. Outcome-revaluation paradigms have been used to capture the balance between those processes in animals and to provide evidence for dissociable underlying pathways. However, in humans the experimental and neuroimaging evidence is not as convincing. Our present structural MRI findings converge with recent functional MRI evidence that variations in goal-directed rather than habitual control are the main determinant of behavioral flexibility on these tasks in humans. Future research with improved behavioral measures is needed to determine the role of habitual and goal-directed processes in human action control.

Figure 1 - The Slips-of-Action Task. A) Sequence of events in an example trial of the training phase. Participants were instructed to learn the correct response for every stimulus fruit by trial-and-error. Only if the correct response was made were points gained and the outcome associated with the specific stimulus shown. B) During the Slips-of-Action Test (SoAT), two outcome fruits were devalued in each block. C) Sequence of events in an example trial of the SoAT. Participants were instructed to withhold their response for boxes of which the stimulus fruit indicated that the enclosed outcome fruit was currently devalued.

Figure 2 - Behavioral performance on the Slips-of-Action Task in the complete sample. a) Choice accuracy increased with training and b) reaction times decreased with training. c) Slips-of-Action Test response rates were higher for still valuable (Sv) than for devalued (De) outcomes (the black horizontal line represents the mean, error bars represent standard deviations over participants, each black dot represents the data of an individual participant, and the width of the violin represents the kernel density estimate of the data).

Figure 3. Clusters showing significant correlations between white-matter tract values from the three striatal seed regions and Slips-of-Action Test behavior in all participants (N=205). Slips-of-Action Test difference scores, with higher scores reflecting more goal-directed performance, correlated positively with tracts between (a) the anterior putamen and a cluster in dorsolateral prefrontal cortex, (b) caudate, anterior putamen, and posterior putamen seeds and the fronto-occipital fasciculus, (c) all three seed regions and clusters in primary motor cortex, (d) the caudate seed and a cluster in presupplementary motor area, (e) the posterior putamen seed and a cluster in the corpus callosum (not shown) and subcallosal cortex, (f) the anterior putamen seed and two clusters in superior and inferior temporal gyri, and (g) the caudate and anterior putamen seeds and clusters in lateral occipital cortex and angular gyrus. Red patches represent positive correlations from all three seed regions, overlaid. Scatter plots represent voxels with peak correlations, with SoAT difference scores on the x-axis, and tract values on the y-axis (both rank transformed). Images are shown in radiological convention. MNI coordinates (X, Y, and Z) indicate the location of the displayed slice in the brain in mm (R/L = right / left, Ant put = anterior putamen, Cau = caudate nucleus, Post put = posterior putamen, dlPFC = dorsolateral prefrontal cortex, FOF = fronto-occipital fasciculus, Amyg = Amygdala, M1 = primary motor cortex, S1 = primary sensory cortex, SMA = (pre)supplementary motor cortex, SC = subcallosal cortex, TG = temporal gyrus, LOC = lateral occipital cortex, SoAT diff score = SoAT difference score).

Figure 4. Clusters showing significant correlations between white-matter tract values and Slips-of-Action Test behavior in participants with perfect S-O knowledge (N=118). In this group, Slips-of-Action Test (SoAT) difference scores correlated positively with tracts between (a) anterior putamen and caudate nucleus seeds and a cluster in dorsolateral prefrontal cortex, and (b) the anterior putamen seed and temporal cortex. Red patches represent positive correlations from all three seed regions, overlaid. Scatter plots represent voxels with peak correlations, with SoAT difference scores on the x-axis, and tract values on the y-axis (both rank transformed). Images are shown in radiological convention. X, Y, and Z coordinates (MNI) indicate the location of the displayed slice in the brain in mm (Ant put = anterior putamen, R dlPFC = right dorsolateral prefrontal cortex, Cau = caudate nucleus, TG = temporal gyrus, R temp = right temporal cortex, SoAT diff score = SoAT difference score).

Table 1. Reasons for exclusion of datasets after data collection.

Table 2. Clusters of voxels in which the tract values correlated significantly with SoAT performance (N = 205). Montreal Neurological Institute (MNI) coordinates (x, y, z) indicate the location of the voxel with the highest correlation within the cluster, in mm.

Table 3. Clusters of voxels in which the tract values correlated significantly with SoAT performance in participants with perfect S-O knowledge (N = 118; x/y/z-coordinates indicate the location in the brain in mm of the voxel with the highest correlation within the cluster).