Decreased putamen activation in balancing goal-directed and habitual behavior in binge eating disorder

Acute stress is associated with a shift from goal-directed to habitual behavior. This stress-induced preference for habitual behavior has been suggested as a potential mechanism by which binge eating disorder (BED) patients succumb to eating large amounts of high-caloric foods in an uncontrolled manner (i.e., binge episodes). While in healthy subjects the balance between goal-directed and habitual behavior is subserved by the anterior cingulate cortex (ACC), insular cortex, orbitofrontal cortex (OFC), anterior caudate nucleus, and posterior putamen, the brain mechanism that underlies this (possibly amplified) stress-induced behavioral shift in BED patients is currently unknown. In the current study, 76 participants (38 BED, 38 healthy controls (HCs)) learned six stimulus-response-outcome associations in a well-established instrumental learning task. Subsequently, three outcomes were selectively devalued, after which participants underwent either a stress induction procedure (Maastricht Acute Stress Test; MAST) or a no-stress control procedure. Next, the balance between goal-directed and habitual behavior was assessed during functional magnetic resonance imaging. Findings show that the balance between goal-directed and habitual behavior was associated with activity in the ACC, insula, and OFC in no-stress HCs. Although stress and BED did not modulate the balance between goal-directed and habitual behavior, a region-of-interest (ROI) analysis showed that BED participants displayed a smaller difference in putamen activation between trials probing goal-directed and habitual behavior than HCs. We conclude that putamen activity differences between BED and HC could reflect changes in the monitoring of response accuracy or reward value, albeit perhaps not sufficiently to induce a measurable shift from goal-directed to habitual behavior. Future research could clarify potential boundary conditions of stress-induced shifts in instrumental behavior in BED patients.


Introduction
Binge eating disorder (BED) is a common eating disorder characterized by eating unusually large amounts of food whilst experiencing a loss of control and marked distress (see American Psychiatric Association, 2013). Importantly, it has been associated with various health risks (Guerdjikova et al., 2019; Citrome, 2019; Treasure et al., 2020). Overreliance on habitual behavior has been suggested as a potential mechanism underlying the loss of control during binge eating episodes (Voon et al., 2015, 2020; Wierenga et al., 2018). In contrast to habitual behavior, goal-directed behavior is directed toward, and sensitive to, a specific outcome (O'Doherty et al., 2017). Goal-directed behavior is utilized when learning associations between a stimulus (S), a response (R; e.g., pressing a button), and the outcome (O), i.e., an S-R-O association. When the behavior is repeated, control slowly shifts from goal-directed toward habitual. The behavior subsequently becomes increasingly insensitive to the outcome (S-R association; O'Doherty et al., 2017), which could be reflected in eating large amounts of food beyond satiety.
In the brain, goal-directed and habitual behavior are associated with different cortico-striatal loops (e.g., Watson et al., 2018; Balleine, 2019). Goal-directed behavior is associated with the caudate nucleus and regions in the prelimbic prefrontal cortex, whilst habitual behavior is associated with the posterior putamen and the motor cortex (Valentin et al., 2007; De Wit et al., 2012; Watson et al., 2018; Voon et al., 2015, 2020; Schwabe and Wolf, 2011; for a review, see Balleine et al., 2019). Several other structures, such as the anterior cingulate cortex (ACC), orbitofrontal cortex (OFC), and insula, have been implicated as well and are thought to be involved in monitoring conflict between the two control systems (Watson et al., 2018). These areas are also affected in BED, in terms of structure, volume, and functional connectivity (Balodis et al., 2015), suggesting that they may contribute to more pronounced habitual behavior in BED. In addition, functional neuroimaging studies have found activation differences in these areas during tasks associated with reward sensitivity, cognitive control, and negative affect, three cognitive constructs that play an important role in BED (Mele et al., 2020; for an overview, see Vainik et al., 2019).
Stress, which often increases negative affect, could also play an important role in overeating in BED. In healthy participants, acute stress has been shown to induce a shift toward habitual behavior (Schwabe and Wolf, 2009, 2011; Smeets et al., 2019; Quaedflieg et al., 2019; Hartogsveld et al., 2020). A similar process in BED patients may explain why stress is a risk factor for BED and plays a role in eliciting binge eating episodes (Wierenga et al., 2018). However, no study has yet investigated whether BED patients are more sensitive to stress, as expressed by a greater shift toward habitual behavior after acute stress exposure compared with HCs.
The aim of the present study was to investigate whether individuals with BED show an amplified stress-induced shift from goal-directed toward habitual behavior, and whether this is reflected in differences in brain activity. To assess this, we employed an adapted version of the slips-of-action (SOA) task originally designed by de Wit and colleagues (De Wit et al., 2007; Watson et al., 2018). A combination of a behavioral (i.e., eating to satiety) and an instructed cognitive devaluation procedure was used to devalue the outcomes more strongly (e.g., Hartogsveld et al., 2020; Smeets et al., 2019). In the SOA task, behavior is considered habitual when one continues to respond for devalued outcomes (through S-R associations). We hypothesized that 1) individuals with BED show more responses toward obtaining devalued outcomes (slips) than healthy controls (HC); 2) this effect is larger in individuals with BED under acute stress; and 3) stressed individuals with BED, and to a lesser degree stressed HCs, show an altered difference in activation between trials with devalued and valuable outcomes in the posterior putamen (associated with habitual behavior) and in the ACC, insula, and OFC (associated with conflict monitoring), as well as an altered difference in activation in the caudate nucleus (associated with goal-directed behavior) (e.g., Valentin et al., 2007; Watson et al., 2018).

Participants
An a priori power analysis using G*Power (power (1 - β) = .80, α = .05, two-tailed), based on effect sizes of stress observed in our previous studies using a within-subjects design (ηp² = .10; Smeets et al., 2019), indicated that a total sample size of 80 participants was required to detect a medium-sized effect. Participants were recruited via online and paper advertisements and received financial compensation or partial course credit for their participation. This study was approved by the standing ethics committee of the Faculty of Psychology and Neuroscience, Maastricht University, and complied with the Declaration of Helsinki (version 2013).
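The conversion from the reported effect size to the quantities a power calculation works with can be sketched as follows. This is a large-sample approximation for a single 1-df effect (it treats the error degrees of freedom as infinite) and not a reimplementation of G*Power; the function names are hypothetical, and because the exact G*Power design options are not reported here, its output should not be expected to reproduce the N = 80 stated above.

```python
import math
from statistics import NormalDist


def cohens_f_from_partial_eta_sq(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f: f = sqrt(eta^2 / (1 - eta^2))."""
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


def approx_power_1df(f: float, n_total: int, alpha: float = 0.05) -> float:
    """Approximate power of a 1-df F test with noncentrality lambda = f^2 * N.

    With infinite error df, F(1, inf) is the square of a unit-variance normal
    variable with mean sqrt(lambda), so power = P(|Z + sqrt(lambda)| > z_crit).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(f * f * n_total)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def min_total_n(f: float, target: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest total N whose approximate power reaches `target`."""
    n = 4  # at least one participant per cell of a 2 x 2 design
    while approx_power_1df(f, n, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    f = cohens_f_from_partial_eta_sq(0.10)
    print(f"Cohen's f = {f:.3f}")
    print(f"approximate power at N = 80: {approx_power_1df(f, 80):.3f}")
    print(f"approximate N for power .80: {min_total_n(f)}")
```

An exact noncentral F calculation with finite error df, as G*Power performs, yields slightly lower power for a given N than this approximation.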
In total, 83 of the 172 contacted potential participants were tested; these participants reported having no psychological disorders other than BED (see Supplement for exclusion criteria and further details). BED was determined with the Eating Disorder Examination 16.0D interview (Fairburn, 2008). A further 7 participants had to be excluded after data processing (see Behavioral data analysis), resulting in a final sample size of 76. The thirty-eight participants (9 men, 29 women) in the binge-eating group had a mean age of 26.11 years (range = 20-49, SD = 6.49), a mean BMI of 25.50 kg/m² (range = 19.[…], SD = 4.88), and a high self-reported education level (university level or higher: n = 27; higher professional education: n = 11). Participants in the control group were matched as closely as possible to the BED group on age and education. The 38 participants (6 men, 32 women) in the control group had a mean age of 23.84 years (range = 18-33, SD = 4.25), a mean BMI of […] (SD = 2.55), and the same high education level (university or higher: n = 27; higher professional education: n = 11). Participants were randomly allocated to either a stress group (BED stress: n = 19, 4 men, 15 women; control stress: n = 20, 4 men, 16 women) or a no-stress control group (BED no-stress: n = 19, 5 men, 14 women; control no-stress: n = 18, 2 men, 16 women). None reported having experienced a stress manipulation similar to the currently used Maastricht Acute Stress Test (MAST) in the past. The four groups did not differ in age, BMI, or distribution of men and women (all p-values > 0.101).

Instrumental learning task
Participants performed an instrumental learning task (for more details, see Supplement and Hartogsveld et al., 2020) consisting of an instrumental learning stage, a reminder stage, a selective devaluation stage, and an SOA stage, divided over 2 days (see Fig. 1 panel I). In the instrumental learning stage, participants learned 6 different S-R-O associations. If participants pressed the correct key (either the '1' or the '6' key on a standard keyboard), a virtual box opened. On 75% of these correct-response trials, the associated food (outcome) was shown inside the box; on the remaining 25% of correct-response trials, and after incorrect key presses, no food was presented. The learning stage consisted of eight blocks of 24 trials, and explicit task knowledge was assessed after every second block. This assessment consisted of twelve questions (i.e., for each of the six patterns, 'What was the correct response for pattern …?' and 'What food was inside the box for pattern …?'). The answers were collected using a second computer. Participants were told that they would receive food rewards if they performed well enough on the explicit knowledge test (not specified further). In reality, all participants received a small food reward irrespective of their actual performance. Participants rated these rewards on two visual analogue scales (VAS; 'How much do you like (i.e., enjoy the taste of) this piece of chocolate?' and 'How much do you like (i.e., enjoy the taste of) these crisps?'; anchors: 0 = 'not at all', 100 = 'very much').
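A hypothetical sketch of how the 24-trial learning blocks could be generated. The text does not specify how stimuli or the 75% contingency were scheduled; this sketch assumes that each of the six stimuli appears four times per block with exactly three rewarded presentations, and the food labels, key assignments, and the names `make_block` and `shown_food` are placeholders.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Trial:
    stimulus: int      # index of the stimulus pattern (0-5)
    correct_key: str   # '1' or '6'
    outcome: str       # food shown when the box opens on a rewarded trial
    rewarded: bool     # False for the unrewarded 25% of correct responses


def make_block(associations: List[Tuple[str, str]], rng: random.Random,
               reps: int = 4, rewarded_reps: int = 3) -> List[Trial]:
    """One shuffled block: each stimulus `reps` times, `rewarded_reps` rewarded.

    With 6 stimuli, reps=4 gives 24 trials and rewarded_reps=3 gives 75%.
    """
    trials = [
        Trial(stim, key, food, rep < rewarded_reps)
        for stim, (key, food) in enumerate(associations)
        for rep in range(reps)
    ]
    rng.shuffle(trials)
    return trials


def shown_food(trial: Trial, pressed_key: str) -> Optional[str]:
    """Food revealed after a key press, or None when no food is presented."""
    if pressed_key == trial.correct_key and trial.rewarded:
        return trial.outcome
    return None  # incorrect key, or an unrewarded correct response


# Placeholder associations: (correct key, food) for six stimulus patterns.
ASSOCIATIONS = [("1", "chocolate A"), ("6", "chocolate B"), ("1", "chocolate C"),
                ("6", "crisps A"), ("1", "crisps B"), ("6", "crisps C")]

rng = random.Random(2020)
learning_stage = [make_block(ASSOCIATIONS, rng) for _ in range(8)]  # 8 x 24 trials
```

Fixing the number of rewarded presentations per block keeps the contingency at exactly 75% within each block; a purely probabilistic draw would be an equally plausible alternative.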
The reminder stage on the next day was identical to the instrumental learning stage but consisted of only two blocks of 24 trials. In the behavioral devaluation stage, participants ate one randomly chosen food type (chocolate or crisps; randomized across participants: HC no-stress 9/9, HC stress 10/10, BED no-stress 9/10, BED stress 10/9) to satiety. Although they were offered only one food type, participants ate all three chosen food items within that type (e.g., KitKat, Twix, Bounty), which were offered in equal quantities. Participants were required to eat a minimum of 50 g in total of the three chosen foods and thereafter to eat until satiety from one or more larger 200 g portions; water was provided ad libitum. Three VASs were presented after this phase to compare devaluation across groups ('How much do you like (i.e., enjoy the taste of) this food?', 'How hungry are you at this moment?', and 'Do you feel like eating something tasty?'; anchors: 0 = 'not at all', 100 = 'very much').
In the SOA stage, outcomes were cognitively devalued by presenting, at the start of each block, an image displaying all potential outcomes with crosses over the devalued outcomes. This stage consisted of 6 blocks of 24 trials, with half of all trials involving devalued outcomes. Participants were told that this stage differed from the learning and reminder stages in two ways: 1) the boxes would no longer open, so they could not determine whether their answer was correct, and 2) they now had to press the opposite button in response to stimuli leading to the food type that they had just consumed (devalued). Learned responses (i.e., responses toward obtaining food outcomes) were therefore counted as incorrect for the devalued food (slips) but correct for the still valuable food. For devalued outcomes, this results in a competition between the learned response (more dependent on S-R associations) and the new response (reflecting S-R-O associations).

Fig. 1. […] One of the food types (chocolate or crisps) was devalued by letting the participant eat to satiety. All three food items chosen on day 1 were offered in equal quantities: first the minimum required amount of 50 g (including all three chosen food items in equal amounts), and then subsequent 200 g portions offered in the same manner until satiety was reached. Panel F) In the SOA stage, participants had to press the opposite button for stimuli associated with devalued outcomes. Panel II. Overview of the procedure of day 2. First, participants filled out compliance questionnaires (checks), after which they performed a reminder stage (reminder). Half of the outcomes were then devalued by eating to satiety (devaluation), participants underwent the Maastricht Acute Stress Test procedure (MAST) or placebo, and finally performance and brain activation were assessed in the MRI scanner (MRI). Vital signs (blood pressure, heart rate) were taken five times and saliva samples four times.
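The response rule of the SOA stage can be expressed as a small scoring function. In this sketch (function names hypothetical), `learned_key` is the key trained for a stimulus; a trial with a devalued outcome requires the opposite key, so pressing the learned key there counts as a slip. Treating a missing response as neither correct nor learned is an assumption, not a rule stated in the text.

```python
from typing import Optional, Tuple


def opposite(key: str) -> str:
    """The other response key ('1' <-> '6')."""
    return "6" if key == "1" else "1"


def score_soa_trial(learned_key: str, devalued: bool,
                    pressed_key: Optional[str]) -> Tuple[bool, bool, bool]:
    """Return (correct, learned_response, slip) for one SOA trial.

    Devalued outcome: the opposite of the learned key is required, so
    pressing the learned key is a slip. Valued outcome: the learned key
    is correct. A missing response (None) is neither correct nor learned.
    """
    required = opposite(learned_key) if devalued else learned_key
    learned = pressed_key == learned_key
    correct = pressed_key == required
    slip = devalued and learned
    return correct, learned, slip


print(score_soa_trial("1", devalued=False, pressed_key="1"))  # (True, True, False)
print(score_soa_trial("1", devalued=True, pressed_key="1"))   # (False, True, True)
print(score_soa_trial("1", devalued=True, pressed_key="6"))   # (True, False, False)
```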

Stress versus no-stress control procedure
The stress group completed the Maastricht Acute Stress Test (MAST; Smeets et al., 2012), which combines psychological and physical components of stress. Participants received instructions during the first five minutes and then repeatedly immersed their left hand in ice-cold water (2 °C), alternating with serially subtracting 17 from 2043 out loud as quickly and accurately as possible. An experimenter gave negative feedback (e.g., 'incorrect' or 'faster'). Participants saw themselves on a laptop screen and were told that they were being video recorded and that the recordings would later be used for facial analysis. In reality, no recordings were made. The MAST lasted twelve minutes in total. The control group completed the placebo MAST, a validated control condition (Smeets et al., 2012, Exp. 3) with similar operations and a duration identical to the stress MAST. In the placebo MAST, lukewarm water (36 °C) was used, participants repeatedly counted from 1 to 25 at a relaxed pace, and there was no negative feedback or mock video recording.
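The arithmetic component of the MAST can be checked against the expected answers with a simple helper, for example to decide when 'incorrect' feedback applies. This is a hypothetical sketch; in particular, the rule that a participant restarts at 2043 after an error is an assumption not stated in the text above.

```python
from typing import List


def expected_sequence(start: int = 2043, step: int = 17, n: int = 10) -> List[int]:
    """First `n` correct answers when repeatedly subtracting `step` from `start`."""
    return [start - step * i for i in range(1, n + 1)]


def feedback_sequence(answers: List[int], start: int = 2043, step: int = 17) -> List[str]:
    """Label each spoken answer 'ok' or 'incorrect'.

    Assumption: after an error the participant restarts from `start`.
    """
    current = start
    labels = []
    for answer in answers:
        if answer == current - step:
            current = answer
            labels.append("ok")
        else:
            current = start  # restart the count after an error
            labels.append("incorrect")
    return labels


print(expected_sequence(n=4))                       # [2026, 2009, 1992, 1975]
print(feedback_sequence([2026, 2009, 1991, 2026]))  # ['ok', 'ok', 'incorrect', 'ok']
```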

Procedure
On all three testing days, participants were tested between 11:00 and 19:00 to minimize potential influences of the cortisol awakening response (Kudielka et al., 2009). For practical reasons, exact testing times were allowed to vary within participants across test days. The first two test days were 24 h apart (to promote consolidation of the learned associations and to prevent stress-induced effects on consolidation and retrieval processes that would have occurred in a one-day paradigm; see Quaedflieg and Schwabe, 2018; Smeets et al., 2008) and took one and a half hours and two and a half hours, respectively. To minimize cortisol and performance variability, participants were asked to abide by the following rules: refrain from alcohol consumption and strenuous exercise 24 h before testing, sleep at least 6 h before testing, not consume any food three hours before testing or any liquids other than non-sparkling water two hours before testing, not brush their teeth two hours before testing, and not take additional medication 48 h before testing without informing the researchers.
On the first testing day, participants first provided written informed consent and completed an MR safety screening form. Participants confirmed their compliance with the rules described above and completed two VASs assessing their hunger status ('How hungry are you at the moment?' and 'Do you feel like eating something tasty?'; anchors: 'not at all (hungry)' and 'very hungry'/'very much so'). They then rated the tastiness of eight types of chocolate and eight types of crisps on a scale from 1 to 10 and picked their three favorite chocolates and crisps (see Supplement). Participants then completed the instrumental learning stage. Finally, they again completed the VASs assessing their current hunger status.
On the second day (see Fig. 1 panel II), compliance was confirmed again. After the baseline saliva sample, vital signs measurements (diastolic and systolic blood pressure, heart rate) were taken and the VAS assessing hunger was completed. Participants then performed the reminder stage. After this stage, either crisps or chocolates were devalued. Simultaneously, instructions were given for the SOA stage and MRI safety procedures. After completing a VAS assessing hunger and the tastiness of the devalued food they had just eaten, participants rinsed their mouth thoroughly with water and waited two minutes before the second saliva sample and vital signs measurement were taken. Then either the stress or the control MAST procedure was completed. Halfway through the MAST and at the end of the procedure, the third and fourth vital signs measurements were taken, respectively. At the end, participants also completed three VASs assessing experienced stress ('How stressful/painful/unpleasant did you find the recently completed (water and counting) task to be?'), and the third saliva sample was taken.
Participants were then escorted to the 3 T MRI scanner, where they completed the rest of the session. The time between the end of the MAST and the beginning of the scan was limited to approximately five minutes, so that the SOA stage was likely to coincide with the expected peak in salivary cortisol elicited by the stress condition of the MAST (Smeets et al., 2012; Quaedflieg et al., 2017; Shilton et al., 2017). After a localizer scan, MRI sequences were acquired to assess functional brain activation during the SOA stage (lasting approximately 20 min), resting-state brain activation (to be reported elsewhere), T1-weighted structural data, and structural connectivity using a diffusion-weighted imaging sequence (also to be reported elsewhere). The entire MR session lasted approximately 55 min. Finally, the fourth saliva sample, the fifth vital signs measurement, and the last hunger VAS were taken.

Behavioral data analysis
As mentioned earlier, several additional data-based exclusion criteria were applied: 1) being an outlier on the baseline cortisol measurement (median absolute deviation (MAD), using a double MAD to correct for skewness, with a cut-off value of 3.5; see Leys et al., 2013), 2) insufficient learning of the associations (fewer than 2/3 of the explicit knowledge questions answered correctly after the 8th block and after the reminder stage), 3) ≤ 50% correct answers on valued trials in the SOA stage, and 4) no mistakes on devalued trials in the SOA stage. Seven participants had to be excluded based on these criteria (three HC no-stress, one HC stress, one BED no-stress, two BED stress). The data of the final sample (N = 76) were checked for normality with Q-Q plots, skewness values, and kurtosis values. Since the cortisol data were not normally distributed, they were log-transformed for inferential statistical analyses. In case of violations of the sphericity assumption (significant Mauchly's test), Greenhouse-Geisser corrected p-values are reported. The Holm-Bonferroni method was used to correct α = .05 for multiple comparisons in follow-up tests (Holm, 1979).
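Exclusion criterion 1 can be sketched as a double MAD rule: the median absolute deviation is computed separately for values below and above the median, so that skewed data such as cortisol receive an asymmetric threshold. The consistency constant b = 1.4826, the handling of a zero MAD, and the function names are assumptions of this sketch rather than details reported here.

```python
from statistics import median
from typing import List, Sequence, Tuple

B = 1.4826  # assumed consistency constant for normally distributed data


def double_mad(values: Sequence[float]) -> Tuple[float, float, float]:
    """Median plus scaled MADs computed below and above the median."""
    m = median(values)
    lower = [abs(x - m) for x in values if x <= m]
    upper = [abs(x - m) for x in values if x >= m]
    return m, B * median(lower), B * median(upper)


def mad_outliers(values: Sequence[float], cutoff: float = 3.5) -> List[bool]:
    """Flag values more than `cutoff` MADs from the median, using the MAD
    of the side of the median on which each value lies."""
    m, mad_low, mad_high = double_mad(values)
    flags = []
    for x in values:
        mad = mad_low if x <= m else mad_high
        if mad == 0:
            flags.append(x != m)  # assumption: any deviation counts when MAD is 0
        else:
            flags.append(abs(x - m) / mad > cutoff)
    return flags


print(mad_outliers([10, 11, 12, 13, 100]))  # [False, False, False, False, True]
```

Whether the rule was applied to raw or log-transformed baseline cortisol is not stated above; the function works on whichever scale it is given.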
Performance data from the learning stage were analyzed to confirm adequate learning and the absence of a priori differences between groups. Specifically, we used a 2 (Stress: stress, control) × 2 (Group: BED, HC) × 8 (BlockLearning: B1-B8) mixed ANOVA with BlockLearning as within-subjects factor. Performance during the reminder stage was analyzed similarly with a 2 (Stress: stress, control) × 2 (Group: BED, HC) × 2 (BlockReminder: B1-B2) mixed ANOVA. To confirm sufficient explicit knowledge of the S-R-O associations, explicit knowledge test data obtained during the learning and reminder stages were analyzed with two 2 (Stress: stress, control) × 2 (Group: BED, HC) ANOVAs, assessing group differences in the number of questions on which responses and outcomes, respectively, were correctly identified.
Effectiveness of the devaluation procedure was determined by assessing the amount eaten during devaluation and the hunger and willingness-to-eat ratings. Data were analyzed using 2 (Stress: stress, control) × 2 (Group: BED, HC) univariate ANOVAs for amount eaten, and 2 (Stress: stress, control) × 2 (Group: BED, HC) × 2 (Time: pre-devaluation, post-devaluation) mixed ANOVAs for both hunger and willingness-to-eat ratings.
Performance in the SOA stage was expressed as the percentage of learned responses and assessed using a 2 (Stress: stress, control) × 2 (Group: BED, HC) × 2 (Value: devalued, valuable) mixed ANOVA.
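A minimal sketch of how the per-participant SOA percentages, and exclusion criteria 3 and 4 from the behavioral analysis section, could be computed from trial-level data. The input format (one `(devalued, learned_response)` pair per trial) and the function names are hypothetical. On valued trials a learned response is the correct response, and on devalued trials it is a slip, so both criteria can be read off the same two percentages.

```python
from typing import Dict, Iterable, Optional, Tuple


def soa_percentages(trials: Iterable[Tuple[bool, bool]]) -> Dict[str, Optional[float]]:
    """Percentage learned responses per Value condition.

    trials: (devalued, learned_response) pairs, one per SOA trial.
    A condition without trials yields None.
    """
    counts = {False: [0, 0], True: [0, 0]}  # devalued -> [learned, total]
    for devalued, learned in trials:
        counts[devalued][0] += int(learned)
        counts[devalued][1] += 1
    pct = {dev: (100.0 * k / n if n else None) for dev, (k, n) in counts.items()}
    return {"valued": pct[False], "devalued": pct[True]}


def fails_soa_criteria(pct: Dict[str, Optional[float]]) -> bool:
    """Criterion 3: <= 50% correct (= learned) responses on valued trials.
    Criterion 4: no mistakes (= slips) on devalued trials.
    Assumes both conditions contain trials."""
    return pct["valued"] <= 50.0 or pct["devalued"] == 0.0


# Hypothetical participant: 72 valued and 72 devalued trials (6 blocks x 24).
example = ([(False, True)] * 60 + [(False, False)] * 12
           + [(True, True)] * 18 + [(True, False)] * 54)
pct = soa_percentages(example)
print(pct, fails_soa_criteria(pct))
```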

Imaging data acquisition and processing
A Siemens MAGNETOM 3 T MRI scanner was used to acquire functional images during the SOA stage. These data were acquired using a multi-echo sequence (TE1 = 15 ms; TE2 = 29.93 ms; TE3 = 44.86 ms) so that the signal could be optimized for each voxel offline. Each volume consisted of 42 slices with 3 mm isotropic voxels in a 224 mm field of view, a TR of 1000 ms, a flip angle of 60 degrees, and a multi-band acceleration factor of 3. High-resolution T1-weighted structural images were acquired using an MPRAGE sequence with a TE of 2.34 ms, 256 slices, 0.7 mm isotropic voxels in a 224 mm field of view, a TR of 2400 ms, and a flip angle of 8 degrees. Using FSL and custom shell scripts, the three echo images were combined into one optimized 4D image in which the contribution of each TE was optimized for each voxel based on its signal-to-noise ratio. Realignment was performed with MCFLIRT for the first echo image and subsequently applied to the second and third echo images by registering them using FLIRT. Parallel-Acquired Inhomogeneity Desensitization (PAID) weighting was then performed by splitting the first 30 volumes from the time series and smoothing them with a Gaussian kernel of FWHM 2 mm. A mean image and a standard deviation image were calculated, and the mean was multiplied by the echo time and then divided by the standard deviation. These weight images were then normalized so that the weights for each voxel sum to 1, i.e., each echo's weight image was divided by the sum of the weight images across echoes. The weights were applied to the individual echoes, and the weighted echoes were summed to generate a weighted time series. The same pipeline was used to optimize the images acquired with reversed phase encoding, which were used with FSL's topup to correct for susceptibility-induced off-resonance distortions. Brains were extracted from the high-resolution structural images using the Brain Extraction Tool (BET2).
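The PAID weighting step can be sketched for a single voxel, assuming the per-echo time series have already been realigned. Each echo's weight is TE multiplied by mean/SD (its temporal SNR) over the first 30 volumes, normalized across echoes to sum to 1. The 2 mm smoothing step is omitted and the sample standard deviation is used; both are simplifications of this hypothetical sketch, not a description of the authors' shell scripts.

```python
from statistics import mean, stdev
from typing import List, Sequence


def paid_weights(echo_series: Sequence[Sequence[float]], echo_times: Sequence[float],
                 n_ref: int = 30) -> List[float]:
    """PAID weights for one voxel: TE * mean / SD over the first `n_ref`
    volumes for each echo, normalized to sum to 1 across echoes."""
    raw = [te * mean(s[:n_ref]) / stdev(s[:n_ref])
           for s, te in zip(echo_series, echo_times)]
    total = sum(raw)
    return [w / total for w in raw]


def combine_echoes(echo_series: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted sum of the echoes at each time point."""
    n_vols = len(echo_series[0])
    return [sum(w * s[t] for w, s in zip(weights, echo_series)) for t in range(n_vols)]


# Toy voxel: signal decays with TE; all echoes share the same fluctuation.
TES = [15.0, 29.93, 44.86]  # ms, from the acquisition parameters above
echoes = [[base + 2.0 * (t % 2) for t in range(40)] for base in (100.0, 60.0, 30.0)]
w = paid_weights(echoes, TES)
combined = combine_echoes(echoes, w)
print([round(x, 3) for x in w])
```

In this toy voxel the middle echo receives the largest weight, because the product of TE and temporal SNR peaks there; with real data, that balance varies from voxel to voxel.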

Neuroimaging data analysis
For the functional MRI analysis, FEAT version 6.00, part of FSL (www.fmrib.ox.ac.uk/fsl), was used. Registration to the high-resolution structural images was carried out with linear registration (FLIRT), and normalization of the high-resolution structural images to MNI152 standard space with nonlinear registration (FNIRT). Pre-processing consisted of a high-pass filter with a 100 s cut-off, motion correction (MCFLIRT), spatial smoothing using a Gaussian kernel of FWHM 5 mm, and grand-mean intensity normalization of the entire 4D dataset by a single multiplicative factor; the data were pre-whitened using FILM, and the first three volumes were removed from the analysis. No participant showed frame-to-frame head motion exceeding the exclusion threshold of 3 mm (i.e., a single voxel). The first-level analysis used an event-related GLM approach in which the regressors modeled time epochs associated with 1) correct responses to valuable outcomes and 2) incorrect responses (slips) to devalued outcomes, with 3) six head motion parameters (3 rotational, 3 translational, estimated by MCFLIRT) included to further correct for motion-induced signal changes. Events were time-locked to stimulus onset and had a duration of two seconds.
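To illustrate how a 2-s event regressor enters the first-level GLM, the sketch below builds a boxcar at each stimulus onset, convolves it with a double-gamma HRF on a fine time grid, and samples the result at TR = 1 s. The SPM-style canonical parameters (peak gamma with shape 6, undershoot gamma with shape 16, ratio 1/6) are an assumption; FEAT's own double-gamma implementation may differ in detail, and the onsets and function names are made up for illustration.

```python
import math
from typing import List, Sequence


def gamma_pdf(t: float, shape: float, scale: float = 1.0) -> float:
    """Gamma probability density; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(t: float) -> float:
    """Assumed SPM-style canonical HRF: gamma(6) peak minus gamma(16)/6 undershoot."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def event_regressor(onsets: Sequence[float], n_vols: int, tr: float = 1.0,
                    duration: float = 2.0, dt: float = 0.1,
                    hrf_len: float = 32.0) -> List[float]:
    """Boxcar of `duration` s at each onset, convolved with the HRF on a grid
    of step `dt`, then sampled at the start of each volume."""
    n_fine = int(round(n_vols * tr / dt))
    box = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = min(n_fine, int(round((onset + duration) / dt)))
        for i in range(start, stop):
            box[i] = 1.0
    kernel = [double_gamma_hrf(k * dt) * dt for k in range(int(round(hrf_len / dt)))]
    conv = [
        sum(box[i - k] * kernel[k] for k in range(min(i + 1, len(kernel))))
        for i in range(n_fine)
    ]
    step = int(round(tr / dt))
    return [conv[v * step] for v in range(n_vols)]


reg = event_regressor(onsets=[10.0], n_vols=40)
print(reg.index(max(reg)))  # volume at which the modeled response peaks
```

In the actual model, one such regressor per event type (valued learned responses, slips) would sit alongside the six motion parameters as columns of the design matrix.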
The second-level analysis employed two contrasts: 1) incorrect responses to stimuli leading to devalued outcomes (slips; devalued learned response) > correct responses to stimuli leading to still valuable outcomes (valued learned response), used for the ACC, insula, OFC, and putamen; and 2) the opposite contrast, valued learned response > devalued learned response, used for the caudate nucleus (both in line with Watson et al., 2018). First, a whole-brain analysis of task activation in no-stress HC was used to validate the selection of the ROIs (Watson et al., 2018), using permutation testing (FSL's Randomise; 10,000 permutations; p < .05, family-wise error (FWE) corrected with threshold-free cluster enhancement (TFCE)). Then, the groups were compared within specific regions of interest (ROIs). ROIs for the anterior part of the ACC, anterior insula, lateral OFC, and body of the caudate nucleus were defined independently of the functional data, based on peak voxel activation reported by Watson et al. (2018), and a posterior putamen ROI was based on De Wit et al. (2012). The spherical masks were restricted to anatomically relevant voxels using the Harvard-Oxford structural atlas (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/atlases; thresholded at 50% for the OFC and posterior putamen, and at 30% for the caudate body, ACC, and insula; see Supplement for more details). Mean bilateral beta values were extracted from the ROIs and analyzed using 2 (Stress: stress vs. control) × 2 (Group: BED vs. HC) univariate ANOVAs (IBM SPSS Statistics for Windows, Version 25.0).
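ROI construction and beta extraction can be sketched on a voxel grid: a sphere around a peak coordinate is intersected with the voxels whose atlas probability reaches the chosen threshold (50% or 30% above), and contrast estimates are averaged over the bilateral mask. The sphere radius, coordinates, dictionary-based data format, and function names are hypothetical placeholders; the actual masks were built in FSL.

```python
from itertools import product
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def sphere_voxels(center: Voxel, radius: float) -> Set[Voxel]:
    """Voxel coordinates within `radius` (in voxels) of `center`."""
    cx, cy, cz = center
    r = int(radius)
    return {
        (cx + dx, cy + dy, cz + dz)
        for dx, dy, dz in product(range(-r, r + 1), repeat=3)
        if dx * dx + dy * dy + dz * dz <= radius * radius
    }


def roi_mask(center: Voxel, radius: float, atlas_prob: Dict[Voxel, float],
             threshold: float) -> Set[Voxel]:
    """Sphere voxels whose atlas probability (in %) is at least `threshold`."""
    return {v for v in sphere_voxels(center, radius)
            if atlas_prob.get(v, 0.0) >= threshold}


def mean_beta(betas: Dict[Voxel, float], mask: Set[Voxel]) -> float:
    """Mean contrast estimate over the mask; NaN if no voxel has a value."""
    values = [betas[v] for v in mask if v in betas]
    return sum(values) / len(values) if values else float("nan")


# Bilateral ROI as the union of a left and a right mask (toy coordinates).
atlas = {(0, 0, 0): 60.0, (1, 0, 0): 20.0, (0, 1, 0): 55.0, (10, 0, 0): 70.0}
bilateral = roi_mask((0, 0, 0), 1, atlas, 50.0) | roi_mask((10, 0, 0), 1, atlas, 50.0)
betas = {(0, 0, 0): 2.0, (0, 1, 0): 4.0, (1, 0, 0): 100.0, (10, 0, 0): 6.0}
print(mean_beta(betas, bilateral))
```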

Preregistration and data availability
This study was preregistered on AsPredicted and can be found here: https://aspredicted.org/blind.php?x=ct4ht2. All data and code are available on request via the corresponding author and can be found here: https://osf.io/nvtx2/?view_only=97d766cf63dd43b385d67a3d30627b4f.

Neuroendocrine stress responses
Stress and no-stress control groups differed in salivary cortisol responses when collapsed over Group (Stress*Time interaction

Instrumental learning stage
Performance increased significantly across the 8 learning blocks

Reminder stage
On day 2, learned response accuracy and explicit knowledge remained high in all groups (see Fig. 3 panel I), demonstrating successful retention of the associations. There was no difference between the groups in percentage correct (Group*Stress*BlockReminder interaction: F(1, 67) < 0.002, p = .966; main effects of Group or Stress: corrected p-values > 0.142). There were also no significant differences between the groups in explicit knowledge assessed after the reminder stage for the correct response (Group*Stress: F(1, 75) = 0.096, corrected p = 0.758, ηp² = .001) or the outcome food (Group*Stress: F(1, 75) = 0.172, corrected p > 0.999; main effects: all p-values > 0.560).
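Corrected p-values such as those reported here can be obtained with the Holm-Bonferroni step-down procedure cited in the behavioral analysis section (Holm, 1979). In this sketch (function names hypothetical), adjusted values are capped at 1, which is why a corrected p can be reported as > 0.999; whether the authors reported adjusted p-values in exactly this way, rather than comparing raw p-values with step-wise adjusted α levels, is an assumption.

```python
from typing import List, Sequence


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm-Bonferroni adjusted p-values, returned in input order.

    The k-th smallest p (k = 0, 1, ...) is multiplied by (m - k); a running
    maximum keeps the adjusted values monotone, and values are capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


def holm_reject(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0 wherever the Holm-adjusted p-value is at or below alpha."""
    return [p <= alpha for p in holm_adjust(pvalues)]


print(holm_adjust([0.6, 0.7]))  # [1.0, 1.0]
```

Comparing adjusted p-values with α gives the same decisions as the original step-down rule, which compares the k-th smallest raw p with α / (m - k).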

Validation of the task with whole-brain analysis in no-stress HC
In line with previous studies using the SOA paradigm (Watson et al., 2018), the contrast devalued learned response > valued learned response revealed three clusters of activation differences in no-stress HC (Fig. 4 and Table 1): one in the ACC/paracingulate gyrus extending into the superior frontal gyrus, one in the left insula extending into the left OFC, and one in the right insula extending into the right OFC.

Comparison of groups with ROI analysis
Comparing mean beta values from the a priori defined ROIs (ACC, insula, OFC, putamen) for devalued learned response > valued learned response revealed a significant difference between BED and HC in the putamen (F(1, 72) = 5.888, p = .018, ηp² = .076), with a smaller difference between devalued learned response and valued learned response trials in BED than in HC (see Fig. 5). There was no Group × Stress interaction and no main effect of Stress for the putamen (all p's > 0.131). There was no significant difference between groups for the other ROIs, i.e., the ACC (Group*Stress: […]). For the contrast valued learned response > devalued learned response, there was no difference between groups for the body of the caudate nucleus (Group*Stress: F(1, 72) = 0.097, p = .757; main effects p's > 0.069).
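Partial eta squared can be recovered from an F statistic and its degrees of freedom, ηp² = F·df_effect / (F·df_effect + df_error), which offers a quick consistency check on the effect sizes reported above. The function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


print(round(partial_eta_squared(5.888, 1, 72), 3))  # 0.076 (putamen, effect of Group)
print(round(partial_eta_squared(0.096, 1, 75), 3))  # 0.001 (reminder-stage explicit knowledge)
```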

Discussion
In this study, we investigated differences between BED and HC in stress-induced changes in brain activation and in the associated balance between habitual and goal-directed behavior. All groups successfully learned the S-R-O associations, and both the behavioral devaluation (as indexed by the quantity of food eaten and subjective hunger ratings) and the stress induction (as indexed by salivary cortisol levels and diastolic and systolic blood pressure) were equally successful in the BED and HC groups. Contrary to our hypotheses, neither stress nor BED affected performance in the SOA stage: all groups showed similar levels of learned responses toward devalued and valued outcomes. When assessing the neural correlates of the balance between goal-directed and habitual behavior (i.e., the contrast devalued learned response > valued learned response) in no-stress HC, we found activity in the ACC/paracingulate gyrus and bilateral OFC/insula. This finding is in line with previous studies (e.g., Watson et al., 2018; De Wit et al., 2011) that also found activity in the ACC, paracingulate gyrus, OFC, and insula when contrasting trials with devalued and valued outcomes. Importantly, although BED status did not modulate task performance, the ROI analyses revealed a difference in putamen activation between the BED and HC groups: in BED, the difference in putamen activity between valued and devalued outcome trials was smaller than in HC. The posterior putamen has previously been associated with habitual behavior (Tricomi et al., 2009; De Wit et al., 2012; Delorme et al., 2016), although not consistently (Valentin et al., 2007; Watson et al., 2018). Notably, some studies demonstrating the involvement of the putamen in habitual behavior did not measure brain activation with fMRI but rather structural connectivity with diffusion imaging (e.g., De Wit et al., 2012; Delorme et al., 2016). It should also be noted that the brain mechanisms underlying habitual and goal-directed behavior are thought to compete for behavioral control (e.g., Gillan et al., 2011; Watson et al., 2018; De Wit et al., 2018). As such, a shift in the balance between these types of behavior could result from either a decrease in one type or an increase in the other, making an unambiguous interpretation of changes in brain activation derived from a contrast between these task conditions impossible. To our knowledge, there is no clear model yet that explains what changes in posterior putamen activity would mean for behavioral performance during devalued trials.

Fig. 4. Brain activation related to goal-directed behavior in HC. Contrasting devalued learned response > valued learned response, three significant clusters were found, covering the ACC/paracingulate, left OFC/insula, and right OFC/insula in no-stress HC (whole brain, p < .05, FWE, TFCE correction for multiple comparisons). More details about the clusters can be found in Table 1.

Table 1
Overview of clusters found for devalued learned response > valued learned response in HC in the no-stress condition. Nr. represents the numbers given to the clusters in Fig. 4. Clusters were significant after FWE correction for multiple comparisons with TFCE applied.
Nonetheless, the interpretation that aligns best with the current results is that the putamen, together with the rest of the dorsal striatum, appears to be involved in processing the reward value of the outcome during habitual behavior (e.g., Graybiel and Grafton, 2015; Peterson and Seger, 2013). However, reward value might be encoded more in the anterior and middle parts of the putamen than in its posterior part (Brovelli et al., 2011). Nevertheless, the putamen activation differences are in line with previous studies showing differences in both regional volume and connectivity of the putamen in BED patients (see Balodis et al., 2015 for an overview).
The absence of a shift to habitual behavior in BED compared with HC in the current study contrasts with findings from Voon and colleagues (2015), who demonstrated that BED patients showed dominant habitual over goal-directed behavior compared with obese subjects, albeit using a different paradigm. Specifically, they used a sequential decision-making task in which the outcome value changes multiple times during the task. In contrast, the SOA paradigm used in the current study relies on a single outcome devaluation before the testing phase. These methodological differences could also explain the lack of a group difference in instrumental responding. On the other hand, the absence of an effect of BED status on the balance between goal-directed and habitual behavior dovetails nicely with the absence of differences between HC and other eating disorders such as anorexia nervosa (Godier et al., 2016), as well as obese and overweight participants (e.g., Watson et al., 2017; see Ciria et al., 2021 for an overview of the literature). Previously, an altered balance between goal-directed and habitual behavior has mostly been shown in disorders characterized by strong repetitive habit-like symptoms, such as addiction, Parkinson's disease, obsessive-compulsive disorder, and Tourette's syndrome (Sjoerds et al., 2013; De Wit et al., 2011; Gillan et al., 2011; Delorme et al., 2016).
[Figure: Panel A. Mean beta values for the ROIs (anterior ACC, anterior insula, lateral OFC, posterior putamen) for the devalued learned response > valued learned response contrast. A significant difference in mean beta value was found between HC and BED for the posterior putamen (p = .018). The ROI masks are displayed next to the corresponding mean beta values. Panel B. Mean beta values for the caudate nucleus body ROI for the valued learned response > devalued learned response contrast. Error bars represent ± S.E.]
Although several cortico-striatal loops have been shown to be affected in BED (e.g., Balodis et al., 2015), it is possible that eating disorders and obesity do not affect these brain structures sufficiently to elicit a general change in behavior, and that such behavioral changes become manifest only during binge eating episodes. Indeed, it has been suggested that in bulimia nervosa, a disorder similarly characterized by binge eating, goal-directed behavior might be unaffected outside binge eating episodes (Neveu et al., 2018). Future research could specifically investigate factors that affect habitual and goal-directed behavior during binge eating episodes, for example by including foods consumed during these episodes.
In line with previous studies (Watson et al., 2018), we found activation in the ACC/paracingulate gyrus and OFC/insula associated with goal-directed behavior in no-stress HC. We did not find any differences in activity in our ROIs between the stress and no-stress groups. In a recent study (van Ruitenbeek et al., 2021), we showed that acute stress decreased activity in the posterior putamen and insula. Since we did not find differences between BED and HC in areas such as the ACC, insula, and OFC, it is likely that demands on conflict monitoring areas were comparable across groups, leading to a similar number of slips in BED and HC. The insula has been associated with a wide variety of functions (Uddin et al., 2017), but the anterior division in particular has been shown to be crucial for cognitive control capacity, with lesions reducing the capacity to handle cognitive load (Wu et al., 2019). In the current study, OFC activity in non-stressed HC was higher for devalued outcome trials, in which goal-directed and habitual behavior are in conflict. This corroborates previous studies suggesting a role of the OFC in conflict monitoring or inhibitory control (see also Jonker et al., 2015; Brockett and Roesch, 2020) rather than in goal-directed behavior itself (e.g., Valentin et al., 2007). The OFC may also play other roles during an SOA paradigm, such as processing reward sensitivity and outcome expectancy (Seabrook and Borgland, 2020; Brockett and Roesch, 2020). The current study also shows that activity in the ACC/paracingulate gyrus is associated with the balance between goal-directed and habitual behavior. Both the ACC and paracingulate gyrus have been implicated in response conflict detection and are commonly activated across a wide variety of tasks (e.g., Zhang et al., 2017; Shenhav et al., 2016).
However, it has also been proposed that the ACC plays a more central role in reward value and the actions based upon it (e.g., Alexander and Brown, 2019; Shenhav et al., 2016). A model incorporating many of the proposed functions of the ACC places it at the top of a cognitive control system that determines the selection and execution of behavior over extended periods of time (Holroyd and Yeung, 2012). In short, we propose that this network of the ACC/paracingulate gyrus and OFC/insula is responsible for monitoring the balance between the goal-directed and habitual behavior systems, through conflict detection and related cognitive processes such as determining outcome expectancy.
Acute stress did not shift the balance between goal-directed and habitual behavior, in contrast to several previous studies showing that stress induces a shift from goal-directed to habitual behavior (e.g., Schwabe and Wolf, 2009; Hartogsveld et al., 2020). This may be due to several boundary conditions for the stress-induced shift from goal-directed to habitual behavior, such as cortisol reactivity, chronic stress (Soares et al., 2012), and working memory capacity (Otto et al., 2013; Quaedflieg et al., 2019). In addition, Zerbes and Schwabe (2021) recently showed that the stress-induced shift depends on the length of training during the acquisition phase, with participants who received moderate (but not prolonged) training being susceptible to acute stress effects. Even without the administration of acute stress, the instrumental learning and outcome devaluation procedure has yielded inconsistent results in HC (e.g., De Wit et al., 2018). In an earlier study (Hartogsveld et al., 2020), we aimed to improve the effectiveness of the devaluation procedure by adding reward-based devaluation (i.e., eating to satiety), which has been shown to be effective in other paradigms (…Wolf, 2009, 2011). It is also possible that other underlying boundary conditions (e.g., working memory capacity, life stress exposure) contribute to inconsistent findings across studies, and in particular in our sample of BED patients and HC. The precise influence of these boundary conditions is a good candidate for future studies to examine.
Some limitations of the current study deserve mention. First, comorbid psychological disorders (e.g., anxiety disorders, depression) are common among people with BED, and their prevalence is associated with binge eating episode frequency (Citrome, 2019). Because we excluded participants with other psychological disorders in order to isolate the effect of BED, generalizability to BED samples that include individuals with comorbidities may be limited. Second, the explicit knowledge tests during the learning stage may have directed participants' attention to the action-outcome associations. Attention to these associations at the later stage of learning may have rendered habit formation more difficult, potentially leading to more pronounced goal-directed behavior during the SOA stage (see Feher da Silva and Hare, 2020 for further discussion). Third, we did not specifically use the food items that the BED patients consumed during their binge eating episodes. Using food items tailored to their binge episodes might therefore have led to more pronounced habitual behavior (e.g., because of reward values distinct from those of foods not previously consumed during binge eating episodes). Lastly, the ROI analyses were not corrected for multiple comparisons across ROIs.
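To illustrate what correcting the ROI analyses for multiple comparisons would involve, here is a minimal Python sketch of the Holm-Bonferroni step-down procedure, one common family-wise error correction. The choice of Holm over plain Bonferroni, the function name `holm_bonferroni`, and alpha = 0.05 are assumptions for illustration, not the analysis used in this study. With five ROIs (anterior ACC, anterior insula, lateral OFC, posterior putamen, caudate body) treated as one family, the smallest p-value would be compared against 0.05 / 5 = 0.01.

```python
def holm_bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Return, for each ROI label, whether its p-value survives Holm's step-down correction.

    Hypothetical helper for illustration; ROI labels and p-values are supplied by the caller.
    """
    m = len(p_values)
    # Sort ROIs from smallest to largest p-value.
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value (k = rank + 1) is tested against alpha / (m - k + 1).
        if p <= alpha / (m - rank):
            rejected[name] = True
        else:
            # Step-down rule: once one hypothesis is retained, all larger p-values are retained.
            break
    return rejected
```

For example, in a family of five tests whose smallest p-value is 0.018, this procedure rejects nothing, because 0.018 exceeds the first threshold of 0.01; whether the five ROIs constitute the appropriate family depends on the analysis plan.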
Taken together, the difference in putamen activity between BED and HC suggests that monitoring of response accuracy and/or differentiation of reward value is affected in BED, although not sufficiently to induce a shift from goal-directed to habitual behavior. Stress did not affect the balance between goal-directed and habitual behavior in either BED or HC. Even though stress and cognitive control play a large role in BED, it is possible that their behavioral effect is limited to actual binge eating episodes rather than being measurable outside these episodes. As this is the first study investigating brain activation during an SOA paradigm in BED, more research is needed to determine whether differences in putamen activation affect goal-directed and habitual behavior.

Declaration of Competing Interest
None.
Matau for their help in collecting the data and recruitment of participants. We also thank Michiel Vestjens for his invaluable help with programming the instrumental learning task.