Task-specific subnetworks extend from prefrontal cortex to striatum

Functional magnetic resonance imaging (fMRI) studies on the dynamic representation of task content focus preferentially on the cerebral cortex. However, neurophysiological studies report coding of task-relevant features also by neurons in the striatum, suggesting basal ganglia involvement in cognitive decision-making. Here we use fMRI data to show that also in humans the striatum is an integrated part of the cognitive brain network. Twelve participants performed 3 cognitive tasks in the scanner, i.e., the Eriksen flanker task, a 2-back matching spatial working memory task, and a response scheme switching task. First, we use region of interest-based multivariate pattern classification to demonstrate that each task reliably induces a unique activity pattern in the striatum and in the lateral prefrontal cortex. We show that the three tasks can also be distinguished in putamen, caudate nucleus and ventral striatum alone. We additionally establish that the contribution of striatum to cognition is not sensitive to habituation or learning. Secondly, we use voxel-to-voxel functional connectivity to establish that voxels in the lateral prefrontal cortex and in the striatum that prefer the same task show significantly stronger functional coupling than voxel pairs in these remote structures that prefer different tasks. These results suggest that striatal neurons form subnetworks with cognition-related regions of the prefrontal cortex. These remote neuron populations are interconnected via functional couplings that exceed the time of execution of the specific tasks.


Introduction
Meaningful behavior is founded on the ability to select the most appropriate action given the available information. The appropriateness of an action is related to its outcome and is predicted by rules and goals that have been acquired through past learning. This ability is generally referred to as 'cognition', and it is in operation at all levels of our every-day life.
Decennia of research, both in animals and humans, have uncovered the relative preference of cortical and subcortical regions in a broad array of cognitive tasks, and revealed that cognition involves a large-scale network of areas. In it the lateral prefrontal cortex is thought to play a central role, as the integrator of the diverse elements that make up a task e elements that each are represented in less central brain structures (Duncan, 2001;Miller & Cohen, 2001;Passingham & Sakai, 2004). To fulfill its function, it interacts intimately with sensory cortices via the dorsal attention system (Corbetta & Shulman, 2002;Womelsdorf & Everling, 2015), and with motor-related cortices via the medial motor areas (Nachev et al., 2008;Paus, 2001). A range of cognitive tasks have been shown to co-activate the areas of this network (Duncan & Owen, 2000;Mennes et al., 2006;Stiers et al., 2010). Cellular recordings in non-human animals unveil the finegrained cortical organization. They show that functional specialization exists also at the sub-area level, and that cortical areas are at a finer spatial scale characterized by functional heterogeneity (Funahashi, 2013;Goldman-Rakic, 1995;Takeda & Funahashi, 2007). This fine-scale specialization is also evident in functional imaging studies in humans, where multivariate pattern classification supports distributed representation of task-relevant information (Stiers et al., 2010;Waskom et al., 2014;Woolgar et al. 2011Woolgar et al. , 2015 and voxel-wise functional connectivity shows sub-areal heterogeneity in connectivity profiles (Stiers & Goulas, 2018). At a larger scale, the different neuron populations that give rise to this heterogeneity are integrated across areas into parallel taskspecific subnetworks within the cognition network (Stiers et al., 2010;Stiers & Goulas, 2018;Waskom & Wagner, 2017).
While cognition research has strongly focused on the cerebral cortex, a growing body of studies suggests a contribution of the basal ganglia to cognition. Anatomically, it is well established that thalamic inputs to the prefrontal cortex, including the areas known to be involved in cognition, are chronically inhibited and selectively gated by the basal ganglia (Haber, 2016;Redgrave et al., 2010). Corticalebasal gangliaethalamicecortical circuits are topographically organized, with putamen controlling input to motor-related areas, the caudate nucleus gating input to lateral prefrontal cortex and ventral striatum regulating affect and value-related orbital and ventromedial prefrontal cortex (Alexander et al., 1990;Haber, 2016;Redgrave et al., 2010). Physiological studies provide additional evidence that the basal ganglia, and particularly the striatum, contribute to cognitive decision making (Bissonette & Roesch, 2017;Chiu & Egner, 2019;Graybiel & Grafton, 2015;Helie et al., 2015). For instance, striatal neurons in macaque monkeys encode relevant task features (Hori et al., 2009;Ito & Doya, 2015;Samejima et al., 2005) and show activity coupling with prefrontal neurons during task execution (Antzoulatos & Miller, 2014). Several recent fMRI studies aimed at investigating striatal contributions to human cognition, but focused on a specific function, such as categorization learning (Helie et al., 2015), cognitive control (Chiu & Egner, 2019;Jiang et al., 2015;Robertson et al., 2015) or task switching (Kehagia et al., 2017;van Schouwenburg et al., 2010). Our aim in the present study is to show that the striatum is not just involved in cognition in specific situations such as learning or task switching, but that it is an integrated part of the cognitionrelated brain network. We do this by elucidating a finegrained functional organization in the human striatum that is similar to that in cortical areas involved in cognition. We collected fMRI data 1 from 12 participants performing 3 cognitive tasks, i.e., the Eriksen flanker task, a 2-back matching spatial working memory task, and a response scheme switching task. Using multivariate pattern analysis (MVPA) we establish that different tasks induce temporally stable, spatially distributed activation patterns in the striatum, as well as in the lateral prefrontal cortex. Next, we use voxel-wise functional connectivity analysis to show that these activation patterns in striatum and in lateral prefrontal cortex constitute interdigitated task-specific subnetworks. These subnetworks are not only present during the execution of cognitive tasks, but their coalitions persist during the taskfree (resting) state. This indicates a more stable wiring code that binds together neuronal populations across the prefrontal cortex and the striatum for supporting cognition and behavior. The fact that this organization pertains to the cerebral cortex and the striatum indicates an overarching, brainwide organizational principle.

Methods
We report how we determined our sample size, all data exclusions (if any), all inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study.

Participants
Twelve people volunteered to participate in this study. They were between 20 and 47 years old and 5 were males. All participants were right-handed and free of neurological or psychiatric conditions. No person who volunteered for the study violated any of these conditions. All participants gave their informed consent to participate in the study, which was approved by the local ethics committee. No participant was excluded from the study because of suboptimal performance, excessive head movement in the scanner, or for any other reason.

Behavioral tasks
The participants performed three tasks with different cognitive demands during scanning. All three tasks were administered in alternating short blocks of 5 trials (Fig. 1A): -Eriksen task (Fig. 1B): Participants had to indicate the left or right orientation of a centrally positioned arrowhead, which was flanked on the left and the right with similar arrow heads that either had congruently the same or incongruently the opposite orientation. In two trials the flanking arrow heads were congruent with the central one and in three trials they were incongruent. Congruent and incongruent trials alternated randomly. -Back-matching task (Fig. 1C): The stimulus lay-out was a three-by-three array of empty squares. A stimulus consisted of the letter "X" presented centrally in one of the nine squares for 1 sec. Participants matched the current array-position of the X with its position two trials back in time (2-back). Because information needs to be retained over trials, the first two trials in a block could not be a target. Nonetheless, participants needed to start monitoring and feeding information into their working memory starting with the first trial. -Response scheme switching task (Fig. 1D): The stimulus array consisted of five open squares positioned in a horizontal row. A stimulus was a colored square filling one of these five placeholders. Participants indicated whether the square in the current trial had jumped to the left or to the right compared to its position on the preceding trial. In congruent trials the stimulus square was black and participants had to press the key indicating the direction of the jump. In incongruent trials the stimulus square was white and participants indicated the direction opposite to the jump. In each block, the stimulus was white on two of the five occasions, in a random order. The first stimulus in a block was always black and did not require a response.
Stimuli were black or white against a gray background, back-projected on a screen positioned vertically at the end of the scanner bore. Each task block consisted of 5 trials of the same task. A 500 msec cue preceding the first trial of each block by 4 sec announced what task would be administered. The cue consisted of the first letter of the task name (e.g., "E" for Eriksen, etc.). The trial onset interval in a block varied randomly between 1.5 and 2.0 sec, in 100 msec steps. In the Eriksen and Backmatching tasks the first 1000 msec of this interval comprised the presentation of the stimulus. In the Switching task the stimulus remained visible for the entire interval, to avoid working memory demands.
Participants responded via a standard key pad operated with the index and middle finger of the right hand. For the Eriksen and the Switching task the key presses indicated the left or right direction. For the Backmatching task the index finger indicated a match and the middle finger a mismatch. Note that all trials required a motor response.
A single run consisted of 6 blocks of each task for a total of 18 blocks. The order of the blocks was pseudo-random, with the restriction that no task was administered in two consecutive blocks. The time between blocks was 14 sec. Each participant completed six runs with the three tasks.
The average reaction times and error rates of individual participants in each of the task trials were statistically compared with pairwise repeated measures t-tests, while the significance level was Bonferroni-corrected for three comparisons, with a ¼ .0166.

2.3.
MRI image acquisition and processing 2.3.1. Image acquisition Brain images were acquired on a Siemens MAGNETOM Allegra 3T MRI head-only scanner, while head motion was constrained with foam padding. A T2*-weighted gradient echo planner pulse sequence was used to generate functional images of the whole brain including the cerebellum (repetition time (  A) The temporal structure of the paradigm. Participants performed three tasks administered in block of 5 trials per task with an inter-block interval of 14 sec. Each block was initiated by a cue stimulus indicating which task participants were going to perform. The tasks administered were Eriksen flanker task (B), a spatial backmatching task (C) and a response scheme switching task (D). See text for detailed description of the tasks. c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 slices ¼ 192). Voxel size was 1 Â 1 Â 1 mm. We used all collected data from Stiers and Goulas (2018) in the current analyses.

Preprocessing
The data was pre-processed using the SPM5 software (Wellcome Department of Cognitive Neurology, University College, London, UK). The task-related fMRI data and the resting state fMRI data were preprocessed in the same way. This included re-alignment, distortion correction using the field map, slice time correction, and co-registration with the anatomical scan. The T1-weighted anatomical image was segmented into tissue density maps and non-linearly normalized to the MNI template in an integrated procedure. The resulting normalization parameters were applied to the functional images, which were then resliced to 2.0 mm isotropic voxels. No smoothing was applied beyond the smoothing that is inherent in reslicing and resampling the images at the end of the preprocessing.

Head motion handling in task-related data
The preprocessed data is used in three ways in subsequent analyses: 1) to estimate single trial activations for the multivariate pattern analysis, 2) to estimate relative task preferences of individual voxels using a univariate statistical analysis across all trials of a particular task, and 3) to estimate functional connectivity strength during task execution. The specifics of these estimations will be described in Methods, Sections 2.5.1, 2.6.1, and 2.6.2, respectively. Here we only point out how head motion effects were handled for these estimations. First, the realignment parameters derived from preprocessing each run were used to linearly regress out the effects of head motion on the measured data. Second, trials contaminated with fast head movement were identified and either left out (MVPA) or modeled as a separate class of events of no interest (univariate analysis). Fast head motion was defined as volume-to-volume displacement exceeding .4 mm (10% of the slice thickness) in the z-dimension. For all trials with an onset within the time window of four volumes preceding this head motion contaminated volume, the hemodynamic response was considered contaminated by fast head movement, and these trials were set aside as events of no interest.

Resting state data
The resting state data set of each participant was subjected to a regression analysis to remove nuisance effects. Linear trends were removed by applying a whitening filter, as a standard procedure in SPM. Realignment parameters and the average signal from the white matter and the cerebrospinal fluid (but not the whole brain or the grey matter) were added to the model, together with a constant regressor to remove the session specific mean. Lastly, intrinsic autocorrelations were removed from the data following the standard SPM procedure. The residual data were Fourier band pass filtered from .1 to .01 Hz to retain only the low-frequency fluctuations in the signal. After filtering, volumes contaminated with excessive head movement (volume-to-volume displacement >10% of slice thickness, i.e., .4 mm) were eliminated, together with the volumes 1 back and 2 forward. This data was also used for the functional coupling study (see Results, Section 3.3).

Region of interest identification
Regions of interest (ROIs) for the analyses were defined from existing atlas data. The individual anatomy of participants was taking into account by taking the intersection of each template ROI with the individual's grey matter density map cut off at .5 density. The template ROI masks as well as the individual grey matter density maps were resliced to the image grid of the functional data. The cortex in and around the inferior frontal sulcus has been strongly implicated in cognition, i.e., in tasks requiring rule-based stimulus response mapping, and has been demonstrated to exhibit the multiple demand property (Duncan, 2010;Stiers et al., 2010). A ROI mask for this cortex was created by merging the clusters 11 ("8Av"), 5 ("9/46v-I") and 8 ("46") in the resting state functional connectivity based parcellation study of Goulas et al. (2012) (Fig. 2A). The clusters 5 and 11 together cover the posterior half of the inferior frontal sulcus, including the inferior frontal junction, and constitute the posterior part of the lateral prefrontal cortex ROI. This sub-ROI, which we will refer to as "IFS", overlaps considerably with the cluster 11 in Fedorenko et al. (2013), but extends more anteriorly into the IFS. The cluster 8 ("46") from the parcellation study of Goulas et al. (2012) covers a part of the anterior middle frontal gyrus, and presumably Brodmann area 46 (Goulas et al., 2012;Rajkowska & Goldman-Rakic, 1995). This cluster constitutes the anterior portion of our lateral prefrontal cortex ROI and will be referred to as dorsolateral prefrontal cortex (dlPFC). Because the parcellation study by Goulas et al. (2012) focused on the left hemisphere, a right hemisphere ROI was created by flipping the left hemisphere mask to the right hemisphere.
The striatum ROI masks are illustrated in Fig. 2B. For the caudate and putamen ROIs we used the fields 71e74 from the Automated Anatomical Labeling (AAL) map (WFU PickAtlas) (Maldjian et al., 2003;Tzourio-Mazoyer et al., 2002). Since there is no AAL label for the ventral striatum, the masks for the ventral striatum were created from the striatal AAL fields using the definition and instructions of Tziortzi et al. (2011). The ventral striatum mask covers the area functionally defined as ventral striatum in connectivity studies and includes nucleus accumbens, medial caudate nucleus and rostroventral putamen (Tziortzi et al., 2011). Since the ventral striatum mask was subtracted from the caudate and the putamen masks, there was no overlap between the striatum ROI masks.

ROI-based multivariate pattern analysis
Multivariate pattern analysis (MVPA) of the three task contexts was conceived as a pairwise binary classification problem: Eriksen versus Backmatching, Eriksen versus Switching and Backmatching versus Switching. An estimate of the overall decodability of task contexts was obtained by averaging accuracies for each pairwise comparison.
c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 2.5.1. MVPA procedures To investigate whether the task being performed can be decoded from the BOLD values of the voxels in a ROI we used a machine learning algorithm to discriminate activation data from trials of two different tasks. The algorithm was trained on a subset of the available data and we tested the learned discriminative pattern by using it to classify the remainder of the data not included in the training set. The machine learning algorithm used was the Spider implementation of the support vector machine-learning (SVM) algorithm (version 1.71; www.kyb.mpg.de/bs/people/spider/main.html), equipped with a linear kernel and assuming unbalanced classes. Normalized and unsmoothed functional data were the raw data for the MVP analysis. Separate analyses were performed for each ROI, participant, and pair of tasks. For each analysis, voxels were selected that fell within the intersection of the individual's grey matter tissue map and the ROI-specific mask (see Methods, Section 2.4). For the selected voxels, single trial BOLD response strength estimates were obtained by creating trial-specific epochs, starting from one volume prior to trial onset and including the six subsequent volumes (7 volumes in total). The single epoch data was de-trended by applying a general lineal model analysis. The model included a linearly increasing variable, a constant regressor to remove the average signal, and the six realignment parameters from preprocessing. The average residual values in the 4th and 5th volume in the epoch (or 4e8 sec after trial onset) were used as a measure of the trial specific response of the selected voxels. Only correct trials not contaminated by head motion (see Methods, Section 2.3.3) were included in the MVP analysis. The single-trial response values were rescaled per voxel to the 0e1 range, across all trials in the training set and across the two tasks that the classifier was going to be trained to discriminate. The same rescaling parameters were then applied to the voxel values of the test examples, with rescaled values higher than 1 or lower than 0 being reset to 1 and 0, respectively.
The division of the data in training and test examples was based on the runs. There were six runs of data collected. Each run was kept apart once as test data set, while data from the other five runs was then used to train the algorithm. This means that we had 6 different training folds, in which each time a different run was set aside as test set, while the other 5 runs were used as training data. To further avoid overfitting of the data, at each training fold the training set was split into 5 equal sub-sets of trials, and the classification was performed 5 times, each time using only 4 sub-sets, while leaving a different sub-set out of the analysis (i.e., not used for training or testing). After completing the five classifications, the average absolute discriminative weight of each voxel over the 5 split-wise analyses was computed, and used as the result for that specific training fold. At this level e i.e., the analysis of a particular fold of the training set e a recursive feature elimination procedure (De Martino et al., 2008) was applied, which comprised of eliminating the 20% voxels with the lowest average absolute weight (i.e., the 20% voxels that contributed the least to the previous classification) and then repeating the five-split-analyses with the reduced voxel set. This iterative process was repeated 20 times, yielding 20 classification accuracies for a particular training fold. After repeating this whole recursive process for each of the six training folds, the six series of classification accuracies for the 20 levels of the recursive feature elimination process were collapsed into one series by averaging the accuracy over the 6 folds at each of the 20 recursive elimination levels. The final accuracy for the whole analysis for a particular ROI, participant and task pair was the highest classification accuracy observed in this series of 20 accuracy values.
To investigate the dependency of the results on the choice of algorithm, the basic analysis was repeated with the regularized logistic regressor algorithm included in the Princeton MVPA toolbox (http://www.csbmb.princeton.edu/mvpa). Moreover, to investigate the dependency of the results on the Fig. 2 e Regions of interest (ROI) masks overlayed on the Colin brain (MRIcron). A) Lateral prefrontal cortex ROIs. The inferior frontal sulcus (IFS; violet) mask comprised of "8Av" and "9/46v-I" in Goulas et al. (2012). The dorsolateral prefrontal cortex (dlPFC; yellow) mask consisted of area "46" in Goulas et al. (2012). These masks, merged together, constituted the lateral prefrontal cortex ROI mask. B) Striatal ROIs. The masks divide the striatum in nonoverlapping caudate nucleus (green), putamen (blue), and ventral striatum (red).
choice of MVPA parameters, the basic MVP analysis with the SVM classifier was repeated with 50 instead of 20 recursive feature elimination iterations and with 15 instead of 5 splits on the training data.

Temporal effects analysis
To investigate changes over time in the classification of task contexts, trials were re-grouped in smaller sets of two or three runs. The first time-related investigation was a split-half analysis, in which separate analyses were perform on data from only the first three runs and from only the last three runs. Because this substantially reduces the number of trials in training, we switched from the leave-one-run-out crossvalidation to a leave-two-examples-out cross-validation. This cross-validation method leads to higher classification accuracy values. Etzel et al. (2011) showed that this accuracy gain derives from the increased ratio of training over test examples, rather than contamination by training examples coming from the same run. Moreover, for a better comparison between cortical and striatal decoding in terms of the number of voxels/features, we chose to work with the IFS sub-ROI alone, instead of the entire lateral prefrontal cortex ROI. The average number of voxels in the bilateral IFS mask was 1947 ± 126, compared to 1926 ± 304 in the bilateral striatum mask [t (11) ¼ .2, 2-tailed p ¼ .817]. All other parameter settings and procedures were as described above.
The second time-related analysis tested the generalizability of a discriminative pattern over time. The trials of the first two runs were used for training the classifier, while trials from either the 3rd and 4th run, or trials from the 5th and 6th run, were used as test examples. In this case no leave-out procedure was applied to the training set. Hence, training was performed on all examples from the first two runs that were available for a particular participant. However, the 5-way splitting procedure on the training set (divide the training set in 5 subsets and repeat training on only 4 subsets, setting each time another subset unused aside), safe-guarded against overfitting of the data. To preserve the independence between test and training examples, the test examples were rescaled to the 0e1 range based on the scaling parameters computed for the training set, as described above.

Assessing statistical significance of MVPA results
A randomization procedure assessed the statistical significance of MVPA results, based on 125 re-analyses of the data for a particular ROI, task pair and participant, each time with a random re-distribution of the training labels. Apart from this label-shifting, the exact same steps were followed as described above (cross-validation, splitting, recursive feature elimination and best iteration selection). The random reassignment of labels mimics the null-hypothesis that there is no systematic association between feature values and classes, so that the class labels are interchangeable. The number of randomizations was kept low to keep the computation time within reasonable limits. This low number of tests may affect the tails of the null-distribution, and therefore the reliability of assessing statistical significance beyond the .05 level of individual participant analyses. Therefore, statistical significance of results was also assessed at the group level, and the randomization analysis was used to assess the mean and standard deviation of the null-distribution for each analysis. Based on the null mean and standard deviation, the participant's observed accuracy for that ROI and task pair were transfered into a Z-score. This score expresses the excess of accuracy observed relative to the null distribution, taking into account not only the estimated mean but also the spread of the null-distribution. Then, across these participant-specific Z-scores for a particular ROI and pair of tasks, a one-sample t test against the expected value of 0 was performed, and evaluated at the one-tailed significance level of .05. To help appreciate the size of the reported effects, for some critical comparisons we also report Cohen's d, which is the standardized mean effect.
Randomization accuracies for the overall decodability across the three tasks were computed from the 125 randomizations that were computed for each pairwise comparison separately. At each of the 125 iterations of the randomization process, the average was computed of the three accuracies obtained for each of the three pairwise comparisons. This yields 125 averaged accuracies, each expressing a possible classification accuracy across the three pairwise task comparisons, when there is no relationship between the training examples and the task labels. Statistical significance was established as described above for pairwise comparisons.

Visualization of voxel classification importance maps
The importance of a voxel for classification is expressed in the classifier weight that the voxel receives during the training process. Visualizing these weights gives an impression of the clustered or segregated location of the voxels that contribute most to the classification of the task contexts. The values of the weights in a SVM classification can be positive or negative, reflecting the support of the voxel value for one or the other of the two labels. Hence, the absolute size of the weight informs about the importance of the voxel. For two participants, we created 3D voxel maps of the 10% most contributing voxels in each pairwise classification of examples. These maps comprise the 5% voxels with the highest positive and the 5% voxels with the highest negative classifier weight. The percentage threshold is arbitrary and chosen only for optimally visualizing the spatial spreading of contributing voxels while avoiding too large voxel clusters. It does not imply that other voxels are irrelevant for the classification.

Functional coupling between prefrontal and striatal voxels
The functional coupling between voxels was studied in subdivisions of the lateral prefrontal cortex and/or of the striatum. The ROIs included were the bilateral IFS and dlPFC sub-ROIs for the lateral prefrontal cortex and the bilateral putamen and caudate nucleus sub-ROIs for the striatum. This subdivision of cortical and striatal ROIs made it possible to compare remote functional coupling between cortex and striatum with remote coupling within each of these structures. Three classes of voxels were compared. These classes were based on the voxels' functional preference for one or the other task. Thus, the strength of functional coupling between voxels in separate sub-ROIs was compared for voxels from different preference classes, for instance, voxels preferring Eriksen compared to voxels preferring one of the other two tasks. Again, only voxels belonging to individual participants' grey matter mask were included.

Voxel-wise task preference profiles
Voxels in the regions of interest were categorized into several task preference types according to their response strength during each of the tasks, as described in Stiers et al. (2010) and Stiers and Goulas (2018). All six runs of unsmoothed task-related functional volumes were used for this. Measures of response strength during task execution were derived from a whole brain first-level univariate general linear model (GLM) analysis performed in SPM12. Each task in each run was modeled by marking the occurrences of individual trails as separate events. Task cues preceding each block of trials were modeled by one separate variable across tasks. Similarly, error trials and trials contaminated with fast head movement were modeled across tasks as a separate class of events of no interest. The modeling variables were each convolved with the theoretical hemodynamic response function and its time and dispersion derivatives. The six realignment parameter values were included in the model. Lastly, run-specific mean regressors were added to neutralize baseline signal differences between runs. After estimating the weights associated with each of the components in the model, the task-specific regressor weight maps were used to compute whole brain percent signal change images. These images were sampled to get for each of the selected voxels three percent signal change values, one for each task.
The three task specific percent signal change values constituted the task preference profile for each voxel. These preference profiles were assigned to one of six possible prototypical profiles, each defined by a binary 1 Â 3 vector in which each position represents one task and the binary value at each position indicates whether the corresponding task is being preferred or not by the voxel. Thus, [1 0 0] indicates preference for the Eriksen task, and [0 1 0] indicates preference for the Backmatching task, while [1 0 1] indicates preference for Eriksen and the Response Scheme Switching task. Voxels were assigned to a specific prototypical profile by correlating their observed task preference profile with each of the six possible binary profile types. The voxel was assigned to the type that yielded the highest correlation. Therefore, every voxel fell within one preference type only. Our primary interest was in voxels with a clear preference for one task over the other two tasks. This was because functional couplings between voxels are likely to become less specific if the voxels respond to more than one task. In that case, the time courses of these voxels reflect activity in two or more neuron populations of different functional specialization. Therefore, our main analysis focused on voxels with a preference for one of the three tasks (i.e., a "mono-preference": [1 0 0], [0 1 0], and [0 0 1]). However, for the sake of completeness we also performed in parallel a similar analysis including voxels of all six task preference types (including "multi-preference" voxels).
As an additional step in voxel selection, only voxels with a significant task-dependent modulation of their BOLD signal were included. The significance was assessed in a disjunctive F-contrast evaluating any pairwise difference between the three tasks, and defined and computed within the single subject GLM analysis of the six runs. The functional coupling analyses reported below only included voxels for which this Ftest was significant (p < .05), false discovery rate corrected for multiple comparisons (Genovese et al., 2002).

Task-related functional coupling
The first form of functional coupling between voxels that we investigated was the coupling during execution of the tasks. Neuron populations that work together to accomplish a particular task are likely to show similar fluctuations in activity measures, induced by stimulus-locked events, common afferent input, or phasic coupling of neural assemblies (Friston 1994(Friston , 2011. We particularly wanted to look at the signal fluctuation over time, independent of the overall response strength of the voxel. Therefore, we extracted the volume to volume fluctuations in the MRI signal, but only for volumes reflecting BOLD signal modulated by participants' performing a task, and without filtering out high frequency fluctuations. The continuous fMRI data was divided into epochs starting with the volume during which the task cue was presented and including 12 subsequent volumes. Each epoch was detrended by applying a general lineal model analysis, which including a linearly increasing variable and the six realignment parameters corresponding to these volumes. From every epoch the residual values from volumes 6 to 10 were selected. This confined the data to the part of the hemodynamic response wave when the BOLD signal was high e i.e., starting with the volume acquired after 4 sec (2 volumes) of cue time plus 6 sec (3 volumes) after the start of the first trial, and ending 10 sec later (average trial duration was 1.75 sec). The five selected volumes in each epoch were mean-standardized by subtracting from the voxel values in each of the five volumes the average voxel signal over all five time points. This eliminated signal increase relative to baseline (i.e., the periods when no task was being performed), leaving only the volume-to-volume fluctuations in the signal. These standardized epoch time courses, derived from blocks of the same task, were concatenated across runs, to yield for every voxel a task-specific time course. Next, for each task separately a Pearson correlation matrix was created by voxel-by-voxel correlating the time series vectors of all selected voxels in the four sub-ROIs. The matrices were Fisher-Z transformed to correct for the nonnormal distribution of the correlation statistic. From these functional connectivity matrices, voxel pairs could be selected based on the ROI the voxels belong to and on their task preference profiles, as described below, and the strength of the correlations between their time courses studied.

Resting state functional coupling
The second form of functional coupling investigated was the coupling strength between voxel pairs during rest, i.e., independent of task performance. For this measure we used the 10 min resting state fMRI dataset. This data was preprocessed as described above, retaining only the low frequency (.1e.01 Hz) fluctuations in the unsmoothed BOLD signal. For each of the selected voxels, a voxel-by-voxel low frequency time course correlation matrix was computed using the Pearson correlation coefficient. The correlation matrix was Fisher-Z transformed to correct for the non-normal c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 distribution of correlations. From this matrix subgroups of voxel pairs were selected, based on their ROI belonging and task preference, in order to investigate a relationship between task preferences of the voxels and the strength of their rest functional coupling (see Section 2.6.4).

Relating functional coupling of voxel pairs to their task preferences
The dependency of the functional coupling between voxels on task preferences was investigated for both types of coupling data in the same way. The analysis took place at the level of individual participants. The results from these analyses were then integrated at the group level to draw conclusions across participants. Voxel pairs were selected from the correlation matrix by selecting one voxel position on each matrix dimension, and their coupling strength was read from the matrix cell defined by these two positions. Voxel pairs were grouped into categories by selecting from the correlation matrix voxel pairs with specific features, such as the task preference type of the voxels in the pair and the ROI to which they belonged. For instance, in the basic comparison between same preference and different preference voxel pairs, first all voxel pairs are selected in which both voxels have the same preference type. Next, all voxel pairs are selected in which one voxels has that same preference and the other voxel has a different preference type. The average coupling strength in these two groups of voxel pairs are then computed for each participant, and statistically compared with a one-tailed paired samples t-test, to test the hypothesis that across participants same preference voxel pairs show a stronger coupling than different preference voxel pairs. To give an impression of the size of the reported effects, for some critical comparisons Cohen's d is reported.

ROI-based multivariate pattern analysis
Multivariate pattern analysis allows establishing whether different tasks induce reliable spatially dissimilar activation patterns in a particular brain structure. This is the minimal requirement to support the claim that the brain structure contributes to cognition. First, we investigate this requirement for the lateral prefrontal cortex and the striatum as a whole. Secondly, we examine whether some parts of the striatum, and the lateral prefrontal cortex, are more important for representing task contexts. A third line of analyses looks for habituation and/or learning induced changes in the task representations. Such temporal changes suggest that striatum's contribution to cognition is more limited in time, which weakens the claim that it has a central role in cognition. Finally, we present some control analyses that strengthen confidence in the results. They deal with the choices of MVPA parameters, the signal quality in the striatum, and the contribution of task difficulty effects in the reported results.
3.2.1. Decoding task contexts in lateral prefrontal cortex and striatum The results of task context classification in the bilateral lateral prefrontal cortex voxels are summarized in Fig. 3A. Significant above chance classification was observed in all 12 participants, with overall accuracies (i.e., averaged over the three pairwise task comparisons) ranging between 67% and 83%. Averaged across the participants, the overall accuracy was 76.1 ± 4.4%, which was significantly above chance [t (11) ¼ 25.7, Cohen's d ¼ 7.43, p < .001]. A comparable level of accuracy was also obtained for each pairwise comparison alone (Fig. 3A).
Classification results for the bilateral striatum are presented in Fig. 3B. The decoding of task contexts from the activity of voxels in the striatum was clearly more difficult compared to classifying from the lateral prefrontal cortex voxel's activity. Nonetheless, the classifier was able to classify task contexts significantly above chance level in 8 out of 12 participants (cumulative binomial p ¼ 1.61 Â 10 À8 ). The overall accuracy over the three pairwise classifications ranged from 53.4% to 64.9%. Averaged across participants this yielded an overall accuracy of 59.9 ± 4.1% [t (11) ¼ 5.9, Cohen's d ¼ 1.70, p < .001]. The accuracies for each of the pairwise comparisons separately were at a comparable level (Fig. 3B), and at the group level all three were significantly different from chance (lowest t value was 4.1, p < .001). This establishes that classification of task contexts from striatal voxel activity is possible. However, the level of accuracy was significantly lower than for the lateral prefrontal cortex voxels [smallest t was observed for Backmatching versus Switching: t (11) ¼ À7.4, 2-tailed p < .001].
The reported results are based on the best iteration of the recursive feature elimination procedure. The fluctuations in accuracies throughout all the 20 iterations of the recursive feature elimination process are presented in Fig. 3C for the first three participants included in the sample. Data are from the bilateral striatum ROI. The mostly flat curves in the graphs indicate that the decoding of task contexts is relatively stable over the range of voxel counts. This means that the taskcritical information is not confined to a specific subset of the voxels but contained across a large range of the voxels in the ROI. As a consequence, the best accuracies reported for each analysis represent only small upward fluctuations in this more or less stable accuracy level.
To appreciate the distribution of voxels contributing to the classifications in each ROI, we created voxel importance maps for two participants. These maps show the 10% voxels that contributed most to the classification in each pairwise classification in the IFS and the striatum ROIs ( Fig. 4A and B, respectively). The most important voxels were evenly spread c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 throughout each of the ROIs, suggesting that a widely organized assembly of voxels is specifically engaged in one or other of the three tasks.

Spatial stability of decoding
To establish whether there is spatial variation in the classification of tasks contexts, we did a series of classifications using sub-ROI masks. First we investigated whether there was any evidence of a lateralization in the representations of the task contexts. Including only voxels from the left or the right striatum did not affect classification accuracy. The accuracy across the three pairwise classifications was 58.9 ± 3.8% for the left striatum and 59.0 ± 4.1% for the right hemisphere, compared to 59.9 ± 4.1% for the bilateral mask. Neither of these was statistically different [largest t (11) ¼ 1.3, 2-tailed p ¼ .117]. Very similar results were obtained for the separate pairwise classifications. In fact, only the pairwise classification between Switching and Eriksen showed a significant drop in accuracy when the right hemisphere striatal mask (59.5 ± 4.9%) was compared to the bilateral mask (61.6 ± 6.0%) Fig. 3 e Pairwise SVM classification of Eriksen, Backmatching and Response Switching trials with leave-run-out crossvalidation, for the bilateral lateral prefrontal cortex (A) and for the bilateral striatum (B). Reported are best accuracies obtained in a recursive 20% feature elimination procedure with 20 iterations. Overall accuracies ("All three comparisons" in panel A and B) are the average across the three pairwise comparisons for each participant. One-tailed repeated measures ttest: **p < .001. Error bars for the chance data represent ±1 SD over 125 randomizations of labels within each participant and then averaged over the 12 participants. C. Accuracies of pairwise classification in the bilateral striatum as a function of the progression in the recursive feature elimination procedure for the first three participants (out of 12) included in the study. Arrows demarcate the iteration at which the reported best accuracy was obtained for each pairwise classification. c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 Fig. 4 e Spatial distribution of discriminative patterns. Voxel importance maps showing the 10 pct. voxels with the highest classifier absolute weights in a particular pairwise classification of two representative participants: Eriksen versus Backmatching (red), Backmatching versus Switching (blue) and Switching versus Eriksen (green). The maps are overlaid on each participant's anatomical image, showing the Inferior frontal sulcus (IFS) at a sagittal section in the right hemisphere, at x ¼ þ54 mm (A), and the striatum in a more posterior (y ¼ þ4 mm) and anterior (y ¼ þ12 mm) coronal section. In both the IFS and the striatum the voxels are evenly spread through the ROI, with only moderate overlap. White outline in A delineates the IFS ROI; White line in B marks boundary between the ventral striatum and the rest of the striatum.
Lastly, we also looked at lateralization effects in decoding within striatal sub-regions, as right-ward lateralization could be expected given the visuospatial nature of the tasks. Alternatively, since only right hand responses were required, a leftward bias might emerge, particularly in the more motorrelated putamen. In a preparatory ROI-based group analysis some evidence for small lateralization effects in task-related BOLD activity was found. First, in the putamen ROI the left side showed significantly more active voxels across the three tasks compared to the right side [F(1,66) ¼ 5.51, p ¼ .0219], consistent with the right-hand motor action. Second, a small left-sided dominance was also seen in the dlPFC, where the BOLD amplitude (but not the number of active voxels) was larger on the left than on the right side [F (1,66) ¼ 5.82, p ¼ .01865]. This may reflect the association of the dorsolateral prefrontal cortex (BA46) with a motor coordination network, together with the ventral premotor cortex, the frontal eye fields and the medial motor-related areas (Goulas et al., 2012;Miller & Cohen, 2001;Petrides, 2005). Nonetheless, follow-up analyses of classification accuracies in separate left-and right-sided subROIs did not reveal significant differences in classification accuracies between the left and the right-side ROIs [largest t-value for overall-classification: t (11) ¼ À.220, 2-tailed p ¼ .83]. We can conclude, therefore, that classification results do not necessarily provide the same information as the univariate BOLD activity results. Despite a difference in BOLD activity strength observed in the putamen and the dlPFC in each hemisphere, the activity measured in these voxels contained equally precise information regarding which context was being processed.

Temporal stability of decoding
The hypothesis that striatum actively contributes to cognitive decisions implies that this contribution must be stable over time. Hence, task-specific activation patterns should be as clearly differentiated near the end of the experiment as they are at the beginning. Temporal decrease in striatal task classification accuracy, compared to the that in lateral PFC, could arise if training and automatization effects reduce the contribution of the striatum to the execution of the task, or because they cause a shift in the voxels that contribute to the task context representations. Possible temporal change in the voxels contributing to classification was investigated by splitting the data in temporal chucks and comparing the decoding accuracy over time. First, to investigate whether classification reduced over time, we directly compared the classification accuracies obtained from trials in the first three and in the last three runs. Secondly, to estimate changes in the pattern of voxels carrying the classification, we tested whether the multivariate patterns estimated from the trials in the first two runs allowed better classification of trials near in time (i.e., from runs 3 and 4) than farther in time (i.e., from runs 5 and 6). A mere time on task effect would be suggested by a reduced accuracy but preserved generalization, whereas a qualitative change in the patterns would be indicated by a preserved accuracy over time but a low generalization.
Because of the smaller numbers of examples in these analyses we switched from the leave-one-run-out to a leave-two-examples-out cross-validation procedure. Moreover, to balance the number of voxels in the cortical and striatal ROIs we chose to work with the bilateral IFS ROI, instead of the entire lateral prefrontal cortex ROI, and compare it with the bilateral striatal ROI. The results of the split-half analysis are summarized in Fig. 6A. As expected (Etzel et al., 2011), the accuracies obtained in the striatum with the leave-two-out cross-validation method (overall mean accuracy 81.2 ± 3.7%) were much higher than those obtained with the leave-onerun-out validation method (59.9 ± 4.1%). More importantly, there was no evidence for a time on task effect. The overall accuracies for classifying task contexts in the second half of the data was not lower than in the first half of the data, in the lateral prefrontal cortex or in the striatum [the t-values were À1.0 (Cohen's d ¼ À.29) and À.8 (Cohen's d ¼ À.23), respectively, p > .1]. There was also no evidence of reduced decodability over time in the separate pairwise classification analyses. In fact, classification of Backmatching and Switching trials was somewhat better in the last three runs, both for the IFS voxels [85.3 ± 4.3% and 88.3 ± 6.2%, respectively, t (11) ¼ À2.0, p ¼ .035] and for the striatum voxels [79.8.3 ± 4.0% and 82.0 ± 2.8%, respectively, t (11) ¼ À2.
There was also no support for a dynamic change in the discriminative activity patterns over time (Fig. 6B). For both ROIs the discriminative patterns learned from trials of the first two runs were no more effective in classifying trials from the immediately following runs as they were in classifying the temporally remote trials from the last two runs [the t-values for overall accuracies were À.4 (Cohen's d ¼ À.11) in the striatum and .6 (Cohen's d ¼ .16) in the IFS ROI, p > .1]. The results from the three separate pairwise analyses were fully in line with this [highest t-value: t (11) ¼ 1.1, p ¼ .148].

Control analyses
MVP parameter dependency. To check for the dependency of the result on the specific parameter settings in this analysis we also repeated the bilateral striatum classification using a set of different parameters. The number of iterations in the recursive feature elimination procedure was increased from 20 to 50, while the percentage of features left out at each iteration was reduced from 20% to 10%. This yields a much finer grained search for discriminative patterns. Moreover, the number of splits was increased from 5 to 15 to further reduce the chance of overfitting. The consequence of these changes is that the execution time becomes substantially longer. The classification accuracy in the striatum increased from 59.9 ± 4.1% to 60.4 ± 3.7 under these more thorough conditions, which was not significantly different [t (11) ¼ 1.0, 2-tailed p ¼ .329]. For each pairwise classification of task contexts, the increased scrutiny led to a significant increase in classification accuracy only for the decoding of Eriksen versus Backmatching, which increased from 59.8 ± 5.8% to 60.8 ± 5.7% [t (11) ¼ 3.2, 2-tailed p ¼ .009]. We furthermore repeated the bilateral striatum analysis (with the initial parameter choices) using the regularized logistic regression based classification algorithm as implemented in the Princeton MVPA toolbox (https://pni.princeton.edu/pnisoftware-tools/mvpa-toolbox). The overall classification accuracy with this algorithm was 60.5 ± 5.30%, which was not significantly different from the 59.9 ± 4.1% obtained with the SVM algorithm [t (11) ¼ .6, 2-tailed p ¼ .530]. There was also no significant difference for any of the three pairwise task context comparisons [largest t-value for Backmatching versus Switching: t (11) ¼ 2.0, 2-tailed p ¼ .067].
Block-wise analysis. To check whether the reported findings are dependent on the single trial quantification of voxel activity, the main analyses were repeated with estimates of task-specific activity assessed over all five trials in a block, instead of for each trial separately. Per block of trials, epochs of 13 consecutive volumes were created, starting with the volume during which the task cue was presented. The epochs were detrended as described for the single trial epochs (see Methods, Section 2.5.1). The estimated activity strength was the average residual value over five successive volumes, starting with the 6th volume in the epoch, i.e., after 2 volumes (4 sec) of cue time and 3 volumes (6 sec) of hemodynamic response delay. This yielded 6 response estimates per task and per run, and a total of 36 estimates per task. For the bilateral lateral prefrontal cortex ROI, the average classification accuracy across participants obtained with these blocked estimates was 71.64% [range 65e78%; t (11) ¼ 17.5, p > .0001]. For the bilateral striatum ROI, the average task classification accuracy using blocked estimates of task activity was 60.7% [range ¼ 52e68%; t (11) ¼ 4.3, p ¼ .0006]. These results are very similar to those obtained with the single trial analyses.
ROI-specific signal quality. The lower classification accuracies for the striatum ROIs compared to the lateral prefrontal cortex ROIs cannot be explained by lower signal quality due to the deep location in the brain of the striatum, compared to the location near to the surface and the imaging coil of the cortical ROIs. While the ventral striatum ROI did have a lower temporal signal-to-noise ratio (tSNR) than the other four ROIs, the tSNR in the other two striatal ROIs was similar to and if anything somewhat higher than the tSNR in the cortical ROIs (see Supplementary information, Section SI1). Hence, the lower accuracies in the ROIs do not co-vary with these tSNR measurements.
Task difficulty effects. An alternative explanation for the observed task context decoding is that the different activity patterns merely reflect differences between the three tasks in the amount of cognitive effort that they require. While it is problematic to directly match the difficulty level of different cognitive tasks, there are several lines of argument that together render unacceptable the suggestion that the current results are driven by differences in the difficulty level of the three tasks (See Supplemental information, Section SI2 for a more elaborate discussion of this point). First, as is evident from the voxel importance maps in Fig. 4, unique sets of striatal voxels contribute to the task classifications. This speaks against an underlying single factor. Second, if cognitive effort plays a crucial role, the classification accuracies can be expected to decline over time due to reduced effort following increased training. Such a decline has been reported for decoding of specific task-relevant features Results presented are at the group level and pertain to overall accuracies (i.e., averaged across the three pairwise analyses). A) Comparison of leave-two-out classification accuracies when using only trials from the first three (full fill) or only trials from the last three runs (striped fill) trials. There was no significant decline from the first to the last three runs in either ROI. B) Mean generalization accuracies for training the classifier on examples from the first two runs and testing either on the examples from runs 3 and 4 (full fill), or on the examples from runs 5 and 6 (striped fill). There was no difference between generalization to runs 3e4 and to more remote runs 5e6, for either ROI. Error bars indicate þ1 SD. IFS ¼ bilateral inferior frontal sulcus mask; striatum ¼ bilateral striatum mask. Two-tailed repeated measures t-test: ns. ¼ not significant. c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 (Waskom et al., 2014;Woolgar et al. 2011Woolgar et al. , 2015. The lack of accuracy decline over six runs in the present study means that task-specific representations are strong regardless of the level of practice, automation, or effort (Waskom et al., 2014). Thirdly, if a one-dimensional factor such as difficulty underlay the classification, the classifier trained to discriminate a more easy from a more difficult task would also be able to classify a third task as being either one or the other, provided the difficulty level of the third task is not in between those of the two training tasks. We found no evidence for this type of generalization across task contexts (See Supplemental information, Section SI2). Lastly, if task difficulty drove the classification, then the trained classifier would be at a loss when fed with activation patterns from trials without the difficulty manipulation (i.e., no incongruent distractors in Eriksen, no two-back matching in Backmatching, and no incongruent response schemes in Switching). In contrast to that expectation, we found that training on trials with the difficulty component and testing on trials without these components did not disrupt classification (See Supplemental information, Section SI2). Based on these arguments, the conclusion is warranted that the differential activations observed in the striatum during execution of the tasks reflect differences in the representation of task context elements, and not just effort-related differences, that could distinguish these three tasks.

Task-dependent functional coupling between voxels
Our second main question was whether the taske discriminative activity patterns in the striatum are integrated in the cognition network. To address this question we studied the functional coupling between voxels in the two sub-ROIs of the dorsal striatum and in the two sub-ROIs of the lateral prefrontal cortex. More specifically, we investigated whether voxels in these ROIs with a preferential response during one of the three tasks show stronger functional coupling than voxels preferring the other two tasks. If so, these same preference voxels can be seen as forming a functional subnetwork across these structures that supports the execution of that preferred task. The task preferences of the selected voxels were established using the six runs of the task-related functional data, following established procedures (Stiers et al., 2010;Stiers & Goulas, 2018), as explained above (see Section 2.6.1 in the Methods). The distribution of voxels over the six possible task preference types is summarized in Fig. 7A. Averaged over participants, the number of voxels with a significantly different response across the three tasks in the lateral prefrontal cortex ROI (i.e., IFS and dlPFC) was 2731 ± 712, or 51.4 ± 13.1% of the available voxels. The number of voxels with a significant task profile in the dorsal striatum (i.e., caudate and putamen) was 561 ± 314 out of 2025 ± 295 available voxels, or 27.2 ± 14.3% of all voxels. This means that about half of the voxels in the lateral prefrontal cortex ROIs responded significantly different across the three tasks, while this was only around a quarter of all voxels in the striatum ROIs. As can be seen in Fig. 7A, the selected voxels are rather evenly distributed over the six prototypical preference types, except for the voxels preferring Backmatching (i.e., [0 1 0]) and those preferring Backmatching and Switching (i.e., [0 1 1]), which were most frequent in the lateral prefrontal cortex.
If execution of a specific task depends on a unique subnetwork of interconnected neuron populations within and across the nodes of the cognition-related network, than it follows that voxels in remote areas that are more active during that specific task, and hence share a task preference, should show a stronger functional coupling than voxels preferring different tasks. This hypothesis has been confirmed previously for cerebral cortex ROIs (Stiers et al., 2010;Stiers & Goulas, 2018;Waskom & Wagner, 2017). To extend this principle to the striatum, we quantified BOLD signal variation in the peri-and post-trial period and computed their correlation between the selected voxels. More specifically, we asked whether the correlation strength varied depending on the task preference of the voxels. The results for voxels with a significant preference profile favoring one task over the other two (mono-preference voxels) are summarized in Fig. 7B. Voxels preferring the same task exhibit a significantly stronger coupling between their response amplitude fluctuations than voxel pairs preferring a different task. This was the case for voxel pairs with one voxel lying in dlPFC and the other in IFS [t (11) ¼ 6.7, 1-tailed p < .0001], and also, but less strongly, for voxel pairs spanning the putamen and caudate nucleus [t (11) ¼ 1.9, 1-tailed p ¼ .0450]. Most importantly for the current research question, it was also clearly the case for voxel pairs straddling the cortical and in the striatum ROIs [t (11) ¼ 4.8, Cohen's d ¼ 1.39, 1-tailed p ¼ .0003]. This effect was found significant in the individual data of 11 out of 12 participants (cumulative binomial p ¼ 5.59 Â 10 À14 ). The stronger functional coupling between cortex and striatum in same preference profile voxel pairs was also evident when data from each task were analyzed separately [smallest effect for Switching: t (11) ¼ 3.6, Cohen's d ¼ 1.03, 1-tailed p ¼ .0021]. The results were also similar when in addition to mono-preference voxels also voxels were considered that preferred two tasks (e.g., [1 1 0], etc.) [t (11) ¼ 4.5, Cohen's d ¼ 1.29, 1-tailed p ¼ .0005].
To investigate the temporal stability of the task-specific couplings between the cortex and the dorsal striatum we also quantified functional coupling strength between the selected voxels when participants were at rest. By correlating the low-frequency signal fluctuations of individual voxel in a pair, we investigated whether the coupling strength was influenced by the taskepreference profiles of the voxels making up the pairs. The resting-state functional connectivity analysis for voxels with a significant activation preference for one task over the other two (mono-preference profiles) is summarized in Fig. 7C. When considering voxel pairs with one voxel located in the lateral prefrontal cortex and the other voxel in the dorsal striatum, we again find that pairs were both voxel have the same task preference show a significantly stronger low-frequency functional coupling compared to similar voxel pairs with different task preferences [t (11) ¼ 3.4, Cohen's d ¼ .98, 1-tailed p ¼ .0029]. The effect reached statistical significance (a ¼ .05) in 9 out of 12 participants (cumulative binomial p ¼ 3.74 Â 10 À10 ). The same result was also found for voxel pairs linking different subROIs of the lateral prefrontal cortex [t (11) ¼ 5.2, 1-tailed p ¼ .0001], as demonstrated before (Stiers & Goulas, 2018). In contrast, this same outcome pattern did not reach significance for voxels located c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 in different sub-ROIs of the striatum [t (11) ¼ .8, 1-tialed p ¼ .2273]. Similar results were also found when also voxels with a preference profile favoring two tasks were included in the analysis [For voxel pairs linking lateral PFC and dorsal striatum: t (11) ¼ 2.5, Cohen's d ¼ .73, 1-tailed p ¼ .0141].

Discussion
The results presented above support the idea that the striatum is an integrated part of the cognition network. Firstly, during cognitive performance, subsets of voxels distributed throughout the striatum show task-specific BOLD signal modulations that are reliably articulated over time. To the extent that BOLD modulations reflect at least post-synaptic potentials of neurons involved in task-relevant processing (Logothetis et al., 2001), this result suggests that a widely distributed, unique assembly of striatal neurons receives information that is relevant for a specific task. That these neurons exhibit temporally stable and task-specific functional couplings with similarly unique assemblies of lateral prefrontal cortex neurons further suggests that these local striatal assemblies are integrated into task-specific subnetworks that run through the macroscopic large-scale cognitive brain network.

Lower decoding accuracies in striatum than cortex
The fact that task context decoding yields significantly lower accuracies in the striatum than in the lateral prefrontal cortex mask (even after matching voxel size), can have different reasons. Control analyses exclude that it is due to lower signal quality (see Results, Section 2.4 and Supplementary information, Section SI1). One possible explanation is that striatal representations related to task contexts are potentially more dynamic than those of the lateral prefrontal cortex. It has been suggested that basal ganglia are particularly involved during initial phases of task performance (Helie et al., 2015;Muhammad et al., 2006). This would be in line with its commitment to habit formation (Balleine & O'Doherty, 2010;Graybiel & Grafton, 2015) and to training the neocortex for more automated task execution (Helie et al., 2015). Alternatively, there may be a shift over time in the particular region of the striatum that is engaged by a task as learning progresses. Such an activity shift from caudal to more rostral putamen has been observed during motor sequence learning (Reithler et al., 2010). Dynamic changes in the engagement of basal ganglia during cognition would diminish the accuracy of classifying task contexts, particularly across temporally remote tasks trials. Contrary to this prediction, we did not find a reduction in accuracies with time on task, not in the cortex and also not in the striatum. Moreover, we found no evidence in either ROI that the discriminative patterns generalized better to trials from runs near in time compared to trials from temporally more remote runs. Therefore, the lower accuracies observed in the striatum cannot be attributed to higher temporal dynamics in the task-specific activity patterns. Another possible explanation is the different cytoarchitectonic structure of the striatum compared to the prefrontal cortex. In contrast to the highly structured organization of the Fig. 7 e Voxel-wise functional coupling strength in relation to task preference of voxels. A) Frequency distribution of task preference profiles. Percentage of voxels across participants that was assigned to each of the task preference profiles in the bilateral lateral prefrontal cortex (black bars: IFS þ dlPFC) and in the bilateral dorsal striatum (white bars: caudate nucleus þ putamen). Preference type [1 0 0] groups voxels with a higher BOLD amplitude for the Eriksen task than for Backmatching or Switching, etc. (See Methods, Section 2.6.1 for further explanation). B) Functional coupling strength during task execution in voxels with preference for one specific task (i.e., [1 0 0], [0 1 0], and [0 0 1]). Average Fisher-Z transformed Pearson correlation between pairs of voxel that prefer the same task (light gray bars) compared to pairs of voxels preferring a different task (dark gray bars). The two voxels in each pair are always located in a different ROI, either in different sub-ROIs of the lateral prefrontal cortex (left), different sub-ROIs of the striatum (middle), or in a sub-ROI of lateral prefrontal cortex and striatum (right). C) Functional coupling during rest, unrelated to task execution. The graph structure and data presentation as the same as in B. Error bars indicate ±1 SE. One-tailed repeated measures ttest: *p < .05, **p < .005. c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 cortex in layers, columns, and stripes, the striatum's internal structure is dense in overlapping dendritic arborization. It has been estimated that approximately 380,000 cortical neurons innervate the dendritic area of a single medium spiny neuron (MSN) (Zheng & Wilson, 2002). Each of the cortical collaterals traverses a substantial portion of the striatum, making a few synaptic contacts with a large number of medium spiny neurons (and interneurons). Consequently, a highly unique pattern of divers input is required to activate each MSN (Wickens & Arbuthnott, 2010;Zheng & Wilson, 2002). Moreover, the output of MSNs is largely controlled by the sparse fast spiking interneurons that make strong perisomatic GABAergic connection to hundreds of surrounding MSNs (Koos & Tepper, 1999;Planert et al., 2010). These interneurons share with MSNs a largely common input from areas outside the striatum, but with a greater responsivity due to a larger number of synaptic contacts of each afferent fiber to each interneuron (Tepper, 2010). Consequently, the spatial segregation of changes in post-synaptic potentials induced by incoming signals during specific cognitive tasks may be less articulated in the striatum compared to the prefrontal cortex.

Involvement of striatal sub-regions in cognition
Different subparts of the larger ROIs contained sufficient taskspecific information to allow above chance decoding. For the lateral prefrontal cortex this is not surprising, because both the posterior and the dorsolateral sub-regions have already been demonstrated to allow classification of task relevant information (Waskom et al., 2014;Woolgar et al. 2011Woolgar et al. , 2015Woolgar et al. , 2016. A less expected finding was that all three subdivisions of the striatum also contain sufficiently distinct activity patterns in response to the three tasks. That the putamen supports task-execution could be expected. It is safe to assume that the tasks were well trained. Under these circumstances, the automated execution of the tasks is particularly carried by the more motor specific cerebral cortex and by the putamen (Reithler et al., 2010). It is somewhat surprising, therefore, that when classification was confined to putamen voxels, this led to a significant (albeit small) reduction in accuracy. One reason could be that the three tasks were more similar in the required motor responses. The fact that the decoding decrease could be attributed to the Eriksen versus Switching classification supports this hypothesis: in both these tasks the left and right key press responses reflected directional information. The caudate nucleus allowed for the highest classification accuracy, indistinguishable from the decoding level obtained for the striatum as a whole. This result suggests, together with the classification success obtained in the dlPFC subROI, that classification is not merely driven by motor planning and execution processes in the cortical-striatal motor loop. Our result agrees with the view that caudate nucleus, together with the lateral prefrontal cortex, constitutes the "cognitive" basal ganglia loop (Alexander et al., 1990;Haber, 2016). It is also in line with the finding reported by Ito and Doya (2015) that rat dorsomedial striatum has the highest proportion of active neurons during the task phase where the animal has to select the appropriate action. Least expected, however, was our finding that the ventral striatum contains as much task context specific information as the putamen. The ventral striatum is considered the input nucleus for the "limbic" or motivational loop through the basal ganglia, which is implicated in reward processing, reward based learning, and addiction (Haber & Knutson, 2010). Therefore, it may not be a surprise that VS is engaged during cognitive tasks from the perspective of neurophysiological studies in animals, in which rewards are an essential ingredient of the task paradigm, even in experiments that focus on neurocognitive processes. In these studies encoding of taskrelevant features has been reported in all three major nuclei of the striatum (Ito & Doya, 2015;. Paradigms to study cognition in humans typically do not include rewards, and in line with this fMRI studies of cognition rarely report activity in brain structures implicated in the reward system, such as the orbital and medial prefrontal cortex or the ventral striatum. However, this may be due to insensitivity of the mass univariate statistical approach employed in most fMRI studies. For instance, Daniel and Pollmann (2010) found that unrewarded cognitive trials did induced a BOLD response in the ventral striatum during task execution that correlated with measures of intrinsic motivation of the participants and that was similar to the larger response induced by trials that were rewarded with money, but without reaching statistical significance. This result suggests that purely informative feedback also has a reward value Pollmann 2010, 2014). In line with this, electrophysiological signals from a deep brain electrode implanted in the nucleus accumbens in patients with affective disorders showed frequency band specific coupling with frontal and parietal electroencephalogram signals during execution of a non-rewarded visual attention task. If it is true that cognitive feedback has a reward value, the limbic loop should be engaged during pure cognitive tasks and might be responsible for the adjustments in motivation and network efficacy following errors (Sarter et al., 2006). Whatever the content of the representations in VS, the modulations in ventral striatal activity induced by each task are sufficiently reliable and distinct to allow the classification of trials from their activity alone.

Subnetworks in cognition
The functional coupling analyses support our hypothesis that the striatum is integrated in the cognition network, because it shows that the subgroups of striatal voxels that are more engaged by a specific task have a reliable functional coupling with cortical voxels that are more active during the same task. These functional couplings persist beyond the time window of task execution, as they are also observed when participants are at rest. That neuron populations in distant regions form task-context specific assemblies has been shown in multiple cell recoding data. For instance, Buschman et al. (2012) found differences in beta-band synchronization between recording sites in the macaque lateral prefrontal cortex depending on which task rule was in operation. Frequency-band specific synchronization was also seen between the cortex and the striatum. Antzoulatos and Miller (2014) found in multielectrode recordings in monkeys that learning a two-class categorization of visual stimuli lead to an increase of the beta-band synchronization between lateral prefrontal cortex c o r t e x 1 5 6 ( 2 0 2 2 ) 1 0 6 e1 2 5 and caudate nucleus in the time window before responding. Moreover, after learning stronger beta-band synchrony was observed in local field potentials between specific pairs of prefrontal and striatal electrodes for presentation of one or the other category. These emerging cortical-striatal couplings could provide a neural basis for the subnetworks in BOLD coupling observed between lateral prefrontal cortex and striatum in the current study. However, our finding that the BOLD-signal couplings persist in the low frequency signals during rest suggests that they have stability over time. Lowfrequency activity coupling during rest is thought to reflect structural connections between neuronal populations that are established by repeated use (Margulies et al., 2009;Miranda-Dominguez et al., 2014). In this respect, it is worth noting that in nine of our 12 participants the resting state data were collected at the beginning of the second scanning session, which was after an interval of 5e15 days in eight of them. This suggests a longer temporal stability of the coupling patterns. Our study leaves it open for discussion what aspect of a task is represented by the neuron populations in different brain structures that make up the subnetwork for executing that specific task. We used three non-overlapping cognitive tasks, each with different stimuli, rules, cognitive demands, and response conditions. Any of these can have contributed to the classification of tasks, in some or all of the ROIs investigated. For the current purpose of establishing whether striatum is part of the subnetworks involved in cognition, it was sufficient to show that striatum is differentially engaged by, and preferentially coupled with cortical voxels processing specific concrete tasks. The conclusion from this study warrants future studies, in which carefully designed task paradigms allow disentangling what aspects of the tasks are represented by different groups of voxels in different subregions of the striatum, and whether further differentiation of cortical-striatal subnetworks is possible, specialized for processing specific aspects of the tasks. For instance, given the well-established differentiation of the corticostriatal circuits into functional sub-systems (Alexander et al., 1990;Haber, 2016), one could hypothesize that anterior frontal cortex and ventral striatum are stronger engaged by processes related to stimulus selection (Rushworth et al., 2012), lateral frontal cortex and caudate head represent task aspects relevant for behavioral goal selection (Hoshi, 2013;Yamagata et al., 2012), while medial motor cortex and putamen are typically involved in representing information relevant for action selection (Rushworth et al., 2012). Because the three tasks differ in these three aspects, the contribution of the three sub-ROIs in the striatum might just reflect such a functional differentiation. With these more detailed questions in mind, and the weaker decodability of task contexts in the striatum, future fMRI studies of basal ganglia's contribution to cognition might benefit considerably from a higher spatial imaging resolution. The current findings were obtained with data acquired at a resolution of 3.5 Â 3.5 Â 4.0 mm. At this relatively low resolution the voxel-to-voxel differences in activation during execution of a task is already large enough to obtain the results presented here. However, because these differential activations are thought to reflect variation in density of specialized neuron populations imaged by each voxel, smaller voxels will increase variations in neuron population densities across voxels. Smaller voxels imaged with the same SNR quality are therefore likely to yield larger differential activations of voxels in an active ROI, with higher detectability of differentiating patterns across and functional coupling between these voxels.

Conclusion
Taken together, the results presented support the hypothesis that cognition is not arising from a cortical network, but that its neural implemented is carried by cortical-striatalthalamic-cortical circuits. Furthermore, our data suggest that this implementation involves all the functional loops of this circuitry, not just the motor or cognition subsystems. This means that cognition must be conceived as a complex unity of motivational, cognitive, attentional and motor components that together constitute meaningful goal-oriented behavior. Lastly, our findings indicate that the conceptualization and examination of the brain as a set of areas and nuclei forming large-scale networks must be coupled with a more nuanced organization at the sub-areal, sub-nuclei level. This organization is already discernible at the voxel level in low field imaging, thus laying down specific hypothesis that will be better exploited with state-of-the-art high-field imaging.

Ethics approval
All procedures performed in this study have been approved by the local ethics committee of the institute and are in agreement with the 1964 Helsinki declaration and its later amendments.

Funding
Not applicable.

Open practices statement
No part of the study procedures and study analyses was preregistered prior to the research being conducted. We used established analytic methods for data analysis, the details of which are provided in the text. Stimulus presentation was implemented in Presentation software (www.neurobs.com). The presentation code no longer exists, however, due to backup hardware failure. The brain masks and custom code for analyzing data are available at https://osf.io/5sfc7, or to those interested after written request to the corresponding author.
No collected data were excluded.
The conditions of our ethics approval do not permit public archiving of individual anonymized study data. Readers who want access to the data should send a written request to the corresponding author, which should include a statement that the data is used for research purposes only, without commercial intentions, and that the current paper is referenced when the data is used in a scientific publication.

Consent to participate
All participants gave their informed consent to participate in the study.