Neurodevelopment of the incentive network facilitates motivated behaviour from adolescence to adulthood

The ability to enhance motivated performance through incentives is crucial to guide and ultimately optimize the outcome of goal-directed behaviour. It remains largely unclear how motivated behaviour and performance develops particularly across adolescence. Here, we used computational fMRI to assess how response speed and its underlying neural circuitry are modulated by reward and loss in a monetary incentive delay paradigm. We demonstrate that maturational fine-tuning of functional coupling within the cortico-striatal incentive circuitry from adolescence to adulthood facilitates the ability to enhance performance selectively for higher subjective values. Additionally, during feedback, we found developmental sex differences of striatal representations of reward prediction errors in an exploratory analysis. Our findings suggest that a reduced capacity to utilize subjective value for motivated behaviour in adolescence is rooted in immature information processing in the incentive system. This indicates that the neurocircuitry for coordination of incentivised, motivated cognitive control acts as a bottleneck for behavioural adjustments in adolescence.


INTRODUCTION
Goal-directed behaviour depends fundamentally on the capacity to attribute significance to stimuli in the environment and adapt performance accordingly. This does not only include choosing between available options but also deciding about how much effort and speed to dedicate to an action (Dayan, 2012). While acting too slow can result in a lost opportunity, acting too fast can lead to excessive opportunity costs. Thus, optimal goal-directed behaviour is an adjustment of the speed of a behavioural response (motoric vigour, Niv et al. (2007)) as a function of subjective relevance (Manohar et al., 2015). The ability for continuous, flexible behavioural adjustments to achieve goals is supported by cognitive control systems that can selectively improve performance by integrating motivational outcome values and available resources (Kool et al., 2017;Kouneiher et al., 2009;Mir et al., 2011).
Ample evidence suggests that motivated action depends on the interactions within cortico-striatothalamic networks. The prefrontal cortex supports complex cognitive control processes including action selection, performance monitoring, and feedback-based learning (Botvinick and Braver, 2015).
In turn, striatum and insula might have opponent roles in encoding the motivational value of cues ("expected value") and prediction errors in reward and loss avoidance contexts, respectively, to guide learning of action-outcome contingencies and to facilitate the selection of a candidate action in the premotor cortex (Averbeck and Costa, 2017;Niv et al., 2007;Palminteri et al., 2012). Prospective outcome has been linked to motivation by demonstrating that motoric vigour increases as a function of an unsigned expected value (Dudman and Krakauer, 2016;Manohar et al., 2015;Niv et al., 2007;Pessiglione et al., 2007;Rigoli et al., 2016). This delineates the importance of striatal and insular modulation of action selection as a function of motivational salience. In other words, the anticipated outcome value of a specific action may facilitate its selection and its ensuing execution. The signal integrated in the striatum passes via the basal ganglia to the thalamus and the cortex, where it guides motivated behaviour (Haber and Knutson, 2010).
The ability of selective exertion of cognitive control based on prospective outcomes to improve performance has been well-documented in adults (Chiew and Braver, 2016;Locke and Braver, 2008;Pfabigan et al., 2014;Wrase et al., 2007;Wu et al., 2014). Nevertheless, studies using incentivised tasks in adolescents have yielded a complex picture of neurodevelopmental patterns and their manifestation on a behavioural level (Davidow et al., 2018). Prior work has established a functional remodelling of this key circuit in the domains of decision-making (Barkley-Levenson and Galván, 2014;Cohen et al., 2010;Hauser et al., 2015;Van Den Bos et al., 2012;Van Den Bos et al., 2015), inhibitory control (Hallquist et al., 2018;Insel et al., 2017;Somerville et al., 2011), and incentive anticipation (Cho et al., 2013;Lamm et al., 2014) across adolescence. Influential work of Ernst et al. (2006), Steinberg (2010), and Casey et al. (2011) has suggested that prefrontal cortex maturation improves exertion of cognitive control to perform motivated behaviour and self-control adaptively.
However, there are conflicting reports on behavioural manifestations of the immature adolescent control system. While some studies demonstrated improvements in behavioural performance with reward in adolescents compared to adults (Cohen et al., 2010;Geier et al., 2009), other studies investigating reward and loss processing (Bjork et al., 2010;Cho et al., 2013;Joseph et al., 2016;Lamm et al., 2014) did not find age-specific differences in reaction time (RT), performance in inhibition (Paulsen et al., 2015;Strang and Pollak, 2014), or choices and loss aversion in decision-making (Barkley-Levenson et al., 2013). Recent studies reported selective performance improvements in reward and punishment contexts across development (Hallquist et al., 2018;Insel et al., 2017) that were linked to connectivity changes in the neural circuitry supporting motivation and salience processing. Those results are consistent with the idea that the deployment of cognitive resources supporting motivated behaviour emerges along with the maturation of cortico-striatal networks. These varying accounts indicate that more work is necessary to understand the functional architecture of the cortico-striatal system that supports the integration of control and value signals to shape behaviour during incentivised processing.
Here, we performed a functional magnetic resonance imaging (fMRI) study to investigate incentive processing with varying magnitude (low, high) and valence (reward, loss) across development (11-35 years) using a well-validated monetary incentive delay (MID) paradigm (Knutson et al., 2000). We utilised a computational learning model to investigate (1) factors affecting trial-by-trial response vigour of adults and adolescents and (2) model-based brain activity and effective connectivity patterns of expected value and prediction error in reward and loss contexts. We hypothesised that the immature cortico-striatal circuitry of adolescents would demonstrate less incentive-guided behavioural adaptation, i.e. slower reaction times for high incentives than in adults (Insel et al., 2017). Based on studies showing diminished striatal activity in adolescence during reward and loss anticipation (Bjork et al., 2010;Lamm et al., 2014), we predicted that activity in the ventral striatum (VS) correlates more strongly with the expected values in adults than in adolescents. In contrast, we predicted that increased striatal outcome sensitivity in youth (Bjork et al., 2010;Cohen et al., 2010) should result in stronger VS activity in response to reward prediction errors in adolescents. A weaker effective connectivity between the prefrontal cortex and the VS during anticipation in adolescents than in adults could be indicative of a protracted maturation of cortico-striatal circuits across adolescence.

Participants
We recruited a group of 67 subjects (age 21.4 ± 5.9 y, age range 11-35 y, 46 females and 21 males, 62 right-handed and 5 left-handed). Inclusion criteria comprised age 8-45 years and signed informed consent. Parents gave signed informed consent for subjects younger than 14 years old. Exclusion criteria comprised any MRI contraindication, pregnancy, a history of brain injury, a current psychiatric disorder, other major medical illnesses, and drug abuse. Three adolescent participants had a past diagnostic work-up for ADHD but they were currently symptom-free and were not taking any medication during the study. Data from all participants have been acquired and analyzed within the scope of a larger study including a clinical sample (Willinger et al., submitted). All participants were reimbursed for participation and informed about the opportunity to additionally win up to CHF 20 during the task. This study was approved by the ethics committee of the Kanton Zürich and was conducted in accordance with the Declaration of Helsinki.

Experimental design
In our study, we employed the Monetary Incentive Delay (MID, Figure 1) task to investigate motivational states and outcome processing (Knutson et al., 2000). This task allows the investigation of incentive anticipation and the ensuing feedback processing, while minimizing possible cognitive confounds due to the simple decision processes (Oldham et al., 2018). Every trial started with a cue indicating the level of magnitude (CHF 1, CHF 4) and the valence (reward, loss-avoidance, null) for a button press on time ("hit"). Participants were instructed to use the index finger of their dominant hand to press a button on a two-button fibre-optic response pad (Current Design Inc., Philadelphia, PA) as soon as the go-signal target symbol, a star, appeared. Each cue was presented 24 times (i.e. 120 trials in total, mean stimulus onset asynchrony = 8500 ms, range 7750-10750 ms), in two separate MRI runs. We used an adaptive algorithm that adjusted the presentation times of the target to the response time of the participant to ensure a hit rate of ~66%. The cue symbols indicating valence (square, triangle, and circle) were counterbalanced across subjects. The level of magnitude was represented by using a full symbol for high magnitude (CHF 4) and an empty symbol for low magnitude (CHF 1). All participants had a short training session outside the scanner (~2 minutes) to become familiarised with the task and we ensured that cue-outcome contingencies were understood.
The task was implemented in Python (pygame, https://www.pygame.org) and presented using video goggles (VisuaStimDigital, Resonance Technology, Northridge, CA) with a resolution of 800 × 600 px.

Figure 1 (caption fragment): [...] feedback processing. In the beginning of each trial, cues indicated level of magnitude (low, high) and valence (reward, loss, null) of the possible outcome. After a variable delay, a target (star) was presented as the go-signal, to which subjects were instructed to respond as fast as possible. After the next fixation period, the actual outcome was presented for 1500 ms. Altogether, the task comprised 24 trials per cue, i.e. 120 trials in total. An adaptive algorithm ensured a hit rate of ≈ 66%.
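The source does not specify the adaptive algorithm. As an illustration only, the following minimal sketch assumes a weighted up/down staircase: the target window is shortened by one step after a hit and lengthened by two steps after a miss, which settles at a hit rate of 2/3. The function names, step size, bounds, and simulated response-time distribution are all hypothetical.

```python
import random

def update_target_duration(duration_ms, hit, step_ms=10.0,
                           min_ms=100.0, max_ms=1000.0):
    """Weighted up/down rule (assumed, not the authors' algorithm):
    shorten the window after a hit, lengthen it twice as much after a miss.
    At equilibrium p(hit) * step = (1 - p(hit)) * 2 * step, so p(hit) = 2/3."""
    if hit:
        duration_ms -= step_ms
    else:
        duration_ms += 2.0 * step_ms
    return max(min_ms, min(max_ms, duration_ms))

def simulate_hit_rate(n_trials=5000, mean_rt_ms=300.0, sd_rt_ms=40.0, seed=1):
    """Simulate a participant with Gaussian response times (hypothetical values)."""
    rng = random.Random(seed)
    duration = 400.0
    hits = 0
    for _ in range(n_trials):
        rt = rng.gauss(mean_rt_ms, sd_rt_ms)
        hit = rt <= duration          # response inside the target window
        hits += hit
        duration = update_target_duration(duration, hit)
    return hits / n_trials
```

Calling `simulate_hit_rate()` should return a value close to 0.66 for any stable response-time distribution, because the ratio of the two step sizes, not the participant's speed, determines the equilibrium.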
Subjective liking and arousal for rewards and losses were assessed after the MRI scan outside the scanner. Participants were presented with the amount of money they were able to win in each condition and they were asked to rate their (1) liking ("How much did you like this outcome?") and (2) arousal ("How excited were you by the outcome?") during the feedback phase of the respective outcome on a continuous scale using a slider between 0 (strongly dislike, not aroused) and 100 (strongly like, highly aroused).

MRI data acquisition and preprocessing
MRI recordings were conducted on an Achieva 3T scanner (Philips Medical Systems, Best, the Netherlands) using a 32-channel head coil array. Functional images were acquired with a multi-slice [...] were addressed by calculating the framewise displacement (FD) of each subject across the task (Power et al., 2012). No subject exceeded a mean FD of 0.5 mm (M = 0.17, SD = 0.08); however, single volumes that exceeded an FD of 1 mm were censored in the ensuing analyses by including an additional binary regressor (% volumes censored per subject: M = 0.65, SD = 1.58).
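For illustration, framewise displacement in the sense of Power et al. (2012) can be sketched as the sum of absolute volume-to-volume changes in the six realignment parameters, with rotations converted to millimetres on a 50 mm sphere. The function names are hypothetical, and the input layout (translations in mm, rotations in radians) is an assumption rather than a description of the authors' pipeline.

```python
def framewise_displacement(params, head_radius_mm=50.0):
    """params: one [tx, ty, tz, rx, ry, rz] list per volume
    (translations in mm, rotations in radians; assumed layout).
    Returns the FD per volume; the first volume is defined as 0."""
    fd = [0.0]
    for prev, cur in zip(params, params[1:]):
        trans = sum(abs(c - p) for c, p in zip(cur[:3], prev[:3]))
        # Arc length on a sphere: rotation (rad) times radius (mm).
        rot = sum(abs(c - p) * head_radius_mm
                  for c, p in zip(cur[3:], prev[3:]))
        fd.append(trans + rot)
    return fd

def censor_regressor(fd, threshold_mm=1.0):
    """Binary nuisance regressor marking volumes whose FD exceeds the threshold."""
    return [1 if value > threshold_mm else 0 for value in fd]
```

For example, a volume that shifts 0.2 mm and rotates 0.02 rad relative to its predecessor has an FD of about 1.2 mm and would be flagged at the 1 mm threshold.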

Behavioural analysis of raw data
We performed the raw data analysis on log transformed RTs to achieve a more normally distributed data set. We conducted a linear mixed model analysis with random intercept and the five task conditions (high reward, low reward, neutral, low loss, and high loss) and age as fixed factors.
Significant main or interaction effects were subsequently analysed using post hoc Tukey tests.
Response data that deviated more than three standard deviations from the respective mean per condition and per subject were excluded from the analysis (1.90%).
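The exclusion step can be sketched as follows. This is a minimal illustration that assumes the three-standard-deviation criterion is applied to log-transformed RTs within each subject-by-condition cell; the text does not state whether raw or log RTs were used, and the field names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def exclude_outliers(trials, n_sd=3.0):
    """trials: list of dicts with keys 'subject', 'condition', 'rt_ms'.
    Returns the trials with an added 'log_rt', dropping those further than
    n_sd standard deviations from their subject-by-condition mean log RT."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t["subject"], t["condition"])].append(math.log(t["rt_ms"]))
    kept = []
    for t in trials:
        values = cells[(t["subject"], t["condition"])]
        log_rt = math.log(t["rt_ms"])
        if len(values) > 1:
            m, s = mean(values), stdev(values)
            if s > 0 and abs(log_rt - m) > n_sd * s:
                continue  # outlier: excluded from the analysis
        kept.append({**t, "log_rt": log_rt})
    return kept
```

The cleaned log RTs would then enter the mixed model with condition and age as fixed effects and a random intercept per participant.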
Similarly, we analysed the subjective liking ratings of the monetary value in the feedback using a linear mixed model with condition and age as fixed factor, and participants as random factor. Extreme ratings were excluded (+/-3 SDs, four ratings in total). We excluded eight subjects from this behavioural analysis for not completing the subjective ratings due to time constraints.
The behavioural analysis was conducted in R (version 3.5.3, The R Foundation for Statistical Computing, http://www.r-project.org/index.html) using the package lme4. The significance level for all statistical tests of the behavioural analyses was p < 0.05, two-tailed. In mixed models, we used the Satterthwaite approximation for the degrees of freedom.

Computational learning model
We adapted the Rescorla-Wagner model (Rescorla and Wagner, 1972) to compute different signals of interest across trials.
After cue presentation, it has been observed that activity in dopaminergic brain regions correlates with an expected value (O'Doherty et al., 2003). During receipt or omission of reward or loss, respectively, prediction errors are thought to be teaching signals that enable the adaptation of future behaviour to optimize outcome and continue to be computed even when behaviour is already highly trained (Bayer and Glimcher, 2005). To disentangle effects of loss and reward, we defined two different signals, EV+ and EV−, based on the current cue [equations missing]. The probability of a miss was p(miss) = 1 − p(hit), with p(hit) ≈ 66%.
Here, EV+ represents an expected reward, whereas EV− represents an expected loss, dependent on the subjective probability for a reward, p, and loss, (1 − p), and the possible outcome.
Depending on the actual outcome in the trial, reward (PE+) and loss (PE−) prediction error signals were calculated as [equations missing]. The update rule for the hit probability in the subsequent trial was given by [equation missing], where α was a free parameter and corresponded to the learning rate, constrained to the boundaries 0 and 1, and δ represented the signed prediction error (i.e. it reflects merged PE+ and −PE−). In addition, average reward and loss at each trial were defined as [equations missing], where r and l represent the actual and r̄ and l̄ the average reward or loss at trial t. The trajectories resulting from the learning model were then used to generate trial-by-trial predictions of logRTs in the response model (Figure 2).

Figure 2 (caption fragment): (b-d) Trial-by-trial analysis of response time revealed moderators of vigour in our cohort. Learning rate decreased across age (b), i.e. adolescents changed their predictions about expected outcomes faster. Moreover, we found an age-related increase of response vigour in trials with higher cue salience (c) and in post-error trials (d). r, Pearson correlation coefficient.
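Because the model equations are not preserved in this text, the following is only a generic Rescorla-Wagner-style sketch of the quantities described above, not the authors' exact formulation. It assumes that losses are coded as positive magnitudes, that the expected values are the product of the subjective probability and the cue magnitude, and that the hit probability is updated with the magnitude-normalised prediction error (hit − p). All names, including `ev_pos`, `ev_neg`, `pe_pos`, and `pe_neg`, are hypothetical.

```python
def rescorla_wagner(trials, alpha=0.1, p_hit=0.5):
    """trials: list of (valence, magnitude, hit) tuples with valence in
    {'reward', 'loss', 'null'}, magnitude in CHF (positive), hit a bool.
    Returns one dict per trial with the assumed learning-model signals."""
    signals = []
    for valence, magnitude, hit in trials:
        is_reward = valence == "reward"
        is_loss = valence == "loss"
        # Expected values given the current subjective hit probability.
        ev_pos = p_hit * magnitude if is_reward else 0.0
        ev_neg = (1.0 - p_hit) * magnitude if is_loss else 0.0
        # Actual outcomes: reward gained on a hit, loss incurred on a miss.
        reward = magnitude if (is_reward and hit) else 0.0
        loss = magnitude if (is_loss and not hit) else 0.0
        pe_pos = reward - ev_pos if is_reward else 0.0
        pe_neg = loss - ev_neg if is_loss else 0.0
        signals.append({"p_hit": p_hit, "ev_pos": ev_pos, "ev_neg": ev_neg,
                        "pe_pos": pe_pos, "pe_neg": pe_neg})
        # Delta rule on the hit probability; alpha is the learning rate in [0, 1].
        p_hit += alpha * ((1.0 if hit else 0.0) - p_hit)
    return signals
```

Under these assumptions a hit on a reward trial yields a positive reward prediction error and a miss on a loss trial yields a positive loss prediction error, so merging PE+ with −PE− gives a single signed error that is positive after hits and negative after misses.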

Response model
It remains largely unclear how expected value and prediction error signaling affect response vigour in reward and punishment contexts. Thus, we sought to identify the latent factors that could explain the observed response time data best by comparing five different plausible response models. All models assume that the logRT is a linear combination of individual task-related parameters and a constant term. Given the results from the raw data analysis (main effect of condition) and the results from previous work (Dudman and Krakauer, 2016), we strongly expected the values EV+ and EV− to modulate response vigour in our subjects. We therefore included these terms in all response models.
In addition, we included a linear function to model any drift across task duration. We created different response models and used Bayesian model comparison for the formal assessment of additional factors affecting response vigour.
First, as previous work has shown that average reward rate is related to tonic dopamine and could boost vigour across task trials (Beierholm et al., 2013;Niv et al., 2007), we tested if average reward and loss rates were additional predictors for the logRT in the MID task (M1, equation 8). Secondly, there is the possibility that dopaminergic release by reward prediction errors affects subsequent performance (Bestmann et al., 2014). In addition, loss prediction errors might be signalled differently and could modulate vigour on the next trial through a different mechanism (Lawson et al., 2014).
Therefore, we tested if independent influences of reward and loss expected values and prediction errors on response vigour would account for the observed data better (M2, equation 9). Other research has indicated that cue salience (i.e. unsigned expected value) and novelty (i.e. unsigned prediction error) can influence dopaminergic activity (Bunzeck and Düzel, 2006). In contrast to the expected value, cue salience is the unsigned quantity that represents the amount of attention a cue will draw. Here, cue salience is the absolute value of the potential outcome, independent of the outcome being reward or punishment (Kahnt and Tobler, 2017). Similarly, the novelty parameter represents the unsigned, magnitude-dependent surprise over changes in reward or punishment contingencies, respectively.
Thus, novelty tends to be high for high-magnitude reward omissions or losses after the cue-outcome association has been established. We therefore created three additional response models where cue salience and novelty (M3, equation 10), valence-dependent expected values and novelty (M4, equation 11), or cue salience and reward and loss prediction errors (M5, equation 12) served as predictors for logRT. In addition to these predictors, the model equations contained a binary vector of trials after an error (i.e. trials where the button press occurred too slowly or too early), a binary vector of successive presentations of equal cues, and a Gaussian noise term.
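Since equations 8-12 are not preserved in this text, the structure of such a response model can only be illustrated generically. The sketch below assumes that a model of the M3 type predicts logRT as a constant plus weighted cue salience, novelty, post-error and cue-repetition indicators, and a linear drift term. The coefficient ordering and all names are hypothetical.

```python
import math

def predict_log_rt(beta, salience, novelty, post_error, cue_repeat, trial_index):
    """Linear response model (assumed form): constant plus weighted predictors.
    beta = (b0, b_salience, b_novelty, b_post_error, b_repeat, b_drift)."""
    b0, b_sal, b_nov, b_err, b_rep, b_drift = beta
    return (b0 + b_sal * salience + b_nov * novelty
            + b_err * post_error + b_rep * cue_repeat + b_drift * trial_index)

def gaussian_log_likelihood(observed_log_rt, predicted_log_rt, noise_sd):
    """Log-likelihood of one observed logRT under Gaussian noise."""
    z = (observed_log_rt - predicted_log_rt) / noise_sd
    return -0.5 * z * z - math.log(noise_sd * math.sqrt(2.0 * math.pi))
```

A negative salience weight in this form means faster responses (higher vigour) for cues with larger unsigned expected value, which is the kind of effect the model comparison is designed to detect.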

Behavioural model fitting and model comparison
The behavioural models were fitted to the data using the TNU Algorithms for Psychiatry-Advancing Science (TAPAS, http://www.translationalneuromodeling.org/tapas) HGF Toolbox 5.3, using a quasi-Newton optimization algorithm. Priors of the response models were set to the defaults of the TAPAS toolbox, while the prior for the learning rate was set based on previous studies that showed a low learning rate in similar tasks (Beierholm et al., 2013; Table 1).
Trials without response were excluded for the model fitting procedure. For model comparison, we used Bayesian Model Selection (spm_BMS.m) to choose the best-fitting model by comparing the negative free energies, an approximation to the log-model evidence. Herein, we report the exceedance probability (XP) of each model, i.e. the probability that one model explains the data better than the other models, and the posterior probability (PP) of each model.
Subsequently, we were interested in whether the parameters of the winning behavioural model correlate with age. We employed the procedure of Benjamini and Hochberg (1995) to control the false discovery rate (FDR; adjusted pFDR < 0.05) for the correlation analyses between age and the model parameters. As developmental trajectories might show a nonlinear pattern, we also compared if the individual model parameters followed a linear, quadratic or inverse-age function (Supplement).
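The Benjamini-Hochberg adjustment used for the age correlations can be sketched in a few lines. This is a generic implementation of the step-up procedure, not the authors' code; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.
    Each sorted p-value is scaled by m / rank, then made monotone by taking
    the running minimum from the largest p-value downwards."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A parameter-age correlation would then be reported as significant when its adjusted value falls below 0.05.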

Simulation analyses and parameter recovery
We performed posterior predictive checks to assess the reliability of the behavioural model by mirroring the raw data analysis with simulated logRT data to see if we can replicate meaningful effects in our data. Based on the estimated individual parameters from the best-fitting model, we ran 1000 simulations per parameter set obtained for each subject and averaged the simulated trial-by-trial logRT using TAPAS. Lastly, parameter recovery was performed by estimating parameter values from the simulation data. Model performance was assessed by evaluating correlations between the predicted parameter values and the parameter values from the observed data.
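The logic of parameter recovery can be illustrated with a deliberately simplified stand-in for the full model: data are simulated from known parameter values, the parameters are re-estimated from the simulated data, and generating and recovered values are correlated. The sketch below uses a one-predictor linear response model fitted by ordinary least squares instead of the TAPAS inversion; all names and numerical settings are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def recovery_check(true_slopes, n_trials=120, noise_sd=0.1, seed=0):
    """Simulate logRTs per 'subject' from a known salience slope,
    re-estimate the slope, and correlate true with recovered values."""
    rng = random.Random(seed)
    recovered = []
    for slope in true_slopes:
        salience = [rng.choice([0.0, 1.0, 4.0]) for _ in range(n_trials)]
        log_rt = [6.0 + slope * s + rng.gauss(0.0, noise_sd) for s in salience]
        recovered.append(ols_slope(salience, log_rt))
    return pearson_r(true_slopes, recovered)
```

A correlation close to 1 indicates that the parameter is identifiable with the given number of trials and noise level; lower values suggest that individual differences in the parameter should be interpreted with caution.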

Model-based fMRI -GLM analysis
The goal of the fMRI analysis was to identify reward and loss related signals during anticipation and outcome processing that covary with age. In the first-level analysis, we created a general linear model (GLM) for each participant. The cue onsets were convolved with the haemodynamic response function (HRF), and the EV+ and the |EV−| values were added as parametric modulators, representing the expected outcomes based on previous experience. Secondly, the feedback onsets convolved with the HRF were added to the model with PE+ and PE− serving as parametric modulators. Note that the neutral condition was the unmodulated case and thus the reference in both the anticipation and the feedback case. In addition, we added the temporal and dispersion derivatives of each regressor, the six realignment parameters, and a vector for scans with FD > 1 as nuisance regressors to the model.
The derivative terms were included to improve model fit on the first level by decreasing the residual error, allowing better identification of active voxels during the time course extraction (Cignetti et al., 2016). Finally, we applied a 1/128 Hz cut-off high-pass filter to eliminate low frequency drifts.
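To make the notion of a parametric modulator concrete, the sketch below builds one regressor by weighting each event onset with its mean-centred model value and convolving with a double-gamma haemodynamic response function. The HRF shape parameters (6 and 16, ratio 1/6) follow the commonly used canonical form; the function names, the repetition time argument, and the mean-centring step are assumptions rather than a description of the SPM12 internals used in the study.

```python
import math

def canonical_hrf(t):
    """Double-gamma HRF at time t (seconds): a gamma density with shape 6
    minus one sixth of a gamma density with shape 16 (unit scale)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def modulated_regressor(onsets_s, values, n_scans, tr_s):
    """Parametric-modulator regressor sampled once per scan.
    onsets_s: event onsets in seconds; values: trial-wise model values."""
    mean_value = sum(values) / len(values)
    centred = [v - mean_value for v in values]
    regressor = []
    for scan in range(n_scans):
        t_scan = scan * tr_s
        regressor.append(sum(c * canonical_hrf(t_scan - onset)
                             for onset, c in zip(onsets_s, centred)))
    return regressor
```

Because the modulator is mean-centred, it captures only trial-to-trial variation around the average cue response, which is carried by the unmodulated onset regressor.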
In the random effects group analysis we conducted four multiple regression analyses, where individual contrast images for EV+/|EV−| and PE+/PE− served as dependent variables. These second-level models included the group mean, age, sex, and the interaction term age × sex as predictors. We used t-contrasts to test the individual effects for significance. Two participants with poor behavioural model fit were included using the prior expectation of the learning model parameters. In addition, we conducted conventional GLM analyses contrasting the different conditions using cue and feedback onsets. The results of the additional analyses are shown in the supplement (Table S4-S5, Figure S4).
We report results from the whole-brain analysis using cluster-level family-wise error correction (pFWE < 0.05) with a cluster-defining threshold of (pCDT<0.001). All fMRI analyses were conducted in SPM12 (7487).

Dynamic causal modelling
To assess how these age-dependent changes emerged on a network level, we conducted a dynamic causal modelling (DCM) analysis. DCM has been demonstrated to be better able to separate age-related vascular from neural changes than functional connectivity measures (Tsvetanov et al., 2016), rendering it a useful tool for studying the developing brain.
In DCM studies, normally a model space is specified, in which individual models represent specific hypotheses about the functional architecture of the brain. The models within the model space can then differ in either the presence or absence of an intrinsic connection or the contextual modulation of a connection. However, in our study, the goal was not to find the best model structure. Instead, our goal was to assess how connectivity strengths (1) are altered under different contextual manipulations (i.e. processing EV+ or |EV−|) and (2) are modulated by development, i.e. how they change across age. For this, we harnessed recent methodological improvements of DCM analysis in the framework of parametric empirical Bayes (PEB) to estimate connectivity parameters in the incentive processing circuitry. In the first-level analysis, we iteratively estimated the full model of each participant within an empirical Bayesian inversion scheme that uses the group average parameter estimates as priors for the estimations in the next iteration (Zeidman et al., 2019). After the inversion of the full model for each participant, we performed a second-level analysis using a PEB model to determine the group average and the age effect for each connectivity parameter, separately for intrinsic and modulatory connections. Based on the results from the GLM analyses, we created a PEB model that included the group mean and the mean-centred age.
We performed an additional PEB analysis, in which we split adolescents and adults into two groups (split at age 19 y, n(adolescents) = 32, n(adults) = 33) and report the results in Table S6. We applied Bayesian model reduction to perform an automatic search over reduced PEB models and iteratively removed model parameters that did not contribute to the evidence. Finally, we performed Bayesian model averaging of the best PEB models by averaging their parameters weighted by the model evidence. We report the posterior probabilities of the model with as compared to the model without the respective parameter. The significance threshold for the posterior probability was set to > .95. Leave-one-out cross-validation was used to assess whether the model parameters possessed predictive validity for the age of participants.
The selection of regions in each individual was guided by findings from previous studies and the results from our GLM analyses. As our main research question pertained to developmental changes of connectivity in cortico-striatal regions, we selected five regions that play a significant role in incentive processing and are hypothesised to change their connectivity patterns throughout development (Cho et al., 2013;Insel et al., 2017;Van Den Bos et al., 2015). Thus, we chose one bilateral striatal, one bilateral thalamic and three cortical regions that spanned the network of interest. The choice of the left PFC was based on findings that showed its involvement in modulating the dopaminergic system in motivational contexts (Ballard et al., 2011;Spaniol et al., 2015).
For the VS, we used an anatomical mask derived from the Harvard-Oxford atlases and adjusted it for the effects of interest. One subject was excluded from the DCM analysis, as we did not find any active voxels surpassing our threshold in the thalamus. For the DCM analysis, the two scan sessions were concatenated with SPM. We added an additional nuisance regressor to the concatenated model that modelled the volumes at session transition. All stimulus cues were entered as driving input in the thalamus. The full model comprised a fully connected cortico-thalamic network that projected unidirectionally to the VS. In turn, the VS had one main output to the thalamus, modelling the principal anatomy of cortico-striatal-thalamic loops (Haber and Knutson, 2010). EV+ and |EV−| were allowed to modulate the self-connections of the VS, the insula, and the dACC. Limiting task modulation to the self-connections allows a straightforward biological interpretation of the modulatory parameter estimates, namely the change in synaptic gain for a given task context (Zeidman et al., 2019).
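The architecture of the full model described above can be written down as binary connectivity masks. The sketch below is an illustration in plain Python rather than SPM's DCM specification; the region labels (VS, thalamus, insula, dACC, lPFC) are inferred from the regions named in the text, and the variable names and the a[target][source] convention are assumptions.

```python
REGIONS = ["VS", "thalamus", "insula", "dACC", "lPFC"]

def build_dcm_structure():
    """Return (a, b, c): intrinsic connections a[target][source],
    modulatory masks b[name][target][source], and driving inputs c[region]."""
    n = len(REGIONS)
    idx = {name: i for i, name in enumerate(REGIONS)}
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1                          # self-connections
    network = ["thalamus", "insula", "dACC", "lPFC"]
    for src in network:
        for tgt in network:
            a[idx[tgt]][idx[src]] = 1        # fully connected cortico-thalamic network
        a[idx["VS"]][idx[src]] = 1           # unidirectional projections to the VS
    a[idx["thalamus"]][idx["VS"]] = 1        # single VS output, to the thalamus
    b = {}
    for modulator in ("EV+", "|EV-|"):
        mask = [[0] * n for _ in range(n)]
        for region in ("VS", "insula", "dACC"):
            mask[idx[region]][idx[region]] = 1   # modulation of self-connections only
        b[modulator] = mask
    c = [1 if name == "thalamus" else 0 for name in REGIONS]  # cues drive the thalamus
    return a, b, c
```

Restricting the modulatory masks to the diagonal mirrors the design choice stated in the text: each modulatory estimate can then be read as a change in the self-inhibition, and hence the gain, of one region in a given task context.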

Data and code availability
All relevant anonymised data and code used to generate results are available from the authors on request in accordance with the requirements of the cantonal ethics board.

RESULTS
First, we examined the log reaction time (logRT) and the accuracy of the 67 participants who performed the MID task in the scanner (Figure 1). The hit rate across participants was 61.8% (SD = 2.0%) and thus close to the hit rate of 66% that we aimed for in the task design (Table 2).

Modelling response vigour
We employed a computational reinforcement learning model that predicted logRTs for each trial to assess how response vigour was modulated across the task. This modelling approach extends the standard analysis by allowing us to track individual representations of reward and loss and their respective modulation of response vigour at each trial. We used a Rescorla-Wagner-like model (Rescorla and Wagner, 1972), where expected values of reward (EV+) and loss (EV−) were updated according to reward (PE+) and loss (PE−) prediction errors weighted by the learning rate (α). Then, we defined several alternative response models that described the mapping of the variables derived from [...] (Table S1). Moreover, an additional analysis showed that for different age groups (11-16y, 17-22y, 23-27y, >26y) model M3 performed best in all groups (Table S2). This means that in our task (1) average reward and loss rates and (2) (signed) prediction error signalling did not contribute to explaining the response data. Based on this result, we used the trial-by-trial predictions of the best model in the subsequent fMRI analysis.
To check the model fit of our behavioural model, we performed a linear mixed model analysis for simulated logRTs of the winning model. For this, we excluded two participants for whom the model fitting procedure did not converge. This analysis revealed effects comparable to the behavioural effects observed in the data. As in the raw data analysis, we found a significant main effect of condition, F(4, 7727) = 5.39, p = 0.0002, an age-by-condition interaction, F(4, 7727) = 8.59, p < 10^-6, but no main effect of age, F(1, 63.1) = 0.26, p = 0.61. The model captured the differences between the conditions very well and reproduced effects found in the raw data analysis, namely that in the high loss condition participants increased vigour compared to the neutral, low loss and low reward conditions (p < 0.0001) and that they responded faster with high rewards at stake compared to the neutral, low loss and low reward conditions (p < 0.0001). Simulated mean logRTs also correlated significantly with age in the high reward condition (r(63) = -0.300, p = 0.03). Parameter recovery using the simulated data was good to excellent (Supplement).

Response model parameters are related to age
To see if there is a relationship between the model parameters and (1) [...] (see Figure S1 for all parameter correlations). We did not find any correlation between behavioural parameters and post-scan outcome liking ratings (all p > .05). The response model parameters were only moderately correlated across age with all absolute r < 0.562 (Figure S2). The correlation of the learning rate and the cue salience parameter was significant (r(63) = 0.302, p = 0.01). There was no evidence that quadratic or inverse-age models fitted the individual parameters better than the linear model (∆ < 6.2 for all model comparisons, Table S1) and therefore they were not investigated further. The parameters of the winning behavioural model are summarised in Table S1.

Incentive valuation remains constant across development
Additionally, we performed an analysis of post-scan ratings, to assess whether any age-related behavioural differences are related to different incentive valuation.

Cue salience representation in incentive networks changes with age
Using fMRI, the first key question we sought to answer was whether representations of expected value and prediction error vary across age. We carried out parametric whole-brain analyses using the computed signals from the behavioural analysis as (nonorthogonalised) predictors for the BOLD signal to examine how they modulate brain activity during the task. The regressors of interest were the parametric modulators for expected value (EV+/|EV−|) during the anticipation phase and for prediction errors (PE+/PE−) during the outcome phase. The resulting first-level maps were entered into separate multiple regression analyses to determine the effects of age, sex and age × sex on neural signatures of expected value and prediction error processing.
The average effect of the + signal was significant in the bilateral VS, the left ventrolateral prefrontal cortex, anterior cingulate, bilateral angular gyri, the right insula, middle temporal gyrus, the thalamus and the cerebellum (Figure 3, Table S3). These effects were not modulated by age or sex. On the other hand, an average effect of | − | was observed in the right ventral striatum, thalamus, anterior cingulate, supplementary motor area, postcentral gyrus, lingual gyrus and fusiform gyrus ( Figure 3, Table S3). Age had a significant positive effect on | − | activity in a cluster comprising dorsal anterior cingulate and supplementary motor cortex, as well as the bilateral prefrontal cortex, the bilateral insulae, the supramarginal gyrus and the occipital fusiform gyrus. No significant effect of sex was observed. Reward prediction error signals showed a significant age-by-sex interaction (right panel), with all clusters showing an age-dependent decrease in females and an increase in males.

Prediction error signaling depends on age-by-sex interaction
Next, we examined where reward or loss prediction error signalling during outcome receipt is encoded in the brain and whether its representations differ across age. Representations of the reward prediction error + at the outcome presentation were detected in the bilateral ventral striatum, the bilateral caudate nuclei, the ventromedial prefrontal cortex, the left dorsolateral prefrontal cortex, supplementary motor area, a cluster spanning dorsal hippocampus and the lateral thalamus, and the occipital cortex (Figure 3, Table S3). We found that + activity was positively correlated with age in the bilateral fusiform gyrus, but, contrary to our hypothesis, no negative correlation was found in the VS. However, we observed a significant age-by-sex interaction in the right ventral striatum and the superior temporal gyrus. In particular, older females exhibited reduced activity related to + , while in males the activity increased. The average effect of the loss prediction error − was located in the bilateral ventral striatum peaking in the putamen, bilateral caudate nucleus, anterior cingulate, bilateral posterior orbital gyri, bilateral anterior insula, thalamus, and pre-/postcentral gyri. In addition, a main effect of sex was found in the supramarginal gyrus (Figure 3, Table S3).

Fine-tuning of cortico-striatal connectivity from adolescence to adulthood
Results from the behavioural and whole-brain analyses indicated that during reward and loss anticipation there is a significant effect of age on (1) response vigour to salient cues (but not prediction error signalling) and (2) activity in core regions of the incentive processing circuitry during the anticipation phase. Given this association of age with response vigour and neural responses to cue salience, we evaluated how age differences in processing reinforcement learning signals manifested in the incentive processing network during the anticipation of incentives. For this, we performed an analysis of effective connectivity (dynamic causal modelling, DCM; see Materials and methods) that determined the model that best fitted the neural dynamics.
We estimated each first-level DCM and analysed the (1) group average and (2) the effect of age on each connection with a second-level Parametric Empirical Bayes (PEB) model. Then, we examined the effect on the average connectivity between regions and the self-inhibition parameters of the DCM.
In the DCM framework, self-inhibition parameters reflect a region's sensitivity to inputs for a given task context. The averaged connectivity strength of each connection is presented in Table 3. Note that some connections were removed in the Bayesian model reduction procedure, as they did not contribute to the model evidence (Friston et al., 2016). Across all participants, we found that the input region, the thalamus, has an excitatory influence on all other regions in our modelled network. The VS received input from the thalamus, the insula and the lateral prefrontal cortex. In addition, we found inhibitory connectivity from the striatal region to the thalamus. Connections originating in the insula showed negative connectivity to the thalamus, the VS, and the lateral prefrontal cortex (LPFC).
Dorsal anterior cingulate connectivity targeted the thalamus, the LPFC and the insula. The LPFC exhibited negative efferent connectivity to the thalamus, the dorsal anterior cingulate cortex (dACC) and the insula, and positive efferent connectivity to the VS. In the VS and the dACC, we found significant modulatory effects of + and | − |. In the dACC, the self-inhibition correlated positively with + and negatively with | − | (Figure 4).
We found a significant increase of connectivity with age from the LPFC to the VS and in the self-inhibition of the insula. The negative connectivity from the VS to the thalamus (THL) became less inhibitory with age. A decrease of effective connectivity with age was found in the connection from the thalamus to the dACC. These results indicated that the cortico-striatal-thalamic circuitry is fine-tuned with age (Figure 4). Using leave-one-out cross-validation (LOOCV), we assessed whether these effects were predictive of the age of an independent subject, i.e. we fitted the PEB model to all but one subject to obtain the model parameters and used the effective connectivity of the left-out subject to predict their age. That is, we assessed whether we could predict the age of an independent subject given only their intrinsic connectivity. As the correlation between the estimated and the actual age was significant, r(64) = 0.26, p = 0.02, we can expect that new subjects would exhibit the same association based on the model parameters of this cohort.
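The leave-one-out logic described above can be illustrated with a minimal sketch. It is not the PEB-based procedure used in the analysis: it assumes a single connectivity parameter per subject, replaces the PEB model with ordinary least squares, and uses hypothetical toy data and function names throughout.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den


def loo_predict(connectivity, age):
    """Predict each subject's age from a model fitted to all other subjects."""
    predicted = []
    for i in range(len(age)):
        train_x = connectivity[:i] + connectivity[i + 1:]
        train_y = age[:i] + age[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predicted.append(intercept + slope * connectivity[i])
    return predicted


# Hypothetical toy data: one connectivity value and one age per subject.
connectivity = [0.05 * i for i in range(12)]
age = [11 + 40 * c + (1 if i % 2 else -1) for i, c in enumerate(connectivity)]

predicted = loo_predict(connectivity, age)
r_loo = pearson(predicted, age)  # out-of-sample association between predicted and actual age
```

The key design point is that the left-out subject never contributes to the fitted parameters, so the resulting correlation speaks to generalisation rather than in-sample fit.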
In a final step, we assessed whether the cue salience behavioural parameter was related to the connectivity parameters revealed in the PEB model using robust percentage-bend correlation (Figure 4). We found that the cue salience parameter was significantly correlated with the posterior mean of the connections LPFC → VS, r = 0.278, pFDR = 0.035, VS → THL, r = -.327, pFDR = 0.017, and THL → dACC, r = .327, pFDR = 0.017. The association between VS → THL and the cue salience parameter remained significant after removing five left-sided outliers of the DCM parameter determined by Rosner's test (r = -.303, p = 0.020). No significant association was found between the self-inhibition of the insula and the cue salience parameter, r = -.168, pFDR = 0.18.
Figure 4 (caption fragment): Percentage-bend correlations (Wilcox, 1994) and FDR-adjusted p-values are given for each correlation. Abbreviations can be found in Table 3.
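For readers unfamiliar with the robust statistics mentioned above, the following is a minimal sketch of a percentage-bend correlation in the spirit of Wilcox (1994), together with a false discovery rate adjustment. Several points are assumptions rather than details reported here: the bending constant beta = 0.2, the use of the Benjamini-Hochberg procedure for the FDR step, and all function names and toy values.

```python
from statistics import median


def _bend(values, beta):
    """Standardise one variable around a robust location and clip to [-1, 1]."""
    n = len(values)
    med = median(values)
    abs_dev = sorted(abs(v - med) for v in values)
    # omega: the floor((1 - beta) * n)-th smallest absolute deviation (1-based).
    # Assumes omega > 0, i.e. the data are not dominated by tied values.
    omega = abs_dev[int((1 - beta) * n) - 1]
    psi = [(v - med) / omega for v in values]
    low = sum(1 for p in psi if p < -1)
    high = sum(1 for p in psi if p > 1)
    kept = sum(v for v, p in zip(values, psi) if -1 <= p <= 1)
    # Robust location estimate that down-weights the extreme observations.
    location = (kept + omega * (high - low)) / (n - low - high)
    return [max(-1.0, min(1.0, (v - location) / omega)) for v in values]


def percentage_bend_r(x, y, beta=0.2):
    """Percentage-bend correlation coefficient between x and y."""
    a = _bend(x, beta)
    b = _bend(y, beta)
    num = sum(ai * bi for ai, bi in zip(a, b))
    den = (sum(ai * ai for ai in a) * sum(bi * bi for bi in b)) ** 0.5
    return num / den


def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy values, not data from the study.
x = [float(v) for v in range(1, 11)]
y = [2 * v + 1 for v in x]
r_clean = percentage_bend_r(x, y)
p_adj = fdr_bh([0.01, 0.04, 0.03, 0.2])
```

Because extreme standardised scores are clipped before the correlation is computed, a small number of outlying connectivity estimates cannot dominate the coefficient, which is the motivation for using this measure alongside an explicit outlier test.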

DISCUSSION
The ability to adjust behaviour is pivotal when facing ever-changing environmental demands. Here, we demonstrate that during an instrumental task the ability to specifically increase response vigour for high incentives improves from early adolescence to early adulthood and is paralleled by developmental changes of information flow within cortico-striatal-thalamic connectivity. These results suggest that rather than a simple cortical-subcortical imbalance (Steinberg, 2010), the network that supports incentive-guided action undergoes a fine-tuning of effective connectivity across adolescence into young adulthood. By applying a trial-by-trial reinforcement model in conjunction with dynamic causal modelling, we were able to extend previous studies of the neurobiology of instrumental vigour across development (Cho et al., 2013;Lamm et al., 2014). We found evidence that differences in adaptive responses can be linked to age-related changes in cortico-striatal-thalamic effective connectivity. These findings support the view that cortico-striatal-thalamic circuits serving efficient motivated behaviour undergo a smooth functional transition during maturation.
On the behavioural level, we demonstrated that increase in vigour is dependent on the cue salience of a given trial rather than on the valence of a cue (i.e. reward or loss) or the average reward or loss rate.
Although the average reward rate has been linked to increased instrumental vigour (Niv et al., 2007), it did not contribute significantly to explaining the observed response behaviour. Studies that systematically manipulated the average reward rate suggested a link between response vigour and the average reward rate (Beierholm et al., 2013;Griffiths and Beierholm, 2017;Guitart-Masip et al., 2011). Note that the average rates for reward and loss in a paradigm like the MID task are low (Beierholm et al., 2013), so the power to detect an effect of average reward might have been insufficient.
In the MID task, subjects perform categorical comparisons between reward or loss magnitudes and typically show faster responses in trials with larger values at stake (Cho et al., 2013;Pfabigan et al., 2014;Wrase et al., 2007;Wu et al., 2014). Therefore, it would be interesting to employ experiments with varying average reward and punishment rates to test how they affect response vigour across development.
In adults, we found a lower learning rate together with a trend for speeding up during high expected values, suggesting a more stable representation of values and a stronger behavioural discrimination between low and high incentives. In contrast, younger participants show a lower behavioural discrimination but, due to the higher learning rate, are able to adapt their behaviour to expected values faster over the course of the task. Selective improvement of performance for high incentives has been found previously in young adults compared to children and adolescents (Hämmerer et al., 2011;Störmer et al., 2014) and is consistent with theories of cognitive control that assert that action execution of adults (e.g. in response to a go-signal) can be selectively modulated by incentives (Botvinick and Braver, 2015). Moreover, our data show that with increasing age the effect of prediction errors on the subsequent expected values decreases, resulting in lower learning rates.
Together, this indicates that the behavioural adaptation is less and less influenced by feedback that is not relevant to perform well in the task (due to more stable behaviour) across development (Van Den Bos et al., 2012). Importantly, our sample included an age range from as young as 11 years up to 35 years, in which we could show that there is a gradual increase from adolescence into adulthood. It is possible that this age-dependent incentivised vigorous behaviour is related to differences in subjective valuation of monetary values between adolescents and adults. However, in our study, increased vigour during high cue salience was not related to differences in valuations of monetary outcome. This suggests that age-related differences cannot be attributed to valuation per se, but likely originate from the cognitive demands of the task.
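The relation between learning rate and the stability of expected values discussed above can be made concrete with a small sketch. It assumes a generic delta-rule (Rescorla-Wagner-type) update, which is a simplification and not necessarily the exact response model fitted in this study; the outcome sequence, the two learning rates and the function names are hypothetical.

```python
def delta_rule(outcomes, alpha, v0=0.0):
    """Track expected value with a fixed learning rate alpha.

    Returns the expected value held before each trial and the prediction
    error observed on that trial.
    """
    value = v0
    values, errors = [], []
    for outcome in outcomes:
        error = outcome - value      # prediction error on this trial
        values.append(value)
        errors.append(error)
        value += alpha * error       # update scaled by the learning rate
    return values, errors


def mean_abs_update(outcomes, alpha):
    """Average size of the trial-to-trial change in expected value."""
    _, errors = delta_rule(outcomes, alpha)
    return sum(abs(alpha * e) for e in errors) / len(errors)


# Hypothetical hit (1) / miss (0) sequence shared by both simulated learners.
outcomes = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
update_low = mean_abs_update(outcomes, alpha=0.1)   # more "adult-like": stable values
update_high = mean_abs_update(outcomes, alpha=0.6)  # more "adolescent-like": volatile values
```

With the lower learning rate, each prediction error moves the expected value only slightly, so single outcomes have little influence on subsequent behaviour; with the higher learning rate, values follow recent feedback closely and fluctuate more from trial to trial.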
Our behavioural model comparison suggested that logRT is more related to misses in general, rather than to distinct signed prediction error signals incorporating magnitude and valence (i.e. reward omission or monetary loss). Although post-error and novelty parameters of the winning model were correlated, parameter recovery showed that their effects could be discriminated well. While they shared a notable amount of explained variance, the novelty parameter accounts particularly for changes in response vigour for higher or lower deviations from the outcome value in a given trial. Our results pointed towards a trend of improving performance after errors (i.e. late responses) with age, rather than magnitude-related behavioural adjustments. Prospective studies should investigate how errors of various magnitudes affect behaviour, and how the magnitude-dependence of those signals changes across development.
Model-based analyses of incentive anticipation demonstrated consistent activation in the insula and the dACC, the principal nodes of the salience network, as well as the lateral prefrontal cortex, the striatum and the thalamus. Importantly, we observed a significant modulation of expected value signals for both reward and loss in the ventral striatum, the thalamus, the insula and the dACC (implicitly compared to non-incentivised trials). Former work reported that an expected value signal in the striatum can boost instrumental vigour in reward approach and loss avoidance behaviour, i.e. regardless of valence (Dayan, 2012;Rigoli et al., 2016). This is consistent with previous studies that assessed reward processing in adolescents and adults (Oldham et al., 2018). Crucially, we did observe a difference across age for the encoding of the expected value during loss trials, but not during reward trials. We identified clusters in the anterior insula and the dorsal ACC where the | − | signal positively correlated with age. In adults, aversive processing has been repeatedly shown to be associated with activation of the dACC (Jensen et al., 2003;Pohlack et al., 2012). Research on loss or aversion processing from early adolescence to adulthood has been sparse; nevertheless, the few existing studies reported decreased activity in the dorsal caudate (Cho et al., 2013;Lamm et al., 2014), the ACC (Bjork et al., 2010), and the insula (Galván and McGlennen, 2013) in adolescents compared to adults.
Our findings of decreased activity in caudate, insula and ACC for | − | corroborate these earlier findings and extend them by showing an age-related increase of activity in the lateral prefrontal cortex. This correlation was not explained by differences in valuation or arousal ratings, suggesting that the observed differences are not rooted in age-dependent salience attribution. This could indicate that the integration of motivational and salient events for more proactive control in loss avoidance continues to mature into young adulthood.
Contrary to our hypothesis, we could not replicate earlier findings of generally heightened reward sensitivity of the NAcc during adolescence compared to adulthood (Barkley-Levenson and Galván, 2014;Braams et al., 2014;Somerville et al., 2011). Our results point to a more specific age-by-sex interaction effect of reward prediction error signals in the nucleus accumbens. In particular, our results suggest that in females, nucleus accumbens activity related to reward prediction errors decreases across age, whereas in males it increases. Previous studies have shown the influence of gonadal hormones not only on structural brain changes during puberty (Peper et al., 2011) but also on accumbens activity during reward processing (Forbes et al., 2010;Ladouceur et al., 2019).
Nonetheless, behavioural model comparison indicated that prediction error signalling did not significantly affect response vigour across age. However, we stress that the unequal sample size of males and females could have affected our results, as unequal group sizes have been shown to inflate Type I error (Aguinis et al., 1999). Given the exploratory nature of this finding and because no hormonal levels were measured in the present study, this link remains suggestive, should be interpreted with caution and needs confirmation in future studies with larger samples.
Based on the findings of behavioural and whole-brain analyses, we conducted an effective connectivity analysis to assess age-related changes in the incentive processing network. Our results suggested that response vigour is closely related to the expected value of reward and loss incentives and changes across adolescence along with associated brain activity. Thus, we assessed the maturation of the functional architecture of the network comprised of regions (1) encoding expected values and (2) serving cognitive control of motivational processes (Botvinick and Braver, 2015).
First, in line with our hypothesis, we observed an increase of connectivity between the LPFC and the VS. The LPFC is well known for supporting motivated behaviour by storing and updating goal-relevant information and executing regulative control (Botvinick and Braver, 2015). Nevertheless, it remains unclear how a protracted LPFC maturation (Gogtay et al., 2004) affects the orchestration of incentive-based behavioural adaptations in concert with other, differentially developing regions.
Across adolescence, cognitive control emerges transiently and is associated with task performance differences between youth and adults (Crone and Dahl, 2012). Prior work has shown elevated cortico-subcortical connectivity during processing of salient stimuli predicting reward in adults (Ballard et al., 2011;Kinnison et al., 2012). Recently, first evidence has emerged that this adult ability to selectively exert cognitive control and improve performance to obtain high rewards is associated with the development of cortico-striatal connectivity during adolescence (Hallquist et al., 2018;Insel et al., 2017). Hence, the observed increase of information flow from the LPFC to the VS could reflect a strengthened control signal that is necessary to retrieve cognitive resources to improve performance.
Secondly, we identified a developmental change in effective connectivity from the VS to the thalamus. A similar study using the MID task in an exceptionally large sample showed that striatal and thalamic regions show connectivity changes among adolescents during reward anticipation (Cao et al., 2019). Using explicit models of behaviour and effective connectivity, our study complements and corroborates these findings. Given the large age range of our participants and the use of reward and loss trials, our results also critically extend these findings by characterizing connectivity changes from adolescence into adulthood and by showing that the connectivity changes are independent of approach or avoidance behaviour. Furthermore, this functional pathway has already been identified in previous DCM studies (Cho et al., 2013;Li et al., 2015) and its engagement seems to be particularly present during adolescence compared to adulthood (Cho et al., 2013). Thus, our results confirm a developmental decrease of coupling from the VS to the thalamus across adolescence and adulthood. The striatum projects to the ventral pallidum, which in turn sends mainly inhibitory GABAergic projections to the thalamus. Therefore, the VS is in a suitable position to regulate the disinhibition of the thalamus (Haber and Knutson, 2010). The striatothalamic pathway has been implicated in successful reinforcement learning, in particular in learning the relationship between an action and its consequences (Dudman and Krakauer, 2016;Pessiglione et al., 2007). The thalamus shares bidirectional connections with a wide range of cortical regions (Haber and Knutson, 2010), and evidence from animal studies suggests that thalamic lesions severely affect the ability to use rewards for goal-directed behaviour (Chakraborty et al., 2016;Leung and Balleine, 2015).
Moreover, pharmacogenetic models of thalamic hypofunction during Pavlovian conditioning are associated with failures of reward-related behavioural modulations (Parnaudeau et al., 2013). Given the importance of the integrity of this pathway in reinforcement learning, this underlines that the maturation of striatothalamic connectivity supports the facilitation of salience attribution to an incentivised stimulus, thereby promoting signals indicating a need for cognitive control. According to recent proposals, the dACC integrates these signals for monitoring demand and allocating cognitive control to maximize outcome (Cavanagh and Frank, 2014). This computation in the dACC is thought to result in a specification of which control mechanism to execute in order to optimize behaviour adaptively, which can be transmitted to other regions (e.g. the LPFC). Suggestive evidence from primate studies indicates that the dACC uses valence-specific representations of outcome uncertainty for this purpose (Monosov, 2017), and that information about past outcomes is encoded in inhibitory interneurons (Kawai et al., 2018;Sajad et al., 2019). In the DCM framework, a summary measure of the excitatory/inhibitory balance within a region is modelled by the self-connections (Zeidman et al., 2019). The DCM results demonstrated that the dACC modulated its self-connections according to the valence of the cue that was processed. This supports the evidence presented and further emphasises the valence-encoding role of the dACC in humans.
Moreover, we found a decrease in effective connectivity from the thalamus to the dACC across adolescence. Although it is widely appreciated that excitatory thalamocortical connections critically contribute to reward-related behaviour such as motor planning and salience detection (Pergola et al., 2018), surprisingly little research on functional coupling during incentive processing has been performed in humans. Increased functional connectivity between thalamus and dACC has been associated with increased risk-taking behaviour in adult smokers (Wei et al., 2016). In addition, different lines of evidence have shown significant remodelling of this circuit during adolescence. For instance, the levels of glutamate in the medial prefrontal cortex are elevated in adolescence and decrease across young adulthood (Marsman et al., 2013). Moreover, myelin maturation within this circuitry has been associated with lower impulsivity (Ziegler et al., 2019). Animal research suggests that the role of the thalamus in adapting behaviour as a function of incentives might be fundamentally dependent on inhibitory activity of thalamocortical neurons (Delevich et al., 2015;Rikhye et al., 2018). Taken together, adolescent hyperconnectivity between thalamus and dACC might reflect an immature mechanism of generating appropriate control signals for adjusting behaviour. A decrease in connectivity across adolescence could therefore reflect a damping of the striatothalamic feedback to the cortex and a shift towards cortical control.
Lastly, we found that the self-inhibition of the insula during incentive anticipation increases with age. In parallel to the dACC above, this means that the input gain decreases across adolescence. The insula is a hub that shapes motivational states and attention based on the affective evaluation of sensory input and tags relevant stimuli for further processing (Gogolla, 2017). The adolescent disinhibition of the insula might reflect a distorted weighting of ascending salience-attributed sensory signals.
Immature salience attribution in this brain hub orchestrating cognitive control might have contributed to the failure of flexible behaviour. Further, our results indicate a functional coupling from the valence-sensitive dACC to the insula, both being highly implicated in processing salience (Uddin, 2015). Again, a decrease in sensitivity to inputs could therefore reflect a shift from weighing bottom-up salience signals towards a mature top-down cognitive control to achieve an adjustment of behaviour to salient stimuli.
These findings support the idea that appropriate attentional filtering is important to adapt one's behaviour to incentivised stimuli (Parro et al., 2018). Different brain systems like the salience network or prefrontal-striatal network work in concert to support appropriate filtering and adjustment of behaviour. Maturation of the cortico-striato-thalamic system should eventually facilitate cognitive processes or motor responses via exertion of cognitive control by the LPFC. In line with this idea, a recent rodent study has shown that the prefrontal cortex is able to modulate sensory processing in the thalamus via the basal ganglia for attentional filtering of sensory signals fostering goal-directed behaviour (Nakajima et al., 2019). This finding demonstrates the complexity of neural circuits involved in motivated behaviour and underscores the necessity to study how developmental processes manifest with appropriate network models in humans. We acknowledge that the low number of participants between 19 and 22 years is a limitation of this study warranting consideration. The modest sampling within this age range precludes any conclusions about brain development at this particular age, and our results should be interpreted with care regarding the effect of incentives in the latest stages of adolescence. Moreover, we would like to emphasize that our results primarily describe the functional development of cortico-striatal circuits during instrumental learning. Research over the last two decades has found several discrepancies for different experimental paradigms and age ranges that have yet to be reconciled. It is possible that different aspects of punishment and reward show distinct neural sensitivity at different developmental stages (Richards et al., 2013). Thus, future work will show whether our findings generalize to younger children and other cognitive domains such as more complex or risky decision-making.
Nevertheless, they lend strong support to the broader notion that development of cortico-striatal circuit function can enhance motivated behaviour.

CONCLUSIONS
To summarize, our study demonstrated that the ability to adapt response vigour towards salient cues in a trial-by-trial fashion improves from early adolescence to adulthood. Furthermore, we show how classic models of reinforcement learning in conjunction with biophysics of neuronal dynamics can reveal developmental aspects of the underlying functional architecture of behaviour. We corroborate previous studies that found that performance of adults improves during incentivised compared to non-incentivised tasks (Chiew and Braver, 2016;Locke and Braver, 2008;Pfabigan et al., 2014;Wrase et al., 2007;Wu et al., 2014) and show that, compared to adults, adolescents have difficulty adapting behaviour for high subjective value (Insel et al., 2017). Our computational fMRI approach allowed us to link the overt behavioural adaptations guided by latent processes to maturational changes in activity and functional coupling. With this, we provide evidence that progressive fine-tuning of the cortico-striatal-thalamic circuit facilitates motivated action. Additionally, this approach revealed a functional sex difference in the development of striatal reward prediction error signalling. Although this did not critically affect task performance here, it is highly likely that diverging sex-specific trajectories extending into adulthood have implications in the context of decision making and risk-taking. Thus, we believe this study could have important ramifications that pertain to public health and the prevention of high-risk behaviour. During this important stage of development adolescents form habits that can lead to problems in later life (e.g. obesity, diabetes, or smoking) or have acute effects (e.g. substance abuse or sexually transmitted disease) (Kann et al., 2018;Kühn et al., 2019). Hitherto, however, the efficacy of incentive-based education and prevention programs is not well established (Bright et al., 2018;Johnston et al., 2012;Levitt et al., 2016).
Neuroscientifically informed policies that respect age- and sex-specific neural and behavioural constraints across development might be able to improve intervention approaches (Whitten, 2013). The results obtained in this study indicate that interventions using incentives might not always be sufficient to unfold the full motivational potential in adolescents and might also differentially engage girls and boys. Hence, future programs might benefit from adjustments that suit cognitive brain trajectories and are attuned to specific needs. To eventually optimize targeted intervention programs, it is important to further characterize motivational effects of incentives in different neurodevelopmental phases.