The Neurobiology of Personal Control During Reward Learning and Its Relationship to Mood

Background: The majority of reward learning neuroimaging studies have not focused on the motivational aspects of behavior, such as the inherent value placed on choice itself. The experience and affective value of personal control may have particular relevance for psychiatric disorders, including depression.

Methods: We adapted a functional magnetic resonance imaging reward task that probed the value placed on exerting control over one's decisions, termed choice value, in 122 healthy participants. We examined activation associated with choice value; personally chosen versus passively received rewards; and reinforcement learning metrics, such as prediction error. Relationships were tested between measures of motivational orientation (categorized as autonomy, control, and impersonal) and subclinical depressive symptoms.

Results: Anticipating personal choice activated left insula, cingulate, right inferior frontal cortex, and ventral striatum (p < .05, familywise error corrected). Ventral striatal activations to choice were diminished in participants with subclinical depressive symptoms. Personally chosen rewards were associated with greater activation of the insula and inferior frontal gyrus, cingulate cortex, hippocampus, thalamus, and substantia nigra compared with rewards that were passively received. In participants who felt they had little control over their own behavior (impersonal orientation), prediction error signals in nucleus accumbens were stronger during passive trials.

Conclusions: Previous findings regarding personal choice have been verified and advanced through the use of both reinforcement learning models and correlations with psychopathology. Personal choice has an impact on the extended reward network, potentially allowing these clinically important areas to be addressed in ways more relevant to personality styles, self-esteem, and symptoms such as motivational anhedonia.

underpinning (11), one's locus of causality is considered more dynamic (12) and is likely more environmentally adaptive.
Neurobiologically, the feeling of personal control (13), even when illusory (14), is associated with striatal activation, which suggests it may itself incur an additional value signal not typically captured by reward-learning paradigms. Leotti and Delgado (15,16) attempted to isolate this within a reward learning context by testing whether the mere anticipation of control, elicited by a cue signaling an opportunity to make a choice versus a passive selection, would recruit neural systems of reward. They found that cues indicating personal control elicited greater reward system activation in both reward-obtaining (15) and loss-avoiding (16) contexts. However, this previous paradigm did not clearly dissociate between choice anticipation and receipt of the reward itself. In the present study, we have adapted this value of choice task to clearly separate anticipation and outcome phases of choice and applied reinforcement learning models to better characterize the relationship between the value of choice and neural activation in healthy individuals.
Specifically, our aims were to 1) verify previous findings concerning choice-anticipatory activation; 2) determine if responses to rewards differ according to whether or not they were personally won or passively received; 3) establish that, with appropriate modification of the original paradigm, computational models of reinforcement learning can explain observed brain activity; and 4) determine whether elicited activation covaries with subclinical depressive symptoms and personality factors relevant to depression, namely, neuroticism and measures of causality orientation. We anticipated that high neuroticism and impersonal scores would be associated with diminished activation to the inherent value of choice because depression has been linked to other types of blunted reward value (17). We were particularly interested in the roles that the striatum and dopaminergic midbrain may play, given their key importance in reinforcement learning, incentive salience, and hedonic signaling.

METHODS AND MATERIALS

Participants
Individuals were selected from a wider ongoing study [Stratifying Resilience and Depression Longitudinally (18)] and underwent lifetime diagnostic screening using the Structured Clinical Interview for DSM-IV-TR Axis I Disorders (19) and DSM-IV-TR criteria. Only individuals without a lifetime diagnosis of major mental illness were included in the current analyses, which were performed when data from the first 149 healthy control participants were available. The following were excluded: 15 people owing to nonperformance of the task (no response or an incorrect response on >33% of trials), 6 people owing to technical difficulties during scan acquisition, and 6 people owing to excessive motion (more than three events involving motion greater than [0.5 × largest voxel dimension = 2.5 mm]). After these exclusions, there were 122 participants. All participants provided written informed consent, and the study was approved by local and regional ethics committees.

Neuropsychology and Behavioral Analyses
Neuropsychological data collected included the General Causality Orientations Scale (20), which examines the sources from which a person is motivated to act (9) and consists of three dimensions: autonomy, control, and impersonal. Neuroticism scores, the severity of depressive symptoms, and handedness were also assessed (see Supplement).

Neuroimaging Data Acquisition and Preprocessing
Data were acquired using a 3T magnetic resonance imaging scanner (repetition time = 1.56 seconds) (see Supplement).

Modified Inherent Value of Choice Imaging Task
The task was adapted from Leotti and Delgado (15) and implemented in Presentation software (Neurobehavioral Systems, Inc., Berkeley, CA). Each trial had three phases (Figure 1): 1) the cue phase, where participants learned whether they would personally be making the reward decision (choice value trial) or would be following the computer's direction (no-choice value trial); 2) the selection phase, whereby a decision was made between a yellow or blue card; and 3) the outcome phase, when participants received a probabilistic reward according to their decision. During the selection phase, participants were able to freely select their preferred card during choice trials; on no-choice trials, a rectangle appeared around the card that the computer had selected for them, which they were obliged to confirm. Selections were made via a button press.
In the original task, the yellow and blue cards shared equal reward contingencies. In our adaptation, they had different contingencies to permit modeling of reinforcement learning: the yellow card was associated with an 80% chance of a 100-point reward, and the blue card was associated with a 20% chance. The alternative outcome was 0 points. We also introduced 1500 to 4000 ms of jitter between selection and outcome phases of each trial, allowing for disambiguation of all three phases.
Participants completed 66 trials, 33 choice and 33 no-choice. Trial order and the side of the screen on which the yellow and blue cards appeared were randomized, preventing final action planning. Decisions made by the participant during choice trials were mirrored by the computer with a three-trial lag during no-choice trials in an effort to match the overall rewards received across conditions of interest. Total task length was 14 minutes 59 seconds.
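The trial schedule described above can be sketched as a small simulation. This is an illustrative reconstruction, not the study's actual stimulus code: the function name `generate_session` and the exact handling of the three-trial lag (including the default card before any history has accrued) are assumptions.

```python
import random

def generate_session(seed=0):
    """Sketch of one session: 33 choice and 33 no-choice trials in
    randomized order; yellow pays 100 points with p = .8 (blue: p = .2);
    on no-choice trials the computer mirrors the participant's earlier
    choice-trial selections with a three-trial lag."""
    rng = random.Random(seed)
    conditions = ["choice"] * 33 + ["no-choice"] * 33
    rng.shuffle(conditions)
    p_win = {"yellow": 0.8, "blue": 0.2}
    chosen, log = [], []
    for cond in conditions:
        if cond == "choice":
            card = rng.choice(["yellow", "blue"])  # stand-in for the participant's pick
            chosen.append(card)
        else:
            # mirror the selection made three choice trials earlier
            # (arbitrary default card until enough history exists)
            card = chosen[-3] if len(chosen) >= 3 else "yellow"
        reward = 100 if rng.random() < p_win[card] else 0
        log.append((cond, card, reward))
    return log
```

The mirroring step is what matches the points received across conditions: in expectation, the computer's selections reproduce the participant's own yellow/blue mix.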
Participants were told their objective was to learn by trial and error which color card was more likely to give them points. They were informed that for some trials they would get to choose, but for others the computer would choose for them. During the latter trials, they had to follow the computer's selection. Participants were also told that the reward contingencies remained consistent regardless of whether they or the computer were doing the choosing. A questionnaire administered after scanning asked participants to rate their desire to win points on a scale of 1 to 10 and their preference for choice or no-choice trials.

Functional Magnetic Resonance Imaging Data Analysis
Two analytic approaches were adopted: 1) a basic model, used to (a) verify that appropriate reward responses were seen for the outcome phase, (b) examine activation associated with anticipating choice at the cue phase, and (c) examine whether responses to rewards differed when personally chosen versus passively received; and 2) a Pavlovian reward learning model, described below.

Basic Model
This was modeled at the first level as a series of delta functions convolved with a canonical hemodynamic response function, the onsets of which were denoted by experimental conditions of interest. These were the onsets of the choice and no-choice cues and the onsets of trial outcome, with choice/no-choice and 0/100 points being modeled separately, giving six experimental vectors of interest. Nuisance regressors included the onsets of yellow/blue selection, trials where an incorrect response or no response was received, and motion parameters. At the second level, cue phase contrasts of choice > baseline and no-choice > baseline were entered into a random-effects flexible factorial analysis, modeling the factors of participant and choice/no-choice. The outcome phase was considered in a separate 2 × 2 flexible factorial analysis incorporating the contrasts of choice 100 > baseline, choice 0 > baseline, no-choice 100 > baseline, and no-choice 0 > baseline, modeling the factors of participant, choice/no-choice, and reward amount (see Supplement). For both models, each participant's desire to win points and any difference in points received for choice versus no-choice trials were included as nuisance covariates. Regions showing significant activation for the contrasts of interest were subjected to extraction of the first eigenvariate for the suprathreshold cluster, and their relationships with our covariates of interest were explored (autonomy, control, impersonal, Quick Inventory of Depressive Symptomatology [QIDS] depressive symptoms, and Eysenck Personality Questionnaire Revised neuroticism scores). This was done using backward regression in IBM SPSS Version 23 (IBM Corp., Armonk, NY): for each extracted region, the model that best accounted for the data was identified by analysis of variance; within this, significant coefficients of explanatory covariates were reported. These were subjected to false discovery rate correction with Q = .05 across all comparisons, and standardized β values were reported.
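Two steps above lend themselves to a compact sketch: building an event regressor by convolving onset delta functions with a canonical hemodynamic response function, and the Benjamini-Hochberg false discovery rate step. The double-gamma HRF approximation, function names, and scan count below are illustrative assumptions, not the exact SPM/SPSS implementations used in the study.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    # Common double-gamma approximation to the canonical HRF:
    # a response peaking near 5-6 s with a late undershoot.
    t = np.arange(0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.sum()

def event_regressor(onsets_s, tr, n_scans):
    # Delta functions at event onsets (seconds), convolved with the HRF.
    stick = np.zeros(n_scans)
    stick[np.round(np.asarray(onsets_s) / tr).astype(int)] = 1.0
    return np.convolve(stick, canonical_hrf(tr))[:n_scans]

def fdr_bh(pvals, q=0.05):
    # Benjamini-Hochberg: reject the k smallest p-values, where k is the
    # largest rank with p_(k) <= (k/m) * q; returns a boolean mask.
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject
```

For example, `event_regressor([10.0], tr=1.56, n_scans=577)` yields a regressor that is zero before the event and rises to a peak several seconds after it, matching the study's repetition time of 1.56 seconds.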

Pavlovian Reward Learning Model
The task was also modeled as an instance of classical conditioning, using a temporal difference learning model (21). We wished to identify whether learning rate varied according to whether or not participants were actively choosing. The model implemented four different learning rates, 0.2, 0.4, 0.6, and 0.8, used to generate cue value and prediction error (PE) estimates across the task for each participant based on their cue-outcome experiences during the scan. The unconditioned stimulus was the outcome phase of each trial (the receipt of 100 or 0 points). The conditioned stimuli were the choice and no-choice indicators during the cue phase (see Supplement). Cue value was used to modulate trial-by-trial regressors representing the cue phase of each trial, and PE was used to modulate the outcome phase. Choice and no-choice conditions were modeled separately. These were entered into first-level SPM analyses, with a different SPM for each learning rate. Contrast estimates for each regressor were taken into second-level 2 × 4 flexible factorial analyses, which modeled the main effects of participant, choice/no-choice, and learning rate. As we expected choice value estimates to strongly covary with measures of autonomy, control, impersonal, neuroticism, and depression scores, these were included in the second-level analyses as covariates.

For both the basic and Pavlovian models, second-level contrasts were evaluated at a whole-brain voxel height threshold of p < .05, familywise error corrected. Given a priori interest in the striatum and dopaminergic midbrain, we also conducted region-of-interest analyses within a structurally defined mask comprising bilateral caudate, putamen, and dopaminergic midbrain (see Supplement). Masked voxels were reported as significantly activated if they exceeded a familywise error-corrected height threshold of p < .05.
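The generation of cue-value and PE series described above can be sketched as a minimal delta-rule (temporal difference) update under the paper's stated assumptions: fixed learning rates and separately modeled choice and no-choice cues. The function name `td_regressors` and the binary reward coding (1 = 100 points, 0 = 0 points) are illustrative choices, not the study's exact implementation.

```python
import numpy as np

def td_regressors(cues, rewards, alpha):
    """Trial-by-trial cue value and prediction error (PE) series for one
    participant at one fixed learning rate (0.2, 0.4, 0.6, or 0.8).
    cues: 'choice'/'no-choice' label per trial; rewards: 1 or 0."""
    V = {"choice": 0.0, "no-choice": 0.0}  # conditions modeled separately
    values, pes = [], []
    for cue, r in zip(cues, rewards):
        values.append(V[cue])   # parametric modulator for the cue phase
        pe = r - V[cue]         # parametric modulator for the outcome phase
        pes.append(pe)
        V[cue] += alpha * pe    # delta-rule value update
    return np.array(values), np.array(pes)
```

A lower alpha makes V track the long-run reward average and keeps PEs large for many trials; a higher alpha makes V jump toward each new outcome, which is the distinction the 2 × 4 factorial analysis exploits.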

RESULTS

Demographics, Neuropsychology, and Symptoms
Median age of participants was 62 years, and 46% were men (Table 1). There was no correlation between age and task performance (p > .823). Of participants, 93% preferred making their own choices. Learning continued throughout the task, with the most rewarding card being chosen 79% of the time during the final quarter of the session (Supplemental Figure S3). Both QIDS depression (rs = .213, p = .003) and impersonal scores (rs = .197, p = .003) were positively correlated with neuroticism.

Basic Model: (b) Choice Anticipation: Cue Phase
The cue phase choice > no-choice contrast showed significant activation in left insula, cingulate, and right inferior frontal cortex (Table 2). Bilateral putamen was activated within the striatum/midbrain a priori mask (Figure 3A).
No-choice > choice showed activation in occipital cortex only (Table 2).
Basic Model: (c) Reward and Choice: Outcome Phase
Next we examined whether responses to personally earned outcomes differed from those passively received. The outcome phase choice > no-choice contrast showed significant activation in the bilateral insula, anterior cingulate, right IFG, left hippocampus, and left thalamus (Figure 2 and Table 3). Within the striatum/midbrain region of interest, there was significant choice > no-choice activation within the left substantia nigra and right caudate nucleus. No-choice > choice activated left middle frontal cortex, precuneus, and angular gyrus.

Reward × Choice Interaction Activations
The contrast of choice (0 > 100) > no-choice (0 > 100) showed activation in right IFG pars opercularis (Table 3). Conjunction analysis confirmed that this lay within the choice > no-choice cluster (p = .038, familywise error corrected) but not that of 0 > 100 (Figure 4A). Contrast estimates suggested enhanced activation when one personally failed to win (Figure 4B). Impersonal showed a negative association in the same region (β = −.288, p = .012) (Supplemental Figure S4). Here QIDS depression demonstrated a similar pattern to control (β = .226, p = .039). Table 4 details these relationships.

Pavlovian Reward Learning: Value of Personal Choice
The final analytical thread considered whether the ability to choose was intrinsically rewarding in itself, within a reinforcement learning context. During the cue phase of each trial, there were no main effects of choice/no-choice or learning rate. However, as anticipated, there was significant covariation with several metrics of interest (Table 5). Increasing autonomy was associated with greater no-choice > choice value estimates in right amygdala (p = .008) and greater choice > no-choice value estimates in anterior caudate (p = .019). Again during the cue phase, control orientation demonstrated a positive relationship with learning rate in the right superior temporal sulcus (p = .017). During the outcome phase, there was a significant main effect of learning rate (α) in ventral striatum, with a lower α being associated with greater PE representation (p < .001). Conversely, there was an effect of increasing α in the right anterior insula and supplementary motor area (p < .006). Learning in ventral striatum therefore appears to operate over a longer timescale than in insula and supplementary motor area. Finally, impersonal showed a stronger PE representation for no-choice > choice in bilateral nucleus accumbens (p < .004) (Figure 3C, D). For interest, results significant at p < .001 uncorrected can be found in the Supplement.

DISCUSSION
In this study, we modified Leotti and Delgado's 2011 inherent reward of choice task (15) to 1) verify their previous findings, 2) disambiguate the cue and outcome phases, 3) demonstrate the utility of computational models in this context, and 4) see whether task-elicited activation covaried with personality factors of relevance to depression. Their findings concerning choice anticipation were replicated within our larger independent sample of healthy control subjects. The task was amenable to Pavlovian reward learning analysis. We then demonstrated a series of novel findings within regions key to reward and depression, relating activation to depressive symptoms and to measures that attempt to personalize notions of reward and value. These findings align with the depressive phenomena of motivational anhedonia and devaluation of the self.

Anticipating Choice
We verified the striatal anticipatory response to choice as seen in Leotti and Delgado (15). Critically, we observed that this effect was diminished in participants with more depressive symptoms, suggesting an impairment in the hedonic value or salience attributed to personal choice. Reduced ventral striatal reward-linked responses are a well-replicated finding in patients with major depressive disorder (MDD), whether viewing positive images (22), which correlates with anhedonia (23), or anticipating and receiving rewarding outcomes (24,25). In healthy control subjects, depressive symptoms correlate with a reduction in the usual performance-enhancing effects of positive feedback, implying striatal dysfunction (26). Striatal activation correlates with enhanced recall of personally chosen items and exerts a modulatory effect over hippocampus (27): this mechanism may underpin the cognitive biases observed in MDD. It is notable that we too report striatal dysfunction in a group of healthy control participants, who have not been subject to the effects of medication or an episodic illness, while having a narrower distribution of depressive symptoms. We also found enhanced insula and cingulate activation during choice > no-choice anticipation: these regions have been shown to correlate with momentary subjective well-being in rewarding contexts (28), supporting the view that personal choice is intrinsically appetitive. Both are key components of the salience network and play a role in cognitive control (29).

Personally Earned Versus Passively Received Rewards
Responses to personally chosen outcomes were enhanced compared with those that were passively received: insula and IFG have been implicated (30) in the inhibition of motor and affective responses (31). This region is also activated by personal regret versus simple disappointment (32). It could be argued that personally failing to win induces a self-blame response (33) that requires inhibition or emotional regulation. Such a response would be relevant to depression and particularly to resilience in the face of adversity (34).

Inferior Frontal Gyrus and Goal-Sensitive Self-Regulation
Right IFG and insula showed a choice . no-choice response across the sample during the outcome phase, which was enhanced by high autonomy but diminished by high  impersonal scores. The concept of locus of causality is not far removed from that of learned helplessness, which inspired an animal model of MDD and gave ventral prefrontal cortex (IFG) particular prominence in a recent update by its architects Maier and Seligman (35). Prolonged aversive events are proposed to stimulate the raphe nuclei, releasing serotonin within the striatum (inhibiting behavior) and amygdala (inducing fear and anxiety), regardless of detected contingencies. This response is inhibited if the agent has previous experience of acting to escape aversive events, mediated by the regulatory influence of ventral prefrontal cortex over the raphe nuclei and striatum.
Maier and Seligman suggest that this process equates to the agent's being able to imagine having control over future aversive situations. Right IFG and insula are crucial contributors to cognitive control, governing the ability to select and maintain goal-directed action at the expense of other alternatives (36). Strong evidence from a meta-analysis supports their role in the cognitive reappraisal of emotional stimuli (37). Reduced responses to negative affective stimuli have been reliably demonstrated in patients with MDD (38). In this study, we show the response of IFG to personal choice is greater in individuals having high autonomy and reduced in individuals having an impersonal, passive style. The latter may therefore have a reduced ability to act to escape aversive situations and regulate subcortical limbic responses to aversive events, whereas the former would be more adaptive and resilient. Bhanji et al. (39) linked resilience to believing one has personal control: they found that one's ability to overcome setbacks was reduced following exposure to an acute stressor; however, this was diminished in people who believed that they had some control over the setbacks.

Precuneus and Agency Perception
The precuneus showed a no-choice > choice response during the outcome phase, especially so in participants with high control, with the opposite being seen with high impersonal scores. Precuneus is part of the default mode network and generally deactivates during goal-directed tasks (40). This happens to a lesser degree during tasks having a self-referential component, taking a first-person perspective, or inducing the experience of agency (41). It also activates when mentally simulating the actions of another versus oneself (42), taking perspectives alternative to one's own (43), and considering the emotional states of both oneself and others versus neutral judgments (44). More abstractly, it is activated during judgments of intentional versus simple physical causality (45). In summary, it is arguable that any process involving consideration of an intentional agent engages precuneus, regardless of whether that agent is one's own self, although the self is likely to prevail during default mode operations. The control orientation may increase the propensity to seek cues in the minds of others and consider the computer's intentions. Conversely, the impersonal orientation shows an apparent abolition of the effect seen in the general sample, suggesting a reduced inclination to consider intentionality at all.

Reinforcement Learning
The final analysis phase attempted to capture the learning process underlying how the choice/no-choice cues developed their inherently rewarding character and how this related to participants' characteristics. The use of a computational model potentially allows for a more mechanistic understanding of the activation observed and highlighted relationships with personality metrics that were not detected during the basic analysis. Highly autonomous people encoded value during presentation of the no-choice cue within right amygdala, suggesting that either no-choice cues (46) or the uncertainty associated with what the computer might select (47) was regarded as aversive. They also showed greater choice > no-choice cue valuations in dorsal anterior caudate, which through its interactions with prefrontal cortex plays a crucial role in goal-directed action (48). High control participants showed enhanced learning in the right superior temporal sulcus, which is especially involved in considering the intentions of external others (49). Finally, high impersonal participants had stronger nucleus accumbens PE signals for passively received rewards, suggesting that a reduced belief in the ability to control one's behavior related to more reward system reactivity to gifted versus earned rewards.

Limitations
A number of study participants were unable to perform the task correctly, suggesting that it was subjectively hard to understand or that a potentially important section of the population was excluded. We have not examined trial-by-trial assessments of choice preference or changes in stay/switch behavior, which are also believed to covary with depressive symptoms (26). The 80:20 yellow:blue reward contingency was used to permit reliable learning across a range of participants and may have induced ceiling effects in some participants, as we did not find a simple interaction between choice/no-choice and learning rate. However, it allowed us to focus on whether or not the participant did the choosing, without that choice in itself being particularly onerous. Indeed, additional confounds may have been introduced if there was a difference in decision-related deliberation between the choice and no-choice conditions. Alternatively, our temporal difference learning model may not have adequately captured the variance introduced by personal choice.

Clinical Relevance
Our findings suggest that the modified inherent value of choice task could provide useful insights into the neurobiology of MDD. Within this large sample of healthy control subjects, we have shown how personal choice modulates activation within areas known to be disrupted in MDD. This covaries with how inclined participants are to see themselves as drivers of their actions, to look to the outside world for their cues, or even to feel at a loss as to why they act at all. Being able to tease apart how particular manifestations of personality impact one's vulnerability to MDD is likely to be important to stratification. Characteristics such as causality orientation arguably build on more stable and heritable measures such as neuroticism, as they are more responsive to environmental events and so may provide more timely information regarding the risk of transition to illness as well as offering targets for psychotherapeutic interventions. The hope is that by examining the reward system in a manner that ties self-perception to behavior, more clinically applicable insights can be drawn. For example, a particularly effective therapeutic strategy for individuals having a high impersonal/low autonomy style might be to both enhance dopaminergic transmission and challenge self-orientation beliefs during cognitive behavioral therapy.