Individual differences in learning positive affective value

As a fundamental process underlying how individuals learn affective value, Pavlovian learning has been used to account for the emergence of individual differences in behavior and cognition. In affective sciences, early investigations into Pavlovian learning often relied on aversive stimuli; more recently, the field has broadened its focus to positive emotions and reward. Individual differences in Pavlovian learning involving rewards could represent a translational and transdiagnostic model for compulsive reward-seeking behavior. In humans, specific learning phenotypes have been linked to distinct behavioral profiles, known as sign-tracking and goal-tracking, which have been shown to confer differential vulnerability to compulsive reward-seeking behavior in animals. Such findings underline the explanatory potential of Pavlovian learning for the development of individual differences in a variety of affective phenomena.


Introduction
Biological fitness depends on individuals' ability to learn from interactions with their environment how best to deploy behavior in order to maximize reward and minimize punishment [1]. Multiple learning mechanisms are thought to be responsible for the attribution of affective value underlying behavior [2,1]. Among them, Pavlovian learning represents one of the most elementary and fundamental forms of learning, whereby a conditioned stimulus (CS; e.g. an image) becomes associated with the delivery of an affectively relevant outcome (i.e. an unconditioned stimulus or US; e.g. a tasty food) through contingent pairings. Such predictive learning, in which individuals can prepare for but cannot influence the occurrence of the outcome, is distinguished from instrumental learning, in which outcome delivery is contingent on individuals emitting a specific behavioral response, and positive affective value is attributed to actions leading to desired outcomes.
The relationship between the fields of Pavlovian learning and emotion research has traditionally been very fruitful. On the one hand, the Pavlovian conditioning paradigm provides a platform to investigate basic mechanisms underlying how individuals assign affective value to stimuli and how affective value is further shaped and updated through experience [3,4]. On the other hand, research on emotion provides a theoretical framework defining how and why an outcome is affectively relevant [5] and how the affective response elicited by the CS during Pavlovian learning can be measured. Indeed, most emotion theories agree that an affective response is multi-componential [5,6,7]. More precisely, this response, typically triggered by an appraisal process, can be observed at the autonomic nervous system level (e.g. skin conductance), the action tendency level (e.g. leading to approach or avoidance behaviors), the motor expression level (e.g. facial expression) and the subjective feeling level (e.g. feeling of pleasure; see Figure 1). Measuring the affective response is particularly critical in Pavlovian conditioning paradigms since, unlike in instrumental learning paradigms, there is no action the individual can take that will influence outcome delivery. The only way to assess whether Pavlovian learning has occurred is to measure whether the CS triggers an affective response after having been associated with an affectively relevant outcome. For many decades, emotion research focused particularly on negative affect (e.g. [8,9]), mostly using aversive stimuli [2,3]. However, as positive emotions have received increased attention in emotion research in recent years, interest in appetitive Pavlovian learning has grown, and studies investigating the fundamental mechanisms involved in learning positive affective value are becoming more widespread [10,11,12].
Here we review recent methodological approaches to the study of individual differences in appetitive Pavlovian learning and present a promising line of research aimed at modeling individual differences. We then illustrate how applying these methods to mental health research can contribute to our understanding of the etiology and maintenance of psychological disorders by reviewing recent findings linking individual differences in appetitive Pavlovian learning with differential vulnerability to disorders involving compulsive reward-seeking. Finally, we discuss further potential applications and avenues for research into individual differences in appetitive Pavlovian learning.

Recent methodological approaches to the study of individual differences in appetitive Pavlovian learning
Figure 1. Investigation of individual differences in appetitive Pavlovian learning. Individuals learn the association between a conditioned stimulus (e.g. an image) and a reward (e.g. chocolate). Multiple components of the conditioned response are measured through various behavioral and physiological methods. These data are used to fit a computational model for each individual, formalizing individual differences as potential computational phenotypes. These potential phenotypes can then be related to resilience and vulnerability factors for mental health.
Individuals can differ in their learning trajectories; however, for decades these individual differences were typically treated as noise around normative models. It is increasingly recognized that individual differences in Pavlovian learning should be exploited rather than ignored, in particular for the identification of risk profiles for psychological disorders [13]. Computational modeling could be a powerful tool for investigating these individual differences [14,15]. Computational approaches aim to formalize psychological theories by specifying the algorithms underlying computations, thereby enabling researchers to construct formally precise theories and to capture individual differences at a mechanistic level [14,15]. Moreover, pairing computational approaches with brain imaging can help uncover how processes are implemented in specific brain areas [16]. Indeed, computational modeling readily lends itself to investigations of neuronal correlates: methods such as model-based functional magnetic resonance imaging (fMRI) incorporate signals derived from computational models into the analysis of brain function, providing mechanistic accounts of the neural activity underlying specific cognitive processes such as Pavlovian learning [16]. Consequently, the increasing application of computational approaches to appetitive Pavlovian learning promises to deliver important insights into this central and ubiquitous form of learning [17,11,12].
Computational modeling involves specifying an algorithm that formalizes the cognitive or neuronal process of interest; behavioral, neural or physiological trial-by-trial data are then used to estimate model parameters.
Multiple computational accounts of appetitive Pavlovian learning have been suggested; among them, reinforcement learning models have proved particularly influential [18,19]. Indeed, given its ability to seamlessly link computational and algorithmic analyses to neuronal implementations [14,20], reinforcement learning is a particularly apt framework for investigating individual differences in different forms of value learning.
During appetitive Pavlovian conditioning paradigms, conditioned responses can be measured at different levels. Consequently, behavioral and physiological measures used for model fitting often reflect one dimension of the affective response (e.g. skin conductance responses, pupil dilation; see Figure 1). For example, in a conditioning paradigm in which an image is associated with the receipt of a rewarding food, approach tendencies have been measured by recording the time spent gazing at the location of the expected reward [21,12]. Pavlovian action tendencies have also been measured using errors committed by participants in Go/No-Go tasks, showing that individuals tend to approach rewards and avoid punishments even when this Pavlovian bias conflicts with instrumental instructions to withhold or execute actions [22,10]. As ocular responses convey both the approach tendency and the autonomic components (e.g. pupil dilation) of the affective response, eye-tracking appears to be a particularly appealing methodology for the study of appetitive Pavlovian learning in humans [11,12]. Trial-by-trial measures of the conditioned response are thus used to estimate model parameters for each individual, formalizing individual differences as computational phenotypes [23,15] (see Figure 1). Perhaps the most widely used reinforcement learning model in appetitive Pavlovian learning is the Rescorla-Wagner model [24], which stipulates that individuals update the expected value of a conditioned stimulus based on the trial-wise discrepancy between the observed and the expected reward (that is, the prediction error), weighted by a learning rate parameter. Typically, computational phenotypes derived from the Rescorla-Wagner model rest on differences in learning rate, indicating that some individuals learn faster than others.
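As a minimal sketch of how a learning-rate phenotype could be estimated, the Python code below implements the Rescorla-Wagner update $V_{t+1} = V_t + \alpha (r_t - V_t)$ for a single CS and recovers $\alpha$ from a simulated trial-by-trial conditioned response by grid search. The linear mapping from value to response (a slope and an intercept fitted by least squares), the noiseless simulated data and the function names are illustrative assumptions, not the procedure of any particular study; in practice, parameters are usually estimated by maximum likelihood or hierarchical Bayesian methods.

```python
import numpy as np


def rescorla_wagner(rewards, alpha, v0=0.0):
    """Rescorla-Wagner learning for a single CS.

    Returns the expected value held at CS onset on each trial and the
    prediction error computed at outcome delivery on that trial.
    """
    values = np.empty(len(rewards))
    errors = np.empty(len(rewards))
    v = v0
    for t, r in enumerate(rewards):
        values[t] = v          # expectation before the outcome is observed
        delta = r - v          # prediction error: observed minus expected
        errors[t] = delta
        v += alpha * delta     # update weighted by the learning rate
    return values, errors


def fit_learning_rate(rewards, responses, grid=np.linspace(0.01, 1.0, 100)):
    """Return the learning rate whose value trajectory, after a least-squares
    linear mapping (slope + intercept), best explains the responses."""
    best_alpha, best_sse = None, np.inf
    for alpha in grid:
        values, _ = rescorla_wagner(rewards, alpha)
        design = np.column_stack([values, np.ones_like(values)])
        coef, *_ = np.linalg.lstsq(design, responses, rcond=None)
        sse = np.sum((responses - design @ coef) ** 2)
        if sse < best_sse:
            best_alpha, best_sse = alpha, sse
    return best_alpha


# Usage: simulate a "slow" and a "fast" learner under 75% reinforcement
rng = np.random.default_rng(0)
rewards = rng.binomial(1, 0.75, size=60).astype(float)
for true_alpha in (0.1, 0.6):
    values, _ = rescorla_wagner(rewards, true_alpha)
    responses = 2.0 * values + 0.5   # hypothetical (noiseless) response scale
    print(true_alpha, fit_learning_rate(rewards, responses))
```

With real, noisy responses the recovered learning rate would only approximate the generating one, which is why parameter recovery is usually checked by simulation before individual estimates are interpreted as phenotypes.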
The Rescorla-Wagner model is only one of several models that can be applied to Pavlovian learning. For instance, while the Rescorla-Wagner model is a discrete, trial-level model of learning, meaning that prediction errors are computed at the moment of outcome delivery, other algorithms such as temporal difference learning implement continuous-time predictions, allowing prediction errors to be computed every time new information is presented (e.g. not only at outcome delivery but also at CS onset) [19]. Furthermore, some models build an internal representation of the task and its structure rather than simply assigning value based on error-correcting rules like the Rescorla-Wagner algorithm. The degree to which the selected model accounts for the data can be quantified using model comparison indices (e.g. the Akaike and Bayesian information criteria, or model evidence [23]); these are then used to establish which among several competing models provides the best fit to the measured data. Thus, distinctions between individuals can be drawn in two ways: either through the different values the parameters take, or on the basis of which model best explains their behavior, the latter being a means to probe whether individuals employ qualitatively different learning strategies [23].
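The model comparison step can be illustrated with the Akaike and Bayesian information criteria computed from least-squares residuals. The sketch below assumes i.i.d. Gaussian response noise (so that the maximized log-likelihood follows from the residual variance) and compares a hypothetical "no-learning" model, in which the response is constant, with a Rescorla-Wagner model whose learning rate is found by grid search; lower values indicate the preferred model. The simulated data and all function names are illustrative.

```python
import numpy as np


def information_criteria(residuals, n_params):
    """AIC and BIC of a least-squares fit, assuming i.i.d. Gaussian noise whose
    variance is set to its maximum-likelihood estimate."""
    n = len(residuals)
    sigma2 = np.mean(residuals ** 2)
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = n_params + 1  # the noise variance counts as a free parameter
    return 2 * k - 2 * log_lik, k * np.log(n) - 2 * log_lik


def rw_values(rewards, alpha):
    """Expected value of the CS before each trial under Rescorla-Wagner."""
    v, out = 0.0, []
    for r in rewards:
        out.append(v)
        v += alpha * (r - v)
    return np.array(out)


def fit_rw_residuals(rewards, responses, grid=np.linspace(0.01, 1.0, 100)):
    """Residuals of the best-fitting RW model (learning rate + linear mapping)."""
    best = None
    for alpha in grid:
        design = np.column_stack([rw_values(rewards, alpha), np.ones(len(rewards))])
        coef, *_ = np.linalg.lstsq(design, responses, rcond=None)
        resid = responses - design @ coef
        if best is None or np.sum(resid ** 2) < np.sum(best ** 2):
            best = resid
    return best


# Usage: noisy responses generated by an RW learner under continuous reinforcement
rng = np.random.default_rng(1)
rewards = np.ones(40)
responses = rw_values(rewards, 0.2) + rng.normal(0.0, 0.1, size=40)

aic_const, bic_const = information_criteria(responses - responses.mean(), n_params=1)
aic_rw, bic_rw = information_criteria(fit_rw_residuals(rewards, responses), n_params=3)
print(f"constant model:  AIC={aic_const:.1f}, BIC={bic_const:.1f}")
print(f"Rescorla-Wagner: AIC={aic_rw:.1f}, BIC={bic_rw:.1f}")
```

BIC penalizes extra parameters more heavily than AIC as the number of trials grows; fully Bayesian model evidence would instead integrate over parameter uncertainty.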
Finally, combining computational approaches with investigations of neural substrates can further inform our understanding of the cognitive processes underlying Pavlovian learning: the emergence of distinct computational phenotypes can be directly linked to variability in brain function, enriching mechanistic accounts of individual differences in behavior with an explanation at the implementation level [14]. For instance, the electrophysiological activity of dopaminergic neurons during Pavlovian conditioning paradigms has been shown to be consistent with temporal difference learning [19]. Studies using model-based fMRI have also shown that while the reward prediction error robustly correlates with activity in the ventral striatum, the strength of the prediction error signal in the ventral striatum varies across individuals according to their computational phenotype [14,12].
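To illustrate the continuous-time account of temporal difference learning, the sketch below simulates tabular TD(0) within each trial. It assumes a "complete serial compound" representation (one value per time step after CS onset), a CS whose onset is itself unpredictable (so pre-CS time steps keep a value of zero), a reward of 1 at a fixed delay, and no discounting by default (gamma = 1); these are simplifying assumptions for illustration only. Over trials, the prediction error moves from the time of reward delivery to the time of CS onset, the pattern classically compared with dopaminergic responses.

```python
import numpy as np


def td_conditioning(n_trials=200, n_steps=10, cs_step=2, us_step=7,
                    alpha=0.1, gamma=1.0):
    """Tabular TD(0) within trials; returns values and per-step prediction errors.

    deltas[trial, t] is the prediction error on entering time step t.
    """
    V = np.zeros(n_steps)                  # V[t]: value of time step t in the trial
    deltas = np.zeros((n_trials, n_steps))
    for trial in range(n_trials):
        for t in range(n_steps - 1):
            r = 1.0 if t + 1 == us_step else 0.0
            # Before CS onset nothing predicts reward, so those states stay at 0
            v_now = V[t] if t >= cs_step else 0.0
            v_next = V[t + 1] if t + 1 >= cs_step else 0.0
            delta = r + gamma * v_next - v_now
            deltas[trial, t + 1] = delta
            if t >= cs_step:
                V[t] += alpha * delta      # learn only for post-CS states
    return V, deltas


V, deltas = td_conditioning()
cs, us = 2, 7
print("first trial: PE at CS =", deltas[0, cs], "PE at US =", deltas[0, us])
print("last trial:  PE at CS =", round(deltas[-1, cs], 3),
      "PE at US =", round(deltas[-1, us], 3))
```

A trial-level Rescorla-Wagner model has no within-trial time steps and therefore cannot express this transfer of the prediction error to CS onset.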

Applications to mental health
The involvement of Pavlovian mechanisms in the etiology and maintenance of a large variety of psychological disorders has been postulated numerous times in the literature [25,26]. Pavlovian associative mechanisms might represent one of the smallest building blocks in the process of attributing and updating affective value. Indeed, Pavlovian learning is an evolutionarily old and fundamental form of learning, found in organisms ranging from primitive invertebrates to complex mammals such as humans [1]. Given its unique status as a central learning mechanism, its influences on various aspects of cognition and behavior are far-reaching [27], making Pavlovian learning an especially interesting target for the identification of transdiagnostic resilience and risk profiles. Transdiagnostic approaches to mental health, which aim to understand disorders in terms of dimensional constructs spanning traditional categorical boundaries, are becoming increasingly relevant as they are thought to address important nosological issues with current diagnostic systems [28]. The ability of computational modeling to characterize individuals on dimensional scales capturing mechanisms that are agnostic to psychiatric categories has contributed to the advocacy for computational approaches in mental health research (e.g. [29,30,31]). Applying innovative methodologies such as computational modeling to the study of Pavlovian learning can therefore yield key insights into adaptive and maladaptive behavior, such as the identification of computational phenotypes underpinning resilience and vulnerability factors. As such, research into the individual determinants of Pavlovian learning is of particular relevance to the field of mental health, as already evidenced on the aversive side of Pavlovian learning, where Pavlovian threat conditioning is an established laboratory model of anxiety disorders [32].
Recent findings suggest that individual differences in appetitive Pavlovian learning could similarly constitute a translational and transdiagnostic model for the etiology and maintenance of a variety of compulsive reward-seeking behaviors [33].
One of the main findings stemming from the computational investigation of individual differences in reinforcement learning mechanisms is the identification of a link between vulnerability to compulsive behavior and deficits in a specific type of instrumental reinforcement learning called model-based learning [34,35]. Indeed, shifts in the balance between model-based and model-free learning, two families of reinforcement learning mechanisms thought to operate in parallel [36], are an important source of individual differences, with repercussions on behavior. While model-based and model-free algorithms both aim to compute the expected value of stimuli or actions, they do so in markedly different ways: model-based learning involves constructing a cognitive model of the environment, from which the values of states or actions are prospectively simulated; in contrast, model-free learning relies solely on cached values reflecting the previous history of rewards [1,19]. Because model-based learning allows individuals to flexibly adapt to changing environmental contingencies, it is typically assumed to underlie goal-directed behavior, whereas model-free learning renders individuals insensitive to immediate changes in environmental contingencies and is thought to map onto habitual behavior [35]. In dynamic environments, the rigidity of habitual behavior can become maladaptive, as it can translate into persistent engagement in behavior producing an outcome that is no longer of value [35]. Accordingly, transdiagnostic phenotyping has shown that deficits in goal-directed control contribute to vulnerability to compulsive behaviors, a hallmark of disorders such as obsessive-compulsive disorder and addiction [35,34,37] (but see [38] for a conflicting account of habit theory for addiction).
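The contrast between prospective simulation and cached values can be made concrete with a toy outcome-devaluation example. The sketch below assumes a hypothetical environment with two actions and two outcomes and a known transition matrix T (rows: actions; columns: outcomes). The model-based agent computes action values on demand as T @ outcome_values, whereas the model-free agent caches values learned from experienced rewards with a delta rule. When one outcome is devalued without being re-experienced, only the model-based values change.

```python
import numpy as np

# Hypothetical environment: P(outcome | action); rows = actions, cols = outcomes
T = np.array([[0.9, 0.1],    # action 0 mostly yields outcome A
              [0.1, 0.9]])   # action 1 mostly yields outcome B


def model_based_values(T, outcome_values):
    """Prospective evaluation: expected outcome value of each action under T."""
    return T @ outcome_values


def train_model_free(T, outcome_values, n_trials=500, alpha=0.1, seed=0):
    """Cached action values learned from experienced rewards (delta rule)."""
    rng = np.random.default_rng(seed)
    q = np.zeros(T.shape[0])
    for _ in range(n_trials):
        a = rng.integers(T.shape[0])                 # sample an action uniformly
        o = rng.choice(T.shape[1], p=T[a])           # observe the resulting outcome
        q[a] += alpha * (outcome_values[o] - q[a])   # cache the experienced value
    return q


values = np.array([1.0, 0.2])           # outcome A initially more valuable
q_mf = train_model_free(T, values)      # trained before devaluation
devalued = np.array([0.0, 0.2])         # outcome A devalued (e.g. by satiety)

print("model-free (cached):", q_mf.round(2), "-> prefers action", q_mf.argmax())
q_mb = model_based_values(T, devalued)
print("model-based (recomputed):", q_mb.round(2), "-> prefers action", q_mb.argmax())
```

The model-free agent would keep choosing action 0 until it experiences the devalued outcome again, which is the behavioral signature usually taken as habitual control.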
Interestingly, a recent study has provided the first evidence for individual differences in the balance between model-based and model-free learning in appetitive Pavlovian learning [12]. This line of research stems from the identification of different behavioral phenotypes in rats during Pavlovian autoshaping procedures, a Pavlovian conditioning paradigm in which the CS is a discrete cue presented before US delivery at a different location. Two types of conditioned responses can emerge, each capturing a specific action tendency: while goal-trackers approach the location of US delivery upon presentation of the CS, sign-trackers approach and engage with the CS itself [13]. While both phenotypes learn the predictive value of the CS, only sign-trackers assign it incentive value; for them, reward cues become 'motivational magnets' [39,13]. Furthermore, these behavioral differences seem to reflect specific computational strategies: goal-trackers appear to engage cortical regions and model-based computations, whereas sign-trackers appear to rely on a model-free, subcortical, dopamine-dependent form of learning [13,40,17]. Crucially, relative to goal-trackers, sign-trackers tend to present concomitant characteristics typically associated with compulsive reward-seeking behaviors such as addiction, namely attentional deficits and personality traits such as novelty-seeking, risk-seeking and impulsivity [39,41,13]. Several researchers have therefore posited that sign-tracking might represent a promising candidate for a transdiagnostic vulnerability profile for compulsive reward-seeking behaviors [42,43].
Translational research has provided preliminary evidence that the same distinction can be drawn in humans, where sign- and goal-tracking have been demonstrated using eye-tracking [44,12] (for a review, see [41]). Importantly, Schad et al. [12] further found a neurocomputational double dissociation, with behavioral, physiological, and neural responses being guided by model-free values in sign-trackers and by model-based values in goal-trackers. More specifically, in sign-trackers, gaze and pupil dilation were driven by a model-free reward prediction error derived from a Rescorla-Wagner computational model; in goal-trackers, gaze and pupil dilation were driven by a model-based state prediction error [12]. Additionally, the temporal difference reward prediction error signal was found in the ventral striatum only for sign-trackers, while goal-trackers exhibited a stronger state prediction error signal in the intraparietal sulcus, a region associated with model-based learning [12]. Finally, this study replicated a previous finding [44] that Pavlovian cues exerted more control over behavior in sign-trackers than in goal-trackers [12], in line with the incentive salience hypothesis of addiction. Preliminary data also suggest that human sign-trackers might be more impulsive than goal-trackers [44]; however, it remains to be determined whether differences in sign-tracking and goal-tracking during Pavlovian learning translate into increased vulnerability to compulsive reward-seeking behaviors in humans.
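The two error signals can be contrasted in a short sketch. A reward prediction error tracks how much reward was received relative to expectation, while a state prediction error tracks how surprising the state that follows the CS is, independently of reward. The code below uses one common formulation of the state prediction error (one minus the predicted probability of the observed state, after which the transition estimate is moved toward the observation); it is not a reproduction of the models fitted by Schad et al. [12], and the scenario (reward magnitude halving while the post-CS state stays the same) is hypothetical.

```python
import numpy as np


def reward_prediction_errors(rewards, alpha=0.2):
    """Model-free Rescorla-Wagner reward prediction errors."""
    v, out = 0.0, []
    for r in rewards:
        delta = r - v
        out.append(delta)
        v += alpha * delta
    return np.array(out)


def state_prediction_errors(next_states, n_states, eta=0.2):
    """State prediction errors for the state following the CS, regardless of reward."""
    p = np.full(n_states, 1.0 / n_states)   # learned P(next state | CS)
    out = []
    for s in next_states:
        spe = 1.0 - p[s]                    # surprise about the observed state
        out.append(spe)
        p *= 1.0 - eta                      # shrink all transition estimates...
        p[s] += eta                         # ...and move mass to the observed state
    return np.array(out)


# Hypothetical scenario: the CS is always followed by the same reward location
# (state 0), but reward magnitude halves from trial 30 onwards.
rewards = np.r_[np.ones(30), np.full(30, 0.5)]
next_states = np.zeros(60, dtype=int)
rpe = reward_prediction_errors(rewards)
spe = state_prediction_errors(next_states, n_states=2)
print("trial 30: RPE =", rpe[30].round(3), "SPE =", spe[30].round(4))
```

At the change in reward magnitude the reward prediction error is large while the state prediction error stays near zero, which is what allows the two signals to be regressed separately against gaze, pupil or BOLD data.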
Hence, individuals' computational phenotypes during appetitive Pavlovian learning, by affecting the degree to which reward cues are able to exert control over behavior, appear to be an important source of individual differences at the behavioral and implementational levels. These findings constitute an important step in translational efforts to apply the sign-tracking/goal-tracking animal model to the understanding of human compulsive reward-seeking behaviors such as addiction, and to the study of individual differences in appetitive Pavlovian learning in general. Additionally, they underscore the potential of computational approaches to appetitive Pavlovian learning to inform our understanding of individual differences with regard to mental health.

Perspectives
These findings also suggest that computational approaches to Pavlovian learning could be useful in accounting for individual differences in a variety of affective processes, with applications beyond mental health. For instance, a better understanding of how individuals learn to assign positive affective value could also provide insights into the development of preferences. In this vein, a recent study has provided a computational account of the optimism bias [45]. Using a variant of the Rescorla-Wagner model that implements dual learning rates (an excitatory learning rate, which drives learning when the prediction error is positive, and an inhibitory learning rate, which drives learning when the prediction error is negative), the authors suggested that the optimism bias results from positive prediction errors being systematically overweighted [45]. Importantly, not all individuals showed this characteristic asymmetry in learning rates, suggesting that not all individuals were susceptible to the optimism bias [45]. Optimism also affected behavior: it was characterized by a propensity to favor exploitation over exploration [45]. The authors draw a parallel between this association, linking a tendency to 'disregard bad news' with behavioral rigidity and conservatism, and real-world phenomena such as inaction in the face of climate change [45]. Such findings show how seemingly small individual differences in computational strategies could have profound implications, shaping beliefs and dictating how individuals derive meaning from experience.
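A minimal sketch of the dual-learning-rate idea, assuming binary outcomes that alternate deterministically (so the true reward rate is 0.5): separate learning rates are applied to positive and negative prediction errors. With a larger rate for positive errors, the learned value settles above the true reward rate; with the reverse asymmetry it settles below. The function name and parameter values are illustrative and are not those estimated in [45].

```python
import numpy as np


def asymmetric_rw(rewards, alpha_pos, alpha_neg, v0=0.5):
    """Rescorla-Wagner with separate learning rates for positive and negative
    prediction errors; returns the value after each trial."""
    v, trajectory = v0, []
    for r in rewards:
        delta = r - v
        v += (alpha_pos if delta > 0 else alpha_neg) * delta
        trajectory.append(v)
    return np.array(trajectory)


rewards = np.tile([1.0, 0.0], 100)   # true reward rate: 0.5
for label, a_pos, a_neg in [("optimistic", 0.3, 0.1),
                            ("symmetric", 0.2, 0.2),
                            ("pessimistic", 0.1, 0.3)]:
    late = asymmetric_rw(rewards, a_pos, a_neg)[-100:]
    print(f"{label:>11}: mean learned value = {late.mean():.3f}")
```

Only the symmetric learner's average estimate matches the true reward rate; the asymmetry alone, with no change in the environment, is enough to bias the learned value.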
As exemplified in this study, much of the literature on individual differences in affective value learning is based on instrumental tasks. The recent results stemming from the study of model-based and model-free Pavlovian learning show how productive it can be to apply dimensions identified in instrumental learning to Pavlovian learning. Indeed, the investigation of model-based Pavlovian learning is relatively recent because, contrary to instrumental learning, Pavlovian learning has long been presumed to be solely model-free [22]. The identification of model-based computations during Pavlovian learning [46,47,21,48,12,49] has led to suggestions that Pavlovian learning could involve two behavioral controllers analogous to habitual and goal-directed mechanisms in the instrumental domain [22]. However, recent findings suggest that applying the model-based/model-free typology to Pavlovian learning might be more complicated than expected; for instance, certain Pavlovian responses appear to share characteristics of both model-based and model-free learning [21]. More generally, as psychology builds new frameworks that try to go beyond dual-system theories [50], the model-based/model-free typology is being elaborated and extended [51]. While this appealing dichotomy continues to be theoretically and experimentally fruitful, the notion that such a reductive architecture could ultimately account for the complexity of reward-seeking behavior is being questioned [51,52,53,54,55]. Researchers have thus pointed to the need for a more nuanced framework to push forward investigations into the computations underlying value learning.
Promising approaches include exploring the dynamic arbitration between competing behavioral control mechanisms [2], rethinking the classical tasks on which the model-based/model-free dissociation is often based [56,55], and considering subtler algorithms such as the successor representation, which strikes a compromise between model-based and model-free operations by storing partially computed outcome values [57,53]. Such research could provide a more granular account of individual differences in appetitive Pavlovian learning. Additionally, the question remains whether individual differences in other aspects of appetitive instrumental learning, such as sensitivity to risk [58], to volatility [59], or to mood [60], extend to Pavlovian learning.
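To make the successor representation concrete, the sketch below learns, by temporal difference updates, a matrix M whose entry M[s, s'] estimates the discounted expected future occupancy of state s' when starting from state s; state values are then obtained as M @ R from the current reward vector R. The three-state chain (CS, then an intermediate cue, then the outcome), the terminal-state handling and the parameter values are illustrative assumptions. Because reward is stored separately from the cached predictive map, a change in outcome value propagates to all states immediately, whereas a change in transition structure would still require M to be relearned.

```python
import numpy as np


def learn_successor_representation(episodes, n_states, alpha=0.1, gamma=0.9):
    """TD learning of the successor matrix M[s, s'] (expected discounted
    future occupancy of s' when starting from s)."""
    M = np.zeros((n_states, n_states))
    identity = np.eye(n_states)
    for episode in episodes:
        for s, s_next in zip(episode[:-1], episode[1:]):
            M[s] += alpha * (identity[s] + gamma * M[s_next] - M[s])
        s_last = episode[-1]                       # terminal state: no successors
        M[s_last] += alpha * (identity[s_last] - M[s_last])
    return M


# Hypothetical chain: CS (0) -> intermediate cue (1) -> outcome (2)
M = learn_successor_representation([[0, 1, 2]] * 500, n_states=3)

reward = np.array([0.0, 0.0, 1.0])       # the outcome is fully valued
print("values before devaluation:", (M @ reward).round(3))
devalued = np.array([0.0, 0.0, 0.5])     # the outcome is partially devalued
print("values after devaluation: ", (M @ devalued).round(3))
```

This is the sense in which the successor representation sits between the two families: like model-free learning it caches a learned quantity, but like model-based learning it recombines that quantity with current outcome values at decision time.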
Finally, further research into individual differences in Pavlovian learning could benefit emotion research by providing more detailed accounts of how such learning processes contribute to the various components constitutive of positive emotions. While aversive Pavlovian learning has been extensively associated with negative emotions, in particular fear, explicit links between appetitive Pavlovian learning and positive emotions are not as widespread. Affective modeling conducted with artificial intelligence agents has provided accounts of how specific positive emotions (e.g. hope, joy, interest) or emotional dimensions (e.g. positive valence) can be derived from various aspects of reinforcement learning (e.g. value signal, prediction error; [61]). Probing these links in human populations could lead to a better understanding not only of how specific emotional processes (e.g. the appraised affective relevance of the CS and US) contribute to appetitive Pavlovian learning, but also, in turn, of how appetitive Pavlovian learning contributes to different aspects of emotion elicitation and response, including emotional experience.

Conflict of interest statement
Nothing declared.