Animal affect and decision-making

The scientific study of animal affect (emotion) is an area of growing interest. Whilst research on mechanism and causation has predominated, the study of function is less advanced. This is not due to a lack of hypotheses; in both humans and animals, affective states are frequently proposed to play a pivotal role in coordinating adaptive responses and decisions. However, exactly how they might do this (what processes might implement this function) is often left rather vague. Here we propose a framework for integrating animal affect and decision-making that is couched in modern decision theory and employs an operational definition that aligns with dimensional concepts of core affect and renders animal affect empirically tractable. We develop a model of how core affect, including short-term (emotion-like) and longer-term (mood-like) states, influences decision-making via processes that we label affective options, affective predictions, and affective outcomes, and which correspond to similar concepts in schemas of the links between human emotion and decision-making. Our framework is generalisable across species and generates questions for future research.


Introduction
The concept of animal emotion has a turbulent history in biology spanning from Darwin's (1872) ready acceptance that non-human animals (hereafter animals) have emotional states of mind, through objections that such internal states are not accessible to scientific study (Skinner, 1953;Tinbergen, 1951), to the recent resurgence of interest in animal emotion in neuroscience, psychopharmacology, and animal welfare science (Adolphs and Anderson, 2018;Bach and Dayan, 2017;Dawkins, 2000;de Waal, 2011;LeDoux, 1996;Panksepp, 1998;Paul et al., 2005;Rolls, 2005). Current interest is so wide that it even stretches to journals whose traditional focus is on cellular and molecular biology (Anderson and Adolphs, 2014;LeDoux et al., 2016), and to discussions about the possibility of insect emotion (Baracchi et al., 2017;Mendl et al., 2011;Mendl and Paul, 2016;Perry and Baciadonna, 2017).
The types of research question that are asked about animal emotion can be usefully organised according to the 'Four Whys' framework for studying animal behaviour advocated by the ethologist, Niko Tinbergen (Bateson and Laland, 2013;Mobbs et al., 2018;Tinbergen, 1963). Proximate questions about cause and, to a lesser extent, development have dominated, as in affective neuroscience which seeks to understand the neural substrates of emotion (Adolphs and Anderson, 2018;Cardinal et al., 2002;Dalgleish, 2004;LeDoux, 1996;Panksepp, 1998;Rolls, 2005), and psychopharmacology which addresses the interplay between drugs, emotions, and other psychological states (Cryan et al., 2002, 2005;Everitt et al., 2018;Prut and Belzung, 2003). Ultimate questions about function and evolution have received less attention. One set of functions frequently attributed to human emotion is to coordinate or organise adaptive responses and decisions. For example, emotions "…function within individuals in the control of goal priorities" (Oatley and Jenkins, 1992); emotional experience "…serves as data for judgment and decision-making processes and also for reordering processing priorities" (Clore, 1994); "emotion…can be regarded as the primary source of decisions and, thus, control of behaviour" (Frijda, 1994); "emotions are a major factor in the interaction between environmental conditions and human decision processes" (Bechara and Damasio, 2005); "emotions play complex roles in economic decision-making" (Dunning et al., 2017; see also : Levenson, 1994;Loewenstein et al., 2001;Mellers et al., 1999;Scherer, 1994).
This putative interplay between emotion and adaptive decision-making / behaviour control combines questions about mechanism and function and is also emphasised by researchers working with animals. "Pleasure serves as the common currency in the tradeoffs between clashing motivations" (Cabanac, 1992); emotions "…gear a particular type of motivational control to the perception of critical circumstances" and "constrain decision-making" (Aureli and Whiten, 2003); emotions are "mental and bodily states that potentiate behaviour appropriate to environmental challenges" (de Waal, 2011; see also : Broom, 1998;LeDoux, 2012b;Mendl et al., 2010;Nesse, 2000;Panksepp, 2011;Rolls, 2014). Despite an apparent consensus that an important function of animal emotion, like human emotion, is to coordinate or organise decisions and behavioural control, it is often not made clear what 'coordinating' and 'organising' actually involve, and the exact role that animal emotion plays.
https://doi.org/10.1016/j.neubiorev.2020.01.025 Received 14 June 2019; Received in revised form 11 December 2019; Accepted 20 January 2020
In human research, theory and empirical findings have generated more detailed descriptions of the coordinating and organising role of emotions in decision-making. Many of these are captured in Loewenstein and Lerner's (2003) affect and decision-making schema which is summarised in Fig. 1. Humans are predicted to make decisions and select behaviours based on the expected emotions that they anticipate will occur following the expected consequences of their decisions (Loewenstein and Lerner, 2003; also 'somatic markers', 'affective forecasting', 'subjective expected pleasure', 'affect heuristic': Bechara and Damasio, 2005;Knutson and Greer, 2008;Mellers et al., 1999;Slovic et al., 2007;Wilson and Gilbert, 2005). Contemplation of potential decision outcomes and their associated expected emotions acts as an anticipatory influence, generating an individual's immediate emotions during decision-making, for example by inducing 'anxiety' if negative or uncertain outcomes are envisaged. Similarly, Knutson and Greer (2008) propose that anticipation of decision outcomes generates what they term anticipatory affect during decision-making, and Bechara and Damasio (2005) suggest that somatic markers of the prospective decision outcome are registered whilst a decision is being made. These markers (changes in bodily physiological and behavioural states, which are components of emotion) are hypothesised to signal the consequences of actions in the decision context, and hence to guide decisions. However, Dunning et al. (2017) suggest that immediate emotions are generated partly through contemplation of the action to be taken per se (action-related emotions), irrespective of any prospective decision outcome. Loewenstein and Lerner (2003) propose that both immediate and expected emotions combine to influence the decision taken. For example, current anxiety coupled with a negative expected emotion would predispose caution or avoidance.
Immediate emotions are also moderated by incidental influences such as moods and background somatic states; a sunny day may generate a happy mood and alter immediate emotions with knock-on effects for decisions (Loewenstein and Lerner, 2003; see also : Clore and Tamir, 2002;Clore, 1992;Dunning et al., 2017;Mathews and MacLeod, 1994;Mineka et al., 1998;Mogg and Bradley, 2005;Schwarz and Clore, 1983;Slovic et al., 2007). Recursively, immediate emotions can also have indirect effects on expected decision outcomes and expected emotions (Loewenstein and Lerner, 2003).
This human-based schema can provide a useful start-point for framing research on animal affect and decision-making. However, it raises a number of theoretical and empirical questions. How are the concepts of immediate emotion, expected emotion and mood related? Can we operationalise them in such a way that allows us to study them empirically in animals? How exactly might they interact with each other and alter decisions? Does this require a 'common currency' system and, if so, what might this be? What happens following a decision and how does this affect subsequent ones?
We suggest that a conceptual framework that offers answers to these questions can help to structure the study of animal affect and decision-making and provide hypotheses that can be tested experimentally. Steps in this direction have been taken by, amongst others, Mendl et al. (2010); Nettle and Bateson (2012); Bach and Dayan (2017); Gygax (2017), and Burghardt (2019). Here we explore this issue in depth by developing a framework that is explicitly couched in terms of modern decision theory, employs an operational definition that renders animal emotion empirically tractable and can be applied across species, shows how this definition readily aligns with prominent dimensional concepts of emotion, and uses principles from reinforcement learning theory to provide a model of the links between animal affect and decision-making. We start by defining our use of terms and, after developing our model, end by considering implications, elaborations, caveats, and questions for future research.
'fear', 'elation', 'anxiety', and 'depression'. 'Emotion' refers to short-term event- or object-focused states and 'mood' to longer-term free-floating states. A shared and defining characteristic of these states is that they are valenced: perceived as pleasant or unpleasant, positive or negative, rewarding or punishing (Carver, 2001;Posner et al., 2005;Russell, 2003;Watson et al., 1999). An over-arching term for valenced states is affect and therefore emotions and moods are examples of affective states. Affective states also include the valenced component of sensations such as the unpleasantness of pain or a bitter taste, as opposed to the sensory component of sensations such as the specific (non-valenced) taste of a bitter substance generated by dedicated sensory apparatus (Rolls, 2005).
Human affective states are given linguistic labels ('happiness', 'sadness' etc.) which imply consciously experienced phenomena or 'feelings'. However, we cannot be certain that non-human animals share similar states, experience them subjectively, or even have the capacity for conscious experience (Boly et al., 2013;Dawkins, 2015, 2017;Edelman and Seth, 2009;LeDoux, 2017;LeDoux and Brown, 2017;Paul et al., 2020;Wynne, 2004). Contemporary psychologists take a componential view of emotion or affect (Bradley and Lang, 2000;Clore and Ortony, 2000;Frijda, 1986;Panksepp, 2003;Scherer, 1984, 2005), acknowledging that subjective feelings are accompanied by behavioural, physiological and neural changes and that, in some cases, human affective states may actually be unconscious (e.g. relevant behavioural responses unaccompanied by reported feelings: Lane, 2015, 2016;Winkielman and Berridge, 2004;Winkielman et al., 2005). Translating this approach to animals allows us to study measurable behavioural, physiological and neural components of affective states in the absence of knowledge about subjective experience.
However, there is concern about translating emotion-words to animals because lay usage can contaminate technical usage, leading to confusion about whether consciousness is implied or not. LeDoux (2017), for example, advocates only using these words to denote conscious feelings, and using terms such as 'defensive survival circuits' to describe exactly what is being studied in animals. We employ a slightly different approach. When using 'emotion' and emotion-words (e.g. anger) or 'mood' and mood-words (e.g. depression) with reference to animals, we add the suffix '-like' to indicate agnosticism about the conscious component of these states (cf. 'episodic-like memory' in animals; Clayton and Dickinson, 1998). However, we use terms such as 'affect', 'affective state', 'valence' and 'core affect', which have a more technical than colloquial provenance, in the same way when referring to animals and humans, but without implying conscious experience in animals (see Table 1 for full details).

An operational definition of animal affect
Given the potential for confusion about ideas and terminology in this area, animal emotion research will benefit from a clear operational definition that can provide a foundation for experimental investigations in non-human species (see Paul and Mendl, 2018). The definition should capture the defining characteristic of affective states which is that they are valenced. To this end, we build on reinforcement-based concepts of emotion from 20th century experimental psychology (Corr, 2008;Corr and McNaughton, 2012;Gray, 1975, 1987;McNaughton and Corr, 2003;Millenson, 1967;Mowrer, 1960;Thorndike, 1911) and draw on Rolls' (2005) definition of emotions as 'states elicited by rewards and punishers' to produce the following definition which is operationalised by defining rewards and punishments in behavioural terms (Rolls, 2005, 2014; see also: Leknes and Tracey, 2008;Seymour et al., 2007).
Animal affective states are elicited by rewards and punishers or their predictors. A reward is anything for which an animal will work, and a punisher is anything that it will work to escape or avoid. Rewards or the absence of punishers, and associated predictions thereof, induce positive affect. Punishers or the absence of rewards, and associated predictions thereof, induce negative affect. Short-term emotion-like states follow immediately from individual rewarding or punishing events, whilst cumulative experience of events influences longer-term mood-like states.
M. Mendl and E.S. Paul Neuroscience and Biobehavioral Reviews 112 (2020) 144-163

Table 1
We use these phrases to denote research areas and fields of study.

'Affect', 'valence' and related terms
We use 'affect' as an overarching term for states that have the property of valence (they are positive or negative, rewarding or punishing, pleasant or unpleasant etc.). Affective states include both emotions and moods (see below) but also the valenced component of sensations (e.g. the unpleasantness of pain). They are usually considered to be consciously experienced given their provenance in studies of reported subjective feelings. Nevertheless, affective states also have nonconscious components (e.g. behavioural indicators; neural changes), and because the term 'affect' has a more technical rather than colloquial usage, it is less likely to be interpreted as necessarily implying conscious experience. For these reasons, when referring to animals we use the terms 'affective state', 'affect' and 'valenced state' in the same way as for humans, but without implying conscious experience.

'Emotion' and emotion-words (e.g. 'fear', 'anger', 'happiness')
Following convention in the human psychology literature, we use these terms to refer to short-term valenced states induced by a specific event or object. When referring to animals, we add the suffix '-like' (e.g. 'emotion-like', 'fear-like') to indicate agnosticism about the conscious component of an emotion (cf. 'episodic-like memory', Clayton and Dickinson, 1998).

'Mood' and mood-words (e.g. 'depression')
Following convention in the human psychology literature, we use this term to refer to longer-term 'free-floating' valenced states not associated with particular events or objects. When referring to animals, we add the suffix '-like' (e.g. 'mood-like') to indicate agnosticism about the conscious component of a mood (cf. 'episodic-like memory', Clayton and Dickinson, 1998).

'Core affect' and related terms
'Core affect' refers to a conceptual model of human self-reported emotions that identifies valence and also arousal (how activated or energised the individual is) as the key underlying dimensions of affective states. 'Core affect space' is defined by these dimensions. When referring to animals, we use the terms 'core affect', 'valence' and 'arousal' in the same way as for humans, but without implying conscious experience.

'Feelings'
We use this term when specifically referring to the conscious experience of affective states including emotions, moods, and sensations.

The definition identifies four fundamental affective states: positive states generated by (i) a reward or its predictors or (ii) termination or omission of an anticipated punishment; negative states generated by (iii) a punishment or its predictors or (iv) termination or omission of anticipated reward. Indeed, when people anticipate or gain access to objects or events that they want, or avoid those that they don't want (and are anticipating), they usually experience short-term positive affect (e.g. 'excitement' or 'joy', and 'relief' respectively), and when they anticipate or are exposed to objects or events that they try to avoid, or fail to obtain those that they want (and are anticipating), they experience short-term negative affect (e.g. 'anxiety' or 'fear' and 'disappointment' respectively) (Camille et al., 2004;Carver, 2001;Leknes and Tracey, 2008;Mellers et al., 1999;Mellers, 2000;Seymour et al., 2005;Watson and Tellegen, 1985;Zeelenberg et al., 2000). Furthermore, the repeated experience of, for example, stimuli or events that people seek to avoid can generate longer-term negative mood states such as depression and other forms of emotional distress (Hammen, 2005;Kessler, 1997;Schilling et al., 2007;Turner and Lloyd, 1995). We know that there are many stimuli or resources (usually beneficial or harmful (e.g. food, mates, shelter, predators)) that non-human animals will work to acquire or avoid (Dawkins, 1990), and we can therefore point to one major set of instances in which affective states arise in animals; that is when these things (rewards or punishers) are acquired or not acquired, avoided or not avoided. According to our definition, short-term emotion-like states occur in these circumstances.
Animals don't just work to acquire or avoid biologically salient stimuli, they will also work to access or avoid stimuli that predict these things. For example, the phenomenon of second-order conditioning demonstrates that animals will work to access a cue that predicts a first-order cue that predicts a biologically salient stimulus (e.g. Gilboa et al., 2014). This indicates that the first-order cue is itself rewarding and so, following our definition, can induce short-term affect. A second set of instances when short-term emotion-like states occur in animals is, therefore, in response to predictors of rewards and punishers.
As well as working for short-term stimuli or instantaneous reward, animals also work or show preferences for environments that they have experienced over a longer time period. For example, they exhibit conditioned-place preference for, or avoidance of, environments in which they have, over the course of minutes to hours, experienced the effects of a drug, or mild threat, or social contact (e.g. Tzschentke, 2007). They also show preferences for and work harder to access some environments in which they have lived for periods of days to weeks, compared to others (Mason et al., 2001;Nicol et al., 2009). In line with our operational definition, working to access or avoid these environments can be taken as evidence of longer-term 'mood-like' states generated by cumulative experience of rewarding or punishing events in the environments. This would be most convincingly demonstrated if, for example, an environment characterised by a high ratio of rewarding:punishing events is worked for more strongly than one characterised by a lower ratio, indicating that animals are working for / preferring an emergent property of the environment rather than individual events per se. We would also expect that preferences / work would alter over time as the characteristics of environments changed (e.g. punishers became more or less frequent). Unfortunately, there is a paucity of such data, but there are new methods that may help us measure longer-term mood-like states which we discuss in Sections 6.5 and 6.6.
Our operational definition is thus useful in a number of ways. It incorporates the notion that valence is the defining feature of affective states. Its behavioural grounding allows animal affect to be studied empirically; rewards and punishers can be identified experimentally and then used to induce positive or negative valence, thus providing a 'ground-truth' for the animal's affective state. It identifies four types of affective state that can occur in at least three temporal contexts; in response to immediate reward or punishment, in response to predictors of reward and punishment, and across longer time scales divorced from specific events.
If we assume that rewards enhance survival and punishers threaten survival, the approach provides an evolutionary framing that focuses on two major imperatives likely to be evident in most animal species: acquiring resources for, and avoiding threats to, survival and reproduction. It can thus be readily generalised across taxa. Moreover, the definition allows affect to be studied independently of the question of consciousness (cf. Dawkins, 2015, 2017). Affective states, as defined, need not be consciously experienced in other species. Rather, they can be thought of as being implemented in the nervous system and forming the evolutionary roots of conscious emotion that has evolved in humans and probably other species too. Separate empirical, theoretical and philosophical studies of animal consciousness (Boly et al., 2013;Edelman et al., 2005;Edelman and Seth, 2009;Paul et al., 2020;Seth et al., 2005;Wemelsfelder, 2001;Wynne, 2004) are required to establish which species may share this conscious experience. Despite the utility of the definition, it inevitably has limitations which we discuss in Section 6.1.
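The four fundamental affective states identified by the operational definition can be written out as a simple lookup. The following is a minimal illustrative sketch, not part of the original framework: the names `Stimulus`, `Event` and `affective_state` are hypothetical, and the numerals (i) to (iv) follow the numbering used in the text above.

```python
from enum import Enum


class Stimulus(Enum):
    """What the animal will work for (reward) or work to escape/avoid (punisher)."""
    REWARD = "reward"
    PUNISHER = "punisher"


class Event(Enum):
    """Hypothetical coding of what happens to the stimulus."""
    PRESENTED_OR_PREDICTED = "presented_or_predicted"
    OMITTED_OR_TERMINATED = "omitted_or_terminated"


def affective_state(stimulus: Stimulus, event: Event) -> tuple[str, int]:
    """Return (state numeral, valence sign) under the operational definition."""
    table = {
        (Stimulus.REWARD, Event.PRESENTED_OR_PREDICTED): ("i", +1),
        (Stimulus.PUNISHER, Event.OMITTED_OR_TERMINATED): ("ii", +1),
        (Stimulus.PUNISHER, Event.PRESENTED_OR_PREDICTED): ("iii", -1),
        (Stimulus.REWARD, Event.OMITTED_OR_TERMINATED): ("iv", -1),
    }
    return table[(stimulus, event)]
```

For example, omission of an anticipated reward, `affective_state(Stimulus.REWARD, Event.OMITTED_OR_TERMINATED)`, yields state (iv) with negative valence.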

Mapping the definition to dimensional models of emotion
Dimensional models of human affective states posit that felt emotions and moods can be explained by the interplay between two or three underlying neurobehavioural systems or dimensions. The notion of valence (positivity vs negativity) is at the heart of these models. For example, a prominent model is that of core affect (Posner et al., 2005;Russell, 2003) in which the dimensions and associated systems are valence and arousal. Related theories propose that key dimensions reflect the activity of neurobehavioural systems concerned with Reward Acquisition (RAS) and Punishment Avoidance (PAS) (Carver, 2001;Corr, 2008;Higgins, 1998;Norris et al., 2010;Watson and Tellegen, 1985;Watson et al., 1999). These two types of model have been combined by conceptualising RAS and PAS as lying at 45° to the core affect axes (Burgdorf and Panksepp, 2006;Carver, 2001;Knutson and Greer, 2008;Mendl et al., 2010;Russell and Barrett, 1999;Yik et al., 1999;Fig. 2).
In Fig. 2 we show how the four fundamental affective states of our operational definition map closely to locations in 'core affect' space. The arrival of rewards or their predictors immediately generates a transient increase in RAS activity leading to a REW-H ('high-reward') state (green circle in Fig. 2A), whilst absence of, or failure to obtain, expected rewards generates a transient decrease in RAS activity and associated REW-L ('low-reward') state. Likewise, punishment or its prediction transiently increases PAS activity resulting in a PUN-H state, and omission or successful avoidance of an anticipated punishment decreases PAS activity generating a transient PUN-L state. Activation of either system is associated with an increase in arousal that prepares the organism for (vigorous and imperative) action to acquire the reward or avoid the punisher (Bach and Dayan, 2017;Boureau and Dayan, 2011;Carver, 2001;Trimmer et al., 2013). Valence is dependent on which system is activated or deactivated (high RAS activity and/or low PAS activity for positive valence, and vice versa for negative valence: Carver, 2001;Higgins, 1998). We suggest that these short-term transient changes in system activity correspond to emotion-like states.
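The geometry described above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that RAS and PAS activity are expressed as signed deviations from baseline, that the RAS axis lies at 45° and the PAS axis at 135° to the valence axis, and that the two contributions add linearly; the function name `core_affect` is hypothetical.

```python
import math


def core_affect(ras: float, pas: float) -> tuple[float, float]:
    """Map RAS and PAS activity to a (valence, arousal) location.

    Assumption: the RAS unit vector is (cos 45°, sin 45°) and the PAS unit
    vector is (cos 135°, sin 135°), so the location is their weighted sum.
    """
    c = math.cos(math.pi / 4)
    valence = c * (ras - pas)   # high RAS and/or low PAS gives positive valence
    arousal = c * (ras + pas)   # activation of either system raises arousal
    return valence, arousal


# The four transient states as unit changes in one system at a time
REW_H = core_affect(ras=+1.0, pas=0.0)   # positive valence, high arousal
REW_L = core_affect(ras=-1.0, pas=0.0)   # negative valence, low arousal
PUN_H = core_affect(ras=0.0, pas=+1.0)   # negative valence, high arousal
PUN_L = core_affect(ras=0.0, pas=-1.0)   # positive valence, low arousal
```

Under these assumptions each of the four states falls in a different quadrant of core affect space; whether the systems actually combine additively is an open question that the text returns to when discussing independence versus mutual inhibition.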
An individual's history of activation or deactivation of RAS and PAS is likely to influence the baseline or 'resting' activity of these neurobehavioural systems which can be likened to a background 'free-floating' mood-like state. Thus, we propose that mood-like states reflect some cumulative function of prior short-term RAS and PAS emotion-like states, which is likely discounted with time (cf. Bach and Dayan, 2017;Eldar et al., 2016;Gygax, 2017;Mendl et al., 2010;Nettle and Bateson, 2012). These are depicted in Fig. 2B as two clouds of red points that can be likened to a sampling distribution in memory (cf. Kacelnik and Bateson, 1996) made up of past affective outcomes of decisions weighted by their frequency of occurrence and recency. In this case, the individual has experienced many punishing events (PUN-H) and a fair number of events in which it failed to acquire rewards (REW-L). Theoretically, RAS and PAS may operate independently (Carver, 2001;Norris et al., 2010) or interact, including via mutual inhibition, to generate a higher-level bipolar dimension (Leknes and Tracey, 2008;Tellegen et al., 1999). Accordingly, we represent their interplay as generating competing areas of activity in core affect space and/or combining to influence a single location (cloud of grey points in Fig. 2B) and discuss this further in Section 6.7.
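One simple candidate for a 'cumulative function ... discounted with time' is an exponentially weighted running average. The sketch below is an illustration under that assumption only; the text does not commit to a particular functional form, and the name `discounted_baseline` and the `decay` parameter are hypothetical.

```python
def discounted_baseline(activations, decay=0.9, initial=0.0):
    """Exponentially discounted average of past transient activations.

    activations: sequence of signed emotion-like responses of one system
                 (RAS or PAS), ordered from oldest to most recent.
    decay:       weight retained by the previous baseline at each step
                 (assumed 0 <= decay < 1); older events count for less.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    baseline = initial
    for activation in activations:
        baseline = decay * baseline + (1.0 - decay) * activation
    return baseline


# A history dominated by punishing events and some failures to obtain reward
pas_history = [1.0, 1.0, 0.0, 1.0, 1.0]    # PUN-H events raise PAS
ras_history = [0.0, -1.0, 0.0, -1.0, 0.0]  # REW-L events lower RAS
pas_baseline = discounted_baseline(pas_history)
ras_baseline = discounted_baseline(ras_history)
```

Applied separately to the RAS and PAS histories, this yields a raised PAS baseline and a lowered RAS baseline, corresponding to the negatively valenced mood-like state depicted in Fig. 2B. Because recent events are weighted more heavily, the same events in a different order give a different baseline.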
Neural structures associated with reward acquisition (RAS) include mesolimbic dopaminergic and opioidergic circuits, (medial) orbitofrontal cortex, nucleus accumbens and ventral pallidum, and those associated with punishment avoidance (PAS) include serotonergic circuits, (lateral) orbitofrontal cortex, amygdala, anterior insula, and lateral habenula (Berridge, 2003;Kringelbach, 2013, 2015;Boureau and Dayan, 2011;Cardinal et al., 2002;Cohen et al., 2012;Knutson and Greer, 2008;Leknes and Tracey, 2008;Norris et al., 2010;Redish, 2015;Robbins and Everitt, 1996;Rolls, 2005;Schultz et al., 1997;Seymour et al., 2007). For example, the midbrain dopamine system is a candidate RAS with individual experiences of reward and its prediction generating transient phasic responses, and tonic activity potentially reflecting average reward rate, whilst a PAS role for the serotonergic system has also been proposed, but evidence for this is weaker (Boureau and Dayan, 2011;Cools et al., 2007;Dayan and Huys, 2008, 2009;Niv et al., 2007; but see: Daw et al., 2002;Leknes and Tracey, 2008). We should note that some, perhaps many, brain areas may be associated with both reward and punishment processing (Cohen et al., 2015;Hamann and Mao, 2002;Kim et al., 2016;Leknes and Tracey, 2008;Lindquist et al., 2012;Sergerie et al., 2008), and there is even evidence for flexibility within a brain area across time and context (Berridge, 2019;Reynolds and Berridge, 2008). The notion that tonic activity in relevant circuits provides information about past experience is similar to Rolls' (2005) suggestion that mood states reflect reverberation or 'carry-over' activity in relevant reward or punishment circuits.
The dimensional 'four affective state' model illustrated in Fig. 2 can be thought of as a set of building blocks from which more complex emotion-like states can be constructed. For example, it has been argued that the same basic state generated by different stimuli (e.g. acquisition of food or sexual reward) is likely to be differentiated according to the different sensory and contextual properties of the stimuli (Rolls, 2005). Constructionist accounts of the generation of conscious emotions in humans make a similar argument positing that core affect combines with cognitive representations of other features of the ongoing situation and past experience, to generate a potentially limitless set of emotions (Barrett, 2017a, b). Such accounts can also be translated to animals (Bliss-Moreau, 2017). For example, in 'cognitively sophisticated' species core affect may be coloured by the influence of detailed social knowledge about the eliciting situation to generate 'grief-like' (elephants: Douglas-Hamilton et al., 2006) or 'envy-like' states (inequity aversion in non-human primates: Brosnan and de Waal, 2003). A different view is that discrete emotion-like states (e.g. 'fear-like', 'happy-like') are generated fully-formed by the activation of independent neurobehavioural systems (e.g. 'FEAR system', 'PLAY system': Panksepp, 1998, 2005). In Section 6.2, we discuss how this discrete emotion model may be incorporated into our framework.

Integrating animal affect and decision-making theory: a model
Combining our operational definition with dimensional models of emotion generates a scientifically tractable approach to studying animal affect. In particular, it provides an empirical way of specifying an animal's location in core affect space. For example, presenting the animal with a known reward is assumed to generate a REW-H state, whilst removing a known punisher generates a PUN-L state. This allows systematic investigation of core affect states in animals, including how they affect decision-making, whilst leaving the question of whether these states are consciously experienced to be addressed separately.
As we now discuss, the operational definition also dovetails with principles of modern decision theory. This enables us to incorporate the concept of core affect into decision-theoretical accounts of how actions are selected and hence decisions are made. In doing so, we build on past proposals in psychology and behavioural economics for how felt emotions influence human decisions, and our own previous work hypothesising that mood-like core affect states aid adaptive decision-making (Mendl et al., 2010; see also : Gygax, 2017;Nettle and Bateson, 2012). We take, as our foundation, principles of Bayesian decision theory and reinforcement learning theory, acknowledging that decisions can involve both Pavlovian and instrumental processes and can be effected by both model-free and model-based behavioural control (Dayan and Berridge, 2014). We demonstrate that affective states may influence the decisions of animals in a number of ways depending on the nature of the decision problem (high or low levels of ambiguity or uncertainty) and the behavioural controller(s) employed. We thus propose a descriptive model of the interplay between affect and decision-making that precisely describes hypothesised actions of short- and long-term affective states before, during and after decisions.

Fig. 2. The four fundamental affective states are predicted by an operational definition of animal affect and map on to the quadrants of core affect space. They are associated with the activation (or low activation threshold) or deactivation (or high activation threshold) of RAS and PAS whose activity lies at 45° to the core affect axes. (A) The green circle depicts a transient core affect response (RAS activation) to a rewarding event, i.e. a short-term emotion-like state. (B) Red circles represent hypothetical (in this case, negatively-valenced) baseline activation of systems (or inverse activation thresholds) reflecting the individual's history of reward and punishment and associated short-term emotion-like states of the sort shown in (A). Circle size reflects cumulative frequency, duration or intensity of rewarding or punishing experience (discounted with time) and hence the 'weight of evidence' for a particular state which may be coded as a sampling distribution in memory (cf. Kacelnik and Bateson, 1996). RAS and PAS can have both negatively valenced (left half of core affect space) and positively valenced (right half of core affect space) states. The grey circle indicates a resulting hypothetical integrated location in core affect space. Adapted from Russell (2003); Burgdorf and Panksepp (2006); Knutson and Greer (2008); Mendl et al. (2010). See text for details.

Basic principles of decision theory
Situations, actions, value and the decision problem: Bayesian decision theory proposes that an individual carries (neural) representations of situations of the world where a situation comprises the animal's own state (behavioural, physiological etc.) and that of its environment (Redish, 2015). The individual also represents a probability distribution of future situations, has a set of actions that it can take, and each action has a value in each situation, with more beneficial actions having higher value. The decision problem is to select the action which maximises value in the current situation by leading to the most beneficial future situations. Although actions are usually thought of as being directed externally at the animal's environment, they can also include internally directed choices such as whether and how to deploy attention and working memory (Dayan, 2012). Reinforcement learning (RL) theory offers a set of principles for solving the decision problem (Averbeck and Costa, 2017;Bach and Dayan, 2017;Dayan, 2012;Dayan and Berridge, 2014;Huys et al., 2011;Rangel et al., 2008;Redish, 2015;Sutton and Barto, 1998).
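The decision problem as stated can be sketched in a few lines: each action's value in the current situation is the probability-weighted value of the future situations it may lead to, and the action with the highest value is selected. The situations, actions, probabilities and values below are invented purely for illustration, and the function names are hypothetical.

```python
def action_value(situation, action, transitions, situation_value):
    """Expected value of an action: sum over future situations of
    probability times the value of that future situation."""
    future = transitions[(situation, action)]
    return sum(prob * situation_value[nxt] for nxt, prob in future.items())


def choose_action(situation, actions, transitions, situation_value):
    """Select the action that maximises expected value in this situation."""
    return max(
        actions,
        key=lambda a: action_value(situation, a, transitions, situation_value),
    )


# Illustrative numbers only (not taken from any study)
situation_value = {"fed": 1.0, "attacked": -2.0, "hungry_but_safe": 0.0}
transitions = {
    ("hungry_at_patch", "forage"): {"fed": 0.8, "attacked": 0.2},
    ("hungry_at_patch", "hide"): {"hungry_but_safe": 1.0},
}
best = choose_action(
    "hungry_at_patch", ["forage", "hide"], transitions, situation_value
)
```

With these numbers foraging has the higher expected value and is chosen; raising the probability of attack to 0.5 reverses the choice.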
Pavlovian and instrumental RL: Instrumental RL allows selection of arbitrary actions in a given situation according to the value of outcomes of these actions, in order to optimise long-term pay-offs. Pavlovian RL relates to a smaller set of often preparatory actions (e.g. approach or avoid) which are 'automatically' elicited by features of the current situation that have reliably (e.g. during the species' evolutionary history) predicted the value of the next situation. Thus, a current situation that includes the appearance of a prey item, and hence predicts food intake, automatically elicits approach behaviour. This may be the case even if the appropriate response in that particular circumstance is to move away or perform a detour (Guitart-Masip et al., 2012; Jones et al., 2017). Despite the apparent 'hard-wired' nature of these Pavlovian responses, changes in how the current situation predicts the value of the next situation can be learnt, leading, for example, to Pavlovian responses to previously neutral cues.
Model-free (habitual) and model-based (goal-directed) control: Computationally, RL decisions can be achieved using model-free or model-based control. In model-based (goal-directed) control a model of the world is simulated, for example as a decision-tree predicting how actions at different decision-points will lead to transitions from one situation to the next. The tree must be searched to establish which actions and associated outcomes optimise long-term future gains. Specific outcomes of decisions are represented (e.g. whether food X or food Y will be acquired at the end of a decision sequence). This allows their current value to be evaluated (e.g. I am full of food X so it is less valuable at present) and decisions made accordingly. Model-based control is thus flexible and rapidly adaptable to new circumstances. However, as the decision-tree acquires more branches, prospective simulation of decision outcomes rapidly becomes cognitively and computationally demanding and so the efficacy of model-based control is tightly constrained by limitations in information processing and working memory capacity (Bach and Dayan, 2017; Dayan and Berridge, 2014; Dayan and Daw, 2008; Dickinson, 1985; Dickinson and Balleine, 1994; Dolan and Dayan, 2013). In humans, model-based control equates to conscious reflection about two or more options that one might take, where they may lead, what might be done next, and what the eventual outcome is likely to be. In animals the underlying neural processes may be similar but whether they are accompanied by the conscious experience of future planning is unknown (e.g. Raby et al., 2007).
Model-free (habitual) controllers dispense with the need for an internal model of the world and instead make decisions based on the value of actions in the current situation, as determined by past experience. These values are (neurally) represented in memory as a form of general utility but, unlike in model-based control, the specific outcomes of the actions are not represented. In model-free control, memory of how valuable an action was in the past determines decisions, dispensing with the taxing forward simulation of model-based control. Model-free control is thus computationally more efficient. However, a lack of explicit representation of the decision outcome (e.g. food X or food Y) means that any current devaluation of that outcome (e.g. due to specific satiety) does not alter decisions until the new value of the outcome is experienced. Model-free control is thus less flexible and responsive to changes in current circumstances than model-based control (Dickinson, 1985; Dickinson and Balleine, 1994; Dolan and Dayan, 2013). Instrumental RL can involve both types of control. Although Pavlovian RL has generally been assumed to involve model-free control, model-based control may also play a role in certain circumstances (Dayan and Berridge, 2014). Neural substrates of model-based and model-free control have been identified (Dolan and Dayan, 2013), and one way of behaviourally discriminating between the two is to determine whether current devaluation of decision outcomes alters decisions appropriately (model-based) or not (model-free) (Dickinson, 1985; Dickinson and Balleine, 1994).
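The outcome-devaluation test described above can be illustrated with a small Python sketch. It is a deliberately simplified caricature: the model-free controller consults only cached action values, whereas the model-based controller looks up the identity of each action's outcome and its current value. The action names, outcome identities and numbers are hypothetical.

```python
# Illustrative sketch only: action names and values are hypothetical.

# Model-free: cached utilities learnt before devaluation, no outcome identity.
cached_action_value = {"forage_patch_X": 1.0, "forage_patch_Y": 0.75}

# Model-based: a (one-step) model of which outcome each action produces.
outcome_of_action = {"forage_patch_X": "food_X", "forage_patch_Y": "food_Y"}


def model_free_choice(cached):
    """Choose by cached value; blind to current outcome devaluation."""
    return max(cached, key=cached.get)


def model_based_choice(model, current_outcome_value):
    """Choose by the current value of each action's predicted outcome."""
    return max(model, key=lambda a: current_outcome_value[model[a]])


# The animal has just become sated on food X (specific satiety).
current_outcome_value = {"food_X": 0.125, "food_Y": 0.75}

habitual = model_free_choice(cached_action_value)
goal_directed = model_based_choice(outcome_of_action, current_outcome_value)
print(habitual, goal_directed)
```

In this sketch the habitual choice still targets the devalued food X, whereas the goal-directed choice switches to food Y, which is the behavioural signature used to discriminate the two controllers.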
We now combine the above principles from reinforcement learning theory with our operationalized definition of core affect to provide an account of the interplay between affect and decision-making. The framework that we develop is particularly relevant to decision-making under model-free control but we suggest how it may be extended to model-based control in Section 5.6. We illustrate our framework using the example of a rat moving through its environment and making decisions (Fig. 3).

An animal's situation comprises elements including its internal status, memories, affective coding of past reward and punishment, and incoming sensory information
The rat's situation at any one time can be conceptualised as comprising a number of elements that are forms of neurally encoded information and are illustrated in the grey box in Fig. 3. Information about its current internal status is provided by variables such as blood sugar level, body temperature, and hormone levels; in our example, blood sugar levels are low. Information about its external environment is provided by incoming sensory information from the current environment; here, a predatory snake is about to appear. The rat also carries memories of situations. These memories are from the individual's own past, but also include 'evolutionary memory' of cues and scenarios that reliably predicted fitness-enhancing or threatening outcomes during the species' evolutionary history and which are associated with the expression of appropriate Pavlovian actions such as approach or avoid.
A fourth type of information is affectively coded as it pertains to the rat's recent history of success or failure at acquiring reward or avoiding punishment. We propose that this is coded as a mood-like 'baseline' or 'resting' location in core-affect space, embodied as activation or deactivation of Reward Acquisition (RAS) and Punishment Avoidance Systems (PAS), or inverse activation thresholds (cf. Mendl et al., 2010; Nettle and Bateson, 2012; Trimmer et al., 2013). In our example, the rat has recently been through some tough times with little to eat and frequent encounters with predators. Consequently, its mood-like core affect state comprises an activated (or low response threshold) PAS and a deactivated (or high response threshold) RAS similar to that depicted in Fig. 2B and generating an overall PUN-H state.
We now illustrate how these elements of the rat's situation combine to guide a sequence of decision-making events as depicted in Fig. 3, with the horizontal axis representing time.

M. Mendl and E.S. Paul, Neuroscience and Biobehavioral Reviews 112 (2020) 144-163

Prediction and detection of new incoming sensory information is influenced by the animal's situation
We propose that information about the rat's current internal status and its history of reward and punishment in the external environment gives it a basis for estimating, respectively, the value and probability of subsequent situations or action outcomes. In our example, the rat's low blood sugar levels enhance the value, or rewarding properties, of outcomes that are associated with energy-rich food. This parallels the notions of motivation and incentive salience whereby choice of actions and how strongly they are performed relates to the animal's need for the associated outcome (Berridge, 2007, 2012; Berridge et al., 2009; Cabanac, 1992; Dayan and Berridge, 2014; Niv et al., 2006; Rolls, 2005). At the same time, we propose that the rat's PUN-H mood-like core affect state predicts increased likelihood of negatively-valenced outcomes such as predators or competitors in the environment; based on recent past experience, the rat expects more bad than good things to happen (Mendl et al., 2010; Nettle and Bateson, 2012; Trimmer et al., 2013).
Because the rat's situation includes predictions and valuations of subsequent situations, the arrival and interpretation of sensory information is not a purely passive process. New information, for example cues predicting potential reward or punishment, may thus generate a prediction error (Schultz et al., 1997). Expectation violations and surprise are often associated with increased arousal and alertness and the recruitment of attentional resources (Belova et al., 2007; Esber and Haselgrove, 2011; Lee et al., 2006; Schomaker and Meeter, 2015). The resulting increase in attention to the external environment is also influenced by predictions and is likely to be biased towards detecting cues predicting danger due to the rat's PUN-H mood-like core affect state (Bar-Haim et al., 2007; Bethell et al., 2012b; Bradley et al., 1999; Lee et al., 2016; Mogg et al., 1992). Attentional resources will also be directed towards detecting food-related stimuli due to the rat's low blood sugar levels and associated motivation for food (Field et al., 2016; Werthmann et al., 2013). These influences are illustrated by arrows running from internal status and mood-like core affect to incoming sensory information in Fig. 3, and are a form of internal action selection, in this case deployment of attention. The perception of sensory information is thus modulated through predictions. In this example, the incoming information is pretty unambiguous (a dangerous snake) and the rat's PUN-H state likely facilitates its detection.

Retrieved memories of similar situations provide information on the value of different actions
The rat's situation is now a combination of low blood sugar, a PUN-H mood-like core affect state, and incoming sensory information in the form of the appearance of a snake. The next stage of the decision process involves matching this situation to the most similar situation held in memory. This is denoted by the arrow running between incoming sensory information and individual / evolutionary memory of situations in Fig. 3. Retrieval of the situation memory (green rectangle with a black outline in Fig. 3) is assumed to bring with it associated information about the benefits or otherwise of different actions taken in the past by the rat in that situation. This is illustrated in Fig. 3 by the dashed arrows leading from the retrieved memory to three potential actions (ACTION OPTIONS). We suggest that information about the value of these actions is coded as memories of the emotion-like core affect states generated by rewarding or punishing outcomes of the actions in corresponding situations in the individual's and evolutionary past. These ACTION VALUES are shown in the centre of Fig. 3 as core affect locations alongside each action.

Fig. 3. A PUN-H mood-like core affect state, low blood glucose internal status, and memories of previous situations from the individual's past or species' evolutionary history, determine the rat's initial situation (grey box). Its mood-like state and internal status influence the deployment of attention (thick black and red arrows in the grey box) resulting in the detection of new incoming sensory information, in this case a visual predator cue. The new situation is matched to memory of similar situations in the past. Because the incoming sensory information is unambiguous, and hence likely to provide the basis for reliable prediction of outcomes, it has a stronger influence on memory retrieval (thick dark green arrows) than mood-like core affect and internal status (thin black and red arrows in the grey box). The retrieved memory (dark green box with black outline) is associated with actions previously taken in this situation, including evolved Pavlovian responses (ACTION OPTIONS, in this case approach, avoid, ignore), and their outcomes (ACTION VALUES) which are coded in core affect space and represent AFFECTIVE OPTIONS. Through a core affect comparison process, the action with the most positively-valenced affective option (avoid) is selected. This now becomes an AFFECTIVE PREDICTION (i.e. a prediction of the affective consequence of the selected action in this situation). The PHYSICAL OUTCOME of this action is probabilistic (indicated by thickness of arrows between selected action and outcome), and its associated AFFECTIVE OUTCOME is determined not just by direct experience of reward or punishment, but also by their presence/absence in relation to the affective prediction. Physical outcomes feed back to influence internal status, whilst affective outcomes act as prediction errors to update situation-dependent action values. They also update mood-like core affect. In core affect diagrams, grey dashed arrows denote RAS and PAS, and red and green circles indicate, respectively, negatively and positively valenced states of these systems. Grey circles indicate a hypothetical integrated location in core affect space (see Fig. 2). See text for a full description. Line drawings by Elsa Mendl.

Memories of emotion-like core affect states resulting from different actions are compared via an affective common currency to identify the most beneficial action in the current situation
Here we consider three ACTION OPTIONS (approach, avoid, ignore) which have relevance in both instrumental and Pavlovian contexts. Although we give these behavioural labels, each action comes with a set of preparatory physiological actions including, for example, activation of sympathetic-adrenomedullary and hypothalamic-pituitary-adrenal 'stress' systems (physiological arousal) to mobilise resources ready for active responses. Whether these physiological actions are set in train when ACTION OPTIONS are retrieved (cf. Bechara and Damasio, 2005), or only when an action is selected, requires experimental investigation. Each ACTION OPTION is associated with an ACTION VALUE. In the past in this situation, approach has resulted in a PUN-H emotion-like state due to an attack or chase, avoid has usually resulted in a transient PUN-L state, although occasionally a PUN-H state occurs (e.g. due to a chase), and ignore has generally resulted in a PUN-H state, although occasionally a PUN-L state may occur (e.g. if the predator fails to detect the rat). These action values can be thought of as AFFECTIVE OPTIONS which must be resolved through some form of core affect comparison process (Fig. 3) to identify the most beneficial action.
Such a process requires that core affect acts as a common currency or common scaling allowing comparison of action values across functionally distinct domains (cf. Cabanac, 1992; McFarland and Sibly, 1975; McNamara and Houston, 1986). The notion that affect plays such a role in decision-making was put forward by Cabanac (1992, 2002) who argued that the valenced currency of 'pleasure' underpins decision-making. Cabanac and colleagues showed that both humans and rats made decisions based on trade-offs between physiological (e.g. water need, cold challenge) and non-physiological (e.g. sweet taste and, in humans, video gaming) outcomes in an additive way and that, in people, these choices were predicted by reports of (dis)pleasure (e.g. Balasko and Cabanac, 1998; Cabanac and Johnson, 1983; Johnson and Cabanac, 1982). Other studies further demonstrated that humans and rats would work for stimulation of certain brain regions (e.g. septal area) which humans reported as pleasurable, and would work harder for higher stimulation rates. Shizgal and Conover (1996) showed that hungry rats would trade off the value of sucrose reward against increasing brain stimulation rate additively, and proposed that the stimulated brain areas may thus be a neural substrate of an affective common currency. In humans at least, it thus appears that processing of affective information during decision-making may sometimes be conscious, but non-conscious 'automatic' implementation is also likely to occur (cf. Bechara and Damasio, 2005). The role of consciousness in a common-currency comparison process in animals remains unknown.
If, as argued, core affect provides a common currency for decision-making, we propose that valence determines which action is selected, in line with pleasure and reward maximisation ideas, and arousal influences the vigour and timing (speed, latency) of actions (Bach and Dayan, 2017; Boureau and Dayan, 2011; Carver, 2001; Trimmer et al., 2013). The result of the core affect comparison process is to reduce the affective options to one SELECTED ACTION VALUE which becomes an AFFECTIVE PREDICTION of the outcome of that action. In this case, as indicated in Fig. 3, avoid has the most positively-valenced action value, likely due to a combination of the individual's own past experiences, and evolved Pavlovian avoid or freeze responses to cues predicting punishment (Guitart-Masip et al., 2014). If the rat's situation included a reliable prey cue as opposed to predator cue, approach would become the selected action because in the past it likely resulted primarily in a REW-H emotion-like state and hence would have a more positively-valenced action value than avoid or ignore.
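The proposed division of labour between valence and arousal can be sketched in a few lines of Python. The sketch assumes that each affective option is a point in core affect space with a valence and an arousal coordinate; the class name `CoreAffect`, the coordinate scales and the specific numbers are hypothetical and serve only to illustrate the comparison process.

```python
# Illustrative sketch only: coordinates are hypothetical, not measured values.
from typing import Dict, NamedTuple, Tuple


class CoreAffect(NamedTuple):
    valence: float  # negative (-1) to positive (+1)
    arousal: float  # low (0) to high (1)


def compare_affective_options(options: Dict[str, CoreAffect]) -> Tuple[str, float]:
    """Select the most positively valenced option; its arousal sets vigour."""
    action = max(options, key=lambda a: options[a].valence)
    vigour = options[action].arousal
    return action, vigour


# Remembered emotion-like states for each action in the predator situation.
affective_options = {
    "approach": CoreAffect(valence=-0.9, arousal=0.9),  # mostly PUN-H
    "avoid": CoreAffect(valence=0.1, arousal=0.5),      # between PUN-L and PUN-H
    "ignore": CoreAffect(valence=-0.7, arousal=0.8),    # mostly PUN-H
}

action, vigour = compare_affective_options(affective_options)
print(action, vigour)
```

The selected pair of values plays the role of the affective prediction: valence identifies the action, and arousal indicates how vigorously it might be performed.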
The whole decision-making process, from detecting incoming information to action selection, may happen extremely rapidly or take several seconds (Knutson and Greer, 2008) depending on factors such as the number of previous situations to which the current situation is similar, the number of available actions, the number of previously experienced outcomes in these situations, and the extent to which one outcome is clearly more rewarding than all others. In neural terms, this process has parallels with competitive interactions between neural representations of external information for control of action output in a two-alternative forced-choice task (e.g. Platt and Glimcher, 1999), except that here internal information from memory is involved and there may be many more than two available options. Such processes can be simulated by drift-diffusion and race models of decision-making (Hales et al., 2016; Trimmer et al., 2008) with elaboration for multiple choices (Bogacz, 2007; Bogacz et al., 2006). In a meta-analysis of human fMRI studies, Knutson and Greer (2008) searched for neural correlates of affective processes occurring during decision-making and found that enhanced activation in nucleus accumbens was associated with anticipating reward (monetary gains), reporting high arousal positively-valenced affect, and selecting high-risk or approach behaviour. In contrast, anterior insula activation increased during both monetary gain and loss anticipation, was correlated with reported high arousal positive and negative affect, and preceded low-risk or avoidance actions.
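The race models mentioned above can be sketched as follows. This is a generic multi-alternative race, not the specific models of the cited studies: each action has an accumulator that gains evidence at a drift rate (here assumed to reflect its action value) plus Gaussian noise, and the first accumulator to reach a threshold determines the choice. The function name `race`, the drift rates and the threshold are hypothetical.

```python
# Illustrative sketch only: a generic race model with hypothetical parameters.
import random


def race(drifts, threshold=1.0, noise_sd=0.1, rng=None, max_steps=10_000):
    """Accumulate noisy evidence per action; first to reach threshold wins.

    Returns (winning action, number of steps), or (None, max_steps) if no
    accumulator reaches the threshold within max_steps.
    """
    rng = rng or random.Random()
    totals = {action: 0.0 for action in drifts}
    for step in range(1, max_steps + 1):
        for action, drift in drifts.items():
            totals[action] += drift + rng.gauss(0.0, noise_sd)
        leader = max(totals, key=totals.get)
        if totals[leader] >= threshold:
            return leader, step
    return None, max_steps


# Drift rates assumed to scale with each action's value in this situation.
drifts = {"approach": 0.0625, "avoid": 0.25, "ignore": 0.125}

# Noise-free run: the highest-drift action wins after threshold / drift steps.
winner, steps = race(drifts, noise_sd=0.0)
print(winner, steps)

# Noisy run: both the choice and its latency can vary from trial to trial.
noisy_winner, noisy_steps = race(drifts, noise_sd=0.1, rng=random.Random(1))
```

In such a sketch, decision latency shortens when one option's drift rate clearly exceeds the others, in line with the suggestion that decisions are faster when one outcome is clearly more rewarding.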

Selected action values may vary in the certainty of their affective predictions
Selected action values and affective predictions will vary according to how reliably the associated action has led to a specific outcome in recent similar situations. In the example in Fig. 3, the selected action avoid, although the best option available, has had mixed success in the past and has an action value that is intermediate between PUN-H and PUN-L. However, for other individuals, avoid may have nearly always resulted in escape from a predator and will thus have a clear PUN-L action value. The former type of action value will yield a more uncertain affective prediction that incorporates both PUN-H and PUN-L states and this may initiate or maintain (physiological) arousal and vigour in preparation for an uncertain outcome (Knutson and Greer, 2008). In contrast, the latter type may induce a PUN-L affective prediction consistent with the observation that fear-like behaviours in the presence of a cue predicting punishment attenuate during successful avoidance learning (Starr and Mineka, 1977). Some argue that uncertainty of predictions may itself generate a negatively valenced state (Clark et al., 2018; see Section 6.6).

Mood-like core affect and internal status can influence the decision process
Whilst the rat's situation as a whole determines action values, it is possible to envisage the influence of internal status and mood-like core affect by considering likely action values if these changed. For example, the relative value of avoid would increase further in a rat with higher blood sugar levels and less immediate need for food. In terms of mood-like core affect, because a PUN-H state will likely be associated with a frequently threatening environment in which avoidance has been more beneficial than approach, it influences action values in favour of avoid. Mood-like state also influences the overall valence of the current situation per se, in this case making it more negative, and hence predisposes valence-specific Pavlovian action (in this case, avoid). It follows therefore that a change to a REW-H mood-like state would lead to a relative increase in the value of approach. However, influences of mood-like state and internal status are likely to be small when the situation includes an unambiguous predator cue, is clearly dangerous, and hence provides the basis for reliable prediction of outcomes. To reflect this, the arrows running to memory of situations from internal status and mood-like core affect state in Fig. 3 are thinner than that coming from incoming sensory information, indicating the relatively stronger influence of the latter.

The selected action has both physical and affective outcomes
Once selected, the rat's action has a probabilistic PHYSICAL OUTCOME which, in this case, can be successful avoidance (safety) or, for example, injury (punishment) due to inefficient avoidant action. Physical outcomes generate a change in internal status via feedback mechanisms (feedback to internal status in Fig. 3). For example, energy consumption due to fleeing will decrease blood sugar levels. The rat's action also has an AFFECTIVE OUTCOME (Fig. 3) influenced both by the physical presence or absence of reward or punishment, and also by the preceding action value prediction. In this case successful avoidance would result in an emotion-like PUN-L state, whilst injurious punishment would lead to an emotion-like PUN-H state. The latter outcome is unexpected and surprising relative to the affective prediction that guided action selection and is referred to as a prediction error (e.g. Schultz et al., 1997; Fig. 3). In accord with model-free reinforcement learning theory, this acts as a feedback or learning signal about the success or otherwise of the decision made and updates action values associated with the current situation (Rescorla and Wagner, 1972; Sutton and Barto, 1998). In this case, an unexpected punishing outcome would increase the PUN-H weighting of the avoid action value, and hence increase uncertainty of future predictions of this action's likely outcome. In contrast, successful avoidance in line with the affective prediction would have a smaller updating effect. It has been suggested that the difference between affective prediction and affective outcome may be a particularly potent determinant of conscious emotional experiences in people (Eldar et al., 2016; Rutledge et al., 2014), perhaps because expectation violations and surprise generate arousal (Schomaker and Meeter, 2015) and signal new information that needs to be attended to and learned.
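The updating role of the prediction error can be sketched with a standard delta rule of the kind used in Rescorla-Wagner and temporal-difference learning. For simplicity the sketch collapses the action value to a single valence number, so it does not capture the change in uncertainty discussed above; the learning rate and the valence values are hypothetical.

```python
# Illustrative sketch only: a scalar delta-rule update with assumed numbers.

def update_action_value(predicted, outcome, learning_rate=0.5):
    """Shift a stored action value towards the experienced affective outcome.

    Returns (updated value, prediction error).
    """
    prediction_error = outcome - predicted
    updated = predicted + learning_rate * prediction_error
    return updated, prediction_error


avoid_value = 0.5  # affective prediction for avoid (mildly positive, PUN-L-like)

# Unexpected injury: strongly negative outcome, large prediction error.
value_after_injury, error_injury = update_action_value(avoid_value, outcome=-1.0)

# Successful avoidance in line with the prediction: no error, no update.
value_after_escape, error_escape = update_action_value(avoid_value, outcome=0.5)

print(value_after_injury, error_injury, value_after_escape, error_escape)
```

The larger the mismatch between affective prediction and affective outcome, the larger the change to the stored action value in this sketch.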
In addition to these prediction error effects, we propose that short-term emotion-like affective outcomes have another function which is to update mood-like core affect (Fig. 3). This follows from our operational definition that mood-like states are determined by cumulative experience of rewarding and punishing events. For example, a PUN-L (safe) outcome would generate a small shift in the rat's mood-like state away from PUN-H.
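One simple way to express a mood-like state as cumulative, time-discounted experience is an exponentially weighted running average of outcome valence. The sketch below makes that assumption explicit; it tracks valence only, and the update rate and numbers are hypothetical rather than taken from any cited model.

```python
# Illustrative sketch only: mood as a running average of outcome valence.

def update_mood(mood, outcome_valence, rate=0.25):
    """Move the mood-like state a fraction of the way towards the outcome.

    Older outcomes are discounted geometrically as new ones arrive.
    """
    return mood + rate * (outcome_valence - mood)


mood = -0.75  # negatively valenced, PUN-H-like baseline (assumed scale)

# A safe (PUN-L, mildly positive) outcome nudges mood away from PUN-H.
mood_after_safe_outcome = update_mood(mood, outcome_valence=0.25)
print(mood_after_safe_outcome)
```

A smaller rate would make the mood-like state more inert, so that many safe outcomes would be needed to shift it appreciably.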

Action selection in ambiguous situations
In the wild, there are likely to be many situations in which the nature of incoming sensory information is uncertain and ambiguous and yet the individual's survival may depend on it making the correct decision. For example, a rustle in the grass bears some similarity to both reliable predator and prey cues, but if the wrong decision is made the animal risks missing out on a meal or becoming one. How does our model tackle decision-making under ambiguity? Mendl et al. (2010) proposed that long-term mood-like states play a particularly important role and we now provide more detail on how this may actually work within a decision-theory framework.
As discussed, the rat's PUN-H mood-like core affect state is likely to bias attention towards detecting threat, whilst its low blood sugar levels will direct attentional resources towards detecting food-related stimuli (arrows running from mood-like core affect state and internal status to incoming sensory information in the left-hand grey box of Fig. 4). Because a rustle in the grass shares some stimulus properties with both threat and food cues, our rat is more likely to detect the rustling noise than a rat who is satiated and has a more neutral mood-like state.
The detected stimulus updates the rat's situation to one which shares the characteristics of a number of different situations held in memory. Three such 'memories' are illustrated with a black outline in the left-hand grey box of Fig. 4; a lighter green fill-colour indicates association of the memory with a higher likelihood of rewarding outcomes and darker green with a higher likelihood of punishing outcomes. On its own, the incoming sensory information is insufficient to discriminate between these different situations held in memory, but the additional information provided by the rat's mood-like core affect state and internal status can facilitate this discrimination (thick red arrow leading from mood-like core affect to memory of situations in right-hand grey box in Fig. 4).
Assuming that the rat's current PUN-H mood-like state reflects an environment that has been stably and reliably characterised by frequent threat and punishment in the past, it will likely have been in a PUN-H mood-like state during recent encounters with ambiguous stimuli, and these situations will have been associated with a high probability of punishment. Consequently, the rat will match its current 'PUN-H ambiguous' situation with memories of recent 'PUN-H ambiguous' situations (dark green 'memory' selected in right-hand grey box in Fig. 4). In these situations, avoid actions were primarily associated with positively-valenced PUN-L states, and only occasionally with REW-L states when the rustle presaged food; approach actions usually resulted in PUN-H emotion-like states; and ignore had purely negative consequences resulting from being attacked (PUN-H) or occasionally missing out on a reward (REW-L). These are the affective options shown in Fig. 4. Moreover, because the PUN-H mood-like state makes the overall valence of the situation more negative, it also predisposes valence-specific Pavlovian actions, in this case avoid. Through both instrumental and Pavlovian processes, the PUN-H mood-like core affect state thus effectively acts as a Bayesian prior for the likelihood of different action outcomes under ambiguity and, in this scenario, favours avoid behaviour (the affective prediction in Fig. 4).
Such effects are well known from studies showing that people in negative affective states make more negative or pessimistic judgements about ambiguity (e.g. Mathews and MacLeod, 1994; Mineka et al., 1998) and, more recently, that non-human animals exhibit a similar relationship between affect and decision-making under ambiguity (e.g. Baciadonna and McElligott, 2015; Bethell, 2015; Clegg, 2018; Gygax, 2014; Hales et al., 2014; Harding et al., 2004; Mendl et al., 2009; Neville et al., 2020; Paul et al., 2005; Roelofs et al., 2016). Our analysis from a model-free reinforcement learning perspective suggests that there may be similarities between the process by which mood-like core affect exerts its effects and the phenomenon of mood state-dependent memory in which memories encoded when an individual is in a particular affective state are more readily retrieved when they are subsequently in the same state (Eich, 1995; Eich and Macaulay, 2000; Thorley et al., 2016; Xie and Zhang, 2018). Thus, the rat's current PUN-H mood-like state enhances retrieval of action values from past ambiguous situations that have also been associated with a PUN-H state and hence, assuming some cross-time consistency in the environment, with danger.
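The idea that a mood-like state acts as a Bayesian prior under ambiguity can be made concrete with a short sketch. It is one possible reading of the proposal, under the assumption that mood sets the prior probability of threat versus food and that an ambiguous cue (a rustle) is equally likely under both; all probabilities and action values are hypothetical.

```python
# Illustrative sketch only: priors, likelihoods and values are hypothetical.

def posterior(prior, likelihood):
    """Bayes' rule over situation types given the cue likelihoods."""
    unnormalised = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalised.values())
    return {s: p / total for s, p in unnormalised.items()}


def best_action(prior, likelihood, values):
    """Action with the highest posterior-weighted value."""
    post = posterior(prior, likelihood)
    return max(values, key=lambda a: sum(post[s] * values[a][s] for s in post))


# An ambiguous rustle is equally likely whether a predator or prey caused it.
rustle_likelihood = {"threat": 0.5, "food": 0.5}

# Mood-like state as a prior reflecting recent punishment or reward history.
prior_pun_h = {"threat": 0.75, "food": 0.25}
prior_rew_h = {"threat": 0.25, "food": 0.75}

# Affective value of each action in each underlying situation.
values = {
    "approach": {"threat": -4.0, "food": 2.0},
    "avoid": {"threat": 1.0, "food": -1.0},
    "ignore": {"threat": -3.0, "food": -1.0},
}

choice_pun_h = best_action(prior_pun_h, rustle_likelihood, values)
choice_rew_h = best_action(prior_rew_h, rustle_likelihood, values)
print(choice_pun_h, choice_rew_h)
```

Because the cue itself is uninformative in this sketch, the decision is driven entirely by the prior: the same rustle favours avoid under the PUN-H prior and approach under the REW-H prior.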
In addition to the influence of mood-like states, the rat's low blood sugar internal status enhances the value of food acquisition outcomes and hence, on its own, favours Pavlovian approach responses related to reward acquisition (Dayan and Berridge, 2014). This could occur through enhancement of the value of the current situation hence predisposing 'automatic' Pavlovian selection of approach, or it may be that an evolutionarily programmed Pavlovian policy favouring approach is implemented when blood sugar levels are low.
The conflict between internal status and mood-like core affect may be resolved by the rat's recent history. Whilst its internal status (e.g. blood sugar level) may have been different on different ambiguous occasions because such states change on short time scales, its longer-term core affect is likely to have been more consistently associated with such situations, at least in temporally autocorrelated environments (Nettle and Bateson, 2012). If so, the combination of PUN-H, internal status, and stimulus ambiguity will favour avoid as described above, and this is depicted in the right-hand grey box of Fig. 4 by the arrow running to memory of situations from mood-like core affect being thicker than those coming from internal status and incoming sensory information. However, if the rat has only recently entered a PUN-H mood-like state, then its blood sugar internal status will have a relatively greater impact on its decision. These arguments indicate that the longer lasting a mood-like core affect state, the greater its relative influence on decision-making, especially under ambiguity, although this will depend on the precise autocorrelation structure of the environment (e.g. the longer it has been winter, the higher the probability of a change to spring).

Extension of the framework to model-based control
Our framework has focused on model-free (habitual) control of decisions based on memories of situation-specific action values without any representation of what the actions achieve. In contrast, model-based (goal-directed) control involves simulated predictions of exactly what will be achieved, combined with information about the likelihood and value of achieving it. Fig. 5 shows how our framework might be implemented for model-based decision-making. For illustrative purposes we show an unambiguous prey cue, a grasshopper (food X). As in the model-free case, internal status and mood-like core affect combine to influence attention and the consequent detection of incoming sensory information which is then matched to similar situations in memory, in this case encounters with grasshopper prey (left-hand grey box of Fig. 5). However, once the closest matching memory has been retrieved, model-based processing involves prospectively searching (in working memory) a decision-tree representation of potential chains of actions and their outcome identities, associated with the retrieved memory (right-hand shaded box of Fig. 5).
So, for example, approach-chase has been the most successful sequence in the past, followed by approach-stalk, then avoid-detour towards prey (e.g. because this lulls prey into a false sense of security), ignore-detour, and then avoid-continue foraging for other prey and ignore-continue foraging. In addition to these AFFECTIVE OPTIONS the decision-tree model also represents decision OUTCOME IDENTITIES. For example, the continue foraging options, although relatively unsuccessful, sometimes result in catching another prey type (food Y). Prospective search of the model identifies the best option which becomes an AFFECTIVE PREDICTION & OUTCOME IDENTITY PREDICTION, in this case approach-chase to get food X. As in the model-free case, when cues are unambiguous, internal status and mood-like core affect have relatively minor influences on the retrieval of memories of situations. However, they can have a stronger effect on decision-tree predictions. This can be illustrated through the phenomenon of specific-satiety in which the rat's current internal status influences the value of different food types. If the rat has recently eaten and hence is satiated on food X, this will be devalued relative to food Y. Because the identity of decision outcomes is represented in model-based control, the rat's internal status can immediately update the value of different outcomes according to their current utility. In this case, the continue foraging options will become more highly valued because they lead to acquisition of food Y rather than food X, and this may alter the decision made. This effect is indicated by the arrow leading from internal status to the decision-tree model. It of course depends on internal status carrying information about which foods have been eaten, as empirically demonstrated in specific-satiety studies (Balleine and Dickinson, 1998; Correia et al., 2007; Kringelbach et al., 2003; O'Doherty et al., 2000).
Model-free processes, lacking decision outcome identities, cannot implement such changes prospectively.
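The contrast between the two forms of control can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not an implementation of the framework: the probabilities, outcome values, the `devaluation` factor and all function names are hypothetical, and expected value is used as a simple stand-in for the prospective search of the decision tree.

```python
# Illustrative sketch only: all probabilities and values below are hypothetical.

# Value of each outcome identity when the rat is not satiated on either food.
OUTCOME_VALUE = {"food X": 1.0, "food Y": 0.8, "nothing": 0.0}

# Decision tree: action sequence -> list of (probability, outcome identity).
DECISION_TREE = {
    ("approach", "chase"): [(0.6, "food X"), (0.4, "nothing")],
    ("approach", "stalk"): [(0.5, "food X"), (0.5, "nothing")],
    ("avoid", "continue foraging"): [(0.3, "food Y"), (0.7, "nothing")],
    ("ignore", "continue foraging"): [(0.2, "food Y"), (0.8, "nothing")],
}


def expected_values(tree, outcome_value):
    """Expected value of each action sequence given current outcome values."""
    return {
        action: sum(p * outcome_value[outcome] for p, outcome in branches)
        for action, branches in tree.items()
    }


def model_based_choice(tree, outcome_value, satiated_on=None, devaluation=0.1):
    """Search the tree after internal status has revalued outcome identities."""
    values = dict(outcome_value)
    if satiated_on is not None:
        values[satiated_on] *= devaluation  # specific satiety devalues one food
    ev = expected_values(tree, values)
    return max(ev, key=ev.get)


# Model-free control: action values cached from past experience,
# with no record of which outcome produced them.
CACHED_ACTION_VALUES = expected_values(DECISION_TREE, OUTCOME_VALUE)


def model_free_choice(cached, satiated_on=None):
    """Pick the highest cached value; satiety cannot be applied prospectively."""
    return max(cached, key=cached.get)


print(model_based_choice(DECISION_TREE, OUTCOME_VALUE))
print(model_based_choice(DECISION_TREE, OUTCOME_VALUE, satiated_on="food X"))
print(model_free_choice(CACHED_ACTION_VALUES, satiated_on="food X"))
```

With these assumed numbers the model-based chooser switches from approach-chase to a continue foraging option once food X is devalued, whereas the model-free chooser keeps selecting approach-chase until its cached values are relearned through experience.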

Where incoming sensory information is ambiguous, mood-like core affect and internal status provide additional information which can help to discriminate between the situations held in memory. In this case, mood-like core affect has a particularly strong effect favouring retrieval of memories of ambiguous situations in which decision-outcomes tended to be negative (thick red arrow in the grey box and dark green box with black outline). Consequently, the associated actions and AFFECTIVE OPTIONS favour selection of avoid behaviour, as described in Fig. 3. In core affect diagrams, grey dashed arrows denote RAS and PAS, and red and green circles indicate, respectively, negatively and positively valenced states of these systems. Grey circles indicate a hypothetical integrated location in core affect space (see Fig. 2). See text for a full description. Line drawings by Elsa Mendl.

In ambiguous situations, mood-like core affect and internal status may influence retrieval of memory of situations as described for model-free decision-making, and hence influence decisions in this way. However, it is possible that they also influence decision-tree searching.
In humans this process likely involves deliberative conscious simulation of information in working memory, one piece of relevant information being the person's current mood which can sway the outcome of working memory simulations in one direction or another, especially when predictions are uncertain. For example, effects of moods on human decision-making may involve some overall assessment of potential decision outcomes that is coloured by mood (e.g. 'how do I feel about it?'; Gasper and Clore, 1998; Clore, 1983, 2003). It is possible that this effect on the prospective search process, as opposed to the memory retrieval process, also occurs in animals during model-based decision-making and underpins some empirical findings of affect-induced changes in animal decision-making under ambiguity (Harding et al., 2004). Such a 'high-level' deliberative effect might conceivably also take place during the core affect comparison process in model-free decision-making (Figs. 3 and 4), but this is usually regarded as more 'automatic' and not requiring working memory resources or forward simulations (Dayan and Daw, 2008; Dolan and Dayan, 2013; Redish, 2015). The way in which mood-like states influence decisions may thus depend on whether model-free or model-based control is in operation.

Fig. 5. Schematic illustration of the hypothesised role of core affect in model-based decision-making in unambiguous situations.
Through the same processes as detailed in Fig. 3 and represented in the left-hand grey box, the rat detects unambiguous incoming sensory information (in this case a visual prey cue) and retrieves a memory of similar situations from its individual past / the species' evolutionary history. The retrieved memory (light green box with black outline) is associated with a decision-tree model of (sequences of) actions and the identities of their outcomes in similar scenarios (grey-shaded box). By prospectively searching this model of AFFECTIVE OPTIONS and OUTCOME IDENTITIES the animal can evaluate the best option which becomes the AFFECTIVE PREDICTION & OUTCOME IDENTITY PREDICTION, in this case approach-chase to acquire food X. Internal status can influence the decision-tree process as indicated by the thick black arrow leading to the decision tree. For example, if the rat is specifically-satiated by recently consumed food X, this will bias decisions in favour of those with a food Y outcome. Mood-like core affect can also exert an influence, but this will be most evident under ambiguous situations (hence the thin red arrow leading to the decision-tree in this unambiguous situation) where a PUN-H state may colour evaluation of the options on offer in favour of those which minimise negative outcomes such as encounter with a predator. In core affect diagrams, grey dashed arrows denote RAS and PAS, and red and green circles indicate, respectively, negatively and positively valenced states of these systems. Grey circles indicate a hypothetical integrated location in core affect space (see Fig. 2). See text for a full description. Line drawings by Elsa Mendl and Michael Mendl.

Fig. 6. Loewenstein & Lerner's (2003) emotion and decision-making schema including corresponding constructs from our model.
In Loewenstein and Lerner's (2003) schema, immediate emotions during decisions correspond to our notion of AFFECTIVE OPTIONS. They are moderated by incidental influences, including moods and other aspects of the decision scenario, just as mood-like core affect and internal status influence affective options in our model. Immediate emotions also influence expected consequences of a decision and their associated expected emotions which correspond to AFFECTIVE PREDICTION in our model. Immediate and expected emotions determine the decision made, just as affective options and the resulting affective predictions determine the ACTION selected in our model. Adapted from Loewenstein and Lerner (2003).

Implications, elaboration, caveats, and questions
We believe that our model offers a structured way of thinking about how animal affective states influence decisions, which has relevance across a broad range of species. It also identifies specific processes that illuminate more general concepts alluded to in schema of the interplay between human emotion and decision-making, including how different sorts of affective state inter-relate.
Returning to Loewenstein & Lerner's (2003) framework (Fig. 6), we suggest that their notion of expected emotions corresponds to our AFFECTIVE PREDICTION which incorporates an operationalized concept of affect and provides a clear hypothesis for exactly how the properties of core affect states can guide action selection and choice. In our model the combined core affect locations of AFFECTIVE OPTIONS during the core affect comparison process or decision-tree search may correspond to Loewenstein & Lerner's immediate emotions (see also: Bechara and Damasio, 2005; Knutson and Greer, 2008). For example, in Fig. 3 this would yield a largely high-arousal and negatively-valenced 'anxiety-like' PUN-H dominated state that acts to prepare the rat for rapid vigorous action. Our model also proposes a role for incidental influences such as 'mood-like core affect' and 'internal status' and shows exactly how they can influence decision-making via effects on internal actions such as attention, memory retrieval and decision-tree searches, and how they are altered by decision outcomes to affect future choices.
Our model thus captures the key affective phenomena thought to influence human decision-making and provides an explicit description, couched in modern decision theory, of exactly how they can be integrated to guide choices and action selection. Nevertheless, it inevitably has limitations and raises further theoretical and empirical questions, some of which we now consider.

Limitations to the operational definition
The operational definition states that positive affect is induced by stimuli or events (rewards) for which animals will work, and negative affect by stimuli or events (punishers) that they will work to avoid (Rolls, 2005). However, in certain circumstances there may be exceptions to this. For example, 'wanting' or motivation to access a stimulus may sometimes dissociate from 'liking' of that stimulus, as during drug addiction in which individuals work hard to get hold of drugs despite a decrease in the reward that they derive from them (Berridge et al., 2009). Animals may also sometimes voluntarily work to access stimuli that are potentially dangerous, as during predator inspection (Blaszczyk, 2017;Fitzgibbon, 1994;Humphrey, 1972;Humphrey and Keeble, 1974), although involuntary encounter with a predator is unlikely to induce positive affect and will be actively avoided. Thus, the context in which work is observed, including the degree of control that the subject has, may influence what the subject is prepared to work for or avoid, and how hard. This is an empirical issue to bear in mind when applying the operational definition to identify rewards and punishers. A related methodological issue is that it may be difficult to measure how hard animals will work to actively avoid a punisher. This is because evolved Pavlovian predispositions to perform specific actions in particular situations may favour freezing rather than active instrumental responses to dangerous stimuli (e.g. Guitart-Masip et al., 2012;Jones et al., 2017).
A theoretical challenge is provided by the suggestion that the difference between anticipated reward / punishment and the actual reward / punishment received has a stronger influence on affective state than the absolute levels of reward or punishment themselves. Thus, an individual's 'trajectory' (whether it is doing better or worse than expected) could be an important determinant of, in particular, its long-term affective state (e.g. humans: Eldar et al., 2016; Rutledge et al., 2014). Empirical studies of animals are needed to investigate this possibility, although an initial analysis failed to find supportive evidence (Raoult et al., 2017).
Notwithstanding these issues, we believe that the operational definition has much to offer in providing a start-point for experimental studies of animal affect.

Discrete vs dimensional models of emotion and model-free vs model-based control
For reasons outlined earlier, our framework is grounded in a dimensional core affect model of emotion (Posner et al., 2005; Russell, 2003; Russell and Barrett, 1999) which posits that interplay between core affect and contextual information can construct a range of specific emotions (Barrett, 2017a, b). However, an alternative discrete emotions model proposes that there are a number of basic emotional states (happiness, fear, anger, sadness, disgust etc.) controlled by specific neural circuits and characterised by a distinct set of behavioural, physiological and subjective responses, and that these states are the building blocks for all emotions (Darwin, 1872; Ekman, 1992, 1999; Izard, 2007; Panksepp, 1998, 2005). Ongoing arguments concern which of these models most accurately reflects how emotions arise and are coded in the brain (Adolphs, 2017; Barrett, 2017b; Kragel and Labar, 2016; Lindquist et al., 2012). If both have some veracity, another question is which has causal primacy. For example, does core affect, in combination with contextual information, generate discrete emotions as constructionist theorists would argue (e.g. Barrett, 2017a, b; Bliss-Moreau, 2017; Fig. 7A)? Or is there a process by which the activation of ongoing discrete emotion systems is integrated to generate an overall core affective valence and arousal state (Fig. 7B)? Or is there some form of bidirectional relationship (Izard, 2007; Mendl et al., 2010; Panksepp, 2007)? One possibility is that interplay between discrete and dimensional systems allows the former to be mapped to the common currency that the latter can provide, hence enabling it to generate action values. For example, 'happiness' could map to REW-H space, 'fear' to PUN-H space, and 'anger' to an intermediate or slightly positive high arousal location (Carver and Harmon-Jones, 2009; Russell, 2003; Watson et al., 1999). In this way, discrete emotion systems can be incorporated into the decision-making schema that we have described.
It is also possible that different decision-making processes predispose discrete emotion-like or core affect states. Bach and Dayan (2017) suggest that the general utility metric in model-free control of reinforcement learning decisions has parallels with the common currency of core affect. On the other hand, model-based control, by representing specific actions and outcomes, recruits distinct behavioural and physiological responses that may engender discrete emotional states (Bach and Dayan, 2017). In our model, core affect processes are viewed as important in both model-free and model-based decision-making (Figs. 3-5).

Arousal and vigour: the problem of low-arousal action values for punishment avoidance
We have proposed that arousal, including enhanced attention and activation of sympathetic-adrenomedullary and hypothalamic-pituitary-adrenal 'stress' systems, heightens in anticipation of reward or punishment, uncertainty of the outcomes of optimal actions, and/or in response to prediction error and activation of Reward Acquisition or Punishment Avoidance Systems resulting from such errors (cf. Esber and Haselgrove, 2011; Knutson and Greer, 2008; Lee et al., 2006; Schomaker and Meeter, 2015). Elevated arousal plays an important role in preparing the organism for vigorous action to acquire anticipated rewards or avoid anticipated punishers (Boureau and Dayan, 2011; Carver, 2001; Trimmer et al., 2013). A potential problem for our model is that the core affect values of actions that avoid punishment are envisaged to lie in the low arousal PUN-L quadrant, whereas we would expect that such actions are associated with high arousal and associated vigour.
M. Mendl and E.S. Paul, Neuroscience and Biobehavioral Reviews 112 (2020) 144-163
As mentioned earlier, our hypothesis is that valence determines which action is selected whilst arousal determines the vigour of action execution. If so, the arousal level of the action value which is most positively-valenced (selected) may indeed be low (PUN-L). For example, Starr and Mineka (1977) demonstrated a decrease in fear-like behaviour in rats that had learnt actions that were reliably successful at avoiding punishment, as long as feedback was given contingent on these actions. In such circumstances, punishment avoidance will be executed calmly and efficiently. In contrast, when the selected optimal action is associated with variable and sometimes unsuccessful outcomes, as in Fig. 3, its AFFECTIVE PREDICTION arousal level will be higher and hence responses will be less calm and more vigorous (see Section 5.4.3).
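The division of labour proposed above, with valence selecting the action and arousal setting its vigour, can be sketched as follows. This is a minimal illustration under assumed values: the `CoreAffect` type, the option names and the valence and arousal numbers are all hypothetical and are not taken from the figures.

```python
from typing import NamedTuple


class CoreAffect(NamedTuple):
    valence: float  # negative (punishing) to positive (rewarding)
    arousal: float  # 0 (calm) to 1 (highly aroused)


def select_and_execute(options):
    """Choose the most positively valenced option; its arousal sets vigour."""
    action = max(options, key=lambda name: options[name].valence)
    return action, options[action].arousal


# Hypothetical AFFECTIVE OPTIONS for a reliably successful avoidance action
# (PUN-L-like: positive valence, low arousal) ...
reliable = {
    "flee to burrow": CoreAffect(valence=0.4, arousal=0.2),
    "freeze": CoreAffect(valence=-0.3, arousal=0.7),
}
# ... and for the same action when its outcomes are variable.
unreliable = {
    "flee to burrow": CoreAffect(valence=0.1, arousal=0.8),
    "freeze": CoreAffect(valence=-0.3, arousal=0.7),
}

print(select_and_execute(reliable))
print(select_and_execute(unreliable))
```

In both cases the same action is selected on valence alone, but under these assumptions it is executed calmly when avoidance is reliable and vigorously when it is not.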
Our model also raises the possibility that changes in arousal are initiated during the core affect comparison process before an action is selected, and are influenced by the combined core affect locations of AFFECTIVE OPTIONS. Since these options usually include actions whose anticipated outcomes are rewarding and/or punishing, associated activation of Reward Acquisition and/or Punishment Avoidance Systems and elevated (physiological) arousal may be initiated whilst the optimal action is being selected. This arousal may be maintained, due to temporal inertia in physiological systems, and have an energising effect on the selected action, even if its action value is PUN-L.
Studies of the temporal dynamics of physiological changes during the decision-making and action selection process (cf. Lang and Bradley, 2010;Lang et al., 2000) are needed to discriminate between the above two explanations. In a recent experiment with chickens, Davies et al. (2015) found that during decisions to approach either a high pay-off but risky or low pay-off non-risky location, heart-rate changes did not predict the decision made. However, following the decision they did anticipate the riskiness of the selected outcome, in line with the first explanation that variation in the predicted outcomes of a selected action (AFFECTIVE PREDICTION) can influence arousal.
Another possibility is that there is a transient high arousal component to both successful avoidance of punishment and thwarting of reward acquisition. Studies of humans indicate that 'relief' can be an invigorating and arousing state, and that 'frustration' in the face of reward loss or acquisition failure is also arousing, for example leading to anger (Berkowitz and Harmon-Jones, 2004;Gatzke-Kopp et al., 2015;van Well et al., 2019). Likewise, animal studies report enhanced activity rates and even spontaneous aggression following the denial of anticipated reward (Arnone and Dantzer, 1980;Dudley and Papini, 1997;Haskell et al., 2000;Roper, 1984;Waitt and Buchanan-Smith, 2001), and high arousal play behaviour when farm animals are released from confined and barren spaces that may be both punishing (e.g. discomfort; inability to avoid aggressive conspecifics) as well as lacking in reward (Rushen and de Passillé, 2014). Such effects would alter the mapping of Reward Acquisition and Punishment Avoidance Systems to core affect (Fig. 2) by extending the former to include the PUN-H quadrant and the latter to include the REW-H quadrant. These transient core affect states would likely result in the action values of even regularly successful punishment avoidance actions having an enhanced arousal component that could contribute to vigour of execution.

Are affective states the same during and following decisions?
As argued earlier, there is evidence that animals will work to access or avoid not just biologically salient rewards and punishers ('primary reinforcers': Rolls, 2005), but also their predictors, including second-order cues. Following our operational definition, this means that affective states can occur not only in response to decision outcomes, but also to predictors and hence during the decision-making process. One obvious question is whether these states are exactly the same or whether they differ in the way that they are coded according to their role and sequence in the decision-making process. Loewenstein and Lerner (2003), for example, propose that expected emotions (Fig. 1) are evoked without conscious experience, whereas immediate emotions are consciously experienced. Dickinson and Balleine (2010) argued that action values occurring during decisions (AFFECTIVE OPTIONS and PREDICTIONS; Fig. 3) are coded in a different form to affective states occurring after a decision outcome (AFFECTIVE OUTCOMES; Fig. 3). In an ingenious set of experiments, they showed that rats recently trained to press a lever for a food reward were sensitive to reward devaluation via a subsequent independent experience involving pairing of the food with LiCl-induced sickness. Following this experience, the rats reduced pressing when reintroduced to the lever (measured in extinction), indicating model-based (goal-directed) control of the behaviour. However, this devaluation was prevented if the rats were presented with an anti-emetic alongside the LiCl, presumably by removing the affective impact of the sickness.

Fig. 7. Core affect and discrete models of emotion. Differing views of the causal links between discrete emotions and core affect. (A) Advocates of constructionist theories contend that discrete emotions arise when core affect combines with sensory and contextual information about the specifics of the current situation, including its similarity to past events (e.g. Barrett, 2012, 2017b; Bliss-Moreau, 2017). (B) The contrary view is that discrete emotions are fundamental, and activation of ongoing discrete emotion systems is integrated to generate an overall core affective valence and arousal state. A bi-directional relationship is also possible (Panksepp, 2007; Mendl et al., 2010; Bach and Dayan, 2017). See text for details.

If affective information is used in the same form during decision-making as following action outcomes, then presenting an anti-emetic just prior to / during a decision should also prevent devaluation effects; rats with experience of LiCl-induced sickness that has devalued the food reward will not be able to re-experience this affective state during the decision due to the anti-emetic. They should therefore press the lever just as much as control animals who have not experienced reward devaluation. However, Balleine et al. (1995) found that this was not the case and the rats who had experienced sickness pressed the lever less, indicating that affective information was in a different form during decision-making compared to following an action outcome (Dickinson and Balleine, 2010).
Our framework raises the additional possibility that the anti-emetic specifically interferes with a discrete emotion representation of action value ('nausea'), whilst leaving a core affect action value (negativity) intact and available to implement devaluation. Administering a drug with more general valenced effects (e.g. an anxiolytic) during decision-making would shed light on this possibility. If prevention of devaluation during decision-making was then observed, this would suggest that similar core affect processes were at play both during decision-making and following a decision outcome.

Mood-like core affect: model and measurement
As we have described, mood-like core affect can have important influences on decision-making and, because it encompasses depression-like (REW-L) and anxiety-like (PUN-H) states, is itself of interest in disciplines such as neuroscience, psychopharmacology and animal welfare science. Accurate measurement of such states is therefore important, and a clear prediction of our model is that their effects on decision-making are most readily revealed under ambiguity (Fig. 4). As argued, animals in a negatively-valenced PUN-H mood-like state should favour punishment avoidance actions under ambiguity (operationally defined as 'pessimistic') compared to those in a PUN-L state, whilst animals in a positively-valenced REW-H state will favour reward seeking actions ('optimistic') relative to those in a REW-L state. More generally, positive valence will be associated with 'optimistic' decisions under ambiguity and negative valence with 'pessimistic' decisions.
'Optimistic' or 'pessimistic' decision-making under ambiguity can thus provide a window on mood-like states and we have developed a test of such 'cognitive' or 'judgement biases' in which animals are trained to make a 'positive response' (p: e.g. press right lever) to a 'positive cue' (P: e.g. tone of a particular frequency) to obtain a reward, and a 'negative response' (n: press left lever) to a 'negative cue' (N: tone of a different frequency) to avoid punishment. Once trained, subjects receive occasional ambiguous cues (tones intermediate between P and N). The prediction is that animals in a relatively negative affective state make response n more often to these cues, indicating anticipation of a punishment (a 'pessimistic' decision), than animals in a more positive state (Harding et al., 2004; Mendl et al., 2009).
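One simple way in which responses from such a task could be summarised is sketched below. It is an assumption-laden illustration rather than the scoring method of any particular study: the function name `pessimism_index`, the trial encoding and the example data are hypothetical.

```python
def pessimism_index(trials):
    """Proportion of negative ('n') responses made to ambiguous cues.

    `trials` is a list of (cue, response) pairs, with cue in
    {"P", "N", "ambiguous"} and response in {"p", "n"}.
    """
    ambiguous = [response for cue, response in trials if cue == "ambiguous"]
    if not ambiguous:
        raise ValueError("no ambiguous-cue trials to score")
    return sum(response == "n" for response in ambiguous) / len(ambiguous)


# Hypothetical sessions for two animals (not real data).
negative_state = [
    ("P", "p"), ("N", "n"),
    ("ambiguous", "n"), ("ambiguous", "n"),
    ("ambiguous", "p"), ("ambiguous", "n"),
]
positive_state = [
    ("P", "p"), ("N", "n"),
    ("ambiguous", "p"), ("ambiguous", "n"),
    ("ambiguous", "p"), ("ambiguous", "p"),
]

print(pessimism_index(negative_state))  # 0.75
print(pessimism_index(positive_state))  # 0.25
```

Under the prediction described above, the index would be higher in animals in a relatively negative affective state; responses to the trained P and N cues are excluded so that the index reflects only judgements of ambiguity.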
Variants of this task have been used in over 100 published studies on a variety of species. The majority support the predictions, but there are also null and opposite results (for a recent meta-analysis, see Neville et al., 2020). Some of these may be explained by a closer analysis of the putative underlying processes suggested by the framework presented here. For example, in the ambiguous situation of Fig. 4, the rat's low blood sugar internal status comes into conflict with its PUN-H mood-like core affect state; the former favours an 'optimistic' approach response by enhancing the value or incentive salience of reward whilst the latter favours 'pessimistic' avoid behaviour. The balance between these two may be determined by how long the animal has been in a particular core-affect state, how autocorrelated the environment is and hence how well mood-like core-affect predicts decision outcomes (Nettle and Bateson, 2012), and how far internal status deviates from a 'desired' homeostatic balance. Different combinations of these factors may explain some of the unpredicted findings in studies of judgement bias. Their effects can be investigated more directly by including measures of reward valuation or incentive salience alongside judgement bias, or by using computational modelling of judgement bias data to evaluate the influence of decision-making parameters such as a subject's estimation of reward value and probability (Iigaya et al., 2016).

Mood-like core affect: relationship to short-term emotion-like states
An assumption of the model is that an individual's history of short-term emotion-like states (success or failure at acquiring reward / avoiding punishment) leads to additive changes in longer-term mood-like core affect. There is ample evidence that, for example, repeated exposure to stressors can generate depression-like states in humans and animals (Hammen, 2005; Kessler, 1997; McEwen, 2005; Rygula et al., 2005). However, direct empirical study is needed to understand the temporal dynamics of these effects, for example whether more recent events are more heavily weighted than those in the past because they are likely to be better predictors of what will happen to the individual next.
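The question of temporal weighting can be illustrated with a minimal sketch in which mood-like valence is a weighted mean of past outcome valences. The function name, the `decay` parameter and the example history are hypothetical; a normalised weighted mean is only one of several ways in which an additive accumulation could be formalised.

```python
def mood_from_history(outcomes, decay=1.0):
    """Weighted mean of past outcome valences (oldest first).

    decay = 1.0 weights all events equally; decay < 1.0 gives
    progressively less weight to older events.
    """
    if not outcomes:
        raise ValueError("history is empty")
    n = len(outcomes)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * o for w, o in zip(weights, outcomes)) / sum(weights)


# Hypothetical history: three failures (-1) followed by two successes (+1).
history = [-1, -1, -1, 1, 1]

print(mood_from_history(history))             # equal weighting
print(mood_from_history(history, decay=0.5))  # recent events weighted more
```

With equal weighting this assumed history yields a mildly negative mood-like state, whereas recency weighting yields a positive one, so the two schemes make opposite predictions for a subsequent judgement bias test.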
Using judgement bias as a proxy indicator of mood-like core affect, some studies do indeed find that recent short-term affect manipulations have the predicted effects on decision-making under ambiguity (e.g. Bethell et al., 2012a;Burman et al., 2009;Destrez et al., 2012;Neave et al., 2013;Rygula et al., 2012;Saito et al., 2016). For example, Jones et al. (2018) found that rats experiencing a high frequency of rewarding events during the 15 min preceding a judgement bias test made more 'optimistic' judgements of ambiguity than those experiencing a low frequency of reward. On the other hand, other studies find opposite effects which may result from the paradoxical induction of, for example, PUN-L states when a short-term aversive stimulus is removed prior to testing (e.g. Doyle et al., 2010;Sanger et al., 2011; see also Burman et al., 2011).
To date, however, no study has systematically investigated how different sequences and timing of rewarding and punishing events, with differing temporal relationship to judgement bias tests, influence decision-making under ambiguity. Such studies could generate novel information on how short-term emotions map to longer-term mood-like states and whether it is the relative difference between anticipated and actual reward or punishment (how the individual is doing relative to expectations) or the absolute experience of each reward and punishment that most strongly influences mood-like states (Carver, 2001; Eldar et al., 2016; Franks et al., 2016; Franks and Higgins, 2012; Higgins, 1997, 1998; Rutledge et al., 2014).
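The two alternatives can be contrasted in a minimal sketch. It is a deliberately simplified illustration and does not reproduce the models of Eldar et al. (2016) or Rutledge et al. (2014); the update rules, the learning `rate`, the zero starting expectation and the reward sequences are all assumptions.

```python
def mood_absolute(rewards, rate=0.3):
    """Mood tracks a running average of the rewards actually received."""
    mood = 0.0
    for r in rewards:
        mood += rate * (r - mood)
    return mood


def mood_relative(rewards, rate=0.3):
    """Mood tracks a running average of prediction errors (actual - expected)."""
    expectation, mood = 0.0, 0.0
    for r in rewards:
        error = r - expectation
        mood += rate * (error - mood)
        expectation += rate * error  # expectation is learnt from the same errors
    return mood


# Hypothetical reward sequences ending at the same moderate reward level.
declining = [1.0] * 10 + [0.5] * 5  # doing worse than expected at the end
improving = [0.0] * 10 + [0.5] * 5  # doing better than expected at the end

print(mood_absolute(declining), mood_absolute(improving))
print(mood_relative(declining), mood_relative(improving))
```

Under these assumptions the absolute account assigns the more positive mood to the individual with the richer history, whereas the relative account assigns it to the individual whose circumstances have recently improved. Sequences of this kind, followed by a judgement bias test, are the sort of manipulation that could discriminate between the two.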
Another possibility is raised by Clark et al. (2018) who argue that high uncertainty about decision outcomes leads to negatively-valenced short-term emotions whilst positive emotions are associated with high certainty about outcomes (see also Seth and Friston, 2016). They suggest that moods reflect the long-term prediction of the certainty or uncertainty of action outcomes. For example, depression reflects a precise long-term prediction that action outcomes will be uncertain. Such a hypothesis could be tested using judgement bias methods by varying an animal's experience of decision outcome certainty, and assuming that REW-L core affect reflects depression-like states.
The question of whether the four fundamental affective states generated by long-term activation or deactivation of putative underlying RAS and PAS have domain-specific effects is of interest as it can tell us whether reward and punishment processes work separately or influence each other (Carver, 2001; Leknes and Tracey, 2008; Norris et al., 2010; Watson et al., 1999). The question can be investigated by exposing animals to repeated success or failure at acquiring reward (REW-H / -L) or avoiding punishment (PUN-H / -L). Judgement bias tasks can then be used to test whether REW-H / -L states only influence 'optimistic' or 'pessimistic' responses in reward-based tasks, and PUN-H / -L states only exert an influence in punishment-based tasks, or whether they exert cross-domain influences. Although this question has not been a specific focus of judgement bias studies, partly due to a lack of punishment-only based tasks, there are examples in which punishment-related treatments lead to the predicted 'pessimistic' change in reward-based tasks (e.g. Barker et al., 2016, 2017; Chaby et al., 2013; Hales et al., 2016), but also those which find no such effect (e.g. Brydges et al., 2012; Hernandez et al., 2015; Novak et al., 2016; Parker et al., 2014). Double-dissociation studies are needed to address this issue properly. Findings from trans-reinforcer blocking studies that a cue predicting omission of food delivery can block learning about a new cue predicting shock point to generalisation across, in this case, negatively-valenced REW-L (RAS) and PUN-H (PAS) states (Dickinson and Dearing, 1979).
An assumption of our model is that there are unitary RAS and PAS sensitive to, and hence integrating, reward and punishment histories across many functional domains (e.g. competition for mates, foraging). However, it is possible that there may also be domain-specific RAS and PAS. This can be explored by varying an individual's history of reward and punishment in one functional domain and then testing it in a judgement bias task in which reward is provided in a different domain to see whether the predicted 'optimistic' or 'pessimistic' response profile is exhibited. Again, no judgement bias study has explicitly investigated this question, but nearly all judgement bias tasks use food as a reward and hence have relevance to a 'foraging' functional domain, whilst affect manipulations are many and varied, including environmental enrichment, unpredictable housing, restraint, social isolation, temperature changes, and others which are not directly related to foraging. Many of these studies have found predicted effects on judgement bias, suggesting that reward and punishment information from different functional domains are integrated, perhaps into unitary RAS and PAS. Double-dissociation studies are again needed to provide definitive evidence.
From a theoretical perspective, unitary systems would seem more likely in species whose behavioural and ecological niche predisposes positive correlations within reward and punishment likelihoods across functional domains, compared to those where reward or punishment in one domain is not a good predictor of that in another. That being said, Nettle and Bateson (2012) point out that the fact that it is the same individual acting across contexts is itself an important source of cross-domain autocorrelation. For example, when in poor condition an individual is likely to fare less well across contexts than when it is in better condition. Moreover, ultimately there must be a way of drawing together information, even if domain-specific, to allow a decision to be made; hence the relevance of the concepts of common currency, competition between neural pathways for control of motor output, and race or drift-diffusion decision-making discussed earlier.

What about conscious experience of emotional feelings?
As we have emphasised, our framework does not address whether affective states are consciously experienced or occur in the absence of subjective feelings. In humans there is evidence that some physiological and behavioural changes that characterise affective responses appear to occur without conscious awareness (e.g. Lane, 2015, 2016; Winkielman and Berridge, 2004; Winkielman et al., 2005). For example, Winkielman and Gogolushko (2018) found that people presented with masked subliminal or supraliminal images of emotionally-valenced happy or angry faces drank, respectively, more or less of a sugary beverage, indicating an affectively-guided behavioural response (consumption of a rewarding substance) which occurred in the absence of any subjectively reported emotion. Given this dissociation between subjective and other components of affect in humans it is quite possible that, for example, long-term experience of reward and punishment in a non-human species might generate a neurally-coded mood-like state which functions without any accompanying conscious experience. The finding that, in addition to mammalian and bird species, insects such as bees show predicted judgement biases (Bateson et al., 2011; Deakin et al., 2018; Perry et al., 2016; Schluns et al., 2016) is in line with the notion that mood-like states have adaptive value in guiding decisions that may be evident across a range of taxa, including those which many assume are unlikely to be conscious or may have only rudimentary (e.g. perceptual but not affective) forms of conscious experience (see Barron and Klein, 2016; Mendl et al., 2010, 2011; Mendl and Paul, 2016; Nettle and Bateson, 2012; Paul et al., 2020). These findings thus inform us about the potential evolutionary origins and conservation of affective processes across species (Anderson and Adolphs, 2014; Mendl et al., 2011; Mendl and Paul, 2016), but do not tell us about the conscious experience of such processes.
One possibility is that there is a conscious component of neurally-instantiated affective processes in some species but not others (see Key, 2016;Klein and Barron, 2016 and debates therein). Separate studies of the existence of consciousness in non-human animals are required to address this possibility (Boly et al., 2013;Edelman et al., 2005;Edelman and Seth, 2009;Hampton, 2001;Smith et al., 2003;van Vugt et al., 2018). An alternative idea is that affective processes become consciously experienced in certain circumstances. These could include surprising or unexpected events which generate prediction errors between anticipated affect and the affective outcomes of decisions, a speculation that is supported by recent work suggesting that the difference between the actual and predicted outcome of a decision is a key determinant of reported subjective happiness (Eldar et al., 2016;Rutledge et al., 2014). Another possibility is that integration or binding of information from RAS and PAS generates neural signals that are broadcast widely across the brain for use in decision-making (Dehaene, 2014) and result in conscious subjective report of location in core affect space. Studies of human decision-making that collect data on reported subjective experience should allow us to identify which aspects of the interplay between affect and decision-making generate or require access to conscious emotional experience and which do not, and hence give further clues about the possible functions of conscious as opposed to non-conscious affective processes.
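As an illustration only, the prediction-error idea mentioned above can be sketched in a few lines of Python. This sketch is not part of the framework and is not the model fitted by Eldar et al. (2016) or Rutledge et al. (2014). It assumes a simple delta-rule learner, and the function names, the learning rate `alpha`, the decay factor `gamma` and the example outcomes are all hypothetical choices made for the example.

```python
from typing import List, Tuple


def prediction_errors(outcomes: List[float], alpha: float = 0.5,
                      initial_expectation: float = 0.0) -> Tuple[List[float], float]:
    """Return per-decision prediction errors and the final expectation.

    Assumed delta rule: the error is the actual outcome minus the predicted
    outcome, and the prediction moves a fraction alpha toward the outcome.
    """
    expectation = initial_expectation
    errors = []
    for outcome in outcomes:
        delta = outcome - expectation   # actual minus predicted outcome
        errors.append(delta)
        expectation += alpha * delta    # update the prediction
    return errors, expectation


def decayed_sum(errors: List[float], gamma: float = 0.5) -> float:
    """Recency-weighted sum of prediction errors (latest error has weight 1).

    Hypothetical stand-in for a momentary affective signal driven by
    recent surprises; gamma controls how fast older errors fade.
    """
    total = 0.0
    for delta in errors:
        total = gamma * total + delta
    return total


# Two rewarded decisions followed by an unexpected omission of reward.
errors, final_expectation = prediction_errors([1.0, 1.0, 0.0], alpha=0.5)
print(errors)                          # [1.0, 0.5, -0.75]
print(final_expectation)               # 0.375
print(decayed_sum(errors, gamma=0.5))  # -0.25
```

In this toy run the second reward is less surprising than the first, so its prediction error is smaller, and the omitted reward produces a negative error large enough to make the recency-weighted sum negative. Whether any such quantity corresponds to consciously experienced affect is exactly the open question discussed above.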

Conclusions
Using a simple definition of animal affect which employs behaviourally operationalised concepts of reward and punishment and maps to the dimensional core affect model, we propose a framework, rooted in reinforcement learning theory, for conceptualising the role of affective processes in animal decision-making. This advances the frequently-expressed idea that a major function of affective states is to organise and guide behavioural choices, by providing a model of how affective phenomena including AFFECTIVE OPTIONS, AFFECTIVE PREDICTIONS and AFFECTIVE OUTCOMES underpin moment-to-moment decision-making, and how longer-term mood-like core affect provides context to guide decisions. Although our framework is grounded in dimensional models of affect, discrete emotion constructs can also be incorporated.
Our framework generates a number of questions that require empirical investigation, as outlined in Section 6. Because it is based on an underpinning operational definition of affect, it can be applied across species, allowing comparison of similarities and differences in affect-related decision-making across taxa, and hence shedding light on ultimate questions about the evolution and function of affective states. This translatability also allows proximate mechanistic questions to be addressed in neurobiologically simpler species such as Drosophila, where there is already a good understanding of the neural substrates of reward and punishment learning, a burgeoning interest in affective processes, and candidate systems that have the potential to code a mood-like state (Anderson and Adolphs, 2014;Aso et al., 2014;Deakin et al., 2018;Perisse et al., 2013;Waddell, 2013).
In summary, we hope that the framework developed here provides a formalised way of conceptualising the various types and functions of animal affect in the context of decision-making, generates empirically tractable questions of the sort outlined above, offers a structured approach to addressing these, and helps our thinking about the links between affect and decision-making in a way that is relevant and applicable across a broad range of taxa.