Engineered highs: reward variability and frequency as potential prerequisites of behavioural addiction

Influential learning-based accounts of substance addictions posit the attribution of incentive salience to drug-associated cues, and its escalation by the direct dopaminergic effects of drugs. In translating this account to disordered gambling, we have noted how the intermittent nature of monetary rewards in gambling (i.e. the variable ratio) may allow for analogous learning processes, via effects on dopaminergic signalling. The aim of the present article is to consider how multiple sources of reward variability operate within modern gambling products, and how similar sources of variability, as well as some novel sources of variability, also apply to other digital products implicated in behavioural addictions, including gaming, shopping, social media and online pornography. Online access to these activities not only affords unparalleled accessibility but also introduces novel forms of reward variability, as seen in the effects of infinite scrolls and personalized recommendations. We use the term uncertainty to refer to the subjective experience of reward variability. We further highlight two psychological factors that appear to moderate the effects of uncertainty: 1) the timecourse of uncertainty, especially with regard to its resolution, and 2) the frequency of exposure, allowing temporal compression. Collectively, the evidence illustrates how qualitative and quantitative variability of reward can confer addictive potential to non-drug reinforcers by exploiting the psychological and neural processes that rely on predictability to guide reward-seeking behaviour.


Introduction
Prior to the 21st century, popular and scientific notions of addiction were closely linked with psychoactive drugs (Courtwright, 2019; Goodman et al., 2007). This fostered an assumption that addiction inhered in the properties of a small group of chemicals, and that their pharmacology was the basis of the syndrome. The recent emergence of behavioural addictions as syndromes that closely resemble substance addictions, but do not involve direct modulation of brain pathways by exogenous chemicals, challenges this assumption. These syndromes refer to the excessive consumption of a discrete set of products (currently, only gambling and video gaming within clinical classificatory systems; Brand, 2022; Holden, 2010; D. L. King et al., 2019). These are modern, predominantly digitized products that have been extensively designed and engineered over the span of several decades. If engagement with such products can become addictive, perhaps the addictive nature of drugs does not lie in their pharmacology per se?
Incentive Sensitization Theory contends that increased reactivity of the dopamine (DA) system to a particular stimulus, via repeated exposure to that stimulus (i.e., 'sensitization'), is a core process in addiction pathology. By activating DA, drug-related conditioned stimuli (CSs) acquire the ability to preferentially evoke approach and seeking of drugs (incentive salience) (Robinson & Berridge, 2003). Because the conditioned response to a CS often approximates the unconditioned response to the reward itself (i.e. the US) (cf. Siegel, 2016), the potential for incentive sensitization should apply to any USs that are capable of repeatedly activating DA. Although addictive drugs meet this criterion, the fact that animals ingest drugs repeatedly in nature without developing an addictive syndrome suggests that pharmacological activation of DA alone is not sufficient to cause addiction (Hagen & Sullivan, 2019).
In the past decade, multiple studies have demonstrated sensitization-like effects in healthy animals chronically exposed to non-drug reinforcers (e.g., saccharin) delivered under variable schedules (Mascia et al., 2019; Singer et al., 2012; Zeeb et al., 2017). By rendering delivery of the US unpredictable, variable schedules ensure the ongoing elicitation of reward prediction errors (RPEs), registered by phasic midbrain DA activation, whenever the US is delivered (Fiorillo et al., 2003; Schultz, 1998). Mimicking the DA-activating effects of addictive drugs may permit an ongoing increase in incentive salience for stimuli associated with unpredictable non-drug reinforcers, via temporal difference learning (TDL) (Zack & Poulos, 2009). TDL is a process whereby phasic DA release in response to a novel (appetitive) US begins to occur in response to the CS rather than the US, with repeated CS-US pairings. Under normal circumstances, TDL ensures that information about reward delivery is registered once, as an RPE, when the CS is encountered, with no further DA response when the US arrives. However, when the relation between the CS and US is inconsistent, it is hypothesized that the CS evokes an initial phasic DA spike (RPE) and the US delivery continues to elicit a second RPE (Redish, 2004; Redish et al., 2007; Zack et al., 2020). It is this 'dual activation' of DA by CS and US under variable reward conditions that creates the conditions for perpetual escalation ('hyperlearning') in TDL, analogous to that described by Redish (2004) for addictive drugs (Zack et al., 2020).
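The contrast between predictable and variable reward delivery can be illustrated with a minimal simulation. The sketch below is not the temporal difference model of Redish (2004) or Zack et al. (2020); it assumes a simplified single-cue delta-rule learner in which the learned value of the CS is updated by the prediction error at US delivery. The function name, learning rate, reward probabilities and trial count are illustrative assumptions. Under these assumptions, the prediction error at the US should decay towards zero when reward is certain, but remain substantial on rewarded trials when reward arrives on a random half of trials.

```python
import random


def simulate_rpe(reward_probability, n_trials=2000, learning_rate=0.1, seed=0):
    """Return (final CS value, mean RPE over the last 100 rewarded trials).

    Hypothetical helper: a single-cue delta-rule learner, used here as a
    simplified stand-in for temporal difference learning.
    """
    rng = random.Random(seed)
    cs_value = 0.0          # learned reward prediction carried by the CS
    rewarded_rpes = []      # prediction errors on trials where the US arrived
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < reward_probability else 0.0
        rpe = reward - cs_value            # prediction error at US time
        if reward > 0.0:
            rewarded_rpes.append(rpe)
        cs_value += learning_rate * rpe    # update the CS prediction
    recent = rewarded_rpes[-100:]
    return cs_value, sum(recent) / len(recent)


# Assumed conditions: reward on every trial vs. on a random 50% of trials.
fixed_value, fixed_rpe = simulate_rpe(reward_probability=1.0)
variable_value, variable_rpe = simulate_rpe(reward_probability=0.5)

print(f"certain reward:  CS value={fixed_value:.3f}, RPE at US={fixed_rpe:.3f}")
print(f"variable reward: CS value={variable_value:.3f}, RPE at US={variable_rpe:.3f}")
```

This sketch captures only the persistence of prediction errors under variable delivery. It does not model sensitization or the hypothesized escalation of the response to the CS.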
In our two previous articles developing this framework for behavioural addictions, we used the term 'uncertainty' to refer to the effects of variable reward schedules on the sensitization process in Gambling Disorder (Clark et al., 2019; Zack et al., 2020). Upon reflection, we believe it is useful to distinguish reward variability from its experiential consequences. Henceforth, we use the term uncertainty to refer to the subjective experience of reward variability. Naturally, with more variable US delivery, we expect concomitant increases in (subjective) uncertainty, but as psychologists, we also expect various non-linearities and biases to exist in that relationship. We note further that we are exclusively interested in unpredictable variability, as opposed to a 'cyclical' event such as the sequencing of traffic lights, or a fixed ratio schedule (A, A, A, B, A, A, A, B…) where the outcomes technically vary but in a fully predictable manner. In a task-related Positron Emission Tomography study by Zald et al. (2004), striatal dopamine release was observed in response to monetary rewards delivered on a variable ratio sequence compared to a visuomotor control condition, but significant dopamine release was not detected in response to equivalent monetary sums delivered on a fixed ratio basis.
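The distinction between outcomes that vary predictably and outcomes that vary unpredictably can be made concrete with a short sketch. It assumes a fixed ratio of one reward per four responses, and approximates the variable ratio schedule by rewarding each response independently with probability 1/4 (strictly, a random ratio schedule). The function names, the ratio and the number of responses are illustrative assumptions and are not taken from Zald et al. (2004). Both schedules deliver reward at a similar overall rate, but only the fixed schedule has a constant gap between rewards.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Reward every `ratio`-th response: A, A, A, B, A, A, A, B, ..."""
    return [(i + 1) % ratio == 0 for i in range(n_responses)]


def variable_ratio(n_responses, mean_ratio, seed=0):
    """Assumed approximation: each response is rewarded independently
    with probability 1 / mean_ratio."""
    rng = random.Random(seed)
    return [rng.random() < 1.0 / mean_ratio for _ in range(n_responses)]


def gaps_between_rewards(schedule):
    """Number of responses needed to earn each successive reward."""
    gaps, count = [], 0
    for rewarded in schedule:
        count += 1
        if rewarded:
            gaps.append(count)
            count = 0
    return gaps


fixed_schedule = fixed_ratio(4000, 4)
variable_schedule = variable_ratio(4000, 4)

fixed_gaps = gaps_between_rewards(fixed_schedule)
variable_gaps = gaps_between_rewards(variable_schedule)

print("fixed ratio gap lengths:   ", sorted(set(fixed_gaps)))
print("variable ratio gap lengths:", sorted(set(variable_gaps)))
```

Under the fixed schedule the next rewarded response can be anticipated exactly, whereas under the variable schedule the gap lengths differ from one reward to the next.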
Our emphasis on reward variability also generates some testable predictions for drug reinforcement. If delivery schedules can enable sensitization to a non-drug US, then these schedules may also influence the sensitizing potential of drugs. Early research on amphetamine sensitization noted the role of intermittent dosing as opposed to chronic, uninterrupted dosing for inducing sensitization rather than toxicity (Robinson & Becker, 1986). The pivotal role of schedule was revisited recently by Kawa et al. (2019), who found that intermittent cocaine access reliably induced sensitization (increased stimulant-induced DA release), whereas chronic long access cocaine exposure led to decreased stimulant-induced DA release. Collectively, the data suggest that inconsistent reward delivery in models of gambling may represent a special case of a broader phenomenon: i.e., variability (provided it is random) may be crucial for development of sensitization to both drug and non-drug rewards. This aligns with learning-based accounts of substance addictions as maladaptive patterns of learning with a physiological, albeit potentially reversible, basis (Lewis, 2018), and with concepts from behavioural ecology and foraging theory of 'incentive hope' (sustained reward expectancy) to explain persistent seeking of unpredictable non-drug rewards in nature (Anselme & Güntürkün, 2019).
In behavioural psychology, the role of reward variability in promoting behaviours such as gambling and videogaming is far from novel, and can be directly traced back to Skinner's work in the 1950s. However, researchers have tended to interpret reward variability solely in terms of the Variable Ratio schedule that governs delivery of monetary reward (i.e. "on which trial do I receive the reward?"). In the classic electrophysiological study by Fiorillo et al. (2003), the effect of unpredictable CSs on DA neuron firing (i.e. during the CS-US interval) was notably greatest when the size of the US (i.e. fruit juice volume) was also variable. Subsequent research in nonhuman primates found that DA neurons not only code the expected size of a primary reward, but also the information gained from cues that signal reward size. When offered a choice between an uninformative cue and a cue with clear predictive value, the monkeys rapidly develop a preference for the predictive cue (Bromberg-Martin & Hikosaka, 2009). Thus, information about a US alone can have strong incentive value, and DA mediates this effect. In this article, we describe how multiple sources of variability exist and operate concurrently in modern digital products, which include gambling but also video games, social media, and online access to shopping and pornography (see Table 1 for summary). A recent Perspectives article in Science (Brand, 2022) revisits the concept of 'internet addiction', noting that specific features of internet applications require further investigation and that it remains "unclear whether online addictive behaviors are developed because of a general tendency to be addicted or whether the online environments are specifically addictive" (p. 799). We argue that reward variability is a critical theme cutting across these applications, and that these sources of variability are directly enhanced by the internet environment.
In the following sections, we will break down these sources of variability, distinguishing those that are common across multiple formats, and other sources which may be more domain-specific. Our main thesis is that these overlapping sources of variability will amplify the addictive potential of these products via incentive sensitization.

Variability in gambling
The slot machine is the textbook example of a 'variable ratio' schedule (Schacter et al., 2020): on any spin, the gambler knows that monetary reinforcement might be received, but not which spin will yield the payout. This form of intermittent reward ("which one?") is present across all forms of gambling. But even within a slot machine game, it is not the only source of variability. In most gambling products, the amount of monetary payout ("how much?") can also vary. For example, slot machines conventionally offer different payouts to different symbol combinations, and lotteries offer prizes smaller than the main jackpot for subsets of correct numbers. This is analogous to the amplification in DA firing in Fiorillo et al. when the probability of the CS-US pairing was compounded by variable US magnitude.
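The layering of "which one?" and "how much?" variability can be illustrated with a small worked calculation. The sketch below compares two hypothetical games with the same win probability and the same mean payout per spin: one pays a single fixed amount, the other pays one of two amounts. The probabilities and payouts are invented for illustration and do not describe any real product, and outcome variance is used here only as a statistical proxy for variability, not as a measure of DA response.

```python
from fractions import Fraction


def mean_and_variance(outcomes):
    """outcomes: list of (probability, payout) pairs that sum to probability 1."""
    mean = sum(p * x for p, x in outcomes)
    variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, variance


# Hypothetical game A: a win on 1 spin in 4, always paying 4 units.
fixed_payout = [(Fraction(3, 4), 0), (Fraction(1, 4), 4)]

# Hypothetical game B: a win on 1 spin in 4, paying 2 or 6 units equally often.
variable_payout = [(Fraction(3, 4), 0), (Fraction(1, 8), 2), (Fraction(1, 8), 6)]

fixed_mean, fixed_var = mean_and_variance(fixed_payout)
variable_mean, variable_var = mean_and_variance(variable_payout)

print(f"fixed payout:    mean={fixed_mean}, variance={fixed_var}")        # mean=1, variance=3
print(f"variable payout: mean={variable_mean}, variance={variable_var}")  # mean=1, variance=4
```

With the mean payout held constant, adding variability in payout size raises the variance of the outcome on each spin, over and above the variability that arises from not knowing which spin will win.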
Some gambling products also introduce concurrent schedules, a concept that is harnessed further within videogames (see below). We will highlight two examples. In modern sports betting, side bets and 'in play' options mean that the final score is not the sole outcome of interest; bettors can bet on multiple outcomes that are resolved over different timescales (e.g. in soccer, which player will score the next goal) (Killick & Griffiths, 2021). In modern slot machines, the variable ratio governs delivery of monetary wins, but most games also offer rarer 'bonus features', which can be a number of free spins, or a separate 'game within a game' (Parke & Griffiths, 2006). Regular slot machine gamblers highly value these bonus features when choosing which game to play (Landon et al., 2018), and receipt of bonus features is physiologically arousing (Kim et al., 2022; Moodie & Finnigan, 2005). In addition, the bonus features are signalled by special symbols that are often over-sized and accompanied by a unique auditory signature, and in this way, bonus features also enable 'near-miss' outcomes: when 2 out of 3 required bonus symbols are displayed, it is highly salient to the gambler that they just missed the bonus round, even on a visually-complex 'multi-line' game (Dixon et al., 2015).
The features described above also introduce some subtle aspects of temporal variability. Even on fast, continuous forms of gambling like slot machines, spin durations are not perfectly uniform. Some games offer a 'stop' button, where the gambler can voluntarily brake the reels to shorten the spin duration (Chu et al., 2018). The delivery of one or more bonus symbols also typically lengthens the overall spin duration (Limbrick-Oldfield & Clark, unpublished observation), building up anticipation. At a temporal level, near misses can be conceptualized as manipulating the expectancy of reward within a single trial, by displaying an initial partial match (i.e. a conditioned reinforcer) that is subsequently withheld (Pisklak et al., 2020). In a study using a wheel-of-fortune task, when these expectancies were blocked by hiding the wheel during the spin (Wu et al., 2017), this largely abolished the subjective effects of near-miss outcomes. Neurophysiological studies of DA signalling confirm that such temporal variability serves to potentiate DA release (Starkweather et al., 2017).

Variability in Videogames
One key distinction between videogaming and gambling is the much greater emphasis on skill in videogames. This enables mastery and competence to fuel intrinsic motivation (Depue & Collins, 1999). But the impact of skill on reward variability bears scrutiny. Intuitively, one may think that as a gamer gets more skillful, reward variability diminishes, and thus based on our argument, videogames might be less prone to excessive consumption. However, videogames maintain engagement by increasing game difficulty as the gamer's skill improves (e.g. across successive levels). This adjustment may help to maintain a subjective state of immersion, termed flow, which may specifically arise when the challenge posed by the activity and the skill level of the participant are balanced (Csikszentmihalyi, 2000; Larche & Dixon, 2021). Notably, increasing game difficulty also ensures a moderate and fairly stable rate of 'failures' at the game, such that the rewarding outcomes continue to be variable in nature.
Modern videogaming is also typically online, and this multi-player functionality adds several forms of social reward that are relevant to variability. For example, social rank is conveyed by leaderboard position (D. King et al., 2010); for skilled players, advancing on the leaderboard forms a concurrent schedule with its own value via prestige. This social dimension interacts further with the skill dimension, because gamers can choose to segregate with other players of a similar skill level. When highly skilled players compete with each other, the outcomes are again rendered more unpredictable (Delfabbro et al., 2020).
With regard to variability of reward delivery, videogame loot boxes represent a singular example. Currently present in around 77% of iPhone games, loot boxes can be earned through game-play or purchased directly. Opening a loot box yields a randomized virtual prize, which could be a rare ('legendary') and highly valuable item, but in practice is often a low value or duplicate item. Many studies have noted both the simple resemblance, and the overlap in actual engagement, between loot boxes and conventional gambling (Garea et al., 2021).
Nevertheless, the virtual prizes delivered by loot boxes also highlight a distinction between videogames and gambling, in that loot box prizes offer multiple versions of reward. In contrast to the fungibility of money in gambling, in which one $5 win is equivalent to any other $5 win, gamers will work for new characters, weapons, or skins that are delivered via randomized prizes, and game designers can continue to offer unlimited forms of these prizes.

Variability in relation to Shopping, Pornography & Social Media
In this section, we consider three behaviours that are not formally recognized as addiction diagnoses within clinical classificatory systems, but which have credible academic literatures that support their harmful use in some individuals. The primary motivations are clearly distinct in these three cases, although each seems intuitively likely to rely on survival-based drives.
Pornography can be assumed to rely on sex (mating instinct) as a primary reward (Gola et al., 2016). The drive to use social media can be linked to a number of social reward components, including affiliative bonding (Depue & Morrone-Strupinsky, 2005), social sharing (Tamir & Mitchell, 2012), and social network size and rank (Meshi et al., 2015). In compulsive shopping/buying disorder, purchases often entail items for the home (nesting instinct; Anderson & Rutherford, 2013), or apparel or beauty products that may fulfil self-enhancement and social signalling motives. Notably, browsing for such items (without actual purchasing) also provides excitement and/or escape from negative feelings (Müller et al., 2021; Sharif et al., 2021).
A feature that these activities have in common is how their online access has afforded the injection of new forms of variability. The 'infinite scroll', a defining feature of social media apps, is a continuously refreshing and never-ending feed that might include posts, advertisements, or recommended content. Across all three activities, consumers receive recommendations for new content, which are often personalized from their prior selections. New video content can be made more salient via autoplay. Equivalent to the multiple versions of reward described above in videogame loot boxes, these features may fuel sustained use by rewarding the consumer with a steady and practically limitless stream of novel experiences, which nonetheless aligns closely with their personal preferences as established via their prior choices. Social media apps rely on other social and gamified mechanics that further amplify variability, including the intermittent social reinforcement of 'likes', new followers, and friendship streaks (e.g. in SnapChat) (see Lindström et al., 2021; Westbrook et al., 2021). Online shopping platforms may provide price comparison information, reviews, or related products from other customers who purchased a viewed item.
Research on the impact of these features is perhaps surprisingly limited. A study by Noe et al. (2019) indirectly speaks to the 'addictiveness' of the infinite scroll feature: using specialized code to record users' phone interactions (taps, writing, scrolling), scrolling behaviours were the most common event type, and were the only individual event type that correlated significantly with problematic smartphone use. These 'micro-behaviours' were also substantially independent from the overall time spent on the device, given that modern apps promote 'passive' usage for activities such as navigation or listening to music.
To date, research on online pornography and shopping has not considered reward variability as a key dimension. In relation to pornography, phenomenological descriptions point to increases in problematic sexual behaviours since the advent of the internet ('cybersex'). The Triple-A model by Cooper (e.g. Cooper, 1998) highlights the high Accessibility, Anonymity, and Affordability of sexual content online. Riemersma and Sytsma (2013) noted a divergence between pre-internet descriptions of sexual addiction, characterized by history of abuse, insecure attachment, and high impulsivity, and more contemporary 'rapid-onset' cases that they argued were facilitated by the accessibility and intensity of online sexual content. Similarly, in Compulsive Shopping/Buying Disorder, there is an active debate on the clinical utility of distinguishing online and land-based subtypes (Augsburger et al., 2020; Müller et al., 2021). In a German study examining 122 treatment-seeking cases with compulsive shopping, only a third had problematic online involvement (Müller et al., 2019) and in another study there was substantial overlap between latent classes for 'addicted' online and land-based shoppers (Augsburger et al., 2020). Nonetheless, the affordances of the online environment for shopping are well recognized, and overlap with the triple-A model for online pornography (Müller et al.). Clinical and phenomenological associations are reported between compulsive shopping and problem gambling (Black et al., 2015), and between online compulsive buying and excessive social media use (Sharif et al., 2021). For example, Sharif et al. note how multiple features of social media, including wealth signalling, sharing of products and purchases between friends, and posts by influencers of material possessions, may all reinforce compulsive buying.
Lastly, in the context of the COVID-19 pandemic it is also likely that increased variability (in the form of scarcity) may further increase consumption, as seen for online shopping (Taylor, 2021; see also Castro-Calvo et al., 2018). Further research on the behavioural effects of specific product features (e.g. personalized recommendations), as well as the overall impact of the online environment, will be critical in substantiating this account in relation to behavioural addictions.

Additional processes shaping reward uncertainty
Our discussion so far has focussed on the pervasiveness and layering of reward variability across both recognized and putative forms of behavioural addiction. Yet reward variability by itself is unlikely to be sufficient for development of behavioural addiction. Here, we note two further factors that will intuitively shape the effects of a variable US.
The time course of uncertainty and its resolution: In order for the uncertainty elicited by reward variability to be attractive, the uncertainty may need to be resolved. The surplus value that derives from an uncertain outcome being revealed has been termed 'resolution utility' (Ruan et al., 2018; Shen et al., 2019). In principle, this resolution utility to an uncertain choice is positive, even if the desired outcome is not obtained, because the person has moved from a state of not knowing to a state of knowing, analogous to how monkeys choose informative over uninformative cues (Bromberg-Martin & Hikosaka, 2009). Similar ideas in behavioural ecology and foraging research refer to 'information primacy' and 'need to know' (Anselme & Güntürkün, 2019; Inglis, 2000). In an applied example, Shen et al. (2019) randomized members of a running club to earn money for training on two different reward schedules: the certain group received 10 cents for every lap, while the uncertain group received either 10c or 5c per lap. Even though the mean reward expectancy was less in the second group, they completed more laps over a 15-day event; i.e., reward outcome variability incentivized exertion. In subsequent experiments, this effect was shown to depend on regular feedback to participants regarding their accumulated earnings, thus resolving their uncertain gains.
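The lower reward expectancy in the uncertain group of the running-club example can be checked with a short calculation. The probabilities of the two outcomes are not stated above, so the sketch assumes that the 10c and 5c outcomes were equally likely; the function and variable names are illustrative.

```python
def expected_cents_per_lap(outcomes):
    """outcomes: list of (probability, cents) pairs."""
    return sum(p * cents for p, cents in outcomes)


certain_group = [(1.0, 10)]
uncertain_group = [(0.5, 10), (0.5, 5)]  # assumption: equal odds of 10c and 5c

certain_ev = expected_cents_per_lap(certain_group)      # 10.0
uncertain_ev = expected_cents_per_lap(uncertain_group)  # 7.5

print(f"certain group:   {certain_ev} cents per lap")
print(f"uncertain group: {uncertain_ev} cents per lap")
```

Under this assumption the uncertain schedule pays 2.5 cents less per lap on average, so the greater number of laps completed by that group cannot be attributed to higher expected earnings.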
Other research indicates that resolution may not be the only source of value (or utility) that is derived from reward uncertainty. A recent study by Rauwolf et al. (2021) induced a state of incidental reward uncertainty using a covered dice shaker that could award a bonus payment at the end of the experiment. In a control condition, the mean bonus was already determined. Participants were given the opportunity to consume snacks or alcohol in one study, and had a taste test for the intensity of sweetness and saltiness in another experiment. Participants in the uncertain group consumed more snacks and reported higher taste intensity. Notably in these experiments, at the point of consumption, the uncertainty had not yet been resolved, although it may be relevant that participants were aware it would be resolved.
Elevated DA transmission has been shown to fuel risk-taking and consumption of a primary reward (food) in healthy volunteers and people with Parkinson's disease (de Chazeron et al., 2019; Riba et al., 2008). Taken together, the implications of Rauwolf et al. and work on the role of DA in reward seeking are potentially far-reaching: they indicate that unresolved uncertainty in one 'domain' (i.e. activity A) can directly increase the incentive value and consumption of immediately gratifying primary rewards in another domain (activity B). Thus, from the standpoint of DA, pre-existing uncertainty can transfer its motivating effects (confer salience) to the reward that is most available. By implication, unresolved uncertainty from countless everyday sources (e.g., work, school) could conceivably contribute to over-consumption of primary rewards in unrelated activities, via residual activation of DA.
These behavioural effects also raise important questions about the role of phasic (i.e. rapid, stimulus-evoked) vs. tonic DA in the response to uncertainty. The anticipatory DA activity during uncertain CS-US intervals observed by Fiorillo et al. is generally interpreted as a tonic effect, but recent studies revisiting this phenomenon have largely implicated phasic signalling, in the form of expectancy-generated sub-second bursts of DA release instigated by the CS (Mohebi et al., 2019). This conforms to the escalation of phasic DA as an animal gets physically closer to an expected reward (e.g., goal box) observed in multiple prior studies (Hamid et al., 2016; Phillips et al., 2003). At the same time, it is hard to see how the behavioural effects on consumption in Rauwolf et al. would be mediated by phasic DA, given both the timeframe and motivational transfer that are involved.
Although a direct effect of phasic DA signalling struggles to account for the temporal delay and motivational transfer of unresolved uncertainty in the study by Rauwolf et al., one way to reconcile these data is to assign a permissive role to tonic DA on reward-seeking, as originally proposed by Grace (Grace, 1991, 2016). This work outlines a circuit whereby an excitatory signal from the ventral hippocampus indirectly regulates the population activity of limbic DA neurons so that when these neurons are disinhibited, they exhibit high tonic activity and high phasic response, resulting in "enhancement of both general motivation and specific reactivity to reward-predicting cues" (Kirschner et al., 2020, p. 2). In this framework, unresolved uncertainty coupled with the expectation of imminent reward in the gambling task (uncertainty resolution) in the Rauwolf study may have had a disinhibiting effect on tonic DA (increased vigor mediated via hippocampus) coupled with an attracting phasic response (accumbens DA release) to proximal reward (Mohebi & Berke, 2020; Morrison & Nicola, 2014). Future studies could test the effects of more 'open-ended' uncertainty on consumption behaviours, in which no information is provided about the timing of resolution, as this circumstance more closely reflects the conditions of ambient uncertainty outside the laboratory.
The significance of uncertainty resolution raises a further moderator: the extent to which the participant is actively engaged in other activities during the delay period. The subjective state of uncertainty is generally more acute if the participant is less distracted while uncertainty is encountered. This is clearly illustrated in slot machine gambling, where the period of acute uncertainty occurs between the initiation of the spin and the moment that the final reel stops spinning. With no other activity to perform during this interval, gambling products can induce a state of 'immersion', as a hyper-attentional state that is also correlated with problematic engagement (Murch et al., 2020).
Frequency: Under natural conditions, many otherwise innocuous non-drug reinforcers exhibit at least some variability. One does not know the exact temperature or pleasantness of a warm bath, so could uncertainty not pave the way for any behaviour to become addictive? We suggest that in most cases, the bias to seek out non-drug reinforcers does not escalate to an addiction syndrome primarily due to insufficient exposure. As long as the range of alternative reinforcers is wide and the organism's mobility is not restricted (Vuchinich & Tucker, 1988), the number of times the perpetual RPE/escalating TDL cycle occurs may not suffice to induce the addiction syndrome.
But conversely, this raises a further critical factor, in how often the reinforcing stimulus or activity can be administered per unit time (i.e. frequency). Like all learning, sensitization increases with repetition. If each variable CS-US trial augments TDL, the more trials experienced in a given time, the greater the ability of a CS to bias pursuit of a specific US at the expense of alternative reinforcers. In short, both variability and frequency likely contribute to the addictive potential of a non-drug reinforcer. Satiation mechanisms (or a lack thereof, in the case of social media for example) are also clearly relevant to this potential for temporal compression.
By enabling near limitless diversity and speed of delivery of non-drug rewards, digital technology has permitted engineering of reinforcers with addictive potential that, delivered under natural conditions, would likely never become addictive.

Conclusions
Reward uncertainty has long been recognized to exert motivational effects, with relevance to gambling as an activity that becomes disordered in some people. We offer a conceptual account of behavioural addictions that identifies this reward variability, which underpins uncertainty in gambling, as a common and mechanistically crucial feature of numerous activities with addictive potential. We have described how neural mechanisms, primarily involving overly frequent or persistent DA signalling, may account for how reward variability can drive escalating incentive salience of non-drug reinforcing stimuli (Anselme & Güntürkün, 2019; Anselme & Robinson, 2013). We acknowledge that this framework assumes a functional correspondence between incentive sensitization and addiction (Robinson & Berridge, 2003), and further research is required to test whether compulsive seeking in the context of behavioural addictions entails mechanisms that go beyond sensitization. In the case of gambling, reward variability has historically been described primarily in terms of the variable ratio governing the receipt of monetary reward. We delineate multiple concurrent sources of reward uncertainty in modern gambling products, and consider how these sources of variability also arise in other emergent behavioural addictions, namely videogaming, social media, buying/shopping, and pornography.
Online access to these products is argued to elicit further sources of variability (e.g. recommended content) that may amplify the addictive potential of old-fashioned activities.
Within this uncertainty-based account of behavioural addictions, the timecourse and resolution of the reward uncertainty, and the temporal compression of learning that is enabled by higher event frequency, are both likely to moderate the potential for an addiction syndrome to arise.