Gambling Environment Exposure Increases Temporal Discounting but Improves Model-Based Control in Regular Slot-Machine Gamblers

Gambling disorder is a behavioral addiction that negatively impacts personal finances, work, relationships and mental health. In this pre-registered study (https://osf.io/5ptz9/) we investigated the impact of real-life gambling environments on two computational markers of addiction, temporal discounting and model-based reinforcement learning. Gambling disorder is associated with increased temporal discounting and reduced model-based learning. Regular gamblers (n = 30, DSM-5 score range 3–9) performed both tasks in a neutral (café) and a gambling-related environment (slot-machine venue) in counterbalanced order. Data were modeled using drift diffusion models for temporal discounting and reinforcement learning via hierarchical Bayesian estimation. Replicating previous findings, gamblers discounted rewards more steeply in the gambling-related context. This effect was positively correlated with gambling-related cognitive distortions (pre-registered analysis). In contrast to our pre-registered hypothesis, model-based reinforcement learning was improved in the gambling context. Here we show that temporal discounting and model-based reinforcement learning are modulated in opposite ways by real-life gambling cue exposure. These results challenge aspects of habit theories of addiction, and reveal that laboratory-based computational markers of psychopathology are under substantial contextual control.


Model comparison and validation
We compared three versions of the drift diffusion model (DDM) that varied in how they accounted for the influence of value differences on trial-wise drift rates, based on model fit (WAIC). To verify comparable model ranking across conditions, we first carried out a model comparison separately for each environment (see Supplemental Table S3). In both environments, a DDM with nonlinear drift-rate scaling (DDM_S) (Fontanesi et al., 2019; Peters & D'Esposito, 2020; Wagner et al., 2020) accounted for the data best when compared to a DDM with linear scaling (DDM_lin) (Pedersen et al., 2017) and a null model without value modulation (DDM_0).
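The three drift-rate specifications can be written compactly. The following sketch uses illustrative symbols rather than the paper's exact notation: v_t denotes the trial-wise drift rate, and Q_LL and Q_SS the subjective values of the larger-later and smaller-sooner options.

```latex
% Null model (DDM_0): constant drift, no value modulation
v_t = v_0

% Linear scaling (DDM_lin): drift proportional to the value difference
v_t = v_{\mathrm{coeff}} \cdot \left( Q_{LL,t} - Q_{SS,t} \right)

% Nonlinear scaling (DDM_S): S-shaped mapping bounded at \pm v_{\max}
v_t = v_{\max} \left[ \frac{2}{1 + \exp\!\left( -v_{\mathrm{coeff}} \left( Q_{LL,t} - Q_{SS,t} \right) \right)} - 1 \right]
```

The nonlinear variant bounds the drift rate for very large value differences, which is what allows it to outperform purely linear scaling on trials with low decision conflict.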
We then built a full model with group-level distributions for the baseline condition (neutral context) and additional s_x parameters for each model parameter x, modeling the change from the neutral to the gambling context. The s_x parameters were modeled with Gaussian priors with means of zero (see Methods section). Model ranking was confirmed for the full model (Supplemental Table S3). We next compared the DDMs and the softmax model with respect to the proportion of binary choices (LL vs. SS selections) that they correctly accounted for.
As can be seen from Supplemental Table S4, the DDM_S and DDM_lin performed numerically on par with the softmax model, whereas the DDM_0 performed substantially worse (see Supplemental Figure S1, Supplemental Table S4). Posterior predictive checks for the winning model showed that it accurately captured the effect of decision conflict (value difference) on RTs (see section Posterior Predictive Checks below and Supplemental Figure S2). Parameter recovery for this model was reported in our prior papers (Peters & D'Esposito, 2020; Wagner et al., 2020).
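A minimal sketch of the choice-prediction measure used here: the proportion of observed binary choices (1 = LL, 0 = SS) matched by a model's modal posterior predictive choice. Function name and array layout are assumptions for illustration, not the original analysis code.

```python
import numpy as np

def proportion_correctly_predicted(observed, simulated):
    """Proportion of trials on which the model's modal predicted choice
    matches the observed binary choice (1 = LL, 0 = SS).

    observed:  length-n_trials array of observed choices.
    simulated: (n_sims, n_trials) array of choices simulated from the
               posterior predictive distribution.
    """
    observed = np.asarray(observed)
    simulated = np.asarray(simulated)
    # Modal predicted choice per trial across posterior simulations
    predicted = (simulated.mean(axis=0) > 0.5).astype(int)
    return float((predicted == observed).mean())
```

Averaging over posterior simulations (rather than a single parameter point estimate) keeps the measure consistent with the hierarchical Bayesian estimation used for the models themselves.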
Supplemental Table S3. Temporal discounting drift diffusion model comparison (WAIC), per context and for the full model.

Posterior predictive checks: temporal discounting
We carried out posterior predictive checks to assess whether our computational analysis captured key aspects of the data, in particular the value dependency of RTs (Peters & D'Esposito, 2020; Wagner et al., 2020). For the temporal discounting task, we binned trials per participant into five bins according to the absolute difference between larger-later (LL) and smaller-sooner (SS) value ("decision conflict", computed according to each participant's median posterior log(k) parameter from the DDM_S, separately for the neutral and gambling context conditions).
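The binning step above can be sketched as follows; subjective LL values are assumed to be already discounted via each participant's posterior median log(k), and the function name is illustrative, not the original analysis code.

```python
import numpy as np

def rt_by_conflict_bin(ss_values, ll_values, rts, n_bins=5):
    """Bin trials into n_bins equal-sized bins by absolute subjective
    value difference ("decision conflict") and return the mean RT per
    bin, ordered from lowest to highest conflict."""
    conflict = np.abs(np.asarray(ll_values) - np.asarray(ss_values))
    order = np.argsort(conflict)              # trial indices, low -> high conflict
    bins = np.array_split(order, n_bins)      # equal-sized bins of trial indices
    rts = np.asarray(rts, dtype=float)
    return [float(rts[idx].mean()) for idx in bins]
```

The same per-participant bin means can then be computed for each simulated posterior predictive data set and averaged, yielding the model-generated curves plotted against the observed RTs.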
We then plotted the mean observed RTs as a function of decision conflict per participant and context, as well as the mean RTs across 10,000 data sets simulated from the posterior distributions of the DDM_0, DDM_lin and DDM_S (see Supplemental Figure S2).

Supplemental Table S5. Model-agnostic analysis of stay probability via a hierarchical general linear model (HGLM). HGLMs were estimated for each context separately, with reward and transition as fixed effects and subject as a random effect. The full model with stay probability as dependent variable included reward, transition (rare vs. common) and context (gambling vs. neutral) as fixed effects and subject as a random effect.

As a model-agnostic performance measure, the probability of choosing the same S1 option as on the previous trial (stay probability) is typically analyzed as a function of reward, transition, and their interaction (Daw et al., 2011). Since the 2-step task version employed here used continuous payoffs, every trial was rewarded. The "reward" obtained in S2 therefore cannot be used directly to predict stay probabilities, as done in previous work. Instead, the "reward" factor was computed relative to a moving average of recent rewards: we categorized a reward R_t as positive ("R+") if R_t was higher than the mean of the last seven rewards (R_t > mean[R_t-1:t-7]) and as negative ("R-") if R_t < mean(R_t-1:t-7).
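The moving-average reward categorization can be sketched as below; the function name and the convention of returning 0 for the first trials without a full seven-trial history are assumptions for illustration, not the original analysis code.

```python
import numpy as np

def categorize_rewards(rewards, window=7):
    """Label each reward R_t as +1 ("R+") if it exceeds the mean of the
    preceding `window` rewards (R_{t-1} .. R_{t-window}), else -1 ("R-").
    The first `window` trials lack a full history and are returned as 0."""
    rewards = np.asarray(rewards, dtype=float)
    labels = np.zeros(len(rewards), dtype=int)
    for t in range(window, len(rewards)):
        baseline = rewards[t - window:t].mean()  # mean of the last 7 rewards
        labels[t] = 1 if rewards[t] > baseline else -1
    return labels
```

This relative coding recovers a binary reward factor from the continuous payoffs, so the standard reward x transition stay-probability analysis remains applicable.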

Model-free analysis of Stage 1 RTs
S1 RTs were modeled with the categorized reward on the previous trial (see previous section for how this was defined) and context as fixed effects, and trial and subject as random effects. Previous reward significantly increased RTs (t = -2.431, p = 0.015; see Supplemental Table S6). We also observed a reward * context interaction on stage 1 RTs (see Supplemental Table S6): RTs were slower following rewarded trials, more so in the gambling than in the neutral context.

Supplemental Table S6. Hierarchical general linear model results for S1 RTs with reward and context as fixed effects and subject as a random effect.

Supplemental Figure S3. Model-free analysis of S2 RTs. RTs were substantially slower following rare transitions, both in the neutral (A) and the gambling context (B); see also Table 4.
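The S1 RT analysis described above can be sketched as a linear mixed model. This is a minimal illustration on simulated data using statsmodels (the original analysis toolchain is not specified here); variable names, effect sizes and the random-intercept-only structure are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_sub, n_trial = 20, 100

# Simulated long-format trial data: one row per trial per subject
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_sub), n_trial),
    "reward": rng.choice([-1, 1], n_sub * n_trial),   # R- vs. R+ on previous trial
    "context": rng.choice([0, 1], n_sub * n_trial),   # neutral vs. gambling
})
# RTs with a small true effect of previous reward (slower after R+)
df["rt"] = 0.6 + 0.02 * df["reward"] + rng.normal(0, 0.1, len(df))

# Mixed model: reward, context and their interaction as fixed effects,
# random intercept per subject
model = smf.mixedlm("rt ~ reward * context", df, groups=df["subject"])
fit = model.fit()
print(fit.params["reward"])
```

With real data, the trial-level random effect reported in the text would be added to the random-effects structure; statsmodels' `MixedLM` supports this via variance components or a richer `re_formula`.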

Model comparison and validation
Model comparison based on the WAIC (Vehtari et al., 2017) revealed that in the neutral context, a DDM with nonlinear drift-rate scaling (DDM_S) (Fontanesi et al., 2019; Peters & D'Esposito, 2020; Wagner et al., 2020) accounted for the data best when compared to a DDM with linear drift-rate scaling (DDM_lin) (Pedersen et al., 2017) and a null model without learning (DDM_0) (see Supplemental Table S8). The same ranking held for the gambling context.
We next built a full model with group-level distributions for the baseline condition (neutral context) and additional s_x parameters for each model parameter x, modeling the change from the neutral to the gambling context. These s_x parameters were modeled with Gaussian priors with means of zero (see Methods section). The full model reproduced the model ranking (see Supplemental Table S8). We then compared the three DDMs and the softmax model with respect to the proportion of binary choices that they correctly accounted for. As can be seen from Supplemental Table S9, the DDM_S and DDM_lin performed numerically on par with the softmax model, whereas the DDM_0 performed substantially worse. Posterior predictive checks showed that the final model accurately captured the effect of reward differences on second-stage RTs and reproduced choice behavior (see Supplemental Figures S4 and S5 below).
Supplemental Figure S1. Intertemporal choice task: proportions of correctly predicted binary choices for the softmax model (A) and the drift diffusion model with non-linear drift-rate scaling (B, DDM_S) in both contexts (neutral [blue], gambling [pink]).
Supplemental Figure S2. Posterior predictive checks for the temporal discounting drift diffusion models. For each participant and condition (gambling vs. neutral), trials were binned into five equal-sized bins according to the absolute difference between subjective LL and SS option values (decision conflict bin). Plotted are mean observed RTs per bin (data) as well as model-generated RTs (blue: DDM_0, red: DDM_lin, orange: DDM_S) averaged over 10,000 data sets simulated from the respective posterior distributions of the hierarchical models.

Supplemental Table S4. Proportions of correctly predicted binary choices (mean [range]) for the temporal discounting models (neutral vs. gambling context; see Supplemental Figure S1).

Supplemental Table S7. Model-agnostic analysis of S2 RTs: HGLM with transition and context as fixed effects and subject as a random effect.

Supplemental Table S8. Reinforcement learning DDM model comparison using the Widely-Applicable Information Criterion (WAIC) revealed the same model ranking for each condition (neutral or gambling context) as well as for the full model. Scores are WAIC (SE).