Anxiety Impedes Adaptive Social Learning Under Uncertainty

Assessing Anxiety. Subjects were grouped by clinical significance based on their scores on the 7-item Generalized Anxiety Disorder scale (GAD-7). Based on clinical guidelines (Spitzer, Kroenke, Williams, & Löwe, 2006), a score of 10 or above on the GAD-7 reliably predicts the presence of an underlying anxiety disorder (89% post-test predictive probability) and is therefore considered clinically significant. Assessed clinical significance in our study does not mean that subjects had been clinically diagnosed under DSM-5 criteria; however, clinically significant scores strongly suggest that participants experience many of the characteristic symptoms associated with GAD (Löwe et al., 2008). GAD-7 scores are displayed in Figure S1 below.

Treating Anxiety as a Continuous Metric. As noted above, subjects were grouped depending on whether they scored above or below the clinically established threshold on the GAD-7 scale. Although investigating differences in task performance across groups is a useful way to test our study hypotheses, we wanted to ensure that our effects also hold when anxiety is treated as a continuous construct. To do this, we re-modeled our data using raw GAD-7 scores (i.e., the scores shown in Figure S1). When examining mean investments across task blocks as a function of continuous GAD-7 score (Investment ~ GAD Continuous × Valence), the anxiety × valence interactions remain significant and follow the same pattern of results observed in the original analyses, where GAD-7 was modeled dichotomously. The statistical effects from linear mixed-effects regressions (lme4 package in R) using raw GAD-7 scores were as follows: TG neutral start player, anxiety × valence interaction, t(351) = 2.39, p = 0.017; TG positive start player, anxiety × valence interaction, t(351) = 2.62, p = 0.009; SM neutral start player, anxiety × valence interaction, t(351) = 2.53, p = 0.012; SM positive start player, anxiety × valence interaction, t(351) = 2.36, p = 0.019. As in the dichotomous analyses, no significant anxiety × valence interactions were observed in the TG or SM negative start cases.
When we examine the decay rate difference with GAD as a continuous variable (Decay rate ~ GAD Continuous × Condition), we observe a significant main effect of GAD (t(687.88) = 2.49, p = 0.013) and of condition (t(352) = 3.36, p < 0.001). Within conditions, there is a significant Trust Game × GAD effect (t(352) = 2.64, p = 0.009) but no Slot Machine × GAD effect (p > .05), mirroring the data from Figure 4b. There is not, however, a GAD × condition interaction (t(352) = -1.39, p = 0.165), which may reflect the model being slightly underpowered to detect this effect, given how variance is partitioned in the continuous mixed-model regression (there are only two decay rate data points per subject).
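As an illustration of the fixed-effects structure of these interaction regressions, the sketch below simulates data and recovers an interaction coefficient with ordinary least squares in Python (our own minimal example: it ignores the random-effects structure that lme4 estimates, and all variable names and simulated coefficients are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Simulated predictors: continuous GAD-7 score (0-21) and outcome valence (0/1)
gad = rng.integers(0, 22, n).astype(float)
valence = rng.integers(0, 2, n).astype(float)

# Simulated investment with a built-in GAD x valence interaction (coefficient -0.02)
invest = 0.6 + 0.01 * gad + 0.05 * valence - 0.02 * gad * valence \
    + rng.normal(0, 0.05, n)

# Design matrix: intercept, GAD, valence, and the interaction term
X = np.column_stack([np.ones(n), gad, valence, gad * valence])
beta, *_ = np.linalg.lstsq(X, invest, rcond=None)

print(beta)  # beta[3] estimates the GAD x valence interaction
```

With enough trials, the fitted interaction coefficient (beta[3]) recovers the simulated value; the mixed-effects models reported above additionally account for repeated measures within subjects.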
Excluded Subjects. Exclusion criteria were twofold: subjects who demonstrated worse-than-chance learning, and subjects who clicked through the experiment by giving the same response on every trial of one or both tasks. The former was assessed through model fit values (AICs) that were no better than those derived from chance behavior (2*log(0.5)*(No. trials) = -116.45). The latter was diagnosed by fits essentially at the model's upper bound (i.e., perfectly predictable behavior), indicating highly deterministic, repetitive responding (e.g., investing $1.00 on all 84 trials of the task). These extremely repetitive behaviors were captured in the model by the inverse temperature parameter reaching its upper bound (i.e., highly deterministic behavior), together with the bias parameter reaching its upper or lower bound (depending on whether the subject invested on every trial) and the decay parameters reaching their lower bounds (i.e., no updating of information). Therefore, to exclude subjects with worse-than-chance performance and those who simply did not engage with the task (as evidenced by repetitious responses on all trials), we restricted our sample to subjects with AIC values > -116.45 (indicating performance at or better than chance) and below the model's AIC upper bound (indicating variability in trial-by-trial responses). Using AIC values derived from the simplified B-RL model across both trust and slot machine tasks, we excluded data from 58 subjects (n = 20 with AIC(B-RL) < -116.45 and n = 38 with AIC(B-RL) = -12.00), for a final sample size of N = 354. We chose to exclude subjects based on simplified B-RL model fit because the simplified B-RL model was identical to DB-RL but had fewer parameters, and therefore provided a slightly less strict benchmark for chance performance. Because our experiment requires within-subject analyses across tasks, subjects who exhibited worse-than-chance or extremely repetitive behavior in only one task were still excluded, to ensure pairwise comparisons were feasible.
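The chance benchmark follows directly from the AIC formula used here: a subject guessing between the two responses has likelihood 0.5 on each of the 84 trials. A quick check in Python (a minimal sketch; the parameter penalty is ignored, as in the benchmark above):

```python
import math

n_trials = 84

# Chance behavior: likelihood 0.5 per trial, so log likelihood = n * ln(0.5).
# Under the paper's convention AIC = 2*(log likelihood) - 2*(no. parameters),
# and ignoring the parameter penalty, the chance benchmark is:
chance_aic = 2 * n_trials * math.log(0.5)
print(round(chance_aic, 2))  # -116.45
```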
Model fits of all subjects prior to exclusion (N = 412) are plotted below (AIC = 2*(log likelihood) - 2*(No. model parameters)). As a proportion of the total sample, the exclusion criteria removed 5.10% of subjects who were below chance and also had clinically significant anxiety, compared to 8.74% who were below chance but healthy; anxious and healthy subjects were therefore removed at similar rates. The excluded group (N = 58) did not have significantly higher anxiety scores than the final sample (mean GAD-7 in final sample: 6.16; mean GAD-7 of excluded group: 7.63; t(410) = -1.73, p > 0.05). The far-right dotted blue line denotes the upper bound of model AIC, at which subjects were perfectly fit by the model through repetitive behavior, thereby identifying participants who gave the same response on nearly all trials of the task.

Believability Ratings.
We wanted to ensure that subjects were treating the Trust Game as a fundamentally social task, and therefore measured the extent to which subjects believed they were interacting with real partners online. At the end of the experiment, subjects were asked to indicate how much they believed their partners were real: 70% reported no doubt to only some doubt. As a more conservative test, we verified that believability ratings (reported on a scale from 1-6) did not affect our statistical results. Entering each subject's believability rating into the main regressions (thereby controlling for task suspicion) did not meaningfully change any of the relevant statistics reported in the main manuscript or supplement, and the betas associated with believability scores were non-significant in all analyses (p > 0.29 for all).

Modeling Decision Space.
Although subjects could in principle invest any amount from $0.10 to $1.00 on each trial, the payoff-maximizing strategy is to invest the full amount ($1.00) if the trustee returns more than 0.25 of the invested amount (after it is multiplied by 4), or to invest the minimum ($0.10) if the trustee returns less than that. Indeed, the distribution of investments in both healthy and anxious subjects was bimodal at $0.10 and $1.00 (Figure S3), suggesting that most subjects learned to use these anchors to maximize their earnings. We thus binarized the decision space to predict whether subjects would invest or not: when the player was set to return more than 0.25, high investments (at or above $0.50) were considered approximately optimal, whereas low investments (below $0.50) were considered approximately optimal when the player algorithm was set below 0.25. This was further validated by the observation that subjects were best fit by a model in which they would choose $1.00 if the probability of the other person repaying their trust (i.e., that the return proportion was greater than 0.25) was greater than 50%, as quantified by the bias parameter (see Tables S5 and S6; bias ≈ 0.5 in all cases).
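The 0.25 break-even point can be verified with a short sketch (assuming, as in the task, that the invested amount is quadrupled and the trustee returns proportion r of that product; the function name is ours):

```python
def net_gain(investment, r):
    """Net change in earnings: the trustee receives 4x the investment
    and returns proportion r of it; the subject forfeits the investment."""
    return r * 4 * investment - investment

# Below the 0.25 threshold, investing loses money; above it, investing gains.
print(round(net_gain(1.00, 0.20), 2))  # -0.2 (full investment loses $0.20)
print(round(net_gain(1.00, 0.30), 2))  # 0.2 (full investment gains $0.20)
print(net_gain(1.00, 0.25))            # 0.0 (exact break-even)
```

Since the sign of the net gain does not depend on the amount invested, the payoff-maximizing policy is all-or-nothing: invest the maximum when r > 0.25 and the minimum otherwise.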
Due to the 4% noise boundary, subjects rarely broke exactly even (i.e., the randomly sampled proportion of return almost always fell on one side of the 0.25 threshold), and subjects generally earned or lost at least $0.01 on the 0.25-proportion-return trials. Although conceptually these trials should be treated as breaking even, the 4% noise essentially cancelled out rewards and losses over the break-even blocks, so value was kept roughly stable in these periods.

Modeling Feedback.
Because feedback was presented on a continuous scale in our experiment, we also examined a continuous learning rule in which the value of investing $1.00 or $0.10 was updated as a function of scaled prediction errors (i.e., the outcome relative to the actual investment). Continuous versions of both the Bayesian and RL models did not adequately capture the data, as indexed by poorer model fits and posterior predictive checks. In other words, subjects effectively treated the outcomes as dichotomous, despite the fact that feedback was continuous. This strategy is roughly normative, given that the payoff-maximizing strategy is to fully invest if the outcomes are above the threshold needed to increase the expected value, and to minimally invest otherwise. We therefore simplified the model such that feedback was coded in terms of the payoff-maximizing strategy (i.e., whether the outcome indicated that investing would be beneficial or not). In other words, if the agent (or machine) was set to return more than 0.25 of the investment, the model coded the trial as a gain, whereas any return proportion below 0.25 was coded as a loss. All subsequent models were fit using this binarized learning rule.

Figure S3
Dynamic Bayesian-RL Model
Model parameters. There were 6 free parameters, fit to each subject by maximizing the log likelihood across 5 iterations using the fmincon function in MATLAB R2017a. Model parameters are described below.
Model setup. As noted above, we assumed that subjects tracked the probability that it was worth investing in the trustee (i.e., that their proportion of return was > 0.25). The optimal investment policy only considers whether the outcome is better than the indifference point, and hence we track the probability of the trustee returning amounts larger than 0.25, rather than tracking the specific proportion returned (models that considered more continuous representations did not fit as well). Given that the likelihood of the observed outcomes given the underlying rate of return is Bernoulli, we modeled the belief about this probability as its conjugate, a Beta distribution (Daw, Niv, & Dayan, 2005; Doll, Jacobs, Sanfey, & Frank, 2009; Frank, Doll, Oas-Terpstra, & Moreno, 2009; Franklin & Frank, 2015). Thus, each time the proportion returned was greater than 0.25 (i.e., there was a positive prediction error), alpha was incremented by 1, whereas each time there was a negative prediction error, beta was incremented by 1. The alpha and beta values were used to update the posterior distribution at the end of each trial. Effectively, this approach tracks the probability of a reward prediction error rather than accumulated rewards over time. Each player (and machine) was modeled with a separate distribution, and all priors were initialized as the uniform conjugate prior, Beta(1,1), a conservative approach to modeling subject-specific priors. Additionally, we did not have any strong predictions regarding subject-specific priors in the Trust Game, since subjects were told they would be engaging with anonymous others online.
Below are the equations for the mean (μ_jt) and variance (σ²_jt) of the posterior Beta distribution, where j corresponds to the player type (or machine type) and t denotes the current trial:

μ_jt = α_jt / (α_jt + β_jt)
σ²_jt = (α_jt × β_jt) / ((α_jt + β_jt)² × (α_jt + β_jt + 1))

The alpha and beta values therefore keep a running count of positive and negative outcomes (α, β), which updates the posterior distribution on a trial-by-trial basis, approximating the probability that it is worth investing in the trustee.
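As a concrete sketch of this belief-tracking scheme (our own minimal Python illustration; class and variable names are hypothetical):

```python
class BetaBelief:
    """Tracks the probability that a partner's return proportion exceeds 0.25."""
    def __init__(self, alpha=1.0, beta=1.0):  # Beta(1,1): uniform prior
        self.alpha, self.beta = alpha, beta

    def update(self, return_proportion, threshold=0.25):
        # Positive prediction error increments alpha; negative increments beta.
        if return_proportion > threshold:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def variance(self):
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))

belief = BetaBelief()
for r in [0.4, 0.4, 0.1]:  # two positive outcomes, then one negative
    belief.update(r)
print(belief.mean())  # 3/5 = 0.6
```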

Choice rule.
On each trial of the TG and SM, choice probabilities were modeled with a softmax (logistic) function comparing the mean of the posterior distribution (i.e., the best estimate of whether the partner is likely to return the investment) to a bias term, where ζ and ψ are the inverse temperature and bias parameters, respectively. Note that for an optimal subject the bias parameter should be 0.5, indicating that they choose to invest $1.00 only when the probability of repayment is greater than 50%. Nevertheless, we allowed the bias parameter to be freely estimated, to accommodate subject-specific biases in overall investment. Here, we represent the probabilities of investing $1.00 or $0.10 as p($1.00) and p($0.10), respectively.
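A plausible form of this choice rule (our reading of the description; the exact parameterization in the fitted model may differ) is a logistic of the distance between the posterior mean and the bias:

```python
import math

def p_invest(mu, zeta, psi):
    """Probability of investing $1.00: logistic of the posterior mean mu,
    with inverse temperature zeta and bias (indifference point) psi."""
    return 1.0 / (1.0 + math.exp(-zeta * (mu - psi)))

# With an unbiased subject (psi = 0.5), investing is chosen more than half
# the time exactly when the estimated probability of repayment exceeds 0.5.
print(p_invest(0.5, 10.0, 0.5))          # 0.5
print(p_invest(0.7, 10.0, 0.5) > 0.5)    # True
```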

Entropy.
In order to index uncertainty sensitivity, we modeled the entropy (H) of the choice policy to summarize the uncertainty as to whether one should invest or not, as follows:

H_t = -[p($1.00) × log2 p($1.00) + p($0.10) × log2 p($0.10)]

We then kept track of how this entropy changes from one trial to the next, ΔH = H_t - H_(t-1), to capture whether there is likely to be a change point, such that this change in entropy could be used to further increase uncertainty and thereby speed up learning (Franklin & Frank, 2015).
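In Python, this policy entropy and its trial-to-trial change can be sketched as follows (a minimal illustration, assuming base-2 logarithms and a two-option policy; function names are ours):

```python
import math

def policy_entropy(p_invest):
    """Shannon entropy (bits) of the two-option choice policy."""
    probs = (p_invest, 1.0 - p_invest)
    return -sum(q * math.log2(q) for q in probs if q > 0)

# Maximal uncertainty when investing and not investing are equally likely:
print(policy_entropy(0.5))  # 1.0

# A rise in entropy across trials signals a possible change point:
delta_h = policy_entropy(0.5) - policy_entropy(0.9)
print(delta_h > 0)  # True
```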
Decay. When observed outcomes are not decayed (γ = 1), the Beta distribution is maximally updated through the alpha and beta hyperparameters; the posterior distribution then reflects the entire history of actions taken by the trustee. When observed outcomes are decayed (γ < 1), the added uncertainty prevents the posterior distribution from becoming overly confident (i.e., it allows for the possibility that earlier observations are no longer relevant) and thus increases the opportunity for learning flexibility. The γ0 parameters correspond to the overall decay of previous outcomes. The γ∆ parameters capture the extent to which decay is modulated on a trial-by-trial basis as a function of changes in entropy (ΔH), and therefore index the extent to which one's learning rule is selectively adjusted through perceived changes in task-level uncertainty (Franklin & Frank, 2015). Critically, the γ0 and γ∆ parameters jointly determine the overall decay (γ) of the alpha and beta hyperparameters. All gamma parameters were further partitioned by valence (positive vs. negative outcomes) to measure valence-dependent differences in learning. See Table S1 for parameter descriptions.

Simplified B-RL Model. The simplified B-RL model was identical to DB-RL in all respects, except that it omitted the γ∆(pos) and γ∆(neg) parameters.
Decay. Decay parameters were not modeled as a function of changes in task entropy (ΔH); only the decay intercepts were used to decay the alpha and beta hyperparameters. This model therefore allows for the possibility that reward statistics might change (via overall decay rates) but does not adaptively alter this perception with changes in uncertainty about the trustee's behavior. Alpha and beta were updated as shown below.
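One way to realize this decay scheme is sketched below (our own assumption-laden illustration in the spirit of Franklin & Frank (2015): the hyperparameters are decayed multiplicatively before each increment, and all parameter names are hypothetical):

```python
def decayed_update(alpha, beta, positive_outcome,
                   gamma0, gamma_delta=0.0, delta_h=0.0):
    """Decay the alpha/beta hyperparameters, then increment for the new outcome.

    gamma0:      overall decay intercept (gamma = 1 keeps the full history)
    gamma_delta: entropy sensitivity; 0 in the simplified B-RL model, so
                 only the intercept drives decay
    delta_h:     trial-to-trial change in choice-policy entropy
    """
    gamma = min(1.0, max(0.0, gamma0 + gamma_delta * delta_h))
    alpha, beta = gamma * alpha, gamma * beta
    if positive_outcome:
        alpha += 1
    else:
        beta += 1
    return alpha, beta

# With gamma = 1 the full outcome history is retained:
print(decayed_update(3.0, 2.0, True, gamma0=1.0))  # (4.0, 2.0)
# With gamma < 1 old evidence fades, keeping the posterior flexible:
print(decayed_update(3.0, 2.0, True, gamma0=0.5))  # (2.5, 1.0)
```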

Reinforcement Learning Model
Model parameters. In the RL model, 3 free parameters were fit to each subject across 5 iterations. Parameter anchors are summarized below (see Table S1 for full descriptions):

Inverse temperature. 20: highly deterministic behavior (always repeats the action previously associated with reward); low values: selects at random.
Bias. See description in Table S1. 1.0: high benchmark for investing (subject is biased towards never investing $1.00); 0.1: low benchmark for investing (subject is biased towards always investing $1.00).
Learning rate. Degree to which positive and negative prediction errors update the value function. 0.1: minimal update of V; 1.0: maximal update of V.

As a benchmark-comparison model, we constructed an RL model derived from the optimal (i.e., payoff-maximizing) learning rule in the task. The model separately tracked the reward statistics of each player through a value function, where the value V for each player j at each time step t was updated by the difference between the actual reward (i.e., return) from player j and the anticipated reward (V_jt), weighted by the learning rate (η), which was fit to each subject. Value was initialized at 0.5, as we assumed participants would begin the task treating players indiscriminately.
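The value update described here is the standard delta rule; a minimal sketch (function and parameter names are ours):

```python
def update_value(v, reward, learning_rate):
    """Delta-rule update: move V toward the observed reward by a fraction
    (the learning rate) of the prediction error."""
    return v + learning_rate * (reward - v)

v = 0.5  # value initialized at 0.5, treating players indiscriminately
for _ in range(20):
    v = update_value(v, 1.0, learning_rate=0.3)  # repeatedly rewarded returns
print(v > 0.99)  # True: value converges toward the observed reward
```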
For the choice function, denoted below as the probability of trusting, we again leveraged the optimal choice rule for the task. Because the payoff-maximizing response on a given trial was always either $0.10 or $1.00, subject choices were modeled using these values. Here, the probability of investing the full $1.00 is indexed by p($1.00) and the probability of investing the minimum $0.10 by p($0.10). Subject choices on a given trial were modeled with the following choice function, where V_jt is the predicted value associated with trusting a specific player, and ζ and ψ are, respectively, the inverse temperature and bias parameters fit to each subject.

Model Fits and Model Comparison. All model fits were calculated from the log likelihood, which was used to compute the AIC (Akaike information criterion) with the following formula.

AIC = 2*(log likelihood)-2*(No. model parameters)
To determine the best-fitting model, we applied Bayesian model selection over group AICs using the spm_BMS function (Stephan, Penny, Daunizeau, Moran, & Friston, 2009). Bayesian model selection evaluates the exceedance probability that a given model is more likely than the others, given the full set of AIC values for each model and each subject. Bayesian model selection is considerably more robust to outliers than a simple comparison of mean AIC fits across subjects (Stephan et al., 2009).

Figure S4. Dynamic Bayesian-RL model parameter fits.
Figure S5. Posterior predictive simulation of the DB-RL model. The posterior predictive check was conducted by simulating data from subject-specific parameter fits.

[Figure S5 panel: Anxious Subjects]
Model Performance Check. To ensure that our parameters of interest adequately captured task performance, we modeled the decay rate difference as a function of distance from the optimal response on the previous trial (subject investment_(t-1) - optimal investment_(t-1)), a rough approximation of learning from feedback. As shown below, the decay rate difference was significantly correlated with distance from optimal (p < 0.001 for all), validating that our measure of interest captured relevant task behavior. A higher decay rate difference (γ0(pos) - γ0(neg) > 0) was associated with over-investing, whereas a negative decay rate difference (γ0(neg) > γ0(pos)) was indicative of under-investing. A decay rate difference of 0 (γ0(neg) = γ0(pos)), meaning equal decay of rewards and losses, was associated with optimal learning.
[Figure panels: Trust Game; Slot Machine]