Distinguishing between intrinsic and instrumental sources of the value of choice

, suggesting that choice occupies a similar role to rewards in motivating behaviour. This body of findings has been interpreted by some researchers as evidence for a value of choice, an intrinsic utility afforded to the act of choosing, which motivates preference for choice in the absence of (or even despite) other reinforcers (Bartling, Fehr, & Herz, 2014; Bobadilla-Suarez et al., 2017; Leotti & Delgado, 2011; Sunstein, 2015).
On this view, it could be argued that choice carries what sociologist Max Weber termed intrinsic value (Weber, 1978): that is, the value of choice is not derived from a choice's outcome (i.e., instrumental value), but from the act of choosing itself. While it seems plausible that we value choices in this way, our choices are typically made with a particular goal or outcome in mind. For instance, even if an individual prefers to choose the colour of their necktie (a seemingly inconsequential decision) over having the choice made for them, that choice is still made in the service of a desired outcome (e.g., avoiding clashing with the colour of one's shirt). Similarly, a growing body of literature suggests that people seek out information, even when this information does not confer additional benefit or useful information about rewards (Bennett, Bode, Brydevall, Warren, & Murawski, 2016; Niv & Chan, 2011). In these cases, choice may act as an instrumental means to acquire additional information, which serves as a desired goal, and, without consideration for informational value, may provide the illusion of an intrinsic value of choice. In other words, when considering the value humans ascribe to information, even in the absence of other choice-reward relationships, choice may be instrumentally contingent with information, and not necessarily intrinsically valued per se. Here we investigate the role of instrumental value in the well-documented preference to choose.
On the basis of this previous empirical work examining the value of choice, it is unclear whether people prefer choice because there is an intrinsic value to the act of choosing, or whether people prefer choice because it typically helps us achieve a goal, i.e., because it confers instrumental value. Critically, the majority of the work described above has examined behavioural and/or neural signatures of the value of choice in situations where choices are instrumental in acquiring a desired outcome (Leotti et al., 2010). That is, while the expected rewards associated with different actions are equated, participants' actions nevertheless have a direct influence over the task environment, e.g., by choosing a cue to reveal a specific stimulus or a stochastic reward (DuBrow, Eberts, & Murty, 2019; Leotti & Delgado, 2011). However, in situations where passivity is predictive of future rewards (e.g., having a choice imposed on you; Suzuki, 1997) or when the connection between one's choice and the outcome is ambiguous (e.g., when the number of choices is overwhelming; Iyengar & Lepper, 2000), people's predisposition for exerting choice is diminished. Similarly, the ability to choose seems to be instrumental in facilitating memory encoding (DuBrow et al., 2019), but only when choices are valuable (Katzman & Hartley, 2020). In other words, the value of choice appears to be determined, at least in part, by the instrumentality of one's choices: the concordance between one's actions and their ultimate outcomes. It follows, then, that if people value (and prefer) choice over no choice because it allows for apparent instrumental control over desired outcomes (Leotti et al., 2010), then disrupting the instrumentality of action (by severing the contingency between the immediate consequences and the ultimate outcomes of one's choices) should attenuate the value of (and preference for) choice. To our knowledge, no prior work has tested this hypothesis directly.
Accordingly, here we explore how the degree of coupling between one's choices and desired outcomes affects the value of choice. To quantify this sort of instrumental contingency, we borrow a concept from information theory: mutual information (MI; Shannon, 1948; see Walters-Williams & Li, 2009 for an introduction). We provide a deeper intuition for MI in the Method section, but in short, MI varies continuously from 0 to 1 and reflects the mutual dependence of two events by quantifying the amount of information obtained about a latter event by observing a former event (for a more detailed mathematical explanation of MI, see the Supplemental Materials). Here, we use MI to quantify the degree of dependency between the outcomes of one's actions and the rewards these actions confer.
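As an illustration of the quantity described above, the following minimal sketch computes MI under the standard discrete definition, MI(X; Y) = Σ p(x, y) log2[p(x, y) / (p(x) p(y))], which is bounded between 0 and 1 bit when both variables are binary. The function name and the two example joint distributions are hypothetical and chosen only for illustration; the authors' own derivation is in their Supplemental Materials and may differ in detail.

```python
from math import log2

def mutual_information(joint):
    """Mutual information in bits for joint[x][y] = P(X = x, Y = y)."""
    px = [sum(row) for row in joint]          # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (columns)
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:                       # zero-probability cells contribute 0
                mi += pxy * log2(pxy / (px[x] * py[y]))
    return mi

# Hypothetical joint distributions over card outcome (lose, win)
# and reward (no reward, reward).
coupled = [[0.5, 0.0], [0.0, 0.5]]        # reward follows the card outcome exactly
decoupled = [[0.25, 0.25], [0.25, 0.25]]  # reward is independent of the card outcome

print(mutual_information(coupled))    # full dependence
print(mutual_information(decoupled))  # independence
```

Under these assumed distributions, the coupled case gives the maximal value for two binary variables and the decoupled case gives zero, matching the two endpoints used in the experiments.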
By doing so, in the present studies we examine whether people's well-documented preference for choice persists in the absence of instrumental contingency, i.e., when choices are decoupled from their outcomes. To answer this question, we conducted two experiments. In line with traditional free-choice paradigms used to measure the value of choice (Bown et al., 2003; Leotti et al., 2010; Suzuki, 1997), participants completed a two-stage choice experiment, in which they first chose among decks (Stage 1) that either afforded or denied the ability to make a subsequent (Stage 2) choice between two cards from that deck (see Fig. 1A). Depending on the value of the chosen card in Stage 2, participants could then procure rewards. Critically, in both experiments, choice and no-choice decks were matched in expected value, but MI varied across deck identities, yielding four unique deck types (see Fig. 1C).
Using this design, we predicted two possible patterns of choice. On the one hand, consistent with an intrinsic value of choice (Bobadilla-Suarez et al., 2017; Bown et al., 2003; Leotti et al., 2010; Leotti & Delgado, 2011; Suzuki, 1997), instrumental contingency may play little role in people's preferences, in which case participants would simply prefer decks that afforded them the freedom to choose, irrespective of whether or not these decks conferred MI about future outcomes. On the other hand, the value of choice may depend fundamentally on instrumental contingency, such that participants prefer choice only (or to a greater extent) when it is accompanied by MI. As we will show, we found that the "inherent" value of choice depends, additively, on its informational value in predicting future rewards, which argues against a purely intrinsic value of choice.

Experiment 1
In Experiment 1 we examined preferences for decks of cards, for which we factorially manipulated choice and instrumental contingency. Specifically, participants chose between multiple pairs of decks defined by 1) the ability to subsequently choose among separate cards drawn from those decks and, simultaneously, 2) whether this subsequent choice provides MI about eventual rewards. These deck properties were static and were both directly instructed to and learned by participants. In analyzing choices from all possible resultant deck pairings, we were able to interrogate the role of MI in modulating the value of choice.

Participants
An a priori power analysis revealed that at least 55 participants would be required to detect a small effect size with 80% power (see Supplemental Materials, Fig. S2). Anticipating exclusions in an online cognitive experiment, we recruited 100 participants (mean age = 37.25, SD = 11.73, 41% women) for Experiment 1 from the online recruitment website Prolific.co. As we will describe shortly, our experimental design included 8 catch trials used to identify participants who were disengaged or responding randomly. Analyzing these catch trials, 23 participants failed to choose the correct option on at least 5 trials (66% accuracy) and were excluded from further analysis, leaving 77 participants in the final analysis. Notably, including all participants in the analysis does not change the pattern of results reported below.
All participants gave informed consent prior to testing and were compensated with 6 USD regardless of performance, in conformity with ethical standards. This procedure was approved by the ANONYMIZED UNIVERSITY Research Ethics Board (REB #137-0816).

Procedure
Participants completed a free choice task that measured the degree to which they preferred to make their own choices rather than have a choice made for them (see Suzuki, 1997 for a similar design). Critically, this task manipulated both the freedom to choose and the instrumental contingency between actions and outcomes, operationalized as MI. The task comprised three phases: a learning phase, a choice phase, and a rating phase.
To illustrate the construct of instrumental contingency examined here, consider a gambler selecting numbers on a lottery ticket. The rules of this lottery are as follows: the gambler selects six numbers and if these numbers perfectly match a specific, randomly chosen, "jackpot number", the gambler wins the jackpot.
We can think of the gambler's decision as a three-phase process. In the first phase, the gambler must choose, at the shop, whether to select her own numbers, or to use the automated number generator to randomly choose a set of numbers for the ticket. The gambler is emphatic about choosing her "lucky number" (for example, the six digits representing her birthdate, 280496), so she visits the shop. In the second phase, she receives and verifies her ticket to see that the numbers she selected appear on the ticket. Finally, in the third phase, the gambler returns to the shop to check her ticket against the (hitherto unseen) "jackpot number" to determine if she won or not. Once the jackpot number is known, the relationship between the numbers on the gambler's ticket and whether or not she wins is dependent (knowing both her ticket numbers and the winning numbers eliminates uncertainty concerning whether she won) and consequently, the MI between her ticket number and potential winnings is maximal (MI = 1). Now imagine that, unbeknownst to the gambler, the lottery agency does not actually use the "jackpot number" for selecting a winning number. Instead, they simply choose a random ticket as the winner, with likelihood determined by the number of digits on a ticket. In this case, even if the gambler's numbers match the jackpot number, she may be told when returning to the shop that she lost. Here, knowledge of the numbers on the gambler's ticket is independent from whether she wins or not, and thus MI = 0.
Importantly, while these two scenarios differ with respect to MI, they are identical in terms of the expected value of either choosing ticket numbers or having them automatically generated: whatever number ends up on the gambler's ticket, the likelihood of her winning the random lottery remains the same. Thus, when MI is present, the gambler's choice is instrumental in ensuring that her lucky number is represented on her ticket. In contrast, to persist in choosing her lucky number in the absence of MI, the gambler would need to derive an intrinsic value from the act of choosing itself, because her choice has become irrelevant for the potential outcome. The question is then: would the gambler continue to insist on choosing her lucky number if she knew that her numbers would be disregarded, even though her chances of winning would not change? Or in such a case, would she be happy to give up her freedom to choose?
Learning Phase. In the learning phase, participants first learned the basic structure of the task under four different conditions (described below). In this phase, participants were shown one of four decks of cards (Fig. 1A). Deck identities were characterized by two features, shape (moon or star) and colour (red or black). As we will describe shortly, the deck identity dictated the relationship between the outcomes and eventual rewards in the task. Deck features (colour and shape) were randomized across participants; thus, here we refer to decks as A-D for simplicity (see Fig. 2). After viewing the deck, two cards were "drawn" from this deck (displayed on the screen). To differentiate cards drawn from different decks, the backs of the cards bore the same colour and symbol as the deck they were drawn from (e.g., a card drawn from the red-moon deck was red and had small moons in the corners of the card). Here, participants were told that their goal was to select one of these cards (≤ 1.5 s). The card they selected would be flipped to reveal a number between 1 and 10, excluding 5 (1.5 s). If the selected card was greater than 5, they would win this round of the game. If not, they would lose. The probability of selecting a winning card was fixed (P(W) = 0.5), and participants were informed that each card draw was entirely random, with replacement. Participants were told that winning a round could, in some cases, result in winning additional money in the task (up to 6 USD).
Critically, however, deck identity determined 1) participants' ability to choose a card drawn from the deck, and 2) the relationship between the outcome of that card and possible rewards. Specifically, in decks A and C (Fig. 1), participants were free to choose either card among the two drawn from the deck, using the Z or M keys on their keyboard. In decks B and D, however, only one option was provided, and participants were forced to choose this card. The deck identities also determined the likelihood of earning a reward after observing a winning card. That is, in decks A and B, if the selected card was greater than 5, participants would always receive the associated rewards, P(R|W) = 1. Conversely, in decks C and D, regardless of whether their card was winning or not, the probability of transitioning to a reward was fixed at chance level, P(R|W) = 0.5. Thus, two of the decks provided full MI about the rewards and the other two provided no MI about the reward. The properties of each of the four decks are described in Fig. 1C.
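The deck structure described above can be sketched as a small lookup table. This is an illustrative sketch, not the authors' task code. The text gives P(R|W) for each deck but does not state the reward probability after a losing card; the values used here (0 for decks A and B, 0.5 for decks C and D) are assumptions chosen so that all decks share the same expected reward, as the design requires. All names are hypothetical.

```python
import random

P_WIN = 0.5  # probability that the selected card is greater than 5 (from the text)

# "p_reward_lose" is an assumption: it is not stated in the text.
DECKS = {
    "A": {"choice": True,  "p_reward_win": 1.0, "p_reward_lose": 0.0},
    "B": {"choice": False, "p_reward_win": 1.0, "p_reward_lose": 0.0},
    "C": {"choice": True,  "p_reward_win": 0.5, "p_reward_lose": 0.5},
    "D": {"choice": False, "p_reward_win": 0.5, "p_reward_lose": 0.5},
}

def expected_reward(deck):
    """Probability of a reward on a single trial with the given deck."""
    d = DECKS[deck]
    return P_WIN * d["p_reward_win"] + (1 - P_WIN) * d["p_reward_lose"]

def simulate_trial(deck, rng):
    """Sample one trial: win/lose first, then a matching card value, then the reward."""
    win = rng.random() < P_WIN
    card = rng.choice([6, 7, 8, 9, 10]) if win else rng.choice([1, 2, 3, 4])
    d = DECKS[deck]
    p_reward = d["p_reward_win"] if win else d["p_reward_lose"]
    return card, rng.random() < p_reward

for name in DECKS:
    print(name, expected_reward(name))
```

Under these assumptions every deck has the same expected reward, so any preference among decks cannot be attributed to differences in payoff.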
In addition to the decks described above, we also included two "lightning bolt" decks, as catch trials, to ensure participants were correctly following the instructions and understood their choices in the choice phase. Participants were instructed that if they selected a deck with a lightning bolt, they would always win a reward, P(R) = 1. We included two lightning bolt decks: one which provided a choice of cards (in line with decks A and C) and one which did not (as in decks B and D), to avoid a value-of-choice-related bias in catch deck preference (i.e., not choosing the no-choice catch deck because they overwhelmingly preferred the alternative choice non-catch deck). Overall, participants completed 24 trials during the learning phase, four repetitions for each deck identity.
Choice Phase. Following the learning phase, participants completed the choice phase of the experiment. The general structure of the task was the same as in the Learning Phase, except that participants now had the option to choose which deck they preferred to draw cards from at the beginning of each trial using the W or O keys (Fig. 1B). These deck choices served as our key dependent variable, taking deck preference as a proxy for the relative value of choice, in turn allowing us to examine how this value might depend on the presence of MI. Namely, by analyzing deck choices, we were able to test for 1) a preference for choice over no choice: whether, all else being equal, people preferred Choice Decks (A or C) to No-Choice Decks (B or D); 2) a general preference for MI: whether, all else being equal, people preferred decks with full MI (A or B) to decks with no MI (C or D); and 3) whether preference for choice was affected by MI: whether people preferred Choice Decks less when choosing them would result in a loss of MI (e.g., B versus C).
Participants completed 68 trials of the choice phase, 8 of which were catch trials, in which one of the available decks was a lightning bolt deck. As described above, participants were informed that they would always receive a reward if they chose the lightning bolt deck. A catch trial was considered correct if participants chose the lightning bolt deck. The other 60 trials presented all pairings between the four key decks (A-D; 6 pairs) 10 times each.
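The trial counts above follow directly from the factorial design, as the short sketch below shows. It enumerates the six deck pairings and labels each by whether the two decks differ in choice, in MI, or in both. The variable and function names are hypothetical and serve only to make the arithmetic explicit.

```python
from itertools import combinations

# (affords choice, provides MI) for each deck, as described in the text.
DECK_PROPS = {
    "A": (True, True),
    "B": (False, True),
    "C": (True, False),
    "D": (False, False),
}

def describe(pair):
    """Label a deck pairing by which properties differ between the two decks."""
    (choice1, mi1), (choice2, mi2) = DECK_PROPS[pair[0]], DECK_PROPS[pair[1]]
    return {"differs_in_choice": choice1 != choice2, "differs_in_mi": mi1 != mi2}

pairs = list(combinations("ABCD", 2))   # all unordered pairings of the four decks
REPS_PER_PAIR, N_CATCH = 10, 8
n_trials = len(pairs) * REPS_PER_PAIR + N_CATCH

for pair in pairs:
    print("".join(pair), describe(pair))
print(n_trials)
```

Two pairings (A vs. B, C vs. D) isolate choice, two (A vs. C, B vs. D) isolate MI, and two (A vs. D, B vs. C) vary both at once.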
Rating Phase. After completing the choice phase, participants rated 1) their perceived confidence of winning a reward, after selecting a winning card, under each deck identity (e.g., red moon, black star, etc.), and 2) their subjective sense of control under each deck. First, participants completed a mock trial identical to a trial in the learning phase, in which they passively viewed a deck before selecting among available card options. However, here the flipped card would always result in a win, i.e., P(W) = 1. After seeing the outcome of the card flip, but without having seen whether they would receive rewards or not, participants were presented with an unlabelled sliding scale and asked "How confident are you that you will win points?" The scale was anchored, from left to right, "not at all" and "very" (ranging from 1 to 100).
After completing four of these mock trials (one for each deck of interest), participants were shown each deck in isolation with a sliding scale below it and were asked "When you chose this deck, how much did you feel like you could control whether you earned points or not?" The scale was anchored, from left to right, as "no control", "intermediate control", "complete control" (ranging from 1 to 100) (Wenke, Fleming, & Haggard, 2010).

Inferential statistics
Deck choices were analyzed with Bayesian logistic multilevel models using the brms package in R (version 2.18.0; Bürkner, 2017). We used these models to 1) account for the nested structure of the data (choices per participant) and 2) directly compare posteriors of choice preferences between deck pairings. By virtue of our factorial design, some deck pairings revealed participants' preferences about choice (e.g., preference for deck A over B, Fig. 1C) and others about mutual information (e.g., preference for deck A over C). Accordingly, two models were fit to assess preference for choice and preference for MI decks, respectively. In both models, binary choices (0 or 1, no choice versus choice, or no MI versus MI, respectively) were predicted from deck pairing. In the case of the model that predicted preference for choice, deck pairings were 1) both available decks carried MI about the reward (A vs. B), 2) neither deck carried MI (C vs. D), 3) the deck that allowed participants to choose cards also provided MI (A vs. D), and 4) the deck that did not allow participants to choose cards provided MI (B vs. C). Predictors in the model that predicted preference for MI were similar: 1) both available decks yielded the choice between cards (A vs. C), 2) neither deck yielded choice (B vs. D), 3) one deck yielded choice but the other did not (e.g., A vs. D). Random intercepts and slopes per deck pairing were taken per participant. The intercept term was omitted from the fixed effects specification, so that each fixed effect effectively compared choice proportions against chance (0.5).
Confidence and sense of control ratings were predicted using Bayesian linear multilevel modeling, predicting ratings from decks' MI (deviance coded; −0.5 = no MI, 0.5 = MI) and choice (−0.5 = No Choice, 0.5 = Choice) status, with random intercepts computed per participant.
Each model was fit with 3 chains and 5000 iterations, taking 2500 iterations as burn-in. All models converged well, as assessed by visual traceplot inspection, R-hat values near 1, and large effective sample sizes. All reported credible intervals (CI) are at the 95% level. Bayesian p values (P) represent one minus the proportion of the posterior that falls above or below zero (depending on the sign of the median posterior value: below zero if b < 0 and above if b > 0). Bayesian p values can be interpreted probabilistically as "there is a (P × 100) percent chance that the effect is zero or a reversal of the central tendency". BF refers to the Bayes Factor or, more precisely in this context, the evidence ratio as implemented by the hypothesis function in brms (version 2.18; Bürkner, 2017), computed as BF = (1 − P)/P, which reflects the relative evidence for one directional hypothesis (i.e., the effect is larger than zero) over another (i.e., the effect is smaller than zero).
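To make the two summary statistics defined above concrete, the following sketch computes a Bayesian p value and the corresponding evidence ratio from a vector of posterior draws. It is a plain-Python illustration of the definitions in the text, not a reimplementation of brms. The function names are hypothetical, and the draws are synthetic (a normal distribution with an arbitrarily chosen mean and spread), not the study's posteriors.

```python
import random

def bayesian_p(draws):
    """Proportion of posterior draws at or beyond zero, on the side opposite the median."""
    ordered = sorted(draws)
    median = ordered[len(ordered) // 2]
    if median >= 0:
        opposite = sum(d <= 0 for d in draws)
    else:
        opposite = sum(d >= 0 for d in draws)
    return opposite / len(draws)

def evidence_ratio(p):
    """Evidence ratio (1 - P) / P; infinite when no draws cross zero."""
    return float("inf") if p == 0 else (1 - p) / p

# Synthetic posterior for illustration only: 3 chains x 2500 retained draws.
rng = random.Random(0)
draws = [rng.gauss(-0.30, 0.14) for _ in range(7500)]

p = bayesian_p(draws)
print(p, evidence_ratio(p))
```

Because the ratio is undefined when no draws fall on the opposite side of zero, the sketch returns infinity in that case; how such cases are reported in practice is a presentation choice.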

Results
Table 1 shows results of a regression modeling first-stage choices in favour of choice decks and those in favour of MI decks. All comparisons between deck pairings reflect direct comparisons of the posteriors (e.g., coefficients for differences in choices between Decks A and B versus Decks C and D are computed by subtracting posterior values of these two coefficients).

Preference for choice
To establish a canonical "value of choice" effect, we first examined deck preferences across pairings within the same MI level (e.g., both MI = 0) but different choice conditions (i.e., Choice versus No Choice). Echoing past work (Leotti et al., 2010), participants exhibited a preference for decks that allowed them to choose between cards (P(Choose Choice) = 0.62). However, this preference differed according to the global MI in the pairing: when neither available deck provided instrumental information about rewards (Decks C vs. D), preference for choice was diminished, as compared to when full MI was available (Decks A vs. B; b = −0.30, CI = [−0.58, −0.03], P = 0.03, BF = 28.88; Fig. 2A). This finding suggests that while participants preferred choice overall, in the absence of a choice outcome's ability to predict future rewards (i.e., when the overall MI level was zero), the preference for choice was attenuated.

Preference for mutual information
We next examined participants' preference for MI decks across deck pairings in which choice level was equated (e.g., Choice versus Choice), but MI condition differed (i.e., MI = 0 versus MI = 1). Holding the level of choice afforded by the decks constant, participants significantly preferred decks that yielded MI (P(Choose MI) = 0.70; Fig. 2B). This preference was not reliably affected by the choice status of the pairing: whether both decks (A vs. C) or neither deck (B vs. D) provided the opportunity to choose between cards, participants preferred MI to a similar degree (b = −0.11, CI = [−0.37, 0.14], P = 0.33, BF = 3.27).

Difference in choice preference by mutual information
Finally, we examined preference for choice in deck pairs where both the possibility of choice and MI about rewards varied (Fig. 2C). In deck pairs where MI was associated with a deck that provided choice (A vs. D; rightmost bar), participants overwhelmingly preferred the choice deck (P(Choose Choice) = 0.71, b = 1.30, CI = [0.89, 1.75], P = 0, BF > 5000). Conversely, when MI was associated with a "choiceless" deck, that is, when MI about rewards could be obtained by choosing a deck that restricted the subsequent choice between cards (Decks B vs. C; leftmost bar), participants actively avoided the opportunity to choose (P(Choose Choice) = 0.35, b = −0.85, CI = [−1.26, −0.47], P = 0, BF > 5000).

Subjective ratings
To understand whether participants successfully encoded the MI level associated with each deck (that is, P(R | W), the probability of receiving a reward after having observed a winning card), we asked them to rate their confidence about receiving a reward after observing a winning card in each deck. As shown in Fig. 2D, participants reported higher confidence when the outcomes of choices were predictive of rewards (i.e., when MI = 1) (b = 2.28, CI = [0.68, 3.90], P = 0.01, BF = 93.94). We did not observe any change in confidence by choice status (b = 0.19, CI = [−1.37, 1.76], P = 0.42, BF = 1.38), nor an interaction between choice and MI (b = −0.06, CI = [−1.68, 1.52], P = 0.53, BF = 1.11). Together these results suggest that participants were able to accurately learn the conditional instrumental contingency of each deck. Furthermore, the lack of choice effects on confidence suggests that participants' preference for choice cannot be explained by a false sense of confidence about chosen cards yielding a higher likelihood of future rewards.
In a final, exploratory analysis of individual differences, we probed the relationship between sense of control ratings and overall preference for choice, examining whether individual differences in subjective sense of control across choice conditions predicted individual differences in overall preference for choice. In other words, did individuals' subjective sense of control correlate with their preference for choice in the choice phase of the experiment? To do so, we examined the correlation between participants' average preference for choice decks over non-choice decks (A versus B and D; C versus B and D) observed in the choice phase, and the difference in sense of control ratings between choice (decks A and C) and no choice decks (decks B and D). We observed a robust correlation between these two measures (r = 0.31, CI = [0.11, 0.49], P = 0.001; Fig. S3A), such that participants who reported a stronger sense of control when making choices also exhibited a stronger preference for choice decks.

Discussion
Experiment 1 provides initial evidence that the canonical value of choice (Leotti et al., 2010) depends, to a substantial degree, on the predictive relationship between the outcomes of one's choices and their eventual consequences, i.e., their instrumentality, operationalized here as MI. When choices were entirely decoupled from future rewards, we observed a weaker preference for choice, and when we pitted the freedom to choose against instrumental contingency, the preference for choice was altogether reversed. Finally, we observed that individual differences in the preference for choice were explained, in part, by individual differences in the subjective sense of control imbued by the choice decks.
Additionally, to control for individual participants' experienced histories of rewards (which engender fluctuations in deck expected values), we fit a series of Reinforcement Learning (RL) models to participants' choices (see Supplemental Material). In short, we found that both choice and MI significantly biased deck preferences, over and above the (experienced) expected value of the decks (see Fig. S4). Together, these results suggest that the putatively inherent value of choice (Bobadilla-Suarez et al., 2017; Leotti & Delgado, 2011) depends, to an important degree, on instrumental contingencies between choice and rewards.

Experiment 2
An open question in Experiment 1 concerned whether participants' preferences were driven in part by the clearly delineated deck identities which encoded choice and MI. That is, due to the factorial design, it was difficult to determine whether preferences for decks that yielded choice were driven by an intrinsic value of choice and/or an aversion to a lack of instrumental value. While this issue in inferring directionality does not invalidate the conclusions drawn above, a more stringent test of the influence of MI on the value of choice would be to latently manipulate MI and examine the degree to which it dynamically impacts the preference for choice between the same options. Below, we extend the experimental design used in Experiment 1 to accomplish this goal.

Participants
We recruited 100 participants (mean age = 38.43, SD = 10.17, 44% women) for Experiment 2 from Prolific.co. Two participants' datasets were excluded due to technical errors. As in Experiment 1, our experimental design included catch trials (here, 16) used to identify participants who were disengaged or responding randomly. Analyzing these catch trials, 17 participants failed to choose the correct option on at least 11 trials (66% accuracy) and were excluded from further analysis, leaving 81 participants in the final analysis.
All participants gave informed consent prior to testing and were compensated with 7 USD. This procedure was approved by the ANONYMIZED UNIVERSITY Research Ethics Board (REB #137-0816).

Procedure
Participants completed 200 trials of a modified Choice Task, divided into four blocks. This modified design mirrored that of Experiment 1, except for three key differences (Fig. 3).
First, only two decks were available to participants: a Choice Deck (deck A in Fig. 4A) and a No Choice deck (deck B). Participants repeatedly made choices between these decks throughout the task, and their impact on the subsequent choice between cards was the same as in Experiment 1. As before, first-stage choices served as our main dependent variable.
Second, MI about rewards was no longer tied to deck identity, but instead varied dynamically over the course of the task, across 50-trial blocks (Fig. 3B). To avoid order effects, we employed two block orders, counterbalanced across participants. In the first order, during the initial 50 trials, obtaining a winning card (a card > 5) would only result in a reward 50% of the time (P(R|W) = 0.5, MI = 0). In the next 50 trials, this would no longer be the case, and obtaining a winning card would guarantee reward (P(R|W) = 1, MI = 1). This pattern then repeated for the remaining 100 trials (Fig. 3B). In the second counterbalance condition, the order was reversed, such that participants began the task with full MI and ended with no MI. To communicate this contingency, participants were told that they were choosing cards at a "casino" and the end of each block represented the "end of day". Specifically, participants were instructed that "How you win bonus points will depend on the day you are playing at the casino. On some days, you will win points every time the card you pick is above 5. Other days, you will only sometimes win points when the card you pick is above 5."
Third, after observing the selected card, approximately 10 randomly selected trials per block would not display a reward, but instead ask participants to rate "How confident are you that you will win points?" These trials were sampled uniformly after the 5th trial of each block, U(5, 50), to allow participants time to estimate the underlying environment MI level (Fig. S6). Participants rated confidence using a sliding scale, anchored at "not at all" and "very", from left to right (ranging from 1 to 100).
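The block structure described above can be sketched as follows. This is a hypothetical illustration rather than the authors' task code. In particular, it assumes exactly 10 confidence probes per block (the text says approximately 10), drawn without replacement from within-block trials 5 to 50 inclusive, which is one reading of U(5, 50). All function and constant names are invented for the example.

```python
import random

N_BLOCKS, BLOCK_LEN = 4, 50

def block_schedule(order):
    """Return P(R|W) for each of the 200 trials under counterbalance order 1 or 2."""
    first, second = (0.5, 1.0) if order == 1 else (1.0, 0.5)
    per_block = [first, second] * (N_BLOCKS // 2)   # alternate MI = 0 and MI = 1 blocks
    return [p for p in per_block for _ in range(BLOCK_LEN)]

def rating_trials(rng, n_ratings=10):
    """Within-block trial positions (assumed 5..50) that show a confidence probe."""
    return sorted(rng.sample(range(5, BLOCK_LEN + 1), n_ratings))

schedule = block_schedule(order=1)
rng = random.Random(42)
probes = [rating_trials(rng) for _ in range(N_BLOCKS)]  # one probe list per block

print(len(schedule), schedule[0], schedule[50])
print(probes[0])
```

Sampling probes separately for each block keeps the number of ratings balanced across MI levels, which is one plausible reason for a per-block sampling scheme.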
Together, these changes allowed us to investigate 1) the degree to which trial-to-trial changes in MI affect preference for choice for the same options, and 2) how MI itself is learned through repeated experience observing outcomes (cards less than 5 or greater than 5) and obtained rewards.

Inferential statistics
As in Experiment 1, we analyzed first-stage (i.e., deck) choices using Bayesian logistic multilevel models. Preference for choice was predicted from MI (deviance-coded, 0 = −0.5, 1 = 0.5), trial number (1-50, mean-centered and scaled to be between 0 and 1), and counterbalance condition (deviance-coded, 1 = −0.5, 2 = 0.5). We also fit a Bayesian linear multilevel model predicting confidence ratings from MI, trial number, counterbalance condition, and card outcome (deviance-coded, lose = −0.5, win = 0.5). In both models, random slopes of trial number were computed, and random intercepts were taken per participant. Fixed effects were initialized with standard normal priors. Three chains of 5000 iterations (2500 burn-in) were sampled. We analyzed sense of control ratings (provided at the end of the task for both decks) using Bayesian regression, predicting ratings from deck identity. Again, three chains of 5000 iterations (2500 burn-in) were sampled.

Preference for choice
Results from the multilevel regressions are reported in Table 2 and visualized in Fig. 4. Echoing Experiment 1, we found that participants preferred the deck that afforded them choice compared to the deck that restricted their ability to choose cards (P(Choose Choice) = 0.58, b = 0.39, CI = [0.17, 0.62], P = 0.002, BF = 499). Importantly, replicating Experiment 1, we found that this preference depended on environment MI level, such that preference for choice-granting decks was attenuated when the outcome of these choices no longer provided reliable information about rewards (i.e., when MI = 0; b = 0.15, CI = [0.09, 0.21], P = 0, BF > 5000; Fig. 4A). Moreover, this effect strengthened over the course of blocks, as reflected by the interaction between MI level and trial number (b = 0.27, CI = [0.09, 0.47], P = 0.01, BF = 99), suggesting that the influence of environment MI level on preference for choice grew as participants presumably learned more about the current block's MI level (Fig. 4B). The counterbalance order (i.e., whether participants' block sequence began with MI = 0) did not influence these effects (smallest P = 0.15).

Subjective ratings
Across MI conditions, participants reported feeling more confident about obtaining a reward after having observed a winning card (> 5) and more confident about not receiving a reward after observing a losing card (< 5) (Fig. 4C; b = 43.17, CI = [40.31, 46.00], P = 0, BF > 5000), suggesting that their confidence levels were sensitive to the most recently observed outcome. However, the strength of this effect depended on MI, such that participants were more confident, in both directions, under full MI (b = 37.20, CI = [31.55, 42.83], P = 0, BF > 5000). In other words, participants were more confident they would not receive points after observing a losing (< 5) card and more confident they would win points after observing a winning (> 5) card, a pattern which was most pronounced under full mutual information (Fig. 4C). Moreover, this modulation of confidence by MI increased over the course of a block (b = 0.94, CI = [0.56, 1.32], P = 0, BF > 5000), such that participants became more confident in rewards after wins, and more confident in not receiving rewards after losses, as they gained more information about the latent MI in the environment.
Finally, in line with the results from Experiment 1, participants reported an increased sense of control over rewards under the Choice deck compared to the No Choice deck (b = 15.77, CI = [7.57, 23.97], P = 0, BF > 5000; Fig. 4D). And, mirroring Experiment 1, we observed that differences in sense of control between the Choice and No Choice decks predicted overall preference for the Choice deck (r = 0.32, CI = [0.10, 0.51], P = 0.003; Fig. S3B). Because these ratings were only provided at the end of the experiment, and MI was not static across deck identities in Experiment 2, we were unable to directly examine the effects of MI on self-reported sense of control ratings. As a proxy, we considered the effect of the most recent MI level (i.e., the MI level on the last block per participant) on sense of control ratings in an exploratory analysis. Consistent with the results of Experiment 1, we found that participants who most recently experienced an environment MI level of 1 reported having a greater sense of control than those who most recently experienced a lack of instrumental contingency (b = 10.36, CI = [3.33, 17.55], P = 0.01, BF = 130.58).

Discussion
Experiment 2 replicates and extends the results of Experiment 1, demonstrating that the value of a given option depends not only on the level of choice it affords, but also on a dynamically learned estimate of MI (i.e., instrumental value). Replicating past work (Bobadilla-Suarez et al., 2017; Leotti et al., 2010; Leotti & Delgado, 2011), participants preferred the deck that afforded them the freedom to choose between cards. However, this preference depended on the relationship between choices' outcomes and future rewards (i.e., MI): when choices were decoupled from predicting rewards, preference for choice decreased. Notably, from the participant's perspective, the shift in preference was experienced as one for the same stimulus (deck), supporting the idea that the value of choice depends, importantly, on fluctuations in environmental instrumental value (MI) rather than deck or stimulus identity. Supplemental computational modeling confirmed this intuition, demonstrating that the model that best captured participants' behaviour was one in which participants formed an estimate of MI from the recent history of rewards and in turn used it to temper or amplify the intrinsic value of choice (see Supplemental Materials; Fig. S7).

General discussion
A considerable body of work spanning the fields of psychology, neuroscience and behavioural economics suggests that the freedom of choice is desirable (Leotti et al., 2010; Sunstein, 2015). Here, we consider the possibility that the value of choice may not be intrinsic to choice itself, but instead depends importantly on the instrumental relationship between one's choices and their ultimate consequences. To operationalize this relationship, we borrowed a concept from information theory, mutual information (Shannon, 1948), which we used to quantify the degree to which the outcome of one's choice predicted future rewards. Critically, past work examining the value of choice has assumed that the contingency between actions and outcomes remains intact (i.e., that choice is made under full mutual information). Here, we interrogated whether the supposed intrinsic nature of the value of choice persists when this connection is severed: does the preference for choice persist in the absence of mutual information?
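
To make the notion of mutual information concrete, the sketch below computes it (in bits) for a joint distribution over card outcomes and rewards using the standard definition, I(X; Y) = Σ p(x, y) log2[p(x, y) / (p(x) p(y))]. The two example distributions are illustrative assumptions rather than the task's actual parameters: both assume that winning and losing cards are equally likely, and that under no MI a reward is delivered with probability 0.5 regardless of the card.

```python
import math


def mutual_information(joint):
    """Mutual information (bits) of a joint distribution.

    `joint` maps (card_outcome, reward) pairs to probabilities summing to 1.
    """
    p_card, p_reward = {}, {}
    for (card, reward), p in joint.items():
        p_card[card] = p_card.get(card, 0.0) + p
        p_reward[reward] = p_reward.get(reward, 0.0) + p

    mi = 0.0
    for (card, reward), p in joint.items():
        if p > 0:  # zero-probability cells contribute nothing
            mi += p * math.log2(p / (p_card[card] * p_reward[reward]))
    return mi


# Hypothetical full-MI environment: a winning card always yields a reward.
full_mi = {("win", "reward"): 0.5, ("lose", "no_reward"): 0.5}

# Hypothetical no-MI environment: reward is independent of the card outcome.
no_mi = {
    ("win", "reward"): 0.25,
    ("win", "no_reward"): 0.25,
    ("lose", "reward"): 0.25,
    ("lose", "no_reward"): 0.25,
}

mi_full = mutual_information(full_mi)
mi_none = mutual_information(no_mi)
```

Under these assumptions the card outcome carries one full bit of information about the reward in the first environment and none in the second, which corresponds to the MI = 1 and MI = 0 conditions contrasted in the text.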
In short, across two experiments, we find that the value of choice was influenced by the absence (versus presence) of MI, such that, when choices were not predictive of future rewards, participants' preference for making choices diminished. This was the case both when MI was statically associated with stimulus identities (i.e., decks in Experiment 1) and when MI was dynamically learned from the environment (Experiment 2). Notably, while participants preferred choice less in the absence of mutual information in the present experiments, all options effectively had identical expected reward values, which means that the optimal strategy in these tasks would prescribe indifference between options. Despite the fact that all decisions were, from an expected reward perspective, inconsequential, we nevertheless observed that participants preferred making choices and that this preference was undercut when the outcomes of one's choice failed to predict future rewards.
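
The point that MI can change while expected reward stays fixed can be illustrated with a short calculation. The probabilities below are hypothetical and chosen only for illustration (equally likely winning and losing cards, and a reward probability of 0.5 when MI is absent); they are not taken from the task description.

```python
def expected_reward(p_win, p_reward_given_win, p_reward_given_lose, points=1.0):
    """Expected points per trial, marginalizing over the card outcome."""
    p_reward = p_win * p_reward_given_win + (1 - p_win) * p_reward_given_lose
    return points * p_reward


# Full MI (assumed): reward follows the card outcome deterministically.
ev_full_mi = expected_reward(p_win=0.5, p_reward_given_win=1.0, p_reward_given_lose=0.0)

# No MI (assumed): reward is equally likely after winning and losing cards.
ev_no_mi = expected_reward(p_win=0.5, p_reward_given_win=0.5, p_reward_given_lose=0.5)
```

Both environments yield the same expected reward per trial under these assumptions, so a purely reward-maximizing agent has no reason to prefer one over the other; only the predictability of the reward given the card differs.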
If the value of choice is informed by the information that choices may confer about future rewards, yet these choices are inconsequential, what is the source of this instrumental value? Notably, a considerable body of work suggests that people tend to seek out new information, even when this information is of limited utility (Friston et al., 2013; Miller, 1983; Niv & Chan, 2011). Moreover, under some circumstances, individuals are willing to forego rewards to make choices (Bobadilla-Suarez et al., 2017), or even pay financial costs for the opportunity to receive inconsequential information (Bennett et al., 2016). Similarly, both humans and animals will seek opportunities to learn new information, even when this information is not instrumental to acquiring rewards (Gipson, Alessandri, Miller, & Zentall, 2009; Lea & Ryan, 2015; Liew, Embrey, & Newell, 2023). These results have led some theorists to ascribe an intrinsic value to information, such that information about future outcomes is valued for its own sake, independent of direct rewards (Grant, Kajii, & Polak, 1998). From this perspective, the present results point to the possibility that choice may act as an intermediate means of acquiring instrumental information about prospective reward (Leotti et al., 2010) and, when the ability to do so is severed (here, by restricting MI), the value of choice is attenuated. For example, in a full MI environment, choice may render the world more predictable, reducing environmental uncertainty, and increasing the subjective utility of actions by facilitating learning and prediction (Loewenstein, 1999; Ruiz, DuBrow, & Murty, 2023). When this contingency between actions and rewards is curtailed, however (i.e., when MI is eliminated), the predictability of the environment conferred by free choice, and consequently the value of choice, is diminished.
While the present results add nuance to our understanding of the intrinsic value of choice, and the value of free choice more generally, future work should aim to further elucidate the role of MI in the value of choice. First, the present work leaves open the question of whether the impact of MI observed in the present study would hold in more complex settings than simple action-outcome problems (e.g., in value-based choice tasks or more complex RL-like environments). For example, a similar design could be employed to examine preference for choice, with and without MI, in which participants make guesses or choose among reward-neutral alternatives (e.g., where card outcomes do not yield monetary rewards). This approach would extend the present results beyond reward learning to the prediction of outcomes more generally.¹ Second, recent work has suggested that information-seeking behaviour, which may bear an important connection to MI, can be viewed as a process that reduces environmental uncertainty (Bennett et al., 2016; Liew, Embrey, Navarro, & Newell, 2023). On this view, in more uncertain environments, people seek information more readily, and even at additional costs. As such, it would be interesting to examine how the preference for choice, both with and without MI, would depend on the degree of uncertainty in the environment. For example, the design utilized in Experiment 1 could be modified to incorporate variable degrees of delay between outcomes and rewards as a proxy for temporal uncertainty (Bennett et al., 2016).
Putting aside the question of whether the value of information explains the connection between MI and the value of choice, our results suggest a refinement to previous accounts of the value of choice, which posit that free choice is imbued with an "inherent" value that undergirds humans' default preference for choice (Bobadilla-Suarez et al., 2017; Leotti & Delgado, 2011). Instead, at least in the context of basic action-outcome choices, we find that the act of choosing need not be entirely intrinsically motivating and may in addition reflect an instrumental behaviour to acquire information about future rewards. Accordingly, the current results have important implications for our understanding of why people prefer to make choices, even when doing so is completely inconsequential. Namely, choice is an important vehicle for acquiring information about future outcomes. However, it can be difficult to isolate the act of choosing from choices' impact on the world (i.e., their instrumental value). Here, we demonstrate that when this instrumental contingency is controlled for, the preference for choice is attenuated. Together, our results demonstrate that, above and beyond the intrinsic value of choice, a significant factor in people's preference for choice is an assumption about its instrumentality.

Fig. 1. Experimental design for Experiment 1. (A) Example learning phase. Depending on the deck, participants may or may not be able to choose among cards drawn from that deck (indicated by the grey lettering in this figure). The selected card is then flipped and if it is larger than 5, the participant wins this round of the game. Based on the deck identity, winning a round of the game may or may not result in a reward, according to the deck's MI. (B) Example choice phase. Participants choose between decks they learned in the learning phase. The remainder of a trial is then identical to the learning phase. (C) Features of each deck. Note: Deck identities here are labelled as A-D for the reader's convenience. Participants instead learned deck identities as combinations of shapes (moon or star) and colour (red or black) in randomized order. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Descriptive results in Experiment 1. (A) Proportions of first-stage choices in favour of Choice decks (left) and Bayesian posterior of these proportions, where the red density reflects decks with no MI and the blue density reflects decks with full MI. (B) Proportions of first-stage choices in favour of MI decks. (C) Proportions of first-stage choices in favour of Choice decks. (D) Ratings for the perceived percent chance of winning a reward having observed a winning card (> 5). Dark bars reflect these ratings in a No Choice deck and light bars reflect the same in Choice decks. (E) Ratings for subjective sense of control under each deck. Dark bars reflect these ratings in a No Choice deck and light bars reflect the same in Choice decks. All error bars reflect 1 SE. Parentheticals under x-axis labels refer to deck pairings (see Fig. 1C). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Experimental design for Experiment 2. (A) The experimental design in Experiment 2. This was identical to the experimental design in Experiment 1, except that MI varied over the course of the task. (B) Variation in MI over the task in one counterbalance condition (reversed in the other condition, not pictured). Note: Deck identities here are labelled as A-B for the reader's convenience. Participants instead learned deck identities as shapes (moon or star).

Fig. 4. Results from Experiment 2. (A) Proportions of first-stage choices in favour of Choice decks. (B) Same proportions as A, but over the course of a block. The blue line represents blocks with full MI and the red line blocks with no MI. (C) Ratings of confidence about receiving rewards after observing a card. The x-axis represents whether the card was losing (card < 5) or winning (card > 5). Dark bars represent blocks with no MI and light bars blocks with full MI. (D) Sense of control ratings for the No Choice and Choice deck. Colours represent the most recently experienced MI condition. Namely, dark bars represent sense of control ratings from participants who provided ratings just after experiencing a no-MI block, and light bars show the same for participants who most recently completed a block with full MI. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 1
Regression table for Bayesian multilevel model of choice and MI preferences in Experiment 1.
*, † These rows refer to the same deck pairing, but with different outcome variables. They are included twice only for completeness.

Table 2
Regression table for Bayesian multilevel model of choice preferences in Experiment 2.
S. Devine et al.