Ergodicity-breaking reveals time optimal economic behavior in humans

Ergodicity describes an equivalence between the expectation value and the time average of observables. Applied to human behaviour, ergodic theory reveals how individuals should tolerate risk in different environments. To optimise wealth over time, agents should adapt their utility function according to the dynamical setting they face. Linear utility is optimal for additive dynamics, whereas logarithmic utility is optimal for multiplicative dynamics. Whether humans approximate time optimal behavior across different dynamics is unknown. Here we compare the effects of additive versus multiplicative gamble dynamics on risky choice. We show that utility functions are modulated by gamble dynamics in ways not explained by prevailing economic theory. Instead, as predicted by time optimality, risk aversion increases under multiplicative dynamics, distributing close to the values that maximise the time average growth of wealth. We suggest that our findings motivate a need for explicitly grounding theories of decision-making on ergodic considerations.

Ergodicity is a foundational concept in models of physical systems that include elements of randomness 1,2,3. A physical observable is ergodic if its average over possible states is the same as its average over time. For instance, the velocity of gas molecules in a chamber is ergodic if averaging over all molecules at a fixed time (an expectation value) yields the same value as averaging a single molecule over an extended period of time (a time average). In other words, ergodicity ensures an equality between the time average and the expectation value. The relevance of ergodicity to human behavior is that it provides important constraints for thinking about how agents should compute averages when making decisions 4,5.
In the behavioral sciences, decision making is studied predominantly using experiments with additive dynamics, where choice outcomes exert additive effects on wealth. An agent might gamble on a coin toss for a gain of $1 each time they win, they might score a point each time they correctly execute a motor action, and so on. In these examples, changes in wealth are ergodic, and in such settings a linear utility function is optimal for maximising the growth of wealth over time 5. In other words, for this utility function, maximizing changes in expected utility per unit time also maximizes the time average growth rate of wealth (Fig. 1f). In contrast, settings with multiplicative dynamics have non-ergodic wealth changes, which means that the expectation value of changes in wealth no longer reflects time average growth. Indeed, it is possible to set up gambles in which changes in wealth have a positive expectation value but a negative time average growth rate 4. Counterintuitively, for such gambles, maximising expected value eventually leads to ruin. In such multiplicative settings a logarithmic utility function is time optimal, since maximizing changes in expected utility per unit time then maximises the time average growth rate of wealth 5 (Fig. 1g).
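The gap between expectation value and time average under multiplicative dynamics can be illustrated numerically. The following is a minimal sketch, not part of the experiment: the growth factors 1.5 and 0.6 are a hypothetical coin-toss gamble chosen only for illustration, and the function names are our own.

```python
import math
import random

# Hypothetical gamble (an assumption, not taken from the study):
# wealth is multiplied by 1.5 on heads and by 0.6 on tails.
FACTORS = (1.5, 0.6)

def expected_growth_factor(factors):
    """Expectation value of the per-round growth factor (arithmetic mean)."""
    return sum(factors) / len(factors)

def time_average_growth_factor(factors):
    """Per-round growth factor experienced over time (geometric mean)."""
    return math.exp(sum(math.log(f) for f in factors) / len(factors))

def simulate_wealth(factors, n_rounds, x0=1000.0, seed=0):
    """Wealth of a single agent after repeatedly playing the gamble."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_rounds):
        x *= rng.choice(factors)
    return x

print(expected_growth_factor(FACTORS))      # greater than 1
print(time_average_growth_factor(FACTORS))  # less than 1
print(simulate_wealth(FACTORS, 10_000))     # far below the starting 1000
```

For this gamble the expectation value grows by 5% per round, while a single trajectory shrinks by roughly 5% per round in the long run, which is the sense in which maximising expected value can lead to ruin.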
These examples highlight the fact that time optimal behavior relies on agents adapting their utility functions according to the dynamics of their environments. In contrast, prevailing formulations of utility theory, including expected utility theory 6,7,8 and prospect theory 9,10,11, are not premised on the dynamics of the environment. In treating all possible dynamics as the same, these formulations imply that utility functions are indifferent to the dynamics. From an ergodic perspective, utility functions have a different meaning compared to the standard economic interpretation. They do not represent idiosyncratic preferences but rather arise as the ergodicity mappings that agents apply as they attempt to grow their wealth over time. In other words, utility functions appear as the transformations required to
Figure 1 | Experimental design and wealth trajectories. a, two-sheets summarise the repeated protocol for both days, which only differ in the dynamics of wealth changes. Numbers indicate durations in minutes. b, single trial from a passive session, where durations are in seconds and ranges depict a uniformly distributed temporal jitter. c, single trial from an active session. d, wealth trajectories in real time over the course of each passive session. The trajectory for Passive× is plotted on a log scale, appropriate to the multiplicative dynamics. Eight randomly selected trajectories are plotted. Dotted line shows initial endowment level of 1000 DKK. e, discrepant trials are a subset of trials, where linear and logarithmic utility functions generate opposing choices. f, wealth trajectories of synthetic agents with different utility functions (prospect theory and isoelastic) repeatedly playing the set of additive gambles over one week (Supp. Methods). The agent with linear utility has the highest time average growth rate (green). g, equivalent simulations for multiplicative gambles. The agent with log utility has the highest time average growth rate (green).
The time optimal agent is an agent with linear utility for additive dynamics, and log utility for multiplicative dynamics, and thus also experiences both of the wealth trajectories depicted in green (in f and g).
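The notion of a discrepant trial (Fig. 1e) can be made concrete with a short sketch. It assumes that each gamble is a pair of equiprobable outcomes, that a linear agent maximises the expected change in wealth, and that a log agent maximises the expected change in log wealth. The function names and example gambles are hypothetical and are not the stimuli used in the study.

```python
import math

def expected_utility(wealth, gamble, dynamic, utility):
    """Expected utility of a gamble whose outcomes are equiprobable."""
    if dynamic == "additive":
        outcomes = [wealth + d for d in gamble]
    elif dynamic == "multiplicative":
        outcomes = [wealth * d for d in gamble]
    else:
        raise ValueError(f"unknown dynamic: {dynamic!r}")
    # math.log raises ValueError if an outcome leaves wealth non-positive.
    return sum(utility(x) for x in outcomes) / len(outcomes)

def preferred_gamble(wealth, left, right, dynamic, utility):
    """Return 'left', 'right', or 'tie' for an expected-utility maximiser."""
    eu_left = expected_utility(wealth, left, dynamic, utility)
    eu_right = expected_utility(wealth, right, dynamic, utility)
    if eu_left > eu_right:
        return "left"
    if eu_right > eu_left:
        return "right"
    return "tie"

def is_discrepant(wealth, left, right, dynamic):
    """True if linear and log utility agents strictly prefer different gambles."""
    linear_choice = preferred_gamble(wealth, left, right, dynamic, lambda x: x)
    log_choice = preferred_gamble(wealth, left, right, dynamic, math.log)
    return "tie" not in (linear_choice, log_choice) and linear_choice != log_choice

# Hypothetical multiplicative gambles: a risky pair versus a safe pair.
print(is_discrepant(1000.0, (2.0, 0.5), (1.1, 1.1), "multiplicative"))
```

In the example, the linear agent prefers the risky gamble because of its higher arithmetic mean, whereas the log agent prefers the safe gamble because of its higher geometric mean.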
Gamble dynamics affect choice frequencies. Discrepant trials are the subset of trials in which a linear utility agent would choose a different gamble to a log utility agent (Fig. 1e); 25 of 312 trials in the active session had this discrepant property. By observing the choice proportions (CP) we obtain evidence about the dependency between choices and gamble dynamics (Fig. 2a). Levels of evidence are reported throughout according to standard interpretations of Bayes factors (BF) 13,14. We found moderate evidence against the hypothesis that subjects choose in favour of linear utility under additive dynamics (Fig. 2b- 0.351], robust over prior widths). Finally, averaging across all models that entail possible combinations of factors and covariates, we found that the inclusion of the dynamic as a factor was uniquely favoured by the data (rmANOVA, BFinclusion = 80.2), with all other factors, including order of testing, showing BFinclusion < 1 (see Supp. Fig. 2b). Together, this shows strong evidence that in the discrepant trials, gamble dynamics exert a strong and systematic influence over choices.
Figure 2 | ... for multiplicative (red) and additive (blue) dynamics. All box and whisker plots indicate range, 1st & 3rd quartiles, and median. b, prior and posterior density for the hypothesis CPlog < 0.5 in terms of effect size, for the additive dynamic (Bayesian t-test), reporting Bayes Factor in favour of CPlog being negative (indicated by BF-0) and its reciprocal in favour of the null hypothesis (BF0-). c, robustness analysis of Bayes Factors in b, with different prior widths. d, sequential analysis showing how this Bayes factor changes with increasing numbers of subjects, with the different markers indicating different prior widths. e-g, equivalent analyses for the multiplicative dynamic for the hypothesis CPlog > 0.5. h, raincloud plot of the change in choice proportion (ΔCPlog), where positive numbers indicate an increase under multiplicative dynamics.
i, posterior and prior densities for the hypothesis that CPlog is larger for multiplicative compared to additive dynamics (Bayesian paired t-test). j-k, equivalent robustness and sequential analyses for this test.
Figure 3 | Hierarchical Bayesian model for estimating dynamic-specific risk preference. a, hierarchical Bayesian model for estimating risk preferences. Circular nodes denote continuous variables, square nodes discrete variables; shaded nodes denote observed variables, unshaded nodes unobserved variables; single bordered nodes denote deterministic variables, double bordered nodes stochastic variables. The left-hand side describes what role these variables play, and the right-hand side includes details on the distributions and logistic choice function. The data generating process (blue), which maps from theta to binary choice, is equivalent to a Bernoulli distribution. b, spectrum of utility functions entailed by different settings of the risk aversion parameter η. c, schematic of model predictions for the time optimal and dynamic-invariant isoelastic models. Heatmaps indicate probability density, with red & blue lines indicating time optimal risk aversion for additive and multiplicative conditions, respectively, intersecting at the time optimal strategy for both dynamics. Diagonal line indicates risk aversions that are invariant to dynamics. Upper and lower panels indicate stochastic variation around time optimal and dynamic invariant models, respectively. d, frequency distribution of risk aversion values collapsed over subjects for additive (blue) and multiplicative (red) dynamics. Dotted lines indicate time optimal values of risk aversion. e, joint distribution of dynamic-specific risk preferences. Maximum a posteriori (MAP) values are plotted for the group (pink dot), and for each subject (cyan dots), and are superimposed over the group-level frequency distribution. Error-bars indicate the central BCI95% for the subject-specific MAP values.
Red, blue, and diagonal lines have the same meaning as in panel c. f-g, posterior distributions of the population-level mean risk aversion. h, displacement in parameter space caused by changing the gamble dynamic. Filled and empty circles indicate additive and multiplicative dynamics, respectively. i-j, equivalent displacements splitting subjects according to the temporal order of their experience of the dynamics. k, mean risk aversion under each dynamic, bars show central BCI95%. l, distribution of subject specific time average growth rates and risk aversion under both dynamics. m, correlation between time average additive growth rate of subject's choices and deviation of subject's risk aversion away from the time optimal value. n, equivalent plot for multiplicative dynamics. o, raincloud plots of Euclidean distances of MAPη estimates to the predictions of the time optimal and dynamic invariant utility models. Grey lines link estimates from the same subjects. p, individual posterior probability distributions for risk aversion. Red and blue lines indicate time optimal values for additive and multiplicative dynamics, respectively.
Estimates for utility model approximate time optimality. The isoelastic utility model has a single risk aversion parameter (η), negative values of which entail risk seeking, zero entails risk neutrality, and positive values entail risk aversion (Fig. 3b, eq. 11). This model is suited to an explorative analysis of time optimality insofar as its parameter space contains values that are time optimal solutions for both additive and multiplicative dynamics. Specifically, an agent that switches from risk neutrality with an η of 0 under additive dynamics, to risk aversion with an η of 1 under multiplicative dynamics, is achieving time optimality by switching between linear and logarithmic utility. Thus, from this perspective, risk aversion should be calibrated to the dynamical setting to maximise the time average growth rate of wealth.
Such time optimal agents would be expected to distribute their parameters around this optimal point as in Fig. 3c (upper panel), whereas agents with no systematic shift (dynamic-invariant agents) would distribute around the diagonal line (lower panel). In estimating a hierarchical Bayesian model of isoelastic utility (Fig. 3a), we obtained separate posterior distributions of risk aversions for each gamble dynamic, which can be compared to these theoretical predictions. We refer to this as a dynamic-specific isoelastic model. Firstly, we find strong evidence that risk aversion increases from additive to multiplicative dynamics (Fig. 3k, Paired-t, BF10 = 2.9 × 10^7, MD = 1.001, SD = 0.345, SE = 0.081, BCI95%[0.829,1.172]), which is indistinguishable from the predicted size of change in η under time optimality. As with the choice proportions, we found extreme evidence for the effect of gamble dynamic on risk aversion, compared to all other factors tested (rmANOVA BFinclusion = 2.45 × 10^9, all other factors < 1, Supp. Fig. 4b).
Finally, the frequency histograms of risk aversion marginalised over all subjects (Fig. 3c) show that the maximum a posteriori (MAPη) value approximates the time optimal predictions for each dynamic: under additive dynamics, the distribution estimated from the data has a MAPη = 0.1506, compared to the time optimal prediction of η = 0 (Fig. 3d, blue); under multiplicative dynamics, the distribution estimated from the data has a MAPη = 1.1534, compared to the time optimal prediction of η = 1 (Fig. 3d, red). The joint distribution over a risk aversion space (Fig. 3e) shows that the MAP estimate of the joint distribution is likewise close to the optimal point indicated by the intersection of the prediction lines. A complementary visualisation of this correspondence comes from the posterior distribution of the population parameter for the mean of η (Fig. 3f-g). This indicates a qualitative agreement between the distribution of risk aversions and the normative predictions of the time optimality model.
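The way a single risk aversion parameter spans both time optimal solutions can be sketched in code. Eq. 11 is not reproduced in this section, so the sketch assumes the standard textbook form of isoelastic utility, which may differ from the paper's equation by an affine transformation. The function name is our own.

```python
import math

def isoelastic_utility(x, eta):
    """Isoelastic utility of wealth x > 0 with risk aversion eta.

    Assumed standard form: (x**(1 - eta) - 1) / (1 - eta), with the
    logarithm as the limiting case at eta == 1.
    """
    if x <= 0:
        raise ValueError("wealth must be positive")
    if math.isclose(eta, 1.0):
        return math.log(x)
    return (x ** (1.0 - eta) - 1.0) / (1.0 - eta)

# eta = 0 gives linear utility (time optimal for additive dynamics),
# eta = 1 gives logarithmic utility (time optimal for multiplicative dynamics).
print(isoelastic_utility(5.0, 0.0))  # x - 1
print(isoelastic_utility(5.0, 1.0))  # ln(x)
```

Under this form, negative values of the parameter give convex (risk seeking) utility and positive values give concave (risk averse) utility, matching the description above.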

Risk preferences are closer to predictions of time optimality.
To test whether risk aversion values are explained better by time optimality (Fig. 3c, upper), or alternatively by a dynamic invariant utility model (Fig. 3c, lower), we computed the distance of each subject's risk aversion (MAPη) to the predictions of each model. For the time optimal model this is the Euclidean distance to the time-optimal coordinate (0,1), and for the dynamic invariant model this is the distance to the closest point on the diagonal. We find extreme evidence that risk aversions are closer to the time optimal prediction (Fig. 3o, Paired-t, BF10 = 2.8 × 10^11, M = 0.623, BCI95% [0.565, 0.681], Supp. Fig. 3e-h), and that this is true for every subject tested. Together this shows that the time optimality model is a better predictor of risk aversion over different dynamics than a null model which assumes no effect of dynamics on risk aversion.
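The two distances can be written down directly. This is a minimal sketch, assuming risk aversion is represented as the pair (additive, multiplicative); the function names are our own. The example input reuses the group-level MAP values reported above purely as an illustration, not as a subject-level estimate.

```python
import math

def distance_to_time_optimal(eta_add, eta_mult):
    """Euclidean distance to the time optimal coordinate (0, 1)."""
    return math.hypot(eta_add - 0.0, eta_mult - 1.0)

def distance_to_dynamic_invariant(eta_add, eta_mult):
    """Distance to the closest point on the diagonal eta_add == eta_mult."""
    return abs(eta_add - eta_mult) / math.sqrt(2.0)

d_opt = distance_to_time_optimal(0.1506, 1.1534)
d_inv = distance_to_dynamic_invariant(0.1506, 1.1534)
print(d_opt, d_inv)  # the first is smaller than the second
```

The perpendicular distance to the diagonal is used because the dynamic invariant model predicts only that the two values are equal, not what that common value is.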

Order of gamble dynamics does not substantially affect choice.
In the dynamic-specific isoelastic model, both the risk aversion parameter and the sensitivity parameter (modelling how sensitive choices are to differences in utility, eq. 15) are free to vary for each subject when the gamble dynamics change (Fig. 3a). Plotting the joint distribution of both parameters affords visualisation of the effect of the dynamic on both risk aversion and choice sensitivity (Fig. 3h). We found that a switch from additive to multiplicative dynamics is associated with a characteristic shift in this parameter space toward greater risk aversion, and toward greater sensitivity. The order in which subjects experienced different gamble dynamics was counterbalanced. In the subgroup tested in the additive condition first (Fig. 3j), the movement in parameter space is in the opposite direction to that of the subjects tested in the multiplicative condition first (Fig. 3i), as predicted if the effect was primarily driven by the dynamic, not the order of testing. The inclusion probability for the order of testing had a Bayes Factor below one, indicating anecdotal evidence that the data disfavours its inclusion in the model (BFinclusion = 0.891, Supp. Fig. 4c). Thus, the order of exposure to different gamble dynamics did not substantially affect choice.
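The role of the sensitivity parameter can be illustrated with a logistic choice function, which the Fig. 3 caption names as the link between utility and choice. Eq. 15 is not reproduced in this section, so the exact parameterisation here is an assumption: the probability of choosing the left gamble is taken to be a logistic function of the sensitivity (called `beta` here, a hypothetical name) times the difference in expected utility.

```python
import math

def choice_probability(delta_eu, beta):
    """Probability of choosing the left gamble.

    delta_eu: expected utility of left minus expected utility of right.
    beta: sensitivity; larger values make choices more deterministic.
    """
    z = beta * delta_eu
    # Branch on the sign of z so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(choice_probability(0.0, 5.0))  # indifference gives 0.5
print(choice_probability(0.2, 1.0))  # weak preference for left
print(choice_probability(0.2, 20.0)) # same difference, higher sensitivity
```

With zero sensitivity every choice is a coin flip regardless of utility, which is why a shift toward greater sensitivity makes choices track utility differences more closely.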
Deviation from time optimal value decreases time average growth rates for wealth. The relation between a subject's risk aversion and the time average growth rate of their choices (eqs. 8-9) can be noisy due to the probabilistic relation between utility and choice. This stochasticity is visible in the relation between the time average growth rates of the choices made and the risk aversion estimated for each subject under both dynamics, though the highest growth rates coincide with values close to the time optimal risk aversion (Fig. 3l). Further, we found that the closer the subjects shifted their risk aversion toward time optimal values, the higher the time average growth rates of their wealth, given their choices, for both additive and multiplicative dynamics (Fig. 3m-n). Thus, the risk aversion parameter that best describes a subject's choices is predictive of their time average growth rate. This illustrates that deviating from time optimality has negative consequences for growing wealth, as implied by theory.
Bayesian model selection supports time optimality over other utility models. The dynamic-specific isoelastic utility model suggested that subjects dynamically adapt their choice behaviour in a way predicted by time optimality. We next compared the predictive adequacy of three models: an isoelastic model, a prospect theory model (eq. 10), and the time optimal model (eq. 12), detailed in Fig. 4a&b. The time optimal model is fixed in its theoretical predictions for the population means of η, restricted to be 0 for additive dynamics and 1 for multiplicative dynamics. However, the variance around this mean is a free parameter in order to account for the plausible assumption that not all subjects are phenotypically identical. Prospect theory has two utility parameters whose means are not fixed at the population level, but are free to vary within standard restrictions that define the theory (see Models, in Methods section).
Finally, the isoelastic model has one utility parameter that is estimated across both sessions, whose mean is free to vary at the population level. Markov chain Monte Carlo sampling of this model results in posterior frequencies for the model indicator variable that are interpreted as posterior probabilities for each model, estimated for each subject 15. Most subjects had most of their probability mass located over the time optimal model (Fig. 4c), as is evident from the marginal probability over subjects (Fig. 4d). Computing protected exceedance probabilities, which measure how likely it is that any given model is more frequent (estimated frequencies in Fig. 4e) than all other models in the comparison set, we found that the time optimal model had an exceedance probability of 0.976 (Fig. 4f), which corresponds to very strong evidence for being the most frequent (BFTime-PT = 76.9, BFTime-Iso = 80.6).
Figure 4 | ... Fig. 3. This model adds a model indicator variable (z) to model latent mixtures of the three different utility models nested within it. Note that for the prospect theory risk preference parameter, there is one parameter for gains, and another for losses. b, hyperprior and prior distributions, including structural equations, choice functions, and choice generating distributions. Hyperpriors for this parameter are duplicated to model gains and losses separately. c, posterior model probabilities for each model based on the model indicator variables representing each utility model. d, posterior model probabilities summed over subjects, with the red bar indicating prior probabilities assuming equal prior probability for the three utility models, and error bars as standard deviations. e, estimated model frequencies from the cohort. f, protected exceedance probabilities for each utility model being the most frequent.
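The step from sampled model indicators to posterior model probabilities can be sketched as follows. This is an illustration of the general technique only: the samples are hypothetical, the model labels and function names are our own, and the protected exceedance probabilities reported above involve a further group-level computation that is not implemented here.

```python
import math
from collections import Counter

MODELS = ("prospect_theory", "isoelastic", "time_optimal")

def posterior_model_probabilities(z_samples, models=MODELS):
    """Relative frequency of each model among the sampled indicator values."""
    n = len(z_samples)
    if n == 0:
        raise ValueError("no samples")
    counts = Counter(z_samples)
    return {m: counts.get(m, 0) / n for m in models}

def bayes_factor(post, model_a, model_b, prior_a=1 / 3, prior_b=1 / 3):
    """Posterior odds of model_a over model_b divided by their prior odds."""
    if post[model_b] == 0:
        return math.inf
    return (post[model_a] / post[model_b]) / (prior_a / prior_b)

# Hypothetical indicator samples for one subject (not data from the study).
samples = ["time_optimal"] * 8 + ["isoelastic"] + ["prospect_theory"]
post = posterior_model_probabilities(samples)
print(post)
print(bayes_factor(post, "time_optimal", "isoelastic"))
```

With equal prior probabilities, as assumed in Fig. 4d, the Bayes factor reduces to the ratio of posterior frequencies.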

Discussion
By manipulating the dynamical properties of simple gambles, we show that ergodicity-breaking can exert strong and systematic effects on human behavior. Switching from additive to multiplicative dynamics reliably increased risk aversion, which in most subjects tracked close to the levels that maximise the time average growth of wealth. We show that these effects cannot be adequately explained by the prevailing models of utility in economics and psychology, and are well approximated by a null model of time optimality based on ergodic theory.
The size of the cohort (achieved n=18) was constrained to concentrate power within subjects, and by the high-stakes design, in which each participant could walk away with up to 750 USD in payout. Restricting our inferences to this cohort, the effect was consistent across all participants, and was reproducible across different inferential approaches. In general, the strength of the evidence we obtained from individuals likely derives from the fact that the game is high stakes, and also from the fact that we collected a large number of decisions (over 600 per participant) over a large number of distinct gambles (320 per participant). This affords opportunity for stringent testing between utility models that make overlapping predictions. It should be noted that these participants were able to perform the task under challenging cognitive conditions. Gambles were chosen based on the participant's memory of the fractal stimuli from the previous passive session, with choices being made every ~10s for ~1 hour of testing (per day) in a noisy environment. Nearly all participants tested (18 of 19) could choose dominant gambles in No-brainer trials above chance (Supp. Fig. 4e). Finally, the fact that discrimination between utility models is possible under our modelling framework is evident from its ability to recover parameters and model identities from synthetically generated agents (Supp. Fig. 5).
The time optimal model assumes that agents prefer their wealth to grow faster, and that this preference is stable. From these two assumptions, it can be shown that to maximise the time average growth rate of wealth, agents should adapt their utility functions according to the wealth dynamics they face, such that changes in utility are rendered ergodic 5. From this, a number of simple predictions can be derived. First, to approximate time optimal behavior, different dynamics require different ergodicity mappings. Thus, when an agent faces a different dynamic, this should evoke the observation of a different utility function. This was observed, in that all subjects showed substantial changes in their estimated utility functions (Fig. 3p). Second, in shifting from additive to multiplicative dynamics, agents should become more risk averse. This was also observed in all subjects. Third, the predicted increase in risk aversion should be, in the dimensionless units of relative risk aversion, a step change of +1. The mean step change observed across the group was +1.001 (BCI95%[0.829,1.172]). Fourth, to a first approximation, most (not all) participants modulated their utility functions from ~linear utility under additive dynamics, to ~logarithmic utility under multiplicative dynamics (Fig. 3d). Each of these utility functions is provably optimal for growing wealth under the dynamical setting it is adapted to 5, and in this sense they are reflective of an approximation to time optimality. Finally, Bayesian model comparison revealed strong evidence for the time optimal model compared to both the prospect theory and isoelastic utility models. The latter two models provide no explanation or prediction for how risk preferences should change when gamble dynamics change, and even formally preclude the possibility of maximising the time average growth rate when gamble dynamics do change.
Congruent with this explanatory gap, both prospect theory and isoelastic utility models were relatively inadequate in predicting the choices of most participants (Fig. 4c).
The dependency between dynamics and risk aversion that we observe here is relevant to a widespread assumption in economic theory that preferences are stable over time 8,16. Primarily, this is motivated on epistemological grounds. If utility is to predict behaviour in future settings, then it must be stable; otherwise, if behavior changes, it is not known whether this is due to a change of setting, of preference, or both 16,19. However, there is a diversity of empirical demonstrations of preference instability. In animals, including humans, there is evidence suggesting that risk preferences depend on homeostatic 20,21,22,23, circadian 24, and affective states 25. Test-retest stability in the same settings, though typically reported as modest 26, can be relatively high when estimated using hierarchical models of the sort used here 27. The findings reported here place the stability of utility in a broader context by connecting to an optimality framework for how utility functions should change in response to environmental dynamics. This casts the dynamical dependence of utility functions observed here, not as preference instability per se, but simply as a manifestation of a stable preference for growing wealth over time when facing different circumstances.
Economic and psychological models of decision-making are predominantly developed without recourse to dynamical considerations, and are typically tested experimentally in settings that evoke additive dynamics. The theories developed under these conditions are then assumed to generalise to settings in which additive dynamics may no longer apply and multiplicative dynamics likely dominate, which may contribute to the predictive inadequacy of many models. Together, this motivates a need to explicitly condition theories of decision making, and their applications, on ergodic considerations.

Methods
Subjects, Power, Ethics. This paper focuses on the behavioral data obtained from a neuroimaging study on the neural encoding of utility. The criteria for inclusion were being aged 18-50, and fluent in English. The criteria for exclusion were a history of: psychiatric or neurological disorder, credit problems (operationalized via bad pay status on www.dininfo.dk), or expertise in a quantitative or cognitive domain (finance, banking, accountancy, economics, mathematical sciences, computer science, engineering, physics, psychology, neuroscience). MRI-specific exclusion criteria were also applied, including: implanted metallic or electronic objects, heart or brain surgery, severe claustrophobia, or inability to fit into the scanner (weight limit of ~150kg, bore diameter of 60 cm). Except for the latter, all such information was self-reported. The intended sample size was 20, however due to post-hoc exclusion (1 participant fell asleep, 1 failed to learn the stimuli) the achieved sample size was 18 (6 female, age: M = 25.79, SD = 4.69, range 20-38). Subjects were recruited as a convenience sample, via the subject recruitment website www.forsøgsperson.dk. The sample number was based on general guidelines for the minimal number of subjects required for medium effect sizes in neuroimaging datasets 28 . The number, timing, and jittering (randomised timing) of events within each session was based on prior efficiency simulations for similar neuroimaging paradigms. As such, no a priori design analyses were performed for the behavioral data only. No stopping rule or interim analyses were performed. Data collection ran from the 10/06/2017 to 30/07/2017. All data was acquired at the Danish Research Centre for Magnetic Resonance. Informed consent was obtained from all subjects as approved by the Regional Ethics Committee of Region Hovedstaden (protocol H-17006970) and in accordance with the declaration of Helsinki. 
Independent of their payouts in the gambling paradigm, all subjects were compensated 1020 DKK / ~$160 for a grand total of 6 hours of participation over the two days. A forthcoming paper will focus primarily on the neuroimaging data.
Experimental procedure. After changing into hospital gowns, subjects were read the instruction sheet (Supp. Materials). To précis, subjects were truthfully informed that the aim of the experiment was to study how the brain reacts to changes in wealth, that all of the money involved is real, and that the total accumulated wealth will be paid out as the sum of that accumulated over the two days (Fig. 1a). They then played ~20 demo trials of the paradigm in the scanner control room, including both active and passive sessions (~5mins), with no financial consequence. The experimenter demonstrated what happened if buttons were not pressed in time (Fig. 1b&c). Subjects were instructed that each day lasts 3 hours in total, with ~60mins for the passive session (inc. time for localiser scan and shim), a short break, then ~60-75mins for the active session (inc. localiser, shim, anatomical scans), with short breaks within the session (Fig. 1a). Each subject entered the scanner, was set up with a respiratory belt to monitor breathing, and with a pulse meter on the middle or index finger of the non-responding hand. All stimuli were projected under dark conditions onto a screen located within the bore of the MRI magnet (Siemens, MAGNETOM Prisma), and viewed via mirrors mounted to the head coil. Subjects were instructed to fixate on the central fixation cross at all times (Fig. 1b&c) and to choose via button box. The paradigm was presented via the Psychopy2 toolbox (v1.84.2) running on Python (2.7.11).
Experimental design. The experiment is a fully crossed randomized controlled trial in which the wealth dynamic is the primary independent variable, and choice is the primary observable. The wealth dynamic, as well as the deterministic association between fractals and outcomes, was controlled via computer programme and thus double blinded. Further, since payouts at the end of the test day were subject to being randomly realized from each subject's choices as well as being statistically balanced between conditions, payout was also effectively double-blinded. Subjects were neither informed of any explicit details concerning dynamics or differences between test days, nor given any reason to expect that the test days were different. The instructions, procedures and setup were otherwise identical for both test days. The order in which multiplicative and additive test days were conducted was counterbalanced across the group. The primary measures were the choices acquired during the active session. Measures collected but not included in this report include all functional and structural neuroimaging modalities, physiological noise measurements (pulse rate and breathing), and reaction times. To ensure good quality model estimation, we recorded a large number of decisions (312 total per active session) spanning a large subspace (144) of the possible unique gamble combinations. To avoid the problems associated with gambling for "peanuts" 29
Pre-registration and deviations. The experimental protocol was preregistered at www.osf.io/9yhau. There was one deviation from the protocol: the preregistration stated that in the Passive+ session, the final additional fractal applied to their wealth after having returned to 1000 DKK (see section "Passive session fractal sequences" below) would exclude the most extreme fractals. Those were, however, included in the paradigm.

Passive session instructions.
Subjects were instructed in English as follows: "For the passive phase, you will see a number in the middle of the screen, this is your current wealth for the day in kr. When you see a white box around the number, you are to press the button within 1s. (If you do not, you will be instructed to "press button earlier"). Shortly after pressing the button you will see an image in the background, and this will cause your wealth to change. You are instructed to attend to any relationship between the images and the effect this has on your wealth, since in the active phase that follows you will be given the opportunity to choose images to influence your wealth. Learning these relationships can make a large difference to your earnings in the active phase."
Passive session dynamics. Formally, the passive session can be described as follows. At the start of each test day, subjects were endowed with an initial wealth $x(t_0)$ of 1000 DKK, which defined their wealth at the first timepoint, which we denote as $t_0$. Independently for each subject, 9 fractal stimuli were randomly assigned (from a fixed set of 18) for Day+, with the remaining 9 assigned to Day×. Each fractal, viewed at time $t$, was programmed to have a deterministic effect on the subject's wealth $x(t)$, with the sequence of fractals causing stochastic fluctuations in wealth (Fig. 1d). The sequence of fractals deterministically caused dynamics in their wealth which can be expressed as:

$$x(t + \delta t) = x(t) \circledast D(t)$$

where ⊛ is a wildcard operator, which on Day+ is the addition operator +, and on Day× is the multiplication operator ×. $D(t)$ is a random outcome variable drawn from the multiplicative outcome set on Day×, and from the additive outcome set on Day+ (see Supp. Fig. 1a). This means that the type of wealth dynamic that the fractals caused was controlled by the test day. On Day×, under multiplicative dynamics, the outcome $D(t)$ is the realisation of a random multiplier (growth factor) that can range from ~doubling at one extreme, to ~halving at the other (equally spaced on a logarithmic scale).
On Day +, under additive dynamics, the outcome $D(t)$ is the realisation of a random increment, ranging from +428 to −428 DKK (equally spaced on a linear scale). Though the dynamics are qualitatively different, we set the bounds of the random increments for Passive + to the central 85th percentile interval of the absolute wealth changes on Day ×.
Passive session fractal sequences. The fractal sequence was randomized such that wealth levels were constrained to lie in the interval (0 DKK, 5000 DKK) at all times. This was achieved by generating a sequence of 333 fractals in which each of the 9 fractals was seen 37 times. The sequence order was randomised without replacement. Any sequence that resulted in a partial sum larger than 5000 DKK or lower than 0 DKK was rejected and another random sequence generated. This was necessary to render the experiment subjectively plausible, and to avoid debts, which for ethical reasons could not be realised. Since each fractal was presented with equal frequency, the finite time average growth rate at the end of these 333 trials was zero and subjects had returned to their initial endowed wealth of 1000 DKK. One additional fractal was then shown and applied to their wealth, meaning that all subjects had a randomly determined wealth level, as they had been informed (Fig. 1d).
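The rejection procedure described above can be sketched in Python for the additive day. This is a minimal illustration rather than the experiment code: the function name, the seed, and the exact increment values (nine values equally spaced between −428 and +428 DKK) are assumptions made for the example.

```python
import random

# Assumed increments: 9 values equally spaced on a linear scale from -428 to +428 DKK.
INCREMENTS = [-428 + 107 * i for i in range(9)]


def generate_additive_sequence(increments, repeats=37, start=1000.0,
                               lower=0.0, upper=5000.0, seed=0,
                               max_tries=100_000):
    """Shuffle `repeats` copies of each increment and reject any ordering
    whose running wealth falls below `lower` or rises above `upper`."""
    rng = random.Random(seed)
    pool = [d for d in increments for _ in range(repeats)]
    for _ in range(max_tries):
        rng.shuffle(pool)  # random order without replacement
        wealth = start
        trajectory = []
        for d in pool:
            wealth += d
            if wealth < lower or wealth > upper:
                break  # reject this ordering and draw another
            trajectory.append(wealth)
        else:
            # Loop finished without a break: every partial sum was in bounds.
            return list(pool), trajectory
    raise RuntimeError("no admissible sequence found")


sequence, trajectory = generate_additive_sequence(INCREMENTS)
```

Because every increment appears equally often and the increments sum to zero, any accepted trajectory ends at the initial wealth of 1000 DKK, as stated in the text.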
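Passive session wealth trajectories and growth. The wealth at the end of the Passive + session can be calculated as:

$$x(t_0 + T\delta t) = x(t_0) + \sum_{\tau=1}^{T} D(\tau)$$

and for the Passive × session as:

$$x(t_0 + T\delta t) = x(t_0) \prod_{\tau=1}^{T} D(\tau)$$

where, in both equations, $D(\tau)$ is the random outcome variable in round $\tau$, and $T$ is the total number of trials in the passive session. The finite time average growth of wealth on Day + can be calculated as:

$$\bar{g}_{+} = \frac{\Delta x}{\Delta t} = \frac{1}{T\delta t} \sum_{\tau=1}^{T} D(\tau)$$

where $\Delta x = x(t_0 + T\delta t) - x(t_0)$, and $\Delta t = T\delta t$. On Day × this is calculated as:

$$\bar{g}_{\times} = \frac{\Delta \ln x}{\Delta t} = \frac{1}{T\delta t} \sum_{\tau=1}^{T} \ln D(\tau)$$

This design ensured substantial opportunity for subjects to learn the causal effects of each fractal, whilst also not accumulating extremely high or low wealth levels.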

Active session instructions.
After the passive session, the subjects had a short break of ~5 mins outside of the scanner before returning to engage in an active choice task in which they repeatedly decided between two different gambles composed of the fractals they had just learnt about (Fig. 1a). Subjects were instructed as follows: "With the money accumulated in the passive phase, you will play gambles composed of the same images. In each trial, you will be presented with two of the images that you have learned about in the passive phase. By pressing the buttons in the scanner to move a cursor, you now have the option to choose to either: a) Accept gamble one, in which case you will be assigned one of the two images, each with 50% probability (not shown), or… b) Accept gamble two, in which case you will be assigned one of the two images, each with 50% probability (again not shown). The outcomes of your gambles will be hidden from you, and only 10 of them will be randomly chosen and applied to your current wealth. You will be informed of your new wealth at the end of the active phase. You can keep any money accumulated after the active phase. If you do not choose in time, then we will give you one of the worst images, it is recommended that you always choose in time."

Active session gambles. As shown in Fig. 1c, within a trial, subjects first saw the first gamble of a pair of gambles. This gamble was composed of two fractals on the left-hand side of the screen, each of which they knew had a 50% chance of being applied to their wealth should this gamble be chosen. We refer to this as the left gamble, $G^{(\mathrm{left})}$. 1.5-3 seconds later (uniformly distributed), they saw another two fractals on the right, comprising the right gamble $G^{(\mathrm{right})}$. In a two-alternative forced choice, on each trial, subjects chose via button press between $G^{(\mathrm{left})}$ and $G^{(\mathrm{right})}$. Formally the gambles are:

$$G^{(\mathrm{left})} = \left\{ D_1^{(\mathrm{left})}, \tfrac{1}{2};\; D_2^{(\mathrm{left})}, \tfrac{1}{2} \right\}, \qquad G^{(\mathrm{right})} = \left\{ D_1^{(\mathrm{right})}, \tfrac{1}{2};\; D_2^{(\mathrm{right})}, \tfrac{1}{2} \right\}$$

where $D_1$ and $D_2$ denote the outcomes associated with the two fractals of each gamble. Choosing between two gambles eliminates any confounds caused by potential preferences for or against gambling 30 .
Note that all probabilities are equal and correspond to a fair coin, such that they are easily communicated and control for any probability distortion effects. The outcome of each gamble was hidden from subjects to avoid subjects being "conditioned" to prefer particular fractals as a function of the stochastic pattern of previous outcomes. This also prevents mental accounting, where subjects keep track of what they have earnt, which introduces idiosyncratic path dependencies.
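Active session growth rates. For any gamble we can calculate its time average growth rate. Writing $D_1^{(\mathrm{left})}$ and $D_2^{(\mathrm{left})}$ for the outcomes of the two fractals composing the left-hand gamble, the time average additive growth rate for the left-hand gamble is:

$$\bar{g}_{+}^{(\mathrm{left})} = \frac{D_1^{(\mathrm{left})} + D_2^{(\mathrm{left})}}{2\,\delta t}$$

and equivalently for the right-hand gamble. The time average multiplicative growth rate for the left-hand gamble is:

$$\bar{g}_{\times}^{(\mathrm{left})} = \frac{\ln D_1^{(\mathrm{left})} + \ln D_2^{(\mathrm{left})}}{2\,\delta t}$$

and equivalently for the right-hand gamble. Note that there were no numerical or symbolic cues at the time of choice, so subjects' decisions could only be based on their memory of each fractal (Fig. 1c). If subjects did not respond within the decision window, then they were assigned the worst fractal for that trial.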
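As a worked illustration of these two growth rates, the following Python sketch computes them for a fair-coin gamble. The function names and the example growth factors (1.5 and 0.6) are hypothetical and chosen only to show that a multiplicative gamble can have a positive expected change in wealth yet a negative time average growth rate.

```python
import math


def additive_growth_rate(d1, d2, dt=1.0):
    """Time average growth rate of a 50/50 gamble over increments d1, d2."""
    return (d1 + d2) / (2.0 * dt)


def multiplicative_growth_rate(f1, f2, dt=1.0):
    """Time average growth rate of a 50/50 gamble over growth factors f1, f2."""
    return (math.log(f1) + math.log(f2)) / (2.0 * dt)


# Hypothetical multiplicative gamble: wealth is multiplied by 1.5 or by 0.6.
factors = (1.5, 0.6)
expected_factor = 0.5 * (factors[0] + factors[1])   # greater than 1
growth = multiplicative_growth_rate(*factors)       # ln(0.9) / 2, below 0
```

Here the expectation value suggests wealth grows by about 5% per round, whereas the time average growth rate is negative because the product of the two factors is 0.9.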
Active session gamble space. For any one gamble, there are 81 possible combinations of fractals ($9^2$, see Supp. Fig. 1b), and 6561 possible pairs of gambles ($81^2$). This gamble-choice space is too large to sample exhaustively, and contains many gambles that do not discriminate between our hypotheses. We therefore imposed the following constraints: all gambles should be mixed (composed of a gain and a loss), and no two fractals presented in one trial should be the same. This reduces the gamble-choice space to 144 unique non-dominated choices between gambles: 16 mixed gambles (red text cells in Supp. Fig. 1b), each paired with the 9 other mixed gambles with which it shares no fractal, gives 16 × 9 = 144 possible gamble pairs. Each of these choices was presented twice, resulting in 288 trials in total. Subjects were also presented with 24 No-brainer choices, in which both gambles shared an identical fractal but differed in the second. These are otherwise known as statewise dominated choices. In these No-brainer choices, the subject should choose whichever gamble includes the better unique fractal. This offers a direct means of testing whether subjects could accurately rank the fractals. One participant (#5) failed to choose statewise dominated gambles with a probability > 0.5, and was excluded from further analysis (Supp. Fig. 4e). All choices were presented in a random order without replacement.
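The counts given above can be checked by enumeration. The sketch below assumes that the 9 fractals comprise 4 gains, 1 neutral outcome and 4 losses, which is inferred from the symmetric, equally spaced outcome sets rather than stated explicitly; the integer labels are hypothetical rank indices, not wealth changes.

```python
from itertools import product

# Hypothetical rank labels: negative = loss, 0 = neutral, positive = gain.
fractals = list(range(-4, 5))

all_gambles = list(product(fractals, repeat=2))   # 9^2 fractal combinations
n_all_pairs = len(all_gambles) ** 2               # 81^2 pairs of gambles

gains = [f for f in fractals if f > 0]
losses = [f for f in fractals if f < 0]
mixed_gambles = [(g, l) for g in gains for l in losses]   # one gain, one loss

# Pairs of mixed gambles in which no fractal appears twice within a trial.
choice_pairs = [(a, b) for a in mixed_gambles for b in mixed_gambles
                if set(a).isdisjoint(b)]
```

Under these assumptions each mixed gamble can be paired with 3 × 3 = 9 others, reproducing the 16 × 9 = 144 pairs described in the text.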
Subject payout. Subjects were informed of the following on the first test day prior to the passive session: "At the end of the two days. Your accumulated wealth will be added over the two days, and transferred to your account, within approximately two weeks, and is taxable under standard regulations (B-income). Total earnings = (Wealth after day 1) + (Wealth after day 2). This will be paid over and above your remuneration for participating in the experiment." Payout on each test day was limited to the range of 0 to 2000DKK for each day, and thus the range of possible grand total payouts was 0 to 4000DKK (excluding compensation for time).

Models
Model summary. The aim of the modelling was to perform both parameter estimation and model selection. All models deployed hierarchical Bayesian methods, estimated via Markov chain Monte Carlo sampling. For parameter estimation we estimated a hierarchical model of isoelastic utility, whereas for model selection we estimated a hierarchical latent mixture model, to model latent mixtures of three different utility models.
Model space. Following ref. 11, models can be described by specifying three functions: a utility function, a stochastic choice function, and a probability-weighting function. Since all outcome probabilities are identical in our experiment, we do not deploy any probability-weighting function. The principal objective of the modelling is to compare different utility functions in accounting for the choice data over both dynamical conditions. We compared three utility models:
Prospect theory, where changes in utility are equal to a power function of changes in wealth:

$$\Delta u = \begin{cases} (\Delta x)^{\alpha_{\mathrm{gain}}} & \text{if } \Delta x \geq 0 \\ -\lambda\,|\Delta x|^{\alpha_{\mathrm{loss}}} & \text{if } \Delta x < 0 \end{cases}$$

where $\alpha_{\mathrm{loss}}$ and $\alpha_{\mathrm{gain}}$ are risk preference parameters lying on the interval (0,1), and $\lambda$ is a loss aversion parameter which lies on the interval (1, ∞). Note that, although this is referred to as value within prospect theory itself, we here refer to this as utility for clarity of comparison between models.
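Isoelastic utility, where changes in utility are given by:

$$\Delta u = \frac{(x + \Delta x)^{1-\eta} - x^{1-\eta}}{1 - \eta}$$

which in the limit $\eta \to 1$ becomes $\Delta u = \ln(x + \Delta x) - \ln x$, and where $\eta$ is a risk aversion parameter which lies on the real number line, with risk aversion increasing for numbers above 0, and risk seeking increasing for increasingly negative numbers.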
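Time optimal utility, where changes in utility are determined by linear utility under additive dynamics, and by logarithmic utility under multiplicative dynamics:

$$\Delta u = \begin{cases} \Delta x & \text{under additive dynamics} \\ \Delta \ln x & \text{under multiplicative dynamics} \end{cases} \qquad \text{(Eq. 12)}$$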
Note that this model follows from one criterion, that agents maximise the time average growth rate of their wealth according to the dynamic they face. These utility functions allow the time average growth rates under these two dynamics to be computed, and maximised by choice.
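The relation between the isoelastic and time optimal models can be sketched as follows. This is an illustrative implementation under the assumption that the isoelastic change in utility takes the standard form, with the logarithm as the limiting case at a risk aversion of 1; the function names and the `dynamic` labels are hypothetical.

```python
import math


def isoelastic_delta_u(x, dx, eta):
    """Change in isoelastic utility when wealth moves from x to x + dx."""
    if abs(eta - 1.0) < 1e-12:
        # Limiting case eta -> 1: logarithmic utility.
        return math.log(x + dx) - math.log(x)
    return ((x + dx) ** (1.0 - eta) - x ** (1.0 - eta)) / (1.0 - eta)


def time_optimal_delta_u(x, dx, dynamic):
    """Linear utility under additive dynamics, log utility under multiplicative."""
    if dynamic == "additive":
        return isoelastic_delta_u(x, dx, eta=0.0)
    if dynamic == "multiplicative":
        return isoelastic_delta_u(x, dx, eta=1.0)
    raise ValueError("dynamic must be 'additive' or 'multiplicative'")


linear_change = time_optimal_delta_u(1000.0, 100.0, "additive")        # 100.0
log_change = time_optimal_delta_u(1000.0, 100.0, "multiplicative")     # ln(1.1)
```

Seen this way, the time optimal model corresponds to two fixed points of the isoelastic family, a risk aversion of 0 under additive dynamics and of 1 under multiplicative dynamics, with no free utility parameters.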
Expected utility. For each gamble the expected utility is calculated for each utility model as the expectation value:

$$\mathrm{E}\!\left[\Delta u^{(\mathrm{left})}\right] = \tfrac{1}{2}\,\Delta u\!\left(D_1^{(\mathrm{left})}\right) + \tfrac{1}{2}\,\Delta u\!\left(D_2^{(\mathrm{left})}\right)$$

and equivalently for the right-hand gamble, where $D_1$ and $D_2$ are the outcomes of the two fractals composing the gamble. Differences between the left and right gambles are denoted by $\Delta$, such that the difference in expected utility between the left and right-hand gamble is:

$$\Delta \mathrm{E}[\Delta u] = \mathrm{E}\!\left[\Delta u^{(\mathrm{left})}\right] - \mathrm{E}\!\left[\Delta u^{(\mathrm{right})}\right] \qquad \text{(Eq. 14)}$$

Stochastic choice function. The stochastic choice function is identical for all models under consideration, and comprises a logistic function:

$$P^{(\mathrm{left})} = \frac{1}{1 + e^{-\beta\,\Delta \mathrm{E}[\Delta u]}}$$

where $\beta$ is a sensitivity parameter that determines the sensitivity of the choice probability to differences in the expected change in utility between the two gambles, and where $P^{(\mathrm{left})}$ evaluates to the probability of choosing the left-hand gamble. For clarity of presentation we suppress subscripts and superscripts that denote model and subject-specific parameters (Fig. 4a). Note that $\beta$ is free to vary over both subjects and condition for all three models, and thus there are two sensitivity parameters per subject, for each of the three utility models. Allowing the sensitivity parameter to change with the dynamic allows any potential scaling differences in the change of wealth to be accommodated in the stochasticity of the choices.
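A minimal Python sketch of the expected utility comparison and the logistic choice rule is given below. The function names, the example outcomes and the value of the sensitivity parameter are hypothetical, and a linear utility is used purely for illustration.

```python
import math


def expected_delta_u(outcomes, delta_u):
    """Expected change in utility of a gamble with two equiprobable outcomes."""
    first, second = outcomes
    return 0.5 * delta_u(first) + 0.5 * delta_u(second)


def p_choose_left(left, right, delta_u, beta):
    """Logistic probability of choosing the left-hand gamble."""
    diff = expected_delta_u(left, delta_u) - expected_delta_u(right, delta_u)
    return 1.0 / (1.0 + math.exp(-beta * diff))


def linear(dx):
    """Linear utility: the change in utility equals the change in wealth."""
    return dx


# Hypothetical additive gambles: expected changes of +25 and +10 respectively.
p_left = p_choose_left((100.0, -50.0), (30.0, -10.0), linear, beta=0.1)
p_indifferent = p_choose_left((100.0, -50.0), (100.0, -50.0), linear, beta=0.1)
```

When the two gambles have equal expected utility, or when the sensitivity is zero, the choice probability is 0.5; larger sensitivities make choices more deterministic with respect to the utility difference.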
Sampling procedures. The Bayesian modelling affords computation of full probability distributions of parameters, rather than only point estimates which ignore the uncertainty with which parameters are estimated. Via its hierarchical structure, individuals are modelled as coming from group-level distributions, such that information from the group informs the estimation of the individual, and constrains extreme values that might be estimated with uncertainty 31 .
Model selection. The three utility models were estimated via a single hierarchical latent mixture (HLM) model. Whilst these utility models are submodels of the HLM, for consistency we call them utility models. The HLM model is depicted graphically in Fig. 4a, with distributional and structural equations listed in Fig. 4b. The sensitivity parameter $\beta$ is common to all three utility models and is free to vary by subject and by condition, to accommodate any differences in the scaling of wealth changes. Following Nilsson and colleagues 31 we set weakly informative hyperpriors, such that the group mean of $\beta$ was certain to lie in an interval that ranges from 0.  (Fig. 3c).
Prospect theory utility model: This model has three further free parameters. For risk preferences it has one parameter each for gains and losses. Both are constrained to lie between 0 and 1, and each is here assumed to come from a lognormal distribution, with uninformative uniform prior distributions on the lognormal group means and standard deviations, $\mu_{\alpha}$ ~ Uniform(−2.3, 0) and $\sigma_{\alpha}$ ~ Uniform(0, 1.6). The third parameter is the loss aversion parameter $\lambda$, which we assumed to lie on an interval from 1 to 5, and thus we set equivalent non-informative uniform priors on the lognormal group means and standard deviations, $\mu_{\lambda}$ ~ Uniform(0, 1.6) and $\sigma_{\lambda}$ ~ Uniform(0, 1.6).
Isoelastic utility model: We assumed uninformative uniform priors for the population mean of the risk aversion parameter, $\mu_{\eta}$ ~ Uniform(−2.5, 2.5), and $\sigma_{\eta}$ ~ Uniform(0, 1.6) for the lognormal standard deviations.
Latent mixtures of utility models: Finally, the modelling of latent mixtures of models via indicator variables allows model comparison between qualitatively different, as well as nested, utility models within one superordinate model 15 . The model indicator variable was set with non-informative uniform priors and was free to vary by subject. This represents our agnosticism toward which utility model is best under variable dynamics. The posterior model probabilities (Fig. 4c), estimated model frequencies (Fig. 4e) and the protected exceedance probabilities (Fig. 4f) were estimated via the Variational Bayesian Analysis toolbox 32 (mbb-team.github.io/VBA-toolbox/).
Parameter estimation. Via the hierarchical model depicted in Fig. 3a, we estimated the posterior distribution of risk aversion parameters for a single dynamic-specific isoelastic utility model, given the choice data. This model is an isoelastic model in which the risk aversion parameter is free to vary over dynamics, as well as over subjects. It is specified to be the same as the isoelastic utility model used in the model selection, except that here the risk aversion parameter is estimated condition-wise, and there are no other utility models or latent model indicator variables.
Data Availability. The datasets, analyses, stimuli, code, and codebook are available in the 'ergodicity-breaking-choice-experiment' repository: github.com/ollie-hulme/ergodicity-breaking-choice-experiment. All data figures have associated raw data. There are no restrictions on data availability.
Choice proportion analysis Day +. In the following, H0 denotes the null hypothesis, H− denotes the alternative hypothesis specifying values less than a reference value, and H+ denotes the equivalent for values above a reference value. Bayes factors obey the same notation: BF−0 denotes a Bayes factor for H− over H0, BF0− for H0 over H−, and so on. To assess choice proportions on Day + we performed a one-sample Bayesian t-test in which we assign effect sizes a zero-centred Cauchy prior with scale 0.707 ($1/\sqrt{2}$). The fat-tailed Cauchy distribution is used because it fulfils particular criteria 33,34 . Of interest is the posterior distribution for the underlying choice proportion CPlog. The resulting posterior distribution is concentrated near 0.5, with a central 95% credible interval of 0.395 to 0.591. The alternative hypothesis (H−) is relatively informative insofar as it states that CPlog is lower than 0.5, but that values of CPlog close to 0.5 are more likely than values far below it (H−: 0 < CPlog < 0.5), as seen in Fig. 2b, which shows the one-sided prior and posterior distribution for the effect size of CPlog under the informative H−. Correspondingly, the null hypothesis (H0) states that agents will choose with respect to the linear utility less often than in favour of log utility, and thus predicts that the choice proportion in favour of log utility will be larger than 0.5 (H0: CPlog > 0.5). A one-sample Bayesian t-test revealed a BF0− of 3.678, which indicates that the data are nearly 4 times more likely under the null hypothesis than under the alternative, which can be classed as moderate evidence. As shown in Fig. 2b, compared to the prior distribution, the posterior distribution is more concentrated near an effect size of 0. For robustness checks, the effect of different prior widths (wide and ultrawide priors, scale factors 1 and √2, respectively) can be seen in Fig. 2c&d, which show that they do not effectively change this interpretation. In conclusion, this indicates moderate evidence that under additive dynamics, choices in favour of linear utility were not more likely than those in favour of log utility.
Choice proportion analysis Day ×. As above, to assess choice proportions on Day × we performed a one-sample Bayesian t-test in which we assign effect sizes a zero-centred Cauchy prior with scale 0.707. Of interest is the posterior distribution for the underlying choice proportion CPlog. The resulting posterior distribution is concentrated near 0.7, with a central 95% credible interval for CPlog that ranges from 0.625 to 0.812. The alternative hypothesis is relatively informative and states that CPlog is higher than 0.5, but that values of CPlog close to 0.5 are more likely than values far above it (H+: 1 > CPlog > 0.5), as seen in Fig. 2e, which shows the one-sided prior and posterior distribution for the effect size of CPlog under the informative H+. The null hypothesis states that agents will not choose with respect to the log utility more often, and thus predicts that the choice proportion in favour of the log utility will be smaller than 0.5 (H0: CPlog < 0.5). A one-sample Bayesian t-test revealed that the data are ~460 times more likely under H+ than under H0, which is classed as extreme evidence. As shown in Fig. 2e, compared to the prior distribution, the posterior distribution is concentrated near an effect size of 1. Robustness checks and sequential analysis can be seen in Fig. 2f-g, and do not effectively change this interpretation. In conclusion, this indicates extreme evidence that under multiplicative dynamics, choices in favour of log utility are more likely than those in favour of linear utility.

Effect of dynamic on choice proportion.
To assess within-subject changes in choice proportion following the different dynamics, we performed a Bayesian paired t-test in which we assign effect sizes a zero-centred Cauchy prior with scale 0.707. Of interest is the posterior distribution for the between-dynamic difference in choice proportion ΔCPlog. The resulting posterior distribution is concentrated near a proportion difference of 0.23, with a central 95% credible interval for ΔCPlog that ranges from 0.099 to 0.351. The null hypothesis states that agents will not change their choice proportion under different dynamical conditions, and thus predicts that the choice proportion will be equal for each condition (H0: ΔCPlog = 0). The alternative hypothesis is relatively informative and states that ΔCPlog is larger than 0, but that values of ΔCPlog close to 0 are more likely than values far above it (H+: 1 > ΔCPlog > 0), as seen in Fig. 2i, which shows the one-sided prior and posterior distribution for the effect size of ΔCPlog under H+. The paired Bayesian t-test revealed a Bayes factor of 52.376, which indicates that the data are around 50 times more likely under the alternative hypothesis than under the null, which can be classed as very strong evidence. As shown in Fig. 2i, compared to the prior distribution, the posterior distribution is concentrated near an effect size of 0.8. Robustness checks and sequential analyses can be seen in Fig. 2j&k, and do not effectively change this interpretation. In conclusion, we find that gamble dynamics have a very strong effect on choice frequencies, with the gamble dynamics moving choice frequencies in the direction predicted by time optimality.
Repeated measures ANOVA for choice proportions. We conducted a Bayesian repeated measures ANOVA on the choice frequency data, with gender, age, and order of testing (*First) as between-subject factors, and dynamic as a within-subject factor. We used the default prior options for the effects (r = 0.5 for the fixed effects, prior scale factor 0.707).
To assess the robustness of the result, we also repeat the analysis over wide and ultrawide priors. The 'Model Comparison' table (Supp. Fig. 2a) gives the results with respect to the different models that are compared. The models that are considered are all possible models including interactions of factors. The table lists all of the models, and the corresponding Bayes factors, where the best performing model (here, the model that includes only the dynamic factor) is compared to all the other models. The column of BF01 shows that the data are ~6.6 times more likely under the model with only the dynamic, than under the full model (i.e., the model with age, gender and order of testing, and their interactions). The column P(M) lists the prior model probabilities, which are held uniform across all the models. The column P(M|data) lists the posterior model probabilities. The column BFM lists the comparisons between the best model (dynamic factor only) and each other model. The 'Analysis of Effects' (Supp. Fig. 2b) gives Bayes factors for the inclusion of each effect that appears in at least one model. For each effect, the BFinclusion column reflects how well the effect predicts the data by comparing the performance of all models that include the effect to the performance of all the models that do not include the effect. For the gamble dynamic, there is very strong evidence in favor of its inclusion (BFinclusion > 80), whereas for all other factors there is either evidence against their inclusion, or only anecdotal evidence for their inclusion. In conclusion, compared to other factors and covariates, gamble dynamics have a uniquely strong effect on choice frequencies.
Gamble dynamics exert strong effects on risk aversion parameters. To assess within-subject changes in the risk aversion parameter η following the different dynamics, we performed a Bayesian paired t-test in which we assign effect sizes a zero-centred Cauchy prior with scale 0.707.
Of interest is the posterior distribution for the between-dynamic difference in η, denoted Δη. When comparing the η of the multiplicative to the additive dynamic, the resulting posterior distribution is concentrated near a decrease of 1.01, with a central 95% credible interval for Δη that ranges from 0.829 to 1.172 (Supp. Fig. 3a). The null hypothesis states that agents will not change their risk aversion under different dynamical conditions, and thus predicts that η will be equal for each condition (H0: Δη = 0). The alternative hypothesis is relatively informative and states that Δη is less than 0, but that values close to 0 are more likely than values far below it (H−: Δη < 0), as seen in Supp. Fig. 3b, which shows the one-sided prior and posterior distribution for the effect size of Δη under H−. The paired Bayesian t-test revealed a Bayes factor of 2.9 × 10^7, which indicates extreme evidence in favour of the alternative hypothesis. As shown in Supp. Fig. 3b, compared to the prior distribution, the posterior distribution is concentrated near an effect size of −3. Robustness checks over different prior widths can be seen in Supp. Fig. 3c&d, and do not effectively change this interpretation. Descriptive statistics for the parameter are in Supp. Fig. 4a. In conclusion, there are strong effects of gamble dynamics on risk aversion, with the multiplicative dynamics increasing estimated risk aversions, compared to additive dynamics.
Estimates of risk aversion are closer to predictions of time optimal model. To establish whether the η values are closer in "η-space" to the predictions of the time optimal or dynamic-invariant models, we computed the Euclidean distances of each subject's MAP estimate to each of the models' predicted coordinate(s). In the time optimal case this is simply the distance to the [0,1] coordinate, whereas for the dynamic-invariant utility model this is the shortest distance to the main diagonal (Fig. 3c). We are interested in testing whether these distances to the model predictions are smaller under the time optimal model, and thus we compute the difference in distance as DistΔ = Distinvariant − Disttime. DistΔ had a mean of 0.65 (Supp. Fig. 3e), indicating that the time optimal model was closer in its predictions. To test this, we performed a Bayesian paired t-test in which we assign effect sizes a zero-centred half-Cauchy prior with scale 0.707. Of interest is the posterior distribution for the effect sizes of DistΔ. The resulting posterior distribution is concentrated near a median effect size of 5.321, with a central 95% credible interval that ranges from 4.583 to 6.223 (Supp. Fig. 3f). The alternative hypothesis is relatively informative and states that the difference in distances will be positive, but that values close to 0 are more likely than values far above it (H+: DistΔ > 0), as seen in Supp. Fig. 3f, which shows the one-sided prior and posterior distribution for the effect size of DistΔ under H+. The null hypothesis states that the distance of the data to the predictions is smaller for the dynamic-invariant model than for the time optimal model, and thus predicts that the difference in distances will be negative (H0: DistΔ < 0). The paired Bayesian t-test revealed a Bayes factor of 2.8 × 10^11, which indicates extreme evidence in favour of the alternative hypothesis. As shown in Supp. Fig. 3f, compared to the prior distribution, the posterior distribution is concentrated near an effect size of 5.3. Robustness checks and sequential analyses over different prior widths can be seen in Supp. Fig. 3g&h, and do not effectively change this interpretation. In conclusion, there is extreme evidence that the estimated risk aversions are closer to the prediction of the time optimal model than to a model which assumes no dynamic-specific changes in risk aversion.
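The two distances can be computed as in the following Python sketch. It assumes that each subject's estimate is a coordinate (risk aversion under additive dynamics, risk aversion under multiplicative dynamics), so that the time optimal prediction is the point (0, 1) and the dynamic-invariant prediction is the main diagonal; the function names and the example estimate are hypothetical.

```python
import math


def dist_time_optimal(eta_add, eta_mult):
    """Euclidean distance from an estimate to the time optimal point (0, 1)."""
    return math.hypot(eta_add - 0.0, eta_mult - 1.0)


def dist_dynamic_invariant(eta_add, eta_mult):
    """Shortest Euclidean distance from an estimate to the main diagonal."""
    return abs(eta_add - eta_mult) / math.sqrt(2.0)


def dist_delta(eta_add, eta_mult):
    """Positive values mean the estimate lies closer to the time optimal point."""
    return dist_dynamic_invariant(eta_add, eta_mult) - dist_time_optimal(eta_add, eta_mult)


# Hypothetical estimate near the time optimal prediction.
example = dist_delta(0.1, 1.1)
```

For a point exactly on the time optimal prediction the difference equals 1/√2, and for a point on the diagonal it is never positive, so the sign of the difference indicates which model's prediction is nearer.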
Deviations from time optimality correlate negatively with time average growth rates. We conducted a Bayesian correlation analysis for the relation between the deviation of the estimated risk aversion from time optimality, and the time average growth rate achieved by the participants' choices. We used a default prior (as specified in JASP software) which yields a uniform distribution on Kendall's τ 35 . We focus on hypothesis testing, specifying a one-sided alternative hypothesis which posits a negative correlation between deviation from time optimality and time average growth rate, compared to the null hypothesis that postulates that the correlation is non-negative. The Bayes factor for each correlation quantifies the evidence in favor of a negative correlation. Negative correlations were found for both additive dynamics (
Repeated measures ANOVA shows strong effect of gamble dynamics on risk aversion. We conducted a Bayesian repeated measures ANOVA on the risk aversion parameter η, with gender, age, and order of testing as between-subject factors, and dynamic as a within-subject factor. We used the default prior options for the effects (r = 0.5 for the fixed effects). To assess the robustness of the result, we also repeat the analysis for different widths of prior. The 'Model Comparison' table in Supp. Fig. 4b gives the results with respect to the different models. The models that are compared are all possible combinations of factors including interactions of factors. The table lists all of the models, and the corresponding Bayes factors, where the best performing model (here, the model that includes only the dynamic factor) is compared to all the other models. The column of BF01 shows that the data are ~7 times more likely under the model with only the dynamic, than under the full model (i.e., the model with age, gender and order of testing, and their interactions). The column P(M) lists the prior model probabilities, which are held uniform across all the models.
The column P(M|data) lists the posterior model probabilities. The column BFM lists the comparisons between the best model and each other model. The 'Analysis of Effects' (Supp. Fig. 4c) gives Bayes factors for the inclusion of each factor that appears in at least one of these models. For each factor, the BFInclusion column reflects how well the factor predicts the data by comparing the performance of all models that include the factor to the performance of all the models that do not include the factor. For the factor representing the gamble dynamic (Dynamic), there is extreme evidence in favor of its inclusion (BFInclusion > 100), whereas for all other factors there is evidence against their inclusion.
In conclusion, compared to other factors, the gamble dynamic has a uniquely strong effect on risk aversions.

Parameter recovery.
To evaluate whether the model estimation methods were capable of recovering approximate parameter estimates, we performed a parameter recovery simulation in which we applied our estimation procedures to synthetic data for which ground truth parameter values were set a priori. Supp. Fig. 5a shows the correspondence between the estimates of risk aversion parameters and the ground truth values used in simulating synthetic agents. Agents were simulated to have all pairwise combinations of the risk aversion values [-0.5, 0, 0.5, 1, 1.5] for additive and multiplicative dynamics. 20 subjects were simulated for each parameter combination, and then the same parameter estimation procedures were applied to visualise the recovery of parameters. Supp. Fig. 5a shows a subset of this space most relevant to the key results of this paper.

Model recovery.
To evaluate whether the model selection methods were capable of recovering the set of utility models tested, we performed a model recovery simulation in which we applied our estimation procedures to choices made by synthetic agents for which ground truth model identities were set a priori. Supp. Fig. 5b shows the correspondence between the posterior inclusion probabilities for each utility model and the ground truth identities of the utility models used in simulating synthetic agents. The first seven subjects were synthesised as prospect theory agents (with the same parameters for additive and multiplicative sessions), the next seven subjects as isoelastic utility agents (again with the same parameters for both sessions), and finally the last seven subjects were time optimal agents.
Fig. 3. b, model recovery for three different groups of synthetic agents, for the three models compared. Posterior model probabilities map strongly onto the ground truth model identities, insofar as the first seven agents were synthesised via a prospect theory model, the next seven from an isoelastic utility model, and the final seven from a time optimal model.
Wealth trajectories of synthetic agents. Fig. 1e-f shows wealth trajectories of synthetic agents repeatedly playing the set of 144 different gambles used in this paradigm. No-brainers were not included. 12 different agents were synthesised, comprising the three model classes: 9 variants of prospect theory agents, comprising all possible combinations of λ ∈ {1, 2, 3} and α ∈ {0.3, 0.6, 0.9} (identical for both gains and losses), 2 variants of isoelastic agents with η ∈ {0, 1}, and one time optimal agent. Each prospect theory and isoelastic agent had the same parameters for both additive and multiplicative dynamics. The time optimal agent is a special case, having linear utility under additive dynamics, and logarithmic utility under multiplicative dynamics. The trajectories were computed over several timescales: hour, day, week, and year.
At a long enough timescale noise is removed by the passage of time, and the time average growth rates of the different agents become apparent. In this artificial environment, agents were playing trials every 9.5s, continuously. Over the duration of a week, a reasonable approximation to the time average growth rate is typically revealed (Fig. 1e-f).

Once both computers are turned off, press "System Off" under the quench button ☐
After pressing "System Off", turn the key to lock the scanner ☐

Admin
Put MR-safety protocol and subject code in locked drawer ☐

Subject Instructions
Introduction. The experiment is divided over two days. Within each day there will be two different phases, a passive phase and an active phase.
The main aim is to study how the brain reacts to changes in wealth. All of the money involved is real, and you will be paid out the total wealth accumulated, summed over the two days.

Day 1
Passive phase. For the passive phase, you will see a number in the middle of the screen, this is your current wealth for the day in kr.
When you see a white box around the number, you are to press the button within 1s. (If you do not, you will be instructed to "press button earlier").
Shortly after pressing the button you will see an image in the background, and this will cause your wealth to change.
You are instructed to attend to any relationship between the images and the effect this has on your wealth, since in the active phase that follows you will be given the opportunity to choose images to influence your wealth.
Learning these relationships can make a large difference to your earnings in the active phase.
Active phase. With the money accumulated in the passive phase, you will play gambles composed of the same images.
In each trial, you will be presented with two of the images that you have learned about in the passive phase.
By pressing the buttons in the scanner to move a cursor, you now have the option to either a) accept gamble one, in which case you will be assigned one of its two images, each with 50% probability (not shown), or b) accept gamble two, in which case you will be assigned one of its two images, each with 50% probability (again not shown).
The outcomes of your gambles will be hidden from you, and only 10 of them will be randomly chosen and applied to your current wealth.
You will be informed of your new wealth at the end of the active phase.
You can keep any money accumulated after the active phase.
If you do not choose in time, then we will give you one of the worst images. It is recommended that you always choose in time.
The decisions you make can make a big difference to your end wealth.

Day 2
Introduction. On day two, you will be endowed with a new wealth of 1000kr, and you will go through the same active and passive phases as described before, but the images will be new and they will be associated with different changes in wealth.
At the end of the two days. Your accumulated wealth will be added over the two days, and transferred to your account, within approximately two weeks, and is taxable under standard regulations (B-income).
Total earnings = (Wealth after day 1) + (Wealth after day 2)
This will be paid over and above your remuneration for participating in the experiment.