The statistics of cognitive variability: Explaining common patterns in individuals, groups and financial markets

Psychological variability (i.e., “noise”) displays interesting structure which is hidden by the common practice of averaging over trials. Interesting noise structure, termed ‘stylized facts’, is observed in financial markets (i.e., behaviors from many thousands of traders). Here we investigate the parallels between psychological and financial time series. In a series of three experiments (total N = 202), we successively simplified a market-based price prediction task by first removing external information, and then removing any interaction between participants. Finally, we removed any resemblance to an asset market by asking individual participants to simply reproduce temporal intervals. All three experiments reproduced the main stylized facts found in financial markets, and the robustness of the results suggests that a common cognitive-level mechanism can produce them. We identify one potential model based on mental sampling algorithms, showing how this general-purpose model might account for behavior across these very different tasks.


Introduction
Psychological experiments are generally designed to minimize "noise": to collect enough data for experimental manipulations to show an effect on average responses. However, average responses hide the surprising structure of psychological time series when people are asked to make repeated responses over long time periods without interruption: for example, if participants are asked to repeatedly estimate fixed targets (e.g., a 1-s temporal interval), the resulting psychological time series has been found to show long-range and slowly decaying serial dependence, termed 1/f noise, and this complex autocorrelation structure can explain a larger fraction of the variability in behavior than even the experimental manipulation does (Gilden, 2001; Gilden, Thornton, & Mallon, 1995; Kello et al., 2010; Wagenmakers, Farrell, & Ratcliff, 2004; Zhu, León-Villagrá, Chater, & Sanborn, 2022). Yet such time series phenomena have typically been ignored in data analysis (where averaging over trials obliterates sequential dependencies) and modelling (where the trials are typically assumed to be independent) (Zhu et al., 2022).
By contrast, a particular type of time series generated by the interactions between large numbers of people has been the focus of a huge amount of attention: the statistical properties of time series from financial markets have been studied intensively, and found to have a range of characteristic properties (Cont, 2001; Atkinson and Piketty, 2007). Such markets are typically viewed as the domain of finance, but they can also be viewed as large-scale natural psychological experiments, involving high-stakes decision making by substantial numbers of highly incentivised participants. Financial time series also differ from time series in typical individual psychological experiments in another way: market participants are not engaging in a repetitive task, but are following a "moving target," partly driven by a continually changing informational environment.
Perhaps understandably given these differences, there has been little work exploring potential connections between the statistical properties of individual- and market-level time series. In this paper, we develop a series of experiments that suggest that, surprisingly, individual-level behavior and the aggregate behavior of thousands of market participants may share striking and distinctive properties. Such links may help inform the study of individual time series (by looking for phenomena that have been uncovered in market contexts); and shed light on the psychological foundations of market behavior, informing the development of models that can operate at both the individual and market level.

Stylized facts in financial time series
First, we review the statistical stylized facts found in financial time series, focusing specifically on three key properties: absence of autocorrelation in returns, heavy tails in price changes, and volatility clustering, each briefly described below. While we draw these features from observations of stock prices, it is important to note that many of these properties are generalizable to a wide range of financial assets, including foreign currencies, bonds, and market indices (Cont, 2001; Bollerslev, 1986).
One of the most important stylized empirical facts of financial markets is the absence of autocorrelation in returns (i.e., changes in price) for lags greater than a few minutes; that is, new returns are generally unrelated to past returns beyond a short window, with correlations between returns at greater distances falling near zero (see Fig. 2c S&P500 panels). This is also demonstrated in the power spectrum of the price series, which resembles 1/f² noise, the spectral characteristic of random walks (see Fig. 2a S&P500 panels; Cont, 2001), implying price movements are similarly random and unpredictable. However, the exact timescale for such autocorrelation to decrease to zero can vary from a few minutes to a few months (Cont, 2001; Moskowitz, Ooi, & Pedersen, 2012). The idea of a random-walk model of price is closely related to the efficient market hypothesis (Fama, 1970), which supposes that price changes reflect reactions to news: whenever any economic news, which is naturally unpredictable, is made available, it will be incorporated into the price without delay. Thus, a price series as predicted by the efficient market hypothesis must be a random walk, in which price changes are independent of one another.
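These two diagnostics (near-zero autocorrelation in returns, and a roughly 1/f² power spectrum for the price series) can be computed directly. The following is an illustrative sketch, not the paper's analysis code: it simulates a Gaussian log-price random walk (series length and step size are arbitrary choices) and estimates both quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a Gaussian log-price random walk: returns (differences) are
# i.i.d., so the price series should show a ~1/f^2 power spectrum and
# near-zero autocorrelation in returns.
log_price = np.cumsum(rng.normal(0.0, 0.01, size=4096))
returns = np.diff(log_price)

def acf(x, max_lag):
    """Sample autocorrelation function at lags 0..max_lag."""
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")[len(x) - 1:]
    return full[:max_lag + 1] / full[0]

# Approximate 95% confidence band for white noise: +/- 1.96 / sqrt(N);
# only ~5% of lags should fall outside it for uncorrelated returns.
r_acf = acf(returns, 100)
band = 1.96 / np.sqrt(len(returns))
frac_outside = np.mean(np.abs(r_acf[1:]) > band)

# Spectral slope of the price series: regress log power on log
# frequency; a random walk gives a slope near -2 (1/f^2 noise).
freqs = np.fft.rfftfreq(len(log_price))[1:]
power = np.abs(np.fft.rfft(log_price))[1:] ** 2
slope = np.polyfit(np.log(freqs), np.log(power), 1)[0]
```

The same two statistics are the ones reported for each group and participant in the Results sections below.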
A further common assumption is that the log of a price series follows a Gaussian random walk (i.e., the price follows geometric Brownian motion), meaning the distribution of log price changes should be Gaussian (Black & Scholes, 1973; Merton, 1973). But Gaussian random walks do not describe real markets. First, Gaussian distributions of price changes underestimate extreme movements: in a hypothetical Gaussian market, a loss of greater than 4 standard deviations is only expected to occur once every 126 years (Frain, 2009), but the FTSE-100 index has shown losses of such magnitude 11 times between 1987 and 2008, even excluding the 2008 global financial crisis. The distribution of returns is thus better described as having 'heavy tails': there is a greater density of extreme events in the tails of this distribution than is predicted by the Gaussian standard (see Fig. 2b S&P500 panels; Mandelbrot, 1963), particularly for returns over short time lags.
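The Gaussian/heavy-tail contrast can be illustrated numerically. In this sketch (illustrative only: the Student-t distribution and sample sizes are our assumptions, not drawn from the paper), excess kurtosis and the frequency of worse-than-4-standard-deviation losses are compared under the two models.

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

n = 100_000
gauss = rng.normal(size=n)            # Gaussian benchmark returns
heavy = rng.standard_t(df=3, size=n)  # heavy-tailed stand-in (assumption)

k_gauss = excess_kurtosis(gauss)      # near 0
k_heavy = excess_kurtosis(heavy)      # large and positive

# Frequency of worse-than-4-standard-deviation losses under each model:
# essentially absent for the Gaussian, far more common with heavy tails.
p_gauss = float(np.mean(gauss < -4 * gauss.std()))
p_heavy = float(np.mean(heavy < -4 * heavy.std()))
```

The kurtosis statistic computed here is the one reported for the empirical change distributions in the Results sections below.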
Besides heavy tails, there exists another critical aspect of the financial markets which is not captured by the Gaussian random-walk model: long-range dependencies in price series. While the absence of autocorrelation in price changes supports a random-walk model of prices, it does not necessarily imply that price changes are fully independent. Indeed, another stylized fact, volatility clustering, demonstrates that absolute price changes are not independent, but have positive autocorrelations which persist across longer lags (see Fig. 2d S&P500 panels; Mandelbrot, 1963; Campbell, Lo, MacKinlay, & Whitelaw, 1998). Thus, while the directions of price changes are unpredictable, their magnitudes are not: large price changes are more likely to be followed by large price changes and vice versa.
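Volatility clustering can be made concrete with a standard econometric simulation. The minimal GARCH(1,1)-style sketch below (a common textbook illustration, not a model used in the paper; parameter values are arbitrary) produces returns whose signs are unpredictable but whose magnitudes are persistently autocorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)

# Minimal GARCH(1,1)-style simulation: the conditional variance is
# persistent, so large moves cluster even though signs are random.
n = 20_000
omega, alpha, beta = 0.05, 0.10, 0.85  # arbitrary illustrative values
ret = np.zeros(n)
var = omega / (1.0 - alpha - beta)     # start at unconditional variance
for t in range(1, n):
    var = omega + alpha * ret[t - 1] ** 2 + beta * var
    ret[t] = np.sqrt(var) * rng.normal()

def acf_at(x, lag):
    """Sample autocorrelation of x at a single lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Raw returns show essentially no memory in their signs, while
# absolute returns show positive, slowly decaying autocorrelation.
acf_ret = acf_at(ret, 1)
acf_abs = [acf_at(np.abs(ret), k) for k in (1, 10, 50)]
```

This is the pattern checked for in the experiments below: near-zero autocorrelation in changes alongside positive autocorrelation in absolute changes.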

Market-based models of the stylized facts
Two traditions of economic models have been developed to accommodate these stylized facts and make useful predictions for asset prices: models that assume optimizing individuals and agent-based models. The former approach typically assumes agents that rationally forecast future economic variables and optimize economic decisions given that forecast (Kaplan, Moll, & Violante, 2018; Kydland & Prescott, 1982; Veronesi, 1999). The latter approach instead assumes heterogeneous agents with limited rationality (Anufriev & Hommes, 2012; Farmer & Foley, 2009; Hommes, 2011). Traditionally, models of optimizing individuals are unable to reproduce stylized facts such as volatility clustering, though recent extensions have attempted to address this (Barberis & Thaler, 2005; Veronesi, 1999). In contrast, heterogeneous agent-based models more easily produce the stylized facts, and indeed have accomplished this in numerous ways, such as differences in the time scale at which investors focus their attention (Andersen & Bollerslev, 1997; Bouchaud & Potters, 2003; Guillaume et al., 1997; Hommes, 2011; Kirchler & Huber, 2009; LeBaron, 2001) or heterogeneity in trading strategies between agents (Cont, 2007; Kirman, 1993; De Long, Shleifer, Summers and Waldmann, 1990).
These two approaches do, however, share a common focus on market-level explanations, either through market mechanisms or agent interactions. Both thus predict that most of the stylized facts should disappear in time series generated by individual agents rather than market interactions. Ultimately, however, markets are made up of people, and price series should be influenced by the expectations of individuals. But there is a lack of empirical knowledge about the statistical features of psychological time series in such cases, especially with regard to how individuals change their beliefs over time, as current work predominantly focuses on estimates of fixed targets (Gilden, 2001; Gilden et al., 1995; Zhu et al., 2022; Chater and Brown, 1999). This leads to a rather neglected question: Do individual psychological time series mirror price series even without market mechanisms or interactions? If so, then part of the variability in financial systems could be attributable to individual cognition, making understanding of the mechanisms supporting this behavior even more crucial. To answer this question, we conduct a series of laboratory experiments that study long-range statistical behaviors of human cognition in both group and individual settings and in forecasting and time estimation domains.

Method
To begin to explore what conditions are sufficient to produce the stylized facts in the laboratory, our first experiment simplified the design of conventional asset-market experiments: participants only needed to predict the next price in a series, which would be the average prediction of all group members in that trial (see Fig. 1a). In contrast to conventional asset-market experiments, the Group Prediction Task removed the buying-selling procedure and information about the fundamental value of the asset, making this a pure belief-based pricing task. There was also no heterogeneity of information: we presented the same price information to all participants simultaneously.

Participants
One hundred and fifty participants were recruited through Prolific in return for financial compensation between £6 and £7 based on accuracy of prediction in a randomly selected trial. Of these, 120 participants qualified for further analysis (44 females, 73 males, 3 declined to answer, aged between 18 and 80; exclusion criteria explained below). Sample size was determined to provide at least 20 groups each consisting of 6 participants. Ethical approval for the experiment was given by the University of XXX Humanities and Social Sciences Research Ethics Committee.

Procedure
Experiment 1 examined estimates in small experimental markets: groups of participants each made separate predictions of the price of a risky fictional asset based on a common price history, with the actual price of that asset in each period then being based on aggregated predictions from all participants in that period. This was based on the 'learning-to-forecast' experiments of Hommes and colleagues, most notably Hommes, Sonnemans, Tuinstra, and Van de Velden (2008) and Heemeijer, Hommes, Sonnemans, and Tuinstra (2009), in which participants were asked to predict the next price of a fictional asset based on the price history up to that point. Unlike these experiments, however, we do not define a dividend for the target asset, nor do we note any safe alternative asset with a fixed interest rate, meaning the asset has no defined fundamental price; this was to focus attention on price movements arising from cognitive fluctuations in expectations without any external information. Price in this experiment is therefore purely a function of participant predictions rather than any pre-determined series.
Experiment 1 ran in a series of 1-h sessions performed online using the software oTree (Chen, Schonger, & Wickens, 2016). Upon entering the experiment, participants were first sorted into groups of 6 by order of arrival, which remained consistent across the session. If a group did not reach 6 members within 5 min of the start of the session, the task proceeded with that incomplete group, with price being calculated according to the average of the present members; incomplete groups were then removed from subsequent analysis.
Participants were each framed as an advisor to a large financial firm deciding whether to invest in a particular asset, making repeated individual predictions of the price of the asset in the next immediate trial, mirroring the one-period-ahead predictions of Heemeijer et al. (2009). Firms were stated to base their demand for the asset on the predictions of their advisor: high price predictions lead to greater demand, and low predictions to lower demand. Price would then be set according to the total demand of all firms within the market, thus reflecting the predictions of all participants in the group for that period. Participants were, however, informed that they would be rewarded purely based on the accuracy of their individual predictions, and so should focus only on being as close to the true price as possible. Bonuses were noted to be determined by accuracy on a randomly selected trial to incentivise maximizing accuracy in every trial.
After reading the instructions, participants began the main trial block. Each trial required participants to predict the price of the asset for that trial (i.e., today's as-yet undetermined price) by typing their estimate into an on-screen text box. Predictions were required to be positive integers with an upper limit of 10,000 to prevent extreme increases in realised price, as used in Hommes et al. (2008); participants were not made aware of this limit at the start of the task, but would be asked to enter a new estimate if a submitted prediction exceeded this value. In addition, to counter potential typing errors such as missed digits, predictions with an absolute log difference from the most recent price greater than 0.3 required secondary confirmation by the participant to be accepted; this threshold was determined via pilot testing on four separate groups using the same procedure prior to main data collection.
To assist predictions, each trial showed a line plot illustrating the history of both the actual price and that participant's predictions (but not the predictions of other group members) for all past trials for that group. For the first trial, in which no previous information was available, participants were given a fixed starting value of 200 representing the most recent price of the asset, chosen arbitrarily to allow space for both upward and downward movements, and common to all experimental sessions. New price values were then appended to this history once determined on each trial. A time limit of 10 s was placed on each prediction to ensure task progression; if no prediction was entered within this time, the task automatically advanced and that participant had no influence on price for that trial.
Price was then calculated by taking the average prediction across participants who responded on that trial, simulating a basic equilibrium between supply and demand, with the addition of Gaussian noise:

p_t = (1/n_t) Σ_i p^e_(i,t) + ε_t,

where p^e_(i,t) is the prediction of participant i on trial t, n_t is the number of participants who made a prediction on that trial, and ε_t is a Gaussian noise term. Prices were rounded to the nearest integer to match with responses. This price was then revealed to each group member alongside the potential reward for their prediction if that trial were selected for bonus payment; this feedback remained on-screen for 3 s before advancing to the next trial.
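The price rule described above (the mean of the submitted predictions plus Gaussian noise, rounded to an integer) can be sketched as follows; the noise standard deviation is an illustrative placeholder, as its value is not restated here.

```python
import numpy as np

rng = np.random.default_rng(3)

def realized_price(predictions, noise_sd=1.0):
    """Mean of the submitted predictions plus Gaussian noise, rounded
    to the nearest integer. noise_sd is an illustrative placeholder,
    not a value taken from the experiment."""
    mean_pred = float(np.mean(predictions))
    return int(round(mean_pred + rng.normal(0.0, noise_sd)))

# Example: a group of six advisors submits predictions around the
# common starting value of 200.
group = [198, 205, 201, 199, 202, 200]
price = realized_price(group)
```

Note that any participant who timed out is simply absent from the prediction list, matching the rule that non-responders have no influence on price for that trial.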
The task continued until either a maximum of 300 trials or a total duration of 60 min was reached. The experiment then ended by calculating a bonus payment for each participant according to their prediction accuracy on a randomly sampled trial. Reward was calculated using an exponential function of the absolute log deviation from the true price on the sampled trial, restricted to fall between £0 and £1:

R_(i,t) = exp(−|log p^e_(i,t) − log p_t|).
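As a sketch of this bonus rule, the following assumes the simplest exponential form of the absolute log deviation, which is naturally bounded between 0 and 1 (i.e., £0 and £1); any scaling constant inside the exponent is omitted as an assumption.

```python
import numpy as np

def reward(prediction, price):
    """Exponential of the negative absolute log deviation from the
    realized price; this lies in (0, 1], matching the stated GBP 0-1
    range. Any scaling constant in the exponent is omitted (assumption)."""
    return float(np.exp(-abs(np.log(prediction) - np.log(price))))

exact = reward(200, 200)   # perfect prediction earns the full bonus
double = reward(400, 200)  # overshooting by a factor of 2
half = reward(100, 200)    # undershooting by a factor of 2 (same penalty)
```

Working in log space makes the penalty symmetric in ratio terms: overshooting by a factor of two costs exactly as much as undershooting by a factor of two.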
Participants thus optimize their rewards by aligning their predictions with those of their fellow group members to reduce their deviation from the mean, thereby enhancing accuracy. In essence, the strategy that maximizes rewards involves collectively agreeing upon a price prediction within the group. This collaborative approach ensures that all members aim for a consensus, which serves as the most beneficial outcome for each participant. During the experiment, however, direct communication among participants is prohibited: the only information they can access is their own predictions and the group's aggregated price outcomes. This also means that no single deterministic prediction strategy necessarily outperforms any other so long as all participants follow a common strategy: for example, trend chasing can be as rewarding as predictions of stability if all group members predict the same trend.

Results
Groups completed an average of 235.25 trials, with a range of 136 to 300. We examined the price series produced by each group using the same analyses applied to financial asset prices to look for the stylized facts, with average behaviors across groups shown in Fig. 2. Participants demonstrated little autocorrelation in the changes of their price series: across the first 100 lags of the autocorrelation function, an average of 5% (±3.63% 95% CI) significantly differed from zero across groups (using confidence bounds defined by Box, Jenkins, & Reinsel, 1994), and spectral density analysis found an average slope of −1.85 (±0.02 95% CI), falling slightly above the −2 slope of market series (Cont, 2001).
The autocorrelation function in absolute changes did, however, display a slow decay across lags, indicating volatility clustering, though this was not as strong as that seen in financial markets. The distribution of changes also demonstrated strong deviations from Gaussian standards with an average kurtosis of 22.41 (±14.72 95% CI), matching the heavy tails of asset returns (see Fig. 2b).
Thus, these results suggest that a very basic experimental market setup is sufficient to generate the stylized facts of asset prices. In common with other asset market experiments, however, this experiment does not clarify which aspect of the task is responsible for generating the stylized facts and whether all the ingredients are needed (cf. LeBaron, 2000): it is unclear from this data whether the stylized facts emerge from the interaction between participants or from individual cognition.

Method
To test whether internal cognitive fluctuations alone are sufficient to reproduce the stylized facts, we further simplified the design of the group prediction task by removing group interactions. In Experiment 2, participants were asked to individually predict the next price in a series in which price was not influenced by their predictions; instead, the target price series followed a Gaussian random walk in log price, in essence reflecting an idealized rational price sequence without heavy tails or volatility clustering (see Fig. 1b). The reward-maximizing behavior in this task, if the nature of the underlying price series is known or has been learnt, is simply to use the preceding price as the prediction for the next.

Participants
Seventy-four participants were recruited through the University of XXX SONA subject pool, and completed the 60-min experiment in exchange for monetary rewards between £2 and £20 based on accuracy of prediction in a randomly selected trial. Participants were excluded from analysis if they reported an interruption during the task, or if fewer than 350 predictions were completed within the 60-min session; this removed 33 participants, leaving 41 participants for further analysis (15 females, 26 males, and aged between 19 and 32). Sample size was determined to provide at least 20 time series of individual predictions for both conditions. Ethical approval for the experiment was given by the University of XXX Humanities and Social Sciences Research Ethics Committee.

Procedure
Experiment 2 was conducted remotely using the PsychoPy software package (Peirce et al., 2019), installed on participants' personal computers. Participants were first assigned to one of two potential target time series, with 20 participants viewing series A, and 21 viewing series B. Both target time series were defined using random-walk processes with Gaussian increments in log scale: the two series differed in the standard deviation of step size (σA = 0.25, σB = 0.2) and the random seed used to generate each step. These values were multiplied by 100 and rounded to the nearest integer for display as prices to participants. We confirmed that the initial 350 values of the target series follow a Gaussian distribution (see Fig. S2 in Appendix B). Moreover, Anderson-Darling tests for normality were conducted: A² = 0.27, p = 0.69 for series A, and A² = 0.40, p = 0.37 for series B. We therefore cannot reject the null hypothesis: the log changes in the target series are not detectably different from normal.
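The construction of such target series (a random walk with Gaussian increments in log scale, multiplied by 100 and rounded for display) can be sketched as below. The starting value, random seeds, and series length here are illustrative placeholders, not the experiment's, and the reading of "multiplied by 100" as applying to the exponentiated walk is our assumption.

```python
import numpy as np

def make_target_series(sigma, n_trials=350, start_log=np.log(2.0), seed=0):
    """Random walk with Gaussian increments in log scale, exponentiated,
    multiplied by 100, and rounded to integer prices for display.
    start_log and seed are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    log_path = start_log + np.cumsum(rng.normal(0.0, sigma, size=n_trials))
    return np.round(100.0 * np.exp(log_path)).astype(int)

series_a = make_target_series(sigma=0.25, seed=1)  # sigma_A = 0.25
series_b = make_target_series(sigma=0.20, seed=2)  # sigma_B = 0.20
```

Because the increments have median zero, the best prediction of the next value of such a series is simply its current value, which is the optimal strategy discussed below.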
Participants were instructed that they would be predicting the price of a financial asset that moved up and down randomly, and would be rewarded based on their accuracy in predicting the target in the next trial. Each trial presented the current target to the participant on-screen for 2 s. After presentation, participants were immediately asked to predict the next target by typing a positive integer. Unlike Experiment 1, no limit was placed on predictions as these did not affect the true price series, and there was no time limit on responses. Once the prediction was made, a feedback screen was shown noting the next target and the potential monetary reward that the participant would receive if that trial were selected for payment. Trials were separated by an inter-trial interval of 1 s while a fixation cross was displayed at the center of the screen. Trials continued until a maximum duration of 60 min.
At the end of the experiment, participants were also asked to report whether they had been interrupted during the task; any participants who reported an interruption were removed from subsequent analysis. Payments were then determined using a nonlinear transform of the participant's log error in their prediction on a randomly selected trial. The optimal prediction is thus the most recent price plus the median of the step distribution; as this median was zero, participants should simply repeat the last target to maximize their reward. As such, the optimal prediction series should replicate the random-walk target, and so match its statistical properties, including a Gaussian change distribution and no autocorrelation in either changes or volatility.

Results
Participants completed an average of 485.93 trials, with the number of trials ranging from 350 to 558. We applied the same measures used above to the prediction series of each participant to check for the stylized facts. Participants demonstrated little autocorrelation in the movements between their predictions, with 7% (±1.10% 95% CI) of their autocorrelation function significantly differing from zero and an average spectral density slope of −1.71 (±0.04 95% CI), again falling slightly above the equivalent measure from markets. The autocorrelation in volatility, meanwhile, does show a strong positive value in the shorter lags, though this appears to decay faster than in either the group prediction task or the market data. Finally, the distribution of movements also again showed deviations from Gaussian levels in its tails with an average kurtosis of 20.94 (±5.28 95% CI), as seen in its quantile-quantile plot in Fig. 2b.
Participants in Experiment 2 thus not only deviated from optimal behavior: their predictions showed the stylized facts of markets despite the absence of these properties in the target price series, though volatility clustering was notably weaker in this task. Thus, individuals are imposing these features when predicting a time series in which they are absent.
The framing of the task in terms of predicting asset prices, however, could have encouraged participants to believe that the target price series had the same properties as real-world price series. Hence, despite presenting a Gaussian random walk as a target, participants' experience with real-world markets may have overshadowed the "small" number of training trials in the laboratory, resulting in a strategy of reproducing the statistics of real-world markets. This does not apply only to experience with financial markets: real-world time series can exhibit heavy tails and volatility clustering (e.g., turbulence; Ghashghaie, Breymann, Peinke, Talkner, & Dodge, 1996; Mantegna & Stanley, 1996), which could have influenced participants' expectations in this task. We therefore attempted to address this issue in our third experiment by moving away from the prediction paradigm.

Method
In Experiment 3, we further abstracted away from real-world markets by removing any reference to price, financial assets, or predictions. Participants were instead simply asked to reproduce temporal intervals (see Fig. 1c). Rather than predicting the next target, participants were asked to reproduce the most recent target as accurately as possible immediately after its presentation, where the true duration varied from trial to trial following the same Gaussian random walks as in Experiment 2, but with the symbolic numbers converted to physical times. As before, the optimal behavior was to directly replicate the target series (and therefore its properties), and in this case the optimal behavior was the explicit goal of the task.

Participants
Fifty-eight participants were recruited through the University of XXX SONA participant pool in exchange for monetary rewards between £2 and £20 based on accuracy of estimate in a randomly selected trial. Exclusion criteria were identical to Experiment 2, removing 17 participants and leaving 41 for subsequent analysis (22 females, 18 males, 1 non-disclosed gender, and aged between 19 and 45). Sample size was determined to provide at least 20 time series of individual time estimates for both conditions. Ethical approval for the experiment was given by the University of XXX Humanities and Social Sciences Research Ethics Committee.

Procedure
Experiment 3 was again conducted remotely using the PsychoPy software package (Peirce et al., 2019), installed on participants' personal computers. Participants were first assigned to one of two potential target time series, with 20 participants viewing series A, and 21 viewing series B. These were the same underlying series as in Experiment 2, though with the log changes scaled down by a factor of 5 so that the time intervals were neither too long nor too short. As they are the same time series, the statistical tests showing that these series are not detectably different from a Gaussian distribution also apply to this experiment.
Participants were told that the task involved viewing and replicating a time interval, and that they would be rewarded based on their accuracy. Each trial presented the target time interval to the participant as a red circle on-screen for the given duration; that is, the red circle appeared on-screen for x_t seconds on the t-th trial. After presentation, participants were immediately asked to replicate the target duration by holding the space-bar for the same length of time as the red circle had appeared. To aid responses, a matching red circle was displayed while participants held the space-bar. As with Experiment 2, no limits were placed on responses. Once the estimate was made, a feedback screen was shown noting the potential bonus that the participant would receive if that trial were selected for payment. Trials were separated by an inter-trial interval of 1 s while a fixation cross was displayed at the center of the screen. Trials continued until a maximum duration of 60 min. Four practice trials were given to all participants prior to the main task, following a sequence of target time intervals in the order 1 s, 2 s, 1 s, and 2 s.
Finally, participants were also asked to report whether they had been interrupted during the task; any participants who reported an interruption were removed from subsequent analysis. Payments were then determined by applying the same reward function as in Experiment 2 on a randomly selected trial.

Results
On average, participants completed 485.93 trials, with counts ranging from 350 to 558 trials. We applied the same analyses to each participant's series of time estimates as used in the previous tasks; this means our analyses consider both the variability generated by the participants and that from the original series, but it ensures comparability with the previous tasks: if we had instead performed the more common analysis of errors, this would have changed the stylized facts in which we were interested. We find little autocorrelation in the changes between estimates, with an average of 7.98% (±1.27% 95% CI) of lags significantly differing from zero across participants. Spectral density slopes do fall above market levels with an average of −1.32 (±0.06 95% CI), though this appears to be driven by particular deviations in the high-frequency regions (see Fig. 2a); this is likely due to motor noise dominating cognitive noise at short lags, as has been previously observed in similar tasks (Gilden et al., 1995). Volatility, meanwhile, shows similar patterns to the individual prediction task above, with strong autocorrelations at short lags which quickly decay to zero. Finally, the distribution of log changes again shows strong deviations from Gaussian standards in its tails with an average kurtosis of 32.01 (±8.01 95% CI), mirroring the previous experiments.
Thus, much like Experiment 2, participants' time series of estimates did not behave like the target Gaussian random-walk but imposed the stylized facts of financial markets.Hence, despite the vast gulf between the time estimation and price prediction tasks, individuals still exhibit the stylized facts, demonstrating their generality.

Explaining the statistics of human behavior
We have provided empirical evidence of consistent stylized facts in psychological time series for both group and individual tasks, which parallel those observed in financial markets (see Appendix A for additional stylized facts). Why might this be the case? To propose a possible theory for the observed statistical stylized facts, we next conduct an initial comparison between several economic and cognitive models. Evidently, the optimal model (coordination on a uniform price in group tasks and replication of the target time series in individual tasks) falls short of capturing the deviations from the target series, highlighting its inability to accurately reflect human responses in these tasks. One possible explanation for the observed human patterns is that the responses of any complex system always exhibit these properties. For example, air turbulence has been suggested to show similarly heavy tails to price series (Ghashghaie et al., 1996), though subsequent work has argued turbulence does not show a similar absence of autocorrelations (Mantegna & Stanley, 1996). Furthermore, even the near-ubiquitous dependencies in psychological time series are greatly reduced when memory or prediction are not needed, for example if people are instructed to simply respond as quickly as possible to an unpredictable target (i.e., making the response times an unintentional side-effect, rather than the target of cognition; Gilden et al., 1995; Kello, Beltz, Holden, & Van Orden, 2007). This suggests that deliberately generated cognitive processes may be necessary to produce the stylized facts. In sum, cognition may be the necessary ingredient to produce the empirical observations in our three experiments.
If cognition generates these stylized facts, how are agents able to produce both features that align with rational agents (absence of autocorrelation) and those that do not (heavy tails and volatility clustering)? Bayesian models of cognition could serve as a starting point, as they can be applied to a very broad range of tasks, including generating time estimates (e.g., Jazayeri & Shadlen, 2010) and making numeric predictions (e.g., Wu, Schulz, Speekenbrink, Nelson, & Meder, 2018). But instead of investors being entirely rational, it could be that human beings approximate rational solutions with limited cognitive resources, which helps explain people's deviations from the predictions of Bayesian models (Lieder & Griffiths, 2020). One scheme for implementing resource-rationality is to assume that people sample forecasts or time estimates according to their probability, rather than always producing the single best response, using sampling algorithms like those computer scientists and statisticians use to make probabilistic models tractable (Griffiths, Vul, & Sanborn, 2012; Sanborn & Chater, 2016). Recent computational work has shown that the same sampling algorithm, Metropolis-coupled Markov Chain Monte Carlo (MC³), can explain the stylized facts found in Experiment 2 (Spicer, Zhu, Chater, & Sanborn, in press), as well as both heavy tails and 1/f noise in fixed-target experiments where the target series itself does not exhibit 1/f² noise (Zhu et al., 2022). In Experiment 2, participants received feedback based on a 1/f² noise target series; this feedback seems to alter the psychological time series generated by the participants, making it more akin to 1/f² noise than the 1/f noise observed in the fixed-target experiments.
We therefore next consider whether such a sampling algorithm provides a better description of behavior in these tasks than the optimal model and a variety of other leading models that depart from optimality. We focus on Experiments 2 and 3 given the complexities of modelling the interactions of the group task, though we revisit this task with preliminary modelling in the Appendix.

Candidate models
We considered 10 candidate models for behavior across our tasks, including the optimal solution, sampling approximations to this optimal solution, market-level models, and statistical approaches. As both the individual price prediction and time estimation tasks use a Gaussian random walk in log space as a target, and the reward function is based on absolute deviation, the optimal response in each period for both tasks is the median of the step distribution, which in both cases was zero.
We also include three sampling-based approximations of the optimal solution in which responses reflect samples from the true step distribution taken in different ways. First, the Direct Sampler (DS) draws independent and identically distributed samples from the target, using a single sample in each period as the response for that trial: ln(p_{t+1}) ∼ N(ln(x_t), σ), where x_t is the most recent value of the target and σ is the standard deviation of the step distribution. Second, the MCMC sampler uses an autocorrelated sampling process in which each sample depends on its immediate predecessor: a value close to the most recent sample is proposed and evaluated against the true step distribution, with better values immediately accepted and worse values accepted with probability equal to the density ratio of the proposal against the last sample (Andrieu, De Freitas, Doucet, & Jordan, 2003). The proposal distribution of the MCMC sampler is a Gaussian with mean zero, N(0, σ_MCMC), where σ_MCMC is a free parameter. Third, the MC³ sampler referenced above extends the MCMC sampler to include multiple parallel sampling chains which distort the true distribution to make movements to poorer proposals more likely (Geyer, 1991; Robert & Casella, 2011). This is achieved by raising the target distribution to a unique exponent for each chain c, T_c = 1/(1 + φ(c − 1)), where the number of parallel chains and the increment φ are free parameters. Full definitions of these models are given in Spicer, Zhu, Chater, and Sanborn (in press).
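To make these samplers concrete, the following is a minimal Python sketch (the parameter values used in the test are illustrative assumptions, not the fitted values; full specifications and priors are given in Spicer, Zhu, Chater, and Sanborn, in press, and Appendix A). With a single chain, the MC³ sampler reduces to the plain MCMC sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_step_density(log_x, log_target, sigma):
    # Log-density (up to a constant) of the Gaussian step distribution in log space.
    return -0.5 * ((log_x - log_target) / sigma) ** 2

def direct_sampler(targets, sigma):
    """Direct Sampler: one i.i.d. draw per period, centred on the latest target."""
    return np.exp(rng.normal(np.log(targets), sigma))

def mc3_responses(log_targets, sigma, sigma_mcmc, n_chains, phi, steps_per_trial=1):
    """MC3 sampler: parallel Metropolis chains on tempered copies of the step
    distribution, T_c = 1/(1 + phi*(c-1)); the cold chain gives each response.
    With n_chains=1 this is the plain MCMC sampler."""
    temps = 1.0 / (1.0 + phi * np.arange(n_chains))
    chains = np.full(n_chains, log_targets[0])
    responses = []
    for log_t in log_targets:
        for _ in range(steps_per_trial):
            # Within-chain Metropolis update: accept better proposals always,
            # worse ones with probability equal to the (tempered) density ratio.
            props = chains + rng.normal(0.0, sigma_mcmc, n_chains)
            log_ratio = temps * (log_step_density(props, log_t, sigma)
                                 - log_step_density(chains, log_t, sigma))
            accept = np.log(rng.random(n_chains)) < log_ratio
            chains[accept] = props[accept]
            if n_chains > 1:
                # Propose swapping a random adjacent pair of chains.
                c = rng.integers(n_chains - 1)
                swap = ((temps[c] - temps[c + 1])
                        * (log_step_density(chains[c + 1], log_t, sigma)
                           - log_step_density(chains[c], log_t, sigma)))
                if np.log(rng.random()) < swap:
                    chains[c], chains[c + 1] = chains[c + 1], chains[c]
        responses.append(np.exp(chains[0]))
    return np.array(responses)
```

Because only the cold chain is reported, occasional swaps with the hotter, flatter chains inject the large jumps that give rise to heavy-tailed changes.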
To represent market-level models, we focus on heterogeneous expectation models given their relative ease of application to individual-level behavior. These models are predicated on the assumption that individual market participants deploy a variety of heuristic strategies to forecast market prices, and they are noteworthy for their empirical validity, demonstrating substantial alignment with experimental market data (Anufriev & Hommes, 2012; Hommes, 2011). Specifically, we draw on the Heuristic Switching model of Anufriev and Hommes (2012), in which forecasters can make use of four potential heuristic forecasting rules. For these heuristics, the coefficients are the best-fitting values identified in Anufriev and Hommes (2012), which are unaltered in the present simulations.
First, an adaptive expectation (AE) heuristic calculates the next price prediction as a weighted average of the current price (x_t) and the preceding prediction (p_t):

p_{t+1} = 0.65 x_t + 0.35 p_t.

Second, a trend-following heuristic derives the prediction for the forthcoming price from the current price, adjusted by a weighted difference between the most recent price and its immediate predecessor (that is, the short-term trend). This heuristic has two variants, a weak trend (WTF) and a strong trend (STF) heuristic, differentiated by the weight placed on the short-term trend:

p_{t+1} = x_t + 0.4 (x_t − x_{t−1})  (weak trend),
p_{t+1} = x_t + 1.3 (x_t − x_{t−1})  (strong trend).

Finally, an anchoring-and-adjustment (AA) heuristic employs the mean of the cumulative historical price average (x̄_t) and the current price as an anchor, thereby nudging the next price prediction towards it, plus the most recent movement:

p_{t+1} = 0.5 (x̄_t + x_t) + (x_t − x_{t−1}).

According to Anufriev and Hommes (2012), the Heuristic Switching model dynamically alternates among these four heuristic rules based on their past success in forecasting prices. The fitness of heuristic i is computed from the squared error of its prediction of the most recent price, plus the discounted fitness of past predictions.
The influence of past performance diminishes at a rate denoted by η:

U_t^(i) = −(x_t − p_t^(i))² + η U_{t−1}^(i),  where i ∈ {AE, WTF, STF, AA}.

Next, the model determines the popularity n_t^(i) of each heuristic at a given time as follows:

n_t^(i) = b n_{t−1}^(i) + (1 − b) exp(U_t^(i)/d) / Σ_j exp(U_t^(j)/d),

where b denotes the persistence (inertia) of a heuristic's influence, and d denotes the temperature applied to the fitness of each heuristic rule. We here consider both the Heuristic Switching model and its four separate heuristics as candidate models for participants in our tasks: while these models were designed for the forecasting of prices, we also apply them to the time estimation task by equating the prediction of the next price with the estimate of the current time interval.
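A minimal Python sketch of these switching dynamics follows. The heuristic coefficients are the published Anufriev and Hommes (2012) values; the default values for η, b, and d below are illustrative assumptions rather than the fitted values used in our simulations.

```python
import numpy as np

def heuristic_forecasts(x, x_prev, x_bar, p_ada):
    """The four heuristics: x is the current price, x_prev the preceding
    price, x_bar the cumulative average price, p_ada the last AE prediction."""
    ae  = 0.65 * x + 0.35 * p_ada           # adaptive expectations
    wtf = x + 0.4 * (x - x_prev)            # weak trend-following
    stf = x + 1.3 * (x - x_prev)            # strong trend-following
    aa  = 0.5 * (x_bar + x) + (x - x_prev)  # anchoring-and-adjustment
    return np.array([ae, wtf, stf, aa])

def heuristic_switching(prices, eta=0.7, b=0.9, d=2.5):
    """Predict each next price as the popularity-weighted mix of the four
    heuristics, updating fitness by discounted squared error and popularity
    by a softmax with inertia b and temperature d."""
    U = np.zeros(4)             # fitness per heuristic
    n = np.full(4, 0.25)        # popularity (impact) weights
    p_ada = prices[0]
    preds = []
    for t in range(1, len(prices)):
        f = heuristic_forecasts(prices[t - 1], prices[max(t - 2, 0)],
                                prices[:t].mean(), p_ada)
        p_ada = f[0]
        preds.append(n @ f)     # weighted prediction for the price at time t
        # Fitness update: discounted squared error on the realized price.
        U = -(prices[t] - f) ** 2 + eta * U
        w = np.exp((U - U.max()) / d)   # subtract max for numerical stability
        n = b * n + (1 - b) * w / w.sum()
    return np.array(preds)
```

The inertia term b keeps popularity shifting gradually, which is what lets the model alternate between trend-following regimes and thereby generate volatility clustering.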
In addition to these heuristic market models, we also explored a prominent statistical approach previously applied to the analysis of financial data, the Multifractal Model of Asset Returns (MMAR; Mandelbrot, Fisher, & Calvet, 1997). While a full definition of this complex model is beyond the scope of this paper, in brief the model is based on the concept of self-similarity, suggesting that price change patterns are consistent across various time scales, though not uniformly so. Unlike the traditional Brownian motion model, which offers a simplistic view of price movements, the MMAR incorporates long-range dependencies and significant fluctuations in prices. Its statistical foundation allows for broad application, not limited to specific markets or groups, providing a versatile framework for the analysis of time series with fractal characteristics.

Model behaviors
Before proceeding to a quantitative comparison of the models, we first compare the behaviors of these models using their prior predictive distributions of the stylized facts, shown in Fig. 3. For these plots, sets of parameters are drawn from the prior distribution for each model (given in Appendix A), experimental responses are simulated for each set, and the statistics of each set are averaged. This illustration is designed to highlight the characteristic behavior of each model before it is fitted to data. The optimal model, for instance, replicates the statistical properties of the target time series, but falls short in capturing the systematic deviations typical of human responses, such as heavy tails and volatility clustering, which are critical to understanding complex human patterns.
The MMAR meanwhile effectively captures the key stylized facts from markets, as it was intentionally developed to showcase features such as volatility clustering, 1/f² noise, and heavy tails in changes. However, it does not predict the negative autocorrelations in log changes at the initial lags observed in the individual-level experiments. In contrast, the Heuristic Switching model is adept at generating volatility clustering, 1/f² noise, and negative autocorrelations for the early lags, yet falls short in reproducing the heavy tails in changes.
The MC³ sampler was able to qualitatively replicate all four statistical features of the psychological time series: model predictions show little autocorrelation in their changes, 1/f² noise in their spectral density, deviations from Gaussian tails, and (limited) volatility clustering. This is, however, a purely qualitative comparison, so we next consider the quantitative fit of these models to our collected data.
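The comparisons above rest on the four statistics plotted in Figs. 2 and 3. As a minimal sketch of how such statistics can be computed from a single response series (the particular estimators below, e.g., the least-squares spectral slope, are illustrative choices rather than our exact analysis pipeline):

```python
import numpy as np

def stylized_fact_stats(series, max_lag=10):
    """Four summary statistics for a positive response series:
    spectral slope of the log-series (about -2 for 1/f^2 noise),
    excess kurtosis of log changes (positive for heavy tails),
    lag-1 autocorrelation of log changes (near zero if unpredictable),
    and mean autocorrelation of absolute log changes (volatility clustering)."""
    log_s = np.log(np.asarray(series, dtype=float))
    d = np.diff(log_s)

    # Least-squares slope of the log-log periodogram (zero frequency dropped).
    freqs = np.fft.rfftfreq(len(log_s))[1:]
    power = np.abs(np.fft.rfft(log_s - log_s.mean()))[1:] ** 2
    slope = np.polyfit(np.log(freqs), np.log(power), 1)[0]

    def acf(x, lag):
        return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])

    kurtosis = float(((d - d.mean()) ** 4).mean() / d.var() ** 2 - 3.0)
    return {
        "spectral_slope": float(slope),
        "excess_kurtosis": kurtosis,
        "acf_changes_lag1": acf(d, 1),
        "volatility_clustering": float(np.mean([acf(np.abs(d), k)
                                                for k in range(1, max_lag + 1)])),
    }
```

Applied to a plain Gaussian random walk in log space, these statistics recover a slope near −2 with roughly zero excess kurtosis and autocorrelation, which is exactly the benchmark against which the human deviations are measured.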

Model comparisons
We fitted the 10 models to data from Experiments 2 and 3 for each individual, following the procedure outlined in Spicer, Zhu, Chater, and Sanborn (in press) and described in Appendix A. In short, model fitting used an Approximate Bayesian Computation (ABC) process to determine the correspondence between the key statistics of model predictions and human data, providing an approximation of the marginal likelihood of each model for each participant. In addition, we performed an alternate analysis using the machine learning technique of 'Random Forests', which are trained directly on model simulations to distinguish between candidate models, and so predict the most likely model for each participant based on their individual statistics (Pudlo et al., 2016). Table 1 summarizes the fitting results: the MC³ sampling algorithm generally outperforms the optimal, heuristic, and statistical models in both Experiments 2 and 3, though a subset of participants in the time estimation task are better fit by the MMAR.
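The core of the ABC step can be conveyed with a minimal rejection-ABC sketch (the Euclidean distance, tolerance eps, and toy simulator here are illustrative assumptions; the actual procedure, including the summary statistics used, is described in Appendix A):

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_log_marginal(observed_stats, simulate, sample_prior,
                     n_sims=2000, eps=1.0):
    """Rejection ABC: the fraction of prior simulations whose summary
    statistics land within eps of the observed statistics approximates the
    model's marginal likelihood (up to a constant shared across models)."""
    obs = np.asarray(observed_stats, dtype=float)
    accepted = 0
    for _ in range(n_sims):
        theta = sample_prior()                    # parameters from the prior
        sims = np.asarray(simulate(theta), dtype=float)
        if np.linalg.norm(sims - obs) < eps:      # distance on summary stats
            accepted += 1
    # Floor at one acceptance so the log is finite for poorly fitting models.
    return np.log(max(accepted, 1)) - np.log(n_sims)
```

Summing these per-participant log marginal likelihoods across participants, and counting how many participants each model fits best, yields the comparisons reported in Table 1.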

Discussion and conclusions
We have presented a candidate model for explaining the stylized facts, but even without a unified theory of stylized facts in both psychology and finance, the current empirical findings suggest the benefits of a cognitive perspective on market fluctuations which may have been overlooked in traditional economic theories. Our approach is also complementary to both existing market-based pricing theories and the empirical finance literature that relates price changes to macroeconomic events using real-world data, where it has been repeatedly demonstrated that news (i.e., changes in fundamental values) can explain only a small fraction of price variation: at most about a third of the variance is explicable by economic influences (Cornell, 2013; Cutler, Poterba, & Summers, 1989; Roll, 1984; Shiller, 1981). While it is empirically difficult to connect price changes to fundamental economic news, it is also methodologically difficult to identify causes of price changes when dealing only with real-world data, especially when testing a psychological hypothesis: there are simply too many confounding factors jointly influencing the market that need to be properly identified and controlled in order to arrive at meaningful conclusions. To this end, by decomposing the complex real-world problem into simulated markets and even simpler tasks, our laboratory studies offer a unique and useful perspective in helping to identify the causal link between psychology and price changes. For example, comparing our group and individual prediction tasks suggests that market-like interactions can amplify the volatility clustering arising from individual cognitive fluctuations.
Certainly, our data deviate from the standard assumption in psychology and economics that "noise" is independent and Gaussian-distributed, and it is surprising that the deviations matched the stylized facts shown in markets. Still, our analysis is inevitably tentative because we cannot estimate the proportion of the variance in price changes that can be attributed to individual psychological fluctuations. Even so, such experiments present a relatively simple and low-cost method for continued exploration of the potential causes of price changes, particularly at smaller scales.
Delving into individual cognition and interactions might also enhance our understanding of how individuals incorporate information from others. Generally, our results support the idea that individuals employ sophisticated sampling algorithms to approximate Bayesian solutions, in both the price prediction task and the time estimation task with a varying temporal target. Indeed, the hypothesis that individuals make stochastic approximations offers a potential explanation for market phenomena not addressed by our experiments, such as greater market volatility than the efficient market hypothesis predicts. This discrepancy, noted by Shiller (1981) and LeRoy and Porter (1981), might stem from cognitive fluctuations impacting market dynamics. Translating individual behavior to group settings is a challenging problem, however: it is uncertain whether simple aggregation of independent sampling algorithms is sufficient to capture group-level behavior (see Appendix A). Indeed, existing market models such as the Multifractal model considered above may still be better placed to capture these series, given that they were specifically developed to capture the statistical properties of financial markets. Such models do not, however, offer any clarity on the cognitive processes of individual market members, limiting their explanatory power. Further work is thus required to explore the relationship between individual and group-level behavior, with small-scale experimental markets such as those used here being a valuable paradigm for continued investigation.
This work also opens up an interesting avenue for research in other areas of psychology: analyses developed for financial time series are also applicable to individual psychological data, revealing the "signatures" of cognition. While research has examined power spectra in different experimental tasks (Gilden, 2001), there has been less focus on heavy tails, volatility clustering, and other phenomena. We believe that exploring the full range of stylized facts will be important not just for understanding the boundary conditions of the present experimental results, but also for providing new insights into underlying cognitive mechanisms, as well as a powerful constraint on computational cognitive models (Sanborn et al., in press). This is not to say that our present results are conclusive, however: even within these experiments, we observe deviations in the statistics between tasks, most notably the weaker volatility clustering seen in the individual settings, which warrants further examination. In addition, while our model comparison suggests general support for sampling models of cognition in these tasks, this support is not unanimous across participants, and naturally only a subset of the available models was considered, though contrasts with other models of individuals have shown similar support for sampling explanations (Spicer, Zhu, Chater, & Sanborn, in press). Finally, there are still questions to be answered about how individual cognition translates to larger group settings: even where common features are observed at the individual and group levels, it is unclear how these features might survive aggregation across large numbers of members (though see Online Supplementary Information for some potential methods). We thus hope that this study offers a starting point for further study of the interplay between individual cognition and large-scale market behavior.
In sum, our findings suggest that closer attention should be paid to the role of cognitive fluctuations in macroeconomic activity. The spontaneous emergence of heavy tails and volatility clustering in cognition is particularly worrisome given its potential to contribute to financial disasters. However, it also suggests that interventions that improve the independence or accuracy of repeated estimates, such as slowing down decision making or considering problems from multiple perspectives (Herzog & Hertwig, 2014), could help alleviate these issues, though first investigations are needed into how these interventions influence the stylized facts. We see this endeavor as an interdisciplinary one which will benefit from dialogue between psychology, economics, statistics, computer science and beyond.

Fig. 2. Comparing statistical behaviors of financial (daily close of the S&P 500 for the 32-year period Jan 1986 to Jun 2018) and psychological time series (Experiment 1: group price predictions; Experiment 2: individual price predictions; Experiment 3: individual time estimates). (a) Power spectral density (95% CIs shaded around the solid blue line as mean); dashed lines denote Brown noise (1/f² noise). (b) Empirical probability density function in quantile-quantile plot (95% CIs shaded in empirical quantiles); dashed horizontal lines denote normal distributions. (c) Autocorrelation function of the logarithm of successive changes (95% CIs shaded around correlation coefficients at each lag); dashed horizontal lines denote no autocorrelation. (d) Autocorrelation function of volatility, or absolute log changes (95% CIs shaded around correlation coefficients at each lag); dashed horizontal lines denote no autocorrelation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Comparing prior predictive distributions of model behaviors. From left to right, each panel displays the power spectra (with the black dashed line representing 1/f² noise), quantile-quantile plots of log changes (with the black dashed line representing Gaussian distributions), autocorrelation functions of log changes (with the black dashed line representing no autocorrelation), and autocorrelation functions of volatility (with the black dashed line representing no autocorrelation). Shaded regions denote 95% confidence intervals across participants/simulations.

Table 1
Summary of model fitting results. Note: Summed marginal log likelihoods from each participant in Experiments 2 and 3 for each candidate model, and the number of participants best fit by that model.