A description–experience gap in statistical intuitions: Of smart babies, risk-savvy chimps, intuitive statisticians, and stupid grown-ups

Comparison of different lines of research on statistical intuitions and probabilistic reasoning reveals several puzzling contradictions. Whereas babies seem to be intuitive statisticians, surprisingly capable of statistical learning and inference, adults' statistical inferences have been found to be inconsistent with the rules of probability theory and statistics. Whereas researchers in the 1960s concluded that people's probability updating is "conservatively" proportional to normative predictions, probability updating research in the 1970s suggested that people are incapable of following Bayes's rule. And whereas animals appear to be strikingly risk savvy, humans often seem "irrational" when dealing with probabilistic information. Drawing on research on the description-experience gap in risky choice, we integrate and systematize these findings from disparate fields of inquiry that have, to date, operated largely in parallel. Our synthesis shows that a key factor in understanding inconsistencies in statistical intuitions research is whether probabilistic inferences are based on symbolic, abstract descriptions or on the direct experience of statistical information. We delineate this view from other conceptual accounts, consider potential mechanisms by which attributes of first-hand experience can facilitate appropriate statistical inference, and identify conditions under which they improve or impair probabilistic reasoning. To capture the full scope of human statistical intuition, we conclude, research on probabilistic reasoning across the lifespan, across species, and across research traditions must bear in mind that experience and symbolic description of the world may engage systematically distinct cognitive processes.


Introduction
Why are grown-ups often so stupid about probabilities when even babies and chimps can be so smart?
-Alison Gopnik

As developmental psychologist Gopnik (2014) noted in an article written for The Wall Street Journal, juxtaposing distinct lines of research on statistical intuitions reveals several puzzling inconsistencies. On the one hand, recent studies on infant cognition have shown that babies possess a remarkable ability to absorb, process, and apply everyday statistical information. In their first year of life, infants are capable of drawing accurate inferences from samples to populations (e.g., Xu & Garcia, 2008), integrating physical information about objects when making statistical inferences (e.g., Denison & Xu, 2010a; Téglás, Girotto, Gonzalez, & Bonatti, 2007), and taking into account information about the intentions of sampling agents (e.g., Xu & Denison, 2009). Likewise, recent research on animal cognition suggests that several nonhuman primates and some parrot species share human infants' ability to make adequate statistical inferences (e.g., Bastos & Taylor, 2020; De Petrillo & Rosati, 2019; Eckert, Rakoczy, Call, Herrmann, & Hanus, 2018; Rakoczy et al., 2014; Tecwyn, Denison, Messer, & Buchsbaum, 2017). On the other hand, research conducted in the heuristics-and-biases tradition has found that adults' statistical inferences are often inconsistent with the rules of probability theory and statistics (e.g., Tversky & Kahneman, 1974). For nearly five decades, this research tradition has profoundly shaped scholars' conception of adults' statistical intuitions in psychology, economics, and beyond, concluding that "intuitive judgments of all relevant marginal, conjunctive, and conditional probabilities are not likely to be coherent, that is, to satisfy the constraints of probability theory" (Tversky & Kahneman, 1983, p. 313).
Yet just a few years before the emergence of heuristics-and-biases research, a seminal review of studies on adult statistical inference-tellingly titled "Man as an intuitive statistician"-had come to the opposite conclusion: that "probability theory and statistics can be used as the basis for psychological models that integrate and account for human performance in a wide range of inferential tasks" (Peterson & Beach, 1967, p. 29). In short, several decades of research from different lines of inquiry have reached surprisingly inconsistent conclusions, despite addressing a common question: How good are humans' (and other animals') statistical intuitions?
In this article, we aim to systematize and synthesize this body of research, bringing together research areas that have, to date, operated largely in parallel. We argue that a key factor in understanding incongruities in research on statistical intuitions (though, of course, not the only one, see 4. Other Possible Contributors to Inconsistencies in Research on Statistical Intuitions) is whether the experimental paradigms used to measure these intuitions offer authentic experience of the relevant probabilistic information or descriptive summary representations thereof. By experience, we mean situations that involve interaction with the environment to reach an understanding of its statistical structure-for instance, by sampling information sequentially, making repeated judgments with or without feedback, or actively operating or observing the experimental microworld. In description, by contrast, a full or partial symbolic summary representation of the situation is provided (see Hertwig, Hogarth, & Lejarraga, 2018). To appreciate this difference, consider how experimenters typically communicate with adults and with babies or nonhuman animals. One of the greatest cultural achievements of humankind is the capacity to communicate by means of written symbols, a powerful form of self-expression that makes the spoken word permanent and enables humans to draw on the accumulated knowledge of others (Schmandt-Besserat, 1996). Infants do not yet possess this ability to process and produce symbolic representations of the world. Therefore, whereas the experimental protocols used with adults typically involve symbolic (often text-based) description of information, the protocols used with infants and, similarly, with animals tend to involve first-hand experience of information.
In nearly all cases, the description-experience dimension has been studied within a single developmental period-typically adulthood. We propose that the distinction between description and experience may be one key to understanding otherwise puzzling behavioral patterns across development. Furthermore, we argue that disparities in research findings on animal versus human cognition-and indeed in findings on probabilistic reasoning in research from the 1960s versus 1970s-can be better understood by considering the degree of experiential involvement inherent in the respective experimental protocols. In the following, we first review findings from diverse domains of probabilistic reasoning research and highlight how distinguishing between description and experience can forge bridges between these largely unrelated fields, reconciling their seemingly inconsistent findings. Second, we discuss empirical evidence supporting a significant role of the description-experience dimension in determining people's statistical intuitions. Third, we delineate our framework from other views on inconsistencies in probabilistic reasoning research. Fourth, we summarize characteristics that define experience, identify the mechanisms by which they facilitate appropriate statistical inference, and consider the conditions under which attributes of experience can improve or impair probabilistic reasoning. Finally, we outline implications for the development of broadly informed paradigms in statistical intuitions research, for theory integration, and for educational applications.

Babies' intuitive understanding of statistics
A recent surge in studies on infants' probabilistic reasoning (for a review, see Denison & Xu, 2019) has fueled the development of novel paradigms for testing their statistical intuitions. Fig. 1 illustrates one such experimental paradigm, in which colored objects are sequentially sampled from a box, the content of the box is revealed, and infants' looking time is measured. Based on the premise that infants look at unexpected or novel events for longer than at expected or known events, several studies have found that babies form adequate expectations about the underlying population from the randomly drawn sample (Denison, Reed, & Xu, 2013; Denison & Xu, 2010a; Xu & Denison, 2009; Xu & Garcia, 2008). Likewise, infants have been shown to consider the properties of a population when inferring the likelihood that a single- or a multi-object sample will be drawn from that population (Kayhan, Gredebäck, & Lindskog, 2018; Téglás et al., 2007; Xu & Garcia, 2008). Moreover, infants use statistical information to anticipate probabilistic outcomes (Téglás & Bonatti, 2016), to make inductive generalizations (Gweon, Tenenbaum, & Schulz, 2010), to differentiate between the relative likelihoods of two simultaneous events (Kayhan et al., 2018), and to guide their exploratory actions (Sim & Xu, 2017) and preferential choices (Denison & Xu, 2010b). Experiments using this last preferential choice task have shown that infants use proportional rather than quantitative reasoning to make statistical inferences, two possibilities that are confounded in the setup illustrated in Fig. 1. Infants also appear to integrate information about the physical properties of objects (e.g., mobility, location, density, and speed of movement) when making statistical inferences (Denison, Trikutam, & Xu, 2014; Denison & Xu, 2010a; Lawson & Rakison, 2013; Téglás et al., 2007; Téglás, Ibanez-Lillo, Costa, & Bonatti, 2015; Téglás, Vul, Girotto, Gonzalez, Tenenbaum, & Bonatti, 2011).
For example, if some balls in the paradigm illustrated in Fig. 1 can be sampled without constraints whereas others are fixed in place and cannot be sampled, infants disregard the immovable subset when making statistical inferences (Denison & Xu, 2010a).
Further, infants are sensitive to the randomness of the sampling process and integrate attributes of the sampling agent (e.g., expressed preferences or visual access) when making inferences about object populations and inductive generalizations about object properties (Attisano & Denison, 2020; Gweon et al., 2010; Xu & Denison, 2009). They can infer other people's preferences from the statistical properties of their actions (Kushnir, Xu, & Wellman, 2010; Ma & Xu, 2011; Wellman, Kushnir, Xu, & Brink, 2016) and attribute the observation of nonrandom samples to intentional agency (Ma & Xu, 2013). Infants also use statistical information to infer the cause of a failed action: When given a dysfunctional toy, infants attribute its failure to work to either the toy or another person, depending on whether or not the toy's properties differ from those of the other person's (functional) toy (Gweon & Schulz, 2011). Table 1 summarizes this research on infants' statistical intuitions, describing the experimental protocol used in each study.

Adults' error-prone statistical intuitions
Results from heuristics-and-biases research paint a different picture. Research in this tradition has identified numerous errors-often called cognitive illusions-that adults tend to commit when judging the probability of an uncertain event, gauging the value of an uncertain quantity, or deciding between risky prospects. For instance, adults commit the conjunction fallacy, do not properly account for sample sizes, overestimate the prevalence of easily recalled events, fail to make full use of base-rate information, do not anticipate regression toward the mean, are susceptible to a problem's reference frame, are overconfident, and violate various axioms of expected utility theory (Fischhoff, Slovic, & Lichtenstein, 1977; Kahneman & Tversky, 1972, 1973; Tversky & Kahneman, 1974, 1981, 1983). What explains this inconsistency between infants' and adults' statistical inference skills? Gopnik (2014) suggested that infants' "intuitive, unconscious statistical ability may be completely separate from [adults'] conscious reasoning." This perspective echoes a dual-system view frequently invoked in heuristics-and-biases research-except that there, reasoning errors are typically attributed to the intuitive system (Kahneman, 2011; Kahneman & Frederick, 2002), which Gopnik sees as the very engine of infants' statistical competence (see also Oaksford & Hall, 2016). Several explanations are conceivable and it is likely that more than one factor is at play. One explanation that has not yet been thoroughly considered is the striking and systematic difference in the experimental protocols used in studies with adults versus infants. As summarized in Table 1, participants in infant studies typically experience the probabilistic texture of the experimental microworld at first hand. In the paradigm illustrated in Fig. 1, for instance, the child observes each sample being drawn one-by-one. Participants in adult studies, in contrast, typically receive description-based and symbolic task representations. In fact, nearly all classic tasks that demonstrate cognitive illusions in adults' statistical inferences are text based. No experiential engagement with raw data is required.

Fig. 1. Illustration of one experimental paradigm used in research on infants' statistical intuitions (e.g., Xu & Garcia, 2008). The experimenter draws several colored balls (e.g., red or white) from an opaque box without looking, the content of the box is revealed, and infants' looking time is recorded. If the sample drawn does not reflect the object distribution in the box (unexpected outcome, bottom left panel), infants tend to look at the contents of the box for longer than if the sample is consistent with the object distribution (expected outcome, bottom right panel). Adapted from Gopnik (2012).

Table 1.
Study | Age (months) | Population | Sampling procedure and measure | Presentation
Téglás and Bonatti (2016) | 12.5 | A striped ball moving inside a rectangular frame with 3 openings on one vertical side and 1 opening on the opposite side | The ball moved inside the frame until it exited at the side with 3 openings (probable outcome) or 1 opening (improbable outcome); eye-tracking and VoE | Movies on screen
Repeated sampling from population
Attisano and Denison (2020) | 6.5 or 9.5 | Different ratios of red balls and yellow rubber ducks in one box | One toy was grasped repeatedly by either the experimenter or a mechanical claw in up to 14 habituation trials; at test, 3 toys of the same type were sampled by either the experimenter with visual access to the population or a mechanical claw, the population was then revealed; VoE | Puppet stage
Denison et al. (2013) | 4.5 or 6 | Different ratios of yellow and pink ping-pong balls in two boxes | 5 balls were drawn from one of two populations in two sequential actions ( [...]
[...] | Direct interaction
Ma and Xu (2011) | 16.5 or 26.5 | Different ratios of boring (e.g., white wooden cubes) and interesting (e.g., orange slinkies with pumpkin-face print) toys in one jar | 6 boring toys were sampled sequentially (one at a time) and infants' choice behavior was coded ("Can I have the one I like?") | Direct interaction
Ma and Xu (2013) | 9.5 | Yellow and red ping-pong balls (ratio 2:1) in one jar | 9 balls were sampled sequentially (one at a time) in each sampling event by either a human hand or mechanical claw; 3 sampling events per trial; VoE | Puppet stage
Sim and Xu (2017) | 13 | Six balls of different colors (red, purple, blue, green, yellow, and orange) in one box (VoE) or each of two boxes (choice task) | 4 balls of either the same or different colors were sampled sequentially (one at a time) with replacement; VoE or exploratory choice ("Do you want to come and play?") | Puppet stage
Wellman et al. | 10 | Different ratios of blue and red balls in one box | 5 balls of the same color were drawn repeatedly from the population in three sequential actions (2, 2, 1 ball[s] at a time) in up to 8 habituation trials; at test, the experimenter selected one type of ball; VoE | Puppet stage
Xu and Denison (2009); Xu and Garcia (2008) | 11; 8 | Different ratios of red and white ping-pong balls in (what appeared to be) two boxes | 5 balls were drawn sequentially from one of two populations (either in two actions, Xu & Denison, 2009; or one at a time, Xu & Garcia, 2008) and the population was then revealed. In Xu and Denison (2009), whether or not sampling was random was varied (with or without visual access to the population during sampling). In Xu and Garcia (2008; Exp. 4-6), the population was revealed before the 5 balls were sampled; VoE | Puppet stage
Note. In addition to the experimental procedures being experience based, in all but two studies (Gweon et al., 2010; Kayhan et al., 2018), infants were also familiarized with (i.e., had prior exposure to) the sampling materials and sampling procedure used. VoE: Violation-of-expectation looking paradigm.
* This study differs from the others in that it involved sampling of "actions" (i.e., repeated demonstrations) rather than repeated sampling of objects.

Table 2. Cognitive biases summarized by Tversky and Kahneman (1974).
Bias and task used to demonstrate it | Example procedure and study

Insensitivity to prior probability of outcomes
Ranking of outcomes based on likelihood versus similarity | Text-based instruction to rank nine fields of graduate specialization in order of the likelihood that a hypothetical person described in a personality sketch is a student in these fields
Judgment of the likelihood of outcomes in the presence of prior versus individuating evidence | Engineer-lawyer problem: Text-based instruction to judge the probability that a thumbnail description of a professional individual, sampled at random from a population of 30 engineers and 70 lawyers (or vice versa), belongs to one of the engineers in the population

Insensitivity to sample size
Production of sampling distributions for different sample sizes | Text-based instruction to produce a sampling distribution of the average height of men examined at a regional induction center with seven categories (up to 160 cm, 160-165 cm, …, more than 185 cm) and for different sample sizes (N = 10, 100, or 1000; Kahneman & Tversky, 1972)
Judgment of the likelihood of sampling outcomes contingent on sample size | Maternity-ward problem: Text-based instruction to judge which of two hospitals, a smaller or a larger one, is more likely to have recorded a higher number of days on which over 60% of babies born were boys (Kahneman & Tversky, 1972)

[...] | Text-based instruction to experienced research psychologists to judge the probability of a successful replication of a significant result in a sample smaller than the one that produced the original finding (Tversky & Kahneman, 1971)

Insensitivity to predictability
Numerical prediction of a remote criterion versus evaluation of inputs | Text-based instruction to judge the professional standing of school teachers based on descriptive summaries of their performance in a practice lesson held 5 years earlier

The illusion of validity
Confidence in prediction from consistent versus inconsistent evidence | Text-based instruction to report confidence in predicting grade point averages based on pairs of aptitude tests that produced either consistent scores and were described as correlated or inconsistent scores and were described as uncorrelated

Misconceptions of regression
Anticipation and interpretation of regression toward the mean | Text-based instruction to interpret a description of a flight maneuver training situation in which verbal rewards for successes resulted in subsequent performance loss

Biases due to the retrievability of instances*
Judgment of the frequency of a class based on the familiarity versus occurrence rate of its instances | Sequential presentation of a recorded list of names of more or less famous men and women and verbal instruction to judge whether the list contained more men's names or women's names

Biases due to the effectiveness of a search set
Judgment of word frequency | Text-based instruction to judge the relative likelihood with which various letters of the alphabet (e.g., K, L, N, R, V) appear in the first and third positions in words in the English language

Biases of imaginability
Estimation of the numerosity of possible combinations | Text-based instruction to estimate the number of all possible distinct committees of various sizes (between 2 and 8 members) that can be formed from a set of 10 people

Illusory correlation*
Judgment of the frequency of co-occurrence of events | Sequential presentation of a recorded list consisting of highly related (knife-fork) and unrelated (head-fork) word pairs. Half the pairs were repeated three times in the recording; the other half, twice. Text-based instruction to judge the frequency with which each word pair was presented (from a written list of all pairs)

Insufficient adjustment
Estimation of quantities relative to an uninformative number | Verbal instruction to estimate the percentage of African countries in the UN relative to an arbitrary number between 0 and 100 generated by spinning a wheel of fortune in the participant's presence (Tversky & Kahneman, 1974)
Estimation of quantities based on an incomplete computation | High school students estimated, within 5 seconds, the product of a numerical expression written on the blackboard

Biases in the evaluation of conjunctive and disjunctive events
Choice among gambles with elementary, conjunctive, and disjunctive events | Repeated choice between described pairs of gambles, one elementary (draw of one marble from a mixed urn) and one compound, either conjunctive (multiple successive draws of the same color marbles) or disjunctive (multiple successive draws in which a particular color marble is drawn at least once). At the end of the experiment, one gamble chosen by the participant was played out (Bar-Hillel, 1973)

Anchoring in the assessment of subjective probability distributions
Elicitation of subjective probability distributions | Text-based instruction to assess the 10th or 90th percentile in the distribution of subjective beliefs about the air distance from New Delhi to Peking (group 1); or to assess the odds that the median air distance from New Delhi to Peking given by group 1 exceeds the true value (Tversky & Kahneman, 1974)

* Cognitive biases demonstrated with experience-based experimental protocols.
For instance, adults' responses to the Linda task have been interpreted as evidence that their reasoning is at odds with the conjunction rule, "[p]erhaps the simplest and the most basic qualitative law of probability" (Tversky & Kahneman, 1983, p. 293). In this task, the conjunction rule is embedded within a description of a hypothetical woman and her possible occupations and avocations. Many other statistical inference problems and choice tasks-such as the engineer-lawyer problem, the maternity ward problem, the Asian disease problem (Kahneman & Tversky, 1972, 1973; Tversky & Kahneman, 1981), as well as monetary lotteries giving rise to various violations of expected utility theory (Kahneman & Tversky, 1979)-likewise use a description-based approach. In their foundational article in Science, Tversky and Kahneman (1974) summarized over a dozen putative cognitive biases observed in studies using a variety of tasks, as listed in Table 2. All but two of those tasks (see asterisks in Table 2) were solely description based.
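The normative answer to the maternity-ward problem mentioned above can be computed exactly from the binomial distribution. The sketch below is only an illustration: the hospital sizes (15 and 45 births per day) and the 50% probability of a boy are assumptions chosen for the example, and the function name is hypothetical. It shows why the smaller hospital should record more days on which over 60% of births are boys.

```python
from math import comb

def prob_share_exceeded(n_births: int, share_percent: int = 60, p_boy: float = 0.5) -> float:
    """Exact binomial probability that strictly more than share_percent
    of n_births are boys on a given day."""
    return sum(
        comb(n_births, k) * p_boy**k * (1 - p_boy) ** (n_births - k)
        for k in range(n_births + 1)
        # Integer comparison avoids floating-point trouble at the threshold.
        if 100 * k > share_percent * n_births
    )

# Assumed daily birth counts (not taken from the text above).
small_hospital = prob_share_exceeded(15)
large_hospital = prob_share_exceeded(45)

print(f"small hospital: {small_hospital:.3f}")
print(f"large hospital: {large_hospital:.3f}")
```

Under these assumptions the smaller hospital exceeds the 60% threshold on a clearly larger share of days, because proportions in small samples vary more around 50%.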

Are People Intuitive Statisticians or Not Statisticians at All?
Notably, Tversky and Kahneman's (1974) article, which introduced the notion of heuristics-and-biases, was published just 7 years after Peterson and Beach (1967) reviewed several decades of experimental research (more than 160 experiments) on adult statistical inference-and concluded that the normative benchmarks of probability theory and statistics indeed provide a good initial description of people's statistical abilities: that people are intuitive statisticians. Like Lejarraga and Hertwig (2021), we suggest that this inconsistency across research traditions may also be attributable, at least in part, to differences in the experimental protocols used. Heuristics-and-biases research marked a turning point not only in the conceptualization and evaluation of people's ability to reason probabilistically but also in experimental protocols (see also Hertwig et al., 2018). Consider, for instance, research on Bayesian reasoning-the pinnacle of statistical inference and a key building block in classic models of rational choice-conducted in these two influential research traditions. In the 1970s, research conducted in the heuristics-and-biases tradition, primarily using text-based scenarios to present all available information (see top panel of Fig. 2), concluded that Bayes's theorem failed to describe the workings of the mind: "[i]n his evaluation of evidence, man is apparently not a conservative Bayesian: he is not Bayesian at all" (Kahneman & Tversky, 1972, p. 450). The term "conservative Bayesian" harks back to Edwards (1968), who had used experience-based protocols that required sequential updating of probability estimates (see lower panel of Fig. 2) and concluded that people's probability updating, albeit "conservative" (beliefs are revised less strongly than prescribed by Bayes's theorem), was proportional to the normative values (Edwards, 1968; Edwards, Lindman, & Phillips, 1965).
These conflicting views and the shift in experimental protocols are surprising. In a quantitative assessment of the methodological protocols used in more than 600 empirical studies, Lejarraga and Hertwig (2021) recently demonstrated that heuristics-and-biases research established a new experimental protocol, with a strong emphasis on described scenarios, replacing the past emphasis on experiential experimental protocols in work under the intuitive statistician umbrella. Moreover, they showed that this shift in experimental culture continues to have a profound influence on empirical research on probabilistic reasoning and human rationality.

Fig. 2. Illustrative comparison of paradigmatic description- and experience-based Bayesian inference problems. In the description-based task (upper panel), participants read a summary of an inference problem and return a single probability judgment about the aggregate evidence (adapted from Kahneman & Tversky, 1972). In the experience-based task (lower panel), participants are asked to imagine two populations and then sequentially experience the actual presentation of evidence from one, randomly selected, population and return a revised probability judgment after each new observation (see, e.g., Edwards et al., 1965). In both example scenarios, the solution prescribed by Bayes's theorem is 97%.
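To make the contrast between a single aggregate judgment and sequential updating concrete, the following sketch applies Bayes's theorem one observation at a time. The task details are assumptions, since the figure content is not reproduced here: two populations with 70% and 30% red chips, equal priors, and a sample of 8 red and 4 blue chips in a hypothetical order, which yields the 97% quoted in the caption. The `damping` parameter is purely illustrative, a crude way to mimic "conservative" updating by shrinking each likelihood ratio toward 1; it is not Edwards's model.

```python
def bayes_update(prior_a: float, lik_a: float, lik_b: float) -> float:
    """Posterior probability of population A after one observation."""
    numerator = prior_a * lik_a
    return numerator / (numerator + (1 - prior_a) * lik_b)

def sequential_posteriors(draws, p_red_a=0.7, p_red_b=0.3, prior_a=0.5, damping=1.0):
    """Posterior for population A after each draw ('R' = red, 'B' = blue).

    damping = 1.0 is exact Bayesian updating; damping < 1 raises each
    likelihood to that power, so beliefs move less than Bayes prescribes.
    """
    posterior = prior_a
    trajectory = []
    for draw in draws:
        lik_a = p_red_a if draw == "R" else 1 - p_red_a
        lik_b = p_red_b if draw == "R" else 1 - p_red_b
        posterior = bayes_update(posterior, lik_a**damping, lik_b**damping)
        trajectory.append(posterior)
    return trajectory

draws = list("RRBRRBRRBRRB")  # assumed order: 8 red, 4 blue
bayesian = sequential_posteriors(draws)
conservative = sequential_posteriors(draws, damping=0.5)

print(f"Bayesian posterior after all draws:     {bayesian[-1]:.3f}")
print(f"conservative posterior after all draws: {conservative[-1]:.3f}")
```

With these assumed values the final Bayesian posterior depends only on the difference between red and blue draws, not on their order, whereas the damped updater ends up well short of it while still moving in the same direction.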

Of Smart Animals and "Irrational" Humans
A final inconsistency in research on statistical intuitions concerns the extent to which probabilistic reasoning competences (or the lack thereof) are uniquely human. The origins of statistical abilities have, in fact, been traced beyond human ontogeny. Several nonhuman primates, including great apes, capuchin monkeys, and rhesus macaques, have been shown to draw flexible statistical inferences from populations to samples (De Petrillo & Rosati, 2019; Eckert, Call, Hermes, Herrmann, & Rakoczy, 2018; Rakoczy et al., 2014; Tecwyn et al., 2017; but see Placì, Eckert, Rakoczy, & Fischer, 2018) and, to a more limited extent, from samples to populations (Eckert, Rakoczy, & Call, 2017). Likewise, the ability to make probabilistic inferences has been demonstrated in different bird species (Bastos & Taylor, 2020; Clements, Gray, Gross, & Pepperberg, 2018) and some birds integrate physical information about objects when making statistical inferences (Bastos & Taylor, 2020). Finally, like babies, both chimpanzees and a bird species have been shown to integrate information about the psychological states of the sampling agents (humans), such as expressed preferences or visual access, when drawing statistical inferences (Bastos & Taylor, 2020). It seems that "[i]ntuitive statistics in its most basic form is thus an evolutionarily more ancient rather than a uniquely human capacity" (Rakoczy et al., 2014, p. 60). Table 3 summarizes representative examples of this work on animals' statistical intuitions and describes the experimental procedure used. Because most animals, unlike humans, are not able to process and produce symbolic representations, animals' statistical intuitions and studies thereof are based on experience rather than description.
Assuming that humans' ability to cope with an uncertain world has evolved from animals' capacities to face volatile, competitive, and hard-to-predict environments, we might expect human and animal behavior to converge when humans are tested in situations in which they, too, make decisions from experience rather than description. Indeed, it has been found that when animals and humans accumulate information via direct experience, their risk sensitivity is predicted by a measure of risk per unit of return-the coefficient of variation-rather than by outcome variance, which is typically used in normative economic models. Similarly, the divergence between humans' and other animals' tendency to show a "certainty effect" in repeated risky choice could be resolved by taking into account differences in the experiential aspect of perceptual precision between the respective paradigms (Shafir, Reich, Tsur, Erev, & Lotem, 2008). Whereas animals typically experience primary rewards whose exact quantity is difficult to discern when consumed (e.g., water), humans often receive monetary rewards that can be precisely differentiated. When the perceptual noise in the presentation of rewards was increased, humans showed the same pattern of choices as animals; when it was decreased, honeybees displayed the same choice pattern as humans (Shafir et al., 2008). Finally, when making risky foraging decisions, "bumblebees underperceive rare events and overperceive common events" (Real, 1991, p. 985), which is the reverse of the inverse S-shaped probability weighting observed in description-based risky choices in humans (Kahneman & Tversky, 1979). When humans, like bees, rely on sampled experience, their weighting pattern also appears to reverse (when conditioned on objective probabilities; Regenwetter & Robinson, 2017) or to indicate more linear weighting (Wulff et al., 2018). Thus, when compared on equal terms, animals, infants, and adults in research conducted in the 1960s and 1970s no longer seem so far apart in terms of the quality of their probabilistic reasoning.

Table 3.
Study | Species | Population | Sampling procedure | Presentation
[...] | Kea | Different ratios of rewarding and unrewarding tokens in two transparent jars. In Exp. 2, a physical barrier divided the sampling population, impeding sampling from part of the population | One object was drawn from each population and animals made a single forced choice between the samples drawn while they were still concealed from view. In Exp. 3, whether or not sampling was random was varied (with or without visual access to the population during sampling) and the two samples were drawn by different experimenters | Direct interaction
Clements et al. | Grey parrot | 3:1 ratio of two different types of objects (e.g., 3 corks and 1 piece of paper), visibly placed in an opaque bucket | One object was randomly drawn from the population. The animal vocally identified the object drawn while it was still concealed from view | [...]
[...] | Long-tailed macaques | [...]
Rakoczy et al. | Great apes | [...]
Tecwyn et al. | Capuchin monkeys | [...]
Repeated sampling from population
Eckert et al. | Great apes | Different ratios of favorable (fruit pellets) and neutral (carrot pieces) food items in two transparent containers | Two populations were shown to animals, then occluded, and the positions shuffled. Next, multiple food items were simultaneously drawn from each population in one action (Exp. 1: five items from each population; Exp. 2: three or five vs. 12 items from each population). Animals made a single forced choice between the occluded populations | Direct interaction
Note. VoE: Violation-of-expectation looking paradigm.
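The difference between outcome variance and the coefficient of variation as risk measures, mentioned above, can be illustrated with a small numerical sketch. The two options and their equiprobable outcomes are hypothetical values chosen for the example; the point is only that two options with identical variance can differ sharply in risk per unit of return.

```python
from statistics import mean, pstdev, pvariance

def coefficient_of_variation(outcomes):
    """Standard deviation divided by the mean: risk per unit of return."""
    return pstdev(outcomes) / mean(outcomes)

# Hypothetical options with equiprobable outcomes.
lean_option = [1, 3]    # mean 2
rich_option = [9, 11]   # mean 10

for name, outcomes in [("lean", lean_option), ("rich", rich_option)]:
    print(
        f"{name}: variance = {pvariance(outcomes)}, "
        f"CV = {coefficient_of_variation(outcomes):.2f}"
    )
```

A variance-based account treats the two options as equally risky, whereas the coefficient of variation marks the low-mean option as the riskier one relative to its expected return.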

Empirical Support for a Description-Experience Gap in Statistical Intuitions
This experience-based perspective on research on statistical intuitions gives rise to a strong, testable prediction, namely, that variations in the information format that increase or decrease the level of experiential involvement will lead, ceteris paribus, to corresponding changes in performance. This point implies that we, like others (Rakow & Newell, 2010), understand description and experience not in terms of a strict dichotomy but as concepts spread along a continuum.
On the descriptive side of the continuum, symbolic descriptions that reflect more of the underlying experience should lead to better statistical inferences. For example, described frequency formats have been argued to represent a proxy of the original experience (i.e., a series of events) on which cognitive algorithms have evolved (Cosmides & Tooby, 1996; Gigerenzer, 1991; Gigerenzer & Hoffrage, 1995). Described natural frequencies are defined as the product of natural sampling, which is the process of encountering observations of raw rather than normalized counts of events, without the marginal frequencies being fixed a priori (Kleiter, 1994). As such, natural frequencies maintain information about the underlying sample size and base rates, which is lost in conditional probabilities (Gigerenzer & Hoffrage, 1995). There is substantial evidence that descriptive formats harnessing natural frequencies improve Bayesian inference. A recent meta-analysis, reviewing 20 years of research on the "natural frequency facilitation effect" in Bayesian inference, estimated that the proportion of participants who correctly solved Bayesian inference problems presented in a natural frequency format was 20 percentage points higher than the proportion who solved problems presented as conditional probabilities (McDowell & Jacobs, 2017). Presenting information in a frequency format also improves other types of probabilistic reasoning, for example, in conjunction rule tasks (Fiedler, 1988; Hertwig & Gigerenzer, 1999; Tversky & Kahneman, 1983), sample size tasks (Sedlmeier & Gigerenzer, 1997), and the Monty Hall problem (Krauss & Wang, 2003). Importantly, although natural frequencies reflect some of the underlying experience, they remain a descriptive format. They thus cannot be used to study probabilistic reasoning in babies and nonhuman primates, which requires truly experience-based methodologies and hence a move further away from the descriptive side of the continuum.
For example, we would expect adding first-hand experience to reasoning tasks to further improve statistical inference. Indeed, allowing adults to directly experience simulated outcomes of probabilistic processes in several otherwise description-based inference tasks (e.g., conjunction rule tasks, the maternity ward problem, Bayesian inference, and the Monty Hall problem) drastically improved judgments (Hogarth & Soyer, 2011). Moreover, research on medical judgments has found that Bayesian inferences are more accurate when based on experiential formats (e.g., a sequence of representative patient cases illustrating the relative frequency of a disease and of positive/negative test results) than on described probability formats (Armstrong & Spaniol, 2017; Wegier & Shaffer, 2017). Similarly, a seminal review of base-rate fallacy research concluded that "when base rates are directly experienced through trial-by-trial outcome feedback, their impact on judgments increases" (Koehler, 1996, p. 6), in contrast to the relative "neglect" of stated base rates. Finally, people hold accurate distributional priors about the duration and extent of everyday phenomena, such as human lifespans or movie run-time lengths, which can be learned from everyday experience with the statistical texture of real-world environments (Griffiths & Tenenbaum, 2006).
The work reviewed thus far has investigated the role of experience either within one subject population (typically human adults) or within a single task format across different subject populations. But to what extent can the description-experience continuum cast light on inconsistencies across subject populations and research traditions? This question is to some degree theoretical. Direct comparisons of all subject populations' statistical intuitions across the entire spectrum of the description-experience continuum are not feasible; systematic comparisons, where they are possible, are scarce. There are, however, important exceptions. For instance, in a study examining children's and adults' Bayesian probability updating in a natural frequency format versus a conditional probability format, sixth graders' Bayesian inferences in the natural frequency format matched those of adults in the conditional probability format (Zhu & Gigerenzer, 2006; but see Pighin, Girotto, & Tentori, 2017). Similarly, in a comparison of children's and adults' statistical intuitions on experience- or description-based paradigmatic probabilistic inference tasks (conjunction rule tasks and Bayesian inference tasks), we recently showed that sequentially experiencing statistical information considerably improved both adults' and children's statistical intuitions (Schulze & Hertwig, 2021). In fact, adults' probabilistic reasoning performance in description was surpassed by that of children in experience, seemingly a developmental reversal that was, we argue, in reality driven by the experimental protocol.
In sum, there is considerable empirical evidence that the description-experience dimension plays a key role when it comes to interpreting probabilistic reasoning within and across the lifespan, research traditions, and species. But the experimental paradigms we have reviewed of course also differ on other dimensions, and our view does not exclude the possibility of genuine evolutionary or developmental differences in probabilistic inference. Next, we summarize further factors that may play a role in determining the statistical competencies of different subject populations.

Do Experience-Based Tasks Measure More Basic Statistical Skills and Are They Thus Easier?
Perhaps the most important alternative explanation for the inconsistencies observed is that studies with infants and animals target basic statistical abilities that adults possess as a matter of course. Yet the empirical findings we have reviewed suggest that description-experience gaps in performance cannot be attributed solely to a difficult-easy task dichotomy. When participants of the same age were given more experience- or more description-based problems, their statistical inferences differed systematically, even when the abstract statistical principle (e.g., Bayes's rule) remained the same (Armstrong & Spaniol, 2017; Fiedler, 1988; Hertwig & Gigerenzer, 1999; Hogarth & Soyer, 2011; McDowell & Jacobs, 2017; Wegier & Shaffer, 2017). Importantly, when participants of different ages were given inference problems that used different information formats but measured the same underlying statistical ability (e.g., adherence to the conjunction rule), performance differed depending on the information format (Schulze & Hertwig, 2021; Zhu & Gigerenzer, 2006). These findings suggest that experience-based paradigms do not produce better results solely because the statistical abilities engaged are more basic, even though sometimes this may be the case. Moreover, differences in task difficulty are not easily defined. If anything, one might expect the experience-based protocols used in the intuitive statistician tradition to be cognitively more taxing (in terms of memory, attention, and computation) than equivalent description-based formats in which all information is packaged and delivered at once (Lejarraga & Hertwig, 2021). Finally, the risk of casting differences in performance between described and experienced experimental protocols as differences in task difficulty is that it merely re-describes the empirical data, unless the cognitive processes potentially underlying such a reduction in difficulty are specified (see 5.1. Mechanisms Underlying the Beneficial Qualities of Experience).

Implicit Versus Explicit Measures?
Another possible contributor to performance differences resides in the nature of the dependent measure used. Whereas infants' probabilistic understanding is often inferred from relatively implicit measures such as looking time (see Table 1), adults are typically asked for explicit verbal (written) judgments (see Table 2). Explicit responses may engage cognitive functions not needed for implicit responses (e.g., inhibitory control, working memory, or attention), thus contributing to the inconsistencies observed. The distinction between implicit and explicit measures has proven central in accounting for age-related discrepancies in other cognitive domains, such as false-belief understanding (Onishi & Baillargeon, 2005; Scott & Baillargeon, 2017) and language processing (Creel & Quam, 2015), and has also been examined in the context of probabilistic reasoning (e.g., Téglás et al., 2007; Yost, Siegel, & Andrews, 1962). Yet, again, several findings cannot be readily explained by this distinction. For instance, almost all studies probing animals' statistical intuitions (see Table 3; and see some studies with human infants, e.g., Denison & Xu, 2010b) required the animal to make an explicit, albeit nonverbal, decision. Research conducted with adults in the intuitive statistician and heuristics-and-biases traditions also tended to use explicit dependent measures.

Genuine Developmental or Evolutionary Differences?
Our perspective does not imply that infants' remarkable statistical intuitions attest to a cognitive competence that adults, for unclear reasons, no longer possess. Rather, we generally expect adults to equal or exceed infants' and children's performance when the task format is held constant (see Schulze & Hertwig, 2021), unless there are good empirical or theoretical reasons to the contrary. This view does not preclude the possibility that infants' probabilistic reasoning is genuinely different from adults', or that younger learners sometimes outperform older ones. For instance, it has been argued that humans' conceptual repertoire in dealing with numerical content shows considerable discontinuities across development (Carey, 2009) or that children's representation of probabilistic information may be profoundly different from that of adults, inasmuch as the role of heuristic cognitive processes appears to increase rather than decrease as a function of life experience (Reyna & Brainerd, 1995). Moreover, young children sometimes outperform older children or adults within description-based or experience-based paradigms (Arkes & Ayton, 1999; Davidson, 1995; Gopnik, Griffiths, & Lucas, 2015; Gualtieri & Denison, 2018), and statistical intuitions demonstrated in infants have not been confirmed in 3- and 4-year-old children (Girotto, Fontanari, Gonzalez, Vallortigara, & Blaye, 2016; Girotto & Gonzalez, 2008). Counterintuitively, some of these discontinuities and even reversals may be the price for growth in other cognitive abilities: Biased responses could supplant unbiased ones because the knowledge structures needed for the operation of a specific heuristic (e.g., fully formed social stereotypes) are established at a later age (Davidson, 1995; Jacobs & Klaczynski, 2002; Jacobs & Potenza, 1991; Stanovich, West, & Toplak, 2011).
Another view rooted in evolutionary developmental psychology is that the greater cognitive constraints under which infants or animals operate (more generally, cognitive immaturity) can enable adaptive cognitive functioning (Bjorklund, 1997). For instance, restrictions in memory span may foster language acquisition by initially focusing the learner on simple grammatical relationships, laying the foundation for the later acquisition of more complex grammatical structures (Elman, 1993; Newport, 1990). Similarly, prolonged prefrontal immaturity in early development delays the onset of stringent cognitive control, which may help children to learn social and linguistic conventions (Thompson-Schill, Ramscar, & Chrysikou, 2009) or to engage in broader internal and external exploration behaviors (Gopnik, 2020; Gopnik et al., 2015). Finally, animals' cognitively simpler architecture may enable them to better adhere to axioms of rational choice because they encode less complex contextual or symbolic information (Stanovich, 2013) and are less likely to abstract and overgeneralize learned rules (Arkes & Ayton, 1999).

Smart Babies, Smart Adults?
Finally, several lines of research on adult cognition paint a more optimistic picture of adults' competence to deal with probabilistic information than research in the heuristics-and-biases tradition does. Probabilistic models of cognition assume that human cognition at all ages can be explained in terms of a rational Bayesian framework, casting new light on several core cognitive functions (Clark, 2013; Friston, 2010; Oaksford & Chater, 2001; Tenenbaum, Kemp, Griffiths, & Goodman, 2011) and the course of cognitive development (Bonawitz, Denison, Griffiths, & Gopnik, 2014; Gopnik, 2012; Gopnik & Wellman, 2012; Perfors, Tenenbaum, Griffiths, & Xu, 2011; Xu, 2019). The related notion of resource rationality suggests that human cognition is guided by the optimal use of limited computational resources, meaning a rational trade-off between the costs and benefits of using computationally sophisticated analytic versus more heuristic strategies (Griffiths, Lieder, & Goodman, 2015; Lieder, Griffiths, & Hsu, 2018). More generally, the concept of bounded and ecological rationality has highlighted that, provided that people's cognitive strategies fit the structure of their environment, even simple strategies can lead to accurate decisions (Gigerenzer, Todd, & the ABC Research Group, 1999;, & the Center for Adaptive Rationality, 2019; Simon, 1955, 1956).
In summary, the ability to reason probabilistically is likely determined by various features of the inference task at hand as well as by the specific capacities and characteristics of the person facing the task. One key feature that has not yet been thoroughly considered in the literature is the experiential involvement in a task. Next, we highlight important properties of experience that distinguish it from description and identify mechanisms by which they can, under some circumstances, foster accurate judgments and decisions.

What Characterizes Experience and When Is It Beneficial?
Direct interaction with the world affords myriad concurrent dimensions of information that symbolic description lacks or conveys in condensed form, if at all. A learner experiencing a sequence of events may concurrently receive sensory and motoric feedback; obtain temporal, structural (e.g., clustering), and sample size information; or gain first-hand insights into conditions for statistical inferences (e.g., randomness) that need to be explicitly stipulated or assumed in descriptions. But what are the specific mechanisms by which first-hand experience facilitates appropriate statistical inference, and under which conditions can properties of experience improve or impair accurate statistical intuitions? Several underlying mechanisms are conceivable (see also Lejarraga & Hertwig, 2021), and it is likely that different factors determine the influence of experience on statistical intuitions, depending on the task at hand.

Computational ease
One mechanism by which the sequential experience of events can reduce difficulty and improve judgments is by easing computational demands. For instance, in choices between n-armed bandits, a simple method for estimating an action's value is to sum up the sampled rewards and divide the sum by the sample size, that is, to take the sample average (Sutton & Barto, 1998). In experience-based choice, decision makers can thus maximize expected returns (based on the sampled experience) with a much simpler calculus than description-based expected value theory, which requires probabilities to be estimated and multiplied by rewards (Hertwig & Pleskac, 2010). In a recent meta-analysis of research on description- and experience-based choices between risky lotteries (using the sampling paradigm), a median of 55% of decisions from description maximized expected value, whereas 66% of decisions from experience maximized the experienced mean returns when the sampling sequence included all possible outcomes, including the rare one, relative to 89% when it did not (Wulff et al., 2018). Moreover, ongoing updating processes can alleviate the burden of storing all experienced outcomes (Frey, Mata, & Hertwig, 2015); experiential approaches may therefore become more important as decision problems become more complex (Lejarraga, 2010).
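The difference in computational demands can be illustrated with a minimal sketch. It assumes a hypothetical risky option paying 4 with probability .8 (and 0 otherwise) and a hypothetical five-draw sample; the function names are likewise illustrative. The experienced value is tracked with a running average that needs only the current estimate and a counter, whereas the described value requires multiplying each outcome by its stated probability.

```python
def update_running_mean(mean, n, reward):
    """Incremental sample average: no list of past outcomes is stored."""
    n += 1
    mean += (reward - mean) / n
    return mean, n


def experienced_mean(sampled_rewards):
    """Value estimate built up one draw at a time, as in sampling from experience."""
    mean, n = 0.0, 0
    for reward in sampled_rewards:
        mean, n = update_running_mean(mean, n, reward)
    return mean


def described_expected_value(lottery):
    """Expected value from a description: sum of outcome times stated probability."""
    return sum(outcome * probability for outcome, probability in lottery)


# Hypothetical option (illustrative values only).
described_lottery = [(4, 0.8), (0, 0.2)]  # (outcome, stated probability)
sampled_draws = [4, 4, 0, 4, 4]           # one possible sample of five draws

ev_description = described_expected_value(described_lottery)
mean_experience = experienced_mean(sampled_draws)
```

In this particular sample the relative frequency of the zero outcome happens to equal its stated probability, so both routes arrive at the same value of 3.2; with small samples the experienced mean will often differ from the described expected value.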
Computational ease has also been proposed to explain why descriptions that retain some properties of the underlying experience, such as natural frequencies, improve reasoning. Natural frequencies have been argued to render Bayesian inferences computationally simpler because they preserve information about base rates, whereas conditional probability formats require this information to be incorporated into the calculation via additional computational steps (Gigerenzer & Hoffrage, 1995; McDowell & Jacobs, 2017). Thus, although different representation formats may be informationally equivalent, they may not trigger computationally equivalent cognitive algorithms (Gigerenzer & Hoffrage, 1995). Moreover, abstract symbolic information formats can introduce sources of ambiguity that experience-based information resolves more easily. In descriptive conjunction rule problems, for instance, people have been found to infer nonmathematical meanings of the term "probability" (Hertwig & Gigerenzer, 1999). Frequency formats of the same task disambiguate its semantic interpretation by highlighting the mathematical context and the applicability of the conjunction rule (Hertwig & Gigerenzer, 1999).
Finally, experience often affords control over both the source and the amount of information that needs to be processed, and people can adjust their cognitive strategies based on their goals, cognitive abilities, and past experience. This aspect of experience-based reasoning can change the problem landscape such that the prevalence of more distinct, and thus easier, choice situations varies as a function of experiential involvement (Hertwig & Pleskac, 2010; Wulff et al., 2018). When decision problems are equated by matching task difficulty across description and experience (e.g., when comparing performance on experience-based perceptuo-motor tasks with that on equivalent described monetary gambles), there may be little difference in performance (Jarvstad et al., 2013; Wu et al., 2009). In many cases, however, it may not be possible to precisely quantify and thus equate computational ease. This would require an algorithmic implementation of the cognitive processes engaged by experience- and description-based tasks within a broader cognitive architecture, which would in turn need to consider developmental changes in the cognitive processes engaged by different task formats (see 5.2.2. Experience and description across the lifespan). One challenge for future research is to map properties of description- and experience-based tasks onto cognitive functions.
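The claim that the two formats are informationally but not computationally equivalent can be sketched as follows. The numbers are hypothetical and chosen only so that the arithmetic is exact (a base rate of 1%, a hit rate of 80%, and a false-positive rate of 10%, or equivalently 8 of 10 affected and 99 of 990 unaffected cases testing positive in a population of 1,000); the function names are assumptions for illustration.

```python
from fractions import Fraction


def posterior_from_probabilities(base_rate, hit_rate, false_positive_rate):
    """Bayes's rule on conditional probabilities: the base rate must be
    multiplied in explicitly for both the numerator and the denominator."""
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def posterior_from_natural_frequencies(positives_with_condition,
                                       positives_without_condition):
    """Natural frequencies already carry the base rate, so the posterior is
    one count divided by the sum of two counts."""
    return Fraction(
        positives_with_condition,
        positives_with_condition + positives_without_condition,
    )


# Hypothetical problem (illustrative values only).
posterior_probability_format = posterior_from_probabilities(
    base_rate=Fraction(1, 100),
    hit_rate=Fraction(8, 10),
    false_positive_rate=Fraction(1, 10),
)
posterior_frequency_format = posterior_from_natural_frequencies(
    positives_with_condition=8,
    positives_without_condition=99,
)
```

Both routes yield 8/107 (about 7.5%) under these assumptions, but the frequency version involves a single addition and division, whereas the probability version requires two multiplications and a subtraction first.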

Incremental learning
Another core quality of experience is that it often involves repeated trials, permitting people to adapt to the environment and its demands, and thereby learn to exploit its affordances, in a step-by-step manner. Heuristics-and-biases research has been criticized for focusing on one-shot situations, thereby lacking such a continuous perspective and excluding "one of the most important determinants of the behavior they purport to explain" (Hogarth, 1981, p. 213). The distinction between repeated trials and one-shot studies is also key to understanding sometimes surprisingly disparate experimental practices in economics and parts of psychology (Hertwig & Ortmann, 2001). One-shot studies leave little room for people to learn, that is, to make incremental changes in light of new information, explore diverse solution pathways, correct behavioral consequences, and reach solutions even without being able to explain how and why (see, e.g., learning direction theory; Selten, 1998). Metaphorically speaking, sequential experience offers decision makers the opportunity to incrementally clip a hedge, whereas one-shot descriptive situations expect them to fell a tree in a single pass (cf. Connolly, 1988).

Internal states as a source of information
Finally, direct interaction with the environment affects the experiencing organism in a variety of ways, and the accumulation of these influences can serve as a source of information: "any physiological or psychological state variable that is altered by experience might function as an efficient integrator (a 'memory') of past experiences" (Higginson, Fawcett, Houston, & McNamara, 2018, p. 8). In a foraging context, reliance on a readily accessible physiological state, such as an animal's energy reserve, has been shown to be nearly as effective for survival as an optimal Bayesian learning strategy, which explicitly integrates all encountered experiences (Higginson et al., 2018). Similarly, psychological indicators such as emotions or moods can help to adaptively adjust behavior under environmentally uncertain conditions (Nettle & Bateson, 2012). Experience thus affords a means to exploit the structure of the environment by tapping into information about external conditions stored in an organism's internal states.

Experience Is Not a Panacea: Under Which Conditions Is It Beneficial?

Wicked environments and meta-cognitive myopia
Although experience entails properties that can improve statistical inference, it is no guarantee for good probabilistic reasoning. One important determinant of the value of first-hand experience is the conditions under which it is gathered (Erev & Roth, 2014; Hogarth, 2001). If an environment is "kind," in that it maps experience onto valid representations, intuitive statistics can be remarkably accurate; if an environment is "wicked," offering experience that is unrepresentative, biased, or misleading, statistical intuitions may be mistaken (Hogarth, 2001; Hogarth, Lejarraga, & Soyer, 2015). For instance, there is ample evidence that people automatically encode accurate frequency information about events they encounter in the world (Hasher & Zacks, 1979, 1984; Zacks & Hasher, 2002). Yet people sometimes behave as if they were held hostage by their experience: unwilling or unprepared to go beyond the data, and, in a kind of meta-cognitive myopia, failing to take the history and validity of the sample and the sampling process into account (Fiedler, 2000, 2012). These findings have led some researchers to describe humans as "naïve intuitive statisticians" (Fiedler & Juslin, 2006, emphasis added; but see Le Mens & Denrell, 2011, for a challenge to this naivety conclusion). Moreover, research on the description-experience gap in risky choice has shown that people do not always perform better when choosing from experience, although their choices in the sampling paradigm (see Hertwig & Erev, 2009) tend to be well described as maximization of mean return based on sampled experience (Hertwig & Pleskac, 2010; Wulff et al., 2018). Furthermore, in experimental paradigms featuring sequences of choices with feedback (e.g., the full feedback paradigm; Hertwig & Erev, 2009), systematic deviations from maximization have been observed, often opposite to those obtained in description (Erev et al., 2017; Plonsky et al., 2015). Animals sometimes likewise make suboptimal choices or violate principles of rational choice (e.g., Bateson, Healy, & Hurly, 2002; Shafir et al., 2008; Shafir, Waite, & Smith, 2002; Zentall, 2015).
Finally, description-based formats also have a place in people's information ecologies. It may not always be feasible to rely on direct experience of information (e.g., when facing novel realities) and there are many situations in which descriptions offer clear benefits over experience (e.g., precision in distinguishing minute differences; summary representations of other people's experience). In many situations, moreover, people have access to first-hand experience in addition to a symbolic description. Drawing on research investigating the interplay between described and experienced information can thus advance our understanding of people's probabilistic thinking-for instance, when anticipating how people will perceive and respond to real-world risks, both existing and novel (Hertwig & Wulff, 2021).

Experience and description across the lifespan
The role of experiential involvement may also be complicated by developmental changes in the cognitive processes engaged by experience- and description-based tasks. For instance, experiential formats can require costly information acquisition that taxes memory and fluid intelligence. These demands may contribute to developmental differences in decision making (Rakow & Rahim, 2010). One example of a complex experiential task is the Iowa gambling task, in which adults typically outperform children (see Boyer, 2006). Giving children prior information about the options' outcomes and probabilities in addition to online experience of the outcomes (thus decreasing search and memory demands) has been shown to improve their performance (van Duijvenvoorde, Jansen, Bredman, & Huizenga, 2012). The cognitive demands of experiential formats may also affect decision making in older age. In complex choice environments (e.g., with numerous options), older adults (mean age = 71 years) explored substantially less than younger adults (mean age = 24 years) did, and their lower sampling effort was correlated with declines in measures of fluid intelligence (Frey et al., 2015). Moreover, there may be age-related differences in the affective processes triggered by immediate outcome experience, causing adolescents to take more risks than adults in, for instance, description-based dynamic choice tasks with immediate outcome feedback (Figner, Mackinlay, Wilkening, & Weber, 2009).
Likewise, the ability to deal with described probability formats may undergo significant change during development and may potentially require formal instruction through schooling. Research has shown that, when asked to choose between lotteries with stated outcomes and probabilities, children between 5 and 7 years old do not systematically consider differences in expected value between choice options (Levin, Weller, Pederson, & Harshman, 2007). Similarly, children younger than 9 years old do not systematically use stated probabilistic cues to inform their decisions in information-board paradigms; rather, they rely on unsystematic or unsuitable strategies (Betsch & Lang, 2013). In sum, making proportional calculations on the basis of described probabilistic information appears to be a difficult skill to learn (see also Bryant & Nunes, 2012). One challenge for future research is to further map out the developmental trajectory of the relationship between experience- and description-based probabilistic reasoning.

Cognitive illusions in experience-based paradigms
Finally, although the cognitive illusions reported in the context of heuristics-and-biases research predominantly occur in description-based tasks, there are important exceptions. One is the gambler's fallacy, which describes the belief that, in a random sequence such as the flip of a coin, a long streak of one outcome (heads) is more likely to be followed by the opposite outcome (tails) than would be expected under random sampling. The gambler's fallacy has been argued to arise from the misconception of chance processes as locally representative: "people expect that the essential characteristics of the process will be represented, not only globally in the entire sequence, but also locally in each of its parts" (Tversky & Kahneman, 1974, p. 1125). An alternative view suggests that the gambler's fallacy, and people's seeming misperception of randomness more generally, occur precisely because of how people experience statistical environments: Human experience is finite, and short-term memory capacity is limited (Hahn & Warren, 2009). In a finite sequence of 20 coin tosses, for instance, people may only be able to monitor a limited number of events (e.g., four consecutive tosses) as their attentional window moves through this data stream. Under these conditions, there is a good chance that they will never encounter a sample of four consecutive heads; in fact, this probability is more than twice that of never seeing three heads followed by one tail (Hahn & Warren, 2009). That is, under conditions that match people's typical experience with finite samples, a sequence associated with committing the gambler's fallacy (three heads followed by one tail) is much more likely to occur than a sequence of four consecutive heads.
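The finite-sample argument can be checked with a short computation. The sketch below assumes a fair coin and asks, for a sequence of 20 tosses, how likely it is that a given four-toss pattern never appears anywhere in the sequence; the function name is an illustrative assumption, and the code is a plain counting procedure rather than the derivation used by Hahn and Warren (2009).

```python
from fractions import Fraction


def prob_pattern_never_occurs(pattern, n_tosses):
    """Exact probability that `pattern` (a string of 'H'/'T') never appears
    as a run of consecutive outcomes in `n_tosses` fair coin tosses."""
    k = len(pattern)
    # counts[suffix] = number of pattern-free sequences whose most recent
    # (up to k - 1) tosses are `suffix`.
    counts = {"": 1}
    for _ in range(n_tosses):
        next_counts = {}
        for suffix, count in counts.items():
            for toss in "HT":
                window = suffix + toss
                if window.endswith(pattern):
                    continue  # this extension would complete the pattern
                key = window[-(k - 1):] if k > 1 else ""
                next_counts[key] = next_counts.get(key, 0) + count
        counts = next_counts
    return Fraction(sum(counts.values()), 2 ** n_tosses)


p_never_hhhh = prob_pattern_never_occurs("HHHH", 20)
p_never_hhht = prob_pattern_never_occurs("HHHT", 20)
ratio = p_never_hhhh / p_never_hhht
```

Under these assumptions, the probability of never encountering four consecutive heads is roughly .52, compared with roughly .25 for never encountering three heads followed by a tail, which is a ratio of slightly more than two.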
The notion of a "belief in the law of small numbers" (Tversky & Kahneman, 1971) has also been invoked to explain a classic choice anomaly that occurs when people make repeated choices from experience. For example, when asked to repeatedly choose between two alternatives with unequal odds of the same payoff (e.g., p = .70 and 1 − p = .30), people tend to probability match by allocating responses to the choice options in proportion to their relative rates of success (Vulkan, 2000). Again, this phenomenon may be driven by people's typical experience with repeated choices among probabilistic options. That is, probability matching in laboratory experiments may be an overlearned response from common real-world settings in which it can be a highly successful strategy due to competition for available resources (Gallistel, 1990; Schulze, van Ravenzwaaij, & Newell, 2015) or sequential dependencies in the outcome sequence (Gaissmaier & Schooler, 2008; Schulze, Gaissmaier, & Newell, 2020; Schulze, van Ravenzwaaij, & Newell, 2017). Thus, given the structure and affordances of the environments people typically encounter in their daily lives, paradigmatic experience-based "cognitive illusions" no longer seem so fallacious. Moreover, for some experience-based phenomena that have been labelled fallacious, such as the belief in a hot hand (Gilovich, Vallone, & Tversky, 1985), the bias may in fact reside in how researchers evaluate the adequacy of people's probabilistic reasoning (see Miller & Sanjurjo, 2014).
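Why probability matching counts as an anomaly in the standard laboratory task can be shown with a small calculation. The sketch assumes that outcomes are independent across trials and that exactly one of the two options pays off on each trial, with the probabilities from the example in the text (p = .70); the function names are illustrative.

```python
def expected_accuracy_matching(p):
    """Choose the better option on a proportion p of trials and the worse
    option on 1 - p of trials; each choice pays off at its own rate."""
    return p * p + (1 - p) * (1 - p)


def expected_accuracy_maximizing(p):
    """Always choose the option with the higher success probability."""
    return max(p, 1 - p)


p = 0.70
accuracy_matching = expected_accuracy_matching(p)      # .70 * .70 + .30 * .30
accuracy_maximizing = expected_accuracy_maximizing(p)  # .70
```

Under these assumptions, matching is expected to succeed on 58% of trials and maximizing on 70%. The sketch deliberately leaves out the environmental features discussed above, such as competition for resources or sequential dependencies in the outcome sequence, under which matching can perform well.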

Implications of the Description-Experience Continuum of Statistical Intuitions
The ability to make good statistical inferences is a hallmark prerequisite for coping with the demands of an uncertain world. To navigate uncertainty successfully, humans and other animals must be able to draw apposite inferences from finite samples. This ability is likely determined both by properties of the inference task and by the capacities and characteristics of the individual facing it. One key aspect that has not yet been thoroughly considered is how decision makers acquire the data on which their inferences are based: via experience, symbolic description, or something in between. The description-experience framework we have outlined can help to explain the inconsistencies between infant and adult probabilistic reasoning and other contradictory findings in the young history of behavioral decision research. Let us conclude by outlining four key implications of the description-experience continuum of statistical intuitions: one methodological, one normative, one conceptual, and one relating to policy.
Concerning methodology, we have shown that diverse paradigms are needed to fully understand the breadth and adaptability of human probabilistic thinking. Experimental approaches that reduce people's statistical intuitions to a snapshot of their shortcomings run the danger of dismissing people (adults) as fundamentally inept in matters of probabilistic thinking-a prevalent and influential portrayal of the intuitive statistical mind in research on adult judgment and decision making that has been accentuated by current research on infant and animal probabilistic reasoning (see Gopnik, 2014). More generally, as our review highlights, researchers' choice of experimental design is crucial for determining the level of statistical competency they will observe. If an experimental task does not accurately represent the situation toward which a researcher intends to generalize, the processes studied may be altered in such a way that the results obtained are no longer representative of people's functioning in the natural causal texture of their ecology (Brunswik, 1955;Dhami, Hertwig, & Hoffrage, 2004). Our argument is not that experience-based situations are more representative of people's information ecology than description-based situations-although they may arguably be more prevalent-or that they better represent the situations toward which researchers aim to generalize. Rather, we emphasize that both classes of situations exist in the information ecology of the 21 st century and that it is important to strive to represent and contrast both designs wherever possible when investigating people's statistical intuitions. Only then will it be possible to gain a comprehensive understanding of the intuitive statistical competences of the human mind.
Second, the description-experience continuum of statistical intuitions has normative implications. Continuous experimental conditions require the evaluation of continuous learning processes (Hogarth, 1981). In description-based studies, researchers commonly assume benchmarks of statistical reasoning that hold in large samples or at the limit. Yet these principles do not always hold in small or finite samples, which arguably represent a more veridical reflection of the environmental statistics faced by real-world people with limited attention and short-term memory. It follows that "evaluations of rationality based on long-run considerations or limit properties may lead to a distorted picture" of human rationality (Hahn, 2014, p. 239). One task for future research will be to test experimental demonstrations of classic cognitive illusions not only against norms that hold at the limit but also against the reality of humans' limited experience (see Hahn, 2014; Hogarth, 1981).
A third implication is conceptual and points to directions for future research. We have suggested that the level of experiential involvement is crucial in determining the availability of various cognitive strategies that may differ in computational complexity, and that experience-based formats can enable the use of computationally simpler algorithms (see Gigerenzer & Hoffrage, 1995; Hertwig & Pleskac, 2010). Future research will need to further investigate the cognitive mechanisms by which experience improves or impairs probabilistic reasoning, and to determine what it is about symbolic descriptions that can make them difficult. Is it that they subtract "analogical" properties of the original experience (e.g., sequential experience in time)? Or that symbols are cultural inventions that humans have not evolved to process intuitively? Do they introduce sources of ambiguity that experience resolves more easily (e.g., semantic ambiguity; Hertwig, Benz, & Krauss, 2008; Hertwig & Gigerenzer, 1999)? And to what extent can descriptions, because they carry the author's implicit frame of reference (McKenzie & Nelson, 2003), make people more susceptible to being manipulated or misled? Of course, there will also be important contexts in which symbolic descriptions of the probabilistic texture of the world are indispensable because they, for instance, represent and summarize the accumulated wisdom and experience of others. Here the question is how such symbolic descriptions can be constructed to ensure maximum transparency and accessibility.
A related avenue for future research is to embed the description-experience distinction into research on the developmental trajectory of how probability is computed in the mind. Recent advances in research on probabilistic models of cognitive development suggest that human cognition is guided by powerful rational learning mechanisms from infancy onward (e.g., Bonawitz et al., 2014; Gopnik, 2012; Gopnik & Wellman, 2012; Perfors et al., 2011; Xu, 2019). Infants' remarkable statistical inference abilities have been interpreted as evidence for this theoretical framework because they indicate that the vital tools needed for rational statistical learning are already available in early infancy.
Finally, we turn to potential educational and other policy implications. Our review has been informed by a growing number of experiments on risky decisions from experience versus description (see, e.g., Wulff et al., 2018). This research has highlighted implications for policy in the world outside the laboratory: in communicating risk, designing economic markets, and implementing safe practices in the workplace and everyday life (Barron, Leider, & Stack, 2008; Erev & Roth, 2014; Kaufmann, Weber, & Haisley, 2013; Weber, 2006; Yechiam, Erev, & Barron, 2006). Extending this conceptual distinction to developmental issues promises to generate fruitful research questions and, ultimately, educational applications. Some of the implications of the description-experience continuum that we have highlighted, such as the use of natural frequency-based formats to foster statistical inference, have been discussed in medical, legal, and educational settings (see, e.g., Hoffrage, Lindsey, Hertwig, & Gigerenzer, 2000). Another avenue for future research is to better understand the relationship between intuitive, experience-based notions of probability and the more formal views of probability acquired through schooling. A wealth of developmental research suggests that making proportional calculations of probability is a difficult skill to learn (for a review, see Bryant & Nunes, 2012). But might it be conceivable to capitalize on infants' well-formed intuitions when teaching children to solve formal probability problems? Gopnik (2014) suggests that it may indeed be possible to "exploit [babies'] intuitive abilities to teach children, and adults, to understand probability better and to make better decisions as a result." We have shown that one crucial aspect in pursuing this goal is to take into account experiential features that can foster good statistical intuitions.
It is also possible that enlisting experience in the process of teaching and learning probabilistic reasoning is better suited to empower self-directed and active learning (see Gureckis & Markant, 2012) than is the processing of symbolic information. Learners could, for instance, experience the processes of information sampling or the consequences of cognitive acts (e.g., inferences, choices, and estimates), engage with and systematically manipulate physical instantiations of chance devices, or be exposed to stationary and nonstationary environments in which the information sampled and the decisions made entail different degrees of contingency (that are complicated to describe symbolically).
Let us conclude by returning to Gopnik's (2014) question. Our review of the large and disparate literature on statistical intuitions suggests that one key to understanding the puzzling discrepancy between smart babies and stupid grown-ups in matters of probabilistic reasoning is that infants and children typically operate on the basis of immediate experience or require information formats that closely approximate that experience. Adults are also able to process symbolic, propositional representations of the world, which is one of the great achievements of cognitive development. Their untutored statistical intuitions may, however, be better tuned to experience-based than to description-based formats.

Declaration of Competing Interest
None.