Throughout our daily lives, we encounter an ongoing barrage of mundane stimuli that demand routine responses. This incidental experience forms the fabric of our interaction with the world. Clearly, the sum of this experience determines our behavior, but how long-lasting is the effect of each experience on subsequent behavior?

The effects of recent experience on decision making have been studied via a two-alternative forced choice (2AFC) paradigm. On each trial, one of two stimuli is presented, and subjects are asked to press one of two response keys as quickly as possible. Response time (RT) varies reliably as a function of the exact sequence of preceding trials, as depicted in Fig. 1a (see, e.g., Cho et al., 2002; Jentzsch & Sommer, 2002; Remington, 1969; Soetens, Boer, & Hueting, 1985). These sequential dependencies are not a mere laboratory curiosity, but can have a meaningful impact on naturalistic decision making. For example, professional basketball players choose their shot locations on the basis of recent attempts and successes (Neiman & Loewenstein, 2011). The recent braking or acceleration actions of automobile drivers can explain variability in response latencies of up to 100 ms, which is potentially the difference between a collision and a near miss (Doshi, Tran, Wilder, Mozer, & Trivedi, 2012). Sequential dependencies have also been demonstrated in legal reasoning and jury evidence interpretation (e.g., Furnham, 1986; Hogarth & Einhorn, 1992) and in clinical assessments (Mumma & Wilson, 2006).

Fig. 1 Reanalysis of a representative sequential-effects study (Jentzsch & Sommer, 2002, Exp. 1). (a) Mean response times for a current trial type—repetition (R) or alternation (A)—as a function of sequence history (current trial at top of each label on the x-axis). Error bars here and elsewhere for the behavioral data indicate standard errors. The graph shows exponential (light) and power (dark) models—with full context horizon—fit to the per-subject trial-by-trial data and averaged across subjects. (b) Accumulative prediction error (R²) as a function of context horizon. Error bars indicate standard errors of the R² differences between models (Loftus & Masson, 1994), thus aiding in comparing models but not horizons

Sequential dependencies arise naturally from psychological and neurobiological models of incremental learning, including error correction methods (Rescorla & Wagner, 1972), reinforcement learning (Sutton & Barto, 1998), and Hebbian learning (Hebb, 1949). These models yield an exponentially discounted influence of past trials, which explains the inverted-V pattern common to many 2AFC experiments (as in Fig. 1a). Similarly, models from optimal-control theory for tracking in nonstationary environments, such as the Kalman (1960) filter, also produce exponential decay. These models are all appealing, because the past trial history is captured by a single state variable (or sufficient statistic) that can be maintained and updated from trial to trial.
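To make the exponential-discounting property concrete, consider the following minimal delta-rule sketch (written for this exposition; the trial coding and learning rate are illustrative, not values from any of the cited models). Unrolling the update shows that the trial k steps back influences the current expectation with weight proportional to (1 - learning_rate)^k.

import numpy as np

def delta_rule_expectations(trials, learning_rate=0.2):
    """Track a single expectation with an error-correcting (delta-rule) update.

    trials: sequence coded 1 for a repetition trial, 0 for an alternation.
    Unrolling the recurrence shows that trial t-k influences the current
    expectation with weight learning_rate * (1 - learning_rate)**k, i.e.,
    an exponentially discounted influence of the past.
    """
    expectation, history = 0.5, []   # neutral starting expectation
    for x in trials:
        history.append(expectation)  # expectation held before observing x
        expectation += learning_rate * (x - expectation)
    return np.array(history)

# A run of repetitions pulls the expectation toward "repetition":
print(delta_rule_expectations([1, 1, 1, 1, 0, 0]).round(3))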

Models that produce exponential decay of past trials predict that sequential dependencies will operate only on short time scales. Moreover, analyses of sequential dependencies have focused on the short time scale, and the design of experiments has not been well suited to measuring longer-range effects. However, several studies have hinted at the possibility that a single experience can have an influence on behavior that persists for minutes (e.g., Link, Kos, Wager, & Mozer, 2011; Wong & Shelhamer, 2011), or even a day (Ward & Lockhead, 1970), consistent with an alternative theoretical perspective in which each experience is stored in long-term memory, and behavior is guided by the cumulative impact of these memories (e.g., Kasif, Salzberg, Waltz, Rachlin, & Aha, 1998; Stanfill & Waltz, 1986).

Instead of an exponential discounting of the past, long-term memory is typically characterized as following a “power law of forgetting” (Anderson et al., 2004; Rubin & Wenzel, 1996; Wixted & Carpenter, 2007; Wixted & Ebbesen, 1997). Power functions are qualitatively different from exponential functions because they can produce a single curve that exhibits both rapid decay of the most recent trials (a strong short-term recency effect) and slow decay of far-back trials (a long-range residual effect). With exponential decay, long-term effects are vanishingly small, at least with decay rates in the range needed to explain short-term recency.
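The contrast is easy to see numerically. In the sketch below, the decay parameters are illustrative choices rather than values estimated from data, picked so that the two functions give comparable weight to the one-back trial:

import numpy as np

# Illustrative decay parameters: both functions weight the one-back trial
# heavily, but their tails differ by many orders of magnitude.
lags = np.array([1, 2, 4, 16, 64, 256, 1024])
exp_weights = 0.5 ** lags            # exponential decay with strong recency
pow_weights = lags ** -1.5           # power decay with comparable recency

for lag, e, p in zip(lags, exp_weights, pow_weights):
    print(f"lag {lag:4d}: exponential {e:.3e}   power {p:.3e}")
# By lag 64 the exponential weight is ~5e-20, effectively zero, whereas the
# power weight is ~2e-3, still large enough to accumulate across many trials.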

In this investigation, we explored the persistence of incidental experience, in terms of both the scope of its influence and the nature of its decay. We began by reanalyzing trial-by-trial data from a typical 2AFC experiment (Jentzsch & Sommer, 2002). We compared two models of sequential effects that assume that subjects form an expectation for the next trial using an average of previous trials that is weighted either exponentially or according to a power function. RT was predicted to be fast when the expectation matched the actual trial, and slow when the expectation did not. In the analysis of the Jentzsch and Sommer data and throughout this article, each model was fit to the specific trial history of individual subjects by minimizing the mean squared error across all trials. Both models had a single theoretically relevant free parameter for determining the relative weighting of past trials.
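A minimal sketch of the two models follows, assuming the binary trial coding and a linear mapping from expectation to RT; the horizon argument anticipates the analysis below, and base and gain are illustrative scale parameters rather than fitted values.

import numpy as np

def expectation(trials, t, horizon, decay, kind):
    """Recency-weighted average of past trial types (1 = repetition, 0 = alternation).

    decay is the single theoretically relevant free parameter; kind selects
    exponential or power weighting of the trial k steps back.
    """
    lags = np.arange(1, min(t, horizon) + 1)      # lag 1 = the previous trial
    if lags.size == 0:
        return 0.5                                # no history yet: neutral expectation
    w = decay ** lags if kind == "exponential" else lags.astype(float) ** -decay
    past = np.asarray(trials)[t - lags]
    return np.dot(w, past) / w.sum()              # normalized weighted mean

def predicted_rt(trials, t, horizon, decay, kind, base=400.0, gain=80.0):
    """RT is fast when the expectation matches trial t, and slow when it does not."""
    e = expectation(trials, t, horizon, decay, kind)
    expectedness = e if trials[t] == 1 else 1 - e
    return base - gain * expectedness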

The analysis used in previous investigations, in which RTs have been conditioned on the four-back sequence (Fig. 1a), does not gauge the persistence of experience or facilitate discrimination of the two models. Thus, to examine the influence of past trials more closely, we studied how model fits vary as a function of the number of past trials used to form each expectation (the context horizon). Out-of-sample fits were obtained for each context horizon by iteratively computing a prediction for each trial using a model that was fit to the preceding trials and constrained to have the desired context horizon. All of the models had one free parameter, regardless of horizon size. Accumulative prediction error (Wagenmakers, Grünwald, & Steyvers, 2006) was computed from the out-of-sample fits (Fig. 1b). For an error measure, we used the coefficient of determination (R²), which is derived from the “sum of squared residuals” error measure recommended by Wagenmakers et al. Increasing the horizon beyond four trials back yields reliable improvements in fit: Across models that used four to 1,024 past trials, a significant main effect of horizon on R² emerged [F(8, 72) = 3.28, p = .003], but despite the appearance of a better fit for the power model, the interaction between horizon and model was not reliable [F(8, 72) = 1.033, p = .42].
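The accumulative-prediction-error procedure can be sketched as follows, with fit and predict standing in for model-specific routines (e.g., the one-parameter models above); each trial is predicted out of sample from a model fit only to the trials that precede it.

import numpy as np

def accumulative_r2(rts, fit, predict, warmup=100):
    """Accumulative prediction error, expressed as a coefficient of determination.

    fit(rts[:t]) estimates the model's free parameter from trials before t,
    and predict(params, t) returns the out-of-sample RT prediction for trial
    t; both are placeholders. R² is one minus the accumulated squared
    prediction error over the total variance of the predicted trials.
    """
    sq_errors = []
    for t in range(warmup, len(rts)):
        params = fit(rts[:t])                     # fit to preceding trials only
        sq_errors.append((rts[t] - predict(params, t)) ** 2)
    predicted = np.asarray(rts[warmup:], dtype=float)
    return 1.0 - np.sum(sq_errors) / np.sum((predicted - predicted.mean()) ** 2)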

The Jentzsch and Sommer (2002) study was limited because higher-order sequence statistics were not controlled—introducing an additional source of variability—and because distinguishing the predictions of the two models is difficult when sequences have no structure. The latter point arises because, when two trial types—for instance, repetition and alternation—occur with equal probability, their influences tend to cancel out, regardless of how strongly individual trials are weighted.

Experiment 1: Autocorrelation in the sequence structure

For the data from Experiments 1 and 2, go to https://sites.google.com/site/mattwilder/research/persistent.

We therefore conducted a 2AFC study with a biased sequence structure in two opposing conditions, one in which 2/3 of the trials were repetitions of the preceding trial, and one in which 2/3 of the trials were alternations of the preceding trial—that is, positive and negative autocorrelation, respectively. This experiment will be reported in more detail in Jones, Curran, Mozer, and Wilder (2013). Here we focus on long-range sequential effects; in the Jones et al. study, we address orthogonal features of the data.

Method

Subjects

A group of 28 young adults (mean age 21.5 ± 2.9 years; nine female, 19 male) participated in exchange for monetary compensation. Each subject performed in two sessions, one each in the positive and negative conditions. The sessions were spaced 2–7 days apart, and order was counterbalanced across subjects. One subject was removed from the analysis because of an error in response recording during one block. The subjects gave informed consent in accordance with the University of Colorado’s Institutional Review Board.

Stimuli and apparatus

The subjects’ task was to respond to the location of a white dot, 5 mm in diameter, presented 11 mm above or 12 mm below a 4-mm horizontal white fixation line that was visible throughout the task. Responses were made using a button box, oriented vertically so as to be spatially compatible with target locations. The left and right index fingers were assigned to the two buttons, with the assignments counterbalanced across subjects and fixed across sessions for each subject. The stimulus duration was 60 ms, and a 700-ms response-to-stimulus interval followed each response. RTs were recorded at 1000 Hz.

Procedure

Each session consisted of 3,402 experimental trials, divided into 14 blocks of 243 trials. Within each block, local stimulus histories were controlled to a depth of six trials, and the frequency of each of the 64 (2⁶) different trial sequences was exactly as dictated by the repetition rate for the condition (1/3 and 2/3 repetitions for the negative and positive conditions, respectively). The actual stimulus identities (above or below the fixation line) were equally probable. Subjects were given rest breaks roughly every 116 trials, and additional practice and postrest contextual lead-in trials were inserted into the sequence, for a total of 3,744 trials.
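A constraint of this sort is easy to verify mechanically. The sketch below is a checking utility written for this description, not the generation procedure used to build the sequences: it counts every six-trial window in a block of repetition/alternation labels, whose frequencies should match those dictated by the condition’s repetition rate.

from collections import Counter

def history_counts(block, depth=6):
    """Count every depth-trial window in a block of 'R'/'A' trial labels."""
    return Counter(tuple(block[i:i + depth])
                   for i in range(len(block) - depth + 1))

# Usage: counts = history_counts(labels) for a 243-trial block; in the
# positive condition, windows with more repetitions should appear more
# often, in exact proportion to the 2/3 repetition rate.
print(history_counts(list("RRARRA"), depth=3))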

Results and discussion

As expected, RTs were modulated by the short-term context (Fig. 2a). However, behavior also depended on the autocorrelation structure: The RTs for repetition trials (left side of Fig. 2a) were faster in the positive than in the negative condition, and vice versa for alternation trials. This difference due to autocorrelation structure, conditioned on the immediate context, indicates that the influence of the past extends beyond four trials back. Although Fig. 2a cannot reveal how far back that influence extends, a preference for the power model emerged when fits to the per-subject trial-by-trial data were aggregated according to the four-back sequence history. The R² between the model and the data across the 32 histories (16 in each condition) was greater for the power model for 25 of the 27 subjects [mean R² across subjects, .798 vs. .730; paired t test, t(26) = 7.45, p < .001]. The R² values reported for the means of the four-back sequence histories are higher than those for the individual trial data—for instance, Fig. 1a—because some sources of variability are averaged out.

Fig. 2 (a) Mean response times (RTs) for the positive and negative autocorrelation conditions as a function of sequence history: Exponential (light lines) and power (dark lines) models—with full context horizon—fit to per-subject trial-by-trial data and averaged across subjects. (b) Accumulative prediction error (R²) as a function of context horizon. Error bars are as in Fig. 1. (c) Lag profile averaged across conditions and subjects in log–log coordinates: Means of the exponential and power function fits to the per-subject lag profiles. (d) Histogram of log-likelihood ratios for individual-subject fits, in which negative ratios (dark squares) support the power model and positive ratios (light squares) support the exponential model. Significance was determined by Vuong’s (1989) closeness test. (e) Difference in mean RTs for repetition and alternation trials by blocks of 243 trials for each autocorrelation condition

Support for a long-range sequential effect was obtained by examining the accumulative prediction error values for the two models with varied context horizons (Fig. 2b). We found a significant main effect of horizon [F(8, 208) = 85.1, p < .001] and an interaction between model type and horizon [F(7, 182) = 62.3, p < .001]. The exponential model fit improves reliably as more trials are included out to 32 trials [comparing 32 vs. 16: t(26) = 4.12, p = .0003], but no further [1,024 vs. 32: t(26) = 0.36, p = .72]. In contrast, the power model fit improves for up to 1,024 trials [1,024 vs. 512; t(26) = 2.84, p = .0086]. Behavior in this task is clearly affected by a long history of prior experience.

Further support for power over exponential decay was obtained by studying a lag profile derived from the data, which is plotted on a log–log scale in Fig. 2c. The lag profile isolates the effect of the trial that is l trials in the past by computing the difference between the mean RT when the current trial does not match the lag-l trial and the mean RT when those trials do match. Because the exponential and power models both predict a lag profile that matches the decay function, this analysis offers another means of differentiating the models. The empirical lag profile appears linear in log–log coordinates, suggesting power decay. We fit the individual-subject lag profiles to both power and exponential functions and obtained a better fit for the power function [mean R² across subjects, .878 vs. .855; t(26) = 2.17, p = .039].
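A sketch of the lag-profile computation, assuming binary-coded trial types and per-trial RTs as above:

import numpy as np

def lag_profile(trials, rts, max_lag=10):
    """Effect of the trial l back: mean RT on mismatch minus mean RT on match.

    trials is a binary array of trial types and rts the per-trial response
    times; a positive value at lag l means the lag-l trial still speeds
    responses of its own type.
    """
    trials, rts = np.asarray(trials), np.asarray(rts, dtype=float)
    profile = []
    for lag in range(1, max_lag + 1):
        match = trials[lag:] == trials[:-lag]     # current trial vs. lag-l trial
        profile.append(rts[lag:][~match].mean() - rts[lag:][match].mean())
    return np.array(profile)

Plotting the log of the lags against the log of these differences then gives the coordinates of Fig. 2c.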

Even though both the power and exponential functions have a single free parameter, one could argue that the power function fits better because it has more flexibility. To rule out this possibility, we compared out-of-sample fits using leave-one-out cross validation. The power fit was consistently better than the exponential fit across lags and subjects: The mean absolute deviation between the empirical and predicted lag values was smaller for the power function [F(1, 26) = 10.47, p = .003]. For nine of the ten lags, the mean absolute deviation was smaller for the power function. Furthermore, we compared the fits for individual subjects using an extension of the likelihood ratio test that is appropriate for non-nested models (Vuong, 1989). Figure 2d presents a histogram of the log-likelihood ratios across subjects. A preference for the power model is evidenced by both the larger number of significantly negative ratios according to the Vuong test (11 dark vs. three light squares) and the larger total number of negative ratios (18 vs. nine).
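For concreteness, Vuong’s closeness test reduces to a z statistic on the per-trial log-likelihood differences. The sketch below omits the adjustment terms sometimes applied when models differ in parameter count (both models here have one free parameter):

import numpy as np
from scipy import stats

def vuong_z(loglik_exp, loglik_pow):
    """Vuong's (1989) test for non-nested models from pointwise log-likelihoods.

    The statistic is asymptotically standard normal under the null hypothesis
    that the two models are equally close to the truth. With this argument
    order, a significantly negative z favors the power model, matching the
    sign convention of Figs. 2d and 3e.
    """
    d = np.asarray(loglik_exp) - np.asarray(loglik_pow)
    z = d.sum() / (np.sqrt(len(d)) * d.std(ddof=1))
    return z, 2 * stats.norm.sf(abs(z))           # two-sided p value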

If incidental experience has a long-lasting influence, a cumulative effect of trial statistics across the entire course of the experiment might be observable. Figure 2e reveals a preference for repetitions in the positive condition that increases as the experiment progresses, and a preference for alternations in the negative condition. When superimposed on Fig. 2e, the predictions derived from the power model fits capture the long-range effect of condition. In contrast, the trajectory from the exponential model fits is roughly flat, because the model cannot benefit from integrating beyond about 64 past trials.

The power model is appealing because it is capable of explaining effects across a range of time scales, from the variation due to the immediate four-back context to the bias that grows over the hour-long duration of the sequence in each autocorrelation condition.

Experiment 2: Sequential dependencies in motor control

Although we have argued for a unified explanation of short- and long-term adaptation via the power model, there is an alternative, though somewhat less parsimonious, possibility: that the two time scales reflect distinct mechanisms. For instance, in Experiment 1, the sequence structure might have been detected by the subjects, leading to explicit learning and deliberate biasing of behavior. We thus aimed to strengthen our account by demonstrating the persistence of incidental experience in the absence of sequential structural regularity.

However, as our reanalysis of the Jentzsch and Sommer (2002) data revealed, long-range effects are difficult to uncover when the sequence history is balanced and response latency is the dependent variable. We conjectured that response latency might not be a terribly sensitive measure because speedy responses are a secondary consideration in the performance of 2AFC; responding correctly is the subjects’ primary goal. Consequently, RTs might be more susceptible to perturbation by task-unrelated factors. A task whose behavioral measures are better aligned with the subjects’ primary goals might be more effective in exposing a persistent influence of incidental experience, despite the previously described cancellation of far-back effects that results from balanced sequences.

One domain of study that seems suitable is motor control, because movement trajectories reflect planning processes. Long-term motor adaptation has been observed when systematic and consistent perturbations have been applied to the control system (e.g., Hopp & Fuchs, 2004; Robinson, Soetedjo, & Noto, 2006; Shadmehr & Mussa-Ivaldi, 1994). Some support for the persistent influence of incidental experience was found in an eye movement task in which error-based adaptation was observed extending back nearly 100 trials and decaying according to a power function (Wong & Shelhamer, 2011). However, in this task, the correlations could be attributed to endogenous variation rather than to exogenous effects of the target sequence, because the target timing and position were completely predictable on every trial. Though ignored in many motor control studies, short-term sequential dependencies have been demonstrated in reaching tasks in which straight-line arm movements have been disrupted by variable perpendicular perturbation forces (Fine & Thoroughman, 2006; Scheidt, Dingwell, & Mussa-Ivaldi, 2001).

To bridge the gap between traditional 2AFC experiments and motor studies that have exhibited sequential effects, we explored a reaching task with a sequential structure akin to that of 2AFC. Rather than imposing an autocorrelation structure, as in Experiment 1, the two trial types were controlled to be equally probable.

Method

Subjects

A group of 20 right-handed young adults (mean age 18.3 ± 0.7 years; 14 female, six male) participated in exchange for monetary compensation. The subjects gave informed consent in accordance with the University of Colorado’s Institutional Review Board.

Stimuli and apparatus

Subjects sat in a chair with full back support and made horizontal planar reaching movements while grasping the handle of a robotic arm (Interactive Motion Technologies, Shoulder-Elbow Robot 2). The handle position, handle velocity, and robot-generated force were recorded at 20 Hz.

The task involved making rapid 15-cm out-and-back movements along the midline of the transverse plane. Visual feedback of a cursor representing hand position and the home and target circles was presented on an LCD screen in front of the subjects (see Fig. 3a). Once subjects had centered the cursor within the home circle, the target appeared, and an audio cue signaled the trial onset. On each trial, a perturbing force was applied perpendicular to the desired direction of movement. The force increased linearly as a function of distance from the home circle over the first 5 cm (1 N/cm) and remained fixed at 5 N for the remaining 10 cm. No forces were applied on the return. Subjects received a warning message if the trial duration exceeded 1.4 s.

Fig. 3 (a) Experimental setup for Experiment 2. (b) Mean trajectories from the dark dot to the light dot for different sequences of right (R) and left (L) perturbations (current trial at the right end of each label). Sequential dependencies here result from the history of right and left forces rather than from repetition/alternation sequences. (We anticipated this on the basis of the theoretical division between perceptual and response sequential effects; see Wilder et al., 2010.) (c) Accumulative prediction error (R²) as a function of context horizon. Error bars are as in Fig. 1. (d) Lag profile in log–log coordinates for mean exponential and power function fits. (e) Histogram of log-likelihood ratios for individual-subject fits, in which negative ratios (dark squares) support the power model and positive ratios (light squares) support the exponential model. Significance was determined by Vuong’s (1989) closeness test

Procedure

Two versions of the task were run, which were identical except for the control of the stimulus sequences, with ten subjects in each. In Version 1, ten introductory null trials with no force were followed by 490 force trials, with the force direction (left or right) being randomly selected with equal probabilities. Subjects were given a 30-s break after every 100 trials. In Version 2, the subjects completed a total of 1,106 trials, with ten introductory null trials and 30-s rests every 137 trials. The nine trials following each rest were excluded from the analyses. The local stimulus histories of right and left trials were controlled to a depth of nine trials, so that each of the 512 (2⁹) trial sequences occurred exactly twice. For the model fitting, the deflection measures for right and left trials were normalized—for each subject—to have the same mean and standard deviation, thus eliminating imbalances due to structural constraints of the arm. All statistical analyses focused on model fits to the individual subjects and collapsed across data from the two versions of the experiment.
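The normalization step can be read as a per-trial-type standardization within each subject; the sketch below implements that minimal reading (the particular target mean and SD are immaterial so long as both trial types share them):

import numpy as np

def normalize_deflections(deflections, is_right):
    """Z-score right- and left-perturbation trials separately for one subject.

    After this step the two trial types have the same mean and standard
    deviation, removing imbalances due to structural constraints of the arm.
    """
    deflections = np.asarray(deflections, dtype=float)
    is_right = np.asarray(is_right, dtype=bool)
    out = np.empty_like(deflections)
    for mask in (is_right, ~is_right):
        out[mask] = (deflections[mask] - deflections[mask].mean()) / deflections[mask].std()
    return out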

Results and discussion

Individual trial movement trajectories were affected by the recent trial sequence: Subjects compensated for the current perturbation more accurately when it was consistent with the recency-weighted sequence of prior perturbations (Fig. 3b). For the purpose of modeling, the accuracy of the trajectory on a given trial was quantified as the absolute value of the maximum horizontal deviation of the trajectory. However, other deflection measures—for instance, initial angle, mean deviation, and area under the deflection curve—gave similar results. The persistence of past experience was revealed by analyzing accumulative prediction error as a function of context horizon (Fig. 3c). We found support for the hypothesis that sequential effects extend back more than 32 trials [one-tailed t test for 64 vs. 32: exponential, t(19) = 1.93, p = .035; power, t(19) = 1.86, p = .040]. Because the exponential and power models differ primarily in the weights that they assign to far-back trials, we expected that the balanced sequences in this experiment would make it difficult to compare the two models directly. Despite this limitation, evidence for power decay over exponential decay was found in the near-linear trend of the lag profile in log–log coordinates (Fig. 3d). The per-subject fits to the lag profile values were reliably better for a power function than for an exponential function [mean R² across subjects, .891 vs. .835; t(19) = 4.98, p < .001].

Using leave-one-out cross validation, the power fit was significantly better than the exponential fit across lags and subjects: The mean absolute deviation between the empirical and predicted lag values was smaller for the power function [F(1, 19) = 15.26, p = .001]. Additionally, for nine of the ten lags, the mean absolute deviation was smaller for the power function. Figure 3e shows a strong preference for the power model according to Vuong’s (1989) test, with more significantly negative log-likelihood ratios (12 dark vs. 0 light) and a larger total number of negative ratios (17 vs. 3).

A normative account of long-range effects

Many theoretical accounts characterize sequential dependencies as being a by-product of adaptation to the statistical structure of a dynamic environment (e.g., Jones & Sieck, 2003; Mozer, Kinoshita, & Shettel, 2007; Wilder, Jones, & Mozer, 2010; Yu & Cohen, 2009). These accounts suppose that the statistics of the environment are tracked over time—statistics such as relative stimulus frequency or the magnitude and direction of perturbing forces. The statistics represent not only a summary of the past, but an expectation for the future, facilitating tuning of perceptuo-motor control to perform optimally in the anticipated environment.

If environments have temporal nonstationarity, more recent experience is most indicative of what an individual will experience next. Specific theoretical formulations lead to specific characterizations of how past experiences should optimally be combined to predict future events. Yu and Cohen’s (2009) dynamic belief model (DBM) explains sequential effects as a consequence of optimal Bayesian inference in an environment whose characteristics are stationary for an interval of time, until they are redrawn from a reset distribution at abrupt changepoints distributed in time according to a Bernoulli process. The DBM assumptions lead to predictions about behavior that are consistent with an exponentially decaying lag profile. Consequently, the model fails to produce long-range effects of experience.
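A grid-based forward filter makes the DBM’s exponential forgetting explicit. In this sketch the uniform reset distribution and the change probability are illustrative assumptions, not the article’s fitted values:

import numpy as np

def dbm_predictions(xs, alpha=0.1, grid_size=100):
    """Forward filtering for the dynamic belief model (Yu & Cohen, 2009).

    xs is a binary trial sequence (1 = repetition). The belief over the
    Bernoulli parameter gamma lives on a discrete grid; before each trial it
    is mixed with the reset prior with fixed change probability alpha.
    Returns the predictive probability of a repetition on each trial; the
    resulting lag profile decays exponentially, as noted above.
    """
    gamma = np.linspace(0.005, 0.995, grid_size)
    prior = np.ones(grid_size) / grid_size         # uniform reset distribution
    belief = prior.copy()
    preds = []
    for x in xs:
        belief = (1 - alpha) * belief + alpha * prior   # possible changepoint
        preds.append(float(np.dot(gamma, belief)))      # predictive P(x = 1)
        likelihood = gamma if x == 1 else 1 - gamma     # observe the trial
        belief = belief * likelihood
        belief /= belief.sum()
    return np.array(preds)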

We propose an extension of the DBM, called the hierarchical dynamic belief model or HDBM (Fig. 4a), that yields roughly a power function lag profile, and consequently outperforms the DBM when fit to the entire experimental data in one pass [Fig. 4b; Exp. 1, t(26) = 7.69, p < .0001; Exp. 2, t(19) = 3.87, p = .0010]. The HDBM relaxes a seemingly unnatural assumption in the DBM, that environmental statistics have a time-invariant probability of change. For example, it would seem that the dynamics of change during a 4-h plane flight would not be the same as those during the half hour it takes to deplane, walk through the terminal, collect bags, catch a taxi, and check into a hotel. The HDBM avoids this restrictive assumption by taking a hierarchical Bayesian approach in which the underlying generative model is a nonhomogeneous Bernoulli process—that is, a process with a fluctuating change-point probability that is driven by a separate Markov process. Because the HDBM models a spectrum of environments—ranging from rapidly changing to stable—its expectations of the future reflect strong short-term recency, as well as long-range dependencies.
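The description above leaves the Markov process on the change probability unspecified, so the sketch below commits to one simple choice purely for illustration, a discretized Gaussian random walk on α; the essential structure is the joint belief over (γ, α), with γ resetting at the locally believed rate α.

import numpy as np

def hdbm_predictions(xs, gamma_size=50, alpha_size=20, drift=0.05):
    """Forward filtering for an HDBM of the kind diagrammed in Fig. 4a.

    Belief is a joint distribution over (gamma, alpha). Between trials,
    alpha drifts under a discretized Gaussian random-walk kernel (the drift
    width is an illustrative assumption), and gamma resets to its prior with
    probability alpha. Returns the predictive P(repetition) on each trial.
    """
    gamma = np.linspace(0.005, 0.995, gamma_size)
    alpha = np.linspace(0.005, 0.5, alpha_size)
    gamma_prior = np.ones(gamma_size) / gamma_size
    # Column-stochastic transition kernel: K[i, j] = P(alpha_i | alpha_j).
    K = np.exp(-0.5 * ((alpha[:, None] - alpha[None, :]) / drift) ** 2)
    K /= K.sum(axis=0, keepdims=True)
    belief = np.full((gamma_size, alpha_size), 1.0 / (gamma_size * alpha_size))
    preds = []
    for x in xs:
        belief = belief @ K.T                           # alpha drifts
        belief = (1 - alpha) * belief + alpha * (gamma_prior[:, None] * belief.sum(axis=0))
        preds.append(float(gamma @ belief.sum(axis=1))) # predictive P(x = 1)
        likelihood = gamma if x == 1 else 1 - gamma
        belief *= likelihood[:, None]
        belief /= belief.sum()
    return np.array(preds)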

Fig. 4 (a) Graphical representation of the hierarchical dynamic belief model (HDBM). x_t is the trial type at time t, γ_t is the parameter of the Bernoulli process generating x_t, and α_t is the change probability. The original DBM (Yu & Cohen, 2009) consists of only the dark parts of the graph, with α constant. (b) Comparison of model performances for Experiments 1 and 2. Error bars for the power and exponential models—and, similarly, for the HDBM and DBM models—represent the standard errors of the R² differences between the two models across subjects

The success of the HDBM in fitting the data suggests a normative explanation for the long-range influence of incidental experience on behavior. Under the assumptions of the HDBM, the mind optimally adapts to a complex dynamic environment in which even seemingly irrelevant experiences that occur far in the past offer predictive information about upcoming environmental states and task demands. Specifically, the expected relevance of a past experience to the current moment falls off according to an approximate power function.

As we previously mentioned, human forgetting of explicit (declarative) knowledge in long-term memory is often characterized in terms of power decay. This decay function has been cast as rational, via the observation that in diverse domains—newspaper articles, parental speech, and electronic mail—the empirical probability of needing access to a specific piece of information is well fit by a power function of time (Anderson & Schooler, 1991). The present analyses of the DBM and HDBM indicate that this observation is not well explained by nonstationarity with a fixed change probability, but that introducing variable change rates offers the basis for a normative explanation. Thus, power decay serves as an informative connection between sequential effects, long-term memory, and the statistical structure of the environment.

Concluding remarks

Contrary to the prevailing assumption that variations in experience produce only fleeting perturbations in behavior, we have argued that incidental priming yields enduring modulations of behavior. Modeling indicates that past experience is integrated to anticipate the future using a weighting that is strongly recency based but that also has a heavy tail, consistent with power but not exponential discounting. Power discounting can be characterized as optimal adaptation to the statistics of an environment with second-order nonstationarity.

To perform optimal prediction in nonstationary environments with change-point dynamics, the complete history of experience must be maintained (Adams & MacKay, 2006). Consequently, our results are consistent with the perspective that as individuals interact with their world, they continually log their experiences, forming a library of memory traces that is called on to adapt behavior to an environment that can change on time scales ranging from seconds to months. Alternatively, a good approximation of optimal prediction can be achieved by combining across several exponentially decaying sequence statistics that span a range of time scales (e.g., Kording, Tenenbaum, & Shadmehr, 2007; Mozer, Pashler, Cepeda, Lindsey, & Vul, 2009; Sikström, 1999, 2002; Staddon, Chelaru, & Higa, 2002; Wixted, 2004). Indeed, Mozer et al. (2009) and Murre and Chessa (2011) demonstrated mathematically that power functions emerge when an infinite collection of exponential functions is averaged together, assuming certain constraints on the distribution of decay rates. Our work suggests the necessity of combining across multiple time scales, ranging from a few trials, to hundreds of trials, to the entire duration of an experiment. The presence of power decay, regardless of the precise mechanisms that produce it, suggests that sequential dependencies in rapid decision making are best understood as a memory phenomenon akin to human long-term declarative memory, rather than as a by-product of short-term incremental learning.
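The mixture-of-exponentials result can be checked numerically. In the sketch below, the decay rates are drawn from a gamma distribution, a standard choice that yields a closed-form power-law mixture; this illustrates the mathematical point, not the specific constructions of Mozer et al. (2009) or Murre and Chessa (2011).

import numpy as np

rng = np.random.default_rng(0)
lags = np.array([1, 10, 100, 1000])
rates = rng.gamma(shape=1.0, scale=1.0, size=100_000)  # spectrum of decay rates
mixture = np.exp(-np.outer(rates, lags)).mean(axis=0)  # average of exponentials

# For gamma(shape=a, scale=s) rates, the mixture equals (1 + s*lag)**(-a),
# a power function; the Monte Carlo average matches it closely.
print(mixture)                    # approximately [0.5, 0.0909, 0.0099, 0.0010]
print((1.0 + lags) ** -1.0)       # exact power-law mixture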

The perspective that sequential effects reflect memory storage and updating offers a novel interpretation of the continual and often long-range (Gilden, Thornton, & Mallon, 1995) fluctuations observed in human behavior and cognition. Far from being internal noise in the system, trial-to-trial variability in choice, response latency, and movement reflects an adaptive process in which individuals exploit their extensive experience in order to respond optimally to a dynamic world (Appendix).