Introduction

People often fail to persist in their pursuit of outcomes that they initially desire. They abandon diets, quit waiting for buses, and hang up on customer service. A classic laboratory demonstration of this phenomenon is the “marshmallow task” (Mischel et al., 1972; Mischel and Ebbesen, 1970), in which children initially start waiting for a larger reward (e.g., two marshmallows), but often give up at some point and take a smaller reward (e.g., one marshmallow) instead. Why do people fail to persist in waiting for delayed rewards? There are at least two competing explanations. The “limited self-control” hypothesis (Baumeister and Heatherton, 1996; Muraven and Baumeister, 2000) explains quitting as the result of a limited ability for self-control; waiting is effortful, and people cannot sustain that effort indefinitely. In contrast, the “temporal expectations” hypothesis explains quitting as the result of decision-making in the face of temporal uncertainty; people do not know exactly when the delayed reward will arrive, and given their expectations about the possible times at which it could arrive, there is a point at which waiting is no longer value-maximizing (McGuire and Kable, 2013). We used a physiological index of surprise and effort—pupil dilation—to adjudicate between these two explanations of persistence under temporal uncertainty.

One explanation for why people quit waiting for delayed rewards is that persistence is effortful and becomes more difficult over time as the capacity to exert effort declines (Baumeister and Heatherton, 1996; Muraven and Baumeister, 2000). Therefore, individual differences in persistence may reflect differences in the capacity for self-control. In line with this idea, children who wait longer in the marshmallow task do better academically and have fewer behavioral problems as adolescents and young adults (Falk et al., 2020; Mischel et al., 1989; Shoda et al., 1990). This suggests that the same self-control processes that support goal attainment in real life also may support persistence for delayed rewards.

An underlying assumption of this “limited self-control” explanation is that it is always valuable to keep waiting for delayed rewards once you have started waiting. However, whether continued persistence is valuable depends on a person’s expectations about the possible times at which delayed rewards will materialize. When there is uncertainty about the timing of delayed rewards, the normative amount of time to wait depends on the person’s beliefs about the probability distribution of reward arrival times (see McGuire and Kable, 2013 for an overview of this theoretical framework). In some cases, a person’s beliefs are such that the expected wait time remaining decreases as time elapses (e.g., if the distribution of potential arrival times is Gaussian or uniform). Under such beliefs, once you have started waiting for a delayed reward, you should persist in waiting (e.g., waiting through a commercial break). Giving up under Gaussian or uniform beliefs could be interpreted as a failure of self-control, because the person believes that the arrival of the awaited reward is getting closer, but they quit anyway.

However, in other cases, a person’s beliefs are such that the expected wait time remaining increases as time elapses (e.g., if beliefs about arrival times follow a “heavy-tailed” distribution). Under such beliefs, there is a point at which you should quit, because the expected arrival of the awaited reward is now so distant that it is no longer worth waiting. Thus, giving up under heavy-tailed beliefs cannot be interpreted as a failure of self-control, because limiting persistence is the rational course of action. It follows that if people hold heavy-tailed beliefs about reward timing, then their quit decisions are more likely to be explained by their temporal expectations than by their (lack of) self-control. Consistent with this “temporal expectations” account of quitting, people do hold heavy-tailed beliefs about reward timing in many real-world situations (Griffiths and Tenenbaum, 2006), such as waiting for a diet or exercise regimen to work (McGuire and Kable, 2013). In addition, experimental manipulations of reward timing expectations impact people’s willingness to wait for delayed rewards (Fung et al., 2017; Kidd et al., 2013; Lang et al., 2021; Lempert et al., 2018; Massar and Chee, 2015; McGuire and Kable, 2012, 2015; Michaelson et al., 2013; Michaelson and Munakata, 2016). Moreover, individual differences in wait times in the marshmallow task are strongly associated with childhood environmental circumstances, such as socioeconomic status and social support, which could be proxies for the sparseness and predictability of rewards in daily life (Michaelson and Munakata, 2020; Watts et al., 2018).

In addition to making different predictions about participants’ reward timing beliefs, the limited self-control and temporal expectations accounts also make different predictions about which quit decisions require the most effort. Under the limited self-control account, waiting becomes more and more effortful (or aversive) over time, so the longer one has waited before quitting, the more effort one must have exerted. Under a temporal expectations account, however, decisions to quit are the consequence of a value-based decision process. In a value-based decision-making context (taking response time as an indicator of effort), the most effortful decisions are the ones that are most atypical, because they require overriding one’s typical decision tendency or strategy (Krajbich et al., 2015). Therefore, decisions to wait longer than one usually does will be effortful, but so will decisions to quit earlier than one usually does.

In the current study, we examined whether participants’ persistence in waiting was better explained by limited self-control or expectations about temporal uncertainty. We used a task, adapted from our previous research (McGuire and Kable, 2012), that was designed to minimize any differences in the payoffs between different waiting strategies and therefore assess an individual’s temporal expectations and willingness to wait for rewards when minimally influenced by feedback. It would be difficult to decide which explanation for quitting—limited self-control or temporal expectations—is more likely by using behavioral data alone, because temporal expectations and effort are latent variables. Therefore, we measured pupil diameter throughout the task, and we focused our analyses on two different metrics that have been associated with related, but distinct constructs: phasic pupil responses (post-outcome) and pre-decision pupil diameter (pre-outcome).

Under constant illumination, rapid phasic pupil dilation occurs in response to surprising events (Critchley et al., 2005; Nassar et al., 2012; O’Reilly et al., 2013; Preuschoff et al., 2011). This surprise response does not simply reflect sensory changes and is modulated by individual differences in beliefs (Filipowicz et al., 2020) and reward valuation (Lempert et al., 2016; Lempert et al., 2015). We expected to observe a larger phasic pupil dilation response when participants collected rewards compared with trials on which they quit before receiving the reward. Our key analysis examined the relationship between this reward-related pupil dilation response and the length of time the participant had been waiting for the reward. If participants believed that the reward’s arrival was becoming more likely over time, then their pupil surprise responses would decrease as a function of wait time (and quitting would be irrational, and therefore indicative of a failure to sustain self-control). If, instead, participants believed that the reward’s arrival was becoming less likely as time passed, then their pupil surprise responses would increase as a function of wait time (and eventually quitting would be warranted).

In addition to examining phasic pupil responses to reward receipt, we also examined “pre-decision” pupil diameter just before quit decisions (de Gee et al., 2014; Urai et al., 2017). Pupil diameter measured during engagement with a task is associated with cognitive effort (Kahneman and Beatty, 1966; van der Wel and van Steenbergen, 2018). Pupils dilate when cognitive effort needs to be exerted to complete a task (Da Silva Castanheira et al., 2020; Robison et al., 2021; Sayalı et al., 2022). Pupils also dilate more during decision-making when decisions are more effortful: when option values are close to each other and decision confidence is low (Colizoli et al., 2018; de Gee et al., 2014; Urai et al., 2017; Zénon, 2019). Under a limited self-control account of persistence, we might expect that waiting becomes more effortful over time, and therefore pupil diameter would increase the longer one has been waiting. In this case, pre-decision pupil diameter would be largest when delays were very long. In contrast, under a temporal expectations account of persistence, quitting is the consequence of a value-based decision-making process, and the most effortful quit decisions would not necessarily be the ones preceded by the longest wait. Rather, quit decisions would be more effortful to the extent that they go against one’s usual choice tendency or strategy. Indeed, in other choice contexts, pupil dilation is greatest around the time of choices that go against one’s typical tendency (de Gee et al., 2014, 2020; Krishnamurthy et al., 2017). Therefore, under a temporal expectations-based explanation, we would expect pre-decision pupil diameter to be largest when the trial’s quit time deviates substantially from one’s usual quit time, in either direction.

In summary, we measured pupil diameter while participants performed a persistence task to adjudicate between two potential explanations for giving up on delayed rewards: a limited self-control account in which people exhaust their capacity to wait over time, and a temporal expectations-based account in which people believe that delayed rewards are less and less likely to materialize as time passes.

Method

Participants

Ninety-six participants (69 F, 27 M; mean age = 21.30; SD = 4.34, range: 18-46) completed the study. Thirteen participants were excluded from pupil dilation analyses because of missing eye tracking data (n = 1) or insufficient variability in their persistence behavior (see below for details; n = 12). The Institutional Review Board of the University of Pennsylvania approved all procedures, and all participants provided informed consent.

Persistence task

Participants performed two 10-min blocks of a persistence task while eye tracking data were recorded at 60 Hz using a Tobii T60 eye tracker. Participants wore headphones. They used a stabilizing chin rest, and the distance between the participant and the screen was fixed at 65 cm. They were asked to fixate on the center of the screen for the duration of each block. Participants were allowed to take a break between blocks, and the eye tracker was calibrated at the beginning of each block.

The persistence task was adapted from our previous work (Lempert et al., 2018; McGuire and Kable, 2012, 2015). At the outset of the task, participants were told that their goal was to earn as much money as possible in a fixed amount of time (10 min per block), because they would keep any money that they earned in the task. On each trial of the task, participants saw a token labeled “0¢” appear on the screen. After a variable delay, the token “matured,” the border of the token changed color (from gray to yellow) and its value changed to 10¢. The token always matured within 40 seconds, and participants knew that the token would always mature, but they did not know what the possible delays or maximum delay would be (see below for information about the distribution of delays). Participants could sell the token at any time for its current value (0¢ or 10¢) by pressing the spacebar. Upon pressing the spacebar, they immediately received auditory feedback (a chime sound played binaurally over headphones) for 0.25 s, and the screen refreshed with a new 0¢ token and the updated earnings (Fig. 1). The token’s value was added to the participant’s total earnings, and the total accumulated earnings were displayed on the screen throughout the experiment. Participants were informed that they could sell the token before it matured if they felt it was taking too long and they wanted to move on to a new token, but they would not earn any money from selling 0¢ tokens. To minimize potentially confounding visual changes during the trial, unlike previous versions of the task, there was no bar indicating progress through the trial, nor was the time remaining in the block shown. The task was programmed using the Psychophysics Toolbox in MATLAB (Brainard, 1997; Kleiner et al., 2007).

Fig. 1

Design of the persistence task. Subjects waited a variable amount of time for a token to mature. When the token matured, the token’s value changed from “0¢” to “10¢” and the border of the token turned from gray to an isoluminant yellow. Subjects had to press the spacebar to sell the token, receive its current value, and advance to the next trial. At any point in the trial, subjects could quit waiting for the token to mature, sell it for no value, and advance to the next trial. Immediately upon the button press, the token reset to 0¢ and earnings were updated (if necessary), and a 250-ms chime played into subjects’ headphones to provide feedback that the token had been sold. Accumulated earnings appeared on the screen throughout the experiment

The delays until the token matured were drawn from a distribution that yields an approximately equivalent reward rate regardless of the participant’s waiting policy. The waiting policy is defined as the time at which a decision maker will give up waiting on each trial if the reward has not yet arrived. The expected return for giving up at time t is calculated as follows. Let p_t be the proportion of rewards delivered earlier than t. Let τ_t be the mean duration of these rewarded trials. One trial’s expected return, in dollars per second, is:

$$R_t=\frac{0.10\,p_t}{\tau_t\,p_t+t\,\left(1-p_t\right)+\mathrm{ITI}}$$
(1)

The numerator is the trial’s expected gain in dollars, given a 10¢ reward. The denominator is the trial’s expected time cost in seconds, which depends on the length of the inter-trial interval (ITI). As the ITI increases, waiting policies that are very short will yield lower expected reward rates. We used a 0.25-s ITI, which allows for participants to get feedback when selling the token but is sufficiently short that very short waiting policies are not severely disadvantaged.
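As a worked illustration of Eq. (1), the following minimal Python sketch computes the expected return for a single waiting policy from a list of scheduled delays. The function name `expected_return` and the four example delays are hypothetical and chosen only for illustration; the 10¢ reward and the 0.25-s ITI are taken from the task description.

```python
def expected_return(delays, t, reward=0.10, iti=0.25):
    """Expected return in dollars per second for giving up at time t (Eq. 1).

    `delays` is a list of scheduled reward delays in seconds. The function
    name and the example values below are illustrative assumptions.
    """
    rewarded = [d for d in delays if d < t]          # rewards delivered earlier than t
    p_t = len(rewarded) / len(delays)                # proportion of rewarded trials
    tau_t = sum(rewarded) / len(rewarded) if rewarded else 0.0  # mean rewarded duration
    time_cost = tau_t * p_t + t * (1.0 - p_t) + iti  # expected seconds per trial
    return reward * p_t / time_cost


# Hypothetical schedule of four delays, evaluated for a policy of quitting at 8 s.
example_delays = [2.0, 6.0, 10.0, 30.0]
example_rate = expected_return(example_delays, t=8.0)
```

In this example, half of the tokens mature before 8 s (mean duration 4 s), so the expected gain is $0.05 per trial and the expected time cost is 2 + 4 + 0.25 = 6.25 s per trial.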

We used simulations to identify a reward timing distribution that would generate an approximately flat relationship between waiting policy and expected return rate, given our experimental conditions. For these simulations, we used the Generalized Pareto distribution, which has the following cumulative distribution function:

$$F_{GP}(t)=1-\left(1+\frac{k\,t}{\lambda}\right)^{-1/k}$$
(2)

(The location parameter θ is set to zero and omitted, implying that zero is the shortest delay.) The parameter k is the shape parameter. When k = 0 (and θ = 0), the Generalized Pareto distribution equals the exponential distribution with scale parameter λ. We estimated R_t (Eq. (1)) for waiting policies t in the range [0, 40] s in increments of 0.01 s by taking 100,000 draws from a Generalized Pareto distribution. We did this for each k in the range [0.001, 0.1] in increments of 0.001. We then found the value of k that resulted in the minimum absolute slope of a linear fit between t and R_t over the range of 5 s to 40 s. We set the scale parameter λ to 12 s, as this would yield total earnings that were comparable to those in previous studies (~$4-5 per block); the scale parameter determines the median delay, which does not affect the expected reward rate as a function of waiting policy, although it does affect the overall earnings in the task. Thus, the final reward timing distribution we selected was a Generalized Pareto distribution with the following parameters: k = 0.034; λ = 12; θ = 0 (Fig. 2). Note that if there were no ITI, an exponential reward timing distribution (i.e., k = 0) would generate a flat expected return rate across all waiting policies, but a slightly higher k is necessary when there is an ITI.
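The search over shape parameters described above can be sketched roughly as follows. This is not the original MATLAB code; it is a scaled-down Python illustration under several assumptions: delays are generated by inverting Eq. (2), the same sorted uniform draws are reused for every k (to reduce simulation noise), only 20,000 draws are taken, and waiting policies are evaluated every 0.5 s between 5 s and 40 s. All function names are hypothetical, and because of these simplifications the selected k need not match the value reported in the text.

```python
import bisect
import random


def gp_inverse_cdf(u, k, lam):
    """Invert Eq. (2): the delay t at which F_GP(t) = u (location parameter 0)."""
    return lam * ((1.0 - u) ** (-k) - 1.0) / k


def return_curve(sorted_delays, policies, reward=0.10, iti=0.25):
    """Evaluate Eq. (1) at each waiting policy, given delays sorted ascending."""
    n = len(sorted_delays)
    prefix = [0.0]                                   # running sums of sorted delays
    for d in sorted_delays:
        prefix.append(prefix[-1] + d)
    curve = []
    for t in policies:
        n_lt = bisect.bisect_left(sorted_delays, t)  # number of delays earlier than t
        p_t = n_lt / n
        # prefix[n_lt] / n equals tau_t * p_t in Eq. (1)
        time_cost = prefix[n_lt] / n + t * (1.0 - p_t) + iti
        curve.append(reward * p_t / time_cost)
    return curve


def ols_slope(x, y):
    """Slope of an ordinary least-squares line fit of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def flattest_k(k_grid, lam=12.0, n_draws=20000, seed=0):
    """Return (k, slope) for the k whose return curve has the smallest |slope|."""
    rng = random.Random(seed)
    u = sorted(rng.random() for _ in range(n_draws))  # assumption: shared draws across k
    policies = [5.0 + 0.5 * i for i in range(71)]     # 5 s to 40 s
    best = None
    for k in k_grid:
        delays = [gp_inverse_cdf(v, k, lam) for v in u]  # stays sorted (monotone in u)
        slope = ols_slope(policies, return_curve(delays, policies))
        if best is None or abs(slope) < abs(best[1]):
            best = (k, slope)
    return best


k_grid = [round(0.001 * i, 3) for i in range(1, 101)]  # 0.001 to 0.100
best_k, best_slope = flattest_k(k_grid)
```

Truncating delays at 40 s is omitted here because it does not change Eq. (1) for policies of 40 s or less: a delay that is set to 40 s is still not "earlier than t".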

Fig. 2

Reward timing distribution for the persistence task. Histogram of possible reward delays (a), and projected, z-scored, average rate of return as a function of waiting policy (b). This quasi-exponential reward distribution was used so that we could capture individual differences in persistence behavior absent any optimal strategy

To avoid very long delays, the distribution was truncated at 40 s, such that any delays longer than 40 s were set to 40 s. Furthermore, as in previous studies (Lempert et al., 2018; McGuire and Kable, 2012), to ensure that participants’ experiences were representative of the true underlying distribution, delays were not drawn randomly on each trial, but instead were sampled from each octile of the distribution in random order before an octile was repeated. Finally, the random seed that was used to generate delays was held constant across subjects, so the scheduled delays that participants were exposed to were identical. This means that any behavioral differences between participants cannot be attributed to random differences in the scheduled delays across participants.
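One way to implement this kind of octile-balanced schedule is sketched below in Python. The sketch assumes that, within each octile, a delay is obtained by drawing a uniform quantile and inverting Eq. (2); the text does not specify how values within an octile were chosen, so that step is an assumption, as are the function names and the seed value.

```python
import random


def gp_inverse_cdf(u, k=0.034, lam=12.0):
    """Invert Eq. (2) with the task's parameters (location parameter 0)."""
    return lam * ((1.0 - u) ** (-k) - 1.0) / k


def scheduled_delays(n_trials, seed=0, max_delay=40.0):
    """Generate delays so that every run of 8 trials visits each octile once.

    A fixed seed reproduces the same schedule for every participant.
    """
    rng = random.Random(seed)
    delays = []
    while len(delays) < n_trials:
        octiles = list(range(8))
        rng.shuffle(octiles)                  # random order within each cycle of 8
        for j in octiles:
            u = (j + rng.random()) / 8.0      # assumption: uniform quantile inside octile j
            delays.append(min(gp_inverse_cdf(u), max_delay))  # truncate at 40 s
    return delays[:n_trials]


schedule = scheduled_delays(16, seed=3)
```

Because the generator is seeded, calling `scheduled_delays` twice with the same seed returns an identical list, which mirrors the design choice of holding scheduled delays constant across participants.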

To operationalize individual differences in persistence, we constructed a Kaplan–Meier survival curve for each participant. The Kaplan–Meier estimator is a nonparametric estimator of the survival function (Kaplan and Meier, 1958). For each time t, it plots the participant’s probability of waiting at least until t if the reward is not delivered earlier. The area under the survival curve (AUC) is a useful summary statistic, representing the average number of seconds an individual was willing to wait within the analyzed interval. AUC calculations were done using custom software in MATLAB. Participants with AUC values of less than 5 s across both blocks were excluded from analysis (n = 11), both because they would have too few trials for examining pupil dilation responses and because routinely waiting less than 5 s would result in lower task earnings, since our task was designed so that earnings would be approximately equal across the range of 5 s to 40 s. An additional participant was excluded from analysis because they never quit waiting for a token in one block of the task, so there were no quit events for pupil analyses in that block.
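For readers unfamiliar with the estimator, the following Python sketch shows one way the AUC could be computed. It treats quit trials as events and rewarded trials as censored observations, and it integrates the step function up to an assumed upper limit `t_max` (40 s by default), carrying the last survival estimate forward to that limit. The function name and the toy data are hypothetical; this is not the custom MATLAB software used in the study.

```python
def km_auc(durations, quit_flags, t_max=40.0):
    """Area under a Kaplan-Meier survival curve from 0 to t_max.

    durations[i]  : seconds waited on trial i
    quit_flags[i] : True if the participant quit (event),
                    False if the reward arrived first (censored)
    """
    trials = list(zip(durations, quit_flags))
    event_times = sorted({d for d, q in trials if q and d <= t_max})
    surv, auc, last_t = 1.0, 0.0, 0.0
    for t in event_times:
        auc += surv * (t - last_t)                       # area of the flat step so far
        at_risk = sum(1 for d, _ in trials if d >= t)    # still waiting just before t
        quits = sum(1 for d, q in trials if q and d == t)
        surv *= 1.0 - quits / at_risk                    # Kaplan-Meier product step
        last_t = t
    auc += surv * (t_max - last_t)                       # carry last estimate to t_max
    return auc


# Toy data: quits at 2 s and 6 s; rewards arrive (censoring) at 4 s and 8 s.
toy_auc = km_auc([2.0, 4.0, 6.0, 8.0], [True, False, True, False], t_max=10.0)
```

In the toy data, survival drops to 0.75 at 2 s and to 0.375 at 6 s, so the area is 2 + 0.75 × 4 + 0.375 × 4 = 6.5 s.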

In a pilot study (n = 32; mean age = 26.07; SD = 7.30; range: 18-56; 17 F, 14 M, 1 not reported), we ensured that, in this task, AUC was uncorrelated with earnings (Spearman ρ = 0.13; p = 0.478). The pilot study also confirmed that this task generates a wide range of levels of persistence (mean Day 1 AUC = 11.79; SD = 6.07; range: 4.43–32.37) and that our persistence measure is test-retest reliable (rank-order correlation between Day 1 AUC and AUC 1 week later: ρ = 0.85; p < 0.001).

Questionnaires

After completing the persistence task, participants completed the following questionnaires in Qualtrics: a computerized intertemporal choice questionnaire (Lempert et al., 2020; Senecal et al., 2012), the Rotter locus of control questionnaire (Rotter, 1966), the Alcohol Use Disorders Identification Test (Saunders et al., 1993), the Drug Abuse Screening Test (Skinner, 1982), and the Fagerstrom Test for Nicotine Dependence (Heatherton et al., 1991). These questionnaires were included for exploratory analyses examining associations between these constructs and persistence. Because we found no significant associations (all ps > 0.10), these measures are not discussed further.

Pupil diameter preprocessing

Pupil diameter data were preprocessed (separately for each 10-min block) with the following steps using custom MATLAB software: removing and linearly interpolating over samples that corresponded to abnormally fast changes in pupil diameter (defined as a temporal derivative more than 3 SD from the mean temporal derivative; these include eye blinks), regressing out the influence of eye gaze displacement, low-pass filtering with a 4-Hz cutoff, and finally, z-scoring each measurement relative to the block mean.

Pupil diameter analyses: Post-outcome phasic pupil response

We predicted that pupil dilation responses to the trial outcome would be modulated by whether the trial was rewarded or not. For this analysis, also done using custom software in MATLAB, we extracted pupil diameter time courses for the 4-s epoch time-locked to the token maturation time (for rewarded trials) or the button press that sold the 0¢ token (for quit trials). Pre-outcome pupil diameter (averaged over the 200 ms before the end of the trial) was subtracted from each of the pupil diameter measurements. Note that these pupil time courses overlapped with the following trial’s delay period. Therefore, we excluded trials for which the following trial lasted less than 4 s, so that no time course extended past the end of the following trial. We ran a series of regressions for each subject at each time point, with pupil diameter as the dependent variable and the trial outcome (1 = rewarded; 0 = quit) as the independent variable. The outcome of the following trial (1 = will be rewarded; 0 = will quit), the pre-outcome pupil diameter, and the accumulated earnings (that were shown on the screen throughout the trial) were included as covariates of no interest. Accumulated earnings might influence phasic pupil responses to rewards, if each additional reward gained in the task is valued less than the one before (i.e., money has diminishing marginal utility). Controlling for earnings also effectively controls for time-on-task, because earnings never decrease within a task block. Regression coefficients were compared to zero using one-sample t-tests, and the Benjamini-Hochberg false discovery rate procedure was used to correct for multiple comparisons (see Footnote 1).
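The multiple-comparisons step can be illustrated with a short Python sketch of the Benjamini-Hochberg step-up procedure. It assumes that one p-value per time point has already been obtained from the one-sample t-tests; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests are significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices sorted by ascending p
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:                  # step-up criterion
            cutoff_rank = rank                           # keep the largest passing rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


# Hypothetical p-values for four time points.
example_flags = benjamini_hochberg([0.001, 0.03, 0.035, 0.5])
```

Note that the procedure rejects every hypothesis ranked at or below the largest rank that passes its threshold. In the example, 0.03 fails its own threshold (0.025) but is still declared significant because 0.035 passes the third-rank threshold (0.0375).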

Pupil diameter analyses: Relationship between wait time and reward-related pupil responses

We also examined the relationship between the reward-related pupil response and the amount of time participants waited for the reward. We expected that reward-related pupil responses would be modulated by the participants’ surprise at receiving a reward at that moment in time. If participants believed that the reward’s arrival was becoming more likely with time (as under uniform timing beliefs; see below), then we would expect that the reward-related pupil response would decrease with time waited. In contrast, if participants believed that the reward’s arrival was becoming less likely with time (as under heavy-tailed timing beliefs; see below), then we would expect that the reward-related pupil response would increase with time waited.

For just the rewarded trials, we determined the peak pupil diameter change (relative to the pre-outcome period) over the 0.5- to 4-s period following token maturation. We then regressed out any variance due to the pre-outcome pupil diameter and the outcome of the following trial. Thus, the following analysis, which was done in STATA, used the peak pupil diameter residuals as the dependent variable. We ran a mixed-effects linear regression with peak pupil response as the dependent variable and wait time as the independent variable. We allowed random slopes and intercepts to vary by subject and allowed for correlation between random slopes and intercepts. We used restricted maximum-likelihood estimation and computed degrees of freedom using the Satterthwaite method. We excluded trials in which the wait time was <1 second, reasoning that very quickly maturing tokens would always be surprising, since participants would most likely expect there to be some minimum wait time regardless of the form of their temporal expectations, but that this minimum wait time would likely be <1 second for most participants.

To illustrate the links between different forms of temporal beliefs and surprise at the reward’s arrival, we considered two distributions (Fig. 5a). We selected a uniform reward timing distribution as an example of one in which the likelihood that the reward will arrive now, given that one is still waiting, increases as time passes. In a uniform reward timing distribution, the reward is equally likely to be scheduled to arrive after every possible delay. We considered a uniform reward timing distribution between 0 s and 40 s and solved analytically for the probability of the reward arriving within each 1-s interval over the range from 0 s to 40 s, given that the reward had not already arrived (i.e., conditional on the reward not arriving earlier). We defined the surprise at the reward’s arrival as the Shannon information content (surprisal) of this probability (i.e., −log2 (p)). Under uniform timing beliefs, the expectation that the reward will arrive now increases at an accelerating rate as time elapses within the trial (the conditional probability for the 1-s interval beginning at time t is 1/(40 − t)), and correspondingly, the anticipated surprise in response to the reward arriving decreases at an accelerating rate (Fig. 5a plots this curve for the range 0–30 s).

We selected a heavy-tailed reward timing distribution as an example of one in which the likelihood that the reward will arrive now, given that one is still waiting, decreases as time passes. We considered a Generalized Pareto distribution (Eq. (2)) with the following parameters: k = 0.81; λ = 16.1309; θ = 0, truncated at 90 s (i.e., setting all delays greater than or equal to 90 s to 90 s). We selected the same value of the shape parameter k used for illustration in a previous paper (McGuire and Kable, 2013) and set the value of the scale parameter λ so that the median initial expected waiting time equaled 15 s. We took 1,000,000 simulated draws from this distribution to estimate the probability of the reward arriving within each 1-s interval over the range from 0 s to 40 s, given that the reward had not already arrived (i.e., conditional on the reward not arriving earlier). The estimated probability of the reward arriving now decreases as time elapses in the trial, steeply at first and then more gradually (before truncation, the hazard rate of this distribution is 1/(λ + kt)). Correspondingly, the anticipated surprise at the reward’s arrival increases as time passes (Fig. 5a plots this curve for the range 0–30 s). As long as k > 0 (i.e., beliefs are heavy-tailed) and the truncation point is after 40 s (i.e., participants believe that the maximum possible delay is at least 40 s), we see a similar pattern for other values of k, λ, and the truncation point.
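The two anticipated-surprise curves can be approximated with the short Python sketch below. It departs from the procedure described above in one respect, stated plainly: the heavy-tailed conditional probabilities are computed analytically from the survival function implied by Eq. (2) rather than estimated from 1,000,000 simulated draws. Truncation at 90 s is ignored because it does not affect intervals ending before 90 s. Function names are hypothetical.

```python
import math


def uniform_conditional_prob(i, t_max=40):
    """P(reward arrives in [i, i+1) | not yet arrived), uniform delays on [0, t_max]."""
    return 1.0 / (t_max - i)


def gp_survival(t, k=0.81, lam=16.1309):
    """P(delay > t) under Eq. (2), i.e. 1 - F_GP(t)."""
    return (1.0 + k * t / lam) ** (-1.0 / k)


def gp_conditional_prob(i, k=0.81, lam=16.1309):
    """P(reward arrives in [i, i+1) | not yet arrived) under Generalized Pareto beliefs."""
    return 1.0 - gp_survival(i + 1, k, lam) / gp_survival(i, k, lam)


def surprise(p):
    """Shannon information content (surprisal), in bits, of an event with probability p."""
    return -math.log2(p)


bins = range(30)  # 1-s intervals over the 0-30 s range plotted in Fig. 5a
uniform_surprise = [surprise(uniform_conditional_prob(i)) for i in bins]
heavy_tailed_surprise = [surprise(gp_conditional_prob(i)) for i in bins]
```

Under these assumptions, `uniform_surprise` falls monotonically with elapsed time and `heavy_tailed_surprise` rises monotonically, which is the qualitative contrast the pupil analysis is designed to detect.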

Pupil diameter analyses: Pre-decision pupil diameter

Finally, we examined the relationship between the length of time waited before quitting (“quit time”) and pupil diameter at the end of quit trials. We focused on quit trials only, because it is impossible to know where in the decision process a participant is when the token matures and they are rewarded. By focusing on quit trials, we can be certain that the participant has reached a decision to quit right before they press the button to do so. If pupil diameter is largest when participants waited the longest before quitting (i.e., a significant positive relationship between trial quit time and pupil diameter), that would be consistent with a “limited self-control” account in which waiting becomes more effortful with time (see Footnote 2). Conversely, if pupil diameter is largest when the quit time deviates most from the subject’s typical quit time, that would be consistent with an account of persistence in which people are rationally weighing the costs and benefits of waiting. Under this account, we would expect the most effortful and difficult decisions to be those that go against one’s typical choice tendency, as observed in other decision-making contexts.

For this analysis, we used the pre-outcome pupil diameter measurements described above, which averaged over 200 ms before the end of each trial. We only included quit trials that were ≥3 s, so that these measurements were not contaminated by pupil responses following previous trials. In STATA, we conducted two mixed-effects linear regressions to predict pre-decision pupil diameter and compared their model fits using AIC. In the first regression, trial quit time was entered as the independent variable, with the previous trial’s outcome entered as a covariate of no interest. In the second regression, the independent variable was the absolute difference between the quit time and the participant’s AUC, and again we controlled for the previous trial’s outcome. The mixed-effects structure allowed for random intercepts at the participant level and random slopes by participant for the predictor of interest. It also allowed for correlation between random slopes and intercepts. We used restricted maximum-likelihood estimation and computed degrees of freedom with the Satterthwaite method.

Results

Individual differences in persistence

Participants (n = 83 in the final sample) performed a persistence task (Fig. 1) in which they waited for small (10¢) rewards on each trial. The arrival time of the reward was uncertain, and participants could choose to quit waiting and forgo the reward at any time to move on to a new trial. The task was designed so that the expected rate of return was equivalent regardless of how long participants chose to wait on each trial before giving up (Fig. 2). We found that, under these conditions, participants varied substantially in how long they chose to wait on average (in the final sample, mean AUC = 12.22 s; SD = 5.08; range: 5.43–32.59 s; Fig. 3).

Fig. 3

Individual differences in persistence behavior (n = 83). AUC refers to the area under the Kaplan-Meier curve. For each time t, the Kaplan-Meier curve plots the participant’s probability of waiting at least until t if the reward is not delivered earlier. The area under the survival curve represents the average number of seconds an individual was willing to wait before quitting. The number of participants whose AUC fell into each 2-s bin between 5 s and 33 s is shown (n = 1 participant who never quit in one block and n = 11 participants with AUC < 5 s were excluded from analyses, so they are not included in the histogram).

Reward-related phasic pupil responses are consistent with heavy-tailed temporal expectations

Pupil dilation following the outcome was greater on trials in which the token matured and the participant was rewarded with 10¢ than on trials in which the participant chose to sell the token early and received no reward (Fig. 4). Regression coefficients, with reward on that trial (1 = rewarded; 0 = quit) predicting pupil diameter, were significantly greater than zero from 1.38 s to 4 s following the outcome (note: they were significantly less than zero from 0 s to 0.82 s, which is expected, given the initial pupil constriction in response to a visual change). Given that reward timing was unknown to participants, but quitting was participant-initiated, this result supports our use of phasic pupil dilation as an index of surprise.

Fig. 4

Effect of reward on pupil diameter, time-locked to the trial outcome. The outcome is defined as the token maturation time for rewarded trials, and the button press for quit trials. Regression coefficients, with reward on that trial (1 = rewarded; 0 = quit) predicting pupil diameter, were briefly significantly less than zero from 0 s to 0.82 s, and then significantly greater than zero from 1.38 s to 4 s. Pupil responses are plotted relative to the pre-outcome pupil diameter (average over 200 ms before the outcome). The outcome of the following trial, the pre-outcome pupil diameter, and the accumulated earnings at that point in the block were all included as covariates of no interest. Regression coefficients were compared to zero using one-sample t-tests, and the Benjamini-Hochberg false discovery rate procedure was used to correct for multiple comparisons (***statistical significance: p < 0.05, FDR-corrected)

We found that reward-related phasic pupil responses were consistent with participants having heavy-tailed expectations about when rewards would arrive (Fig. 5b). We examined the relationship between pupil responses on rewarded trials and the amount of time participants waited for the reward. There was a significant linear relationship between wait time and pupil response (β = 0.0101; z = 4.91; p < 0.001); pupil responses were larger on trials with longer wait times. Taking phasic pupil responses as a measure of surprise, this result suggests that, as time elapsed in the trial, participants believed that the token was less and less likely to mature at any given moment.

Fig. 5

Hypothesized and actual surprise responses to reward as a function of wait time. a Simulations of expected pupil surprise responses given uniform and heavy-tailed beliefs about reward timing (1-s bins, range: [0–30] s). When beliefs are uniform, that is, when participants believe that the reward is scheduled to arrive at any time during the waiting period with equal probability, then the probability that the reward will arrive at any given time increases throughout the trial and the anticipated surprise in response to the reward decreases. When beliefs are heavy-tailed, then the probability that the reward will arrive at any given time point decreases as time elapses, and anticipated surprise in response to the reward increases. b Following rewarded trials, the relationship between the time waited and the peak pupil response was linear and positive. For illustration purposes, the mean peak pupil for all rewarded trials in which tokens matured within a certain time bin is plotted, averaged across participants. Peak pupil responses are residuals, after the effects of pre-outcome pupil diameter, previous trial outcome, and accumulated earnings were regressed out. Error bars correspond to the standard error of the mean. The total number of participants varied somewhat across bins (n = 83 included for bins <10 s, n = 73 for 10-13 s, n = 63 for >13 s)
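The logic of panel a can be illustrated with a short simulation. The sketch below assumes that surprise at reward delivery is the negative log of the discrete hazard, that is, the probability that the reward arrives in the current 1-s bin given that it has not yet arrived. The generalized Pareto form of the heavy-tailed belief and its `shape` and `scale` values are illustrative assumptions, not the parameters used to generate the figure.

```python
import math

T_MAX = 30  # length of the waiting period in seconds (1-s bins)


def uniform_survival(t):
    """P(reward has not arrived by t) under a uniform belief on [0, T_MAX]."""
    return 1.0 - t / T_MAX


def heavy_tailed_survival(t, shape=0.5, scale=5.0):
    """Generalized Pareto survival function (assumed heavy-tailed belief)."""
    return (1.0 + shape * t / scale) ** (-1.0 / shape)


def discrete_hazard(survival, n_bins):
    """P(reward arrives in bin i | it has not arrived before bin i)."""
    return [(survival(i) - survival(i + 1)) / survival(i) for i in range(n_bins)]


def surprise(hazard):
    """Shannon surprise (in nats) of a reward arriving in each bin."""
    return [-math.log(h) for h in hazard]


uniform_surprise = surprise(discrete_hazard(uniform_survival, T_MAX))
heavy_surprise = surprise(discrete_hazard(heavy_tailed_survival, T_MAX))

# Uniform beliefs: hazard rises over the trial, so surprise falls.
# Heavy-tailed beliefs: hazard falls over the trial, so surprise rises.
print(uniform_surprise[0] > uniform_surprise[-1])  # True
print(heavy_surprise[0] < heavy_surprise[-1])      # True
```

Under the uniform belief the hazard in bin i is 1 / (T_MAX - i), which reaches 1 in the final bin, so a reward arriving at the last possible moment carries no surprise at all.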

One caveat to the above finding is that the distribution of rewards that participants actually experienced, although close to exponential, was technically heavy-tailed. Therefore, it is possible that participants’ initial expectations were not heavy-tailed, and they developed those expectations over time. However, when examining just the first twenty rewarded trials for each participant (which corresponds to about the first 5 min of the task), we found very similar results (wait time β = 0.0149; z = 3.87; p < 0.001). Therefore, it is unlikely that this pattern of pupil responses emerged only as participants gained experience with the task.

Pre-decision pupil diameter around quit decisions depends on typical quit times

Pre-decision pupil diameter just prior to quit decisions depended on participants’ typical quit times, such that quit times that were more atypical for that participant were associated with the largest pupil diameter (Fig. 6). There was no significant relationship between quit time and pre-decision pupil on quit trials (quit time β = 0.0056; p = 0.090; AIC = 15403.25). However, the absolute difference between the actual quit time and the subject’s average quit time (AUC) was positively associated with pre-decision pupil (|AUC – quit time| β = 0.0478; p < 0.001; AIC = 15326.77). For illustration purposes, Fig. 6 shows the average pre-decision pupil diameter as a function of the difference between actual quit time and AUC. Taking pre-decision pupil as a measure of effort, this result goes against the idea that the need to exert effort increases as the delay lengthens and instead suggests that the most effortful quit trials were the ones in which participants quit either much later or much earlier than usual.
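The model comparison above can be illustrated with a deliberately simplified sketch. It fits two single-predictor ordinary least-squares models to made-up data for one hypothetical participant and compares their Gaussian AIC values. The reported analysis pooled trials across participants and may have used a different model class, so the function name, the data, and the use of plain least squares are all assumptions made for illustration.

```python
import math


def ols_aic(x, y):
    """Fit y = a + b * x by least squares; return (slope, Gaussian AIC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    k = 3  # intercept, slope, and residual variance
    # AIC = -2 * log-likelihood + 2k, with the ML estimate of the variance.
    aic = n * (math.log(2 * math.pi * rss / n) + 1) + 2 * k
    return slope, aic


# Made-up data: typical quit time (AUC) of 10 s; pupil diameter is largest
# when the quit time is far from 10 s in either direction.
auc = 10.0
quit_times = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
noise = [0.01, -0.01, 0.02, -0.02, 0.01, -0.01, 0.0]
deviation = [abs(auc - q) for q in quit_times]
pupil = [0.05 * d + e for d, e in zip(deviation, noise)]

slope_raw, aic_raw = ols_aic(quit_times, pupil)
slope_dev, aic_dev = ols_aic(deviation, pupil)
print(aic_dev < aic_raw)  # True: the deviation model fits better
```

Because pupil diameter in this toy example is V-shaped around the typical quit time, raw quit time has a slope near zero, whereas the absolute deviation captures the pattern and yields the lower AIC.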

Fig. 6

Pre-decision pupil directly prior to quitting was associated with the absolute difference between actual quit time and the participant’s typical quit time, or AUC (AUC = area under the Kaplan-Meier survival curve). For the illustration, we took the average pre-decision pupil diameter for each bin of trials for each subject, and then averaged across subjects. Bins of trials are defined by the difference between the trial quit time and the participant-specific AUC. Error bars correspond to the standard error of the mean. The total number of participants varied somewhat across bins (< −5 s: n = 55; −5 to −2 s: n = 82; −2 to 2 s: n = 83; 2 to 5 s: n = 78; > 5 s: n = 73)

Discussion

We measured pupil diameter while participants waited for delayed rewards with uncertain timing to adjudicate between two explanations of “failures” to persist in waiting: a limited self-control account and a temporal expectations account. We found that pupil dilation increased following trials in which participants were rewarded compared with those in which they quit. These phasic responses were modulated by how long the participants had to wait for the reward, in a manner that was consistent with temporal expectations being heavy-tailed. In other words, pupil responses suggested that participants believed that the reward’s imminent arrival was becoming less likely as time passed and therefore were more surprised as rewards arrived later. Finally, we examined the relationship between quit time and the pre-decision pupil diameter before quit events and found that pupil diameter was largest when quit times differed most from the subject’s typical quit time. This result suggests that, similar to other domains of decision-making, the most effortful choices in our task were those that were atypical for that subject. These results provide evidence for a temporal expectations-based account of persistence, in which quitting results from a choice that takes into account idiosyncratic temporal expectations, rather than from a failure to sustain effort.

Several theories have been put forward to explain failures to delay gratification (McGuire and Kable, 2013), but it is difficult to arbitrate between them using behavioral evidence alone. People report that delaying gratification is effortful, and individual differences in persistence are associated with real-world achievements that require effort (Shoda et al., 1990). Conversely, participants’ willingness to wait for rewards can be changed by altering their reward timing expectations (Fung et al., 2017; Kidd et al., 2013; Lempert et al., 2018; McGuire and Kable, 2012, 2015), suggesting that how long people persist may be the result of a rational cost–benefit decision process that takes those expectations into account. To judge between these two hypotheses, we must be privy to a person’s reward timing expectations. We used pupil responses, which are largest when rewards are least expected (O’Reilly et al., 2013; Preuschoff et al., 2011), to infer reward timing expectations. If people believed that the reward’s imminent arrival was becoming more likely as time passed (e.g., if reward arrival follows a uniform timing distribution), then pupil dilation responses should be smaller when rewards arrive later in the trial. We found the opposite: pupil dilation responses were larger when rewards arrived later. Therefore, we conclude that people’s “default” expectations are such that the reward’s imminent arrival becomes less likely as time passes (e.g., if reward arrival follows a heavy-tailed timing distribution). This accords with previous findings that expectations are heavy-tailed in many self-control domains (e.g., diet, exercise; McGuire and Kable, 2013) and many situations that involve significant temporal uncertainty (e.g., waiting for an e-mail response, a bus late at night, or on hold with customer service). Given the ubiquity of heavy-tailed distributions in real life (Newman, 2007), it is perhaps unsurprising that this would be the case in our paradigm. 
One caveat is that the quasi-exponential reward timing distribution used in our paradigm was technically heavy-tailed, so this result could reflect experience with the task rather than initial expectations. However, this is unlikely, because there was no evidence that the association between wait time and pupil responses changed as people gained experience with the task. Another caveat is that since people’s time perceptions are noisy, especially as time intervals get longer (Wittmann, 2013), they might always show more surprise as rewards arrive later, regardless of their beliefs. This explanation could be ruled out by examining pupil responses when the reward timing distribution is actually uniform. Although we have not examined pupil responses under those conditions, our research examining other measures of reward expectancy—reaction time (Lempert et al., 2022; McGuire and Kable, 2015) and heart-rate acceleration (McGuire and Kable, 2015)—suggests that rewards do become less surprising as a function of wait time in environments with uniform reward timing statistics.

Our pre-decision pupil diameter result was also consistent with quit decisions being the result of a value-based decision-making process, rather than a failure to sustain effort. Previous research on pupil dilation during decision-making has shown that pupil diameters are largest right before people choose an option that goes against their typical tendency (de Gee et al., 2014, 2020; Krishnamurthy et al., 2017). These choices are the most effortful, and pupil dilation has been shown to be an index of mental effort in many contexts (Da Silva Castanheira et al., 2020; Robison et al., 2021; Sayalı et al., 2022; van der Wel and van Steenbergen, 2018). In our task, deciding against one’s usual tendency can be defined as waiting either much longer or much shorter than usual before quitting. Indeed, we found that pre-decision pupil was well-explained by the extent to which the current trial’s quit time differed from the participant’s usual quit time. If waiting became more effortful as time passed, then we would expect that pre-decision pupil diameter would increase the longer one waited before quitting, but this was not the case.

Our study has some strengths and limitations worthy of mention. A strength is that our persistence task effectively decoupled a participant’s ability to learn reward timing statistics (and/or the optimal strategy based on those statistics) from their willingness to wait for delayed rewards. We also attempted to match the reward timing experiences between subjects as much as possible, although, of course, subjects who quit more often experienced fewer long delays. Under these task conditions, we found substantial variability in participants’ persistence. In future work, we hope to link these individual differences to real-world decisions that require persistence. One limitation of our study is that our findings are correlational, and we cannot make any causal claims about the origin of individual differences in expectations and/or persistence. Longitudinal study designs are needed to reveal, for example, whether reward timing expectations are a product or a cause of experience with persistence decisions.

Our pupillometry results suggest that people’s beliefs about when delayed rewards will arrive are likely to be heavy-tailed; thus, overall, their quitting behavior is more likely to result from a dynamic valuation process than from self-control lapses. However, it is important to recognize that other factors beyond reward timing beliefs can influence a dynamic valuation process and contribute to individual differences in persistence. For example, in addition to differing in their temporal expectations, people may also differ in their time perception. Someone who overestimates how long they have waited in the current trial may be inclined to quit earlier. Moreover, even though we have presented evidence that giving up on delayed rewards cannot be explained solely by self-control failure, that does not preclude the possibility that self-control contributes to individual differences in persistence decisions. For instance, self-control could be involved in promoting more rational quitting decisions given a set of reward timing beliefs. In future studies, we hope to delineate the different factors that contribute to individual differences in persistence, including in contexts where there is an optimal waiting strategy.

Nonluminance-mediated changes in pupil diameter are thought to reflect changes in arousal regulated by the locus coeruleus-norepinephrine system (Joshi et al., 2016; Joshi and Gold, 2020; Murphy et al., 2014a). Previous studies have shown that periods of heightened arousal, driven by tonic neuromodulation and indexed by increased pupil diameter, are associated with more variable decision-making (de Gee et al., 2014, 2020; Murphy et al., 2014b; Van Den Brink et al., 2016). Our pre-decision pupil diameter results are consistent with those findings. Our results suggest that increased arousal would lead those who are more patient to quit earlier and those who are more impatient to wait longer before quitting. Manipulations of noradrenergic tone could shed additional light on the role that arousal plays in this choice process.

On the other hand, the increase in phasic pupil responses that we observed as a function of wait time may reflect not only noradrenergic surprise responses, but also dopaminergic reward prediction error responses (Colizoli et al., 2018; Van Slooten et al., 2018). In our study, surprise and reward prediction error co-occurred, since the most surprising events in the task were the tokens maturing and participants gaining 10¢. It is possible that these phasic responses reflect increased dopamine spike rates (Bayer and Glimcher, 2005; Schultz et al., 1997), since the locus coeruleus and dopaminergic nuclei are highly interconnected, and both receive inputs from the same prefrontal cortical regions (Sara, 2009). Moreover, pupil responses have been shown to correlate with blood-oxygenation-level-dependent (BOLD) signal in dopaminergic nuclei (de Gee et al., 2017). Additional research is needed to establish the roles of different neuromodulators during persistence decisions, as well as to explore the neurobiological underpinnings of individual differences in persistence.

Decisions about how long to wait for delayed rewards are ubiquitous and consequential, so it is critical to uncover the factors that influence persistence. We used a latent physiological marker of surprise and effort, pupil dilation, to show that decisions to quit waiting are the result of rational decision-making that takes into account expectations about temporally uncertain rewards, rather than the result of a failure to sustain effort. This finding has implications for interventions to promote persistence. For example, if in the face of uncertainty people generally assume that reward timing distributions are heavy-tailed, then providing information about reward timing may help people to persist longer in situations in which the actual distributions are not heavy-tailed (e.g., providing information about transportation times). These findings also shed new light on the links between individual differences in persistence and real-world success, suggesting that initial expectations about reward timing—which are likely to be shaped by life circumstances (Michaelson and Munakata, 2020; Watts et al., 2018) but are also malleable in response to experience (Kidd et al., 2013; McGuire and Kable, 2012)—are probably a more important determinant of success than the capacity for self-control is.