Introduction

Consider the following visual task: Report on the attributes of a target item, identified by a cue, embedded in a series of distractor items presented in rapid succession in the same spatial location. To successfully perform this task, where the items alternate quickly as successive frames in a movie (rapid serial visual presentation [RSVP] procedure), the observer must be able to temporally individuate the target from the distractors. That is, it is not sufficient to simply detect the appearance of the target item; rather, it is necessary to identify the item and its attributes, as well as recognize its rank order in relation to adjacent items and to the cue. With such a task, it is thus possible to test the limits of temporal individuation in the judgment of serial order.

A number of specific variants of the generic task described above have been investigated previously. In an early study by D. H. Lawrence, observers were required to report a singleton word presented in uppercase within a list of lowercase words (Lawrence, 1971). Lawrence varied presentation rate and serial position of the target in the list and found that errors for the first and intermediate target positions increased with increasing rate. However, responses for the last position were very accurate and were not affected by presentation rate. Furthermore, in Lawrence’s study, erroneous responses for intermediate list positions more frequently corresponded to words following, rather than preceding, the target word (posttarget intrusion errors). Subsequent studies varied a number of stimulus factors in Lawrence-type tasks and demonstrated pre-, post-, or symmetric intrusion patterns of errors (Botella & Eriksen, 1992; Botella, Garcia, & Barriopedro, 1992; Botella, Suero, & Barriopedro, 2001; Gathercole & Broadbent, 1984; Intraub, 1985; Kikuchi, 1996; McLean, Broadbent, & Broadbent, 1983; Vul, Hanus, & Kanwisher, 2009). From the published data, the following conclusions about accuracy of report in Lawrence-type tasks can be drawn: Accuracy degrades with increasing presentation rate; it is better at list edges than at intermediate positions, particularly at the end of the list; and it can be biased toward list positions earlier or later than the position of the target, depending on a variety of stimulus attributes and task demands.

What cue does the observer use to individuate the target, and what specific factors limit the observer’s accuracy? Despite the apparent simplicity of tasks of this type, temporal individuation is severely limited; such sluggishness still lacks a clear explanation and has been discussed in a variety of contexts. I focus here on two broad explanatory frameworks: the concept of illusory feature conjunctions in the attention literature and the serial order judgment problem in memory and motor control.

Much research in vision has investigated the temporal limits of early visual filters with linear systems techniques, characterizing the spatiotemporal contrast sensitivity function and the temporal impulse response (Watson, 1986). The filter characteristics of early vision pose an upper limit to our ability to individuate events, because stimuli that alternate at frequencies outside the filter’s bandwidth are simply not detected. Estimates of the upper limit of visible temporal frequencies range between 50 and 100 Hz. However, temporal individuation cannot be achieved at such fast alternation rates. In fact, the temporal limits of individuation can be very severe—in many cases, as low as 2–3 Hz (Holcombe, 2009). At faster rates, observers often commit illusory conjunction errors, reporting an incorrect pairing of features or an item that appeared earlier or later than the cue. One prominent class of explanations for such sluggish performance invokes attention as the limiting factor. Botella and colleagues stated that “illusory conjunctions occur when the experimental conditions impede adequate focusing of attention on the presented stimuli, e.g. when exposure times are brief” (Botella et al., 2001, p. 1455). Reeves and Sperling claimed that “the perceived order of rapidly presented items in short-term visual memory is determined primarily by the amount of attention they receive at the time of input” (Reeves & Sperling, 1986, p. 181). Vul and Rich suggested that “decreased precision of attention amounts to worse estimates—and thus greater uncertainty—about the location of the object” (Vul & Rich, 2010, p. 1169). Thus, there seems to be widespread acceptance of Treisman’s conjecture that “when attention is loaded, participants make many conjunction errors” (Treisman & Schmidt, 1982, p. 138); in the present context, attention overloading would be achieved by fast presentation rates. 
Supporting evidence for this attentional overload explanation comes primarily from two types of experiments. The first type of evidence concerns tasks that exclude the involvement of low-level spatiotemporal correlation mechanisms by requiring effortful individuation and comparison of temporal segments of repeating stimuli at large spatial separations. For example, tasks that require individuation and comparison of light and dark phases of stimuli, such as temporal phase discrimination (Battelli, Cavanagh, Martini, & Barton, 2003; Forte, Hogben, & Ross, 1999), or tasks that require integration of spatially separate object features, such as color and orientation binding (Holcombe & Cavanagh, 2001), are limited to rates below 10 Hz. The second type of evidence considered in support of the attentional overload explanation comes from experiments involving dual-probe techniques. For example, identification and report of the second of two targets in an RSVP sequence is severely impaired when the second target appears less than 500 ms after the onset of the first target, a phenomenon known as the attentional blink (Raymond, Shapiro, & Arnell, 1992). The suggestion that, in both types of task, a high-level attentional selection process must be setting the limits of performance rests mostly on an exclusion principle. In phase discrimination tasks, low-level mechanisms such as motion detection filters have been excluded on the grounds that the spatial density of stimulus elements is too sparse to engage them. In feature-binding experiments, it is argued that elementary feature detectors jointly coding for two arbitrary features at separate locations, such as color and orientation, are unlikely to exist and, in fact, have never been found in neurophysiology.
In the case of the attentional blink, the observation that the blink disappears when the first target does not need to be identified and reported, but merely detected, supports the suggestion that the source of the phenomenon must be a form of refractoriness in a higher-level attentional selection process (Chun & Potter, 1995). It remains an open question whether performance in these disparate tasks is limited by a single attention mechanism or by several, and what exactly such putative mechanisms are.

An almost completely parallel literature on serial order in memory has been concerned with tasks that raise issues similar to those of Lawrence-type tasks. For example, in a probed serial-order recall task, a participant is required to recall the item that appeared at a specific serial position within a previously presented, temporally ordered list. Here, the cue appears after, rather than within, the list, but the two types of task share the requirement to individuate items and encode their serial order. There has been much discussion about the nature of the code that enables serial order judgments in memory (for a thorough review, see Henson, 1998). After discarding older ideas, such as chaining, the recent literature has concentrated on distinguishing different types of positional codes, that is, codes based on a signal directly related to the temporal position of an item in the sequence. The focus has been on contrasting an ordinal code, based on the rank order of items in the list but independent of the temporal scale, with a temporal code that explicitly represents events on a temporal dimension and is, thus, sensitive to the time scale. Recent research appears to favor temporal over ordinal codes (e.g., Brown, Neath, & Chater, 2007).

In contrast, an ordinal, rather than a temporal, code appears to be used for representing serial order in the generation of motor action sequences, perhaps the oldest domain in which the serial order problem has been discussed (Lashley, 1951). Neurons in a variety of premotor areas (most notably, the basal ganglia and frontal cortex) appear to be rank-order-selective (ROS); that is, they systematically change their firing rate depending on the serial position of an action within a sequence (Tanji, 2001). Prefrontal ROS neurons appear to code rank order independently of the passage of time (Berdyyeva & Olson, 2011) and, crucially, also appear to be rank-order generalists; that is, they seem to signal temporal ordinal position for both action and object sequences (Berdyyeva & Olson, 2009, 2010). Thus, the most recent investigations in this field seem to suggest that frontal ROS neurons may be implicated generally in tasks requiring temporal serial order judgments, including action, memory, and perception.

In light of the conflicting findings in memory and action sequence generation reviewed above, it seems surprising that, to date, the distinction between ordinal and temporal coding has not been considered in the context of perceptual, Lawrence-type tasks. I suggest that, just as in memory and motor control, this distinction is timely and fundamental for understanding the nature of the mechanisms responsible for temporal individuation. The original observations of Lawrence indicated that error rates increase as presentation accelerates. This finding may be taken to imply an essentially temporal code, but in fact, the information available from extant studies is not sufficient to draw definite conclusions. We do not have quantitative estimates of the precision of temporal individuation in Lawrence-type tasks or of the degree to which such precision is affected by presentation rate and by other manipulations that may involve attentional orienting. Until such estimates are obtained, any theoretical account will remain poorly constrained.

In the present study, I systematically analyze the patterns of errors produced by observers in a single-probe temporal individuation task. The aim of the study is twofold: to investigate the nature of the signal used to achieve individuation and to identify the sources of uncertainty that limit the accuracy and precision of temporal judgments. The task is identical to that used in other recent studies on the same topic (Vul et al., 2009; Vul, Nieuwenstein, & Kanwisher, 2008). Observers are shown all 26 letters of the English alphabet, randomly ordered and presented in an RSVP stream, and are asked to report the letter at a single cued temporal position. I report the results of two experiments. In the first experiment, the cue always corresponded to the midstream temporal position, and the variable of interest was the presentation rate. In the second experiment, the presentation rate was constant, but the cue appeared at different temporal positions within the stream. Thus, the first experiment aimed to distinguish ordinal from temporal codes with minimal uncertainty about the temporal location of the cue, whereas the second experiment investigated contextual influences by introducing uncertainty about the location of the cue, which modulates attentional orienting through elapsed time (an increasing cue hazard rate).

The results show that the distribution of observers’ errors can be modeled as a Uniform–Gaussian mixture with at least two sources of uncertainty: a specific temporal uncertainty that gives rise to errors clustered around the position of the cue, with a distribution well modeled as a Gaussian, and at least one additional source that spreads errors uniformly across all temporal positions, suggesting uncertainty in letter categorization. The behavior of the Gaussian component across presentation rates suggests that judgment is based on a temporal, rather than an ordinal, position signal. Introducing uncertainty in the location of the cue affects uniform random errors, such that their frequency becomes a decreasing linear function of the ordinal position of the cue in the sequence, but does not affect the Gaussian component; in particular, it does not decrease its precision. Judgments are often biased, showing pre- or posttarget error patterns depending on the statistical distribution of the cue, and are affected by list edge effects, such that items appearing at the beginning and end of the stream are often reported instead of items cued just after or before a list edge.

Experiment 1a: fixed probe position, different presentation rates

This experiment examined, in individual observers, the ability to individuate a target letter that was always cued in the middle of the list, so as to obtain estimates of accuracy and precision minimally contaminated by uncertainty about the cue’s location. Performance was compared across a range of presentation rates to establish whether individuation is based on an ordinal or a temporal position signal.

Method

Participants

The author and three students, unaware of the purpose of the study, participated in the experiment.

Stimuli

On each trial, all 26 letters of the English alphabet were presented in an RSVP stream in the center of a CRT monitor with a refresh rate of 90 Hz. The letters were drawn in white Courier font on a black background. Each letter subtended 2.5° of visual angle at a viewing distance of 57 cm. The cue was a white circle with a diameter of 12° centered on the target letter.

Procedure

The order of the 26 letters was randomized on every trial. Each letter in the RSVP stream was presented for two consecutive screen refreshes (22.2 ms) and was followed by an empty interval before the onset of the next letter (see Fig. 1). Onset and offset of the cue always coincided with the onset and offset of the 14th letter. Observers were instructed to report the cued letter by pressing the corresponding key on a computer keyboard and were not given feedback on the accuracy of their reports. An intertrial interval of 1,500 ms followed each response. Each participant was tested in blocks of 400 trials, each comprising four runs of 100 successive trials separated by brief breaks. The RSVP presentation rate was fixed within each block. Several rates (obtained by varying the blank interval between the offset of one letter and the onset of the next, with letter exposure kept fixed at 22.2 ms) were tested in randomized order over several days, and each participant was run in three blocks (1,200 trials) per rate. The very first block was treated as practice and discarded.
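The timing arithmetic behind these presentation rates can be made explicit. The sketch below is a minimal illustration, assuming that letter exposure and blank intervals are whole numbers of refreshes on the 90-Hz display; the helper `rsvp_timing` is hypothetical. For example, the 15-Hz condition used in Experiments 1b and 2 (22.2-ms letters separated by 44.4-ms blanks) corresponds to two "on" refreshes followed by four blank ones. Frame-exact values computed this way may differ slightly from the rounded nominal rates quoted in the text.

```python
# Minimal sketch (assumption): letter exposure and blanks are whole numbers
# of refreshes on a 90-Hz display, as stated in the Method.
REFRESH_HZ = 90.0
FRAME_MS = 1000.0 / REFRESH_HZ  # one refresh lasts about 11.1 ms


def rsvp_timing(on_frames, blank_frames, n_items=26):
    """Hypothetical helper: timing of an RSVP stream built from refresh counts.

    Returns (exposure_ms, soa_ms, rate_hz, stream_ms), where the SOA is the
    letter exposure plus the blank interval that follows each letter.
    """
    exposure_ms = on_frames * FRAME_MS
    soa_ms = (on_frames + blank_frames) * FRAME_MS
    rate_hz = 1000.0 / soa_ms
    stream_ms = n_items * soa_ms  # duration of the whole 26-letter stream
    return exposure_ms, soa_ms, rate_hz, stream_ms


# 15-Hz condition: 2 refreshes on (22.2 ms) + 4 blank refreshes (44.4 ms)
exposure, soa, rate, total = rsvp_timing(2, 4)
print(f"exposure {exposure:.1f} ms, SOA {soa:.1f} ms, "
      f"rate {rate:.1f} Hz, stream {total:.0f} ms")
```

Only the blank interval changes across conditions, so the SOA, and hence the bin width used in the analysis, grows in steps of one refresh.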

Fig. 1

Experiment 1a. All letters of the English alphabet were presented in an RSVP stream. Each letter appeared in two screen refreshes (22.2 ms) and was followed by a blank interval before onset of the following letter. The letter occupying ordinal position 14 in the stream was cued for report with a white circle

Data analysis

Report frequency was calculated for each serial position and presentation rate from data averaged across all blocks. In the frequency histogram, each serial position x was expressed as a signed distance from cue onset. A Uniform–Gaussian mixture model was then fitted to the histogram of reports by nonlinear regression. The model had the following form:

$$ p(x)=\frac{a\left(1-u\right)}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{\left(x-\mu\right)^2}{2\sigma^2}\right]+\frac{u}{26} $$
(1)

Parameter a is fixed and corresponds to the histogram’s bin width (the duration of one RSVP frame, i.e., the SOA, in milliseconds), so that the Gaussian term gives a probability per bin; μ and σ are the mean and standard deviation of the Gaussian component, respectively; and parameter u represents the cumulative probability of the uniform component, which is spread evenly over the 26 list positions.
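To make Eq. 1 concrete, the following sketch fits it by unweighted least squares with `scipy.optimize.curve_fit`, holding a fixed at the bin width. It is an illustration under assumptions, not the fitting code used in the study: the text specifies only nonlinear regression, so the loss, starting values, and bounds are choices of this sketch, and the function names are hypothetical. Serial positions are converted to signed temporal distances from the cue by multiplying the ordinal distance by the SOA, and the usage example runs on synthetic, not real, data.

```python
import numpy as np
from scipy.optimize import curve_fit

N_POSITIONS = 26  # letters per stream; the uniform term spreads u over these


def mixture(x, mu, sigma, u, a):
    """Eq. 1: probability of reporting the item at signed temporal distance x (ms)
    from the cue; a is the bin width (SOA, ms)."""
    gauss = (a * (1.0 - u) / np.sqrt(2.0 * np.pi * sigma**2)
             * np.exp(-((x - mu) ** 2) / (2.0 * sigma**2)))
    return gauss + u / N_POSITIONS


def fit_mixture(x_ms, p_obs, soa_ms, p0=(0.0, 70.0, 0.1)):
    """Hypothetical helper: least-squares fit of mu, sigma, u with a fixed at soa_ms.

    Returns the best-fit (mu, sigma, u) and their approximate standard errors.
    """
    def model(x, mu, sigma, u):
        return mixture(x, mu, sigma, u, soa_ms)

    popt, pcov = curve_fit(
        model, x_ms, p_obs, p0=p0,
        bounds=([-np.inf, 1e-6, 0.0], [np.inf, np.inf, 1.0]),  # sigma > 0, 0 <= u <= 1
    )
    return popt, np.sqrt(np.diag(pcov))


# Synthetic example: cue at position 14 of a 15-Hz stream, plus small noise.
soa = 1000.0 / 15.0
positions = np.arange(1, N_POSITIONS + 1)
x = (positions - 14) * soa  # signed temporal distance from the cue (ms)
rng = np.random.default_rng(0)
p = mixture(x, -20.0, 70.0, 0.15, soa) + rng.normal(0.0, 0.002, x.size)
(mu_hat, sigma_hat, u_hat), se = fit_mixture(x, p, soa)
print(f"mu = {mu_hat:.1f} ms, sigma = {sigma_hat:.1f} ms, u = {u_hat:.3f}")
```

Fixing a at the SOA is what lets the same three free parameters be compared across rates in milliseconds; dividing μ and σ by the SOA gives the corresponding ordinal (letter) values.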

Results

Representative distributions of reports corresponding to the choices of observer J.L. over a range of presentation rates (22.2–6.4 Hz, or stimulus onset asynchronies [SOAs] of 45–157 ms) are plotted in Fig. 2. Letters presented in the immediate neighborhood of the cued item are reported with increasing frequency as presentation rate decreases. When report frequency is plotted as a function of ordinal distance from the cue expressed as number of items (ordinal error; Fig. 2, left panel), the distributions’ variance appears to decrease with decreasing RSVP rate. However, when report frequency is plotted as a function of temporal distance from the cue (temporal error; Fig. 2, right panel), the distributions’ variance appears unaffected by rate, the main difference being a multiplicative change of gain.

Fig. 2

Distribution of reports for observer J.L. at presentation rates of 6.4, 8.1, 11.1, 14.9, and 22.2 Hz in Experiment 1a. The frequency of reports is plotted as a function of distance from the cue (occurring at time 0) expressed in items (ordinal error) in the left panel and in milliseconds (temporal error) in the right panel. As presentation rate decreases, letters that appeared in the neighborhood of the cued location are reported with increasing frequency. Note that when plotted on an ordinal error scale (left panel), the variance of the distribution of reports appears to increase with increasing presentation rate, whereas on a temporal error scale (right panel), the distribution of reports appears to scale in gain without changes in variance

To quantify how the accuracy and precision of observers’ reports vary with presentation rate, the Uniform–Gaussian model (Eq. 1) was fitted by nonlinear regression to the frequency histograms of each observer. Representative fits are shown in Fig. 3 at slow and fast presentation rates for observer J.L. The uniform component of the model is justified by theoretical considerations (see the General Discussion section) and is supported empirically by the substantial number of reports at large distances from the cue. Note that such uniform errors are more prevalent at the fast presentation rate. Inspection of residuals from the fitted model across observers indicates that the distributions of reports are not systematically skewed or kurtotic. Accuracy and precision of the reports are taken as the mean and standard deviation of the Gaussian component, respectively.

Fig. 3

Examples of the model’s fits (solid line) to the data of observer J.L. at 22.2 (top) and 8.1 (bottom) Hz in Experiment 1a. Note the higher frequency of reports outside the 600-ms region centered on the cue in the fast, as compared with the slow, rate condition

Best fits of the model’s parameters are plotted in Fig. 4 as a function of RSVP rate for each individual observer. For all observers, the cumulative probability of uniform errors (Fig. 4, top panels) increases with increasing rate. Accuracy and precision (mean and standard deviation of the model’s Gaussian component, respectively) are plotted in the middle panels of Fig. 4 as an ordinal error (distance from the cue as number of letters). For all observers, both accuracy and precision tend to decrease with increasing rate, as demonstrated by the progressive increase of ordinal error magnitude and variance. However, when errors are expressed as a temporal distance from the cue (Fig. 4, bottom panels), accuracy and precision fail to show a systematic trend as a function of rate. Note that the mean value of the Gaussian component is negative in the three naïve observers (pretarget intrusion pattern), but positive for the author (posttarget intrusion pattern).

Fig. 4

Best fits of the model’s parameters as a function of RSVP rate for 4 individual observers in Experiment 1a. Top panels report estimates of the cumulative probability of uniform random errors. Estimates of the mean and standard deviation of the Gaussian component are reported as number of letters in the middle panels and as a temporal delay in milliseconds in the bottom panels

Given the similar trends observed for all parameters, averages were computed at the rates of 8.1, 11.1, 14.9, and 22.2 Hz, which were tested in all observers (Fig. 5). Because the model’s means had a different sign for 1 observer, accuracy estimates were recalculated before averaging as absolute deviations from the cue; these absolute (always positive) values are plotted in Fig. 5. In the average data, uniform errors (Fig. 5, top panel) increase with increasing RSVP rate in a seemingly linear manner. On an ordinal scale (Fig. 5, middle panel), the mean and standard deviation of the Gaussian component increase roughly threefold with a threefold increase in RSVP rate. However, on a temporal scale (Fig. 5, bottom panel), the mean remains invariant with rate, while the standard deviation increases only slightly, by a factor of 1.14 (in both cases, linear regression slopes are not statistically significant, p > .05). The relationship between standard deviation and rate remains nonsignificant (p > .05) even after correction for grouping with Sheppard’s formula (Ulrich & Giray, 1989), which removes the upward bias in variance estimates caused by the coarser temporal sampling (wider bins) at slower rates (Heitjan, 1989). The average values of the mean and standard deviation are μ = 27 ms and σ = 72 ms. Overall, the temporal invariance of accuracy and precision as a function of SOA is consistent with the notion that a temporal, not ordinal, position signal drives observers’ choices.
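Sheppard's correction can be illustrated with its textbook form for grouped data, in which the variance estimated from bins of width h is reduced by h²/12. The sketch below applies that form with h equal to the SOA; it assumes this is the variant intended, and the function name and example numbers are illustrative only (72 ms is the average SD reported above, applied here to two hypothetical bin widths).

```python
import math


def sheppard_corrected_sd(sd_observed, bin_width):
    """Sheppard's correction for grouping: sqrt(sd^2 - h^2 / 12), with h the bin width.

    Raises ValueError if the correction would make the variance non-positive.
    """
    corrected_var = sd_observed**2 - bin_width**2 / 12.0
    if corrected_var <= 0.0:
        raise ValueError("bin width too coarse relative to the observed SD")
    return math.sqrt(corrected_var)


# The same observed SD is deflated more when the bins (SOAs) are coarser.
for soa_ms in (45.0, 123.0):  # roughly the 22.2-Hz and 8.1-Hz conditions
    print(soa_ms, round(sheppard_corrected_sd(72.0, soa_ms), 1))
```

Because the correction grows with the square of the bin width, it mainly affects the slow-rate estimates, which is why it matters for testing whether the temporal SD depends on rate.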

Fig. 5

Best fits of model’s parameters as a function of RSVP rate averaged across observers in Experiment 1a. The means of the Gaussian component in the middle and bottom panels were transformed to absolute values before averaging

Experiment 1b: fixed probe position, individual differences

The aim of this experiment was to investigate individual differences in the model parameters, that is, how much the accuracy and precision of judgments vary across observers under the conditions of the previous experiment.

Method

Participants

Nineteen undergraduate students participated in the experiment for course credit.

Stimuli, procedure, and data analysis

The stimuli, procedure, and data analysis were identical to those in Experiment 1a, with the following exceptions. The RSVP frame rate was fixed at 15 Hz, such that each letter was presented for 22.2 ms and empty intervals of 44.4 ms were interleaved between successive letters. Each participant was tested in one block of 200 trials, comprising two sequences of successive 100 trials separated by a brief interruption. The proportion of reports at each serial position was calculated and the Uniform–Gaussian model of Eq. 1 was fitted to the resulting histogram, obtaining estimates of the parameters of interest for each participant. Reaction times of individual participants for responses at each serial position were log-transformed, z-scored, and then averaged across participants.
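One plausible reading of this reaction-time normalization is sketched below: within each participant, RTs are log-transformed and z-scored across all of that participant's trials, averaged by reported serial position, and the resulting profiles are then averaged across participants. Whether z-scoring was done across all trials or within positions, and whether the sample or population SD was used, is not stated, so both are assumptions; the function names are hypothetical.

```python
import numpy as np


def rt_profile(rt_ms, reported_pos, n_positions=26):
    """One participant: log-transform RTs, z-score them across all trials
    (sample SD, an assumption), then average by reported serial position.
    Positions never reported are returned as NaN."""
    log_rt = np.log(np.asarray(rt_ms, dtype=float))
    z = (log_rt - log_rt.mean()) / log_rt.std(ddof=1)
    reported_pos = np.asarray(reported_pos)
    profile = np.full(n_positions, np.nan)
    for k in range(1, n_positions + 1):
        hits = reported_pos == k
        if hits.any():
            profile[k - 1] = z[hits].mean()
    return profile


def average_profiles(profiles):
    """Average per-participant profiles, skipping positions a participant never reported."""
    stacked = np.vstack(profiles)
    valid = ~np.isnan(stacked)
    counts = valid.sum(axis=0)
    sums = np.where(valid, stacked, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)
```

Log-transforming first reduces the right skew typical of RT distributions, and z-scoring within participants removes overall speed differences, so the group profile reflects only how RT varies with the reported position.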

Results

The average distribution of reports in Experiment 1b (Fig. 6, left) is broadly similar to the data obtained in Experiment 1a under the same conditions and is mirrored by the distribution of reaction times (Fig. 6, right): Responses corresponding to positions in the immediate neighborhood of the cue are more prevalent and are produced faster.

Fig. 6

Experiment 1b. Average distribution of reports (left) and reaction times (right) across all participants in Experiment 1b. Note that responses corresponding to positions in the immediate neighborhood of the cue are not only more prevalent, but also faster

To investigate individual differences, the model of Eq. 1 was fitted to the data from each participant, and the resulting distributions of parameter estimates are plotted in Fig. 7. Most participants are biased toward anticipating the cue (pretarget intrusions), as indicated by the fact that the majority of means are negative (Fig. 7, left); however, about a quarter of participants show a posttarget intrusion pattern, similar to that of the author in Experiment 1a. The interquartile ranges are 58–91 ms for the standard deviation and .07–.20 for the probability of uniform errors. On average, the data from Experiment 1b agree well with those from Experiment 1a, but they also show substantial variability across participants.

Fig. 7

Histograms of the model’s parameter estimates in Experiment 1b

Experiment 2: variable probe position, fixed presentation rate

The aim of this experiment was to investigate the effect on accuracy and precision of individuation produced by adding uncertainty about the temporal position of the cue in the list. The temporal position of the cue was uniformly distributed across list positions, such that cue expectation could increase with time elapsed from list onset. The question of interest was whether temporal expectation would affect bias and/or precision by modulating the allocation of attention over time.

Method

Participants

Fifty-four undergraduate students participated in the experiment for course credit.

Stimuli, procedure, and data analysis

The stimuli, procedure, and data analysis were identical to those in Experiment 1a, with the following exceptions. The RSVP frame rate was fixed at 15 Hz, such that each letter was presented for 22.2 ms and empty intervals of 44.4 ms were interleaved between successive letters. On each trial, the cue appeared at one of four or five possible list positions, which varied randomly across trials according to a uniform distribution. Each participant was assigned to one of three groups that differed with respect to the possible list positions of the cue: Group 1 with positions 2–8–14–20–25, Group 2 with positions 4–10–16–22, or Group 3 with positions 6–12–18–24. Each participant was tested in one block of 400 trials, comprising four runs of 100 successive trials separated by brief breaks. For each cued position, the proportion of reports at each serial position was calculated from data averaged across participants. The Uniform–Gaussian model of Eq. 1 was fitted to the resulting histogram, obtaining estimates of the parameters of interest for each cue position.
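The growth of temporal expectation under a uniform cue distribution can be quantified with the discrete hazard rate: the probability that the cue appears at a given possible position, given that it has not appeared at an earlier one. The sketch below computes it over the possible cue positions of each group, treating only those positions as candidates (an assumption of the sketch; the function name is hypothetical).

```python
from fractions import Fraction

# Possible cue positions for each group, as listed in the Method.
GROUPS = {
    1: [2, 8, 14, 20, 25],
    2: [4, 10, 16, 22],
    3: [6, 12, 18, 24],
}


def cue_hazard(possible_positions):
    """Discrete hazard of the cue at each possible position when all are equally likely.

    For the k-th of n candidates (1-based):
    h_k = P(cue at k) / P(cue not at an earlier candidate) = (1/n) / ((n - k + 1)/n)
        = 1 / (n - k + 1)
    """
    n = len(possible_positions)
    return {pos: Fraction(1, n - k + 1)
            for k, pos in enumerate(possible_positions, start=1)}


for group, positions in GROUPS.items():
    rates = cue_hazard(positions)
    print(f"Group {group}:", ", ".join(f"{p}: {h}" for p, h in rates.items()))
```

Although each position is equally likely a priori, the conditional probability rises from 1/n at the first candidate to certainty at the last, which is the sense in which elapsed time can modulate attentional orienting.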

Results

Examples of the distributions of observers’ choices in Experiment 2 are reported in Fig. 8. Reports for targets cued at intermediate positions in the list (e.g., positions 12 and 16; Fig. 8, middle panels) are distributed in a manner similar to that in Experiment 1. However, reports for targets cued at positions 2 and 25 (Fig. 8, left and right panels), adjacent to the beginning and end of the list, are markedly biased toward the first and last items in the list, respectively. Furthermore, in pilot data where the first and last items were cued, reports for those items achieved almost perfect accuracy, replicating the findings of Lawrence (1971).

Fig. 8

Experiment 2. Frequencies of observers’ choices are plotted for cue positions 2, 12, 16, and 25. Unlike responses to intermediate cue positions, responses to cues appearing adjacent to the beginning and end of the list (positions 2 and 25) are heavily biased toward the list edges

Because the three participant groups were tested at different serial positions, data modeling was conducted separately for each group. Precision of judgments (model’s SD; Fig. 9) does not vary with cue position in any group (the apparently higher estimates for positions 2 and 25 in Group 1 have large uncertainty and were necessarily obtained from truncated data). The average standard deviation, σ = 72.8 ms, is similar to the average estimate obtained in Experiment 1. Interestingly, the estimates of judgment accuracy (model’s means; Fig. 9) are affected by the distribution of cue positions and differ from those in Experiment 1. Observers in Groups 2 and 3, who were not exposed to cues near list edges, produced reports centered on the veridical location of the cue (Fig. 9, center and right panels). However, observers in Group 1, who were exposed to cues near list edges, as in Lawrence (1971), were markedly biased: When probed at positions 2 and 25, they tended to produce reports biased toward the beginning and end of the list, respectively; for intermediate list positions, they more often reported items following the cue (posttarget intrusion pattern), as reported by Lawrence (1971). Thus, the accuracy of judgments in all groups differed from that in Experiment 1, where a pretarget intrusion pattern was most commonly observed.

Fig. 9

Best fits of the uniform–Gaussian model’s mean and SD parameters as a function of cue position in Experiment 2. Precision (SD) is not affected by cue position in any group. Accuracy (mean) is unbiased when testing avoids targets near list edges (Groups 2 and 3) but is biased when cues can appear near the beginning and end of the list (Group 1)

Best estimates of the cumulative probability of uniform errors as a function of cue position in the list are reported in Fig. 10. For all groups, uniform errors decrease in frequency by about a factor of three from the beginning to the end of the list (significantly negative linear regression slopes; all ps < .05). In summary, variability in cue position has two main effects on responses: It affects uniform random errors and biases the accuracy of reports when items near list edges are tested, but it does not affect precision.

Fig. 10

Best fits of the uniform–Gaussian model’s cumulative probability of uniform errors as a function of cue position in Experiment 2. For all groups, uniform errors decrease in frequency with increasing cue position in the list

General discussion

Sources of uncertainty

There are at least two potential sources of uncertainty in the temporal individuation task studied here: categorical and temporal.

Consider the case where the observer localizes a cued target letter (e.g., the letter “O”) with perfect temporal accuracy: He or she may still commit a categorical error (e.g., erroneously report the letter “Q”). Since by experimental design any single letter of the alphabet has an equal probability (p = 1/26) of appearing in each list position on any single trial, categorization errors across trials will necessarily be uniformly distributed among all list positions, including positions far from the cued letter’s location. As such, any model of the distribution of errors in this task needs to include a uniform component. Errors of this kind may depend on many factors: how legible the character is, how similar the target letter is to other letters in the alphabet, how biased the observer is against reporting a particular letter, and so forth. Regardless of their origin, these errors should be reduced in frequency at slow alternation rates, because it has been found empirically in masking studies that interference decreases, while categorization improves, with increasing SOA (e.g., Bundesen & Harms, 1999). Uniformly distributed errors in the task studied here follow precisely this pattern (see Fig. 5, top panel).
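The argument that categorization errors must be spread uniformly over list positions can be checked with a small simulation. The sketch assumes a hypothetical observer who localizes the cue perfectly but fails to identify the target and instead reports a letter chosen by some confusion rule. Because the stream order is re-randomized on every trial, the position of the reported letter is uniform whatever the rule, even for a maximally biased observer who always reports the same letter. The function and rule names are illustrative.

```python
import random
import string
from collections import Counter

LETTERS = string.ascii_uppercase  # the 26 letters presented on every trial
CUE_POS = 14                      # cued serial position (1-based), as in Experiment 1a


def simulate_misreports(n_trials, choose_letter, seed=0):
    """Count the serial positions (1-26) of letters reported on categorization failures.

    choose_letter(target, rng) returns the letter reported instead of the target.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        stream = rng.sample(LETTERS, len(LETTERS))  # fresh random order each trial
        target = stream[CUE_POS - 1]
        report = choose_letter(target, rng)
        counts[stream.index(report) + 1] += 1
    return counts


def random_guess(target, rng):
    return rng.choice(LETTERS)  # unbiased guess; matches the target 1 time in 26


def always_q(target, rng):
    return "Q"  # maximally biased observer: always reports "Q"


for rule in (random_guess, always_q):
    counts = simulate_misreports(26_000, rule)
    print(rule.__name__, min(counts.values()), max(counts.values()))
```

Under both rules, every position receives about 1/26 of the misreports, including the cued one, which is why the uniform term in Eq. 1 is u/26 at every position.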

Consider now the case where the observer achieves perfect categorization of all letters. In this case, report accuracy and precision will depend entirely on the observer’s uncertainty in localizing the cue and the letters relative to one another in time. A model accounting for temporal errors must specify the rules governing the observer’s choices and the nature of the signals upon which choices are made. I will discuss choice models later and concentrate now on the signal involved. It is an empirical fact that most errors cluster in the near vicinity of the cue according to a distribution that appears symmetrical and mesokurtic (see Fig. 2). Let us accept for the moment that a Gaussian distribution is a good fit to the data (see Fig. 3): What factors do its parameters depend on? If only the ordinal position of letters in the list is encoded, the parameters of the Gaussian distribution should be a fixed number of letters regardless of the presentation rate. However, if serial order is encoded more finely as a temporal position signal, accuracy and precision should improve with slower presentation rates when expressed as number of letters but should be constant when measured in milliseconds. Temporal, rather than ordinal, invariance of accuracy and precision is indeed observed in the data (Fig. 5), suggesting that order information is coded as a time signal. I suggest that the latency dispersion of such a time signal is the major limit in temporal individuation.
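The two hypotheses make opposite predictions about which scale should be invariant, and these can be written out directly. Under a temporal code with a fixed SD in milliseconds, the SD in letters is that value divided by the SOA; under an ordinal code with a fixed SD in letters, the SD in milliseconds is that value multiplied by the SOA. The sketch uses the average σ = 72 ms from Experiment 1a for the temporal case and an arbitrary, hypothetical 1-letter SD for the ordinal case.

```python
SIGMA_MS = 72.0      # average temporal SD of the Gaussian component in Experiment 1a
SIGMA_LETTERS = 1.0  # hypothetical fixed ordinal SD, for comparison only


def sd_letters_if_temporal(soa_ms, sigma_ms=SIGMA_MS):
    """Temporal code: SD fixed in ms, so the SD in letters shrinks as the SOA grows."""
    return sigma_ms / soa_ms


def sd_ms_if_ordinal(soa_ms, sigma_letters=SIGMA_LETTERS):
    """Ordinal code: SD fixed in letters, so the SD in ms grows with the SOA."""
    return sigma_letters * soa_ms


for rate_hz in (8.1, 11.1, 14.9, 22.2):  # rates tested in all observers in Experiment 1a
    soa = 1000.0 / rate_hz
    print(f"{rate_hz:5.1f} Hz  SOA {soa:6.1f} ms  "
          f"temporal code: {sd_letters_if_temporal(soa):.2f} letters  "
          f"ordinal code: {sd_ms_if_ordinal(soa):.0f} ms")
```

Under the temporal code, the ratio of ordinal SDs between the fastest and slowest of these rates equals the rate ratio, 22.2/8.1 ≈ 2.7, in line with the roughly threefold increase on the ordinal scale in Fig. 5.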

Sources of bias

The accuracy of observers’ responses was biased by a number of factors. First, individual observers’ choices in the fixed cue condition were not centered on the timing of the cue but, rather, often tended to anticipate it (pretarget intrusion errors; see Figs. 4 and 5). In the present context, this bias occurs in the absence of error feedback. A similar bias is often reported in other contexts, such as sensory–motor synchronization tasks (Aschersleben, 2002; Repp, 2005). Although this phenomenon has long been known in the sensory–motor synchronization literature, its explanation, in that domain as in the present context, remains unclear.

Second, when there is uncertainty about the temporal location of the cue (Experiment 2), items cued near the beginning of the list tend to be reported less often than items cued at intermediate list positions. This phenomenon has been noted in previous research: Ariga and Yokosawa (2008) named it attentional awakening and interpreted it as the result of a slow-to-start process of orienting attention to locations in time. However, the phenomenon also occurs near the end of the list, and the distribution of errors shows that responses are not merely suppressed or uniformly distributed across all list positions but, instead, are selectively clustered toward edge items. Furthermore, the same group of observers produced delayed responses for intermediate list positions. Thus, the mere presence of cues near list edges induces a complex pattern of biases, with strong edge intrusions at the beginning and the end of the list and with posttarget intrusions dominating intermediate list positions, as originally found by Lawrence (1971). Edge items were masked on one side only (the first item was not preceded, and the last item not followed, by another letter), suggesting that their categorical distinctiveness could be one of the sources of these biases in the observer’s choices. Such edge effects may also be related to previously observed misjudgments of temporal duration at the beginning and end of a sequence of events (Bachmann, Luiga, Põder, & Kalev, 2003; Kanai & Watanabe, 2006; Rose & Summers, 1995). Curiously, when cues are kept moderately distant from list edges, reports become unbiased.

In summary, temporal judgments are unbiased when cues are randomly distributed within a range of intermediate list positions that excludes items proximate to list edges. Judgments do become biased when cues are fully predictable, or when their random distribution includes near-edge items.

Choice model

How does the observer choose the cued target letter? I consider here a Thurstonian (Thurstone, 1927a, 1927b) choice model based on two assumptions: The letters and the cue are assigned noisy temporal tags on each trial, and the decision rule is a minimum operator. The time tags of each letter and of the cue on any given trial are assumed to be independent random samples from a Gaussian distribution centered on the physical timing of each item, and the decision rule is to choose the item whose time tag is closest to the time tag of the cue. Across trials, this kind of model produces an approximately Gaussian distribution of choices centered on the cued item, with a variance equal to the sum of the variances of the timing distributions of the letters and of the cue. This simple model is intended to account for the observed pattern of precision of the Gaussian component of errors (compare the data in Fig. 2 with the simulation in Fig. 11) but would need to be extended to deal with bias and with uniformly distributed errors.

Fig. 11

Simulation of a Thurstonian stochastic latency model of Experiment 1. On each of 10,000 simulated trials, independent random samples were generated from a normal distribution with unit variance and mean at time 0 for the cue and at integer multiples of the SOA for the letters. On each trial, the simulated choice was the item having the minimum absolute temporal distance from the cue. Symbols in the figure represent the frequency of choices as a function of distance from the cue, and solid lines are best fits to a Gaussian distribution model. In the left panel, distance is expressed as items, whereas in the right panel, it is expressed as time units. Replicating the pattern observed in Experiment 1, the standard deviation of the Gaussian distribution of choices is \( \sigma = \sqrt{2}/\mathrm{SOA} \) in the left panel but remains constant at \( \sigma = \sqrt{2} \) in the right panel.
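The simulation described in the Fig. 11 caption can be sketched in a few lines of Python. This is a minimal re-implementation under stated assumptions, not the code that produced the figure. The function name `simulate_choices`, the half-width of 30 items on either side of the cue, and the random seed are illustrative choices. Time is in units of the tag SD (σ = 1), and spread is summarized by the sample standard deviation rather than by a fitted Gaussian.

```python
import random
import statistics

def simulate_choices(soa, n_trials=10_000, sigma=1.0, half_width=30, seed=1):
    """Minimum-distance Thurstonian choice model (illustrative sketch).

    Cue tag ~ N(0, sigma^2); tag of the letter at offset k ~ N(k * soa, sigma^2),
    for k = -half_width .. half_width. The chosen letter is the one whose tag
    lies closest to the cue's tag. Returns chosen offsets, in items.
    """
    rng = random.Random(seed)
    offsets = range(-half_width, half_width + 1)
    choices = []
    for _ in range(n_trials):
        cue_tag = rng.gauss(0.0, sigma)
        # (distance to the cue tag, offset) for every letter; min() keeps the nearest
        _, chosen = min((abs(rng.gauss(k * soa, sigma) - cue_tag), k) for k in offsets)
        choices.append(chosen)
    return choices

for soa in (0.25, 0.5, 1.0):
    sd_items = statistics.pstdev(simulate_choices(soa))
    print(f"SOA = {soa:.2f}: SD = {sd_items:.2f} items = {sd_items * soa:.2f} time units")
```

When the SOA is well below σ, the SD in time units should stay near √2 ≈ 1.41 while the SD in items scales as 1/SOA. As the SOA approaches σ, the coarse item grid makes the sample SD a rougher summary.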

The observed error distributions, which do not appear to be particularly kurtotic or skewed, empirically justify the assumption that, in the present task, the noisy distribution of temporal tags is Gaussian and the timeline is linear. This contrasts with the assumptions of models developed for other tasks. For example, temporal ratio models of serial order judgments in memory, such as SIMPLE (Brown et al., 2007), assume exponential distributions and logarithmic time, which are justified by the form of the distribution of errors observed in serial order memory tasks. Other related magnitude estimation models, for example those devised to account for errors in numerosity judgments (Piazza, Izard, Pinel, Le Bihan, & Dehaene, 2004), often assume lognormal distributions. It remains unclear whether these differences reflect true differences in mechanism or are related to methodological and task peculiarities.

Estimates of temporal noise

A robust finding of the present study is that the precision of judgment was invariant with presentation rate, with a standard deviation of 72 ms. Previous research has investigated the temporal limits of individuation by measuring the highest alternation frequency that can support performance at a certain threshold level (e.g., 75 % correct); 7 Hz is often cited as the perceptual limit in these tasks. The peak of the Gaussian component of responses in the present task (disregarding uniformly distributed errors) reaches 75 % at a presentation rate of 7.4 Hz, in close agreement with limits previously measured in a variety of other tasks. For example, Linares and colleagues recently measured the spatial localization of moving objects and found that it was limited by a temporal precision of 70 ms (Linares, Holcombe, & White, 2009). On the basis of a reanalysis of data from a study by Vul et al. (2008) that used stimuli identical to those in the present study, Linares and colleagues rightly conjectured that a similar limit may also apply to RSVP tasks. However, many tasks have been reported with even lower limiting rates (Fujisaki & Nishida, 2010; Holcombe, 2009), raising the question of the generality of the 70-ms limit. Many of those studies did not distinguish among the various sources of uncertainty in temporal judgment, so it is difficult to know to what extent the limit of performance was set, for example, by the completion rate of categorization processes rather than by temporal uncertainty per se. A further complicating factor is the nature of the task: Some tasks may require at least two individuation episodes, which could engage refractory processes such as those observed in the attentional blink.
It remains for future studies to further investigate the generality of the 70-ms precision (or 7-Hz) limit for individuation, after partialing out sources of uncertainty other than temporal imprecision per se.
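The 7.4-Hz figure can be reproduced under one assumption: that the "peak" is the height of a continuous Gaussian density with σ = 72 ms, re-expressed in item units. That height equals SOA / (σ√(2π)). The function names below are illustrative.

```python
import math

SIGMA_MS = 72.0  # SD of the Gaussian component, from the text

def peak_height(rate_hz, sigma_ms=SIGMA_MS):
    """Height of the Gaussian density in item units (continuous approximation)."""
    soa_ms = 1000.0 / rate_hz
    sigma_items = sigma_ms / soa_ms
    return 1.0 / (sigma_items * math.sqrt(2.0 * math.pi))

def rate_at_peak(target=0.75, sigma_ms=SIGMA_MS):
    """Presentation rate (Hz) at which the peak height reaches `target`."""
    soa_ms = target * sigma_ms * math.sqrt(2.0 * math.pi)
    return 1000.0 / soa_ms

print(f"{rate_at_peak():.2f} Hz")  # 7.39 Hz
```

This approximation treats the response distribution as continuous. At slow rates, where σ is a small fraction of an item, the computed height can exceed 1 and no longer corresponds to a proportion of responses.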

Role of attention

Attention has repeatedly been cited in previous research as having a major modulating influence in the generation of errors in Lawrence-type tasks. Botella and colleagues (and several previous authors) have stated this explicitly: “Illusory conjunctions occur when the experimental conditions impede adequate focusing of attention on the presented stimuli, e.g. when exposure times are brief” (Botella et al., 2001, p. 1455). An unanswered question is what exactly attention (or its absence) does to influence the occurrence of errors.

One classical interpretation is that attention amounts to filtering (Broadbent, 1958; Treisman, 1964): a mechanism capable of restricting the range of items from which the target has to be chosen. In the present study, an attempt was made to modulate such filtering by changing the statistical distribution of the cue’s position in the list. Accordingly, I argued that the conditions of the first experiment, where cue position was fixed, should allow maximum filtering of irrelevant letters, whereas in the conditions of the second experiment, where the cue was randomly distributed across list positions, filtering should depend on the hazard rate of the cue appearing in the next instant of time and, as such, should vary with cue position. Were these manipulations effective, and what specific aspect of performance did they interfere with? The results of both experiments indicate that the precision of the Gaussian component of errors was unaffected in all conditions. What did change was the probability of committing errors uniformly distributed across list positions. Uniform errors for the midstream cue position were about twice as prevalent in the second as in the first experiment (compare Fig. 5, top panel, 15 Hz, with Fig. 10, Group 1, position 14), suggesting that fixing the cue aided temporal filtering. Furthermore, in the variable cue condition (Experiment 2), observers tended to produce more uniformly distributed errors at early list positions, such that the frequency of uniform errors appeared to decrease linearly with list position (Fig. 10). This effect, reminiscent of the influence of the foreperiod in reaction time tasks (Niemi & Näätänen, 1981), suggests that the uniform distribution of cue times allows observers to increase their expectation of cue arrival as time elapses from RSVP onset and to use such expectation to improve filtering (see Nobre, Correa, & Coull, 2007).
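Suppose the cue's position is uniform over N possible list positions. The hazard rate at position k is the probability that the cue appears at k given that it has not yet appeared. With positions counted from the first possible one, this equals 1/(N − k + 1), so it rises toward 1 at the last possible position. The sketch below is hypothetical: the function name and the position range 5 to 16 are illustrative and do not reproduce the design of Experiment 2.

```python
def hazard_rates(first, last):
    """Hazard of the cue at each position, given it has not appeared earlier,
    when its position is drawn uniformly from first..last (inclusive)."""
    # Remaining possible positions at pos: last - pos + 1, each equally likely
    return {pos: 1.0 / (last - pos + 1) for pos in range(first, last + 1)}

# Hypothetical range of cue positions, for illustration only
for pos, h in hazard_rates(5, 16).items():
    print(pos, round(h, 3))
```

A steadily rising hazard is one way to formalize the growing expectation of cue arrival invoked above.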

In the light of the above observations, it appears likely that attention improves categorization efficiency through selective filtering. How much weight, then, does attention carry in explaining the uncertainty of judgments in individuation tasks? Let us assume, as a limiting case, that uniform errors are entirely due to ineffective attentional filtering: In the most taxing condition of 22.2 Hz of Experiment 1, these errors accounted for about 20 % of all responses. The remaining responses are distributed normally around the cue position, with a temporal spread that does not vary with presentation rate and is, thus, unlikely to be limited by attention. Thus, it appears that the preponderant factor limiting temporal individuation is not attention but an intrinsic uncertainty in the temporal localization of the cue and the targets. It may be argued that such temporal uncertainty in the fixed cue condition corresponds to the precision of temporal bisection over the length of the list. Two observations make this conjecture unlikely: First, bisection may be more precise than individuation at the fastest RSVP rates tested here (see Zanker & Harris, 2002, for comparison bisection data); second, the Weber-like behavior of bisection thresholds (Kopec & Brody, 2010) would predict a similar Weber-like pattern for individuation (precision degrading at slower rates), which was not observed. I propose instead that the observed uncertainty relates to the latency dispersion of temporal markers assigned to events within the stream (see also Nishida & Johnston, 2002).

Latency dispersion of temporal markers as a limiting factor in temporal individuation

Research on temporal aspects of visual cognition is often introduced by noting the large disparity between the limits of detection of temporally modulated signals and the limits of individuation of specific temporal segments of such signals (Holcombe, 2009). For example, while modulation of the luminance contrast of spatially separate objects at rates above 20 Hz can easily be detected, their relative temporal phase can be discriminated only at modulation rates below 10 Hz (Forte et al., 1999). The limits of detection can be explained as the minimum excess activation of neural filters induced by a signal, as compared with the activation induced by noise. The signal can be detected as long as the temporal impulse response of the filter is brief enough not to blur it completely. However, a phase comparator must not only encode the ongoing contrast modulations but also do so with temporal fidelity, such that relative phase information is preserved. Phase discrimination therefore imposes stricter limits on the temporal characteristics of the encoding mechanisms, and the different requirements of the two tasks dictate the timescale of the neural code that is appropriate in each condition. A rate code based on counting spikes in an interval commensurate with the timescale of the stimulus’ modulation is sufficient to allow contrast detection and discrimination at that timescale. However, temporal phase discrimination necessitates coding at a finer scale (see Geisler, Albrecht, Salvi, & Saunders, 1991, for a thorough discussion of this point).

Temporal integration within a window of less than 100 ms, as found in early visual areas (Bair & Movshon, 2004), limits the ability to perceive two consecutive flashes as separate events (Bowen, 1989), contributes to explaining the phenomenon of masking (Cogan, 1992), and consequently limits the accuracy of target identification (Bundesen & Harms, 1999). In the present task, such effects of temporal integration at a timescale of 50–100 ms should affect the frequency of uniformly distributed errors, which ought to become increasingly prevalent at faster rates, as is in fact observed. However, when uniformly distributed errors are excluded, the data indicate that temporal uncertainty does not depend on presentation rate and suggest that the limit to temporal individuation may be set by the dispersion of latency of a putative internal time marker assigned to each event. Assuming that the latencies of markers for the letters and the cue are independently and identically distributed, the 72-ms standard deviation of their relative timing is √2 times that of a single marker, so the present data indicate that such a latency distribution has a standard deviation of about 50 ms (72 ms/√2 ≈ 51 ms). This is a large dispersion compared with the reliability of neural responses to repeated stimulus presentations measured at early levels of the visual system, such as first-spike latencies, whose variability appears to fall within a range of less than 10 ms (Butts et al., 2007). It is possible that more advanced stages of visual processing are less reliable. For example, available estimates of the variance of visual latencies in various cortical areas of the macaque monkey brain suggest that variability in more frontal regions is closer in magnitude to that observed here (Jin, Fujii, & Graybiel, 2009; Pouget, Emeric, Stuphorn, Reis, & Schall, 2005; Schmolesky et al., 1998).
However, while the behavioral data are suggestive of a limit based on temporal coding and a number of neural temporal coding schemes have been proposed (Tiesinga, Fellous, & Sejnowski, 2008), it remains at present unclear what specific aspect of neural activity may embody the putative internal timing marker representing each stimulus event.
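The step from the 72-ms standard deviation to a per-marker dispersion of about 50 ms follows from adding the variances of two independent markers. Below is a minimal check in Python. It assumes Gaussian, zero-mean latencies, though only the standard deviation matters for the argument.

```python
import math
import random
import statistics

OBSERVED_SD_MS = 72.0  # SD of relative cue-letter timing reported in the text

# If cue and letter markers have i.i.d. latencies with SD s, the difference of
# two markers has SD s * sqrt(2), so s = observed SD / sqrt(2).
marker_sd_ms = OBSERVED_SD_MS / math.sqrt(2.0)
print(f"per-marker SD: {marker_sd_ms:.1f} ms")  # per-marker SD: 50.9 ms

# Monte Carlo check: simulate independent cue and letter latencies
rng = random.Random(0)
diffs = [rng.gauss(0.0, marker_sd_ms) - rng.gauss(0.0, marker_sd_ms)
         for _ in range(100_000)]
print(f"simulated SD of differences: {statistics.pstdev(diffs):.1f} ms")
```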

Conclusions

Contrary to a widely held belief that the precision of temporal individuation in Lawrence-type tasks depends crucially on attentional load, the present data suggest that performance in such tasks is critically limited by an intrinsic source of temporal noise distinct from, and independent of, attention. Similar to the time marker theory of Nishida and Johnston (2002), I suggest that perceptual events are assigned internal time markers that are then compared to compute perceived temporal order; but I further suggest that these time markers are subject to considerable latency dispersion and can be biased by a variety of stimulus factors. The latency dispersion of perceptual time markers can be thought of as a sampling instability that has the same effect as time jitter in a clock and phase noise in oscillators (Balakrishnan, 1962; Souders, Flach, Hagwood, & Yang, 1990); it acts as a low-pass filter that critically limits the observer’s ability to judge the order of a series of events that unfold too quickly.
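The low-pass analogy can be made concrete under one assumption: that each marker is displaced by Gaussian jitter with SD σ. Then the expected value of cos(2πf(t + ε)) is exp(−2π²σ²f²)·cos(2πft), so jitter attenuates temporal structure at frequency f by that factor. The sketch below uses the ~50-ms per-marker estimate. The function names are illustrative, and this is an analogy rather than a fitted model of the data.

```python
import math
import random

SIGMA_S = 0.050  # per-marker latency SD of about 50 ms, from the text

def jitter_gain(freq_hz, sigma_s=SIGMA_S):
    """Amplitude gain of a sinusoid whose timing is jittered by N(0, sigma^2)."""
    return math.exp(-2.0 * math.pi ** 2 * sigma_s ** 2 * freq_hz ** 2)

def jitter_gain_mc(freq_hz, sigma_s=SIGMA_S, n=100_000, seed=0):
    """Monte Carlo estimate: mean of cos(2*pi*f*jitter) over jittered samples."""
    rng = random.Random(seed)
    return sum(math.cos(2.0 * math.pi * freq_hz * rng.gauss(0.0, sigma_s))
               for _ in range(n)) / n

def half_gain_freq(sigma_s=SIGMA_S):
    """Frequency (Hz) at which the jitter gain falls to 0.5."""
    return math.sqrt(math.log(2.0) / (2.0 * math.pi ** 2)) / sigma_s

print(f"gain = 0.5 at {half_gain_freq():.2f} Hz")
for f in (1.0, 4.0, 7.0, 15.0):
    print(f"{f:5.1f} Hz  analytic {jitter_gain(f):.3f}  simulated {jitter_gain_mc(f):.3f}")
```

The Gaussian form follows from the characteristic function of the normal distribution. A different jitter distribution would change the shape of the roll-off but not the qualitative low-pass behavior.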