Sources of bias and uncertainty in a visual temporal individuation task

Martini, Paolo

doi:10.3758/s13414-012-0384-y

Published: 10 November 2012

Volume 75, pages 168–181, (2013)

Attention, Perception, & Psychophysics

Abstract

Despite a clear ability to detect temporal modulations of visual stimuli in excess of 50 Hz, temporal individuation and serial order judgment tasks can be performed only when stimuli alternate at much slower rates, and the nature of such sluggishness remains unclear. One example of a task with a slow temporal limit is the individuation of a cued letter in a rapid serial visual presentation (RSVP) stream. The present study investigates the nature of the code used to perform such a slow temporal individuation task and the sources of uncertainty involved. The results demonstrate that temporal, rather than ordinal, position in the RSVP stream is critical in serial order estimation, suggesting the involvement of a noisy temporal code. In addition to variability in temporal coding, observers’ choices are also limited by a number of other factors, such as categorical errors and biases related to the position of the cue in the letters’ stream. Attentional filtering improves categorization, but crucially, it does not seem to increase the temporal precision of judgment. Generalizing the present results, I suggest that perception of order is limited by an internal temporal sampling instability that is distinct and independent from attention and that, similarly to temporal jitter in a clock, acts as a low-pass filter that hinders the judgment of the order of events that unfold too quickly.

Introduction

Consider the following visual task: Report on the attributes of a target item, identified by a cue, embedded in a series of distractor items presented in rapid succession in the same spatial location. To successfully perform this task, where the items alternate quickly as successive frames in a movie (rapid serial visual presentation [RSVP] procedure), the observer must be able to temporally individuate the target from the distractors. That is, it is not sufficient to simply detect the appearance of the target item; rather, it is necessary to identify the item and its attributes, as well as recognize its rank order in relation to adjacent items and to the cue. With such a task, it is thus possible to test the limits of temporal individuation in the judgment of serial order.

A number of specific variants of the generic task described above have been investigated previously. In an early study by D. H. Lawrence, observers were required to report a singleton word presented in uppercase within a list of lowercase words (Lawrence, 1971). Lawrence varied presentation rate and serial position of the target in the list and found that errors for the first and intermediate target positions increased with increasing rate. However, responses for the last position were very accurate and were not affected by presentation rate. Furthermore, in Lawrence’s study, erroneous responses for intermediate list positions tended to more frequently correspond to words following, rather than preceding, the target word (posttarget intrusion errors). Subsequent studies varied a number of stimulus factors in Lawrence-type tasks and demonstrated pre-, post-, or symmetric intrusion patterns of errors (Botella & Eriksen, 1992; Botella, Garcia, & Barriopedro, 1992; Botella, Suero, & Barriopedro, 2001; Gathercole & Broadbent, 1984; Intraub, 1985; Kikuchi, 1996; McLean, Broadbent, & Broadbent, 1983; Vul, Hanus, & Kanwisher, 2009). From the published data, the following conclusions about accuracy of report in Lawrence-type tasks can be drawn: Accuracy degrades with increasing presentation rates; it is better at list edges than at intermediate positions, particularly at end of list; it can be biased toward list positions early or late with respect to the position of the target, depending on a variety of stimulus attributes and task demands.

What is the cue that the observer uses to individuate the target, and what are the specific factors that limit the observer’s accuracy? Despite the apparent simplicity of tasks of the type described above, temporal individuation is severely limited, and such sluggishness does not yet have a clear explanation and has been discussed within a variety of contexts. I focus here on two broad explanatory frameworks: the illusory features conjunction concept within the attention literature and the serial order judgment problem in memory and motor control.

Much research in vision has investigated the temporal limits of early visual filters with linear systems techniques, characterizing the spatiotemporal contrast sensitivity function and the temporal impulse response (Watson, 1986). The filter characteristics of early vision pose an upper limit to our ability to individuate events, because stimuli that alternate at frequencies outside the filter’s bandwidth are simply not detected. Estimates of the upper limit of visible temporal frequencies range between 50 and 100 Hz. However, temporal individuation cannot be achieved at such fast alternation rates. In fact, the temporal limits of individuation can be very severe—in many cases, as low as 2–3 Hz (Holcombe, 2009). At faster rates, observers often commit illusory conjunction errors, reporting an incorrect pairing of features or an item that appeared earlier or later than the cue. One prominent class of explanations for such sluggish performance invokes attention as the limiting factor. Botella and colleagues stated that “illusory conjunctions occur when the experimental conditions impede adequate focusing of attention on the presented stimuli, e.g. when exposure times are brief” (Botella et al., 2001, p. 1455). Reeves and Sperling claimed that “the perceived order of rapidly presented items in short-term visual memory is determined primarily by the amount of attention they receive at the time of input” (Reeves & Sperling, 1986, p. 181). Vul and Rich suggested that “decreased precision of attention amounts to worse estimates—and thus greater uncertainty—about the location of the object” (Vul & Rich, 2010, p. 1169). Thus, there seems to be widespread acceptance of Treisman’s conjecture that “when attention is loaded, participants make many conjunction errors” (Treisman & Schmidt, 1982, p. 138); in the present context, attention overloading would be achieved by fast presentation rates. 
Supporting evidence for this attentional overload explanation comes primarily from two types of experiments. The first type of evidence concerns tasks that exclude the involvement of low-level spatiotemporal correlation mechanisms by requiring effortful individuation and comparison of temporal segments of repeating stimuli at large spatial separations. For example, tasks that require individuation and comparison of light and dark phases of stimuli, such as temporal phase discrimination (Battelli, Cavanagh, Martini, & Barton, 2003; Forte, Hogben, & Ross, 1999), or tasks that require integration of spatially separate object features, such as color and orientation binding (Holcombe & Cavanagh, 2001), are limited to rates below 10 Hz. The second type of evidence considered in support of the attentional overload explanation comes from experiments involving dual-probe techniques. For example, identification and report of the second of two targets in an RSVP sequence is severely impaired when the second target appears less than 500 ms after the onset of the first target, a phenomenon known as the attentional blink (Raymond, Shapiro, & Arnell, 1992). The suggestion that in both types of task, a high-level attentional selection process must be setting the limits of performance is based mostly on an exclusion principle. In phase discrimination tasks, low-level mechanisms such as motion detection filters have been excluded on the basis that the spatial density of stimulus elements is too sparse to engage them. In feature-binding experiments, it is argued that elementary feature detectors jointly coding for two arbitrary features at separate locations, such as color and orientation, are unlikely to exist and, in fact, have never been found in neurophysiology.
In the case of the attentional blink, the observation that the blink disappears when the first target does not need to be identified and reported, but merely detected, supports the suggestion that the source of the phenomenon must be a form of refractoriness in a higher-level attentional selection process (Chun & Potter, 1995). The question remains as to whether performance in these disparate tasks is limited by a unitary or by multiple attention mechanisms and what exactly such putative attention mechanisms are.

An almost completely parallel literature on serial order in memory has been concerned with tasks that raise problems similar to those of Lawrence-type tasks. For example, in a probed memory serial order recall task, a participant is required to recall an item that appeared in a specific serial position within a previously presented, temporally ordered list. Here, the cue appears after, rather than within, the list, but the two types of tasks share the common requirement of having to individuate items and encode their serial order. There has been much discussion about the nature of the code that enables serial order judgments in memory (for a thorough review, see Henson, 1998). After discarding older ideas, such as chaining, the recent literature has concentrated on distinguishing different types of positional codes, that is, codes based on a signal directly related to the temporal position of an item in the sequence. The focus has been on contrasting an ordinal code, which is based on the rank order of items in the list and is independent of the temporal scale, with a temporal code, which explicitly represents events on a temporal dimension and is thus sensitive to the time scale. Recent research appears to favor temporal over ordinal codes (e.g., Brown, Neath, & Chater, 2007).

In contrast, an ordinal, rather than a temporal, code appears to be used for representing serial order in motor action sequence generation, perhaps the oldest domain where the serial order problem has been discussed (Lashley, 1951). There exist neurons in a variety of premotor areas (most notably, the basal ganglia and frontal cortex) that appear to be rank-order-selective (ROS) neurons; that is, they systematically change their firing rate depending on the serial order position of an action within a sequence (Tanji, 2001). Prefrontal ROS neurons appear to code rank order independently of passage of time (Berdyyeva & Olson, 2011) and, crucially, also appear to be rank-order generalists; that is, they seem to signal temporal ordinal position for both action and object sequences (Berdyyeva & Olson, 2009, 2010). As such, the most recent investigations in this field seem to suggest that frontal ROS neurons may be implicated generally in tasks requiring temporal serial order judgments, including action, memory, and perception.

In the light of the conflicting findings in memory and action sequence generation above, it seems surprising that, to date, the distinction between ordinal and temporal coding has not been considered in the context of perceptual, Lawrence-type tasks. Just as in memory and motor control, I suggest that this distinction is timely and fundamental for understanding the nature of the mechanisms responsible for temporal individuation. The original observations of Lawrence indicated that error rates increase as presentation accelerates. This finding may be taken to imply an essentially temporal code, but in fact, the information available from extant studies is not sufficient to draw definite conclusions. We do not have quantitative estimates of the precision of temporal individuation in Lawrence-type tasks or of the degree to which such precision is affected by presentation rate and other manipulations that may involve attentional orienting. Until such estimates are obtained, any theoretical account can only remain poorly constrained.

In the present study, I systematically analyze the patterns of errors produced by observers in a single-probe temporal individuation task. The aim of the study is twofold: to investigate the nature of the signal used to achieve individuation and to identify the sources of uncertainty that limit accuracy and precision of temporal judgments. The task employed is identical to that in other recent studies on the same topic (Vul et al., 2009; Vul, Nieuwenstein, & Kanwisher, 2008). Observers are given a list of all 26 letters of the English alphabet, randomly ordered and presented in an RSVP stream, and are asked to report the letter corresponding to a single cued temporal position. I report the results of two experiments. In the first experiment, the cue always corresponded to the midstream temporal position, and the variable of interest was the presentation rate. In the second experiment, the presentation rate was constant, but the cue appeared at different temporal positions within the stream. As such, the first experiment was aimed at distinguishing ordinal versus temporal codes in the presence of minimal uncertainty about the temporal location of the cue, whereas the second experiment investigated contextual influences by introducing uncertainty about the location of the cue, thus modulating attentional orienting by means of elapsed time (increasing cue hazard rate).

The results show that the distribution of observers’ errors can be modeled as a Uniform–Gaussian mixture with at least two sources of uncertainty: a specific temporal uncertainty that gives rise to errors clustered around the position of the cue with a distribution well modeled as a Gaussian, and at least one additional source responsible for spreading errors uniformly across all temporal positions, suggesting uncertainty in letter categorization. The characteristics of the Gaussian distribution of errors with different presentation rates suggest that judgment is based on a temporal, rather than ordinal, position signal. Introducing uncertainty in the location of the cue affects uniform random errors, such that their frequency becomes a decreasing linear function of the ordinal position of the cue in the sequence, but does not affect the Gaussian component; in particular, it does not decrease its precision. Judgments are often biased, demonstrating pre- or posttarget error patterns depending on the statistical distribution of the cue, and are affected by list edge effects, such that items appearing at the beginning and end of the stream are often reported, instead of items cued just after or before a list edge.

Experiment 1a: fixed probe position, different presentation rates

This experiment examined in single observers the ability to individuate a target letter always cued in the middle of the list, so as to allow an estimate of accuracy and precision minimally contaminated by uncertainty about the cue’s location. Performance was compared across a range of presentation rates to establish whether individuation is based on an ordinal or a temporal position signal.

Method

Participants

The author and three students participated in the experiment. The students were unaware of the purpose of the study.

Stimuli

On each trial, all 26 letters of the English alphabet were presented in an RSVP stream in the center of a CRT monitor with a refresh rate of 90 Hz. The letters were drawn in white Courier font on a black background. Each letter subtended 2.5° of visual angle at a viewing distance of 57 cm. The cue was a white circle with a diameter of 12° centered on the target letter.

Procedure

The sequence order of the 26 letters of the English alphabet was chosen randomly on every trial. Each letter in the RSVP stream was presented in two consecutive screen refreshes (22.2 ms) and was followed by an empty interval before the onset of the following letter (see Fig. 1). Onset and offset of the cue always coincided with the onset and offset of the 14th letter. Observers were instructed to report the cued letter by pressing the corresponding key on a computer keyboard and were not given feedback on the accuracy of their reports. An intertrial interval of 1,500 ms occurred after each response. Each participant was tested in blocks of 400 trials, comprising four sequences of 100 successive trials separated by brief interruptions. The RSVP presentation rate was fixed within each block. Several rates (obtained by altering the interval between the offset of one letter and the onset of the next, with letter exposure kept fixed at 22.2 ms) were tested in randomized order over several days, and each participant was run in three blocks (1,200 trials) per rate. The very first block was treated as practice and discarded.
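
The timing arithmetic of this procedure can be illustrated with a short sketch. It assumes that the empty interval, like the letter exposure, lasts a whole number of screen refreshes; the text does not state this explicitly. The function name and the `blank_frames` parameter are illustrative and are not taken from the study.

```python
REFRESH_HZ = 90.0   # CRT refresh rate given in the Stimuli section
LETTER_FRAMES = 2   # each letter is shown for two consecutive refreshes


def rsvp_timing(blank_frames, refresh_hz=REFRESH_HZ, letter_frames=LETTER_FRAMES):
    """Return (exposure_ms, soa_ms, rate_hz) for a given empty interval.

    blank_frames is a hypothetical parameter: the number of empty
    refreshes between the offset of one letter and the onset of the next.
    """
    frame_ms = 1000.0 / refresh_hz
    exposure_ms = letter_frames * frame_ms
    soa_ms = (letter_frames + blank_frames) * frame_ms
    rate_hz = 1000.0 / soa_ms
    return exposure_ms, soa_ms, rate_hz


exposure_ms, soa_ms, rate_hz = rsvp_timing(blank_frames=2)
print(round(exposure_ms, 1), round(soa_ms, 1), round(rate_hz, 1))
```

Under this assumption, two empty refreshes give an SOA of about 44 ms, close to the 45-ms SOA of the fastest condition reported in the Results; longer empty intervals lower the rate while the exposure stays at 22.2 ms.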

Data analysis

Report frequency was calculated for each serial position and presentation rate from data averaged across all blocks. In the frequency histogram, each serial position x was expressed as a signed distance from cue onset. A Uniform–Gaussian mixture model was then fitted to the histogram of reports by nonlinear regression. The model had the following form:

$$ p(x)=\frac{a\left(1-u\right)}{\sqrt{2\pi\sigma^{2}}}\exp\left[-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}\right]+\frac{u}{26} \tag{1} $$

Parameter a is fixed and corresponds to the histogram’s bin width (RSVP frame exposure in milliseconds), μ and σ are the mean and standard deviation of the Gaussian component, respectively, and parameter u represents the cumulative probability of the uniform component.
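
As a rough illustration of Eq. 1, the following sketch evaluates the mixture and recovers its parameters from a synthetic histogram. The fitting routine is a plain least-squares grid search, used here only as a dependency-free stand-in for nonlinear regression; it is not the author's fitting procedure. The bin width of 90 ms, the "true" parameter values, and all function names are assumptions made for the example, and the histogram is generated from the model rather than taken from the study.

```python
import math

N_ITEMS = 26  # letters in the stream; the uniform component is spread over them


def mixture_p(x, mu, sigma, u, a):
    """Eq. 1: predicted report probability at signed distance x from the cue.

    x, mu, sigma and a must share one unit (for example, milliseconds).
    """
    gauss = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
    gauss /= math.sqrt(2.0 * math.pi * sigma ** 2)
    return a * (1.0 - u) * gauss + u / N_ITEMS


def sse(params, xs, freqs, a):
    """Sum of squared differences between observed and predicted frequencies."""
    mu, sigma, u = params
    return sum((f - mixture_p(x, mu, sigma, u, a)) ** 2 for x, f in zip(xs, freqs))


def grid_fit(xs, freqs, a, mus, sigmas, us):
    """Least-squares fit by exhaustive search over candidate parameter values."""
    candidates = ((m, s, u) for m in mus for s in sigmas for u in us)
    return min(candidates, key=lambda p: sse(p, xs, freqs, a))


# Synthetic, noise-free histogram (NOT data from the study).
bin_width = 90.0                                   # hypothetical value of a, in ms
xs = [(k - 14) * bin_width for k in range(1, 27)]  # signed distance from the cued 14th item
true_params = (30.0, 70.0, 0.2)                    # hypothetical mu, sigma, u
freqs = [mixture_p(x, *true_params, a=bin_width) for x in xs]

best = grid_fit(xs, freqs, bin_width,
                mus=[0.0, 15.0, 30.0, 45.0],
                sigmas=[50.0, 60.0, 70.0, 80.0],
                us=[0.1, 0.2, 0.3])
print(best)
```

Because the synthetic histogram is noise-free and the grid contains the generating values, the search returns them exactly. With real report frequencies, a continuous optimizer would be the more natural choice.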

Results

Representative distributions of reports corresponding to the choices of observer J.L. over a range of presentation rates (22.2–6.4 Hz, or stimulus onset asynchronies [SOAs] of 45–157 ms) are plotted in Fig. 2. Letters presented in the immediate neighborhood of the cued item are reported with increasing frequency as presentation rate decreases. When report frequency is plotted as a function of ordinal distance from the cue expressed as number of items (ordinal error; Fig. 2, left panel), the distributions’ variance appears to decrease with decreasing RSVP rate. However, when report frequency is plotted as a function of temporal distance from the cue (temporal error; Fig. 2, right panel), the distributions’ variance appears unaffected by rate, the main difference being a multiplicative change of gain.
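
The two error scales differ only by a factor equal to the SOA, which is why a spread that is constant in time must widen in items as the rate increases. A minimal sketch of this rescaling follows; the 70-ms standard deviation is a hypothetical value chosen for illustration, and the function names are not from the original.

```python
def ordinal_to_temporal_ms(ordinal_error, rate_hz):
    """Convert an error expressed in items to milliseconds at a given RSVP rate."""
    soa_ms = 1000.0 / rate_hz
    return ordinal_error * soa_ms


def temporal_sd_to_ordinal(sd_ms, rate_hz):
    """Express a temporal standard deviation in units of items."""
    return sd_ms * rate_hz / 1000.0


SD_MS = 70.0  # hypothetical temporal spread, for illustration only
for rate_hz in (6.4, 22.2):  # slowest and fastest rates mentioned for Fig. 2
    print(rate_hz, round(temporal_sd_to_ordinal(SD_MS, rate_hz), 2))
```

With these assumed numbers, the same temporal spread corresponds to about 0.45 items at 6.4 Hz and about 1.55 items at 22.2 Hz, which is the pattern a temporal code would predict on the ordinal scale.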

In order to measure quantitatively the variation of accuracy and precision of observers’ reports with presentation rate, the Uniform–Gaussian model (Eq. 1) was fitted by nonlinear regression to the frequency histograms of each observer. Representative examples of such fits are shown in Fig. 3 at slow and fast presentation rates for observer J.L. The need for a uniform component in the model is justified by theoretical considerations (see the General Discussion section) and is supported empirically by the substantial presence of reports at large distances from the cue. Note that such uniform errors are more prevalent at the fast presentation rate. Inspection of residuals from the fitted model across observers indicates that the reports’ distributions are not systematically skewed or kurtotic. Accuracy and precision of the reports are taken as the mean and standard deviation of the Gaussian component, respectively.

Best fits of the model’s parameters are plotted in Fig. 4 as a function of RSVP rate for each individual observer. For all observers, the cumulative probability of uniform errors (Fig. 4, top panels) increases with increasing rate. Accuracy and precision (mean and standard deviation of the model’s Gaussian component, respectively) are plotted in the middle panels of Fig. 4 as an ordinal error (distance from the cue as number of letters). For all observers, both accuracy and precision tend to decrease with increasing rate, as demonstrated by the progressive increase of ordinal error magnitude and variance. However, when errors are expressed as a temporal distance from the cue (Fig. 4, bottom panels), accuracy and precision fail to show a systematic trend as a function of rate. Note that the mean value of the Gaussian component is negative in the three naïve observers (pretarget intrusion pattern), but positive for the author (posttarget intrusion pattern).

Given the similar trends observed for all parameters, averages were computed at the rates of 8.1, 11.1, 14.9, and 22.2 Hz, which were tested across all observers (Fig. 5). Because the model’s means had a different sign for 1 observer, accuracy estimates were recalculated before averaging as absolute deviations from the cue; these absolute (always positive) values for the means are plotted in Fig. 5. In the average data, uniform errors (Fig. 5, top panel) increase with increasing RSVP rate in a seemingly linear manner. On an ordinal scale (Fig. 5, middle panel), the mean and standard deviation of the Gaussian component increase roughly threefold with a threefold increase in RSVP rate. However, on a temporal scale (Fig. 5, bottom panel), the mean remains invariant with rate, while the standard deviation increases only slightly, by a factor of 1.14 (in both cases, linear regression slopes are not statistically significant, p > .05). The relationship between standard deviation and rate remains statistically nonsignificant (p > .05) even after correction for grouping with Sheppard’s formula (Ulrich & Giray, 1989), a method that deflates the estimates of variance at slower rates that are biased by coarser temporal sampling (Heitjan, 1989). Average values of mean and standard deviation are μ = 27 ms and σ = 72 ms. Overall, temporal invariance of accuracy and precision as a function of SOA is consistent with the notion that a temporal, not ordinal, position signal drives observers’ choices.
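
Sheppard's correction subtracts one twelfth of the squared bin width from a variance estimated on grouped data. The sketch below shows why the correction matters more at slow rates. It assumes, for illustration only, that the bins are as wide as the SOA and that the uncorrected standard deviation is 75 ms at both rates; neither value is taken from the study.

```python
import math


def sheppard_corrected_sd(sd_grouped, bin_width):
    """Standard deviation after Sheppard's correction for grouping.

    Subtracts bin_width**2 / 12 from the variance estimated on binned data.
    """
    var = sd_grouped ** 2 - bin_width ** 2 / 12.0
    if var < 0:
        raise ValueError("correction exceeds the grouped variance")
    return math.sqrt(var)


# Hypothetical grouped estimate of 75 ms at a slow and a fast rate.
for rate_hz in (8.1, 22.2):
    soa_ms = 1000.0 / rate_hz  # assumed bin width, in ms
    print(rate_hz, round(sheppard_corrected_sd(75.0, soa_ms), 1))
```

The corrected value is smaller at 8.1 Hz than at 22.2 Hz because the bins are wider at the slower rate, so the estimate there is deflated more.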

Experiment 1b: fixed probe position, individual differences

The aim of this experiment was to investigate individual differences in the model parameters. The interest here was in how much the accuracy and precision of judgments varied across observers under the conditions of the previous experiment.