Introduction

Orienting selective attention toward a source of interest is a core ability needed in order to extract potentially relevant information from the surroundings (Spence & Driver, 2004). This ability has been well documented in studies using Posner’s classic cost and benefit paradigm (Posner, 1980; Posner & Cohen, 1984; Wright & Ward, 2008). In this task, to-be-detected targets presented in the left or right hemifield are preceded by spatially nonpredictive (i.e., exogenous) cues that are also presented in the left or right hemifield. Facilitatory effects (evidenced by shorter response latencies) are typically observed when targets are presented in the cued, as compared with the uncued, hemifield, reaching a maximum at cue–target stimulus onset asynchronies (SOAs) of around 150 ms (e.g., Müller & Rabbitt, 1989). At SOAs longer than a few hundred milliseconds, however, the reverse effect is typically observed—that is, slower responses for targets presented at the cued, as compared with the uncued, locations. This effect was first reported by Posner and Cohen (1984) and, subsequently, was named inhibition of return (IOR) by Posner, Rafal, Choate, and Vaughan (1985), who interpreted IOR as an inhibitory bias against returning attention to previously attended locations (e.g., Klein, 2000; but see Lupiáñez, 2010, for an alternative account).

The above-mentioned literature clearly indicates that peripheral (exogenous) onsets are able to capture a participant’s attentional resources at a given spatial location. Attentional capture, however, might be modulated by several factors: In fact, by now, a large body of evidence exists demonstrating that attentional capture by unimodal exogenous cues is not an entirely automatic process. That is, there are a number of circumstances in which attentional capture simply does not take place (for reviews, see Ruz & Lupiáñez, 2002; Santangelo & Spence, 2008). One of the first pieces of evidence to show that exogenous cues do not necessarily capture spatial attention comes from an experiment by Theeuwes (1991). In his study, the exogenous orienting effects elicited by irrelevant visual onsets were no longer observed when a central arrow, presented in advance, reliably (i.e., with a validity of 100%) indicated the location of a target letter in a four-letter display. This finding led Theeuwes (1991) to conclude that “outside the focus of attention, abrupt transients are not capable of attracting attention” (p. 83; see Yantis & Jonides, 1990, for similar findings).

The abolishment of spatial attentional orienting toward the location of an exogenous cue is even more evident when attention is already engaged in another (perceptually demanding) task. For instance, Cosman and Vecera (2009) recently assessed whether abrupt onsets were able to capture attention, using complex search displays in which participants searched for a target letter through low- and high-perceptual-load conditions (set size = 1 and 6, respectively). On each trial, irrelevant flankers were also presented: Crucially, they affected search in the low-load displays, but not in the high-load displays. These findings led Cosman and Vecera to conclude that attentional capture by abrupt onsets is attenuated when people search through high-load displays.

Similarly, Santangelo, Olivetti Belardinelli, and Spence (2007) have shown that attentional capture by peripheral exogenous cues does not occur when attention is already engaged by a central (perceptually demanding) task. More specifically, the participants in their study had to discriminate, as rapidly and accurately as possible, the elevation (up vs. down) of targets preceded by spatially nonpredictive cues presented on the orthogonal (i.e., independent) left/right axis, just as in the orthogonal spatial-cuing task (e.g., Spence & Driver, 1997; see Spence, McDonald, & Driver, 2004, for a review). Participants’ performance was examined under two main conditions: a dual-task condition, in which the participants performed the orthogonal cuing task (i.e., up/down elevation discrimination) along with the detection of a digit embedded in a rapid serial visual presentation (RSVP) of letters presented in the center of the display (see Footnote 1), and a no-stream condition, in which the RSVP stream was replaced by a static fixation point and the participants performed only the elevation discrimination task. The results showed the suppression of exogenous spatial-cuing effects when the central stream of alphanumeric characters was presented (i.e., in the dual-task condition). Importantly, this result was replicated in a number of follow-up studies using different combinations of sensory modalities (Ho, Santangelo, & Spence, 2009; Santangelo, Ho, & Spence, 2008; Santangelo & Spence, 2007a, 2007b; for further converging results, see also Boot, Brockmole, & Simons, 2005; Santangelo, Finoia, Raffone, Olivetti Belardinelli, & Spence, 2008, Experiments 1 and 2).

Taken together, it is now well established that peripheral onsets do not affect the processing of targets subsequently presented at that location under conditions in which a concurrent perceptually demanding task (i.e., the RSVP task) is being performed. However, it is not clear whether visual onsets simply lose their effectiveness in capturing spatial attention under conditions in which a concurrent RSVP task is being performed, or they are still effective but the central stream of letters entails a faster disengagement of spatial attention from the cued location. For instance, Theeuwes, Atchley, and Kramer (2000) reported a study in which participants had to search for a shape target and had to ignore a salient (i.e., colored) distractor. When the target and distractor were presented simultaneously, the distractor captured participants’ attention, thus disrupting their performance. However, when the distractor was presented 150 ms prior to the to-be-searched-for target, participants’ performance was unaffected. According to Theeuwes and his colleagues, these findings support the idea of an early bottom-up capture by the distractor that is later overridden by top-down attentional control. One might therefore argue that even when performing a concurrent perceptually demanding task (e.g., the RSVP task, as in Santangelo et al.’s, 2007, study), an exogenous cue captures spatial attention but that this effect quickly dissipates because of a rapid reallocation of spatial attention to the central location, where a new stimulus (i.e., a letter) is presented and possibly needs to be responded to (i.e., top-down attentional control).

The aim of the present study was, therefore, to clarify this issue, by measuring the magnitude of exogenous spatial-cuing effects when participants have to perform a concurrent perceptually demanding task (involving the monitoring of an RSVP stream) at three different SOAs (whereas a fixed SOA was always used in the previous studies on this topic; e.g., Santangelo et al., 2007; see below): a short SOA, for which there was insufficient time for the letter in the central RSVP stream to change between the presentations of the spatial cue and target; a medium SOA, for which one letter change occurred in the central RSVP stream between the presentations of the spatial cue and target, just as in the previous studies (e.g., Santangelo et al., 2007); and a long SOA, where several (six) changes to the central letter occurred between the presentations of the spatial cue and target. In a second experiment, the participants were also tested in a no-stream (baseline) condition, in which the central stream of letters was replaced by a static fixation point. If the central perceptually demanding task entails a fast disengagement from the cued location, we would expect to find cuing effects only when no changing letters occur in the central RSVP stream.

Experiment 1

Method

Participants

Data were collected from 19 university volunteers (6 male; mean age 26.3 years, ranging from 20 to 38 years), who reported normal or corrected-to-normal vision and were naïve as to the purpose of the experiment, which lasted for 45 min.

Apparatus and materials

The visual stimuli were displayed on a black background on a 17-in. computer monitor (refresh rate = 60 Hz) located in a dark room. In order to mask any external noises, the participants wore headphones playing white noise. Participants sat approximately 50 cm from the monitor. The spatial cue used in the orthogonal cuing task consisted of the presentation of a white rectangle (2.1° × 1.4°) presented on either the left or the right of the computer monitor (located approximately 17° from the center of the screen). The spatial target consisted of a white circle (1.6° in diameter) presented from one of the four corners of the computer monitor (17° to the left/right and 12° above/below the fixation point; see Fig. 1a). Both spatial cues and targets were presented for a duration of 16 ms (i.e., for one screen refresh). The distractor set in the RSVP task consisted of 24 letters (A, B, C, D, E, F, G, H, J, K, L, M, N, P, Q, R, S, T, U, V, W, X, Y, Z), and the target set consisted of eight digits (2, 3, 4, 5, 6, 7, 8, and 9; 2.6° × 1.5°).
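The stimulus geometry described above can be converted into physical screen distances with basic trigonometry. The following is a minimal sketch, assuming a flat screen viewed head-on at the reported distance of approximately 50 cm; the function names are illustrative and are not taken from the original experimental software.

```python
import math

VIEWING_DISTANCE_CM = 50.0  # approximate viewing distance reported in the text
REFRESH_RATE_HZ = 60.0      # monitor refresh rate reported in the text


def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical size of a stimulus centered on the line of sight that subtends angle_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance from fixation of a point lying angle_deg away from the line of sight."""
    return distance_cm * math.tan(math.radians(angle_deg))


def frame_duration_ms(refresh_hz=REFRESH_RATE_HZ):
    """Duration of a single screen refresh in milliseconds."""
    return 1000.0 / refresh_hz


if __name__ == "__main__":
    print(f"Cue eccentricity (17 deg): {eccentricity_cm(17):.1f} cm")
    print(f"Target diameter (1.6 deg): {extent_cm(1.6):.2f} cm")
    print(f"One refresh at 60 Hz: {frame_duration_ms():.2f} ms")
```

Under these assumptions, one refresh at 60 Hz lasts slightly longer than the nominal 16 ms quoted in the text, which suggests that the reported durations are rounded frame counts.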

Fig. 1

(a) Schematic illustration of the experimental setup. Participants stared at a central stream of letters searching for a to-be-detected target digit presented on two thirds of the trials. On the remaining trials, a spatial target was presented in one of the corners of the screen, requiring an up/down discrimination response. On every trial, a spatially uninformative visual cue was presented equiprobably on either the left or the right side. (b) Schematic representation of the sequence of events in the orthogonal spatial-cuing (at an SOA of 80, 190, or 750 ms) and digit detection tasks. In the spatial-cuing task (one third of all trials), no changing letters (80-ms SOA), one changing letter (190-ms SOA), or six changing letters (750-ms SOA) could occur in the central RSVP stream between the presentations of the peripheral spatial cue and the target. In the digit detection task (two thirds of all trials), no peripheral spatial target was presented; instead, a to-be-detected central digit was presented in a random position between the position immediately after the spatial cue and the end of the stream. Note that target digits in the stream and spatial targets in the orthogonal cuing task were never presented on the same trial.

Procedure

Each trial consisted of the presentation of a stream of 20 alphanumeric characters, which started 1,000 ms after the offset of a visual warning signal (i.e., an asterisk of 0.8 × 0.8° of visual angle presented for 500 ms). Each character was presented for 96 ms, with an interstimulus interval of 16 ms (i.e., one screen refresh; a black screen filled the gaps between the presentations of successive stimuli). The distractor letters in the stream were chosen randomly before each trial, with the sole restriction that no distractor was repeated within a given stream. A peripheral spatial cue was presented on each trial. This cue appeared equiprobably at the third, sixth, or ninth stream position and equiprobably on either side of fixation, while the target digit appeared randomly at a position between the position immediately after the presentation of the spatial cue and the end of the stream. When presented, the spatial target in the spatial-cuing task appeared at the same position in the stream as the spatial cue (i.e., there was no changing letter in the center; 80-ms SOA), at the following position (i.e., one changing letter in the center; 190-ms SOA), or after six positions (i.e., six changing letters in the center; 750-ms SOA; see Fig. 1b). The spatial target could appear either on the same side as (cued trials) or the side opposite to (uncued trials) the peripheral spatial cue. A target digit was presented on two thirds of the trials, while a peripheral target (requiring an elevation discrimination response) was presented on the remaining one third of the trials. Note that target digits in the RSVP stream and spatial targets in the orthogonal cuing task were never presented on the same trial (cf. Santangelo et al., 2007).
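The trial structure described above can be sketched in code. The example below is illustrative only: it assumes that the cue onset coincides with the onset of the character at the cue position (the text does not state the exact alignment), and the names `build_trial` and `letter_changes` are hypothetical. Under that assumption, the 112-ms position cycle (96 ms + 16 ms) yields zero, one, and six character changes at the 80-, 190-, and 750-ms SOAs, respectively.

```python
import random

LETTERS = list("ABCDEFGHJKLMNPQRSTUVWXYZ")  # the 24 distractor letters
DIGITS = list("23456789")                   # the 8 target digits
STREAM_LENGTH = 20
ITEM_MS = 96
ISI_MS = 16
POSITION_MS = ITEM_MS + ISI_MS              # 112 ms between successive character onsets
CUE_POSITIONS = (3, 6, 9)                   # 1-based stream positions
SOAS_MS = (80, 190, 750)


def letter_changes(soa_ms):
    """New character onsets between cue and target (assumes cue onset = character onset)."""
    return soa_ms // POSITION_MS


def build_trial(rng, trial_type, soa_ms=None, cuing=None):
    """Build one hypothetical trial description as a dictionary."""
    stream = rng.sample(LETTERS, STREAM_LENGTH)  # sampling without replacement: no repeats
    cue_position = rng.choice(CUE_POSITIONS)
    cue_side = rng.choice(("left", "right"))
    trial = {
        "type": trial_type,
        "stream": stream,
        "cue_position": cue_position,
        "cue_side": cue_side,
        "cue_onset_ms": (cue_position - 1) * POSITION_MS,
    }
    if trial_type == "digit":
        # The digit replaces a letter somewhere after the cue position.
        digit_position = rng.randint(cue_position + 1, STREAM_LENGTH)
        stream[digit_position - 1] = rng.choice(DIGITS)
        trial["digit_position"] = digit_position
    elif trial_type == "spatial":
        other_side = "right" if cue_side == "left" else "left"
        trial["soa_ms"] = soa_ms
        trial["target_side"] = cue_side if cuing == "cued" else other_side
        trial["target_elevation"] = rng.choice(("up", "down"))
        trial["target_onset_ms"] = trial["cue_onset_ms"] + soa_ms
    else:
        raise ValueError(f"unknown trial type: {trial_type}")
    return trial


if __name__ == "__main__":
    rng = random.Random(2024)
    print({soa: letter_changes(soa) for soa in SOAS_MS})
    print(build_trial(rng, "spatial", soa_ms=190, cuing="uncued"))
```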

The participants were informed of the proportion of each type of trial that would be presented prior to the start of the experiment. They were instructed to stare at the central stream of letters and not to make any eye movements (e.g., toward the periphery; see Footnote 2). They were also instructed to press one of three buttons on a keypad (as rapidly and accurately as possible) in response to either the target digit (in the center of the screen, with the index finger of one hand) or the spatial target (up/down discrimination of the peripherally presented target circles, with the index and middle fingers of the other hand; these two buttons were arranged vertically), regardless of the side of presentation. Which hand was used was counterbalanced across participants. The participants performed 432 trials in total, including 288 digit detection trials and 144 spatial-cuing trials (24 cued and 24 uncued trials for each SOA). These trials were presented in two separate blocks, each of which lasted for approximately 20 min. Prior to starting the experiment, the participants completed a 24-trial training session.
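The trial counts above follow from a simple factorial design, which can be checked with a short sketch. The full randomization and the even split into two blocks are assumptions made for illustration; the text does not specify how trials were assigned to blocks, and `build_session` is a hypothetical name.

```python
import random
from itertools import product

SOAS_MS = (80, 190, 750)
CUING = ("cued", "uncued")
REPS_PER_CELL = 24      # cued and uncued trials per SOA, as reported
N_DIGIT_TRIALS = 288    # digit detection trials, as reported


def build_session(seed=0):
    """Return two blocks of trial descriptors (assumed: fully shuffled, evenly split)."""
    spatial = [
        {"type": "spatial", "soa_ms": soa, "cuing": cuing}
        for soa, cuing in product(SOAS_MS, CUING)
        for _ in range(REPS_PER_CELL)
    ]
    digit = [{"type": "digit"} for _ in range(N_DIGIT_TRIALS)]
    trials = spatial + digit
    random.Random(seed).shuffle(trials)
    half = len(trials) // 2
    return trials[:half], trials[half:]


if __name__ == "__main__":
    block_1, block_2 = build_session()
    all_trials = block_1 + block_2
    n_spatial = sum(t["type"] == "spatial" for t in all_trials)
    print(len(all_trials), n_spatial, len(all_trials) - n_spatial)
```

The 3 SOAs × 2 cuing conditions × 24 repetitions give the 144 spatial-cuing trials, which together with the 288 digit detection trials make up the 432 trials and the one-third/two-thirds split described in the text.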

Results and discussion

The participants detected the target digits very accurately (making only 5.2% errors; mean reaction time [RT] = 440 ms). More informative as regards the main aim of our study were the data derived from the spatial-cuing task. The mean RTs are shown in Fig. 2. Trials on which the participants responded in less than 100 ms (premature responses), failed to respond within 1,200 ms of target onset (misses), or responded erroneously were excluded from the analysis of the RT data. Overall, these trials occurred rarely (M = 2.8% of the trials) and were not analyzed further. An ANOVA was performed on the RT data with two within-participants factors: cuing (cued vs. uncued) and SOA (80, 190, and 750 ms). This analysis failed to reveal a significant main effect of cuing, F(1, 18) = 1.5, p = .232. On the other hand, there was a significant main effect of SOA, F(2, 36) = 30.4, p < .001. Post hoc comparisons showed that participants responded more slowly at the 750-ms SOA (M = 569 ms) than at either the 80-ms (M = 533 ms; p < .001) or the 190-ms (M = 517 ms; p < .001) SOA, which did not differ significantly from each other (p = .179). Crucially, the analysis revealed a significant interaction between the two factors, F(2, 36) = 9.1, p = .001. Post hoc comparisons indicated a significant exogenous spatial-cuing effect at the 80-ms SOA (M = 24 ms, p < .001) but no facilitatory cuing effect at either the 190-ms SOA (M = 8 ms, p = .356) or the 750-ms SOA (M = -18 ms, p = .008; i.e., a reversed pattern of cuing effects, as in the typical IOR study).

Fig. 2

Mean reaction times highlighting a significant exogenous spatial-cuing effect at the 80-ms stimulus onset asynchrony (SOA) in Experiment 1. The error bars represent the standard errors of the means
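The trial exclusion criteria and the cuing effect (uncued minus cued mean RT) used in the analysis above can be expressed as a short sketch. The trial records below are hypothetical values invented purely to exercise the code; they are not data from the experiment, and the field names are illustrative.

```python
from collections import defaultdict
from statistics import mean

MIN_RT_MS = 100    # faster responses are treated as premature
MAX_RT_MS = 1200   # slower or absent responses are treated as misses


def keep_trial(trial):
    """True for correct responses given between 100 and 1,200 ms after target onset."""
    rt = trial["rt_ms"]
    return trial["correct"] and rt is not None and MIN_RT_MS <= rt <= MAX_RT_MS


def cuing_effects(trials):
    """Mean uncued RT minus mean cued RT for each SOA (positive = facilitation)."""
    cells = defaultdict(list)
    for trial in trials:
        if keep_trial(trial):
            cells[(trial["soa_ms"], trial["cuing"])].append(trial["rt_ms"])
    soas = sorted({soa for soa, _ in cells})
    # Note: raises statistics.StatisticsError if a cell has no valid trials.
    return {soa: mean(cells[(soa, "uncued")]) - mean(cells[(soa, "cued")]) for soa in soas}


# Hypothetical trials for a single participant (illustration only).
example_trials = [
    {"soa_ms": 80, "cuing": "cued", "rt_ms": 500, "correct": True},
    {"soa_ms": 80, "cuing": "cued", "rt_ms": 520, "correct": True},
    {"soa_ms": 80, "cuing": "uncued", "rt_ms": 540, "correct": True},
    {"soa_ms": 80, "cuing": "uncued", "rt_ms": 530, "correct": True},
    {"soa_ms": 80, "cuing": "uncued", "rt_ms": 90, "correct": True},    # premature
    {"soa_ms": 80, "cuing": "cued", "rt_ms": None, "correct": False},   # miss
    {"soa_ms": 80, "cuing": "cued", "rt_ms": 400, "correct": False},    # error
]

if __name__ == "__main__":
    print(cuing_effects(example_trials))
```

A positive value indicates faster responses at the cued location; a negative value corresponds to the reversed pattern associated with IOR.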

These results are in line with our predictions: We observed a significant spatial-cuing effect only at the shortest SOA, when no changing letters occurred in the central RSVP stream. Conversely, when one or more letter changes occurred in the central RSVP stream, the peripheral exogenous cue no longer facilitated the participants’ elevation discrimination performance, thus replicating the results reported in the extant literature (see Santangelo & Spence, 2008, for a review). However, it could be argued that the suppression of orienting effects at the SOAs exceeding 80 ms was not due to the presentation of central stimuli in the interval between the spatial cue and the subsequent target, but simply to the inability of the peripheral onsets to capture participants’ spatial attention over intervals exceeding 80 ms. In other words, it is necessary to include a baseline condition in order to establish the magnitude of cuing effects at these different SOAs, irrespective of any dual-task manipulation. For this reason, we conducted a follow-up experiment in which the participants carried out two different conditions: a dual-task condition, identical to that in Experiment 1, that included both the spatial-cuing and digit detection tasks, and a no-stream (baseline) condition, in which the central stream of letters was replaced by a static fixation point and the only task to be performed was the spatial-cuing task. If the suppression of spatial attentional-orienting effects resulted from the presentation of the central RSVP stream, we would expect to find the elimination of orienting effects at SOAs longer than 80 ms only in the dual-task condition, but not in the no-stream condition.

Experiment 2

Method

Participants

Data were collected from 16 university volunteers (4 males, mean age 23.9 years, ranging from 18 to 35 years) who reported normal or corrected-to-normal vision and were naïve as to the purpose of the experiment, which lasted for 60 min.

Apparatus and materials

The apparatus and materials were identical to those in Experiment 1.

Procedure

Two different conditions (dual task and no stream) were presented in separate blocks of experimental trials, with the order of presentation of the two conditions counterbalanced across participants. The participants were allowed to rest for a few minutes between blocks of trials. In the dual-task condition, the participants carried out the task just as in Experiment 1; that is, both the spatial-cuing task and the central digit detection task were performed. In the no-stream condition, the central stream of letters was replaced by a central fixation point (a cross of 0.7° × 0.7° of visual angle, just as in a typical exogenous spatial-cuing task), which was presented 1,000 ms after the offset of the visual warning signal (i.e., the asterisk) and remained on the screen for 2,500 ms or until a response was made. After a random interval (ranging from 300 to 600 ms) starting from the onset of the fixation point, the spatial cue was presented and was followed (after an equiprobable SOA of 80, 190, or 750 ms) by the spatial target. The participants were instructed to stare at the fixation point and to perform the visual elevation discrimination task by pressing one of two buttons on a keypad as rapidly and accurately as possible. This block lasted for approximately 14 min and included 144 trials (24 cued and 24 uncued trials for each SOA). Prior to starting the experiment, the participants completed a 24-trial training session for each condition.

Results and discussion

Just as for Experiment 1, the participants were very accurate in detecting target digits (making only 6.7% errors; mean RT = 476 ms). More informative as regards the main aim of our study were the data derived from the spatial-cuing task. Trials on which the participants responded in less than 100 ms (premature responses), failed to respond within 1,200 ms of target onset (misses), or responded erroneously were excluded from the analysis of the RT data. Overall, these trials occurred rarely (M = 6.1% of the trials) and were not analyzed further. An ANOVA was performed on the RT data with three within-participants factors: task (no stream vs. dual task), cuing (cued vs. uncued), and SOA (80, 190, or 750 ms). This analysis revealed (1) a main effect of task, F(1, 15) = 93.9, p < .001, showing that participants responded more slowly to spatial targets in the dual-task (M = 598 ms) than in the no-stream condition (M = 489 ms); (2) a main effect of cuing, F(1, 15) = 18.2, p = .001, indicating faster responses to cued (M = 534 ms) than to uncued (M = 553 ms) spatial targets; and (3) a main effect of SOA, F(2, 30) = 20.6, p < .001, indicating faster responses at the 190-ms SOA (M = 525 ms) than at both the 80-ms (M = 548 ms) and the 750-ms (M = 557 ms) SOAs. Moreover, this analysis revealed a significant interaction between task and SOA, F(2, 30) = 9.0, p = .001, indicating different response latencies in the two task conditions as a function of the SOA. In fact, in the no-stream condition, responses were faster at the 750-ms SOA (M = 490 ms) than at the 80-ms SOA (M = 504 ms). Conversely, in the dual-task condition, this pattern was reversed, with faster responses at the 80-ms SOA (M = 592 ms) than at the 750-ms SOA (M = 624 ms), while in both task conditions, responses were faster at the 190-ms SOA (M = 474 and 577 ms, respectively) than at the other SOAs. Finally, this analysis revealed a significant interaction between cuing and SOA, F(2, 30) = 3.6, p = .040, indicating a larger cuing effect at the 80-ms SOA (M = 34 ms) than at the 190-ms and 750-ms SOAs (M = 14 and 10 ms, respectively). This analysis failed to reveal any other significant effect, all Fs < .321, all ps > .728.

However, one may argue that these results might be affected by an order effect due to the fact that half of the participants performed the no-stream condition first, whereas the other participants started with the dual-task condition. To rule out this potential confound, we conducted a further analysis. Here, we analyzed, for the no-stream condition, only the data from those participants who started with the no-stream condition (n = 8), and analogously, we analyzed, for the dual-task condition, only the data from those participants who started with the dual-task condition (n = 8). Importantly, this choice guarantees full comparability between the dual-task conditions in this and the previous experiment (i.e., there were no other differences, apart from the counterbalancing of the no-stream and dual-task conditions). These mean RTs are shown in Fig. 3. In line with the aim of this follow-up experiment, we performed specific two-tailed paired-sample t tests (95% confidence interval [c.i.]) to assess the orienting effects in the two conditions at the three different SOAs.

Fig. 3

Mean reaction times highlighting significant exogenous spatial-cuing effects at 80- and 190-ms stimulus onset asynchronies (SOAs) in the no-stream condition, as well as a significant spatial cuing effect at the 80-ms SOA in the dual-task condition in Experiment 2. The error bars represent the standard errors of the means

Statistically significant spatial-cuing effects were documented in the no-stream condition at both the 80-ms SOA (M = 36 ms), t = 3.2, p = .013, 9.9 < c.i. < 62.9, and the 190-ms SOA (M = 22 ms), t = 2.6, p = .033, 2.3 < c.i. < 41.0, but not at the 750-ms SOA (M = 22 ms; see Footnote 3), t = 1.7, p = .134, -8.5 < c.i. < 52.5. Crucially, however, in the dual-task condition, we found a spatial-cuing effect only at the shortest SOA (M = 27 ms), t = 3.0, p = .018, 6.0 < c.i. < 47.3, but not at the 190-ms and 750-ms SOAs, M = 8 ms, t = .7, p = .507, -18.6 < c.i. < 34.6, and M = 6 ms, t = 1.1, p = .302, -7.0 < c.i. < 19.7, respectively. These results are in line with our predictions and replicate the results of Experiment 1. In fact, we observed significant spatial-cuing effects at both the shortest and medium SOAs in the baseline (i.e., no-stream) condition (in agreement with the majority of the literature on exogenous spatial attentional orienting; see Spence et al., 2004, for a review). Conversely, when one or more changing letters occurred at fixation (i.e., in the dual-task condition), the peripheral exogenous cue no longer facilitated participants’ elevation discrimination performance, thus replicating the results reported previously (see Santangelo & Spence, 2008, for a review). In the dual-task condition, we observed a significant spatial-cuing effect only at the shortest SOA, before a changing letter occurred in the central RSVP stream.
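The paired-sample comparisons reported above can be illustrated with a minimal sketch of a paired t test and its 95% confidence interval. The critical value is hard-coded for the eight participants per group (df = 7), which is an assumption of this sketch; the function name and the input values are hypothetical and do not reproduce the actual data.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF7 = 2.365  # two-tailed .05 critical value of t for df = 7 (n = 8)


def paired_t(uncued_rts, cued_rts, t_crit=T_CRIT_DF7):
    """Paired t statistic and 95% c.i. for the cuing effect (uncued minus cued).

    The default t_crit is only appropriate for n = 8 paired observations.
    """
    diffs = [u - c for u, c in zip(uncued_rts, cued_rts)]
    n = len(diffs)
    m = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return {
        "mean": m,
        "t": m / se,
        "df": n - 1,
        "ci": (m - t_crit * se, m + t_crit * se),
    }


if __name__ == "__main__":
    # Hypothetical per-participant mean RTs in ms (illustration only).
    cued = [480, 495, 510, 470, 520, 505, 490, 515]
    uncued = [520, 510, 560, 500, 540, 555, 515, 560]
    print(paired_t(uncued, cued))
```

With this convention, a cuing effect is significant at the .05 level when the interval excludes zero, which makes it easy to compare intervals across SOAs and conditions.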

General discussion

The aim of the present study was to assess whether or not a central perceptually demanding task affects the time course of the exogenous orienting of spatial attention. Across two experiments, the participants had to perform a version of the orthogonal spatial-cuing task (e.g., Spence & Driver, 1997) either combined with a to-be-monitored central stream of rapidly presented letters for a target digit detection (i.e., the dual-task condition, Experiments 1 and 2) or in isolation (i.e., the no-stream condition, Experiment 2). In agreement with the majority of the literature on exogenous spatial attention, we found exogenous orienting effects at both 80- and 190-ms SOAs in the no-stream condition (see, e.g., Klein & Shore, 2000). Crucially, however, in the dual-task condition, we observed an attentional capture effect only at the 80-ms SOA, but not at the 190-ms SOA, when a central letter was presented between the spatial cue and the spatial target. At the 190-ms SOA in the dual-task condition, we therefore found an abolishment of any attentional capture effect, which is also in line with the findings in the recent literature using similar versions of this task (see Ho et al., 2009; Santangelo, Finoia, et al., 2008; Santangelo, Ho, & Spence, 2008; Santangelo et al., 2007; Santangelo & Spence, 2007a, 2007b).

Overall, these results clearly indicate that visual onsets do not lose their effectiveness under conditions in which participants have to perform a concurrent perceptually demanding task. On the contrary, they are still capable of capturing participants’ attention at the cued location, as evidenced by the spatial-cuing effect observed at the shortest SOA (80 ms). However, the intervening presentation of abrupt novel stimuli (i.e., letters) in the center of the display between the presentation of peripheral cues and targets resulted in the rapid disengagement of participants’ spatial attention from the cued location, as evidenced by the elimination of any exogenous spatial-cuing effect. Unsurprisingly, spatial-cuing effects were eliminated not only when a changing letter drew participants’ spatial attention back to the central location of the screen, but also when additional intervening letters were presented in the RSVP stream—that is, at the 750-ms SOA. However, given that attentional capture effects were also eliminated in the no-stream condition at the 750-ms SOA, this result likely reflects the consequences of the decrease of efficiency of the exogenous cue to capture spatial attention at that SOA. As was noted in the introduction, the orienting effect elicited by peripheral abrupt onsets disappears or is even reversed (i.e., giving rise to an IOR effect) for SOAs longer than a few hundred milliseconds (e.g., 500 ms; see, e.g., Posner & Cohen, 1984).

The previous literature considered the abolishment of exogenous orienting of spatial attention under a concurrent perceptually demanding task (such as the RSVP task) as a consequence of increased perceptual load (e.g., Santangelo et al., 2007). According to the perceptual load theory of selective attention (e.g., Lavie & Tsal, 1994; for reviews, see Lavie, 2005; Lavie, Hirst, de Fockert, & Viding, 2004), a person’s attentional resources are always fully deployed in the processing of any incoming sensory information. Hence, under conditions in which a participant’s primary task is not overly demanding, there may well be spare attentional resources available for the processing of other stimuli (such as the irrelevant distractor stream in a dichotic listening study). Lavie argued that under such low-load conditions, late selection may be observed. However, if the load of the primary task increases (e.g., if the complexity or presentation rate of the to-be-monitored stimulus, such as the RSVP stream, increases), participants will have to devote more resources to processing it, and hence, fewer residual attentional resources will be available for the processing of other stimuli. Under such high-load conditions, Lavie argued, selection will occur relatively early in information processing instead, thus entailing that all task-irrelevant stimuli will be filtered out.

However, the present results demonstrate that our central perceptually demanding task, rather than providing some kind of early selection that filters out every task-irrelevant stimulus, instead resulted in a quicker disengagement of attention from the cued location toward the central location where a new stimulus (i.e., a letter or a digit) is presented. Otherwise, we should not have found an exogenous orienting effect at the 80-ms SOA in the dual-task condition. It is worth noting that this finding does not contrast with the perceptual load theory. Crucially, in this paradigm, we did not explicitly manipulate the participants’ perceptual load (for instance, by having different rates of presentation of the central stream). In other words, we cannot establish whether our central RSVP task, despite the fact that it was perceptually demanding, entailed a high or a low perceptual load. In any case, our results show that the central RSVP task used in our study simply resulted in a quicker disengagement of attention from the cued location. This was a consequence of the ongoing central stream presentation; the task did not necessarily eliminate attentional capture by consuming the perceptual/attentional resources needed to filter out task-irrelevant stimuli (i.e., our peripheral cues).

However, alternative explanations should also be considered to account for the abolishment of attentional capture in the dual-task conditions of the present study. For instance, Belopolsky, Zwaan, Theeuwes, and Kramer (2007) reported a study in which they manipulated the size of the so-called attentional window. In particular, they asked their participants to start the search for a target only when they detected either a global signal (i.e., a shape consisting of the combination of all the stimuli in the display; diffuse attention) or a local signal (i.e., the shape of the fixation point; focused attention). Their manipulation of the size of the attentional window proved to be effective: Belopolsky and colleagues found that increasing the size of the attentional window caused the observers to frequently orient to an irrelevant color singleton (see also Proulx & Egeth, 2006; Theeuwes, 2004). A similar rationale might be used to account for our main finding: Given that in the great majority of trials in the dual-task condition, our participants performed a focused-attention task (i.e., central digit detection on two thirds of the trials), a small attentional window might have resulted in a reduced capability of peripheral abrupt onsets to capture spatial attention in a bottom-up manner. However, this notion seems to be challenged by the evidence that at the 80-ms SOA, we also found an exogenous orienting effect in the dual-task condition.

Overall, the present findings show a clear interplay between bottom-up and top-down attentional control, which agrees very well with previous research. For instance, Gibson and Kelsey (1998) reported a study in which they showed a contingency (in line with the notion of contingent capture; Folk, Remington, & Johnston, 1992; see also Folk, Ester, & Troemel, 2009) between display-wide visual features (i.e., particular features that signal the appearance of task-relevant targets displayed as a whole) and the features that captured attention. Crucially, Gibson and Kelsey showed that onset distractors captured attention when the target display was signaled by an onset. Similarly, our attentional capture effect when performing a perceptually demanding task might derive from the fact that our participants had an attentional control setting for onsets. In fact, both the RSVP task and the spatial-cuing task involve onset targets. Our findings corroborate and further extend these previous results (Folk, Leber, & Egeth, 2002; Gibson & Kelsey, 1998) in the domain of temporal processing (see also Folk et al., 2009, on this point). They demonstrate that contingent capture for onsets facilitates peripheral capture until a central event (i.e., a new RSVP letter intervening before the spatial target) redirects participants’ attention toward the main RSVP task at the central location.

Taken together, these findings confirm the notion that although focused attention might be a necessary condition for preventing attentional capture by peripheral exogenous cues, it might not be sufficient (Folk et al., 2002; Gibson & Kelsey, 1998). In fact, on the basis of the present finding, we must assume that attentional capture by abrupt peripheral onsets (i.e., our exogenous cues) occurs whenever an abrupt onset (such as our exogenous cue) is presented, irrespective of the magnitude of the orienting effect measured in response to the following spatial target (i.e., at the 190- and 750-ms SOAs, in the present study). Consistent with Theeuwes et al. (2000), these findings show that task-irrelevant abrupt onsets cannot be entirely overridden by top-down attentional control (see also Kim & Cave, 1999, for other consistent results). Abrupt peripheral onsets capture attentional resources despite a central ongoing perceptually demanding task, thus supporting the idea of a selection model guided by stimulus-driven factors at early levels of processing.

To conclude, this study examined the time course of spatial-cuing effects after attentional capture when performing a concurrent perceptually demanding task (i.e., monitoring a central RSVP letter stream). Our results (contrary to several recent claims; see Ho et al., 2009; Santangelo, Finoia, et al., 2008; Santangelo, Ho, et al., 2008; Santangelo et al., 2007; Santangelo & Spence, 2007a, 2007b) demonstrate that abrupt peripheral onsets are able to capture participants’ spatial attention under conditions that are perceptually demanding, although their effectiveness seems to dissipate rapidly (i.e., as soon as changing letters in the RSVP stream draw attention back to the central location). Therefore, these results clearly show that the time course of exogenous orienting is a key factor determining whether attentional capture will be observed under conditions that are concurrently perceptually demanding.