Visual masking, as both tool and object of study, is a major component of visual cognitive neuroscience. It provides a principal method of studying the microgenesis of conscious visual perception (Bachmann, 2006; Breitmeyer & Öğmen, 2006). Object substitution masking (OSM) is a surprising new form of visual masking. First reported by Enns and Di Lollo (1997; Di Lollo, Enns, & Rensink, 2000), it has prompted interest both as a means for studying the relationship between awareness and brain networks thought to be involved in its generation (Boehler, Schonfeld, Heinze, & Hopf, 2008; Bouvier & Treisman, 2010; Bridgeman, 2006; Carlson, Rauschenberger, & Verstraten, 2007; Dux, Visser, Goodhew, & Lipp, 2010; Koivisto, 2012; Prime, Pluchino, Eimer, Dell’Acqua, & Jolicoeur, 2011; Woodman & Luck, 2003) and as a phenomenon to be explained in its own right (Bridgeman, 2007; Chen & Treisman, 2009; Di Lollo et al., 2000; Francis & Cho, 2007; Francis & Hermens, 2002; Goodhew, Dux, Lipp, & Visser, 2012; Koivisto & Silvanto, 2011; Lleras & Moore, 2003; Reiss & Hoffman, 2006; see Goodhew, Pratt, Dux, & Ferber, 2013, for a recent review). In its simplest form, an array of items, such as Landolt Cs or digits, is briefly presented; the target item is indicated by four surrounding and simultaneously onsetting dots that offset either simultaneously with it (control condition) or a few hundred milliseconds later (trailing mask condition). The task might typically require participants to either discriminate the target (e.g., the orientation of a Landolt C) or identify it (e.g., digit identity). Performance is typically found to be maximal in the control condition and drops off with increasing mask duration. Thus, OSM is measured as the decline in performance as the duration of the trailing mask increases.

Di Lollo et al. (2000) proposed a theoretical framework to explain OSM, drawing on the notion of bidirectional communication between hierarchically organized brain areas (Felleman & Van Essen, 1991). Onset of the target display activates low-level cells that code only simple stimulus attributes and precise location information. A feed-forward sweep communicates this information to higher (extrastriate) visual areas, which generate one or more perceptual hypotheses as to what the stimulus may be. The higher level cells have large receptive fields, and the resultant hypotheses have poor spatial resolution. To resolve potential ambiguities in figural synthesis and stimulus location, hypothesis information is sent back to low-level areas via reentrant projections, where a matching process occurs. If the unchanged display or its icon (target plus mask) persist throughout the duration of the reentrant loop, one hypothesis from the extrastriate areas will match the current activity in lower visual areas, the system will lock onto this interpretation, and a stable percept will be achieved. If, however, the display changes to mask alone during the iterative loop, a mismatch is created between the reentrant information and the current visual input, and a new cycle of processing begins based only on the current sensory input activating lower level neurons. This second cycle of recurrent processing is likely to lead to perception of the mask alone or to perception of the mask with a partially and indistinctly seen target.

According to Di Lollo, Enns, and colleagues (Di Lollo et al., 2000; Enns, 2004; Enns & Di Lollo, 1997, 2000; Lleras & Moore, 2003; Moore & Lleras, 2005), a signature feature of OSM is its modulation by spatial attention. OSM is claimed to occur only when attention cannot be rapidly focused, or prefocused, upon the target location. A central assumption in support of the role of attention is that mask duration (after target offset) interacts with the set size of the search array, the effect of each being stronger as the other increases (e.g., Di Lollo et al., 2000; Enns, 2004; Enns & Di Lollo, 1997; Goodhew et al., 2012; Tata, 2002). According to Di Lollo et al., this is because set size effectively determines the speed with which attention can be focused on the target within the search array. A target presented in isolation, for instance, can enjoy recurrent processing free of distractor interference. All available attentional resources can be directed to target processing, thereby ameliorating the effects of the trailing mask.

Argyropoulos, Gellatly, Pilling, and Carter (2013) reviewed the literature on set size and masking and commented that although an interaction between set size and mask duration is promoted as the hallmark of OSM (Di Lollo et al., 2000; Goodhew et al., 2011, 2012; Kotsoni et al., 2007), the actual evidence for it was surprisingly sparse. On the basis of their own experiments, Argyropoulos et al. (2013) showed that while set size and mask duration independently influence the perceptibility of a target in OSM displays, the two factors do not interact. It was argued that the multiplicative relationship previously reported by Di Lollo and colleagues may have been artifactual. In forced choice tasks, such as discriminating the orientation of a Landolt C, it resulted from ceiling and/or floor effects present in the data, which constrained the measurable range of performance. When these limiting effects were avoided, Argyropoulos et al. (2013) showed that no interaction ensued. Further evidence of set size influencing OSM had been reported by Di Lollo et al. in the context of a present/absent task. Argyropoulos et al. (2013) showed that when performance on target-absent trials was held sufficiently below ceiling such that a meaningful guessing correction could be applied to the target-present data, the apparent interaction in those data was eliminated. As before, both set size and mask duration strongly affected performance, but they did so only independently (see also Jannati, Spalek, & Di Lollo, 2013, for further confirmatory evidence of the independence of set size and mask duration in OSM).

The lack of interaction between set size and mask duration found by Argyropoulos et al. (2013) is consistent with spatial attention having no effect on OSM. However, it is not definitive evidence. In the first instance, it must be noted that even when a target was presented alone by Argyropoulos et al. (2013), there was spatial uncertainty as to its location in each array. This means that attention would have to be distributed across multiple locations prior to target onset, a point made by Goodhew and colleagues about the Argyropoulos et al. (2013) findings in their recent review of the OSM literature (Goodhew et al., 2013). Furthermore, although set size manipulations are generally considered to be a proxy for manipulations of the spatial distribution of attention, this assumed relationship is certainly not without challenge. It has been suggested that set size effects are often attributable to other factors that change when item number is increased, such as probable stimulus location, distractor proximity, and the amount of visual information. Set size effects have been argued to be a consequence of low-level factors such as changes in stimulus eccentricity (Carrasco, Evert, Chang, & Katz, 1995; cf. Wolfe, O’Neill, & Bennett, 1998), increased lateral inhibition (Pöder, 2004), or other visual processes such as crowding (e.g., Palmer, 1994; Pelli & Tillman, 2008), priming preparation (Zelinsky, 1999), or increases in the level of visual noise (Magyar, Van den Berg, & Ma, 2012).

We will be reporting a set of experiments in which we unpack the effect of set size from other potential confounding factors in a separate forthcoming paper (Argyropoulos, Gellatly, Pilling, & Carter, 2013). However, for now, it is important to note that the relationship between set size and attention is rather less straightforward than might initially have been thought. It can therefore be stated that the absence of set size effects does not, by itself, rule out a role for attention in OSM in the manner described by Di Lollo et al. (2000).

More direct evidence in support of the role of attention in OSM has been presented from studies that manipulate attention through the introduction of a spatial cue prior to the onset of the target. In such work, it has been claimed that OSM is substantially weakened or even abolished entirely when a cue allows attention to be prefocused on the target (e.g., Di Lollo et al., 2000; Enns, 2004; Luiga & Bachmann, 2007; Germeys, Pomianowska, De Graef, Zaenen, & Verfaillie, 2010; Tata, 2002). Argyropoulos et al. (2013) also reviewed the literature on the effect of spatial precuing on OSM. It was noted that in all cases, these earlier studies were either subject to ceiling artifacts or open to alternative interpretations. For instance, in Di Lollo et al. (2000, Experiment 6), target location was precued by having a four-dot mask (FDM) onset prior to the target array, which served both to indicate the target and to mask it. Di Lollo et al. found, for all set sizes tested, that performance improved as precue duration increased, with the two factors interacting. However, they did not vary mask duration; therefore, their finding is only that cuing enhances performance with a particular mask duration, not that OSM is reduced by precued attention. Furthermore, the near-ceiling-level accuracy they observed for set size one for precue durations beyond zero means that their interaction was most likely artifactual, an argument that applies also to the study by Germeys et al. (2010). Moreover, as Argyropoulos et al. (2013) noted, the introduction of an asynchrony between target and mask onsets is known to reduce OSM even where the asynchrony is uninformative as a spatial cue (Gellatly, Pilling, Carter, & Guest, 2010; Guest, Gellatly, & Pilling, 2011a, b; Lim & Chua, 2008; Neill, Hutchinson, & Graves, 2002; Tata & Giaschi, 2004). Indeed, this temporal asynchrony effect seems to occur because it prompts the visual system to individuate the target as a separate object independent from the mask, a finding consistent with the object updating hypothesis account of OSM (Lleras & Moore, 2003; Moore & Lleras, 2005).

Thus, current evidence about the role of spatial cuing in OSM is not conclusive and requires further examination. In the present article, we report five experiments that examine the effect upon OSM of direct manipulations of spatial attention using different kinds of precue and different sorts of search display. There are several important reasons for conducting such a study. First, there is a clear theoretical basis for it. The claim that attention exerts a modulatory influence on OSM is central to the recurrent processing theory of Di Lollo et al. (2000), which is implemented in a computer simulation and has also been said (Oriet & Enns, 2010) to be crucial to the updating account of OSM. If no such modulatory influence is revealed by our data, this will indicate a need for theoretical revision. Second, there is the demand of empirical consistency. As was just noted, the findings of Argyropoulos et al. (2013) are based on implicit manipulation of attention and so provide only indirect evidence that it does not modulate OSM. It is important to check whether evidence from an explicit manipulation of attentional focus will yield findings consistent with the indirect evidence. Third, the assumption of a central role for attention has informed the interpretation not only of much behavioral data, but also of most, if not all, brain imaging studies of OSM to date. Should that assumption be brought into question, the brain imaging results will, in many cases, require reinterpretation.

The experiments reported here were approved by the ethics committee of Oxford Brookes University (OBU). All participants reported normal or corrected-to-normal visual acuity; they gave informed consent and had been prewarned that they should not take part if they had a medical history of epilepsy or visual migraine. Testing sessions lasted approximately 40 min.

Experiment 1

Previous studies of precuing in OSM cued the target location on every trial and compared the effect of varying the relative onset times of cue and target object (precuing, simultaneous cuing, or postcuing). In Experiment 1, by contrast, cue validity was manipulated (Posner, 1980). Performance was compared when the target location was precued (i.e., a valid cue) and when a distractor location was precued (an invalid cue). Each item in the search display appeared inside an outline box, one of which “flashed” briefly prior to onset of the display (see Fig. 1). Because having attention directed away from the target location is supposedly a prerequisite for obtaining OSM, it was expected that masking would be obtained for invalid cue trials, with performance lower when the FDM trailed target offset than when it offset simultaneously with the target. Of interest was whether OSM would be reduced, or even eliminated, for valid cue trials.

Fig. 1

Stimulus sequence in Experiment 1 showing valid and invalid trials

Pilot work suggested that if target and distractor items were drawn from the same conceptual category (e.g., all were digits), there was a tendency on invalid cue trials for observers to report the item that had been cued, rather than the item inside the FDM. To obviate this confound, distractors in Experiment 1 were all Xs, whereas the target was one of the digits from 0 to 9. Participants were instructed simply to report the lone digit in the display; thus, their attention was explicitly drawn neither to the flashing box nor to the FDM in which the target digit appeared.

Method

Design and participants

The experiment had a within-participants design with two factors: cue validity with two levels (valid, invalid), and trailing mask duration with three levels (0 ms [simultaneous offset of dots and display items], 60 ms, 180 ms). Twenty individuals from Oxford Brookes University (15 female) participated in part-fulfillment of a course requirement.

Stimuli and apparatus

In this and the four following experiments, the stimuli were presented on a 20-in. Sony Trinitron CRT running at 100 Hz, viewed from 113 cm in a dimly lit and sound-attenuated room. All stimuli were black (0.3 cd/m²) on a white background (97 cd/m²), apart from the outline boxes of Experiment 1, which were light gray (72 cd/m²). Search display items for Experiments 1, 2, 4, and 5 were presented in Arial font. Experiments 1, 2, 3, and 5 were controlled by software routines written in BlitzMax (V. 1.44; Sibley, 2011).
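At a 100-Hz refresh rate each video frame lasts 10 ms, so the stimulus durations used in these experiments (e.g., a 40-ms search display and trailing masks of 0, 60, or 180 ms) correspond to whole numbers of frames. The following is a minimal Python sketch of that conversion. The function name and the list of durations are illustrative assumptions; the BlitzMax routines actually used are not described at this level of detail.

```python
REFRESH_HZ = 100  # refresh rate reported for the CRT


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Convert a duration in ms to a whole number of video frames.

    Raises ValueError when the duration is not a whole multiple of the
    frame period, because such a duration cannot be displayed exactly.
    """
    frames = duration_ms * refresh_hz / 1000
    if abs(frames - round(frames)) > 1e-9:
        raise ValueError(
            f"{duration_ms} ms is not a whole number of frames at {refresh_hz} Hz"
        )
    return int(round(frames))


# Durations (ms) that appear in the procedures; grouped here only for illustration.
DURATIONS_MS = [40, 50, 60, 150, 180, 350, 900]
frame_counts = {d: ms_to_frames(d) for d in DURATIONS_MS}
```

Every duration in the list is a multiple of the 10-ms frame period, so none of them triggers the error branch.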

Procedure

The trial sequence is shown in Fig. 1. Trials began with a 900-ms blank screen, followed by the fixation cross surrounded by four outline square boxes (subtending an angle of 3.65°) at the four cardinal positions for 350 ms. One gray box then briefly became black for 50 ms, followed by a 50-ms cue–target interval (CTI). The search display then appeared for 40 ms, followed by the dot mask alone for 0, 60, or 180 ms, then the fixation display until response. The search display consisted of Xs in three boxes and a digit (0–9) surrounded by four dots in the fourth box. The distance from the center of fixation to that of display items was 4.82°; the items were 0.51° in height; the four dots were 0.06° in diameter and created a notional square of side 1.01° (see Fig. 1 for displays and trial sequence). The task was to report the identity of the digit by pressing the corresponding key on a standard keyboard. Error feedback in the form of an auditory tone followed each response. Responses initiated the next trial. For each mask duration, there were 40 trials with a valid cue and 120 trials with an invalid cue. This gave a cue validity of 25 % across the 480 randomly ordered trials. For all experiments, the experimental session was preceded by verbal instructions, demonstration trials with slowed display sequences, and 40 practice trials.
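The trial counts above can be checked with a short sketch that builds a trial list of the described shape: for each of the three mask durations, 40 valid and 120 invalid trials, shuffled into a single random order. This is an illustration in Python, not the authors' BlitzMax code. The location labels, the uniform random choice of target location, digit, and invalidly cued box, and the function name are all assumptions.

```python
import random

MASK_DURATIONS_MS = (0, 60, 180)
N_VALID_PER_MASK = 40
N_INVALID_PER_MASK = 120
LOCATIONS = ("up", "right", "down", "left")  # hypothetical labels for the four boxes


def build_trials(seed=None):
    """Return a shuffled list of trial dictionaries for one session."""
    rng = random.Random(seed)
    trials = []
    for mask_ms in MASK_DURATIONS_MS:
        for validity, count in (("valid", N_VALID_PER_MASK),
                                ("invalid", N_INVALID_PER_MASK)):
            for _ in range(count):
                target_loc = rng.choice(LOCATIONS)  # assumed: sampled at random
                if validity == "valid":
                    cue_loc = target_loc
                else:
                    # Assumed: the invalid cue falls on one of the other three boxes.
                    cue_loc = rng.choice([loc for loc in LOCATIONS if loc != target_loc])
                trials.append({
                    "mask_ms": mask_ms,
                    "validity": validity,
                    "target_loc": target_loc,
                    "cue_loc": cue_loc,
                    "digit": rng.randrange(10),  # target digit, 0 to 9
                })
    rng.shuffle(trials)
    return trials


trials = build_trials(seed=1)
cue_validity = sum(t["validity"] == "valid" for t in trials) / len(trials)
```

With four possible locations, a cue validity of 25 % means that the cued box is no more likely than any other box to contain the target.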

Results and discussion

Mean scores are shown in Fig. 2. No participant scored higher than 83 % or lower than 37 % in any condition, showing that the data are free of ceiling and floor effects (chance = 10 %). Performance decreased with trailing mask duration and was lower for invalidly cued than for validly cued trials. A two-way repeated measures ANOVA gave significant main effects for validity, F(1, 19) = 9.99, MSE = 33.10, p < .01, partial η² = .345, and mask duration, F(2, 38) = 27.52, MSE = 28.31, p < .001, partial η² = .592. Critically, and in contrast to various previous reports of the effect of spatial attention on OSM, there was no interaction of the factors, F(2, 38) = 0.45, MSE = 26.60, p = .641. Post hoc analyses (Tukey’s LSD) indicated that performance with a 180-ms mask duration differed from that in both the 60-ms (p < .01) and 0-ms (p < .001) conditions, which also differed from each other (p < .001).
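The reported effect sizes can be recovered from the F ratios and their degrees of freedom, because partial η² = SS_effect / (SS_effect + SS_error) = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this identity to the two main effects of Experiment 1; the function and variable names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects reported for Experiment 1.
eta_validity = partial_eta_squared(9.99, 1, 19)
eta_mask_duration = partial_eta_squared(27.52, 2, 38)
```

Rounded to three decimal places, the two values agree with the reported .345 and .592.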

Fig. 2

Mean accuracy in Experiment 1 for valid and invalid trials with a trailing mask duration of 0, 60, or 180 ms

The results of Experiment 1 challenge the view that spatial attention modulates OSM. When attention was precued to the target location, performance was better than when it was invalidly precued to a distractor, indicating that the precuing procedure was sufficient to affect performance on the task. Yet there was no interaction with mask duration, showing that the modulatory effect of attention was equivalent for all mask durations. Directing spatial attention toward or away from the target affected overall performance, but it did not modulate OSM. This result conflicts with results from a number of previous studies that manipulated spatial attention to the target. Why should this be so? Aside from the possibility that attention really does not modulate OSM, could our failure to find an interaction have resulted from something to do with the cuing procedure we used and the comparison we made between valid and invalid cue trials? Other investigators only ever cued the target location, either before or simultaneously with target onset. Although there seems to be no reason in principle why this method of precuing should reveal effects of attention not obtainable with the valid/invalid procedure, it is true that the cue validity effect in Experiment 1 was quite small, producing an average difference in accuracy of less than 4 %. Since cue validity was only 25 %, it is possible observers may have tried (with only partial success) to ignore the cue. Perhaps the attentional cuing effect was simply not strong enough to interact with mask duration. We therefore tested for this possibility by conducting Experiment 2, which used similar stimuli to Experiment 1 but a different precuing technique.

Experiment 2

Although Experiment 1 yielded clear-cut results, the cuing effect, although highly significant, was relatively small. This is not surprising given that some investigators have claimed that Posner cuing does not affect target discrimination tasks (e.g., Prinzmetal, Park, & Garrett, 2005), although others have reported a robust cue validity effect for such tasks (e.g., Liu, Pestilli, & Carrasco, 2005). We, therefore, conducted a second experiment in which attention was manipulated by cuing the target location either simultaneously with or shortly before target onset. There is a large literature reporting strong effects of this kind of cuing procedure (Colegate, Hoffman, & Eriksen, 1973; Eriksen & Hoffman, 1972, 1973). The cue was a line from fixation to the target location, and its onset preceded target onset by 0 or 150 ms. The line was both an endogenous cue to the target location and, due to illusory line motion away from fixation (Hikosaka, Miyauchi, & Shimojo, 1993), an exogenous cue.

In many studies of OSM, an FDM serves both to cue the target and to mask it. It is useful to separate these functions in the present context because we want to study the effect of the line cue as its onset time varies in relation to target onset without the complication of having the target also cued by an FDM that “popped out” of the display. We, therefore, presented all search display items inside FDMs. This then raised the question of whether, on trailing mask trials, distractor FDMs should disappear along with the search display items or stay on for the same duration as did the target’s FDM. In other words, should distractors, as well as targets, be masked, or only the latter? Given that Argyropoulos et al. (2013) found that set size has no influence on the extent of OSM—OSM was the same with 15 distractors or none—there are a priori grounds for supposing that it should make no difference whether or not distractors are masked. To test whether or not this would prove to be the case, we ran separate blocks of trials in which FDMs surrounding distractors either always disappeared along with the search display or else always had the same duration as the FDM surrounding the target. The order of these trial blocks was counterbalanced across observers.

Method

Design and participants

The experiment had a mixed design with four factors. The between-participants factor was the order in which trials with or without distractor masking were presented (masked/unmasked, unmasked/masked). The three within-participants factors were distractor masking with two levels (distractors masked/unmasked), cue–target onset asynchrony (CTOA) with two levels (0 ms, 150 ms), and trailing mask duration with two levels (0 ms, 180 ms). Twenty-four participants (14 female) as previously described, but also including authors A.G., M.P., and Y.A., performed the experiment.

Stimuli

The search display contained one digit and seven Xs, the center of each item positioned on a virtual circle of radius 4.82°. Each of these stimuli was surrounded by four dots (see Fig. 3). Digits, Xs, and the four dots were of the same dimensions as in Experiment 1. The straight line radial cue (3.20°) was 1 pixel in width. Target and cue locations were randomized within the constraints of the experimental design.
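For readers who wish to reproduce the display geometry, visual angles can be converted to on-screen distances from the 113-cm viewing distance reported under Stimuli and apparatus for Experiment 1. The Python sketch below is a hedged illustration: it assumes a flat screen perpendicular to the line of sight, treats the radial cue as if it were centered on the line of sight (a small-angle approximation), and uses function names invented here. Screen resolution is not reported, so no conversion to pixels is attempted.

```python
import math

VIEWING_DISTANCE_CM = 113.0  # viewing distance reported for all experiments


def offset_cm(eccentricity_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Distance on the screen from fixation to a point at the given eccentricity."""
    return distance_cm * math.tan(math.radians(eccentricity_deg))


def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen extent of a stimulus that subtends angle_deg, centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


radius_cm = offset_cm(4.82)      # radius of the virtual circle of display items
cue_length_cm = extent_cm(3.20)  # approximate length of the radial line cue
```

Under these assumptions the virtual circle has a radius of roughly 9.5 cm and the line cue is roughly 6.3 cm long.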

Fig. 3

Stimulus sequence in Experiment 2 for trials on which the distractors were masked and ones on which the distractors were unmasked (i.e., a trailing mask is present only at the target location)

Procedure

Each trial began with a 900-ms blank screen, followed by a fixation cross for 350 ms. On nonzero CTOA trials, the cue then appeared for 150 ms before the search display was added; on zero CTOA trials, the cue and search display appeared simultaneously. The search display was presented for 40 ms and was followed by the dot masks for 0 or 180 ms, followed by the fixation display again until response (see Fig. 3). Aural error feedback followed each response, which initiated the next trial. There were two blocks of 160 trials. Half of the participants began with a block of trials with unmasked distractors, followed by a block with masked distractors; the other half completed the blocks in the opposite order. Trials within each block were in random order, 40 for each combination of mask duration and CTOA.

Results and discussion

Mean scores are shown in Fig. 4. No participants scored higher than 88 % or lower than 25 % in any condition. Performance was higher with a precue (150-ms CTOA) than without (0-ms CTOA) and decreased as mask duration increased, but there was little difference between conditions with masked and unmasked distractors. The data were analyzed using a three-way repeated measures ANOVA. There was no main effect of distractor masking condition (masked, unmasked), F = 0.26, p = .614, but there were significant main effects of CTOA, F(1, 23) = 185.97, MSE = 54.50, p < .001, partial η² = .89, and trailing mask duration, F(1, 23) = 123.60, MSE = 32.26, p < .001, partial η² = .843. Critically, as in Experiment 1, there was no hint of a CTOA × mask duration interaction, F(1, 23) = 1.11, MSE = 42.45, p = .30. There was, however, a significant distractor masking condition × mask duration interaction, F(1, 23) = 6.31, p < .05, partial η² = .215. This reflected the fact that more masking resulted when the target alone had a trailing mask, as compared with trials when distractors also had a trailing mask. No other interactions were significant (maximum F = 2.24, p = .148).

Fig. 4

Mean accuracy in Experiment 2 for trials when the target is spatially precued (150-ms cue–target onset asynchrony [CTOA]) or not precued (0-ms CTOA) when the distractors are masked or unmasked. On masked trials, the mask(s) trailed for 180 ms

The interaction between distractor masking condition and trailing mask shows that OSM was proportionally weaker when the trailing mask was given at all locations than when given only at the target location. This could point to an influence of distractor processing on masking; it is possible that perceptual interference from the Xs present when only the target was masked was reduced when the Xs themselves were being masked, leading to less OSM. Another possibility, one consistent with the finding of Argyropoulos et al. (2013), is that the effect of having trailing masks at distractor locations has nothing to do with distractor interference, but with the salience of the trailing mask stimuli. Tata and Giaschi (2004, Experiment 1) similarly report reduced OSM when, in addition to the target, distractor items were also surrounded by a trailing mask. They argued, on the basis of that and other findings, that the presence of a single trailing mask was more likely to capture attention from the target toward the mask in an involuntary manner than when several trailing masks were present at several display locations. The same process could explain the effect of distractor mask condition on OSM observed in our task.

However, the focus of our article is not to explore attentional capture by the trailing mask object but whether and how spatial attention, manipulated by spatial precuing, influences OSM. Experiment 2, like Experiment 1, indicated no significant effect of precuing on masking. Thus, the results of Experiment 2 suggest that the lack of an interaction between cue validity and mask duration in Experiment 1 did not result from use of the valid/invalid cuing procedure, rather than a precue/simultaneous-cue procedure. As with Experiment 1, the data of Experiment 2 militate against the idea that spatial attention modulates OSM. When attention was precued to the target location, participants were much better able to report the identity of the digit target. But this was equally the case for both mask durations. Once again, spatial attention affected performance but not OSM.

It is usual in studies of OSM for targets and distractors to be drawn from a single conceptual category, the members of which share the same limited set of physical features. We elected to break with this tradition (digit target, X distractors) for the reason given in the introduction to Experiment 1. However, it could be argued that this decision was influential in determining the pattern of results obtained in both our first two experiments. In a series of papers, Ghorashi and colleagues (Ghorashi, Enns, Klein, & Di Lollo, 2010; Ghorashi, Enns, Spalek, & Di Lollo, 2009a; Ghorashi, Spalek, Enns, & Di Lollo, 2009b) have argued that in rapid serial visual presentation tasks, uncertainty as to target location and uncertainty as to target identity are resolved by different forms of attention, the where and what systems. A similar proposal was made by Argyropoulos et al. (2013) to explain why Gellatly et al. (2006) found that pop-out on a task-irrelevant dimension does not reduce OSM, whereas pop-out on a task-relevant dimension does. The former could be said to engage spatial (where) but not identity (what) attention, with the latter engaging the identity system and, possibly, also the spatial system. There may be circumstances in which the two may share resources or else operate independently of one another (Ghorashi et al., 2009; Visser, 2011). It is possible that in Experiments 1 and 2, target and distractors were sufficiently distinct that target identification was achieved very easily, without calling upon resources that would have to have been shared with spatial attention.

One way in which this might have happened is if distractor suppression based on physical features played a greater role in Experiments 1 and 2 than is usual in OSM studies. Potentially, observers could have inhibited processing of diagonals at a featural level, thus freeing resources for processing of the only search display item not (usually) containing any—that is, the target. Since OSM seems to occur after the processing of physical features (Binsted, Brownell, Vorontsova, Heath, & Saucier, 2007; Woodman & Luck, 2003) but before processing at a category level (Chen & Treisman, 2009), this could be consistent with the absence of any effect of distractor masking. According to Di Lollo et al. (2000), focused attention reduces OSM by reducing distractor interference with target processing. But if, in Experiments 1 and 2, distractor processing was already being suppressed, due to a top-down strategy adopted by participants, then there may no longer have been a means by which attentional focus could modulate the extent of OSM. Therefore, in order to test the generality of our finding of no interaction between attentional cuing and mask duration, we next conducted an experiment using the kinds of cuing and target–distractor relations more typically employed in OSM studies. We also included extra levels of the precue and mask duration factors, similar to Experiment 1, in case this would increase the probability of finding some evidence of an interaction.

Experiment 3

Targets and distractors in Experiment 3 were all digits. As in Experiment 2, we wanted to study the effect of the line cue as its onset time varied in relation to target onset without the complication of having the target also cued by an FDM that “popped out” of the display. We, therefore, presented all search display items inside an FDM. Either all FDMs offset with the search display items, or else they all stayed on for a given duration.

Method

Design and participants

The experiment had a within-participants design with two factors: CTOA with three levels (0 ms, 50 ms, 150 ms) and trailing mask duration with three levels (0 ms, 60 ms, and 180 ms). Twenty participants (17 female) as previously described performed the experiment.

Stimuli

The search display contained eight digits in a virtual circle of radius 4.82°, each surrounded by four dots (see Fig. 5). Digits and the four dots were of the same dimensions as in Experiment 1. The straight line radial cue (3.20°) was 1 pixel in width.

Fig. 5

Stimulus sequence in Experiment 3

Procedure

Each trial began with a 900-ms blank screen, followed by a fixation cross for 350 ms. On nonzero CTOA trials, the cue then appeared for 50 or 150 ms before the search display was added; on zero CTOA trials, cue and search display appeared simultaneously. The search display was presented for 40 ms and was followed by the dot masks for 0, 60, or 180 ms, followed by the fixation display again until response (see Fig. 5). Aural error feedback followed each response, which initiated the next trial. There were 360 trials in random order, 40 for each combination of mask duration and CTOA.

Results and discussion

Mean scores are shown in Fig. 6. No participants scored higher than 92 % or lower than 25 % in any condition. Performance decreased with decreasing CTOA and increasing mask duration. A two-way repeated measures ANOVA showed that both main effects were significant [CTOA, F(2, 38) = 43.54, MSE = 81.01, p < .001, partial η² = .696; trailing mask duration, F(2, 38) = 14.96, MSE = 95.33, p < .001, partial η² = .441]. But as in Experiments 1 and 2, there was no hint of an interaction between these two factors, F(4, 76) = 0.77, MSE = 44.15, p = .548. Post hoc analyses (Tukey’s) indicated that overall, the zero precue condition differed from both the 50- and 150-ms conditions (both ps < .001), which also differed from each other (p < .01); the 180-ms mask duration condition differed from both the 60-ms (p < .05) and 0-ms (p < .001) conditions, which also differed from each other (p < .01).
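The degrees of freedom of these F ratios follow directly from the design. In a fully within-participants two-factor design with n participants, an effect with df_effect degrees of freedom is tested against an error term with df_effect × (n − 1) degrees of freedom. The Python sketch below works through this for the 3 × 3 design with 20 participants; it assumes uncorrected (sphericity-assumed) degrees of freedom, and the function name is illustrative.

```python
def within_anova_df(n_participants, levels_a, levels_b):
    """Effect and error degrees of freedom for a two-way repeated measures ANOVA.

    Assumes a fully within-participants design and no sphericity correction.
    """
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b
    n_minus_1 = n_participants - 1
    return {
        "A": (df_a, df_a * n_minus_1),
        "B": (df_b, df_b * n_minus_1),
        "AxB": (df_ab, df_ab * n_minus_1),
    }


# Experiment 3: CTOA (3 levels) x trailing mask duration (3 levels), 20 participants.
exp3_df = within_anova_df(20, 3, 3)
```

The resulting pairs, (2, 38) for each main effect and (4, 76) for the interaction, match the degrees of freedom reported above.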

Fig. 6. Mean accuracy in Experiment 3 with a spatial precue of 0, 50, or 150 ms (cue–target onset asynchrony 0–150) for the three trailing mask durations

As with Experiments 1 and 2, the data of Experiment 3 call into question the idea that spatial attention modulates OSM. When attention was precued to the target location, participants were better able to report the identity of the target. This was equally the case for all mask durations. Once again, spatial attention affected only overall performance and not OSM. This result contradicts those of a number of previous investigators, and we will consider later what reasons there might be for such conflicting results. For the moment, it is worth noting that the data of Experiment 3 might be taken to show that it makes little difference whether distractors differ from the target on simple physical characteristics (X distractors vs. a digit target in Experiment 2) or only in terms of categorical identity (all items are digits, as in Experiment 3). However, it might also be argued that on masking trials in Experiments 2 and 3, there were effectively no distractors, since the distractors were either being masked along with the target (Experiment 3, Experiment 2 masked distractors block) or else inhibited at a featural level (Experiment 2 unmasked distractor block). On this view, distractor interference from unmasked distractors on control trials could have been as great as or greater than interference from masked or inhibited distractors on masking trials. The relatively shallow slopes of the masking (that is, mask duration) functions in Figs. 4 and 6 are consistent with this possibility. Perhaps there was not enough variation in performance levels across mask durations and precue conditions for an interaction to emerge. This seems an unlikely possibility given that the slope of the functions in Experiments 1 and 2 was similar to the slope in Experiment 3 and that, although modest by comparison with some examples of OSM (e.g., Di Lollo et al., 2000; Enns, 2004; Gellatly et al., 2010), these slopes are not out of line with some other reports (e.g., Goodhew et al., 2012; Guest et al., 2011a, b; Reiss & Hoffman, 2006). However, to guard against this possibility, we conducted a fourth experiment, using a task we already knew to produce a larger OSM effect.

Experiment 4

Targets and distractors in Experiment 4 were Landolt squares with a gap in one side (Argyropoulos et al., 2013). Participants reported the side of the gap in the target square by pressing the corresponding arrow key on a standard keyboard. Once again, a radial line cue signaled the target location either simultaneously with target onset or shortly beforehand. Also, as in Experiments 2 and 3, search display items all appeared inside FDMs, all of which either offset with the search display items or else stayed on for a given duration, so that on masking trials not only the target, but also the distractors could have been subject to the masking effect of an FDM. Experiment 4, therefore, closely resembled Experiment 3 in its design.

Method

Design and participants

The experiment had a within-participants design with two factors: CTOA with four levels (0 ms, 50 ms, 100 ms, 150 ms) and trailing mask duration with three levels (0 ms, 60 ms, and 180 ms). Twenty-two participants (17 female), as previously described, took part; the three authors also participated.

Stimuli and apparatus

The stimuli were eight outline squares with lines of 1.5 min arc thickness. Each side of a square was 0.3°, with a 0.1° gap in one side. The thickness of each masking dot was 3 min arc, and the distance between them was 0.5°. The cue line was 1.6° in length and had the same thickness as the dots. The experiment was written in and controlled by MATLAB using the Psychophysics Toolbox [PTB-3] extension (Brainard, 1997; Pelli, 1997).

Procedure

The stimulus sequence was essentially as it was for Experiment 3 (see Fig. 7 for schematic of the sequence). Each trial began with a 1,000-ms blank screen, followed by a 500-ms fixation cross. The target search display, preceded by 0, 50, 100, or 150 ms by the cue line, then appeared for 50 ms. It was followed by a dot mask for 0, 60, or 180 ms. This was followed by a blank display until response, which initiated the next trial. Accuracy feedback was not provided. There were 480 trials in random order, 40 for each combination of mask duration and CTOA.

Fig. 7. Stimulus sequence in Experiment 4

Results and discussion

Mean scores are shown in Fig. 8. Performance decreased markedly with increasing mask duration and was lower for the simultaneous onset cue than for any other CTOA. A two-way repeated measures ANOVA gave significant main effects of mask duration and CTOA, respectively, F(2, 48) = 247.55, MSE = 83.15, p < .001, partial η² = .912, and F(3, 72) = 17.9, MSE = 70.18, p < .001, partial η² = .43. As in Experiments 1–3, there was not a significant interaction between the factors, F(6, 144) = 1.73, MSE = 51.30, p = .118. Although the group means for all conditions are substantially below 100 % and above chance (25 %), there was great variability in the individual data. To ensure that the results were not distorted by ceiling or floor effects, we separately analyzed the data of 11 participants who did not score above 90 % or below 33 % in any condition. This subset showed results very similar to those for the whole group [F(2, 20) = 156.51, MSE = 63.35, p < .001, partial η² = .94, for mask duration; F(3, 30) = 13.50, MSE = 51.94, p < .001, partial η² = .57, for CTOA; and F(6, 60) = 0.64, MSE = 39.08, p = .695, for the interaction]. Post hoc analyses of the complete data set indicated that the zero precue condition differed from all other precue conditions (p < .001), which did not differ from each other (min. p = .135; see Footnote 1), and that performance with a 180-ms mask duration differed from both the 60-ms and 0-ms conditions (both ps < .001), which also differed from each other (p < .001). Consistent with our expectation, Experiment 4 yielded a large OSM effect, as seen in the steep slope of the functions in Fig. 8. But although the effect of mask duration is much greater than in the preceding two experiments and while precuing once again has a highly reliable effect on performance, the two factors still do not interact. Once again, a manipulation of spatial attention did not modulate OSM.

Fig. 8. Mean accuracy in Experiment 4 with a spatial precue of 0, 50, 100, or 150 ms for the three trailing mask durations (0, 60, or 180 ms)

If obtaining a strong OSM effect requires that there is a high level of interference from distractor items, then the use of masked distractors in Experiment 4 should have made it impossible to obtain strong OSM. In fact, however, the OSM effect is actually stronger than in comparable experiments by Argyropoulos et al. (2013), which used identical displays except for the line cue and FDMs surrounding distractors. However, because it still might be argued that masking all display items on masking trials is not usual in OSM experiments, we conducted one further experiment. In the final experiment, we employed only digits as display items, and distractors were never masked.

Experiment 5

Experiment 5 was similar in many respects to Experiment 3. All search display items were digits; two of the same CTOAs and two of the same mask durations were used as in Experiment 3. Each digit was initially surrounded by an FDM, for the reason given in the introduction to Experiment 2, but distractor FDMs always offset with search display items so that distractors were not subject to OSM. A further issue investigated in Experiment 5 was whether distractor homogeneity plays a part in determining the extent of OSM. If distractor interference with target processing is an important factor in OSM, then it might be expected that heterogeneous distractors could lead to greater OSM than homogeneous distractors. The distractors in Experiment 2 were all Xs and so homogeneous, as well as differing from the target digit physically and conceptually. By contrast, distractors in Experiment 3 were heterogeneous digits that did not differ either physically or conceptually from the target. Both these experiments yielded a modest OSM effect. Experiment 4, however, produced a strong OSM effect with distractors that were certainly not conceptually different from the target. Whether they were heterogeneous or homogeneous and whether or not they should be considered to differ physically from the target depends on the emphasis one gives either to their essential squareness or to the varied orientations of the gaps they contained. In other words, comparing results across Experiments 2, 3, and 4 cannot tell us how distractor heterogeneity/homogeneity affects OSM. To investigate this issue, Experiment 5 included a distractor-type factor: Distractors were either random digits (heterogeneous distractors) or all 7s (homogeneous distractors).

Method

Design and participants

The experiment had a within-participants design with three factors: distractor type with two levels (heterogeneous, homogeneous), cue–target onset asynchrony (CTOA) with two levels (0 ms, 150 ms), and trailing mask duration with two levels (0 ms, 180 ms). Twenty-four participants (18 female) as previously described performed the experiment.

Stimuli

Stimulus and display parameters were as described for Experiment 3. The search display contained eight digits, each surrounded by four dots. The dots around distractor items always disappeared with the search items (see Fig. 9).

Fig. 9. Stimulus sequence in Experiment 5 for trials on which the distractors are homogeneous (all 7s) or heterogeneous (random digits)

Procedure

The trial sequence was as described for Experiment 3. There were 320 trials in random order, 40 for each combination of mask duration, CTOA, and distractor type (i.e., heterogeneous and homogeneous distractors were not blocked).

Results and discussion

Mean scores are shown in Fig. 10. No participant scored higher than 95 % correct or lower than 17 % correct in any condition. Performance was better with a precue than with a simultaneous cue and decreased with a trailing mask. The effect of the trailing mask was somewhat less for the homogeneous distractors than for the heterogeneous distractors. A three-way repeated measures ANOVA showed that all three main effects were significant [distractor type, F(1, 23) = 7.47, MSE = 41.84, p < .05, partial η² = .245; CTOA, F(1, 23) = 176.4, MSE = 87.86, p < .001, partial η² = .885; mask duration, F(1, 23) = 161.08, MSE = 50.52, p < .001, partial η² = .875]. Significant two-way interactions were found for distractor type × mask duration, F(1, 23) = 17.34, MSE = 27.04, p < .001, partial η² = .43, and CTOA × mask duration, F(1, 23) = 4.82, MSE = 67.51, p < .05, partial η² = .173. No other interactions approached significance (maximum F = 0.16, p = .69).

Fig. 10. Mean accuracy in Experiment 5 with a spatial precue of 0 or 150 ms, for trials on which distractors were homogeneous or heterogeneous for masked (180-ms trailing mask duration) and unmasked (0-ms trailing mask duration) trials

The analysis was repeated, but with participant accuracy in each condition calculated for trials on which the presented target was not the digit 7. This limited the analysis to trials on which the target “popped out” from (digit 7) distractors on homogeneous trials. In order to maintain parity, the same exclusion criteria were applied for both heterogeneous and homogeneous trials (note that it meant that the number of trials per data point was reduced from 40 to 36). This reanalysis found essentially the same basic pattern of results. All three main effects were significant [distractor type, F(1, 23) = 5.55, MSE = 51.07, p < .05, partial η² = .194; CTOA, F(1, 23) = 139.88, MSE = 104.81, p < .001, partial η² = .859; mask duration, F(1, 23) = 144.66, MSE = 52.81, p < .001, partial η² = .863]. A significant two-way interaction was found for distractor type × mask duration, F(1, 23) = 20.77, MSE = 35.79, p < .001, partial η² = .475. The CTOA × mask duration interaction no longer reached significance, F(1, 23) = 3.84, MSE = 80.91, p = .062, partial η² = .143. No other interaction approached significance (maximum F = 0.59, p = .452).

Unlike in the previous four experiments, we did find an interaction between precuing and masking. Thus, with the stimulus conditions of Experiment 5 (all items digits, trailing masks around only the target), there is a small but statistically significant reduction in masking when the target is precued. This suggests that attention does play some role when the target is presented in competition with other distractors of the same stimulus category (digits) that themselves are not surrounded by trailing masks. However, even then, as can be seen from Fig. 10, the effect on OSM is modest, particularly when taken in contrast with the more prominent main effect of precuing across both unmasked and masked trials.

Interestingly, distractor type may also have some effect on OSM. OSM was greater when distractors were a heterogeneous, rather than a homogeneous, set of digits. This suggests that having the target pop-out from the display reduced its susceptibility to masking. Others have reported similar findings. Di Lollo et al. (2000, Experiment 5) reported that OSM was greatly reduced when the target was a pop-out item among homogeneous distractors—a target circle with a vertical line segment in an array of items that consisted only of circles—although interpretation of their finding is hindered by the presence of ceiling effects and the fact that only data for target-present trials are given. Tata (2002, Experiment 3) also reports, similar to our Experiment 5, that masking was affected by distractor homogeneity: A target Landolt C presented in an array with other distractor Landolt Cs in different orientations resulted in greater masking than when distractors consisted of completed circles. Similarly, Gellatly, Pilling, Cole, and Skarratt (2006) found that a target consisting of a color or orientation singleton in a display was far less susceptible to OSM than a target that was nonunique within a display, although this factor was found to be important only for judgments made for the pop-out dimension. The findings of Gellatly and colleagues suggest that pop-out effects on OSM are fundamentally different from spatial cuing effects of the type identified in Experiment 5 in being specific to judgments on the pop-out dimension. What this suggests is that pop-out serves to increase the salience of the report feature by increasing feature contrast, rather than through drawing attention to the target object in the manner of a spatial cue. Interestingly, our data found no hint of a three-way interaction, suggesting that precuing had a similar reductive effect on masking irrespective of whether or not the target was a pop-out item within the display. 
Thus, although pop-out does modulate OSM, it is more likely to do so via feature salience than via spatial attention, and its effects seem orthogonal to those of spatial cuing. The weakness of the precuing interaction, in comparison with that of feature salience (in terms of relative effect size), seems to further downplay the role of focused attention in OSM.

General discussion

In four of our experiments, using two kinds of cuing and a range of different stimulus items and presentation conditions, we found no evidence that exogenous spatial attention (as manipulated by spatial precuing) modulates OSM. In Experiment 1, a valid but uninformative precue improved reporting of a digit target among distractor Xs, relative to an invalid and uninformative precue, but did not modulate the effect of mask duration. In Experiment 2, precuing of the target location enhanced reporting of a digit target among Xs, relative to a simultaneous cue, but did not modulate the effect of mask duration, whether or not the X distractors were also subject to masking. The same benefit of precuing was found in Experiment 3 for reporting a target digit among masked digit distractors, but there was again no interaction with OSM. Experiment 4 used squares with a gap in one side as display items but was otherwise almost identical to Experiment 3; it yielded essentially identical results, but with a much stronger OSM effect. Only in Experiment 5 was any statistical relationship found between precuing and masking, and even here the interaction was significant in only one of the two ANOVAs we conducted. However, we now need to consider whether our treatment of our data has been appropriate.

While a few authors have reported OSM in terms of values derived from signal detection analysis (e.g., Koivisto, 2012), the vast majority of studies of the topic have calculated the magnitude of OSM from raw accuracy scores. By doing the same here, we have followed the convention in the field. However, Schweickert (1985) has presented a convincing argument that accuracy data should be log-transformed before testing for interactive effects between factors (see Footnote 2). We, therefore, reanalyzed all our experiments in the same manner as previously, but having log-transformed the data. For Experiments 1–4, the log transformation did not change the significance level of any interaction term (they all remained nonsignificant), although in some cases, the F-values did show an increase. For Experiment 5, an ANOVA of the overall log-transformed scores resulted in an increased F-value and significance level for the CTOA × mask duration interaction, F = 15.96, p < .01, partial η² = .41, as compared with the untransformed data. Additionally, and unlike for the untransformed scores, the interaction was also significant when the analysis was repeated on the data from which trials with the digit 7 as target were excluded, F = 13.03, p < .01, partial η² = .362. Significance was not changed for any other interactions in either analysis.
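Why a log transformation can change the outcome of an interaction test may be illustrated with a small numerical sketch. It assumes, purely for illustration, that each of two factors scales the probability of a correct response multiplicatively; the accuracy values and variable names below are hypothetical and are not data from these experiments. Under that assumption, the 2 × 2 interaction contrast is nonzero on raw proportions but vanishes on log proportions.

```python
import math

def interaction_contrast(cells):
    """Interaction contrast (a1b1 - a1b2) - (a2b1 - a2b2) for a 2 x 2 table."""
    (a1b1, a1b2), (a2b1, a2b2) = cells
    return (a1b1 - a1b2) - (a2b1 - a2b2)

# Hypothetical values: baseline accuracy and two multiplicative costs.
base = 0.9          # precued, no trailing mask
cue_factor = 0.8    # assumed cost of a simultaneous cue
mask_factor = 0.7   # assumed cost of a trailing mask

# Rows: precue vs. simultaneous cue; columns: no mask vs. trailing mask.
raw = [
    [base, base * mask_factor],
    [base * cue_factor, base * cue_factor * mask_factor],
]
logged = [[math.log(p) for p in row] for row in raw]

raw_contrast = interaction_contrast(raw)
log_contrast = interaction_contrast(logged)
print(raw_contrast)  # nonzero: an apparent interaction on raw accuracy
print(log_contrast)  # approximately zero: effects are additive on the log scale
```

The sketch shows only that the presence or absence of an interaction depends on the measurement scale; it does not reproduce the reanalysis reported here.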

The small effect and, in other cases, total absence of an effect of precuing cannot be explained by the spatial cue being ineffective. On the contrary, in many of our five experiments, precuing as a factor accounted for a similar, or even greater, proportion of variance as the mask duration itself. Nor is the absence of influence of precuing on masking in some of our experiments (Experiment 1, Experiment 2) explained as a consequence of the target somehow being a “pop-out” stimulus within the array. In Experiment 5, the “pop-out” status of the target was manipulated directly and was shown to influence masking, but independently of spatial cuing.

Thus, precuing can influence OSM, but in many cases, it has no measurable effect. This is contrasted with the reliable and substantial effect precuing has on overall target processing across all our experiments. This result is consistent with recent work looking at how set size affects OSM. Argyropoulos and colleagues (2013) found that although the overall perceptibility of the target was reduced when set size was increased, the amount of OSM was always the same. If distractor number does not influence OSM, it is, perhaps, no surprise that spatial precuing has only a limited effect.

According to the original reentrant account of Di Lollo et al. (2000), focusing attention on the target location reduces the number of recurrent processing iterations needed to bind target features into a unified representation, because interference from distractors (crowding) is minimized relative to when attention is initially diffused over the display (cf. Di Lollo, 2012). Similarly, with small rather than large set sizes, attention supposedly can be more rapidly focused upon the target, so distractor interference is minimized and target identification enhanced (Di Lollo et al., 2000; cf. Argyropoulos et al., 2013). Processing of a brief target may continue after target offset, using a fading memorial representation that, in the trailing mask condition, must compete with information about the presence of mask dots around a blank space. The fewer target-processing iterations completed before offset, the longer the period of this competition. Consequently, the more likely it is that the representation of the mask alone will either replace that of the target plus mask or else be perceived as a transformation of it—substituting for it in visual short-term memory (VSTM; Di Lollo et al., 2000; Lleras & Moore, 2003; Oriet & Enns, 2010; Pilling & Gellatly, 2010). So focused attention is supposed to reduce OSM by speeding up target identification and, thereby, shortening the period of competition between processing of the target plus mask memory trace and processing of information that only the mask is present. Our results can be said to strongly support the first part of this prediction but offer only limited support to the second. Attention clearly increases iterative target processing, shown by increased accuracy of reporting the target, but the probability of substitution seems mostly independent of that process.

Our results are largely comparable with findings reported by Ghorashi and colleagues in the context of the attentional blink (AB) paradigm (Ghorashi et al., 2010; Ghorashi et al., 2009a; Ghorashi et al., 2009b; cf. Visser, 2011). What the authors found was that focusing attention on the target (T2 in the attentional blink) by means of a spatial cue facilitated stimulus identification; however, this occurred in a manner additive with the AB effect itself (as manipulated by the T1–T2 intertarget lag). Our results show a similar pattern on stimulus identification: Performance is improved by precuing almost irrespective of whether or not a target is masked through OSM. Ghorashi and colleagues argued on the basis of their findings that the spatial selection of a stimulus and the identification of the stimulus should be thought of as functionally separate cognitive processes analogous to the functional separation between “what” and “where” associated, respectively, with the anatomically distinct ventral and dorsal visual pathways (Milner, Goodale, & Vingrys, 2006; Mishkin, Ungerleider, & Macko, 1983). One could arguably use the same framework to understand our results. The use of spatial cue information to select the target location can be understood to occur along the dorsal (“where”) pathway, whereas target digit identity is processed along the ventral (“what”) pathway. In such a model, OSM could be conceived to operate within the ventral stream, rendering masking largely insensitive to spatial attentional manipulations, although with overall performance still being affected. What this model does not explicitly formulate, however, is how spatial attention actually operates to modulate processing of a target.
A frequent suggestion is that a spatial precue produces enhancement at the indicated display location amplifying the target signal (e.g., Carrasco, Williams, & Yeshurun, 2002; Posner, 1980), thus improving the quality of the stimulus representation (Carrasco, Penpeci-Talgar, & Eckstein, 2000; Dosher & Lu, 2000). Others argue that spatial attention improves distractor inhibition (Eriksen & Eriksen, 1974; Shiu & Pashler, 1994), reducing the noise in the decision process in which the target is identified (Gould, Wolfgang, & Smith, 2007; Smith & Ratcliff, 2009). We did not specifically seek to tease apart these possibilities; however, our data give some indications. Seen across our experiments, the effect of spatial cuing seemed broadly similar irrespective of whether distractors were confusable with the target (random digits) or different from the target (e.g., Xs) and whether distractors were heterogeneous or homogeneous. This seems to indicate that precuing had a fairly limited role in inhibiting distractors or in reducing decision noise associated with distractors.

Perhaps what determines substitution is less the extent to which the target is processed prior to offset than the probability that, subsequently, the trailing mask, rather than the target memory trace, is attended (see Footnote 3). If attention remains focused on the target (plus mask) memory trace, it consolidates the target into VSTM and/or conscious experience; but if attention switches to the developing representation of the mask alone, it consolidates that into VSTM instead. In our studies, spatial attention may have become focused on the target location by the time of target offset even on invalidly cued trials (Experiment 1) or zero precue trials (Experiments 1 and 2). Hence, the probability of attention being directed to the mask alone would be equal for all trial types and, consequently, so would the likelihood of substitution. On this account, spatial attention is a single mechanism that facilitates recurrent object processing, its effects being determined by which object is selected (target or mask). A rather different account of the role of attention in masking has recently been proposed by Smith, Ellis, Sewell, and Wolfgang (2010). They posit two mechanisms of attention, an early selection mechanism that enhances processing of a selected stimulus and a late selection mechanism that increases the rate of transfer of information about that stimulus to VSTM. Imaging studies of the kind already being used to study OSM (Boehler et al., 2008; Carlson et al., 2007; Prime et al., 2011; Woodman & Luck, 2003) offer the best means of distinguishing between our proposal and that of Smith et al.

Finally, one might ask whether there are conditions in which precuing effects on OSM might be amplified. We think this is a possibility. In all our experiments, target and mask occupied the same position and onset in the same frame. It may be that spatial precuing is more effective in ameliorating OSM in displays in which target and mask are in separate locations (Jiang & Chun, 2001) or when the target precedes the mask in time (e.g., Gellatly et al., 2010). Here, the spatial precue may be more effective at directing attention to the target alone, rather than, as is possibly the case in the present experiments, to a location that contains both the target and mask. If spatial attention can be brought more effectively toward the target, this may facilitate the ease with which the target is individuated from the mask as a separate perceptual object, leading, according to the object updating account of OSM (e.g., Moore & Lleras, 2005), to release from masking. Such questions could be a focus of future research.

Conclusion

OSM, as initially reported by Enns and Di Lollo (1997) and Di Lollo et al. (2000), remains an intriguing and counterintuitive type of visual masking and an important tool for investigating the implementation of reentrant processing in brain networks. There is still much to be learned from it about how the early stages of processing do or do not result in visual consciousness of a target object and its features. In this article, we sharpen understanding of OSM regarding the influence of directed spatial attention. We demonstrate that, counter to previous claims, spatial attention often has no influence on OSM. Where an effect was produced, it was small. This presents a rather different picture from other claims in the literature where it has been suggested that prior attention entirely abolishes the OSM effect (e.g., Di Lollo et al., 2000; Enns, 2004; Neill et al., 2002). Precuing seems to produce, at best, mild attenuation of OSM and, in many cases, has no effect at all.

Our findings are convergent with recent work that, contrary to earlier claims, has indicated that the number of display items does not influence OSM when ceiling and floor effects are avoided (Argyropoulos et al., 2013; Jannati et al., 2013). Given these combined results, one must acknowledge that attention has a far less privileged role in OSM than was claimed in the original description of the phenomenon (Di Lollo et al., 2000; Enns & Di Lollo, 1997) and in much of the OSM literature to date.