Introduction

Our visual environment is complex, dynamic, and densely occupied. To make sense of this environment, our visual system relies on attention to select a subset of input for preferential processing. In keeping with this notion of selective processing, a vast body of research indicates that attention modulates performance on tasks involving the processing of a single stimulus, including – but not limited to – orientation discrimination (e.g., Lee et al., 1997), letter identification (e.g., Luck et al., 1996), and spatial-gap localization (e.g., Yeshurun & Carrasco, 1999).

By and large, the effects of visual attention on these local aspects of visual perception have been well characterized (see Carrasco, 2011, for a review). However, some percepts – such as those of ensemble representations – arise from the pooling of many items. Ensemble stimuli consist of multiple elements that can be perceived as a single set, even when a single item within that set cannot be recognized (Alvarez, 2011; Whitney & Yamanashi Leib, 2018). While a commonly studied ensemble representation is the average of features across space (e.g., mean size), there are also other, pattern-based, ensemble representations (Alvarez, 2011; Alvarez & Oliva, 2009). Examples of these are global motion, which involves the visual system pooling multiple local motion signals so that a coherent motion direction is perceived (e.g., clockwise versus counterclockwise; Newsome & Paré, 1988), and global form (Glass, 1969), where local orientation signals are pooled to create a percept of a coherent spatial pattern (e.g., a concentric arrangement of dots). These stimuli have strong ecological validity: We receive optic flow signals as we move through our environment, and therefore the visual system is constantly pooling motion signals; similarly, the visual system continually pools orientation signals in the service of object recognition (Gibson, 1979).

Ensemble processing is believed to be a means through which the visual system efficiently represents vast arrays of visual information. This has led some authors to make the claim that ensemble processing occurs with minimal, or even no, reliance on attention (e.g., Bronfman et al., 2014; Joo et al., 2009). If it is assumed that ensemble representations are computed in the name of efficiency, then such claims seem plausible, and they are part of a broader argument for ensemble coding being an automatic process (Yildirim et al., 2018). However, evidence is mixed regarding the role of attention in ensemble processing. While some studies have found that limiting attentional resources has no effect on the representation of ensembles (e.g., Bronfman et al., 2014; Joo et al., 2009), others have found that limiting attentional resources either renders ensemble processing impossible, or at least impairs ensemble processing (e.g., Brand et al., 2012; Huang, 2015; Jackson-Nielsen et al., 2017; McNair et al., 2017). Importantly, attention does appear to have a modulatory role (Whitney & Yamanashi Leib, 2018) – for instance, it is known that a broad attentional breadth facilitates mean size estimation relative to a narrow attentional breadth (Chong & Treisman, 2005), and attention to a single item biases the perception of mean size (de Fockert & Marchant, 2008). However, whether attention also modulates the processing of pattern-based ensembles that have greater ecological validity (e.g., global motion) is unclear.

When considering the question of whether attention modulates the processing of more naturalistic ensembles, it is important to delimit this issue by specifying which type of attention is being examined. Part of the reason for the mixed findings regarding whether attention is necessary for ensemble processing may be the plethora of methods that have been used to manipulate attention. These methods include the attentional blink (Joo et al., 2009; McNair et al., 2017); inattentional blindness (Jackson-Nielsen et al., 2017); multiple object tracking of target objects moving amongst distractors (Alvarez & Oliva, 2008, 2009); explicit instructions to attend to one set of items intermixed with another, to-be-ignored, set (Oriet & Brand, 2013); precue versus postcue comparisons, where participants are either forewarned about the aspect of an ensemble that they must report, or are only informed after the presentation of the stimulus (e.g., Huang, 2015); dual-task paradigms in which participants must process an aspect of a cued location, while the ensemble is assumed to be either inside or outside the focus of attention (Bronfman et al., 2014; Preston et al., 2020; Ward et al., 2016); and visual search, where the ensemble information is contained in the distractors (e.g., Chetverikov et al., 2016). This variety of methods may be a product of the general lack of clarity regarding what type of attention authors intend to examine. To allow for conclusions to be drawn about the involvement of attention in ensemble processing, and about the extent to which the studies are consistent with or contradict one another, the need for clarity is paramount.

But what do we mean by a “lack of clarity” regarding the method of attentional manipulation? We believe this lack of clarity manifests in two forms. First, researchers in the field often do not acknowledge that attention is a multifaceted construct, and therefore do not make clear what type of attention they are manipulating (or aiming to manipulate). For instance, in an investigation of mean size encoding, Oriet and Brand (2013) asked participants to attend to only the horizontal lines in an array consisting of both horizontal and vertical lines. The authors claimed to be examining “selective attention,” but given that all forms of attention are necessarily selective, this description fails to specify the nature of attentional allocation entailed by the task requirements (e.g., the deployment of multiple foci of object-based attention, or the allocation of feature-based attention). To take another example, Alvarez and Oliva (2009) examined the effects of “reduced attention” on the processing of an orientation ensemble by having participants perform a change-detection task while they tracked moving objects; however, they did not make it clear what type of attention they were aiming to “reduce” with the requirement to track moving stimuli (e.g., the authors may have been referring to a reduction in working memory resources available to perform the ensemble task, or to a redeployment – rather than a reduction – of spatial attention). By contrast, in the attention and low-level perception literature, different types of attention are clearly distinguished – for instance, exogenous (i.e., stimulus-driven) attention versus endogenous (i.e., goal-driven) attention (e.g., Barbot et al., 2012), and spatial attention versus temporal attention (e.g., Sharp et al., 2018) – and there is a concerted research effort to unearth the differences in their perceptual effects (see Carrasco, 2011, for a review). 
The existence of these differences at lower levels of visual processing leads to the logical possibility that the effects of attention on ensemble processing also depend on the type of attention being examined; therefore, attempting to reconcile the findings of studies that have used different paradigms to examine this relationship may be futile.

Second, for a given method of attentional deployment, the nature of attentional allocation is sometimes ambiguous. For example, one such manipulation involves a dual task in which participants are instructed to prioritize one precued (nonensemble) task, and their performance on the other task (the ensemble under investigation), which can be presented either within the cued location or at the uncued location, is assessed (Bronfman et al., 2014; Ward et al., 2016). This type of procedure likely manipulates the allocation of attention over space, or possibly object-based attention, given that the to-be-attended region is circumscribed by a visual cue.Footnote 1 However, in such a paradigm, it is not clear how participants allocate their attentional resources when asked to prioritize the nonensemble task, since they know that their performance on both tasks will be assessed (Preston et al., 2020). Therefore, even when the type of attention being investigated is reasonably clear, some paradigms may not actually manipulate the intended aspect of attention.

In this study, we examined the effects of misdirecting attention on ensemble processing using a paradigm that clearly manipulates spatial attention: the Posner – or spatial – cueing paradigm (Posner, 1980; Posner et al., 1978). Spatial attention is an important mechanism for prioritizing different regions of the visual field for enhanced processing and is constantly deployed as we interact with our environment, making it an important aspect of attention to study. As a manipulation of spatial attention, the Posner cueing paradigm has been frequently used in the attention and low-level perception literature (Carrasco, 2011), and involves comparing performance in valid trials – where attention is directed to the location of an upcoming target by a visual cue – against performance in invalid trials – where attention has been directed to a nontarget location. In our study, 75% of trials were valid, and 25% of trials were invalid; the high validity of the cue provided a clear incentive for participants to attend to the cued location, which meant that there was little ambiguity regarding where attentional resources were being allocated. Indeed, there has been a recent call for the use of spatial cueing in light of exactly this advantage (Preston et al., 2020).

Yet another benefit of this paradigm is that the “attended” (validly cued) and “unattended” (invalidly cued) regions are widely separated in space, which makes it unlikely that both locations are within the focus of spatial attention. Some previous studies (e.g., Oriet & Brand, 2013) have presented to-be-attended and to-be-ignored sets of stimuli in a spatially intermixed fashion, which – for the purposes of investigating the effects of focal, spatial attention on ensemble processing – is clearly problematic. To take another example, Chen et al. (2020) recently examined the effect of an “unattended” ensemble on another ensemble that had been briefly cued, and found that when the unattended ensemble contained information that matched the target category, processing of the “attended” ensemble was facilitated; conversely, processing of the attended ensemble was impaired when the unattended ensemble contained target-incongruent information. However, the “attended” and “unattended” ensembles were in relatively close spatial proximity, with an interstimulus distance of only 2°, making it unlikely that the unattended ensemble was indeed outside the focus of attention. We believe the cleanest method of examining the effect of attention on ensemble processing is to maximize the spatial separation between “attended” and “unattended” locations while presenting one ensemble stimulus at a time, either at a cued or at an uncued location. Using this simple, yet effective, manipulation, we asked the basic question: Does spatial attention enhance, impair, or have no effect on ensemble processing?

To examine the effect of spatial attention on ensemble processing, we used a global motion stimulus. While most ensemble processing research has focused on the coding of summary features such as mean size (e.g., Chong & Treisman, 2005), the need for researchers to consider more ecologically valid spatial regularities has been highlighted (Alvarez & Oliva, 2009). Global motion stimuli comprise many moving dots, a proportion of which move in a consistent direction and the remainder of which move in random noise directions (Newsome & Paré, 1988); integrating these local signals results in a percept of global motion (e.g., the dots generally moving in a clockwise or counterclockwise direction). The pooling of local motion signals is imperative for the perception of optic flow – the sensation of motion we experience as we move around our environment (Gibson, 1950) – as well as the perception of object motion. Cortical areas specialized for the processing of this information have been identified, including V1 for the extraction of local motion signals, and areas V5/MT and MSTd for the pooling of these signals (Duffy & Wurtz, 1991; Tootell et al., 1995). Performance on tasks requiring the pooling of motion signals tends to be very good, with individuals able to discriminate global motion signals at low coherence levels (e.g., Edwards et al., 1998), and exhibiting high sensitivity to large-field optic flow stimuli (e.g., Edwards & Ibbotson, 2007); the ability of individuals to perceive such patterns of motion attests to the functional significance of pooling motion signals. Global motion stimuli, which entail such pooling, therefore possess strong ecological validity. Finally, the well-characterized pooling mechanisms that underlie global motion processing give them an advantage over other ensembles (e.g., mean size) with less clear pooling mechanisms (Alvarez, 2011).

In Experiment 1, we used a centrally presented, predictive attentional cue and a rotational global motion stimulus to examine the effects of spatial attention on global motion processing. In Experiment 2, we used a translational global motion stimulus and altered the cueing procedure to maximize our chances of detecting an attentional modulation; that is, we improved the potential for our paradigm to reveal a performance difference between the valid and invalid trials by using a more sensitive motion signal and performance measure, and a cue that simplified task demands. To preview our results, in both experiments, we found no evidence that spatial attention affected global motion processing, as revealed through accuracy (Experiment 1) and coherence thresholds (Experiment 2). This is despite evidence from the reaction time (RT) data that participants were effectively using the cue to guide their attention. Together, our results indicate that naturalistic global motion stimuli are remarkably impervious to misdirected spatial attention.

Experiment 1

In Experiment 1, we examined the effect of spatial attention on the processing of a rotational global motion stimulus. We compared accuracy in discriminating the direction of motion (clockwise versus anticlockwise) across valid and invalid trials, using a 75% predictive cue that was presented in central vision. Prior to this cueing block, we determined performance thresholds for the global motion stimulus using the psi adaptive method (Kontsevich & Tyler, 1999), which gives a threshold estimate corresponding to a performance level of 75%; this threshold estimate is the proportion of dots that must move in the signal direction for performance to be 75% correct. A 75% performance level corresponds to the most sensitive portion of the psychometric curve for a two-alternative forced choice procedure, and therefore setting stimulus coherence at this threshold in the cueing block maximized our chances of observing performance differences between the valid and invalid trials (Goodhew & Edwards, 2019). If global motion processing is affected by spatial attention, accuracy on the task should be lower in invalid trials (where attention is misdirected) than in valid trials (where attention is guided to the location of the subsequently presented stimulus).
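Why the 75% point is the most sensitive one can be illustrated with a minimal sketch. It assumes (for illustration only; the functional form is not stated here) a logistic psychometric function on log coherence with a 50% guess rate and no lapses, so that proportion correct rises from 0.5 to 1 and its slope peaks exactly where performance is 75% correct. For asymmetric forms such as the Weibull, the steepest point falls at a somewhat higher performance level. The observer parameters and function names are hypothetical.

```python
import math

GUESS_RATE = 0.5  # 2AFC: chance performance is 50% correct

def p_correct(log_coh, alpha, beta):
    """Assumed logistic psychometric function on log coherence (no lapses).

    alpha is the log coherence giving 75% correct; beta sets the steepness.
    """
    f = 1.0 / (1.0 + math.exp(-beta * (log_coh - alpha)))
    return GUESS_RATE + (1.0 - GUESS_RATE) * f

def slope(log_coh, alpha, beta, h=1e-5):
    """Central-difference derivative of p_correct with respect to log coherence."""
    return (p_correct(log_coh + h, alpha, beta)
            - p_correct(log_coh - h, alpha, beta)) / (2 * h)

# Hypothetical observer with a 21% coherence threshold.
alpha, beta = math.log(0.21), 3.0
grid = [math.log(c / 1000) for c in range(10, 1001)]  # 1% to 100% coherence
steepest = max(grid, key=lambda x: slope(x, alpha, beta))
print(f"slope peaks at {math.exp(steepest):.1%} coherence, "
      f"where P(correct) = {p_correct(steepest, alpha, beta):.3f}")
```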

Our attentional cue was a number that correctly indicated the location of the global motion stimulus on 75% of trials. The number presented could be “1,” “2,” “3,” or “4,” and each number was associated with one of the four possible target locations with 75% validity; in other words, on 75% of trials, the global motion stimulus appeared at the location with which the presented number was paired. Importantly, because the number-location associations were arbitrary, this type of cueing promoted the voluntary allocation of spatial attention (Olk et al., 2008; Olk et al., 2014); there is evidence that this form of orienting more reliably modulates perception than does involuntary attention (e.g., Esterman et al., 2008; Prinzmetal et al., 2005), and therefore our use of a predictive number cue ensured that we had created the optimal conditions for revealing an attentional modulation of global motion processing. Participants were informed of the number-location associations at the beginning of the experiment and completed a simple letter discrimination task so they could learn to use the cue to guide their attention.

Method

Participants

To obtain an estimate of the required sample size, we used the web version of the within-subjects sample-size planning calculator developed by Anderson et al. (2017) (available from https://designingexperiments.com/), along with the data from Experiment 2 in Kerzel et al. (2009).Footnote 2 With N = 17, this experiment yielded an effect size of ηp² = .422 for the main effect of cue validity. After specifying a desired power of .8, a desired assurance of .7, and an alpha of .05 for both the previous and the current study, the analysis yielded a recommended sample size of 29 participants, which we rounded up to 30.
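For reference, partial eta squared and the F ratio it was computed from are related by the identity below. Recovering F requires the degrees of freedom of the original test; purely as an illustrative assumption, the second expression treats cue validity in Kerzel et al. (2009) as a two-level within-subjects factor (df_effect = 1, df_error = N − 1 = 16), which may not match the actual design of that study.

```latex
\eta_p^2 = \frac{F\,df_{\mathrm{effect}}}{F\,df_{\mathrm{effect}} + df_{\mathrm{error}}}
\quad\Longleftrightarrow\quad
F = \frac{\eta_p^2}{1-\eta_p^2}\cdot\frac{df_{\mathrm{error}}}{df_{\mathrm{effect}}},
\qquad
F = \frac{0.422}{0.578}\cdot\frac{16}{1} \approx 11.68
```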

Overall, 30 individuals (18 female) with an average age of 20.5 years (SD = 2.0) participated in exchange for either course credit or $10 payment. All participants had either normal vision or vision that had been corrected to normal with contact lenses. Each participant provided written informed consent, and both experiments in this study were approved by the Australian National University Human Research Ethics Committee.

Apparatus and stimuli

Stimuli were presented on a gamma-corrected Philips Brilliance 202P4 CRT monitor (the Netherlands) with a refresh rate of 85 Hz and a background of mid-grey (39.3 cd/m²). The viewing distance to the monitor was fixed at 50 cm with a chinrest. Stimuli were presented using the Psychophysics Toolbox (Brainard, 1997) in MATLAB, and the Palamedes Toolbox (Prins & Kingdom, 2009) was used for the thresholding blocks. Eye movements were monitored with the Cambridge Research Systems (UK) 50 Hz Video Eyetracker Toolbox.

The global motion stimulus consisted of 50 white dots, each 0.1° of visual angle in diameter. These dots had a luminance contrast of 109% (Weber value) and fell within a viewing aperture that had an inner radius of 0.1° and an outer radius of 2.5°, resulting in a dot density of 2.6 dots/deg². The motion sequence consisted of five pages each lasting for three frames,Footnote 3 resulting in a total stimulus duration of approximately 150 ms. A proportion of the dots in this stimulus moved in a clockwise or anticlockwise direction, with the remaining dots moving in random directions. The spatial step size of the motion stimulus was 0.3°, resulting in a dot speed of 8.9 deg/sec. The direction in which each dot moved was randomly assigned at the start of each frame.
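The stimulus description can be made concrete with a small generator for rotational global-motion pages. This is a minimal sketch, not the authors' MATLAB/Psychophysics Toolbox code, and several details are assumptions: coordinates are in degrees with the y axis pointing up, dot positions are drawn uniformly over the annulus area, signal and noise roles are redrawn on every update, signal dots step 0.3° along the local tangent, noise dots step 0.3° in a fresh random direction, and any dot leaving the aperture is replotted at a random position. All function names are hypothetical. The sketch also recovers the stated dot density from the aperture geometry.

```python
import math
import random

N_DOTS = 50
INNER_R, OUTER_R = 0.1, 2.5  # aperture radii (deg)
STEP = 0.3                   # displacement per update (deg)

def dot_density(n_dots=N_DOTS, inner=INNER_R, outer=OUTER_R):
    """Dots per square degree within the annular aperture."""
    return n_dots / (math.pi * (outer ** 2 - inner ** 2))

def random_point_in_annulus(rng):
    """Position drawn uniformly by area inside the annulus (y axis points up)."""
    r = math.sqrt(rng.uniform(INNER_R ** 2, OUTER_R ** 2))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)

def step_dots(dots, coherence, clockwise, rng):
    """Move every dot by STEP; a random subset of round(coherence * n) dots is signal."""
    signal = set(rng.sample(range(len(dots)), round(coherence * len(dots))))
    moved = []
    for i, (x, y) in enumerate(dots):
        if i in signal:
            # Signal dot: move along the local tangent to its circle about fixation.
            turn = -math.pi / 2 if clockwise else math.pi / 2
            phi = math.atan2(y, x) + turn
        else:
            # Noise dot: move in a freshly drawn random direction.
            phi = rng.uniform(0.0, 2.0 * math.pi)
        x, y = x + STEP * math.cos(phi), y + STEP * math.sin(phi)
        if not INNER_R <= math.hypot(x, y) <= OUTER_R:
            x, y = random_point_in_annulus(rng)  # replot dots that leave the aperture
        moved.append((x, y))
    return moved

rng = random.Random(1)
pages = [[random_point_in_annulus(rng) for _ in range(N_DOTS)]]
for _ in range(4):  # five pages in total
    pages.append(step_dots(pages[-1], coherence=0.21, clockwise=True, rng=rng))
print(f"dot density: {dot_density():.2f} dots/deg^2")
```

Replotting escaped dots keeps the density roughly constant across pages, at the cost of occasional abrupt dot appearances near random positions.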

The experiment display consisted of a black fixation dot surrounded by four white, circular placeholders (diameter: 6°; thickness: 0.1°), which were located 12° above, below, to the left of, and to the right of the fixation dot. The number cue could be “1,” “2,” “3,” or “4,” and was presented in white, Arial font, and measured approximately 1.5°. The target for the letter discrimination task was the letter “E” or “F,” which was presented in white, Arial font, and measured approximately 2.3°; this letter target appeared at an eccentricity of 12°.
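Sizes and eccentricities here are given in degrees of visual angle; their physical size on screen depends on the viewing distance (50 cm in this experiment). A minimal sketch of the standard trigonometric conversion, assuming a flat screen viewed perpendicularly at the fixation point (function names are illustrative). For off-axis stimuli such as the 6° placeholders at 12° eccentricity, the radial extent is computed from the two edge eccentricities rather than from the on-axis formula.

```python
import math

def eccentricity_deg_to_cm(ecc_deg, distance_cm):
    """On-screen distance from fixation to a point at ecc_deg (flat screen)."""
    return distance_cm * math.tan(math.radians(ecc_deg))

def size_deg_to_cm(size_deg, distance_cm):
    """Extent of a stimulus centred on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(size_deg) / 2.0)

def radial_extent_cm(ecc_deg, size_deg, distance_cm):
    """Radial extent of a stimulus of size_deg centred at ecc_deg."""
    near = eccentricity_deg_to_cm(ecc_deg - size_deg / 2.0, distance_cm)
    far = eccentricity_deg_to_cm(ecc_deg + size_deg / 2.0, distance_cm)
    return far - near

D = 50.0  # Experiment 1 viewing distance (cm)
print(f"placeholder centre: {eccentricity_deg_to_cm(12, D):.2f} cm from fixation")
print(f"placeholder radial extent: {radial_extent_cm(12, 6, D):.2f} cm")
```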

Procedure

This experiment consisted of a global motion thresholding block, a letter discrimination task with cueing, and a global motion block with cueing. These blocks were completed in the same order by all participants. The thresholding block comprised two 60-trial runs of the psi adaptive method (Kontsevich & Tyler, 1999), which gives a threshold estimate corresponding to a performance level of 75%; here, participants were required to determine whether the dots were moving in an anticlockwise or clockwise direction (left-arrow-key response for “anticlockwise”; right-arrow-key response for “clockwise”). Each trial began with 50 ms of fixation, after which the stimulus appeared randomly inside one of the four placeholders for approximately 150 ms. Following response, a blank screen appeared for 1,000 ms before the next trial began. Participants were encouraged to maintain fixation at all times and to prioritize accuracy over speed. The threshold estimated by the second run of this adaptive method determined the coherence of the global motion stimulus in the cueing block.
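The psi method picks each stimulus level to minimise the expected entropy of the posterior over psychometric-function parameters, then updates that posterior with the observer's response. The pure-Python sketch below illustrates one 60-trial run. It is not the Palamedes implementation used in the study: the logistic form on log10 coherence, the parameter and stimulus grids, the fixed 2% lapse rate, the posterior-mean threshold estimate and the simulated observer are all assumptions, and the function names are hypothetical.

```python
import math
import random

GUESS, LAPSE = 0.5, 0.02  # assumed 2AFC guess rate and fixed lapse rate

def pf(x, alpha, beta):
    """Assumed logistic psychometric function on log10 coherence.

    At x == alpha, P(correct) = 0.5 + 0.48 * 0.5 = 0.74, i.e. close to 75%.
    """
    return GUESS + (1 - GUESS - LAPSE) / (1 + math.exp(-beta * (x - alpha)))

def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)

def psi_run(respond, n_trials=60):
    """One psi run; respond(x) returns True if the observer is correct at x."""
    stims = [math.log10(c / 100) for c in range(4, 101, 4)]    # 4% to 100%
    alphas = [math.log10(c / 100) for c in range(2, 101, 2)]   # 2% to 100%
    params = [(a, b) for a in alphas for b in (2.0, 4.0, 8.0, 16.0)]
    post = [1.0 / len(params)] * len(params)                   # uniform prior
    for _ in range(n_trials):
        best_h, best_x = math.inf, None
        for x in stims:
            lik = [pf(x, a, b) for a, b in params]
            p_c = sum(p * l for p, l in zip(post, lik))       # predicted P(correct)
            h = (p_c * entropy([p * l / p_c for p, l in zip(post, lik)])
                 + (1 - p_c) * entropy([p * (1 - l) / (1 - p_c)
                                        for p, l in zip(post, lik)]))
            if h < best_h:  # keep the level with the lowest expected entropy
                best_h, best_x = h, x
        correct = respond(best_x)
        lik = [pf(best_x, a, b) for a, b in params]
        post = [p * (l if correct else 1 - l) for p, l in zip(post, lik)]
        total = sum(post)
        post = [p / total for p in post]                       # Bayes update
    alpha_hat = sum(p * a for p, (a, _) in zip(post, params))  # posterior mean
    return 10 ** alpha_hat, post

rng = random.Random(0)
TRUE_ALPHA, TRUE_BETA = math.log10(0.21), 8.0   # hypothetical observer
threshold, posterior = psi_run(lambda x: rng.random() < pf(x, TRUE_ALPHA, TRUE_BETA))
print(f"estimated threshold: {threshold:.1%} coherence")
```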

Participants then completed a 100-trial letter-discrimination task involving the number cue. This block was designed to teach participants the number-location pairings; on each trial, they indicated whether the letter “E” or “F” had been presented. In this block, each trial began with 1,000 ms of fixation, followed by the number “1,” “2,” “3,” or “4” appearing in the center of the display. Each number was associated with a particular target location, such that when a given number appeared, there was a 75% chance that the target would appear at the location with which it was paired. There were four possible “sets” of associations (see Table 1), and the set that each participant received was randomly determined. Participants were informed of these associations, as well as the validity of the cue, prior to completing the letter discrimination task. After a 600-ms delay (sufficient time to allow for interpretation of the cue and a subsequent attentional shift; e.g., Olk et al., 2008), the letter “E” or “F” appeared in one of the four placeholders, and participants had to indicate which letter appeared by pressing the corresponding letter key on the keyboard as quickly and as accurately as possible. The letter remained visible until response. The letter discrimination block consisted of 75 valid trials, where the letter appeared at the location associated with the presented number, and 25 invalid trials, where the letter appeared at one of the other three locations.
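The cue structure of these blocks can be sketched as a trial-list generator. Table 1's actual number-location sets are not reproduced here, so the mapping below is a hypothetical example; cycling the cue numbers so each appears equally often, and drawing invalid targets at random from the three uncued locations, are likewise assumptions, as is the function name.

```python
import random

LOCATIONS = ("top", "right", "bottom", "left")
# Hypothetical number-location set; Table 1 lists the sets actually used.
CUE_MAP = {1: "top", 2: "right", 3: "bottom", 4: "left"}

def make_cued_trials(n_trials, validity, rng):
    """Shuffled list of (cue_number, target_location, is_valid) tuples."""
    n_valid = round(n_trials * validity)
    cues = list(CUE_MAP)
    trials = []
    for i in range(n_trials):
        cue = cues[i % len(cues)]          # cycle cues so each appears equally often
        cued_location = CUE_MAP[cue]
        if i < n_valid:
            target = cued_location
        else:                              # invalid: any of the other locations
            target = rng.choice([loc for loc in LOCATIONS if loc != cued_location])
        trials.append((cue, target, i < n_valid))
    rng.shuffle(trials)
    return trials

rng = random.Random(42)
letter_block = make_cued_trials(100, 0.75, rng)   # 75 valid, 25 invalid
motion_block = make_cued_trials(200, 0.75, rng)   # 150 valid, 50 invalid
```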

Table 1 Possible number-location associations in Experiment 1

After this letter discrimination block, participants were run on a 200-trial global motion block with 150 valid trials and 50 invalid trials. The sequence of a single trial in this block was identical to that of a single trial in the letter discrimination block, except for the presentation of the global motion stimulus instead of the letter (see Fig. 1), and the instruction to again prioritize accuracy over speed. The coherence of the motion stimulus was set at the threshold estimate given by the second run of the psi adaptive method. Initiation of each trial was contingent on 250 ms of continuous fixation inside the defined fixation region (a 1.5° invisible square), and if the participant’s gaze was not within the fixation region when the number cue was first presented, at the beginning of the motion sequence (600 ms after initiation of the trial), or at the end of the motion sequence (750 ms after initiation of the trial), the trial was discarded. This ensured that we were able to examine the effect of covert spatial attention on global motion processing, as opposed to any modulation that reflects the combined effect of covert spatial attention and eye movements.
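The gaze-contingent exclusion rule can be expressed compactly. The sketch below assumes gaze samples arrive as (time_ms, x_deg, y_deg) tuples relative to the fixation dot, with time zero at trial initiation, interprets the 1.5° square as extending 0.75° on either side of fixation, and uses the sample nearest in time to each checkpoint. These choices and the function names are hypothetical and do not reflect the eyetracker toolbox's own interface.

```python
FIX_HALF_WIDTH = 0.75           # 1.5 deg square, assumed centred on fixation
CHECKPOINTS_MS = (0, 600, 750)  # cue onset, motion onset, motion offset

def nearest_sample(samples, t_ms):
    """Gaze sample (time_ms, x_deg, y_deg) closest in time to t_ms."""
    return min(samples, key=lambda s: abs(s[0] - t_ms))

def fixation_ok(samples, checkpoints=CHECKPOINTS_MS, half_width=FIX_HALF_WIDTH):
    """True if gaze is inside the square window at every checkpoint."""
    for t in checkpoints:
        _, x, y = nearest_sample(samples, t)
        if abs(x) > half_width or abs(y) > half_width:
            return False
    return True

# Invented 50 Hz gaze traces (one sample every 20 ms), in deg from fixation.
steady = [(t, 0.1, -0.2) for t in range(0, 800, 20)]
drifted = [(t, 0.1 if t < 700 else 1.2, 0.0) for t in range(0, 800, 20)]
print(fixation_ok(steady), fixation_ok(drifted))  # True False
```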

Fig. 1

This figure shows the presentation sequence of a single trial in the global motion block of Experiment 1. If the number “3” was associated with the rightmost target location, this would be an example of a valid trial; if it was associated with one of the other three locations, this would be an example of an invalid trial. See text for further details.

Results and discussion

We first analyzed the RT data from the letter discrimination block. Trials in which participants responded incorrectly or pressed a non-designated response key, as well as those in which the RT was below 100 ms or exceeded 2.5 SDs above that participant’s mean RT, were removed (3.2% of all trials). A paired t test on mean RTs revealed faster responding in the valid trials (M = 859 ms, SD = 201) compared to the invalid trials (M = 1,077 ms, SD = 345), t(29) = 5.76, p < .001, Cohen’s d = 1.05.Footnote 4 We conducted an equivalent paired Bayesian t test using JASP (JASP Team, 2020). BF10 values above 1 indicate evidence in favor of the alternative hypothesis, while values below 1 indicate evidence in favor of the null hypothesis (with clear evidence for the alternative and null hypotheses typically indexed by a BF10 of at least 3 or at most 1/3, respectively). This analysis revealed a BF10 of 6171.81: “decisive”Footnote 5 evidence of a cueing effect. This indicates that participants were able to use the symbolic cue to guide their attention to the appropriate target location.
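The trimming rule and the paired comparison can be sketched with the standard library. Two details are assumptions rather than statements from the text: the 2.5-SD cutoff is computed from each participant's correct trials pooled across cue conditions, and Cohen's d is the paired-samples d_z = t/√N. The latter reproduces the reported value, since 5.76/√30 ≈ 1.05. Function names and the data layout are hypothetical.

```python
import math
import statistics

def trim_rts(trials):
    """Keep correct trials with 100 ms <= RT <= participant mean + 2.5 SD.

    trials: list of (rt_ms, correct, condition) for one participant, where
    correct is False for wrong answers and for non-designated keys.
    Assumes the mean and SD are computed over that participant's correct trials.
    """
    correct = [t for t in trials if t[1]]
    rts = [t[0] for t in correct]
    cutoff = statistics.mean(rts) + 2.5 * statistics.stdev(rts)
    return [t for t in correct if 100 <= t[0] <= cutoff]

def paired_t(a, b):
    """Paired t statistic, df, and Cohen's d_z for per-participant means a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d, sd_d = statistics.mean(diffs), statistics.stdev(diffs)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1, mean_d / sd_d   # d_z, equal to t / sqrt(n)

# The reported effect size follows from the t value alone:
print(round(5.76 / math.sqrt(30), 2))   # 1.05
```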

We then analyzed the accuracy and RT data for the global motion block, which are illustrated in Fig. 2. Average global motion thresholds were 21.4% in the first run and 20.9% in the second run of the psi adaptive method. We examined the accuracy (mean percentage correct) data for the global motion block to identify any participants whose performance was below an average of 60% or above an average of 90% correct in both cue-validity conditions; this was done to identify participants who were performing at floor or at ceiling, and thus whose data may have been obscuring a cueing effect (see Argyropoulos et al., 2013; Goodhew, 2019). No participants met these criteria, and therefore all participants’ data were retained. Trials in which the participant’s gaze fell outside of the fixation region were excluded (8.1% of all trials), and 3.3% of trials were excluded based on the same RT cutoffs used for the letter discrimination task.
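The floor/ceiling screen can be read in more than one way; the sketch below takes it literally, flagging a participant only when accuracy is below 60% in both cue-validity conditions or above 90% in both. The function name and the invented participant data are illustrative.

```python
def at_floor_or_ceiling(valid_pct, invalid_pct, floor=60.0, ceiling=90.0):
    """Flag accuracy below floor, or above ceiling, in BOTH cue conditions."""
    both_low = valid_pct < floor and invalid_pct < floor
    both_high = valid_pct > ceiling and invalid_pct > ceiling
    return both_low or both_high

# Invented (valid %, invalid %) accuracies for three participants.
participants = {"p01": (77.0, 78.5), "p02": (55.0, 58.0), "p03": (92.0, 88.0)}
excluded = [pid for pid, (v, i) in participants.items() if at_floor_or_ceiling(v, i)]
print(excluded)  # ['p02']
```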

Fig. 2

This figure shows the mean accuracy (percentage correct responses) and reaction time (RT) in ms for each cueing condition of the global motion block in Experiment 1. Error bars represent standard errors and were calculated according to the Cousineau-Morey method (Cousineau, 2005; Morey, 2008; O’Brien & Cousineau, 2014)
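Within-subject error bars of this kind can be computed by first removing each participant's overall mean and adding back the grand mean (Cousineau, 2005), then scaling the resulting standard errors by Morey's (2008) correction factor √(J/(J − 1)) for J within-subject conditions. A minimal sketch, with a hypothetical function name and a participants-by-conditions data layout:

```python
import math
import statistics

def cousineau_morey_se(data):
    """Within-subject standard error for each condition.

    data: list of per-participant lists, one value per condition.
    """
    n_subj, n_cond = len(data), len(data[0])
    grand = statistics.mean(v for row in data for v in row)
    # Cousineau normalisation: remove between-subject differences in overall level.
    norm = [[v - statistics.mean(row) + grand for v in row] for row in data]
    correction = math.sqrt(n_cond / (n_cond - 1))  # Morey (2008)
    ses = []
    for j in range(n_cond):
        col = [row[j] for row in norm]
        ses.append(correction * statistics.stdev(col) / math.sqrt(n_subj))
    return ses
```

With two conditions, as here, each error bar reduces to the standard deviation of the valid-minus-invalid differences divided by √(2N).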

A paired t test on the mean percentage correct responses in the valid trials (M = 77.2%, SD = 8.3) compared to the invalid trials (M = 77.4%, SD = 7.1) revealed no significant difference in accuracy, t(29) = 0.17, p = .866, Cohen’s d = 0.03. Therefore, there was no evidence that attention affected performance on the global motion task. The equivalent Bayesian t test yielded a BF10 of 0.20: “substantial” evidence against a cueing effect on accuracy. Another paired t test on the mean RTs in this block revealed significantly faster responding in the valid trials (M = 579 ms, SD = 150) compared to the invalid trials (M = 657 ms, SD = 199), t(29) = 4.55, p < .001, Cohen’s d = 0.83, with a BF10 of 288.07 showing “decisive” evidence for a cueing effect on RTs. Therefore, while cueing did not produce any differences in accuracy, there was strong evidence that participants were attending to the cued location.
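JASP's default Bayesian paired t test uses the JZS (Jeffreys-Zellner-Siow) Bayes factor, which places a Cauchy prior on the standardized effect size with a default scale of r = 0.707. The sketch below evaluates that one-dimensional integral numerically from t and N. It is an independent illustration, not JASP's code; the integration range and grid are arbitrary choices, so small numerical differences from JASP's output are possible, and the function name is hypothetical.

```python
import math

def jzs_bf10(t, n, r=0.707, n_grid=20000):
    """JZS Bayes factor BF10 for a one-sample or paired t test with n pairs.

    Integrates over g (inverse-gamma(1/2, 1/2) prior, equivalent to a Cauchy
    prior of scale r on effect size) with the trapezoid rule on log g.
    """
    nu = n - 1
    null_lik = (1 + t * t / nu) ** (-(nu + 1) / 2)
    lo, hi = -20.0, 20.0
    du = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        g = math.exp(lo + k * du)
        a = 1 + n * g * r * r
        lik = a ** -0.5 * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        weight = 0.5 if k in (0, n_grid) else 1.0
        total += weight * lik * prior * g * du  # g * du is dg on the log scale
    return total / null_lik

for t_value in (5.76, 0.17, 4.55):
    print(f"t(29) = {t_value}: BF10 = {jzs_bf10(t_value, 30):.2f}")
```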

Overall, there was no evidence from Experiment 1 that spatial attention modulated global motion processing. Indeed, mean accuracies for the valid and invalid trials were almost identical (77.2% and 77.4%, respectively), and there was “substantial” support for a null effect from the Bayesian analysis. However, while there was no cueing effect on accuracy, there was strong evidence that participants were attending to the cued location: RTs were significantly faster in valid compared to invalid trials. Such RT effects can be taken to show that attention has been shifted to the cued location, and importantly, are common even in experiments where accuracy is emphasized over speed (e.g., Montagna et al., 2009; Yeshurun & Carrasco, 1999). Therefore, we can be confident that attention was shifted to the cued location in this experiment, but that this allocation of attentional resources did not result in improved performance on this particular ensemble task. Nevertheless, we sought to obtain further evidence for this null effect in a subsequent experiment.

Experiment 2

In Experiment 1, there was no evidence for an effect of spatial attention on global motion processing. Nevertheless, we performed another experiment to enhance the sensitivity of our paradigm for detecting any influence of spatial attention on performance. First, instead of analyzing percentage correct responses at the global motion threshold estimated by the psi adaptive method, we directly compared thresholds between the valid and invalid trials. Thresholds may be a more sensitive measure of performance than percentage correct responses because they correspond directly to a 75% performance level (the most sensitive part of the psychometric curve for a two-alternative forced choice task; Goodhew & Edwards, 2019), and by only sampling a proportion of the valid trials in order to compute this threshold, we were able to compare equal numbers of valid and invalid trials (cf. 150 valid trials versus 50 invalid trials in Experiment 1); this, in turn, may improve estimates of cueing effects.

We also attempted to reduce sources of experimental noise that may have been present in Experiment 1. We removed the top and bottom target locations so that there were now two possible target locations – one to the left and one to the right of the fixation stimulus. This change was made in view of asymmetries in motion sensitivity along the vertical meridian (Edwards & Badcock, 1993), which may have unduly added noise to the data of the previous experiment. To further reduce the variability in motion thresholds, the nature of the global motion signal was changed from rotational to translational, with a proportion of the dots now moving either upwards or downwards on any given trial.Footnote 6 Finally, we employed an arrow cue rather than a number cue, and reduced the cue-target interval from 600 ms to 300 ms. These changes were made to simplify task demands as much as possible, since interpreting an arrow cue may be less taxing than interpreting a number cue arbitrarily associated with a particular location. Notably, this stimulus configuration (i.e., a predictive arrow cue with two possible target locations and a 300-ms cue-target interval) has been shown to reliably elicit attentional shifts (e.g., Posner et al., 1978).

Method

Participants

Thirty individuals (17 female) with an average age of 23.5 years (SD = 5.6) participated in exchange for either course credit or $15 payment. All participants had either normal vision or vision that had been corrected to normal with contact lenses.

Apparatus and stimuli

Stimuli were presented on a 1,920 × 1,080 pixel LCD monitor with a refresh rate of 100 Hz and a background of mid-grey (71.1 cd/m²). The viewing distance to the monitor was fixed at 86 cm with a chinrest. As in Experiment 1, stimuli were presented using the Psychophysics Toolbox (Brainard, 1997) in MATLAB, and the Palamedes Toolbox (Prins & Kingdom, 2009) was used to obtain threshold estimates. Eye movements were monitored with the SR Research (Canada) Eyelink 1000 desktop-mounted eyetracker, which has a sampling rate of 1,000 Hz.

The spatial and temporal properties of the global motion stimulus were similar to those of Experiment 1; however, rather than moving clockwise or anticlockwise, a proportion of the dots now moved either upwards or downwards. The placeholders were the same as in Experiment 1, except that there were now only two of them, one to the left and one to the right of the fixation stimulus, and the number cue of Experiment 1 was replaced by an arrow cue. The fixation stimulus was a bullseye and crosshair combination target, which has been found to produce especially stable fixations relative to other fixation stimuli (Thaler et al., 2013). The outer circle of this stimulus had a radius of 0.5°, and the inner circle had a radius of 0.2°.

Procedure

The main cueing block consisted of two independent psi adaptive staircases running concurrently, one for the valid trials and one for the invalid trials. Each staircase estimated the global motion threshold corresponding to a performance level of 75% correct. The block comprised 400 trials: 300 valid and 100 invalid. To ensure that the threshold estimates for the valid and invalid conditions were based on the same number of trials, only every third valid trial was sampled when computing the threshold for the valid condition. The block was also configured so that each consecutive quarter contained the same proportion of valid to invalid trials (i.e., 75 valid and 25 invalid trials per quarter); this ensured that threshold estimates were not disproportionately derived from trials occurring earlier or later in the block. For all blocks, participants were asked to prioritize accuracy over speed and to maintain fixation.
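The trial-ordering constraints described above can be illustrated with a short schedule generator. This is a hypothetical sketch, not the authors' code: it assumes the target side is drawn at random on each trial and that "every third valid trial" means the 1st, 4th, 7th, and so on, counted across the whole block. The psi staircases themselves are not implemented; the sketch only flags which trials would contribute to each condition's threshold, and how the stimulus level was set on unsampled valid trials is not specified here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

N_QUARTERS = 4
VALID_PER_QUARTER = 75    # 300 valid trials spread evenly over 4 quarters
INVALID_PER_QUARTER = 25  # 100 invalid trials spread evenly over 4 quarters
VALID_SUBSAMPLE = 3       # only every third valid trial counts toward its threshold


@dataclass
class Trial:
    index: int        # position within the 400-trial block
    valid: bool       # True if the arrow points to the target's side
    target_side: str  # "left" or "right"
    cue_side: str     # side the arrow points to
    sampled: bool     # True if the trial contributes to its condition's threshold


def build_block(seed: Optional[int] = None) -> List[Trial]:
    """Hypothetical schedule for one 400-trial cueing block."""
    rng = random.Random(seed)

    # Shuffle validity only within each quarter, so every quarter keeps a 3:1 ratio.
    validity: List[bool] = []
    for _ in range(N_QUARTERS):
        quarter = [True] * VALID_PER_QUARTER + [False] * INVALID_PER_QUARTER
        rng.shuffle(quarter)
        validity.extend(quarter)

    trials: List[Trial] = []
    valid_seen = 0
    for index, is_valid in enumerate(validity):
        target = rng.choice(["left", "right"])  # assumption: side drawn at random
        cue = target if is_valid else ("right" if target == "left" else "left")
        if is_valid:
            # Keep the 1st, 4th, 7th, ... valid trial: 100 of 300 in total.
            sampled = valid_seen % VALID_SUBSAMPLE == 0
            valid_seen += 1
        else:
            sampled = True  # every invalid trial feeds the invalid staircase
        trials.append(Trial(index, is_valid, target, cue, sampled))
    return trials


if __name__ == "__main__":
    block = build_block(seed=1)
    print(sum(t.valid and t.sampled for t in block), "valid trials sampled")
    # prints "100 valid trials sampled"
```

Because 75 is divisible by 3, each quarter also contributes exactly 25 sampled valid trials, matching its 25 invalid trials.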

Participants first completed a 32-trial practice block of the global motion task with accuracy feedback, which did not involve the arrow cue. Here, the global motion stimulus appeared randomly to the left or right of the fixation stimulus, and participants pressed the “up” arrow key if the dots were moving upwards and the “down” arrow key if they were moving downwards. Participants then completed a 32-trial practice block of this task with the arrow cue (see Fig. 3), which was 75% valid with respect to the target location; here, the global motion stimulus appeared 300 ms after the onset of the arrow. Participants were informed of the cue’s validity and instructed to use the cue to guide their attention. After the two practice blocks, participants completed the 400-trial experimental block. Trials were excluded if participants were fixating outside the defined fixation region at the onset of the arrow cue, at the onset of the global motion stimulus, or at the offset of the global motion stimulus.
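The fixation-based exclusion rule amounts to a check on gaze position at three event times. This is a minimal sketch, assuming gaze coordinates have already been converted to degrees relative to the fixation stimulus and that the fixation region is a circle; the radius below (FIXATION_RADIUS_DEG = 1.0) is a hypothetical placeholder, because the region was defined in Experiment 1 and is not restated here. Event names are likewise illustrative.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical fixation-region radius in degrees (the actual value is defined
# in Experiment 1 and not restated in this section).
FIXATION_RADIUS_DEG = 1.0

# The three moments at which fixation was checked.
CHECK_EVENTS = ("cue_onset", "stimulus_onset", "stimulus_offset")

Gaze = Tuple[float, float]  # (x, y) in degrees relative to the fixation stimulus


def fixating(gaze_deg: Gaze, radius_deg: float = FIXATION_RADIUS_DEG) -> bool:
    """True if a gaze sample lies inside the (assumed circular) fixation region."""
    x, y = gaze_deg
    return math.hypot(x, y) <= radius_deg


def keep_trial(gaze_at_event: Dict[str, Gaze],
               radius_deg: float = FIXATION_RADIUS_DEG) -> bool:
    """Keep a trial only if gaze was inside the region at all three events."""
    return all(fixating(gaze_at_event[event], radius_deg) for event in CHECK_EVENTS)


def exclusion_rate(trials: Iterable[Dict[str, Gaze]]) -> float:
    """Percentage of trials excluded on the fixation criterion."""
    trials = list(trials)
    excluded = sum(not keep_trial(t) for t in trials)
    return 100.0 * excluded / len(trials)


if __name__ == "__main__":
    ok = {"cue_onset": (0.1, 0.0), "stimulus_onset": (0.0, -0.2), "stimulus_offset": (0.3, 0.3)}
    broke_fixation = dict(ok, stimulus_offset=(2.5, 0.0))  # saccade during the stimulus
    print(exclusion_rate([ok, ok, ok, broke_fixation]))  # prints 25.0
```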

Fig. 3

This figure shows the presentation sequence of a single (valid) trial in the main cueing block of Experiment 2

Results and discussion

Trials in which the participant’s gaze fell outside the fixation region were excluded (10.4% of all trials), and 3.1% of trials were excluded based on the same RT cutoffs as in Experiment 1. We then conducted a paired t test on the global motion thresholds (proportion of dots) for the valid and invalid cueing conditions (Fig. 4). The mean global motion threshold for valid trials (M = 38.2%, SD = 18.2) did not differ significantly from that for invalid trials (M = 37.2%, SD = 18.4), t(29) = 1.00, p = .324, Cohen’s d = 0.18. The corresponding Bayesian t test showed “substantial” evidence for no difference between the thresholds, BF10 = 0.31 (i.e., the data were about 3.2 times more likely under the null hypothesis than under the alternative). A second paired t test on mean RTs revealed significantly faster responding on valid trials (M = 689 ms, SD = 120) than on invalid trials (M = 720 ms, SD = 114), t(29) = 3.02, p = .005, Cohen’s d = 0.55, with the corresponding Bayesian t test showing “substantial” evidence for this effect, BF10 = 7.88.
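The reported effect sizes appear to correspond to the paired-samples d_z, which can be recovered from the t statistic as d_z = t/√n: with n = 30, t = 1.00 gives 0.18 and t = 3.02 gives 0.55. The sketch below computes d_z this way and also computes the JZS Bayes factor of Rouder et al. (2009) for a paired design by numerical integration over the prior on g. The Cauchy prior width r = √2/2 is an assumption (a common default), since the prior is not stated in this section; the function names and integration settings are illustrative.

```python
import math


def cohens_dz_from_t(t: float, n: int) -> float:
    """Paired-samples effect size d_z = t / sqrt(n)."""
    return t / math.sqrt(n)


def jzs_bf10(t: float, n: int, r: float = math.sqrt(2) / 2,
             lo: float = -15.0, hi: float = 15.0, steps: int = 20000) -> float:
    """JZS Bayes factor (BF10) for a one-sample/paired t test with a Cauchy(0, r)
    prior on effect size, integrated with the trapezoid rule over u = ln(g)."""
    nu = n - 1

    def integrand(u: float) -> float:
        g = math.exp(u)
        a = 1.0 + n * g * r * r
        likelihood = a ** -0.5 * (1.0 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        # Inverse-gamma(1/2, 1/2) prior density on g.
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1.0 / (2 * g))
        return likelihood * prior * g  # extra factor g is the Jacobian dg = g du

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + k * h) for k in range(1, steps))
    null_likelihood = (1.0 + t * t / nu) ** (-(nu + 1) / 2)
    return total * h / null_likelihood


if __name__ == "__main__":
    n = 30
    for label, t in [("thresholds", 1.00), ("RTs", 3.02)]:
        print(f"{label}: d_z = {cohens_dz_from_t(t, n):.2f}, BF10 = {jzs_bf10(t, n):.2f}")
```

If a different prior width was used in the original analysis, only the `r` argument needs to change.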

Fig. 4

This figure shows the mean threshold (proportion of dots) and reaction time (RT) in ms for each cueing condition of Experiment 2. Error bars represent standard errors and were calculated according to the Cousineau-Morey method (Cousineau, 2005; Morey, 2008; O’Brien & Cousineau, 2014)
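The Cousineau-Morey error bars in Fig. 4 remove between-participant variability before standard errors are computed. A minimal sketch for a participants × conditions table stored as plain Python lists follows; it illustrates the published method and is not the authors' analysis code (the function name is hypothetical).

```python
import math
from typing import List


def cousineau_morey_se(data: List[List[float]]) -> List[float]:
    """Within-subject standard errors, one per condition.

    `data[i][j]` is participant i's score in condition j. Each participant's
    scores are re-centred on the grand mean (Cousineau, 2005), and the variance
    is inflated by J / (J - 1) to correct the bias this introduces (Morey, 2008),
    where J is the number of conditions.
    """
    n = len(data)
    j = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * j)
    # Remove each participant's own mean, then add back the grand mean.
    normalized = [[x - sum(row) / j + grand_mean for x in row] for row in data]

    correction = j / (j - 1)
    ses = []
    for c in range(j):
        column = [row[c] for row in normalized]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / (n - 1)  # sample variance
        ses.append(math.sqrt(var * correction / n))
    return ses


if __name__ == "__main__":
    # Toy data: three participants x (valid, invalid).
    print(cousineau_morey_se([[1.0, 2.0], [3.0, 5.0], [2.0, 2.0]]))
```

With J = 2 conditions, as in Fig. 4, both error bars reduce to the standard deviation of the participants' valid-minus-invalid differences divided by √(2n).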

Overall, the results of Experiment 2 mirrored those of Experiment 1: Global motion performance did not differ significantly between valid and invalid trials, but participants responded faster on valid than on invalid trials, indicating that attention was shifted to the cued location. Thus, the cueing procedure effectively shifted participants’ attention, but these shifts had no measurable impact on the perception of this ensemble stimulus. An advantage of Experiment 2 over Experiment 1 is that it involved a direct comparison of performance thresholds between valid and invalid trials, with both thresholds computed over equivalent numbers of trials. Despite this more sensitive design, and the additional measures we introduced to reduce experimental noise (e.g., changing the motion signal from rotational to translational), we again found no evidence that spatial attention affected global motion processing.

General discussion

In this study, we used Posner cueing to examine the effect of spatial attention on the processing of a global motion stimulus, a highly naturalistic ensemble. Posner cueing is a highly effective way of manipulating the locus of spatial attention, in which the attended and unattended stimuli are presented at separate spatial locations. The results of Experiments 1 and 2 converged to provide evidence against an effect of spatial attention on global motion processing. This was so despite strong evidence from both experiments that attention was shifted to the cued location, as indexed by faster responding on valid than on invalid trials. The fact that this result emerged across different performance measures (percentage of correct responses versus thresholds), cue types (number versus arrow), and types of motion signal (rotational versus translational) attests to its robustness. Overall, our results show that global motion processing is remarkably impervious to the misallocation of spatial attention.

Given that our results showed no effect of spatial attention on the processing of a motion ensemble, how do they align with the ensemble processing literature more broadly? As discussed earlier, evidence regarding the effects of attention on ensemble processing is mixed. While some studies have found that limiting attentional resources has no effect on ensemble coding (e.g., Bronfman et al., 2014; Joo et al., 2009), others have found that it precludes ensemble coding, or at least impairs it (e.g., Brand et al., 2012; Huang, 2015; Jackson-Nielsen et al., 2017; McNair et al., 2017). Our results are consistent with the former group of studies. One advantage of our study over these previous efforts, however, is that we have specified precisely the type of attention manipulated by our paradigm, and can therefore conclude that misdirecting spatial attention has no effect on a task requiring ensemble processing. This is not to say that other attentional manipulations could not produce different effects. For example, one study found evidence of inattentional blindness to ensemble stimuli (Jackson-Nielsen et al., 2017); given that inattentional blindness may not be equivalent to misdirected spatial attention (e.g., Memmert, 2010), that paradigm may reveal an attentional cost to ensemble coding that arises from limiting an altogether different attentional process. Additionally, while other researchers have used paradigms that may have manipulated spatial attention, an advantage of our paradigm is that only one stimulus was presented at a time – thereby simplifying task demands (cf. Bronfman et al., 2014) – and that the attended and unattended locations were spatially separated to limit the attentional resources directed to the “unattended” location (cf. Chen et al., 2020; Oriet & Brand, 2013).

Interestingly, a previous study that examined the effects of withdrawing attention on global motion processing observed somewhat counterintuitive results. Specifically, Motoyoshi et al. (2015) investigated the effect of “limited attention” on global motion processing using a dual-task paradigm in which participants completed a challenging digit discrimination task presented in a rapid serial visual presentation stream embedded within the global motion array. They found that thresholds for discriminating the direction of global motion were lower in the dual-task condition than in the single-task condition, indicating that limiting attentional resources actually improved performance on the task. An analogous pattern of findings was later obtained by Pavan et al. (2019) for Glass patterns (which tap the processing of coherent form, rather than motion): when attention was diverted away from a Glass pattern by a rapid serial visual presentation stream embedded within the stimulus, form adaptation was stronger, indicating more extensive processing of the Glass pattern in the “limited attention” condition. Both studies therefore indicate that limiting attentional resources not only spares the processing of at least some naturalistic ensemble stimuli, but can actually improve it. Motoyoshi et al. attributed their findings to limited attention attenuating center-surround inhibition in the high-level receptive fields dedicated to motion processing; this increases the spatial extent over which motion signals are integrated, which in turn facilitates performance on the global motion task. Pavan et al. invoked a similar explanation for their Glass-pattern findings.

Our manipulation differs from that used by Motoyoshi et al. (2015) and Pavan et al. (2019) in that it specifically manipulated the locus of spatial attention, rather than diverting attentional resources more generally through a secondary task (i.e., a process more akin to “distraction” than to a redirection of spatial attention). With our paradigm, we found no effect of spatial attention on global motion processing. However, it is important to note that Motoyoshi et al. found no difference in performance between the single- and dual-task conditions when they reconfigured their global motion stimulus into an annulus that stimulated only peripheral vision; this could be seen as consistent with the absence of a cueing effect on performance in our experiments, in which the ensembles were also presented peripherally. Indeed, a critical factor determining whether limiting attention benefits ensemble processing may be the spatial separation between the “attended” and “nonattended” conditions: The greater the separation, the more likely it is that a lack of attention becomes harmful – or at least inconsequential – rather than beneficial. To take another example, Preston et al. (2020) examined whether the spared color-diversity processing in the “absence” of attention originally observed by Bronfman et al. (2014) generalized to situations in which the target was presented in the far visual periphery, a configuration that increased the distance between the cued and uncued regions. They found that under these conditions, color-diversity processing incurred an attentional cost (note that this study was a direct extension of Bronfman et al., 2014, so it is unclear whether similar results would be observed with Posner cueing). Altogether, the discrepancy between our results and the facilitative effect of limited attention on global motion processing observed by Motoyoshi et al. may be due to the spatial separation of the attended and nonattended locations in our experiments. It should be noted, however, that our stimuli and attentional paradigm differed from those of these earlier studies, so caution should be exercised when drawing comparisons between them.

Recently, Baek and Chong (2020) proposed that “distributed” attention is necessary for ensemble processing and “focused” attention is necessary for the processing of individual objects. How can our results be reconciled with this distinction, given that the cueing procedure we used most likely manipulated the locus of spatial attention rather than attentional breadth? We believe the distinction is more applicable when the ensemble stimulus is presented in central vision: in that case, attention can either be broadened to encompass the entire ensemble or narrowed to encompass a single element within the array. In the spatial cueing paradigm, it is not clear whether attention is “distributed” or “focused.” Because we observed no difference in performance between the valid and invalid cueing conditions, one possibility is that attention was allocated diffusely across the entire display, enabling ensemble processing to occur at the “unattended” location (even highly naturalistic motion, such as biological motion, has been found to rely on some amount of attentional allocation; e.g., Cavanagh et al., 2001; Thornton et al., 2002), but that applying focal attention to the cued location provided no additional benefit to performance. In other words, a “baseline” level of attention may be required for ensemble coding to take place, but our results show that misdirecting focal, spatial attention has no effect on this process.

If our finding for a particular type of naturalistic ensemble stimulus is indicative of how spatial attention affects all ensemble stimuli, why might ensemble processing be impervious to its effects, especially given that spatial attention is known to improve many aspects of low-level visual perception (see Carrasco, 2011, for a review; see also evidence that attention can affect the activity of single neurons; e.g., Treue & Maunsell, 1996)? Ensemble coding is unique in that it may be a means through which the visual system maximizes efficiency in the face of a highly detailed visual environment, and can therefore occur with minimal input from attention; indeed, findings from the ensemble processing literature have been used as examples of how visual awareness can occur in the absence of mechanisms such as attention and working memory (e.g., Bronfman et al., 2014; see Hutchinson et al., 2021, for a general discussion of this issue). However, it is worth highlighting a series of studies by Yeshurun and colleagues that have shown effects of spatial cueing (manipulated via a procedure broadly similar to ours) on a texture segmentation task, which can be seen as requiring a type of ensemble coding. In these studies, participants were required to identify which of two presented textures contained a small set of lines (the “texture target”) oriented differently from the lines making up the background of the stimulus (Yeshurun & Carrasco, 1998, 2000; Yeshurun et al., 2008). In this task, attention elicited via a peripheral cue enhances performance when the target is in the visual periphery but impairs performance when the target is presented in central vision (Yeshurun & Carrasco, 1998, 2000); when attention is elicited via a central cue, performance is uniformly improved (Yeshurun et al., 2008). Our results are at odds with these findings in that they indicate no effect of spatial attention on ensemble processing.

This discrepancy serves to highlight that, just as attention is not a monolithic construct, neither is ensemble processing. There are many different types of ensemble processes, ranging from the computation of mean size to the perception of variance in the emotions expressed by multiple faces (Whitney & Yamanashi Leib, 2018). In addition, there are the less commonly studied pattern-based ensembles, which do not require the computation of a summary statistic per se but nevertheless require the visual system to collapse across individual elements, such as patterns of spatial orientation or frequency (e.g., Alvarez & Oliva, 2009; Brady et al., 2017; Yeshurun & Carrasco, 1998, 2000; Yeshurun et al., 2008). Our global motion stimulus falls primarily into this latter category in that it is naturalistic and pattern-based (see Footnote 7). The fact that there is no correlation between performance on high-level and low-level ensemble tasks (Haberman et al., 2015) further demonstrates the variability in the visual processes that different ensemble stimuli might entail. A very likely consequence of this diversity is that spatial attention may not affect all ensemble tasks in the same way, so it would be worthwhile to test whether our findings for global motion processing replicate with other ensemble representations. Indeed, in an event-related-potential study, Ji et al. (2018) observed no differences in accuracy between valid and invalid trials for mean emotion judgments, which suggests that our results may extend to higher-order ensemble representations that involve the computation of a single summary statistic (see Footnote 8). However, it remains unclear how some of the most commonly studied ensemble representations, such as mean size, are affected by spatial attention as manipulated through Posner cueing, and it is clear that spatial attention can affect texture segmentation (e.g., Yeshurun & Carrasco, 1998, 2000; Yeshurun et al., 2008). It is therefore imperative that future studies systematically examine how different attentional manipulations affect the processing of different types of ensembles.

In sum, using Posner cueing, we have shown that misdirecting spatial attention has no effect on the processing of a naturalistic motion ensemble. While we do not believe that differences in attentional manipulations can explain all discrepant findings in the literature on attention and ensemble processing (conclusions have differed even between studies using the same attentional paradigm, e.g., the attentional blink: Joo et al., 2009; McNair et al., 2017), we believe future studies would do well to clearly specify the type of attention being examined and how it is deployed. Moreover, given the heterogeneity of ensemble representations, we believe the field would benefit from a more systematic examination of how different types of attention affect different types of ensembles. Nevertheless, our results clearly demonstrate that global motion processing, a naturalistic form of ensemble coding, is remarkably resistant to misdirected spatial attention.