Introduction

When we first view a scene, the visual system rapidly extracts information about common patterns that may exist in the scene such as repeated colors or textures. The representation of such patterns is referred to as “summary statistical information” or “ensemble perception,” and is thought to play a critical role in the perception of natural scenes (Brady, Shafer-Skelton, & Alvarez, 2017). In particular, the ability to rapidly extract summary statistical information from a scene may reduce the burden of processing that would be otherwise needed by limited-capacity attentional and cognitive mechanisms. Nevertheless, despite evidence of rapid extraction of statistical information (e.g., Ariely, 2001; Chong & Treisman, 2003; Haberman & Whitney, 2007; also see review by Whitney & Yamanashi, 2018), it is still unclear whether ensemble perception can occur in the absence of visual attention. Resolving the issue has important implications for the understanding of perception more generally, and is the focus of the present paper.

Several recent studies have examined the extent to which ensemble perception could be carried out without attention, and they have yielded mixed results. On the one hand are studies that suggest that ensemble information can be processed without attention, or with only minimal attention. For example, Alvarez and Oliva (2008) asked participants to track a set of moving objects while ignoring a set of moving distractors. Although the to-be-ignored distractors were well outside the focus of attention, participants were still able to extract accurate summary statistics specifying their center of mass. Similarly, participants in Alvarez and Oliva (2009) were able to detect changes in an unattended background pattern more effectively when the change produced a different ensemble structure (compared to equivalent local changes that did not alter the summary statistics), suggesting reduced attentional demands for ensemble perception. In another study, Bronfman, Brezis, Jacobson, and Usher (2014) found that participants could report the diversity of colors contained in objects outside of focal attention with no cost to the performance of their primary task, which required focal attention elsewhere in the display, showing that color diversity, even outside focal attention, could be perceived automatically (see also Ward, Bear, & Scholl, 2016). Several other studies have reached similar conclusions involving summary statistical information of other global visual attributes, such as circle size (Chong & Treisman, 2005) and Gabor patch orientation (Alvarez & Oliva, 2009).

On the other hand, some studies have shown that ensemble perception does incur an attentional cost. Jackson-Nielsen, Cohen, and Pitts (2017) found that participants had no information about color diversity, size diversity, or the mean size of elements outside the focus of attention. Huang (2015) had participants make judgments about either individual visual features or summary statistics of stimuli presented sometimes in unexpected locations. He found that judgments of summary statistics benefited just as much from a spatial precue (which permitted focal attention) as did judgments of an individual feature. These findings suggest that ensemble perception is indeed attention-demanding, and cannot be accomplished in the absence of attention.

One reason for the conflicting results may be that many of the studies that have provided evidence for attention-free ensemble perception have employed dual-task paradigms in which participants perform a primary task with high attentional demands and are then probed on a secondary task about summary statistical information for unattended elements in the display (e.g., Alvarez & Oliva, 2008, 2009; Bronfman et al., 2014; Ward et al., 2016). Such paradigms leave open the possibility that participants may have allocated some attentional resources to the secondary (ensemble) task, rendering these experiments imperfect tests of the attentional demands of ensemble processing. However, studies that have revealed attentional costs of ensemble perception have used different methods. For example, Jackson-Nielsen et al. (2017) employed an inattentional blindness paradigm in which participants performed a focal task for several trials and then received an unexpected query regarding unattended elements after one of the trials. In that study, the participants did not have the motivation to allocate any attention beyond what was required of the focal task, and indeed there was no evidence of ensemble perception for unattended parts of the display. In the study by Huang (2015), focal attention was contrasted with divided attention across trials – with no incentive for participants to divide their attention on the focal attention trials. The results also showed that ensemble perception relies on focal attention.

The studies by Jackson-Nielsen et al. (2017) and Huang (2015) suggest that attention may be necessary for ensemble perception; however, those studies also suffer from a potential weakness. In particular, correct responses regarding the ensemble summary statistics in those experiments required the participants to remember details of the ensemble. Thus, any failures of memory for the ensemble might be incorrectly assumed to reflect the absence of ensemble perception itself (Chen & Wyble, 2016; Jiang, Shupe, Swallow, & Tan, 2016; but see Ward & Scholl, 2015a, b). As a result, it is still unclear whether ensemble perception does or does not require attention.

In order to address this question, here we adopted a method that permits the assessment of ensemble perception without any memory requirement and without any motivation for the participant to attend to irrelevant portions of the display. The method is based on that used by Gronau and Izoutcheev (2017), who studied a similar question regarding the extent to which attention is required for the perception of scenes. The method also bears some similarity to methods used by others in which the processing of an unattended stimulus is assessed by examining its (indirect) effect on the processing of an attended stimulus (e.g., Du & Abrams, 2008, 2012; Eriksen & Eriksen, 1974; Gaspelin, Ruthruff, & Jung, 2014; Theeuwes, 1992, 1994, 2010). Here we use the method to examine ensemble perception. In the critical experiment reported below, participants were required to attend to one region of a display that contained the relevant stimulus and ignore another region that contained a distractor. Our goal was to determine the extent to which ensemble information about the distractor (outside of attention) was processed – but the distractors were not probed by explicit report. Instead, the processing of the unattended distractor was inferred on the basis of interference or facilitation caused by the distractor on evaluation of the relevant, attended stimulus. Thus, the method indirectly assesses the extent to which ensemble information is processed outside the focus of attention without any motivation for participants to attend to the critical (distractor) stimulus and without any requirement that features of the unattended stimulus be retained in memory.

Overview of experiments

We report two experiments below. In the first experiment, participants were briefly exposed to two ensemble stimuli (consisting of clusters of lines) and were asked to determine the presence of an ensemble that matched a pre-specified target category (vertical, horizontal, or oblique line orientations). Our interest was to determine whether performance would be facilitated when the two stimuli were from the same category. If such facilitation does occur, that would show that shared ensemble category membership can affect ensemble judgments when both ensemble stimuli are attended. To anticipate the results, such facilitation did occur. Then, in Experiment 2 our goal was to seek evidence for the same facilitation when one of the ensembles is outside the focus of attention. Such an effect there would indicate that ensemble perception can take place in the absence of attention.

Experiment 1

In this experiment two ensemble stimuli were briefly presented. The stimuli consisted of clusters of lines that were mostly vertical (the “vertical” category), mostly horizontal (“horizontal”), or mostly oblique (“oblique”; similar to stimuli used by others, e.g., Huang, 2015). Participants were to decide whether either ensemble was from a pre-specified target category. Our interest was to determine whether the decision was influenced by the extent to which both ensembles did or did not share the same category. Such a result would serve as an important prerequisite for the test conducted in Experiment 2, in which some ensembles were presented outside the focus of attention.

Method

Participants

The sample size here and in Experiment 2 was based on the study by Gronau and Izoutcheev (2017), who used a similar paradigm. The sample of 18 participants in their Experiment 1 yielded a large effect size (partial eta squared = 0.66) when stimuli were fully attended. In order to enhance the power of the present experiment, 24 undergraduate students (13 females, 11 males, age 19–22 years) with normal or corrected-to-normal vision participated. They were paid 15 RMB (equivalent to about $2.14) for their participation.

Apparatus and procedure

Stimuli were presented on a 17-in. CRT with an 85-Hz refresh rate viewed from a distance of 57 cm, on a gray background. The sequence of events on each trial is shown in Fig. 1. At the beginning of each trial, participants attended to a red fixation cross (.8° × .8°) presented at the center of the screen. After 600 ms, two ensemble stimuli were presented – one above and one below fixation – for 47 ms. The ensembles were followed by a 129-ms pseudonoise pattern mask and then a 1,082-ms blank screen. Participants were to press one key on the computer keyboard as quickly as possible if either ensemble was a member of the pre-specified target category, and another key if neither ensemble was a member. Trials without responses by the end of the blank interval were considered errors.
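The reported durations are consistent with whole numbers of refresh cycles on an 85-Hz display. The following is a minimal sketch of that arithmetic, not the authors' presentation code; the function and dictionary names are hypothetical, and the resulting frame counts are an inference from the reported durations rather than values stated in the text.

```python
REFRESH_HZ = 85  # refresh rate reported in the text


def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Duration in milliseconds of n_frames refresh cycles."""
    return n_frames * 1000.0 / refresh_hz


def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Nearest whole number of refresh cycles for a nominal duration."""
    return round(duration_ms * refresh_hz / 1000.0)


# Nominal durations (ms) taken from the procedure; the key names are hypothetical.
NOMINAL_MS = {"fixation": 600, "ensembles": 47, "mask": 129, "blank": 1082}


def frame_schedule(nominal_ms=NOMINAL_MS):
    """Map each trial event to the frame count implied by its nominal duration."""
    return {event: ms_to_frames(ms) for event, ms in nominal_ms.items()}
```

Under these assumptions, the 47-ms ensemble display corresponds to four refresh cycles and the 129-ms mask to eleven.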

Fig. 1. Sequence of events on a trial in Experiment 1.

The ensembles were selected from one of three categories: vertical, horizontal, or oblique, with one category designated in advance as the target category. Each stimulus subtended 9.3° by 9.3° and consisted of 16 black line segments (1.2° × .3°) arranged in a 4 × 4 grid in which a randomly selected 12 lines matched the category designation and the other four lines had orientations selected randomly from the other categories. Ensembles were centered approximately 5° above and below fixation. Depending on the particular line orientations, rows within an ensemble were between 1.2° and 2° apart with the space between ensembles at least 2°.
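The composition rule for a single ensemble (12 of 16 lines matching the category, four drawn from the other categories) can be sketched as follows. This is an illustrative reconstruction, not the authors' stimulus code: the function name is hypothetical, orientations are represented only as category labels, and line geometry, spacing, and the direction of oblique tilt are not modeled.

```python
import random

CATEGORIES = ("vertical", "horizontal", "oblique")
GRID_SIZE = 4     # 4 x 4 grid of line segments, per the text
N_MATCHING = 12   # number of lines that match the category designation


def make_ensemble(category, rng=None):
    """Return a 4 x 4 grid (list of rows) of orientation labels.

    Twelve randomly placed cells carry `category`; the remaining four
    carry labels drawn at random from the other two categories.
    """
    if rng is None:
        rng = random.Random()
    n_cells = GRID_SIZE * GRID_SIZE
    others = [c for c in CATEGORIES if c != category]
    labels = [category] * N_MATCHING
    labels += [rng.choice(others) for _ in range(n_cells - N_MATCHING)]
    rng.shuffle(labels)  # random assignment of labels to grid positions
    return [labels[r * GRID_SIZE:(r + 1) * GRID_SIZE] for r in range(GRID_SIZE)]
```

Passing a seeded `random.Random` instance makes a given stimulus reproducible, which is convenient when checking a trial list by hand.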

For both the target-present and target-absent trials, the two ensembles were on some trials from the same category, whereas on other trials, the categories differed. Figure 2 shows examples of the different trial types when the target category was horizontal. On same-category target-present trials, both ensembles were from the same (pre-specified target) category (horizontal in the example). On different-category target-present trials, one ensemble was from the target category and the other was from one of the other categories. Finally, on target-absent trials, the two ensembles could be either from the same category or a different category, but never included ensembles from the target category.

Fig. 2. Examples of the stimuli for each trial type in Experiment 1 when “horizontal” was the specified target category. The designations “same-category” and “different-category” refer to the relations between the categories of the two ensembles on a trial. See text for additional details.

Design

The experiment contained 180 target-present trials and 240 target-absent trials. For the target-present trials, one-third (60 trials) contained two ensembles from the target category (e.g., both horizontal when horizontal is the target category), and two-thirds contained one ensemble from the target category and one ensemble from one of the other two categories (60 trials for each of the possible non-target categories; e.g., one horizontal and one vertical, or one horizontal and one oblique). For the target-absent trials, one-half of them (120 trials) contained two ensembles from the same category (60 for each of the two non-target categories; e.g., both vertical or both oblique), while the other half (120 trials) contained one ensemble from each of the two non-target categories (e.g., one vertical and one oblique). Each of the three ensemble categories served as the target category for one-third of the participants. When the target category was present, it was equally likely to appear above or below fixation. When one or two oblique ensembles were in the display, all oblique lines had the same orientation. Trials were presented in a random order. At the beginning of the session, participants completed a practice block of 42 trials (with trial types in the same proportions as in the formal testing) that was not included in the analysis. Two prospective participants were replaced because they were unable to achieve 80% accuracy in the practice block.
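The trial counts above can be checked by building the trial list explicitly. The sketch below is one possible construction under stated assumptions, not the authors' code: the names are hypothetical, and the balancing of top and bottom positions on target-absent, different-category trials is an assumption (the text specifies position balancing only for the target category).

```python
import random
from collections import Counter

CATEGORIES = ("vertical", "horizontal", "oblique")


def build_exp1_trials(target, rng=None):
    """Build a shuffled 420-trial list for one target category."""
    if rng is None:
        rng = random.Random()
    others = [c for c in CATEGORIES if c != target]
    trials = []

    def add(top, bottom):
        trials.append({
            "top": top,
            "bottom": bottom,
            "target_present": target in (top, bottom),
            "same_category": top == bottom,
        })

    # Target-present, same-category: 60 trials.
    for _ in range(60):
        add(target, target)
    # Target-present, different-category: 60 trials per non-target category,
    # with the target equally often above and below fixation.
    for other in others:
        for i in range(60):
            if i % 2 == 0:
                add(target, other)
            else:
                add(other, target)
    # Target-absent, same-category: 60 trials per non-target category.
    for other in others:
        for _ in range(60):
            add(other, other)
    # Target-absent, different-category: 120 trials
    # (position balancing here is an assumption).
    first, second = others
    for i in range(120):
        if i % 2 == 0:
            add(first, second)
        else:
            add(second, first)

    rng.shuffle(trials)  # trials were presented in random order
    return trials


def count_conditions(trials):
    """Count trials in each (target_present, same_category) cell."""
    return Counter((t["target_present"], t["same_category"]) for t in trials)
```

Counting the cells of the resulting list gives 60 target-present same-category trials and 120 trials in each of the other three cells, for 180 target-present and 240 target-absent trials in total.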

Results

Trials with errors and those with reaction times more than three standard deviations above or below each participant’s mean in each experimental condition were excluded from analysis. Mean reaction times are shown in Fig. 3. We conducted a target-presence (present or absent) by category relation (same-category or different-category) ANOVA. Reaction times were faster for target-present than target-absent judgments, F(1, 23) = 49.19, p < .001, ηp2 = .68. Reaction times were also faster when both ensembles were from the same category, F(1, 23) = 104.74, p < .001, ηp2 = .82. The effect of category relation was greater for target-present than for target-absent trials, F(1, 23) = 9.25, p = .006, ηp2 = .29. Importantly, follow-up contrasts showed that the category relation effect was significant not only for target-present trials, F(1, 23) = 131.27, p < .001, ηp2 = .85, but also for target-absent trials, F(1, 23) = 39.05, p < .001, ηp2 = .63.
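The exclusion rule for reaction times can be sketched as below. This is a minimal illustration, not the authors' analysis script: it assumes a single (non-iterative) trimming pass, the sample standard deviation, and correct-trial reaction times already grouped by participant and condition; all names are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, n_sd=3.0):
    """Drop RTs more than n_sd standard deviations from the cell mean."""
    if len(rts) < 2:
        return list(rts)  # a standard deviation needs at least two values
    m = mean(rts)
    sd = stdev(rts)  # sample standard deviation (an assumption)
    return [rt for rt in rts if abs(rt - m) <= n_sd * sd]


def trim_by_cell(rts_by_cell, n_sd=3.0):
    """Apply the trim separately to each (participant, condition) cell."""
    return {cell: trim_rts(rts, n_sd) for cell, rts in rts_by_cell.items()}
```

Trimming within each participant-by-condition cell, rather than across the whole data set, keeps a slow participant or a slow condition from inflating the exclusion threshold for the others.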

Fig. 3. Mean reaction times from Experiment 1 as a function of target presence and the relation between the two ensembles. Reaction times were faster when a target was present, and when the two ensembles were from the same category (for both target-present and target-absent conditions). Error bars represent the standard errors of the means in each condition.

Accuracy rates are shown in Table 1. Participants were more accurate when the two ensembles were from the same category, F(1, 23) = 40.87, p < .001, ηp2 = .64, matching the effect in reaction times and ruling out a speed-accuracy tradeoff. There was no overall difference between target-present and target-absent trials, F(1, 23) = .01, p = .90, ηp2 = .001, but the effect of category relation was greater for the target-present trials, as revealed by an interaction between the two factors, F(1, 23) = 24.22, p < .001, ηp2 = .51. Follow-up contrasts showed that the category relation effect was significant not only for target-present trials, F(1, 23) = 49.53, p < .001, ηp2 = .68, but also for target-absent trials, F(1, 23) = 4.76, p = .040, ηp2 = .17.
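For readers who wish to relate the reported F ratios to the accompanying effect sizes, partial eta squared can be recovered from F and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The helper below is a sketch with a hypothetical function name; because the published F values are rounded, recomputed effect sizes may differ from the reported ones in the last decimal place.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    effect = f_value * df_effect
    return effect / (effect + df_error)
```

For example, the main effect of target presence on reaction times in Experiment 1, F(1, 23) = 49.19, gives a value of about .68, in line with the reported effect size.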

Table 1 Mean accuracy rates (proportion correct) from Experiment 1. Standard errors are shown in parentheses

Discussion

In this experiment, participants attended to two ensembles of lines, searching for the presence of a pre-specified category. When the target was present in both ensembles (target-present, same-category trials) participants were faster than when the target was present in only one ensemble (target-present, different-category). This occurred presumably in part because assessment of the orientation of either ensemble would lead to a “present” response. More importantly, there was also a same-category advantage on the target-absent trials despite the fact that target-absent trials always required both stimulus ensembles to be inspected prior to an “absent” response. This result shows that when an entire scene is attended, ensemble perception is influenced by the ensemble category relations present in the scene. While the source of that result could lead to insight into ensemble perception, identifying it was not our objective (see Footnote 1). Most importantly, the result serves as a prerequisite for Experiment 2, in which we examined the possibility that ensemble category membership can influence ensemble perception outside the focus of attention.

Experiment 2

Experiment 1 revealed a same-category advantage: when both stimuli in the display were from the same ensemble category, the stimuli were processed more quickly. Because that experiment required participants to indicate if a target category was present anywhere in the display, presumably both elements were attended there. Here we repeated the experiment with one important difference: we cued one of the ensemble stimulus locations in advance, and asked subjects to report only whether the stimulus in the cued location matched the pre-specified target category. As a result, the other ensemble stimulus was a distractor – outside the scope of spatial attention. If ensemble statistics are perceived automatically and without attention, then the ensemble category of the unattended distractor stimulus would be expected to influence responses here and reveal a same-category advantage for judgments, as in Experiment 1. Alternatively, if attention is required for ensemble perception then there should be no effect of the category relation between the attended and unattended ensembles on judgments of the attended ensemble. Importantly, the task measures the processing of the distractor ensemble when there is no motivation for participants to split their attention between the two stimuli – the distractor is completely irrelevant to the task. Additionally, there is no requirement that participants remember anything about the distractor in order for us to determine that ensemble information about the distractor was processed.

Method

Participants

A new group of 24 undergraduate students (13 females, 11 males, age 19–21) participated in Experiment 2. They were paid 15 RMB (equivalent to about $2.14) for their participation. All participants had normal or corrected-to-normal vision.

Procedure

The sequence of events on each trial is shown in Fig. 4. The procedure was identical to that used in Experiment 1 with only two exceptions. First, we inserted a 71-ms spatial cue (a three-sided frame) at one of the stimulus locations prior to presentation of the ensemble stimuli. Second, participants were instructed to report only whether the cued stimulus did or did not match the pre-specified target category – the uncued ensemble was a distractor that could be ignored.

Fig. 4. Sequence of events in Experiment 2. Participants were asked to assess only whether the cued ensemble (as cued by the three-sided frame) matched the pre-specified target category.

Examples of the stimuli are shown in Fig. 5. When an ensemble from the target category appeared in a cued location (target-present/cued) the distractor (i.e., uncued) ensemble could either be from the same (i.e., target) category, or a different category (Fig. 5a). Similarly, when a target was absent from the display, the two ensembles could come from the same or from different categories (Fig. 5b). These conditions allowed us to assess the effect of a same-category ensemble in the unattended (i.e., uncued) location. Finally, there were trials that contained one ensemble from the target category that was uncued (Fig. 5c).

Fig. 5. Examples of stimuli from Experiment 2 when “horizontal” was the target category. Participants here indicated only whether the cued ensemble matched the target. (a) When the target was present and cued (and hence the correct response was “present”), the distractor (the uncued ensemble) could be either from the same category (i.e., matching the target category), or from a different category. (Note that the figure does not show one additional different-category trial type where the uncued ensemble belonged to the other non-target category [oblique in this example].) (b) When the target was absent, the distractor could either be from the same category as the cued ensemble or from a different one. (Note that one additional combination of same-category target-absent [oblique, oblique] was tested but is not shown in the figure.) (c) Finally, there were also trials in which an ensemble matching the target category was present, but a non-target category was cued (and hence the required response would be “absent”). (Not shown in the figure are trials in which the cued non-target ensemble was from the other non-target category [oblique].)

Design

The experiment included 180 trials in which the target category was present and cued (100 of which included a same-category distractor in the uncued location; 80 of which had a different-category distractor). A further 160 trials were target-absent trials (with half of those containing a same-category distractor, and half a different-category distractor). Finally, 80 trials included a non-target category in the cued location and a target-category ensemble in the distractor location (note that this number matched the number of trials containing a target category that was cued and a different-category distractor; a similar design was used by Gronau & Izoutcheev, 2017, who studied scene perception). The trial numbers were selected so that the ratio of “present” responses to “absent” responses here (based on only the ensemble in the cued location) matched that of Experiment 1 (180:240 = .75). All other aspects of the design matched Experiment 1. In particular, for trials that contained non-target categories, each of the two non-target categories was represented equally often. The top and bottom locations were equally likely to be cued.
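The response ratio follows directly from the cell counts, because the correct response depends only on the cued ensemble. The sketch below tabulates the counts given above and derives the ratio; the condition labels and function names are hypothetical and serve only to illustrate the arithmetic.

```python
from fractions import Fraction

# Trial counts per (cued-location condition, distractor relation) from the text.
EXP2_TRIAL_COUNTS = {
    ("target_cued", "same"): 100,
    ("target_cued", "different"): 80,
    ("target_absent", "same"): 80,
    ("target_absent", "different"): 80,
    ("target_uncued", "different"): 80,  # target present only as the distractor
}


def correct_response(condition):
    """The correct response depends only on the cued ensemble."""
    return "present" if condition == "target_cued" else "absent"


def response_counts(trial_counts=EXP2_TRIAL_COUNTS):
    """Total number of trials requiring each response."""
    totals = {"present": 0, "absent": 0}
    for (condition, _relation), n in trial_counts.items():
        totals[correct_response(condition)] += n
    return totals


def present_to_absent_ratio(trial_counts=EXP2_TRIAL_COUNTS):
    """Ratio of 'present' to 'absent' responses as an exact fraction."""
    totals = response_counts(trial_counts)
    return Fraction(totals["present"], totals["absent"])
```

Target-uncued trials count toward the “absent” total, which is how 160 target-absent trials plus 80 target-uncued trials yield the 240 “absent” responses that match Experiment 1.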

Results

Trials in which participants made errors or trials that had RTs that deviated from each participant’s mean RT by more than three standard deviations were excluded from the RT analysis. RTs are shown in Fig. 6. To assess the effects of the unattended (i.e., uncued) ensemble, we conducted a target-presence (present or absent) by category relation (same-category or different-category) ANOVA similar to that used in Experiment 1, excluding the target-present/uncued trials. Although the main effect of target presence was not significant, F(1, 23) = 2.32, p = .141, ηp2 = .09, RTs were faster when the uncued ensemble was from the same category, F(1, 23) = 4.87, p = .037, ηp2 = .18, showing that the ensemble statistics of the unattended stimulus were processed. However, as seen in the figure, this occurred only for the target-present trials, as revealed by an interaction between the two factors, F(1, 23) = 5.36, p = .03, ηp2 = .19. Follow-up tests showed that the category effect was significant for the target-present trials, F(1, 23) = 7.46, p = .012, ηp2 = .25 (comparison “A” in Fig. 6), but not the target-absent trials, F(1, 23) = .03, p = .859, ηp2 = .001.

Fig. 6. Reaction times from Experiment 2. The comparison marked “A” reveals the benefit to responding “present” when the unattended ensemble was also from the target category. The comparison marked “B” shows the cost to responding “absent” when the cued ensemble was not from the target category but the unattended ensemble did come from the target category. Error bars represent the standard errors of the means in each condition.

To further examine the potential processing of the uncued (unattended) ensemble, we conducted a planned contrast comparing the target-absent different category condition with the target-present uncued condition (comparison B in Fig. 6). Both of these conditions cued a non-target category (requiring an “absent” response), yet in the former condition the distractor also came from a non-target category whereas in the latter condition the distractor was from the target category. The result of the comparison revealed a marginally significant cost to responding “absent” when the distractor was from the target category, t(23) = 2.05, p = .052, Cohen’s d = .85, suggesting that the unattended distractor’s category was indeed processed.

Additional insight into the attentional effects of the target category comes from a comparison of the different target category conditions. Recall that participants with horizontal and vertical categories defined as the target were searching for the presence of a specific ensemble feature whereas those who were assigned the oblique category were searching for oblique lines that were either tilted to the left or tilted to the right. As a result, the target category was less specific for those searching for an oblique target, and that might be expected to result in a reduced same-category advantage (the comparison marked “A” in Fig. 6). Indeed, oblique targets yielded a numerically smaller advantage when the unattended ensemble was also oblique (m = 8.79 ms) compared to the same-category advantage for horizontal and vertical target categories (m = 10.42 ms), but the difference was not statistically significant, t(22) = .21, p = .837, Cohen’s d = .09.

Accuracies are shown in Table 2. There was no effect of target presence, F(1, 23) = 2.97, p = .098, ηp2 = .115, or of category relation, F(1, 23) = .04, p = .842, ηp2 = .002, and the two factors did not interact, F(1, 23) = .25, p = .623, ηp2 = .01.

Table 2 Mean accuracy rates (proportion correct) from Experiment 2. Standard errors are shown in parentheses

Discussion

In this experiment participants attended to one cued ensemble of lines and ignored a second, distractor ensemble. Nevertheless, the category of the distractor ensemble influenced judgments of the cued ensemble. Participants were faster to indicate that a cued ensemble was in the target category when the distractor was in the target category. They were also (marginally) slower to indicate that the cued ensemble was not a member of the target category when the distractor was a member of the category. Because the experiment did not encourage any division of attention between the two ensembles (i.e., there was never any reason for participants to assess the distractor), the results show that ensemble statistics under at least some circumstances can be perceived in the absence of focal attention.

General discussion

The present study examined ensemble perception in the absence of focal attention. In Experiment 1, we found that when two ensembles are attended, the category membership of both ensembles is processed – yielding a same-category advantage in responding. In Experiment 2 we required that participants attend to only one of two ensemble stimuli that were presented – the uncued ensemble could be safely ignored. Nevertheless, we also found a same-category advantage there: the category membership of the unattended ensemble influenced performance. These results show that summary ensemble statistics, under at least some circumstances, can be processed in the absence of focal attention.

Importantly, our experiments did not have the same shortcomings that have influenced earlier attempts to address the same issue. First, we assessed perception of the unattended ensemble implicitly by examining any effects of the unattended ensemble on decisions related to the attended ensemble. Because the unattended ensemble was entirely irrelevant to the task, there was no need for participants to partially divide their attention between the two ensembles, unlike in some past studies that examined the same question (e.g., Alvarez & Oliva, 2008, 2009; Bronfman et al., 2014). Second, our experiments did not impose any memory requirements upon the participants in order to assess perception of summary statistics outside the focus of attention. Some past studies that have examined the same question did require that participants retain and then explicitly recall certain aspects of the unattended stimuli (e.g., Jackson-Nielsen et al., 2017; Huang, 2015). Thus, the method used here has some advantages over previous methods.

Attentional control settings and task set

One noteworthy aspect of our findings is that the category identity of the unattended ensemble in Experiment 2 only influenced performance when it matched the target category. In particular, when the to-be-ignored stimulus matched the target category, it facilitated judgments when an ensemble from the target category had been cued and impaired judgments when the cued stimulus was not a member of the target category. On the other hand, when the to-be-ignored ensemble did not match the target category, it had no influence on performance. These results suggest two possible interpretations. First, it is possible that ensemble statistics are processed outside the focus of attention – but only when the ensemble matches the participant’s task set. This interpretation is consistent with late-selection theories of attention that propose that meaning can be evaluated across the visual field prior to the selectivity of attention (e.g., Deutsch & Deutsch, 1963; Duncan, 1984). Results consistent with such a possibility have been reported by Gronau, Cohen, and Ben-Shakhar (2009). They showed that some distractors outside the focus of attention are able to exert their effect on the processing of focal stimuli without themselves attracting attention (see also Eriksen & Eriksen, 1974; LaBerge, 1983; and Peelen, Fei-Fei, & Kastner, 2009, for results with a similar interpretation). Importantly, however, in Experiment 2, such processing of the uncued ensemble only occurred when it matched the target category. Target non-matching ensembles in the uncued location were not processed (because if they had been, they would have affected RTs, as they did in Experiment 1 when both ensembles were attended). That aspect of the results suggests that the attentional control setting or task set for the target category may have acted as an early filter over the entire scene, allowing only target-matching elements to be processed more deeply because only such ensembles matched the properties of the sought-for target. Related findings have been reported by Folk, Leber, and Egeth (2002). They showed that distractors in to-be-ignored locations were nevertheless processed when they matched the participant’s task set (see Footnote 2).

The match between target-category ensembles and the participant’s task set also leads to a second interpretation of our findings. It is possible that target-matching ensembles captured attention automatically precisely because they were consistent with the participants’ attentional control settings. Such contingent capture has been demonstrated in a wide range of situations (e.g., Folk et al., 2002; Folk, Remington, & Johnson, 1992; Gaspelin et al., 2014; Reeder, van Zoest, & Peelen, 2015; Wyble, Folk, & Potter, 2013). One interpretation of contingent capture effects is that the attentional control setting enhances the salience of elements that match the task set, thus causing them to capture attention (e.g., Biggs & Gibson, 2010, 2014; Cosman & Vecera, 2010; see Footnote 3). If this had occurred in Experiment 2 then our results would be better characterized as resulting not from pre-attentive processing of the target-matching ensemble statistics, but instead from contingent capture of attention by such ensembles.

Distinguishing between these two possibilities will be difficult because tasks that attempt to assess the locus of attention typically require presenting occasional probes at the locations being tested (e.g., Kim & Cave, 1995). The possibility of such probes could motivate the subjects to intentionally allocate attention to uncued regions, rendering such a test ineffective for ensemble perception.

Although it is not possible to distinguish between the two possibilities with the present results, both interpretations reveal the critical role of ensemble processing for perception: Ensembles matching one’s task set are capable either of being processed in the absence of attention or of summoning attention to their location. In either case, perception of important summary statistics of complex scenes can be rapidly and efficiently accomplished.

Relation to scene perception

The present findings and conclusions closely match those reported by Gronau and Izoutcheev (2017) in their study of scene perception (see also Gronau, 2020). Those researchers examined perception of the gist of scenes in attended and unattended locations using a method very similar to the one we used here. As in our study, Gronau and Izoutcheev (2017) found that scenes outside the focus of attention influenced judgments only when they matched the sought-for scene category, suggesting that scene gist is processed without attention when the scene is consistent with the observer’s goals, parallel to our results for ensemble perception (in Experiment 2). Some researchers have argued that processing of summary statistics is a fundamental part of scene gist processing (e.g., Brady et al., 2017), sharing mechanisms with visual object categorization (Khayat & Hochstein, 2019). The similarity of the present findings to the Gronau and Izoutcheev (2017) results provides support for that idea.

Alternative explanations

It is possible that the results that we reported stem from properties of the responses required in our task as opposed to the properties of the stimuli, as we have suggested. In particular, in Experiment 2, when the unattended ensemble matched the target category, the response to that ensemble (if one had been required) also matched the required response when the cued stimulus was in the target category but it did not match the required response when the cued stimulus was not a target category member. Thus our results might simply derive from the response congruency associated with the attended and unattended ensembles. For the case of scene processing, Gronau (2020, Experiment 4) has shown that response congruency cannot completely account for gist processing outside of focal attention. Although we cannot rule out that possibility here, if it did occur, the present findings would still indicate that ensemble summary statistics can be processed outside the focus of attention. If, in fact, the effect was entirely due to response congruency, then that could indicate that both target-matching and target-nonmatching ensembles can be processed without attention. More work will be needed to rule out that alternative.

Conclusion

In summary, the present results reveal robust processing of summary statistical information of ensembles outside of focal attention when the ensembles matched the properties of a sought-for target. Such ensembles (but not non-matching ones) were either processed preattentively or they caused attention to be directed to their location. The results are similar to those from studies of scene gist processing (e.g., Gronau & Izoutcheev, 2017) and help to further illuminate the way in which we rapidly assess the contents of visual scenes.