Statistical learning in the absence of explicit top-down attention

Recently it has been shown that statistical learning of regularities presented in a display can bias attentional selection, such that attentional capture by salient objects is reduced through suppression of the location where these distractors are likely to appear. The role of attention in learning these contingencies is not immediately clear. Specifically, it is not known whether attention needs to be directed to the contingencies present in the display for learning to occur. In the current study we investigated whether participants can learn statistical regularities present in the environment even when these regularities are not relevant to the participant and are not part of their top-down goals. We used the additional singleton paradigm, in which a color singleton was presented much more often in one location than in all other locations. We show that after being exposed to these regularities regarding the location of the color singleton during an unrelated task with neither targets nor distractors, participants showed a suppression effect from the previously learned contingencies when switching to a task in which they searched for a target and suppressed a distractor. We conclude that visual statistical learning can occur in the absence of top-down attention.


Introduction
When we look upon a scene, the first things we are consciously aware of are the things to which we attend (Posner, 1980). In everyday life these may be a billboard, a street sign, or someone running across the street. Such cases highlight why understanding attention is essential for designing environments that ensure the correct target is selected by attention, thereby avoiding potentially dangerous situations.
Traditionally, attentional selection is considered to be the result of the interaction between the goals of the observer (the top-down attentional set) and the physical properties of the environment (the saliency of objects) (Egeth & Yantis, 1997; Itti & Koch, 2001; Theeuwes, 2010, 2018). At one end of the spectrum, it has been shown that objects that stand out from their environment are selected regardless of the task set. For example, in the so-called additional singleton paradigm (Theeuwes, 1991, 1992), participants search for a unique shape (a diamond among circles or a circle among diamonds) while on some trials a unique color singleton distractor is present (a red item among green items). Typically, participants are slower when a color singleton is present than when it is absent, indicating that the color singleton captured attention in a bottom-up, exogenous way. At the other end of the spectrum, it has been claimed that only objects that are in line with the goals and intentions of the observer are selected (Bacon & Egeth, 1994; Folk, Remington, & Johnston, 1992; Leber & Egeth, 2006). The idea is that an observer's attention is captured by a particular stimulus feature only if that feature matches their top-down attentional set (e.g., a red singleton captures attention when the observer is looking for red items). In other words, when a top-down set is adopted, only elements that fit that top-down set receive attentional priority.
Recently, it was argued that selection is often controlled neither by top-down nor by bottom-up factors, but is instead driven by lingering biases from previous selection episodes (Awh, Belopolsky, & Theeuwes, 2012; Theeuwes, 2018, 2019). It was argued that through statistical learning (SL), observers extract regularities present in the environment, which in turn bias attentional selection (Theeuwes, 2018). SL is considered to be a mechanism that enables the extraction of distributional properties from sensory input across time and space (Frost, Armstrong, Siegelman, & Christiansen, 2015). Learning these statistical regularities makes it possible to interact more effectively with the visual world (Chun & Jiang, 1998). Furthermore, it is generally assumed that adaptation to the regularities that exist in the world develops largely unconsciously,¹ without either the intention to learn or awareness of learning (e.g., Fiser & Aslin, 2001; Turk-Browne, Jungé, & Scholl, 2005).
Given the view that SL is automatic and occurs without the intention to learn, one heavily debated question is whether people learn to extract these regularities from the environment even when the regularities are completely irrelevant to the task at hand. In other words, do people learn regularities simply by looking at displays that contain them? In terms of attentional control, can learning take place even if extracting these regularities is not part of the task-relevant top-down set?
Several studies have investigated whether attention is needed for statistical learning to occur. Using auditory stimuli, some have shown that attention improves SL, while others have shown that without attention there is no learning. For example, auditory SL is impaired when less attention is available (Toro, Sinnett, & Soto-Faraco, 2005). Also, instructions to attend to one auditory pattern (i.e., words) improved learning of that pattern at the expense of learning other features of the word stream, such as its grammar (Finn, Lee, Kraus, & Hudson Kam, 2014). On the other hand, there are also studies showing that the role of attention in SL is limited. In a study by Saffran, Newport, Aslin, Tunick, and Barrueco (1997), children and adults listened to an unsegmented artificial language while performing a cover task of creating computer illustrations. Participants did not know they were listening to an artificial language. Critically, even while performing the cover task, both adults and children learned this artificial language equally well. More recently, Batterink and Paller (2019) demonstrated that participants can learn the statistical properties of language even when fully engaged in an unrelated and difficult visual n-back task.
While the emerging picture in auditory SL is that learning benefits from attention but does not depend on it, the picture is far less clear-cut for visual statistical learning (VSL). For nearly a decade, the seminal work of Turk-Browne et al. (2005) stood as the last word on the role of attention in SL. In their study, participants viewed sequentially presented streams of nonsense shapes. Participants performed a demanding n-back task while viewing a centrally presented, interleaved stream of shapes. They were instructed to attend to one of the streams (i.e., the green shapes) while ignoring the other, interleaved stream (i.e., the red shapes). The results indicated learning of the shape triplets in the attended stream only, and not of those in the unattended stream. It was thus concluded that selective attention determines the input for statistical learning. However, a replication by Musz, Weber, and Thompson-Schill (2015) with a larger sample found that SL could indeed be observed for the unattended stream, though the effect was greatly attenuated and possibly obscured by the original study's limited sample size. Further studies by Campbell, Zimerman, Healey, Lee, and Hasher (2012) and Forest and Finn (2018) add to the growing evidence that VSL may be more similar to auditory SL than previously thought: it benefits greatly from attention but does not depend on it.
Several recent studies have provided evidence that VSL biases attentional selection. It has been shown that in visual search, participants learn that the target appears more often in one location than in other locations, which speeds up target detection (e.g., Ferrante et al., 2018; Geng & Behrmann, 2002; Jiang, Swallow, Rosenbaum, & Herzig, 2013). In these cases, in which the regularities concern the location of the target, learning may be particularly effective because "looking for targets" is clearly part of the observer's top-down set.
Recently, however, a series of experiments demonstrated that participants can also learn regularities in the environment that are not directly part of the top-down task set. Using the additional singleton task, Wang and Theeuwes (2018a, 2018b, 2018c; see also Failing, Wang, & Theeuwes, 2019) showed that participants can learn to suppress a salient singleton that is systematically presented more often in one location than in all other locations. These findings show that participants can learn about regularities present in the displays even when looking for these regularities is not necessarily part of the task set (see also Ferrante et al., 2017; Goschy, Bakos, Müller, & Zehetleitner, 2014, for similar findings using a different paradigm). Indeed, the typical instruction in the additional singleton paradigm is to search as fast as possible for a shape singleton (i.e., a diamond among circles or a circle among diamonds). To find the target quickly, participants need to effectively suppress the influence of the salient distractor on search. The findings of Wang and Theeuwes indicate that participants learn the distractor regularities, such that they become better able to suppress distractors that consistently appear more often in one location than in all other locations. Even though, strictly speaking, ignoring these distractors is not part of the top-down instruction given to observers, ignoring the distractor does in fact improve search; as such, suppression of the distractor may at least be considered indirectly relevant to the task and to the participant's top-down goals.

¹ Though any claim of unconscious learning should generally be approached cautiously, as highlighted by Vadillo et al. (2019).

Cortex 131 (2020) 54–65
In the current study we wanted to push the limits of whether participants can learn statistical regularities present in the environment even when these regularities are clearly not part of and not relevant to the explicit top-down set. The question we addressed is whether participants can learn regularities even if they are engaged in a task that has nothing to do with visual search.

Experiment 1
In Experiment 1 we presented displays similar to those of Wang and Theeuwes (2018a, 2018b, 2018c), in which a color singleton distractor was presented much more often in one location than in all other locations. However, instead of searching for a target, participants simply looked at the display and decided whether the elements formed a global circle or a global diamond shape (see Belopolsky & Theeuwes, 2010, for a similar task). After being exposed to arrays containing the statistical regularity, participants switched to a regular additional singleton task and searched for a diamond among circles or a circle among diamonds. During this search task, the statistical regularity was no longer present and, critically, the color distractor was equally likely to appear at any of the locations. The question was whether participants learned the color distractor regularity during the learning phase, in which they did not have to search for anything but only judged the overall arrangement of the elements. If so, we would expect to see some lingering bias during the test phase, when participants searched for the shape singleton while ignoring the color singleton.

Stimuli & apparatus
The experiment took place in a dimly lit lab; stimuli and responses were managed by HP Compaq 800 Elite computers with 21-inch color liquid crystal display (LCD) monitors. The experiment was programmed as a custom Python script using stimuli from the PsychoPy library of psychophysical tools (Peirce, 2009). Participants used a chinrest placed 71 cm from the monitor, and eye movements were monitored with an EyeLink 1000 eye tracker to ensure that fixation remained central at all times (sampling rate: 1,000 Hz; spatial resolution < .2°). Our stimuli were based on those used by Wang and Theeuwes (2018a), which were themselves a version of the popular additional singleton paradigm (Theeuwes, 1992).

Design & procedure
Our experiment had two parts, both of which used nearly identical stimuli. Every trial began with a fixation cross presented for 500 msec. An array of eight colored shapes was then presented around the fixation cross for 3000 msec or until the participant responded.
In the first three blocks of the experiment (the training blocks), the shapes were arranged either in a circle or in a diamond, and participants indicated which by pressing the 'left' or 'up' arrow key, respectively (see Fig. 1). In the final two blocks (the testing blocks), the stimuli were always arranged in a circle, and participants performed the classic additional singleton search task (Theeuwes, 1992). On these trials, participants had to find the unique shape (either a circle among seven diamonds or vice versa) and indicate as quickly as possible whether the line segment inside this shape was vertical or horizontal by pressing the 'up' or 'left' arrow key, respectively. Each trial was followed by an inter-trial interval of random duration between 500 and 750 msec. If participants moved their eyes, gave a wrong answer, or failed to answer within 3000 msec, an error message was displayed and the trial was appended to the end of the trial list, to be presented again at the end of the block.
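The trial procedure, including the recycling of invalid trials, can be pictured as a simple queue. The following is a minimal sketch under our own assumptions, not the original PsychoPy script: `run_block` and the `present_trial` callback are hypothetical names, no stimuli are drawn, and the fixation and deadline constants are recorded only for reference.

```python
import random
from collections import deque

# Timing parameters from the text, in milliseconds (documented only; no display code here).
FIXATION_MS = 500            # fixation cross before the array
RESPONSE_DEADLINE_MS = 3000  # array stays up until a response or this deadline
ITI_RANGE_MS = (500, 750)    # inter-trial interval, drawn uniformly on every trial


def run_block(trials, present_trial, rng):
    """Run one block, re-queuing any invalid trial at the end of the block.

    present_trial is a hypothetical callback that shows one trial and returns
    True for a valid response (correct, within the deadline, no eye movement)
    and False otherwise.
    """
    queue = deque(trials)
    log = []
    while queue:
        trial = queue.popleft()
        valid = present_trial(trial)
        iti = rng.uniform(*ITI_RANGE_MS)  # random ITI before the next trial
        log.append((trial, valid, iti))
        if not valid:
            queue.append(trial)  # recycle: present again at the end of the block
    return log


# Usage: trial "b" fails on its first attempt only, so it is repeated last.
attempts = {}


def fake_present(trial):
    attempts[trial] = attempts.get(trial, 0) + 1
    return not (trial == "b" and attempts[trial] == 1)


log = run_block(["a", "b", "c"], fake_present, random.Random(1))
print([trial for trial, _, _ in log])  # ['a', 'b', 'c', 'b']
```

A queue keeps the recycling rule explicit: a block only ends once every trial has received a valid response, which is why this applies only to the lab experiments that recycled error trials.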
On each trial, the display consisted of either circles and one diamond or diamonds and one circle, and the shapes were equally likely to be red or green. On 67% of the trials a color singleton was present: one of the non-target shapes was drawn in a different color (red or green) from all the other shapes. During the training phase, the color singleton was presented more often in one specific location than in all other locations; it was 13 times more likely to appear in this high-probability location than in any one of the other seven locations. The color singleton appeared in this location on 65% of the distractor-present trials (52 trials) and was evenly distributed among the remaining seven locations on the remaining trials (four presentations each). The high-probability location could only be one of the four cardinal locations (top, bottom, left or right) and was never one of the in-between locations (top-left, top-right, bottom-left, bottom-right).⁴ During the final two blocks, in which participants performed the singleton search task, all display elements were equally likely to appear at each location (i.e., there was no high-probability location). The distractor appeared in each location the same number of times (10 times per location).

² We report how we determined our sample size, all data exclusions (if any), all inclusion/exclusion criteria, whether inclusion/exclusion criteria were established prior to data analysis, all manipulations, and all measures in the study.

³ This sample size was chosen based on those used in previous experiments on SL using the additional singleton paradigm (Wang & Theeuwes, 2018a, 2018b, 2018c). A post-hoc sensitivity analysis for the critical one-tailed t-test comparing high- and low-probability distractor trials, with an alpha of .05 and a power of .8, indicated sensitivity to effects of Cohen's d = .512 and above, meaning that this design was best suited for detecting medium-to-large effects.
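The trial counts above can be checked by constructing a block explicitly. This sketch assumes 120 trials per block, which is our inference from the 600 trials over five blocks mentioned for Experiment 4 (so 40 distractor-absent trials per block); the location indexing (0 = top, clockwise) and the function names are hypothetical.

```python
import random
from collections import Counter

N_LOCATIONS = 8
CARDINAL_LOCATIONS = (0, 2, 4, 6)  # hypothetical indexing: 0 = top, then clockwise
TRIALS_PER_BLOCK = 120             # assumption: 600 trials / 5 blocks


def make_training_block(hp_location, rng):
    """Distractor location per trial (None = distractor absent), training phase."""
    if hp_location not in CARDINAL_LOCATIONS:
        raise ValueError("high-probability location must be a cardinal location")
    trials = [hp_location] * 52                 # high-probability location
    for loc in range(N_LOCATIONS):
        if loc != hp_location:
            trials += [loc] * 4                 # seven low-probability locations x 4
    trials += [None] * (TRIALS_PER_BLOCK - len(trials))  # distractor-absent trials
    rng.shuffle(trials)
    return trials


def make_test_block(rng):
    """Test phase: every location used equally often (10 each), no regularity."""
    trials = [loc for loc in range(N_LOCATIONS) for _ in range(10)]
    trials += [None] * (TRIALS_PER_BLOCK - len(trials))
    rng.shuffle(trials)
    return trials


counts = Counter(make_training_block(hp_location=0, rng=random.Random(0)))
present = TRIALS_PER_BLOCK - counts[None]
print(present / TRIALS_PER_BLOCK)  # share of distractor-present trials, about 0.667
print(counts[0] / present)         # share at the high-probability location: 0.65
print(counts[0] / counts[1])       # high vs. one low-probability location: 13.0
```

Under this assumption the stated percentages fit together: 80 of 120 trials contain a distractor (about 67%), 52 of those 80 fall at the high-probability location (65%), and 52 versus 4 gives the 13:1 ratio.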

Results
Following the methods of Wang and Theeuwes (2018a), all trials with reaction times (RTs) slower than 2,000 msec (2.39%) were excluded from the analysis. Trials faster than 300 msec (.64%) were also excluded, as such RTs were deemed too fast for the task to have been performed properly. Furthermore, participants with error rates higher than 30% were to be removed (in this experiment, no participants were removed).
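The trial- and participant-level criteria can be written as two small filters. This is a sketch only: the trial representation (dicts with `rt` in msec and a boolean `correct`) and the helper names are hypothetical, and computing the error rate over all recorded trials is our assumption, since the text does not state which trials enter that rate.

```python
RT_MIN_MS = 300        # faster trials are deemed too fast to reflect the task
RT_MAX_MS = 2000       # slower trials are excluded (following Wang & Theeuwes, 2018a)
MAX_ERROR_RATE = 0.30  # participants above this error rate are removed


def trim_trials(trials):
    """Split trials into kept, too-fast and too-slow lists by RT."""
    kept = [t for t in trials if RT_MIN_MS <= t["rt"] <= RT_MAX_MS]
    too_fast = [t for t in trials if t["rt"] < RT_MIN_MS]
    too_slow = [t for t in trials if t["rt"] > RT_MAX_MS]
    return kept, too_fast, too_slow


def error_rate(trials):
    """Proportion of incorrect trials (assumed to be computed over all trials)."""
    return sum(not t["correct"] for t in trials) / len(trials)


def keep_participant(trials):
    return error_rate(trials) <= MAX_ERROR_RATE


example = [
    {"rt": 250, "correct": True},
    {"rt": 640, "correct": True},
    {"rt": 710, "correct": False},
    {"rt": 2450, "correct": True},
]
kept, too_fast, too_slow = trim_trials(example)
print(len(kept), len(too_fast), len(too_slow))  # 2 1 1
print(keep_participant(example))                # True (error rate 0.25)
```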
Because there were only a few test-block trials in which the distractor appeared in the previously high-probability location during the search task (ten per block), and because we excluded trials that were too slow or too fast, a participant's average RT for high-probability trials could end up being based on very few trials. For this reason, we excluded from a given block any participant who lost more than three of these trials for being too slow or too fast (so that averages were calculated on seven or more trials). This resulted in one participant's data being excluded from block four and one from block five. Because these participants were often perfectly fine in all other respects, we did not exclude their entire dataset from all analyses, but only their data for the blocks in which it was insufficient. For the combined analysis of the testing blocks (blocks 4 & 5), any participant excluded from either testing block for having too many high-probability trials removed was excluded from the combined analysis altogether, so that each block was equally represented.⁵

Fig. 2 shows the RTs when the color singleton distractor was presented in the high-probability location, in a low-probability location, or was absent. A one-way repeated measures analysis of variance (ANOVA) on mean RTs with distractor condition (high-probability location, low-probability locations, no distractor) as the factor showed a main effect of distractor condition […].⁶ However, planned comparisons between high-probability and low-probability distractor locations showed no significant difference in RTs when the distractor was presented in the previously high-probability location [one-tailed t (22) …].

A further analysis was undertaken in which each test block was analyzed in isolation (Fig. 3). The reason for this extra step was to see whether SL was present in the early part of testing (block four), as these effects are known to fade quickly as evidence accumulates against the existence of a high-probability location. For example, Wang and Theeuwes (2020) showed that participants start to reorient quite rapidly (within a few trials) when the high-probability location changes between blocks. Two new, unplanned repeated measures ANOVAs were conducted on the RTs of blocks four and five independently. Significant main effects were found for the RTs in blocks four and five [F (1.415, 32.55) …]. Critically, however, one-tailed t-tests showed a marginally significant difference in RTs for distractors presented in the previously high-probability location in block four [one-tailed t (23) = 1.703, p = .051, d = .348] but not in block five [one-tailed t (23) = −1.481, p = .924]. This suggests that participants tended to suppress the distractor at the previously trained high-probability location in the block immediately following training, a tendency that faded rapidly and was absent one block later.

Fig. 1. Each experiment started with three training blocks in which participants decided whether the elements were arranged in a global circle (left) or a global diamond (right). In these displays, and only during training, the color singleton was presented much more often in one location than in all other locations. Training blocks were followed by two testing blocks in which participants performed the classic additional singleton paradigm: searching for a shape singleton while ignoring the color singleton. During the testing blocks, the singleton was equally likely to appear at any of the locations and the display was always arranged in a global circle. The target shape identity and color, as well as target line orientation, were random on every trial.

⁴ This was done because the cardinal locations occupied exactly the same screen positions in both the circle and the diamond arrays.

⁵ Because two participants were included in the individual block analyses but not in the combined block analysis, the results of the combined testing block analysis should not be taken as a direct synthesis of the two individual block analyses.

⁶ Many of our ANOVAs failed Mauchly's test of sphericity, necessitating corrections. This was likely due to the disparity in trial counts between high-probability distractor trials and low-probability/no-distractor trials. In all further analyses, if sphericity was violated, Greenhouse-Geisser corrected F values were used, and the approximate chi-squared, its significance value and epsilon are reported in brackets following the main analysis.
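The Greenhouse-Geisser correction mentioned in the footnote on sphericity multiplies both degrees of freedom by an estimate ε. The sketch below computes ε from a participants × conditions table of mean RTs using orthonormal Helmert contrasts. It is an illustrative re-implementation, not the software used for the reported analyses, and it omits Mauchly's test itself; all function names are our own.

```python
import math
from statistics import mean


def covariance_matrix(data):
    """Sample covariance (n - 1 denominator) of the k condition columns."""
    n, k = len(data), len(data[0])
    col_means = [mean(row[j] for row in data) for j in range(k)]
    return [
        [sum((row[a] - col_means[a]) * (row[b] - col_means[b]) for row in data) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]


def helmert_contrasts(k):
    """k - 1 orthonormal contrasts, each orthogonal to the constant vector."""
    rows = []
    for i in range(1, k):
        row = [1.0] * i + [-float(i)] + [0.0] * (k - i - 1)
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row])
    return rows


def greenhouse_geisser_epsilon(data):
    """epsilon = tr(S)^2 / ((k - 1) * tr(S S)), with S = C Sigma C^T."""
    k = len(data[0])
    m = k - 1
    sigma = covariance_matrix(data)
    c = helmert_contrasts(k)
    c_sigma = [[sum(c[i][a] * sigma[a][b] for a in range(k)) for b in range(k)] for i in range(m)]
    s = [[sum(c_sigma[i][b] * c[j][b] for b in range(k)) for j in range(m)] for i in range(m)]
    trace = sum(s[i][i] for i in range(m))
    trace_sq = sum(s[i][j] * s[j][i] for i in range(m) for j in range(m))
    return trace ** 2 / (m * trace_sq)


def corrected_dfs(epsilon, k, n):
    """Corrected numerator and denominator df for a one-way repeated measures ANOVA."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# Perfectly spherical toy data (4 participants x 3 conditions) gives epsilon = 1.
spherical = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
print(greenhouse_geisser_epsilon(spherical))
# With 3 conditions, 25 participants and epsilon = .794, as in Experiment 2:
print(corrected_dfs(0.794, k=3, n=25))  # approximately (1.588, 38.112)
```

ε ranges from 1/(k − 1) (maximal violation) to 1 (sphericity holds). Applying ε = .794 to the uncorrected df of (2, 48) gives approximately the F (1.587, 38.1) reported for Experiment 2, up to rounding of ε.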

Discussion
Our pre-planned analyses showed no significant difference between RTs on trials in which the distractor was presented in the previously high-probability location and those on which it was presented elsewhere. However, further unplanned analyses of the individual testing blocks suggested that some learning may have taken place, but that this learning quickly became extinct once the regularities were no longer in place. It was recently shown that participants adjust very quickly to changed statistical contingencies in the additional singleton task when the high-probability location moves from one location to another (Wang & Theeuwes, 2020). However, Britton and Anderson (2020) recently showed that a learned statistical bias can still be detectable after 180 trials of the additional singleton task without any high-probability location. We therefore designed Experiment 2 to determine how quickly a learned contingency should be expected to fade, once the regularities are no longer in place, in our specific task and setup. We first ensured that participants learned the contingency, establishing clear suppression of the high-probability location. We then changed the contingencies so that the distractor was equally likely to appear at each location, and observed how quickly the suppression effect waned. We expected the effect to be present at least in the block immediately following training, but were unsure how long it would linger once the contingencies were no longer in place.

Experiment 2
Experiment 2 was intended to determine how quickly SL dissipates after a training phase in which we knew SL should occur. In Experiment 2, participants completed nine blocks. Unlike in Experiment 1, all blocks in Experiment 2 required participants to perform the additional singleton search. However, only the first six blocks used a high-probability distractor location, while the final three presented singletons with no regularity whatsoever. Because the learning blocks used the additional singleton search task, this design arguably engaged top-down attention, at least indirectly.

Material and methods
A new set of 25 naïve participants (22 female, mean age 20.55 years) was recruited in exactly the same way as in Experiment 1. Experiment 2 began with an exact replication of Wang and Theeuwes (2018a), in which six blocks were presented with a high-probability distractor location (training blocks). Three additional blocks were appended to the end of the experiment, in which participants continued to perform the singleton search task but distractors were presented equally often at all locations (testing blocks). In this experiment we no longer used an eye tracker, as eye movement errors had been relatively rare (~5% of trials in Experiment 1).

Results
The same exclusion criteria as in Experiment 1 were used in this experiment. Of all trials, 3.74% were excluded for being slower than 2,000 msec and .06% for being faster than 300 msec. Again, no participants were excluded for having error rates above 30%. One participant's data were excluded from block eight for having too few included trials in which the distractor appeared in the previously high-probability location, and one participant's data were excluded from block nine for the same reason. Again, these participants' data were excluded from the combined analysis of the three testing blocks. A repeated measures ANOVA on the RTs in the training blocks (blocks 1–6) with distractor condition (three levels) as the factor showed a significant main effect [F (1.587, 38.1) = 124.87, p < .001, η² = .106 (χ² = 6.923, p = .031, ε = .794)], and the comparison between trials in which the distractor was presented in the high- vs. low-probability locations was also significant [one-tailed t (24) = 6.609, p < .001, d = 1.322], thereby replicating the results of Wang and Theeuwes (2018a) and clearly demonstrating that participants had learned the regularities in the six training blocks (Fig. 4, training column).

Discussion
The results of Experiment 2 showed that the learned attentional bias dissipates when the regularities are no longer in place. While participants showed a strong learned bias in the first block of testing, this difference was no longer reliable after two blocks of testing. The fact that the effect quickly dissipates once the contingencies are no longer in place, even after extensive training in which participants develop a strong bias to suppress the high-probability location, puts the results of Experiment 1 in a different light. Indeed, the indication of suppression of the high-probability location, even when participants simply judged the spatial arrangement of the display elements during training, suggests that some learning took place. In Experiment 4 we revisit this issue with a much larger sample. Before conducting that experiment, we first sought to replicate the results of Experiment 2 using an online procedure and the reduced block count of Experiment 1.

Experiment 3
In Experiment 2 we employed six blocks of training, whereas Experiment 1 had only three. One may wonder how much learning would have taken place had only three blocks of training been given. In Experiment 3 we therefore ran three blocks of training involving visual search with the statistical regularity, followed by two blocks of testing in which the regularities were no longer in place. Additionally, we chose to conduct this study online to determine whether the basic Wang and Theeuwes effect could be replicated in an online study.

Material and methods
Thirty anonymous naïve participants were recruited online through Prolific (www.prolific.co), a web-based platform for online experiments. To convert the original Python experiment into a form that can run on JavaScript-based online platforms, the experiment builder OpenSesame was used (Mathôt, Schreij, & Theeuwes, 2012). The stimuli and procedure were reproduced as closely as possible in the new script. All participants gave informed consent before beginning. The experiment lasted approximately 30 min and participants were paid £3.15 for their time. Participants were first instructed to find a quiet environment in which to perform the experiment and were asked to run it in full-screen mode. Following 12 practice trials, participants performed the singleton search task for five blocks. The first three blocks used a high-probability location, while the last two presented the color singleton with no regularity. Additionally, unlike in the previous two experiments, error trials were no longer recycled to the end of each block.

Results
Exactly the same exclusion criteria as in the previous two experiments were used in this experiment. Of all trials, 8.37% were excluded for being slower than 2,000 msec and .4% for being faster than 300 msec; three participants were excluded for having an error rate above 30%.
Because error trials were not recycled in this version of the experiment, more participants had fewer than seven usable test-block trials in which the distractor was presented in the previously high-probability location. For this reason, two participants' data were excluded from the analysis of block four and four participants' data from block five. Again, these participants were excluded from the combined testing block analysis.
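The block-level inclusion rule (at least seven usable trials with the distractor at the previously high-probability location, and inclusion in every test block for the combined analysis) can be sketched as follows. Treating "usable" as correct and within the 300–2,000 msec window is our assumption, as are the trial fields (`distractor`, `correct`, `rt`) and function names.

```python
MIN_HP_TRIALS = 7  # a block mean must rest on at least seven trials


def usable_hp_trials(block):
    """Count correct, RT-valid trials with the distractor at the (previously) high-probability location."""
    return sum(
        1
        for t in block
        if t["distractor"] == "high" and t["correct"] and 300 <= t["rt"] <= 2000
    )


def inclusion(blocks_by_number, test_blocks):
    """Per-block inclusion flags, plus inclusion in the combined analysis (all blocks required)."""
    per_block = {b: usable_hp_trials(blocks_by_number[b]) >= MIN_HP_TRIALS for b in test_blocks}
    return per_block, all(per_block.values())


good = [{"distractor": "high", "correct": True, "rt": 600}] * 8
poor = ([{"distractor": "high", "correct": True, "rt": 600}] * 6
        + [{"distractor": "high", "correct": False, "rt": 600}] * 4)
per_block, combined = inclusion({4: good, 5: poor}, test_blocks=(4, 5))
print(per_block, combined)  # {4: True, 5: False} False
```

Requiring inclusion in every test block for the combined analysis keeps each block equally represented, at the cost of a smaller combined sample than in the individual block analyses.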
Our analysis mirrored that used in Experiment 2 exactly. A repeated measures ANOVA on the RTs in the training blocks with distractor condition (three levels) as the factor again showed a significant main effect [F (2, 52) = 74.46, p < .001, η² = .11], and the comparison between trials in which the distractor was presented in the high- vs. low-probability locations was also significant [one-tailed t (26) = 5.671, p < .001, d = 1.091], thereby once again replicating the results of Wang and Theeuwes (2018a) and demonstrating that this result can be observed in an online study.

Discussion
Experiment 3 shows that when only three blocks of training are given, suppression is found only in the block immediately following training and not in any subsequent block, essentially replicating the findings of Experiment 2. Furthermore, Experiment 3 demonstrated that online experiments are an acceptable means of studying the SL effect with our paradigm, producing reasonable and interpretable results that reproduce the effects seen in lab settings.

Experiment 4
The results of Experiment 1 suggested that participants may have learned the regularities of the colored distractor during the training blocks even though this was not part of their task. However, the results indicated only a trend. We therefore re-ran the experiment online with a much larger sample.

Material and methods
A total of 120 anonymous naïve participants were again recruited via the same web service as in Experiment 3. All participants gave informed consent before beginning. The experiment again lasted approximately 30 min and participants were paid £3.15 for their time. A post-hoc sensitivity analysis for the critical one-tailed t-test comparing reaction times on high- and low-probability distractor trials, with an alpha of .05 and a power of .8, revealed that this sample size was sensitive to effects of Cohen's d = .228 and above.
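The sensitivity values can be approximated with the standard library alone. This is a sketch, not the tool used for the reported analysis: it finds the critical t numerically and then applies a classic normal approximation to the noncentral t distribution to solve for the smallest detectable d_z of a one-tailed paired t-test. Taking n = 25 for Experiment 1 is our inference from the degrees of freedom reported there; it is not stated in this section.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the central t distribution (log-gamma for numerical stability)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """Central t CDF for x >= 0 via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Upper quantile (p > 0.5) of the central t distribution by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def min_detectable_d(n, alpha=0.05, power=0.80):
    """Smallest Cohen's d_z detectable by a one-tailed paired t-test with n participants."""
    df = n - 1
    t_crit = t_ppf(1 - alpha, df)
    z_power = NormalDist().inv_cdf(power)
    # Normal approximation to the noncentral t: solve P(T' > t_crit) = power for delta.
    delta = t_crit * (1 - 1 / (4 * df)) + z_power * math.sqrt(1 + t_crit ** 2 / (2 * df))
    return delta / math.sqrt(n)


print(round(min_detectable_d(25), 3))   # 0.512
print(round(min_detectable_d(120), 3))  # 0.228
```

Under these assumptions the approximation reproduces both sensitivity values reported in the text (.512 and .228) to three decimals; an exact noncentral t computation may differ slightly in the last digit.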

Results
Although 120 participants completed our experiment online, a bug in our code caused some participants' sessions to end before all 600 trials had been completed. As a result, 16 participants were excluded from the analysis of block five for having completed fewer than 75% of its trials. These participants' data were not excluded from the other analyses, as they were often complete in all other respects. Otherwise, the same exclusion criteria as in Experiment 3 were applied: 2.46% of trials were excluded for being slower than 2,000 msec, .11% for being faster than 300 msec, and 12 participants were excluded for having an error rate above 30%. A large number of participants were excluded from the individual block analyses for having too few correct trials in which the distractor was in the previously high-probability location: 28 were excluded from block four and a different 28 from block five. Presumably these large numbers reflect fatigue from running a psychophysical experiment online.⁷ Again, these participants were excluded from the combined analysis of the testing blocks.
A repeated measures ANOVA on the RTs in the testing blocks with distractor condition (three levels) as the factor again showed a significant main effect of distractor condition [F (1.658, 89.552) = 69.267, p < .001, η² = .068 (χ² = 12.253, p = .002, ε = .829)]. Additionally, this analysis did find a significant difference between high- and low-probability distractor trials [one-tailed t (54) = 2.239, p = .015, d = .302; see Fig. 8].

⁷ Interestingly, a much smaller percentage of participants struggled in the later blocks of Experiment 3, which was also online and presumably involved a more difficult task. Our interpretation is that the effort of suddenly switching from a very easy task to a much harder one late in an experiment caused more participants to struggle than simply performing one hard but consistent task for the entire experiment.
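The critical comparison is a paired t-test on per-participant mean RTs. The reported effect sizes appear consistent with Cohen's d_z = t / √n; for example, 2.239 / √55 ≈ .302, taking n = 55 from the reported 54 degrees of freedom. The sketch below computes t, df and d_z from two hypothetical lists of mean RTs; the function name and the toy data are our own.

```python
import math
from statistics import mean, stdev


def paired_one_tailed_t(low_rts, high_rts):
    """Paired t, df and Cohen's d_z for low- minus high-probability mean RTs.

    Each list holds one mean RT per participant, in the same participant order.
    A positive t means faster responses when the distractor was at the
    high-probability location (i.e., suppression of that location).
    """
    diffs = [lo - hi for lo, hi in zip(low_rts, high_rts)]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)  # standardized mean difference
    t = d_z * math.sqrt(n)            # equivalent to mean / (sd / sqrt(n))
    return t, n - 1, d_z


t, df, d_z = paired_one_tailed_t([610, 620, 630, 640, 650], [609, 618, 627, 636, 645])
print(df, round(d_z, 3), round(t, 3))  # 4 1.897 4.243
print(round(2.239 / math.sqrt(55), 3))  # 0.302
```

The one-tailed p value would then be taken from the upper tail of the t distribution with n − 1 degrees of freedom.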
Additionally, an unplanned repeated measures ANOVA with distractor condition (three levels) as factor was conducted on the training blocks and, surprisingly, revealed an extremely weak but highly significant effect of distractor condition in these blocks as well [F(1.825, 171.57) …]: with a large enough sample, the same pattern of results as in the testing blocks began to emerge, despite a total lack of top-down attention directed to the regularities of the colored singleton.
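The fractional degrees of freedom in both ANOVAs are consistent with a sphericity correction in which the uncorrected df, (k − 1) and (k − 1)(n − 1), are multiplied by ε. For the testing blocks, with k = 3 conditions, n = 55 participants and ε ≈ .829, this gives df₁ = 2ε ≈ 1.658 and df₂ = 108ε ≈ 89.5, matching the reported values up to the rounding of ε. The sketch below assumes the Greenhouse–Geisser estimator, computed from the double-centered covariance matrix of the condition scores; it is illustrative, and the helper names are hypothetical.

```python
def double_center(S):
    """Subtract row and column means and add back the grand mean (k x k matrix)."""
    k = len(S)
    row = [sum(r) / k for r in S]
    col = [sum(S[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row) / k
    return [[S[i][j] - row[i] - col[j] + grand for j in range(k)] for i in range(k)]


def gg_epsilon(S):
    """Greenhouse-Geisser epsilon from the k x k covariance matrix of the conditions.

    Ranges from 1 / (k - 1) (maximal violation of sphericity) to 1 (sphericity holds).
    """
    k = len(S)
    D = double_center(S)
    trace = sum(D[i][i] for i in range(k))
    sum_sq = sum(x * x for r in D for x in r)  # equals trace(D @ D) for symmetric D
    return trace ** 2 / ((k - 1) * sum_sq)


def corrected_df(eps, k, n):
    """Sphericity-corrected (numerator, denominator) df, one-way repeated measures."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)


# Reported testing-block ANOVA: k = 3 conditions, n = 55 participants, eps = .829.
df1, df2 = corrected_df(0.829, k=3, n=55)
print(round(df1, 3), round(df2, 1))  # 1.658 89.5
```

A Huynh–Feldt correction would scale the df the same way but with a different ε estimate; the source does not say which estimator was used.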

6. General discussion
The question we addressed in this study was whether participants learn statistical regularities present in the environment even when these regularities are not part of the top-down task set. In other words, do participants learn regularities simply because they are exposed to them, even when attending to these regularities is unrelated to their current task and goals? The results of Experiments 1 and 4 indicate that even under these circumstances participants do learn these contingencies. We show that after being exposed to these contingencies during an unrelated task in which there are neither targets nor distractors, participants showed a suppression effect from the previously learned contingencies when switching to a task in which they search for a target and suppress a distractor. During search, when participants look for a shape singleton target while ignoring a color singleton (the additional singleton task), they are better able to ignore the color singleton when it appears in the location where it was more likely to appear during the previous, unrelated training block. This was despite the fact that during search the distractor was equally likely to appear at any of the eight locations, so the previously learned attentional bias provided no behavioral benefit. The current study also shows that the effects are small and wane quickly once these regularities are no longer in place. Critically, Experiments 2 and 3 show that this quick waning also occurs when there is clear evidence that the regularities were well learned during training, when they were relevant for the task. Indeed, even when participants show strong evidence of suppressing the high-probability location during training, the effect quickly wanes and lasts only about 120 trials once the contingencies are no longer in place, thereby replicating the approximate timescale found by Britton and Anderson (2020).
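One way to track how quickly the learned suppression wanes is to compute the suppression effect (mean RT on low-probability minus high-probability distractor trials) in consecutive bins of trials. The sketch below is a hypothetical illustration, not the authors' analysis: the bin size of 120 trials merely echoes the timescale mentioned above, and the `(condition, rt_ms)` coding is an assumed data format.

```python
from statistics import mean


def suppression_by_bin(trials, bin_size=120):
    """Mean RT(low-probability) minus mean RT(high-probability) per bin of trials.

    `trials` is a chronologically ordered list of (condition, rt_ms) pairs, where
    condition is "high" or "low" (hypothetical coding; no-distractor trials omitted).
    Bins lacking either condition yield None.
    """
    effects = []
    for start in range(0, len(trials), bin_size):
        chunk = trials[start:start + bin_size]
        high = [rt for cond, rt in chunk if cond == "high"]
        low = [rt for cond, rt in chunk if cond == "low"]
        effects.append(mean(low) - mean(high) if high and low else None)
    return effects
```

A positive value in a bin indicates residual suppression of the (previously) high-probability location; values near zero in later bins would correspond to the waning described above.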
The current study's finding is consistent with a number of recent studies asserting that VSL can occur even when attention is not focused on the task at hand. Both Musz et al. (2015) and Campbell et al. (2012) found evidence for attention-free VSL using a serial shape presentation task in which participants were instructed to attend only to a subset of shapes (e.g., only the red shapes). Both studies found significant familiarity scores as well as speeded RTs for trained shape contingencies, but not for their foil counterparts, regardless of whether the shapes belonged to the attended set. However, in both studies the effect was stronger for attended than for unattended shapes, highlighting the beneficial role of top-down attention in VSL. Forest and Finn (2018) found a similar beneficial role for top-down attention using rapidly changing shapes embedded within changing colored squares, finding that the magnitude of SL depended strongly on the instruction set. Participants learned attended regularities better than both unattended regularities and regularities viewed passively with no explicit instructions. Participants also learned the passively viewed regularities better than the unattended regularities, showing that VSL scaled with the availability and orientation of attention. The above studies all used a form of serial shape presentation task, making our study the first in this body of work to demonstrate implicit VSL with an attention-based search task.
Additionally, these findings complement the work on auditory statistical learning, where it has been found that SL occurs without directed attention (e.g., Saffran et al., 1997; Batterink & Paller, 2019), though it has also been suggested that SL improves when the stimuli are part of the top-down attentional set (Toro et al., 2005).
It is important to emphasize that these results do not suggest that participants picked up the statistical regularities of their environment in the absence of attention, but rather that they did so in the absence of goal-directed top-down attention. The task used in this experiment inherently required participants to attend to the contingent stimuli, assuming participants divided their attention across the visual field, employing what has been called a large "attentional window" (Belopolsky, Zwaan, Theeuwes, & Kramer, 2007; Treisman, 2006). Even though the color singleton was not part of the top-down set, it is feasible that it captured attention during training, when participants judged the overall arrangement of the elements in the visual field. Indeed, it is known that in the additional singleton paradigm attention is captured exogenously in a bottom-up way (Theeuwes, 1991, 1992), suggesting that during training attention may have been captured by the color singleton, giving rise to the observed SL (as supported by the finding in Experiment 4 that a small but significant speedup was observed in the training phase when a large enough sample was available for analysis). Furthermore, Zhao, Al-Aidroos, and Turk-Browne (2013) showed that attention was implicitly oriented towards a location containing a structured stream of visual items relative to a location containing a random stream, even though the regularity itself was irrelevant to the task at hand, suggesting that attention can indeed be automatically attracted towards environmental regularities. This capture of attention may in turn lead to implicit learning that the color singleton is presented more often in one location than in all other locations, which can then be teased out in the test phase, in which the color singleton becomes the distractor and is thus implicitly connected to the top-down attentional set.
In other words, while our results clearly show that SL may occur in the absence of top-down attention, we cannot rule out that bottom-up capture may still play a role.
The current results are related to studies investigating the effect of habituation on attentional capture. Dating back to Sokolov (1963), it has been suggested that, due to repetitive exposure, the orienting of attention towards salient stimuli may habituate. For example, several recent studies have shown that habituation can lead to attenuated capture effects, whether from irrelevant sudden onsets (Chelazzi, Marini, Pascucci, & Turatto, 2019; Turatto, Bonetti, Pascucci, & Chelazzi, 2018; Turatto & Pascucci, 2016) or from irrelevant color distractors during serial search (De Tommaso & Turatto, 2019; Won & Geng, 2020). Specifically, De Tommaso and Turatto (2019) demonstrated the potential of reinterpreting the results of attentional capture research as cases of habituation, showing that distractor suppression could be learned only when a feature search mode was engaged, not when participants employed the singleton detection mode. In contrast to De Tommaso and Turatto (2019), the current study did find a reduction in attentional capture even though the singleton detection mode was employed. Note, however, that the reduction in capture was observed for color distractors presented at the high-probability location, suggesting that during singleton search, habituation may only be possible when it concerns spatial regularities.
Generally, it is believed that any repeatedly presented stimulus should become habituated, leading to a reduced attentional orienting response that should, in principle, produce faster reaction times in any concurrent task (Sokolov, 1963). Thus, a habituation interpretation of the current study would have predicted RT differences in the training phase as well as in the testing phase, as was found in the unplanned analysis of Experiment 4's training blocks. Furthermore, while research has shown habituated responses lasting over long time periods (e.g., … showed a complete habituation of attentional capture that lasted for at least two weeks after training), other results have shown that habituated responses can extinguish rapidly under certain conditions, given enough intervening trials in which the habituated element is absent (Turatto & Pascucci, 2016, Experiment 3b).
In the current study we pushed the limits of statistical learning and tested whether participants can learn statistical regularities simply by being passively exposed to them. When performing a passive task with neither targets nor distractors, participants still showed a suppression effect from the previously learned contingencies when they subsequently searched for a shape singleton while ignoring the color singleton. We conclude that participants learned the statistical regularities of the singleton presentations despite such learning having no explicit or implicit connection to a top-down attentional set, thereby demonstrating that visual statistical learning can occur in the absence of top-down attention.

Author notes
JT was supported by a European Research Council advanced grant 833029 [LEARNATTEND]. Correspondence should be
Cortex 131 (2020) 54–65