In search of oculomotor capture during film viewing: Implications for the balance of top-down and bottom-up control in the saccadic system

In the laboratory, the abrupt onset of a visual distractor can generate an involuntary orienting response: this robust oculomotor capture effect has been reported in a large number of studies (e.g. Ludwig & Gilchrist, 2002; Theeuwes, Kramer, Hahn, & Irwin, 1998), suggesting it may be a ubiquitous part of more natural visual behaviour. However, the visual stimuli used in these experiments have tended to be static and have had none of the complexity and dynamism of more natural visual environments. In addition, the primary task in the laboratory (typically visual search) can be tedious for the participants, who may lose interest and become more stimulus driven and more easily distracted. Both of these factors may have led to an overestimation of the extent to which oculomotor capture occurs and of the importance of this phenomenon in everyday visual behaviour. To address this issue, in the current series of studies we presented abrupt and highly salient visual distractors away from fixation while participants watched a film. No evidence of oculomotor capture was found. However, the distractor does affect fixation duration: we find an increase in fixation duration analogous to the remote distractor effect (Walker, Deubel, Schneider, & Findlay, 1997). These results suggest that during dynamic scene perception, the oculomotor system may be under far more top-down control than traditional laboratory-based tasks have previously suggested.


Introduction
Theories of attention seek to explain how observers manage to selectively process the bombardment of visual information reaching the brain from the world. A useful distinction is made in this field between top-down processing of information, which is influenced by the goal of the observer, and bottom-up processing, which depends directly on the external stimuli (Posner, 1980). One major focus for attention researchers over the past twenty-five years has been the extent to which visual attention is guided by top-down or bottom-up processes (for reviews see: Egeth & Yantis, 1997; Theeuwes, 2004; Yantis, 1993; Yantis & Jonides, 1996). The bottom-up position, or Attentional Capture Hypothesis, states that fast attentional capture is an automatic process driven by saliency (see Itti, Koch, & Niebur, 1998), and that this cannot be overridden by the slower process of top-down control (Theeuwes, 2004). The top-down position, or Contingent Capture Hypothesis, maintains that the top-down goals of the observer can be used to filter, modulate or override any bottom-up process. In this model attentional capture will only occur if the stimulus shares characteristics (e.g. colour, onsets, or size) with the goals of the observer (Becker, Folk, & Remington, 2010; Folk & Remington, 1998; Folk, Remington, & Johnston, 1992; Folk, Remington, & Wright, 1994; Ludwig & Gilchrist, 2002).
Experimentally, these capture processes have been studied by looking at how often irrelevant visual distractors interfere with a primary task. For example, the primary task might be a visual search task, and the key measure is the extent to which the onset of an irrelevant distractor interferes with this primary task. Many of these studies have looked at the effect of the irrelevant distractor on response times to find the search target (Egeth & Yantis, 1997; Yantis & Jonides, 1996). However, a more direct measure of attentional capture is the effect on eye movements during the task. In a now classic and much cited study, Theeuwes, Kramer, Hahn, and Irwin (1998) had participants do a simple visual search task. On occasional trials the appearance of the target was accompanied by the onset of an irrelevant distractor. They found that on about half of these trials, the onset of the irrelevant distractor interfered with the saccade towards the target: the participants made a saccade directly to the distractor. This oculomotor capture effect is seen as strong evidence that spatial attention, as measured by the allocation of saccades, was heavily influenced by bottom-up signals. The influence of this result stems from both the clear demonstration of oculomotor capture and the frequency with which capture occurred in this paradigm.
The deficiency of traditional laboratory approaches to the study of attention has been reported by Lorist (2008). In a study of a timed trial task, Lorist reported that top-down attention was a casualty, whereas automatic attention continued to operate. However, film is a stimulus that holds viewer attention for prolonged periods of time and has been advocated as an ideal stimulus for psychological studies of attention (see Cutting, Brunick, DeLong, Iricinschi, & Candan, 2011; Smith, 2012).
Our goal in the current experiments was to determine whether oculomotor capture is a phenomenon that occurs in more naturalistic viewing situations. First, it is important to know the extent to which oculomotor capture is part of the normal behavioural repertoire or just a laboratory phenomenon. Second, such a result will contribute to our understanding of the extent to which oculomotor control is normally under top-down control.
A number of factors may determine the extent of capture. The first is the state of the system. In a series of experiments Yantis and Johnston (1990) found that attentional capture did not occur when the observer's covert attention was already spatially focused. Godijn and Theeuwes (2002) later proposed that, for oculomotor capture to occur, the oculomotor system should be in a disengaged state, in which fixation cells are inhibited. This is supported by results reported by Tse, Sheinberg, and Logothetis (2002), who found that oculomotor capture can only occur when the oculomotor system is in a state of preparation to make a saccade. It would seem then that there are two states for the oculomotor system during a visual task, one of which is susceptible to oculomotor capture and one that is not. During scanning behaviour the eyes typically move about 3 times a second and the saccadic system is likely to enter the disengaged state for between 60 and 100 ms before the next saccade is generated (e.g. Ludwig, Mildinhall, & Gilchrist, 2007). This suggests that during scanning we can expect the saccadic system to be disengaged, and therefore liable to capture, for up to about a third of the total viewing time.
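The arithmetic behind this estimate can be sketched as follows. This is only an illustration: the function name and the simplifying assumption of exactly one disengaged window before every saccade are ours, while the saccade rate (about 3 per second) and the 60 to 100 ms window are taken from the text.

```python
def disengaged_fraction(saccades_per_second, window_ms):
    """Fraction of viewing time spent disengaged, assuming (hypothetically)
    one disengaged window of `window_ms` before every saccade."""
    return saccades_per_second * window_ms / 1000.0

low = disengaged_fraction(3, 60)    # 3 windows of 60 ms in each second
high = disengaged_fraction(3, 100)  # 3 windows of 100 ms in each second
print(f"disengaged for {low:.0%} to {high:.0%} of viewing time")
```

Under these assumptions the estimate spans roughly a fifth to a third of viewing time, with "about a third" corresponding to the upper end of the range.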
The second factor that may be important is how engaged the participants are in the primary task. One inevitable characteristic of laboratory-based visual search tasks is that they are somewhat repetitive and tedious for the participant. Boksem, Meijman, and Lorist (2005) showed that participants respond more automatically to irrelevant stimuli as boredom sets in. This suggests that participants may, over time, cease to exert as much top-down control, and as a result such experiments may overestimate the extent to which the oculomotor system is under bottom-up control. At its most extreme this may suggest that oculomotor capture itself is a laboratory curiosity rather than a phenomenon that occurs in more naturalistic visual interactions with our environment.
One experience that is designed to be engaging and in which participants do not generally become bored is watching movies. The use of movies in psychology studies as an engaging stimulus was predicted by Münsterberg (1916), and a number of contemporary studies have used film as a platform to study cognition (Hasson, Nir, Levy, Fuhrmann, & Malach, 2004; Mital, Smith, Hill, & Henderson, 2011; Smith, Levin, & Cutting, 2012). It would seem therefore that film is an ideal stimulus for studying oculomotor capture, and more specifically for investigating capture when the task is both engaging and involves scanning behaviour that leaves the oculomotor system disengaged for a significant proportion of the time.
As well as re-directing gaze, irrelevant distractors, such as those used in oculomotor capture paradigms, can also affect the timing of ongoing oculomotor behaviour. The effect of task irrelevant distractors on saccade latency has been studied in basic oculomotor paradigms. In these paradigms the task for the participants is to make a saccade to a known target in a fixed location. If a distractor is presented simultaneously with, and close to, the target, there is no modulation of the latency; instead the saccade landing position is affected and falls in between the target and distractor: this is the Global Effect (Findlay, 1982). In contrast, when the distractor is presented at non-target locations in close temporal proximity to the target, the latency of the saccade to the target is increased; this is known as the remote distractor effect (Ludwig, Gilchrist, & McSorley, 2005; Walker & Benson, 2013; Walker, Kentridge, & Findlay, 1995; Walker, Deubel, Schneider, & Findlay, 1997; Weber & Fischer, 1994).
The Remote Distractor Effect is also thought to be related to the phenomenon known as Saccadic Inhibition (SI) (Reingold & Stampe, 1999, 2004). SI occurs with the presentation of the distractor at a location remote from the target. These studies do not simply look at average saccade latencies but instead examine the effect on the latency distributions. SI is characterised as a selective notch in the saccade latency distribution between 60 ms and 120 ms after distractor onset. Neural theories of saccadic control have been advanced to explain these effects, proposing that distractors slow down saccade processing by stimulating the network of medial layers in the superior colliculus (Findlay & Walker, 1999; Gandhi & Keller, 1997). These distractor paradigms have been applied to many different types of stimulus, including reading studies (Reingold & Stampe, 1999) and complex scene viewing, such as paintings (Graupner, Velichkovsky, Pannasch, & Marx, 2007; Pannasch, Schulz, & Velichkovsky, 2011). Note that all these effects lead to a modulation of the latency of the saccade following the onset of the distractor (which we will refer to as the temporal effect) but are distinct from effects that change the direction of the saccade (the spatial effect).
In this study we ask the question: what is the extent of oculomotor capture when participants are carrying out an interesting primary task, which presumably will motivate them to maximise top-down control and so not get distracted? The primary task chosen in this study was the watching of movies, which are (in general) specifically designed by the filmmakers to be engaging and interesting. In exactly the same way as previous oculomotor capture studies, we present irrelevant, salient distractors and measure the extent of oculomotor capture. We chose to allow the participants to free view the movies normally, to ensure that they would be in a state of oculomotor disengagement for some of the time and so that their engagement with the film content was uninhibited. Because the movies were free viewed we cannot predetermine fixation position, so we present the distractors in a fixation contingent manner to ensure that they are always displayed at the same eccentricity.

Experiment 1: Watching movies with gaze contingent distractors
In this experiment, four 10 min movie clips, extracted from different movies, were shown to participants. Gaze contingent, irrelevant distractors, which observers were told to ignore, appeared during the films at frequent, semirandom intervals. The experiment looked for both spatial and temporal effects on the oculomotor system during movie viewing. A spatial effect would be evident in a saccade towards the distractor. Temporal effects would be detected by a lengthening of fixation durations following the onset of the distractors. Both effects would indicate that the distractors were being processed by the oculomotor system, with the spatial effect corresponding to oculomotor capture. Additionally, we look for capture in the initial period of viewing, as observers may not be initially engaged in the movie and so bottom-up processes might initially dominate. We look for this effect by examining specifically the first 60 s of each clip, and in particular the first clip viewed.

Apparatus
Viewing distance for all participants was 59 cm. Movies were displayed on an LCD monitor (Dell U2412M) in High Definition (HD; a resolution of 1920 × 1080). The diagonal distance on the screen was 60 cm. Eye movements were recorded using an Eyelink 1000 eye tracker (SR Research, Canada) running Experiment Builder™ version 1.10.1385 (E-Builder).

Observers
Twelve undergraduate volunteers with normal vision, studying Psychology at the University of Bristol, completed the experiment for course credit. Ages ranged from 18 to 27 (mean = 23 years). Eight were female. All observers were naïve to the aims of the experiment, and none had previously watched any of the movies from which the clips were taken. The experimental protocols were approved by the Ethics Board at the University of Bristol.

Stimuli
The film stimuli consisted of four, ten minute clips displayed at a frame rate of 24 fps. The clips were taken from the opening scenes (following the director's credit) from the following films: The Good, The Bad and The Ugly (henceforth referred to here as The Good for brevity, Leone, 1967), About Time (Curtis, 2013), Limitless (Burger, 2011), and The Great Gatsby (Luhmann, 2013). Film order was counterbalanced across participants.
The distractors were 1.3 × 1.3 deg in size and presented 4 deg away from the current fixation, with equal probability along one of the four diagonal axes, at positions relative to the horizontal of 45°, 135°, 225°, or 315°, as shown in Fig. 1. The timing of distractor presentation was determined as follows. Once the clip had started, there was a random delay of 2-4 s (varied to minimise anticipation effects). After this delay the start of the next fixation was detected, and a further 40 ms delay was added by the SR Research Experiment Builder code before the distractor was presented. This extra 40 ms ensured that the eyes were stable and that the distractor appeared towards the beginning of the first fixation that followed the delay. Following the presentation of this first distractor the next 2-4 s interval began. The distractors were designed to be highly salient. They switched from white to black at a frequency of 12 Hz, which was selected to be at the peak of the temporal sensitivity function (Barlow & Mollon, 1982). The two luminance values the squares switched between were 16 and 116 cd/m², as measured by a Photo Research PR-670 SpectraScan. To confirm that the distractors were highly salient we processed all film clips through the Itti et al. (1998) saliency model (iNVT, 2014). This analysis showed that in 98% of cases the distractors were the most salient items in the display.
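The placement and timing rules described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Experiment Builder code used in the study: the function names are hypothetical, the fixation onsets are made-up inputs, and we assume that each new 2-4 s interval is timed from the presentation of the previous distractor.

```python
import math
import random

DIAGONALS_DEG = (45, 135, 225, 315)  # four diagonal axes (from the text)
ECCENTRICITY_DEG = 4.0               # distance from current fixation (from the text)
STABILISATION_S = 0.040              # extra 40 ms after fixation onset (from the text)

def distractor_offset(angle_deg, eccentricity_deg=ECCENTRICITY_DEG):
    """Offset (dx, dy) from fixation in degrees of visual angle, y pointing up."""
    a = math.radians(angle_deg)
    return eccentricity_deg * math.cos(a), eccentricity_deg * math.sin(a)

def schedule_distractors(fixation_onsets_s, rng):
    """Return (presentation_time_s, angle_deg) pairs for sorted fixation onsets."""
    events = []
    earliest = rng.uniform(2.0, 4.0)           # random 2-4 s delay from clip start
    for onset in fixation_onsets_s:
        if onset >= earliest:                  # first fixation starting after the delay
            shown_at = onset + STABILISATION_S
            events.append((shown_at, rng.choice(DIAGONALS_DEG)))
            # Assumption: the next interval runs from this presentation.
            earliest = shown_at + rng.uniform(2.0, 4.0)
    return events

rng = random.Random(0)
onsets = [i * 0.3 for i in range(100)]  # hypothetical fixation onsets, one every 300 ms
events = schedule_distractors(onsets, rng)
```

Because the distractor position is defined relative to the current fixation, every distractor lies at the same 4 deg eccentricity regardless of where on the screen the observer happens to be looking.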
In addition we ran a supplementary study in which 10 observers (7 female; mean age = 19.6 years, range 18-24 years) were given a speeded distractor detection task. In order to assess the extent to which the film interfered with the visibility of the distractors there were two conditions. In the first, distractors were presented with the film and in the second they were presented against a grey background. Average response times were fast overall whether the film was present (524 ms) or not (456 ms). The 68 ms difference between these two conditions was not reliable, 95% CI [−98 ms, 235 ms]. Distractors were only missed in 0.5% of trials overall. This confirms that the distractors were highly visible and salient enough that the film content had little effect on their detectability when participants were explicitly instructed to look out for them.

Procedure
Observers were told that they should watch and enjoy the film clips, and that they would be asked questions about the film at the end (a ruse, to foster engagement). Also observers were warned that irrelevant visual objects would appear during the film from time-to-time, and to simply ignore them. The eye tracker was calibrated at the start of the experiment and every ten minutes, between film clips.

Data analysis
The landing position of the saccade following the distractor onset was coded to test whether it was in the same quadrant as the corresponding distractor stimulus. This is the most liberal criterion for detecting the presence of oculomotor capture. For each clip we calculated the mean probability of saccades being directed into the distractor quadrant (a value in the range 0 to 1, with a chance level of 0.25, since each distractor occurs at random in one of four quadrants). We also calculated a time dependent mean for the first 60 s in order to look for evidence of oculomotor capture at the very beginning of viewing each clip.
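The quadrant criterion can be illustrated with a short Python sketch. The function names and the example trials are hypothetical and are not data from the study; we also assume coordinates relative to the triggering fixation with y pointing up, so that the four quadrants contain the 45°, 135°, 225° and 315° distractor positions respectively.

```python
import math

def quadrant(dx, dy):
    """Quadrant index 0-3 of a vector relative to fixation (y up):
    0 contains 45 deg, 1 contains 135, 2 contains 225, 3 contains 315."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(angle // 90.0) % 4

def is_capture(saccade_dx, saccade_dy, distractor_angle_deg):
    """True if the saccade lands in the same quadrant as the distractor."""
    return quadrant(saccade_dx, saccade_dy) == int(distractor_angle_deg // 90) % 4

def capture_probability(trials):
    """Proportion of (dx, dy, distractor_angle_deg) trials coded as capture."""
    hits = sum(is_capture(dx, dy, angle) for dx, dy, angle in trials)
    return hits / len(trials)

# Hypothetical trials for illustration only.
example_trials = [(1.0, 1.0, 45), (-1.0, 1.0, 45), (1.0, -1.0, 315), (-2.0, -0.5, 135)]
p = capture_probability(example_trials)
```

If saccade direction were unrelated to the distractor position, this proportion would be expected to sit near the chance level of 0.25; values reliably above 0.25 would indicate capture.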
In order to look for a change in fixation duration as a result of the distractor, two fixation durations were calculated. The first was the duration of the fixation that was ongoing at the onset of the distractor. The second was a mean background fixation duration, calculated as the mean of the durations of the fixations directly before (prior) and directly after (post) the fixation in which the distractor was present. If we only took the fixation duration before (or after) the key fixation then we would confound the effect of the distractor with any systematic drift in the saccade latencies over time. This method of calculating the background fixation duration avoids this confound. The difference between distractor fixation duration and background fixation duration will be referred to as the remote distractor effect, where appropriate (Walker & Benson, 2013).
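A minimal sketch of this calculation is given below. The function names and the example durations are hypothetical and chosen only to show why averaging the prior and post fixations removes a linear drift.

```python
def background_duration(prior_ms, post_ms):
    """Mean of the fixation durations directly before and after the distractor fixation."""
    return (prior_ms + post_ms) / 2.0

def remote_distractor_effect(prior_ms, distractor_ms, post_ms):
    """Distractor fixation duration minus the background fixation duration (ms)."""
    return distractor_ms - background_duration(prior_ms, post_ms)

# Hypothetical example: durations drift upward by 10 ms per fixation
# and the distractor has no effect at all.
no_effect = remote_distractor_effect(300, 310, 320)

# Same drift, but the distractor fixation is lengthened by 50 ms.
with_effect = remote_distractor_effect(300, 360, 320)
```

In the first case a comparison against the prior fixation alone would report a spurious 10 ms increase, whereas the symmetric background gives zero; in the second case the full 50 ms lengthening is recovered.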

Results
Fig. 2 shows a histogram of the mean probability of capture, i.e. of the saccade following distractor onset being directed into the distractor quadrant. The graph shows that this probability was close to 0.25 for all four films. In all cases the 95% CIs overlap the chance level. We find no evidence that oculomotor capture occurs for these distractors.
In order to explore whether capture is a function of the time until the end of the saccade we have plotted the proportion of capture against time to the end of the saccade (ms). Average capture across participants is binned so that each bin has an equal number of distractors, for successive time values. These results are shown in Fig. 3. It is clear that there is no capture across the range of delays between the distractor and the next saccade. As the distractor is presented 40 ms into the fixation, the values in this plot are 40 ms less than the total fixation duration. Fig. 4 shows a rose plot giving polar histograms of the angle of saccade directions, following the presentation of the distractor, averaged across all four film clips. The origin is the gaze contingent fixation from which the distractor was triggered. Note that there are four distractor positions, spatially counterbalanced, but here we rotationally transform these data before combining them on the same chart, transforming the data from each distractor position as if the distractor had appeared in the same quadrant (here denoted as a red line at 45 degrees). Visual inspection of the rose plot shows some evidence of a tendency for the eyes to move in the cardinal directions (see: Gilchrist & Harvey, 2006; Tatler & Vincent, 2008), but there is again no evidence for saccades being directed toward the distractor (i.e. there is no peak around the red line).
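The rotational transform used for the rose plot can be sketched as follows. This is an illustrative reconstruction under our own assumptions: angles are measured counterclockwise from the horizontal in degrees, the function names are hypothetical, and the bin count is arbitrary.

```python
REFERENCE_DEG = 45.0  # common distractor direction after rotation (the red line)

def rotate_to_reference(saccade_angle_deg, distractor_angle_deg):
    """Rotate a saccade direction as if its distractor had appeared at 45 deg."""
    return (saccade_angle_deg - distractor_angle_deg + REFERENCE_DEG) % 360.0

def polar_histogram(angles_deg, n_bins=24):
    """Count angles in equal-width bins covering 0-360 deg."""
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for a in angles_deg:
        counts[int((a % 360.0) // width) % n_bins] += 1
    return counts

# A saccade aimed straight at a 135 deg distractor maps onto the reference line.
aligned = rotate_to_reference(135.0, 135.0)

# A rightward saccade (0 deg) with the distractor at 315 deg is 45 deg
# counterclockwise of the distractor, so it maps to 90 deg.
offset = rotate_to_reference(0.0, 315.0)
```

After this transform, capture would appear as a peak in the histogram bin containing 45°, whatever the original distractor position.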
One possibility considered was that oculomotor capture occurs only at the very beginning of the first clip, before participants are engaged in the film content. Fig. 5 shows the mean capture rate during the first 60 s of the first clip watched. We find no evidence for this initial capture hypothesis: a linear regression of the probability of capture against time shows no linear relationship, r² = 0.001, F(1,17) < 1, and the probability of capture is around 0.25 throughout this whole period. Fig. 6(a) and (b) show the fixation durations for the distractor fixation and the background fixations, calculated from the fixations prior to the distractor fixation and those post distractor fixation, collapsed across all four movie clips, expressed as a frequency distribution and as a mean difference between background fixations and distractor fixations. The distractor fixation distribution has two important attributes. Firstly, the overall mean fixation duration is greater than that of the background fixations, as has been reported. Secondly, the overall variance is increased. One possible explanation for this increase in variance is that the distractor has an alerting effect on short latency saccades (cf. the Gap Effect: Forbes & Klein, 1996) and a slowing remote distractor effect on the longer latency saccades.
We carried out a two factor (Film, Distractor) repeated measures ANOVA on these fixation durations. Film had four levels (About Time, The Good, Limitless, The Great Gatsby) and Distractor had two levels (absent, present). There was a highly significant effect of Distractor, F(1,11) = 52.5, p < 0.001, partial η² = 0.827, with a mean background fixation duration of 355 ms, compared to 402 ms when a distractor was present, indicating a remote distractor type effect (RDE). There was a significant effect of Film, F(3,33). There was also a significant interaction between Distractor and Film, F(3,33) = 3.24, p = 0.035, partial η² = 0.227. This indicates that the film watched played a role in the magnitude of the RDE. Exploring this further for each of the four films with one-tailed paired t-tests, adjusted with the Bonferroni correction, revealed that three out of four of the films had a significant or highly significant RDE. These are detailed in Table 1. In order of decreasing magnitude of remote distractor effect size, the largest effect was with film 1 (About Time), then film 4 (The Great Gatsby), and then film 3 (Limitless). No significant RDE was found for film 2.
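The reported effect sizes can be checked against the F ratios using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies it to the two tests for which full statistics are given; the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

distractor_effect = partial_eta_squared(52.5, 1, 11)  # main effect of Distractor
interaction = partial_eta_squared(3.24, 3, 33)        # Distractor x Film interaction
print(round(distractor_effect, 3), round(interaction, 3))
```

Both values agree with the reported effect sizes (0.827 and 0.227) to within the rounding of the published F ratios.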

Discussion
The experiment set out to look for evidence of oculomotor capture while observers were performing an engaging visual task, i.e. watching movies. We found no evidence of oculomotor capture in terms of fixations towards the distractor. We found that fixation duration varied with the film watched. We also found an effect of the distractor on fixation duration, similar to a remote distractor effect. This effect was present for three out of four of the films. Intriguingly, the size of the remote distractor effect varied significantly with the film watched. The fixation duration frequency distribution showed a shift in the peak latencies (250-400 ms) and an increase in variance, with some positive skew that clearly biases the mean latency; this is consistent with a remote distractor type effect (Walker & Benson, 2013). The frequency distribution does not clearly show the characteristic notch of saccadic inhibition (Reingold & Stampe, 1999, 2004) after the distractor onset, but given the variability in the onset time, the notch could be smeared and unrecognisable. The lack of oculomotor capture suggests that gaze is not captured involuntarily by artificial transients introduced into a movie. Instead, gaze appears to be very much under top-down control in this movie viewing context.
We also searched for oculomotor capture at the very beginning of film viewing and again found no evidence for it. It remains possible that participants were already fully engaged in the film within the first few seconds and that, as a result, the spatial programming of the eye movements was already under strong top-down control. The timing of saccade generation was affected by the onset of the distractor; this effect was analogous to the remote distractor effect that has previously been reported for simpler saccadic tasks. We also found an interaction effect with the film watched, suggesting that different film stimuli engage the observer at different rates. This is consistent with similar findings reported by Mital et al. (2011) and, for infant studies, by Wass and Smith (2014).

Experiment 2
In Experiment 1 we found no evidence of spatial oculomotor capture by an irrelevant distractor when participants were engaged in a compelling task, i.e. watching a film. We did however find an effect on fixation duration. This suggests that high-level processing can partially override the automatic processing associated with bottom-up oculomotor capture. Intriguingly, we found that the magnitude of the effect of the distractor on fixation duration varied with the film being viewed, with three out of four films showing an effect. It was particularly surprising that there was no evidence for initial capture prior to full engagement in the film watching task: we might expect a small window at the start of watching a film clip when the visual selection process is more bottom-up and saliency driven (Carmi & Itti, 2006; Mital et al., 2011). In Experiment 2 we set out to replicate the findings of Experiment 1 across different films, as well as looking for initial oculomotor capture, by using a larger set of ten short 60 s film clips.

Apparatus
The apparatus was the same as in Experiment 1.

Observers
Twenty new observers took part in the experiment, taken from the same population as Experiment 1. Twelve were female and the ages ranged from 19 to 26 (mean = 22 years).

Stimuli
The film stimuli were short 60 s clips with the same characteristics as Experiment 1, taken from ten films, including About Time (Curtis, 2013), Hunger Games: Catching Fire (Lawrence, 2013), Real Genius (Coolidge, 1986), Spider Man 2 (Raimi, 2007), Identity Thief (Gordon, 2013), The Good, The Bad and The Ugly (Leone, 1967), and The Great Gatsby (Luhmann, 2013).

Procedure
The procedure was the same as in Experiment 1, except that, for operational reasons, the order of the film clips was not counterbalanced, so all observers saw the clips in the order listed in the Stimuli section, i.e. About Time (Curtis, 2013) as their first clip, etc.

Data analysis
Details are the same as Experiment 1.

Results
Fig. 7 shows a histogram of the mean probability of oculomotor capture. For all 10 films the proportion of saccades directed into the distractor quadrant is close to chance level (0.25) and the 95% CIs overlap with that chance level.
In order to explore whether capture is a function of the time until the end of the saccade we have plotted the proportion of capture against time to the end of the saccade (ms). Average capture across participants is binned so that each bin has an equal number of distractors, for successive time values. These results are shown in Fig. 8. It is clear that there is no capture across the range of delays between the distractor and the next saccade. Fig. 9 is a rose plot showing a polar histogram of the directions of the first saccade following distractor onset for all ten film clips. The origin is the gaze contingent fixation from which the distractor was triggered. The data from the four distractor positions are rotationally transformed as if the distractor had always appeared in the same quadrant (as shown by the red line). Visual inspection of the rose plot supports the previous evidence that there is no spatial capture by the distractors. However, again there is a suggestion of an overall horizontal and vertical bias in saccade direction.
Next we again looked for evidence of initial capture, in other words whether there is capture at the start of a film prior to immersion in the film. Fig. 10 shows the mean probability of gaze landing in the quadrant in which the distractor appeared (i.e. hits) over elapsed time, during the 60 s of the first film clip watched by all viewers, which was always About Time (Curtis, 2013). The slope of a regression of mean probability is not significant, F(1,18) < 1. Fig. 11(a) and (b) show the fixation durations for the distractor fixation and the background fixations, calculated from the fixations prior to the distractor fixation and those post distractor fixation, collapsed across all ten movie clips, expressed as a frequency distribution and as a mean difference between background fixations and distractor fixations. As with Experiment 1, the increase in fixation duration and the modification of the fixation distribution are consistent with the remote distractor effect, but there is no characteristic notch in the distribution which would be a marker of SI. There is also an increase in the variance of the distractor fixation duration distribution, as in Experiment 1. We carried out a repeated measures ANOVA on fixation durations with two factors, Film (see Stimuli section for order; note: all observers saw the same film clips in the same order) and Distractor (absent, present). We found a highly significant effect of Distractor, F(1,19) = 39.6, p < 0.001, partial η² = 0.676. In other words, fixation durations are significantly longer when distractors are present than at the background level, i.e. a remote distractor effect.
We also found a highly significant effect of Film, F(9,171) = 8.17, p < 0.001, partial η² = 0.301. The film watched (or the order viewed) affected overall mean fixation durations, with a range from 303 ms (Spider Man; 7th film) to 472 ms (The Great Gatsby; 10th film). There was also a significant interaction between Film and Distractor, F(9,171) = 2.42, p = 0.013, partial η² = 0.113.
Exploring this further for each of the ten films with one-tailed paired t-tests, adjusted with the Bonferroni correction, revealed that four out of ten of the films had a significant or highly significant RDE. Noteworthy too is that the RDE is always positive. These are detailed in Table 2. In order of decreasing magnitude of remote distractor effect size, the largest effect was with film 9 (The Good), then film 6 (Real Genius), then film 10 (Gatsby), and then film 4 (GI Joe).
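For readers unfamiliar with the Bonferroni adjustment, the sketch below shows one common form of it, in which each p-value is multiplied by the number of comparisons before being compared with the significance threshold. The p-values are hypothetical and are not those of Table 2; the function names and the 0.05 threshold are also assumptions made for illustration.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def significant(p_values, alpha=0.05):
    """Flag which tests remain significant after the adjustment."""
    return [p_adj < alpha for p_adj in bonferroni(p_values)]

# Hypothetical per-film p-values, for illustration only.
example_p = [0.001, 0.004, 0.03, 0.2]
flags = significant(example_p)
```

With ten films, as in this experiment, each p-value would be multiplied by ten, so an uncorrected p below 0.005 would be needed to reach a corrected threshold of 0.05.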

Discussion
The experiment set out to replicate the findings from Experiment 1 across a more diverse set of short film clips and to look for evidence of capture at the start of short film clips, prior to engagement. The findings were clear: again we found no evidence of oculomotor capture by distractors. However, fixation distributions were affected by the onset of the distractors.
The specific results for the fixation durations in the current experiment are remarkably similar to those of Experiment 1. There is an increase in fixation durations consistent with the remote distractor effect (Walker & Benson, 2013). Again there is the intriguing finding that the remote distractor effect varies with the film watched, although this is confounded here with order viewed and total viewing time. The frequency distribution again shows a modification: there appears to be some inhibition of saccades at peak latencies and an increase in variance, with some positive skew that clearly biases the mean latency (Walker & Benson, 2013). The frequency distribution does not show the characteristic notch of SI (Reingold & Stampe, 1999, 2004). However, there is evidence of a slowing down of fixation durations by the distractors, and a modification of the frequency distribution, showing some evidence of bottom-up oculomotor effects similar to the remote distractor effect reported by Walker and Benson (2013).
Overall the evidence from this experiment suggests that these salient distractors do not influence the spatial programming of the saccades in this task, i.e. we again find no evidence for oculomotor capture in these circumstances where the primary task is engaging. We do however find effects on the control of when the eyes move.

General discussion
The two experiments in this study both found that there was no spatial oculomotor capture by irrelevant distractors when observers were engaged in a compelling task, i.e. watching movies. There was neither capture by the distractors nor a movement in the more general direction of the distractors by observers.
One possible explanation for our results is that the distractors in the current study were simply not visually salient enough to cause oculomotor capture. This is highly unlikely for three reasons. First, they were selected to be as visually salient as possible based on the response properties of the visual system (Barlow & Mollon, 1982). Second, we used a well-established computer model of salience (iNVT, 2014; Itti et al., 1998) to demonstrate that the distractors were the most salient items in the display. Third, we carried out a distractor detection experiment to demonstrate that all the distractors were easy to detect and that they were so salient that the presence of the film did not reliably modulate either their detectability or the time taken to detect them.
Another possible explanation for our findings is that the oculomotor system is not in the required state during film viewing for oculomotor capture to be observed, specifically that the oculomotor system needs to be in a disengaged state for capture to occur (e.g. Godijn & Theeuwes, 2002; Tse et al., 2002; Yantis & Johnston, 1990). However, as we and others have shown (Mital et al., 2011), film viewing is characterised by regular saccadic eye movements at a rate typical of normal scene scanning. As a result we would argue that the saccadic system would have been in a disengaged state for some of the time (perhaps between a quarter and a third) and so we would have expected to see some evidence of capture: we found none.
Our results suggest that the oculomotor system can be under strong top-down control, which prevents irrelevant visual distractors from interfering with a primary task. Indeed, in the current task that control appears to be strong enough to eradicate oculomotor capture completely, despite our best efforts to design highly salient (and distracting) onsets.
Previous studies have shown a predominance of bottom-up factors in predicting gaze with moving images (Itti et al., 1998; Loschky, Larson, Magliano, & Smith, 2015). One explanation for this is that viewers are able to distinguish between relevant (i.e. belonging to the semantic/contextual space of the moving image) and irrelevant (i.e. distractor or surface imperfections) motion and onsets. This would suggest that the important top-down factor could be some form of perceptual grouping of image features into those belonging to the scene and those that are irrelevant. In other words, viewers may utilise some form of very strong attentional presetting (cf. Folk et al., 1992) to ensure that irrelevant surface imperfections due to the film's presentation (such as spots, dust or damage evident in a projection of an old celluloid film) do not capture attention and draw their gaze away from the relevant film content. We are not arguing that oculomotor capture does not occur: it is a very well documented laboratory phenomenon (Ludwig & Gilchrist, 2002, 2003; Theeuwes, 2004; Theeuwes et al., 1998). However, it would appear that when the visual environment is richer and more engaging, capture may not be part of normal eye movement behaviour (although it is not possible to dissociate the visual environment from the increase in observer engagement). This has implications for researchers working in a more applied context in which they record eye movements.
Another engaging and interesting activity that appears to lend support to this view is watching conjuring tricks. In a study of change detection ability during a magic trick, Smith, Lamont, and Henderson (2013) showed that oculomotor capture did not occur when a bright and salient onset appeared while participants were engaged in counting cards. This lends additional support to our finding that oculomotor capture does not occur when participants are engaged in compelling activities. The present study showed reliable temporal effects of the distractor on the ongoing fixation duration. This effect is analogous to the well-established remote distractor effect (Walker & Benson, 2013; Walker et al., 1995, 1997) or saccadic inhibition (Reingold & Stampe, 2004). This suggests that the distractors are visually detected (as supported by our detection task) and have an impact on the timing of the saccades, if not on the landing position of the next saccade. We would argue that the distractors are being processed in a bottom-up manner even though these signals do not have sufficient priority to cause spatial capture.
We also explored an initial capture hypothesis: that there might be capture right at the start of watching a movie clip, prior to engagement. However, there seemed to be no evidence for this in either of the two studies. Operationally we explored initial capture in the first 60 s of the movie clip; however, this time interval was arbitrarily chosen. The anecdotal information reported by participants when watching 60 s clips was that they found the films interesting. Some said that they then knew which films they wanted to go away and watch, and that the short clips acted as 'movie trailers.' This perhaps suggests that just 60 s is sufficient to become quite engaged in a film. In order to look for initial spatial capture it might be necessary to explore shorter and shorter time scales, and find a point prior to when engagement occurs: a topic for further investigation. An alternative hypothesis would be that there is increased top-down engagement in the first few seconds of a film due to the novelty.
The results suggest that where the next saccade is directed is under top-down control, but the control of when the next saccade is generated can be modulated by bottom-up factors. Top-down control prioritises features of the semantic/contextual space of the depicted dynamic scene for attention control over irrelevant surface features, but this does not mean that bottom-up control is not active within these scene features. The balance is likely to be more complex and subtle. One possible unified explanation for these results is in terms of covert attention. It is possible that the distractor captures covert attention, interfering with the generation of the next saccade but without leading to a saccade being generated to that location. Our results do not rule out this kind of explanation, but the explanation does depend on there being a certain kind of relationship between covert and overt attention; the nature of this relationship remains a controversial topic (see Kristjánsson, 2011). However, if this explanation is correct we might expect at least some capture given the magnitude and robustness of the latency effect: we find none. This provisionally leads us to the conclusion that these results support a model of oculomotor programming in which the where and when of saccade programming are at least partially separate (Findlay & Walker, 1999).

Conclusion
This study set out to answer the question of whether oculomotor capture occurs when observers are engaged in a compelling task, and in so doing to inform the debate on the relative importance of top-down versus bottom-up processes in visual attention (for reviews see Egeth & Yantis, 1997; Theeuwes, 2004; Yantis, 1993; Yantis & Jonides, 1996). We suggested that in laboratory experiments designed to evaluate the relative contribution of top-down versus bottom-up processing, the tasks were less than compelling for participants and led to fatigue effects, exaggerating the effect of bottom-up control and masking top-down effects. The results were surprisingly clear: when the task is compelling and engaging for participants, spatial oculomotor capture was not found. These studies attempted to find evidence of initial capture at the start of the compelling task (watching the movie), prior to engagement, but found no such evidence. Oculomotor capture itself may then be a laboratory curiosity, albeit one which can give us an insight into eye-movement control. In turn this suggests that laboratory experiments may well greatly over-estimate the extent to which attention is under bottom-up control.