Visual search entails a complex interplay between scene salience and search strategy. Although we are capable of looking at any scene feature as often as we wish, it is usually in our best interest to be guided in our search by scene elements that closely resemble the object of our search, or to focus on locations that we believe will provide the most information. But search can also be influenced by bottom-up saliency; that is, it can be driven by attention-grabbing features in the search array, such as motion, sudden onsets, high contrast, or unique color or size (see Wolfe & Horowitz, 2004, for a review). A further source of influence is midlevel mechanisms (Hooge, Over, van Wezel, & Frens, 2005; Klein & MacInnes, 1999; MacInnes & Klein, 2003; Smith & Henderson, 2009, 2011a) that drive the saccadic system toward novel regions, as has been suggested by models of human search performance (Itti & Koch, 2001) and neurophysiological investigations in rhesus monkeys (Dorris, Klein, Everling, & Munoz, 2002; Fecteau & Munoz, 2006). Two such midlevel effects that could drive the saccadic system toward novel locations are inhibition of return, which is a bias away from previous fixations, and saccadic momentum, which is a bias to repeat the most recent saccadic vector.

We move our eyes roughly three times every second to bring new parts of the environment to the central, high-resolution part of the retina. Patterns of these saccades can provide information on underlying visual processes and have been used to produce and test many models of saccadic behavior in visual search (Foulsham & Kingstone, 2012; Itti & Koch, 2001; Wolfe, 2007). Saccade patterns are dependent on instructions (Yarbus, 1967), scene salience (Henderson, 2003), the entropy of the search array (Gilchrist & Harvey, 2006), and the previous state of the oculomotor system (Zelinsky, 1996). Although the importance of a low-level salience map (e.g., Itti & Koch, 2001) for the control of overt orienting has been challenged (Einhäuser & König, 2003; Tatler, Baddeley, & Gilchrist, 2005; Tatler, Hayhoe, Land, & Ballard, 2011), these challenges are aimed at narrow definitions of a salience map. Some have sought to overcome the challenges by redefining salience to include deviation or “surprise” (Itti & Baldi, 2006), a retinotopic priority map (Wischnewski, Belardinelli, & Schneider, 2010), or object-level salience (Einhäuser, Spain, & Perona, 2008). Search of complex scenes has shown influences of both top-down and bottom-up factors (Huestegge & Radach, 2012). We define the bottom-up contribution to search in broad terms: simply as all the information that is in the image projected onto the retina. Viewed in this way, all orienting behavior in the real world will be influenced (albeit to different degrees) by both bottom-up and top-down processes.

Most studies that have reported on the patterns of eye movement in “free looking and free search” have done so in situations that were not so “free.” Although the experimental results from these different paradigms have been fruitful and important, the ethological data might have been compromised by the experimental manipulations. For example, one fruitful paradigm has been to evaluate the aftermath of a search episode with responses to a secondary, or probe, task (e.g., Klein & MacInnes, 1999; MacInnes & Klein, 2003; Smith & Henderson, 2011b). The “free” saccades made before probes in such a task might very well be influenced by strategic adaptations to the possibility of the probes. The results from well-controlled studies that have used highly regularized search arrays (e.g., Gilchrist & Harvey, 2006) and gaze-contingent display changes (e.g., Foulsham & Kingstone, 2012)—both manipulations that might permit the researcher to confidently link array features to saccadic behavior—are limited, from an ethological perspective, on grounds of oversimplification and ecological invalidity.

A number of researchers have looked at the role of action in viewing by analyzing saccades in tasks such as sports (Ballard & Hayhoe, 2009; Land & McLeod, 2000) or making tea (Land, Mennie, & Rusted, 1999). Many saccades tend to land in areas with no current salient features, but where objects will be after an action, giving further support for the role of top-down influences over salience in these tasks. Here we present a descriptive analysis of human-generated saccades, an ethology for visual search in static scenes. Although the spatial scope of search was limited to a computer monitor, the search arrays were chosen to be extremely dense, with the search target often intentionally being camouflaged. This allowed us to analyze search over a much longer period of time, rather than limiting search to its earliest stimulus-driven stage (Parkhurst, Law, & Niebur, 2002). In the present experiment, we measured the locations of fixations and the direction, amplitude, and timing of saccades in free search. Because we were interested in natural search behavior, no probes interrupted search, and observers were allowed up to 120 s of search per image. We analyzed our data with the intent of finding the relative contributions of image properties, top-down strategies, and also midlevel orienting mechanisms. Although we will stop short of producing a working model of visual search, we investigated and identified many factors that could be important in future models, and developed and tested techniques that could be useful in future exploration of search patterns. We took two approaches to analyzing the data. In the “Results and Discussion” section, we present descriptive analyses and look for similarities and repeating patterns across individuals and across images that would indicate top-down strategies or image-driven effects on search behavior. Then, in the “Midlevel Orienting Mechanisms” section, we look at effects, specifically inhibition of return (IOR) and saccadic momentum (SM), and use the search data to test specific hypotheses based on current models and theories of these effects.

Method

Eight students of Aberdeen University were paid to participate in a simplified version of the Where’s Wally? search task, based on a popular series of children’s books in which a specified character, Wally, is hidden in a complicated illustration. The only task was to search for Wally and press the space bar when he was found. Thirteen scenes of varying complexity were displayed to the observers until they found Wally or until 120 s had elapsed. Wally, or some portion of him, was present in all of the images, with his size ranging from 0.2 to 1.8 visual degrees. Images were presented on a 19-in. Sony CRT monitor at a resolution of 1,024 × 768 and a refresh rate of 100 Hz. Eye position was monitored using an EyeLink 1000 desktop eyetracking system. With search times ranging from a few seconds to the full 120 s, we were able to record thousands of saccades in free search for each observer. From the eye movement data, we were able to extract a variety of dependent measures from the trial saccades and fixations, namely (1) the amplitude of each saccade, both on its own and relative to the distance between the start-point of the saccade and the one-back and two-back fixations; (2) the angle (in degrees) of each saccade relative to both the preceding fixation (one-back) and the fixation that preceded the preceding fixation (two-back) (Fig. 1); (3) the saccadic latency—that is, the duration of the fixation that preceded the current saccade; and (4) fixation coordinates, in absolute screen pixels. These variables were explored in isolation, in combination, and as a temporal sequence. Other computational analysis techniques will be discussed as they are introduced.

Fig. 1

Illustration of how the relative angle of each saccade was coded. The circles represent the locations of a sequence of fixations, 1–2–3. (A) Upon landing at “3,” the previously fixated location “2” would be coded as 0º. Subsequent saccades would be coded relative to this location, such that a saccade along the same trajectory would be coded as a “forward” saccade (180º ± 5º), and a saccade back to “2” would be 0º (± 5º) and would be considered a return saccade. (B) The same coding scheme was applied to the “two-back” fixation location: Upon landing at “3,” location “1” would be coded as 0º, and the angle of the subsequent saccade would be calculated relative to this location
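The coding scheme in Fig. 1 can be sketched in a few lines. This is only an illustration, not the code used in the study: the function names `relative_angle` and `classify` are hypothetical, fixations are assumed to be (x, y) pairs, and the angle is assumed to be unsigned (0º to 180º).

```python
import math

def relative_angle(origin, reference, target):
    """Unsigned angle in degrees (0-180) between the vector origin->reference
    (coded as 0 degrees) and the saccade vector origin->target.
    If either vector has zero length, math.atan2(0, 0) yields 0."""
    rx, ry = reference[0] - origin[0], reference[1] - origin[1]
    tx, ty = target[0] - origin[0], target[1] - origin[1]
    dot = rx * tx + ry * ty
    cross = rx * ty - ry * tx
    return math.degrees(math.atan2(abs(cross), dot))

def classify(angle, tolerance=5.0):
    """Label a relative angle using the +/- 5 degree windows from Fig. 1."""
    if angle <= tolerance:
        return "return"
    if angle >= 180.0 - tolerance:
        return "forward"
    return "oblique"

# Fixations 1-2-3 lie on a horizontal line; the next fixation continues it.
fix1, fix2, fix3, nxt = (0, 0), (10, 0), (20, 0), (30, 0)
one_back = relative_angle(fix3, fix2, nxt)  # relative to location "2"
two_back = relative_angle(fix3, fix1, nxt)  # relative to location "1"
```

The same function serves both codings; only the reference fixation changes.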

Results and discussion

Over 20,000 saccades were collected from 8 subjects searching 13 scenes from Where’s Wally?

Descriptive statistics

The typical saccadic amplitude was skewed toward shorter distances (mean = 3.97, median = 2.68 visual degrees), and the mean fixation duration was 277.15 ms (Figs. 2A and B). These general saccadic tendencies are similar to those in data from Klein and MacInnes (1999) and Smith and Henderson (2011b). In both of these previous studies, unlike the present one, search was frequently interrupted with a probe to which subjects were instructed to respond as quickly as possible. The search scenes ranged in difficulty in terms of the number of times that Wally was found and the average time required to find him. Wally was not found by any of the subjects in the “fruit” scene, but he was consistently found in less than 20 s in the “fountain” scene (Fig. 2C).

Fig. 2

Distributions for (A) saccadic amplitudes and (B) fixation durations across all observers. (C) Numbers of times that Wally was found for each image, and the mean search times for successful searches only (images are denoted by a simple descriptor of the image content)

Top-down (search strategy) and bottom-up (salience map)

To explore the interplay of top-down strategy and image-based salience, we will begin with simple scanning strategies and then explore more subtle top-down influences. First, our observers did not seem to solely employ a simple strategy, like systematically “reading” the scene from left to right and top to bottom. Whereas typical reading studies (silent reading, English) show biases of 85 % of saccades to the right and 15 % to the left (Rayner, 1998), our search data do not show such an extreme bias (Fig. 3B). The fixations for one observer (Fig. 3A) illustrate a typical search, with clusters of visits to salient locations. By binning the data into absolute angular distance and measuring saccadic tendencies to the left and right (±5º), we see that observers do saccade to the left (7.7 %) and right (7.0 %) more than in other directions (2.5 % average in other equal-size bins) [left/right vs. oblique, t(7) = 12.4, p < .001]. Rightward saccades are no more common than leftward ones, and 85 % of the saccades do not follow a simple left–right scanning pattern.

Fig. 3

(A) Fixations for one subject on one trial. The dots are individual fixations, with size reflecting the duration of the fixation. (B) Polar plot of all saccadic angles (in degrees) by amplitude (visual degrees), demonstrating no discernible bias for fewer and longer leftward saccades, as would be expected from a reading strategy

Although no overriding simple scanning strategy may be present, observers were likely employing some type of top-down influence in this task. Current models (Wolfe, 2007) take the stance that visual search is driven by a combination of top-down strategies and bottom-up salience. We believe that both of these factors can be modeled separately, and that their relative contributions can be measured even in a free search task. For example, our data include eight observers searching 13 different images. To the extent that bottom-up control rooted in the image properties matters, we should find consistent differences between the images across observers. Conversely, to the extent that top-down strategies residing in the observers matter, we should find consistent differences between the observers across images.

Looking at Fig. 4, we see fixation durations, amplitudes, and search times for individual trials, but also the means for each image (final column) and observer (final row). For example, Image 3 tends to have very long search times, and Image 11 tends to have very short fixation durations. These are likely due to the particulars of the salience and feature maps of those images. Observers 2 and 5, however, tend to make saccades of very short and long amplitudes, respectively, and this is part of their search strategies, since it is consistent across trial images. To further explore the relative contributions of salience and strategy, in what follows we will introduce a number of new metrics of saccadic and fixational similarity, and analyze directly whether these similarities are predominant across images or observers.

Fig. 4

The z-score histograms for all observer–image combinations (individual cells), depicting the mean fixation duration (first column in each cell), mean amplitude of individual saccades (second column), and total search time (third column). Zero/baseline is indicated by the horizontal hairline within each cell and reflects the global mean for each variable. Negative z scores are in orange beneath that line, and positive z scores are in green above the line. Mean z scores across each observer (bottom row) or across each image (last column) reflect how that image or subject differs from the mean in each variable. Consistent patterns across a single image, such as the short fixation durations in Image 11, likely represent the influence of that image’s salience, whereas patterns across each subject, such as Observer 2’s short amplitudes, are likely an individual’s strategy

Scanpath analysis

Two recent studies on scanpath analysis have tackled the problem of scanpath similarity by using algorithms borrowed from genetics research, used to compare sequences of genes. SCASIM (von der Malsburg & Vasishth, 2011) and ScanMatch (Cristino, Mathôt, Theeuwes, & Gilchrist, 2010) use variants of the Needleman–Wunsch algorithm (Needleman & Wunsch, 1970) to calculate scanpath similarity by converting series of fixations into strings of discrete characters representing the temporal features of the saccade or fixation sequence. These strings are then scored for similarity by the work it takes to convert one into the other through a series of deletions, insertions, and the introduction of gaps. Our data differ from ScanMatch and SCASIM data, however, in that our trials include sequences of hundreds of saccades and that the Where’s Wally? images do not lend themselves to easy “region-of-interest” divisions. We did use a number of ideas from these algorithms, but with a few important differences. First, we divided our fixation and saccade sequences into a number of smaller subsequences using a nonoverlapping sliding window of random size between one and five eye movements. Smaller ranges of subsequences were chosen as a first step to detect simpler patterns in the saccadic data, and this range could be expanded to detect longer strings if any patterns of four or five saccades were detected. Nonoverlapping windows were used so as to avoid the sequence similarity confounds that are introduced with overlapping windows (Keogh & Lin, 2005). Also, to convert our saccade and fixation information into discrete symbols, we used three separate coding schemes, with each being chosen to be sensitive to a number of strategies available in visual search. Since saccadic strategies could be relative to recent saccadic history or to the screen itself, we included codings for both absolute and relative saccadic angles.

We first created a discretized variable to represent absolute saccadic angle (AbsAng), using 18 bins of saccadic angles, each one spanning 10º, as compared to an absolute rightward, horizontal saccade. We also created a discretized variable to represent presaccadic latency, by converting its duration to its log10 and binning between 1.5 and 3.0 in 0.1 increments. Each saccade in a sequence could then be represented by a dyad of two alphabetical characters denoting its absolute angle and latency.

The other two sequence codings were relative saccadic angle (one-back) (RelAng1) and relative saccadic angle (two-back) (RelAng2). These followed the same coding rules as absolute angle, except that saccadic angles were not calculated relative to an absolute rightward direction, but instead to a vector going back to a previous fixation. RelAng1 codes the angular distance to the one-back location, whereas RelAng2 codes to the two-back location (see Fig. 1). The log of the fixation duration completes the dyad in both variables.
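As a rough illustration of the dyad coding, the sketch below maps an angle and a presaccadic latency onto two letters. The helper names, the use of uppercase letters for angle and lowercase for latency, and the clamping of out-of-range values into the first or last bin are all assumptions made for this example; the text specifies only the bin counts and widths.

```python
import math
import string

def angle_symbol(angle_deg, bin_width=10.0, n_bins=18):
    """Map an angular distance (0-180 degrees) onto one of 18 letters (A-R)."""
    idx = min(int(angle_deg // bin_width), n_bins - 1)
    return string.ascii_uppercase[idx]

def latency_symbol(latency_ms, low=1.5, high=3.0, step=0.1):
    """Map log10(latency in ms) onto 0.1-wide bins between 1.5 and 3.0.
    Values outside that range are clamped (an assumption of this sketch)."""
    n_bins = round((high - low) / step)  # 15 bins
    idx = int((math.log10(latency_ms) - low) / step)
    idx = max(0, min(idx, n_bins - 1))
    return string.ascii_lowercase[idx]

def dyad(angle_deg, latency_ms):
    """Two-character code for one saccade: angle letter plus latency letter."""
    return angle_symbol(angle_deg) + latency_symbol(latency_ms)

code = dyad(95.0, 277.15)  # a 95-degree saccade after a 277.15-ms fixation
```

The same `dyad` helper would apply to AbsAng, RelAng1, and RelAng2; only the angle passed in differs.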

These different coding schemes may be sensitive to different influences on the saccade sequences, given that the absolute angle of saccades will represent sequences in absolute (or scene-based) coordinates, whereas the relative angle will be sensitive to patterns based on the previous state of the oculomotor system. This is not to say that all AbsAng patterns would represent strategic planning: Frequent long strings in a rightward direction would clearly represent a reading bias, but if the most common sequences were short, a more likely interpretation would be a tendency or preference for edges or corners of the display. We will further discuss our interpretation of common subsequences below.

Finally, we suggest that these scanpath sequences would be influenced by both scene salience and observer strategy, but as with our descriptive analysis, we propose that similarities in the scanpaths common to a given subject would be more strategy-driven, whereas scanpath similarities within images would be influenced by scene features. We also propose that patterns across an absolute scale (the entire image search) would be strategic, whereas local, relative patterns would be influenced more by local features or attentional state.

To carry out the scanpath analysis, substrings were extracted from each trial using a nonoverlapping sliding window, and the Needleman–Wunsch (NW) algorithm (Needleman & Wunsch, 1970) was used to generate a distance score representing the amount of work required to convert one subsequence into the other. Valid string manipulations for NW included gap insertion and the transformation of one character into another. The cost (T) of these transformations was set at 1.0 for gap insertion, and a relative transition cost of 1.0 minus the inverse of the distance between the alphabetical characters representing the discretized angle and latency of the saccade:

$$ T = 1 - \frac{1}{\left|\mathrm{old} - \mathrm{new}\right|}. $$

This inverse distance allowed for the fact that our spatial, angular, and temporal codes represented a scale, with closer categories taking less work to transform. That is, angle B is more similar to C than it is to G.

The score from comparing any pair of sequences (A and B) was the final number of characters that matched in value and location, minus the cost of the transformations needed to reach that match. This score was then divided by the number of characters in the sequence, to normalize for sequence length. This produced a range of similarity scores from –1 (for no similarity) to +1 (for a perfect match). So,

$$ \mathrm{NW} = \frac{\sum_{i=0}^{n} S\left(A_i, B_i\right) - \sum T}{n}, $$

where n is the length of the resulting string, S() is the similarity value of each character in the subsequence, and T is the cost of each transform.
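The scoring can be sketched as a small dynamic program. This is a minimal illustration under several assumptions that are not stated in the text: each sequence is a list of integer bin codes for a single variable, a matching character contributes S = 1 and a mismatch contributes S = 0, n is taken to be the length of the final alignment, and ties are broken in favor of the shorter alignment. The function name `nw_similarity` is hypothetical.

```python
def nw_similarity(a, b, gap_cost=1.0):
    """Length-normalized Needleman-Wunsch score for two lists of integer codes."""
    def pair_score(x, y):
        if x == y:
            return 1.0                         # match: S = 1, no transform cost
        return -(1.0 - 1.0 / abs(x - y))       # mismatch: S = 0, cost T

    rows, cols = len(a) + 1, len(b) + 1
    # Each cell holds (best raw score, negative alignment length).
    table = [[(0.0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (-gap_cost * i, -i)
    for j in range(1, cols):
        table[0][j] = (-gap_cost * j, -j)
    for i in range(1, rows):
        for j in range(1, cols):
            diag, up, left = table[i - 1][j - 1], table[i - 1][j], table[i][j - 1]
            table[i][j] = max(
                (diag[0] + pair_score(a[i - 1], b[j - 1]), diag[1] - 1),
                (up[0] - gap_cost, up[1] - 1),      # gap in b
                (left[0] - gap_cost, left[1] - 1),  # gap in a
            )
    raw, neg_len = table[-1][-1]
    return raw / -neg_len if neg_len else 0.0

identical = nw_similarity([1, 2, 3], [1, 2, 3])
distant = nw_similarity([1], [11])
```

With the transition cost exactly as written, codes that differ by one bin incur zero cost, and the cost approaches 1.0 as the codes move farther apart.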

Because our trials lasted up to 120 s and many contained hundreds of saccades, the similarity score for any two full trials would be meaningless. We therefore sampled substrings of lengths one to five from each trial and looked at the mean NW similarity score for these trials. Subpatterns also allowed us to look for shorter repeating patterns within each trial. Our selection used a nonoverlapping sliding window of random length, for reasons discussed in Keogh and Lin (2005).
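A nonoverlapping window of random length can be sketched as follows. The function name `random_windows` and the optional `rng` argument are introduced here for illustration, and letting the final window be cut short when the trial runs out of saccades is an assumption.

```python
import random

def random_windows(sequence, min_len=1, max_len=5, rng=None):
    """Split a sequence into consecutive, nonoverlapping substrings whose
    lengths are drawn uniformly from min_len..max_len."""
    if rng is None:
        rng = random.Random()
    windows, start = [], 0
    while start < len(sequence):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        windows.append(sequence[start:start + length])
        start += length  # the next window begins where this one ended
    return windows

trial = list(range(23))  # stand-in for one trial's coded saccades
substrings = random_windows(trial, rng=random.Random(0))
```

Because every window begins where the previous one ended, no saccade contributes to more than one substring.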

Results for scanpath similarity

The similarity scores for our eight observers were analyzed using paired t tests and adjusted for multiple comparisons with the Holm–Bonferroni (Holm, 1979) correction. First, we compared differences between our three coding schemes (AbsAng, RelAng1, and RelAng2) to determine the relative strengths of the substring patterns (NW score) within each. We observed significant NW score differences between all coding schemes, with AbsAng being less than RelAng2 [t(7) = 24, p < .001] and RelAng2 being less than RelAng1 [t(7) = 25, p < .001] (see Fig. 5). In general, the relative patterns were stronger than those measured in absolute coordinates or angles. The mean NW values were also consistently negative, suggesting that overall, there was relatively little similarity of substrings within each search and that the saccades were not likely generated by any single, simple repeating pattern. A second analysis compared NW scores when the trials were grouped by image against trials grouped by observer. We propose that string similarities within a single image would be primarily caused by scene features, whereas similarity within subjects would be more indicative of a top-down strategy. Substring t tests were conducted for all of our coding schemes, and the mean similarity score was calculated for each. We grouped these means by images and by observers to determine which of these factors contributed more to any patterns observed with NW, but we did not find any differences for any of our string codings [AbsAng, t(7) = 0.80; RelAng1, t(7) = 0.73; RelAng2, t(7) = 0.48] (Fig. 5). 
Again, NW scores are negative, suggesting few or weak similarities among the substrings, whereas the lack of effect when comparing the image against the observer groupings suggests that neither was a stronger influence in determining what similarities exist in the saccadic subsequences: The relative and absolute patterns in these search data were equally influenced by scene features and top-down strategies.
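For readers unfamiliar with the Holm–Bonferroni step-down procedure (Holm, 1979) used for the comparisons above, a minimal sketch follows. The function name and the p values in the usage line are hypothetical and are not taken from the present data.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one reject/retain decision per p value, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p value is tested against alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test is retained, all larger p values are too
    return reject

decisions = holm_bonferroni([0.01, 0.04, 0.03])  # hypothetical p values
```

In this example only the first comparison survives: 0.01 falls below 0.05/3, but 0.03 exceeds 0.05/2, so testing stops there.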

Fig. 5

Mean Needleman–Wunsch similarity scores, grouped by image, by observer, for the full data in the actual order, and the full data in a randomized order. All of the scores are strongly negative, rejecting simple scanning strategies, but the original angular data have more pattern similarity than does the random order. Relative patterns are also stronger than the absolute coordinate patterns. No difference in pattern strength is apparent when the data are grouped by images or observers, suggesting that these patterns result from equal contributions of observer (strategy) and image (salience)

We wanted to ensure that our null result when comparing the similarity of strings for the observer and image groupings was due to weak but equal contributions of strategy and salience, so we performed two tests to ensure that our measure would detect changes in patterns. First, we compared the results of the three codings in their original order to sequences resulting from a random walk of saccade locations (Fig. 5). We performed t tests for each coding scheme, comparing the NW score of each original ordering against its temporally randomized equivalent. These comparisons resulted in significantly more patterns (less negative) for all three angular measures [AbsAng, t(7) = 2.75, p < .05; RelAng1, t(7) = 771, p < .001; RelAng2, t(7) = 283, p < .001], suggesting that patterns did arise in our data, as measured by these coding schemes, and that the NW is sensitive to those patterns.

For further evidence that our NW score is sensitive to differences in the search patterns, we turned to Gilchrist and Harvey (2006), who manipulated the entropy of search arrays to explore the effect on systemic search patterns that they attributed to cognitive strategy. In their results, they showed that regular search arrays tended to produce a stronger horizontal saccade bias than did arrays that were less regular. Although we did not choose our images with this manipulation in mind, our stimulus set included one image with more regular features than the others. In most Wally images, the character and object distractors are spread out equally throughout the scene, but for one image in our set, the distractor characters were displayed as framed portraits, with empty space between the frames (The Great Portrait Exhibition in Where’s Wally? The Great Picture Hunt, 2006). If our NW similarity measure is sensitive to systemic patterns, we should be able to replicate Gilchrist and Harvey’s results with the picture frame image (Fig. 6). We compared the mean NW similarity scores for our typical images to the mean of the frame image. We found that the typical Wally image contained fewer similar subsequence patterns than the Wally frame image for all of our angular sequence codings [AbsAng, t(7) = 3.02, p < .05; RelAng1, t(7) = 3.15, p < .05; RelAng2, t(7) = 4.36, p < .01]. Although the mean difference was small in each case (maximum .05 NW score), it was consistently in the expected direction of the “frame” image producing more similar strings, and the absolute saccadic angle coding (AbsAng) replicated the coding used in Gilchrist and Harvey’s study.

Fig. 6

Comparison of Needleman–Wunsch (NW) scores of the “Frame” image and the mean of the other images. The AbsAng result is a close replication of the search array manipulation and results in Gilchrist and Harvey (2006)

Common substrings

Another common analysis for data mining and genetics is the discovery of common substrings, or “motifs” (Chiu, Keogh, & Lonardi, 2003). We looked for common sequences in our data set by using a probability weight matrix applied to the dyads established above. Typical sequences tended to be short, with most being only a single dyad, and none extending beyond two dyads. These single common saccades generally reflected the broad tendency toward horizontal saccades reported above. These results suggest that no clear motif search patterns exist in our data, in either the absolute or relative angles of saccades, so the full analysis and results are not reported in detail here.

Midlevel orienting mechanisms

Midlevel effects, rooted in an observer’s prior orienting behavior, have been suggested to be a driving force in visual search. Both IOR (Klein & MacInnes, 1999; Posner, Rafal, Choate, & Vaughan, 1985) and SM (Smith & Henderson, 2009) have been proposed to play a role during search by biasing saccades: either away from previously fixated locations or toward a continuation of the current vector, respectively.

As opposed to top-down strategy, which could be measured in lengthy, global patterns, midlevel orienting effects are most likely to affect saccadic distributions as a function of the current state of the oculomotor system. We therefore focused on short-term, relative measures—specifically, on the only computationally explicit model of SM. Wang, Satel, Trappenberg, and Klein (2011) proposed that leftover activity in the superior colliculus following a saccade leads to an increased probability of a repeated saccadic vector or “saccades in the forward direction, particularly those with the same amplitude as the previous saccade” (p. 3). We analyzed our data with models such as this in mind, in particular for the relative occurrences of saccadic amplitudes, fixation durations, and spatial locations at recently visited (one- and two-back) locations.

Repetition of amplitude

Our null hypothesis for the distribution of amplitudes for individual saccades in a given search is that they are chosen randomly from some distribution. Without making any assumptions regarding the properties of the distribution of amplitudes for individual saccades, we can still say something about the difference between two saccadic amplitudes that are randomly sampled from that distribution; notably, the distribution of differences will have a mean and mode of zero and a normal distribution. Sequential saccades may not be independent, however, and midlevel orienting mechanisms could influence the selection of saccades, such that the difference in amplitudes within a saccade pair would not fit the “expected” distribution of differences based on random selection. In particular, SM predicts that saccades would tend in a forward direction (Smith & Henderson, 2009), which, in the computationally explicit model (Wang et al., 2011), produces a higher frequency of repeat vectors (amplitude and direction) than would be expected by chance, resulting in a mean amplitude difference of zero but an increase in the mode relative to that produced without SM (oblique saccades). On the other hand, IOR would lead to a reduced probability of saccade pairs in a reverse direction having equal amplitudes, resulting in a deviation from the normal distribution. We also tested this null hypothesis against oblique amplitude pairs (saccades that neither continued nor reversed), which should not be affected by either SM or IOR.
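The mean and mode claims for the null distribution can be motivated with a short sketch. It assumes, as the null hypothesis does, that two amplitudes $X_1$ and $X_2$ are drawn independently from the same distribution, and further that this distribution has a finite mean and a square-integrable density $f$; these symbols are introduced here only for illustration.

$$ D = X_1 - X_2, \qquad E[D] = E[X_1] - E[X_2] = 0, $$

$$ f_D(d) = \int f(x)\, f(x+d)\, dx \;\le\; \int f(x)^2\, dx = f_D(0). $$

The inequality follows from the Cauchy–Schwarz inequality, so the density of the differences is symmetric about zero and peaks there. Normality of $D$ does not follow from this sketch alone; it holds, for example, when the amplitudes themselves are normally distributed.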

Smith and Henderson (2011a, 2011b) assessed relative amplitudes in their analysis of SM in visual search, but they used a fairly coarse bin for relative saccadic amplitudes. Their difference measure subtracted the amplitude of the current saccade from that of the previous saccade, and the fact that the distribution of differences tended to center on ±2 visual degrees in their results demonstrates a high likelihood of amplitude repetition. However, given that most saccades are likely to be less than 4º in amplitude (MacInnes & Klein, 2003; von Wartburg et al., 2007), and might be even smaller in complex scenes, this binning could mask subtly different patterns for refixations relative to saccades 180º away. We divided our own data into 1.0º amplitude bins instead of 2º bins, to gain a more precise measure of repetition. For angular distance, we again used the angle between the previous and current saccadic vectors, with repeat vectors being 180º ± 5º and reverse vectors being 0º ± 5º (see Fig. 1), and the “oblique” category containing all other saccadic angles. As can be visualized in Fig. 7, had we used larger bins of ±2º magnitude differences, too high a percentage of the overall saccadic distribution of differences would have been in the first two bins (as a consequence of the fact that almost 70 % of the saccades in our experiment fell between 0 and 4 visual degrees in amplitude).
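The binning just described might be sketched as follows. Several details are assumptions made for this illustration: saccades are given as (dx, dy) vectors in visual degrees, the difference is taken as current minus previous, and the 1.0º bins are centered on integer values so that one bin is centered on zero. The function name is hypothetical.

```python
import math
from collections import Counter

def amplitude_difference_histograms(saccades, tolerance=5.0, bin_width=1.0):
    """Histogram amplitude differences (current - previous) separately for
    forward, backward, and oblique saccade pairs."""
    hists = {"forward": Counter(), "backward": Counter(), "oblique": Counter()}
    for (px, py), (cx, cy) in zip(saccades, saccades[1:]):
        # Angle of the current saccade relative to the vector that points
        # back to the previous fixation (the reversed previous saccade).
        dot = (-px) * cx + (-py) * cy
        cross = (-px) * cy - (-py) * cx
        angle = math.degrees(math.atan2(abs(cross), dot))
        if angle <= tolerance:
            kind = "backward"   # 0 degrees: a return saccade
        elif angle >= 180.0 - tolerance:
            kind = "forward"    # 180 degrees: same direction as before
        else:
            kind = "oblique"
        diff = math.hypot(cx, cy) - math.hypot(px, py)
        hists[kind][math.floor(diff / bin_width + 0.5)] += 1
    return hists

# Hypothetical saccade vectors: right 4, right 3, left 3, then up 2.
hists = amplitude_difference_histograms([(4, 0), (3, 0), (-3, 0), (0, 2)])
```

Bin k collects differences from k − 0.5 up to (but not including) k + 0.5 degrees, so a pair with equal amplitudes falls in bin 0.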

Fig. 7

Distributions of saccade amplitude differences (current – previous) for forward (A), backward (B), and oblique (C) saccades, as well as means for the distributions of differences (D). The polar plot (E) shows all saccades according to their relative angle and distance. The bisecting lines represent pie slices used to designate the forward and backward saccades in panels A, B, and D

Using this analysis, the mean relative amplitudes for forward and backward saccades were both significantly different from zero, with forward saccades tending to undershoot the previous amplitude [mean = –1.415; t(7) = 3.37, p < .02] and backward saccades tending to overshoot [mean = +1.169; t(7) = 3.35, p < .02]. These results reject the null hypothesis that pairs of sequential saccades are randomly selected from some underlying distribution. The significant difference in relative amplitudes (the mean does not equal 0.0) among successive forward saccades also does not support the prediction made by SM that equal amplitudes should be more likely when direction is also repeated. The significant amplitude difference among successive backward saccades, although it is not predicted by IOR, is consistent with IOR. The significant differences that we did detect were not an artifact of the bin sizes for the amplitude or angle, given that we applied the same analysis to oblique saccades, and found that the mean relative amplitude was not significantly different from zero [t(7) < 1; Fig. 7]. It is true that the modes of all three distributions (forward, backward, and oblique) are at or near zero, and we see no differences in the proportions of saccades at the mode between forward, backward, and oblique saccades [t(7) < 1]. We do not dispute that repeat amplitudes are common for both forward and backward vector saccades (this is discussed in more detail in the “Backward Probability” section below), but our null hypothesis predicts a distribution in which both the mode and mean of the distributions would fall at 0, and either of these scores measuring a nonzero value is sufficient to reject the null. Some process is acting on the selection of successive backward and forward amplitudes to shift them away from the purely random selection observed in successive oblique saccades (Fig. 7C). We do not see the increase in repeat frequency that we would expect from the Wang et al. (2011) SM account of forward amplitudes, nor do we see the decrease in the probability of repeat amplitudes that we would expect to see in backward saccades from an IOR account. The pattern reveals multiple processes that could be acting on the selection of saccade vectors. For example, perhaps observers’ previous attentional states shift the distribution of differences for backward and forward saccades away from the Gaussian predicted by random sampling or SM, and other factors, such as the salience of the previous fixation, generate a large number of refixations. This is also evident in the polar plot (Fig. 7E), which shows frequent repeat amplitudes for all angles (the 0º amplitude ring), but a break from the normal distribution for forward and backward saccades (green pie slices).
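The relative-amplitude analysis described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes fixations are given as (x, y) positions in degrees of visual angle, and the function names and the `half_width` of the forward and backward pie slices are hypothetical choices, because the slice width is not stated in this section. Angular distance follows the convention of the text: 0º is a saccade straight back to the one-back location and 180º is a saccade straight away from it.

```python
import math
from statistics import mean, stdev


def angle_to(origin, target, saccade_end):
    """Unsigned angle (deg) between the saccade origin->saccade_end and the
    line origin->target: 0 = straight at the target, 180 = straight away."""
    ax, ay = saccade_end[0] - origin[0], saccade_end[1] - origin[1]
    bx, by = target[0] - origin[0], target[1] - origin[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    return abs(math.degrees(math.atan2(cross, dot)))


def relative_amplitudes(fixations, half_width=22.5):
    """Group (current - previous) saccade amplitudes by direction relative
    to the one-back location. half_width is an assumed slice size."""
    groups = {"forward": [], "backward": [], "oblique": []}
    for prev, cur, nxt in zip(fixations, fixations[1:], fixations[2:]):
        prev_amp = math.dist(prev, cur)   # amplitude of the previous saccade
        cur_amp = math.dist(cur, nxt)     # amplitude of the current saccade
        angle = angle_to(cur, prev, nxt)
        if angle >= 180 - half_width:
            key = "forward"
        elif angle <= half_width:
            key = "backward"
        else:
            key = "oblique"
        groups[key].append(cur_amp - prev_amp)
    return groups


def one_sample_t(values, mu=0.0):
    """t statistic for H0: mean(values) == mu, with len(values) - 1 df."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))
```

In the analysis reported here, the t tests were run on per-observer means (eight observers, hence seven degrees of freedom), so `one_sample_t` would be applied to one mean relative amplitude per observer rather than to the pooled saccades.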

Forward probability

That pairs of sequential saccades tend to repeat their current angular direction is not in dispute, and this pattern can be observed in the present data, as well as in previous studies (Klein & MacInnes, 1999; Smith & Henderson, 2011a, 2011b). Both SM and IOR have been proposed as a basis for this forward tendency, but most analyses have focused on the most recently fixated location. A forward tendency could be rooted in spatiotopic coordinates, such as an inhibitory tag (Klein, 2000), or in vector coordinates, as suggested by Wang et al. (2011), and these theories are not easily distinguished at the one-back location. A tendency to saccade forward (180º—i.e., away from previously visited spatial locations) would cause a vector bias away from only the one-back location, and would predict a smaller bias away from the two-back location. IOR and SM make different predictions concerning the expected reduction of the number of forward saccades from the one-back location, relative to forward saccades from the two-back location. A vector-based explanation for the forward bias predicts a bias away from the two-back location only when two forward (180º) saccades are produced in a row. A spatiotopic-based account such as IOR, on the other hand, predicts an increased probability of saccades being directed away not only from the immediately preceding fixation, but also from the two-back fixation (Fig. 1). Figure 8 shows the distributions of saccade angles relative to both the one-back and two-back locations, and although we did see a decrease in forward saccades away from the two-back location (relative to the one-back location) of 0.2 %, this reduction was not significant [t(7) < 1.0], suggesting equal biases away from the two-back and one-back locations, consistent with a spatiotopic attentional influence.

Fig. 8

Distribution of all saccade angles relative to the (A) one-back and (B) two-back locations, using bins of five visual degrees. Angular distances range from 0º (backward saccadic vector) to 180º (forward saccadic vector) (see Fig. 1 for the calculation of angular distances)

Some of these saccades away from the two-back location, however, are also saccades away from the one-back location, when the two previous saccades line up with the current vector (see Fig. 9 for an illustration). We measured the probability that the two-back location was forward, given that the one-back location was also forward, to determine whether these sequences could explain the lack of reduction in two-back saccades. Selecting only saccades directed forward relative to the two-back location, we found that 54 % of these saccades were also directed forward relative to the one-back location (180º ± 5º). The percentage of repeat saccades, however, would have to be the probability that one-back saccades would fall in the 180º bin divided by the probability that the two-back would fall in the 180º bin. Taking these numbers from the 180º bins in Fig. 8A and B, we would need .058/.060 = 96.7 % of saccades to continue forward to entirely explain our observed lack of reduction in two-back forward saccades. Since the percentage of forward saccades does not decrease from one- to two-back, we must conclude that either 97 % of saccades continue in the same direction (they do not) or that something else is shifting saccadic direction away from the two-back location. Thus, these results are consistent with an IOR effect biasing saccadic direction away from the spatial inhibitory tags generated during previous inspections. It is also clear from Fig. 8 that backward saccades are as prominent for one-back as for two-back locations, which is also a problem for IOR to explain. This issue is addressed in the next section.
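The forward-bin comparison above can be sketched as follows. This is an illustrative Python example, not the original analysis code. It assumes (x, y) fixation positions in degrees, and the names `forward_stats` and `required_repeat_fraction` are hypothetical. The ±5º tolerance and the values .058 and .060 are taken from the text.

```python
import math


def angle_to(origin, target, saccade_end):
    """Unsigned angle (deg): 0 = saccade heads at target, 180 = away from it."""
    ax, ay = saccade_end[0] - origin[0], saccade_end[1] - origin[1]
    bx, by = target[0] - origin[0], target[1] - origin[1]
    return abs(math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by)))


def forward_stats(fixations, tol=5.0):
    """Proportion of saccades in the forward (180 deg) bin relative to the
    one-back and two-back locations, plus the share of two-back-forward
    saccades that are also one-back-forward."""
    if len(fixations) < 4:
        raise ValueError("need at least four fixations")
    n = one = two = both = 0
    quads = zip(fixations, fixations[1:], fixations[2:], fixations[3:])
    for two_back, one_back, cur, nxt in quads:
        n += 1
        fwd_one = angle_to(cur, one_back, nxt) >= 180 - tol
        fwd_two = angle_to(cur, two_back, nxt) >= 180 - tol
        one += fwd_one
        two += fwd_two
        both += fwd_one and fwd_two
    return {
        "p_one": one / n,
        "p_two": two / n,
        "p_one_given_two": both / two if two else float("nan"),
    }


def required_repeat_fraction(p_numerator, p_denominator):
    """Ratio of the two forward-bin proportions, as computed in the text."""
    return p_numerator / p_denominator


# Values quoted in the text for the 180 deg bins of Fig. 8A and 8B.
print(round(100 * required_repeat_fraction(0.058, 0.060), 1))  # 96.7
```

Comparing `p_one_given_two` (54 % in the reported data) with the required ratio (96.7 %) is the step that shows aligned sequences alone cannot account for the two-back forward bias.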

Fig. 9

Under saccadic momentum, the probability of saccades away from the one-back location, P(X), should be greater than that away from the two-back location, P(Y). The only cases in which saccades would be predicted to be directed away from the two-back location are those in which the two previous saccades line up

Backward probability

The proposal that IOR is a facilitator of visual search leads to the prediction that the likelihood of a saccade returning to previously fixated locations will be reduced (Klein, 1988). But reduced from what? Although the incidence of return saccades has consistently been shown to be less than that of forward saccades, it is also higher than that of neutral, oblique angles, relative to previous locations (Klein & MacInnes, 1999; Smith & Henderson, 2011a, 2011b; present data). However, the location that was just fixated is likely to be relatively more salient and/or task-relevant than any otherwise equivalent location, simply because the observer has already fixated that location at least once (Klein & Hilchey, 2011). This makes its salience unique among other equidistant locations. Comparing the frequency and metrics of forward saccades to a baseline (such as 90º saccades) is therefore justified (because neither has been previously fixated), but comparing refixations to a similar baseline would be confounded by previous fixation status. To determine whether refixations are more or less likely than baseline, that baseline must be equivalent to the previously fixated target in saliency, task relevance, and distance from the current fixation. Smith and Henderson (2011a, 2011b) controlled for saliency and task relevance by comparing the probability of returning to a location within one or two fixations to the probability of that location repeating when the sequence of fixations is randomly shuffled. The idea was to generate a proportion of refixations that would be expected if IOR did not influence the sequence in which salient locations were fixated. The rate of refixations in the actual sequence was higher than the shuffled baseline, which they took as evidence that IOR was not discouraging refixations. This shuffled baseline does not, however, control for the distance of the previous fixation from the current one. 
Sequences of fixations are spatially clustered, and when shuffled, this clustering would be eliminated. The shuffling method used by Smith and Henderson (2011b; see also Hooge et al., 2005) therefore introduced a new problem, which is that the distance between consecutive fixations when their order has been randomly shuffled will be larger than in the original sequence of fixations (Fig. 10). Because locations closer to the fovea will be more attractive than more distant locations, refixations would be expected to have a higher base rate than other locations in a natural sequence of saccades.

Fig. 10

Proportions of saccades that revisited the one-back and two-back locations in the actual data, and when the fixation order was randomly shuffled. The amplitudes of refixations in the actual data for both locations are shorter than in the randomly shuffled data set

To verify this, we conducted a similar comparison in the present study, defining a refixation as a saccade that fell within one visual degree of a previous fixation. The mean probabilities for all observers to return to the one-back, two-back, and shuffled locations are illustrated in Fig. 10. Although refixations were significantly more likely for the one-back [t(7) = 11.4, p < .001] and two-back [t(7) = 7.5, p < .001] than for the shuffled locations, the distance between the current location and these shuffled locations was also much higher than to one-back return locations [one-back, t(7) = 3.6, p < .01; two-back, t(7) = 2.2, p < .06]. Thus, observers may have returned to previously fixated locations because, despite any influence of IOR, they were still nearby or salient locations. Moreover, Bays and Husain (2012) conducted a Bayesian analysis of search saccades, and they were able to control for scene salience and compare the observed likelihood of return fixations to the likelihood that would be predicted by a memoryless system. Relative to this salience-controlled baseline, return saccades were indeed less likely, giving further support to the IOR account.
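The shuffled-baseline comparison and its distance confound can be illustrated with a short Python sketch. The details are assumptions rather than the procedure actually used: fixations are (x, y) positions in degrees, a whole trial is shuffled at once, and the function names, the number of shuffles, and the seed are hypothetical. Only the one-degree refixation radius comes from the text.

```python
import math
import random


def refixation_rate(fixations, n_back, radius=1.0):
    """Share of saccades landing within `radius` deg of the n-back location.
    The landing point is n_back + 1 fixations after the revisited one."""
    lag = n_back + 1
    pairs = list(zip(fixations, fixations[lag:]))
    hits = sum(math.dist(old, new) <= radius for old, new in pairs)
    return hits / len(pairs)


def mean_distance_to_n_back(fixations, n_back):
    """Mean distance from each fixation to its n-back location, i.e. the
    amplitude a return saccade would need."""
    pairs = list(zip(fixations, fixations[n_back:]))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def shuffled_baseline(fixations, n_back, n_shuffles=1000, seed=0):
    """Average refixation rate and n-back distance over random reorderings."""
    rng = random.Random(seed)
    rates, dists = [], []
    for _ in range(n_shuffles):
        shuffled = list(fixations)
        rng.shuffle(shuffled)
        rates.append(refixation_rate(shuffled, n_back))
        dists.append(mean_distance_to_n_back(shuffled, n_back))
    return sum(rates) / n_shuffles, sum(dists) / n_shuffles
```

Because real fixation sequences are spatially clustered, `mean_distance_to_n_back` will typically be smaller for the actual order than for the shuffled orders, which is the confound discussed above: the shuffled baseline compares nearby return targets with more distant ones.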

Fixation durations

Although midlevel orienting effects can and do generate spatial patterns and probabilities, top-down influences can certainly override these tendencies. There would be no benefit to an orienting system that influenced saccade selection if that system could not be overridden when needed. We would, however, expect to see repercussions of those choices in data such as the fixation durations prior to saccades. In particular, if effects such as SM and IOR ease the oculomotor system forward or discourage it from returning, respectively, then we should see a temporal cost when return saccades are executed and an advantage when vectors are repeated. These predictions for fixation durations are not mutually exclusive, and indeed Smith and Henderson (2009) found evidence for both SM and IOR in natural viewing.

The SM account suggests that when observers follow the tendency to continue forward, there should be an effect of reduced fixation duration prior to that forward saccade. Alternatively, the IOR account predicts an increased fixation duration when observers override that inhibition to refixate a previous location. As we discussed in the introduction, observers can and do return to previously fixated locations, especially with noisy or complex scenes, in which observers may choose to revisit a location to ensure that nothing was missed. We analyzed the fixation durations of gaze locations prior to the current saccade. Since a true return saccade is one that matches the previous saccade in amplitude, yet reverses the direction, we binned our data by both relative amplitude and angular difference. To match the equivalent analysis from Smith and Henderson (2009, 2011a), for relative amplitudes we created seven bins, each of 2º, centered on relative amplitudes from –6º to +6º. For angular difference, we created five bins of 45º, from 0º to 180º (i.e., 0º, 45º, 90º, 135º, and 180º). Saccades of less than 1º were excluded, as were relative amplitudes greater than 7º or less than –7º and fixations that fell within 1º of the screen edge, which limited the potential angular bins. The remaining 8,600 fixation durations were analyzed in 7 (relative amplitude) × 5 (angular difference) within-subjects analyses of variance, separately for the one-back and two-back locations. We expected two patterns to emerge on the basis of previous research: Return saccades of equal amplitude (0º angular distance and 0 relative amplitude) should be slowed relative to other combinations, as predicted by IOR, and forward saccades (either the entire 180º line or the 180º/0 bin) should be speeded relative to other directions, as predicted by SM (Smith & Henderson, 2009).
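The 7 × 5 binning of fixation durations can be sketched in Python as below. This is a hedged illustration, not the authors' pipeline. The record layout and the function names are hypothetical, values are assigned to the nearest bin center (an assumption about how the bins were formed), and the screen-edge exclusion described above is omitted for brevity.

```python
from collections import defaultdict
from statistics import mean

AMP_CENTERS = [-6, -4, -2, 0, 2, 4, 6]    # relative amplitude bins (deg)
ANGLE_CENTERS = [0, 45, 90, 135, 180]     # angular distance bins (deg)


def nearest(value, centers):
    """Closest bin center; exact ties go to the earlier center in the list."""
    return min(centers, key=lambda c: abs(value - c))


def bin_durations(records, min_saccade=1.0, max_rel_amp=7.0):
    """records: iterable of (fixation_duration_ms, saccade_amplitude_deg,
    relative_amplitude_deg, angular_distance_deg).
    Returns {(amplitude_bin, angle_bin): mean fixation duration}."""
    cells = defaultdict(list)
    for duration, amplitude, rel_amp, angle in records:
        # Exclusions stated in the text: short saccades, extreme differences.
        if amplitude < min_saccade or abs(rel_amp) > max_rel_amp:
            continue
        cell = (nearest(rel_amp, AMP_CENTERS), nearest(angle, ANGLE_CENTERS))
        cells[cell].append(duration)
    return {cell: mean(values) for cell, values in cells.items()}
```

Running `bin_durations` once per observer and once per reference location (one-back, two-back) would give the cell means that enter the two within-subjects analyses of variance. The cell `(0, 0)` corresponds to an exact return saccade and the cells with angle bin 180 to forward saccades.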

We found main effects of relative amplitude in both the one-back [F(6, 42) = 11.7, p < .001] and two-back [F(6, 42) = 14.5, p < .001] analyses, with longer fixation durations when short saccades followed long saccades (consistent with an observation made by Smith & Henderson, 2011a, 2011b). No effect of angular distance emerged for the one-back locations [F(4, 28) = 1.3], but it was significant for two-back locations [F(2, 28) = 3.3, p < .03], though not caused by differences in 180º or 0º. The absence of significance in the one-back 180º analysis is contrary to Smith and Henderson (2009, 2011a, 2011b), who found shorter fixation durations for forward saccades in search and free viewing. The interaction was significant in both analyses [one back, F(24, 168) = 1.7, p < .04; two back, F(24, 168) = 2.5, p < .001].

We tested for the significance of the 0º peak by comparing the observed 0º/0 peak against those predicted by the regression line of the other amplitudes. On the basis of expected interactions between durations and relative amplitudes (Smith & Henderson, 2009; Tatler & Vincent, 2008; Unema, Pannasch, Joos, & Velichkovsky, 2005), we expected a linear decrease in fixation durations from relative amplitudes of –6 to +6. Separate regressions for each observer’s 0º angular distance line were used to determine their expected fixation durations at the 0 amplitude location, which resulted in predicted values of 251.8 ms (SD = 15.8) for one back and 249.4 ms (SD = 19.2) for two back. These expected return fixation durations were then compared against the measured fixation durations (one-back mean = 270.3, SD = 29.4; two-back mean = 268.4, SD = 33.6). Dependent-measures t tests against the actual subjects’ means revealed that the 0º peak was significantly slower than predicted for the one-back location [t(7) = 2.5, p < .05]. This difference was not significant for the two-back location [t(7) = 1.8, p < .11].
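The regression test of the 0º/0 peak can be illustrated with a minimal Python sketch. It assumes an ordinary least-squares line fitted to the six nonzero amplitude bins of one observer's 0º angular distance line, which is one plausible reading of the procedure and not a confirmed description of it. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def predicted_at_zero(amplitudes, durations):
    """Fit a least-squares line to the nonzero relative-amplitude bins and
    return its value at amplitude 0 (the intercept)."""
    points = [(a, d) for a, d in zip(amplitudes, durations) if a != 0]
    xs = [a for a, _ in points]
    ys = [d for _, d in points]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx


def paired_t(observed, predicted):
    """Dependent-measures t statistic on observed - predicted differences,
    one pair per observer (df = number of observers - 1)."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

For each observer, `predicted_at_zero` would give the expected duration before an exact return saccade, and `paired_t` would then compare the eight observed 0º/0 durations with the eight predictions. A positive t indicates that return saccades were preceded by longer fixations than the linear trend predicts.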

This analysis suggests that saccades that return to the previously fixated location (reversing direction and repeating amplitude) are particularly slow. If IOR exists independently of, or despite, bottom-up or top-down mechanisms that might generate a return saccade, such saccades are likely to be delayed by IOR, as reflected in longer fixation durations prior to these return saccades. This is exactly the pattern of results that was observed for return saccades of repeat amplitudes in Smith and Henderson (2009, 2011a, 2011b) and in the present study (Fig. 11). It is interesting to note that the 0º/0 peak does not produce the slowest fixation duration of all the locations measured; clearly, factors other than IOR contribute to saccadic latencies. Consistent with the IOR account, we did find a relative cost in fixation duration for saccades that returned to previously fixated locations, but we did not find the signature SM effect of shorter fixation durations for forward saccades from one-back locations (see Footnote 1).

Fig. 11

Fixation durations for saccades and their directions relative to the one-back (upper) and two-back (lower) locations, split by the relative amplitude (current – previous) and angular distance of the saccade. The signature inhibition-of-return effect of a slowed peak at 0º/0 (an exact refixation) is present, whereas the signature saccade momentum effect of a fast 180º (repeat vector) line is not observed

Summary and conclusion

Visual search is a complex interplay of scene salience, searcher strategy, and midlevel aftereffects of orienting. Fixations and saccades from search data unfettered by control conditions or secondary tasks can provide insights from all three of these perspectives if we perform analyses across scene images, observers, and patterns over time.

Top-down strategy and attentional sets are pervasive in all search tasks, whether controlled or free, and they interact with the underlying scene salience (Henderson, 2003). Traditional measures such as search completion times, fixation durations, and spatial distributions of saccades can be augmented with measures of temporal sequence similarity, such as the Needleman–Wunsch score. Through inspection of search data as a sequence of fixations, these measures can be applied to access patterns that are absolute in scene terms or relative to the current state of the oculomotor system. In addition to these more data-driven approaches, it is also possible to test specific hypotheses about the relative impacts of effects such as IOR and SM in natural search data. Although no single one of these measures is sufficient in itself to describe the complexities of search, each contributes a lens through which we are able to observe the respective contributions of strategy, salience, and attention in visual search.
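As a reminder of what a Needleman–Wunsch score measures, the following Python sketch computes the global alignment score of two symbol sequences, for example saccade directions coded as letters. The scoring values (match +1, mismatch –1, gap –1) and the function name are assumptions chosen for illustration and are not necessarily the settings used in this study.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b by dynamic programming,
    keeping only the previous row of the score matrix."""
    prev = [j * gap for j in range(len(b) + 1)]      # row for an empty a
    for i, x in enumerate(a, start=1):
        cur = [i * gap]                              # column for an empty b
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(diagonal, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Identical sequences score their length; one deletion costs a gap penalty.
print(needleman_wunsch_score("ABCD", "ABCD"))  # 4
print(needleman_wunsch_score("ABCD", "ABD"))   # 2
```

Higher scores indicate more similar fixation or saccade sequences. Because the raw score grows with sequence length, comparisons across trials of different lengths would need some form of normalization.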

Strategy and salience

Patterns of search involving an observer’s top-down strategy can be simple, such as a left–right “reading” strategy, or more complex and situational, such as focusing attention on reddish scene items that might match Wally’s shirt. Simple global strategies were not seen in our data, as evidenced by relatively low string similarity for our observers and the very short length of common substrings. Consistent strategies would predict saccadic sequences that were more similar when grouped by observers than those grouped by particular images, yet we found no evidence for this. Saccade sequence similarities across images and observers were not different, suggesting equal contributions of each. The patterns detected by our analysis of sequence similarity suggest that, overall, repeated sequences are uncommon and short, with most lasting one or two saccades. Those that are present tend to be more frequent in local and relative coordinates than are those measured in global or absolute coordinates, meaning that they are more likely influenced by the then-current state of the oculomotor system and image salience. Due to the complexity of the Where’s Wally? search scenes, our sequence and strategy analyses did not consider scene-, object-, or feature-based strategies, except insofar as they would be represented by saccadic selection and consistent across observers.

Just as the patterns for any given subject implicate strategic control, patterns for any given image implicate a role for the salience and features of that image. As we mentioned, a comparison of saccade similarities by observers and images showed no differences, suggesting relatively equal contributions of each while search proceeded in complex scenes. Comparing specific images of differing regularities, however, replicated the controlled study by Gilchrist and Harvey (2006), in that the image with more regular features showed more local patterns in both absolute and relative angular coordinates.

Aftereffects of orienting behavior

We confirmed that saccades are not independent in visual search: Saccades are more likely to move in a forward direction than toward the previous location. Our analysis of eye movement behavior during natural search suggests that observers are biased away from recently visited locations. This tendency toward novel locations can of course be overridden; in the context of complex scenes, for example, refixations are common and necessary to discover missed details. Here, we also found that refixations were a common occurrence during natural search. However, in most cases, when an oculomotor bias was overridden and a saccade was directed toward a recently visited location, we observed the effects of the bias in fixation durations.

Although SM has been shown to contribute to saccade behavior in other studies, the majority of the evidence here points toward IOR being the primary mechanism driving saccades away from previously attended locations. We found tendencies for forward saccades to diminish in amplitude, for return saccades to increase in amplitude, and for saccades in other directions to be, on average, of similar amplitude. Although a reason for diminishing amplitudes in consecutive forward saccades cannot be endorsed with our data alone, it is inconsistent with SM, which predicts that the amplitudes for consecutive saccades executed in the same direction should be similar in size. It is similarly not clear why return saccades would tend to increase in relative amplitude, but this is consistent with the possibility that these saccades are targeting not a previously fixated location, but another object along the same trajectory. One explanation for these results could be strategic: Although saccades falling close to the screen’s edge were excluded, a string of forward saccades would eventually run out of search space, given the screen dimensions. In fact, for every pairing of forward saccades in the relative-amplitude analysis, the second forward saccade must have less screen space in which to move forward than its predecessor. Corrective forward saccades that result from undershooting a saccade target could also explain this tendency in some portion of saccades. Either of these explanations, along with scene saliency, would likely combine with any momentum in the superior colliculus to produce relative forward amplitudes that would approach the repeat amplitudes, but fall short. Just as IOR might compete with other mechanisms that influence backward saccades, SM might combine with other mechanisms for forward repetitions. Considering that successive oblique saccades do average to zero, these inhibitory and forward mechanisms are unique to those directions.

For an IOR account based on spatial inhibitory tags, the tendency should be to saccade away from not only the immediately previous (one-back) location, but also the location before it (two back), given that IOR has been measured for locations extending back four fixations previous to the current one (Dodd, Van der Stigchel, & Hollingworth, 2009). Our results clearly show the existence of a bias away from the two-back as well as the one-back locations, consistent with IOR. However, forward saccades are an indirect measure of the effect of IOR, on the basis of the idea that the forward direction opposite the previously fixated location would carry the least inhibition. The more difficult, but perhaps most important, question will be whether IOR effectively biases saccades away from previously fixated locations—that is, whether previously fixated locations are visited less often than would be expected on the basis of chance. Bays and Husain (2012) have completed just such an analysis, and showed a clear bias away from a previously fixated location, as compared to the predictions of a memoryless model.

Finally, we looked at the expected effects of both IOR and SM on fixation durations during search. Previous results (Smith & Henderson, 2009) had shown both a slowing of fixation durations for those saccades that repeat one-back locations and an overall speed advantage for saccades continuing in a repeat direction. Although we did find slowed fixation durations before return saccades, we found no speed advantage for forward saccades, even in the one-back analysis, in which we would expect to find the greatest influence from SM. The key difference between our study and previous ones is that we explored saccadic patterns without a secondary probe detection task. If the secondary probe task is indeed the reason for the contradictory results, we are inclined to favor ours as being the more valid approximation of natural search behavior. It is reasonable to suspect that fixation durations would be affected by the expectation of the sudden onset of a task-relevant probe. Indeed, the probe onsets are usually yoked to fixation behavior in learnable ways, and observers may be inclined to learn these contingencies and change their behavior to try to anticipate or accommodate them. Another possibly important factor is search time, which was much longer here than in previous studies. This could also contribute to differences in fixation durations, although one could argue that by looking over a longer timeframe, we have extended the conclusions that can be drawn on the basis of our data, as opposed to limiting them by repeatedly calling off the search earlier than an observer naturally would.

We observed midlevel effects, in the form of IOR, supporting its putative role as a foraging facilitator in visual search. Although we did not see evidence for SM in these data, we cannot rule out the existence of mechanisms that could drive search forward, in addition to biasing it away from returns. Questions remain, however, about how low-level and oculomotor mechanisms interact with scene salience, experiment instructions, and observer strategy during search. For instance, free search and reading produce very different strategies and saccadic tendencies, yet both produce reliable IOR (Rayner, Juhasz, Ashby, & Clifton, 2003), whereas IOR is not always found when observers are asked to memorize a scene (Dodd et al., 2009). This suggests that much is still left to learn about the roles of context and task in IOR.