The present review was primarily motivated by a desire to address a particular claim regarding the visual detection of threats in humans. The claim is most readily associated with the work of Öhman and colleagues (Öhman, 1999; Öhman & Mineka, 2001), and it is that humans have evolved to detect threats in the immediate visual environment rapidly. By this account, throughout their evolutionary history, humans have developed a fear system (Öhman & Mineka, 2001), and this cognitive module (Fodor, 1983) is specially adapted “for the elicitation of fear and fear learning” (p. 483). It incorporates defense mechanisms (and gives rise to emotional reactions) for dealing with threatening situations. The associated fear response hypothesis is that the fear system produces an automatic early warning signal whenever a scene that contains any form of threatening item confronts an observer and it is this that alerts the observer to the threat. It has been argued that survival chances can be increased only if threatening items are immediately prioritized and, as a consequence, the fear system should be invoked quickly and automatically by the presence of threat in the environment. It is this claim, about the automatic alerting of the fear system toward immediate threat, that has been the driver of much research on how the presence of emotional visual stimuli may affect attentional control (see Quinlan & Dyson, 2008, Chap. 16, for a review).

In the present article, the primary intention is to assess critically the empirical evidence that has been used to bolster support for the fear response hypothesis. From a recent review of some of the evidence (see LoBue, Rakison, & DeLoache, 2010), the impression gained is that the basic account is sound and that it has general support in the literature. However, here a more skeptical view is adopted, and a quite different conclusion is arrived at from a more critical appraisal of the empirical evidence.

The present article is predominantly about evidence from speeded visual search tasks, since it is this that provides the bedrock of the support for the fear response hypothesis. As a consequence, the article begins with a comprehensive review of the literature on effects of threat in speeded visual search. In considering this material, it rapidly becomes apparent that the literature is dogged by notable and recurrent methodological confounds that seriously limit the conclusions that can be drawn. These confounds are described, and the evidence for the rapid detection of threat is examined in light of these concerns.

Most of the work on the effects of threat in speeded visual search has focused on biological threats related to the presence of images of predatory/dangerous animals. However, there has been some work that has addressed effects of threat relating to the presence of images of man-made, dangerous items. One notable study is considered, in which search for images of nonbiological threats was examined, and the evidence is critically appraised. Again, it seems that certain methodological confounds undermine the generality of conclusions that may be drawn.

Other evidence relating to the search for nonbiological threats is then reviewed. It is this sort of evidence that has been used to bolster claims that humans are particularly sensitive to learning about possible threats during the course of their lifetimes. Issues regarding this sort of learning are discussed, as are issues concerned with incidental perceptual learning that may be taking place during actual testing. The review is brought to a close with a consideration of how search can be affected when aversively trained stimuli are used as targets. Different conceptions of attentional control are described, and it is these, together with methodological concerns, that form the basis of the conclusions that are drawn.

It is important to be clear at the outset that this short review is directed solely at the evidence relating to the fear response hypothesis from studies of speeded visual search. Although some general lessons, particularly about stimulus selection, can be taken away from what is written, the article is limited to a consideration of the work on speeded visual search. It is this sort of evidence that is germane to the fear response hypothesis and to the related case for support advanced by LoBue et al. (2010). Consequently, it is beyond the scope of the article to consider evidence from other paradigms (see Quinlan & Dyson, 2008); there is no intention to provide a comprehensive review of threat detection in general.

It is also important to point out at the outset that the literature on speeded visual search with facial stimuli is not discussed in any depth. Part of the reason for this is that this work has recently been thoroughly reviewed by Frischen, Eastwood, and Smilek (2008). More critically perhaps, it seems that the strongest message to emerge from this literature is that happy expressions are detected more rapidly and easily than are any of the negative emotions (Calvo & Nummenmaa, 2008; Juth, Lundqvist, Karlsson, & Öhman, 2005). It would also be somewhat remiss not to acknowledge the many other very intriguing effects concerning attentional control and the perception of facial expressions in nonsearch tasks (see, e.g., Bocanegra & Zeelenberg, 2009; Phelps, Ling, & Carrasco, 2006), but a detailed consideration of these would be a digression too far removed from the primary aims of the present review.

Visual search and the automatic attention-grabbing nature of threatening stimuli

The basic claim (Öhman, 1999) to be addressed is that the fear system is automatically activated by immediate visual threats. The primary experimental evidence for this comes from studies of speeded visual search. Using such techniques, the overarching aim has been to try to establish empirical support for the claim that visual threats automatically grab attention.

There are many variants of the basic visual search paradigm (see Quinlan, 2003, for a review), but a classic example is where, on a given trial, a participant is told to search for a target stimulus of a particular type in a search display that contains several possible targets. Participants have to respond by pressing one key if they find the target (i.e., they make a target-present response) and a different key if they are unable to find the target (i.e., they make a target-absent response). The typical measures of interest are response time (RT) and response accuracy.

The basic window on cognitive processes is the so-called search function that maps out the relation between the speed of response and the complexity of the display. Typically, display complexity is indexed by the display set size defined as the number of search elements that the participant is presented with in a given display. A general finding is that search time increases in a monotonic (typically linear) fashion with display set size. Such an increase is taken to reflect the ease/difficulty of the actual search. If detection of a target is very easy, conditions can arise where RTs do not scale with display set size. That is, the search function is described as being “flat,” and this is known as target pop-out. Under these circumstances, the target is as easy to detect among a few nontargets as among many nontargets. Much of this kind of work can be traced back to the origins of feature integration theory (Treisman & Gelade, 1980), and it is from here that a pop-out effect is taken to reflect a parallel, automatic process in which the target immediately grabs attention. In contrast, more difficult searches are reflected in cases where the RTs scale with display set size. When such functions are linear, the slope of the line is typically taken to reflect the speed of the search. Relatively slow responding and linear search functions are taken to reflect failures of a target to engage attention automatically.

In the early literature on visual feature integration theory (Treisman & Gormican, 1988; Treisman & Souther, 1985), a “flat” search function was taken to be indicative of automatic/preattentive pick-up of target information across the whole of the visual field (Quinlan & Humphreys, 1987; Treisman & Gelade, 1980). As a rough rule of thumb, a “flat” search function is one where the corresponding slope value falls within “the limits of a maximum of 5–6 ms/item” (e.g., Treisman & Souther, 1985, p. 471). In the early work on the effects of threat in visual search, appeals to flat search functions were made to support claims about the automatic attention-grabbing properties of threatening stimuli. More recently, though, concerns have been raised about the legitimacy of inferring the kind of search process from a single estimate of search performance (see Thornton & Gilden, 2007; Wolfe, 1998, p. 36). Nonetheless, 5–6 ms/item is indeed an index of a very shallow slope (Haslam, Porter, & Rothschild, 2001; Wolfe, 1998) and clearly demonstrates very rapid target detection.
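To make the notion of a search function concrete, the following is a minimal sketch of how a slope might be estimated and checked against the 5–6 ms/item rule of thumb. The function names, the choice of 6 ms/item as the cut-off, and all of the RT values are assumptions introduced here for illustration; none of them is taken from the studies under review. In line with the concerns noted above, a single slope estimate of this kind should not, by itself, be taken to identify the underlying search process.

```python
def fit_search_function(set_sizes, mean_rts):
    """Return (slope in ms/item, intercept in ms) of the least-squares line
    relating mean RT to display set size."""
    n = len(set_sizes)
    if n != len(mean_rts) or n < 2:
        raise ValueError("need at least two matched (set size, RT) points")
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    if sxx == 0:
        raise ValueError("set sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_flat(slope, limit=6.0):
    """Apply the rough 5-6 ms/item rule of thumb (upper limit assumed to be 6)."""
    return slope <= limit


# Hypothetical mean RTs (ms), invented purely for illustration.
set_sizes = [4, 9, 16]
shallow_rts = [520.0, 530.0, 544.0]  # lies on a line rising 2 ms per item
steep_rts = [600.0, 750.0, 960.0]    # lies on a line rising 30 ms per item

shallow_slope, shallow_intercept = fit_search_function(set_sizes, shallow_rts)
steep_slope, steep_intercept = fit_search_function(set_sizes, steep_rts)

print(f"shallow: {shallow_slope:.1f} ms/item, flat by rule of thumb: {is_flat(shallow_slope)}")
print(f"steep: {steep_slope:.1f} ms/item, flat by rule of thumb: {is_flat(steep_slope)}")
```

The intercept is returned alongside the slope because two search functions can share a slope yet differ in overall speed of responding.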

In their seminal article, Öhman, Flykt, and Esteves (2001) employed an oddball version of the visual search task to examine how the threat content of the search items might influence performance. It is this form of search task that has been widely adopted in the literature on the effects of threat detection. In their version of the task, each search item was a full color photographic image of a plant or animal, and participants, on each trial, had to judge whether all the search items were taken from the same biological category or whether there was a distinctive singleton (an oddball) present. Two categories of so-called fear-relevant items were chosen (namely, spiders and snakes), and two categories of fear-irrelevant items were also chosen (namely, flowers and mushrooms). The aim was to see whether participants would show an advantage in detecting the presence of an image of a threatening item in a display containing images of nonthreatening items gauged relative to detecting a nonthreatening item in the presence of threatening items. Such a performance benefit in detecting the threatening targets we may call a threat advantage (see Quinlan & Dyson, 2008, Chap. 16), and evidence for such an advantage was found in two cases.

In the oddball version of the search task and on absent trials, all of the items are taken from the same category, and we may label these items nontargets. On present trials, a singleton image is contained in the search display, and this is taken from a distinctive category. We may label the singletons as targets. In Öhman et al.’s (2001) first experiment, the display set size was fixed at nine items, and the mean RTs were shorter in detecting the presence of the target when it was threatening than when it was nonthreatening. In their second experiment, the threat advantage was examined as a function of display set size because, across trials, the display set size could be either four or nine items. It was found that, statistically speaking, RTs did not increase directly with increases in display set size when the target was a threatening item: The time to detect the threatening target was, statistically, the same regardless of how many items were to be searched. However, RTs did increase directly as a function of display set size when the target was a nonthreatening item.

Critically, the numerical increase in RTs with display set size for the threatening targets was interpreted as showing an additional 3-ms time cost for each search item in the display. Such a slope value falls within the estimates of automatic target detection as described before (Treisman & Gormican, 1988; Treisman & Souther, 1985). It is this “flat” search function, reported by Öhman et al. (2001), that has been taken as the key evidence for the rapid and automatic detection of visual threats. This is the critical empirical support for the fear response hypothesis.
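With only two display set sizes, the slope reduces to a simple difference quotient. The worked equation below is offered as a sketch; the symbols $\overline{RT}_{4}$ and $\overline{RT}_{9}$ are labels introduced here for the mean RTs at display set sizes of four and nine items, and no other values from the original report are assumed. It shows that a slope of 3 ms/item corresponds to an overall difference of 15 ms between the two display sizes.

```latex
\[
\text{slope} = \frac{\overline{RT}_{9} - \overline{RT}_{4}}{9 - 4}
\quad\Longrightarrow\quad
\overline{RT}_{9} - \overline{RT}_{4}
  = 3~\text{ms/item} \times 5~\text{items}
  = 15~\text{ms}.
\]
```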

Since the publication of the work by Öhman et al. (2001), there has been a steady stream of studies that have addressed the basic issues and have used various versions of the speeded visual search task to examine the claims about the automatic detection of threat. At the time of writing this article, 18 relevant studies had been identified, including the original Öhman et al. study. The nature of these studies and the critical findings are summarized in Table 3 in the Appendix. In total, 44 different experiments are summarized in the table. In generating such a table, an accepted next step would be to carry out a meta-analysis in order to (1) examine the degree to which the basic findings documented by Öhman et al. have been replicated and (2) assess the size of the threat advantage as documented across the studies. However, something that stands in the way of doing this is that the literature has been dogged by a number of unfortunate methodological confounds that act as a serious impediment in attempting to collate the findings together in a single meta-analysis. These confounds are discussed in detail in what follows, because it is these that stand in the way of any simple summary of the research findings.

The first notable confound

In thinking about the first confound, it is useful to consider some critical aspects of more typical search tasks. Ideally, if we wish to examine how different types of targets affect responding, these ought to be examined in the context of the same sorts of displays containing the same kinds of nontargets. For instance, in a typical example of speeded search (e.g., Quinlan & Humphreys, 1987), colored letters were used throughout as the search elements. Distinctive colors and shapes defined the feature targets, and particular colored shapes defined the feature conjunction targets. Across the different search conditions, each kind of target was embedded in displays containing the same kinds of nontarget, colored letters. On these grounds, any differences in performance across the feature and feature conjunction conditions could not be attributable to any coincident change in the nontargets. If different kinds of targets are tested in displays comprising different sorts of nontargets, the problem is that we cannot know whether any differential effects are due to the presence of the targets or to the presence of the nontargets.

It would be disingenuous to try to claim that this is the first time that this confound has been discussed in the threat detection literature (see, e.g., Rinck, Reinecke, Ellwart, Heuer, & Becker, 2005). However, the intention here is to examine carefully the findings in light of this confound. Indeed, Öhman et al. (2001) were aware of the potential problems but attempted to sidestep these by discussing performance on absent trials. This maneuver will be considered first.

Performance on absent trials

In the original Öhman et al. (2001) study, displays containing a target spider or snake contained nontarget images of mushrooms or flowers, and displays containing a target mushroom or a flower contained nontarget images of spiders or snakes. The basic concern therefore is that there is no way of knowing whether the speeding on present trials witnessed in the case of the spider/snake targets was being driven by the attention-grabbing nature of these targets or because of the ease of searching through the nonthreatening flora nontargets. This is particularly pertinent given the growing evidence for the claim that attention may be particularly difficult to disengage from any threatening item (see Fox, Russo, Bowles, & Dutton, 2001; Huang & Yeh, 2011). Hence, performance may be slowed on nonthreatening target trials simply because of the stalling induced by the threatening nontargets.

Öhman et al. (2001) were clearly aware of this confound and attempted to allay any concerns by shifting attention from present trials to performance on absent trials. Öhman et al. reported two general patterns of performance on absent trials. In their first experiment, RTs on absent trials did not vary as a function of the threat-relevance of the nontargets; however, in their later experiments, participants were faster in processing displays containing threat-relevant nontargets than displays containing threat-irrelevant nontargets. Although both patterns might dispel concerns that performance on present trials is reflecting critical processes associated with the nontargets, the hardened skeptic would be unmoved on the grounds that performance on absent trials is only an imperfect indicator of factors that play a role on present trials. This particular point will be returned to shortly.

Questions may also be asked about the replicability of the two different patterns of effects on absent trials reported by Öhman et al. (2001). On the positive side, two other studies have reported no effects attributable to nontargets on absent trials. Lipp (2006, Experiment 2) showed no difference in RTs across threat-relevant and threat-irrelevant nontargets at the smallest display size tested (i.e., displays containing four search elements). Soares, Esteves, and Flykt (2009, Experiment 1) showed the same null effect when larger display sizes were tested (i.e., when displays contained nine search elements). In addition, there are four cases where participants have been faster in searching through displays containing threat-relevant nontargets than displays containing threat-irrelevant nontargets on absent trials. In all four cases, the work has been carried out by Lipp and co-workers (Lipp, 2006, Experiment 1; Lipp, Derakshan, Waters, & Logies, 2004, Experiments 1 and 1a; Purkis & Lipp, 2007, Experiment 1).

Importantly, though, there are five experiments where the reverse effect has been reported on absent trials—namely, where participants search more quickly through displays containing threat-irrelevant images than through those containing threat-relevant images (Brosch & Sharma, 2005; Lipp, 2006, Experiment 2, at the largest display size; Soares et al., 2009, Experiment 2; Tipples, Young, Quinlan, Broks, & Ellis, 2002, Experiment 1).

In sum, the evidence from performance on absent trials in the experiments in which targets and nontargets have been interchanged across trials is mixed. Therefore, the degree to which search performance is determined by the threat content of the nontargets remains unresolved. Indeed, the picture is complicated further because, as Tipples et al. (2002, Experiments 3 and 4) have demonstrated, performance on absent trials can vary according to what kind of target is being searched for. They tested participants in a target-present versus target-absent version of the task (instead of an oddball version), and three different kinds of targets were presented in separate blocks of trials. Targets were pleasant animals, threatening animals, or plants, and throughout, the nontargets were images of man-made objects. What they found was that performance on absent trials varied across the different blocks according to target type. Participants were particularly slow on absent trials when searching for a pleasant animal, even though there were no effects of target type on present trials.

Collectively, therefore, the wider literature shows that attempting to draw conclusions about performance on present trials on the basis of performance on absent trials in speeded visual search tasks is fraught with difficulties. More particularly, given the mixed nature of the relevant evidence, no firm conclusions about the effects of threat on search performance can be made on the basis of the extant data.

Performance on present trials

On the basis of what has just been argued, it is clearly of primary importance to focus on performance on target-present trials. Initially, the discussion continues to move forward in the context of studies in which targets and nontargets have been interchanged across trials. A list of these cases is provided in Table 1. The table contains details of 12 studies and a corresponding breakdown of the 27 constituent experiments. For some of the studies, a further breakdown is provided of the different conditions in the experiments. There are, therefore, 39 separate cases listed. The table contains details of whether or not a threat advantage was found. Of the 39 cases, a threat advantage was found in 23. No difference in performance with threat-relevant and threat-irrelevant targets was found in 14 cases. On these grounds, therefore, there does seem to be more support for a threat advantage than for the null effect.

Table 1 Summary of studies in which target and nontarget images were interchanged over trials

However, again, we need to exercise some caution because a quite different methodological confound exists and the extant evidence needs to be reassessed in the light of this. Tipples et al. (2002) showed that what might be taken as evidence of a threat advantage may, in fact, be due to a categorical difference between the targets and nontargets. Tipples et al. initially ran an extension and replication of Experiment 2 reported by Öhman et al. (2001). Images of threatening animals were used along with nonthreatening images of plants. The estimate of the slope for the threatening target search function was 11 ms/item, an estimate close to the <10 ms/item that defines “quite efficient” searches (Wolfe, 1998), whereas the estimate of the slope for the nonthreatening target search function was 28 ms/item, which is considered to be an “inefficient” search (Wolfe, 1998). The basic threat advantage was replicated, but strictly speaking, inspection of the corresponding slope values shows that performance violated the rule of thumb regarding the definition of automatic threat detection.

In a follow-up experiment, the images of the threatening animals were replaced with images of nonthreatening animals. In this case, target detection responses were generally faster to the images of the nonthreatening animals than to the plants, and the estimate of the slope of the animal target search function was less than that for the plant search function. Indeed, the slope for the search function related to detection of nonthreatening animals was only 8 ms/item. These data were taken to suggest that the original threat advantage (found by both Öhman et al., 2001, Experiment 2, and Tipples et al., 2002, Experiment 1) was confounded with the animal/plant distinction. It seemed, therefore, that the original threat advantage might actually be due to better detection of images of animals than of plants regardless of the emotional valence of the images.
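The slope estimates quoted so far can be set against the two criteria mentioned in the text. The sketch below is illustrative only: the slope values are those reported above, whereas the variable names, the use of 6 ms/item as the upper end of the 5–6 ms/item rule of thumb, and the treatment of Wolfe's (1998) "<10 ms/item" as a strict bound are assumptions made here.

```python
# Slope estimates (ms/item) as quoted in the text above.
reported_slopes = {
    "Öhman et al. (2001), Exp. 2, threatening targets": 3,
    "Tipples et al. (2002), Exp. 1, threatening animals": 11,
    "Tipples et al. (2002), Exp. 1, plants": 28,
    "Tipples et al. (2002), follow-up, nonthreatening animals": 8,
}

FLAT_LIMIT = 6         # assumed upper end of the 5-6 ms/item rule of thumb
QUITE_EFFICIENT = 10   # the "<10 ms/item" bound attributed to Wolfe (1998)


def classify(slope):
    """Report which of the two criteria a slope estimate satisfies."""
    return {
        "within_flat_limit": slope <= FLAT_LIMIT,
        "below_quite_efficient": slope < QUITE_EFFICIENT,
    }


summary = {label: classify(slope) for label, slope in reported_slopes.items()}
for label, verdict in summary.items():
    print(label, verdict)
```

Laid out in this way, only the original 3 ms/item estimate satisfies the stricter criterion, and the slope for nonthreatening animals meets the more lenient criterion whereas the slope for threatening animals does not.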

Indeed, this very same categorical confound is present in 14 cases in Table 1, and in 11 of these, a threat advantage has been reported. We therefore simply do not know and cannot tell whether the reported threat advantage in these 11 cases is actually an animal advantage of the sort described by Tipples et al. (2002). For completeness, the three remaining cases are ones where the categorical confound existed but where, apparently, no threat advantage was found. One such case is Tipples et al. (2002, Experiment 2), but here pleasant animals were tested alongside images of flora. The other cases are Experiments 4 and 5a reported by LoBue and DeLoache (2011). In the first case, images of coiled snakes were tested alongside images of coiled objects, and in the second, images of straightened snakes were tested against images of flowers. In neither case were responses faster to the threatening animals than to the corresponding foils. Hence, neither a threat advantage nor an animal advantage occurred. Both of these cases are discussed in more detail shortly.

It is, however, also important to consider the cases where the categorical confound is not present but where a threat advantage has been reported. Lipp (2006, Experiment 2) showed a threat advantage when snake and spider targets were presented alongside birds and fish nontargets. LoBue and DeLoache (2008) found that children (but not adults) were quicker to pick out a snake target among caterpillar nontargets than vice versa. Finally, the same researchers in 2011 (LoBue & DeLoache, 2011) found that both adults and children were quicker to pick out a snake target when presented alongside frog nontargets than vice versa.

In sum, when searches involving biological categories are considered, there are 3 clear cases where a threat advantage has been found in the absence of the categorical confound and 11 cases where the categorical confound is present and the threat advantage has been found. There are also 2 clear cases where the categorical confound did occur but no threat advantage was found. Overall, therefore, it is again difficult to conclude anything compelling about the nature of visual threat detection on the basis of this evidence.

Clearly, the evidence considered so far does not provide a firm foundation for theorizing about the automatic capture of attention by visual threats. In contrast, and given the striking variation in the different outcomes, it seems that other factors, aside from threat content, must be considered. For example, it has been argued that apparently incidental categorical differences between targets and nontargets may be key. Consideration of other, apparently incidental factors gives further cause for concern.

Other stimulus factors that are confounded with the presence of threat

The fact that other stimulus factors, aside from the threat content of the search elements, may play some role in threat detection experiments has been brought to the fore in the recent work of LoBue and DeLoache (2011). In their first experiment, images of snakes and of frogs were used as search items such that search for a distinctive snake among frogs was compared with search for a distinctive frog among snakes. Participants were told to search for a snake or to search for a frog in the respective conditions, and color images were used throughout. The task required a pointing response to the position of the target on a touch screen. Both 3-year-old children and their parents were tested, and under these circumstances, a threat advantage was found in both participant groups.

The effect was also replicated when grayscale images were used, and this provided a useful control that showed that search performance could not be attributed to the pick-up of any distinctive color associated with the threat-relevant targets. Having ruled out color as being the critical factor, LoBue and DeLoache (2011) began to examine the role that distinctive shape might have in the tasks. Next, they assessed performance for a coiled snake target among images of other coiled objects and vice versa. Under these circumstances, they found no difference in target detection performance according to the threat nature of the targets. So despite a coiled snake being a threat-relevant target, participants were no quicker to detect these targets than they were to detect the threat-irrelevant coiled object targets.

When they ran a further experiment, in which they assessed performance with the threat-irrelevant coiled objects against flowers, they found that participants were quicker to detect a coiled object among flowers than vice versa. The latter result echoed a similar finding when LoBue and DeLoache (2008) compared performance across coiled snakes and flowers: Snakes were detected faster than flowers. Perhaps, therefore, there is something special about a coiled shape. Indeed, when LoBue and DeLoache (2011, Experiment 5a) assessed detection of a stretched snake among flowers (and vice versa), there was no longer a snake advantage. What LoBue and DeLoache (2011) therefore concluded was that it is the snakelike, coiled shape that is distinctive and it is this that is rapidly detected. On the understanding that the rapid detection of a potential threat object must be based on the rapid pick-up of some distinctive perceptual cue (i.e., a threat cue; see Cave & Batty, 2006), then LoBue and DeLoache (2011) have demonstrated that a salient “snake” cue is its coiled shape.

The idea that humans readily detect the presence of a snakelike shape in the immediate environment seems highly plausible from an evolutionary perspective. Indeed, the notion of such visual threat cues will be returned to as the discussion proceeds. However, a troubling thought is that the threat/nonthreat distinction used to classify the search images is itself confounded by more mundane and incidental stimulus factors. That this may be so is now considered in more detail.

Further issues concerning incidental stimulus factors

In a careful review of some of the issues, Cave and Batty (2006) provided a useful starting point for discussion. Following their lead, we must distinguish between “elementary threat features” (Öhman et al., 2001, p. 475) (i.e., threat cues) and some other coincidental visual characteristic that discriminates the targets from nontargets. In the previous section, the idea that humans have evolved a sensitivity toward certain threat cues, such as the snakelike coiled shape, was considered, and it is tempting to use this idea to interpret the relevant evidence from speeded visual search more generally. Some caution is warranted, though, in concluding that every search advantage is indicative of the pick-up of a critical threat cue. The less enticing possibility is that the data may actually reflect the presence of more mundane confounding factors. Such an alternative possibility receives some support from a consideration of the work of Blanchette (2006).

In her experiments, the original oddball version of the search task (cf. Öhman et al., 2001) was repeated, and performance was measured under conditions in which the stimuli were taken from a number of different categories. A main division was between categories of high and low evolutionary significance. In the high evolutionary significance categories, the threat items were images of snakes and images of spiders, and the nonthreat images were images of mushrooms and images of flowers. In the low evolutionary significance categories, the threat items were images of guns and images of knives, and the nonthreatening items were images of clocks and images of toasters. (Following the example of Brosch & Sharma, 2005, we may label the high evolutionary significance threats as phylogenetic threats and the low evolutionary significance threats as ontogenetic threats.)

The results were taken to reveal a basic threat advantage with displays containing threat-relevant targets being responded to faster than displays containing threat-irrelevant targets. Moreover, this effect was present in the data for targets taken from both categories of phylogenetic and ontogenetic threats. Setting aside the fact that the study is one where the targets and nontargets were interchanged over trials, the result with the ontogenetic threats is of some note. This is because it sits relatively uncomfortably with the prior arguments about the evolution of a fear module that is sensitive primarily to biological threats. Given the comparatively recent invention of guns and knives, it is not credible to argue about an adapted mechanism for their detection in the manner described in the fear response hypothesis (see LoBue et al., 2010, p. 377).

Indeed, the findings with the ontogenetic threats were followed up in a final experiment in which different kinds of categories of high and low evolutionary significance were tested: Lions/cats and rats/rabbits were designated as being of high evolutionary significance, and hand grenades/balls and syringes/pens were of low evolutionary significance. The thinking behind choosing these particular pairs of categories was that members of the paired categories are highly visually similar to one another; hence, any associated difference in search efficiency must be due to differences in threat (cf. the snakes/caterpillars control as used in Experiment 3 of LoBue & DeLoache, 2008).

Unfortunately, in places, the text and the graphical summary contradict one another. The error reflects a mislabeling of the search functions in the figure legend (Blanchette, personal communication). In actuality, there is a threat advantage in data for both the high and low evolutionary significance cases. In addition, participants were generally quicker to detect the man-made targets than they were to detect the biological targets. Moreover, there was clear evidence of pop-out for the ontogenetic threats (i.e., syringes against pens and hand grenades against balls): The associated slope value is 7 ms/item.

It is the latter pattern of results that is most provocative, because it, again, fits relatively uncomfortably with the fear response hypothesis. The theory flounders in attempting to account for the efficient detection of hand grenades and syringes, because such items are of low evolutionary significance. Such items have featured only in our very recent evolutionary history, and there is no biological imperative to detect these particular items. Although Blanchette (2006) discussed at length possible neural reasons for the findings, following a careful consideration of the stimuli used, it seems that a more prosaic account is plausible.

In taking the textual summary of the data at face value, there is a clear search asymmetry between cases of low evolutionary significance that differ in their threat content. For instance, participants found searching for a syringe in an array of pens easier than searching for a pen in an array of syringes. The critical point is that this search asymmetry is strikingly similar to other search asymmetries that have been discussed at length in the visual search literature (see Wolfe, 2001). It is well established that participants are better able to detect the presence of an additional critical feature than they are to detect the absence of a critical feature (see Fig. 1a, from Wolfe, 2001, p. 382). Inspection of the kind of search displays used by Blanchette (2006) reveals that although syringes and pens are visually similar, syringes have flanges and pens do not (see Fig. 1b, from Blanchette, 2006, p. 1504). In addition, a critical difference between a ball and a hand grenade is that the hand grenade possesses a pin. It is therefore tempting to conclude that the findings reported by Blanchette provide yet another case of a standard search asymmetry based on the presence/absence of a visual feature that distinguishes the target from the nontargets. Unless and until this alternative interpretation of the results is ruled out, no firm conclusions can be drawn about visual threat detection on the basis of these data.

Fig. 1

a Example of displays giving rise to a clear asymmetry such that it is easier to detect the oddball when it contains an additional feature than when it is missing a critical feature. (Reproduced from Wolfe, 2001, with permission) b Example of a search display used by Blanchette (2006) in which the ease of detection of the oddball (i.e., the syringe) may be attributable to its additional critical features. (Reproduced from Blanchette, 2006, with permission)

The choice seems to be between (1) taking the data at face value and concluding that there is something special about the detection of ontogenetic threats or (2) taking the data to reflect more mundane confounding factors between the particular stimulus sets used in the study that give rise to well-understood search asymmetries. Here, the case for the second alternative has been provided. It is nonetheless important to consider the first alternative in more detail.

Further evidence from studies with targets of low evolutionary significance

From the work of LoBue and DeLoache (2011), a snakelike coiled shape appears to function as a salient threat cue. Tipples et al. (2002) considered a different threat cue—namely, a snarling facial configuration with bared teeth (p. 1008). In both of these cases, there is a plausible evolutionary account as to why such “elementary threat features” have taken on significance for humans: Both sorts of cues signal a potential and immediate biological threat. Rapid detection of these sorts of cues fits comfortably with the fear response hypothesis. In the case of man-made artifacts—the so-called ontogenetic threats—there is no such biological imperative. Any search advantage for ontogenetic threats therefore sits less comfortably with the hypothesis.

In order to accommodate such effects, arguments have been made that “humans may learn to detect threatening stimuli . . . particularly quickly as a result of negative experiences” (LoBue et al., 2010, p. 377). The basic idea is that humans have evolved to be particularly sensitive to learning about potential threat items. As Blanchette (2006) has written, “one possibility is that extensive association between higher level representation of a stimulus and fear reactions (i.e., learning) may lead to efficient detection” (p. 1495). The claim, endorsed by LoBue et al. (2010), is that humans have “the ability to learn to quickly detect other types of threat-relevant stimuli . . . through all-purpose learning mechanisms such as association and conditioning” (pp. 377–378). Such ideas, although not germane to the fear response hypothesis, fit comfortably with it. Nevertheless, it is important to examine the evidence base properly.

Of the reviewed studies, there are only three cases (in addition to the study by Blanchette, 2006) that have examined performance with ontogenetic threats. In the study by Brosch and Sharma (2005), guns and syringes were classified as threat relevant, and cups and mobile phones were classified as threat irrelevant. As Table 1 shows, this is a case where the targets and nontargets were interchanged over trials in an oddball version of the search task. Bearing this caveat in mind, the results are consistent with a threat advantage. Indeed, when expressed in terms of slope values, search for the ontogenetic targets was surprisingly fast. For gun/syringe targets, the slope value was 0.8 ms/item, whereas for cup/mobile phone targets, it was 9.4 ms/item.

The second case is that of Fox, Griggs, and Mouchlianitis (2007). The design of the first experiment was very similar to the oddball task used by Brosch and Sharma (2005). Fear-relevant targets (i.e., guns or snakes) were always embedded in displays containing threat-irrelevant nontargets (i.e., flowers or mushrooms). RTs to both threat-relevant target types were shorter than were RTs to threat-irrelevant target types. In their second experiment, targets and nontargets were not interchanged. On absent trials, displays contained only images of flowers or images of kettles. In this case, the search rate for the guns was 9.1 ms/item, and for toasters it was 14.8 ms/item. Again, these data are consistent with a threat advantage, but it needs to be borne in mind that in the case of the snake targets, there was a categorical difference between the targets and nontargets. An animal target was tested when embedded in displays containing nonanimals, and the data are therefore as consistent with an animal advantage as with a threat advantage.

Finally, there is the study by LoBue (2010). In this case, there was a target present on every trial, and the child participants were instructed to touch the target image on each trial. The paired targets were pens/syringes and knives/spoons, and targets and nontargets were interchanged over trials. Overall, there was no difference in RTs to the knife and spoon targets; however, there was a threat advantage in the data for pen versus syringe targets. Given that the stimuli were those used by Blanchette (2006), this particular search asymmetry with the pens versus syringes is open to interpretation.

In sum, the evidence for an ontogenetic threat advantage is less than compelling when considered against the possible confounding factors that are being described and discussed here. On these grounds, an interim conclusion is that although it may well be the case that humans have evolved a particular sensitivity toward learning about possible threats in such a way that it influences threat detection, the reviewed evidence fails to provide a strong case in support of this.

However, having raised the issue of learning, it is important to be mindful of the sort of incidental learning that might be going on during the experiment itself. Critically, in the cases under discussion, it may be that during the course of testing, participants home in on particular visual features that consistently distinguish targets from nontargets. A clear example of this has been provided by the work of Rabbitt (1964, 1967). In simple letter search tasks, participants were asked to carry out target present/absent judgments on displays containing a “C,” an “O,” or neither target. Early on during the course of practice, there was a clear effect of display set size, but over trials, the display set size effect diminished. The critical factor appeared to be that whereas the targets contained curved features, none of the nontargets contained curves (the nontargets were sampled from AEFHIKL). The claim was that, over the course of the experiment, participants tuned in to the curved/angular distinction and used this as the discriminative cue. When the nontargets were switched to letters containing curves, performance collapsed to the starting level.

It is quite possible that the hardened skeptic would argue that similar perceptual tuning is going on in the visual search experiments discussed here (indeed, such sentiments have been aired by Notebaert, Crombez, Van Damme, De Houwer, & Theeuwes, 2011, p. 87). It may be that, in any of the particular search experiments discussed here, there are simple visual features that correlate perfectly with the target/nontarget distinction and that it is these that participants (perhaps unwittingly) pick up on and use over the course of testing (e.g., the distinctive pin of the hand grenade, the flanges on the syringes). Without a proper consideration of this possibility, it would be incautious to attempt to draw any firm conclusions about the nature of threat detection. In reply, perhaps some of these concerns can be allayed by ensuring that individual targets are always presented only once (as in the case of Fox et al., 2007). With such a design, participants never have the opportunity to become familiar with specific visual features of particular targets. Perhaps the problem therefore exists only in the studies where only a small number of targets are repeatedly sampled over trials (as is the case in the study by Brosch & Sharma, 2005). However, even in the cases where no target is presented more than once in the experiment, we need to be assured that the different kinds of targets are equally discriminable from the nontargets. Maybe guns are simply more discriminable from toasters than are kettles (cf. Fox et al., 2007).

The importance of this cannot be overestimated given the work of Duncan and Humphreys (1989) and Humphreys, Quinlan, and Riddoch (1989). These researchers have shown that target detection, in more typical search tasks, is generally governed by two factors: target–nontarget similarity and nontarget–nontarget similarity. In very broad terms, target detection is facilitated if the target stands out from the other nontargets. So target detection is easier if the target and nontargets are visually dissimilar than when the target and nontargets are more visually similar. In addition, target detection is governed by the degree to which the nontargets themselves are visually similar. Target search is easier when the nontargets are visually similar to one another than when the nontargets are visually dissimilar from one another.

Unfortunately, considerations such as these have barely featured in the literature on the speeded detection of threat, and it seems that such visual factors must have contributed to performance in the search tasks. Indeed, exactly these kinds of effects can be seen in the data from Rinck et al. (2005). In their first experiment, a target image was presented prior to a search display, and participants were instructed to make a speeded present/absent response following the presentation of the display. The threat-relevant targets were images of spiders, and threat-irrelevant targets were images of beetles and butterflies. In all of these cases, search was slowest when the target image was embedded in a display containing images of other instances from the target’s own category.

As has been noted, special problems exist when targets and nontargets are interchanged over trials, and these will be exacerbated if the different sorts of nontargets vary in their interitem similarity. More critically, perhaps, in cases where the different kinds of targets are tested against the same kinds of nontargets, we need to ask whether the different sorts of target types are equally visually similar to the nontargets. To perhaps belabor the point, in a letter search task, it would hardly be surprising to find that a target X is easier to find among nontarget Os than is a target C. Bearing these concerns in mind, it is now appropriate to consider the cases where different types of targets have been tested in displays containing the same kinds of nontargets.

Search in cases where nontargets remain constant across trials but the targets change

Table 2 shows the cases where, across trials, different kinds of targets have been embedded in displays containing the same kind of nontargets. There are 15 cases listed from eight different studies. In only six of these, however, was a threat advantage found. In the study by Rinck et al. (2005), the effect was present in the data for spider-fearful participants when a spider target was present among images of dragonflies in an oddball version of the search task. The second case has already been discussed: These are the data reported by Fox et al. (2007). The third case is that reported by Purkis and Lipp (2007). In this case, although visually the threat advantage is apparent in the data, the relevant statistical comparisons are not reported.

Table 2 Summary of studies in which targets and nontargets were not interchanged over trials

This particular study is, however, interesting for other reasons. The authors questioned whether the rapid detection of threat-relevant targets is necessarily accompanied by invocation of a putative fear system. Such an implication follows naturally from the fear response hypothesis (Öhman & Mineka, 2001). In their study, the apparent threat advantage was found both in the search data for control participants who were negatively disposed toward the snake and spider targets and in the data for expert participants who were not so disposed. On the basis of this, Purkis and Lipp (2007) argued that it is only after a target has been detected on the basis of some so-called “low-level perceptual features” (p. 322) that an evaluation of its fear relevance may then take place. Such sentiments clearly do not sit so comfortably with the fear response hypothesis as originally stated.

The final notable study is that by Quinlan and Johnson (2011). As can be seen from Table 2, a threat advantage was found in the two experiments reported. This study addresses several of the task design problems that have been found in many of the previous studies. All of the nontarget items were images of wild (nonpredator) birds. Half of the target images were of threatening animals, and the remaining half were of nonthreatening animals. Both sets of targets were tested in displays containing the same sort of nontargets; targets and nontargets were not interchanged across conditions. Furthermore, since both targets and nontargets were animals, the previous categorical confound was also obviated. Participants saw each target only once, and all of the threatening images were chosen such that, in each case, the bared teeth of the snarling animal were prominent. As Tipples et al. (2002) noted, “according to evolutionary theory . . . an open mouth and exposed teeth are characteristics associated with rage and the intention to attack” (p. 1009). In this respect, the targets reflected cases that were of high evolutionary significance.

With these controls in place, participants were tested with oddball versions of the search task, and in two experiments, robust threat advantages were found. Participants were faster to respond “different” in the presence of a threat-relevant target than a threat-irrelevant target. Therefore, it seems that, on the basis of these data, there are good grounds for arguing that the presence of a snarling animal is detected rapidly in the immediate visual scene. Examination of the slope values of the search functions for the threatening animals revealed, however, that in all cases, the values were greater than 20 ms/item. Such evidence is not consistent with the idea that threat detection reflects a process of automatic target pop-out.

Of some additional interest is the fact that the effects of threat in both experiments reported by Quinlan and Johnson (2011) were manifest in the intercept values and not the slope values of the corresponding search functions. On these grounds, it is arguable that the effects reflect something other than aspects of perceptual/attentional processing. To reiterate, a repeatedly made assumption in the visual search literature is that the slope of a search function provides an index of the speed of search. The threat advantages reported by Quinlan and Johnson reveal, in contrast, that the effect is carried by an overall RT difference in responding to threat-relevant versus threat-irrelevant targets. Perhaps, therefore, the effects reflect some kind of arousal that directly influences the selection of the appropriate response, but only once the threat item has been detected. Such a possibility remains to be properly explored.
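The distinction between a slope effect and an intercept effect can be made concrete with a small sketch. The Python fragment below fits a linear search function, RT = intercept + slope × set size, by ordinary least squares. The function name fit_search_function and all of the RT values are hypothetical and were invented here purely for illustration; they are not the data reported by Quinlan and Johnson (2011).

```python
def fit_search_function(set_sizes, mean_rts):
    """Ordinary least-squares fit of RT = intercept + slope * set_size.

    Returns (slope in ms/item, intercept in ms).
    """
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, mean_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical mean RTs (ms) at three display set sizes.
# These numbers are illustrative assumptions, not published data.
set_sizes = [4, 8, 12]
threat_rts = [700.0, 800.0, 900.0]
neutral_rts = [760.0, 860.0, 960.0]

threat_slope, threat_intercept = fit_search_function(set_sizes, threat_rts)
neutral_slope, neutral_intercept = fit_search_function(set_sizes, neutral_rts)

print(f"threat:  slope={threat_slope:.1f} ms/item, "
      f"intercept={threat_intercept:.1f} ms")
print(f"neutral: slope={neutral_slope:.1f} ms/item, "
      f"intercept={neutral_intercept:.1f} ms")
```

With these invented values, the two fitted slopes are identical (25 ms/item) and the conditions differ only in intercept (600 vs. 660 ms). This is the general pattern described above: an overall RT difference between target types with no difference in the estimated search rate.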

Attentional capture versus prioritization

At this juncture, it is apposite to introduce the distinction between attentional capture and attentional prioritization. In simple terms, we have taken attentional capture to be associated with highly efficient search as indexed by a “flat” search function. In addition to capture, the notion of attentional prioritization has also featured in discussions of speeded visual search. At one level, the distinction between capture and prioritization fits comfortably with the difference between stimulus-driven effects and top-down effects. For example, the evidence for attentional capture is most readily found in cases where a new item is abruptly added to a search display (i.e., it is stimulus driven; see Yantis, 1993, for a summary review). Indeed, there is a wealth of evidence showing cases where target identification responses are facilitated when associated with an abrupt onset item that is the target (Jonides & Yantis, 1988) and cases where such responses are interfered with when an abrupt onset item is a nontarget (Yantis & Jonides, 1990).

A different body of evidence has been used to inform about attentional prioritization. Much of the early work on attentional prioritization was also carried out with displays in which various kinds of abrupt onsets occurred. A typical example is provided in the article by Yantis and Johnson (1990). In this case, “digital” letters were defined in terms of the line segments of a rectangular figure eight. Deletion of particular line segments from the “8” defined a given letter. On each trial, a given target letter was initially presented. Next, a display containing four placeholders (four digital “8s”) was presented. Finally, the search display was presented. Some of the search letters occurred at the placeholder positions (these were known as the no-onset letters), and other letters were presented at new locations (these were known as the onset letters). Participants then made a speeded present/absent response as to whether or not the predefined target was in the final search display.

In their first experiment, the search display contained four letters (two onset and two no-onset letters) or it contained eight letters (four onset and four no-onset letters). A basic result was that target detection responses were considerably faster when the target was an onset letter than when it was a no-onset letter. Moreover, this advantage held even in the case where the target was one of four possible onset letters. This was taken to show that participants, in some sense, prioritized their search for the target by first considering the onset letters and then considering the no-onset letters and that up to four letters could be prioritized in this way. More particularly, the claim was that the analysis of the onset letters occurred prior to that of the no-onset letters.

In this particular case, the process of item prioritization was stimulus driven, because the prioritized items were the onset items (i.e., they were defined solely by a stimulus event). However, it is possible to broaden the idea of attentional prioritization to encompass ideas concerning top-down processes. It is this notion of top-down prioritization that is implicated in the work of Notebaert et al. (2011). In their key experiment (i.e., Experiment 2), they examined performance in a variant of speeded visual search. In their search displays, the search elements were small round rings arrayed on the circumference of a virtual circle. Each of the rings was colored and contained a small, oriented line segment. On each trial, a target was either a vertical or a horizontal line segment, and participants had to decide under speeded two-alternative forced choice conditions which orientation the target had.

Prior to the testing of search performance, participants had undergone aversive training with a color that was then associated with a ring in the search task. Participants were presented with displays just containing the rings (and not the line segments), and a particular color was selected as the aversive stimulus. On 50 % of the trials following the presentation of a display containing the aversive stimulus, a small and short electrical shock was administered. Following this acquisition stage of the experiment, the participants were then quizzed about their knowledge of the aversive color before they were tested in the speeded visual search task.

Search performance was examined across three conditions: In the congruent condition, the aversive color belonged to the target’s ring; in the incongruent condition, the aversive color belonged to a nontarget’s ring; and in the baseline condition, the aversive color was not presented. Search performance was then mapped out as a function both of condition and of display set size. The results were relatively clear-cut, showing monotonic increases in RT with increases in display size for all three cases. In addition, and generally speaking, RTs were shortest on congruent trials, and they were longest on incongruent trials. Moreover, the slope value was also smaller for the congruent trials than it was for either the incongruent or the baseline condition.

Notebaert et al. (2011) took this particular pattern of results to reflect attentional prioritization rather than attentional capture. The estimate of search rate in the data for the congruent trials was 90 ms/item, which is considerably greater than the 10 ms/item indicative of attentional capture. Nonetheless, the effects clearly reveal the potency of threat-relevant stimulus information in speeded visual search. That is, when a threat cue was (essentially) coincident with the imperative stimulus (the target/oriented line segment), responses were facilitated, relative to when there was no threat cue present. In addition, responses were slowed when a threat cue was located at a display position different from that of the target. Both the former facilitation and latter interference were taken as being indicative of attentional prioritization. The presence of the threat cue either speeded or slowed responding, depending on whether it was or was not, respectively, coincident with the target stimulus.
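As a rough illustration of how the slope values quoted in this review compare with the 10 ms/item figure, the following Python sketch tabulates them against that value. The slopes are those given in the text. Treating the 10 ms/item figure as a strict cutoff, the handling of values that fall exactly on it, and the helper name below_capture_cutoff are all assumptions made here for illustration only.

```python
# Cutoff taken from the text; using it as a strict threshold is an assumption.
CAPTURE_CUTOFF_MS_PER_ITEM = 10.0

# Slope values (ms/item) as quoted in this review.
reported_slopes = {
    "Blanchette (2006), ontogenetic threats": 7.0,
    "Brosch & Sharma (2005), gun/syringe targets": 0.8,
    "Brosch & Sharma (2005), cup/mobile phone targets": 9.4,
    "Fox et al. (2007), gun targets": 9.1,
    "Fox et al. (2007), toaster targets": 14.8,
    "Notebaert et al. (2011), congruent trials": 90.0,
}


def below_capture_cutoff(slope_ms_per_item,
                         cutoff=CAPTURE_CUTOFF_MS_PER_ITEM):
    """Return True if the slope is shallower than the assumed cutoff."""
    return slope_ms_per_item < cutoff


for label, slope in reported_slopes.items():
    verdict = "below" if below_capture_cutoff(slope) else "above"
    print(f"{label}: {slope} ms/item ({verdict} cutoff)")
```

A shallow slope in such a tabulation says nothing about why search was efficient; the stimulus confounds discussed earlier apply regardless of where a given value falls relative to the cutoff.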

The data reported by Notebaert et al. (2011) are consistent with the idea that search for aversive targets can be prioritized over search for nonaversive targets. Most critically, this study is immune to the previously expressed concerns about stimulus factors and stimulus confounds, since these simply do not apply. The study also clearly demonstrates how prior learning about the aversive nature of a stimulus can have a demonstrable effect on speeded search. Such evidence is consistent with the claims of how learning about ontogenetic threats may influence their detection (Blanchette, 2006; LoBue et al., 2010).

However, the relevance of the study (and the evidence) to the fear response hypothesis may be questioned on the grounds that performance with biologically relevant threats was not examined. Nonetheless, the notion of attentional prioritization is important, and it may provide some further insights into processes of threat detection in general. In this regard, it may be more fruitful to think about how the evidence reflects processes of attentional prioritization rather than attentional capture. For instance, the data reported by Quinlan and Johnson (2011) might be usefully reinterpreted in terms of attentional prioritization rather than generalized arousal.

Main points and concluding comments

The literature on threat detection primarily in the context of speeded visual search tasks has been surveyed, and the associated evidence has been critically assessed. Although some data have been described that are consistent with the fear response hypothesis, a case has been built that the extant evidence does not provide unequivocal support for the hypothesis. In particular, it seems that the evidence for the automatic detection of threatening targets is both scant (Blanchette, 2006; Öhman et al., 2001) and open to alternative interpretation. It has been argued that the associated effects are, essentially, uninterpretable due to the presence of one or more confounding factors. A basic conclusion is that, on the basis of the present reading of the relevant studies of speeded visual search, the original fear response hypothesis has no convincing empirical support.

One conclusion, therefore, is that it would be very surprising if any further progress in understanding the visual detection of threat can be made if the problems in experimental design are repeated in the future. Steps need to be taken to ensure that, as far as possible, any incidental visual factors that vary across the threatening and nonthreatening images are controlled for, so that any contingent effects must be due to a difference in target valence. Until these concerns are addressed, the usefulness of the oddball search paradigm remains in doubt.

The implications of these sentiments extend further and apply with some force to other cases in which performance in nonsearch tasks has been examined. For instance, in a recent article by Mermillod, Droit-Volet, Devaux, Schaefer, and Vermeulen (2010), performance in a timed image identification task was reported. Initially, participants were familiarized with a set of 20 target images, and then they were tested with both the original images and spatially filtered versions of these. The filtered images were transformed such that only information from particular spatial frequency bands was retained.

In the experimental trials, the imperative stimulus was one of the target images, and the participant was expected to judge whether the image depicted a dangerous or neutral stimulus and to respond rapidly within a 600-ms deadline. Throughout the trials, participants were intermittently given negative feedback if responses were incorrect or too slow. The central result was that participants were overall quickest to classify the images of living threats. However, this advantage occurred only when the coarse details were present in the images. When such details were removed, the threat advantage was abolished. As a consequence, it was concluded that “coarse scales are sufficient for fast recognition of visual stimuli depicting living dangerous entities” (p. 6).

On the face of it, this systematic manipulation of the visual characteristics of the target images is exactly the sort of experimental test that fits with the sentiments expressed here. Nonetheless, following more careful scrutiny, some caution is warranted because of the very small number of images used and the nature of those images. Five of the images comprised living threats (a striking venomous snake, a snarling dog, a snarling bear, an attacking shark, a burning man being clubbed by an assailant), five images were of nonbiological threats (a tornado, a tank, a gun, broken glass, a burning oil drum being attacked by an assailant), five images were images of nonthreatening biological entities (a pelican, a horse, puppies, infant children, a flower), and five were nonthreatening images (clouds, a lamp, a clock, a cake, a pile of money).

Visual inspection of the images for the dangerous, living category reveals that four of the images were headshots of “snarling” animals. In contrast, the nonliving threatening images were much more heterogeneous. At a conceptual level, all of the pictured entities were nonliving; perceptually, however, they shared no common visual characteristics. Unfortunately, the authors did not report an item analysis of the participants’ responses; therefore, it is impossible to know whether the effects generalize across all images within the different sets of targets. Nonetheless, it is unfortunate that in four out of the five dangerous living images, the face was highly salient. The rapid categorization of these dangerous images therefore may reflect nothing more than a particular sensitivity to processing facial characteristics (cf. Morton & Johnson, 1991). Until this possibility is ruled out, it is unsafe to conclude that the data reveal anything of importance about the visual detection of threat.

More general theoretical considerations

At the very heart of the fear response hypothesis is the idea that “the visual system is capable of preattentively using information about the emotional valence of a target item before that target item becomes the focus of attention” (Gerritsen, Frischen, Blake, Smilek, & Eastwood, 2008, p. 1056). In attempting to make sense of this claim, it has been assumed that any sort of target detection fundamentally rests on a visual analysis of the scene. The accessing of a target’s meaning takes place only once some form of visual analysis has been completed. The issue, then, is about how attentional selection takes place. If we assume a simple late selection account (Deutsch & Deutsch, 1963), the emotional valence of the items determines which items are attended to. By this view, the threatening items can be prioritized over the nonthreatening items purely on the basis of meaning (i.e., their emotional content).

An alternative view is that item selection is determined not by emotional valence but by distinctive physical properties that are associated with threat. As was discussed by Gerritsen et al. (2008), “It is also possible that the meaning of a target stimulus . . . is through learning connected with particular physical properties that are then searched for preferentially” (p. 1057). By this view, items are prioritized according to their physical composition. In line with this are the findings of LoBue and DeLoache (2011) that show that the mere presence of a snake is not rapidly detected. Instead, there is sensitivity toward registering a coiled shape, and apparently, it is this distinctive cue that is diagnostic of the presence of a snake. Consequently, it is not a sensitivity to threat per se that is being revealed but a sensitivity toward certain visual cues that are diagnostic of threat. Although the latter alternative is favored here, empirical data that can be used to adjudicate finally between the two possibilities still need to be collected. Neither alternative rests on the assumption that the mere presence of a visual threat automatically captures attention, and the evidence described and discussed here provides no real support for this.

Although the primary purpose of this review has been to examine critically claims about the automatic capture of attention by visual threats, evidence has emerged, perhaps surprisingly, that supports a different aspect of emotional processing: the difficulty that is experienced in trying to disengage attention once an aversive stimulus has been found (Fox et al., 2001). The critical study here is that by Huang and Yeh (2011). In this study, photographic search items were used, and each photograph was contained within a bounded contour. One of these contours described a Landolt C, and the task was to make a speeded judgment of the up/down orientation of the gap in the contour.

On every trial, the target contour enclosed a photographic image of an animal (some of these images included humans), and the other search items were images of man-made artifacts. Half the time, the target picture displayed negative affect, and half the time, the picture displayed neutral affect. Participants were instructed that the target contour bounded a picture of an animal. Hence, a simple strategy would be to look for the picture of the animal in order to find the target boundary. A key finding was that the orientation judgments were generally slower when the target contour surrounded a negative affect photograph than when it surrounded a neutral affect photograph. Huang and Yeh concluded that “affectively negative information has more impact on the disengagement process than on the engaging process” (p. 230).

The reason special mention is made of this study is that it is a clear case where some of the more serious methodological concerns listed here do not apply. Although the target images were pictures of animals and the nontarget images were pictures of man-made items, the same categorical distinction applied in the negative- and neutral-affect target trials. The nontarget items were otherwise heterogeneous; hence, it is difficult to dismiss the study on the grounds of some systematic confound between target/nontarget similarity and target affect.

More particularly, Huang and Yeh (2011) explicitly manipulated perceptual salience of the targets independently of target valence. On all trials, the nontarget images were grayscale. However, on half the trials, the target images were presented in grayscale, and on half, they were presented in full color. The results clearly showed independent effects of perceptual salience and emotional valence. Whereas perceptual salience altered the slope of the corresponding search functions (colored targets were searched for more efficiently than were grayscale targets), emotional valence altered the intercept of the corresponding search functions (participants were generally slower to respond in the presence of a negative rather than a neutral target). The latter finding was taken to indicate the time-consuming nature of attentional disengagement from an aversive stimulus.
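The logic of separating slope effects from intercept effects can be made concrete with a minimal sketch. It assumes the conventional linear search function, in which mean response time is modeled as an intercept plus a slope multiplied by the number of display items. The reaction times, set sizes, and function names below are entirely hypothetical and are chosen only to mimic the qualitative pattern described above; they are not data or analyses from Huang and Yeh (2011).

```python
def fit_search_function(set_sizes, mean_rts):
    """Ordinary least-squares fit of RT = intercept + slope * set_size.

    Returns (slope, intercept): slope in ms per item, intercept in ms.
    """
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, mean_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# HYPOTHETICAL display sizes (assumed for illustration only).
SET_SIZES = [4, 8, 12]


def make_rts(intercept, slope):
    """Generate noise-free illustrative mean RTs (ms) for each set size."""
    return [intercept + slope * n for n in SET_SIZES]


# HYPOTHETICAL condition means: salience (color) changes the slope,
# valence (negative) changes only the intercept.
conditions = {
    ("grayscale", "neutral"): make_rts(600, 40),
    ("grayscale", "negative"): make_rts(650, 40),
    ("color", "neutral"): make_rts(600, 10),
    ("color", "negative"): make_rts(650, 10),
}

# Fit one search function per condition.
fits = {name: fit_search_function(SET_SIZES, rts)
        for name, rts in conditions.items()}

for (salience, valence), (slope, intercept) in fits.items():
    print(f"{salience:9s} {valence:8s} "
          f"slope = {slope:.1f} ms/item, intercept = {intercept:.1f} ms")
```

In this illustration, the two color conditions share a shallower slope than the two grayscale conditions, whereas the negative conditions differ from the neutral conditions only by a constant added to the intercept. A constant cost of this kind is independent of the number of items searched, which is why it is attributed to a process that follows target localization rather than to the search itself.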

It would be a misrepresentation of the study to claim that it informs directly about threat detection: The aversive images were not selected on the basis of threatening content alone. Nonetheless, here is a clear case where evidence from speeded visual search can be used to inform the understanding of the cognitive/perceptual processes that are implicated in the processing of aversive stimuli. It would be extremely surprising if the findings did not generalize to similar studies in which performance with threat-relevant targets is compared with performance with threat-irrelevant targets.