Visual search is ubiquitous in daily life, as when we look for a particular object (target) in a crowded scene containing numerous other objects (distractors), and is central to the investigation of the nature of selective visual attention. Classical theories of selective attention suggest that two stages or modes of processing are involved in visual search: 1) a parallel, preattentive, and capacity-unlimited stage in which all visual items are processed to extract a search-guiding “master” or “salience” map, and 2) a serial, capacity-limited stage during which focal attention is allocated serially to locations flagged on the salience map to identify selected items (Treisman & Gelade, 1980; Treisman, 1988; Wolfe, 1994, 2007). Henceforth, we will refer to the classical two-stage theory of attention as the “serial model,” due to the nature of its attentional component. In contrast, according to single-stage, mostly signal-detection-based, parallel theories—henceforth referred to as the “parallel model”—attention is distributed diffusively and all items are identified simultaneously (Cameron, Tai, Eckstein, & Carrasco, 2004; Eckstein, Thomas, Palmer, & Shimozaki, 2000; Palmer, 1995; Palmer & McLean, 1995; Palmer, Verghese, & Pavel, 2000; Shaw, 1984; Verghese, 2001; Ward & McClelland, 1989). Single-stage parallel models, in turn, are in contention with respect to whether the diffuse-attention mode of processing is capacity-limited (Snodgrass & Townsend, 1980; Thornton & Gilden, 2007; Ward & McClelland, 1989) or unlimited (Palmer & McLean, 1995; Verghese, 2001); we will discuss this important distinction below.

The most important empirical pattern that has been taken to be critical for distinguishing between serial and parallel (or two- vs. one-stage) attentional-allocation theories of visual search is the effect of set size on mean search RT, in particular the slope of the function relating RT to set size. Thus, the classical result of zero slopes in easy search tasks has been interpreted as an indication of parallel, “pop-out” search, whereas positive slopes in more difficult tasks have been interpreted as an indication of a serial search process that relies on focal attention (Treisman & Gelade, 1980; Treisman, 1988). According to the theory of Guided Search (GS), a continuum of search slopes can be obtained by varying the salience of the target among distractors, which is a function of the target-distractor contrast. In GS, target salience controls the probability with which the target item is chosen and identified in each (serial-step) deployment of focal attention: when target salience is very high, the target is invariably selected and identified first, irrespective of the set size, thus accounting for the flat RT/set-size slopes in pop-out tasks. At the other extreme, when target salience is very low, the target is no more salient than any of the distractors and hence item selection is random. Accordingly, more items need to be searched as set size increases, resulting in steep slopes. Finally, in between these two extremes, intermediate slopes result from moderate levels of target salience (Wolfe, 1994, 1998, 2007; Liesefeld et al., 2015).

This classic account, however, has been challenged by supporters of parallel models, who pointed out that, under certain assumptions, single-stage parallel search models can also account for the set-size regularities discussed above. In particular, it has been argued that mean RT × set size functions are inadequate to discriminate between serial and parallel search mechanisms (Thornton & Gilden, 2007; Townsend, 1972, 1976, 1990; Palmer, 1995; Palmer & McLean, 1995; Palmer et al., 2000; Verghese, 2001; Ward & McClelland, 1989). That is, shallow search slopes in easy searches and steep slopes in difficult searches can be generated by both serial and parallel mechanisms. For example, a parallel search across all the items in the display can produce positive slopes if attentional capacity is limited, so that the amount of processing resources that can be allocated to each item decreases with set size (Ward & McClelland, 1989; Snodgrass & Townsend, 1980). Moreover, even unlimited-capacity parallel models can account for set-size effects, as a consequence of decision criteria being raised to mitigate the increasing influence of decision noise with increasing set size: without such a criterion change, the more elements there are to identify, the higher the chances that one of the distractors will be misidentified as a target (Palmer & McLean, 1995; Verghese, 2001). Recently, Williams, Eidels, and Townsend (2014) challenged another alleged marker of seriality, namely, bimodality of RT distributions (Cousineau & Shiffrin, 2004), by demonstrating that such distributions can be generated by parallel models with attentional gradients. In sum, patterns of effects on RTs and RT distributions that at first sight appear characteristic of serial models can also be explained by purely parallel models. Accordingly, the serial/parallel controversy is far from settled.
So far, however, no formal quantitative comparison of serial and parallel visual search models with respect to RT-distribution data has been performed.

The purpose of the present paper was to compare a serial-search model with a parallel-search model in their ability to account for the full distribution of search RTs (for both target-present and target-absent displays) and for error rates, as a function of set size. The serial-model exemplar is the Competitive Guided Search model (CGS), which was recently fitted to the data of Wolfe, Palmer, and Horowitz (2010), providing a satisfactory account of the RT distributions, error rates, and their dependence on set size (Moran, Zehetleitner, Müller, & Usher, 2013). The parallel model was developed by us as an extension and integration of proposals made in previous studies (Palmer & McLean, 1995; Thornton & Gilden, 2007; Ward & McClelland, 1989). In particular, it comprises a family of models with flexibility with respect to capacity (which can vary on a continuum from limited to unlimited), strategic set-size adjustments of the decision criteria, and the search-termination policy (i.e., how soon the search is quit when the target has not been found).

We fit RT distributions, following Wolfe et al.’s (2010) demonstration that RT distributions are more informative and constraining with respect to visual-search theories than mean RTs alone (Balota & Yap, 2011; Ratcliff, 1978). The models were fitted to an extensive benchmark data-set (of more than 100K search trials) collected by Wolfe et al., which includes three of the most prevalent tasks in the visual-search literature: a color feature, a color-orientation conjunction, and a spatial-configuration search task. Whereas the spatial configuration (2 vs. 5) and the conjunction tasks produce positive set-size effects and have traditionally been considered to be indicative of a ‘serial’ architecture, the color-feature task produces flat set-size slopes and has thus customarily been considered to be indicative of a parallel architecture. We start with a brief description of the two models, followed by our computational methods and results. To anticipate, we find that the parallel model is limited in its ability to fit the qualitative data patterns from these search tasks and that quantitative formal model comparisons consistently favor CGS. Finally we discuss interpretations and potential follow-up studies.

Computational models

Serial search model: competitive-guided search

Competitive-guided search (CGS) is an instantiation of the Guided Search framework, which, like GS (Wolfe, 1994, 2007), conceives of the search process as a sequence of selection–identification iterations. In each iteration, all visual items compete for selection by the limited-capacity identification process, with weights that are proportional to item salience. Once an item is selected, it is correctly identified (with probability 1), with a Wald-distributed identification latency (reflecting noisy accumulation to a single boundary; Luce, 1986). If the target is selected, search terminates with a “target-present” response. If a distractor has been selected and identified as such, it is inhibited to prevent future reselection of the same item. Additionally, CGS features a quit-unit that competes with the visual items for selection (Fig. 1). The activation of this unit increases over the course of the search with each identified distractor item. When the quit-unit is selected, the search is terminated and a “target-absent” response is given. This allows the model to terminate search in a probabilistic way before all items are searched even when the target is not found, accounting for the large overlap in RT distributions between target-present and target-absent responses (Wolfe et al., 2010). Together with residual time and motor-error parameters, the model features a total of 8 parameters (see Moran et al., 2013, for full details). An attractive property of this model is that only a single set of parameters is needed for all set-size conditions; that is, the number of parameters is independent of the number of set-size conditions.

Fig. 1

CGS model (reproduced from Moran et al., 2013). Flow chart depicts the sequence of decisions. When a trial is started, first a “quit-or-continue” decision is made. The probability of quitting is described by the equation for pquit, which is equal to the weight associated with the quit unit relative to the summed weights associated with the quit unit and the display items, wj. If search is not terminated, an item is selected for inspection. If the target is selected, a “target-present” response is issued. If a nontarget has been selected, the weights are adjusted, that is, wquit is increased and the weight of the just inspected item is set to zero, after which the sequence starts over with the next quit-or-continue decision. Responses are subject to a small proportion of motor errors. The icons to the right of the quitting decision and the attentional selection unit denote the weights for the quit unit as well as the weights of one target, T, and three distractors, D1 through D3. D2 has already been identified as a nontarget and its weight was reset to zero. Also, the quit weight has already been increased. The example illustrates some “target guidance,” as the target weight is slightly higher than the distractor weights
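The quit-or-continue and selection cycle described in the flow chart can be rendered as a short simulation sketch. This is our illustrative reading of the model, not the fitted implementation of Moran et al. (2013): all parameter values (weights, quit boost) are hypothetical placeholders, and the Wald-distributed identification latencies are omitted for brevity.

```python
import random

def cgs_trial(n_items, target_present, w_target=2.0, w_dist=1.0,
              w_quit0=0.1, quit_boost=0.5, seed=None):
    """One CGS trial: a sequence of quit-or-continue decisions followed
    by weighted selection among the not-yet-inhibited display items.
    Returns ("present" | "absent", number of items inspected)."""
    rng = random.Random(seed)
    # Item weights: the target (index 0, if present) may be more salient.
    weights = [w_target if (target_present and i == 0) else w_dist
               for i in range(n_items)]
    w_quit = w_quit0
    inspected = 0
    while True:
        tot_w = sum(weights)
        # Quit-or-continue: p_quit = w_quit / (w_quit + sum_j w_j)
        if rng.random() < w_quit / (w_quit + tot_w):
            return "absent", inspected
        # Select an item with probability proportional to its weight
        r = rng.random() * tot_w
        for i, w in enumerate(weights):
            r -= w
            if r < 0:
                break
        inspected += 1
        if target_present and i == 0:
            return "present", inspected   # target selected and identified
        weights[i] = 0.0       # inhibit the rejected distractor
        w_quit += quit_boost   # boost the quit unit after each rejection
```

With a very high target weight the target is virtually always selected first, reproducing the pop-out regime irrespective of set size; with equal weights, selection is random and search depth grows with set size.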

Parallel search model

We developed a parallel search model as an extension and integration of a number of previous models (Palmer & McLean, 1995; Thornton & Gilden, 2007; Ward & McClelland, 1989). The core assumption of the model is that all items are identified in parallel. For each item in the display, we thus assume a corresponding item identifier that accumulates evidence for and against the hypothesis that the item is a target. One such identifier is illustrated in Fig. 2, modeled as a two-boundary noisy diffusion process, whose upper boundary corresponds to a match (item is the target) and the lower boundary to a mismatch (item is not the target). We assume that all diffusors race in parallel and that they have the same boundary separation a and starting point z. We additionally assume that the target diffusor (if a target is present) has a drift rate v and that all distractor diffusors have the same absolute drift rate but with the opposite sign, −v. This two-boundary diffusion is a standard way to extend signal-detection theory to account for RTs and speed-accuracy tradeoffs (Ratcliff, 1978; see also Ratcliff et al., 2007, for a dual-diffusion model based on a race of diffusion processes). The model also includes decision noise, an essential component in the account of set-size effects in parallel search models (Palmer & McLean, 1995).

Fig. 2

Noisy target match in the parallel model. Targets have a positive drift v toward the upper (yes) boundary; for nontargets, we assume a symmetric diffusion process (drift −v) toward the lower (no) boundary. Integration is subject to diffusion noise, denoted s
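As a minimal sketch of one such item identifier, the two-boundary diffusion can be simulated by Euler-Maruyama integration (the step size and the parameter values used below are illustrative assumptions, not fitted values):

```python
import random

def diffuse(v, a, z, s=0.1, dt=0.001, max_t=10.0, seed=None):
    """Simulate one item identifier: a diffusion starting at z in (0, a)
    with drift v and diffusion noise s. Returns ("match" | "mismatch",
    decision time); "match" means the upper (target) boundary was hit."""
    rng = random.Random(seed)
    x, t = z, 0.0
    step_sd = s * dt ** 0.5   # per-step noise SD for Euler-Maruyama
    while t < max_t:
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
        if x >= a:
            return "match", t      # item identified as the target
        if x <= 0.0:
            return "mismatch", t   # item identified as a distractor
    return "mismatch", t           # truncation failsafe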

For a display of set size n, we assume that n such diffusors run independently in parallel. We now describe the search-termination rule. A “target-present” decision is made as soon as one of the diffusors reaches the upper boundary (self-termination on matches). By contrast, “target-absent” decisions are triggered by a quit unit, whose activation rises as more diffusors reach the lower boundary. Typically, parallel search models postulate that the search is exhaustive when a target is not found (Palmer & McLean, 1995; Ward & McClelland, 1989; Williams et al., 2014). Importantly, our quit-unit-based termination rule reduces to an exhaustive search for certain parameters (see below). For other parameters, however, our termination rule quits the search “early,” that is, before full-display inspection. Thus, our termination rule augments the model with further flexibility, with exhaustive search as a special case, offering flexibility similar to that of the CGS model.

To elaborate, we assume that when the kth item reaches the lower identification boundary (note that k does not index a spatial position but the fact that k − 1 items have already reached the distractor boundary before the focal item), the search quits with probability (k/n)^q, where q ≥ 0 is the quit-unit exponent and a free parameter of the model. Here, k/n is the proportion of display items that have already reached the lower boundary. Accordingly, the tendency to quit the search becomes stronger as the proportion of the display items identified as distractors increases (see Donkin & Shiffrin, 2011, for a similar search-termination rule). Note that if the nth item reaches the lower boundary (and assuming the search has not already terminated), the quit unit is triggered with probability 1. For very high quit-unit exponents (q → ∞), the search is exhaustive, because for any k < n the quitting probability is negligible (lim_{q→∞} (k/n)^q = 0). The other extreme is obtained when q = 0, where the quit unit is deterministically triggered by the first element to reach the lower boundary (i.e., single-item inspection). Intermediate levels of q control the tendency to quit the search earlier or later. This choice of quit unit shares important similarities with the operation of the quit unit in CGS: in both models, the search-termination probability increases as a function of the number of rejected distractors and decreases as a function of set size.
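The implied distribution over search depth on target-absent trials follows directly from the (k/n)^q rule. The helper functions below are ours, written only to illustrate the rule's two limiting regimes:

```python
def quit_prob(k, n, q):
    """Probability that the quit unit fires when the k-th of n items
    reaches the lower (distractor) boundary: (k / n) ** q."""
    return (k / n) ** q

def quit_depth_dist(n, q):
    """Probability that exactly k distractors are rejected before the
    search quits, for k = 1..n, assuming no target is found."""
    survive = 1.0
    dist = []
    for k in range(1, n + 1):
        p = survive * quit_prob(k, n, q)   # quit exactly at depth k
        dist.append(p)
        survive -= p
    return dist
```

For q = 0 all probability mass sits at depth 1 (single-item inspection); for very large q virtually all mass sits at depth n (exhaustive search); intermediate q yields probabilistic early quitting.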

In addition to the search time, the model includes a uniformly distributed “residual-time” component that captures the time consumed by “non-search” processes, such as the initial perceptual encoding of the display and the motor production of the response. This choice of a uniform residual time is typical for applications of the diffusion model. In the Appendix, we provide analytical derivations of the RT densities and error rates, for both target-present and target-absent conditions, based on the assumptions described above.

To account for set-size effects on RTs and error rates, parallel models of this type must assume that set size affects the drift rates, the response boundaries, or both. If processing capacity is limited (Ward & McClelland, 1989), then the drift rate should decrease as a function of set size. By contrast, if capacity is unlimited, then the drift rate would be invariant with respect to set size, and set-size effects are attributable to strategic changes in decision criteria mitigating the increasing influence of noise. To equip the parallel model with ample flexibility, we allowed the drift rates, starting points, and boundary separations to vary with set size. Specifically, we let the starting point, z, and the boundary separation, a, vary freely as a function of set size (yielding 8 free model parameters for the four values of set size in the experiments). Because in Wolfe et al.’s (2010) experiment set size is randomized across trials, this assumption entails that (highly experienced) observers are able to rapidly estimate the set size and use this information to adjust decision criteria. To allow a flexible amount of capacity limitation, we assume that drift rates vary with set size (n) as a power function, v(n) = v / n^c, whose scaling drift rate v and exponent c are additional free parameters; note that c = 0 corresponds to unlimited capacity and c = 1/2 to a signal-detection-based derivation of limited capacity (Ward & McClelland, 1989; Smith & Sewell, 2013). As is customary, the diffusion noise s was held fixed at s = 0.1 (but see Donkin, Brown, & Heathcote, 2009). In addition to the two residual-time parameters, the mean T_er and the range s_er, the model thus included 13 free parameters. In general, the number of free parameters, n_p, depends on the number of different set sizes, k, empirically tested in the experiment, as n_p = 5 + 2k.
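The capacity scaling and the resulting parameter count can be stated compactly; the function names below are ours and purely illustrative:

```python
def drift_rate(v, n, c):
    """Set-size-scaled drift rate, v(n) = v / n**c.
    c = 0: unlimited capacity (drift invariant over set size);
    c = 0.5: signal-detection-style limited capacity;
    c < 0 would yield super-capacity (drift grows with set size)."""
    return v / n ** c

def n_free_params(k_set_sizes):
    """Free parameters of the parallel model: v, c, q, T_er, s_er (5),
    plus a starting point z and a boundary separation a per set size."""
    return 5 + 2 * k_set_sizes
```

With the four set sizes of Wolfe et al. (2010), n_free_params(4) yields the 13 parameters cited above.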

Methods

Sketch of the experimental methods of Wolfe et al. (2010)

Wolfe et al. (2010) collected data from a total of 28 participants for three classic search tasks: nine participants in a feature search (with target defined by color), 10 in a conjunction search (with target defined by a combination of color and orientation), and nine in a spatial configuration search (with a target digit-2 among distractor digit-5s). In each task, four set sizes (3, 6, 12, and 18 items) were crossed with two trial types (target present vs. absent) to create a factorial design with a total of eight conditions. For each participant, approximately 500 trials were run for each of the eight factorial cells. Both factors were intermixed within experimental blocks, that is, they varied randomly from trial to trial.

Model fitting

Our full method for fitting the CGS model has been reported in detail elsewhere (Moran et al., 2013). In fitting the parallel model, we repeated the same steps. Accordingly, our method is only sketched here. In brief, we adapted the Quantile Maximal Probability Estimation procedure (QMPE; Heathcote, Brown, & Mewhort, 2002) to our purposes. To utilize QMPE, each of the eight set-size (4) * target-presence (2) experimental conditions was separated into seven bins: six bins for correct RTs, defined by the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles, and one bin for all error trials. Thus, the data from each search task provided 8 (conditions) * (7 − 1) = 48 free empirical observations. In essence, QMPE amounts to Maximum-Likelihood Estimation (MLE) once the precise RTs are censored and only bin identity is maintained. We fitted the model separately to the data of each participant as well as to the “average observer” obtained by averaging accuracy rates and correct-RT quantiles across participants. Further details are provided in the Appendix.
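The censoring step at the heart of QMPE can be sketched as follows. This is a simplified illustration (the names are ours): the quantile cut-points are passed in directly, and the model-predicted probabilities of the six correct-RT bins plus the error bin are assumed to have been computed beforehand from the model's RT density:

```python
import math

def qmpe_loglik(correct_rts, n_errors, cutpoints, pred_probs):
    """Multinomial log-likelihood of the binned data under the model.
    Correct RTs are censored into len(cutpoints) + 1 bins; errors form
    one extra bin; pred_probs gives the model's probability per bin."""
    counts = [0] * (len(cutpoints) + 1)
    for rt in correct_rts:
        # bin index = number of cut-points this RT exceeds
        counts[sum(rt > c for c in sorted(cutpoints))] += 1
    counts.append(n_errors)
    assert len(counts) == len(pred_probs)
    return sum(k * math.log(p) for k, p in zip(counts, pred_probs) if k > 0)
```

Maximizing this quantity over the model parameters (whose predicted bin probabilities enter as pred_probs) is what the fitting routine does for each condition.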

Results

Spatial-configuration search (2 vs. 5)

The best fits of the two models for the hardest task in the benchmark data of Wolfe et al. (2010) are illustrated in Fig. 3. The figure (like the following similar figures) depicts model predictions based on the fit for the average observer. As can be seen in Table 1, the capacity-limitation parameter for all observers (range 0.21-0.42; mean = 0.33) falls between unlimited capacity (c = 0) and the signal-detection notion of limited capacity (c = 0.5). Additionally, the quit-unit exponent (range 9.45-42.76; mean = 21.59) indicates that participants tend to search the display deeply, but not exhaustively, when the target is not found. Figure 4 (left panel) displays the distribution over the number of items that were identified as distractors before the search was terminated on correct-rejection trials.

Fig. 3

Data fits of the serial and parallel models to the Wolfe et al. (2010) “2 vs. 5” average-observer data. Empirical data are denoted with black * symbols, the parallel-model predictions with red + symbols, and the CGS predictions with blue diamonds. Right and left panels correspond to target-present and target-absent trials, respectively. Upper and lower panels correspond to quantile correct RTs and error rates, respectively

Table 1 Best fitting parameters for the parallel model in the 2-vs.-5-search task
Fig. 4

Distributions of the number of identified distractors before quit-unit triggering on correct rejection trials for the different tasks of Wolfe et al. (2010), based on the best-fitting parameters for the average observers. The different colors indicate the different set sizes

As evident in the upper panels of Fig. 3, both models were able to account satisfactorily for the slowdown of RT with set size, for the skew in the RT distributions (larger distance between the upper quantile symbols) and for the substantial overlap between the target-present and target-absent RT distributions (Wolfe et al., 2010). In the parallel model, this slowdown is accounted for by both the decrease in drift rate (c > 0) and the increase in the boundary separation as functions of set size (Table 1). For hit trials (top left panel), however, there is a tendency for the parallel-model RT distributions to be too wide for the smaller set sizes (3, 6 items) and too narrow for the larger set sizes (12, 18 items). Additionally, the parallel model (red symbols) shows discrepancies in the false-alarm (FA) rates, particularly for set sizes 12 and 18 items (bottom right panel). Indeed, the model predicts an increase in the FA rate as a function of set size, whereas the empirical FA rate is constant. The reason for the predicted increase in FA with set size is that as set size increases, so does the probability that one of the distractors will mistakenly hit the upper “target” diffusion boundary in a target-absent display. Notably, this tendency is mitigated by the search being non-exhaustive, so that the effective set size that is searched when a target is not found is smaller than the nominal set size. Furthermore, the set-size-related increase in boundary separation acts to reduce FAs. Still, these influences are overruled by the decrease in drift rate, which acts to increase FAs. Regarding miss rates, the parallel and serial models seem to be “on par” (bottom left panel).

To compare the goodness of fit of the two models, we calculated (Table 2) the difference between the parallel and CGS models with respect to deviance (i.e., minus twice the log-likelihood of the data, under the QMPE-estimated parameters), AIC (Akaike, 1973), and BIC (Schwarz, 1978). Strikingly, even without the additional penalty imposed by the information criteria (AIC, BIC) for the five extra parameters of the parallel model, the CGS model fits the data better for seven individual participants (i.e., all except Participants 4 and 7) as well as for the group as a whole, as evidenced by the positive ΔDev values. Penalizing the models for complexity, AIC still prefers the parallel model for Participants 4 and 7. According to BIC, CGS becomes superior for Participant 4, whereas for Participant 7 the models are tied.

Table 2 Model comparison measures for the parallel vs. the CGS model for the different tasks

Conjunction search

For the conjunction-search task, too, the model fits provide strong support for the CGS model (see Table 3 for the best-fitting parameters). As shown in Fig. 5, the parallel model provides a good fit for the target-present RTs (top left panel). However, the model fails with respect to target-absent displays: it underestimates the inter-quantile range of RTs for the larger set sizes (12, 18 items; top right panel) and falsely predicts an increasing FA rate with increasing set size (bottom right panel). Additionally, CGS accounts better for the miss rates (bottom left panel). A model comparison (Table 2) showed that for all participants (except Participant 1), as well as for the group as a whole, CGS yielded lower deviance values despite its lower number of parameters. After adding the penalty term, CGS was preferred according to AIC (let alone BIC) for all participants, including Participant 1.

Table 3 Best fitting parameters for the parallel model in the conjunction-search task
Fig. 5

Model fits for the conjunction-search task of Wolfe et al. (2010) to average-observer data. The arrangement of the figure is identical to Fig. 3

Feature search

For the feature-search task, too, the model fits provide strong support for the CGS model (see Table 4 for the best-fitting parameters). To understand the reasons for this, we focus below on the fits of the parallel model. Considering first the target-present displays, we find that the parallel model provides a good fit for the hit RTs (Fig. 6, top left panel). Remarkably, as in the data, there are no observable set-size effects on the predicted hit RTs. Table 4 shows that, with increasing set size, the threshold separation hardly changes, while the starting point moves closer to the lower, target-absent boundary. All else being equal, this effect would lead to an increase in hit RTs, since the target diffusor has to traverse a longer distance to reach the upper boundary. However, this effect is offset by a weak tendency for “super-capacity,” that is, a negative capacity exponent, which results in drift rates that increase with set size. This weak super-capacity could arise if the larger number of display items increased the target’s (bottom-up) salience. Note, however, that this drift/starting-point trade-off is less successful in accounting for the miss rates: unlike the data, the parallel model predicts a large increase in miss rates as a function of set size (bottom left panel). But why does the starting point move downwards?

Table 4 Best-fitting parameters for the parallel model in the feature-search task
Fig. 6

Model fits for the feature-search task of Wolfe et al. (2010) to average-observer data. The arrangement of the figure is identical to that of Fig. 3

To understand this, we need to consider the target-absent condition. As shown in Fig. 6, in this condition, unlike in the data, the model predicts a speed-up in correct rejections (top right panel), in the form of a shrinkage of the upper part of the RT range. Figure 4 shows that when the target is not found, the search is exhaustive. Thus, all other things being equal (including boundaries and starting point), RTs for CRs would increase with set size (it takes longer for more distractors to reach the lower boundary). However, this “exhaustiveness effect” is offset by the set-size-dependent decrease of the starting point and by the super-capacity. These “opposing effects” balance each other almost perfectly with respect to the three lowest RT quantiles (including the median) and maintain a satisfactorily low and stable rate of FAs (bottom right panel). However, unlike the data, the model predicts a speed-up in the two uppermost (0.7, 0.9) CR quantiles. This intricate trade-off provides a further demonstration of why stronger model constraints can be gleaned by fitting search models to RT distributions, rather than only to central-tendency measures (Wolfe et al., 2010).

Finally, a model comparison (Table 2) showed that for all participants (except for Participant 6) as well as for the group as a whole, CGS yielded lower deviance values despite its lower number of parameters. According to AIC and BIC, CGS was preferred for all participants, including Participant 6. This finding is striking, taking into account that for a long time, feature search has been considered the prototypical task for a parallel search architecture.

The parallel-model fits to the feature-search task that we presented above correspond to a highly flexible model, which assumes that boundaries, starting points, and drift rates can vary with set size and which also includes the capacity and quit-termination parameters. Interestingly, the inclusion of the latter did not help the parallel model in this case, because the fit always converged to large q-values that correspond to exhaustive search (Table 4; Fig. 4, rightmost panel). To better understand the reason for this intriguing behavior, we explored a more constrained model variant, obtained by setting a moderate upper bound on the quit parameter (q ≤ 5) that prevented a fully exhaustive search. As expected, the fits were worse than for the flexible model that we presented in Fig. 6. Notably, this constrained model was able to account for the traditional property of flat mean RT with set size, but not for the full RT distributions and the error-rate functions (see Supplemental information).

Discussion

Despite the remarkable success of the two-stage Guided Search model in accounting for visual-search data (Wolfe, 1994, 2007), it has been suggested that the typical set-size effects on mean RT (positive slopes) are also consistent with a number of parallel search models (Palmer & McLean, 1995; Thornton & Gilden, 2007; Verghese, 2001; Ward & McClelland, 1989). Such models could, in principle, account for the positive slopes as a result of either a reduced rate of item processing due to limited capacity (Snodgrass & Townsend, 1980; Shaw, 1984) or an increase in the decision boundary necessitated to maintain error rates at reasonable levels; without such a boundary change, the FA rate would dramatically increase with set size (Palmer & McLean, 1995). The purpose of our investigation was to develop, based on an extension and integration of prior suggestions (Palmer & McLean, 1995; Thornton & Gilden, 2007; Ward & McClelland, 1989), such a parallel model, combining both capacity limitations and flexible decision-boundary settings, and to assess how well it accounts for visual-search data compared with the serial CGS model, which has recently been shown to account well for RT-distribution data (Moran et al., 2013). To endow the parallel model with ample flexibility, we even introduced a “quit unit” that allows for pre-exhaustive search termination when the target is not found. We focused on three classical search tasks from a rich data set (Wolfe et al., 2010) that provided reliable estimates of the full RT distributions for individual observers. Importantly, both the spatial-configuration and the conjunction tasks exhibit robust set-size effects, thus allowing us to probe the origin(s) of those effects. Methodologically, we embraced Wolfe et al.’s (2010) call for fitting the models to RT distributions, rather than simply to RT means, as RT distributions provide enhanced constraints on the nature of the generating search mechanism(s).

Consider first the more difficult (“serial”) search tasks. Here, the fits of the parallel model proved problematic. In the 2-vs.-5 task, the model erroneously predicted a set-size-dependent increase in the FA rate and failed to account for the set-size-related range expansion in hit RTs. In the conjunction task, the parallel model failed to account for performance on target-absent displays with respect to both RT distributions and error rates. Formal model-comparison procedures using AIC and BIC consistently favored CGS for almost all participants and for the group as a whole (Table 2). Importantly, the superiority of CGS was not a consequence of “over-parameterizing” the parallel model and hence subjecting it to heavier AIC/BIC penalties. Indeed, despite its larger number of free parameters (13 vs. 8 for CGS), the parallel model performed worse on the goodness-of-fit deviance measure, which applies no number-of-parameters penalty. This finding is striking, especially when taking into account that CGS provided adequate fits with parameters that were invariant with respect to set size, whereas the parallel model allowed for flexible set-size adjustments of boundary separation and identification bias. Furthermore, by introducing a capacity parameter, c, the parallel model was equipped with the ability to behave in a capacity-limited (e.g., Ward & McClelland, 1989), a capacity-unlimited (Palmer & McLean, 1995; Verghese, 2001), and even a super-capacity manner. Thus, our model comparisons show that the serial, two-stage CGS model (Moran et al., 2013) performs better than a family of parallel models that vary in their degree of capacity limitation.

Having compared the models on these traditional serial search tasks, we next compared them on their fits to the feature task. Given that this task has traditionally been considered to epitomize a parallel search architecture, it provides a stringent test for the serial model. Strikingly, we found consistent superiority for CGS (Table 2), especially in its ability to provide a better account of miss rates and correct-rejection RTs.

Differences between the serial and the parallel model with respect to the feature task

As explained by Moran et al. (2013), CGS accounts for the feature-task data by assigning very high weights both to target saliency and to the quit-unit boost. This has two desirable consequences for the model's ability to fit the data. On target-present trials, for any set size, the target is almost certainly identified as the first item. On target-absent trials, for any set size, the quit unit is almost always selected after the rejection of the first distractor. In other words, if the target fails to pop out, observers can safely terminate the search and decide that the target is absent. Thus, in both target-present and target-absent displays, a single item is identified; consequently, there are no set-size effects. The rightmost panel in Fig. 4, however, shows that according to the parallel-model account, feature search was exhaustive when the target was not found. This finding appears puzzling at first glance, because by setting the quit-unit exponent to 0 (q = 0), the parallel model can also quit after a single item is inspected. Why, then, can it not successfully mimic the serial model?
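The selection logic behind these flat slopes can be sketched numerically. In this toy computation (the weight values are illustrative, not CGS's fitted parameters), the probability that the target is selected first equals its salience weight divided by the summed weights of all display items; an analogous argument, with the boosted quit-unit weight, applies on target-absent trials:

```python
def p_target_first(w_target, n_distractors, w_distractor=1.0):
    """Probability that the target is selected first when items are
    sampled with probability proportional to their salience weights
    (Guided-Search-style selection; all weights are illustrative)."""
    return w_target / (w_target + n_distractors * w_distractor)

# A very salient target is selected first almost regardless of set
# size, which yields the flat ("pop-out") RT slopes; a weakly salient
# target is selected essentially at random, producing steep slopes.
for n in (3, 6, 12, 18):
    print(n,
          round(p_target_first(1000.0, n), 4),   # high salience
          round(p_target_first(1.0, n), 4))      # low salience
```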

Interestingly, this question highlights a fundamental distinction between the serial and parallel models we compared. Whereas in our CGS parameterization identification time was invariant with respect to set size, this was not the case in the parallel model. Imagine we kept all parameters of the parallel model invariant across set size and set the quit-unit exponent to zero, so that the model quits once the first distractor reaches the lower boundary. Even then, the parallel model would not mimic the serial model, because the event of "identifying the first item" in the parallel model is sensitive to statistical facilitation: With more display items, ceteris paribus, the first item would be identified faster, producing negative RT slopes (this is obvious for target-absent responses, but it would also occur for target-present responses, because the target diffusor needs to be faster than all of the distractor diffusors for the response to amount to a "hit").
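Statistical facilitation is easy to demonstrate with a toy simulation. Here, diffusor finishing times are approximated by i.i.d. exponential draws (a stand-in for diffusion first-passage times, chosen only for illustration); the expected time for the first of n parallel racers to finish shrinks as n grows:

```python
import random
import statistics

def first_finish_time(n_items, mean_time=1.0, n_trials=20000, seed=1):
    """Mean time for the FIRST of n_items parallel racers to finish.

    Finishing times are i.i.d. exponential draws with the given mean
    (an illustrative stand-in for diffusion first-passage times).  The
    minimum of n i.i.d. Exp(rate) draws is Exp(n * rate), so the
    expected first-finish time shrinks as 1/n.
    """
    rng = random.Random(seed)
    times = [min(rng.expovariate(1.0 / mean_time) for _ in range(n_items))
             for _ in range(n_trials)]
    return statistics.mean(times)

# Statistical facilitation: more racers -> faster first finisher, i.e.,
# a NEGATIVE RT/set-size slope if the search quits after the first
# identification.
for n in (3, 6, 12, 18):
    print(n, round(first_finish_time(n), 3))
```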

Notably, everything else is not necessarily equal: the parallel model was endowed with ample flexibility to apply set-size modulations to threshold separation, starting point, and capacity. The results of our quantitative model fits show, however, that the empirical RT distributions and error rates impose strict constraints, such that a policy of exhaustive search (when the target is not found) yielded the best fits. Ceteris paribus, search exhaustiveness induces a plethora of set-size effects: a slowdown in CRs (due to the need to "wait" for the last distractor to reach the lower boundary); a speedup in hits (due to statistical facilitation; note that a "hit" can be triggered by a distractor, rather than the target, mistakenly reaching the upper boundary); an increase in FA rate (a higher likelihood that one of the distractors will mistakenly reach the upper boundary in target-absent displays); and a reduction in miss rates (again, because of the higher probability that one of the distractors will mistakenly reach the upper boundary in a target-present display and trigger a correct "hit" response). In its best fits, the parallel model attempted to compensate for these effects with set-size reductions in starting points and with increasing drift rates ("super-capacity"). These fits, alas, were inferior to those produced by CGS, because they failed to provide a satisfactory tradeoff in accounting for miss rates and they generated an unobserved set-size-related speedup in the high quantiles of the CR distributions. As shown in the Supplement, more constrained fits (which impose a limit on the quit parameter) failed to improve the model's fits.
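The FA-rate consequence of exhaustive search with set-size-invariant boundaries can be illustrated with a minimal sketch (the per-item error probability is a placeholder value, not a fitted parameter):

```python
import random

def fa_rate(set_size, p_item_error=0.05, n_trials=50000, seed=2):
    """Target-absent false-alarm rate under exhaustive parallel search
    with set-size-INVARIANT boundaries.

    Each distractor independently reaches the wrong (target) boundary
    with probability p_item_error.  A false alarm occurs if ANY item
    errs, so the FA rate grows as 1 - (1 - p)**n unless the boundaries
    widen with set size.
    """
    rng = random.Random(seed)
    false_alarms = sum(
        any(rng.random() < p_item_error for _ in range(set_size))
        for _ in range(n_trials))
    return false_alarms / n_trials

# With fixed boundaries, the FA rate climbs toward 1 - (1 - p)**n as
# the display grows; curbing it requires set-size boundary adjustments.
for n in (3, 6, 12, 18):
    print(n, round(fa_rate(n), 3))
```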

Qualifications and future directions

While our results favor the two-stage serial CGS model, caution is needed before concluding an unequivocal superiority of a serial over a parallel architecture of attentional selection. First, extensions of our parallel model need to be explored. For example, within the framework of parallel-diffusor models, it would be important to probe the possibility that different items are processed with different drift rates due to attentional gradients. Such gradients (Cheal, Lyon & Gottlob, 1994; Downing, 1988; LaBerge & Brown, 1989; Müller & Humphreys, 1991) may play an important role in parallel models because, as recently shown by Williams et al. (2014), they allow parallel models to produce mixture-RT distributions, an important characteristic of serial search models (e.g., Moran et al., 2013). Because such investigations will depend on a number of critical assumptions (e.g., the magnitude of attentional gradients, their within-trial dynamics, etc.), they will require a dedicated investigation. Additionally, in the current model, we adopted the simplifying assumption that people set unbiased drift-rate criteria for the item-identification process and that any identification biases are reflected in the starting point (see also Footnote 3). Consequently, the identification drift rates for the target and the distractors are equal in magnitude. This assumption, however, could be relaxed in future studies to allow for different target and distractor drift rates. Future investigations may also explore alternative termination rules for target-absent responses. While our quit unit constitutes one approach to implementing an "urgency signal" (the tendency to quit the search increases as more distractors are rejected), alternative mechanisms could be explored, for example, collapsing decision boundaries (Drugowitsch et al., 2012; Moran, 2015; Thura et al., 2012; but see Hawkins et al., 2015; Moran, Teodorescu, & Usher, 2015).

It should be noted that our model-comparison study is parametric in that it makes specific distributional assumptions about the components of the models (e.g., item-identification and residual times). In this respect, our approach is modest in its ambition compared with non-parametric, model-free attempts to identify the cognitive architecture of visual search. Alas, prior model-free attempts have remained inconclusive, because they highlighted the possibility of serial-parallel mimicry (for a recent review, see Algom et al., 2014). Still, one limitation of our study is that it cannot rule out the possibility that different distributional assumptions will improve future serial and parallel models, and that such future parallel models will outperform future serial models. This, however, does not render our current findings trivial. On the contrary, we contend that an advantage of our approach is that, as a consequence of making parametric assumptions, it avoids the risk of model mimicry. Furthermore, our parametric assumptions are well motivated: By grounding our parallel model in a diffusion-type architecture, currently the most popular approach to modeling speeded decisions across a wide range of cognitive tasks, we believe our findings are highly informative in the context of current research. Finally, these results pose a challenge that more sophisticated parallel models will need to meet if they are to compete with Guided-Search-type serial models in accounting for visual search data.

An alternative approach to testing the adequacy of the parallel model of visual search, one that may avoid the pitfalls of the specific assumptions of the model we explored here (e.g., the termination rule on target-absent trials), could rely on the development of ideal-observer-inspired models (Palmer, Verghese & Pavel, 2000). One such promising signal-detection model was developed by Verghese (2001) to account for accuracy with brief search displays. Unlike the current parallel model, in the Verghese model the individual items are not separately identified. Rather, search decisions are based on a global match between the search display and a target template. This global match, in turn, is given by the maximal value of the local matches between each display item and the target. Decisions are made by comparing the global match with a "signal-detection" criterion (see also Cameron et al., 2004; Eckstein et al., 2000). While this model was shown to account for set-size effects on accuracy, it has not yet been formally extended and tested on its ability to account for RT distributions (Footnote 10).
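A minimal sketch of such a max-rule decision can illustrate why, with a fixed criterion, accuracy declines with set size (the d' and criterion values below are illustrative, not estimates from Verghese, 2001):

```python
import random

def max_rule_rates(set_size, d_prime=2.0, criterion=1.5,
                   n_trials=20000, seed=3):
    """Hit and false-alarm rates for a max-rule signal-detection model.

    Each item yields a noisy 'local match' (Gaussian, unit variance;
    mean 0 for distractors, d_prime for the target).  The observer
    responds 'present' if the MAXIMUM match exceeds a fixed criterion.
    """
    rng = random.Random(seed)
    hits = fas = 0
    for _ in range(n_trials):
        distractors = [rng.gauss(0.0, 1.0) for _ in range(set_size - 1)]
        # Target-present display: one item carries the signal.
        if max(distractors + [rng.gauss(d_prime, 1.0)]) > criterion:
            hits += 1
        # Target-absent display: all items are noise.
        if max(distractors + [rng.gauss(0.0, 1.0)]) > criterion:
            fas += 1
    return hits / n_trials, fas / n_trials

# With a fixed criterion, larger displays inflate the FA rate (more
# chances for a distractor to produce an extreme match), so accuracy
# declines with set size.
for n in (2, 4, 8, 16):
    hit, fa = max_rule_rates(n)
    print(n, round(hit, 3), round(fa, 3))
```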

Furthermore, our conclusions (favoring the serial model) may need to be restricted to visual search displays that remain available until response. In a set of studies, Palmer and colleagues (2000) showed that in a paradigm in which the display was presented very briefly and the dependent variable was accuracy (rather than RT), a parallel signal-detection model with unlimited capacity provided a better account of the data than either a parallel model with limited capacity or a serial model (Palmer, 1994; Palmer, Ames, & Lindsey, 1993; Dosher, Han, & Lu, 2004, 2010). It thus is possible that the strategy that observers rely on in visual search varies with task contingencies: For briefly presented displays, observers may rely on the maximal value of saliency, whereas for time-unlimited and difficult search displays, they may use the salience map to engage in serial attentional selections that guide a high-resolution identification process to verify target presence. With still more difficult displays, observers may also need to use eye movements to explicitly search through the display (Bloomfield, 1979; Zelinsky & Sheinberg, 1997).

To better understand the nature of the processes operating in visual search, future studies comparing serial and parallel models are required. Such studies should examine additional data sets based on experimental manipulations designed to differentiate between these types of models. For example, it would be important to test how these models account for visual-search performance in displays in which target salience is manipulated on a continuum (Liesefeld et al., 2015) or in which target prevalence is manipulated (Wolfe & Van Wert, 2010). Furthermore, a full understanding of the nature of attentional processes in visual search will have to incorporate efficiency considerations. For example, once attentional gradients are assumed (Williams et al., 2014), a two-stage serial model, such as Guided Search, which shifts its high-resolution attentional resources across the display, may simply be the best way to use the visual system to optimize search performance under its constraints.

Finally, while serial and parallel theories describe two prototypical search mechanisms, future research also should consider the possibility of hybrid mechanisms. For example, an attentional spotlight (Eriksen & Yeh, 1985; LaBerge, 1995; Posner & Petersen, 1990) might be deployed serially between spatial locations in the search display, while processing items simultaneously (i.e., in parallel) within each location. Another possibility is that the search mechanism is analogous to a "car wash" pipeline, wherein several items are identified in parallel but the identification of a new item can begin only after the identification of an "engaged" item has completed (Wolfe, 2003, 2007). Exploring such possibilities in future formal models of visual search, and evaluating these models against RT-distribution data, may yield "middle-ground theories" with respect to the serial-parallel search debate.