Introduction

When looking for the vine that contains the most ripe red berries among vines that contain less ripe ones, we are selecting a color feature distributed broadly across space and aggregating that information to come to a useful conclusion. This ability to carry out tasks that prioritize information carried by a particular visual feature has enormous evolutionary value, and a great deal of empirical evidence confirms our intuition that we indeed can do this (Ball and Sekuler 1981; Baldassi and Verghese 2005; Davis and Graham 1981; Haenny et al. 1988; Ho et al. 2012; Lankheet and Verstraten 1995; Ling et al. 2009; Liu et al. 2007; Martinez-Trujillo and Treue 2004; Maunsell et al. 1991; Muller et al. 2006; Saenz et al. 2003; Serences and Boynton 2007; Shih and Sperling 1996; Treue and Martinez-Trujillo 1999). Many studies support the specific claim that such “feature-based attention” (FBA) does indeed modulate sensitivity broadly across space (Felisberti and Zanker 2005; Liu et al. 2007; Rossi and Paradiso 1995; Saenz et al. 2003; Hayden and Gallant 2005; McAdams and Maunsell 2000; Motter 1994; Saenz et al. 2002; Seidemann and Newsome 1999; Treue 2001), exerting influence even at locations that are irrelevant to the participant’s task (Arman et al. 2006; Liu and Mance 2011; Serences and Boynton 2007; White and Carrasco 2011; Zhang and Luck 2009). In addition, imaging studies have shown that attention deployed broadly across space for a particular feature modulates the gain of cortical regions selective for the attended feature (Kamitani and Tong 2005; Liu et al. 2003; Liu and Mance 2011; O’Craven et al. 1997; Muller et al. 2006; Schoenfeld et al. 2007; Serences and Boynton 2007; Serences et al. 2009). Ling et al. (2009) measured Threshold-vs.-Noise curves (Lu and Dosher 1998) in a global motion task to provide evidence that FBA increases sensitivity to a target motion direction both by increasing the gain and also by sharpening the tuning of the MT neuron population response for motion direction. These studies confirm the existence of FBA as a human capability and sketch in bold strokes how FBA operates to control behavior. In addition, they provide important insights into where in the brain FBA is implemented and how the brain might operate to accomplish FBA.

The current paper introduces methods for addressing the next conceptual plane of questions concerning the behavioral goals achieved by FBA. Any given FBA-deployment aims to heighten sensitivity to a specific body of target information (e.g., information carried by the red elements of a scene), and much of the research cited above confirms that a given FBA-deployment can indeed alter sensitivity broadly across space to information carried by different features in the visual input. These observations, however, leave open many important questions, central among which are the following:

  1. How effective is the FBA-deployment in sensitizing the participant to the target information?

  2. Is the FBA-deployment sensitive to information other than the target information? If so,

     (a) which non-target features influence the FBA-deployment? and

     (b) how exactly do they influence it?

These questions led us to conceptualize FBA in terms of attention filters (Drew et al. 2010). An attention filter is a process, initiated by a participant in the context of a task requiring FBA, which operates broadly across space to modulate the relative effectiveness with which different features in the retinal input influence performance.

The main purpose of the current paper is to describe the development of the centroid method for measuring the attention filter achieved by a particular deployment of FBA. The paradigm enables one to describe precisely (1) the relative amounts by which the attention filter passes each to-be-attended feature and rejects each to-be-ignored feature, and (2) the attention filter’s overall sensitivity to the information in the stimulus relative to the noise compromising performance. For a given set of stimulus items, by varying the instructions to the participant to induce different FBA-deployments in different attention conditions, one can iterate the centroid method to discover the structure of the entire space of attention filters achievable by human vision for that set of stimulus items.

Summary statistics and the centroid paradigm

Much recent work has focused on the ability of human participants to extract “summary statistics” from brief displays of ensembles of items. For example, substantial research now supports the claim that human participants are adept at extracting the mean size of an ensemble of disks (Ariely 2001; Chong & Treisman 2003, 2005a, 2005b). Other work has focused on the effectiveness with which human participants can estimate the mean orientation of an ensemble of items (Dakin 2001; Solomon 2010).

As emphasized by Alvarez and Oliva (2008), the centroid is another summary statistic that human participants are adept at extracting from an ensemble of items. This paper shows how to analyze the attention filters that a human participant can interpose between a briefly-presented ensemble of items and the computation that he/she uses to extract the centroid.

The centroid paradigm, first used by Drew et al. (2010) and substantially refined here, offers a number of important advantages over previous methods used to measure attention filters (Chubb and Nam 2000; Nam and Chubb 2000; Chubb and Talevich 2002), including the following:

  1. It is much more efficient than these previous psychophysical choice paradigms, requiring many fewer trials to estimate the attention filter deployed by a participant in a given attention condition.

  2. Fitting the data is surprisingly simple.

  3. Refinements in training methods and in stimulus constraints simplify the summary computations and make the centroid paradigm more precise and resistant to artifacts.

Matlab code for analyzing the data from centroid experiments is provided in Appendix 4.

The centroid paradigm: overview

Imagine a large, flat, weightless piece of hard plastic upon which are placed a number of different stacks of pennies of different heights at different locations. The centroid (or center of gravity) of such a spatial array of penny-piles is the average location of all the pennies in the array. If a fulcrum were placed directly under the centroid of the penny-pile array, the plastic sheet and the penny-piles on top of it would balance perfectly.

The paradigm described in this paper enables one to measure the attention filters that participants can achieve in estimating the centroids of clouds of items drawn from a given set Types of item types. The first example application, which is described in the following two sections to illustrate the formal descriptions of the method, uses the set of vertically oriented Gabors shown in the upper row of Fig. 1; the second example application (the Dots experiment) uses eight dots with Weber contrasts \(-1,-\frac {3}{4},-\frac {1}{2},-\frac {1}{4},\frac {1}{4},\frac {1}{2}, \frac {3}{4},1\) shown in the bottom row of Fig. 1. In other potential applications, the set Types might contain (1) dots of different colors, (2) Gabors of different orientations, (3) Gabors of different spatial frequencies, (4) Gabors varying in both spatial frequency and orientation, (5) line segments of different lengths and orientations, (6) small objects of different shapes, etc. There is no requirement that the items in Types be ordered (or even related) in any way; however, many applications of interest use items equally spaced along a single continuum (as in the example experiments).

Fig. 1 The sets Types of item types used in the two example experiments, the Gabor experiment and the Dots experiment. Top row: Gabor patterns with contrasts \(\frac {1}{8},\frac {1}{4},\cdots ,1\). Bottom row: enlarged images of small, square dots with Weber contrasts \(-1,-\frac {3}{4},-\frac {1}{2},-\frac {1}{4},\frac {1}{4},\frac {1}{2}, \frac {3}{4},1\)

To appreciate the basic idea behind the method, imagine a simulated experiment in which, on each trial

  1. A stimulus is presented consisting of a spatially random array comprising several items of each of the item types in Types, and

  2. A response is produced as follows:

     (a) An unknown filter f, constant from trial to trial, is applied to the stimulus to create a map in which each item i of a given type \(\tau_{i} \in\) Types in the stimulus field is replaced by a pile of pennies of size \(f(\tau_{i})\), and finally

     (b) The centroid of the filtered stimulus is extracted.

Although this may not be obvious, the unknown filter f can easily be derived from the data of such an experiment; the section entitled “Analyzing the data: Estimating \(f_{\phi}\)” explains how.
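To make this concrete, the following MATLAB sketch (ours, with illustrative names and values; this is not the Appendix 4 code) simulates such an experiment and recovers f. Because each predicted response coordinate is a weighted sum of per-type coordinate sums, the weights f(k)/S can be estimated with a single least-squares solve and then normalized:

    % Simulate the experiment and recover the hidden filter by regression.
    Ntrials = 1000; Ntypes = 8; Nper = 2;           % two items of each type
    f_true  = (1:Ntypes) / sum(1:Ntypes);           % unknown filter (sums to 1)
    Xsum = zeros(Ntrials, Ntypes); Ysum = zeros(Ntrials, Ntypes);
    Rx = zeros(Ntrials, 1); Ry = zeros(Ntrials, 1);
    for j = 1:Ntrials
        types = repmat(1:Ntypes, 1, Nper);          % item types in the cloud
        x = randn(1, numel(types)); y = randn(1, numel(types));
        for k = 1:Ntypes                            % per-type coordinate sums
            Xsum(j,k) = sum(x(types == k));
            Ysum(j,k) = sum(y(types == k));
        end
        w = f_true(types) / sum(f_true(types));     % f(tau_i)/S for each item
        Rx(j) = w * x.' + 0.05 * randn;             % filtered centroid + noise
        Ry(j) = w * y.' + 0.05 * randn;
    end
    w_hat = [Xsum; Ysum] \ [Rx; Ry];                % least-squares estimate of f/S
    f_hat = w_hat.' / sum(w_hat)                    % normalized; approximates f_true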

In the experiments described below, the participant is asked, in different attention conditions, to try to weight the different item types in accordance with various different “target filters” ϕ. Each of these attention conditions requires an experiment analogous to the simulated experiment described above to measure the attention filter \(f_{\phi}\) that the participant actually manages to achieve.

The Gabor pattern experiment

To make the presentation of the paradigm for estimating attention filters more concrete, we illustrate it with an example experiment based on the Gabor patterns shown in the top row of Fig. 1. These eight Gabor patterns made up the set Types in this example experiment. They were identical in form but differed in contrast. Each was 25×25 pixels. The space constant of the Gaussian window was 5 pixels in each of the horizontal and vertical dimensions, and the windowed sinusoid was vertical, had phase \(-\frac {\pi }{2}\) relative to the center of the envelope, and had a period of 13 pixels. A single Gabor pattern in the stimulus subtended 0.51 deg. The principal spatial frequency of the Gabors was 3.51 cpd. Gabor contrasts were \(\frac {k}{8}\) for k = 1,2,⋯ ,8. The 512×512 pixel region in which the stimuli were displayed subtended 10.51 deg. of visual angle at the viewing distance of 1 m. The luminance of the homogeneous background was 52.1 cd/m².

Participants were the first two authors plus two naive participants who had never before participated in any psychophysical experiments. The methods used in all experiments were approved by the UC Irvine Institutional Review Board, and the participants provided signed consent forms.

Typically, a participant will be tested in a number of different attention conditions. Each attention condition is defined by a target filter ϕ that assigns nonnegative weights to different item types; the target filter ϕ is used to give feedback in that condition. In this experiment, five attention conditions were investigated: in the Uniform attention condition, the target filter gave equal weight to all eight of the Gabor patterns; in the Graded attention condition, the target filter weighted each Gabor pattern in proportion to its contrast; in the Inverse-graded attention condition, the target filter weighted each Gabor pattern in inverse proportion to its contrast; in the Lowest-only (Highest-only) attention condition, the filter gave all its weight to the minimum (maximum) contrast Gabor pattern. As will be seen later, some attention conditions are more difficult than others.
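Concretely, with each filter scaled to sum to 1 (the convention adopted below), the five target filters over the eight contrasts can be written as MATLAB vectors (a hypothetical rendering; the variable names are ours):

    c = (1:8) / 8;                           % the eight Gabor contrasts
    phi_uniform = ones(1,8) / 8;             % equal weight to every item type
    phi_graded  = c / sum(c);                % weight proportional to contrast
    phi_inverse = (1./c) / sum(1./c);        % weight inversely proportional to contrast
    phi_lowest  = [1, zeros(1,7)];           % all weight on the contrast-1/8 Gabor
    phi_highest = [zeros(1,7), 1];           % all weight on the contrast-1 Gabor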

In an attention condition with a target filter ϕ, the participant strives to weight different item types in accordance with ϕ; usually, however, he/she is unable to do so perfectly. He/she gives too much weight to some item types and too little to others. The function that gives the weights exerted on the participant’s responses by different item types in Types is called the participant’s attention filter \(f_{\phi}\); the subscript ϕ keeps track of the target filter that yielded this particular attention filter.

An experimental trial

Defining the attention-weighted centroid

A particular stimulus (e.g., Fig. 2b and g) consists of \(N_{stim}\) items. Each item i = 1,2,⋯, \(N_{stim}\) is of a particular type \(\tau_{i} \in\) Types and occurs at a location \((x_{i}, y_{i})\). Note that there are \(N_{stim}\) items i in a stimulus and \(N_{types}\) different item types in Types. Typically, but not necessarily, \(N_{stim} > N_{types}\), and different items in a display may well be of the same type.

Fig. 2 The sequence of displays that occur during a trial. Panels (a) through (f) show the displays in a trial from the Gabor experiment: The participant initiates a trial by pressing the space bar and is then presented with a blank screen (a) with a thin, square, black frame surrounding the region where the stimulus will be displayed (1 s). Then the stimulus (b) is presented for 100 ms; it is followed by a display identical to (a) for 50 ms. Then a post-stimulus mask (c) is presented for 100 ms. After the mask, the participant is presented with (d), a blank screen with a mouse cursor in the middle. The arrow in (e) indicates the participant’s movement of the mouse cursor to click on a response location, \((R_{x}, R_{y})\). Then feedback (f) is presented consisting of (i) the stimulus, (ii) the mouse cursor located at \((R_{x}, R_{y})\), and (iii) a bullseye centered at the location of the correct response \((x_{correct}, y_{correct})\) given by Eq. 1. This display remains on until the participant presses the space bar to initiate the next trial. In the Dots experiment, the sequence of events is identical; however, panels (g), (h), and (i) replace panels (b), (c), and (f), respectively

The target attention filter ϕ assigns a weight \(\phi(\tau_{i})\) to item i. Thus the spatial coordinates of the target centroid \((x_{correct}, y_{correct})\) are

$$\begin{array}{@{}rcl@{}} x_{correct} &=& \frac{1}{S}\sum\limits_{i=1}^{N_{stim}} \phi(\tau_{i})\, x_{i}~~~\text{and}~~~ \\ y_{correct} &=& \frac{1}{S}\sum\limits_{i=1}^{N_{stim}} \phi(\tau_{i})\, y_{i}~~~\text{for}\\ S &=& \sum\limits_{i=1}^{N_{stim}} \phi(\tau_{i}). \end{array} $$
(1)

By convention, ϕ is scaled so that its sum over item types is 1.0; i.e., \( {\sum }_{k=1}^{N_{types}} \phi (k ) = 1.0 \).
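In MATLAB, Eq. 1 amounts to a few lines (a sketch with stand-in data; the variable names are ours):

    phi   = ones(1,8) / 8;                % e.g., the Uniform target filter
    types = repmat(1:8, 1, 2);            % type index of each of 16 items
    x = randn(1,16); y = randn(1,16);     % stand-in item locations
    w = phi(types);                       % phi(tau_i) for each item i
    x_correct = sum(w .* x) / sum(w);     % Eq. 1
    y_correct = sum(w .* y) / sum(w);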

Stimuli

In the Uniform, Graded and Inverse-graded attention conditions, every stimulus cloud included two instances of each of the eight Gabor patterns in the top row of Fig. 1. In the Lowest-only and Highest-only attention conditions, every stimulus cloud included three instances of each of these Gabor patterns. Therefore, \(N_{stim} = 16\) for the Uniform, Graded and Inverse-graded attention conditions, and \(N_{stim} = 24\) for the Lowest-only and Highest-only attention conditions.

For any given target filter ϕ, the participant’s task was to try to mouse-click the ϕ-weighted centroid \((x_{correct}, y_{correct})\) of the stimulus cloud.

Sequence of trial events

The events that occurred in a trial in the Gabor experiment are shown in panels (a) through (f) of Fig. 2:

  1. The participant initiated a trial by pressing the space bar. A blank screen of mean luminance was then presented for 1000 ms. In this display, a thin black line framed the region in which the stimulus cloud would be displayed (Fig. 2a).

  2. The stimulus (Fig. 2b) was presented for 100 ms, after which it was replaced by a blank stimulus field identical to Fig. 2a for 50 ms. The locations of the Gabor patterns were drawn from a bivariate Gaussian density constrained (as described below in “Generating full-set stimulus clouds”) to keep cloud size constant across trials.

  3. 150 ms after stimulus onset, a post-stimulus mask (Sperling 1963) of the sort shown in Fig. 2c was presented; this mask stayed on for 100 ms.

  4. The mask was then replaced by a blank stimulus field (Fig. 2d) with a cross-shaped mouse cursor in the middle.

  5. The participant used the mouse to move the cursor (as indicated in Fig. 2e) to click on what he/she judged to be the correct location.

  6. The participant was then presented with feedback consisting of

     (a) the stimulus,

     (b) the mouse cursor located at the participant’s response, and

     (c) a bullseye centered at the location of the correct response \((x_{correct}, y_{correct})\) (Eq. 1).

     (The feedback panel in Fig. 2f shows that on this trial, the participant’s response was slightly below and to the right of the correct response.) The feedback stayed on the screen until the participant pressed the space bar to initiate the next trial.

The following were recorded on every trial:

  1. the x- and y-coordinates and types of all items presented;

  2. the x- and y-coordinates of the response location clicked on by the participant.

Generating full-set stimulus clouds

Of central interest in the attention condition with a given target filter ϕ are the responses produced by the participant on full-set trials. Each full-set stimulus cloud contains at least one item of each type, and the number \(n_{k}\) of items of type k in Types is the same on every full-set trial. However, there is no requirement that the \(n_{k}\)’s be equal (Footnote 1).

Every full-set cloud in the attention condition with a given target filter ϕ contains the same number \(N_{stim}\) of items; what should the spatial distribution of these items be? For reasons that will become apparent later, it is useful (1) to fix the expectation of the center of the cloud at the center of the stimulus field, and (2) to select the standard deviation of the cloud distribution to insure that the clouds we present are contained within the stimulus field. To avoid unwittingly imposing any additional constraints, the natural choice for the distribution of item locations is a bivariate Gaussian density, which is the maximum-entropy distribution for a fixed mean and variance.

There is, however, a problem with this simple strategy. Specifically, when the x- and y-coordinates of item locations are independent random variables, full-set stimulus clouds vary randomly (and strongly) across trials in how far items are spread out around the centroid. In the centroid task, it is empirically observed that responses tend to be more accurate on trials in which items happen to bunch closely around the centroid than on trials in which the items are dispersed more broadly. In subsequent analyses, it will be critical to separate response variability due to trial-to-trial variability in the stimulus centroid location \((x_{correct}, y_{correct})\) from variability due to other stimulus factors. This can be accomplished much more easily when item clouds are created that do not vary in size, i.e., in dispersion.

Dispersion

To deal with the problem of varying cloud size, it helps to define the dispersion of a cloud of items. Let the vectors of x- and y-coordinates of the item locations in a given cloud be \(\mathbf {x} = \left (x_{1}, x_{2},\cdots , x_{N_{stim}}\right )\) and \(\mathbf {y} = \left (y_{1}, y_{2},\cdots , y_{N_{stim}}\right )\). Then the Dispersion(x, y) of the stimulus cloud composed of x, y is

$$ \text{Dispersion}(\mathbf{x},\mathbf{y})=\left [ \frac{1}{2N_{stim}-2}\sum\limits_{i=1}^{N_{stim}} \left[ \left (x_{i}-\bar{X} \right )^{2} + \left (y_{i}-\bar{Y} \right )^{2} \right] \right ]^{\frac{1}{2}}, $$
(2)

where \(\bar {X}\), \(\bar {Y}\) are the means of the vectors x, y. Note: Dispersion is proportional to the root-mean-square (RMS) distance of the display items from their mean location; the proportionality constant is chosen so that Dispersion(x, y) is an unbiased estimator of the standard deviation used to generate the cloud.
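In MATLAB, Eq. 2 reduces to a one-line anonymous function (the name dispersion is ours):

    % Spread of a cloud (Eq. 2); x and y are vectors of item coordinates.
    dispersion = @(x, y) sqrt((sum((x - mean(x)).^2) + sum((y - mean(y)).^2)) ...
                              / (2 * numel(x) - 2));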

To keep the value of dispersion constant at some value D for all full-set stimulus clouds used in a given experiment:

  1. Draw independent standard normal random variables \(\tilde {x}_{i}\) and \(\tilde {y}_{i}\), i = 1,2,⋯, \(N_{stim}\),

  2. and then produce the x- and y-coordinates of the actual item locations by setting

     $$\begin{array}{@{}rcl@{}} x_{i} &=& \frac{D\tilde{x}_{i}}{\text{Dispersion}(\tilde{\mathbf{x}},\tilde{\mathbf{y}})}~~~~\text{and}\\ y_{i} &=& \frac{D\tilde{y}_{i}}{\text{Dispersion}(\tilde{\mathbf{x}},\tilde{\mathbf{y}})}~~~~\text{for}\\ i &=& 1,2,\cdots,N_{stim}. \end{array} $$
     (3)

This process starts with a cloud of points \((\tilde {x}_{i},\tilde {y}_{i})\); the location of each point is then rescaled relative to the center of the screen by the factor \(\frac {D}{\text{Dispersion}(\tilde {\mathbf{x}},\tilde {\mathbf{y}})}\). The choice of D must strike a compromise between:

  1. maximizing the information derived from each trial by making D as large as possible, and

  2. insuring that all items in the stimulus cloud fit within the stimulus field.

A procedure that works well is to choose D in a given experiment so that approximately 95 % of the full-set stimulus clouds produced are contained within the stimulus field. When a given cloud produced using this D has one or more items that fall outside the stimulus field, that cloud sample is discarded, and a new cloud sample is produced.

For example, in the Uniform, Graded and Inverse-graded attention conditions in the Gabor experiment, each stimulus cloud comprised two Gabor patterns of each contrast value. The expectation of the (unweighted) centroid of each stimulus cloud was the center of the stimulus region. The dispersion (Eq. 2) of each stimulus cloud was 80 pixels (1.65 deg. of visual angle). This value was chosen because it led to discarding roughly 5 % of the stimulus clouds produced due to one or more item locations falling outside the stimulus region. In addition, the center-to-center distance between items (each of which subtended 25 × 25 pixels) was constrained to be at least 26 pixels, to prevent items from overlapping.
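The following MATLAB function sketches the generation procedure just described (Eqs. 2 and 3 together with the two rejection rules). The function name, argument names, and the neglect of item extent at the field boundary are ours; this is not the Appendix 4 code.

    function [x, y] = makeCloud(Nstim, D, halfField, minSep)
    % e.g., [x, y] = makeCloud(16, 80, 256, 26) for the Gabor experiment.
    % Coordinates are in pixels relative to the center of the stimulus field.
    while true
        xt = randn(1, Nstim); yt = randn(1, Nstim);       % raw Gaussian cloud
        d = sqrt((sum((xt - mean(xt)).^2) + ...
                  sum((yt - mean(yt)).^2)) / (2*Nstim - 2));
        x = D * xt / d; y = D * yt / d;                   % Eq. 3: dispersion = D
        if any(abs(x) > halfField | abs(y) > halfField)   % item outside field:
            continue;                                     % discard and resample
        end
        dx = x - x.'; dy = y - y.';                       % pairwise item distances
        sep = sqrt(dx.^2 + dy.^2) + 1e9 * eye(Nstim);     % mask self-distances
        if all(sep(:) >= minSep)                          % no overlapping items
            return;
        end
    end
    end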

General training in the centroid task

It will generally be useful to start an experiment by training the participant (with trial-by-trial feedback) to extract centroids of clouds. The number of items included in the displays used in this phase of training is typically equal to the number that will ultimately be used in data collection; however, the items are all identical, whereas the items will vary in type in the actual experiment. If the data collection phase mixes trials that include different numbers of items (as in the Dots experiment described below), the training trials in this phase may similarly vary the numbers of items occurring in clouds. Also, the post-stimulus mask used in this phase has an SOA identical to the masking SOA used in the data-collection phase.

The purpose of this training is to minimize idiosyncratic differences in the centroid computations used by different participants. As noted by Drew et al. (2010), in the absence of general training in the centroid task, different participants show significant individual differences in the centroid computations they use. In particular, some participants tend to overweight the contributions of peripheral items relative to items near the center of the cloud whereas other participants show the opposite tendency. These effects can be quite strong. Typically, a participant should remain in this phase of the experiment until his/her performance (as reflected by mean response error) has stabilized. Only then should he/she be introduced to displays composed of items of different item types.

Data collection

After the participant has completed general training, he/she participates in several different attention conditions. In each attention condition, he/she is asked to use a new target filter ϕ to weight display items. It is natural to expect that performance in the task with each new target filter ϕ will require practice. Accordingly, for each new ϕ, it is important to begin by training the participant to perform the task with this target filter, collecting test data only after performance has stabilized.

ϕ-specific training

Using clouds in which items of different types are mixed, the participant is trained (with trial-by-trial feedback) to mouse-click centroids of clouds with items weighted by the target filter ϕ. (On a given trial, the correct response is given by Eq. 1.)

Standard training

The nature of the training used in a given attention condition is likely to depend on the target filter ϕ. We call a target filter ϕ binary if ϕ assigns equal weight to some subset of “target” items in Types and weight 0 to the remaining “distractor” items. For a binary target filter (and sometimes for other target filters), ϕ-specific training typically uses the same mix of item-cloud conditions as will be used in the data-collection phase (see “Data collection with target filter ϕ”). In these instances, the participant typically is tested in blocks of 100 trials. After each block, the participant is shown the attention filter he/she achieved in that block as well as several summary measures of accuracy. This feedback is provided to enable the participant to adjust his/her strategy to optimize performance. Because the procedures in ϕ-specific training blocks are identical to the procedures in data-collection blocks, it typically suffices to retain as data the results from the first two blocks in which performance shows no improvement.

Pretraining

For non-binary target filters ϕ, however, it will sometimes be useful to include an initial phase of ϕ-specific training that uses “simplified” item-clouds that comprise fewer items than will be used in the data-collection phase. This strategy is likely to be appropriate if ϕ assigns a range of different weights to the different elements of Types. In particular, this pretraining phase might include (often exclusively) clouds that comprise just two items. For a two-item cloud whose items are of types i and j in Types, the correct response lies \(\frac {\phi (j)}{\phi (i)+\phi (j)}\) of the way from the location of the item of type i to the location of the item of type j; for example, if ϕ(i) = 1 and ϕ(j) = 3, the correct response lies three-quarters of the way from the item of type i to the item of type j. When the participant can produce appropriate responses to all such two-item clouds, this suggests that he/she understands the task at a rudimentary level. Following this pretraining, the participant progresses to standard training and data collection.

Data collection in the Gabor example

Blocks in the Gabor experiment comprised 100 full-set trials. On each full-set trial in each of the Uniform, Graded and Inverse-graded (Lowest-only and Highest-only) attention conditions, 16 (24) Gabor patterns were presented, two (three) of each contrast. The number of display items was increased in the Lowest-only and Highest-only attention conditions to insure that a given stimulus display contained at least three target items.

Procedure with naive participants

S3 completed 400 trials of general training (∼20–27 min); S4 completed 200 trials (∼10–13 min). In the final block of general training trials, each of participants S3 and S4 achieved a mean response error comparable to the mean response errors typically achieved by practiced participants, and general training was terminated.

Each of participants S3 and S4 completed 200 trials in each attention condition following a number of ϕ-specific training trials that varied around a mean of 333 depending on the particular target filter ϕ (Footnote 2).

Neither S3 nor S4 was tested in the Lowest-only or the Highest-only attention condition.

Procedure with experienced participants

Each of participants S1 and S2 had extensive previous experience in the centroid task. In addition, each had previous experience in variants of the centroid task using target filters similar to those tested in the current experiment. Accordingly, general training was omitted for each. Each of S1 and S2 performed 100 ϕ-specific training trials followed by 200 data-collection trials in each attention condition. Attention conditions were tested in the following order: Uniform → Graded → Inverse-graded → Lowest-only → Highest-only.

Modeling

Some attention conditions in the centroid task are likely to be harder than others. The difficulty encountered by a participant in an attention condition using a given target filter ϕ is likely to show up in two main ways:

  1. The attention filter \(f_{\phi}\) achieved by the participant may deviate from the target filter ϕ.

  2. Response accuracy may be compromised by

     (a) random errors, and

     (b) a bias toward a fixed response location.

In analyzing the data from a given attention condition, it will be important to measure the strengths of these effects.

Model assumptions

The primary variable of interest is the attention filter \(f_{\phi}\) achieved by the participant across just the full-set trials in the attention condition using a given target filter ϕ (Footnote 3). However, the model includes four other parameters to account for specific sources of error that may influence responses: a default location \((x_{default}, y_{default})\), a Data-drivenness parameter V that describes the participant’s reliance on the present stimulus versus the default location, and a noise parameter σ. These parameters are defined below.

To model the process by which the x-coordinate \(R_{x}(j)\) and y-coordinate \(R_{y}(j)\) of the participant’s response are produced on a full-set trial j, we define the following:

  1. \(\tau_{i}(j)\), \(x_{i}(j)\) and \(y_{i}(j)\) are the type and the x- and y-coordinates of the i th item in the stimulus cloud presented on trial j.

  2. \(Q_{x}(j)\) and \(Q_{y}(j)\) are independent, normally distributed random variables, each with mean 0 and variance \(\sigma^{2}\), that represent random response error.

  3. V, the Data-drivenness parameter, reflects the proportion to which the participant’s response is determined by the stimulus presented on each trial, as opposed to the fixed point \((x_{default}, y_{default})\) toward which the participant’s response is assumed to tend (with weight (1−V)) on each trial.

  4. \(f_{\phi}\) is the attention filter achieved by the participant across the full-set trials in the attention condition defined by the target filter ϕ.

  5. S is the sum of the weights assigned to all the different items in any given full-set display. That is, on any trial j,

     $$ S=\sum\limits_{i=1}^{N_{stim}}f_{\phi} (\tau_{i}(j)) = \sum\limits_{k=1}^{N_{types}}n_{k}f_{\phi} (k) $$
     (4)

     where, for k = 1,2,⋯, \(N_{types}\), \(n_{k}\) is the number of items of type k in a full-set display. We assume without loss of generality that S is positive.

Then, the x- and y-coordinates of the predicted response on a given trial j are

$$ R_{x}(j) = \frac{V}{S} \sum\limits_{i=1}^{N_{stim}}f_{\phi} (\tau_{i}(j))x_{i}(j) + (1-V)x_{default}+Q_{x}(j),\quad \text{and} $$
(5)
$$ R_{y}(j) = \frac{V}{S}\sum\limits_{i=1}^{N_{stim}}f_{\phi} (\tau_{i}(j))y_{i}(j) + (1-V)y_{default}+Q_{y}(j). $$
(6)
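In MATLAB, one simulated full-set trial under this model looks as follows (a sketch with illustrative parameter values; this is not the Appendix 4 fitting code):

    f = (1:8) / sum(1:8);                 % an achieved attention filter
    V = 0.9; sigma = 10;                  % Data-drivenness and error SD (pixels)
    x_default = 0; y_default = 0;         % default location: the field center
    types = repmat(1:8, 1, 2);            % a full-set cloud, two items per type
    x = 80 * randn(1,16); y = 80 * randn(1,16);
    w = f(types) / sum(f(types));         % f(tau_i)/S for each item i
    Rx = V * (w * x.') + (1 - V) * x_default + sigma * randn;   % Eq. 5
    Ry = V * (w * y.') + (1 - V) * y_default + sigma * randn;   % Eq. 6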

The methods used to estimate the model parameters V, \(x_{default}\), \(y_{default}\), and \(f_{\phi}\) are described in “Appendix 1. Estimating model parameters.” Methods for computing 95 % confidence intervals for V (Data-drivenness) as well as for \(f_{\phi}(k)\), k = 1,2,⋯, \(N_{types}\), are described in “Appendix 2. Estimating confidence intervals for model parameters.” The Matlab code used to perform these computations is given in “Appendix 4. Matlab code for fitting the centroid model.”

Results from the Gabor example experiment

Figure 3 shows the results for all participants for the Uniform, Graded, and Inverse-graded attention conditions. The attention filters \(f_{\phi}\) achieved by all four participants in the Graded attention condition match the target filter ϕ fairly well. In addition, the attention filter achieved by participant S3 in the Uniform attention condition matches the corresponding target filter ϕ remarkably well. However, systematic deviations of \(f_{\phi}\) from ϕ are evident for participants S1, S2 and S4 in the Uniform attention condition and for all four participants in the Inverse-graded attention condition.

Fig. 3 Results for the Uniform, Graded and Inverse-graded attention conditions in the Gabor experiment for four participants. The four columns from left to right correspond to participants S1, S2, S3, and S4. Top, middle, and bottom panels show results for the Uniform, Graded and Inverse-graded attention conditions, respectively. In each panel, the dashed line with small open squares gives the target filter ϕ, and the solid line with filled circles gives the attention filter \(f_{\phi}\) achieved by the participant. Error bars give 95 % confidence intervals derived using the methods described in “Appendix 2. Estimating confidence intervals for model parameters.” In addition, the values of Efficiency (Eff), Filter-fidelity (FF), and Data-drivenness (DD) achieved in each attention condition are shown in the corresponding panel. Note that each participant achieves strikingly different attention filters in the Uniform and Graded attention conditions, in each case with high Efficiency. However, values of Efficiency, Filter-fidelity and Data-drivenness are lower in the Inverse-graded attention condition than in the other two attention conditions

Figure 4 shows the results for S1 and S2 in the Lowest-only and Highest-only attention conditions. The attention filter \(f_{\phi}\) achieved by each of participants S1 and S2 in each of these two attention conditions deviates strongly from the target filter ϕ. In each case, although the target filter ϕ assigns nonzero weight to Gabor patterns of a single contrast (contrast 0.125 (1.0) in the Lowest-only (Highest-only) attention condition), the attention filter \(f_{\phi}\) achieved by each participant gives nonzero weight to Gabors ranging broadly in contrast.

Fig. 4 Results for the Lowest-only and Highest-only attention conditions in the Gabor experiment. Results for participant S1 (S2) are shown in the left (right) two panels. Results for the Lowest-only (Highest-only) attention condition are shown in the top (bottom) two panels. In each panel, the dashed line with small open squares gives the target filter ϕ, and the solid line with filled circles gives the attention filter \(f_{\phi}\) achieved by the participant. Error bars give 95 % confidence intervals computed as described in “Appendix 2. Estimating confidence intervals for model parameters.” Note that values of Efficiency (Eff), Filter-fidelity (FF), and Data-drivenness (DD) are low in comparison to the values observed in Fig. 3 for participants S1 and S2 in the Uniform and Graded attention conditions

In addition to plotting the target filter ϕ and the attention filter \(f_{\phi}\) achieved by the participant, each of the panels in Figs. 3 and 4 is annotated with three additional statistics: “Data-drivenness,” “Filter-fidelity,” and “Efficiency.” Data-drivenness is the parameter V in Eqs. 5 and 6. Along with Data-drivenness, Filter-fidelity and Efficiency reflect the overall skill of the participant in the given attention condition. They are explained in the next section.

Analyzing the data: response error

Potential sources of response error

The participant’s responses can deviate from the target responses for various reasons. These include:

  1. corruption of responses by random error, sources of which include

     (a) early perceptual noise, including

        i. misregistration of the locations of items in the display,

        ii. misregistration of the types of different items,

        iii. failure to register (i.e., missing) some items;

     (b) late noise, including instability across trials in

        i. the attention filter being deployed,

        ii. the centroid computation,

        iii. motor response execution;

  2. corruption of responses by nonrandom error, sources of which include

     (a) mismatch between the attention filter \(f_{\phi}\) and the target filter ϕ,

     (b) Data-drivenness less than 1, implying a tendency to produce responses biased toward a fixed default location,

     (c) model failure (i.e., the computation used by the participant to produce responses deviates from the description provided by Eqs. 5 and 6) (Footnote 4).

Measuring the quality of the participant’s attention filter: filter-fidelity

We use a statistic called Filter-fidelity to measure the effectiveness with which \(f_{\phi}\) approximates the target filter ϕ for purposes of performing the centroid task on the item clouds used in a given type of trial (e.g., full-set trials or target-only trials). Filter-fidelity ranges in value from 0, if the attention filter \(f_{\phi}\) achieved by the participant for this class of item clouds is the worst possible filter, to 1, if \(f_{\phi} = \phi\). In this context, “worst” means that the variance of the difference between the x-coordinates (or the y-coordinates) of the centroids derived by using \(f_{\phi}\) vs. ϕ is maximal.

A worst possible attention filter \(f_{\phi,worst}\) (there may be more than one worst attention filter) is derived by putting all the filter weight on a single item type that should (given the task demands) exert minimal influence on the participant’s response. Specifically, a worst possible attention filter can be obtained by choosing an item type j for which \(n_{j}\phi(j)\) is minimal across j = 1,2,⋯, \(N_{types}\) and setting

$$ f_{\phi,worst}(k)=\left\{\begin{array}{ll} 1 & \text{if}~k=j, \\ 0 & \mathrm{otherwise,} \end{array}\right. $$
(7)

for all types k = 1,2,⋯, \(N_{types}\).

Then Filter-fidelity is defined as

$$ \textnormal{Filter-fidelity} = 1 - \frac{ \left \| \widetilde{f}_{\phi} - \widetilde{\phi} \right \| } {\left \| \widetilde{f}_{\phi,worst}-\widetilde{\phi} \right \|} ~~ , $$
(8)

where, for any function \(f:Types\rightarrow \mathbb {R}\) (for which the denominator in Eq. 9 is nonzero),

$$ \widetilde{f} = \frac{f}{{\sum}_{k=1}^{N_{types}} n_{k}f(k)}. $$
(9)

This normalization insures that the x- and y-coordinates of the centroid of a full-set display derived using f are the following weighted sums:

$$\begin{array}{@{}rcl@{}} x\mathrm{-coordinate} &=& \sum\limits_{\begin{array}{c} \mathrm{all~items}~i~\text{in}\\ \textnormal{full-set~cloud} \end{array}} \widetilde{f}(\tau_{i})x_{i}~~~~~ \text{and}\\ y\mathrm{-coordinate} &=& \sum\limits_{\begin{array}{c} \mathrm{all~items}~i~\text{in}\\ \textnormal{full-set~cloud} \end{array}} \widetilde{f}(\tau_{i})y_{i} \end{array} $$
(10)

where \(x_{i}\) and \(y_{i}\) are the x- and y-locations of the i th item in the display, and \(\tau_{i}\) is its type.

Thus Filter-fidelity is 1 minus the ratio of the Euclidean distance (in \(N_{types}\)-dimensional space) of \(\widetilde {f}_{\phi }\) from \(\widetilde {\phi }\) to the Euclidean distance of the worst possible filter (i.e., \(\widetilde {f}_{\phi ,worst}\)) from \(\widetilde {\phi }\).
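Equations 7-9 translate directly into a short MATLAB function (a sketch; the function name is ours):

    function FF = filterFidelity(f, phi, n)
    % f, phi: 1 x Ntypes achieved and target filters; n: 1 x Ntypes numbers
    % of items of each type in a full-set display (the n_k of the text).
    tilde = @(g) g ./ sum(n .* g);            % the normalization of Eq. 9
    [~, j] = min(n .* phi);                   % the type minimizing n_j*phi(j)...
    worst = zeros(size(phi)); worst(j) = 1;   % ...carries all the weight (Eq. 7)
    FF = 1 - norm(tilde(f) - tilde(phi)) / ...
             norm(tilde(worst) - tilde(phi)); % Eq. 8
    end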

The Filter-fidelity values, computed according to Eq. 8, are displayed in all panels in Figs. 3 and 4. All Filter-fidelity values for the Uniform and Graded attention conditions are quite large (>0.83), reflecting the skill displayed by all four participants in matching the target filters. By contrast, for all participants, Filter-fidelity values are 8 % to 23 % lower in the Inverse-graded attention condition.

The Lowest-only and Highest-only attention conditions are especially difficult. Across the Uniform, Graded and Inverse-graded conditions, the lowest Filter-fidelity values were achieved in the Inverse-graded condition. For S1 (S2) this Filter-fidelity value was 0.765 (0.732). By contrast, in the Highest-only condition S1 (S2) achieved Filter-fidelity 0.381 (0.421) (Fig. 4). In the Lowest-only condition, Filter-fidelity is even worse, indicating that although participants attempt to selectively attend to Lowest-only and Highest-only targets, they cannot successfully do so.

Measuring resistance to residual error: Efficiency

All of the error in the data unaccounted for by the model of Eqs. 5 and 6 is captured in \(SS_{Residual}\) (Eq. 24). This quantity includes both

  1. random error from various early and late sources in the response-production process, as well as

  2. error due to model failure.

The statistic that is typically used to quantify random error in the context of a regression analysis is

$$ \widehat{\sigma}=\sqrt{\frac{SS_{Residual}}{df}}, $$
(11)

where

$$\begin{array}{@{}rcl@{}} df &=& 2N_{trials} - (\mathrm{number~of~free~model~parameters}) \\ &=& 2N_{trials}-(N_{types}+2). \end{array} $$
(12)

Although the model has parameters V, \(f_{\phi}\) (which is of length \(N_{types}\)), \(x_{default}\), and \(y_{default}\), the number of free parameters is \(N_{types}+2\) because \(f_{\phi}\) is constrained to sum to 1 (and therefore absorbs only \(N_{types}-1\) degrees of freedom). (In the Gabor experiment, \(N_{types} = 8\), so \(df = 2N_{trials}-10\).) The statistic \(\widehat {\sigma }\) is an unbiased estimate of the standard deviation of each of the random variables \(Q_{x}\) and \(Q_{y}\) (Eqs. 5 and 6). In itself, however, \(\widehat {\sigma }\) is difficult to interpret.

An alternative measure that facilitates comparison across experiments is the statistic \(p_{miss}\). To get a clear sense of what \(p_{miss}\) reflects, imagine a centroid task in which

  1. Types contains only two item types, A and B,

  2. the stimulus cloud on any given trial comprises 10 items of type A and 10 of type B, and

  3. the task is to click on the centroid of the locations of all items of type A and ignore all items of type B.

If it is difficult to achieve an attention filter that is selective for items of type A vs. type B, then the participant may adopt the strategy of picking out a single item of type A on each trial and simply clicking on the location of that one item.

Under this strategy, on each trial, the participant’s response is determined exclusively by items (actually, by only one item) of type A; items of type B exert no systematic influence whatsoever. Thus, the attention filter achieved by the participant will match the target filter nearly perfectly. Nonetheless, performance will be very poor because the participant ignores nearly all of the relevant information in the display in producing his/her response on each trial. For this reason, even though Filter-fidelity is likely to be very close to 1, \(\widehat {\sigma }\) (Eq. 11) will be large. In the experiment imagined here, if the participant were always able to find exactly one item of type A and to click with perfect accuracy on its location, this strategy would yield a value of \(p_{miss} = 0.90\) because the participant fails to include nine of the ten requested items in the display.

More generally, \(p_{miss}\) is the answer to the following question: Given

  1. the attention filter \(f_{\phi}\) achieved by the participant, and

  2. the observed value of \(\widehat {\sigma }\),

what is the maximum possible proportion of display items that the participant could be failing, trial by trial, to include in his/her centroid computation?

Another way of thinking about \(p_{miss}\) is in terms of an ideal detector operating on a reduced stimulus. Suppose a computer performs the centroid task as follows: On each trial, the computer (1) discards proportion p of items from the stimulus (where the specific items discarded are chosen randomly), then (2) applies attention filter \(f_{\phi}\) to the remaining items in the display, and (3) extracts (without any additional error) the centroid of the decimated and filtered display. The centroid derived through this procedure will vary randomly in each of the x- and y-coordinate values with some standard deviation \(\sigma_{p}\) that increases with p. The statistic \(p_{miss}\) is the value that p must take in order for \(\sigma_{p}\) to be equal to \(\widehat {\sigma }\).
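The exact algorithm is given in Appendix 3; the following Monte-Carlo sketch merely conveys the computation. We take the full-set filtered centroid as the reference point for the deviations, the cloud generation is deliberately crude, and all values are illustrative:

    f = (1:8) / sum(1:8);                      % an achieved attention filter
    n = 2 * ones(1,8);                         % items of each type per display
    D = 80; sigma_hat = 15;                    % cloud dispersion, observed sigma (px)
    types = repelem(1:8, n); N = numel(types);
    p_grid = 0:0.01:0.95; s = zeros(size(p_grid));
    for a = 1:numel(p_grid)                    % estimate sigma_p for each p ...
        dev = zeros(1, 2000);
        for t = 1:2000                         % ... by simulating the ideal detector
            x = randn(1, N); x = D * x / std(x);         % fixed-spread 1-D cloud
            keep = rand(1, N) >= p_grid(a);              % drop proportion p of items
            if ~any(keep), keep(randi(N)) = true; end    % keep at least one item
            w = f(types);
            dev(t) = sum(w(keep) .* x(keep)) / sum(w(keep)) ...
                     - sum(w .* x) / sum(w);             % shift of filtered centroid
        end
        s(a) = std(dev);
    end
    p_miss = p_grid(find(s >= sigma_hat, 1))   % smallest p with sigma_p >= sigma_hat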

Although \(p_{miss}\) is useful as a summary of performance, it should not be taken seriously as an estimate of the proportion of items actually missed by the participant. It assigns all of the residual error to missed stimulus items; however, any credible process model must admit possible contributions to \(\widehat {\sigma }\) from model failure as well as from all of the noise sources outlined above. Thus, \(p_{miss}\) should be viewed as an upper bound on the proportion of display items missed by the participant in his/her centroid computation. Following conventional nomenclature, we refer to \(1-p_{miss}\) as Efficiency (which is a lower bound on the proportion of display items included by the participant in his/her centroid computation).

As described below, and as illustrated in the Dots experiment, this lower bound on the proportion of display items included by the participant in his/her centroid computation can be tightened by including in the experiment “singleton” trials on which a single target item is presented. The variance of the errors produced on these trials is due to sources other than failure to include display items in the centroid computation. It can therefore be subtracted from \(\widehat {\sigma }^{2}\) for purposes of computing \(p_{miss}\). Whenever the Efficiency value reported from a particular experiment reflects such a correction, we refer to it as “singleton-corrected Efficiency.”

An algorithm to compute \(p_{miss}\) is described in “Appendix 3. Computing Efficiency,” and the Matlab code implementing this algorithm is given in “Appendix 4. Matlab code for fitting the centroid model” (see specifically “GetPMiss.m (Called by FitCentroidModel)” and the functions called by GetPMiss.m).

The Efficiency values, computed as described in “Appendix 3. Computing Efficiency,” are displayed in all the panels in Figs. 3 and 4. High Efficiency in a given attention condition indicates that the participant can indeed deploy the attention filter he/she has achieved broadly across space with high sensitivity. The current results support this conclusion for all participants in the Uniform and Graded attention conditions. However, Efficiency values for the Inverse-graded, Lowest-only and Highest-only attention conditions are much lower. As previously observed, Filter-fidelity values also tend to be suppressed in these attention conditions. It appears that human vision does not have the capability to produce attention filters matched to these attention conditions, at least not with the number of training trials provided here.

The relation between Data-drivenness, Filter-fidelity and Efficiency

There are several relationships between Efficiency, Data-drivenness and Filter-fidelity that should be noted.

  1. It is eminently possible for a participant to achieve very high values of both Filter-fidelity and Data-drivenness in a given attention condition even though his/her Efficiency is very low. Efficiency thus emerges as a key measure of performance. If Efficiency is low, then even if Filter-fidelity and Data-drivenness are both high, we must conclude either that (1) the attention filter achieved by the participant cannot be effectively deployed broadly across space or else (2) the items disclosed by the attention filter are too poorly localized in the output image of the filter to enable an accurate estimate of the centroid.

  2. On the other hand, if Data-drivenness is low, then Efficiency is likely to be low as well. In particular, note that if Data-drivenness is 0, then the participant’s responses do not depend at all on the locations and types of items in the display. In this case, removing items from the stimulus display has no effect on the deviation of the participant’s responses from the responses predicted by the model. This means that if Data-drivenness is 0, then Efficiency is undefined. More generally, for any fixed value of \(\widehat {\sigma }\) (Eq. 11), Efficiency shrinks with shrinking Data-drivenness V and is undefined if \(\frac {\widehat {\sigma }}{V}\) is greater than the dispersion D used to control the size of stimulus clouds in a given experiment.

Simulations exploring interactions between Data-drivenness, Filter-fidelity and singleton-corrected Efficiency

Both as a way to test the accuracy of the FitCentroidModel program (Appendix 4) and to understand better the dependencies among the estimates it produces, we carried out a series of Monte-Carlo simulations. The attention filter of the simulated observer was similar to that achieved by Subject 3 in the Dark-only condition of the Dots experiment (Fig. 6, top row, third panel from left). Simulated data were produced for 60 variants of this observer. These variations were derived from the factorial combination of three factors: five levels of stimulus decimation proportion p (i.e., on each simulated trial, the observer failed to incorporate a randomly selected subset comprising proportion p of the display items into the centroid computation, with p ranging from 0 up to 0.625); four levels of Data-drivenness V, ranging from V = 1 down to V = 0.4; and three levels of late error \(\sigma_{L}\) (i.e., independent, normal random variables with standard deviation \(\sigma_{L}\) were added to the x- and y-coordinates of the response location on each simulated trial), which ranged from 10 to 40 % of the dispersion of the stimulus cloud (see Eq. 2). In addition, in each condition singleton-corrected Efficiency was estimated using two different levels of singleton standard deviation, \(\sigma_{singleton}\). Five hundred simulated runs of a 100-trial block were generated and analyzed for each of the resulting 120 variations.

Estimates of the attention filter weights, \(f_{\phi,\text{full-set}}\), were generally quite accurate, as were the estimates of Data-drivenness. Specifically, when the decimation level was zero, these estimates were unbiased. In the extreme case, in which the simulated observer registered only 37.5 % of the display items, the mean estimate of an attention filter weight with a simulated value of 0.270 was reduced to 0.252; a distractor component of the attention filter, with a simulated weight of 0.080, was increased to 0.090. Although the simulated value of \(\sigma_{L}\) did not influence the bias of these estimates, it did affect their variability.

The results for singleton-corrected Efficiency are more complicated because this measure is decreased by any manipulation that reduces response accuracy (other than misspecification of the attention filter). In the special case in which Data-drivenness V = 1.0 and the value of \(\sigma_{singleton}\) used for the analysis exactly matches the \(\sigma_{L}\) of the simulated observer, the estimated values of singleton-corrected Efficiency closely match 1−p, where p is the decimation proportion. However, the estimated singleton-corrected Efficiency was also reduced when V was less than 1.0 or \(\sigma_{L} > \sigma_{singleton}\). Because singleton-corrected Efficiency is constrained to lie between zero and one, the effects of these three sources of judgment error (V, \(\sigma_{L}\), \(\sigma_{singleton}\)) on estimated singleton-corrected Efficiency are subadditive.

Testing whether two attention filters are significantly different

In various circumstances, it may be of interest to assess whether the attention filters estimated from two sets of data are significantly different. Matlab code to perform an F-test of this hypothesis is provided in “Appendix 5. Matlab code for nested model comparison.” See specifically “Main program: FTestForEqualityOfFilters.m.” The test compares the fits provided by two models:

  1. The full model allows all model parameters, including the two attention filters, to take different, arbitrary values for the two data sets.

  2. The nested model assumes that the two data sets resulted from the use of a single, shared attention filter (with the other model parameters, V, \(x_{default}\), and \(y_{default}\), allowed to take different values for the two data sets).

Let \(N_{trials,1}\) (\(N_{trials,2}\)) be the number of trials in the first (second) data set. Then the numbers of degrees of freedom in the full and nested models are

$$\begin{array}{@{}rcl@{}} df_{full} &=& 2(N_{{trials},1}+N_{{trials},2}) - 2(N_{types}-1)-6~~~\text{and} \end{array} $$
(13)
$$\begin{array}{@{}rcl@{}} df_{nested} &=& 2(N_{{trials},1}+N_{{trials},2}) - (N_{types}-1)-6, \end{array} $$
(14)

and for \(SS_{full}\) (\(SS_{nested}\)) the sum of squared residual errors between observed and predicted response locations under the full (nested) model, if the nested model captures the true state of the world, then the ratio

$$ Q=\frac{\left( \frac{SS_{nested}-SS_{full}}{df_{numerator}}\right )}{\left( \frac{SS_{full}}{df_{denominator}}\right )} $$
(15)

has an F distribution with degrees of freedom

$$ df_{numerator} = df_{nested}-df_{full} =N_{types}-1, $$
(16)

and

$$ df_{denominator} = df_{full} . $$
(17)

A test of the null hypothesis that the attention filters achieved by participants S1 and S2 in the Lowest-only attention condition (top two panels of Fig. 4) are identical in form yields \(F_{7,780} = 1.7052\), p = 0.1044. By contrast, a test of the null hypothesis that the attention filters achieved by participant S1 in the Highest-only and Lowest-only attention conditions (the left-hand panels of Fig. 4) are identical in form yields \(F_{7,780} = 30.9317\), p ≈ 0.
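The arithmetic behind such a test is brief (a sketch with made-up residual sums of squares; Appendix 5 contains the implementation, and fcdf requires the Statistics and Machine Learning Toolbox):

    Ntypes = 8; Ntrials1 = 200; Ntrials2 = 200;
    SS_full = 1.00e5; SS_nested = 1.02e5;          % made-up residual sums of squares
    df_full = 2*(Ntrials1 + Ntrials2) - 2*(Ntypes - 1) - 6;   % Eq. 13: here 780
    df_num  = Ntypes - 1;                                     % Eq. 16: here 7
    Q = ((SS_nested - SS_full) / df_num) / (SS_full / df_full);  % Eq. 15
    p = 1 - fcdf(Q, df_num, df_full)               % p value of the F test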

Gabor experiment: summary and caveat

The prominent features of the Gabor experiment results are:

  • Participants can flexibly deploy very different attention filters in response to altered task demands. Stimuli are identically distributed in the Uniform and the Graded attention conditions, yet participants achieve attention filters that differ strongly in form, in each case with high Filter-fidelity, high Efficiency and high Data-drivenness. The stimulus exposure duration was only 100 ms, and the stimulus was followed after 50 ms by a pattern mask; thus, the accurate attention filters and high Efficiency were achieved with only 150 ms of available stimulus information.

  • Participants are limited in the attention filters they can achieve. While they can achieve high values of Filter-fidelity, Efficiency and Data-drivenness when using Uniform and Graded attention filters, the values of these measures are suppressed in the Inverse-graded, Lowest-only and Highest-only attention conditions.

  • Caveat. What is here designated as an attention filter may involve processes other than attention. For example, Fig. 3 shows that in the Uniform and Inverse-graded attention conditions, participants S1, S2, and S4 clearly have some difficulty in giving adequate weight to the lowest-contrast Gabor patch. On the other hand, in the Graded attention condition, participants are required to give very little or zero weight to this lowest-contrast patch, and this they do quite well, especially when they cannot see it. These participants may be attending to the lowest-contrast patch as much as to the other patches, but fail to weight it properly because of deficient discrimination, not deficient attention. If one developed a measure of discriminability and incorporated it into the attention filter computation, one might arrive at a “discriminability-corrected attention filter.” The choice we make here is to use the term “attention filter” for the simple but possibly impure concept, and to allow for the subsequent development of more complex, purer measures of attention.

The rest of this section focuses in more careful detail on the inferences enabled by the results of the Gabor experiment.

The filter mixture model

A basic framework for interpreting the results of centroid-method experiments is the “filter mixture model.” This model proposes that participants possess a limited set of basic attention filters, i.e., a basis set. The large number of different observed attention filters is assumed to result from combinations of the basic attention filters. As a starting point, the basic attention filters are assumed to be “basic” in the sense that parametric variations in their properties are disallowed. This is formalized as follows. For a given set Types:

  1. The participant possesses a basis set of attention filters \(f_{j}\), j = 1,2,⋯, N, which confer sensitivity to the item types in Types broadly across space, where

     (a) each filter \(f_{j}\) is implemented by a retinotopically organized array of neurons in early vision, and

     (b) \(f_{j}(k)\) gives the activation produced in this neural array by items of type k.

  2. The participant can produce attention filters

     $$ f = \sum\limits_{j=1}^{N} A_{j} f_{j} $$
     (18)

     where the \(A_{j}\)’s are constrained

     (a) to be nonnegative (implying that the participant cannot reverse the sign of the pattern of sensitivity of a given basic attention filter) and

     (b) to sum to a value no greater than 1 (imposing a bound on the sensitivity that the participant can achieve).

  3. In the attention condition with target filter ϕ, the participant strives to choose the weights \(A_{j}\) in Eq. 18 to produce the attention filter \(f_{\phi} = f\) that will minimize response error (the difference between judged and correct centroids) when \(f_{\phi}\) is used in Eqs. 5 and 6.

Note that only if the participant possesses one or more basic filters that correlate strongly and positively with the target filter ϕ will he/she be able to use Eq. 18 to produce an attention filter f of sufficiently high amplitude to estimate the response location robustly despite the noise in Eqs. 5 and 6.
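To make the constrained mixture of Eq. 18 concrete, the following sketch (in Python; the basis filters and the target filter are invented for illustration, not estimated from data) chooses nonnegative weights \(A_j\) summing to at most 1 so that the mixture best approximates a target filter ϕ in the least-squares sense:

```python
import numpy as np
from scipy.optimize import minimize

def mix_filters(basis, phi):
    """Choose weights A_j >= 0 with sum(A_j) <= 1 so that the mixture
    f = sum_j A_j * basis[j] approximates the target filter phi in the
    least-squares sense (Eq. 18 together with its two constraints)."""
    n = basis.shape[0]
    loss = lambda A: np.sum((A @ basis - phi) ** 2)
    cons = [{"type": "ineq", "fun": lambda A: 1.0 - A.sum()}]
    res = minimize(loss, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, 1.0)] * n, constraints=cons)
    return res.x, res.x @ basis

# Invented basis set over 8 item types: one flat filter and one that
# increases with contrast (illustrative only, not estimated from data).
basis = np.array([[1.0] * 8,
                  [1, 2, 3, 4, 5, 6, 7, 8]], dtype=float)
basis /= basis.sum(axis=1, keepdims=True)   # each basic filter sums to 1
phi = np.arange(8, 0, -1, dtype=float)      # an Inverse-graded-like target
phi /= phi.sum()
A, f = mix_filters(basis, phi)
print(A, f)   # neither basic filter correlates positively with phi,
              # so the best achievable mixture remains a poor match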

It is beyond the scope of the current paper to discuss how one might estimate the basic filters \(f_j\) from centroid-experiment data or submit the filter mixture model to an empirical test. The model nonetheless provides several useful inferential principles. Under the filter mixture model, the attention filter \(f_\phi\) that a participant achieves in the attention condition with target filter ϕ is a weighted sum of the basic filters \(f_j\), j = 1, 2, ⋯, N. If Efficiency is high in this attention condition, this suggests that

  1. some of the basic filters \(f_j\), j = 1, 2, ⋯, N, correlate strongly and positively with ϕ, and

  2. these useful filters are given large weights in the sum that yields the attention filter \(f_\phi\).

Under the filter mixture model, the results of the Gabor experiment suggest that the number N of basic filters in human vision with sensitivity to the eight Gabor patterns used in our stimuli is at least two. This follows from the finding that participants are able to achieve clearly distinct attention filters in the Uniform vs. Graded attention conditions, in each case with high Efficiency. The attention filters achieved in the Inverse-graded, Lowest-only and Highest-only attention conditions also differ in form from each other as well as from the attention filters achieved in the Uniform and Graded attention conditions. However, the low Efficiencies observed in the latter three attention conditions imply that each of the available basic filters \(f_j\), j = 1, 2, ⋯, N, correlates either negatively or near 0 with the target filters used in these attention conditions; it would be precarious to infer from these data that human vision possesses basic filters that correlate positively with the target filters in these attention conditions.

Elaborating and fine-tuning the centroid method

Choosing an appropriate stimulus onset asynchrony

As stimulus onset asynchrony (SOA: the time between stimulus onset and the onset of the post-stimulus mask) is increased, response error decreases to some asymptotic level. The decrease of response error with an increase in SOA reflects increased effectiveness of visual input relative to early noise in the processing stream. The asymptote of response error at long stimulus durations reflects random perturbations of the response process that are invariant with respect to the strength of the input signal. For example, all of the following could contribute to asymptotic response error level: (1) trial-to-trial instability in the centroid computation, (2) error in localizing the to-be-clicked-on location, and (3) motor error in registering the response.

Typical applications of the centroid method should use an SOA that is brief enough to preclude eye movements and/or spatial shifts of attention yet long enough to ensure that response error has descended to its asymptotic level. Often a brief pilot experiment is required to select an appropriate SOA; the Dots Experiment (below) includes an example of such a pilot experiment.
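As a concrete illustration of this selection rule, the sketch below (Python; the tolerance and the pilot numbers are invented, not taken from the experiments reported here) returns the shortest SOA whose mean response error is within a small fractional tolerance of the asymptotic level:

```python
import numpy as np

def choose_soa(soas_ms, mean_errors, tol=0.05):
    """Return the shortest SOA whose mean response error is within a
    fractional tolerance of the asymptotic (longest-SOA) error level."""
    order = np.argsort(soas_ms)
    soas, errs = np.asarray(soas_ms)[order], np.asarray(mean_errors)[order]
    asymptote = errs[-1]                  # error at the longest SOA
    ok = errs <= asymptote * (1.0 + tol)  # within tolerance of asymptote
    return soas[np.argmax(ok)]            # first (shortest) SOA that passes

# Invented pilot data (response error in deg of visual angle per SOA):
print(choose_soa([48, 82, 150, 300], [1.40, 1.05, 0.82, 0.80]))  # -> 150
```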

More on stimulus clouds

Typically, it is convenient to measure an attention filter using full-set trials; some applications, however, call for other sorts of trials. This section addresses how best to construct the stimulus clouds used in two useful kinds of non-full-set trials.

Singleton trials

A “singleton” trial is a trial in which only a single item (whose type is typically fixed and selected to be highly salient) is presented. Singleton trials provide a useful lower bound on response error.

In an experiment using singleton trials, how should the locations of singletons be distributed? It is tempting to distribute singletons identically to individual items occurring in full-set clouds. This strategy, however, defeats the main purpose of including singleton trials: to derive a lower bound on response noise. If singletons are distributed identically to individual items occurring in full-set clouds, then the participant will need to produce much more variable responses on singleton trials than he/she does on full-set trials (because the centroid of a full-set cloud has lower variance than the individual items in the cloud). Empirically, we have observed that response error increases with the variability of the target location. To equalize the variability of target response locations on singleton trials and full-set trials, on each singleton trial (a code sketch follows the list):

  1. derive a full-set cloud using the method described in “Generating full-set stimulus clouds”; then

  2. place the singleton at the ϕ-weighted centroid of that cloud (Eq. 1).
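A minimal sketch of this two-step procedure (Python; make_full_set_cloud is a hypothetical stand-in for the generator described in “Generating full-set stimulus clouds,” here simplified to Gaussian scatter about the display center):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_full_set_cloud(types, dispersion=80.0):
    """Hypothetical stand-in for the generator described in 'Generating
    full-set stimulus clouds': Gaussian scatter about the display center."""
    return rng.normal(0.0, dispersion, size=(len(types), 2)), np.asarray(types)

def singleton_location(phi, types):
    """Step 1: generate a full-set cloud; step 2: place the singleton at
    the phi-weighted centroid of that cloud (Eq. 1), so that target
    response locations are distributed identically on singleton trials
    and full-set trials."""
    locs, ts = make_full_set_cloud(types)
    w = np.array([phi[t] for t in ts], dtype=float)
    return (w[:, None] * locs).sum(axis=0) / w.sum()

phi = {k: 1.0 for k in range(8)}          # e.g., a Uniform target filter
print(singleton_location(phi, list(range(8)) * 2))
```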

Target-only clouds

In many attention conditions of interest, the target filter ϕ assigns equal nonzero value to all items in a particular “target” subset T and 0 to the remaining “distractor” types D. We call such target filters binary. In the special case in which the target filter is binary, it is useful to mix three sorts of trials during both condition-specific training and data collection. The first two sorts are the “full-set” and “singleton” trials discussed above. For binary target filters, it is also useful to include “target-only” trials, in which the stimulus cloud contains the same mix of target items as on full-set trials but no distractor items. Target-only trials are equivalent to providing the participant with a perfect attention filter. The point of including target-only trials in a binary attention condition is two-fold:

  1. to compare performance (as reflected by the attention filter, Filter-fidelity, Efficiency and Data-drivenness) achieved in the presence of distractors with performance achieved with a perfect filter, i.e., with no distractors (e.g., Sperling et al. 1992), and

  2. to enable the participant to refine his/her attention filter by experiencing stimulus clouds unpolluted by distractors.

Both aims are achieved by using target-only clouds in which item locations are distributed identically to the locations of target items on full-set trials.
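Under this constraint, constructing a target-only cloud amounts to generating a full-set cloud and deleting its distractors, as in this sketch (Python; assumes numpy arrays of item locations and type labels, with invented values for illustration):

```python
import numpy as np

def target_only_cloud(locs, types, targets):
    """Keep only the target items of a full-set cloud, so that target
    locations on target-only trials are distributed exactly as they
    are on full-set trials."""
    keep = np.isin(types, list(targets))
    return locs[keep], types[keep]

# Illustration: 16 items of 8 contrast types, two per type; Dark-only targets.
locs = np.random.default_rng(4).normal(0.0, 80.0, size=(16, 2))
types = np.repeat([-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0], 2)
t_locs, t_types = target_only_cloud(locs, types, {-1.0, -0.75, -0.5, -0.25})
```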

The dots experiment

The Dots Experiment is an example application that uses binary target filters. This experiment illustrates the utility of including a small number of (1) singleton trials and (2) target-only trials interspersed among full-set trials in experimental blocks.

Methods

Item types

In this experiment, Types included the eight square dots of different gray levels shown in the bottom row of Fig. 1. Dots were 7×7 pixels, subtending 0.144 deg. of visual angle at the viewing distance of 1 m. The Weber contrasts of the eight dots (relative to the uniform gray background) were approximately −1.0, −0.75, −0.5, −0.25, 0.25, 0.5, 0.75, and 1.0.

Displays

As in the Gabor experiment, the stimulus region surrounded by the thin black frame (Fig. 2a) comprised 512×512 pixels. At the viewing distance of 1 m, this region subtended 8 deg. of visual angle. The luminance of the homogeneous background was 77 cd/m². Stimulus clouds were constructed as described in “Generating full-set stimulus clouds.” The expectation of the centroid of each stimulus cloud was the center of the stimulus region. Each full-set stimulus cloud comprised two dots of each of the eight Weber contrasts. The dispersion (Eq. 2) of each full-set stimulus cloud was 80 pixels (1.65 deg. of visual angle); at this dispersion, roughly 5 % of the stimulus clouds produced were discarded because one or more item locations fell outside the stimulus region. All dots were constrained to be separated from each other by at least two pixels. Each target-only trial comprised two dots of each target type and no distractor dots; the dots presented were distributed in the stimulus field exactly as they would have been in a full-set trial. Each singleton trial contained a single black dot (Weber contrast −1.0); the location of this dot was distributed identically to the correct response on a full-set trial.
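The following sketch captures these display constraints by rejection sampling (Python; Gaussian scatter and center-to-center separation are simplifying assumptions of this sketch; the paper's actual generator is described in “Generating full-set stimulus clouds”):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def dots_cloud(n_items=16, dispersion=80.0, size=512, min_sep=2.0):
    """Rejection-sample a full-set cloud for the Dots experiment:
    scatter about the center of the 512x512 stimulus region, discarding
    clouds with any item outside the region (the paper reports discarding
    roughly 5% at dispersion 80 px) or any pair closer than min_sep."""
    while True:
        locs = rng.normal(size / 2.0, dispersion, size=(n_items, 2))
        if not np.all((locs >= 0) & (locs < size)):
            continue                      # an item fell outside the region
        if any(np.linalg.norm(a - b) < min_sep
               for a, b in combinations(locs, 2)):
            continue                      # two dots too close together
        return locs

cloud = dots_cloud()                      # two dots of each of 8 contrasts
```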

Individual trials

The sequence of events that occurred on an experimental trial precisely paralleled the sequence that occurred in a trial of the Gabor patterns experiment, except that the display items were the dots shown in Fig. 1 rather than the Gabor patterns. Display durations were identical to those used in the Gabor experiment (see “The Gabor pattern experiment”). The sequence of displays in a full-set trial is illustrated in Fig. 2.

Attention conditions

Participants were tested in two complementary full-set attention conditions using binary target filters ϕ. Stimulus clouds were composed of two dots of each of the eight contrasts. In the first, “Dark-only,” attention condition, the four dot types darker than the background were the target items, and the four dot types lighter than the background were the distractor items; that is, the target filter assigned equal weight to the eight dots in each cloud darker than the background and weight 0 to the eight dots brighter than the background. In the “Light-only” attention condition, the roles of dark and light dots were reversed: light dots became the targets and dark dots the distractors.

Participants

The participants were the same four who participated in the Gabor experiment.

Pilot experiment: selection of the target-to-mask SOA

SOA (stimulus onset asynchrony) refers to the interval from the onset of the stimulus to the onset of the post-stimulus mask. Collection of data in the main experiment was preceded by a brief pilot experiment to choose the SOA to be used in the main experiment. Before the pilot experiment itself, each participant ran separate practice blocks of 50 trials at each of SOA = 48 ms (36-ms stimulus exposure followed by a 12-ms blank frame prior to the mask), 82 ms (48-ms stimulus, 34-ms blank), 150 ms (100-ms stimulus, 50-ms blank) and 300 ms (100-ms stimulus, 200-ms blank), using homogeneous dot clouds comprising eight black dots. After this practice, the pilot experiment was conducted using exclusively the Dark-only attention condition. Each participant performed a separate block of 140 trials for each of the four SOAs (560 trials total); each block comprised 100 full-set trials, 20 target-only trials and 20 singleton trials.

Results of the pilot experiment are shown in Fig. 5. For SOA = 48, 82, 150 and 300 ms, the four graphs plot, for participants S1, S2, S3 and S4, response error (the mean Euclidean distance of the participant’s responses from the correct responses across the 100 full-set trials at the given SOA). For all participants, response error had declined to its asymptote by 150 ms, so this was the SOA used in the main experiment.

Fig. 5

Stimulus availability requirements for attention-filtered centroid judgments: Response error as a function of SOA in the Dots experiment. For each SOA (48, 82, 150, and 300 ms), each of the four participants performed a separate block of 140 trials comprising 100 full-set trials, 20 target-only trials, and 20 singleton trials in an attention condition in which the target filter assigned equal weight to all dots darker than the background (targets) and weight 0 to all dots brighter than the background (distractors). Response error is the mean Euclidean distance of the participant’s responses from the correct responses across the 100 full-set trials at the given SOA

Main experiment: procedure

As in the Gabor experiment, general training was used for S3 and S4 but omitted for S1 and S2. Additionally, standard ϕ-specific training was used for all participants in each attention condition. As in the pilot study, experimental blocks comprised 100 full-set trials, 20 target-only trials, and 20 singleton trials in a mixed list, i.e., trials within a block were randomly sequenced.

Each participant performed three blocks of trials in each attention condition. The first block was discarded as practice; because performance showed no significant improvement from block 2 to block 3 for any participant, the second and third blocks were retained as data.

Results and discussion

The results for participants S1, S2, S3, and S4 are plotted in columns 1, 2, 3, and 4 of Fig. 6. All participants achieve strikingly different attention filters in the Dark-only versus the Light-only attention conditions. Specifically, the F-test described in the section entitled “Testing whether two attention filters are significantly different” yields, for S1, \(F_{7,780} = 140.96\), p ≈ 0; for S2, \(F_{7,780} = 271.01\), p ≈ 0; for S3, \(F_{7,780} = 111.78\), p ≈ 0; and for S4, \(F_{7,780} = 108.85\), p ≈ 0. The attention filters achieved in the Dark-only attention condition give dramatically higher weight to dots with negative Weber contrasts than to dots with positive Weber contrasts (on average, 5.61× greater), and the reverse is true in the Light-only attention condition (5.78× greater). Note, however, that the Filter-fidelity values (inset in the panels) tend to be smaller than those in the Uniform and Graded attention conditions of the Gabor experiment. This reflects the fact that the attention filters achieved by all participants deviate strongly and systematically from the target filters. In particular,

  1. participants are unable to completely ignore distractors (distractor weights tend to deviate positively and, in most cases, significantly from 0 in the attention filters achieved), and

  2. participants are unable to give equal weight to all target items (target items with higher absolute Weber contrasts tend to receive higher weight).

By comparison, notice that in most cases the attention filter achieved on target-only trials tends to be more uniform across the four target items than does \(f_{\phi,\text{full-set}}\) (the exceptions are S3 and S4 in the Light-only attention condition). This suggests that one of the costs incurred in ignoring dots of the distractor polarity may be loss of sensitivity to low-salience dots of the target polarity (Footnote 5).

Fig. 6

Results of the Dots experiment. Top (bottom) panels give results from the Dark-only (Light-only) attention conditions for participants S1, S2, S3, and S4, from left to right. In each panel, the thin dashed line gives the target filter ϕ, the solid line marked with filled circles gives the attention filter \(f_{\phi,\text{full-set}}\) achieved by the participant in the full-set trials, and the dashed line marked with open circles gives the attention filter \(f_{\phi,\text{target-only}}\) estimated from the target-only trials; because only target items are presented on target-only trials, \(f_{\phi,\text{target-only}}\) is defined only across the targets in a given attention condition. To facilitate comparison, \(f_{\phi,\text{target-only}}\) has been rescaled so that the sum of its values equals the sum of \(f_{\phi,\text{full-set}}\) across just the target items. Error bars give 95 % confidence intervals computed as described in Appendix 2. Note that \(f_{\phi,\text{full-set}}\) is drawn with the solid line running through the point (0,0); this convention reflects the fact that had invisible dots (of Weber contrast 0) been included in the stimulus cloud, their influence on the centroid would have been 0. The values of singleton-corrected Efficiency (scEff), Filter-fidelity (FF) and Data-drivenness (DD) achieved by each participant in each attention condition on full-set and target-only trials are given in each panel. All participants achieve strikingly different attention filters in the two attention conditions; however, the attention filters achieved differ in important ways from the target filters

Each panel contains the singleton-corrected Efficiencies, \(\text{scEff}_{\text{full-set}}\) and \(\text{scEff}_{\text{target-only}}\), which give the proportion of dots the participant would need to include in his/her centroid computation to account for the random error in his/her responses on full-set and target-only trials, respectively, if the only source of error were missed dots. In all cases, singleton-corrected Efficiency values are high, indicating that the participant can indeed deploy the attention filter he/she has achieved broadly across space with high sensitivity. In each panel, \(\text{scEff}_{\text{target-only}}\) is slightly higher than \(\text{scEff}_{\text{full-set}}\), indicating that one cost of filtering out distractors is to inject noise into the response-production process.

Averaging across participants and the two attention conditions, response error on singleton trials was 46 % (50 %) as large as \(\widehat {\sigma }\) estimated from full-set (target-only) trials. Because independent error sources add in variance, a ratio of roughly 0.5 in response error corresponds to a ratio of roughly 0.25 in variance. This suggests that approximately \(\frac {1}{4}\) of the variance in random response error on full-set and target-only trials results from error in localizing and moving the mouse to click on the selected response location; by the same token, approximately \(\frac {3}{4}\) of that variance results from processing the stimulus and computing the centroid.

Additional performance measures

Attention filter selectivity

Let \(f_\phi\) be the attention filter achieved by a participant in an attention condition with a binary target filter ϕ. In this case, it is useful to define “filter selectivity” as the average of \(f_\phi(t)\) across all target items t ∈ Types divided by the average of \(|f_\phi(d)|\) across all distractor items d ∈ Types. For example, filter selectivities of 10 or higher are commonly observed in attending to black vs. white items (Inverso et al. in press), red vs. green items, or large vs. small items (Blair et al. 2015); these represent highly selective attention filters. By contrast, in the Dots experiment, the filter selectivities achieved by participants S1, S2, S3 and S4 in the Dark-only (Light-only) attention condition were all lower than 10: 7.29, 7.97, 3.91 and 4.89 (9.55, 6.88, 5.28 and 3.78).
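Filter selectivity is straightforward to compute from an estimated attention filter, as in this sketch (Python; the filter weights below are invented for illustration):

```python
import numpy as np

def filter_selectivity(f, targets, distractors):
    """Mean attention-filter weight over target types divided by the
    mean absolute weight over distractor types."""
    t = np.mean([f[k] for k in targets])
    d = np.mean([abs(f[k]) for k in distractors])
    return t / d

# Invented Dark-only-like filter over the 8 Weber contrasts:
f = {-1.0: .30, -.75: .26, -.5: .20, -.25: .12,
     .25: .04, .5: .03, .75: .03, 1.0: .02}
print(filter_selectivity(f, targets=[-1.0, -.75, -.5, -.25],
                         distractors=[.25, .5, .75, 1.0]))  # -> about 7.3
```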

The productivity function

Another potentially useful descriptor (whose usefulness is not limited to attention conditions with binary target filters) is the “productivity function,” \(P_\phi(k) = \text{scEff}_\phi \times f_\phi(k)\), k ∈ Types, where \(f_\phi\) is the attention filter achieved by a given participant in the condition with target filter ϕ and \(\text{scEff}_\phi\) is the singleton-corrected Efficiency achieved in that condition. For any item type k, \(P_\phi(k)\) provides an estimate of the overall effectiveness with which items of type k influence responses in the centroid task in the attention condition with target filter ϕ. Insofar as \(f_\phi\) characterizes the perceptual limits on information that is available to brain processes subsequent to the early centroid computation, and insofar as Efficiency \(\text{scEff}_\phi\) characterizes cognitive limits, the productivity function \(P_\phi\) is an estimate of that portion of the stimulus information that is available to subsequent brain processes.
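A corresponding sketch for the productivity function (Python; the filter weights and Efficiency value are invented):

```python
def productivity(f, scEff):
    """P_phi(k) = scEff_phi * f_phi(k): the overall effectiveness with
    which items of type k influence responses in the centroid task."""
    return {k: scEff * w for k, w in f.items()}

# Invented filter weights and an invented singleton-corrected Efficiency:
f = {-1.0: 0.30, -0.5: 0.20, 0.5: 0.04, 1.0: 0.02}
print(productivity(f, scEff=0.85))
```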

General discussion

An attention filter is a process, initiated by a participant in the context of a task requiring feature-based attention, which operates broadly across space to modulate the relative effectiveness with which different features in the retinal input influence task performance. As we have shown, the specific task of extracting the centroid of a cloud of items can form the core of a method for deriving precise, quantitative measurements of attention filters.

The feature-similarity gain model and the centroid task

A prominent theory of FBA is the “feature-similarity gain model” (Treue and Martinez-Trujillo 1999; Martinez-Trujillo and Treue 2004). Under this theory, “...the up- or down-regulation of the gain of a sensory neuron reflects the similarity of the features of the currently behaviorally relevant target and the sensory selectivity of the neuron along all target dimensions” (Treue and Martinez-Trujillo 1999). In other words, this theory proposes that FBA operates to amplify the responses of neurons sensitive to the attended feature and to attenuate the responses of neurons insensitive to the attended feature.

The feature-similarity gain model is intended first and foremost to apply to deployments of FBA in which the participant strives to heighten the salience of a single feature value (e.g., a specific direction of motion or a specific color). It might be argued that the Highest-only and Lowest-only attention conditions in the Gabor experiment aim at FBA deployments of this sort; however, participants perform poorly at each of these tasks, suggesting that human vision is devoid of neurons selective for the target feature in either task. In the Uniform, Graded, and Inverse-graded attention conditions of the Gabor experiment, as well as the Dark-only and Light-only attention conditions in the Dots experiment, participants strive to deploy FBA in ways that heighten the salience of a range of feature values rather than a single, specific feature value.

The feature-similarity gain model might be generalized to handle FBA deployments of this sort by assuming that the gain of a given class of neurons is set in a given attention condition according to the degree to which the differential sensitivity of that class to the items in the display “matches the target filter” (i.e., the function ϕ in Eq. 1 used to give feedback in a particular attention condition).

There are various possible interpretations that might be given to the phrase “matches the target filter”; however, under all such definitions, the proposed generalization of the feature-similarity gain model is likely to produce suboptimal performance in the centroid task. In particular, consider the case in which human vision comprises (1) a particular class \(C_{ideal}\) of neurons whose sensitivity to the different items in Types precisely matches the target filter in a given attention condition as well as (2) other classes \(C_1, C_2, \cdots\) of neurons whose sensitivity to the different items in Types matches the target filter less well. The generalized feature-similarity gain model would assign gain to the neurons in a given class \(C_k\) in proportion to the degree to which the activation produced in neurons in class \(C_k\) by the items in Types “matches the target filter.” Typically, then, various classes \(C_k\) of neurons would be likely to receive non-zero gain under the generalized feature-similarity gain model. However, under nearly all definitions of the phrase “matches the target filter,” performance in the centroid task will be optimized by assigning full gain to neuron class \(C_{ideal}\) and zero gain to all other classes of neurons.
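The following toy calculation (Python; the neuron classes, their sensitivities, the choice of correlation as the “match” measure, and the gain normalization are all invented for illustration) makes the argument concrete: gains assigned in proportion to positive match mix poorly matching classes into the effective filter, whereas performance in the centroid task is optimized by an all-or-none assignment to \(C_{ideal}\):

```python
import numpy as np

phi = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)  # a binary target filter

# Invented neuron classes: each row gives a class's sensitivity to the 8 types.
C = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],    # C_ideal: matches phi exactly
    [1, 1, 1, 1, 1, 1, 1, 1],    # broadly tuned
    [1, 1, 0, 0, 0, 0, 0, 0],    # partially matching
], dtype=float)

def corr(a, b):
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else a @ b / denom

# Generalized feature-similarity gain: gain proportional to positive match
# (normalizing the gains to sum to 1 is an arbitrary choice here).
match = np.array([max(0.0, corr(c, phi)) for c in C])
gains = match / match.sum()
f_fsg = gains @ C                # effective filter under the generalized model

# Optimal assignment for the centroid task: full gain to C_ideal only.
f_opt = C[0]

print(np.round(f_fsg, 2))        # mixes in the poorly matching third class
print(f_opt)                     # matches the target filter exactly
```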

It is an empirical question whether the generalized feature-similarity gain model holds. We submit, however, that it would be surprising if human vision were committed to a general strategy so likely to produce suboptimal behavior given the available neural resources.

Beyond the centroid task

It is important to realize that the attention filters a participant can achieve for one task may differ from those he/she can achieve for another. A promising direction for future work is to compare the attention filters achieved for various sets of Types across different tasks. The centroid task is especially appealing because of the remarkable statistical power it confers in estimating attention filters. Nonetheless, experiments to measure attention filters using other tasks are straightforward. For example, it is easy to imagine paradigms generalizing the sorts of experiments that have been used to investigate the extraction of summary statistics from ensembles of items (Alvarez and Oliva 2008; Alvarez 2011; Ariely 2001; Chong and Treisman 2003, 2005a, 2005b).

One family of such experiments might use dots of the sort used in the Dots experiment. On each trial, the stimulus is a cloud of dots (with the counts of different dot types varying across trials). In a given attention condition, the participant is asked to apply a target attention filter ϕ to the cloud of dots in the display and to sum the filter output across space to extract a summary statistic; if this statistic is greater than some fixed criterion, the correct response is “A,” otherwise “B.” For example, in an attention condition analogous to the Dark-only condition in the Dots experiment, ϕ assigns the value 1 (0) to all dots of negative (positive) Weber contrast, and the criterion might be set at 9.5; in this case, the correct response would be “A” if the number of dots darker than the background was 10 or more and “B” otherwise. This rule would be used to give trial-by-trial feedback. The data would consist of (1) the matrix M whose jth row contains the counts of the different dot types in the stimulus on trial j, and (2) the vector R whose jth entry is 1 (0) if the participant responded “A” (“B”) on trial j. A simple probit model (i.e., a generalized linear model with a probit link) that uses the counts of the different dot types as predictors of the participant’s trial-by-trial classifications of stimulus clouds suffices to estimate the attention filter achieved by the participant in a given attention condition.
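A sketch of the proposed analysis (Python, using statsmodels; the data here are synthetic, generated from a known filter, and the Gaussian decision-noise model is an assumption of this sketch):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Synthetic data for the classification task: M[j] holds the counts of
# the 8 dot types on trial j; R[j] = 1 if the participant responded "A".
n_trials, n_types = 2000, 8
M = rng.integers(0, 5, size=(n_trials, n_types)).astype(float)
true_f = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)  # Dark-only-like filter
R = ((M @ true_f - 9.5) / 1.5 + rng.normal(size=n_trials) > 0).astype(float)

# Probit model: a generalized linear model with a probit link; the fitted
# coefficients on the type counts estimate the achieved attention filter
# (up to an overall scale set by the internal decision noise).
X = sm.add_constant(M)
fit = sm.GLM(R, X, family=sm.families.Binomial(
    link=sm.families.links.Probit())).fit()
print(np.round(fit.params[1:], 2))   # estimated filter weights, one per type
```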

Will this classification task yield attention filters identical to the centroid task? The answer to this question is likely to have important consequences for our understanding of the functional architecture of human vision.

Final remarks

The centroid method described here comes with guidelines, developed (often through painful experience), for measuring attention filters efficiently. The two example experiments illustrate various aspects of the method. The method is distinguished by its power and simplicity:

  1. Statistical power: The attention filter plotted in each of the 12 panels of Fig. 3 is derived from only 200 trials (∼10–13 min). The same is true for each of the full-set attention filters in Fig. 6. Achieving comparable results using a standard psychophysical choice paradigm (e.g., Nam and Chubb 2000; Chubb and Nam 2000; Chubb and Talevich 2002) would require 3000 or more trials. The target-only attention filters plotted in each of the four panels of Fig. 6 are based on only 40 trials; although their confidence intervals are larger, these curves are nonetheless very informative.

  2. Modeling simplicity: Estimating an attention filter (as well as the other model parameters, \(x_{default}\), \(y_{default}\), V and σ) from centroid task data is easy: a simple linear model is used to predict the x- and y-coordinates of the participant’s response from the locations of the different types of dots across all of the trials in the data set (see the sketch below).
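A sketch of this estimation step (Python; the data are synthetic, generated from a known filter, with the default-location parameters omitted for brevity by centering the display coordinates at 0):

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials, n_types, per_type = 400, 8, 2

# Synthetic centroid-task data: item locations by type, and responses
# generated from a known per-item filter plus response noise (sigma).
locs = rng.normal(0.0, 80.0, size=(n_trials, n_types, per_type, 2))
true_f = np.linspace(0.30, 0.02, n_types)
true_f /= true_f.sum() * per_type              # per-item weights sum to 1
resp = (locs * true_f[None, :, None, None]).sum(axis=(1, 2))
resp += rng.normal(0.0, 5.0, size=resp.shape)  # sigma: response noise

# Regressors: summed x- and y-coordinates of each item type on each trial;
# x and y rows are stacked so one set of weights serves both coordinates.
X = locs.sum(axis=2)                           # (trials, types, 2)
Xs = X.transpose(0, 2, 1).reshape(-1, n_types) # (2*trials, types)
ys = resp.reshape(-1)
w, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(np.round(w / w.sum(), 3))                # normalized attention filter
```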