Abstract
This paper elaborates a recent conceptualization of feature-based attention in terms of attention filters (Drew et al., Journal of Vision, 10(10:20), 1–16, 2010) into a general purpose centroid-estimation paradigm for studying feature-based attention. An attention filter is a brain process, initiated by a participant in the context of a task requiring feature-based attention, which operates broadly across space to modulate the relative effectiveness with which different features in the retinal input influence performance. This paper describes an empirical method for quantitatively measuring attention filters. The method uses a “statistical summary representation” (SSR) task in which the participant strives to mouse-click the centroid of a briefly flashed cloud composed of items of different types (e.g., dots of different luminances or sizes), weighting some types of items more strongly than others. In different attention conditions, the target weights for different item types in the centroid task are varied. The actual weights exerted on the participant’s responses by different item types in any given attention condition are derived by simple linear regression. Because, on each trial, the centroid paradigm obtains information about the relative effectiveness of all the features in the display, both target and distractor features, and because the participant’s response is a continuous variable in each of two dimensions (versus a simple binary choice as in most previous paradigms), it is remarkably powerful. The number of trials required to estimate an attention filter is an order of magnitude fewer than the number required to investigate much simpler concepts in typical psychophysical attention paradigms.
Introduction
When looking for the vine that contains the most ripe red berries among vines that contain less ripe ones, we are selecting a color feature distributed broadly across space and aggregating that information to come to a useful conclusion. This ability to carry out tasks that prioritize information carried by a particular visual feature has enormous evolutionary value, and a great deal of empirical evidence confirms our intuition that we indeed can do this (Ball and Sekuler 1981; Baldassi and Verghese 2005; Davis and Graham 1981; Haenny et al. 1988; Ho et al. 2012; Lankheet and Verstraten 1995; Ling et al. 2009; Liu et al. 2007; Martinez-Trujillo and Treue 2004; Maunsell et al. 1991; Muller et al. 2006; Saenz et al. 2003; Serences and Boynton 2007; Shih and Sperling 1996; Treue and Martinez-Trujillo 1999). Many studies support the specific claim that such “feature-based attention” (FBA) does indeed modulate sensitivity broadly across space (Felisberti and Zanker 2005; Liu et al. 2007; Rossi and Paradiso 1995; Saenz et al. 2003; Hayden and Gallant 2005; McAdams and Maunsell 2000; Motter 1994; Saenz et al. 2002; Seidemann and Newsome 1999; Treue 2001), exerting influence even at locations that are irrelevant to the participant’s task (Arman et al. 2006; Liu and Mance 2011; Serences and Boynton 2007; White and Carrasco 2011; Zhang and Luck 2009). In addition, imaging studies have shown that attention deployed broadly across space for a particular feature modulates the gain of cortical regions selective for the attended feature (Kamitani and Tong 2005; Liu et al. 2003; Liu and Mance 2011; O’Craven et al. 1997; Muller et al. 2006; Schoenfeld et al. 2007; Serences and Boynton 2007; Serences et al. 2009). Ling et al. (2009) measured Threshold-vs.-Noise curves (Lu and Dosher 1998) in a global motion task to provide evidence that FBA increases sensitivity to a target motion direction both by increasing the gain and also by sharpening the tuning of the MT neuron population response for motion direction. These studies confirm the existence of FBA as a human capability and sketch in bold strokes how FBA operates to control behavior. In addition, they provide important insights into where in the brain FBA is implemented and how the brain might operate to accomplish FBA.
The current paper introduces methods for addressing the next conceptual plane of questions concerning the behavioral goals achieved by FBA. Any given FBA-deployment aims to heighten sensitivity to a specific body of target information (e.g., information carried by the red elements of a scene), and much of the research cited above confirms that a given FBA-deployment can indeed alter sensitivity broadly across space to information carried by different features in the visual input. These observations, however, leave open many important questions, central among which are the following:
1. How effective is the FBA-deployment in sensitizing the participant to the target information?
2. Is the FBA-deployment sensitive to information other than the target information? If so,
   (a) which non-target features influence the FBA-deployment? and
   (b) how exactly do they influence it?
These questions led us to conceptualize FBA in terms of attention filters (Drew et al. 2010). An attention filter is a process, initiated by a participant in the context of a task requiring FBA, which operates broadly across space to modulate the relative effectiveness with which different features in the retinal input influence performance.
The main purpose of the current paper is to describe the development of the centroid method for measuring the attention filter achieved by a particular deployment of FBA. The paradigm enables one to describe precisely (1) the relative amounts by which the attention filter passes each to-be-attended feature and rejects each to-be-ignored feature, and (2) the attention filter’s overall sensitivity to the information in the stimulus relative to the noise compromising performance. For a given set of stimulus items, by varying the instructions to the participant to induce different FBA-deployments in different attention conditions, one can iterate the centroid method to discover the structure of the entire space of attention filters achievable by human vision for that set of stimulus items.
Summary statistics and the centroid paradigm
Much recent work has focused on the ability of human participants to extract “summary statistics” from brief displays of ensembles of items. For example, substantial research now supports the claim that human participants are adept at extracting the mean size of an ensemble of disks (Ariely 2001; Chong & Treisman 2003, 2005a, 2005b). Other work has focused on the effectiveness with which human participants can estimate the mean orientation of an ensemble of items (Dakin 2001; Solomon 2010).
As emphasized by Alvarez and Oliva (2008), the centroid is another summary statistic that human participants are adept at extracting from an ensemble of items. This paper shows how to analyze the attention filters that a human participant can interpose between a briefly-presented ensemble of items and the computation that he/she uses to extract the centroid.
The centroid paradigm, first used by Drew et al. (2010) and substantially refined here, offers a number of important advantages over previous methods used to measure attention filters (Chubb and Nam 2000; Nam and Chubb 2000; Chubb and Talevich 2002), including the following:
1. It is much more efficient than these previous psychophysical choice paradigms, requiring many fewer trials to estimate the attention filter deployed by a participant in a given attention condition.
2. Fitting the data is surprisingly simple.
3. Refinements in training methods and in stimulus constraints simplify the summary computations and make the centroid paradigm more precise and more resistant to artifacts.
Matlab code for analyzing the data from centroid experiments is provided in Appendix 4.
The centroid paradigm: overview
Imagine a large, flat, weightless piece of hard plastic upon which are placed a number of different stacks of pennies of different heights at different locations. The centroid (or center of gravity) of such a spatial array of penny-piles is the average location of all the pennies in the array. If a fulcrum were placed directly under the centroid of the penny-pile array, the plastic sheet and the penny-piles on top of it would balance perfectly.
The paradigm described in this paper enables one to measure the attention filters that participants can achieve in estimating the centroids of clouds of items drawn from a given set \(\mathit{Types}\) of item types. The first example application, which is described in the following two sections to illustrate the formal description of the method, uses the set of vertically oriented Gabors shown in the upper row of Fig. 1; the second example application (the Dots experiment) uses eight dots with Weber contrasts \(-1,-\frac {3}{4},-\frac {1}{2},-\frac {1}{4},\frac {1}{4},\frac {1}{2}, \frac {3}{4},1\) shown in the bottom row of Fig. 1. In other potential applications, the set \(\mathit{Types}\) might contain (1) dots of different colors, (2) Gabors of different orientations, (3) Gabors of different spatial frequencies, (4) Gabors varying in both spatial frequency and orientation, (5) line segments of different lengths and orientations, (6) small objects of different shapes, etc. There is no requirement that the items in \(\mathit{Types}\) be ordered (or even related) in any way; however, many applications of interest use items equally spaced along a single continuum (as in the example experiments).
To appreciate the basic idea behind the method, imagine a simulated experiment in which, on each trial,
1. a stimulus is presented consisting of a spatially random array comprising several items of each of the item types in \(\mathit{Types}\), and
2. a response is produced as follows:
   (a) an unknown filter f, constant from trial to trial, is applied to the stimulus to create a map in which each item i of a given type \(\tau_i \in \mathit{Types}\) in the stimulus field is replaced by a pile of pennies of size \(f(\tau_i)\), and then
   (b) the centroid of the filtered stimulus is extracted.
Although this may not be obvious, the unknown filter f can easily be derived from the data from such an experiment; the section entitled “Analyzing the data: Estimating \(f_{\phi}\)” explains how.
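One way to see why f is recoverable is to run the thought experiment in code. The sketch below is illustrative only and is not the estimation procedure described later in the paper or in Appendix 1: it assumes the simplest simulated observer (no bias toward a default location, Gaussian response noise, x-coordinates only), and the filter values in `TRUE_F` are made up. Because each simulated response is a linear combination of the summed x-coordinates of the items of each type, ordinary least squares (`numpy.linalg.lstsq`) recovers the filter up to a scale factor, which is removed by normalizing the weights to sum to 1.

```python
import numpy as np

N_TYPES = 8
N_PER_TYPE = 2  # items of each type on every trial (a "full set")
# Hypothetical filter used only to generate simulated responses (sums to 1).
TRUE_F = np.array([0.02, 0.04, 0.06, 0.10, 0.13, 0.17, 0.22, 0.26])


def simulate(f, n_trials, spread=80.0, noise_sd=2.0, rng=None):
    """Simulated observer: f-weighted centroid of each cloud plus Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    types = np.repeat(np.arange(N_TYPES), N_PER_TYPE)           # same types every trial
    xs = rng.normal(0.0, spread, size=(n_trials, types.size))  # item x-coordinates
    w = f[types]                                                # "pennies" per item
    resp = xs @ w / w.sum() + rng.normal(0.0, noise_sd, n_trials)
    return types, xs, resp


def estimate_filter(types, xs, resp):
    """Regress responses on per-type summed x-coordinates; normalize to sum to 1."""
    X = np.stack([xs[:, types == k].sum(axis=1) for k in range(N_TYPES)], axis=1)
    b, *_ = np.linalg.lstsq(X, resp, rcond=None)
    return b / b.sum()


types, xs, resp = simulate(TRUE_F, n_trials=400, rng=np.random.default_rng(0))
print(np.round(estimate_filter(types, xs, resp), 3))
```

The y-coordinates carry the same information and could be stacked with the x-coordinates to halve the number of trials needed for a given precision.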
In the experiments described below, the participant is asked in different attention conditions to try to weight the different item types in accordance with various “target filters,” ϕ. Each of these attention conditions requires an experiment analogous to the simulated experiment described above to measure the attention filter \(f_{\phi}\) that the participant actually manages to achieve.
The Gabor pattern experiment
To make the presentation of the paradigm for estimating attention filters more concrete, we illustrate it with an example experiment based on the Gabor patterns shown in the top row of Fig. 1. These eight Gabor patterns were the \(\mathit{Types}\) in this example experiment. They were identical in form but differed in contrast. Each was 25×25 pixels. The space constant of the Gaussian window was 5 pixels in each of the horizontal and vertical dimensions, and the windowed sinusoid was vertical, had phase \(-\frac {\pi }{2}\) relative to the center of the envelope, and had a period of 13 pixels. A single Gabor pattern in the stimulus subtended 0.51 deg. The principal spatial frequency of the Gabors was 3.51 cpd. Gabor contrasts were \(\frac {k}{8}\) for k = 1,2,⋯ ,8. The 512×512 pixel region in which the stimuli were displayed subtended 10.51 deg. of visual angle at the viewing distance of 1 m. The luminance of the homogeneous background was 52.1 cd/m².
Participants were the first two authors plus two naive participants who had never before participated in any psychophysical experiments. The methods used in all experiments were approved by the UC Irvine Institutional Review Board, and the participants provided signed consent forms.
Typically, a participant will be tested in a number of different attention conditions. Each attention condition is defined by a target filter ϕ that assigns nonnegative weights to different item types; the target filter ϕ is used to give feedback in that condition. In this experiment, five attention conditions were investigated: in the Uniform attention condition, the target filter gave equal weight to all eight of the Gabor patterns; in the Graded attention condition, the target filter weighted each Gabor pattern in proportion to its contrast; in the Inverse-graded attention condition, the target filter weighted each Gabor pattern in inverse proportion to its contrast; in the Lowest-only (Highest-only) attention condition, the filter gave all its weight to the minimum (maximum) contrast Gabor pattern. As will be seen later, some attention conditions are more difficult than others.
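The five target filters can be written down directly as weight vectors over the eight Gabor contrasts k/8. The sketch below assumes that the Inverse-graded filter uses weights proportional to 1/contrast, and it normalizes every filter so that its weights sum to 1.0, the scaling convention the paper adopts for ϕ; the function and dictionary names are illustrative.

```python
import numpy as np

# Gabor contrasts k/8, k = 1, ..., 8 (item types 0..7 in 0-based indexing).
CONTRASTS = np.arange(1, 9) / 8.0


def normalize(weights):
    """Scale nonnegative weights so that they sum to 1.0."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum()


def target_filters(contrasts=CONTRASTS):
    """The five target filters phi of the Gabor experiment, as weight vectors."""
    n = len(contrasts)
    lowest = np.zeros(n)
    lowest[np.argmin(contrasts)] = 1.0
    highest = np.zeros(n)
    highest[np.argmax(contrasts)] = 1.0
    return {
        "Uniform": normalize(np.ones(n)),
        "Graded": normalize(contrasts),                # proportional to contrast
        "Inverse-graded": normalize(1.0 / contrasts),  # inversely proportional
        "Lowest-only": lowest,
        "Highest-only": highest,
    }


for name, phi in target_filters().items():
    print(f"{name:15s}", np.round(phi, 3))
```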
In an attention condition with a target filter ϕ, the participant strives to weight different item types in accordance with ϕ; usually, however, he/she is unable to do so perfectly. He/she gives too much weight to some item types and too little to others. The function that gives the weights exerted on the participant’s responses by different item types in \(\mathit{Types}\) is called the participant’s attention filter \(f_{\phi}\); the subscript ϕ keeps track of the target filter that yielded this particular attention filter.
An experimental trial
Defining the attention-weighted centroid
A particular stimulus (e.g., Fig. 2b and g) consists of \(N_{stim}\) items. Each item \(i = 1,2,\cdots,N_{stim}\) is of a particular type \(\tau_i \in \mathit{Types}\) and occurs at a location \((x_i, y_i)\). Note that there are \(N_{stim}\) items i in a stimulus, and \(N_{types}\) different item types in \(\mathit{Types}\). Typically, but not necessarily, \(N_{stim} \geq N_{types}\), and different items in a display may well be of the same type.
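The target attention filter ϕ assigns a weight \(\phi(\tau_i)\) to item i. Thus the spatial coordinates of the target centroid \((x_{correct}, y_{correct})\) are
$$ x_{correct} = \frac{\sum_{i=1}^{N_{stim}} \phi(\tau_i)\, x_i}{\sum_{i=1}^{N_{stim}} \phi(\tau_i)} \quad \text{and} \quad y_{correct} = \frac{\sum_{i=1}^{N_{stim}} \phi(\tau_i)\, y_i}{\sum_{i=1}^{N_{stim}} \phi(\tau_i)} $$ (1)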
By convention, ϕ is scaled so that the sum over item Types is 1.0; i.e., \( {\sum }_{k=1}^{N_{types}} \phi (k ) = 1.0 \).
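The ϕ-weighted centroid is a weighted average of item locations (the balance point of the penny-piles), so it takes only a few lines of NumPy. In this sketch the function name and the 0-based coding of item types are our assumptions. Scaling ϕ by any positive constant leaves the result unchanged, which is why the sum-to-1 scaling is a convention rather than a constraint.

```python
import numpy as np


def target_centroid(types, x, y, phi):
    """phi-weighted centroid (x_correct, y_correct) of one stimulus cloud.

    types : item types tau_i, coded 0 .. N_types - 1 (0-based here)
    x, y  : item coordinates, one entry per item
    phi   : target filter, one nonnegative weight per item type
    """
    w = np.asarray(phi, dtype=float)[np.asarray(types)]  # phi(tau_i) for each item
    total = w.sum()
    if total <= 0:
        raise ValueError("the items in this cloud all have zero target weight")
    x_c = w @ np.asarray(x, dtype=float) / total
    y_c = w @ np.asarray(y, dtype=float) / total
    return float(x_c), float(y_c)


# Two items: type 0 at (0, 0) and type 1 at (10, 0), phi = (0.25, 0.75).
print(target_centroid([0, 1], [0.0, 10.0], [0.0, 0.0], [0.25, 0.75]))  # (7.5, 0.0)
```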
Stimuli
In the Uniform, Graded and Inverse-graded attention conditions, every stimulus cloud included two instances of each of the eight Gabor patterns in the top row of Fig. 1. In the Lowest-only and Highest-only attention conditions, every stimulus cloud included three instances of each of these Gabor patterns. Therefore, \(N_{stim} = 16\) for the Uniform, Graded and Inverse-graded attention conditions, and \(N_{stim} = 24\) for the Lowest-only and the Highest-only attention conditions.
For any given target filter ϕ, the participant’s task was to try to mouse-click the ϕ-weighted centroid \((x_{correct}, y_{correct})\) of the stimulus cloud.
Sequence of trial events
The events that occurred in a trial in the Gabor experiment are shown in panels (a) through (f) of Fig. 2:
1. The participant initiated a trial by pressing the space bar. A blank screen of mean luminance was then presented for 1000 ms. In this display, a thin black line framed the region in which the stimulus cloud would be displayed (Fig. 2a).
2. The stimulus (Fig. 2b) was presented for 100 ms, after which it was replaced by a blank stimulus field identical to Fig. 2a for 50 ms. The locations of the Gabor patterns were drawn from a bivariate Gaussian density constrained (as described below in “Generating full-set stimulus clouds”) to keep cloud size constant across trials.
3. 150 ms after stimulus onset, a post-stimulus mask (Sperling 1963) of the sort shown in Fig. 2c was presented; this mask stayed on for 100 ms.
4. The mask was then replaced by a blank stimulus field (Fig. 2d) with a cross-shaped mouse cursor in the middle.
5. The participant used the mouse to move the cursor (as indicated in Fig. 2e) and clicked on what he/she judged to be the correct location.
6. The participant was then presented with feedback consisting of
   (a) the stimulus,
   (b) the mouse cursor located at the participant’s response, and
   (c) a bullseye centered at the location of the correct response \((x_{correct}, y_{correct})\) (Eq. 1).
   (The feedback panel in Fig. 2f shows that on this trial, the participant’s response was slightly below and to the right of the correct response.) The feedback stayed on the screen until the participant pressed the space bar to initiate the next trial.

Recorded on every trial:
1. the x- and y-coordinates and types of all items presented;
2. the x- and y-coordinates of the response location clicked on by the participant.
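The timed part of a trial can be summarized as a schedule of onsets, and the per-trial record as a small data structure. The sketch below is only a bookkeeping aid under our own naming (`TIMED_EVENTS`, `TrialRecord` and its fields are hypothetical), not the experiment software; it confirms that the mask begins 150 ms after stimulus onset and that the response screen appears 1250 ms after the space-bar press.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (event, duration in ms) for the timed displays; the response and feedback
# screens are untimed because they wait for the participant.
TIMED_EVENTS: List[Tuple[str, int]] = [
    ("blank frame", 1000),
    ("stimulus", 100),
    ("post-stimulus blank", 50),
    ("mask", 100),
]


def event_onsets(events: List[Tuple[str, int]] = TIMED_EVENTS) -> Dict[str, int]:
    """Onset of each event in ms after the space-bar press."""
    onsets, t = {}, 0
    for name, duration in events:
        onsets[name] = t
        t += duration
    onsets["response screen"] = t  # first untimed display
    return onsets


@dataclass
class TrialRecord:
    """Data stored on every trial (hypothetical field names)."""
    item_types: List[int]
    item_x: List[float]
    item_y: List[float]
    response_x: float
    response_y: float

    def __post_init__(self) -> None:
        if not len(self.item_types) == len(self.item_x) == len(self.item_y):
            raise ValueError("each item needs a type, an x- and a y-coordinate")


onsets = event_onsets()
print(onsets["mask"] - onsets["stimulus"], onsets["response screen"])  # 150 1250
```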
Generating full-set stimulus clouds
Of central interest in the attention condition with a given target filter ϕ are the responses produced by the participant on full-set trials. Each full-set stimulus cloud contains at least one item of each type, and the number \(n_k\) of items of type k in \(\mathit{Types}\) is the same on every full-set trial. However, there is no requirement that the \(n_k\)’s be equal (Footnote 1).
Every full-set cloud in the attention condition with a given target filter ϕ contains the same number \(N_{stim}\) of items; what should the spatial distribution of these items be? For reasons that will become apparent later, it is useful (1) to fix the expectation of the center of the cloud at the center of the stimulus field, and (2) to select the standard deviation of the cloud distribution to ensure that the clouds we present are contained within the stimulus field. To avoid unwittingly imposing any additional constraints, the natural choice for the distribution of item locations is a bivariate Gaussian density, which is the maximum entropy distribution for a fixed mean and variance.
There is, however, a problem with this simple strategy. Specifically, when the x- and y-coordinates of item locations are independent random variables, full-set stimulus clouds vary randomly (and strongly) across trials in how far items are spread out around the centroid. In the centroid task, it is empirically observed that responses tend to be more accurate on the trials in which items happen to bunch closely around the centroid than on trials in which the items are dispersed more broadly. In subsequent analyses, it will be critical to separate response variability due to trial-to-trial variability in the stimulus centroid location \((x_{correct}, y_{correct})\) from variability due to other stimulus factors. This can be accomplished much more easily when the item clouds do not vary in size, i.e., in dispersion.
Dispersion
To deal with the problem of varying cloud size, it helps to define the dispersion of a cloud of items. Let the vectors of x- and y-coordinates of the item locations in a given cloud be \(\mathbf {x} = \left (x_{1}, x_{2},\cdots , x_{N_{stim}}\right )\) and \(\mathbf {y} = \left (y_{1}, y_{2},\cdots , y_{N_{stim}}\right )\). Then the Dispersion(x, y) of the stimulus cloud composed of x, y is
$$ \text{Dispersion}(\mathbf{x},\mathbf{y}) = \sqrt{\frac{1}{2(N_{stim}-1)} \sum_{i=1}^{N_{stim}} \left[ (x_i - \bar{X})^2 + (y_i - \bar{Y})^2 \right]} $$ (2)
where \(\bar {X}\), \(\bar {Y}\) are the means of the vectors x, y. Note: Dispersion is proportional to the root-mean-square (RMS) distance of the display items from their mean location; the proportionality constant is chosen so that the squared Dispersion(x, y) is an unbiased estimator of the variance used to generate each coordinate of the cloud, so that Dispersion(x, y) itself estimates the corresponding standard deviation.
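A direct transcription of the dispersion statistic follows. It assumes that the pooled squared deviations of both coordinates are divided by \(2(N_{stim}-1)\), which makes the squared dispersion an unbiased estimate of the per-coordinate variance; the function name is ours.

```python
import numpy as np


def dispersion(x, y):
    """Dispersion of a cloud of items with coordinates x, y.

    Equals the RMS distance of the items from their mean location times
    sqrt(N / (2 * (N - 1))), i.e. pooled squared deviations / (2 * (N - 1)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("dispersion needs at least two items")
    ss = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
    return float(np.sqrt(ss / (2 * (n - 1))))


# Sanity check: for many points drawn with SD 80 per coordinate, dispersion is near 80.
rng = np.random.default_rng(0)
print(round(dispersion(rng.normal(0, 80, 100_000), rng.normal(0, 80, 100_000)), 1))
```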
To keep the value of dispersion constant at some value D for all full-set stimulus clouds used in a given experiment:
1. Draw independent standard normal random variables \(\tilde{x}_i\) and \(\tilde{y}_i\), \(i = 1,2,\cdots,N_{stim}\), and
2. then produce the x- and y-coordinates of the actual item locations by setting
$$ x_i = \frac{D\,\tilde{x}_i}{\text{Dispersion}(\tilde{\mathbf{x}},\tilde{\mathbf{y}})} \quad \text{and} \quad y_i = \frac{D\,\tilde{y}_i}{\text{Dispersion}(\tilde{\mathbf{x}},\tilde{\mathbf{y}})} \quad \text{for } i = 1,2,\cdots,N_{stim} $$ (3)
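This process starts with a cloud of points \((\tilde{x}_i, \tilde{y}_i)\); the location of each point is then scaled, relative to the center of the screen, by the factor \(\frac{D}{\text{Dispersion}(\tilde{\mathbf{x}},\tilde{\mathbf{y}})}\). Because multiplying every coordinate by a constant multiplies the dispersion by the same constant, every cloud produced in this way has dispersion exactly D. The choice of D must strike a compromise between: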
1. maximizing the information derived from each trial by making D as large as possible, and
2. ensuring that all items in the stimulus cloud fit within the stimulus field.
A procedure that works well is to choose D in a given experiment so that approximately 95% of the full-set stimulus clouds produced are contained within the stimulus field. When a cloud produced using this D has one or more items that fall outside the stimulus field, that cloud sample is discarded, and a new cloud sample is produced.
For example, in the Uniform, Graded and Inverse-graded attention conditions in the Gabor experiment, each stimulus cloud comprised two Gabor patterns of each contrast value. The expectation of the (unweighted) centroid of each stimulus cloud was the center of the stimulus region. The dispersion (Eq. 2) of each stimulus cloud was 80 pixels (1.65 deg. of visual angle). This value was chosen because it led to discarding roughly 5% of the stimulus clouds produced because one or more item locations fell outside the stimulus region. In addition, the center-to-center distance between items (each of which subtended 25 × 25 pixels) was required to be at least 26 pixels to prevent items from overlapping.
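The cloud-generation procedure (standard normal draws, rescaling to dispersion D, and rejection of unusable clouds) can be sketched as follows. Several details are our assumptions rather than the paper’s: coordinates are in pixels relative to the center of a 512 × 512 field, a cloud is rejected if any item center lies outside the field, the 26-pixel minimum spacing is enforced by redrawing the whole cloud, and the dispersion uses the \(2(N_{stim}-1)\) normalization. Because rescaling multiplies the dispersion by the same factor, every accepted cloud has dispersion exactly D.

```python
import numpy as np


def dispersion(x, y):
    """Pooled-coordinate dispersion (assumed 2 * (N - 1) normalization)."""
    ss = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
    return np.sqrt(ss / (2 * (x.size - 1)))


def full_set_cloud(n_per_type, D=80.0, half_field=256.0, min_sep=26.0,
                   rng=None, max_tries=100_000):
    """Draw one full-set cloud with dispersion exactly D.

    n_per_type : number of items of each type (n_k), e.g. [2] * 8
    Returns (types, x, y), coordinates in pixels relative to the field center.
    """
    rng = np.random.default_rng() if rng is None else rng
    types = np.repeat(np.arange(len(n_per_type)), n_per_type)
    n = types.size
    for _ in range(max_tries):
        xt, yt = rng.standard_normal(n), rng.standard_normal(n)
        scale = D / dispersion(xt, yt)
        x, y = scale * xt, scale * yt                   # rescale about the field center
        if np.any(np.abs(x) > half_field) or np.any(np.abs(y) > half_field):
            continue                                    # an item falls outside the field
        dist = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
        np.fill_diagonal(dist, np.inf)
        if dist.min() < min_sep:
            continue                                    # two items would overlap
        return types, x, y
    raise RuntimeError("no acceptable cloud; try a smaller D or fewer items")


types, x, y = full_set_cloud([2] * 8, rng=np.random.default_rng(1))
print(len(types), round(float(dispersion(x, y)), 6))  # 16 80.0
```

With 16 items at D = 80, the spacing constraint rejects most candidate clouds, so redrawing whole clouds is simple but not fast; jittering only the offending items would be quicker but would alter the dispersion.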
General training in the centroid task
It will generally be useful to start an experiment by training the participant (with trial-by-trial feedback) to extract centroids of clouds. The number of items included in the displays used in this phase of training is typically equal to the number that will ultimately be used in data collection; however, all the items in a training display are identical, whereas the items in the actual experiment will be of different types. If the data collection phase mixes trials that include different numbers of items (as in the Dots experiment described below), the training trials in this phase may similarly vary the number of items in a cloud. Also, the post-stimulus mask used in this phase has an SOA identical to the masking SOA used in the data-collection phase.
The purpose of this training is to minimize idiosyncratic differences in the centroid computations used by different participants. As noted by Drew et al. (2010), in the absence of general training in the centroid task, different participants show significant individual differences in the centroid computations they use. In particular, some participants tend to overweight the contributions of peripheral items relative to items near the center of the cloud whereas other participants show the opposite tendency. These effects can be quite strong. Typically, a participant should remain in this phase of the experiment until his/her performance (as reflected by mean response error) has stabilized. Only then should he/she be introduced to displays composed of items of different item types.
Data collection
After the participant has completed general training, he/she participates in several different attention conditions. In each attention condition, he/she is asked to use a new target filter ϕ to weight display items. It is natural to expect that learning to perform the task with each new target filter ϕ will require practice. Accordingly, for each new ϕ, it is important to begin by training the participant to perform the task with this target filter, collecting test data only after performance has stabilized.
ϕ-specific training
Using clouds in which items of different types are mixed, the participant is trained (with trial-by-trial feedback) to mouse-click centroids of clouds with items weighted by the target filter ϕ. (On a given trial, the correct response is given by Eq. 1.)
Standard training
The nature of the training used in a given attention condition is likely to depend on the target filter ϕ. We call a target filter ϕ binary if ϕ assigns equal weight to some subset of “target” items in \(\mathit{Types}\) and weight 0 to the remaining “distractor” items. For a binary target filter (and sometimes for other target filters), ϕ-specific training typically uses the same mix of item-cloud conditions as will be used in the data collection phase (see “Data collection with target filter ϕ”). In these instances, the participant typically is tested in blocks of 100 trials. After each block, the participant is shown the attention filter he/she achieved in that block as well as several summary measures of accuracy. This feedback is provided to enable the participant to adjust his/her strategy to optimize performance. Because the procedures in ϕ-specific training blocks are identical to procedures in data-collection blocks, it is typically necessary to retain as data only the results from the first two blocks in which performance shows no improvement.
Pretraining
For non-binary target filters ϕ, however, it will sometimes be useful to include an initial phase of ϕ-specific training that uses “simplified” item-clouds that comprise fewer items than will be used in the data-collection phase. This strategy is likely to be appropriate if ϕ assigns a range of different weights to the different elements of \(\mathit{Types}\). In particular, this pretraining phase might include (often exclusively) clouds that comprise just two items. For a two-item cloud whose items are of types i and j in \(\mathit{Types}\), the correct response lies \(\frac {\phi (j)}{\phi (i)+\phi (j)}\) of the way from the location of the item of type i to the location of the item of type j. When the participant can produce appropriate responses to all such two-item clouds, this suggests that he/she understands the task at a rudimentary level. Following this pretraining, the participant progresses to standard training and data collection.
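The two-item rule is the weighted-centroid definition applied to a cloud of two items. Writing \(\mathbf{p}_i\) and \(\mathbf{p}_j\) (our notation) for the locations of the items of types i and j, the correct response is

$$ \frac{\phi(i)\,\mathbf{p}_i + \phi(j)\,\mathbf{p}_j}{\phi(i)+\phi(j)} = \mathbf{p}_i + \frac{\phi(j)}{\phi(i)+\phi(j)}\left(\mathbf{p}_j - \mathbf{p}_i\right) $$

For example, under the Graded target filter of the Gabor experiment, ϕ is proportional to contrast, so for a two-item cloud containing the lowest-contrast (1/8) and highest-contrast (1) Gabors the correct response lies \(\frac{1}{1/8 + 1} = \frac{8}{9}\) of the way from the low-contrast item to the high-contrast item.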
Data collection in the Gabor example
Blocks in the Gabor experiment comprised 100 full-set trials. On each full-set trial in each of the Uniform, Graded and Inverse-graded (Lowest-only and Highest-only) attention conditions, 16 (24) Gabor patterns were presented, two (three) of each contrast. The number of display items was increased in the Lowest-only and Highest-only attention conditions to ensure that a given stimulus display contained at least three target items.
Procedure with naive participants
S3 completed 400 trials of general training (∼20–27 min); S4 completed 200 trials (∼10–13 min). In the final block of general training trials, each of participants S3 and S4 achieved a mean response error comparable to the mean response errors typically achieved by practiced participants, and general training was terminated.
Each of participants S3 and S4 completed 200 trials in each attention condition following a number of ϕ-specific training trials that varied around a mean of 333, depending on the particular target filter ϕ (Footnote 2).
Neither S3 nor S4 was tested in either of the Lowest-only or Highest-only attention conditions.
Procedure with experienced participants
Each of participants S1 and S2 had extensive previous experience in the centroid task. In addition, each had previous experience in variants of the centroid task using target filters similar to those tested in the current experiment. Accordingly, general training was omitted for each. Each of S1 and S2 performed 100 ϕ-specific training trials followed by 200 data-collection trials in each attention condition. Attention conditions were tested in the following order: Uniform → Graded → Inverse-graded → Lowest-only → Highest-only.
Modeling
Some attention conditions in the centroid task are likely to be harder than others. The difficulty encountered by a participant in an attention condition using a given target filter ϕ is likely to show up in two main ways:
1. The attention filter \(f_{\phi}\) achieved by the participant may deviate from the target filter ϕ.
2. Response accuracy may be compromised by
   (a) random errors, and
   (b) a bias toward a fixed response location.
In analyzing the data from a given attention condition, it will be important to measure the strengths of these effects.
Model assumptions
The primary variable of interest is the attention filter \(f_{\phi}\) achieved by the participant across just the full-set trials in the attention condition using a given target filter ϕ (Footnote 3). However, the model includes four other parameters to account for specific sources of error that may influence responses: the two coordinates of a default location \((x_{default}, y_{default})\), Data-drivenness (V), which describes a participant’s reliance on the present stimulus versus the default location, and a noise parameter σ. These parameters are defined below.
To model the process by which the x-coordinate \(R_x(j)\) and y-coordinate \(R_y(j)\) of the participant’s response are produced on a full-set trial j, we define the following:
1. \(\tau_i(j)\), \(x_i(j)\) and \(y_i(j)\) are the type and the x- and y-coordinates of the ith item in the stimulus cloud presented on trial j.
2. \(Q_x(j)\) and \(Q_y(j)\) are independent, normally distributed random variables, each with mean 0 and variance \(\sigma^2\), that represent random response error.
3. V, the Data-drivenness parameter, reflects the proportion to which the participant’s response is determined by the stimulus presented on each trial as opposed to the fixed point \((x_{default}, y_{default})\) to which the participant’s response is assumed to tend (with weight \(1-V\)) on each trial.
4. \(f_{\phi}\) is the attention filter achieved by the participant across the full-set trials in the attention condition defined by the target filter ϕ.
5. S is the sum of the weights assigned to all the different items in any given full-set display. That is, on any trial j,
$$ S=\sum\limits_{i=1}^{N_{stim}}f_{\phi} (\tau_{i}(j)) = \sum\limits_{k=1}^{N_{types}}n_{k}f_{\phi} (k) $$ (4)
where for \(k = 1,2,\cdots,N_{types}\), \(n_k\) is the number of items of type k in a full-set display. We assume without loss of generality that S is positive.
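Then, the x- and y-coordinates of the participant’s response on a given trial j are modeled as
$$ R_x(j) = V\,\frac{\sum_{i=1}^{N_{stim}} f_{\phi}(\tau_i(j))\,x_i(j)}{S} + (1-V)\,x_{default} + Q_x(j) $$ (5)
$$ R_y(j) = V\,\frac{\sum_{i=1}^{N_{stim}} f_{\phi}(\tau_i(j))\,y_i(j)}{S} + (1-V)\,y_{default} + Q_y(j) $$ (6)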
The methods used to estimate the model parameters V, \(x_{default}\), \(y_{default}\), and \(f_{\phi}\) are described in “Appendix 1. Estimating model parameters.” Methods for computing 95% confidence intervals for V (Data-drivenness) as well as for the \(f_{\phi}(k)\), \(k = 1,2,\cdots,N_{types}\), are described in “Appendix 2. Estimating confidence intervals for model parameters.” The Matlab code that is used to perform these computations is given in “Appendix 4. Matlab code for fitting the centroid model.”
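As a rough illustration of why fitting is simple (this is our simplified stand-in, not the paper’s Matlab code or the exact Appendix 1 procedure): in the response model, each response coordinate is V times the \(f_{\phi}\)-weighted centroid, plus \(1-V\) times the default coordinate, plus Gaussian noise. Writing \(X_k(j)\) for the sum of the x-coordinates of the type-k items on trial j, this gives \(R_x(j) = \sum_k b_k X_k(j) + c_x + Q_x(j)\) with \(b_k = V f_{\phi}(k)/S\) and \(c_x = (1-V)x_{default}\), and likewise for y. Because \(\sum_k n_k f_{\phi}(k) = S\), the coefficients yield \(V = \sum_k n_k b_k\), \(f_{\phi}(k) \propto b_k\), and \(x_{default} = c_x/(1-V)\). The sketch pools x and y responses (shared \(b_k\), separate intercepts), solves by ordinary least squares, and omits the confidence intervals of Appendix 2; `simulate_trials` is a hypothetical helper used only to generate test data.

```python
import numpy as np


def simulate_trials(f, V, default, sigma, n_trials, n_per_type, spread=80.0, rng=None):
    """Generate full-set trials from the response model (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.asarray(f, dtype=float)
    types = np.tile(np.repeat(np.arange(f.size), n_per_type), (n_trials, 1))
    x = rng.normal(0.0, spread, types.shape)
    y = rng.normal(0.0, spread, types.shape)
    w = f[types]                      # f_phi(tau_i(j)) for every item on every trial
    S = w.sum(axis=1)                 # identical on all full-set trials
    rx = V * (w * x).sum(axis=1) / S + (1 - V) * default[0] + rng.normal(0, sigma, n_trials)
    ry = V * (w * y).sum(axis=1) / S + (1 - V) * default[1] + rng.normal(0, sigma, n_trials)
    return types, x, y, rx, ry


def fit_centroid_model(types, x, y, rx, ry):
    """Least-squares estimates of f_phi (summing to 1), V, default location, sigma."""
    types = np.asarray(types)
    n_trials = types.shape[0]
    n_k = np.bincount(types[0])       # items per type on a full-set trial
    n_types = n_k.size
    # Per-trial summed x- (and y-) coordinates of the items of each type.
    sx = np.stack([np.where(types == k, x, 0.0).sum(axis=1) for k in range(n_types)], axis=1)
    sy = np.stack([np.where(types == k, y, 0.0).sum(axis=1) for k in range(n_types)], axis=1)
    one, zero = np.ones((n_trials, 1)), np.zeros((n_trials, 1))
    # Shared slopes b_k for x and y; separate intercepts c_x and c_y.
    design = np.vstack([np.hstack([sx, one, zero]), np.hstack([sy, zero, one])])
    resp = np.concatenate([rx, ry])
    coef, *_ = np.linalg.lstsq(design, resp, rcond=None)
    b, c_x, c_y = coef[:n_types], coef[n_types], coef[n_types + 1]
    V = float(n_k @ b)                # because sum_k n_k f_phi(k) / S = 1
    f = b / b.sum()
    if abs(1.0 - V) > 1e-6:
        default = (c_x / (1.0 - V), c_y / (1.0 - V))
    else:                             # default location unidentifiable when V = 1
        default = (float("nan"), float("nan"))
    resid = resp - design @ coef
    sigma = float(np.sqrt(resid @ resid / (resp.size - coef.size)))
    return f, V, default, sigma


true_f = np.arange(1, 9) / 36.0       # Graded-like filter, sums to 1
data = simulate_trials(true_f, V=0.8, default=(30.0, -40.0), sigma=10.0,
                       n_trials=2000, n_per_type=2, rng=np.random.default_rng(0))
f_hat, V_hat, default_hat, sigma_hat = fit_centroid_model(*data)
print(np.round(f_hat, 3), round(V_hat, 3), np.round(default_hat, 1), round(sigma_hat, 1))
```

The default location is estimated much less precisely than \(f_{\phi}\) or V, because it enters the responses only with weight \(1-V\).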
Results from the Gabor example experiment
Figure 3 shows the results for all participants for the Uniform, Graded, and Inverse-graded attention conditions. The attention filters \(f_{\phi}\) achieved by all four participants in the Graded attention condition match the target filter ϕ fairly well. In addition, the attention filter achieved by participant S3 in the Uniform attention condition matches the corresponding target filter ϕ remarkably well. However, systematic deviations of \(f_{\phi}\) from ϕ are evident for participants S1, S2 and S4 in the Uniform attention condition and for all four participants in the Inverse-graded attention condition.
Figure 4 shows the results for S1 and S2 in the Lowest-only and Highest-only attention conditions. The attention filter \(f_{\phi}\) achieved by each of participants S1 and S2 in each of these two attention conditions deviates strongly from the target filter ϕ. In each case, although the target filter ϕ assigns nonzero weight only to Gabor patterns of a single contrast (0.125 in the Lowest-only and 1.0 in the Highest-only attention condition), the attention filter \(f_{\phi}\) achieved by each participant gives nonzero weight to Gabors ranging broadly in contrast.
In addition to plotting the target filter ϕ and the attention filter \(f_{\phi}\) achieved by the participant, each of the panels in Figs. 3 and 4 is annotated with three additional statistics: “Data-drivenness,” “Filter-fidelity” and “Efficiency.” Data-drivenness is parameter V in Eqs. 5 and 6. Along with Data-drivenness, Filter-fidelity and Efficiency reflect the overall skill of the participant in the given attention condition. They are explained in the next section.
Analyzing the data: response error
Potential sources of response error
The participant’s responses can deviate from the target responses for various reasons. These include:
1. corruption of responses by random error, sources of which include
   (a) early perceptual noise, including
      i. misregistration of the locations of items in the display
      ii. misregistration of the types of different items
      iii. failure to register (i.e., missing) some items
   (b) late noise, including instability across trials in
      i. the attention filter being deployed
      ii. the centroid computation
      iii. motor response execution
2. corruption of responses by nonrandom error, sources of which include
   (a) mismatch between the attention filter \(f_{\phi}\) and the target filter ϕ,
   (b) Data-drivenness less than 1, implying a tendency to produce responses biased toward a fixed default location, and
   (c) model failure (i.e., the computation used by the participant to produce responses deviates from the description provided by Eqs. 5 and 6).Footnote 4
Measuring the quality of the participant’s attention filter: filter-fidelity
We use a statistic called Filter-fidelity to measure how effectively \(f_{\phi}\) approximates the target filter ϕ for purposes of performing the centroid task on the item clouds used in a given type of trial (e.g., full-set trials or target-only trials). Filter-fidelity ranges from 0, if the attention filter \(f_{\phi}\) achieved by the participant for this class of item clouds is the worst possible filter, to 1, if \(f_{\phi} = \phi\). In this context, “worst” means that the variance of the difference between the x-coordinates (or the y-coordinates) of the centroids derived by using \(f_{\phi}\) vs. ϕ is maximal.
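A worst possible attention filter \(f_{\phi,worst}\) (there may be more than one) is derived by putting all the filter weight on a single item type that should (given the task demands) exert minimal influence on the participant’s response. Specifically, a worst possible attention filter can be obtained by choosing an item type j for which \(n_{j}\phi(j)\) is minimal across \(j = 1,2,\cdots,N_{types}\) and setting
$$ f_{\phi,worst}(k) = \begin{cases} 1 & \text{if } k = j \\ 0 & \text{otherwise} \end{cases} $$(7)
for all types \(k = 1,2,\cdots,N_{types}\).
Then Filter-fidelity is defined as
$$ \text{Filter-fidelity} = 1 - \frac{\| \widetilde{f}_{\phi} - \widetilde{\phi} \|}{\| \widetilde{f}_{\phi,worst} - \widetilde{\phi} \|} $$(8)
where, for any function \(f:Types\rightarrow \mathbb {R}\) (for which the denominator in Eq. 9 is nonzero),
$$ \widetilde{f}(k) = \frac{n_{k} f(k)}{\sum_{j=1}^{N_{types}} n_{j} f(j)}, \qquad k = 1,2,\cdots,N_{types}. $$(9)
This normalization ensures that the x- and y-coordinates of the centroid of a full-set display derived using f are the following weighted sums:
$$ \bar{x}_{f} = \sum_{i} \frac{\widetilde{f}(\tau_{i})}{n_{\tau_{i}}} x_{i}, \qquad \bar{y}_{f} = \sum_{i} \frac{\widetilde{f}(\tau_{i})}{n_{\tau_{i}}} y_{i} $$(10)
where \(x_{i}\) and \(y_{i}\) are the x- and y-locations of the ith item in the display, \(\tau_{i}\) is its type, and \(n_{k}\) is the number of items of type k in the display.
Thus Filter-fidelity is 1 minus the ratio of the Euclidean distance (in \(N_{types}\)-dimensional space) of \(\widetilde{f}_{\phi}\) from \(\widetilde{\phi}\) to the Euclidean distance of the worst possible filter (i.e., \(\widetilde{f}_{\phi,worst}\)) from \(\widetilde{\phi}\).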
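To make the Filter-fidelity computation concrete, here is a minimal Python sketch (the paper's own code is Matlab; this is not a transcription of it). It assumes the normalization \(\widetilde{f}(k) = n_{k}f(k)/\sum_{j} n_{j}f(j)\), a worst filter that puts all weight on a type j minimizing \(n_{j}\phi(j)\), and Filter-fidelity computed as 1 minus the ratio of Euclidean distances from \(\widetilde{\phi}\), so that it runs from 0 (worst filter) to 1 (\(f_{\phi} = \phi\)). The function names and example filters are hypothetical.

```python
import math

def normalize(f, n):
    """Eq. 9 (assumed form): scale n[k] * f[k] so the normalized weights sum to 1."""
    denom = sum(nk * fk for nk, fk in zip(n, f))
    if denom == 0:
        raise ValueError("normalization denominator is zero")
    return [nk * fk / denom for nk, fk in zip(n, f)]

def worst_filter(phi, n):
    """All weight on one item type j that minimizes n[j] * phi[j]."""
    j = min(range(len(phi)), key=lambda k: n[k] * phi[k])
    return [1.0 if k == j else 0.0 for k in range(len(phi))]

def filter_fidelity(f_phi, phi, n):
    """1 minus the ratio of Euclidean distances from the normalized target filter."""
    target = normalize(phi, n)
    achieved = math.dist(normalize(f_phi, n), target)
    worst = math.dist(normalize(worst_filter(phi, n), n), target)
    return 1.0 - achieved / worst

# Hypothetical Dark-only-style example: 8 types, 2 items each, binary target filter.
n = [2] * 8
phi = [1, 1, 1, 1, 0, 0, 0, 0]
f_phi = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
print(round(filter_fidelity(f_phi, phi, n), 3))  # 0.937
```

Note that only the relative filter weights matter: multiplying \(f_{\phi}\) by any positive constant leaves Filter-fidelity unchanged, because the normalization removes overall scale.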
The Filter-fidelity values, computed according to Eq. 8, are displayed in all panels of Figs. 3 and 4. All Filter-fidelity values for the Uniform and Graded attention conditions are quite large (>0.83), reflecting the skill displayed by all four participants in matching the target filters. By contrast, for all participants, Filter-fidelity values are 8% to 23% lower in the Inverse-graded attention condition.
The Lowest-only and Highest-only attention conditions are especially difficult. Across the Uniform, Graded and Inverse-graded conditions, the lowest Filter-fidelity values were achieved in the Inverse-graded condition. For S1 (S2) this Filter-fidelity value was 0.765 (0.732). By contrast, in the Highest-only condition S1 (S2) achieved Filter-fidelity 0.381 (0.421) (Fig. 4). In the Lowest-only condition, Filter-fidelity is even worse, indicating that although participants attempt to selectively attend to Lowest-only and Highest-only targets, they cannot successfully do so.
Measuring resistance to residual error: Efficiency
All of the error in the data unaccounted for by the model of Eqs. 5 and 6 is captured in \(SS_{Residual}\) (Eq. 24). This quantity includes both
1. random error from various early and late sources in the response-production process, and
2. error due to model failure.
The statistic that is typically used to quantify random error in the context of a regression analysis is
$$ \widehat{\sigma} = \sqrt{\frac{SS_{Residual}}{2N_{trials} - df}} $$(11)
where
$$ df = N_{types} + 2. $$(12)
Although the model has parameters V, \(f_{\phi}\) (which is of length \(N_{types}\)), \(x_{default}\) and \(y_{default}\), the number of free parameters is \(N_{types}+2\) because \(f_{\phi}\) is constrained to sum to 1 (so it absorbs only \(N_{types}-1\) degrees of freedom). (In the Gabor experiment, df = 10 because \(N_{types} = 8\).) The statistic \(\widehat{\sigma}^{2}\) is an unbiased estimate of the variance of each of the random variables \(Q_{x}\) and \(Q_{y}\) (Eqs. 5 and 6), so \(\widehat{\sigma}\) estimates their common standard deviation. In itself, however, \(\widehat{\sigma}\) is difficult to interpret.
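A minimal Python sketch of \(\widehat{\sigma}\), assuming that each trial contributes one x residual and one y residual to \(SS_{Residual}\), so that the residual degrees of freedom are \(2N_{trials} - (N_{types}+2)\). The function name and the residual lists are hypothetical; in practice the residuals would come from the fitted model.

```python
import math

def residual_sd(res_x, res_y, n_types):
    """sigma-hat: pooled SD of the x and y residuals, with N_types + 2 free parameters
    (V, x_default, y_default and N_types - 1 free filter weights)."""
    n_trials = len(res_x)
    ss_residual = sum(r * r for r in res_x) + sum(r * r for r in res_y)
    df = n_types + 2
    return math.sqrt(ss_residual / (2 * n_trials - df))

# Hypothetical residuals (pixels) from a 200-trial data set with 8 item types.
sigma_hat = residual_sd([1.0] * 200, [-1.0] * 200, n_types=8)
print(sigma_hat)
```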
An alternative measure that facilitates comparison across experiments is the statistic \(p_{miss}\). To get a clear sense of what \(p_{miss}\) reflects, imagine a centroid task in which
1. Types contains only two item types, A and B,
2. the stimulus cloud on each trial comprises 10 items of type A and 10 of type B, and
3. the task is to click on the centroid of the locations of all items of type A and ignore all items of type B.
If it is difficult to achieve an attention filter that is selective for items of type A vs. type B, then the participant may adopt the strategy of picking out a single item of type A on each trial and simply clicking on the location of that one item.
Under this strategy, on each trial, the participant’s response is determined exclusively by items (actually, by only one item) of type A; items of type B exert no systematic influence whatsoever. Thus, the attention filter achieved by the participant will match the target filter nearly perfectly. Nonetheless, performance will be very poor because the participant ignores nearly all of the relevant information in the display on each trial. For this reason, even though Filter-fidelity is likely to be very close to 1, \(\widehat{\sigma}\) (Eq. 11) will be large. In the experiment imagined here, if the participant were always able to find exactly one item of type A and to click with perfect accuracy on its location, this strategy would yield \(p_{miss} = 0.90\), because the participant fails to include nine of the ten relevant items in the display.
More generally, \(p_{miss}\) is the answer to the following question: Given
1. the attention filter \(f_{\phi}\) achieved by the participant, and
2. the observed value of \(\widehat{\sigma}\),
what is the maximum possible proportion of display items that the participant could be failing, trial by trial, to include in his/her centroid computation?
Another way of thinking about \(p_{miss}\) is in terms of an ideal detector operating on a reduced stimulus. Suppose a computer performs the centroid task as follows: On each trial, the computer (1) discards proportion p of items from the stimulus (where the specific items discarded are chosen randomly), then (2) applies attention filter \(f_{\phi}\) to the remaining items in the display, and (3) extracts (without any additional error) the centroid of the decimated and filtered display. The centroid derived through this procedure will vary randomly in each of the x- and y-coordinate values with some standard deviation \(\sigma_{p}\) that increases with p. The statistic \(p_{miss}\) is the value that p must take for \(\sigma_{p}\) to equal \(\widehat{\sigma}\).
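The ideal-detector definition lends itself to a brute-force check. The Python sketch below is a Monte Carlo stand-in for, not a transcription of, the algorithm in “Appendix 3. Computing Efficiency”: it estimates \(\sigma_{p}\) by simulation and then bisects on p until \(\sigma_{p}\) matches a given \(\widehat{\sigma}\); Efficiency would then be \(1-p_{miss}\). Several choices are assumptions of this sketch: item locations are drawn as i.i.d. Gaussians with SD equal to a hypothetical dispersion parameter, each item is discarded independently with probability p, and \(\sigma_{p}\) is measured as the SD of the decimated centroid's x-deviation from the intact filtered centroid. All names are hypothetical.

```python
import random
import statistics

def sigma_p(p, f, n, dispersion, n_sim=2000, seed=0):
    """Monte Carlo estimate of sigma_p (x-coordinate): SD of the difference between the
    f-weighted centroid of a randomly decimated cloud and that of the intact cloud."""
    rng = random.Random(seed)  # same seed on every call: common random numbers
    types = [k for k, nk in enumerate(n) for _ in range(nk)]
    w = [f[k] for k in types]
    devs = []
    while len(devs) < n_sim:
        xs = [rng.gauss(0.0, dispersion) for _ in types]
        full = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        kept = [i for i in range(len(types)) if rng.random() >= p]  # drop each item w.p. p
        kept_w = sum(w[i] for i in kept)
        if kept_w == 0:
            continue  # every item with nonzero weight was discarded; draw a new cloud
        dec = sum(w[i] * xs[i] for i in kept) / kept_w
        devs.append(dec - full)
    return statistics.pstdev(devs)

def p_miss(sigma_hat, f, n, dispersion, tol=1e-3):
    """Bisect for the p at which sigma_p equals sigma_hat; None if even p = 0.95 is too small."""
    lo, hi = 0.0, 0.95
    if sigma_p(hi, f, n, dispersion) < sigma_hat:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sigma_p(mid, f, n, dispersion) < sigma_hat:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical check: recover p from a sigma_hat generated at p = 0.5.
f, n = [1.0] * 8, [2] * 8
estimate = p_miss(sigma_p(0.5, f, n, 80.0), f, n, 80.0)
print(round(estimate, 2), "-> Efficiency", round(1 - estimate, 2))
```

Reusing the same seed in every call makes \(\sigma_{p}\) vary almost smoothly with p, which keeps the bisection stable; an analytic expression for \(\sigma_{p}\), if available, would be faster and exact.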
Although \(p_{miss}\) is useful as a summary of performance, it should not be taken literally as an estimate of the proportion of items actually missed by the participant. It attributes all of the residual error to missed stimulus items; however, any credible process model must admit possible contributions to \(\widehat{\sigma}\) from model failure as well as from all of the noise sources outlined above. Thus, \(p_{miss}\) should be viewed as an upper bound on the proportion of display items missed by the participant in his/her centroid computation. Following conventional nomenclature, we refer to \(1-p_{miss}\) as Efficiency (which is a lower bound on the proportion of display items included by the participant in his/her centroid computation).
As described below and illustrated in the Dots Experiment, the lower bound on the proportion of display items included by the participant in his/her centroid computation can be raised (i.e., tightened) by including in the experiment “singleton” trials in which a single target item is presented. The variance of the errors produced on these trials is due to sources other than failing to include display items in the centroid computation. It can therefore be subtracted from \(\widehat{\sigma}^{2}\) for purposes of computing \(p_{miss}\). Whenever the Efficiency value reported from a particular experiment reflects such a correction, we refer to it as “singleton-corrected Efficiency.”
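The singleton correction amounts to removing one variance component before \(p_{miss}\) is computed. A minimal Python sketch, assuming the singleton-trial error is summarized as a per-coordinate standard deviation; clipping the difference at zero is a choice of this sketch, not necessarily of the paper's code. The corrected value would then be used in place of \(\widehat{\sigma}\) when solving for \(p_{miss}\). The function name and numbers are hypothetical.

```python
import math

def singleton_corrected_sigma(sigma_hat, singleton_sd):
    """Subtract singleton-trial variance from sigma_hat**2 (clipped at zero) and return
    the SD that remains to be attributed to missed items."""
    excess = sigma_hat ** 2 - singleton_sd ** 2
    return math.sqrt(excess) if excess > 0 else 0.0

# Hypothetical values in pixels.
print(singleton_corrected_sigma(20.0, 9.2))
```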
An algorithm to compute \(p_{miss}\) is described in “Appendix 3. Computing Efficiency,” and the Matlab code implementing this algorithm is given in “Appendix 4. Matlab code for fitting the centroid model” (see specifically “GetPMiss.m (Called by FitCentroidModel)” and the functions called by GetPMiss.m).
The Efficiency values, computed as described in “Appendix 3. Computing Efficiency,” are displayed in all panels of Figs. 3 and 4. High Efficiency in a given attention condition indicates that the participant can indeed deploy the attention filter he/she has achieved broadly across space with high sensitivity. The current results support this conclusion for all participants in the Uniform and Graded attention conditions. However, Efficiency values for the Inverse-graded, Lowest-only and Highest-only attention conditions are much lower. As noted above, Filter-fidelity values also tend to be suppressed in these attention conditions. It appears that human vision cannot produce attention filters matched to these attention conditions, at least not with the number of training trials provided here.
The relation between Data-drivenness, Filter-fidelity and Efficiency
There are several relationships between Efficiency, Data-drivenness and Filter-fidelity that should be noted.
1. It is eminently possible for a participant to achieve very high values of both Filter-fidelity and Data-drivenness in a given attention condition even though his/her Efficiency is very low. Efficiency thus emerges as a key measure of performance. If Efficiency is low, then even if Filter-fidelity and Data-drivenness are both high, we must conclude either that (1) the attention filter achieved by the participant cannot be effectively deployed broadly across space or else (2) the items disclosed by the attention filter are too poorly localized in the output image of the filter to enable an accurate estimate of the centroid.
2. On the other hand, if Data-drivenness is low, then Efficiency is likely to be low as well. In particular, note that if Data-drivenness is 0, then the participant’s responses do not depend at all on the locations and types of items in the display. In this case, removing items from the stimulus display has no effect on the deviation of the participant’s responses from the responses predicted by the model. This means that if Data-drivenness is 0, then Efficiency is undefined. More generally, for any fixed value of \(\widehat{\sigma}\) (Eq. 11), Efficiency will shrink with shrinking Data-drivenness V and will be undefined if \(\frac{\widehat{\sigma}}{V}\) is greater than the dispersion D used to control the size of stimulus clouds in a given experiment.
Simulations exploring interactions between Data-drivenness, Filter-fidelity and singleton-corrected Efficiency
Both to test the accuracy of the FitCentroidModel program (Appendix 4) and to better understand the dependencies among the estimates it produces, we carried out a series of Monte-Carlo simulations. The attention filter of the simulated observer was similar to that achieved by participant S3 in the Dark-only condition of the Dots experiment (Fig. 6, top row, third panel from left). Simulated data were produced for 60 variants of this observer, derived from the factorial combination of three factors: (1) five levels of stimulus decimation proportion p (i.e., on each simulated trial, the observer failed to incorporate a randomly selected subset comprising proportion p of the display items into the centroid computation, with p ranging from 0 up to 0.625), (2) four levels of Data-drivenness V, ranging from V = 1 down to V = 0.4, and (3) three levels of late error \(\sigma_{L}\) (i.e., independent normal random variables with standard deviation \(\sigma_{L}\) were added to the x- and y-coordinates of the response location on each simulated trial), ranging from 10% to 40% of the dispersion of the stimulus cloud (see Eq. 2). In addition, in each condition singleton-corrected Efficiency was estimated using two different levels of singleton standard deviation, \(\sigma_{singleton}\). Five hundred simulated runs of a 100-trial block were generated and analyzed for each of these 120 variations.
Estimates of the attention filter weights, \(f_{\phi,\text{full-set}}\), were generally quite accurate, as were the estimates of Data-drivenness. Specifically, when the decimation level was zero, these estimates were unbiased. In the extreme case, in which the simulated observer registered only 37.5% of the display items, the mean estimate of an attention filter weight with a simulated value of 0.270 was reduced to 0.252, and the mean estimate of a distractor weight with a simulated value of 0.080 was increased to 0.090. Although the simulated value of \(\sigma_{L}\) did not influence the bias of these estimates, it did affect their variability.
The results for singleton-corrected Efficiency are more complicated because this measure is decreased by any manipulation that reduces response accuracy (other than misspecification of the attention filter). In the special case in which Data-drivenness V = 1.0 and the value of \(\sigma_{singleton}\) used for the analysis exactly matches the \(\sigma_{L}\) of the simulated observer, the estimated values of singleton-corrected Efficiency closely match 1−p, where p is the decimation proportion. However, the estimated singleton-corrected Efficiency was also reduced when V was less than 1.0 or \(\sigma_{L} > \sigma_{singleton}\). Because singleton-corrected Efficiency is constrained to lie between zero and one, the effects of these three sources of judgment error (V, \(\sigma_{L}\), \(\sigma_{singleton}\)) on estimated singleton-corrected Efficiency are subadditive.
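A simulated observer of the kind described above can be sketched in a few lines of Python. The sketch assumes that Eqs. 5 and 6 take the form response = V × (f-weighted centroid of the registered items) + (1 − V) × default + late noise, with each item missed independently with probability p; the fallback to the default location when no weighted item survives is a hypothetical choice. The filter weights, default location and parameter values in the usage lines are invented for illustration and are not the values used in the reported simulations.

```python
import random

def simulate_response(rng, xs, ys, types, f, V, p, sigma_L, default=(0.0, 0.0)):
    """One simulated trial under the assumed model:
    V * (f-weighted centroid of registered items) + (1 - V) * default + late noise."""
    kept = [i for i in range(len(xs)) if rng.random() >= p]  # each item missed w.p. p
    total = sum(f[types[i]] for i in kept)
    if total == 0:
        cx, cy = default  # nothing with nonzero weight registered (hypothetical fallback)
    else:
        cx = sum(f[types[i]] * xs[i] for i in kept) / total
        cy = sum(f[types[i]] * ys[i] for i in kept) / total
    rx = V * cx + (1 - V) * default[0] + rng.gauss(0.0, sigma_L)
    ry = V * cy + (1 - V) * default[1] + rng.gauss(0.0, sigma_L)
    return rx, ry

# Hypothetical usage: one 100-trial block with p = 0.25, V = 0.8, sigma_L = 8 px.
rng = random.Random(0)
types = [k for k in range(8) for _ in range(2)]
f = [0.27, 0.2, 0.15, 0.1, 0.08, 0.08, 0.06, 0.06]
block = []
for _ in range(100):
    xs = [rng.gauss(0.0, 80.0) for _ in types]
    ys = [rng.gauss(0.0, 80.0) for _ in types]
    block.append(simulate_response(rng, xs, ys, types, f, V=0.8, p=0.25, sigma_L=8.0))
```

Fitting the centroid model to many such blocks and comparing the recovered V, \(f_{\phi}\) and Efficiency with the generating values is the logic of the simulations reported here.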
Testing whether two attention filters are significantly different
In various circumstances, it may be of interest to assess whether the attention filters estimated from two sets of data are significantly different. Matlab code to perform an F-test of this hypothesis is provided in “Appendix 5. Matlab code for nested model comparison.” See specifically “Main program: FTestForEqualityOfFilters.m.” The test compares the fits provided by two models:
1. The full model allows all model parameters, including the two attention filters, to take different, arbitrary values for the two data sets.
2. The nested model assumes that the two data sets resulted from the use of a single, shared attention filter (with the other model parameters, V, \(x_{default}\) and \(y_{default}\), allowed to take different values for the two data sets).
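Let \(N_{trials,1}\) (\(N_{trials,2}\)) be the number of trials in the first (second) data set. Then the numbers of degrees of freedom in the full and nested models are
$$ df_{full} = 2(N_{types} + 2) $$(13)
$$ df_{nested} = (N_{types} - 1) + 6 = N_{types} + 5 $$(14)
and if \(SS_{full}\) (\(SS_{nested}\)) is the sum of squared residual errors between observed and predicted response locations under the full (nested) model, then, if the nested model captures the true state of the world, the ratio
$$ F = \frac{(SS_{nested} - SS_{full}) / df_{1}}{SS_{full} / df_{2}} $$(15)
has an F distribution with degrees of freedom
$$ df_{1} = df_{full} - df_{nested} = N_{types} - 1 $$(16)
and
$$ df_{2} = 2(N_{trials,1} + N_{trials,2}) - df_{full}. $$(17)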
A test of the null hypothesis that the attention filters achieved by participants S1 and S2 in the Lowest-only attention condition (top two panels of Fig. 4) are identical in form yields \(F_{7,780} = 1.7052\), p = 0.1044. By contrast, a test of the null hypothesis that the attention filters achieved by participant S1 in the Highest-only and Lowest-only attention conditions (left-hand panels of Fig. 4) are identical in form yields \(F_{7,780} = 30.9317\), p ≈ 0.
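The degrees-of-freedom bookkeeping of this nested-model test can be checked with a short Python sketch. It assumes the standard nested-model F statistic, a full model with \(2(N_{types}+2)\) parameters, a nested model sharing \(N_{types}-1\) filter weights plus separate V, \(x_{default}\) and \(y_{default}\) for each data set, and one x and one y residual per trial. Under these assumptions, two 200-trial data sets with \(N_{types} = 8\) give degrees of freedom (7, 780), the values reported above. The function name and the sums of squares in the usage line are hypothetical.

```python
def f_test_equal_filters(ss_full, ss_nested, n_trials_1, n_trials_2, n_types):
    """F statistic for H0: both data sets were produced with one shared attention filter."""
    df_full = 2 * (n_types + 2)                   # V, x_default, y_default, N_types - 1 weights, twice
    df_nested = (n_types - 1) + 2 * 3             # shared filter plus per-set V, x_default, y_default
    df1 = df_full - df_nested                     # = N_types - 1
    df2 = 2 * (n_trials_1 + n_trials_2) - df_full  # one x and one y residual per trial
    F = ((ss_nested - ss_full) / df1) / (ss_full / df2)
    return F, df1, df2

# Hypothetical sums of squares for two 200-trial data sets with 8 item types.
F, df1, df2 = f_test_equal_filters(9.5e5, 9.7e5, 200, 200, 8)
print(df1, df2, round(F, 3))
# A p-value could then be taken from an F survival function, e.g. scipy.stats.f.sf(F, df1, df2).
```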
Gabor experiment: summary and caveat
The prominent features of the Gabor experiment results are:
- Participants can flexibly deploy very different attention filters in response to altered task demands. Stimuli are identically distributed in the Uniform and the Graded attention conditions, yet participants achieve attention filters that differ strongly in form, in each case with high Filter-fidelity, high Efficiency and high Data-drivenness. The stimulus exposure duration was only 100 ms, and the stimulus was followed, after a 50-ms blank, by a pattern mask; thus, the accurate attention filters and high Efficiency were achieved with only 150 ms of available stimulus information.
- Participants are limited in the attention filters they can achieve. While they can achieve high values of Filter-fidelity, Efficiency and Data-drivenness when using Uniform and Graded attention filters, the values of these measures are suppressed in the Inverse-graded, Lowest-only and Highest-only attention conditions.
- Caveat. What is here designated as an attention filter may involve processes other than attention alone. For example, Fig. 3 shows that in the Uniform and Inverse-graded attention conditions, participants S1, S2 and S4 clearly have some difficulty in giving adequate weight to the lowest-contrast Gabor patch. On the other hand, in the Graded attention condition, participants are required to give very little or zero weight to this lowest-contrast patch, and this they do quite well, especially when they cannot see it. These participants may be attending to the lowest-contrast patch as much as to the other patches but fail to weight it properly because of deficient discrimination, not deficient attention. If one developed a measure of discriminability and incorporated it into the attention filter computation, one might arrive at a “discriminability-corrected attention filter.” The choice we make here is to use the term “attention filter” for the simple but possibly impure concept, and to allow for the subsequent development of more complex, purer measures of attention.
The rest of this section focuses in more careful detail on the inferences enabled by the results of the Gabor experiment.
The filter mixture model
A basic framework for interpreting the results of centroid method experiments is the “filter mixture model.” This model proposes that participants possess a limited set of basic attention filters, i.e., a basis set. The large number of different observed attention filters is assumed to result from combinations of the basic attention filters. As a starting point, the basic attention filters are assumed to be “basic” in the sense that parametric variations in their properties are disallowed. This is formalized as follows. For a given set Types:
1. The participant possesses a basis set of attention filters \(f_{j}\), \(j = 1,2,\cdots,N\), which confer sensitivity to the item types in Types broadly across space, where
   (a) each filter \(f_{j}\) is implemented by a retinotopically organized array of neurons in early vision, and
   (b) \(f_{j}(k)\) gives the activation produced in this neural array by items of type k.
2. The participant can produce attention filters
$$ f = \sum\limits_{j=1}^{N} A_{j} f_{j} $$(18)
   where the \(A_{j}\)’s are constrained
   (a) to be nonnegative (implying that the participant cannot reverse the sign of the pattern of sensitivity of a given basic attention filter) and
   (b) to sum to a value no greater than 1 (imposing a bound on the sensitivity that the participant can achieve).
3. In the attention condition with target filter ϕ, the participant strives to choose the weights \(A_{j}\) in Eq. 18 to produce the attention filter \(f_{\phi} = f\) that will minimize response error (the difference between judged and correct centroids) when \(f_{\phi}\) is used in Eqs. 5 and 6.
Note that only if the participant possesses one or more basic filters that correlate strongly and positively with the target filter ϕ will he/she be able to use Eq. 18 to produce an attention filter f high enough in amplitude to estimate the response location robustly in spite of the noise in Eqs. 5 and 6.
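Eq. 18 and the correlation argument can be illustrated with a small Python sketch. It builds a mixture filter while enforcing the two constraints on the \(A_{j}\) (nonnegative, summing to at most 1) and computes the Pearson correlation of each basic filter with a target filter across item types. The basis filters and target below are hypothetical shapes invented for illustration; they are not estimated from any data, and using Pearson correlation as the measure of "correlates strongly and positively" is an assumption of this sketch.

```python
import math

def mix_filters(basis, A):
    """Eq. 18: f = sum_j A_j * f_j, with every A_j >= 0 and sum_j A_j <= 1."""
    if any(a < 0 for a in A) or sum(A) > 1.0 + 1e-12:
        raise ValueError("mixture weights violate the constraints of Eq. 18")
    return [sum(a * fj[k] for a, fj in zip(A, basis)) for k in range(len(basis[0]))]

def correlation(u, v):
    """Pearson correlation between two filters across item types."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Hypothetical basis filters over 8 Gabor contrasts and an inverse-graded target.
rising = [0.05, 0.1, 0.2, 0.35, 0.55, 0.75, 0.9, 1.0]
broad = [0.3, 0.6, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0]
inverse_target = [8, 7, 6, 5, 4, 3, 2, 1]
f = mix_filters([rising, broad], [0.6, 0.4])
for name, basis_filter in (("rising", rising), ("broad", broad)):
    print(name, round(correlation(basis_filter, inverse_target), 2))
```

With basis filters like these, both correlations with the inverse-graded target are negative, so no admissible mixture can approximate that target well, which is the situation the text associates with low Efficiency.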
It is beyond the scope of the current paper to discuss how one might estimate the basic filters \(f_{j}\) from the data of a centroid experiment or submit the filter mixture model to an empirical test. The model nonetheless provides several useful inferential principles. Under the filter mixture model, the attention filter \(f_{\phi}\) that a participant achieves in the attention condition with target filter ϕ is a weighted sum of the basic filters \(f_{j}\), \(j = 1,2,\cdots,N\). If Efficiency is high in this attention condition, this suggests that
1. some of the basic filters \(f_{j}\), \(j = 1,2,\cdots,N\), correlate strongly and positively with ϕ, and
2. these useful filters are given large weights in the sum that yields the attention filter \(f_{\phi}\).
Under the filter mixture model, the results of the Gabor experiment suggest that the number N of basic filters in human vision with sensitivity to the eight Gabor patterns used in our stimuli is at least two. This follows from the finding that participants are able to achieve clearly distinct attention filters in the Uniform vs. Graded attention conditions, in each case with high Efficiency. The attention filters achieved in the Inverse-graded, Lowest-only and Highest-only attention conditions also differ in form from each other as well as from the attention filters achieved in the Uniform and Graded attention conditions. However, the low Efficiencies observed in the latter three attention conditions imply that each of the available basic filters \(f_{j}\), \(j = 1,2,\cdots,N\), correlates either negatively or near 0 with the target filters used in these attention conditions; it would be precarious to infer from these data that human vision possesses basis filters that correlate positively with the target filters in these attention conditions.
Elaborating and fine-tuning the centroid method
Choosing an appropriate stimulus onset asynchrony
As stimulus onset asynchrony (SOA: the time between stimulus onset and the onset of the post-stimulus mask) is increased, response error decreases to some asymptotic level. The decrease of response error with an increase in SOA reflects increased effectiveness of visual input relative to early noise in the processing stream. The asymptote of response error at long stimulus durations reflects random perturbations of the response process that are invariant with respect to the strength of the input signal. For example, all of the following could contribute to the asymptotic level of response error: (1) trial-to-trial instability in the centroid computation, (2) error in localizing the to-be-clicked-on location, and (3) motor error in registering the response.
Typical applications of the centroid method should use an SOA that is brief enough to preclude eye movements and/or spatial shifts of attention yet long enough to ensure that response error has descended to its asymptotic level. A brief pilot experiment is often required to select an appropriate SOA. The Dots Experiment (below) includes an example of such a pilot experiment.
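One way to make "response error has descended to its asymptotic level" operational is sketched below in Python: among SOAs short enough to be acceptable, choose the shortest one whose mean response error lies within a small tolerance of the lowest observed error. The proportional tolerance rule, the maximum-SOA bound and the pilot numbers are all hypothetical; the paper selects the SOA by inspecting the pilot data.

```python
def choose_soa(mean_error_by_soa, max_soa_ms, tolerance=0.05):
    """Shortest SOA (ms) not exceeding max_soa_ms whose mean response error is within
    `tolerance` (as a proportion) of the smallest error among the allowed SOAs."""
    allowed = {soa: err for soa, err in mean_error_by_soa.items() if soa <= max_soa_ms}
    floor = min(allowed.values())
    for soa in sorted(allowed):
        if allowed[soa] <= floor * (1.0 + tolerance):
            return soa

# Invented pilot data: mean Euclidean response error (pixels) at each SOA (ms).
pilot = {48: 31.0, 82: 24.5, 150: 20.6, 300: 20.2}
print(choose_soa(pilot, max_soa_ms=300))
```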
More on stimulus clouds
Typically, to measure an attention filter, it is convenient to use full-set trials. Some applications require other sorts of trials. This section addresses the question of how best to construct the stimulus clouds used in some useful non-full-set trials.
Singleton trials
A “singleton” trial is a trial in which only a single item (whose type is typically fixed and selected to be highly salient) is presented. Singleton trials can provide a useful lower-bound on response error.
In an experiment using singleton trials, how should the locations of singletons be distributed? It is tempting to distribute singletons identically to individual items occurring in full-set clouds. This strategy, however, defeats the main purpose of including singleton trials: to derive a lower bound on response noise. If singletons are distributed identically to individual items occurring in full-set clouds, then the participant will need to produce much more variable responses on singleton trials than he/she does on full-set trials (because the centroid of a full-set cloud has lower variance than the individual items in the cloud). Empirically, we have observed that response error increases with the variability of the target location. To equalize the variability of target response locations on singleton trials and full-set trials, on each singleton trial,
1. derive a full-set cloud using the method described in “Generating full-set stimulus clouds”; then
2. place the singleton at the ϕ-weighted centroid of that cloud (Eq. 1).
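The two-step recipe for placing a singleton can be sketched in Python as follows. The cloud generator here is a hypothetical stand-in for the method in “Generating full-set stimulus clouds” (it draws i.i.d. Gaussian locations around an assumed center of a 512×512 region and does not reject out-of-region clouds); the ϕ-weighted centroid follows the usual weighted-mean definition. Items are represented as hypothetical (type, x, y) tuples.

```python
import random

def full_set_cloud(rng, n, dispersion, center=(256.0, 256.0)):
    """Hypothetical stand-in generator: n[k] items of type k at i.i.d. Gaussian locations."""
    return [(k, rng.gauss(center[0], dispersion), rng.gauss(center[1], dispersion))
            for k, nk in enumerate(n) for _ in range(nk)]

def weighted_centroid(items, phi):
    """phi-weighted centroid of a cloud of (type, x, y) items."""
    total = sum(phi[k] for k, _, _ in items)
    cx = sum(phi[k] * x for k, x, _ in items) / total
    cy = sum(phi[k] * y for k, _, y in items) / total
    return cx, cy

def singleton_location(rng, n, phi, dispersion):
    """Put the singleton where the correct response to a fresh full-set cloud would be."""
    return weighted_centroid(full_set_cloud(rng, n, dispersion), phi)

rng = random.Random(0)
print(singleton_location(rng, [2] * 8, [1, 1, 1, 1, 0, 0, 0, 0], 80.0))
```

Because the singleton inherits the distribution of full-set centroids rather than of individual items, its location varies about as much as the correct response on full-set trials, which is the point of the recipe.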
Target-only clouds
In many attention conditions of interest, the target filter ϕ assigns equal nonzero value to all items in a particular “target” subset T and 0 to the remaining “distractor” types D. We call such target filters binary. When the target filter is binary, it is useful to mix three sorts of trials during both condition-specific training and data collection. The first two sorts of trials are the “full-set” and “singleton” trials discussed above. For binary target filters, it is also useful to include “target-only” trials in which the stimulus cloud contains the same mix of target items as on full-set trials but contains no distractor items. Target-only trials are equivalent to providing the participant with a perfect attention filter. The point of including target-only trials in a binary attention condition is two-fold:
1. to compare performance (as reflected by the attention filter, Filter-fidelity, Efficiency and Data-drivenness) achieved in the presence of distractors with performance achieved with a perfect filter, i.e., with no distractors (e.g., Sperling et al. 1992), and
2. to enable the participant to refine his/her attention filter by experiencing stimulus clouds unpolluted by distractors.
Both aims are achieved by using target-only clouds in which item locations are distributed identically to the locations of target items on full-set trials.
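One simple way to obtain target-only clouds whose item locations are distributed exactly like the target items on full-set trials is to generate a full-set cloud and delete its distractors. The Python sketch below does this; the Gaussian cloud generator is a hypothetical stand-in for the paper's method, and items are hypothetical (type, x, y) tuples.

```python
import random

def full_set_cloud(rng, n, dispersion, center=(256.0, 256.0)):
    """Hypothetical stand-in generator: n[k] items of type k at i.i.d. Gaussian locations."""
    return [(k, rng.gauss(center[0], dispersion), rng.gauss(center[1], dispersion))
            for k, nk in enumerate(n) for _ in range(nk)]

def target_only_cloud(cloud, target_types):
    """Drop the distractors but leave every target item exactly where it was."""
    return [item for item in cloud if item[0] in target_types]

rng = random.Random(0)
full = full_set_cloud(rng, [2] * 8, 80.0)
targets_only = target_only_cloud(full, {0, 1, 2, 3})  # e.g., the four dark dot types
print(len(full), len(targets_only))
```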
The dots experiment
The Dots Experiment is an example application that uses binary target filters. This experiment illustrates the utility of including a small number of (1) singleton trials and (2) target-only trials interspersed among full-set trials in experimental blocks.
Methods
Item types
In this experiment, Types included the eight square dots of different gray levels shown in the bottom row of Fig. 1. Dots were 7×7 pixels, subtending 0.144 deg of visual angle at the viewing distance of 1 m. The Weber contrasts of the eight dots (relative to the uniform gray background) were approximately −1.0, −0.75, −0.5, −0.25, 0.25, 0.5, 0.75 and 1.0.
Displays
As in the Gabor experiment, the stimulus region surrounded by the thin black frame (Fig. 2a) comprised 512×512 pixels. At the viewing distance of 1 m, this region subtended 8 deg of visual angle. The luminance of the homogeneous background was 77 cd/m². Stimulus clouds were constructed as described in “Generating full-set stimulus clouds.” The expected centroid of each stimulus cloud was the center of the stimulus region. Each full-set stimulus cloud comprised two dots of each of the eight Weber contrasts. The dispersion (Eq. 2) of each full-set stimulus cloud was 80 pixels (1.65 deg of visual angle). With this dispersion, roughly 5% of the generated stimulus clouds were discarded because one or more item locations fell outside the stimulus region. All dots were constrained to be separated from each other by at least two pixels. Each target-only trial comprised two dots of each target type and no distractor dots; these dots were distributed in the stimulus field exactly as they would have been in a full-set trial. Each singleton trial contained a single black dot (Weber contrast −1.0); the location of this dot was distributed identically to the correct response on a full-set trial.
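For readers who think in luminances rather than contrasts, the dot luminances implied by the stated Weber contrasts and the 77 cd/m² background can be worked out with the standard definition \(C = (L - L_{b})/L_{b}\), i.e., \(L = L_{b}(1 + C)\). Because the contrasts are reported as approximate, the values computed by this short Python sketch are nominal, not measured.

```python
BACKGROUND_CD_M2 = 77.0  # background luminance reported above
WEBER_CONTRASTS = [-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0]

def luminance_from_weber(contrast, background=BACKGROUND_CD_M2):
    """Invert C = (L - L_b) / L_b to get the dot luminance L = L_b * (1 + C)."""
    return background * (1.0 + contrast)

dot_luminances = [luminance_from_weber(c) for c in WEBER_CONTRASTS]
print(dot_luminances)
```

The darkest dot (C = −1.0) is nominally black and the brightest (C = 1.0) is twice the background luminance.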
Individual trials
The sequence of events that occurred on an experimental trial precisely paralleled the sequence that occurred in a trial of the Gabor patterns experiment, except that the display items were the dots shown in Fig. 1 rather than the Gabor patterns. Display durations were identical to those used in the Gabor experiment (see “The Gabor pattern experiment”). The sequence of displays in a full-set trial is illustrated in Fig. 2.
Attention conditions
Participants were tested in two complementary full-set attention conditions using binary target filters ϕ. Stimulus clouds were composed of two dots of each of the eight contrasts. In the first, “Dark-only” attention condition, the four dot types darker than the background were the target items, and the four dot types lighter than the background were the distractor items. That is, the target filter assigned equal weight to the eight target dots darker than the background and weight 0 to the eight distractor dots brighter than the background. In the “Light-only” attention condition, the roles of dark and light dots were reversed; light dots became the targets and dark dots became the distractors.
Participants
The participants were the same four who participated in the Gabor experiment.
Pilot experiment: selection of the target-to-mask SOA
SOA (stimulus onset asynchrony) refers to the interval from the onset of the stimulus to the onset of the post-stimulus mask. Data collection in the main experiment was preceded by a brief pilot experiment to choose the SOA to be used in the main experiment. Even before collecting data in the pilot experiment, each participant ran separate blocks of 50 trials at each of four SOAs, using homogeneous dot clouds comprising eight black dots: 48 ms (36-ms stimulus exposure followed by a 12-ms blank frame prior to the mask), 82 ms (48-ms stimulus, 34-ms blank), 150 ms (100-ms stimulus, 50-ms blank) and 300 ms (100-ms stimulus, 200-ms blank). After this practice, the pilot experiment was conducted using only the Dark-only attention condition. Each participant performed a separate block of 140 trials for each of the four SOAs (560 trials total). A block of 140 trials comprised 100 full-set trials, 20 target-only trials and 20 singleton trials.
Results of the pilot experiment are shown in Fig. 5. For each of participants S1, S2, S3 and S4, the corresponding graph plots response error (the mean Euclidean distance of the participant’s responses from the correct responses across the 100 full-set trials at a given SOA) for SOA = 48, 82, 150 and 300 ms. For all participants, response error has declined to its asymptote by 150 ms, so this was the SOA used in the main experiment.
Main experiment: procedure
As in the Gabor experiment, general training was used for S3 and S4 but omitted for S1 and S2. Additionally, standard ϕ-specific training was used for all participants in each attention condition. As in the pilot study, experimental blocks comprised 100 full-set trials, 20 target-only trials, and 20 singleton trials in a mixed list, i.e., trials within a block were randomly sequenced.
For each participant, performance in each task showed no significant improvement from block 2 to block 3. Therefore, each participant performed just three blocks of trials in each attention condition. The first block was discarded as practice; the second and third were retained as data.
Results and discussion
The results for participants S1, S2, S3, and S4 are plotted in columns 1, 2, 3, and 4 of Fig. 6. All participants achieve strikingly different attention filters in the Dark-only versus the Light-only attention conditions. Specifically, the F-test described in the section entitled “Testing whether two attention filters are significantly different” yields, for S1, \(F_{7,780} = 140.96\), p ≈ 0; for S2, \(F_{7,780} = 271.01\), p ≈ 0; for S3, \(F_{7,780} = 111.78\), p ≈ 0; and for S4, \(F_{7,780} = 108.85\), p ≈ 0. The attention filters achieved in the Dark-only attention condition give dramatically higher weight to dots with negative Weber contrasts than to dots with positive Weber contrasts (on average, 5.61× greater), and the reverse is true in the Light-only attention condition (5.78× greater). Note, however, that the Filter-fidelity values (inset in panels) tend to be smaller than the Filter-fidelity values in the Uniform and Graded attention conditions of the Gabor experiment. This reflects the fact that the attention filters achieved by all participants deviate strongly and systematically from the target filters. In particular,
1. participants are unable to completely ignore distractors (distractor weights tend to deviate positively and, in most cases, significantly from 0 in the attention filters achieved), and
2. participants are unable to give equal weight to all target items (target items with higher absolute Weber contrasts tend to receive higher weight).
By comparison, in most cases the attention filter achieved for target-only trials is more uniform across the four target items than is \(f_{\phi,\text{full-set}}\). (The exceptions are S3 and S4 in the Light-only attention condition.) This suggests that one of the costs incurred in ignoring dots of the distractor polarity may be a loss of sensitivity to low-salience dots of the target polarity (see Note 5).
Each panel contains the singleton-corrected Efficiency \(scEff_{\text{full-set}}\) (\(scEff_{\text{target-only}}\)), which gives the proportion of dots the participant would need to include in his/her centroid computation to account for the random error in his/her responses on full-set (target-only) trials if the only source of error were missed dots. In all cases, singleton-corrected Efficiency values are high, indicating that the participant can indeed deploy the attention filter he/she has achieved broadly across space with high sensitivity. In each panel, \(scEff_{\text{target-only}}\) is slightly higher than \(scEff_{\text{full-set}}\), indicating that one cost of filtering out distractors is to inject noise into the response-production process.
Averaging across participants and the two attention conditions, the standard deviation of response error on singleton trials was 46% (50%) as large as \(\widehat {\sigma }\) estimated from full-set (target-only) trials. Because the variances of independent error sources add, and \(0.46^{2} \approx 0.21\) while \(0.50^{2} = 0.25\), this suggests that approximately \(\frac {1}{4}\) of the variance in random response error on full-set and target-only trials results from error in localizing and moving the mouse to click on the selected response location; by the same token, approximately \(\frac {3}{4}\) of that variance results from processing the stimulus and computing the centroid.
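The ¼ versus ¾ split follows because variances of independent error sources add, so a standard-deviation ratio r corresponds to a variance share of r². A minimal Python sketch, assuming that motor (click) error and stimulus-processing error are independent and additive; the function names are illustrative and are not taken from the authors' Matlab code. The second helper computes the singleton-corrected \(\sigma_{SC}\) used for singleton-corrected Efficiency (Appendix 3).

```python
import math


def variance_split(sd_ratio):
    """Split random response-error variance into a motor (click) share and a
    stimulus-processing share, given the ratio of the singleton-trial error SD
    to sigma-hat. Assumes the two error sources are independent and additive."""
    motor_share = sd_ratio ** 2  # variances, not standard deviations, add
    return motor_share, 1.0 - motor_share


def singleton_corrected_sigma(sigma_hat, sigma_singleton):
    """sigma_SC = sqrt(sigma_hat^2 - sigma_singleton^2): the SD left over once
    the motor (singleton) component has been removed."""
    return math.sqrt(sigma_hat ** 2 - sigma_singleton ** 2)


for label, ratio in [("full-set", 0.46), ("target-only", 0.50)]:
    motor, stimulus = variance_split(ratio)
    print(f"{label}: motor {motor:.2f}, stimulus {stimulus:.2f}")
# full-set: motor 0.21, stimulus 0.79
# target-only: motor 0.25, stimulus 0.75
```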
Additional performance measures
Attention filter selectivity
Let \(f_{\phi}\) be the attention filter achieved by a participant in an attention condition with a binary target filter ϕ. In this case, it is useful to define “filter selectivity” as the average of \(f_{\phi}(t)\) across all target item types \(t \in Types\) divided by the average of \(|f_{\phi}(d)|\) across all distractor item types \(d \in Types\). Filter selectivities of 10 or higher are commonly observed in attending to black versus white items (Inverso et al. in press) or red versus green items or large versus small items (Blair et al. 2015); these represent highly selective attention filters. By contrast, in the Dots experiment, the filter selectivities achieved by participants S1, S2, S3 and S4 in the Dark-only (Light-only) attention condition were all lower than 10: 7.29, 7.97, 3.91 and 4.89 (9.55, 6.88, 5.28 and 3.78).
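Filter selectivity is simply a ratio of two averages. The sketch below is a minimal Python illustration; the eight Weber-contrast keys and the filter values are hypothetical, not data from the Dots experiment.

```python
def filter_selectivity(attention_filter, target_types):
    """Mean filter weight over target types divided by the mean absolute
    filter weight over distractor types (binary target filters only)."""
    targets = [w for k, w in attention_filter.items() if k in target_types]
    distractors = [abs(w) for k, w in attention_filter.items() if k not in target_types]
    if not targets or not distractors:
        raise ValueError("need at least one target type and one distractor type")
    return (sum(targets) / len(targets)) / (sum(distractors) / len(distractors))


# Hypothetical Dark-only filter, keyed by Weber contrast (values sum to 1).
f_dark_only = {-1.0: 0.22, -0.75: 0.20, -0.5: 0.18, -0.25: 0.16,
               0.25: 0.04, 0.5: 0.05, 0.75: 0.06, 1.0: 0.09}
dark_types = {c for c in f_dark_only if c < 0}
print(round(filter_selectivity(f_dark_only, dark_types), 2))  # 3.17
```

Taking the absolute value in the denominator keeps negative distractor weights from inflating the ratio.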
The productivity function
Another potentially useful descriptor (whose usefulness is not limited to attention conditions with binary target filters) is the “productivity function,” \(P_{\phi}(k) = scEff_{\phi} \times f_{\phi}(k)\), \(k \in Types\), where \(f_{\phi}\) is the attention filter achieved by a given participant in the condition with target filter ϕ and \(scEff_{\phi}\) is the singleton-corrected Efficiency achieved in that condition. For any item type k, \(P_{\phi}(k)\) provides an estimate of the overall effectiveness with which items of type k influence responses in the centroid task in the attention condition with target filter ϕ. Insofar as \(f_{\phi}\) characterizes the perceptual limits on the information available to brain processes subsequent to the early centroid computation, and insofar as the Efficiency \(scEff_{\phi}\) characterizes cognitive limits, the productivity function \(P_{\phi}\) estimates the portion of the stimulus information that is available to subsequent brain processes.
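The productivity function is a pointwise rescaling of the attention filter by a single Efficiency value. A minimal Python sketch, with a hypothetical Efficiency and a hypothetical three-type filter:

```python
def productivity(sc_eff, attention_filter):
    """Productivity function P_phi(k) = scEff_phi * f_phi(k) for every item type k."""
    if not 0.0 <= sc_eff <= 1.0:
        raise ValueError("singleton-corrected Efficiency should lie in [0, 1]")
    return {k: sc_eff * w for k, w in attention_filter.items()}


# Hypothetical values: scEff = 0.8 and a three-type filter that sums to 1.
p = productivity(0.8, {"low": 0.5, "mid": 0.3, "high": 0.2})
print({k: round(v, 2) for k, v in p.items()})  # {'low': 0.4, 'mid': 0.24, 'high': 0.16}
```

Because \(scEff_{\phi}\) is a scalar, \(P_{\phi}\) has the same shape as \(f_{\phi}\); across attention conditions it can differ in overall height as well as in shape.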
General discussion
An attention filter is a process, initiated by a participant in the context of a task requiring feature-based attention, which operates broadly across space to modulate the relative effectiveness with which different features in the retinal input influence task performance. As we have shown, the specific task of extracting the centroid of a cloud of items can form the core of a method for deriving precise, quantitative measurements of attention filters.
The feature-similarity gain model and the centroid task
A prominent theory of FBA is the “feature-similarity gain model” (Treue and Martinez-Trujillo 1999; Martinez-Trujillo and Treue 2004). Under this theory, “...the up- or down-regulation of the gain of a sensory neuron reflects the similarity of the features of the currently behaviorally relevant target and the sensory selectivity of the neuron along all target dimensions” (Treue and Martinez-Trujillo 1999). In other words, the theory proposes that FBA amplifies the responses of neurons sensitive to the attended feature and attenuates the responses of neurons insensitive to it.
The feature-similarity gain model is intended first and foremost to apply to deployments of FBA in which the participant strives to heighten the salience of a single feature, e.g., a specific direction of motion or a specific color. It might be argued that the Highest-only and Lowest-only attention conditions in the Gabor experiment aim at FBA deployments of this sort; however, participants perform poorly in each of these conditions, suggesting that human vision lacks neurons selective for the target feature in either task. In the Uniform, Graded, and Inverse-graded attention conditions of the Gabor experiment, as well as the Dark-only and Light-only attention conditions of the Dots experiment, participants strive to deploy FBA in ways that heighten the salience of a range of feature values rather than a single, specific feature value.
The feature-similarity gain model might be generalized to handle FBA deployments of this sort by assuming that the gain of a given class of neurons is set in a given attention condition according to the degree to which the differential sensitivity of that neuron to the items in the display “matches the target filter” (i.e., the function ϕ in Eq. 1 used to give feedback in a particular attention condition).
There are various possible interpretations of the phrase “matches the target filter”; however, under all such definitions, the proposed generalization of the feature-similarity gain model is likely to produce suboptimal performance in the centroid task. In particular, consider the case in which human vision comprises (1) a particular class \(C_{ideal}\) of neurons whose sensitivity to the different items in Types precisely matches the target filter in a given attention condition as well as (2) other classes \(C_{1}, C_{2}, \ldots\) of neurons whose sensitivity to the different items in Types matches the target filter less well. The generalized feature-similarity gain model would assign gain to the neurons in a given class \(C_{k}\) in proportion to the degree to which the activation produced in neurons in class \(C_{k}\) by the items in Types “matches the target filter.” Typically, then, various classes \(C_{k}\) of neurons would be likely to receive non-zero gain under the generalized feature-similarity gain model. However, under nearly all definitions of the phrase “matches the target filter,” performance in the centroid task will be optimized by assigning full gain to neuron class \(C_{ideal}\) and zero gain to all other classes of neurons.
It is an empirical question whether the generalized feature-similarity gain model holds. We submit, however, that it would be surprising if human vision were committed to a general strategy so likely to produce suboptimal behavior given the available neural resources.
Beyond the centroid task
It is important to realize that the attention filters a participant can achieve for one task may differ from those he/she can achieve for another task. A promising direction for future work is to compare the attention filters for various different sets of Types across different tasks. The centroid task is especially appealing because of the remarkable statistical power it confers in estimating attention filters. Nonetheless, experiments to measure attention filters using other tasks are straightforward to design. For example, it is easy to imagine paradigms generalizing the sorts of experiments that have been used to investigate the extraction of summary statistics from ensembles of items (Alvarez and Oliva 2008; Alvarez 2011; Ariely 2001; Chong and Treisman 2003, 2005a, 2005b).
One family of such experiments might use dots of the sort used in the Dots experiment. On each trial the stimulus is a cloud of dots (with the counts of different dot types varying across trials). In a given attention condition, the participant is asked to apply a target attention filter ϕ to the cloud of dots in the display and to sum the filter output across space to extract a summary statistic; if this statistic is greater than some fixed criterion, then the correct response is “A,” otherwise “B.” For example, in an attention condition analogous to the Dark-only condition in the Dots experiment, ϕ assigns the value 1 (0) to all dots of negative (positive) Weber contrast, and the criterion might be set at 9.5. In this case, the correct response would be “A” if the number of dots darker than the background was 10 or more and “B” otherwise. This rule would be used to give trial-by-trial feedback. The data would consist of (1) the matrix M whose jth row contains the counts of different dot types in the stimulus on trial j, and (2) the vector R whose jth entry is 1 (0) if the participant responded “A” (“B”) on trial j. A simple probit model (i.e., a generalized linear model with a probit, or cumulative Gaussian, link function) that uses the counts of different dot types as variables to predict the participant’s trial-by-trial classifications of stimulus clouds suffices to estimate the attention filter achieved by the participant in a given attention condition.
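The proposed classification analysis can be sketched as follows. This is a hedged illustration rather than the authors' procedure. It simulates hypothetical Dark-only data (two dark and two light dot types, counts drawn uniformly from 0 to 8, Gaussian decision noise with SD 1.5, criterion 9.5) and fits the probit model by Fisher scoring, which is one standard way to obtain the maximum-likelihood fit. The fitted slopes, normalized to sum to 1, estimate the attention filter, and the intercept absorbs the criterion.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_probit(counts, responses, max_iter=50, tol=1e-8):
    """Fisher-scoring fit of P("A") = Phi(b0 + sum_k b_k * count_k).
    Returns [b0, b1, ..., bK]; b1..bK are proportional to the attention filter."""
    rows = [[1.0] + [float(c) for c in row] for row in counts]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for x, y in zip(rows, responses):
            z = sum(b * xi for b, xi in zip(beta, x))
            prob = min(max(norm_cdf(z), 1e-12), 1.0 - 1e-12)  # avoid division by 0
            dens = norm_pdf(z)
            score = (y - prob) * dens / (prob * (1.0 - prob))
            weight = dens * dens / (prob * (1.0 - prob))
            for i in range(p):
                grad[i] += score * x[i]
                for j in range(p):
                    info[i][j] += weight * x[i] * x[j]
        step = solve(info, grad)  # Fisher-scoring update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated Dark-only classification data (all values hypothetical).
rng = random.Random(1)
true_filter = [1.0, 1.0, 0.2, 0.1]  # two dark types, then two light types
counts, responses = [], []
for _ in range(2000):
    row = [rng.randint(0, 8) for _ in true_filter]
    evidence = sum(f * n for f, n in zip(true_filter, row)) + rng.gauss(0.0, 1.5)
    counts.append(row)
    responses.append(1 if evidence > 9.5 else 0)

slopes = fit_probit(counts, responses)[1:]
estimated_filter = [b / sum(slopes) for b in slopes]
print([round(w, 2) for w in estimated_filter])
```

In practice a statistics package would normally be used for the fit (for example, a GLM routine with a binomial family and probit link). The hand-written solver here only keeps the sketch dependency-free.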
Will this classification task yield attention filters identical to the centroid task? The answer to this question is likely to have important consequences for our understanding of the functional architecture of human vision.
Final remarks
The centroid method described here provides guidelines developed (often through painful experience) for efficiently measuring attention filters. The two example experiments illustrate various aspects of the method. The method is distinguished by its power and simplicity:
1. Statistical power: The attention filter plotted in each of the 12 panels of Fig. 3 is derived from only 200 trials (∼10–13 min). The same is true for each of the full-set attention filters in Fig. 6. Achieving comparable results with a standard psychophysical choice paradigm (e.g., Nam and Chubb 2000; Chubb and Nam 2000; Chubb and Talevich 2002) would require 3000 or more trials. The target-only attention filters plotted in each of the four panels of Fig. 6 are based on only 40 trials; although their confidence intervals are larger, these curves are nonetheless very informative.
2. Modeling simplicity: Estimating an attention filter (as well as the other model parameters, \(x_{default}\), \(y_{default}\), V and σ) from centroid task data is easy: a simple linear model predicts the x- and y-coordinates of the participant’s response from the locations of the different types of dots, across all of the trials in the data set.
Notes
1. As will become clear, fixing the \(n_{k}\)’s across trials allows the use of a very simple and efficient linear regression procedure to estimate the attention filter \(f_{\phi}\) achieved by the participant in the attention condition with any given target filter ϕ. It also makes it easy (1) to define the “Filter-fidelity” statistic (Eq. 8) that is used to measure the deviation of \(f_{\phi}\) from ϕ and (2) to compute the “\(p_{miss}\)” statistic that is used to measure the corruption of the participant’s responses by random error.
2. Performance was poorer in the Inverse-graded attention condition than in either the Uniform or Graded attention conditions. Accordingly, a large number (600) of ϕ-specific training trials were devoted to this attention condition for each of participants S3 and S4 to ensure that performance had stabilized. In addition, S3 required more ϕ-specific training trials in the Graded attention condition (400) than did S4 (100). S3 completed the attention conditions in the following order: Inverse-graded (600 ϕ-specific training trials and 200 data-collection trials over 2 days) → Uniform (100 ϕ-specific training trials and 200 data-collection trials in 1 day) → Graded (400 ϕ-specific training trials and 200 data-collection trials over 2 days). S4 was tested in the reverse order: Graded (100 ϕ-specific training trials and 200 data-collection trials in 1 day) → Uniform (100 ϕ-specific training trials and 200 data-collection trials in 1 day) → Inverse-graded (600 ϕ-specific training trials and 200 data-collection trials over 2 days).
3. In an experiment that also includes target-only trials (see “Target-only clouds”), the same method can be used to estimate the attention filter achieved across the target-only trials (provided there are multiple simultaneous targets).
4. An important type of model failure can result from the use of a centroid computation that systematically overweights or underweights peripheral display items vs. items near the center of the stimulus cloud. Such spatially inhomogeneous weighting of display items was extensively analyzed by Drew et al. (2010), who found considerable variation across participants. It should be noted, however, that the participants of Drew et al. (2010) received no general training in the centroid task. The purpose of general training is precisely to minimize such idiosyncratic differences between the centroid computations used by different participants. Moreover, the model parameters introduced by Drew et al. (2010) to accommodate variations in the centroid computations used by different participants did not significantly influence the estimated attention filters. For these reasons, individual differences in the centroid computations used by different participants are not further considered here.
5. A prominent theory of cognitive aging proposes that as people age they become less able to inhibit irrelevant information at different processing stages (Hasher and Zacks 1988; Lustig et al. 2007). In particular, evidence is accumulating to support the claim that older adults may be specifically impaired in deploying feature-based attention (Quigley et al. 2010; Quigley and Muller 2014). This suggests that comparison of the attention filters and Efficiencies achieved in full-set vs. target-only trials in centroid experiments such as the Dots experiment reported here may provide a useful measure of cognitive aging.
References
Alvarez, G.A. (2011). Representing multiple objects as an ensemble enhances visual cognition. Trends in Cognitive Sciences, 15(3), 122–131.
Alvarez, G.A., & Oliva, A. (2008). The representation of simple ensemble visual features outside the focus of attention. Psychological Science, 19(4), 392–398.
Ariely, D. (2001). Seeing sets: Representation by statistical properties. Psychological Science, 12(2), 157–162.
Arman, A.C., Ciaramitaro, V.M., & Boynton, G.M. (2006). Effects of feature-based attention on the motion aftereffect at remote locations. Vision Research, 46(18), 2968–2976.
Baldassi, S., & Verghese, P. (2005). Attention to locations and features: Different topdown modulation of detector weights. Journal of Vision, 5(6), 556–570.
Ball, K., & Sekuler, R. (1981). Adaptive processing of visual motion. Journal of Experimental Psychology: Human Perception and Performance, 7(4), 780–794.
Blair, G., Wright, C.E., Chubb, C., & Sperling, G. (2015). Disc size supports top-down, selective attention in a task requiring integration across multiple targets. Journal of Vision, 15(12), 897.
Chong, S.C., & Treisman, A.M. (2003). Representation of statistical properties. Vision Research, 43(4), 393–404.
Chong, S.C., & Treisman, A.M. (2005a). Attentional spread in the statistical processing of visual displays. Perception and Psychophysics, 67(1), 1–13.
Chong, S. C., & Treisman, A. M. (2005b). Statistical processing: Computing the average size in perceptual groups. Vision Research, 45(7), 891–900.
Chubb, C., & Nam, J.-H. (2000). The variance of high contrast texture is sensed using negative half-wave rectification. Vision Research, 40, 1695–1709.
Chubb, C., & Talevich, J. (2002). Attentional control of texture orientation judgments. Vision Research, 42, 311–330.
Dakin, S.C. (2001). Information limit on the spatial integration of local orientation signals. Journal of the Optical Society of America A, 18, 1016–1026.
Davis, E.T., & Graham, N. (1981). Spatial frequency uncertainty effects in the detection of sinusoidal gratings. Vision Research, 21(5), 705–712.
Drew, S., Chubb, C., & Sperling, G. (2010). Precise attention filters for Weber contrast derived from centroid estimations. Journal of Vision, 10(10:20), 1–16. http://www.journalofvision.org/content/10/10/20.
Felisberti, F.M., & Zanker, J.M. (2005). Attention modulates perception of transparent motion. Vision Research, 45(19), 2587–2599.
Fieller, E. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society, Series B, 16(2), 175–185.
Graybill, F. (1961). An introduction to linear statistical models (Vol. 1). New York: McGraw-Hill.
Haenny, P.E., Maunsell, J.H., & Schiller, P.H. (1988). State-dependent activity in monkey visual cortex. II. Retinal and extraretinal factors in V4. Experimental Brain Research, 69(2), 245–259.
Hasher, L., & Zacks, R.T. (1988). Working memory, comprehension, and aging: A review and a new view. In G. H. Bower (Ed.), Psychology of Learning and Motivation (Vol. 22, pp. 193–225). New York: Academic Press.
Hayden, B.Y., & Gallant, J.L. (2005). Time course of attention reveals different mechanisms for spatial and feature-based attention in area V4. Neuron, 47(5), 637–643.
Ho, T.C., Brown, S., Abuyo, N.A., Ku, E.-H.J., & Serences, J.T. (2012). Perceptual consequences of feature-based attentional enhancement and suppression. Journal of Vision, 12(8):15, 1–17. https://doi.org/10.1167/12.8.15
Inverso, M., Sun, P., Chubb, C., Wright, C.E., & Sperling, G. (In press). Evidence against global attention filters selective for absolute bar-orientation in human vision. Attention, Perception and Psychophysics.
Kamitani, Y., & Tong, F. (2005). Decoding the visual and subjective contents of the human brain. Nature Neuroscience, 8(5), 679–685.
Lankheet, M.J., & Verstraten, F.A. (1995). Attentional modulation of adaptation to two-component transparent motion. Vision Research, 35(10), 1401–1412.
Ling, S., Liu, T., & Carrasco, M. (2009). How spatial and feature-based attention affect the gain and tuning of population responses. Vision Research, 49(10), 1194–1204.
Liu, T., Larsson, J., & Carrasco, M. (2007). Feature-based attention modulates orientation-selective responses in human visual cortex. Neuron, 55(2), 313–323.
Liu, T., & Mance, I. (2011). Constant spread of feature-based attention across the visual field. Vision Research, 51(1), 26–33.
Liu, T., Slotnick, S.D., Serences, J. T., & Yantis, S. (2003). Cortical mechanisms of feature-based attentional control. Cerebral Cortex, 13(12), 1334–1343.
Lu, Z.-L., & Dosher, B.A. (1998). External noise distinguishes attention mechanisms. Vision Research, 38, 1183–1198.
Lustig, C., Hasher, L., & Zacks, R.T. (2007). Inhibitory deficit theory: Recent developments in a “new view.” In D. S. Gorfein, & C. M. MacLeod (Eds.), The place of inhibition in cognition (pp. 145–162). Washington: American Psychological Association.
Martinez-Trujillo, J.C., & Treue, S. (2004). Feature-based attention increases the selectivity of population responses in primate visual cortex. Current Biology, 14(9), 744–751.
Maunsell, J.H., Sclar, G., Nealey, T.A., & DePriest, D.D. (1991). Extraretinal representations in area V4 in the macaque monkey. Visual Neuroscience, 7(6), 561–573.
McAdams, C.J., & Maunsell, J.H. (2000). Attention to both space and feature modulates neuronal responses in macaque area V4. Journal of Neurophysiology, 83(3), 1751–1755.
Motter, B.C. (1994). Neural correlates of attentive selection for color or luminance in extrastriate area V4. The Journal of Neuroscience, 14(4), 2178–2189.
Muller, M.M., Anderson, S., Trujillo, N.J., Valdes-Sosa, P., & Hillyard, S.A. (2006). Feature-selective attention enhances color signals in early visual areas of the human brain. Proceedings of the National Academy of Sciences, USA, 103(38), 14250–14254.
Nam, J.-H., & Chubb, C. (2000). Texture luminance judgments are approximately veridical. Vision Research, 40, 1677–1694.
O’Craven, K.M., Rosen, B.R., Kwong, K.K., Treisman, A.M., & Savoy, R.L. (1997). Voluntary attention modulates fMRI activity in human MT-MST. Neuron, 18(4), 591–598.
Quigley, C., Andersen, S. K., Schultz, L., Grunwald, M., & Muller, M. M. (2010). Feature-selective attention: Evidence for a decline in old age. Neuroscience Letters, 474(1), 5–8.
Quigley, C., & Muller, M.M. (2014). Feature-selective attention in healthy old age: A selective decline in selective attention? The Journal of Neuroscience, 34(7), 2471–2476.
Rossi, A.F., & Paradiso, M.A. (1995). Feature-specific effects of selective visual attention. Vision Research, 35(5), 621–634.
Saenz, M., Buracas, G.T., & Boynton, G.M. (2002). Global effects of feature-based attention in human visual cortex. Nature Neuroscience, 5(7), 631–632.
Saenz, M., Buracas, G. T., & Boynton, G.M. (2003). Global feature-based attention for motion and color. Vision Research, 43(6), 629–637.
Schoenfeld, M.A., Hopf, J.M., Martinez, A., Mai, H.M., Sattler, C., Gasde, A., Heinze, H.J., & Hillyard, S.A. (2007). Spatio-temporal analysis of feature-based attention. Cerebral Cortex, 17(10), 2468–2477.
Seidemann, E., & Newsome, W.T. (1999). Effect of spatial attention on the responses of area MT neurons. Journal of Neurophysiology, 81(4), 1783–1794.
Serences, J.T., & Boynton, G.M. (2007). Feature-based attentional modulations in the absence of direct visual stimulation. Neuron, 55(2), 301–312.
Serences, J.T., Saproo, S., Scolari, M., Ho, T., & Muftuler, L.T. (2009). Estimating the influence of attention on population codes in human visual cortex using voxel-based tuning functions. Neuroimage, 44(1), 223–231.
Shih, S.I., & Sperling, G. (1996). Is there feature-based attentional selection in visual search? Journal of Experimental Psychology: Human Perception and Performance, 22(3), 758–779.
Solomon, J.A. (2010). Visual discrimination of orientation statistics in crowded and uncrowded arrays. Journal of Vision, 10(14):19, 1–16. https://doi.org/10.1167/10.14.19
Sperling, G. (1963). A model for visual memory tasks. Human Factors, 5, 19–31.
Sperling, G., Wurst, S. A., & Lu, Z.-L. (1992). Using repetition detection to define and localize the processes of selective attention. In D. E. Meyer, & S. Kornblum (Eds.), Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience–A silver jubilee, chapter 12 (pp. 265–298). Cambridge: MIT Press.
Treue, S. (2001). Neural correlates of attention in primate visual cortex. Trends in Neurosciences, 24(5), 295–300.
Treue, S., & Martinez-Trujillo, J.C. (1999). Feature-based attention influences motion processing gain in macaque visual cortex. Nature, 399(6736), 575–579.
White, A.L., & Carrasco, M. (2011). Feature-based attention involuntarily and simultaneously improves visual performance across locations. Journal of Vision, 11(6):15, 1–10. https://doi.org/10.1167/11.6.15
Zhang, W., & Luck, S.J. (2009). Feature-based attention modulates feedforward visual processing. Nature Neuroscience, 12(1), 24–25.
Acknowledgments
This work was supported by NSF Award BCS-0843897.
Appendices
Appendix 1: Estimating model parameters
Appendix 4, “Matlab code for fitting the centroid model,” provides Matlab code that implements all of the computations described in Appendices 1–3.
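It is possible to rewrite Eqs. 5 and 6 as follows. For each trial j,
$$ R_{x}(j) = \sum_{k=1}^{N_{types}+2} W_{k} X_{k}(j) + \epsilon_{x}(j), $$(19)
$$ R_{y}(j) = \sum_{k=1}^{N_{types}+2} W_{k} Y_{k}(j) + \epsilon_{y}(j), $$(20)
where \(\epsilon_{x}(j)\) and \(\epsilon_{y}(j)\) are independent Gaussian errors with mean 0 and standard deviation σ, and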
1. for \(k = 1,2,\ldots,N_{types}\), \(X_{k}(j)\) is the sum of the x-locations of all items of type k in the cloud on trial j, and \(Y_{k}(j)\) is the corresponding sum of y-locations,
2. \(X_{N_{types}+1}(j) = 1\), \(X_{N_{types}+2}(j) = 0\),
3. \(Y_{N_{types}+1}(j) = 0\), \(Y_{N_{types}+2}(j) = 1\), and
4. $$ W_{k} = \left\{\begin{array}{ll} \frac{V}{S}f_{\phi} (k) & \text{for}~k=1,2,\cdots,N_{types},\\ (1-V)x_{default} & \text{if}~k=N_{types}+1, \\ (1-V)y_{default} & \text{if}~k=N_{types}+2. \end{array} \right. $$(21)
Each of Eqs. 19 and 20 describes a basic linear model for which standard regression is the appropriate analysis. Moreover, both equations use the same weights \(W_{k}\), \(k = 1,2,\ldots,N_{types}+2\). Accordingly, standard matrix methods can be used to estimate the model parameters \(x_{default}\), \(y_{default}\), \(f_{\phi}\), V, and σ.
Let X be the \(N_{trials}\times(N_{types}+2)\) matrix whose (j, k)th entry is \(X_{k}(j)\), and let Y be the corresponding matrix generated from the \(Y_{k}(j)\)’s. Then
1. form the full stimulus matrix M by appending the matrix Y to the bottom of the matrix X: i.e., for \(j = 1,2,\ldots,2N_{trials}\) and \(k = 1,2,\ldots,N_{types}+2\),
$$ M(j,k) = \left\{ \begin{array}{ll} X_{k}(j) & j \le N_{trials}\\Y_{k}(j-N_{trials}) & j > N_{trials} \end{array} \right.,~\text{ and } $$(22)
2. form the full response vector R by appending \(R_{y}\) to \(R_{x}\): i.e., for \(j = 1,2,\ldots,2N_{trials}\),
$$ R(j) = \left\{ \begin{array}{ll} R_{x}(j) & j \le N_{trials}\\R_{y}(j-N_{trials}) & j > N_{trials} \end{array} \right. . $$(23)
Then derive the weight vector \(\widehat {W}\) minimizing
$$ \left\| R - M\widehat{W} \right\|^{2} = \sum_{j=1}^{2N_{trials}} \left( R(j) - \sum_{k=1}^{N_{types}+2} M(j,k)\,\widehat{W}_{k} \right)^{2}, $$(24)
i.e., find the weight vector \(\widehat {W}\) that acts on the stimulus matrix M to best predict the observed response vector R. In fact, \(\widehat {W}\) can be derived via linear regression.
Specifically, for M the matrix given by Eq. 22 and R the vector given by Eq. 23, the linear regression model assumes that
$$ R = MW + Q, $$(25)
where Q is a \(2N_{trials}\times 1\) vector whose entries are jointly independent Gaussian random variables, all with mean 0 and standard deviation σ. Under these assumptions, the maximum likelihood estimate of W is given by
$$ \widehat{W} = M^{\ddagger} R, $$(26)
where \(M^{\ddagger}\) denotes the (left-side) pseudoinverse of M, i.e., \(M^{\ddagger}\) satisfies \(M^{\ddagger}M = I\) (where I is the \((N_{types}+2)\times(N_{types}+2)\) identity matrix). When M has full column rank, \(M^{\ddagger} = (M^{T}M)^{-1}M^{T}\).
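Equations 22 to 26 translate directly into a few lines of linear algebra. The following Python sketch is dependency-free and is not the authors' Matlab code (Appendix 4). It builds M from simulated clouds, solves the normal equations \((M^{T}M)\widehat{W} = M^{T}R\) (which give the same \(\widehat{W}\) as \(M^{\ddagger}R\) when M has full column rank), and computes \(\widehat{\sigma}\) from the residuals with \(2N_{trials}-(N_{types}+2)\) degrees of freedom, assumed here to match Eq. 11, which is not shown in this section. Item positions, filter values, V, default location and noise level are all hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def stack_design(clouds, n_types):
    """Rows of M (Eq. 22): N_trials x-rows followed by N_trials y-rows.
    Each cloud is a list of (type_index, x, y) items, type_index in 0..n_types-1."""
    x_rows, y_rows = [], []
    for items in clouds:
        xs = [0.0] * n_types + [1.0, 0.0]  # X_k(j); X_{N+1}(j) = 1; X_{N+2}(j) = 0
        ys = [0.0] * n_types + [0.0, 1.0]  # Y_k(j); Y_{N+1}(j) = 0; Y_{N+2}(j) = 1
        for k, x, y in items:
            xs[k] += x
            ys[k] += y
        x_rows.append(xs)
        y_rows.append(ys)
    return x_rows + y_rows


def least_squares(m, r):
    """W-hat = (M^T M)^{-1} M^T R, plus sigma-hat from the residuals."""
    p = len(m[0])
    mtm = [[sum(row[i] * row[j] for row in m) for j in range(p)] for i in range(p)]
    mtr = [sum(row[i] * ri for row, ri in zip(m, r)) for i in range(p)]
    w = solve(mtm, mtr)
    resid = [ri - sum(wk * mk for wk, mk in zip(w, row)) for row, ri in zip(m, r)]
    dof = len(r) - p  # 2 * N_trials - (N_types + 2)
    return w, math.sqrt(sum(e * e for e in resid) / dof)


# Simulated centroid data; every parameter value here is hypothetical.
rng = random.Random(7)
n_per_type, f_true = [4, 4, 4], [0.5, 0.3, 0.2]
v_true, x_def, y_def, noise_sd = 0.9, 0.1, -0.2, 0.05
s = sum(n * f for n, f in zip(n_per_type, f_true))  # S = sum_k n_k f(k)
clouds, rx, ry = [], [], []
for _ in range(200):
    items = [(k, rng.uniform(-1, 1), rng.uniform(-1, 1))
             for k, n in enumerate(n_per_type) for _ in range(n)]
    cx = sum(f_true[k] * x for k, x, _ in items) / s
    cy = sum(f_true[k] * y for k, _, y in items) / s
    rx.append(v_true * cx + (1 - v_true) * x_def + rng.gauss(0, noise_sd))
    ry.append(v_true * cy + (1 - v_true) * y_def + rng.gauss(0, noise_sd))
    clouds.append(items)

w_hat, sigma_hat = least_squares(stack_design(clouds, len(n_per_type)), rx + ry)
print([round(w, 3) for w in w_hat], round(sigma_hat, 3))
```

Forming \(M^{T}M\) explicitly is fine for a handful of weights; a QR- or SVD-based solver would be the more robust choice when M is poorly conditioned.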
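Given \(\widehat {W}\), estimates of V, \(f_{\phi}\), \(x_{default}\) and \(y_{default}\) are obtained as follows:
$$ \widehat{V} = \sum_{k=1}^{N_{types}} n_{k}\widehat{W}_{k} $$(27)
(where \(n_{k}\) is the number of items of type k occurring in each display). It is easy to check that
$$ E\left[\widehat{V}\right] = \sum_{k=1}^{N_{types}} n_{k}\frac{V}{S}f_{\phi}(k) = V, $$(28)
where S is given by Eq. 4.
To estimate \(f_{\phi}\), we take
$$ \widehat{f}_{\phi}(k) = \frac{\widehat{W}_{k}}{\sum_{i=1}^{N_{types}}\widehat{W}_{i}}, \quad k = 1,2,\ldots,N_{types}, $$(29)
to ensure (for plotting purposes) that \(\widehat {f}_{\phi }\) sums to 1. Note that nothing in the modeling procedure excludes negative attention filter values \(\widehat {f}_{\phi }(k)\) for some item types k.
Finally, the following statistics are used to estimate \(x_{default}\) and \(y_{default}\):
$$ \widehat{x}_{default} = \frac{\widehat{W}_{N_{types}+1}}{1-\widehat{V}}, \qquad \widehat{y}_{default} = \frac{\widehat{W}_{N_{types}+2}}{1-\widehat{V}}. $$(30)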
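The regression weights map back to the model parameters as follows. This minimal Python sketch (the function name is hypothetical) runs a noise-free check: it builds the weights from hypothetical parameters using Eq. 21 and confirms that Eqs. 27, 29 and 30 invert them. With real data, the division by \(1-\widehat{V}\) is what makes the default-location estimates unstable when V is close to 1.

```python
def recover_parameters(w_hat, n_per_type):
    """Map regression weights [W_1..W_N, W_{N+1}, W_{N+2}] to
    (V-hat, f-hat, x_default-hat, y_default-hat) via Eqs. 27, 29 and 30."""
    n_types = len(n_per_type)
    filter_weights = w_hat[:n_types]
    v_hat = sum(n * w for n, w in zip(n_per_type, filter_weights))
    f_hat = [w / sum(filter_weights) for w in filter_weights]  # sums to 1
    if abs(1.0 - v_hat) < 1e-12:
        raise ValueError("V-hat is 1, so the default location is not identifiable")
    x_default = w_hat[n_types] / (1.0 - v_hat)
    y_default = w_hat[n_types + 1] / (1.0 - v_hat)
    return v_hat, f_hat, x_default, y_default


# Noise-free check with hypothetical parameters: build W from Eq. 21, then invert.
n_per_type, f_true, v_true = [4, 4, 4], [0.5, 0.3, 0.2], 0.9
s = sum(n * f for n, f in zip(n_per_type, f_true))
w = [v_true / s * f for f in f_true] + [(1 - v_true) * 0.1, (1 - v_true) * -0.2]
print(recover_parameters(w, n_per_type))
```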
Appendix 2: Estimating confidence intervals for model parameters
Throughout this section, it will be convenient to write \(N_{W}\) for \(N_{types}+2\) (the number of weights \(W_{k}\) in Eqs. 19 and 20).
Point estimates of the model parameters V, \(f_{\phi}\), and σ are given by Eqs. 27, 29, and 11, and estimates of \(x_{default}\) and \(y_{default}\) are given by Eq. 30. This section of the appendix shows how to derive confidence intervals for V and for \(f_{\phi}(k)\), \(k = 1,2,\ldots,N_{types}\). The parameters \(x_{default}\) and \(y_{default}\) are included in the model primarily to enable estimation of V and are of little interest in themselves; moreover, because the corresponding weights \(W_{N_{types}+1}\) and \(W_{N_{types}+2}\) are proportional to 1 − V (Eq. 21), the estimates of \(x_{default}\) and \(y_{default}\) become highly volatile for high values of V. For these reasons, confidence intervals are not supplied for these two parameters.
The model used to derive these estimates assumes Eq. 25 and applies Eq. 26 to derive estimates \(\widehat {W}_{k}\) of the regression weights \(W_{k}\), for \(k = 1,2,\ldots,N_{W}\).
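By singular value decomposition, we can express \(M^{\ddagger}\) as
$$ M^{\ddagger} = C H L^{T}, $$(31)
where C is the \(N_{W}\times N_{W}\) orthogonal matrix of “principal components,” H is the \(N_{W}\times N_{W}\) diagonal matrix of the corresponding singular values of \(M^{\ddagger}\), and L is the column-orthonormal matrix whose kth row gives the “loadings” of the different principal components contributing to the kth column of \(M^{\ddagger}\).
The matrix
$$ \Theta = M^{\ddagger}\left(M^{\ddagger}\right)^{T} = C H^{2} C^{T} = \left(M^{T}M\right)^{-1} $$(32)
(the last equality holding for M of full column rank) characterizes the ellipsoidal dispersion of the multivariate Gaussian noise that perturbs the estimate \(\widehat {W}\) away from its expectation W; specifically, the covariance matrix of \(\widehat {W}\) is \(\sigma^{2}\Theta\).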
The random variables \(\widehat {W}_{i}\), \(i = 1,2,\ldots,N_{W}\), are jointly normal with means \(W_{i}\) and covariance matrix \(\sigma^{2}\Theta\). A fundamental result (see, e.g., Graybill 1961, Theorem 3.6, p. 56) is captured in the following
One-variable linear combination theorem
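Let random variables \(X_{1}, X_{2}, \ldots, X_{N}\) be jointly normal with expectations \(\mu_{1}, \mu_{2}, \ldots, \mu_{N}\) and covariance matrix Γ. Then for any vector v of length N, the random variable
$$ Z = \sum_{i=1}^{N} v_{i}X_{i} = v^{T}X $$
is normally distributed with mean
$$ E[Z] = \sum_{i=1}^{N} v_{i}\mu_{i} = v^{T}\mu $$
and variance
$$ \mathrm{Var}(Z) = v^{T}\Gamma v. $$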
This theorem is actually a special case of the following
Two-variable linear combination theorem
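Let \(X_{1}, X_{2}, \ldots, X_{N}\) be jointly normal with expectations \(\mu_{1}, \mu_{2}, \ldots, \mu_{N}\) and covariance matrix Γ. For any vectors v and w of length N, the random variables
$$ Z_{1} = v^{T}X, \qquad Z_{2} = w^{T}X $$
are jointly normal with
$$ E[Z_{1}] = v^{T}\mu, \qquad E[Z_{2}] = w^{T}\mu, $$
and 2×2 covariance matrix Υ with entries
$$ \Upsilon_{11} = v^{T}\Gamma v, \qquad \Upsilon_{22} = w^{T}\Gamma w, \qquad \Upsilon_{12} = \Upsilon_{21} = v^{T}\Gamma w. $$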
The confidence interval for V
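The linear combination theorem implies that our estimate \(\widehat {V}\) of Data-drivenness (Eq. 27) is normally distributed with mean V (as shown in Eq. 28) and variance
$$ \mathrm{Var}\left(\widehat{V}\right) = \sigma^{2}\,\eta^{T}\Theta\eta, $$
where η is the \(N_{W}\times 1\) vector
$$ \eta_{k} = \left\{\begin{array}{ll} n_{k} & \text{for}~k=1,2,\cdots,N_{types},\\ 0 & \text{for}~k=N_{types}+1, N_{types}+2, \end{array} \right. $$
for \(n_{k}\) the number of items of type k occurring in each full-set stimulus cloud. Thus, the random variable
$$ Z_{V} = \frac{\widehat{V} - V}{\sigma\left(\eta^{T}\Theta\eta\right)^{\frac{1}{2}}} $$
has a standard normal distribution. However, the true value of σ is unknown. If instead the unbiased estimator \(\widehat {\sigma }\) (given by Eq. 11) is used in the denominator, we obtain a statistic
$$ T_{V} = \frac{\widehat{V} - V}{\widehat{\sigma}\left(\eta^{T}\Theta\eta\right)^{\frac{1}{2}}} $$
that has a t-distribution with \(r = 2N_{trials}-N_{types}-2\) degrees of freedom. Writing \(F_{r}^{-1}\) for the inverse of the t-cdf with r degrees of freedom, we note that the critical value of \(T_{V}\) for a \(p_{crit}\) confidence interval is given by
$$ t_{crit} = F_{r}^{-1}\left(\frac{1+p_{crit}}{2}\right). $$
That is,
$$ \Pr\left(-t_{crit} \le T_{V} \le t_{crit}\right) = p_{crit}, $$
from which it follows that
$$ \Pr\left(-t_{crit}\,\widehat{\sigma}\left(\eta^{T}\Theta\eta\right)^{\frac{1}{2}} \le \widehat{V} - V \le t_{crit}\,\widehat{\sigma}\left(\eta^{T}\Theta\eta\right)^{\frac{1}{2}}\right) = p_{crit}, $$
and hence that
$$ \Pr\left(\widehat{V} - t_{crit}\,\widehat{\sigma}\left(\eta^{T}\Theta\eta\right)^{\frac{1}{2}} \le V \le \widehat{V} + t_{crit}\,\widehat{\sigma}\left(\eta^{T}\Theta\eta\right)^{\frac{1}{2}}\right) = p_{crit}, $$
yielding a \(p_{crit}\) confidence interval of \(\widehat {V}\pm \left (\widehat {\sigma }^{2}\eta ^{T}{\Theta }\eta \right )^{\frac {1}{2}}t_{crit}\) for V.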
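The interval for V can be computed as below. This minimal Python sketch assumes Θ has already been obtained (for example as \((M^{T}M)^{-1}\)). Because the Python standard library has no inverse t-cdf, \(t_{crit}\) is passed in; with SciPy available, `scipy.stats.t.ppf((1 + p_crit) / 2, r)` would supply it. The toy Θ and parameter values are hypothetical.

```python
import math


def v_confidence_interval(v_hat, sigma_hat, theta, n_per_type, t_crit):
    """p_crit confidence interval V-hat +/- (sigma-hat^2 eta^T Theta eta)^(1/2) t_crit.

    theta is the N_W x N_W matrix Theta (equal to (M^T M)^{-1} for full-rank M);
    eta holds n_k for the N_types filter weights and 0 for the two default weights."""
    n_w = len(theta)
    eta = [float(n) for n in n_per_type] + [0.0] * (n_w - len(n_per_type))
    quad = sum(eta[i] * theta[i][j] * eta[j] for i in range(n_w) for j in range(n_w))
    half_width = math.sqrt(sigma_hat ** 2 * quad) * t_crit
    return v_hat - half_width, v_hat + half_width


# Toy numbers (hypothetical): one item type with n_1 = 2 plus the two default weights.
theta = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.02]]
print(v_confidence_interval(0.9, 0.5, theta, [2], t_crit=2.0))  # approximately (0.7, 1.1)
```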
Confidence intervals for f ϕ
The problem of computing confidence intervals for f ϕ is complicated by the fact that \(\widehat {f}_{\phi }\) is constrained to sum to 1 by Eq. 29. The following result (Fieller 1954) is directly applicable.
The ratio theorem
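Let X and Y be jointly normal random variables with expectations \(\mu_{X}\) and \(\mu_{Y}\), variances \(\nu_{11}\sigma^{2}\) and \(\nu_{22}\sigma^{2}\), and covariance \(\nu_{12}\sigma^{2}\). Then, if \(\nu_{11}\), \(\nu_{22}\) and \(\nu_{12}\) are all known, and if \(\mu_{Y}\) is positive and large in comparison to σ, a 1−α confidence interval for the ratio \(\frac {\mu _{X}}{\mu _{Y}}\) is given by
$$ \frac{1}{1-g}\left[\frac{X}{Y} - \frac{g\,\nu_{12}}{\nu_{22}} \pm \frac{t_{r,\alpha}\,\widehat{\sigma}}{Y}\sqrt{\nu_{11} - 2\frac{X}{Y}\nu_{12} + \frac{X^{2}}{Y^{2}}\nu_{22} - g\left(\nu_{11} - \frac{\nu_{12}^{2}}{\nu_{22}}\right)}\,\right], $$
where
$$ g = \frac{t_{r,\alpha}^{2}\,\widehat{\sigma}^{2}\,\nu_{22}}{Y^{2}}, $$
for \(\widehat {\sigma }\) the unbiased estimator of σ and \(t_{r,\alpha}=F_{r}^{-1}\left(1-\frac{\alpha}{2}\right)\) (where, as above, \(F_{r}^{-1}\) denotes the inverse of the t-cdf with r degrees of freedom).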
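To derive confidence intervals for \(f_{\phi}(k)\), we note that for \(k = 1,2,\ldots,N_{types}\), the antecedent conditions of the ratio theorem are satisfied by taking the random variables
$$ X = \xi_{k}^{T}\widehat{W} = \widehat{W}_{k} $$
and
$$ Y = U^{T}\widehat{W} = \sum_{i=1}^{N_{types}}\widehat{W}_{i}, $$
where \(\xi_{k}\) and U are the \(N_{W}\times 1\) vectors whose jth components are given by
$$ \xi_{k}(j) = \left\{\begin{array}{ll} 1 & j = k,\\ 0 & j \ne k, \end{array} \right. \qquad U(j) = \left\{\begin{array}{ll} 1 & j \le N_{types},\\ 0 & j > N_{types}. \end{array} \right. $$
Then the one-variable linear combination theorem implies, for Θ given by Eq. 32, that
$$ \nu_{11} = \xi_{k}^{T}\Theta\xi_{k} = \Theta_{kk}, \qquad \nu_{22} = U^{T}\Theta U, $$
and the two-variable linear combination theorem implies that
$$ \nu_{12} = \xi_{k}^{T}\Theta U. $$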
In the current application of the ratio theorem, the unbiased estimator \(\widehat {\sigma }\) of σ is given by Eq. 11, and the degrees of freedom r is equal to \(2N_{trials}-(N_{types}+2)\). Thus, all the values required to evaluate Eq. 49 are in hand.
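The interval for \(f_{\phi}(k)\) can be sketched as follows. The code implements Fieller's interval in its standard form, obtained by solving the quadratic inequality \((X-\rho Y)^{2} \le t^{2}\widehat{\sigma}^{2}(\nu_{11}-2\rho\nu_{12}+\rho^{2}\nu_{22})\) for ρ, with X, Y, \(\nu_{11}\), \(\nu_{22}\) and \(\nu_{12}\) formed as described above. Indices are 0-based; the weights, Θ, \(\widehat{\sigma}\) and t in the example are hypothetical, and t must be supplied by the caller.

```python
import math


def fieller_interval(x, y, nu11, nu22, nu12, sigma_hat, t):
    """Fieller (1954) interval for mu_X / mu_Y from estimates x and y whose
    (co)variances are nu11*sigma^2, nu22*sigma^2 and nu12*sigma^2."""
    if y <= 0.0:
        raise ValueError("the theorem assumes a positive denominator")
    g = (t * sigma_hat) ** 2 * nu22 / y ** 2
    if g >= 1.0:
        raise ValueError("denominator not clearly above 0: interval is unbounded")
    q = x / y
    centre = q - g * nu12 / nu22
    radicand = nu11 - 2.0 * q * nu12 + q * q * nu22 - g * (nu11 - nu12 ** 2 / nu22)
    half = (t * sigma_hat / y) * math.sqrt(radicand)
    return (centre - half) / (1.0 - g), (centre + half) / (1.0 - g)


def filter_weight_interval(w_hat, theta, k, n_types, sigma_hat, t):
    """Interval for f_phi(k) = W_k / sum_i W_i (k is 0-based here)."""
    n_w = len(w_hat)
    u = [1.0 if j < n_types else 0.0 for j in range(n_w)]                      # U
    nu11 = theta[k][k]                                                           # xi_k' Theta xi_k
    nu22 = sum(u[i] * theta[i][j] * u[j] for i in range(n_w) for j in range(n_w))  # U' Theta U
    nu12 = sum(theta[k][j] * u[j] for j in range(n_w))                           # xi_k' Theta U
    return fieller_interval(w_hat[k], sum(w_hat[:n_types]), nu11, nu22, nu12, sigma_hat, t)


# Toy example (hypothetical): two item types plus the two default weights.
w_hat = [0.12, 0.08, 0.01, -0.02]
theta = [[0.004, 0.001, 0.0, 0.0],
         [0.001, 0.004, 0.0, 0.0],
         [0.0, 0.0, 0.01, 0.0],
         [0.0, 0.0, 0.0, 0.01]]
print(filter_weight_interval(w_hat, theta, 0, 2, sigma_hat=0.05, t=2.0))
```

When g is small the interval is close to the delta-method interval \(q \pm t\,\mathrm{SE}\); Fieller's form additionally accounts for uncertainty in the denominator \(\sum_i \widehat{W}_i\).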
Appendix 3: Computing Efficiency
This Appendix shows how to compute \(\text{Efficiency}=1-p_{miss}\) (see “Measuring resistance to residual error: Efficiency.”). Roughly speaking, \(p_{miss}\) is the proportion of items by which stimulus clouds would need to be decimated in order to yield the observed value of \(\widehat {\sigma }\) (as given by Eq. 11) if this decimation were the only source of random error in the participant’s centroid computation. (In the case in which Efficiency is “singleton-corrected,” \(\sigma _{SC}=\sqrt {\widehat {\sigma }^{2}-\widehat {\sigma }_{singleton}^{2}}\) is used instead of \(\widehat {\sigma }\), where \(\widehat {\sigma }_{singleton}\) is the unbiased estimator of the standard deviation of the response errors (in each of x and y) observed on singleton trials.) Of course, \(p_{miss}\) could be estimated by simulation; here we present an algorithm to compute it exactly.
Notation
For \(j=1,2,\cdots,N_{types}\), let \(n_{j}\) be the number of items of type j in each stimulus cloud. Let N be the total number of items in each stimulus display, and for \(i=1,2,\cdots,N\), let \(\tau_{i}\) be the type of the ith item in the stimulus display, and let \((X_{i},Y_{i})\) be its location. We write \(\widetilde {N}\) for the set of integers \(1,2,\cdots,N\). For any of the integers \(L=0,1,\cdots,N-1\), we write \(\alpha_{L}\) for a random subset of \(\widetilde {N}\) of size N−L (i.e., \(\alpha_{L}\) is derived by removing a random set of L items from \(\widetilde {N}\)). For any integer \(L=0,1,\cdots,N-1\), we define \(P_{L}\), the proportion of Lost items from a stimulus of N items, as
\[P_{L}=\frac{L}{N}.\]
Assumptions about stimulus cloud locations
The algorithm to compute \(p_{miss}\) makes several assumptions about the locations of items in a stimulus cloud. All of the x- and y-coordinates, \(X_{1},X_{2},\cdots,X_{N},Y_{1},Y_{2},\cdots,Y_{N}\), of the stimulus cloud locations are assumed to be identically distributed random variables with mean 0 and variance \(\eta^{2}\). The \(X_{i}\)’s (\(Y_{i}\)’s) are also assumed to be pairwise uncorrelated. That is, the expected value of the product of the x-coordinates (and of the y-coordinates) of two distinct stimulus items is
\[E\left[X_{i}X_{j}\right]=E\left[Y_{i}Y_{j}\right]=0\quad\text{for }i\neq j.\]
Note in particular that these assumptions are satisfied by stimulus clouds
1. whose locations are drawn independently from a circular, bivariate Gaussian density whose horizontal and vertical standard deviations are both η.
2. whose locations are produced as described in “Generating full-set stimulus clouds” so as to have fixed dispersion η.
The hypothetical response process used to derive \(p_{miss}\)
The response process used to derive \(p_{miss}\) mirrors the model of Eqs. 5 and 6. However, it assumes that all random response error is due to random removal of items from the stimulus prior to computing the centroid. Let \(x_{default}\) and \(y_{default}\) be real numbers, let V be a Data-drivenness value (i.e., 0<V<1), let \(\ddot {f}:Types\rightarrow \mathbb {R}\) be an attention filter, and set
Because \(\ddot {f}\) is an attention filter, it is constrained (as a matter of convention) to satisfy \({\sum }_{k=1}^{N_{types}}\ddot {f}(k)=1\). The function f will not generally satisfy this condition. However, f has the convenient property that the x- and y-coordinates of the response to an undecimated cloud (i.e., a cloud from which 0 items have been removed) in our hypothetical experiment are given by
For any subset A of \(\widetilde {N}\) let
Then the random variables
represent the x- and y-coordinates of a response that is produced by
1. decimating the stimulus by L items,
2. replacing each of the remaining items i in the display (i.e., each \(i\in\alpha_{L}\)) by a “penny-pile” of size \(f(\tau_{i})\),
3. deriving the centroid \((C_{X},C_{Y})\) of this filtered, decimated cloud, and
4. selecting the point that lies proportion V of the way from \((x_{default},y_{default})\) to \((C_{X},C_{Y})\).
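The four steps can be sketched in Python as follows. This is an illustrative reading only: Eqs. 61 and 62 are not reproduced here, and we assume that the centroid of the penny-piles is the weighted mean of the remaining item locations, normalized by their total weight \(\sum_{i\in\alpha_{L}}f(\tau_{i})\), which must be nonzero. The function name and argument layout are hypothetical.

```python
import random


def decimated_response(xs, ys, types, f, V, default, L, rng):
    """Hypothetical response to a cloud from which L random items are removed.

    xs, ys: item coordinates; types: type label of each item;
    f: dict mapping type -> penny-pile size; V: Data-drivenness;
    default: (x_default, y_default); rng: a random.Random instance.
    """
    n = len(xs)
    kept = rng.sample(range(n), n - L)            # step 1: random subset alpha_L
    weights = [f[types[i]] for i in kept]         # step 2: penny-pile sizes
    total = sum(weights)                          # assumed nonzero
    cx = sum(w * xs[i] for w, i in zip(weights, kept)) / total  # step 3: centroid
    cy = sum(w * ys[i] for w, i in zip(weights, kept)) / total
    dx, dy = default
    return dx + V * (cx - dx), dy + V * (cy - dy)  # step 4: proportion V of the way
```

Repeating this for many random decimations and comparing with the L = 0 response gives a Monte Carlo estimate of the variance that the following subsection calls \(\text{Var}_{miss}\).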
This process is very similar to the process described in Eqs. 5 and 6 by which the participant is assumed to produce his/her response on a given trial j. The key differences are the following:
1. the summation in each of Eqs. 5 and 6 is over all items in the display, whereas the summation in each of Eqs. 61 and 62 is over a random subset of N−L items from the original stimulus display, and
2. Eqs. 5 and 6 include the additive noise terms \(Q_{x}(j)\) and \(Q_{y}(j)\), respectively; these terms are lacking from Eqs. 61 and 62.
The definition of \(p_{miss}\)
For L = 0,1,⋯ , N−1, define
That is, \(\text{Var}_{miss}(P_{L})\) is the variance of the difference (in either the x- or the y-coordinate) between the response produced using a stimulus cloud with L randomly chosen items removed vs. the response produced using a full stimulus cloud.
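As is evident, \(\text{Var}_{miss}(P_{L})\) is an increasing function of the proportions \(P_{L}\) corresponding to \(L=0,1,\cdots,N-1\). We extend this function to any \(0<p\leq P_{N-1}\) by linearly interpolating between the values \(\text{Var}_{miss}(P_{L})\) as follows. For any probability \(p<P_{N-1}\), let \(L_{p}\) be the greatest integer less than or equal to Np and define
\[\text{Var}_{miss}(p)=\left(1-Np+L_{p}\right)\text{Var}_{miss}\left(P_{L_{p}}\right)+\left(Np-L_{p}\right)\text{Var}_{miss}\left(P_{L_{p}+1}\right).\]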
It is not hard to show that this definition is equivalent to setting
where \(\lambda_{p}\) is the integer-valued random variable defined as follows:
for
The function \(\text{Var}_{miss}(p)\) is strictly increasing over the set of all p such that \(0<p\leq P_{N-1}\). Therefore, the inverse function \(\text {Var}_{miss}^{-1}\) is well defined. Accordingly, for any given value of the model parameter \(\sigma^{2}<\text{Var}_{miss}(P_{N-1})\) characterizing the random variables \(Q_{x}(j)\) and \(Q_{y}(j)\) in Eqs. 5 and 6, we now define
\[P_{miss}(\sigma)=\text{Var}_{miss}^{-1}\left(\sigma^{2}\right).\]
Specifically, Eq. 64 implies that
\[P_{miss}(\sigma)=\frac{1}{N}\left(L_{\sigma}+\frac{\sigma^{2}-\text{Var}_{miss}\left(P_{L_{\sigma}}\right)}{\text{Var}_{miss}\left(P_{L_{\sigma}+1}\right)-\text{Var}_{miss}\left(P_{L_{\sigma}}\right)}\right),\]
where \(L_{\sigma}\) is the greatest integer such that \(\text {Var}_{miss}\left (P_{L_{\sigma }} \right ) <\sigma ^{2}\). To reiterate, \(P_{miss}(\sigma)\) is the proportion of items by which the stimulus cloud must be decimated in order for the variance of the difference between the x-coordinates (and likewise the y-coordinates) of the responses produced using a decimated vs. a full stimulus cloud to equal \(\sigma^{2}\).
Finally, we define the statistic \(p_{miss}\) characterizing the performance of a participant in a given centroid task condition by
\[p_{miss}=P_{miss}\left(\widehat{\sigma}\right)\]
for \(\widehat {\sigma }\) given by Eq. 11.
Computing \(p_{miss}\)
As is clear from Eqs. 69 and 70, if we can compute \(\text{Var}_{miss}(P_{L})\) for any \(L=0,1,\cdots,N-1\), then we can
1. find the greatest integer \(L_{\widehat {\sigma }}\) such that \(\text {Var}_{miss}(P_{L_{\widehat {\sigma }}})\leq \widehat {\sigma }^{2}\), and
2. compute \(p_{miss}=P_{miss}(\widehat {\sigma })\) using Eq. 69.
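Given a table of \(\text{Var}_{miss}(P_{L})\) values, the two steps above reduce to a search followed by a linear interpolation between neighboring table entries. A minimal Python sketch, assuming \(P_{L}=L/N\), a strictly increasing table with \(\text{Var}_{miss}(P_{0})=0\), and the NaN convention described under “Limitations of the Efficiency computations”; the function name is hypothetical.

```python
import math


def p_miss_from_table(var_table, sigma_hat_sq):
    """Invert the piecewise-linear Var_miss(p) at sigma_hat^2.

    var_table[L] holds Var_miss(P_L) for L = 0..N-1, so N = len(var_table)
    and P_L = L / N. Returns NaN when sigma_hat^2 exceeds the last entry.
    """
    n = len(var_table)
    if sigma_hat_sq > var_table[-1]:
        return math.nan        # more error than removing all but one item explains
    if sigma_hat_sq == var_table[-1]:
        return (n - 1) / n     # all but one item removed
    # Step 1: greatest L with Var_miss(P_L) <= sigma_hat^2.
    L = max(i for i, v in enumerate(var_table) if v <= sigma_hat_sq)
    # Step 2: linear interpolation between P_L and P_{L+1}.
    frac = (sigma_hat_sq - var_table[L]) / (var_table[L + 1] - var_table[L])
    return (L + frac) / n
```

Efficiency is then \(1-p_{miss}\); the boundary case returns \(p_{miss}=(N-1)/N\), i.e., the minimum Efficiency of \(1/N\).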
Accordingly, the remainder of this section develops an algorithm to compute \(\text{Var}_{miss}(P_{L})\).
For L = 0,1,⋯ , N−1, note that
for
(Of course, there exists a corresponding random variable \(Z_{Y}(L)\) identically distributed to \(Z_{X}(L)\).) Thus, \(\text{Var}_{miss}(P_{L})\) is given by
which follows from Eq. 72.
Our solution to the problem of computing \(\text{Var}_{miss}(P_{L})\) depends crucially on computing the variance of \(Z_{X}(L)\). Toward this end, writing \(\bar {\alpha }_{L}\) for the complement of \(\alpha_{L}\), note that \(Z_{X}(L)\) can be rewritten as follows to aggregate all the \(X_{i}\)’s for \(i\in\alpha_{L}\) in one sum and all the \(X_{i}\)’s for \(i\in \bar {\alpha }_{L}\) in the other:
for
Moreover, for \(i=1,2,\cdots,N\), \(E[X_{i}]=0\), and \(X_{i}\) and \(W_{i}\) are independent; hence \(E[Z_{X}(L)]=0\), from which it follows that
But for all \(i,j\in \widetilde {N}\), if i≠j, then \(E[X_{i}X_{j}]=0\), and otherwise \(E\left [{X_{i}^{2}}\right ]=\eta ^{2}\). Hence,
It follows that
where the function Γ(A) is defined as follows for any \(A\subset \widetilde {N}\):
Thus, if we can compute \(E[\Gamma(\alpha_{L})]\), we will be able to compute \(\text{Var}_{miss}(P_{L})\). Toward this end, call any vector \(\vec {k}=\left (k_{1},k_{2},\cdots ,k_{N_{types}} \right )\) an L-count vector if \(k_{j}\in\{0,1,\cdots,n_{j}\}\) for \(j=1,2,\cdots,N_{types}\), and
That is, an L-count vector gives the number of items of each type that remain after L items have been removed. For any subset \(A\subset \widetilde {N}\), if \(k_{j}\) gives the number of integers h∈A for which \(\tau_{h}=j\), then \(\vec {k}\) is called the count vector of A. Note that in this case
and
As is easy to check from Eqs. 82 and 83, if subsets \(A,A^{\prime } \subset \widetilde {N}\) share the same count vector, then \(\Gamma(A)=\Gamma(A^{\prime})\). Accordingly, set \(\gamma (\vec {k})={\Gamma }(A )\) for \(A\subset \widetilde {N}\) with L-count vector \(\vec {k}\), and observe that
\[E\left[\Gamma(\alpha_{L})\right]=\sum_{\vec{k}}\gamma(\vec{k})\,\mathbf{P}(\vec{k}),\]
where the sum is over all L-count vectors \(\vec{k}\), and \(\mathbf {P}(\vec {k})\) is the probability that a subset \(\alpha_{L}\) derived by randomly sampling N−L integers from \(\widetilde {N}\) without replacement will have L-count vector \(\vec {k}\). This probability is given by
\[\mathbf{P}(\vec{k})=\frac{\prod_{j=1}^{N_{types}}\binom{n_{j}}{k_{j}}}{\binom{N}{N-L}}.\]
To see this, note first that the denominator gives the total number of subsets of \(\widetilde {N}\) of size N−L, and the numerator gives the total number of those subsets that have, for \(j=1,2,\cdots,N_{types}\), exactly \(k_{j}\) of the \(n_{j}\) integers \(i\in \widetilde {N}\) for which \(\tau_{i}=j\).
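Count vectors and their probabilities can be illustrated with a brute-force Python sketch. This is a hypothetical stand-in, not a translation of GetAllLCountVectors.m: it enumerates every candidate vector with itertools.product, which is only practical for small displays, and applies the ratio of binomial coefficients just described (a multivariate hypergeometric probability). As a sanity check, the probabilities over all L-count vectors should sum to 1.

```python
from itertools import product
from math import comb, prod


def l_count_vectors(L, n):
    """All vectors k with 0 <= k_j <= n_j and sum(k) = sum(n) - L."""
    target = sum(n) - L
    return [k for k in product(*(range(nj + 1) for nj in n)) if sum(k) == target]


def count_vector_prob(k, n, L):
    """Probability that a random (N - L)-subset of items has count vector k."""
    N = sum(n)
    return prod(comb(nj, kj) for nj, kj in zip(n, k)) / comb(N, N - L)
```

For example, with three types of three items each and L = 2 there are six L-count vectors, and the vector (3, 3, 1) has probability 3/36 = 1/12.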
Computing \(\text{Var}_{miss}(P_{L})\)
To compute \(\text{Var}_{miss}(P_{L})\) for \(L=0,1,\cdots,N-1\), we
1. use the Matlab program GetAllLCountVectors.m (included in “Appendix 4: Matlab code for fitting the centroid model”), which takes as input arguments the integer L and the vector \(\vec {n}=\left (n_{1},n_{2},\cdots ,n_{N_{types}}\right )\) whose entries give the numbers of different types of items in a given stimulus cloud, and returns a matrix K whose columns are all the L-count vectors consistent with \(\vec {n}\);
2. for each of the vectors \(\vec {k}=K(:,j)\) in this array, compute \(\gamma(\vec{k})\) and \(\mathbf{P}(\vec{k})\);
3. use Eq. 84 to compute \(E[\Gamma(\alpha_{L})]\);
4. derive \(\text{var}(Z_{X}(L))\) using Eq. 79; and
5. compute \(\text{Var}_{miss}(P_{L})\) using Eq. 74.
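Because the equations defining Γ and \(\text{Var}_{miss}\) are not reproduced in this version of the text, the following Python sketch uses our own derivation under stated assumptions rather than the paper's formulas. We assume the four-step response process described earlier, read as a weighted centroid normalized by the remaining weight with positive weights \(w_{j}=f(j)\), and that the \(X_{i}\) are uncorrelated with mean 0 and variance \(\eta^{2}\), independent of which items are removed. The response difference is then \(\sum_{i}c_{i}X_{i}\) with \(c_{i}=Vw_{\tau_{i}}\left(1/W(\alpha_{L})-1/W\right)\) for retained items and \(c_{i}=-Vw_{\tau_{i}}/W\) for removed ones, where W(A) is the total weight of A and \(W=W(\widetilde{N})\); its variance is \(\eta^{2}E\left[\sum_{i}c_{i}^{2}\right]\), and \(\sum_{i}(c_{i}/V)^{2}\) depends on the retained subset only through its count vector. The sketch evaluates the expectation over count vectors and checks it against direct enumeration of all subsets; all function names are hypothetical.

```python
from itertools import combinations, product
from math import comb, prod


def gamma_of_count_vector(k, n, w):
    """Sum of (c_i / V)^2 for a retained subset with count vector k."""
    w_full = sum(nj * wj for nj, wj in zip(n, w))
    w_kept = sum(kj * wj for kj, wj in zip(k, w))
    kept = sum(kj * (wj / w_kept - wj / w_full) ** 2 for kj, wj in zip(k, w))
    lost = sum((nj - kj) * (wj / w_full) ** 2 for nj, kj, wj in zip(n, k, w))
    return kept + lost


def var_miss(L, n, w, V, eta):
    """V^2 * eta^2 * E[gamma], summing over all L-count vectors."""
    N = sum(n)
    total = 0.0
    for k in product(*(range(nj + 1) for nj in n)):
        if sum(k) != N - L:
            continue
        p = prod(comb(nj, kj) for nj, kj in zip(n, k)) / comb(N, N - L)
        total += p * gamma_of_count_vector(k, n, w)
    return V ** 2 * eta ** 2 * total


def gamma_of_subset(subset, item_w):
    """Same quantity computed item by item for an explicit retained subset."""
    w_full = sum(item_w)
    w_kept = sum(item_w[i] for i in subset)
    retained = set(subset)
    return sum((wi / w_kept - wi / w_full) ** 2 if i in retained else (wi / w_full) ** 2
               for i, wi in enumerate(item_w))


def var_miss_brute_force(L, n, w, V, eta):
    """Average gamma over every (N - L)-subset of items (small displays only)."""
    item_w = [wj for nj, wj in zip(n, w) for _ in range(nj)]
    N = len(item_w)
    subsets = list(combinations(range(N), N - L))
    return V ** 2 * eta ** 2 * sum(gamma_of_subset(s, item_w) for s in subsets) / len(subsets)
```

For equal weights the result reduces to the familiar finite-population expression \(V^{2}\eta^{2}\left(\frac{1}{N-L}-\frac{1}{N}\right)\), which offers a quick check of any implementation.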
The algebraic derivation of p m i s s described above has been confirmed by Monte Carlo simulation.
Limitations of the Efficiency computations
The algorithm for computing Efficiency (\(=1-p_{miss}\)) has the following limitations:
1. For a given stimulus cloud size \(N_{stim}\), the minimum Efficiency that can be estimated is \(\frac {1}{N_{stim}}\). This is a consequence of the fact that a decimated display must contain at least a single item for a centroid to be computable. In the case of an experimental design that uses stimulus clouds with fixed dispersion D, an Efficiency of \(\frac {1}{N_{stim}}\) corresponds to a value of \(\widehat {\sigma }=D\) (the standard deviation of the x- or y-coordinate of a single display item). Two consequences of this observation are that
(a) if \(\widehat {\sigma } > D\), it is impossible to compute Efficiency. In this case, our software assigns to Efficiency the value NaN (which stands for “not a number” in Matlab).
(b) if \(N_{stim}=1\), it is impossible to compute Efficiency.
2. Efficiency may become practically impossible to compute if displays become too complicated in the sense that
(a) the number of different types of items contained in a display becomes too large, and/or
(b) the total number of items contained in a display becomes too large.
Specifically, a bottleneck can arise in computing \(E[\Gamma(\alpha_{L})]\) (Eq. 84). In particular, the number of L-count vectors can explode for values of L near half the total number of items in a display. For example, if full-set displays contain 3 each of 12 different types of items, then the number of different L-count vectors is 12 for L = 1 (and L = 35); 78 for L = 2 (and L = 34); 364 for L = 3 (and L = 33); ⋯; 766,272 for L = 13 (and L = 23); 1,024,464 for L = 14 (and L = 22); ⋯; 1,650,792 for L = 17 (and L = 19); and 1,703,636 for L = 18. This means that for complicated displays, it may still be possible to compute high and low Efficiencies algebraically but impossible to compute Efficiencies near \(\frac {1}{2}\).
When displays become too complicated to allow an exact computation of Efficiency, one can still estimate Efficiency by using Monte Carlo simulation.
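The counts quoted above can be checked without enumerating the vectors themselves. Removing L items corresponds to choosing removal counts \(r_{j}=n_{j}-k_{j}\) with \(0\leq r_{j}\leq n_{j}\) and \(\sum_{j}r_{j}=L\), so the number of L-count vectors is the coefficient of \(x^{L}\) in \(\prod_{j}\left(1+x+\cdots+x^{n_{j}}\right)\). The short Python sketch below (the function name is hypothetical) computes these coefficients by repeated polynomial multiplication; for 12 types with 3 items each it reproduces the figures given in the text, for example 1,703,636 for L = 18.

```python
def num_l_count_vectors(L, n):
    """Number of L-count vectors for type counts n = (n_1, ..., n_Ntypes).

    Equals the coefficient of x^L in prod_j (1 + x + ... + x^{n_j}),
    since each L-count vector k corresponds to removal counts r = n - k.
    """
    coeffs = [1]  # polynomial coefficients; index = power of x
    for nj in n:
        new = [0] * (len(coeffs) + nj)
        for power, c in enumerate(coeffs):
            for r in range(nj + 1):
                new[power + r] += c
        coeffs = new
    return coeffs[L] if 0 <= L < len(coeffs) else 0
```

This counting takes only a fraction of a second even when the vectors themselves number in the millions, which makes it a cheap way to decide in advance whether the exact computation is feasible or whether Monte Carlo simulation is needed.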
Appendix 4: Matlab code for fitting the centroid model
The main program: FitCentroidModel.m
GetAttentionFilterCIs.m (Called by FitCentroidModel.m)
GetMismatch.m (Called by FitCentroidModel.m)
GetPMiss.m (Called by FitCentroidModel.m)
GetVarCorrespondingToDecimationNum.m (Called by GetPMiss.m)
GetAllCountVectors.m (Called by GetVarCorrespondingToDecimationNum.m)
Appendix 5: Matlab code for nested model comparison
Main program: FTestForEqualityOfFilters.m
FitCentroidModelForFTest.m (Called by FTestForEqualityOfFilters.m)
FitCentroidModelForFTestRestricted.m (Called by FTestForEqualityOfFilters.m)
SharedAttentionFilterSumSquaredDevs.m (Called by FitCentroidModelForFTestRestricted.m)
About this article
Cite this article
Sun, P., Chubb, C., Wright, C.E. et al. The centroid paradigm: Quantifying feature-based attention in terms of attention filters. Atten Percept Psychophys 78, 474–515 (2016). https://doi.org/10.3758/s13414-015-0978-2
DOI: https://doi.org/10.3758/s13414-015-0978-2