Fast and Conspicuous? Quantifying Salience With the Theory of Visual Attention

Particular differences between an object and its surroundings cause salience, guide attention, and improve performance in various tasks. While much research has been dedicated to identifying which feature dimensions contribute to salience, much less regard has been paid to the quantitative strength of the salience caused by feature differences. Only a few studies have systematically related salience effects to a common salience measure, and they are partly outdated in the light of new findings on the time course of salience effects. We propose Bundesen's Theory of Visual Attention (TVA) as a theoretical basis for measuring salience and introduce an empirical and modeling approach that links this theory to data retrieved from temporal-order judgments. With this procedure, TVA becomes applicable to a broad range of salience-related stimulus material. Three experiments with orientation pop-out displays demonstrate the feasibility of the method. A fourth experiment substantiates its applicability to the luminance dimension.


Introduction
As early as 1890, William James (1890, p. 416) described a kind of attention caused by "an instinctive stimulus, a perception which, by reason of its nature rather than its mere force, appeals to one of our normal congenital impulses".
Though over a century old and uncommon in wording, the quote expresses the idea that some objects trigger basic attentional mechanisms that all humans share. These mechanisms are feature-specific instead of being based on sensory strength. This description fits the current idea of stimulus-driven or bottom-up attention. For both James' description and the modern perspective, however, there remains the question of which features attract such attention. Among James' rather uncommon examples are strange things, moving things, bright things, and metallic things. From today's knowledge, we would argue that it is not simply the properties of an object, but the context in which it occurs, that is of great importance. This relation is captured by the term salience (among others), which describes a local feature difference that attracts attention. Thus, a bright stimulus among other bright stimuli would not attract much attention, and neither would an object moving in the same direction and with the same speed as other moving objects.
James' (1890) initial question of which features are essential for guiding attention has been extensively studied within visual attention research (for a summary, see Wolfe & Horowitz, 2004). However, much less research has addressed the strength of salience dimensions and their quantitative influence on attention, which is the focus of the present article. If you want to be seen, would it be better to be moving, to be bright, or even to be metallic?
There are several, mostly model-based approaches to answer this question.
Early visual processing is based on the receptive fields of neurons tuned to particular features (e.g., Hubel & Wiesel, 1959, 1968), which are the source of bottom-up influences on perception and attention (for a review, see Treue, 2003). The strength of these neurophysiological responses depends on the strength of the presented features (Zhang, Zhaoping, Zhou, & Fang, 2012). This strength, and combinations of features of varying strength, have predominantly been tackled using methods from engineering (e.g., Itti & Koch, 2001b; Zhao & Koch, 2013).
Computational modeling approaches make it possible to simulate retinotopic salience maps for natural input images (for a review, see Frintrop, Rome, & Christensen, 2010). Different mathematical strategies have been explored to compute a salience value for every location in the image. Because of the difficulties of solving these problems algorithmically, machine learning techniques have been employed (Itti & Koch, 2001b; Zhao & Koch, 2013). Although such approaches may be applied in computer vision, it is unclear whether they correspond to salience in human attention. For instance, many computational models, such as that by Itti and Koch (2001a), predict that a higher luminance contrast attracts more attention. Einhäuser and König (2003) experimentally manipulated the luminance contrast of images, which their participants had to study carefully. The correlation of luminance contrast and fixation probability, however, failed to confirm the model prediction. The neurophysiological salience model by Li (2002) makes quantitative predictions about human performance in salience-related tasks.
Li assumes that the strength of salience is represented implicitly by the firing rate of retinotopic neurons in V1 that encode specific features or combinations of features. This model accounts qualitatively for a wide range of empirical findings, such as search asymmetries in visual search (e.g., Li, 1999). It simulates the neurophysiological processing of visual information with a complex recurrent artificial neural network (Li, 2001). The firing rate of these artificial neurons can hence be regarded as a quantitative prediction. However, the model cannot yet account quantitatively for experimental data.
Another model focusing on salience-related human performance is the fourth version of the Guided Search model (Wolfe, 2007). In this model, salience is handled by a module for the bottom-up guidance of attention. This guidance is modeled by individual channels tuned to specific features (e.g., steep, shallow, left, and right for orientation). The model contains a simple mathematical function for the contribution of each orientation channel. Salience itself is then computed by pairwise comparisons of these values for all visible objects. Wolfe states that the precise shape of the function that determines the contribution of a channel to overall salience is not critical for the qualitative performance of the model. This statement makes it questionable whether the model can provide good quantitative predictions on this level, although it qualitatively accounts for a wide range of empirical findings on visual search. As Wolfe himself concedes, not all quantitative aspects of human behavior in terms of response times and errors can be successfully predicted. In conclusion, models do not yet provide a general explanation of the quantitative strength of salience.
Some attempts to establish a quantitative measure of salience are based on the analysis of behavioral data. Among the few studies in this line of research are those by Nothdurft (1993, 2000). He asked participants to compare the conspicuousness of two singletons, that is, unique elements embedded in a display of homogeneous background elements. Each stimulus whose salience was to be measured was presented together with a stimulus that was salient due to a luminance difference. To measure the salience of a stimulus, the salience of the reference (luminance) stimulus was systematically increased. By this means, Nothdurft (2000) related the feature dimensions motion, orientation, luminance, and color to each other and also compared combinations of features from different dimensions. He quantified the salience of a stimulus by approximating psychometric functions and computing the luminance difference that would create the same salience, calculating what one might call the point of subjective equal salience.
This approach comes close to a general and theoretically well-founded quantification. Unfortunately, the results are difficult to replicate.
While we could replicate Nothdurft's findings using orientation and luminance, we also found that many participants showed no regular psychometric functions but rather a behavior strongly influenced by guessing (unpublished pilot study). Similar difficulties were reported by Koene and Zhaoping (2007).
Starting from this need for a better behavioral method to quantify salience, Huang and Pashler (2005) devised a search task in which the biggest and brightest square in a display of several objects had to be found. To verify that the target was found, participants had to report the location of a small probe on its left or right side. The dependent variable was the response time for a correct report. The display was randomly filled with other distractor squares. Salience was measured by introducing a salient key distractor and examining the effect of its feature differences on response times. Via this quantification, Huang and Pashler related luminance and size to each other.
An additional aspect impeding the measurement of salience is its time course. Several ideas about this time course have been discussed (e.g., Egeth & Yantis, 1997), with two types of temporal dynamics being especially important for the study of salience. (1) Salience-based progression of attention (e.g., Koch & Ullman, 1985) describes the shift of attention from the most salient spot in an image to the second most salient spot, and so forth. (2) The time course of salience describes how the strength of salience effects varies over time. Salience effects increase from display onset to 100 or 150 ms (e.g., Couffe, Mizzi, & Michael, 2016; Kean & Lambert, 2003) and decay after approximately 300 ms. Evidence for this time course, which resembles the time course of attention (Olivers, 2007), comes from a variety of paradigms: probe detection (Dombrowe, Olivers, & Donk, 2010; Donk & Soesman, 2010), TOJs (Donk & Soesman, 2011), saccadic selection (Donk & van Zoest, 2008), and saccadic trajectories (van Zoest, Donk, & Van der Stigchel, 2012). This research implies that it is crucial to measure salience at specific points in time (a condition not met by Huang & Pashler, 2005).
The approaches discussed above consider or measure performance as an indicator of attention. They spend less effort on the quantification of salience itself. An approach that might provide such a quantification is Bundesen's Theory of Visual Attention (TVA; Bundesen, 1998). It comprises a psychologically inspired, general formal explanation of visual attention and selection processes and allows one to infer attentional weights for specific objects in a display. The attentional weight determines whether an object is encoded in visual short-term memory (VSTM) and, if so, how quickly it is encoded, that is, its processing speed. These parameters can possibly be used as a general quantification of salience in the sense that the strength of salience is the attentional weight of an object. Although promising on an abstract level, TVA has only rarely been used to investigate salience (e.g., Nordfang, Dyrholm, & Bundesen, 2013). A possible reason is that in the item-report paradigms commonly used with TVA, the potential stimulus material is restricted to highly overlearned categories like digits and letters. The experimental paradigm requires a categorization because probabilities of stimulus categorizations are estimated. Hence, TVA is not directly applicable to salience research.
Recently, however, Tünnermann, Petersen, and Scharlau (2015) paved the way for such an application. Originally, they investigated whether the relatively faster perception of an attended stimulus in a pair is caused by speeded processing of this attended stimulus or decelerated processing of its unattended counterpart. Along with TVA-based item report, participants judged the temporal order (temporal-order judgment; TOJ) in which the stimuli appeared. Tünnermann et al. found that the attentional benefit originates from a combination of speeding up the attended and slowing down the unattended stimulus. This conclusion is based on a conventional TVA analysis. In their Discussion, however, they sketched a new approach: data from TOJs might be directly modeled by TVA to obtain TVA's attention parameters. At first sight, this might not seem ground-breaking, but the proposed method makes it possible to apply TVA-based analysis to any kind of stimulus. The aim of the present paper is to test the feasibility of this approach.
In a nutshell (details will be explained below in two sections on TVA and the modeling of TOJ data), the method consists of having observers judge the temporal order of two arbitrary visual stimuli. The interval between the stimuli is varied over trials. Applying TVA to the observers' judgments allows computing processing speeds, attentional weights, and the overall attentional processing capacity. By manipulating the features of the stimuli, this method allows us to quantify salience in the form of these parameters. This approach can provide a theoretically well-founded, general quantification of salience.

The Theory of Visual Attention (TVA)
The present section provides a short summary of the relevant parts of TVA as a formal theory. Key terms for the modeling as well as the experiments are introduced, most importantly attentional weight and processing capacity. The section cannot, however, provide a full introduction to TVA, for which we refer the interested reader to sources such as Bundesen (1998) and Bundesen, Habekost, and Kyllingsbæk (2005).
TVA was introduced as a unified theory of visual recognition and attentional selection. The theory achieves this by mathematically formalizing the processing of visual objects from presentation to encoding in VSTM. This processing is described as a race for representation in one of the limited slots in VSTM.
Stimuli race independently and in parallel. The race is influenced by many factors. Among them are the total number of elements competing for representation, the distribution of attention across the stimuli, and the categories to which the stimuli potentially belong.
In order to explain the formalization of this process, we proceed backwards from the arrival in VSTM to the appearance of the stimuli.
TVA assumes that the arrival times of stimuli in VSTM are exponentially distributed. Although the theory is fleshed out for multiple stimuli, the present approach is a simpler case: In the derivation proposed by Tünnermann et al. (2015) on the basis of TOJs, only two targets are encoded. Thus, the VSTM limitation can be ignored, which simplifies the formalization. Returning to the event of encoding an object in VSTM, the probability that an object x is encoded before time t can be expressed by the distribution function

F_x(t) = \begin{cases} 0, & t \leq t_0 \\ 1 - e^{-\upsilon_x (t - t_0)}, & t > t_0 \end{cases} \quad (1)

The two cases distinguished in the equation emerge from the assumption that there is a maximal ineffective exposure duration t_0: the interval that is still too short to provide enough sensory evidence for the race to start at all. If t ≤ t_0, there is no chance that the processing of x finishes, whereas for t > t_0 there is a chance that processing has been completed. This probability depends on the exposure duration and the processing rate υ_x. This rate's unit corresponds to categorizations per second, and it is composed of

\upsilon_x = \sum_{i \in R} \upsilon(x, i) \quad (2)

The equation is based on the idea that different categorizations are possible for object x. The set R contains these categories, and the processing rate υ(x, i), with i ∈ R, expresses the speed of the particular categorization that x belongs to category i. This i can, for example, refer to the property of having a particular color or a certain orientation.
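As an illustration, the distribution function (1) can be sketched in a few lines of Python (a minimal sketch; the rate and t_0 values below are arbitrary examples, not estimates from the experiments):

```python
import math

def p_encoded_before(t, v, t0):
    """Probability that an object with processing rate v (categorizations
    per second) and maximal ineffective exposure duration t0 (seconds)
    is encoded in VSTM before time t, following Equation 1."""
    if t <= t0:
        return 0.0  # the race has not started yet
    return 1.0 - math.exp(-v * (t - t0))

# Example: v = 24 Hz, t0 = 20 ms, exposure duration of 120 ms
p = p_encoded_before(0.12, 24, 0.02)
```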
Descending deeper into the formalization, the processing rate is defined by the rate equation

\upsilon(x, i) = \eta(x, i) \, \beta_i \, \frac{\omega_x}{\sum_{z \in S} \omega_z} \quad (3)

This equation introduces three important factors: η(x, i), the strength of the sensory evidence that x belongs to category i; β_i, a decision bias for category i; and the relative attentional weight of x, given by its own weight ω_x divided by the sum of the weights of all objects in the visual field. All objects in the visual field are contained in the set S. The weights are defined by the weight equation

\omega_x = \sum_{j \in R} \eta(x, j) \, \pi_j \quad (4)

which again includes the sensory evidence for x as η(x, j), and a new variable π_j, the pertinence value, which is a selection bias for category j.
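The weight and rate equations translate directly into code. The following sketch computes attentional weights via Equation 4 and a processing rate via Equation 3; the η, β, and π values are made up purely for illustration:

```python
def attentional_weight(etas, pertinences):
    """Weight equation (Eq. 4): sum over categories j of the sensory
    evidence eta(x, j) multiplied by the pertinence pi_j."""
    return sum(eta * pi for eta, pi in zip(etas, pertinences))

def processing_rate(eta_xi, beta_i, w_x, all_weights):
    """Rate equation (Eq. 3): sensory evidence times decision bias
    times the relative attentional weight of object x."""
    return eta_xi * beta_i * w_x / sum(all_weights)

# Two objects, two categories; the numbers are arbitrary examples.
w_a = attentional_weight([2.0, 1.0], [0.5, 0.5])  # object with strong evidence
w_b = attentional_weight([0.5, 0.5], [0.5, 0.5])  # weaker competitor
rate_a = processing_rate(2.0, 1.0, w_a, [w_a, w_b])
```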
These values are summed over the set of all categories R. The present approach concentrates on the parameters attentional weight ω, processing speed υ, and overall processing capacity C. The processing speed describes how quickly a representation in VSTM is built up. The sum of all processing speeds available is the processing capacity C. The attentional weight corresponds to the relative advantage of a stimulus and expresses how much attention is allocated to this object in comparison to the others. (The biases π and β are both held constant in the context of the present experiments and are hence not estimated.) Based on this admittedly swift introduction of the formalization, the reader may deem TVA too cumbersome for dealing with comparably simple salience displays. This formalization, however, offers advantages.
Firstly, TVA allows precise quantification and provides psychologically meaningful parameters, such as processing speed, which can be applied to a broad range of perceptual and attentional phenomena. Secondly, salience research can be related to other phenomena that have already been studied with TVA, such as, for example, feature-difference (bottom-up) and feature-relevance (top-down) interactions (Nordfang et al., 2013). Finally, because of its precise quantitative nature, the TVA framework can be used for generating quantitative hypotheses.

Modeling TOJ Data by TVA
TVA was initially applied to multi-element displays of highly overlearned stimuli, such as letters or digits, of which all or several belonging to a certain category had to be reported. The stimuli have to be masked to control the effective exposure duration. Both requirements, highly overlearned and maskable stimuli, have so far restricted the general applicability of TVA. As already mentioned, Tünnermann et al. (2015) discussed a TOJ model derived from TVA equations which renders TVA applicable to all kinds of visual stimuli and also does away with the necessity of masking. They did so by introducing a temporal-order task and relating the psychometric functions derived from this task mathematically to the distributions assumed by TVA. In the following section, we will explain briefly how TOJ data can be modeled with TVA. For more detail, we refer the reader to the original article.
In the TOJ paradigm, the temporal order of two onsets has to be judged. We call the two targets T_probe and T_reference. In the experiments presented later, they will have different properties according to the experimental condition, but at present these names are just used to make them distinguishable. The targets appear with a variable interval between them. The dependent variable is the proportion of judgments in favor of T_probe.
If T probe precedes T reference with a large interval, judgments in favor of T probe will be frequent. If the other stimulus leads, the proportion of judgments for T probe will be low. If T probe and T reference are comparable, and the two stimuli are presented simultaneously, the participants' performance should reach chance level.
However, subjective perception can deviate markedly from objective events. Judgments can, for example, be systematically influenced by attention. If one of the stimuli is attended in advance, this stimulus will be perceived earlier. This phenomenon is called prior entry (Spence & Parise, 2010). In terms of the judgments, this effect becomes evident in an increased proportion of reports of the attended stimulus as being perceived first.
TOJ data can be fitted with psychometric functions. Possible mathematical descriptions include the cumulative normal distribution as well as logistic, Weibull, and Gumbel functions, of which the former two are most widely employed (for more formal descriptions and how to fit these functions, see Kuss, Jäkel, & Wichmann, 2005; Wichmann & Hill, 2001a). These functions have at least two parameters, the most important of which describe the center of the function and its slope. The center, at which both judgments are equally likely, is usually interpreted as the point of subjective simultaneity (though see Weiß & Scharlau, 2011). The slope is an indicator of discrimination performance. Importantly, it is a matter of debate which of the functions mentioned above should be used because none of them is particularly supported by theory. Hence, the interpretation of the functions and their parameters is also limited.
In contrast to psychometric functions, TVA offers parameters deeply rooted in psychological theory. As an additional advantage, they can be interpreted readily. For instance, the parameter υ corresponds to processing speed; its unit is stimuli processed per second.
This model carries more information than the point of subjective simultaneity and discrimination performance which measure only performance, not the processes that drive this performance.
Each data point of a psychometric function is equivalent to the proportion of one event being encoded first. This connection is illustrated in Figure 1 for the judgment of a salient and a non-salient stimulus (the main conditions in the experiments reported below). Each of the points sampled from the psychometric function depends on the process depicted above the function: According to the TVA-based model, each of the two bars represents a race to VSTM. The results of these two races are compared, and this comparison determines the participant's judgment. Each race is influenced by the objective onset and its speed. The process is, however, still stochastic; that is, these variables do not fully determine the outcome.
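The stochastic nature of the two races can be made concrete with a small Monte Carlo simulation (a sketch under the simplifying assumption that the maximal ineffective exposure durations of both stimuli are equal and therefore cancel out):

```python
import random

def simulate_probe_first(soa_ms, v_p, v_r, n=20000, seed=1):
    """Simulate n trials of the two races to VSTM. The probe appears at
    time 0 and the reference at soa_ms; each stimulus finishes after an
    exponentially distributed processing time governed by its rate (Hz).
    Returns the proportion of trials in which the probe arrives first."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        finish_probe = rng.expovariate(v_p / 1000.0)         # in ms
        finish_ref = soa_ms + rng.expovariate(v_r / 1000.0)  # in ms
        wins += finish_probe < finish_ref
    return wins / n
```

With equal rates and simultaneous onsets the simulated proportion approaches .5; delaying the reference pushes it toward 1, tracing out the sigmoid shape of the TOJ data.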
As proposed by Tünnermann et al. (2015), the chance of T_probe being encoded first can be described with the parameters of TVA. It can be expressed by three parameters: υ_p (the processing speed of T_probe), υ_r (the processing speed of T_reference), and Δt, which incorporates the SOA and the maximal ineffective exposure durations t_0p and t_0r of the two stimuli. Because t_0p and t_0r are assumed to be equal in the context of the present experiments, Δt simply equals the SOA.
In terms of these parameters, the probability of T_probe being encoded first can be expressed as

P_{p\,first}(\Delta t) = \begin{cases} 1 - e^{-\upsilon_p |\Delta t|} + e^{-\upsilon_p |\Delta t|} \frac{\upsilon_p}{\upsilon_p + \upsilon_r}, & \Delta t \leq 0 \text{ (T_probe leads)} \\ e^{-\upsilon_r |\Delta t|} \frac{\upsilon_p}{\upsilon_p + \upsilon_r}, & \Delta t > 0 \text{ (T_reference leads)} \end{cases} \quad (5)

In the first case, 1 − e^{−υ_p|Δt|} describes the probability that T_probe is fully encoded before T_reference starts the race to VSTM, and e^{−υ_p|Δt|} is the probability that T_probe is not encoded before T_reference starts its race; if that happens, the probability of encoding T_probe first is given by Luce's choice axiom as υ_p/(υ_p + υ_r). In the second case, analogously, e^{−υ_r|Δt|} denotes the probability that T_reference is not encoded before T_probe starts its race; if this happens, the probability of T_probe being encoded first is again given by Luce's choice axiom.
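This case distinction can be written as a closed-form psychometric function. The following Python sketch evaluates the probability of a "probe first" judgment; negative Δt means the probe leads:

```python
import math

def p_probe_first(dt_ms, v_p, v_r):
    """TVA-based TOJ model: probability that T_probe is encoded first.
    dt_ms is the effective SOA in ms (negative: probe leads);
    v_p and v_r are processing rates in Hz."""
    luce = v_p / (v_p + v_r)           # Luce's choice axiom
    dt = abs(dt_ms) / 1000.0           # convert to seconds
    if dt_ms <= 0:                     # probe leads
        return 1.0 - math.exp(-v_p * dt) + math.exp(-v_p * dt) * luce
    return math.exp(-v_r * dt) * luce  # reference leads
```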
To estimate the TVA parameters introduced in this section, a suitable statistical model is needed. We use Bayesian statistics for modeling and data analysis because Bayesian methods are particularly well suited for inference under an assumed model (Little, 2006). We implemented a generative model based on the mathematical description of TVA, visualized as a hierarchical graphical Bayesian model in Figure 2. The remaining variables in the model do not provide additional information because they depend on the weight and capacity, as indicated by the direction of the arrows. For further information on the exact nature of the Bayesian parameter estimation process, please refer to Appendix A.
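To convey the gist of the estimation without the full hierarchical machinery, the following sketch computes a binomial likelihood for the TOJ model and finds the posterior mode on a coarse grid under flat priors. This is a deliberately simplified stand-in for the sampler described in Appendix A; the data format and grid range are our own illustration:

```python
import math

def log_likelihood(v_p, v_r, data):
    """Binomial log-likelihood of TOJ data under the TVA-based model.
    data: list of (dt_ms, n_probe_first, n_trials) tuples."""
    ll = 0.0
    for dt_ms, k, n in data:
        luce = v_p / (v_p + v_r)
        dt = abs(dt_ms) / 1000.0
        if dt_ms <= 0:  # probe leads
            p = 1.0 - math.exp(-v_p * dt) * (1.0 - luce)
        else:           # reference leads
            p = math.exp(-v_r * dt) * luce
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard the logarithms
        ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll

def grid_posterior_mode(data, rates=range(5, 61)):
    """With flat priors, the posterior mode equals the maximum-likelihood
    pair (v_p, v_r) on the grid of candidate rates (in Hz)."""
    return max(((vp, vr) for vp in rates for vr in rates),
               key=lambda pair: log_likelihood(pair[0], pair[1], data))
```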
The following four experiments test the viability of the proposed method in salience research. To this end, we combined TOJs with salience displays. In Experiment 1, the order of stimulus onsets had to be judged. This experiment was most similar to common TOJ experiments. In Experiment 2, stimulus offsets were judged, and the stimuli of Experiment 3 flickered for a short duration. We investigated whether salience increased processing speed and attentional weights.

Figure 1. Cognitive model. The bars in the upper part represent the races to VSTM. Formally, these races depend on the processing rates; the rates υ_sp and υ_sr from the salience condition of the experiments are shown as examples. The proportion of "salient first" judgments depends on the comparison of both races. SOA = stimulus onset asynchrony.
Figure 2. Hierarchical Bayesian graphical model of the data of the salience condition, indicated by the index s. The same model applies to the neutral condition n. The group level, the variables in the highest layer, estimates the TVA parameters for a particular condition. This layer was compared to its counterpart in the neutral condition.

Experiment 1
Experiment 1 is based on the hypothesis that the onset of an orientation singleton receives an increased attentional weight and is hence encoded into VSTM more quickly. It was carried out as a proof of concept to show that TVA can be successfully applied to salience research via the general TOJ method outlined by Tünnermann et al. (2015). To this end, it had to meet the requirements of both salience studies and TOJ research, requiring us to combine multi-element displays from salience research with temporally distributed targets in the most direct way possible.
The participants judged the temporal order in which two targets appeared in a display of 17 × 17 bars. A center section of these displays is exemplarily shown in Figure 3. The salience display consisting of homogeneous background stimuli was shown first. The targets appeared later. One of the targets could differ in orientation whereas the other one was always non-salient-that is, of the same orientation as the background elements.
This combination of multi-element displays and stimulus onsets is the direct way of checking the applicability of the method.
Unfortunately, however, it is questionable whether target onsets allow salience effects to show up. Firstly, the blanks at the locations of the future targets may act as salient stimuli because they violate the background pattern (Li, 2002). Secondly, results on the temporal course of salience suggest that salience is used to gradually distribute attention over the display (Dombrowe et al., 2010): After a 30 ms delay, the salience effect is very small in comparison to its peak at 120 ms. Salience information thus might not be available initially. Finally, the onset information may be so strong that it masks any effects of salience. Because the present experiment serves as a proof of concept, this is no severe disadvantage. If the methodology works as expected, we will be able to precisely describe the reported temporal order with the help of the proposed model, independent of whether an effect of salience is present on the group level. Following this proof of concept, Experiments 2 and 3 will look into the effects of salience themselves.

Figure 3. Visualization of the stimulus sequence of Experiments 1 to 4. Stimuli are identical to those of the experiments, but the displays have been scaled for visibility. The salience display was shown 150 ms before the probe event. The event to be judged was the onset (Experiment 1), offset (Experiment 2), or flicker (Experiments 3 and 4; depicted as white coronae). Only the salience conditions, which comprise a salient probe stimulus, are shown. The neutral conditions of the experiments featured a non-salient probe stimulus equal to the reference stimulus; these conditions are not depicted. The arrow depicts the flow of time. SOA = stimulus onset asynchrony.

Procedure
Participants were instructed to fixate the cross in the center of the screen throughout each trial. Their task was to report which element occurred first, the left or the right one, and press the left or right key, respectively. There was no time pressure. The experiment started with a training phase of 40 trials that included feedback about errors. There was no feedback after the training. After 50 trials each, a break was initiated which was ended by a keypress. The experiment lasted approximately 45 min.

Results
The judgments whether the left or right stimulus appeared first were converted into the judgment whether T probe appeared first. Remember that T probe is the stimulus that stands out from its surroundings in the salience condition.
As can be seen in Figure 4, the participants generated typical sigmoid TOJ data. All individual data showed this pattern which allowed us to apply the model (see the section "Modeling TOJ data by TVA" for details).
Bayesian statistics yields a full probability distribution for each model parameter. A point estimate of a parameter is provided by the mode of the respective distribution, and the spread of the distribution provides an easily interpretable measure of the certainty with which the parameter was estimated: Broad probability distributions correspond to vague estimates. This information is expressed by the 95% highest density interval (HDI) of the distribution, the shortest interval that contains 95% of the most probable parameter values.
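Given posterior samples for a parameter, the 95% HDI is simply the shortest interval covering 95% of the samples. A minimal sketch (the sample list below is a placeholder, not an actual posterior):

```python
import math

def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples."""
    s = sorted(samples)
    n = len(s)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    spans = [(s[i + k - 1] - s[i], i) for i in range(n - k + 1)]
    width, i = min(spans)    # pick the narrowest window
    return s[i], s[i + k - 1]
```

For a posterior over an attentional weight, checking whether .5 falls inside the returned interval mirrors the comparisons reported for the experiments.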
The most interesting variables in the hierarchical Bayesian graphical model are on the group level because they allow us to compare the salience and the neutral condition. The relation between the weight for T_probe in the salience condition, ω_sp, and its counterpart in the neutral condition, ω_np, shows whether salience influences attention parameters. The parameter distributions for the weights are depicted in Figure 5. The estimates ω_sp = .507 and ω_np = .516 differ only slightly. Interestingly, the value .5 is not among the 95% most probable parameter values for ω_np; that is, attention was not distributed equally across the two targets in the neutral condition. Because all elements were equally salient in this condition, visual properties cannot be the cause of the higher attentional weight for T_probe. The temporal properties, however, offer an explanation: T_probe was always shown 150 ms after display onset. This fixed interval made it predictable. In order to measure the effect of salience unbiased by temporal expectation, we subtracted the deviation from the expected neutral weight .5 in the ω_np parameter from the ω_sp parameter. The corrected weight is ω_sp,clean = .493. The correction shifts the weight of the salience condition ω_sp in the opposite of the expected direction, which would be an increased weight for the salient stimulus. As explained earlier, the effect is small, and hence ω_np and ω_sp,clean again differed only slightly.

Figure 5. Estimated attentional weights (ω) for the probe stimuli of Experiment 1: the salience condition (ω_sp = weight for the salient probe) in blue and the neutral condition (ω_np = weight for the neutral probe) in red. The weights for the reference stimuli are 1 minus the weight of the respective probe.
The processing rates for the stimuli are very similar. All are in the range of 23.3 Hz to 24.9 Hz. This result is to be expected when both weights and capacities are similar (see Figure 6).
The processing capacity was similar in both conditions with C s = 49.4 Hz and C n = 48.1 Hz (see Figure 7). The distribution of its difference is centered on 0. Hence a difference is very unlikely. Importantly, this allows one to compare the attentional weights across conditions because it can be assumed that the same process distributes the same resources differently in the two conditions.

Discussion
Staying close in design to the well-established TOJ paradigm while using multi-stimulus displays yielded plausible data that resembled psychometric functions. The TVA-based model was successfully applied to these data. It was possible to estimate parameter distributions for individual participants as well as on the group level. The estimated processing rates are comparable to those found in earlier TVA studies (e.g., Finke et al., 2005). In sum, this allows us to use TOJs on multi-element displays in order to compute TVA-based attentional parameters.
Although one stimulus was clearly salient due to its 90° orientation difference, this salience increased neither its attentional weight nor its processing rate in comparison to its counterpart from the neutral condition. Salience thus had no influence on the distribution of attention as measured by TVA parameters. This result cannot be attributed to a lack of sensitivity: The fact that the neutral weight (.5) lay outside the HDI for the neutral condition (likely due to the fixed time of the T_probe onset) indicates the sensitivity of the approach. That is, even small differences between the attentional parameters of T_reference and T_probe should have been detected, had they been present.
The absence of a salience effect on attentional parameters might be explained by the lack of a delay between the property which is supposed to guide attention (the local contrast) and the events which are relevant for the TOJ, that is, the onsets. TVA assumes that the sensory evidence for onset and local contrast is available equally fast. In the V1-salience model by Li (2002), however, it is assumed that salience is computed by pyramidal cells and interneurons that interact locally and reciprocally in their layer. The onset, in contrast, can be processed without such time-consuming recurrent interactions.

Figure 6. Estimated processing rates (υ) for Experiment 1. The processing rates of the salience condition (υ sp = processing rate for the salient probe; υ sr = processing rate for the reference in the salient probe displays) are shown in blue, those of the neutral condition (υ np = processing rate for the neutral probe; υ nr = processing rate for the reference in the neutral probe displays) in red. The darker distributions belong to the probe stimulus and the lighter distributions to the reference stimulus.

Figure 7. Estimated processing capacities (C) for Experiment 1 in the salience condition (C s = capacity for the salient stimulus) in blue and the neutral condition (C n = capacity for the neutral stimulus) in red. If both distributions are subtracted, the difference of 0 is in the highest density interval (HDI), which indicates that the overall processing capacity was similar in both conditions.

Experiment 2
In Experiment 2, the onsets used in Experiment 1 were replaced with offsets. Offsets are susceptible to attentional effects (Vingilis-Jaremko, Ferber, & Pratt, 2008). We hypothesized that the presence of the salience-generating property prior to the event (offset) should cue the event and hence lead to a higher attentional weight. Again, this should lead to a quicker encoding into VSTM. The offset at the potentially salient position occurred 150 ms after the onset of the display. As shown by Donk and Soesman (2010), effects of orientation salience should be present in this time range.

Method

Participants
A total of 20 participants (9 male and 11 female; M age = 22.6, range 19-47), including the authors, participated in Experiment 2. All of them were students or members of Leuphana University of Lüneburg or Paderborn University. Each participant reported normal or corrected-to-normal visual acuity and completed one session. All participants except for the authors received a payment of 8 Euro per hour.

Apparatus
The apparatus was the same as in Experiment 1.

Stimuli
The same stimuli as in Experiment 1 were used. Because this time the temporal order of offsets had to be judged, all elements (background elements, T reference and T probe ) were shown after the initial presentation of the fixation cross. The offsets of the two targets occurred with the same timing as the onsets in Experiment 1.

Procedure
The procedure was the same as in Experiment 1 except that participants were instructed to judge which element disappeared first. This is depicted in Figure 3.

Results
Similar to Experiment 1, the data resembled psychometric functions.
Hence, it was possible to apply the model and estimate the parameters.
A summary of the raw data is given in Figure 8.
The attentional weights on the group level are, again, most informative about whether attention was deployed unequally. In contrast to Experiment 1, the attentional weight for the probe in the salience condition, ω sp clean = .393, clearly differed from the equal weight distribution, as shown in Figure 9. As in Experiment 1, the attentional weight ω np = .526 in the neutral condition deviated from the balanced value of .5. We suppose this deviation to be a consequence of the timing, which differed for probe and reference stimulus. The weight in the salience condition was again corrected (uncorrected ω sp = .423), such that the small shift in weight likely due to timing does not affect the measurement of salience. The processing rate for the salient probe, υ sp = 23.4 Hz, was lower than the processing rate for the neutral probe, υ np = 31.6 Hz (see Figure 10 for their distributions). The processing capacity, as shown in Figure 11, was constant over the conditions, which allowed the comparison of weights across conditions. The comparison of the judgment data and the posterior predictive in Figure 8 shows that the model is able to fit the data and provides a reasonable description of them.

Discussion
Replacing the onset from Experiment 1 with the offset led to a distinct and measurable salience effect. The attentional weights shifted away from the salient to the non-salient stimulus. Contrary to theory, the salient stimulus thus received the lower attentional weight. New and Scholl (2009) reported that the subjective duration of an attended stimulus is longer than that of an unattended one, which contributes to a delayed perceived offset of the attended stimulus. Similarly, Rolke, Ulrich, and Bausenhart (2006) showed that the response to a cued offset takes longer than the response to an uncued offset. They conclude that attention delays the perceived stimulus offset.

Figure 8. Plot of raw data (mean of judgment frequency per SOA over all participants) and posterior predictive for the salient and neutral condition of Experiment 2. The plot shows predicted data based on the estimated parameters. SOA = stimulus onset asynchrony.

Figure 9. Estimated attentional weights (ω) for the probe stimuli of Experiment 2, salience condition (ω sp = weight for the salient probe) in blue and neutral condition (ω np = weight for the neutral probe) in red. The weights for the reference stimuli are 1 minus the weight of the respective probe.
Furthermore, the absence of a stimulus can be salient if it violates the local pattern (Li, 2002). Replacing the stimuli with gaps might hence have caused an unwanted manipulation of salience. Although we cannot offer a full explanation yet, it is likely that the unexpected direction of the salience effect is due to the offset event: This event does not only probe salience but also manipulates it. Independent of this unexpected finding, Experiment 2 nevertheless substantiated the validity of the method proposed in the present paper. The TVA-based analysis was applicable to the data and yielded interpretable parameters.
Experiment 3 makes a final attempt at disclosing effects of salience with this method by keeping the salience display as constant as possible.

Experiment 3
Although a salience effect was measured successfully in Experiment 2, its direction was unexpected. We hypothesized that the offset event was responsible for this because it changed the salience display permanently. Therefore, a short flicker was used in Experiment 3. The flicker prevents a permanent change of the salience display. Again, salience is supposed to increase the attentional weight and thus speed up the processing of the probe stimulus.

Figure 10. Estimated processing rates (υ) for Experiment 2. The processing rates of the salience condition (υ sp = processing rate for the salient probe; υ sr = processing rate for the reference in salience displays) are shown in blue, those of the neutral condition (υ np = processing rate for the neutral probe; υ nr = processing rate for the reference in neutral displays) in red. The darker distributions belong to the probe stimulus and the lighter distributions to the reference stimulus.

Figure 11. Estimated processing capacities (C) for Experiment 2 in the salience condition (C s , in blue) and the neutral condition (C n , in red). If both distributions are subtracted, the difference of 0 is in the highest density interval (HDI), which indicates that the overall processing capacity was similar in both conditions.

Procedure
The procedure was the same as in Experiment 1 except that the participants were instructed to judge whether the first flicker was on the left or the right of the fixation cross. This procedure is depicted in Figure 3.

Results
As in the previous experiments, it was possible to apply the model to the TOJ data and to derive the parameters. For illustration, the averaged responses per SOA are given in Figure 12. The attentional weight of the salient probe, ω sp clean = .642, clearly exceeded the balanced value of .5 (see Figure 13). This result also means that processing speed changed: The salient element is processed faster (υ sp = 27.5 Hz) than its non-salient counterpart from the neutral condition (υ np = 20.6 Hz), while the reference stimulus from the salience condition is processed slower (υ sr = 13.2 Hz) than its counterpart (υ nr = 18.09 Hz). All estimated rate parameters are shown in Figure 14. The rates can be interpreted as a shift of resources from the non-salient reference stimulus to the salient probe stimulus in the salience condition.
The overall processing capacity, again, was the same in both conditions as shown in Figure 15. Hence, weights are interpretable as a redistribution of the same resources.
Finally, as a result of the modeling, the posterior predictive shows a distinct shift between the salient and the neutral condition, as depicted in Figure 12. The two conditions show almost no overlap. This shift indicates that the salient T probe is perceived earlier, in perfect accord with the parameters and the summary of the raw data discussed above.

Discussion
Experiment 3 yielded a salience effect that increased the attentional weight on the salient stimulus and hence its processing speed. This is in line with both the salience and the TVA literature and shows that TVA can be used to quantify the effects of salience on processing. This quantification happens in terms of the individual processing speed and the attentional weight. The attentional weight describes the allocation of attention across all relevant stimuli and has the advantage of measuring salience in relation to the other stimulus in the display. Attentional weights are directly comparable if the overall capacity is the same. The processing speed is a second possible measure of salience. Though the attentional weight is theoretically more sound, the processing speed is directly comparable even if the capacity does not stay the same.
With a value of ω sp clean = .642, the shift from the neutral weight of .5 is very clear. Note that the TOJ method is rather conservative in this respect because both targets have to be encoded. This makes extreme values for the attentional weight close to 0 or 1 very unlikely.
To the best of our knowledge, this is the first study in which TOJs manipulated by salience were sufficiently sampled to show the full psychometric function and the occurrence of systematic shifts in the report probability. The occurrence of this shift was already assumed by Donk and Soesman (2011). Because only one-half of the suspected psychometric function was sampled in their experiment, the actual function was not derivable. Both the data presented in Figure 12 and the posterior predictive show this shift across the full function.

Figure 12. Plot of raw data (mean of judgment frequency per SOA over all participants) and posterior predictive for the salient and neutral condition of Experiment 3. The plot shows predicted data based on the estimated parameters. SOA = stimulus onset asynchrony.

Figure 13. Estimated attentional weights (ω) for the probe stimuli of Experiment 3, salience condition (ω sp ) in blue and neutral condition (ω np ) in red. The weights for the reference stimuli are 1 minus the weight of the respective probe.
The size of the change in attentional weights (as inferred from the HDI and the posterior predictive) indicates that the proposed method will be appropriate to detect effects of different sizes, including small effects: There is nearly no overlap between the expected psychometric functions for the salient and the neutral condition. This means that smaller shifts will also be detectable, as can be expected, for instance, when smaller local feature differences are used. The small but reliable effect of the fixed time of T probe shows that the method is sensitive enough for small effects.
The arguments for preferring the TVA model over the classical analysis by psychometric functions are theoretical ones, as explained in the Introduction. We nevertheless also conducted a conventional analysis of psychometric functions, which the interested reader finds in the Appendix. It is in accord with the present results but provides less information.

Experiment 4
Experiment 3 showed the feasibility of the proposed method.
Experiment 4 was designed as a test of the generality of our approach.
We furthermore analyzed feature differences smaller than the admittedly large difference between 0° and 90° in Experiments 1 to 3. To this end, we used a high-salience condition and a low-salience condition, operationalized by stimulus luminance.

Apparatus
The apparatus was the same as in Experiment 1.

Stimuli
The same stimuli as in Experiment 1 were used, except that salience was manipulated in the luminance dimension. In the low-salience condition, a dark gray probe with RGB (80, 80, 80) (4.03 cd/m²) was used. In the high-salience condition, the probe was black, RGB (0, 0, 0) (0.31 cd/m²). To keep the experiment as short as possible, the neutral condition without a salient probe was omitted. We did this because Experiment 3 already showed what can be theoretically assumed: This condition yields a weight of .5 for the target, that is, attention is distributed equally between the two visually equal targets.

Procedure
The procedure was the same as in Experiment 3.

Results
Again, the raw data were typical TOJ data (see Figure 16). The attentional weight for the probe in the high-salience condition, ω hp = .582, was higher than in the low-salience condition, ω lp = .539, which implies a difference of .043 in attentional weight. The parameter distributions are shown in Figure 17.

Figure 14. Estimated processing rates (υ) for Experiment 3. The processing rates of the salience condition (υ sp = rate for the salient probe; υ sr = rate for the reference in the salient display) are shown in blue, those of the neutral condition (υ np = rate for the neutral probe; υ nr = rate for the reference in the neutral probe display) in red. The darker distributions belong to the probe stimulus and the lighter distributions to the reference stimulus.

Figure 15. Estimated processing capacities (C) for Experiment 3 in the salience condition (C s ; blue) and the neutral condition (C n ; red). If both distributions are subtracted, the difference of 0 is in the highest density interval (HDI), which indicates that the overall processing capacity was similar in both conditions.
The difference in attentional weights originated mainly in the processing of the non-salient reference stimulus: High- and low-salience probes were processed nearly equally fast, with rates of υ hp = 18.7 Hz and υ lp = 18.3 Hz. The processing speed of the reference stimulus, however, varied strongly with condition, with a rate of υ hr = 13.3 Hz in the high-salience and one of υ lr = 17.1 Hz in the low-salience condition.
This is important for the theoretical explanation (see below).
The overall processing capacity was very similar, with C h = 32.2 Hz for the high-salience condition and C l = 35.1 Hz for the low-salience condition, as depicted in Figure 19.
The posterior predictive, presented in Figure 16, shows an asymmetrical distribution. This accords with the processing speeds shown in Figure 18: The processing of the reference targets is affected more than the processing of the probes.

Figure 16. Plot of raw data (mean of judgment frequency per SOA over all participants) and posterior predictive for the high-salience and low-salience condition of Experiment 4. The plot shows predicted data based on the estimated parameters. SOA = stimulus onset asynchrony.

Figure 18. Estimated processing rates (υ) for Experiment 4. The processing rates of the high-salience condition (υ hp = rate for the highly salient probe; υ hr = rate for the reference in the high-salience probe displays) are shown in blue, those of the low-salience condition (υ lp = rate for the less salient probe; υ lr = rate for the reference in the low-salience probe displays) in red. The darker distributions belong to the probe stimulus and the lighter distributions to the reference stimulus.

Figure 19. Estimated processing capacities (C) for Experiment 4 in the high-salience condition (C h ; blue) and the low-salience condition (C l ; red). If both distributions are subtracted, the difference of 0 is in the highest density interval (HDI), which indicates that the overall processing capacity was similar in both conditions.

Discussion
Experiment 4 expanded the scope of the present method to the luminance dimension and tested two quantitative levels of salience. As expected, both singletons received increased attentional weight, and this increase scaled with their salience: The highly salient probe received more attentional weight than the less salient probe. Thus, this fourth experiment shows that the proposed method is applicable to features other than orientation, which is a promising result for further generalization. Furthermore, Experiment 4 indicated that quantitative differences in salience lead to quantitative differences in attentional weights. This result promises to enlarge the scope of our method to a general quantitative model of salience. Note, however, that this difference seems to be caused by slower processing of the reference stimuli.
Faster processing of the highly salient compared to the less salient probe contributed only slightly to this difference.

General Discussion
The Theory of Visual Attention (TVA) can serve as a foundation for quantifying visual salience. We showed this by conducting four experiments. All experiments substantiate the soundness of the model, which combines TOJs and TVA.
Experiment 1 demonstrated the applicability of the suggested method in general. This was achieved by combining salience displays and TOJs. Experiment 2 tested the effects of salience on attentional weights and processing speed. Although in principle successful (the experiment indeed measured effects on weight and speed), it was not entirely satisfying because the attentional weights favored the non-salient stimulus, which was also processed faster than the salient one.
As we reasoned that the offsets we used in Experiment 2 might not have been optimal because they caused (possibly) salient gaps in the bar array, we replicated the experiment with flickering stimuli. This experiment showed the full relevant data pattern: The salient stimulus received more attentional weight and was processed faster than the non-salient one. Attention was withdrawn from the non-salient stimulus and redistributed to the salient one.
In Experiment 4, we applied the flicker procedure to the luminance dimension in order to demonstrate its applicability to other stimulus dimensions as well as its sensitivity and its usefulness for a quantification of salience effects. All aims were successfully reached by Experiment 4: Salient stimuli drew attention towards themselves, and there was a difference in weights and processing speeds between highly and less salient stimuli.
Beyond the comparison of individual model parameters, Experiments 3 and 4 have shown that salience redistributes resources according to feature differences: Attention that is dedicated to the salient stimulus is withdrawn from the reference stimulus. Importantly, this relation is not predefined by the TVA model. Because the processing rate of each stimulus is modeled as an independent process, it would be possible that only the salient stimulus gains while the speed of the race stays constant for the reference stimulus. (Such a rate increase would result in a capacity difference between conditions.) Although we focused on a measure of salience, this may be understood as evidence for parallel processing rather than a guided serial processing as in the Guided Search models by Wolfe (e.g., 1994, 2007) that predict an increase of attention for salient stimuli.
Independent of the salience-related results, the proposed method of combining salience displays with TOJs and TVA parametrization was successful in all four experiments: All yielded psychometric functions as well as plausible parameters including the attentional weights and processing speeds of the two targets as well as the overall processing capacity.
To sum up, the combined TVA/TOJ method proposed in the present paper seems a promising tool. Further studies could test and model the quantitative relationship between salience values and attentional weights in more detail, for instance, by employing several levels of salience instead of only two. Also, different salience dimensions could be compared directly via attentional weights, relating the salience of, say, a colored singleton to an orientation or luminance singleton. This is, however, beyond the scope of the present article.
We propose the presented procedure to measure the strength of salience because this strength can be quantitatively expressed in a theoretically meaningful parameter of a tried and tested theory. Different from earlier approaches, the method is not limited to specific salience dimensions because the task is largely independent of the type of elements. Also, it is not limited to a reference stimulus like the methods proposed by Nothdurft (2000) and Huang and Pashler (2005).
A further advantage is that no assumptions about contested issues such as the relative contribution of top-down and bottom-up influences have to be made to apply the present approach. While Theeuwes (2004, 2010, 2013), for example, takes the stance that salience captures attention inevitably, other researchers claim that all salience effects are modulated by top-down task sets (e.g., Ansorge & Becker, 2014; Folk, Remington, & Johnston, 1992; Yantis & Egeth, 1999). Our method provides a useful salience measure for both perspectives. Furthermore, interactions between bottom-up and top-down influences can be studied within the TVA framework. Nordfang et al. (2013) have developed a TVA extension that tackles this problem (see also Bundesen, Vangkilde, & Petersen, 2015). Both feature contrast and task relevance are modeled as individual variables affecting the attentional weight.
That is, these authors already proposed a model for the interaction of bottom-up and top-down influences on attention. Its empirical application is, however, restricted to the partial report and the stimuli suitable for the partial report, whereas our TOJ-based approach can deal with all kinds of stimuli.
Besides effects of salience, we consistently detected a small effect on the attentional weight in the neutral conditions of Experiments 1 to 3. All visual features were equal for the two targets in these conditions, except for their timing. While T reference varied according to the SOA, T probe was always shown at a fixed point in time. With this procedure, the strength of salience is not distorted by the time course of salience.
As a trade-off, we accepted the chance that an effect of predictability occurred, which indeed was the case. The formal model, however, allowed us to correct for it. Note that this finding is well in line with results from Vangkilde, Coull, and Bundesen (2012), who investigated the effect of temporal predictability on perception. They examined effects of timing on t 0 , the minimal effective exposure duration, and the processing speed υ, whereas we detected an influence on the attentional weight ω of the predictable stimulus and its υ parameter. The precision with which the small effect was detected is promising for future studies.
A further aspect concerning the timing of the experiment is the presentation duration of the display prior to the TOJ. Although we kept it equal in all conditions, decreasing and increasing the duration of the salience display is possible. By this procedure, effects of presentation duration on the salience effect could be studied.

Appendix A

Bayesian Parameter Estimation for the Proposed Model
We use Bayesian statistics for the formalization of the model and the data analysis because Bayesian methods are particularly well-suited for inference under an assumed model (Little, 2006). We thus implemented a generative model based on the mathematical description of TVA and performed Gibbs sampling (Plummer, 2003) to obtain posterior distributions of all relevant parameters and comparisons.
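For illustration, the highest density interval (HDI) used for such posterior comparisons (e.g., checking whether 0 lies in the HDI of a capacity difference) can be computed directly from posterior samples. The following is a minimal sketch under the assumption of a unimodal posterior; the function name `hdi` and the default interval mass are our own choices, not part of the analysis code used in the study:

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior samples
    (the highest density interval for a unimodal posterior)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n_in = int(np.ceil(mass * len(x)))          # samples inside the interval
    widths = x[n_in - 1:] - x[:len(x) - n_in + 1]  # width of every candidate interval
    i = int(np.argmin(widths))                  # narrowest candidate wins
    return x[i], x[i + n_in - 1]
```

Applied to samples of a difference distribution, a comparison then reduces to checking whether 0 falls between the two returned bounds.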
We employed the hierarchical Bayesian graphical model as shown in Figure 2 for analyzing the data. The discrete y nodes correspond to the collected data and represent the response for T probe as the first target.
The n node represents the total number of trials per SOA. θ represents the probability of a success of the binomial distribution at each SOA.
This probability and the TVA parameters introduced in the TVA section are computed according to the equations in Table 1. This describes the model for a neutral condition, in which no stimulus is salient. The salience condition, in which one of the stimuli was salient, was modeled analogously.
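To make the mapping from TVA parameters to judgment probabilities concrete, a minimal sketch follows. It assumes, as in standard TVA-based race modeling, that encoding durations are exponentially distributed with rates υ derived from the attentional weights and the capacity C; the function name and the convention that positive SOAs mean the probe appears first are our own choices rather than the exact form of Table 1:

```python
import math

def toj_prob_probe_first(soa, w_p, w_r, C):
    """Probability that the probe is encoded into VSTM before the
    reference, given attentional weights w_p, w_r and capacity C (Hz),
    assuming exponentially distributed encoding durations.

    soa > 0: probe onset precedes the reference onset by `soa` seconds.
    """
    v_p = w_p / (w_p + w_r) * C  # probe processing rate (Hz)
    v_r = w_r / (w_p + w_r) * C  # reference processing rate (Hz)
    if soa >= 0:  # probe leads the race by `soa`
        return 1.0 - v_r / (v_p + v_r) * math.exp(-v_p * soa)
    # reference leads the race by |soa|
    return v_p / (v_p + v_r) * math.exp(-v_r * (-soa))
```

Note that at SOA = 0 the predicted probability equals the probe's normalized attentional weight, which is why ω can be read off the psychometric function directly.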
Besides the data, priors are a mandatory part of Bayesian data analysis. Usually, these priors express previous knowledge about a variable. To keep the priors and the resulting analysis unbiased by assumptions and previous data (with which the reader might not agree), we based the priors on the theoretical extrema of these parameters. This type of prior is called non-informative. The priors of weight and processing capacity are bound by 0 on the lower end. Also, the weight cannot exceed 1. The processing capacity is not limited explicitly by theory, but the chosen maximum processing capacity of the prior (500 Hz) would imply that a race to VSTM could be completed in 2 ms, which is highly unlikely. As recommended by Gelman (2006), we chose a half-t distribution, the positive half of a t distribution, for the variance priors of our hierarchical model. The priors are given in Table A1.
Given the model, the priors, and the collected data, the parameters are computed by Gibbs sampling (Plummer, 2003). Mathematically, it is guaranteed that for an infinitely long sampling process the result is perfectly representative of the true posterior. Because computing time and resources are limited, the process is, however, not guaranteed to converge. Hence, it is crucial for a Bayesian analysis to check the samples for representativeness. Gibbs sampling was run with a burn-in period of 5,000 iterations, 100,000 samples, and a thinning factor of 10 in four chains. As explained by Kruschke (2010, p. 131), there is no optimal way of checking representativeness but rather several ways that are useful. The four chains converged for all reported posterior distributions. This was checked by examining the trace plots, the histograms (density plots), and the shrink factor. Additionally, the accuracy of the samples was checked by the effective sample size. All these diagnostics indicated that our chains indeed converged.
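As an illustration of the shrink factor, the following is a simplified single-parameter version of the Gelman-Rubin diagnostic; it is a sketch for readers unfamiliar with the measure, not the implementation used for the reported analyses:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (shrink factor) for one
    parameter, given an (m, n) array of m chains with n samples each.
    Values close to 1 indicate that the chains have converged."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return float(np.sqrt(var_hat / W))
```

If the chains sample the same distribution, within- and between-chain variance agree and the factor shrinks toward 1; chains stuck in different regions inflate it well above 1.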

Analysis of Experiment 3 by Psychometric Functions
For the interested reader, we present an exemplary analysis of Experiment 3 with the common method of psychometric functions.
Function parameters were computed by Bayesian statistics as described by Kuss et al. (2005). The following is a short summary of the technical details that are presented more elaborately in the article by Kuss et al.
Based on these parameter distributions, a posterior predictive was computed. This posterior predictive is depicted in Figure B2. The width parameter ω is difficult to interpret. In general, a steeper function corresponds to smaller uncertainty in the judgment, that is, better performance. By contrast, the shift of the threshold m has a straightforward interpretation: The salient probe has an increased probability of being reported first compared to its non-salient counterpart from the neutral condition. The m parameter is usually interpreted as the point of subjective simultaneity; an m parameter that differs from zero in the positive direction indicates that the salient stimulus is perceived faster than its non-salient counterpart. In sum, the results of the logistic function analysis do not contradict the TVA-based analysis.

Figure B2. Plot of raw data (mean of judgment frequency per SOA over all participants) and posterior predictive for the salient and neutral condition of Experiment 3. The plot shows predicted data based on the estimated parameters of both logistic functions. SOA = stimulus onset asynchrony.
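For completeness, the logistic function underlying such an analysis can be sketched as follows. The parameterization with a threshold m (point of subjective simultaneity) and a width parameter w controlling the slope is one common choice and may differ in detail from the exact form used by Kuss et al. (2005); the function name is our own:

```python
import math

def logistic_psychometric(soa, m, w):
    """Probability of a 'probe first' judgment as a logistic function
    of SOA. m shifts the curve (threshold / point of subjective
    simultaneity); larger w yields a shallower slope."""
    return 1.0 / (1.0 + math.exp(-(soa - m) / w))
```

At SOA = m the function passes through .5, so a positive estimate of m directly expresses how much earlier the reference must appear for both stimuli to be judged as simultaneous.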