The mechanisms of feature inheritance as predicted by a systems-level model of visual attention and decision making

Feature inheritance provides evidence that properties of an invisible target stimulus can be attached to a following mask. We apply a systems-level model of attention and decision making to explore the influence of memory and feedback connections in feature inheritance. We find that the presence of feedback loops alone is sufficient to account for feature inheritance. Although our simulations do not cover all experimental variations and focus only on the general principle, our result appears to be of specific interest since the model was designed for a completely different purpose than to explain feature inheritance. We suggest that feedback is an important property in visual perception and provide a description of its mechanism and its role in perception.


INTRODUCTION
The perception of a briefly flashed target stimulus followed by a mask can be strongly impaired or, depending on the mask and the stimulus-onset asynchrony, the stimulus can be easily detectable. Theories of visual masking explain the impaired perception typically by an erosion of the target information, be it by temporal fusion, interruption or suppression through competition. In feature inheritance, however, the mask inherits a property of the target stimulus (e.g. Herzog & Koch, 2001). For example, a vernier, a tilted line, or a bar in apparent motion are presented for a short time and followed immediately by a grating comprising a small number of straight elements. The grating is perceived as offset, tilted, or moving. The perceived distortion (e.g. tilt) is much smaller than the actual property of the target. The target stimulus itself remains largely invisible. This effect cannot be easily explained by a simple temporal fusion since the property of the mask is only slightly distorted and the effect lasts for mask presentation times of about 300 ms. Moreover, when target and mask are very different in orientation, both appear visible (shine through). Thus, feature inheritance demonstrates that stimulus properties can act upon the properties of a following stimulus.
http://www.ac-psych.org Fred H. Hamker
The mechanism responsible for feature inheritance is still unclear, but some recent work has addressed its neural correlate. Zhaoping (2003) explains feature inheritance by lateral figure-ground binding in V1 and shows that a vernier followed by a grating consisting of a few elements results in only one or two saliency peaks at the border of the grating, whereas a grating with several elements also results in a saliency peak at the center, suggesting no feature inheritance but shine through. However, the actual decoding of this saliency information into a percept or a decision has not been modeled, and it remains open to what extent V1 saliency is responsible for the perception of an offset or tilt. We have recently developed a computational model to explain most of the temporal phenomenology of feature inheritance (Ma, Hamker, & Koch, 2006). We varied the duration of target and mask presentation and tuned the parameters of the model to be consistent with observations. According to the model, a subsystem creates an inert hypothesis about the stimulus which is then tested against the later input. Cells further downstream, related to object perception, only fire when the hypothesis is confirmed. We will call this a strong hypothesis testing model. Although the model can account for several observations, the hypothesis-testing subsystem was specifically designed to explain feature inheritance. While this approach is typical for most computational models, fundamental insights can only be achieved if a model generalizes to other phenomena. Thus, we here apply a model of visual attention to the paradigm of feature inheritance to gain further insight into general mechanisms of visual perception. This model contains a mechanism of weak hypothesis testing by means of feedback, which implements feature-based attention and goal-directed search and resolves ambiguities (Hamker, 2005a; Hamker, 2005b; Hamker, 2006).
Weak hypothesis testing refers to the rule according to which feedback is not necessary for brain areas to process the stimulus-driven feedforward signal.
Feedback only modulates processing.
Object substitution theory proposes that masking is a consequence of ongoing recurrent interactions between different levels of the cortical hierarchy (Di Lollo, Enns, & Rensink, 2000; Enns, 2002). The first stimulus is initially processed in a feedforward sweep. This sweep activates neurons at high levels which project back to earlier levels. With respect to feature inheritance, the features of a target can be incorporated into the activation pattern of a following mask if both are similar (Enns, 2002). At this level of abstraction, our model is very similar, if not identical, to object substitution theory. However, one key idea of object substitution theory is that perception requires a confirmation of the perceptual hypothesis by comparing the hypothesis at the higher level with the ongoing activity at the lower level (Enns, 2002; Di Lollo et al., 2000). The exact mechanism of this comparison is critical and requires a clear definition. Although feedback has been emphasized in several models of visual perception, its exact mechanism differs significantly across these models. In the computational model of object substitution (CMOS), the input into the higher area is defined as the sum of feedback and feedforward signals (Di Lollo et al., 2000). A summation predicts the activation of cells at an early level by feedback from higher levels, and thus both the actual signal and the top-down hypothesis are simultaneously active at an early level.
Several approaches treat vision as a generative process (Mumford, 1992; Olshausen & Field, 1997; Rao, 1999). According to this paradigm, feedback represents the predicted image and the feedforward signal the residual image which is obtained by subtracting the predicted image from the input image.
A good match between the internal hypothesis and the actual input results in a weak feedforward signal and a mismatch in a strong signal. Thus, feedback primarily serves to "explain away" the evidence by suppressing the activity. This approach has been primarily used for the learning of receptive fields and object recognition. Its relevance for masking or feature inheritance has not been explored so far.
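The two feedback schemes described so far can be contrasted in a minimal sketch. The function names, the five-channel input, and all numerical values are assumptions made for illustration; they are not taken from CMOS or from any of the cited generative models.

```python
import numpy as np

def additive_feedback(feedforward, feedback):
    """Summation scheme: the early-level signal is feedforward plus feedback."""
    return feedforward + feedback

def predictive_residual(feedforward, prediction):
    """Generative scheme: only the unexplained part of the input is passed on."""
    return feedforward - prediction

# Toy input over five feature channels (hypothetical values).
inp = np.array([0.0, 0.2, 1.0, 0.2, 0.0])
matching = inp.copy()            # top-down hypothesis agrees with the input
mismatching = np.roll(inp, 2)    # hypothesis peaks at a different channel

residual_match = predictive_residual(inp, matching)        # input fully explained away
residual_mismatch = predictive_residual(inp, mismatching)  # large residual remains
summed = additive_feedback(np.zeros_like(inp), matching)   # feedback alone, no input
```

Under these assumptions, a matching hypothesis leaves no residual in the subtractive scheme, whereas in the additive scheme feedback alone is enough to activate the early level even when no input is present.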
Our approach, which shows some similarity to adaptive resonance (Grossberg, 1980), interactive activation models (McClelland & Rumelhart, 1981), Bayesian belief propagation and particle filtering (Lee & Mumford, 2003), predicts an enhancement if both signals are consistent with each other by increasing the gain of the feedforward signal. If both signals are not consistent, no enhancement occurs, i.e., no gain change takes place. Perception in our model can be actively guided by an internal hypothesis, but a match between the visual observation and the internal hypothesis is not required for the activation of visual areas (weak hypothesis testing approach). Thus, a purely sensory-driven activation (with and without feedback) is sufficient to activate all model areas. Due to competitive interactions, irrelevant information is inhibited (Hamker, 2004), similar to the Biased Competition framework (Desimone & Duncan, 1995). We have termed this interaction of the top-down, or feedback, signal with the feedforward signal population-based inference (Hamker, 2005a; Hamker, 2005b), since it implements an inference operation but differs in several aspects from a true Bayesian approach.
In the following we will briefly introduce the model of attention and its mechanism of feedback. We then apply different versions of the model to simulate a typical feature inheritance experiment and derive conclusions about the role of feedback and memory in visual perception.
The fact that human subjects can under some conditions report a masked, briefly flashed stimulus has led to two alternative interpretations (Smith, Ratcliff, & Wolfgang, 2004). In the first one, stimulus properties get encoded in visual short-term memory (VSTM), and its content represents the input for the decision process. In the second one, the decaying iconic trace provides the input for decision making. We will also discuss a third alternative. Here, memory provides a top-down signal which modifies the properties of visual areas. The decision, however, is still based on the content of the iconic trace. We call this approach active hypothesis testing.
We are specifically interested in the question of whether memory-based, active hypothesis testing is required for feature inheritance to occur, or whether passive hypothesis testing by feedback is sufficient. Thus, we have tested five different models, two where perception is only sensory-driven, and three where perception is hypothesis-driven. We obtain an internal hypothesis by memorizing a representation of the stimulus at different times. Of the two models of sensory-driven perception, one can be categorized as passive hypothesis testing, since it contains feedback but no external top-down signal. In the other one, we removed feedback.

Systems-level model of attention
Our model of attention is an extension of an earlier model (Hamker, 2003; Hamker, 2004; Hamker, 2005a), which has been strongly constrained by several electrophysiological observations and anatomy. The present version operates with real input images. It has been applied to tasks such as object detection in natural scenes, change detection, visual search, and feature-based attention (Hamker, 2005b; Hamker, 2005c; Hamker, 2006). Since it has been extensively described in Hamker (2005b), we here give only a brief overview with emphasis on the aspects relevant for feature inheritance.

Population-based inference
We have developed a population-based inference approach ...

Model for visual attention. First, information about the content and its low level stimulus-driven salience is extracted. (Stimulus-driven saliency, however, will not be crucial for the results obtained here.) This information is sent further downstream to V4 and to IT cells which are broadly tuned to location. A target template is encoded in PF memory (PFmem) cells. Feedback from PFmem to IT increases the strength of all features in IT matching the template. Feedback from IT to V4 sends the information about the target downwards to cells with a higher spatial tuning. FEF visuomovement (FEFv) cells combine the feature information across all dimensions and indicate salient or relevant locations in the scene. The FEF movement (FEFm) cells compete for the target location of the next eye movement. The activity of the FEF movement cells is also sent to V4 and IT
As long as the maximal activity within the population is lower than a threshold (e.g. A=1), the feedback signal r_i effectively increases the gain. On the population level, however, the local gain mechanism can result in a distortion of the population response and thus in a misperception. We have recently shown that our population-based inference approach is general enough to also explain spatial effects such as the shift and shrinkage of receptive fields in area V4 prior to a saccade (Hamker & Zirnsak, 2006).
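How a local gain increase can distort a population response may be illustrated with a static sketch. It is a strong simplification under stated assumptions, not the model's actual dynamics: the multiplicative rule in `gain_modulate`, the Gaussian tuning width, and the input strengths are all hypothetical, and only the threshold A=1 and the use of 20 orientation cells follow the text. Decoding uses a population vector on the doubled angle, since orientation has a period of 180°.

```python
import numpy as np

def tuning(pref_deg, stim_deg, sigma_deg=20.0):
    """Gaussian tuning over orientation difference, wrapped to [-90, 90)."""
    d = (pref_deg - stim_deg + 90.0) % 180.0 - 90.0
    return np.exp(-0.5 * (d / sigma_deg) ** 2)

def gain_modulate(feedforward, feedback, A=1.0):
    """Feedback scales the feedforward drive; it cannot create activity alone.

    The gain increase vanishes once the maximal activity reaches A (assumed rule).
    """
    headroom = max(0.0, A - feedforward.max())
    return feedforward * (1.0 + headroom * feedback)

def decode_orientation(prefs_deg, response):
    """Population vector on the doubled angle (orientation has period 180 deg)."""
    ang = np.deg2rad(2.0 * prefs_deg)
    vec = np.sum(response * np.exp(1j * ang))
    return np.rad2deg(np.angle(vec)) / 2.0

prefs = np.linspace(-90.0, 90.0, 20, endpoint=False)  # 20 cells, 9 deg apart
mask_drive = 0.5 * tuning(prefs, 0.0)   # feedforward signal: mask at 0 deg
target_fb = tuning(prefs, 20.0)         # feedback: trace of a 20 deg target

plain = decode_orientation(prefs, mask_drive)
biased = decode_orientation(prefs, gain_modulate(mask_drive, target_fb))
silent = gain_modulate(np.zeros_like(prefs), target_fb)
```

In this sketch the decoded orientation without feedback stays at the mask orientation, while feedback tuned to the target pulls it a few degrees toward the target, far less than the full 20° offset. Feedback without any feedforward drive leaves the population silent, in line with the idea that feedback only modulates processing.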

Simulation of the feature inheritance experiment
We used a similar experimental procedure as Herzog

Decision making
Our model allows us to simulate the temporal course of activity in different brain areas. In order to close the gap between a continuous, time-varying signal and a discrete decision of a human subject, we use a simple neural decision model, which reads out the population response in the orientation channel and determines whether the mask is perceived as tilted or not. Models of decision making that accumulate evidence over time have a long tradition in mathematical psychology, which has produced several variants. For an overview see Usher and McClelland (2001), and for a comparison of models refer to Ratcliff and Smith (2004). Despite many differences, the general idea is very similar: all models accumulate the evidence from a time-varying input signal and stop when a criterion is reached, such as the crossing of a threshold. In most decision-making simulations the input of the model is ... Subjects probably learn what information is relevant in a particular experimental situation. In our model, we select the relevant information by weighting the activity, distributed across the feature space, with a Gaussian (Fig 4).
following the common approach that the evidence for one choice reduces the evidence of the other choice (Mazurek, Roitman, Ditterich, & Shadlen, 2003). The accumulated evidence is computed within a laterally connected set of two neurons r_1 and r_2, each described by a differential equation of the form τ d/dt r(t) = ..., with terms involving the input I, the lateral weights w, and the parameters k and a.
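The general idea of two accumulators in which evidence for one choice reduces the evidence for the other can be sketched as follows. This is a generic leaky accumulator pair with mutual inhibition, integrated with the Euler method; the equations, the function name `race`, the parameter values, and the toy input time courses are all assumptions for illustration and are not the decision model's own.

```python
import numpy as np

def race(input_1, input_2, threshold, tau=20.0, leak=0.1, inhibition=0.5, dt=1.0):
    """Euler-integrate two mutually inhibiting accumulators until one crosses threshold.

    Returns (winner, time_ms); winner is 1 or 2, or 0 if neither unit reaches threshold.
    """
    r1 = r2 = 0.0
    for step, (i1, i2) in enumerate(zip(input_1, input_2)):
        dr1 = (i1 - leak * r1 - inhibition * r2) / tau
        dr2 = (i2 - leak * r2 - inhibition * r1) / tau
        r1 = max(0.0, r1 + dt * dr1)   # rates stay non-negative
        r2 = max(0.0, r2 + dt * dr2)
        if r1 >= threshold or r2 >= threshold:
            return (1 if r1 >= r2 else 2), (step + 1) * dt
    return 0, len(input_1) * dt

t = np.arange(400)                                 # 1 ms steps
tilt_evidence = np.where(t < 50, 1.0, 0.0)         # early, target-driven evidence
straight_evidence = np.where(t < 50, 0.3, 1.0)     # later, mask-driven evidence

low = race(tilt_evidence, straight_evidence, threshold=1.0)
high = race(tilt_evidence, straight_evidence, threshold=3.0)
```

With these hypothetical inputs, a low threshold is crossed during the early phase, so the transient evidence for a tilt determines the choice. With a high threshold the early evidence is not sufficient, and the sustained mask-driven evidence eventually wins.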

RESULTS
We simulated five different models: two models of sensory-driven perception, with and without feedback, and three models of hypothesis-driven perception, which memorize the neural response at different times. Memorizing the neural response at different times leads to less target information in memory with increasing time (Fig 6A). Moreover, for all three models of hypothesis-driven perception, large orientation offsets lead to little or no influence of the target information on the population encoded in memory, since only the strongest population enters memory. According to the first approach to the perception of masked visual stimuli, the memory content represents the input of the decision. Thus, this model predicts the perception of relatively strong tilts (Fig 6A). In many cases, the perceived tilt is about half of the veridical tilt, which is not consistent with the typical observation (Herzog & Koch, 2001).
If we now consider the third approach to the perception of masked visual stimuli, where memory modifies visual areas, we observe for all three models that the IT activity is permanently distorted towards the target orientation (Fig 6B).

Encoded orientation information in the population activity at 300 ms after target onset with respect to the veridical orientation. The decoding of the encoded orientation in the population response has been done with a simple population vector method (Dayan & Abbott, 2001). (A) Decoded orientation relative to the mask in the PFmem cells. The memorization of the IT activity at different times reflects the sustained influence of the briefly presented target on the population response. The sustained influence is orientation dependent. If the orientations of target and mask differ strongly, the information from the target is not memorized. Only when the IT activity is memorized at 100-120 ms does a target stimulus with an orientation offset of 40° or larger strongly distort the population. For orientation differences up to 30°, some information of the target is still encoded by the population. (B) The population response in IT receives a small but sustained distortion if a template has been memorized and used for top-down guidance. In the models with no memory or without feedback, the information from the target stimulus has faded away at 300 ms after target onset. Note that the y-axes in panels A and B are scaled differently.

... (Herzog & Koch, 2001) might depend on their decision criterion. Subjects who are trained in fast decision making, such as playing ball games, might use a low threshold and thus perceive an influence of the target. In subjects using a conservative criterion (high threshold), the mask dominates the decision and the subject does not perceive the tilt, or the target presentation times have to be longer. This view of perceptual decision making is similar to masked response priming, which can also be modeled by a neural accumulation process (Vorberg et al., 2003).
Somewhat surprising is our observation that feedback loops alone are sufficient to produce feature inheritance. Although the information of the target disappears at about 150-200 ms after target onset, feedback holds the target information sufficiently long to influence the decision with respect to the perceived orientation. We do not claim that feature inheritance necessarily occurs at the level of IT and V4. Our proposed feedback mechanism is a general mechanism of feedback and also acts from V2 to V1 and V4 to V2. Consistent with observations, the model predicts that feature inheritance only occurs within a limited range of orientation differences between target and mask. Since we only used 20 cells to represent the orientation space and did not tune the width of the population response, the exact range might be slightly different, e.g., subjects reported feature inheritance if elements are tilted by 7° (Herzog & Koch, 2001). At the level of the decision, the model of sensory-driven perception does not fundamentally differ from the model of hypothesis-driven perception. However, the model of sensory-driven perception without feedback does not provide sufficient evidence for a feature inheritance effect. From our analysis we cannot exclude that mechanisms other than feedback can also account for feature inheritance. The strength of our approach rather lies in its generality. Our model was designed for a completely different purpose, but nevertheless, without modification, it shows a feature inheritance effect. We acknowledge that a comprehensive demonstration of the role of feedback in feature inheritance requires more simulations and perhaps also changes in the model, but at present, it appears important to us to identify general, universal mechanisms of perception as compared to specialized models tuned to a single experimental paradigm, such as our earlier model (Ma et al., 2006).
Our model also appears consistent with the observation of a trace carried over a sequence of invisible elements (Otto, Öğmen, & Herzog, 2006). Other experiments have revealed that the locus of spatial attention influences feature inheritance (Sharikadze, Fahle, & Herzog, 2005). Offsets at the attended edge of the grating influence performance, whereas offsets of non-attended elements do not show a strong influence. This is probably not easy to test with orientations, since local orientation differences typically pop out. However, these results provide additional constraints for models of feature inheritance.
The present discussion about models of visual perception is dominated by extremes such as purely feedforward models and models that require reentrant ... (Rockland, Saleem, & Tanaka, 1994). Furthermore, feedback can act as fast as 10 ms (Hupé, James, Girard, Lomber, Payne, & Bullier, 2001). Given that a final decision typically requires integrating information over time, there is little room for a decision purely based on feedforward evidence. We rather suggest the following scenario: ...
Other phenomena, such as the change of temporal perception, might also depend on feedback. Our model predicts a decrease in the time for a perceptual decision if target and mask are similar. Two aspects of our model seem to be primarily involved in this speed-up: first, the reentrant connections in the visual areas, and second, the integration of the relevant features for the perceptual decision. Present evidence suggests that not the pure similarity of features, but the task relevance of the features is the cause of enhanced processing speed (Scharlau & Ansorge, 2003; Enns & Oriet, 2007; Scharlau, 2007). Thus, it appears that the integration of the relevant features, i.e. the evidence, is the crucial process involved in the increase of processing speed. In the present version of our model, the definition of which features are relevant is predetermined. It would be very interesting to explore how learning could lead to an automatic selection of relevant features for a given task.
Feedback might also be crucial for the relatively long duration of iconic memory, a high-capacity form of storage, lasting for at least a few hundred milliseconds (Coltheart, 1983). Iconic memory seems to be essential for visual awareness (Koch, 2004), probably by providing the substrate for the collection of evidence. This transfer from iconic memory to visual awareness is not understood so far. It is not clear if integration alone (sensory-driven perception) is sufficient or if a form of active hypothesis testing is required, as suggested by inattentional blindness experiments (Mack & Rock, 1998). The fact that passive hypothesis testing seems to be sufficient to explain feature inheritance by our model does not exclude the possibility that at a higher level, such as the transition to awareness, active hypothesis testing is required. However, it appears unlikely that a strong form of hypothesis testing occurs early in the visual pathway.
Since our model is very simple with respect to the shape of objects, the present version does not allow strong predictions in other masking paradigms. However, since classical models of backward masking (Breitmeyer, 1984; Breitmeyer & Öğmen, 2000; Öğmen, Breitmeyer, & Melvin, 2003) are based on local, lateral connections, it might be interesting to further explore the role of feedback in masking. Object substitution theory provides a first important step in this direction. However, object substitution is at present a more general framework, and it requires a clear definition of many underlying computational mechanisms. Our model could lead to a partial refinement of object substitution, since we have given evidence that the mechanism of feedback can be well described as a gain increase on the feedforward signal. In any case, more detailed neural models with feedback appear to be a promising tool to further study the role of feedback in masking.