Spatial processing and visual backward masking

Most theories of visual masking focus primarily on the temporal aspects of visual information processing, strongly neglecting spatial factors. In recent years, however, we have shown that this position is not tenable. Spatial aspects cannot be neglected in metacontrast masking, pattern masking, and unmasking. Here, we review these results.


Introduction
In visual backward masking, perception of a target is impeded by a trailing mask. Most research has focused on the phenomenon of B-type masking, in which the strongest deterioration of performance occurs for intermediate SOAs. These investigations usually use metacontrast masks, i.e. masks that do not overlap with the target. Deteriorated performance is often explained by neural inhibitory mechanisms such as lateral inhibition (e.g. Bridgeman, 1971; Growney & Weisstein, 1972), mask blocking (Francis, 2000), dual channel inhibition (e.g. Breitmeyer & Ganz, 1976; Öğmen, 1993), delayed facilitation (e.g. Bachman, 1994), contour elimination (e.g. Kolers, 1962; Werner, 1935), or object substitution (e.g. Di Lollo, Enns, & Rensink, 2000). For example, in the influential models by Breitmeyer and Ganz (1976) and Bachman (1994), target and mask processing occurs in two channels, a faster and a slower one, thereby allowing the mask signal in the faster channel to catch up with the target signal in the slower channel (Öğmen, 1993).
In A-type masking, performance improves monotonically when the ISI between the target and mask increases. The effects of the mask on the target are often explained in terms of contrast reduction (e.g. Eriksen, 1966) or camouflage (e.g. Coltheart & Arthur, 1972; Enns, 2004).
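The distinction between the two types of masking function can be made concrete with a small sketch. The Python function below is only an illustration under stated assumptions: the function name, the `tolerance` parameter, and the example values in the usage note are hypothetical and are not taken from any of the reviewed studies. It labels a function as A-type when performance never decreases with increasing delay (SOA or ISI), and as B-type when the performance minimum lies at an intermediate delay.

```python
from typing import Sequence


def classify_masking_function(delays_ms: Sequence[float],
                              performance: Sequence[float],
                              tolerance: float = 0.0) -> str:
    """Label a masking function as 'flat', 'A-type', 'B-type', or 'other'.

    `performance` must be coded so that higher values mean better
    performance (e.g. percent correct), with one value per delay.
    `tolerance` is a hypothetical noise margin in the same units.
    """
    if len(delays_ms) != len(performance) or len(delays_ms) < 3:
        raise ValueError("need at least three (delay, performance) pairs")

    # Order the performance values by increasing delay.
    perf = [p for _, p in sorted(zip(delays_ms, performance))]

    # Flat: performance does not vary beyond the tolerance.
    if max(perf) - min(perf) <= tolerance:
        return "flat"

    # A-type: performance improves monotonically with delay.
    if all(b >= a - tolerance for a, b in zip(perf, perf[1:])):
        return "A-type"

    # B-type: worst performance at an intermediate delay,
    # clearly below performance at both the shortest and longest delay.
    i_min = min(range(len(perf)), key=perf.__getitem__)
    is_interior = 0 < i_min < len(perf) - 1
    if (is_interior
            and perf[i_min] < perf[0] - tolerance
            and perf[i_min] < perf[-1] - tolerance):
        return "B-type"

    return "other"
```

For instance, with made-up percent-correct values, `classify_masking_function([0, 20, 40, 60, 80], [90, 70, 55, 75, 92])` yields a B-type label, whereas `[55, 65, 75, 85, 95]` over the same delays yields an A-type label. If thresholds are measured instead of percent correct, the values would have to be inverted first, since higher thresholds mean worse performance.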
Almost all studies of both A-type and B-type masking have a common focus on the temporal characteristics of the target and mask, largely neglecting non-basic spatial dimensions (however, see Werner, 1935; Williams & Weisstein, 1981, 1984). Here, we review results suggesting that the spatial layout of the target and mask exerts a tremendous influence on backward masking that was largely neglected previously. In particular, spatial grouping seems to be a key factor for certain masking effects. We will argue that, for this reason, models have to incorporate explicit spatial processing components. Models employing temporal mechanisms only are not sufficient.

Results

Pattern and A-type masking
In pattern masking (by structure), mask and target spatially overlap. Usually A-type masking is found, which is explained, in terms of integration masking, for example, as a result of luminance summation and contrast reduction (e.g. Eriksen, 1966), by camouflage and montage (recently, Enns, 2004),

These factors are often assumed to occur at stages as early as the retina (e.g. Michaels & Turvey, 1979).
In a series of experiments using pattern masks, we have shown that these explanations are not sufficient (Herzog & Fahle, 2002). Figure 1 shows a typical example of these experiments. A vernier target is followed by a grating comprising 25 aligned verniers; a moderate threshold elevation occurs compared to when the vernier is presented without the grating. This masking can be strongly potentiated if four single contextual lines are presented in addition to the grating: the vernier target can be rendered invisible and thresholds rise dramatically (Figure 1). This interference is dominant in a temporal window of more than 100 ms and can hardly be explained with the classical accounts of integration masking.
Luminance summation and contrast reduction may play a role if only the central grating follows the vernier (horizontal line in Figure 1). However, they cannot explain why adding four additional lines potentiates masking. This becomes even more evident when taking into account that adding 2 × 25 contextual lines, hence further increasing energy, undoes the masking of the four lines, which are contained in the 2 × 25 lines (Figure 2). Camouflage or montage play no role since the four lines may even serve as a reference to localize the vernier (collinear lines above and beneath the central grating element also yield a strong performance deterioration; Herzog, Schmonsees, & Fahle, 2003b).
Finally, the vernier is covered only by the central grating element in all conditions, which yields the same degree of image distortion in the near neighborhood of the vernier. Still, performance varies strongly with the spatial layout of the contextual elements. Taken together, classical explanations of integration masking fail to account for our masking results (Herzog, Dependahl, Schmonsees, & Fahle, 2004; Herzog & Fahle, 2002). Our results clearly show that explanations of pattern masking have to carefully consider aspects of the spatial layout of the target and mask. We could show that some of the above results can be reproduced with a simple but dynamic model of spatial information processing (Hermens & Ernst, this volume; Herzog, Ernst, Etzold, & Eurich, 2003).

Figure 1. A left or a right offset vernier (V) was presented for 10 ms and followed immediately by a grating comprising 25 aligned verniers (G) lasting for 300 ms. Observers had to indicate the offset direction of the vernier in a binary task. The horizontal line in the results graph indicates the threshold in this condition ("standard"). In addition to the standard grating, four contextual lines could be displayed with varying SOAs in relation to the vernier onset (SOA denotes the onset asynchrony of contextual lines (C) relative to the standard grating). These lines appeared above or below the third grating element to the left and right of the center. Lines were separated by a small vertical gap of 200'' from the grating and presented for 5 ms or 10 ms (an SOA of -50 ms is shown in the stimulus sketch). Performance strongly deteriorated for SOAs from -100 ms to 30 ms, i.e. much longer than the duration of the four lines. Reprinted from Vision
It is very important to note that the dramatic changes of performance, caused by rather simple spatial manipulations, occur only in a short temporal window. Vernier durations only slightly longer than those used above yield weak or no masking, independent of the spatial layout (e.g. Herzog, Schmonsees, & Fahle, 2003a). Hence, it seems that the above results reveal complex spatio-temporal effects at the very beginning of spatial information processing.
Using a feature fusion paradigm, we have shown how the spatial layout contributes to unmasking. We presented a vernier followed by a second vernier with the same duration and spatial parameters as the first vernier except for having an offset with opposite direction (Herzog, Parish, Koch, & Fahle, 2003). This "anti-vernier" serves as the first mask. With this condition, both verniers fuse, yielding the percept of one single vernier. The anti-vernier dominates performance more strongly than the vernier, i.e. backward masking is stronger than forward masking (Figure 4a). When these two verniers are followed by an additional mask, dominance can reverse, i.e. the vernier dominates performance (Figure 4d-f; Herzog, Lesemann, & Eurich, 2006). However, this unmasking is present only for extended masks but not, for example, for a single aligned vernier, even though this single vernier is part of the 25-element grating which yields strong unmasking (Figure 4b,d). On the other hand, the single vernier is not part of the metacontrast grating which, however, yields unmasking like the 25-element grating. Hence, unmasking, like pattern masking, cannot be explained by the masking of its parts. This again suggests complex spatial

Metacontrast and B-type masking
B-type masking is usually believed to be the most interesting phenomenon in backward masking. A later presented mask can catch up to an earlier presented target and dominate performance, thereby ruling out ultra-fast feedforward processing as found in other domains of vision (e.g. Thorpe, Fize, & Marlot, 1996; VanRullen, this volume). It should be mentioned that B-type masking loses much of its mystery when it is accepted that the brain acts as a temporal low pass filter and some temporal non-linearities are involved (Francis, 2000; Francis, this volume; Francis & Herzog, 2004). Here, we show that temporal aspects are not the whole story but that B-type masking strongly depends on the spatial layout of the mask and target.
A vernier was presented for 20 ms and flanked by a line on each side, presented for 20 ms as well. Flank length was either the same as the vernier or twice as long. These metacontrast masks exerted B-type masking as expected (Figure 5; Duangudom, Francis, & Herzog, 2007; see also Otto, Öğmen, & Herzog, 2006; Otto, this volume). Surprisingly, for more flanking lines, A-type masking or flat masking functions were obtained depending on the length of flanks. Hence, we can change the masking function, e.g. from A to B, by changing the spatial layout of the mask. Surprisingly, the weakest masking was obtained for the mask with 6 lines on each side of the vernier, each twice its length (Figure 5; Duangudom, Francis, & Herzog, 2007). This mask has the highest energy but still yields the weakest masking, contrary to many models of masking (Breitmeyer & Öğmen, 2006, p. 48).

Figure 4 (caption, partially recovered). [...] (c, d), a metacontrast grating, or a light field (f). Gratings lasted for 300 ms, verniers for 15 ms or 20 ms. The metacontrast grating resulted from removing the central element in the 25-element grating. If only the vernier and anti-vernier were presented, the anti-vernier dominated performance, indicated by a value below 50%. For a single aligned vernier or a 5-element grating no obvious dominance occurred, whereas extended masks led to unmasking: the vernier dominated (performance was above 50%). From Herzog, Lesemann, & Eurich (2006) with permission from "Advances in Cognitive Psychology". http://www.ac-psych.org
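The 50% criterion used to read dominance in the feature fusion experiments can be illustrated with a short sketch. This is not the analysis code of the original studies: the function names, the string coding of offset directions, and the `margin` parameter are assumptions made for illustration only. Dominance is expressed as the percentage of trials on which the reported offset direction matches that of the first vernier.

```python
from typing import Sequence


def vernier_dominance(responses: Sequence[str],
                      vernier_offsets: Sequence[str]) -> float:
    """Percentage of trials on which the response matches the vernier offset.

    Both sequences hold one entry per trial, coded as "left" or "right"
    (a hypothetical coding). The anti-vernier always has the opposite
    offset, so 100 minus this value is the anti-vernier percentage.
    """
    if len(responses) != len(vernier_offsets) or not responses:
        raise ValueError("need one response per trial and at least one trial")
    matches = sum(r == v for r, v in zip(responses, vernier_offsets))
    return 100.0 * matches / len(responses)


def dominant_element(percent_vernier: float, margin: float = 0.0) -> str:
    """Read dominance off the 50% line.

    Above 50%: the vernier dominates (unmasking).
    Below 50%: the anti-vernier dominates.
    `margin` is a hypothetical band around 50% treated as no dominance.
    """
    if percent_vernier > 50.0 + margin:
        return "vernier"
    if percent_vernier < 50.0 - margin:
        return "anti-vernier"
    return "none"
```

With this coding, a made-up observer who follows the vernier offset on 3 of 10 trials scores 30%, which `dominant_element` reads as anti-vernier dominance, the pattern described above for the condition without a trailing mask.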

Discussion
Visual masking has been explored for more than a century. Still, the underlying mechanisms are subject to debate. Most models try to explain masking on purely temporal grounds, ignoring spatial aspects (only the model by Francis, 1997, has a full 2-dimensional spatial representation). Here, we have provided strong support for the involvement of spatial aspects in pattern masking, unmasking, and metacontrast masking.
These effects are only visible with backward masking in a very narrow time window. For example, if an ISI of only 10 ms is inserted between the vernier and the standard grating, adding contextual lines raises thresholds only modestly. Hence, contextual modulation has vanished (Herzog et al., 2003a). We believe that masking with the shine-through effect reveals aspects present only at the very beginning of spatial information processing.

Local contour interactions. B-type masking with metacontrast masks is often assumed to occur because the inner contour of the mask suppresses the processing of the target contours (e.g. Werner, 1935). In support of this hypothesis, it was found that the larger the space between the target and the inner contour of a metacontrast mask, the better the performance (e.g. Growney, Weisstein, & Cox, 1977; Kolers, 1962; review: Breitmeyer & Öğmen, 2006, p. 56). Hence, local spatial interactions seem to be important.
However, contrary to this proposition, we could change the masking function qualitatively from B-type to A-type masking while leaving the inner contour of the metacontrast masks constant (Figure 5). Thus, local computations between the target and the neighboring masking elements are not sufficient to explain B-type masking.
In pattern masking, we used gratings. Performance changed greatly in the various conditions even though the standard grating was constant (see Figure 1). In unmasking, both the 25-element and the metacontrast grating yielded comparable results whereas the proximity of contours clearly differed in these conditions. For these reasons, we reject an important role of local contour interactions, at least with our stimuli. Also, models based on simple lateral interactions may not be able to explain many of our results (see also Duangudom, Francis, & Herzog, 2007). Hence, it remains to be seen whether existing mathematical models of masking (e.g. Bridgeman, 1971; Di Lollo et al., 2000; Öğmen, 1993) can capture the above effects when extended by appropriate spatial processing components (for the 2D model of Francis, 1997, no simulation results are available because of limited spatial resolution).
We propose that grouping also plays an important role in metacontrast masking. For short SOAs, the vernier offset can hardly be discriminated when it can be grouped within the flanking lines, just as the single contextual elements lose their power when grouped within contextual gratings (Figure 2; see also Malania, Herzog, & Westheimer, 2007; Sharikadze, Fahle, & Herzog, 2005). When SOA increases, grouping breaks down by temporal cues and performance improves. To the best of our knowledge, there are only a few visual masking studies taking complex spatial aspects into account beyond low level variations such as varying mask size or the distance between target and mask contours. Williams and Weisstein (1984) showed that B-type masking occurs when the target appears as part of a 3-dimensional structure but A-type masking when it does not. More recently, Moore and Lleras (Lleras & Moore, 2003) argued that masking depends strongly on whether or not the target and mask can be processed separately (see also Kahan & Mathis, 2002).
It is surprising to see so few studies jointly investigating temporal and spatial vision even though the first goal of vision is the generation of a coherent spatial representation of the outer world that, as masking shows, is not created instantaneously. Spatial and temporal vision research belong together.