Spatio-temporal low-level neural networks account for visual masking

Temporal masking is a paradigm that is widely used to study visual information processing. When a mask is presented, typically within less than 100 ms before or after the target, the response to the target is reduced. The results of our psychophysical and visual evoked potential (VEP) experiments show that the masking effect critically depends on a combination of several factors: (1) the processing time of the target, (2) the order of presentation of the target and the mask, and (3) the spatial arrangement of the target and the mask. Thus, the masking effect depends on the spatio-temporal combination of these factors. Suppression was observed when the mask was positioned within a spatial range that was found to evoke inhibition, and when the temporal separation between the target and the mask was short. In contrast, lateral facilitation was observed when the mask was presented at a spatial separation that did not evoke inhibition from the target’s vicinity and either preceded the target or was presented simultaneously with it, but not when the target preceded the mask. We propose that masking effects, either suppression or facilitation, reflect integration in the spatial and temporal domains of the feedforward response to the target and the lateral inputs evoked by the mask (excitatory and/or inhibitory). Because the excitation evoked by the mask develops and propagates slowly from the mask’s location to the target’s location, it lags behind the response to the target. On the other hand, inhibition that is produced in the vicinity of the target evolves more rapidly and follows the onset and offset of the stimulus more closely. Thus, lateral excitation that overcomes the inhibition may facilitate the grouping of local elements into a global percept by increasing the survivability of the object and its accessibility for perceptual awareness.

These studies clearly show that the neural representation of a target is modulated by the surrounding stimuli. It is also apparent from these studies that the outcome of contextual modulation is complex; it is mostly suppressive but may also be facilitative in some spatio-temporal combinations.
The nature (either facilitation or suppression) and the strength of the context effects are determined by several parameters such as proximity, similarity, contrast, and global configuration.
Traditionally, masking is treated separately in the spatial and temporal domains (Breitmeyer, 1984). In the temporal domain, when the mask precedes the target, it is termed forward masking (FM), whereas mask presentation following the disappearance of the target is termed backward masking (BM). Most of the temporal masking studies have focused on BM, less on FM, whereas simultaneous masking (SM) has been typically treated as a separate condition, most likely due to the lack of a temporal mismatch between the target and the mask.
In the spatial domain, the literature on masking distinguishes between pattern masking (mask and target presented at the same retinal location) and metacontrast (the mask location does not overlap with the target location, also termed lateral masking). This distinction is based on an implicit assumption that sharp boundaries that allow a visually apparent gap between the target and the mask indicate a distinct activation of different receptive fields.
However, within the context of neuronal modeling, an important factor is the overlap between the receptive fields of the responding units, which may account for lateral interference regardless of whether the physical stimuli overlap or not. We will address this important issue next.
Our working hypothesis is that the masking effect critically depends on a combination of spatial and temporal stimulus attributes that can be summarized in a descriptive model with the following main factors: (1) the processing time of the target, (2) the presentation order of the target and the mask, and (3) the spatial arrangement of the target and the mask.

Processing time.
An estimate of the persistence or the integration time of the target response taken from physiological experiments (Albrecht, 1995; Mizobe et al., 2001; Polat, Mizobe, Pettet, Kasamatsu, & Norcia, 1998) provides an upper limit of 200 ms. This estimate is consistent with psychophysical results showing that the integration time for contrast detection at threshold is 160-200 ms (Watson, Barlow, & Robson, 1983) and with results from our laboratory (Rosen, Belkin, & Polat, 2005). We assume that a mask presented beyond this time-window will fail to affect the response to the target.

Interactions: excitation vs. inhibition.
The results of Polat and Sagi (2006) showed that temporal masking is affected by the order of presentation of the target and the mask as well as the spatial separation between them, which can be explained by the temporal and spatial properties of excitation and inhibition.
Dynamics. Temporal masking can be accounted for by assuming different time courses for excitatory and inhibitory interactions. Whereas excitation develops slowly and is sustained, lagging behind the stimulus both in onset and offset, inhibition is rapid and transient, thus following the onset and offset of the stimulus more closely.
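The difference in time courses can be illustrated with a minimal sketch, assuming that each interaction behaves like a first-order low-pass filter of the mask stimulus. The filter form and both time constants (50 ms for excitation, 5 ms for inhibition) are illustrative assumptions, not values measured in this study.

```python
def low_pass(stimulus, tau_ms, dt_ms=1.0):
    """First-order low-pass filter (forward Euler): dy/dt = (x - y) / tau."""
    y, out = 0.0, []
    for x in stimulus:
        y += dt_ms * (x - y) / tau_ms
        out.append(y)
    return out

# A 60 ms mask pulse followed by 140 ms of blank screen, in 1 ms steps.
stimulus = [1.0] * 60 + [0.0] * 140

TAU_EXC_MS = 50.0  # assumed: slow, sustained excitation
TAU_INH_MS = 5.0   # assumed: fast inhibition that tracks the stimulus

excitation = low_pass(stimulus, TAU_EXC_MS)
inhibition = low_pass(stimulus, TAU_INH_MS)

# Net lateral input: negative means inhibition dominates, positive means
# excitation dominates.
net = [e - i for e, i in zip(excitation, inhibition)]

print("net during the mask (t = 30 ms):", round(net[30], 3))
print("net after mask offset (t = 80 ms):", round(net[80], 3))
```

Under these assumptions the net input is inhibitory while the mask is on and becomes excitatory shortly after its offset, because the inhibition decays quickly while the excitation persists.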
Spatial architecture. Several models of lateral interactions assume that excitatory and inhibitory connections form a neuronal network that determines the measured responses (Adini & Sagi, 2001; Adini, Sagi, & Tsodyks, 1997; Polat, 1999). It is assumed that each network unit receives three types of visual input: (1) direct thalamic-cortical input, (2) lateral input from other units within the network, and (3) top-down feedback. These inputs can be subdivided into excitatory and inhibitory types. The lateral excitation is organized along the filters' optimal orientation, forming a collinear field (Polat & Tyler, 1999), and is superimposed on a suppressive area surrounding the filters.
Propagation time. The size of the receptive fields in V1 has been estimated to be between 2 and 3λ (Mizobe et al., 2001; Polat, 1999; Polat & Norcia, 1996; Polat & Sagi, 1993; Watson et al., 1983; Zenger & Sagi, 1996). Thus, masking effects from target-to-mask separations of 2λ or less may be considered as integration (or summation) within the same receptive field (pattern masking), whereas separations of 3λ or more activate lateral interactions between different neurons responding to the target and the mask (lateral masking). Masking effects from outside the receptive field propagate to the target's location through lateral connections, which are relatively slow compared with the direct input received by the receptive field. The estimated propagation speed of lateral excitation derived from psychophysical studies is about 3 degrees per second (Cass & Spehar, 2005; Tanaka & Sagi, 1998), in agreement with the estimates from intracellular and optical imaging measurements (Bringuier, Chavane, Glaeser, & Fregnac, 1999; Malonek, Tootell, & Grinvald, 1994; Series et al., 2003). Therefore, facilitation is possible only if the propagation of the excitatory input from the mask to the target is not delayed by a period longer than the integration time of the feedforward input.

Pattern vs. lateral masking.
Most of the masking studies used targets and masks that can be regarded as broadband stimuli in the spatial domain, and thus may be detected by receptive fields of different sizes.
Therefore, it is likely that larger receptive fields respond both to the target and the mask. Thus, the masking effect may be related to interactions within the same receptive field, resulting in pattern masking. Consequently, in these studies it is impossible to differentiate between pattern and lateral masking, and the observed results may be confounded by both types of masking. Thus, an important factor in masking is the overlap between the receptive fields of the responding units, which may account for lateral interference, regardless of whether the physical stimuli overlap or not.
In this study we also sought to find the neurophysiological correlates for the masking effect with the same stimuli that we used in the behavioral BM experiment and to compare our observations with previous findings in the literature. A particularly relevant EEG study by Jeffreys and Musselwhite (1986) investigated whether metacontrast-related inhibition or suppression is reflected in early components of the waveforms in visual evoked potentials (VEPs), namely the C1 and C2 components.
Scalp distributions of C1 and C2 reflect the respective sites of origin in the striate and extrastriate visual cortex (Jeffreys, 1971; Jeffreys & Axford, 1972). No effect of metacontrast masking was found in C1 or C2 amplitudes; however, a clear U-shaped masking function was observed in a separate psychophysical study. An earlier EEG study (Schiller & Chorover, 1966) also found no evidence for metacontrast masking effects in early VEP components. Bridgeman's reanalysis of the data of Jeffreys and Musselwhite (1986) revealed a U-shaped modulation of the amplitude of a later visual component in the VEP, around 250 ms, corresponding to the behavioral U-shaped masking function, which was thought to reflect visual masking due to recurrent processing (Bridgeman, 1988). A modulation around this latency has been found in single neuron activity in the cat and monkey striate cortex (Bridgeman, 1975, 1980). Interestingly, a recent MEG study compared metacontrast masking with variable stimulus onset asynchrony using an effective vs. a pseudo mask (van Aalderen-Smeets, Oostenveld, & Schwarzbach, 2006). In order to determine whether the perceptual effect on the target's visibility is reflected in the corresponding component of the VEPs, around 250 ms, a control condition was introduced: a pseudo mask. In contrast to an effective mask, the pseudo mask did not share similar features but was otherwise similar to the effective mask (similar physical qualia, different shape). The pseudo mask did not produce behavioral masking. However, the lack of a distinction in the VEPs' amplitudes, around 250 ms, be-

However, the spatial characteristics of the mask, such as its shape, the sharpness of its edges, and the possibility of a consequent overlap with the visual field of the target are of critical importance (see factor 3 of our descriptive model).
That is, the visually apparent lack of pattern masking does not necessarily guarantee the lack of overlapping between the target and the mask within the same receptive field.
Using VEP, we measured the interactions between the target and the subsequent mask at different temporal separations. We used the spatial separation that produces metacontrast masking (i.e., the target and the mask activate separate receptive fields) under conditions that provide behavioral facilitation of target visibility.

Participants
Ten subjects with normal or corrected-to-normal vision in both eyes participated in the experiments. Five

Stimuli
The stimuli were localized gray-level gratings (Gabor patches) with a spatial frequency of 6 cycles per degree (cpd), modulated from a background luminance of 40 cd·m⁻² (Fig. 1). Stimuli were presented binocularly on a Philips multiscan 107P color monitor, using a PC system. The effective size of the monitor screen was 24 × 32 cm, which, at a viewing distance of 150 cm, subtends a visual angle of 9.2 × 12.2 degrees. The subjects' responses were recorded in a dark cubicle, in which the only ambient light came from the display screen.
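As a check on the stimulus geometry, the following sketch computes the visual angle of the screen from the stated size and viewing distance. It also converts the separations used later (2λ and 3λ) into degrees, under the assumption that λ denotes the wavelength of the 6 cpd Gabor carrier; the function name is illustrative.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an extent centered on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

VIEWING_DISTANCE_CM = 150
height_deg = visual_angle_deg(24, VIEWING_DISTANCE_CM)
width_deg = visual_angle_deg(32, VIEWING_DISTANCE_CM)

# Assumption: lambda is the carrier wavelength, i.e. 1 / spatial frequency.
SPATIAL_FREQUENCY_CPD = 6
wavelength_deg = 1 / SPATIAL_FREQUENCY_CPD

print(f"screen: {height_deg:.2f} x {width_deg:.2f} deg")
print(f"2 lambda = {2 * wavelength_deg:.2f} deg, 3 lambda = {3 * wavelength_deg:.2f} deg")
```

The exact formula gives values within about 0.05 degrees of the reported 9.2 × 12.2 degrees; the small difference is what a small-angle approximation (size divided by distance) would produce.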
The threshold of contrast detection was measured using a two-alternative forced-choice (2AFC) paradigm, in which the target had to be detected in one of two successive presentations, separated by an interval of 800 ms with a random jitter of 500 ms, so that responses would not be confounded by anticipation of the onset of the trial. A visible fixation circle in the center of the screen indicated the location of the target. Four visible crosses were presented at the corners of the monitor, at the same time as the target's appearance, to avoid temporal uncertainty when presenting the target. The subjects activated the presentation of each pair of images (i.e., a single trial) at their own pace. Negative auditory feedback was provided. Contrast thresholds were measured utilizing a staircase method, which was shown to converge to 79% correct (Levitt, 1971).
In this method, the target contrast is increased by 0.1 log units (26%) after an erroneous response and is decreased by the same amount after three consecutive correct responses. About 40 trials were needed to estimate the threshold in each block. In addition, the threshold of contrast detection of the target presented alone, in a range of durations from 30 to 500 ms, was tested monocularly (Figure 2
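The staircase rule described above can be sketched as follows. This is a minimal illustration of a three-down, one-up rule with a 0.1 log-unit step, not the software used in the experiments; the function and variable names are hypothetical.

```python
STEP_LOG_UNITS = 0.1  # step size stated in the text

def update_contrast(log_contrast, correct, run_length):
    """Apply one trial of a 3-down/1-up rule.

    Returns (new_log_contrast, new_run_length), where run_length counts
    consecutive correct responses since the last contrast change.
    """
    if not correct:
        return log_contrast + STEP_LOG_UNITS, 0   # error: make target easier
    run_length += 1
    if run_length == 3:
        return log_contrast - STEP_LOG_UNITS, 0   # three correct: make it harder
    return log_contrast, run_length

# The rule is in equilibrium when three correct responses in a row are as
# likely as not: p ** 3 = 0.5.
convergence_point = 0.5 ** (1 / 3)

# A step of 0.1 log units corresponds to this percentage change in contrast.
step_percent = (10 ** STEP_LOG_UNITS - 1) * 100

print(f"converges near {convergence_point:.3f} correct")
print(f"step size is about {step_percent:.0f}%")
```

The equilibrium condition reproduces the 79% convergence level, and the step size reproduces the 26% figure given in the text.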

Participants
Five subjects with normal or corrected-to-normal vision in both eyes participated in the experiments.

Stimuli
The target, similar to the target stimulus used in the psychophysical experiments, was presented at 1 Hz for 50 ms at a contrast of 6% (at or very close to the detection threshold), with no change in the average background luminance. Backward masking, either on a target alone or on SM, was tested using stimuli similar to the masks used in the psychophysical experiments. The spatial distance between the target and the mask was 3λ. Each mask (either M1 or M2) was presented at the same spatial and temporal frequency, and for the same duration as the target. The mask was presented

Peak amplitude comparisons between conditions were performed using the paired t-test.

Example of stimuli used in this study. Three configurations of the target and masks were used in the temporal interaction experiments: simultaneous masking (SM), backward masking (BM), and forward masking (FM). The duration of presentation of the low-contrast target and the high-contrast mask was 60 ms in the psychophysical experiments and 50 ms in the VEP experiments. The masking effect was measured by comparing the responses under 5 conditions: (1) the target alone (T), (2) the target and mask presented simultaneously (simultaneous masking, SM), (3) the target followed by a mask (BM-on-T), (4) SM followed by a second mask (BM-on-SM), and (5) the target preceded by a mask (FM-on-T). The three levels of gray shading represent the three types of masking: forward, simultaneous, and backward.

Integration time
We first present data showing the integration time of the target (the threshold of contrast detection for a Gabor patch, 6 cpd) presented alone for a range of durations (Fig. 2). The results show that the contrast threshold improves by more than a factor of two from the duration of 30 ms to 120 ms, followed by saturation. This result is consistent with earlier results (Legge, 1978;Watson et al., 1983), indicating that efficient processing is performed during the first 120 ms of stimulus presentation, an observation that may pose an upper limit for efficient temporal masking.

Effect of target-to-mask spatial separation
Our aim was to test the effect of spatial separation (i.e., the distance) on the masking effect. Two distances were tested: 2λ, which is assumed to overlap to some extent with the target location, and 3λ, where no overlap with the target location is assumed, as discussed in the Introduction and Discussion (Fig. 3). The masking effect was measured as the log of the target's threshold, normalized to the threshold of the target presented alone (i.e., the threshold elevation). Thus, positive
1) The effect of distance on temporal masking can be regarded as an effect from inside (2λ) and outside (3λ) the receptive field (see Introduction).
Suppression is evident in BM and FM at 2λ, but not at 3λ. Facilitation is evident at 2λ only in SM, whereas at 3λ in FM and SM, but not in BM. For an ISI of 180 ms, however, no effect of temporal masking was found for any distance.
2) The results clearly show that SM, FM, and BM differ in the way they affect the target response at an ISI of 60 ms. FM produced facilitation at 3λ but resulted in suppression at 2λ. SM produced facilitation at both distances. BM produced suppression at 2λ, but had no effect at 3λ. Thus, the observed interaction between the effective integration time of the feedforward response and the delayed lateral response (due to a slow propagation time) seems to determine the perceptual masking effect.
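The threshold elevation measure used in these comparisons can be written as a short sketch. A base-10 logarithm is assumed here, and the contrast values in the example are hypothetical numbers chosen only to show the sign convention (positive for suppression, negative for facilitation).

```python
import math

def threshold_elevation(masked_threshold, alone_threshold):
    """Log (base 10, assumed) of the masked threshold relative to target alone."""
    return math.log10(masked_threshold / alone_threshold)

def classify(elevation, tolerance=0.0):
    """Map the sign of the elevation onto the terms used in the text."""
    if elevation > tolerance:
        return "suppression"
    if elevation < -tolerance:
        return "facilitation"
    return "no effect"

# Hypothetical thresholds (percent contrast), for illustration only.
alone = 6.0
for label, masked in [("higher threshold", 9.0), ("lower threshold", 4.0)]:
    te = threshold_elevation(masked, alone)
    print(f"{label}: elevation = {te:+.2f} log units -> {classify(te)}")
```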

Difference between BM-on-T and BM-on-SM
It is possible that the asymmetric masking effect (FM vs. BM) observed above can be accounted for by differences between the temporal dynamics of the mask and the target responses and the interaction between them.
It was previously shown that when the mask in SM was presented continuously after the target disappeared (with no ISI), the effect of the facilitation expected in SM disappeared (Polat & Sagi, 2006).

Effect of contrast in BM-on-SM
It is still possible that the second mask under the BM-on-SM condition abrogates the facilitation observed in SM by inhibiting the response to the first mask, thereby reducing its visibility. In other words, the effect might be regarded as pattern masking of the first mask by the second. If true, the perceived contrast of the first mask should be lower. It was shown earlier that even a low-contrast mask in SM still produces facilitation (Polat, 1999). Therefore, one would expect that reducing the perceived contrast of the first mask in BM-on-SM will still result in facilitation. We repeated the BM-on-SM experiment for different contrast levels (7.5-60%) of the first mask (Figure 5, orange bars). The contrast of the second mask was kept constant, at 60%. For comparison, the SM condition for the same mask contrasts was tested (Figure 5, blue bars). The results of the SM, presented in Figure 5, confirmed the earlier finding that facilitation is not dependent on the contrast of the first mask, and that this is valid between contrast levels of 7.5-60%, though the magnitude of the facilitation is slightly reduced for the lower contrast of the first mask.
However, in BM-on-SM, the second mask abrogated the facilitation for all contrast levels (p < .0006, t-test), indicating that the effect of BM reduces the effective lateral interactions between the first mask and the target. Further support for this result comes from the VEP experiment, which is presented below.

Comparison of BM-on-T, BM-on-SM and SM at 2 and 3λ. The elevation of the threshold of the target detection under the two BM conditions, as compared to SM, is shown. The Y-axis denotes the threshold elevation (positive values indicate suppression; negative values indicate facilitation). The results for BM-on-T (dots), BM-on-SM (vertical stripes), and SM (horizontal stripes) at 2λ (blue) and 3λ (red) are presented (mean of 5 subjects ± SEM).

Discussion
In this study our working hypothesis was that masking effects, either suppression or facilitation, reflect integration in the spatial and temporal domains of the feedforward response to the target and the lateral inputs evoked by the mask (excitatory and/or inhibitory). It was found that when masking of a single target was explored, the expected suppression effect was observed for both FM and BM, but only with a spatial separation of 2λ (i.e., interactions within the same receptive field). However, facilitation was observed at 3λ (i.e., interactions between different receptive fields), with FM and SM, but not with BM.

http://www.ac-psych.org Uri Polat, Anna Sterkin, and Oren Yehezkel

response to the first remained unchanged. Thus, the VEP, in concert with the behavioral findings, rules out the possibility of a pattern masking effect of the second mask on the first mask.

The possible neuronal mechanism underlying masking
What is the possible neuronal mechanism underlying the observed masking effects? Polat and Sagi (2006) suggested that both facilitation and masking reflect excitatory and inhibitory interactions within neuronal networks in response to Gabor stimuli (Adini et al., 1997; Hirsch & Gilbert, 1991). The presentation of a mask initiates both excitatory and inhibitory processes. However, whereas excitation develops slowly and thus lags behind the stimulus, inhibition is rapid and follows the onset and offset of the stimulus more closely. Thus, when the first mask is turned off, the inhibition decays rapidly, whereas the sustained excitation persists, resulting in lateral facilitation of the target. This suggestion is supported by the relatively slow time scale that characterizes lateral interactions (Bringuier et al., 1999; Malonek et al., 1994; Series et al., 2003) and by the strong, transient (Borg-Graham, Monier, & Fregnac, 1998) and fast (Bair, Cavanaugh, & Movshon, 2003) inhibition.
In the present study we highlight the importance of the temporal matching between feedforward input and lateral propagation, by monitoring their delays using VEP measurements. The response delay decreases with increasing target contrast by up to 100 ms (our unpublished data), which is consistent with data from single-unit recordings. Here we show that the delay of the peak response to the target presented alone was 210 ms (on average), whereas the corresponding delay of the mask response was 160 ms, indicating that the feedforward signal of the mask precedes the signal of the low-contrast target by 50 ms. Because the speed of lateral propagation of the mask response is slow, it reaches the target's location with a delay of an additional 50 to 100 ms (Polat & Sagi, 2006). Thus, the resulting delay of the lateral masking effect is 210 to 260 ms. In FM, when the mask is presented 50 ms before the target, the feedforward response to the target would be delayed by about 100 ms relative to the mask response. However, the lateral propagation of the mask response (with a delay of about 50 to 100 ms, i.e., within the efficient processing time-window of the target) would modulate the feedforward processing of the target, resulting in a masking effect. In SM, the feedforward delay of the target (210 ms) is temporally matched with the resulting delay of the mask response (i.e., the sum of the feedforward delay of 160 ms and the lateral propagation delay of 50 to 100 ms, which is 210 to 260 ms).
Thus, the network response is biased towards excitation, resulting in facilitation of the response to the target.
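The delay bookkeeping in this argument can be made explicit with a small sketch that uses only the latencies quoted above. The helper names are hypothetical, and treating the delays as simply additive is the same simplification made in the text.

```python
# Peak latencies and lateral delay quoted in the text (ms).
TARGET_LATENCY_MS = 210        # low-contrast target presented alone
MASK_LATENCY_MS = 160          # high-contrast mask
LATERAL_DELAY_MS = (50, 100)   # lateral propagation delay range

def lateral_arrival_window(mask_onset_ms):
    """Window (ms after target onset) in which the mask's lateral input
    reaches the target location. mask_onset_ms is the mask onset relative
    to target onset (negative means the mask comes first)."""
    base = mask_onset_ms + MASK_LATENCY_MS
    return (base + LATERAL_DELAY_MS[0], base + LATERAL_DELAY_MS[1])

def feedforward_lag(mask_onset_ms):
    """How long the target's feedforward peak trails the mask's peak."""
    return TARGET_LATENCY_MS - (mask_onset_ms + MASK_LATENCY_MS)

sm_window = lateral_arrival_window(0)   # simultaneous masking
fm_lag = feedforward_lag(-50)           # mask 50 ms before the target

print("SM lateral arrival window:", sm_window, "ms; target peak at", TARGET_LATENCY_MS, "ms")
print("FM feedforward lag of target behind mask:", fm_lag, "ms")
```

In SM the lateral window (210 to 260 ms) begins at the target's own peak latency, which is the temporal match described above; in FM the target's feedforward response trails the mask response by 100 ms.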

The inhibition-excitation account and its relationship to inside-outside the receptive field
A BM effect (suppression) on the target was observed for a target-to-mask separation of 2λ, but not of 3λ.
The lateral masking effect is composed of inhibition and excitation. As previously mentioned, the inhibitory response is rapid and transient. In BM with ISIs of 50 to 100 ms, the rapidly developing inhibition coincides with the target response, which would result in a suppressive effect, but the relatively delayed excitation abrogates the inhibition. However, when the mask is positioned at a distance of 2λ (i.e., overlapping with the receptive field of the target), the dominant effect would be inhibitory. The strong inhibitory response is composed of the lateral component as well as the local one (from the vicinity of the receptive field of the target). The lateral propagation of the excitation produced by the mask towards the target representation is relatively fast, since the spatial separation of 2λ is relatively short. Therefore, the excitation is temporally matched with the stronger transient inhibition from within the receptive field of the target. Thus, the lateral excitation and the local inhibition interact within the integration time of the target. This explanation is consistent with a physiological study showing that the main effect of temporal masking is evident only when the mask is positioned within a distance that overlaps with the receptive field (Macknik & Livingstone, 1998).
When the separation between the mask and the target was increased, the masking effect disappeared, in agreement with earlier studies (Breitmeyer, 1984).
Usually the distinction between pattern and lateral masking is based on an implicit assumption that the sharp boundaries that allow a visually apparent gap between the target and mask are indicative of a distinct activation of the center and surround. However, within the context of neuronal modeling, an important factor is the overlap between the receptive fields of the units responding to the target and mask, which may account for lateral interference regardless of whether the stimuli overlap or not.
Physiological studies that showed clear effects of surround modulations on the classical receptive field (Kapadia et al., 1995; Mizobe et al., 2001) positioned the mask at a distance that, when presented alone, evoked no response from the target location. Thus, the masking effect may possibly be confounded by mixed responses from the target's location as well as from the mask's location. Therefore, we propose that pattern and lateral masking may be inseparable in some of the temporal masking studies, especially for stimuli presented in the periphery.

Is the VEP just a linear summation of the target and mask responses?
It has been suggested that changes in the early components of the VEP signals reflect linear summations of the waveforms but not the real perceptual effect (van Aalderen-Smeets et al., 2006). However, our VEP results show that the measured signals are very different from the prediction of a linear summation of the target and mask waveforms when there is an interaction between the target and the mask (i.e., for ISIs of up to 50 ms). In contrast, for ISIs longer than 150 ms, the mask and the target responses are independent (and thus equal to the prediction of a linear summation).
Consequently, at such ISIs no masking effect is evident.
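The linear-summation comparison can be sketched as follows: the target-alone waveform is added to the mask-alone waveform delayed by the mask onset, and the result is compared with the waveform measured when both are presented. The function names, the root-mean-square deviation metric, and the toy waveforms are illustrative assumptions, not the analysis pipeline of this study.

```python
def linear_prediction(target_wave, mask_wave, mask_delay_samples):
    """Sum the target-alone waveform and the mask-alone waveform, with the
    mask waveform delayed by its onset relative to the target."""
    n = len(target_wave)
    shifted = [0.0] * mask_delay_samples + list(mask_wave)
    # Trim or zero-pad so the shifted mask matches the target length.
    shifted = shifted[:n] + [0.0] * max(0, n - len(shifted))
    return [t + m for t, m in zip(target_wave, shifted)]

def rms_deviation(measured, predicted):
    """Root-mean-square difference between two equal-length waveforms."""
    squared = [(a - b) ** 2 for a, b in zip(measured, predicted)]
    return (sum(squared) / len(squared)) ** 0.5

# Toy waveforms (arbitrary units), for illustration only.
target_alone = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
mask_alone = [0.0, 3.0, 0.0, 0.0, 0.0, 0.0]

predicted = linear_prediction(target_alone, mask_alone, mask_delay_samples=2)
print("linear prediction:", predicted)
print("deviation of an identical measurement:", rms_deviation(predicted, predicted))
```

A deviation near zero would indicate independent responses, as reported for the long ISIs, whereas a large deviation would indicate an interaction between the target and mask responses.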
Thus, the evoked potentials seem to mirror the reported perceived masking effect. Moreover, the negative peak response, N1, was found to be markedly reduced (in absolute terms) under the BM-on-SM condition at an ISI of 50 ms, in contrast to van Aalderen-Smeets et al. (2006), who did not observe any effect of BM at this delay. It is possible that the "pseudo" mask, although having different features from the effective mask, may still have interfered with the receptive field of the target, in a way similar to that of the effective mask, thus producing an indistinguishable pattern of interference with the target processing in the physiological results.
The psychophysical findings for the two types of masks, although differential, are influenced by both the perceptual and the cognitive (i.e., post-perceptual) components of the behavioral response.
Our results suggest that the masking effects, either suppression or facilitation, reflect integration in the spatial and temporal domains of the feedforward response to the target and the lateral inputs evoked by the mask (excitatory and/or inhibitory). The excitation evoked by the mask is delayed relative to the target stimulus, because it develops and propagates slowly from the mask's location to the target's location. The inhibition produced in the vicinity of the target, however, evolves more rapidly, and therefore follows the onset and offset of the stimulus more closely. It is also possible that the temporal properties of the responses in our study can be accounted for by the dual-channel model, which assumes effects of transient inhibition on sustained excitation (Breitmeyer, 1984). However, our model differs from the dual-channel model in assuming that both inhibition and excitation remain active as long as the stimulus is present. Moreover, our model and results disagree with the model of object-substitution masking (Enns & Di Lollo, 2000) in showing that rather than being unaffected, as expected by the model, the response to the mask is reduced.
To conclude, the interplay between the sustained lateral excitation and the transient inhibition may facilitate the grouping of local elements into a global percept by increasing the survivability of the object and its accessibility for perceptual awareness.