Contrast sensitivity reveals an oculomotor strategy for temporally encoding space

The contrast sensitivity function (CSF), how sensitivity varies with the frequency of the stimulus, is a fundamental assessment of visual performance. The CSF is generally assumed to be determined by low-level sensory processes. However, the spatial sensitivities of neurons in the early visual pathways, as measured in experiments with immobilized eyes, diverge from psychophysical CSF measurements in primates. Under natural viewing conditions, as in typical psychophysical measurements, humans continually move their eyes even when looking at a fixed point. Here, we show that the resulting transformation of the spatial scene into temporal modulations on the retina constitutes a processing stage that reconciles human CSF and the response characteristics of retinal ganglion cells under a broad range of conditions. Our findings suggest a fundamental integration between perception and action: eye movements work synergistically with the spatio-temporal sensitivities of retinal neurons to encode spatial information.


Introduction
Contrast sensitivity, the ability to distinguish a patterned input from a uniform background, is one of the most important measures of visual function (Robson, 1966;Campbell and Robson, 1968;De Valois et al., 1974;Owsley, 2003). Elucidation of its underlying mechanisms is, thus, essential for understanding how the visual system operates both in health and disease.
It has long been established that sensitivity varies in a specific manner with the spatial frequency of the stimulus, yielding the so-called contrast sensitivity function (henceforth CSF). Under photopic conditions, the CSF measured with stationary gratings exhibits a well-known band-pass shape that typically peaks around 3-5 cycles/deg and sharply declines at higher and lower spatial frequencies.
The mechanisms responsible for this dependence on spatial frequency are not fully understood. At high frequency, a decline in sensitivity is expected for several reasons, including the filtering of the eyes' optics (Campbell and Green, 1965) and the spatial limits in sampling imposed by the cone mosaic on the retina (Hirsch and Miller, 1987;Rossi and Roorda, 2010). At low frequencies, however, the reasons for a reduced sensitivity have remained less clear.
A popular theory directly links the low-frequency attenuation in visual sensitivity to the neural mechanisms of early visual encoding (Atick and Redlich, 1990;Atick and Redlich, 1992). Building on theories of efficient coding (Barlow, 1961), it has been argued that this attenuation reflects a form of matching between the characteristics of the natural visual world and the response tuning of neurons in the retina: retinal ganglion cells (henceforth RGCs) respond less strongly at low spatial frequencies so as to counterbalance the spectral distribution of natural scenes. According to this proposal, this filtering eliminates part of the redundancy intrinsic in natural scenes and enables more efficient (i.e. more compact) visual representations.
Although very influential, this proposal conflicts with experimental data. Neurophysiological recordings have long shown that the way the responses of retinal ganglion cells vary with spatial frequency deviates sharply from the CSF. The CSF of macaques is very similar to that of humans (De Valois et al., 1974); yet neurons in the macaque retina respond much more strongly at low spatial frequencies than one would expect from behavioral measurements of the CSF (Figure 1A). This deviation cannot be reconciled with standard models of retinal ganglion cells. It persists even when one takes into account obvious differences in the stimuli often used in neurophysiological and behavioral measurements (i.e. drifting gratings vs. temporally modulated gratings), as well as the nonlinear attenuation in responsiveness at low spatial frequencies exhibited by some retinal ganglion cells (Derrington and Lennie, 1984;Croner and Kaplan, 1995;Benardete and Kaplan, 1997a). This mismatch between neuronal and behavioral sensitivity indicates that additional mechanisms contribute to the CSF.
A fundamental difference between neurophysiological and behavioral measurements of contrast sensitivity is the presence of eye movements in the latter. Under natural viewing conditions, humans and other primates incessantly move their eyes (Kowler, 2011;Cherici et al., 2012). Small movements, known as fixational eye movements (FEMs), occur even when attempting to maintain steady gaze on a single point (Figure 1B). Although humans often tend to suppress saccades of all sizes, including microsaccades, during measurements of contrast sensitivity (Mostofi et al., 2016), ocular drift (the seemingly erratic motion in between saccades/microsaccades) keeps the stimulus on the retina always in motion and may cover an area as large as that of the foveola (Rucci and Poletti, 2015a). Critically, this retinal image motion is completely eliminated or markedly attenuated in many neurophysiological preparations, where the retina is studied in a dish, or eye muscles are paralyzed as a result of anesthesia and/or neuromuscular blockade.
In previous work, we have shown that eye drift profoundly reshapes visual input signals, redistributing the 0 Hz (DC) power of the external static stimulus to non-zero temporal frequencies on the retina (Casile and Rucci, 2006;Casile and Rucci, 2009;Kuang et al., 2012;Aytekin et al., 2014). These modulations appear to be used by humans for fine spatial discrimination (Boi et al., 2017;Ratnam et al., 2017), providing new support to the long-standing proposal that the visual system uses oculomotor-induced luminance fluctuations for encoding spatial information in a temporal format (see Victor, 2015b and Rucci et al., 2018 for reviews). Building upon this previous work, here, we investigate whether this temporal encoding strategy, coupled with the known response characteristics of retinal neurons, accounts for the most fundamental properties of human spatial sensitivity.
In addition to the properties described above, it is well established that contrast sensitivity is affected by temporal modulations in the stimulus. Although the CSF exhibits a strong attenuation at low spatial frequencies when tested with stationary gratings, the shape of this function changes when gratings are modulated in time, transitioning from band-pass to low-pass as the temporal frequency of the stimulus increases (Robson, 1966). Furthermore, although strongly attenuated, sensitivity also tends to shift to higher spatial frequencies when retinal image motion is strongly reduced, as in experiments of retinal stabilization (Kelly, 1979). In both these conditions, the temporal modulations impinging onto retinal receptors differ drastically from those generated by normal eye drift over stationary gratings.
Does a temporal strategy of spatial encoding reconcile neurophysiological and behavioral measurements of contrast sensitivity? And does this strategy explain the differences in the CSF measured in various experimental conditions? More broadly, does the oculomotor-driven dynamics of retinal ganglion cells provide a unified account of human spatial sensitivity? Answers to these questions are not only critical for advancing our comprehension of the mechanisms of visual encoding but also for understanding the consequences of abnormal retinal image motion and their clinical implications. In the following, we use neuronal models to quantitatively examine the impact of eye drift on neural activity and compare the responses of retinal ganglion cells to the CSF of primates.

Results
Figure 1A compares the mean receptive fields of ganglion cells in the primate retina, as estimated by Croner and Kaplan (1995), with the contrast sensitivity of alert and behaving macaques (De Valois et al., 1974). The two sets of data deviate considerably, especially at low spatial frequencies. In this range, unlike the CSF, neural sensitivity is not strongly attenuated, a trend reported by multiple neurophysiological studies (e.g., Kaplan and Shapley, 1982;Hicks et al., 1983;Derrington and Lennie, 1984). This deviation is not simply the outcome of incorrectly extrapolating receptive-field measurements, as neural responses have been directly measured at very low spatial frequencies (down to 0.07 cpd in Croner and Kaplan, 1995; Figure 1A).
While a difference-of-Gaussians model can yield reduced responses at low spatial frequencies, attenuation similar to that observed in the CSF can only be achieved at the expense of highly unrealistic model parameters. As shown in Figure 1-figure supplement 1A-B, for both M and P cells, matching the behavioral CSF requires a surround strength that is more than twice the value found in physiological measurements, a condition that gives an almost perfect balance between excitation and inhibition. Even small deviations from this balance lead to marked departures from the CSF (Figure 1-figure supplement 1C-D). Thus, contrary to previous proposals, the spatial sensitivity of retinal ganglion cells appears to be quantitatively incompatible with the characteristics of the CSF. A greater attenuation of neural sensitivity is required at low spatial frequencies to counterbalance the large power of natural scenes in this range.
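How strongly low-frequency attenuation depends on the balance between center and surround can be illustrated with a minimal Python sketch of a difference-of-Gaussians receptive field. The radii and the two surround ratios used below are illustrative assumptions, not the parameters of this study, and the function names are hypothetical.

```python
import math

def dog_response(f_cpd, surround_ratio, r_c=0.03, r_s=0.18):
    """Amplitude response of a difference-of-Gaussians (DoG) receptive field.

    f_cpd: spatial frequency in cycles/deg.
    surround_ratio: integrated surround strength relative to the center
        (1.0 would be a perfect balance of excitation and inhibition).
    r_c, r_s: center and surround radii in degrees (assumed values).
    The integrated strength of the center is normalized to 1.
    """
    center = math.exp(-(math.pi * r_c * f_cpd) ** 2)
    surround = surround_ratio * math.exp(-(math.pi * r_s * f_cpd) ** 2)
    return center - surround

def low_frequency_ratio(surround_ratio, f_low=0.1):
    """Response at f_low divided by the peak response over about 0.1-60 cpd."""
    freqs = [0.1 * 1.05 ** i for i in range(132)]
    peak = max(dog_response(f, surround_ratio) for f in freqs)
    return dog_response(f_low, surround_ratio) / peak

if __name__ == "__main__":
    for ratio in (0.55, 0.98):  # moderate vs. nearly balanced surround (assumed)
        print(ratio, round(low_frequency_ratio(ratio), 3))
```

With these assumed values, a moderate surround leaves the response at 0.1 cpd at roughly half of the peak, whereas a nearly balanced surround reduces it to a few percent of the peak. Only the second regime produces the kind of steep low-frequency roll-off seen in the CSF.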
The response of a neuron, however, depends not only on the cell's spatial preference but also on its temporal sensitivity. Temporal transients are always present in the input signals to the retina during behavioral measurements of contrast sensitivity. Experimenters often take great care to minimize these transients, for example by slowly ramping up the stimulus at the beginning and down at the end of a trial and by enforcing fixation to prevent visual changes caused by saccadic eye movements (Figure 2A). Yet, despite these precautions, fixational eye movements are always present and modulate the visual flow impinging on the retina even when the stimulus does not change on the monitor. Could sensitivity to these oculomotor fluctuations reconcile neurophysiological and behavioral measurements of spatial sensitivity?
To investigate this question, we recorded eye movements in human observers as they carried out a grating detection task at threshold, and exposed spatiotemporal filters approximating the receptive fields of retinal ganglion cells to the luminance signals experienced by the retina in each individual trial. Figure 2B shows the temporal modulations impinging onto retinal neurons during a typical measurement of contrast sensitivity. In the absence of any transient, the power of a stationary visual stimulus would be confined to the DC (0 Hz) temporal frequency axis. In practice, however, both eye drift and the turning of the stimulus on and off on the display introduce temporal modulations. These modulations effectively redistribute part of the stimulus DC power to nonzero temporal frequencies, that is, they transform static power (the original power at 0 Hz) into dynamic power (power at non-zero temporal frequencies).
As shown in Figure 2C-D, because of the characteristics of ocular drift, the resulting dynamic power increases with spatial frequency, up to approximately 30 cpd (magenta line in Figure 2D), which, interestingly, roughly corresponds to the frequency limit given by the spatial resolution of photoreceptors in the fovea. In contrast, unlike drift, contrast modulations due to the onset/offset of the stimulus on the display cause power redistributions that do not depend on the spatial frequency of the stimulus (black line in Figure 2D). It is important to keep in mind that eye movements do not generate new power in the retinal input. They only redistribute the original DC power of the stimulus, so that a complementary frequency-dependent attenuation of power occurs along the 0 Hz axis (Figure 2-figure supplement 1).
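The redistribution of power described above can be sketched in Python under a simplifying assumption that is ours, not part of the analysis of recorded traces: drift is treated as ideal Brownian motion observed for an unlimited time. Under this assumption, a grating of spatial frequency k acquires a Lorentzian temporal spectrum of unit area with rate a = (2πk)²D. The diffusion coefficient, the temporal band, and the function names below are illustrative.

```python
import math

# Illustrative diffusion coefficient: 20 arcmin^2/s expressed in deg^2/s (assumed).
D_EXAMPLE = 20.0 / 3600.0

def decay_rate(k_cpd, d_deg2_per_s):
    """Decorrelation rate a = (2*pi*k)^2 * D of a grating under Brownian motion.

    Assumes the displacement along the grating's axis has variance 2*D*t.
    """
    return (2.0 * math.pi * k_cpd) ** 2 * d_deg2_per_s

def power_fraction(k_cpd, f_lo, f_hi, d_deg2_per_s):
    """Fraction of the grating's power with |temporal frequency| in [f_lo, f_hi]."""
    a = decay_rate(k_cpd, d_deg2_per_s)
    upper = math.atan(2.0 * math.pi * f_hi / a)
    lower = math.atan(2.0 * math.pi * f_lo / a)
    return (2.0 / math.pi) * (upper - lower)

if __name__ == "__main__":
    for k in (0.5, 1.0, 2.0, 4.0):
        slow = power_fraction(k, 0.0, 1.0, D_EXAMPLE)     # power left near 0 Hz
        dynamic = power_fraction(k, 1.0, 30.0, D_EXAMPLE)  # power moved to 1-30 Hz
        print(k, round(slow, 4), round(dynamic, 4))
```

In this idealization the total power is the same at every spatial frequency, so whatever appears in the 1-30 Hz band is removed from the neighborhood of 0 Hz. At low spatial frequencies the dynamic fraction grows approximately with the square of k. The frequency at which it stops growing depends on the assumed D and band, so the sketch is not meant to reproduce Figure 2D.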
Both eye drift and contrast changes yield temporal modulations that are well within the range of temporal sensitivity of retinal ganglion cells (cf. Figure 2C and E). However, in simulations that replicated the standard conditions of contrast sensitivity measurements, drift modulations predominated. Since drift modulations convey little power at low spatial frequencies, the responses of standard ganglion cells were attenuated in this frequency range (Figure 3B-C). This happened for both M and P cells, despite the well-known differences in their spatio-temporal sensitivity. As a consequence of this effect, a simple linear combination of the resulting M and P responses accurately predicted human contrast sensitivity with stationary stimuli over the entire range of relevant spatial frequencies (solid line in Figure 3A).
In contrast, in the absence of eye movements, when the only temporal modulations were those given by the onset/offset of the stimulus on the monitor, the CSF predicted by the same linear combination of neural responses exhibited a low-pass behavior that deviated considerably from human contrast sensitivity, especially at low spatial frequencies (dashed lines in Figure 3). In fact, no linear combination of modeled responses could approximate the CSF in this condition. This happened because, unlike the luminance modulations resulting from ocular drift, the amplitude of the contrast modulations of the stimulus on the display does not depend on the spatial frequency of the stimulus (black line in Figure 2D). Thus, without taking ocular drift into account, neuronal models exhibit a higher level of response at low spatial frequencies, as dictated by the spatial sensitivity of their kernels, and this strongly deviates from the CSF (Figure 1A).
In sum, standard models of the responses of M and P RGCs well predict the shape of the human CSF as measured with stationary gratings, but only when one considers sensitivity to the temporal modulations caused on the retina by fixational drift.
Contrast sensitivity is a function not only of the spatial frequency of the stimulus but also of its temporal frequency. Measurements with gratings modulated in time have long shown that the CSF in humans is not space-time separable: the way contrast sensitivity varies with spatial frequency depends on the temporal frequency of the modulation (Robson, 1966). As the temporal frequency increases, the CSF changes its shape, transitioning from band-pass to low-pass (Figure 4A).
To investigate whether our model also accounts for this change in shape, we repeated our simulations using gratings modulated at various temporal frequencies. The same linear combination of the responses of M and P cells as in Figure 3 continued to closely match human performance when the stimulus was temporally modulated on the display, and the predicted CSF replicated the band-pass to low-pass transition observed in primates, as the frequency of the modulation increased (Figure 4B). This change in shape was the consequence of the different amount of dynamic power that the combination of fixational drift and temporal modulations of the stimulus delivered within the range of neuronal sensitivity. Since we assume that there is no sensitivity to unchanging stimuli, the DC power does not contribute to cells' responses. However, flickering a grating has the effect of shifting the 0 Hz power of the grating to the temporal frequency of the modulation (Figure 4C). As a consequence, as the frequency of the modulation increased, this DC power was progressively moved into the sensitivity range of modeled neurons. At low temporal modulating frequencies (e.g. 1 Hz or below), only a small fraction of this power was within the region of neuronal sensitivity, and the temporal redistribution resulting from eye drift continued to exert a strong influence, forcing the CSF to maintain its band-pass shape. However, at higher temporal frequencies (e.g. 6 Hz and higher), the power restricted to the 0 Hz axis in the absence of stimulus modulations now became fully available within the cells' peak sensitivity region. Since this static power is predominantly at low spatial frequencies (Figure 2-figure supplement 1), it caused a transition from band-pass to low-pass behavior in the responses of simulated M and P neurons, as well as in the shape of the CSF.
Estimates of the CSF at intermediate frequencies between 0 Hz and 6 Hz (Figure 4-figure supplement 1) suggest that this transition occurs around 3 Hz, which is in agreement with psychophysical results (Bowker and Tulunay-Keesey, 1983).
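The effect of flicker on the power available to neurons can be sketched in Python under assumptions of ours: drift is idealized as Brownian motion, flicker as a sinusoidal contrast modulation, and neuronal temporal sensitivity as a hard 1-30 Hz band. Sinusoidal modulation at frequency f_mod moves the drift-induced Lorentzian spectrum from 0 Hz to ±f_mod. The diffusion coefficient, band edges, and function names are hypothetical, and the total power of each stimulus is normalized to 1.

```python
import math

D_EXAMPLE = 20.0 / 3600.0  # illustrative diffusion coefficient in deg^2/s (assumed)

def lorentzian_cdf(f_hz, a):
    """Cumulative power, from -infinity to f_hz, of a unit-area Lorentzian of rate a."""
    return 0.5 + math.atan(2.0 * math.pi * f_hz / a) / math.pi

def band_power(k_cpd, f_mod, d_deg2_per_s, f_lo=1.0, f_hi=30.0):
    """Fraction of stimulus power with |temporal frequency| in [f_lo, f_hi].

    k_cpd: spatial frequency of the grating (cycles/deg).
    f_mod: temporal frequency of the contrast modulation (0 for a stationary grating).
    """
    a = (2.0 * math.pi * k_cpd) ** 2 * d_deg2_per_s
    total = 0.0
    # The spectrum is the average of two Lorentzians centered at +f_mod and -f_mod;
    # integrating it over both signs of the band gives the sum below.
    for center in (f_mod, -f_mod):
        total += lorentzian_cdf(f_hi - center, a) - lorentzian_cdf(f_lo - center, a)
    return total

if __name__ == "__main__":
    for f_mod in (0.0, 6.0):
        row = [round(band_power(k, f_mod, D_EXAMPLE), 3) for k in (0.5, 2.0, 8.0)]
        print(f_mod, row)
```

Under these assumptions, a stationary grating at 0.5 cpd delivers almost none of its power to the band, whereas the same grating flickered at 6 Hz delivers nearly all of it. The spatial-frequency dependence imposed by drift therefore disappears at low spatial frequencies once the modulation frequency falls inside the sensitivity band.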
In sum, our model attributes the space-time inseparability of the CSF to the structure of the temporal modulations delivered within the range of sensitivity of retinal ganglion cells. Modulations resulting from eye drift yield a band-pass CSF, whereas sinusoidally modulated gratings yield a low-pass CSF. The interplay between these two components of the retinal input explains not only contrast sensitivity with stationary gratings, but also the band-pass to low-pass transition that occurs with temporally modulated gratings. Notably, it correctly predicts the temporal frequency range at which this transition takes place. Our results, thus, suggest a functional link between the physiological instability of visual fixation and the characteristics of the CSF.
A natural question then emerges: how is contrast sensitivity affected by elimination of the luminance modulations caused by ocular drift? Ideally, in the complete absence of eye movements, neural responses in our model would only be driven by the modulations present in the external stimulus.
Under such conditions, the model predicts that sensitivity to a stationary grating would be greatly attenuated and the CSF would shift toward a low-pass shape, as it would lack the frequency-dependent amplification operated by ocular drift.
In real experiments, however, elimination of oculomotor-induced luminance modulations is impossible. Retinal stabilization, a laboratory procedure that attempts to immobilize an image on the retina (Riggs et al., 1953;Yarbus, 1957), is always affected by noise in the oculomotor recordings as well as imperfections in gaze-contingent display control, which leave some residual motion on the retina. Under these conditions, contrast sensitivity has indeed been found to be attenuated, but it maintains its band-pass shape and peaks at higher spatial frequencies (Kelly, 1979).
To examine whether sensitivity to temporal transients accounts for the changes in the CSF measured under retinal stabilization, we exposed modeled neurons to reconstructions of the visual input signals experienced in these experiments. Previous studies have established that a Brownian model well captures the characteristics of retinal image motion during fixation (Poletti et al., 2015). Building on this previous finding, we modeled the residual motion of the retinal image in stabilization experiments as a Brownian process, but with a greatly reduced diffusion coefficient relative to that present during normal, unstabilized fixation. Figure 5A shows how the spatial frequency content of the luminance fluctuations experienced by retinal receptors (the power available at nonzero temporal frequencies) varies with the scale of the Brownian motion process (i.e. its diffusion coefficient, D). Changing the amount of retinal image motion has interesting repercussions on the characteristics of temporal modulations. As expected, a smaller diffusion constant delivers less dynamic power to the retina within the range of neural sensitivity, a direct consequence of the fact that luminance modulations are now smaller. However, a smaller D also has the effect of shifting the range of amplification to higher spatial frequencies by a factor of $\sqrt{D}$. This happens because reducing the scale of retinal image motion is functionally equivalent to spatially stretching the stimulus, which translates, in the Fourier domain, to a compression of the axis of spatial frequencies that moves the amplification range toward higher spatial frequencies.
These effects in the spectral distributions of the retinal flow well match the changes in contrast sensitivity observed in retinal stabilization experiments. Figure 5B compares classical retinal stabilization data from Kelly (1979) to the sensitivity predicted by our model when the diffusion constant of the retinal image motion was attenuated by a factor of 125, which corresponds to shrinking the spatial scale of eye movements by approximately one order of magnitude. Model predictions closely followed psychophysical measurements: a reduction in the amount of retinal image motion attenuated contrast sensitivity while maintaining its band-pass shape and shifted its peak sensitivity to higher spatial frequencies, from 4 cpd to 5.5 cpd (Figure 5B). These data show that consideration of the luminance modulations resulting from the motion of the stimulus on the retina accounts not only for behavioral sensitivity measurements performed in the presence of normal eye movements, but also for measurements made under conditions of retinal stabilization, when retinal image motion is greatly reduced.
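The scaling relation between the diffusion coefficient and the spatial-frequency range of amplification can be checked with a short Python sketch. It rests on assumptions of ours: ideal Brownian motion and a hard 1-30 Hz band of temporal sensitivity, with an illustrative baseline D and hypothetical function names. In this idealization the dynamic power depends on spatial frequency k and on D only through the product k²D, so dividing D by 125 shifts the whole curve along the spatial-frequency axis by √125 ≈ 11.2.

```python
import math

D_NORMAL = 20.0 / 3600.0        # illustrative baseline diffusion coefficient, deg^2/s (assumed)
ATTENUATION = 125.0             # reduction of D considered in the text
SHIFT = math.sqrt(ATTENUATION)  # corresponding change in spatial scale, about 11.2

def dynamic_fraction(k_cpd, d_deg2_per_s, f_lo=1.0, f_hi=30.0):
    """Fraction of a grating's power moved into [f_lo, f_hi] Hz by Brownian drift."""
    a = (2.0 * math.pi * k_cpd) ** 2 * d_deg2_per_s
    upper = math.atan(2.0 * math.pi * f_hi / a)
    lower = math.atan(2.0 * math.pi * f_lo / a)
    return (2.0 / math.pi) * (upper - lower)

if __name__ == "__main__":
    d_stabilized = D_NORMAL / ATTENUATION
    for k in (0.5, 1.0, 2.0):
        normal = dynamic_fraction(k, D_NORMAL)
        same_k = dynamic_fraction(k, d_stabilized)             # less power at the same k
        shifted = dynamic_fraction(k * SHIFT, d_stabilized)    # same power at a higher k
        print(k, round(normal, 4), round(same_k, 6), round(shifted, 4))
```

The sketch concerns only the power of the retinal input. The shift of the predicted CSF peak is much smaller than √125, as reported above, because the final sensitivity also depends on the spatial and temporal filtering of the modeled neurons.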

Discussion
Contrast sensitivity is a fundamental descriptor of visual functions. In many species, including humans, sensitivity strongly depends on the spatial and temporal frequency of the stimulus. Here, we show that a temporal scheme of spatial encoding, a scheme in which spatial vision is driven by temporal changes, predicts such dependencies when the temporal modulations introduced by incessant eye movements are taken into account. In contrast, when these consequences of fixational drift are ignored, the known response characteristics of retinal ganglion cells fail to account for human CSF. As described below, these results are highly robust, bear multiple consequences, and lead to important predictions.
An important consequence of our results regards the strategies by which the visual system encodes spatial information. Existing theories of visual processing have attributed the shape of the CSF to the characteristics of early visual processing. In an influential study, Atick and Redlich (1992) found that the theoretical filter that optimally decorrelates natural images closely matches the CSF. Since decorrelated responses enable compact neural representations, these authors assumed that the CSF reflects the average spatial selectivity of ganglion cells in the retina. However, experimental measurements have long shown that the response selectivity of RGCs differs considerably from the CSF, particularly at low spatial frequencies, where decorrelation would be most beneficial (Hicks et al., 1983;Kaplan and Shapley, 1982;Derrington and Lennie, 1984;Croner and Kaplan, 1995). As expected from this deviation, broad spatial correlations in RGC responses have been found in preparations in which natural images are displayed in the absence of eye movements (Puchalla et al., 2005;Segal et al., 2015). These findings are consistent with our model: when the transients in stimulus presentation override the consequences of eye drift, spatial sensitivity follows the spatial kernels of modeled receptive fields. For this reason, responses to low spatial frequencies are enhanced relative to the level that would be needed for decorrelating activity.
The same principle also provides an explanation for the band-pass to low-pass transition of the CSF as the temporal frequency of the stimulus increases. This transition is the consequence of the spectral characteristics of the signals that the combination of fixational drift and stimulus transients delivers within the range of neuronal temporal sensitivity. With stationary gratings, temporal modulations in the retinal input are heavily influenced by ocular drift, which enhances high spatial frequencies, imposing a band-pass sensitivity (Figure 3). With temporally modulated gratings, neuronal responses are also affected by the contrast modulation imposed on the stimulus on the display. Above a frequency of a few Hz, the impact of external modulations outweighs the effects of eye movements, removes the space-time inseparability in cell responses caused by ocular drift, and again enhances sensitivity to low spatial frequencies (Figure 4B).
Rather than attributing spatial sensitivity solely to the spatial selectivity of RGCs, our analysis shows that the CSF is shaped by the joint spatial and temporal characteristics of retinal responses and how they interact with oculomotor transients. It predicts the complex way contrast sensitivity varies with the spatial and temporal frequency of the stimulus by a linear combination of the space-time separable functions of P and M channels. While our study cannot exclude that other mechanisms, at various stages of visual processing, may also play a role in shaping the CSF (e.g. the number of neurons in different frequency channels), it suggests that these other contributions are minimal. Consideration of RGC temporal sensitivity provides a parsimonious unifying framework for a wide range of experimental measurements of the CSF with only a minimal set of assumptions.
In our model, we assumed that retinal ganglion cells possess negligible sensitivity below the frequencies at which sensitivity can practically be measured (~0.2-0.3 Hz). This hypothesis may appear to conflict with the neurophysiological data reported in the low temporal frequency range by several studies. However, in both neurophysiological and psychophysical experiments, measuring sensitivity in this range is challenging because it requires trials with long durations, consideration of the visual stimuli present before and after each trial, and estimation of long impulse responses. Typically, the transfer functions reported at low temporal frequencies are extrapolations outside of the range of measured values based on models that were not designed for this purpose (e.g. the linear cascade model (Victor, 1987) in Benardete and Kaplan, 1997b;Benardete and Kaplan, 1997a; a difference of exponentials in Derrington and Lennie, 1984, etc.). These extrapolations must be interpreted with great caution, as they merely reflect untested model assumptions.
The few studies that specifically examined retinal ganglion cells' responses at low temporal frequencies found a decline in sensitivity up to the limit that they could measure (Victor, 1987;Purpura et al., 1990). These studies suggest that the response attenuation takes the form of an approximately linear decrease in log-log scale. Such behavior is expected from theoretical considerations based on the characteristics of adaptation (Thorson and Biederman-Thorson, 1974), considerations that appear to apply to the responses of cones in the retina of the macaque (Boynton and Whitten, 1970) and therefore will limit the low-frequency behavior of retinal ganglion cells. Furthermore, temporal signals at frequencies below ~0.3 Hz, even if present, are not likely to be useful to an observer in a psychophysical experiment, as they will contain noise power due to visual stimulation on previous trials and during the intertrial interval (such as eye-blinks and glances around the lab). Our results are robust to the specifics of how this low-frequency attenuation in sensitivity was implemented. The curves presented in Figures 3, 4 and 5 were obtained by simply discarding responses below 0.6 Hz. Results were, however, virtually identical when we used different frequency thresholds (Figure 4-figure supplement 2A), or when we modeled sensitivity as a power law function in the low-frequency range, as in Purpura et al. (1990) and Thorson and Biederman-Thorson (1974) (Figure 4-figure supplement 2B).
We specifically focused on fixational drift both because of its ubiquitous presence and its known influence on fine pattern vision (Ratliff and Riggs, 1950;Ditchburn, 1955;Steinman et al., 1973;Rucci et al., 2007;Ratnam et al., 2017). Other types of eye movements, like saccades and microsaccades, tend to be suppressed during measurements of contrast sensitivity (Mostofi et al., 2016) and were not considered in this study. The transients from these movements, however, differ in their spectra from those from eye drift, as they provide equal temporal power across a broad range of spatial frequencies. Thus, during normal viewing, the visual system could benefit from different types of modulations. In keeping with this idea, it has been argued that the stereotypical alternation of oculomotor transients resulting from the natural saccade/drift cycle contributes to a coarse-to-fine processing dynamics at each visual fixation (Boi et al., 2017).
It is worth emphasizing that our results are very robust and do not depend on fitting model parameters. With regard to oculomotor activity, we did not model eye movements, but used real traces recorded from human subjects during measurements of contrast sensitivity. With regard to neuronal properties, we implemented standard M and P filters obtained from the neurophysiological literature and frequently adopted by modeling studies (Croner and Kaplan, 1995;Benardete and Kaplan, 1997a;Benardete and Kaplan, 1999). We chose to estimate the CSF by linearly combining M and P responses in a fixed ratio, because this was the simplest model. We note, however, that other ways of combining M and P signals will yield very similar conclusions, since the space-time inseparability originates from the visual input rather than the neuronal models. Our two parameters (the global gain at a given temporal frequency and the ratio of M-P contributions, see Equation 7 in the Materials and methods section) were merely used to quantitatively align the modeled CSF with the experimental data. They have no role in explaining the shape of the CSF and its band-pass to low-pass transition.
In addition to providing a comprehensive explanation of the CSF, our study makes important predictions at different levels. At the neural level, our results predict that the response selectivity of RGCs will change when measured in the presence and absence of the fixational motion of the retinal image. Neurophysiological studies already suggest that fixational eye movements are an important component of visual encoding (Gur et al., 1997;Leopold and Logothetis, 1998;Martinez-Conde et al., 2000;Olveczky et al., 2003;Kagan et al., 2008;Meirovithz et al., 2012;McFarland et al., 2016). Eye jitter has been found to reduce redundancy in the responses of retinal neurons (Segal et al., 2015) and to synchronize them, enhancing visual features (Greschner et al., 2002) even beyond the physiological limitations imposed by photoreceptor spacing (Juusola et al., 2016). Furthermore, retinal ganglion cells have been found that may distinguish between the global motion given by fixational eye movements and the local motion of objects (Olveczky et al., 2003). Yet, retinal responses are traditionally measured with the eyes immobilized, a condition in which RGCs tend to exhibit relatively strong responses at low spatial frequencies (Croner and Kaplan, 1995). Our model predicts that the spatial frequency amplification produced by fixational drift in the retinal input (Figure 2D) will enhance neuronal sensitivity to higher spatial frequencies and will reduce sensitivity to low spatial frequencies. As a consequence, RGCs' spatial sensitivity should exhibit a more pronounced band-pass behavior and its peak should shift toward higher frequencies. This prediction is difficult to test in vivo, because of the need to completely stabilize the retinal input, but it can be thoroughly investigated in vitro, where the motion of the retinal image is under full experimental control.
At the perceptual level, an interesting observation comes from the changes in the frequency content of the retinal input shown in Figure 5A. The amplitude of fixational instability regulates the power available in different spatial frequency bands. Specifically, the smaller the amount of retinal image motion, the more the range of amplification shifts to higher spatial frequencies. The visual system could, in principle, exploit this relationship by dynamically matching the spatial scale of eye drift to the frequency content of the visual scene, or to the frequency range that is task-relevant. Within a certain range, smaller drifts would optimize information accrual when foveating on regions rich in high spatial frequencies. This effect could not only be directly driven by the stimulus in a bottom-up fashion, but also be used to meet top-down demands in high-acuity tasks. Indeed, several studies support the idea that humans can control the amount of their ocular drift (Steinman et al., 1973; Cherici et al., 2012; Poletti et al., 2015). In the same vein, the relationship between fixational drift and the frequency content of the retinal input may also explain individual perceptual differences. Subjects with relatively smaller drifts are expected to perform better in tasks in which high spatial frequencies are critical. Studies that quantitatively relate the characteristics of fixational eye drift to visual perception are needed to investigate these predictions. Furthermore, our model predicts that manipulating temporal modulations from eye drift will affect performance. We have shown that reducing the amount of retinal jitter accounts well for both the overall reduction in contrast sensitivity and the shift to higher spatial frequencies observed in experiments of retinal stabilization. In the other direction, enlarging fixational jitter increases the amount of power available at low spatial frequencies, predicting an improvement in contrast sensitivity in this range. This prediction is consistent with the improvements in word and object recognition reported in patients with central visual loss when images or text are jittered or scrolled (Watson et al., 2012; Harvey and Walker, 2014; Gustafsson and Inde, 2004). The spatial frequency band of retinal ganglion cells decreases with eccentricity, and enlarging retinal image motion has the effect of bringing more power into their range of sensitivity.
Our study also has clinical implications, as it predicts that disturbances in fixational oculomotor control will affect visual sensitivity. Oculomotor anomalies and impaired sensitivity co-occur in a variety of disorders, including conditions as diverse as dyslexia (Stein and Fowler, 1981;Stein and Fowler, 1993) and schizophrenia (Dowiasch et al., 2016;Egaña et al., 2013). Patients with these conditions exhibit similar visual deficits including reduced sensitivity (Lovegrove et al., 1980a;Lovegrove et al., 1980b;Slaghuis, 1998), low-level visual impairments (Eden et al., 1996;Li, 2002;Butler et al., 2001;Kim et al., 2006) and reading disabilities (Revheim et al., 2006) possibly caused by the disturbances in low-level vision (Revheim et al., 2006;Lovegrove et al., 1980a). Our results suggest a potential link between fine-scale eye movements and these visual deficits, which has not yet been investigated and which may inspire novel therapeutic approaches.

Data collection and analysis
To examine the influence of eye movements on visual sensitivity, neuronal models were exposed to reconstructions of the input signals typically experienced by observers in experiments of contrast sensitivity. To this end, we used oculomotor traces recorded in measurements of contrast sensitivity to move the stimuli presented as input to the models. Methods for the collection and analysis of eye movement data, as well as perceptual results, have already been described in previous publications and are only briefly summarized here (see Mostofi et al., 2016 and Boi et al., 2017). This section focuses on the methods that are novel to this study.

Subjects
Eye movements were recorded from five observers (all females, age range 21-31). To optimize the precision of the recordings, only subjects with normal, uncorrected vision took part in the study. Informed consent was obtained from all participants following the procedures approved by the Boston University Charles River Campus Institutional Review Board (protocol number 1062E).

Apparatus
Stimuli were displayed on a gamma-corrected fast-phosphor CRT monitor (Iyama HM204DT) in a dimly illuminated room. They were observed monocularly with the left eye patched, while movements of the right eye were recorded by means of a Dual Purkinje Image eyetracker (Fourward Technology) and sampled at 1 kHz. This system has a resolution, measured by means of an artificial eye, of approximately 1′ (Crane and Steele, 1985; Ko et al., 2016). A dental imprint bite bar and a head-rest prevented head movements. Stimuli were rendered by means of EyeRIS, a custom system that enables precise synchronization between oculomotor events and the refresh of the image on the monitor.

Stimuli and procedure
As in typical psychophysical CSF measurements, we used a standard grating-detection paradigm (see Mostofi et al., 2016 for the behavioral data). In a forced-choice procedure, observers detected 2D Gabor patterns oriented at ±45°. Their contrast varied across trials following PEST (Taylor and Creelman, 1967). The frequency and standard deviation of the Gabor were 10 cycles/deg and 2.25°, respectively. Stimuli were displayed over a uniform field with luminance of 21 cd/m². Oculomotor traces were segmented into complementary periods of drift and saccades based on a speed threshold of 2°/s (Mostofi et al., 2016). Only oculomotor traces that were collected around threshold levels of sensitivity and that contained no saccades, microsaccades or blinks were used in this study.
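The speed-threshold criterion used to separate drift from saccades can be illustrated with a minimal sketch. The function name `segment_drift`, the use of unsmoothed central differences, and the synthetic trace are assumptions made for illustration; the actual analysis (Mostofi et al., 2016) may include filtering and additional criteria that are not reproduced here.

```python
import numpy as np

def segment_drift(x_deg, y_deg, fs_hz=1000.0, speed_thresh=2.0):
    """Return a boolean mask that is True where eye speed is below threshold.

    x_deg, y_deg : eye position traces in degrees
    fs_hz        : sampling rate (1 kHz in the recordings described above)
    speed_thresh : speed threshold in deg/s (2 deg/s in the text)
    Hypothetical helper: no smoothing is applied, unlike a real pipeline.
    """
    vx = np.gradient(x_deg) * fs_hz   # horizontal velocity, deg/s
    vy = np.gradient(y_deg) * fs_hz   # vertical velocity, deg/s
    speed = np.hypot(vx, vy)          # instantaneous 2D speed
    return speed < speed_thresh

# Synthetic 1 s trace: slow drift at 0.5 deg/s with a 0.5 deg jump in 20 ms.
t = np.arange(1000) / 1000.0
x = 0.5 * t
x[500:520] += np.linspace(0.0, 0.5, 20)   # fast saccade-like displacement
x[520:] += 0.5
y = np.zeros_like(x)

is_drift = segment_drift(x, y)
```

In this sketch, samples inside the fast displacement are flagged as non-drift, so a trial containing any such sample would be excluded under the selection rule stated above.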
Modeled neurons were exposed to the same retinal input experienced by human participants, identically replicated at all spatial frequencies. Gratings were presented for 3.2 s. They were smoothly ramped up and down in contrast at the beginning and end of the trial by means of the modulating function $M(t)$ and also modulated in time at frequency $\omega_t$ ($\omega_t$ = 0, 1, 6, 16, or 22 Hz). The reconstructed retinal input was thus given by Equation 1, where $\xi(t) = [\xi_x(t), \xi_y(t)]$ represents eye movements and $\mathbf{f}_s = [f_s \cos(\alpha_s), f_s \sin(\alpha_s)]$ the stimulus frequency (0.1-60 cycles/deg). The orientation $\alpha_s$ and the phase $\phi_s$ uniformly spanned the range $[0, 2\pi)$.
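As an illustration of how such an input could be reconstructed from a recorded eye trace, the sketch below assumes a counterphase-modulated grating of the form I(x, t) = M(t) cos(2π ω_t t) cos(2π f_s · (x − ξ(t)) + φ_s). This functional form, the sign convention for the eye displacement, the raised-cosine shape of M(t), and the 0.5 s ramp time are all assumptions made for illustration and may differ from Equation 1; the function names are hypothetical.

```python
import numpy as np

def ramp(t, duration, ramp_time):
    """Assumed raised-cosine contrast envelope M(t), rising from 0 to 1 and back."""
    m = np.ones_like(t)
    up = t < ramp_time
    down = t > duration - ramp_time
    m[up] = 0.5 * (1 - np.cos(np.pi * t[up] / ramp_time))
    m[down] = 0.5 * (1 - np.cos(np.pi * (duration - t[down]) / ramp_time))
    return m

def retinal_input(x, t, xi, f_s, alpha_s, phi_s, omega_t,
                  duration=3.2, ramp_time=0.5):
    """Contrast signal at retinal location x (deg, shape (2,)) over time t (s).

    xi      : eye position trace in degrees, shape (len(t), 2)
    f_s     : grating spatial frequency (cycles/deg)
    alpha_s : grating orientation (rad); phi_s : grating phase (rad)
    omega_t : temporal modulation frequency (Hz)
    """
    f_vec = f_s * np.array([np.cos(alpha_s), np.sin(alpha_s)])
    # Eye motion shifts the grating on the retina (assumed sign convention).
    spatial_phase = 2 * np.pi * ((x[None, :] - xi) @ f_vec) + phi_s
    temporal = np.cos(2 * np.pi * omega_t * t)
    return ramp(t, duration, ramp_time) * temporal * np.cos(spatial_phase)

# 3.2 s trial sampled at 1 kHz with a motionless eye and a static grating.
t = np.arange(3200) / 1000.0
xi_still = np.zeros((t.size, 2))
signal = retinal_input(np.zeros(2), t, xi_still,
                       f_s=10.0, alpha_s=0.0, phi_s=0.0, omega_t=0.0)
```

With a motionless eye and ω_t = 0, the signal at a fixed retinal location reduces to the envelope M(t); any temporal modulation beyond that envelope arises only once a non-zero eye trace is supplied.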

Neural models
The mean instantaneous rates of retinal ganglion cells (RGCs) were simulated by means of standard space-time separable linear filters with transfer function:

$$RF(f, \omega) = K(f)\, H(\omega) \tag{2}$$

where $f$ and $\omega$ indicate spatial and temporal frequencies, respectively. The spatial kernel $K(f)$ was modeled as in Croner and Kaplan (1995) with a standard difference of Gaussians:

$$K(f) = C \left( K_c \pi r_c^2 e^{-(\pi r_c |\gamma f|)^2} - K_s \pi r_s^2 e^{-(\pi r_s |\gamma f|)^2} \right) \tag{3}$$

with parameters adjusted based on neurophysiological recordings from macaques (Table 1 in Croner and Kaplan, 1995). The scaling factor $\gamma$ was set to 0.5 to model the smaller receptive fields of the fovea following cortical magnification (Equation 8 in Van Essen et al., 1984). The temporal sensitivity function $H(\omega)$ consisted of a series of low-pass filters and a high-pass stage, as proposed by Victor (1987) (Equation 4). Parameters were taken from neurophysiological studies that fitted this model to recorded neurons (M cells: median values in Table 2 in Benardete and Kaplan, 1999; P cells: median values in Table 2 in Benardete and Kaplan, 1997a). The scaling factor was set to 1/1.6 to include the effects of large stimuli on retinal responses (Figure 7B in Alitto and Usrey, 2015).
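The separable filter described above can be sketched numerically. The spatial kernel follows the frequency-domain difference of Gaussians of Croner and Kaplan (1995). The temporal kernel uses a common parameterization of the Victor (1987) model (a delay, one high-pass stage, and a cascade of N_L identical low-pass stages); this specific form is an assumption and may not match Equation 4 exactly, and the additional 1/1.6 scaling factor is not modeled. All parameter values below are hypothetical placeholders, not the published medians.

```python
import numpy as np

def spatial_kernel(f, K_c, r_c, K_s, r_s, C=1.0, gamma=0.5):
    """Difference-of-Gaussians spatial kernel K(f); f in cycles/deg, radii in deg."""
    g = gamma * np.abs(f)
    center = K_c * np.pi * r_c**2 * np.exp(-(np.pi * r_c * g) ** 2)
    surround = K_s * np.pi * r_s**2 * np.exp(-(np.pi * r_s * g) ** 2)
    return C * (center - surround)

def temporal_kernel(w_hz, A, D, H_s, tau_s, tau_l, N_l):
    """Assumed Victor-style temporal kernel H(w); w in Hz, times in seconds."""
    s = 1j * 2 * np.pi * np.asarray(w_hz, dtype=float)
    high_pass = 1.0 - H_s / (1.0 + s * tau_s)      # single high-pass stage
    low_pass = (1.0 / (1.0 + s * tau_l)) ** N_l    # cascade of low-pass stages
    return A * np.exp(-s * D) * high_pass * low_pass

def receptive_field(f, w_hz, spatial_params, temporal_params):
    """Space-time separable transfer function RF(f, w) = K(f) H(w)."""
    K = spatial_kernel(np.atleast_1d(np.asarray(f, dtype=float)), **spatial_params)
    H = np.atleast_1d(temporal_kernel(w_hz, **temporal_params))
    return K[:, None] * H[None, :]

# Hypothetical parameter values, chosen only to make the example run.
sp = dict(K_c=100.0, r_c=0.05, K_s=1.0, r_s=0.3)
tp = dict(A=1.0, D=0.003, H_s=0.7, tau_s=0.03, tau_l=0.002, N_l=30)

f = np.array([0.0, 1.0, 10.0])    # spatial frequencies, cycles/deg
w = np.array([0.0, 5.0, 20.0])    # temporal frequencies, Hz
RF = receptive_field(f, w, sp, tp)
```

Because the filter is separable, changing the temporal parameters rescales every row of `RF` in the same way; the spatial tuning of the model neuron changes only through the spectrum of the input it receives.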

Estimating contrast sensitivity
The main hypothesis of our study is that the visual system is insensitive to temporal stimulation at 0 Hz so that spatial sensitivity is entirely driven by temporal transients. For this reason, we estimated the predicted CSF on the basis of cell responses to input changes.
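The estimation procedure detailed in the remainder of this section can be sketched as follows: multiply the input power spectrum by the squared gain of the filter, integrate over non-zero temporal frequencies, take the square root, and combine the M and P estimates. The function names, the array layout (spatial frequency along rows, non-negative temporal frequency along columns), and the trapezoidal integration rule are assumptions made for illustration.

```python
import numpy as np

def csf_from_power(P_I, RF, w_hz, n_skip=2):
    """Square root of response power integrated over non-zero temporal frequencies.

    P_I    : input power spectrum, shape (n_f, n_w)
    RF     : filter transfer function, shape (n_f, n_w)
    w_hz   : ascending temporal frequencies in Hz, starting at 0
    n_skip : number of lowest temporal samples discarded (2 in the text)
    """
    O = P_I * np.abs(RF) ** 2                  # response power spectrum
    O, w = O[:, n_skip:], w_hz[n_skip:]        # drop 0 Hz and the next sample
    # Trapezoidal integration along temporal frequency.
    power = np.sum(0.5 * (O[:, 1:] + O[:, :-1]) * np.diff(w), axis=1)
    return np.sqrt(power)

def combine(csf_m, csf_p, lam=0.57, A=1.0):
    """Weighted combination of M and P sensitivities with global scaling A."""
    return A * (lam * csf_m + (1 - lam) * csf_p)

# Toy check with flat spectra: integrating unit power from 2 to 10 Hz gives 8.
w = np.linspace(0.0, 10.0, 11)
flat = np.ones((3, w.size))
csf_flat = csf_from_power(flat, flat, w)
csf_mix = combine(np.array([1.0]), np.array([3.0]))

# Frequency axis of a 3.2 s trial without zero-padding (an assumption).
w_trial = np.arange(0, 50, 1 / 3.2)
```

Under the assumption of no zero-padding, a 3.2 s trial gives a frequency resolution of 0.3125 Hz, so discarding the first two samples starts the integral at `w_trial[2]` = 0.625 Hz, in line with the lower limit of about 0.63 Hz reported in this section.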
For each spatial frequency $f_s$ of the grating, we first estimated the space-time power spectrum of the retinal input $P_I(f, \omega)$ by averaging the squared magnitude of the Fourier transform of Equation 1 across trials, stimulus orientations $\alpha_s$, and phases $\phi_s$. Since both $P_I(f, \omega)$ and the spatial kernels $K(f)$ possess circular symmetry in spatial frequency, we reduced the spatial dimensionality from 2D to 1D by radial averaging. We then computed the power spectrum of neuronal responses $O_z(f, \omega)$ by multiplying the space-time power spectrum of the retinal input by the squared magnitude of the transfer functions of the cells' filters:

$$O_z(f, \omega) = P_I(f, \omega) \cdot |RF_z(f, \omega)|^2 \tag{5}$$

where $RF_z(f, \omega)$, with $z = M$ or $P$, represents the Fourier transform of M or P cells' receptive fields (Equation 2). Finally, we evaluated the CSF at each spatial frequency $f$ by computing the square root of the integrated temporal power across all non-zero temporal frequencies:

$$CSF_z(f) = \sqrt{\int_{\omega \neq 0} O_z(f, \omega)\, d\omega} \tag{6}$$

where $O_z$ represents the power spectrum of M or P responses. The integral in Equation 6 was computed numerically. To avoid artifacts from finite bandwidth, the first two temporal samples of the spectrum were discarded, so that the integral over temporal frequency started from $\omega = 0.63$ Hz. However, virtually identical results were obtained when we used lower thresholds or when we modeled the low-frequency range of temporal sensitivity as a power law (Figure 4-figure supplement 2). The predicted CSF was then estimated, for each condition, by a linear combination of the contrast sensitivities of the two types of neurons, $CSF_M(f)$ and $CSF_P(f)$:

$$CSF_{est}(f) = A \cdot [\lambda\, CSF_M(f) + (1 - \lambda) \cdot CSF_P(f)] \tag{7}$$

where $\lambda$ ($\lambda = 0.57$ for all conditions) weighs the contributions of the M and P populations and $A$ is a global rescaling coefficient.

[Table caption fragment] ... to model the temporal kernels of magno- (upper row) and parvo-cellular (bottom row) neurons. Data are from Benardete and Kaplan (1997a) and Benardete and Kaplan (1999).

Data availability
All data generated or analysed during this study are included in the manuscript and supporting files.