Spatiotemporal summation of perimetric stimuli in healthy observers

Spatial summation of perimetric stimuli has been used to derive conclusions about the spatial extent of retinal-cortical convergence, mostly from the size of the critical area of summation (Ricco's area, RA) and critical number of retinal ganglion cells (RGCs). However, spatial summation is known to change dynamically with stimulus duration. Conversely, temporal summation and critical duration also vary with stimulus size. This often neglected spatiotemporal interaction has important implications for modeling perimetric sensitivity in healthy observers and for formulating hypotheses for changes measured in disease. In this work, we performed experiments on visually healthy observers confirming the interaction of stimulus size and duration in determining summation responses in photopic conditions. We then propose a simplified computational model that captures these aspects of perimetric sensitivity by modelling the total retinal input, the combined effect of stimulus size, duration, and retinal cone-to-RGC ratio. We additionally show that, in the macula, the enlargement of RA with eccentricity might not correspond to a constant critical number of RGCs, as often reported, but to a constant critical total retinal input. We finally compare our results with previous literature and show possible implications for modeling disease, especially glaucoma.


Introduction
Measuring how contrast sensitivity varies according to different stimulus sizes and durations has proven invaluable for investigating the psychophysical and physiological basis of transient stimulus detection (Choi, Nivison-Smith, Khuu, & Kalloniatis, 2016;Khuu & Kalloniatis, 2015a, 2015b;Pan & Swanson, 2006;Phu, Khuu, Zangerl, & Kalloniatis, 2017) and how the underlying physiology is altered by disease (Kwon & Liu, 2019;Redmond, Garway-Heath, Zlatkova, & Anderson, 2010;Rountree et al., 2018;Yoshioka, Zangerl, & Phu, 2018). In fact, change in sensitivity with increasing stimulus size (spatial summation) and duration (temporal summation) has been shown to be altered following retinal ganglion cell (RGC) loss from glaucoma (Mulholland et al., 2015;Redmond, Garway-Heath, Zlatkova, & Anderson, 2010;Rountree et al., 2018;Yoshioka, Zangerl, & Phu, 2018). Both spatial and temporal summation are characterized by a biphasic response, with a steeper reciprocal relationship between stimulus area/duration and contrast at threshold for smaller/shorter stimuli (total summation) and a shallower change for larger/longer stimuli (partial summation). The response is often characterized in terms of the point of transition between these two phases (critical size/duration) (Mulholland et al., 2014). The physiological basis of spatial and temporal summation has been extensively studied. Although models solely based on RGCs exist (Glezer, 1965), spatial summation has been linked to cortical magnification and to the convergence of RGCs onto cells of the visual cortex (Kwon & Liu, 2019). This phenomenon is often referred to as cortical pooling, and it is the favored model for explaining spatial summation (Kwon & Liu, 2019;Pan & Swanson, 2006). Cortical pooling can be modeled through a linear combination of filter elements tuned to different spatial frequencies (Pan & Swanson, 2006).
One aspect that has been explored to a lesser extent is the interaction between stimulus size and duration and its effect on sensitivity (spatiotemporal summation). Models exist to describe temporal summation in isolation (Gorea & Tyler, 1986;Swanson, Pan, & Lee, 2008;Watson, 1979). Many of these authors acknowledge the effect of stimulus configuration (Gorea & Tyler, 1986;Watson, 1979) and adaptation state (Swanson, Ueno, Smith, & Pokorny, 1987) on critical duration. Direct experimental evidence of the interaction between size and duration for simple circular stimuli (Barlow, 1958;Mulholland et al., 2015;Owen, 1972) suggests a combined integration of the total input by the visual system. Some attempts have been made to describe such an interaction, mainly in the field of motion detection (Anderson & Burr, 1991;Fredericksen, Verstraten, & van de Grind, 1994), but this phenomenon has been little explored for perimetry (Owen, 1972). Another aspect that has been overlooked is the effect of retinal convergence. One common assumption is that spatial summation at different eccentricities can be exclusively explained by the change in density of RGCs (Kwon & Liu, 2019). However, similarly to cortical convergence, individual RGCs might carry a different weight in terms of retinal input at different eccentricities because they receive input from a different number of photoreceptors (larger in the periphery), with significant changes in the composition and density of their mosaic.
Understanding these aspects is essential for many clinical applications of psychophysics. White-on-white perimetry is one of the most performed tests in clinical practice to diagnose and monitor the progression of a variety of diseases. In its most common implementation, the test is a "yes/no" task in which an observer is asked to press a button every time a stimulus is perceived. The response needs to be provided within a set time window following stimulus onset, with no response indicating that the stimulus was not seen. The stimulus is projected on a bowl with a uniform white background and usually consists of a circular target with sharp edges and 0.43 degrees in diameter (size III according to Goldmann [1999]) and a duration between 100 and 200 ms. The intensity of the target is varied to estimate the 50% seen contrast threshold, using a variety of strategies. The target is presented at various locations around the fixation target, according to a set of predetermined testing grids, so that the 50% threshold can be estimated at each of these locations. This produces a sensitivity map that can be used to identify and monitor visual field defects. The objective of our work was to collect experimental data to build and validate a spatiotemporal summation model, able to capture the combined effect of retinal convergence, stimulus size, and stimulus duration for perimetric stimuli.

Participants
Ten visually healthy participants between 18 and 40 years of age were recruited on a voluntary basis at City, University of London, London, United Kingdom. All participants gave their written informed consent. The study was approved by the local ethics board (Optometry Proportionate Review Committee, approval number ETH2021-1728) and adhered to the tenets of the Declaration of Helsinki. All participants underwent an ophthalmic assessment by an ophthalmologist (GM), which included objective refraction and measurement of the intraocular pressure (IOP) with a noncontact tonometer and auto-refractor (TRK-1P; Topcon, Tokyo, Japan), best-corrected visual acuity (BCVA) with Snellen charts, slit-lamp assessment of the anterior segment, and indirect fundoscopy. Reasons for exclusion were any abnormality of the retina or of the optic nerve head (ONH), IOP > 21 mmHg, and a BCVA < 6/6 in the test eye. If both eyes were eligible, the one with the smallest refractive error was selected.

Testing apparatus
All experiments were carried out on an Octopus 900 bowl perimeter (Haag Streit AG, Koeniz, Switzerland) controlled through the Open Perimetry Interface (OPI) (Turpin, Artes, & McKendrick, 2012). The bowl is 30 cm in radius. The perimeter is equipped with a chinrest and an infrared camera to monitor eye position and pupil size. Chinrest position was adjusted by the operator as required, to maintain good centration of the pupil. A central target (four small dots in a diamond arrangement) encouraged fixation and avoided interference with centrally presented stimuli. A near-vision lens addition of approximately +2.50 D was used to reduce strain from accommodation, refined with subjective assessment of optimal visibility by the subject. Lenses were placed on an adjustable lens holder built into the instrument. The background illumination was 10 cd/m². Calibration was performed in a dark room before every experiment through an automated procedure implemented by the manufacturer. As is conventional in perimetry, the intensity of the stimulus in dB is expressed as attenuation of the maximum possible stimulus intensity (3,185 cd/m²), so that higher contrast equates to lower dB values. This quantity can be converted to Weber contrast (Wc) using Equation 1. However, for simplicity in our calculations, we report the values as differential light sensitivity (DLS), which is simply the sensitivity value in dB/10.
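The conversion from dB to Weber contrast can be sketched as follows. Equation 1 is not reproduced in this text, so the sketch assumes the standard definition of Weber contrast (stimulus increment divided by background luminance) and assumes that the dB scale attenuates the maximum increment of 3,185 cd/m²; the function names are illustrative only.

```python
MAX_STIMULUS_CD_M2 = 3185.0   # maximum stimulus intensity stated in the text
BACKGROUND_CD_M2 = 10.0       # background luminance stated in the text


def db_to_weber_contrast(db: float) -> float:
    """Convert perimetric attenuation (dB) to Weber contrast.

    Assumption: delta_L = L_max * 10 ** (-dB / 10) and
    W_c = delta_L / L_background.
    """
    delta_l = MAX_STIMULUS_CD_M2 * 10.0 ** (-db / 10.0)
    return delta_l / BACKGROUND_CD_M2


def db_to_dls(db: float) -> float:
    """Differential light sensitivity as used in the text: dB / 10."""
    return db / 10.0
```

Under these assumptions, a 0 dB stimulus corresponds to a Weber contrast of 318.5, and every additional 10 dB of attenuation divides the contrast by 10.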

Spatiotemporal summation
In the first experiment, we estimated contrast sensitivity at 12 locations in the central visual field (VF) with different stimulus sizes and durations for one test eye of all 10 participants. The locations' coordinates {X; Y} in visual degrees from fixation were {±7; ±7}; {±4, ±4}; {±1, ±1}. Stimuli were round achromatic targets with five different diameters (Goldmann sizes, G): 0.10 (G-I), 0.21 (G-II), 0.43 (G-III), 0.86 (G-IV), and 1.72 (G-V) deg. All locations were tested with all stimulus sizes. The locations at {±7; ±7} were additionally tested with five different stimulus durations (for all stimulus sizes): 15 ms, 30 ms, 55 ms, 105 ms, and 200 ms. Four combinations (G-I/15 ms, G-I/200 ms, G-V/15 ms, and G-V/200 ms) were tested twice so that more robust estimates of their threshold were available for the measurement of the frequency of seeing (FOS) curves (see next section).
The threshold was determined with a yes/no task. The observer was asked to press a button every time a stimulus was perceived. We assumed that no response within a predetermined time window (1,500 ms) corresponded to "not seen." The threshold was estimated through a Bayesian strategy, the Zippy Estimation through Sequential Testing (ZEST) (King-Smith et al., 1994), as implemented on the OPI. For our test, the strategy was set to have a uniform prior distribution between 0 and 50 dB (the range of the instrument). The likelihood function was a Gaussian cumulative distribution function (CDF) with a standard deviation (SD) of 1 dB and a guess/lapse rate of 3%. The prior distribution was updated at each response to generate a posterior distribution. The posterior distribution was used as the prior distribution for the next step in the strategy. The stimulus was chosen as the mean of the prior distribution at each step, rounded to the closest integer dB value. This has been shown to provide unbiased estimates of the 50% detection threshold for a yes/no task (King-Smith et al., 1994). The determination of each threshold terminated when the posterior distribution reached a standard deviation < 1.5 dB (dynamic termination criterion).
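The procedure described above can be sketched for a single thread as follows. This is a minimal illustration, not the OPI implementation: the 1 dB grid of candidate thresholds, the `max_trials` safeguard, and all function names are assumptions, and the guess and lapse rates are both set to the stated 3%.

```python
from statistics import NormalDist

DOMAIN = list(range(0, 51))          # candidate thresholds, 0 to 50 dB (assumed 1 dB grid)
SLOPE_SD, GUESS, LAPSE = 1.0, 0.03, 0.03


def p_seen(stimulus_db, threshold_db):
    """Probability of 'seen'; dimmer stimuli (higher dB) are seen less often."""
    core = 1.0 - NormalDist(threshold_db, SLOPE_SD).cdf(stimulus_db)
    return GUESS + (1.0 - GUESS - LAPSE) * core


def mean_sd(pdf):
    """Mean and SD of a discrete distribution over DOMAIN."""
    mean = sum(p * t for p, t in zip(pdf, DOMAIN))
    var = sum(p * (t - mean) ** 2 for p, t in zip(pdf, DOMAIN))
    return mean, var ** 0.5


def zest(respond, stop_sd=1.5, max_trials=200):
    """Run one ZEST thread; `respond(stimulus_db)` returns True if seen."""
    pdf = [1.0 / len(DOMAIN)] * len(DOMAIN)       # uniform prior
    trials = 0
    mean, sd = mean_sd(pdf)
    while sd >= stop_sd and trials < max_trials:
        stimulus = round(mean)                    # mean of the current prior
        seen = respond(stimulus)
        like = [p_seen(stimulus, t) if seen else 1.0 - p_seen(stimulus, t)
                for t in DOMAIN]
        post = [p * l for p, l in zip(pdf, like)]
        total = sum(post)
        pdf = [p / total for p in post]           # posterior becomes the next prior
        mean, sd = mean_sd(pdf)
        trials += 1
    return mean, trials


# Usage with a hypothetical noise-free observer who sees anything up to 27 dB.
estimate, n_trials = zest(lambda stimulus_db: stimulus_db <= 27)
```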
Each combination of stimulus size and duration at each location was treated as a separate independent "thread" by the strategy (140 in total). The threads were randomly subdivided into four blocks, to allow for breaks within the test. Each block of testing lasted for approximately 15 min (∼350 presentations). Individual presentations within each block were fully randomized. A block was completed when all the 35 threads assigned to it reached the termination criterion. A pause between individual presentations was also introduced, calculated as (1,000 ms minus the response time, with a minimum of 200 ms) plus an additional pause, randomly sampled from a uniform distribution between 0 and 100 ms. All responses occurring within the pause or less than 180 ms after stimulus onset (Olsson, Bengtsson, Heijl, & Rootzen, 1997) were considered false responses and discarded.

Frequency of seeing curves
For a subset of five participants, FOS curves were determined for four stimulus combinations (G-I/15 ms, G-I/200 ms, G-V/15 ms, and G-V/200 ms) at coordinates {±7; ±7} degrees (four locations) using a method of constant stimuli (MOCS) procedure. Following others (Rountree et al., 2018), we used a two-stage approach. First, we obtained a coarse estimate of the FOS curve through a multidimensional Bayesian strategy, QUEST+ (Watson, 2017). Such a strategy is similar in principle to ZEST but uses entropy to determine the next presentation and allows for multiple parameters to be estimated. In our procedure, the FOS curve was parameterized as the CDF of a Gaussian distribution, with a fixed guess/lapse rate of 3%. The mean and SD (which model the 50% threshold and the slope of the FOS curve, respectively) were simultaneously fitted as free parameters. The test was terminated when the entropy of the combined posterior distribution was ≤ 4.5. For the purpose of this preliminary step, the four spatial locations were considered interchangeable. Therefore, only four FOS curves were determined, one for each stimulus combination. The prior distribution for the mean was itself a Gaussian distribution with a SD of 4 dB, centered on the average of the sensitivity estimates obtained from the ZEST procedure for the tested locations (eight estimates for each stimulus combination, i.e., four locations each tested twice) and limited over a domain of ±5 dB around its mean. The prior distribution for the SD of the FOS curve was a uniform between 1 and 10 dB, with steps of 0.5 dB.
The estimated SD for the Gaussian FOS curves was used to determine the contrast levels to be tested for each stimulus combination in the actual MOCS. We tested seven steps for each location and each condition. The steps were placed at the following quantiles of the Gaussian FOS (neglecting lapse/guess rate): {0.0001, 0.1, 0.3, 0.5, 0.7, 0.9, 0.9999}. We, however, ensured that all the steps were at least 1 dB apart (the minimum interval allowed by the device) and that the two most extreme contrast levels were at least 10 dB above and below the estimated 50% threshold. The 50% threshold was calculated as the average of the two test results obtained from the ZEST strategy for each location. Each contrast level was presented 25 times, and each spatial location was tested fully and independently, for a total of 2,800 presentations. A break of at least 10 min was introduced every 350 presentations, and the whole test was split into two sessions performed on two separate days. The individual presentations were fully randomized across test locations, stimulus area/duration combinations, and contrast levels. Pauses between presentations and false responses were determined as described above for the main experiment.
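The placement of the MOCS levels can be sketched as below. Rounding to integer dB and the order in which the two constraints are applied (extremes first, then spacing outwards from the central level) are assumptions; the text states only the quantiles, the 1 dB minimum spacing, and the 10 dB minimum distance of the extremes from the threshold.

```python
from statistics import NormalDist

QUANTILES = (0.0001, 0.1, 0.3, 0.5, 0.7, 0.9, 0.9999)


def mocs_levels(threshold_db, fos_sd_db, min_step=1, min_extreme=10):
    """Integer dB levels for a method-of-constant-stimuli run (illustrative)."""
    fos = NormalDist(threshold_db, fos_sd_db)
    levels = [round(fos.inv_cdf(q)) for q in QUANTILES]
    mid = len(levels) // 2
    # Push the two extremes at least `min_extreme` dB away from the threshold.
    levels[0] = min(levels[0], round(threshold_db - min_extreme))
    levels[-1] = max(levels[-1], round(threshold_db + min_extreme))
    # Enforce the minimum spacing, working outwards from the central level.
    for i in range(mid + 1, len(levels)):
        levels[i] = max(levels[i], levels[i - 1] + min_step)
    for i in range(mid - 1, -1, -1):
        levels[i] = min(levels[i], levels[i + 1] - min_step)
    return levels


# Presentation count stated in the text: 7 levels x 25 repeats
# x 4 locations x 4 stimulus combinations.
total_presentations = len(QUANTILES) * 25 * 4 * 4
```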
MOCS data were fitted using a Bayesian hierarchical model, similarly to Prins (2019). The results of the test performed on each subject were fitted independently. The psychometric function was modeled with the CDF of a Gaussian function (Φ), where the mean (μ), SD (σ), lapse rate (λ), and guess rate (γ) were free parameters (see Equation 2). Mean (μ) and σ were hierarchical parameters that varied for each of the four tested locations. Information, however, was propagated across different locations to improve the robustness of the fit of the parameters for each testing condition. Lapses and guesses were instead modeled as global parameters for the whole test. Details of the implementation of the Bayesian model are reported in the Appendix.

Imaging
Retinal imaging was performed using a Spectralis spectral domain optical coherence tomography (SD-OCT; Heidelberg Engineering, Heidelberg, Germany) scanner. Dense macular volume scans spanning the central 25 × 30 visual degrees (121 vertical B-scans, 9 averaged scans) were segmented and exported as RAW files using the Heidelberg Eye Explorer (HEYEX; Heidelberg Engineering). Retinal ganglion cell layer (RGCL) thickness maps were built from segmentation data and converted to customized estimates of local RGC counts by combining thickness data with histology data, using previously published methodology (Montesano et al., 2020;Raza & Hood, 2015). Local customized RGC density was calculated for each location tested in the psychophysical procedure by accounting for RGC displacement (Drasdo, Millican, Katholi, & Curcio, 2007;Montesano et al., 2020), using methodology detailed elsewhere (Montesano et al., 2020).

Modeling of perimetric sensitivity
One of the objectives of this study was to provide a model that was simple but sufficient to describe the change in sensitivity observed with different combinations of sizes and durations for perimetric stimuli. Our working hypothesis, derived from previous work (Barlow, 1958;Baumgardt, 1959;Mulholland et al., 2015;Owen, 1972), was that the combined effect of these two parameters, at any given location, could be described by taking the product of stimulus area and stimulus duration. We called this product the spatiotemporal input. We integrated the spatiotemporal input into a computational model of the response of RGC mosaics, partially based on the work by Pan and Swanson (2006) and Bradley et al. (Bradley, Abrams, & Geisler, 2014). The key novel aspect of our modeling was that the linear response from the RGC mosaic was pooled and integrated over time so that changes in duration and size of the stimulus would both simultaneously affect the temporal and spatial response of the system. We further modeled the retina as a two-stage mosaic, where the response from individual photoreceptors active in photopic adaptation conditions (cones) was integrated by the RGC mosaic to explore the effect of retinal convergence in the central visual field. The density of the two mosaics was varied to reproduce the effect of eccentricity. We refer to the combined effect of the spatiotemporal input and changes in retinal structure (i.e., density of the photoreceptor and RGC mosaics) as total retinal input. The model was implemented in MATLAB (The MathWorks, Natick, MA, USA) and is described in detail below.
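As a small illustration of the spatiotemporal input defined above, the sketch below computes the product of area and duration for the Goldmann sizes and durations used in the experiments. The units (deg² × ms) and the names are assumptions made for the example.

```python
import math

GOLDMANN_DIAMETERS_DEG = {"G-I": 0.10, "G-II": 0.21, "G-III": 0.43,
                          "G-IV": 0.86, "G-V": 1.72}
DURATIONS_MS = (15, 30, 55, 105, 200)


def spatiotemporal_input(diameter_deg, duration_ms):
    """Stimulus area (deg^2) multiplied by stimulus duration (ms)."""
    area = math.pi * (diameter_deg / 2.0) ** 2
    return area * duration_ms


# log10 of the spatiotemporal input for every size/duration combination.
log_inputs = {(size, d): math.log10(spatiotemporal_input(diam, d))
              for size, diam in GOLDMANN_DIAMETERS_DEG.items()
              for d in DURATIONS_MS}
```

Under the working hypothesis, two stimuli with the same product would be expected to yield similar sensitivity at a given location; for example, doubling the diameter quadruples the area, which is offset by a fourfold shorter duration.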

Hexagonal mosaics
Following Swanson et al., we modeled multiple detectors organized in a regular hexagonal lattice. This organization is reflective of many naturally occurring cell mosaics as it represents the most efficient packing scheme for objects with circular/spherical geometries (Legras, Gaudric, & Woog, 2018). For our purposes, we simplified the retina as being composed of two stacked mosaics, the photoreceptor mosaic and the RGC mosaic. Being interested in the results of experiments performed in photopic conditions (background illumination = 10 cd/m²), we modeled only the cone mosaic. In this retinal model, individual RGCs pool the response from the photoreceptors according to their receptive fields (RFs). To improve the efficiency of computation, each hexagonal lattice was rearranged in a regular lattice with anisotropic spacing (see Figure A.1). This simplifies the pooling operation, which can be computed via simple convolution of the regularized lattice with the RGC-RF filter (see next section), also rearranged accordingly on the same regular lattice. The response of the photoreceptor mosaic was simply computed by multiplying the mosaic by the stimulus. In its simplest form, this is equivalent to assigning a value of 1 to all the photoreceptors that fall within the stimulus area, leaving the others at 0. However, in its final implementation, this was modified to include the effect of optical blur (see later). Only the Parasol OFF RGC mosaic was used for the calculations (P-OFF-RGC), assuming that the ON and OFF mosaics operate on parallel redundant channels for the detection of simple round stimuli. Parasol cells were chosen because there is experimental evidence that these cells preferentially mediate sensitivity to briefly flashed stimuli, such as those used in perimetry. The calculations were repeated with the midget OFF RGC mosaic (mOFF-RGC) and reported as supplementary material for comparison with some previous literature (Kwon & Liu, 2019).
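The simplest form of the photoreceptor response (1 inside the stimulus, 0 outside) can be sketched on a hexagonal lattice as below. This is an illustration in Python rather than the MATLAB implementation used in the study; it does not include the rearrangement onto a regular lattice or the optical blur, and the density value in the usage example is hypothetical.

```python
import math


def hex_spacing(density):
    """Centre-to-centre spacing (deg) of a hexagonal lattice with the given
    density (cells/deg^2); each cell occupies an area of sqrt(3)/2 * s^2."""
    return math.sqrt(2.0 / (math.sqrt(3.0) * density))


def hex_lattice(density, half_width):
    """Cell centres of a hexagonal lattice covering a square patch."""
    s = hex_spacing(density)
    row_height = s * math.sqrt(3.0) / 2.0
    n_rows = int(half_width / row_height) + 1
    n_cols = int(half_width / s) + 1
    cells = []
    for r in range(-n_rows, n_rows + 1):
        offset = 0.5 * s if r % 2 else 0.0      # odd rows are shifted by s/2
        for c in range(-n_cols, n_cols + 1):
            cells.append((c * s + offset, r * row_height))
    return cells


def photoreceptor_response(cells, stimulus_diameter):
    """1 for cells inside a sharp-edged circular stimulus at the origin, else 0."""
    radius = stimulus_diameter / 2.0
    return [1 if math.hypot(x, y) <= radius else 0 for x, y in cells]


# Usage: hypothetical density of 1,000 cells/deg^2 and a G-V stimulus (1.72 deg).
cells = hex_lattice(1000.0, 1.0)
n_stimulated = sum(photoreceptor_response(cells, 1.72))
```

The number of stimulated cells is close to the product of density and stimulus area, which is the quantity the text converts into the number of receptive fields underlying the stimulus.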

RGC receptive field
The spatial filters for the RGC-RF were modeled with a Difference of Gaussians (DoG; Figure 1A), using the median parameters estimated by Croner and Kaplan (1995) from electrophysiology on the macaque retina. In their work, they showed that, although the scaling factors for the relative width and height of the inhibitory and excitatory Gaussian components of the filter changed with eccentricity, their ratios remained approximately constant. In this model, the surround inhibitory component has peak sensitivity Ks = 0.01 × Kc, where Kc is the peak sensitivity of the excitatory center. The SD of the surround was 6.7 times larger than the SD for the center (average reported by Croner & Kaplan, 1995). The SD for the center was scaled so that the radius of the center component was equal to the intercell spacing of the mosaic (defined by its density). The radius was defined by Croner and Kaplan as the distance from the center at which the excitatory Gaussian component has value Kc/e. The corresponding SD was approximated as SD = cell spacing/1.414. Note that, while the center-surround proportions are based on Croner and Kaplan (1995), the actual extent of the RGC-RFs in our model depends only on the intercell spacing of the RGC mosaic.
Figure 1 (partial caption). The bottom-right panel shows a comparison between the predicted (unscaled) sensitivity for the numerical calculations from the mosaic with discrete changes in convergence (dots) and continuous factor scaling (line).
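The receptive-field profile described above can be written compactly as follows. The sketch uses only the quantities stated in the text (surround peak of 0.01 times the center peak, surround SD 6.7 times the center SD, center SD equal to the cell spacing divided by √2); the function name and the choice of a unit center peak are illustrative.

```python
import math

SURROUND_TO_CENTER_PEAK = 0.01   # K_s = 0.01 * K_c
SURROUND_TO_CENTER_SD = 6.7      # surround SD relative to centre SD


def dog_weight(distance, cell_spacing, k_center=1.0):
    """DoG receptive-field weight at `distance` from the RGC centre."""
    # Centre radius (distance at which the centre falls to K_c / e) equals
    # the cell spacing, so SD = spacing / sqrt(2).
    sd_center = cell_spacing / math.sqrt(2.0)
    sd_surround = SURROUND_TO_CENTER_SD * sd_center
    center = k_center * math.exp(-distance ** 2 / (2.0 * sd_center ** 2))
    surround = (SURROUND_TO_CENTER_PEAK * k_center
                * math.exp(-distance ** 2 / (2.0 * sd_surround ** 2)))
    return center - surround
```

With this parameterization the weight is positive near the center and becomes slightly negative a few cell spacings away, where only the inhibitory surround contributes.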

Cone-RGC convergence
The number of cones that converge onto an RGC is known to increase with eccentricity (Drasdo, Millican, Katholi, & Curcio, 2007;Sjostrand, Olsson, Popovic, & Conradi, 1999). In our model, this corresponds to an increasing number of photoreceptors pooled by the RGC-RF per unit area. This can be achieved by increasing the density of the cone photoreceptor mosaic, also provided by Curcio et al. (Curcio, Sloan, Kalina, & Hendrickson, 1990). The convergence rate can be calculated by taking the ratio of the density of cones over the density of P-OFF-RGCs (Figure 1B). Because of how the hexagonal matrix has been rearranged for calculations (Figure 1), the intercell spacing for the RGC mosaic needs to be an exact multiple of that of the cone mosaic. This limits the possible cone/RGC ratios that can be calculated. However, changing the convergence ratio is equivalent to simply multiplying the response of the RGC obtained with a 1:1 convergence ratio by a scaling factor. This is demonstrated by the graph in Figure 1B. This method was therefore chosen to account for the change in convergence across the VF in a smooth fashion.

Modeling of optical factors
The effect of natural optics was modeled using the formula for the average modulation transfer function (MTF) of the human eye proposed by Watson (2013). In this formula, the square root of the diffraction-limited (DL) MTF, which depends only on the pupil size, is multiplied by a Lorentzian function whose parameters are fitted so that the product would approximate the average human MTF. A multiplicative correction factor, which depends on age and eye pigmentation, is then additionally applied to the MTF to account for light scattering. Figure 2 reports examples of the effect of optical blur on different stimulus sizes for different pupil apertures using the MTF (without accounting for scattering) (Watson, 2013). The calculations are performed by multiplying the two-dimensional Fourier transform of the stimulus by the MTF and then back-transforming in the spatial domain. The blurred stimulus can then be sampled with the photoreceptor mosaic. For each subject, we used the average pupil size recorded by the Octopus perimeter during the test to model the results of our experiments.

Proposed spatiotemporal model
One desired property of our proposed model was that the size and duration of the stimulus interacted so that longer stimuli would decrease Ricco's area (upper limit of complete spatial summation) and larger stimuli would shorten the critical duration (upper limit of complete temporal summation). One solution to achieve this is to use a pooling operation that integrates the spatial input over time. The integration, however, must take into account not only the duration of the stimulus but also the number of RGCs stimulated (i.e., the amount of spatial input). In other words, the temporal integration is to be performed by a cortical pooler on the total spatial input rather than by individual detectors prior to pooling. The simplest model, with the smallest number of parameters, is a capacitor (Equation 3), which is convolved with the temporal profile of the stimulus and then integrated over time according to Equation 4 to obtain the response (in the equations, the symbol "*" indicates convolution; "×" indicates multiplication). In these equations, τ is the integration constant, k is the summation exponent (4 in this study) (Meese & Williams, 2000;Pan & Swanson, 2006;Quick, 1974;Robson & Graham, 1981;Tyler & Chen, 2000), and S is the total spatial input, defined as the sum of the responses Ri of the individual ganglion cells to the stimulus (Equation 5). Note that the contribution of individual RGCs (Ri) can change because of the location of the RGC with respect to the stimulus (edge as opposed to center) and the effect of retinal convergence (RGCs in the periphery will have a bigger contribution when fully stimulated because of their larger pooling from the photoreceptors). The temporal profile of the stimulus is represented by f(t), which is a step function with value 1 when the stimulus is on and 0 otherwise. As previously mentioned, the combined effect of stimulus size, stimulus duration, RGC density, and retinal convergence defines the total retinal input.
Much like other temporal filters, this operation can also be implemented through temporal convolution. Note that such an approach to spatiotemporal summation is very similar to what was described in Fredericksen et al. (Fredericksen, Verstraten, & van de Grind, 1994) and Anderson and Burr (1991) for motion detection. Since only the P-OFF-RGC mosaic was considered for our calculations, the RGCs that were assigned a negative input were considered inhibited by the stimulus. Their negative contribution to the sum can be interpreted as an inhibition of their background activity. Such a simple approach would not account for other filter choices with a strong biphasic response, where a simple summation would always result in a zero net sum. From the examples in Figure 3, we can see that this pooler has the desired properties when the response is computed for different stimulus sizes and durations (i.e., a shorter duration determines a larger critical area and vice versa). One additional convenient property of this pooler is that the critical size and duration depend on the integration constant τ. The integration constant τ is therefore the scaling factor of the pooler and can be used to test the hypothesis of constant input integration across the VF. If the hypothesis of constant integration response for the same amount of total retinal input is correct, we do not expect important changes in the integration constant across different testing conditions and eccentricities. An alternative approach would be to model individual RGCs (or higher-order visual detectors) as separate spatiotemporal integrators and to pool their response by vector summation (Pan & Swanson, 2006;Quick, 1974). Such an approach has the advantage of allowing the modeling of the response from specific classes of RGCs and produces sensible spatial and temporal summation responses. However, it fails to reproduce the interaction between spatial and temporal input that would be expected.
For example, Ricco's areas in spatial summation curves would be unaffected by changes in stimulus duration. This is in contrast with evidence from the literature (Barlow, 1958;Baumgardt, 1959;Mulholland et al., 2015;Owen, 1972). It is worth noting that the current model could be extended to include the temporal response of individual classes of RGCs prior to pooling. However, this would increase the number of tunable parameters and would be beyond the objectives of the current study and what could be determined with our experiments.
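As a generic illustration of a capacitor-like temporal filter, the sketch below drives a first-order low-pass (RC) filter with a step stimulus and pools the output over time with a Minkowski exponent k. It is not a reproduction of Equations 3 to 5, which are not shown in this text: the kernel form, the way the spatial input scales the drive, the time step, and all names are assumptions.

```python
import math


def pooled_response(spatial_input, duration_ms, tau_ms, k=4.0,
                    dt_ms=0.5, tail_taus=10.0):
    """Minkowski-pooled output of an RC filter driven by a step stimulus."""
    decay = math.exp(-dt_ms / tau_ms)
    # Simulate the stimulus plus a tail long enough for the output to decay.
    n_steps = int((duration_ms + tail_taus * tau_ms) / dt_ms)
    state, total = 0.0, 0.0
    for i in range(n_steps):
        drive = spatial_input if i * dt_ms < duration_ms else 0.0
        # Exact discrete update of dr/dt = (drive - r) / tau for constant drive.
        state = drive + (state - drive) * decay
        total += abs(state) ** k * dt_ms
    return total ** (1.0 / k)


def loglog_slope(t1_ms, t2_ms, tau_ms=50.0):
    """Slope of log(response) against log(duration) between two durations."""
    r1 = pooled_response(1.0, t1_ms, tau_ms)
    r2 = pooled_response(1.0, t2_ms, tau_ms)
    return math.log(r2 / r1) / math.log(t2_ms / t1_ms)
```

With a hypothetical τ of 50 ms, the response grows almost proportionally with duration for stimuli much shorter than τ and with a slope close to 1/k for much longer stimuli, which is the biphasic temporal behaviour that a filter of this kind produces on its own.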

Fitting procedure
The model described by Equation 4 was fitted to the data using an iterative algorithm (Nelder-Mead Simplex Method, fminsearch function in MATLAB; Lagarias, Reeds, Wright, & Wright, 1998) to minimize the root mean squared error (RMSE). The summation exponent was set to k = 4 (Meese & Williams, 2000;Pan & Swanson, 2006;Quick, 1974;Robson & Graham, 1981;Tyler & Chen, 2000), and the RGC mosaic density was varied according to the eccentricity following the model by Drasdo et al. (Drasdo, Millican, Katholi, & Curcio, 2007;Montesano et al., 2020). These estimates were corrected with individual imaging data obtained from the OCT scans, as previously reported (Montesano et al., 2020;Raza & Hood, 2015). The model was fitted by tuning the parameter τ , which represents the integration constant of the spatiotemporal input. An additional parameter (additive in log-scale) allowed translation along the vertical axis (log-DLS, Offset term).

Calculation of critical size
The transition from total to partial summation is smooth for the curves generated by our model. The response curve is fully characterized by the integration constant τ and the amount of retinal input. The calculation of the critical (Ricco's) area is therefore dependent on an arbitrary threshold and is only performed for comparison with previous literature. For our calculations, the transition point was the retinal input at which the slope of the summation curve is 0.5 (Piper's law). Note that the retinal input scales perfectly with stimulus size for our chosen implementation of the model, but nonlinearities are introduced if taking the sum of the moduli (absolute values) in Equation 5. For consistency with our supplementary analyses (see later), the conversion between stimulus area and retinal input for each mosaic was calculated numerically and locally approximated with a linear function in log10-log10 scale. The parameters for the curves were fitted accounting for the optical blur (based on each participant's average pupil size and iris pigmentation). Densely sampled curves were numerically calculated using these parameters to estimate Ricco's area. These curves were calculated without the effect of optical blur. This simulates removing the estimated effect of optics on the size of Ricco's area. Note that accounting for convergence in the fitting process will not change Ricco's area, as parameters are optimized to fit the same data.
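Locating the point at which a densely sampled summation curve reaches a given slope can be sketched as below. The curve is expressed here as log threshold against log input, so the Piper's law criterion corresponds to a slope of −0.5; the finite-difference approach, the toy curve in the usage example, and the function name are assumptions for illustration and are not taken from the study.

```python
import math


def critical_input(log_inputs, log_thresholds, target_slope=-0.5):
    """Input at which the local log-log slope of a sampled curve crosses
    `target_slope`, found by finite differences and linear interpolation."""
    mids, slopes = [], []
    for i in range(len(log_inputs) - 1):
        dx = log_inputs[i + 1] - log_inputs[i]
        slopes.append((log_thresholds[i + 1] - log_thresholds[i]) / dx)
        mids.append(0.5 * (log_inputs[i] + log_inputs[i + 1]))
    for i in range(len(slopes) - 1):
        lo, hi = slopes[i], slopes[i + 1]
        if (lo - target_slope) * (hi - target_slope) <= 0 and lo != hi:
            frac = (target_slope - lo) / (hi - lo)
            return 10.0 ** (mids[i] + frac * (mids[i + 1] - mids[i]))
    return None  # the slope never crosses the target within the sampled range


# Usage on a toy curve, threshold = 1/A + 1, whose slope is -1/(1 + A).
xs = [i / 100.0 - 3.0 for i in range(601)]
ys = [math.log10(10.0 ** (-x) + 1.0) for x in xs]
toy_critical = critical_input(xs, ys)
```

For the toy curve the slope equals −0.5 at an input of 1, which the routine recovers to within the sampling resolution.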

Statistical analysis
Statistical comparisons were performed using linear mixed models to account for correlations between observations from the same subject. When data from multiple locations were used, individual locations were used as a nested random factor within the subject. When multiple comparisons were performed, the p values were corrected using a Bonferroni-Holm correction. All calculations were performed in R (R Foundation for Statistical Computing, Vienna, Austria) using the lme4 package (Bates, Mächler, Bolker, & Walker, 2015). All comparisons were performed on log10-transformed values of Ricco's area, integration constant, and number of P-OFF-RGCs, unless otherwise specified. Eccentricity was treated as a discrete factor.
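For reference, the Bonferroni-Holm step-down adjustment can be sketched as follows. The analyses in the study were run in R; this Python function is only an illustration of the standard procedure, and its name and the example p values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm (Bonferroni-Holm) step-down adjusted p values, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)   # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Usage with three hypothetical p values.
example = holm_adjust([0.01, 0.04, 0.03])
```

In this example the adjusted values are approximately 0.03, 0.06, and 0.06: the largest raw p value inherits the adjusted value of the one ranked before it, because adjusted values are not allowed to decrease with rank.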

Average response
In this section, we show plots of the average DLS for different experimental conditions to give an intuitive representation of the phenomena under investigation. Characteristics of each eye in the sample are reported in Table 1. Figure 4A reports the average DLS for the spatial summation experiment at different eccentricities. As expected, the summation curves are separated by a horizontal shift, owing to the effect of the changes in the retinal mosaic. Interestingly, simply transforming the stimulus area into the corresponding estimated number of RGC-RFs underlying the stimulus did not fully account for the effect of eccentricity. Most of the effect was instead removed by considering the product of stimulus area, RGC-RF density, and cone/RGC convergence ratio. We evaluated this by comparing the results of a simple second-degree polynomial fit of the DLS using either the log10(stimulus area), the raw log10(number of RGCs), or the convergence weighted log10(number of RGCs) as predictors in a mixed-effect model. The unexplained residual variance (including random effects) was 1.93 dB² for the log10(stimulus area), 1.79 dB² for the unweighted log10(number of RGCs) (7.2% reduction), and 1.77 dB² for the convergence weighted log10(number of RGCs) (8.1% reduction).
Table 1. Characteristics of each eye in the sample. Note: All subjects had their sensitivity tested with the ZEST strategy for all the duration and size combinations for all tested locations. Psychometric functions were estimated for subjects from 1 to 5 using the method of constant stimuli. D = Diopter; logMAR = log-minimum angle of resolution; GCL = macular ganglion cell layer; RNFL = peripapillary retinal nerve fiber layer. Average macular and GCL thickness were measured for the area corresponding to the central 10 degrees.
Figure 5 reports the average DLS for the different testing conditions at locations {±7; ±7} and shows how both spatial and temporal summation curves are affected by changes in stimulus duration and size, respectively. However, the values seem to follow a common trend when plotted according to the spatiotemporal input (i.e., the product of stimulus area and duration). We evaluated this by comparing the results of a simple second-degree polynomial fit of the DLS using either the log10(stimulus area) or the log10(spatiotemporal input) as predictors in a mixed-effect model. The unexplained residual variance (including random effects) was 11.4 dB² for the log10(stimulus area) and 3.7 dB² for the log10(spatiotemporal input), a 67.5% reduction.
Taken together, these results and plots support the hypothesis that the main determinant of DLS is the total retinal input to higher visual centers, influenced by the number of stimulated RGCs, retinal convergence, and duration of the stimulus.

Spatial summation: effect of eccentricity
The parameters of the model were fitted independently for each location using the data collected with different stimulus sizes and 200-ms stimulus duration (the only duration tested at all eccentricities). Figure 6 reports the estimated critical size (Ricco's area) at different eccentricities. The average RMSE of the model fits was 0.85 ± 0.39 dB (mean ± SD). As expected, the estimated Ricco's area increased toward the periphery (Figure 6C and Table 2), with no significant differences between the areas calculated with and without accounting for convergence. However, such a change did not correspond to a constant number of P-OFF-RGCs being stimulated. Instead, the estimated number of P-OFF-RGCs at Ricco's area was consistently larger toward the fovea (Figure 6D). This was mirrored by a change in the integration constant τ with eccentricity. However, this trend in τ was completely eliminated by accounting for the change in cone/RGC convergence (Figure 6A and Table 2). This effect of convergence was larger when modeling the mOFF-RGC mosaic (supplementary material). This result can alternatively be visualized by multiplying the number of P-OFF-RGCs at Ricco's area by the corresponding convergence factor (Figure 6D and Table 2). Note that this is a post hoc calculation and not an output from the model (accounting for convergence is expected to have an effect on the model's parameters but not on Ricco's area and the shape of the fitted response profile).
Figure 6. Boxplots of the different parameters and estimates derived from the model for spatial summation data. Note that the convergence-weighted values in (D) are obtained by simply multiplying the uncorrected number of P-OFF-RGCs at Ricco's area by the convergence rate. The box encloses the interquartile range, the horizontal midline indicates the median, and the error bars extend from the 5% to the 95% quantiles. The vertical axis is log10-spaced.
There was a small but significant increase in the vertical Offset with eccentricity, which was reduced by accounting for convergence (Figure 6B and Table 2).

Spatiotemporal summation
The same spatiotemporal model was used to analyze data from locations {±7; ±7} with all different combinations of stimulus sizes and durations. The data were collated to obtain a single estimate of the integration constant while accounting for retinal convergence. The global average RMSE for this fit was 1.67 ± 0.52 dB (mean ± SD) and 1.40 ± 0.41 dB for the 200-ms stimuli. This can be compared to the 0.96 ± 0.35 dB average RMSE obtained from fitting the 200-ms data alone at the same eccentricity. For context, the root mean squared difference in sensitivity between the two repetitions of the retested combinations was 2.44 dB, and the root mean squared deviation from the average of the two repetitions was 1.22 dB. An example of the calculation for one location in one subject is also shown (Figures 7A, B). There was a strong correlation between the parameter estimates obtained by fitting data from all stimulus durations and from the 200-ms data alone (previous section) at the same eccentricity (correlation coefficient: 0.83 for log10(τ) and 0.89 for the sensitivity offset; Figures 7C, D). However, both estimates were significantly different from those reported in Table 2 for the same eccentricity (p < 0.0001 and p = 0.00298, respectively). Significant differences were also present for all the other parameters, including Ricco's area and the number of P-OFF-RGCs at Ricco's area (all p < 0.0001). Numeric values of Ricco's area and corresponding P-OFF-RGC counts are reported in Table 3 for all durations. Differences in Ricco's areas between different durations were not tested, as such differences are assumed by the model.
Figure 7. The two top panels show an example fit from one location in one subject, with the horizontal axis reporting the stimulus area (A) and the product of area and duration (B). Correlation between the parameter estimates obtained by combining all durations and by only using data obtained with the 200-ms stimulus for the integration constant (C) and the offset (D). The diagonal line indicates equivalence.
Figure 9. Relationship between the slope (σ) of the psychometric function and the 50% threshold (µ). The regression line is also reported. The relationship was statistically significant (p < 0.0001).

Frequency of seeing curves
We estimated the FOS curves for the four most extreme combinations of stimulus size and duration at locations {±7; ±7} using the MOCS data for five subjects. The results of the Bayesian fitting are shown in Figure 8. The FOS was modeled using the CDF of a Gaussian distribution. The averages of the estimates for μ, σ, λ, and γ are reported as supplementary material.
In general, there was a tendency for slopes (σ) to be shallower for conditions where sensitivity (μ) was lower. This agrees with previous literature (Gardiner, Demirel, & Johnson, 2006; Henson et al., 2000). Figure 9 shows this relationship. Estimates from Henson et al. (2000) are also reported for comparison.
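For readers less familiar with FOS curves, the following minimal sketch shows one common way of writing a Gaussian-CDF psychometric function with a guess rate (γ) and a lapse rate (λ). Equation 2 is not reproduced in this section, so the exact parameterization, the function names, and the example values are assumptions made for illustration. The sketch also assumes the perimetric convention in which a higher dB value corresponds to a dimmer stimulus.

```python
import math


def gaussian_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_seen(stimulus_db, mu, sigma, guess=0.0, lapse=0.0):
    """Probability of a 'seen' response for a stimulus of a given dB level.

    Assumes that higher dB means a dimmer stimulus, so the probability of
    seeing falls as stimulus_db rises above the 50% threshold mu.
    sigma controls the spread: a larger sigma gives a shallower curve.
    """
    core = 1.0 - gaussian_cdf((stimulus_db - mu) / sigma)
    # guess sets the lower asymptote, (1 - lapse) the upper asymptote.
    return guess + (1.0 - guess - lapse) * core


if __name__ == "__main__":
    # Hypothetical parameters: threshold 25 dB, spread 2 dB.
    for level in (15, 23, 25, 27, 35):
        print(level, round(prob_seen(level, mu=25.0, sigma=2.0, guess=0.02, lapse=0.03), 3))
```

In this form, a shallower slope for lower sensitivity corresponds to a larger σ when μ is smaller.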

Discussion
Constant integration of visual input has been regarded as a fundamental principle governing the perception of visual stimuli (Barlow, 1958;Owen, 1972). However, the interaction of stimulus duration and size has been rarely and incompletely explored in perimetry (Mulholland et al., 2015). Our data support constant input integration as a fundamental principle in perimetric response in healthy observers. Such a principle has translational value as it provides a simple framework for the interpretation and prediction of perimetric responses in healthy subjects and allows speculations on the expected changes from disease.
The first important result is the change in Ricco's area with different stimulus durations. The size of Ricco's area has often been interpreted in terms of cortical magnification (Kwon & Liu, 2019), linking the number of RGCs within Ricco's area to the number of RGCs contacting V1 cells in the visual cortex. Such a line of reasoning seems, however, questionable if Ricco's area can vary with stimulus duration, because duration would have no effect on the spatial extent of RGC-V1 connections. Rather, temporal and spatial summation appear to operate in concert to maintain a consistent behavior in response to the same amount of visual input, be it from changes in stimulus size or duration. Fredericksen et al. (Fredericksen, Verstraten, & van de Grind, 1994) also proposed a similar integration model in the context of motion detection, suggesting that spatiotemporal summation likely arises from diffuse cortical integration rather than specific temporal or spatial processes. Our model captures such a spatiotemporal interaction by only requiring the fitting of one parameter (the integration constant τ) while providing good predictions of the experimental results. Other models, while not specifically investigating the interaction between stimulus size and duration, also showed that the spatial scale of the visual system could be modeled independently of the underlying RGC density and their RFs using cortical filters with different spatial scales (Pan & Swanson, 2006). Our model also decouples spatial summation from the extent of the retinal spatial filters (in this case, the extent of the DoG filter used to model RGCs' responses). This has important implications for modeling the effect of disease that will be discussed later. It should be noted that other authors have proposed that these effects could be explained by a dynamic change in the "functional" receptive field size as a function of stimulus duration and background luminance (Glezer, 1965).
More realistically, this could correspond to a selection of cortical filters of different sizes for different stimulus characteristics or to the response envelope of multiple filters combined by probability summation whose sensitivity can be selectively changed by different stimulation conditions (Pan & Swanson, 2006). Further research is needed to understand how this would apply in the case of disease, such as RGC loss (see later). Such a mechanism is further explored in a dedicated paragraph in the Appendix.
The model described by Equations 3 and 4 can be modified to incorporate different impulse response functions. In this study, it was a simple capacitor equation, as this was deemed sufficient to model our data by fitting only two parameters. This is likely to be simplistic for many other applications. For example, our model does not include any response delay. Our results can be largely replicated with the monophasic response filter used by Gorea and Tyler (1986) and first described by Watson (1982). Such an impulse response can also be tuned to produce different critical durations by changing an integration constant, while keeping all the other parameters fixed. Using this impulse response produced only minimal differences (one example is provided as supplementary material). A drop in sensitivity has been shown for very long stimulus durations (Breitmeyer & Ganz, 1977;Kelly & Savoie, 1978;Roufs, 1974) and modeled with a biphasic impulse response integrated over a limited time window (Gorea & Tyler, 1986). Our stimuli would not be long enough for this to be evident. Our temporal integral in Equation 4 extends to infinity, similarly to Watson (1979). Gorea and Tyler (1986) highlighted the implausibility in this assumption, because an observer that integrates over an infinite time window will never make a decision to respond. A practical choice for our experiments would be to use the maximum time interval allowed for a response (1,500 ms) as an integration window. However, this is so much longer than the longest stimulus (200 ms) that it would be practically equivalent to infinity.
It should be mentioned that both temporal and spatial summation, and contrast sensitivity in general, can be strongly affected by background adaptation. For the background illumination used in this study (10 cd/m²), threshold behavior should be close to Weber's law, at least for a G-III stimulus (Bierings, de Boer, & Jansonius, 2018; Swanson et al., 2014). Retinal illuminance can be reduced by media opacity (such as cataract), but this is likely to be negligible in a young healthy cohort. Pupil size can also affect retinal illuminance, especially if below 3 mm (Swanson et al., 2014), but the average pupil size in our cohort was 5.9 ± 0.8 mm.
The model can be used to investigate the effect of eccentricity on spatial summation. Our results show that Ricco's area significantly increased with eccentricity, as expected (Choi, Nivison-Smith, Khuu, & Kalloniatis, 2016; Khuu & Kalloniatis, 2015a, 2015b). However, this did not correspond to a constant number of P-OFF-RGCs being stimulated, with this number being comparatively larger at smaller eccentricities. This is mirrored by the identical trend for the integration constant τ, indicating that more P-OFF-RGCs need to be stimulated to achieve the same change in sensitivity closer to the fovea. This trend is even larger when modeling the response from the mOFF-RGC mosaic (supplementary material). Our results agree with Kwon and Liu (2019), who also observed a notable departure from a constant number of mOFF-RGCs at Ricco's area and a trend with eccentricity. However, they concluded that this was likely a result of inaccuracies in the estimates of RGC density. We propose a different explanation: The trend in the number of RGCs, and in the integration constant, appeared to be completely eliminated by weighting the contribution of each RGC by the cone/OFF-RGC convergence ratio. This observation suggests that, much like the effect of change in stimulus duration, convergence can change the "contribution" provided by each RGC in terms of retinal input. Our model is able to account for this, because the contribution of each RGC can be weighted by its convergence rate prior to summation in Equation 4. Our experiments would not allow us to uncover a specific mechanism for this phenomenon. However, a reasonable hypothesis is that increased convergence could change the contrast gain determining the spiking rate of the RGC for a given level of contrast. For our main analysis, we considered one possible class of RGCs, P-OFF-RGCs.
This is important for our assumption of hexagonal tiling, because different classes of RGCs form independent and overlapping mosaics (Dacey, 1993; Dacey & Petersen, 1992). mOFF-RGCs were also modeled (supplementary material) for comparison with Kwon and Liu (2019). Their choice was justified by the fact that these are the most prevalent type of RGCs in humans (Dacey, 1993; Drasdo, Millican, Katholi, & Curcio, 2007). However, previous literature showed that briefly flashed stimuli, such as those used in perimetry, might preferentially stimulate parasol RGCs (Swanson, Sun, Lee, & Cao, 2011), and this was the reason for our choice to model P-OFF-RGCs instead. It should be noted that the effect of eccentricity, and the importance of cone/RGC convergence, was much more pronounced for mOFF-RGCs. However, accounting for convergence eliminated significant differences in the number of stimulated RGCs at Ricco's area and in the integration constant between the smallest and the largest eccentricity for both modeling choices. Interestingly, when weighted by convergence, the results were effectively identical to those obtained with the P-OFF-RGC mosaic, because the higher convergence ratio for the mOFF mosaic effectively produced the same scaled input. It should be noted that there is no clear anatomical evidence of increased cone/P-OFF-RGC convergence with eccentricity. However, this seems a reasonable assumption because the cone/RGC ratio calculated from histology data (Curcio, Sloan, Kalina, & Hendrickson, 1990) increases with eccentricity in a similar fashion for both the midget and parasol cells. The similarity between our results and those reported by Kwon and Liu (2019) should be interpreted with caution, because it can be explained by the fact that both our estimates and theirs were derived from those provided by Drasdo et al. (Drasdo, Millican, Katholi, & Curcio, 2007; Montesano et al., 2020), which are in turn based on a small histology data set.
Despite our attempt to improve precision by customizing Drasdo's estimates using individualized structural OCT data (Montesano et al., 2020), the results are unlikely to be greatly altered. Therefore, Kwon and Liu's (2019) results cannot be considered a fully independent confirmation of our findings. Finally, it should be noted that the compensation of the effect of eccentricity with the convergence ratio might be coincidental and could be explained by other factors, such as optical aberrations. The effect of natural ocular optics on spatial summation in the parafoveal retina is debated (Dalimier & Dainty, 2010; Davila & Geisler, 1991; Tuten, Cooper, & Tiruveedhula, 2018). In our model, we included the effect of optical aberrations and glare using the average MTF for the human eye proposed by Watson (2013): The data were fitted accounting for optical factors, but the summation curves were generated without the effect of optics. This was an attempt at estimating the pure neural contribution to spatial summation. However, the effect on the results largely depends on other assumptions within the model, particularly the choice of whether the summation in Equation 5 is taken over the signed or the absolute value of the RGC response. Our choice of summing the signed contribution was based on some desirable properties of the model, particularly the perfect linear scaling of the response with the change in RGC density and filter size. This produced a very small effect from ocular optics, because the total power of the stimulus was simply spread over a larger area. Taking the summation over the absolute value instead produced a much greater effect (results reported in supplementary material) because negative contributions from "inhibited" RGCs were transformed into positive contributions, greatly amplifying the effect of optical blur.
Our choice of modeling produced an average change in Ricco's area due to optical factors of 0.056 log10 units, which is very similar to the change measured by Tuten et al. (Tuten, Cooper, & Tiruveedhula, 2018) with adaptive optics (AO). Taking the summation over the absolute value instead produced an average change of 0.37 log10 units, which is closer to what was reported by Dalimier and Dainty (2010) for similar experiments. Ultimately, a definitive answer to these questions could only be obtained by performing these same experiments with coupled AO-corrected stimuli and imaging, so that accurate estimates of individual RGCs can be obtained and the effect of optical aberrations eliminated (Liu et al., 2017).
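The difference between signed and absolute summation under optical blur can be illustrated with a one-dimensional toy example. This is not the model used in the study (which uses a two-dimensional RGC mosaic and the MTF from Watson, 2013); all kernel shapes, parameter values, and function names below are hypothetical. The sketch only demonstrates the general property that blurring with a unit-sum kernel leaves the signed sum of linear filter responses unchanged, whereas the sum of absolute responses depends on blur.

```python
import math


def convolve_full(signal, kernel):
    """Full discrete convolution of two lists (no edge truncation)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def gaussian_kernel(sd, half_width):
    """Sampled Gaussian normalized to unit sum."""
    raw = [math.exp(-0.5 * (x / sd) ** 2) for x in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def dog_kernel(center_sd, surround_sd, surround_weight, half_width):
    """Difference of Gaussians: unit-sum center minus a weighted unit-sum surround."""
    center = gaussian_kernel(center_sd, half_width)
    surround = gaussian_kernel(surround_sd, half_width)
    return [c - surround_weight * s for c, s in zip(center, surround)]


def pooled_responses(stimulus, rf, blur=None):
    """Return (signed sum, absolute sum) of the filter responses."""
    image = convolve_full(stimulus, blur) if blur is not None else stimulus
    responses = convolve_full(image, rf)
    return sum(responses), sum(abs(r) for r in responses)


stimulus = [1.0] * 5                 # hypothetical spot, 5 samples wide
rf = dog_kernel(1.5, 4.0, 0.7, 15)   # hypothetical DoG receptive field
blur = gaussian_kernel(2.0, 8)       # hypothetical optical blur

signed_sharp, abs_sharp = pooled_responses(stimulus, rf)
signed_blur, abs_blur = pooled_responses(stimulus, rf, blur)
print("signed:", signed_sharp, signed_blur)
print("absolute:", abs_sharp, abs_blur)
```

In this toy case the signed sum equals the product of the stimulus sum and the filter sum regardless of blur, which is the sense in which the stimulus power is "simply spread over a larger area". The absolute sum changes because blur alters how much of the response falls in the negative surround lobes.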
Another important result is the effect of different stimulus durations and sizes on the shape of the psychometric function. In general, and in agreement with previous reports (Gardiner, Demirel, & Johnson, 2006; Henson et al., 2000), we have found that the change in the slope of the psychometric function was largely explained by a change in sensitivity and was reasonably described by a log-linear relationship (Figure 9). This effect is indicative of the presence of multiplicative noise in the response (Tyler & Chen, 2000). However, it is difficult to identify the exact origin of such noise (quantal fluctuations, eye movements, noise from the instrument). This finding nonetheless has important implications, because it confirms that the increase in variability of perimetric responses with decreasing sensitivity is not uniquely linked to disease but can be replicated in healthy observers. The MOCS experiments were designed to replicate the simple detection task involved in perimetry, where observers are asked to continuously monitor the presence of a signal in sequential intervals. This can be modeled as a task with a variable observer-defined "criterion" (i.e., rate of false alarm or response bias) (Klein, 2001). In our FOS curves, this bias is accounted for by estimating the guess rate as a lower asymptote (the γ term in Equation 2). This framework is rooted in high-threshold theory and widely adopted in the field of perimetry (Rubinstein, Turpin, Denniss, & McKendrick, 2021). It should be kept in mind that, under the alternative signal detection theory, the bias correction would be performed after z score transformation and would require numerous catch trials to determine the individual response bias (Klein, 2001).
In our data, the response bias and lapse rate were estimated from the response to stimuli that were likely to be much above or below the 50% threshold (as determined in a pilot run using QUEST+ to estimate threshold and psychometric function slope), and all participants were encouraged to maintain a low false-alarm rate during the experiments. Both the guess and lapse rates were very close to 0 and are therefore unlikely to have greatly affected the estimates of the psychometric function.
Our choice of placing our testing locations along the diagonals limits our ability to appreciate the previously reported dissociation between ganglion cell number and perimetric sensitivity in the nasal visual field (Keltgen & Swanson, 2012). We, however, found a significantly smaller number of P-OFF-RGCs within Ricco's area for the nasal locations, indicating a smaller spatial scale compared to temporal locations (p = 0.005). This comparison was performed for the log10-transformed RGC number with a linear mixed model using the hemifield as a fixed effect and the eccentricity as a random effect, nested within the subject, to perform a paired same-eccentricity comparison.
It is interesting to consider the implications of our results and modeling approach for the interpretation of changes observed in disease. Redmond et al. (Redmond, Garway-Heath, Zlatkova, & Anderson, 2010) have demonstrated an increase in Ricco's area in patients with glaucoma, which could be accounted for by a shift of the summation curves along the horizontal axis (stimulus size). According to some models (Kwon & Liu, 2019), such a change could only occur by scaling the spatial filters to increase spatial convergence (equivalent to changing the cortical magnification factor), which would imply either some sort of "restructuring" of the pooling mechanism (e.g., the spatial extent of RGC-V1 connections) or an enlargement of RGCs' RFs. The latter seems implausible, because most histologic studies have shown dendritic pruning and shrinkage (Liu, Duggan, Salt, & Cordeiro, 2011), which would imply smaller RGCs' RFs. The former hypothesis also lacks solid support from experiments: Wang et al. (Wang, Yan, & Zhou, 2021) observed changes in the cortical magnification factor in patients with glaucoma tested with functional magnetic resonance imaging; such changes, however, are indicative of increased retina-V1 divergence and therefore do not clearly support the hypothesis of an increased magnification factor. Our model makes no such assumptions. Instead, the change in Ricco's area is a consequence of the reduction in retinal input owing to a loss of RGCs in glaucoma. In Figure 10A, data from healthy participants in Redmond et al. (Redmond, Garway-Heath, Zlatkova, & Anderson, 2010) were fitted with our model, assuming a mosaic of P-OFF-RGCs with density estimated from Drasdo et al. (Drasdo, Millican, Katholi, & Curcio, 2007; Montesano et al., 2020). The mosaic was then randomly degraded to achieve 73% RGC loss, equivalent to the reported proportional average change in Ricco's area. The figure plots the average response of 100 randomly degraded mosaics.
The model correctly predicted a horizontal shift of the curve, in agreement with the data. A horizontal shift in the response could also be explained by RGC loss preferentially affecting higher-frequency cortical filters, whose loss in sensitivity might determine a horizontal shift of their probability summation envelope (Pan & Swanson, 2006). Our model also predicts that temporal summation curves can be equated between healthy controls and patients with glaucoma by appropriately scaling stimulus size. This is shown in Figure 10B, for the same mosaics simulated in Figure 10A. Mulholland et al. (2015) provided experimental evidence that using Ricco-scaled stimuli could reduce the difference in temporal summation observed between patients with glaucoma and healthy controls with G-III stimuli, although some residual differences were still present. This is further proof of the interaction between stimulus size and duration. However, more research is needed to fully characterize such an interaction in glaucoma. Finally, our model also predicts changes in spatial and temporal summation with photoreceptor loss, such as from diseases of the external retina. However, studies investigating this with perimetric stimuli are still lacking and will need further research.
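A back-of-the-envelope version of this prediction can be sketched under two simplifying assumptions made only for illustration: that the retinal input is proportional to the number of surviving RGCs under the stimulus, and that Ricco's area corresponds to a constant critical input. Under these assumptions, losing a fraction of RGCs at random scales the input by the surviving fraction, so the summation curve shifts toward larger areas by the negative log10 of that fraction. The cell counts, the stimulus footprint, and the function names below are hypothetical and do not reproduce the simulation behind Figure 10.

```python
import math
import random


def expected_log_shift(loss_fraction):
    """Shift of the summation curve along log10(area) for a given RGC loss,
    assuming constant critical input and input proportional to surviving cells."""
    return -math.log10(1.0 - loss_fraction)


def mean_surviving_fraction(n_cells, n_under_stimulus, loss_fraction, n_mosaics, seed=0):
    """Average fraction of cells under the stimulus that survive random degradation."""
    rng = random.Random(seed)
    n_lost = round(n_cells * loss_fraction)
    fractions = []
    for _ in range(n_mosaics):
        # Randomly remove cells from the whole mosaic.
        lost = set(rng.sample(range(n_cells), n_lost))
        # Cells 0 .. n_under_stimulus - 1 stand in for those covered by the stimulus.
        survivors = sum(1 for cell in range(n_under_stimulus) if cell not in lost)
        fractions.append(survivors / n_under_stimulus)
    return sum(fractions) / n_mosaics


if __name__ == "__main__":
    print(expected_log_shift(0.73))
    print(mean_surviving_fraction(1000, 250, 0.73, 100))
```

With 73% loss, the surviving fraction is 0.27, which under these assumptions corresponds to a shift of about 0.57 log10 units, that is, roughly a 3.7-fold enlargement of Ricco's area. The same reasoning suggests why scaling the stimulus area by the inverse of the surviving fraction would restore the original input and, with it, the temporal summation curve.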
Other questions remain, particularly pertaining to the systematic difference between the estimates of the model parameters obtained with 200-ms stimuli only or with all stimulus durations combined. Small inaccuracies in the delivery of the stimulus might produce variations in the intended durations, skewing the results of the combined analysis. Another consideration is that our model, despite describing most of the variability in the data, might not be capturing all aspects of the effect of stimulus duration on sensitivity.
In fact, the model was not meant to be a complete description of the psychophysical response to all the features of the stimulus but rather aimed at providing a coherent framework to explain important experimental observations from the data that are often neglected by other modeling attempts.

Conclusions
We show that the amount of total retinal input can account for most of the characteristics of spatiotemporal summation with perimetric stimuli in healthy observers, including the effect of eccentricity. This could have important implications for the interpretation and design of perimetric examinations in diseased eyes as well as providing a framework for analyzing spatiotemporal integration in healthy observers.
Keywords: spatial summation, temporal summation, perimetry, spatiotemporal summation
[...] Pan and Swanson (2006) with an exponent of 2. For this specific spatial filter, this corresponds to a partial summation slope of 0.25 in the log10-log10 plot, the same as in our model. The spatial summation curve with the smallest filter is shown in gray for reference.
[...] changes in stimulation conditions, such as background illumination. However, there is no clear evidence of such a change occurring in the retina. Furthermore, Glezer (1965) proposed such changes to occur through alterations in the weighting of the center and surround of center-surround receptive fields. Ricco's area has nonetheless been observed to change in response to glaucomatous RGC loss (Redmond, Garway-Heath, Zlatkova, & Anderson, 2010) and to background luminance in healthy subjects (Redmond et al., 2013) in the S-cone pathway, in which a center-surround receptive field organization is absent (Dacey & Lee, 1994). A more reasonable hypothesis, which fits more closely with experimental observations, is that a set of spatial cortical filters exists and can be optimally selected based on the amount of retinal input. Figure A.2 shows a hypothetical response of an array of cortical neurons employing a biphasic first Gaussian derivative filter (D1) with a Gaussian envelope. This filter was chosen because it produces a smooth monotonic spatial summation curve, as shown by Pan and Swanson (2006). Note that the locations of the cortical neurons in the schematic indicate their projection into the visual space, rather than their anatomical arrangement in the visual cortex. In the schematic, selecting a larger filter corresponds to selecting a sparser mosaic of cortical neurons, since the extent of the filter is scaled with the intercellular spacing. This is equivalent to proportionally scaling the same mosaic.
As expected, the summation curves with larger filters are shifted along the horizontal axis toward larger stimulus sizes. These mosaics can be obtained by selecting subsets of neurons from the same array (as in this example) or be constituted of separate sets of neurons. It should be mentioned that the summation curves produced by a more realistic implementation of this model (with cortical neurons sampling the response of RGCs with static RF sizes) would largely reproduce this behavior but would not be an exact horizontal translation of the same response (see later).
[...] The blue line represents the response envelope obtained by combining, through probability summation, the responses of filters with progressively smaller spatial scales. For example, in the top-right panel, the envelope is obtained by combining the responses of the filters with the three largest scales, while excluding the remaining three with a smaller spatial scale.
The change in spatial scale with different stimulus durations can therefore be replicated by a horizontal shift (in log10-log10 coordinates) of the same template response by an amount equivalent to the log10 change in duration. Note that the selection of the filter scale does not need to depend solely on the stimulus duration but more generically on the retinal input, to include the effect of cone/RGC convergence, duration, background illumination, or, for example, RGC loss in disease. For the sake of simplicity, everything except duration was held constant for these calculations. The combined effect is best represented by a summation surface, shown in Figure A.3. In the figure, three summation curves are isolated by cutting through the surface at different stimulus durations and correspond to using a different filter scale. Importantly, temporal summation responses can be obtained by cutting through the surface along the orthogonal (duration) axis. Because the surface is obtained by proportionally translating the same spatial summation curve, temporal summation responses also follow the same template curve, proportionally shifted with different stimulus sizes. This would produce the same results obtained with our more generic input summation model. With this interpretation, although a strict retina-V1 convergence cannot be defined, testing in partial summation conditions (i.e., long stimulus durations and high background illumination) would allow the calculation of the smallest possible spatial scale for a given retinal location.
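The construction of the summation surface by horizontal translation can be sketched with a deliberately simplified template. The sketch assumes a piecewise-linear curve in log10-log10 coordinates, with a slope of 1 (total summation) below a critical input and the partial summation slope of 0.25 above it. The actual template is smooth, and the units, the critical input value, and the function names here are arbitrary choices made for illustration.

```python
import math

PARTIAL_SLOPE = 0.25  # partial summation slope in log10-log10 coordinates


def log_sensitivity(log_area, log_duration, log_critical_input=0.0):
    """Template response as a function of log10(area x duration).

    Piecewise-linear stand-in: slope 1 below the critical input,
    PARTIAL_SLOPE above it. Output is in arbitrary log10 units.
    """
    log_input = log_area + log_duration
    if log_input <= log_critical_input:
        return log_input
    return log_critical_input + PARTIAL_SLOPE * (log_input - log_critical_input)


def log_ricco_area(log_duration, log_critical_input=0.0):
    """log10 area at which the spatial summation curve changes slope."""
    return log_critical_input - log_duration


if __name__ == "__main__":
    # Halving the duration (200 ms to 100 ms) shifts the critical area by log10(2).
    shift = log_ricco_area(math.log10(0.1)) - log_ricco_area(math.log10(0.2))
    print(shift)
```

Because the response depends only on the sum of log10 area and log10 duration, a cut through the surface at fixed area yields a temporal summation curve with the same shape, shifted by the log10 change in area.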
Another possibility, proposed by Pan and Swanson (2006), is that different stimulus features, such as adaptation state and stimulus duration, might alter the relative sensitivity of individual filters and change the combined response "envelope" obtained through probability summation. For simplicity, we demonstrate this concept in Figure A.4 by selectively combining the response of filters with progressively smaller spatial scales. The resulting response envelope is a simple translation of the same curve.
We finally implemented a more realistic two-stage version of the cortical pooling model presented in Figure A.2, where an array of cortical cells would sample the response of an array of RGCs like the one used in our main model. The cortical cell array was the same as the RGC array but used a D1 filter as its receptive field. Figure A.5 shows the responses produced by both the multiscale filters and the combination envelope. These largely replicate our experimental results (horizontal translation of the same response), with some small changes at different scales introduced by the fact that the size of the RGC-RFs did not scale with the chosen cortical filter. As in Swanson et al., this modeling exercise shows that Ricco's area can be entirely determined by cortical filters without changing the RGC density or the size of the RGC-RF.