Object Image Size Is a Fundamental Coding Dimension in Human Vision: New Insights and Model

In previous psychophysical work we found that luminance contrast is integrated over retinal area subject to contrast gain control. If different mechanisms perform this operation for a range of superimposed retinal regions of different sizes, this could provide the basis for size-coding. To test this idea we included two novel features in a standard adaptation paradigm to discount more pedestrian accounts of repulsive size-aftereffects. First, we used spatially jittering luminance-contrast adaptors to avoid simple contour displacement aftereffects. Second, we decoupled adaptor and target spatial frequency to avoid the well-known spatial frequency shift aftereffect. Empirical results indicated strong evidence of a bidirectional size adaptation aftereffect. We show that the textbook population model is inappropriate for our results, and develop our existing model of contrast perception to include multiple size mechanisms with divisive surround-suppression from the largest mechanism. For a given stimulus patch, this delivers a blurred step-function of responses across the population, with contrast and size encoded by the height and lateral position of the step. Unlike for textbook population coding schemes, our human results (N = 4 male, N = 4 female) displayed two asymmetries: (i) size aftereffects were greatest for targets smaller than the adaptor, and (ii) on that side of the function, results did not return to baseline, even when targets were 25% of adaptor diameter. Our results and emergent model properties provide evidence for a novel dimension of visual coding (size) and a novel strategy for that coding, consistent with previous results on contrast detection and discrimination for various stimulus sizes. © 2023 The Author(s). Published by Elsevier Ltd on behalf of IBRO. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

These perceptual phenomena have been used to develop our understanding of the sensory coding dimensions used by the human brain, but one obvious candidate missing from the list above is that of size. There are probably several reasons for this. First, retinal image size is often lumped in with retinal spatial frequency (e.g. Webster and Leonard, 2008) but, as we shall show, these are two separate adaptable stimulus dimensions with different underlying mechanisms. Second, there is a view that size mechanisms have already been identified psychophysically, for example by using adapt and test stimuli with different diameters. However, as we review in the discussion, results from those studies might derive from perceptual contour displacement, not size-mechanisms per se. Finally, there is a view that the problem of retinal image size and position is solved in the primary visual cortex because it is retinotopically mapped. While local signs might at least be involved in the encoding of relative position on the retina (Morgan et al., 1990), retinotopy is no more suitable as a route for encoding size than it is for orientation, a feature for which dedicated orientation-tuned mechanisms are well-known and celebrated.
On the face of it, one might suppose that the size of a patch of grating, for example, could be encoded by a population of size-selective mechanisms along the lines of the standard population model.
Fig. 1. The standard 'textbook' model for orientation-coding and adaptation (the tilt aftereffect) (A-D) applied to size-coding and adaptation (E-H). A bank of orientation-tuned spatial filters (A) responds selectively (B) to stimulus orientation (black arrow) encoded by the peak of the response distribution. Following adaptation at this orientation, the sensitivities of nearby mechanisms are attenuated in proportion to their response to the adaptor (C), distorting the population codes (D) for nearby stimuli (black arrows) and resulting in a repulsive tilt aftereffect. (In (D) the thin curves show the response distributions for two stimuli (black arrows) pre-adaptation, and the filled curves are for those same stimuli post-adaptation.) For a population of size mechanisms, the arrangement of mechanism sensitivities along the coding dimension is different (E) from the orientation case (A). The population response to a stimulus of a given size (e.g. an adaptor; black arrow) resembles a blurred step-edge (F). For this situation the population code is not summarised by the location of its peak, as in (B), but by its gradient maximum (black dot). (This happens to be to the left of the nominal stimulus size (black arrow), but this is unimportant for our purposes here.) This blurred step-edge distribution means that adaptation desensitises all of the mechanisms smaller than the adaptor and, to a lesser extent, some of those that are larger (G). It follows that this model predicts that adaptation will increase the perceived size of stimuli both larger and smaller than the adaptor (note the rightward shifts in gradient maxima (dots) in panel H); i.e. it is not generally repulsive. The purpose of (E-H) is to illustrate the details of a basic size adaptation model, but this is not the model we advocate. See text for further details, including the problems involved in trying to achieve an arrangement like that in (A-D) for size.
from the difference of Gaussian-type arrangement of the underlying receptive fields (i.e. elongated excitatory centres for luminance with subtractive inhibition from flanking surrounds; not shown).
Modelers of size-adaptation might proceed by simply relabeling the orientation axes in Fig. 1(A-D). However, this would leave the problem of how to get from retinal image size to a standard population code with the appropriate characteristics. For orientation and spatial frequency, this is straightforward, but the situation is not so simple for retinal image size. Fig. 1(E) shows that (unlike for orientation and spatial frequency) a population of mechanisms that integrate luminance contrast over area (Meese and Summers, 2012; Richard et al., 2019) does not sample the coding dimension at regular intervals, but with considerable overlap and a common centre. This means that the population response for a given stimulus is not the familiar bell-shaped curve (Fig. 1(B)), but a blurred step-edge (Fig. 1(F)), where the rising region owes to the benefit of contrast integration within each mechanism. The arrangement might be revised by supposing difference of Gaussian-type second-order mechanisms with contrast integration windows of various sizes and inhibitory contributions from contrast in the surrounds. However, this is not the remedy it might seem. First, while the arrangement benefits mechanism selectivity, in that small mechanisms are quashed by large stimuli, it fails the other way around, in that small stimuli will excite equally all the mechanisms that are greater than or equal to the stimulus size. A fix for this is to normalise each mechanism's sensitivity by its integration area, but that arrangement kills the benefit of stimulus area at detection threshold, for which evidence shows a signal integration process (Meese and Summers, 2009, 2012; Meese, 2010). Second, surround suppression from outside the classical receptive field is not subtractive, but divisive. This is true for cortical cells (Cavanaugh et al., 2002) and for the mechanisms derived from visual psychophysics (Foley, 1994; Snowden and Hammett, 1998; Meese, 2004) and is inconsistent with the assumptions underlying the standard model. Third, regardless of the arrangement for suppression, these mechanisms will confound an increase in stimulus contrast area (within the excitatory part of its mechanisms) with an increase in image contrast. Similarly, as stimulus size increases, larger mechanisms will respond more strongly than the smaller mechanisms did for the same level of contrast (e.g. compare the two solid pre-adapt curves in Fig. 1(H)). This model feature is consistent with contrast detection thresholds (Meese and Summers, 2012) but not with the perception and discrimination of suprathreshold contrast (Cannon and Fullenkamp, 1988; Meese et al., 2005) and is all very different from the population behaviour of the standard model (Fig. 1(A-D)). Of course, one might appeal to the process of contrast gain control (as we have done elsewhere and will do here), but this raises questions about the method of implementation and still leaves the problem of how to extract an estimate of stimulus properties from the population code, which is not as simple as it is for the tidier standard model (see Fig. 1). We suggest that the first derivative of the response distribution (Fig. 1(F,H)) is a useful approach, and return to this in a later section.
With the above in mind, our motives were twofold. First, we wanted to know whether we could find evidence for size mechanisms in human vision. Second, should they be evidenced by repulsive adaptation aftereffects, the standard population model would need considerable development for it to provide a plausible account of this result while remaining consistent with contrast perception. However, we were far from daunted. Our previous work on the detection and discrimination of luminance contrast provided evidence for the spatial integration of contrast signals across the retinal image, a sound starting point for the construction of size mechanisms. This contrast integration was found in both the excitatory and suppressive pathways of a contrast gain control network (Meese and Summers, 2007, 2012; Meese and Baker, 2011, 2013; Baker et al., 2013). For example, by manipulating both stimulus size and extrinsic uncertainty, Meese and Summers (2012) rejected contemporary (signal detection theory) accounts of probability summation for area summation of grating contrast at detection threshold. Instead, those results provided evidence for a process that sums (exponentiated) contrast and internal noise over contiguous regions of the retina. Other experiments using novel spatial stimulus designs tailored to the issue came to the same conclusion (Meese and Summers, 2007, 2009; Meese, 2010; Meese and Baker, 2011; Baldwin and Meese, 2015). Furthermore, strong evidence for this spatial integration of contrast was found when the enquiry was extended above threshold using pedestal masking (Meese and Summers, 2007; Meese and Baker, 2011) and contrast matching (Meese et al., 2017).
Crucially, in those experiments, the suprathreshold integration process was seen only with the help of 'Battenberg' or 'Swiss cheese' stimuli (essentially, these are gratings containing evenly spaced holes) because under normal circumstances, when the areas of both target and pedestal increase together, the benefit of signal integration is hidden by a concomitant process of suppression. This provides the solution to one of the problems outlined above: the visual system does not confound stimulus area with contrast because of the opposing operations of spatial integration and spatial suppression. These findings prompted a cartoon model involving a population of size-mechanisms, each subject to suppressive gain control from the largest spatial region in the pool (Meese and Baker, 2011). From this, we developed the hypothesis that spatial pooling of luminance contrast might provide the basis of a population code for retinal image size in visual cortex (Meese and Baker, 2011) (e.g. for a patch of grating), and that adapting this population would distort the perception of stimulus size.
To test this prediction, we conducted a series of novel adaptation experiments designed to isolate the putative size-coding mechanisms predicted by our hypothesis. The outcome is supported by our formalisation of a computational model developed from the earlier work on contrast perception referred to above, where surprising asymmetric features in our size-adaptation data are emergent properties of the model.

EXPERIMENTAL PROCEDURES

Participants
Participants were eight undergraduate students in their early twenties, who participated for course credit. They had no known visual abnormalities and gave written informed consent. The second author also served as a participant in a follow-up experiment. Any prescribed optical correction typically used for near work was also worn during testing. The work was conducted with the ethical approval of Aston's School of Life and Health Sciences (now the College of Health and Life Sciences).

Equipment
Stimuli were displayed on a Phillips MGD403 19-inch greyscale monitor, running at a resolution of 1280 × 1025 pixels, with a refresh rate of 80 Hz. The mean luminance of the monitor was 150 cd/m². The monitor was viewed from 103 cm, at which one degree of visual angle subtended 64 pixels on the display. Participants placed their heads in a head-and-chin rest positioned at the appropriate distance and were instructed to fixate on a central cross throughout. We generated stimuli using Matlab and rendered them using a VSG2/5 framestore device (Cambridge Research Systems Ltd., Kent, UK). Participants made responses using a Kensington Expert Mouse Pro trackball device (a stationary device housing a ball that could be rotated to provide dynamic adjustment of the stimulus on the screen).
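The viewing geometry above can be turned into a small conversion helper. The following is a minimal sketch in Python (not the Matlab code used in the study); the function names are hypothetical, and the pixel pitch is a value derived here from the reported distance and scaling rather than one stated in the text.

```python
import math

VIEWING_DISTANCE_CM = 103.0   # reported viewing distance
PIXELS_PER_DEGREE = 64.0      # reported scaling at that distance

def deg_to_px(deg):
    """Convert visual angle (deg) to pixels using the reported 64 px/deg."""
    return deg * PIXELS_PER_DEGREE

def implied_pixel_pitch_mm():
    """Pixel pitch implied by 64 px spanning 1 deg at 103 cm (derived, not reported)."""
    one_degree_cm = 2.0 * VIEWING_DISTANCE_CM * math.tan(math.radians(0.5))
    return 10.0 * one_degree_cm / PIXELS_PER_DEGREE

print(deg_to_px(1), deg_to_px(2))          # adaptor diameters in pixels
print(round(implied_pixel_pitch_mm(), 3))  # approximate pitch in mm
```

On these assumptions the 1 and 2 degree adaptors span 64 and 128 pixels, and the smallest target (0.25 degrees) spans 16 pixels.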

Stimuli
Stimuli were sinusoidal gratings, windowed by a raised cosine envelope with a blur width of four pixels added to each side of the plateau of the envelope. (The width of the plateau was the nominal width of the stimulus). All stimuli had a Michelson contrast of 50%. Target stimuli were always horizontal, whereas adapting stimuli were either horizontal or vertical depending on the experiment.
In experiments where size was manipulated, all stimuli had a spatial frequency of 4 c/deg, and adaptors and targets varied in size (adaptors either 1 or 2 degrees, and targets from 0.25 to 8 degrees at seven levels in proportion to the adaptor; see Fig. 3(A) for examples).
In experiments where spatial frequency was manipulated, target stimuli were the same size as the adaptor (either 1 or 2 degrees) but varied in spatial frequency between 1 and 16 c/deg (see Fig. 3(B) for examples).
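The stimulus description can be illustrated with a short sketch of the luminance profile of one patch. This is a Python illustration under stated assumptions, not the authors' Matlab code: the window is assumed to be circular, the carrier is assumed to be in sine phase at the patch centre, and the function names are hypothetical. The contrast, mean luminance, ramp width and pixel scaling are the reported values.

```python
import math

PX_PER_DEG = 64      # reported display scaling
MEAN_LUM = 150.0     # cd/m^2, reported mean luminance
CONTRAST = 0.5       # reported Michelson contrast
RAMP_PX = 4          # reported raised-cosine blur width on each side of the plateau

def envelope(r_px, diameter_deg):
    """Raised-cosine window: 1 on the plateau, smooth 4-pixel fall to 0 outside it."""
    plateau_radius = diameter_deg * PX_PER_DEG / 2.0
    if r_px <= plateau_radius:
        return 1.0
    if r_px >= plateau_radius + RAMP_PX:
        return 0.0
    phase = (r_px - plateau_radius) / RAMP_PX   # runs 0..1 across the ramp
    return 0.5 * (1.0 + math.cos(math.pi * phase))

def luminance(x_px, y_px, diameter_deg, sf_cpd=4.0):
    """Luminance of a horizontal grating (modulated along y) at an offset from patch centre."""
    r = math.hypot(x_px, y_px)
    carrier = math.sin(2.0 * math.pi * sf_cpd * y_px / PX_PER_DEG)  # sine phase assumed
    return MEAN_LUM * (1.0 + CONTRAST * envelope(r, diameter_deg) * carrier)
```

At 4 c/deg one cycle spans 16 pixels, so within the plateau the luminance swings between 75 and 225 cd/m², which corresponds to the stated 50% Michelson contrast.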
Procedure
Fig. 2 outlines the main procedure. Trials were blocked by adaptation condition throughout. The adaptor was presented to the left of a central fixation cross for 60 s at the start of a block (initial adaptation), and 10 s before each trial (top-up adaptation). The adaptor was always a sinusoidal grating with a spatial frequency of 4 c/deg and a diameter of either 1 degree (four cycles) or 2 degrees (eight cycles). During adaptation, the adaptor moved to a random spatial position every 250 ms, with the range of possible positions constrained to lie within the boundary of an invisible circle with a diameter of 4 degrees (for the small adaptor) or 8 degrees (for the large adaptor; see upper panel of Fig. 2). After adaptor offset, there was a blank period of 500 ms, followed by a target presentation of 250 ms, accompanied by a beep. The target was offset to the left of fixation by 2.5 degrees (for conditions with a 1 degree adaptor), or 5 degrees (for conditions with a 2 degree adaptor). The target then disappeared, and a matching stimulus was presented to the right of the fixation cross, offset by the same distance. In some blocks this was a thin grey ring, the diameter of which was adjusted by moving the trackball. In the remaining blocks it was another sine-wave grating, and moving the trackball adjusted the spatial frequency. The starting size or spatial frequency of the match had equal probability of being larger or smaller (or lower or higher in frequency) than the target. Once the participant was satisfied with their judgement, they clicked a button on the trackball to initiate the next trial. Baseline conditions were as described above, except that there was no adaptation sequence. We collected baseline data for each session before any adaptation took place, and each participant adapted to only one orientation on a given day.
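The jittering adaptor sequence can be sketched as follows. This Python illustration rests on assumptions that the text does not settle: that the whole patch (not just its centre) stays inside the invisible circle, that positions are drawn uniformly over the permitted area, and that offsets are expressed relative to the circle's centre. The function name and the seed argument are hypothetical.

```python
import math
import random

def jitter_positions(adaptor_diameter_deg, duration_s, step_s=0.25, seed=0):
    """Sample one adaptor-centre offset (deg) per 250 ms step.

    Assumption: the patch stays wholly inside the invisible circle, whose
    diameter is four times the adaptor diameter (4 deg for 1 deg, 8 deg for 2 deg).
    """
    rng = random.Random(seed)
    circle_radius = 4.0 * adaptor_diameter_deg / 2.0
    max_offset = circle_radius - adaptor_diameter_deg / 2.0
    n_steps = round(duration_s / step_s)
    positions = []
    for _ in range(n_steps):
        # Taking the square root of a uniform draw gives uniform density over the disc
        radius = max_offset * math.sqrt(rng.random())
        angle = 2.0 * math.pi * rng.random()
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions

initial = jitter_positions(1, 60)   # initial adaptation: 240 positions
top_up = jitter_positions(1, 10)    # top-up adaptation: 40 positions
```

With a 250 ms step, the 60 s initial adaptation comprises 240 positions and each 10 s top-up comprises 40.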
For conditions in the main experiment where the target size varied and spatial frequency was judged, the matching stimulus always had the same diameter as the target. In the control experiment, the matching stimulus had a fixed diameter of either 1 or 2 degrees. Each target condition (target size or spatial frequency) was repeated four times in each block, and each adaptation condition was repeated twice by each participant. Overall, participants completed 1,344 settings each across 168 distinct conditions (7 target levels × 2 adaptor types × 2 adaptation conditions × 6 sub-experiments) in the main experiment, and a further 112 settings each across 14 conditions (7 target sizes × 2 match sizes) in the control experiment.
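The setting counts follow directly from the design. A brief Python check, with variable names chosen here for illustration, assuming that the four repeats per block and the two blocks per condition combine to give eight settings per condition:

```python
repeats_per_block = 4
blocks_per_condition = 2
settings_per_condition = repeats_per_block * blocks_per_condition   # 8

# 7 target levels x 2 adaptor types x 2 adaptation conditions x 6 sub-experiments
main_conditions = 7 * 2 * 2 * 6
main_settings = main_conditions * settings_per_condition

# 7 target sizes x 2 match sizes
control_conditions = 7 * 2
control_settings = control_conditions * settings_per_condition

print(main_conditions, main_settings, control_conditions, control_settings)
```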

Quantification and statistical analysis
All analyses were conducted in R. The mean setting (in log units) for each participant and condition was calculated across all 8 repetitions. We calculated the percentage shift in perceived size (or spatial frequency) relative to the baseline settings. The mean and standard error of this adaptation effect were then calculated across participants. We performed a frequentist 2 (adaptor size) by 7 (target size or spatial frequency) repeated measures ANOVA for each experiment using the log values. All degrees of freedom were Greenhouse-Geisser corrected where Mauchly's test of sphericity was significant, and we report generalised eta-squared as a measure of effect size (Olejnik and Algina, 2003). We also conducted a Bayesian ANOVA (Rouder et al., 2012) (using the BayesFactor R package), allowing us to report Bayes factor scores corresponding to the main effects and interactions.
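The adaptation effect described above can be sketched for a single participant and condition. The analysis itself was run in R; this Python version is only an illustration, and it assumes base-10 logarithms and a shift defined as the ratio of the adapted to the baseline geometric mean, neither of which is specified in the text. The function names are hypothetical.

```python
import math
import statistics

def percent_shift(adapted_settings, baseline_settings):
    """Percentage change in the matched value after adaptation, from means in log units."""
    mean_log_adapted = statistics.fmean(math.log10(s) for s in adapted_settings)
    mean_log_baseline = statistics.fmean(math.log10(s) for s in baseline_settings)
    return 100.0 * (10.0 ** (mean_log_adapted - mean_log_baseline) - 1.0)

def mean_and_sem(values):
    """Mean and standard error of the adaptation effect across participants."""
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))
```

For example, eight adapted settings of 0.8 degrees against eight baseline settings of 1.0 degree give a shift of about -20%.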

RESULTS
Main Experiments: Manipulations of size, spatial frequency and orientation
Human participants (N = 8) were presented with an adapting stimulus (a patch of sine wave grating) on the left side of a computer monitor, which jittered in position (Baker and Meese, 2012; Storrs and Arnold, 2017) every 250 ms to avoid causing retinal afterimages (Köhler and Wallach, 1944). They were then shown a target stimulus in the adapted region of the display and asked to report its perceived size by adjusting the diameter of a ring on the opposite side of the display (see Fig. 2). Compared to unadapted baselines (which were approximately veridical, as summarised in Supplementary Fig. S1), when the target was smaller than the adaptor, its perceived size was reduced by around 20%. When the target was larger than the adaptor, its perceived size increased by around 10% (see Fig. 3(A)). These effects were consistent across two different sizes of adaptor and are robust and compelling in demonstrations (see Supplementary Movie S1). A factorial (7 stimulus size × 2 adaptor size) repeated measures ANOVA indicated that the main effect of relative stimulus size was significant (F(2.3, 16.3) = 18.64, p < 0.001, η_G² = 0.54, log₁₀BF₁₀ = 15.1), but there was no effect of absolute adaptor size (F(1, 7) = 0.004, p = 0.95, η_G² < 0.001, log₁₀BF₁₀ = −0.7), nor any interaction (F(6, 42) = 1.44, p = 0.22, η_G² = 0.04, log₁₀BF₁₀ = −0.80).
We then sought to replicate a classic aftereffect in which the coarseness of the texture (but not the size of the patch) is affected by adaptation (Blakemore and Sutton, 1969) using our jittering adaptor (see Supplementary Movie S2). The method remained the same but this time the targets had a constant size and varied in spatial frequency (i.e. bar width). The perceived target spatial frequency was indicated by adjusting the spatial frequency of a matching patch of grating on the opposite side of the display. This experiment also produced a repulsive aftereffect of around 20% in each direction (see Fig. 3(B); note the logarithmic y-axis). The effect of target spatial frequency was significant (F(6, 42) = 24.74, p < 0.001, η_G² = 0.60, log₁₀BF₁₀ = 15.9), but there was no effect of adaptor size (F(1, 7) = 0.05, p = 0.82, η_G² < 0.001, log₁₀BF₁₀ = −0.68), nor any interaction (F(6, 42) = 0.59, p = 0.74, η_G² = 0.03, log₁₀BF₁₀ = −0.89). Note that spatial frequency-specific aftereffects of this kind were originally referred to as 'size adaptation' (Blakemore and Sutton, 1969) and understood in terms of a population code for spatial frequency. In principle, this scheme can explain the distorting effects (Kreutzer et al., 2015; Zeng et al., 2017; Altan and Boyaci, 2020) of perceived size for luminance-defined objects (see Supplementary Movie S4) by assuming that perception of size is mediated by spatial frequency selective channels. However, this approach does not explain the size adaptation aftereffects reported in Fig. 3(A), where object size and spatial frequency are decoupled; the perceptual judgement here is specific to the size of the patch, not the spatial frequency of the grating texture it contains.
To reveal more about the level of processing at which these adaptation effects occur, we asked whether the aftereffects are tuned for orientation (see Supplementary Movie S3). For size adaptation, we found effects of similar magnitude when the adaptor orientation was orthogonal to that of the target (see Fig. 3(C)). There was a main effect of target size (F(6, 42) = 27.51, p < 0.001, η_G² = 0.59, log₁₀BF₁₀ = 15.0) but no effect of adaptor size (F(1, 7) = 1.01, p = 0.35, η_G² = 0.03, log₁₀BF₁₀ = −0.43) or interaction (F(6, 42) = 1.27, p = 0.29, η_G² = 0.06, log₁₀BF₁₀ = −0.63). In fact, our informal explorations suggest that the size adaptation aftereffect is insensitive to the texture content of the stimuli and occurs even when this is substantially mismatched between the adaptor and target. For example, we get the same effects when the adaptor is a grating and the target is a face (see Supplementary Movie S5). For spatial frequency adaptation (Fig. 3(D)), aftereffects were also found when using an orthogonal adaptor, with a significant main effect of target spatial frequency (F(6, 42) = 18.07, p < 0.001, η_G² = 0.51, log₁₀BF₁₀ = 10.8) but no effect of adaptor size (F(1, 7) = 0.34, p = 0.58, η_G² = 0.003, log₁₀BF₁₀ = −0.68) or interaction (F(6, 42) = 1.27, p = 0.29, η_G² = 0.08, log₁₀BF₁₀ = −0.34). Early work on spatial frequency aftereffects showed strong orientation tuning, but did not test target-adaptor orientation differences beyond 40° (Blakemore et al., 1970; Blakemore and Nachmias, 1971). In the context of these classic early studies, our findings with an orthogonal adaptor might be surprising as they appear to show no orientation tuning for the aftereffect. However, subsequent work on the same phenomenon found strong effects on perceived spatial frequency from orthogonal adaptors (Heeley, 1979), consistent with our results.
Do the size and spatial frequency adaptation aftereffects derive from a single process?
Both of our adaptation aftereffects involve aspects of stimulus size: overall size in one case, the width of the stripes in the other. Might these two effects be different aspects of a single phenomenon? For example, when the perceived size of a target shrinks, do its stripes also appear closer together, causing the perceived spatial frequency to increase? To test this, we instructed participants to judge the perceived spatial frequency of the targets from the size adaptation experiment, and the perceived size of the targets from the spatial frequency experiment. Perceived size did not show any clear modulation following spatial frequency adaptation (see Fig. 4(A)), suggesting that our size aftereffect was not a consequence of perceived bar width pulling in or pushing out the perceived diameter of the patch. There were no effects of target spatial frequency (F(1.07, 7.46) = 0.79, p = 0.58, η_G² = 0.04, log₁₀BF₁₀ = −1.01), or adaptor size (F(1, 7) = 0.44, p = 0.53, η_G² = 0.01, log₁₀BF₁₀ = −0.58), nor any interaction between them (F(1.12, 7.84) = 0.79, p = 0.59, η_G² = 0.04, log₁₀BF₁₀ = −0.81). All Bayes Factor scores offered support for the null hypothesis (all log₁₀BF₁₀ < −0.5).
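Because the Bayes factors are reported as base-10 logarithms, it can help to convert them back to ratios. A minimal Python sketch (function names hypothetical), where BF₁₀ is the evidence for the effect relative to the null and BF₀₁ is its reciprocal:

```python
def bf10(log10_bf10):
    """Evidence ratio for the effect relative to the null."""
    return 10.0 ** log10_bf10

def bf01(log10_bf10):
    """Evidence ratio for the null relative to the effect (reciprocal of BF10)."""
    return 10.0 ** (-log10_bf10)

print(round(bf01(-0.5), 2))   # null favoured by a factor of about 3.16
print(round(bf01(-1.01), 2))  # null favoured by a factor of about 10.23
```

On this scale, the criterion log₁₀BF₁₀ < −0.5 means the data are more than about three times as likely under the null hypothesis as under the alternative.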
Judgements of perceived spatial frequency for targets of different sizes did show a reduction in perceived spatial frequency for small target sizes (see Fig. 4(B)). This was confirmed by ANOVA, showing a significant main effect of target size (F(2.19, 15.33) = 4.99, p < 0.01, η_G² = 0.20, log₁₀BF₁₀ = 1.86) but no effect of adaptor size (F(1, 7) = 0.11, p = 0.75, η_G² = 0.002, log₁₀BF₁₀ = −0.66) and no interaction (F(1.74, 12.17) = 0.50, p = 0.81, η_G² = 0.03, log₁₀BF₁₀ = −0.94). The effect of target size is surprising because the bar width appears larger (lower spatial frequency) for grating patches that in fact look smaller due to the size adaptation aftereffect (Fig. 3(A)). In other words, the adaptation aftereffect here (Fig. 4(B)) is in the direction opposite to that expected if the size and spatial frequency aftereffects (Blakemore and Sutton, 1969) were caused by a single process.

Note that in our matching paradigm, the target grating would be reduced in perceived size (due to size adaptation), whereas the matching grating would not. Could this mismatch between the perceived sizes of target and match (even when the physical sizes are identical) be somehow responsible for the low spatial frequency settings in Fig. 4(B)?
To test this possibility, our participants completed a control experiment in which the size of the matching stimulus was constant in all conditions. Participants matched the perceived spatial frequency of targets of different sizes to the perceived spatial frequency of a match of fixed size. There was no adaptation. The results (Fig. 4(C)) showed a significant main effect of target size (F(1.18, 8.28) = 15.17, p = 0.003, η_G² = 0.61, log₁₀BF₁₀ = 17.4) with small and large targets underestimated and slightly overestimated in spatial frequency, respectively. There was no effect of match size (F(1, 7) = 0.24, p = 0.64, η_G² = 0.002, log₁₀BF₁₀ = −0.69). There was a significant interaction (F(6, 42) = 5.35, p = 0.01, η_G² = 0.05, log₁₀BF₁₀ = −0.57) but the Bayes Factor score indicated greater evidence for the null hypothesis, and the result carries no particular theoretical importance in any case. This control experiment confirmed the perceived spatial frequency bias in the previous experiment (Fig. 4(B)) when target and match stimuli were different sizes, but in this case, it cannot be attributed to size adaptation since there was no adaptor. This suggests that at least some of the spatial frequency effect seen in Fig. 4(B) is a secondary effect deriving from a mismatch in the perceived sizes of the target and match stimuli. Thus, we are confident that the unexpected quirk in our data (Fig. 4(B)) does not undermine our investigation, and that the effect might be related to other spatial frequency biases that have been reported before (Georgeson, 1980; Harris and Wink, 2000). We also conclude that the size and spatial frequency adaptation aftereffects derive from different processes.

Computational modelling
To understand the relationship between size adaptation and population coding involving mechanisms with superimposing selectivity for size, we developed a computational model (described more fully in Appendix A). This was guided by earlier work on contrast perception (e.g. Meese and Summers, 2007; Meese and Baker, 2011; Meese et al., 2017), and devised to overcome the difficulties involved with a simple application of the population model for size coding as described earlier (Fig. 1).
For simplicity, our spatially-one-dimensional model (see Fig. 5(A)) takes stimulus contrast as input over the spatial extent of the stimulus¹ (i.e. it treats the envelope of the stimulus as a local measure of contrast),² regardless of spatial frequency and orientation. (Our model was not intended to explain the spatial frequency aftereffect, which has been modelled elsewhere (e.g. Klein et al., 1974), but could be extended to do this with little or no impact on our conclusions here.) These spatially distributed contrast responses (Moutsiana et al., 2016) are multiplied by each element in a one-dimensional 'size' array of Gaussian pooling mechanisms of various spatial extents (windows) (dashed red curve in Fig. 5(C)), each member of the array being subject to nonlinear gain control. This includes (i) self-suppression to protect the system's image contrast code from the influence of stimulus size above threshold (Meese and Baker, 2011) (green curve in Fig. 5(C)), (ii) a divisive surround suppression term (Sengpiel et al., 1998; Xing and Heeger, 2001; Cavanaugh et al., 2002; Webb et al., 2003), where each mechanism is suppressed by the largest mechanism in the array (black curve in Fig. 5(C)) to provide the initial basis for size coding (Meese and Baker, 2011), and (iii) a saturation constant (Z) that allows for the benefit of stimulus size at detection threshold (Meese and Summers, 2012). This has the same initial value for each mechanism (Meese, 2004) but is influenced by the mechanism's response to the adaptor (R) and the gain of adaptation (a) (Foley and Chen, 1997; Meese and Holmes, 2002) such that Z_j = 1 + aR_j, where j indexes the array (i.e. population) of size mechanisms. In general, with no adaptation this model arrangement produces a distribution of activity across the size array (model Layer 1) with the form of a blurred step edge (black curve in Fig. 5(C)). (See Supplementary Figs. S3-S5 and their captions for further discussion of the model development with and without adaptation.)
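The Layer 1 arrangement described above can be illustrated with a deliberately simplified toy. This Python sketch is not the model of Appendix A: the response equation R_j = A_j / (Z_j + A_j + A_largest), the absence of exponents, the un-normalised Gaussian windows and the spacing of window sizes are all assumptions made here for brevity. Only the adaptation rule Z_j = 1 + aR_j is taken from the text.

```python
import math

def drive(sigma, stim_width, contrast):
    """Contrast summed under a Gaussian window (s.d. sigma) for a centred uniform patch."""
    u = stim_width / (2.0 * math.sqrt(2.0) * sigma)
    return contrast * sigma * math.sqrt(2.0 * math.pi) * math.erf(u)

def layer1(sigmas, stim_width, contrast, z=None):
    """Toy gain control, R_j = A_j / (Z_j + A_j + A_largest); this form is assumed."""
    z = z if z is not None else [1.0] * len(sigmas)
    a = [drive(s, stim_width, contrast) for s in sigmas]
    a_largest = a[-1]   # drive of the largest mechanism (sigmas sorted ascending)
    return [a_j / (z_j + a_j + a_largest) for a_j, z_j in zip(a, z)]

def adapted_z(adaptor_responses, gain):
    """Z_j = 1 + a * R_j, the adaptation rule stated in the text."""
    return [1.0 + gain * r for r in adaptor_responses]

# Hypothetical array of window sizes: 0.125 to 16 deg in quarter-octave steps
sigmas = [0.125 * 2 ** (k / 4) for k in range(29)]

unadapted = layer1(sigmas, stim_width=1.0, contrast=0.5)
z_after = adapted_z(unadapted, gain=5.0)                 # adapt to the 1 deg patch
baseline_small = layer1(sigmas, stim_width=0.5, contrast=0.5)
adapted_small = layer1(sigmas, stim_width=0.5, contrast=0.5, z=z_after)
```

Even in this toy, responses rise with window size and then level off once the window exceeds the stimulus, giving a blurred step, and adaptation lowers every response by raising Z. Whether the toy reproduces the repulsive aftereffect is not claimed here.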
¹ Our model here is devised to deal with psychophysical stimuli: patches of luminance grating in a nominal world of otherwise uniform luminance. In this simple case, no stopping rule is needed for the contrast integration process at the stimulus boundary since the absence of stimulus contrast does the job. In the more general case, an additional process of image segmentation would be needed to identify the boundary of the integration process. We pick up on this point in the discussion.
² To a first approximation this simplification is equivalent to a measure of local (receptive field size) RMS contrast at each point across the image, which can be derived from the square root of the sum of the squares of a quadrature pair of filter outputs (i.e. a standard model complex cell). All of our stimuli were uniform patches of grating (albeit of various sizes). Thus, across most of the stimulus, the local RMS contrast is simply the contrast of the stimulus, which is uniform across the stimulus region and zero beyond the stimulus region (see footnote 1). By using the stimulus envelope as a proxy for this measure, we are ignoring minor edge effects (contrast blurring) that would be produced by first-order spatial filters had we included a convolution stage in the model.
First-order image contrast (Meese and Baker, 2011) (not our focus here) and stimulus size (Meese and Baker, 2011) are encoded by the height and location of the distribution of activity across Layer 1, respectively. Note that mechanisms with excitatory pooling regions greater than or equal to the size of the stimulus have comparable responses because they receive the same excitatory drives and suppression. This sets our Layer 1 population code apart from many others in visual perception, where the usual bandpass properties of the mechanisms involved (Fig. 1(A)) result in peaked (bell-shaped) distributions (Fig. 1(B)). Instead, the computational task here is to find where the response transition in the Layer 1 array is located (see black curve in Fig. 5(C)). Following the logic of edge-emphasis by lateral inhibition in the retina (Barlow, 1953), Layer 2 of our model takes a copy of the Layer 1 responses and subjects each to subtractive inhibition from its immediate neighbour, approximating first-order differentiation. This results in size-tuning of the population (blue curve in Fig. 5(C)), with perceived size being given by the peak of the spline-interpolated distribution (Marr and Hildreth, 1980) (see points in the upper row of Fig. 5(B)).
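The Layer 2 read-out (neighbour subtraction followed by peak finding) can be sketched in a few lines of Python. Two substitutions are made here and should be read as assumptions: a three-point parabolic fit stands in for the spline interpolation named in the text, and a synthetic logistic profile stands in for a real Layer 1 response. The function names are hypothetical.

```python
import math

def layer2(layer1_responses):
    """Each unit minus its smaller neighbour: a discrete first derivative."""
    return [b - a for a, b in zip(layer1_responses, layer1_responses[1:])]

def peak_location(values):
    """Sub-sample peak index from a parabola through the maximum and its two neighbours."""
    i = max(range(len(values)), key=values.__getitem__)
    if i == 0 or i == len(values) - 1:
        return float(i)
    left, mid, right = values[i - 1], values[i], values[i + 1]
    denom = left - 2.0 * mid + right
    return i + 0.5 * (left - right) / denom if denom != 0 else float(i)

# Synthetic blurred step (logistic, transition at index 10.3) standing in for Layer 1
step = [1.0 / (1.0 + math.exp(-(j - 10.3) / 1.5)) for j in range(21)]

# Difference k sits midway between units k and k + 1, hence the 0.5 offset
estimated_transition = peak_location(layer2(step)) + 0.5
```

The estimate lands close to the true transition of the synthetic step, which illustrates how the lateral position of a blurred step can be recovered from the peak of its first derivative.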
Adaptation is implemented by setting the saturation constant ($Z$; see Foley and Chen, 1997; Meese and Holmes, 2002) in the gain control of Layer 1 such that $Z_j - 1$ is proportional to the $j$th mechanism's response to the adaptor (see Appendix A for details). Adaptation changes the population size-tuning curves, such that the peaks of the tuning curves in Layer 2 shift away from the adapting stimulus (i.e. smaller sized stimuli produce peaks at smaller mechanism widths, and vice versa), as shown in the lower row of Fig. 5(B): the peaks (marked by the coloured dots) are repulsed laterally from the location of the adaptor following adaptation. The ratio of adapted to unadapted perceived size is shown in Fig. 5(D) (curve) and produces the same bidirectional size adaptation aftereffect that we observed empirically (symbols). Note that the decreases in size are around twice as large as the increases, and that this asymmetry is present in both model and human behaviour. Further investigation on the second author confirmed that the size reduction effects remained strong for even smaller targets, and that size increase effects returned to baseline for larger targets. This is also predicted by the model (see Supplementary Fig. S2).

Fig. 5 caption (fragment): ... N = 8 participants)). The curve is the model behaviour, and the symbols are the 1 deg results replotted from Fig. 3(A).

These unusual effects derive from natural asymmetries in our model's architecture. For example, within the mechanisms of Layer 1, adaptation is more influential for smaller integration windows because with less potential for the summing of signal (in the inhibitory pathway as well as the excitatory pathway) these mechanisms are more labile. This is to say that the impact of an increase in $Z$ (see Fig. 5(A)) is greater when the values of $A$ and $B$ ($A_1, A_2, \ldots, A_m$ and $B_1, B_2, \ldots, B_m$) are small than when they are large, and since $Z$ is the parameter that carries desensitisation by adaptation, aftereffects are larger for smaller mechanisms. Similarly, for any given mechanism, as target size becomes smaller than the integration window, the impact of adaptation increases for that mechanism. (See Supplementary Figs. S4 and S5 and their captions for further discussion.) In general, the behaviour of our model derives primarily from its architecture; the black curve in Fig. 5(D) was derived by setting a single adaptation parameter (a) by eye (see Methods), yet the model describes the key features of our results (i.e. their bidirectional asymmetries) very well.
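The argument that mechanisms with little summed signal are more labile can be checked with simple arithmetic. The sketch below assumes a generic divisive gain control of the form r = A / (Z + B), where A and B stand for the pooled excitatory and suppressive signals of Fig. 5(A). This is a simplification rather than the model equation itself, and the function name and numerical values are hypothetical.

```python
def response_ratio(b, z_adapted, z_baseline=1.0):
    """Adapted / unadapted response for the assumed form r = A / (Z + B).

    The excitatory term A cancels in the ratio, so the result depends only
    on the suppressive pool B and the two saturation constants.
    """
    return (z_baseline + b) / (z_adapted + b)

# Hypothetical case: adaptation raises Z from 1 to 2 in both mechanisms.
small_pool = response_ratio(b=0.1, z_adapted=2.0)    # little summed signal
large_pool = response_ratio(b=100.0, z_adapted=2.0)  # a lot of summed signal

print(f"small pooling window: response falls to {small_pool:.0%} of baseline")
print(f"large pooling window: response falls to {large_pool:.0%} of baseline")
```

With these illustrative numbers the same rise in Z roughly halves the response of the mechanism with the small pool, but changes the response of the mechanism with the large pool by about 1%.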
Note that the first layer of our model is consistent with our motivating work on area summation of luminance contrast at threshold and above (Meese, 2004;Meese and Summers, 2007;Meese and Baker, 2011), extending it here by pooling over first-order spatial frequency and orientation, the details of which no doubt deserve further investigation. The second, subtractive, layer is a novel extension designed to deliver the retinal image-size code.

DISCUSSION
We have used an adaptation paradigm to provide the first psychophysical evidence for size mechanisms in human vision, and supported this with a computational model. Other studies have also shown distortions of perceived object shape or size, including various size adaptation aftereffects (Köhler and Wallach, 1944; Pooresmaeili et al., 2013; Kreutzer et al., 2015; Zimmermann et al., 2016; Laycock et al., 2017; Zeng et al., 2017; Altan and Boyaci, 2020). However, none of these studies have demonstrated that vision involves neural mechanisms for size coding. For example, where static luminance-defined shapes or contours have been used, perceptual distortions can be attributed to spatial repulsion effects (Ganz, 1964) or to adaptation of low spatial frequency mechanisms (Blakemore and Sutton, 1969). In some cases, the involvement of retinal afterimages might also have been important (see Köhler and Wallach, 1944). In contrast, our own experiments are the first ones designed to investigate the size-mechanism hypothesis directly, ruling out the possibilities above by using (i) a rapid spatially jittering adaptor and (ii) stimuli that are narrowband in spatial frequency. This approach decouples object size from (i) retinal contour location and (ii) carrier (or dominant object) spatial frequency. We have also placed our results in a computational context developed here by building in a natural way on known cortical physiology and a large body of previous psychophysical work on early vision. We know of no other image-driven model of size perception that has done this. Our model accommodated our hypothesised bidirectional adaptation aftereffects for size, and in this respect the work here delivers a successful test of our proposal about size-coding that emerged from work on image contrast (Meese and Baker, 2011). But more than that, two asymmetrical features of our results that we had not anticipated are emergent properties of our model and provide further support for the scheme we have been advocating.
Our work is complemented by several other studies and observations. For example, the perceived aspect ratio of a shape can be distorted by adaptation, appearing narrower or wider when observers were adapted to other shapes (Storrs and Arnold, 2017) or to large grating patterns (Frome et al., 1979). Haptic size adaptation has also been demonstrated using paradigms where participants grasped objects of different sizes (Walker and Shea, 1974). And more generally, strong simultaneous effects on perceived size can also be induced by surrounding elements that are either smaller or larger than a central target, such as in the well-known Ebbinghaus illusion (Ebbinghaus, 1902).
Our focus here has not been on temporal dynamics, but in additional pilot experiments conducted on the second author, we found that the size effect increased monotonically during the first 16 s of adaptation but was constant thereafter. Size perception returned to baseline by around 60 s after the offset of the adaptor. This rapid attack and long persistence mean that judgements of object size in natural environments might be affected in repetitive tasks such as fruit picking or production line work, for example. There might also be important clinical implications relating to judgements of body size in patients with eating disorders (Challinor et al., 2017).
Our motivation for this work was the prediction that populations of neurons which pool image texture across various regions of the visual field might code for object size (Meese and Baker, 2011). Receptive fields in V1 are too small for this, but neurons with large receptive fields that also exhibit suppression effects are found in extra-striate visual areas such as V4 (Desimone and Schein, 1987;Pollen et al., 2002) and have many of the required properties. Results from fMRI have shown that adaptation can reduce or increase the area of V1 activated by a stimulus (Pooresmaeili et al., 2013) and this might suggest that our own adaptation effects occur at an earlier stage than we suppose. However, the slow time-course of fMRI means that feedback from later stages might also be involved in these imaging results. A recent study applying TMS to lateral occipital cortex (Zeng et al., 2020) found that size judgements were disrupted at an earlier time point than when TMS was applied to early visual cortex, consistent with this feedback hypothesis.
Our modelling shows how mutually inhibitory mechanisms that pool over different regions of the visual field can produce an adaptable population code for size, but several developments are needed before a more comprehensive understanding of perceived object size can emerge. For simplicity, our current model is one-dimensional and (quite straightforwardly) would need to be extended to the two spatial dimensions of the retina to have general applicability. More challenging is that our current model has no method for segmenting the background to deliver a stopping rule for contrast integration; this shortfall is owing to the simplicity of our psychophysical constraints. However, our work is not undermined by this, and remains valid so long as the segmentation task can be achieved. Everyday observations serve as an existence proof for this, and our finding that size adaptation aftereffects extend across the orientation and spatial frequency of luminance modulations points to the sort of general second-order process that we should expect (see also Richard et al., 2019): one that operates on the envelope of pooled local contrasts (e.g. the boundary of an object). Furthermore, visual neurons in V2 and V4 (Zhou et al., 2000) are known to have the border ownership properties that might be an important part of this process, the details of which continue to be investigated (von der Heydt and Zhang, 2018). Finally, our experiments did not distinguish between image size (which depends on viewing distance) and physical object size (which does not), so whether the process identified here comes before or after the processes of depth perception and size constancy remains to be elucidated. We hope that future work in neuroimaging and neurophysiology, as well as psychophysics, will help to illuminate these issues.

APPENDIX A. A COMPUTATIONAL MODEL OF SIZE ADAPTATION
We constructed a computational model to simulate the size adaptation experiments. For simplicity, the model operates along only one spatial dimension with i = 1 to n pixels (n = 8,192). This gives a notional resolution of 256 pixels per degree. This is higher than in the experiments (64 pixels per degree), which were limited by the pixel resolution of the display. The model has a population of j = 1 to m pooling mechanisms (m = 91) defined by Gaussian spatial profiles ($G_{ij}$) with standard deviations ranging from 4 to 2048 pixels (or approximately 1 arcmin to 8 degrees) in logarithmic steps of the index j. Model inputs were rectangular functions (see Fig. 5(A)) with unity height and widths between 16 and 4096 pixels (3.75 arcmin to 16 degrees; 17 sizes in log steps). Half a cycle of a raised cosine function (16 pixels, or 3.75 arcmin wide, consistent with the stimuli in our experiments) was added to each end of these functions to produce profiles with smooth edges. These inputs describe the contrast envelope of our stimuli across space which, to a first approximation, is equivalent to the output of an array of standard model complex cells.
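The stimulus and mechanism parameters listed above can be laid out as a short sketch. Several details are assumptions made only for illustration: the function names are hypothetical, the Gaussians are given unit height (the text does not state a normalisation), non-integer stimulus widths are rounded to whole pixels, and the raised-cosine ramps are placed outside the rectangular plateau.

```python
import math

N_PIXELS = 8192     # n: length of the one-dimensional spatial array
PIX_PER_DEG = 256   # notional resolution of the model

def log_steps(first, last, count):
    """`count` values from `first` to `last`, equally spaced on a log axis."""
    ratio = (last / first) ** (1.0 / (count - 1))
    return [first * ratio ** k for k in range(count)]

def gaussian_profile(sd, n=N_PIXELS):
    """Gaussian pooling window centred in the array (unit height assumed)."""
    mid = (n - 1) / 2.0
    return [math.exp(-0.5 * ((i - mid) / sd) ** 2) for i in range(n)]

def envelope(width, ramp=16, n=N_PIXELS):
    """Unit-height rectangle of `width` pixels, centred in the array, with a
    half-cycle raised-cosine ramp of `ramp` pixels added to each end."""
    start = (n - width) // 2
    env = [0.0] * n
    for i in range(start, start + width):
        env[i] = 1.0
    for k in range(ramp):
        # Rises from near 0 (far from the plateau) to near 1 (next to it).
        level = 0.5 * (1.0 - math.cos(math.pi * (k + 0.5) / ramp))
        env[start - ramp + k] = level               # left ramp
        env[start + width + ramp - 1 - k] = level   # right ramp (mirror image)
    return env

sds = log_steps(4, 2048, 91)                             # mechanism SDs (pixels)
widths = [round(w) for w in log_steps(16, 4096, 17)]     # stimulus widths (pixels)
env = envelope(widths[8])                                # the middle stimulus size

print(f"{len(sds)} mechanisms, step = {math.log2(sds[1] / sds[0]):.2f} octaves")
print(f"{len(widths)} stimuli, {widths[0] / PIX_PER_DEG * 60:.2f} arcmin "
      f"to {widths[-1] / PIX_PER_DEG:.0f} deg")
```

Spanning 4 to 2048 pixels (nine octaves) in 91 values implies steps of 0.1 octave between neighbouring mechanisms, and spanning 16 to 4096 pixels (eight octaves) in 17 values implies half-octave steps between stimulus sizes.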
(Recall that stimulus and adaptor contrasts were fixed and identical in our experiments.) Stimulus response was calculated by multiplying model inputs by each pooling mechanism, and their products were passed through a nonlinear gain control equation (Layer 1), in which $C_i$ is the model input (the contrast envelope) at each pixel location ($i$). Note that the contrast terms are exponentiated before summation, as required by previous work. The exponents $p$ and $q$ were fixed at 2.4 and 2 respectively, also based on previous work (Legge and Foley, 1980; Summers, 2007, 2012). $Z_j$ is a saturation constant with a value of 1 in the absence of adaptation (described further below). The term that includes $G_{im}$ represents surround suppression from the largest mechanism in the population (indexed by $m$). The $\Sigma$ symbol denotes summation across space ($i = 1$ to $n$). All stimuli and pooling mechanisms were centred in the middle of the spatial array (of size $n$).
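For reference, one gain control equation that is consistent with every definition in the paragraph above can be written as follows. It is a sketch under stated assumptions rather than a quotation of the published equation: the response symbol $r_j$ is a hypothetical name, the self-suppression term pooled through $G_{ij}$ is inferred from the Results text (which refers to summing of signal in the inhibitory as well as the excitatory pathway), and the adaptation rule simply restates that $Z_j - 1$ is proportional to the response to the adaptor, with the constant of proportionality assumed to be the single adaptation parameter $a$.

```latex
% Assumed Layer 1 gain control for mechanism j (p = 2.4, q = 2)
r_j = \frac{\sum_{i=1}^{n} C_i^{\,p}\, G_{ij}}
           {Z_j + \sum_{i=1}^{n} C_i^{\,q}\, G_{ij} + \sum_{i=1}^{n} C_i^{\,q}\, G_{im}}

% Assumed adaptation rule: Z_j = 1 without adaptation
Z_j = 1 + a\, r_j^{\mathrm{adaptor}}
```

Under this form, all mechanisms whose pooling windows exceed the stimulus receive nearly the same excitatory and suppressive sums, which is what produces the plateau on the large-mechanism side of the blurred step in Layer 1.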