The computation of relative numerosity, size and density

Highlights
• Observers could accurately discriminate dot number in irregularly shaped arrays.
• Accuracy was less than that previously reported for regularly shaped arrays.
• Observers could discriminate changes of dot density from changes in array size.
• Results are consistent with a computation of number from size and density.


Introduction
Relative numerosity discrimination has been studied experimentally in adults (Burr & Ross, 2008; Durgin, 1995, 2008; Ross & Burr, 2010), infants (Xu & Spelke, 2000), and non-human species (Brannon et al., 2001; Gallistel, 1989; Leslie, Gelman, & Gallistel, 2008), using psychophysics (Barlow, 1978), fMRI (Harvey et al., 2013; Piazza et al., 2007), and single unit physiology (Nieder, 2005). It has been suggested that there is a 'visual sense of number' (Burr & Ross, 2008) and that 'Vision senses number directly' (Ross & Burr, 2010) for large numbers of tokens. Here we attempt to discover whether there is indeed a mechanism for numerosity separate from density and size of textures. A commonly used strategy for measuring relative numerosity thresholds is to scatter the tokens within a confined area, such as a circle (Burr & Ross, 2008; Durgin, 1995; Raphael, Dillenburger, & Morgan, 2013). In these circumstances, changing the number of tokens must change either the area of the pattern or the density of items. Weber fractions for numerosity are lower when the numerosity change is accompanied by a change in area (Raphael, Dillenburger, & Morgan, 2013), in agreement with other studies showing that a high-precision, one-dimensional mechanism is responsible for area discrimination of circles (Morgan, 2005; Nachmias, 2011). Therefore, experiments with circular textures may overestimate the accuracy of true numerosity discrimination. Randomly interleaving size-varying and density-varying trials (Burr & Ross, 2008; Raphael, Dillenburger, & Morgan, 2013) does not solve this problem, since observers may use whichever of the two independently noisy signals, size or density, is larger on a particular trial (Raphael, Dillenburger, & Morgan, 2013). For these reasons, we thought it desirable to repeat the experiment of Burr and Ross (2008) using stimuli with non-circular polygonal outlines (Fig. 1).
We compared four conditions: (1) density-varying trials alone; (2) area-varying trials alone; (3) interleaved area-density trials in which observers made a numerosity discrimination; and (4) the same as condition 3, except that observers also had to decide whether the difference was in area or in density. We expected area thresholds for random polygons to be higher than those for circles, and the first question was whether this would also raise thresholds for numerosity. In additional conditions subjects discriminated changes in density or changes in size when numerosity was constant.
In signal-detection models of the data, we asked whether independently noisy area and density channels were sufficient to account for the data, or whether a separate numerosity mechanism is required. We addressed this question by comparing two-channel vs three-channel fits to the combined data in all conditions.

Stimuli and procedure
Examples of the stimuli are shown in Fig. 1. Stimuli were presented on the LCD display of a MacBook Pro laptop computer with screen dimensions 33 × 20.7 cm (1440 × 900 pixels) viewed at 0.57 m so that 1 pixel subtended a visual angle of 1.25 arcmin. The background screen luminance was 50 cd/m². Stimulus presentation was controlled by MATLAB and the PTB3 version of the Psychtoolbox. On each trial subjects saw two stimuli consecutively, which they were required to compare for number, density or size.
Each stimulus contained a number of fuzzy dots with a diameter of 10 arcmin and a Gaussian envelope with a space constant of 2.5 arcmin. Each dot was randomly assigned a negative (black, 0.4 cd/m²) or positive (white, 300 cd/m²) contrast. The dots were randomly positioned within notional polygons without overlap. The irregular polygon shapes were generated by an algorithm that pseudo-randomly varied the position and number of vertices of each polygon on each trial. In all conditions the standard stimulus contained 64 dots within the standard area of 50,000 pixels, which corresponds to a circular area of 2.63° radius. An example is shown in the center panel of Fig. 1. The standard and the test stimuli were presented for 0.5 s each in random order (2AFC). Between the two intervals a gray blank screen with a central fixation cross was shown for 0.75 s. After each stimulus pair a key press was awaited while only the fixation cross was presented. The test and standard positions were separately offset from the fixation point to avoid interference by afterimages and to prevent the observer from using landmarks on the screen for size judgments. The offset was randomly selected in both the horizontal and vertical directions from a uniform distribution with a width of 75 arcmin (60 pixels). The test stimulus varied either in texture size, with dot density kept constant at the level of the standard (left panel of Fig. 1), or in dot density, with size kept constant at 2.63° radius (right panel). The number of dots co-varied with size or density, respectively. The deviation in either texture size or density relative to the standard patch was chosen by an adaptive procedure (Watt & Andrews, 1981) in steps of 4%. The procedure was designed to obtain the 50% point (μ) and the standard deviation (σ) of the psychometric function efficiently by concentrating cue values at μ ± σ.
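As a quick consistency check on the stimulus geometry, the following Python sketch converts the pixel quantities quoted above into visual angle. It assumes only the values stated in the text (1.25 arcmin per pixel, a standard area of 50,000 pixels, a 60-pixel offset range, 10-arcmin dots); the constant and function names are illustrative and are not taken from the original MATLAB code.

```python
import math

ARCMIN_PER_PIXEL = 1.25    # stated in the text for the 0.57 m viewing distance
STANDARD_AREA_PX = 50_000  # standard stimulus area in pixels (stated in the text)

def equivalent_radius_deg(area_px, arcmin_per_px=ARCMIN_PER_PIXEL):
    """Radius (in degrees) of the circle whose area equals area_px pixels."""
    radius_px = math.sqrt(area_px / math.pi)
    return radius_px * arcmin_per_px / 60.0

# Radius of the circle matched in area to the standard polygon.
print(round(equivalent_radius_deg(STANDARD_AREA_PX), 2))  # 2.63 (degrees)

# Width of the position-offset distribution: 60 pixels in arcmin.
print(60 * ARCMIN_PER_PIXEL)  # 75.0

# Dot diameter of 10 arcmin expressed in pixels.
print(10 / ARCMIN_PER_PIXEL)  # 8.0
```

The first value reproduces the 2.63° radius quoted for the standard area, and the second reproduces the 75 arcmin offset width.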
Similar to the experiment with circular structures described in Raphael et al. (2012), the following conditions and Trial Types were used. We use 'Condition' to refer to a block of trials containing the same Task and one or two Trial Types, and 'Trial Type' to refer to the kinds of trial within a block.
The 'Density Condition' consisted of blocked density-varying trials where the area of the test was the same as the standard and the density of dots co-varied with the number. Observers estimated the differences in density between the test and standard patch. Similarly, the 'Size Condition' consisted of size-varying trials where the density of the dots in the test was the same as in the standard, and the area was adjusted to accommodate the greater or smaller number of dots at that fixed density. Here, observers were asked to estimate the differences in texture area. In both conditions, size-varying and density-varying trials were presented in separate blocks and observers made a binary choice: 'denser'/'less dense' or 'larger'/'smaller'. In a modified Size Condition, the 'Outline Size Condition', only the outline of the polygon shape was shown, without dots. Here, observers compared the area of the test stimulus with the area of the standard.
In the 'Mixed Task Condition' and in the 'Numerosity Condition', size-varying and density-varying trials were randomly interleaved. In the 'Mixed Task Condition' observers were asked which kind of difference (size or density) was present, and the direction of change. In the 'Numerosity Condition' the observers had only two keys available, to indicate which stimulus had more dots (numerosity discrimination).
Since we cannot prevent observers in the Density and Size Conditions from using numerosity as a cue (because both signals co-vary with numerosity), we introduced further conditions to estimate sensitivity to size ('Extended Size Condition') and density ('Extended Density Condition') changes alone. Here, we introduced a Trial Type for which the number of dots was kept constant at 64 in each stimulus, with the size and density of the test varying in opposite directions. Hence, in the Extended Size Condition, in half of the trials a larger stimulus coincided with lower density and, in the other half, a larger stimulus coincided with higher numerosity but the same density as the standard. The aim of this arrangement was to prevent observers from using numerosity as a reliable cue to estimate the density or size of the texture.
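The relation between the Trial Types can be made concrete with a small Python sketch based on the identity number = density × area. The standard values (64 dots, 50,000 pixels) come from the text; the function name, the Trial Type labels and the rounding of dot number to an integer are assumptions made for illustration only.

```python
STANDARD_DOTS = 64
STANDARD_AREA = 50_000.0                          # pixels
STANDARD_DENSITY = STANDARD_DOTS / STANDARD_AREA  # dots per pixel

def make_test_stimulus(trial_type, change):
    """Return (area, density, n_dots) of a test stimulus.

    change is the fractional deviation from the standard, e.g. 0.08 for +8%.
    Labels and rounding are illustrative assumptions, not the authors' code.
    """
    factor = 1.0 + change
    if trial_type == "size":                # density fixed, number follows area
        area, density = STANDARD_AREA * factor, STANDARD_DENSITY
    elif trial_type == "density":           # area fixed, number follows density
        area, density = STANDARD_AREA, STANDARD_DENSITY * factor
    elif trial_type == "constant_number":   # area up, density down (or vice versa)
        area, density = STANDARD_AREA * factor, STANDARD_DENSITY / factor
    else:
        raise ValueError(f"unknown trial type: {trial_type}")
    return area, density, round(area * density)

for kind in ("size", "density", "constant_number"):
    area, density, n_dots = make_test_stimulus(kind, 0.08)
    print(kind, round(area), round(density * 1e5, 1), n_dots)
```

For a change of +8% (two steps of the 4% staircase), the size and density Trial Types both yield 69 dots, whereas the constant-number Trial Type keeps 64 dots, so dot number carries no information on those trials.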
An overview of all conditions is given in

Fig. 1 (caption, partial). Right: Test stimulus with larger area than the standard but the same density. The shapes were generated by an algorithm that randomly varied the position and number of vertices in the polygon while keeping area constant.
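The stimulus-generation algorithm itself is not specified in the text. The Python sketch below shows one possible way to obtain an irregular polygon of fixed area and to place non-overlapping dots inside it. Everything about the construction is an assumption: jittered vertices around a centre, rescaling to the target area with the shoelace formula, and rejection sampling of dot centres with a minimum separation of one dot diameter (8 pixels, i.e. 10 arcmin). All function names are hypothetical.

```python
import math
import random

def polygon_area(pts):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def random_polygon(target_area, rng, min_vertices=5, max_vertices=9):
    """Irregular polygon with a random vertex count, rescaled to target_area."""
    n = rng.randint(min_vertices, max_vertices)
    pts = []
    for i in range(n):
        # Jittered angles stay in increasing order, so the outline cannot cross itself.
        angle = 2.0 * math.pi * (i + rng.uniform(-0.3, 0.3)) / n
        radius = rng.uniform(0.5, 1.5)
        pts.append((radius * math.cos(angle), radius * math.sin(angle)))
    scale = math.sqrt(target_area / polygon_area(pts))
    return [(x * scale, y * scale) for x, y in pts]

def contains(poly, x, y):
    """Ray-casting point-in-polygon test."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def place_dots(poly, n_dots, min_dist, rng, max_tries=100_000):
    """Rejection-sample dot centres inside poly, at least min_dist apart."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    dots = []
    for _ in range(max_tries):
        if len(dots) == n_dots:
            return dots
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if contains(poly, x, y) and all(
            math.hypot(x - dx, y - dy) >= min_dist for dx, dy in dots
        ):
            dots.append((x, y))
    raise RuntimeError("could not place all dots")

rng = random.Random(1)
standard = random_polygon(50_000.0, rng)      # standard area in pixels
dots = place_dots(standard, 64, 8.0, rng)     # 64 dots, 8-pixel spacing
print(len(standard), round(polygon_area(standard)), len(dots))
```

Only dot centres are constrained to lie inside the outline in this sketch, so a dot near an edge may extend slightly beyond it; whether the original algorithm allowed this is not stated.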
Each observer completed 5 sessions of each condition, except for the Mixed Task Condition, for which 7 sessions were completed. In the Single Trial Type conditions (Size, Density and Outline Size Conditions) each session contained 128 trials, giving 640 trials per observer and condition. In the Mixed Task, Extended Size, Extended Density and Numerosity Conditions two different Trial Types were interleaved, giving at least 1280 trials per observer and condition.
Five observers took part in the experiments: three with experience in psychophysical experiments (Observers 1-3, aged 34, 37 and 70) and two with no previous psychophysical experience (Observers 4 and 5, aged 19 and 39). The vision of subjects 1, 3, 4 and 5 was normal or corrected to normal. Subject 2 was a myope with −0.75 D/−0.75 D. Subjects 1-4 also took part in the numerosity experiments presented in Raphael, Dillenburger, and Morgan (2013) and therefore had some experience in numerosity, size, and density judgment. The experiments were carried out in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki) and informed consent was obtained from the human participants.

Data analysis and modelling
Individual psychometric functions representing the percentage of 'larger'/'denser'/'more' responses for each condition and Trial Type were fit, using the MATLAB 'fminsearch' procedure, by cumulative Gaussian functions with parameters μ (50% point) and spread σ (standard deviation). 95% confidence limits for the individual points on the psychometric functions and those for the fitted parameters of the psychometric functions were obtained by a bootstrapping procedure with 640 simulations. The best-fitting parameters of the psychometric functions and their confidence intervals for all subjects and tasks are given in Tables 2 and 4.
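A minimal Python sketch of this kind of fit is given below. It is not the authors' MATLAB code: a coarse grid search stands in for 'fminsearch', the bootstrap is assumed to be parametric (the text does not say which variant was used), and the data are synthetic values generated from known parameters rather than observer data. All function names are illustrative.

```python
import math
import random

def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian psychometric function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def log_likelihood(data, mu, sigma, eps=1e-9):
    """Binomial log likelihood; data is a list of (cue, n_yes, n_trials)."""
    total = 0.0
    for x, k, n in data:
        p = min(max(cum_gauss(x, mu, sigma), eps), 1.0 - eps)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def fit_grid(data, mus, sigmas):
    """Maximum-likelihood (mu, sigma) over a grid (stand-in for fminsearch)."""
    best = max((log_likelihood(data, m, s), m, s) for m in mus for s in sigmas)
    return best[1], best[2]

def bootstrap_sigma_ci(data, mus, sigmas, n_boot=640, seed=0):
    """95% percentile interval for sigma from a parametric bootstrap."""
    rng = random.Random(seed)
    mu_hat, sigma_hat = fit_grid(data, mus, sigmas)
    sims = []
    for _ in range(n_boot):
        fake = []
        for x, _k, n in data:
            p = cum_gauss(x, mu_hat, sigma_hat)
            fake.append((x, sum(rng.random() < p for _i in range(n)), n))
        sims.append(fit_grid(fake, mus, sigmas)[1])
    sims.sort()
    return sims[int(0.025 * n_boot)], sims[int(0.975 * n_boot) - 1]

# Synthetic example: 7 cue levels, 100 trials each, true mu=0.02, sigma=0.15.
cues = [-0.24, -0.16, -0.08, 0.0, 0.08, 0.16, 0.24]
data = [(x, round(100 * cum_gauss(x, 0.02, 0.15)), 100) for x in cues]
mus = [i / 100 for i in range(-10, 11)]
sigmas = [i / 100 for i in range(5, 41)]
print(fit_grid(data, mus, sigmas))
print(bootstrap_sigma_ci(data, mus, sigmas, n_boot=100))  # the text used 640
```

The fitted σ is the threshold expressed as a Weber fraction when the cue is a fractional change relative to the standard.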
Table 2. The table shows fitted values of the mean (μ) and standard deviation (σ) of the psychometric functions obtained under the conditions shown in the column headings. Also shown are the log likelihoods (L) of the separate fits (3rd row for each subject) and the Weber Fraction and the log likelihood of the combined size and density varying trials fit (4th row). The last row for each subject shows the χ² of the likelihood ratio test for the difference between the individual fits for Size and Density and their combined fit and its significance level (* p < 0.05, ** p < 0.01, *** p < 0.001). The last column of the Single Trial Type conditions shows the χ² values for the likelihood ratio test between size judgments with dots and with the polygon outline.

To see if there were any statistically significant effects of Trial Type, a Chi-squared test based on likelihood ratios was used. A fit to the combined data over Trial Types using different values of σ for the two Types was compared to a fit using the same value of σ. To assess differences in the slope of the psychometric functions (σ) alone, the psychometric functions were allowed to have different values of μ. The likelihoods and the derived Chi-squared values of these pairwise comparisons are shown in Table 2. Twice the difference in log likelihoods between the two fits was assumed to be distributed as Chi-squared with 1 degree of freedom (Hoel, Port, & Stone, 1971). If the two-σ fit is significantly better than the one-σ fit, we can conclude that the thresholds for the two Trial Types (density or size changes with numerosity) are significantly different. Signal detection models were fit to all the data of a single subject under all the experimental conditions. Details are given at the appropriate point in the paper.
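The likelihood ratio comparison can be sketched in a few lines of Python. For 1 degree of freedom the Chi-squared tail probability has the closed form erfc(√(χ²/2)), so only the standard library is needed. The function name and the log-likelihood values in the example are hypothetical and are not taken from the tables.

```python
import math

def likelihood_ratio_test_df1(loglik_full, loglik_restricted):
    """Return (chi2, p) for two nested fits that differ by one parameter.

    chi2 is twice the difference in log likelihoods; p is the upper tail of
    the Chi-squared distribution with 1 degree of freedom.
    """
    chi2 = 2.0 * (loglik_full - loglik_restricted)
    p = math.erfc(math.sqrt(max(chi2, 0.0) / 2.0))
    return chi2, p

# Hypothetical log likelihoods of a two-sigma fit and a one-sigma fit.
chi2, p = likelihood_ratio_test_df1(-350.2, -353.6)
print(round(chi2, 2), round(p, 4))
```

A χ² of about 3.84 corresponds to p = 0.05 and a χ² of about 6.63 to p = 0.01, which are the criteria behind the asterisks in the tables.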

Results
Thresholds (Weber Fractions) are shown separately for the different conditions, Trial Types and observers in Fig. 2 and Table 2. The observers' thresholds (σ) are shown by circles with error bars indicating 95% confidence limits. The colored bars show predictions of a 2-channel size-density model, to be described below. The horizontal lines depict predictions of a 3-channel size-density-numerosity model.
For a given observer, thresholds are similar across tasks and conditions, and observers who are relatively good at one task are also good at the others. The most obvious feature of the data is that thresholds are similar for size, density and numerosity, both in Single Trial Type and mixed conditions (first 6 data points, reading from the left of the figure). Unlike the previously reported results for circular textures (Ross & Burr, 2010; Raphael, Dillenburger, & Morgan, 2013), numerosity thresholds were not lower in the size-varying than in the density-varying conditions. The exception to this simple rule is in the 'Extended Size' and 'Extended Density' conditions. Here, thresholds are lower when size/density co-varies with numerosity than when size and density change in opposite directions, so as to keep number constant. Finally, thresholds are higher in the Outline Size Condition, when only the outline of the polygon changes in size, than in all the other conditions, when dots are present within the outline.
These findings are confirmed by the statistical pairwise comparisons in Table 2. Fig. 3 and Table 3 compare the Weber fractions of the polygon experiment against those of the previously published circle experiment for the four subjects that took part in both experiments (Raphael, Dillenburger, & Morgan, 2013). Thresholds on size-varying trials are indeed lower for circular textures than for polygons in all conditions.
In Fig. 4, which shows the Weber Fractions of size-varying trials against density-varying trials for each observer and condition, it can be seen that the thresholds for size-varying and density-varying trials in the size, density and number conditions are indeed similar. In trials of the Extended Conditions in which numerosity is kept constant and density and size vary reciprocally, discrimination is significantly impaired (significant in 9 out of 10 comparisons, 5 observers × 2 Trial Types). However, comparing the Single Trial Type conditions for size and density with the same Trial Type of the Extended Conditions of size and density reveals no increase in threshold for size judgment. Hence, observers are not worse at density and size judgments when stimuli are interleaved with trials that offer a less reliable cue.

Figure caption (partial). The right hand column shows predictions of the model with separate channels for density, size and number (μ_S, σ_S, μ_D, σ_D, μ_N, σ_N). The fits of the models are constrained by the data of all conditions. This is why some of the fitted values are markedly discrepant with some data points. The symbols have been joined by lines solely for convenience of reading; in reality there is no continuity between the different conditions.
When only the outlines of the polygons but no dots were shown the threshold for Size judgment increases markedly (significantly in 3 out of 4 observers) relative to the case where changes in size co-vary with dot number. The Weber Fractions in the outline Size Condition resemble the thresholds of the Extended Size Condition when numerosity is kept constant and does not co-vary with patch size. This suggests that the outline condition gives a true measure of size discrimination, when it is not aided by concomitant changes in density and/or number. Size discrimination of polygons thus appears to be relatively poor, as would be expected from the finding that the most accurate forms of 2-D size discrimination are obtained by combining 1-D estimates (Morgan, 2005).
The simplest explanation of the identity of thresholds between size-, density- and number-varying conditions is that the same mechanism is being used in each case: a relative numerosity system. However, a single-channel model for all the data is ruled out by a number of facts. One is that observers were able to discriminate changes in size and density in the Mixed Task Condition (see Fig. 5). Performance in this 4-button task, as measured from the slope of the psychometric functions, was not as good as that for discriminating the direction of the numerosity change (increase vs decrease), but this is to be expected from a 2-channel model where there are separate channels for size and density, as we show in the Modelling section below.

Fig. 3 (caption, partial). … Trial Type size and density varying trials. Squares: size and density varying trials are randomly interleaved and the subject has to indicate not only in which direction the test stimulus is changed but also whether it differs in size or density (Mixed Task). Downward pointing triangles: size and density varying trials are randomly interleaved and the subject has to indicate whether the test has more or fewer dots than the standard (Numerosity Condition). Thresholds for size are raised in the Polygon experiment, but there is no systematic change for density. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 3. The table shows the mean (μ) and standard deviation (σ) of the fitted psychometric functions and the log likelihood (L) of the fits. Also shown in the column labeled Mean/L are the values of μ and σ for the combined fits of the circle and polygon data and the associated log likelihood. The column headed χ² shows the results of the likelihood ratio test for the difference between the individual fits for circles and polygons and their combined fit, and the rightmost column shows the significance for df = 1.
Another problem for a single-channel numerosity model is that observers could still discriminate changes in size/density in the Extended Condition where numerosity was held constant, albeit with decreased accuracy versus the condition where numerosity changed also. Finally, observers could discriminate changes in size of an outline figure, where the numerosity signal was absent, albeit with reduced sensitivity. These considerations suggest that a hybrid model may be required to explain the full range of data, incorporating size, density and numerosity mechanisms (channels). But if this approach is to be taken, it is necessary to show that a 3-channel model is significantly better than a 2-channel (size/density) model, taking into account the greater number of parameters in the 3-channel case. We address this question rigorously in the following Modelling section.

Two-versus three-channel models
We consider whether a numerosity channel is warranted by the data, or whether independent size and density channels will suffice. We note that in all conditions where numerosity was varied, size or density was varied as well: therefore a two-channel model deserves consideration on grounds of parsimony. To model the data we used the data from all Trial Types and Conditions, including the Extended Conditions and the Outline Size Condition. An example of the full data set for one observer and the fits of two models are shown in Fig. 6. The modelling results for all observers are shown in Fig. 2. In what we shall call the '2-channel model' we assumed two independent mechanisms or channels, one responsive to size changes and the other to density. In Single Trial Type conditions the observer monitors only the relevant channel. Otherwise the observer is assumed to monitor two different noisy signals for size and density, each defined by the shift (μ) and the standard deviation (σ) of the psychometric function (μ_S, σ_S, μ_D, σ_D), and to choose the one that deviates most from its reference value (Green & Swets, 1966; Palmer, Ames, & Lindsey, 1993; Palmer, Verghese, & Pavel, 2000; Morgan & Solomon, 2006).

Fig. 5. The figure shows that subjects were able to discriminate (Red) changes of size from changes in density when they were randomly interleaved (Mixed condition). Also shown (Blue) is the ability to discriminate the direction of the numerosity change (increase vs decrease). The first five panels show the data for each of the observers separately and the bottom right panel shows all the data combined over subjects. Error bars show 95% confidence limits based on the binomial distribution. The continuous curves are the best-fitting cumulative Gaussians.

The signals between which the observer chooses are assumed to be normalized by their standard deviation.
This is particularly important when the two channels have different noise levels, since the observer would otherwise be strongly biased towards choosing the noisier density signal (Raphael, Dillenburger, & Morgan, 2013).
To model the data in the Extended Size and Extended Density Conditions, we assumed that observers monitored only the relevant channel (size or density), as in the Single Trial Type conditions. In the Outline Size Condition, the observer monitors only the size channel.
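The MAX decision rule with normalized signals can be illustrated by a small Monte Carlo simulation in Python. This is a sketch under simplifying assumptions, not the fitted model: the channels are independent Gaussians with zero bias (μ = 0), there is no cross-talk, and the function names, noise levels and trial counts are arbitrary choices made for illustration.

```python
import random

def max_rule_trial(cue_size, cue_density, sigma_s, sigma_d, rng):
    """One simulated trial: return (chosen source, responded 'more/larger')."""
    # Each channel output is divided by its own standard deviation.
    z_size = rng.gauss(cue_size, sigma_s) / sigma_s
    z_density = rng.gauss(cue_density, sigma_d) / sigma_d
    # MAX rule: respond to the channel that deviates most from its reference.
    if abs(z_size) >= abs(z_density):
        return "size", z_size > 0
    return "density", z_density > 0

def simulate(cue_size, cue_density, sigma_s, sigma_d, n=20_000, seed=0):
    """Proportion of 'more' responses and proportion of 'size' attributions."""
    rng = random.Random(seed)
    n_more = n_size = 0
    for _ in range(n):
        source, more = max_rule_trial(cue_size, cue_density, sigma_s, sigma_d, rng)
        n_more += more
        n_size += source == "size"
    return n_more / n, n_size / n

# A 30% size increase with equal channel noise (Weber fraction 0.15).
print(simulate(0.30, 0.0, 0.15, 0.15))
# No cue and unequal noise: normalization keeps the source choice unbiased.
print(simulate(0.0, 0.0, 0.10, 0.30))
```

In the second call the density channel is three times noisier than the size channel, yet both sources are chosen about equally often; without the division by σ the noisier channel would win the comparison on most trials.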
To improve the fit of the two-channel model and to compensate for two conspicuous failures, two further parameters were included (as in Raphael, Dillenburger, & Morgan, 2013). The first discrepancy occurs because the channels are clearly not independent. On trials when the observer makes an incorrect identification of size vs density, they are above chance at reporting the correct direction of the change. In other words, an increase in density is more likely to be reported as an increase in size than as a decrease in size (see row 3 in Fig. 6). A correlation between observed size and density has previously been reported by Dakin et al. (2011), who note precedents in the previous literature on density discrimination. The correlation takes the form of larger stimuli appearing denser, and denser stimuli appearing larger, the same as the correlation observed in the present data. To account for the cross-talk between channels we introduced a 'leakage' parameter: a fixed proportion of the signal in the channel containing the signal was added to the channel not containing the signal. For example, if the signal on a particular trial were m in the density channel, a signal mp (p < 1) would be added to the size channel. This is equivalent to introducing a bias μ in the psychometric function. If p > 0 this ensures that observers will be above chance at detecting the direction of the numerosity difference even if they incorrectly identify its source. The same cross-talk will also improve performance in the numerosity condition, where observers do not have to identify the source. The cross-talk was also assumed to be present in the Extended Conditions, leading to a larger signal in the case where size and density co-varied and to a smaller signal when they counter-varied. There was no cross-talk in the Outline Size Condition, because the density cue was absent.
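The effect of the leakage parameter on the channel signals can be written out in a few lines of Python. The sketch assumes that leakage is symmetric and additive, and it treats a constant-number trial as equal and opposite fractional changes in size and density, which is only approximately true. The function name is hypothetical, and p = 0.36 is used because it is the average fitted cross-talk reported in the Discussion.

```python
def channel_signals(size_change, density_change, leak):
    """Mean signals in the size and density channels after cross-talk.

    Assumption: a proportion `leak` of each channel's signal is added to
    the other channel.
    """
    size_signal = size_change + leak * density_change
    density_signal = density_change + leak * size_change
    return size_signal, density_signal

p = 0.36   # average fitted cross-talk reported in the text
m = 0.10   # a 10% change, chosen only for illustration

# Density-varying trial: the size channel receives a same-sign signal m*p.
print(channel_signals(0.0, m, p))
# Constant-number trial: size up, density down by about the same fraction,
# so each channel's signal shrinks to roughly m*(1 - p) in magnitude.
print(channel_signals(m, -m, p))
```

Under these assumptions the leaked signal has the same sign as the true change, which is why direction reports stay above chance when the source is misidentified, and the reduced signal on constant-number trials is in line with the raised thresholds in the Extended Conditions.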
Fig. 6 (caption, partial). … Row 2 shows all trials under that condition. Row 3 shows the results when the observer chose the wrong task, e.g. responded to density when the patches had a size difference. Row 4 shows results on trials when the observer chose the correct task. Row 5 shows the probability of correctly choosing the density task on density trials (left) and correctly choosing the size task on size trials (right). Row 6 shows the results of the Numerosity Task, when the density and size trials were randomly interleaved and the observer chose between 2 responses ('more' vs. 'less'). The error bars show 95% confidence levels. The solid black curves depict 2-parameter (μ, σ) fits to the individual psychometric functions (see Table 1). The red curves in the left-hand figure (a) show the best fits to all the data of the 2-channel 6-parameter MAX model with one parameter accounting for the bias towards selecting the density response over the size response. The blue curves in the right-hand figure show the fit of a 6-parameter model with 3 channels. The fitted parameter values for all observers are shown in Table 2.

A second major drawback of the simple 2-channel model is that it assumes observers choose density or size equally often if there is no noticeable cue available, so that there is no bias in the identification of the source of change. However, some observers (observers 1, 2, and 3) show a significant bias, as can be seen in the upshift or downshift of the response probability for small cues in the 5th row of Fig. 6. Hence, a parameter that accounts for the bias towards choosing density or size as the source was introduced, as explained in Raphael, Dillenburger, and Morgan (2013). The fitted values of this parameter showed large biases in Observers 1-3 (−0.29, −0.35 and −0.23, respectively) but no significant biases in Observers 4 and 5 (−0.01 and −0.02, respectively).

Next, we consider a 3-channel model with independent channels for size, density and numerosity.
We fitted the same 2-parameter (μ, σ) psychometric function to all empirical functions where numerosity varied and could be used as a cue. However, a single mechanism could not discriminate between size and density changes in the Mixed Task, so we added two further channels for Size and Density, each with its own μ and σ, and deemed that observers used the MAX rule to make their choice. This gave a 6-parameter model in total. The size and density channels are used only to identify the source of the numerosity difference in the Mixed Task Condition and in the Extended Conditions. More details of the models are given in Raphael, Dillenburger, and Morgan (2013).
The fitted parameter values of the described models are shown in Table 2. For all observers, the 3-channel model was inferior to the 2-channel 6-parameter model. Predicted thresholds of the 3-channel model are overestimated in the extended conditions when numerosity is constant, and underestimated when numerosity co-varies with density or size. No significance levels can be assigned to these differences because the models are equally constrained, but the conservative conclusion is that we cannot reject the model of a single numerosity mechanism, which the observers use to make discriminations of 'greater' vs 'smaller' in all of the conditions, number, size and density. It is noteworthy that in the same analysis carried out for the circle experiment, a single number channel model fails badly in all observers, reinforcing the conclusion that there are at least 2 channels, for size and density. The difference between the two experiments is that observers are more sensitive to size of the circles than to the size of polygons.
It might reasonably be objected that the 3-channel model has been disadvantaged relative to the 2-channel model by having three values for μ instead of two. There is no reason to expect non-zero values for this bias parameter, and the fitted values are small. Therefore, we repeated the fits of the models with μ values constrained to be zero. The 6-parameter three-channel model now had only three parameters (σ for each channel) while the two-channel model with leakage and bias had four. The different degrees of freedom allow a statistical comparison between the fits (Hoel, Port, & Stone, 1971). The results of the fits are shown in Table 4. For every subject, the fit of the 2-channel model was superior to that of the three-channel model (see Table 5).

Discussion
It is clear from these data that the mechanisms for size, density and numerosity discriminations are closely intertwined. For a given observer, thresholds are similar across tasks and conditions, and observers who are relatively good at one task are also good at the others (Kendall's Coefficient of Concordance = 0.75). Moreover, changes in density tend to be confused with changes in size. Whether these findings support the idea of a special numerosity mechanism, distinct from size and density, is a complex question, which we now address.

Table 4. The table shows best-fitting parameter values for 3 models described in the text and tests of significance between the models. The first model is the 5-parameter MAX model with 2 channels (μ_S, σ_S, μ_D, σ_D) and a leakage factor. The second line shows the 2-channel 6-parameter model, which is supplemented by a bias in the choice between size and density channels. The third model has 3 separate channels for number, density and size. Column 1 denotes the observer. The last column contains the results of a likelihood ratio comparison between the two models indicated in the previous column. For each observer the values of χ² and the significance levels are shown.

Table 5. Best-fitting parameters of the models with μ values constrained to be zero and the likelihood (L) for each observer. The χ² values in the 6th and 12th columns show the results of a likelihood ratio test between the full models and the models with μ set to 0. The last column shows the χ² value of the likelihood ratio test comparing the two constrained models.

The main argument against a pure numerosity mechanism is that it cannot explain the ability of observers to discriminate between changes in size and density at threshold. Separate channels for size and density are required. Once these have been admitted, the introduction of a third channel for numerosity is difficult to justify.
A two-channel model augmented to take account of cross-talk between channels, and to allow for biases in identification, has no more parameters (6) than the 3-channel model and is more successful. If the mean (μ) of the psychometric function is constrained to be zero, the resulting 3-parameter version of the three-channel model is significantly inferior to the 4-parameter version of the two-channel model for all observers. Further, the ability of observers to report the direction of a numerosity change correctly when they are unable to identify its origin (size vs. density) can be explained by a bias towards seeing denser patterns as larger and vice versa. This bias has been independently confirmed (Dakin et al., 2011) and could be explained by constancy scaling (Raphael, Dillenburger, & Morgan, 2013; Thouless, 1972) or decorrelation of naturally-correlated signals (Barlow & Földiák, 1989). The bias can be modeled by cross-talk between the size and density channels. Note that the average fitted parameter value for this cross-talk over observers is 0.36, which is far less than the value of unity predicted from using a pure numerosity channel. As mentioned earlier, the increase in density thresholds when size changes in the opposite direction (constant-number Trial Type in the Extended Density Condition) and vice versa can be explained by the cross-talk between size and density channels.
Our view is that the case for a distinct numerosity mechanism is 'not proven' by the present experiment. It cannot be rejected, nor is there any compelling reason to accept it. Lacking so far is a model for how numerosity is computed if it is indeed true that 'Vision senses number directly' (Ross & Burr, 2010). If counting is excluded for numbers outside the 'subitizing region' (Jevons, 1871;Ross & Burr, 2010) it is a simple matter of logic that approximate numerosity must be computed from some summary statistic of the image, such as contrast energy (Dakin et al., 2011;Morgan et al., 2014). However, Anobile, Cicchini, and Burr (2014) have recently suggested that mechanisms for numerosity and texture density are separable, the former operating only for numbers too small to constitute a texture. Their evidence is that Weber's Law for numerosity is replaced by a square-root relation for large numbers. The challenge now is to provide a mechanism for approximate numerosity, other than a texture based mechanism, that explains the Weber relationship.