Visual scenes are typically crowded and contain numerous objects. An example is a set table for a large dinner party, which might contain a series of plates, glasses, and silverware. Such a scene can be perceived in terms of individual objects, such as by fixating on a particular glass in order to grasp it. However, we are also able to quickly and effectively glean the overall meaning of the table and report the average color, size, and shape of the plates or glasses and give a close, but often inexact, estimate of the number of place settings.

Participants are able to report the exact combination and properties of features for an attended object, and even the exact feature value for a small group of objects (Cowan, 2000; Xu & Chun, 2009). The representation of features for a group of items, however, reflects an estimation of the average value of the items’ orientation, size, motion, color, or even more complex features such as gender or facial expression (for review, see Whitney & Yamanashi Leib, 2018). This estimation of the “feature gist” of the scene, or average value of an ensemble of items, may help to provide the rich impression of continuous and stable perception over time and across saccades (Corbett, Fischer, & Whitney, 2011; Corbett & Melcher, 2014a, 2014b; Melcher & Colby, 2008).

One of the attributes of ensemble perception that has often been emphasized is its speed. It has been suggested that the properties of the ensemble, such as average size, shape, or orientation, are perceived in a single glance (for a dissenting viewpoint, however, see Myczek & Simons, 2008). A stronger claim, however, is that “ensemble representations can be extracted with a temporal resolution at or beyond the temporal resolution of individual object recognition” (Whitney & Yamanashi Leib, 2018, p. 112). There is evidence for extraction of ensemble information from very brief presentations or from rapid sequences (Haberman & Whitney, 2007). For example, Chong and Treisman (2003) reported that mean size could be computed for a display shown for only 50 ms, which is consistent with the idea of rapid, parallel feature processing (see also Yamanashi Leib et al., 2016). That study, however, did not mask the stimuli, so the effective duration of the display was likely much longer. When using a backward mask, Whiting and Oriet (2011) found that size averaging improved for durations greater than 100 ms, suggesting that an effective duration of 100–200 ms was required in their task to reach maximal performance. When interpreting the time course of mental operations like estimation, it is critical to distinguish between their temporal resolution, which defines how quickly sufficient information can be obtained to make a reasonable judgment about the stimulus, and their temporal integration window, which defines the point beyond which performance no longer improves significantly with increased viewing time (Whitney & Yamanashi Leib, 2018).

In terms of our understanding of visual processing, the question of whether features can be extracted into an ensemble within 50 ms, or rather 200 ms, is theoretically critical. More generally, processing of ensembles or scene “gist” is often claimed to be faster than object recognition (among others, Greene & Oliva, 2009a, 2009b), which would be a requirement for any theory that posits the use of scene context to disambiguate object identity (e.g., Bar, 2004). To directly compare ensemble perception to processing of individual objects, then, requires measuring both the temporal resolution (“can ensemble processing be done for a presentation rate of 50 ms?”) and the temporal integration window (“does precision improve over longer time periods?”) of both processes.

The comparison of ensembles versus individual perception is further complicated by the fact that even the perception of individual objects might vary depending on the number of items in question (Wutz, Caramazza, & Melcher, 2012; Wutz & Melcher, 2013, 2014; Wutz, Weisz, Braun, & Melcher, 2014). Perhaps the best-known demonstration of rapid object individuation is the classic Sperling (1960) study showing that participants can report only around four items in a single glance with near perfect accuracy. In terms of rapid encoding for enumeration (“how many items are there?”) or visual memory (“was this item presented previously?”) tasks, many studies have shown that performance remains high up to around three to four items, depending on the stimulus and task parameters and individual differences. This raises the additional question of whether the appropriate comparison for ensemble processing is processing a single item or, instead, that of individuating a small group of items.

Our ability to either focus on an individual item or, instead, on the group as a whole is captured by the saying “not seeing the forest for the trees” (attributed to Sir Thomas More, 1533). Here, we are asking whether we actually do see the forest before the trees, as measured by our ability to report the number of trees. To do so, we compare the time course of seeing the numerosity of “the forest” with that of perceiving exactly one, two, three or more “trees”—that is, we compare the time course of ensemble processing with that of individual item perception in the case of rapid enumeration.

Estimation versus individuation

Individuation is a core mental ability. It requires the perceptual system to select features from an image, bind them into an object, and select it as separate from other objects and from the background (Xu & Chun, 2009). Individuation forms a primary constraint for the limited capacity of perception, attention, and working memory (Piazza, Fumarola, Chinello, & Melcher, 2011; Wutz & Melcher, 2013). Indeed, when looking at single participants, it has been found that individuation capacity commonly exceeds working memory capacity and that the two measures can be highly correlated (Melcher & Piazza, 2011; Piazza et al., 2011). Moreover, the stages of individuation and then identification in scene analysis are clearly dissociable by different masking techniques (Wutz & Melcher, 2013). This body of evidence suggests that individuation forms a critical limit for mental capacity, upon which higher cognitive tasks like working memory or object-location tracking are grounded (Dempere-Marco, Melcher, & Deco, 2012), with the additional requirements of identification for working memory or location updating for multiple-object tracking.

In addition to its central role for scene analysis and object capacity, individuation is fundamental for numerical cognition. The phenomenon of “subitizing” reflects the rapid apprehension of the numerosity of a small set of items (Jevons, 1871; Kaufman, Lord, Reese, & Volkmann, 1949). Typically, around three to four items can be individuated at once (as described above) and can be enumerated precisely and exactly, whereas quantities exceeding this limit are only represented by approximation (estimation of the numerosity of the ensemble) or involve multiple processing steps (counting).

For numerical cognition, it has been argued that individuation and estimation are distinct processes operating on distinct object quantities (Burr, Turi, & Anobile, 2010; Feigenson, Dehaene, & Spelke, 2004; Piazza et al., 2011). For example, numerosity judgments typically follow Weber’s law, with errors increasing in proportion to the number of items presented, while errors remain relatively constant within the subitizing range (Burr et al., 2010). Moreover, for scene analysis, estimation of the quantity of items in an ensemble is considered central, because it allows for the global processing of image properties that gives access to the large-scale scene layout and its summary statistics (Alvarez, 2011; Alvarez & Oliva, 2008, 2009). In this view, estimation may be a special case of ensemble or statistical processing, key to our sense of number and in some ways similar to ensemble processing of other basic visual features, such as orientation, size, movement, color, and depth, as well as more complex features such as the gender or emotional expression of a face (for review, see Whitney & Yamanashi Leib, 2018).

The time course of estimation and individuation

In terms of individuation, capacity limits have typically been characterized in terms of space, forming a debate between theories involving a finite number of discrete slots and those that argue for limited resources shared among the processed items (Awh, Barton, & Vogel, 2007; Cowan, 2000; Xu & Chun, 2009). Such theories start from the question of how many items in space can be individuated (and encoded into working memory) in a single glance. However, more recent accounts that investigated individuation with highly time-sensitive measures (e.g., visual masking, magnetoencephalography) support an alternative explanation based on time (Wutz et al., 2012; Wutz & Melcher, 2013, 2014; Wutz et al., 2014). The pattern of results in those studies suggests that individuation is not instantaneous, and that time may be a key factor in capacity limits. Those studies divided the effective processing of stimuli (or, depending on interpretation, the iconic or sensory memory) into smaller units of time by means of forward masking (see below and Method section). In line with classic psychophysical estimates of sensory memory, individuation capacity limits (of around three or four items) were reached only when the effective duration of the targets was at least 100–150 ms. One or two items, by contrast, could be accurately individuated quite quickly, after approximately 30–50 ms. That pattern of results suggests that individuation capacity depends on the temporal limits imposed by the fading trace of sensory memory. It is important to note that, in the same paradigm, mere stimulus detection did not depend on temporal factors, suggesting that individuation operates at a subsequent level of scene analysis beyond simple feature detection (Wutz & Melcher, 2013).

The current study

In the present study, we directly compared how individuation and estimation evolve over time within the same individuals, using the above-described forward masking paradigm. Typically, performance in both individuation and estimation tasks has been measured in terms of a single display presentation, with little control over the effective duration of the stimuli in terms of visual persistence and sensory memory. Here, instead, we take advantage of a masking paradigm that combines simultaneous and forward masking (Di Lollo, 1980). This method involves a forward pattern mask upon which, at some stimulus onset asynchrony (SOA), one or more targets (“X” symbols) are superimposed.

One key aspect of this design is simultaneous masking. As shown in Fig. 1, the target is camouflaged by the mask, since both are made up of black lines. It is impossible to detect the targets when they are presented together with the mask at an SOA of 0 ms. When the SOA exceeds around 20–30 ms, it is possible to detect that there is at least one target (Di Lollo, 1980; Wutz et al., 2012). However, for greater quantities within the subitizing range (three to four items), exact enumeration performance is reached only at longer SOAs of around 100–150 ms (Wutz et al., 2012; Wutz & Melcher, 2013, 2014; Wutz et al., 2014). In other words, the ability to distinguish between different item quantities at different set sizes depends on the SOA between the first display (mask only) and the second display (targets and mask together).

Fig. 1

Illustration of the stimulus. The simultaneous mask plus target display is shown in the leftmost panel. The two component parts of the mask plus target display are shown in the middle panel (the X-shaped targets) and the right panel (the mask pattern of oriented lines). The X-shaped targets and the mask pattern were superimposed in the mask plus target display presented on each trial. The mask plus target display (left panel) always contained the mask and at least one X-shaped target. The mask display (right panel) always contained only the pattern of oriented lines, without any X-shaped targets. By presenting first the mask display and then the mask plus target display (made up of both the Xs and the same mask pattern) with sufficient stimulus onset asynchrony (SOA), the Xs appearing on top of the mask pattern become visible. In contrast, when there is no SOA (or when it is very brief), the Xs are either not visible or difficult to enumerate exactly. On each trial, the mask pattern of oriented lines was unique and was matched for that trial in both the target and the mask displays. Note that the X-shaped targets were never presented alone, but always superimposed on top of the mask pattern of oriented lines. Further details regarding the stimulus parameters and the order of events on each trial can be found in the Method section

The paradigm was designed to yield a form of integration pattern masking due to the close temporal proximity between the two displays. Integration masking describes the temporal aspect of the paradigm, in which both displays (mask and target displays) are combined into a single unified percept. When there is only a single, unified percept, the target Xs are not visible. Instead, when the display is temporally segregated into two unique events, the target Xs become visible (for review, see Enns & Di Lollo, 2000). Pattern (or “structure”) masking describes the fact that the two displays (mask and target) contain similar component elements, which here are oriented lines (for review, see Enns & Di Lollo, 2000). The main reason for using the combined forward and simultaneous (pattern) masking paradigm is that it provides greater control over the effective processing time of the multiple targets, because very short SOAs can be used (Wutz et al., 2012). By contrast, traditional backward masking paradigms leave the full stimulus presentation duration available for processing before the mask appears. Thus, classic forward or backward masking yields more of an all-or-none step function in which the targets are either visible, and thus enumerated correctly within the subitizing range, or invisible (Wutz et al., 2012; Wutz & Melcher, 2013). For object individuation, we expected to replicate our earlier findings that performance within the subitizing range increases up to capacity limits within 100–150 ms. Moreover, we mapped out the time course of object-quantity estimation and compared it with the temporal dynamics of individuation. Similar time courses would suggest a common underlying mechanism for individuation and estimation and would therefore raise a critical argument against theories that involve extraction of ensemble representations (or scene “gist”) prior to object identification. Alternatively, one process might be faster or slower than the other, suggesting that one builds upon the other or that they serve complementary visual strategies. In particular, if feature processing is rapid and parallel, it could serve estimation and drive a fast temporal resolution for estimation that could even exceed that of the exact individuation of objects.

Experiment 1

Method

Participants

Eighteen participants (10 females, 17 right-handed, mean age = 24.4 years, SD = 1.9 years) took part in the experiment. All participants provided written informed consent, as approved by the institutional ethics committee. They took part in exchange for course credit or a small payment and had normal or corrected-to-normal vision. This sample size (N = 18) is similar to our previous work with the same paradigm (N = 14 in Wutz et al., 2012; N = 16 in Wutz & Melcher, 2013; N = 14 in Wutz et al., 2014). In our earlier studies, we found strong main effects for SOA (ηp2 ~ 0.8–0.9) and for item numerosity (ηp2 ~ 0.5–0.7), and small-sized to medium-sized interaction effects (ηp2 ~ 0.1–0.3). Based on a power analysis (using the software G*Power; Faul, Erdfelder, Lang, & Buchner, 2007), the minimum required sample size to detect small-sized to medium-sized effects (f = 0.2) is 16 participants (analysis of variance [ANOVA], repeated measures, within factors, 16 measurements, alpha = 0.05, power = 0.8). Thus, our sample (N = 18) is sufficient to detect the expected pattern of results.

Stimuli and apparatus

The experiment was run using MATLAB (The MathWorks, Natick, MA) and Psychophysics Toolbox (Version 3; Brainard, 1997; Pelli, 1997). Participants were seated in a dimly lit room, approximately 45 cm from a CRT monitor (1,280 × 1,024 resolution, 36.5 × 27.2-cm display size) running at 60 Hz. On each trial, a different pattern of 1,080 randomly oriented, partially crossing black lines (mean line length = 1° visual angle [VA], mean line width = 0.1° VA, 31 × 25° VA mean size of whole pattern) was presented centered on a white background (see Fig. 2a). This pattern remained on the screen, and then, after a variable onset delay, a variable number of items (depending on the task: estimation or individuation) appeared. The target items were “X” shapes and were linearly superimposed upon the random line pattern by use of the image-processing technique “alpha blending”—that is, using Screen(‘BlendFunction’) in Psychophysics Toolbox (Version 3; Brainard, 1997; Pelli, 1997). A small set size (one to eight items) was used for individuation in Experiment (Expt) 1a, and a large set size (>10 items) was used for estimation in Expts 1b and 2 (see Figs. 1 and 2a). The physical properties of both mask and target elements (i.e., contrast, mean line length, mean line width) were equated, and the alpha-blending procedure edited the transparency/opacity values of the visual stimuli, ensuring a mathematically correct superimposition, such that all of the lines were the same level of black. All items were colored black, were 1° VA in size, and were placed randomly on one of 100 possible locations within an invisible, central rectangle of 14.5° VA in eccentricity (horizontally and vertically), with a minimum buffer of 0.5° VA between the locations.
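For concreteness, the following sketch illustrates how such a mask-plus-target display could be constructed and drawn with Psychophysics Toolbox. It is a minimal sketch, not the original experimental code: the window handle, pixels-per-degree value, and item count are assumptions, and the drawing is reduced to two DrawLines calls.

```matlab
% Minimal sketch of the mask-plus-target construction (assumed values;
% not the original experimental code).
win = Screen('OpenWindow', 0, 255);                        % white background
Screen('BlendFunction', win, 'GL_SRC_ALPHA', 'GL_ONE_MINUS_SRC_ALPHA');
ppd = 35;                                                  % assumed pixels per degree
ctr = [640; 512];                                          % screen centre (pixels)
len = 1 * ppd;                                             % ~1 deg line length

% Random mask lines within a ~31 x 25 deg region around the centre
nMask = 1080;
mid   = ctr + [31; 25] * ppd .* (rand(2, nMask) - 0.5);    % line midpoints
ori   = 2 * pi * rand(1, nMask);                           % random orientations
off   = 0.5 * len * [cos(ori); sin(ori)];
maskXY = reshape([mid - off; mid + off], 2, []);           % start/end point pairs

% X-shaped targets: two crossing diagonals spanning a ~1 deg square
nTarg  = 3;                                                % e.g., three targets
pos    = ctr + 14.5 * ppd * (2 * rand(2, nTarg) - 1);      % random locations
d      = 0.5 * len;                                        % half the item size
targXY = [];
for k = 1:nTarg
    p = pos(:, k);
    targXY = [targXY, p + [-d; -d], p + [d; d], ...        % first diagonal
                      p + [-d; d],  p + [d; -d]];          % second diagonal
end

Screen('DrawLines', win, maskXY, 3, 0);                    % mask display
Screen('Flip', win);
Screen('DrawLines', win, [maskXY, targXY], 3, 0);          % mask plus targets
Screen('Flip', win);
```

Because the mask and target lines share the same color and width, the Xs are camouflaged whenever the two displays fuse into a single percept.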

Fig. 2

Stimuli and trial design. a Mask plus target display with few items for the individuation task (Expt 1a, left) and many items for the estimation task (Expts 1b and 2, right). The mask is shown in grey in the “mask plus items” display for illustrative purposes only. In the experiment, all lines were presented at the same (black) level of contrast in both displays. The size of the targets and mask is shown for illustrative purposes only and does not match the actual displays (see Fig. 1 for a more realistic illustration of the mask plus target displays). b A typical trial in the experiment. The masking procedure is identical for individuation (Expt 1a) and estimation (Expts 1b and 2). c The tasks differ only in the response procedure (via number key for reporting an exact number, for individuation in Expt 1a and estimation in Expt 2, or via two-interval forced choice, 2-IFC, for estimation in Expt 1b)

Procedure

In the first experiment, each subject completed the two tasks (individuation in Expt 1a, estimation in Expt 1b) in two sessions of approximately 1 hour each. The serial order of the tasks was balanced across the observers. All subjects received verbal and written instructions about each task and completed 20 practice trials prior to the main experiment. Each trial began with a central fixation dot (black, 0.5° VA) on a white background for 500 ms. Then, the random line masking pattern (containing oriented black lines) was presented for a specific duration to control the stimulus onset asynchrony (SOA) between the onset of the mask and the target item(s), which were made up of the black lines from the mask plus the black “X” symbols. For estimation (Expt 1b), there were five SOAs (33, 50, 100, 200, 500 ms) and one “no-mask” control condition. We used the same four SOAs between 33 and 200 ms for individuation (Expt 1a), because this process has been shown previously to unfold within this temporal range (Wutz et al., 2012; Wutz & Melcher, 2013). We focused on this common SOA range for the two tasks in the subsequent analysis. The target display containing the items to be individuated or estimated was superimposed upon the masking pattern and was always presented for the same brief duration of 50 ms. The mask plus target display was immediately followed by a white screen (see Fig. 2b). Using this procedure, we achieved an optimal temporal resolution for slicing visual processing time (sensory/iconic memory) after target exposure into small units of only tens of milliseconds, because very short SOAs can be used. Short SOAs leave less time to exclusively access the memory trace of the target item(s), because the sensory traces of the mask and the target item(s) are temporally integrated for a greater amount of time (Di Lollo, 1980; Wutz et al., 2012; Wutz & Melcher, 2013).
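Schematically, the timing of a single trial could be implemented as below. This sketch assumes the window `win` and centre `ctr` from the stimulus sketch above, plus two hypothetical helper functions, `drawMask` and `drawMaskPlusTargets`, wrapping the corresponding DrawLines calls; flip deadlines are offset by half a refresh interval, a common Psychtoolbox idiom.

```matlab
% Schematic trial timing (sketch; helper functions are hypothetical).
ifi = Screen('GetFlipInterval', win);              % ~16.7 ms at 60 Hz
soa = 0.100;                                       % e.g., the 100-ms SOA condition

Screen('DrawDots', win, [0; 0], 10, 0, ctr');      % central fixation dot
tFix = Screen('Flip', win);

drawMask(win);                                     % mask: oriented lines only
tMask = Screen('Flip', win, tFix + 0.500 - ifi/2); % after 500 ms of fixation

drawMaskPlusTargets(win);                          % same mask lines plus the Xs
tTarg = Screen('Flip', win, tMask + soa - ifi/2);  % target onset at the SOA

Screen('Flip', win, tTarg + 0.050 - ifi/2);        % white screen after 50 ms
```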

The quantity of the target item(s) and the response procedure after the mask plus target item(s) presentation depended on the specific task (see Fig. 2c). For individuation in Expt 1a, we showed either items within the subitizing range (one to four items) or a randomly assigned quantity between six and eight items, serving as a control condition outside the subitizing range. We informed the participants that quantities in the range between one and eight items would be shown. Participants did not know that a five-item target display was never presented. They were instructed to report their perceived item quantity by pressing the corresponding number on a keyboard immediately after the presentation of the mask plus target display.

For estimation in Expt 1b, randomly assigned item quantities in three different number ranges were used. Either 11–15 (hereafter called ≈10 items), 21–25 (≈20 items), or 31–35 items (≈30 items) were shown on each trial. The response procedure followed a two-interval forced-choice design (see Fig. 2c). The white screen after the mask-plus-item(s) presentation remained on the screen for 1 s, and then a second item display without a mask was shown for 200 ms, against which the target item(s) had to be compared. A white screen immediately followed the presentation of the second item display. The comparison quantity was a constant Weber fraction above/below the sample quantity (±3–5 items for 11–15 items, ±6–10 items for 21–25 items, ±9–15 items for 31–35 items), to control for difficulty across number ranges. Participants were instructed to press the left arrow key when they perceived the sample as containing more elements than the comparison and the right arrow key otherwise. For both tasks, the subject’s response on the keyboard initiated the next trial.
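As an illustration of how sample and comparison quantities at a roughly constant Weber fraction could be drawn, here is a minimal sketch; the ranges are taken from the text, while uniform sampling within each range is an assumption.

```matlab
% Sketch: sample and comparison quantities for the 2-IFC estimation task.
ranges = [11 15  3  5;                 % [min max dMin dMax] per number range
          21 25  6 10;
          31 35  9 15];
r      = ranges(randi(3), :);          % pick one of the three number ranges
sample = randi(r(1:2));                % sample quantity, e.g., 11-15 items
delta  = randi(r(3:4));                % offset at a roughly constant Weber fraction
if rand < 0.5                          % sample > comparison on half of the trials
    comparison = sample - delta;
else
    comparison = sample + delta;
end
```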

The individuation experiment (Expt 1a) comprised 10 blocks of 80 trials each. Each combination of mask-item(s) SOA (33, 50, 100, 200 ms) and item number (one to four items or “above-capacity” control) was shown four times per block in random order. The estimation experiment (Expt 1b) comprised 10 blocks of 72 trials each. For five subjects, only eight blocks were available due to technical difficulties. Each combination of mask-item(s) SOA (33, 50, 100, 200, 500 ms, or no-mask control) and item number (≈10, ≈20, ≈30 items) was shown four times per block in random order. The sample contained more items than the comparison display on half of the trials and fewer on the remaining half.

Data analysis

The performance (percentage of correct trials) for each task (individuation, estimation) was fed into a two-way repeated-measures analysis of variance (ANOVA), with the factors time (i.e., SOA) and number range. For individuation, only data for number ranges within the subitizing range (one to four items) were analyzed. Because the 500-ms SOA and the “no-mask” condition were not present for the individuation task, they were excluded from the analysis of the estimation task. Performance in the 500-ms SOA condition was very similar to the other SOAs (500-ms SOA: M = 79.5% correct, SD = 5.7% correct; other SOAs: M = 79.5% correct, SD = 4.9% correct). The “no-mask” condition yielded slightly worse performance (M = 74.2% correct, SD = 8% correct), probably because it was a much less frequent event than the masked conditions throughout the block. To directly compare the performance across tasks, we averaged over the number ranges in each task (one to four items for individuation, ≈10–≈30 items for estimation) and ran a two-way repeated-measures ANOVA, with the factors time (i.e., SOA from 33–200 ms) and task (individuation, estimation). We used partial eta squared (ηp2) to calculate effect sizes for the ANOVA results. Further, we measured the increase in individuation performance with longer SOAs by means of logarithmic function fits with two free parameters (y = a × log(b × x)). The logarithmic fits were used to estimate the SOA at which performance reached a critical threshold of 75% correct responses, separately for each numerosity level.
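A sketch of this fitting step, using nlinfit from MATLAB's Statistics Toolbox with hypothetical performance values, is shown below; inverting the fitted function yields the SOA at the 75%-correct threshold.

```matlab
% Sketch: logarithmic fit y = a*log(b*x) and the 75%-correct threshold
% (performance values and starting parameters are hypothetical).
soas   = [33 50 100 200];                 % SOA levels (ms)
pc     = [45 62 78 88];                   % hypothetical % correct, one numerosity
logFun = @(p, x) p(1) .* log(p(2) .* x);  % free parameters a = p(1), b = p(2)
p      = nlinfit(soas, pc, logFun, [20, 0.1]);
soa75  = exp(75 / p(1)) / p(2);           % solve a*log(b*x) = 75 for x (in ms)
```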

Results

Individuation over time and number (Expt 1a)

We found significant main effects of both item number (within the subitizing range, one to four items) and time (33–200 ms SOA) on individuation performance, number: F(3, 51) = 51.3, p < 2 × 10^-15, ηp2 = 0.75; time: F(3, 51) = 60.7, p < 1.1 × 10^-16, ηp2 = 0.78. Moreover, both factors showed a significant interaction, F(9, 153) = 3.4, p < 7.4 × 10^-4, ηp2 = 0.17 (see Fig. 3a). This pattern of results is remarkable, because typically individuation performance is at ceiling and indistinguishable for item numbers within the subitizing range. The masking procedure used here, however, revealed strong differences even between one and four items. Moreover, individuation performance increased with increasing SOA between the mask and the target item display. This suggests that individuation is not an instantaneous process but instead depends on temporal factors. Moreover, the interaction pattern confirms our earlier findings (Wutz et al., 2012; Wutz & Melcher, 2013) that the masking procedure affects the rate at which items are individuated. Individuation capacity increased in steps with increasing time left for visual processing/memory. It is important to note that individuation performance outside the subitizing range showed a qualitatively and quantitatively different pattern. As expected, performance for six to eight items was considerably worse than for smaller numerosities, and it depended less on temporal factors (see Fig. 3a).

Fig. 3

Performance per task. a Performance (% correct trials) and logarithmic function fits for the individuation task as a function of SOA and number. b Performance (% correct trials) for the estimation task as a function of SOA and number. Error bars reflect the standard error of the mean for repeated-measures designs (Morey, 2008). The central insets show the ANOVA results (n.s. = not significant; ***p < .001)

Estimation over time and number (Expt 1b)

In sharp contrast to the individuation condition, for estimation performance there were neither significant main effects nor a significant interaction, number: F(2, 34) = 2.8, p < .08, ηp2 = 0.14; time: F(3, 51) = 1.6, p < .20, ηp2 = 0.09; interaction: F(6, 102) = 2.1, p < .07, ηp2 = 0.11 (see Fig. 3b). For all three number ranges (≈10, ≈20, ≈30 items), performance was at around 80% correct and stable across the different SOAs (33–200 ms). This pattern of results suggests that estimation, unlike individuation, does not depend on temporal factors. It is important to note that estimation performance was at a high level even for the shortest SOA (33 ms). Interestingly, we previously found a very similar pattern for the mere detection of a second target display after the first mask presentation (Wutz & Melcher, 2013). This suggests that stimulus detection is largely sufficient to provide the information necessary to approximately estimate item numbers.

Comparison between individuation and estimation over time (Expts 1a and 1b)

To statistically pin down the differences between tasks as a function of time, we collapsed over the factor item number (one to four items for individuation, ≈10–≈30 items for estimation). A two-way repeated-measures ANOVA revealed significant main effects of task, F(1, 17) = 7.1, p < .017, ηp2 = 0.3, and time, F(3, 51) = 62.3, p ≈ 0, ηp2 = 0.79, as well as a significant interaction between the factors, F(3, 51) = 38.9, p < 3.2 × 10^-13, ηp2 = 0.7. Performance was slightly better in the estimation than in the individuation task. The highly significant interaction, however, suggests that this effect was largely driven by short SOAs, which affected performance for individuation, but not for estimation. In sum, we found a strong impact of temporal factors—the mask-item(s) SOA—on individuation performance, which was completely absent for estimation (see Fig. 3). The individuation and estimation of object quantities evolve with different temporal dynamics.

Errors in individuation as a function of set size and SOA

We investigated the actually reported numbers in the individuation task as a function of set size and SOA, to more clearly distinguish between three different potential causes of the response on any given trial: (1) precise individuation, (2) approximate estimation, and (3) random guessing (see Fig. 4). When there was only a single target, participants mainly responded “1,” combined with a more scattered set of responses consistent with guessing or occasional lapses, irrespective of the SOA (M ± SD for 33-ms SOA: 1.9 ± 1.8 items; for 50-ms SOA: 1.4 ± 1.3 items; for 100-ms SOA: 1.1 ± 0.5 items; for 200-ms SOA: 1.1 ± 0.4 items). This pattern is typically found for precise individuation processes. When there were three items, however, at the short SOA values, participants often reported two or four items, and responses followed a Gaussian-like pattern around the actual value (M ± SD for 33-ms SOA: 3.6 ± 1.3 items; for 50-ms SOA: 3.3 ± 1 items). Gaussian-like response distributions are a hallmark of approximate estimation processes. A similar pattern was found with four-target displays, with the Gaussian-like distribution centered around 4 and the width of the distribution being broader for shorter SOA values (M ± SD for 33-ms SOA: 4.3 ± 1.3 items; for 50-ms SOA: 4.2 ± 1 items). Enumeration of three to four items became more precise at longer SOAs (M ± SD for three items and 100-ms SOA: 3.1 ± 0.7 items; three items and 200-ms SOA: 3 ± 0.5 items; four items and 100-ms SOA: 4.1 ± 0.7 items; four items and 200-ms SOA: 4 ± 0.7 items). In the case of larger set sizes (six to eight targets), the responses followed a relatively broad Gaussian-like distribution centered on the number 6 at all SOAs (M ± SD for 33-ms SOA: 5.9 ± 1.3 items; for 50-ms SOA: 5.9 ± 1.1 items; for 100-ms SOA: 6 ± 1 items; for 200-ms SOA: 6 ± 1 items). In other words, at very brief SOA values (33 and 50 ms), when there was more than a single item, there was evidence for an influence of estimation on responses, while for longer SOAs (100 and 200 ms) the influence of estimation on responses for set sizes of four or fewer appeared to be greatly diminished.

Fig. 4

Response distributions as a function of SOA for one (red), three (green), and more than five items (blue) pooled over all participants in Expt 1a. The inset crosses show the mean responses per participant

Interim discussion

As in previous studies (Wutz et al., 2012; Wutz & Melcher, 2013; Wutz et al., 2014), we found that reaching the “normal” subitizing pattern of high percentage-correct performance across the range of three or four items required more than 100 ms. As in the previous studies, there was a significant interaction between numerosity and time. To more precisely and quantitatively assess the individuation time courses for each numerosity level separately, we fitted logarithmic functions to the data and extracted the SOA at which performance reached a critical threshold of 75% correct responses. This revealed that, even within the subitizing range, smaller numerosities reached high performance faster than the three- or four-item displays (SOA for 75% correct responses for one item: 32 ms; two items: 73 ms; three items: 110 ms; four items: 209 ms). Moreover, enumeration was better for one item than for two items, even up to 100 ms. This pattern reflected a specific effect of SOA on performance within the subitizing range, since performance was flat for displays of six or more items.

In sharp contrast to enumeration in the subitizing range, accuracy in the numerical estimation task was not affected by the effective duration of the visual stimulus. Participants performed at around 80% correct on the estimation task, irrespective of the effective duration of the stimulus. Together with the finding that enumeration outside the subitizing range was flat, these results are consistent with suggestions that estimation involves a radically different mechanism from individuation (Burr et al., 2010; Piazza et al., 2011). Even at the shortest stimulus onset asynchrony (33 ms), numerical estimation was excellent, and indeed as good as for the longest SOAs (200 ms and 500 ms). By contrast, at the 33-ms SOA, participants correctly reported that there were exactly three or four items less than 50% of the time.

Thus, we can estimate that 33 ms or less of effective processing time is enough to register features to a degree that supports the perception of approximate numerosity. This provides an estimate of the temporal resolution of numerical estimation. Our findings are consistent with the idea that the temporal resolution of ensemble processing is sufficient to allow useful and meaningful information to be extracted within 50 ms (Chong & Treisman, 2003).

A closer examination of the actual responses in the individuation task revealed distinct patterns for precise individuation and approximate estimation processes, depending on the set size and the SOA. When there was only one item, performance was accurate even at the shortest SOA, and errors were relatively widely distributed across the other response options, consistent with errors being largely due to guessing or lapses. For three items, the pattern of errors was consistent with responses reflecting a combination of exact enumeration and some degree of estimation, depending on temporal factors (SOA). There were relatively few errors that seemed to reflect guessing. This finding is consistent with the idea that the estimation process can act across all set sizes, with the difference in error distribution in the subitizing range due to an additional process, namely the attentive individuation of each item (Burr et al., 2010). When set size exceeded the subitizing range, the pattern of responses resembled a Gaussian distribution centered on the number 6. This distribution was similar in shape across all values of SOA, consistent with an estimation process. Overall, small set sizes (in particular, one item) showed little influence of estimation, a set size of three was consistent with a combination of estimation and exact enumeration depending on SOA, and, in contrast, performance outside the subitizing range was best explained by approximate estimation, reflecting statistical or ensemble processing, largely equal across the different levels of SOA.

The pattern of responses for the six-to-eight-item set size in the individuation task seemed consistent with an estimation process. However, it is challenging to directly compare this pattern with the 11–35-item set sizes, since the latter (Expt 1b) involved a two-interval forced-choice procedure rather than an exact number response. To better compare the two tasks, we repeated the estimation task in a second experiment, in which participants responded with a specific number of target items. This allowed us to measure the pattern of responses, including errors, and compare it with those found in Experiment 1a.

Experiment 2

Method

Participants

Eighteen different participants (13 females, 16 right-handed, mean age = 25.2 years, SD = 5.5 years) took part in the experiment. There was no overlap with the participants of Experiment 1. All participants provided written informed consent, as approved by the institutional ethics committee. They took part in exchange for course credit or a small payment and had normal or corrected-to-normal vision.

Stimuli and apparatus

The stimuli and equipment were identical to those used in Experiment 1 (see Figs. 1 and 2).

Procedure

All subjects received verbal and written instructions about each task and completed 20 practice trials prior to the main experiment. Each trial began with a central fixation dot (black, 0.5° VA) on a white background for 500 ms. Then, the random line masking pattern was presented for a specific duration to control the stimulus onset asynchrony (SOA) between the onset of the mask and the target item(s). There were five SOAs (33, 50, 100, 200, 500 ms) and one “no-mask” control condition. Randomly assigned item quantities in three different number ranges were used. Either 11–15 (hereafter called ≈10 items), 21–25 (≈20 items), or 31–35 items (≈30 items) were shown on each trial. Unlike in Experiment 1, participants gave an exact numerical estimate at the end of each trial rather than comparing two intervals. Thus, the response was matched to the individuation task of the first experiment (see Fig. 2c). The estimation experiment comprised five blocks of 72 trials each. Each combination of mask-item(s) SOA (33, 50, 100, or 200 ms) and item number (≈10, ≈20, ≈30 items) was shown six times per block in random order.

Data analysis

We analyzed the average error, in terms of the signed difference between the reported and the presented number of items, for each numerosity range (≈10, ≈20, ≈30 items) as a function of SOA with a two-way repeated-measures ANOVA. Moreover, we aimed to more directly compare the results of Experiments 1a and 2, in which exact numerical responses were reported for individuation (in Expt 1a) and estimation (in Expt 2). To this end, we calculated the accuracy coefficient (AC) and the variation coefficient (VC; Cheng et al., 2019) for the numerical judgments in each experiment (see Eqs. 1 and 2). The two measures are scaled relative to the mean reported number per numerosity level and thus allow for better comparability across experiments with different number ranges (one to eight items in Expt 1a; 11–35 items in Expt 2).

$$ \mathrm{AC}=\frac{\mathrm{Mean}\left(\mathrm{reported\ number}-\mathrm{presented\ number}\right)}{\mathrm{Mean}\left(\mathrm{reported\ number}\right)} $$
(1)
$$ \mathrm{VC}=\frac{\mathrm{Standard\ deviation}\left(\mathrm{reported\ number}\right)}{\mathrm{Mean}\left(\mathrm{reported\ number}\right)} $$
(2)
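In code, both coefficients reduce to one line each; the sketch below uses hypothetical response vectors from the ≈10-item range.

```matlab
% Sketch of Eqs. 1-2 with hypothetical data.
presented = [13 11 15 12 14];                       % items shown per trial
reported  = [14 10 13 12 15];                       % responses per trial
AC = mean(reported - presented) / mean(reported);   % accuracy coefficient (Eq. 1)
VC = std(reported) / mean(reported);                % variation coefficient (Eq. 2)
```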

Results

We found a significant main effect of SOA, F(3, 51) = 72.1, p ≈ 0, ηp2 = 0.81, and a significant interaction between SOA and number range, F(6, 102) = 4.6, p < 3.9 × 10^-4, ηp2 = 0.21. As can be seen in Fig. 5, the participants’ responses showed a specific pattern of change over time, with an overestimation of numerosity at the shortest SOA, perhaps due to temporal integration of mask and targets, followed by an underestimation at the longest SOAs. Looking at the three different item ranges, it is also clear that there is a general regression towards the mean numerosity (which was 23, given that numerosity ranged from 11–35), which is apparent in the largely consistent error for large numerosities (around six to eight fewer items reported than actually presented) and for small numerosities (around six to eight more items reported than actually presented). This is consistent with previous findings showing serial dependence in numerosity estimation (Fornaciai & Park, 2018; Valsecchi, Stucchi, & Scocchia, 2018). In other words, the overall pattern is not that of a general improvement with SOA (error actually increases for 31–35-item displays), but of a general trend towards underestimation, increasing with longer SOA, accompanied by an overall response bias towards the mean number of items presented across all trials (which may reflect a serial dependence effect; Fornaciai & Park, 2018, 2020).

Fig. 5

Constant error as a function of SOA for estimation responses in Experiment 2

The design of Experiment 2 also allowed us to directly compare the pattern of responses found for individuation (Expt 1a) with those for estimation. As shown in Fig. 6, the error distributions were largely Gaussian shaped across all number ranges and values of SOA. This pattern is similar to what had been found for the six-to-eight-item set size in Experiment 1a and for the brief (33-ms) SOA for set sizes of three or four items in the individuation task (but not for longer SOAs; see Fig. 4). The error responses also reflect the trend towards overestimation for ≈10 items and underestimation for ≈30 items, consistent with a regression towards the mean across the estimation task (see Table 1 for descriptive statistics on the error distributions).

Table 1 Descriptive statistics for the error distributions
Fig. 6

Error distributions as a function of SOA for ≈10 (red), ≈20 (green), and ≈30 items (blue) pooled over all participants in Experiment 2. The inset crosses show the mean errors per participant

To further compare performance as a function of SOA between small and large numerosity ranges in Experiments 1a and 2, we calculated the accuracy coefficient (AC; see Method section) and variation coefficient measures (VC; see Method section). In practical terms, the AC gives an estimate of accuracy or bias in overall (mean) numerical processing. Values closer to AC = 0 indicate a lack of bias, whereas positive AC values indicate overestimation and negative AC values indicate underestimation. As shown in Fig. 7a, for individuation within the subitizing range (pooled over one to four items), the AC converged towards zero with longer SOAs, showing a strong effect of temporal factors on individuation performance, F(3, 51) = 28.8, p < 4.9 × 10^-11, ηp2 = 0.63. By contrast, individuation performance for six to eight items did not show any significant change with SOA, F(3, 51) = 1.4, p < .27, ηp2 = 0.08. Importantly, for estimation (pooled over 11–35 items), we also found a significant main effect of SOA on the AC, F(3, 51) = 50.4, p < 2.8 × 10^-15, ηp2 = 0.75. This pattern of results, however, did not reflect better accuracy with increasing SOAs, but instead a change in bias from overestimation to underestimation.

Fig. 7

Accuracy coefficient (a) and variation coefficient (b) measures as a function of SOA for individuation of small (1–4 items) and large item numbers (>5 items) in Expt 1a and for estimation in Expt 2. For the accuracy coefficient, a value of zero indicates the absence of bias, and positive/negative values show a tendency to either overestimate or underestimate numerosity. A lower variation coefficient indicates that the same or similar response was given across trials, whereas higher values indicate greater variability in responses

A low VC indicates low variability in the responses, such that similar answers are given from trial to trial. Thus, the VC can be interpreted as a measure of relative response precision, scaled to numerosity. For exact individuation of one to four items, we found a sharp decrease in VC (indicating higher precision) as a function of SOA, F(3, 51) = 47.7, p < 8 × 10^-15, ηp2 = 0.74 (see Fig. 7b). Individuation of >5 items showed an intermediate pattern, with a small but statistically significant decrease in VC with longer SOAs, F(3, 51) = 11.1, p < 1 × 10^-5, ηp2 = 0.4. For estimation, however, there was no significant effect of SOA on VC, indicating that response precision remained constant across temporal factors, F(3, 51) = 1, p < .4, ηp2 = 0.06.

In this experiment, the number of items in the estimation task ranged from 11 to 35. At higher densities, it has been proposed that texture processing, rather than numerical estimation, may determine performance, leading to a change from Weber’s law to a square-root law (for review, see Burr, Anobile, & Arrighi, 2017). We investigated this by measuring the VC as a function of set size across both experiments (one to four and six to eight items in Experiment 1a; ≈10, ≈20, ≈30 items in Experiment 2), at the longest SOA (200 ms). According to Weber’s law, the variability in responses should increase in proportion to the change in numerosity, leading to a relatively constant value of the Weber fraction (and VC) across numerosity values (for review, see Burr et al., 2017). In contrast, if increased numerosity past a certain critical point leads to a switch towards texture-based processing, then variability should increase according to the square-root law rather than staying at a constant Weber fraction (for review, see Burr et al., 2017); the two predictions are made explicit below. We found that the VC at the longest SOA (200 ms) was relatively flat, with no significant effect of set size, for both Experiment 1a, F(4, 68) = 1.5, p < .23, ηp2 = 0.08, and Experiment 2, F(2, 34) = 0.5, p < .64, ηp2 = 0.03. This pattern matches what has been reported previously in number perception tasks within the estimation range (for review, see Burr et al., 2017).
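To make the two predictions explicit (writing N for numerosity, and w and k for the respective proportionality constants, notation introduced here for illustration):

$$ \textrm{Weber's law: }\ \mathrm{SD}(N)=wN\ \Rightarrow\ \mathrm{VC}=\frac{\mathrm{SD}(N)}{N}=w $$
$$ \textrm{Square-root law: }\ \mathrm{SD}(N)=k\sqrt{N}\ \Rightarrow\ \mathrm{VC}=\frac{\mathrm{SD}(N)}{N}=\frac{k}{\sqrt{N}} $$

A VC that stays flat across set sizes is therefore the signature of Weber-like numerical estimation, which is the pattern observed here.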

General discussion

Across two experiments, we investigated the way in which limiting the effective processing time of a group of items influenced numerosity judgments. Previous work has suggested that the temporal resolution of ensemble processing is quite high, on the order of 50 ms (Chong & Treisman, 2003). However, there is evidence for a longer temporal integration window, in which judgments can become more precise with longer display times (Whiting & Oriet, 2011). It has also been argued that the time frame of ensemble perception may depend on the features and/or task in question (Hubert-Wallander & Boynton, 2015). Here, we investigated the time frame of ensemble processing of numerosity, one of the most fundamental statistical properties of a set of items.

To closely investigate the evolution of estimation over effective processing time, we took advantage of a combined forward plus simultaneous masking technique, which has been shown to be effective for characterizing the temporal evolution of individuating small numbers of items (Wutz et al., 2012; Wutz et al., 2014). This also allowed us to directly compare numerosity judgments in the subitizing range with performance for six to eight or 11 to 35 items. It remains a matter of debate whether and how ensemble processing is similar across different ranges of items. For example, is estimating the average size of two items similar, in terms of mechanisms, to estimating the average size of 20 items?

The first main finding from this study is that the time course of numerosity judgments, even within a range of one to eight items, differs as a function of the number of target items. Accuracy, as measured by the ability to give the correct exact numerosity, remained low whenever set size exceeded around four items across the SOA values from 33–200 ms. Within the one to four item subitizing range, accuracy improved with longer SOAs, first for the one-item display and eventually also for three or four targets. This replicates previous results with small numbers of items (Wutz et al., 2012; Wutz & Melcher, 2013; Wutz et al., 2014). In terms of temporal resolution, it is clear that participants needed 100 ms or more to reach good performance in the subitizing range. Outside of the subitizing range, the pattern of errors was consistent with an estimation mechanism, and performance did not significantly improve for longer SOAs.

An interesting middle case is Set Size 3, where the pattern of errors seemed to match what would be expected if responses reflected a combination of estimation (extracted within about 33 or 50 ms) and exact individuation (improving over more than 100 ms). Our findings agree with the proposal that the estimation process is active across all set sizes (Burr et al., 2010). The fact that there is a difference in error distribution within the subitizing range has been suggested to be due to the attentive selection of individual items for a small number of stimuli (Burr et al., 2010). Thus, for three to four items, the strategy used for numerical processing (estimation or individuation) depends on the effective duration of the stimulus (i.e., its integration window).

In terms of underlying mechanisms, the repeated finding of an advantage, in terms of accuracy and precision, for small numerosities (one to four items) has been linked to object individuation mechanisms, which can operate for such small set sizes. In particular, the object enumeration process has been linked to selective attention to a small number of items. Reducing the ability to rapidly focus attention on the items increases variability and makes both behavioral (Burr et al., 2010) and neural (Hyde & Wood, 2011) responses for small set sizes more similar to those found for estimation. In the current study, we found that very brief SOA values also led to responses that were more like estimation. These results are consistent with a limited temporal window to attend to and individuate objects (Wutz et al., 2012; Wutz & Melcher, 2013). Without sufficient time to operate, selective spatial attention is not able to individuate each item as a unique entity, meaning that only a less precise estimate of numerosity is possible.

A key finding in the present study is that the time course of estimation for larger numerosities (outside the subitizing range), as measured in two experiments using different response tasks, was quite different from the time course of individuation within the subitizing range. In the first experiment, using a two-interval comparison task, we found that accuracy did not differ as a function of SOA. In the second experiment, participants reported the exact numerosity in order to more directly match the individuation task in the first experiment. In contrast with the individuation task, there was not a general increase in performance as a function of SOA. Errors actually increased with longer SOAs for 31–35 item displays. Instead, there were two main patterns in the results. First, the reported number became smaller for longer SOAs, as shown by the downward slope (towards negative errors) over time. The second main pattern in Experiment 2 was a tendency of responses to regress to the mean number of items (23 targets), such that a smaller number of stimuli (≈10 items) was overestimated, and a larger number of stimuli (≈30 items) was underestimated. Critically, neither of these patterns was apparent in the first experiment for the small number individuation task (one to four items).

The overall pattern of results is consistent with theories that the underlying mechanisms allowing for estimation are different from those of small set-size individuation (Burr et al., 2010; Piazza et al., 2011). Here, we demonstrate differences in the time course of the two mechanisms. We can infer that even 33–50 ms is enough to register visual features to a degree that supports the perception of approximate numerosity. This provides an estimate of the temporal resolution of numerical estimation. Our findings are consistent with the idea that the temporal resolution of ensemble processing is sufficient to allow for some useful and meaningful information to be extracted within 50 ms (Chong & Treisman, 2003). We also found further changes within a temporal window of 200 ms, but it was difficult to distinguish between purely perceptual and decision-making effects, given that both over/underestimation and regression to the mean across trials may be related to decision strategies or other factors. Most previous studies of estimation have used comparison judgments (as in Experiment 1b), meaning that there are fewer studies in the literature that would have been able to find such effects. Other reports have shown that numerosity can be underestimated when items are enclosed, connected, or clustered together (He, Zhang, Zhou, & Chen, 2009; He, Zhou, Zhou, He, & Chen, 2015; Im, Zhong, & Halberda, 2016). The presence of the simultaneous mask may have induced such grouping effects. Future work is necessary to distinguish between more purely perceptual and more decision-based accounts of these two patterns in the estimation data.

Overall, the current findings are consistent with the claim that ensemble processing, at least in the case of the numerosity of the set, is as fast as processing a single item. In fact, it could be considered faster in the sense that individuation of three or four items improves over a longer time period, whereas no such improvement was found in the estimation of multiple items. Moreover, in the case of small sets of around three to four items, the results are consistent with the use of a combination of approximate estimation for short SOAs and exact enumeration for longer SOAs. Though the current findings provide further evidence for rapid temporal resolution of ensemble processing, it is of course important to remember that the time course may vary for different features or tasks (Hubert-Wallander & Boynton, 2015).

It is interesting to compare the time course found here, for simultaneous presentation of stimuli, to studies using sequential presentations of items (Anobile, Arrighi, & Burr, 2019; Cheng et al., 2019). In a recent study, Cheng et al. (2019) varied the rate of presentation of items, with the processing window of each item (before it was replaced by another item) ranging from 100–400 ms (100, 300, and 400 ms were tested). When items were presented every 100 ms, participants made a high number of errors, even within the subitizing range. The next fastest rate tested in that study, 300 ms, yielded performance similar to that found with simultaneous presentation of the items, with low error rates for one to three items. Consistent with many studies using simultaneous presentation, errors increased only once set sizes exceeded around four to five items. In all of their conditions with 300 ms or more per item, the individuation process could be completed before a new item arrived, leading to typical patterns of subitizing. In contrast, with only 100 ms, the individuation process (and updating of numerical information in memory) would presumably not yet be finished by the time a new item arrived, leading to errors even within the subitizing range. The current findings are compatible with those of Cheng et al. (2019), who investigated sequential presentation paradigms for individuation and observed a more flexible allocation of resources. The relatively fixed temporal window for object individuation found here could serve as the basic building block for enumeration operations. In this way, individuation in sequential paradigms may depend on a series of relatively fixed temporal windows needed to flexibly allocate resources for working memory and the updating of numerosity over longer time periods.

In summary, we replicated the finding that exact numerosity judgments, which require object individuation, improved within the subitizing range over a period of 100–200 ms. By contrast, numerical estimation, which can be supported by ensemble processing, was found to be largely independent of the effective duration of the stimulus in Experiment 1b. The second experiment showed that there was an increase in underestimation with longer effective stimulus durations, combined with regression towards the mean, but this did not improve performance in general. Overall, this pattern of results suggests that ensemble processing of “the forest” can occur as fast as, if not faster than, object-specific processing of “the trees” as individuals.