Perceptual addition of continuous magnitudes in an 'artificial algebra'

Although there is substantial evidence for an innate 'number sense' that scaffolds learning about mathematics, whether the underlying representations are based on discrete or continuous perceptual magnitudes has been controversial. Yet the nature of the computations supported by these representations has been neglected in this debate. While basic computation of discrete non-symbolic quantities has been reliably demonstrated in adults, infants, and non-humans, far less consideration has been given to the capacity for computation of continuous perceptual magnitudes. Here we used a novel experimental task to ask if humans can learn to add non-symbolic, continuous magnitudes in accord with the properties of an algebraic group, by feedback and without explicit instruction. Three pairs of experiments tested perceptual addition under the group properties of commutativity (Experiments 1a-b), identity and inverses (Experiments 2a-b), and associativity (Experiments 3a-b), with both line length and brightness modalities. Transfer designs were used in which participants responded on trials with feedback based on sums of magnitudes and later were tested with novel stimulus configurations. In all experiments, correlations of average responses with magnitude sums were high on trials with feedback. Responding on transfer trials was accurate and provided strong support for addition under all of the group axioms with line length, and for all except associativity with brightness. Our results confirm that adult human subjects can implicitly add continuous quantities in a manner consistent with symbolic addition over the integers, and that an 'artificial algebra' task can be used to study implicit computation.


Introduction
Humans are unique in our ability to perform explicit computation, that is, symbolic arithmetic and mathematics. Yet we share with nonhumans a capacity for implicit computation: an ability to perform arithmetic-like operations on internally-represented magnitudes. This capacity may support examples of adaptive behaviour in nonhumans that have evolved to be optimal (or nearly so), such as spatial navigation, cue integration, timing, and decision making (Gallistel, 1990; Grace et al., 2020). Implicit computation is an important topic for cognitive science because it may contribute to the development of symbolic mathematics, both in terms of its phylogenetic roots and its scaffolding of children's learning about numbers and arithmetic.
The proposal that mathematical learning is scaffolded by an evolutionarily-based system shared with nonhumans is, of course, not new. Extensive research on numerical cognition has investigated a nonverbal 'number sense', shared with nonhumans, that facilitates processing of quantity and numerosity (Dehaene, 1999). Number sense includes two components: an object tracking system (OTS) that facilitates automatic enumeration of small sets of objects; and an approximate number system (ANS) that allows for less precise estimates of larger quantities (Feigenson, Dehaene, and Spelke, 2004) and follows Weber's Law (Libertus and Brannon, 2009). The capacity to discriminate discrete quantities has been demonstrated with newborn human infants (Izard, Sann, Spelke, and Streri, 2009), as well as rhesus monkeys (Cantlon and Brannon, 2006; Hauser, MacNeilage, and Ware, 1996), lemurs (Merritt, MacLean, Crawford, and Brannon, 2011), rats (Meck and Church, 1983), and honeybees (Bar-Shai, Keasar, and Shmida, 2011; Gross et al., 2009), among other species, suggesting that the sense for number is innate and evolutionarily ancient. There is considerable evidence from neuroimaging studies that numerosity processing is localised in the intraparietal sulcus and other regions in the parietal cortex (Nieder and Dehaene, 2009; Sokolowski, Fias, Mousa, and Ansari, 2017), and number sense has been linked with later learning of symbolic mathematics (Chen and Li, 2014; Halberda, Mazzocco, and Feigenson, 2008; but cf. Lyons and Ansari, 2015).
Despite this evidence, questions have been raised whether number sense really represents number at all. Critics have argued that continuous magnitude, rather than discrete quantity, is the dimension more plausibly accounted for in number sense research, and that knowledge of numerical quantity is not innate but acquired through experience of reliable environmental correlations between discrete and continuous magnitudes (Henik, Gliksman, Kallai, and Leibovich, 2017; Leibovich, Katzin, Harel, and Henik, 2017). The main line of argument stems from the fact that numerosity is always correlated with at least one continuous dimension, and so any attempted measure of discrete quantity is necessarily confounded by non-discrete measures (Leibovich and Henik, 2013; Mix, Huttenlocher, and Levine, 2002). For example, a cloud of dots that is more numerous than another will necessarily have a larger convex hull, smaller dot size, and/or greater dot density. Thus it cannot be ruled out that some or all of these continuous dimensions, rather than discrete numerosity, are implicated in results of number sense studies. Proponents of this alternative account argue that continuous magnitude is a plausible basis for numerosity judgements because it represents a more automatic process that is likely to be evolutionarily and developmentally antecedent to a dedicated sense of discrete number (Gebuis and Reynvoet, 2012; Henik, Leibovich, Naparstek, Diesendruck, and Rubinsten, 2012; Leibovich et al., 2017). Other researchers have argued for a middle ground between theories of 'number' and 'magnitude' sense, proposing that discrete and continuous magnitudes are innately encoded as distinct, but perceptually related, dimensions (Lourenco and Aulet, 2023).
Many studies have tested competing predictions of these accounts, typically with discrimination tasks in which participants make a binary response to indicate which of two magnitudes is greater or whether one magnitude is distinct from another (Gebuis and Reynvoet, 2013; Park, DeWind, Woldorff, and Brannon, 2016). Yet somewhat overlooked in this debate has been the nature of the computations that can be performed with internally-represented magnitudes. This is an important gap in our knowledge, because the nature of such computations and how magnitudes are represented are interrelated questions, and an eventual complete theory will need to answer both.
There is evidence for computation with discrete magnitudes consistent with the number sense account. Non-symbolic addition and subtraction of discrete quantities has been documented in human infants (Cohen and Marks, 2002; McCrink and Wynn, 2004; Wynn, 1992), monkeys (Cantlon and Brannon, 2007; Flombaum, Junge, and Hauser, 2005), honeybees (Howard, Avarguès-Weber, Garcia, Greentree, and Dyer, 2019), and newly-hatched chicks (Rugani, Fontanari, Simoni, Regolin, and Vallortigara, 2009), and fMRI studies have confirmed these tasks activate similar neural structures to those involving symbolic arithmetic in older humans (Bugden, Woldorff, and Brannon, 2019; Venkatraman, Ansari, and Chee, 2005). If cognition of discrete quantities is in fact scaffolded by a learned correlation with continuous magnitudes, as critics of number sense have claimed, then the computations performed in such tasks might be based on representations of their underlying continuous dimensions. But to our knowledge, no studies from developmental psychology or comparative cognition have investigated whether non-symbolic computation of (purely) continuous magnitudes is possible.
Some evidence for computation of continuous magnitudes can be found in the psychophysical literature. Studies dating back to the 1960s and 1970s, for example, showed that adult human subjects could average pairs of continuous magnitudes including weight (Anderson, 1967), line length (Miller and Sheldon, 1969; Weiss and Anderson, 1969), greyness (Weiss, 1972), and angle (Stanley, 1974), with Norman Anderson's 'information integration' model also extending to integrative statistical judgements in social and cognitive psychology (Anderson, 1981); see Bauer (2015) for review. Interest in psychophysical averaging was rekindled at the turn of the century following an influential paper by Dan Ariely (Ariely, 2001), which highlighted the importance of 'ensemble perception' for understanding how the brain aggregates complex perceptual information. Subsequent studies have demonstrated an ability to compute average hue (Maule and Franklin, 2015; Webster, Kay, and Webster, 2014), frequency (Piazza, Sweeny, Wessel, Silver, and Whitney, 2013), and facial characteristics (Haberman and Whitney, 2009; Luo and Zhou, 2018), among other modalities.
A small number of psychophysical studies from the 1960s and 1970s also utilised methods of continuous addition, with mixed results. Goude (1962) observed that subjects overestimated the sum of two lifted weights (when compared to a third, standard weight), but were able to accurately add two angles to produce a third angle representing their sum. Curtis and Fox (1969) found subjects' category-rated sum of lifted weights was consistent with separate estimates of their individual magnitudes. Subjects in Krueger's (1970) study systematically overestimated the sums of perceived line lengths, while Abravanel (1971) found subjects were able to accurately sum lengths in visual, haptic, and cross-modal stimulus production procedures. Dawson (1971) instructed subjects to numerically estimate the sums of circle areas and loudnesses, and found the estimates closely approximated the corresponding magnitude scales. In another test of cross-modal sensory equivalence, subjects consistently added visually and haptically perceived angles both within and across modalities by indicating the location of the summed angle on a circle (Stanley, 1974).
Finally, there is a large body of psychophysical research concerning the computation of differences and ratios of continuous magnitudes. These studies historically utilised variations on Stevens' methods of magnitude estimation (Stevens, 1957), alongside category rating tasks (in which judgements are ranked along an ordinal scale), to determine the operation(s) according to which comparisons of perceptual magnitudes are made. Results have shown that adult human subjects can reliably estimate differences across a range of sensory modalities, including heaviness (Birnbaum and Veit, 1974; Mellers, Davis, and Birnbaum, 1984; Rule, Curtis, and Mullin, 1981), line length (Parker, Schneider, and Kanow, 1975), darkness (Birnbaum, 1978; Veit, 1978), loudness (Birnbaum and Elmasian, 1977; B. Schneider, Parker, Farrell, and Kanow, 1976), pitch (Elmasian and Birnbaum, 1984), sweetness (Graaf and Frijters, 1988), and distances between US cities (Birnbaum and Mellers, 1978). The ability to estimate ratios is less universally supported, and it has been suggested this may be possible only with respect to 'extensive' continuous dimensions, such as length and area (Masin, 2013; Masin and Brancaccio, 2017; Masin, Brancaccio, and Tomassetti, 2019). Grace, Morton, Ward, Wilson, and Kemp (2018) recently argued that a fundamental limitation of all prior studies for investigating the capacity for pre-verbal magnitude comparison was their reliance on explicit instruction and formal symbolism, and therefore higher-level cognitive understanding of mathematics: subjects in these studies are typically instructed at the outset which computation to perform, and respond on a numeric scale. Drawing on behavioural learning methodologies, they devised a non-symbolic, non-verbal task designed instead to engage implicit learning processes. Rather than responding with numbers, participants compared pairs of brightnesses, circle areas, and numerosities by making a mouse click along an analogue response bar. Feedback was provided showing the position on the response bar corresponding to either the scaled difference or the ratio of the magnitudes presented, so that participants were trained to respond according to one or the other operation without explicit instruction. Results showed that under this paradigm, adult human subjects could learn to produce both differences and ratios of continuous magnitudes with a high degree of accuracy.
Although Grace et al.'s (2018) original motivation was to test if magnitudes were compared by differences or ratios (Torgerson, 1961), their task can be described as an 'artificial algebra' because in principle it could be used to train arbitrary relationships between magnitudes by feedback and without explicit instruction, similar to artificial grammar tasks in psycholinguistics (Pothos, 2007). In the current study, we extended this paradigm to test whether addition of continuous magnitudes is possible. Rather than being trained via feedback to produce the difference or ratio of two magnitudes, participants were trained to produce the sums of pairs of brightnesses or line lengths by making a mouse click along the response bar, to which the sums of the physical magnitudes were linearly mapped. Specifically, we tested whether participants could produce additive judgements corresponding to the axioms of an algebraic group, which generalise symbolic addition over the integers. We give formal definitions of these axioms here: A group is a pair (G, ∘) where G is a non-empty set and ∘ is a binary operation on G (that is, ∘ is a map from G × G to G), such that the following three conditions are satisfied.
• Associativity: For all a, b, c in G, (a∘b)∘c = a∘(b∘c) holds; in symbols, ∀a ∀b ∀c ((a∘b)∘c = a∘(b∘c)).
• Identity: There exists e in G such that for each a (in G), a∘e = e∘a = a holds; in symbols, ∃e ∀a (a∘e = e∘a = a).
The element e is the identity element.
• Inverse: For each a in G there exists b (in G) such that a∘b = b∘a = e holds; in symbols, ∀a ∃b (a∘b = b∘a = e).
The element b is the inverse of a.
If, in addition to the above requirements, the binary operation ∘ is such that a∘b = b∘a holds for each a and b (in G), then (G, ∘) is a commutative (or 'Abelian') group. The integers under addition (ℤ, +) satisfy the requirements for a commutative group. In this research we sought to test whether structurally analogous computations with continuous magnitudes are possible within Grace et al.'s (2018) artificial algebra paradigm.
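These axioms can be checked mechanically for a small finite example. The sketch below is our illustration rather than anything from the experiments: it exhaustively verifies associativity, identity, inverses, and commutativity for addition modulo 5, a finite stand-in for (ℤ, +).

```python
from itertools import product

# The group (Z_5, + mod 5): a finite stand-in for the integers
# under addition, small enough to check the axioms exhaustively.
G = range(5)
op = lambda a, b: (a + b) % 5

# Associativity: (a o b) o c == a o (b o c) for all a, b, c.
assert all(op(op(a, b), c) == op(a, op(b, c))
           for a, b, c in product(G, repeat=3))

# Identity: an element e with a o e == e o a == a for every a.
e = 0
assert all(op(a, e) == op(e, a) == a for a in G)

# Inverses: every a has some b with a o b == b o a == e.
assert all(any(op(a, b) == op(b, a) == e for b in G) for a in G)

# Commutativity: a o b == b o a, making the group Abelian.
assert all(op(a, b) == op(b, a) for a, b in product(G, repeat=2))
print("(Z_5, +) satisfies all four group properties")
```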
We report three pairs of experiments using both extensive and intensive stimulus modalities (line length and brightness), which test continuous addition under analogues of the four group properties described above. Experiments used a transfer design in which participants were first trained on a set of exemplars and later tested with novel stimulus configurations for which additive group properties make specific predictions. If participants learned to respond accurately on trials with feedback, and in accordance with predictions on test trials, it would suggest that they had implicitly learned to add the stimulus magnitudes in a manner analogous to symbolic addition (i.e. commutative, associative, and inclusive of inverse and identity elements). Insofar as such perceptual addition is possible with continuous stimuli, this result would provide evidence for a general capacity for implicit computation that is not restricted to discrete dimensions.

Experiments 1a-b (Commutativity)
Our goal was to test whether participants could learn to respond in Grace et al.'s (2018) task based on the sum of stimulus values, and whether their responding was commutative. On each trial, they saw a pair of lines that varied in length (Experiment 1a) or circles that varied in brightness (Experiment 1b) and clicked on a horizontal response bar below the stimuli. The stimuli were sampled from a set of 28 pairs, which were arranged so that half of the pairs had the longer line or brighter circle on the right (and vice versa). After each response, participants received feedback in the form of a coloured oval centred on the correct response location, defined in terms of the sum of the stimuli. If their response was within ±7% of the correct location, the oval was green; otherwise it was red. Incorrect responses were followed by a correction trial with the same stimulus pair. There were four blocks of trials in which each pair was presented three times. In the fourth block, six stimulus pairs were presented with their original left-right position reversed, and responses did not produce feedback. These transfer test trials were meant to assess whether participants' responding violated commutativity, that is, whether responding was conditional on the spatial order of the stimuli. Because the trained operation was addition, the transfer tests assessed if the response to b + a equalled the response to a + b.

Participants
Twenty undergraduate psychology students from the University of Canterbury served as participants, eight in Experiment 1a (5 m, 3 f; M = 19.25 years) and 12 in Experiment 1b (2 m, 10 f; M = 19.75 years). All reported normal or corrected-to-normal vision. None were familiar with the purpose of the research, and they received course credit in exchange for participation.
Materials

One of the possible 28 pairs of lines or brightness stimuli was presented on each trial. Lines were pale yellow on a black background, mm wide, with midpoints positioned 18 cm apart and 10 cm from the top of the screen. They were presented at randomised angles (which differed by at least 30° within pairs) to discourage use of direct measurement strategies. Brightness stimuli were displayed as circles, each cm in diameter (6.8° visual angle), and shown side by side against a black background, 8 cm from the top of the screen and separated by 3.3 cm. A 2 mm thick grey horizontal response bar, containing no markings and 16.5 cm in length, was positioned below the line length or brightness stimuli.
For each stimulus pair, 'correct' responses in terms of locations on the horizontal bar were calculated based on the sum of the nominal line length or brightness grayscale values. The left and right ends of the horizontal bar were defined as 0 and 1, respectively. Sums were then scaled linearly relative to the minimum and maximum sums across the 28 pairs, so that the minimum sum corresponded to 0.05 on the response bar and the maximum sum to 0.95 (to avoid floor and ceiling effects):

x = 0.05 + 0.90 × [(a + b) − Smin] / (Smax − Smin),

where a and b are the nominal values of the left and right stimuli, and Smin and Smax are the minimum and maximum sums across the 28 pairs.
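The linear mapping of sums onto the response bar can be sketched in a few lines. The function name and the example sums below are hypothetical; the actual 28 pairs are listed in Supplementary Table S1.

```python
def correct_location(a, b, pair_sums):
    """Map the sum of nominal stimulus values a + b onto the response
    bar (0 = left end, 1 = right end), rescaled linearly so that the
    smallest training sum falls at 0.05 and the largest at 0.95."""
    s_min, s_max = min(pair_sums), max(pair_sums)
    return 0.05 + 0.90 * ((a + b) - s_min) / (s_max - s_min)

# Hypothetical nominal sums for five pairs, for illustration only.
sums = [10, 14, 20, 26, 30]
print(correct_location(4, 6, sums))    # minimum sum: maps to 0.05
print(correct_location(14, 16, sums))  # maximum sum: maps to ~0.95
```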

Procedure
Participants were seated at a desk in a cubicle with an HP Elitedesk core i7 computer with a Samsung 22" LCD monitor. The monitor had a resolution of 1680 × 1050 pixels, a maximum brightness of 300 cd/m², and a contrast ratio of 1000:1. Participants were told to "estimate the length of the lines" or "estimate the brightness of the circles" by making a click along the response bar, and that the purpose was "to learn how to estimate the [line lengths/brightnesses] as accurately as you can". On each trial, participants saw one of the possible 28 pairs of lines or circles. For half of the pairs the longer line or brighter circle was always presented on the left, and the reverse was true for the other half of the pairs. These constituted the training pairs, and each was presented three times in randomised order in each of the first three trial blocks in the session, giving a total of 84 trials per block. In the fourth block, an additional six transfer pairs were included, each presented three times. The transfer trials likewise presented one of the 28 possible pairs of values, but their order was the reverse of that encountered in training trials (i.e. if the brighter circle appeared on the left during the training trials for that pair, it was presented on the right in the transfer trial). The transfer pairs were selected to span the range of training values (pairs with ranks = 2, 4, 11, 17, 24 and 27). The fourth block included 102 trials (84 training and 18 transfer trials), so the session comprised 354 trials in total. Nominal stimulus values, correct response locations, and the identity of transfer pairs are given in Supplementary Table S1 for both Experiments 1a and 1b. Fig. 1 shows a schematic of the experimental setup and procedure (with brightness stimuli). Participants indicated their response by clicking on the horizontal response bar with a mouse. There was no time limit to respond. For training trials, feedback was presented after a 100 ms delay by a 10 mm high oval extending 7% in either direction and centred on the correct response location (this threshold was determined by pilot testing to yield an optimal level of task engagement). If the participant's response fell within the ±7% range, the oval was green, whereas if it fell outside this range the oval was red. After 500 ms, the stimuli and response bar were removed, and the next trial began after a 2-s interval. Responses that produced red ovals were followed by a single correction trial in which the same stimulus pair was presented. Responses on correction trials were not included in the results. For transfer trials, the procedure was identical to training trials except that the green or red feedback oval was omitted and no correction trials were presented.
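The feedback rule can be summarised in a short sketch. The function name and the tuple return convention are our own, not from the paper; only the ±7% tolerance and the correction-trial contingency come from the procedure described above.

```python
def feedback(response, correct, tolerance=0.07):
    """Classify a response-bar click against the correct location.
    Responses within +/-7% of the bar length earn a green oval;
    otherwise the oval is red and a correction trial follows with
    the same stimulus pair. Returns (oval colour, correction flag)."""
    if abs(response - correct) <= tolerance:
        return "green", False   # accurate: no correction trial
    return "red", True          # inaccurate: repeat the pair

print(feedback(0.52, 0.50))  # -> ('green', False)
print(feedback(0.30, 0.50))  # -> ('red', True)
```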
All research protocols for Experiments 1a-1b and other experiments reported here were approved by the University of Canterbury Human Ethics Committee (HEC 2017/66/LR-PS).

Results
Data for one participant from Experiment 1a and two from Experiment 1b were omitted due to non-systematic responding (correlations of responses with correct values were rs = 0.43, −0.13, and −0.18).
Fig. 2 shows the absolute deviation of responses from correct values for each block, averaged over participants with stimulus pairs that varied in line length (Experiment 1a) or brightness (Experiment 1b).
Because accuracy increased after Block 1 but stabilised thereafter, subsequent analyses used data pooled over Blocks 3 and 4. To examine how responses depended on training, we calculated correlations for each participant between average responses and correct values over the 28 trained stimulus pairs (i.e., those given feedback). For line length, the average correlation was r = 0.969 [95% CI: 0.953, 0.968], and for brightness it was r = 0.955 [95% CI: 0.942, 0.968]; these were not significantly different, t(15) = 1.30, p > .20.
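The per-participant analysis amounts to a Pearson correlation between a participant's mean responses and the correct response locations over the trained pairs. A minimal sketch follows, with hypothetical data for five pairs rather than the 28 actually used; the helper function and illustrative values are ours.

```python
def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical mean responses per trained pair (pooled over Blocks
# 3-4) against the correct response locations; responses here are
# conservative (less extreme than correct) but strongly linear.
correct   = [0.05, 0.25, 0.50, 0.75, 0.95]
responses = [0.10, 0.32, 0.46, 0.70, 0.86]
print(round(pearson(responses, correct), 3))
```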
Fig. 3 shows scatterplots of responses versus correct values for the trained stimulus pairs, averaged across participants for Experiment 1a (left panel) and 1b (right panel). Data are shown separately for trials in which the left stimulus magnitude was less than the right stimulus magnitude (a < b) and the converse (a > b). For both line length and brightness stimuli, responding was strongly linearly related to correct values (R² > 0.98), and did not appear to vary depending on whether the left or right magnitude was greater. Regression slopes were <1, indicating conservatism, that is, a tendency for responding to be systematically less extreme than correct values. Fig. 4 compares average responses on the commutativity test trials with responses on the corresponding training trials. Correlations between test and trained responses were r = 0.996 for Experiment 1a and 0.930 for Experiment 1b, and the average absolute deviations of responding on test and training trials were 0.019 and 0.014. This shows that on average, responding on test trials was effectively identical to responding on training trials, so that predictions based on commutativity were supported.
To determine whether the average data in Fig. 4 were representative of individual results, we calculated correlations and average absolute deviations for each participant. For Experiment 1a the average individual correlation was 0.948 [95% CI: 0.901, 0.994], and for Experiment 1b it was 0.926 [95% CI: 0.894, 0.959]. The average absolute deviation for individual data was 0.049 [95% CI: 0.038, 0.060] in Experiment 1a and 0.073 [95% CI: 0.061, 0.085] in Experiment 1b. Responding on test and training trials was thus highly consistent for individual data.
The strength of evidence for equal test and training responses for the data in Fig. 4 was estimated in a Bayesian analysis (Keysers, Gazzola, and Wagenmakers, 2020). Specifically, we ran a Bayesian repeated-measures ANOVA with training vs. test and stimulus pair as within-subjects factors. We used a JZS prior based on a Cauchy distribution (Rouder, Morey, Speckman, and Province, 2012) and calculated the strength of evidence for the null hypothesis using the BayesFactor package (Morey, Rouder, and Jamil, 2018) in Jamovi (Jamovi, 2022). For both lines (left panel) and brightnesses (right panel), the analysis indicated moderate evidence for the null hypothesis relative to the training vs. test effect, BF01 = 4.75 and 5.59, respectively.
Results of Experiments 1a and 1b show that with both extensive (line length) and intensive (brightness) dimensions, participants in the artificial algebra task learned to make an analogue response based on the sum of stimulus magnitudes by feedback and without explicit instruction. Responses were accurate in terms of being highly correlated with trained values, and consistent with commutativity: When stimulus pair b + a was presented without feedback, responses were indistinguishable from those previously made to a + b. These results extend those of Grace et al. (2018) and Chen, Berg, Kemp, and Grace (2020) to show that the artificial algebra task can be used to train responding based on sums of stimulus magnitudes. However, it is important to note that because correct values were determined by a linear mapping of nominal sums to the response bar, the operation used by participants on subjective magnitudes could have been any operation that was linearly related to addition; for example, averaging of magnitudes. The possibility that participants may have responded based on averages rather than sums is addressed in Experiments 2 and 3.

Experiments 2a-2b (Inverse and Identity)
Experiments 2a and 2b tested analogues of the inverse and identity group properties in the artificial algebra task. It makes sense to test these properties together, as both involve the identity element e, which is defined such that for all a in G, a + e = a. For the additive group over the integers, e = 0. The inverse property states that each element a has an inverse, −a, such that a + (−a) = e. This property defines subtraction as the inverse operation of addition. Importantly, averaging does not satisfy the group axioms of identity or inverses. These experiments therefore provide a more direct test of participants' ability to perform perceptual addition, rather than to compute a statistical generalisation of the stimuli presented.
To create analogues of inverse and identity, we arranged our task so that if the magnitude of the left stimulus was greater than the right, the correct value was towards the left of the response bar; if the magnitude on the right was greater, the correct value was towards the right; and if the magnitudes were equal, the correct value was the centre of the response bar. Specifically, correct values were calculated as the difference between the stimulus magnitudes, with (conventionally) the left magnitude corresponding to negative and the right to positive values. Each stimulus pair was presented with stimuli in both left/right positions, that is, as both (a,b) and (b,a). The inverse property predicts that the response to (b,a) should be the opposite of the response to (a,b) in terms of deviation from the centre of the response bar. Expressed differently, the inverse property predicts that the function relating responses to correct values should be symmetrical around trials with equal stimuli, for which the correct value was the centre of the response bar. Trials were included in which either the left or right stimulus was omitted, that is, a blank space was presented where the line or circle would usually appear; these were analogues to adding zero (i.e., the identity element).
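Under this arrangement, the correct location is a scaled, signed difference centred on the bar midpoint. A minimal sketch follows; the function name, the 0.45 scale factor, and the d_max value are assumptions chosen for illustration (they keep the extremes inside the 0.05-0.95 range, but the experiments' exact constants are in Supplementary Table S2).

```python
def difference_location(a, b, d_max):
    """Correct response location for the difference-based mapping:
    0.50 (bar centre) is the identity, right-greater pairs fall to
    the right of centre, left-greater pairs to the left. A missing
    stimulus is coded as 0. d_max is the largest absolute difference
    among trained pairs; 0.45 scales extremes into [0.05, 0.95]."""
    return 0.50 + 0.45 * (b - a) / d_max

d_max = 8
print(difference_location(3, 3, d_max))  # equal pair: identity, centre
print(difference_location(5, 0, d_max))  # right stimulus absent: left of centre
print(difference_location(0, 5, d_max))  # left stimulus absent: right of centre

# Inverse property: (a,b) and (b,a) deviate equally from the centre
# but in opposite directions.
dev_ab = difference_location(2, 7, d_max) - 0.50
dev_ba = difference_location(7, 2, d_max) - 0.50
assert abs(dev_ab + dev_ba) < 1e-12
```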
Participants were not given specific instructions other than that their task was to estimate the brightness or length of the stimuli by clicking on the response bar. As in Experiments 1a-1b, participants completed a series of training blocks, with transfer trials included in the final blocks. On transfer trials, novel stimulus configurations were presented without feedback, and we tested whether responses conformed to predictions based on the identity and inverse properties.

Participants
Fourteen individuals from the University of Canterbury community served as participants, seven in Experiment 2a (2 m, 5 f; M = 26.57 years) and seven in Experiment 2b (6 m, 1 f; M = 20.86 years). All reported normal or corrected-to-normal vision. None were familiar with the purpose of the research, and they received a $15 NZD shopping voucher in exchange for participation.

Materials
Experiment 2a (line length) used the same stimuli as Experiment 1a, and Experiment 2b (brightness) used the same stimuli as Experiment 1b.

Procedure
All details of the procedure were the same as Experiments 1a-1b, with the following exceptions. Participants were instructed to "estimate the lengths of the lines (or brightness of the circles) by clicking on the response bar" and told that the purpose was "to learn to estimate the length of lines (or brightness of the circles) as accurately as possible". Sessions consisted of five blocks of trials, each separated by a short break. The first three blocks comprised 58 trials each. These trials were defined as follows (see Supplementary Table S2). Of the 28 pairs of stimuli (a,b), with a < b, 14 pairs were presented twice: once with a on the left and b on the right, (a,b), and once in the reversed order, (b,a) (28 trials). Of the remaining 14 pairs, seven were presented as (a,b) and seven as (b,a) (14 trials). Of the eight individual stimuli, four were presented singly in identity-element trials on both the left and right, that is, as (a,0) and (0,a) (8 trials), and four were presented singly in either the left or right location, (a,0) or (0,a) (4 trials). For four of the eight individual stimuli, trials were included in which the stimulus appeared in both the left and right locations, that is, (a,a) (4 trials).
Blocks 4 and 5 each included 22 test trials in addition to the same 58 trials as the preceding three blocks (80 trials in total). Of these test trials, 14 presented stimulus pairs as (b,a) that had been trained as (a,b) (7 trials), or vice versa (7 trials). Four test trials presented stimuli singly on the left (or right) that had been trained on the right (or left) in preceding blocks, and four test trials presented the four stimuli in both locations, (a,a), that had not previously been used in (a,a) trials. Thus, sessions consisted of 334 trials in total (training and test).
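The session length follows from the block structure by simple arithmetic, which the sketch below checks against the trial counts stated above.

```python
# Trial counts from the design: 58 training trials per block, plus
# 22 no-feedback test trials added in each of Blocks 4 and 5.
training_per_block = 28 + 14 + 8 + 4 + 4  # reversible pairs, single-order
assert training_per_block == 58           # pairs, identity, single-side,
                                          # and (a,a) trials
test_trials = 14 + 4 + 4                  # reversed pairs, swapped singles,
assert test_trials == 22                  # and novel (a,a) trials

total = 3 * training_per_block + 2 * (training_per_block + test_trials)
print(total)  # -> 334, the reported session length
```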
On training trials, feedback was based on the correct response location, which was calculated as a scaled difference between the right and left stimulus values:

x = 0.50 + k(b − a),

where a and b are the nominal values of the left and right stimuli (a missing stimulus coded as 0), and k is a positive constant chosen so that correct values spanned 0.05 to 0.95 across trials. Nominal stimulus values and correct response locations are given in Supplementary Table S2.

Results
Fig. 5 shows the average absolute deviation from correct values by block for Experiments 2a (line length) and 2b (brightness). Deviations decreased across blocks in both experiments, with a somewhat greater decrease observed for line length. A repeated-measures ANOVA with block and stimulus type as factors found a significant effect of block, F(4, 48) = 14.95, p < .001, ηp² = 0.56, and no significant effect of stimulus type (p = .80) or interaction (p = .06). Post-hoc tests (Tukey) confirmed that deviations in Block 1 were significantly greater than deviations in Blocks 2-5. Subsequent analyses used responding pooled over Blocks 3-5.
The inverse property predicts that the function relating responses to correct values should be symmetrical around the centre of the response bar (i.e., x = 0.50). Results in Fig. 6 appear consistent with this expectation. If the panels are divided in half at x = 0.50, a symmetrical pattern is evident for both experiments, in which responses for each half are linearly related to correct values but with shallower slopes than over the full range. This pattern suggests that participants were responding based on deviations from the centre, with the shallower slopes indicative of the conservatism seen in responding in Experiments 1a-1b; that is, participants tended to respond towards the middle of the possible range, avoiding the extremes.
Specifically, the symmetry predicted by the inverse property requires that the response to (a,b) should be the opposite of the response to (b,a) in terms of deviations from x = 0.50. To test this, we first compared performance on pairs of trained (a,b) and (b,a) trials, that is, with the same stimuli but opposite left/right positions. As Table S2 shows, there were 14 pairs that were trained both as (a,b) and (b,a), and 4 stimuli presented singly that were trained as (a,0) and (0,a). For these 18 pairs of trials, the absolute deviations of responses from the centre of the response bar (x = 0.50) should have been equal. Based on group-averaged responses, the average of these absolute deviations was 0.015 and 0.019 for Experiments 2a and 2b, respectively. When absolute deviations were calculated from each participant's data, they averaged 0.044 [95% CI: 0.037, 0.051] and 0.052 [95% CI: 0.041, 0.062] for Experiments 2a and 2b, and were not significantly different, t(12) = −1.67, p > .25. These results show that in both Experiments 2a and 2b, responding was consistent with the symmetry predicted by the inverse property for trials in which feedback was included.
Fig. 7 shows results from the inverse test trials for Experiments 2a (left panel) and 2b (right panel). For the 14 pairs which were trained as either (a,b) or (b,a), average responses on test trials (in terms of absolute deviations from the centre of the response bar) are plotted against the corresponding response on training trials. Performance on inverse test trials corresponded closely to that predicted from training trials. For line length (Experiment 2a), the average absolute deviation between (a,b) and (b,a) trials was 0.015, and for brightness (Experiment 2b) it was 0.013. For individual participants, average absolute deviations were 0.038 [95% CI: 0.030, 0.046] for line length and 0.043 [95% CI: 0.037, 0.050] for brightness, which were not significantly different, t(12) = −0.94, p > .35.
A Bayesian repeated-measures ANOVA similar to that in Experiments 1a-b found moderate evidence for the null hypothesis (equality of deviations from centre for trained and tested trials) for lines (Fig. 7, left panel), BF01 = 5.27, and weak evidence for brightnesses (right panel), BF01 = 2.85.
Fig. 8 shows results of identity test trials. These were trials in which stimuli were presented singly, but on the opposite side from trained trials; that is, (a,0) was tested after (0,a) had been trained, and vice versa. Average test trial responses are plotted against the corresponding response on trained trials, as the deviation from the centre of the response bar. Test trial deviations corresponded closely to those from training.

Results of Experiments 2a and 2b confirm that participants quickly learned to add both line length and brightness magnitudes in a manner consistent with analogues of the inverse and identity group properties. These properties are integral to algebraic groups and encode key aspects of symmetry. The present task used spatial analogues of inverse and identity: Participants estimated the magnitudes of the stimulus pairs, responding more towards the left on the bar if the magnitude of the left stimulus was greater, and more to the right if the right stimulus was greater.
The key prediction of the inverse property was that responding should be symmetrical around the centre of the response bar, and this prediction was supported by trained and test trials in both experiments. These results are consistent with a spatial application of the inverse property under addition. According to the identity property, adding the identity element leaves any element unchanged, that is, a + 0 = 0 + a = a. In our task, identity was tested by including trials in which only a single stimulus was presented; the left/right position with the omitted stimulus was assumed to be zero magnitude and corresponded to the identity element. Results showed that with both line length and brightness stimuli, participants responded equivalently on training and test trials, consistent with addition under the identity axiom.

Experiments 3a-3b (Associativity)
The associative property holds that the order in which a binary operation is applied to two or more elements does not matter, that is, a + (b + c) = (a + b) + c. A procedural analogue of associativity in the artificial algebra task was devised in which two stages incorporating a total of three stimuli were used. In Stage 1, either one or two stimuli were presented (corresponding to a or a + b); we refer to these as Type 1 and Type 2 trials, respectively. These stimuli were shown in the upper half of the screen with a horizontal response bar below. In Stage 2, two stimuli were shown (i.e., b + c) on Type 1 trials, and one stimulus (i.e., c) on Type 2 trials, in the lower half of the screen, also with a response bar below. Participants were asked to estimate the overall length (or brightness) of the stimuli presented at each stage. Feedback with respect to both stages was only presented after the response in Stage 2 had been made. This is not a strict analogue of the associative axiom, since it involves a two-stage procedure whereas group axioms are properties of binary operations. However, it provides a useful means to study the consistency with which participants are able to add multiple stimuli in successive stages, in keeping with associativity. Like Experiments 1 and 2, it also provides a means to discriminate between perceptual addition and averaging, since averaging does not satisfy the associative axiom and so would predict different results in this task.
Similar to Experiments 1a-b and 2a-b, both line length and brightness stimuli were used (3a and 3b). Eight different stimuli in each experiment were used to generate a total of 56 triplets (a,b,c) in which a < b < c. Of the 56 triplets, 28 were trained as both Type 1 and Type 2 trials, while 28 were trained as either Type 1 or Type 2 only (14 each). In the final block of training, test trials were included in which the triplets that had been trained as Type 1 were presented in a Type 2 trial (or vice versa) and no feedback was presented. The key question was whether Stage 2 responses on test trials were similar to responses on the corresponding trained trials; that is, whether the response made with respect to the addition of all three elements depended on the order in which additions on the subsets were made. We also examined overall accuracy of responding in terms of whether Stage 1 and 2 responses were correlated with correct values.
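The triplet count follows from simple combinatorics, sketched below (the magnitudes and the assignment of triplets to conditions are illustrative placeholders; the actual values appear in Supplementary Table S3):

```python
from itertools import combinations

# Eight stimulus magnitudes (placeholder values, in ascending order).
stimuli = [1, 2, 3, 4, 5, 6, 7, 8]

# All triplets (a, b, c) with a < b < c: C(8, 3) = 8! / (5! 3!) = 56.
triplets = list(combinations(stimuli, 3))
assert len(triplets) == 56

# 28 triplets trained as both Type 1 and Type 2; the remaining 28 split
# evenly between Type-1-only and Type-2-only training (14 each).
# (The split shown here is arbitrary; the actual assignment is in Table S3.)
both, type1_only, type2_only = triplets[:28], triplets[28:42], triplets[42:]
assert len(both) == 28 and len(type1_only) == 14 and len(type2_only) == 14
```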

Participants
Eighteen individuals from the University of Canterbury community served as participants, 10 in Experiment 3a (6 m, 4 f; M = 28.2 years) and 8 in Experiment 3b (4 m, 4 f; M = 25.5 years). All reported normal or corrected-to-normal vision. None were familiar with the purpose of the research, and they received a $15 NZD shopping voucher in exchange for participation.

Procedure
Participants were seated at a desk in a cubicle with an HP Elitedesk core i7 computer with a Samsung 22" LCD monitor. The monitor had a resolution of 1680 x 1050 pixels, a maximum brightness of 300 cd/m², and a contrast ratio of 1000:1. Participants were instructed to "estimate the overall length of the lines" or "estimate the overall brightness of the circles" and told they "should respond at whatever pace feels comfortable and natural". They were also informed that "your click on the second response bar should be to the right of your original response".
Each session consisted of two blocks separated by a brief rest period. The first block consisted of 84 trials, all of which provided feedback, and the second block consisted of 112 trials. The second block included the same 84 trials as the first block, plus 28 test trials. The 84 trials in the first block included 28 triplets that were trained both as Type 1 and Type 2 trials (56 total trials), 14 trials that were trained as Type 1, and 14 trials that were trained as Type 2. The 28 test trials consisted of the triplets that had been trained as either Type 1 or Type 2 only, which were tested as the other type. The order of trials in each block was randomised for individual participants.
A diagram of the procedure is shown in Fig. 10. On Type 1 trials, a single stimulus was displayed in the upper half of the screen centred 5.5 cm below the top (Experiment 3a: a yellow line, 10 mm wide, presented at a randomised orientation; Experiment 3b: a grey circle, 10 cm diameter), with a horizontal bar (33 cm long, 5 mm wide) centred 2.5 cm below the stimulus. The participant's response location was marked by a small circle on the bar (10 mm diameter). After a 200 ms delay, two stimuli were displayed, centred 8 cm below the upper horizontal bar and 12.5 cm to the left or right (Experiment 3a: two yellow lines, 10 mm wide, randomised orientation; Experiment 3b: two grey circles, 10 cm diameter), with a second horizontal bar (33 cm long, 5 mm wide) centred 2.5 cm below the stimuli. For training trials, feedback was presented after a 100 ms delay by two 10 mm high ovals extending 7% in either direction and centred on the correct response location on both the upper and lower horizontal bars. If the participant's response fell within this ±7% range, the oval was green, whereas if it fell outside this range the oval was red. After 500 ms, the stimuli and response bars were removed, and the next trial began after a 2-s interval. Responses that produced red ovals were followed by a single correction trial in which the same stimuli were presented again. Responses on correction trials were not included in the results. For transfer trials, the procedure was identical to that for training trials except that feedback in terms of the green or red ovals was omitted and no correction trials were presented.
On Type 2 trials, there were two stimuli presented in the first stage, 5.5 cm from the top of the screen and 12.5 cm to the left or right, with a horizontal bar centred 2.5 cm below the stimuli.There was a single stimulus presented in the second stage, centred 8 cm below the upper horizontal bar, and a second horizontal bar 2.5 cm below the stimulus.The stimulus characteristics and feedback were the same as Type 1 trials.
Feedback was based on correct response locations, calculated according to the following equations (left and right ends of the horizontal bar = 0 and 1, respectively).
Type 1 trials, Stage 1: correct location = a / max(a + b + c). Type 2 trials, Stage 1: correct location = (a + b) / max(a + b + c). Type 1 and 2 trials, Stage 2: correct location = (a + b + c) / max(a + b + c), where a, b, c were the stimulus magnitudes; because min(a + b + c) was set equal to zero, the normalising denominator max(a + b + c) − min(a + b + c) reduces to max(a + b + c). Stimulus values for training and test trials and correct values for Experiments 3a-3b are listed in Supplementary Table S3.
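Assuming that the correct locations normalise the cumulative sums by the largest triplet sum (our reading of the statement that min(a + b + c) was set equal to zero; the function name and example values are ours), the feedback rule can be sketched as:

```python
def stage_locations(a, b, c, max_sum, trial_type):
    """Correct response locations (0 = left end, 1 = right end) for the two
    bars, assuming cumulative sums normalised by the largest triplet sum."""
    stage1 = (a if trial_type == 1 else a + b) / max_sum
    stage2 = (a + b + c) / max_sum  # identical for both trial types
    return stage1, stage2

# Both trial types share the same Stage 2 target, so associative addition
# predicts identical Stage 2 responses for a + (b + c) and (a + b) + c.
s1_t1, s2_t1 = stage_locations(0.2, 0.5, 0.9, max_sum=2.4, trial_type=1)
s1_t2, s2_t2 = stage_locations(0.2, 0.5, 0.9, max_sum=2.4, trial_type=2)
assert s2_t1 == s2_t2
assert s1_t1 < s1_t2 < s2_t2  # the Stage 2 click lies right of Stage 1
```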

Results
Data for one participant in Experiment 3a and one in Experiment 3b were omitted due to failure to follow instructions: for each, the Stage 2 response was to the left of the Stage 1 response in a substantial proportion of trials (>25%), so that correlations of Stage 2 responses with correct values were low (rs = 0.28 and 0.00, respectively).

Fig. 11 shows the average absolute deviation of Stage 2 responses from correct values for Experiments 3a (line length; left panel) and 3b (brightness; right panel). Deviations generally decreased from Block 1 to Block 2, and were greater on Type 2 trials for Experiment 3a. A repeated-measures ANOVA found a significant effect of Block.

Fig. 12 shows average responses plotted against correct values. Although more variability was evident with brightness stimuli, responses were highly correlated with correct values in both experiments. For Stage 1 with line length stimuli (upper left panel), correlations for Type 1 and Type 2 responses were r = 0.99 and 0.97, respectively, and regression slopes were similar and slightly (but significantly) less than one, B = 0.93 [95% CI: 0.89, 0.97] and 0.83 [95% CI: 0.76, 0.90]. For Stage 1 brightness responses (lower left panel), correlations were also high for Type 1 and 2 trials (r = 0.95 and 0.91, respectively), but regression slopes differed substantially, B = 1.70 [95% CI: 1.51, 1.88] for Type 1 trials and B = 0.64 [95% CI: 0.55, 0.74] for Type 2. The difference in slopes resulted in systematically greater responses on Type 1 compared to Type 2 trials when the sums of brightness magnitudes were equal. Thus, while Stage 1 responses were similarly related to the sum of the line lengths regardless of whether one or two lines were presented, this was not the case with brightness stimuli.
For Stage 2 responses with line length stimuli (Fig. 12, upper right panel), correlations with correct values were again very high (r = 0.97 and 0.98 for Type 1 and 2 trials, respectively) and regression slopes were similar, B = 0.66 [95% CI: 0.60, 0.73] and B = 0.76 [95% CI: 0.71, 0.82]. For brightness stimuli (lower right panel), responding was more variable and hence correlations and regression slopes were lower, r = 0.85 and 0.82, B = 0.56 [95% CI: 0.45, 0.66] and B = 0.55 [95% CI: 0.43, 0.68]. However, for both line length and brightness stimuli, patterns of responding were similar across Type 1 and 2 trials. Overall, responses on transfer trials corresponded closely to responses on trained trials. For line length, correlations of transfer and trained responses were r = 0.98 and 0.97 for Type 1 and 2 trials, respectively. Correlations with brightness stimuli were lower, r = 0.78 and 0.80, as responses varied across a narrower range. To test if transfer responses deviated systematically from trained values or between Type 1 and 2 trials, average (signed) deviations of transfer from trained responses were calculated. For line length, the average deviations were M = −0.02 [95% CI: −0.08, 0.05] and M = 0.03 [95% CI: −0.03, 0.09] for Type 1 and 2 trials, respectively, which did not significantly differ, t(8) = −0.74, p = .48. For brightness, average deviations were M = −0.02 [95% CI: −0.06, 0.03] and M = 0.07 [95% CI: 0.02, 0.11]. Although the difference was not significant, t(6) = −1.86, p = .11, the confidence interval for Type 2 trials excluded zero, indicating that responses on Type 2 transfer trials were likely to be greater than those on corresponding Type 1 training trials.
We conducted a Bayesian repeated-measures ANOVA on the data in Fig. 13.

Results of Experiments 3a and 3b show that participants can respond accurately in a two-stage artificial algebra task in which feedback was based on the sum of three stimulus magnitudes. With line lengths, both Stage 1 and Stage 2 responses were highly correlated with trained values, and did not differ systematically between Type 1 (a + (b + c)) and Type 2 ((a + b) + c) trials. Transfer tests showed that when a stimulus triplet was presented without feedback on a Type 1 trial after training on Type 2 trials, or vice versa, responses were highly correlated with trained responses and did not systematically differ depending on trial type. This shows that responding with line length stimuli was consistent with our procedural analogue of associative addition. With brightness stimuli, Stage 1 responding was still highly correlated with trained values but varied systematically with trial type: For Type 1 trials, responses were greater than trained values, with a regression slope substantially greater than unity (B = 1.70; Fig. 12), whereas for Type 2 trials, responses more closely corresponded to trained values except for trials with larger magnitudes, where responses were lower than trained values (B = 0.64). An overall conservatism (a tendency to respond towards the middle area of the upper response bar) was therefore observed in Stage 1 brightness responses. Furthermore, although Stage 2 transfer responding was positively correlated with corresponding responses on trained trials, participants were likely to make transfer responses on Type 2 trials that were greater than trained Type 1 responses, whereas no such tendency was present for Type 1 transfer trials. One possible explanation for this discrepancy between trial types is that participants were averaging the overall brightnesses rather than adding them. Because averaging is not associative, it would predict that responses on Type 2 transfer trials should be greater than the corresponding Stage 2 response on Type 1 trained trials. To see this, let a, b, c be a triplet of magnitudes presented on a trial (a < b < c). If participants average cumulatively, the Stage 2 response on a Type 2 trial corresponds to ((a + b)/2 + c)/2 = (a + b)/4 + c/2, whereas on a Type 1 trial it corresponds to (a + (b + c)/2)/2 = a/2 + (b + c)/4; the difference between the two is (c − a)/4, which is positive because a < c. Thus, the greater responses on Type 2 transfer trials suggest that participants may have tended to generalise the overall perceived brightness (in terms of an average), rather than compute the consecutive sums of the individual magnitudes at Stage 2.
With respect to more than two intensive stimuli, averaging is plausibly a more intuitive computation than addition, since in a natural context, averaging would (arguably) represent a more useful abstraction of complex perceptual information than summation. For extensive stimuli (such as line length), a sum is essentially a concatenation of magnitude in space or time, and so may have been more meaningful and therefore easier to pick up in our two-stage implicit learning task. Note that averages and sums of the brightness magnitudes used here were highly correlated (r = 0.84 and 0.88 for Stage 1 and 2 responses, respectively), so addition-based feedback might have strengthened averaging if that operation was easier for participants to perform with brightnesses. Insofar as the task was an (imperfect) analogue of associativity, then, the capacity for addition with brightness consistent with this axiom was not as strongly supported.
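The quantitative prediction of the averaging account can be checked with a short sketch (example magnitudes are arbitrary; "cumulative averaging" here means averaging the Stage 1 result with the Stage 2 stimuli, our formalisation of the account above): under averaging, Type 2 Stage 2 responses exceed Type 1 responses by exactly (c − a)/4.

```python
def averaging_stage2(a, b, c, trial_type):
    """Stage 2 response predicted if the participant averages cumulatively
    rather than adds."""
    if trial_type == 1:   # a, then (b, c): avg(a, avg(b, c))
        return (a + (b + c) / 2) / 2
    else:                 # (a, b), then c: avg(avg(a, b), c)
        return ((a + b) / 2 + c) / 2

a, b, c = 0.2, 0.5, 0.9  # a < b < c

type1 = averaging_stage2(a, b, c, 1)
type2 = averaging_stage2(a, b, c, 2)

# Averaging predicts larger Type 2 responses, by (c - a) / 4; addition
# predicts no difference, since a + b + c is the same for both orders.
assert type2 > type1
assert abs((type2 - type1) - (c - a) / 4) < 1e-12
```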

General discussion
How magnitudes are represented by the perceptual system and the nature of the computations they support are interrelated questions. Prior research has emphasised the former, with a major theoretical controversy being whether magnitudes are represented as discrete or continuous quantities, or both (Dehaene, 1999; Gebuis and Reynvoet, 2012; Leibovich et al., 2017; Lourenco and Aulet, 2023). Much research has tested whether comparative judgements of magnitude are based on discrete or continuous perceptual variables, using discrimination tasks in which participants make a binary response (two-alternative forced choice or go/no-go) (e.g. Gebuis and Reynvoet, 2013; Park et al., 2016). By contrast, few studies have asked if participants can perform algebraic operations on perceptual variables. Such operations might be described as implicit computation and were the focus of the present research.
Our goal was to test if participants could add continuous magnitudes in an analogue perceptual task consistent with the formal properties of addition as an algebraic operation. In three pairs of experiments, we used an 'artificial algebra' task (Grace et al., 2018) in which participants learned to add magnitudes by feedback and without explicit instruction. Participants were instructed to 'estimate' pairs of lengths or brightnesses by making a click on an analogue response bar, and received visual feedback based on the sum of stimulus magnitudes following each trial. We measured the overall accuracy of responding, and tested whether it was consistent with analogues of the axioms of (Abelian) algebraic groups: commutativity (1a-1b), identity and inverse (2a-2b), and associativity (3a-3b). In each experiment pair, we used stimulus magnitudes from both extensive (line length) and intensive (brightness) modalities. All experiments used a transfer design in which novel stimulus configurations were presented without feedback after participants had been exposed to training stimuli. To test basic commutative addition, stimulus pairs that had been trained as (a,b) in terms of left-right position were tested as (b,a) and vice versa. For both line length and brightness, accuracy in the task was very high, and average responses on transfer trials were close to identical to those predicted by training trials (Fig. 4). In Experiment 2, we altered the task so that the left and right stimuli were assumed to have opposite signs (i.e., positive or negative) to represent the inverse property, and an omitted stimulus was used as an analogue to the identity element (i.e., zero). Responding was again very accurate overall, and was symmetrical around the midpoint of the response bar, as predicted by the inverse property (Fig. 6). Responding on transfer trials was highly correlated with trained trials (Fig. 7). For the identity test, deviations from centre were nearly equal when (a,0) was trained and (0,a) was tested, and vice versa. Experiments 3a-3b used a two-stage version of the task to test the associative property, which compared analogue responses to a + (b + c) and (a + b) + c. With line length, responding in both stages was highly correlated with trained values, and test trials supported two-stage addition with associativity (Fig. 13, left panel). However, for brightness, overall responding deviated somewhat from predictions, which may have been due to participants averaging the stimuli rather than adding them in the second stage; this was noticeable on transfer trials, when responses to (a + b) + c trials were greater than when the same triplet was presented as a + (b + c).
These results show that with minimal training and without explicit instructions, participants were able to produce non-symbolic judgements corresponding to the sum of two continuous magnitudes (of both extensive and intensive varieties), and that in the case of extensive addition, these computations exhibited the properties of an algebraic group. Measures of accuracy, in terms of correlations of average responses with correct values, were often remarkably high: greater than r = 0.98 for line length in Experiments 1a and 2a, and greater than r = 0.95 in Experiment 3a. The apparent ease with which participants learned to respond accurately in the task is consistent with the theory that number sense has some basis in the representation of continuous quantity. If mathematical ability were based exclusively in the representation of discrete dimensions, then computation of purely continuous magnitudes, replete with algebraic structure analogous to that of symbolic arithmetic, would be unexpected.
A theoretical implication of these findings is that algebraic structure may be an automatic property of our perceptual processes, or inherent to representations of magnitude. This possibility is supported by recent theoretical work by Grice, Kemp, Morton, and Grace (2023). Noting that there was no satisfactory explanation for arithmetic's origin, they posed a metamathematical question: arithmetic consists of a set of elements and operations that combine two elements of the set to give another element; of all possibilities, why are the elements represented as numbers and the operations as addition and multiplication? Grice et al. showed (via mathematical proof) that four assumptions (monotonicity, convexity, continuity, and isomorphism; MCCI) were sufficient to uniquely identify addition and multiplication over the real numbers. Their result shows that arithmetic and algebraic structure are a logical consequence of purely qualitative conditions. Grice et al. argued, based on evidence that MCCI characterises perception in humans and nonhumans, that these conditions have substantive psychological meaning as principles of perceptual organisation. They concluded that arithmetic is a natural consequence of how our perception is organised, so that algebraic structure may be inherent in perception.
Our hypothesis that magnitude representations have algebraic structure goes beyond previous accounts in numerical cognition, which have largely focussed on whether magnitudes are scaled linearly or logarithmically and where they are located in the brain (Dehaene, 1999, 2003; Dehaene, Izard, Spelke, and Pica, 2008; Nieder and Dehaene, 2009; Sokolowski et al., 2017; Walsh, 2003). The question of scaling is a measurement issue (how numbers are assigned to representations) and reflects a historical perspective from psychophysics (Gescheider, 2013; Murray, 1993). Scaling also provides a useful method to study developmental changes in the understanding of number, as in the line estimation task (Siegler and Booth, 2004). Our results suggest that it is important also to conceptualise magnitude representations in terms of the algebraic structure necessary to support implicit computation with them.
We have described our task as a nonsymbolic 'artificial algebra' because participants learn by feedback and are not explicitly told the rule that maps stimuli to correct responses, similar to artificial grammar learning (Knowlton, Ramus, and Squire, 1992; Pothos, 2007; Proulx and Heine, 2009; Reber, 1989). It is interesting to note that there have been previous studies with a symbolic artificial algebra, which has been used to investigate how participants learn the symbol manipulation steps to solve an algebraic equation, for example, solving −a/b − x = −c for x. In this task, different symbols such as ©, Φ, Δ, and ↔ are used for the variables and operators in the equation (Anderson, Qin, Sohn, Stenger, and Carter, 2003; Blessing and Anderson, 1996). The key difference is that the symbolic task studies processes related to explicit computation, whereas the present task is an assay of implicit computation. An important question for future research will be to investigate whether implicit computation is related to learning of symbolic mathematics. Given that accuracy in numerosity discrimination tasks predicts mathematics achievement (Chen and Li, 2014; Halberda et al., 2008; Schneider et al., 2017), it is reasonable to expect that individual differences in implicit computation may also be associated with learning of symbolic mathematics.
Future research should also investigate the capacity for continuous computation in children and non-humans. Although our results confirmed the ability to add continuous magnitudes in adults, this may have been a product of participants' cumulative knowledge of formal mathematics (e.g. having explicitly learned that a + 0 = a in school, participants may have generalised this knowledge in our non-symbolic identity task). Testing for the ability to perform basic addition of continuous quantities with pre-school children and non-humans would strengthen the claim that the perceptual system is able to perform computations on non-discrete magnitudes. It would also be interesting to determine whether symbolic, discrete, and continuous computation are supported by the same neural mechanisms. Previous fMRI studies have confirmed that similar brain structures are activated in tasks involving symbolic and discrete non-symbolic arithmetic (Bugden et al., 2019; Venkatraman et al., 2005). Our results suggest similar areas may be implicated in computation of continuous dimensions, the confirmation of which would be an important further step towards understanding the basis of implicit computation and mathematical learning.
Our research tested analogues to the axioms of a commutative algebraic group, but is limited in some ways as a test of formal group structure. First, we did not test the closure axiom, which is necessary for a pair (G, +) to be a group. Closure states that for all a, b in G, a + b is also in G. For example, the set of integers with addition satisfies closure because any two integers added together give another integer. Because stimuli and responses in our task were fundamentally different, it was impossible to satisfy closure. However, it would be possible in principle to test closure if participants had to make a response in the same modality as the stimuli, for example, producing a line length in response to lines as stimuli. Also, the axioms were not tested simultaneously; each pair of experiments tested one (commutativity, associativity) or two (identity and inverse) axioms in isolation. Finally, Experiment 3 was not a strict analogue of associativity, as it used a two-stage procedure, whereas associativity is a property of binary operations. This limited our ability to draw firm conclusions from the results of Experiment 3b with respect to participants' capacity to perform associative addition of brightnesses. Although these are limitations of our research, given that our paradigm is the first attempt to study implicit computation experimentally, we think the present approach has merit. Future research should explore versions of our task which correspond more closely to the group axioms, particularly in terms of the closure and associative properties.
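For the integers under addition cited as an example, all five axioms can be checked mechanically over a sample; a trivial sketch (sample values are arbitrary):

```python
import random

random.seed(0)
sample = [random.randint(-50, 50) for _ in range(100)]

for a in sample:
    for b in sample:
        assert isinstance(a + b, int)           # closure over the integers
        assert a + b == b + a                   # commutativity
        assert a + 0 == 0 + a == a              # identity
        assert a + (-a) == 0                    # inverses
        for c in sample[:5]:
            assert a + (b + c) == (a + b) + c   # associativity
```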
More broadly, it is interesting to consider why perceptual systems would have a capacity for implicit computation at all. To our knowledge, the only answer has been proposed by Shepard (1994), who argued that because the world is described by principles of Euclidean geometry and physical laws with algebraic structure, natural selection would favour perceptual systems that successfully adapted to those principles. Consequently, the algebraic and geometric invariants that describe the external world have been internalised in the minds of organisms by evolution. To the extent that the physical world has mathematical structure, an implication is that the functioning of perceptual systems will appear to have a similar mathematical structure. Thus, Shepard's (1994) account explains implicit computation as a necessary outcome of evolution. An alternative possibility is that mathematical structure may be intrinsic to the representations that organisms form of their environments, and thus forms the basis of the mathematics that was subsequently developed to describe formally the characteristics of those environments. Although this idea may seem quite radical, if mathematics is understood as the science of patterns (Resnik, 1997), then the implication is simply that these representations have a pattern or structure that can be characterised by mathematical principles such as symmetry.
In summary, the current research shows that implicit computation can be studied experimentally with an 'artificial algebra' task (Grace et al., 2018). Participants learned to make analogue responses based on sums of stimulus magnitudes, by feedback and without specific instructions, consistent with predictions based on the properties of a commutative algebraic group. Our results confirm that representation of continuous quantities is an important aspect of number sense, and suggest that these representations may have a group-like structure that supports implicit computation; that is, a capacity to perform analogue algebraic operations with non-symbolic quantities.

Fig. 1 .Fig. 2 .
Fig. 1. Diagram of procedure for Experiment 1b. After a pair of stimuli are presented, the participant makes a response (marked by an unfilled circle), followed by a coloured oval centred on the correct response and extending 7% of the response bar to either side. Correct responses (left panel) are followed by a green oval. Incorrect responses (right panel) are followed by a red oval, and a single correction trial with the same stimulus pair. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The regressions shown are representative of individual results; the average regression slope (pooled across a < b and a > b trials) for participants in Experiment 1a (line length) was M = 0.784 [95% CI: 0.711, 0.856], which was not significantly different from the average for Experiment 1b, M = 0.804 [95% CI: 0.732, 0.876], t(15) = −0.397, p > .60. The primary goal of Experiments 1a-b was to compare responses on test trials with corresponding training trials, which used the same stimulus pairs but with the left-right position reversed. Fig. 4 shows average responses on test trials (b + a) plotted against responses on corresponding training trials (a + b) for Experiments 1a (left panel) and 1b (right panel). Data fell very close to the major diagonal in both panels.

Fig. 3. Average responses for training trials plotted against correct values for Experiment 1a (line lengths; left panel) and Experiment 1b (brightness; right panel). Filled circles indicate trials for which the left stimulus magnitude was less than the right (a < b); unfilled circles indicate the converse (a > b). The solid diagonal indicates the line of equality and the dashed lines are the best-fitting regressions (equations shown). Bars indicate one standard error.
where a and b are the nominal values of the left and right stimuli. When stimuli were presented singly, the value of the 'absent' stimulus was taken as zero. Consequently, max(b − a) = b_max − 0 = b_max (= a_max), and min(b − a) = 0 − a_max, so [max(b − a) − min(b − a)] = 2b_max = 2a_max. Stimulus values for training and test trials and correct values for Experiments 2a-2b are listed in Supplementary Table
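The arithmetic above determines how a signed difference b − a maps onto the response bar. The following sketch makes that mapping explicit; it is a hypothetical normalisation consistent with the max/min values given (with absent stimuli coded as zero), and the function and variable names are ours, not the paper's.

```python
def normalise_difference(b: float, a: float, a_max: float) -> float:
    """Map the signed difference b - a onto a [0, 1] response bar.

    With absent stimuli coded as 0, max(b - a) = a_max and
    min(b - a) = -a_max, so the range is 2 * a_max and a
    difference of zero (an inverse pair) maps to the centre (0.5).
    """
    return ((b - a) - (-a_max)) / (2 * a_max)

# With a_max = 100: extreme differences map to the bar's endpoints,
# and equal magnitudes map to the centre.
print(normalise_difference(100, 0, 100))  # 1.0 (maximum difference)
print(normalise_difference(0, 100, 100))  # 0.0 (minimum difference)
print(normalise_difference(60, 60, 100))  # 0.5 (inverse pair: centre)
```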

Fig. 4. Average transfer test trial responses (b + a) plotted as a function of corresponding training trial responses (a + b) for Experiments 1a (line lengths; left panel) and 1b (brightnesses; right panel). The solid diagonal indicates the line of equality, and error bars one standard error.

Fig. 5. Average absolute deviation of responses from correct values by block for Experiment 2a (line lengths, filled circles) and Experiment 2b (brightnesses, unfilled squares). Bars indicate one standard error.

Fig. 6. Average responses for training trials plotted against correct values for Experiment 2a (line lengths; left panel) and Experiment 2b (brightnesses; right panel). The solid diagonal indicates the line of equality and the dashed line is the best-fitting regression (equations shown). Bars indicate one standard error.

Fig. 7. Results of inverse test trials. Average transfer trial responses (as absolute deviations from the centre of the response bar) plotted as a function of corresponding training trial responses for Experiments 2a (line lengths; left panel) and 2b (brightnesses; right panel). The solid diagonal indicates the line of equality, and error bars one standard error.

Fig. 8. Results of identity test trials. Average transfer trial responses (as absolute deviations from the centre of the response bar) plotted as a function of corresponding training trial responses for Experiments 2a (line lengths; left panel) and 2b (brightnesses; right panel). The solid diagonal indicates the line of equality, and error bars one standard error.
205 in 8-bit grayscale values, generated using an 8-bit monochrome palette [0-255]). Three stimuli (a 'triplet') were presented on each trial. The eight stimuli in each experiment yielded a total of 56 triplets, (a, b, c), in which a < b < c [8! / (8 − 3)!3! = 56]. There were two types of trials, and each trial consisted of two stages. Type 1 trials, in which a was presented alone in Stage 1 and b and c were presented in Stage 2, were analogues for a + (b + c). Type 2 trials, in which a and b were presented in Stage 1 and c was presented in Stage 2, corresponded to (a + b) + c.
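The triplet construction and the two trial types can be sketched as follows. The specific magnitude values are placeholders for illustration only; what the sketch shows is the combinatorics (C(8, 3) = 56 triplets) and the two-stage split that makes Type 1 trials analogues of a + (b + c) and Type 2 trials analogues of (a + b) + c.

```python
from itertools import combinations

# Eight nominal stimulus magnitudes (placeholder values; the experiments
# used line lengths or 8-bit grayscale brightness levels).
stimuli = [1, 2, 3, 4, 5, 6, 7, 8]

# All triplets (a, b, c) with a < b < c: 8! / (5! * 3!) = 56.
triplets = list(combinations(stimuli, 3))
assert len(triplets) == 56

def make_trial(triplet, trial_type):
    """Split a triplet into the two stages of a trial.

    Type 1: a alone in Stage 1, then b and c   -> a + (b + c)
    Type 2: a and b in Stage 1, then c alone   -> (a + b) + c
    Either way, the correct final response is based on a + b + c.
    """
    a, b, c = triplet
    if trial_type == 1:
        return ((a,), (b, c))
    return ((a, b), (c,))

print(make_trial((1, 4, 7), 1))  # ((1,), (4, 7))
print(make_trial((1, 4, 7), 2))  # ((1, 4), (7,))
```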

Fig. 9. Results of inverse pair trials. For each pair, the correct response was the centre of the response bar. Average responses for trained pairs (dark grey) and transfer pairs (light grey) are shown for Experiments 2a (line lengths; left panel) and 2b (brightnesses; right panel). Bars indicate one standard error.
The main effect of Block was significant, F(1,16) = 6.69, p < .05, ηp² = 0.30, confirming that deviations overall decreased from Block 1 to Block 2. The Block x Stimulus Dimension and Block x Trial Type x Stimulus Dimension interactions were significant, F(1,16) = 7.19, p < .05, ηp² = 0.31, and F(1,16) = 14.06, p < .01, ηp² = 0.47, confirming that deviations in Block 1 were greater on Type 2 trials with line length (left panel). Subsequent analyses used data from Block 2. Average responses for trials with feedback are plotted as a function of correct values in Fig. 12 for line length (upper panels) and brightness (lower panels) stimuli, and for Stage 1 (left panels) and Stage 2 responses (right panels). Responses for Type 1 (a + (b + c)) and Type 2 ((a + b) + c) trials are indicated by unfilled and filled symbols, respectively.

Fig. 10. Diagram of Procedure for Experiment 3a. For Type 1 trials, a single line is presented and the participant clicks on the response bar below. Two more lines are presented and the participant clicks on the lower response bar. After the second response, feedback is provided with coloured ovals. The corresponding sequence of events on a Type 2 trial (in which two lines are presented in the first stage and one in the second) is shown on the right.
Fig. 13 shows results from associativity transfer tests for line length (left panel) and brightness (right panel).Average Stage 2 responses for test trials presented in the last block of training are plotted as a function of the response to the corresponding trained triplet.Type 1 test trials (a + (b + c)) are shown by unfilled squares while Type 2 test trials ((a + b) + c) are shown as filled circles.
Results were similar to those in Experiments 1a-b and 2a-b. For lines (left panel), the data provided moderate-to-strong evidence in favour of the null hypothesis that Stage 2 test triplet responses were equal to Stage 2 responses for the same trained triplet, BF01 = 8.90. For brightnesses (right panel), there was weak evidence in favour of the null, BF01 = 2.01.

Fig. 11. Average absolute deviation of responses from correct values by block for Experiment 3a (line lengths; left panel) and Experiment 3b (brightnesses; right panel). Results are shown separately for Type 1 (unfilled squares) and Type 2 (filled circles) trials. Bars indicate one standard error.

Fig. 12. Average responses on training trials as a function of correct values for Experiments 3a (line lengths; upper panels) and 3b (brightnesses; lower panels). Stage 1 responses are shown in the left panels and Stage 2 responses in the right panels. Type 1 trial responses are indicated by unfilled squares; Type 2 trials by filled circles. The major diagonal indicates equality. Best-fitting regression lines (with equations) are shown separately for Type 1 and 2 trials.

Fig. 13. Average Stage 2 responses on associativity transfer test trials plotted as a function of corresponding average training trial responses for Experiments 3a (line lengths; left panel) and 3b (brightnesses; right panel). Type 1 and 2 test trials are indicated by unfilled squares and filled circles, respectively. The major diagonal indicates equality, and error bars one standard error.