The motor Wisdom of the Crowd

Wisdom of the Crowd is the aggregation of many individual estimates to obtain a better collective one. Because of its enormous social potential, this effect has been thoroughly investigated, but predominantly on tasks that involve rational thinking (such as estimating a number). Here we tested this effect in the context of drawing geometrical shapes, which still enacts cognitive processes but mainly involves visuomotor control. We asked more than 700 school students to trace five patterns shown on a touchscreen and then aggregated their individual trajectories to improve the match with the original pattern. Our results show the characteristics of the strongest examples of Wisdom of the Crowd. First, the aggregate trajectory can be up to 5 times more accurate than the individual ones. Second, this great improvement requires aggregating trajectories from different individuals (rather than trials from the same individual). Third, the aggregate trajectory outperforms more than 99% of individual trajectories. Fourth, while older individuals outperform younger ones, a crowd of young individuals outperforms the average older one. These results demonstrate for the first time Wisdom of the Crowd in the realm of motor control, opening the door to further studies of human and also animal behavioural trajectories and their mechanistic underpinnings.


Introduction
The Wisdom of the Crowd (WOC) is the notion that the aggregate opinion of a diverse group of people may be more reliable than that of an expert. This idea when aggregating several individuals [5,6]. An example of the negative effect of broken independence arises when subjects are informed of the guesses of others and are then allowed to emit their guess or reconsider their previous one [7]. However, quantifying social influence over each individual allows one to find new aggregation measures that counteract this effect [8], and to take advantage of it to improve upon the crowd estimate [9]. Moreover, it has been shown how the condition of independence can be relaxed [10], and how allowing subjects to discuss before arriving at a consensus estimate can lead to improvements at both the group and individual level [11,12]. Others have argued for maximal differences (negative correlations) between subjects as the essential requisite for the WOC [13], and therefore the detection of correlations is presented as a powerful tool to improve the collective estimate when it deviates from the true value [14]. Finally, methods to find subgroups of individuals whose aggregated estimate may be better than the aggregated estimate of the whole crowd have been proposed, for example, based on identifying expertise within the crowd [15].
The second key question is what tasks can be performed more efficiently by a collective than by an individual. Classical demonstrations of the WOC consisted of guessing a number or choosing over a set of discrete alternatives [5,6]. While a big part of research still follows this paradigm, many studies have successfully applied WOC to more complex tasks, such as estimation of multi-dimensional quantities [16], sequential decision-making [17], collective guessing of a sentence [18], collective editing [2], forecasting in prediction markets and prediction polls [19,20], correct tempo of a classical music piece [21], medical diagnosis [22,23] or drug prescription [24]. However, the vast majority of these studies share a common characteristic: they focus on explicitly rational tasks, in which individuals need to make a conscious estimate.
Here we investigated whether WOC can be applied to a task that depends on embodied motor control rather than high-order abstract cognition. To that end, we developed an experimental paradigm where we asked children to trace a series of predefined patterns on a tablet. Such a situation defies classical WOC studies because it is a complex motor task, which is difficult to parameterize or describe in simple terms, and whose errors are highly correlated (a deviation at any point in the line affects the future trajectory of the finger). It is also worth emphasizing that the task is intimately related to drawing, an important part of human culture, and that, with the appropriate experimental design and measuring tools, it can be tested in naturalistic conditions beyond artificial laboratory settings. Although some studies have investigated collective problem-solving in tasks involving movement [25], visual decision-making [26–28] and visual search [29–31], our study is, to the best of our knowledge, the first one showing the implications of aggregating individual solutions to a sensory-motor task.
In this paper, we first present experimental results from hundreds of subjects tracing with their fingers a series of well-defined geometrical templates displayed on touchscreen tablets in a classroom setting. Using such 'big behavioural data' [32] collected 'outside the laboratory' [33], we examine the four main features that characterize the strong instances of WOC. The paper is thus organized along these four questions: (i) whether individual trajectories of subjects can be aggregated to produce a more accurate description of the desired pattern, (ii) whether the improvement is a true 'crowd' effect (requiring different individuals, as opposed to a single individual repeating the same task), (iii) whether the effect is strong enough so that the aggregate is more accurate than most of the individuals and (iv) whether the effect is strong enough so that a crowd of low-skill individuals outperforms one high-skill individual. We find that all these conditions are met, providing the first evidence of motor WOC.

Collecting behavioural big data in a classroom setting
We asked 797 school students aged between 6 and 18 years to trace several shapes with their finger using a custom-made drawing app (figure 1a). After a few minutes of practice to familiarize themselves with the app, subjects were invited to reproduce five different geometric curves with varying levels of complexity, from ellipses to four-fold rose figures. A template of each shape was shown on the screen of the tablet (figure 1a), and subjects were instructed to trace fluidly and continuously for 30 s: not so fast as to systematically overshoot the template, but not so slowly as to produce halts and brief jerky movements in trying to match the template perfectly. In other words, they were asked simply to produce good-enough tracing. All shapes were closed curves that could be traced repeatedly in a single stroke, and almost every subject traced each template several times during the 30 s of each experimental curve (figure 1b; electronic supplementary material, figure S1a). Our experimental procedure allowed us to test hundreds of children producing thousands of high-resolution trajectories (more than 24 h of data) in naturalistic conditions (see Methods for more details).

Processing the trajectories to extract the Wisdom of the Crowd
In order to estimate the WOC of the trajectories traced by our subjects, the first step consisted in splitting the full trajectory of each subject into individual trajectories that represent a single pass over the template. This task turned out to be complicated for some complex shapes, and especially for low-accuracy trajectories that deviated strongly from the template. To avoid biasing our results and increase the transparency of our analysis, we resorted to a simple approximate method: we determined that in most cases subjects traced each template at least 8 times, and divided the full trajectory of each subject into 8 segments of equal duration, which we call 'raw individual trajectories' (electronic supplementary material, figure S1b). While in most cases each raw individual trajectory contains more than a single pass over the template, this fact does not affect our conclusions and, in fact, strengthens them (see Methods). Next, we subsampled the templates and processed the raw individual trajectories to facilitate their further analysis. We first took a set of reference points along each template (figure 1c, left; electronic supplementary material, figure S1c). We then took the raw individual trajectory, assigned each of its points to the nearest reference point on the template, and found the median centre of mass of each group of points (figure 1c, left; electronic supplementary material, figure S1d-h). In this way, we obtained a subsampled individual trajectory, with one experimental point corresponding to each reference point of the template. From now on, we will refer to these subsampled individual trajectories simply as 'individual trajectories'.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20220480
To compute WOC trajectories, we computed the median centre of mass for the points associated with the same reference point from different individual trajectories (figure 1c, right; electronic supplementary material, figure S1i-k).
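The aggregation step described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that 'median centre of mass' means the component-wise median of the x and y coordinates, and the function name `aggregate` and the list-of-tuples data layout are hypothetical.

```python
from statistics import median

def aggregate(trajectories):
    """Build a crowd trajectory from several individual trajectories.

    Each trajectory is a list with one (x, y) point per reference point
    (or None where no experimental point was available).
    Assumption: the aggregate is the component-wise median per reference point.
    """
    n_ref = len(trajectories[0])
    crowd = []
    for k in range(n_ref):
        # Collect the k-th point of every trajectory that has one.
        pts = [t[k] for t in trajectories if t[k] is not None]
        if pts:
            crowd.append((median(p[0] for p in pts), median(p[1] for p in pts)))
        else:
            crowd.append(None)
    return crowd
```

Replacing `median` with `statistics.mean` would give the mean-based variant that the paper compares against in its supplementary analysis.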
To quantify the accuracy of a trajectory (either an individual trajectory or a WOC trajectory), we computed the distance between each of its points and the corresponding reference point of the template (figure 1c; electronic supplementary material, figure S1l), obtaining a local error for each small region. Then, we used the average of these errors to quantify the overall error for the whole trajectory (electronic supplementary material, figure S1m).
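As a rough sketch of this error measure, assuming Euclidean distance and trajectories stored as lists of (x, y) tuples aligned one-to-one with the reference points (the function name `trajectory_error` is hypothetical):

```python
import math

def trajectory_error(trajectory, reference_points):
    """Mean Euclidean distance between each trajectory point and its reference point."""
    distances = [
        math.hypot(px - rx, py - ry)
        for (px, py), (rx, ry) in zip(trajectory, reference_points)
    ]
    return sum(distances) / len(distances)
```

The same function applies unchanged to an individual trajectory or to a WOC trajectory, which is what makes the two directly comparable.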
We are now in a position to establish whether the drawing task fulfils the four criteria of WOC mentioned above.

Criterion (i): Wisdom of the Crowd trajectories are more accurate than individual trajectories
We first investigated the accuracy of individual trajectories. Participants cared about properly tracing the templates but were not particularly motivated to be accurate, as we instructed them to trace quickly and fluidly without being too concerned about accuracy. Furthermore, our dataset includes data from very young children to late teenagers, whose motor skills are at different maturation stages. Consequently, individual trajectories showed a lot of dispersion (figure 2a, blue). In spite of this inaccuracy at the individual level, WOC achieves remarkable accuracy. We built WOC trajectories for all shapes by taking one individual trajectory from each subject and aggregating all of them (figure 2a, red). Average error (across all patterns) decreased more than twofold, from Despite not showing a definite structure, one can notice interesting differences in the local improvement along different parts of the templates. In ellipses, major improvements take place in the straightest parts of the curve, where tracing speed is typically faster. While three-petal flowers show their WOC maximal improvement at the edges, four-petal flowers have it where curvature is minimal (rather than at the inner high-curvature turns). In both cases, however, the improvement corresponds to the most distal parts of the template taken globally. In the lemniscate, the WOC shows an intriguing global top-bottom asymmetry.

Criterion (ii): Wisdom of the Crowd accuracy improves with a diverse crowd
Aggregating several trajectories from a single subject should also lead to an improvement in accuracy, an effect termed 'the crowd within' [34]. Therefore, the improvement reported here for the WOC trajectories might not require a diverse crowd, but just be a consequence of aggregating multiple datasets (regardless of whether they come from the same subject or from different ones).
To elucidate whether our observation is a true effect of the crowd, we took advantage of the fact that each subject traced each pattern several times. While in the previous section we built our WOC trajectory from all of our subjects, here we studied how the error of the WOC trajectory changes as a function of how many individual trajectories are aggregated. We also compared the case of aggregating trajectories from the same individual (cycles while the same person draws the same pattern) or from different individuals. While in both cases the error decreases as we aggregate more trajectories, this decrease is much more rapid when the trajectories come from different individuals (figure 3a), thus better supporting the WOC in movement trajectories. This advantage of crowds over repeated trials of the same subject indicates that subjects tend to repeat their own errors. Different trajectories drawn by the same subject tend to deviate in the same regions and towards the same side, creating systematic biases. These biases are corrected when aggregating trajectories from different subjects, whose deviations are more balanced.
We also used this procedure to determine how many subjects are needed to achieve high accuracy. The error decreases monotonically as we add more trajectories from different subjects, dropping by almost half with only 10 subjects and starting to saturate after 50 subjects (figure 3a, red).
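One way to obtain such an error-versus-crowd-size curve is to repeatedly draw random subsets of trajectories, aggregate each subset and average the resulting errors. The sketch below is illustrative only: the resampling scheme, the number of repeats, the function names and the assumption of complete trajectories (one point per reference point) are ours, not taken from the paper.

```python
import math
import random
from statistics import mean, median

def aggregate(trajectories):
    """Component-wise median across trajectories, one point per reference point."""
    return [
        (median(t[k][0] for t in trajectories), median(t[k][1] for t in trajectories))
        for k in range(len(trajectories[0]))
    ]

def trajectory_error(trajectory, reference_points):
    """Mean Euclidean distance to the corresponding reference points."""
    return mean(
        math.hypot(px - rx, py - ry)
        for (px, py), (rx, ry) in zip(trajectory, reference_points)
    )

def error_vs_crowd_size(trajectories, reference_points, sizes, n_repeats=100, seed=0):
    """Average error of the aggregate for each crowd size in `sizes`."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        errors = []
        for _ in range(n_repeats):
            subset = rng.sample(trajectories, n)  # n trajectories without replacement
            errors.append(trajectory_error(aggregate(subset), reference_points))
        curve[n] = mean(errors)
    return curve
```

Passing a pool of trajectories from one subject, or a pool with one trajectory per subject, would give the two curves being compared.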

Criterion (iii): Wisdom of the Crowd outperforms most individuals
A mere improvement of accuracy as we aggregate more trajectories would seem insufficient to support our claim of a WOC effect in motor control. In fact, it is a mathematical necessity that the error of an aggregate trajectory must be equal to or smaller than the average error of the individual trajectories [14,35].
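For the case where the aggregate is the arithmetic mean and the error is a Euclidean distance, this bound follows from the triangle inequality. In the sketch below, the notation is introduced here for illustration: $x_i$ is the point of individual trajectory $i$ assigned to a given reference point $r$, and $N$ is the number of trajectories. The argument is written for the mean only and does not by itself cover median aggregation.

```latex
\[
  \Bigl\| \frac{1}{N}\sum_{i=1}^{N} x_i - r \Bigr\|
  = \Bigl\| \frac{1}{N}\sum_{i=1}^{N} (x_i - r) \Bigr\|
  \le \frac{1}{N}\sum_{i=1}^{N} \| x_i - r \|
\]
```

Averaging both sides over all reference points preserves the inequality, so the overall error of the mean trajectory cannot exceed the average overall error of the individual trajectories.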
What makes WOC such an important effect is the magnitude of the improvement. While there is no absolute threshold to consider that the effect is strong enough, a classical criterion is that the WOC estimate must be better than an overwhelming majority of the individual estimates. This was the case for example in Galton's original demonstration of the effect, in which the arithmetic mean of individual guesses was better than every single individual estimate [36].
To investigate whether our WOC trajectories outperform the vast majority of individual estimates, we computed all

Criterion (iv): Wisdom of the Crowd of low-skill individuals beats the average high-skill individual
The last criterion of strong WOC, which is critical for its practical applicability, is that a crowd of low-skill individuals must outperform a single high-skill individual. To test this criterion, we first need to divide our population of subjects in groups with different expected skill levels. For tasks that require specific skills learned through education or professional training, high-skill subjects can be defined through their cultural level or profession. In the context of drawing, we considered testing professional painters and designers, but these professions usually have more to do with an aesthetic sense and the ability to master different drawing tools than with the motor control required to follow a predefined line with one's finger.
However, our empirical approach allowed us to sample a large population of maturing subjects of different ages. Such a rich dataset offered a practical criterion to separate subjects by skill level: age stratification. Our results include subjects aged from 6 to 18 years (figure 4a), and motor control develops during this period (especially between 6 and 10 years of age) [37–40]. Indeed, our results show that individual performance improves with age (figure 4b). Therefore, we defined low-skill individuals as young children (less than 10.5 years), and high-skill individuals as older children (greater than 10.5 years) whose motor skills are comparably more developed (while this threshold maximizes the difference in individual performance between the two groups, our results hold regardless of the threshold chosen; electronic supplementary material, figure S2).
As a confirmation of this criterion, individual estimates are better for older children than for younger ones (figure 3c, compare bars 1 and 2 of every template). The key question is then whether a crowd of young children outperforms the average older child. Remarkably, we found this to be true for every template (figure 3c, compare bars 2 and 3 of every template). Therefore, our dataset meets the last criterion of the strongest versions of WOC: a crowd of low-skill individuals outperforms the average high-skill individual. We also found an interesting and unexpected result. When comparing the results from WOC estimates from old and young children, we found almost no difference. Only for 2 of the 5 templates are the WOC estimates better for older children than for younger ones (figure 3c, compare bars 3 and 4 of every template). This result suggests that, when recruiting a crowd to perform a WOC estimate, there is no benefit in selecting high-skill individuals [41]. In other words, one may not need to include (motor) 'experts' to achieve the (motor) WOC.
We also asked whether different criteria to define high-skill or low-skill individuals could give different results. To answer this question, we simulated criteria that estimated individual skill with different accuracies, from those capable of selecting the very best individual, to benchmarks incapable of distinguishing skill. Our results turned out to be very robust: for a high-skill individual to outperform the crowd, the selection method must be able to accurately identify the top 1.73% of performers (electronic supplementary material, figure S3).

The method to aggregate trajectories has little impact on the results
An important practical question is how to aggregate the individual responses to produce the WOC estimate [7,14], and in particular whether to use the mean or the median, considering that the latter can counteract the effect of outliers [42,43]. All our previous analyses use the median to compute centres of mass, as we detected some subjects with very large deviations from the templates. We re-did our analysis using the mean instead of the median and compared both methods, to find very little difference between them (electronic supplementary material, figure S4). This result indicates that, even though some subjects are clear outliers, the distribution of trajectories around the true pattern has relatively little skew (figure 2a; note how the density of traces is nearly symmetric along the templates).

Discussion
Here we tested whether WOC can take place in a context far removed from those studied so far. By asking our subjects to trace a complex trajectory rather than estimating a number, we investigated a procedure that does not consist in making an explicit rational estimate but in performing an implicit embodied motor task. We tested hundreds of children drawing in a custom-made tablet app, collecting a large amount of precise and quantitative data to test the effect of aggregating independent individuals for better collective performance. We found that tracing geometrical patterns manifests all the characteristics of WOC: (i) accuracy improves when aggregating several trajectories, (ii) these trajectories must come from different subjects, (iii) the aggregate trajectory outperforms most individual ones and (iv) a crowd of low-skill individuals outperforms a typical high-skill individual. Our results thus extend the concept of WOC from its classical application of quantity estimation to the realm of motor control and embodied cognition.
Our findings suggest that WOC may be applicable with a greater generality than previously thought because they address outstanding theoretical concerns. For the WOC estimate to be accurate, the individual estimates must follow a probability distribution whose average (either the mean or the median) matches the true value. A general concern was that the tasks typically chosen, such as number estimation, might share some characteristics that made them fulfil this requirement, which would not be met when trying to extend WOC to other contexts. Here we have tested WOC in a completely different context, in which the object to be estimated is not a number but a complex trajectory, and the estimation procedure is not a rational thought process but a motor task. Our results indicate that the conditions required for WOC are met with great generality, and call for a more systematic exploration across tasks of different nature.
In our results, collectives of younger children achieve almost the same accuracy as collectives of older ones, despite the fact that the former are on average less accurate individually. This result indicates that individual inaccuracy does not translate here to a larger bias at the population level: younger children draw more diverse trajectories (average individual Euclidean distance to collective trajectory 2.77 mm for younger, 2.37 mm for older; see electronic supplementary material, figure S5), but their aggregate is as accurate as that of older children. Since more diverse datasets converge to their average faster than less diverse ones (by virtue of the diversity prediction theorem [5]), collectives of younger children catch up with the accuracy of collectives of older ones. The fact that young children exhibit no larger systematic bias than older ones suggests that young children might have a psychomotor plan to execute the task similar to that of older ones, but that it is executed with less precision due to underdeveloped motor control.
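For reference, the diversity prediction theorem is usually stated for scalar estimates, squared error and mean aggregation. In the notation below (introduced here for illustration), $s_i$ are the individual estimates, $c$ is their mean and $\theta$ is the true value. The analysis in this paper uses medians and Euclidean distances, so the identity is shown only to illustrate the relationship between collective error, individual error and diversity.

```latex
\[
  (c - \theta)^2
  = \frac{1}{N}\sum_{i=1}^{N} (s_i - \theta)^2
  - \frac{1}{N}\sum_{i=1}^{N} (s_i - c)^2,
  \qquad c = \frac{1}{N}\sum_{i=1}^{N} s_i
\]
```

The first term on the right is the average individual error and the second is the diversity of the estimates: for a given collective error, a group with larger individual error must also have larger diversity.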
Different contexts may modulate the effect. Our results were collected after instructing subjects to emphasize natural movements rather than accuracy. Different instructions might change the results, as greater emphasis on individual accuracy might decrease the impact of WOC, while absence of instructions might increase variability (and therefore the potential for WOC effects). In any case, we found a high accuracy in the motor WOC, despite the complex nature of the task, the extreme inaccuracy of some trajectories and the fact that errors accumulate (a deviation at one point in the trajectory affects its future). This high accuracy opens the door to applying similar aggregation methods to other tasks with similar characteristics, such as finding the optimal trajectory of a vehicle [44] or in the context of skill improvement in amateur and professional athletes, where the aggregation of different patterns used to execute a movement or a manoeuvre could complement other approaches aiming to find an optimal biomechanical technique [45].
Our results suggest a potential application in the study of human symbols. Letters, numbers and other symbols are written differently by different people [46], and their shape changes significantly over time and from region to region [47]. These changes are well documented qualitatively, and recent techniques have been developed to characterize them quantitatively [48,49]. These techniques are usually based on aggregating images, regardless of the trajectories that the writers followed to trace each symbol. Our results indicate that it would be possible to compute an average trajectory for a given symbol across a population. The methodology would need to be different from the one presented in this paper, with trajectory estimation and alignment posing important methodological challenges. But the surprisingly high accuracy found in our dataset indicates that aggregating symbols written by different people may provide an intelligible average tracing trajectory, which would facilitate the study of how handwriting changes in space and time.

Hardware
Thirty Android touch-screen tablets were used for the behavioural experiments. The tablet model was Samsung Galaxy Tab A6 (size: 254 × 164 × 8 mm³, Android v. 8.1.0 and API level 26). The price per tablet was less than $200. The display has a 10.1-inch PLS LCD screen, with dimensions 216 × 135 mm², and a resolution of 1920 × 1200 px². The tablet has a capacitive touch-screen, and registers touch by a finger or a capacitive stylus, with a resolution equal to the display resolution. Maximum screen refresh rate is 60 Hz, and maximum sampling rate of touch events is close to 85 Hz.

Software
The app was programmed in Android Studio (v. 3.3.2) in the Kotlin programming language [33], and tailored specifically for accuracy, efficiency and robustness in out-of-the-laboratory experiments with children. It can be freely downloaded (https://github.com/adam-matic/KinematicCognition), and also edited for different experimental purposes.

Experimental procedures
A total of 851 subjects (56% female, 10% left-handed) participated in the experiment, most of whom were school students between 6 and 18 years old (797 subjects; see electronic supplementary material, table S1). All experiments were performed during the 2019 Brain Awareness Week (from 11 to 15 March 2019). Students arrived in groups of about 30 individuals, belonging to the same school class. Classes belonged to several different schools in the area of Alicante, Spain. There were no specific selection criteria for schools and classes beyond their willingness to participate in our experiment and a more or less homogeneous sampling of ages and locations. Groups were assigned different time slots throughout the morning. The experiments took place in a regular small classroom (with a capacity for 30 students) in a building adjacent to the Instituto de Neurociencias de Alicante, Spain. We used 30 tablets (a single tablet per child), placed on the tables in the classroom. Each experimental session had an overall duration of about 15 min.
Students were invited to enter the classroom and sit down wherever they wished. Then, before starting the experiments, students were greeted and briefly told about the overall goal of the study (they already knew some details because their teachers had explained the activity days before at their own local schools). The explanation was generic: a brief and fun experiment was going to take place in which they would simply need to draw and trace specific geometric figures, in the order in which they appeared on the tablet screen, helping scientists to study motor control (this fitted well in the Brain Awareness Week, since each group also received a related outreach talk, so that they could not only hear about scientific experiments but also participate in them).
Students were requested to use the index finger of their dominant hand to draw and trace on the tablet screen (rather than using tablet pens). They were also asked to avoid touching the screen with the other hand and to avoid moving the tablet from where it was placed when they entered the room. Before starting the experiment, they all tapped on the screen at the custom-made app logo of the experiment.
First, a simple screen opened where they were asked whether their dominant hand was left or right, their age (entered by scrolling to their date of birth), and their gender. Once this information was complete, the app offered two options: 'Practice' or 'Experiment'.
Second, the participants did a trial exercise before the actual experiment, in which curves similar to the ones they would encounter later appeared. This part was key for them to familiarize themselves with the tasks they would need to accomplish next. In particular, they were instructed not to lift the finger from the screen until each task was over, and to perform fluid movements, avoiding tracing too slowly or too fast. They could at this point ask questions before the experiment took place.
Third, after such trials, an oral instruction prompted the students to start the 'Experiment' part at the same time. The experimental part consisted of a series of exercises, or tasks, all automatically concatenated with brief pauses in between. In this way, we avoided having to verbally interrupt the whole classroom with various unnecessary (and potentially distracting and confusing) instructions (especially for the younger children) to start the many different drawing and tracing tasks.
Although sitting next to each other, participants had little influence on each other's performance. While the attitude of other participants might affect general features such as the speed or care with which they performed the task, the nature of our experiment made it virtually impossible to use information from other subjects to increase one's performance.
Three different classes of tasks were presented to the participants: tracing, tracking and scribbling. Every class comprised different exercises, each one with a duration of 30 s, with a 7 s pause in between. Thus, the whole motor control experience was brief, avoiding distractions or loss of interest by the children. A visual summary of the experimental dataset can be found at: https://youtu.be/rz-TWk_6HSU.
In this study, we only analysed the first class of task (tracing), where participants had to delineate with their finger a black curve on a blue screen. The curves shown were an oval (or ellipse), a thinner oval (larger eccentricity), three-petal and four-petal flowers (both based on Huh's pure frequency curves [50,51]), and an infinity symbol (lemniscate). Subjects were not instructed to start tracing at any specific location of the templates. However, the duration of each drawing trial ensured that all participants covered the same surface area, several times.
When the experiment finished, students were thanked again and invited to leave the room to continue enjoying the Brain Awareness Week parade at the Instituto de Neurociencias. Experimental procedures were approved by the Institutional Review Board and followed the required guidelines on participation and personal data protection. Parents of the students had been previously informed and asked for written consent to record the finger trajectories.

Data cleaning
We removed anomalous data following three criteria. (i) Trajectories that lasted less than 25 s, which indicated subjects who were not focused on the task, since they were instructed to draw for the full duration of the experiment (less than 4% of trajectories were removed for this reason, and most of them were shorter than 15 s, so including them would have minimal impact on our conclusions). (ii) Trajectories whose standard deviation (in either the horizontal or vertical dimension) was less than half the standard deviation of the points of the template. (iii) Trajectories with jumps between two consecutively recorded points greater than 1/4 of the dimensions of the touchscreen, which in most cases were due to malfunctioning of the tablet or the subject touching the screen with both hands simultaneously. See electronic supplementary material, table S1, for the number of full trajectories originally stored, the number of trajectories that did not meet each of the filtering criteria (some trajectories did not meet more than one criterion), and the number of trajectories that were finally used for the analysis. In sum, a total of 3485 trajectories were analysed, each with a duration longer than 25 s, which yields an estimate of a total of more than 24 h of high-resolution quantitative measurements of human drawing in naturalistic conditions.
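The three filters could be sketched as a single predicate. Several details below are assumptions made for illustration and are not specified in the text: samples are stored as time-ordered (t, x, y) tuples with t in seconds, the population standard deviation is used, and the jump criterion is applied separately to each axis against a quarter of the corresponding screen dimension. The function name `passes_cleaning` is hypothetical.

```python
from statistics import pstdev

def passes_cleaning(samples, template_points, screen_w, screen_h, min_duration=25.0):
    """Return True if a full trajectory survives the three cleaning criteria.

    samples: time-ordered list of (t_seconds, x, y); template_points: list of (x, y).
    """
    if len(samples) < 2:
        return False
    ts = [s[0] for s in samples]
    xs = [s[1] for s in samples]
    ys = [s[2] for s in samples]

    # (i) Trajectory shorter than the minimum duration.
    if ts[-1] - ts[0] < min_duration:
        return False

    # (ii) Spread in either dimension below half that of the template
    # (assumption: population standard deviation).
    tx = [p[0] for p in template_points]
    ty = [p[1] for p in template_points]
    if pstdev(xs) < 0.5 * pstdev(tx) or pstdev(ys) < 0.5 * pstdev(ty):
        return False

    # (iii) Jump between consecutive samples larger than a quarter of the screen
    # (assumption: tested per axis).
    for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:]):
        if abs(x1 - x0) > screen_w / 4 or abs(y1 - y0) > screen_h / 4:
            return False
    return True
```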

Definition of raw individual trajectories
Each subject traced the template repeatedly during the 30 s allotted for the task, so each original trajectory contains several passes over the template. The ideal definition for 'individual trajectory' would be a single pass over the template, but finding the exact point at which the trajectory finishes one pass and starts the next is problematic: when the trajectory is a very poor approximation to the template, it is unclear when we should consider that an individual trajectory has ended. Therefore, attempts to divide the trajectories in this way would either force us to discard the less accurate trajectories or could result in systematic biases affecting differently the high- and low-accuracy trajectories.
For this reason, we chose a simple definition of raw individual trajectory: we divided each trajectory into 8 segments of equal duration, and we took each of these segments as a single raw individual trajectory. While this is only an approximation, it has the advantage of being a simple and transparent method and avoiding biases among trajectories of different accuracy.
Because of this approximate definition, in many cases, an individual trajectory contains only part of the template or more than one pass over some parts of the template. To determine a convenient number of segments, we randomly extracted 150 original trajectories (from only those where a single pass could be unambiguously defined). In 76% of the cases, subjects completed 8 or more full passes over the whole template (and only 6% of subjects performed 4 or 3 passes) so on average our individual trajectories contain more than one pass. We made this choice to be conservative: since they in fact contain more than one pass over each template (with only 17% including 2 or more passes), our individual trajectories already benefit from a small degree of aggregation, and therefore their error is smaller than the one we would measure if we took a true single pass. Therefore, the effect of WOC in our dataset is, if anything, probably underestimated. See electronic supplementary material, figures S6 and S7, for a demonstration of how the results are similar when dividing the original trajectories in 5 or 12 segments.
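The division into segments of equal duration can be sketched as follows. This is a minimal illustration, assuming that each original trajectory is stored as a time-ordered list of (t, x, y) samples; the function name and data layout are hypothetical.

```python
def split_equal_duration(samples, n_segments=8):
    """Split (t, x, y) samples into n_segments time windows of equal length.

    Returns a list of n_segments lists of (x, y) points, one per window.
    """
    t0, t1 = samples[0][0], samples[-1][0]
    width = (t1 - t0) / n_segments
    segments = [[] for _ in range(n_segments)]
    for t, x, y in samples:
        # Index of the time window; the very last sample is placed
        # in the final window instead of opening a new one.
        k = min(int((t - t0) / width), n_segments - 1)
        segments[k].append((x, y))
    return segments
```

Changing `n_segments` to 5 or 12 gives the alternative segmentations mentioned above.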

Subsampling of individual trajectories
Raw individual trajectories contained between 200 and 215 points. Before any analysis, we subsampled the templates into a set of reference points. We used 50 reference points for the two ellipses and 100 reference points for the rest; these numbers were chosen to be high enough to represent each template faithfully, while low enough to ensure that sufficient experimental points fell into each bin. Then, we subsampled the raw individual trajectories to ensure that each point of a subsampled individual trajectory corresponded to one reference point of its corresponding template. We did this by finding the median centre of mass of all points of the raw individual trajectory nearest to each reference point of the template (similar results are obtained by using the arithmetic mean instead of the median, as shown in electronic supplementary material, figure S4). See electronic supplementary material, figure S1, for a step-by-step description of this process. The term 'individual trajectory' hereinafter refers to the subsampled one (the term 'raw individual trajectory' refers to the trajectory before subsampling).
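As an illustration of this binning step, the sketch below assigns each raw point to its nearest reference point and then takes a median per bin. It assumes that the 'median centre of mass' is the coordinate-wise median of the assigned points, and it returns None for a reference point that receives no raw points; both choices, as well as the function name, are assumptions of this example.

```python
from math import hypot
from statistics import median

def subsample(raw_points, reference_points):
    """Map a raw trajectory onto the reference points of its template.

    Each raw (x, y) point is assigned to its nearest reference point; the
    subsampled point is the coordinate-wise median of the assigned points
    (None if no raw point was assigned to that reference point).
    """
    bins = [[] for _ in reference_points]
    for x, y in raw_points:
        nearest = min(
            range(len(reference_points)),
            key=lambda i: hypot(x - reference_points[i][0],
                                y - reference_points[i][1]),
        )
        bins[nearest].append((x, y))
    return [
        (median(p[0] for p in b), median(p[1] for p in b)) if b else None
        for b in bins
    ]
```

Replacing `median` with `statistics.mean` gives the arithmetic-mean variant.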

Aggregation of individual trajectories
To aggregate several individual trajectories (regardless of whether they belonged to the same subject or to different subjects), we computed the median centre of mass of the points of each individual trajectory corresponding to the same reference point in the template (figure 1c). The final result is an aggregate trajectory that has the same number of points as the number of reference points of the template (see electronic supplementary material, figure S1).
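A minimal sketch of this aggregation, assuming that every individual trajectory is a list of (x, y) points aligned with the same reference points and that the median is taken separately on each coordinate (the function name is hypothetical):

```python
from statistics import median

def aggregate(trajectories):
    """Coordinate-wise median across trajectories, one reference point at a time.

    trajectories : list of individual trajectories, each a list of (x, y)
                   points aligned with the same reference points.
    """
    return [
        (median(p[0] for p in points), median(p[1] for p in points))
        for points in zip(*trajectories)
    ]
```

The output has one point per reference point, whatever the number of trajectories aggregated.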

Computation of errors
To compute the error of any trajectory (either an individual trajectory or an aggregated one), we found the Euclidean distance between each of its points and the corresponding reference point of the template (figure 1c; electronic supplementary material, figure S1). These distances are used to represent the colours in figure 2b-d, showing the average error for each region of each template across all the trajectories. The total error of a trajectory is computed as the arithmetic mean of all the distances to the reference points.
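The error measure can be written compactly as in the following sketch, which assumes that the trajectory and the reference points are aligned lists of (x, y) pairs (function names are hypothetical):

```python
from math import hypot

def point_errors(trajectory, reference_points):
    """Euclidean distance from each trajectory point to its reference point."""
    return [
        hypot(x - rx, y - ry)
        for (x, y), (rx, ry) in zip(trajectory, reference_points)
    ]

def total_error(trajectory, reference_points):
    """Arithmetic mean of the point-wise distances."""
    distances = point_errors(trajectory, reference_points)
    return sum(distances) / len(distances)
```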
To compute the average error when aggregating individual trajectories from the same subject (figure 3a, black lines), we created all possible sets with a given number of individual trajectories from each subject. For example, when aggregating 3 individual trajectories, for each subject there are 56 different combinations out of the total of 8 individual trajectories, so there are 56 different sets of 3 individual trajectories per subject. The error for the template is determined as the average of the errors over all sets of each subject, and then over all subjects.
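The exhaustive enumeration for the same-subject case can be sketched as below. The helper functions restate the aggregation and error definitions under the assumption of a coordinate-wise median; all names and the data layout are hypothetical.

```python
from itertools import combinations
from math import hypot
from statistics import mean, median

def aggregate(trajectories):
    """Coordinate-wise median across aligned trajectories."""
    return [(median(p[0] for p in pts), median(p[1] for p in pts))
            for pts in zip(*trajectories)]

def total_error(trajectory, reference_points):
    """Mean Euclidean distance to the corresponding reference points."""
    return mean([hypot(x - rx, y - ry)
                 for (x, y), (rx, ry) in zip(trajectory, reference_points)])

def within_subject_error(subjects, reference_points, k):
    """Average over all k-sized sets of each subject, then over subjects.

    subjects : list with one entry per subject, each a list of that
               subject's individual trajectories (8 in the study).
    """
    per_subject = []
    for trajectories in subjects:
        errors = [total_error(aggregate(subset), reference_points)
                  for subset in combinations(trajectories, k)]
        per_subject.append(mean(errors))
    return mean(per_subject)
```

With 8 individual trajectories per subject and k = 3, `combinations` yields the 56 sets mentioned in the text.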
To compute the average error when aggregating individual trajectories from different subjects (as for example in figure 3a, red lines), it would not be possible to compute all possible combinations (there would be too many). Therefore, we performed a random sample: we randomly drew the desired number of subjects from the database, and for each subject, we randomly chose one of the individual trajectories. Then, we aggregated the individual trajectories, and computed the error of the aggregate one. We repeated this process 10 000 times and computed the arithmetic mean of all errors.
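A minimal sketch of this random sampling procedure follows. It assumes that the subjects in each draw are distinct (sampling without replacement within a draw) and that the median is coordinate-wise; function names, the data layout and the optional random generator argument are hypothetical.

```python
import random
from math import hypot
from statistics import mean, median

def aggregate(trajectories):
    """Coordinate-wise median across aligned trajectories."""
    return [(median(p[0] for p in pts), median(p[1] for p in pts))
            for pts in zip(*trajectories)]

def total_error(trajectory, reference_points):
    """Mean Euclidean distance to the corresponding reference points."""
    return mean([hypot(x - rx, y - ry)
                 for (x, y), (rx, ry) in zip(trajectory, reference_points)])

def between_subject_error(subjects, reference_points, k,
                          n_repeats=10_000, rng=None):
    """Mean error of aggregates built from k randomly chosen subjects.

    subjects : list with one entry per subject, each a list of that
               subject's individual trajectories.
    """
    rng = rng or random.Random()
    errors = []
    for _ in range(n_repeats):
        chosen = rng.sample(subjects, k)                 # k distinct subjects (assumption)
        picks = [rng.choice(trajs) for trajs in chosen]  # one trajectory per subject
        errors.append(total_error(aggregate(picks), reference_points))
    return mean(errors)
```

Passing a seeded `random.Random` makes a run reproducible.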
To compute the average error across templates in figure 4, we first computed the total error for each of the 5 templates and then computed the arithmetic mean of these 5 error values.

Computation of confidence intervals
To compute the confidence intervals for the errors of individual and aggregated trajectories (figure 3a,c), we used bootstrapping [52]. First, we created 'virtual experiments' by randomly drawing subjects with replacement until we reached the total number of subjects. Therefore, each virtual experiment consisted of the same number of subjects, but due to the random sampling some of our original subjects may be missing, and some subjects may be present more than once. Then, we repeated our full analysis on each of these virtual experiments. We performed this process with 500 virtual experiments, and our confidence intervals represent the region that contains 95% of these results.
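The bootstrap procedure can be sketched as follows. The sketch assumes that 'the region that contains 95% of these results' is the central interval obtained by discarding the lowest and highest 2.5% of the results; the function name and the `statistic` callback (which stands for the full analysis applied to one virtual experiment) are hypothetical.

```python
import random

def bootstrap_ci(subjects, statistic, n_virtual=500, level=0.95, rng=None):
    """Percentile bootstrap interval over subjects.

    subjects  : list with one entry per subject (any data structure)
    statistic : function that runs the analysis on a list of subjects
                and returns a single number
    """
    rng = rng or random.Random()
    n = len(subjects)
    results = sorted(
        statistic(rng.choices(subjects, k=n))  # resample subjects with replacement
        for _ in range(n_virtual)
    )
    tail = (1 - level) / 2
    lower = results[int(tail * n_virtual)]
    upper = results[int((1 - tail) * n_virtual) - 1]
    return lower, upper
```

With 500 virtual experiments, the interval spans the 475 central results.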

Ethics. Experimental procedures were approved by the Institutional Review Board of the Miguel Hernández University in Elche (Alicante, Spain), under project registration number 2019.111.E.OIR, and followed the required guidelines on participation and personal data protection. Parents of the students had been previously informed and asked for written consent to record their children's finger trajectories.
Data accessibility. Experimental data and code for analysis and figure generation can be found at the following link: https://github.com/gabrielmadirolas/motorwoc.
The data are provided in electronic supplementary material [53].