Introduction

When speaking, playing a musical instrument, or walking in nature, we naturally coordinate our actions with what we perceive. Music is an excellent model for studying this link between perception and action, as listening to music urges us to move1. Sometimes we choose to deliberately align our movements to the beat of music, as we do when dancing. How can we explain this widespread tendency to move to music? Musical features such as regular temporal structure (rhythmic complexity, syncopation), but also pitch structure (harmonic complexity), are particularly conducive to movement2,3. An explanation of this tight link between musical rhythm and movement may lie in the structure and functioning of our brains1,4,5,6. Regions of the brain typically underpinning motor control, such as the basal ganglia and cortical motor areas, are surprisingly engaged even when we merely listen to a rhythmic sequence in the absence of movement4,5,7,8,9.

Humans—musicians and non-musicians alike—are well equipped to extract temporal regularities from stimulus sequences, both in the auditory and the visual modalities, and to align their movements to the most prominent periodicity (e.g., the musical beat or pulse) by foot tapping, dancing, or walking (beat perception and synchronization—BPS10,11,12,13,14; for a contribution of the vestibular system to perceptual–motor coupling see also15). Most individuals in the general population can track the beat of music and move along with it14,16. Matching movements to the beat is possible because the temporal dynamics of rhythm drive internal, self-sustained neurocognitive oscillations underpinning beat perception17,18,19. This underlying process, called entrainment, generates temporal expectations which influence motor control and allow the alignment of movements to the anticipated beat times.

BPS abilities are thought to be universal skills that can be refined by musical training. Musicians outperform non-musicians in several BPS tasks. Musical training is found to improve the ability to extract the beat from a musical sequence and parse its metrical structure, and to reproduce rhythms4,20,21,22,23,24. In addition, musicians display more precise and accurate motor synchronization to the beat than non-musicians, as shown in finger tapping to rhythmic sequences like a metronome or music25,26,27,28,29.

Particularly in the absence of musical training, individuals can differ significantly in BPS abilities14,16. A few single-case studies on beat-deaf individuals or poor synchronizers16,30,31,32, using motor tasks (finger tapping33) and/or beat perception tests (e.g., the Beat Alignment Test34), revealed that either beat perception, synchronization, or both can be selectively impaired in neurotypical individuals. Both rhythm perception and production are often impaired in beat-deaf individuals31,32. However, synchronization to the beat can be selectively impaired in the presence of spared beat perception16. The reverse—poor perception with unimpaired synchronization to the beat—is also observed30. These individual differences are further exacerbated by disease, such as neurodegenerative and neurodevelopmental disorders (Parkinson's disease35,36, ADHD37, speech and language impairments38,39,40,41,42). Single-case evidence and studies of BPS in patient populations reveal intriguing dissociations between perception and production, and between beat-based and memory-based processes (see also43, for a recent study on a larger sample of non-musicians).

These effects of musical training and individual differences in non-musicians are typically examined with isolated tests or limited sets of tasks, which often vary across experiments and are sometimes tested in small-sample studies (e.g., single-case studies). This approach, albeit valuable, may provide an incomplete picture of differences linked to musical training and a simplified characterization of individual differences. Rhythmic abilities reveal a complex structure involving different dimensions (e.g., beat-based vs. memory-based processes43,44,45,46,47), compatible with the idea that there may exist multiple rhythm intelligences48. This complexity and rich structure of rhythmic abilities might not be easily captured with a few isolated tasks. There is a need for a more general, and at the same time parsimonious, way to account for individual differences in rhythmic abilities. This approach calls for multiple testing and for modelling both the differences resulting from musical training and the fluctuations of BPS capacities in the general population. The dissociations between perception and production may translate into distinct profiles that characterize individual differences in BPS abilities. We use the term “profile” to refer to a limited set of measures that can represent, in a succinct way, a given group (e.g., musicians vs. non-musicians, or a clinical group), and can be illustrated graphically (e.g., using an undirected graph, see below). Such profiles might also serve as markers of developing or progressing impairment in clinical populations. However, identifying these profiles based on a few single cases and isolated methodologies is daunting. Even though single-case evidence is informative and suggestive, its generalization is not warranted. A systematic investigation in larger cohorts, relative to the performance of individuals with musical training, is still lacking.

In sum, previous studies have shown general differences in BPS abilities linked to musical training, mostly quantified with a limited set of tests. Evidence from several single-case studies hints at different rhythmic profiles. However, a general approach to individual differences in rhythmic abilities is still missing, which leaves several questions unanswered. Which tasks or measures of BPS are most sensitive to differences due to musical training? Can we characterize an individual in terms of a given rhythmic profile (e.g., a signature of rhythmic abilities)? What is the weight of perceptual and motor processes in defining these profiles?

In the present study we aimed to characterize rhythmic abilities in musicians and non-musicians, focussing on beat-based processes, using a data science approach, namely predictive modelling with machine learning. Participants classified themselves as musicians (practicing their instrument during the last year) or non-musicians. Non-musicians could not have received more than 7 years of formal musical training (e.g., participation in structured lessons on an instrument or voice, coupled with at least 1 h of practice per week). The goal of this approach was to define profiles of BPS abilities in musicians and non-musicians, and to distill a parsimonious model capable of accounting for individual differences in both groups. Given the multidimensionality of rhythmic abilities, and to circumvent the limitations of previous studies, we submitted musicians and non-musicians to a battery of tests assessing both perceptual and motor timing abilities (Battery for the Assessment of Auditory Sensorimotor and Timing Abilities, BAASTA)49. The battery has been used extensively in the past and has proven sensitive to individual differences in healthy and patient populations30,35,38,50,51. As multiple tests provide a wealth of information, we used dimensionality reduction techniques to pinpoint a minimal set of tasks and measures that capture variability in BPS abilities linked to musical training. Machine learning52,53 and graph theory served to operationalize individual differences in rhythmic abilities, based on measures of BPS. We expected both perceptual and motor measures to contribute to the classification of musicians and non-musicians, and musicians to display a stronger link between perceptual and motor measures. This hypothesis is grounded in both behavioral and brain imaging evidence of a tighter coupling between perceptual and motor systems as a result of extensive musical practice6,54,55. Musicians tend to activate auditory and motor regions jointly when listening to sound4,56, or when merely moving their hands57,58. An increased sensorimotor association can be found after relatively little training while learning a musical instrument59,60, a process likely involving the dorsolateral premotor cortex61. A second goal was to examine whether the measures of rhythmic abilities that most successfully distinguish musicians from non-musicians would uncover subgroups among non-musicians, potentially reflecting some degree of informal musical experience. While musicians were expected to be quite homogeneous in their rhythmic abilities, non-musicians were expected to be less homogeneous in terms of their BPS abilities, and to lie on a continuum or cluster into groups. Individual differences found in single-case studies point toward different profiles, which might characterize clusters of individuals in the general population. To test this possibility, we used unsupervised learning methods (clustering) in non-musicians only.

Results

Defining profiles of rhythmic abilities in musicians and non-musicians

To define a profile of rhythmic abilities based on the results of the BAASTA tasks and to classify musicians vs. non-musicians, we processed data from 55 measures using Sparse Learning and Filtering algorithms (SLF)62,63. SLF aims to select a minimal set of variables obtained from the perceptual and motor tasks of BAASTA (Perc: perceptual; M: motor) that captures the most relevant differences between musicians and non-musicians. This set of measures was then entered into a model aimed at classifying individuals as musicians or non-musicians based on their rhythmic performance. The resulting model, including a limited set of measures, is parsimonious and affords a significant gain in statistical power relative to a model that includes all the variables. We systematically tested three classification models, taking as input (1) the entire set of perceptual measures (7 out of 55; Model Perc), (2) the entire set of motor measures (48 out of 55; Model Motor), and (3) the measures selected by Models Perc and Motor (Model PMI). Model PMI also included interactions between motor and perceptual measures, to test the hypothesis that the combination of perceptual and motor abilities is more relevant in defining the rhythmic profile of musicians than that of non-musicians. The procedure and the selected variables are illustrated in Fig. 1. Model Perc significantly accounted for 50% of the variance (F(73) = 14.4, p < 0.0001), Model Motor for 84% of the variance (F(74) = 97.1, p < 0.0001), and Model PMI for 92% of the variance (F(70) = 99.5, p < 0.0001), relative to the prediction of a model using the entire set of variables. Accuracy for the three models is shown in Fig. 1.

Figure 1

Schema of the analysis pipeline using Sparse Learning and Filtering (SLF). Selected variables for Model Perc: 3 variables from the Beat Alignment Test (BAT_slow_Dprime, BAT_med_Dprime, BAT_fast_Dprime), 1 from Duration discrimination (Dur_discrim_Thresh), and 1 from Anisochrony detection with music (Anisoc_det_music_Thresh). Model Motor: 2 variables from Paced tapping with tones (Paced_metro_750_Vector_dir, Paced_metro_750_Vector_len), 1 from Paced tapping with music (Paced_music_ross_Vector_dir), and 1 from Unpaced tapping (Unpaced_slow_CV). Model PMI included (a) two perceptual measures, reflecting the ability to detect whether a metronome is aligned to the beat of music or not (P1—BAT_slow_Dprime), or whether there is a rhythmic irregularity—a shifted beat—in a short musical fragment (P2—Anisoc_det_music_Thresh), (b) two motor measures, indicating the alignment of participants’ taps with the beat of a metronome (M1—Paced_metro_750_Vector_dir) and music (M2—Paced_music_ross_Vector_dir), and (c) the four interactions between these measures (P1 × M1, P1 × M2, P2 × M1, P2 × M2). Accuracy is calculated based on the Test phase of the classifier validation. A full description of the variables, as well as the confusion matrices for the three models at the Train (60% of the dataset), Validation (20%), and Test (20%) phases, are provided in Supplementary Materials.

We tested the capacity of the three models to discriminate between musicians and non-musicians by taking the outcome of the models (a value from 0 to 1, with 0 indicating non-musicians and 1 musicians) over the entire dataset (i.e., including the three phases of the classifier validation). Model PMI was superior to the other two models in classifying musicians and non-musicians (Fig. 2a; Model × Group interaction, F(2, 154) = 3.8, p < 0.05; model Perc, t(66.9) = 5.2, p < 0.0001, d = 1.3; model Motor, t(64.3) = 6.2, p < 0.0001, d = 1.5; model PMI, t(64.0) = 7.0, p < 0.0001, d = 1.8). Even though the distributions of the model PMI predictions were partly overlapping (Fig. 2b), the model classified musicians and non-musicians very successfully (Fig. 2c). Equation (1) below allowed the classification of musicians and non-musicians (prediction = \(\widetilde{y}\)) for model PMI.

Figure 2

Classification performance of the three models. (a) Comparison of the predictions of the Perc, Motor, and PMI models (0 = non-musicians; 1 = musicians). Error bars represent SEM. (b) Probability density for musicians and non-musicians obtained with model PMI. (c) Scatter plot showing the individual predictions based on model PMI. For simplicity, the predictions are presented as a projection of two composite scores representing the linear portions of the prediction model, referring to the perceptual measures (perceptual score = 0.05P1 − 0.08P2) and the motor measures (motor score = 0.11M1 + 0.12M2), without the interactions.

$$\widetilde{y}=0.49+0.05{P}_{1}-0.08{P}_{2}+0.11{M}_{1}+0.12{M}_{2}+0.02{P}_{1}{M}_{1}-0.005{P}_{1}{M}_{2}-0.002{P}_{2}{M}_{1}-0.007{P}_{2}{M}_{2}.$$
(1)
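As an illustration, the sketch below (not part of the original analysis) applies Eq. (1) to a hypothetical participant; the input values are invented and assume the measures are on the scale used when the coefficients were fitted, and the 0.5 decision rule is the one described in the Methods.

```python
# Minimal sketch of how the PMI model (Eq. 1) yields a classification.
# The input values below are hypothetical.

def pmi_prediction(p1, p2, m1, m2):
    """Linear-plus-interaction prediction from Eq. (1)."""
    return (0.49 + 0.05 * p1 - 0.08 * p2 + 0.11 * m1 + 0.12 * m2
            + 0.02 * p1 * m1 - 0.005 * p1 * m2
            - 0.002 * p2 * m1 - 0.007 * p2 * m2)

y_tilde = pmi_prediction(p1=1.2, p2=-0.8, m1=1.0, m2=0.9)   # hypothetical participant
label = "musician" if y_tilde > 0.5 else "non-musician"     # decision rule from Methods
print(round(y_tilde, 3), label)                              # -> 0.857 musician
```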

Model PMI, including two perceptual variables, two motor variables, and the four interactions, yielded the best classification results. The two perceptual variables come from two different tasks—the Beat Alignment Test and Anisochrony detection with music—in which participants were asked to detect whether a metronome was aligned to the beat of a musical excerpt presented at a slow tempo (P1—BAT_slow_Dprime; sensitivity index), and whether there was a rhythmic irregularity—a shifted beat—in a short musical fragment (P2—Anisoc_det_music_Thresh; threshold). The two motor measures both come from paced finger tapping tasks (Paced tapping with tones, Paced tapping with music), and indicate synchronization accuracy (alignment of participants’ taps to the beat) when participants tapped to a metronome presented at a slow tempo (M1—Paced_metro_750_Vector_dir) or to a musical excerpt (M2—Paced_music_ross_Vector_dir). To gain a better understanding of the rhythmic signature of musicians and non-musicians, we examined the contribution of each variable and interaction to the classification, independently for each group. We calculated the correlation between each variable and the prediction separately for musicians and non-musicians, yielding a measure of explained variance. With this method we obtained two profiles characterizing musicians and non-musicians (Fig. 3a,b). The contribution of each component (perceptual, motor, and their interaction) to the profiles of musicians and non-musicians is illustrated in Fig. 3c. The motor measures contributed to the profiles more than the perceptual measures in both musicians (χ2(1) = 20.8, p < 0.0001) and non-musicians (χ2(1) = 22.4, p < 0.0001). Differences between the two profiles regarding the contribution of individual variables and interactions are apparent. Interactions between perceptual and motor variables play a more important role in the definition of the musicians’ profile than in that of the non-musicians (χ2(1) = 5.2, p < 0.05). The motor component may contribute more to the definition of non-musicians, but this difference did not reach statistical significance (χ2(1) = 3.5, p = 0.11).

Figure 3

Profiles of rhythmic abilities for musicians (a) and non-musicians (b) based on model PMI, expressed as undirected graphs. The nodes’ size reflects the contribution of the variable to the definition of the group (proportion of variance), and the edges’ widths reflect the contribution of the interactions. (c) Comparison of the contribution of the model component (perceptual, motor, interaction) to the profiles of musicians and non-musicians. P: perceptual; M: motor. Numbers refer to the specific variable (P1—BAT_slow_Dprime; P2—Anisoc_det_music_Thresh; M1—Paced_metro_750_Vector_dir; M2—Paced_music_ross_Vector_dir). *p < 0.05.

These findings revealed that a minimal set of rhythmic measures and their interactions can successfully differentiate musicians from non-musicians, as illustrated by undirected graphs. A limitation of this representation, however, is that it does not reflect the sign of the relation, positive or negative, between each variable and the predicted group. Moreover, one might wonder whether each component of the model can explain individual variability within each group. To address this point, we examined the relation between each variable or interaction and the prediction, separately for each group (Fig. 4). Each component of the profile could explain differences within each group. The only exception was the interaction between P2 and M1, which just failed to reach significance in explaining the variability for non-musicians. Notably, the motor measures alone could explain large portions of intra-group individual variability (63% for non-musicians and 46% for musicians). Moreover, interactions between perceptual and motor variables captured a significant amount of individual variability in both groups. This further supports the inclusion of interactions in the PMI model. Also worth noting is the sign of the relation between the components and the prediction. For both groups, improved performance (i.e., an increase of P1, M1, or M2; a decrease of P2, indicating a lower detection threshold) was associated with greater rhythmic abilities. When considering the interactions between perceptual and motor measures, opposite relations with rhythmic abilities, as reflected by the prediction of the PMI model (from 0 to 1), emerged. These interactions were not only key in differentiating musicians from non-musicians (as shown in Fig. 3c), but also played different roles in capturing individual differences within each group. Notably, a stronger interaction between perceptual and motor measures, suggesting better performance in both dimensions (characterized by increasingly positive values for P1 × M1 or P1 × M2, and negative values for P2 × M1 or P2 × M2, considering that lower P2 values—a perceptual threshold—signify better performance), was associated with enhanced prediction scores exclusively among musicians.

Figure 4

Scatter graphs showing the relation between individual elements of the profiles of rhythmic abilities (measures and their interactions) and the prediction of model PMI, separately for musicians and non-musicians. Explained variance for each group is reported (in bold when the regression was statistically significant). **p < 0.01, *p < 0.05, ns not significant.

Profiles of rhythmic abilities for non-musicians

An inspection of the measures of rhythmic abilities in non-musicians revealed considerable variability in this group. Variance was larger in non-musicians than in musicians, as reflected by the PMI model (SD of the prediction from the PMI model = 0.24 for non-musicians, 0.14 for musicians; Bartlett test for homogeneity of variance, χ2(1) = 9.6, p < 0.01). To assess whether non-musicians’ rhythmic performance was homogeneous or whether different clusters (and corresponding profiles) would emerge, we used model PMI to examine the distribution of rhythmic skills in this group, linked with formal and informal musical experiences. This approach was both practical and parsimonious, as it was based on a limited set of variables (as compared to the full set) shown to be sensitive to large individual differences in rhythmic abilities linked to musical training (see below for further analyses including the full dataset). Building on graph theory, we applied modularity analysis64,65 to the non-musicians group. This procedure led to the detection of two clusters (Subgroup 1, Subgroup 2) that maximized modularity.

Subgroup 1 included 20 participants (9 females, mean age = 24.1 years, SD = 4.7), and Subgroup 2 comprised 20 participants (10 females, mean age = 22.1 years, SD = 2.6). The two subgroups did not differ significantly in terms of years of formal musical training (mean = 0.8 years, SD = 1.5, for Subgroup 1; mean = 1.8 years, SD = 2.4, for Subgroup 2; t(31.8) = 1.6, p = 0.14, d = 0.5). However, Subgroup 2 showed more years of informal musical activities (mean = 1.7 years, SD = 2.9, for Subgroup 1; mean = 5.0 years, SD = 4.3, for Subgroup 2; t(33.3) = 2.8, p < 0.01, d = 1.0). By informal musical activities we refer to playing one or more musical instruments as an amateur, without the requirement of formal musical education.

Once we had uncovered the two clusters of non-musicians, we used the variables of model PMI to classify the two subgroups (see Eq. 2 below). This new model classified the two subgroups with an accuracy of 100% (in the Test phase of the classifier validation; the confusion matrices at the Train, Validation, and Test phases are provided in Supplementary Materials). The performance of the model is shown in Fig. 5. To avoid confusion with the coding of musicians and non-musicians in the previous model, we coded Subgroup 1 as “− 1” and Subgroup 2 as “+ 1”.

Figure 5

Classification performance of Model PMI for the two subgroups of non-musicians (− 1 = Subgroup 1; 1 = Subgroup 2). (a) Probability density for non-musicians in Subgroup 1 and Subgroup 2. (b) Scatter plot showing the individual predictions based on the model. The predictions are presented as a projection of two composite scores representing the linear portions of the prediction model, referring to the perceptual variables (perceptual score = 0.76P1 + 0.15P2) and the motor variables (motor score = 0.53M1 − 0.07M2), without the interactions.

$$\widetilde{y}=0+0.76{P}_{1}+0.15{P}_{2}+0.53{M}_{1}-0.07{M}_{2}+0.42{P}_{1}{M}_{1}-0.04{P}_{1}{M}_{2}-0.10{P}_{2}{M}_{1}-0.00{P}_{2}{M}_{2}.$$
(2)

We obtained profiles for the two subgroups of non-musicians by examining the contribution of each variable and interaction to the classification of each group (Fig. 6, panels a and b). The contribution of each component (perceptual, motor, and their interaction) to the profiles of Subgroups 1 and 2 is presented in Fig. 6c. The motor measures contributed to the profiles more than the perceptual measures for both Subgroup 1 (χ2(1) = 32.1, p < 0.0001) and Subgroup 2 (χ2(1) = 8.8, p < 0.01). There was a tendency toward a greater contribution of the motor component in Subgroup 1 than in Subgroup 2 (χ2(1) = 3.2, p = 0.07).

Figure 6

Profiles of rhythmic abilities for non-musicians in Subgroup 1 (a) and Subgroup 2 (b) expressed as undirected graphs. The nodes’ size reflects the contribution of the variable to the definition of the group (proportion of variance), and the edges’ widths reflect the contribution of the interactions. (c) Comparison of the contribution of the model component (perceptual, motor, interaction) to the profiles of Subgroups 1 and 2. P: perceptual; M: motor. Numbers refer to the specific variable (P1—BAT_slow_Dprime; P2—Anisoc_det_music_Thresh; M1—Paced_metro_750_Vector_dir; M2—Paced_music_ross_Vector_dir). *marg. marginally significant, p = 0.07.

To test whether each component of the non-musicians’ subgroup profiles can explain individual variability within each group, we examined the relation between each variable and interaction and the prediction separately for each subgroup (Fig. 7).

Figure 7

Scatter plots showing the relation between individual elements of the profiles of rhythmic abilities (variables and interactions) and the prediction of model PMI, separately for non-musicians in Subgroups 1 and 2. Explained variance for each group is reported (in bold when the regression was statistically significant). **p < 0.01, *p < 0.05, ns not significant.

Only a few components of the profiles could explain the variability within each subgroup, and they did so in a rather specific way. Notably, the two components P1 and M1, and their interaction, were the only ones that explained the variability within the two subgroups. Variability within Subgroup 1 was strongly related to component P1, which explained 68% of the variance, and to the interaction between P1 and M1 (20%). Variability within Subgroup 2 was mostly explained by component M1 (51% of the variance) and the interaction between M1 and P1 (68%). Finally, as the two subgroups of non-musicians differed in terms of informal musical activities, we tested whether variability in each of the two subgroups was related to this variable. Variability within Subgroup 1, indicated by the prediction of the PMI model, was correlated with informal musical training (r = 0.48, p < 0.05). With increasing years of informal musical training, the prediction of the model for participants in Subgroup 1 became closer to 0, thus nearing the performance of Subgroup 2. This finding is consistent with the observation that Subgroup 2 displayed significantly more informal musical experience than Subgroup 1 (see above). Finally, we ran an additional analysis to test whether some of the perceptual and motor measures included in the PMI model were specifically correlated with informal musical training. Only M1 was positively correlated with informal musical training (r(38) = 0.42, p < 0.05, Bonferroni-corrected), while the P1 correlation just failed to reach significance (r(38) = 0.37, p = 0.08, Bonferroni-corrected).

By design, the detection of two clusters among non-musicians and the extraction of the corresponding profiles were based exclusively on the variables included in model PMI, which was highly successful in distinguishing musicians from non-musicians. This choice, albeit practical and parsimonious, might have biased the search for profiles of rhythmic skills in non-musicians by focussing only on those capacities linked with musical training. To test whether the clusters we identified were representative of rhythmic abilities more generally, we repeated the modularity analysis on the group of non-musicians using a subset of the initial 55 variables representing 95% of the variance, instead of only the variables from model PMI (see Supplementary Materials). We detected two clusters of participants very comparable to those obtained in the analysis based on model PMI, with a large overlap (88%) between the classifications. Hence, the subgroups identified using the PMI model, based on a reduced set of profile measures, seemed to capture effectively and parsimoniously the individual differences in rhythmic abilities among non-musicians, as reflected by their performance on BPS tasks.

Discussion

We used machine learning techniques to characterize BPS abilities in musicians and non-musicians by selecting a minimal set of perceptual and motor measures from a battery of rhythmic tests (BAASTA)49. We expected both perceptual and motor measures to contribute to the classification of musicians and non-musicians. In keeping with our hypothesis, a classification model (PMI) including small sets of perceptual and motor measures (4 variables overall) and their interactions was superior to models based on perceptual or motor measures alone. The final model of rhythmic abilities was very accurate in classifying musicians and non-musicians (almost 90%) and allowed us to distill separate profiles (undirected graphs) for the two groups.

The final model included perceptual and motor measures derived from tasks which have been shown to be affected by musical training in prior work. The detection of the alignment between a metronome and the musical beat (Beat Alignment Test), and the perception of a rhythmic irregularity in an isochronous sequence (Anisochrony detection), are typically enhanced by musical training66,67,68. Similarly, performance in paced tapping to rhythmic stimuli (music and metronomes) is improved by musical training28,29,69 (for a review, see70). Interestingly, two of the identified measures involved presenting the stimuli at a slow tempo (with an IOI of 750 ms), when detecting a misalignment or tapping to an isochronous sequence, while the other two measures involved music fragments presented at an average tempo (IOI of 600 ms). Whether stimulus processing at slow tempos plays a specific role in defining rhythm profiles based on musical expertise is still unclear. For example, there is some evidence that increased tempo might affect the tendency to anticipate the beat in paced finger tapping tasks (mean negative asynchrony71; but see70, for a discussion), an effect that seems to depend on musical training, at least at very slow tempos29. However, further research is needed to confirm the role of slow tempo in BPS when classifying musicians and non-musicians.

Motor measures played a more important role (explaining more variance) than perceptual measures for both musicians and non-musicians. Interactions between perceptual and motor measures were stronger for musicians than for non-musicians. Interestingly, the components of the profiles, albeit limited in number, successfully captured individual variability within each group. An improvement in rhythm perception (better detection of beat alignment or of deviations from isochrony) or production (a less negative relative phase between taps and the beat) increased the probability of being classified as a musician (with prediction scores closer to 1). The interactions between perceptual and motor measures also explained a significant amount of variance within each group. Notably, these interactions were linked differently to the performance of musicians and non-musicians in rhythmic tasks, as shown by opposite slopes relative to the model prediction in the two groups. In sum, it is apparent that interactions between perceptual and motor measures play a critical role in defining a musician’s rhythmic profile. Taking interactions into account both improves the classification of an individual as a musician or a non-musician and can specifically account for intra-group variability.

These findings are consistent with neuroimaging evidence of stronger coupling of the auditory and motor systems resulting from extensive musical practice6,54. The effects of both short-term and long-term musical training on brain plasticity have been widely documented and pertain to primary sensory and motor regions as well as sensorimotor integration areas54,72,73,74,75. Auditory-motor associations can emerge after relatively little training when learning a musical instrument (e.g., after training non-musicians to play an MRI-compatible cello)60 and are thought to engage the dorsolateral premotor cortex61. The examination of the neuronal underpinnings of auditory-motor integration when playing different musical instruments and singing76 revealed overlaps, for example in the dorsal premotor and supplementary motor cortices, thus pointing to shared mechanisms. A next step in future research should be to link the profiles of rhythmic abilities identified here through behavioral measures with their brain substrates, to gain a better understanding of these individual differences.

A second goal was to examine whether the profile measures which successfully differentiated musicians from non-musicians (from model PMI) could also reveal subgroups among non-musicians, potentially reflecting varying degrees of informal musical experience. By applying modular clustering, an unsupervised learning method, to the data extracted from non-musicians’ profiles, we identified two clusters. Again, motor measures contributed more than perceptual measures to both profiles, with a slight tendency to contribute more to one of the two profiles (Subgroup 1). However, this general observation did not entirely reflect the weight of the different tasks in capturing variance within each group. For Subgroup 1, a single perceptual task (detecting a misalignment between a metronome and the musical beat) unexpectedly accounted for the majority of the variance, which contrasts with the initial impression given by the graph profiles. Similarly, for Subgroup 2, it was a motor task (finger tapping to a metronome) that captured most of the variance. This suggests that, despite the overall dominance of motor measures in defining subgroup profiles, individual tasks may have a disproportionate impact on the variance within each subgroup, thus providing a very parsimonious metric of individual differences in the rhythmic domain. Generally, the interactions between perceptual and motor measures were less specific in capturing individual variability than when we compared musicians and non-musicians. Notably, the classification of similar subgroups among non-musicians was replicated when considering a much larger set of perceptual and motor measures drawn from the BAASTA battery (explaining most of the variance), showing that the profiles, although based on a minimal set of variables selected by comparing musicians and non-musicians, could robustly capture individual differences in rhythmic abilities among non-musicians.

The two clusters of non-musicians differed in their informal musical activities. People belonging to the subgroup with more years of informal musical activities, albeit classified as non-musicians, received less extreme prediction scores from the model (farther away from − 1) than the other group. Consistently, the years of informal musical activities could also account for individual variability within one of the subgroups. Thus, model PMI can capture subtle differences in rhythmic abilities linked to informal musical activities, such as playing a musical instrument as an amateur musician would. Notably, Western listeners, musicians and non-musicians alike, can implicitly acquire complex musical features (melody, harmony, rhythm)77,78,79,80,81 from mere exposure to music82, shaping their perception77 and production83. This implicit knowledge paves the way for non-musicians’ perception of the relations between musical events, thus creating expectations for upcoming events that, in turn, influence their processing84,85. Interestingly, informal musical activities are likely to play an important role earlier in life during music acquisition86. There is growing interest in the role of informal musical activities, with evidence suggesting positive effects on both music and language development, auditory processing, and vocabulary and grammar acquisition87,88,89. As informal musical activities are quite common, having a model capable of capturing their effects on BPS may provide further insight into the relation between BPS and developmental disorders37,38,41,42.

Conclusions

The effects of musical training on beat perception and synchronization are well established. Yet, to date, a general approach to rhythmic abilities that can account for differences linked to musical training and for variability in non-musicians has been missing. Here we filled this gap by using machine learning techniques52,53 to extract a parsimonious model of rhythmic abilities from a large set of perceptual and motor tasks (BAASTA)49. The obtained profiles of rhythmic abilities, represented by undirected graphs, were based on a handful of perceptual and motor measures. They were very successful in separating musicians and non-musicians, and could capture the variability among non-musicians. Thus, using machine learning to build a model of rhythmic abilities was a very fruitful endeavor. Machine learning is progressively becoming an important element of the toolbox for research in cognitive science and neuroscience90,91,92,93, with applications in music research such as computational music analysis94, and more recently in music cognition95. A similar approach based on machine learning and graph theory, as used here to model individual differences in rhythmic abilities, could be extended to other musical abilities such as pitch perception and production, or improvisation96.

By exploiting machine learning, starting from a dataset derived from behavioral tests, we demonstrated that the complexity of rhythmic abilities reflected by BPS tasks43, linked with formal musical training but also with informal musical experiences, can be captured by a minimal set of behavioral measures and their interactions. Detecting profiles of rhythmic abilities and their link with formal and informal musical experiences is valuable for shedding light on the sources of individual variability in BPS (e.g., neuronal97; genetic98) and on the link with more general cognitive functioning. For example, rhythmic abilities and cognitive functions such as cognitive flexibility, inhibition, and working memory tend to covary both in healthy individuals and in populations with disorders37,38,99; moreover, executive functions and rhythmic abilities are both enhanced by musical training100,101. A systematic study of the relation between profiles of rhythmic abilities and cognitive functions is likely to advance our understanding of the inter-dependence between rhythmic capacities and general cognitive functions, and of the underlying mechanisms. Finally, as individual differences in BPS are exacerbated in clinical populations, the proposed model and the profiles of rhythmic abilities may serve to identify markers of impairment102. One of the advantages of this approach, owing to the model’s parsimony, is that profiles can be obtained via rapid testing and screening of participants with a limited set of tasks (the Beat Alignment Test, Anisochrony detection, and finger tapping to a metronome and to music), which can be performed in about 15 min. This corresponds to a time saving of over 80% relative to the full testing battery (lasting 2 h)49. Ultimately, detecting individual profiles of rhythmic abilities in patient populations may play a pivotal role in devising personalized rhythm-based interventions103,104. To further substantiate the profiles uncovered in this study, it would be beneficial to extend the testing of the aforementioned measures to a broader sample, also encompassing clinical populations.

Materials and methods

Participants

Seventy-nine adults (43 females, 35 right-handed, 4 left-handed and 2 ambidextrous) between 18 and 34 years of age participated in the experiment. Thirty-nine (24 females; mean age = 24.3 years, SD = 2.5) were musicians and 40 (19 females; mean age = 23.1, SD = 4.0) were non-musicians. Musicians were German native speakers recruited via a participant database at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig (Germany), and non-musicians were French speakers recruited in Montpellier (France). Participants had to satisfy the following criteria to be assigned to one of the two groups: (1) they self-classified as musicians or non-musicians, (2) they had practiced during the last year (musicians) or had not practiced during the last year (non-musicians), and (3) they had not received more than 7 years of formal musical training (non-musicians). A year of formal musical training was defined as a year during which the participant followed a structured lesson schema on any instrument or voice, either self-taught or with an instructor, and practiced an average of at least 1 h per week. Musicians reported more years of formal musical training (mean = 7.5 years, SD = 2.5) than non-musicians (mean = 1.3 years, SD = 2.1; t(48.6) = 6.7, p < 0.0001, d = 1.9), and more years of informal musical activities (13.1 years, SD = 4.4 vs. 3.4 years, SD = 4.1; t(76.4) = 10.2, p < 0.0001, d = 2.3). Informal musical activities consisted of playing one or more musical instruments as an amateur without having received formal musical training, which was the case for some non-musicians. The study was approved by the Ethics Committee of the University of Leipzig and by the Euromov Ethics Committee. Informed consent was obtained from all participants. All experiments were performed in accordance with relevant guidelines and regulations.

Tests and procedure

We tested participants’ rhythmic abilities by submitting them to the Battery for the Assessment of Auditory Sensorimotor and Timing Abilities (BAASTA)49.

Measures of rhythmic abilities

BAASTA consists of 4 perceptual tasks and 5 motor tasks. The perceptual tasks consisted of discriminating single durations (Duration discrimination), detecting deviations from the beat in tone and musical sequences (Anisochrony detection with tones, with music), or judging whether a superimposed metronome was aligned or not with the musical beat (Beat Alignment Test). The motor tasks involved finger tapping in the absence of stimulation (Unpaced tapping), tapping to the beat of tone and music sequences (Paced tapping with tones, with music), continuing tapping at the pace set by a metronome (Synchronization–continuation), and adapting tapping to a tempo change (Adaptive tapping). Participants were tested on all the tasks with a computer version of BAASTA. Auditory stimuli were delivered via headphones (Sennheiser HD201). Task order was fixed (Duration discrimination, Anisochrony detection with tones and music, and BAT for the perceptual tasks; Unpaced tapping and Paced tapping to tones and music, followed by Synchronization–continuation and Adaptive tapping, for the motor tasks). The battery lasted approximately 2 h.

In the Duration discrimination task, two tones (frequency = 1 kHz) were presented successively. The first tone lasted 600 ms (standard duration), while the second lasted between 600 and 1000 ms (comparison duration). Participants judged whether the second tone lasted longer than the first. The goal of the Anisochrony detection with tones task was to test the detection of a time shift in an isochronous tone sequence. Sequences of 5 tones (1047 Hz, tone duration = 150 ms) were presented with a mean inter-onset interval (IOI) of 600 ms. Sequences were isochronous (i.e., with a constant IOI) or not (with the 4th tone presented earlier than expected by up to 30% of the IOI). Participants judged whether the sequence was regular or not. The task was repeated using musical stimuli (Anisochrony detection with music) consisting of an excerpt of two bars from Bach’s “Badinerie” orchestral suite for flute (BWV 1067), played with a piano timbre (inter-beat interval = 600 ms). To assess beat perception, the Beat Alignment Test (BAT) used 72 stimuli based on 4 regular musical sequences, each including 20 beats (beat = quarter note). Two sequences were fragments from Bach’s “Badinerie” and two from Rossini’s “William Tell Overture”, all played with a piano timbre at three different tempos (450, 600, and 750-ms inter-beat intervals—IBIs). From the 7th musical beat onward, a metronome (a triangle sound) was superimposed onto the music, either aligned or non-aligned with the beat. When non-aligned, the metronome was either phase shifted (with the sounds presented before or after the musical beats by 33% of the music IBI, while keeping the tempo) or period shifted (with the tempo of the metronome changed by ± 10% of the IBI). Participants judged whether or not the metronome was aligned with the musical beat. The perceptual tasks were implemented in Matlab (version 7.6.0). In the first 3 tasks, there were three blocks of trials, and a maximum-likelihood adaptive procedure105 (MLP; MATLAB MLP toolbox106) was used to obtain perceptual thresholds (for details, see49). All tasks were preceded by 4 examples and 4 practice trials with feedback. Responses in the perceptual tasks were provided verbally and entered by the experimenter on the computer keyboard by pressing one of two keys corresponding to a “yes” or “no” response. “Yes” indicated that the participant detected a duration difference, the presence of an anisochrony, or a misalignment of the metronome with the musical beat.

Motor rhythmic abilities were tested with finger tapping tasks. Participants responded by tapping with the index finger of their dominant hand on a MIDI percussion pad. The purpose of the Unpaced tapping task was to measure the participants’ preferred tapping rate and its variability in the absence of a pacing stimulus. Participants were asked to tap at their most natural rate for 60 s. In two additional unpaced conditions, we asked participants to tap as fast as possible and as slow as possible, each for 60 s. In the Paced tapping with tones task, we asked participants to tap to a metronome sequence formed by 60 isochronously presented piano tones (frequency = 1319 Hz) at 3 different tempos (600, 450, and 750-ms IOI). Similarly, in the Paced tapping with music task, participants tapped to the beat of two musical excerpts taken from Bach’s “Badinerie” and Rossini’s “William Tell Overture”. Each musical excerpt contained 64 quarter notes (IBI = 600 ms). Paced tapping trials were repeated twice for each stimulus sequence and were preceded by one practice trial. To test the ability to continue tapping at the rate provided by a metronome, in the Synchronization–continuation task participants synchronized with an isochronous sequence of 10 tones (at 600, 450, and 750-ms IOI) and continued tapping at the same rate after the sequence stopped, for a duration corresponding to 30 IOIs of the pacing stimulus. The task was repeated twice at each tempo and was preceded by one practice trial. Finally, the Adaptive tapping task aimed to assess the ability to adapt to a tempo change in a synchronization–continuation paradigm. Participants tapped to an isochronous sequence (10 tones); at the end of the sequence (last 4 tones) the tempo either increased, decreased, or remained constant (40% of the trials). The tempo changed by ± 30 or ± 75 ms. The task was to tap to the tones in the sequence, to adapt to the tempo change, and to keep tapping at the new tempo after the stimulus stopped, for a duration corresponding to 10 IOIs. After each trial, participants judged whether they perceived a change in stimulus tempo (acceleration, deceleration, or no change). The responses were communicated verbally and entered by the experimenter. Trials were divided into 10 experimental blocks (6 trials × 10 blocks overall) and presented in random order. A training block preceded the first experimental trial. In all the motor tasks, performance was recorded via a Roland SPD-6 MIDI percussion pad. Stimulus presentation and response recording were controlled by MAX-MSP software (version 6.0). A MIDI response latency of 133 ms (Leipzig testing) and 100 ms (Montpellier testing) was subtracted from the tapping data before further analysis.

Analyses

BAASTA

For Duration discrimination and Anisochrony detection with tones and with music, we calculated mean thresholds (as a percentage of the standard duration in the three tasks) across the three blocks of trials. We rejected blocks including more than 1/3 of False Alarms, namely trials in which a difference in duration or an irregularity of the sequence beat was reported although there was no difference or no deviation from isochrony in the stimulus. In the BAT, we calculated the sensitivity index (dʹ) for the entire set of 72 stimuli, and separately for each of the 3 tempos (medium, fast, and slow). dʹ was calculated based on the number of Hits (when a misaligned metronome was correctly detected) and False Alarms (when a misalignment was erroneously reported).
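For illustration, a minimal sketch of the dʹ computation is given below (Python; not the authors' code). The correction applied to extreme Hit and False Alarm rates, and the trial counts, are assumptions made for the sake of the example.

```python
# Sketch of d' = z(Hit rate) - z(False Alarm rate), as used for the BAT and for
# tempo-change detection in the Adaptive tapping task.
from scipy.stats import norm

def dprime(hits, misses, false_alarms, correct_rejections):
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    # log-linear correction to avoid infinite z-scores at rates of 0 or 1 (assumption)
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# hypothetical counts for one tempo condition
print(dprime(hits=16, misses=4, false_alarms=3, correct_rejections=13))
```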

Motor data obtained from the tapping tasks were pre-processed as follows (as in Refs.16,49). We discarded taps leading to inter-tap intervals (ITIs) smaller than 100 ms (artifacts), as well as outlier taps. An outlier was defined as a tap for which the ITI between the actual tap and the preceding tap was smaller than Q1 − 3 × Interquartile range (IQR) or greater than Q3 + 3 × IQR, where Q1 is the first quartile and Q3 is the third quartile. We calculated the mean ITI (in ms) and motor variability (coefficient of variation of the ITI—CV ITI—namely, the SD of the ITI/mean ITI) for Unpaced tapping (spontaneous, slow, and fast), Paced tapping, and the Synchronization–continuation tasks. Moreover, we analyzed synchronization in the Paced tapping task using circular statistics107 (Circular Statistics Toolbox for Matlab108; for its use in BAASTA see49). Tap times were coded as unit vectors with angles relative to the pacing event (tone or musical beat) on a 360° circular scale (corresponding to the IOI). The mean resultant vector R was calculated from the unit vectors corresponding to all the taps in a sequence. We used two indexes of synchronization performance: the length of vector R (from 0 to 1) and its angle (θ or relative phase, in degrees). Vector length indicates how consistently the taps occurred at the same phase relative to the pacing stimuli (synchronization consistency); 1 refers to maximum consistency (no variability), and 0 to a random distribution of angles around the circle (i.e., lack of synchronization). The angle of vector R indicates synchronization accuracy, namely whether participants tapped before (negative angle) or after (positive angle) the pacing event. Accuracy was calculated only if participants’ synchronization performance was above chance (null hypothesis = random distribution of data points around the circle), as assessed with the Rayleigh test for circular uniformity107,109; the null hypothesis is rejected when the length of vector R is sufficiently large according to this test. Vector length data were submitted to a logit transformation (e.g.,40) before further analyses. Additionally, for the Synchronization–continuation task, we calculated measures of central (or timekeeper) variance and motor variance based on the Wing–Kristofferson model110. In both the paced tapping and synchronization–continuation tasks, the results of the two trials were averaged and submitted to further analyses.
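The following sketch (Python; an assumed implementation, not the authors' Matlab code) illustrates the tap pre-processing and the circular measures: relative phase angles, the mean resultant vector R (length = consistency, angle = accuracy), and a first-order approximation of the Rayleigh test. It assumes pacing events at t = 0, IOI, 2·IOI, ....

```python
import numpy as np

def clean_itis(tap_times):
    """Discard taps producing ITIs < 100 ms and ITI outliers (beyond Q1/Q3 -/+ 3*IQR)."""
    taps = np.sort(np.asarray(tap_times, dtype=float))
    itis = np.diff(taps)
    itis = itis[itis >= 100]                                # artifacts
    q1, q3 = np.percentile(itis, [25, 75])
    iqr = q3 - q1
    keep = (itis > q1 - 3 * iqr) & (itis < q3 + 3 * iqr)
    return itis[keep]

def circular_sync(tap_times, ioi):
    """Consistency (R length), accuracy (angle, deg) and Rayleigh p for paced tapping (ms)."""
    angles = 2 * np.pi * ((np.asarray(tap_times, dtype=float) % ioi) / ioi)
    mean_vec = np.mean(np.exp(1j * angles))                 # mean resultant vector R
    r_len = np.abs(mean_vec)                                # 1 = perfectly consistent, 0 = random
    theta = np.degrees(np.angle(mean_vec))                  # negative = taps precede the beat
    n = len(angles)
    z = n * r_len ** 2                                      # Rayleigh test statistic
    p = np.exp(-z) * (1 + (2 * z - z ** 2) / (4 * n))       # first-order approximation
    return r_len, theta, p
```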

Finally, we analyzed the data from the Adaptive tapping task by first calculating an adaptation index111. To do so, we fitted a regression line to the produced ITIs as a function of the final sequence tempo; the slope of this line corresponds to the adaptation index. When the value is 1, adaptation is perfect; values lower and higher than 1 indicate undercorrection and overcorrection, respectively. We calculated this index separately for tempo acceleration (i.e., faster tempi, with final sequence IOIs < 600 ms) and tempo deceleration (slower tempi, with final sequence IOIs > 600 ms). In addition, error correction was partitioned into its two contributors, phase correction and period correction112. Phase and period correction were estimated from the two parameters, alpha and beta, of the fitted two-process model of error correction111. Finally, in the same task we calculated the sensitivity index (dʹ) for detecting tempo changes, based on the number of Hits (when a tempo acceleration or deceleration was correctly detected) and False Alarms (when a tempo acceleration or deceleration was reported while there was no change or the opposite change). The full list of the 55 variables (7 perceptual, 48 motor) obtained from BAASTA and used as the input database for clustering and data modelling is provided in Supplementary Materials.
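A minimal sketch of the adaptation index is given below, under the simplifying assumption that one mean continuation ITI per trial is regressed onto the final sequence IOI; the trial values are hypothetical.

```python
# Sketch of the adaptation index: the slope of the regression of continuation ITIs on the
# final sequence IOI (1 = perfect adaptation, < 1 undercorrection, > 1 overcorrection).
import numpy as np

def adaptation_index(final_iois, mean_continuation_itis):
    slope, _intercept = np.polyfit(np.asarray(final_iois, dtype=float),
                                   np.asarray(mean_continuation_itis, dtype=float), 1)
    return slope

# hypothetical trials with final tempi of 525, 570, 600, 630, and 675 ms
print(adaptation_index([525, 570, 600, 630, 675], [560, 585, 600, 615, 650]))  # ~0.6
```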

Definition of a model of rhythmic abilities (sparse learning and filtering)

The 55 variables from BAASTA, collected from a group of 39 musicians and 40 non-musicians, served as the starting point to define a model of rhythmic abilities for classifying the two groups. These variables formed a vector \(\mathbf{x}=({x}_{1}, \dots , {x}_{d})\) in our classification model. Given a set of pairs \(({\mathbf{x}}_{k}, {y}_{k}), k = 1, \dots , n\), we looked for a relation \(y = f(\mathbf{x})\) that fits the answers \({y}_{k}\) by \(f\left({\mathbf{x}}_{k}\right)\); \(y\) takes the value 1 for musicians and 0 for non-musicians. We denote by \(\mathbf{Y}\) the vector of all answers \(y\). Finding this relation is a supervised classification problem, as the answers \({y}_{k}\) are known. We addressed this problem using machine learning. One advantage of machine learning is that it does not need a priori hypotheses. However, several methods (e.g., neural nets) behave like a “black box” and do not provide any insight into the process and relations leading to successful classification. In this study, to avoid this issue, we looked for an explicit linear relation (see Eq. 3).

$$\widetilde{y} = {b}_{0}+{\sum }_{i=1}^{d}{b}_{i}{x}_{i}={b}_{0}+{{\varvec{b}}}^{T}{\varvec{x}}.$$
(3)

Here, \(\mathbf{b}=({b}_{1}, \dots , {b}_{d})\) is the vector of regression coefficients and \({b}_{0}\) is the intercept. By d we refer to the dimension or number of variables. The symbol T represents the transpose of vector b. By \(\widetilde{y}\) we refer to the approximation of the known response \(y\). This is a linear regression model. The classification follows the rule:

$$\left\{\begin{array}{l}\widetilde{y}>0.5\to musicians \\ \widetilde{y}\le 0.5\to non{\text{-}}musicians\end{array}\right..$$

An important issue for all statistical learning models is the data dimensionality (\(d\)). In our case, the dimension is large (\(d=55\)) for a relatively limited sample size (\(n = 79\)). For this reason, one of our goals was to uncover a minimal set of measures capable of classifying musicians and non-musicians, assuming that some variables are redundant and that some variables contribute more than others to the classification. To this aim, we employed a sparse learning and filtering (SLF) method62,113. This approach presents two advantages: (i) the selected variables are easier to interpret; (ii) for a given number of observations, the statistical power of the prediction increases as the dimension is reduced114. SLF follows the principle of the lasso method115 by minimizing both the regression error and the penalized sum of the absolute values of the coefficients (see Eq. 4). In the equation, \(L\) is the number of observations used to learn the coefficients \(({b}_{0},\mathbf{b})\), and \(\mu\) is the penalization factor.

$$E\left( {b_{0} ,{\varvec{b}},\mu } \right) = \underbrace {{\mathop \sum \limits_{k = 1}^{L} \left( {y_{k} - { }b_{0} - {\varvec{b}}^{T} {\varvec{x}}_{k} } \right)^{2} }}_{{Regression{\, }Error}} + \underbrace {{\mu \mathop \sum \limits_{i = 1}^{d} \left| {b_{i} } \right|}}_{{Penalized{\, }coefficients}}.$$
(4)

Recent methods in sparse learning have employed a proximal variant of the error function, which is smooth and facilitates convergence when parameters are selected appropriately62,63,113. Notably, these proximal optimization methods have proven successful in supervised classification for high-dimensional data (\(d \sim {10}^{6}\))63,113. Sparse learning methods have the property of driving the coefficients \({b}_{i}\) to \(0\) by tuning the penalizing factor \(\mu \ge 0\). The larger the penalization \(\mu\), the more the \({b}_{i}\) tend to converge to 0. The limit case \(\mu =0\) reduces to basic linear regression, and \(\mu =\infty\) sets all \({b}_{i}=0\), leading to the constant model \(y = {b}_{0}=\frac{1}{L}{\sum }_{k=1}^{L}{y}_{k}\). The optimal \(\mu\) is found by a trade-off between the classification error and the number of non-zero coefficients \({b}_{i}\).
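To illustrate the variable-selection principle, the sketch below uses scikit-learn's standard lasso on simulated data; it is not the proximal SLF implementation used in the study, and the data and penalization values are hypothetical.

```python
# Illustration of lasso-style sparsification: larger alpha (the penalization mu in Eq. 4)
# drives more coefficients to exactly zero; the surviving variables are the selected measures.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(79, 55))                    # 79 participants x 55 measures (simulated)
y = (X[:, 0] + 0.8 * X[:, 3]                     # simulated group label, 1 = "musician"
     + rng.normal(scale=0.5, size=79) > 0).astype(float)

X_std = StandardScaler().fit_transform(X)
for alpha in (0.01, 0.05, 0.2):                  # hypothetical penalization values
    model = Lasso(alpha=alpha).fit(X_std, y)
    selected = np.flatnonzero(model.coef_)
    print(f"alpha={alpha}: {selected.size} variables kept -> {selected[:10]}")
```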

We used a standard learning protocol (supervised classification) to assess the classification of musicians and non-musicians. We divided the observations into three subsets stratified by class (i.e., with balanced percentages of musicians and non-musicians), namely a Train set (60%), a Validation set (20%), and a Test set (20%). The Train set was used to minimize \(E({b}_{0},\mathbf{b},\mu )\) (see Eq. 4) and served to select the variables. The Validation set served as a stopping criterion: when the prediction errors on the Train and Validation sets are approximately equal, the training iterations are stopped. This prevents the model from overfitting the data, another well-known issue of learning methods. The data in the Test set were used to assess the prediction behavior of the model on new data. We used SLF in all three models: Perc, based on the entire set of perceptual variables (7 out of 55); Motor, based on the entire set of motor variables (48 out of 55); and PMI, based on the variables selected as the outcome of models Perc and Motor. In Model PMI, we also included the interactions between the perceptual and motor variables selected for this model, as one of the goals of the study was to assess the relation between perceptual and motor rhythmic abilities in musicians and non-musicians.
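A possible implementation of the stratified 60/20/20 split is sketched below (an assumption; the exact splitting code is not reported).

```python
# Sketch of a stratified Train/Validation/Test split preserving class proportions.
from sklearn.model_selection import train_test_split

def stratified_split(X, y, seed=0):
    # carve out the Test set (20%) first, then split the rest into Train (60%) and Validation (20%)
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=seed)  # 0.25 * 0.8 = 0.2
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```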

Starting from model PMI, we derived profiles of rhythmic abilities for each group, expressed as undirected graphs. The nodes in each graph indicate the variables selected by the model, and the edges represent their interactions. We estimated the contribution of variables and interactions separately for the classification of musicians and non-musicians. To do so, we calculated the squared correlation coefficient between each variable or interaction and the prediction, and expressed it as a percentage of the total explained variance in the group. We used this classification procedure based on model PMI both to classify musicians and non-musicians, and to classify the two subgroups of non-musicians.
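The sketch below illustrates, under assumptions about the data structures (a dictionary of measures keyed by hypothetical labels such as 'P1' or 'P1xM1'), how the squared correlations could be computed within one group and arranged into an undirected profile graph; it is not the authors' code.

```python
# Sketch of a profile graph: squared correlations with the prediction size the nodes
# (single measures) and the edges (interactions).
import numpy as np
import networkx as nx

def profile_graph(measures, prediction):
    """measures: dict of 1-D arrays keyed by 'P1', 'P2', 'M1', 'M2', 'P1xM1', ...;
    prediction: model PMI output for the same participants (one group only)."""
    r2 = {name: np.corrcoef(vals, prediction)[0, 1] ** 2 for name, vals in measures.items()}
    g = nx.Graph()
    for name, value in r2.items():
        if "x" in name:                              # interaction -> edge between its two measures
            a, b = name.split("x")
            g.add_edge(a, b, weight=value)
        else:                                        # single measure -> node sized by r^2
            g.add_node(name, size=value)
    return g
```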

Unsupervised learning (modular clustering)

To uncover clusters within the group of non-musicians, we considered the participants as a collection of nodes \(V\), and linked each pair of nodes to form the set of edges \(E\). We thus obtained a network that can be easily represented mathematically by a similarity matrix \(S\). The value of each entry \(S(v,w)\) is the similarity between individuals \(v\) and \(w\). If \(\mathbf{x}\) and \(\mathbf{y}\) are the measures for these two individuals then, in our study, their similarity is given by the correlation coefficient \(S\left(v,w\right)=corr(\mathbf{x},\mathbf{y})\). We processed these correlations using modularity analysis64,65. Modularity quantifies the degree to which a group can be subdivided into clearly delineated and non-overlapping clusters. High modularity reflects strong within-cluster links and weaker links between clusters. We maximized Newman’s well-known modularity, which measures the clustering quality against a null model. The degree of node \(v\) is defined by \({s}_{v}={\sum }_{w\in V}S(v,w)\), and \(M={\sum }_{v\in V}{s}_{v}\) is the sum of degrees. For a given partition \(g\) of the node set \(V\), the modularity (\(Q\)) is defined as indicated in Eq. (5).

$$Q\left(g\right)=\frac{1}{2M}\sum \limits_{v, w\in V}\left(S\left(v,w\right)-\frac{{s}_{v}{s}_{w}}{2M}\right)\delta \left(g\left(v\right),g\left(w\right)\right).$$
(5)

Here, \(\delta \left(g\left(v\right),g\left(w\right)\right)=1\) if \(g\left(v\right)=g\left(w\right)\) (that is, \(v\) and \(w\) are in the same cluster) and \(0\) otherwise. The null model is given by the value \(\frac{{s}_{v}{s}_{w}}{2M}\), which is the probability of a random edge appearing between \(v\) and \(w\). The clustering algorithm then searches for a partition \(g\) that maximizes \(Q\left(g\right)\) (Eq. 5). Modularity maximization led to partitioning the group of non-musicians into two clusters, each including 20 participants (for a heat matrix showing that intra-cluster similarities are larger than inter-cluster similarities in the dataset, see Supplementary Materials).
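As an illustration, the sketch below builds a participant similarity graph from correlations and applies a greedy modularity maximization from networkx; this is an assumed implementation (including the restriction to positive similarities), not necessarily the algorithm used in the study.

```python
# Sketch of modularity-based clustering of participants from a participants x measures array.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def cluster_participants(data):
    """data: participants x measures array (e.g., the model PMI variables, z-scored)."""
    similarity = np.corrcoef(data)                  # participant-by-participant correlations
    n = similarity.shape[0]
    g = nx.Graph()
    for v in range(n):
        for w in range(v + 1, n):
            if similarity[v, w] > 0:                # keep positive similarities only (assumption)
                g.add_edge(v, w, weight=similarity[v, w])
    communities = greedy_modularity_communities(g, weight="weight")
    return [sorted(c) for c in communities]         # e.g., two lists of participant indices
```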