Persistent Effects of Musical Training on Mathematical Skills of Children With Developmental Dyscalculia

Musical training (MT) is perceived as a multi-sensory program that simultaneously integrates visual, aural, oral, and kinesthetic senses. Furthermore, MT stimulates cognitive functions in a ludic way instead of tapping straight into the traditional context of school learning, including mathematics. Nevertheless, the efficacy of MT for mathematics remains understudied, especially concerning longstanding effects. For this reason, this longitudinal study explored the impact of MT on numerical cognition and abstract visual reasoning using a double-blind and quasi-experimental design. We assessed two groups of children from primary schools, namely one with developmental dyscalculia [DD; n = 22] and another comprising typically developing children [TD; n = 22], who concomitantly underwent MT. Numerical cognition measurement was carried out at four different time points: baseline (pre-MT assessment), mid-test (after 7 weeks of MT), post-test (after 14 weeks of MT), and follow-up (10 weeks after the end of MT). Significant interactions were found between time and group for numerical cognition performance, in which the DD group showed higher scores in number comprehension and number production at mid-test, and in calculation at post-test, compared to baseline. A key finding was that number production, number comprehension, and calculation effects were time-resistant for the DD group since changes remained at follow-up. Moreover, no significant differences over time were found for abstract visual reasoning in either group. In conclusion, the findings of this study showed that MT appears to be a useful tool for compensatory remediation of DD.


INTRODUCTION
Numerical cognition underlies daily-life activities and mathematical performance across our lifespan, and it has an impact on personal and professional development (Ancker and Kaufman, 2007). Numerical cognition includes four components: number sense, an innate ability to recognize, compare, add, and subtract quantities without counting; the mental number line, which is an ordinal spatial representation of quantities and is built with experience (Dehaene, 1997); and numerical processing and calculation, which are components developed during formal education. Numerical processing is divided into number comprehension, responsible for understanding the nature of numeric symbols associated with their quantities, and number production, e.g., reading, writing, and counting numbers (McCloskey et al., 1985). Finally, the calculation component is related to the performance of basic mathematical operations such as addition, subtraction, multiplication, and division.
Deficits in calculation procedures, undeveloped problem-solving strategies, prolonged solution times, and higher inaccuracy rates are the core features of developmental dyscalculia (DD) (Geary, 1993; Skagerlund and Träff, 2016). In some cases, children with DD can also present symbolic processing deficits, e.g., difficulties with numerical digits and words or with their verbal and semantic representations (Kucian and Von Aster, 2015). Moreover, dysfunction in spatial reasoning is also present since quantities seem to be embodied in spatial formats, i.e., in the mental number line (Landerl, 2013). These deficits are associated with anomalies in brain functioning. For instance, some studies comparing children with DD to typically developing control children showed that the latter demonstrated higher activations in the intraparietal sulcus for number representations, while children with DD mostly activated medial frontal areas, revealing compensatory mechanisms (Kaufmann et al., 2011).
The prevalence of DD is around 3% and 7% in England and Israel, respectively (Butterworth, 2005; Shalev et al., 2005). However, it depends on the methodology and diagnosis criteria used (Devine et al., 2013). For this reason, prevalence rates may vary in different countries. For instance, in Brazil, Bastos et al. (2016) found a slightly higher prevalence (7.8% from a cohort of 2,893 children) compared to previously quoted studies. Nevertheless, Fortes et al. (2016) found a prevalence of 6% in a cohort of 1,618 children from four different Brazilian states. Both Brazilian studies adopted the cubes and vocabulary subtests of the Wechsler Intelligence Scale for Children (WISC-III) to assess the intellectual level. However, different mathematical screening protocols were used: Bastos et al. (2016) applied Grafman and Boller's modified protocol, while Fortes et al. (2016) used the schooling achievement test (Stein, 1994). These divergences reveal the need to control the instruments selected, which should be standardized and not restricted to screening measures (Butterworth, 2005). Moreover, other factors should be considered, such as the socioeconomic classification of children's families and schools (OECD, 2015), and the way children are assessed (individually or grouped).
Concerning DD prognosis, longitudinal studies have shown that DD is a persistent disorder that, when not appropriately assisted, extends even beyond adolescence (Shalev et al., 2005; Mazzocco and Räsänen, 2013; Castro et al., 2014). Besides, it is essential to highlight that a longitudinal study performed by Ritchie and Bates (2013) was able to show that low socioeconomic and professional status in adulthood is not only a continuation of social status from one generation to the next but also a result of academic motivation and duration of education. For this reason, it is crucial to understand and create strategies that could improve or at least motivate these children to continue their academic studies.
Consequently, in recent years, approaches for remediation have been developed to enhance numerical cognition, for example, computer-assisted interventions (Kucian et al., 2011; Syah et al., 2015; Michels et al., 2018; Cheng et al., 2019), board games (Elofsson et al., 2016), tutoring programs, i.e., the combination of conceptual features of numerical knowledge with counting skills (Iuculano et al., 2015), and even attentional training (Ashkenazi and Henik, 2012). These interventions produced improvements in numerical cognition capacity; however, the effect was restricted to isolated numerical abilities such as number sense (Wilson et al., 2006), number line (Kucian et al., 2011; Käser et al., 2013; Elofsson et al., 2016), or number comprehension (Fuchs et al., 2013; Syah et al., 2015). Congruently, systematic reviews suggested that resistance to interventions may be an essential marker of DD (Monei and Pedro, 2017) and also that behavioral interventions have limited efficacy for children with DD (Kroeger et al., 2012). One factor observed in those previous interventional studies was that most approaches embraced individual training. It would be crucial to assess the effect of group settings, since they would reinforce the social context and children's well-being (Koelsch, 2010), which likewise might potentiate neuroplasticity (Davidson and McEwen, 2012).

A Recent Cognitive Remediation Approach: Musical Training
Music is an enriching activity, and it seems to be an ideal tool to nourish human cognition (Koelsch and Siebel, 2005; Koelsch, 2010). Furthermore, music science is a field of research that investigates the impact of music on cognition. For instance, the "4E cognitive science" framework has been recently adopted in music research by some authors (Schiavio and Altenmüller, 2015; Krueger, 2018; van der Schyff and Krueger, 2019). "4E" stands for Embodied, Embedded, Extended, and Enactive, four overlapping principles that help describe mental life in a novel and fascinating way. These principles challenge more traditional accounts of cognition (including music cognition) by emphasizing the fluid integration of neural, bodily, and environmental factors (Schiavio and van der Schyff, 2018; Ryan and Schiavio, 2019). For instance, researchers working from an Embodied perspective might offer novel tools to analyze their data, which more consistently integrate aspects of functions and processes distributed across the whole bodies of their participant(s) (Gallagher, 2005). In the case of Embedded cognition, environmental factors, such as cultural ones, are highlighted (Donald, 1991). With regard to Extended cognition, the authors argue that mental processes can be functionally coupled with tools and devices of their environment, generating novel opportunities to facilitate different cognitive tasks (Clark and Chalmers, 1998). Finally, the Enactive perspective describes cognition in terms of situated action, positing a deep continuity between mind and life (Varela et al., 1991; Thompson, 2007).
Musical training (MT) involves perception and action, which are mediated by a sensory, motor, and multimodal combination of diverse brain regions (Zatorre et al., 2007;Schlaug et al., 2010). Due to this multimodal characteristic, music intervention has the potential to change brain architecture that might be accompanied by cognitive changes. Although neuroanatomy (gross morphology of auditory cortex) has been shown to be extremely stable from childhood to adolescence in a longitudinal study (Seither-Preisler et al., 2014), other studies observed changes related to brain activation, as assessed by functional magnetic resonance imaging (Oechslin et al., 2013), electroencephalography (James et al., 2017), neural connectivity measured by diffusion tensor imaging (Moore et al., 2014), and cortical thickness (Hudziak et al., 2014).
Furthermore, music processing and mental calculations, such as addition and subtraction, seem to be connected to complex reasoning and to trigger similar brain pathways in both the prefrontal cortex and the parietal lobe (Schmithorst and Holland, 2004).
Few studies have been devoted to the effects of MT on numerical cognition systems. However, some studies have proposed an association between music and executive functions, consequently influencing math skills (Dumont et al., 2017; Azaryahu et al., 2019; Guhn et al., 2019), and also between music and spatial-temporal reasoning (Rauscher et al., 1994; Graziano et al., 1999), which is essential to process number magnitude since it is spatially represented in the mind (Dehaene et al., 1993). In line with this, studies demonstrated that MT leads to improvements in spatial abilities in preschool and elementary school children, boosting their learning of specific math concepts, such as counting, proportions, and fractions (Graziano et al., 1999; Silva et al., 2017; Arias-Rodriguez et al., 2019). If MT increases spatial-visual abstract reasoning, this practice might also improve comprehension of geometry, proportional reasoning, pattern recognition, ratio fractions, and subdivisions (Vaughn, 2000; Schlaug et al., 2005).
The association between MT and improvement in mathematical skills is often explained by functional brain connectivity (Fingelkurts et al., 2005) and distributive brain processing (McIntosh, 2000). The former is related to the connectivity and organization of brain activity among diverse neuronal assemblies that share functional properties (Fingelkurts et al., 2005); the latter proposes that brain areas are structurally interrelated and process information in a distributed way (McIntosh, 2000). Previous studies revealed that MT could enhance essential circuits for mental arithmetic such as the left and right planum temporale, related to temporal speech cue discrimination (Elmer et al., 2016), the left dorsolateral prefrontal cortex (Hudziak et al., 2014), and the intraparietal sulcus (Wan and Schlaug, 2010). Brain activations in these areas are consistent with the enhancement of numerical cognition and spatial reasoning (Gromko, 2004).
Regarding the results of behavioral studies that explored the effects of MT on numerical cognition, Silva et al. (2017), comparing two gender-balanced groups of preschoolers, one that underwent eight MT sessions and another that did not receive the training, found that only the group that underwent the MT improved in counting, addition, and subtraction abilities. Moreover, Sanders (2012, 2018) found, in a quasi-experimental design with an MT involving clapping, tapping, and movement on a regular weekly basis for 9 to 10 months, improved mathematical performance, such as counting and calculation, in preschool and primary school children. She also showed that when a music-mathematics link was established, this effect slightly increased. Nevertheless, the impact of MT has recently been challenged by specialists in the music psychology field (Schellenberg and Weiss, 2013; Sala and Gobet, 2017; Schellenberg, 2019). A meta-analysis performed by Sala and Gobet (2017) showed that the training effects on the cognitive or academic skills of children with typical development (TD) and young adolescents might not be reliable, and that previous positive findings were probably due to confounding variables.
Although there are longitudinal studies testing neurological and behavioral effects of MT on clinical samples of children with attention-deficit (hyperactivity) disorder (Seither-Preisler et al., 2014; Serrallach et al., 2016) and with dyslexia (Serrallach et al., 2016), very few studies have investigated the effects of MT on learning disorders related to numerical abilities, such as DD (Esteki, 2013; Arias-Rodriguez et al., 2019).
Moreover, MT may be an interesting option for remediation since it stimulates cognitive functions in a ludic way (Ribeiro and Santos, 2015), without tapping directly into the traditional context of math learning (Ribeiro, 2013). In fact, it has been associated with near and far transfer in cognitive skills (Schellenberg, 2004; Miendlarzewska and Trost, 2013). In this context, near transfer is related to beneficial effects on functions similar to those trained in MT, e.g., decoding prosodic cues in speech (Thompson et al., 2004), while far transfer concerns, for instance, the effects of MT on IQ (Schellenberg, 2004) or numerical cognition.
With regard to a DD sample, a previous study carried out 14 sessions of MT for two groups of primary school children, one with low numerical abilities and one with typical numerical skills. Results revealed that children with low achievement improved their numerical cognition performance, especially for number production capacity, compared to normative data at post-MT. The MT consisted of singing, solfeggio, rhythmic, and melodic techniques Santos, 2012, 2017). In the present study, we aimed to extend those findings by assessing children throughout the MT development and 10 months after the end of MT to evaluate whether the participants' outcomes were stable over time. To the best of our knowledge, no other study has carried out a follow-up test regarding the persistence of MT effects on numerical cognition, especially in children with a neurodevelopmental disorder.
We chose to apply an MT including singing, objects as percussion, and corporal movements for the following reasons: (i) music is not part of the public school curriculum, and schools do not have musical instruments available or even a budget to buy them; (ii) this study was developed in a countryside community of Brazil, where families with low income neither have financial resources to purchase musical instruments nor traditionally acquire musical education. For these reasons, our MT seemed suitable in the context of developing countries since it is based on ecological resources, i.e., the MT sessions use activities and materials that are from children's real-world resources (Shadish et al., 2002). These materials include recognition of sounds in their environment, inside the classroom and in the surrounding areas. In this context, music can be taught in an organic way, which means that children apply natural resources, such as their voices and bodies (for clapping and body games and movements) (Sanders, 2018).
In general, this study aimed to examine whether MT increases specific components of numerical cognition abilities in children with DD from a developing country. Specifically, we sought to assess the effect of a brief MT on DD children's numerical cognition contrasting with children with TD. Initially, the design of the study comprised the two groups of DD and TD children (between-subjects factor), which received the same treatment over time (within-subjects factor). Moreover, we compared the DD group results to the normative data of Santos et al. (2012) by visual inspection to clarify whether they would have an equivalent performance at post-test, revealing improvement after MT.
We hypothesize that over the brief MT children with DD will improve performance that specifically concerns numerical cognition. In contrast, no substantial improvements are expected for the TD group. Furthermore, we anticipate that the TD group will score on numerical cognition tasks according to their age group when compared to the normative sample of Santos et al. (2012).

Study Design
This is a longitudinal and double-blind study with a mixed design in which, firstly, we compared the groups' baseline with the three time points after the start of MT: mid-test in the middle of MT, post-test at the end of MT, and follow-up after 10 months. Secondly, the DD and TD groups were compared at the four different time points (Figure 1). All neurocognitive assessments were conducted by a psychologist blind to the intervention status of the children, and the MT was tutored by a music teacher who had formal musical education and was specially invited to develop the MT. Moreover, he was unaware of the diagnosis of the children, and the children that participated in the MT were blind to their diagnosis.
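The mixed design described above can be pictured as a long-format table with one row per child per assessment. The following Python sketch is only an illustration under stated assumptions: the names `GROUPS`, `TIME_POINTS`, and `build_design`, as well as the child labels, are hypothetical and are not part of the study materials.

```python
from itertools import product

# Between-subjects factor (group) and within-subjects factor (time point),
# as named in the text. Child IDs below are hypothetical placeholders.
GROUPS = {"DD": 22, "TD": 22}
TIME_POINTS = ["baseline", "mid-test", "post-test", "follow-up"]

def build_design(groups=GROUPS, time_points=TIME_POINTS):
    """Return one (child_id, group, time_point) row per planned assessment."""
    rows = []
    for group, n in groups.items():
        for i, time_point in product(range(1, n + 1), time_points):
            rows.append((f"{group}-{i:02d}", group, time_point))
    return rows

rows = build_design()
print(len(rows))  # 44 children x 4 time points = 176 planned assessments
```

A layout of this kind makes explicit that every child contributes four repeated measurements, while group membership is fixed per child.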

Participants
The participants were selected from a total cohort of 407 students enrolled in the 3rd school year of four different public schools in the countryside of São Paulo State, in Brazil. From the total cohort, we screened only those children indicated by the teacher as having mathematical difficulties confirmed by grades. For this reason, 223 children (112 boys, aged 8) underwent the screening phase, in which they were assessed in writing, arithmetic, and reading skills with the schooling achievement test (SAT; Stein, 1994). Of these 223 children, twenty-two fulfilled the criteria for DD and performed the MT (DD, 17 boys, M age = 99.18, SD age = 3.51 months), whereas the remaining 201 did not. DD was indicated by a cut-off value of <9 points in the arithmetic subtest of the SAT (Stein, 1994). This performance corresponded to one grade below the TD group, which means a substantial delay according to the criterion for the diagnosis of DD by the International Classification of Diseases (ICD-10, F81.2; World Health Organization [WHO], 2018). Besides, an objective criterion for the DD diagnosis was assured by the cut-off of 1.5 SD below the age mean in the Zareki-R (Rotzer et al., 2009). In the present study, using the two-phase procedure, i.e., screening and neuropsychological assessment (Shalev et al., 2005), the prevalence of DD was 5.4%, comprising more boys (60%) than girls (Bastos et al., 2016).
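The two cut-offs and the prevalence figure reported above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, combining both criteria with a logical "and" is an assumption, treating the 1.5 SD boundary as inclusive is an assumption, and the prevalence is taken here as 22 cases over the total cohort of 407, which reproduces the reported 5.4%.

```python
def meets_dd_criteria(sat_arithmetic, zareki_total, norm_mean, norm_sd):
    """Apply the two cut-offs named in the text (combination is an assumption).

    sat_arithmetic: raw score on the SAT arithmetic subtest (cut-off: < 9).
    zareki_total, norm_mean, norm_sd: Zareki-R score and the age-matched
    normative mean and SD (cut-off: 1.5 SD below the mean, inclusive here).
    """
    z = (zareki_total - norm_mean) / norm_sd
    return sat_arithmetic < 9 and z <= -1.5

def prevalence_percent(n_cases, cohort_size):
    """Prevalence as a percentage of the cohort."""
    return 100.0 * n_cases / cohort_size

# 22 children with DD out of the total cohort of 407 students.
print(round(prevalence_percent(22, 407), 1))  # 5.4
```

The normative mean and SD passed to `meets_dd_criteria` would have to come from the age-appropriate norms; no such values are given in this section.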
For our TD group, we selected a convenience sample of 29 from the 223 children screened previously, who had no deficit or delay in any subtest of the SAT or Zareki-R. However, three children dropped out of the MT, and four were excluded after exploratory statistical analyses determined they were outliers (higher results in Zareki-R); thus, data from twenty-two participants (18 boys) were included in the analyses. All participants were native Portuguese speakers and non-bilingual. None of these children presented emotional disorders, motor difficulties, or speech and hearing impairments, or had a neurological or psychiatric diagnosis, based on the parents' and the teachers' reports. All children enrolled in the present study exhibited average visual abstract reasoning capacity assessed by Raven's Colored Progressive Matrices Test according to normative criteria (percentile ranging from 40 to 74; Angelini et al., 1999), which was an inclusion criterion. For demographic characteristics, see Table 1.
Socioeconomic status (SES) was assessed by the Brazilian Association of Marketing Research Institutes Scale, which stratifies socioeconomic status into five classes, from A/richest to E/poorest (Associação Brasileira de Empresas de Pesquisa [ABEP], 2008). In the present study, mean socioeconomic status was classified as C/middle class, corresponding to 4 to 10 minimum monthly wages, i.e., 2,488.00 BRL to 6,220.00 BRL (∼ 1,141.28 USD to 2,853.21 USD). SES means and standard deviations are displayed in Table 1.
The MT was conducted with five mixed (DD and TD) and balanced (age and gender) training classes once a week for 60 min (two classes with n = 8, two classes with n = 9, and one class with n = 10). The number of participants per class was small to enable active participation. The selection of the MT classes and the initial activities were assigned by pseudo-random criteria related to DD diagnosis and gender. The distribution took into account the children's classroom period: to avoid interference with their regular classes, the training was performed at a different time from formal schooling classes. For example, a control child and a child with DD paired by sex and class time were allocated to the same MT group.

Procedure
Four schools were chosen to be venues of this study according to the National Institute of Studies and its Brazilian Development Index of Primary Education, which classifies the quality of education system performance from 0 to 10. The quality of the schools selected for this study was classified as ≥6 points, which meets the Organization for Economic Co-operation and Development standard (INEP, 2009); therefore, possible deficits should not be attributed to pedagogical failures. Ethics committee approval was obtained from UNESP, São Paulo State University (process number: 1367/2011). Furthermore, written consent was obtained from the participating schools and the parents/guardians of the children. We also asked for children's assent to take part in the study.
During the screening phase, and to avoid disturbing the progress of classes, writing, arithmetic, and reading skills (SAT) were individually assessed in a quiet room at a schedule that did not coincide with that of regular classes. Then the selected children completed the Baseline (diagnosis phase), and after that, they were assigned to the MT. Group allocation is described in Musical Training Paradigm. The children participated in the MT for the first seven weeks, and, at this time, a mid-test was conducted. Immediately after completing fourteen weeks of training, a post-test was administered for the DD and the TD groups. A follow-up measure of all trained children was obtained 10 months after the end of the MT.
Children were assessed in their schools in a single session lasting 60 min on average, in a quiet room, with pauses to avoid fatigue. The pencil-and-paper tasks were performed before the computerized ones. The baseline protocol of neurocognitive assessment was also used at the subsequent time points. Moreover, we also checked the formal math classes with the participants' teacher to make sure that the same math content was being taught during the MT period.

Musical Training Paradigm
There were seven sessions for each set of activities, in which three classes (two classes with n = 8 and one class with n = 9) began with the melodic activities, while the others (one class with n = 9 and one class with n = 10) started with rhythmic activities. The mixed groups' selection and the initial allocation to rhythmic or melodic activities were random and balanced for each group. For all classes, the mid-test was carried out before the switch to the other set of activities.
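The counterbalanced order of the two activity sets can be sketched as follows. This is a minimal Python illustration under the assumption that each class simply switches to the other set after its seventh session; the names `CLASSES`, `SESSIONS_PER_BLOCK`, and `weekly_schedule` are hypothetical.

```python
# Class sizes and starting blocks as reported in the text.
CLASSES = [
    {"n": 8, "first": "melodic"},
    {"n": 8, "first": "melodic"},
    {"n": 9, "first": "melodic"},
    {"n": 9, "first": "rhythmic"},
    {"n": 10, "first": "rhythmic"},
]
SESSIONS_PER_BLOCK = 7

def weekly_schedule(first_block):
    """Return the activity set for weeks 1-14 given the starting block."""
    second_block = "rhythmic" if first_block == "melodic" else "melodic"
    return [first_block] * SESSIONS_PER_BLOCK + [second_block] * SESSIONS_PER_BLOCK

total_children = sum(c["n"] for c in CLASSES)
print(total_children)  # 44

schedule = weekly_schedule("melodic")
print(schedule[6], schedule[7])  # melodic rhythmic
```

Under this reading, the mid-test falls between list positions 6 and 7, i.e., after the last session of the first block and before the first session of the second.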
Our MT was grounded on active musical learning methodologies, such as the Willems, Dalcroze Eurhythmics, and Suzuki methods, which combine music, movement, and speech into lessons to make them analogous to a setting where children play (Goulart, 2000). The activities selected from these musical methodologies are arranged in Table 2. A brief description of each session is presented below (examples of complete melodic and rhythmic sessions are given for the first lesson).

Melodic Activities
1st Lesson: Introducing the concept of sound and silence, recognition of sounds in different environments. For this purpose, in the first lesson, the children were asked to describe all the sounds that they could identify inside and outside of the classroom. Thereafter, they were asked to reproduce the sounds described and later to detect and sing the intensity and tonal range in words said out loud in the classroom and through recorded audios of environmental sounds, such as horns, dog barks etc., as well as musical notes from instruments such as the piano and guitar.
2nd Lesson: Working with animal sounds of the city and forest, the contrast between treble, midrange and bass sounds, recognition of animal sounds, and their differentiation.
3rd Lesson: Recognizing various musical instruments, exemplifying the concept of timbre and its distinction, the use of human speech as an example of timbres.
4th Lesson: Sound and silence with stories and introducing the concept of intensity through songs and differentiation between utterances.
5th Lesson: Contrast between weak and strong sounds, discrimination between intensities by listening to different musical styles.
6th Lesson: Graphical representation of sound and discrimination of musical styles.
7th Lesson: Introducing the concept of melody and harmony using different instruments and different notes.
Table 2 (cell contents): Aspects of melody are studied through the analysis of sounds emitted through body motion. The experience of the almost universal melodic contours is given by the pentatonic pitches (CDEGA). To interpret musical histories and improvisations. Used the identification and maintenance of repeated rhythmic pattern. Learning methods: imitation, invention, and improvisation. Improvisation through movements suggested by students and/or the musical instructor. Improvisation by body percussion and musical rhythms sung; the instruments were not included in this study. Toning ability (student's ability to produce and recognize a timbre and the tone of a sound); the use of an instrument was not included in this study.

Rhythmic Activities
1st Lesson: Introducing the concept of pulse, body sounds, and discrimination of musical progression through rhythmic variations. In this particular session, firstly, children were asked to mimic the pattern of hand-clapping the teacher was doing. Later, these patterns could vary, and the children had to identify which were the changes and try again to reproduce them according to the teacher's instructions.
2nd Lesson: Examples of sound duration, discrimination between long and short sounds using musical games, body percussion, and drawings.
3rd Lesson: Explanation of the pulse and musical accent, producing different sounds and rhythms using tongue twisters, poems, and rhymes.
4th Lesson: Concepts of pulse, rhythm, use of different timbres, and rhythms with own body through association with rhythm, movement, and body games.
5th Lesson: Rhythmic association with movements, rhythmic reading through drawings and properties of sounds, and their duration in bodily games and songs.
6th Lesson: Improvising rhythms and accents, the conceptualization of musical breaks through the games of glasses.
7th Lesson: Listening and rhythmic activities, improvising rhythms and accents, differentiation between binary, ternary, and quaternary meters.
A booklet of the MT was provided for the music teacher with a detailed description of all planned activities per session. Through these guidelines, the teacher could reproduce the same content for each session. Moreover, the music teacher used a guitar to guide participants throughout lessons. To guarantee the understanding during the MT implementation, the researcher and the music teacher had meetings once a week in which the music teacher reported the performance of each child referenced by name and school. This information was not shared with parents or school teachers, i.e., it was exclusively gathered for research quality control.

Screening Measures
Anamnesis (Santos, 2002)
Set of qualitative questions to assess general and specific child development in social, educational, psychological, and health dimensions. It was used as a screening tool for exclusion criteria, as it included questions regarding neurological, psychiatric or psychological disorders, and chronic use of psychoactive substances.

SES: Brazilian Association of Marketing Research Institutes Scale (Associação Brasileira de Empresas de Pesquisa [ABEP], 2008)
The Criteria of Economic Classification Brazil is an index consisting of a series of 11 questions about the possession of durable goods and the educational level of the head of the household.
Schooling Achievement Test (Stein, 1994)
Set of three subtests: (1) Writing task: the experimenter reads out loud a maximum of 34 words and children are required to write them; (2) Reading task: a paper with 70 words is presented to the child, who is asked to read them; and (3) Arithmetic task: this subtest is composed of two calculation modalities. The first one contains three oral calculations (for example, "if you had three sweets and won 4 more, how many would you have now?"); the second one includes 35 written calculations (from simple calculations, subtraction, multiplication, division to fractions and equations, with one to three digits) in a separate notebook in which children are asked to answer as many questions as they can. Each item presents a range of calculations in ascending order of difficulty, which are presented to children regardless of their school age. The dependent variable for each subtest was the sum of correct answers, and, consequently, the total score was the sum of the three subtests. In the original psychometric study with the SAT, the following Cronbach alpha coefficients were reported: (1) Writing task = 0.95; (2) Reading task = 0.99; (3) Arithmetic task = 0.93; and Total score = 0.99 (Stein, 1994).
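The SAT scoring rule described above (one point per correct answer, total equal to the sum of the three subtests) can be sketched in Python. This is an illustrative sketch only: the names `MAX_ITEMS` and `sat_scores` are hypothetical, and the example scores are invented for demonstration and do not come from the study.

```python
# Item counts taken from the description above; the arithmetic subtest
# combines 3 oral and 35 written calculations.
MAX_ITEMS = {"writing": 34, "reading": 70, "arithmetic": 3 + 35}

def sat_scores(correct):
    """correct maps each subtest name to the number of correct answers."""
    for subtest, n_correct in correct.items():
        if not 0 <= n_correct <= MAX_ITEMS[subtest]:
            raise ValueError(f"{subtest}: {n_correct} is out of range")
    # The total score is the sum of the three subtest scores.
    return {**correct, "total": sum(correct.values())}

# Hypothetical child: invented values, for illustration only.
example = sat_scores({"writing": 20, "reading": 60, "arithmetic": 8})
print(example["total"])  # 88
```

In this invented example the arithmetic score of 8 would fall below the <9 cut-off used for screening, while the total score alone would not reveal that.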

Cognitive Measurement
Raven's Colored Progressive Matrices Test (Angelini et al., 1999)
A measure of abstract visual reasoning, which assesses general cognitive ability, in other words, eductive ability (Raven et al., 1998). It is composed of three series with 12 items: A, Ab, and B. In each series, the items are arranged by increasing order of difficulty, each series being more difficult than the previous one. Easier items are always placed at the beginning of each series, which has the purpose of introducing the examinee to a new type of reasoning, which will be required for the following items. The items consist of a drawing or matrix with a missing part. Below the main drawing, six alternatives are presented, one of which correctly completes the array. The child must choose one of the alternatives to fill in the missing part. Angelini et al. (1999) reported satisfactory reliability (Cronbach alpha = 0.90).

Zareki-R: Battery of Neuropsychological Tests for Number Processing and Calculation in Children-Revised (Von Aster and Dellatolas, 2006)
Zareki-R is an international specialized pencil-and-paper test battery that assesses numerical cognition in school-age children. For this article, subtests were organized into five numerical cognition systems as described below: (1) Number sense, composed of (i) Counting dots: children have to enumerate different sets of dots, and the score is the number of correct responses; and (ii) Perceptual estimation: the child has to orally estimate the number of items shown in a picture that is displayed for 5 s, for example, the number of balls in the picture (the precise answer is 57 balls). (2) Number production, which consists of three subtests: (i) Counting backward: the participant must count backward, e.g., from 23 to 1 and from 67 to 54; (ii) Dictation of numbers: the child is asked to write, in Arabic numerals, eight numbers presented orally (e.g., 23); and (iii) Reading numbers: the participant should read aloud eight numbers written in Arabic numerals, such as 15 and 1900.
(3) Number comprehension, which comprises three subtests: (i) Oral comparison: eight pairs of numbers are presented verbally (e.g., 34601 and 9678) and the child must judge which one is the larger quantity; (ii) Contextual estimation: the child must judge sentences in terms of coherence between quantities and context, for instance, is "eight lamps in the same room" "little," "median," or "a lot"?; (iii) Written comparison: pairs of numbers in Arabic numerals are presented visually, for example, 13 and 31, and the child must judge which one is the larger. (4) Calculation, which includes two subtests: (i) Mental calculation, in which eight additions, eight subtractions, and six multiplications are presented orally; and (ii) Problem solving: the participant must solve orally presented numerical problems of increasing difficulty. For instance, one of the problems is, "Peter has 12 marbles. He gives 5 to his friend Ann. How many marbles does Peter have now?" (5) Positioning numbers on an analog scale, a measure of the mental number line: in this subtest, a vertical line is presented on which the participant is asked to point to and mark a specific position indicated by the experimenter. The number line task from the Zareki-R differs from the classical number line estimation task (Berteletti et al., 2010; Schneider and Siegler, 2010) in that the line is vertical rather than horizontal. For each subtest, the sum of correct answers was calculated, and the Zareki-R total score, which was used as the dependent variable, is the sum of points in the subtests mentioned above. The battery also has a measure of phonological memory that is not included in the Zareki-R total score: Memory of Digits requires the forward (FDS) and backward (BDS) repetition of digit sequences of increasing length. Its total score was used as a dependent variable.
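The scoring rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the subtest keys, the function name `score_zareki`, and the raw scores are hypothetical, Memory of Digits is assumed to be the sum of FDS and BDS, and the published battery may apply item weights that are not reproduced here.

```python
# Grouping of Zareki-R subtests into the five numerical cognition systems,
# following the description in the text. Key names are hypothetical labels.
SYSTEMS = {
    "number_sense": ["counting_dots", "perceptual_estimation"],
    "number_production": ["counting_backward", "dictation_of_numbers", "reading_numbers"],
    "number_comprehension": ["oral_comparison", "contextual_estimation", "written_comparison"],
    "calculation": ["mental_calculation", "problem_solving"],
    "number_line": ["positioning_numbers_on_analog_scale"],
}


def score_zareki(raw):
    """Return (system scores, Zareki-R total, Memory of Digits total).

    `raw` maps each subtest name to its number of correct answers.
    Memory of Digits is kept apart because it is not part of the total.
    """
    systems = {
        name: sum(raw[subtest] for subtest in subtests)
        for name, subtests in SYSTEMS.items()
    }
    total = sum(systems.values())
    # Assumption: the digit-span score is forward plus backward span.
    digits = raw["digits_forward"] + raw["digits_backward"]
    return systems, total, digits


# Invented raw scores for one child, for illustration only.
example_raw = {
    "counting_dots": 5, "perceptual_estimation": 3,
    "counting_backward": 4, "dictation_of_numbers": 12, "reading_numbers": 14,
    "oral_comparison": 10, "contextual_estimation": 8, "written_comparison": 9,
    "mental_calculation": 20, "problem_solving": 6,
    "positioning_numbers_on_analog_scale": 15,
    "digits_forward": 6, "digits_backward": 4,
}
example_systems, example_total, example_digits = score_zareki(example_raw)
print(example_systems, example_total, example_digits)
```

Keeping the system scores separate from the total mirrors the analysis plan, in which the total and the five systems are treated as distinct dependent measures.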
The Brazilian normative sample of the Zareki-R comprised 172 children from the same region and cultural background who were not attending any training at the time of the study; this sample was used as a normative data set for clinical comparisons. Test-retest reliability was adequate for the Zareki-R Total (0.87 over a 14-week period).

Post-MT Self-Report Evaluation
This questionnaire was designed exclusively for this study to assess whether children could identify MT effects through six dichotomous questions: "After the MT did you: (1) improve your school learning?; (2) notice changes in the way you do math calculations?; (3) improve your grades?; (4) notice changes in your memory?; (5) notice modifications in your mood?; (6) notice alterations in your attention?" (Ribeiro, 2013). Children were asked to answer yes or no to each question.

STATISTICAL ANALYSES
To investigate the effects of the MT on abstract visual reasoning, memory of digits, and numerical cognition performance in the DD and TD groups across the four time points, a series of analyses comparing both groups was performed: (1) Three separate 2 (group: DD, TD) × 4 (measurement time point: baseline, mid-test, post-test, and follow-up) repeated-measures ANOVAs were performed for the following dependent variables: abstract visual reasoning percentile, the memory of digits subtest, and Zareki-R Total; (2) A 2 (group: DD, TD) × 4 (measurement time point: baseline, mid-test, post-test, and follow-up) repeated-measures MANOVA was performed with the numerical cognition systems (number sense, number line, number production, number comprehension, and calculation) as dependent measures; (3) A non-parametric Mann-Whitney U test was applied to compare groups on the Post-MT self-report evaluation items, since these data were not normally distributed; (4) Test-retest reliabilities between measurement time points were determined for Zareki-R total. All preconditions for conducting the ANOVAs and the MANOVA were tested (normality and Mauchly's test of sphericity), and no violation was found.
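As a rough aid for reading the F tests of this design, the degrees of freedom of a 2 × 4 mixed (between-within) ANOVA can be worked out as below. This is a sketch, not the authors' analysis code: the function name `mixed_design_df` is hypothetical, and it assumes equal group sizes with every child assessed at all four time points (22 per group, as stated in the abstract).

```python
def mixed_design_df(n_per_group, n_groups, n_times):
    """Degrees of freedom for a groups x time mixed ANOVA with complete data."""
    n_total = n_per_group * n_groups
    return {
        "group": n_groups - 1,
        "error_between": n_total - n_groups,
        "time": n_times - 1,
        "group_x_time": (n_groups - 1) * (n_times - 1),
        "error_within": (n_total - n_groups) * (n_times - 1),
    }


design_df = mixed_design_df(n_per_group=22, n_groups=2, n_times=4)
print(design_df)
# {'group': 1, 'error_between': 42, 'time': 3, 'group_x_time': 3, 'error_within': 126}
```

Under these assumptions, a group effect is tested as F(1,42) and the time and interaction effects as F(3,126).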

Numerical Cognition
The ANOVA for Zareki-R total revealed a main effect of group, F(1,42) = 46.04, p < 0.001, MSE = 710.86, η²p = 0.52. Pairwise comparisons revealed that the DD group performed worse than the TD group on the Zareki-R total score (p = 0.001). A main effect of time was also found, F(3,126) = 61.33, p < 0.001, MSE = 105.40, η²p = 0.59, with comparisons showing that baseline results were lower than mid-test, post-test, and follow-up results (ps < 0.001). Finally, there was a significant interaction between group and time, F(3,126) = 10.97, p < 0.001, MSE = 105.40, η²p = 0.21. This interaction shows that the effect of MT on the Zareki-R total differed between groups.
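The effect sizes reported above can be checked against their F statistics, because partial eta squared equals F × df_effect / (F × df_effect + df_error). The sketch below applies this identity to the three Zareki-R total effects; the function name is hypothetical, and small discrepancies in other cases could arise from rounding of the reported F values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) as reported for the Zareki-R total.
reported_effects = [
    ("group", 46.04, 1, 42),
    ("time", 61.33, 3, 126),
    ("group x time", 10.97, 3, 126),
]
for label, f_value, df_effect, df_error in reported_effects:
    print(f"{label}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
# group: 0.52
# time: 0.59
# group x time: 0.21
```

All three recomputed values agree with the reported effect sizes to two decimals.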
To see whether groups differed with regard to numerical cognition abilities, we carried out the 2 (group: DD, TD) × 4 (measurement time point: baseline, mid-test, post-test, and follow-up) repeated-measures MANOVA. The results showed a significant group effect, F(1,42) = 43.96, p < 0.001, MSE = 32.97, η²p = 0.51, in which the DD group showed lower scores than the TD group (p < 0.001).
To disentangle the significant group × time interaction, we performed separate repeated-measures ANOVAs for the DD and TD groups. Results indicated significant effects only for the DD group. For number production, F(3,63) = 36.54, p < 0.001, η²p = 0.63, pairwise comparisons showed that the baseline score was lower than at the other three time points (ps < 0.001) and that mid-test performance was lower than follow-up performance.
For number comprehension, F(3,63) = 18.00, p < 0.001, η²p = 0.46, baseline performance was lower than at the other time points (ps < 0.002). For calculation, F(3,63) = 21.72, p < 0.001, η²p = 0.51, baseline was lower than at the other assessment times (ps < 0.02) and mid-test performance was poorer than at follow-up (p < 0.001) (Figure 2). Table 3 displays the means and standard deviations of raw scores and the percentage of correct responses for the numerical cognition systems at the four time points.

Post-MT Self-Report Evaluation
To investigate whether groups differed in the self-report evaluation of learning, math learning, grades, memory, mood, and attention after the MT, we carried out six separate Mann-Whitney U tests. A significant difference emerged only for self-reported memory capacity, U = 143, p = 0.04: the DD group gave more yes responses about improved memory after the MT than the TD group. The percentage of children in each group who answered yes to each question is displayed in Figure 4.
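For yes/no items, the Mann-Whitney U statistic reduces to pair counting with ties scored as one half. The sketch below uses invented response vectors, not the study data; they were chosen only so that, with 22 children per group and no missing answers, the DD group has nine more yes responses than the TD group, which is the difference that yields U = 143 under those assumptions.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y, U) where U = min(U_x, U_y); ties count as half a win."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y, min(u_x, u_y)


# Hypothetical answers (1 = yes, 0 = no); not the observed responses.
dd_answers = [1] * 15 + [0] * 7
td_answers = [1] * 6 + [0] * 16
u_dd, u_td, u_stat = mann_whitney_u(dd_answers, td_answers)
print(u_dd, u_td, u_stat)  # 341.0 143.0 143.0
```

With two groups of 22, the expected value of U under no group difference is 242, and each additional yes response in one group relative to the other shifts U by 11.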

Test-Retest Reliability
Test-retest reliability was computed to investigate the temporal stability of the Zareki-R. Pearson's correlations between consecutive assessments were strong, positive, and significant for Zareki-R total (baseline to mid-test, r = 0.87; mid-test to post-test, r = 0.87; post-test to follow-up, r = 0.87).
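A test-retest coefficient of this kind is a Pearson correlation between the scores of the same children at two time points. A minimal sketch follows; the function name and the scores are hypothetical and do not reproduce the study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Invented total scores for five children at two consecutive assessments.
scores_test = [88, 102, 95, 120, 110]
scores_retest = [92, 108, 97, 125, 118]
retest_r = pearson_r(scores_test, scores_retest)
print(round(retest_r, 2))
```

A high coefficient indicates that children keep their relative ranking across assessments, even if all scores rise.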

DISCUSSION
In this longitudinal, double-blind, and pseudo-randomized study with a mixed design, the between-groups comparison tested for effects of the MT in children with DD compared to children with TD in measures of numerical cognition and abstract visual reasoning, while the within-group comparison tested changes at specific time points. Far transfer effects due to MT stimulation were investigated by comparing the DD and TD groups at four assessments: baseline, mid-test, post-test at the end of MT, and a follow-up 10 weeks after the end of MT.
As far as we know, no other longitudinal study with a follow-up is currently available to assess the efficacy of organic MT activities on numerical cognition and abstract visual reasoning, nor are there data showing its effectiveness. Importantly, this study was carried out in a clinical sample, a mostly understudied population, employing a proper operational diagnosis supported by psychometric measures and the medical manuals. In terms of epidemiology, our study followed a two-phase diagnostic assessment: the first phase comprised the screening of intelligence and school achievement, such as writing, arithmetic, and reading abilities, and the second phase was the confirmation of the diagnosis (ICD-10, F81.2; World Health Organization [WHO], 2018) based on the Zareki-R's cutoff criteria (Rotzer et al., 2009), using a battery validated for Brazilian children that allowed an accurate diagnosis. We found a prevalence rate similar to that of international studies (Butterworth, 2005; Shalev et al., 2005; Sigmundsson et al., 2010) and of national ones that showed a higher prevalence of DD in boys than in girls (Bastos et al., 2016; Fortes et al., 2016). Moreover, our sample apparently had a rare phenotype of primary DD, i.e., children had neither comorbidities nor other cognitive deficits apart from those in numerical cognition (von Aster and Shalev, 2007; Ashkenazi and Henik, 2010; Kaufmann et al., 2013). The comparison between groups indicated that the DD group had lower baseline scores in many numerical abilities (number sense, number production, number comprehension, and calculation) compared to the TD group. Clinically, i.e., considering normative data, only number line performance corresponded to the expected age mean score on the post-test, and the overall low performance confirms the diagnosis of DD (Rotzer et al., 2009).
Congruent with our first hypothesis, the longitudinal perspective revealed that children with DD who completed the MT improved from baseline to mid-test in number production, number comprehension, and number line, and from baseline to follow-up in calculation. This result suggests that the DD group was more responsive to MT because they had room to grow in performance. By contrast, the TD group showed neither statistical nor clinical changes, since their mean scores were already equivalent to those of their age counterparts (Santos et al., 2012). This result is congruent with studies showing that MT could cause modest or null far transfer in children with TD (Sala and Gobet, 2017). Furthermore, it is essential to make clear that children in the DD and TD groups did not receive any additional training from baseline until post-test, and that formal math instruction at school was the same for all of them; only at follow-up had two of these children started music classes unrelated to MT. Apart from that, changes do not seem to be due to successive assessments with the same tasks, i.e., learning effects. If this were the case, the test-retest reliability would be lower (Spreen and Strauss, 1998), and performance in the tasks should be consistently better over time for both trained groups, which was not detected.
Looking at the numerical systems individually in children with DD, the data revealed improvements lasting until 10 weeks after the post-test, which might be termed remediation. For at least 10 weeks after the post-test, children with DD retained gains in the number comprehension and number line tasks. Changes in the number line task after intervention are regarded as a core improvement in numerical cognition per se and are associated with brain changes (Kucian et al., 2011). A plausible explanation for this result is that the MT activities facilitated symbolic representations (Graziano et al., 1999; Vaughn, 2000; Gromko, 2004; Schlaug et al., 2005; Nutley et al., 2014) when children were given tasks such as drawing the size or the tone of a sound.
Additionally, at follow-up, the TD group outperformed the DD group in number production, even though, based on normative data from the Zareki-R validation sample, the groups were equivalent. Calculation was the only ability that remained below average at the last time point, despite slightly increased scores over the course of repeated assessments. The lower calculation results reflected performance on the orally presented mental calculation and problem-solving tasks, since the sum of these scores made up the calculation system.
The resistance of these abilities to intervention across time is well established (Shalev et al., 2005; Mazzocco and Räsänen, 2013) and suggests that children with DD remain unable to follow the steps required to perform calculations effectively (McCloskey et al., 1985; Skagerlund and Träff, 2016). However, bearing in mind the increasing trajectory of calculation after MT, perhaps remediation could be achieved with a longer intervention. As would be expected in this neurodevelopmental disorder, the Zareki-R Total of the DD group was still lower than that of the TD group at this time point due to calculation deficits.
In the case of the number line, results indicated that this core ability of numerical cognition, frequently impaired in children with DD, is responsive to MT stimulation. Although the task we used differs from the classical number line estimation task (cf. the horizontal version in Berteletti et al., 2010; Schneider and Siegler, 2010), it has been previously studied in computer-assisted intervention. Successful performance in the vertical number line task is associated with comprehension of the connection between numerical magnitudes, ordinality, and precise number representation (Kucian et al., 2011; Käser et al., 2013), and we infer that MT stimulates this connection in children with DD (Michels et al., 2018). These outcomes indicate that MT provided a fruitful basis for children with DD to access the magnitude representation of symbolic numbers, enabling learning in regular math classes, which conceivably corroborates, to some extent, the studies that used instrumental MT for number production and comprehension (Graziano et al., 1999; Vaughn, 2000; Gromko, 2004; Schlaug et al., 2005; Nutley et al., 2014).
As far as we know, this was the first study to use MT as a remediation tool for children with DD, and the outcome suggests that even complex abilities, such as calculation, may improve with this kind of training. Despite the lack of untrained control groups, we can partially assume that our MT would be beneficial to mathematical knowledge because it produced far transfer effects in numerical cognition after controlling for several variables that could influence group performance. By contrast, regular math classes, and even conventional educational interventions, apparently might not cause significant changes in numerical cognition capacity according to follow-up studies (Shalev et al., 1998, 2005; Mazzocco and Räsänen, 2013). Moreover, the Post-MT self-report evaluation indicated that children with DD were aware of their improvement in math, and 60 to 80% of trained children spontaneously and explicitly associated math gains with MT.
At baseline, the DD group performed equivalently to the TD group in abstract visual reasoning. There was no significant interaction between measurement time point and group, revealing that MT did not produce a far transfer effect to abstract visual reasoning, corroborating Sala and Gobet (2017). As previously demonstrated by Schellenberg (2004), the impact on IQ is relatively small regardless of whether researchers use IQ subtests or the sum of the scores. Thus, abstract visual reasoning seems to be a specific ability that could not be influenced by MT. Nevertheless, differences in MT activities should be further studied to clarify on which aspects of cognition MT has a beneficial effect.
The far transfer effects for numerical cognition described here were confirmed by cluster analyses contrasting two time points in a previous study. The number of children assigned to the control group was larger on the post-test, i.e., eight children with DD normalized their math scores after the MT.
The outcomes of this study indicate that children with mathematics difficulties could benefit from our organic MT to improve learning. At the same time, the organic MT motivated children barely familiar with music as an art toward instrumental music education. Instrumental music teachers, who are frequently involved in group music teaching and in teaching across various educational contexts, could incorporate some of these elements into their practice. For instance, apart from teaching the technical aspects of playing a musical instrument, the instrumental music teacher should motivate children to express themselves through music and develop their aural skills and musical inventiveness, which later on might result in an interest in pursuing instrumental music learning. Furthermore, in our view, our organic MT is a pleasant technique that has the advantage of being carried out at school and in mixed groups, which favors inclusion and socialization. In addition, it does not require parents and students to travel, because the activities can take place at the school and require only an educator with formal musical education.

Strengths, Limitations, and Perspectives
This study describes the basic structure of an MT program and its effects in order to expand knowledge about the techniques that are suitable for cognitive remediation. Also, the protocol had specific tasks for different components of numerical cognition, providing a scrutinized view of these abilities.
One of the limitations of this study was the design, which did not include untrained DD and TD groups throughout the four time points. This was not feasible since the screening phase, MT, and all assessments had to occur while the child remained in the same school year. Otherwise, the cognitive scores, which are age- or schooling-related, would not be suitable for comparison with the baseline, even though we cannot ignore that math content and practice increase across the school year, as does general development.
Considering the prevalence rate of DD, we would need to screen a new cohort of around 400 children to find another sample of 22 children. Moreover, this design was preferred due to ethical constraints, since the ethics committee would not approve a research project with an untrained DD group. Nevertheless, our sample was considerably larger than in other studies with a similar design (Kucian et al., 2011; Ashkenazi and Henik, 2012; Michels et al., 2018). Furthermore, we checked that formal math instruction was similar across schools and also requested information regarding extracurricular courses, confirming that children did not engage in extra activities besides the MT. Apart from that, we had normative data from the Zareki-R validation carried out by Santos et al. (2012) against which to contrast performances, and those children are, in a broad sense, "untrained."
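The cohort estimate above is simple proportion arithmetic, sketched here with exact fractions. The prevalence used is only the value implied by the two figures in the text (22 children out of about 400, i.e., 5.5%), not a separately reported rate, and the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def implied_prevalence(cases, cohort_size):
    """Prevalence implied by finding `cases` children in a screened cohort."""
    return Fraction(cases, cohort_size)


def cohort_needed(cases, prevalence):
    """Smallest cohort expected to contain `cases` children at a given prevalence."""
    return ceil(Fraction(cases) / Fraction(prevalence))


# Assumption: 22 children with DD per roughly 400 screened, as stated in the text.
prevalence = implied_prevalence(22, 400)
print(float(prevalence))              # 0.055
print(cohort_needed(22, prevalence))  # 400
```

Using `Fraction` avoids floating-point rounding when the result is passed to `ceil`.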
Although we applied rhythmic and melodic lessons, it was not our aim to investigate which of the two activities was more effective for numerical cognition. It would be important for researchers to systematize the findings on diverse MT strategies. However, this would require another type of design, such as a crossover study investigating each of the methods, for instance, an organic MT vs. a formal instrumental MT.
As for future directions, it is essential to replicate MT in other samples with DD and to explore its effects in other learning disabilities, aiming to understand possible transfers and neuroplasticity. Moreover, future studies may explore ecological transfer measures, such as children's school grades across the school year, and also specific musical abilities, e.g., rhythmic and melodic, which can later be used for cognitive remediation of different types of learning disabilities, if possible, with neuroimaging techniques. In line with these considerations, future research will be essential to explore the efficiency and cost-effectiveness of such methods.

Conclusion
In conclusion, the DD group showed slight improvements in numerical cognition throughout 14 sessions of a brief MT, especially in number production, number comprehension, and calculation. Calculation scores of the DD group remained better than their baseline but lower than TD group performance throughout the training, which shows that the calculation deficits seem to be longer-lasting in the DD group. On the other hand, the follow-up indicated that the MT benefits in numerical cognition remained at least 10 weeks after training. Moreover, the organic MT seems to produce beneficial cognitive effects similar to those obtained with instrumental MT, with the advantage of being appropriate for environments with socioeconomic disadvantage.

DATA AVAILABILITY STATEMENT
The datasets generated for this study are available on request to the corresponding author.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics committee of UNESP, São Paulo State University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.