Maturation of the mismatch response in pre-school children: Systematic literature review and meta-analysis

Event-related potentials (ERPs), specifically the Mismatch Response (MMR), hold promise for investigating auditory maturation in children. The MMR has the potential to predict language development and to distinguish between language-impaired and typically developing groups. However, summarizing the MMR's developmental trajectory in typically developing children remains challenging despite numerous studies. This pioneering meta-analysis outlines changes in MMR amplitude among typically developing children and offers methodological best practices. Our search identified 51 articles for methodology analysis and 21 for meta-analysis, involving 0-8-year-old participants and published from 2000 to 2022. Risk of bias assessment and methodology analysis revealed shortcomings in the use of control conditions and in the reporting of study confounders. The meta-analysis results were inconsistent, indicating large effect sizes in some conditions and no effect in others. Subgroup analysis revealed main effects of age and brain region, as well as an interaction of age and time-window of the MMR. Future research requires a specific protocol, larger samples, and replication studies to deepen the understanding of how auditory discrimination matures in children.


Introduction
In the field of language development research, methods that provide objective measures are sought because behavioral evaluations may not reflect the real abilities of a young child. One feasible option is to measure the brain's bioelectrical response that indexes the ability to discriminate between different characteristics of speech (Näätänen et al., 2007; Norton et al., 2021), denoted as the Mismatch Response (MMR). The MMR seems to play an important role in the complex process of speech perception and its development (Näätänen et al., 2012). Although a considerable number of MMR studies in children are available, the maturation of typical auditory processing indexed by the MMR cannot yet be conclusively summarized. Yet it is essential to track the changes that occur in the typical development of the MMR, both to deepen the understanding of the maturation of auditory processing and, in the future, possibly to use the MMR as a diagnostic tool for detecting language impairments.
The brain's bioelectrical activity measured with electroencephalography (EEG) provides several ways to study language development. Different analyses of EEG recordings, such as time-frequency analysis, network analysis, or event-related potentials (ERPs), give complementary insights into the brain's functioning (Hervé et al., 2022). ERPs are bioelectric voltage deflections that are recorded from the scalp with EEG and reflect lower- and higher-level processing of sensory information, time-locked to physical or mental events (Duncan et al., 2009). Among other ERPs, the mismatch negativity (MMN) is a widely researched fronto-central negativity occurring in the latency range of 100-250 ms (Duncan et al., 2009). It reflects the level of prediction error made when an unexpected stimulus, the deviant, is presented in the auditory stream (Escera et al., 2014; Fong et al., 2020). More specifically, it denotes speech perception sensitivity because its amplitude is associated with performance in discriminating acoustic differences (Kujala, 2007; Näätänen, 2001). To derive the MMN waveform, the averaged ERPs to the frequent stimuli (standards) are subtracted from the averaged ERPs to the rare stimuli (deviants), which results in a difference wave (Duncan et al., 2009; Näätänen et al., 2007).
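The subtraction that yields the difference wave can be illustrated with a minimal Python sketch. The toy epochs, the sampling times, and the function names (`average_erp`, `difference_wave`, `mean_amplitude`) are hypothetical and serve only to illustrate the deviant-minus-standard logic; real pipelines also involve filtering, artifact rejection, and baseline correction, which are omitted here.

```python
from statistics import mean

def average_erp(epochs):
    """Average equal-length epochs (lists of voltages in microvolts) sample by sample."""
    return [mean(samples) for samples in zip(*epochs)]

def difference_wave(deviant_epochs, standard_epochs):
    """Return the averaged deviant ERP minus the averaged standard ERP."""
    deviant = average_erp(deviant_epochs)
    standard = average_erp(standard_epochs)
    return [d - s for d, s in zip(deviant, standard)]

def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean amplitude of a wave over the samples inside [start_ms, end_ms]."""
    window = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    return mean(window)

# Hypothetical toy data: four samples at 0, 100, 200 and 300 ms after stimulus onset.
times_ms = [0, 100, 200, 300]
standard_epochs = [[0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
deviant_epochs = [[0.0, 0.0, -1.0, 0.0], [0.0, 0.0, -3.0, 0.0]]

mmr = difference_wave(deviant_epochs, standard_epochs)
# Mean amplitude in an illustrative 100-350 ms analysis window.
amplitude = mean_amplitude(mmr, times_ms, 100, 350)
```

A negative mean amplitude in the chosen window corresponds to an MMN-like response, and a positive one to a positive difference wave.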
In child studies the MMN is frequently referred to as the MMR (e.g., Linnavalli et al., 2018) because it can be either negative or positive depending on age and experimental conditions. The MMR offers crucial insights into developmental changes in the brain as well as into the functioning of the typical and atypical brain. Real-time measurement can capture, with millisecond resolution, the activity underlying rapid and transient language processing. The MMR can be obtained passively, without any effort or attention from the test subject (Näätänen, 2001), which makes objective evaluation of children possible. It is well suited to longitudinal studies because the task demands do not change (Norton et al., 2021). Additionally, several studies have shown that differences between typically developing children and children with or at risk of language impairment are detectable in MMRs (for more detail, see Cheng et al., 2016; Gu and Bi, 2020; Hämäläinen et al., 2013; Kujala & Leminen, 2017; Norton et al., 2021; Sperdin & Schaer, 2016; Virtala et al., 2022). Finally, the MMR shows potential to predict later language skills (for more detail, see Bishop, 2007; Cantiani et al., 2016; Choudhury and Benasich, 2011; Depoorter et al., 2018; Friedrich et al., 2004, 2009; Guttorm et al., 2005; Norton et al., 2021; Schaadt, 2015, 2023; Weber et al., 2005).
During the preschool years (0-8-year-olds), the auditory system goes through an immense amount of change, developing complex abilities such as distinguishing speech from other environmental sounds, segmenting speech streams, categorizing native phonemes, recognizing words, and understanding grammar, sentences, and narratives (Vihman, 2014; Dick et al., 2016). Preattentive processing, accessible via the MMR, plays an important role in achieving adult-like auditory and language perception abilities (Norton et al., 2021). It takes place within a couple of hundred milliseconds after the auditory signal reaches the ear, and it is an automatic process that happens involuntarily (Duncan et al., 2009). During preattentive processing the auditory system discriminates different features of the auditory signal, for example frequency, duration, and intensity cues. This kind of complex neural process can be modulated by intrinsic factors such as sex (Marklund et al., 2019) or overall cognitive skills (Liu et al., 2007) and by extrinsic factors such as a multilingual home environment (Shafer et al., 2011; Yu et al., 2019) or musical training (Putkinen et al., 2014; Virtala et al., 2012).
Although a fair number of MMR studies have been conducted with young children, it is still difficult to summarize the maturation of the response. To our knowledge the last conclusive review was written by Cheour et al. (2000), who discussed the child studies available at that time. They concluded that a prominent and stable MMN can be obtained from the majority of children, but indicated that there are some differences between adult and child MMNs which need further investigation (Cheour et al., 2000). The question of polarity and stability is still relevant to this day and will be considered here. First of all, more recent research has reported a frontal, central, and/or fronto-central negative MMR, a counterpart of the adult MMN (e.g., Linnavalli et al., 2018; Liu et al., 2014; Virtala et al., 2022), as well as a positive MMR (p-MMR; e.g., Nan et al., 2018; Ní Choisdealbha et al., 2022; Virtala et al., 2022; Yu et al., 2019) in response to a change in the auditory stream. It is not entirely clear whether the p-MMR can be interpreted as an "immature MMN" (He et al., 2007, 2009a, 2009b; Lee et al., 2012; Mueller et al., 2012; Schaadt, 2015) which will change polarity during development and become an MMN. Several alternative accounts are available. Chen et al. (2016) suggested that the p-MMR might be a predecessor of the P3a response, which is thought to reflect orienting of attention (Escera et al., 1998). Further, Lee et al. (2012) proposed that the positive difference wave is not specific to infants, but is evoked by certain stimulus-related factors, for example a short inter-stimulus interval (ISI) or a more difficult discrimination condition. The opposite polarities of the MMR can also indicate different processing mechanisms (Rivera-Gaxiola et al., 2005). More specifically, they may reflect bottom-up and top-down processing, which have different generators in the brain (see Schaadt et al., 2015). Finally, some studies report the presence of two simultaneous discriminative waveforms stemming from different cortical layers: one a slow positive and the other a fast negative component (Shafer et al., 2010; Trainor et al., 2003). Herein we do not take a position on this matter, but simply observe the developmental trajectory of both the p-MMR and the MMN.
There are several key aspects to this uncertainty, which concern both the reliability and the validity of child studies. First of all, there is a scarcity of large-sample and/or longitudinal studies and of replication studies that could resolve the great inter-subject variability in child research and define the limits of normal variation (as already stated in Cheour et al., 2000). However, it is worth considering that even in studies with relatively large samples, as demonstrated by Kuuluvainen et al. (2016; N = 63), there can be considerable inter-subject variability within a specific age group (6-7-year-olds) and among individuals with typical cognitive abilities. This points to a need to further study the factors influencing the variability of the MMR in larger homogeneous groups and to refine the methodologies, in order to confirm that the MMR is a reliable tool for studying cortical maturation.
Additionally, there is a double-sided problem with the research methods used in child studies. Brooker et al. (2020) emphasized that reviewers, editors, and consumers often expect adult study techniques and procedures to be applied to younger samples. This creates unrealistic expectations for research paradigms, data collection, and data reduction and analyses, which in turn leads to inappropriate measures and methods that hinder drawing conclusions and advancing theory. The latter may have caused the huge variety in the methodological choices made in child studies.
The aim of the current literature review is to systematically assess the existing research in order to reveal the different methods that are used to elicit the MMR and, based on the acquired knowledge, to give methodological suggestions for reaching best practice in future child research. The aim of the meta-analysis is to give an overview of whether and how the MMR amplitude matures between the ages of 0 and 8 years in typically developing children.

Method
A systematic literature review and meta-analysis were conducted to identify the methodologies used to elicit the MMR in children. This review also intends to map the developmental changes in the MMR amplitude taking place from birth to school age in typically developing participants and to highlight the knowledge gaps in the current literature. The methodology for this systematic literature review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009) and, additionally, the Cochrane recommendations (Higgins & Cochrane Collaboration, 2020).

Eligibility criteria for the methodology analysis
The same search terms were used to find articles for the methodology analysis and for the meta-analysis, but the eligibility criteria for the methodology analysis were more lenient. The selection comprised studies where at least some of the participants (e.g., a control group) were typically developing monolingual children aged 0-8 years (a common pre-school age range in the European educational system). Only articles with a longitudinal design or with cross-sectional comparisons between different age groups were included in the analysis. Studies utilizing any stimulus type other than auditory were excluded from our analysis (e.g., audiovisual stimuli such as simultaneous presentation of a speech sound and a corresponding letter). However, studies that focused on measuring the differentiation of various auditory features, such as changes in fundamental frequency or changes in the vowel within a syllable, were specifically included. To establish the electrophysiological auditory processing markers, only those studies which reported data from difference waves were included. All publications came from peer-reviewed journals, were published between 2000 and March 2022, and were written in English.
L. Themas et al.

Eligibility criteria for the meta-analysis
The eligibility criteria for the meta-analysis (see Table 1 for full selection criteria) were more stringent than those for the methodology analysis. We included some intervention studies but exclusively used data of participants who did not receive the given intervention. Further, we included studies that examined participant groups with impairments or at risk of an impairment for the purpose of comparison with typically developing groups. From all studies only the data from the typically developing control groups were used. The time-window of analysis of the MMR amplitude was restricted to 100-350 ms, which was chosen based on adult studies (Kujala et al., 2007) while also taking into consideration the results of child studies (Näätänen et al., 2012), because in the younger population the preattentive MMR may be of longer latency.

Search strategy and data derivation
To identify relevant studies, the electronic databases Scopus, Web of Science, PubMed, and EBSCO were searched. The literature searches were conducted through the online databases accessed via the University of Tartu library portal and the following search terms were implemented individually or combined:
1) MMN OR {mismatch negativity} OR MMNm OR *MMR OR {mismatch response}
2) child* OR infant* OR "toddler*" OR newborn* OR preschool*
3) maturation* OR longitud* OR {age-related} OR development*
The publication period under observation was equally divided into four sections, each assigned to one of the four authors (for example: 2000-2006, 2007-2011 etc.). Each author screened the titles and abstracts yielded by the database search against the eligibility criteria in the publication period assigned to them. In case of uncertainty, whether to include or exclude the article. The full-text reading was divided between the authors in the same way as the title and abstract screening. Any uncertainties were resolved through discussion.
For the methodology analysis the authors extracted data on the following subject areas: characteristics of participants, native language of participants, ages at ERP measurements, time-window of the MMR, paradigm, frequency of the deviant, total number of stimuli presented, description of the stimuli, number of stimulus sets, inter-stimulus interval or stimulus onset asynchrony, number of electrodes used, which electrodes were analyzed, and the number of accepted trials used to calculate the difference wave.
The data for the meta-analysis were extracted and double-checked by the first author. Any uncertainties were resolved through discussion. From the age-group comparison studies three parameters were extracted for the statistical analysis: the number of participants in the group, the mean amplitude of the MMR in microvolts, and the dispersion of the amplitude (e.g., standard deviation, standard error, or confidence intervals). From the longitudinal studies the correlation between measurements at different time-points was additionally obtained. If the mean amplitude was reported in successive time-windows after stimulus onset, then only those time-windows were used that the original authors defined as MMR responses and that also corresponded to the previously determined time-window (100-350 ms) of the current study. We included all the reported data points of different electrodes or regions, be it the frontal or the occipital region. If any of the above was not reported in the article, the authors were contacted via email with a request for the missing data. If the numeric data were available solely in figures, the PlotDigitizer software, version 4.5 (https://plotdigitizer.com/), was used to derive the necessary data.
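Because the extracted dispersion measures differ between studies, they have to be brought to a common scale before effect sizes can be computed. The text does not state how this was done, so the following Python sketch only shows the standard textbook conversions to a standard deviation; the function names are hypothetical, and the normal critical value 1.96 is an assumption (for small samples a t critical value would be more appropriate).

```python
import math

def sd_from_se(se, n):
    """Standard deviation recovered from the standard error of a mean."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, critical_value=1.96):
    """Standard deviation recovered from a confidence interval around a mean.

    Assumes a symmetric interval built as mean +/- critical_value * SE.
    The default of 1.96 is a large-sample (normal) approximation for a 95% CI.
    """
    se = (upper - lower) / (2 * critical_value)
    return sd_from_se(se, n)
```

For example, a standard error of 0.5 µV in a group of 16 children corresponds to a standard deviation of 2 µV.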

Risk of bias assessment
The risk of bias (RoB) in the selected studies was rated using the Quality in Prognosis Studies tool (QUIPS; Grooten et al., 2019; Hayden et al., 2006, 2013) recommended by the Cochrane Prognosis Methods Group (Group CCPM, 2022). The QUIPS tool consists of 6 RoB domains: study population, study attrition, prognostic factor measurement, outcome measurement, confounding measurement and account, and analysis. Each domain provides guiding statements that the evaluators rated on a four-point scale ("no", "yes", "partially", "not sure"). Based on these ratings, the evaluator had to decide on a RoB rating for each domain: low, moderate, or high risk of bias. Finally, an overall RoB score was given to each article. Before the rating process started, all authors discussed every guiding statement in the QUIPS tool to reach a shared understanding of its meaning. Additionally, all authors rated one chosen study individually, then compared their ratings and discussed their disagreements collectively. All the articles in the meta-analysis were then rated by at least one author. For 20 articles the ratings were collected from two evaluators in order to assess the initial inter-rater agreement. Any disagreements were resolved by involving a third author in the rating process. The third rater reviewed the study in question and decided on their own bias score for the domain where the disagreement arose. The final score was then the judgment on which two of the three authors agreed. Any other uncertainties or disagreements were resolved through ongoing discussion between all the authors.

Statistical analysis
The statistical analysis was performed with R Statistical Software version 4.1.2 (R Core Team, 2021), using different packages for meta-analysis and the ggplot2 package (Wickham, 2016) for plotting. The data and the R code can be downloaded from the Open Science Framework repository (DOI 10.17605/OSF.IO/VE3J6).
To measure the change in the main dependent variable, the MMR amplitude, between the youngest and the oldest age group in the cross-sectional age-group comparison studies, the esc package (Lüdecke, 2019) was used to calculate the standardized mean difference (Hedges' g) individually for each study. Additionally, the standardized mean gain was calculated individually for each study between the first and the last measurement point in the longitudinal experiments (or between the measurement points for which data were available). As authors rarely report a correlation between the results of different time-points, the correlation from the work of Ní Choisdealbha et al. (2022) was used to calculate all of the effect sizes of the longitudinal studies that included participants aged 0 to 1.5 years. For the ages 5-8 years the correlation from the study of Linnavalli et al. (2018) was used. The repeated-measures correlation was pooled with the rmcorr package (Bakdash and Marusich, 2017, 2022) within each of the two studies, as both reported several correlations between different measurement time-points. Additional correlations were reported by Engström et al. (2021), Jansson-Verkasalo et al. (2004), and Schaadt et al. (2015), but these were not used because the correlations were calculated on samples that included both typical and impaired populations.
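The two effect-size measures named above can be sketched in Python using the common textbook formulas (pooled-SD standardized mean difference with small-sample correction, and a standardized mean gain that uses the pre-post correlation r). This is an illustration under stated assumptions, not the authors' R code: the exact computations of the esc package may differ in detail, the function and argument names are hypothetical, and the sign convention (older minus younger, last minus first) is an assumption.

```python
import math

def hedges_g(mean_young, sd_young, n_young, mean_old, sd_old, n_old):
    """Hedges' g (older minus younger group) and its sampling variance."""
    df = n_young + n_old - 2
    pooled_sd = math.sqrt(
        ((n_young - 1) * sd_young**2 + (n_old - 1) * sd_old**2) / df
    )
    d = (mean_old - mean_young) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n_young + n_old) / (n_young * n_old) + d**2 / (2 * (n_young + n_old))
    return correction * d, correction**2 * var_d

def standardized_mean_gain(mean_first, sd_first, mean_last, sd_last, n, r):
    """Standardized mean gain (last minus first measurement) and its variance.

    r is the correlation between the two measurements; it must be below 1.
    """
    sd_diff = math.sqrt(sd_first**2 + sd_last**2 - 2 * r * sd_first * sd_last)
    sd_within = sd_diff / math.sqrt(2 * (1 - r))
    d = (mean_last - mean_first) / sd_within
    var_d = (1 / n + d**2 / (2 * n)) * 2 * (1 - r)
    correction = 1 - 3 / (4 * (n - 1) - 1)  # small-sample correction factor
    return correction * d, correction**2 * var_d
```

The correlation r enters the mean gain twice, in the standardizer and in the variance, which is why a borrowed correlation affects both the size and the precision of the longitudinal effect sizes.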
All the individually calculated effect sizes were further converted to reflect the change of the MMR amplitude during 1 year (Viechtbauer, 2019), which was necessary due to the variety of time intervals between the age groups or measurement points. This practice is recommended; however, with the current data this kind of conversion erases important nuances about changes in the MMR amplitude that take place within periods shorter than a year. Additionally, the author who provided the normalization formula (Viechtbauer, 2019) emphasizes that the standard deviations of the measurements cannot be too large. The variance in child ERP study results is known to be large, and this is also the case with the data in the current meta-analysis. Therefore, only the results with the non-normalized data are presented herein, and the results with the normalized data can be found in the Open Science Framework repository (DOI 10.17605/OSF.IO/VE3J6).
Using the metafor package (Viechtbauer, 2010), individual standardized mean differences from the studies were pooled together with a multi-level meta-analysis model (Harrer et al., 2021). As we anticipated considerable between-study heterogeneity, a random-effects model was used. With the dmetar package (Harrer et al., 2019) the restricted maximum likelihood estimator (Viechtbauer, 2005) was used to calculate the heterogeneity variance I² and the t-distribution to calculate the confidence interval around the pooled effect (Harrer et al., 2021). For detecting influential cases in the data, Cook's distance, DFBETAS values, and hat values were calculated with the metafor package (Viechtbauer, 2010), and to address possible publication bias (Fox and Weisberg, 2019), a funnel plot was created with the meta package (Balduzzi et al., 2019) to visually detect whether any studies in the meta-analysis suffered from publication bias (Harrer et al., 2021). Finally, a sub-group analysis was performed by adding predictor variables and their interactions into the model one by one. The likelihood ratio test was used to select the best-fitting model (Gurka, 2006; Verbyla, 2019). Several numerical and categorical predictor variables were entered into the model, and these were selected using the available literature. The numeric variables were: (1) the midpoint of age, which in the longitudinal studies was the sum of the ages at the first and the last measurement divided by two and in the cross-sectional studies the sum of the ages of the youngest and the oldest comparison group divided by two; (2) the (log-normalized) mean of the MMR response time-window. The categorical variables were: (1) RoB (risk of bias) score, levels: low, moderate, and high; (2) design, levels: longitudinal or cross-sectional; (3) type of stimuli, levels: non-speech sound (e.g., pure tones, white noise, complex sounds) or speech sound; (4) brain region, levels: frontal, central, fronto-central, and other; (5) MMR polarity type, levels: (a) continually negative, (b) continually positive, (c) changing from positive to negative between measurement points or age groups, (d) changing from negative to positive between measurement points or age groups.
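The two numeric predictors can be sketched as follows. The age unit (months) is an assumption, and so is the reading of "log-normalized" as the natural logarithm of the mean of the window boundaries, because the text does not specify either; the function names are hypothetical.

```python
import math

def age_midpoint(first_age_months, last_age_months):
    """Midpoint between the first/youngest and last/oldest age (assumed in months)."""
    return (first_age_months + last_age_months) / 2

def log_window_mean(window_start_ms, window_end_ms):
    """Natural log of the mean of the MMR time-window boundaries (assumed definition)."""
    return math.log((window_start_ms + window_end_ms) / 2)
```

For a longitudinal study measuring at 6 and 12 months with a 150-250 ms window, these functions give a midpoint of 9 months and the logarithm of 200 ms.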
For the post-hoc analysis the Tukey HSD test with the Holm adjustment was performed using the multcomp package (Hothorn et al., 2008).

Results
The following section is divided into five subsections. The first subsection describes the database search results and the article selection process. Then the results of the risk of bias assessment are presented. This is followed by a description of the different subject areas of the methods used to elicit the MMR. Finally, the effect sizes of time at the individual study level as well as an overall pooled effect size alongside a subgroup analysis are reported.

Database search and study selection
The full workflow of the article selection process can be seen in Fig. 1. At first the database search yielded an enormous number of studies (n = 5866), but on closer inspection it became clear that the abbreviation MMR may also refer to the measles, mumps and rubella vaccine or to the maternal mortality rate. Therefore, the terms "vaccine" and "maternal mortality" were explicitly excluded from the search, which reduced the yield to 4203 studies. After duplicate removal and title and abstract screening, 125 relevant articles were identified (see Table 1 for selection criteria). Of these, 51 studies (Table 2) remained after full-text inspection and were included in the methodology analysis part of the current review. The main reasons for study exclusion were that the participant ages were inappropriate, that the data of the difference wave were not reported, and/or that there were no age-group comparison data in the cross-sectional studies.
For the meta-analysis we could obtain the necessary data from 24 articles, of which 13 were of longitudinal design and 11 of cross-sectional age-group comparison design. After the detection of influential cases, one study was excluded (Wu et al., 2021). While inspecting the funnel plot (Fig. 2) for possible publication bias, an additional study (Paquette et al., 2015) was excluded. Further, the work of Niemitalo-Haapola et al. (2015) had to be left out because no reliable correlation was available for the age group of that study (2-4-year-olds) to calculate the standardized mean gain. For the same reason we could only use data from T1 and T2 from Yu et al. (2019).

Risk of bias assessment
The risk of bias of study reporting was rated in all of the 24 articles initially included in the meta-analysis, and the results of each domain and the overall scores are presented in Fig. 3. Additionally, the overall RoB score for specific studies is shown in Fig. 8. For those studies which were eventually not included in the meta-analysis the overall scores were as follows: moderate (Wu et al., 2021), and low (Niemitalo-Haapola et al., 2015; Paquette et al., 2015). The initial inter-rater agreement, calculated as weighted kappa, was "fair" (κ = 0.265).
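For readers unfamiliar with the agreement statistic, a weighted kappa for two raters on an ordinal scale can be sketched in Python as below. The text does not state which weighting scheme was used, so linear disagreement weights are an assumption here, as is the coding of the RoB categories as 0 (low), 1 (moderate), and 2 (high); the function name is hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's weighted kappa with linear disagreement weights.

    Ratings are integer category codes from 0 to n_categories - 1.
    """
    n = len(ratings_a)
    k = n_categories
    # Joint proportion table of the two raters' judgments.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement actually observed and expected by chance.
    observed_dis = sum(
        abs(i - j) / (k - 1) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_dis = sum(
        abs(i - j) / (k - 1) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - observed_dis / expected_dis
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, with near-misses on the ordinal scale penalized less than distant disagreements.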

Overview of the methodologies used to elicit the MMR response
A full description of the methodological aspects of the studies is presented in Table 2 and the summarized data in Fig. 4. Almost half (N = 22) of the 51 articles included in the methodology review studied small children up to two years of age. The participant ages in the remaining 29 articles were spread more evenly from 2 to 8 years.
The majority of the studies used the oddball paradigm (N = 39) to elicit the MMR, and 12 of these used a control condition. The exceptions used a multi-feature paradigm (Engström et al., 2019; Linnavalli et al., 2018; Niemitalo-Haapola et al., 2015; Virtala et al., 2022) or an equiprobable paradigm (Chuang et al., 2018). Three studies applied trains of ten stimuli in which two of the stimuli were always deviants and appeared in random positions within the train (Glass et al., 2008; Shafer et al., 2010; Yu et al., 2019).
The studies differed in the number of accepted deviant trials on which the calculation of the difference wave was based. The percentage of accepted epochs relative to presented trials ranged between 26 and 98 (Table 2). Some studies reported an absolute minimum number of accepted trials and excluded the data if it was not reached. This minimum threshold ranged between 30 and 200 deviant trials. In many of the studies the information about accepted epochs was missing (N = 19).
The division between speech (n = 25) and non-speech (n = 22) stimuli was almost equal. There were some exceptions: for example, one study used synthesized speech and five used foreign speech, which we did not categorize as typical speech stimuli, assuming that the processing of these stimuli differs from the processing of native speech stimuli. Figs. 5 and 6 present detailed information about the difference between the standard and the deviant stimuli. Within the sound domain (Fig. 5), research groups mostly used pure or complex tones as stimuli, but there were some exceptions where environmental sounds, speech-shaped noise, or frozen noise were applied. The most frequent change with tones was an increase in the frequency of the deviant tone. Within the speech domain (Fig. 6), studies used vowels, syllables, and words as stimuli. The most frequent change was a change of the consonant in a CV syllable.
Finally, ISIs were highly variable between different studies. Herein the ISI is treated as the time interval between the offset of one stimulus and the onset of the next. More specifically, temporal gaps within a single stimulus, within a stimulus pair, or between trains of ten are not considered here. The most frequently used ISIs were 500 ms (8 studies), 300 ms (5 studies), and 190-290 ms (5 studies). Additionally, most of the research groups used a fixed ISI and only a few used two different ISIs or an ISI varying randomly within a fixed time interval. More specifically, Virtala et al. (2022) used an inter-stimulus interval of 850-950 ms, alternating randomly in 10-ms steps. Garami et al. (2014, 2016), Ragó et al. (2014, 2021)

Effect sizes within studies
Of the 51 articles included in the current review, only 21 presented the necessary data (M, SD) for the meta-analysis. The effect sizes within each study are presented in Fig. 7. Altogether there were 114 observations divided unequally between the 21 studies. Most of the studies (n = 15) in the meta-analysis provided results about small children aged 0 to 1.5 years. The other 6 studies presented results of children aged 3.5 to 8 years. The effect sizes ranged from zero to large.
In a minority of the studies either a significant large negative (n = 4 in 3 studies) or a significant large positive (n = 5 in 4 studies) effect size was found. A large negative effect size was found in Slugocki and Trainor (2014) in the occipital lobe when participants were presented with white noise coming from different directions (Hedges' g = −0.82), and in Ragó et al. (2014) at electrodes F3, Fz, and F4 when participants had to detect an initial consonant change (Hedges' g = −1.16, −1.13, and −1.06, respectively). Large positive effect sizes were found in Leipälä et al. (2011) at F3 in a tone frequency change paradigm in the first time-window of 150-250 ms and in the second of 250-350 ms (Hedges' g = 1.01 and 1.23, respectively), in Slugocki and Trainor (2014) in the frontal area when participants were presented with white noise coming from different directions (Hedges' g = 1.01), in Marie and Trainor (2013) calculated from the grand average of electrodes in a musical note discrimination paradigm (Hedges' g = 0.7948), in Ragó et al. (2021) calculated from the grand average of the electrodes used in the pseudoword condition when the illegal stress pattern was the deviant (Hedges' g = 1.88), and finally in Jansson-Verkasalo et al. (2010) averaged over the F4, C4, P4, F3, C3, and P3 electrodes when a non-native vowel was the deviant (Hedges' g = 0.95).
In some studies, a significant medium negative (N = 5 in 4 studies) or positive (N = 1) effect size was also found. Medium negative effect size was found in Choudhury and Benasich (2011) −0.36, −0.42). In one study a medium positive effect size was also found (Marie and Trainor, 2013; Hedges' g = 0.79). In the other observations the effect sizes were either small, non-existent, or not significant (N = 99).

Pooled effect sizes of studies and the overall pooled effect size
The pooled effect sizes of each study are presented in Fig. 8. The overall pooled effect size based on the multi-level meta-analytic model was 0.02 (95% CI: −0.12 to 0.16; p = 0.76), indicating no general effect of time on the MMR. The estimated variance components were τ² (level 3) = 0.0659 and τ² (level 2) = 0.0154. This means that I² (level 3) = 39.03% of the total variation can be attributed to between-cluster heterogeneity and I² (level 2) = 9.11% to within-cluster heterogeneity (Fig. 9). We found that the three-level model provided a significantly better fit compared to a two-level model with level 3 heterogeneity constrained to zero (χ²(1) = 3.9; p = 0.05). A significant medium effect size was found in Leipälä et al. (2011). In the other studies, the effect sizes were non-existent or small, and not significant.
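The relation between the reported variance components and the I² percentages can be checked with a short Python sketch. It assumes the usual multi-level definition in which each I² is the share of a variance component in the sum of both components and a "typical" sampling variance; that sampling variance is not reported in the text, so it is back-calculated here from the level 3 values, and the function names are hypothetical.

```python
def multilevel_i2(tau2_level3, tau2_level2, sampling_variance):
    """Percentage of total variation attributed to level 3 and to level 2."""
    total = tau2_level3 + tau2_level2 + sampling_variance
    return 100 * tau2_level3 / total, 100 * tau2_level2 / total

def implied_sampling_variance(tau2_level3, tau2_level2, i2_level3_percent):
    """Typical sampling variance implied by a reported level 3 I-squared."""
    total = tau2_level3 * 100 / i2_level3_percent
    return total - tau2_level3 - tau2_level2

# Values reported in the text.
tau2_l3, tau2_l2 = 0.0659, 0.0154
sampling_var = implied_sampling_variance(tau2_l3, tau2_l2, 39.03)
i2_l3, i2_l2 = multilevel_i2(tau2_l3, tau2_l2, sampling_var)
```

Under these assumptions the level 2 share comes out close to the reported 9.11%, so the reported components and percentages are mutually consistent up to rounding, and the remaining variation is attributable to sampling error.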
We further investigated whether an effect was present if only observations from frontal, fronto-central, and central areas were included in the model. The results from 16 studies and 85 data points showed no overall pooled effect (estimate 0.009; 95% CI: −0.18 to 0.19; p = 0.0031).

Subgroup analysis
In the sub-group analysis, the current data allowed us to observe main effects and two-way interactions. Factors added to the model were as follows: midpoint of age, mean of the MMR response time-window, RoB score, design, stimulus type, brain region, and MMR polarity (see results from DOI 10.17605/OSF.IO/VE3J6). The results (Table 3) revealed two significant main effects and four interactions [F(df1 = 17, df2 = 96) = 4.6606, p < .0001].
The first relevant predictor of the effect size was brain region (with levels: frontal, fronto-central, central, and other), with a different effect size in the frontal region. The post hoc test revealed that the significant differences were between the frontal and other regions (p = 0.012). Other sub-groups did not differ from each other. There was also an interaction between the brain area and the (log-normalized) mean of the MMR response time-window, but in the post hoc test the differences diminished.
The second predictor of the effect size was the age of the participants (variable: midpoint of age; in the longitudinal studies, the average of the ages at the first and the last measurement; in the cross-sectional studies, the average of the ages of the youngest and the oldest comparison group), which pointed to a decrease of the effect size as age increases. Additionally, there was an interaction between participant age and the (log-normalized) mean of the MMR response time-window, in which the decrease in effect size was slightly mitigated when both participant age and MMR time-window increased. The post hoc test showed this interaction to remain significant (p = 0.001).
Another significant interaction was between the overall RoB score (categories: low, moderate, high) and the publication year of the study. The effect size was significantly different in the RoB "low" category, where it grew as the year of publication increased. Yet again, the significant differences diminished in the post hoc test.
Finally, the model revealed an interaction of the overall RoB score and the design of the study (longitudinal or cross-sectional group comparison). In the category of RoB "low" and design "longitudinal" the effect size was higher (Cai et al., 2016; Choudhury and Benasich, 2011; Yu et al., 2019). At the same time, the post hoc test did not reveal any significantly different subgroups.
Fig. 5. Non-speech stimuli used in the 51 studies included in the methodology review.

Discussion
The MMR response is considered a valuable tool for studying the development of sound and speech sound perception. Although it has been studied to some extent among preschool children, the picture concerning the maturation of the MMR response is far from clear. It is essential, however, to track the changes in the MMR to develop a deeper understanding of the maturation of auditory processing and, in the future, possibly use it as one of several diagnostic tools for detecting and specifying language impairments. The current paper makes a first attempt to summarize the maturation of the MMR amplitude based on more than twenty years of research.
In the following section we will first discuss the risk of bias indicators in the articles as well as the findings on the methodologies applied for eliciting the MMR. Based on the latter, we will give methodological suggestions for future child studies. In the second part of the current section, the results of the meta-analysis and sub-group analysis will be discussed.

Risk of bias in reporting
The current study aimed to evaluate the risk of bias in articles included in the meta-analysis using the QUIPS tool (Fig. 3).
Three main problematic topics emerged in the current set of articles. Firstly, authors often did not report confounding factors. In the case of studying pre-attentive speech perception, factors such as a multilingual home environment can be considered a study confounder. While the studies may not have been influenced by this, the lack of reporting lowered their RoB assessment. Secondly, in the domain of study participation, the most obvious shortcoming was the sample size, which in several studies was 10-20 participants, although it is understandably a challenge to recruit young subjects for an EEG experiment. Thirdly, in the domain of prognostic factors authors did report the division of sex in the sample, but the influence of participant gender on the MMR amplitude should have been inspected further, as some research has shown possible gender differences in pre-attentive speech perception (Marklund et al., 2019; Shafer et al., 2011) and it is well known that the pace of language acquisition in boys is a bit slower compared to girls. Additionally, there might be other prognostic factors (e.g., general intelligence) that relate to the maturation of the MMR. Therefore, to increase the validity of the MMR we suggest always including some other parallel measurements.
The quality of reporting varied between longitudinal (N = 11) and cross-sectional group comparison studies (N = 10). There were considerably more longitudinal articles that gained either a "low" (N = 3) or "moderate" (N = 6) RoB score compared to cross-sectional studies, where the "low" RoB score was given to only one study and the "moderate" RoB score was given to three studies. Note that even though study attrition was evaluated only in longitudinal studies, the overall RoB scores were still better in the longitudinal study group. In addition to the superiority of the longitudinal design over the cross-sectional one, the reporting in these articles was of a higher standard as well.
The initial inter-rater agreement using the QUIPS tool, calculated as weighted kappa, was 0.265 between the authors scoring the studies for any potential bias. This can be interpreted as poor agreement beyond chance level. This finding is not entirely surprising because RoB assessment is a process of subjective judgment and the items in QUIPS can be interpreted in different ways. Additionally, there are no specific guidelines on how to decide on the overall RoB score of the study, that is, the sum of the six domains (Grooten et al., 2019). In the study of Hayden et al. (2013) the kappa values obtained with QUIPS were between 0.56 and 0.82. However, comparing these results with the current ones may not be appropriate because Hayden et al. (2013) used a previous version of the QUIPS tool and the calculation of kappa was inconsistent across review teams. In a more recent study by Grooten et al. (2019) the overall weighted kappa coefficient was similarly weak as in the current results. Finally, these kinds of quality assessment tools, including the QUIPS tool, are designed to rate study quality in the medical field and not in experimental work. The latter may have also contributed to the poor inter-rater agreement score. All in all, it needs to be emphasized that, together with using the QUIPS tool, a discussion between raters is necessary to reach more reliable results.
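For readers unfamiliar with the statistic, the following minimal sketch shows how a weighted kappa can be computed for two raters who assign ordinal RoB categories. The weighting scheme is not specified in the text, so linear disagreement weights are assumed here (quadratic weights are offered as an option); the ratings are invented for illustration and are not the ratings from this review.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters on ordered categories.

    `categories` lists the labels from lowest to highest. Disagreement
    weights grow with the distance between categories (assumed scheme).
    """
    k = len(categories)
    index = {label: i for i, label in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of the two raters' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        distance = abs(i - j) / (k - 1)
        return distance if weights == "linear" else distance ** 2

    observed_disagreement = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    chance_disagreement = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    return 1.0 - observed_disagreement / chance_disagreement


# Hypothetical overall RoB scores from two raters for six studies.
levels = ["low", "moderate", "high"]
rater_1 = ["low", "low", "moderate", "moderate", "high", "high"]
rater_2 = ["low", "moderate", "moderate", "high", "high", "high"]
kappa = weighted_kappa(rater_1, rater_2, levels)
print(f"weighted kappa = {kappa:.3f}")
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance, which is why a coefficient such as 0.265 is read as poor agreement.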

Methodologies for eliciting the MMR
The first research question in this study sought to describe the different methodologies used to elicit the MMR. As expected, researchers used a large variety of methods.

Paradigms
A common aspect of many studies in the current review was the implementation of the oddball paradigm (Fig. 4). Although this paradigm has proven to be reliable for eliciting the MMR in children, it also has its disadvantages. Firstly, the oddball paradigm allows one to study the perception of only a limited number of acoustic feature changes, i.e., one or two deviants per series. Secondly, the recording sessions can be lengthy: a minimum of 150 deviant trials is recommended (Duncan et al., 2009), which is a problem when working with preschoolers. Finally, the oddball paradigm needs an appropriate control condition to accurately measure the MMN in adults (Kujala et al., 2007; Winkler, 2007) and the same principle should be applied to child studies as well.
To resolve the first two of these disadvantages, Näätänen et al. (2004) proposed a more optimal paradigm named the multi-feature paradigm or Optimum-1. Within this paradigm every other stimulus is the standard and every other is one of the five chosen deviants (Näätänen et al., 2004). It is particularly useful in clinical research as it can be used to obtain five different types of MMR responses in the time in which traditional paradigms elicit only one type (Garrido et al., 2009). Further, there is evidence that it can be utilized with children as successfully as the oddball paradigm (Lovio et al., 2009), and it has been used by some researchers studying children at different ages (Engström et al., 2019; Kuuluvainen et al., 2016; Linnavalli et al., 2018; Niemitalo-Haapola et al., 2015; Partanen et al., 2013; Torppa et al., 2022; Virtala et al., 2022).
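The alternating structure of the multi-feature paradigm can be made concrete with a small sketch that generates a stimulus order. Two details are assumptions and are not stated in the text: each deviant type appears once within every run of five deviants, and the same deviant type is never presented twice in succession. The deviant labels are illustrative placeholders, and the function name is hypothetical.

```python
import random


def multifeature_sequence(deviant_types, n_blocks, seed=None):
    """Build a standard/deviant order in the style of Optimum-1.

    Every other stimulus is the standard; the stimuli in between cycle
    through the deviant types in shuffled blocks (assumed constraint).
    """
    if len(deviant_types) < 2:
        raise ValueError("at least two deviant types are required")

    rng = random.Random(seed)
    sequence = []
    previous_deviant = None
    for _ in range(n_blocks):
        block = list(deviant_types)
        rng.shuffle(block)
        # Reshuffle if the block would repeat the last deviant type.
        while block[0] == previous_deviant:
            rng.shuffle(block)
        for deviant in block:
            sequence.append("standard")
            sequence.append(deviant)
        previous_deviant = block[-1]
    return sequence


# Five illustrative deviant labels (placeholders, not taken from the text).
deviants = ["frequency", "duration", "intensity", "location", "gap"]
order = multifeature_sequence(deviants, n_blocks=3, seed=1)
print(order[:10])
```

With five deviant types, every deviant type makes up 10% of the stimuli, so a sequence of a given length yields five deviant averages in the time an oddball sequence with a 10% deviant proportion yields one.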
The third concern in employing the oddball paradigm is the necessity to use a control condition, which was done by 12 studies out of 39 in the current review. A control condition adds time to the experiment, which is problematic when working with children, but is irreplaceable for accurately measuring the MMR and for going beyond group comparisons. The reasons are the influence of the obligatory responses and the refractoriness effects. The influence of the obligatory responses, more specifically the N1 response, on the MMR is quite ambiguous because the latency window of the MMR is not as established in children as it is in adults. Therefore, separating the two proves to be quite difficult, but employing control conditions may resolve some of this uncertainty. Lohvansuu et al. (2013) studied the separation of N1 from MMN in 9-10-year-olds and concluded that without a control condition the MMN seemed to emerge earlier than when the deviant-control contrast was used. Therefore, the early temporal activation is obligatory in nature instead of denoting change detection (Lohvansuu et al., 2013). Kujala et al. (2007) and Winkler (2007) have proposed several examples of how to conduct a control condition to resolve both the problem of separating the N1 and the refractoriness effect.
Fig. 8. The pooled effect sizes of each study. Design levels: longitudinal design (l) and cross-sectional age group comparison design (g); Risk of bias (RoB): low (L), moderate (M), high (H); under Age1 the age at the first measurement or the age of the first age group is reported in years and months; under Age2 the age at the last measurement or the age of the oldest age group is reported in years and months. The square represents the value of the pooled effect size of the study and the size of the standard error. The vertical line passing through the square represents the confidence intervals.

The number of standards, deviants, and accepted trials
There was considerable variation between the studies in the number of stimuli (ranging from 240 to 5000 stimuli) and, among these, in the proportion of deviants (ranging from 8% to 50%) presented to the participants (Fig. 4). However, in most of the studies the number of deviants ranged between 100 and 200. In the case of adults, it is highly recommended to have a sufficient duration of the stimulus series (at least 150 deviants) in order to achieve an adequate signal-to-noise ratio (Duncan et al., 2009). Most of the current studies more or less conform to this suggested number, with the exceptions being Ní Choisdealbha et al. (2022), who managed to record a robust MMR with 42 deviants, and Niemitalo-Haapola et al. (2015), with 54. At the same time, in child research, where the population has a higher level of variability compared to adults, a larger number of stimuli may be needed to obtain a robust MMN (Key and Yoder, 2013). An even larger number of trials is needed when the physical difference between the standard and the deviant is small, when the authors wish to analyze data at the individual level (Kujala et al., 2007), or when they want to increase statistical power (McWeeny and Norton, 2020). Ten studies in the current review indeed used more than 200 deviants. A note of caution is due here, however, because prolonged exposure to the stimuli may produce a habituation effect (Key and Yoder, 2013). It seems that in child research the decision about the number of stimuli has been guided by the principles from adult studies. Nevertheless, we concur with McWeeny and Norton's (2020) suggestion that future research should aim to determine the optimal number of standards and deviants to further increase the quality of studies, while at the same time taking into consideration the comfort of participants.
A large variation in the number of accepted deviant trials for calculating the MMR was present as well. Some authors obtained a high number of usable epochs (ca 98% of all deviant trials with 6- to 24-month-olds; Choudhury and Benasich, 2011) and others a significantly lower number (ca 26% of all deviant trials with 2- to 4-month-olds; Trainor et al., 2003). Most of the studies managed to retain over 50% of the epochs. Note that data loss cannot be equated with data quality (Van Der Velde and Junge, 2020). This implies that a significant amount of data loss does not necessarily indicate a low quality of the retained epochs and vice versa.
The somewhat low number of accepted trials is expected, as measuring ERPs of small children is difficult. For one, there is usually a high number of movement artifacts (Hervé et al., 2022). At the same time, it is a problem, because high data loss and attrition due to data loss reduce the reliability and statistical power of child ERP studies. Van Der Velde and Junge (2020) have specifically explored how to reduce data loss in infant and young children's ERP studies and recommend testing early in the day, preferably during the spring or summer months, using a flexible experimental design that allows breaks, and using auditory stimuli rather than other options (e.g., visual). A more relevant aspect to the current study is that a lengthier paradigm increases data loss (Van Der Velde and Junge, 2020), which supports the previous recommendation to use Optimum-1.
A second source of variation in the number of accepted trials is the difference in how the EEG recordings are processed and artifacts are dealt with. Hervé et al. (2022) suggest a standardized processing pipeline for child ERP data, more specifically the "Maryland analysis of developmental EEG pipeline" (MADE; Debnath et al., 2020), and some additional methods to control for artifacts: recording cardiac activity, using external electrodes positioned on the face, using an eye-tracker to enable control over the timing and nature of experimental conditions, utilizing the child's eye behavior as a guiding input, and using a video-EEG to monitor the child's behavior online.
Unfortunately, avoiding data loss and minimizing artifacts is not always possible in child studies. Researchers have difficult decisions to make regarding attrition due to data loss because there is no gold standard (even in the adult research community) for the number of accepted trials required for inclusion in the study (Van Der Velde and Junge, 2020). On the one hand, the aim should always be to obtain datasets that are as clean as possible. On the other hand, it is not reasonable to exclude too many participants because of a low signal-to-noise ratio. Furthermore, excluding a substantial number of participants with movement artifacts would introduce bias into the results, as it would limit the study sample to children with specific characteristics.
One solution would be for studies to report the number of presented deviants, the number of accepted trials, the attrition due to data loss and the characteristics of the children excluded because of a low signal-to-noise ratio. Van Der Velde and Junge (2020) suggested sharing attrition rates split, at least, by age group and gender. The risk of bias assessment in the current meta-analysis revealed that most studies had deficiencies in reporting attrition, although more recent studies exhibited enhanced reporting. Knowing this information, though, is essential to gain a more thorough understanding of the causes of data loss, which will possibly help future researchers to reduce it.
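The interplay between the recommended minimum number of deviants, the deviant proportion, and the share of accepted epochs can be sketched as simple arithmetic. The target of 150 deviants and the acceptance rates of roughly 98%, 50% and 26% come from the text above; the deviant proportion of 25% and the function name are assumptions chosen only for illustration.

```python
import math


def stimuli_needed(target_accepted_deviants, deviant_proportion, acceptance_rate):
    """Estimate how many deviants and total stimuli must be presented.

    Returns (deviants_to_present, total_stimuli) so that, at the given
    acceptance rate, the target number of artifact-free deviants remains.
    """
    if not 0 < deviant_proportion <= 1 or not 0 < acceptance_rate <= 1:
        raise ValueError("proportions must lie in the interval (0, 1]")
    deviants_to_present = math.ceil(target_accepted_deviants / acceptance_rate)
    total_stimuli = math.ceil(deviants_to_present / deviant_proportion)
    return deviants_to_present, total_stimuli


# Acceptance rates spanning the range reported in the reviewed studies.
for rate in (0.98, 0.50, 0.26):
    deviants, total = stimuli_needed(150, deviant_proportion=0.25, acceptance_rate=rate)
    print(f"acceptance {rate:.0%}: present {deviants} deviants, {total} stimuli")
```

The sketch makes the trade-off explicit: when only about a quarter of the deviant epochs survive artifact rejection, the sequence has to be several times longer, which in turn risks more data loss and habituation.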

Non-speech stimuli
The studies used an extensive variety of both non-speech and speech stimuli, while the division between these two categories was almost equal: 22 vs 25 (Figs. 4 and 5). The stimuli within the non-speech group were less diverse. Several experiments used a change of the fundamental frequency and, in some studies, also of formant frequencies. However, the baseline frequencies and the amount of change were often different. Some studies explained their choice by relying on behavioral studies. Others, however, offered little explanation of why certain stimuli and a certain change between standard and deviant were used.
One aspect that can influence the quality of the MMR measurement is whether sinusoidal or more natural sounds are used. Kujala et al. (2007) suggested using piano tones to improve the signal-to-noise ratio and to shorten the data recording time: a frequency change in piano tones evokes larger MMNs than an identical frequency change in sinusoidal sounds. This is important to take into consideration in child studies as well, because child data include a large amount of noise and at times no clear MMR is detected. Some studies in this review used musical tones as stimuli (He et al., 2009a, 2009c; Marie and Trainor, 2013).

Speech stimuli
The variation within the speech stimuli category was large (Fig. 6). Two studies in the current review compared the development of acoustic and linguistic processing. Marklund et al. (2019) found no maturational differences between speech and non-speech stimuli. On the other hand, Paquette et al. (2015) found a significant effect of age (if we assume that a p-MMR represents an immature response) in the speech stimuli condition, where they registered a p-MMR at 3 months of age, but an MMN at 12 and 36 months of age. In the non-speech condition 3-month-olds showed a longer latency than the two older age groups. Additionally, a very recent longitudinal study by Chen et al. (2022) observed no cross-condition correlation between the changes in MMR amplitude in the speech stimuli condition and those in the non-speech condition in young infants. Cross-sectional results of Kuuluvainen et al. (2016) showed larger amplitudes elicited by speech stimuli, particularly apparent at the left scalp sites, in 6-7-year-olds. Unfortunately, there are no data about how the responses matured.
A bulk of studies inspected perceptual narrowing, which is the enhancement of the ability to discriminate between native speech sounds and the loss of the ability to discriminate between non-native speech  2019) used a foreign phonological pattern from Hindi and two other native CV syllables and found, contrary to their expectations, no group difference in perceptual narrowing between institutionalized children and children who grew up in a family environment. Chuang et al. (2018) and Wu et al. (2021) employed exclusively non-native syllables to measure group differences between babies with craniosynostosis and typically developing babies. They reported no significant change in the typically developing group between two measuring points (respectively ca 2-10 months and 7-16 months; 3-12 months and 6-15 months). Future research is needed to inspect perceptual narrowing, because this might be a useful way to identify at-risk children in whom this process occurs later compared to typically developing children (Jansson-Verkasalo et al., 2010).
Four studies in the current review compared vowel and consonant change detection. Cheng et al. (2015) presented evidence that from 3 to 6 months of age the discrimination of vowels matured to resemble an adult-like MMR, but with consonants the MMR response remained positive. Lee et al. (2012) used the same stimuli with 4-, 5- and 6-year-olds and showed that the vowel perception was similar to adults, but with consonants the response remained positive. On the other hand, Niemitalo-Haapola et al. (2015) presented data about 2- and 4-year-olds, where there was no difference between the responses to the vowel and consonant deviants. Linnavalli et al. (2018) used the same stimuli and found that the vowel perception was still maturing between the ages of 5 and 6 years, whereas the response to consonant change was stable but not very salient. Additionally, a very recent study by Werwach et al. (2022) observed the maturation of the MMR to consonant and vowel change at 2, 6 and 10 months of age. They found that the consonant MMR decreased in a quadratic growth curve, while the vowel MMR first increased between 2 and 6 months and then declined from 6 to 10 months (Werwach et al., 2022). Similar findings are presented in Schaadt et al. (2023), with the consonant-MMR amplitude decreasing towards a negativity in a quadratic growth curve, while the vowel-MMR amplitude changed in an inverted U-shape with an initial increase (i.e., more positive) from 2 to 6 months, followed by a decrease (i.e., less positive, or more negative) from 6 to 10 months (Schaadt et al., 2023). If we assume that a p-MMR represents an immature response and that its decrease shows maturation towards the MMN, the overall state of the art is not as straightforward as would be desirable. Therefore, more research that focuses on the developmental trajectories of vowel and consonant perception is needed.
A frequently used difference between the standard and deviant stimuli was a change of one plosive consonant to another at the beginning of a CV syllable (Cheng et al., 2015; Linnavalli et al., 2018; Niemitalo-Haapola et al., 2015; Ovchinnikova et al., 2019; Paquette et al., 2015; Schaadt, 2015). This choice of stimuli is pragmatic because the large contrast between the plosive and the vowel makes it easier to segment the two and to perceive the differences in the plosive. Another advantage of this choice is that most of the world's languages have the voiceless plosives /t/, /k/, and /p/ with very similar acoustic characteristics (Ladefoged and Maddieson, 1996). Therefore, these stimuli create a situation where studies in different languages are more comparable, and they are a good option for future cross-linguistic studies.
Another set of studies investigated suprasegmental features like Mandarin lexical tones or stress patterns (Chen et al., 2016; Cheng et al., 2013; Cheng and Lee, 2018; Garami et al., 2014, 2016; Lee et al., 2012; Liu et al., 2014; Nan et al., 2018; Ragó et al., 2014, 2021; Varga et al., 2021; Vavatzanidis et al., 2013). Previous research suggests that both local-to-global and global-to-local analysis occur simultaneously in the auditory system, with both hemispheres participating in the process but each being optimized for different types of processing: the left hemisphere for parsing speech at the segmental scale and the right hemisphere for parsing speech at the syllabic time scale (Ghazanfar and Poeppel, 2014). It is an interesting question whether segmental and suprasegmental feature perception develop at the same pace or at a different one, and how this relates to the developmental trajectories of the MMR. This review includes two studies that have looked at the perception of both segmental and suprasegmental feature change with the same participants. Ragó et al. (2014) studied 6 and month old Hungarian infants and found that the phoneme change (/ba:nán/ to /pa:nán/) elicited a significant negative MMR response at both ages. On the other hand, no MMR was found in the illegal stress pattern detection (/ba:nán/ to /baná:n/) condition. Lee et al. (2012) studied consonant, vowel and Mandarin lexical tone change detection in 4-, 5- and 6-year-olds. In the lexical tone condition the easier deviant (tone to tone 3) elicited a negative MMR in all age groups, but the more difficult deviant (tone 2 to tone 3) elicited a positive MMR in 5- and 6-year-olds and the p-MMR of the 4-year-olds was diminished. A similar trend was found in mismatch responses to vowels (/di/ to /da/ or /du/). Interestingly, in the initial consonant condition (/ba/ to /ga/ or /da/) all the age groups exhibited a p-MMR with both deviants. If we assume that a p-MMR represents an immature response, then the two studies present contradicting evidence; therefore, more research is needed on the perception of segmental and suprasegmental features of speech.
A number of the studies used meaningful words as stimuli. In both children and adults several studies have observed an enhanced MMR/MMN response to meaningful words compared with pseudo-words or non-words (for a review, see Pettigrew et al., 2004). More specifically, the MMN may also reflect pre-attentive processing at the level of the auditory input lexicon, where the results of phonological analysis of the acoustic input are matched to existing word forms in the lexicon (Pettigrew et al., 2004). This notion is supported by several studies in the current review as well (Chen et al., 2016; Garami et al., 2016; Leminen et al., 2020; Ragó et al., 2021; Ylinen et al., 2017). Thus, when working with meaningful words as stimuli, researchers have to consider the possibility that the observed MMRs may be enhanced and may reflect not just acoustic and phonological processing but also lexical activation.
The last point to be discussed in this section is a circumstance where a similar set of stimuli is used by different authors, with nearly identical timing of repeated measurements. Fellman et al. (2004), Kushnerenko et al. (2002), and Leipälä et al. (2011) used three-partial harmonic tones with a fundamental frequency of 500 Hz as the standard and 750 Hz as the deviant. Two of these studies (Fellman et al., 2004; Leipälä et al., 2011) also reported identical sinusoidals, but Kushnerenko et al. (2002) reported no information about these features. Interestingly, the results differed from one another. Fellman et al. (2004) detected a shift in the MMR from positive to negative, Kushnerenko et al. (2002) reported that the negative MMR was not constantly elicited across the ages studied, and Leipälä et al. (2011) obtained a very early negative response to the deviant at the first measurement, but a positive and more delayed response at the second measurement. This points to the need for replication studies using the same set of stimuli with the same age groups.
To summarize, several very interesting developmental paths exist (e.g., sound compared with speech stimuli, native compared with non-native stimuli, vowels compared with consonants, etc.) in the research concerning the MMR. The large variety of stimuli used in studies is probably associated with the specific interests of authors as well as authors' tendencies to research aspects that are important in the context of their native language. At the same time there is a dire need for studies using the same method and the same set of stimuli with the same age groups. This would allow more direct comparisons and add more reliability to the studies, as well as give us a chance to see how similar stimuli modify the MMR response among participants with different native language backgrounds. All in all, the perception of different language features needs to be further studied and these studies replicated before any firm conclusions can be drawn.

Number of electrodes and MMR localization
The studies in the current review used a varying number of electrodes ranging from 8 to 129, but the most utilized options were 32 or 124 electrode systems (Fig. 4). This variability partly reflects the development of technology: earlier research tended to use fewer electrodes than more recent work. The duration of the experiment needs to be optimal when studying the child population; therefore, the time spent on electrode placement should be as short as possible. Brooker et al. (2020) reported average application times for 32 (M = 12 min, SD = 2.74), 64 (M = 17.5 min, SD = 7.55), and 128 channel systems (M = 13.33 min, SD = 2.58). Based on that knowledge and their extensive experience conducting studies with children, Brooker et al. (2020) advise using 32 channels when source localization is not included.
Regarding MMR localization, the articles included in the current review frequently reported results from frontal and central areas or electrodes (Fig. 4). Clearly, the choice is influenced by adult studies (Duncan et al., 2009; Fong et al., 2020; Kujala et al., 2007). The clustering of electrodes in the current set of studies is also influenced by tradition. A note of caution is due here. Selecting only those electrodes for analysis where hypothesized differences appear to be present introduces the possibility that findings will be biased toward significant results (Brooker et al., 2020). Further, the clustering of electrodes into regions should be based on PCA or ICA, the goal being to identify electrodes that produce similar outcomes and to group those together.
Another way of reporting is to compare the results from the left and right hemispheres, which was done by several authors in the current review. McWeeny and Norton (2020) advise that this is a feasible solution when one is interested in spatial questions, because the overall ERP localization is quite limited. Further, reporting left and right hemisphere data will give an interesting insight into the lateralization of non-speech compared to speech stimuli and into the lateralization of processing different features of speech.

Inter-stimulus interval
The variety of ISIs has been rationalized based on the age of the participants and the duration of the experiment (Fig. 4). A shorter ISI will decrease the length of the experiment, but at the same time a short ISI may increase the difficulty of discriminating between the stimuli or even cause two units to be perceived as a single one, as proposed by Wang et al. (2005). Three studies in the current review have inspected the impact of ISI on the results. He et al. (2009b) presented piano tones with 100 ms and 200 ms ISI, while the duration of the stimuli in the fast presentation condition was decreased by 500 ms. They found that both 2- and 4-month-old infants showed a slow positive wave at 150-400 ms, which was less positive with the increased presentation rate. They also detected a fast MMN-like negativity in 4-month-olds, with a latency difference but no amplitude difference between the fast and slow presentation rates. Jing and Benasich (2006) compared 70 and 300 ms ISI presentation rates with tone pairs in 16- and 60-month-olds. They reported no difference in MMR amplitude, but the participants had a longer latency in the fast presentation condition.
Another important aspect concerning the ISI is the choice between a fixed and an alternating ISI. A fixed ISI is useful in controlling the timing between stimuli and ensuring a consistent time period between events. This can be helpful when the goal is to measure the response to a single type of stimulus and the effect of repetition or adaptation over time. On the other hand, alternating ISIs can help to minimize any potential habituation effects that may occur with a fixed ISI. Our results show that most of the studies applied the fixed option and fewer studies chose the alternating option, where the ISI varied within either a 100 ms range or a 200 ms range (Fig. 4). Čeponienė et al. (1998) found that there was no impact of ISI on the MMR of children aged 84-108 months when the ISI changed within a range of 200 ms. At the same time, there are no data about the younger population. It is therefore reasonable to choose a smaller time-range for the alternating ISI for smaller children.
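The difference between a fixed and an alternating ISI can be sketched as follows. The sketch assumes that an alternating ISI is drawn uniformly from a window centered on a base value; the base ISI of 600 ms and the function name are hypothetical, while the 100 ms range is one of the ranges mentioned above.

```python
import random


def isi_schedule(n_stimuli, base_isi_ms, jitter_range_ms=0.0, seed=None):
    """Return one ISI per stimulus, in milliseconds.

    A jitter range of 0 gives a fixed ISI; otherwise each ISI is drawn
    uniformly from base +/- half of the jitter range (assumed scheme).
    """
    rng = random.Random(seed)
    half_range = jitter_range_ms / 2.0
    return [
        rng.uniform(base_isi_ms - half_range, base_isi_ms + half_range)
        for _ in range(n_stimuli)
    ]


fixed = isi_schedule(5, base_isi_ms=600)
alternating = isi_schedule(5, base_isi_ms=600, jitter_range_ms=100, seed=3)
print("fixed:", fixed)
print("alternating:", [round(isi) for isi in alternating])
```

Because the window is centered on the base value, the expected session duration is the same for both options, so the choice between them does not by itself lengthen the recording.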

The effect size of each observation within studies
Altogether there were 114 individual observations in the meta-analysis, distributed unequally across 21 studies. Unfortunately, many of the studies included in the methodology analysis did not report means and proper estimates of the variability of the MMR, and none of the longitudinal studies reported correlations between measurements at different time-points. It is essential to report this kind of data for assessing results and conducting future meta-analyses. Across the observations, the effect size ranged from non-existent (N = 43) through small (N = 47) and medium (N = 11) to large (N = 11; Fig. 7). Additionally, both positive and negative values of the effect size were present.
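Why means, standard deviations and sample sizes are indispensable becomes clear from the way a standardized effect size is computed. The following minimal sketch shows the textbook formula for Hedges' g between two independent age groups, using the common approximation of the small-sample correction. The sign convention (older minus younger group) and the amplitude values are assumptions for illustration; longitudinal comparisons would additionally require the correlation between time-points, which is why its absence is a problem.

```python
import math


def hedges_g(mean_young, sd_young, n_young, mean_old, sd_old, n_old):
    """Hedges' g for two independent groups (older minus younger).

    Computes Cohen's d from the pooled standard deviation and applies
    the approximate small-sample correction J = 1 - 3 / (4 * df - 1).
    """
    df = n_young + n_old - 2
    pooled_sd = math.sqrt(
        ((n_young - 1) * sd_young ** 2 + (n_old - 1) * sd_old ** 2) / df
    )
    cohens_d = (mean_old - mean_young) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return cohens_d * correction


# Hypothetical MMR amplitudes in microvolts for two age groups.
g = hedges_g(mean_young=1.2, sd_young=1.5, n_young=15,
             mean_old=-0.4, sd_old=1.8, n_old=15)
print(f"Hedges' g = {g:.2f}")
```

With the small samples typical of the reviewed studies (10-20 participants), the correction factor noticeably shrinks the estimate, which is one reason for preferring Hedges' g over the uncorrected Cohen's d.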
Firstly, the studies reporting large or medium effect sizes mostly involve children under one year of age. In addition to the articles in the meta-analysis, other authors have also detected significant change during this period. For example, Cheng et al. (2013, 2015) reported a polarity shift from positive to negative during the first six months after birth. He et al. (2009a, 2009b), He and Trainor (2009), and Trainor et al. (2003) found an emerging negativity between 2 and 4 months, 2 and 6 months, and 3 and 7 months, respectively. Fellman et al. (2004) reported an increase in negative MMR amplitude during the first 15 months of life, but it was not specified whether it was statistically significant. Fewer studies detected no change during this period (Garami et al., 2014, 2016; Kushnerenko et al., 2002). Based on older research, Cheour et al. (2000) concluded that the ERPs, including the MMR, seem to follow an inverted-U shape as a function of age: the amplitude maximum is reached prior to maturity. The presented evidence is in accordance with the general knowledge about the fast development of non-speech and speech-sound processing during the first year of life, when the shift from general acoustic processing to language-specific processing takes place and the base of phonological knowledge is formed.
Secondly, we observed that both negative and positive effect sizes were present in the data, sometimes within the same study. The latter can be explained by several arguments. For instance, in Slugocki and Trainor (2014) the effect size was positive in the frontal and central areas, with increasing MMR amplitude between age groups. Yet the effect size was negative in the occipital area, indicating a decrease in amplitude over time. No such pattern can be seen, though, in Chen et al. (2016) and Martin et al. (2003), where data from several regions/electrodes were available.
The opposite direction of the effect sizes may also be due to the stimulus type, as in the work of Jansson-Verkasalo et al. (2010). A small negative effect size was found with the native vowel as a deviant, but a large positive effect size was found with the non-native vowel deviant. But in Ragó et al. (2014, 2021) and Varga et al. (2021), who used phonetically illegal and legal stress patterns as deviants, the direction of the effect sizes seemed to depend on the specific study. For instance, in Varga et al. (2021) we found small to medium negative effect sizes when the deviant was a pseudoword with an illegal stress pattern. In Ragó et al. (2021) the same condition resulted in small to large positive effect sizes. Opposing results were also found in the condition where the deviant was a pseudoword with a legal stress pattern. In Varga et al. (2021) there was a small to large positive effect size, but in Ragó et al. (2021) it was negative. Additionally, the works by Ragó et al. (2014, 2021) presented effect sizes in opposite directions in the condition where a word with an illegal stress pattern was the deviant. In the 2014 article the effect sizes were positive, ranging from small to large, but in the 2021 article we found non-existent to medium negative effect sizes. There are several explanations for this inconsistency. First of all, the stimuli were similar, but not identical (/bebe/, /ba:nán/ and /baba/). This emphasizes the influence of specific details of the stimuli on the outcome. Then again, these results might have been a product of individual differences in the participant groups or some other unknown but influential factor.
The third observation was that there was a substantial number of non-significant and non-existent effect sizes in the current data. Taken at face value, this suggests that the MMR does not change over time. If changes do in fact occur, then there must be another explanation for this outcome. Escera et al. (2014) noted a general consensus that the MMN/MMR reflects more than a simple form of adaptation to the repetition of a particular standard stimulus; yet in a simple-feature oddball paradigm, particular care is required to disentangle which aspect of the electrophysiological response is based on mere adaptation and which on the representation of the context regularity that is kept in sensory memory. The authors refer here to the necessity of using a control condition, which we discussed more thoroughly in the subsection on paradigms. Most of the studies in this review did not use a control paradigm, so they may have registered not deviance detection but some other (lower-level) response that remained stable during maturation.

The pooled effect size
The overall pooled effect size based on the multi-level meta-analytic model estimate was 0.02. Additionally, we computed a pooled effect size for each article (Fig. 8), which was small or non-existent for most. This points to the stability of the p-MMR and/or the MMN, which is also supported by the very recent work of Alatorre-Cruz et al. (2023), with 5 repeated measures taken from 3 to 24 months of age, and of Torppa et al. (2022) and Morales et al. (2023), involving pre-school ages.
At the same time, this outcome is unexpected, as at least some of the observations within every study in the meta-analysis pointed to amplitude changes taking place during the first year of life. Further, this is supported by very recent findings of Chen et al. (2022), Liu et al. (2023), Schaadt et al. (2022, 2023), and Werwach et al. (2022) involving infants under the age of 1 year. The non-existent pooled effect size can thus be explained in several ways, which also highlight the shortcomings of the current study.
Firstly, Key and Yoder (2013) pointed out that the oddball MMN may not be detectable in adults even when behavioral evidence of sound discrimination is present, because other ERPs may temporally overlap with the MMN and therefore mask it. They emphasize that an undetectable MMN is even more likely to occur in the child population (Key and Yoder, 2013). A similar point is made by Shafer et al. (2010), who hypothesized that co-existing positive and negative waveforms may cancel each other out. This may partially explain the substantial number of non-existent effect sizes in the current data. More specifically, a non-existent effect size may reflect neither the stability of the MMR during development nor its absence, but rather the inability to measure it. Further, the inability to detect an MMR may introduce a bias into developmental work when datasets with no registered MMR are excluded (Key and Yoder, 2013).
Another explanation for missing effect sizes comes from the polarity change. During a polarity change the MMR amplitude must pass zero at some point, which makes it difficult to say whether the amplitude was truly not different from zero or the measures were taken while it was passing zero (Marklund et al., 2019). It is possible that in at least some of the studies in the present meta-analysis the data were collected in the timeframe where the MMR amplitude was passing zero.
Yet another explanation for missing pooled effect sizes is the possible non-linear nature of the development of the MMR amplitude (Cheour et al., 2000). This means that the amplitude does not change steadily, but that there are periods of slow and fast development even over the course of a couple of months. Therefore, at least some studies might have measured the MMR during a slower developmental period, and these studies outweigh the ones that captured the faster period. Additionally, the random-effects model used for pooling the effect sizes assumes a linear change of the dependent variable. If the change is not linear, then the current model does not reflect the reality of the amplitude change of the MMR.
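The consequence of assuming linear change can be shown with a minimal sketch. It does not reproduce the meta-analytic model used here; it only fits an ordinary least-squares line to a hypothetical inverted-U trajectory. The ages, the amplitude curve and the function name are all illustrative assumptions.

```python
def ols_slope(x, y):
    """Slope of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical ages in months and a hypothetical inverted-U amplitude
# curve that peaks at 6 months; these are not data from the review.
ages = list(range(0, 13))
amplitude = [4 - (age - 6) ** 2 / 9 for age in ages]

slope = ols_slope(ages, amplitude)                # linear summary of change
total_range = max(amplitude) - min(amplitude)     # actual size of change
```

In this constructed case the fitted slope is essentially zero although the amplitude rises and falls by 4 units, so a linear summary would report no change where a substantial non-linear change exists.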
Finally, it seems that generalization at this level may have erased some important changes that are taking place in the MMR amplitude. This is due to the heterogeneity of the data used in modeling. As we wanted to be thorough and avoid any possible bias, we included every result available in the literature and added it to the meta-analytic model. This led to an unexpected problem with studies presenting effect sizes in opposite directions. For instance, the study of Slugocki and Trainor (2014) contributed three observations to the current dataset: the effect size denoting the change of amplitude in the frontal area (Hedges g = 1), the central area (Hedges g = 0.6) and finally the occipital area (Hedges g = −0.8). The pooled effect size for this study was 0.12. It seems that the opposite results in this case have leveled one another out. A second example can be drawn from Jansson-Verkasalo et al. (2010), where the direction of the effect size was opposite for two different deviants. The native vowel evoked an MMR that slowly became more negative (Hedges g = −0.15), and the non-native vowel evoked an MMR that changed substantially during the same time-frame and reversed polarity (Hedges g = 0.95). However, the pooled effect size for this study was 0.13, reflecting the same leveling-out as in the case of Slugocki and Trainor (2014).
A possible solution to this issue is to include measurements only from the frontal and central areas, where the MMR is supposedly most prominent. However, this may introduce a bias towards a significant result (Brooker et al., 2020) or leave out important changes taking place in other areas of the brain, as we discussed previously. Another solution might be to group the results not by study but by some other factor (which we tried to achieve in the sub-group analysis). For example, the data can be organized by the type of change between the standard and the deviant stimulus. This makes it possible to investigate the specific developmental trajectory of the detection of an acoustic or linguistic change, for example the perception of duration. Currently, though, the available data is too scarce to examine at this level of detail.
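The leveling-out of opposite effect sizes can be sketched with a simple inverse-variance weighted mean. This is not the multilevel model used in the present analysis, and the sampling variances below are hypothetical placeholders (the actual variances are not given in this section), so the result is not expected to match the reported study-level estimate of 0.12. Only the three Hedges g values are taken from the text.

```python
def weighted_mean(effects, variances):
    """Inverse-variance weighted mean of a set of effect sizes."""
    weights = [1.0 / v for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)


# Hedges g values reported above for Slugocki and Trainor (2014):
# frontal, central and occipital areas.
effects = [1.0, 0.6, -0.8]

# Hypothetical, equal sampling variances (an assumption for illustration).
variances = [0.1, 0.1, 0.1]

pooled = weighted_mean(effects, variances)
mean_magnitude = sum(abs(g) for g in effects) / len(effects)
```

Under these assumed weights the pooled value is roughly 0.27 while the average magnitude of the three observations is 0.8, which shows how observations of opposite sign pull a pooled estimate towards zero.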

The variance in the meta-analytic model
Based on the meta-analytic model, over half (51.85%) of the variance in the current results is caused by sampling error (Fig. 9). The second level of variance represents the true effect size differences within clusters, which contributes 9.11% of the variance. Finally, the third level reflects the between-study variation, which contributes 39.03% of the total variance.
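One common way to obtain such percentages in a three-level model is to compare the two estimated variance components with a "typical" sampling variance. The sketch below follows that general approach under stated assumptions: the formula for the typical sampling variance is the one commonly used for heterogeneity statistics, and the function names, sampling variances and variance components are hypothetical. It is not claimed to be the exact computation behind Fig. 9.

```python
def typical_sampling_variance(variances):
    """Typical within-observation sampling variance from inverse-variance weights."""
    k = len(variances)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    sum_w_sq = sum(w ** 2 for w in weights)
    return (k - 1) * sum_w / (sum_w ** 2 - sum_w_sq)


def variance_shares(variances, tau2_level2, tau2_level3):
    """Percentage of total variance at each level of a three-level model."""
    v_typical = typical_sampling_variance(variances)
    total = v_typical + tau2_level2 + tau2_level3
    return {
        "level 1 (sampling error)": 100 * v_typical / total,
        "level 2 (within study)": 100 * tau2_level2 / total,
        "level 3 (between study)": 100 * tau2_level3 / total,
    }


# Hypothetical inputs: five observations with equal sampling variance
# and made-up variance components for levels 2 and 3.
shares = variance_shares([0.2] * 5, tau2_level2=0.05, tau2_level3=0.15)
```

With these made-up inputs the shares are 50%, 12.5% and 37.5%; the three percentages always sum to 100, apart from rounding.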
The amount of sampling error is an expected outcome in child studies. Moreover, the current sample in the meta-analysis includes children of different ages. Nevertheless, the sampling error should be reduced by increasing the number of participants, replicating studies, and randomizing the recruitment procedure. A bigger sample may not, however, solve the problem of inter-subject variability, as demonstrated in Kuuluvainen et al. (2016), where relative sample homogeneity still produced a large variation of results. It may be that variability exists not only between methodologies but also within them, suggesting that other possible mediators play a role. All in all, a more precise protocol is needed to draw more general conclusions.
A more interesting finding was that within-study heterogeneity contributed only about one tenth of the total variation in the results. As noted earlier (Section 4.3.1), some studies exhibited notable differences in their effect sizes, even displaying opposing directions. Despite these disparities, the results within individual studies are generally consistent. Finally, another expected finding was the large between-study variation, which further confirms the impact of the different methodologies used by researchers on the results.

The sub-group analysis
The sub-group analysis revealed two significant main effects: brain region and participant age (Table 3). Additionally, there was one significant interaction between the participant age and the (log-normalized) mean of the MMR response time-window.
The main effect of the brain region was expected and unexpected at the same time. On the expected side, several child studies have confirmed that the development of the MMR amplitude differs between brain regions (e.g., Chen et al., 2016; Trainor et al., 2003). Additionally, we found that the effect sizes in the frontal, central and fronto-central areas do not differ from each other, which suggests that the changes in the MMR amplitude are similar in these regions. This is expected, because it is generally known that the neural activation of the MMN is concentrated in the fronto-central areas in adults (Duncan et al., 2009), and child studies show a similar distribution of neural activity over the frontal, central and fronto-central areas (e.g., Cheng et al., 2015; Kushnerenko et al., 2002; Schaadt et al., 2015; Slugocki and Trainor, 2014; Werwach et al., 2022). It must be noted, though, that in certain conditions there might be different developmental trajectories for the frontal, central and fronto-central areas as well (Liu et al., 2014).
The post hoc test additionally revealed that the effect sizes in the central and fronto-central regions are not different from the effect size of the "other" region. One might expect the changes in the MMR amplitude in these areas to take place differently. The lack of a difference may arise because the group named "other" was a set of different electrodes and/or regions that did not fit under the frontal, fronto-central or central category. This division was based on the assumption that most of the activity concerning the MMR happens in the three named regions and less in other regions. Further, adding more categories would have resulted in an unequal number of observations within groups; for instance, there would have been only 4 observations from 2 studies under the category of "occipital region". Therefore, it remains unclear whether there is truly no difference between the fronto-central and central areas and the "other" regions.
The main effect of the participant age on the effect size was expected. More specifically, the effect size seems to decrease as age increases. This is in accordance with the general knowledge that the development of the auditory system and language perception is intense in the earlier years. Therefore, the changes taking place in the MMR amplitude are larger as well. Not many studies have observed MMR maturation over a longer period, but those that have are in agreement with the current findings and suggest that more rapid change takes place during infancy than during toddlerhood or the year following (Jing and Benasich, 2006; Lee et al., 2012; Linnavalli et al., 2018; Paquette et al., 2013, 2015; Virtala et al., 2022). Note that studies still report an ongoing change in the MMR amplitude even at school age, and this is explained by the level of task difficulty (Putkinen et al., 2014; Schaadt et al., 2015).
A significant interaction was found between the participant age and the (log-normalized) mean of the MMR response time-window, where the effect size decreased while both variables increased. This could be interpreted to mean that in younger children larger effect sizes existed even at late time-windows, whereas in older children the effect sizes are small in the later time-windows, pointing to a possible latency decrease of the MMR. This result is expected because, as the structures of the brain develop and language experience grows, the auditory discrimination responses become more automated and efficient, and therefore faster than at earlier ages. A reduction of the MMR latency has been found by several previous studies as well (Alatorre-Cruz et al., 2023; Choudhury and Benasich, 2011; Jing and Benasich, 2006; Morr et al., 2002; Shafer et al., 2000).
Ultimately, there is a wide range of methodologies employed to elicit the MMR, posing challenges in comparing studies and drawing generalizations. At the same time, it is important to recognize the advancements in technology and methods for acquiring MMRs, as well as the evolving analysis techniques over time. Furthermore, each research group has its own distinct interests and methodologies, which are rooted in theoretical foundations. All these factors contribute to the diversity of methodologies employed, and some of these factors are not suitable for standardization. The preceding discussion offered suggestions for creating consistent guidelines where standardization might be possible, which is very important if the MMR is used to investigate auditory maturation. Further, having a proper benchmark will help assess the role of auditory stimulus discrimination in various developmental language disorders in the future.

Fig. 1. Flow chart showing each step of the process to identify and select relevant articles.
age, long = longitudinal, cross = cross-sectional group comparison, oddb. = oddball paradigm, MFP = multi-feature paradigm, no control = no control condition for the oddball paradigm was used, control yes = a control condition for the oddball paradigm was used, N dev = number of a single deviant; % acc. or min = percentage of accepted deviant trials or the minimum for data acceptance; m = month, F = frontal region, C = central region, P = parietal region, T = temporal region, O = occipital region, L = left hemisphere, R = right hemisphere.

and three 2000. Other research groups presented the number of stimuli between 240 and 5000. The most frequently used deviant frequencies among standards were 15% (14 studies) and 20% (13 studies). The other frequencies ranged from 8% to 50%. More specifically, the majority used a number of deviants in the range of 100-200 (Table 2). Fewer deviants were used by Ní Choisdealbha et al. (2022; N = 41) and Niemitalo-Haapola et al. (2015; each deviant presented 54 times). A larger number of deviants was used by ten studies (Čeponienė et al. , and Varga et al., used randomly varying ISI between ca 190-290 ms. Nan et al. (2018) used ISI ranging from 600 to 800 ms.

Fig. 2. Funnel plot of effect sizes within studies. The effect sizes expressed as the standardized mean difference (Hedges g) are presented on the x-axis. The standard errors are presented on the y-axis. The contoured colors denote the significance level of each study in the plot.

Fig. 3. The risk of bias of study reporting. The x-axis reflects the proportions of studies categorized as having a low, moderate, or high risk of bias in the reporting. The numbers on the bars show the number of studies in the categories. The y-axis displays the different bias domains that were judged within each study.

Fig. 4. Summarized information about methodologies used to elicit the MMR. The y-axis and colors represent the proportion of different characteristics used within each domain and the values in the brackets represent the actual number of studies. The x-axis shows different domains under inspection.

Fig. 6. Speech stimuli used in the 51 studies included in the methodology review.

Fig. 7. Effect sizes within each study. The dotted line in the middle represents zero within the effect size scale. The squares in every datapoint represent the value of the effect size and the vertical reference lines the confidence intervals. The diamond at the bottom represents the average effect size and the length of the diamond the confidence intervals (DOI 10.17605/OSF.IO/VE3J6).

Fig. 9. Heterogeneity on different levels of the model. The upper row represents the estimated variance caused by sampling error. The lower row represents the estimated variance which is not caused by sampling error. Variance on level 2 represents the variance of observations within each study. Variance on level 3 represents the variance of observations between studies.

Table 1
Inclusion and exclusion criteria for studies in the systematic literature review.

Table 2
Summarized information of studies included into the methodology review and meta-analysis.
L. Themas et al.

Table 3
meta-analytic multilevel random effect model.

Jansson-Verkasalo et al. (2010) exposed preterm and full-term infants to native and non-native vowels and reported that the perceptual narrowing took place later in the preterm group. Ovchinnikova et al. ( sounds