High contextual interference improves retention in motor learning: systematic review and meta-analysis

The effect of practice schedule on retention and transfer has been studied since the first publication on contextual interference (CI) in 1966. Although strongly advocated by scientists and practitioners, the CI effect has also raised some doubts. Therefore, our objective was to review the existing literature on CI and to determine how it affects retention in motor learning. We found 1255 articles in the following databases: Scopus, EBSCO, Web of Science, PsycINFO, ScienceDirect, supplemented by the Google Scholar search engine. We screened full texts of 294 studies, of which 54 were included in the meta-analysis. In the meta-analyses, two different models were applied, i.e., a three-level mixed model and a random-effects model with averaged effect sizes from single studies. According to both analyses, high CI has a medium beneficial effect on the whole population. These effects were statistically significant. We found that the random practice schedule in laboratory settings effectively improved motor skills retention. In contrast, in the applied setting, the beneficial effect of random practice on retention was almost negligible. The random schedule was more beneficial for retention in older adults (large effect size) and in adults (medium effect size). In young participants, the pooled effect size was negligible and statistically insignificant.


Inclusion and exclusion criteria
Eligible studies were identified based on PICO (Population, Intervention, Control, Outcome): Population: young, adult, novice, experienced. Only healthy participants, as classified by authors of the primary studies, were included. We did not include studies on disabled participants. The population criteria were divided into two variables. The first variable was related to the age of the participant. Participants under 18 years old were classified as Young, and those between 18 and 60 years old as Adults. Older Adults were participants over 60 years old. The second variable was experience. We recognized Novice and Skilled (Experienced) participants according to the authors' statement.
Intervention: high CI (random/interleaved schedule); field setting. Control: low CI (blocked schedule/repetitive practice); laboratory settings. The studies utilized a wide variety of motor tasks and experimental procedures. However, only the single-task procedure (as opposed to the dual-task procedure, only one task is performed at a time) was considered relevant for this review.
The main category considered for analysis was contextual interference: studies including groups with different practice order, i.e., random schedule (high CI) and blocked schedule (low CI), were compared.
A subgroup analysis was performed. Firstly, the intervention category was related to the nature of research: studies conducted in a field setting using typical sports skills (applied research) were compared with studies carried out in a controlled laboratory environment (basic research). The second category of subgroup analysis was the age of the participants: young (< 18 y), adults, and older adults (≥ 60 y). The third category was experience: experienced participants vs. novices.
Outcome: retention test results. The primary outcomes were the standardized effect sizes of CI in retention in motor learning. The outcomes evaluating retention of the learned motor skill were considered eligible. Considering the effect of sleep-induced consolidation of trained skills 27,28, the meta-analysis consisted of delayed retention results only, i.e., results of the tests performed after 24 h, while the systematic review consisted of immediate and delayed retention test results. We assumed that most of the analyzed study participants should have slept between acquisition and retention tests during the last 24 h. As Diekelmann and Born 27 (p. 114) noticed, "Sleep has been identified as a state that optimizes the consolidation of newly acquired information in memory, depending

Search methods and selection procedure
AW performed the search on the following databases: Scopus ("contextual interference" in Title OR Abstract OR Keywords), EBSCO ("contextual interference" in Title OR Abstract; no Keywords option), Web of Science ("contextual interference" in Topic), ScienceDirect ("contextual interference" in Title OR Abstract OR Keywords), in April 2020 (for the period 1966 to 2020), updated in September 2021 (period 2020 to 2021) and November 2022 (period 2021 to 2022). Additionally, relevant studies were searched for using the Google Scholar search engine ("contextual interference" in Title OR Abstract OR Keywords). The PsycINFO (EBSCO) database was searched by SC ("contextual interference" in Title OR Abstract OR Keywords).
For a reliable risk-of-bias assessment, the "grey literature" (i.e., Ph.D. dissertations and conference proceedings) available online in the searched databases was excluded, as were studies in languages other than English.
Given the large number of retrieved citations, we applied a method proposed by Dundar and Fleeman 29. AW evaluated all the titles, keywords, and abstracts of the studies against the inclusion and exclusion criteria, and a random sample was cross-checked by a senior researcher (SC). Ineligible articles were excluded. Duplicates of identified studies were removed.
Two co-authors (AW and PS) read the full text of the studies, independently assessing the papers for final eligibility. Any discrepancies between the two authors were discussed with the senior researcher (SC) to reach a consensus.

Data collection and analysis
AW and PS summarized relevant data in purpose-built MS Excel data extraction forms during the screening. Each entry consisted of study characteristics: the authors' names, the title of the study, year of publication, journal title, and number of experiments (in case of multiple experiments in the same publication). Further details were based on PICO criteria: Population (number of participants, age, expertise level, gender); Intervention (nature of research, practice schedule, type of motor task, testing procedure, dependent variable); Objectives/outcomes (extracted means and standard deviations for all groups and all measures: immediate retention results and delayed retention results). Only the results of the first block in the retention testing procedure were considered for extraction, as we assumed that the following blocks may promote further learning. If the SEM (standard error of the mean) was available, we converted it into SD. Similarly, if quartiles were available, we converted these using the Mean Variance calculator 30-33. Whenever required, the signs of the effect sizes were reversed to ensure that a positive value always favors random practice.
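The SEM-to-SD conversion mentioned above follows from the definition SEM = SD / sqrt(n). A minimal sketch is given below; the function name and the example values are hypothetical and are not taken from the extraction forms.

```python
import math


def sem_to_sd(sem: float, n: int) -> float:
    """Convert a standard error of the mean to a standard deviation.

    Uses SD = SEM * sqrt(n), where n is the size of the group
    the SEM was computed from.
    """
    if n < 1:
        raise ValueError("n must be a positive group size")
    return sem * math.sqrt(n)


# Hypothetical group: SEM of 0.5 s reported for 16 participants.
print(sem_to_sd(0.5, 16))  # 2.0
```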
We included the results from both blocked and random scheduled retention tests. If participants were tested in both conditions, both results were included independently. In addition, based on the Quality Assessment Tool for Quantitative Studies (described in the following section), study quality indicators were included (covering selection bias, study design, confounders, blinding, data collection methods, withdrawals and dropouts, and global rating).
Since the included studies utilized different motor skills (tasks), and retention was measured using different scores (numbers, percentages) or units (seconds, meters, number of cycles, etc.), we summarized the analysis using the standardized mean difference (SMD) as an effect size measure, i.e., Hedges' (adjusted) g, which is very similar to Cohen's d but includes an adjustment for small sample bias 34,35.
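As an illustration of the effect size described above, the sketch below computes Hedges' g from group means, SDs, and sample sizes. It assumes the usual pooled-SD form of Cohen's d, the common approximate correction factor J = 1 - 3 / (4 df - 1), and a standard large-sample variance approximation; the review does not state which exact formulas its software used. The function name, the `lower_is_better` flag, and the example numbers are hypothetical. The flag mirrors the sign convention that a positive value favors random practice.

```python
import math


def hedges_g(mean_random, sd_random, n_random,
             mean_blocked, sd_blocked, n_blocked,
             lower_is_better=False):
    """Return (g, var_g) for random vs. blocked practice.

    Positive g favors random practice. Set lower_is_better=True for
    error or time scores, where a smaller mean is the better result.
    """
    df = n_random + n_blocked - 2
    pooled_sd = math.sqrt(
        ((n_random - 1) * sd_random ** 2 + (n_blocked - 1) * sd_blocked ** 2) / df
    )
    d = (mean_random - mean_blocked) / pooled_sd
    if lower_is_better:
        d = -d  # flip the sign so that positive still favors random practice
    j = 1 - 3 / (4 * df - 1)  # small-sample correction (approximation)
    n_total = n_random + n_blocked
    var_d = n_total / (n_random * n_blocked) + d ** 2 / (2 * n_total)
    return j * d, j ** 2 * var_d


# Hypothetical retention errors: random group 10 (SD 2), blocked group 12 (SD 2),
# ten participants per group, lower error is better.
g, var_g = hedges_g(10.0, 2.0, 10, 12.0, 2.0, 10, lower_is_better=True)
print(round(g, 3), round(var_g, 3))
```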
Heterogeneity among the studies was evaluated using I² statistics. The interpretation of I² is as follows: 30% to 60% represents moderate heterogeneity; 50% to 90%, substantial heterogeneity; and 75% to 100%, considerable heterogeneity 36. However, thresholds for interpretation can be misleading 37.
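For orientation, the classic single-level I² can be derived from Cochran's Q as I² = max(0, (Q - df) / Q) × 100%. The sketch below shows only this textbook form; the three-level analysis in this review partitions I² by level, which this simplified example does not attempt. The function name and inputs are hypothetical.

```python
def i_squared(effects, variances):
    """Classic I² (in percent) from effect sizes and their sampling variances."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Two hypothetical outcomes with equal sampling variance.
print(i_squared([0.0, 1.0], [0.1, 0.1]))
```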
The following guidelines were applied when interpreting the magnitude of the SMD in the social sciences: small, SMD = 0.2; medium, SMD = 0.5; and large, SMD = 0.8 38.
Given that the studies included in our analyses could yield more than one outcome, we decided to use a model that accounts for various sources of dependence, such as within-study effects and correlations between outcomes. Since there is no universally accepted method for this, we opted to apply two different approaches, each with its own advantages and disadvantages. The first approach is based on a three-level mixed model. The second approach is a classical random-effects model based on averaged outcomes (SMDs) from a single study.
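The second approach can be sketched in two steps: average the SMDs within each study, then pool the study-level values with a random-effects model. The sketch below rests on assumptions that the text does not confirm: it uses the DerSimonian-Laird estimator for the between-study variance, and it takes the mean of the outcome variances as a simple, conservative stand-in for the variance of the averaged SMD, since the correlations between outcomes are unknown. All names and numbers are hypothetical.

```python
from collections import defaultdict


def average_by_study(outcomes):
    """outcomes: iterable of (study_id, smd, variance).

    Returns {study_id: (mean_smd, mean_variance)}. Using the mean variance
    is an illustrative simplification, not the review's documented method.
    """
    grouped = defaultdict(list)
    for study_id, smd, variance in outcomes:
        grouped[study_id].append((smd, variance))
    return {
        study_id: (
            sum(s for s, _ in rows) / len(rows),
            sum(v for _, v in rows) / len(rows),
        )
        for study_id, rows in grouped.items()
    }


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling; returns (pooled, tau2, se)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, tau2, se


# Hypothetical data: study A contributes two outcomes, study B one.
per_study = average_by_study([("A", 0.2, 0.1), ("A", 0.4, 0.1), ("B", 1.0, 0.1)])
effects = [smd for smd, _ in per_study.values()]
variances = [var for _, var in per_study.values()]
pooled, tau2, se = random_effects_pool(effects, variances)
print(round(pooled, 3), round(tau2, 3), round(se, 3))
```

Averaging first keeps one value per sample, which avoids the dependency between outcomes at the cost of discarding within-study variation.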

Three-level mixed model
A three-level mixed model, which uses (restricted) maximum likelihood procedures 39,40, was computed. The model considers the potential dependence among the effect sizes, i.e., when there are multiple outcomes (effect sizes) from one study. The model assumes that the random effects at different levels and the sampling error are independent. The first level of the model refers to variance between effect sizes among participants (level 1). The second level refers to outcomes, i.e., effect sizes extracted from the same study (level 2; within-cluster variance). The third level refers to studies (level 3; between-clusters variance) 39. Given that the second level in the model accounts for sampling covariation, the benefit of this model is that it is not crucial to know or estimate correlations between outcomes extracted from one study 39,41.
Sensitivity analysis was performed using Cook's distances (D). Outcomes with distances greater than 4/n (where n was the number of outcomes) were removed to assess how these outliers influence the pooled effect. Meta-analyses
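The 4/n rule for Cook's distances can be illustrated with a short sketch. It assumes the distances have already been computed by the fitting software (obtaining them for a three-level model requires refitting and is not shown); the function name and the example values are hypothetical. With the 194 outcomes of the main analysis, the threshold would be 4/194 ≈ 0.021.

```python
def flag_influential(cooks_distances):
    """Return indices of outcomes whose Cook's distance exceeds 4/n."""
    n = len(cooks_distances)
    if n == 0:
        return []
    threshold = 4.0 / n
    return [i for i, d in enumerate(cooks_distances) if d > threshold]


# Eight hypothetical outcomes, so the threshold is 4/8 = 0.5.
distances = [0.01, 0.02, 0.6, 0.03, 0.04, 0.05, 0.7, 0.01]
print(flag_influential(distances))  # [2, 6]
```

The flagged outcomes would then be dropped and the model refitted to see how far the pooled effect moves.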

Reasons for exclusion of individual experiments or particular groups of participants
Not all experiments from some studies were included, only those that complied with the inclusion criteria. The reasons for exclusion are provided below.
In the study by Schorn and Knowlton 4, two experiments were performed. Only the first one was included in the present review. The second experiment examined transfer. Furthermore, the authors did not provide details regarding the number of participants in each of the four groups. These details were provided neither in the paper nor by the authors in electronic correspondence. Another study consisting of more than one experiment was an article by Ste-Marie and colleagues 46, where the CI effect on learning handwriting skills in young participants was examined. Only results of (immediate) retention testing were available from the third experiment, as the results available in the text regarding the second experiment mainly focused on transfer. It was not possible to obtain the results from the text in the first experiment.
Of the two experiments conducted in the study by Porter and Magill 47, results from the first experiment could be obtained. In the case of the study by Shea and colleagues 48, we were not able to obtain the absolute timing error results (reported as non-significant) from the authors. The second experiment's results from that study were not included because the group characteristics (group of ratio-feedback/blocked and random and group of segment-feedback/blocked and random) were not compliant with PICO.
Chua and colleagues conducted three experiments, of which two (the second and the third one) were included in the present study. The first experiment was not included as it described constant and variable practice groups instead of blocked and random groups: constant group participants were divided into three subgroups, and each subgroup threw from one of the distances (4 m, 5 m, or 6 m) for a total of 60 practice trials, whereas variable group participants threw from all three distances (4, 5, and 6 m) during practice. The order of distances was predetermined and quasi-random, with the constraint that each distance occurred 20 times 49. The study of Broadbent and colleagues 50 consisted of two experiments. In the first experiment, the authors applied a dual-task procedure, so it did not comply with PICO. The second one, besides the CI effect, involved Stroop procedures, which resulted in four groups (blocked-Stroop, blocked, random-Stroop, random). Hence, the results of two groups (random and blocked, without the Stroop procedure) were chosen for analysis. Although the tasks in this study were purely cognitive, they form part of motor tasks: anticipation is an important part of each motor task (e.g., where an opponent will be in a second and where a player should pass the ball). Sherwood 51 conducted two experiments utilizing a rapid aiming task. However, the results of the second experiment were excluded from the present review, as these were considered to be transfer results.
In an experiment by Beik and colleagues 52, participants were randomly assigned to six groups: blocked-similar, algorithm-similar, random-similar, blocked-dissimilar, algorithm-dissimilar, or random-dissimilar. According to PICO, retention results of four groups (blocked-similar, random-similar, blocked-dissimilar, random-dissimilar) were considered applicable for the present review. Similarly, out of the four groups (blocked only, random only, blocked-then-random, and random-then-blocked) involved in the study by Wong and colleagues 53, two groups were chosen for the purpose of the current review: the blocked schedule group and the random schedule group.
A study by Porter and colleagues 54 described acquisition and retention of participants randomly assigned to three practice groups: high, moderate, and low CI group. For the present review, the retention results of two groups (high CI and low CI) were extracted. A further study by Porter 55 consisted of three groups: blocked, random, and increasing-CI practice schedules. Only blocked and random schedule groups were included.
In the study by Del Rey and colleagues 56, the CI effect was examined in a key-pressing task performed by five practice groups. The groups differed in terms of the administered amount of CI and the presence (or absence) of retroactive inhibition. Retention results of two groups (random, blocked-without retroactive inhibition) were chosen for the purpose of the present study.
French and colleagues 57, in a study on CI in learning volleyball skills, randomly assigned participants to three acquisition groups: random, random-blocked, or blocked practice. The random and the blocked practice groups were included in the present review. Similarly, in an article by Goodwin and Meeuwsen on the CI effect in learning golf skills 58, three groups of participants were tested: learning in random, blocked-random, or blocked practice condition. Consistently, only blocked and random practice schedule groups were included in the current review.

Results of quality assessment of included studies
The results of the methodological assessment of the studies included in our systematic review are highlighted in Table 2. Only three articles presented moderate 4,46 or high 59 methodological quality according to the Quality Assessment Tool for Quantitative Studies 44. The primary studies failed mainly on the following criteria: 44 articles scored a weak rating in the Selection Bias section, and 50 studies scored a weak rating in the Withdrawals and Drop-outs section. Such relatively strict evaluation results could be explained by the fact that two weak ratings were enough to automatically determine a weak classification of an article in its global rating for all six components of the checklist.
Only studies rated strong and moderate should be included in the meta-analysis 26. However, excluding studies rated as weak would make our analysis rather dubious (with only two studies included). Therefore, we have included fifty-four articles in the meta-analysis. Accordingly, the impact of this decision on heterogeneity was considered.

Findings
Only delayed retention testing results were included in the present meta-analysis, yielding 194 effect sizes. Outcomes from 54 studies were included in the meta-analysis, resulting in testing of 2068 participants and yielding 6183 measurements in total. A broad range of variables was involved: time (decision time, absolute error time, variable time, reaction time, response time, completion time), distance (accuracy error distance, absolute error distance, median pathway traveled), number of performed movements, and accuracy (accuracy scores, proficiency percentage). Outcome measures evaluating retention of the learned motor skills were presented in various units: meters, seconds, percentages, or scores.
In ten studies, the results of both testing procedures (immediate and delayed) were presented. In a study by Beik 60, in addition to immediate testing, participants took part in delayed testing 24 h after their last acquisition session. In Kim's 61 article, immediate retention testing was performed 6 h after acquisition, whereas the time interval between acquisition and delayed retention was 24 h. In a study by Porter 62, immediate testing results and 7-day delayed testing results were presented. In an experiment by Kaipa and colleagues 63, delayed retention testing was performed six days after acquisition. In an article by Parab and colleagues 64, young participants took part in the following testing strategy: immediate testing, delayed testing (24 h after the last acquisition session), and finally testing seven days after the last acquisition session. Wong and colleagues 53 applied both testing procedures: immediate and delayed (48 h after the last acquisition session). Porter and Saemi 55 described immediate and delayed (48 h) testing results. In an experiment by Li and Lima 65, participants were involved in two testing procedures: immediate and delayed (24 h after the last acquisition session). A study by Sherwood 51 described two testing strategies: immediate and 24 h delayed; similar procedures could be found in research by Green and Sherwood 66.

Laboratory versus applied setting-comparison characteristics
Fifty-four studies were included in a laboratory vs. applied settings comparison. Five of these studies described results of immediate retention testing, of which two were conducted in laboratory settings 56,102 and three in applied settings 46,103,104. Thirty studies with reported results of delayed retention testing were carried out in the laboratory (89 effect sizes). In contrast, twenty-four studies describing delayed retention were conducted in applied settings (105 effect sizes).
In the study of Broadbent 75, acquisition and retention of tennis skills were performed in laboratory settings. Learning and testing of golf skills and throwing were assessed by Chua et al. 49. Similarly, in the study by Porter et al. 54, participants practiced golf skills in the laboratory setting. In the study of Jeon 74, virtual reality-based balance tasks were performed using the Nintendo Wii Fit system. Moreno et al. 77 focused on throwing in the laboratory setting: side throwing, low throwing, and darts throwing skills were learned and tested. Throwing and kicking skills presented in the experiment of Pollatou et al. 79 were performed on two apparatuses specially invented and constructed to measure the selected motor skills.
The study of speech training and testing by Wong et al. 53 took place in a laboratory setting. The age of the participants in all the aforementioned laboratory experiments ranged from 11 years 75 to 82 years 74.
Laparoscopic skills acquisition and testing were performed on medical students and post-graduate residents using a virtual reality simulator mimicking regular laparoscopic tasks 87 or using an FLS Box trainer, in accordance with the Fundamentals of Laparoscopic Surgery (FLS) program 86. The experiment utilizing the Pawlata roll skill 105 took place in an indoor pool.
The age of the participants in the group of applied studies ranged from 6 years 83 to 34 years 62. Adult participants were primarily students. An article by Souza and colleagues 106 was the only study describing the retention of motor skills in older adults (65-80 years old) in an applied setting. The motor task performed in the study consisted of throwing a boccia ball to three targets. However, due to missing data, this study was excluded from the meta-analysis.

The CI effect in youth vs. adults vs. elderly adults-comparison characteristics
All participants included in the present review were from 6 years 83 to 82 years old 74. The analyzed age subgroups were: young (up to 18 years old), adults (18 to 59 years old), and older adults (60 years and older). The articles covering immediate retention reported results from 68 children and 90 adults. In the delayed retention studies, 418 young participants, 205 older participants, and 1425 adults were included in further analyses. In the article by Tsutsui 82, the authors presented results of 20 participants from 15 to 22 years old; therefore, the results of this study were not included in the age subgroup analyses.

The CI effect in novice vs. experienced participants-comparison characteristics
In his meta-analytic study, Brady initially compared the CI effect between skilled and novice participants. He classified their skill levels based on how the studies' authors labeled them 7. We applied the same rule in our review. Consequently, we classified participants of five studies as skilled (n = 202). Participants of these studies were characterized as follows.
In an article by Porter and Saemi, the skill level of participants was characterized in the following way: "participants (…) were considered moderately skilled at passing a basketball, which involved skills they were taught in their respective basketball course. None of the participants played college or professional basketball; however, some participants acknowledged that they did play basketball recreationally from time to time" 55 (pp. 64-65). Participants in the experiment on CI in learning throwing skills by Tsutsui and colleagues 82 were labeled as high-level pitchers (n = 10) or low-level pitchers (n = 10). They were assigned to the aforementioned groups based on pretest scores. In their experiment, Broadbent and colleagues 107 described the CI effect on young participants' learning of tennis skills. Based on their skill level, athletes were classified as intermediate. Years of participants' experience in tennis ranged from 5.3 ± 2.2 in the blocked group to 5.9 ± 3.1 in the random group. In their study, Frömer and colleagues 76 investigated the CI effect in learning virtual darts throwing. Based on pretest scores, it was apparent that the participants were familiarized with throwing but were not experts. Participants in the study of Porter and colleagues were classified as unskilled: "(…) had less than two years' basketball playing experience (1.1 ± 1.3 years) and no representative level basketball playing experience" 62 (p. 7).
In summary, the aforementioned studies applied different standards for inclusion in the skilled group. For this reason, a meta-analytic comparison of skilled versus novice participants was not performed.
The pooled effect size based on the three-level meta-analytic model was medium, SMD = 0.63 (95% CI: 0.33, 0.93; p < 0.001). The estimated variance components (tau squared) were τ₃² = 0.93 and τ₂² = 0.34 for the level 3 and level 2 components, respectively. This means that I₃² = 66% of the total variation can be attributed to between-cluster heterogeneity and I₂² = 24% to within-cluster heterogeneity. Total I² = 90%. Sensitivity analysis revealed that ten outcomes exceeded the 4/n threshold: one outcome from Beik and Fazeli 94; one from Beik et al. 52; one from Bertollo et al. 78; one from Green and Sherwood 66; one from Immink et al. 96; one from Kaipa and Kaipa 63; one from Lin et al. 92; one from Pasand et al. 70; one from Shea et al. 48; and one from Wong et al. 53.
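The reported variance components and I² shares can be cross-checked with a short calculation. The sketch below assumes the commonly used multilevel decomposition, in which each level's share equals its tau squared divided by the sum of both tau squared values and a typical sampling variance; the review does not spell out its formula, and the typical sampling variance is not reported, so it is back-calculated here from the total I² of 90%. Function names are hypothetical.

```python
def implied_sampling_variance(tau2_level3, tau2_level2, total_i2):
    """Back-calculate the typical sampling variance from the total I² (as a fraction)."""
    return (tau2_level3 + tau2_level2) * (1.0 - total_i2) / total_i2


def level_shares(tau2_level3, tau2_level2, sampling_variance):
    """Return the level-3 and level-2 shares of the total variation (as fractions)."""
    total = tau2_level3 + tau2_level2 + sampling_variance
    return tau2_level3 / total, tau2_level2 / total


# Values reported in the text: tau squared of 0.93 (level 3) and 0.34 (level 2), total I² of 90%.
v = implied_sampling_variance(0.93, 0.34, 0.90)
share_l3, share_l2 = level_shares(0.93, 0.34, v)
print(round(100 * share_l3), round(100 * share_l2))  # 66 24
```

Under this assumption the reported 66% and 24% are reproduced, so the variance components and the I² values are mutually consistent.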
Given that 36 outcomes were retrieved from the study by Parab et al. 64, we also performed a sensitivity analysis removing all 36 outcomes. The result was not substantially different from the full analysis: SMD = 0.69 (95% CI: 0.41, 0.97; p < 0.0001).

Laboratory vs. field-based (applied) studies
The included studies were divided into those carried out in a laboratory setting (n = 30), including 1210 participants (89 effect sizes), and the remaining (n = 24) conducted in an applied setting (105 effect sizes), including 858 participants.
Three-level mixed model. A subgroup analysis of the CI effect in laboratory studies was performed (Fig. 4). The test of moderators turned out to be significant, F(1, 192) = 4.50, p = 0.03.

The CI effect in young vs. adults vs. elderly adults
Fifty-three studies were included in a meta-analytic comparison of the CI effect in three age groups, resulting in the testing of 205 older adults in total, 1425 adults, and 418 young participants. Participants (aged from 15 to 22 years old) from the study of Tsutsui 82 were excluded from this comparison due to difficulty in assigning them to any of the abovementioned age groups. This analysis yielded 6169 measurements in total and elicited 49 effect sizes for young participants, 119 effect sizes for adults, and 24 effect sizes for the group of older adults.
Three-level mixed model. The test of moderators was not significant, F(2, 189) = 1.69, p = 0.19; however, we decided to perform the analysis anyway because the differences between age groups (SMD) were quite substantial.

Discussion
The study's main objective was to determine the effect size of CI on retention in motor learning. When applying the three-level mixed model, we found that the pooled effect size was statistically significant and medium (SMD = 0.63). Analysis with the random-effects model on averaged outcomes for single studies yielded similar results, i.e., a statistically significant medium effect (SMD = 0.71). Our secondary objectives were to estimate the CI effect in laboratory versus non-laboratory studies and to estimate the CI effect in different age groups. Only the analysis of laboratory studies yielded statistically significant results, and the SMD was large (the three-level mixed model SMD = 0.92; the random-effects model SMD = 0.99). The analysis of applied studies turned out to be statistically insignificant, and the effect size was slightly above negligible (SMD = 0.23 in the three-level mixed model and SMD = 0.28 in the random-effects model). In both analyses, random practice was favored.
Lee and White 109 suggested that the CI effect is more conspicuous in laboratory settings as tasks are less motor demanding, more cognitively loaded, lack intrinsic interest, and quickly reach an asymptote. The robustness of the CI effect in a laboratory setting, on the other hand, might be attributed to its well-controlled specification. Additionally, as Jeon and colleagues 74 noticed, the CI effect in laboratory settings was frequently associated with simple tasks. In contrast, CI in more complex tasks was often examined in the field setting. One plausible explanation of this finding may be that the complexity of the sport task, alongside high interference practice, could be too challenging for the information processing system, negatively affecting learning 110,111. Therefore, the CI effect in applied settings may not be so conspicuous.
The analyses in age groups yielded significant results in older adults in both models. The CI effect in motor learning of older adults was large (SMD = 1.45 in the three-level mixed model and SMD = 1.58 in the random-effects model), i.e., a random schedule was more beneficial for retention.
The benefits of implementing random practice in motor learning for adult participants were medium (SMD = 0.63 in the three-level mixed model and SMD = 0.66 in the random-effects model). However, the difference between the blocked and random groups of adults was insignificant in the three-level mixed model, whereas it was significant in the random-effects model. The CI effect in young participants was statistically insignificant, and the SMD was negligible in the three-level model (SMD = 0.02) and small in the random-effects model (SMD = 0.28).
The overall results of our review partially corresponded with those reported in the meta-analysis by Brady 7. In line with the constantly advancing methodology of conducting meta-analyses, the inclusion criteria implemented

Age-and settings-related differences
The most interesting trend found in our and Brady's 7 meta-analyses was that the CI effect is more conspicuous in older adults. However, it has to be emphasized that all included experiments with older adults were performed in a laboratory setting. Conversely, only one primary study 75, including 18 young participants, was conducted in a laboratory setting. The remaining studies, including 400 youth, were performed in an applied setting. Due to this fact, it is difficult to determine whether participants' age plays a crucial role in the CI effect. Perhaps it is the setting that is critical when considering the CI effect? To solve this problem, more studies on children in a laboratory setting and more studies on older adults in an applied setting could be conducted. Given the disparity between settings in different age groups, the overall CI effect in different age groups may be biased. Analogously, one could claim that the results of the settings comparison may be biased, i.e., in the applied setting, only children (except for one study) and no older adults are included. The opposite situation is noticed in a laboratory setting. Ammar and colleagues 18 reported that the CI effect was present only in 20-24 and 25-32-year-old participants (small and moderate ES, respectively), whereas blocked practice was favored in older adults. Again, these contradictions to our findings may be attributed to the different search methods and the number of studies and effect sizes included in both meta-analyses.
to the intervention was answered: "Can't tell". In 56 out of 59 studies, there was no information on whether the assessor(s) was aware of the intervention or the exposure status of participants. In 39 out of 59 studies, there was no information on whether participants were aware of the study research question. Fifty-one out of 59 studies did not report the withdrawal and dropout numbers and reasons. Unfortunately, studies on the CI effect are of poor quality. When considering all of the limitations of the included studies, we cannot tell whether participants were aware of the study research question. One may conclude that studies on CI may be biased. Therefore, the question initially asked by Al-Mustafa 11 and re-asked by Brady 7 has to be restated: is "contextual interference a laboratory artifact or sport-skill related"?

Heterogeneity problem
The possibility of substantial heterogeneity in the analyses was considered when planning our review and introducing the broad PICO criteria. Differences in populations such as age and origin, followed by a variety of included motor tasks and outcome measures, could contribute to increased heterogeneity. Additionally, the number of primary studies (and consequently different methodologies such as experiment duration) could determine the level of heterogeneity. According to Van Aert and colleagues: "none of the publication bias methods has desirable statistical properties under extreme heterogeneity in true effect size" 108 (p. 8). Another source of heterogeneity could be the low quality of the included studies.
There are several possible reasons why the I² values were so high. First, it may be due to the test we used. We used the I² index instead of the Q test that was used by Brady (2004), because the Q test only informs about the presence versus the absence of heterogeneity, whereas the I² index also quantifies the magnitude of heterogeneity 124. Second, in all our analyses, more than 24 outcomes were included; in the general analysis, 194 outcomes were used. The more studies are included in a heterogeneity test, the higher the I² value 125. As Schroll and colleagues 126 noted, if a meta-analysis includes very few studies with relatively few participants, the risk of high heterogeneity as quantified by the I² value (> 50%) is very small, even though heterogeneity may be present. Unfortunately, authors dealing with high heterogeneity cannot do much about it. Increased precision does not solve the problem 127, and there is little advice for authors on how to deal with it 126.
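The contrast between the Q test and the I² index can be illustrated with a minimal sketch. It assumes the standard definition I² = max(0, (Q - df) / Q) × 100% with inverse-variance weights; the function name and the effect sizes and variances are hypothetical and are not taken from the included studies.

```python
def cochran_q_and_i2(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I^2 (in percent)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effect sizes with matching variances")
    # Inverse-variance weights, as in a fixed-effect pooled estimate
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of each effect from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect sizes and sampling variances (illustration only)
example_effects = [0.10, 0.80, 0.45, 1.20]
example_variances = [0.04, 0.05, 0.03, 0.06]
q, df, i2 = cochran_q_and_i2(example_effects, example_variances)
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.1f}%")
```

Q alone would be compared against a chi-square distribution to decide whether heterogeneity is present, whereas I² expresses how much of the observed variation exceeds what sampling error alone would produce.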

Limitations
Firstly, there is a problem of dependency within studies when multiple outcomes from a single study come from the same sample. Unfortunately, there is no simple answer to how to deal with such a problem. The most reliable approach would be to calculate the correlations between outcomes from the same sample using the raw data. This approach is, however, infeasible: many authors do not want to share their results, many have already lost them, and many authors have passed away (given that we were analyzing results from 1966).
In practice, the most popular approaches include averaging effect sizes derived from the same sample, analyzing each type of outcome separately, and applying three-level mixed models. Again, each of these approaches has its advantages and limitations. In our case, we decided that the most suitable would be to apply the three-level mixed model. Though the model assumes that there is no correlation between outcomes (effect sizes) obtained from the same sample, as Van den Noortgate et al. 41 noted, "An important conclusion is that, although the multilevel model we proposed for dealing with multiple outcomes within the same study in principle assumes no sampling covariation (or independent samples), our simulation study suggests that using an intermediate level of outcomes within studies succeeds in accounting for the sampling covariance in an accurate way, yielding appropriate standard errors and interval estimates for the effects" (p. 589). Nevertheless, one may doubt the model since it assumes the effect sizes are independent. As a check on our results, we applied another model, the classical random-effects model, using averaged effect sizes whenever they were derived from the same study (same population). This approach is not affected by the dependency problem; however, it loses informative value since the variance between effect sizes is reduced 41.
Notably, both models we applied yielded very similar results.
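The second approach, averaging effect sizes within a study and then pooling them with a random-effects model, can be sketched as follows. This is only an illustration under stated assumptions: the DerSimonian-Laird estimator is one common choice and is not confirmed as the estimator used in this review, the correlation `r` between outcomes from the same sample is an assumed value, and all function names and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def average_within_study(rows, r=1.0):
    """Collapse (study, effect, variance) rows to one averaged effect per study.

    r is an ASSUMED correlation between outcomes from the same sample;
    r=1.0 is a conservative choice, not a value taken from the review.
    """
    by_study = defaultdict(list)
    for study, effect, variance in rows:
        by_study[study].append((effect, variance))
    averaged = {}
    for study, pairs in by_study.items():
        m = len(pairs)
        mean_effect = sum(e for e, _ in pairs) / m
        # Variance of the mean of m correlated estimates
        total = 0.0
        for i, (_, vi) in enumerate(pairs):
            for j, (_, vj) in enumerate(pairs):
                total += vi if i == j else r * sqrt(vi * vj)
        averaged[study] = (mean_effect, total / m ** 2)
    return averaged


def dersimonian_laird(effects, variances):
    """Return the pooled effect, its standard error, and tau^2."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments between-study variance, floored at zero
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, sqrt(1.0 / sum(w_star)), tau2


# Hypothetical data: study_1 contributes two outcomes from the same sample
rows = [
    ("study_1", 0.20, 0.04),
    ("study_1", 0.60, 0.04),
    ("study_2", 1.00, 0.05),
    ("study_3", 0.10, 0.03),
]
per_study = average_within_study(rows)
pooled, se, tau2 = dersimonian_laird(
    [e for e, _ in per_study.values()],
    [v for _, v in per_study.values()],
)
print(f"pooled = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

Averaging removes the dependency at the cost of discarding within-study variation between outcomes, which is the trade-off described above; a three-level model instead keeps every outcome and adds a variance component for outcomes nested within studies.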
One could suggest analyzing each outcome type separately, e.g., grouping outcomes according to the skill characteristic (throws, aiming, kicking, etc.) or the nature of the outcome (force production, scoring system, reaction time, movement time, kinematic characteristics, etc.). This approach would require a robust theoretical framework to distinguish between outcome types. Readers should weigh the limitations and advantages of our analyses.
Secondly, we did not study practice volume or nominal task difficulty in our analyses. These may be important factors contributing to the overall pooled effect size; however, given that our review is broad in scope, more detailed analyses could be performed in subsequent studies.
Thirdly, an analysis focusing exclusively on experience as a factor affecting the CI effect could be performed. However, this would require a thorough and detailed definition of experience.

Recommendation for future research
There is a limited number of motor learning studies utilizing young (up to 18 years old) healthy participants in a laboratory setting. Most motor learning studies with older adults (60 years and older) are performed in a laboratory setting. Therefore, we recommend further research on the CI effect, including young participants in a basic (laboratory) setting. We would also suggest future research on the CI effect in older adults (60 years and older) conducted in an applied setting.
In subsequent studies, researchers could put a strong emphasis on the quality (methodology) of the research.

Conclusions
The CI effect is a robust phenomenon in motor learning. However, our results, like those of Brady (2004), show that this claim is primarily based on laboratory studies in adults and older adults. Experiments conducted in applied settings yielded less convincing results. Moreover, high CI does not benefit retention in young participants, whereas it does in adults and older adults.
Practitioners should consider other factors, e.g., the interaction between the performer's skill level, motor complexity, cognitive load, and the performer's intrinsic interest, when deciding how to structure practice and how much CI to apply.

Figure 1. PRISMA flow diagram of the search process 45. Flowchart of the primary search (time period 1966 to 2020), updated searches (time period 2020 to 2021 and 2021 to 2022), and the inclusion and exclusion process.

Figure 7. The random-effects model analysis of retention test results of random vs. blocked practice in a laboratory setting. The columns "Random Total" and "Blocked Total" give the number of participants.

Figure 9. The three-level mixed model analysis of adult participants' retention test results: random practice vs. blocked practice. The forest plot presents the retention test results obtained by participants aged 18-59, including various motor tasks and different outcome measures.

Figure 10. The three-level mixed model analysis of older adults' retention test results: random practice vs. blocked practice. The forest plot presents the retention test results obtained by participants aged 60-82, including a variety of motor tasks and different outcome measures.

Figure 11. The random-effects model analysis of young participants' retention test results: random practice vs. blocked practice. The columns "Random Total" and "Blocked Total" give the number of participants.

Figure 13. The random-effects model analysis of older adult participants' retention test results: random practice vs. blocked practice. The columns "Random Total" and "Blocked Total" give the number of participants.

Table 1. Summary of the included studies.

Table 2. Quality assessment of included studies. Q1, Q2, ...: question number according to the Quality Assessment Tool for Quantitative Studies.
Scientific Reports | (2024) 14:15974 | https://doi.org/10.1038/s41598-024-65753-3
The three-level mixed model analysis of retention test results in a random and blocked schedule in a laboratory setting. The forest plot presents the retention test results obtained by participants practicing in a laboratory setting, including various motor tasks and different outcome measures.
The three-level mixed model analysis of random and blocked schedule retention test results in an applied setting. The forest plot presents the retention test results in an applied setting, including various motor tasks and different outcome measures.