Associations between fundamental movement skills and accelerometer-measured physical activity in Chinese children: the mediating role of cardiorespiratory fitness

Background and purpose The associations of fundamental motor skills (FMS), health-related physical fitness (e.g., cardiorespiratory fitness, CRF), and moderate-vigorous physical activity (MVPA) have been demonstrated in Western children, but these associations have not yet been validated in a sample of Chinese children. The aims of this study, therefore, were to examine the association between FMS subdomains and MVPA in a sample of Chinese children and to evaluate whether this association is mediated by CRF. Methods A cross-sectional study consisting of 311 children aged 8–12 years (49.2% girls; mean age = 9.9 years) from Shanghai was conducted. FMS, CRF and MVPA were assessed using the Test of Gross Motor Development-3rd Edition, Progressive Aerobic Cardiovascular Endurance Run and ActiGraph GT3X accelerometers. Preacher & Hayes’s bootstrap method was used to test the mediating effects of CRF on the association between FMS and MVPA. Results CRF fully mediated the association between total FMS and MVPA in girls (indirect effects, b = 0.21, 95% CI [0.07–0.37]), while the mediation was only partial in boys (indirect effects, b = 0.12, 95% CI [0.01–0.26]). CRF fully mediated the association between locomotor skills and MVPA in girls (indirect effects, b = 0.27, 95% CI [0.09– 0.51]), whereas CRF partially mediated the association between object control skills and MVPA in boys (indirect effects, b = 0.15, 95% CI [0.18–0.35]). Conclusion In order to better design and implement sex-specific interventions aiming to increase MVPA, it is essential to consider FMS subdomains and CRF alongside the sex differences in the association between them.

The acquisition of fundamental motor skills (FMS) is a critical component of early childhood. FMS, also known as gross motor skills, are basic, goal-directed movement forms that may be combined and applied to more context-specific skills (Burton & Miller, 1998; Clark, 1994).
These FMS enable children to actively engage with their environment and peers, allowing for diverse experiences that contribute to a well-rounded developmental trajectory (Burton & Miller, 1998; Clark, 1994). Components of motor development are continuously improving and being modified in a developing child; this has cascading effects within the cognitive, physical, neuromuscular, and physiological realms, indicating a critical area of examination (Malina, Bouchard, & Bar-Or, 2004; Piek, Dawson, Smith, & Gasson, 2008). By examining these skills one may identify delays, monitor progress, or plan activities to improve a child's motor competency (Burton & Miller, 1998; Ulrich, 2000). Ample research indicates that accruing a multiplicity of FMS sets the foundation for future movements and allows for translation to lifelong physical activity pursuits (Burton & Miller, 1998; Clark & Metcalfe, 2002; Stodden et al., 2008; Robinson et al., 2015). A dynamic and positive relationship has been posited for FMS competence and physical activity participation and engagement in early childhood through adolescence (Logan, Webster, Getchell, Pfeiffer, & Robinson, 2015; Stodden et al., 2008; Robinson et al., 2015). Stodden and colleagues (2008) postulated that motor competence, physical activity, perceived motor competence, and health-related fitness all work together throughout the lifespan to lead children into a positive spiral of engagement (i.e., higher physical activity, motor competence, perceived competence, and fitness) or a negative spiral of engagement (i.e., lower physical activity, motor competence, perceived competence, and fitness). More work has been conducted in recent years and has highlighted a positive relationship between physical activity and perceived competence, as well as motor competence with physical activity, health-related fitness, and inversely with weight status (Robinson et al., 2015).
"Evaluation of the Psychometric Properties of the Test of Gross Motor Development - 3rd Edition" by Webster EK, Ulrich DA. Journal of Motor Learning and Development. © 2017 Human Kinetics, Inc.
In measuring motor competency, FMS may be examined by either process-oriented or product-oriented means. The product-oriented approach focuses on the end result of a movement; for example, throwing a ball a specified distance or time to complete a 40-yard dash (Haywood & Getchell, 2014). A process-oriented approach examines the descriptive characteristics, form, or mechanics of a movement; for example, arms moving in opposition to legs with elbows bent during running or an elongated step prior to kicking a stationary ball (Haywood & Getchell, 2014; Ulrich, 2000). There are several assessments that examine FMS in young children that utilize either approach of describing movement. The Test of Gross Motor Development (TGMD; Ulrich, 1985; Ulrich, 2000) is one process-oriented assessment, with a few product-oriented components, that is widely used for young children. A few examples of product-oriented components included in the TGMD-3 are completing four consecutive gallops, hops, and skips, or throwing a ball underhand at least 15 feet.
The TGMD recently went through a revision to accommodate potential changes in the normative population, as well as to incorporate recommendations from experts in the field of motor development and practitioners who frequently use the assessment to create the TGMD-3 (Ulrich, 2014). Changes included dropping two skills from the TGMD-2, the leap and underhand roll, and replacing these with the skip, underhand throw, and one-hand strike in the TGMD-3. Minor scoring clarifications and wording were adjusted to reduce ambiguity on scoring for researchers, teachers, and practitioners. The TGMD is comprised of two subscales, locomotor and ball skills (i.e., object control skills), that evaluate six to seven gross motor skills each to provide a comprehensive picture of a child's motor repertoire (Ulrich, 2000). The TGMD is easily administered and provides information on qualitative aspects and sequences of movement behaviors (Burton & Miller, 1998; Cools, De Martelaer, Samaey, & Andries, 2008; Wiart & Darrah, 2001). Valuable information may be derived from standardized observations of these FMS, which may be an important component of assessing the motor development of young children. The utilization of both process- and product-oriented approaches for the TGMD allows for ample information on movement behaviors to be derived from the assessment, which may allow practitioners, teachers, and researchers to garner more information.
A norm-referenced assessment compares performance to that of a representative, normative group, while a criterion-referenced test compares a child's performance to a set standard of performance (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 2014). The TGMD is comprised of both criterion-referenced and norm-referenced components (Burton & Miller, 1998; Cools et al., 2008; Ulrich, 2000). For the normative data set used with the TGMD, the representative sample is updated every 15 years to reflect changes in the population from the United States census data (AERA, APA, & NCME, 2014; ProQuest, 2012). The TGMD strives to include FMS information in the normative data set that is representative of children between 3 and 10 years of age in the United States. Therefore, the data that is compiled must be reflective of the composition of age, sex, ethnicity/race, geographic region, and socioeconomic status based on the most recent United States census data (Kaplan & Saccuzzo, 2009; ProQuest, 2012). During this revision of the normative data set, feedback is compiled from experts in the field, researchers, and practitioners who use the test to ensure these FMS are still relevant and appropriate for the targeted purposes. With this revision for the TGMD-3 comes the necessary evaluation of the psychometric properties of this assessment to ensure that usage is both valid and reliable.
Validity, or the degree to which a measurement adequately assesses constructs of interest, is a critical component of test creation and revision (AERA, APA, & NCME, 2014; Urbina, 2004). An important underlying element of establishing validity is that the purpose of the assessment needs to match the intended interpretation. Like the TGMD-2, the purposes of the TGMD-3 include: 1) identification and screening; 2) instructional programming; 3) assessment of an individual's progress; 4) program evaluation; and 5) research tool, in relation to FMS competency in young children (Ulrich, 2000). The conceptual framework for the TGMD-3 focuses on evaluating a representative variety of FMS in young children, and has been supported through previous editions to align with the intended purposes (Ulrich, 1985, 2000). Validation work for a subsample of the TGMD-3 normative data (i.e., item analysis sample) will be discussed in this paper and reinforce this framework as changes have been made to the TGMD-3. New validation and reliability measures must be examined to ensure this test is appropriate prior to expanding data collection to fit the current census information of the U.S. population and finalizing the normative data set (AERA, APA, & NCME, 2014).
Reliability, or the consistency of performance across multiple occasions, is imperative to ensure an assessment is free from random measurement error (AERA, APA, & NCME, 2014; Urbina, 2004). Reliability of a measure provides an estimate of the average amount of error in a child's test score. This can be particularly relevant when decisions of services or program evaluations may hinge on assessments being reliable (Kaplan & Saccuzzo, 2009). Based on the stated purposes of the TGMD-3, it is important to establish reliability prior to evaluating the various forms of validity. Assessment literature highlights that a tool may not be valid if it is proven to be unreliable (AERA, APA, & NCME, 2014; Kaplan & Saccuzzo, 2009; Urbina, 2004).
Therefore, the purpose of this manuscript is to investigate the reliability and validity of the TGMD-3. Specifically, the aim is to ensure that an initial subsample of children provides acceptable evidence of validity and reliability so that the normative data set may be established.

Participants
In this investigation of the psychometric properties of the TGMD-3, 807 children between 3 and 10.9 years of age (M age = 6.33 ± 2.09 years) were included in the analysis. This group of participants will be included in the full normative data set that will be published in the manual of the TGMD-3 (Ulrich, 2016). Of the participants, 52.5% were male, and the racial make-up for this group included: 57.3% Caucasian, 18.0% African American, 13.3% Hispanic, 7.4% Asian/Pacific Islander, 0.2% American Indian/Eskimo/Aleut, and 3.8% mixed racial background. Participants were recruited through elementary schools and preschools across the United States; University of Michigan Institutional Review Board approval was gained, as well as parental consent and child assent, prior to data collection.
Test-retest reliability was also evaluated for a subsample of children from this group (n = 30; 60% male). Children were randomly selected to participate in this process.

Test of Gross Motor Development - 3rd Edition
The TGMD-3 is a direct observation assessment that measures performance of 13 FMS. These skills are partitioned into two subscales, locomotor and ball skills. The skills assessed in the locomotor subscale include: run, gallop, one-legged hop, skip, jump, and slide. The ball skills evaluated include: two-hand strike, one-hand strike, catch, kick, dribble, overhand throw, and underhand throw. Each skill is evaluated by examining three to five performance criteria. For example, the performance criteria for skipping include: 1) A step forward followed by a hop on the same foot; 2) Arms are flexed and move in opposition to legs to produce force; and 3) Completes four continuous rhythmical alternating skips.
For each skill, a trained administrator demonstrates the skill and then provides the participant one practice trial, followed by two formal trials that are observed and coded by the administrator. Each skill has three to five performance criteria for each movement pattern. If a child correctly demonstrates a performance criterion, a score of "1" is awarded for that criterion on that trial. If the child does not demonstrate the criterion, a score of "0" is recorded for the trial. Scores on the performance criteria over the two formal trials are summed to create a raw skill score. Skill scores may be summed to provide a total raw score for either the locomotor or ball skills subscales, or combined to provide a total TGMD-3 raw score. The locomotor subscale raw score has a maximum of 46 points; the ball skills subscale has a maximum of 54 points. The higher the score, the better the performance on the assessment.
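The scoring rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the trial data are invented for the example (only the three skip criteria are taken from the text).

```python
from typing import Dict, List, Tuple

Trial = List[int]  # one 0/1 entry per performance criterion


def raw_skill_score(trial_1: Trial, trial_2: Trial) -> int:
    """Sum the 0/1 criterion scores over the two formal trials."""
    if len(trial_1) != len(trial_2):
        raise ValueError("both trials must score the same criteria")
    if any(s not in (0, 1) for s in trial_1 + trial_2):
        raise ValueError("criterion scores must be 0 or 1")
    return sum(trial_1) + sum(trial_2)


def subscale_raw_score(skills: Dict[str, Tuple[Trial, Trial]]) -> int:
    """Sum raw skill scores across all skills of one subscale."""
    return sum(raw_skill_score(t1, t2) for t1, t2 in skills.values())


# Invented data: the skip has three criteria, so its maximum is 6 points.
skip_score = raw_skill_score([1, 1, 0], [1, 1, 1])

# Invented two-skill excerpt of a subscale (not the full locomotor subscale).
excerpt = {
    "skip": ([1, 1, 0], [1, 1, 1]),
    "hypothetical_skill": ([1, 0, 1, 1], [1, 1, 1, 1]),
}
excerpt_score = subscale_raw_score(excerpt)
```

The total TGMD-3 raw score would then be the sum of the two subscale scores computed this way.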
Administrators were trained by the author of the assessment and reached 98% reliability coding sample administrations prior to testing. The author of the assessment and a motor development specialist were the primary administrators for this particular component of the project. One administrator conducted the assessment while another coded the assessments live, or video recorded them for later analysis. Children were tested individually or in small groups of two to four participants. All video recorded assessments (approximately 50% of the assessments) were coded by at least two administrators; 98% inter-rater reliability was achieved for these reliability checks. All live coded assessments were coded by the two main administrators (i.e., test author and motor development specialist). To ensure 98% agreement was being met, approximately 50% of the live observations were coded simultaneously by these two administrators.
For the children who were randomly selected to participate in the test-retest protocol, the same administrator who conducted the first assessment with the child administered the TGMD-3 a second time. Children were retested at least one week after their first administration; the amount of time varied (M = 13.23 days, range = 7-25 days) depending on the school's schedule, holidays, or absences. Average age of children from this group was 6.33 years (range: 3-10 years).

Statistical Analysis
For each statistical test, raw scores were used and calculated for the locomotor and ball skills subscales, as well as for the total TGMD-3. Descriptive statistics were calculated and subdivided by age group (i.e., one group for each year between 3 and 10 years). Additionally, correlations were calculated between average performance for each subscale and age. Internal consistency measures were evaluated using Cronbach's coefficient alphas for each age group as well as several subgroups (i.e., both sexes and different ethnic/racial groups). Reliability coefficients that are greater than or equal to 0.70 are considered minimally reliable, while reliability coefficients greater than or equal to 0.90 are considered ideal (Nunnally & Bernstein, 1994). Test-retest reliability was assessed on a subsample of children (n = 30) using the intra-class correlation coefficient (ICC). ICC agreements were rated as poor for values less than 0.40, fair to good for values ranging from 0.40 to 0.75, and excellent if they were above 0.75 (Nunnally & Bernstein, 1994).
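As an illustration of the internal consistency measure and the cutoffs named above, the following Python sketch computes Cronbach's coefficient alpha with the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and applies the stated classification bands. The function names and the tiny data set are assumptions made for this example; the study itself used SPSS, not this code.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one child's item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two children")
    items = list(zip(*rows))  # transpose to one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def classify_alpha(alpha: float) -> str:
    """Bands as stated in the text (Nunnally & Bernstein, 1994)."""
    if alpha >= 0.90:
        return "ideal"
    if alpha >= 0.70:
        return "minimally reliable"
    return "below minimum"


def classify_icc(icc: float) -> str:
    """ICC bands as stated in the text."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


# Invented data: three children, three perfectly consistent items.
demo_rows: List[List[float]] = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
demo_alpha = cronbach_alpha(demo_rows)
```

With perfectly consistent items, as in the invented data, alpha reaches its upper limit of 1; real item data would fall below that.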
Item difficulty was calculated by examining proportions of success; values in the 15-85% range were deemed acceptable (Anastasi & Urbina, 1997). Item discrimination was calculated by the item-total-score Pearson correlation index; indexes of .20 or higher were considered minimally acceptable (Anastasi & Urbina, 1997).
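A minimal Python sketch of these two item statistics follows, assuming dichotomous (0/1) item scores. Difficulty is the proportion of successes, and discrimination is taken here as the uncorrected Pearson correlation between the item and the total score (the total includes the item itself, which is an assumption about the exact variant used). The function names and data are hypothetical.

```python
from math import sqrt
from typing import Sequence


def item_difficulty(item: Sequence[int]) -> float:
    """Proportion of children who succeeded on the item."""
    return sum(item) / len(item)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def item_is_acceptable(item: Sequence[int], totals: Sequence[float]) -> bool:
    """Apply the 15-85% difficulty range and the .20 discrimination cutoff."""
    difficulty_ok = 0.15 <= item_difficulty(item) <= 0.85
    discrimination_ok = pearson_r(item, totals) >= 0.20
    return difficulty_ok and discrimination_ok


# Invented data for five children: one item and their total raw scores.
item_scores = [1, 1, 0, 1, 0]
total_scores = [10, 8, 3, 9, 4]
difficulty = item_difficulty(item_scores)
discrimination = pearson_r(item_scores, total_scores)
```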
The overall sample was randomly split in two (subsample one, n = 407; subsample two, n = 400). Exploratory factor analysis (EFA) with maximum likelihood estimation procedures was used to examine the factor structure of the TGMD-3 on subsample one. Eigenvalues of one or greater (Kaiser, 1974) and a scree test were used as factor extraction criteria. The chi-square (χ²) value, factor loadings (> .40) and cross-loadings (< .30), and communalities were used to guide model and parameter fit (Brown, 2015). These criteria, collectively, were used to make comparisons between plausible solutions (i.e., one-factor model; two-factor model).
A confirmatory factor analysis (CFA) with maximum likelihood estimation procedures was used to validate construct validity findings from the EFA. Specifically, the factor structure produced in the retained EFA on subsample one was confirmed with the second subsample. However, we also tested an alternative model representing the EFA solution that was not retained for comparison purposes. Both parameters and model fit indexes were used to judge the adequacy of the CFA. Standardized factor loadings of .40 or greater were again used as minimum parameter criteria. The overall χ² value provided an assessment of the absolute fit of the model, with lower values (e.g., non-significant p-values) representing a good fit. However, χ² tests are considered highly conservative and restrictive tests that rarely produce non-significant p-values (Kline, 2016). Therefore, CFA model evaluation was also determined by the following criteria (Brown, 2015; Kline, 2016): (a) comparative fit index (CFI) close to .90-.94 = acceptable fit, .95+ = good fit; (b) Tucker-Lewis Index (TLI) close to .90-.94 = acceptable fit, .95+ = good fit; (c) root mean square error of approximation (RMSEA) close to .08 = acceptable fit, .06 = good fit; and (d) standardized root mean square residual (SRMR) close to .08 = acceptable fit, .06 = good fit.
Statistical Package for Social Sciences version 23.0 (SPSS Inc., Chicago, IL, USA) was used for all calculations except CFAs (lavaan package in R); alpha levels were set a priori at 0.05.

Results
The ball skills subscale had a slightly higher average raw score compared to the locomotor subscale, 38.2 (SD = 14.9) and 33.0 (SD = 12.0), respectively. As previously denoted, a higher raw score indicates higher motor skill competency, and the raw scores improve with increasing age over the eight age groups (Table 1). Also, moderate to large correlations were found between average raw scores for each category and age (Cohen, 1988), with ball skills having a higher correlation (r = 0.47, moderate correlation) compared to locomotor skills (r = 0.39, moderate correlation; Table 1).
Internal consistency was very high in each age group for both subscales and total raw scores (Table 2). Cronbach's coefficient alpha levels all exceeded the 0.90 excellent classification with the exception of one group, which was ball skills for 10-year-old children (α = 0.89; Cronbach, 1951). When examining the internal consistency for various subgroups (gender and ethnicity), values remained at an excellent level; however, the American Indian/Eskimo/Aleut subgroup did not have a large enough sample size to calculate internal consistency (Table 2).
For test-retest reliability, ICC was determined between scores for the first and second administration. All subscales had excellent ICC agreements: the locomotor subscale had an ICC of 0.97, the ball skills subscale had an ICC of 0.95, and the total TGMD-3 scores had an ICC of 0.97.
Median item difficulty values ranged from 0.49 to 0.87 for the locomotor subscale and from 0.43 to 0.91 for the ball skills subscale (Table 3). Average values were 0.74 and 0.71 for locomotor and ball skills, respectively, for the TGMD-3; this is comparable to the average values of the TGMD-2, which were 0.77 and 0.69 for the same subscales (Table 3).
The TGMD-3 had above acceptable item discrimination values, ranging from 0.34 to 0.67, and is similar in variation to that of the TGMD-2, which ranged from 0.38 to 0.58 (Ulrich, 2000; Table 4). The average median item discrimination values were 0.52 for locomotor skills and 0.54 for ball skills; Anastasi & Urbina (1997) indicated indices greater than 0.35 are considered appropriate.
The data was randomly split into two sets to run EFA and CFA. Age, sex, and ethnicity were similar between the two subsets. For the exploratory subset (i.e., subsample one), the sample contained 407 participants, 52.1% male, with an average age of 6.16 years (SD = 2.12 years). Using maximum likelihood EFA, one factor emerged from the data, χ²(65) = 520.25, p < .001, explaining 73.82% of the variance. Examination of the scree plot clearly identifies a single factor with an eigenvalue of 9.596, well above the recommended eigenvalue of 1.0 (Kaiser, 1974). This lends support to the notion that each of the 13 skills contributes to a one-factor solution, with the latent variable being gross motor skills. Factor loadings ranged from 0.797 (slide) to 0.897 (kick; M = 0.85, SD = 0.03); details can be found in Table 5. Communalities were high, ranging from 0.635 (slide) to 0.810 (kick; Table 5), indicating that ample variance for each skill was explained by this single latent factor.
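The reported eigenvalue and percentage of variance are consistent with one another if the explained proportion is taken as the eigenvalue divided by the number of skill indicators (an assumption about how the percentage was obtained). The Python sketch below checks that arithmetic and applies the eigenvalue-of-one retention rule; the function names are hypothetical, and the second eigenvalue of .56 is the value reported for the second factor in the two-factor solution.

```python
from typing import List, Sequence


def proportion_of_variance(eigenvalue: float, n_indicators: int) -> float:
    """Share of total standardized variance attributed to one factor."""
    return eigenvalue / n_indicators


def kaiser_retained(eigenvalues: Sequence[float], threshold: float = 1.0) -> List[float]:
    """Keep factors whose eigenvalue is one or greater."""
    return [ev for ev in eigenvalues if ev >= threshold]


# Values taken from the text: first-factor eigenvalue and 13 skills.
percent_explained = round(100 * proportion_of_variance(9.596, 13), 2)

# First- and second-factor eigenvalues as reported in the text.
retained = kaiser_retained([9.596, 0.56])
```

Under this rule only the first factor is retained, in line with the one-factor solution described in the text.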
A two-factor EFA solution was also tested, and although it produced a smaller chi-square value, χ²(53) = 338.78, p < .001, than the one-factor model, all other criteria failed to support the two-factor solution (see Table 5). The eigenvalue of the second factor was .56. Similarly, factor one accounted for the primary factor loadings in all 13 skill indicators, while the second factor did not explain meaningful variance, providing clear support for the one-factor model. Results from the one-factor CFA in the second subsample supported the solution identified in the one-factor EFA on subsample one. Standardized factor loadings (M = .84, SD = .05) for the 13 TGMD-3 skills ranged from 0.76 (slide) to 0.92 (kick) and were all highly significant (p < .001). Subsequently, communalities were also high (M = .71, SD = .08), suggesting high amounts of common variance between the single gross motor skill factor and each test. Model-related information was generally good, χ²(65) = 327.61, p < .001, CFI = .95, TLI = .94, RMSEA = .10, SRMR = .03. Although the RMSEA value is higher than anticipated, the general parameter and model fit tests, collectively, confirm the construct validity of the TGMD-3 as a battery of tests representing an overall factor of gross motor development.
An alternative two-factor CFA model produced a slightly better fit compared to the one-factor CFA, χ²(64) = 273.72, p < .001, CFI = .96, TLI = .95, RMSEA = .09, SRMR = .03. However, examination of parameters revealed that the correlation between the two latent factors of locomotor skills and ball skills was .96 and failed to support two clear factors. Based on all of the evidence from these EFAs and CFAs, it was concluded that the one-factor solution garnered the most empirical support and was therefore retained.
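The reported RMSEA values can be checked against the reported χ² statistics with the usual point-estimate formula, RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))). The Python sketch below is a check under assumptions: it uses N = 400 (subsample two) and the N − 1 convention, whereas some software divides by N instead, which gives the same values to two decimals here. The function name is hypothetical.

```python
from math import sqrt


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of the root mean square error of approximation."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n greater than one")
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# One-factor CFA reported in the text: chi-square(65) = 327.61, n = 400.
rmsea_one_factor = round(rmsea(327.61, 65, 400), 2)

# Two-factor CFA reported in the text: chi-square(64) = 273.72, n = 400.
rmsea_two_factor = round(rmsea(273.72, 64, 400), 2)
```

Both results reproduce the reported values (.10 and .09), which also shows how little the two models differ on this index.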

Discussion
Assessment of the psychometric properties of a revised test is critical to ensure that the latest edition of the assessment is valid and reliable so that it may be used with confidence.
From the evaluation of this initial item analysis for the TGMD-3, preliminary results show that the assessment exhibits high levels of validity and reliability. This provides confidence for the usage of this revised edition of the TGMD and supports the collection of new norms for the TGMD-3. Examining the raw scores for the TGMD-3, the ball skills subscale had a slightly higher average raw score compared to the locomotor subscale, likely due to the ball skills subscale having eight additional performance criteria compared to the locomotor subscale (Ulrich, 2016). The raw scores improved with increasing age over the eight age groups, which indicates a developmental nature to this assessment and provides evidence for construct validity (Urbina, 2004; Table 1).
Construct validity was additionally reinforced with the moderate to large correlations found between raw scores and age (Anastasi & Urbina, 1997).
Internal consistency was evaluated to examine the reliability of the assessment to ensure the TGMD-3 is homogeneous in the variables assessed (Kaplan & Saccuzzo, 2009). Reliability coefficients were found to be ideal, with the exception of one group (Nunnally & Bernstein, 1994).
However, this value was minimally below the ideal mark and still achieved acceptable levels (Table 2). Internal consistency was also examined for subgroups to ensure this assessment is appropriate for both sexes and different ethnic/racial groups (AERA, APA, & NCME, 2014).
Coefficients remained at an excellent level, indicating this assessment consistently evaluates closely related constructs across all ages, in both sexes as well as different ethnic/racial groups.
There was also little variation among subgroups in terms of values for internal consistency, indicating its appropriateness among a wide variety of groups within this item analysis sample, further providing evidence for reliability and construct validity (Urbina, 2004). However, the American Indian/Eskimo/Aleut subgroup did not have a large enough sample size to calculate internal consistency; further work when completing the normative data set is needed to ensure a representative sample is acquired for this group and psychometric properties are assessed appropriately. For test-retest reliability, a subsample of children was randomly selected to be assessed on two occasions to determine if the assessment adequately represented stability over time (AERA, APA, & NCME, 2014). ICC values were all in the excellent range, indicating that this assessment shows strong stability and reliability across time points (Nunnally & Bernstein, 1994). Based on the evaluation of internal consistency and test-retest reliability of this assessment, the TGMD-3 shows a high degree of reliability. The magnitude of these results provides ample evidence that test users may use this version with confidence as an appropriate, reliable version of this assessment.
Item difficulty and item discrimination were used to examine content validity for this assessment (Anastasi & Urbina, 1997). For item difficulty, using the dichotomous nature of the TGMD-3, a proportion was assessed to evaluate successful achievement of all performance criteria (Urbina, 2004). Difficulty was found to decrease as children age, reflecting the developmental nature of FMS in that children tend to be more competent at older ages. When item difficulty values exceed 0.85 (e.g., 8- and 9-year-old locomotor skills, 9- and 10-year-old ball skills), children may have passed the difficulty threshold established by Anastasi and Urbina (1997), after which they may encounter a ceiling effect in terms of being competent in FMS and transitioning into more context-specific skills. The average item difficulty for the locomotor subscale was slightly higher compared to the ball skills. As this evaluation takes into account proportions of successfully completing performance criteria, this may provide more detailed information about differences in raw scores between the two subscales. Performance is also comparable to the average values from the TGMD-2 (Ulrich, 2000; Table 3).
Item discrimination, which examines how performance on one item from an assessment is related to performance on other related items, was assessed to examine content and item validity.


Table 1. Raw Score Means (and Standard Deviations) by Age for TGMD-3 Subtests
** Correlation is significant at the 0.01 level (2-tailed).
* American Indian/Eskimo/Aleut was not calculated due to low sample size.