Which Bayley-III cut-off values should be used in different developmental levels?

Background/aim: The latest version of the Bayley Scales (Bayley-III) and its predecessor (BSID-II) are the most widely used standardized developmental tools in infancy and early childhood. Recent studies showed that Bayley-III scores were higher than BSID-II scores in 18–24-month-old, mostly premature infants. We aimed to evaluate whether the inflated Bayley-III scores generalize to children aged 6–42 months with different diseases, and to determine which Bayley-III cut-off points should be used to detect mild, moderate, and severe developmental delay as defined by the standard BSID-II cut-off points.
Materials and methods: Two hundred and fifty-five children aged 6–42 months with different diseases and developmental levels were administered both the Bayley-III and the BSID-II in the same session between 15 November 2017 and 15 April 2018.
Results: The mean Bayley-III Cognitive Composite (CC) and Cognitive Language Composite (CLC) scores were 13.1 ± 9.1 and 8.6 ± 8 points higher, respectively, than the BSID-II Mental Development Index (MDI) scores (P < 0.001). The mean Bayley-III Motor Composite (MC) scores were 14.4 ± 10.5 points higher than the BSID-II Psychomotor Developmental Index (PDI) scores (P < 0.001). Cognitive delay was found in 126 (49.4%) and 59 (23.1%) children according to the BSID-II MDI and Bayley-III CC scores, respectively. Motor delay was found in 174 (69.3%) and 86 (34.3%) children according to the BSID-II PDI and Bayley-III MC scores, respectively. Compared with the BSID-II, the Bayley-III classified 48.6% of the children as less severely delayed in the cognitive domain and 54.5% in the motor domain. Bayley-III scores were significantly higher than BSID-II scores for all ages (P < 0.001). According to ROC analysis, the cut-off scores for mild, moderate, and severe delay were 92.5, 83.2, and 71.2 for the Bayley-III CLC, and 98.5, 86.5, and 74.5 for the Bayley-III MC, respectively.
Conclusion: Bayley-III scores should be interpreted carefully for all age ranges and different diagnoses. The risk of underestimation of developmental delays by Bayley-III should be kept in mind. Different Bayley-III cut-off scores should be used to define developmental delay levels.

the BSID-II, and Bayley-III may identify significantly fewer children with developmental delay compared to the BSID-II [4][5][6][7]. These studies were predominantly conducted in premature infants and in specific age ranges, such as between 18 and 24 months. No study has so far examined whether the higher neurodevelopmental scores of Bayley-III relative to BSID-II can be generalized to the 0-42 months age range and to infants with different diseases. Studies determining which Bayley-III cut-off points correspond to the standard BSID-II cut-off scores of 85, 70, and 55, which reflect mild, moderate, and severe developmental delay, are limited [8]. Moreover, there is no study comparing BSID-II and Bayley-III in Turkish children.
We aimed to evaluate the generalization of inflated scores of Bayley-III to children aged 6-42 months and also with different disease groups, and to find out which cut-off points should be used in Bayley-III to detect mild, moderate, and severe developmental delay according to BSID-II standard cut-off points.

Study design and participants
This prospective study was conducted in Ankara Child Health and Diseases Hematology and Oncology Training and Research Hospital, University of Health Sciences Turkey. Ethics committee approval was obtained from Etlik Zübeyde Hanım Women's Health Teaching and Research Hospital, University of Health Sciences Turkey.
Children with developmental risks or difficulties (prematurity, hearing loss, perinatal asphyxia, cerebral palsy, epilepsy, genetic diseases, metabolic diseases, etc.) and healthy children aged 6-42 months admitted to the Developmental-Behavioral Pediatrics Outpatient Clinic between 15 November 2017 and 15 April 2018 were included in the study. Informed consent was obtained from parents.
A developmental-behavioral pediatrician reviewed the medical records of the children, performed detailed neurological and physical examinations, and evaluated their developmental functions, activities, and participation in life.
Bayley-III items were administered first, in accordance with its administration technique. Bayley-III is more detailed and contains more items than BSID-II, and most of the test items in Bayley-III overlap with BSID-II. Thus, the child's performance on Bayley-III items was also coded for the overlapping BSID-II items. Additional BSID-II items were administered after Bayley-III in the same session. Items were scored according to the instructions of each version. For hearing-impaired children, it was ensured during the assessment that the hearing aid or cochlear implant was being used correctly, ambient noise was minimized, and the child was spoken to clearly and naturally.
The MDI and PDI were calculated from the BSID-II raw scores, and the Cognitive Composite (CC), Language Composite (LC), and Motor Composite (MC) scores were calculated from the Bayley-III raw scores. The Bayley-III Cognitive Language Composite (CLC) was defined as the mean of the CC and LC scores [6,8]. The differences between the BSID-II MDI and Bayley-III CC, the BSID-II MDI and Bayley-III CLC, and the BSID-II PDI and Bayley-III MC were compared. With regard to Bayley scores, mild, moderate, and severe delay were interpreted as <85 points (< -1 SD), <70 points (< -2 SD), and <55 points (< -3 SD), respectively.
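The scoring rules described above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of either Bayley manual. The sketch also assumes that each child is assigned only the most severe category whose threshold is met, which is one reasonable reading of the nested <85, <70, and <55 cut-offs.

```python
def cognitive_language_composite(cc: float, lc: float) -> float:
    """Bayley-III CLC, defined in the text as the mean of the CC and LC scores."""
    return (cc + lc) / 2


def delay_level(score: float) -> str:
    """Classify a composite or index score with the standard cut-offs.

    Assumption: the most severe applicable category is returned
    (<55 severe, <70 moderate, <85 mild, otherwise no delay).
    """
    if score < 55:
        return "severe"
    if score < 70:
        return "moderate"
    if score < 85:
        return "mild"
    return "none"


# Illustrative (made-up) scores, not study data.
clc = cognitive_language_composite(90, 80)  # 85.0
level = delay_level(clc)                    # "none", because 85 is not below 85
```

Because the thresholds are strict inequalities, a score exactly at a cut-off (for example 85) falls into the less severe category.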
Age adjustment was performed in premature infants up to two years. Children who had lost their concentration or could not finish both tests were completely excluded from the study to avoid reporting misleading scores. According to family-centered holistic developmental evaluation, children with developmental difficulties or disabilities were referred to early intervention services.

Statistical analysis
Statistical analyses were performed using the SPSS statistical package (v. 20.0 for Mac). Categorical variables between groups were analyzed using the χ2 test. Means of two groups were compared using a t-test where the data fit a normal distribution. For comparisons of more than two groups, ANOVA was used for normal distributions and the Kruskal-Wallis test for nonnormal distributions. Receiver operating characteristic (ROC) curve analysis was used to determine the power of variables to differentiate groups, and the area under the curve was calculated; optimal cut-off levels were determined using the Youden index. A P value of <0.05 was deemed to indicate statistical significance.
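As an illustration of how a Youden-index cut-off can be derived, the following Python sketch scans candidate thresholds and keeps the one that maximizes J = sensitivity + specificity - 1. It is not the SPSS procedure used in the study. The function name, the use of midpoints between consecutive observed scores as candidate thresholds, and the toy data are all assumptions made for illustration.

```python
def youden_cutoff(scores, delayed):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    scores  : test scores to be thresholded (e.g. Bayley-III composites)
    delayed : booleans from the reference standard (e.g. BSID-II index < 70)
    A child is flagged as delayed when score < cutoff.
    """
    n_pos = sum(delayed)
    n_neg = len(delayed) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both delayed and non-delayed children")
    values = sorted(set(scores))
    # Assumption: candidate cut-offs are midpoints between observed scores.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_cutoff, best_j = None, -1.0
    for c in candidates:
        tp = sum(1 for s, d in zip(scores, delayed) if d and s < c)
        tn = sum(1 for s, d in zip(scores, delayed) if not d and s >= c)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # on ties, the lowest cut-off is kept
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Toy data for illustration only, not study data.
toy_scores = [70, 75, 80, 85, 90, 95, 100, 105]
toy_delayed = [True, True, False, True, True, False, False, False]
cutoff, j = youden_cutoff(toy_scores, toy_delayed)  # (92.5, 0.75)
```

In this toy example the cut-off of 92.5 flags all four delayed children (sensitivity 1.0) and misclassifies one of the four non-delayed children (specificity 0.75), giving J = 0.75.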
When the distribution of children according to age ranges is considered, the largest groups were those aged 6-12 months, 25-30 months, and 19-24 months, in that order (Table 3). In all age ranges, the mean Bayley-III CC and MC scores were higher than the BSID-II MDI and BSID-II PDI scores, respectively (P < 0.001).
While most of the studies were performed in preterm infants [4,5,7,9,12-17], only two studies were conducted in children with other diagnoses, such as neonatal encephalopathy [6] and complex cardiac surgery history [10]. Our study was performed in children with different ... It was reported that 40% and 48.1% of the children were classified as less severely delayed in the cognitive and motor domains with the Bayley-III scores than with the BSID-II [5,12]. In our study, cognitive delay and motor delay were classified as less severe in 48.6% and 54.5% of the children, respectively, using the Bayley-III compared to the BSID-II. The discrepancy between BSID-II and Bayley-III in determining the level of developmental delay was higher in our study than in previous studies. This might be related to the fact that our study was performed in children with different age ranges, diagnoses, and developmental levels.
In some studies, new cut-off points for Bayley-III were determined in order to correct inflated scores, classify developmental delays accurately, and interpret study results correctly, because Bayley-III is widely used and has no alternative. Studies suggested taking <80-85 as the Bayley-III CC cut-off corresponding to a BSID-II MDI score of <70 [5-7,9], and in one study the cut-off value was reported as <93 points [14]. In another study, <80 points was indicated as the best cut-off for the Bayley-III CLC score corresponding to a BSID-II MDI score of <70, and it was argued that Bayley-III CLC scores had the advantage of producing a single continuous outcome, although this required further confirmation [7].
Studies that define the optimal cut-off value for Bayley-III MC are limited. Jary et al. suggested that the optimal Bayley-III MC cut-off for the identification of BSID-II PDI <70 was <85 [6], whereas Duncan et al. suggested <73 as the cut-off in their study conducted using the National Institute of Child Health and Human Development Neonatal Research Network data [16]. In another study, it was asserted that the Bayley-III MC cut-off should be 12-24 points higher than 70 for optimal prediction of motor delay as defined by a BSID-II index score of <70 [15].
However, most previous studies focused on the cut-off value for moderate developmental delay (70 points); only one study, conducted in 62 children, identified optimal cut-off values for mild (85 points) and severe (55 points) developmental delay [8]. Its authors suggested using 87.3, 78.0, and 67.0 for the Bayley-III CLC instead of 85, 70, and 55 for the BSID-II MDI for mild, moderate, and severe cognitive delay, respectively [8]. In the same study, it was suggested that the cut-off scores should be raised from 70 to 80 and from 55 to 68.5 for moderate and severe motor delay, respectively [8]. In our study, the cut-off points were 92.5, 83.2, and 71.2 for the Bayley-III CLC score and 98.5, 86.5, and 74.5 for the Bayley-III MC score for mild, moderate, and severe delay, respectively. We found higher cut-off levels than Yi et al. This may be related to the larger number of participants in our study and to the inclusion of healthy children and children with different developmental levels.
It was shown that the difference between the BSID-II and Bayley-III scores was not linear, and that the gap between the two scales increased as the severity of the developmental delay increased [8,9]. Similarly, we found that the discrepancy between the BSID-II and Bayley-III scores was 7.5, 13.2, and 16.2 points for mild, moderate, and severe cognitive delay, respectively, and 13.5, 16.5, and 19.5 points for mild, moderate, and severe motor delay, respectively. The relatively higher Bayley-III scores matter because they may lead to inadequate detection of children with developmental delay who need intervention services. Because the gap between the two scales is largest at the lowest end of the scores, the children who might benefit most from early intervention services may be identified as not requiring them and may therefore not receive them. Moreover, as Bayley-III is widely used in research, relatively higher scores may lead to underestimation of the prevalence and severity of developmental delays in clinical populations.
The standardization of BSID-II was performed in children who were completely healthy. Bayley-III was normed using a "mixed sampling procedure". In other words, approximately 10% of the standardization sample consists of children with developmental difficulties or delay, such as Down syndrome, cerebral palsy, pervasive developmental disorder, prematurity, and speech delay. This mixed sampling procedure is likely to lower the raw scores that correspond to the standard mean score of 100, increase the standard deviation, lower the performance expected from the normative population, and thereby reduce the capacity to identify developmental delay [18]. This might be the reason for the increasing discrepancy between the two tests.
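The per-level discrepancies quoted above are the differences between the Bayley-III cut-offs found in this study and the standard BSID-II cut-offs. A short Python sketch of that arithmetic follows; the variable and function names are illustrative only.

```python
# Standard BSID-II cut-offs and the Bayley-III cut-offs reported in this study.
BSID_II_CUTOFFS = {"mild": 85, "moderate": 70, "severe": 55}
BAYLEY_III_CLC = {"mild": 92.5, "moderate": 83.2, "severe": 71.2}
BAYLEY_III_MC = {"mild": 98.5, "moderate": 86.5, "severe": 74.5}


def gaps(bayley_iii_cutoffs):
    """Difference between each Bayley-III cut-off and its BSID-II counterpart."""
    return {
        level: round(bayley_iii_cutoffs[level] - BSID_II_CUTOFFS[level], 1)
        for level in BSID_II_CUTOFFS
    }


cognitive_gaps = gaps(BAYLEY_III_CLC)  # {'mild': 7.5, 'moderate': 13.2, 'severe': 16.2}
motor_gaps = gaps(BAYLEY_III_MC)       # {'mild': 13.5, 'moderate': 16.5, 'severe': 19.5}
```

In both domains the gap widens from mild to severe delay, which is the nonlinearity described in the text.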
In fact, for common test items that measure the same developmental function in both versions of the test, the expected performance is placed at an older age range in Bayley-III, and this may contribute to the difference between Bayley-III and BSID-II. For example, while performance on the test item "completes correctly pink board within 180 s" is expected at the 23-25 months age range in BSID-II, it is expected at the 25 months 16 days-28 months 15 days age range in Bayley-III. While the "uses a two-word utterance" performance is expected at the 23-25 months age range in BSID-II, it is expected at the 28 months 16 ...
There are few studies on the prediction of later functioning by Bayley-III and BSID-II. In a meta-analysis, a strong correlation between MDI scores and cognitive functioning at later ages and a weak correlation between PDI scores and motor functioning at later ages were found (P < 0.001) [19]. Conflicting results, including high [20] and low [21] predictive value, were reported for 2-year-old Bayley-III cognitive and language composite scores in predicting cognitive functioning at the age of 4. It was stated that Bayley-III motor scores had high specificity but low sensitivity in predicting later motor impairment, and that Bayley-III was less able to detect later motor impairment [22]. We will follow our patients to evaluate which scale better predicts later functioning.
Our research is the first study to show that the mean Bayley-III scores were higher than the BSID-II scores in all age ranges in children aged 6-42 months. In addition, a strength of our study is that it included children with different diagnoses and developmental levels. This is the largest study to date to determine Bayley-III cut-off values for mild and severe developmental delay in addition to moderate delay. It is also the first study in the Turkish population to compare the two scales.
Although children who became fatigued were excluded, both versions of the test were performed in the same session, which might have resulted in assessment bias that led to inflated test scores. The administration of BSID-II and Bayley-III by the same specialist is another limitation of our study.
In conclusion, Bayley-III scores should be interpreted carefully for all age ranges and different diagnoses. The risk for underestimation of developmental delays by Bayley-III should be kept in mind. Different Bayley-III cut-off values from BSID-II should be used to define developmental delay levels. Long-term results are needed to determine which scale has better predictive value for later cognitive and motor functioning.