Acoustic Cry Characteristics in Preterm Infants and Developmental and Behavioral Outcomes at 2 Years of Age

This cohort study evaluates the association between acoustic cry characteristics in preterm infants and developmental and behavioral outcomes at 2 years of age.


DESIGN, SETTING, AND PARTICIPANTS Infants born at less than 30 weeks postmenstrual age (PMA) were enrolled from April 2014 through June 2016 as part of a multicenter (9 US university-affiliated NICUs) cohort study and followed to adjusted age 2 years. Reported analyses began in September 2021. Data were analyzed from September 2021 to September 2022.
EXPOSURES The primary exposure was premature birth (<30 weeks PMA).
RESULTS Analyzed infants (363 participants) were primarily male (202 participants [55.65%]) and had a mean (SD) gestational age of 27.08 (1.95) weeks. Cross-validated random forest models revealed that cry acoustics were associated with 2-year outcomes. Tests of diagnostic odds ratios (DOR) revealed that infants who exhibited total problem behavior CBCL scores greater than 63 at age 2 years were 3.3 times more likely (95% CI, 1.44-7.49) to be identified as such by random forest model estimates relative to other infants (scores ≤63); this association was robust to adjustment for familywise type I error rates and covariate measures. Similar associations were observed for internalizing (DOR, 2.39; 95% CI, 1.04-5.47) and externalizing (DOR, 2.25; 95% CI, 1.12-4.54) scores on the CBCL, clinically significant language (DOR, 1.71; 95% CI, 1.10-2.67) and cognitive (DOR, 1.70; 95% CI, 1.00-2.88) scores on the Bayley-III, and a positive autism screen on the M-CHAT (DOR, 1.91; 95% CI, 1.05-3.44).
The possibility that acoustic characteristics of the infant's cry could be associated with prematurity is suggested by the findings that medical conditions affecting the brain, such as postasphyxia encephalopathy, meningitis, and cri du chat, have been associated with "abnormal" cries.13 This association led to the hypothesis that cry characteristics could be a measure of the integrity of the central nervous system and be useful in populations with less severe medical problems, such as preterm infants.
The paradox in infant cry research is that early technologies to measure acoustic features of the infant's cry used crude instrumentation; initially the sound spectrograph, which could only measure a few cry features, was used. Cry pitch became the harbinger of pathology because it is audible and high pitch is common in infants with medical problems.13 However, with the advent of high-speed computer technology, the sheer amount of cry data that became available was unmanageable for traditional statistical models. The present study brings together a cry analysis system that leverages state-of-the-art signal processing with machine learning technology to take advantage of large and complex data sets to study associations with varied cry acoustic features, including energy (ie, loudness), pitch (ie, fundamental frequency), formants (ie, resonance), voicing (ie, vocal fold vibration), and fricatives (ie, vocal tract constriction). We studied preterm infant cries recorded before discharge from the neonatal intensive care unit (NICU) and developmental outcomes at age 2 years and hypothesized that acoustic cry characteristics were associated with neurobehavioral and behavioral deficits.

Study Design
Infants were enrolled in the multicenter, observational Neurobehavior and Outcomes in Very preterm Infants (NOVI) study from April 2014 through June 2016 at 9 US university-affiliated NICUs.
Parents of eligible infants were invited to participate at 31 to 32 weeks' PMA, or when the attending neonatologist determined that survival to discharge was likely. Enrollment and consent procedures were approved by local institutional review boards. All mothers provided written informed consent.
Sample size was determined by an a priori power analysis. This cohort study adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Measures
Cries of the preterm infants were recorded during administration of the NICU Network Neurobehavioral Scale (NNNS) 7 during the week of NICU discharge (±3 days). The cries were elicited by handling the baby in a standardized format that is designed to increase the baby's level of arousal. The stimuli are akin to those that elicit cries during routine caregiving, such as a diaper change. Audio recordings were made using an Olympus direct-to-PCM digital voice recorder and saved in an uncompressed .wav audio format (recording parameters: 16 bit, 48 kHz). Recorders were attached to the side of the infant's crib at a standardized location and oriented toward the infant's mouth. This allowed for a standard distance between the infant's mouth and the recorder during the NNNS examination (approximately 8-9 inches from the infant's mouth). Episodes of cry vocalizations were identified from these audio recordings. Cry episodes suitable for acoustic analysis were identified based on the absence of background noises that would interfere with the analysis (ie, adult talk, medical equipment noises, and other environmental noises). The identification of usable cries was based on reliability training from a previous study 14 in which 89% agreement was established in identifying cries appropriate for acoustic analysis. The first suitable cry episode from each examination was excerpted into an uncompressed .wav file for subsequent acoustic analysis. The extracted recordings contained only infant cries and were stripped of any identifiable information. Of the infants with 2-year outcome data, 428 had extractable cries (infant cried during the audio recording, good sound quality, no background noise).
Several acoustic features were used to characterize cry episodes, including energy, fundamental frequency, formants, utterances, voicing, fricatives, and signal quality. Briefly, energy measures sound pressure levels in decibels (which we hear as loudness) as air is expelled from the lungs. Related are utterances, which can be long (≥500 ms) or short (<500 ms). At the larynx, vibration of the vocal folds produces sound waves, including the fundamental frequency (which we hear as pitch). This initial sound is then modified or filtered by the vocal tract, producing formants. Formants are characteristic features of the resonances of the space (ie, the vocal tract). For example, a "C" note on a saxophone sounds different than a "C" note on a piano because the resonating chambers are different. Signal quality measures the reliability with which the cry analyzer can measure the acoustic characteristics. Voicing refers to sounds produced by vocal fold vibration. Sounds can also be unvoiced, when the vocal folds are not involved but the sound is altered by, for example, the tongue or lips. Fricatives characterize how the sound is altered by friction due to constriction of the vocal tract.
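To make the decibel scale behind the energy feature concrete, the following is a minimal sketch, assuming signed 16-bit PCM samples as in the recording parameters. It reports level relative to digital full scale (dBFS) rather than calibrated sound pressure level, and the function name is hypothetical; it is not the study's cry analyzer.

```python
import math

FULL_SCALE = 32768.0  # magnitude limit of a signed 16-bit sample

def frame_level_dbfs(samples):
    """RMS level of one frame in dB relative to full scale (not calibrated SPL)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(rms / FULL_SCALE)

# Toy frames (made-up values): halving the amplitude lowers the level by about 6 dB.
loud = [16384] * 600
quiet = [8192] * 600
print(round(frame_level_dbfs(loud), 2), round(frame_level_dbfs(quiet), 2))
```

Because the scale is logarithmic, equal amplitude ratios map to equal decibel differences, which is one reason loudness-related features are usually summarized in decibels.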
Procedures and criteria for collecting maternal and infant variables have been previously described, including cranial ultrasonography readings, interpreted by centralized study neuroradiologists.13 Infant medical risk was measured using an index, 15 in which scores are computed as the sum of dummy variables indicating the presence of 4 morbidities ascertained prospectively 16 from medical record reviews using Vermont-Oxford Network definitions 17 of brain injury (from consensus central readings of cranial ultrasounds [that defined periventricular leukomalacia, moderate-severe ventriculomegaly, or parenchymal echodensity]), bronchopulmonary dysplasia, severe retinopathy of prematurity, and necrotizing enterocolitis and/or culture-positive sepsis.15 Race and ethnicity data (American Indian/Alaska Native, Asian, Black or African American, Native Hawaiian or other Pacific Islander, more than 1 race, unknown or not reported, Hispanic ethnicity) were obtained from maternal interview self-reports (preferred source) or medical record abstraction. We included race and ethnicity as part of standard, unbiased demographic information.
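As a minimal illustration of the medical risk index described above, the score can be computed as a count of the morbidities present. The field names are hypothetical and not taken from the study's data dictionary, and treating a missing field as "absent" is an assumption made only for this sketch.

```python
# Hypothetical field names; each flag is True when the morbidity is present.
MORBIDITIES = (
    "brain_injury",
    "bronchopulmonary_dysplasia",
    "severe_retinopathy_of_prematurity",
    "nec_or_culture_positive_sepsis",
)

def medical_risk_index(infant: dict) -> int:
    """Sum of 0/1 dummy variables, one per morbidity (range 0 to 4)."""
    # Assumption: a missing field counts as absent.
    return sum(1 for name in MORBIDITIES if infant.get(name, False))

example = {
    "brain_injury": False,
    "bronchopulmonary_dysplasia": True,
    "severe_retinopathy_of_prematurity": False,
    "nec_or_culture_positive_sepsis": True,
}
print(medical_risk_index(example))  # 2
```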
Main outcome measures were completed at 2 years adjusted age (±2 months). Assessments included the Bayley-III, administered by experienced examiners who were blinded to infant medical conditions in the NICU and trained to reliability by certified Bayley-III trainers using standardized Bayley-III training protocols.18 Parents completed the Child Behavior Checklist (CBCL) 19 problem

Statistical Analyses
As shown in the Figure, digital sound files of cry episodes were subjected to an established, computerized cry analysis system.18 Infant-level counts of cry utterances were also computed (hereafter referred to as "utterances").
Next, the resulting set of candidate features was reduced, such that features with high rates (>40%) of missing data, near 0 variability, and high (r = 0.75) intercorrelation were removed, consistent with published guidelines for machine learning feature filtering.22 A cross-validated supervised machine learning model (ie, random forest) was used to estimate developmental outcomes at age 2 years using all available cry acoustics variables. Random forests were fitted using the ranger R 23 package on R 4.2 (R Project for Statistical Computing) 24 using default parameters with the exception of the number of trees (10 000). Model estimates were evaluated using leave-1-out cross validation, such that the test-train split is repeated for every infant in the data set: random forest models are trained on data containing all except 1 infant, and the held-out infant is used to evaluate whether model estimates generalize to new data. Predicted probabilities generated using leave-1-out cross validation were used to create a receiver operating characteristic (ROC) curve and to pick an optimal probability cut point (based on the shortest distance to perfect prediction).25 The resulting binary estimates were used to compute diagnostic odds ratios (DOR),26 95% confidence intervals, and P values (α = .05 statistical significance threshold). Area under the ROC curve estimates were also reported.26
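The last two steps above (choosing a cut point on the ROC curve and computing a DOR from the resulting binary estimates) can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the probabilities are made-up toy values rather than leave-1-out random forest estimates, the confidence interval is the standard Wald interval on the log scale, and the 0.5 continuity correction for empty cells is an assumption of this sketch.

```python
import math

def confusion(y_true, probs, cut):
    """2x2 counts when an infant is flagged if probability >= cut."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= cut)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < cut)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= cut)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < cut)
    return tp, fn, fp, tn

def optimal_cut(y_true, probs):
    """Cut point whose ROC coordinate is closest to perfect prediction (FPR 0, TPR 1)."""
    best_cut, best_dist = None, float("inf")
    for cut in sorted(set(probs)):
        tp, fn, fp, tn = confusion(y_true, probs, cut)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        dist = math.hypot(1 - sens, 1 - spec)
        if dist < best_dist:  # ties keep the lowest cut (an arbitrary choice)
            best_cut, best_dist = cut, dist
    return best_cut

def diagnostic_odds_ratio(tp, fn, fp, tn, z=1.96):
    """DOR with a Wald 95% CI on the log scale."""
    if 0 in (tp, fn, fp, tn):
        # Assumption: add 0.5 to every cell when any cell is empty.
        tp, fn, fp, tn = (x + 0.5 for x in (tp, fn, fp, tn))
    dor = (tp * tn) / (fp * fn)
    se = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    return dor, math.exp(math.log(dor) - z * se), math.exp(math.log(dor) + z * se)

# Toy data: 1 = clinically significant outcome, probs = hypothetical held-out estimates.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
cut = optimal_cut(y_true, probs)
print(cut, confusion(y_true, probs, cut))
print(diagnostic_odds_ratio(20, 10, 15, 55))
```

A DOR whose confidence interval excludes 1 indicates that flagged infants have higher odds of the outcome than unflagged infants, which is how the estimates in the Results can be read.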

Figure. Data Reduction Strategy
A mixed-effects binary logistic regression model was fitted (using glmer from the lme4 package 27) to account for nesting of infants (level 1 unit) within study sites (level 2 unit). Consistent with prior work, 28,29 the effect of confounding variables was adjusted using a stepwise method, where all covariate measures (ie, sex, low maternal SES, low maternal education, minority race or ethnicity, maternal primary language, no partner at birth, infant medical risk index score, and postmenstrual age at birth) were entered as level 1 fixed effects in the first step and the final estimate of our machine learning models was entered as a level 1 fixed effect in the second step. A (nonparametric) likelihood ratio permutation test was used to evaluate whether adding the final machine learning estimate to a model containing covariate measures led to a significant reduction in

Results
The study enrolled 704 infants, of whom 556 completed the 2-year assessment (mean [SD] follow-up, 2.29 [0.16] years). Of these, 363 had complete covariate data and valid cry data. Maternal and infant descriptive data of the final sample are shown in Table 1.
Examining the top 10 most important variables within each model (see eTable in Supplement 1) revealed that final models used 140 acoustic features across 14 separate models that may be summarized into 7 groups of variables (ie, energy, fundamental frequency, formants, utterances, voicing, fricatives, and signal quality). Of the 7 variable groups, 3 (energy, formants, and signal quality) were used at least 20 times, 1 (fundamental frequency) was used 18 times, and the remaining 3 (utterances, voicing, and fricatives) were used 8 times or fewer.

Discussion
Using a machine learning (random forest) approach along with contemporary signal processing methods, we have shown that acoustic cry characteristics are associated with some 2-year developmental and behavioral deficits in very preterm infants. Specifically, estimates of models trained using acoustic cry characteristics were associated with clinically significant total problem behavior scores, even after controlling for covariate measures or familywise type I error rates. Likelihood ratio tests also indicated that adding estimates of random forest models trained on cry acoustics to a mixed model containing covariates significantly improved model fit, suggesting a unique association of acoustic cry characteristics with prediction of clinically significant total problem behavior scores at age 2 years. Similar (but nonsignificant) associations were observed for clinically significant language and motor composite scores on the Bayley-III, internalizing and externalizing scores on the CBCL, and a positive autism screen on the M-CHAT.
Previous work describing abnormal cry acoustics in infants with severe medical problems suggested that cry characteristics might be of diagnostic value in at-risk infant populations where developmental outcome is less clear. Cry acoustics have been associated with prematurity 30-32 and other at-risk populations 13 in the neonatal period but not with longer-term developmental or behavioral deficits, and researchers typically examined a limited subset of acoustic characteristics. Our findings point to the potential use of acoustic cry characteristics in the early (before NICU discharge) identification of which preterm infants are most at risk for longer-term developmental and behavioral deficits and, perhaps by extension, in other at-risk and not at-risk populations.
The CBCL findings are noteworthy from a linguistic perspective. Human speech is divided into linguistic and paralinguistic components. The linguistic component refers to words and their syntax. The paralinguistic component is the prosodic features of pitch, loudness, melody, and intonation that convey the affective side of speech. Crying includes only prosodic features that are aligned with the behavioral and emotional problems measured by the CBCL. The effects that we found were on the internalizing problem score, which includes the anxious/depressed, withdrawn-depressed, and somatic complaint scales, and the externalizing problem score, which combines the rule-breaking and aggressive behavioral scales. This connection between cry as prosody and later behavioral and emotional problems could lead to the study of neural pathways related to the origins of mental health disorders.

Limitations
Our study had limitations. One limitation of this work is that we were only able to use cries from 363 of the 556 infants; however, there was only one significant difference between those included and those excluded. This suggests that we need to make improvements in cry collection and extraction. A second limitation is that this is a data-driven approach and, although the infants were drawn from 9 NICUs, this was still a single cohort study. The next step would be a hypothesis-testing study to determine if these acoustic cry models replicate in other cohorts. Additionally, although acoustic cry analysis may help identify which preterm infants are most at risk for adverse outcomes, these measures do not readily translate into specific intervention strategies.

Conclusions
Despite its limitations, this work raises the possibility that cry analysis could be developed into a bedside, accessible, and non-labor-intensive machine-interpreted diagnostic similar to electrocardiographic readings.
Cries were recorded during a neurobehavioral examination administered during the week of NICU discharge. Cry episodes were analyzed using a previously published computerized system to characterize cry acoustics. Year-2 outcomes included the Bayley-III Composite scores, Child Behavior Checklist (CBCL), and the Modified Checklist for Autism in Toddlers (M-CHAT R/F), dichotomized using clinically significant cutoffs (<85 on Bayley Language, Cognitive, and/or Motor Composite scores; T-score >63 on the CBCL Internalizing, Externalizing, and/or Total Problem Scales; and total M-CHAT R/F score >2).
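The dichotomization rule stated above can be sketched as a small lookup of cutoffs. The cutoffs are those given in the text; the outcome key names are hypothetical and chosen only for this illustration.

```python
# Cutoffs as stated in the text; outcome key names are hypothetical.
BELOW_85 = ("bayley_language", "bayley_cognitive", "bayley_motor")
ABOVE_63 = ("cbcl_internalizing", "cbcl_externalizing", "cbcl_total_problems")

def dichotomize(scores: dict) -> dict:
    """Map each available score to True when it is clinically significant."""
    flags = {}
    for name in BELOW_85:
        if name in scores:
            flags[name] = scores[name] < 85   # Bayley-III composite below 85
    for name in ABOVE_63:
        if name in scores:
            flags[name] = scores[name] > 63   # CBCL T-score above 63
    if "mchat_total" in scores:
        flags["mchat_total"] = scores["mchat_total"] > 2  # M-CHAT R/F total above 2
    return flags

child = {"bayley_language": 79, "bayley_cognitive": 85,
         "cbcl_total_problems": 64, "mchat_total": 2}
print(dichotomize(child))
```

Scores that fall exactly on a cutoff (85 on the Bayley-III, 63 on the CBCL, 2 on the M-CHAT R/F) are not flagged, because the stated inequalities are strict.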

Findings
In this cohort study of 363 preterm infants, acoustic cry characteristics were associated with clinically significant language and cognitive deficits, behavior problems, and a positive autism screen at age 2 years.

Meaning
These findings point to the potential use of acoustic cry characteristics in the early identification of preterm infants who are most at risk for longer-term developmental and behavioral deficits.
Analysis proceeded in 2 phases. The first phase applied a cepstral-based acoustic analysis to extract acoustic parameters with a 12.5 ms frame advance. The second phase organized and summarized this information into cry utterances. The mean (SD) amount of crying analyzed per infant was 17.37 (15.54) seconds. Long cry utterances were defined as a cry during the expiratory phase of respiration lasting at least 0.5 seconds. Short cry utterances were less than 0.5 seconds; short and long utterances show distinct acoustic characteristics and thus were analyzed separately. The automated cry analyzer produced 56 acoustic characteristics per cry utterance. Acoustic characteristics for a total of 4431 long (≥500 ms) and 10 270 short (<500 ms) utterances were processed, including a mean (SD) of 12.2 (12.2) and median (IQR) of 9 (6-14) long utterances and a mean (SD) of 28.3 (25.8) and median (IQR) of 20 (13-33) short utterances per child. For each acoustic characteristic, we computed infant-level means and infant-level signal quality (ie, the percentage of utterances that produced a valid value for a given acoustic parameter).
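The infant-level summary described above can be sketched as follows, assuming each utterance is a record with a duration and possibly missing feature values. The record layout and the feature name f0_hz are hypothetical, and the values are made up; the cry analyzer's actual output format is not described in the text.

```python
from statistics import mean

LONG_MS = 500  # long utterances last at least 0.5 seconds

def summarize(utterances, feature):
    """Infant-level mean and signal quality for one feature, by utterance type."""
    out = {}
    for kind in ("long", "short"):
        group = [u for u in utterances
                 if (u["duration_ms"] >= LONG_MS) == (kind == "long")]
        valid = [u[feature] for u in group if u.get(feature) is not None]
        out[kind] = {
            "n_utterances": len(group),
            "mean": mean(valid) if valid else None,
            # Signal quality: percentage of utterances with a valid value.
            "signal_quality_pct": 100.0 * len(valid) / len(group) if group else None,
        }
    return out

# Toy utterances for one infant; None marks a value the analyzer could not measure.
utterances = [
    {"duration_ms": 820, "f0_hz": 450.0},
    {"duration_ms": 500, "f0_hz": None},
    {"duration_ms": 640, "f0_hz": 470.0},
    {"duration_ms": 210, "f0_hz": 510.0},
    {"duration_ms": 120, "f0_hz": None},
]
result = summarize(utterances, "f0_hz")
print(result["long"], result["short"])
```

Keeping long and short utterances in separate groups mirrors the text's point that the two types show distinct acoustic characteristics, so each acoustic characteristic yields one mean and one signal quality value per utterance type.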

JAMA Network Open | Pediatrics

Table 1. Demographic and Clinical Characteristics of 327 Mothers and 363 Infants Included in the Sample

The false discovery rate correction was used to maintain acceptable (α = .05) familywise type I error rates.
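The text does not name the specific false discovery rate procedure. As a sketch, assuming the widely used Benjamini-Hochberg step-up method, adjusted P values can be computed as below; the input values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min          # enforce monotone adjusted values
    return adjusted

# Toy P values (not from the study); compare each adjusted value with alpha = .05.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(p, 4) for p in benjamini_hochberg(raw)])
```

An association is retained after correction when its adjusted P value is below the .05 threshold, which is the sense in which an estimate can be described as robust to the correction.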

Table 2. Characterizing the Association Between 2-Year Outcomes and Estimates of Machine Learning Models

The ability to identify which infants are at highest risk for developmental and behavioral disorders could lead to the development of interventions to mitigate adverse outcomes.

30. Kusaka R, Ohgi S, Shigemori K, Fujimoto T. Crying and behavioral characteristics in premature infants. J Jpn Phys Ther Assoc. 2008;11(1):15-21. doi:10.1298/jjpta.11.15
31. Rautava L, Lempinen A, Ojala S, et al; PIPARI Study Group. Acoustic quality of cry in very-low-birth-weight infants at the age of 1 1/2 years. Early Hum Dev. 2007;83(1):5-12. doi:10.1016/j.earlhumdev.2006.03.004
32. Goberman AM, Robb MP. Acoustic examination of preterm and full-term infant cries: the long-time average spectrum. J Speech Lang Hear Res. 1999;42(4):850-861. doi:10.1044/jslhr.4204.850

eTable. Count of the Top 10 Most Important Variables Broken Down by Model Outcome, Utterance Type and Acoustic Variable Type
eFigure 1. Feature Importance Estimates From Models Predicting CBCL Composite Scores Using Cry Acoustic Characteristics (Only the Top 10 Features are Shown)
eFigure 2. Feature Importance Estimates From Models Predicting Bayley-III Composite Scores Using Cry Acoustic Characteristics (Only the Top 10 Features are Shown)
eFigure 3. Feature Importance Estimates From Models Predicting MCHAT Positive Autism Screen Using Cry Acoustic Characteristics (Only the Top 10 Features are Shown)