Neural phase angle from two months when tracking speech and non-speech rhythm linked to language performance from 12 to 24 months

Atypical phase alignment of low-frequency neural oscillations to speech rhythm has been implicated in phonological deficits in developmental dyslexia. Atypical phase alignment to rhythm could thus also characterize infants at risk for later language difficulties. Here, we investigate phase-language mechanisms in a neurotypical infant sample. 122 two-, six-and nine-month-old infants were played speech and non-speech rhythms while EEG was recorded in a longitudinal design. The phase of infants ’ neural oscillations aligned consistently to the stimuli, with group-level convergence towards a common phase. Individual low-frequency phase alignment related to subsequent measures of language acquisition up to 24 months of age. Accordingly, individual differences in language acquisition are related to the phase alignment of cortical tracking of auditory and audiovisual rhythms in infancy, an automatic neural mechanism. Automatic rhythmic phase-language mechanisms could eventually serve as biomarkers, identifying at-risk infants and enabling intervention at the earliest stages of development.


Introduction
By the time infants produce their first words around one year of age, they have been engaged in months of spoken language listening, significant portions of which will be to infant-directed speech (IDS), a rhythmic and repetitive input with exaggerated pitch and affect (Kuhl, 2000;Leong et al., 2017). Hence before infants reliably show recognition of their own name at five months (Mandel et al., 1995) and of other common words from six months (Bergelson & Swingley, 2012), they will have had many months to learn acoustic speech features. A key unanswered question is whether automatic neural encoding mechanisms in the pre-linguistic infant brain affect subsequent language acquisition. In the current report, we focus on neural phase alignment mechanisms, as phase mechanisms are known to be atypical in children with phonological impairments (developmental dyslexia) and are critical for the processing of auditory speech rhythm. Rhythmic processing is thought to be a universal precursor of language acquisition (Mehler et al., 1988). Studies of children with phonological impairments have found a different preferred phase at the group level to rhythmically-presented syllables repeated every 500 ms (2 Hz presentation rate) compared to children without phonological difficulties (Power et al., 2013;Keshavarzi et al., 2022a). Furthermore, these child studies demonstrated that individual differences in preferred phase are related to phonological awareness, the ability to reflect on the phonological structure of words, which develops throughout the preschool years (Lonigan et al., 1998;McDowell, Lonigan, & Goldstein, 2007). In the current study, our aim was to see whether individual differences in early-developing phasealignment mechanisms were predictive of later language ability in a community sample of 122 infants. If so, future applied work could determine whether atypical phase responses to simple rhythmic inputs can be used to identify infants at risk for language difficulties at the earliest stages of development, enabling intervention before they begin to comprehend and produce language.
The rhythmic measures chosen for the current study were a syllable repeated at 2 Hz and a metronome beat (drumbeat sound) repeated at 2 Hz. These measures were selected on the basis of prior studies with children with either developmental dyslexia or developmental language disorder (DLD, e.g., Thomson & Goswami, 2008;Power et al., 2013;Cumming et al., 2015a, b) that were testing a sensory/neural framework originally proposed to explain phonological difficulties in developmental dyslexia, Temporal Sampling theory (TS theory, Goswami, 2011Goswami, , 2015Goswami, , 2019. TS theory is based on speech rhythm, which unfolds in the temporal domain. Adults and children with dyslexia and many children with DLD have speech-sound (phonological) processing difficulties (Ramus et al., 2003;Ziegler & Goswami, 2005;Cumming et al., 2015a). TS theory proposes that these difficulties are related to impairments in fundamental auditory and neural processes that contribute to speech rhythm processing. Many of the underlying sensory-neural differences are present from infancy, as documented by differences in both eventrelated potential Leppänen et al., 2010;Van Zuijen et al., 2013;Virtala et al., 2022) and behavioural (Kalashnikova, Goswami & Burnham, 2018) responses to auditory stimuli between infants at family risk of dyslexia and those not at risk.
It is a long-established EEG phenomenon that the brain responds "in time" to the rhythmic repetition rate of an auditory stimulus (AM rate) via increases in neural power at the same temporal rate (Picton et al., 1987). Regarding rhythm patterns in the speech signal, rhythmic structure is related to regular modulations of signal energy over time in the speech amplitude envelope, which for adult-directed speech (ADS) peak at a rate of 3-5 Hz, the "syllable rate" (Greenberg, Carvey, Hitchcock, & Chang, 2003). The amplitude envelope of speech is also represented in the EEG signal (Ding et al., 2016). Low frequencies in the amplitude envelope corresponding to quasi-rhythmic components of speech play a role in both accurate neural tracking of the speech signal and in speech intelligibility (Giraud & Peoppel, 2012;Doelling et al., 2014). As these rhythmic features facilitate adult comprehension of spoken language, it is possible that they also play a role in language acquisition.
Newborns are adept at processing speech rhythm, for example they can discriminate between languages from rhythm classes with and without stress-timing (Nazzi et al., 1998) and can recognise prosodic cues for speech segmentation (Fló et al., 2019). There is ample evidence that the infant brain represents and tracks auditory stimuli (Gibbon et al., 2021;Jessen et al., 2019) including the speech amplitude envelope Ortiz Barajas et al., 2021). The lower frequencies in the speech envelope (AMs < 15 Hz) are tracked most accurately by the delta band (~0.5-4 Hz) of infant EEG across the first year of life, which shows more accurate tracking than the theta band (4 -8 Hz) or the alpha band (8 -12 Hz) for continuous speech input (nursery rhymes, Attaheri et al., 2022;IDS, Menn et al., 2022). This may suggest an important developmental role for delta-band speech information in language acquisition. When adults speak to infants in IDS, delta-band frequencies in the speech envelope contain enhanced power relative to when they speak in ADS (Leong et al., 2017). In a key study, adult listeners heard syllables repeated at a regular 4 Hz rate. While all listeners showed peaks in EEG power in response to speech at 4 Hz (the rate of syllable presentation here), only adult listeners who could understand the language being spoken exhibited lower frequency peaks in power reflecting the phrasal and sentential rates (~1 Hz and ~ 2 Hz, Ding et al., 2016). Although infants do not yet comprehend speech, the enhancement of rhythmic information in the delta band in IDS may mean that such frequencies provide the brain with important acoustic cues to phonological structure, acting as a scaffold for language acquisition. This possibility is supported by prior neural data testing TS theory with children.
Neural developmental studies related to TS theory have identified a key role for delta band encoding of speech envelope information in phonological and vocabulary development (Mandke et al., 2022;Keshavarzi et al., 2022b;Destoky et al., 2020;Power et al., 2016). Furthermore, the low frequency AMs in speech, corresponding to delta band temporal rates in English nursery rhymes, are known from speech modelling studies to determine the perception of syllable stress patterns (speech prosody; Leong & Goswami, 2015). Accordingly, delta-band AMs may sit at the apex of a hierarchical AM structure for acoustic speech organization, within which other AM cues to linguistic components (e.g. syllables, ~5 Hz AMs, phonemes, ~30 Hz AMs) are nested (see Leong & Goswami, 2015, for computational analyses). The accuracy of delta-band phase alignment mechanisms could thus be expected to play a key role in learning this acoustic AM hierarchy. Leong and Goswami's (2015) analyses showed that phase relations between the different AM bands were key to perceiving phonological structure. Accordingly, it is not sufficient for cortical power at a given frequency to increase in response to information presented at that frequency, but rather the neural response must be temporally aligned to the incoming auditory information. Support for this idea comes from the dyslexia studies noted earlier that used rhythmic syllable repetition at a 2 Hz delivery rate and identified group differences in preferred phase that were related to phonological development (Keshavarzi et al., 2022a;Power et al., 2013; see also Soltész et al., 2013 for a tone-based rhythmic repetition study with dyslexic adults). Given that preferred phase in response to 2 Hz rhythmic stimulation is associated with fundamental language abilities such as phonological awareness in children, it is possible that the neural mechanisms underlying the relationship between preferred phase and phonological development are present much earlier and play a role in individual differences in language acquisition.
An important theoretical consideration regarding investigations of preferred phase is that the increased power found in response to a repeated rhythmic stimulus may not reflect oscillatory alignment. While some researchers interpret phase measures as demonstrating an entrainment of neural rhythms to the frequency of an incoming stimulus (Peelle & Davis, 2012), others argue that it could be a "frequencyfollowing" or steady-state evoked response (Keitel, Quigley & Ruhnau, 2014). By the latter account, neural responses occur at the stimulus frequency not because they are endogenous cortical oscillations aligning to the stimulus, but because they are evoked by the stimulus each time it repeats (and may include non-cortical components). Even non-sensory ERPs, such as the attention-related infant Nc, have been hypothesised to generate a neural response that can be interpreted as an increase in oscillatory power (Keitel et al., 2021). Other work suggests that even if responses are oscillatory, they may not be directly related to the stimulus' acoustic features, but may reflect other endogenous languagerelated processes (Meyer et al., 2020). Nevertheless, recent studies suggest that oscillatory responses to rhythmic speech stimuli persist after the stimulation ends, suggestive of an entrained rather than a purely evoked component (van Bree et al., 2021;Zoefel, ten Oever & Sack, 2018). Here, we do not make any strong claims about whether the 2 Hz response that we aim to measure is evoked or entrained. Given that Power and colleagues (2013) found that children's reading skill and phonological processing were related to preferred phase in response to 2 Hz stimuli, our interest is in whether this same measure of preferred phase will relate to language development in early life, regardless of whether it is an evoked or entrained response.
We investigate our hypothesis via a longitudinal study of early neural and motor tracking of auditory and audiovisual rhythms in relation to later language acquisition (the Cambridge UK BabyRhythm project). In the current report, we examine how the offset and consistency of infant neural phase alignment in response to 2 Hz stimulation predicts performance on multiple language and communication measures administered in later infancy and toddlerhood (Rocha et al., preprint). Henceforth, methods and results are discussed in terms of phase, but from an "evoked response" perspective, the offset can be thought of as a difference in latency, with consistency reflecting whether the evoked response occurred consistently at the same latency for each successive stimulus. At the ages of two, six, and nine months, infants were played a drumbeat sound and a woman's voice saying the syllable /ta/, each repeated at 2 Hz, while EEG was recorded. At two months the infants were sleeping and so stimulation was auditory-alone; at six and nine months accompanying videos were played so that the stimulation was audio-visual. Although prior work (e.g. Power et al., 2013) used a repeated syllable only, the drumbeat stimulus was included to find if neural responses to this simpler, non-linguistic rhythmic stimulus would also be predictive of later language development. In previous studies testing TS theory with tapping tasks, children with dyslexia show neural phase differences versus control children when listening to a simple metronome beat without tapping to it (Colling, Noble & Goswami, 2017). Our hypothesis was that infants would show phase alignment to both the syllable and the drumbeat, and that individual differences in preferred phase angle would be predictive of subsequent language outcomes.
Our experimental approach was to use EEG power to determine regions of interest (ROIs) using a cluster-based approach. Having determined the existence of a stimulus-driven 2 Hz cortical tracking response over these ROIs (see Appendices A and C) using measures of power, we then switched to the temporal domain for our key questions. Calculating the phase angles of the infants' neural responses relative to the beats of the stimuli for the power-determined ROIs enabled investigation of whether the 2 Hz neural response was drawn into a consistent phase with the rhythmic stimuli, both within and across individuals. We hypothesized that it would be drawn into a consistent 2 Hz phase both within and across participants, on the basis of findings by Power and colleagues (2013) with older children. As noted previously, Power and colleagues (2013) also found a link between phase angle in response to rhythmic 2 Hz stimuli and language ability. This motivated our hypothesis that, if the alignment of 2 Hz cortical rhythms to corresponding rhythmic auditory and audiovisual stimuli plays a role in language development, then individual preferred phase to rhythmic stimuli in infancy should relate to language ability in later infancy and early toddlerhood. The language measures reported here were drawn from a larger set of measures originally devised for the project, many of which were affected by the Covid-19 Pandemic (see Rocha et al., preprint). The measures used here had relatively few missing scores, did not show either floor or ceiling effects, and showed developmental continuity across testing sessions (see Rocha et al., preprint, for full information). These measures were then grouped according to whether they were infant-led or parent-estimated measures. The expectation was that both groups of measures would be affected by neural phase angle and phase consistency in response to both rhythmic stimuli.

Participants
At the two-month session, 108 of the 122 infants enrolled in the longitudinal study took part, at a mean age of 2 months 1 day ± 6 days. At six months, 113 infants took part (6 months 2 days ± 7 days). At nine months, 108 took part (8 months 27 days ± 5 days). Numbers of datasets analysed were 106, 108, and 106 respectively. Exclusion reasons are provided in Appendix A. Seventy-six infants provided data for all analysed language measures. The study was approved by the Psychology Research Ethics Committee of the University of Cambridge, U.K.

Stimuli and apparatus
Audiovisual stimuli took the form of a female speaker repeating the syllable /ta/ twice per second, and a ball bouncing on a drum creating a drumbeat, twice per second (Fig. 1). The videos were 12 s long and contained 24 iterations of the syllable articulation or drumbeat. At eight weeks, audio-only versions of both stimuli were presented. Stimuli were presented on a TX300 screen (Tobii AB, Stockholm, Sweden) and through 2020i speakers (Q Acoustics Ltd., Bishop's Stortford, UK) driven by a Topaz AM5 stereo amplifier (Cambridge Audio Ltd., London, UK). EEG data were recorded via Netstation software using 64-channel Geodesic Sensor Nets connected to a GES 300 amplifier recording at a sampling rate of 1000 Hz (Electrical Geodesics Inc., Eugene, OR, USA).

EEG
At two months, infants were held by their parent in an EEG cubicle with low or no light. Ideally, EEG recording took place in low light or darkness while the infant slept. Infants were played 2000 2 Hz beats of one stimulus followed by 2000 beats of the other. At six and nine Fig. 1. Depiction of how the rhythmic audiovisual stimuli appeared to 6-and 9-month-old participants. months, infants were seated in an infant chair. The experimenter alternated between the syllable and drumbeat videos up to 60 presentations each. Infants viewed 42.9 ± 10.4 drum videos (mean ± standard deviation) and 40.1 ± 12.8 syllable videos at six months, and 46.1 ± 6.4 drum videos and 45.7 ± 6.3 syllable videos at nine months. EEG recording took place in silence for three to five minutes before stimulus presentation at two months and after at six and nine months.

Language outcome measures
See Rocha et al. (preprint) for a full description of outcome measures. The parent-reported measures used for this analysis came from the UK-CDI, a questionnaire of receptive and productive vocabulary (Alcock, Meints, & Rowland, 2020) completed by caregivers when the children were 24 months old. The infant-led measures include a pointing measure of communicative intention at 12 months; scores on the computerised comprehension task (CCT) at 18 months, which is a measure of infant vocabulary in which the infant chooses the picture of a named item (Friend & Keplinger, 2003); and scores on a phonological task, non-word repetition (NWR), at 24 months. NWR measures accurate reproduction of consonants, number of syllables, and stress patterns in items like "punky" as well as real words like "puppy". The timeline of EEG and language measures discussed in the current report can be seen in Table  A2 in Appendix A.

Processing
Information about the laboratory sessions was stored in Redcap (Harris et al., 2009;2019). Data were exported from EGI Netstation and to EEGlab 14 (Delorme & Makeig, 2004), running in Matlab version R2018b (Mathworks, CA, USA). Data were filtered with a highpass of 0.2 Hz and a lowpass of 45 Hz. Persistently bad channels were identified by eye and removed, with an average of 4.1 ± 2.7 bad channels excluded at two months, 4.2 ± 2.6 at six months, and 3.9 ± 2.5 at nine months. Two frequently poorly-fitting channels in front of the ears were also excluded. Missing channels were then interpolated in EEGlab.
After manual removal of periods of extreme noise, further noise reduction was performed using Artifact Subspace Reconstruction (ASR; Kothe & Makeig, 2013). Data were epoched into segments of 10.5 s starting at the fourth "beat" in each video, baseline-corrected to the preceding 500 ms. This approach of starting after the third beat follows Power and colleagues (2013), who discarded the initial beats on the basis that an oscillatory response would take time to entrain. Our aim was to achieve continuity from the baseline, so this means that the baseline correction window also contained a beat (the third beat). An increase in 2 Hz power would be expected in response to all stimuli, hence correcting to the baseline works against our hypothesis that power would increase in response to 2 Hz stimulation. An increase in 2 Hz power in response to all stimuli was found using this approach (see Appendix C and prior report, Ní Choisdealbha et al., 2022). Epochs with remaining artefacts were removed. At two months, there were 50.5 ± 20.5 drum (mean ± standard deviation), 48.3 ± 20 syllable and 14.2 ± 8.1 silent segments after cleaning. At six (nine) months, there were 23.4 ± 11.8 (26.7 ± 11.3) drum, 23.5 ± 11.6 (25.5 ± 10.1) syllable and 12.1 ± 7 (14.4 ± 7.4) silent segments. Condition-level datasets with fewer than five clean segments were excluded from analysis. An upper threshold of 30 trials, randomly selected, was applied to reduce the impact of lower trial numbers in the silent condition. Data were rereferenced to the head average, excluding peripheral channels.

Phase analysis
The Circular Statistics toolbox (Berens, 2009) was used to compute the phase angle and resultant vector length of the neural response at each analysis frequency over each ROI. The initial phase values were obtained in FieldTrip (Oostenveld et al., 2011) using the wavelet method. Five Morlet wavelets offered a compromise between temporal and spectral precision. For each ROI, values were obtained for frequency bins between 1 and 15 Hz in 0.1 Hz steps, and in steps of 100 ms. From the resulting time-frequency output, we selected phase angle observations for analysis every 500 ms. The Morlet approach cuts off some time-frequency windows on the edges, meaning there were 8 s of data in each trial, or 16 observations per trial. With a lower bound of 5 trials and an upper bound of 30 trials, this meant that the number of observations available for the analyses varied from 80 to 480 per infant. This value was then averaged across all trials per condition for each session using the circ_mean function (Berens, 2009) to obtain mean phase angle and vector length for each infant. Two times pi was added to any negative phase angle values to get the corresponding positive value.

Statistical Approach: EEG
Two sets of analyses were run in all cases, one with the two-montholds' data alone, and one with the six-and nine-month-olds' data only. This separation took place primarily because the paradigm presented at two months was in an auditory-only modality, while that presented at six and nine months was audiovisual. Statistical comparison of the data between these ages is further precluded by a number of factors. The infants were tested in a different state of alertness at two months than at the older ages, and the greater number of trials at two months (as well as physiological considerations like skull thickness) could lead to a difference in signal-to-noise ratio between data collected at this age, versus the older ages.
A cluster-based approach was used to determine where the differences in 2 Hz power by condition (stimulation versus silence) were greatest in each age group. This spectral power approach tells us over which electrodes we are finding responses to the 2 Hz stimulation, but does not tell us whether the temporal measures that we are interested in are significant over these clusters too. Rather, it provides a basis for an iterative analysis. Given that these are the areas where the greatest increases in 2 Hz power in response to 2 Hz stimulation were found, we can ask if these 2 Hz responses also occurred consistently at the same angle within and across participants. Differences in the lag with which a signal is picked up by electrodes on different parts of the scalp are likely to affect results in the temporal (versus the spectral) domain, particularly when direction is important, as it is in the measurement of phase angle. Analyses of relative power were reported in our prior report based on the rhythmic repetition tasks, along with the effect on 2 Hz inter-trial coherence (ITC; Ní Choisdealbha et al., 2022; see also Appendices A and C). These relative power and ITC effects were found using whole-head averages. Due to the small size of the infant head, a whole-head effect is not unexpected when power is the focus of the analyses.
Target frequency bins for the phase analyses were 2 Hz to 7 Hz. In Ní Choisdealbha et al. (2022), the 2 Hz, 4 Hz and 7 Hz frequencies were compared. Stimulus-related increases in power and ITC were expected at 2 Hz. 4 Hz was used as a comparison bin to see if harmonic effects were also present, and 7 Hz was a control bin unlikely to be affected by harmonic effects. For the analyses presented here, 3 Hz and 5 Hz, were also included to find whether effects occured in other delta and theta band bins, or if they were limited to the stimulus frequency and its harmonic. 6 Hz was included for completeness. The 1 Hz bin was included in the cluster-based analyses as well as the relative power analyses reported in Appendix C, but was excluded from the coherence and phase analyses due to the Morlet wavelet approach truncating the time window available for 1 Hz data.
For the cluster-based approach, EEGlab pop_specparams and pop_statparams functions were used to identify groups of proximal electrodes over which target frequency power differed between the conditions. A separate analysis was run for each of the frequency bins. The pop_specparams option to subtract the individual subject mean from each spectrum before plotting and computing statistics was employed. As the EEGlab program does not account for missing datapoints, only infants with data for all three conditions were included; the electrode groups identified were then targeted for all further analyses, which involved the full sample. The Holms correction for multiple comparison was applied due to its conservatism. A threshold of p < 0.01 was set for the identification of significant electrodes. Adjacent electrodes exceeding this threshold were considered a ROI. In Appendix C, the results of relative power and ITC analyses directly comparing the ROIs and the frequency bins are reported. These results are not reported in this main text because phase was the key measure of interest in the current report.
The phase angle and vector length data enabled us to ask three research questions. First, we asked whether infants' phase responses to the stimuli fell into a similar temporal phase at the group level. This established whether there was a group-level preferred phase, namely whether or not neural responses tended to show a consistent temporal alignment to our rhythmic stimuli that was similar across infants. The presence of group-level preferred phase effects was established using the Rayleigh test (circ_rtest) function of Circular Statistics Toolbox (Berens, 2009). This tests whether the mean phase angles in a group of participants are distributed uniformly around all 360 • of a circle, or whether they tend to cluster in a particular direction. Tests were run on each infant's 2 Hz mean phase angle per age group, condition and ROI, with a Bonferroni correction applied across the multiple comparisons (three ages, three conditions, and four ROIs, with a resulting alpha of 0.05 divided by 36, conservatively rounded to 0.001). This approach has been used for analysing data from a similar paradigm with older children (Keshavarzi et al. 2022a, Power et al., 2013. However, one limitation is that it does not allow for a direct test across conditions. Further, although our underlying data was unimodal, it should be noted that many tests of circular data lack statistical power especially when the data are multimodal (Landler, Ruxton & Malkemper, 2018). Second, on an individual level, we asked whether infants' phase responses tended to fall consistently at or around a similar angle, whether this consistency of response was significantly greater for the stimuli than for silence, and whether phase consistency was greater at 2 Hz than at the other frequencies. The consistency analyses tell us whether preferred phase was significant at an individual level, that is, whether each infants' neural response tended to fall at a similar angle in response to repetitions of a stimulus. To investigate this second research question, we compared the resultant vector length of infants' neural responses. Vector length is a linear variable between 0 and 1 which is smaller when an infant's responses to a given stimulus are less consistent at a given frequency, and larger when these responses occur more consistently at or around the same angle. Linear mixed effects models (LMEMs) with fixed factors of condition, frequency bin, ROI, and their interactions were run on the vector length data, with random intercepts on participant identity, using lme4 and lmerTest (Bates & Sarkar, 2007) in R software. For the audio-visual paradigm administered at six and nine months, age was included as a fixed factor alone and in interaction with the other variables, and as a random slope. F-tests were performed using Satterthwaite-corrected ANOVAs and are reported in Appendix C. The base cases used were the 7 Hz frequency bin (neither near to nor a harmonic of the stimulus frequency), the left temporal RoI, the silent condition, and, where applicable, the six-month timepoint. Beta estimates (which can be taken as effect sizes) are reported for important effects.
The final set of analyses were concerned with the question of whether, within individual infants, the mean phase angle differed between each condition as well as by frequency band and ROI. To conduct these analyses, we adopted the same LMEM approach using the same factors as in the vector length models. However, a linear model cannot be run on angular or directional data because treating radians as linear makes 1 • and 359 • appear on opposite ends of the scale, when they are in fact adjacent. Consequently, we expressed the angular data as its sine and cosine components, and ran a model with each of these as the dependent variable. Sine and cosine are linear expressions of the angle's coordinate position on an imaginary circle, with the cosine of the angle reflecting its x-axis position (between 0 • and 180 • ) and the sine reflecting its y-axis position (between 90 • and 270 • ). Figure A3 in Appendix A depicts the relationship between the phase angle and its sine and cosine components.
Although a two-way ANOVA for circular data is available (Berens, 2009;Harrison & Kanji, 1988) and has been used for within-participants research (Henry & Obleser, 2013;Meyer et al., 2017), and although a number of more complex approaches for modelling circular data in a multilevel context have been developed (Cremers & Klugkist, 2018), here we wanted to ensure that we could control for repeated measures from the same participant using the LMEM, and avoid Type II errors. Further, given our research questions, we wanted to mirror the approach used with the vector length data. Expressing the angular data as its sine and cosine components enabled us to achieve this.
Two LMEMs were run on each dependent variable of vector length, phase angle sine, and phase angle cosineone for the two-month-old, auditory-only data, and one for the six-and nine-month audiovisual data. The results of these analyses are uncorrected because, unlike the Rayleigh tests, separate tests did not need to be run by condition, ROI, or other relevant variables (frequency bin and, within the audio-visual paradigm, age). Readers may wish to consider reported p-values of the phase angle models in light of the presence of two models (sine, cosine) rather than a single phase angle model.

Statistical approach: Language
Multivariate linear models were run using the parieto-occipital 2 Hz phase data at two months, as this is where significant tracking and phase preference results were found. The left temporal region was used for the older ages as effects were more broadly distributed across the scalp. Predictor variables were the phase angle (expressed as the sine by cosine interaction), vector length (stimulus minus silence, controlling for general oscillatory consistency), and angle by vector length interaction in response to each of the syllable and drum stimuli. One multivariate model was run for each age group (two and nine months) for each of the infant-led and parent-estimated groups of measures. This statistical approach was planned in advance and applied across multiple Baby-Rhythm studies using different neural measures as language predictors (see also Attaheri et al., preprint, which uses neural responses to nursery rhymes as its predictors; and Ní Choisdealbha et al., preprint, which uses neural responses to visual-only speech), accordingly an alpha threshold of 0.05 was applied. Single-level linear models were also run on each of the language measures individually in order to ascertain whether a specific language development measure was particularly affected by the early neural responses to rhythmic input, or whether the effect applied to the infant-led or parent-estimated measures more globally. As these follow-up analyses required multiple comparisons, one per language measure, they were Bonferroni-corrected for the number of language measures in the group (i.e. 5 for infant-led, and 2 for parent-estimated). Respective alpha levels were therefore 0.01 and 0.025. Although the main multivariate tests may show that some predictors had a significant effect and others did not, we did not remove any of the original covariates in the follow-up models. We continue to control for them in the follow-up tests to ensure comparability between the multivariate and follow-up univariate tests.
Data and scripts for the statistical analyses can be found at htt ps://osf.io/seyda/.

Systematic difference in phase angle associated with date of testing
During data analysis, a systematic difference in phase was identified linked to the date of testing. This was identified when infants who had been shown shorter, 16-beat drumbeat videos (see Appendix A) were entered into the phase angle analysis, having been excluded from the prior ITC analysis (reported in Appendices) due to the difference in available time window. Upon investigation, this difference was unlikely to be stimulus-driven, because it affected the data in response to the syllable as well (which was always a 24-beat video), and because there was also a systematic difference in the 2-month-old data, which did not use these audiovisual stimuli. Nonetheless the difference appears to have emerged at the date the stimulus was changed. Although not driven by the stimulus, it may be related to another, incidental change to the testing set-up that could have been made at that time (but which we have been unable to identify). Appendix B reports the various tests undertaken to establish whether this was a systematic difference and to find an explanation for it.
To mitigate this date of testing effect on our longitudinal design, we control for it by assigning a dummy variable to each infant (henceforth, "control factor for date of testing" or "control factor") that specified if an infant was tested before or after the date the drumbeat stimulus was changed. For the Rayleigh tests, we test each group separately. For the analyses of whether sine and cosine differed by frequency band, condition, and ROI, we include an interaction term between the control factor and frequency band. As this was a systematic issue affecting all collected data after a given date, we would not expect its effect on the data to be different in each ROI, or in each condition. For the language analyses, the control factor was included as a main effect, and we also interacted the control factor with the phase angle term (sine by cosine interaction) in the models. We did not include a vector length by control factor interaction in the LMEMs for vector length, because regardless of any shift in angle dependent on the date of testing, we would not expect intra-individual consistency as indexed by vector length to be affected.

ROIs and Rayleigh tests
The ROIs identified by the cluster-based approach were the left temporal (EGI 64-channel electrode numbers 24, 25, 27, 30), right temporal (44, 45, 46, 48, 52), parieto-occipital (28, 31, 33, 34, 35, 36, 37, 38, 39, 40) and right fronto-central areas (53,54,56,57,58,59,60). Further results, including the electrodes in each group, are reported in Appendix C. Our first phase-related research question concerned whether the phase of infants' responses fell into a similar temporal phase at the group level. Using the ROIs established by the clustering analyses, Rayleigh tests were used to find whether the spread of individuals' mean phase angles in response to the stimuli differed from a uniform distribution. Table 1 and Table 2 show the results for each age group, condition and ROI, with Table 1 showing results from infants tested before the systematic change related to date of testing, and Table 2 showing results from those tested after this date (14th February 2018; please note sample size difference for 6 and 9 months, with Table 2 having more observations). Significant p-values indicate that the circular distribution of phase angles was clustered in a particular direction. Fig. 2 shows these phase distributions over key RoIs for the later-tested group only; Figure  B1 in Appendix B shows both groups separately. Accordingly, with respect to our first research question, a group preferred phase was already established by 2 months of age over the parieto-occipital region, and group preferred phase extended to all other regions at six and nine months.

Vector length
Our second research question concerned the phase angles of individual infants. Vector lengths were used as a measure of individual response consistency (the longer the vector, the more consistently the response was measured at or around the same angle, time-locked to the stimulus onsets). At two months (auditory-only input), the LMEM showed a significant interaction between condition and frequency bin (Satterthwaite-corrected test of variable contributions to model: F(10, 6690) = 31.627, p < 0.0001; see Table C1 in Appendix for full tests of fixed effects). At 2 Hz, the vector lengths in response to both the drumbeat (β = 0.095, SE = 0.026, p = 0.0002) and syllable (β = 0.149, SE = 0.026, p < 0.0001) were longer relative to the base cases, indicating greater consistency in responses to the stimuli at this frequency. There was a significant estimate on the parieto-occipital 2 Hz response to the drumbeat (β = 0.127, SE = 0.037, p = 0.0005), suggesting further consistency at this ROI (marginal interaction in Satterthwaite test, F(30, 6690) = 1.35, p = 0.096). These two-month vector length results suggest   Figure B1. that infants responded consistently on an individual level to both the drumbeat and syllable. The Rayleigh results suggest that the preferred phase of this response varied across individuals, with the exception of the parieto-occipital drumbeat response. At six and nine months (audio-visual input), the LMEM interactions between frequency bin and condition (F(10, 13912) = 56.473, p < 0.0001) and frequency bin, condition, and RoI (F(30, 13912) = 2.052, p = 0.0006) were significant (results of tests of fixed effects are shown in Table C1). The drum vector was longer than the silent condition vector overall (β = 0.09, SE = 0.023, p < 0.0001), and even more so at 2 Hz (β = 0.198, SE = 0.032, p < 0.0001). The vector in response to the syllable was longer at 2, 3 and 4 Hz (β = 0.198, SE = 0.031, p < 0.0001; β = 0.073, SE = 0.032, p = 0.02; β = 0.1, SE = 0.032, p = 0.002) than 7 Hz. The interaction with RoI reflects a longer 2 Hz vector in response to the drumbeat over parieto-occipital (β = 0.098, SE = 0.045, p = 0.03) and right fronto-central areas (β = 0.11, SE = 0.045, p = 0.015). These vector length results show that infants' neural responses at the stimulus frequency were drawn into a particular, consistent phase relative to that stimulus. The infants were thus exhibiting individual phase consistency at both six and nine months of age in many ROIs.

Phase angle
The third set of analyses investigated whether, within individual infants, the mean phase angle (expressed separately as its cosine and sine value) for each condition differed from the angle observed in silence, particularly at 2 Hz. We focus here on the key interactions revealed by the mixed effect analyses (condition by frequency bin, condition by frequency bin by ROI); full tests of fixed effects are reported in Appendix C, Table C2. In the LMEM on the cosine of the two-month data, the 2 Hz response to the syllable differed significantly from the silent condition (β = 0.314, SE = 0.145, p = 0.031), while the 2 Hz response to the drumbeat did not (β = 0.135, SE = 0.145, p = 0.352). The cosine model also showed evidence for the effect of the control factor (infants tested before/after February 2018) at 2 Hz specifically (β = -0.129, SE = 0.059, p = 0.029). In line with Figure B1 (top row) in Appendix B, this indicates that infants tested later were showing phase angles closer to 180 • than 360 • . For the sine values at two months, the LMEM analyses showed a significant effect for the 2 Hz drumbeat response parieto-occipitally (β = -0.499, SE = 0.02, p = 0.014). The angle of the 4 Hz response to the syllable also differed from the silent angle (β = 0.289, SE = 0.144, p = 0.044), showing a further interaction with the right fronto-central region (β = − 0.488, SE = 0.02, p = 0.0164). However, there was also a drumbeat by right fronto-central interaction in the 5 Hz frequency band (β = − 0.413, SE = 0.02, p = 0.043). As we would expect observations in the 5 Hz frequency band to be as inconsistent as those in the 7 Hz band, this latter finding should be interpreted with caution. Overall the mixed effect analyses for the two-month-olds show evidence for a difference in 2 Hz mean phase angle between the silent condition and the syllable (cosine values) and the silent condition and the drumbeat (sine values, parieto-occipital region).
At six and nine months, the LMEM analyses showed a significant interaction between condition and frequency bin for both models (see Table C2). The cosine value of the drumbeat differed from the base case at 2 Hz (β = 0.284, SE = 0.141, p = 0.044), while the sine value differed for the syllable at 2 Hz (β = 0.459, SE = 0.138, p = 0.0009; drumbeat at 2 Hz marginal β = 0.249, SE = 0.14, p = 0.075). The control factor of date of testing affected both the sine (β = 0.12, SE = 0.034, p = 0.0004) and cosine values (β = 0.112, SE = 0.034, p = 0.0009), indicating an overall group difference in phase angle in line with the findings reported in Appendix B. Investigating the 2 Hz phase angles further, the LMEM models showed that they differed significantly between the right frontocentral region and the left temporal base case in response to the drumbeat (cosine value, β = − 0.631, SE = 0.2, p = 0.002; sine value, β = − 0.705, SE = 0.198, p = 0.0003) and the syllable (sine value only, β = − 0.948, SE = 0.195, p < 0.0001). Overall, therefore, compared to the silent condition, the 2 Hz phase angle differed on the y-axis for the syllable and the x-axis for the drumbeat. Tables C4 (cosine) and C5 (sine) show that there were also some differences between 7 Hz base case and the 3 Hz, 4 Hz, and 5 Hz frequency bands, by condition and by RoI. These comparisons across frequency bins are purely a control measure and do not provide a specific comparison of phase effects in each frequency band, as phase angles at each frequency were observed every 500 ms (i.e. at 2 Hz), and not at a rate adapted to that particular band.

Language analyses
Having established the presence of consistent phase angles at the level of individual infants for both the syllable and drumbeat stimuli, we turn to relations with language development. For the two-months-olds, single-level linear multivariate models were run with sixty-two infants who had parent-estimated 24-month vocabulary (UK-CDI) scores and also had data for all of the two month neural measures. There were no significant effects in this multivariate model. Fifty-seven infants had data for all of the infant-led language measures and complete two-month neural data. The multivariate models showed a significant effect of syllable phase angle (expressed as the sine by cosine interaction) on language outcomes (F(5, 43) = 3.604, p = 0.008). Follow-up single-level linear models (ɑ = 0.01) on separate language measures indicated an effect of syllable phase angle on infant-led 18-month vocabulary scores (Computerised Comprehension Test, CCT; 33). Examining all infants (n = 65) who had CCT data and complete two-month neural data, the interaction between the sine and cosine of the phase angle had a significant effect on their scores (β = 0.437, SE = 0.148, p = 0.005). The beta estimates in the follow-up tests illustrate the size of the effect. This model estimate of 0.437 would correspond to an additional 17 correct questions of 41 in the mathematically-impossible situation in which both the sine and cosine of an angle could be equal to one. Fig. 3 depicts the relationship, split by the control factor. This figure uses radians, because each sine-cosine pair necessarily has a corresponding angular value (e.g. a cosine of 0.5 and a sine of 0.66 corresponds to 60 • or 1.05 rad). To further aid interpretation, Figure C3 in Appendix C is the same as Fig. 3, but plots the sine and cosine values separately. Although the phase angle that coincides with the "peak" of the CCT score distribution differs by the control factor, there was no significant interaction between phase angle and the control factor (β = -0.23, SE = 0.159, p = 0.146), which suggests that the size and direction of the effect was similar for each group. The multivariate models and the follow-up model are reported in full in Appendix C (Tables C6, C7, and C10).
At six and nine months (audio-visual paradigm), the linear mixed effects models did not indicate any robust age-related effects on the vector length, sine or cosine values of the phase response. Therefore, before running the single-level multivariate models, we selected the nine-month age group for analysis because it contained fewer missing cases. Seventy-two children had data for all infant-led language outcome measures and complete nine-month neural data. The multivariate model showed a significant interaction between the drumbeat phase angle (expressed as the sine by cosine interaction) and the control factor (F(5, 58) = 3.759, p = 0.005, see Table C6). This suggests that there was an effect of phase alignment to the drumbeat (at nine months) on language outcomes later, and that it differed by the date of testing. Follow-up measures (ɑ = 0.01) were conducted on separate language measures, and included all infants with available nine-month neural data and behavioural data for each language measure individually (n = 82 for all NWR measures). These tests indicated that the drumbeat angle by group interaction occurred for two of the three non-word repetition (NWR) measures taken at 24 months, namely the proportion of correctly repeated consonants (β = -0.862, SE = 0.215, p = 0.0001) and the proportion of repetition attempts with the correct number of syllables (β = -0.752, SE = 0.224, p = 0.001), but not the correct placement of syllable stress (β = -0.422, SE = 0.216, p = 0.054). The negative values indicate that performance was better for infants whose nine-month phase angles had smaller sine and cosine values. Fig. 3 depicts the relationship between phase angle and correctly-repeated consonants. The multivariate model is reported in table C6, and the follow-up tests on the correct consonants and syllables measures are reported in full in Table C10 in Appendix C.
The parent-reported measures were also predicted by the neural measures taken at nine months (n = 77). The multivariate model showed significant effects of the syllable phase angle (sine by cosine interaction; F(2, 66) = 3.66, p = 0.031) and, as for the infant-led measures, the control factor by drumbeat phase angle interaction was significant (F(2, 66) = 7.54, p = 0.001). The effect of drumbeat vector length was marginal (F(2, 66) = 3.11, p = 0.051). In the follow-up tests on the parent-estimated comprehension and production scores separately (ɑ = 0.025), only the drumbeat phase angle by control factor interaction had a significant effect on subsequent vocabulary size (comprehension: β = -342.52, SE = 113.97, p = 0.004; production: β = -468.58, SE = 118.72, p = 0.0002). Although these beta estimates appear extremely large, again they refer to the difference between two mathematically impossible and idealised situations, with the negative sign indicating that a baby with a sine and cosine of 0 would know 342 more words and say 469 more words than a baby with a sine and cosine of 1. Since the sine and cosine of a given angle cannot be simultaneously equal to 0, or . In all figures, the blue dots are participants, ordered from the participant with the smallest phase angle to the largest (expressed in radians). Values are shown on the left y-axis. Note that participant numbers on the x-axis are not fixed but will vary depending on which participants provided data for the language measure in the plot and which did not. They will also differ by age group, as mean phase angle will have been changed between two and nine months. The red dots are language outcome measures, with values shown on the right y-axis. The red line is plotted by the geom_smooth function in ggplot2 using the loess method to illustrate the trend in the data points. The sinuisoidal shape of the trend lines is indicative of the underlying circular-linear relationship to the phase angle data. (A) shows Syllable Phase angle in radians at 2 months plotted against CCT score at 18 months, split by the control factor for date of testing. The right y-axis shows proportion of correct responses out of 41 trials. (B) shows drumbeat phase angle plotted against Consonants Correct from the NWR task at 24 months, with the proportion of correctly repeated consonants on the right y-axis. Only infants tested after 14th February 2018 are included. (C) shows drumbeat phase angle plotted against CDI comprehension score at 24 months. The right y-axis shows the total number of words on the UK-CDI that parents reported that their child understood. Only infants tested after 14th February 2018 are included. simultaneously equal to 1, the true differences would be more modest. A graphical depiction of the data is provided in Fig. 3.
Complete results of the follow-up tests considering comprehension and production scores separately are shown in Table 3. Two marginal (0.025 < p < 0.05) results in the follow up tests should be noted. The first is the negative effect of a longer vector length in response to the syllable on both comprehension (β = -147.7, SE = 68.68, p = 0.035) and production (β = -154.94, SE = 71.54, p = 0.034). This corresponds to 148 fewer words understood and 155 fewer words produced for a hypothetical infant with a vector length of 1 (i.e. an infant with totally consistent responses to the syllable, and totally inconsistent responses to the silent condition) versus an infant with a vector length of 0 (no difference in consistency between syllable and silence). The second is the positive interaction between drumbeat phase angle and vector length (β = 322.36, SE = 157.71, p = 0.045), which points to an increase in productive vocabulary size of 322 words for a hypothetical full unit increase in both sine and cosine in conjunction with a full unit increase in vector length. Such an increase is imaginary, given that a sine and cosine value of (1, 1) is impossible, and neither value can increase above 1. Likewise, a vector length of 1 is unlikely. A more realistic interpretation might be an increase of 32 words for each combined 0.1 unit increase in sine, cosine and vector length. Taken together, these vector length results indicate that the relation of language development to phase consistency may be more complex than that to phase angle. It may be that consistency is only beneficial in conjunction with an ideal phase angle.

Discussion
The Cambridge UK BabyRhythm data analysed here illustrate that even in early infancy, preferred phase in the tracking of simple, 2 Hz rhythmic auditory stimuli is significantly related to subsequent language development. Accordingly, and consistent with TS theory, the sensory/ neural mechanisms that sample and align low-frequency delta-band oscillations to the modulation frequency of incoming rhythmic auditory input appear to affect how efficiently infants learn language. Phase tracking of the amplitude envelope of speech is known to be present in newborns (Ortiz Barajas et al., 2021). Our data suggest that, as early as two months of age, individual differences in cortical phase alignment to rhythmic inputs at 2 Hz have consequences for language development extending beyond the first year of life.
Although based on simple rhythmic stimuli, our results accord with other recent infant studies of AM noise that identified delta-band effects (Telkemeyer et al., 2011), and with infant studies of natural language processing Attaheri et al., 2022;Menn et al., 2022) showing that the brain tracks rhythmically presented stimuli (including IDS) at their temporal modulation rates. However, it extends this previous research by examining group preferred phase and individual preferred phase in neural responses to rhythmic stimuli. Our approach specifically examined 2 Hz processing, as prior data from children with and without dyslexia has shown a relationship between neural responding at this rate and differences in phonological skills (Keshavarzi et al., 2022a;Power et al., 2013). The vector length and phase angle models utilized here indicated that infant neural responses were drawn into a particular 2 Hz phase relative to the stimulus at all ages. A longer vector length in response to the stimuli (compared to silence), seen at all ages, indicated a consistent phase response. Although the mean phase angle may have differed across individuals, the Rayleigh results indicated significant clustering around a particular point for all ROIs and stimuli by nine months (for the larger, later-tested group in Table 2), indicative of the existence of a robust group-level preferred phase for tracking 2 Hz stimuli by late infancy.
It is difficult to infer an "optimal" phase angle from our data, as the lag between the neural signal in the brain and its recording at the scalp is not known. Nevertheless, the data indicate that individual differences in the size of the phase angle had a significant effect on language performance. At two months, phase angle in response to the syllable on a group of language measures spanning late infancy and early toddlerhood had a significant effect, in particular on toddler vocabulary. The more positive both sine and cosine were, the higher the scores on an infant-led vocabulary task, the CCT. In the mathematically-impossible, idealised situation in which both the sine and cosine of an angle were equal to 1, the model beta estimate of 0.44 would correspond to an additional 17 correct vocabulary tokens out of 41. At two months, significant Rayleigh scores for the drumbeat and syllable were limited to two ROIs. Individual consistency in responses was present at two months, as indicated by the significantly longer 2 Hz vector lengths in response to the syllable versus the silent condition, while the sine and cosine model results suggest a difference in mean 2 Hz phase angle between the silent recording and the stimuli, albeit only over the parieto-occipital region for the drumbeat. Perhaps those infants who showed a specific angle of phase alignment to the syllable at two months were equipped to encode language more efficiently than other infants, leading to improved performance on a test of vocabulary by 18 months.
At nine months, phase angle in response to both the syllable and the drumbeat played a role in language development, via parent-estimated vocabulary (as measured using the UK-CDI, Alcock, Meints & Rowland, 2020). For the drumbeat, phase angle relative to the stimulus was related to both productive and receptive vocabulary when assessed individually, as well as when assessed together in the multivariate models (Table 3, and Table C7 in Appendix C), while for the syllable the effect was only found when controlling for shared variance via the multivariate model. The effects of alignment for both stimulus types could be indicative of a domain-general benefit of timing one's neural responses to an incoming rhythmic stimulus. Indeed, phase angle in response to the drumbeats (but not the syllables) also affected performance on the phonological task of non-word repetition. In the NWR task, two-year-olds had to repeat the stress patterns, consonants and number of syllables in real and non-words that they heard accurately, presented as names for toys. While there was no effect on correct stress assignment, the variation in both consonants and syllables produced correctly was related to phase angles in response to the drumbeat stimulus at nine months. Thus, while it appears that the angle of phase alignment to simple rhythmic stimuli affects vocabulary size regardless of whether those stimuli are linguistic or not, phase angle in response to Table 3 Follow-up univariate linear models on effects of nine month neural measures on 24-month UK-CDI scores. the drumbeats specifically related to phonological development. Alternatively the benefits seen for vocabulary and phonological development could be limited to these stimuli specifically (syllables and drumbeats), rather than reflecting a domain-general factor of neural rhythmic responding. Future research could investigate this possibility. Overall, these data are consistent with our hypothesis, derived from TS theory, that individual differences in the phase alignment between simple 2 Hz rhythmic stimuli and the corresponding 2 Hz neural response is important for language development. The data suggest that the timing with which rhythmic speech is sampled by the brain impacts vocabulary development, with considerable differences in vocabulary size by two years. They also show that phonological development relates to the consistency of neural oscillatory alignment to a simple drumbeat stimulus. The cortical tracking of drumbeats at 2 Hz may therefore be an appropriate target in the search for an early cross-language marker of language difficulties such as developmental dyslexia. Use of a drumbeat could enable development of a very simple biomarker that could be presented in a range of clinical settings. Such simple rhythmic EEG paradigms may enable detection of risk before the child has acquired a well-functioning language system (Colling, Noble & Goswami, 2017;Gibbon et al., 2021). In line with TS theory, these infant data highlight that the consistency and efficiency with which low-frequency neural oscillations sample incoming rhythmic signals plays a mechanistic role in later language development.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.