Learning to generalise but not segment an artificial language at 17 months predicts children's language skills 3 years later



Introduction
In order to acquire language, children need to be able to build a vocabulary; that is, they must determine what the meaningful units in speech are. They must also develop an understanding of the grammar of their language, which requires finding the relations between those words in utterances (Bates, Bretherton, & Snyder, 1988; Brinchmann, Braeken, & Lyster, 2019; Dale, Dionne, Eley, & Plomin, 2000; Dixon & Marchman, 2007; Labrell et al., 2014). How children resolve these two learning tasks can be characterised as a "chicken-and-egg" problem (Childers, Heard, Ring, Pai, & Sallquist, 2012; Gleitman, 1990; Gleitman, Cassidy, Nappa, Papafragou, & Trueswell, 2005): in order to acquire the meaning of a word, its grammatical role in a sentence has to be understood, yet, in order to understand the role of the word in the sentence, the grammar of the sentence needs to be known. Obviously, children do acquire both vocabulary and grammar. So, how do they accomplish these inter-related tasks?
Studies on children's language development have investigated the intimate relation between vocabulary and grammar development. Vocabulary might provide the base on which grammatical knowledge is built (Caselli, Casadio, & Bates, 1999; Marchman & Bates, 1994; Szagun, Steinbrink, Franik, & Stumper, 2006); alternatively, vocabulary and grammar may be inter-dependent (Dionne, Dale, Boivin, & Plomin, 2003; Moyle, Weismer, Evans, & Lindstrom, 2007), such that developing knowledge in one has a mutually beneficial effect on the other; or, alternatively, both vocabulary and grammar development may be driven by a general language learning ability (Brinchmann et al., 2019; Dixon & Marchman, 2007; Hoff, Quinn, & Giguere, 2018; Language and Reading Research Consortium, 2015; Logan & Jia, 2018). Testing children's language skills provides us with a measure that is a composite of children's ability to learn language together with variation in the communicative environment to which children are exposed, the latter of which can have a substantial effect on language development (e.g., Dale et al., 2000; Dionne et al., 2003; Rowe, 2012). In order to isolate children's learning skills from environmental contributions, directly testing this ability to learn may indicate more clearly the mechanisms that underlie children's different language development trajectories. Thus, instead of measuring children's natural language abilities (the outcome of children's learning skills), an alternative approach to understanding language development is to probe language learning skills themselves.
Artificial language learning tasks provide a possible means by which the mechanisms central to language learning can be interrogated. Statistical learning skills have been closely related to language development (Arciuli & Simpson, 2012; Kidd, 2012), and employing artificial languages has enabled researchers to highlight children's varied sensitivity to a range of statistical properties present in language, including both adjacent and non-adjacent dependencies between syllables in words, and words in phrases (Gómez & Gerken, 1999; Raviv & Arnon, 2018; Saffran, 2003). Non-adjacent dependencies are of particular theoretical interest in language learning because they underwrite the phonotactic and morphological structure of words, and they also occur in grammatical structures of sentences in natural language (Newport & Aslin, 2004), such as in number agreement between nouns and verbs (e.g., the penguin that is grey likes fish). The ability to identify non-adjacencies is thus an important aspect of word learning and grammatical processing (Wang, Zevin, & Mintz, 2019). In addition to identifying non-adjacencies, acquiring grammar requires the ability to flexibly apply and generalise such non-adjacencies to new words occurring in a similar structure (e.g., the penguin that waddles eats fish) (Wilson et al., 2020).
In a landmark study, Peña, Bonatti, Nespor, and Mehler (2002) devised an artificial language in which processes involved in learning words and learning grammar could be assessed by testing adult participants in different ways from the same input, by determining participants' sensitivity to, and use of, non-adjacent dependencies. The Peña et al. (2002) language incorporated non-adjacent dependencies that were taken to either define words, in a segmentation task, or define the grammatical structure of the language, requiring generalisation of the non-adjacency structure. Participants heard streams of continuous syllables of the form AXB, where the A and B syllables reliably co-occurred, but the X syllable varied over three other syllables (e.g., da-ro-pi-go-la-tu-da-la-pi-bu-fi-ko-go-ro-tu-…, where in this example the A_B non-adjacent pairs were da_pi, go_tu, and bu_ko, and the X syllables were ro, la, and fi). After listening to the sequence, participants were tested on their ability to extract words from the language by comparing selection of a word as defined by the non-adjacent statistical structure of the language (e.g., da-ro-pi) with a part-word which occurred during the training speech but that spanned the boundary between two non-adjacent dependencies (e.g., ro-pi-go). Participants were also tested on whether they were able to generalise to new sequences that respected the non-adjacent structure, by comparing selection of a generalised sequence (e.g., da-go-pi) with a part-word (e.g., ro-pi-go). Adult participants were able to segment the speech to distinguish words from part-words, but not able to distinguish generalised sequences from part-words. However, when short pauses were placed between the triplets, participants were able to generalise the grammar. Note that this generalisation task was more difficult than the segmentation (word learning) task, as segmentation requires recognition of a sequence consistent with the statistical structure of the language to which the learner has been exposed, whereas generalisation requires flexibly adjusting to novel sequences within the constraints of the structure. Consequently, accuracy is found to be lower for generalisation than segmentation for these tasks (Perruchet, Tyler, Galland, & Peereman, 2004; Peña et al., 2002). Frost and Monaghan (2016) repeated the studies with adults using the Peña et al. (2002) artificial language with non-adjacent structure in order to determine whether there was a segregation of segmentation and generalisation for this task. They replicated the results showing accurate segmentation performance for these non-adjacencies. However, they also tested the acquisition of generalised grammar sequences in a different way to Peña et al. (2002). The generalised sequences in the Peña et al. (2002) study interposed an A or a B syllable from one non-adjacency within another non-adjacency (e.g., da-go-pi contained go, which was an initial syllable in another sequence, go-X-tu). In Frost and Monaghan (2016), a novel syllable that had not occurred during the listening phase of the study was used within the non-adjacency. This innovation avoided possible interference from syllables occurring in different positions in the test stimuli than had occurred during training, as in the Peña et al. (2002) study, meaning that a more direct test of the grammatical learning could be accomplished. Frost and Monaghan (2016) showed that both segmentation and generalisation could be achieved from exactly the same speech input, and, in contrast to Peña et al.'s (2002) contention, that prior acquisition of words through segmentation was not required before generalisations of the structure were available to the learner. Frost and Monaghan (2017), using the same language to test segmentation and generalisation, found that sleep affected these two tasks differently, potentially indicating that, though both can be learned simultaneously, representation of words and grammar from the language may be distinct.
These studies were conducted with adults, but Marchetto and Bonatti (2013, 2015), using a slightly simplified version of the Peña et al. (2002) non-adjacency artificial language, showed that infants too had the ability to segment words and generalise structure from artificial speech sequences. Marchetto and Bonatti (2015) found that 12-month-old children could detect words in continuous speech, and were able to generalise the grammar of the language if pauses were interspersed between the triplets. Whether these pauses are necessary for children in their learning is an issue we return to in the Discussion.
However, the value of these artificial language learning studies resides in their ability to relate to natural language processes, so the validity of these tasks with regard to natural language abilities must also be established. There is a growing body of research linking artificial language learning tasks with natural language. For instance, Cheung, Hartley, and Monaghan (2022) found that whether children were late talkers or typically developing at 24 months predicted their ability to retain words in an artificial word learning task at 40 months. Similarly, Ahufinger, Guerra, Ferinu, Andreu, and Sanz-Torrent (2021) and McGregor et al. (2022) found that children who were diagnosed with developmental language disorder (DLD) demonstrated impaired learning in an artificial word learning task. In each of these studies, words were individuated in the speech, and only word learning was tested.
Relating early language learning ability in terms of both segmentation and generalisation skills would provide a further benefit over studies that have linked word learning to later language development, because this would also enable interactions between vocabulary and grammar to be examined. A step towards bridging multiple language learning processes detected through artificial language learning as predictors of natural language ability was taken by Frost et al. (2020). In their study, 17-month-old children's ability to learn an artificial language was related to natural language vocabulary skills from 19 through to 30 months. Frost et al. (2020) exposed children to the artificial language from Frost and Monaghan (2016), and then tested their segmentation and generalisation from this artificial language. Performance on both artificial language segmentation and generalisation was then examined in relation to children's vocabulary measured by a communicative development inventory completed when the children were aged 19, 21, 24, 25, 27, and 30 months. Children's segmentation in the artificial language was found to predict natural language vocabulary size at each time point. However, no significant relation was found between generalisation from the artificial grammar and the children's natural language vocabulary development. The Frost et al. (2020) study suggests that children's ability to use non-adjacent dependencies to segment words at 17 months was predictive of natural language vocabulary development over the next 13 months. The lack of a significant relation between artificial language structural generalisation and natural language vocabulary could be due to one of two possibilities. First, there may be a division between grammar and vocabulary in language development, such that artificial language segmentation relates to natural language vocabulary development and artificial language generalisation relates to natural language grammar. In this case, we would predict that the artificial language generalisation task should relate to a measure of natural language grammar skills, and the artificial language segmentation task should continue to relate to measures of natural language vocabulary. Second, it may be that there is no clear division between vocabulary and grammar development (e.g., Brinchmann et al., 2019), and that children's ability to learn grammatical structure, as measured by the artificial language generalisation task, may be relevant for vocabulary learning but that this may manifest itself only later in children's language development.
There is a wide range of contributors to children's language learning that have changing influences as children's language develops. In terms of environmental variables, the quantity of children's early language experience is initially important for children's vocabulary development, but the richness of children's input, rather than the quantity, becomes more important as children grow older (Rowe, 2012). Children also become more sensitive to socio-pragmatic cues as they age, which are critical to supporting language learning (Bruner, 1983; Donnelly & Kidd, 2021). The types of information that children build on for learning from within the language also change with age, with the syntactic constraints of words in sentences becoming more important in defining their meanings (Fisher, Gertner, Scott, & Yuan, 2010; Naigles, 1996; Pinker, 1989). For example, Gleitman et al. (2005) suggested that early vocabulary can be acquired effectively by mapping words onto their referents in the environment, and so acquisition of the child's vocabulary in the early stages requires skills involving identifying the word in speech and linking it to a referent. However, as the vocabulary grows, acquisition of new words requires understanding their role in the sentence rather than just linking to concrete referents in the environment. Later vocabulary development then becomes more dependent upon syntactic bootstrapping, whereby words are acquired by incorporating grammatical structure information to constrain and develop later lexical mappings (St. Clair, Monaghan, & Christiansen, 2010). Thus, at 30 months, natural language vocabulary development may be closely related to the ability to identify words in speech, explaining the link between artificial language segmentation and vocabulary development (Frost et al., 2020). However, after 30 months, we may see greater involvement of grammatical learning abilities predicting more advanced stages of vocabulary learning. If this is the case, then we would predict that the artificial language segmentation and generalisation tests should both relate to natural language vocabulary at later stages in language development.
In this study, we investigated the extent to which the artificial language learning tasks of segmentation and generalisation measured at 17 months relate to natural language vocabulary and grammar tasks at 54 months. These ages were selected because children at 17 months are close in age to the point at which children are first able to detect non-adjacent dependencies (Marchetto & Bonatti, 2015), and, for our sample, children at 54 months are just beginning formal schooling, which is an important age regarding children's language development and its influence on later academic achievement (von Hippel, Workman, & Downey, 2017). We know that the artificial language segmentation at 17 months can predict natural language vocabulary ability up to 30 months (Frost et al., 2020). We also know that there is a close relationship between natural language vocabulary and grammar skills in preschool children (Brinchmann et al., 2019; Language and Reading Research Consortium, 2015). Our first research question was thus whether the artificial language segmentation task predicts natural language vocabulary development over a longer period, up to the point at which children commence formal schooling in the UK (at 54 months), and also whether this artificial language segmentation task can predict children's grammar skills at 54 months. If so, language learning in a simple segmentation task early in children's development could be useful to provide insight into processes involved in long-term language development across the preschool years.
Our second research question investigated whether the artificial language generalisation task at 17 months was useful for predicting vocabulary and grammar at 54 months. Though there is a correlation between natural language vocabulary and grammar skills (Brinchmann et al., 2019), and these skills can be learned at the same time (Frost & Monaghan, 2016), they may still be underwritten by different processes (Frost & Monaghan, 2017; Peña et al., 2002). If there is a distinction between vocabulary and grammar skills learning, then we might observe that the artificial language generalisation task only relates to natural language grammar skills, and conversely that the artificial language segmentation task may relate only to natural language vocabulary skills. Alternatively, the generalisation task may not relate to natural language skills at all, consistent with Frost et al.'s (2020) finding that, at 30 months, no relationship was evident between artificial language generalisation and natural language vocabulary. A further possibility is that the greater complexity of the generalisation task may emerge as a valuable predictor of both vocabulary and grammar development at a later stage in children's language development (Gleitman et al., 2005). As natural language becomes more complex, the child's ability to build this complexity may be better predicted by the more complex statistical processing required for generalising the structure of language and utilising this structure to acquire new words.
In this study, we investigated the same cohort of children who had been tested on their vocabulary development at 30 months in Frost et al. (2020). We followed these children to age 54 months, the point at which they were just beginning formal schooling in the UK, and then tested their natural language vocabulary and grammar skills on a number of different tasks. In a pre-registered analysis (https://osf.io/8y67m/?view_only=ab7fbaf576e54fac9a2245ee4a6adf8b), we determined whether the 17-month artificial language segmentation and generalisation learning tasks could predict language skills 37 months later.

Participants
Participants were from a cohort of children from 95 families recruited as part of a larger longitudinal project in the North West of England, the Language 0-5 project (Rowland et al., unpub.). For more details of the project, see the Language 0-5 Project OSF site (https://osf.io/kau5f/). Nine of these families had a history of language delay or dyslexia, but the children were typically developing when tested at 17 months in terms of their receptive vocabulary. All children were monolingual, born at term, and had normal vision and hearing at both testing timepoints. For the 17-month test of artificial language learning, 71 children were tested (40 females, 31 males; aged between 16.5 and 17.5 months, mean age = 517 days). For the 54-month tests of natural language skills, between 62 and 72 children completed each test.

Materials
For the artificial language learning task, the speech was composed of four sequences in which the first and third syllables (ba_so, li_fe) co-occurred, and the second syllable could vary over one of two syllables (mu, ga). The four sequences that occurred were thus bamuso, bagaso, limufe, and ligafe. Speech stimuli were produced at a 140 Hz monotone pitch using the Festival speech synthesiser (Black, Taylor, & Caley, 1990).
For the training, sequences were synthesised as continuous speech such that co-articulation occurred between every syllable (within and across sequence boundaries). There were no direct repeats of sequences, and each of the four sequences occurred 200 times. All infants heard the same training sequence. The training speech lasted for approximately 15 min. Speech faded in for 5 s at the beginning and out for 5 s at the end of the training, and was presented through a loudspeaker attached to a computer.
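As an illustration, a training stream with these constraints can be generated in a few lines. This is our own sketch, not the study's code: the function name, the seed, and the sampling strategy are all choices made here, and the sketch simply produces 200 tokens of each of the four sequences with no immediate repeats.

```python
import random

def make_training_stream(n_per_seq=200, seed=1):
    """Build a training stream of the four AXB sequences (frames ba_so
    and li_fe, middles mu and ga), each occurring n_per_seq times,
    with no sequence immediately repeated."""
    sequences = ["bamuso", "bagaso", "limufe", "ligafe"]
    pool = sequences * n_per_seq
    rng = random.Random(seed)
    stream = []
    while pool:
        # only allow sequences that differ from the previous one
        candidates = [s for s in pool if not stream or s != stream[-1]]
        if not candidates:
            # dead end (all remaining items equal the last one): retry
            return make_training_stream(n_per_seq, seed + 1)
        choice = rng.choice(candidates)
        stream.append(choice)
        pool.remove(choice)
    return stream

stream = make_training_stream()
```

In the actual experiment, the resulting sequence of triplets would then be synthesised as one continuous, co-articulated speech stream.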
For the segmentation testing trials, the four sequences consistent with the co-occurring non-adjacent syllables were used as the target stimuli, and foil sequences were created from sequences that straddled the boundary between the target sequences, so they occurred during the training but were not consistent with the statistical structure of the language. The foil sequences were sobamu, feliga, gasoli, and mufeba.
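As a sanity check on these materials (our own illustration, not part of the study), one can verify programmatically that each foil spans a boundary between two distinct training sequences while not itself being a word of the language:

```python
# Words and segmentation foils as described above.
words = ["bamuso", "bagaso", "limufe", "ligafe"]
foils = ["sobamu", "feliga", "gasoli", "mufeba"]

def occurs_across_boundary(foil):
    """True if the foil appears inside some pair of distinct,
    concatenated training sequences, i.e. it straddles a word
    boundary in the continuous training speech."""
    return any(foil in w1 + w2 for w1 in words for w2 in words if w1 != w2)
```

For example, sobamu appears in bagaso + bamuso, so it was heard during training even though it violates the ba_so / li_fe non-adjacency structure.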
For the generalisation testing trials, the target stimuli were constructed from the non-adjacent dependent first and third syllables of the target sequences, with a novel syllable intervening between them. Thus, the generalisation targets were baniso, baposo, lidufe, and livefe. The foil stimuli used the last syllable of one of the training sequences and the first syllable of another sequence, with a novel syllable occurring before or after these syllable pairs. The four generalisation task foils were posoba, nifeli, solive, and febadu. Segmentation and generalisation testing trials were presented on a computer screen, with sound presented through loudspeakers.

Procedure
All children were tested in exactly the same way for the artificial language learning tasks (with the same order of tasks and the same order of stimuli within tasks). This reduced inter-individual variation due to extraneous contributions and follows protocols required for adapting experimental tasks to individual difference measures (Cooper, Gonthier, Barch, & Braver, 2017; Panter, Tanaka, & Wellens, 1992). However, this approach to reducing inter-individual variation introduces the possibility that there may be interference effects from one task to another. We return to this point in the Discussion.
During training, the child listened to the training speech for 15 min played at a comfortable volume while they engaged in quiet, non-verbal play with the experimenter. Then, the child was tested using an adapted head-turn preference paradigm (Kemler Nelson et al., 1995). For testing, the child was moved to a car seat positioned in front of an SR Research EyeLink 1000 Plus eyetracker (SR Research: Ottawa, Ontario, Canada) at a distance of 580-620 mm from a 17″ computer screen. The eyetracker was operated in remote mode with the remote arm configuration and a target sticker, which tolerated a small amount of movement by the child. Speakers were positioned to the left and right sides of the screen.
Testing commenced with a five-point calibration. Then, the child was presented with the segmentation test items. Infants heard a test item repeated with a 500 ms intervening pause, played through either the left or right speaker. The sounds were accompanied by an animated clip of a slow-moving hand against a black background, presented on the left or right of the screen corresponding to the sound source, similar to Marchetto and Bonatti's (2015) paradigm. Trials lasted up to 65 s (as in Marchetto & Bonatti, 2013, 2015), but terminated when the child looked away from the slow-moving hand stimulus for more than 2 s. A fixation stimulus then appeared at the centre of the screen, and the next trial was presented when the child fixated on this for 2 s. Each target and foil stimulus was presented twice, so there were 16 trials in all.
After the segmentation test, the child watched a cartoon with a non-verbal soundtrack for 135 s, and then proceeded to the generalisation test. The procedure was identical to that of the segmentation test, and there were 16 trials in total.

Analysis and scoring
For the artificial language learning tasks, responses were filtered by removing trials shorter than 700 ms (the approximate duration of one stimulus) and trials longer than 2 SD beyond the mean looking time across trials. Data from children who failed to provide responses for at least one target and one foil trial were removed; this resulted in data for one child being removed for the segmentation task, and data from nine children being removed for the generalisation task. The mean number of trials included per child was 11 (range = 2-16) for the segmentation task and 9 (range = 2-14) for the generalisation task.
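The trial-exclusion rules can be sketched as follows. This is a hypothetical implementation; in particular, whether the mean and SD are computed before or after removing the short trials is our assumption (here they use all of the child's trials).

```python
from statistics import mean, stdev

def filter_trials(looking_times_ms, min_ms=700):
    """Drop trials shorter than ~700 ms (one stimulus duration) and
    trials more than 2 SD above the child's mean looking time."""
    m, sd = mean(looking_times_ms), stdev(looking_times_ms)
    return [t for t in looking_times_ms
            if t >= min_ms and t <= m + 2 * sd]
```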
For both artificial language learning tests, we used the individual measure of preference for each child computed over the whole set of stimuli, developed by Frost et al. (2020). For the segmentation task, this measure takes the Cohen's d effect size of preference for looking to words over part-words for each individual child. A positive value for that child indicates longer looking to words over part-words, a negative value indicates longer looking to part-words over words, and a value close to zero indicates no preference. For the generalisation task, the measure was the effect size of each child's looking time to generalised sequences over part-words. A positive value indicates that the child looked more to the generalised sequences than part-words, a negative value indicates more looks to the part-words, and a value close to zero indicates no preference.
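A minimal sketch of such a per-child preference score is given below, assuming a pooled-SD version of Cohen's d over a child's target-trial and foil-trial looking times; the exact formula used by Frost et al. (2020) may differ in its details.

```python
from math import sqrt
from statistics import mean, stdev

def preference_d(target_looks, foil_looks):
    """Per-child preference: Cohen's d for looking times to targets
    versus foils, using a pooled standard deviation. Positive values
    mean longer looking to targets; negative, to foils."""
    n1, n2 = len(target_looks), len(foil_looks)
    pooled_var = ((n1 - 1) * stdev(target_looks) ** 2 +
                  (n2 - 1) * stdev(foil_looks) ** 2) / (n1 + n2 - 2)
    return (mean(target_looks) - mean(foil_looks)) / sqrt(pooled_var)
```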

Materials
Natural language vocabulary was tested with a set of standardised assessments. We used the British Picture Vocabulary Scale (BPVS-3; Dunn, Dunn, Sewel, & Styles, 2009) as a receptive vocabulary test, and the Renfrew Word Finding Vocabulary Test (Renfrew & Mitchell, 2010) as an expressive vocabulary test. Natural language grammar was tested with the Test for Reception of Grammar (TROG-2; Bishop, 2003) as a receptive grammar test, and the Renfrew Bus Story, revised edition (Renfrew, 2010), as an expressive grammar test.

Procedure
At 54 months, the children in the study were tested on a large battery of assessments; four of these tests were included in the pre-registration for this study, which focused on expressive and receptive tests of vocabulary and grammar. The tests that we analysed in the current study were administered in the following order: (1) the TROG-2; (2) the Renfrew Word Finding Vocabulary Test; (3) the Renfrew Bus Story Task; and (4) the BPVS-3. For all assessments, children sat in a quiet room with the experimenter, accompanied by a caregiver. Children completed each assessment sitting opposite the experimenter at a table.
The TROG-2 was used to assess children's receptive grammar. Children were shown groups of four pictures and asked to point to the picture that matched the sentence spoken by the experimenter. The sentences increased in difficulty, from simple to complex, as the test progressed. There were four sets of pictures per block and 20 blocks in total. We began at block A for all children, and proceeded until they reached the discontinue criterion for the test, which was five failed blocks in a row.
The Renfrew Bus Story Test was used to measure children's expressive grammar through narrative speech. The experimenter told children a story about a bus that was illustrated with 12 pictures. Following this, children were asked to retell the story. This was recorded with a Dictaphone, and transcribed and coded after the session in CHAT format (MacWhinney, 2000) using the associated Computerized Language Analysis (CLAN) software.
The BPVS-3 was used to assess children's receptive vocabulary. Children were presented with an array of four pictures and were asked to select the picture that matched the word spoken by the experimenter. The words increased in difficulty throughout the assessment. There were 12 picture arrays in each set and 14 sets in total. At 54 months, children started at set 2, and all items were presented until they reached the discontinue criterion in the test, which was eight or more wrong in one set.
The Renfrew Word Finding Vocabulary Test assessed children's expressive vocabulary. Children were shown individual line-drawn pictures and asked, 'What's this?' for each picture. The pictures were presented in order of difficulty until the discontinue criterion was reached, which was failure to name five pictures in a row.

Analysis and scoring
For the TROG-2, we used the raw scores for the test. For the Renfrew Bus Story task, we used two measures of expressive grammar from the transcriptions, scored according to the manual (Renfrew, 1994). The first was the sentence length score, which was calculated as the mean length in words of each child's 5 longest utterances. These scores were rounded to the nearest whole number. The second was the subordinate clause score, where children were awarded a score of 1 if a sentence contained a subordinate clause, and 0 if not. The total number of sentences containing subordinate clauses was the resulting measure. Sentence length and subordinate clause use are likely to be related, as longer sentences are more likely to contain subordinate clauses, but subordinate clauses provide a more focused assessment of the hierarchical structural complexity of children's grammatical productions.
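The two Bus Story measures can be sketched as follows, assuming the utterances have already been transcribed and hand-coded for subordinate clauses; the function name and input format are our own, and no automatic clause detection is attempted.

```python
def bus_story_scores(utterances, has_subordinate):
    """Two expressive-grammar measures from a Bus Story retelling.

    utterances: list of utterances, each a list of words.
    has_subordinate: parallel list of booleans marking whether each
    utterance contains a subordinate clause (coded by hand from the
    CHAT transcripts)."""
    lengths = sorted((len(u) for u in utterances), reverse=True)
    top = lengths[:5]
    sentence_length = round(sum(top) / len(top))  # mean of 5 longest
    subordinate_score = sum(has_subordinate)      # count of clauses
    return sentence_length, subordinate_score
```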
For the BPVS-3, we used the raw scores for receptive vocabulary. For the Renfrew Word Finding Vocabulary Test, we again used the raw scores of accuracy.

Descriptive statistics
Descriptive statistics for all the measures are shown in Table 1, and the correlations among the measures are shown in Table 2. Table 1 also reports the standardised scores for the BPVS-3 and TROG-2 tests.
In order to analyse the relations among the 17-month artificial language learning measures and vocabulary and grammar abilities at 54 months, we conducted structural equation modelling, using lavaan in R (Rosseel, 2012). We used full information maximum likelihood to account for missing values in the dependent variables. For the modelling, we used standard fit indices with cut-off points as recommended by Hu and Bentler (1999), with good model fit indicated by a root mean square error of approximation (RMSEA) < 0.06, a comparative fit index (CFI) > 0.95, and a Tucker-Lewis index (TLI) > 0.95, together with χ2 difference tests, where a significant difference would indicate that the model and data diverge and that the model is therefore not a good approximation to the data.
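For illustration, these fit indices can be computed from the model and baseline (independence) χ2 statistics using their standard formulas. lavaan reports them directly, so this sketch only serves to make the definitions concrete; the n - 1 convention in the RMSEA denominator is one common choice.

```python
from math import sqrt

def fit_indices(chi2, df, chi2_base, df_base, n):
    """Standard RMSEA, CFI, and TLI formulas from the model chi-square
    (chi2, df), the baseline-model chi-square (chi2_base, df_base),
    and the sample size n."""
    rmsea = sqrt(max(chi2 - df, 0) / (df * (n - 1)))
    cfi = 1 - max(chi2 - df, 0) / max(chi2_base - df_base, chi2 - df, 1e-9)
    tli = ((chi2_base / df_base) - (chi2 / df)) / ((chi2_base / df_base) - 1)
    return rmsea, cfi, tli
```

A well-fitting model (χ2 close to its degrees of freedom) yields RMSEA near 0 and CFI/TLI near 1, matching the cut-offs above.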

Confirmatory factor analysis: Determining relations among the natural language measures
In accordance with our pre-registered analysis plan, we first determined whether the two vocabulary measures (BPVS-3 and Renfrew Vocabulary) and the three grammar measures (TROG-2, Renfrew sentence length, Renfrew subordinate clause score) could be effectively combined into two latent variables (one for vocabulary and one for grammar) using confirmatory factor analysis. The model did not provide a good fit to the data, RMSEA = 0.290, CFI = 0.806, TLI = 0.515, χ2(4) = 28.144, p < .001. Furthermore, the model fit resulted in a correlation that problematically exceeded 1 between the vocabulary and grammar factors (indicating that the covariance matrix was not positive definite; Chen, Bollen, Paxton, Curran, & Kirby, 2001). This was likely due to the high correlation between the TROG-2 and the vocabulary measures (see Table 2).
We followed our pre-registration plan by next examining modification indices, which test how model fit to the data changes by relaxing a constraint of the model (i.e., adding a path into the model). We found the greatest change in fit by adding the TROG-2 to the vocabulary factor, indicating that the TROG-2 measure shared variance with the vocabulary measures as well as the grammar measures. The correlation between the TROG-2 and the BPVS-3 measures may be partly due to shared similarities in terms of the way in which the task is presented and how children respond: both are receptive tests, requiring the child to listen to speech, then select one of four pictures. Logan and Jia (2018) found a similarly close relationship between these tests for children in the same age range. We thus addressed this poor fit of the model to the data by conceptualising the natural language measures as a single factor (that is, a general language skill construct representing all the natural language vocabulary and grammar measures together), rather than separating vocabulary and grammar into separate factors, to see if this resulted in a good fit to the data.
Running this confirmatory factor analysis resulted in poor fit to the data, RMSEA = 0.280, CFI = 0.773, TLI = 0.545, χ2(5) = 33.324, p < .001. Again, according to our pre-registration plan, we tested the modification indices, which indicated that adding covariance between the Renfrew sentence length and Renfrew subordinate clause measures would improve model fit, and so we also added this pathway. As these measures were computed from the same task, it is likely that they do covary. Adding this covariance resulted in a good fit, RMSEA = 0.000, CFI = 1.000, TLI = 1.012, χ2(4) = 3.381, p = .496.

Structural equation model: Linking artificial language learning at 17 months to language skills at 54 months
We next applied these results of the confirmatory factor analysis to structural equation modelling. First, we conducted the original pre-registered plan for the structural equation model relating the 17-month measures of artificial language segmentation and generalisation to the vocabulary and grammar factors. We anticipated that this structural equation model would not provide a good fit, because of the variance shared between the TROG-2 measure and the BPVS-3 vocabulary measure, which is not respected in the model, and this was confirmed to be the case. These results are reported in Supplementary Information S1.
The second way we addressed structural equation model fit, paralleling the approach taken in the confirmatory factor analyses, was to construct a structural equation model with one latent variable (a general language construct) to represent shared variance among all the language measures at 54 months, including covariance between the Renfrew sentence length and Renfrew subordinate clause measures. This model enabled us to test whether the natural language measures at 54 months were fit well by conceiving of them as a unitary skill, and then to determine whether the 17-month artificial language learning measures related to this general language skills latent variable at 54 months. The general language construct model does not enable us to distinguish the extent to which the artificial language learning measures predict natural vocabulary and grammar development individually, but it does enable us to determine whether the shared variance between natural language vocabulary and grammar skills is predicted by either artificial language segmentation or generalisation performance. Interrogating the correlations in Table 2 also provides insight into relations among particular natural language measures and the artificial language learning tasks.
The model included one latent variable (a general language skills variable) and covariance between the Renfrew sentence length and subordinate clause scores. The model fit was good, RMSEA = 0.043, CFI = 0.987, TLI = 0.978, χ²(12) = 13.326, p = .346. This model is shown in Fig. 1. The model demonstrates a significant negative relation between the artificial language generalisation task at 17 months and the language measures at 54 months, indicating that longer looking times to foil stimuli over target generalisation stimuli related to higher natural language scores. Taken together with the correlations among the individual natural language measures, the results demonstrate that the artificial language segmentation task at 17 months did not predict either natural language vocabulary or natural language grammar skills at 54 months, but that the artificial language generalisation task did predict a combined measure of natural language vocabulary and grammar skills.
The relations between artificial language segmentation and generalisation with the combined natural language latent variable are shown in Fig. 2.

Discussion
Our primary aims for this study were to determine whether and how measures of segmentation and generalisation in an artificial language learning task, presented to children at 17 months, could predict natural language vocabulary and grammar performance 37 months later, just prior to the children beginning school. Language and communication skills at school entry are a key predictor not only of literacy development but also of academic success more generally (von Hippel et al., 2017). Understanding the mechanisms underlying children's language skill development early in childhood has key advantages for supporting children who are likely to be delayed in terms of their communicative development (Dale, Price, Bishop, & Plomin, 2003; Leonard, 2009, 2014; Reilly et al., 2010; Rescorla, 2009).

Fig. 2. Individual children's scores on (a) the artificial language segmentation and (b) the artificial language generalisation tasks compared to the combined natural language vocabulary and grammar latent variable from the confirmatory factor analysis model. Notes: lines show the linear regression of the relation, with grey shaded areas indicating 95% CIs; one outlier on the generalisation task was removed from the figure to show the relation more clearly.
P. Monaghan et al.

Regarding our first research question, to determine the link between artificial language segmentation at 17 months and language development at 54 months, we assessed a structural equation model which represented a combined language variable, constructed from a set of natural language vocabulary and grammar skill tasks given to children at 54 months. We found no evidence of segmentation performance at 17 months predicting vocabulary or grammar 37 months later. In previous analyses of the same children's language development up to 30 months, Frost et al. (2020) found that the 17-month segmentation task did predict vocabulary: children who showed a greater preference for foils (non-conforming stimuli) over words (stimuli that conformed to the non-adjacent statistical language structure) had larger vocabularies at age 30 months than children who showed a greater preference for words over foils.
So, what changed between 30 months and 54 months in terms of vocabulary development? One possibility is that segmentation performance in an artificial language learning task can account for language development at earlier stages, when the structures that the child is required to learn are less complex, and the child's language mastery is less complete. Consistent with Frost et al. (2020), Weyers, Männel, and Mueller (2022) found that 8- to 10-month-old children with better language skills showed sensitivity to non-adjacent sequences in an artificial language as measured using EEG, but no sensitivity was detected in children with poorer language skills. Similarly, Gerbrand, Gredebäck, Hedenius, Forsman, and Lindskog (2022) found that visual sequence processing (of adjacent dependencies) at 10 and 18 months predicted children's vocabulary at 18 months. Thus, statistical processing of sequences, required to identify words in speech, may effectively predict children's earlier vocabulary skills. However, the ability to segment words may become less relevant at later stages of natural language vocabulary development. Though there are many changes in children's sensitivity to, and use of, different information sources in language learning (Bruner, 1983; Donnelly & Kidd, 2021; Pinker, 1989; Rowe, 2012), the distinction between mechanisms for segmentation of speech into words and for generalisation of structure from the non-adjacent statistical dependencies in speech is consistent with accounts of language development that describe the growing importance of syntactic constraints on language learning (e.g., Fisher et al., 2010). For example, Gleitman et al.
(2005) proposed that after the initial stages of vocabulary development, the structural relations among words in speech become more important in vocabulary acquisition, whereas at earlier stages of vocabulary development word learning can proceed merely by identifying the speech sounds and linking them to referents. Our artificial language task thus relates to Gleitman et al.'s (2005) hypothesis: segmentation of words in artificial language speech is predictive of vocabulary development up to 30 months, but not at 54 months. Though syntactic bootstrapping accounts of language development are often discussed with regard to verb learning, using the contextual constraints of words is known to also be helpful for the acquisition of words from other grammatical categories (e.g., Mintz, 2003; Monaghan, Christiansen, & Chater, 2007). Furthermore, processing the sentential context of a word becomes more important as children must learn different senses of words, and determine how relational terms affect the meaning of the content words in an utterance, processes that increase in importance as children's language progresses over the first 5 years of life (e.g., Harris, Golinkoff, & Hirsh-Pasek, 2011; Naigles, 1996). The computation of context, therefore, likely applies cumulatively to words from all grammatical categories.
An alternative possibility for the change in results from 30 months to 54 months is that the way in which vocabulary was measured affected the predictive relationship between the artificial language segmentation task and the natural language measures. In the measures up to 30 months, children's vocabulary was estimated using Communicative Development Inventories (Alcock, Meints, & Rowland, 2020; Meints & Fletcher, 2001). These are caregiver report questionnaires that ask which words on a checklist a child can say and/or understand. In the measures given at 54 months, the child's vocabulary was tested directly, either by measuring children's comprehension of spoken words through selecting one of a series of responses (for the BPVS-3), or by analysing the child's production of words in response to pictures (for the Renfrew Vocabulary test). However, it seems unlikely that there is a qualitative difference between direct tests of children's vocabulary and caregiver report measures, as CDI scores and direct measures of receptive vocabulary tend to correlate highly (Fenson et al., 1993; Pan, Rowe, Spier, & Tamis-LeMonda, 2004).
Our second research question concerned whether children's ability to generalise over the structure of an artificial language at 17 months predicted later language development. Here, we did observe a significant effect. The structural equation modelling demonstrated that the artificial language generalisation task at 17 months predicted differences in natural language skills 37 months later. The children with higher vocabulary skills showed longer looking times towards novel foil stimuli (that did not conform to the grammatical structure of the language) over target stimuli (that did conform to the grammatical structure of the language). This is in the same direction as the relationship that Frost et al. (2020) observed when predicting vocabulary development up to 30 months from behaviour in the artificial word learning task at 17 months. In that study, children with a preference for looking longer at novel foil stimuli that did not conform to the non-adjacent structure of the language (i.e., that were not words in the language) over target stimuli whose sequences did conform to the non-adjacent structure (that were words in the language) were those with larger vocabularies. In the current study, this novelty preference for non-conforming over conforming generalisation sequences (both containing a novel syllable that the children had not heard before) related to their language abilities at 54 months. Note, though, that Frost et al. (2020) did not find a relation between artificial language generalisation performance and vocabulary up to 30 months.
In terms of evidence for distinctions between vocabulary and grammar skills in early natural language development, the results are inconclusive. Our study was not specifically designed to test how these natural language skills develop, which would require regular longitudinal measures of vocabulary and grammar and investigation of the relation between those skills (e.g., Brinchmann et al., 2019; Hoff et al., 2018). Rather, the focus of our study was the cognitive precursors of those natural language vocabulary and grammar skills. What we did find is that a confirmatory factor analysis with vocabulary and grammar as separate latent variables did not provide a good fit to the data, because in our data there was a strong relationship between the vocabulary and the grammar measures: children whose vocabulary skills were strong also demonstrated good grammar skills. In part due to this strong relationship, the model with one latent variable representing a general language construct provided a good fit to the data, indicating that, for the set of tasks included in this study, there was substantial shared variance between vocabulary and grammar in children's language skills. The correlations shown in Table 2 indicate that there are correlations among tasks measuring grammar, and between the two measures of vocabulary, but that there are also correlations across the vocabulary and grammar tasks, which may partially reflect the ways in which the skills were measured (such as the high correlation between the BPVS-3 measure of vocabulary and the TROG-2 measure of grammar, which are both receptive measures requiring selection of one picture from a set).
These results are consistent both with models of language development that posit a general cognitive skill underlying development of vocabulary and grammar alike (e.g., Brinchmann et al., 2019) and with models that propose there is a degree of separation between these skills (Caselli et al., 1999; Hoff et al., 2018; Marchman & Bates, 1994; Szagun et al., 2006). From our natural language measures alone, deciding between these alternatives is not possible. Our study was designed, however, to relate the artificial language learning tasks to natural language measures, and there we see an emerging distinction in the learning abilities relating to language development at 54 months, with generalisation predicting language skills but segmentation not predicting language at this stage of development.
However, the set of natural language tasks measuring vocabulary and grammar spanned both receptive and expressive tasks, and, as previously mentioned, the correlation in performance between the receptive tasks in particular seemed to be strong. In the LARRC et al. (2018) study of 60-month-old children, there was a high correlation between vocabulary and grammar constructs (r = 0.935), and, as in the current study, BPVS and TROG tests were included in the set of measures designed to assess vocabulary and grammar in natural language. In the Language and Reading Research Consortium (2015) study of children in pre-school and the first year of formal schooling, vocabulary and grammar were found not to be reliably distinct constructs, consistent with the current study. It may be that more reliable distinctions between vocabulary and grammar constructs emerge later in language development, beyond the preschool years, at least for receptive language.
In parallel with the debate about distinctions between vocabulary and grammar in children's language development, the artificial language learning tasks used at 17 months were originally devised to determine whether segmentation and generalisation are dissociable processes in language learning, highlighting distinctions in the mechanisms involved in word and grammatical learning. Peña et al. (2002) and Marchetto and Bonatti (2015) found that only when pauses were placed between sequences in the speech were learners able to generalise the structure of the language. Frost and Monaghan (2016) found that both segmentation and generalisation were possible when novel, rather than repositioned, syllables were used in the test sequences, casting doubt on a computational distinction between segmentation and generalisation of this language, at least for adult learners. The current study shows that pauses in the stimuli are not necessary for moving from segmentation of words to grammatical generalisation for infants either, as individual differences on the segmentation and generalisation tasks (using a language without pauses between non-adjacent sequences) were both found to relate to language development (in Frost et al., 2020, and in the current study, respectively). Hence, children's identification of words and generalisation over the artificial language were both assessing early mechanisms relevant to language development. However, our study does point toward a dissociation between these two tasks in terms of how they relate to different stages of children's language development. At age 30 months, segmentation is predictive of vocabulary skills, whereas by 54 months, it is generalisation that predicts variance in language skills. Segmentation may be more critical early in language development, with generalisation becoming more important as the child's language matures and as the language experience and structures from which children learn become more complex (Gleitman et al., 2005; Huttenlocher, Vasilyeva, Cymerman, & Levine, 2002; Theakston & Lieven, 2017).
A further advantage of using continuous speech, rather than presenting pauses between the sequences, is that we can be more certain that learners are responding to the non-adjacencies rather than to positional information. When sequences are separated by pauses, the learner may first become sensitive to the position of syllables within the sequence. Indeed, in a study with adults using the same language as Peña et al.'s (2002), with pauses between sequences, Perruchet et al. (2004) showed that learners had a preference for sequences that respected the position of syllables over part-words, where the syllable positions were consistent with those used in the words in training but the non-adjacencies were violated. Hence, in studies with pauses, positional information encoding is a potential confound with non-adjacent dependency processing. In a series of studies, Endress and Bonatti (2007) showed that adult learners first acquired the syllable positions of non-adjacent triplets when there were pauses between the sequences, before acquiring dependencies among those syllables. However, when pauses were not present between the sequences, the positional information was not acquired before sensitivity to the non-adjacent dependencies, and so positional information is unlikely to be a confound in determining processing of non-adjacencies when pauses, or other cues indicating positions of syllables, are not present within the sequences. Further, the use of novel syllables to test generalisation ensured that violations of syllables in particular positions were not driving the results, which could have been a further potential confound in previous studies where generalisation was tested by moving syllables from other positions in the sequences (see Frost & Monaghan, 2016, for discussion).
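To make the structure of such an artificial language concrete, the following minimal sketch generates a Peña et al. (2002)-style A_X_C language with non-adjacent dependencies, a continuous (pause-free) speech stream, and generalisation items built from a held-out middle syllable. The syllables and parameters here are hypothetical illustrations, not the actual stimuli from this or the cited studies:

```python
import random

# Hypothetical frame (A ... C) pairs: each A syllable predicts its C syllable.
FRAMES = [("pu", "ki"), ("be", "ga"), ("ta", "du")]
MIDDLES = ["li", "ra", "fo"]   # trained X syllables, free to vary
NOVEL = ["zo"]                 # held-out X syllable for generalisation testing

def make_words(frames, middles):
    # Every word respects the non-adjacent dependency: A predicts C
    # regardless of which middle syllable intervenes.
    return [a + x + c for (a, c) in frames for x in middles]

def make_generalisation_items(frames, novel_middles):
    # Targets keep the A_C dependency but contain a never-heard middle
    # syllable, so only the non-adjacent structure can support them.
    return [a + x + c for (a, c) in frames for x in novel_middles]

def make_stream(words, n=200, seed=0):
    # Continuous speech stream: words concatenated with no pauses,
    # so segmentation must rely on the statistical dependencies.
    rng = random.Random(seed)
    return "".join(rng.choice(words) for _ in range(n))
```

Foils for either test would be built by violating the A_C pairings (e.g., combining the first frame's A with another frame's C), while stream statistics otherwise stay matched.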
A potential issue in the design of the study was that the artificial language generalisation task occurred after the segmentation task for all children. This order effect may have manifested itself in two ways. First, it may be that those children who performed well on the second task were those more able to attend to tasks for longer, relating to general cognitive resources enabling endurance of an additional task. Second, it may be that there was contamination from the first task to the second, meaning that the measure of generalisation is not distinct from learning that could have occurred during the segmentation testing.
We contend that these explanations are unlikely for a number of reasons. In terms of generalisation performance being a proxy for the cognitive resources of the child, it is possible that children with greater resources could attend to the task for longer, thus showing more sensitivity to the generalisation stimuli, and children with more cognitive resources may have higher natural language skills (independent of the artificial language task). However, critically, in the generalisation task children showed sensitivity to the stimuli in two ways, and this sensitivity related to either higher or lower language skills. Children who responded with a novelty preference to the generalisation stimuli were those with higher language skills, whereas those with a familiarity preference were those with lower language skills (see Fig. 2b). Children who did not demonstrate sensitivity to the stimuli were those with intermediate language skills.
To test this effect further, we included an additional measure of cognitive resources (working memory) in the structural equation model, to determine whether the artificial language tasks predicted natural language skills when working memory was also included as a predictor of natural language. The working memory test was the BRIEF-P digit span task (Gioia, Isquith, Guy, & Kenworthy, 2000), which was administered to the children in this study at 37, 43, and 49 months. We selected the 49-month test as it was the closest in time to the natural language measures taken at 54 months. As this analysis was not part of our pre-registration, the results are reported in Supplementary Information S2. The key results in the structural equation model shown in Fig. 1 were unchanged: when working memory resources were also taken into account, artificial language generalisation, but not segmentation, still predicted natural language skills.
In terms of the second potential limitation (that generalisation performance could have been affected by segmentation performance), again, we suggest that this order effect does not appear to have influenced the key results. In the design of the study we attempted to minimise the possibility of learning and transfer from the stimuli used for testing segmentation to those used for testing generalisation. We placed a 2-minute pause between these tasks to mitigate against interference. In addition, as Frost et al. (2020) note, the segmentation test contains target stimuli containing non-adjacent dependencies, but also an equal number of foil stimuli containing no non-adjacent dependencies, thus reinforcing words and part-words to an equal degree. We do acknowledge, however, that there may have been greater similarity between the segmentation and generalisation targets than between the foils for these tasks. However, if there was an influence of one task on the other, then we would expect a correlation between the tasks, yet there was no significant correlation between segmentation and generalisation scores (see Table 2), meaning that sensitivity to the structure in segmentation testing did not result in higher or lower sensitivity for the generalisation test. Similarly, if the ability to remember the segmentation test stimuli influenced performance on the generalisation test, then we would anticipate an effect of working memory on generalisation test performance. There was no significant correlation, r(55) = −0.25, p = .089.
For these reasons, we are confident that the generalisation test scores reflect the child's use of non-adjacent dependencies to generalise to novel, but conforming, stimuli. Though children's communicative development is in large part a consequence of their environment (Dale et al., 2000; Dionne et al., 2003), there is still a component of individual differences in cognitive processes that affects development. The current study contributes to a growing literature that is uncovering the mechanisms involved in children's language development (Ahufinger et al., 2021; Cheung et al., 2022; Hartley, Bird, & Monaghan, 2020; McGregor et al., 2022; Peter et al., 2019). The variance explained by the artificial language generalisation task is significant but relatively small, accounting for approximately 11% of the variance in later natural language skills, and so the value of our result is primarily to elucidate the processes involved in language development rather than to serve as a practical predictor of natural language skills. Nevertheless, our direct measures of language learning may explain additional variance in language development alongside established measures such as early vocabulary size, nonword repetition performance, socioeconomic status, family history of language disorders, and gender (Dale et al., 2003; Hammer et al., 2017; Hodges, Munro, Baker, McGregor, & Heard, 2017; Law & Roy, 2008; Marini, Ruffino, Sali, & Molteni, 2017; Rescorla, 2009; Stokes & Klee, 2009).
In this study, we have investigated whether a language learning task administered early in children's language development (at 17 months) is able to predict language skills 37 months later. We found that it is. The results demonstrate the possibility of using laboratory-based learning tasks to probe children's ability to acquire language, to pinpoint different cognitive mechanisms involved in children's language learning, and to predict the pattern of children's language development over the critical first few years of life.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table 1
Descriptive statistics for all measures.

Table 2
Pairwise correlations among the measures.