Non-adjacent dependency learning in infancy, and its link to language development

To acquire language, infants must learn how to identify words and linguistic structure in speech. Statistical learning has been suggested to assist both of these tasks. However, infants’ capacity to use statistics to discover words and structure together remains unclear. Further, it is not yet known how infants’ statistical learning ability relates to their language development. We trained 17-month-old infants on an artificial language comprising non-adjacent dependencies, and examined their looking times on tasks assessing sensitivity to words and structure using an eye-tracked head-turn-preference paradigm. We measured infants’ vocabulary size using a Communicative Development Inventory (CDI) concurrently and at 19, 21, 24, 25, 27, and 30 months to relate performance to language development. Infants could segment the words from speech, demonstrated by a significant difference in looking times to words versus part-words. Infants’ segmentation performance was significantly related to their vocabulary size (receptive and expressive) both currently, and over time (receptive until 24 months, expressive until 30 months), but was not related to the rate of vocabulary growth. The data also suggest infants may have developed sensitivity to generalised structure, indicating similar statistical learning mechanisms may contribute to the discovery of words and structure in speech, but this was not related to vocabulary size.


Introduction
To reach linguistic proficiency, infants must master two critical tasks: identifying words in speech, and discovering the constraints that shape the way those words are used. Although speech contains no absolute cues to word boundaries (Aslin, Woodward, LaMendola, & Bever, 1996) or grammatical structure (Monaghan, Christiansen, & Chater, 2007), it is replete with distributional information that could assist with these tasks: regular co-occurrence of particular syllables provides a helpful description of what constitutes specific words in a language, whereas information about how words are used in combination helps illustrate how that language operates in terms of its grammatical structure. The ability to draw on such information (statistical learning) has therefore been suggested to play a key role in language acquisition (e.g., Conway, Bauernschmidt, Huang, & Pisoni, 2010; Kidd & Arciuli, 2016; Lashley, 1951; Redington & Chater, 1997; Rubenstein, 1973).
Taken together, these lines of research provide converging evidence that infants can draw on the statistical properties of language to learn about multiple linguistic features. Further, these studies suggest that learners may develop the capacity to employ statistical learning mechanisms for discovering words and basic structure at relatively similar points in development (though see, e.g., Frost & Monaghan, 2016; Peña, Bonatti, Nespor, & Mehler, 2002; and Perruchet, Tyler, Galland, & Peereman, 2004, for the debate concerning the nature of the statistical mechanisms these tasks employ). However, to our knowledge, infants' ability to perform these tasks together during learning remains to be demonstrated (see Marchetto & Bonatti, 2013, 2015). In the current study we address this directly, and test whether 17-month-old infants can discover both word boundaries and linguistic structure (non-adjacent dependencies) together, using co-occurrence statistics alone. Further, we examine the way that infants' ability to do so relates to their language development outside of the laboratory.

Testing acquisition of words and linguistic structure
In both infant and adult research, learners' capacity for joint acquisition of words and language structure has been assessed using artificial languages comprising non-adjacent dependencies: statistically reliable relationships between two items that are separated in speech. Non-adjacent relationships are pervasive in language, and exist at multiple levels of language structure, including syntax (i.e., the relationship between the auxiliary verb and the present participle verb form in the sun is shining), morphosyntax (i.e., co-occurring prefixes and suffixes, e.g., uncovered, independently), and number agreement (i.e., the lion at the zoo roars, the penguins at the zoo swim). Artificial grammars comprising words with morphological non-adjacent dependencies (with an AXC structure, where A and C reliably co-occur, regardless of X) provide the ideal platform for assessing word and structure learning together, since they contain sequences that learners need to discover (words, i.e., AXC strings), as well as structural regularities (i.e., A_C relationships; see, e.g., Frost & Monaghan, 2016; Marchetto & Bonatti, 2013, 2015; Peña et al., 2002; and Perruchet et al., 2004, for assessments of word and structure learning using AXC-style input).
To examine the way that word and structure learning proceed in infants, Marchetto and Bonatti (2013, 2015) trained infants on an artificial language with an AXC structure, and examined their capacity for segmenting words from speech, and generalising the internal morphological structure these words contained (the A-C relationships). Learning was tested with the head turn preference paradigm, indexed by differences in looking times to different items at test.¹ In their first set of studies, Marchetto and Bonatti (2013) examined whether the emergence of morphosyntactic structure learning actually precedes statistical segmentation, with the view that the former may act as an economical solution to identifying possible word candidates in speech. To this end, they pitted adjacent transitional probabilities against non-adjacent dependency structure at test, and examined whether infants relied on one type of information over the other to identify likely word candidates in speech. They report that 12- and 18-month-olds preferentially drew on within-word structure (A-C relationships) when speech was segmented with pauses. When speech was continuous (as is more typical of natural language), 18-month-olds relied more on transitional probabilities to identify likely words, whereas 12-month-olds showed no preference (they did not discriminate between the two types of test item).
Their second set of studies explored related issues in 7- and 12-month-olds, and found that at 12 months, infants could use statistical relationships between syllables to extract words from continuous speech, but could only detect the non-adjacent dependencies contained in the words if speech was segmented (Marchetto & Bonatti, 2015). Seven-month-olds were unsuccessful at both segmenting words and generalising A-C structure, though they were able to discriminate between words and non-words after exposure to a segmented version of the speech stream. The authors concluded that infants' capacity for learning can be seen to critically develop over time, such that by 12 months infants possess the cognitive resources to analyse the internal composition of words (but can only do so if the speech contains information that aids segmentation).
These studies provide an informative first foray into infants' capacity to discover words and structure using the statistical information contained in speech. Yet, in each of these studies, a small number of design features make performance somewhat difficult to unpack (see footnote 1 for an overview of the test stimuli). Further, due to methodological differences across the sets of studies, it is difficult to compare these datasets, meaning the developmental trajectory for these tasks is yet to be conclusively established. Thus, further research is needed to understand infants' ability to discern statistically defined words and structure from speech, both in terms of the nature of these processes, and when and how they develop.
¹ In Marchetto and Bonatti's (2013) study with 12- and 18-month-olds, sensitivity to words and structure was assessed together with part-word versus rule-word comparisons; part-words occurred in speech with relatively higher frequencies than other items such that they were sound candidates for lexical items, and rule-words comprised an A-C dependency with an intervening A or C from another pairing, such that they were grammatical, but new. Inferences about infants' performance were based on the direction of infants' looking preferences for these comparisons. While infants did attend differentially to these stimuli, unpacking the nature of this difference is difficult (see related comments about interpreting looking preferences in the Results and Discussion sections), particularly given the combined assessment of these skills. In their 2015 study with 7- and 12-month-olds, sensitivity to words and structure was examined separately using looking times to words versus non-words (to assess segmentation) and rule-words versus non-words (to assess structure learning).
Nevertheless, when viewed together these data suggest that the discovery of statistically-defined words and (word-internal) structure may be underpinned by different processes (i.e., statistical segmentation versus algebraic computation of structure), each with a slightly different developmental trajectory. This proposed distinction in processing between word- and structure-learning is consistent with much of the adult literature on this topic (e.g., Peña et al., 2002), and is in line with the suggestion that learners perform these tasks separately during language acquisition, drawing on separable and distinct computations for segmenting speech versus generalising structure (Marcus et al., 1999; Peña et al., 2002). However, recent advances in the adult literature have highlighted a possible methodological confound which may have influenced performance in the research that generated these conclusions. Specifically, in prior studies, generalisation of structure was typically assessed with comparisons involving "rule words" (a familiarised A-C dependency, with an intervening A or C element from a different dependency) versus an item that is infrequent or absent from the training speech. Structural generalisation would be evidenced by a preference for rule words over the competitor item on a 2AFC test with adults, or a difference in looking times to rule words and part-words/non-words in infant head-turn preference studies.
Though such comparisons permit assessment of preference for the overall structure, they require learners to use trained A and C items flexibly in a way that conflicts with their knowledge of where those syllables should occur within sequences. Frost and Monaghan (2016) argued that this may have constrained learners' willingness to generalise, and suggested that learners may be able to do so in the absence of such conflict. Indeed, using amended generalisation stimuli (containing entirely novel intervening items, rather than repositioned A or C items), Frost and Monaghan (2016) demonstrated that adults could learn about words and linguistic structure at the same time, in the absence of additional information such as pauses between words (see Frost, Isbilen, Christiansen, and Monaghan (2019), and Isbilen, Frost, Monaghan, and Christiansen (2018) for replications of this effect). Thus, the processes underlying word and structure learning may be more similar in nature than previously suggested, with statistical learning about words and linguistic structure possibly being served by the same (or at least the same type of) mechanism.
With this in mind, it is possible that infants' true capacity for learning about non-adjacent dependencies from continuous speech may not have been detected in previous studies, perhaps due to limitations of the learning measure, rather than the learner. We propose that implementing methodological changes to the stimuli in line with those made by Frost and Monaghan (2016) would provide a closer approximation of infants' capacity to generalise non-adjacent dependencies, shedding further light on the developmental trajectory for these tasks.

Statistical learning, language development, and individual differences
A key question in interpreting data from artificial language learning studies is what performance on these tasks actually means in terms of natural language development. That is, how does infants' ability to detect patterns in an artificial grammar relate to how they learn language in the world outside of the laboratory? Recent research with adults has indicated that participants' ability to compute statistics over artificial grammars relates to their competence on other linguistic tasks, shedding light on the way statistical learning skills may shape or reflect language learning more broadly. For instance, Isbilen et al. (2018) demonstrated that adults' capacity to compute statistically defined non-adjacent dependencies relates to their ability to learn more naturalistic language structure on a cross-situational learning task that taught learners a small-world version of Japanese.
Emerging evidence for the role of statistical learning in language acquisition also comes from literature on individual differences, which seeks to determine whether variation in learners' performance on experimental language learning tasks relates to variation in natural language skills. There is growing support for the existence of a meaningful relationship between an individual's statistical learning ability and their "real-world" language skills for both children (e.g., Kidd, 2012; Kidd & Arciuli, 2016) and adults (e.g., Christiansen, Conway, & Onnis, 2012; Conway et al., 2010; Misyak, Christiansen, & Tomblin, 2010), strengthening the possibility that statistical computations play a role in natural language acquisition.
Recent work by Lany (2014) and Lany and Shoaib (2019) shed new light on this relationship by demonstrating that infants' performance on a statistical language learning task differed as a function of their natural language ability. Lany (2014) tested infants' ability to map distributional information onto semantic categories (animals and vehicles), then examined whether their capacity to do so was related to their vocabulary size. Co-occurrence of determiners and nouns during familiarisation was found to inform infants' formation of semantic categories, helping them to use the new nouns as labels. However, this was only the case for infants with higher scores on the MacArthur-Bates CDI measure of grammar development (Fenson et al., 2007), providing a promising indication that infants' capacity for statistical learning relates to their language learning outside of the lab, with more advanced users of natural language outperforming their peers on the statistical learning task.
Further, Lany and Shoaib (2019) found evidence to suggest that for some 15-month-olds, their ability to learn non-adjacent dependencies in an artificial language learning task (dependencies between words in segmented speech, e.g., Gómez, 2002) may be related to their vocabulary size at the time of testing, and possibly at prior and subsequent points in development (at 12 and 18 months). For some participants, there was also evidence that performance at 15 months predicted later sensitivity to analogous non-adjacent dependencies in natural language (tested at 18 months). However, the effects in this study were complex, with substantial differences across sexes, and the relationships described here were not observed uniformly across participants: most correlations were only observed for the small sub-sample of females (N range = 10-16).
Consequently, more research with different statistical structures, and different age groups, is needed to understand the relationship between statistical learning and language development more fully. Here, we contribute to this literature by examining whether infants' capacity for statistical segmentation relates to their vocabulary size. An important step in understanding the role of statistical learning in language acquisition is to look at how it relates to other language skills across time, over development, as well as concurrently. There is a growing body of literature suggesting that infants' early linguistic skills relate to their subsequent language development (typically indexed by CDI scores), giving critical insight into the extent to which particular linguistic skills serve language learning more broadly. For instance, research on phonetic perception suggests that infants' behavioural (Tsao, Liu, & Kuhl, 2004) and neural responses (Molfese, 2000; Molfese & Molfese, 1985, 1997; Rivera-Gaxiola, Klarman, Garcia-Sierra, & Kuhl, 2005) to phonemic speech sounds may play a role in explaining the language skills of those children at later points in development (see Cristia, Seidl, Junge, Soderstrom, & Hagoort, 2014, for a review). Similarly, research on speech segmentation has found that infants' recognition of new words in spoken utterances relates to their vocabulary development, and this relationship has been shown for both behavioural (Newman, Bernstein Ratner, Jusczyk, Jusczyk, & Dow, 2006; Newman, Rowe, & Bernstein Ratner, 2016; Singh, Reznick, & Xuehua, 2012) and neural (Kidd, Junge, Spokes, Morrison, & Cutler, 2018; Junge, Kooijman, Hagoort, & Cutler, 2012) indices of speech segmentation. Research has also found a relationship between laboratory-based word learning ability and vocabulary size (Bion, Borovsky, & Fernald, 2013).
For statistical learning, though, it is not yet known how infants' performance on laboratory tests of word segmentation and structure learning relates to their language development (but for related preliminary evidence, see Lany and Shoaib's (2019) study of nonadjacency learning from pre-segmented speech). While much research has documented infants' ability to draw on statistics in speech to learn about words and within-word structure, infants have never been found to be capable of performing both tasks together from statistics alone. Further, it is not yet known how infants' performance on these tasks relates to different aspects of language development. For instance, word learning and structure learning may be separable processes (Marchetto & Bonatti, 2015), in which case we might expect only statistical segmentation to relate to vocabulary development (that is, if it relates at all to natural language learning). Alternatively, if word learning and structure generalisation involve related processes, both might relate to vocabulary development.
We thus extended the work of Marchetto and Bonatti (2013, 2015) and Frost and Monaghan (2016), to examine whether infants, like adults, can compute word-like and structure-like regularities at the same time, from the same set of distributional statistics, without any extra cues in the speech signal (i.e., pauses between words). Demonstrating that infants are able to detect both words and structure from this input would have important implications for the simultaneity of these tasks in language acquisition, and the statistical computations that may underlie them.
Importantly, we also tested whether infants' statistical learning ability related to their natural language ability, both concurrently and over development; if the transitional information contained within speech does support natural language acquisition (or if children's language development supports their ability to compute over the statistical information contained in speech), it follows that infants' capacity for statistical learning on this task may be related to their language development in the world outside of the laboratory. This relationship could take two forms: statistical learning ability may relate to vocabulary size (i.e., children with a greater capacity for statistical learning may have larger vocabularies), and it may also relate to vocabulary growth (i.e., children with greater statistical learning ability may increase their vocabulary more rapidly over development). To test this possibility, we examined whether performance related to a measure of natural language development taken at the time of testing (UK-CDI, Alcock, Meints, & Rowland, 2020), and at 19, 21, 24, 25, 27, and 30 months (Lincoln CDI, Meints & Fletcher, 2001).² We expected that infants would be able to segment the speech, and generalise the language structure to novel consistent items (Frost & Monaghan, 2016). Further, we expected that infants' performance on the segmentation task would relate to their concurrent vocabulary size (Junge et al., 2012; Kidd et al., 2018; Lany, 2014; Newman et al., 2006; Newman et al., 2016; Singh et al., 2012), and possibly their vocabulary growth over time. Testing whether generalisation of the artificial language also relates to vocabulary development provides insight into the similarities or potential distinctions between word learning and grammatical generalisation.

Participants
The experiment was completed by 71 infants (40 females, 31 males; aged between 16.5 and 17.5 months, mean age = 517 days), recruited from Liverpool, UK. All infants were monolingual native English learners, born at term, with normal vision and hearing. All infants were typically developing at the time of testing. Infants were tested in the laboratory at The University of Liverpool.
This study forms part of a larger longitudinal project in the North West of England, the Language 0-5 Project (Rowland, Bidgood, Durrant, Peter, & Pine, unpublished). Ninety-five families were recruited to take part. Of these, one family was excluded due to responses on a family background questionnaire (persistent ear infections likely to affect hearing) and four withdrew before the project began. This resulted in a final sample of 90 families. Out of the final 90 families, nine had a family history of language delay or dyslexia. More general information about sample background can be found in Peter et al. (2019).

Design
The data were collected as part of a large-scale study of language development and individual differences in language acquisition. Due to the unique requirements of group-level and individual differences level research, studies attempting to assess both must typically prioritise one over the other. Here, we prioritised assessment of individual differences, and controlled for this statistically to test for effects at the group level. Thus, to minimise task-related variance across learners, all participants received the same stimuli, in the same order (see Procedure section for further information regarding this decision, and see the Results section for details of how we controlled for this in our analysis). Ethical approval was given by the University of Liverpool Research Ethics Subcommittee for Non-Invasive Procedures (RETH000764) for the project.

Training
A 15-minute-long continuous speech stream was created using the Festival speech synthesiser (Black et al., 1990) by concatenating the four AXC words (bamuso, bagaso, limufe, ligafe).³ This was produced using a female voice at 140 Hz, with the constraint that no A_i X_j C_i sequence was immediately repeated. In the speech stream, transitional probabilities for A-C syllables were always 1, while probabilities for A-X and X-C transitions were 0.5. The likelihood that a particular AXC word would be followed by another given word was 0.33. The speech stream was edited to have a 5 s fade in and out, so that the onset and offset of the speech could not be used as a cue for segmentation.⁴
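The transitional probabilities stated above can be illustrated with a minimal sketch. It assumes that each word is followed by one of the other three words with equal probability (consistent with the 0.33 figure); the function names, the seed, and the stream length are hypothetical and are not taken from the original stimulus-generation procedure.

```python
import random

# The four AXC words from the Training section, split into syllables.
WORDS = [("ba", "mu", "so"), ("ba", "ga", "so"), ("li", "mu", "fe"), ("li", "ga", "fe")]

def build_stream(n_words, seed=0):
    """Concatenate words at random, never repeating a word immediately (assumed sampling scheme)."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream

def transitional_probability(stream, first, second, lag=1):
    """Estimate P(second occurs `lag` syllables after first | first) from the stream."""
    positions = [i for i in range(len(stream) - lag) if stream[i] == first]
    hits = sum(1 for i in positions if stream[i + lag] == second)
    return hits / len(positions)

stream = build_stream(6000)  # hypothetical length, chosen only for stable estimates
tp_a_c = transitional_probability(stream, "ba", "so", lag=2)  # non-adjacent A_C
tp_a_x = transitional_probability(stream, "ba", "mu")         # adjacent A-X
tp_x_c = transitional_probability(stream, "mu", "so")         # adjacent X-C
```

Under these assumptions the non-adjacent A_C estimate is exactly 1, whereas the adjacent A-X and X-C estimates only approximate 0.5 because they depend on the random sample.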

Testing
We assessed segmentation by measuring looking times to two types of trials: words and part-words. Word trials comprised repetitions of one of the words used in the familiarisation stream (e.g., bamuso bamuso bamuso…). Part-word trials contained repetitions of items that occurred in the training speech but straddled word boundaries, comprising the last syllable of one word and the first two syllables of another word (C_i A_j X; sobamu, feliga), or the last two syllables of one word and the first syllable of another (X C_i A; gasoli, mufeba). There were therefore four word trials and four part-word trials, each of which was presented twice, giving 16 trials in total (see Marchetto & Bonatti, 2015).
We assessed generalisation with trials containing repetitions of rule-words and non-words (Frost & Monaghan, 2016; Marchetto & Bonatti, 2015). Rule-words comprised an A_i _ C_i non-adjacency with one of four novel syllables in the intervening position (so, taking the form A_i N C_i, where N indicates the novel syllable; baniso, baposo, lidufe, livefe). Non-words were part-words in which one syllable was replaced with a novel syllable, to ensure any preference observed on generalisation trials could not be attributed to the presence of a novel syllable alone. Novel syllables could appear in the initial or final position (so, taking the form N C_i A_j; posoba, nifeli, or C_i A_j N; solive, febadu) with two trials adhering to each possible non-word structure. There were therefore four trials of each item type, and each was repeated twice, giving 16 trials in total (see Marchetto & Bonatti, 2015).
The presentation order of trials was pseudo-randomised using the same criteria as in Marchetto and Bonatti's study (2015), with no immediate repetition of particular items, and a maximum of three consecutive trials of the same type (with regard to both word type, and left/right location of stimulus presentation). The precise presentation order for each task is given in the supplemental materials.
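How the test items relate to the familiarisation words can be sketched as follows. The helper names are hypothetical, and the word pairs used to derive each part-word are one possible choice that yields the listed items, not necessarily the pairs the authors had in mind.

```python
# The four familiarisation words, split into A, X, and C syllables.
WORDS = [("ba", "mu", "so"), ("ba", "ga", "so"), ("li", "mu", "fe"), ("li", "ga", "fe")]

def part_word_cax(first, second):
    """Last syllable of one word plus the first two syllables of the next (C A X)."""
    return first[2] + second[0] + second[1]

def part_word_xca(first, second):
    """Last two syllables of one word plus the first syllable of the next (X C A)."""
    return first[1] + first[2] + second[0]

def rule_word(word, novel):
    """Keep the A_C frame and place a novel syllable in the middle position (A N C)."""
    return word[0] + novel + word[2]

part_words = [
    part_word_cax(WORDS[1], WORDS[0]),  # bagaso followed by bamuso
    part_word_cax(WORDS[2], WORDS[3]),  # limufe followed by ligafe
    part_word_xca(WORDS[1], WORDS[2]),  # bagaso followed by limufe
    part_word_xca(WORDS[2], WORDS[0]),  # limufe followed by bamuso
]
rule_words = [
    rule_word(WORDS[0], "ni"),
    rule_word(WORDS[0], "po"),
    rule_word(WORDS[2], "du"),
    rule_word(WORDS[2], "ve"),
]

# Four words and four part-words, each presented twice.
n_segmentation_trials = 2 * (len(WORDS) + len(part_words))
```

Because part-words are built only from syllable sequences that actually occur across word boundaries, they are attested in the stream but have lower internal transitional probabilities than words.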
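The pseudo-randomisation criteria can be expressed as a simple validity check on a candidate trial order. This is only a sketch: the tuple layout and the function names are assumptions, and the actual orders used are those given in the supplemental materials.

```python
from itertools import groupby

def longest_run(values):
    """Length of the longest stretch of identical consecutive values."""
    return max((len(list(group)) for _, group in groupby(values)), default=0)

def is_valid_order(trials, max_run=3):
    """Check a list of (item, trial_type, side) tuples against the stated criteria."""
    items = [trial[0] for trial in trials]
    # No item may be presented on two consecutive trials.
    no_immediate_repeat = all(a != b for a, b in zip(items, items[1:]))
    # At most `max_run` consecutive trials of the same type or on the same side.
    type_ok = longest_run([trial[1] for trial in trials]) <= max_run
    side_ok = longest_run([trial[2] for trial in trials]) <= max_run
    return no_immediate_repeat and type_ok and side_ok
```

A candidate order could be shuffled repeatedly until `is_valid_order` returns true, although the source does not state how the final order was generated.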

Procedure
Infants were familiarised with the experimental language for 15 min via incidental learning (Gómez, Bootzin, & Nadel, 2006; Saffran, Newport, Aslin, Tunick, & Barrueco, 1997), with infants playing quietly with the experimenter (i.e., with no verbal communication) while the speech stream played at a comfortable volume in the background. During the incidental learning phase, caregivers completed questionnaires for another component of the Language 0-5 project (these questionnaires are not relevant for the current study, so will not be discussed further).
Following familiarisation, we assessed infants' learning using an adaptation of the classic head turn preference paradigm (Kemler Nelson et al., 1995), modified to incorporate an eye-tracker, which measured infants' looking times to each test trial. Eye movements were recorded using an SR Research EyeLink 1000 Plus (SR Research: Ottawa, Ontario, Canada) in remote mode using the remote arm configuration with a target sticker, which permits stable tracking while accommodating some level of movement. Infants were seated in a car seat in front of the eye-tracker (affixed to a 17" LCD monitor), which was uniquely positioned for each child such that the display distance was 580-620 mm. Trials began after successful five-point calibration.
Sound stimuli were played through speakers positioned behind the monitor, to the left and right sides of the screen. Test items were paired with a visual stimulus (an animated clip of a slow-moving hand, as in Marchetto & Bonatti, 2015), set against a black background, which appeared onscreen on either the left or the right, in accordance with the location of the sound. Individual test trials occurred twice: once to the left, and once to the right. On each trial, infants heard repetitions of a test item, separated by a 500 ms pause, with items played in the same voice and at the same rate as in familiarisation. Trials could last for a maximum of 65 s (Marchetto & Bonatti, 2013, 2015), and were gaze contingent, such that trials terminated if an infant looked away from the visual stimulus for more than 2 s. After each trial ended, a fixation stimulus appeared at the centre of the screen to re-direct infants' attention, and the next trial began after infants had attended to this for 2 s.
To minimise item and task order related variance, which would add noise to our individual differences analyses, all infants completed the segmentation trials first followed by the generalisation trials, and all trials were presented in the same pseudorandomised order. (For more justification of this decision, which is necessary for adapting group-based experimental procedures for individual differences designs, see, e.g., Panter, Tanaka, & Wellens, 1992, for a summary of the effects of item and test order on the reliability of comparisons across individuals, and see Cooper, Gonthier, Barch, & Braver, 2017, for details of how to apply these considerations to individual differences research in cognition. For information on how we controlled for this statistically in our group-level analysis, see the Results section.) The segmentation and generalisation phases were separated by a brief comfort break, during which infants watched a short cartoon (a 135 s excerpt of Pingu, chosen for its lack of linguistic content). For each infant, familiarisation and testing took place in the same laboratory. Caregivers were asked to refrain from communicating verbally with their infant during both familiarisation and testing, and were asked to avoid directing infants' attention at test.
Because this experiment formed part of a large longitudinal cohort study assessing language development in children, infants tested for this study had participated in studies assessing various aspects of language learning prior to this session. However, none of these studies contained the same words or grammar-like rules as this study, and none of them examined infants' capacity for statistical language learning. On the day of testing, infants did not complete any other behavioural studies prior to participating in the study at hand.

Data preparation
Filtering criteria were applied to the data: trials shorter than 700 ms (the approximate length of a test item) were excluded from analysis, as were trials with looking times greater than 2 SD beyond the mean looking time for that trial. In their study, Marchetto and Bonatti (2015) excluded trials with looking times shorter than 1000 ms. However, we implemented a lower minimum cut-off to maximise the number of usable trials, and to align this cut-off with the stimuli such that looking times could be more confidently linked to attending to the test items. All data that permitted comparison of looking to the different types of experimental trials were included in the analysis; that is, infants were only excluded if they failed to provide data for at least one of each trial type after the data were filtered. For segmentation, data for 70 participants were included in the analysis, and the mean number of trials included per child was 11 (range = 2-16). For generalisation, data for 61 participants were included in the analysis, and the mean number of trials included per child was 9 (range = 2-14). See the supplemental materials for replications of the main group-level analyses for each task with the full, unfiltered datasets (all critical effects are replicated with the raw dataset).
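The two filtering criteria can be sketched as below. Two details are assumptions rather than facts from the text: the 2 SD criterion is applied to the upper tail only, and the mean and SD are computed across infants for a given trial after the 700 ms filter has been applied. The looking times are invented for illustration.

```python
from statistics import mean, stdev

MIN_LOOK_MS = 700  # approximate length of one test item
SD_CUTOFF = 2      # trials more than 2 SD above the trial mean are dropped (assumed upper tail only)

def filter_trial(looking_times_ms):
    """Return the looking times for one trial that survive both exclusion criteria."""
    long_enough = [t for t in looking_times_ms if t >= MIN_LOOK_MS]
    if len(long_enough) < 2:
        return long_enough  # SD is undefined for fewer than two values
    upper_bound = mean(long_enough) + SD_CUTOFF * stdev(long_enough)
    return [t for t in long_enough if t <= upper_bound]

# Invented looking times (ms) from eleven infants on a single trial.
example = [3000, 3200, 2800, 3100, 2900, 3000, 3100, 2900, 3000, 20000, 500]
kept = filter_trial(example)
```

In this invented example the 500 ms look fails the minimum cut-off and the 20000 ms look exceeds the upper bound, so nine values remain.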

Data analysis
We first examined infants' performance on the segmentation and generalisation trials, assessing looking behaviour on each of these tasks separately. We then examined whether infants' performance on these tasks related to their concurrent CDI scores. In subsequent analyses, we investigated the relationship between statistical learning ability (speech segmentation) and vocabulary development over time.
Note that for both segmentation and generalisation, due to possible effects of trial order we do not make inferences based on overall group means; instead, we report pre-planned analyses that control for trial order statistically (see Sections 3.2.1, 3.2.2, and 3.2.3).
Linear mixed-effects analysis was performed on the data for the segmentation trials (Baayen, Davidson, & Bates, 2008), which modelled the probability (log odds) of looking times considering variation across participants and materials, as well as across the two types of test items (words and part-words), to determine whether these differentially affected looking behaviour.
The model was built incrementally, and was initially fitted specifying random effects of subject, gender, and stimuli location, to account for variation in performance across participants and across items displayed on either the left or right of the screen.Random intercepts and slopes were omitted if the model failed to converge with their inclusion.We then added fixed effects and interactions for trial order and test item type; these were added incrementally, and were retained in the model if significant.Trial order was included as a fixed effect as we predicted a habituation-related decline in looking times to stimuli over the course of the task (it was also important to control for this given that all infants received the same trial order, due to the individual differences nature of the design).Importantly, we added this to our model first so that subsequent comparisons could test whether there was a difference in looking to word versus part-word trials over and above any effects of trial order. 5Experimental effects are thus effects that are observed once variation associated with order has been accounted for, and are therefore not due to performance on any particular trial.A summary of the final model (i.e., the most complex model at the end of this incremental process) is reported in Table 1.
The linear mixed-effects analysis revealed a significant effect of trial order, with looking times decreasing as anticipated over the course of the session (model fit improvement over model containing random effects: χ²(1) = 105.48, p < .001). Crucially, there was a significant effect of trial type, over and above the effect of trial order, indicating that infants responded differently to words and part-words (model fit improvement over model containing random effects and a main effect of trial order: χ²(1) = 5.128, p = .023), suggesting that they had segmented the words from the speech stream.
Likelihood ratio test comparisons indicated that model fit was significantly improved when we added the interaction term for trial type and trial order, with infants' looking times to words and part-words changing over the course of the task (model fit improvement over model containing just main and random effects: χ²(1) = 9.843, p = .002). This interaction is likely to be a product of habituation, as the difference in looking times for word versus part-word trials reduces over the course of the session (see Fig. 1).
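The model comparisons above are likelihood ratio tests in which each step adds a single parameter. As an illustrative sketch only (not the authors' analysis code, which used mixed-effects models), the reported p-values can be checked from the χ² statistics alone, because the upper tail of a χ² distribution with one degree of freedom has a closed form. The function names are hypothetical, and the assumption that each comparison has exactly one degree of freedom is taken from the reported statistics.

```python
import math


def lrt_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_df1(chi_square: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)), so no statistics library is needed.
    """
    if chi_square < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(chi_square / 2.0))


# Chi-square statistics reported for the segmentation models (1 df each).
reported = {
    "trial order": 105.48,
    "trial type": 5.128,
    "type x order interaction": 9.843,
}
p_values = {name: lrt_p_value_df1(x) for name, x in reported.items()}

for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```

The same check applies to any nested comparison that adds one parameter; comparisons adding several parameters at once need the χ² tail for the corresponding degrees of freedom.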
Linear mixed-effects analysis was performed on the data for the generalisation trials (Baayen et al., 2008), which modelled the probability (log odds) of looking times considering variation across participants and materials, and across the two types of test items (rule-words and non-words), to determine whether infants looked differently to rule-words and non-words at test. A summary of the final model is reported in Table 2. As with the segmentation analysis, the model was built incrementally, and was initially fitted specifying random effects of subject, gender, and stimulus location, to account for variation in performance across participants and across items displayed on either the left or right of the screen. Random intercepts and slopes were omitted if the model failed to converge with their inclusion. We then added fixed effects and interactions for trial and test item type, with significant main effects/interactions being retained in the model.
As expected, there was a significant effect of trial order, with looking times decreasing over the course of the session (model fit improvement over model containing random effects: χ²(1) = 24.011, p < .001). Critically, there was a significant effect of trial type indicating that infants responded differently to rule-words and non-words (model fit improvement over model containing random effects and a main effect of trial: χ²(1) = 8.626, p = .003), suggesting that infants were sensitive to the structure of the words in the speech stream (see Fig. 2).
Again, likelihood ratio test comparisons indicated that model fit was significantly improved when we added trial type, trial order and the interaction term for trial type and trial, with infants' looking to rule-words and non-words changing over the course of the task (model fit improvement over model containing just main and random effects: χ²(1) = 6.1982, p = .013). This interaction could relate to a preference switch during the task, as the difference in looking times between trial types seems to fluctuate over the course of the session. Alternatively, this could be due to participants not converging on a stable representation.
In sum, as a whole our sample discriminated between words and part-words in our segmentation task, and between non-words and rule-words in our generalisation task.

Indexing performance with Cohen's d
In order to examine the relationship between infants' statistical learning skills and their natural language abilities, we required a measure of individual infants' performance on the task, indexing the size of the difference in looking times between test stimuli, but also taking into account the variance in looking times for each child. For this purpose, we computed a measure of effect size (Cohen's d) for each participant. For the segmentation data, these were calculated by subtracting looking times to words from looking times to part-words, then dividing this by the pooled standard deviation of looking times across all segmentation trials (per infant). A positive effect size would indicate a preference for part-words (novelty preference) whereas a negative effect size would indicate a preference for words (familiarity preference). An effect size around zero would indicate no clear preference.
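The per-infant effect size described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that "pooled standard deviation" means the standard pooled SD of the two trial types, and the function name and the looking times are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(part_word_times, word_times):
    """Per-infant effect size: (mean part-word - mean word) / pooled SD.

    Positive values indicate longer looking to part-words (novelty preference);
    negative values indicate longer looking to words (familiarity preference).
    """
    n1, n2 = len(part_word_times), len(word_times)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two trials of each type")
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(part_word_times)
                  + (n2 - 1) * variance(word_times)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("looking times have no variance")
    return (mean(part_word_times) - mean(word_times)) / math.sqrt(pooled_var)


# Hypothetical looking times (seconds) for one infant, eight trials per type.
part_words = [6.1, 5.4, 5.9, 4.8, 5.2, 4.5, 4.9, 4.0]
words = [5.0, 4.9, 4.4, 4.6, 3.9, 4.1, 3.5, 3.6]

d = cohens_d(part_words, words)
print(f"Cohen's d = {d:.2f}")
```

Swapping the two arguments flips the sign of d, which is why the subtraction order (part-words minus words) determines whether a positive score denotes novelty or familiarity.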
For the generalisation data, effect sizes were calculated by subtracting looking times to rule-words from looking times to non-words, then dividing this by the pooled standard deviation of looking times across all generalisation trials (per infant). A positive effect size would indicate a preference for looking toward the non-words (i.e., a novelty preference to sequences not conforming to the A_C non-adjacency structure), whereas a negative effect size would indicate a preference for looking toward the generalised, rule-word items (thus, a familiarity preference to the A_C structures).

Table 2
Summary of the linear mixed-effects model of (log odds) looking times on the generalisation trials.
As yet, there is no established method for mapping individual differences in looking preferences onto representations of knowledge. However, Cohen's d allowed us to ascertain differences between individuals, while taking into account individual variation in looking behaviour across the task. Both the size and direction of this difference in looking could be indicative of the nature and extent of learning. For instance, children may demonstrate familiarity or novelty preferences according to the extent to which the stimuli are treated as linguistically relevant, or novel. Note that processing mechanisms that result in patterns of familiarity or novelty preference will exert opposite effects on learning, so observations of no preference in some children may be a consequence of these opposing forces balancing out over the task, instead of tipping the scales in a particular direction. Equally, the size of the effect could indicate learning, with higher scores possibly denoting greater (and consistent) preferences. Here, we assume that children who show a consistent novelty preference across trials have encoded the information better than those who show no overall preference or a consistent familiarity preference, on the basis that a novelty preference indicates better encoding of the familiarised stimuli than no preference or a familiarity preference.⁶

Figs. 3 and 4 below illustrate infants' performance on the segmentation and generalisation tasks, respectively, for children with each type of preference (and for children with no overall preference). There was no significant correlation between Cohen's d scores for segmentation and generalisation performance (Pearson's r = −0.02, N = 61, p = .892), suggesting that infants' looking behaviour on the segmentation and generalisation trials was not statistically related.

Statistical learning and vocabulary development

Relationship with concurrent vocabulary.
We first determined the relationship between infants' statistical learning and their concurrent vocabulary, by correlating infants' statistical learning scores (Cohen's d) with their UK CDI scores (expressive and receptive) at 17 months. CDIs were completed either on the day of the experiment, or within the week prior to testing.
For speech segmentation, children's effects were expressed along a continuum from preferences for familiar words (negative effect size) to preferences for novel, part-word stimuli (positive effect size); see panels A and B of Fig. 5. Overall, there was a significant positive correlation between statistical learning performance and vocabulary size, both for expressive (Pearson's r = 0.32, N = 68, p = .008) and receptive (Pearson's r = 0.32, N = 68, p = .007) scores, suggesting that statistical language learning skills and vocabulary may be critically related. Of particular note is the direction of this relationship; the data indicate that participants with larger vocabularies (indexed by larger CDI scores) showed larger preferences for novel, rather than familiar, items at test, whereas the opposite was true for children with smaller vocabularies. This result is in line with Hunter and Ames' (1988) model, which suggests that children with larger vocabularies are more advanced in their linguistic development than children with smaller vocabularies, and are thus more inclined to show a novelty preference.
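The correlations reported above relate each infant's Cohen's d to their CDI score. The sketch below shows how a Pearson correlation and its associated t statistic (on N − 2 degrees of freedom) can be computed; it is an illustration under stated assumptions, not the authors' code. The function names and the small data set are hypothetical, and only the final line uses values from the text (r = 0.32, N = 68).

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length (at least 3 values)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical data: segmentation effect sizes and vocabulary scores.
effect_sizes = [-0.6, -0.3, -0.1, 0.0, 0.2, 0.5, 0.7]
vocab_scores = [40, 95, 60, 120, 110, 180, 150]
r_example = pearson_r(effect_sizes, vocab_scores)
print(f"hypothetical r = {r_example:.2f}")

# t statistic implied by the reported expressive correlation (r = 0.32, N = 68).
t_reported = correlation_t(0.32, 68)
print(f"t(66) = {t_reported:.2f}")
```

A positive r here means that larger (more novelty-oriented) effect sizes go with larger vocabularies, which is the direction of the relationship described in the text.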
To verify the possible maturational distinction between familiarity versus novelty seekers on our segmentation task, a ternary split was applied to the data, dividing the sample into familiarity seekers, novelty seekers, and infants with no preference. A Cohen's d of 0.2 is traditionally considered a small effect size (Cohen, 1992). Therefore, children with a Cohen's d of −0.2 or below were classed as having a familiarity preference, while children with a Cohen's d greater than −0.2 but less than 0.2 were classed as having no preference, and children with a Cohen's d of 0.2 or above were classed as having a novelty preference.
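The ternary split can be written as a small classification rule. This is a sketch of the thresholds stated above (−0.2 or below, between −0.2 and 0.2, 0.2 or above); the function name and the example scores are hypothetical.

```python
from collections import Counter


def classify_preference(d: float, threshold: float = 0.2) -> str:
    """Assign a preference group from a per-infant Cohen's d.

    The boundaries are inclusive for the two preference groups, matching
    the description in the text.
    """
    if d <= -threshold:
        return "familiarity"
    if d >= threshold:
        return "novelty"
    return "no preference"


# Hypothetical effect sizes for six infants.
hypothetical_scores = [-0.55, -0.2, -0.05, 0.1, 0.2, 0.48]
groups = Counter(classify_preference(d) for d in hypothetical_scores)
print(dict(groups))
```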
For structure generalisation, there was again a continuum of effects, but statistical learning performance and concurrent vocabulary size were not significantly correlated for either expressive (Pearson's r = −0.04, N = 59, p = .762) or receptive (Pearson's r = −0.09, N = 59, p = .508) scores (see Fig. 5, panels C and D), and were not explored further.

Relationship with later language development: Segmentation only.
To assess the relationship between segmentation performance and vocabulary acquisition, growth curve analyses (GCA; Mirman, 2014) were performed using lme4 1.1.21 (Bates, Mächler, Bolker, & Walker, 2015) in R 3.5.2 (R Core Team, 2018). Separate models were fitted to the receptive and expressive vocabulary scores, which were derived from the Lincoln CDI scores taken at 19, 21, 24, 25, 27, and 30 months. As in the previous analyses, the Cohen's d segmentation score was entered as a fixed predictor. To identify the appropriate polynomial order for the age parameter, two separate models were fitted to the data, then compared. The first included age as a centred first-order linear variable, along with the fixed effect of segmentation score. The second entered age as a second-order orthogonal polynomial, in addition to the linear term included in the first model. Both models were fitted with random intercepts for subject, but with no random slopes to maximise comparability.
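To illustrate what a second-order orthogonal polynomial of age looks like, the sketch below builds linear and quadratic terms for the six measurement ages using Gram-Schmidt orthogonalisation. It is an illustration only: the analysis itself was run in R, the function name is hypothetical, and the terms here are computed over the six age values rather than over the full longitudinal data set.

```python
import math


def orthogonal_poly2(ages):
    """Return (linear, quadratic) age terms that are centred, unit length,
    and orthogonal to each other."""
    n = len(ages)
    if len(set(ages)) < 3:
        raise ValueError("need at least three distinct ages for a quadratic term")
    mean_age = sum(ages) / n
    centred = [a - mean_age for a in ages]

    # Linear term: centred ages scaled to unit length.
    norm1 = math.sqrt(sum(c * c for c in centred))
    linear = [c / norm1 for c in centred]

    # Quadratic term: squared centred ages with the mean and the
    # linear component removed, then scaled to unit length.
    squares = [c * c for c in centred]
    mean_sq = sum(squares) / n
    squares_centred = [s - mean_sq for s in squares]
    projection = sum(s * l for s, l in zip(squares_centred, linear))
    residual = [s - projection * l for s, l in zip(squares_centred, linear)]
    norm2 = math.sqrt(sum(r * r for r in residual))
    quadratic = [r / norm2 for r in residual]
    return linear, quadratic


ages = [19, 21, 24, 25, 27, 30]  # measurement ages in months
linear, quadratic = orthogonal_poly2(ages)
for age, lin, quad in zip(ages, linear, quadratic):
    print(f"{age} months: linear = {lin:+.3f}, quadratic = {quad:+.3f}")
```

Because the two terms are uncorrelated, the estimate for the linear term does not change when the quadratic term is added, which makes the two candidate models easier to compare.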
Model comparison (log-likelihood) indicated a significant difference in model fit (χ²(2) = 38.46, p < .001), with the model containing a second-order (quadratic) age parameter (AIC = 3601; BIC = 3631) more likely than the alternative with a first-order linear term for age (AIC = 3636; BIC = 3658). Thus, our GCA model contained an orthogonal quadratic age parameter crossed with the fixed effect of segmentation score. The model was fitted with the maximal random effects structure supported by the data (Barr, Levy, Scheepers, & Tily, 2013), which included the random intercept of subject, without random slopes. Confirmatory tests were performed using log likelihood-ratios via sequential model decomposition (Bates et al., 2015) with bootstrapped simulations (R = 10000) to obtain 95% CIs and p-values for model estimates (Luke, 2017). The marginal and conditional pseudo-R² are also reported for the growth curve model, which represent the proportion of the variance explained by fixed effects alone and the full model, respectively (e.g., Nakagawa, Johnson, & Schielzeth, 2017).
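As a rough consistency check on the receptive-vocabulary model comparison, the sketch below relates the reported comparison statistic (38.46 on 2 degrees of freedom) to the reported information criteria. It is not the authors' code; the variable and function names are hypothetical, and it assumes the standard definition AIC = −2 log-likelihood + 2k, under which the AIC difference between nested models equals the likelihood ratio statistic minus twice the number of added parameters.

```python
import math


def lrt_p_value_df2(chi_square: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 2 degrees of freedom.

    For df = 2 the tail probability has the closed form exp(-x / 2).
    """
    if chi_square < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.exp(-chi_square / 2.0)


# Values reported for the receptive vocabulary models.
linear_model = {"AIC": 3636, "BIC": 3658}
quadratic_model = {"AIC": 3601, "BIC": 3631}
chi_square, added_parameters = 38.46, 2

delta_aic = linear_model["AIC"] - quadratic_model["AIC"]
delta_bic = linear_model["BIC"] - quadratic_model["BIC"]
# AIC difference implied by the likelihood ratio statistic.
implied_delta_aic = chi_square - 2 * added_parameters
p_value = lrt_p_value_df2(chi_square)

print(f"delta AIC = {delta_aic}, implied by LRT = {implied_delta_aic:.2f}")
print(f"delta BIC = {delta_bic}")
print(f"p = {p_value:.2e}")
```

The reported AIC values are rounded to whole numbers, so agreement between the two AIC differences is only expected to within about one unit.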
For receptive vocabulary, the GCA demonstrated a significant linear increase in scores across development (β = 224.07 [198.34, 249.77], SE = 13.12, χ² = 127.76, p < .001), but also a quadratic shift in this slope over time (β = −40.34 [−53.37, −27.21], SE = 6.67, χ² = 54.31, p < .001, see Fig. 6). While segmentation ability did not have a significant main effect on the intercept (β = 59.47 [9.75, 109.29], SE = 25.39, χ² = 2.13, p = .155), it did interact with the linear term of age (β = −47.33 [−90.74, −3.72], SE = 22.2, χ² = 4.55, p = .039), suggesting that the predictive effect of statistical segmentation ability on vocabulary size decreased over development. The fixed effects accounted for 46.85% of variance in the data, increasing to 93.36% with the inclusion of the random effects (marginal R² = 0.47; conditional R² = 0.93).

Similarly, for expressive vocabulary, GCA model fit was significantly improved with the addition of a second-order quadratic age term (χ²(8) = 22.86, p < .001, AIC = 3821, BIC = 3851), compared to a model with only a first-order linear term (AIC = 3840; BIC = 3863). There was a significant linear increase in scores across development (β = 384.07 [346.15, 422. […] p < .001, see Fig. 7). Unlike for receptive vocabulary, segmentation ability had a significant positive effect on the intercept (β = 70.56 [6.44, 134.85], SE = 32.76, χ² = 4.55, p = .039), with children who demonstrated larger segmentation effects at 17 months having larger expressive vocabularies. Segmentation scores showed no interaction with the linear (β = −4.8 [−67.91, 58.75], SE = 32.31, χ² = 0.03, p = .856) or quadratic terms of age (β = 12.14 [−19.90, 44.76], SE = 16.5, χ² = 0.55, p = .464). The model explained 56.04% of the variance in the data without the random effects, and 92.81% when they were included (marginal R² = 0.56; conditional R² = 0.93). Although segmentation abilities were shown to predict both expressive and receptive vocabulary, the GCAs suggest that the nature of this relationship may differ across these two measures over time. However, this difference could be due to limitations in the CDI scales: the relationship between segmentation and receptive vocabulary is seen to plateau around ceiling from 25 months onward, but for expressive vocabulary this is not the case. Thus, it is possible that the receptive measure was unable to capture variance in vocabulary at these later time-points. This is illustrated in Fig. A1 (see appendices); receptive vocabulary scores appear to be normally distributed up to ~25 months, but are negatively skewed thereafter.
Additional exploratory analyses were conducted to establish the specific age points at which segmentation ability statistically predicted receptive vocabulary size. Separate bootstrapped multiple regression models were fitted to the data at 24, 25, 27, and 30 months. These models contained the fixed effect of segmentation ability (there was no within-subject random variance to control […] a1a and a1b, see appendices) suggest that segmentation significantly predicted receptive vocabulary at 24 months (p = .031) and marginally at 25 months (p = .053), but not at 27 and 30 months.

Discussion
We examined 17-month-old infants' ability to learn statistically-defined non-adjacent dependencies from continuous speech, to shed light on whether word segmentation and structure learning may proceed together during language acquisition, from distributional statistics alone (i.e., in the absence of additional cues), as has recently been demonstrated for adults (Frost & Monaghan, 2016). Demonstrating that infants share this same capacity for statistical learning of words and structure would provide critical insight into the nature of the processes that may underlie these tasks in natural language learning, and the time-course in which they may operate. Crucially, we also investigated the way that infants' statistical learning abilities related to their concurrent natural language skills, and subsequent language development, to help shape our understanding of the way in which statistical learning skills may serve (or be served by) language learning more broadly.
We expected to show that infants could compute over the statistical properties of the speech in order to segment it into individual items (Marchetto & Bonatti, 2013, 2015). Analysis of the segmentation data revealed a significant effect of word type on infants' looking times, indicating that infants attended differently to words and part-words at test. This suggests that infants could indeed compute over the statistical properties of the speech to segment it into word candidates, which they could distinguish from competitor items. These data therefore replicated the finding that infants can segment a continuous stream of artificial speech on the basis of the statistical information contained within the input (e.g., Saffran et al., 1996; Aslin et al., 1998). Further, these data provide critical support for prior demonstrations of infants' ability to do so by computing over non-adjacent, as well as adjacent, statistics (Marchetto & Bonatti, 2013, 2015; see also e.g., Frost & Monaghan, 2016; Peña et al., 2002; Perruchet et al., 2004 for demonstrations of this in adults).
In previous studies of morphosyntactic (within-word) non-adjacent dependency learning, segmentation was typically assessed with word and non-word comparisons (where non-words were sequences that had not occurred during habituation), or with comparisons that tested preferences for words and rule-words together, investigating word- and structure-learning simultaneously (Marchetto & Bonatti, 2013, 2015). Here, we tested each task in isolation, and increased the difficulty of the segmentation task by using words and part-words, that is, statistical competitors comprising the end of one word and the start of another (rather than random combinations of syllables; see e.g., Saffran et al., 1996). Using this more difficult and more robust assessment, we confirmed that infants could segment speech by computing over non-adjacent statistical regularities.
We note that the group-level looking preference observed for the segmentation task is different to that observed by Marchetto and Bonatti (2015), with higher mean looking times for word than part-word trials. As this study used a fixed trial order (see Section 2.4), we do not make any inferences about this preference, and instead draw upon our LMER analyses that take trial order into account. Nevertheless, we note that the directional difference observed between Marchetto and Bonatti's (2015) work and our conceptual replication could be due to a number of possibilities, including trial order, exposure duration (Endress & Bonatti, 2007, found increasing habituation to words over part-words and generalised words with longer exposure), the different types of test-pair comparisons used, and potential overall differences related to participants' linguistic maturity at the group level (see Hunter & Ames, 1988), perhaps due to the linguistic knowledge that infants bring to the task, as shaped by their prior experience with relevant language structure (our infants were acquiring English, which is morphologically poor). Future studies which counterbalance presentation order and examine infants' learning cross-linguistically will be key to disentangling these possibilities.
Critically, infants' segmentation performance (indexed by Cohen's d) was found to correlate significantly with their concurrent vocabulary size, both for receptive and expressive vocabulary, providing further evidence that infants' capacity for speech segmentation in the laboratory relates meaningfully to their real-world language skills (Junge et al., 2012; Kidd et al., 2018; Newman et al., 2006; Newman et al., 2016; Singh et al., 2012), and extending this finding to statistical segmentation of non-adjacent dependencies. These data indicate that infants' statistical language learning abilities may shape, or be shaped by, infants' language proficiency (see also Lany, 2014, and Lany & Shoaib, 2019). This demonstration that statistical learning of non-adjacencies supports segmentation of an artificial language stream serves as striking evidence that such artificial language learning studies are probing key mechanisms in natural language development.
Of particular note is the direction of the learning effects on the segmentation task, and the way that the polarity of Cohen's d scores related to infants' CDI scores; infants with lower CDI scores demonstrated a familiarity effect (preferring words), whereas infants with higher CDI scores demonstrated a novelty effect (preferring part-words). This difference is suggestive of a maturational preference-switch (from familiarity to novelty), similar to that demonstrated in the ERP segmentation literature (Kidd et al., 2018). This result is in line with the prior suggestion that infants' looking preferences are dynamic, with directional switches resulting from differences in levels of stimulus encoding (see e.g., Houston-Price and Nakai (2004); Hunter and Ames (1988), and see e.g., Jusczyk and Aslin (1995) and Saffran et al. (1996) for a demonstration of preferential differences on speech segmentation tasks that could (at least in part) be due to differences in exposure and stimulus encoding).
The results from the growth curve analyses indicate that the relationship observed between statistical segmentation ability and vocabulary size at the time of testing persists over development, with segmentation performance significantly statistically predicting receptive vocabulary up to 24 months, and expressive vocabulary up to 30 months. Thus, we propose that infants' statistical learning ability may be an informative predictor of their vocabulary size at a later point in development, possibly even over a year later. However, in the study at hand, segmentation performance is not seen to positively influence the rate of vocabulary acquisition, with no apparent relationship between segmentation scores and growth for expressive vocabulary (i.e., a stronger preference on the segmentation task did not predict faster learning). For receptive vocabulary, there was a negative relationship between segmentation scores and growth, which is likely due to receptive scores reaching ceiling at the later time points.
The lack of relationship between segmentation ability and vocabulary growth could be interpreted in a number of ways. One possibility is that individual differences in statistical learning are unrelated to individual differences in vocabulary acquisition. However, this is unlikely; the data reveal a strong relationship between statistical learning and concurrent vocabulary, and the GCAs show a significant effect on the intercept over development, indicating that these abilities are indeed related. A second possibility is that both differences in statistical learning and differences in vocabulary acquisition are due to another underlying factor not assessed here, for instance individual differences in neural maturation (general cognitive ability), speed of processing, or possible differences arising from variation in the socioeconomic background of infants (see e.g., Schwab and Lew-Williams, 2016). Testing the possible influence of additional variables would require assessing a broader array of cognitive skills, and factoring infants' performance on these additional tasks into the analyses.
A third alternative is that differences in statistical learning contribute to differences in the rate of vocabulary acquisition at the earliest stages of language development, so are not captured here. It may be the case that infants' strategies for segmentation change over development, with infants relying more on statistical learning to develop their early lexicon, then incorporating other strategies when they become available, or when infants reach a certain level of proficiency (see e.g., Conway et al. (2010), and Frost, Monaghan, and Christiansen (2019) for evidence that learners may make use of both bottom-up and top-down strategies for speech segmentation). The notion of a developmental shift in infants' speech segmentation strategy is not new; there is much research to suggest that early segmentation is stress-based, before infants turn to the statistical properties of the input (e.g., Johnson & Jusczyk, 2001; Johnson & Seidl, 2009; Thiessen & Saffran, 2003). It is conceivable that infants then adapt their segmentation strategy further upon reaching a certain level of maturity; for instance, by drawing on representations for highly familiar items to help identify word boundaries for neighbouring items (e.g., Bortfeld, Morgan, Golinkoff, & Rathbun, 2005). An early advantage of statistical learning for vocabulary acquisition could explain why children with good statistical learning abilities have larger vocabularies in the study at hand, though further research examining the relationship between statistical learning and vocabulary growth earlier in development is required to test these claims.
Contrary to prior suggestions (Marchetto & Bonatti, 2013, 2015), there was some evidence that infants could generalise the non-adjacent dependencies contained within continuous speech to grammatically consistent but previously unseen sequences (trained dependencies with a novel intervening syllable). Previous studies have demonstrated generalisation only when pauses separated sequences containing the dependencies (e.g., Marchetto & Bonatti, 2013, 2015). The apparent necessity of this pause has been interpreted as requiring speech segmentation to be resolved before generalisation over the grammatical structure can occur. However, in the current study we showed that this additional pause cue was not necessary, and that generalisation could occur in the same brief learning period as segmentation, from the same input.
There are two key possible explanations for the generalisation effects seen here. The first is that 17-month-old infants are able to generalise non-adjacent dependencies under the right testing circumstances, with conflicting use of syllables across test items preventing participants from distinguishing between rule-words and part-words in prior studies (e.g., Marchetto & Bonatti, 2015, see Frost & Monaghan, 2016). Thus, the data could indicate that without this conflicting information, generalisation of the non-adjacencies can be observed in infancy in the absence of additional cues to the language structure (e.g., Mueller et al., 2008, 2010), perhaps proceeding together with speech segmentation.
However, an alternative possibility is that generalisation performance could be a product of test order, with infants acquiring within-word structure over the course of the tasks rather than during familiarisation; infants heard segmentation trials first, and so received additional exposure to the words from the speech stream ahead of the generalisation task. Since words were presented in isolation on the segmentation trials, infants were not necessarily required to learn about the structure of those words while segmenting them from speech in order to succeed on this task. This is unlikely to explain infants' generalisation performance entirely, though, as infants also received equivalent exposure to part-words during the segmentation testing, meaning the segmentation task may have strengthened the child's representations of both words and part-words (which formed non-words) to a similar degree. Nevertheless, it remains a possibility that completing this task first could have influenced subsequent generalisation performance. Future studies addressing generalisation immediately after training (i.e., without the segmentation task) will enable us to firmly disentangle infants' capacity for generalisation from possible effects of task order. Follow-up studies with and without inter-item pauses in the speech stream will also permit a more thorough investigation of infants' capacity to learn structure from speech.
Finding evidence for both segmentation and generalisation could suggest that both tasks may be supported similarly by the same statistical properties of the input. However, the relationship between these tasks, and their differing links to vocabulary development, do not permit us to confirm that the same statistical operations are applying to both tasks (Frost & Monaghan, 2017). We note, though, that the relative draw toward familiar versus novel items may have been somewhat different across these tasks, which could restrict the degree to which these scores could be compared directly; whereas the segmentation task involves a straightforward familiarity (words) versus novelty (part-words) comparison, the generalisation task is perhaps more complex, and could be interpreted as pitting less novelty (rule-words) against more novelty (non-words).
Nevertheless, the results indicate some distinctions between these tasks. First, whereas both tasks individually demonstrate learning, the correlation between segmentation and generalisation performance was not significant, meaning there was no observable relationship between performance on these tasks. Second, while the relationship between vocabulary size and segmentation performance was significant, for generalisation this was not the case; CDI scores did not correlate with performance. This is not the same as a dissociation, and the lack of correlation could have been due to the lower sensitivity of the generalisation task compared to the segmentation task (cf. mean and SD of estimates in Tables 1 and 2), rather than an absence of a relationship altogether. The alternative, that segmentation and generalisation are radically different types of tasks (Marchetto & Bonatti, 2015; Peña et al., 2002), would predict that only vocabulary scores should relate to segmentation performance here, whereas generalisation performance ought only to relate to tasks associated with distinct grammatical processing. The present results may be seen to better align with the latter; however, future tests of these children's grammatical processing abilities would be required to address this directly.
In sum, this study provides further evidence that infants can segment speech by computing over the statistical properties of the input (in this case, non-adjacent dependencies). We find evidence to suggest that children can also detect non-adjacent dependency structure in continuous speech, and generalise this to novel consistent items, though further research is required to establish this conclusively. Crucially, we have shown that laboratory-based studies of children's language learning, in terms of abstract word segmentation from non-adjacent structures in continuous artificial speech, have real-world counterparts in children's language development. Furthermore, we have shown that an individual differences approach to interpreting the effect size on these artificial language learning tasks differentiates children who present with familiarity and novelty preferences, with the direction of the effect corresponding with children's vocabulary size (but not with their vocabulary growth). This insight into individual differences in performance shows that such variation is meaningful, rather than noise, and contributes to further interpretation of novelty and familiarity preferences with respect to language maturation.

Fig. 1. Mean overall looking times to word and part-word trials over the task, with SE. Trial number (1-16) is broken down into trial number by trial type (1-8), to illustrate the relative difference between looking on the first, second, etc. trial of each type (though we did not alternate perfectly between types of trial, and trial order was statistically controlled for in the analysis). We note that individual differences analysis (reported in Section 3.2.4) shows that segmentation performance was not homogeneous; see this section for a visualisation of looking behaviour split by looking preference, and for evidence of the stability of the effects over the task (thus, initial trial performance does not drive the observed effects of trial type). See supplementary figure iii for an illustration of looking behaviour residualised against trial.

Fig. 2. Mean looking times to non-word versus rule-word trials over the course of the task, with standard error. Trial number (1-16) is broken down into trial number by trial type (1-8), to illustrate the relative difference between looking on the first, second, etc. trial of each type.

Fig. 3. Mean overall looking times to word and part-word trials over the course of the segmentation task with SE, given for participants with a familiarity preference (preferring words, d < −0.2), no preference (d = −0.2 to 0.2), and a novelty preference (preferring part-words, d > 0.2), respectively. Effect size boundaries for defining preference groups were determined based on Cohen (1992). We note the stability of the familiarity and the novelty effects across the task.
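
The preference grouping described in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code. It assumes that each participant's d is a standard pooled-SD Cohen's d, computed as looking time to the novel trial type (part-words) minus looking time to the familiar trial type (words), so that negative values indicate a familiarity preference. The function names `cohens_d` and `preference_group` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(novel, familiar):
    """Pooled-SD Cohen's d (assumed formula); positive = longer looking at novel items."""
    n1, n2 = len(novel), len(familiar)
    s1, s2 = stdev(novel), stdev(familiar)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    if pooled == 0:
        raise ValueError("pooled SD is zero; d is undefined")
    return (mean(novel) - mean(familiar)) / pooled


def preference_group(d, threshold=0.2):
    """Classify an effect size using the +/-0.2 boundaries given in the caption."""
    if d < -threshold:
        return "familiarity"  # longer looking at words (or rule-words)
    if d > threshold:
        return "novelty"      # longer looking at part-words (or non-words)
    return "none"             # d between -0.2 and 0.2 inclusive
```

For example, a participant with d = −0.35 would fall in the familiarity group under these boundaries, and one with d = 0.1 would be classed as showing no preference.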

Fig. 4. Mean overall looking times to non-word and rule-word trials over the generalisation task with SE, given for participants with a familiarity preference (preferring rule-words, d < −0.2), no preference (d = −0.2 to 0.2), and a novelty preference (preferring non-words, d > 0.2), respectively. Effect size boundaries were determined based on Cohen (1992).

Fig. 5. Scatterplots showing the relationship between concurrent vocabulary scores and performance on the segmentation trials (panel A: receptive; panel B: expressive) and the generalisation trials (panel C: receptive; panel D: expressive).

Fig. 6. The relationship between segmentation (Cohen's d) at 17 months and receptive vocabulary scores over time (19-30 months). Panel A maps the trajectory of vocabulary development for individual participants (given in grey) and for participants providing high (red; > 0; novelty preference) versus low (blue; < 0; familiarity preference) segmentation scores. Bold age points indicate the ages at which vocabulary size was measured. Panel B depicts the relationship between segmentation and receptive vocabulary scores at each individual time point. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. The relationship between segmentation scores (Cohen's d) at 17 months and expressive vocabulary scores over time. Panel A maps the trajectory of vocabulary development for individual participants (given in grey) and for participants providing high (red; > 0; novelty preference) versus low (blue; < 0; familiarity preference) segmentation scores. Panel B depicts the relationship between segmentation and expressive vocabulary scores at each time point. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 1
Summary of the linear mixed-effects model of (log odds) looking times on the segmentation trials.