Simultaneous segmentation and generalisation of non-adjacent dependencies from continuous speech

Language learning requires mastering multiple tasks, including segmenting speech to identify words, and learning the syntactic role of these words within sentences. A key question in language acquisition research is the extent to which these tasks are sequential or simultaneous, and consequently whether they may be driven by distinct or similar computations. We explored a classic artificial language learning paradigm, where the language structure is defined in terms of non-adjacent dependencies. We show that participants are able to use the same statistical information at the same time both to segment continuous speech into words and to generalise over the structure, when the generalisations were over novel speech that the participants had not previously experienced. We suggest that, in the absence of evidence to the contrary, the most economical explanation for the effects is that speech segmentation and grammatical generalisation are dependent on similar statistical processing mechanisms.


Introduction
In order to achieve linguistic proficiency, language learners must identify words from continuous speech, and work out the relations between those words, in terms of determining grammatical categories and syntactic structures. However, there are no definitive acoustic cues for word boundaries (Aslin, Woodward, LaMendola, & Bever, 1996), nor for grammatical categories of words that can help determine the syntactic dependencies between words (Monaghan, Christiansen, & Chater, 2007). Thus, learning must operate by somehow determining the regularities that are evident within the language, and how these regularities relate to meaning in terms of defining the relations between words and their mapping to intended referents in the environment (Cunillera, Laine, Camara, & Rodriguez-Fornells, 2010; Monaghan & Mattock, 2012).
There are two views about how these learning tasks proceed in language acquisition. One perspective is that similar statistical mechanisms may apply to speech segmentation and to grammatical processing (Perruchet, Tyler, Galland, & Peereman, 2004; Romberg & Saffran, 2010). An alternative view, deriving from classical cognitive psychology approaches to learning (Chomsky, 1957; Pinker, 1997), is that while speech segmentation is likely to depend on processing statistical dependencies, learning grammar relies on rather different algebraic processes that operate between symbolic representations of elements of language. Previous studies of word identification and grammatical processing have tended to use distinct stimuli for each task, and so comparison across tasks is difficult. However, assessing word identification and abstraction over these sequences for grammatical processing with the same stimuli enables a test of whether processing of these tasks proceeds in tandem or is separated in learning. Though it is not possible to establish for certain whether the same or different processes apply to these tasks, it becomes more challenging to contend that the same statistical process applies to both word identification and grammatical processing if they can be shown to be temporally distinct.
There is good reason to suspect that learning may operate in tandem, because similar sources of information appear to be useful for both segmentation and determining dependencies between words in language acquisition. Monaghan and Christiansen (2010) demonstrated in corpus analyses of child-directed speech that identifying boundaries in speech could usefully rely on determining high-frequency function words that separate other words, forming points of very low transitional probabilities in the speech stream (Ordin & Nespor, 2013). Similarly, they found that these high-frequency function words also provide useful markers to the phrase structure of the utterance (Cunillera, Camara, Laine, & Rodriguez-Fornells, 2010); for instance, the determiner "the" tends to reliably precede nouns, and the pronoun "you" precedes verbs. It is possible that the same sources of information are consulted twice to address these tasks in sequence, but a more economical explanation would be that the same source of information gradually builds up the learners' understanding of what the words are and how they operate in the grammar of the language.
If such statistical processing can be demonstrated to be sufficient for word identification and grammar learning, then this weakens the requirement to posit language-specific mechanisms for language acquisition; instead, a simpler domain-general approach to language learning could be assumed, until evidence to the contrary is ascertained (Christiansen & Chater, 2008). In an ingenious set of studies, Peña, Bonatti, Nespor, and Mehler (2002) set out to show this distinction. They focused on learning of non-adjacent dependencies, which are evident in language structure at multiple levels, from orthography (e.g., final e changing the pronunciation of the previous vowel, cap and cape), morpho-syntax (e.g., I go, he goes), grammatical categorisation (e.g., high-frequency non-adjacent pairs of words assist grammatical categorisation of the intervening word, "the __ is", "you __ to"; Mintz, 2003; St Clair, Monaghan, & Christiansen, 2010), and hierarchical grammatical relations (e.g., the boy the cats chase runs). As Perruchet et al. (2004) note, if statistical dependencies can be shown to be sufficient for acquiring non-adjacencies then this increases the likelihood of the role of domain-general statistical processing in language acquisition.
In Peña et al.'s (2002) study, adults were presented with synthetic speech containing items defined by non-adjacent transitional probabilities (e.g., A1XC1, A2XC2), where particular A syllables were always paired with particular C syllables, but the X syllable freely varied over a set of three other syllables. To measure speech segmentation, participants were tested on their ability to identify previously occurring words that were consistent with the non-adjacencies presented in the speech, by assessing preference for words (e.g., A1XC1) over part-words (e.g., XC1A2).
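To make this statistical structure concrete, the following is a minimal sketch (not the materials or analysis of either study) that builds a toy AXC stream and estimates adjacent and non-adjacent transitional probabilities from it. The syllable labels, stream length, seed and function names are all hypothetical; the sketch assumes three Ai_Ci pairs, three X syllables, and no immediate repetition of a pair.

```python
import random
from collections import Counter

# Hypothetical labels; these are NOT the syllables used in the studies.
A = ["a1", "a2", "a3"]
C = ["c1", "c2", "c3"]
X = ["x1", "x2", "x3"]

def make_stream(n_words, seed=0):
    """Concatenate AiXCi words, never repeating a dependency back to back."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        i = rng.choice([k for k in range(3) if k != prev])
        stream += [A[i], rng.choice(X), C[i]]
        prev = i
    return stream

def transitional_probs(stream, lag):
    """Estimate P(syllable at t + lag | syllable at t) from counts."""
    pair_counts = Counter(zip(stream, stream[lag:]))
    first_counts = Counter(stream[:len(stream) - lag])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

stream = make_stream(3000)
adjacent = transitional_probs(stream, 1)      # neighbouring syllables
nonadjacent = transitional_probs(stream, 2)   # syllables two positions apart
```

Under these assumptions the non-adjacent probability of C1 given A1 is exactly 1, whereas adjacent A-to-X transitions lie near 1/3 and C-to-A transitions across word boundaries lie near 1/2, so the dependency that defines a word is carried only by the non-adjacent statistic.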
Critically, Peña et al. (2002) also used these same stimuli to test the extent to which participants could manipulate non-adjacencies to generalise to new items. This involves going beyond the surface form of the sequences, by abstracting the structure to generalise to these new sequences, and is a key property of grammatical processing (Marcus, Vijayan, Rao, & Vishton, 1999). After the same training, they tested a different set of participants on their preference for "rule-words", constructed by moving an A or a C syllable from elsewhere in the speech stream (e.g., placing A2 within the A1_C1 non-adjacency: A1A2C1), in comparison with part-words.
Participants were not able to generalise. However, when the segmentation task was solved for participants, by placing a 25 ms gap between the syllable triples during training, participants did generalise to the rule-words. Peña et al. (2002) thus suggested that although adults are capable of using statistics to identify words from a continuous speech stream, they may then apply separate computations that do not depend on learning statistical dependencies between particular elements of the language, to generalise the structure to consistent forms. They suggest that this can occur only once the task of identifying the words in the stimuli has been solved (Chomsky, 1957; Endress & Bonatti, 2007; Marchetto & Bonatti, in press; Marcus et al., 1999; Miller & Chomsky, 1963).
The interpretation of these results has been hotly debated, but previously the focus of disagreement has been on whether non-adjacencies were learned at all, or rather whether participants instead remembered particular items from the speech (Perruchet et al., 2004), or whether participants learned only the general position of syllables in the sequences rather than the dependencies between them (Endress & Bonatti, 2007; Endress & Mehler, 2009; Mueller, Bahlmann, & Friederici, 2008, 2010; Perruchet et al., 2004). However, there has been substantially less focus on the extent to which segmentation and generalisation of structure co-occur, or are temporally distinct processes.
The Peña et al. (2002) rule-word generalisation stimuli were constructed by moving an A or a C syllable to a new position in the sequence. An advantage of this is that the frequencies of individual syllables were controlled across the target and the part-word stimuli in forced choice tests, so any observed preferences must then be due to syllable co-occurrences, either of adjacent or non-adjacent elements in speech. However, this design may have made generalisation performance harder to detect because it requires not only generalisation of the non-adjacency but also unlearning of the dependency relations for the moved syllable. For instance, the moved-syllable test of Peña et al.'s (2002) study would be analogous to training participants on "the boy the cats chase runs" and "the girl the dog nuzzles smiles", and then testing whether they can flexibly apply the non-adjacency to "the boy smiles runs". Participants may reject these items because they are not able to generalise the non-adjacent structure, or because they fail to accept a violation of relational structure. The pause between syllable triples may then have been important not because it solves the segmentation task, but because it increases the salience of syllables with regard to their position (Endress & Mehler, 2009; Perruchet et al., 2004), thus providing an additional cue to the relative positions of elements of the language in the speech.
In the current study we tested whether participants are able to simultaneously segment and generalise structure of a non-adjacent dependency language if new, rather than moved, syllables comprise the sequences to be generalised. A novel syllable intervening between an Ai_Ci dependency is a stronger test of generalisation, but without interference from previous learning of relative syllable positions. Participants listened to a continuous speech stream, and then completed either a test of segmentation, or of generalisation to rule-words containing a moved syllable as in Peña et al. (2002). An additional condition tested generalisation to rule-words containing novel syllables. If participants are able to use the same information for segmentation and generalisation simultaneously, but were affected by having to unlearn positional information in Peña et al.'s (2002) test of rule-word generalisation, then we expect learning for the novel-syllable rule-words in addition to learning for the segmentation task. However, if segmentation and structural generalisation are separable processes, then we expect to see a null effect for the novel-syllable generalisation task, with similar performance to that seen in Peña et al.'s (2002) original study.

Participants
The experiment was completed by 54 adults (8 males, 46 females) with a mean age of 18.52 years (range = 18-24 years). All participants were native English speakers, with no known history of auditory, speech or language disorder. Participants were paid £3.50, or received course credit.

Design
The experiment used a between-participants design with three conditions of test type: segmentation, moved-syllable generalisation, and novel-syllable generalisation.
Similarities in phonological properties of non-adjacent dependent syllables have been shown to support acquisition of those non-adjacencies (Newport & Aslin, 2004), but they are not essential for such learning to occur (Onnis, Monaghan, Christiansen, & Chater, 2004). Nevertheless, words from the same grammatical category tend to be coherent with regard to phonological properties (Monaghan et al., 2007), and so this property of the artificial language is consistent with natural language. Each AXC string lasted approximately 700 ms. To control for possible preferences for particular dependencies between syllables not due to the statistical structure of the sequences, 8 versions of the language were generated by randomly assigning syllables to A and C roles. The same X items were used across all versions of the language. These were counterbalanced across the three conditions of the study. There were three additional syllables comprising continuant phonemes (again consistent with a correlation between relational structure and phonology in natural language), which were reserved for testing generalisation to novel items (ve, zo, thi).

Training
A 10.5-min-long continuous stream of synthetic speech was created using the Festival speech synthesiser (Black et al., 1990) by concatenating AXC words in the language. No Ai_Ci dependency was immediately repeated. Speech streams had a 5 s fade in and out so that onset and offset of the speech could not be used as a cue to the language structure.

Testing
For testing segmentation, a forced choice task tested preference for words over part-words. Part-words were trisyllabic items that occurred in the training speech but straddled word boundaries, comprising the last syllable of one word and the first two syllables of another word (CiAjX), or the last two syllables of one word and the first syllable of another (XCiAj). Both types of part-word were created for each of the nine AXC items. Eighteen test pairs were constructed by matching each part-word with its corresponding word (so, for example, an A1X2C1 item was paired with an X2C1A2 part-word).
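The construction of the two part-word types can be sketched as follows. This is an illustration only, not the authors' stimulus-generation code: the syllable labels and the `part_words` helper are hypothetical, and the sketch assumes three Ai_Ci frames crossed with three X syllables to give the nine AXC items.

```python
# Hypothetical labels standing in for the study's syllables.
A = ["a1", "a2", "a3"]
C = ["c1", "c2", "c3"]
X = ["x1", "x2", "x3"]

# The nine AXC words: each Ai_Ci frame combined with each X syllable.
words = [(A[i], x, C[i]) for i in range(3) for x in X]

def part_words(word, next_word):
    """Return both part-word types that straddle the word/next_word boundary."""
    a, x, c = word
    next_a, next_x, _ = next_word
    return {
        "CAX": (c, next_a, next_x),  # last syllable + first two of next word
        "XCA": (x, c, next_a),       # last two syllables + first of next word
    }

# The A1X2C1 word followed by an A2 word yields the X2C1A2 part-word.
example = part_words(("a1", "x2", "c1"), ("a2", "x3", "c2"))
```

Both part-word types are attested in the training stream, so a preference for the word over its part-word cannot be based on familiarity with the individual syllables alone.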
For testing moved-syllable generalisation, a forced choice task tested preference for rule-words over part-words. Rule-words comprised an Ai_Ci non-adjacency containing an A or a C item from elsewhere in the speech stream.
There were three rule-words for each Ai_Ci dependency.There were 9 test items altogether, five rule-words paired with a CiAjX part-word and four paired with a XCiAj part-word.
For novel-syllable generalisation, nine forced choice tests comprised a rule-word containing one of the three novel syllables (so of the form AiNCi), where N indicates the novel syllable, and a novel part-word. Each novel rule-word appeared once in each Ai_Ci dependency. Part-words comprised two syllables that occurred during training in their respective positions, with the same novel syllable that the novel-syllable task was significantly higher than chance (M = .693, SD = .160), t(17) = 5.129, p < .001, d = 1.209.
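As a rough arithmetic check on the statistics reported for the novel-syllable task, the one-sample t statistic and Cohen's d can be recomputed from the rounded mean and standard deviation. This sketch assumes a chance level of .5 for the two-alternative forced choice and n = 18, which is inferred from the reported degrees of freedom; it is not the authors' analysis script.

```python
import math

# Reported summary values; n is inferred from df = 17, chance assumed to be .5.
mean, sd, n, chance = 0.693, 0.160, 18, 0.5

# One-sample t-test against chance: t = (M - chance) / (SD / sqrt(n)).
t = (mean - chance) / (sd / math.sqrt(n))

# Cohen's d for a one-sample test: d = (M - chance) / SD.
d = (mean - chance) / sd

print(f"t({n - 1}) = {t:.2f}, d = {d:.2f}")
```

The recomputed values agree with the reported t and d to within the error expected from working with a mean and standard deviation rounded to three decimal places.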
Additional analyses were performed to examine whether there was any learning over testing, and whether any such learning differed between the generalisation conditions, to rule out the possibility that using novel phonemes in the novel-syllable generalisation test made the non-adjacencies more salient in this condition than the moved-syllable or segmentation conditions. For each condition, testing was divided into three blocks of equal number of trials, distinguishing early, middle and late testing trials, and responses were then reanalysed with block as an additional, within-subjects factor.
There was no significant effect of block, F(2, 60) = 1.736, p = .185, ηp² = .055, and the linear contrast for block was not significant, F < 1. There was no significant interaction between block and condition, F < 1, which was also not significant in the linear contrast, F < 1. Thus, there was no evidence of learning during the test trials across conditions, and no differential effect of learning between the conditions. To ensure that there was no learning within individual conditions, we conducted an ANOVA with block as a within-subjects factor for each condition separately. Again, there were no significant main effects or linear contrasts for block in any condition.
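The reported effect size for block can be recovered from the F ratio and its degrees of freedom using the standard identity for partial eta squared, ηp² = (F × df_effect) / (F × df_effect + df_error). The function name below is hypothetical and the sketch is offered only as a consistency check on the reported values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of block as reported: F(2, 60) = 1.736.
eta_block = partial_eta_squared(1.736, 2, 60)
print(round(eta_block, 3))
```

The result rounds to the reported value of .055.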

Discussion
We examined adults' ability to learn non-adjacent dependencies from a continuous stream of synthetic speech, to establish whether segmentation and generalisation may occur within the same brief learning period. In accord with prior literature, we found that adults are able to use non-adjacent transitional probabilities to segment a continuous artificial speech stream (Onnis et al., 2005; Peña et al., 2002; Perruchet et al., 2004). As anticipated, performance on the moved-syllable generalisation task did not provide evidence of generalisation, corresponding with the findings of Peña et al. (2002). However, critically, participants were able to generalise non-adjacency structures to sequences that contained novel syllables.
It is possible that there were conflicting forces preventing participants from displaying a preference for the moved-syllable rule-words; on one hand, the non-adjacencies were being used for processing the structure, but on the other participants may have been affected by the unfamiliarity of the repositioned A or C syllable contravening the dependencies within the language. In previous studies, the importance of the pause cue between triples in the language may have been not to solve the segmentation task but rather to provide an additional cue increasing saliency of the positions of individual syllables (Endress & Mehler, 2009), resulting in enhanced learning of syllables in particular positions (Endress & Bonatti, 2007; Perruchet et al., 2004). Without this conflicting information, we have shown that generalisation of the non-adjacencies can be observed in tandem with segmentation, and, furthermore, that this can be accomplished without requiring additional cues to the structure of the language (e.g., Mueller et al., 2008, 2010).
An alternative reason for the observation of novel-syllable generalisation is that the part-word items do not occur in the speech, though for the moved-syllable generalisation task the part-word items did occur, and it is therefore harder to reject them. This is a possibility, but does not affect the overall result that generalisations depending on non-adjacencies are driving the results in the novel-syllable generalisation task. In either case, there was no significant effect of the type of part-word, indicating that a part-word containing a syllable pair that occurred during training (NCA, or CAN) was no harder to reject than a part-word containing no syllable pairs that occurred during training (XNA), making this an unlikely cause for the distinction between the two generalisation conditions.
The current study demonstrates that segmentation and generalisation are not separable behaviourally within the same time period, where such a differentiation had previously been claimed (Peña et al., 2002). Thus, the evidence previously taken to suggest a distinction between processes for word identification and grammatical processing does not support that distinction. There remains a possibility that the tasks are solved simultaneously but in different ways, such that non-adjacencies are utilised for segmentation using statistical learning, but that operations over the structure are still symbolic, applying to abstract generalisations of the relations between elements (Marcus et al., 1999). We suggest that, in the absence of evidence to the contrary, the same class of mechanisms (statistical learning) should be assumed to be sufficient for driving word learning as well as structural generalisation (Aslin & Newport, 2014).
Previous claims of the need for symbolic, algebraic processing for generalisation of sequences rely on a narrow interpretation of statistical mechanisms that permit only computation of dependencies between elements in experienced stimuli (Marcus et al., 1999). However, statistical processing is consistent with learning to generalise, as well as learning precise co-occurrences between experienced elements in sequences (Romberg & Saffran, 2010). The traditions of symbolic and statistical processing in cognitive psychology, and language acquisition research, have undergone substantial convergence (Perruchet & Pacton, 2006). For instance, Redington, Chater, and Finch (1998) demonstrated how statistical processing of clustering can support generalisations as well as learning individual correspondences in grammatical structure. Similarly, French, Addyman, and Mareschal (2011) showed how the same statistical learning mechanism could apply to both speech segmentation studies and studies of implicit learning of rule-based sequences. Our results confirm that such effects are also observed behaviourally.
The breadth of possible statistical processes that can support speech segmentation, grammatical categorisation, and syntactic processing reduces the requirement to stipulate that language acquisition processes may be domain-specific, rather than applications of powerful general-purpose learning mechanisms (Christiansen & Chater, 2008). However, exactly what statistical mechanism is being applied remains a difficult issue to resolve. In the current experiment, the distinction between word identification and grammatical processing can be understood in terms of learning dependencies between experienced sequences and learning to generalise dependencies to new sequences. However, scaling this distinction up to the dependencies observed in natural language requires explaining how long-distance dependencies between hierarchical structures may be acquired (see Lai & Poletiek, 2011; Lany & Gómez, 2008; Onnis, Monaghan, Christiansen, & Chater, 2004; Saffran et al., 1996, for progress in this field). Nevertheless, the study we present here demonstrates that, from the same input and at the same time, participants are able to identify particular sequences as words, and generalise the structure of those sequences. Any qualitative distinctions between the processes involved in these tasks as yet remain to be demonstrated.

Figure 1. Mean accuracy for each test condition.