Tuning in to non-adjacencies: Exposure to learnable patterns supports discovering otherwise difficult structures

doi:10.1016/j.cognition.2020.104283

Cognition

Volume 202, September 2020, 104283

Abstract

Non-adjacent dependencies are ubiquitous in language, but difficult to learn in artificial language experiments in the lab. Previous research suggests that non-adjacent dependencies are more learnable given structural support in the input, for instance, when the items that occur between the dependent elements are highly variable. However, not all non-adjacent dependencies occur in supportive contexts. How are such regularities learned? One possibility is that learning one set of non-adjacent dependencies can highlight similar structures in subsequent input, facilitating the acquisition of new non-adjacent dependencies that are otherwise difficult to learn. In three experiments, we show that prior exposure to learnable non-adjacent dependencies (i.e., dependencies presented in a learning context that has been shown to facilitate discovery) improves learning of novel non-adjacent regularities that are typically not detected. These findings demonstrate how the discovery of complex linguistic structures can build on past learning in supportive contexts.

Introduction

Non-adjacent dependencies are ubiquitous in language. For instance, English marks number agreement (e.g., "The linguists at the conference are restless") and aspect (e.g., "People are learning all of the time") via inflectional morphemes that establish dependencies between distal items. Despite their prevalence in natural languages, non-adjacent dependencies are notoriously difficult to learn in artificial grammar learning experiments, for both adults and infants (e.g., Gómez, 2002; Gonzalez-Gomez & Nazzi, 2012; Newport & Aslin, 2004; Romberg & Saffran, 2013; see Wilson et al., 2018 for a recent review). Given their centrality to language structure, how do we learn non-adjacent dependencies that are not easily detected in speech?

Previous research suggests that the input can be structured to support learners' discovery of non-adjacent regularities. For example, learning can be facilitated simply by increasing exposure (Romberg & Saffran, 2013; Vuong, Meyer, & Christiansen, 2016); additional experience may allow learners more opportunity to uncover patterns. Learning can also be improved when the non-adjacent dependencies are paired with additional cues that highlight their relatedness (e.g., Onnis, Monaghan, Richmond, & Chater, 2005; van den Bos, Christiansen, & Misyak, 2012). For instance, Onnis et al. (2005) found that learners were better able to learn dependencies between phonologically similar syllables, and Newport and Aslin (2004) showed that participants could successfully detect non-adjacent patterns among sets of consonants or vowels, but failed to discover non-adjacent patterns among syllables. Thus, non-adjacent relations seem to be more easily tracked when dependent elements are perceived as similar. Perceptual cues that make relevant items more salient, such as prosody or pauses that mark boundaries in the speech stream, can also boost learning (e.g., Grama, Kerkhoff, & Wijnen, 2016; Peña, Bonatti, Nespor, & Mehler, 2002; Wang & Mintz, 2018), demonstrating that non-adjacent relations can be highlighted in numerous ways.

A particularly powerful factor that can highlight the presence of non-adjacent dependencies is the variability surrounding to-be-learned patterns (Gómez, 2002; Gómez & Maye, 2005). In a classic study by Gómez (2002), participants' learning of non-adjacent regularities improved significantly as the number of unique items that appeared between the dependent elements increased. Variability in the intervening elements affects learning because it can focus attention toward invariant, and hence reliable, structure in the input. With highly variable intermediate elements, learners are better able to detect the reliable associations between non-adjacent items, suggesting that surrounding information can help direct learners' attention to non-adjacent regularities.
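The logic of the variability manipulation can be illustrated with transitional probabilities. The Python sketch below builds a toy a_X_b language in the style of Gómez (2002) and compares the adjacent probability P(X | a) with the non-adjacent probability P(b | a) as the set of middle items grows. It is an illustration under stated assumptions, not the materials or analysis used in these experiments: only the pel ... rud frame echoes the example string given for Experiment 1, while the other frames, the placeholder middle words (x00, x01, ...), and the helper names `make_language` and `transitional_probs` are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical a_X_b frames; only ("pel", "rud") echoes the example in the text.
FRAMES = [("pel", "rud"), ("vot", "jic"), ("dak", "tood")]


def make_language(n_middle):
    """Pair every frame with every middle item from a set of size n_middle."""
    middles = [f"x{i:02d}" for i in range(n_middle)]  # placeholder middle words
    return [(a, x, b) for (a, b), x in product(FRAMES, middles)]


def transitional_probs(strings, lag):
    """Estimate P(word at position i + lag | word at position i), pooled over strings."""
    pair_counts = Counter()
    first_counts = Counter()
    for s in strings:
        for i in range(len(s) - lag):
            pair_counts[(s[i], s[i + lag])] += 1
            first_counts[s[i]] += 1
    return {pair: c / first_counts[pair[0]] for pair, c in pair_counts.items()}


for n in (2, 6, 12, 24):
    lang = make_language(n)
    adjacent = transitional_probs(lang, lag=1)     # a -> X and X -> b
    nonadjacent = transitional_probs(lang, lag=2)  # a -> b, skipping X
    print(f"set size {n:2d}: P(x00 | pel) = {adjacent[('pel', 'x00')]:.3f}, "
          f"P(rud | pel, lag 2) = {nonadjacent[('pel', 'rud')]:.3f}")
```

Because every a word is always followed two positions later by its own b word, P(b | a) stays at 1.0 at every set size, while the adjacent probability P(x00 | pel) falls from 0.5 with two middle items to about 0.04 with 24. On this account, high variability makes the adjacent statistics uninformative, leaving the invariant non-adjacent frame as the most reliable structure to track.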

Learners can also build on past experience with related structures to detect non-adjacent structure. Previous experience can shape learners' expectations and change the statistical relations that they track (e.g., LaCross, 2015; Lew-Williams & Saffran, 2012; Potter, Wang, & Saffran, 2017; Wang, Zevin, & Mintz, 2017). For example, experiencing some word categories in adjacent structures subsequently helps learners recognize non-adjacent relations between the same words (Lany & Gómez, 2008; Lany, Gómez, & Gerken, 2007). Following experience with associations that are easily learnable, learners may be better able to detect more complex relations (e.g., Elman, 1990; Lai & Poletiek, 2011). Existing native language knowledge can have a particularly powerful impact on the expectations learners form about the structure of upcoming language input. Wang et al. (2017) showed that recent experience with consistent rhythmic patterns embedded in native language structures changes what patterns learners subsequently infer from novel materials. Participants learned non-adjacent dependencies embedded in an artificial language after they were exposed to English phrases that had a matched four-word structure, but not when the two structures were in conflict. This finding is consistent with evidence that infants are better able to discover regularities with a structure that matches their prior experience (Lew-Williams & Saffran, 2012). Together, these studies suggest that learners can use prior experience to improve their learning of non-adjacent dependencies, either by building on past learning about specific items in simpler contexts or by drawing on knowledge about non-adjacent structures from their first language. However, this leaves open the question of whether learners can discover non-adjacent dependencies de novo when the relevant dependencies only appear in non-adjacent relations.
When acquiring a novel language, learners must learn new distal grammatical relations that are rarely, if ever, encountered in simpler forms. How might learners break into learning new non-adjacencies?

In the current work, we investigated whether past distributional learning itself may offer a solution to the problem of discovering new non-adjacencies. This explanation focuses on the role of past learning in guiding future learning. If the input is initially structured to support successful non-adjacent dependency learning, this could lead learners to expect to encounter non-adjacent structure in the language. These expectations could subsequently allow them to extract non-adjacent patterns, even in contexts where learning would otherwise be difficult. To test this proposal, we designed a series of experiments in which learners could build on past distributional learning to succeed when faced with a more difficult context for detecting non-adjacent structure. We hypothesized that prior experience with non-adjacent dependencies in the presence of high variability (a context known to support learning; Gómez, 2002; Gómez & Maye, 2005; Plante et al., 2014) would facilitate acquisition of a new set of non-adjacent associations among novel words, including dependencies that learners otherwise struggle to detect. We tested this hypothesis in three studies. Together, these studies explore how pattern learning in the present builds on pattern learning from the past by testing whether prior experience with readily learnable structures allows difficult linguistic structures to be learned more easily.

Section snippets

Experiment 1

Our first study tested whether being pre-exposed to non-adjacent dependencies in a learnable context would aid participants in recognizing novel non-adjacent regularities that are difficult to learn. Learners were tested for their ability to discover the association between the first and third words in three-word sequences (e.g., pel-kicey-rud). One group of learners was pre-exposed to a set of artificial sentences that we expected to be learnable based on past work (Gómez, 2002): consistent

Experiment 2

In Experiment 2, we conducted a replication of Experiment 1 with an additional condition (No Pre-Exposure Condition) in which participants received no pre-exposure experience. We predicted a linear effect across the three conditions, such that performance would be strongest in the Learnable Condition, intermediate in the No Pre-Exposure Condition, and weakest in the Non-Learnable Condition, with significant differences among all three conditions. The linear hypothesis and analytic approach

Experiment 3

In Experiment 3, we tested the effect of exposure to learnable non-adjacent dependencies against a new condition (Unstructured Pre-Exposure Condition) in which total language exposure was equated with materials presented in the Learnable Pre-Exposure Condition. Crucially, the Unstructured Pre-Exposure Condition included a pre-exposure phase consisting of the same words as the pre-exposure in the Learnable Pre-Exposure Condition. However, the words occurred individually in random order, instead

General discussion

This set of studies investigated a proposal for how distributional learning might build on itself, such that learners develop expectations about linguistic structures that allow them to successfully learn otherwise difficult patterns. When learners were exposed to patterns with learnable non-adjacent dependencies, they were subsequently more successful at learning novel non-adjacent dependencies than if their previous exposure did not include learnable non-adjacent patterns. We tested three

Author contributions

All authors developed the study concept and design. Data collection and data analysis were performed by MZ. All authors contributed to the interpretation of the data and wrote the manuscript.

Acknowledgements

This research was supported by NSF-GRFP DGE-1747503 awarded to MZ, and grants from the NICHD to JRS (R37HD037466), CEP (F32 HD093139), and the Waisman Center (U54 HD090256). We thank Jill Lany for helpful comments on an earlier draft, and Emily Cummings, Grace McCune, Lauren Silber, and Amy So for aiding in data collection.

References (58)

R.P. Abelson et al. (1997). Contrast tests of interaction hypotheses. Psychological Methods.

R.H. Baayen et al. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language.

D.J. Barr et al. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language.

D. Bates et al. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software.

E.A. Bates et al. Second language acquisition from a functionalist perspective: Pragmatic, semantic, and perceptual strategies.

E. van den Bos et al. (2012). Statistical learning of probabilistic nonadjacent dependencies by multiple-cue integration. Journal of Memory and Language.

N. Chater et al. (2006). Probabilistic models of language processing and acquisition. Trends in Cognitive Sciences.

M.H. Christiansen et al. (2006). Statistical learning within and between modalities: Pitting abstract against stimulus-specific representations. Psychological Science.

J.L. Elman (1990). Finding structure in time. Cognitive Science.

M.C. Frank et al. (2011). Three ideal observer models for rule learning in simple languages. Cognition.

R.L.A. Frost et al. (2016). Simultaneous segmentation and generalisation of non-adjacent dependencies from continuous speech. Cognition.

A.L. Gebhart et al. (2009). Statistical learning of adjacent and non-adjacent dependencies among non-linguistic sounds. Psychonomic Bulletin and Review.

L. Gerken et al. (2011). Infants avoid “labouring in vain” by attending more to learnable than unlearnable linguistic patterns. Developmental Science.

J. Gervain et al. (2017). Learning multiple rules simultaneously: Affixes are more salient than reduplications. Memory and Cognition.

R.L. Gómez (2002). Variability and detection of invariant structure. Psychological Science.

R.L. Gómez et al. (2005). The developmental trajectory of nonadjacent dependency learning. Infancy.

N. Gonzalez-Gomez et al. (2012). Acquisition of nonadjacent phonological dependencies in the native language during the first year of life. Infancy.

I.C. Grama et al. (2016). Gleaning structure from sound: The role of prosodic contrast in learning non-adjacent dependencies. Journal of Psycholinguistic Research.

T. Grüter et al. (2012). Grammatical gender in L2: A production or a real-time processing problem? Second Language Research.

D. Guillelmon et al. (2001). The gender marking effect in spoken word recognition: The case of bilinguals. Memory and Cognition.

J.S. Horst et al. (2016). The novel object and unusual name (NOUN) database: A collection of novel images for use in experimental research. Behavior Research Methods.

T.F. Jaeger (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language.

J. Lai et al. (2011). The impact of adjacent-dependencies and staged-input on the learnability of center-embedded hierarchical structures. Cognition.

C. Lew-Williams et al. (2012). All words are not created equal: Expectations about word length guide infant statistical learning. Cognition.

E.L. Newport et al. (2004). Learning at a distance I. Statistical learning of non-adjacent dependencies. Cognitive Psychology.

L. Onnis et al. (2005). Phonology impacts segmentation in online speech processing. Journal of Memory and Language.

T. Regier et al. (2004). Learning the unlearnable: The role of missing evidence. Cognition.

N. Siegelman et al. (2018). Linguistic entrenchment: Prior knowledge impacts statistical learning performance. Cognition.

J. von Koss Torkildsen et al. (2013). Exemplar variability facilitates rapid learning of an otherwise unlearnable grammar by individuals with language-based learning disability. Journal of Speech, Language, and Hearing Research.
