The role of Complementary Learning Systems in learning and consolidation in a quasi-regular domain

We examine the role of off-line memory consolidation processes in the learning and retention of a new quasi-regular linguistic system similar to the English past tense. Quasi-regular systems are characterized by a dominance of systematic, regular forms (e.g., walk-walked, jump-jumped) alongside a smaller number of high frequency irregulars (e.g., sit-sat, go-went), and are found across many cognitive domains, from spelling-sound mappings to inflectional morphology to semantic cognition. Participants were trained on the novel morphological system using an artificial language paradigm, and then tested after different delays. Based on a complementary systems account of memory, we predicted that irregular forms would show stronger off-line changes due to consolidation processes. Across two experiments, participants were tested either immediately after learning, 12 h later with or without sleep, or 24 h later. Testing involved generalization of the morphological patterns to previously unseen words (both experiments) as well as recall of the trained words (Experiment 2). In generalization, participants showed 'default' regularization across a range of novel forms, as well as irregularization for previously unseen items that were similar to unique high-frequency irregular trained forms. Both patterns of performance remained stable across the delays. Generalizations involving competing tendencies to regularize and irregularize were balanced between the two immediately after learning. Crucially, at both 12-h delays the tendency to irregularize in these cases was strengthened, with further strengthening after 24 h. Consolidated knowledge of both regular and irregular trained items contributed significantly to generalization performance, with evidence of strengthening of irregular forms and weakening of regular forms. 
We interpret these findings in the context of a complementary systems model, and discuss how maintenance, strengthening, and forgetting of the new memories across sleep and wake can play a role in acquiring quasi-regular systems.


Introduction
A recent body of literature has highlighted the role of long-term memory processes in language learning. Several studies have established that, for example, while children and infants may be successful at initial encoding of a novel word, their long-term retention is often poor (e.g., Friedrich & Friederici, 2011; Horst & Samuelson, 2008; Kucker & Samuelson, 2012; Vlach & Sandhofer, 2012; Wojcik, 2013). Vlach and Sandhofer (2012) have shown that enhancing encoding conditions can improve this otherwise poor long-term retention. Furthermore, in both children and adults, consolidation processes, including off-line maintenance, off-line strengthening, and forgetting, have been shown to play a key role in word learning and long-term retention (e.g., Brown, Weighall, Dumay & Gaskell, 2007; Dumay, 2016; Friedrich, Wilhelm, Born, & Friederici, 2015; Tamminen & Gaskell, 2008; Werchan & Gomez, 2014; Williams & Horst, 2014). Davis and Gaskell (2009) applied principles from the Complementary Learning Systems (CLS) model of memory (e.g., Kumaran, Hassabis, & McClelland, 2016; McClelland, 2013; McClelland, McNaughton, & O'Reilly, 1995) to account for the role of long-term memory processes in word learning. They suggested that complementary systems in the hippocampus and neocortex underpin the ability to acquire new words and to integrate them with existing linguistic knowledge for long-term retention. The hippocampal system is thought to be more strongly involved in supporting new learning initially, whereas neocortical networks provide a lasting basis for retention. Communication between these systems provides a mechanism for consolidation-related changes in the retention of new linguistic material, as well as for the integration of the new linguistic knowledge into the long-term lexical network. Hippocampal replay of newly established memories (e.g., Skaggs & McNaughton, 1996) is thought to underpin improvements in retention that are specifically related to sleep.
There is now substantial evidence supporting the role of consolidation processes, and particularly sleep-related consolidation, in both the retention and the integration of new linguistic knowledge (e.g., Bakker, Takashima, van Hell, Janzen, & McQueen, 2014; Dumay & Gaskell, 2007; Gais, Lucas, & Born, 2006; Schreiner & Rasch, 2015; Tamminen, Payne, Stickgold, Wamsley, & Gaskell, 2010). The improved retention of new words after a period of sleep has been causally linked to hippocampal replay in studies using targeted memory reactivation during sleep (e.g., Batterink, Westerberg, & Paller, 2017; Schreiner & Rasch, 2015). Similarly, the integration of new knowledge with existing lexical networks has been directly related to sleep-related memory consolidation processes (e.g., Tamminen et al., 2010). These findings are consistent with a CLS-type model of language learning, as well as with other active systems consolidation models that invoke sleep as a vehicle for hippocampal replay (e.g., Born & Wilhelm, 2012).
Mirković and Gaskell (2016) further examined the extent to which the CLS model applies to language learning by focusing on the encoding schemes used by the two systems. In particular, the hippocampal system is thought to use a sparse encoding scheme beneficial for pattern separation, whereas the neocortical system is thought to use an overlapping, distributed encoding scheme beneficial for extracting commonalities across experiences (O'Reilly & Rudy, 2001). Mirković and Gaskell (2016) hypothesized that these differences in encoding schemes may be relevant for understanding initial acquisition behavior. Specifically, hippocampal sparse encoding may be particularly beneficial if the new learning is essentially arbitrary in terms of the relationship between two domains (e.g., form-to-meaning mapping, as in vocabulary learning). However, if new learning consists of systematic mappings, it may be more amenable to direct neocortical encoding at initial acquisition, because a distributed neocortical representational scheme can capture systematic regularities. In essence, the lower level of interference between individual mappings would allow useful learning to occur without requiring the hippocampus to intervene for pattern separation.
If the extent of hippocampal recruitment during initial learning depends on factors such as systematicity, then it is plausible that the extent of any sleep-associated benefit in retention will depend on these same factors. Thus, newly learned arbitrary mappings should show strong sleep-associated benefits in retention compared with newly learned systematic mappings, which rely less on the hippocampus during encoding. Mirković and Gaskell (2016) provided an initial test of this view using an artificial language incorporating both arbitrary and systematic mappings. As predicted, the arbitrary components of the language (mappings from form to meaning, as in vocabulary learning) benefited from a retention interval including sleep as opposed to wake, whereas the more systematic aspects of the language (involving generalization of grammatical markers) showed no similar benefit.
In the current study we further explore the extent to which properties of new mappings influence the profile of initial learning and retention within a CLS framework, focusing specifically on inflectional morphology. We developed an artificial language mimicking the main properties of one of the best-studied morphological systems, the English past tense (see McClelland & Patterson, 2002; Pinker & Ullman, 2002, for reviews). This mapping displays key properties of quasi-regular systems: a dominant systematic regular pattern (as in walk-walked or jump-jumped), together with irregulars or exceptions that deviate from that pattern to different degrees (as in sit-sat or go-went) (see Seidenberg & Plaut, 2014, for a review). This type of quasi-regularity is found across many cognitive domains, from spelling-to-sound mappings to semantic cognition (McClelland, 2015). The study of the learning, representation, and use of these systems has benefited from computational modeling that shares the distributed encoding aspects of the CLS (e.g., Armstrong, Dumay, Kim, & Pitt, 2017; Harm & Seidenberg, 1999; McClelland & Rogers, 2003; Mirković, Seidenberg, & Joanisse, 2011; Plaut, McClelland, Seidenberg, & Patterson, 1996; Woollams, Joanisse, & Patterson, 2009). Critically, however, this work has not considered the involvement of the hippocampal system, nor the complementary contributions of the two systems to initial learning and long-term retention (cf. Schapiro, Turk-Browne, Botvinick, & Norman, 2017). Understanding the extent to which the CLS framework applies to learning a quasi-regular inflectional system will therefore have important implications for learning and representation in cognition more broadly.
The coexistence of a systematic, regular pattern with irregular or exceptional items, typical of quasi-regular systems, allowed us to examine the specific roles of the two learning systems at initial acquisition and during off-line consolidation. In particular, the systematicity of regulars may make them more amenable to the distributed neocortical scheme at initial acquisition, whereas irregulars, being exceptional, may benefit from the sparse encoding scheme and pattern separation of the hippocampal system. This difference in relative reliance on the two systems at initial acquisition may have crucial implications for consolidation: the initially stronger reliance on the hippocampal system for irregulars could make them subject to stronger consolidation effects. Thus, when assessed after a delay, one might expect a greater change in behavioral performance for irregular relative to regular forms. We explored these aspects of the CLS as applied to learning a novel morphological system by focusing on generalization behavior.
To our knowledge, this is the first study to explore the role of consolidation processes in learning a new inflectional system. Previous studies on the role of consolidation in morphological learning explored the acquisition of derivational morphology, and in particular new affixes for existing vocabulary items (e.g., Leminen et al., 2016; Tamminen, Davis, & Rastle, 2015; Tamminen, Davis, Merkx, & Rastle, 2012). For example, Tamminen and colleagues trained participants on novel derivational affixes for existing English words by teaching them, for instance, that a sleepnule was a person asked to sleep for a medical experiment, and a climbnule a person climbing dangerous mountains. Participants were tested immediately after training or after a delay (2 days or 1 week) on their ability to generalize the acquired affix-to-meaning mappings (e.g., -nule = a person) by combining the trained affixes with English words not presented during training. The generalization tests included tests of explicit knowledge of the affix-to-meaning mappings, as well as more implicit tests such as naming novel words in the context of sentences that were semantically congruent or incongruent with the meaning of the novel affixes. Tamminen et al. (2015) showed that in explicit tasks participants were able to generalize the meaning of the new affixes at both the immediate and the delayed test. However, in the more implicit naming tasks, which are likely to engage existing linguistic knowledge, a benefit of learning appeared only at the delayed test, when participants were faster to name the novel words in sentences congruent versus incongruent with the meaning of the novel affixes. This facilitation was absent at the immediate test.
The current study differs from previous studies of learning and consolidation of morphological systems in three ways. First, unlike previous studies that relied on existing linguistic knowledge, we used a fully artificial language with a new vocabulary and new affixes. This decision was motivated by the fact that the roles of the complementary learning systems at both initial acquisition and off-line consolidation are crucially influenced by the extent to which new learning is supported by existing knowledge (e.g., Himmer, Müller, Gais, & Schönauer, 2017; McClelland, 2013; Tse et al., 2007). Implementing the new inflectional system within a new vocabulary allowed us to focus specifically on new learning while minimizing the influence of existing linguistic knowledge. Second, the current study focuses at a finer grain on the initial period of consolidation, unlike previous studies that compared performance immediately after learning with delayed performance a day (Leminen et al., 2016) or a week (Tamminen et al., 2015) later. While such comparisons are valuable, they cannot assess the potentially discriminable contributions of time spent asleep versus awake to the representation of the new system (cf. Sweegers & Talamini, 2014; Werchan & Gomez, 2014). The current study therefore compared a range of delay periods within the first 24 h in order to provide a better understanding of the processes involved in the retention and consolidation of morphological learning. The third distinctive aspect of the current study is its focus on the learning and consolidation of a new inflectional system. Unlike derivational morphemes, inflectional morphemes have direct consequences for linguistic processing beyond individual words; for example, in English, the plural morpheme -(e)s on a subject noun crucially influences the form of the verb due to agreement (e.g., the cat is on the roof vs. the cats are on the roof).
This property of inflectional morphology has led to models that incorporate fundamental differences in how derivational and inflectional affixes are processed (e.g., Bozic & Marslen-Wilson, 2010; Laudanna, Badecker, & Caramazza, 1992). However, alternative models describe the two as points on a continuum, and suggest that both can be viewed as regularities in form-to-meaning mappings (e.g., Bybee, 1995a; Hay & Baayen, 2005; Seidenberg & Gonnerman, 2000). Findings on how both types of morphological systems are learned and supported by the CLS may therefore have important implications for the debate on the nature of the differences between derivational and inflectional morphology. The artificial inflectional system used in the current study mimicked several key properties of the English past tense and other inflectional systems. First, the majority of inflected forms conformed to a systematic, regular pattern (as in walk-walked), and a minority had an irregular form. Second, the irregular forms had higher token frequency (e.g., Bybee & Slobin, 1982; Bybee, 1995b). In natural languages, this property is thought to help preserve irregular forms in the language (e.g., go-went; Bybee, 1995a, 1995b). Finally, both regular and irregular forms varied in the extent to which a phonological cue was associated with the pattern (as in keep-kept, sleep-slept vs. sit-sat, hit-hit, fit-fitted; Albright & Hayes, 2003).
We were particularly interested in which properties of this system would drive generalization performance assessed immediately after training and after a delay. First, based on previous work on learning and generalization in these systems, we predicted that, immediately after learning, previously unseen forms that did not overlap strongly in phonology with any uninflected forms in the trained language would be regularized (a global form of generalization; cf. Xeroxed; Prasada & Pinker, 1993). Second, novel forms containing a phonological sequence shared with irregular uninflected forms in the trained language would also share their irregular inflectional pattern, mimicking the 'islands of reliability' (Albright & Hayes, 2003) found in natural languages (a more local form of generalization). Third, previously unseen forms sharing phonological properties with both trained regulars and trained irregulars would lead to competition immediately after learning (cf. sit-sat, fit-fitted). Crucially, to the extent that the new learning can be explained within a CLS framework, with greater reliance on the hippocampal system for irregular forms, we predicted that a consolidation period would strengthen irregular over regular trained items, enhancing the influence of the irregulars on generalization. We tested these predictions in two experiments: participants in Experiment 1 were tested either immediately after training or after a 24 h delay, and participants in Experiment 2 after a 12 h delay including either sleep or wake.
cortex xxx (2018) 1-22

Experiment 1
In both experiments we used the same artificial language, with novel words (e.g., rish) referring to familiar objects (e.g., apple). The new inflectional system was implemented in the number domain, such that plural forms were created by adding a "suffix" to the unchanged singular form or "stem" (rish + aff = rishaff for 'apples'). As suggested above, the new morphological system incorporated several key properties of inflectional (and other quasi-regular) systems in natural languages.
The first property was the dominance of regulars: the plural forms of the novel words were designed such that the majority of items in the training set had a regular plural affix (e.g., -aff), and a minority had an irregular plural affix (e.g., -eem, -esh). Hence the regular plural affix had a higher type frequency than the irregular affixes, as regular forms do in natural languages (Bybee & Slobin, 1982; Bybee, 1995b).
The second property involved the phonological characteristics of the "stems" and how they related to the affixes (e.g., Albright & Hayes, 2003; Plunkett & Marchman, 1991). As with English regular verbs (e.g., talk, hope, join), regular words in this artificial language had diverse phonological properties in the (uninflected) singular form (e.g., rish, groll, heef). This group was termed no cue regulars. A small subset of regulars shared a phonological cue in the singular form with a set of irregulars (e.g., -arb in farb, yarb); these were termed regular inconsistent (see Table 1 for examples). All regular items had the same affix in the plural form (e.g., -aff: rishaff, farbaff, yarbaff).
Within the irregulars, one group of items had a phonological cue in the singular form (e.g., -isp in tisp) that was uniquely associated with an irregular affix in the plural (e.g., tispeem). This condition is analogous to the phonological similarity among English irregular past tense forms, where phonologically similar present tense forms have phonologically similar (irregular) past tenses (e.g., sing-sang, ring-rang). These items were termed irregular consistent, in that a phonological cue was consistently associated with a particular suffix. The second group of irregular items shared the phonological cue in the singular form with the subset of regular items described above (-arb items) and had their own irregular affix (e.g., -esh: harbesh). These items were labeled irregular inconsistent, as the phonological cue in the singular form was inconsistent in its association with regular and irregular affixes (as in English sit-sat and fit-fitted; see Table 1 for an example set).
We included the manipulation of consistency in the mapping between the phonological cue and the affix because consistency is considered a key factor influencing learning and processing in quasi-regular systems (e.g., Harm & Seidenberg, 2004; Plaut et al., 1996; Seidenberg & McClelland, 1989). For example, Seidenberg (1992) reported that English past tense forms (e.g., baked) take longer to generate if the phonological cue in the stem (-ake) is also associated with an irregular past tense form (e.g., take-took). Similarly, in spelling-sound mappings, inconsistent words such as wave (cf. have) take longer to read out loud and are more prone to errors than consistent words such as wade (e.g., Glushko, 1979; Jared, 1997, 2002; Jared, McRae, & Seidenberg, 1990; Taraban & McClelland, 1987).
The third property of the new inflectional system that mimicked morphological systems in natural languages was higher token frequency of irregular relative to regular forms (Bybee & Slobin, 1982;Bybee, 1995b). This property also allowed us to examine generalization of regular and irregular affixes in the artificial language while keeping the total number of exposures to the two types of affixes the same.
We used performance on a generalization task as a key measure of interest. In this task, participants were presented with previously unseen "stems" (uninflected singular forms) within a linguistic context that required the production of the inflected form. The task thus engaged online language production processes, and was likely to rely on the same type of knowledge for which previous studies of morphological learning found consolidation effects (e.g., Tamminen et al., 2012; Tamminen et al., 2015).
We assessed the use of regular and irregular affixes in this task in three different conditions: In the no cue condition, the previously unseen stems did not share phonological properties with the training set (e.g., jeech). In the irregular consistent condition, the stems shared the trained phonological cue that had been associated with the unique irregular affix (e.g., zisp). Finally, in the ambiguous condition the stems shared the trained phonological cue that had been associated with both the regular and the irregular affix (e.g., narb).
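One way to make the ambiguity of the shared cue concrete is to compute, for that cue, the proportion of trained items (types) and of plural presentations (tokens) that carried each affix. The Python sketch below is our illustration, not an analysis reported in the study. It assumes the counts given under Materials (three regular and three irregular items share the cue, with each regular plural presented 6 times and each irregular plural 24 times), and the affix labels aff and esh are the illustrative ones from the examples; affix_shares is a hypothetical helper.

```python
def affix_shares(counts):
    """Proportion of occurrences of a cue that carry each affix."""
    total = sum(counts.values())
    return {affix: n / total for affix, n in counts.items()}


# Trained items ending in the shared (ambiguous) cue, e.g. -arb:
# 3 regular inconsistent (-aff) and 3 irregular inconsistent (-esh).
ITEMS_PER_AFFIX = {"aff": 3, "esh": 3}
# Plural presentations per item during training (assumed from Materials).
TOKENS_PER_ITEM = {"aff": 6, "esh": 24}

type_shares = affix_shares(ITEMS_PER_AFFIX)
token_shares = affix_shares(
    {a: ITEMS_PER_AFFIX[a] * TOKENS_PER_ITEM[a] for a in ITEMS_PER_AFFIX}
)

print(type_shares)   # {'aff': 0.5, 'esh': 0.5}
print(token_shares)  # {'aff': 0.2, 'esh': 0.8}
```

Under these assumptions the ambiguous cue is balanced by type but favors the irregular affix by token; the sketch says nothing about which of the two measures participants are sensitive to.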
In Experiment 1, participants' performance on the generalization test was assessed immediately after learning or after a 24-h delay. We were particularly interested in the use of regular and irregular affixes across the three conditions, and the extent to which a consolidation opportunity, with no additional training, would influence it. Overall, to the extent that the learning of the novel inflectional system is underpinned by complementary learning systems, immediately after learning we might expect a dominance of regular affixes for the previously unseen no cue items, and a dominance of irregular consistent affixes for the previously unseen items with the irregular consistent phonological cue. Importantly, for the items with the ambiguous cue we may see competition between regularizations and irregularizations, with the consolidation strengthening irregularizations.
Table 1: Example items of the artificial language. As in natural languages, regular items had high type and low token frequency, while irregular items had low type and high token frequency.

Participants
Fifty-two students at the University of York participated in Experiment 1 for monetary remuneration or course credit after providing written informed consent. The protocol for both Experiments 1 and 2 was approved by the Ethics Committee at the Department of Psychology, University of York. The participants in both experiments were native English speakers and had no reported hearing or learning difficulties. Four participants were excluded from all analyses in Experiment 1 due to a lack of accurate productions in one or more conditions of the generalization test (see below for details). Participants in Experiment 1 were randomly assigned to two groups, who were trained at the same time but tested either immediately after learning or after a 24-h delay. After exclusions, each group had a total of 24 participants.

Materials
2.1.2.1. TRAINING SET.
There was a total of 18 novel words in the training set. All 18 words were presented in their singular or "stem" form (e.g., rish) (Sessions 1 and 2) and in their plural form (e.g., rishaff) (Session 2). All 18 items were pronounceable monosyllabic English pseudowords (Rastle, Harrington, & Coltheart, 2002). They were digitally recorded by a female native British English speaker in a sound-attenuated booth, sampled at 44.1 kHz. The training set was designed such that the majority of items (12 out of 18) had a regular affix in the plural form (e.g., -aff), and a minority (6 out of 18) had one of the two irregular affixes (e.g., -eem, -esh). Nine of the 12 regular items were labeled no cue regulars, as they did not have any cues to the plural affix (e.g., groll, rish, heef). Consistency was manipulated such that three regular and three irregular items shared the rime (e.g., -arb: farb, harb), rendering the -arb cue inconsistent with regard to the mapping between the phonological cue and the plural affix. These items were labeled regular inconsistent (e.g., farbaff) and irregular inconsistent (e.g., harbesh), respectively. The remaining three irregular items had a unique phonological cue in the stem (e.g., -isp), which cued a unique irregular affix (e.g., -eem: tispeem). These items were labeled irregular consistent (see Table 1 above for example items across all conditions).
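The design bookkeeping above can be summarized as a small data structure. The Python sketch below is a hypothetical encoding of one counterbalancing list, using the example cue and affix labels from the text rather than the full item sets (given in Appendix 3); the names Condition, TRAINING_DESIGN, plural and affixes_by_cue are ours, not part of the study materials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    name: str
    n_items: int
    cue: Optional[str]  # rime shared in the singular form; None for no-cue regulars
    affix: str          # plural affix taken by every item in the condition


# One illustrative assignment of cues and affixes (example labels from the text).
TRAINING_DESIGN = [
    Condition("regular no cue", 9, None, "aff"),
    Condition("regular inconsistent", 3, "arb", "aff"),
    Condition("irregular consistent", 3, "isp", "eem"),
    Condition("irregular inconsistent", 3, "arb", "esh"),
]


def plural(stem: str, affix: str) -> str:
    """Form a plural by appending the affix to the unchanged singular stem."""
    return stem + affix


def affixes_by_cue(design):
    """Collect the set of plural affixes that co-occur with each phonological cue."""
    mapping = {}
    for cond in design:
        if cond.cue is not None:
            mapping.setdefault(cond.cue, set()).add(cond.affix)
    return mapping


print(sum(c.n_items for c in TRAINING_DESIGN))  # 18 training items
print(plural("rish", "aff"))                    # rishaff
print(affixes_by_cue(TRAINING_DESIGN))          # -arb: two affixes; -isp: one
```

The inconsistency of the -arb cue shows up directly as a cue mapped to more than one affix.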
We developed six item lists to counterbalance the assignment of the affixes and the phonological cues across different consistency and regularity conditions, such that each affix and each phonological cue was paired with different conditions across the lists (see Appendix 3 for the full item set).
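The text does not spell out how the six lists were constructed. One scheme consistent with having six lists is to use every permutation of the three affixes over the three affix roles (3! = 6), so that each affix serves in each role in exactly two lists. The Python sketch below assumes that scheme, uses placeholder affix strings, and omits the parallel rotation of phonological cues; the actual lists are given in Appendix 3, and make_affix_lists is a hypothetical helper.

```python
from collections import Counter
from itertools import permutations

# The three roles an affix can play within a list (labels are ours).
ROLES = ("regular", "irregular consistent", "irregular inconsistent")


def make_affix_lists(affixes):
    """One list per permutation of the affixes over ROLES (3! = 6 lists)."""
    return [dict(zip(ROLES, perm)) for perm in permutations(affixes)]


lists = make_affix_lists(("aff", "eem", "esh"))
print(len(lists))  # 6
for role in ROLES:
    # Under this scheme each affix fills each role in two of the six lists.
    print(role, Counter(lst[role] for lst in lists))
```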
The novel words were paired with pictures of familiar objects (Rossion & Pourtois, 2004): six animals, six fruits, and six artifacts, evenly distributed across conditions. Each novel word in the singular form was paired with a picture of a single referent, and each plural form with a picture of three instances of the same referent (Fig. 1). There was no phonological overlap between the novel words and the existing English words for the referents.
During training, each irregular plural item was presented more often than each regular item, to capture the higher token frequency of irregular forms found in natural languages. Specifically, each irregular plural item was presented a total of 24 times, whereas each regular plural item was presented a total of 6 times. This manipulation kept the total frequency of exposure to the regular and irregular affixes constant across conditions (Table 2). All items were presented the same number of times in the singular form (16 in Session 1 and 14 in Session 2).
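The frequency manipulation can be checked with a few lines of arithmetic. The Python sketch below multiplies the number of training items taking each affix (12 for the regular affix, 3 for each irregular affix, using the example labels -aff, -eem and -esh) by the number of plural presentations per item; the dictionary names are illustrative.

```python
# Number of training items taking each plural affix (type frequency).
TYPE_FREQ = {"aff": 12, "eem": 3, "esh": 3}
# Plural presentations per item (token frequency): 6 per regular, 24 per irregular.
TOKENS_PER_ITEM = {"aff": 6, "eem": 24, "esh": 24}


def affix_exposures(type_freq, tokens_per_item):
    """Total plural presentations carrying each affix over training."""
    return {a: type_freq[a] * tokens_per_item[a] for a in type_freq}


exposures = affix_exposures(TYPE_FREQ, TOKENS_PER_ITEM)
print(exposures)  # {'aff': 72, 'eem': 72, 'esh': 72}

# Singular presentations per item across the two sessions.
SINGULAR_REPS = {"session 1": 16, "session 2": 14}
print(sum(SINGULAR_REPS.values()))  # 30
```

Each affix thus receives the same number of plural tokens, while the type frequency of the regular affix is four times that of each irregular affix.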

GENERALIZATION SET.
A total of 24 pseudowords were used as novel uninflected singular forms in the generalization test. They were all pronounceable monosyllabic English pseudowords (Rastle et al., 2002). Half of the generalization items did not share onsets or rimes with the training set (Appendix 3); this subset of items therefore did not have any phonological cues to the plural affixes. The other half were novel words that shared the phonological properties of the different conditions in the training set. Specifically, 6 of these items had a phonological cue that had been associated with the irregular consistent plural affix (e.g., -isp in zisp), and 6 had a phonological cue that had been associated with both a regular and an irregular plural affix in the training set (e.g., -arb in narb). From this pool of items, we created two subsets to match the different training lists, each containing a total of 18 items, with 6 items in each of the following conditions based on the cue they provided to the plural suffix: no cue (e.g., jeech), irregular consistent (e.g., zisp), and ambiguous between the regular and irregular inconsistent affix (e.g., narb). All generalization items were digitally recorded (sampled at 44.1 kHz) in a sound-attenuated booth by the same speaker as the training items.
Fig. 1: Example picture from the training set: (a) a picture that was paired with a singular form of a novel word (e.g., rish); (b) a picture that was paired with a plural form of a novel word (e.g., rishaff).
2.1.3. Procedure
2.1.3.1. TESTING SCHEDULE.
All participants were trained on the singular forms in Session 1, which took place in the morning and lasted approximately 45 min. For Session 2, participants came back to the lab at 6 pm a week after Session 1, when they were trained on the plural forms and continued with exposure to the singular forms. Participants were also trained on a non-verbal declarative memory task and a procedural memory task (as these tasks were not relevant to the questions of the current study, they are presented in Appendix 4). The training part of the session lasted approximately 1 h 45 min. All participants took a 20 min break after the training phase of Session 2. Upon their return to the lab, half of the participants stayed on for the immediate test (0 h delay group), and half were asked to return at 8 pm on the following day (24 h delay group). Testing comprised the generalization test and the declarative and procedural memory tests, and lasted approximately 20 min.
2.1.3.2. SESSION 1.
In Session 1, participants were trained on the singular forms using a word repetition task. Immediately after training, their memory for the novel words was tested using cued recall (picture naming) and 2AFC recognition. Word repetition and cued recall were implemented in DMDX (Forster & Forster, 2003), and 2AFC recognition in E-Prime (Psychology Software Tools, Inc.). Auditory stimuli were delivered via headphones.
2.1.3.2.1. WORD REPETITION.
Each trial started with a fixation cross (500 msec), followed by the auditory presentation of the novel word; 300 msec after word onset, the corresponding picture and written word form were displayed for 4000 msec. Participants were told they were learning words of a new language, and were required to repeat each word out loud. They were told they would be tested later on how well they remembered the meanings of the new words. There were two practice items at the beginning of the task. Each word was presented a total of 16 times over 4 blocks. The order of items was randomized for each participant.
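The trial order for the word repetition task can be sketched as follows. The text states only that each word was presented 16 times over 4 blocks in an order randomized for each participant; the even split of 4 presentations per block and the shuffling within blocks are assumptions, and repetition_schedule is a hypothetical helper rather than part of the DMDX scripts used in the study.

```python
import random
from collections import Counter


def repetition_schedule(words, n_blocks=4, reps_per_block=4, seed=None):
    """Build a block-wise trial order for the word repetition task.

    Assumes the presentations per word are split evenly over the blocks
    and that item order is shuffled independently within each block.
    """
    rng = random.Random(seed)  # per-participant seed for a reproducible order
    schedule = []
    for _ in range(n_blocks):
        block = [w for w in words for _ in range(reps_per_block)]
        rng.shuffle(block)
        schedule.append(block)
    return schedule


blocks = repetition_schedule(["rish", "groll", "tisp"], seed=1)
counts = Counter(w for block in blocks for w in block)
print(counts)  # each word appears 16 times in total
```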
2.1.3.2.2. CUED RECALL.
Each trial started with a fixation cross (500 msec), followed by the presentation of a target picture. The task was to name the picture out loud using the novel words. There were two practice items at the beginning of the task.
2.1.3.2.3. 2AFC RECOGNITION.
Each trial started with a fixation cross (500 msec), followed by the auditory presentation of the novel word and two pictures, one on each side of the screen. The pictures stayed on the screen for 4000 msec or until the participant responded. Participants were required to press the key (1 or 9) corresponding to the picture that matched the novel word (1 for the picture on the left, 9 for the picture on the right). The two alternative pictures were always from the same semantic category. Each novel word was presented once. The position of the target picture on the screen was counterbalanced across trials.

SESSION 2.
The order of training tasks in Session 2 was as follows, with a short break after task 4:
1) non-verbal declarative memory training
2) procedural memory training
3) word repetition with singular forms only
4) cued recall for singular forms
5) procedural memory training
6) word repetition with singular and plural forms
7) cued recall for singular and plural forms
8) 2AFC recognition for singular and plural forms.
Here we describe the language tasks, and we provide the description of the declarative and procedural memory tasks in Appendix 4. As in Session 1, all auditory stimuli were presented via headphones.

WORD REPETITION AND CUED RECALL FOR SINGULARS ONLY.
Participants performed one block of the repetition task with the singular forms (the same task as in Session 1), with a total of 8 repetitions per item. This was followed by cued recall for the singular forms (the same task as in Session 1).
2.1.3.3.2. WORD REPETITION, SINGULARS AND PLURALS.
Participants performed the same word repetition task as described for Session 1, but in this case it included both singular and plural forms. Singular forms of the novel words were paired with pictures of individual objects, whereas plural forms were paired with pictures of three items of the same kind (Fig. 1). The total number of repetitions for each plural form varied from 6 to 24 depending on the condition (see Table 2), and each singular form was presented 6 times in [...]

GENERALIZATION TEST.
Each trial started with a fixation cross (500 msec), followed by the presentation of the phrase 'one [novel word in the singular form]' (e.g., 'one jeech') in the center of the screen, with simultaneous auditory presentation of the singular form of the novel word over the headphones. The visual stimulus stayed on the screen for 1 sec. This was followed by a blank screen for 500 msec, and then by the visual presentation of the phrase 'three ?', which stayed on the screen for 4000 msec. The participant's task was to "say out loud whichever you thought was the appropriate form of the new word (...) to follow the word three." There was one practice trial, and the experimenter checked with the participant to make sure they understood the instructions. Each item from the generalization set was presented once. All responses that included an accurate production of the novel uninflected form (e.g., jeech) followed by one of the three affixes from the trained language were included in the analyses. Three participants in the 24 h group and one in the 0 h group were excluded because they lacked productions meeting these criteria in one or more cells of the design.
The responses were coded as regularizations if they included a regular affix (e.g., -aff), as irregular consistent if they included the irregular affix that had been associated with a unique phonological cue (e.g., -eem), and as irregular inconsistent if they included the irregular affix that followed the phonological cue that had been associated with both regular and irregular forms (e.g., -esh).
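The coding rule above amounts to a small lookup. The sketch below is illustrative only: it assumes each spoken response has already been transcribed into a stem and an affix, and the affix-to-category table is hypothetical, listing just the example affixes mentioned in the text (-aff, -eem, -esh) rather than the full trained set.

```python
from typing import Optional

# Hypothetical affix table: only the example affixes from the text are listed.
AFFIX_CATEGORY = {
    "aff": "regular",                 # regular affix
    "eem": "irregular_consistent",    # irregular affix tied to a unique cue
    "esh": "irregular_inconsistent",  # irregular affix tied to the shared cue
}


def code_response(target_stem: str, produced_stem: str, produced_affix: str) -> Optional[str]:
    """Classify one generalization response, or return None if it is excluded.

    Mirrors the inclusion criterion: the uninflected stem must be reproduced
    accurately and the affix must be one of the trained affixes.
    """
    if produced_stem != target_stem:
        return None
    return AFFIX_CATEGORY.get(produced_affix)


if __name__ == "__main__":
    print(code_response("jeech", "jeech", "aff"))  # regular
    print(code_response("jeech", "jeek", "aff"))   # None: stem not accurate
```

Keeping the exclusion check inside the same function makes it explicit that inaccurate stems and untrained affixes never reach any of the three response categories.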
2.1.3.5. DATA ANALYSES. Performance on cued recall and 2AFC recognition for plural forms was used as a measure of initial acquisition. Cued recall was assessed using accuracy as a binary dependent measure. The recall data were analyzed using mixed-effects logistic regression with the glmer function in the lme4 package in R (Bates, Mächler, Bolker, & Walker, 2015). Recognition accuracy on the 2AFC task was high in both groups (M0h = .99, M24h = .97), so the analyses below focus on reaction times (RTs) for correct responses. The RT data were analyzed using mixed-effects linear regression with the lmer function in the lme4 package (Bates et al., 2015), with logarithmically transformed RTs to address non-normality. In both tasks, group (0 h, 24 h), regularity (regular, irregular), and consistency (no cue/consistent, inconsistent) were included as effect-coded fixed factors, together with the maximal random-effects structure for participants and items that allowed the models to converge (Barr, Levy, Scheepers, & Tily, 2013). For all models, the initial model included all random intercepts and slopes; this structure was then reduced stepwise, starting from the highest-order slopes, until a converging model was identified. The final random-effects structure for all models presented in the Results section is provided in Appendix 1.
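To make the fixed-effects coding concrete, the sketch below builds one design row per trial for the 2 × 2 × 2 design. It assumes a ±0.5 effect-coding convention and natural-log RTs, neither of which is specified above (±1 coding is an equally common alternative and only rescales the coefficients); the column names are hypothetical. In a balanced design every effect-coded column, including the interaction products, sums to zero, which is why, unlike treatment (dummy) coding, each main-effect coefficient is estimated averaging over the levels of the other factors.

```python
import math
from itertools import product

# Assumed effect codes: -0.5 / +0.5 for the two levels of each factor.
CODES = {
    "group": {"0h": -0.5, "24h": 0.5},
    "regularity": {"regular": -0.5, "irregular": 0.5},
    "consistency": {"consistent": -0.5, "inconsistent": 0.5},
}


def design_row(group: str, regularity: str, consistency: str, rt_ms: float) -> dict:
    """Effect-coded predictors (with all interactions) and log RT for one trial."""
    g = CODES["group"][group]
    r = CODES["regularity"][regularity]
    c = CODES["consistency"][consistency]
    return {
        "group": g,
        "regularity": r,
        "consistency": c,
        "group:regularity": g * r,
        "group:consistency": g * c,
        "regularity:consistency": r * c,
        "group:regularity:consistency": g * r * c,
        "log_rt": math.log(rt_ms),  # log transform to reduce right skew
    }


if __name__ == "__main__":
    rows = [design_row(g, r, c, 900.0)
            for g, r, c in product(CODES["group"], CODES["regularity"], CODES["consistency"])]
    # In a balanced design every predictor column sums to zero.
    for col in rows[0]:
        if col != "log_rt":
            print(col, sum(row[col] for row in rows))
```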
Performance on the generalization task was assessed using loglinear analysis, which tested the distribution of the three response types (regularizations, irregular consistent responses, and irregular inconsistent responses) across the two groups (0 h, 24 h) and the three phonological cues (no cue, irregular consistent, ambiguous). The analyses were run using the loglm function (MASS package) in R.
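For readers less familiar with loglinear analysis, the test of the three-way interaction can be sketched in plain Python. The model without the group × cue × response term keeps all three two-way margins; its expected counts have no closed form, so they are obtained by iterative proportional fitting (IPF), the general algorithm behind R's loglin, on which loglm builds. The likelihood-ratio statistic G² against the saturated model (whose fitted counts equal the observed ones) is referred to a χ² distribution with (I − 1)(J − 1)(K − 1) degrees of freedom, which gives 4 df for 2 groups × 3 cues × 3 response types. This is an illustrative re-implementation, not the code used in the study, and the function names are hypothetical.

```python
import math
from itertools import product


def ipf_no_three_way(obs, max_iter=1000, tol=1e-10):
    """Expected counts for the model with all two-way interactions but no
    three-way interaction, fitted by iterative proportional fitting."""
    I, J, K = len(obs), len(obs[0]), len(obs[0][0])
    cells = list(product(range(I), range(J), range(K)))
    fit = {cell: 1.0 for cell in cells}
    for _ in range(max_iter):
        max_change = 0.0
        # Rescale so the fitted table matches each observed two-way margin in turn.
        for keep in ((0, 1), (0, 2), (1, 2)):
            obs_m, fit_m = {}, {}
            for cell in cells:
                key = (cell[keep[0]], cell[keep[1]])
                obs_m[key] = obs_m.get(key, 0.0) + obs[cell[0]][cell[1]][cell[2]]
                fit_m[key] = fit_m.get(key, 0.0) + fit[cell]
            for cell in cells:
                key = (cell[keep[0]], cell[keep[1]])
                ratio = obs_m[key] / fit_m[key] if fit_m[key] > 0 else 0.0
                new = fit[cell] * ratio
                max_change = max(max_change, abs(new - fit[cell]))
                fit[cell] = new
        if max_change < tol:
            break
    return fit


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    tail = sum(h ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def three_way_lr_test(obs):
    """G^2, df and p comparing the no-three-way model with the saturated model."""
    fit = ipf_no_three_way(obs)
    g2 = 0.0
    for (i, j, k), expected in fit.items():
        o = obs[i][j][k]
        if o > 0:  # cells with zero counts contribute nothing to G^2
            g2 += 2.0 * o * math.log(o / expected)
    I, J, K = len(obs), len(obs[0]), len(obs[0][0])
    df = (I - 1) * (J - 1) * (K - 1)
    return g2, df, chi2_sf(g2, df)
```

Usage: pass a nested list of response counts indexed as obs[group][cue][response]. As a sanity check on the χ² tail function, chi2_sf(1.86, 4) is about .76 and chi2_sf(0.98, 2) about .61, in line with p values reported for Experiment 2.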
In all analyses for both experiments the acceptable level of Type I error was set at .05.

Initial acquisition
Participants' initial acquisition of the training set was assessed using cued recall and the 2AFC recognition task. Our aim was to determine whether the groups were well matched at the initial learning of the plural system, and to examine whether, across groups, the structure of the system would impact on the initial levels of learning as it does in natural languages (Albright & Hayes, 2003; Marchman, 1997; Plunkett & Marchman, 1993; Prasada & Pinker, 1993).
The overall poorer performance with inconsistent plurals was confirmed in the recognition task (Fig. 3): participants in both groups were slower to recognize inconsistent relative to consistent plural forms (β = .08, SE = .03, t = 3.08, p = .002). There were no other significant effects (Appendix 1, Table 2).
In summary, the two groups of participants were well matched in terms of initial acquisition of the trained plural forms. There were no overall differences between the regular and the irregular items, suggesting that in both groups regular and irregular affixes were learned equally well. Both recall accuracy and recognition times were influenced by the consistency between the phonological cue and the plural affix, similar to natural languages (e.g., Albright & Hayes, 2003; Marchman, 1997; Seidenberg, 1992). In both tasks and both groups, there was evidence of poorer initial learning for the plural forms where the phonological cue to the affix was associated with two possible affixes (inconsistent forms). In the recall task there was also evidence that this was particularly the case for the regular/low token frequency forms.
cortex xxx (2018) 1–22

Generalization
The key question we were interested in was what properties of the novel morphological system would drive performance on new, previously unseen forms when participants were required to generalize their existing knowledge of the trained forms. In addition, we wanted to examine whether a consolidation period might influence performance and specifically the relative contribution of the systematic, regular forms versus the irregular forms.
The analyses below included three possible types of responses to previously unseen novel "stems" (uninflected singular forms): regularizations (production of the regular affix) and two types of irregularizations, namely production of the irregular affix that had consistently followed a unique phonological cue (an irregular consistent response), or production of the irregular affix that had followed the phonological cue associated with both regular and irregular forms (an irregular inconsistent response). We were interested in the frequency distribution of these response types across the three phonological cues in the group tested immediately versus the group tested after a 24 h delay.
The loglinear analysis showed that the distribution of responses was influenced by a three-way interaction: the difference between the saturated model and the model that excluded the group × stem cue × response type interaction was significant [χ²(4) = 30.21, p < .001]. As illustrated in Fig. 4, both groups showed a similar pattern of responses to two of the three cue types. When there was no phonological cue in the "stem", participants tended to provide a regularization response. Conversely, when the novel "stem" contained a cue that matched the trained consistent irregulars, the corresponding irregular affix tended to be chosen. Crucially, when the phonological cue was ambiguous (matching both trained regular and irregular forms), the type of response depended on when participants were tested: participants tested immediately after training produced similar numbers of regular and irregular inconsistent affixes, whereas participants tested after a 24-h delay predominantly produced the irregular inconsistent affix.
In summary, generalization responses indicated that after learning a novel quasi-regular inflectional system, participants showed both regularizations and irregularizations, as in natural language learning. Crucially, after a 24-h delay with no additional training, irregularizations dominated over regularizations for novel items with an ambiguous phonological cue.

Discussion
In this study, we trained participants on a novel morphological system that incorporated a large set of regular forms and a smaller set of exceptions, thus mimicking other quasi-regular systems found in natural languages. At initial acquisition, and as in natural language, participants' performance was influenced by the consistency of the phonological cues to the affix, with poorer initial learning of the forms with inconsistent phonological cues. Importantly, there was no overall effect of regularity, suggesting that regular and irregular affixes were learned equally well. Our main questions related to how participants would generalize their newly acquired knowledge of this artificial morphological system. Would they show evidence of a regular "default" inflection that could be applied to a range of forms that had no particular similarity to any training item (e.g., Prasada & Pinker, 1993)? Moreover, would they show evidence of irregularization in cases where novel uninflected forms were phonologically similar to trained exceptions? Most crucially, would the balance between these competing tendencies change over the course of 24 h after initial learning due to consolidation processes?
We found that participants produced regular affixes for previously unseen items where there was no phonological overlap with any items in the training set. This is analogous to the performance in natural languages (e.g., Xeroxed) and suggests that participants had acquired a default as a consequence of the structure of the training materials. In contrast, when the new items contained a phonological cue uniquely associated with an irregular affix, that affix dominated the responses, leading to irregular inflections. Again, this is similar to generalization performance in natural languages, which shows sensitivity to subtle phonological regularities (Albright & Hayes, 2003; Marchman, 1997).
Both the default generalization of the regular plural and the more specific irregularization for items highly similar to consistent irregulars were observed very strongly 20 min after learning (in more than 80% of the associated trials), and these behaviors remained dominant after a delay of 24 h. This lack of change over time might indicate that consolidation processes are of little relevance to the retention of inflectional learning. However, it is also possible that the strong dominance of one type of generalization response for the no cue and irregular consistent items made these conditions less sensitive to consolidation: a kind of ceiling effect. Consistent with the latter interpretation, for new items with an ambiguous phonological cue that could lead to either regularization or irregularization, generalization performance was well balanced between these two options soon after learning (54% irregular responses vs 46% regular). This equilibrium shifted substantially after 24 h, with irregularizations now much more likely and the proportion of regular plural responses dropping to just 18%.
The change in generalization performance over the delay, without any additional training, suggests that, as in vocabulary learning (e.g., Bakker, Takashima, van Hell, Janzen, & McQueen, 2015; Bakker et al., 2014; Dumay & Gaskell, 2007; Gaskell & Dumay, 2003), consolidation processes play a role in learning a quasi-regular inflectional system. The specific pattern of change suggests that the influence of exception/irregular items was enhanced over the delay, such that for the phonologically ambiguous cue, irregularizations dominated generalization responses at the delayed but not the immediate test. Based on the CLS model and previous work on the role of the hippocampus in pattern separation, this may have been due to an initially stronger reliance on the hippocampal system for irregular items, combined with sleep-related consolidation benefits for hippocampally encoded memories (e.g., Tamminen et al., 2010). These processes may have been further facilitated by stronger forgetting of the systematic, regular aspects of the new morphological system during wake. That is, the more systematic mappings, initially more strongly reliant on the neocortical system, may have been subject to stronger interference-based forgetting during the wake period (e.g., Sadeh, Ozubko, Winocur, & Moscovitch, 2014). To assess the contribution of these consolidation processes to the learning and retention of a quasi-regular inflectional system, in Experiment 2 we trained participants on the same new language as in Experiment 1 and tested them after a 12 h delay that included either sleep or wake.

Experiment 2
In Experiment 2 we further explored the consolidation processes that contributed to the change in generalization performance across the 24 h delay, focusing specifically on the contribution of sleep- and wake-related consolidation processes. A substantial body of evidence has established that sleep-related memory consolidation helps strengthen new linguistic knowledge (e.g., Dumay & Gaskell, 2007; Gais et al., 2006; Schreiner & Rasch, 2015; Tamminen et al., 2010) through the process of hippocampal replay (Batterink et al., 2017; Schreiner & Rasch, 2015). In the current study, to the extent that irregular forms are more strongly encoded by the hippocampal system, we may expect greater sleep-related strengthening of irregular relative to regular forms. During wake, although consolidation-related strengthening of new memories may occur (e.g., Dewar, Alber, Butler, Cowan, & Della Sala, 2012; Dewar, Alber, Cowan, & Della Sala, 2014), forgetting generally has a stronger influence (see Diekelmann & Born, 2010, for a review). Few studies have explored the specific role of forgetting in language learning. One exception is the study by Werchan and Gomez (2014), who investigated the role of forgetting in word learning in 2.5-year-old toddlers. They showed that wake-related forgetting of specific item-level information was beneficial for generalizations based on systematic aspects of new form-meaning mappings. This finding suggests that, in addition to sleep-related strengthening, wake-related forgetting may also play a role in learning and consolidating a new inflectional system. To assess the influence of sleep and wake on learning and retention in a quasi-regular domain, in Experiment 2 participants were trained in the morning or in the evening and tested after a 12 h delay that included wake or sleep.
The study protocol followed Experiment 1 closely, and our primary goal was to determine the influence of these two time periods on the generalization of the new learning to previously unseen materials. To provide an additional source of evidence of the memory for the inflectional system, we also tested performance on the trained items themselves, both immediately after learning and after a 12 h delay including sleep or wake. We used an item fate analysis (Dumay, 2016; Schreiner & Rasch, 2016) to explore the extent to which systematic regular and exceptional irregular items are strengthened or forgotten over the delay period.

Participants
Sixty-eight students at the University of York participated in Experiment 2 for monetary remuneration or course credit after providing written informed consent. After exclusion of 8 participants due to a lack of any accurate productions in one or more conditions of the generalization test, 31 participants in the wake group and 29 in the sleep group were included in the analyses.

Materials
The same training and generalization materials were used as in Experiment 1. [...] Johns, 1991)]. We also administered a simple reaction time task as a test of alertness (Reid, 2013).

TESTING. The test phase for both groups of participants included the same three tasks as in Experiment 1, performed in the same order. These were followed by two tasks assessing performance on the trained items (as in Session 2): cued recall for singular and plural forms, and 2AFC recognition for singular and plural forms. The test session started with the Stanford Sleepiness Scale and the alertness task (the data analyses for these tasks are presented in Appendix 5).
[...] For this analysis, we focused on the ambiguous cue condition. Each generalization response for the items in this condition was coded as 1 if it included the irregular inconsistent affix, and as 0 if it was a regularization. Thus this analysis focused on the extent of irregularization at different delays. A small proportion of responses in this condition (2.8%) included an irregular consistent affix (which for the ambiguous condition represented an inappropriate inflection), and these were excluded from the analyses. The final models that converged included random intercepts by participants and by items for all contrasts, and random slopes for the first and the second set of contrasts.
For the analyses of the change in performance on the trained items over the two 12 h delays in Experiment 2, we used several measures. First, as for initial acquisition, we analyzed accuracy at cued recall and RTs for correct responses in the 2AFC task, with session (immediate, delayed), group (wake, sleep), regularity (regular, irregular), and consistency (consistent, inconsistent) as effect-coded fixed factors. All fixed and random effects included in the models are provided in Appendix 2.
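The binary coding of the ambiguous-cue responses can be sketched as follows. The category labels are hypothetical and assume responses have already been classified into the three response types; irregular consistent responses are dropped rather than scored, and the function also reports the share of responses excluded (2.8% in the data described in the text).

```python
def code_ambiguous_cue(responses):
    """Score ambiguous-cue responses as 1 (irregular inconsistent) or 0 (regularization).

    Irregular consistent responses are inappropriate for this cue and are excluded.
    Returns the list of scores and the proportion of responses excluded.
    """
    scores = []
    for r in responses:
        if r == "irregular_inconsistent":
            scores.append(1)
        elif r == "regular":
            scores.append(0)
        # any other category (e.g. "irregular_consistent") is excluded
    excluded = 1 - len(scores) / len(responses) if responses else 0.0
    return scores, excluded


if __name__ == "__main__":
    scores, excluded = code_ambiguous_cue(
        ["regular", "irregular_inconsistent", "irregular_inconsistent", "irregular_consistent"])
    print(scores, excluded)  # [0, 1, 1] 0.25
```

The mean of the returned scores is the proportion of irregularizations, the quantity modeled across delays.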
In addition, to identify properties of the new language that influenced different consolidation processes, we analyzed performance at cued recall using two measures of item 'fate', explained further in the Results section.

3.2. Results and discussion

Initial acquisition
As in Experiment 1, the level of initial learning of the plural forms was assessed using cued recall and 2AFC recognition. There was a similar level of initial learning in cued recall as in Experiment 1 (Fig. 5) [...] regularity × consistency interaction (β = 1.24, SE = .36, z = 3.41, p = .001). As illustrated in Fig. 5, the increased difficulty with inconsistent forms was problematic for the regulars, but less so for the irregulars. There were no other significant effects (Appendix 2, Table 1). This pattern of findings replicates the cued recall performance for initial acquisition of the new morphological system found in Experiment 1. The overall poorer performance with inconsistent items was confirmed in the recognition task (Fig. 6): participants in both groups were slower to recognize inconsistent relative to consistent plurals (β = .11, SE = .03, t = 4.41, p < .001). There were no other significant effects (Appendix 2, Table 2).
In sum, the analyses of initial acquisition replicate the findings of Experiment 1, and overall suggest a good level of learning of the regular and the irregular suffixes, with no differences between groups. The learning of the plural forms was influenced by the consistency of the phonological cue in the stem, with poorer learning of inconsistent plurals, and in particular poorer recall of regular inconsistent forms in both groups.

Generalization
A key question in the current experiment was how generalization performance on the delayed test would be influenced by whether the delay included wake or sleep. In particular, we were interested in how the dominance of regular versus irregular based generalizations would change over the 12 h delay period.
The loglinear analysis showed that, in contrast to Experiment 1, the two groups had a similar pattern of generalization (Fig. 7): the three-way interaction between group, stem cue, and type of response did not significantly contribute to the model [χ²(4) = 1.86, p = .76]. As in Experiment 1, the generalization responses varied with cue type: further model comparisons showed a significant contribution of the phonological cue × response type interaction [χ²(4) = 982.61, p < .0001], and no contribution from either interaction involving group [group × phonological cue: χ²(2) = .98, p = .61; group × response type: χ²(2) = 1.61, p = .45]. As illustrated in Fig. 7, when there was no phonological cue in the novel "stem", participants in both groups had a strong tendency to produce regularization responses. Conversely, when the phonological cue overlapped with the irregular consistent forms in the training set, the dominant response in both groups was the irregular consistent affix. Notably, and unlike the immediate testing group in Experiment 1, in both delayed groups in the current experiment irregular inconsistent affixes constituted approximately two thirds of responses (65% in both groups) for the ambiguous cue. This difference suggests an increase in this type of generalization response across the 12-h delay relative to immediate testing, which did not seem to vary depending on whether the delay included wake or sleep.
These analyses reveal a similar pattern of responses for two out of three types of cues as in Experiment 1, with regularizations dominating the no cue items, and irregular consistent responses dominant for the items with a phonological cue uniquely associated with an irregular affix. This suggests that the factors that underlie performance in these two conditions soon after learning remain stable across 12 h and 24 h periods regardless of whether sleep or wake intervene.
The ambiguous cue condition was the only one in Experiment 1 to show evidence of close competition between two types of response, and was also the only one to reveal a change in performance across 24 h. Interestingly, for this cue both groups tested after a 12 h delay showed an apparent increase in irregular inconsistent responses relative to the group tested immediately after training in Experiment 1. To assess further the extent to which a preference for irregular inconsistent responses emerges and increases over a delay period, we analyzed the generalizations of all four groups of participants across the two experiments (tested immediately after learning, after a 12 h wake, after a 12 h sleep, or after a 24 h delay), focusing on irregular inconsistent responses in the ambiguous cue condition. As illustrated in Fig. 8, participants tested after 24 h had the greatest tendency to produce irregular inconsistent generalizations relative to the other three groups (24 h vs 0 h/12 h wake/12 h sleep contrast: β = .34, SE = .09, z = 3.84, p = .0001). Participants in both 12 h groups also had a greater tendency to produce this type of response than the group tested immediately after training (β = −.22, SE = .10, z = −2.16, p = .031). As in the main analyses, there were no differences between the two 12 h groups (β = −.01, SE = .16, z = −.07, p = .94).
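The three comparisons above form a set of orthogonal, Helmert-style contrasts over the four groups. The sketch below shows one way to code them; the exact weights, and therefore the signs of the coefficients they would produce, are an assumption, since the coding used in the study is not given. Each contrast sums to zero and every pair is orthogonal, so each coefficient addresses a separate question.

```python
from fractions import Fraction

GROUPS = ["0h", "12h_wake", "12h_sleep", "24h"]

# Assumed contrast weights, one row per comparison (columns follow GROUPS).
CONTRASTS = {
    "24h_vs_rest": [Fraction(-1, 4), Fraction(-1, 4), Fraction(-1, 4), Fraction(3, 4)],
    "12h_vs_0h": [Fraction(-2, 3), Fraction(1, 3), Fraction(1, 3), Fraction(0)],
    "sleep_vs_wake": [Fraction(0), Fraction(-1, 2), Fraction(1, 2), Fraction(0)],
}


def contrast_codes(group):
    """Predictor values for one participant's group, one per contrast."""
    idx = GROUPS.index(group)
    return {name: weights[idx] for name, weights in CONTRASTS.items()}


def is_orthogonal_set(contrasts):
    """True if every contrast sums to zero and all pairs have zero dot product."""
    rows = list(contrasts.values())
    zero_sum = all(sum(r) == 0 for r in rows)
    pairwise = all(sum(a * b for a, b in zip(rows[i], rows[j])) == 0
                   for i in range(len(rows)) for j in range(i + 1, len(rows)))
    return zero_sum and pairwise


if __name__ == "__main__":
    print(is_orthogonal_set(CONTRASTS))  # True
    print(contrast_codes("12h_sleep"))
```

Exact fractions are used so that the zero-sum and orthogonality checks are not affected by floating-point rounding.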
These findings indicate that when presented with a phonological cue that could lead to either regularizations or irregularizations, participants tested immediately after training showed a balanced preference for the two types of response. However, after a 12-h delay, with or without sleep, there was a clear preference for irregular inconsistent responses, which increased further after an additional 12 h.

Change of memory for the trained items across the delay
The pattern of generalization performance described above may result from a mixture of maintenance, strengthening, and forgetting of different aspects of the newly learned inflectional system across sleep and wake periods. To assess the contribution of these processes, we examined the change in performance across the 12-h delay for the trained items.
The effects of session on cued recall performance likely reflect some combination of forgetting, maintenance, or strengthening of memories in wake and in sleep. To tease these processes apart, we ran an item "fate" analysis following Dumay (2016; see also Dumay, 2018; Schreiner & Rasch, 2016). This type of analysis partitions the data into two complementary sets of trials based on whether the item was correctly recalled or not prior to the 12-h delay. For items that were not correctly recalled in the test immediately after learning, the two possible outcomes after the delay are a further inaccurate recall (described as never learned, coded as 0), or a correct recall (described as a gain, coded as 1). For items that were correctly recalled prior to the delay, the two possible outcomes are correct recall again (maintained, coded as 0) or inaccurate recall (forgotten, coded as 1). We ran two complementary analyses on these datasets using mixed-effects logistic regression with gains versus never learned and forgotten versus maintained as outcome measures, with the maximal random-effects structure for participants and items that allowed the model to converge. These analyses included group (wake, sleep), regularity (regular, irregular), and consistency (consistent, inconsistent) as effect-coded fixed factors (all effects are provided in Appendix 2, Tables 4 and 5).
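The partition described above can be written down directly. The sketch below assumes one record per participant and item with binary recall accuracy at the immediate and delayed tests; the field names, item labels, and function names are hypothetical. Each trial lands in exactly one of the two datasets, with the outcome coded as in the text.

```python
from typing import Iterable, List, Tuple

Trial = Tuple[str, str, bool, bool]  # (participant, item, immediate_correct, delayed_correct)


def item_fate(immediate_correct: bool, delayed_correct: bool) -> Tuple[str, str, int]:
    """Return (dataset, outcome label, coded outcome) for one trial."""
    if not immediate_correct:
        # Not recalled before the delay: gains vs never learned.
        return ("gains", "gain", 1) if delayed_correct else ("gains", "never learned", 0)
    # Recalled before the delay: forgotten vs maintained.
    return ("forgetting", "maintained", 0) if delayed_correct else ("forgetting", "forgotten", 1)


def partition(trials: Iterable[Trial]) -> Tuple[List[dict], List[dict]]:
    """Split trials into the gains dataset and the forgetting dataset."""
    gains, forgetting = [], []
    for participant, item, imm, dly in trials:
        dataset, label, y = item_fate(imm, dly)
        row = {"participant": participant, "item": item, "fate": label, "y": y}
        (gains if dataset == "gains" else forgetting).append(row)
    return gains, forgetting


if __name__ == "__main__":
    demo = [("p1", "item1", False, True), ("p1", "item2", False, False),
            ("p1", "item3", True, True), ("p1", "item4", True, False)]
    g, f = partition(demo)
    print([r["fate"] for r in g])  # ['gain', 'never learned']
    print([r["fate"] for r in f])  # ['maintained', 'forgotten']
```

Each dataset would then be modeled separately with a logistic mixed model on the coded outcome y.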
For the gains versus never learned analysis, we expected better overall performance in the sleep group due to sleep-related strengthening of newly acquired memories. Indeed, that is what we found: as illustrated in Fig. 10, participants in the sleep group showed greater gains over the delay than participants in the wake group (β = 1.59, SE = .56, z = 2.85, p = .004). Overall, there were more gains for irregular forms (β = 1.03, SE = .40, z = 2.58, p = .010). The analysis also yielded a regularity × consistency interaction, with the fewest gains for regular inconsistent items in both groups (regularity × consistency: β = 2.14, SE = .82, z = 2.59, p = .010; see also Appendix 2, Table 4). In sum, the analysis of gains showed the expected benefit of sleep, and provided evidence for the overall strengthening of irregular items, with the smallest gains for regular inconsistent forms in both groups.
Fig. 7 – Generalization performance for the participants tested after a delay of 12 h including wake or sleep.
Turning to the items that were initially recalled correctly, we analyzed the impact of group, regularity, and consistency on the likelihood of forgetting. As expected, and as illustrated in Fig. 11, there was an overall greater degree of forgetting in the wake than in the sleep group (β = −1.65, SE = .52, z = −3.20, p = .001). Overall, more inconsistent than consistent forms were forgotten (β = .66, SE = .30, z = 2.21, p = .027). Interestingly, the pattern of forgetting in the two groups varied across item types: in the wake group, unlike the sleep group, forgetting was strongest for regular inconsistent and irregular consistent forms (group × regularity: β = −1.36, SE = .60, z = −2.27, p = .023; group × consistency: β = −1.19, SE = .60, z = 2.00, p = .046; group × regularity × consistency: β = 3.07, SE = 1.31, z = 2.34, p = .02). There were no other significant effects (Appendix 2, Table 5). This finding may suggest that the items with the greatest forgetting in the wake group relied more strongly on the neocortical system at initial encoding, and were thus subject to stronger interference-based forgetting related to wake activity.
Fig. 9 – Cued recall accuracy with trained plurals, illustrated using difference scores (delayed − immediate), for the two groups.
Fig. 10 – Proportion of items that were gained across the 12-h delay for the two groups at cued recall. The proportions are calculated out of all items that were not recalled at the immediate session, but were either gained (red and blue bars) or were not recalled at either session (never learned; grey bars).
In sum, the analysis of performance on the trained items suggested that, despite very similar patterns of generalization behavior for the sleep and wake participants, the underlying memory for the trained materials differed somewhat between the two groups. There was a clear difference in overall cued recall performance between the groups (Fig. 9), with a greater decline in performance for the wake group. This overall difference is likely a result of greater strengthening of initially weak memories in the sleep group (as evidenced by the analysis of gains) and heightened forgetting of initially strong memories in the wake group (as demonstrated by the analysis of forgetting). In addition to these general group differences, the gains and forgetting analyses revealed a more complex pattern, with performance depending on consistency and regularity. The forgetting analysis revealed a particularly intricate pattern: the group × regularity interaction reflected a more equal pattern of forgetting for regulars and irregulars in the wake group, but less forgetting of irregulars than regulars in the sleep group. A group × consistency interaction and a three-way interaction between these variables showed that consistency also affected the degree to which items were forgotten, with the wake but not the sleep group showing substantial forgetting of inconsistent regulars and consistent irregulars.
The analyses also demonstrated a general effect of regularity on memory for the trained items across both groups. In the overall analysis and the gains analysis (and, nonsignificantly, in the forgetting analysis, p = .061), performance deteriorated less for the irregular items than for the regulars. Taken together with the overall benefit of sleep over wake, this result suggests that the interval between immediate and delayed testing benefits irregulars more than regulars, but that this benefit is dominated by memory gains for irregulars in sleep and by enhanced forgetting of regulars, particularly regular inconsistent items, in wake.
The final analysis of the change in performance on the trained items over the delay provides additional evidence for improved performance in the sleep group. In the 2AFC task, and as illustrated in Fig. 12, participants who slept between training and test were faster to recognize the new plural forms at the delayed test, whereas there was no overall change in performance for the group who stayed awake (session: β = −.04, SE = .02, t = −2.83, p = .005; group × session: β = −.09, SE = .03, t = −2.99, p = .003). The greater difficulty with inconsistent relative to consistent items was replicated in this analysis, with participants in both groups slower overall to recognize inconsistent than consistent forms (consistency: β = .13, SE = .02, t = 5.22, p < .001), and more strongly so for the wake group (group × consistency: β = −.06, SE = .03, t = −.98, p = .035). There were no other significant effects (Appendix 2, Table 6 and Fig. 2).
Fig. 11 – Proportion of items that were forgotten across the delay for the two groups at cued recall. The proportions are calculated out of all items that were recalled correctly at the immediate session, and were then either forgotten (red and blue) or maintained (grey) at the delayed session.

Relationship between trained items and generalization
To assess more directly the contribution of the memory of the trained items to generalization patterns in Experiment 2 we used a stepwise multiple regression. We focused on the generalization responses with the novel items with the ambiguous cue, as this was the condition where we saw balanced competition soon after learning in Experiment 1 and a clear change in performance across different delays, and in particular an increase in irregularizations. Hence we used the proportion of irregular inconsistent responses as the outcome measure. Our aim was to examine the extent to which both the initial learning and the consolidated knowledge of regular and irregular forms as measured by cued recall contributed to generalization performance. We were particularly interested in assessing the extent to which the strengthening of the memory for the trained irregular forms and weakening of the regular forms over the delay may have contributed to the pattern of generalization responses.
In the first two steps of the analysis, we assessed the contribution of initial learning of irregular inconsistent and then regular consistent forms. The model containing both predictors (model 2) provided an improvement over the model with only the initial learning of irregular inconsistent forms as the predictor (model 1) [F(1) = 5.74, p = .02; see Table 3 for model parameters]. In the next two steps, we added measures of consolidated knowledge (cued recall accuracy for the trained items at the delayed session) for irregular inconsistent (model 3) and then regular consistent forms (model 4). Again, each model provided an improvement over the models containing fewer predictors [model 3 vs model 2: F(1) = 10.32, p = .002; model 4 vs model 3: F(1) = 5.74, p = .02; Table 3]. The addition of group (wake, sleep) as a predictor (model 5) did not provide any further improvement [model 5 vs model 4: ...].
These findings demonstrate that generalization performance was crucially influenced by consolidated knowledge of the trained items. In particular, a stronger memory for the irregular inconsistent forms increased the likelihood of producing irregular inconsistent responses at generalization, and conversely a stronger memory for the regular inconsistent forms decreased the likelihood of producing irregularizations.
Fig. 12 – Recognition times on the 2AFC task, illustrated using difference scores (delayed − immediate test) with trained plurals for the two groups.
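Each step of the hierarchical regression compares two nested ordinary least squares models with an F test on the reduction in residual sum of squares: F = [(RSS_reduced − RSS_full)/(k_full − k_reduced)] / [RSS_full/(n − k_full)], where k counts parameters including the intercept. The sketch below implements that comparison in plain Python as an illustration under these standard assumptions, not as the analysis code; the data layout (one row per participant, with columns such as delayed recall accuracy for irregular inconsistent items) and the toy numbers are hypothetical.

```python
def ols_rss(X, y):
    """Residual sum of squares of an OLS fit. Each row of X must start with 1 (intercept)."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * z for x, z in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def nested_f_test(X_reduced, X_full, y):
    """F statistic and (df1, df2) for adding predictors to a nested OLS model."""
    n, k_r, k_f = len(y), len(X_reduced[0]), len(X_full[0])
    rss_r, rss_f = ols_rss(X_reduced, y), ols_rss(X_full, y)
    df1, df2 = k_f - k_r, n - k_f
    F = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return F, df1, df2


if __name__ == "__main__":
    x = [0, 1, 2, 3]          # toy predictor
    y = [1, 3, 4, 8]          # toy outcome
    X1 = [[1] for _ in x]     # intercept-only model
    X2 = [[1, xi] for xi in x]
    print(nested_f_test(X1, X2, y))  # F is about 26.9 with df (1, 2)
```

Adding one predictor per step, as in models 1 to 5, means df1 = 1 at every comparison, matching the F(1) values reported above.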

General discussion
In the current study, our aim was to examine the contribution of complementary learning systems to the acquisition, retention and consolidation of new mappings in a quasi-regular domain. We specifically assessed the extent to which different degrees of systematicity and arbitrariness in the mappings would influence learning and retention over 24 h. The new quasi-regular system mimicked key properties of the English past tense in that the majority of the new forms were regular, and a minority were irregular. Our hypothesis was that new irregular, exceptional forms should particularly benefit from a hippocampal sparse encoding scheme, whereas more systematic regular mappings should be less reliant on the hippocampus and more able to exploit neocortical learning mechanisms. On the basis of this hypothesis, we predicted stronger consolidation effects for irregular than for regular forms.
Over two experiments, we found that both regular and irregular forms of this new inflectional system mimicking pluralization were learned well. Using tests of generalization of the new knowledge to previously unseen items, we found that immediately after learning, both types of forms were used at approximately equal rates. Novel singulars with no particular similarities to trained items showed a strong tendency to be captured by the type-dominant regular plural process, whereas novel singulars that were similar only to high frequency unique irregular items tended to be pluralized as if they were irregulars. These patterns were robustly manifested, and observed at all four time points tested (immediately after learning, after a 12 h delay including sleep or wake, and after a 24 h delay), suggesting that there was stability over time in the generalization process for these items. All this points to the establishment of a new inflectional system with many properties of real language systems, such as the use of a default regular inflection that applies to a wide range of novel forms (a global form of generalization; cf. Xeroxed; Prasada & Pinker, 1993), but with irregularizations in cases where novel forms have strong similarities to clusters of previously learned irregular forms (e.g., spling-splang; Albright & Hayes, 2003; Marchman, 1997).
cortex xxx (2018) 1–22
The most challenging items in our battery of novel singulars were the items that were placed in a part of the phonological space that had inconsistent pluralization, similar to trained regular and irregular items. When generalization was tested for these items soon after learning, we found a roughly 50:50 split of regular and irregular responses. However, over 12-h and 24-h delays with no additional exposure to the new system, the irregulars became increasingly dominant in the generalization responses. This observation is consistent with the notion that the irregulars benefit more from consolidation processes than regulars, as predicted on the basis of our hypothesis.
However, contrary to our expectations, this increasing influence of irregular items was not specifically related to sleep. Rather, the effect was observed equally strongly in the two 12-h groups with or without sleep. In order to understand the role of maintenance, strengthening, and forgetting of the trained items in explaining this behavior, we examined the recall of the trained plurals before and after the 12-h intervals including sleep and wake. This technique has been the subject of recent debate (Dumay, 2018; Schreiner & Rasch, 2016), but incorporating it into a mixed effects analysis proved revealing for our data. Consistent with many previous studies in word learning, we found overall poorer retention of the new inflectional system over the delay in the wake than in the sleep group (see McMurray, Kapnoula, & Gaskell, 2017, for a review). Two general processes may have contributed to this outcome: we found stronger gains in recall for previously subthreshold memories of the trained plurals in the sleep group, while at the same time there was greater overall forgetting of previously recalled trained plurals in the wake group. The overall better retention in the sleep group resulted from a combination of less forgetting (i.e., more robust maintenance) and more gains in performance (perhaps due to sleep reactivating and strengthening initially weak memories).
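The distinction between gains and forgetting can be made concrete with a small scoring sketch. It assumes that each trained plural is scored as recalled or not recalled at both the immediate and the delayed cued-recall test. The function `recall_changes` and the example data are hypothetical. The analysis described here modeled these outcomes in a mixed effects framework, whereas this sketch only computes descriptive proportions for a single participant.

```python
import math
from typing import Dict, Sequence


def recall_changes(immediate: Sequence[bool],
                   delayed: Sequence[bool]) -> Dict[str, float]:
    """Item-level gains and forgetting across a retention interval.

    immediate[i] / delayed[i] -- was trained item i recalled correctly at
    the immediate / delayed cued-recall test?

    gains      -- proportion of items missed at the immediate test that were
                  recalled at the delayed test
    forgetting -- proportion of items recalled at the immediate test that
                  were missed at the delayed test
    A proportion is NaN when its denominator is empty.
    """
    if len(immediate) != len(delayed):
        raise ValueError("both tests must cover the same items")
    pairs = list(zip(immediate, delayed))
    # Delayed outcomes, split by the immediate-test outcome of the same item.
    missed_first = [later for first, later in pairs if not first]
    recalled_first = [later for first, later in pairs if first]
    gains = (sum(missed_first) / len(missed_first)
             if missed_first else math.nan)
    forgetting = (sum(not later for later in recalled_first) / len(recalled_first)
                  if recalled_first else math.nan)
    return {"gains": gains, "forgetting": forgetting}


if __name__ == "__main__":
    # Hypothetical recall data for eight trained plurals (one participant).
    immediate = [True, True, True, False, False, True, False, True]
    delayed = [True, False, True, True, False, True, False, True]
    print(recall_changes(immediate, delayed))
```

Conditioning each proportion on the immediate-test outcome keeps the two processes separate: a participant can show both substantial gains and substantial forgetting, which a single net change score would hide.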
Importantly, we also found that trained regular forms were less well retained over the 12-h interval than irregular forms. This effect was found regardless of whether the interval contained sleep or not, although we saw evidence of less forgetting for irregulars for the sleep but not for the wake group (indeed, in both the gains and the overall cued recall analyses the interaction with group was marginally significant; see Appendix 2, Tables 3 and 4). Putting these two effects together, we can see that the differential changes in memory for the trained plurals contribute to the shifts in generalization found for novel items. Irregular items were more likely to be strengthened (particularly in sleep) compared with regular items, whereas regular items were more likely to be forgotten (particularly in wake) than irregular items. In each case, the outcome was the same: a greater influence of the irregular trained materials in generalization to novel forms. Further evidence linking generalization performance to the retention of the training materials comes from a regression analysis, which showed that participants who retained strong memories of the irregular inconsistent plurals were more likely to produce irregular plurals of the ambiguous novel singulars, whereas participants who retained strong memories of the regulars were more likely to produce regular plurals. These effects were found even after controlling for post-training performance, suggesting that changes over the retention interval for trained items are key for explaining generalization performance.
Together, these findings highlight the importance of long-term memory processes in understanding the acquisition, retention, and use of new linguistic structures. We found that the knowledge of the quasi-regular inflectional system was not crystallized at the endpoint of the learning phase, with important changes taking place over the subsequent 24 h. It is reasonable to assume that these changes would continue to be influential over an even longer term as ongoing maintenance, strengthening, and forgetting processes progress over weeks or months.
It is worth noting that these changes were not observed for all the generalization materials that we tested. As mentioned, two properties (default regularization and generalization to consistent irregulars) were observed at all the time points tested. These are cases where the properties of the system did not conflict, and so any changes in memory for the trained materials did not have an observable effect. It was only when we focused on inconsistent forms for which there were strong competing tendencies to regularize and irregularize that we saw substantial changes in the generalization to new materials over time.
In many ways, the results conform with the hypotheses derived from the application of a complementary systems approach to language discussed in the introduction. The results can be explained in terms of a division of labor at encoding, with a greater reliance on the hippocampal system during the encoding of irregular forms, compared with a greater reliance on neocortical encoding of regular forms. If offline consolidation effects depend on the extent of hippocampal recruitment during learning, then irregulars should be retained or strengthened to a greater extent than regulars over time. This was observed, both in terms of the retention of the trained items and the generalization to new materials.
That said, we also expected stronger maintenance and/or strengthening of hippocampally based memories across intervals including sleep compared with wake due to hippocampal replay (e.g., Skaggs & McNaughton, 1996; Staresina et al., 2015). We observed clear sleep benefits over wake in terms of the retention of the trained materials, including more robust maintenance of irregulars, but that did not translate into differential effects on generalization. Instead, both 12 h including sleep and 12 h awake led to near-identical shifts in the generalization pattern compared with the immediate test (12% for sleep and 13% for wake; see Fig. 8). Speculatively, the change in balance between regulars and irregulars across 12 h awake may relate to processes of forgetting impacting more on the regular, and particularly the regular inconsistent forms, than on the irregular forms (e.g., Sadeh et al., 2014; Sweegers & Talamini, 2014; Werchan & Gomez, 2014). Our finding of changes in generalization performance across both sleep and wake is part of a small but growing body of evidence that necessitates a better understanding of the offline processes that promote generalizable representations. Intriguingly, the effect on generalization of 24 h consisting of 12 h including sleep followed by 12 h awake (27%) was close to the summed effects of sleep and wake independently (12% + 13%). This could suggest that the wake and sleep effects are simply combined additively across a 24 h period, although we would be in a better position to make that argument if we had also tested a 24 h delay that included 12 h awake prior to sleep.
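The additivity comparison can be written out explicitly. Here Δ denotes the shift in the generalization pattern relative to the immediate test, using the values reported above; treating the two 12-h shifts as independent and summable is the assumption under examination, not an established result.

```latex
\Delta^{\mathrm{pred}}_{24\,\mathrm{h}} = \Delta_{\mathrm{sleep}} + \Delta_{\mathrm{wake}} = 12\% + 13\% = 25\%,
\qquad
\Delta^{\mathrm{obs}}_{24\,\mathrm{h}} = 27\%,
\qquad
\Delta^{\mathrm{obs}}_{24\,\mathrm{h}} - \Delta^{\mathrm{pred}}_{24\,\mathrm{h}} = 2\%.
```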
Although the current results demonstrate a differential effect of regularity over time that is broadly consistent with our predictions, it is worth considering an alternative explanation of the data. In order to set up an inflectional system that included a type-dominant regular alongside pockets of irregularity, we needed to ensure that the irregulars were dominant in terms of token frequency (Bybee & Slobin, 1982; Bybee, 1995b). Therefore, an alternative explanation of the current findings is that the same neural circuits were involved to the same extent in encoding all items (perhaps hippocampally mediated in all cases), but that over the offline period the processes of maintenance and strengthening favored the more robust memories (the high frequency plurals) over the less robust ones (the low frequency plurals), whereas forgetting favored the opposite. The literature here is mixed, but there is certainly evidence consistent with offline prioritization of certain types of memory during sleep (e.g., Oudiette, Antony, Creery, & Paller, 2013; Wilhelm et al., 2011; van Rijn, Lucignoli, Izura, & Blagrove, 2017). Therefore, we cannot rule out the possibility on the basis of the current evidence that the factor governing the relative strengthening and weakening of memories over time is frequency rather than regularity. Nonetheless, it is worth pointing out that recent evidence has suggested that weakly learned information might be prioritized during offline hippocampal replay, which would predict the opposite pattern of change from the one found (Schapiro, McDevitt, Rogers, Mednick, & Norman, 2017). Furthermore, in the current study there was no evidence that regularity influenced the strength of initial encoding: although there were some differences in performance on the trained plurals at the end of training, these were relatively small, and a main effect of regularity was not observed in either experiment. Finally, forgetting was strongest in both a set of high frequency (irregular consistent) and a set of low frequency (regular inconsistent) forms. Thus, while item frequency may play a role in initial encoding and long-term retention, it does not easily account for the current findings.

4.1. Implications for models of language learning and use
Our findings have several implications for models of language learning and use, and in particular for debates about the processing of grammatical aspects of language (e.g., McClelland & Patterson, 2002; Pinker & Ullman, 2002; Seidenberg & Plaut, 2014). We have successfully mimicked the learning and generalization of an inflectional system in natural languages, in that participants learned both the default generalization pattern (global generalization) and 'islands of reliability' with predictable phonological cues (local generalizations; Albright & Hayes, 2003; Plunkett & Marchman, 1991). The finding that generalization behavior was influenced by the memory of the trained regular and irregular forms is consistent with single-mechanism models of inflectional processing, which suggest that both regular and irregular forms are processed within a single system encoding statistical regularities in the form-to-meaning mapping (e.g., Joanisse & Seidenberg, 1999; McClelland & Patterson, 2002; Seidenberg & Plaut, 2014). Moreover, the evidence that the memory for both regular and irregular trained forms was influenced by domain-general memory consolidation processes lends further support to domain-general accounts of language learning and use (e.g., McClelland, 2015; Seidenberg, 1997).
Our findings are also relevant for research that examines the type of information that contributes to grammatical generalizations (e.g., Endress & Hauser, 2011; Wonnacott, 2011; Wonnacott, Brown, & Nation, 2017; Wonnacott, Newport, & Tanenhaus, 2008). In the current study, generalization patterns immediately after learning showed sensitivity both to regular, type-frequency based information in the input and to phonologically constrained cues (consistency in the mapping between the phonological cue in the stem and the affix). Crucially, offline memory consolidation processes influenced representations of the learned input, increasing the influence of irregular, token-frequency based generalizations over time. Thus, future studies on the types of linguistic information influencing generalizations will need to take into account memory consolidation processes and how they shape grammatical generalizations.
Our experimental paradigm most clearly mimics morphological learning in a second language, in that participants were learning new words for existing concepts. Indeed, more recent dual-mechanism models suggest that morphological learning in a second language is better described as memory-based learning of all new forms, rather than only irregulars (e.g., Ullman, 2001). Our findings are also relevant for grammar learning in the first language in that, as shown by a number of recent studies, grammatical knowledge of the first language is malleable both in the short and the long term throughout the life-span (Fine, Jaeger, Farmer, & Qian, 2013; Kaschak, Kutta, & Schatschneider, 2011; Luka & Choi, 2012; Ryskin, Qi, Duff, & Brown-Schmidt, 2017; Wells, Christiansen, Race, Acheson, & MacDonald, 2009). The consolidation processes we described in the current study are thus likely to play a role across a range of phenomena in both first and second language acquisition.
Finally, our findings are relevant for understanding learning and representation in quasi-regular domains in cognition more broadly, and in particular for computational models implementing domain-general mechanisms of learning and representation in quasi-regular systems (e.g., Armstrong et al., 2017; Harm & Seidenberg, 2004; Rogers & Patterson, 2007; …). Our findings of consolidation-related changes suggest that these models, which typically implement the learning and representational mechanisms of the neocortical system, may need to be augmented by explicitly implementing the representational and learning mechanisms of the hippocampal system and the interaction between the two. For example, Armstrong et al. (2017) have recently explored how the structure of representational space influences generalization when learning a quasi-regular system similar to the one in the current study, but in the print-to-sound domain. Unlike the current study, their focus was on generalization performance at the consolidated state of the acquired knowledge (48 h after learning), and their behavioral findings were well matched with the predictions from their computational model implementing the distributed architecture of the neocortical system. Crucially, in order to avoid catastrophic interference when learning new items [a known problem for distributed neocortical systems (cf. McClelland et al., 1995)], the model implemented different error scaling rates for the existing versus the new items. For the purposes of their key research question examining the representational space in the neocortical system, this may have been an appropriate simplification in the model, but the extent to which the same computational architecture would capture our findings of important changes in the pre-consolidated knowledge remains to be tested in future studies.

Conclusions
Our study of the learning and retention of a new artificial morphological system over the course of 24 h has demonstrated the importance of considering the role of systematicity in the learning, consolidation and retention processes for verbal material. We found evidence that consolidation processes affected participants' ability to generate inflected forms of trained and novel stems, but these changes did not occur across the board. We found the strongest evidence of consolidation effects for uninflected items that had conflicting cues consistent with both regular (systematic) forms and irregular forms. In these cases increasing consolidation periods with or without sleep benefited irregulars over regulars. This result is broadly consistent with a complementary systems model in which encoding of nonsystematic irregular items relies on hippocampal pattern separation to a greater extent than for systematic items, and in which consolidation preferentially benefits the hippocampal memories. However, the finding that these changes in performance occur in sleep and wake equally calls for a better understanding of how sleep and wake combine to enhance the generalizability of knowledge.