Contextual priming of word meanings is stabilized over sleep

Evidence is growing for the involvement of consolidation processes in the learning and retention of language, largely based on instances of new linguistic components (e.g., new words). Here, we assessed whether consolidation effects extend to the semantic processing of highly familiar words. The experiments were based on the word-meaning priming paradigm in which a homophone is encountered in a context that biases interpretation towards the subordinate meaning. The homophone is subsequently used in a word-association test to determine whether the priming encounter facilitates the retrieval of the primed meaning. In Experiment 1 (N = 74), we tested the resilience of priming over periods of 2 and 12 h that were spent awake or asleep, and found that sleep periods were associated with stronger subsequent priming effects. In Experiment 2 (N = 55) we tested whether the sleep benefit could be explained in terms of a lack of retroactive interference by testing participants 24 h after priming. Participants who had the priming encounter in the evening showed stronger priming effects after 24 h than participants primed in the morning, suggesting that sleep makes priming resistant to interference during the following day awake. The results suggest that consolidation effects can be found even for highly familiar linguistic materials. We interpret these findings in terms of a contextual binding account in which all language perception provides a learning opportunity, with sleep and consolidation contributing to the updating of our expectations, ready for the next day.


Introduction
Over the last 20 years, a substantial body of psycholinguistic research has uncovered remarkable plasticity in the adult system. Whereas previously language development might have been characterised as a steady progression towards a fairly stable state, it is now clear that such a stable state is never achieved. Instead, we retain substantial plasticity as adults, allowing us to adapt our perception of phonemes when exposed to unfamiliar accents (Norris, McQueen, & Cutler, 2003), tailor our production system to reflect the statistical structure of our environment (Dell, Reed, Adams, & Meyer, 2000) and acquire and retain new forms (Gaskell & Dumay, 2003), meanings (Fang & Perfetti, 2017;Rodd et al., 2012) and syntactic constructions (Kaschak & Glenberg, 2004;Ryskin, Qi, Duff, & Brown-Schmidt, 2017). Along with these observations of plasticity, there has also been an enhanced recognition of the applicability of detailed theories of memory function to the domain of psycholinguistics (Davis & Gaskell, 2009;Gagnepain, Henson, & Davis, 2012;Szmalec, Page, & Duyck, 2012).
One key example of this increased synergy between memory and language has involved our understanding of the importance of consolidation processes in language learning. Studies of infants (Friedrich, Wilhelm, Mölle, Born, & Friederici, 2017; Gomez, Bootzin, & Nadel, 2006; Horváth, Myers, Foster, & Plunkett, 2015), children (Friedrich et al., 2017; Henderson, Weighall, Brown, & Gaskell, 2012; James, Gaskell, Weighall, & Henderson, 2017; Sandoval, Leclerc, & Gómez, 2017; Williams & Horst, 2014) and adults (Bakker-Marshall et al., 2018; Bakker, Takashima, van Hell, Janzen, & McQueen, 2014; Dumay & Gaskell, 2007; Kurdziel, Mantua, & Spencer, 2017) have shown that retention and integration of new linguistic knowledge can benefit from a consolidation period, and sometimes specifically from a sleep period (Tamminen, Payne, Stickgold, Wamsley, & Gaskell, 2010). For example, interference from learning a new word (e.g., "cathedruke") on the recognition of its existing neighbour (e.g., "cathedral") tends not to be observed immediately (although cf. McMurray, Kapnoula, & Gaskell, 2016), but instead emerges after a period of sleep (Dumay & Gaskell, 2007) and is associated with the prevalence of spindle activity (brief ∼12-15 Hz bursts of activity in non-REM sleep) during the intervening night (Tamminen et al., 2010). These observations can be explained by systems consolidation models (e.g., Rasch & Born, 2013) applied to language learning (Davis & Gaskell, 2009) in which sleep provides an opportunity for new hippocampally mediated memories to be replayed (Ji & Wilson, 2007; Nadel, Hupbach, Gomez, & Newman-Smith, 2012). Although it is fairly clear that consolidation is a component process in the retention of language knowledge, there is substantial variability in the extent to which consolidation effects are found (McMurray et al., 2016), which likely reflects the nature of the material to be learned.
The studies that have revealed consolidation effects in language learning have tended to focus on examples of new material (e.g., novel words or grammars), and it is possible that stimulus novelty is the main factor that determines the level of reliance on consolidation. This would fit with a complementary systems account of language learning (McClelland, 2013) in which the hippocampus steps in to facilitate learning in cases where adjustment of cortical weights would interfere with existing knowledge (Mirković & Gaskell, 2016).
An exception to this rule is the study of Gaskell et al. (2014), who examined the role of sleep in the acquisition of phonotactic constraints in speech production. The research exploited the work of Dell and colleagues (Dell et al., 2000;Warker & Dell, 2006;Warker, Dell, Whalen, & Gereg, 2008;Warker, Xu, Dell, & Fisher, 2009) who had shown that speakers can acquire new phonotactic constraints (e.g., absence of a /g/ at syllable offset) over brief periods of time, as evidenced by the structure of their speech errors. Gaskell et al. (2014) extended the work of Warker (2013) and showed that when the constraint is a more complex "second-order constraint" (e.g., absence of a /g/ after /ae/) the integration of this constraint into speech errors is facilitated by a period of sleep (specifically, slow-wave sleep). A recent study demonstrated that consolidation also benefits the acquisition of second-order constraints when the material to be learned is non-linguistic (Anderson & Dell, 2018). The phonotactic consolidation effect can in some ways be thought of as learning of new material (i.e. the "gaps" in the repertoire of allowable sequences are new), but in other ways the change can be thought of as a revision of existing knowledge about the co-occurrence probabilities of various phonemes. Therefore, it remains to be seen whether sleep and consolidation are important for the retention of new evidence that acts to revise a well-established body of existing linguistic knowledge.
In the current study, we examine the potential for sleep and/or consolidation to influence the process of selection between the various familiar meanings of lexically ambiguous words. This domain exemplifies the building up of a body of knowledge over the course of a lifetime relating to the likelihood of different meanings, and yet has been shown to be susceptible to priming effects in the short term, suggestive of plasticity (Rodd, Cutrin, Kirsch, Millar, & Davis, 2013). For example, the word "pen" has multiple meanings, and in the absence of biasing contextual cues participants tend to retrieve the most frequent meaning of the word (in this case, relating to the writing implement meaning; Twilley, Dixon, Taylor, & Clark, 1994). This frequency bias is likely to reflect some kind of learning mechanism that amasses frequency counts from experience of the usage of the ambiguous word over a long period of time. Rodd et al. (2013) examined whether a recent experience with a particular meaning of an ambiguous word could alter the likelihood of retrieval of the different meanings. According to a kind of "knowledge crystallization" account, such an effect of recent experience would be unlikely, because frequency biases accumulated over many experiences across decades should not alter by a discernible amount on the basis of a single new encounter. However, an explanation that favours recent experience or maintains strong plasticity would suggest that meaning frequency biases should be more flexible. Rodd et al. tested these different accounts using a word-meaning priming paradigm. 1 Participants first encountered a set of ambiguous words embedded in spoken sentences that biased the subordinate (i.e., less favoured) meaning of the word (e.g., "A pen was used by the farmer to enclose the stock before he moved them to the market"). 
Participants were then tested on their comprehension of the primed ambiguous words, compared with a baseline unprimed condition, by presenting the ambiguous words in isolation as cues and asking participants to generate an associated word. Rodd et al. found that the proportion of associate responses consistent with the primed meaning rose from about 0.17 in the unprimed condition to 0.24 in the primed condition. A second experiment showed that the priming effect could not be explained simply in terms of standard semantic priming, which was more short-lived.
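The measure described in this paragraph can be illustrated with a minimal sketch. It assumes that each word-association response has already been hand-coded as consistent with the subordinate meaning or not, and that the denominator is all coded responses; the label strings, function names and toy data are hypothetical and are not taken from Rodd et al. (2013).

```python
from typing import Sequence


def subordinate_proportion(codes: Sequence[str]) -> float:
    """Proportion of responses coded as consistent with the subordinate meaning."""
    if not codes:
        raise ValueError("no responses to score")
    return sum(1 for code in codes if code == "subordinate") / len(codes)


def priming_effect(primed: Sequence[str], unprimed: Sequence[str]) -> float:
    """Difference in subordinate-meaning proportion: primed minus unprimed."""
    return subordinate_proportion(primed) - subordinate_proportion(unprimed)


# Toy data (hypothetical) chosen to mirror the approximate proportions
# quoted in the text: about 0.17 unprimed versus 0.24 primed.
unprimed_codes = ["subordinate"] * 17 + ["dominant"] * 83
primed_codes = ["subordinate"] * 24 + ["dominant"] * 76

effect = priming_effect(primed_codes, unprimed_codes)
print(round(effect, 2))
```

On this toy data the priming effect is the simple difference between the two proportions; any inferential test on real data would need to model participants and items, which this sketch does not attempt.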
This word-meaning priming effect is relatively abstract, in that it applies regardless of whether the same or a different speaker is used for the priming sentence and the isolated cue word (Rodd et al., 2013), and transfers across spoken and written modalities (Gilbert, Davis, Gaskell, & Rodd, 2018). Although the delay used between exposure and testing was relatively long compared with many semantic- or form-priming studies (about 20 min on average), this latency does not really provide much information about whether long-term lexical representations are being altered. Rodd et al. (2016) went further in mapping out the timecourse of word-meaning priming effects. They compared (in Experiment 2) exposure-test latencies of 1, 20 and 40 min with an unprimed baseline, finding that all three latencies showed some priming of associate responses, but with the 1-min condition stronger than the two longer latency conditions. This was taken as evidence of a relatively fast-fading component of the priming. Three further experiments examined longer latencies in a more naturalistic design, with good evidence that priming effects showed gradual decay across a day, and that beyond a day these effects weakened and were no longer significant (Experiment 1, Experiment 4). Intriguingly, though, participants with specific repeated experience of particular meanings of words (rowers with esoteric meanings of words like "feather") showed an influence of that experience on the likelihood of retrieval of the esoteric meaning several hours after that experience (e.g., rowing in the early morning and testing in the afternoon). Rodd et al. (2016) outlined a working model that might explain this complex pattern of priming effects.
They argued that distributed connectionist models of ambiguous word representation and processing (Joordens & Besner, 1994;Kawamoto, Farrar, & Kello, 1994;Rodd, Gaskell, & Marslen-Wilson, 2004) provide a natural account of how meaning biases could be updated as a consequence of a recent experience via adjustment of the long-term weights between form and meaning units. This would make the primed meaning a little easier to access and the unprimed meaning(s) a little harder to access. This model, then, can quite easily explain the enhanced likelihood of accessing a primed meaning of an ambiguous word at a later timepoint. But is it also possible to explain the apparent decay in this effect that is seen across the course of the remainder of the day? Rodd et al. suggested that this could be a consequence of further learning and updating of weights in response to intervening unrelated language exposure, given the highly interconnected nature of representations in a distributed model of meaning. However, they also pointed out that the specific decay function observed in their studies, with strong decay initially and weaker decay later on, might be difficult to accommodate in such a model, speculating that multiple mechanisms might reasonably be involved.
Borrowing again from the memory literature, the apparent decay observed by Rodd et al. (2016) might indeed be a product of a second system involved in the priming of ambiguous word meanings. Several models of memory and forgetting have argued that the hippocampus incorporates a prodigious ability to encode new associations through pattern separation of sparse representations (Sadeh, Ozubko, Winocur, & Moscovitch, 2014; Yassa & Stark, 2011). This makes these representations resistant to interference, given that they have little overlap with other representations, but at the same time they are susceptible to decay. As Hardt, Nader, and Nadel (2013, p. 112) put it, "decay-driven forgetting is a direct consequence of a memory system that engages in promiscuous encoding". Therefore, a second potential explanation of the time-course of word-meaning priming might make use of the hippocampus to facilitate the encoding or binding of new associations between the words in a sentence during the exposure session. This new representation could form a second source of information alongside more permanent lexical semantic knowledge when participants are asked at a later point to generate associates of a given word. Furthermore, hippocampal trace decay would provide an explanation of why word-meaning priming effects tend to weaken as the delay between exposure and test increases within the course of a day.
1 We use the term word-meaning priming for consistency with the prior literature on this paradigm. The term "priming" is used in its simplest sense, as a description of the facilitation of access to a particular meaning as a consequence of a prior stimulus presentation, rather than as a description of a particular mechanism.
As discussed above, there is a growing body of evidence that the hippocampus has an important role to play in the encoding of novel words. However, no new words are learned in word-meaning priming sentences; they are simply semantically coherent sentences containing familiar words. So the notion that the hippocampus is involved in learning simply during the comprehension of these sentences may seem unlikely. Nonetheless this suggestion is not new. Evidence from amnesic participants suggests that the hippocampus is involved in a range of online linguistic tasks that contribute to normal everyday conversation beyond simply word learning (Duff & Brown-Schmidt, 2012, 2017). These include tasks as diverse as the maintenance of common ground (Duff, Gupta, Hengst, Tranel, & Cohen, 2011), the use of co-speech hand gesture (Hilverman, Cook, & Duff, 2016), the flexible use of language (Duff, Hengst, Tranel, & Cohen, 2006) and, potentially, the updating of verb biases in syntactic ambiguity resolution (Ryskin, Qi, Covington, Duff, & Brown-Schmidt, 2018). Of particular relevance to the current study is the work of Klooster and Duff (2015). They questioned the received wisdom that remote semantic memory does not require a functioning hippocampus by testing a group of patients with hippocampal amnesia on the richness and depth of semantic knowledge for a range of different word types. They found that the amnesic participants performed worse than matched controls in both productive and expressive measures, suggesting that the hippocampus is involved in the maintenance of semantic representations well beyond initial acquisition. With specific reference to lexical ambiguity, they tested their patients on a set of ambiguous words and asked them to list as many senses as they could. Compared with both healthy comparison participants and patients with brain damage that spared the medial temporal lobe, the amnesic patients retrieved significantly fewer senses.
Similar results were observed in a semantic features task and a word association task. These results suggest that a functioning hippocampus is needed for the maintenance and updating of rich semantic representations of lexical knowledge, including lexically ambiguous words. Further, they strengthen the viability of our alternative model of word-meaning priming that incorporates hippocampal involvement in the perception and retention of the sentences used as context for exposure to ambiguous words.
Existing evidence is therefore largely consistent with two quite different explanations of how word-meaning priming operates. The preferred explanation up to now is one in which each new experience with an ambiguous word immediately alters the long-term cortical lexical connections between form and meaning in favour of the contextually appropriate meaning (e.g., Gilbert et al., 2018). The alternative explanation is that new experiences with ambiguous words recruit hippocampal resources to temporarily bind the word with its context to provide an additional source of knowledge alongside permanent lexical knowledge. These two explanations can be differentiated in terms of their predictions relating to systems consolidation of word-meaning priming. For the immediate alteration account, there is little reason to suspect that systems consolidation might be relevant. The learning has already taken place in the cortex, and so the knowledge has already been made permanent. On the other hand, the contextual binding account suggests that systems consolidation via hippocampal replay during sleep will gradually integrate this new piece of information to ensure that long-term lexical semantic representations remain rich and up to date.
We can test these predictions by looking further into the time-course of word-meaning priming, this time comparing periods spent awake and asleep. If sleep facilitates hippocampal replay of memories then we should see a benefit of sleep over wake in terms of the strength of word-meaning priming after the sleep/wake period. More specifically, slow-wave sleep and/or sleep spindles should be beneficial for strengthening or maintenance of word-meaning priming, given that slow-wave sleep is understood to be the main stage in which hippocampal replay occurs, marked by spindle activity (Rasch & Born, 2013). On the other hand, if word-meaning priming occurs via direct rewiring of cortical lexical connections, then sleep should have no particular active role to play in preserving word-meaning priming effects. That said, it remains possible that sleep might have some passive role to play in maintaining word-meaning priming by protecting it against interference or decay (Ellenbogen, Hulbert, Stickgold, Dinges, & Thompson-Schill, 2006). We return to this point in the light of the evidence provided by Experiment 1.

Experiment 1
Testing in Experiment 1 involved two sessions separated by a delay that contained either sleep or wake (see Fig. 1). In Session 1 we exposed participants to homophones (e.g., pen) in spoken sentence contexts. As in previous studies (e.g., Rodd et al., 2013, 2016), the sentences always biased comprehension of the ambiguous word towards the weaker, subordinate meaning (e.g., "A pen was used by the farmer to enclose the stock before he moved them to the market."). The subordinate direction of bias was chosen for comparability with previous studies and because we thought that this was the circumstance that was most likely to reveal strong effects. There were three counterbalanced sets of items, and participants heard sentences for two of the three sets, with the third set forming the control unprimed items (see Fig. 1). After a short filler task, we then tested participants' interpretations of a subset of the homophones using a word-association test (the unprimed set, plus one of the primed sets). The crucial question was whether the proportion of associate responses related to the subordinate meaning (e.g., pen-pig) was higher than for a set of matched homophones that had not been primed during exposure. A second word-association test was conducted using the second of the primed sets of ambiguous words in Session 2 after a delay period that included sleep or wake. This allowed us to determine whether the priming varied as a consequence of the content of the delay. For a subsidiary analysis, the Session 2 word-association test also included the other two sets of previously tested words in order to determine whether repeated testing influenced the pattern of performance after a delay. We implemented two protocols for the delay. In one, the delay was ∼12 h outside the lab and began either in the morning (wake) or in the evening (sleep).
This kind of design is common in sleep and memory studies because it can assess the potential benefit of a full night of nocturnal sleep. However, this design is also susceptible to circadian confounds in terms of the mental state of the participants at training and test (Doyon et al., 2009). The second protocol employed a shorter delay period taken in the lab and matched sleep and wake groups in terms of time of day of exposure and testing. This is also a commonly used design because it eliminates any circadian confound, but with the potential cost of a less substantial difference between groups in terms of the time spent in sleep. For the second protocol the sleep group had a polysomnographically recorded afternoon nap whereas the wake group watched a silent film.

Participants
Participants were undergraduates at the University of York with English as their first language. In total, 74 participants were tested in Experiment 1, evenly split between sleep and wake groups. To provide adequate power for the individual differences analyses that were conducted on the participants who had their afternoon nap polysomnographically recorded, 44 participants were allocated to the 2-h protocol, with the remaining 30 allocated to the 12-h protocol. The four groups were well-matched for mean ± SD age in years (2-h sleep 20.8 ± 2.3; 2-h wake 20.4 ± 0.9; 12-h sleep 20.5 ± 0.6; 12-h wake 20.6 ± 0.9) and sex (each group contained 4 males). Participants gave informed consent for the experiment, which was approved by the Research Ethics Committee of the Department of Psychology, University of York. As assessed by self-report, participants were nonsmokers, not taking any psycho-active medication, had no prior history of drug or alcohol abuse, no neurological, psychiatric or sleep disorder, and had a sleep-wake pattern where they typically rose by 9 am at the latest each morning, after a duration of at least 6 h of sleep per night. For the 24 h before the experiment and during the experiment, participants were asked to abstain from alcohol, caffeine and smoking, and to awaken by 8 am on the day of the experiment.
M.G. Gaskell et al. Cognition 182 (2019) 109-126

Materials
The key materials (see Appendix A) were 87 homophonic spoken words and sentence contexts previously used in word-meaning priming studies (e.g., spade; Rodd et al., 2016, Experiment 2; Betts, Gilbert, Cai, Okedara, & Rodd, 2018). Prior dominance ratings had established for each word a dominant meaning (e.g., digging implement; mean dominance = 0.70; SD = 0.14) and a subordinate meaning (e.g., suit in card game; mean dominance = 0.23; SD = 0.13). For each word, a sentence context provided a prior biasing context that clearly indicated the subordinate meaning (e.g., "The gambler knew that his opponent wanted a spade"). Each sentence was also matched with a semantically unrelated probe word (e.g., hair) that was used in a relatedness matching task during exposure to ensure that participants attended to meaning. The test sentences were divided into three matched sets for the purposes of counterbalancing.
Thirty spoken filler sentences were included in the exposure phase in Session 1 alongside the test materials. These were similar in construction to the test sentences, but had probe words that were semantically related to the sentences (e.g., "Soup had spilled all over the counter", broth). The sentences and the isolated ambiguous words had been recorded by a female native British English speaker (Rodd et al., 2016).
The Stanford Sleepiness Scale (SSS; Hoddes, Zarcone, Smythe, Phillips, & Dement, 1973) was used to identify rated sleepiness at key points in the experiment, and a simple RT test provided a second source of evidence relevant to sleepiness. Participants also filled out general sleep quality and language background questionnaires.
Two videos were used in the experiment. Between exposure and Test 1, an 8-minute episode of "Shaun the Sheep" (http://shaunthesheep.com/) occupied the participants. For the 2-h delay period in the wake condition participants watched an animated film (L'Illusioniste; Chomet, 2010). Both these videos were selected because they had minimal linguistic content.

Polysomnography
Polysomnographic monitoring was conducted on participants in the two-hour sleep condition using an Embla N7000 system with RemLogic 3.0 software in our sleep lab. Electrodes were attached using EC2 electrode gel after the scalp was prepared using NuPrep exfoliator. Scalp electrodes were attached at O1, O2, C3, C4, F3 and F4 according to the international 10-20 system, each referenced to the contralateral mastoid (A1, A2). Left and right electrooculogram (EOG) electrodes were attached, as were electromyography electrodes at the mentalis and submentalis bilaterally, and a ground electrode was attached to the forehead. EEG electrodes had a connection impedance of < 5 kΩ and all signals were digitally sampled at 200 Hz. Sleep data were bandpass filtered (0.3-40 Hz) and scored in 30 s epochs using RemLogic 3.0 in accordance with the standardised sleep scoring criteria of the American Academy of Sleep Medicine (Iber, 2007). All scoring was carried out by a single trained researcher (the second author) based on the frontal channels, allowing determination of the time spent in each sleep stage (N1, N2, N3 and REM).
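The epoch arithmetic behind the stage durations can be sketched as follows. The 30 s epoch length and 200 Hz sampling rate come from the text; the hypnogram itself, the stage label strings and the function names are hypothetical, and the sketch does not reproduce RemLogic or the AASM scoring rules.

```python
from collections import Counter
from typing import Dict, Sequence

EPOCH_SECONDS = 30      # scoring epoch length stated in the text
SAMPLE_RATE_HZ = 200    # sampling rate stated in the text


def samples_per_epoch() -> int:
    """Number of samples in one scoring epoch."""
    return EPOCH_SECONDS * SAMPLE_RATE_HZ


def minutes_per_stage(hypnogram: Sequence[str]) -> Dict[str, float]:
    """Convert a per-epoch list of stage labels into minutes per stage."""
    counts = Counter(hypnogram)
    return {stage: n * EPOCH_SECONDS / 60 for stage, n in counts.items()}


# Hypothetical hypnogram for a short nap: one stage label per 30 s epoch.
hypnogram = ["W"] * 10 + ["N1"] * 8 + ["N2"] * 60 + ["N3"] * 40 + ["REM"] * 12
durations = minutes_per_stage(hypnogram)
print(samples_per_epoch(), durations)
```

Durations computed this way (for example minutes of N3 or REM) are the kind of per-participant values that could enter an individual differences analysis.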
For epochs scored as either N2 or N3 (i.e., those that could contain spindles), artefacts were eliminated by visual inspection before a 12-15 Hz linear finite impulse response filter was applied to frontal and central channels. An automated detection algorithm (Ferrarelli et al., 2007) counted discrete spindle events as amplitude modulations that exceeded a threshold of 8 × mean filtered amplitude for that channel.
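As a rough illustration of threshold-based spindle counting, the sketch below counts contiguous runs in which an amplitude envelope exceeds 8 × its mean. It is a simplified stand-in and not the Ferrarelli et al. (2007) algorithm: it assumes the signal has already been band-pass filtered and rectified, uses a single threshold, and all function names and the toy signal are hypothetical.

```python
from typing import Sequence


def count_spindle_events(envelope: Sequence[float], threshold_factor: float = 8.0) -> int:
    """Count contiguous runs where the envelope exceeds threshold_factor * mean.

    `envelope` is assumed to be the rectified, 12-15 Hz filtered amplitude
    of one channel (an assumption of this sketch).
    """
    if not envelope:
        return 0
    threshold = threshold_factor * (sum(envelope) / len(envelope))
    events = 0
    above = False
    for value in envelope:
        is_above = value > threshold
        if is_above and not above:
            events += 1  # a new supra-threshold run starts here
        above = is_above
    return events


def spindle_density(events: int, nrem_minutes: float) -> float:
    """Spindle events per minute of the scored N2/N3 sleep."""
    if nrem_minutes <= 0:
        raise ValueError("nrem_minutes must be positive")
    return events / nrem_minutes


# Toy envelope: flat background with two brief high-amplitude bursts.
toy = [1.0] * 1000
toy[100:105] = [50.0] * 5
toy[600:605] = [50.0] * 5
n_events = count_spindle_events(toy)
print(n_events, spindle_density(n_events, 50.0))
```

A published detector would typically also apply duration criteria and separate upper and lower thresholds; those details are deliberately omitted here.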
As stated in the introduction, if word meaning priming is supported by hippocampal replay during sleep then one would predict that duration of N3 (slow-wave sleep) should be a predictor of change in priming over sleep, as should the density of spindle activity (Mednick et al., 2013;Rasch & Born, 2013;Tamminen et al., 2010). Nonetheless, some have argued that improvements in priming may be driven by REM sleep (Cai, Mednick, Harrison, Kanady, & Mednick, 2009;Plihal & Born, 1999;Stickgold, Scott, Rittenhouse, & Hobson, 1999), so we also tested the subsidiary prediction that there might be an association between duration of REM sleep and changes in word-meaning priming over sleep.
In the 12-h sleep condition, participants were asked to wear a commercial sleep recording device (Zeo; Zeo Inc.). However, equipment failure, noncompliance and problems with the headband slipping off led to absent or incomplete sleep data for the majority of participants in this condition and the remaining data were not analysed further.

Design
The main dependent variable was the proportion of word associates generated in response to the test material that matched the sententially primed subordinate meaning of the word as opposed to the dominant meaning. Two between-participants independent variables manipulated the presence or absence of sleep in the interval between testing points (sleep vs. wake) as well as the interval duration (2 h vs 12 h). Within participants, the main analysis manipulated three levels of the variable priming: control performance without priming during exposure was assessed in Session 1 (unprimed), whereas half the primed items were tested in Session 1 (primed-early) and the remainder in Session 2 (primed-delayed). The three matched subsets of the 87 test items were rotated across these three conditions in a counterbalanced fashion so that all items were used in all conditions across the whole experiment. The use of three separate sets of items ensured that any comparison of priming between test points did not involve the repeated presentation of homophones in the word-association test. However, we thought it prudent to include all sets of items in the second word-association test as a secondary measure of any effect of priming. Therefore we were also able to look at priming specifically in the repeated items in Session 2 as a separate analysis.
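The rotation of item sets across conditions can be sketched as a simple Latin-square assignment. This is an illustrative assumption about how such a rotation might be coded: the version numbering and function names are hypothetical, and the item sets in the study were matched rather than dealt out mechanically as they are here.

```python
from typing import Dict, List, Sequence

CONDITIONS = ("unprimed", "primed-early", "primed-delayed")


def split_items(items: Sequence[str], n_sets: int = 3) -> List[List[str]]:
    """Deal items into n_sets equal-sized sets (placeholder for matched sets)."""
    if len(items) % n_sets:
        raise ValueError("items cannot be split into equal sets")
    return [list(items[i::n_sets]) for i in range(n_sets)]


def assign_sets(version: int) -> Dict[int, str]:
    """Map item-set index (0-2) to a condition for counterbalancing version 0-2."""
    return {item_set: CONDITIONS[(item_set + version) % 3] for item_set in range(3)}


items = [f"item{i:02d}" for i in range(87)]  # 87 test items, as in the text
sets = split_items(items)

for version in range(3):
    print(version, assign_sets(version))
```

Across the three versions each item set serves once in each condition, which is what allows every item to contribute to every condition over the whole experiment.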

Procedure
Participants signed up for either the 2-h or the 12-h version of the experiment, but were randomly allocated to the wake or sleep group within these two protocols. Both sessions took place in individual study rooms or bespoke testing booths at the University of York. For the 12-h protocol one session took place between 7 and 11 am and the other between 7 and 11 pm. The wake group had the morning session first, followed that evening by the second. The sleep group had the reverse order, with the sessions separated by a night's sleep. For the 2-h protocol, Session 1 began at 1 pm, with Session 2 beginning roughly 2 h after the end of Session 1.
2.1.5.1. Session 1. All participants were first asked to complete a language background questionnaire to assess if they were monolingual British English speakers and to determine their level of experience with other languages. Participants then completed the SSS to assess their alertness at the start of the experiment. Other experimental tasks were completed on PC or laptop computers with DMDX software (Forster & Forster, 2003) using headphones for the spoken stimuli. Responses were recorded via a computer keyboard. Participants completed the simple RT test, followed by the exposure task (7 min). They then watched the 8-min filler video before taking the word-association test for the first time.
2.1.5.2. Delay period. In the 12-h conditions participants left the lab and went about their normal activities before returning to the lab 12 h later. In the 2-h condition, the sleep participants had a 90-min nap opportunity with polysomnographic monitoring followed by a 10-min rest period to minimise the influence of sleep inertia. In the wake condition, participants watched the film L'Illusioniste for 90 min and then had a 10-min rest.
2.1.5.3. Session 2. Participants completed the SSS and the simple RT test followed by the second, longer word-association test that included items from all three counterbalancing sets.

The tasks.
1. Simple RT. Participants were presented with 18 trials of a simple decision task to assess alertness (Reid, 2013). On each trial they saw a fixation cross for 500 ms before seeing two digits (ordered as "10" or "01"). Their task was to press one of two keys depending on whether the 1 was on the left or the right of the pair. Nine trials of each type were used in a random order. Participants were asked to respond as quickly and accurately as possible.
2. Semantic relatedness exposure task. Participants completed a practice block of five trials before completing all 88 experimental trials (58 test trials, 30 fillers), taking a short break at the halfway point. Each trial began with a fixation cross for 1000 ms, followed by the presentation of a sentence over headphones. Following the completion of the sentence there was a 1000 ms delay and then the probe word was presented visually. Participants were asked to use the shift buttons on the keyboard to indicate whether they thought the probe was semantically related to the sentence (related = right, unrelated = left). They were asked to respond as quickly and accurately as possible. Three counterbalanced versions of the test were used, each containing two of the three 29-item sets, leaving the third item set as unprimed.
3. Word-association task. After 5 practice items, each trial began with a fixation cross for 1000 ms, and then a homophone item (e.g., "pen") was presented as a cue via headphones. A response cue followed 200 ms after the word offset and participants were asked to type in the cue word (to check whether it was perceived correctly), and then were instructed to type "the first word that comes to mind that is related to the word you heard". Participants were given the example of "tennis-Wimbledon". In Session 1, three counterbalanced versions of the experiment were used, each containing the unprimed item set plus one of the two primed item sets (58 trials in total). In Session 2, all three item sets were used (87 trials in total). In each case, the order of trials was randomised afresh.
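The trial counts described above can be checked with a minimal sketch of the counterbalancing. The set labels ("A", "B", "C"), the placeholder item names, the function names and the particular rotation of sets across versions are assumptions made for illustration; the text specifies only the set sizes, the number of versions and the trial totals.

```python
import random

SET_NAMES = ("A", "B", "C")  # hypothetical labels for the three item sets
ITEMS_PER_SET = 29


def simple_rt_trials(rng):
    """18 alertness trials: nine '10' and nine '01' displays in random order."""
    trials = ["10"] * 9 + ["01"] * 9
    rng.shuffle(trials)
    return trials


def build_version(version, seed=0):
    """Assign the three item sets to conditions for one counterbalancing version.

    Assumed rotation: set `version` is unprimed, the next set is primed and
    tested in Session 1, the remaining set is primed and first tested in
    Session 2.
    """
    rng = random.Random(seed)
    # Placeholder item identifiers such as "A01"; the real items are homophones.
    items = {name: [f"{name}{i:02d}" for i in range(1, ITEMS_PER_SET + 1)]
             for name in SET_NAMES}
    unprimed = SET_NAMES[version % 3]
    primed_s1 = SET_NAMES[(version + 1) % 3]
    primed_s2 = SET_NAMES[(version + 2) % 3]

    exposure = items[primed_s1] + items[primed_s2]   # test sentences only, fillers omitted
    session1 = items[unprimed] + items[primed_s1]    # word-association cues, Session 1
    session2 = items[unprimed] + items[primed_s1] + items[primed_s2]  # Session 2
    for trial_list in (exposure, session1, session2):
        rng.shuffle(trial_list)  # order randomised afresh for each list
    return {
        "unprimed": unprimed,
        "primed_s1": primed_s1,
        "primed_s2": primed_s2,
        "exposure": exposure,
        "session1": session1,
        "session2": session2,
    }
```

Under these assumptions each version yields 58 exposure test trials, 58 Session 1 cues and 87 Session 2 cues, and each item set serves as the unprimed set in exactly one version.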

Sleep questionnaire and Stanford Sleepiness Scale
According to the sleep questionnaire, wake participants had a mean bedtime in the last month of 23:44 (SD = 62 min) and sleep participants had a mean bedtime of 23:56 (SD = 72 min; t(72) = −0.78, p = 0.44). The groups were also well matched on estimated typical sleep duration (mean ± SD: wake = 7.78 ± 1.01 h, sleep = 7.97 ± 1.03 h; t(71) = −0.79, p = 0.43). The vast majority of participants reported that they had not taken any prescription medicine to help them sleep in the last month (wake: 36/37; sleep: 36/37). An ANOVA on participant SSS ratings with the within-participants independent variable Session (1 vs. 2) and between-participants variables Interval Duration (2 h/12 h) and Group (sleep/wake) revealed a main effect of Session (F[1, 68] = 8.10, p = 0.006). Participants rated their sleepiness to be higher on the scale in Session 2 (2.97) than Session 1 (2.49). No other effects or interactions were statistically significant, although the interactions between Session and Interval Duration (F[1, 68] = 3.42, p = 0.069) and Session and Group (F[1, 68] = 3.01, p = 0.069) approached significance. As a further precaution, we tested whether sleepiness ratings could explain priming levels by running correlations between rated sleepiness and level of priming in each Session (see below). Out of the four tests (wake-Session 1, sleep-Session 1, wake-Session 2, sleep-Session 2), only the correlation for wake participants in Session 1 was significant at p < 0.05 uncorrected for multiple comparisons (rτ = 0.26, p = 0.043), providing a weak hint that priming effects might be stronger when participants are sleepier. Crucially, the main pattern of priming change across sessions could not be explained in this way (in fact it runs contrary to the levels of sleepiness in Session 2).

Simple RT
The RT test was a secondary measure of alertness. An ANOVA on participant mean RTs with the same independent variables as above revealed a significant interaction between Session and Interval Duration (F[1, 64] = 4.80, p = 0.032). For participants in the nap study, RTs varied very little across sessions (mean ± SEM; Session 1: 393 ± 10 ms, Session 2: 397 ± 10 ms), whereas there was a little more variation across sessions for the 12 h protocol (Session 1: 395 ± 11 ms, Session 2: 380 ± 11 ms). The Session × Group interaction was also significant (F[1, 64] = 5.65, p = 0.020), with sleep participants improving their RTs slightly across the delay (400 ± 10 vs 384 ± 10 ms) and wake participants getting a little slower (388 ± 10 vs 393 ± 10 ms). These effects are indicative of small differences in alertness in the different groups and sessions, and so to determine whether they could explain the main priming effects for the word association test, we ran correlations between every participant's RT speed and level of priming in the two sessions. None of these approached significance. In sum, both the SSS and RT analyses suggested that the pattern of priming found in the word association test could not be explained in terms of any confound of alertness or sleepiness.

Word association responses
Obvious spelling mistakes were corrected, and word association responses were coded by two experimenters who were blinded as to the experimental condition. The coding categories along with the number and percentage of responses that fell into the categories were as follows.
The analyses focused on the 95% of responses from the first two categories, with the proportion of these trials that were consistent with the primed meaning being the dependent variable (see Table 1). Mixed effects logistic analyses used the glmer function from the lme4 package (Bates, Maechler, Bolker, & Walker, 2014) of R (R Core Team, 2013). The binary between-participant independent variables Group (wake vs sleep) and Interval Duration (2 h vs 12 h) were effect coded. Within participants, the independent variable Priming had three levels (Unprimed, Primed-Session 1, Primed-Session 2), which represented the presence or absence of priming, and the session of testing. This variable was coded using orthogonal Helmert contrasts that compared: (i) the unprimed condition vs the two combined primed conditions (Priming1) and (ii) the two primed conditions with each other (Priming2). Our analysis method followed Jaeger (2011; https://hlplab.wordpress.com/2011/06/25/more-on-random-slopes/) by building the maximal model, including the random-effects structures for items and participants that were justified by the sample (Barr, Levy, Scheepers, & Tily, 2013), and then enhancing the power of the analysis (Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017) by reducing the random effects structure of the model to the simplest structure that was not substantially different in terms of its explanatory value (using anova model comparison and a threshold of chisq < 3). Likelihood of convergence was enhanced by using the "bobyqa" optimiser and the "maxfun = 2e+05" setting. If necessary to help interpret interactions, pairwise comparisons were conducted on the mixed effects model using the R package "emmeans" with Holm correction for multiple comparisons. In these cases, uncorrected p-values are reported, with those that remain significant at alpha < 0.05 after correction marked with an asterisk.
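To make the coding scheme concrete, the following is a minimal sketch of how the predictors described above could be represented numerically. The specific numeric codes (±1/2 for the effect-coded factors, and −2/3, 1/3, 1/3 and 0, −1/2, 1/2 for the Helmert contrasts) and the function names are assumptions for illustration; the text states only that effect coding and orthogonal Helmert contrasts were used, not their scaling or sign.

```python
from fractions import Fraction as F

# Effect codes for the two binary between-participant factors.
# The +/-1/2 scaling is an assumption; +/-1 would be an equally valid choice.
GROUP = {"wake": F(-1, 2), "sleep": F(1, 2)}
INTERVAL = {"2h": F(-1, 2), "12h": F(1, 2)}

# Helmert codes for the three-level Priming factor: (Priming1, Priming2).
# Priming1 contrasts Unprimed with the two primed levels combined;
# Priming2 contrasts the two primed levels (Session 1 vs Session 2).
PRIMING = {
    "Unprimed":  (F(-2, 3), F(0)),
    "Primed-S1": (F(1, 3), F(-1, 2)),
    "Primed-S2": (F(1, 3), F(1, 2)),
}


def contrasts_are_orthogonal(codes):
    """True if both contrast columns sum to zero and their dot product is zero."""
    col1 = [c[0] for c in codes.values()]
    col2 = [c[1] for c in codes.values()]
    dot = sum(a * b for a, b in zip(col1, col2))
    return sum(col1) == 0 and sum(col2) == 0 and dot == 0


def design_row(group, interval, priming):
    """Fixed-effect predictor values for one trial, including the key interaction."""
    g = GROUP[group]
    d = INTERVAL[interval]
    p1, p2 = PRIMING[priming]
    return {
        "Group": g,
        "Interval": d,
        "Priming1": p1,
        "Priming2": p2,
        "Group:Priming1": g * p1,
        "Group:Priming2": g * p2,  # tests whether the session change differs by group
    }
```

With codes scaled this way, the Priming1 coefficient in a balanced design corresponds to the difference between the average of the two primed conditions and the unprimed condition, and the Priming2 coefficient to the Session 2 minus Session 1 difference for primed items, both on the log-odds scale.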
This analysis, reported in detail in Table 2, revealed an effect of Priming1, showing that, overall, there were more responses consistent with the prime sentences for the primed items than the unprimed items. More importantly, there was an interaction between Group and Priming2. This showed that the effect of Session for the primed items varied significantly depending on Group (see Fig. 2), with the percentage of prime-sentence consistent responses dropping from 32.9% to 28.8% across a wake interval (Z = 2.07, p = 0.039) but rising from 29.1% to 31.7% across an interval including sleep (Z = 1.00, p = 0.32). Aside from an intercept effect, no other effects were significant.
A subsidiary analysis examined just the two sets of items in Session 2 that had already been tested in Session 1 (i.e., participants had already generated an associate response to these items in Session 1). This analysis followed along the same lines as the first analysis, except that there were only two levels of the Priming variable (unprimed vs primed) and so this variable was effect-coded. Note that if these repeated testing items showed the same pattern of responses as the unrepeated items above then one would expect stronger priming for the sleep group than the wake group. In fact, there was no evidence for such a difference (see Tables 1 and 2). Instead there was an overall effect of Priming, with similar levels of priming in the sleep and wake groups (see Fig. 3). No other effects reached significance level apart from the intercept.

Note. M and SD represent mean and standard deviation, respectively. S1 = Session 1, S2 = Session 2, R = repeated.

Sleep data
Total sleep time for the participants who napped in the sleep lab was 66.9 ± 3.0 min (mean ± SEM). Of this, 13.9 ± 1.5 min was spent in stage N1, 26.5 ± 2.6 min in N2 and 21.0 ± 2.9 min in N3 (slow-wave sleep). All but 4 participants reached stage N3. The duration of REM sleep was 5.5 ± 1.5 min, although only 11 of 22 participants reached this stage of sleep in their nap. We tested whether time in minutes in N3 or REM sleep (after controlling for total sleep time) was correlated with the change in a participant's level of prime-sentence consistency for primed items from Session 1 to Session 2 (i.e., Primed-S2 − Primed-S1), but neither partial correlation approached significance level (N3: r = 0.20, p = 0.38; REM: r = 0.05, p = 0.83). We also analysed the sleep data in terms of the prevalence of sleep spindles (spindle density per minute) during N2 and N3 stages of the nap (Ferrarelli et al., 2007). In the absence of any prior expectation of one particular electrode location being influential, we opted to use the average density across frontal and central electrodes as our predictor, but this was not associated with the behavioural change measure (r = 0.067, p = 0.767).
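The partial correlations reported above (sleep-stage duration against change in priming, controlling for total sleep time) can be illustrated with a minimal sketch. It assumes a first-order partial correlation built from Pearson coefficients; the text does not state which correlation coefficient was used, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    Here x would be minutes in N3 (or REM), y the change in priming
    (Primed-S2 minus Primed-S1), and z total sleep time.
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Controlling for total sleep time matters here because participants who sleep longer will tend to accumulate more minutes in every stage, so a raw correlation could reflect overall sleep duration rather than a specific stage.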

Discussion
Experiment 1 was the first attempt to examine the robustness of word-meaning priming over relatively extended periods of time (2 and 12 h) that varied in terms of whether the intervening period included sleep or not. We found that the change in priming across the interval depended on whether the participant slept or remained awake during the interval. Participants who stayed awake throughout the interval showed reduced priming at the final test (−4.0%), whereas participants whose interval included sleep showed stronger priming (+2.5%). The two protocols we used varied substantially in terms of the duration of the interval (2 h vs. 12 h) and the way in which the intervening period was filled (short sleep in the lab vs. long sleep at home; watching silent video vs. conducting normal daytime activities). As one might expect, the sleep effects were numerically a little stronger for the 12-h manipulation than the 2-h manipulation, but there was no significant effect of interval duration, suggesting that the sleep benefits in terms of word-meaning priming applied across a range of durations.
This behavioural effect is consistent with the notion that the sleep interval facilitated the consolidation of word-meaning priming. This was a prediction of the contextual binding account, whereby ambiguous words are associated with their sentential context during comprehension. This new association is then consolidated during sleep in much the same way that memory for pairs of associated words is consolidated during sleep (e.g., Plihal & Born, 1997). If we had observed a significant correlation between the changes in performance and slow-wave sleep duration or spindle activity for the participants napping in the lab, this would have strengthened the case for the effect being driven by consolidation during sleep. Nonetheless, the absence of a significant correlation is not strong evidence against sleep having a causal role in the preservation of word-meaning priming. As mentioned, although the key effects we observed did not interact with interval duration, the nap participants still showed numerically weaker priming after sleep (4%) as compared with the overnight participants (8%). Therefore it is possible that the behavioural effects after a short sleep were too weak to reveal correlations with sleep parameters in our sample. This point is particularly pertinent to our subsidiary test of whether REM sleep might be associated with changes in the strength of word-meaning priming. Because the nap was an afternoon nap it was not optimised for variation in REM duration: only half the participants actually reached REM sleep. Although it should be noted that nap studies have in the past been useful for understanding the involvement of REM sleep in memory consolidation (e.g., Batterink, Westerberg, & Paller, 2017), further research (e.g., using overnight sleep recording) would be valuable to identify what aspects of sleep, if any, are influential.
The main results were derived comparing sets of materials that were tested for the first time either in Session 1 or Session 2. A secondary result of interest was the pattern of priming for words that were tested in both sessions. In contrast to the results for unrepeated items, the wake and sleep groups showed similar performance across sessions for the repeated items. That is, the drop-off in priming for the wake participants using unrepeated stimuli was not seen for the repeated items. This suggests that the act of generating an associate of a word can in itself represent a learning experience, but this effect is not differentially affected by sleep relative to wakefulness. We return to this point in the discussion of Experiment 2.
Although the key results of Experiment 1 are consistent with the contextual binding account, it is worth considering whether they are strong evidence against the immediate alteration account that has been previously put forward as an explanation of word-meaning priming. This account would not necessarily predict a consolidation effect, but it might predict that interference from new episodes of learning reduces the strength of word-meaning priming effects over time. As Rodd et al. (2016, p. 35) suggested, "One possibility is that the decay function could arise purely due to interference from intervening encounters with other unrelated words: each such encounter would result in weight changes, which could potentially influence even apparently unrelated words because these may share some connections within the highly interconnected distributed network." Therefore the benefit of sleep over wake might simply be one of passive protection from interference by reducing the opportunity for encoding of new experiences. This debate over active versus passive accounts of sleep benefits in memory is a familiar one (e.g., Ecker, Brown, & Lewandowsky, 2015; Ellenbogen, Payne, & Stickgold, 2006; Walker & Stickgold, 2010), and studies exploiting methods for enhancing sleeping brain oscillations (Ngo, Martinetz, Born, & Mölle, 2013) and reactivating targeted memories during sleep (cf. Paller, 2017) have largely favoured an active account for memories of object location or word pairs. Nonetheless we cannot automatically assume that word-meaning priming is supported by those same active mechanisms.
In Experiment 2 we attempted to tease apart active and passive accounts of the benefit of sleep for word-meaning priming by extending the delay between training and testing. As in the 12-h delay protocol of Experiment 1, we gave two groups of participants an exposure session and initial word-association test either in the evening or the morning and then left them to go about their normal activities outside the lab. However, instead of recalling the participants after 12 h, we recalled them after 24 h. These two groups should therefore have equivalent amounts of sleep and equivalent amounts of time spent awake, with the only difference being whether wake comes before or after sleep.
We know from Experiment 1 that word-meaning priming largely dissipates after 12 h spent awake. Therefore, for a passive account of the sleep benefit in Experiment 1 to hold, both groups in Experiment 2 should show equally dissipated word-meaning priming effects after 24 h, because both groups have spent a whole day awake encoding experiences that should interfere with the word-meaning priming effect. On the other hand, if sleep actively stabilizes word-meaning priming then the participants who sleep soon after exposure should show stronger effects after 24 h than those who have had a whole day for word-meaning priming to decay prior to sleep. In other words, sleep should largely crystallize the benefit found in Experiment 1 after 12 h. For the participants encoding in the morning, we know that there is very little priming left to crystallize overnight, whereas those encoding in the evening still show a strong word-meaning priming effect prior to sleep, which should then be preserved over the next 24 h (with perhaps some forgetting).
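The contrasting predictions can be made explicit with a deliberately simple toy model. Everything numeric here is hypothetical: the exponential decay during wake, the hourly retention rate, the 14 h / 8 h / 2 h schedules and the assumption that stabilised priming undergoes no further forgetting are illustrative choices, not values or mechanisms taken from the experiments.

```python
def remaining_priming(schedule, account, initial=1.0, hourly_retention=0.9):
    """Toy prediction of priming strength at the end of a wake/sleep schedule.

    schedule: sequence of (state, hours) pairs, with state "wake" or "sleep".
    account:  "passive" (sleep merely pauses interference) or
              "active"  (sleep stabilises whatever priming remains).
    """
    if account not in ("passive", "active"):
        raise ValueError(f"unknown account: {account}")
    strength = initial
    stabilised = False
    for state, hours in schedule:
        if state == "sleep":
            # No interference during sleep under either account;
            # only the active account protects against later wake decay.
            if account == "active":
                stabilised = True
        elif state == "wake":
            if not stabilised:
                strength *= hourly_retention ** hours
        else:
            raise ValueError(f"unknown state: {state}")
    return strength


# Hypothetical 24-h schedules with equal totals of wake (16 h) and sleep (8 h).
AM_SCHEDULE = [("wake", 14), ("sleep", 8), ("wake", 2)]   # exposure in the morning
PM_SCHEDULE = [("wake", 2), ("sleep", 8), ("wake", 14)]   # exposure in the evening
```

Under the passive account the two schedules give the same outcome, because total time awake is identical. Under the active account the PM schedule retains more priming, because little has decayed before sleep stabilises it. The sketch ignores the possibility, noted above, of some forgetting even after stabilisation.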

Participants
Participants were undergraduates at the University of York with English as a first language. In total, 55 participants were tested, with participants randomly assigned to an exposure session either in the evening (PM group; N = 28) or in the morning (AM group; N = 27). The groups were well-matched for mean ± SD age in years (PM 19.7 ± 1.7; AM 19.5 ± 1.5) and sex (PM 7 males, AM 4 males). Participant instructions, self-report characteristics and ethical review were as in Experiment 1.

Design, materials and procedure
The design matched Experiment 1 with a small number of exceptions. Rather than a between-participants manipulation of interval duration, all participants had a 24-h interval between testing points (see Fig. 1). Furthermore, the key difference between the two participant groups was no longer sleep versus wake. Instead, the key difference was the time of day of both tests, either in the morning (AM group) or evening (PM group). Within participants, the manipulation of priming was the same. The materials were identical to Experiment 1, and the procedure matched the 12-h interval condition of Experiment 1, except of course for the duration of the interval, which was always 24 h.

Sleep questionnaire and Stanford Sleepiness Scale
According to the sleep questionnaire, AM participants had a mean ± SD estimated bedtime in the last month of 23:42 ± 55 min and PM participants had a typical bedtime of 23:48 ± 50 min (t(53) = −0.39, p = 0.70). The groups were also well matched on mean ± SD estimated typical sleep duration (AM: 7.81 ± 0.91 h, PM: 7.96 ± 0.78 h; t(53) = −0.70, p = 0.49). All participants apart from one reported that they had not taken any prescription medicine to help them sleep in the last month (AM: 27/27; PM: 27/28). An ANOVA on participant SSS ratings with the within-participants independent variable Session and between-participants variable Group revealed only a significant interaction between Session and Group (F[1, 53] = 4.66, p = 0.035). The groups were well matched on mean ± SD SSS score in Session 1 (AM: 2.37 ± 0.69, PM: 2.36 ± 0.62), but for the AM group sleepiness ratings rose in Session 2 (2.70 ± 0.87), whereas they showed a slight fall for the PM group (2.21 ± 0.96). It is unclear why this shift may have taken place, but as in Experiment 1, we tested whether sleepiness ratings could explain priming levels in the word association test by running correlations between every participant's rated sleepiness and level of priming. None of the four tests revealed any association that approached significance level uncorrected for multiple comparisons (all ps > 0.11).

Simple RT
An ANOVA on participant mean RTs with the same independent variables as above revealed only a main effect of Session (F[1, 49] = 33.58, p < 0.001), with both groups responding more quickly in Session 2 (mean ± SEM; 385 ± 5 ms) than Session 1 (424 ± 9 ms). As with the sleepiness ratings, we ran correlations between every participant's mean RT and level of word association priming in the two sessions. None of these approached significance level. In sum, both the SSS and RT analyses suggested that the pattern of priming found in the word association test could not be explained in terms of any confound of alertness or sleepiness.

Word association responses
Word association responses were again coded blind as to the experimental condition. We used the same excluded categories, with the proportion of the remaining trials that were consistent with the primed meaning being the dependent variable (see Table 3 and Fig. 4). Mixed effects logistic analyses (see Table 4) followed the same analysis protocol, except that there was only one effect-coded between-participants independent variable: Group (AM vs PM). Within participants, the independent variable Priming again had three levels (Unprimed, Primed-Session 1, Primed-Session 2), coded using orthogonal Helmert contrasts that compared: (i) the Unprimed condition vs. the two Primed conditions (Priming1) and (ii) the two primed conditions with each other (Priming2).
As in Experiment 1, there was an overall effect of Priming1, showing that there were more responses consistent with the prime sentences for the primed items than the unprimed items. Here, there was also an effect of Priming2, with stronger priming in Session 1 than Session 2. However, the interaction between Group and Priming2 showed that this reduction in priming across sessions was not uniform. The group trained in the morning showed a reduction in sentence-consistent responses over 24 h from 32.7% to 26.5% (Z = 3.03, p = 0.002*), whereas the sentence-consistent response percentage for participants trained in the evening was if anything slightly higher after 24 h (32.0%) than at initial test (30.9%) (Z = 0.03, p = 0.97). The intercept effect was significant, and the remaining effects were non-significant.
As before, a subsidiary analysis examined just the two sets of items in Session 2 that had already been tested in Session 1 (see Tables 3 and 4). There was an overall effect of Priming, with similar levels of priming in the AM and PM groups (see Fig. 5). Aside from the intercept, no other effects reached significance level.

Discussion
Experiment 2 was intended to discriminate between active and passive accounts of the sleep benefit found in Experiment 1 for sentence priming of ambiguous word meanings. If this effect was due to sleep passively protecting the priming effect by reducing the potential for retroactive interference from language then 12 h of interference awake post-sleep should have the same interfering effect as 12 h of wake interference prior to sleep. This means that both groups of participants would be predicted to show the same reduction in priming 24 h after sleep. However, this pattern of results was not found. Participants who were trained in the morning initially showed a 9.3% priming effect when tested shortly afterwards, but this dropped to 3.1% after 24 h. Remarkably, the priming effect for participants trained in the evening showed no such reduction: in fact, the 8.5% initial priming was numerically higher (9.6%) after 24 h. This pattern of results suggests that the passive account of the priming effects in Experiment 1 is wrong. Instead, as we flesh out in the General Discussion, these results fit with an explanation in which sleep alters the memory of the recent experience with the ambiguous word to make the new learning more robust to interference or decay during wake.
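The priming effects quoted above are differences between primed and unprimed response percentages, and they can be cross-checked against the primed-item percentages reported in the Results. The sketch below does this arithmetic. The unprimed baselines it returns are inferred from those two sets of figures rather than quoted from Table 3, and the function names are hypothetical.

```python
# Percentage of prime-consistent responses for primed items (Results section).
PRIMED = {
    "AM": {"S1": 32.7, "S2": 26.5},
    "PM": {"S1": 30.9, "S2": 32.0},
}

# Priming effects (primed minus unprimed, in percentage points) from the Discussion.
EFFECT = {
    "AM": {"S1": 9.3, "S2": 3.1},
    "PM": {"S1": 8.5, "S2": 9.6},
}


def implied_unprimed(group, session):
    """Unprimed baseline implied by the primed percentage and the priming effect."""
    return round(PRIMED[group][session] - EFFECT[group][session], 1)


def priming_change(group):
    """Change in the priming effect from Session 1 to Session 2 (percentage points)."""
    return round(EFFECT[group]["S2"] - EFFECT[group]["S1"], 1)
```

The implied baseline is the same in both sessions within each group (23.4% for AM, 22.4% for PM), as expected if a single unprimed condition serves as the reference for both primed conditions. The change in priming over 24 h is −6.2 percentage points for the AM group and +1.1 for the PM group.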
As in Experiment 1, we also looked at the effect of a 24-h period with sleep or wake first on the sentential priming effects for words that were tested twice, once in each session. Again, this repeated testing showed no significant effect of group, unlike the main analyses. On a methodological level this suggests that for word-meaning priming repeated testing does not provide a clear test, given that the act of testing changes the later outcome. But on a theoretical level this result is interesting as well. Why might the priming effect for AM participants survive the 24-h period (or in Experiment 1 a 12-h period awake) when words are tested shortly after training but not when they are left untested until the second session? It seems likely that the explanation for this difference links in with the well-attested retrieval practice benefit (e.g., Karpicke & Blunt, 2011). When participants are asked to generate an associate of a word, potentially they remember their response, so forming a second memory that can be retained or forgotten over time. Participants tended to produce the same associate in response when tested the second time (overall, just over 50% of repeated words also had repeated associate responses), which diminishes the potential for group differences to emerge. Furthermore, retrieval practice has been posited as a form of consolidation (Antony, Ferreira, Norman, & Wimber, 2017), which might further reduce the potential for sleep/wake differences to be found.

General discussion
Table 3. Experiment 2 means and standard deviations for the proportional consistency of the word association test responses with the prime sentential context as a function of Group and Priming.

The experiments presented here were intended to shed further light on the mechanism that provides plasticity in our understanding of word meanings. The data from two experiments reveal a clear and coherent pattern. Like several other studies (Betts et al., 2018; Gilbert et al., 2018; Rodd et al., 2013, 2016), we found evidence that encountering a lexically ambiguous word in a disambiguating sentential context alters the word's representation in a way that has consequences for its future usage. When presented 20 min later, participants tended to generate associates that were consistent with this sententially primed meaning. Consistent with Rodd et al. (2016), this priming effect tended to fade or decay over time spent awake. However, in Experiment 1, we found that when sleep (either a nap or overnight) followed reasonably swiftly after the contextual exposure a priming effect was observed with no evidence of decay over the period of sleep. Furthermore, in Experiment 2 we found that this period of sleep stabilised the primed words' representations such that no decay in priming was seen across the subsequent day spent awake. As described in the introduction, Rodd et al. (2016) argued that word-meaning priming effects of this type can be explained with reference to a connectionist network model of word recognition that maps from a representation of word form to a distributed representation of meaning. In the case of lexical ambiguity, the network learns through linguistic experience to map from one word-form to multiple meanings.
Once the multiple meanings have been learned, recognition of an ambiguous word will briefly activate a weighted "blend" of the meanings, with meaning frequency and context of the word on each occasion helping the network to settle into the appropriate meaning's attractor state. Word-meaning priming in such a model is an extension of the process that acquires the mapping in the first place. That is, the exposure phase of the experiment causes a small immediate alteration in the connections in the network from form to meaning, making the contextually appropriate meaning a little easier to access and the inappropriate meaning a little harder to access. This weight adjustment process then influences the later word association test, meaning that words associated with the primed meaning are more likely to be retrieved. This immediate alteration model provides a clear mechanism to explain the basic word-meaning priming effect, and can perhaps also explain the decay in the priming effect over time awake in terms of further weight changes in response to subsequent linguistic input. However, the current results are more challenging. In particular, the fact that sleep appears to make word-meaning priming more robust and resilient to further interference/decay is not predicted. In order to explain this sleep benefit, the immediate alteration model might appeal to the synaptic homeostasis hypothesis (SHY; Tononi & Cirelli, 2014), which explains consolidation not at a systems level but at a more local level. By this account, the learning that takes place as a consequence of encountering an ambiguous word in its subordinate context involves a strengthening of the cortical synapses relating to that meaning. SHY then predicts that synaptic downscaling during sleep enhances plasticity, while at the same time effectively silencing synapses that may be peripheral or irrelevant to the representation of that meaning. 
This has the consequence of improving the signal-to-noise ratio of the representation of the recent learning experience, hence boosting retention. This would allow an immediate alteration explanation to accommodate a benefit of sleep over wake. While we should not rule out such an explanation at this point, it is worth noting that the recruitment of SHY to explain sleep benefits also makes it harder to explain why word-meaning priming should show quite strong decay during wake, given that the cortical strengthening during learning reduces plasticity and hence susceptibility to interference. It is hard to see how a single network model can accommodate at the same time both the benefits of sleep to subsequent word-meaning priming and the decay of priming that is seen prior to sleep.
We think that the current results are more amenable to the contextual binding account described in the introduction, in which there is a division of labour in a complementary systems model between the main long-term repository for established lexical knowledge (a cortical network along the lines of the model described above) and the network that provides shorter-term plasticity to enable learning. By this account, the exposure phase of the experiment does not necessarily change connection weights in the long-term cortical network to any substantial degree. Instead, on encountering a sentence the hippocampus is recruited to bind together essentially a new memory of that sentence or utterance, combining the various components of the sentence in a similar way to the formation of a new association between words (Eichenbaum & Bunsey, 1995). This new memory in the short term presumably allows listeners to keep track of, and act on, conversations or other forms of dialogue. It also provides a source of information that can be used when participants are asked to generate an associate of the word encountered in the exposure phase. In these circumstances, an unprimed word can only make use of the cortical long-term lexical network, whereas a primed word can make use of both the cortical network and also the hippocampal representation of the recent experience with that word.
Importantly, like many other hippocampal memories, the new hippocampal memory should be susceptible to decay over time (Hardt et al., 2013) but can be consolidated via offline replay chiefly during sleep (Bendor & Wilson, 2012), integrating the new material with existing knowledge of the meanings of the constituent words. The decay explains why long periods awake prior to sleep lead to a diminishing of the word priming effect. In turn, the involvement of consolidation explains why word priming remains after sleep, as long as the sleep begins before too much wake decay has occurred. It is important to note that we do not need to assume that consolidation is complete after a single period of sleep. Instead it may be that consolidation is partial, with the hippocampal memory still present, perhaps still susceptible to interference, or perhaps more robust as a consequence of consolidation within the hippocampus (cf. Winocur, Moscovitch, & Bontempi, 2010). That said, there is good reason to think that consolidation processes should be completed relatively quickly, given that the long-term consequence of learning in this case is an adjustment of pre-existing weights. This is similar to other examples in which prior schema knowledge can support the rapid consolidation of new information into the cortex (Lewis & Durrant, 2011; McClelland, 2013; Tse et al., 2007).
Given that new learning is simply a matter of connection weight adjustment, one might ask whether a complementary systems model should need to rely on hippocampal mediation in the first place. One of the key computational arguments for the necessity of a hippocampal network in a complementary systems model has been the potential for catastrophic interference (McClelland, McNaughton, & O'Reilly, 1995). This occurs when a distributed connectionist network is required to learn a new mapping that is unlike existing mappings. However, in this case there is no new mapping, just an existing mapping that requires updating. Mirković and Gaskell (2016) presented data that suggested that in the acquisition of a new artificial language, arbitrary mappings (high risk of catastrophic interference) showed sleep benefits whereas systematic mappings (low risk of catastrophic interference) did not. Why then, does the current paradigm show such clear benefits of sleep? Mirković and Gaskell suggested that prioritization is an important aspect of sleep's role in consolidation (Stickgold & Walker, 2013). In the Mirković and Gaskell study, the whole artificial language was new, and so the hippocampal system (and consequently sleep) may have intervened in the component of the language for which it was most needed, namely the arbitrary aspects. In the case of word-meaning priming, no elements are new, and so the hippocampal system can intervene for mappings that are less in need of rescue from catastrophic interference.
It is important to note that previous results from the word priming paradigm have been taken as evidence against an episodic model of word-meaning priming. In the light of the current data we should review this evidence. First, Rodd et al. (2013) showed that word-meaning priming does not vary depending on whether the speaker of the priming context sentence matches the speaker of the ambiguous word used as the cue in the word association test. In fact, they do not even need to match in terms of the modality of presentation. Gilbert et al. (2018) showed that there was no reliable difference in priming levels for between-modality priming (e.g., auditory prime, visual test) as opposed to within-modality priming. These data point very clearly towards a relatively high-level model of priming that does not depend on a detailed perceptual match between prime sentence and test cue.
Surface detail and modality matter to a great extent for paradigms such as repetition priming (e.g., Goldinger, 1998;Luce & Lyons, 1999), but equally there is much evidence to suggest that many aspects of priming for spoken words are insensitive to such details. Furthermore, priming based on similarity at a surface level may have a slower, more effortful mechanism than priming that relies on match at an abstract level (e.g., McLennan & Luce, 2005). Our proposal based on the current evidence is that word-meaning priming is supported by the same medial temporal lobe circuitry that underpins paired-associate learning, which at least in part may have a more abstract element (Cairney, Sobczak, Lindsay, & Gaskell, 2017), linking the words at a lexical or conceptual level rather than a more superficial level. A second argument was based on the incidence of words from the priming sentence context occurring as responses in the word association test. If word-meaning priming relies on learning direct associations between the ambiguous word and the words in its context then one might expect these context words to crop up as associate responses reasonably often. Rodd et al. (2013) found that this type of response was relatively rare (e.g., 3% of all responses in Experiment 1), with a modest increase in prevalence for primed (4%) vs unprimed (2%) conditions. The word-meaning priming effect remained when these responses were excluded. We found similar figures in the current experiments: 3% of responses in untrained conditions were words from the context sentences, and this rose to 4% in the conditions for which we saw priming. Clearly, then, the vast majority of associate responses were not simply taken from the context. Interestingly, though, the incidence of these repetitions of context was not even across sentences.
For most items (49/87) context words were never reused as associate responses at test, probably because these sentences provided a bias without containing any direct lexical associates of the ambiguous words (e.g., for the word fence, the context sentence was "He wanted to learn how to fence"). However, when there was an obvious associate of the ambiguous word in the sentence, then this associate tended to be used a little more often in the primed conditions (e.g., for pipe, participants might respond with smoke for "The grandfather picked up his pipe to smoke"). On closer analysis, then, we think such findings are not incompatible with a complementary systems account of word-meaning priming in which the new association between the ambiguous word and its sentential context provides a source of information to guide the subsequent word association response in one of two ways:
(1) By directly boosting the likelihood of responding with a word that is both contained in the sentence context and is an established associate of the ambiguous word.
(2) By biasing the retrieval of an associate of the ambiguous word that has some semantic relation with the theme of the context sentence.
It remains possible, then, that all the previous findings using the word meaning paradigm can be reinterpreted in terms of a contextual binding account by which the hippocampus has a crucial role in relating ambiguous word meanings to their sentential context. However, although the current results provide evidence for this contextual binding mechanism, it is not necessarily the case that the immediate alteration account has no role to play at all. Rather, it is possible that for the ambiguous words in sentence context, in fact both cortical and hippocampal systems operate in parallel, perhaps explaining why in the current results the AM participants showed some suggestion of priming even 24 h after initial exposure. A goal in future work would be to determine whether there are ways of further teasing these accounts apart, perhaps by testing the effects of sleep and wake delays on different dependent measures.
As discussed in the introduction, we see our model as compatible with the work of Duff and Brown-Schmidt (2017), who have provided a substantial body of evidence from amnesic participants relating to the involvement of the hippocampus and medial temporal lobe in the everyday use of language, including the updating and enriching of both lexical semantic representations (Klooster & Duff, 2015;MacKay, Stewart, & Burke, 1998) and syntactic biases (Ryskin et al., 2018). Converging evidence also comes from fMRI data suggesting that the left hippocampus is sensitive to the comprehensibility of spoken sentences (Davis & Johnsrude, 2003). Studies using direct EEG recording in the medial temporal lobe also implicate the hippocampus in normal language comprehension. For example, Meyer et al. (2005) recorded event-related potentials (ERPs) of epilepsy patients in rhinal cortex and hippocampus during the perception of different types of sentence violation. They found that semantic violations generated an N400 response (Kutas & Hillyard, 1980) in rhinal cortex in the anterior medial temporal lobe, whereas syntactic violations elicited a later negativity (500-800 ms) in the hippocampus. Both these effects point to the involvement of the medial temporal lobe in the process of sentence comprehension. With particular relevance to the current study, Piai et al. (2016) recorded oscillatory activity directly in the hippocampus for sentences that varied in the contextual constraints that predicted a final word that had to be named, given a picture (e.g., "She walked in here with the…" vs. "She locked the door with the…" followed by a picture of a key). They found that the level of theta activity during the course of the lead-in sentence was dependent on the constraint strength, with the strong-constraint condition showing greater activity.
Indeed, within the strong constraint condition, theta activity appeared to track the semantic predictivity of the sentence (as measured by latent semantic analysis) on a word-by-word basis. Hippocampal theta power is strongly implicated in memory processes of recollection of previously studied material (e.g., Guderian & Düzel, 2005) and episodic memory retrieval (e.g., Lega, Jacobs, & Kahana, 2012), although it is important to note that this oscillation also has associations with other aspects of cognitive function such as spatial and time coding (Korotkova et al., 2018). Nonetheless, perhaps surprisingly given earlier evidence (e.g., Kensinger, Ullman, & Corkin, 2001), the emerging understanding from recent studies is of a role for the hippocampus not just in learning language, but also in language comprehension, production and maintenance (see also Cross, Kohler, Schlesewsky, Gaskell, & Bornkessel-Schlesewsky, 2018, for a proposal on the involvement of hippocampal theta in incremental sentence comprehension).
The current results represent the first evidence for a contextual binding account implicating sleep and consolidation in comprehension of lexically ambiguous words. Unsurprisingly, several important questions remain. One is the generality of this mechanism. Is it focused on cases of lexical ambiguity, or does it operate for all utterances and all words irrespective of ambiguity? Indeed, it is feasible that contextual binding plays a prominent role only when the subordinate meaning of an ambiguous word is retrieved. We can only speculate as to the answer at this point, but a parsimonious working hypothesis would be that the contextual binding we see here as benefiting from sleep is a general component of comprehension, but perhaps (consistent with Piai et al., 2016) sensitive to the degree of predictability of each word meaning. This would link in well with models of memory that see prediction error as driving encoding (Henson & Gagnepain, 2010) and might predict that consolidation effects are stronger when the subordinate (unlikely) meaning of a word is comprehended as compared with the dominant (likely) meaning.
A second area of uncertainty is at what level of representation this contextual binding occurs. Thus far we have been describing the binding as relating to a collection of words, suggesting quite a superficial level. However, the observation here and elsewhere that participants rarely actually reproduce context words in their word-association test responses provides some evidence that the contextual binding occurs at a higher, more gist-like, level. One possibility is that the process we identify here is related to the generation of a situation model of the comprehended utterance, fitting with a construction-integration account of discourse comprehension (Kintsch, 1988). This would be consistent with the common finding that memories for sentences that endure over several days (perhaps through consolidation) tend to be gist-like rather than verbatim representations of the original material (Kintsch, Welsch, Schmalhofer, & Zimny, 1990).
A third outstanding question is to what extent the priming that we see here is related to explicit and/or intentional retrieval of the context sentences either in their verbatim form or (more likely) a more gist-like form. Priming is often thought of as implicit and automatic, but this is not necessarily true in all cases. Here, the proportion of word-association trials that related to the sentences heard in the exposure phase is quite high (50% in Session 1, 67% in Session 2), and the participants were not under strong time pressure. Furthermore, the exposure and test phases were all in the same physical environment, providing a contextual cue at test to the exposure session. To some extent, these issues have been addressed in previous studies of word-meaning priming. For example, Experiment 1 of Rodd et al. (2016) involved exposure during the course of a radio show and then a later online test. In those circumstances environmental cues were likely to be more variable and yet word-meaning priming was still found. Experiments 3 and 4 were even more naturalistic in that they tested individuals with particular experience of specific meanings of ambiguous words (rowers tested on words such as feather that had rowing-related meanings). Likewise, word-meaning priming effects have been found using more time-critical tests such as speeded semantic relatedness judgements (e.g., Cai et al., 2017;Gilbert et al., 2018), allowing less time for intentional retrieval of the context sentences to occur.
Nonetheless, as mentioned earlier, it is not necessarily the case that all effects that are described as word-meaning priming must be attributable to the same single mechanism. This means that future studies that assess the contribution of consolidation in the absence of environmental matching cues or with speeded tasks would be valuable to complete the picture in terms of the mechanism or mechanisms that contribute to word-meaning priming. Even so, we find it striking that the word meaning effects that we have observed here can be found even 12 or 24 h after the exposure session despite there being no instruction to memorise the sentences, and no way in which the key ambiguous words were flagged up as important or special. It would be surprising if participants worked hard to intentionally encode or retain the sentences, as there was absolutely nothing for them to gain in doing so.
Part of the motivation for this work was to address the question of the extent of involvement of consolidation in language plasticity. We have shown that consolidation is not only important for learning of completely new material. Instead, it seems to play a role in the refinement and updating of lexical semantic representations for highly familiar words. When we come across an ambiguous word we need to make use of the context of that word to determine the intended meaning and update our expectations of the word's meaning ready for the next encounter. We have argued that this involves the learning of an association at some level of abstraction between the words or concepts in the sentence so that updating can occur via consolidation, associated with sleep. A potential implication of this result is that effectively all language comprehension constitutes a new hippocampal learning experience that may lead to sleep-associated consolidation.
The use of lexically ambiguous words merely provides a clear example of how the learning experience employs consolidation to improve our predictive model of language.