Is word learning capacity restored after a daytime nap?

Abstract
Sleep is thought to be involved in the consolidation of new memories encoded during the day, as proposed by complementary learning systems accounts of memory. Other theories suggest that sleep's role in memory is not restricted to consolidation. The synaptic homeostasis hypothesis proposes that new learning is implemented in the brain through strengthening synaptic connections, a biologically costly process that gradually saturates encoding capacity during wake. During slow-wave sleep, synaptic strength is renormalized, thus restoring memory encoding ability. While the role of sleep in memory consolidation has been extensively documented, few human studies have explored the impact of sleep in restoring encoding ability, and none have looked at learning beyond episodic memory. In this registered report we test the predictions made by the complementary learning systems accounts and the synaptic homeostasis hypothesis regarding adult participants' ability to learn new words, and to integrate these words with existing knowledge.
Participants took a polysomnographically-monitored daytime nap or remained awake prior to learning a set of new spoken words. Shortly after learning, and again on the following day, we measured participants' episodic memory for new words. We also assessed the degree to which newly learned words engage in competition with existing words. We predicted that sleep before encoding would result in better episodic memory for the words, and facilitate the overnight integration of new words with existing words. Based on existing literature and theory we further predicted that this restorative function is associated with slow-wave and sleep spindle activity. Our pre-registered analyses did not find a significant benefit of napping prior to encoding on word learning or integration. Exploratory analyses using a more sensitive measure of recall accuracy demonstrated significantly better performance in the nap condition compared to the no-nap condition in the immediate test. At the delayed test there was no longer a significant benefit of the nap. Of note, we found no significant effect of slow-wave activity prior to encoding on episodic memory or integration of newly learned words into the mental lexicon. However, we found that greater levels of Stage 2 sleep spindles were significantly associated with greater improvements in lexical competition from the immediate to the delayed test. Therefore,

Introduction
Sleeping after learning is thought to be beneficial for the consolidation of newly encoded declarative memories. According to complementary learning systems accounts of memory (Diekelmann & Born, 2010; McClelland, McNaughton, & O'Reilly, 1995), new memories are encoded in parallel in the fast-learning hippocampus and in the slow-learning neocortex during wake. Then, during sleep, the hippocampus repeatedly replays new memories and communicates with the neocortex. Through this hippocampal-neocortical dialogue, the neocortex gradually extracts regularities from overlapping representations of new memories and integrates those memories with existing knowledge stored in long-term memory. The theory therefore predicts that sleep after learning should not only benefit episodic memory, that is, memory for the details of the experience encoded during wake, but should also benefit more complex memory processes such as memory integration and generalisation (Witkowski, Schechtman, & Paller, 2020). Much of the empirical evidence, particularly for sleep's role in more complex memory processes, comes from human studies looking at language learning. For example, Dumay and Gaskell (2007) trained participants on 24 novel spoken words (e.g., cathedruke) before sleep. They found that one night of sleep after learning increased free recall of the words, while an equivalent period of wake after learning yielded no change from baseline. Importantly, the novel words were all highly phonologically overlapping with familiar existing base words (e.g., cathedral). This allowed the authors to quantify the degree to which the newly learned words interfered with recognition of the familiar base words, that is, the degree to which the novel words had been integrated with existing words in the mental lexicon. This lexical competition effect was only observed after a night of sleep.
While this study supported the hypothesis that sleep aids consolidation of episodic memory as well as the integration of new memories with existing memories, subsequent studies using the same word learning paradigm provided support for the neural mechanisms postulated by complementary learning systems accounts. In an fMRI study, Davis, Di Betta, Macdonald, and Gaskell (2009) observed a strong hippocampal response to the first presentation of spoken novel words, compared to novel words trained the day before. At the neocortical level, only those novel words that had been trained the day before scanning (i.e., with a chance to consolidate overnight) showed a similar pattern of activation to familiar words, while novel words learned on the same day (i.e., with no chance to consolidate overnight) showed an activation pattern similar to untrained novel words. Regarding the neural mechanisms that operate during sleep and might underpin these changes, Tamminen, Payne, Stickgold, Wamsley, and Gaskell (2010) used the same word learning task and reported that nocturnal slow-wave sleep duration predicted overnight change in the speed of accurate recognition of the newly learned words, and that sleep spindle activity was associated with the integration of the novel words, as measured through overnight change in the lexical competition effect. The latter finding was notable as sleep spindles (brief bursts of activity in the 11-15 Hz range occurring in Stage 2 sleep and slow-wave sleep) are known to occur in close temporal connection with hippocampal ripples (Siapas & Wilson, 1998; Staresina et al., 2015) and to prime cortical plasticity (Rosanova & Ulrich, 2005), suggesting they play a key role in hippocampal-neocortical dialogue.
While the impact of sleep on memory consolidation has been extensively researched, much less is known about the impact of sleep in preparing the brain for learning. When awake, learning occurs via interactions with the environment including making associations due to rewards and punishments, performing tasks, or learning new information about the world. According to the synaptic homeostasis hypothesis (Tononi & Cirelli, 2003), learning is associated with increased synaptic strength (also referred to as synaptic potentiation). Such learning, however, comes at a cost in two broad domains. Firstly, neurons with higher synaptic strength consume more energy, leading to cellular stress. Secondly, as synaptic strength increases, neurons become less selective in terms of the range of inputs that can cause them to fire. In other words, the neuronal signal-to-noise ratio is decreased. The end result of these costs is a gradual reduction in memory encoding ability during a period of prolonged wake. The solution offered to these problems by the synaptic homeostasis hypothesis is sleep, particularly slow-wave sleep. During sleep the brain's connection to the outside environment is cut off, allowing it to spontaneously activate and sample its entire knowledge of the environment, including both new information encoded during previous wake and information encoded in the past. Over the course of this sampling those synapses that activated most strongly and consistently during wake, or were best integrated with older memories, survive, while those synapses that were less activated are weakened. The theory further proposes that this weakening is particularly effective during the transitions between intracellular up and down states experienced during slow-wave sleep. This competitive down-selection of weaker synapses increases neuronal signal-to-noise ratio, thus restoring memory encoding ability.
Therefore, sleep can be considered to serve two functions in relation to memory: to consolidate new information (consolidation) and to prepare the brain for acquisition of new information in the subsequent period of wakefulness (restoration).
Although the synaptic homeostasis hypothesis makes clear predictions about the restorative impact of sleep on learning, few studies have tested these predictions, and none of them have explored sleep's impact beyond declarative episodic memory, or in the domain of language learning where the impact of sleep on consolidation has been so richly documented. In one of the first studies looking at sleep before learning, Mander, Santhanam, Saletin, and Walker (2011) asked participants to encode face-name pairs, followed by a daytime nap or wake, and then a second encoding session with a new set of stimuli. In the no-nap group encoding ability deteriorated from the first to the second encoding session, while in the nap group encoding ability remained stable, suggesting that the nap restored encoding ability. Analysis of subjective and objective measures of fatigue and alertness showed that the groups did not differ on these measures, suggesting that the observed effect was due to sleep rather than fatigue. Furthermore, both number of fast sleep spindles and time spent in Stage 2 sleep during the nap predicted encoding ability in the second session. The authors (see also Saletin & Walker, 2012) interpreted these findings by extending the principles of complementary learning systems accounts to the memory function of sleep before encoding. They argued that sparse hippocampal representational encoding leads to reduced encoding capacity over the course of the day. Hippocampal-neocortical dialogue, of which sleep spindles are a marker, is needed to shift previously learned stimuli from hippocampal to cortical dependence, thus restoring hippocampal episodic learning capacity. This view differs from that of the synaptic homeostasis hypothesis. For example, the Mander et al. framework does not assign a critical role to slow-wave activity. Instead the data reported by Mander et al. support the view put forward by Saletin and Walker (2012) that there may be a difference between spindle-driven and slow wave-driven memory consolidation mechanisms. The precise nature of the difference is unclear, but the implication is that it ought to be possible to observe a role for sleep spindles in the restoration of encoding ability in the absence of a role for slow-wave activity, or vice versa.
Other studies have, however, supported the prediction made by the synaptic homeostasis hypothesis that slow-wave activity (which includes slow-wave sleep duration and other slow-oscillation measures) is critical in restoring encoding ability. Antonenko et al. (2013) used transcranial slow oscillation stimulation (tSOS) to induce slow-wave activity during a daytime nap. Following the nap participants encoded pictures, word pairs, and word lists. All three tasks benefitted from tSOS, compared to a sham control group, suggesting that slow-wave activity before encoding does improve encoding ability. Similarly, Ong et al. (2018) enhanced slow oscillations acoustically during a daytime nap. While acoustic stimulation did not result in improved encoding ability of pictures compared to a sham control, slow oscillation enhancement did correlate with encoding success, providing some support to the prediction that slow-wave activity restores encoding ability. Van Der Werf et al. (2009) used mild acoustic sleep perturbation to reduce slow-wave activity (but not total sleep time) during a night of sleep in a small group of older adults. Participants encoded images the next day, with a lower memory score following perturbed slow-wave activity compared to an unperturbed night of sleep.
In the current study we sought to establish the impact of sleep on subsequent word learning ability. In two separate sessions, participants learned one set of new words immediately after a daytime nap and another set after an equivalent period of wake. We tested both their episodic memory of the words and the extent to which the words had been integrated with existing vocabulary. Polysomnography (PSG) was used to test the impact of slow-wave sleep and Stage 2 sleep spindles in restoring word learning ability.

Hypotheses
2.1. Hypothesis 1: napping compared to wake before encoding will result in better episodic memory of newly learned words

As discussed above, both the synaptic homeostasis hypothesis and the complementary learning systems accounts predict that sleep restores memory encoding capacity. However, the current evidence base for this is limited, and no studies have tested this prediction in the context of language learning, a domain of declarative learning that is well documented to benefit from sleep-associated memory consolidation after encoding. In the present study we tested this prediction by training participants on a set of new spoken words. Prior to the learning session, participants either took a daytime nap, or remained awake for an equivalent period of time under controlled conditions. Episodic memory for the newly learned words was tested shortly after learning, and again the next day, using a cued recall task which is known to benefit from sleep after learning (e.g., Henderson, Weighall, Brown, & Gaskell, 2012; Tamminen et al., 2010). We predicted that participants would score significantly higher when learning was preceded by sleep, compared to when it was preceded by wake. We expected to observe this effect both immediately after learning and a day after learning. Note that since the same participants were tested in both conditions, we can rule out potential confounds due to group differences. Also, because the same participants were tested in both conditions at the same time of the day, any circadian confounds are eliminated.

2.2. Hypothesis 2: napping compared to wake before encoding will result in larger consolidation effects in the integration of newly learned words with existing known words

The impact of pre-encoding sleep on memory integration is particularly interesting for two reasons. Firstly, no study so far has examined the impact of pre-encoding sleep beyond declarative episodic memory. Secondly, lexical integration only emerges after a consolidation opportunity and is not typically observed prior to this (e.g., Davis et al., 2009; Dumay & Gaskell, 2007; Gaskell & Dumay, 2003; Henderson et al., 2012; Tamminen et al., 2010; Tamminen, Lambon Ralph, & Lewis, 2017). Lexical integration therefore appears to be a process that occurs only after encoding, and consequently may not be affected by presence or absence of pre-encoding sleep. However, a recent study suggests that pre-encoding sleep may have knock-on effects on later consolidation processes. Walker et al. (2019) used a visual version of the lexical competition paradigm and manipulated encoding strength by varying the number of training trials during learning of their novel words. They found that when encoding strength was lower than typically used in this paradigm, the overnight consolidation effect was reduced, that is, a weaker overnight change in the magnitude of the lexical competition effect was observed. Intriguingly, overnight change in episodic memory (free recall and recognition memory) was unaffected by the encoding strength manipulation. It appears therefore that disrupting encoding may have knock-on effects on later consolidation that are unique to the integration of new memories with existing memories. Based on this literature we predicted that napping before encoding novel spoken words, compared to wake, would result in a larger overnight increase in the lexical competition effect.

Cortex 159 (2023) 142-166

2.3. Hypothesis 3: slow-wave activity will be associated with better encoding ability

The synaptic homeostasis hypothesis postulates a key role for slow-wave activity in the downscaling of synaptic strength. According to this view, slow-wave activity both acts as a marker of increased synaptic strength and contributes to the decrease of synaptic strength. This dual role gives rise to a slow-wave activity control loop during sleep (Tononi & Cirelli, 2014): increasing daytime learning results in increasing synaptic strength, which in turn results in increasing slow-wave activity during sleep. High slow-wave activity during sleep results in increased synaptic depression, which progressively slows renormalisation until an equilibrium point is reached where synaptic strength is sufficiently low that synaptic connections are not weakened further. The clear prediction from this view is that slow-wave activity can be used as a measure of the degree to which the brain restores encoding ability during sleep. This prediction is supported by the brain stimulation studies cited earlier. Concerning our study, we therefore predicted an association between slow-wave activity during the nap preceding novel word learning and episodic encoding ability. We also predicted, following from Hypothesis 2, that pre-encoding slow-wave activity would be associated with the overnight emergence of the lexical competition effect. We employed two measures of slow-wave activity which have been shown by previous work to be involved with word learning: slow-wave sleep duration (Tamminen et al., 2010) and slow oscillation activity (Tamminen, Lambon Ralph, & Lewis, 2013). Crossing these two measures with the two outcomes (cued recall and lexical competition) yielded four concrete predictions. We predicted that slow-wave sleep duration would be positively associated with better cued recall (Hypothesis 3a), that power in the slow oscillation band would be positively associated with better cued recall (Hypothesis 3b), that slow-wave sleep duration would be positively associated with larger overnight consolidation effects in emerging lexical competition (Hypothesis 3c), and that power in the slow oscillation band would be positively associated with larger overnight consolidation effects in emerging lexical competition (Hypothesis 3d).

2.4. Hypothesis 4: frontal fast sleep spindle activity in Stage 2 sleep will be associated with better encoding ability

While the synaptic homeostasis hypothesis emphasises the importance of slow-wave activity in the downscaling of synaptic strength, it also allows for a role for other neural features of sleep that are associated with the onset of slow oscillations (Tononi & Cirelli, 2014). One of these features is sleep spindles. However, the study reported by Mander et al. (2011) showed an association between Stage 2 rather than Stage 3 frontal fast spindles and restoration of encoding ability. The authors argued that these spindles are a marker of hippocampal activation and support a shift from hippocampal to cortical dependence of newly encoded memories. Based on this finding we predicted that Stage 2 fast frontal spindle activity should correlate with measures of both episodic memory for novel words (cued recall, Hypothesis 4a) and overnight increase in lexical competition (Hypothesis 4b). Although Hypotheses 3 and 4 seek to find support for competing theories of the impact of pre-encoding sleep on memory, in our view these theories are not mutually exclusive. It is possible that both mechanisms outlined in the two hypotheses are in operation simultaneously. While no studies reported so far have observed effects of both sleep spindles and slow-wave activity on restoration of encoding ability at the same time, our word learning paradigm is uniquely suited to uncovering such effects as both slow-wave activity and Stage 2 sleep spindles (Tamminen et al., 2010) have been implicated in consolidation processes involved in this type of learning. Table 1 presents a summary of the research questions, the hypotheses associated with each research question, the statistical tests probing each hypothesis, and the interpretation of each potential statistical outcome.

Behavioural data
Power analyses would ideally be based on directly comparable past research. In this case the only directly comparable published study is that of Mander et al. (2011) in that it compared a nap condition with a no-nap condition (although in a less powerful between-subjects design). However, it is not advisable to base power analyses on a single study (Kiyonaga & Scimeca, 2019). For this reason we searched the literature for all studies that sought to isolate the impact of sleep on word learning in similar paradigms to ours, where both episodic memory and integration of new words with existing words were considered, and which used ANOVAs as the statistical test (Dumay & Gaskell, 2007; Henderson et al., 2012; Tamminen et al., 2010, 2017; Wang et al., 2016). This yielded 13 effect sizes ranging from f = .23 to f = .75, with a mean of f = .46, 95% confidence interval ranging from f = .36 to f = .54. Given that effect sizes in the published literature tend to be exaggerated (Klein et al., 2018), we chose the smallest reported effect size to base our power analysis on. We used G*Power (Erdfelder, Faul, Buchner, & Lang, 2009; Faul, Erdfelder, Lang, & Buchner, 2007) to determine the required sample size. To detect an effect of f = .23 we would need a sample size of 44 participants in order to test Hypotheses 1 and 2 (within-subjects ANOVA, 90% power, alpha at .02).

Table 1, rows for Hypotheses 3a to 4b.

Research question: Is slow-wave activity associated with the restoration of episodic memory encoding ability in word learning?

Hypothesis 3a: Slow-wave sleep duration will be associated with better episodic encoding ability.
Sample size: Our sample size N = 45 is large enough to be able to detect the smallest reported comparable effect size (r = .47, required N = 45, Tamminen et al., 2017).
Statistical test: Correlation coefficient between slow-wave sleep duration and number of correct responses in the cued recall task. Hypothesis 3a predicts a significant positive correlation.
Positive result: Slow-wave sleep duration is associated with better episodic encoding ability.
Negative result: There is no evidence that slow-wave sleep duration is associated with better episodic encoding ability.

Hypothesis 3b: Power in slow oscillation band will be associated with better episodic encoding ability.
Sample size: As in Hypothesis 3a.
Statistical test: Correlation coefficient between slow oscillation power and number of correct responses in the cued recall task. Hypothesis 3b predicts a significant positive correlation.
Positive result: Slow oscillations are associated with better episodic encoding ability.
Negative result: There is no evidence that slow oscillations are associated with better episodic encoding ability.

Research question: Is slow-wave activity associated with the restoration of ability to integrate new words with existing words?

Hypothesis 3c: Slow-wave sleep duration will be associated with larger consolidation effects in the integration of newly learned words with existing known words.
Sample size: As in Hypothesis 3a.
Statistical test: Correlation coefficient between slow-wave sleep duration and overnight increase in the lexical competition effect. Hypothesis 3c predicts a significant positive correlation.
Positive result: Slow-wave sleep duration is associated with larger consolidation effects in the integration of newly learned words with existing known words.
Negative result: There is no evidence that slow-wave sleep duration is associated with larger consolidation effects in the integration of newly learned words with existing known words.

Hypothesis 3d: Power in slow oscillation band will be associated with larger consolidation effects in the integration of newly learned words with existing known words.
Sample size: As in Hypothesis 3a.
Statistical test: Correlation coefficient between slow oscillation power and overnight increase in the lexical competition effect. Hypothesis 3d predicts a significant positive correlation.
Positive result: Slow oscillations are associated with larger consolidation effects in the integration of newly learned words with existing known words.
Negative result: There is no evidence that slow oscillations are associated with larger consolidation effects in the integration of newly learned words with existing known words.

Research question: Is Stage 2 sleep spindle activity associated with the restoration of episodic memory encoding ability in word learning?

Hypothesis 4a: Spindle activity will be associated with better episodic encoding ability.
Sample size: As in Hypothesis 3a.
Statistical test: Correlation coefficient between spindle count and number of correct responses in the cued recall task. Hypothesis 4a predicts a significant positive correlation.
Positive result: Spindles are associated with better episodic encoding ability.
Negative result: There is no evidence that spindles are associated with better episodic encoding ability.

Research question: Is Stage 2 sleep spindle activity associated with the restoration of ability to integrate new words with existing words?

Hypothesis 4b: Spindle activity will be associated with larger consolidation effects in the integration of newly learned words with existing known words.
Sample size: As in Hypothesis 3a.
Statistical test: Correlation coefficient between spindle count and overnight increase in the lexical competition effect. Hypothesis 4b predicts a significant positive correlation.
Positive result: Spindles are associated with larger consolidation effects in the integration of newly learned words with existing known words.
Negative result: There is no evidence that spindles are associated with larger consolidation effects in the integration of newly learned words with existing known words.
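The correlational tests listed for Hypotheses 3a to 4b can be illustrated with a minimal Python sketch. It computes a Pearson correlation and an approximate one-tailed p-value for a positive association using the Fisher z-transform. The function name, the choice of the Fisher approximation, and the toy data are assumptions made for illustration only; the text above does not state which software or exact test was used for these correlations. The default alpha of .02 mirrors the value quoted in the power analysis.

```python
from math import atanh, sqrt
from statistics import NormalDist, fmean


def positive_correlation_test(x, y, alpha=0.02):
    """Pearson r with an approximate one-tailed p-value (Fisher z).

    Hypothetical helper: tests whether r is significantly greater than zero.
    Requires at least four paired observations and |r| < 1.
    """
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("need at least four paired observations")
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    # Fisher z-transform: atanh(r) is roughly normal with SD 1/sqrt(n - 3).
    z = atanh(r) * sqrt(len(x) - 3)
    p_one_tailed = 1.0 - NormalDist().cdf(z)
    return r, p_one_tailed, p_one_tailed < alpha


# Toy data (invented for illustration, not study data): spindle counts and
# overnight change in the lexical competition effect.
spindle_count = [1, 2, 3, 4, 5, 6, 7, 8]
competition_change = [2, 1, 4, 3, 6, 5, 8, 7]
r, p, significant = positive_correlation_test(spindle_count, competition_change)
```

Because the prediction is directional, a strong negative correlation would count as a negative result under this test rather than as support for the hypothesis.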

Polysomnography data
To evaluate correlations between sleep architecture measures and encoding ability, thus testing Hypotheses 3 and 4, we were guided by Mander et al. (2011) as well as a literature search to identify all reports of correlations between sleep architecture and word learning (Tamminen et al., 2010, 2017). This yielded four correlations, ranging from r = .47 to r = .59, with a mean of r = .53, 95% confidence interval ranging from r = .48 to r = .58. We again chose the smallest reported effect size to base our power analysis on. To detect a correlation of r = .47 we would need a sample size of 45 participants in order to test Hypotheses 3 and 4 (one-tailed, 90% power, alpha at .02).

Based on the above calculations, we recruited 45 participants. Participants were students at Royal Holloway, University of London, aged 18-24 years old and were recruited via posters, e-mails, the university research website, social media, a mailing list from the university sleep lab and word of mouth. Participation was voluntary and participants were paid for their time. To be eligible to participate, participants had to be native English speakers with normal hearing and normal or corrected-to-normal vision, have no currently diagnosed neurological, psychiatric, or sleep disorders, not be taking medication affecting their sleep, not be engaged in shift work involving working at night, not have travelled across time zones within two weeks of taking part in any of the sessions in the study, and have no known special educational needs (e.g., dyslexia, ADHD, autism spectrum disorder).
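As a rough cross-check of the sample size quoted for Hypotheses 3 and 4, the calculation can be sketched in Python with the Fisher z approximation. This is an assumption-laden illustration, not the G*Power procedure itself: G*Power uses its own calculation for bivariate correlations, so the approximation may differ from the reported 45 by a participant or so. The function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.02, power=0.90):
    """Approximate N needed to detect a correlation r with a one-tailed test.

    Uses the Fisher z approximation:
        N = ((z_alpha + z_power) / atanh(r)) ** 2 + 3
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-tailed critical value
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


# Smallest correlation found in the literature search described above.
approx_n = n_for_correlation(0.47)
```

Basing the calculation on the smallest reported correlation is the conservative choice: larger assumed effects, such as r = .59, require fewer participants under the same alpha and power.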
Participants were excluded if they failed to complete all sessions of the study, or if data were lost for the reasons described in the analysis plan. Based on our experience of running nap studies, we expected about 10-15% of recruited participants would be excluded. Excluded participants were replaced until the required sample size was achieved, at which point recruitment ceased. This study was given ethical approval following the Royal Holloway University of London Ethics Committee procedures. Using a within-subjects design, participants took part in both the wake and nap conditions. The order of the conditions was randomly assigned to each participant, and counterbalanced across participants.
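One way to combine random assignment with counterbalancing of condition order is sketched below in Python. The text does not describe the actual randomisation procedure, so the function name, the seeding, and the handling of an odd sample size are all assumptions for illustration.

```python
import random


def assign_condition_orders(participant_ids, seed=None):
    """Randomly assign nap-first or wake-first orders in balanced numbers.

    Hypothetical helper: with an odd number of participants, one order
    (chosen at random) receives exactly one extra participant.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    orders = [("nap", "wake"), ("wake", "nap")]
    rng.shuffle(orders)  # decides which order gets the extra slot if N is odd
    slots = [orders[i % 2] for i in range(len(ids))]
    rng.shuffle(slots)  # randomise which participant receives which slot
    return dict(zip(ids, slots))


# Example with the target sample size of 45 participants.
assignment = assign_condition_orders(range(1, 46), seed=2023)
nap_first = sum(1 for order in assignment.values() if order == ("nap", "wake"))
wake_first = len(assignment) - nap_first
```

Shuffling a pre-balanced list of slots, rather than flipping a coin per participant, keeps the two orders within one participant of each other regardless of the seed.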

Materials
Materials used in this study are available at https://osf.io/98es5/. Legal copyright restrictions prevent public archiving of the WASI-II, which can be obtained from the copyright holders in the cited references.

Screening questionnaire
Participants were required not to consume alcohol or caffeine in the 24 h preceding the encoding session, and not to nap on the day of each encoding session. To facilitate falling asleep in the nap session and to control for time spent awake before the encoding sessions, participants were asked to wake no later than 6 am on the morning of both encoding sessions. Adherence was monitored through the screening questionnaire and additionally through an actigraph attached to the non-dominant wrist the evening before an encoding session. Each participant was reminded of these requirements through a phone call or a text message the day before each encoding session. Participants who failed to adhere were dismissed and replaced.

Polysomnography
Polysomnography measures were recorded during the nap using an Embla N7000 system. Electrodes were placed on the scalp on the left and right frontal (F3, F4), central (C3, C4) and occipital (O1, O2) positions as recommended by the American Academy of Sleep Medicine (AASM) guidelines. These were referenced to the average of the left and right mastoid. The ground electrode was placed on the forehead. Electrooculography (EOG) and electromyography (EMG) were used to measure eye movements and muscle tone respectively, with positioning of the electrodes following AASM guidelines. Impedances were kept below 5 kΩ for EEG electrodes and below 10 kΩ for EOG and EMG electrodes. Signals were sampled at 500 Hz.

SASS-Y (Dietch, Sethi, Slavish, & Taylor, 2019)
A retrospective sleep diary was used to measure participants' habitual sleep patterns in the week prior to data collection. The Self-Assessment of Sleep Survey - Split (SASS-Y) can also be used to determine the difference between sleep patterns on weekdays versus weekends. These data were only used for exploratory analyses.

Habitual napping
Participants were asked how often they nap during a typical week. Those who reported napping once or more per week were classified as habitual nappers (e.g., Cousins et al., 2019), and those who reported napping less than once per week were classified as non-habitual nappers. These data were only used for exploratory analyses.

Stanford Sleepiness Scale (SSS; Hoddes, Zarcone & Dement, 1972)
The SSS was used to evaluate levels of sleepiness. Before encoding and again before each test participants were asked to report their current level of sleepiness on a scale from 1 ("feeling active, vital, alert, or wide awake") to 7 ("No longer fighting sleep, sleep onset soon; having dream-like thoughts"). These data were used to check whether participants were equally tired when conducting the test sessions in the nap and no-nap conditions.

Wechsler abbreviated scale of intelligence II (WASI-II: Wechsler, 2011)
The WASI-II was used to measure participants' existing vocabulary knowledge. Participants were shown a list of words from the vocabulary subsection of the WASI-II stimulus book (pages 40–51). The researcher read each word aloud to the participant and pointed to the respective word. Participants were asked to define each word. All participants started at item four, as they were over six years of age, and continued until they scored zero on three consecutive items, allowing a maximum raw score of 59. Each response was scored from zero to two points, with DK recorded when the participant said they did not know the definition and NR when the participant did not respond within approximately 30 s. Scoring followed the WASI-II administration manual: two points were awarded for a clear understanding of the word, one point for a general but vaguer understanding, and zero points for an incorrect response or no obvious understanding of the word. This scoring measures depth of vocabulary knowledge by assessing what the participant knows about the word. Additionally, this vocabulary measurement assesses breadth of vocabulary knowledge via the number of words a participant knows. Raw scores were converted to norm-referenced T-scores relevant to the participants' age as indicated by the manual. These data were used to characterise the sample based on their vocabulary level.

Novel word learning paradigm
Sixty-six stimulus triplets consisting of a familiar base word (e.g., cathedral), a fictitious novel word (e.g., cathedruke), and a similar-sounding nonword foil to be used in the old-new categorisation task (e.g., cathedruce) were selected from stimuli used in Tamminen and Gaskell (2008) and divided into three lists of 22 triplets each. Two of the lists were used for encoding and testing, one in the nap condition and the other in the no-nap condition. The third list remained an untrained control. The assignment of lists to these conditions was counterbalanced across participants. In addition, the lexical competition task required 88 filler words, 44 to be used in the nap condition and 44 in the no-nap condition. This ensured that only one quarter of real words presented in the lexical decision task at each session were base words related to the newly learned words, making it unlikely that participants would explicitly notice the overlap between base words and the phonologically overlapping trained novel words. In each list of 44 real filler words, 15 were monosyllabic, 15 were bisyllabic and 14 were trisyllabic. These filler words were available from the stimulus set of Tamminen and Gaskell (2008). One hundred and fifty-two filler nonwords were also available from Tamminen and Gaskell (2008). This means that the majority of filler nonwords were not repeated across the two conditions (i.e. in each of the two lists of 88 nonword fillers), but 23 nonword fillers were used in both conditions. Since responses to the filler items are not analysed, this partial repetition of fillers did not impact on the results of the study. Each list of nonword fillers contained monosyllabic (N = 40), bisyllabic (N = 24), and trisyllabic (N = 24) nonwords. These nonword fillers were generated by Tamminen and Gaskell (2008) by changing one phoneme of a real word to form word-like legal nonwords. None of the real words used for this purpose occurred in any of the tasks used in the experiment.
All spoken recordings of the stimuli were made by Tamminen and Gaskell (2008) and were recorded by a female speaker of southern British English in a soundproof booth.

Procedure
Participants underwent two two-day experimental sessions, one for the nap condition and one for the no-nap condition, separated by a minimum of two weeks and a maximum of four weeks to avoid carryover effects. Both sessions involved a 100-min sleep or wake opportunity followed by the encoding phase and a same-day test phase. A delayed test phase took place the next day. The procedure is visualised in Fig. 1.
Both the nap and no-nap sessions began at 2 pm. Participants were first asked to complete the consent forms, the initial screening questionnaire, and the SASS-Y. In the nap session participants then underwent preparation for the PSG recording. In the no-nap session participants were allowed to engage in activities of their choice within the sleep laboratory for an equivalent time (20 min). Following Mander et al. (2011), in the nap condition participants were then given a sleep opportunity of 100 min. In the no-nap condition participants watched a video with no language input (Mr. Bean) for 100 min. At the end of the 100 min, the nap participants were given 30 min to clean up and to overcome potential sleep inertia effects. In the no-nap condition participants were allowed to engage in activities of their choice within the sleep laboratory for an equivalent time. After this, participants filled in the SSS and the novel word training session began.
The novel word training and test tasks closely followed the protocols established in Tamminen et al. (2010). Participants learned the novel words by listening to repeated presentations of each novel word while monitoring for one of six target phonemes (/P/, /D/, /S/, /M/, /N/, /L/). There were five blocks of phoneme monitoring: in each block each novel word was heard six times, once while monitoring for each of the six target phonemes. This yielded a total of 30 exposures to each novel word in this task. Each phoneme monitoring trial began with the visual presentation of the target phoneme. The spoken novel word was presented through headphones 500 ms after the onset of the visual target. The target remained on the screen until the end of the trial. Participants were required to indicate with a button press whether the target phoneme was present or absent in the novel word. The trial ended when a button press was made or after 3000 ms had elapsed from the offset of the auditory presentation of the word.¹ In addition to the phoneme monitoring task, there were four blocks of a verbal repetition task in which participants heard each novel word once and were asked to repeat the word aloud. The trial ended when the participant pressed a response key to indicate they had completed their response.² Vocal responses were recorded via a head-mounted microphone. These blocks were interleaved with the five phoneme monitoring blocks. Therefore, the total number of exposures to each novel word across the two training tasks was 34.
¹ In the Stage 1 submission we stated that the trial would end after a button press or after 3000 ms had elapsed from the onset of the visual target. In programming the experiment, we noticed that this arrangement risks slower participants missing trials. We therefore changed the trial structure with the editor's approval.
Immediately after the training phase participants entered the same-day test phase. The test tasks consisted of lexical competition, cued recall, and old-new categorisation, in this fixed order. Please note that since there is inconsistent evidence of the impact of sleep on recognition memory in word learning, we do not have a strong basis for making predictions about the old-new categorisation task and therefore this task was only considered in exploratory analyses. In cued recall participants heard the first two or three phonemes of a trained novel word, excised from the recording of the complete word. The length of the cue (two or three phonemes) depended on how much of the word is needed to identify it reliably from other words beginning with the same sequence of phonemes. The task was to say aloud the complete novel word during the 10 s that followed from the presentation of the cue. Participants' verbal responses were recorded for offline scoring as above. The trial ended after the 10 s. No feedback was given regarding accuracy to avoid further learning. The order of trials was newly randomised for each participant by the software.
In the lexical competition task, participants were asked to determine as quickly and accurately as possible whether a presented spoken stimulus was a real word or a nonword (i.e. make a lexical decision). The task included 22 base words (e.g., cathedral), 22 control words (e.g., dolphin), 44 filler real words, and 88 filler nonwords. The task began with a practice block of ten trials, five nonwords and five real words. A lexical competition task trial began with the visual presentation of a fixation cross for 500 ms. The spoken word was then presented through the headphones. The trial ended when a button press (word or nonword) was made or after 2500 ms had elapsed from the offset of the spoken word. At the end of the trial feedback on response accuracy was given through the presentation of a smiley face (correct) or frowning face (incorrect). The feedback remained on the screen for 750 ms before the next trial began. The order of trials was newly randomised for each participant by the software. Reaction times were measured from the onset of the spoken word.
In the old-new categorisation task, participants heard 22 trained novel words (e.g., cathedruke) and 22 untrained foils (e.g., cathedruce). Participants were asked to make a button press to indicate if the word was old (i.e. trained) or new (i.e. a foil). A trial began with the visual presentation of a fixation cross for 500 ms. The spoken word was then presented through the headphones. The trial ended when a button press (old or new) was made or after 5000 ms had elapsed from the onset of the spoken word. If no response was made within the 5000 ms, a message saying "too slow" was presented on the screen until the participant pressed a button to proceed to the next trial. This was to encourage fast responding. No feedback was given regarding accuracy to avoid further learning. Reaction times were measured from the onset of the spoken word.
² In the Stage 1 submission we stated that the trial in the verbal repetition task would end after 2000 ms had elapsed from the onset of the spoken novel word. In programming the experiment, we noticed that this risks slower participants missing responses. The subsequent change to the trial structure was approved by the editor.
The WASI was administered once, at the very end of the study (after the final test session). In all tasks stimuli were delivered by E-Prime. Manual responses were collected by a millisecond-accurate button box. Auditory stimuli were delivered, and vocal responses collected, by Beyerdynamic DT 234 Pro headsets.

3.4.
Exclusion criteria, quality checks, and blinding procedures
Only participants who met the eligibility criteria (set out in Section 3.1) were recruited to take part. Once participants were recruited, they were excluded (and replaced) from the analysis if they met any of the following criteria.
1. Their average accuracy rate in the phoneme monitoring task in the training phase was at or below chance (50%). This suggests they were not paying attention to the novel word training and therefore the quality of the data in both training and subsequent tests will be compromised.
2. Their average accuracy rate in the lexical competition task was at or below chance (50%) in one or more of the test sessions. This suggests they were not attending to the task and therefore the quality of the data in this task will be compromised.
3. They failed to enter slow-wave sleep or Stage 2 sleep during the nap. Theories discussed above and the existing literature (e.g., Antonenko, Diekelmann, Olsen, Born, & Mölle, 2013) predict that slow-wave sleep and/or Stage 2 sleep will be of critical importance in enhancing encoding ability. Therefore, data from participants who fail to get any slow-wave and/or Stage 2 sleep will not be able to address the hypotheses of this study.
4. They failed to follow any of the instructions outlined in Section 3.2.1 as indicated by the screening questionnaire and/or actigraphy.
5. Their data from any of the behavioural tasks or PSG data were lost due to experimenter error or device malfunction.
6. They chose to withdraw from the study before completing all sessions and tasks.
Blinding procedures were implemented to avoid experimenter bias from impacting the results. We identified two stages of the data analysis where bias could occur. Firstly, to avoid any bias in the scoring of the PSG data, the scorer had not seen any of the participant's behavioural data at the time of scoring. Secondly, to avoid bias in scoring the accuracy of the cued recall data, the scorer remained blind to the condition (nap or no-nap) of the dataset being scored. There was no room for subjective judgement in the analysis of other aspects of the data, as the analysis strictly followed the pipeline outlined below. Potential experimenter bias during data collection was avoided by giving participants full written instructions for each behavioural task, in addition to brief computer-administered instructions at the beginning of each task.

Analysis plan
Alpha levels in all analyses were set at .02, unless otherwise stated.

4.1.
Hypothesis 1: napping compared to wake before encoding will result in better episodic memory of newly learned words

Cued recall
The cued recall responses recorded by E-Prime were phonetically transcribed and scored as correct or incorrect. Only responses that matched the cued novel words in all phonemes were scored as correct. A 2 × 2 within-participants ANOVA with nap condition (nap vs. no-nap) and test time (immediate vs. delayed) as factors was calculated on the proportion of novel words correctly recalled. A significant main effect of nap condition would support Hypothesis 1. No significant interaction was predicted, but if a significant interaction had been found, planned contrasts (one-tailed t-tests) would have been used to compare the nap and no-nap conditions at both test times to clarify the source of the interaction. We predicted that in at least one of those comparisons the nap condition would show higher cued recall. In this case a Bonferroni correction would be applied to the alpha level, and therefore only t-tests with p < .01 would be considered statistically significant.
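The structure of this 2 × 2 within-participants ANOVA can be illustrated with a minimal sketch. It relies on the fact that, in a fully within-participants 2 × 2 design, each effect has one degree of freedom, so its F statistic equals the squared one-sample t statistic of a per-participant contrast score. The function names and the recall proportions below are hypothetical and are not taken from the study's analysis scripts; obtaining a p-value would additionally require the F distribution from a statistics library, which is not shown.

```python
from math import sqrt
from statistics import mean, stdev


def contrast_f(scores):
    """Return (F, df) for a one-df within-participant contrast.

    F is the squared one-sample t statistic of the contrast scores,
    with (1, n - 1) degrees of freedom.
    """
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t * t, (1, n - 1)


def anova_2x2_within(rows):
    """rows: one tuple per participant, ordered as
    (nap_immediate, nap_delayed, wake_immediate, wake_delayed)."""
    nap = [(ni + nd) / 2 - (wi + wd) / 2 for ni, nd, wi, wd in rows]
    time = [(ni + wi) / 2 - (nd + wd) / 2 for ni, nd, wi, wd in rows]
    interaction = [(ni - nd) - (wi - wd) for ni, nd, wi, wd in rows]
    return {
        "nap_condition": contrast_f(nap),
        "test_time": contrast_f(time),
        "interaction": contrast_f(interaction),
    }


# Hypothetical proportions of words recalled by four participants.
example = [
    (0.5, 0.4, 0.3, 0.3),
    (0.6, 0.5, 0.5, 0.3),
    (0.4, 0.4, 0.2, 0.3),
    (0.7, 0.5, 0.6, 0.5),
]
results = anova_2x2_within(example)
```

Under Hypothesis 1 the effect of interest is the `nap_condition` entry; the `interaction` entry corresponds to the case in which planned contrasts would have been run.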

4.2.
Hypothesis 2: napping compared to wake before encoding will result in larger consolidation effects in the integration of newly learned words with existing known words

Lexical competition
Only RTs to the base words and control words were analysed; data for filler words and nonwords are not informative regarding our hypotheses. Following precedent using this same task (Tamminen et al., 2010; Tamminen & Gaskell, 2008), RTs were log-transformed to better meet the assumption of normality, and all RTs faster than 300 ms and slower than 2500 ms were removed to reduce the impact of outliers. Only trials where an accurate response was made were entered into the analysis. The magnitude of the lexical competition effect was calculated for each participant at both test times by deducting the mean log RT to control words from the mean log RT to base words with novel competitors. A 2 × 2 within-participants ANOVA with nap condition (nap vs. no-nap) and test time (immediate vs. delayed) was calculated on the RTs. Given that lexical competition effects are not observed immediately after training, Hypothesis 2 predicted that participants in the nap condition would show a larger overnight increase in the lexical competition effect, manifested in a significant interaction. A planned contrast (one-tailed t-test) would have been used to confirm that the significant interaction reflected this pattern by comparing the overnight change in the magnitude of the lexical competition effect across the two conditions.
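The trimming and difference-score steps described above can be sketched as follows. This is an illustrative sketch only: the trial format (word type, RT in ms, accuracy flag) and the function name are assumptions made for the example and do not reflect the format of the study's data files.

```python
from math import log
from statistics import mean


def competition_effect(trials, fastest=300, slowest=2500):
    """Lexical competition effect for one participant at one test time.

    trials: iterable of (word_type, rt_ms, correct) tuples, where
    word_type is "base", "control", or a filler label (hypothetical format).
    Returns mean log RT to base words minus mean log RT to control words.
    """
    kept = {"base": [], "control": []}
    for word_type, rt_ms, correct in trials:
        if word_type not in kept:
            continue  # fillers are not analysed
        if not correct:
            continue  # only accurate responses enter the analysis
        if rt_ms < fastest or rt_ms > slowest:
            continue  # outlier trimming
        kept[word_type].append(log(rt_ms))
    return mean(kept["base"]) - mean(kept["control"])


# Hypothetical trials: one valid base, one valid control, three excluded.
demo = [
    ("base", 1000, True),
    ("control", 500, True),
    ("base", 2900, True),      # too slow, trimmed
    ("control", 700, False),   # incorrect, removed
    ("filler", 650, True),     # filler, ignored
]
effect = competition_effect(demo)
```

A positive value indicates slower responses to base words with a novel competitor than to control words, which is the direction expected once lexical competition has emerged.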

4.3.
Hypothesis 3: slow-wave activity will be associated with better encoding ability
Within the nap condition, sleep architecture data were assessed to investigate the role of slow-wave activity in restoring encoding capacity in accordance with the synaptic homeostasis hypothesis and existing literature (Antonenko et al., 2013; Ong et al., 2018). Two measures of slow-wave activity that have in the past been successfully implicated in novel word learning were investigated here: slow-wave sleep duration (Tamminen et al., 2010) and spectral power in the slow oscillation frequency band (Tamminen et al., 2013). We closely followed the analysis strategy and methods of these existing studies.

Slow-wave sleep duration
The PSG record of the nap was scored by an experienced sleep scorer following AASM guidelines (see Section 3.4 for blinding measures) in 30-s epochs. Total time spent in slow-wave sleep was recorded for each participant. A one-tailed Pearson correlation coefficient was calculated between slow-wave sleep duration and proportion of correct responses in the cued recall task (Hypothesis 3a). Only data from the immediate test were used here, as there was no a priori justification for expecting that a correlation would be seen only in the immediate test but not in the delayed test, or vice versa. A one-tailed Pearson correlation coefficient between slow-wave sleep duration and the overnight change in the lexical competition effect was also calculated (Hypothesis 3c). The lexical competition effect here and in subsequent analyses was calculated for each participant by deducting the mean RT to control words from the mean RT to base words. As two correlations involving slow-wave sleep duration were calculated here, a Bonferroni correction was applied to the alpha level, and therefore only correlations with p < .01 were considered statistically significant.
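As a worked illustration of the correlational analysis and the Bonferroni correction, the sketch below computes a Pearson coefficient, converts it to a t statistic with n − 2 degrees of freedom, and divides the base alpha level of .02 by the number of tests. The helper names and the example values are hypothetical; the one-tailed p-value itself would come from Student's t distribution in a statistics library and is not computed here.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (n - 2 degrees of freedom) for a correlation r from n pairs."""
    return r * sqrt((n - 2) / (1 - r * r))


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical minutes of slow-wave sleep and proportions recalled.
sws_minutes = [5.0, 12.5, 20.0, 31.0, 8.5]
recall = [0.10, 0.25, 0.20, 0.35, 0.15]
r = pearson_r(sws_minutes, recall)

# Two correlations per sleep measure at a base alpha of .02.
alpha_per_test = bonferroni_alpha(0.02, 2)
```

With a base alpha of .02, two tests give a per-test threshold of .01, the value used for each pair of correlations in this analysis plan.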

Slow oscillations
EEG power spectral density in slow-wave sleep was analysed in the slow oscillation frequency band (0.5–1 Hz). Preprocessing steps included extraction of those epochs scored as slow-wave sleep, exclusion of noisy electrodes (e.g., due to an electrode becoming dislodged during the nap), and manual removal of artefacts due to arousals or movement. The preprocessed data on each EEG electrode were then subjected to power spectral analysis following Welch's method using a 4-s Hamming window with 50% overlap. Statistical analysis of the spectral data was carried out on log-transformed absolute power (μV²/Hz). As there was no a priori theoretical or empirical justification for restricting the analysis to specific electrode sites, we calculated the average power across all electrodes. This is further justified by data reported in Tamminen et al. (2013), who found slow-oscillation power averaged across cortical electrodes to be involved in integration of newly learned words in the mental lexicon. One-tailed Pearson correlation coefficients were calculated between slow-oscillation power and proportion of correct responses in the cued recall task (Hypothesis 3b). Again, only immediate test data were used. A one-tailed Pearson correlation coefficient between slow-oscillation power and overnight change in the lexical competition effect was also calculated (Hypothesis 3d). As two correlations involving slow-oscillation power were calculated here, a Bonferroni correction was applied to the alpha level, and therefore only correlations with p < .01 were considered statistically significant.
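The spectral estimate described above can be sketched in a dependency-free form. The sketch averages Hamming-windowed periodograms over 4-s segments with 50% overlap and evaluates only the discrete frequencies that fall in the 0.5–1 Hz band. Several details are assumptions made for illustration rather than the study's settings: the periodic form of the Hamming window, removal of each segment's mean, and one-sided power scaling. In practice a library routine such as `scipy.signal.welch` would normally be used.

```python
import cmath
import math


def welch_band_power(signal, fs, band=(0.5, 1.0), win_sec=4.0, overlap=0.5):
    """Mean one-sided power spectral density within `band` (units^2 / Hz)."""
    n = int(round(win_sec * fs))            # samples per segment
    step = int(round(n * (1.0 - overlap)))  # hop between segment starts
    if len(signal) < n:
        raise ValueError("signal is shorter than one window")
    # Periodic Hamming window (an assumption; a symmetric form is also common).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]
    scale = fs * sum(w * w for w in window)  # density normalisation
    # DFT bins are spaced fs / n apart (0.25 Hz for a 4-s window).
    bins = [k for k in range(1, n // 2) if band[0] <= k * fs / n <= band[1]]
    totals = [0.0] * len(bins)
    n_segments = 0
    for start in range(0, len(signal) - n + 1, step):
        segment = signal[start:start + n]
        seg_mean = sum(segment) / n
        tapered = [(x - seg_mean) * w for x, w in zip(segment, window)]
        for j, k in enumerate(bins):
            coef = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                       for i, x in enumerate(tapered))
            totals[j] += 2.0 * abs(coef) ** 2 / scale  # one-sided spectrum
        n_segments += 1
    return sum(t / n_segments for t in totals) / len(bins)


# Synthetic 20-s signals at a hypothetical 100 Hz sampling rate.
fs = 100
slow_wave = [math.sin(2 * math.pi * 0.75 * i / fs) for i in range(20 * fs)]
faster = [math.sin(2 * math.pi * 5.0 * i / fs) for i in range(20 * fs)]
slow_power = welch_band_power(slow_wave, fs)
fast_power = welch_band_power(faster, fs)
log_slow_power = math.log10(slow_power)
```

For a signal recorded in μV the returned value is in μV²/Hz, and the log transform in the last line mirrors the transformation applied before statistical analysis.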

4.4.
Hypothesis 4: frontal fast sleep spindle activity in stage 2 sleep will be associated with better encoding ability
Preprocessing steps included extraction of those epochs scored as Stage 2 sleep, exclusion of noisy electrodes (e.g., due to an electrode becoming dislodged during the nap), and manual removal of artefacts due to arousals or movement. The raw EEG data were then band-pass filtered to restrict analysis to fast spindles (13.5–15 Hz) using a linear finite impulse response (FIR) filter. An automated detection algorithm developed by Ferrarelli et al. (2007) was used to derive the number of discrete spindle events for each electrode. The algorithm is freely available as an appendix to Warby et al. (2014). Briefly, the algorithm counts amplitude fluctuations in the filtered time series exceeding a predetermined threshold as spindles. These thresholds are calculated relative to the mean channel amplitude and set to eight times the average amplitude. This algorithm has been successfully used in previous research into sleep and word learning (e.g., Tamminen et al., 2013; Tham, Lindsay, & Gaskell, 2015), including the same spoken word learning paradigm as in the current study (Tamminen et al., 2010). The average number of spindles detected over frontal electrodes (F3 and F4) was calculated and used in the correlational analyses. One-tailed Pearson correlation coefficients were calculated between number of spindles and proportion of correct responses in the cued recall task (Hypothesis 4a). A one-tailed Pearson correlation coefficient between the number of spindles and overnight change in the lexical competition effect was also calculated (Hypothesis 4b). As two correlations involving spindle count were calculated here, a Bonferroni correction was applied to the alpha level, and therefore only correlations with p < .01 were considered statistically significant.

Current sleepiness
As the main manipulation in this experiment concerns sleep, it is possible that participants may have been more or less sleepy at test in one or the other condition and this may have affected their test performance. Of our two test tasks, cued recall could have been vulnerable to this potential confound (sleepiness may have affected global RTs in the lexical decision task but not the RT difference between the control words and base words with new competitors). We compared the Stanford Sleepiness Scale (SSS) scores at both test sessions to check whether participants were equally tired when completing the nap and no-nap conditions. A two-tailed paired t-test comparing data from the two conditions was calculated for the immediate test session, and another one for the delayed test session. A Bonferroni correction was applied to the alpha level, and therefore only t-tests with p < .01 were considered statistically significant. If a significant difference in SSS scores had been found at either test time, we would have repeated the ANOVA analysis under Hypothesis 1 using an Analysis of Covariance (ANCOVA). For each participant we would have calculated the sleepiness difference between the nap and no-nap conditions (based on the immediate test if that is where the difference was observed, based on the delayed test if that is where the difference was observed, or based on the mean values across both tests if the difference was observed in both). This difference would have been centered (Schneider, Avivi-Reich, & Mozuraitis, 2015) and entered as an additional covariate with the same main effects and interactions as outlined under Hypothesis 1. If sleep restores episodic memory encoding ability, we predicted we would see a significant main effect of nap condition even when the sleepiness difference between nap and no-nap conditions is accounted for by the covariate.

Results
The data and analysis scripts are available at https://osf.io/98es5/ and the Stage 1 manuscript at https://osf.io/73yms/. There were eight male (18%) and 37 female (82%) participants. Participants were aged between 18 and 24 years old (M = 21.11, SD = 1.22 years). The minimum age-normed t-score on the WASI-II vocabulary assessment was 44 and the maximum was 77 (M = 54.11, SD = 6.06), indicating vocabulary abilities in the average range. Thirteen participants were non-habitual nappers (29%) and 32 were habitual nappers (71%). Seven further participants took part but were replaced following the criteria outlined in Section 3.4. Three participants withdrew before completing the study. Three participants were excluded due to equipment malfunction. One participant was excluded due to researcher error (they were trained on the same word list in both sessions). All data were collected in accordance with the registered report protocol, with two exceptions, both approved by the editor. First, one participant had their second session 34 days after the first, compared to the intended 28 days. As this minor deviation from protocol is unlikely to impact the results, we included the participant's data in the analyses. Second, two participants' data had to be excluded from the sleep spindle and spectral power analyses due to intermittent loss of signal. We were unable to replace these two participants due to a narrow data collection window caused by COVID-19 restrictions on face-to-face data collection.

5.1.
Hypothesis 1: napping compared to wake before encoding will result in better episodic memory of newly learned words

5.2.
Hypothesis 2: napping compared to wake before encoding will result in larger consolidation effects in the integration of newly learned words with existing known words

Lexical competition
Trimming of outliers (below 300 ms or above 2500 ms) resulted in the removal of 1.82% (N = 144) of the trials. Removal of incorrect responses resulted in the removal of a further 7.06% (N = 549) of trials. Table 2 shows the RTs to base words and control words in each condition, and the error rate per condition. Fig. 3 shows the magnitude of the lexical competition

5.3.
Hypothesis 3: slow-wave activity will be associated with word learning capacity
5.3.1. Hypothesis 3a: slow-wave sleep duration and cued recall
Table 3 shows the average distribution of time spent in different sleep stages during the 100-min nap opportunity. Fig. 4A shows the association between proportion of words correctly recalled in the cued recall task and slow-wave sleep duration in the immediate test in the nap condition. There was no significant correlation between slow-wave sleep duration (in minutes) and proportion of accurate cued recall responses, r(43) = .17, p = .139 [95% CI = −.09 – 1.00].
Fig. 4B shows the association between proportion of correct answers in the cued recall task and spectral power in the slow oscillation frequency band. There was no significant correlation between these variables, r(41) = .16, p = .153 [95% CI = −.10 – 1.00].
Fig. 5A shows the association between the overnight change in the magnitude of the lexical competition effect and time spent in slow-wave sleep. There was no significant correlation between these variables, contrary to our prediction, r(43) = −.11, p = .243 [95% CI = −1.00 – .15].
]. This suggests that more fast frontal spindles prior to encoding are associated with greater increases in lexical competition from the immediate test to the delayed test (after a period of overnight sleep).

Current sleepiness
Paired t-tests were used to examine whether there were significant differences in the Stanford Sleepiness Scale (SSS) ratings in the nap condition test sessions compared to the wake condition test sessions. There was no significant difference in the immediate test, t(44) = −.37, p = .71, d = −.08, or in the delayed test, t(44) = −.75, p = .46, d = −.11. Therefore, we did not need to use an ANCOVA in the main analyses, as stated in Section 4.5.

Exploratory analyses
6.1. Cued recall using a graded measure of accuracy
In our pre-registered analyses of the cued recall data, we categorised a response as correct only if it perfectly matched the trained novel word. As shown in Fig. 2, only a small proportion of responses met this criterion. Importantly, under this scoring both a failure to produce any response and a response that deviates from the correct word by just one phoneme are scored as incorrect, yet the participant in the latter case clearly recalled more than in the former case. In response to this problem, we (Ricketts, Dawson, & Davies, 2021) and others have started calculating Levenshtein distance as a more fine-grained measure of recall accuracy, both in the domain of word learning (e.g., Ricketts et al., 2021; Frances et al., 2020) and in the literature looking at the impact of sleep-associated memory consolidation (e.g., Kurdziel & Spencer, 2016). Levenshtein distance is a measure of how many changes (including insertions, deletions, or substitutions of phonemes or letters) are needed for the participant's response to match the correct response; a larger distance suggests the participant's response was further away from the target. It can be used to calculate a matching percentage that quantifies the overlap between the given response and the correct response. In calculating the matching percentage we followed the formula used by Kurdziel and Spencer (2016): matching percentage = 100 − ((Levenshtein Distance) × 100)/MaxLength, where MaxLength equals the number of phonemes in the longer of the two words (actual response or correct response) and controls for word length. A matching percentage of 100% represents a completely correct response. A matching percentage of 0% indicates a response with no overlap with the correct response, or a failure to enter a response.
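The matching percentage can be illustrated with a short sketch. The Levenshtein routine is the standard dynamic-programming formulation, and the second function applies the formula quoted above. The function names are hypothetical, and letter strings stand in for phoneme transcriptions purely for illustration; any sequence of phoneme symbols could be passed instead.

```python
def levenshtein(response, target):
    """Minimum number of insertions, deletions, or substitutions
    needed to turn `response` into `target`."""
    previous = list(range(len(target) + 1))
    for i, r_item in enumerate(response, start=1):
        current = [i]
        for j, t_item in enumerate(target, start=1):
            cost = 0 if r_item == t_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def matching_percentage(response, target):
    """100 - (Levenshtein distance * 100) / MaxLength, where MaxLength is
    the length of the longer of the two sequences."""
    max_length = max(len(response), len(target))
    if max_length == 0:
        return 100.0  # two empty sequences are identical
    return 100.0 - levenshtein(response, target) * 100.0 / max_length


# Letters used as stand-ins for phonemes in these hypothetical responses.
exact = matching_percentage("cathedruke", "cathedruke")
one_off = matching_percentage("cathedruce", "cathedruke")
blank = matching_percentage("", "cathedruke")
```

A blank response has a distance equal to the target length, so it scores 0%, which is how the formula accounts for failures to respond.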
Importantly, Levenshtein distance captures a graded degree of accuracy that is more sensitive to individual differences than binary (correct or incorrect) accuracy (Ricketts et al., 2021); the formula from Kurdziel and Spencer (2016) also considers the impact of blank responses in the calculation of mean accuracy. In our first exploratory analysis we repeated the main analyses on the cued recall task using matching percentage as the measure of accuracy (see Fig. 7). If multiple responses were given for the same item, the response with the shortest Levenshtein distance was selected as the final answer. A mean matching percentage was calculated per participant per session. There was no significant main effect of nap condition (nap vs. no

Recognition memory
As outlined in the Method section, we included an old-new categorisation recognition memory test but did not include this in the pre-registered analyses as we were not confident in making firm predictions about recognition memory. However, given that some word learning studies have reported a beneficial impact of sleep following learning on old-new categorisation accuracy and response times (e.g., Tamminen et al., 2010), we conducted exploratory analyses to establish whether our nap manipulation affected accuracy and response times in this task. Accuracy was measured in d' to account for response bias. This was calculated by subtracting the z-transformed proportion of inaccurate "old" responses to foils (i.e. false alarms) from the z-transformed proportion of accurate "old" responses to trained words (i.e. hits).
Another 2 × 2 ANOVA was conducted using the reaction times for correct responses to trained words. Removing the incorrect responses resulted in the removal of 18.61% of the data. Using the same cleaning procedure as Tamminen et al. (2010), reaction times slower than 3,000 ms and faster than 500 ms were removed. This resulted in the removal of a further 1.49% of the data. The remaining reaction times were log transformed, and a mean was calculated per participant per session. There was no significant main effect of nap condition, F(1, 43) 3 = .  Table 4).
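For reference, the d' computation can be sketched as the z-transformed hit rate minus the z-transformed false-alarm rate. The function name and counts below are hypothetical. The log-linear correction applied to the rates (adding 0.5 to each count and 1 to each total) is an assumption introduced here so that rates of exactly 0 or 1 remain finite; the text above does not state how such extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, n_old, false_alarms, n_new):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    hits: number of "old" responses to the n_old trained words.
    false_alarms: number of "old" responses to the n_new foils.
    """
    # Assumed log-linear correction; keeps both rates strictly inside (0, 1).
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical session with 22 trained words and 22 foils.
good_discrimination = d_prime(hits=20, n_old=22, false_alarms=2, n_new=22)
no_discrimination = d_prime(hits=11, n_old=22, false_alarms=11, n_new=22)
perfect_session = d_prime(hits=22, n_old=22, false_alarms=0, n_new=22)
```

Larger positive values indicate better discrimination between trained words and foils, and a value of zero indicates that "old" responses were equally likely for both.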

WASI vocabulary scores
Researchers have argued that a participant's vocabulary size predicts word learning success, at least in children (e.g., James, Gaskell, Weighall, & Henderson, 2017). Given that we collected a measure of vocabulary size, we sought to extend these findings to our sample of young adults. For these exploratory analyses, raw WASI vocabulary scores were used because t-scores are age-adjusted and our sample crosses an age boundary for the WASI t-scores; we do not, however, anticipate any developmental differences in a sample of 18–24-year-olds. An average of performance across all test sessions for each test task was calculated per participant. However, in the lexical competition analyses, only the delayed test data were included as lexical competition effects typically only emerge after a period of consolidation. This was then correlated with the raw WASI vocabulary scores to examine whether participants with better vocabulary have a higher ability to learn the novel words. As we were conducting four two-tailed correlation tests, alpha level was Bonferroni corrected to .005. There was no significant correlation between vocabulary and cued recall performance either when measured as proportion of accurate responses, r(43) = .21, p = .169 [95% CI = −.09 – .47], or when measured in matching percentage, r(43) = .04, p = .781 [95% CI = −.25 – .33]. There was also no significant correlation between vocabulary score and old-new categorisation d' scores, r(43) = .19, p = .208 [95% CI = −.11 – .46], or between vocabulary scores and the magnitude of lexical competition effects in the delayed test, r(43) = .27, p = .077 [95% CI = −.03 – .52].
Next, we sought to establish whether participants with lower vocabulary size might benefit from the nap more than participants with higher vocabulary size. For cued recall and old-new categorisation, we first averaged the data across the immediate and delayed sessions, and then subtracted the wake condition from the nap condition, to yield the difference between nap and wake across test times. As we conducted three two-tailed correlation tests, alpha level was Bonferroni corrected to .007. There was no significant correlation between vocabulary size and the nap effect in cued recall matching percentage, r(43) ¼ .08, p ¼ .579, [95% CI ¼ À.21 e .37], or in old-new categorisation accuracy, r(43) ¼ À.18, p ¼ .231, [95% CI ¼ À.45 e .12]. For lexical integration we calculated the magnitude of change in the lexical competition effect from the immediate to the delayed test sessions, and again subtracted the wake condition from the nap condition. There was no significant correlation between vocabulary size and the nap effect, r(43) ¼ .27, p ¼ .073, [95% CI ¼ À.03 e .52].
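For readers who wish to check the confidence intervals reported with these correlations, the sketch below computes a 95% interval for a Pearson correlation using the Fisher z-transformation. That this is the method behind the reported intervals is an assumption, since the text does not say how they were computed; the function name is hypothetical. The sample size of 45 follows from the reported degrees of freedom (df = n − 2 = 43).

```python
import math
from statistics import NormalDist


def pearson_ci(r: float, n: int, level: float = 0.95):
    """Confidence interval for a Pearson r via the Fisher z-transformation."""
    z = math.atanh(r)                      # Fisher z of the observed r
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# r(43) = .21 for vocabulary and cued recall proportion correct
low, high = pearson_ci(0.21, 45)
print(f"{low:.2f}, {high:.2f}")  # -0.09, 0.47
```

Under this assumption the sketch returns the interval reported for r = .21, and it gives −.03 and .52 for r = .27.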

Frontal slow spindles
In our pre-registered analyses we focussed on fast sleep spindles as Mander et al. (2011) found fast spindles to be associated with restoration of episodic encoding capacity, and no such association involving slow spindles. In exploratory analyses, we sought to establish whether the association between fast spindles and the emergence of lexical competition effects that we observed is specific to fast spindles, or whether it would also be observed in slow spindles. Therefore, we repeated the spindle analyses with slow (11–13.5 Hz) frontal spindles in Stage 2 sleep. In all other respects the analysis was identical to the pre-registered analysis. As we were conducting three two-tailed correlation tests, alpha level was Bonferroni corrected to .007. Similarly to the findings in the original analyses, we did not find a significant relationship between slow frontal spindles and cued recall accuracy when measured in proportion accuracy, r  to the same association with fast spindles but which did not survive the Bonferroni correction here.

SASS-Y retrospective sleep diary
Recent studies have shown that restricted sleep and sleep/wake variability can have detrimental effects on adolescents' and young adults' ability to learn (Lowe, Safati, & Hall, 2017). Such restriction and variability could be captured by the SASS-Y as it evaluates sleep both on weekdays and weekends. Table 5 shows the measures collected in the SASS-Y averaged across all participants. We removed SASS-Y questionnaires with missing data, allowing only complete sets of within-subjects data (N = 39). Using the difference between weekday and weekend total sleep time as a proxy for sleep restriction, we predicted that participants with higher levels of sleep restriction would learn fewer new words. We conducted two-tailed exploratory Pearson's correlations between sleep restriction and cued recall matching percentage in the immediate test, as this appears to be the most sensitive test of sleep effects in our data. As we were conducting two two-tailed correlation tests, alpha level was Bonferroni corrected to .01. We found no significant correlation when using data from the nap condition, r (37)

Habitual napping
Using two-tailed exploratory independent t-tests, we examined whether habitual nappers (one or more naps per week) and non-habitual nappers (napping less than once per week on average) differed in slow-wave sleep (SWS) duration or in the number of Stage 2 fast frontal sleep spindles. There was no significant difference between habitual and non-habitual nappers in SWS duration, t(43) = −1.43, p = .16: non-habitual nappers had an average of 26.6 min of SWS (SD = 14.4) and habitual nappers had 20.27 min (SD = 13.1). There was also no significant difference in the average number of sleep spindles, t(41) = −1.32, p = .20: non-habitual nappers had slightly more spindles on average (M = 43.04, SD = 28.1) than habitual nappers (M = 33.52, SD = 18.2).
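As a worked illustration of the independent t-test on SWS duration, the sketch below computes a pooled-variance t statistic from summary statistics. The group sizes are an assumption: they are not reported in this section, and 32 habitual and 13 non-habitual nappers are inferred here from the 71% figure given in the Discussion and from the 43 degrees of freedom. The function name is hypothetical, and the use of a pooled-variance (Student) test rather than Welch's test is also assumed.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's independent-samples t (group a minus group b) from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


# Habitual nappers (assumed n = 32) versus non-habitual nappers (assumed n = 13).
t_value, df = pooled_t(20.27, 13.1, 32, 26.6, 14.4, 13)
print(f"t({df}) = {t_value:.2f}")
```

With these assumed group sizes the result is close to the reported t(43) = −1.43, with the sign reflecting habitual minus non-habitual nappers.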

Bayes Factors
We conducted exploratory post-hoc Bayesian analyses to examine the extent to which our non-significant results in the pre-registered analyses support the null hypotheses. Bayesian ANOVAs and correlations were calculated in JASP 0.16.4 (JASP Team, 2022) following van Doorn et al. (2021) and van den Bergh et al. (2020). In all analyses we used the default priors implemented in JASP.
Hypothesis 1: napping compared to wake before encoding will result in better episodic memory of newly learned words
For Hypotheses 1 and 2 we conducted a Bayesian version of the ANOVA in the original pre-registered analyses. The Bayes factors (BF) of the effect analysis are reported in Table 6. The BFs associated with the nonsignificant effects are both higher than 1/3 and below 3 and therefore provide only weak evidence for the null hypothesis (van Doorn et al., 2021).

Hypothesis 2: napping compared to wake before encoding will result in larger consolidation effects in the integration of newly learned words with existing known words
The BFs associated with the main effects are below 1/3 and therefore show moderate evidence in favour of the null hypothesis, and the BF associated with the interaction is below 1/10 and therefore provides strong evidence for the null hypothesis (Table 6).

Correlational analyses
Bayes Factors were calculated for the Pearson correlations reported under the pre-registered analyses (Table 7). All nonsignificant correlations show moderate evidence in favour of the null hypothesis. The BF for the significant correlation between spindle activity and increase in the lexical competition effect is above 3 and therefore associated with moderate evidence for the alternative hypothesis.
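The interpretation thresholds used in this section can be summarised in a short sketch. The cut-offs of 1/10, 1/3 and 3 are those named in the text. The upper cut-off of 10 for strong evidence for the alternative is an assumption, mirroring the 1/10 cut-off, and the function name is hypothetical. The example values are the BF10 entries from Table 7.

```python
def describe_bf10(bf10: float) -> str:
    """Map a Bayes factor BF10 onto the verbal labels used in the text."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf10 < 1 / 10:
        return "strong evidence for H0"
    if bf10 < 1 / 3:
        return "moderate evidence for H0"
    if bf10 <= 3:
        return "weak evidence"
    if bf10 <= 10:  # assumed mirror image of the 1/10 cut-off
        return "moderate evidence for H1"
    return "strong evidence for H1"


# BF10 values for hypotheses 3a-3d, 4a and 4b (Table 7).
for label, bf in [("3a", 0.33), ("3b", 0.32), ("3c", 0.24),
                  ("3d", 0.20), ("4a", 0.24), ("4b", 6.31)]:
    print(label, describe_bf10(bf))
```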

Discussion
In the current study we set out to test four sets of hypotheses based on predictions made by theories that suggest sleep restores capacity to encode new episodic memories (Saletin & Walker, 2012; Tononi & Cirelli, 2003). In the nap condition participants took a short daytime nap before being trained on a set of new words. In the no nap condition participants remained awake for an equivalent duration and watched a video instead of napping. Participants were tested on their memory for the newly encoded words in an immediate test, and in a next-day delayed test. Our pre-registered analyses did not find a significant main effect of napping compared to wake on test performance across any task, although our exploratory analyses did suggest better immediate cued recall in the nap condition when using a more sensitive measure of accuracy. We did not find a significant correlation between test performance and slow-wave activity. However, in line with some earlier reports (Mander et al., 2011) we found that more Stage 2 fast frontal spindles during the daytime nap were significantly correlated with greater increase in lexical integration from the immediate to the delayed test.
Our first hypothesis stated that if the nap restores encoding ability, participants should be able to encode more words after the nap than after wake, and therefore show higher levels of recall in a cued recall task following training in the nap condition. Our pre-registered analysis did not support this hypothesis: there was no statistically significant difference between the number of words recalled correctly in the nap condition and the no nap condition. However, binary cued recall accuracy (correct or incorrect) lacks sensitivity in the current study due to the presence of floor effects. Importantly, this approach fails to account for partially correct responses and instead treats fully incorrect responses (or no response at all) and responses that only minimally deviate from the correct word as equivalent. Therefore, we (Ricketts et al., 2021) and others (e.g., Frances, de Bruin, & Duñabeitia, 2020) have recently adopted more sensitive approaches to scoring recall data. In exploratory analyses, we used matching percentage (Kurdziel & Spencer, 2016), a measure based on Levenshtein distance, to calculate a graded measure of recall accuracy. In this analysis, participants' responses were significantly more accurate in the nap condition compared to the no nap condition in the immediate test. In the delayed test, performance was equal regardless of whether a nap or wake preceded encoding. It is important to acknowledge that exploratory analyses are not hypothesis driven and must be treated with caution. Further research is needed to test the hypothesis that a nap before learning benefits cued recall when using our more sensitive measure of accuracy.
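To make the contrast between binary and graded scoring concrete, the sketch below computes a Levenshtein-based matching percentage for a recall response. The normalisation used (edit distance divided by the length of the longer string) is an assumption for illustration and may differ from the exact formula of Kurdziel and Spencer (2016). Scoring on letters rather than phonemes is likewise assumed, and the function names and the example response are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def matching_percentage(response: str, target: str) -> float:
    """Graded accuracy: 100 for an exact match, 0 for no overlap at all."""
    longest = max(len(response), len(target))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein(response, target)) / longest


# A near miss scores 0 under binary scoring but 90 under graded scoring.
print(matching_percentage("cathedruce", "cathedruke"))  # 90.0
print(matching_percentage("", "cathedruke"))            # 0.0
```

A response that deviates from the target by a single character is thus distinguished from an omitted response, which binary scoring treats identically.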
In both the pre-registered and exploratory cued recall analyses, there was significantly better performance in the delayed test than the immediate test. This may partially be explained by practice effects, especially as the old-new categorisation task may have served as additional training by providing one additional exposure to each new word in the immediate test. Memory consolidation processes during overnight sleep between the test sessions may also have contributed to the increase in recall levels. For example, Tamminen et al. (2010) showed that both cued and free recall benefitted from sleep-associated memory consolidation in the same word learning paradigm as used here, whereby recall levels increased after a night of sleep. Sleep-associated memory consolidation is thought to operate preferentially on weaker rather than stronger memories, at least in the absence of interference (Petzka, Charest, Balanos, & Staresina, 2021). If consolidation processes occurring between the first and second test favoured the weaker condition (no nap), this could explain why the nap benefit in our exploratory analyses was no longer seen in the delayed test.
Our second hypothesis was that the increase in lexical competition effects from the immediate test to the delayed test, which reflects emerging integration of new and old information, would be greater in the nap condition compared to the no nap condition. We found no significant impact of the nap on the emergence of the lexical competition effects. While this appears to indicate that pre-learning sleep does not benefit later integration of new words in the mental lexicon, we note that the magnitude of lexical competition effects remained close to zero in the delayed test, where we expected to see effects significantly above zero. It is not clear why this was the case. One possible explanation comes from an inspection of the overall reaction times in this task. Table 2 shows that the average reaction times in our study ranged between 1122 ms and 1225 ms. Tamminen and Gaskell (2008), who used the same task, the same word stimuli, and the same audio recordings of the stimuli, reported reaction times in the 970 ms–1050 ms range (see Figure 2 in Tamminen & Gaskell, 2008). Quantification of the lexical competition effect requires participants to respond as close as possible to the uniqueness point of the spoken words (i.e. the point where the base word cathedral deviates from its new competitor cathedruke). If participants in our study responded substantially after this point, the magnitude of the lexical competition effect could be diluted. Given these considerations, we urge caution in interpreting the null effect observed regarding our second hypothesis.

Table 7. Bayes factors (BF10) for the pre-registered correlational hypotheses.
Hypothesis | Bayes Factor (BF10)
3a. Slow-wave sleep duration will be correlated with cued recall accuracy | .33
3b. Slow-wave spectral power will be correlated with cued recall | .32
3c. Slow-wave sleep duration will be correlated with lexical competition | .24
3d. Slow-wave spectral power will be correlated with lexical competition | .20
4a. Fast frontal Stage 2 spindle activity will be correlated with cued recall | .24
4b. Fast frontal Stage 2 spindle activity will be correlated with lexical competition | 6.31

We measured lexical competition effects in a lexical decision task. Another task commonly used for this purpose is pause detection, where participants are not asked to judge the lexicality of a word, but to detect a short pause inserted close to the uniqueness point of the spoken word (e.g., Dumay & Gaskell, 2007). The time it takes to make a decision about the presence of the pause is taken as a proxy for the level of lexical activation at the time, with longer pause detection times indicating higher lexical activity (Mattys & Clark, 2002). The advantage of this task is that it is less reliant on explicit linguistic knowledge than lexical decision and may therefore be a more sensitive measure of lexical integration and its time course. In fact, studies using the lexical decision task have sometimes shown a different time course of lexical integration from studies using pause detection (e.g., Tamminen et al., 2010 vs. Dumay & Gaskell, 2007). Another reason for the absence of emerging lexical competition effects in our study could be low levels of explicit memory for the novel words. Walker et al. (2019) failed to find overnight increases in lexical competition when the number of exposures to novel words during training was very low, suggesting that sleep does not benefit integration of poorly encoded novel words.
The third and fourth hypotheses targeted the neural mechanisms of sleep-associated restoration of memory encoding capacity, and compared predictions made by two different theoretical accounts of restoration. The synaptic homeostasis hypothesis (Tononi & Cirelli, 2014) proposes that slow-wave sleep functions to desaturate the brain by pruning excessive synaptic connections, allowing restoration of encoding capacity after sleep. We therefore predicted in our third set of hypotheses an association between slow-wave activity (i.e., slow-wave sleep duration in hypotheses 3a and 3c; slow oscillation spectral power in hypotheses 3b and 3d) and both cued recall accuracy and the overnight emergence of lexical competition effects. The extension to the Complementary Learning Systems account proposed by Mander et al. (2011) and Saletin and Walker (2012) on the other hand proposes that restoration of encoding capacity is due to a process of hippocampal-neocortical dialogue as indexed by frontal fast sleep spindle activity. We therefore predicted in our fourth set of hypotheses an association between the number of frontal fast spindles and cued recall and the overnight emergence of lexical competition effects.
We did not find a significant relationship between the slow-wave sleep measures and the test tasks. This could suggest that slow-wave sleep is not the key neural mechanism associated with restoration of encoding capacity. However, before accepting such a conclusion we need to consider an important feature of the synaptic homeostasis hypothesis. The theory postulates a slow-wave activity control loop during sleep whereby increasing daytime learning results in increasing synaptic strength which in turn results in increasing slow-wave activity during sleep. If it is the case that our participants engaged in little new learning before taking the nap, then there would be little need for reducing synaptic strength and little variability across participants in slow-wave activity during the nap. This might limit our ability to detect effects involving slow-wave sleep. This point further highlights a potentially important difference between our study and that of Mander et al. (2011). In Mander et al. participants encoded a large set of face-name stimulus pairs before napping. This ensures that episodic memory capacity is saturated before sleep and that restoration processes are engaged. In our study we did not give participants a memory task before the nap, and it is therefore possible that for some participants, restoration processes were not needed or were engaged to a limited degree. Furthermore, whilst our wake condition was designed to roughly match with the nap condition in linguistic input without letting participants sleep, it may be the case that the wake condition due to its passive nature allowed some degree of slow-wave activity that may have diluted the benefit of sleep. Brokaw et al. (2016) showed that even a brief period of wakeful rest allowed memory consolidation of a previously learned short story to occur, and that this consolidation was associated with slow oscillatory EEG activity. 
Our participants may have benefitted from such activity while watching the video, in contrast to Mander et al. (2011) who allowed participants to complete usual daily activities during wake.
Turning to our predictions concerning sleep spindles, we did not find a significant correlation between sleep spindles and cued recall performance. However, we did find a positive correlation between fast frontal spindles and overnight increase in the lexical competition effect, as predicted. This supports the view put forward by Mander et al. (2011) and Saletin and Walker (2012) that hippocampal-neocortical dialogue shifts newly encoded memories from hippocampal to cortical-dependence, thus restoring hippocampal episodic learning capacity. It is not clear however why spindles were only associated with emerging lexical competition and not cued recall (although Fig. 6 suggests both associations are in the same direction). One possibility is that the observed association is a proxy for overnight memory consolidation. Sleep spindle activity is trait-like such that individuals show stable spindle activity levels from night to night (De Gennaro, Ferrara, Vecchio, Curcio, & Bertini, 2005). Furthermore, Mylonas et al. (2020) recently showed that there is a strong correlation between an individual's spindle density observed during a daytime nap, and spindle density during overnight sleep. Therefore, participants with more spindles during the nap in our study may also have had greater levels of spindles during their nocturnal sleep. Tamminen et al. (2010) showed that spindle activity overnight is correlated with larger increases in the lexical competition effect. If spindle activity during a nap is predictive of spindle activity overnight, the association we observed could reflect overnight emergence of lexical integration rather than an impact of the nap on lexical integration. Because this aspect of our experimental design is correlational, further research is needed to clarify possible causal relationships.
The pre-registered analyses targeted fast sleep spindles. This choice was motivated by Mander et al. (2011) who found a role for fast but not slow spindles in restoring encoding capacity. Furthermore, in the memory consolidation literature it has been argued that fast spindles are involved in consolidation processes while slow spindles may not be (e.g., Mölle, Bergmann, Marshall, & Born, 2011). In exploratory analyses we sought to establish whether the fast spindle specificity observed by Mander et al. (2011) was also present in our data. We found no statistically significant association between slow spindles and overnight increase in lexical competition effects after Bonferroni correction, suggesting the association we found may be specific to fast spindles. However, the numerical size of the association was very similar for fast (r = .40) and slow (r = .39) spindles, and it therefore seems premature to rule out a role for slow spindles in restoration of encoding capacity based on our data.
The trait-level aspect of sleep spindles (e.g., De Gennaro et al., 2005; Mylonas et al., 2020) also suggests that it is important to consider individual differences as well as group differences in analyses. Our analyses revealed an association between individual differences in sleep architecture and performance that would have been masked had we restricted our analyses to group comparisons alone. Specifically, the number of fast frontal spindles during Stage 2 sleep of the nap was a significant predictor of changes in lexical competition from the immediate to the delayed test. The importance of individual differences has also been observed in experimental manipulations of sleep architecture. For example, Ong et al. (2018) found that participants with greater increases in slow oscillations following acoustic stimulation showed better memory recollection. As in our study, Ong et al.'s (2018) group-level analyses did not indicate an impact of slow-oscillation stimulation on cognitive performance. Therefore, future studies may benefit from investigating the individual characteristics that predict sleep's ability to restore encoding capacity, even when group-level differences are less evident. Such research will require larger sample sizes and perhaps more sensitive measurement.
Other sleep stages may also be of interest in future research. For example, in the current study, fewer than half (44%, N = 20) of participants achieved REM sleep, with the average duration of REM being 12 min for these participants. The low levels of REM sleep in the current study show that the majority of participants did not achieve a full sleep cycle during the nap. Therefore, different effects might emerge after a longer sleep duration, allowing a full sleep cycle. On average our participants spent less time in REM than participants reported in Mander et al. (2011), with our participants reaching an average of 5.5 min in REM while Mander et al. reported an average REM duration of 17.5 min. The average duration of the other sleep stages is very similar across the two studies. While we cannot exclude the possibility that our results were weakened by low levels of REM sleep, we note that none of the current theories of how sleep restores memory encoding capacity ascribe a role for REM sleep.
We also note that our sample of participants varied in their napping habits, although the majority (71%) were habitual nappers. We are not aware of published literature examining whether habitual and non-habitual nappers might differ in the benefits they gain from a daytime nap with respect to subsequent learning. A recent study, however, has shown that habitual and non-habitual nappers both benefit from a daytime nap occurring after learning (Leong et al., 2021). Even if habitual napping does not affect the memory benefits of a nap, it can influence sleep architecture during a daytime nap and therefore introduce statistical noise to the PSG correlations we were investigating. McDevitt, Alaynick, and Mednick (2012) showed that habitual nappers have more Stage 1 and 2 sleep and less slow-wave sleep than non-habitual nappers. We conducted exploratory analyses to check whether habitual and non-habitual nappers in our sample differed in our key measures of sleep architecture. Habitual napping did not significantly affect the duration of slow-wave sleep or average number of fast frontal Stage 2 sleep spindles, suggesting that this is unlikely to be a substantial confound in our study.
In other exploratory analyses, we examined the effects of test time and napping on recognition memory. Due to mixed results in previous studies using old-new recognition, we were unable to make strong predictions for this task. Some studies have found consolidation benefits for recognition memory (e.g., Tamminen et al., 2010, using the same word learning paradigm) as well as benefits of pre-encoding sleep (e.g., Mander et al., 2011). Our results show significantly better recognition (higher d') in the immediate compared to the delayed test of the old-new recognition task, suggesting there was no consolidation benefit in the delayed test. When examining reaction times for accurate responses, participants responded faster in the delayed test than the immediate test in both conditions, possibly indicating some degree of consolidation demonstrated by speed of recognition. Alternatively, the decrease in accuracy accompanied by faster responses could reflect a speed-accuracy trade-off in the delayed test. This could possibly be underpinned by a change in response strategy whereby participants in the immediate test relied more on recollection than familiarity, resulting in slow but accurate responses, but vice versa in the delayed test, resulting in fast but inaccurate responses due to both old and new items now being familiar to a degree. Contrary to Mander et al. (2011), we did not find a significant benefit of pre-encoding sleep on recognition memory. However, the paradigms and number of items used differ between the studies. Participants in Mander et al.'s research were presented with 100 encoded face-name pairs and 50 novel pairs during testing, whereas we used 22 trained non-words and presented 22 similar-sounding foils. A larger number of trials and a different ratio of "old" to "new" trials in Mander et al.'s (2011) study may have afforded greater sensitivity than our recognition test, perhaps allowing significant effects to emerge, as seen in our exploratory analyses using graded cued recall accuracy. Additionally, greater levels of training, with regard to both depth and breadth of materials studied, may be beneficial in future research to reduce floor effects and thus increase sensitivity.
Since we also measured existing vocabulary knowledge, we were able to conduct exploratory analyses investigating associations between existing vocabulary knowledge and ability to learn new words, and between vocabulary knowledge and any nap benefits. We found no significant correlation between vocabulary knowledge and any of the tests of word learning. Although these exploratory analyses were not motivated by a hypothesis, the lack of significant correlation contrasts with experimental studies that show an association between existing vocabulary knowledge and subsequent word learning (e.g., Ricketts, Bishop, Pimperton, & Nation, 2011). More generally, the lack of an association seems somewhat at odds with the notion of "Matthew effects" (Stanovich, 1986) in vocabulary development, whereby greater existing vocabulary knowledge is associated with faster growth in vocabulary, or the 'rich getting richer'. However, Matthew effects are not consistent across studies (Pfost, Hattie, Dörfler, & Artelt, 2014) and may diminish after childhood (Ricketts, Lervåg, Dawson, Taylor, & Hulme, 2020). In addition, research considering Matthew effects tends to focus on the long-range vocabulary growth that is observed across months or years of development, rather than the short-range vocabulary learning that occurs in carefully controlled experimental paradigms. Future research carefully manipulating the nature of existing vocabulary knowledge, and studying these relationships across the lifespan, would be better equipped to examine these effects (e.g., James et al., 2021; Ricketts et al., 2020).
Lastly, we conducted exploratory analyses correlating a proxy for sleep restriction, calculated from total self-reported sleep time during weeknights compared to weekends, with a graded measure of cued recall accuracy. Relevant to this is the social jetlag hypothesis, which suggests that weekday sleep restriction occurs due to a clash between the timing of social commitments (e.g., lectures) and biological sleep preferences (e.g., circadian rhythms) and therefore sleep is longer on the weekends to compensate (Wittmann, Dinich, Merrow, & Roenneberg, 2006). Sleep restriction has been related to lower performance across multiple cognitive domains, including declarative memory (e.g., Lowe, Safati, & Hall, 2017). Based on such findings we speculated that sleep restriction might predict participants' ability to encode new words. However, we did not find any significant relationships between sleep restriction and test tasks in these analyses. Notably, we did not see a large difference between weekend and weekday sleep in our university student sample and this might have obscured any effects. The majority of our data collection took place during spring and summer 2021 when teaching and exams took place online to mitigate COVID-19 risks. It is well documented that social jetlag dramatically reduced during the pandemic (e.g., Blume, Schmidt, & Cajochen, 2020) thanks to the flexibility of home working, and it is likely that our participants benefitted similarly. Alternatively, the habitual (past week) nature of the sleep diary used may have masked effects of night-to-night sleep variability during the week, which has been associated with poorer academic grades in university students (Phillips et al., 2017). Such night-to-night variability may affect learning and memory in different ways than weekday/weekend sleep restriction. 
Future research may benefit from night-to-night recordings for a week prior to the testing sessions, for example using actigraphy and daily diaries to explore these relationships which are outside the scope of the current study.
In conclusion, our pre-registered analyses did not find a significant main effect of a daytime nap restoring ability to encode novel words. Exploratory analyses using a graded measure of cued recall accuracy revealed that napping may benefit encoding in the immediate test, but that this benefit is not sustained at the next-day test. We found no evidence that napping prior to encoding benefited lexical integration. Despite clear predictions made by the synaptic homeostasis hypothesis (Tononi & Cirelli, 2014), we found no significant evidence that pre-encoding slow-wave activity restored encoding capacity or the ability to integrate new and old information. Yet, we found that a greater number of Stage 2 frontal fast sleep spindles predicted greater overnight improvements in integration of new and old information from the immediate to the delayed test. We conclude that further research is needed to verify whether a daytime nap restores capacity to encode new words. Such research needs to measure episodic memory recall in a graded manner to increase sensitivity and reveal potential effects. The neural mechanisms underpinning cued memory recall were not identified in our current study. However, our spindle data suggest that Stage 2 fast sleep spindles are a likely neural mechanism for restoring the capacity to integrate new words in the mental lexicon. These findings also have potential practical importance. Some researchers have suggested that daytime napping should be incorporated into educational settings such as schools and universities, due to the benefits of napping for learning and memory (Cousins, Wong, Raghunath, Look, & Chee, 2019). 
As

parasheff   parashen    parachute   7   3.71   3   3
pelikiyve   pelikibe    pelican     7   3.08   3   3
pulpen      pulpek      pulpit      6   3.00   2   3
siridge     sirit       siren       6   3.44   3   3
skeletobe   skeletope   skeleton    7   3.78   3   3
specimal    specimav    specimen    8   3.44   3   3
squirrome   squirrope   squirrel    6   3.94   2   3
tycol       tycoff      tycoon      5   3.18   2