Sleep and intelligence: critical review and future directions

General cognitive ability, or intelligence, is a key psychological phenotype. Individual differences in intelligence may either cause or be a consequence of individual differences in the macrostructure of sleep, such as timing or duration. Furthermore, biological measures of sleep, especially highly trait-like sleep EEG oscillations, may provide insights into the biological underpinnings of intelligence. Here we review the current state of research on the association between sleep measures and intelligence. We conclude that the macrostructure of sleep has a small but consistent correlation with intelligence, which is possibly moderated by age. Sleep spindle amplitude and possibly other sleep EEG measures are biomarkers of intelligence. We close by discussing methodological pitfalls of the field and giving recommendations for future directions.


Intelligence
A plethora of tests are used in neuropsychology and in the psychometric testing of healthy individuals to assess an individual's abilities in various domains, such as verbal abilities, spatial skills, memory, general knowledge or arithmetic. While all of these abilities can be considered a type of 'intelligence' [1], in practice they are all correlated, meaning that an individual scoring above average in one can be expected to score above average on all the others as well. The ubiquitous positive correlation between mental tests is called the positive manifold [2,3], and it enables the statistical extraction of a higher-order factor, termed the g-factor, which can be interpreted as a latent, domain-independent general cognitive ability. The g-factor is close to identical (r > 0.9) regardless of which IQ test battery it is extracted from [4,5], and also almost identical if neuropsychological [6,7] or scholastic [8,9] tests are instead compared with IQ tests. IQ test scores, traditionally obtained by summing subtest scores, are not fully identical to the g-factor but are very highly correlated with it [10,11]. Nevertheless, only about 40-60% of IQ subtest variance is accounted for by the g-factor. In plain words, this means that while the full IQ score mainly reflects a domain-independent general cognitive ability, there is also plenty of variation in the spatial, verbal, arithmetic and other abilities of individuals with the same g or total IQ score. The external validity (correlation with real-life outcomes) of IQ tests stems mostly if not exclusively from the g-factor [12,13]. In contrast, traumatic injuries to specific brain areas tend to affect lower-level, non-g abilities, such as verbal, spatial or memory skills [14,15].
In this short paper we review the evidence suggesting that intelligence (either g or, less commonly, specific abilities) is correlated with individual differences in sleep characteristics. In our review, we focus on well-established results from large studies or meta-analyses (Table 1). We do this in response to the ongoing replication crisis in science [16], which is especially prominent in the neurosciences [17][18][19], including psychological sleep studies [20]. Our review deliberately avoids the discussion of small studies or very specific findings (such as the effect of a certain experimental setup or a covariate) with an unknown replication record. We are also cautious with mechanistic interpretations of the findings in the literature, because in our view the causes of many of the reported associations between sleep and intelligence measures are much less well established than the findings themselves. Our message in short is that modest correlations between sleep macrostructure and intelligence are well established, but the study of EEG oscillations in this field, while promising, is still inconclusive, mainly because of methodological limitations including but not limited to the use of small samples, a lack of information about the validity of automatic EEG processing algorithms and misunderstandings about the nature of cognitive abilities.

Intelligence and sleep macrostructure
Does the length, composition or timing of sleep differ between individuals as a function of intelligence? Several large meta-analyses address this question, and they suggest that the answer is 'yes', even though the effects are small.
Astill et al. [21] reviewed 86 studies with a combined N > 35000 addressing the correlation between sleep measures and cognitive performance in school-age children. The meta-analytic correlation between sleep duration and cognitive performance was 0.08, and between sleep efficiency and cognitive performance it was 0.12. Effect sizes were smaller for specific cognitive domains (such as memory and attention tests) and larger for scholastic and IQ tests, suggesting a correlation between sleep measures and g itself. The modality of sleep measurement had moderate effects on the association with cognitive performance. Effect sizes were similar whether naturally occurring or experimentally imposed variations in sleep duration were examined and whether sleep duration was reported subjectively or measured, but they were larger if sleep duration was estimated as actual sleep time rather than just time spent in bed or the time between the first sleep onset and last awakening. Sleep efficiency, however, only correlated with cognitive performance if it was measured by actigraphy, but not by polysomnography. Importantly, two recent large studies reported nonlinear effects in both younger and older samples, with both low and high sleep duration being associated with lower cognitive performance. The first was a study of over 2000 children which reported a weak nonlinear relationship between mother-reported total nocturnal sleep time at 24 months and cognitive test performance at six years of age [22]. The other was a study of over 10 000 adults who filled out the Cambridge Brain Sciences cognitive battery and reported their average sleep duration on a website [23]. Both studies found lower cognitive test scores among those sleeping either less or more than the average.
Both linear and nonlinear relationships between sleep duration and cognitive ability may reflect effects in either direction: that is, there may be a biological tendency of more intelligent individuals to sleep a certain amount or more efficiently, but deviations from optimal sleep amounts may also either reflect social effects or illnesses with a possible effect on cognitive performance. At least one large longitudinal study [22] found a relationship between optimal sleep duration at baseline and better cognitive performance at the follow-up, suggesting causal effects of sleep on cognition. A large meta-analysis of older participants [24] also revealed nonlinear effects (lower cognitive performance at both sleep duration extremes) which persisted in longitudinal designs.
Kanazawa and Perina [25] used the large National Longitudinal Study of Youth (NLSY) longitudinal sample (N > 10000) to match the self-reported circadian preferences of young adults to childhood IQ scores. They found that respondents had consistently later circadian preferences as a function of higher childhood IQ, a pattern which persisted both on weekdays and weekends. Preckel et al. [26] reviewed 11 independent studies investigating the correlation between cognitive ability or academic achievement and chronotype, the latter conceptualized as either eveningness (preference for activity later in the day) or morningness (preference for activity earlier in the day). (All studies in the meta-analysis included a measure of morningness, but some also included a measure of eveningness as a negatively correlated but formally independent construct.) They found a meta-analytic correlation of r = 0.08 between eveningness and cognitive ability and r = −0.14 between eveningness and academic achievement, as well as a meta-analytic correlation of r = −0.04 between morningness and cognitive ability and r = 0.16 between morningness and academic achievement.
Cognition and perception - *Sleep and cognition*
These results were confirmed by another meta-analysis [27] specifically about academic achievement, which found a meta-analytic correlation of 0.14 between morningness and achievement. In other words, a preference for later bedtimes is associated with higher cognitive ability but paradoxically with lower scholastic achievement.

Both the Preckel et al. meta-analysis and Kanazawa and Perina's very large study focused on young adults. Newer reasonably sized studies [28,29,30] with school-age children, however, did not find negative correlations between intelligence and morningness. This suggests that the negative association between morningness and intelligence may only exist in adulthood, and among children, higher performance of all types may be associated with an earlier circadian preference.
Why is morningness associated with lower intelligence but higher academic performance, and why is this association moderated by age? One possibility is that cognitive abilities and circadian preferences share some of their biological underpinnings, but this manifests itself differently as a function of age. Another possibility is that while higher intelligence is associated with a later chronotype, the advantage higher intelligence confers in test performance is offset by sleep deprivation effects if all children are tested at the same early hour in school. The well-documented effects of sleep deprivation on cognitive test performance [31][32][33] are in support of this hypothesis. However, the evidence is inconclusive whether sleep deprivation effects also apply to traditional IQ tests [34][35][36]. Some research is also available on whether IQ test performance is higher when the test is administered at the preferred time of day. One study of 80 young adolescents of varying chronotypes assigned to either morning or evening testing in a between-subject design [37] found slightly better performance at the preferred time of testing on two out of three WISC-III subtests. Another study of 70 young adults of varying chronotypes repeatedly tested in both morning and afternoon sessions on Multidimensional Aptitude Battery subtests [38] found no such effect.
In adulthood, reverse causation may also be present: because higher IQ is associated with higher job prestige [39] and presumably more flexibility in work scheduling, higher IQ individuals may simply have work schedules which do not frequently demand an early bed- and waking time. In a study [40] we compared members (mean age ~38 years) of the German Mensa (IQ > 130) to age-matched and sex-matched controls and only found differences in weekday, but not weekend sleep timing. Later weekday sleep timing was fully accounted for by the later work schedules of Mensa members. We are aware of only one other study of the IQ-chronotype association in working age adults [41], which, however, did not address this question.

Intelligence and sleep EEG measures
The activity of the central nervous system (CNS) in sleep is a more data-rich potential biomarker of cognitive function than sleep macrostructure. The most frequently studied biological measure of sleep-specific CNS functions is the electroencephalogram (EEG). EEG records brain activity with excellent temporal and some spatial resolution. A considerable body of evidence in both human and animal research posits a crucial role of sleep EEG oscillations such as sleep spindles and slow oscillations in cognitive processes including memory consolidation [42][43][44]. However, sleep EEG measures are highly stable within the same individual [45][46][47], leading to the hypothesis that they may also reflect an individual's typical state-independent brain activity, which in turn may be related to similarly stable psychological traits such as intelligence. Only a few EEG studies have associated sleep stage duration with IQ, and the largest study [49] was unable to replicate any of the associations reported in considerably smaller datasets. Without further research it is difficult to ascertain whether a weak association nevertheless exists, or whether stronger associations exist in certain subpopulations.
By far the most prominent sleep EEG element putatively associated with intelligence is the sleep spindle [48], a frequent, prominent NREM sleep oscillation with a slow and a fast subtype (slow spindles typically defined at 10-13 Hz and fast spindles at 13-16 Hz in adults, with some variability in the literature), arising from thalamocortical networks and also implicated in other cognitive domains, most commonly memory [42]. A large number of studies have been conducted in this field with various samples and methodologies. A recent meta-analysis [20] summarized the findings of the available literature consisting of 22 studies with 953 participants. Based on sample size-weighted averages, there was strong evidence for a positive correlation between slow (r ~ 0.1) and fast (r ~ 0.2) spindle amplitude and IQ, but little to no evidence for other spindle parameters, such as density and duration. Publication bias was present, but was not the source of the effects. In the section 'Methodological pitfalls' we discuss how this finding can be reconciled with small individual studies finding much larger effects.
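As a minimal sketch of what a sample size-weighted average correlation means, the snippet below pools correlations from a handful of studies. The (r, n) pairs are invented for illustration and are not data from the meta-analysis [20]; the function name is likewise hypothetical.

```python
def weighted_mean_correlation(studies):
    """Sample size-weighted mean of correlations.

    `studies` is a list of (r, n) pairs: a correlation and the
    sample size of the study that reported it.
    """
    total_n = sum(n for _, n in studies)
    return sum(r * n for r, n in studies) / total_n


# Hypothetical (r, n) pairs, NOT values from the reviewed literature.
example_studies = [(0.45, 20), (0.10, 150), (0.25, 60)]

pooled_r = weighted_mean_correlation(example_studies)
unweighted_r = sum(r for r, _ in example_studies) / len(example_studies)

print(f"weighted mean r:   {pooled_r:.3f}")
print(f"unweighted mean r: {unweighted_r:.3f}")
```

In this toy example the small study with the largest correlation contributes little to the weighted mean, which is how a pooled estimate can be much smaller than the effects reported by individual small studies.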
Another meta-analysis [49] about sleep spindles and cognitive test scores is also available, but since it concerns effects in adolescence without within-study correction for the effects of age, it does not separate the developmental and constitutional sources of the correlation between the two variables.
Interestingly, while the sleep EEG records many other prominent sleep oscillations which are just as trait-like as spindles [50][51][52][53][54], none of them has garnered comparable attention. One study [55] in a small sample of 14 children found correlations between IQ and theta, alpha, sigma and beta activity. We recently replicated some of these findings in a sample of 151 adults [56] and found correlations between IQ and power in many NREM and REM frequency ranges outside sleep spindles. While preliminary, these findings suggest that sleep spindles are far from being the only putative sleep EEG indices of intelligence.

Methodological pitfalls
We maintain that sleep EEG is a rich potential area to look for neural correlates of intelligence, but the present literature needs to be critiqued, extended and overhauled. We identify three main problems with the existing literature, which should be targeted by future researchers: 1) the use of small samples and the uncritical handling of the results from these; 2) the use of unvalidated analytical algorithms; and 3) the lack of a targeted investigation of general or specific cognitive abilities.

Small studies, statistical issues and replication
Most sleep EEG studies are very small. A meta-analysis [20] found the median sample size of studies investigating the relationship between spindle parameters and intelligence to be N = 24. If we use a small sample to estimate a correlation r which exists in the general population, the expected value of the correlation in the sample is r, but it is also subject to sampling error, which has a mean of zero and a variance equal to the squared standard error, and which leads to random deviations in the actual measured correlation. Researchers can report both false positive and false negative findings due to sampling error. There are very many possible correlations to calculate between sleep and cognitive variables, a phenomenon known as researcher degrees of freedom [57]. For instance, researchers can choose verbal IQ, performance IQ or full IQ and a practically infinite number of sleep EEG measures from many electrodes, or split the sample by age or sex. Sampling error guarantees that this will eventually result in publishable positive findings even if no actual relationship exists. These may be cited many times due to their interesting nature, but will not replicate.
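To make the consequence of researcher degrees of freedom concrete, the following sketch estimates how often a study of N = 24 finds at least one 'significant' correlation when IQ is correlated with many sleep measures that are pure noise. The number of measures (20), the number of simulated studies and the use of the Fisher z approximation for p-values are all assumptions made for illustration, not part of the reviewed studies.

```python
import math
import random
from statistics import NormalDist


def familywise_false_positive_rate(n_tests, alpha=0.05):
    """Chance of at least one p < alpha among independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def pearson_r(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_null_studies(n_studies=1000, n_subjects=24, n_measures=20,
                          alpha=0.05, seed=1):
    """Share of simulated studies reporting at least one 'significant'
    correlation although all sleep measures are unrelated to IQ."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        iq = [rng.gauss(0, 1) for _ in range(n_subjects)]
        for _ in range(n_measures):
            sleep = [rng.gauss(0, 1) for _ in range(n_subjects)]
            if fisher_p_value(pearson_r(iq, sleep), n_subjects) < alpha:
                hits += 1
                break  # one spurious hit is enough for a "finding"
    return hits / n_studies


analytic_rate = familywise_false_positive_rate(20)
simulated_rate = simulate_null_studies()
print(f"analytic:  {analytic_rate:.3f}")
print(f"simulated: {simulated_rate:.3f}")
```

Under these assumptions, well over half of such studies would contain at least one nominally significant correlation by chance alone.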
The opposite is also true: with a small sample, a researcher can falsely conclude that there is no effect while there is one. It is not guaranteed that if there is a real effect, any study will find it at p < 0.05. Statistical power reflects the chance that a study will return a statistically significant result when it should because the null hypothesis is false. For example, if there is a true correlation of 0.2 between a sleep measure and IQ and we conduct a thousand studies of N = 24 each to find it, only 170 of these studies will correctly conclude that there is a significant correlation at p < 0.05. In other words, a sample of this size has only 17% power to find an effect of r = 0.2. Figure 1 illustrates the importance of sampling error as a source of misleading findings.
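The power figures in this paragraph can be approximated with the Fisher z transformation. The sketch below is an approximation under a bivariate normal assumption, not the calculation used in the text; the function names are hypothetical. For r = 0.2 and N = 24 it yields a power of roughly 15%, in the same range as the 17% quoted above (exact methods give slightly different values).

```python
import math
from statistics import NormalDist


def correlation_power(rho, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation `rho`
    with sample size `n`, using the Fisher z approximation."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = math.atanh(rho) * math.sqrt(n - 3)
    # Probability of landing in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


def required_sample_size(rho, power=0.80, alpha=0.05):
    """Approximate N needed to reach the requested power."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    z_power = norm.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / math.atanh(rho)) ** 2 + 3)


power_n24 = correlation_power(0.2, 24)
n_for_80 = required_sample_size(0.2)
print(f"power at N = 24: {power_n24:.2f}")
print(f"N needed for 80% power: {n_for_80}")
```

The second function illustrates why samples well above N = 100 are needed for effects of this size: under the same approximation, 80% power for r = 0.2 requires close to 200 participants.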
To avoid both false positive and false negative findings and their proliferation in the citing literature as real effects, we recommend the following practices. The expected effect sizes between sleep and intelligence variables are small. Because of power issues, authors should not conduct detailed correlational studies in small (N << 100) samples. The pooling of data across labs and the reporting of negative findings or alternative statistical models should be encouraged. Researchers should never describe a finding only as 'significant' or 'non-significant'. The observed effect size (e.g. r = 0.02) should always be reported, including for non-significant findings. Researchers should cite the literature critically. Smaller studies are to be given less credence. Whenever possible, meta-analyses should be conducted to assess true effect sizes and possible publication biases or moderator effects.

Algorithm validity
In all sciences, we intend to measure abstract constructs, but we can usually only directly measure some operationalization of them. For instance, we do not usually measure 'temperature', but the length of a mercury rod in a glass vial, although this happens to be a virtually perfect operationalization of the former. 'Sleep spindle density', 'sleep duration' or 'intelligence' are also only constructs, and the quality of our research depends critically on the quality of the operationalizations we use to measure them, for example, the number of spindles detected by a certain algorithm per minute, actigraphic recordings of sleep duration or the result of a certain IQ test. These operationalizations must be reliable and valid. In short, at the minimum we expect that: 1) two operationalizations intended to measure the same construct should yield similar, ideally identical results (convergent validity) and 2) operationalizations should correlate strongly with a more direct, 'reified' definition of the construct they intend to measure (construct validity).

Sleep and intelligence. Ujma, Bódizs and Dresler

Figure 1. A demonstration of how sampling error can lead to spurious findings. We created two standardized, normally distributed variables with N = 1000 and r = 0.2. 10 000 random samples were drawn from this population, and the correlation was measured in each sample. Panel (a) illustrates the correlation of Variable 1 and Variable 2 in the total population overlain with Sample 1 (r = −0.32, p = 0.11) and Sample 2 (r = 0.59, p = 0.006), both manually chosen from the 10 000 random samples. Even though the correlation in these samples is only different from the population correlation of 0.2 due to sampling error, to a naïve researcher the results from Sample 1 may suggest a negative correlation ...
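The sampling demonstration described in the figure caption above can be sketched as follows. The caption does not state the size of each random sample, so the value used here (24, the median sample size mentioned earlier) is an assumption, as are the random seed and the variable names; the simulated population correlation is only approximately 0.2.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


TRUE_R = 0.2
POPULATION_SIZE = 1000
N_SAMPLES = 10_000
SAMPLE_SIZE = 24  # assumption: not given in the caption

rng = random.Random(2024)

# Population of two normally distributed variables correlated at about 0.2.
var1 = [rng.gauss(0, 1) for _ in range(POPULATION_SIZE)]
var2 = [TRUE_R * x + math.sqrt(1 - TRUE_R ** 2) * rng.gauss(0, 1) for x in var1]
population_r = pearson_r(var1, var2)

# Draw many small samples and record the correlation in each one.
sample_rs = []
for _ in range(N_SAMPLES):
    idx = rng.sample(range(POPULATION_SIZE), SAMPLE_SIZE)
    sample_rs.append(pearson_r([var1[i] for i in idx], [var2[i] for i in idx]))

share_negative = sum(r < 0 for r in sample_rs) / N_SAMPLES
print(f"population r: {population_r:.2f}")
print(f"sample r range: {min(sample_rs):.2f} to {max(sample_rs):.2f}")
print(f"share of samples with a negative r: {share_negative:.2f}")
```

With samples this small, a noticeable share of the sample correlations has the wrong sign, and the extremes in either direction can look like strong effects.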
It is by no means automatically true that the biological measure or mathematical procedure (e.g. bipolar-referenced EEG or a certain sleep spindle detector) we choose to operationalize a construct (e.g. sleep spindle density) is valid. For example, when designing a sleep spindle detector researchers make choices about what frequency and amplitude the signal considered to be a 'spindle' must have, and these choices may result in very different 'spindle' detections, not all of which are guaranteed to be correct. Actually, we found that the convergent validity of the Individual Adjustment Method (IAM) spindle detection algorithm and another one based on individual amplitude means and SDs was very low, except for fast spindle amplitude [58]. We found similarly low convergent validity between IAM and the Ferrarelli algorithm [59], again except for fast spindle amplitude (Anu-Katriina Pesonen, personal communication). A general tendency of sleep spindle detectors is that human raters agree with each other more than with automatic detectors [60,61], and there is poor agreement between different automatic detectors as to which epochs they identify as spindles [62].
To our knowledge, the only test of the construct validity of sleep spindle detectors was performed by Bódizs et al. [63] who found that sigma-frequency spectral peaks disappeared once the 'spindles' marked as such by the IAM algorithm were removed from the EEG signal, but this did not happen when only visual detections were considered. This suggests that to the extent that sigma-frequency spectral peaks represent the true phenomena we want to define as spindles, IAM is more construct valid than visual detections. This is particularly problematic if visual detections are considered a gold standard and automatic detectors are designed to replicate them [60,61,62,64]. Similar research is urgently needed, and algorithms for the detection of spindles or other phenomena should not be accepted as valid until at least convergent validity has been tested.

General and specific cognitive abilities
It is well known to psychometricians that performance in all psychometric tests is positively correlated in humans [2,11,65,66,67] as well as in other animals [68][69][70][71]. In practice, this means that performance in, for example, a short-term memory test depends not just on the specific ability to perform on that test, but also on more general abilities. Unfortunately, in neuropsychology and the cognitive sciences, it is sometimes assumed that the 'premise of a general cognitive ability flies in the face of 50 years of neuroscience research' [69], and that cognitive tests measure only whatever is written on their label. It is a legitimate question whether specific abilities or the g-factor correlate with a biological measure. For instance, both the g-factor and g-residualized mental speed are affected by ageing [72]. What should be avoided, however, is to extract, for example, performance and verbal IQs or spatial and verbal ability scores from IQ or other cognitive tests, and then claim that, for instance, because the former significantly correlates at p < 0.05 with a sleep measure and the latter does not, this sleep measure is an indicator of performance but not verbal IQ. These variables reflect strongly overlapping constructs, and if one correlates with a third variable but the other does not, this is likely due to sampling error and is by no means evidence for a correlation with the g-residualized specific ability. The proper way to perform such an analysis is to separate their shared and unique variances, either by using structural equation models [12,72] or, more simply, by using the factor scores (g-factor) and the residuals (g-independent verbal, spatial, memory etc. abilities) of a principal component analysis of all cognitive tasks we have data from.
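The simpler of the two approaches, taking the first principal component as an estimate of g and the residuals as g-independent specific abilities, could look like the sketch below. It runs on simulated test scores; the number of tests, the loading of 0.7, the seed and all function names are assumptions for illustration, and the first principal component is only a rough stand-in for a properly modelled g-factor.

```python
import math
import random


def standardize(values):
    """Z-scores (mean 0, SD 1) of a list of scores."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation of two equally long lists."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def first_component_weights(z_tests, iterations=200):
    """Leading eigenvector of the test correlation matrix (power iteration)."""
    k = len(z_tests)
    corr = [[pearson_r(z_tests[i], z_tests[j]) for j in range(k)] for i in range(k)]
    weights = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        product = [sum(corr[i][j] * weights[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(p * p for p in product))
        weights = [p / norm for p in product]
    return weights


def split_g_and_specific(tests):
    """Return g scores (first principal component) and, for each test,
    the residual scores left after regressing the test on g."""
    z_tests = [standardize(t) for t in tests]
    weights = first_component_weights(z_tests)
    n = len(z_tests[0])
    g = standardize([sum(w * t[i] for w, t in zip(weights, z_tests)) for i in range(n)])
    residuals = []
    for t in z_tests:
        slope = sum(a * b for a, b in zip(g, t)) / sum(a * a for a in g)
        residuals.append([b - slope * a for a, b in zip(g, t)])
    return g, residuals


# Simulated battery: four tests, each loading 0.7 on a latent general ability.
rng = random.Random(7)
latent_g = [rng.gauss(0, 1) for _ in range(300)]
tests = [[0.7 * g_i + math.sqrt(1 - 0.49) * rng.gauss(0, 1) for g_i in latent_g]
         for _ in range(4)]

g_scores, specific_scores = split_g_and_specific(tests)
print(f"r(estimated g, latent g) = {pearson_r(g_scores, latent_g):.2f}")
```

A sleep measure can then be correlated separately with `g_scores` and with each list in `specific_scores`, which are uncorrelated with the estimated g by construction.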
There is very little research in this area, but at least one study [73] (using reasoning, short-term memory and verbal test scores residualized for the other scores) explored the correlations between sleep spindle parameters and approximately g-independent cognitive abilities.

Future directions
The correlation between the macrostructure of sleep and intelligence is well established, but modest. EEG studies are promising, but require further work. We propose three possible areas of further exploration.
First, while sleep spindles are the most commonly investigated sleep EEG biomarkers of intelligence, the available evidence suggests that the validity of sleep spindle detectors may be low, except for fast spindle amplitude. As a result, amplitude is the only spindle biomarker reliably associated with intelligence [20], but this may be because of the lack of validity for other biomarkers. Future studies should explore the validity of spindle detection algorithms, and ultimately a consensus must be reached about the right and wrong ways to detect spindles.
Second, it is not sufficiently explored whether other trait-like sleep EEG measures, such as the amplitude, coupling or rhythmicity of non-spindle oscillations, are associated with intelligence. The simultaneous exploration of multiple oscillations and oscillation characteristics would be especially useful because it would permit the construction of multivariate predictive models. Our group [74] and another research team [75] recently used learning algorithms to predict chronological age from a large set of sleep EEG measures, and could do so with high accuracy (R² up to 80%). While it is unlikely that intelligence can be predicted from the sleep EEG with comparable accuracy, it could potentially account for more variance than the few current morphometric predictive models [76][77][78] which account for 4-8% of cognitive performance variance (excluding total brain size), or the better established genetic predictors [79] which currently top out at just over 10%. If the learning algorithm allows the identification of the specific independent variables which contribute to model performance then mechanistic interpretations are also possible. For example, we might notice that oscillation propagation speed is generally associated with intelligence more than amplitude, an observation which would have implications for the neural underpinnings of intelligence, which are currently not sufficiently known [80].
Third, in the case of both spindles and other EEG phenomena, it is unknown whether they are associated with general or specific cognitive abilities. Because of the correlation between g and specific abilities, we would expect a positive correlation with full IQ scores even if these oscillations were only correlated with, for example, verbal or memory abilities. This question can be resolved by using cognitive test batteries instead of single tests, together with structural equation models.
Future research along these lines may facilitate the identification of the sleep-related biomarkers of intelligence.