Multivariate genome-wide covariance analyses of literacy, language and working memory skills reveal distinct etiologies

Several abilities outside literacy proper are associated with reading and spelling, both phenotypically and genetically, though our knowledge of multivariate genomic covariance structures is incomplete. Here, we introduce structural models describing genetic and residual influences between traits to study multivariate links across measures of literacy, phonological awareness, oral language, and phonological working memory (PWM) in unrelated UK youth (8–13 years, N = 6453). We find that all phenotypes share a large proportion of underlying genetic variation, although especially oral language and PWM reveal substantial differences in their genetic variance composition with substantial trait-specific genetic influences. Multivariate genetic and residual trait covariance showed concordant patterns, except for marked differences between oral language and literacy/phonological awareness, where strong genetic links contrasted near-zero residual overlap. These findings suggest differences in etiological mechanisms, acting beyond a pleiotropic set of genetic variants, and implicate variation in trait modifiability even among phenotypes that have high genetic correlations.


INTRODUCTION
Within most Indo-European languages, including English, an alphabetic writing system maps sequences of symbols to the sounds and meaning of words 1 . Thus, symbols or graphemes (letters or groups of letters) represent individual sounds (phonemes). This correspondence of spoken language to printed words, known as the alphabetic principle, is, in turn, the basis of phonological decoding skills, which enable the interpretation of texts through phonetic transformation 2 . As a consequence, close and reciprocal interrelationships manifest between phonological decoding, letter knowledge, and phonological awareness, the ability to dissect and transform words according to their phonological structure 3 . As links emerge and can be used in real-time to identify printed words, children apply this knowledge to develop reading and spelling skills 1 . Fully developed reading comprehension skills also include the ability to identify and integrate word meanings along with contextual and world knowledge to construct the meanings of sentences and larger discourse structures 4 . Once mastered, reading has a profound impact on the acquisition of knowledge, including print exposure 5 , and, ultimately, on final educational attainment 6 .
Based on this framework, the "Simple View of Reading" 4,7 proposed that reading comprehension is the product of two independent skill sets, "decoding" and "language comprehension", where the latter specifically refers to listening comprehension 7 , i.e. the ability to interpret the meaning of words in grammatical structures (sentences) presented orally 7 . In younger children, reading comprehension is thought to be constrained primarily by decoding abilities, whereas in older children, with a high level of decoding ability, reading comprehension is mainly a function of language comprehension 8 . Other theoretical approaches, such as the Reading Systems Framework 9 implicate additional general cognitive resources in reading processes, such as phonological working memory (PWM) 10 , which is typically assessed by non-word repetition tasks 11 . This task requires a purely sound-based division of non-words along with retention and later reproduction of the sound sequence 10 . Like phonological awareness, PWM contributes substantially to decoding abilities. In addition, children with language and reading problems have lower non-word repetition accuracy that may affect receptive vocabulary development 12 .
This wide-spread evidence of genetic pleiotropy is consistent with the existence of "generalist genes" that contribute to shared aspects of cognitive functioning, such as those involved in literacy, language, working memory and related skills, affecting in particular learning abilities 27 . As hypothesized by a recent "multiple-deficit theory" model, "generalist genes" may play a role in increasing liability to developmental disorders, including reading problems in dyslexia 28,29 . This is supported by observational research demonstrating that cumulative risk is the best predictor for poor language and reading 30 . However, we lack a comprehensive map of genetic covariance structures, which include both these broad connections as well as unique genetic relationships.
For example, strong genetic links have been identified between oral language and reading comprehension 14,16 , while genetic overlap between oral language and word reading fluency has been only moderate [14][15][16] . Genetic differences may, thus, partially distinguish meaning-based (i.e. comprehension-related) versus code-based (i.e. decoding-related) abilities [14][15][16] , consistent with the two etiologically distinct core factors proposed by the "Simple View of Reading", potentially shaping detectable overarching multivariate genetic covariance patterns across domains. Even within a domain, such as literacy, reading fluency measures may subtly vary in the level of comprehension-and decoding-related aspects required and, consequently, differ in trait interrelationships. However, investigations of multiple measures of reading fluency, allowing inferences beyond measurement-specific limitations, are still outstanding, and shared genetic links with PWM capturing underlying cognitive resources during mid-childhood and early adolescence have not yet been characterized.
Importantly, the current understanding of multivariate genetic covariance across these phenotypic domains has been largely based on twin analyses 14,[17][18][19][20][21][22][23] . While twin studies offer many advantages, notably the ability to disentangle genetic from shared environmental and nonshared environmental factors 31 , they have also been criticized for their reliance on several assumptions. This includes, for example, the equal environment assumption for mono-and dizygotic twins, in particular the equal treatment of twins, the correct classification of zygosity and the representativeness of twin studies for the general population 31 . Investigations of multivariate genetic structures with independent approaches, utilizing general population-based samples, have, so far, been missing.
This study aims to evaluate the genetic relation of reading skills to abilities that are outside of reading proper but are nonetheless literacy-related. Reading is a complex trait that builds upon numerous, interrelated skills that each have a myriad genetic and environmental causes. As people learn to read they must integrate these disparate, complex abilities into a fluent, nearly automatic skill. Understanding how these interrelated skills culminate in an individual's ability to read provides a framework for modifying the way we teach students to read and ultimately allow us to enhance the effectiveness of the learning process. Firstly, we model the multivariate genetic covariance across four phenotypic domains: literacy (assessed by reading fluency and spelling measures), phonological awareness (phonemic awareness), oral language (listening comprehension), and PWM (nonword repetition).
Secondly, by modeling genetic data from unrelated individuals, we use independent methods to confirm and augment knowledge of multivariate genetic architectures identified in twin studies. Specifically, we bypass twin research-related criticisms by investigating multivariate genetic trait covariance in unrelated individuals with genome-wide single nucleotide polymorphism (SNP) information 32 , studying youth from The Avon Longitudinal Study of Parents and Children (ALSPAC) birth cohort 33,34 . Here, we fit structural models to genetic relationship-matrices (GRMs), derived from genome-wide markers, using structural equation modeling (Genetic-relationship-matrix structural equation modeling, GSEM) 32 . Specifically, GSEM adapts multivariate twin models 35 to make them applicable to GRM-based analyses, as originally proposed for genomic data using uni-and bivariate genomic-relatedness-based restricted maximum-likelihood (GREML) 36,37 . Like GREML 36 , GSEM dissects phenotypic variation into additive genetic variance (A) and residual variance (E). However, beyond estimating SNP-based heritability estimates (SNP-h 2 ) and genetic correlations, GSEM models aim to identify underlying genetic factor structures. In addition to fitting independent pathway and Cholesky decomposition model, we also implement a novel combined Independent Pathway/Cholesky (IPC) model, enhancing the interpretability of genetic factor structures without a loss of model fit in residual factor structures. This approach enables us ultimately to compare genetic and residual covariance between traits to illuminate similarities and differences in etiological relationships between literacy (reading fluency, spelling), phonological awareness, oral language, and PWM.

RESULTS
Univariate and bivariate genetic variance analyses Variability in reading fluency (non-word-reading speed and accuracy, word reading speed and accuracy, passage reading speed and accuracy), spelling, phonemic awareness, listening comprehension, and non-word repetition during mid-childhood and adolescence (Table 1) was moderately heritable, confirming previous reports 23,38 . Using Genome-wide Complex Trait Analysis (GCTA) software, GREML-based SNP-based heritability (GCTA SNPh 2 ) estimates for reading fluency, spelling accuracy, phonemic awareness, listening comprehension, and non-word repetition ranged between 30% (SE = 0.06) and 50% (SE = 0.07) when assessed in unrelated children and adolescents from the general population, irrespective of phenotypic transformation (Table 1 and  Supplementary Table 1). All scores were phenotypically (Supplementary Table 2) and genetically (Supplementary Tables 3 and 4 Modeling strategy for multivariate analyses Using a series of structural equation models, we describe the multivariate polygenic covariance of reading fluency, spelling, phonemic awareness, listening comprehension and non-word repetition abilities. Due to the computational burden of multivariate genetic variance analyses, we were not able to fit one large combined model of all 11 measures in this study. Instead, we adopted a two-stage approach: First, we fit multiple smaller-scale multivariate models focussed on literacy measures only with the aim of identifying proxy measures that sufficiently capture the observed genetic structures of reading fluency (6 measures) and spelling (2 measures), which were assessed with multiple psychological instruments (Stage 1, Supplementary Table 5). Second, we fitted multivariate genetic models across traits including the proxy measures of reading fluency and spelling (identified from Stage 1) as well as phonemic awareness, listening comprehension and non-word repetition (Stage 2, Supplementary Table 5).
For each fitted multivariate model during either stage (except for spelling), three GSEM submodels were examined: (i) a saturated model (Cholesky decomposition), (ii) an independent pathway model, and (iii) a hybrid IPC model, involving an independent pathway model for the genetic covariance structure and a Cholesky decomposition model for the residual covariance structure. The model fit was compared using Akaike information criterion (AIC) and Bayesian information criterion (BIC) fit indices and likelihood ratio tests (LRT) 39 . The two spelling measures (Stage 1) were fitted with a Cholesky decomposition model.

Single-domain structural models of reading fluency
We started the process of proxy measure identification (Stage 1, Supplementary Table 5) by structurally modeling the six reading fluency measures with the aim to identify the proxy measure of reading fluency: non-word reading accuracy and speed, word reading speed and accuracy, and passage reading accuracy and speed, all ascertained in ALSPAC participants between the ages of 9-13 years ( Table 1). Following BIC (the more stringent criterion), the most parsimonious model describing the data was the IPC model ( Fig. 1, Table 2, and Supplementary Tables 6-9). The model identified evidence for a shared genetic factor (A) across the six reading fluency measures ( Fig. 1), in addition to specific genetic factors for some of the measures.
For the shared genetic factor (Fig. 1a, b), the strongest factor loading was estimated for passage reading accuracy, explaining 47% (SE = 0.07) of the phenotypic variation (Supplementary Table  6) and 93% (SE = 0.04) of the SNP-h 2 (Supplementary Table 7). The weakest factor loading was found for word reading speed (Fig. 1a, b), explaining only 27% (SE = 0.07) of the phenotypic variation (Supplementary Table 6), but 82% (SE = 0.09) of the SNP-h 2 (Supplementary Table 7). Thus, the shared genetic factor captured the vast majority of genetic variance across all reading fluency measures, and its factor structure is reflected by near-perfect genetic correlations (r g ) between the different reading fluency measures (Supplementary Table 8 and Fig. 1c). For example, passage reading accuracy had r g of 0.97 (SE = 0.02) with non-word reading accuracy, 0.92 (SE = 0.04) with non-word reading speed, 0.92 (SE = 0.03) with word reading accuracy, 0.88 (SE = 0.05) with word reading speed, and 0.93 (SE = 0.04) with passage reading speed.
There was further evidence for measurement-specific factors describing additional genetic variation underlying non-word reading speed (passing the nominal p-value threshold only), word reading speed and accuracy, and passage reading accuracy (Fig. 1a, b), although the explained phenotypic variation was small. Word reading speed (age 13) had the strongest measurement-specific factor loading explaining 6% (SE = 0.03) of phenotypic variance (Supplementary Table 6), corresponding to 18% (SE = 0.09) of the SNP-h 2 (Supplementary Table 7).
To ensure that the conclusion from Stage 2 analysis would be robust across the choice of the reading fluency measure, we selected two proxy measures based on the structural model for reading fluency in Stage 1: (i) passage reading accuracy (age 9), because it showed the highest common factor loading for reading, and (ii) word reading speed (age 13), because it showed the highest measurement-specific genetic factor loading.
Single-domain structural models of spelling As a second proxy identification analysis in Stage 1, we fitted a saturated Cholesky model to spelling accuracy scores (Supplementary Table 5) at 7 and 9 years of age (Supplementary Fig. 2 and Supplementary Tables [10][11][12]. This model showed that genetic factors underlying spelling accuracy at age 7 years can capture nearly all (94% with SE = 0.09) of the genetic variation for spelling accuracy at age 9 years (Supplementary Table 11), with r g of 0.97 (SE = 0.05, Supplementary Table 12). Consequently, the former measure was selected as a proxy for Stage 2 analysis.

Multi-domain structural models
In Stage 2 analyses, we investigated the multivariate genetic covariance of literacy skills (reading fluency and spelling), phonological awareness (phonemic awareness), oral language (listening comprehension), and PWM (non-word repetition). Each of the proxy measures for reading fluency was independently studied. The multi-domain model with passage reading accuracy (age 9) as proxy for reading fluency was referred to as passage reading subset (Fig. 2), while the multi-domain model with word reading speed (age 13) as proxy was referred to as word reading subset (Fig. 3). For each analysis (Tables 3 and 4), we compared the fit of a saturated Cholesky decomposition to the fit of an independent pathway and IPC model.
For the passage reading analysis, the IPC model fitted the data best, based on AIC, BIC, and LRTs (Table 3 and Supplementary  Tables 13-16). There was evidence for a single shared genetic    The path diagram (a) depicts the genetic factors of the best-fitting model (IPC model) describing variation in non-word reading accuracy at age 9 (NW reading a 9, NBO), non-word reading speed at age 13 (NW reading s 13, TOWRE), word reading accuracy at age 9 (W reading a 9, NBO), word reading speed at age 13 (W reading s 13, TOWRE), passage reading accuracy at age 9 (P reading a 9, NARA), and passage reading speed at age 9 (P reading s 9, NARA). The phenotypic variance was dissected into common (AC) and specific (AS1-AS6) genetic factors, according to an Independent Pathway model, as well as residual factors, based on a Cholesky decomposition (E1-E6; Factor loadings are not shown, Supplementary Table 6, and Supplementary Fig. 3). Observed measures are represented by squares and factors by circles. Single-headed arrows (paths) define relationships between variables. Dotted and solid paths represent factor loadings with p > 0.05 and p ≤ 0.05 respectively. The variance of latent variables is constrained to unit variance; this is omitted from the diagram to improve clarity. The variance plot (b) depicts the standardized genetic variance components for the model in a. The correlogram (c) shows genetic (r g , lower triangle) and residual (r e , upper triangle) correlations for the model in a. Point estimates and their SEs are reported in Supplementary Table 9.
Lastly, we compared multivariate genetic and residual trait correlation patterns as fitted by IPC models. Among reading fluency measures (Fig. 1c), and extending to spelling accuracy and   Tables 16 and 20. Especially, the residual correlations between listening comprehension versus passage reading, spelling accuracy, and phonemic awareness, were low in comparison to genetic correlations (Fig. 2c) Table 16). Likewise, most of the phenotypic covariance was accounted for by genetic covariance, with bivariate SNP-h 2 estimates consistent with one, based on derived 95% CIs (Supplementary Table 15).

DISCUSSION
By modeling multivariate genetic variance within a populationrepresentative sample of unrelated youth using genome-wide markers, we demonstrated that literacy, phonological awareness, language and PWM abilities share a large proportion of their underlying genetic variation, but also revealed substantial differences in their genetic variance composition. These findings imply a pleiotropic set of common genetic variants that contributes broadly to performance in these domains and can be augmented by trait-specific genetic variation. Together with evidence for discordant genetic and residual covariance patterns, especially between genetically highly correlated oral language and literacy abilities, our findings suggest differences in underlying etiological mechanisms.
In line with long-established findings from multivariate twin research 14,18,23 , we identified an overarching genetic (core) factor that is not only shared across literacy skills, including non-word, word, passage reading abilities and spelling, but also with other abilities such as phonemic awareness, listening comprehension, and non-word repetition that are foundational for the acquisition and use of literacy skills. Utilizing passage reading accuracy (age 9) as a reading fluency proxy, the overarching genetic factor accounted almost fully (>90%) for the genetic variance in reading fluency, spelling and phonemic awareness and, to a lesser extent, for the genetic variance in listening comprehension (44%) and non-word repetition (53%). The identified shared genetic factor suggests wide-spread pleiotropy, in support of the concept of "generalist genes" 27 that, here, may capture multiple shared aspects of cognitive functioning related to decoding abilities. The shared genetic covariance structures also include genetic overlap between listening comprehension and reading fluency, and extend findings from previous twin research 15,40 that detect primarily moderate genetic correlations between oral language and reading fluency of 0.47-0.58 (across ages 7, 12, and 16 years 14 ). The connection between oral language and reading fluency may arise due to several processes: Not all words can be acquired through phonological skills, as predicted by the "Simple View of Reading" 4,7 ; words with inconsistent orthographicphonological mappings need to be memorized or used in conjunction with contextual cues 15 . Furthermore, once decoding skills have been established and well-practiced on reading words, the orthographic representations of those words become integrated "lexical representations" directly connected to phonological and semantic knowledge 15,41 . Likewise, phonological awareness is no longer simply a predictor of reading ability, but instead becomes reciprocally shaped by reading experience 1,42 . These processes potentially strengthen the covariance between language, literacy, and phonological awareness with reading experience, as children accumulate increasingly detailed orthographic knowledge 1 .
Multivariate genetic variance models also demonstrated that non-word repetition, here considered a proxy of PWM, shares genetic variation with listening comprehension, literacy abilities, and phonemic awareness. These findings extend twin research reporting moderate genetic correlations between early-childhood non-word repetition and a mid-childhood reading composite measure (r g = 0.44) 43 and emphasize the importance of general cognitive resources across domains during mid-childhood. Based on the core genetic factor contribution to phenotypic variance (factorial co-heritabilities), our structural models suggest that shared etiological processes contribute to nearly half of the genetic variance for both, PWM and language. Our findings support also recent investigations of working memory, based on genome-wide genotyping information from unrelated individuals, using summary statistics 44 . This research suggested that both verbal numerical reasoning and memory-related tasks are related to an underlying common genetic factor 44 , which has long been hypothesized by observational studies 45 .
The genetic core factor across all domains studied here could be robustly identified using two diverse reading fluency proxies ascertained at different developmental stages, spanning midchildhood to early adolescence. Longitudinal phenotypic research has demonstrated that word decoding skills are age-dependent, shaping the transition of "learning to read" to "reading to learn" 24,46 . In contrast, etiological evidence from twin studies has confirmed a high level of genetic stability for word-level decoding 14 , with age-to-age (7-12-16 years) genetic correlations of 0.72-0.84. Similarly, our findings confirm a high level of developmental stability in the genetic structure of reading fluency using very different methods.
We also observed evidence of at least three instances where genetic influences reflect specific genetic variation. First, there was a trait-specific genetic variance contribution to oral language abilities (here listening comprehension), consistent with the "Simple View of Reading" 4,7 distinguishing language comprehension from decoding abilities. This provides converging evidence with recent twin models suggesting a strong patterning of oral language with reading comprehension, compared to a distinct and only moderate overlap between oral language with reading fluency 14 . Furthermore, word recognition (decoding) and listening comprehension have been shown to exert independent genetic influences on reading comprehension 16 , e.g. children with high reading accuracy and fluency are not necessarily efficient in reading comprehension 47 .
Second, we identified a trait-specific genetic variance contribution underlying non-word repetition. PWM is thought to consist of two components; (i) a short-term phonological store, as the nonword has to remain in memory long enough to identify the individual phonemes and their sequence, and (ii) an articulatory rehearsal process, as the non-word has to be rehearsed sufficiently rapidly and repeatedly to prevent it from decaying over time 10 . Thus, while these processes are strongly interlinked with language and literacy, supported by findings of this study and others 44 , we also find evidence for an independent genetic contribution.
Third, the structural model across all domains identified a traitspecific genetic factor for reading fluency, as proxied by word reading speed (Fig. 3), but not passage reading accuracy (Fig. 2). As genetic variance in word reading speed, too, could only partially be explained by the common genetic factor shared with other reading fluency measures (82%: Fig. 1 and Supplementary Table 6, Stage 1 analysis), these genetic contributions are likely to represent measurement-specific influences. To the best of our knowledge, there are currently no studies that have directly examined the genetic links between passage and word reading fluency. The differences in genetic structures may reflect the fact that reading accuracy and speed measures, based on grammatically and semantically coherent passages, are likely to be more influenced by comprehension skills compared to those based on single-word reading 1,48 .
Beyond genetic correlations, literacy and phonological awareness measures were, without exception, also residually intercorrelated. Residual trait covariance reflects the phenotypic trait covariance that is not captured by GRMs derived from unrelated individuals. As weak to strong residual correlations were observed across different psychological instruments and raters across a window of~6 years, correlated assessment-specific error is an unlikely explanation. Residual correlations may also reflect the influence of rare genetic variation although this, too, is an unlikely scenario, given the low power of population-based cohorts to detect rare genetic effects 49 . Instead, residual correlations are likely to reflect here shared environmental factors such as neighborhood, but also the English schooling system. As such, our findings suggest transferability of acquired skills across literacy and phonological awareness domains. This conclusion is strongly supported by twin study findings of bidirectional effects between reading fluency and reading comprehension and overlapping genetic and environmental factors 50 . In contrast, residual correlations between oral language versus literacy or phonological awareness abilities were near zero, despite moderate genetic relationships. In conjunction with the identified trait-specific genetic variance contributions for listening comprehension, these findings provide converging evidence for differences in underlying etiological mechanisms between oral language versus literacy/phonological awareness skills. Moreover, they implicate reduced modifiability of listening comprehension by processes that shape literacy skills during mid-childhood to early adolescence. Although the nature of these etiological mechanisms is not yet known, our findings highlight the importance of studying the scope of schooling and intervention programs targeting reading and language.
This study benefits from modeling multivariate genetic trait covariance using directly assessed genomic data, and likelihoodratio-test-based model comparison against a saturated model. GSEM 32 , thus, extends GRM-based methods, such as GREML 36 , by incorporating models that are popular in multivariate twin studies. The GSEM approach takes advantage of (i) accurate estimations of genetic relatedness between unrelated individuals using genomic data, (ii) similar recruitment conditions for all participants, including comparable environmental conditions (e.g. UK schooling system), and, consequently, and (iii) general population-based findings through use of the ALSPAC cohort 33 . A further strength of this work is the introduction of IPC models that jointly enhance both the interpretability of model coefficients and the accuracy of estimates, while incorporating information from all available data. Importantly, our analyses show that unmodeled shared residual variation can strongly affect the model fit (LRT-p < 10 −10 ). Thus, GSEM also complements and extends existing twin research methodologies by presenting new models to study genetic trait interrelationships in samples of unrelated individuals with dense phenotyping information, such as ALSPAC.
Our results should be interpreted in light of several limitations. First, the abilities studied here were all assessed during midchildhood and adolescence. ALSPAC lacks longitudinal information on these measures that would allow assessing measurementinvariant, development-specific genetic variance changes, such as previously reported for age-dependent word decoding skills 24,46 . However, using a latent variable approach, as presented in this study, still facilitates the identification of shared and specific genetic variance components. Second, population-level phenomena, such as assortative mating and dynastic effects, can potentially inflate genetic correlations between assessed measures and upward-bias SNP heritability, even in seemingly unrelated individuals 51 . For example, parents may select each other based on their compatibility in cognitive skills, which may create a reading-rich environment for their children. These 6383 participants had phenotype data of this domain and genetic data. The models were compared with likelihood ratio tests, AIC and BIC. The model with the lowest BIC values is shown in bold and its path diagram is shown in Fig. 3. All analyses are based rank-transformed residualized scores. AIC Akaike information criterion, BIC Bayesian information criterion, Cholesky Cholesky decomposition (saturated) model, GSEM genetic-relationship matrix structural equation modeling, IPC joint independent pathway (genetic part)/Cholesky (residual part) model, IP independent pathway model, LL log-likelihood, k number of parameters.
phenomena could explain some genetic covariation across measures in our study. However, population-level biases should systematically affect all intercorrelated traits and are, thus, unlikely to account for differences in genetic and residual correlation patterns between language and literacy measures. Furthermore, this bias is not unique to GSEM or methods analyzing genetic covariance structures in unrelated individuals. Violation of the random mating assumption also introduces bias to genetic variance estimates in twin studies, although the direction of bias would be opposite; with an overestimation of the shared environmental and an underestimation of the shared genetic effect 31 . Finally, the lack of independent cohorts with comparable sets of literacy, phonological awareness, language, and PWM measures prevents a direct replication of our work, although the consistency of our findings with existing research study reports supports the validity of our results.
Modeling multivariate genome-wide genetic covariance, we have identified an overarching, pleiotropic genetic factor that is shared among measures of literacy, phonological awareness, language, and PWM. This core genetic factor is augmented by trait-specific genetic variance contributions, especially for oral language and PWM. Together with evidence for distinct genetic and residual covariance structures across oral language and literacy, our findings suggest a diverse spectrum of interrelated cognitive skills involving multiple etiological mechanisms and different levels of trait modifiability during mid-childhood and adolescence.

Cohort description
This study was carried out using longitudinal data from children and adolescents between 7 and 13 years from the ALSPAC study, a UK population-based pregnancy-ascertained birth cohort (estimated birth date: 1991-1992) 33,34 .
Written informed consent was obtained where appropriate for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Consent for biological samples has been collected in accordance with the Human Tissue Act (2004). Details on the recruitment of the ALSPAC cohort can be found in Supplementary Note 1.
Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool: http://www.bristol.ac.uk/alspac/researchers/our-data/.

Phenotype descriptions
Traits in this study included reading fluency (non-word reading speed and accuracy, word reading speed and accuracy, passage reading speed and accuracy), spelling (accuracy), phonemic awareness, listening comprehension, and non-word repetition. These measures were ascertained using standardized as well as ALSPAC-specific instruments (Table 1). We excluded ALSPAC measures that were composites of multiple language and literacy domains (such as e.g. verbal intelligence) or which involved tiered assessments.
Word reading accuracy age 9 (NBO). To evaluate word reading accuracy, the child was asked to read aloud a list of 10 real words. Nunes and Bryant selected these real words, which are a subset of those proposed by Nunes, Bryant, and Olsen (NBO) 52 . The reading accuracy score consists of the total number of items that the child read correctly. The test-retest reliability was 0.80, and the score had a correlation of 0.83 with the Schonell Word Reading Task 53 .
Passage reading speed and accuracy age 9 (NARA II). Children's passage reading speed and accuracy were assessed with the revised Neale Analysis of Reading Ability 54 (NARA II). A story was given to the child to read. The tester recorded the time it took the child to read each passage, and noted any errors made by the child. All scores were standardized by age. Depending on age at assessment, the parallel form reliability for accuracy ranged from 0.84 to 0.92 (average 0.89), and from 0.50 to 0.83 (average 0.66) for speed 55 . Reading speed and accuracy had a correlation of 0.76 and 0.95 with the Schonell Graded Word Reading Test 53 respectively.
Word reading speed age 13 (TOWRE). The Test of Word Reading Efficiency 56 (TOWRE) was used to assess overall word reading efficiency using the Sight Word Efficiency sub-scale. The child was given 45 seconds to read as many words as possible. Words that were skipped or wrong words were marked by the tester. The reading speed score was computed as the total number of correct words read by the child. Depending on age and sub-task, the alternate form reliability for this test ranged between 0.86 and 0.97 (with an average of 0.93), and correlations with the Woodcock Reading Mastery Tests ranged from 0.89 to 0.94, demonstrating concurrent validity 55 .
Non-word reading accuracy age 9 (NBO). The child was asked to read aloud 10 non-words. These words had been selected from a larger selection of non-words using an ALSPAC-specific instrument based on the research conducted by Nunes and colleagues 52 . The test-retest reliability of the non-word reading task was 0.73 and the score had a correlation of 0.73 with the Schonell Word Reading Task 53 . The tester emphasized to the child that the words were made-up and asked the child to read all the nonwords in the way that they thought they should be read. A total score was computed as the sum of the number of items read correctly by the child based on regular symbol-sound correspondences of written English.
Non-word reading speed age 13 (TOWRE). The Test of Word Reading Efficiency (TOWRE) was also used to assess non-word reading speed using the Phonemic Decoding Efficiency sub-scale. It is designed as a relatively pure measure of decoding, independent of meaning. The child had 45 seconds to read as many non-words as possible. Non-words that a child skipped or got wrong were marked by the tester. A total score was computed as the sum of the number of correct non-words read by the child based on regular symbol-sound correspondences of written English. The alternate form reliability was 0.94 and the test-retest reliability ranged between 0.82 and 0.97 55 . For validity, TOWRE had correlations with the Woodcock Reading Mastery Tests ranged between 0.89 and 0.91 55 .
Spelling accuracy age 7 and age 9 (NB). Spelling accuracy was assessed by asking a child to spell a series of 15 words. The words were chosen specifically for this age group after piloting on several hundred children (Nunes and Bryant, ALSPAC-specific measure 23 ). The words included regular and irregular words of differing frequencies and were put in an order of increasing difficulty. For each word, the tester first read the word out alone to the child, then within a specific sentence incorporating the word, and finally alone again. The child was then asked to write down the spelling. The spelling accuracy score is computed as the number of words spelt correctly. The test-retest reliability was >0.7 and scores were 0.91 correlated with the Schonell Spelling Test 52 .
Phonemic awareness age 7 (AAT). Phonemic awareness was assessed using the Auditory Analysis Test 57 (AAT). The task contained two practice and 40 test items of increasing difficulty. For each item, the child was first asked to repeat the word, and then to produce it again but with part of the word (a phoneme or several phonemes) removed. For example, the word to repeat initially is "sour" and then the child is asked to repeat it again without /s/ to which the correct response is "our". The test assessed seven omission categories: the omission of a first, medial or final syllable, the omission of the initial consonant, the omission of the final consonant of a one-syllable word, and the omission of the first consonant or consonant blend of a medial consonant. The words from different categories were mixed. Reliability of the AAT was not assessed in the initial task report 57 . However, test criteria of the commercial version of the AAT 58 have been assessed and revealed high internal consistency (0.78). A total score was computed as the sum of correct responses over all types of omission. The AAT had a correlation of 0.53-0.84 with the language arts subsets of the Stanford Achievement Test 57 .
Listening comprehension at 8 (WOLD). Listening comprehension is a subtest of the Wechsler Objective Language Dimensions (WOLD) 59  Non-word repetition at age 8 (NWR). Children's short-term memory was evaluated by a non-word repetition test 11 . The test contains 12 nonsense words, for each of 3, 4, and 5 syllables, all conforming to English rules for sound combinations. The participant was asked to listen to each of these 36 words from an audio cassette recorder and then repeat each word. The test-retest reliability was 0.8 and NWR scores showed correlations between 0.45 and 0.67 with the digit span test 11 .

Phenotype transformations
Any children who stopped prematurely during the psychological tests were excluded from the study. The scores of each test were adjusted for age, sex, and the two principal components from the genotyping analysis to correct for population stratification 63 . The scores from passage reading measures (NARA) are already age-standardized, and therefore these scores were not further adjusted for age. All residualized scores were finally ranktransformed to improve the GSEM log-likelihood estimation assuming multivariate normality.
SNP-h 2 estimates (Supplementary Table 1) and bivariate phenotypic correlations (based on Pearson correlation coefficients, Supplementary Table 2) between measures, using both untransformed and ranktransformed scores, were comparable with each other and with previous reports 23 .
Genotyping ALSPAC children were genotyped using Illumina HumanHap550 quad-chip platform. The ALSPAC GWAS data were generated by Sample Logistics and Genotyping Facilities at the Wellcome Trust Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. After quality control (individual call rate > 0.97, SNP call rate > 0.95, Minor allele frequency (MAF) > 0.01, Hardy-Weinberg equilibrium (HWE) p > 10 −7 , and removal of individuals with cryptic relatedness and non-European ancestry), genome-wide data were available for 8226 children and 465,740 directly genotyped single-nucleotide polymorphisms (SNPs). Among those children, 6774 had at least one phenotype observation for reading fluency, spelling, phonological awareness, language, or PWM abilities available.

Genetic-relationship matrix structural equation modeling
Multivariate genetic covariance structures were identified using geneticrelationship matrix structural equation modeling (R:gsem library, version 1.1.0, https://gitlab.gwdg.de/beate.stpourcain/gsem) 32 . This multivariate analysis technique combines whole-genome genotyping information with structural equation modeling techniques applied in twin research to model multivariate genetic variance using a maximum likelihood approach 32 . Thus, within the GSEM framework, genetic and residual variance were modeled as genetic and residual factors, respectively. More specifically, GSEM also allows modeling multivariate residual influences, such as those not captured by genotyped markers, potentially involving rare, nonadditive or un-tagged genetic influences, environmental risk factors, and random error 36 . In this study, phenotypic variance for each measure was dissected into genetic and residual influences (AE model) using a full Cholesky decomposition and an independent pathway model 32,35 : i. The Cholesky decomposition model is a fully parametrized descriptive model without any restrictions on the structure of latent genetic and residual influences. This saturated model can be fit to the data through decomposition of both the genetic variance and residual variance into as many genetic and residual factors as there are observed variables. ii. The independent pathway model specifies a common genetic and a common residual factor, in addition to trait-specific genetic and residual influences. iii. The combined Cholesky decomposition and independent pathway (IPC) model structures the genetic variance as an independent pathway model (consisting of common and measurement-specific influences) and the residual variance as a Cholesky decomposition model (where the number of residual factors is the same as the number of observed variables). Supplementary Figure 3 shows an example of an IPC model with six traits, and Supplementary Fig. 4 an example of an IPC model with five traits.
The goodness-of-fit of GSEM models to the data was evaluated with LRTs, AIC and BIC 39 .
In the present study, we modeled rank-transformed residualized phenotypes (adjusted for age, sex, and the first two principal components) as described above. Genetic-relationship matrices were constructed based on directly genotyped variants in unrelated individuals, using GCTA software 36 . Individuals with a relatedness >0.05 (off-diagonals within a genetic-relatedness matrix) were excluded. Factor loadings were evaluated using Wald tests.
Multivariate models in unrelated individuals, studying interindividual genetic variation captured by genome-wide genetic variation, are computationally expensive 32 . For example, a 6-factor Cholesky decomposition model, as fitted within this study, can require 6 weeks computing time even on a system incorporating at least 4 parallel cores of 3 GHz and between 50 and 75 Gb memory. Hence, we started the modeling process by strategically combining similar measures, such as reading fluency or spelling abilities, to reduce the number of studied instruments by selecting proxies that most comprehensively capture the shared genetic variance of an entire domain. The advantage of this approach is that we can control the extent to which selected proxies represent underlying shared genetic factors, either fully or along with specific measurement-related influences, and subsequently assess the stability of the identified genetic structures. The identified proxy measures in Stage 1 were eventually jointly modeled with other measures as part of the Stage 2 analysis, with models across four phenotypic domains: literacy measures, including both reading (nonword, word, and passage reading speed and accuracy) and spelling, phonological awareness (phonemic awareness), oral language (listening comprehension), and PWM (non-word-repetition).
Applying a conservative approach, we also evaluated all derived factor loadings of the Stage 2 model against an experiment-wide error rate of 0.007, estimated based on the effective number of individually analyzed phenotypes using Matrix Spectral Decomposition (MatSpD) 64 , accounting for all previously conducted statistical assessments. However, a correction for multiple testing is not directly applicable, as we jointly analyze genetic trait covariance using GSEM.
Identified structural models were used to estimate SNP-h 2 as well as genetic correlations, factorial co-heritabilities (the proportion of total trait genetic variance explained by a specific genetic factor), and bivariate heritabilities (the contribution of genetic covariance to the observed phenotypic covariance between two measures), as defined here: Bivariate genetic correlation between phenotypes, measuring the extent to which two phenotypes 1 and 2 share genetic factors (ranging from −1 to 1), can be derived using estimated genetic variance and covariance 65 according to: where σ g12 is the genetic covariance between phenotypes 1 and 2, and, σ 2 g1 and σ 2 g2 are the respective genetic variances. A measure of factorial co-heritability was introduced to assess the relative contribution of a genetic factor to the genetic variance of a phenotype, estimated with the gsem package (R:gsem library, version 1.1.0). The factorial co-heritability f g 2 is defined as: where σ 2 g it is the genetic variance of the genetic factor i contributing to trait t and σ 2 g t the total genetic variance of trait t, based on standardized factor loadings. The corresponding SEs were derived using the Delta method.
Bivariate heritability also known as co-heritability 66 was defined as the ratio of the genetic to the phenotypic covariance between two traits and estimated using the gsem package (R:gsem library, version 1.1.0).
where σ g12 is the genetic covariance, estimated based on unstandardized factor loadings, and σ p12 the phenotypic covariance from observed ranktransformed measures. The respective SEs were approximated by the SE of the genetic covariance divided by the phenotypic covariance (as the SE of the phenotypic covariance is very small).