Genetic influence on family socioeconomic status and children's intelligence

Environmental measures used widely in the behavioral sciences show nearly as much genetic influence as behavioral measures, a critical finding for interpreting associations between environmental factors and children's development. This research depends on the twin method that compares monozygotic and dizygotic twins, but key aspects of children's environment such as socioeconomic status (SES) cannot be investigated in twin studies because they are the same for children growing up together in a family. Here, using a new technique applied to DNA from 3000 unrelated children, we show significant genetic influence on family SES, and on its association with children's IQ at ages 7 and 12. In addition to demonstrating the ability to investigate genetic influence on between-family environmental measures, our results emphasize the need to consider genetics in research and policy on family SES and its association with children's IQ. © The Authors. Published by Elsevier Inc.


Introduction
A surprising finding from quantitative genetic research is that most environmental measures in the social and behavioral sciences show significant and substantial genetic influence (Kendler & Baker, 2007; Plomin, 1994; Plomin & Bergeman, 1991; Vinkhuyzen, van der Sluis, de Geus, Boomsma, & Posthuma, 2010). This genetic influence on environmental measures is attributed to genotype-environment correlation in which individuals' experiences are correlated with their genetic propensities (Plomin, DeFries, Knopik, & Neiderhiser, 2013). Most of this quantitative genetic research relies on the classical twin design that compares monozygotic and dizygotic twins. However, using the twin method for this purpose runs into a major limitation, especially in developmental studies: The twin design can only investigate environmental factors that make members of a twin pair living in the same family different from one another, called within-family environmental effects. Yet some of the most influential aspects of the family environment operate between rather than within families. For example, consider the most widely studied measure in the social and behavioral sciences, socioeconomic status (SES), which we refer to as family SES because of our focus on children's development (Bradley & Corwyn, 2002). A study of school-age twins cannot detect genetic influence on family SES or its effect on twins' cognitive development because family SES is the same for members of a twin pair. As a result, a twin study would mistakenly attribute variance in family SES to shared environment even if genetic factors were in fact substantially involved. Most importantly, genetic mediation of the effect of family SES on children's cognitive development would also be missed by the twin method.
A new quantitative genetic method that uses DNA from unrelated individuals can solve this problem because it can assess genetic effects on between-family and between-school differences in children's outcomes. The method, called Genome-wide Complex Trait Analysis (GCTA), foregoes the identification of individual DNA variants to estimate the total genetic influence captured by genome-wide genotyping for a large sample of unrelated individuals whose genetic similarity is compared pair by pair (Yang, Lee, Goddard, & Visscher, 2011). The significance of GCTA is that it can estimate the net effect of genetic influence using DNA of unrelated individuals rather than using familial resemblance in groups of special family members who differ in genetic relatedness such as monozygotic and dizygotic twins. We applied GCTA based on children's genotypes to detect genetic influence on family SES as well as genetic mediation of the effect of family SES on children's cognitive development.

Sample and genotyping
The sample was drawn from the Twins Early Development Study (TEDS), which is a multivariate longitudinal study that recruited over 11,000 twin pairs born in England and Wales in 1994, 1995 and 1996 (Haworth, Davis, & Plomin, 2013). TEDS has been shown to be representative of the UK population (Kovas, Haworth, Dale, & Plomin, 2007). The project received approval from the Institute of Psychiatry ethics committee (05/Q0706/228) and parental consent was obtained prior to data collection. Cognitive data and buccal DNA were available for 3747 11- and 12-year-old children (one twin per family), whose first language was English and who had no major medical or psychiatric problems. From that sample, 3665 DNA samples were successfully hybridized to Affymetrix GeneChip 6.0 SNP genotyping arrays using standard experimental protocols as part of the WTCCC2 project (Trzaskowski et al., 2013b). In addition to nearly 700,000 genotyped SNPs, more than one million other SNPs were imputed using IMPUTE v.2 software (Howie, Donnelly, & Marchini, 2009). 3152 DNA samples (1446 males and 1706 females) survived quality control criteria for ancestry, heterozygosity, relatedness, and hybridization intensity outliers. To control for ancestral stratification, we performed principal component analyses on a subset of 100,000 quality-controlled SNPs after removing SNPs in linkage disequilibrium (r² > 0.2). Using the Tracy-Widom Test, we identified 8 axes with p < 0.05, which were used as covariates in our GCTA analyses. This is standard procedure in genome-wide association analyses to avoid artificial associations due to ethnic or other types of population stratification (Trzaskowski et al., 2013b); correcting for these covariates is also standard in GCTA in order to avoid this source of genetic similarity among individuals in the population (Yang et al.).
The mean age of the sample was 7.04 years (SE = 0.25) at the first wave of assessment and 11.5 years (SD = 0.66) at the second wave. There were 2679 individuals with SES at age 7 and 1897 with IQ at age 7; 1750 individuals had both measures and were available for the covariance analysis. In addition, there were 2319 individuals with IQ at age 12, of whom 2013 also had data for SES at age 7.

Socioeconomic status (SES)
There is general consensus that a composite of variables including parental education and occupation represents SES better than any single indicator (White, 1982). To index family SES we used parental education and occupation assessed when children were age 2 and again when children were age 7. At age 2, SES was constructed from the first unrotated principal component, which explained more than 50% of the variance, from a factor analysis conducted on five measures: father's highest educational qualification, father's occupation, mother's highest educational qualification, mother's occupation, and age of mother at birth of eldest child. The SES composite when children were age 7 was created similarly but without the variable of age of mother at birth of eldest child.
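As an illustration of the composite described above, the following Python sketch extracts a first unrotated principal component from a set of indicator columns. It is not the authors' code: the function names are hypothetical, power iteration stands in for whatever routine was actually used, and the uniform starting vector assumes the indicators are positively intercorrelated (as SES indicators usually are).

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(column):
    """Z-score one indicator (assumes the column is not constant)."""
    m, sd = mean(column), pstdev(column)
    return [(x - m) / sd for x in column]

def first_principal_component(columns, n_iter=500):
    """Return (component scores, proportion of variance explained).

    columns: one list per indicator, all of equal length, no missing data.
    """
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    # Correlation matrix of the standardized indicators.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    # Power iteration for the leading eigenvector (assumed start: uniform).
    v = [1.0 / sqrt(k)] * k
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    scores = [sum(v[i] * z[i][r] for i in range(k)) for r in range(n)]
    return scores, eigenvalue / k
```

The second return value corresponds to the "more than 50% of the variance" figure quoted in the text: the leading eigenvalue of the correlation matrix divided by the number of indicators.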

General cognitive ability (IQ)
At ages 7 and 12, IQ was assessed from two verbal tests and two non-verbal tests. At age 7, the two verbal tests consisted of the Similarities subtest and the Vocabulary subtest from the WISC-III-UK. The two non-verbal tests were the Picture Completion subtest from the WISC-III-UK and the Conceptual Grouping subtest from the McCarthy Scales of Children's Abilities. At age 12, the verbal tests were Information (General Knowledge) and Vocabulary Multiple Choice subtests from WISC-III-PI. The two non-verbal reasoning tests were WISC-III-UK Picture Completion and Raven's Standard and Advanced Progressive Matrices. At age 7, testing was conducted by telephone (Petrill, Rempell, Oliver, & Plomin, 2002) and at age 12 testing was conducted over the Internet.

Composite measures for IQ
For each cognitive measure, outliers above or below 3 SD from the mean were excluded. Scores were regressed on sex and age and standardized residuals were derived and quantile normalized (Lehmann, 1975). Subsequently, composite measures for IQ were created as unit-weighted means requiring complete data for at least 3 of the 4 tests. All procedures were executed using R (www.r-project.org).
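The outlier exclusion, quantile normalization, and compositing steps above can be sketched as follows. This is a Python illustration rather than the R code the authors used; the function names are hypothetical, the (rank - 0.5) / n plotting position is an assumption, and the regression on sex and age is omitted for brevity.

```python
from statistics import NormalDist, mean, stdev

def exclude_outliers(scores, n_sd=3.0):
    """Set scores more than n_sd standard deviations from the mean to None."""
    observed = [s for s in scores if s is not None]
    m, sd = mean(observed), stdev(observed)
    return [s if s is not None and abs(s - m) <= n_sd * sd else None
            for s in scores]

def quantile_normalize(scores):
    """Rank-based inverse-normal transform; missing values stay None.

    Assumed plotting position: (rank - 0.5) / n, with average ranks for ties.
    """
    idx = [i for i, s in enumerate(scores) if s is not None]
    ordered = sorted(idx, key=lambda i: scores[i])
    ranks = {}
    pos = 0
    while pos < len(ordered):
        end = pos
        while (end + 1 < len(ordered)
               and scores[ordered[end + 1]] == scores[ordered[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average 1-based rank of the tie group
        for j in range(pos, end + 1):
            ranks[ordered[j]] = avg_rank
        pos = end + 1
    n = len(idx)
    normal = NormalDist()
    return [normal.inv_cdf((ranks[i] - 0.5) / n) if i in ranks else None
            for i in range(len(scores))]

def composite(test_scores, min_tests=3):
    """Unit-weighted mean across tests; None unless min_tests are present."""
    present = [s for s in test_scores if s is not None]
    return mean(present) if len(present) >= min_tests else None
```

For example, `composite([1.0, None, 2.0, 3.0])` averages the three available tests, while a child with only two of the four tests receives no composite.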

Statistical analyses
Genome-wide Complex Trait Analysis (GCTA). Conceptually GCTA compares a matrix of pairwise genomic similarity to a matrix of pairwise phenotypic similarity using a random-effects mixed linear model in a large sample of unrelated individuals. The matrix that holds genomic similarities between all individuals from the sample is known as the genetic relatedness matrix (GRM). Each value in the GRM is a mean of pairwise genetic similarities (weighted by allele frequency) from across all genetic markers genotyped on the SNP array. Even remotely related pairs of individuals are excluded so that chance genetic similarity is used as a random effect in a mixed linear maximum likelihood model to decompose phenotypic variance into genetic variance as captured by the additive effects of causal variants in linkage disequilibrium with SNPs genotyped on DNA arrays (Yang et al., 2010). For this reason, as a default, GCTA removes one individual from a pair whose genetic similarity is 0.025 or greater, a coefficient that approximates at least fifth degree relatives. The power of the method comes from comparing, not just two groups like MZ and DZ twins, but thousands of pairs of unrelated individuals. Nonetheless, GCTA requires samples of thousands of individuals because the method attempts to extract a small signal of genetic similarity from the massive noise of hundreds of thousands of SNPs. Software is available to calculate power for univariate and bivariate GCTA (http://spark.rstudio.com/ctgg/gctaPower/). For example, a sample of 3000 has 80% power to detect a GCTA heritability estimate of 30% and 50% power to detect a GCTA heritability estimate of 20%. For bivariate analysis, a sample of 3000 provides 80% power to detect a genetic correlation of 0.60 and 50% power to detect a genetic correlation of 0.45 when the GCTA heritability of one trait is 20% and the other is 30%.
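To make the GRM description concrete, the sketch below computes the allele-frequency-weighted similarity for a pair of individuals and applies the 0.025 relatedness cutoff. It is a toy illustration, not the GCTA software: the function names are hypothetical, genotypes are assumed to be complete allele counts (0, 1, 2), and the greedy rule for deciding which member of a related pair to drop is an assumption.

```python
def allele_frequencies(genotypes):
    """genotypes[j][i] is the allele count (0, 1, 2) for person j at SNP i."""
    n_people = len(genotypes)
    n_snps = len(genotypes[0])
    return [sum(person[i] for person in genotypes) / (2 * n_people)
            for i in range(n_snps)]

def relatedness(geno_j, geno_k, freqs):
    """Off-diagonal GRM entry: mean over SNPs of
    (x_j - 2p)(x_k - 2p) / (2p(1 - p))."""
    total, used = 0.0, 0
    for x_j, x_k, p in zip(geno_j, geno_k, freqs):
        if 0.0 < p < 1.0:  # skip monomorphic SNPs, which have zero variance
            total += (x_j - 2 * p) * (x_k - 2 * p) / (2 * p * (1 - p))
            used += 1
    return total / used

def prune_related(genotypes, cutoff=0.025):
    """Greedily keep individuals whose similarity to everyone already kept
    is below the cutoff; returns the indices retained."""
    freqs = allele_frequencies(genotypes)
    kept = []
    for j, person in enumerate(genotypes):
        if all(relatedness(person, genotypes[k], freqs) < cutoff for k in kept):
            kept.append(j)
    return kept
```

With real data the matrix is built from hundreds of thousands of SNPs, so the per-pair mean is far less noisy than in a toy example with a handful of markers.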
In univariate analysis, the coefficients are estimated using residual maximum likelihood and the significance of genetic influence is inferred from the likelihood ratio test by comparing this model to a 'null' model of no genetic influence. Detailed description of the method can be found in GCTA publications. The bivariate method extends the univariate model by relating the pairwise genetic similarity matrix to a phenotypic covariance matrix between traits 1 and 2 (Lee, Yang, Goddard, Visscher, & Wray, 2012). The eight principal components described earlier were used as covariates in our univariate and bivariate GCTA analyses in order to attenuate the effects of ethnic and other forms of population stratification that could be read as genetic similarity, which is standard procedure in genome-wide analyses. As mentioned in the previous section, IQ scores were age- and sex-regressed prior to analysis.
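The likelihood ratio test mentioned above can be illustrated with a short Python sketch. It assumes the reference distribution commonly used when a single variance component is tested against zero (an equal mixture of a point mass at zero and a chi-square with one degree of freedom); that choice, the function name, and the log-likelihood inputs are assumptions for illustration, not values or code from the study.

```python
from math import erfc, sqrt

def lrt_p_value(loglik_full, loglik_null):
    """P-value for one variance component tested against zero.

    Assumes a 50:50 mixture of a point mass at zero and chi-square(1),
    so the p-value is half the chi-square(1) tail probability.
    """
    lrt = max(0.0, 2.0 * (loglik_full - loglik_null))
    # The chi-square(1) upper tail at x equals erfc(sqrt(x / 2)).
    return 0.5 * erfc(sqrt(lrt / 2.0))
```

A model whose genetic variance component does not improve the likelihood at all yields a p-value of 0.5 under this convention.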

Results
Using SNP genotypes from the children's DNA, we found significant genetic influence on their families' SES when the children were age 2 and age 7 (Table 1). The GCTA estimates of heritability were 18% at age 2 and 19% at age 7; the similarity of results at age 2 and 7 is not a foregone conclusion because the correlation between family SES at the two ages is 0.75. These are underestimates of true heritability because GCTA is limited to detecting genetic influence due to additive effects of the common SNPs that are on current DNA microarrays such as the Affymetrix 6.0 GeneChip used in our study. That is, nonadditive effects and rarer DNA variants not tagged by common SNPs are missed in GCTA analysis. A novel aspect of the present study is that children cannot cause family SES; their genotypes only reflect the causal genotypic factors responsible for their parents' education and occupation. For this reason, parental genotypes, not available in the present study, would be expected to yield a higher GCTA estimate of the parents' own education and occupation, which comprise SES. This was the case for a recent GCTA report of a component of SES, adult educational attainment (Rietveld et al., 2013).
A strength of our child-based design using children's genotypes in GCTA analyses rather than those of their parents is that it captures the genetic influence of family SES on the children themselves. This feature of the design enables the second stage of our analyses in which we conducted bivariate GCTA to determine the extent to which the well known link between family SES and cognitive development, about 0.30 in meta-analyses (Sirin, 2005) and in the present study, is mediated genetically. Because bivariate GCTA focuses on the genetic covariance between family SES and children's IQ, our analysis is limited to TEDS children for whom data are available for both variables. Despite the smaller samples, the bivariate GCTA heritability estimates for family SES (Table 2) are similar to the univariate estimates (Table 1): 21% at age 7 and 23% at age 12. For children's IQ, we find heritabilities of 28% at age 7 and 32% at age 12. These results from our bivariate analysis between family SES and children's IQ replicate our previously reported results showing significant GCTA heritability in our TEDS sample at ages 7 and 12 (Trzaskowski, Yang, Visscher, & Plomin, 2013) as well as another study that reported significant GCTA heritability at age 11 (Davies et al., 2011).
The key result for the bivariate GCTA analysis is the genetic correlation, which indicates the extent to which the same genes affect family SES and children's IQ. The genetic correlation between family SES at age 7 and children's IQ at age 7 is near unity, indicating that the same genes affect both variables (Table 2). Despite the large standard error for GCTA estimates of genetic correlations, the genetic correlation is significantly greater than zero and not significantly lower than 1.0. We also conducted bivariate GCTA for family SES at age 7 and children's IQ at age 12 (family SES was not assessed at age 12). The GCTA genetic correlation was 0.66, which was again significantly greater than zero and not significantly lower than 1.0. Thus, these GCTA genetic correlations indicate that the same genes are largely responsible for genetic effects on family SES and children's IQ. This finding implies that when genes associated with children's IQ are identified, the same genes will also be likely to be associated with family SES. Although GCTA estimates of genetic variance and genetic covariance are biased in that they underestimate heritability to the extent that nonadditive effects and rare alleles are not included in the estimate, GCTA estimates of genetic correlations are unbiased because they are derived from the ratio of genetic covariance to genetic variance so that the bias in the numerator and denominator cancel out (Trzaskowski et al., 2013a).
Bivariate GCTA analysis also indicates the extent to which the phenotypic covariance between family SES and children's IQ is mediated genetically. The phenotypic correlation between family SES at age 7 and children's IQ at age 7 is 0.31. The genetic contribution to this covariance is 0.29 (Table 2). In other words, 94% of the correlation between family SES and children's IQ is mediated genetically. For family SES at age 7 and children's IQ at age 12, 56% of the phenotypic correlation of 0.32 is mediated genetically.
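The two summary quantities in this section follow from simple arithmetic on the bivariate estimates, sketched below in Python. The function names are hypothetical. The inputs V(G) = 0.23 and 0.32 and C(G) = 0.18 are taken from the SES 7 - IQ 12 row of Table 2 as read here (the column mapping is an assumption), and dividing C(G) by the phenotypic correlation assumes standardized traits with phenotypic variance close to 1.

```python
from math import sqrt

def genetic_correlation(vg_trait1, vg_trait2, cg):
    """rG = C(G) / sqrt(V(G)_tr1 * V(G)_tr2)."""
    return cg / sqrt(vg_trait1 * vg_trait2)

def proportion_mediated(cg, phenotypic_correlation):
    """Share of the phenotypic correlation accounted for by C(G),
    assuming both traits have phenotypic variance of about 1."""
    return cg / phenotypic_correlation

# SES at age 7 and IQ at age 12 (values as read from Table 2).
r_g_12 = genetic_correlation(0.23, 0.32, 0.18)
share_12 = proportion_mediated(0.18, 0.32)

# SES at age 7 and IQ at age 7 (values quoted in the text).
share_7 = proportion_mediated(0.29, 0.31)

print(round(r_g_12, 2), round(share_12, 2), round(share_7, 2))
```

Rounded to two decimals, these reproduce the genetic correlation of 0.66 and the 56% and 94% figures reported in the text.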
The large standard errors for the estimates of genetic correlations suggest that replication is needed before interpreting the difference between the 94% versus 56% results for IQ at ages 7 and 12, respectively. However, if the difference is real, one possible explanation is that, although the phenotypic correlations between age-7 SES and IQ at ages 7 and 12 do not change, the lower genetic contribution to the SES-IQ correlation at age 12 might reflect increased environmental influence outside the family (e.g., peers, teachers).
In summary, genetic influence is significant and substantial on family SES, on children's IQ, and on the association between family SES and children's IQ. Fig. 1 summarizes the results for family SES at age 7 and children's IQ at age 7, incorporating data from Table 2.

Discussion
Our analysis provides the first DNA-based evidence that the well documented association between family SES and children's cognitive development, routinely interpreted as an environmental effect, is substantially mediated by genetic factors. Previous quantitative genetic research, largely using the twin design, has shown that most 'environmental' measures involve significant genetic influence and that associations between these environmental measures and children's development are mediated genetically (Plomin, 1994; Vinkhuyzen et al., 2010). GCTA adds importantly to this body of research in two ways. First, because it uses DNA alone, GCTA sidesteps concerns about the twin design such as the equal environments assumption (Plomin, DeFries, Knopik, & Neiderhiser, 2013), which might be especially relevant to measures of family environment (Power et al., 2013). Second, many important environmental factors such as family SES cannot be studied using the twin design because they operate between families (family general) rather than within families (child specific). GCTA can be used to study between-family variables.
The main limitation of this study is the sample size. Although a sample of 3000 unrelated children with genome-wide genotypes and data on IQ and family SES is large by many standards, as noted earlier, GCTA has daunting demands for power. Our sample size is just on the cusp of being able to detect as significant the GCTA heritability of family SES, which is about 20%. It should be reiterated that GCTA heritability estimates are lower-limit estimates of twin heritability because GCTA is limited to detecting the additive effects of the common SNPs used in our genome-wide genotyping. On the other hand, because we found such high genetic correlations, we have good power to detect them. As noted earlier, a sample of 3000 provides 80% power to detect a genetic correlation of 0.60.
Table 2 Bivariate GCTA results (with standard errors) between family SES when children were age 7 versus children's IQ at ages 7 and 12.
Variables .04(.14) .99(.03) 1(.03) 2679 1897
SES 7 - IQ 12 .23(.12) .32(.14) .18(.10) .24(.12) .32(.14) 0.66(.31) .76(.12) .68(.14) .14(.10) .20(.12) .99(.03) .99(.03) 2679 2319
Annotation: V(G): variance explained by genetic factors for trait 1 and trait 2 (tr1, tr2); C(G): covariance between trait 1 and 2 explained by genetic factors; V(e): residual variance for trait 1 and trait 2; C(e): residual covariance between trait 1 and trait 2; Vp: phenotypic variance for trait 1 and trait 2; V(G)/Vp: proportion of the phenotypic variance explained by genetic factors for trait 1 and trait 2; rG: genetic correlation between trait 1 and trait 2 (constrained between 0 and 1); n: number of individuals with data for both trait 1 and trait 2; values in parentheses are standard errors.
*The current version of GCTA does not report the residual correlation or its standard error. The residual correlation was derived here from the GCTA estimates using the following algorithm: C(e)_tr12 / (√V(e)_tr1 * √V(e)_tr2), whereas the standard error was calculated using: Var(re) = re * re * (VarVe1/(4*Ve1*Ve1) + VarVe2/(4*Ve2*Ve2) + VarCe/(Ce*Ce) + CovVe1Ve2/(2*Ve1*Ve2) - CovVe1Ce/(Ve1*Ce) - CovVe2Ce/(Ve2*Ce)); SE(re) = sqrt[Var(re)], where re is the residual correlation, Ve1 is the residual variance for trait 1, Ce is the residual covariance between two traits, VarVe1 is the sampling variance for Ve1 (residual variance for trait 1), VarCe is the sampling variance for Ce, CovVe1Ve2 is the sampling covariance between Ve1 and Ve2, and CovVe1Ce is the sampling covariance between Ve1 and Ce.
Although these results are surprising and provocative, they do not in any way support the misguided notion that heritability implies immutability. Nor do any specific policies necessarily follow from finding genetic influence on family SES and its correlation with children's cognitive development because policies depend on values.
However, our results do underline the need to consider nature as well as nurture when drawing policy implications from correlations between ostensible environmental measures and children's developmental outcomes. Specifically, our results bear on the extensive debate about social mobility, which has largely ignored the fact that parents and their offspring are genetically related (Saunders, 2012). Indeed, the correlation between parent and offspring SES is used as an index of intergenerational social mobility because it is assumed that SES advantages are transmitted environmentally from parent to offspring (Breen & Jonsson, 2005). For this reason, lower parent-offspring correlations are thought to indicate social mobility. From this environmental perspective, it follows that equal educational and occupational opportunities will result in equal outcomes between children so that parental SES would no longer have any effect on children's cognitive development, education or occupation. On the contrary, taking genetics into account suggests that higher parent-offspring correlations indicate social mobility. To the extent that genetics is important, parents and their offspring will be correlated; removing environmental sources of inequality will not remove this fundamental resemblance between parents and offspring.
More broadly, it should be recognized that from a genetic perspective, equal opportunity will result in relatively greater genetic influence, as reflected in greater parent-offspring correlations: As environmental differences diminish, variation that remains between children in their outcomes will be due to a greater extent to their genetic differences. In other words, heritability can be viewed as an index of meritocratic social mobility.