Main

Ascertaining the proportion of variance in a quantitative trait, such as height or intelligence quotient, that is due to genetic variation has long been of interest to a wide range of scientists.1, 2, 3, 4, 5 For human populations, where experimentation is not possible, the workhorse of such analysis has been the twin or extended twin design, where the average relatedness of various kin pairs is correlated with their phenotypic similarity in order to ascertain the effect of shared genotype on a given outcome.6, 7 The reigning critique of this approach is that it is difficult to eliminate the possibility that the greater similarity of monozygotic twins, as compared with dizygotic twins, is due to more similar environments and not solely to their greater genetic similarity.8, 9

Among the recent and novel approaches to overcome this potential environmental confounding are studies that correlate phenotypic similarity with genotypic similarity across the genome among pairs of individuals who are less than 2.5% related as computed by identity by state and are therefore considered non-kin.10, 11, 12 Simply described, a genetic relatedness matrix is constructed in which each cell is filled by a measure of genotypic correlation between pairs of individuals (the rows and columns) where the genotype is based on the summation of 2N gametic correlation at specific loci across the genome, after pruning single nucleotide polymorphisms for linkage disequilibrium. The genotypes are coded as the numbers of minor alleles for each single nucleotide polymorphism, standardized by a transformation that makes the sample variance independent of allele frequency. The genetic relatedness matrix may then be used in concert with a phenotypic distance matrix to estimate heritability without estimating the phenotypic effect of any individual single nucleotide polymorphism locus. This Genetic Relatedness Estimation through Maximum Likelihood (GREML) approach yields estimates of narrow-sense (additive) heritability (h²) that are lower than but approaching those obtained from traditional twin-based approaches and has been deployed for diverse phenotypes, including height,13 schizophrenia,10 asthma,14 smoking,15 body mass index,16 educational attainment17 and political and economic preferences.18
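The construction of the relatedness matrix described above can be sketched in a few lines. This is a minimal illustration, assuming the common standardization in which each minor-allele count is centered at 2p and scaled by the square root of 2p(1 − p), with p taken as the sample allele frequency; the function name and the simulated genotypes are hypothetical and are not drawn from any of the studies cited.

```python
import numpy as np


def genetic_relatedness_matrix(genotypes):
    """Return an individuals-by-individuals relatedness matrix.

    genotypes: array of shape (n_individuals, n_snps) holding minor-allele
    counts (0, 1 or 2) for SNPs already pruned for linkage disequilibrium.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0               # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)           # drop SNPs with no variation
    g, p = g[:, keep], p[keep]
    # Standardize each SNP so its variance does not depend on allele frequency.
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Average the per-SNP products over all retained SNPs.
    return z @ z.T / z.shape[1]


# Toy data (simulated, not from any study): 50 people, 2000 independent SNPs.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.05, 0.5, size=2000)
toy_genotypes = rng.binomial(2, freqs, size=(50, 2000))
grm = genetic_relatedness_matrix(toy_genotypes)

# Off-diagonal entries are the pairwise relatedness values; the "non-kin"
# pairs in the text are those below the 0.025 cutoff.
rows, cols = np.triu_indices_from(grm, k=1)
non_kin_pairs = grm[rows, cols] < 0.025
```

In simulated unrelated individuals such as these, the off-diagonal values scatter tightly around zero and the diagonal sits near one, which is the pattern the 2.5% cutoff is meant to isolate in real samples.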

However, similar to twin-based models, the GREML approach relies on one key assumption about the relationship between genetic similarity and environmental similarity. Although those who share genetic variation may experience more similar environments owing to population structure, admixture and, of course, extended family ties, GREML assumes that those who are less related than second cousins share alleles in an essentially random manner that is itself uncorrelated with environmental similarity. The motivating notion is that at these low levels of relatedness, relative genetic similarity is driven by the randomness of recombination and allele segregation and not by underlying kinship structure. As such, parental relatedness and relevant environmental conditions should be orthogonal to respondent relatedness.
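The assumption can be stated compactly. The block below is a simplified sketch that omits fixed effects; the symbols (A for the relatedness matrix, g and e for the genetic and residual components, and the constant c) are notation introduced here for illustration and do not appear in the original text.

```latex
% Simplified GREML variance model (fixed effects omitted)
\mathbf{y} = \mathbf{g} + \mathbf{e}, \qquad
\operatorname{Var}(\mathbf{y}) = \mathbf{A}\,\sigma_g^2 + \mathbf{I}\,\sigma_e^2, \qquad
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}

% Key assumption: residual (environmental) covariance between distinct
% individuals j and k is zero whatever their relatedness A_{jk}
\operatorname{Cov}(e_j, e_k) = 0 \quad (j \neq k)

% Hypothetical violation: if environments covary with relatedness,
% \operatorname{Cov}(e_j, e_k) = c\,A_{jk}, then
\operatorname{Cov}(y_j, y_k) = (\sigma_g^2 + c)\,A_{jk} \quad (j \neq k)
```

Under such a violation, the covariance that tracks relatedness would reflect the sum of the genetic variance and c, so an estimate built on the relatedness matrix could absorb environmental similarity into the apparent genetic component.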

To support this claim that relatedness among these pairs of individuals is random (and thus uncorrelated with potential environmental confounders), Yang et al.11 show correlations in relatedness levels between chromosomes in a Supplementary Table.11 Their logic is that if the person-wide genetic relatedness measure between individuals (that is, gametic correlation) were reflecting population structure (and, thus, covaried with environment), pairwise genetic relatedness would be correlated across those individuals’ chromosomes. However, if the distribution of pairwise relatedness is really just the result of randomization during meiosis, then each chromosome should be independent, demonstrating no correlation. Yang et al. find no single pair of chromosomes for which the P-value of the correlation between the genetic relatedness of those two chromosomes is less than 0.00022, which corresponds to a 0.05 α-level with a Bonferroni correction for the 231 comparisons they make across the bivariate combinations of the 22 autosomal chromosomes. However, this strikes us as the wrong statistical test; we are not concerned as to whether the relatedness of a specific pair of chromosomes co-varies below a strict type I error threshold. Rather, we are worried that there is an overall pattern of relatedness in the data and thus should apply a more sensitive test that minimizes type II error. Along these lines, in Figure 1, we present a histogram of their 231 reported P-values and show that there is indeed an excess of low P-values, particularly below the P<0.10 threshold, as compared with the uniform distribution expected by chance. Indeed, when we perform a Kolmogorov–Smirnov test on their reported distribution, we find it to deviate from the theoretically expected (uniform) distribution (D+=0.1892, P-value=7.037e−08). Although we do not know the signs of the associated coefficients (as they were not reported by Yang et al.), the overall non-random distribution of correlations suggests that the data fail the test for randomization of alleles across chromosomes.
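The arithmetic behind the Bonferroni threshold and the logic of the distributional test can be sketched as follows. This is an illustration only: the P-values are simulated stand-ins rather than the 231 values reported by Yang et al., the function name is hypothetical, and the one-sided form of the test is an assumption inferred from the reported D+ statistic.

```python
from math import comb

import numpy as np
from scipy import stats

n_autosomes = 22
n_comparisons = comb(n_autosomes, 2)         # 231 chromosome pairs
bonferroni_threshold = 0.05 / n_comparisons  # about 0.00022


def excess_of_small_pvalues(pvalues):
    """One-sided Kolmogorov-Smirnov test of P-values against Uniform(0, 1).

    alternative="greater" asks whether the empirical CDF rises above the
    uniform CDF somewhere, that is, whether small P-values are
    over-represented; the statistic returned is D+.
    """
    return stats.kstest(pvalues, "uniform", alternative="greater")


# Simulated stand-ins, NOT the values reported by Yang et al.
rng = np.random.default_rng(1)
null_like = rng.uniform(size=n_comparisons)      # what independence predicts
skewed = rng.beta(0.5, 1.0, size=n_comparisons)  # piled up near zero

null_result = excess_of_small_pvalues(null_like)
skewed_result = excess_of_small_pvalues(skewed)
print(bonferroni_threshold, null_result.pvalue, skewed_result.pvalue)
```

The contrast illustrates the point made in the text: a set of P-values can contain no single entry below the Bonferroni threshold and still, taken as a whole, depart clearly from the uniform distribution.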

Figure 1

Histogram of P-values from pairwise chromosome regressions of relatedness as presented in Supplementary Table 2 of Yang et al.11 ‘Common single nucleotide polymorphisms (SNPs) explain a large proportion of the heritability for human height.’ Note the excess of low P-values, particularly those less than 0.10. This suggests that there is a significant pattern of covariance between independently segregating genomic segments and thus potential non-randomness in overall relatedness (that is, potential covariance with population structure and thus environmental confounders). Kolmogorov–Smirnov test: D+=0.1892, P-value=7.037e−08.

With this in mind, we do not believe that this core assumption, that the environmental similarity between pairs of unrelated persons is uncorrelated with their genetic similarity (below the 0.025 threshold), has been adequately interrogated. In the present study, we test the key GREML assumption by asking whether the childhood environments of subjects are more similar if they are more related genetically. If pairs of individuals share the experience of urban environment during their respective childhoods (or, conversely, share a non-urban childhood experience), this is likely to have the effect of making their formative social and physical environments more similar than they would be by chance. Thus, if relatedness predicts environmental similarity in this way, it could confound the premise of GREML-based methods of estimating the genetic component of phenotypes. It makes no difference whether urbanicity is itself causal of the phenotype under consideration; it may be acting merely as a proxy for other, more relevant environmental factors (such as social class, nutritional status and so forth) that are themselves related, through environmental channels, to the offspring phenotype (such as height, body mass index or education). That said, a large literature shows that urbanicity is correlated with a range of outcomes studied by geneticists, ranging from mental health19, 20, 21 to immunological response21, 22 to education.23
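The question posed here, whether more related pairs share more similar childhood environments, can be illustrated with a simple pairwise regression. This sketch is not the GREML analysis used in the study; it is a deliberately simplified stand-in, and the function name and the four-person toy data are hypothetical.

```python
import numpy as np


def shared_environment_slope(grm, urban_childhood):
    """OLS slope of pairwise environmental similarity on pairwise relatedness.

    grm: square relatedness matrix (individuals by individuals).
    urban_childhood: 0/1 indicator per individual.
    """
    grm = np.asarray(grm, dtype=float)
    urban = np.asarray(urban_childhood)
    rows, cols = np.triu_indices_from(grm, k=1)   # each pair counted once
    relatedness = grm[rows, cols]
    # 1.0 if both urban or both non-urban, 0.0 otherwise.
    same_environment = (urban[rows] == urban[cols]).astype(float)
    slope, _intercept = np.polyfit(relatedness, same_environment, deg=1)
    return slope


# Hand-built toy: persons 0-1 and 2-3 are slightly more related to each
# other and also share a childhood setting.
toy_grm = np.array([
    [1.00, 0.02, -0.01, -0.01],
    [0.02, 1.00, -0.01, -0.01],
    [-0.01, -0.01, 1.00, 0.02],
    [-0.01, -0.01, 0.02, 1.00],
])
toy_urban = np.array([1, 1, 0, 0])
slope = shared_environment_slope(toy_grm, toy_urban)
```

A positive slope indicates that relatedness predicts shared environment. Because each individual appears in many pairs, the pairwise observations are not independent, so naive standard errors from such a regression would be too small; the sketch therefore reports only the slope.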

Health and Retirement Study data allow us to estimate the heritability of urban childhood residence as well as how urban residence during childhood affects GREML estimates of other putatively heritable traits. We used the standard GREML analysis (using Genome-wide Complex Trait Analysis software12) to estimate heritability, with population stratification controlled by principal components (PCs; see Supplementary Materials: Methods). As shown in the first row of Table 1 below, in the Health and Retirement Study sample with two PCs controlled, urban childhood, putatively a childhood environmental variable based on circumstance and parental choices, is indeed highly heritable at 29%. As we suspected that the nonzero heritability might be a result of geographic population structure, we then reran the analysis with 10 and 25 PCs included as controls. These controls attenuated, but did not eliminate, the effect we discovered. Thus, it seems that controls for population structure through deployment of PCs do not adequately address this confounding. We replicated this finding with data from the National Longitudinal Survey of Adolescent Health as well as with another childhood phenotype, maternal education, in the National Longitudinal Survey of Adolescent Health and in the Framingham Heart Study. Both the National Longitudinal Survey of Adolescent Health and the Framingham Heart Study are underpowered to generate statistically precise GREML heritability estimates, but ordinary least squares regressions show magnitudes of estimates in line with the Health and Retirement Study results (see Supplementary Materials).

Table 1 GREML heritability estimates for shared childhood urbanicity, height, BMI and education

Despite the apparent heritability of childhood residence, when we control for this possible confounder in analysis of common human phenotypes of interest—height, body mass index and years of schooling—we find that the differences between the ‘naive’ models and the ones that hold childhood urbanicity constant are negligible and not statistically significant. In fact, the only phenotype for which the heritability changes to any noticeable degree is respondent education, which drops by a statistically insignificant two percentage points (P=0.8203) in the model with only two PCs. This makes sense: of the three phenotypes, we would expect height to be the least influenced by childhood environment, body mass index in the middle and education to be the most affected by potential environmental confounds. As controlling for more PCs did not appear to eliminate the heritability of a putatively environmental confound—urban childhood—we then tried to see whether using a more restrictive relatedness cutoff (0.01) would address the ‘problem.’ However, when we used this more restrictive cutoff, sample sizes dropped too drastically to yield adequate power. (Results are shown in Supplementary Table 1.)

Our findings have implications not only for GREML analysis of heritability but also for genome-wide analysis more broadly. Namely, some scholars have claimed that PCs adequately control for population stratification, especially when data show no evidence of ‘early take-off’ (that is, across the vast majority of the distribution of P-values, they match what one would expect from chance).24, 25 Our results suggest that directly modeling error terms as a linear function of relatedness in a sample may also be necessary to adjust for stratification.26, 27 Finally, and most importantly, while the key assumption of GREML analysis that the genotype–environment correlation is zero is violated, the consequences of that violation appear to be trivial. We cautiously conclude that GREML is a valid estimation technique for heritability but recommend that going forward, researchers test for the violation of this assumption (and robustness to violations) in their own data sets as a standard sensitivity analysis.