Genetic and environmental causes of variation in epigenetic aging across the lifespan

Background DNA methylation-based biological age (DNAm age) is an important biomarker for adult health. Studies in specific age ranges have found widely varying results about its genetic and environmental causes of variation. However, these studies are not able to provide a comprehensive view of the causes of variation over the lifespan. Results In order to investigate the genetic and environmental causes of DNAm age variation across the lifespan, we pooled genome-wide DNA methylation data for 4217 people aged 0–92 years from 1871 families. DNAm age was calculated using the Horvath epigenetic clock. We estimated familial correlations in DNAm age for monozygotic (MZ) twin, dizygotic (DZ) twin, sibling, parent–offspring, and spouse pairs by cohabitation status. Genetic and environmental variance components models were fitted and compared. We found that twin pair correlations were −0.12 to 0.18 around birth, not different from zero (all P > 0.29). For all pairs of relatives, their correlations increased with time spent living together (all P < 0.02) at different rates (MZ > DZ and siblings > parent–offspring; P < 0.001) and decreased with time spent living apart (P = 0.02) at similar rates. These correlation patterns were best explained by cohabitation-dependent shared environmental factors, the effects of which were 1.41 (95% confidence interval [CI] 1.16 to 1.66) times greater for MZ pairs than for DZ and sibling pairs, and the latter were 2.03 (95% CI 1.13 to 9.47) times greater than for parent–offspring pairs. Genetic factors explained 13% (95% CI −10 to 35%) of variation (P = 0.27). Similar results were found for another two epigenetic clocks, suggesting that our observations are robust to how DNAm age is measured. In addition, results for the other clocks were consistent with there also being a role for prenatal environmental factors in determining their variation.
Conclusions Variation in DNAm age is mostly caused by environmental factors, including those shared to different extents by relatives while living together and whose effects persist into old age. The equal environment assumption of the classic twin study might not hold for epigenetic aging.

Lifestyle factors, disease risk factors, and genetic variants have been reported to be associated with DNAm age [2][3][4][7][8][9][10]. Pedigree-based and single nucleotide polymorphism (SNP)-based studies have given widely varying estimates of the proportion of variation in DNAm age explained by genetic factors, ranging from 0 to 100% [6][7][8][9][11][12][13]. One possible reason for this is that these studies focused on specific age ranges only. There is also evidence that environmental factors shared within families explain a substantial proportion of variation in middle age [14]. Individual studies of specific age ranges are not able to provide a comprehensive view of the causes of variation over the lifespan.
We previously pooled DNA methylation data from a variety of twin and family studies in which participants were at different life stages, from birth to older age. We found evidence that variation in genome-wide average methylation is caused to a great extent by prenatal environmental factors, as well as by environmental factors shared by relatives (including spouse pairs) when they cohabit, and that these effects can persist at least to some extent across the whole lifetime [15]. Had specific age ranges been studied separately, these findings might have been missed.
We have now applied the same approach to investigate the genetic, shared environmental, and individual-specific environmental causes of variation in DNAm age across the lifespan.
DNAm age was calculated using the Horvath epigenetic clock [12] (https://dnamage.genetics.ucla.edu/new), as this clock is mostly applicable to our multi-tissue methylation data and study sample including newborns, children, and adults.
DNAm age was moderately to strongly correlated with chronological age within each dataset, with correlations ranging from 0.44 to 0.84 (Fig. 1). The variance of DNAm age increased with chronological age, being small for newborns, greater for adolescents, and relatively constant with age for adults (Fig. 2). A similar pattern was observed for the absolute deviation between DNAm age and chronological age (Table 1). Within each study, MZ and DZ pairs had similar absolute deviations and residuals in DNAm age adjusted for chronological age. Table 2 shows the within-study familial correlation estimates. There was no difference in the correlation between MZ and DZ pairs for newborns or adults, but there was a difference (P < 0.001) for adolescents: 0.69 (95% confidence interval [CI] 0.63 to 0.74) for MZ pairs and 0.35 (95% CI 0.20 to 0.48) for DZ pairs. For MZ and DZ pairs combined, there was consistent evidence across datasets and tissues that the correlation was around −0.12 to 0.18 at birth and 18 months, not different from zero (all P > 0.29), and about 0.3 to 0.5 for adults (different from zero in seven of eight datasets; all P < 0.01). Across all datasets, the results suggested that twin pair correlations increased with age from birth up until adulthood and were maintained to older age.

Within-study familial correlations
The correlation for adolescent sibling pairs was 0.32 (95% CI 0.20 to 0.42), not different from that for adolescent DZ pairs (P = 0.89), but less than that for adolescent MZ pairs (P < 0.001). Middle-aged sibling pairs were correlated at 0.12 (95% CI 0.02 to 0.22), less than that for adolescent sibling pairs (P = 0.02). Parent-offspring pairs were correlated at 0.15 (95% CI 0.02 to 0.27), less than that for pairs of other types of first-degree relatives in the same study, e.g., DZ pairs and sibling pairs (both P < 0.04). The spouse-pair correlations were −0.01 (95% CI −0.25 to 0.24) and 0.12 (95% CI −0.12 to 0.35).
From the sensitivity analysis, the familial correlation results were robust to the adjustment for blood cell composition (Additional file 1: Table S1).

Familial correlations across the lifespan
From modeling the familial correlations for the different types of pairs as a function of their cohabitation status (Additional file 1: Table S2), the estimates of θ (see "Methods" section for definition) ranged from 0.76 to 1.20 across pairs, none different from 1 (all P > 0.1). We therefore fitted a model with θ = 1 for all pairs; the fit was not different from the model above (P = 0.69). Under the latter model, the familial correlations increased with time living together at different rates (P < 0.001) across pairs. The decreasing rates did not differ across pairs (P = 0.27). The correlations for DZ and sibling pairs were similar (P = 0.13), and when combined their correlation was different from that for parent-offspring pairs (P = 0.002) even though these pairs are all genetically first-degree relatives, and was smaller than that for the MZ pairs (P = 0.001).
We then fitted a model in which DZ and sibling pairs were combined and the decreasing rates were the same across all pairs. The goodness of fit of this model was not inferior to that of the model above (P = 0.14), and the model included fewer parameters. Under this model, the familial correlations for MZ, DZ and sibling, and parent-offspring pairs all increased with time living together (all P < 0.02) with different increasing rates (P < 0.001): most rapidly for MZ pairs (λ = 0.041, 95% CI 0.035 to 0.048), less rapidly for DZ and sibling pairs (λ = 0.026, 95% CI 0.020 to 0.031), and least rapidly for parent-offspring pairs (λ = 0.011, 95% CI 0.002 to 0.021), and decreased with time living apart (P = 0.02); see Fig. 3.

Causes of variation across the lifespan
Results from modeling the causes of variation across the lifespan are shown in Fig. 4 and Additional file 1: pairs. For all pairs, the proportion of variation explained by shared environmental factors increased with time living together (P < 0.001) and decreased at a slower rate with time living apart (P = 0.02).
Under the above cohabitation-dependent CE model, we further assumed that the variation is additionally caused by genetic factors whose effects are constant across the lifespan. Genetic factors were estimated to explain 13% (95% CI −10 to 35%) of the variation (P = 0.27). That is, after taking into account the existence of non-genetic cohabitation-dependent effects, there was no evidence for a substantive role of genetic factors.
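As a rough consistency check on these figures, the reported interval and P value can be related through a normal (Wald) approximation. The sketch below is illustrative only: it assumes a symmetric 95% CI on the percentage scale, and the function name `wald_p_from_ci` is hypothetical. The analysis itself compared nested models with likelihood ratio tests, so exact agreement is not expected.

```python
import math


def wald_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Back out a standard error and two-sided P value from a symmetric 95% CI.

    Assumption: the interval is estimate +/- z_crit * SE (normal approximation).
    """
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    se = (upper - lower) / (2.0 * z_crit)       # half-width divided by z
    z = estimate / se                           # Wald statistic against zero
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal tail area
    return se, z, p


# Values taken from the text: 13% (95% CI -10 to 35%).
se, z, p = wald_p_from_ci(13.0, -10.0, 35.0)
print(f"SE = {se:.2f}, z = {z:.2f}, two-sided P = {p:.2f}")
```

Under these assumptions the approximate P value is about 0.26, close to the reported P = 0.27; the small gap is expected given rounding of the interval and the different test used.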

Results for other DNAm age measures
We also similarly studied two other DNAm age measures, a skin and blood clock developed by Horvath et al. [16] and a blood clock developed by Han et al. [17], which were also developed across tissues and/or ages. Overall, DNAm ages predicted by the two measures appeared to be more similar to chronological age than the DNAm age predicted by the Horvath epigenetic clock: within the same study, they had higher correlations with chronological age (Additional file 2: Figure S1, Additional file 3: Figure S2) and lower absolute deviations from chronological age (Additional file 1: Table S4). For both measures, MZ and DZ pairs had similar absolute deviations and residuals in DNAm age adjusted for chronological age. Similar to the DNAm age predicted by the Horvath epigenetic clock, the variance of the DNAm ages predicted by the two measures increased with age in early life and remained relatively constant with age in adulthood (Additional file 4: Figure S3, Additional file 5: Figure S4).
Additional file 1: Table S5 shows the within-study familial correlation results for the two measures. For both measures, similar results to those for the Horvath epigenetic clock were observed: twin pair correlations increased with age from birth to adulthood and decreased with age in adulthood; no evidence that the twin-pair correlations differed by zygosity was observed across the lifespan, except in adolescence and at age 18 years. For both measures, newborn twins were

Discussion
Our study provides novel insights into the causes of variation in DNAm age across the lifespan, which appear to be almost entirely environmental (i.e. non-genetic) factors. These include cohabitation-related environmental factors that are evident prior to adulthood, and whose effects persist across the whole of the lifespan. Two longitudinal studies have also found that DNAm age is largely set before adulthood [18]. Our data suggest that people in the same family are not correlated in DNAm age when they start cohabiting; the longer they live together, the more similar they become but at a rate that differs substantially depending on their relationship. This is likely due to the different types of relatives sharing environmental factors relevant to DNAm age to different degrees. When pairs of relatives live apart, they no longer share the cohabitation environment, and this is reflected by a slow dissipation of the effects of shared environmental factors across adulthood at a rate that appears to be similar for all pairs.
Our study is the first to provide a comprehensive view of the genetic and environmental causes of DNAm age variation across the lifespan. Focusing on limited age ranges or types of relatives might bias the interpretation of the causes. For example, if only middle-aged (e.g., 40-70 years old) twins (i.e., no siblings, parents or spouses) were studied, the higher MZ pair correlation compared with the DZ pair correlation in that age range (see Fig. 3) might have been interpreted as being caused by genetic factors to some extent, as there would be no data from other age ranges or types of relatives contributing to the interpretation. Without data from various types of relatives whose ages cover the whole lifespan, the comprehensive view would not have been easily obtained.
For MZ pairs, some DNA methylation measures have been found to be similar at birth but divergent over the lifetime, a phenomenon called 'epigenetic drift' [15,19].
DNAm age, however, shows a different pattern; MZ pairs are not similar at birth (and neither are DZ pairs) but become more similar the longer they live together, and do so more rapidly than do DZ or other pairs of relatives. In adulthood, MZ pairs then appear to slowly become less similar in DNAm age the longer they live apart, at the same rate as for other pairs of relatives, but still maintain a substantial similarity even into late life. These observations suggest that DNAm age reflects biological aging processes beyond what is reflected by DNA methylation alone.
Our finding that environmental factors shared while cohabiting play a major role in determining the variation in DNAm age is also supported by the observation that the variance of DNAm age increased dramatically with age prior to adulthood and was relatively stable across adulthood (Fig. 2, Additional file 4: Figure S3, Additional file 5: Figure S4). The latter has also been found by previous studies [18].
We investigated DNAm age based on two other pan-tissue/age clocks and found similar results to those for the Horvath clock. These results imply a role of cohabitation-related environmental factors in influencing the variation in these two clocks as well and suggest that our findings are robust to the way DNAm age is measured. The finding that newborn MZ and DZ pairs were not differentially correlated for the two clocks implies an additional role of prenatal environmental factors in influencing the variation in these clocks, similar to what we found for genome-wide average DNA methylation [15].
Given DNAm age has been found to be associated with the risks of death and various diseases in adulthood, identifying the environmental factors affecting DNAm age prior to adulthood might give novel insights into which, and how, early-life factors impact late-life health outcomes. This would have obvious implications for prevention and its timing. There is some evidence that DNAm age is associated with physical developmental characteristics, and exposures to stress and violence for children, although most studies had a moderate sample size [20][21][22][23][24].
Model details. AE model: variation was assumed to be caused by only A and E, and the effects of A are constant across the lifespan; cohabitation-dependent AE model: variation was assumed to be caused by only A and E, and the effects of A depend on cohabitation; cohabitation-dependent ACE model: variation was assumed to be caused by A, C and E, and the effects of A and C both depend on cohabitation; cohabitation-dependent CE model: variation was assumed to be caused by only C and E, and the effects of C depend on cohabitation.
The classic twin design assumes that MZ and DZ pairs share environmental effects relevant to the trait of interest to exactly the same extent, i.e., the equal environment assumption. Our study shows that this assumption might not hold for DNAm age because there was strong evidence that MZ and DZ pairs share their pre-adult environmental effects to different extents. Furthermore, DZ and sibling pairs were more correlated than parent-offspring pairs, despite all being genetically first-degree relatives of one another; this is not consistent with the correlations predicted by additive genetic factors.
Given that there is no substantive evidence of genetic effects, our results are not consistent with gene-environment interaction either [25]; we found that models including genetic effects, whether constant or cohabitation-dependent, were less consistent with the data than the cohabitation-dependent CE model.
Previous twin and pedigree studies assumed that the equal environment assumption holds perfectly and consequently reported the heritability of DNAm age to be ~40% in adolescence and middle age [6,9,12]. Note that under our cohabitation-dependent AE model (which makes the equal environment assumption), genetic factors would explain ~40% of variation in adolescence and middle age. This model, however, was not a good fit and was rejected in favor of models that included cohabitation-dependent environmental effects.
Studies have predicted that measured SNPs could explain 0-70% of variation in DNAm age measured from whole blood and brain tissue [7][8][9][11]. Those analyses explicitly assumed, however, that all of the phenotypic covariance is due to genetic factors. In particular, one study predicted that the SNP-based heritability of DNAm age based on mothers and children increased with the children's age, being zero when the children were around birth and 37% when the children were 15 years old [7], in line with our data and the estimates under the cohabitation-dependent AE model that was rejected. Without relying on the equal environment assumption, we found that genetic factors explained at most a small, and not statistically significant, proportion (~10%) of variation. Therefore, studies using the equal environment assumption might have overestimated the influence of genetic factors on DNAm age variation.
Our study has several strengths. One strength is that we included participants whose ages covered the whole lifespan, so we could provide insights into the genetic and environmental causes of DNAm age variation that cannot be provided by studies focusing on specific ages only. Another strength is that we have a substantial sample size, even within studies, so we can detect moderate correlations with good precision and have the power to distinguish between different variance components models. Our findings should be interpreted with caution, given that they are from statistical modeling, which alone cannot prove that a consistent model is a true representation of nature. All that can be said is whether or not the data 'are consistent with' a particular explanation. Nonetheless, statistical modeling is an attempt to identify the plausible and implausible explanations of data, and our results suggest that cohabitation environmental factors being shared by pairs of relatives to different extents are more plausible than genetic explanations.

Conclusions
The variation in epigenetic aging across the lifespan is most consistent with having been caused, at least to a large extent, by environmental factors, including those shared to different extents by relatives while living together. The effects of the cohabitation environment increase with the time living together and persist into old age. The equal environment assumption of the classic twin study might not hold for epigenetic aging. Given the relationships between DNAm age and health outcomes, these findings highlight the importance and potential of pre-adulthood prevention related to environmental factors for adult diseases and biological aging.

Study sample
We analyzed genome-wide DNA methylation data from 10 studies, most of which were accessed through pub-  (Table 1 and Additional file 1).

Data preprocessing
As several datasets on public repositories contained quality-controlled and preprocessed data only, we were unable to apply the same preprocessing methods across datasets. We used the study-specific data preprocessing methods to address study-specific technical variations. This design allows us to investigate true biological signals independent of any bias introduced from a unifying data preprocessing approach. In DNAm age calculation, we chose the 'Normalize Data' option of the online calculator to normalize each dataset to be comparable to the training data of this epigenetic clock.

DNAm age and epigenetic age acceleration
We used the Horvath epigenetic clock [12] to determine DNAm age (https://dnamage.genetics.ucla.edu/new) because it was developed across tissues and ages, and the 353 methylation sites used by this clock are common to the three methylation arrays used by the 10 studies (Table 1).
To adjust for the effects of chronological age on DNAm age, we studied epigenetic age acceleration, calculated as the residuals from a linear regression of DNAm age on chronological age. This calculation was done for each longitudinal measurement of the PETS 450K dataset and of the LSADT, for each generation of the BSGS, and for each age group of the DTR. For the PETS 27K dataset, DNAm age was standardized to have zero mean and unit variance for each type of biological sample, and the average standardized DNAm age across biological samples was used to calculate epigenetic age acceleration.
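The residual-based definition of epigenetic age acceleration can be sketched as follows. This is a minimal illustration using ordinary least squares within a single group; the function name `age_acceleration` and the toy values are hypothetical and are not data or code from the study.

```python
from statistics import mean


def age_acceleration(dnam_age, chron_age):
    """Residuals from a simple linear regression of DNAm age on chronological age.

    Intended to be applied separately within each group (e.g., one measurement
    wave, generation, or age group), as described in the text.
    """
    if len(dnam_age) != len(chron_age) or len(dnam_age) < 2:
        raise ValueError("need two equal-length sequences with at least two samples")
    x_bar, y_bar = mean(chron_age), mean(dnam_age)
    sxx = sum((x - x_bar) ** 2 for x in chron_age)
    if sxx == 0:
        raise ValueError("chronological age must vary within the group")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Positive residual: DNAm age is older than expected for chronological age.
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]


# Hypothetical toy values for one group.
chron = [10.0, 12.0, 14.0, 16.0, 18.0]
dnam = [11.5, 12.0, 15.5, 15.0, 19.0]
acceleration = age_acceleration(dnam, chron)
print([round(a, 2) for a in acceleration])
```

By construction the residuals average to zero and are uncorrelated with chronological age within the group, which is what makes them usable as an age-adjusted measure.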
Sensitivity analyses were performed using only those studies in which DNA methylation was measured in blood to examine the robustness of results to cell composition. Naive CD8+ T cells, exhausted CD8+ T cells, plasmablasts, CD4+ T cells, natural killer cells, monocytes, and granulocytes estimated from the DNA methylation data [12,26] were additionally adjusted for in calculating epigenetic age acceleration.
We studied two other DNAm age measures that were also developed across tissues and/or ages and so might also be applicable to our data. One is the skin and blood clock developed using multi-tissue methylation data of a sample aged 0-94 years [16]. As some of the 391 methylation sites used by this clock were not included in the PETS 450K and 27K datasets, these datasets were not included in its analysis. The other measure was developed by Han et al. [17] using a sample aged 1-101 years. As this measure was developed using HM450K array blood methylation data, non-blood or 27K datasets were not included in its analysis.

Statistical analysis
Residuals of epigenetic age acceleration adjusted for sex were used in subsequent analyses. We used a multivariate normal model for pedigree analysis [27,28] and the program FISHER [29] to estimate correlations for different types of pairs (MZ, DZ, sibling, parent-offspring and spouse) and to fit variance components models. The likelihood ratio test was used to compare nested models. All P values were two-sided, and P < 0.05 was considered significant.
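For readers unfamiliar with the likelihood ratio test for nested models, a minimal sketch is given below. It is not the FISHER implementation; the function names are hypothetical, the log-likelihood values are made up for illustration, and the usual chi-square reference distribution is assumed (which can be conservative when a tested parameter lies on the boundary of its range).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1)),
    # where chi = sqrt(x) and phi is the standard normal density.
    chi = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + 2.0 * density * total


def likelihood_ratio_test(loglik_full, loglik_reduced, extra_params):
    """Return (statistic, P value) for a reduced model nested in a full model."""
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods; the full model has one extra free parameter.
stat, p_value = likelihood_ratio_test(-1000.0, -1001.920729, 1)
print(f"LRT statistic = {stat:.3f}, P = {p_value:.3f}")
```

A large P value indicates that the reduced model fits no worse than the full model, which is the sense in which the simpler models were retained in the Results.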
According to the pattern in familial correlations by chronological age, and following previous theoretical and empirical studies [15,27,30], the familial correlations across the lifespan were modeled as a function of the cohabitation status of the pair. The modeling was performed using the pooled data across all studies. Study-specific variance in the residuals was used in analysis. For individuals i and j from the same family, their correlation was modeled as a function of t and t_0 (defined below) with parameters θ, λ, and υ, where 0 ≤ θ ≤ 2, and λ, υ ≥ 0.
Under this model, the correlation when the pairs start to live together is θ − 1, and λ and υ are the rates at which the familial correlation increases with the length of cohabitation and decreases with the length of separation, respectively. The definitions of t and t_0 depend on the relationship between i and j: (1) for twin pairs, t = chronological age and t_0 = 18 years; (2) for sibling pairs, t = chronological age of the younger sibling and t_0 = chronological age of the younger sibling when the older sibling was 18 years old; (3) for parent-offspring pairs, t = chronological age of the offspring and t_0 = 18 years; and (4) for spouse pairs, t = time in years since the pair married (assumed to be the average chronological age of the pair minus 24 years) and t_0 = time in years when the pair became separated (if known).
For individuals i and j from the same family, their covariance was modeled as a function of t and t_0, where α, β_A, β_C, λ_A, λ_C, υ_A, υ_C ≥ 0, and the definitions of t and t_0 are the same as above.
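To make the roles of θ, λ, υ, t, and the separation time concrete, the sketch below implements one functional form that matches the properties stated above. The exponential form is an assumption made here for illustration, since the equation itself is not reproduced in this text: the correlation is taken as θ − exp(−λt) while the pair cohabits, and as that value at separation multiplied by exp(−υ × time apart) afterwards. The function name and the value of υ in the example are hypothetical.

```python
import math


def familial_correlation(t, t0, theta, lam, upsilon):
    """Cohabitation-dependent correlation under an ASSUMED exponential form.

    t       : time variable for the pair (e.g., age for twins), in years
    t0      : time at which cohabitation ends (use math.inf if still cohabiting)
    theta   : theta - 1 is the correlation at the start of cohabitation
    lam     : rate of increase while living together
    upsilon : rate of decrease while living apart
    """
    if not 0.0 <= theta <= 2.0 or lam < 0 or upsilon < 0:
        raise ValueError("require 0 <= theta <= 2 and lam, upsilon >= 0")
    if t <= t0:
        return theta - math.exp(-lam * t)
    at_separation = theta - math.exp(-lam * t0)
    return at_separation * math.exp(-upsilon * (t - t0))


# theta = 1 and lam = 0.041 are the MZ values reported in the Results;
# upsilon = 0.01 is a made-up value used only to show the shape of the curve.
for age in (0, 9, 18, 40, 70):
    rho = familial_correlation(age, 18, theta=1.0, lam=0.041, upsilon=0.01)
    print(age, round(rho, 3))
```

With θ = 1 the curve starts at zero, rises while the pair lives together, and decays slowly after separation, which is the qualitative pattern described for all pair types.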
We assumed that the variation of DNAm age can be caused by combinations of additive genetic factors (A), shared environmental factors (C), and individual-specific environmental factors (E). We assessed model fits using the Akaike information criterion (AIC) for the following models and assumptions:

1. AE model: variation is caused by only A and E; the effects of A are constant across the lifespan; α = 2 × kinship coefficient, β_A, β_C, λ_A, λ_C, υ_A, υ_C = 0, and σ_A^2 is free to be estimated.
2. Cohabitation-dependent AE model: variation is caused only by A and E; the effects of A depend on