Age patterns of intra‐pair DNA methylation discordance in twins: Sex difference in epigenomic instability and implication on survival

Abstract Aging is a biological process linked to specific patterns and changes in the epigenome. We hypothesize that age‐related variation in the DNA methylome could reflect cumulative environmental modulation to the epigenome which could impact epigenomic instability and survival differentially by sex. To test the hypothesis, we performed sex‐stratified epigenome‐wide association studies on age‐related intra‐pair DNA methylation discordance in 492 twins aged 56–80 years. We identified 3084 CpGs showing increased methylation variability with age (FDR < 0.05, 7 CpGs with p < 1e‐07) in male twins but no significant site found in female twins. The results were replicated in an independent cohort of 292 twins aged 30–74 years with 37% of the discovery CpGs successfully replicated in male twins. Functional annotation showed that genes linked to the identified CpGs were significantly enriched in signaling pathways, neurological functions, extracellular matrix assembly, and cancer. We further explored the implication of discovery CpGs on individual survival in an old cohort of 224 twins (220 deceased). In total, 264 CpGs displayed significant association with risk of death in male twins. In female twins, 175 of the male discovery CpGs also showed non‐random correlation with mortality. Intra‐pair comparison showed that majority of the discovery CpGs have higher methylation in the longer‐lived twins suggesting that loss of DNA methylation during aging contributes to increased risk of death which is more pronounced in male twins. In conclusion, age‐related epigenomic instability in the DNA methylome is more evident in males than in females and could impact individual survival and contribute to sex difference in human lifespan.


| INTRODUC TI ON
The epigenetic regulation of gene activity during the aging process has been intensively investigated during the past decade taking advantage of the rapid development in high-throughput techniques for DNA methylation (DNAm) analysis at genome scale. Multiple epigenome-wide association studies (EWAS) have been performed to examine the progressive changes in DNAm accompanying aging.
Large numbers of differentially regulated CpG sites have been reported which suggest that epigenetic changes have a strong influence on the aging processes (Li et al., 2017;Moore et al., 2015;Reynolds et al., 2020;Tan et al., 2016). Efforts have been made to find sex-specific patterns of age-related changes in the level of DNAm but with no obvious difference between the two sexes on autosomal chromosomes (Jansen et al., 2019;Tan et al., 2016), although a recent analysis reported sex-dependent methylation patterns by age on the X-chromosome (Li et al., 2020).
The identified age-dependent changes in DNAm are directional with CpGs showing increased and decreased methylation (hyperand hypomethylation, respectively) patterns occupying specific genomic regions enriched for different biological functions (Li et al., 2017). The reported mean levels of hyper-or hypomethylation are non-stochastic with a predominant pattern of hypomethylation observed in the older subjects. For example, Li et al., (2017) reported a significantly higher proportion of hypomethylation (61%) with age in two large Lothian Birth Cohorts (Deary et al.,2012) and Johansson et al. (2013) reported an even higher proportion of hypomethylation (64%), re-affirming the notion that the aging-associated epigenetic modification is characterized by gradual and extensive demethylation of the DNA methylome (Zampieri et al., 2015).
Instead of focusing on the mean levels of age-dependent change in DNAm, examining the variability of DNAm across individuals could better reflect individual aberration of DNAm which introduces genomic instability that potentially affects lifespan and healthspan in a sex-specific manner (Fischer & Riddle, 2018). As individual methylation levels are subject to changes due to genetic and environmental factors, comparison across individuals is a challenging task due to individual heterogeneity and differential exposures. Although individual difference in genetic make-ups can be controlled by taking repeated measurements on DNAm over ages in a longitudinal design, it is time-consuming and expensive. Twin pairs, especially monozygotic (MZ) twin pairs who share the same genetic make-ups, are ideal samples to fulfill the need (Tan et al., 2013, Tan, 2019. Intra-pair DNAm difference in MZ twins can be compared across twin pairs of different ages to infer age-related change in DNAm variability as twins in a pair are of the same age. This can be done in male and female MZ twins separately to examine sex differences in age-related DNAm variability as MZ twin pairs are of the same sex. In fact, (Fraga et al.2005) already reported increased discordance in DNAm in a 50-year-old MZ pair when compared with that in a 3-year-old MZ pair. At the Danish Twin Registry (Pedersen et al., 2019), we have over the years accumulated genome-wide DNAm data on multiple cohorts of twins enabling us to unprecedentedly analyze site-specific age-dependent intra-pair DNAm variation in twins. The multiple twin cohorts allowed us to perform sex-stratified discovery and replication analyses on independent samples. The identified significant sites were further characterized for their impact on mortality in cohorts of elderly twins in combination with bioinformatics analysis for functional interpretations.

| The replication cohort
The replication cohort consisted of 292 samples (152 males and 140 females) from 146 pairs of MZ twins aged 30-74 years with a median age of 57 years (Table 1). DNAm data were collected using the same 450K array as in the discovery cohort and were originally generated in connection with an EWAS on birth-weight discordance (BWD) reporting no significant findings (Tan et al., 2014). Both raw and processed DNA methylation data have been deposited to the NCBI GEO database http://www.ncbi.nlm.nih.gov/geo/ under accession number GSE61496.

| Cohorts for mortality analysis
The samples used for the mortality analysis came from the

| DNAm data preprocessing
For each dataset of Danish twins, normalization on the measured DNA methylation level was performed by the functional normalization (Fortin et al., 2014) implemented in the R package minfi. Probes with a detection p value (a measure of an individual probe's performance) >0.01 were treated as missing. CpG sites with more than 5% missing values were removed from the study. After quality control and filtering, a total of 292376 CpGs remained. At each CpG site,

| Adjusting for blood cell composition
Since DNA methylation was measured in whole blood comprising multiple cell types, cellular heterogeneity among twin samples can be an important factor influencing DNAm due to cell specificity of DNA methylation. To control for the effect of cell type composition, the proportions of major leukocyte cell types were estimated using the Houseman method (Houseman et al., 2012) implemented in the R package minfi. Based on the DNA methylation data, the method estimated blood cell composition in each individual twin for 6 blood cell types: CD8T, CD4T, natural killer cells, B cells, monocytes, and granulocytes. In the data analysis, we first regressed DNAm M value on the estimated cell type proportions and then kept the residual for each CpG for downstream statistical analysis.

| Association of age with DNAm variability
The age-dependent DNAm variability is assessed by modeling the difference in DNAm within a MZ twin pair (absolute value) as a function of age in a simple linear regression model as By testing the null hypothesis Ho: β 1 = 0, the age-related change in DNAm variability can be detected if β 1 is significantly different from 0, with β 1 > 0 or β 1 < 0 indicating increased or decreased variability in DNAm with age (two-sided test). In order to assess sex differences in DNAm variability with age, we fitted the above model to male and female twin pairs separately. In the discovery stage, statistical significance of the CpGs was determined by calculating the false discovery rate (FDR) (Benjamini & Hochberg, 1995) and CpGs with FDR<0.05 were defined as genome-wide significant. Top significant CpGs were defined by Bonferroni corrected p values depending on the number CpGs tested.

| Over-representation analysis
Over-representation analysis (ORA) is used to assess if a submitted list of significant CpGs from a replication cohort contains CpGs overlapping with significant CpGs from the discovery cohort more than would be expected by chance by calculating the probability from a hypergeometric distribution, that is, where N is the number of all CpGs on the 450k array, m is the number of significant CpGs found in the replication cohort, n is the number of significant discovery CpGs, and k is the number of overlapping CpGs.

| Biological pathway analysis
To test if genes in one biological pathway are over-represented by genes annotated to a list of significant CpGs, a gene-set enrichment analysis (GSEA) was performed. By replacing N in the equation above with the number of all genes linked to the CpGs on the 450K array, n with a list of genes belonging to biological pathway in N, m with a list of genes linked to all significant CpGs, k with the number of overlapping genes between n and m, a hypergeometric probability can be calculated. GSEA was performed as ORA of canonical pathways at https://www.gsea-msigdb.org/gsea/index.
jsp. Annotation of CpGs was done using the Bioconductor package: IlluminaHumanMethylation450kanno.ilmn12.hg19.  Table S1). In contrast, no significant CpGs were found in female samples (the lowest FDR detected 0.138, corresponding p value 4.69e-07) as shown by the volcano plots in Figure 1. Using Bonferroni correction, we selected CpGs with p < 1e-07 as top significant sites that meet both criteria. In males, we identified 7 CpGs with p < 1e-07, which are linked to the ELFN1, C1QL4, FAM19A1, and SULF2 genes, but no CpG from female twins (Table 2). In Figure S1, the p value of each CpG site is plotted against its base pair position in the Manhattan plots for males (S1a) and females (S1b) displaying much higher statistical significance in male than in female samples. The agerelated change in intra-pair DNAm discordance is plotted in Figure 2 for the top 12 significant CpGs (p ≤ 2.17e-07) in Table 2 with significant increase in DNAm variability displayed for all CpGs. In Figure S2, the distribution of the 3089 significant CpGs is shown over gene regions (S2a) and relative location to CpG islands (S2b). The distribution (red curve) is, overall, not different from that of all CpGs (blue curves) on the 450K array.

| Sensitivity analysis of discovery EWAS
In Table 1, the number of male twins is larger than the number of female twins in the MADT cohort (266 vs 226). To ensure that the pattern of sex difference in Figure 1 is not due to differences in sample size, we performed a sensitivity analysis by re-doing the

| Replication using an independent twin cohort
For replication, the same model for discovery EWAS was applied to the BWD twins with two purposes, (1) replicating the sex-specific patterns in Figure 1 and (2) replicating the 3089 discovery CpGs. In Figure 3, the replication volcano plots display more significant CpGs in male as compared with female samples. The pattern in Figure 3 is consistent with the discovery pattern in Figure 1 but with lower statistical F I G U R E 2 Scatter plots of intra-pair DNAm discordance of each twin pair plotted against age of twin pair for top 12 CpGs significant in males with FDR ≤ 0.005 significance due to a smaller sample size and different age spans. For the 3089 significant discovery CpGs, 2620 CpGs were matched to BWD data because of differences in which CpGs passed through QC, among which 975 were replicated with p < 0.05 (4 CpGs with p < 1e-05, 2 CpGs with p < 1e-6) and same direction of age-related change (corresponding to a replication rate of 37%). A hypergeometric test showed very high statistical significance of the overlap with p < 1e-16. Figure 4 plots the change in intra-pair DNAm discordance by age for the 2620 discovery CpGs matched to BWD data against that for BWD. The replicated CpGs (red dots) exhibit high correlation on agerelated DNAm variability between the two independent cohorts.
Among the top 7 discovery CpGs with p < 1e-07, 5 were matched to the BWD data of which 3 were replicated with p < 0.05 and same direction of age-related change.

| Functional annotations
Based on the biological annotation in Table S2, we performed functional analysis of the significant discovery CpGs. The genes linked to significant sites were submitted to GSEA websites for overrepresentation analysis of KEGG pathways. Twenty-one gene sets (

| Association with mortality
For the 3089 discovery CpGs, 2906 CpGs were matched to the LSADT methylation data. We fitted Cox regression models to the 2906 CpGs adjusting for age at blood sampling and intra-pair correlation as described in the Materials and Methods section. In male twins, 264 CpGs showed significant association with risk of death with p < 0.05 (Table S3). One CpG even meets significance after Bonferroni correction (p < 1.72e-05). A binomial test with a type I error rate of 0.05 as the hypothesized probability of success showed that the correlation with mortality by the discovery CpGs is extremely unlikely by chance (p < 2.2e-16). In female samples, 175 CpGs were correlated with mortality with p<0.05, 1 CpG with a p value below Bonferroni significance (p = 3.24e-06) (Table S3).
Though the discovery CpGs are from male MADT samples, a binomial test still showed a p value of 0.013 indicating a significant, non-random implication of these CpGs in female survival. In Figure   S4,

| DISCUSS ION
We have conducted an EWAS on age-related intra-pair DNAm discordance in MZ twins to identify CpGs with high variability in their

TA B L E 3
Over-represented KEGG pathways by genes linked to significant CpGs (FDR < 0.05) in male twins chronological age for assessment across ages; (2) individual genetic make-ups are perfectly matched for effectively controlling potential genetic influences on DNAm level so that the identified age-related DNAm variability patterns are independent of individual genetic variations and are purely associated with environmental factors or stochastic events. This is highly relevant with respect to mortality as it has been estimated that genetic variation only account for around 25% of human lifespan variation (Herskind et al., 1996); (3) with identical twin pairs, covariates such as anthropometric measurements can be canceled by studying intra-pair difference on methylation.
As a result, the use of MZ twin pairs helped to enrich the power of this study.
The increased intra-pair DNAm variability with age was already reported by Fraga et al., (2005) and Boks et al., (2009) Figure 1) suggests that the female epigenome could be more stable and less vulnerable during the aging process which may help to maintain genomic stability and expand female healthspan as well as lifespan. The fact that the observed sex difference was replicable in an independent sample (Figure 3) suggests that the age-related DNAm variability is stable across samples, while the high replication rate of 37% in an independent twin cohort indicates that specific CpGs might be more responsive to environmental stimuli in males than in females. In Figure 1, nearly all significant CpGs (FDR < 0.05) are characterized by increased intra-pair DNAm discordance or epigenetic variability which could reflect the accumulative effects of environmental exposure during the life course which is, as shown by F I G U R E 5 Association with mortality by discovery CpGs in old LSADT twins shown by plotting negative log of the p value (base 10) for each CpG site (y-axis) against intra-pair L-S difference in DNAm for male (a) and female (b) twins and by plotting Cox regression coefficient of each CpG site against intra-pair L-S difference in DNAm for male (c) and female (d) twins. The colored dots are CpGs overlapping CpGs between male and female samples with L-S>0 (red dots) or L-S<0 (green dots) our results, more pronounced in males than in females perhaps due to biological mechanisms.
In the discovery EWAS for female twins (Table S1) ing for 28%). The pattern is also displayed in Figure 1b suggesting that the age-related epigenetic instability is also observed in females but not as striking as in males (Figure 1a).
In  , 2016). Differential methylation in the intronic region of the C1QL4 gene has been found in a EWAS of coronary artery disease patients (Sharma et al., 2014). FAM19A1 mRNA expression is restricted to the central nervous system, and the expression level of the gene correlates with brain development and neurogenesis (Zheng et al., 2018). Genetic variation of FAM19A1 has been found to associate with human longevity in a large scale genome-wide association study (Zeng et al., 2018). Decreased expression of SULF2 was found in Alzheimer's disease patients indicating its implication in regulating neuronal signaling (Roberts et al., 2017).
Biological pathway analysis by GSEA (Table 3) showed enriched KEGG pathways involved in cancers, multiple signaling pathways and metabolism, underlying major aging-associated diseases and traits.
Interestingly, the significantly enriched biological pathways (Table 3) contain a number of important evolutionarily well-conserved signal transduction pathways of aging, that is, the WNT (Wnt) and Hedgehog (Hh) signaling pathways. Effective interactions within key signal transduction networks determine success in embryonic organogenesis, and postnatal tissue repair throughout developmental stages and aging (Carlson et al., 2008). Imbalances within these pathways during aging could be speculated to be responsible for age-related degenerative diseases and phenotypes. Both the Wnt and Hh pathways have shown implications in a variety of cancers (Taipale & Beachy, 2001), and it is possible that aberrant Hh pathway activation manifests cancer phenotypes by directly altering normal Wnt signaling through Wnt-Hh cross-talk (Carlson et al., 2008).
By performing survival analysis on a cohort of elderly twins (LSADT cohort), we were able to show the extensive implication of the identified age-related variability CpGs in individual's risk of death in male twins. Interestingly, the variability CpGs detected in males also showed non-random effects on female survival both in survival analysis by Cox regression and in intra-pair differential methylation comparison between longer-and shorter-lived twins. In It is hypothesized that the DNA methylome, evolved to increase stability of the differentiated state in somatic cells may have helped to increase longevity (Mendelsohn & Larrick, 2013). Considering the fact that a very large proportion of CpG sites in the genome have methylated cytosines (75-85%) (Kojima et al., 2018), the CpGs with L-S>0, which account for most of the significantly variable CpGs, could indicate that decreased DNAm or loss of control over gene activity is harmful to survival. The observed significant involvement in mortality by the highly variable CpGs in males could potentially help to explain the female survival advantage. Given the limited genetic contribution to human lifespan (Herskind et al., 1996;Ruby et al., 2018), studying the environment-mediated epigenetic instability in aging may hold the key to promoting healthy aging and extending human life span.

| CODE AVAIL AB ILIT Y
All R codes are used for calculation and data analysis are available upon contact with the corresponding author at qtan@health.sdu.dk.

ACK N OWLED G M ENT
This study was supported by grants from DFF research project

CO N FLI C T O F I NTE R E S T S
None declared.