Epigenetic-aging-signature to determine age in different tissues.

All tissues of the organism are affected by aging. This process is associated with epigenetic modifications such as methylation changes at specific cytosine residues in the DNA (CpG sites). Here, we have identified an Epigenetic-Aging-Signature which is applicable to many tissues to predict donor age. DNA-methylation profiles of various cell types were retrieved from public data repositories, all generated with the HumanMethylation27 BeadChip platform, which represents 27,578 CpG sites. Five datasets from dermis, epidermis, cervical smear, T-cells and monocytes were used for Pavlidis Template Matching to identify 19 CpG sites that are continuously hypermethylated upon aging (R > 0.6; p-value < $10^{-13}$). Four of these CpG sites (associated with the genes NPTX2, TRIM58, GRIA2 and KCNQ1DN) and an additional hypomethylated CpG site (BIRC4BP) were implemented in a model to predict donor age. This Epigenetic-Aging-Signature was tested on a validation group of eight independent datasets corresponding to several cell types from different tissues. Overall, the five CpG sites revealed age-associated DNA-methylation changes in all tissues. The average absolute difference between predicted and real chronological age was about 11 years. This method can be used to predict donor age in various cell preparations, for example in forensic analysis.


INTRODUCTION
Aging has different consequences in different tissues: it results, for example, in wrinkle formation of dermis, graying of epidermally-derived hair, loss of bone formation, myeloid bias of blood, and compromised function of the immune system [1]. Despite this wide spectrum of tissue-specific age-associated changes, the underlying molecular mechanisms might be related. Aging has been associated with accumulation of cellular defects such as DNA damage and telomere shortening. On the other hand, there is accumulating evidence that aging rather resembles a developmentally regulated process which is tightly controlled by specific epigenetic modifications [2][3][4][5][6][7][8].
Among epigenetic modifications, DNA methylation is best characterized. CpG dinucleotides in the mammalian genome can be enzymatically methylated at cytosines, and many studies demonstrated the occurrence of age-associated modifications in the DNA-methylation pattern [9][10][11][12]. Recently, this research gained further momentum through newly available technologies such as microarray platforms [13]. Among these, the HumanMethylation27 BeadChip facilitates simultaneous analysis of 27,578 CpG sites which are associated with promoter regions of more than 14,000 annotated genes [14]. Previously, we used this microarray for analysis of age-associated DNA-methylation changes in mesenchymal stem cells (MSC) and fibroblasts [4,15,16]. Despite in vitro culture for several weeks, these DNA-methylation profiles still reflected age-associated changes that relate to their donors, but this regulation differed markedly between MSC and fibroblasts, indicating cell type specificity. Many other authors have used this platform to determine age-associated changes in primary tissues including dermis [17], epidermis [17], blood [11,18], cord blood [19,20] and cervical smear [21]. Recently, Bocklandt et al. described a predictor of age for saliva samples which was generated from a dataset of 34 male twin pairs [22]. Based on three CpG sites associated with the genes neuronal pentraxin II (NPTX2), EDAR-associated death domain (EDARADD) and target of myb1 (chicken)-like 1 (TOM1L1), they were able to predict donor age in independent saliva samples [22]. Overall, age-associated DNA-methylation changes are highly reproducible, but most of them seem to be tissue-specific [12,23].
On the other hand, some age-associated DNA-methylation changes do not appear to be tissue-specific: Teschendorff and co-workers have identified a specific subset of 69 CpGs which are associated with polycomb group protein target genes and which revealed age-associated changes. Notably, they described similar modifications in seven independent datasets including normal and cancerous tissues as well as cultured MSC [21]. Furthermore, 10 CpG sites were identified as age-associated in both saliva and blood samples [11,22]. It is conceivable that such cell-type-independent age-associated changes are of central relevance for the underlying process, and they might facilitate age-predictions in heterogeneous cell preparations. Therefore, we have combined several published DNA-methylation datasets to elaborate an Epigenetic-Aging-Signature which can be used for age-predictions across different tissues.

Selection of DNA-methylation datasets
For this study, we have combined several datasets which were retrieved from public data repositories. We have only considered datasets that 1) used the same Infinium HumanMethylation27 BeadChip platform, 2) were generated with freshly isolated cells to exclude effects of culture expansion, 3) used non-cancerous material, since malignant transformation might influence age-related changes, and 4) provided reliable information about donor age. DNA-methylation datasets of 13 different cell types or tissues were used: 5 datasets were implemented as a training-set for identification of the Epigenetic-Aging-Signature and 8 datasets were reserved for subsequent validation (Table 1). For each of the 27,578 CpG sites the percentage of DNA-methylation was provided as a beta-value ranging from 0 to 1. Overall, the distribution of DNA-methylation levels was similar in all samples of the training-set as determined by quantile analysis of beta-values. There was no clear association between global methylation level and donor age (Figure 1A). Several studies demonstrated that the global DNA-methylation level decreases upon aging [24][25][26]. However, the HumanMethylation27 BeadChip represents specific CpG sites which are predominantly associated with promoter regions, and this might be the reason why global loss of DNA-methylation was not observed.

Various CpG sites reveal age-associated hypermethylation
Subsequently, we used Pavlidis Template Matching (PTM) [27] to identify CpG sites whose methylation level correlated with donor age across the five datasets of the training-group. A template was specified according to the donor age (relative values between 0 and 1), and the beta-values of each CpG site were then compared to this template to identify CpG sites with either continuous hypermethylation or hypomethylation upon aging (Pearson correlation). Initially, we used very stringent parameters with a Pearson correlation coefficient R of more than 0.6 (corresponding to a p-value < $10^{-13}$). 19 CpG sites passed this criterion, and notably all of them revealed hypermethylation upon aging (Figure 1B). These methylation changes might be influenced by the varying distribution of samples across age groups. To analyze whether the 19 CpG sites also revealed age-associated changes within individual datasets, we performed PTM analysis for each dataset separately, and in most cases this resulted in a similar correlation (Table 2). Subsequently, we used a less stringent cut-off of R > 0.4 (p-value < $10^{-5}$), resulting in age-associated hypermethylation at 431 CpG sites, whereas 25 CpG sites were hypomethylated. This is in line with previous reports that demonstrated predominant hypermethylation at specific sites upon aging, whereas hypomethylation might be less tightly regulated [11,21,22]. Taken together, several CpG sites revealed continuous age-associated methylation changes across all tissue types.
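The template-matching step can be illustrated with a minimal sketch. It is not the MeV implementation: it only assumes that each CpG site is scored by its Pearson correlation with an age template rescaled to values between 0 and 1, and that sites beyond a cut-off are kept. The function names, the probe labels and the toy beta-values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile cannot follow the template
    return cov / sqrt(var_x * var_y)


def age_template(ages):
    """Rescale donor ages to relative values between 0 and 1."""
    lo, hi = min(ages), max(ages)
    if hi == lo:
        raise ValueError("ages must not all be identical")
    return [(a - lo) / (hi - lo) for a in ages]


def match_template(betas_by_cpg, ages, r_cutoff=0.6):
    """Split CpG sites into hyper- and hypomethylated sets by correlation with age."""
    template = age_template(ages)
    hyper, hypo = {}, {}
    for cpg, betas in betas_by_cpg.items():
        r = pearson_r(template, betas)
        if r > r_cutoff:
            hyper[cpg] = r
        elif r < -r_cutoff:
            hypo[cpg] = r
    return hyper, hypo


# Toy data (hypothetical): one beta-value per donor for each CpG site.
ages = [20, 35, 50, 65, 80]
toy_betas = {
    "cpg_up": [0.10, 0.18, 0.27, 0.33, 0.42],
    "cpg_down": [0.80, 0.74, 0.69, 0.61, 0.55],
    "cpg_flat": [0.50, 0.48, 0.52, 0.49, 0.51],
}
hyper, hypo = match_template(toy_betas, ages)
print(sorted(hyper), sorted(hypo))
```

Lowering `r_cutoff` to 0.4 corresponds to the less stringent threshold described in the text; because Pearson correlation is invariant to linear rescaling, the rescaled template gives the same R as the raw ages.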

Identification of the Epigenetic-Aging-Signature
Next, we selected a subset of CpG sites to be integrated into the Epigenetic-Aging-Signature. To this end, we chose CpGs which correlated with donor age across the whole training-set as well as in individual datasets. Another criterion was the variation in DNA-methylation level between young and elderly donors, as larger changes are less prone to technical noise. Comparison of age-predictions in the training-set led us to four hypermethylated CpG sites corresponding to tripartite motif-containing 58 (TRIM58; cg07533148), KCNQ1 downstream neighbor (KCNQ1DN; cg01530101), neuronal pentraxin II (NPTX2; cg1279989) and glutamate receptor ionotropic AMPA 2 (GRIA2; cg25148589). We reasoned that predictions might be more robust with additional consideration of a hypomethylated CpG site. Therefore, we have also included XIAP associated factor-1 (BIRC4BP; cg23571857) despite a lower correlation coefficient (R = -0.45; p = $9.76 \times 10^{-8}$). Selection of CpG sites was irrespective of gene function, as it has been shown that site-specific methylation changes are hardly associated with differential gene expression [4,15,18]. Furthermore, we observed age-associated hyper- and hypomethylation in the same promoter region, for example in KCNQ1DN (Figure 2). Notably, the epigenetic age predictor for saliva samples by Bocklandt and co-workers also included the CpG site corresponding to NPTX2 [22], and TRIM58 as well as GRIA2 were among their 88 age-related CpG sites. This overlap is remarkable since these authors used different bioinformatic methods and their data were not included in our training-set.
For each CpG site we performed a linear regression analysis: the beta-values were plotted against donor age for all samples of the training-set (Figure 3A). Based on these equations we could inversely calculate donor age for each given beta-value. The mean of the five predictions of the Epigenetic-Aging-Signature was then used to estimate donor age. When we combined all five CpG sites, the predictions deviated from the real age by ± 9.3 years on average (Figure 3B). Alternatively, we focused only on the three CpG sites with the most significant age-associated correlation (NPTX2, GRIA2 and KCNQ1DN); even this smaller subset facilitated an average precision of ± 10.3 years in the training-set (Figure 3C).
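How such an average precision can be computed is sketched below. The sketch assumes, in line with the abstract, that precision is the mean absolute difference between predicted and real age, and that the per-site age estimates are already available. All numbers are invented toy values and the function names are hypothetical.

```python
SIGNATURE = ("NPTX2", "GRIA2", "KCNQ1DN", "TRIM58", "BIRC4BP")
TOP_THREE = ("NPTX2", "GRIA2", "KCNQ1DN")


def combined_prediction(site_predictions, sites):
    """Average the per-site age estimates of one donor over the chosen CpG sites."""
    values = [site_predictions[site] for site in sites]
    return sum(values) / len(values)


def mean_absolute_difference(predicted, actual):
    """Mean absolute difference between predicted and real ages, in years."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy per-site age estimates (years) for three hypothetical donors.
donors = [
    {"NPTX2": 20, "GRIA2": 30, "KCNQ1DN": 40, "TRIM58": 30, "BIRC4BP": 40},
    {"NPTX2": 40, "GRIA2": 50, "KCNQ1DN": 45, "TRIM58": 55, "BIRC4BP": 35},
    {"NPTX2": 80, "GRIA2": 70, "KCNQ1DN": 75, "TRIM58": 85, "BIRC4BP": 65},
]
real_ages = [30, 50, 70]

pred_five = [combined_prediction(d, SIGNATURE) for d in donors]
pred_three = [combined_prediction(d, TOP_THREE) for d in donors]
mad_five = mean_absolute_difference(pred_five, real_ages)
mad_three = mean_absolute_difference(pred_three, real_ages)
print(f"five sites: {mad_five:.2f} years, three sites: {mad_three:.2f} years")
```

Comparing the five-site and three-site subsets on the same donors mirrors the comparison between Figure 3B and Figure 3C.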

Validation of the Epigenetic-Aging-Signature
The Epigenetic-Aging-Signature was then tested on the eight independent datasets of the validation-group (Table 1). To this end, we have only considered the five beta-values which corresponded to the CpG sites of the Epigenetic-Aging-Signature. Each of these CpG sites revealed age-associated changes in analogy to the training-set (Figure 4A). The beta-values were then inserted into the linear-regression models of the training-group to estimate the donor age. The predictions for donor age in the validation-group also correlated with the real age, with an average precision of ± 12.7 years (Figure 4B). These predictions improved further when we focused on the three most significant CpG sites of the signature (KCNQ1DN, NPTX2 and GRIA2): the average precision was then ± 11.4 years (Figure 4C). For some individual datasets the average deviation was even less than 6 years. Gender-related differences in the age-predictions were not observed using this signature (data not shown).

Epigenetic changes are a hallmark of aging, but it is as yet unclear how these modifications are regulated [6]. DNA-methylation changes have been shown to be enriched in target genes of polycomb complexes [21] or bivalently modified DNA [11]. Recently, we have demonstrated that long-term culture related DNA-methylation changes in MSC are associated with repressive histone marks [2]. Thus, it may be speculated that protein complexes which are associated with the histone code are involved in this process.

Table 2. Pavlidis Template Matching was used to identify CpG sites with the most significant age-associated changes. 19 CpG sites revealed hypermethylation with a Pearson correlation coefficient R of > 0.6 in all samples of the training group. Significant age-associated correlations were also observed in most individual datasets. CpG sites of the Epigenetic-Aging-Signature are indicated in grey. *One additional hypomethylated CpG site (cg23571857) was included in the predictor.

CONCLUSION
In this study we have identified an Epigenetic-Aging-Signature consisting of five CpG sites which facilitates predictions of donor age across different tissue types. This method can, for example, be used in forensic analysis to estimate donor age of unknown tissue specimens including blood. It has to be noted that chronological age is not identical to biological age, and it is conceivable that some of the discrepancy between predicted and real age can be attributed to this difference. Further research might facilitate determination of the biological age for personalized medicine.

A-GEOD-8490). After a literature search we decided to include the following 13 datasets for subsequent analysis; they were divided into a training-group for identification of the Epigenetic-Aging-Signature and a validation-group.

DNA
The authors of these important primary studies have to be acknowledged: Grönniger and co-workers isolated keratinocytes from epidermal suction blisters and dermal fibroblasts from punch biopsies (E-MTAB-202) [17]. Epithelial cells from cervical smear samples (19 HPV negative controls, 11 HPV positive controls) were collected and analysed as described by Teschendorff et al. (GSE20080) [21]. Leucocytes, CD4+ T-cells and CD14+ monocytes were isolated from fresh venous whole blood as described by Rakyan and co-workers (GSE20242 and GSE20236) [11]. Saliva samples comprising buccal epithelial cells and leucocytes were collected as described in detail by Bocklandt et al. (GSE28746) [22]. CD34+ hematopoietic progenitor cells (HPC) were isolated from cord blood and from G-CSF mobilized peripheral blood as described by Bocker and colleagues [19] (E-MTAB-487; monocytes and granulocytes were not included to keep the cell specification homogeneous). Peripheral blood lymphocytes were isolated from whole blood as described in Chen et al. (GSE23638) [18]. Mononuclear cells were harvested by centrifugation of umbilical cord blood (GSE27317) [20]. Teschendorff and co-workers analyzed whole blood samples of healthy postmenopausal women (GSE19711) [21]. Normal breast organoids prepared by enzymatic digestion of reduction mammoplasty specimens were analyzed by Fackler et al. (GSE31979) [28]. Essex and colleagues determined DNA-methylation profiles in saliva samples of fifteen-year-old adolescents (GSE25892) [29]. Age ranges and sample numbers are summarized in Table 1.
Combination of different datasets. Beta-values of the different datasets were combined by the reference ID of the Infinium HumanMethylation27 BeadChip platform (Illumina Inc., San Diego, CA, USA). These beta-values represent the percentage of methylation at each of the 27,578 CpG sites: they are continuous variables between 0 and 1 and represent the intensity ratio of the methylated bead to the combined locus intensity. Background-normalized raw data of these beta-values were determined with the BeadStudio software (Illumina) and retrieved from the public data repositories Gene Expression Omnibus and ArrayExpress. Initially we considered various normalization regimens, including quantile normalization, to minimize chip effects [30]. On the other hand, methylation patterns are expected to vary between different cell types and tissues, and this would be masked by such normalization regimens. Beta-values are less affected by normalization than the relative gene expression changes in mRNA microarray data. Furthermore, non-normalized beta-values are usually in line with validation experiments by pyrosequencing [4,14,15]. Therefore, we decided to use non-normalized raw data for comparison over all datasets. The combined data table of the training-set was subsequently analyzed using the MultiExperiment Viewer (MeV6.2) [31].
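As an illustration of this combination step, the following sketch merges per-dataset tables of beta-values on their probe IDs and keeps only probes present in every dataset. It is a simplified stand-in for the actual procedure: the data layout, the function names and the probe IDs are hypothetical, and the `offset` in `beta_value` is an assumption (array software commonly adds a small constant to the denominator to stabilise low-intensity probes).

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Ratio of methylated intensity to combined locus intensity (0 to 1).

    The offset is an assumed stabilising constant, not taken from the text.
    """
    total = methylated + unmethylated + offset
    if total <= 0:
        raise ValueError("combined intensity must be positive")
    return methylated / total


def combine_by_probe_id(datasets):
    """Merge datasets on shared probe IDs.

    datasets maps a dataset name to {probe_id: [beta per sample]}.
    Returns {probe_id: [betas of all samples, datasets in insertion order]}.
    """
    if not datasets:
        return {}
    shared = set.intersection(*(set(table) for table in datasets.values()))
    combined = {}
    for probe in sorted(shared):
        row = []
        for table in datasets.values():
            row.extend(table[probe])  # no normalization, raw beta-values
        combined[probe] = row
    return combined


# Toy tables with hypothetical probe IDs.
toy_datasets = {
    "dermis": {"cg_a": [0.1, 0.2], "cg_b": [0.5, 0.6]},
    "blood": {"cg_a": [0.3], "cg_c": [0.9]},
}
combined = combine_by_probe_id(toy_datasets)
print(combined)
```

Since all datasets come from the same BeadChip platform, the intersection would normally contain every probe; the intersection only guards against tables with missing rows.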
Identification of the Epigenetic-Aging-Signature. To identify CpG sites which reveal continuous age-associated hypermethylation or hypomethylation, we performed Pavlidis Template Matching (PTM) [27] with the MultiExperiment Viewer (MeV6.2) [31]. Each sample of the training-set was matched to a template with the corresponding donor age. The combined dataset was then searched for CpG sites whose beta-values correlated linearly with the donor age of the template (Pearson correlation); initially we used very stringent criteria with R > 0.6. In analogy, each dataset was analysed separately, and the overlap of age-associated changes supported the notion that they occur in different tissues. Based on this analysis, we selected five CpG sites which revealed the best age-associated correlation across all 5 datasets of the training-set and relevant variation in the beta-values. For simplicity they were named after their corresponding genes: TRIM58 (cg07533148), KCNQ1DN (cg01530101), NPTX2 (cg1279989), GRIA2 (cg25148589) and BIRC4BP (cg23571857).

www.impactaging.com
For each of these CpG sites we performed a linear regression analysis of beta-values versus donor age with EXCEL 2007 (Microsoft). These linear regression models were then used for age-predictions in the datasets of the training-group as well as for the validation-group: the five CpG sites ($i$) were inversely used to predict the age ($N$) by inserting the specific DNA-methylation levels of the corresponding CpG site ($\beta$):

$N_i = (\beta_i - A_i) / B_i$

where $A_i$ is the Y-axis intercept and $B_i$ is the slope of the corresponding CpG site in the training group (Figure 3A). The mean of the predictions of the five individual CpG sites of the Epigenetic-Aging-Signature was subsequently used to predict donor age.
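The inverse regression described above can be sketched in a few lines. The sketch assumes ordinary least squares of beta-value on age for each CpG site, which is what a standard spreadsheet trendline computes. The site labels and training values are toy data, not the intercepts and slopes of the study.

```python
def fit_line(ages, betas):
    """Least-squares fit of beta = A + B * age; returns (A, B)."""
    n = len(ages)
    if n != len(betas) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_age = sum(ages) / n
    mean_beta = sum(betas) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((a - mean_age) * (b - mean_beta) for a, b in zip(ages, betas))
    slope = sxy / sxx
    intercept = mean_beta - slope * mean_age
    return intercept, slope


def predict_age(sample_betas, models):
    """Mean of the per-site estimates N_i = (beta_i - A_i) / B_i."""
    estimates = [
        (sample_betas[site] - intercept) / slope
        for site, (intercept, slope) in models.items()
    ]
    return sum(estimates) / len(estimates)


# Toy training data (hypothetical): one hyper- and one hypomethylated site.
train_ages = [20, 40, 60, 80]
train_betas = {
    "site_up": [0.20, 0.30, 0.40, 0.50],    # beta = 0.10 + 0.005 * age
    "site_down": [0.82, 0.74, 0.66, 0.58],  # beta = 0.90 - 0.004 * age
}
models = {site: fit_line(train_ages, betas) for site, betas in train_betas.items()}

# A new sample whose beta-values correspond to an age of about 50 years.
predicted = predict_age({"site_up": 0.35, "site_down": 0.70}, models)
print(round(predicted, 1))
```

A site with a slope close to zero would make the division unstable, which is one reason to restrict the predictor to CpG sites with large methylation differences between young and elderly donors.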