An epigenetic biomarker of aging for lifespan and healthspan

Identifying reliable biomarkers of aging is a major goal in geroscience. While the first generation of epigenetic biomarkers of aging were developed using chronological age as a surrogate for biological age, we hypothesized that incorporation of composite clinical measures of phenotypic age that capture differences in lifespan and healthspan may identify novel CpGs and facilitate the development of a more powerful epigenetic biomarker of aging. Using an innovative two-step process, we develop a new epigenetic biomarker of aging, DNAm PhenoAge, that strongly outperforms previous measures with regard to predictions for a variety of aging outcomes, including all-cause mortality, cancers, healthspan, physical functioning, and Alzheimer's disease. While this biomarker was developed using data from whole blood, it correlates strongly with age in every tissue and cell tested. Based on an in-depth transcriptional analysis in sorted cells, we find that increased epigenetic age, relative to chronological age, is associated with increased activation of pro-inflammatory and interferon pathways, and decreased activation of transcriptional/translational machinery, DNA damage response, and mitochondrial signatures. Overall, this single epigenetic biomarker of aging is able to capture risks for an array of diverse outcomes across multiple tissues and cells, and provide insight into important pathways in aging.

The covariates of the j-th individual are included in the model through the linear predictor xb, i.e. the sum of the covariate values weighted by their regression coefficients. The hazard implied by the cumulative distribution function given here is h(t,x)=exp(xb) exp(γt), so the baseline hazard grows exponentially in time at rate γ, where γ is an ancillary parameter to be estimated from the data. The cumulative distribution function of the Gompertz model is given by CDF(t,x)=1-exp(-exp(xb) (exp(γt)-1)/γ), where t denotes time (here in units of months). The Gompertz regression analysis resulted in the coefficient and parameter values reported in Table 1 and Table S1, and γ=0.0076927.
In step 2, we used the cumulative distribution function of the Gompertz model to estimate the 120-month mortality risk of each individual. Thus, CDF(t=120,xj) denotes the probability that the j-th individual will die within the next 120 months.
In step 3, we carried out another parametric proportional hazards model analysis with Gompertz distribution, but only including chronological age as an independent variable. We will refer to this analysis as the univariate Gompertz regression model since it only involved one covariate (age). The resulting estimate of the cumulative distribution function, CDF.univariate(t,age)=1-exp(-exp(b1*age+b0) (exp(γt)-1)/γ), allowed us to estimate the probability that the j-th individual will die within 120 months as CDF.univariate(120,agej), where agej is the age of the j-th individual.
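The mortality-risk calculation in steps 2 and 3 can be illustrated with a minimal Python sketch of the Gompertz cumulative distribution function. Only the formula and γ=0.0076927 come from the text; the linear predictor values below are hypothetical placeholders, not fitted coefficients, and the univariate model would in practice use its own estimates of b0, b1 and γ.

```python
import math

GAMMA = 0.0076927  # ancillary Gompertz parameter reported in the text (time in months)


def gompertz_cdf(t_months, xb, gamma=GAMMA):
    """Probability of death within t_months:
    CDF(t, x) = 1 - exp(-exp(xb) * (exp(gamma * t) - 1) / gamma)."""
    cumulative_hazard = math.exp(xb) * (math.exp(gamma * t_months) - 1.0) / gamma
    return 1.0 - math.exp(-cumulative_hazard)


def univariate_xb(age, b1, b0):
    """Linear predictor of the age-only model: b1 * age + b0."""
    return b1 * age + b0


# Hypothetical linear predictors, for illustration only (not values from Table 1 or S1).
risk_lower = gompertz_cdf(120, xb=-10.0)
risk_higher = gompertz_cdf(120, xb=-8.0)
print(f"120-month mortality risk: {risk_lower:.4f} vs {risk_higher:.4f}")
```

A larger linear predictor always gives a larger 120-month risk, because the cumulative hazard is proportional to exp(xb).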

Data used to generate DNAm PhenoAge
Participants ages 20 and over in NHANES III (1988-94) were used as the training sample to develop a new and improved measure of phenotypic aging (n=9,926), while participants ages 20 and over in NHANES IV (1999-2014) were used to validate the association between phenotypic aging and age-related morbidity and mortality (n=6,209). Overall, NHANES III had available mortality follow-up for up to 23 years (n= deaths) and NHANES IV had available mortality follow-up for up to 17 years (n= deaths). InCHIANTI included longitudinal (two time-points, 1998 and 2007) phenotypic and DNAm data on n=456 male and female participants, ages 21-91 in 1998 and 30-100 in 2007. Participants from WHI included 2,107 postmenopausal women, who were ages 50-80 at baseline and were followed up for just over 20 years.

DNA methylation data
The DNAm data for all but one cohort were generated on the Illumina Infinium 450K platform. The exception is the Jackson Heart Study, whose data were generated on the EPIC array.
The Illumina BeadChips measure bisulfite-conversion-based, single-CpG resolution DNAm levels at different CpG sites in the human genome. These data were generated by following the standard protocol of Illumina methylation assays, which quantifies methylation levels by the β value using the ratio of intensities between methylated and un-methylated alleles. Specifically, the β value is calculated from the intensity of the methylated (M corresponding to signal A) and un-methylated (U corresponding to signal B) alleles, as the ratio of fluorescent signals β = Max(M,0)/[Max(M,0)+Max(U,0)+100]. Thus, β values range from 0 (completely un-methylated) to 1 (completely methylated). For WHI we used background-corrected β values, while the InCHIANTI and JHS data were normalized using the NOOB method [2].
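The β value formula above can be written out as a short Python sketch. The function name and the example intensities are illustrative assumptions; only the formula, including the offset of 100, is taken from the text.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Beta = max(M, 0) / (max(M, 0) + max(U, 0) + offset).

    Negative intensities (possible after background correction) are clipped to 0.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical probe intensities, for illustration only.
print(beta_value(4500.0, 400.0))  # mostly methylated site
print(beta_value(-20.0, 3000.0))  # negative M is clipped, so beta is 0
```

Because of the constant offset in the denominator, the computed β approaches but never exactly reaches 1, and it stays defined even when both intensities are zero.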

Main Validation Studies
A number of validation studies were used to test the associations between DNAm PhenoAge and various aging-related traits. WHI (samples 1 and 2), FHS, and NAS comprised the main validation data for mortality and morbidity analyses. The two separate WHI subsamples (BA23 and AS315) were aggregated for our study. WHI sample 1 included 2,091 participants for whom complete data were available for DNAm PhenoAge, morbidity, mortality, and confounder variables. Participants were part of a subsample from the Women's Health Initiative (WHI), who were enrolled as part of an integrative genomics study, with the primary focus on identifying determinants of CHD risk as detailed in [3]. This sample included women ages 50-79 at baseline, with an overrepresentation of racial/ethnic minorities. About half of the participants developed coronary heart disease after the baseline measurement. The integrative genomics subsample employed a case-control sampling design. All incident cases and controls were required to have already undergone genome-wide genotyping at baseline as well as profiling of seven cardiovascular biomarkers, as dictated by the aims of other ancillary WHI studies.
The second WHI data set came from the Women's Health Initiative - Epigenetic Mechanisms of PM-Mediated CVD study (WHI-EMPC, AS315), an ancillary study of epigenetic mechanisms underlying associations between ambient particulate matter (PM) air pollution and cardiovascular disease (CVD) in the Women's Health Initiative clinical trials (CT) cohort. The WHI-EMPC study population is a stratified, random sample of 2,200 WHI CT participants who were examined between 1993 and 2001; had available buffy coat, core analytes, electrocardiograms, and ambient concentrations of PM; but were not taking antiarrhythmic medications at the time. As such, WHI-EMPC is representative of the larger, multiethnic WHI CT population from which it was sampled: n=68,132 participants aged 50-79 years who were randomized to hormone therapy, calcium/vitamin D supplementation, and/or dietary modification in 40 U.S. clinical centers at the baseline exam (1993-1998) and re-examined in the fasting state one, three, six, and nine years later [4].

The Normative Aging Study (NAS)
The US Department of Veterans Affairs (VA) Normative Aging Study (NAS) is an ongoing longitudinal cohort established in 1963, which included men who were 21-80 years of age and free of known chronic medical conditions at entry [5]. Participants were subsequently invited to medical examinations every three to five years. At each visit, participants provided information on medical history, lifestyle, and demographic factors, and underwent a physical examination and laboratory tests. DNA samples were collected between 1999 and 2007 from the 675 active participants and used for DNAm analysis. We excluded 18 participants who were non-white or had missing information on race, leaving a total of 657 individuals.
Official death certificates were obtained for decedents from the appropriate state health department and were reviewed by a physician. An experienced research nurse coded the cause of death using ICD-9. Both participant deaths and causes of death are routinely updated by the research team; the last update available was December 31, 2013.

Jackson Heart Study
The JHS is a large, population-based observational study evaluating the etiology of cardiovascular, renal, and respiratory diseases among African Americans residing in the three counties (Hinds, Madison, and Rankin) that make up the Jackson, Mississippi metropolitan area [6]. Data and biologic materials have been collected from 5306 participants, including a nested family cohort of 1,498 members of 264 families. The age at enrollment for the unrelated cohort was 35-84 years; the family cohort included related individuals >21 years old. Participants provided extensive medical and social history, had an array of physical and biochemical measurements and diagnostic procedures, and provided genomic DNA during a baseline examination (2000-2004) and two follow-up examinations (2005-2008 and 2009-2012). The study population is characterized by a high prevalence of diabetes, hypertension, obesity, and related disorders. Annual follow-up interviews and cohort surveillance are ongoing.
In our analysis, we used Illumina EPIC array data from n=1756 African Americans (n=1203 women and n=653 men) that were generated as part of JHS ancillary study ASN0104. The blood samples were collected at the baseline of the study (visit 1). At the time of the blood draw, the individuals ranged in age from 22 to 93 years (median age 57). At the time of the last follow up, 282 individuals were known to be deceased. The median number of years of follow up (time to death or last follow up) was 12.2 years (ranging from 0.14 to 14.5 years).
DNAmPhenoAgeAccel ranged from -19 to 38. Overall, 4.7 and 4.9 percent of individuals had a DNAmPhenoAgeAccel value larger than 10 or smaller than -10, respectively.

Framingham Heart Study Offspring Cohort (FHS)
The Framingham Heart Study (FHS) Offspring Cohort began enrollment in 1971 and included 5,124 offspring and spouses of the offspring of the FHS original cohort. Participants were eligible for the current study if they attended the eighth examination cycle (2005-2008) and consented to having their DNA used for genetic research. All participants provided written informed consent at the time of each examination visit. The study protocol was approved by the Institutional Review Board at Boston University Medical Center (Boston, MA). The FHS data are available in dbGaP (accession number "phs000724.v2.p9"). Deaths among FHS participants that occurred prior to January 1, 2013 were ascertained using multiple strategies, including routine contact with participants for health history updates, surveillance at the local hospital and in obituaries of the local newspaper, and queries to the National Death Index. Death certificates, hospital and nursing home records prior to death, and autopsy reports were requested. When cause of death was undeterminable, the next of kin were interviewed. The date and cause of death were reviewed by an endpoint panel of 3 investigators.

Early menopause and blood methylation data
We previously reported an association between early menopause and epigenetic age acceleration (based on Horvath DNAm Age) [7]. We used the same data from BA23 of the Women's Health Initiative (described above) to replicate this finding using DNAm PhenoAge acceleration.

Alzheimer's disease and brain methylation data from the Religious Order Study
A number of other samples were used to validate the accuracy of DNAm PhenoAge in various tissues or for case-control studies. For instance, the Religious Order Study (ROS) and the Memory and Aging Project (MAP) were used to test the association between DNAm PhenoAge in the dorsolateral prefrontal cortex (DLPCTX) and Alzheimer's disease and/or neuropathology. Both are longitudinal community-based cohort studies of aging and dementia. The majority of participants in both studies are 75-80 years old at baseline with no known dementia. Inclusion in the studies requires participants to consent to undergoing annual clinical evaluations as well as postmortem organ donation. The ROS sample includes Catholic priests, nuns, and brothers from across the United States, whereas the MAP sample includes a more general community-based population from northeastern Illinois.

Breast cancer risk and blood methylation data from the EPIC study
We previously reported an association between intrinsic epigenetic age acceleration (based on the Horvath DNAm age estimator) and breast cancer risk [8]. Using the same data, we studied whether DNAmPhenoAgeAcceleration in blood also predicts incident breast cancer. The Illumina 450K DNA methylation data came from a nested case-control study embedded in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort (n = 960 females) [8]. The 960 females from the EPIC cohort included 480 incident breast cancer cases. According to Ambatipudi (2017), the main criteria for selection of case/control pairs included: (1) a balanced representation of the main subtypes of breast cancer, and (2) representation of recruiting centres. One control participant was randomly assigned for each patient with breast cancer from appropriate risk sets consisting of all cohort participants alive and free of cancer (except for non-melanoma skin cancer) at the time of diagnosis (and hence, age) of the index case. Matching criteria were: centre, length of follow-up, age at blood collection, time of blood collection, fasting status, menopausal status, menstrual cycle day and current use of contraceptive pill/hormone replacement therapy.

Offspring of Italian semi-supercentenarians
Semi-supercentenarians (aged 105-109 years) are of great interest to aging researchers because they often managed to avoid the onset of major age-related diseases. We previously reported that the offspring of semi-supercentenarians are epigenetically younger than age-matched controls (i.e. offspring of non-centenarians) using the Horvath DNAm age estimator [9]. Using the same blood methylation data, we revisited this analysis using DNAmPhenoAgeAcceleration.

Dementia status versus blood methylation
We evaluated whether DNAm PhenoAge in blood relates to clinically diagnosed dementia in living individuals using a blood data set from [10]. Results suggest that those with presumed Alzheimer's disease (AD, n=154) and/or frontotemporal dementia (FTD, n=116) have significantly higher DNAm PhenoAge compared to non-demented (n=334) individuals (P=2.2E-2). Patients were enrolled as part of a large genetic study in neurodegenerative dementia (Genetic Investigation in Frontotemporal Dementia, GIFT) at the UCSF Memory and Aging Center (UCSF-MAC) [10]. The blood DNA methylation data were generated on the Illumina 450K array.

Down syndrome versus blood methylation
We previously reported that individuals with Down syndrome are epigenetically older than age matched controls according to the Horvath DNAm age estimator [11]. Using the same blood data sets, we replicated the results for DNAm PhenoAge. Leukocyte data set 1 (measured on the Illumina 27K platform) involved 35 participants with DS and 21 controls (mean age 43 ranging from 22 to 64) that were ascertained through the New York State developmental disability service system as well as agencies in New Jersey, Connecticut and Northern Pennsylvania [12]. Blood data set 2 (measured on the Illumina 450K array) involved 29 individuals with Down syndrome, their mothers (DSM) and their unaffected siblings (DSS) [11]. The individuals investigated in this study were recruited in Emilia-Romagna region (Bologna and Ferrara provinces), Italy. In our original article, we showed that DS individuals also exhibited epigenetic age acceleration in brain tissue according to the Horvath DNAm age estimator [11]. By contrast, DNAm PhenoAge did not reveal a significant epigenetic age acceleration effect in the small brain data set (n=15 brain samples from DS and 54 controls).

HIV infection and blood methylation
We previously reported that HIV infection is associated with epigenetic age acceleration according to the Horvath DNAm age estimator. Here we used the same blood data sets to revisit this analysis with DNAm PhenoAge. Specifically, we used two publicly available blood DNA methylation data sets from HIV+ individuals and HIV- controls from [13]. All data sets were generated with the Illumina 450K array. The first data set involved peripheral blood mononuclear cells (PBMCs) isolated from 92 individuals. The 24 HIV+ cases had a mean age of 49 years (range, 29-67 years). The 68 controls had a mean age of 36 years (range, 18-74 years). The second data set also involved PBMCs from HIV+ cases and HIV- controls. The 23 cases had a mean age of 45 years (range, 24-68 years). The 69 controls had a mean age of 51 years (range, 35-64 years). The HIV+ cases were ascertained by the National Neurological AIDS Bank study or the Multicenter AIDS Cohort Study in Los Angeles.

Parkinson's disease and blood methylation
We previously reported that Parkinson's disease is associated with a weak age acceleration effect according to both the Horvath and Hannum DNAm age estimators [14]. Using the same data, we revisited this analysis with DNAm PhenoAge. We observe a suggestive relationship between DNAm PhenoAge in blood and Parkinson's disease status (p=0.028) among the n=508 individuals (n=289 PD cases and n=219 controls) of European ancestry. The blood DNA methylation data were measured on the Illumina 450K array. The blood samples came from the Parkinson's disease, Environment, and Genes (PEG) case-control study [15]. The PEG study is a large population-based study of Parkinson's disease among mostly rural and township residents of California's Central Valley. Cases were identified with the help of local neurologists, clinics, and community outreach, and controls were randomly sampled from Medicare lists and residential tax assessor's records.

Estimation of blood cell counts based on DNAm levels
We estimated blood cell counts using two different software tools. First, Houseman's estimation method [16] was used to estimate the proportions of CD8+ T cells, CD4+ T cells, natural killer cells, B cells, and granulocytes (also known as polymorphonuclear leukocytes). Second, the Horvath method, implemented in the advanced analysis option of the epigenetic clock software [13,17], was used to estimate the percentage of exhausted CD8+ T cells (defined as CD28-CD45RA-), the number (count) of naïve CD8+ T cells (defined as CD45RA+CCR7+) and plasmablasts. We and others have shown that the estimated blood cell counts have moderately high correlations with corresponding flow cytometric measures [16,18].

Heritability Analysis using SOLAR
The polygenic model implemented in SOLAR (Sequential Oligogenic Linkage Analysis Routine) software [19] was used to estimate heritability of DNAm PhenoAge in the FHS pedigree cohort based on the known kinship coefficients. Heritability is defined as the total proportion of phenotypic variance attributable to genetic variation in the polygenic model. The polygenic model was adjusted for gender and chronological age.
Using GCTA, we performed the REML analysis [20,21] to estimate the heritability of PhenoAgeAccel in the postmenopausal women of the WHI cohort. The WHI substudies were genotyped on different platforms. In order to combine the genotype data across the studies from WHI EMPC and WHI BA23, we converted the MaCH dosage format into PLINK format with best guess genotypes and used both genotyped and imputed markers for analysis. We only used the overlapping markers present in all studies (such that the SNP missing rate=0) and controlled the quality of SNPs based on MAF > 0.05, Hardy-Weinberg equilibrium (HWE) P > 0.0001, and MaCH r^2 > 0.8, yielding approximately 4M markers available for analysis. All analyses were adjusted for 4 principal components (PC).
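The SNP quality-control thresholds listed above can be expressed as a simple filter. This is a minimal sketch only: the record type, field names and example markers are hypothetical, and in practice such filtering would be done with the genotype software itself rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-marker summary statistics (hypothetical field names)."""
    snp_id: str
    maf: float           # minor allele frequency
    hwe_p: float         # Hardy-Weinberg equilibrium test p-value
    mach_r2: float       # MaCH imputation quality
    missing_rate: float  # fraction of studies/samples missing the marker


def passes_qc(snp):
    """Thresholds from the text: missing rate = 0, MAF > 0.05, HWE P > 0.0001, r^2 > 0.8."""
    return (
        snp.missing_rate == 0
        and snp.maf > 0.05
        and snp.hwe_p > 0.0001
        and snp.mach_r2 > 0.8
    )


# Hypothetical markers, for illustration only.
markers = [
    SnpStats("marker_1", maf=0.21, hwe_p=0.40, mach_r2=0.95, missing_rate=0.0),
    SnpStats("marker_2", maf=0.02, hwe_p=0.40, mach_r2=0.95, missing_rate=0.0),
    SnpStats("marker_3", maf=0.30, hwe_p=0.40, mach_r2=0.60, missing_rate=0.0),
]
kept = [m.snp_id for m in markers if passes_qc(m)]
print(kept)
```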

Enrichment Analysis using GREAT
We applied the GREAT analysis to assess the functional involvement of cis-regulatory regions of the 513 CpG sites. Of the 513 CpG sites, 242 are positively correlated with chronological age in blood and 271 are negatively correlated with chronological age in blood. We used three different sets of genes as input: all, those co-locating with CpGs with positive age associations, and those co-locating with CpGs with negative age associations. Enrichment was assessed using the whole genome as background. The GREAT software performs both a binomial test over genomic regions and a hypergeometric test over genes when using a whole genome background.
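To make the two enrichment tests concrete, the following Python sketch computes upper-tail p-values for a binomial test and a hypergeometric test. It illustrates the general form of these tests only and is not GREAT's implementation; the function names and all inputs are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Region-based view: n input regions, each landing in an annotated
    part of the genome with probability p.
    """
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def hypergeometric_upper_tail(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are picked from `population` genes,
    of which `annotated` carry the term of interest.
    """
    upper = min(annotated, drawn)
    numerator = sum(
        math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return numerator / math.comb(population, drawn)


# Hypothetical counts, for illustration only.
p_regions = binomial_upper_tail(k=5, n=50, p=0.01)
p_genes = hypergeometric_upper_tail(k=4, population=20000, annotated=15, drawn=500)
print(p_regions, p_genes)
```

The binomial version depends on the fraction of the genome covered by an annotation, whereas the hypergeometric version counts genes, so the two can disagree when annotated genes have unusually large or small regulatory domains.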

Estimation of neuronal proportions in brain tissues
The CETS R package [22] was used to estimate the proportion of neurons based on DNA methylation data. We independently confirmed the high accuracy of the CETS algorithm by applying it to sorted neurons, which led to estimates of the proportion of neurons in excess of 0.99.

Gene expression analysis of AgeAccelPheno
DNA methylation and gene expression data from the Monocyte [23] and FHS datasets were adjusted for array effects using ComBat from the sva R package. The AgeAccelPheno (AAP) variable was adjusted using linear modeling for site, race (in the Monocyte dataset), family structure (in the FHS dataset), gender, and cell proportion estimates where designated (Houseman estimates excluding the major cell type to avoid multicollinearity: CD8T, CD4T, NK, Bcell, Mono (excluded in the Monocyte dataset), and Gran (excluded in the FHS dataset)). The individual gene expression probe levels were tested for associations with AAP using a robust correlation measure (biweight midcorrelation). The top 5% of genes positively and negatively correlated with AAP were analyzed for GO term enrichment using the topGO R package.
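For readers unfamiliar with the biweight midcorrelation, the sketch below implements its standard definition in plain Python: deviations from the median are down-weighted by a Tukey biweight function scaled by nine times the median absolute deviation. The source does not state which implementation was used, so this is an illustration under that assumption, with made-up example data.

```python
import math
from statistics import median


def _biweight_terms(values):
    """Return normalized, biweight-weighted deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("median absolute deviation is zero; bicor is undefined")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Observations more than 9 MADs from the median receive zero weight.
        weight = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * weight)
    norm = math.sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(a * b for a, b in zip(_biweight_terms(x), _biweight_terms(y)))


# Made-up data: the last y value is a gross outlier and gets zero weight.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 5, 6, 100]
print(bicor(x, y))
```

Unlike the Pearson correlation, a single extreme expression value cannot dominate the estimate, which is the usual reason for choosing this measure for array data.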

DNAm PhenoAge and other aging-related morbidity outcomes
Additional independent replication data were used to test for associations with other aging outcomes, which have previously been shown to relate to the first generation of epigenetic biomarkers [8-11, 13, 14]. Using the five studies described above, we find that women with higher DNAm PhenoAge tended to have an earlier age at menopause (Meta P-value=1.32E-2). Among the 527 women who were cancer free at age 50, accelerated DNAm PhenoAge in blood predicts incident breast cancer (OR: 1.037, p=0.033) using data from [8]. We find a marginally significant reduction of approximately 2.4 years in the DNAm PhenoAge of semi-supercentenarian offspring, relative to controls (β=-2.40, p=0.065), in a relatively small blood data set (comprising 63 semi-supercentenarians' offspring and 47 age-matched controls from [9]). We evaluated whether DNAm PhenoAge relates to clinically diagnosed dementia in living individuals using a blood data set from [10]. Results suggest that those with presumed Alzheimer's disease (AD, n=154) and/or frontotemporal dementia (FTD, n=116) have significantly higher DNAm PhenoAge compared to non-demented (n=334) individuals (P=2.2E-2), and the strength of the association is further increased (P=9.4E-3) when limiting the sample to those ages 75 and older. We also find that DNAm PhenoAge relates to Down syndrome in two separate blood methylation datasets (p=0.0046, n=56; and p=4.0E-11, n=87) described in [11]. We find that HIV infection is associated (p=6E-6 and p=8.6E-6) with accelerated DNAm PhenoAge in two blood datasets (n=92 and n=92) described in [13]. Finally, we observe a suggestive relationship between DNAm PhenoAge in blood and Parkinson's disease status (p=0.028) in a large data set (n=508) of individuals of European ancestry from [14].

Effect of obesity on liver and adipose tissue
Using the Horvath DNAm age measure, we previously found that body mass index correlated with epigenetic age acceleration in two independent human liver samples (r=0.42 and r=0.42 in liver data sets 1 and 2, respectively) [24]. Using the same data, we replicated this finding using the new measure of PhenoAge acceleration (r=0.32, p=0.011 and r=0.48, p=7.7E-6 in liver data sets 1 and 2, respectively).
Interestingly, we also find a significant correlation between BMI and DNAm PhenoAge acceleration in the first adipose data set (r=0.43, p=1.2E-23 using n=648 adipose samples from the Twins UK study) but not in a second, smaller adipose data set (n=32 samples).

DNAm PhenoAge and Immunosenescence
To test the hypothesis that DNAm PhenoAge captures aspects of the age-related decline of the immune system, we correlated DNAm PhenoAge with estimated [16,18] blood cell counts (Supplementary Fig. S7). These results are consistent with age-related changes in blood cells [25] and suggest that DNAm PhenoAge may capture aspects of immunosenescence in blood. However, three lines of evidence suggest that DNAm PhenoAge is not simply a measure of immunosenescence. First, another measure of immunosenescence, leukocyte telomere length (LTL), is only weakly correlated with DNAm PhenoAgeAccel (r=-0.087, P=7.6E-3) in the n=905 individuals from the Framingham Heart Study for whom both DNA methylation data and LTL data were available (Supplementary Fig. S8). Second, DNAm PhenoAge also applies to non-blood tissues. Third, the strong association between DNAm PhenoAge and mortality does not simply reflect changes in blood cell composition: as shown in Supplementary Fig. S9, the robust association remains even after adjusting for estimates of seven blood cell count measures (Meta(FE)=1.036, Meta p=5.6E-21).
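The fixed-effects meta-analysis estimate quoted above pools per-cohort hazard ratios. A common way to do this is inverse-variance weighting of the log hazard ratios, sketched below in Python. The source does not describe the exact procedure or software, so the weighting scheme is an assumption, and the cohort-level inputs are hypothetical.

```python
import math


def fixed_effects_meta(log_hazard_ratios, standard_errors):
    """Inverse-variance fixed-effects pooling of per-study log hazard ratios.

    Returns (pooled hazard ratio, standard error of the pooled log HR,
    two-sided p-value from a normal approximation).
    """
    weights = [1.0 / se**2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_log_hr = sum(w * b for w, b in zip(weights, log_hazard_ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_log_hr / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(pooled_log_hr), pooled_se, p_value


# Hypothetical per-cohort estimates, for illustration only.
log_hrs = [math.log(1.03), math.log(1.04), math.log(1.05)]
ses = [0.010, 0.008, 0.015]
pooled_hr, pooled_se, p = fixed_effects_meta(log_hrs, ses)
print(f"pooled HR={pooled_hr:.3f}, SE={pooled_se:.4f}, p={p:.2e}")
```

Studies with smaller standard errors receive more weight, so the pooled estimate always lies between the smallest and largest per-study hazard ratios.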

Additional DNA sequence characteristics of the 513 CpGs in DNAm PhenoAge
The 513 CpGs in DNAm PhenoAge mapped to 506 distinct genes. The Genomic Regions Enrichment of Annotations Tool (GREAT) [30] was used to directly test for functional enrichment among the 506 genes. As shown in Table S6, CpGs in our score co-locate with genes enriched in the Kallikreins (KLK) gene family (Region Fold Enrichment=66.8, Binomial FDR=2.4E-4), as well as genes in the methylglyoxal degradation I pathway (Region Fold Enrichment=63.9, Binomial FDR=5.5E-3). However, in general, we find very little enrichment. As a result, we tested whether DNAm of each of the 513 CpGs was associated with differential expression of the gene with which it co-located. Based on transcriptional data from monocytes (n=1,264) described in [23], our results show that, in general, very few of our CpGs are associated with changes in expression of the genes that they are supposedly linked to (Table S7), which may explain why this gene set produces generally null enrichment findings.
All models were adjusted for chronological age. Models for smokers were adjusted for pack-years. Models in all samples aside from WHI were adjusted for sex. Models in both WHI samples and NAS were adjusted for race/ethnicity.

Supplementary Table S5. Comparison of DNAm based epigenetic biomarkers of ageing
In the following pages, we provide a direct comparison of 6 DNAm based biomarkers of ageing [35].
We evaluate these predictors in terms of predicting time to death, relationship to body mass index, relationship to blood cell count estimates, telomere length, predicting centenarian status, and neuritic plaques in postmortem brain samples.

Supplementary Table S5A. Predicting time to death based on epigenetic age acceleration in blood
Our comparison is mainly based on the Illumina EPIC array data from n=1756 African American individuals from the Jackson Heart Study.
We evaluate whether DNAm age predicts time to death. The blood samples were collected at the baseline of the study (visit 1). At the time of the blood draw, the individuals ranged in age from 22 to 93 years (median age 57). At the time of the last follow up, 282 individuals were known to be deceased. The median number of years of follow up (time to death or last follow up) was 12.2 years (ranging from 0.14 to 14.5 years).
In the following, we report the results of a Cox regression analysis, which demonstrate that DNAmPhenoAge has the strongest association with time to death.
The DNAmAge estimator is listed in the heading.
We used Illumina EPIC DNA methylation data from blood samples of the Jackson Heart Study to test the association between the DNAm age estimators (dependent variable) and leukocyte telomere length (more precisely, mean telomere restriction fragment length) using multivariate regression models that also included chronological age and sex as covariates.
The following analysis reveals that DNAmPhenoAge has the strongest association with leukocyte telomere length.

Supplementary Table S5E. Offspring of centenarians have low epigenetic age acceleration in blood
We analyzed Illumina 450K data from Italian semi-supercentenarians (individuals aged 105 or more), their offspring and age-matched controls from [36]. We analyzed peripheral blood mononuclear cells from Italian families comprising 82 semi-supercentenarians (mean age: 105.6 ± 1.6 years), 63 semi-supercentenarians' offspring (mean age: 71.8 ± 7.8 years), and 47 age-matched controls (mean age: 69.8 ± 7.2 years). Using a linear regression model, we demonstrate that the offspring of semi-supercentenarians have a lower epigenetic age than age-matched controls [36].
The negative sign of the Student T statistic indicates that the offspring of centenarians are epigenetically younger than age-matched controls.

Supplementary Table S5F. Neuritic plaques vs age acceleration in the prefrontal cortex
We used the DNA methylation data from postmortem prefrontal cortex samples of the Religious Order Study and the MAP study [37]. We correlated measures of age acceleration in the prefrontal cortex with the abundance of neuritic plaques (adjusted for chronological age and gender).
Table column: correlation with age-adjusted neuritic plaques.

Supplementary Table S5F. Body Mass Index versus epigenetic age acceleration in liver
We used n=85 Illumina 450K data arrays from human liver samples. Liver samples from morbidly obese patients with all stages of NAFLD and controls were analysed. The data are described in [24,38] and available from Gene Expression Omnibus (GSE48325).
Using multivariate linear regression models (dependent variable DNAm Age), we find that high BMI is associated with increased DNAm Age even after correcting for gender and age, consistent with the findings reported in [39].

We use Illumina 450K DNA methylation data from blood samples from over 2k postmenopausal women from the Women's Health Initiative [18]. We estimated blood cell proportions/counts using two different software tools. Houseman's estimation method [16] was used to estimate the proportions of cytotoxic (CD8+) T cells, helper (CD4+) T cells, natural killer cells, B cells, and granulocytes (mostly neutrophils). Another method [40], which is implemented in the advanced analysis option of the epigenetic age calculator software [41], was used to estimate the percentage of exhausted CD8+ T cells (defined as CD28-CD45RA-), plasmablasts, and the number (count) of naïve CD8+ T cells (defined as CD45RA+CCR7+). Imputed blood cell counts have moderately high correlations with corresponding flow cytometric data [18]. All blood cell counts were adjusted for chronological age, i.e. the reported correlations are not confounded by chronological age.

Over the nine years of follow-up, mean and median change in phenotypic age (A) and DNAm PhenoAge (B) was about 9 years. Nevertheless, within-person (age-adjusted) measures for phenotypic age (C) and DNAm PhenoAge (D) remained fairly stable over time; those who are fast agers remain fast agers. Finally, panel E shows the correlation between change in phenotypic age and change in DNAm PhenoAge, suggesting that those who experience an acceleration of phenotypic age based on clinical markers also experience age acceleration on an epigenetic level.

When comparing DNAm PhenoAge by smoking status, we find that current smokers have significantly higher epigenetic ages compared to never and/or former smokers (A). This is also true when comparing DNAm PhenoAge as a function of pack-years (B). However, no associations with pack-years are found when stratifying by smoking status, current (D) or former (D), suggesting that the correlation with pack-years in (B) merely reflects a difference in smoking status (similar to what is shown in A).

In smoking-stratified analyses, adjusting for pack-years (in smokers) and chronological age, we find that DNAm PhenoAge significantly predicts mortality even within groups, and despite much smaller sample sizes (panel A depicts results for non-smokers and panel B depicts results for smokers). The Hannum measure also relates to mortality in both smokers and non-smokers, although to a lesser degree than DNAm PhenoAge. Interestingly, the effect of DNAm PhenoAge on mortality appears to be larger for smokers than for non-smokers, suggesting that it may capture vulnerability to stressors, like cigarette exposure.

SUPPLEMENTARY FIG. S5. Associations between DNAm PhenoAge and race/ethnicity in the WHI.
When comparing DNAm PhenoAge by race/ethnicity, we find that non-Hispanic blacks have the highest DNAm PhenoAges, whereas non-Hispanic whites have the lowest (A). This reflects trends seen in life expectancy. Further, this likely reflects differences between the three groups, rather than variations in the reliability of the measure within the three strata, as evidenced by the very consistent age trends across all three groups (B, C, & D).

After adjusting for chronological age, DNAm PhenoAge has a very weak negative correlation with LTL, both for the whole population and when stratifying by race/ethnicity and/or sex. This suggests that higher DNAm PhenoAge is modestly related to shorter LTL.

SUPPLEMENTARY FIG. S9. Fixed effects meta-analysis of the effect of DNAm phenotypic age acceleration on the hazard of death after adjusting for blood cell counts. DNAm PhenoAge is significantly predictive of all-cause mortality even after accounting for leukocyte proportions. The Cox regression model is adjusted for chronological age, race/ethnicity, smoking pack-years, and imputed blood cell counts (exhausted CD8+ T cells, naïve CD8+ T cells, CD4+ T cells, natural killer cells, monocytes, granulocytes). The meta-analysis p value is colored in red. A significant heterogeneity p value (red font) indicates that the hazard ratios differ significantly across studies.
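To make the pooling step concrete, the following is a minimal sketch of an inverse-variance fixed-effects meta-analysis of per-study log hazard ratios, together with Cochran's Q statistic that underlies a heterogeneity test. It is not the authors' code: the function name `fixed_effect_meta` and the example hazard ratios and standard errors are hypothetical and serve only as illustration.

```python
import math

def fixed_effect_meta(log_hrs, ses):
    """Pool per-study log hazard ratios with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    # Two-sided p value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    # Under homogeneity, Q is compared to a chi-square with k - 1 df.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    return {"hr": math.exp(pooled), "se": pooled_se, "z": z,
            "p": p, "q": q, "df": len(log_hrs) - 1}

# Hypothetical per-study hazard ratios and standard errors (illustration only).
log_hrs = [math.log(1.04), math.log(1.05), math.log(1.03)]
ses = [0.010, 0.015, 0.020]
result = fixed_effect_meta(log_hrs, ses)
```

Studies with smaller standard errors receive larger weights, so the pooled hazard ratio sits closest to the most precisely estimated study. A large Q relative to its degrees of freedom would correspond to the significant heterogeneity p value described in the caption.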
SUPPLEMENTARY FIG. S10. Properties of the 513 CpGs that underlie DNAm PhenoAge. In our functional enrichment analysis of the chromosomal locations of the 513 CpGs, we distinguished CpGs with positive age correlation from CpGs with negative age correlation. CpGs with positive age correlation exhibited a lower variance but a similar mean methylation level compared to CpGs with negative age correlation (B, C). The 149 CpGs whose age correlation exceeded 0.2 tended to be located in CpG islands (E) and were significantly enriched with polycomb group protein targets (p=8.7E-5, D). A) Each CpG was correlated with chronological age in whole blood. The histogram shows the correlation coefficients.
Statistical comments: To avoid biased enrichment results, it is important to use the correct background set of CpGs when characterizing the properties of the 513 CpGs. The set of CpGs on the Illumina 27K array is the appropriate background set because this set of CpGs was used when training the DNAm PhenoAge estimator. While we used the Illumina 450K array to measure DNA methylation levels, we only used a small subset of the CpGs (namely those located on both the Illumina 27K and the EPIC array) as the training set when developing DNAm PhenoAge. We discarded most of the CpGs on the Illumina 450K array when training the DNAm PhenoAge estimator because, surprisingly, the resulting DNAm PhenoAge estimator exhibited more significant predictive associations with lifespan than DNAm estimators built using the full set of CpGs on the Illumina 450K array.
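The role of the background set can be illustrated with a one-sided hypergeometric test, which is one common way to assess enrichment; the text does not state which test was used, so this is a sketch under that assumption. The function name and all counts below are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected CpGs are drawn without replacement
    from a background of n_background CpGs, n_annotated of which carry the
    annotation of interest (e.g. located in a CpG island)."""
    upper = min(n_annotated, n_selected)
    denom = comb(n_background, n_selected)
    num = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return num / denom

# Hypothetical counts (illustration only): a background of 1000 CpGs,
# 100 of which are annotated; 50 CpGs selected, 15 of them annotated.
p_enriched = hypergeom_enrichment_p(1000, 100, 50, 15)
# The same selection with only 5 annotated CpGs matches chance expectation.
p_expected = hypergeom_enrichment_p(1000, 100, 50, 5)
```

Both `n_background` and `n_annotated` must be counted within the set of CpGs that were eligible for selection. Counting them on a larger array than the one used for training changes the expected overlap and can therefore produce spurious enrichment or depletion.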

SUPPLEMENTARY FIG. S11: Correlation between 1) gene-DNAm PhenoAgeAccel correlations and 2) gene-chronological age correlations.
Genes have similar correlations with chronological age as they do with DNAm PhenoAgeAccel (DNAm PhenoAge adjusted for age). This suggests that genes whose expression tends to increase as a function of age tend to be upregulated in persons who are epigenetically older than expected, compared to others of the same age. Conversely, genes that show decreased expression with age are downregulated among those who are epigenetically older than expected. This can be taken to signify that age-related transcriptional alterations are further exacerbated for those with higher DNAm PhenoAge relative to their chronological age. However, these cross-sectional association studies do not allow us to dissect cause-and-effect relationships between gene transcripts and DNA methylation changes.
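As a minimal sketch of what "adjusted for age" can mean in practice, the code below computes an age acceleration measure as the residual from a simple linear regression of the epigenetic age estimate on chronological age. This assumes a residual-based definition; the function name `age_acceleration` and the example values are hypothetical.

```python
def age_acceleration(dnam_age, chron_age):
    """Residuals from an ordinary least-squares fit of dnam_age on chron_age."""
    n = len(chron_age)
    mean_x = sum(chron_age) / n
    mean_y = sum(dnam_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: epigenetically older than expected for that age.
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]

# Hypothetical values (illustration only).
chron_age = [50.0, 60.0, 70.0, 80.0]
dnam_age = [48.0, 63.0, 69.0, 84.0]
accel = age_acceleration(dnam_age, chron_age)
```

By construction the residuals have mean zero and are uncorrelated with chronological age, which is why correlations between gene expression and the acceleration measure are not simply a restatement of correlations with age.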

SUPPLEMENTARY FIG. S12: Correlation between elastic net beta coefficients and age correlations.
Overall, the coefficients (weights) for the 513 CpGs in the DNAm PhenoAge score (x-axis) are not highly correlated with univariate age correlations (y-axis). Moreover, many of the CpGs shown in blue have little to no age correlation, while having some of the highest weights.
SUPPLEMENTARY FIG. S13. Partial likelihood versus log(lambda) parameter for elastic net proportional hazard model. Ten-fold cross-validation was employed to select the parameter value, lambda, for the penalized regression. To develop a sparse phenotypic age estimator (the fewest biomarker variables needed to produce robust results), we selected a lambda of 0.0192, which represented a one standard deviation increase over the lambda with minimum mean-squared error during cross-validation. Of the forty-two biomarkers included in the penalized Cox regression model, ten variables (including chronological age) were selected for the phenotypic age predictor.
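The selection rule can be sketched as follows, assuming it corresponds to the widely used "one-standard-error" convention: take the largest lambda whose mean cross-validated error lies within one standard error of the minimum. The function name `select_lambda` and the cross-validation values are hypothetical; the error criterion could be mean-squared error or partial-likelihood deviance, depending on the model.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambdas : candidate penalty values
    cv_mean : mean cross-validated error for each lambda
    cv_se   : standard error of that mean for each lambda
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # A larger lambda means a stronger penalty and hence a sparser model.
    lambda_1se = max(
        lam for lam, err in zip(lambdas, cv_mean) if err <= threshold
    )
    return lambdas[i_min], lambda_1se

# Hypothetical cross-validation results (illustration only).
lambdas = [0.001, 0.01, 0.1, 1.0]
cv_mean = [1.05, 1.00, 1.03, 1.50]
cv_se = [0.05, 0.05, 0.05, 0.05]
lambda_min, lambda_1se = select_lambda(lambdas, cv_mean, cv_se)
```

Choosing `lambda_1se` trades a small loss in cross-validated fit for fewer selected variables, which matches the stated goal of a sparse estimator. Choosing `lambda_min` is appropriate when sparseness is not a goal.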

SUPPLEMENTARY FIG. S14. Partial likelihood versus log(lambda) parameter for elastic net regression
The CpGs used in the elastic net are those found on all three of the Illumina Infinium 450k chip, the EPIC chip, and the Illumina Infinium 27k chip. Lambda was selected using 10-fold cross-validation; however, given that sparseness was not a goal with this model, the lambda with the minimum mean-squared error was selected (lambda=0.35). This lambda produced a model in which phenotypic age is predicted by DNAm levels at 513 CpGs.
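Because an elastic net regression yields a linear model, the resulting estimate can be computed as an intercept plus a weighted sum of methylation levels at the selected CpGs. The sketch below illustrates this form only: the function name `dnam_score`, the CpG identifiers, the weights, and the intercept are all hypothetical and are not the published coefficients.

```python
def dnam_score(intercept, weights, betas):
    """Linear predictor: intercept + sum of weight * methylation beta value."""
    missing = [cpg for cpg in weights if cpg not in betas]
    if missing:
        # Refuse to score a sample that lacks any CpG used by the model.
        raise KeyError(f"missing CpGs: {missing}")
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())

# Hypothetical CpG identifiers, weights, and intercept (illustration only).
weights = {"cg_example_1": 12.5, "cg_example_2": -8.0, "cg_example_3": 3.2}
# Methylation beta values for one sample, each between 0 and 1.
betas = {"cg_example_1": 0.40, "cg_example_2": 0.75, "cg_example_3": 0.10}
score = dnam_score(60.0, weights, betas)
```

In a multivariate model of this kind, a CpG's weight reflects its contribution conditional on all other CpGs, so a large weight does not require a strong univariate correlation with age.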
SUPPLEMENTARY FIG. S15: Scatterplots of top genes vs DNAm PhenoAgeAccel and chronological age in the FHS PBMC data. Scatterplots depict the associations (negative in blue and positive in red) between expression in top genes and either DNAm PhenoAgeAccel (top six panels), or chronological age (bottom six panels). Results suggest that genes relate similarly to age adjusted DNAm PhenoAgeAccel and chronological age.
SUPPLEMENTARY FIG. S16: Scatterplots of top genes vs DNAm PhenoAgeAccel and chronological age in the MESA Monocyte data. Scatterplots depict the associations (negative in blue and positive in red) between expression in top genes and either DNAm PhenoAgeAccel (top six panels), or chronological age (bottom six panels). Results suggest that genes relate similarly to age adjusted DNAm PhenoAgeAccel and chronological age.