Genetic associations for two biological age measures point to distinct aging phenotypes

Abstract Biological age measures outperform chronological age in predicting various aging outcomes, yet little is known regarding genetic predisposition. We performed genome‐wide association scans of two age‐adjusted biological age measures (PhenoAgeAcceleration and BioAgeAcceleration), estimated from clinical biochemistry markers (Levine et al., 2018; Levine, 2013) in European‐descent participants from UK Biobank. The strongest signals were found in the APOE gene, tagged by the two major protein‐coding SNPs, PhenoAgeAccel—rs429358 (APOE e4 determinant) (p = 1.50 × 10−72); BioAgeAccel—rs7412 (APOE e2 determinant) (p = 3.16 × 10−60). Interestingly, we observed inverse APOE e2 and e4 associations and unique pathway enrichments when comparing the two biological age measures. Genes associated with BioAgeAccel were enriched in lipid related pathways, while genes associated with PhenoAgeAccel showed enrichment for immune system, cell function, and carbohydrate homeostasis pathways, suggesting the two measures capture different aging domains. Our study reaffirms that aging patterns are heterogeneous across individuals, and the manner in which a person ages may be partly attributed to genetic predisposition.


| INTRODUC TI ON
While chronological age is arguably the strongest determinant of risk for major chronic diseases, it alone is not sufficient to reflect the state of biological aging. Individuals with similar chronological ages are heterogeneous in their physiological states, and subsequent health risks, due to differences in both the rate and manner of biological aging. As a result, efforts have been launched to develop measures that can capture the concept of biological age (BA) (Ferrucci et al., 2020). Typically, these measures encompass single or composite biomarkers found to be associated with a surrogate of biological age, usually chronological age or mortality. In principle, a valid BA measure needs to outperform chronological age in predicting lifespan and a wide range of age-sensitive tests in multiple physiological and behavioral domains (Butler et al., 2004). BA predictors may help discover drivers of the aging process and can be used for secondary prevention by identifying at-risk individuals prior to disease onset. Additionally, they have been proposed as tools to monitor intervention or treatment effects aimed at targeting the aging process.
A variety of data modalities, covering different biological levels of organization, have been used to develop BA predictors, for example, DNA methylation data, gene expression data, proteomic data, metabolomic data, and clinical chemistry measures (Jylhävä et al., 2017). A study comparing 11 BA predictors found low agreement between those based on DNA methylation, clinical biomarkers, and telomere length, in terms of both their correlations with each other and their relative associations with healthspan-related characteristics: balance, grip strength, motor coordination, physical limitations, cognitive decline, self-rated health, and facial aging (Belsky et al., 2018). These findings suggest that different BA predictors may be capturing distinct aspects of the aging process. Furthermore, a recent paper (Ahadi et al., 2020) combining various omics data identified distinct "ageotypes", which represent diverse aging patterns across individuals.
We hypothesize that individual susceptibility to one biological aging domain versus another may be due in part to underlying genetic mechanisms. To test the hypothesis, we used data from the UK Biobank study to understand genetic predisposition to accelerated aging, measured by two validated biological age predictors, PhenoAge  and BioAge (Levine, 2013). PhenoAge and BioAge were previously trained for mortality and chronological age, respectively, as surrogates of biological age, using the biomarker data from the National Health and Nutrition Examination Survey (NHANES) III in the United States (US). Biomarkers were selected based on associations with the biological age surrogates for both measures and knowledge regarding their role or dependency in the aging process also for BioAge. Details on variable selection and methods to construct PhenoAge and BioAge are provided in the Methods. PhenoAge is a function of chronological age, albumin, creatinine, C-reactive protein (CRP), alkaline phosphatase, glucose, lymphocyte percentage, mean corpuscular volume, red blood cell distribution width (RDW), and white blood cell count. BioAge is a function of chronological age, albumin, creatinine, CRP, and alkaline phosphatase (also in PhenoAge), plus glycated hemoglobin (HbA1c), systolic blood pressure, and total cholesterol. Both aging measures have been shown to be robust predictors of aging outcomes (Levine, 2013;Levine et al., 2018;Liu et al., 2018) yet are clearly distinct. Biological age acceleration measured by either biological age measure adjusted for chronological age (PhenoAgeAccel or BioAgeAccel) was similarly associated with morbidity and mortality in the full sample and subgroups of National Health and Nutrition Examination Survey (NHANES) IV, but PhenoAgeAccel outperformed BioAgeAccel in those disease-free and with normal body mass index . We applied PhenoAge and BioAge that were previously trained using the US NHANES III data to the UK Biobank for genome-wide association studies (GWASs). Overall, this study is fully supported by UK Biobank, featured by a large sample and extensive genetic and phenotypic data.

| RE SULTS
451,367 genetically-determined Europeans were identified in UK Biobank. Of whom, 379,703 unrelated participants were included in analyses. Among the included samples (Table S1) participants in the training set and for 195,847 participants in the testing set. The training set was used to perform genome-wide association analysis and the GWAS summary statistics were used to construct polygenic risk scores (PRSs) in the testing set, evaluated for the use of risk stratification for a variety of age-related outcomes.
Demographics and PhenoAge or BioAge biomarker levels were quite balanced between the training and testing sets (Table S2).

| PhenoAgeAccelGWAS
In the Genome-Wide Association Study (GWAS) of PhenoAgeAccel, 7,561 genetic variants were identified (p < 5 × 10 −8 ) (Manhattan plot in Figure 1, including the top 10 mapped genes of lead SNPs detailed in Table 1). The SNP p-value distribution showed sizable deviation from the null distribution of no association. However, there was a lack of evidence of population stratification or cryptic relatedness (LD score regression intercept 1.02 with the standard error (SE) 0.01, compared to the null value 1; genomic inflation factor 1.12), and the proportion of inflation not explained by polygenic heritability was only 6.33% (SE = 3.06%).
The Multi-marker Analysis of GenoMic Annotation (MAGMA) gene set analysis identified 11 gene sets at the Bonferroni-corrected level of 5%, including regulation of signaling and transcription, homeostasis (carbohydrate homeostasis, homeostasis of number of cells, and myeloid cell homeostasis), and immune system process ( Figure 2). In the MAGMA tissue expression analysis, we found that genes expressed in whole blood and liver were more likely to be associated with PhenoAgeAccel than genes expressed in other tissues ( Figure 3).

| BioAgeAccelGWAS
In the GWAS for BioAgeAccel, 996 genetic variants were identified (p < 5 × 10 −8 ) (Manhattan plot in Figure 1 including the top 10 mapped genes of lead SNPs detailed in Table 2). The observed p-value distribution was significantly deviant from the expected under the null ( Figure 2). However, there was no evidence to suggest population stratification or cryptic relatedness (LD score regression intercept = 1.02, SE = 0.01; genomic control factor 1.11), and the F I G U R E 1 PhenoAgeAccel (bottom) and BioAgeAccel (top) Manhattan plots (colors to separate adjacent chromosomes without other indications), including the top 10 mapped genes of lead SNPs (see Tables 2 and 3) proportion of inflation not explained by polygenic heritability was small, 6.58% (SE = 3.78%).
Several lead SNPs were associated with blood pressures. Other lead SNPs were associated with HbA1c, cardiovascular disease, and/or lipid biomarkers (Table S4).
The MAGMA gene set analysis identified 10 lipid-related gene sets at the Bonferroni-corrected level of 5%, including lipid homeostasis, lipid protein particle clearance, and triglyceride-rich plasma lipoprotein particle (Figure 2). None of the 53 tissues showed significant specificity in gene expression for the genes associated with BioAgeAccel ( Figure 3).
• The rs7412T allele, or APOE e2 determined allele, associated with increased longevity , is associated with decreased BioAgeAccel, but increased PhenoAgeAccel.

The genetic correlation between PhenoAgeAccel and
Waist circumstance and waist-hip ratio, adjusted for BMI and physical activity (Pulit et al., 2019), were not correlated with PhenoAgeAccel genetically, but a modest genetic correlation was found between BioAgeAccel and waist-hip ratio (r g = 0.16).
BioAgeAccel (systolic: r g = 0.84, diastolic: r g = 0.57) also was F I G U R E 2 Significant gene sets identified by MAGMA for PhenoAgeAccel (in red) and BioAgeAccel (in green) at the Bonferroni-corrected level, 0.05/10,678 for 10,678 gene sets) genetically more positively correlated with systolic and diastolic blood pressures than PhenoAgeAccel (systolic: r g = 0.23, diastolic: r g = 0.17). Genetically increased PhenoAgeAccel and BioAgeAccel were correlated with lower forced vital capacity (FVC) and forced expiratory volume in one second (FEV1) to a moderate degree but not with the FEV1/FVC ratio. The genetic correlations between PhenoAgeAccel or BioAgeAccel with heel bone mineral density and heart rate variability (the root mean square of the successive differences of inter beat intervals, RMSSD) (Nolte et al., 2017) were minimal ( Figure S4). another liver biomarker, was more correlated with PhenoAgeAccel genetically. PhenoAgeAccel also was genetically more correlated than BioAgeAccel with creatinine, cystatin C, HbA1c, and CRP -biomarkers linked to kidney function, diabetes, and inflammation ( Figure S6).    Figure 4). The mean differences in systolic and diastolic blood pressures were much larger between the top 20% and the bottom 20% of BioAgeAccel PRS than that of PhenoAgeAccel PRS (top right, Figure 4).

| Polygenicriskscores
Both PhenoAgeAccel and BioAgeAccel PRSs were not associated with prevalent cancers including prostate cancer, breast cancer, and colorectal cancer (bottom right, Figure 4). The associations of BioAgeAccel PRS were stronger than those of PhenoAgeAccel PRS with prevalent cardiovascular diseases, particularly CAD and hypertension ( Figure 5). The odds ratio of CAD was 1.27 (95% CI:  Figure   S7). These biomarker results suggested the association between PhenoAgeAccel PRS and CAD was likely driven by non-lipid mechanisms as indicated by the gene set analysis results.
PhenoAgeAccel PRS was more strongly associated than BioAgeAccel PRS with liver and kidney diseases ( Figure 5) and the associated biomarkers, for example, albumin, total bilirubin, creatinine, and cystatin C ( Figure S7), plus COPD, hypothyroidism, type I and type II diabetes, and rheumatoid arthritis ( Figure 5) and Alzheimer's disease was not statistically significant ( Figure S3). ing array type, baseline assessment center, and the first five genetic principal components ( Figure 6). When comparing APOE genotypes as a function of PhenoAge, we find the reverse-e2e3 and e2e2 appeared older than e3e3, whereas e3e4, and e4e4 appeared younger.

F I G U R E 4
To further disentangle the associations between APOE genotypes and accelerated aging by the two biological age measures, we examined the associations between APOE genotypes and the individual biomarkers that make up the composites. We found that the trend of mean BioAge (e4e4 > e4e3 > e3e3 > e2e4 > e2e3 > e2e2) also held for total cholesterol, which was most strongly associated with APOE genotypes among the biomarkers of BioAge ( Figure 6).
When adjusting for total cholesterol, the trend of mean BioAge was reversed, that is, e3e4 and e4e4 younger than e2e3 and e2e2, suggesting that decelerated BioAge associated with e2 was driven by differences in plasma total cholesterol levels. The biomarkers that F I G U R E 5 Odds ratios (ORs) for diseases comparing the top 20% or 40%-60% to the bottom 20% of PhenoAgeAccel (in red) or BioAgeAccel (in blue) polygenic risk score (PRS) (*significantly associated with the top 20% of PhenoAgeAccel PRS at the 5% false-discovery-rate adjusted level; +significantly associated with the top 20% of BioAgeAccel PRS at the 5% falsediscovery-rate adjusted level) appeared to show inverse associations (similar to PhenoAge) were RDW, CRP, alkaline phosphatase, creatinine, and white blood cell count. For all of these biomarkers there was a trend towards higher levels among participants with e2 alleles and lower levels among those with e4 alleles (Figure 6).

| ReplicationintheUSheathretirementstudy
Complete genetic and 2016 biomarker data to estimate PhenoAge and BioAge were available for 5,572 and 1,782 participants, respectively. A summary for the PhenoAge and BioAge samples is provided in Table S6. We are underpowered to detect genome-wide signifi-  (Tables S7 and S8).
We also compared APOE genotypes, using e3e3 as the reference (Table S9). Overall, the three e4 genotypes showed a trend toward decreased PhenoAge, similar to what was observed in UKB, although only e2e4 significantly differed from e3e3 (p = 0.039), while e3e4 was marginally decreased (p = 0.054), and e4e4 was not significant (p = 0.912). Similarly, e2e2 and e2e3 showed a trend toward increased PhenoAge, with e2e2 being marginally significant (p = 0.058) and e2e3 not significant (p = 0.291). Overall, while we were underpowered to replicate the findings for all genotypes, the trends remained F I G U R E 6 Mean standard deviation (SD) differences between non-e3e3 and e3e3 genotypes: (1) biomarkers of PhenoAge (top) or BioAge (bottom) sorted by p-value from left to right for the null hypothesis of no genotypic effects; (2) p < 0.05, p < 0.01, and p < 0.001 labelled by *, **, ***, respectively unchanged. Further, given that this was a replication, a one-sided pvalue would be justified, and in that case, three of the genotypes significantly differed from e3e3 in the expected direction.
Finally, the GRS using the PhenoAgeAccel lead SNPs from UKB was significantly associated with PhenoAgeAccel in HRS. The mean PhenoAge was increased by 0.80 years (95% CI: 0.58 to 1.02, p = 1.42 × 10 −12 ) per SD increase in the GRS of PhenoAgeAccel, with adjustment for age in 2016, sex, and PC1-PC5. rs7412 was strongly associated with BioAgeAccel in UKB but the association cannot be replicated in HRS (Table S8), which also led to negative results with APOE genotypes (Table S9)  The hypothesis needs to be tested in older adults, however. Of note, accelerated aging is not only determined by genetics but also by environment. Interestingly, when considering the actual values rather than the PRS, accelerated PhenoAge is more strongly associated with all-cause mortality than accelerated BioAge in UK Biobank, which implies that the association between accelerated PhenoAge and allcause mortality may be explained to a larger degree by the environmental components.
The PhenoAgeAccel PRS was also related to dementia, but in the opposite than the expected direction, such that individuals with increased PhenoAgeAccel had reduced odds of dementia.
This result was almost entirely driven by the association between PhenoAgeAccel and APOE, which is the most well-known genetic risk factor for late-onset Alzheimer's disease (LOAD). Our results suggested that while PhenoAgeAccel and BioAgeAccel were both associated with the two APOE isoform coding SNPs (rs429358 and rs7412), the relationships were inverse. For instance, the APOE e4 allele is traditionally associated with adverse health outcomes, including an increased risk of Alzheimer's disease, cardiovascular disease, and reduced life expectancy, while the e2 allele confers protection.
However, in our results, we observed increased PhenoAgeAccel associated with e2 genotypes and decreased PhenoAgeAccel associated with e4 genotypes, relative to the common e3e3 genotype.
This paradoxical result was also found for a number of the biomarkers that make up PhenoAge, which likely explains this finding. For instance, APOE e2 allele was associated with higher CRP, RDW, alkaline phosphatase, creatinine, and white blood cell count, while APOE e4 allele was generally associated with lower levels of these biomarkers (Kuo et al., 2020). These biomarkers are not specific to physiological functions or diseases but are more or less associated with inflammation, which suggests that e2 genotypes may be prone to higher inflammation levels on average, compared to e3e3s and e4 genotypes. Our biomarker specific finding is supported by a previous study via CRP (one of the most popular inflammation markers) (Hubacek et al., 2010). Although the underlying mechanism is not fully understood, it has been reported that high inflammation (high CRP) is associated with increased risk of Alzheimer's disease among e4 genotypes (e3e4 or e4e4) but less so among non-e4 genotypes (Tao et al., 2018). This may suggest that increased inflammation levels in e2 genotypes does not increase their Alzheimer's disease risk, but that higher PhenoAge and inflammation among e4 genotypes could increase Alzheimer's disease risk. Contrary to PhenoAgeAccel, BioAgeAccel showed an expected association with APOE that consisted of decelerated aging among participants with e2 alleles and accelerated aging among participants with e4 alleles. This association was accounted for by higher levels of total cholesterol among those with increased BioAgeAccel. This is in-line with APOE known function as a transporter of extracellular cholesterol and the existing evidence suggesting those with the e2 allele exhibit reduced circulating cholesterol, particularly low-density lipoproteins (LDL) (Kuo et al., 2020).
We were able to replicate the association in HRS between PhenoAge and PhenoAgeAccel GRS using the lead SNPs from UKB and showed similar trends for APOE, in which e2 is associated with higher PhenoAgeAccel and e4 is associated with lower PhenoAgeAccel. The unsuccessful replication for BioAgeAccel may be explained by a small sample size or the instruction from the interviewer to the respondent with a high systolic blood pressure to consult their physician as BioAgeAccel, particularly those (e.g., rs7412) associated with systolic blood pressure. Systolic blood pressure, as we have shown in Figure S2, had the highest genetic correlation with BioAgeAccel (r g = 0.84, Figure   S2), compared to other BioAge biomarkers. At the end, more replication studies are needed to validate the UKB findings although it is challenging to find a large cohort with both genetic and biomarker data.
Inevitably, our study has limitations. The UK Biobank participants are healthier than the general population (Fry et al., 2017); therefore, are less susceptible to accelerated aging. The disease status was determined based on self-reported doctor diagnoses at baseline and inpatient electronic health records to 2017. Given that some participants were still relatively young and will likely go on to develop lateonset morbidity, this will contribute to misclassification, which could lead to biased associations, towards the null if the misclassification is non-differential. Nevertheless, when disease prognostic biomarkers were analyzed, we observed consistent results. Last but not least, our findings are based on European-descent participants and may not be generalizable to other ancestry populations.
Overall, the mapped genes and enriched genes sets highlight that

| Geneticdata
DNA was extracted from blood samples and was genotyped using Affymetrix UK BiLEVE Axiom array for the first ~50,000 participants and Affymetrix UK Biobank Axiom array for the remaining cohortthe two arrays have over 95% content overlap (Bycroft et al., 2018).
Imputation was performed by the UK Biobank team using the reference panels of 1000 Genomes and the Haplotype Reference Consortium (HRC), yielding ~93 million variants in 487,442 participants. Of whom, participants (n = 968) with unusually high heterozygosity or missing genotype calls were further removed (Bycroft et al., 2018).

| Biologicalagemeasures
PhenoAge and BioAge were previously trained using data from a US cohort (NHANES III) in separate papers (Levine, 2013;Levine et al., 2018) (Klemera & Doubal, 2006), where x j denotes the level of j-th biomarker, with the corresponding parameters q j , k j , and s j provided in

| Includedsamples
Participants of European descent were included, identified using genetic principal components analysis in detail in Thompson and colleagues (Thompson et al., 2019). Additionally, one in third-degree or closer pairs were removed, identified via pairwise kinship coefficients. The sample was randomly split into a training and a testing set, with a 1 to 2 ratio. The training set was used to perform genomewide association analysis with the results being used to create PRSs in the testing set to evaluate the use for risk stratification for agerelated outcomes.

| Genome-wideassociationanalysis
The association between accelerated PhenoAge or BioAge with each SNP was examined using an efficient Bayesian linear mixed effects model (BOLT-LMM software version 2.2)  for the outcome of PhenoAge or BioAge with additive allelic effect of the candidate SNP, and other fixed effects: chronological age (to make the case of accelerated biological age), sex, genotyping array type, and assessment center, plus random polygenic and environment effects. By default, the LD scores included in the BOLT-LMM for European-ancestry samples were used to calibrate the BOLT-LMM statistic. SNP p-values smaller than 5 × 10 −8 were deemed to be statistically significant. Manhattan and tile-quantile (Q-Q) plots were created for visualization using the CMplot R package.
The genomic inflation due to population stratification or cryptic relatedness was evaluated by linkage disequilibrium (LD) score regres- We performed a stepwise model selection procedure on the genome-wide SNP summary statistics to identify independent signals (p < 5 × 10 −8 ) using the COJO (Conditional & Joint association analysis) function in the GCTA (Genome-wide Complex Trait Analysis) software version 1.92.1 beta6 Linux (Yang et al., 2012). SNPs more than 10,000 kb away from each other were assumed to be in complete linkage equilibrium. As SNPs were selected, those with multiple regression R 2 greater than 0.9 with already pre-selected SNPs were excluded, so as not to include redundant signals from high LD. The loci marked by the selected SNPs were mapped to genes based on GRCh37/hg19 coordinates, and were used in searches for published GWAS associations based on GWAS catalog (Buniello et al., 2019).

| Geneenrichmentanalysis
The GWAS p-values were analyzed by Multi-marker Analysis of

| Polygenicriskscores
The PRSice-2 software version 2.2.2 (Choi & O'Reilly, 2019) was used to perform polygenic risk score (PRS) analysis. SNPs were clumped to obtain SNPs in low LD (r 2 < 0.1) in a 250 base-pair window. SNPs with p-values smaller than a threshold were used to calculate the PRS, sum of the effect alleles associated with accelerated aging, weighted by the effect size. The optimal threshold was chosen by a grid search from 1 × 10 −5 to 0.5 with the increment of 1 × 10 −5 plus 1, such that the variance of PhenoAge or BioAge in the testing set was best explained by PRS, in addition to that by baseline chronological age, sex, genotyping array type, baseline assessment center, and the first five genetic principal components. Subjects were equally divided into five groups by the PRS, where the top 20% (high-risk class) was compared to the bottom 20% (low-risk class) for a variety of aging traits (n = 111). The association analysis was conducted using a regression model, with adjustment for baseline chronological age, sex, genotyping array type, baseline assessment center, and the first five genetic principal components: (1) Cox regression models for lifespan outcomes with hazard ratios reported; (2) logistic regression models for binary outcomes such as disease and pain outcomes with odds ratios reported; (3) linear regression models for continuous outcomes such as physical measures and 49-item frailty that were z-transformed so the regression coefficients associated with PRS quintiles were unit-free and represented the mean differences between quintiles in standard deviations (SDs). Log (e.g., cognitive function measures and 49-item frailty) or the inverse normal transformation (e.g., blood counts and biomarkers) may be applied before the z-transformation to correct the distributional skewness.
Traits that show significant differences between the top 20% and the bottom 20% of PRS were highlighted at the 5% false-discoveryrate adjusted level. Results comparing the top 40-60% to the bottom 20% of PRS were also included in forest plots to examine the dosage effects.

| ReplicationintheUSheathretirementstudy
For replication, we used European-descent participants in the US Health Retirement Study (HRS) (Fisher & Ryan, 2018;Sonnega et al., 2014) to test the associations between PhenoAgeAccel/BioAgeAccel and the UKB lead SNPs individually, their genetic risk score (GRS), or APOE genotypes. Linear regression models were used, with adjustment for age, sex, and the first five genetic principal components (PCs) (Pilling et al., 2017).

CO N FLI C TO FI NTE R E S T
MEL is named on patent applications for epigenetic clocks and holds licenses for the clocks she has developed. MEL also serves as the Bioinformatics Advisor for Elysium Health.

AUTH O RCO NTR I B UTI O N S
Study design: CLK, MEL; data analysis: CLK; manuscript preparation: all.