Longitudinal plasma metabolomics of aging and sex

Understanding how metabolites are longitudinally influenced by age and sex could facilitate the identification of metabolomic profiles and trajectories that indicate disease risk. We investigated the metabolomics of age and sex using longitudinal plasma samples from the Wisconsin Registry for Alzheimer’s Prevention (WRAP), a cohort of participants who were dementia free at enrollment. Metabolomic profiles were quantified for 2,344 fasting plasma samples among 1,212 participants, each with up to three study visits. Of 1,097 metabolites tested, 623 (56.8%) were associated with age and 695 (63.4%) with sex after correcting for multiple testing. Approximately twice as many metabolites were associated with age in stratified analyses of women versus men, and 68 metabolite trajectories significantly differed by sex, most notably including sphingolipids, which tended to increase in women and decrease in men with age. Using genome-wide genotyping, we also report the heritabilities of metabolites investigated, which ranged dramatically (0.2–99.2%); however, the median heritability of 36.2% suggests that many metabolites are highly influenced by a complex combination of genomic and environmental influences. These findings offer a more profound description of the aging process and may inform many new hypotheses regarding the role metabolites play in healthy and accelerated aging.


INTRODUCTION
The metabolome represents the functional endpoints of a complex network of biological events, including genomic, epigenomic, transcriptomic, proteomic, and environmental factors [1]. Being the final downstream product, the metabolome is the closest to the phenotype among the biological systems [2], making it particularly relevant to investigate. Age is known to be the single largest risk factor of most prevalent diseases in developed countries [3]. A better understanding of how the metabolome changes with age could further reveal the mechanisms by which age influences disease risk and could facilitate the identification of high-risk metabolomic profiles that are suggestive of the early stages of particular diseases.

AGING
Previous studies have provided important evidence that age and sex influence the metabolome [4][5][6][7][8][9][10]. While informative, these studies are limited by their cross-sectional designs and the relatively small number of metabolites assessed by most. According to the Human Metabolite Database (HMDB) v4.0, there are an estimated 25,424 blood metabolites [11]. However, due to current technical limitations in identifying and quantifying metabolites, most recent studies have only been able to confidently capture ~100-600 of these. A larger panel of metabolites will provide a more comprehensive understanding of the metabolomics of age and sex. Further, in order to assess the metabolomics of aging, it is crucial to use a longitudinal study design that can capture age-related phenomena, particularly due to the high variability of metabolites [12]. Longitudinal assessments also facilitate the examination of metabolite trajectories, which can address important biological questions.
Using longitudinal plasma samples from the Wisconsin Registry for Alzheimer's Prevention (WRAP), we investigated how a large panel of metabolites is influenced by age and sex, and whether metabolite trajectories vary by sex. To facilitate the interpretation of our results and determine whether identified metabolites are more strongly influenced by genetic or environmental factors, we used genome-wide genotyping data to assess the heritability (h²) of metabolites.

Participants
A total of 1,212 WRAP participants with 2,344 longitudinal fasting plasma samples were available for analyses. At the baseline visit for the current study, participants were 61 years old on average, 69% were female, and 94% were Caucasian (Table 1). Most individuals were unrelated (n=825), but 147 families had >1 individual (family sizes ranged from 1-9 members, with an average of 1.2 individuals per family). Analyses stratified by sex included 838 women and 374 men, who had similar characteristics with the exception of more men taking cholesterol lowering medications than women. Participants each had 1,097 plasma metabolites available for analyses, 347 (31.6%) of which were of unknown chemical structure. Correlations between metabolites were assessed using Pearson r and the first available sample for each individual (i.e., using a cross-sectional approach). Metabolites were largely uncorrelated with each other ( Figure S1). Properties of each metabolite, such as biochemical name, super pathway, and sub pathway, are described in Table S1.

Metabolome-wide association study
Associations were tested using linear mixed effects regression models implemented in the SAS MIXED procedure. Primary predictors included age and sex, which were assessed within the same models. To examine effect modification of the metabolomics trajectories by sex, analyses were repeated stratifying the sample by sex. To assess the statistical significance of the effect modification, separate models were run that included an interaction term for age-by-sex using the full sample (men and women combined). All models included random intercepts for within-subject correlations (due to repeated measures) and within-family correlations (due to siblings). Models included fixed effects for age, sex, self-reported race, body-mass index, sample storage time, and cholesterol lowering medication use, which was the most commonly used class of medications in our sample. Since our cohort has increased risk for Alzheimer's disease, we performed sensitivity analyses including additional fixed effects for parental history of Alzheimer's and participant cognitive impairment status, and results were largely unchanged. Each set of analyses was corrected for multiple testing using the Benjamini-Hochberg [13] adjustment with an alpha of 0.05.
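The Benjamini-Hochberg step-up adjustment named above can be illustrated with a minimal sketch. This is not the SAS code used in the study; the function name `benjamini_hochberg` and the example P-values are hypothetical and serve only to show how the procedure decides which tests are declared significant at an alpha of 0.05.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test is significant at FDR alpha."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical P-values for eight metabolites (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_result = benjamini_hochberg(example_p)
```

Because the rule is step-up, a P-value that misses its own rank threshold can still be declared significant if a larger P-value further down the ranking meets its threshold.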

Aging metabolomics
All metabolome-wide association results are summarized in Table 2 and detailed in Table S1. After adjusting for multiple testing, the levels of 623 metabolites (56.8% of metabolites assessed) significantly changed with age, 523 of which increased with age ( Figure S2A and Figure 1). Of the total 34 steroid lipids tested, 29 significantly decreased with age (including 19/22 androgenic, 5/5 progestin, 4/4 pregnenolone, and 1/3 corticosteroids), while two, 11-ketoetiocholanolone glucuronide, an androgenic steroid, and cortisol, significantly increased with age. Higher levels of most fatty acid lipids were associated with increased age (including 13/14 long chain fatty acids, 28/34 acylcarnitines, and 42/78 other fatty acids). Higher levels of sphingolipids tended to be associated with increased age (24/25 associated sphingolipids).
The majority of amino acids associated with age increased with age (87.6% or 92/105 associated amino acids), including glutamine and tyrosine of the 20 common amino acids that are encoded directly by the genetic code. Five other common amino acids decreased with age (histidine, threonine, tryptophan, leucine, and serine), while the 13 others were not associated with age.
[Figure legend] Positive values indicate the amount a metabolite increased over 10 years, whereas negative values indicate the amount a metabolite decreased over 10 years. Black vertical lines indicate standard errors.
[Table footnote] Shaded rows represent super pathways, which sum to the "Total" row. Sub pathways are indented. In the Sex columns, + means the metabolite was higher in women, whereas - means the metabolite was higher in men. For all other columns, + means the metabolite increased with age, whereas - means it decreased with age. In the Age*Sex columns, +/+ means the metabolite increased with age in both women and men, -/- means it decreased with age in both women and men, +/- means it increased with age in women and decreased with age in men, and -/+ means it decreased with age in women and increased with age in men. Results from the Age and Sex columns were assessed within the same model; results from the Age in Women and Age in Men columns were assessed within separate models stratifying the sample by sex; and results from the Age*Sex column were assessed within a separate model including an age-by-sex interaction term.

Sex metabolomics
Six hundred and ninety-five metabolites (63.4% of metabolites assessed) significantly differed by sex, with the slight majority (386 metabolites or 55.5%) found in lower levels in women (Figure S2B and Figure 2). Of the metabolites associated with sex, 405 were also associated with age. Twenty-nine steroid lipids were associated with sex, all of which were found in significantly lower levels in women, with the exception of two corticosteroids (cortisol and corticosterone), which were found in higher levels in women. Androgenic steroids constituted the three metabolites most strongly associated with sex (5alpha-androstan-3alpha, 17beta-diol monosulfate, P=1.4e-311; 5alpha-androstan-3alpha, 17beta-diol 17-glucuronide, P=3.1e-228; and 5alpha-androstan-3alpha, 17beta-diol disulfate, P=5.7e-185).
Ninety fatty acids were associated with sex, 60 of which were found in higher levels in women. Acylcarnitine fatty acids were an exception, as 17/26 significantly associated acylcarnitines were found in lower levels in women. Among all tested phospholipids, 73.8% (48/65) were higher in women, as were 87.5% (35/40) of all tested sphingolipids.
The majority of amino acids associated with sex were found in lower levels in women (75.9% or 85/112), including 13 of the 20 common amino acids (alanine, tyrosine, methionine, arginine, proline, aspartate, asparagine, tryptophan, glutamate, phenylalanine, and the three branched-chain amino acids (BCAAs): leucine, isoleucine, and valine), while two were found in higher levels in women (glycine and serine). The remaining five did not significantly differ by sex.

Effect modification of metabolomics trajectories by sex
Analyses stratified by sex identified 565 metabolites (51.5% of metabolites assessed) that were significantly associated with age among women ( Figure S3A and Figure S4) and 255 metabolites (23.2% of metabolites assessed) among men ( Figure S3B and Figure S5), with 188 being common to both groups.
The trajectories of 68 metabolites (6.2%) significantly differed over time by sex ( Figures S3C and S6). The three most significant metabolites were sphingolipids, which were also the largest group of metabolites whose trajectories differed by sex (22.1% or 15/68). Nine of these sphingolipids increased with age among women and decreased with age among men. Several other groups of metabolites had trajectories that also differed by sex, including six fatty acids, five of which showed larger increases with age among women than men; eight steroid lipids, seven of which showed larger decreases with age among women than men; eight phospholipids, five of which increased in women and decreased in men with age; and cholesterol, which increased in women and decreased in men with age.

Metabolite heritability estimates
The h² of each metabolite was estimated using a variance components method that jointly models narrow-sense h² and the h² explained by genotyped variants [14], which allows for the inclusion of both closely and distantly related individuals, as implemented in GCTA [15]. A genetic relationship matrix was created from 272,839 weakly linked (R²<0.50) and common (MAF>0.05) directly genotyped variants. Analyses of h² were cross-sectional, using the first available metabolomics sample for 1,111 Caucasians that had both metabolomic and genomic data, and adjusted for sex and age. To assess whether metabolite h² could influence the effect of age or sex on metabolite levels, Pearson r was used to calculate correlations between h² estimates and the strength of associations (i.e., P-values) for age and sex.
Metabolites associated with age and sex had h² estimates that were representative of overall metabolite h² estimates. Among the 623 metabolites associated with age, the median h² was 36.1% (Q1-Q3: 26.2-50.0%). Similarly, among the 695 metabolites associated with sex, the median h² was 37.2% (Q1-Q3: 25.6-50.7%). Overall, metabolite h² estimates were not correlated with the strength of associations for age or sex (Pearson r=-0.01 and -0.02, respectively).
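As an illustration of the correlation check described above, the following minimal sketch computes Pearson r between two equal-length lists. The function name `pearson_r` and the toy values are hypothetical. Expressing association strength as -log10(P) is an assumption made here for illustration, since the text states only that P-values were used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the centered values.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical h² estimates (%) and association strengths, here -log10(P).
toy_h2 = [10.0, 20.0, 30.0, 40.0]
toy_strength = [1.0, -1.0, -1.0, 1.0]
toy_r = pearson_r(toy_h2, toy_strength)
```

A value of r near zero, as reported for both age and sex, indicates no linear relationship between how heritable a metabolite is and how strongly it is associated with the predictor.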

DISCUSSION
To our knowledge, this is the first longitudinal metabolomics assessment of aging and sex and uses one of the largest panels of metabolites reported to date. Our results provide strong evidence that most plasma metabolite levels are highly influenced by aging and that aging has a broader effect on metabolites in women than men. Metabolites are also highly influenced by sex, with men and women having substantially different metabolomic profiles. We report h² estimates on more metabolites than previously reported and find that most are influenced by a complex combination of genetic and environmental factors, consistent with previous studies [16,17]. How heritable a metabolite was did not appear to influence the effect of age or sex on metabolite levels.
Differences in levels of plasma lipid steroids, including androgens, progestins, and pregnenolones, were among the most significant findings for both age and sex. Steroid differences by sex serve as a proof of concept, as it is well established that androgens are present in lower levels in women than men [18]. Androgens are also known to decrease with age among men [19,20] and women [21,22].
Plasma metabolites we identified to be associated with sex and age are consistent with findings from previous cross-sectional studies. The UK Adult Twin Registry (TwinsUK) study reported 165 out of 280 (58.9%) tested serum and plasma metabolites to be associated with age in cross-sectional analyses [5]. Our data had 114 of these 165 metabolites, of which 72 were significantly associated with age, and 66 had effects that were in the same direction as those reported in the TwinsUK study (Table S2). The metabolites that had the opposite direction of effect between studies were four amino acids (dimethylarginine, leucine, serine, and tryptophan), one nucleotide (uridine), and one xenobiotic (theophylline), all of which we reported decreased with age, with the exception of dimethylarginine, which increased with age, contradictory to findings from the TwinsUK study. However, other studies have reported that serum tryptophan levels decrease with age [4,9]. Among the 66 metabolites with the same effect, 27 were lipids, all of which increased with age (the majority were fatty acids, including 10 long chain fatty acids, six polyunsaturated fatty acids, and four other fatty acids), and 14 were amino acids (including glutamine and tyrosine, which both increased with age, and histidine, which decreased with age).
The Cooperative Health Research in the Region of Augsburg (KORA F4) study, which was also cross-sectional, reported 180 out of 507 (35.5%) tested serum metabolites to be associated with sex [7]. Our data had 98 of these 180 metabolites, of which 84 were significantly associated with sex, and all had effects that were in the same direction as those reported in KORA F4 (Table S3). Among these were 33 amino acids (including 11 common amino acids, all of which were lower in women except glycine and serine, which is also consistent with Mittelstrass et al. [6]); 18 lipids (including five long chain fatty acids and three medium chain acids, all of which were higher in women, and three androgenic steroids, all of which were lower in women); and 18 unknown metabolites (all but one were lower in women). The single most significant finding in the KORA F4 study was the third most significant in our study (5alpha-androstan-3beta,17beta-diol disulfate, an androgenic steroid; the two other androgenic steroids that were our first and second most significant sex findings were not tested in the KORA F4 study). Also consistent with our findings, other studies have reported serum and plasma phosphatidylcholine and sphingolipid levels to be higher in women than men [6,8,23], and serum acylcarnitines to be lower in women [6].
Consistent with results from our sex-stratified analyses, a previous KORA F4 publication also reported serum sphingolipids to increase in concentrations with age among women and acylcarnitines to increase with age among both women and men [4]. It has been shown that higher levels of acylcarnitines are associated with higher risk for type 2 diabetes and obesity, which are increasingly common conditions in the US, and correlate with poor glycemic control [24]. Follow up research is needed to investigate whether acylcarnitines are causally associated with obesity and could serve as a target for obesity intervention. The KORA F4 study, which had a sample of 1,038 women and 1,124 men, also similarly found twice as many metabolites associated with age among women than men. This suggests that our similar observation may not be driven solely by the differences in sample sizes between women and men in our study and that it may have biological implications; i.e., aging may influence a wider breadth of metabolites in women than men. A probable cause for such a difference may be that during menopause, women experience very abrupt and dramatic hormone changes and loss of ovarian function, whereas during "andropause", men experience a gradual loss of hormones and decline in fertility [25]. These hormonal changes could be associated with other metabolic changes as well. Post-menopausal women have higher levels of sphingomyelins, fatty acids, acylcarnitines, lysophosphatidylcholines, and several amino acids than pre-menopausal women [26,27], and a recent study found that plasma and urine metabolomics can be used to predict menopause status with 90% accuracy. Moreover, androgenic steroids have been linked to lipid levels in postmenopausal women [28]. 
Given that the baseline average ages of women and men in our sample are each ~61 years old, it is likely that our results are indicative of hormonal changes that occur in later ages and that most of our female participants have undergone menopause. It will be crucial to replicate these findings with a metabolomics panel that captures a larger proportion of the ~25,000 known blood metabolites in order to determine the validity of this hypothesis. Among the 68 metabolites with different trajectories between women and men were sphingolipids, phosphatidylcholines, and cholesterol. Metabolites from the latter two subgroups have been previously reported to have similar trajectories as what we identified, i.e., increasing with age in women and decreasing in men [26]. To our knowledge, a decrease of sphingolipid levels in men as our results suggest has not been previously reported. However, it has been reported that women have greater sphingomyelin increases with age than men [29] and that women with high sphingomyelin levels have reduced risk of AD, while men with high levels of sphingomyelins have increased risk of AD [30]. This could suggest that among men, declining levels of sphingolipids are a typical trait of healthy aging. While impaired sphingolipid metabolism is thought to be involved in AD [31], follow-up investigations are needed to verify whether declining sphingolipids indicate healthy aging in men but increase AD risk in women.
Understanding how metabolites differ by sex and change with age could have implications for cancer. A recent study found that higher levels of serum androgenic steroids, which decrease with age in healthy men, measured up to 25 years prior to a diagnosis of prostate cancer were prospectively associated with increased risk of prostate cancer death [32]. Establishing "healthy" metabolite trajectories could help identify these high-risk individuals at different stages of life and be used to better understand changes occurring in the tumor microenvironment.
We compared our metabolite h² estimates to those recently estimated from a twin study of 1,930 individuals in the TwinsUK cohort [17]. Among the 466 metabolites overlapping with our study, h² estimates were only moderately correlated (Pearson r=0.36) and our estimates were 9.6 percentage points lower on average. However, our metabolite h² estimates were 8.9 percentage points higher on average (and had a lower correlation of r=0.25) when comparing 191 overlapping metabolite h² estimates from an earlier twin study based on 7,824 individuals from both the KORA F4 and TwinsUK cohorts [16]. Interestingly, despite having some overlapping participants, h² estimates between these two previous studies were only moderately correlated: among 163 overlapping metabolite h² estimates, Pearson r was 0.38, with estimates based on the TwinsUK cohort being 18.8 percentage points higher on average than the combined KORA and TwinsUK study. Differences in h² estimates may be driven by differences in population composition and size, phenotypic variation, and analytic approaches.

Although the strength of aging and sex metabolite associations were not associated with metabolite heritability, several of our aging and sex metabolites were identified to be associated with genetic variants (mQTLs, or metabolomic quantitative trait loci) in a previous study [16]. These aging and sex mQTLs are summarized in Tables S4 and S5, respectively. One of the androgenic steroids linked strongly to both sex and age in our analyses (5alpha-androstan-3beta, 17beta-diol disulfate, h²=41.1%) is associated with a variant in the CYP3A5 gene (P=1.17e-29). CYP3A enzymes play a critical role in the metabolism of ~30% of clinically used drugs, and the capacity to metabolize drugs declines with age [33]. Expression of cytochrome P450 (CYP) enzymes typically increases with age and has been shown to be influenced by interactions between age and sex [34]. It is likely that some of our observed metabolomic changes with age and differences by sex are linked to these CYP changes. An in-depth pharmacogenomics investigation into relationships between CYP enzymes, androgenic steroids, age, and sex could further elucidate factors of aging that influence drug metabolism.
This study was not without limitations. Although our analyses adjusted for cholesterol lowering medication use, there could be residual confounding due to differences in duration or type of cholesterol lowering medication, which could be influencing the apparent lower lipid levels in men. An in-depth investigation into medication use could be informative. Our findings are likely driven by our panel of metabolites, and it is possible that a different panel of metabolites could produce different results. Many of our findings are in accordance with previous publications, thereby strengthening confidence in our results that have not been previously investigated with regards to age and sex. Nonetheless, it will be crucial to replicate novel findings with an external cohort. We also identified several inconsistencies between our study and others regarding h² estimates and a few of our association results, which could have been due to differences in study designs and sample populations. This challenge is common [35], as the field of metabolomics is rapidly developing and widely accepted standards for quality control techniques are forthcoming. Differences in platforms, quantification techniques, statistical analysis methods, laboratory techniques for sample handling (e.g., anti-coagulation method, preservation, storage duration), and fasting status at the time of the sample draw may result in large variations from one study to another [36]. The metabolomics quality control process we have outlined here as well as that described in Voyle et al. [37] could serve as guidelines for future studies. Many of our findings included metabolites that had unknown chemical structures, which is a current limitation of the field of metabolomics, as it can be difficult and costly to accurately identify metabolites. Further, we only investigated linear effects of age, but non-linear age effects may exist and should be examined in future studies.
Using a large panel of longitudinal metabolomics data, we conducted a comprehensive investigation of the influence of aging and sex on metabolomics. Our findings suggest that levels of most metabolites are highly influenced by sex and age, and that sex differentially influences levels and trajectories of many metabolites. These findings underscore the importance of incorporating age and sex in the design and analysis of metabolomics investigations and offer a deeper understanding of the aging process that could inform many novel hypotheses regarding the role of metabolites in healthy and accelerated aging.

Participants
Study participants were from WRAP, a longitudinal study of initially dementia free middle-aged adults that allows for the enrollment of siblings and is enriched for a parental history of Alzheimer's disease. Further details of the study design and methods used have been previously described [38,39]. For the current analyses, follow-up occurred every two years. This study was conducted with the approval of the University of Wisconsin Institutional Review Board and all subjects provided signed informed consent before participation.

Plasma collection and sample handling
Fasting blood samples for this study were drawn the morning of each study visit. Blood was collected in 10 mL ethylenediaminetetraacetic acid (EDTA) vacutainer tubes. They were immediately placed on ice, and then centrifuged at 3000 revolutions per minute for 15 minutes at room temperature. Plasma was pipetted off within one hour of collection. Plasma samples were aliquoted into 1.0 mL polypropylene cryovials and placed in -80°C freezers within 30 minutes of separation. Samples were never thawed before being shipped overnight on dry ice to Metabolon, Inc. (Durham, NC), where they were again stored in -80°C freezers and thawed once before testing.

Metabolomic profiling and quality control
An untargeted plasma metabolomics analysis was performed by Metabolon, Inc. using Ultrahigh Performance Liquid Chromatography-Tandem Mass Spectrometry (UPLC-MS/MS). Quantification was performed as previously described [40]; details are outlined in the Supplemental Note. Metabolites within nine super pathways were identified: amino acids, carbohydrates, cofactors and vitamins, energy, lipids, nucleotides, partially characterized molecules, peptides, and xenobiotics.
Up to three longitudinal plasma samples were available for each participant. Metabolites with an interquartile range of zero (i.e., those with very low or no variability) were excluded from analyses (n=178 metabolites). After removing these metabolites, samples were missing a median of 11.7% metabolites, while metabolites were missing in a median of 1.2% of samples. Missing metabolite values were imputed to the lowest level of detection for each metabolite. Metabolite values were median-scaled and log-transformed to normalize metabolite distributions [41]. If a participant reported that they did not fast or withhold medications and caffeine for at least eight hours, the sample was excluded from analyses (n=159 samples). A total of 1,097 metabolites among 2,344 samples remained for analyses.
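The imputation, scaling, and transformation steps above can be sketched for a single metabolite as follows. This is a minimal illustration, not the pipeline used in the study: the function name `preprocess_metabolite` is hypothetical, the lowest observed value stands in for the lowest level of detection, the median is taken after imputation, and the natural logarithm is assumed because the text does not state the log base.

```python
import math
from statistics import median


def preprocess_metabolite(values):
    """Impute, median-scale, and log-transform one metabolite's raw values.

    `values` is a list of positive floats; None marks a missing measurement.
    """
    observed = [v for v in values if v is not None]
    # Assumption: the lowest observed value stands in for the detection limit.
    lowest = min(observed)
    imputed = [lowest if v is None else v for v in values]
    # Median scaling: divide by the median so the scaled median equals 1.
    med = median(imputed)
    scaled = [v / med for v in imputed]
    # Assumption: natural log (the source does not specify the base).
    return [math.log(v) for v in scaled]


# Hypothetical raw intensities for five samples (illustration only).
toy_raw = [2.0, None, 8.0, 4.0, 16.0]
toy_processed = preprocess_metabolite(toy_raw)
```

After this transformation the sample at the median has a value of zero, so positive values indicate above-median levels and negative values indicate below-median levels.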

DNA collection and genomics quality control
DNA was extracted from whole blood samples using the PUREGENE ® DNA Isolation Kit (Gentra Systems, Inc., Minneapolis, MN). DNA concentrations were quantified using the Invitrogen™ Quant-iT™ PicoGreen™ dsDNA Assay Kit (Thermo Fisher Scientific, Inc., Hampton, NH) analyzed on the Synergy 2 Multi-Detection Microplate Reader (Biotek Instruments, Inc., Winooski, VT). Samples were diluted to 50 ng/ul following quantification.
A total of 1,340 samples were genotyped using the Illumina Multi-Ethnic Genotyping Array at the University of Wisconsin Biotechnology Center ( Figure  S7). Thirty-six blinded duplicate samples were used to calculate a concordance rate of 99.99%, and discordant genotypes were set to missing. Sixteen samples missing >5% of variants were excluded, while 35,105 variants missing in >5% of individuals were excluded. No samples were removed due to outlying heterozygosity. Six samples were excluded due to inconsistencies between self-reported and genetic sex.
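The missingness filters described above can be sketched as a two-step call-rate filter. This is an illustrative sketch only; such filtering is typically done with dedicated tools, and the function name `call_rate_filter`, the data layout, and the order of the two steps (samples first, then variants) are assumptions not stated in the text.

```python
def call_rate_filter(genotypes, max_missing=0.05):
    """Filter samples, then variants, by their fraction of missing calls.

    `genotypes` maps a sample ID to a list of calls (0, 1, 2, or None for
    missing), with one entry per variant in the same order for every sample.
    """
    n_variants = len(next(iter(genotypes.values())))
    # Step 1 (assumed order): drop samples missing too many variants.
    kept = {
        sample: calls
        for sample, calls in genotypes.items()
        if sum(c is None for c in calls) / n_variants <= max_missing
    }
    # Step 2: drop variants missing in too many of the remaining samples.
    n_samples = len(kept)
    kept_variants = [
        j
        for j in range(n_variants)
        if sum(calls[j] is None for calls in kept.values()) / n_samples
        <= max_missing
    ]
    return sorted(kept), kept_variants


# Hypothetical toy data; a 25% threshold is used only because the toy is tiny.
toy_genotypes = {
    "s1": [0, 1, 2, 0],
    "s2": [0, None, 2, 1],
    "s3": [None, None, None, 1],
    "s4": [1, 1, 2, 0],
}
toy_samples, toy_variants = call_rate_filter(toy_genotypes, max_missing=0.25)
```

In the toy data, sample s3 is removed for missing three of four variants, after which the second variant is removed because it is missing in one of the three remaining samples.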
Due to sibling relationships in the WRAP cohort, genetic ancestry was assessed using Principal Components Analysis in Related Samples (PC-AiR), a method that makes robust inferences about population structure in the presence of relatedness [42]. This approach included several iterative steps and was based on 63,503 linkage disequilibrium (LD) pruned (r²<0.10) and common (MAF>0.05) variants, using the 1000 Genomes data as reference populations [43]. First, kinship coefficients (KCs) were calculated between all pairs of individuals using genomic data with the Kinship-based Inference for Gwas (KING)-robust method [44]. PC-AiR was used to perform principal components analysis (PCA) on the reference populations along with a subset of unrelated individuals identified by the KCs. Resulting principal components (PCs) were used to project PC values onto the remaining related individuals. All PCs were then used to recalculate the KCs taking ancestry into account using the PC-Relate method, which estimates KCs robust to population structure [45]. PCA was performed again using the updated KCs, and KCs were also estimated again using updated PCs. The resulting PCs identified 1,198 WRAP participants whose genetic ancestry was primarily of European descent. This procedure was repeated within this subset of participants (excluding 1000 Genomes individuals) to obtain PC estimates used to adjust for population stratification in subsequent genomic analyses. Among European descendants, 160 variants were not in Hardy-Weinberg equilibrium (HWE) and 327,064 were monomorphic; these variants were removed.
A total of 1,294,660 bi-allelic autosomal variants among 1,198 European descendants remained for imputation, which was performed with the Michigan Imputation Server v1.0.3 [46], using the Haplotype Reference Consortium (HRC) v. r1.1 2016 [47] as the reference panel and Eagle2 v2.3 [48] for phasing. Prior to imputation, the HRC Imputation Checking Tool [49] was used to identify variants that did not match those in HRC, were palindromic, differed in MAF>0.20, or that had non-matching alleles when compared to the same variant in HRC, leaving 898,220 for imputation. A total of 39,131,578 variants were imputed. Variants with a quality score R²<0.80, MAF<0.001, or that were out of HWE were excluded, leaving 10,400,394 imputed variants. These were combined with the genotyped variants, leading to 10,499,994 imputed and genotyped variants for analyses. Data cleaning and file preparation were completed using PLINK v1.9 [50] and VCFtools v0.1.14 [51]. Coordinates are based on GRCh37 assembly hg19.
providing Illumina Infinium genotyping services. We especially thank the WRAP participants.

Metabolite profiling
Plasma metabolites were profiled by Metabolon (Durham, NC) using Ultrahigh Performance Liquid Chromatography-Tandem Mass Spectrometry (UPLC-MS/MS). Samples were prepared using the automated MicroLab STAR® system from Hamilton Company. Several recovery standards were added prior to the first step in the extraction process for QC purposes. To remove protein, dissociate small molecules bound to protein or trapped in the precipitated protein matrix, and to recover chemically diverse metabolites, proteins were precipitated with methanol under vigorous shaking for 2 min (Glen Mills GenoGrinder 2000) followed by centrifugation. The resulting extract was divided into five fractions: two for analysis by two separate reverse phase (RP)/UPLC-MS/MS methods with positive ion mode electrospray ionization (ESI), one for analysis by RP/UPLC-MS/MS with negative ion mode ESI, one for analysis by HILIC/UPLC-MS/MS with negative ion mode ESI, and one reserved for backup. Samples were placed briefly on a TurboVap® (Zymark) to remove the organic solvent. The sample extracts were stored overnight under nitrogen before preparation for analysis.
Several types of controls were analyzed in concert with the experimental samples: a pooled matrix sample generated by taking a small volume of each experimental sample (or alternatively, a pool of well-characterized human plasma) served as a technical replicate throughout the data set; extracted water samples served as process blanks; and a cocktail of QC standards, carefully chosen not to interfere with the measurement of endogenous compounds, was spiked into every analyzed sample, which allowed instrument performance monitoring and aided chromatographic alignment. Instrument variability was determined by calculating the median relative standard deviation (RSD) for the standards that were added to each sample prior to injection into the mass spectrometers. Overall process variability was determined by calculating the median RSD for all endogenous metabolites (i.e., non-instrument standards) present in 100% of the pooled matrix samples. Experimental samples were randomized across the platform run with QC samples spaced evenly among the injections.
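The median RSD calculation used for both variability measures can be sketched as follows. This is a minimal illustration, not Metabolon's implementation: the function names and toy intensities are hypothetical, the sample standard deviation is assumed, and `None` is assumed to mark a compound that was not detected in a given pooled matrix sample.

```python
from statistics import mean, median, stdev


def rsd_percent(values):
    """Relative standard deviation: sample SD divided by the mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


def median_rsd(intensities_by_compound):
    """Median RSD across compounds detected in 100% of the replicate samples.

    intensities_by_compound maps a compound name to its intensities across
    replicate injections; None marks a missing value (an assumption here).
    """
    rsds = [
        rsd_percent(values)
        for values in intensities_by_compound.values()
        if all(v is not None for v in values)  # require 100% detection
    ]
    return median(rsds)


# Toy pooled-matrix data (hypothetical values).
pooled = {
    "metabolite_a": [100.0, 110.0, 90.0],  # RSD = 10%
    "metabolite_b": [40.0, 50.0, 60.0],    # RSD = 20%
    "metabolite_c": [10.0, None, 12.0],    # excluded: not detected in all samples
}
process_variability = median_rsd(pooled)  # median of 10% and 20%
```

The same function applies to instrument variability if the dictionary instead holds the intensities of the spiked-in standards across all injections.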
All methods utilized a Waters ACQUITY ultraperformance liquid chromatography (UPLC) and a Thermo Scientific Q-Exactive high resolution/accurate mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and Orbitrap mass analyzer operated at 35,000 mass resolution. The sample extract was dried then reconstituted in solvents compatible with each of the four methods. Each reconstitution solvent contained a series of standards at fixed concentrations to ensure injection and chromatographic consistency. One aliquot was analyzed using acidic positive ion conditions, chromatographically optimized for more hydrophilic compounds. In this method, the extract was gradient eluted from a C18 column (Waters UPLC BEH C18-2.1x100 mm, 1.7 μm) using water and methanol, containing 0.05% perfluoropentanoic acid (PFPA) and 0.1% formic acid (FA). Another aliquot was also analyzed using acidic positive ion conditions, but it was chromatographically optimized for more hydrophobic compounds. In this method, the extract was gradient eluted from the same aforementioned C18 column using methanol, acetonitrile, water, 0.05% PFPA and 0.01% FA and was operated at an overall higher organic content. Another aliquot was analyzed using basic negative ion optimized conditions using a separate dedicated C18 column. The basic extracts were gradient eluted from the column using methanol and water with 6.5 mM ammonium bicarbonate at pH 8. The fourth aliquot was analyzed via negative ionization following elution from a HILIC column (Waters UPLC BEH Amide 2.1x150 mm, 1.7 μm) using a gradient consisting of water and acetonitrile with 10 mM ammonium formate, pH 10.8. The MS analysis alternated between MS and data-dependent MSn scans using dynamic exclusion. The scan range varied slightly between methods but covered 70-1000 m/z. Raw data files were archived and extracted as described below.
Raw data were extracted, peak-identified and QC processed using Metabolon's hardware and software. These systems are built on a web-service platform utilizing Microsoft's .NET technologies, which run on high-performance application servers and fiber-channel storage arrays in clusters to provide active failover and load-balancing. Compounds were identified by comparison to library entries of purified standards or recurrent unknown entities. Metabolon maintains a library based on authenticated standards that contains the retention time/index (RI), mass to charge ratio (m/z), and chromatographic data (including MS/MS spectral data) on all molecules present in the library. Furthermore, biochemical identifications are based on three criteria: retention index within a narrow RI window of the proposed identification, accurate mass match to the library within ±10 ppm, and the MS/MS forward and reverse scores between the experimental data and authentic standards. The MS/MS scores are based on a comparison of the ions present in the experimental spectrum to the ions present in the library spectrum. While there may be similarities between these molecules based on one of these factors, all three data points together can be used to distinguish and differentiate biochemicals. More than 3300 commercially available purified standard compounds have been acquired and registered into LIMS for analysis on all platforms for determination of their analytical characteristics. Additional mass spectral entries have been created for structurally unnamed biochemicals, which have been identified by virtue of their recurrent nature (both chromatographic and mass spectral). These compounds have the potential to be identified by future acquisition of a matching purified standard or by classical structural analysis.
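To make the accurate-mass criterion concrete, the sketch below computes the mass error in parts per million and checks it against the 10 ppm tolerance. It covers only one of the three identification criteria; the RI window and the MS/MS forward and reverse scoring are not specified here and are not modeled. Function names and m/z values are hypothetical.

```python
def ppm_error(observed_mz, library_mz):
    """Signed mass error of an observed m/z relative to the library m/z, in ppm."""
    return (observed_mz - library_mz) / library_mz * 1e6


def within_mass_tolerance(observed_mz, library_mz, tol_ppm=10.0):
    """True if the observed m/z lies within +/- tol_ppm of the library value."""
    return abs(ppm_error(observed_mz, library_mz)) <= tol_ppm


# Toy values: at m/z 200, a 10 ppm window is +/- 0.002 m/z units.
match = within_mass_tolerance(200.001, 200.0)     # about 5 ppm off
mismatch = within_mass_tolerance(200.004, 200.0)  # about 20 ppm off
```

Because the tolerance is relative, the absolute window widens with mass: 10 ppm corresponds to 0.002 at m/z 200 but 0.01 at m/z 1000.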
A variety of curation procedures were carried out to ensure that a high-quality data set was made available for statistical analysis and data interpretation. The QC and curation processes were designed to ensure accurate and consistent identification of true chemical entities, and to remove those representing system artifacts, misassignments, and background noise. Metabolon data analysts use proprietary visualization and interpretation software to confirm the consistency of peak identification among the various samples. Library matches for each compound were checked for each sample and corrected if necessary.
Peaks were quantified using area-under-the-curve. A data normalization step was performed to correct for variation resulting from instrument inter-day tuning differences. Essentially, each compound was corrected in run-day blocks by setting the medians equal to one and normalizing each data point proportionately.
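The run-day normalization can be sketched as dividing each peak area by the median of its run-day block, so that every block has a median of one. This is a minimal illustration under stated assumptions, not Metabolon's code: the function name is hypothetical, one compound is handled at a time, and `None` is assumed to mark a missing value that is left untouched.

```python
from collections import defaultdict
from statistics import median


def normalize_by_run_day(peak_areas, run_days):
    """Scale one compound's peak areas so each run-day block has median 1.

    peak_areas: area-under-the-curve values for one compound (None = missing).
    run_days:   run-day label for each sample, in the same order.
    """
    # Collect the observed areas for each run day.
    by_day = defaultdict(list)
    for area, day in zip(peak_areas, run_days):
        if area is not None:
            by_day[day].append(area)

    # Median of each run-day block.
    day_median = {day: median(areas) for day, areas in by_day.items()}

    # Divide each value by its block median; keep missing values as None.
    return [
        None if area is None else area / day_median[day]
        for area, day in zip(peak_areas, run_days)
    ]


# Toy example: day 2 reads ten times higher than day 1 because of tuning drift.
areas = [10.0, 20.0, 30.0, 100.0, 200.0, 300.0]
days = [1, 1, 1, 2, 2, 2]
normalized = normalize_by_run_day(areas, days)
```

After scaling, the two days are directly comparable: both blocks become 0.5, 1.0, 1.5 despite the tenfold difference in raw signal.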