High‐throughput serum proteomics for the identification of protein biomarkers of mortality in older men

Summary The biological perturbations associated with incident mortality are not well elucidated, and there are limited biomarkers for the prediction of mortality. We used a novel high‐throughput proteomics approach to identify serum peptides and proteins associated with 5‐year mortality in community‐dwelling men age ≥65 years who participated in a longitudinal observational study of musculoskeletal aging (Osteoporotic Fractures in Men: MrOS). In a discovery phase, serum specimens collected at baseline in 2473 men were analyzed using liquid chromatography–ion mobility–mass spectrometry, and incident mortality in the subsequent 5 years was ascertained by tri‐annual questionnaire. Rigorous statistical methods were utilized to identify 56 peptides (31 proteins) that were associated with 5‐year mortality. In an independent replication phase, selected reaction monitoring was used to examine 21 of those peptides in baseline serum from 750 additional men; 81% of those peptides remained significantly associated with mortality. Mortality‐associated proteins included a variety involved in inflammation or complement activation; several have been previously linked to mortality (e.g., C‐reactive protein, alpha 1‐antichymotrypsin) and others are not previously known to be associated with mortality. Other novel proteins of interest included pregnancy‐associated plasma protein, VE‐cadherin, leucine‐rich α‐2 glycoprotein 1, vinculin, vitronectin, mast/stem cell growth factor receptor, and Saa4. A panel of peptides improved the predictive value of a commonly used clinical predictor of mortality. Overall, these results suggest that complex inflammatory pathways, and proteins in other pathways, are linked to 5‐year mortality risk. This work may serve to identify novel biomarkers for near‐term mortality.


| INTRODUCTION
Biomarkers may have a variety of biomedical applications, including the identification of individuals at risk of clinical outcomes, improving the evaluation of interventions aimed at modifying outcomes, and the elucidation of biological pathways that contribute to health or disease. In the field of aging, there have been numerous studies aimed at the identification of prognostic biomarkers of mortality (Arbeev, Ukraintseva & Yashin, 2016; Barron, Lara, White & Mathers, 2015). Most have utilized assays to evaluate specific candidate biomarkers believed to reflect relevant biological events, including many related to inflammation (Enomoto et al., 2014; Kabagambe et al., 2011; Reuben et al., 2002; Schnabel et al., 2013; Zuo et al., 2016). The identification of good biomarkers could be useful to identify biological processes associated with mortality and to assess the effectiveness of interventions aimed at reducing mortality or extending health span.
Proteomics is a powerful technology that has been used for biomarker discovery (Huang, Ma, Huang, Li & Nice, 2017). Until recently, the discovery of new biomarkers using proteomic approaches has been limited by methods that are demanding of time and resources and have an inherently low throughput. Thus, studies have generally been restricted to relatively small sample sizes that are inadequate to assess variability on a population scale. New approaches, such as aptamer-based or antibody-based affinity proteomics, allow multiplexing and larger sample sizes but are restricted to the evaluation of a relatively small number of candidate proteins (Delfani et al., 2016). The plasma proteome is particularly complex but offers an important window on individual variation and the relationship of proteins to important phenotypes.
We have recently developed high-throughput and sensitive methods that allow a broad assessment of the serum proteome (Baker et al., 2010). Similar pipelines for large-scale discovery proteomics have been employed in several pioneering studies (Geyer et al., 2016; Surinova et al., 2015). Here, we describe the use of these high-throughput proteomic methods in a large cohort of older men, including an initial discovery phase followed by a validation phase in an independent subcohort, for the identification of peptides and proteins associated with mortality. We detect proteins previously well documented to be associated with mortality, but also a variety of novel proteins. These results provide unique insight into the biological basis of mortality and illustrate the potential utility of this approach for biomarker discovery.

| RESULTS
The characteristics of the discovery and replication cohorts are shown in Table 1. In the discovery cohort (N = 2473), the mean age was 73.6 ± 5.8 years (range 65-99). Among these men, death occurred within 2 years of enrollment in 66 (2.7%), within 3 years in 119 (4.8%), and within 5 years in 267 (10.8%). Few men withdrew from the study: two men within 2 years, 12 within 3 years, and 32 within 5 years. Overall, the cause of death was related to cancer in about one-third at any time point and to other causes in the remaining two-thirds. The characteristics of the men in the replication cohort were similar (Table 1).

| Enrichment analyses
Thirty of the 31 proteins represented by the 56 peptides associated with 5-year all-cause mortality in the discovery phase were enriched for the KEGG pathway "coagulation and complement cascades" (one of the proteins, K0819, could not be annotated), driven by 11 proteins that included several complement factors, VWF, kininogen, and prothrombin (FDR for enrichment = 2 × 10⁻⁶, Figure 2). Several of these proteins were also responsible for significant enrichment of the GO biological process terms "regulation of complement activation" and "negative regulation of endopeptidase activity" (Table S1).
Half of all mortality-associated proteins were linked to the GO term "blood microparticle," whereas approximately 80% of all mortality-associated proteins were linked to GO terms for extracellular region or exosome (Table S2). Seven of the 30 proteins evaluated had links to the GO molecular function term "receptor binding" (p = 1.0 × 10⁻⁴ but FDR = 12%, Table S3). The 30 mortality proteins had evidence of multiple protein-protein interactions (Figure S1 and http://bit.ly/2lJkCiY).

| Replication with SRM
Ninety-five percent of the peptides assayed by SRM were detected in at least 90% of the replication samples. 82% of peptides had concordant results between discovery and replication for 5-year all-cause mortality, which was statistically significant by permutation test (p = .05). Seventeen of the 21 peptides selected for SRM replication (from the set of 56 peptides associated with 5-year all-cause mortality in the discovery phase […]
Note that the LRC57, TTHY, and VTNC effect sizes attenuated to zero in the SRM replication, and for C1QC, the effect size reversed direction of association (negative in IMS, positive in SRM). a = followed for validation of effect using SRM (21 peptides representing 16 proteins).
FIGURE 2 KEGG analysis of mortality-associated proteins revealed an 11-fold enrichment in proteins that are members of the "Complement and coagulation cascades" pathway (KEGG term hsa04610). Mortality-associated proteins are highlighted in yellow (KEGG Mapper v2.8, released October 20, 2016). Some KEGG names differ from the UniProt names presented in Table 2:

| Peptides associated with mortality at 2 and 3 years
The numbers of deaths were limited after 2 and 3 years of followup, and thus, these analyses are exploratory. The 46 peptides significantly associated with mortality at 2 and 3 years in the discovery phase that were also concordantly associated with mortality in the replication phase are shown in Table 3.
However, of the 46 peptides selected for replication testing from the 2- and 3-year mortality associations, the 23 that retained associations in the replication phase represent a proportion that would not be atypical even if the SRM associations were independent of the discovery associations (p = .48 for 2-year peptides and p = .27 for the 3-year peptides). Nevertheless, eight of these peptides (six proteins) were also found to be significantly associated with 5-year mortality, potentially reinforcing their validity as biomarkers at earlier time points. Seven of these eight peptides are related to inflammatory pathways.

| Use of a peptide signature to predict mortality
The average ROC curves from the 5-peptide subset models (see Section 4.4) are shown in Figure 3. The relative overall contribution of each peptide to the best-fitting models is shown in […]
TABLE 3 Peptides associated with mortality in discovery and replication phases for 2-year or 3-year all-cause mortality phenotypes. The 5-year mortality results for these peptides are included for comparison purposes. Concordant at 2 years and 3 years but attenuated at 5 years.

| DISCUSSION
[…] proteins well known to be related to the risk of mortality and cardiovascular disease (e.g., CRP, adiponectin, angiotensinogen, and prothrombin), a variety are involved in inflammation, as well as others related via as of yet unknown mechanisms. Most of the associations between peptide abundance and mortality appeared to arise from noncancer deaths. It is interesting that nearly all the associations were increases in abundance for those who died. The one-sidedness of the effects may suggest a broad type of biological change, such as cell death or senescence in addition to inflammation.
The search for biomarkers of important health outcomes has been a biomedical research priority. Biomarkers can provide tools for prediction and diagnosis, insight into pathophysiology, and targets for the development of therapeutics. To our knowledge, this study represents the largest proteomic effort to discover biomarkers of biomedical outcomes. Previous proteomic analyses have been limited to smaller numbers of participants and to the assessment of specific candidate proteins or other compounds. Our approach offers an unbiased opportunity to explore large numbers of peptides/proteins in order to identify those associated with outcomes. In addition, we utilized targeted proteomics to replicate peptides identified in the discovery phase, and thus to provide more confidence in their association with mortality.
One major goal of these analyses was to identify potentially useful biomarkers of 5-year mortality. As proteomic measurements are of peptides, we intentionally report the candidate peptide biomarkers. But, in addition, we were interested in exploring the biological bases for these associations, and thereby the biology of mortality.
Therefore, our approach also includes a protein-centric component that incorporates methods providing high confidence in the roll up from peptides to proteins. As above, the essential analytical method (mass spectrometry) is peptide based, and more importantly, the validation step we utilized (SRM) is designed to detect specific peptides.
Thus, we believe that reporting these peptides most appropriately reflects the experimental approach. Moreover, we envision that SRM (or a similar method) will become routine in the measurement of multiple peptide biomarkers that predict outcomes (e.g., mortality), and so, reporting the peptides employed in SRM is necessary. Of course, the peptide biomarkers can be interpreted in light of their precursor proteins and we have been diligent in ensuring the peptide-to-protein identification step is robust. Our analyses included only peptides that could be unambiguously linked to a protein via established human protein sequence databases. To best ensure the linkages, we employed a conservative confidence scoring system in identifying proteins from peptides. Our methods have been extensively described (Nielson et al., 2017). The number of identified peptides that could not be unambiguously rolled up to proteins (and were thus excluded from consideration for SRM validation) was very small (~2%).
The fact that many peptides were associated with the risk of death as much as 5 years in the future is not only biologically interesting but also supports the potential usefulness of population proteomic approaches to identify peptides and proteins useful as clinical biomarkers. We demonstrated that the use of a peptide signature added predictive value to the Schonberg index that incorporates most important clinical predictors of mortality, and the magnitude of AUC increase (0.042) is in nearly direct proportion to the increase in sensitivity that would be obtained for diagnostic cutoffs at moderately high specificities, as observed in Pencina, D'Agostino & Massaro (2013). Hence, it would represent a not insignificant improvement in diagnostic power of the kind that is often difficult to achieve (i.e., when supplementing a baseline classifier that is already somewhat sensitive) (Baker et al., 2014). Although the magnitude of predictive improvement is modest, it does indicate the potential to develop more robust peptide-based tools and to use the associations to better understand the biology of mortality. The ability to identify individuals at higher risk well before death, and the underlying molecular disturbances, could provide opportunity to intervene with improved preventive measures. Perhaps most importantly, the improvement in prediction by adding a peptide signature to clinical factors suggests the importance of extraclinical information related to mortality.
The proteins represented by the peptides we found to be associated with mortality reinforce some previous findings that implicate inflammation in the genesis of mortality risk (Reuben et al., 2002; Vidula et al., 2008). For instance, peptides from CRP were strongly associated with the risk of death in our participants, recapitulating many previous reports, and biomarkers of inflammation have consistently been reported for cardiovascular disease and mortality (Weiner et al., 2008). Twenty percent of the mortality-related proteins in this study were related to regulation of complement activation, an integral element in both adaptive and innate immune systems, that yields the generation of potent inflammatory mediators and cell destruction (Dunkelberger & Song, 2010). Peptides representing proteins in the complement cascade that were increased in men who experienced death within 5 years included peptides of complement factor H, complement C5, and complement C9. Although acute elevations in complement components have been described as predicting poor prognosis among hospitalized patients (Hoesel, Niederbichler & Ward, 2007), our results support the use of these biomarkers to predict mortality in ambulatory men over a relatively long period of observation.
Additional proteins matched to peptides associated with mortality in our analysis are also implicated in inflammatory events. For instance, alpha 1-antichymotrypsin (AACT) has been associated with […] N is a plasma zinc metalloprotease mediator of inflammation (Skidgel & Erdos, 2007), and levels of transthyretin have been noted to be depressed in inflammatory states (Ingenbleek & Bernstein, 2015). In fact, we found lower levels of transthyretin peptides in those who […]
Several other peptides that we found to be associated with mortality are linked to biologically interesting proteins and may be useful biomarkers. We found higher levels of peptides for pregnancy-associated plasma protein A (PAPP-A) to be associated with increased mortality. Elevations in PAPP-A levels are present in acute coronary syndromes (Bayes-Genis et al., 2001) and have been associated with a number of age-related disorders, as well as inflammation, prompting a recent suggestion that therapeutic reduction of PAPP-A may be a strategy to promote healthy aging (Conover & Oxvig, 2017).
The biological underpinnings of other peptides for which levels were positively associated with mortality are less clear but may be worth further investigation. A number of the proteins linked to these peptides are involved in cell-cell adhesion and endothelial biology. For instance, VE-cadherin (CADH5) interacts with several critical signaling pathways (e.g., catenins, fibroblast growth factors, TGF-β) and plays an important role in endothelial cell biology through control of the cohesion and organization of the intercellular junction (Lagendijk & Hogan, 2015). Leucine-rich α-2 glycoprotein 1 (LRG1) is involved in protein-protein interactions, signaling, and cell adhesion (Song & Wang, 2015), and vitronectin contributes to cell adhesion. Vinculin is part of a complex that anchors actin to the cell membrane, participates in cell-cell adhesion, interacts with β-catenin, and is involved in the control of apoptosis (Saunders et al., 2006). Other proteins associated with mortality in our studies included mast/stem cell growth factor receptor (SCFR or KIT), which is a receptor tyrosine kinase responsive to stem cell factor, is expressed by hematopoietic stem cells and other tissues, and is essential for cell survival by suppressing apoptosis (Lennartsson & Ronnstrand, 2012). Saa4 is a serum amyloid protein of unclear function. In mouse models, it is upregulated by LPS and IL-6 (Rossmann et al., 2014) and its expression is increased during inflammation, including in muscle during critical illness (Langhans et al., 2014).
Our analysis has important strengths. It takes advantage of a large, prospective observational study that includes excellent followup and ascertainment of mortality. Discovery proteomic measures were performed on almost 2500 men, thus representing the largest such experiment available. Importantly, we included an independent replication phase using targeted proteomics (SRM) to provide added validation of the associations identified in the initial discovery phase.
To provide greater assurance that our results were robust, we were able to compare multiple peptide associations at once, demonstrating, for example, that several proteins, such as AACT, A2GL, CO9, CO5, and prothrombin, may be as strongly predictive of mortality as CRP. Demonstrating the association of biomarkers to health outcomes with confidence is challenging, and we included very robust statistical methods to link peptides with mortality risk. Importantly, we replicated findings from the discovery phase with an independent evaluation using targeted proteomic methods (selected reaction monitoring: SRM).
Several limitations should also be mentioned. Our primary proteomic measurements were of peptides and we report those that were robustly associated with mortality. The associations of many were replicated in SRM analyses and they may serve as useful biomarkers. While these peptides can be uniquely linked to specific proteins, not all peptides from any protein were similarly highly associated and thus we cannot comment directly on the levels of intact proteins and their associations with mortality. Moreover, whereas we used rigorous statistical methods to identify the associations of peptides to mortality, in some cases, only one or few peptides from a protein were detected, thus limiting our ability to comment on protein-level associations. Although the proteomic analysis utilized during discovery is limited in terms of sensitivity, it is also relatively comprehensive and we examined a very large number of participants. We lacked detailed clinical confirmation of the cause of death.
FIGURE 3 Receiver operating characteristics (ROC) analyses predicting all-cause mortality: models include peptide signature, Schonberg index, and peptide signature + Schonberg index. The light gray bands show the ROC curves for each of the best-fitting signatures; the dark gray lines inside the bands represent the average ROC curve for each model. The jagged dark gray line without bands is the ROC curve for the Schonberg index (a step function because the index can take on only a small discrete range of values).
As our pathway and protein-protein interaction analyses demonstrate, many of the mortality-associated peptides we report are from proteins with functions that are biologically linked, and while we can implicate major pathways as being associated with the risk of mortality, it is more difficult to evaluate the relative importance of each peptide/protein. While our results do not definitively point to new clear pathways related to mortality, a number of the associations we detect are unique and suggest avenues for further investigation.
Finally, observational studies such as ours are limited in their ability to disentangle the correlative from the causal factors, and from these analyses, we cannot determine the time at which potentially detrimental pathways (e.g., inflammation) become associated with mortality. Future experimental studies may help to elucidate the relationships among proteins and with outcomes relevant to human health.
In summary, we performed large-scale proteomic analyses on a large number of older men, and describe peptides that are associated with 5-year mortality. An independent replication study was performed to provide robust validation of these associations. Many of the proteins we identified are matched to proteins involved in inflammatory pathways, suggesting that inflammation is strongly associated with mortality over at least 5 years. These results provide the opportunity to further evaluate these peptides and proteins as biomarkers and highlight the potential importance of the biological pathways they implicate in the origins of death.

| Study participants
MrOS is a prospective observational cohort study of musculoskeletal health in men aged ≥65 years. The design has been described (Blank et al., 2005). We performed two independent phases of proteomics measures using samples from MrOS participants: an initial discovery phase employed a unique liquid chromatography-ion mobility-mass spectrometry (LC-IMS-MS) platform to assess a broad spectrum of serum peptides to identify those associated with mortality, and a replication phase was used to further evaluate discovery phase peptide associations. The replication phase utilized an independent subcohort of MrOS men as well as targeted proteomic measures (SRM).

| Discovery phase
For the discovery phase, 2486 MrOS participants of self-reported white European ancestry who had sufficient baseline serum available were randomly selected for proteomics analyses. Because non-white men represented a small proportion of the MrOS cohort (10%), racial comparisons were not possible and we limited our analyses to white participants. There were no significant differences in baseline characteristics of this group compared to the entire MrOS cohort (Table 1).
Thirteen samples (about 0.5%) were later excluded owing to technical measurement failures or extreme measurement distributions, leaving 2473 participants in the discovery analysis.

| Replication phase
We independently tested the associations with mortality of a set of peptide candidates identified in the discovery phase by assessing the levels of abundance in baseline serum from ~750 men selected from an independent group of 2632 MrOS participants who had not been included in the initial discovery analyses, based on the same selection criteria (white race, adequate serum availability, nonwithdrawal from the study). We randomly selected 533 men from the 2632, a sample that contained 54 deaths within 5 years, and then enriched the subcohort with samples from the remaining 216 men who died within a 5-year follow-up window (control:case ratio approximately 2:1) (Table 1). To check the consistency of peptide abundance levels between discovery and the platform used in the replication phase (SRM), 100 samples chosen at random from the discovery cohort (86 alive + 14 dead) were measured along with the replication cohort. The peptide abundance distributions (standardized within cohorts) were very similar (Figure S2).
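As a rough check on the replication design, the counts quoted above can be combined directly. The short Python sketch below is illustrative only: it assumes that all 216 added men were deaths and that no further exclusions applied, and the function and variable names are hypothetical rather than part of the study pipeline.

```python
def replication_design(random_sample, deaths_in_sample, added_deaths):
    """Return (total, cases, controls, control-to-case ratio).

    Assumes every added man is a case and ignores later exclusions.
    """
    total = random_sample + added_deaths
    cases = deaths_in_sample + added_deaths
    controls = random_sample - deaths_in_sample
    return total, cases, controls, controls / cases


# Counts taken from the replication-phase description.
total, cases, controls, ratio = replication_design(533, 54, 216)
print(total, cases, controls, round(ratio, 2))
```

Under these assumptions the subcohort holds 749 men (reported as ~750), with 270 deaths and 479 survivors, which gives a control:case ratio of about 1.8:1, in line with the approximate 2:1 stated above.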

| Ascertainment of mortality
MrOS participants were contacted regularly during a 5-year follow-up period. Information concerning death outcomes was available on >98% of enrolled participants. Deaths were ascertained with the return of a postcard by the participant's family or by a direct contact by study staff and were adjudicated by central examination of death certificates. The cause of death was determined from the death certificate. We considered cancer to be a reliable cause of death from the certificate and included all other causes as noncancer. We focused on deaths within 5 years, the time point at which sufficient numbers of deaths had occurred in the cohort. However, we also examined 2- and 3-year follow-up intervals (albeit with fewer observed deaths) to highlight potential relationships with nearer-term mortality.

| Serum proteomic analysis
The sample selection and processing workflow for discovery and replication phases are shown in Figure S3. In the samples analyzed, we detected 3946 identifiable peptides representing 339 proteins. Peptides present in fewer than 50% of the samples were not considered for further analysis, leaving 2857 peptides (256 proteins) for final analysis in the discovery phase (Figure S3).
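The presence filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes that a missing measurement is stored as `None`, that a peptide observed in exactly 50% of samples is retained, and the peptide names and values are invented toy data.

```python
def filter_by_presence(abundance, min_fraction=0.5):
    """Keep peptides observed (not None) in at least min_fraction of samples."""
    kept = {}
    for peptide, values in abundance.items():
        observed = sum(v is not None for v in values)
        if observed / len(values) >= min_fraction:
            kept[peptide] = values
    return kept


# Hypothetical toy data: one list of abundances per peptide, None = not detected.
toy = {
    "PEPTIDE_A": [1.2, None, 0.8, 1.1],    # observed in 3 of 4 samples
    "PEPTIDE_B": [None, None, 0.5, None],  # observed in 1 of 4 samples
    "PEPTIDE_C": [0.3, None, None, 0.9],   # observed in 2 of 4 samples
}
kept = filter_by_presence(toy)
print(sorted(kept))
```

In the study itself, this kind of filter retained 2857 of 3946 peptides, or about 72%.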

| Replication using LC-SRM
SRM analyses were performed using approaches as previously described (Nielson et al., 2016; Shi et al., 2013). A brief method description is provided in Supplemental Material. Of the 56 peptides that were identified as having robust associations with mortality, we chose 21 (representing 16 proteins) for final SRM measurements based on peptide response, transition specificity (co-eluting interference-free), detection sensitivity, and LC performance (Figure S3).

|
[…] and 261 of 2857 (9.1%) did so at 2 and 3 years, respectively. Ninety-six peptides were consistent across all three of the time points. As approximately two-thirds of the deaths were due to reasons other than cancer, peptides selected for noncancer mortality were a subset of those selected for all-cause mortality. Cancer-specific mortality analyses yielded very few significant results, likely due to modest numbers of cancer deaths; of these, only two peptides differed from those seen for all-cause mortality.
These initial effect size estimates were made more robust by means of a cross-validation procedure implemented using a bootstrap-approximated jackknife resampling procedure. Briefly, the data were repeatedly split in half randomly, and peptide effects were selected for size in one half but re-estimated in the other half; the re-estimated effect sizes were then averaged in a Bayesian framework (Shao, 1989) (see Supplemental statistical methods). We refer to these effect size estimates as robust. False discovery rate (FDR) control is inappropriate when using this peptide discovery and cross-validation pipeline (see Supplemental Materials), but our methods provide adequate control of false discoveries.
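The split-half idea (select in one half, re-estimate in the other) can be illustrated with a deliberately simplified Python sketch. It is not the procedure used in the study: the effect here is a plain difference in mean abundance between men who died and men who survived, the re-estimates are combined by a simple arithmetic mean rather than in a Bayesian framework, and all names and data are hypothetical.

```python
import random
from statistics import mean


def effect(values, died, idx):
    """Mean abundance difference (died minus survived) over sample indices idx."""
    dead = [values[i] for i in idx if died[i]]
    alive = [values[i] for i in idx if not died[i]]
    if not dead or not alive:
        return None  # effect undefined if one group is empty
    return mean(dead) - mean(alive)


def split_half_effects(abundance, died, n_splits=200, top_k=1, seed=0):
    """Select top_k peptides by |effect| in one random half, re-estimate them
    in the other half, and return the plain average of the re-estimates."""
    rng = random.Random(seed)
    n = len(died)
    re_estimates = {p: [] for p in abundance}
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        half_a, half_b = idx[: n // 2], idx[n // 2:]
        scores = {}
        for p, values in abundance.items():
            e = effect(values, died, half_a)
            if e is not None:
                scores[p] = abs(e)
        selected = sorted(scores, key=scores.get, reverse=True)[:top_k]
        for p in selected:
            e = effect(abundance[p], died, half_b)
            if e is not None:
                re_estimates[p].append(e)
    return {p: mean(v) for p, v in re_estimates.items() if v}


# Hypothetical toy cohort: 100 men, every fifth one died.
died = [i % 5 == 0 for i in range(100)]
abundance = {
    "signal": [(1.0 if died[i] else 0.0) + ((i % 5) - 2) * 0.1 for i in range(100)],
    "noise": [((i % 7) - 3) * 0.05 for i in range(100)],
}
robust = split_half_effects(abundance, died)
print(robust)
```

Because selection and estimation use disjoint halves, the averaged estimate is not inflated by the selection step, which is the property the robust effect sizes rely on.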

| Replication phase (targeted SRM)
To determine which peptides would be carried forward from the discovery to the replication phase, we calculated a weighted "impor-

| Peptide signature and mortality
To assess the usefulness of a peptide signature for the prediction of mortality, we examined classification performance using the area under the receiver operating characteristic (ROC) curve and compared the contribution of peptides to a prediction model utilizing the Schonberg mortality index (Schonberg et al., 2009) calculated using baseline MrOS data (Supplemental statistical methods).
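For readers less familiar with the metric, the area under the ROC curve equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The Python sketch below computes it that way on invented toy scores; it is an illustration of the metric only, and the score values, labels, and variable names are hypothetical and unrelated to the MrOS data.

```python
def auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = died, 0 = survived.
labels = [1, 1, 0, 0, 0]
clinical_index = [2, 1, 1, 1, 0]            # coarse, discrete clinical score
combined = [2.4, 1.3, 1.1, 0.9, 0.2]        # clinical score plus a continuous term

auc_clinical = auc(clinical_index, labels)
auc_combined = auc(combined, labels)
print(auc_clinical, auc_combined, auc_combined - auc_clinical)
```

A discrete index produces many tied pairs, so a continuous term can raise the AUC simply by breaking ties in the right direction; the difference between the two values is the kind of AUC increment reported when a signature is added to a clinical index.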

| Enrichment analyses
To characterize the proteins represented by mortality-associated peptides for the 5-year all-cause endpoint, we conducted Gene Ontology (GO) and KEGG pathway enrichment analyses using the proteins identified as robust in the discovery phase and the workflow described by Schmidt et al. (Schmidt, Forne & Imhof, 2014) (Supplemental statistical methods).
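Pathway enrichment of this kind is commonly assessed with a one-sided hypergeometric (Fisher-type) test. The sketch below shows that calculation in plain Python as an assumption about the general approach, not a reproduction of the workflow of Schmidt et al.; the counts of 11 pathway members among 30 annotated proteins come from the Results, whereas the background size and pathway size are hypothetical values chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment(k, n, K, N):
    """Return (fold_enrichment, p_value) for seeing at least k pathway members
    among n selected proteins when K of N background proteins are in the pathway."""
    fold = (k / n) / (K / N)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return fold, tail / comb(N, n)


# k and n are from the text; N and K are hypothetical background values.
fold, p_value = hypergeom_enrichment(k=11, n=30, K=20, N=600)
print(round(fold, 1), p_value)
```

The raw p-value from such a test would still need correction across all pathways or GO terms tested before it could be compared with the FDR values reported above.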

ACKNOWLEDGMENTS
The MrOS Study is supported by the following institutes under the