Cerebrospinal Fluid Peptides as Potential Parkinson Disease Biomarkers: A Staged Pipeline for Discovery and Validation

Finding robust biomarkers for Parkinson disease (PD) is currently hampered by inherent technical limitations associated with imaging or antibody-based protein assays. To circumvent the challenges, we adapted a staged pipeline, starting from our previous proteomic profiling followed by high-throughput targeted mass spectrometry (MS), to identify peptides in human cerebrospinal fluid (CSF) for PD diagnosis and disease severity correlation. In this multicenter study consisting of training and validation sets, a total of 178 subjects were randomly selected from a retrospective cohort, matching age and sex between PD patients, healthy controls, and neurological controls with Alzheimer disease (AD). From ∼14,000 unique peptides displaying differences between PD and healthy control in proteomic investigations, 126 peptides were selected based on relevance and observability in CSF using bioinformatic analysis and MS screening, and then quantified by highly accurate and sensitive selected reaction monitoring (SRM) in the CSF of 30 PD patients versus 30 healthy controls (training set), followed by diagnostic (receiver operating characteristics) and disease severity correlation analyses. The most promising candidates were further tested in an independent cohort of 40 PD patients, 38 AD patients, and 40 healthy controls (validation set). A panel of five peptides (derived from SPP1, LRP1, CSF1R, EPHA4, and TIMP1) was identified to provide an area under curve (AUC) of 0.873 (sensitivity = 76.7%, specificity = 80.0%) for PD versus healthy controls in the training set. The performance was essentially confirmed in the validation set (AUC = 0.853, sensitivity = 82.5%, specificity = 82.5%). Additionally, this panel could also differentiate the PD and AD groups (AUC = 0.990, sensitivity = 95.0%, specificity = 97.4%). 
Furthermore, a combination of two peptides belonging to proteins TIMP1 and APLP1 significantly correlated with disease severity as determined by the Unified Parkinson's Disease Rating Scale motor scores in both the training (r = 0.381, p = 0.038) and the validation (r = 0.339, p = 0.032) sets. The novel panel of CSF peptides, if validated in independent cohorts, could be used to assist in clinical diagnosis of PD and has the potential to help monitor or predict disease progression.

Author contributions: JZ conceived and supervised the project; MS performed some of the initial proteomic profiling, assisted in experimental design and execution as well as data interpretation during the pipeline establishment, performed statistical analyses with PA, and drafted the manuscript with JZ, RD, and PA; JM helped to establish the pipeline in the initial phases and performed most of the peptide picking, CSF observable peptide selection, and SRM assay optimization experiments; RD assisted in SRM assays and data interpretation; PA performed most of the SRM data analysis and assisted in statistical analyses; YZ performed most of the SRM data acquisition and assisted in data analysis; CP was involved in the initial proteomic profiling and assisted in SRM assays; XL performed some of the initial proteomic profiling and assisted in peptide picking; TKB performed the IPA analysis; TS assisted in data interpretation and copyediting; CPZ, ERP, SCH, JFQ, and DRG were responsible for patient characterization and sample collection; all authors critically reviewed the manuscript.
Parkinson disease (PD) affects roughly 2% of persons over the age of 65 years (1,2). Currently, PD diagnosis is mainly based on observation of the cardinal motor indicators of the disease, patient response to drug treatment, and medical history (3,4). There is an appreciable misdiagnosis rate (4), particularly at early disease stages. Additionally, no objective measure of disease progression or treatment effects has been established. Thus, objective, reliable, and reproducible biomarkers are clearly needed to aid in the diagnosis of PD and in tracking or predicting disease progression.
The most sensitive tests developed to date are based on imaging modalities, which can detect functional and structural abnormalities even prior to the onset of motor dysfunction (5,6). However, the usefulness of neuroimaging techniques is limited by high cost, limited accessibility, difficulty in reliably differentiating PD from other atypical parkinsonian disorders, and susceptibility to confounding factors such as medication and compensatory responses (4-7). Biochemical and molecular markers in cerebrospinal fluid (CSF) and other body fluids have also been actively investigated (5, 8-12). The most extensively studied candidate in CSF is probably α-synuclein, the major protein component of Lewy bodies and Lewy neurites, the pathological hallmarks of PD (2). The current consensus is that CSF α-synuclein concentrations are generally lower in patients with PD compared with controls (5, 8-10); the sensitivity and specificity, however, appear to be only moderate, and no correlation with PD severity or progression has been observed (8,9). Notably, all these CSF protein markers are measured using antibody-based assays, which are often associated with relatively high variability, particularly when different detection techniques (different antibodies, sample preparation, calibrators, etc.) are used, leading to discrepant results across laboratories (5). It should also be stressed that this high variability in immunoassays is not unique to PD, because similar difficulty is encountered in AD and other related disorders (13,14).
One strategy to avoid the inherent technical limitations associated with antibodies is to use alternative techniques in which unique peptides are selected and precisely quantified with mass spectrometry (MS) techniques, for example, accurate inclusion mass screening (AIMS) (15) and selected reaction monitoring (SRM) (16-18). To this end, in the last few years, we and others have utilized proteomic technologies to identify novel proteins and peptides associated with different disease states and stages (5, 6, 19-25). Using brain tissue or CSF, these unbiased proteomic profiling studies have revealed disease-related alterations in hundreds of peptides derived from many proteins (19-25). However, there are no quantitative assays for the majority of these candidate proteins/peptides, and development of such assays is limited by the lack of antibodies available for many of them. Thus, although a large library of potential peptide biomarkers has been developed, the vast majority never reach the stage of validation and clinical testing, hampered by the difficulty of de novo development of immunoassays, a process that is time consuming, prohibitively expensive, and very difficult to multiplex.
In this study, we aim to establish a PD biomarker identification and verification pipeline, with the goal of prioritizing candidates and swiftly developing reliable quantitative assays. We focused on identifying peptides by SRM and AIMS, because these targeted proteomic technologies have been proposed as the basis of a viable biomarker pipeline (16) and have become a powerful tool in biomarker discovery because of their high sensitivity, accuracy, and specificity. SRM, in particular, has emerged as an alternative to immunoaffinity-based measurements of defined protein sets with excellent reproducibility across different laboratories and instrument platforms (17,18). The staged pipeline in the current investigation (Fig. 1) includes: (1) data-dependent and bioinformatic prioritization of thousands of candidate biomarkers identified in our previous profiling studies, (2) de novo development of antibody-free multiplex SRM assays to reliably measure tens to hundreds of peptides simultaneously, and (3) multiplex biomarker verification studies allowing identification and validation of models or panels of candidates in independent sample sets, two of which were used in this study.

EXPERIMENTAL PROCEDURES
Participants and CSF Sample Collection-A total of 178 subjects (70 PD, 38 AD, and 70 healthy controls) were randomly selected from a previously described well-characterized multicenter, retrospective cohort (8,11), with age and sex matched between groups. Participating institutions include Baylor College of Medicine, Oregon Health and Science University, the University of California at San Diego, VA Puget Sound Health Care Systems at Seattle, and the University of Washington (UW). This study was approved by the Institutional Review Boards of all participating sites. All subjects provided informed consent and underwent evaluations consisting of medical history, physical and neurological examinations, laboratory tests, and neuropsychological assessments. The inclusion and exclusion criteria were previously described (8,11) and a brief description is provided in the supplemental Methods. Thirty (30) patients with PD and 30 healthy controls were included as the training set in this study, and 40 patients with PD, 38 patients with AD, and 40 healthy controls were included as the validation set. Demographic information is listed in Table I for all subjects.
All CSF samples were obtained by lumbar puncture in the morning as described (8,11) (details can also be found in the supplemental Methods). Similar CSF collection protocols and quality control procedures were followed at all participating centers, in particular, use of polypropylene collection and storage tubes, rapid separation into single use aliquots, and freezing of CSF samples, to minimize potential site variations. Indeed, no apparent site effects were observed on the concentrations of several potential PD CSF biomarkers (e.g. total α-synuclein and one phosphorylated form) in these CSF samples in our previous studies (8,11,26).
Abbreviations: CSF1R, macrophage colony-stimulating factor 1 receptor; Con, healthy control; CP, ceruloplasmin; EPHA4, ephrin type-A receptor 4; LRP1, prolow-density lipoprotein receptor-related protein 1; MMSE, Mini Mental State Examination; MS, mass spectrometry; ROC, receiver operating characteristic; SCX, strong cation-exchange; SPP1, osteopontin; SRM, selected reaction monitoring; TIMP1, metalloproteinase inhibitor 1; UPDRS, Unified Parkinson's Disease Rating Scale.
Proteomics Data Mining for PD-related Proteins/Peptides-Proteomic data were gathered from our previous quantitative PD-related proteomic studies, including a general human CSF profiling (19), two general human midbrain profiling studies (20,21), general human frontal cortex profiling studies of PD progression (22-24), glycoprotein profiling studies of human CSF and frontal cortex (25), a phosphoprotein profiling of human frontal cortex (unpublished data), and glyco- and phospho-protein profiling studies in an MPTP (1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine) monkey model (putamen; unpublished data). A total of 15 data sets were integrated: two on the human CSF proteome, four on the human midbrain (substantia nigra) proteome, six on the human frontal cortex proteome, and the remaining three on the monkey putamen proteome. Candidate peptides and their corresponding proteins were selected based on the following criteria: (1) identified more than once; (2) had a confidence level of ≥95%; (3) displayed ≥20% disease-associated change in at least one data set (PD versus control, ≥1.20 or ≤0.83). Peptides meeting these criteria were combined; human homologs were selected for peptides/proteins derived from monkey. A total of 13,879 unique peptides (derived from 4062 proteins) were identified (see a complete list in supplemental Table S1).
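The three selection criteria can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (`PeptideObservation`), the function name `select_candidates`, and the choice to apply the confidence filter to each identification before counting are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PeptideObservation:
    """One peptide identification in one data set (hypothetical record layout)."""
    sequence: str
    dataset: str
    confidence: float  # identification confidence, 0..1
    ratio: float       # PD/control abundance ratio


def select_candidates(observations, min_hits=2, min_conf=0.95, up=1.20, down=0.83):
    """Return peptide sequences that pass the three criteria described in the text.

    Assumption: only identifications at or above the confidence threshold count
    toward "identified more than once".
    """
    by_sequence = {}
    for obs in observations:
        if obs.confidence >= min_conf:
            by_sequence.setdefault(obs.sequence, []).append(obs)

    selected = []
    for sequence, hits in by_sequence.items():
        # Criterion 3: at least a 20% change in at least one data set.
        changed = any(o.ratio >= up or o.ratio <= down for o in hits)
        if len(hits) >= min_hits and changed:
            selected.append(sequence)
    return sorted(selected)


example = [
    PeptideObservation("PEPTIDEA", "csf_1", 0.99, 1.35),
    PeptideObservation("PEPTIDEA", "cortex_1", 0.97, 1.05),
    PeptideObservation("PEPTIDEB", "csf_1", 0.99, 1.02),
    PeptideObservation("PEPTIDEB", "cortex_1", 0.98, 0.95),
    PeptideObservation("PEPTIDEC", "csf_1", 0.99, 0.70),
]
print(select_candidates(example))
```

In this toy input only the first peptide is identified twice with sufficient confidence and shows a change of at least 20% in one data set.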
CSF Sample Processing-Proteins from CSF samples were precipitated using 20% trichloroacetic acid and solubilized in 8 M urea in 50 mM ammonium bicarbonate (pH 8.0). The protein concentration of individual samples was determined using the Pierce bicinchoninic acid (BCA) Protein Assay Kit (Thermo Scientific, Rockford, IL). The proteins were then reduced and alkylated using dithiothreitol and iodoacetamide, respectively, followed by trypsin digestion as described previously (24,27).
For assay development, reference CSF samples were pooled from >5 healthy control samples collected at a UW clinic. For CSF observable peptide selection, tryptic digests of 990 µl reference CSF were desalted using 1cc Sep-Pak® Vac C18 cartridges (Waters, Milford, MA). Strong cation-exchange (SCX) fractionation was then carried out using a PolySulfoethyl A (200 × 2.1 mm, 5 µm, 300 Å) column (PolyLC, Columbia, MD) on a Biologic Duo-Flow LC system (Bio-Rad, Hercules, CA). Ten SCX pools were made for each CSF sample and desalted using C18 MicroSpin columns (The Nest Group, Southborough, MA) according to the manufacturer's protocol. More details can be found in the supplemental Methods.
For SRM analysis of individual samples, 50 µl of CSF from each subject were precipitated, digested, desalted with a C18 MicroSpin column, and resuspended in 40 µl of 0.1% formic acid/2% acetonitrile plus 10 µl of pooled "heavy" peptides (see below).
Selection of CSF-observable Peptides-The +2 charge state for each of the 13,879 candidate peptides was used to generate two inclusion lists for targeted analysis on a Q-Exactive mass spectrometer (Thermo Scientific) based on the number of precursors that could be monitored in one run and the predicted retention time of the desired peptides. Liquid chromatography (LC) was performed using a Waters NanoAcquity UPLC; peptides were separated online with 75 µm i.d. × 20 cm home-packed fused silica columns (ReproSil-Pur C18-AQ, 3 µm, Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) with a 120 min 2-80% acetonitrile/water gradient containing 0.1% formic acid (see more details in the supplemental Methods). Each SCX pool was run four times: two data-dependent acquisition (DDA) runs, and two coupled with the inclusion lists.
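As an illustration of how a +2 precursor m/z for an inclusion list can be derived from a peptide sequence, the following Python sketch sums standard monoisotopic residue masses. It is a sketch under stated assumptions, not the software used in the study: cysteine is treated as carbamidomethylated (consistent with iodoacetamide alkylation), no other modifications are considered, and the retention-time prediction and the splitting into two lists are not shown. The function name `precursor_mz` is hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
    # Assumption: cysteine carries a carbamidomethyl group (+57.02146 Da).
    "C": 103.00919 + 57.02146,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276


def precursor_mz(sequence, charge=2):
    """Monoisotopic m/z of an unmodified peptide at the given charge state."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Build a small inclusion list of (sequence, m/z) pairs for the +2 charge state.
candidates = ["AIPVAQDLNAPSDWDSR", "GG"]
inclusion_list = [(seq, round(precursor_mz(seq), 4)) for seq in candidates]
for seq, mz in inclusion_list:
    print(seq, mz)
```

The first sequence is the SPP1 peptide named in the Discussion; the dipeptide is included only as an easily checked case.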
MS/MS spectral data were converted to the mzML format and searched using the Trans-Proteomic Pipeline (TPP) (ver. 4.6.2; Institute for Systems Biology, Seattle, WA) against a human International Protein Index (IPI) sequence database (version 3.87) to identify proteins in each pool. Search results were then exported to the Skyline Targeted Proteomics Environment (v2.1) (MacCoss Lab, University of Washington, Seattle, WA) (28) to create spectral libraries and for MS1 filtering to identify precursor ions. Peak detection and integration were also manually inspected and the precursors with a dot-product ratio of ≤0.94 were excluded. If no peptide in the targeted list of a protein of interest was identified but different peptide(s) derived from the same protein were observed, the targeted peptide(s) were replaced. Several observed peptides derived from proteins known to be critical in PD (e.g. DJ-1 (2)), though not in the initial targeted list, were added. A total of 1466 unique peptides (285 proteins) that could be reliably observed in CSF were identified (see supplemental Table S2).
Bioinformatics Analysis-Ingenuity Pathway Analysis (IPA; Qiagen, Redwood City, CA) was performed on the integrated quantitative proteomic data to identify proteins involved in pathways affected in neurological disorders. IPA (http://www.ingenuity.com/) is a commercial tool that is based on a proprietary database to facilitate the identification of biological themes in proteomics or gene expression data. The identified proteins, together with the top ranked most PD-relevant proteins obtained from genecards.org (search using "Parkinson disease" as keyword), served as a reference list to select peptides for further SRM assay development and targeting.
SRM Analysis-All SRM analysis was performed on a TSQ Vantage triple quadrupole mass spectrometer (Thermo Scientific) coupled to a nanoAcquity UPLC (Waters). Reversed-phase chromatography was performed on capillary columns (75 µm × 20 cm; Polymicro Technologies, Phoenix, AZ) packed with 100 Å Magic C18 (Michrom/Bruker, Auburn, CA). Four (4) µl of tryptic digest were injected into a column and separated using a binary gradient (see supplemental Methods for more details). Scheduled SRM was performed with dwell times of 20 ms and retention time windows of 4 min and 5 min for the training set and the validation set, respectively.
Quantification of Target Peptides by SRM-To determine the levels of endogenous target peptides in CSF of PD and control subjects, 192 peptide standards (Thermo Scientific) corresponding to natural counterparts ("light" peptides) were synthesized with heavy isotopic lysine (13C6,15N2) or arginine (13C6,15N4) at the C termini ("heavy" peptides), pooled, and spiked into the digested, desalted CSF samples upon resolubilization. The "heavy" peptides were estimated to be 60-80% pure based on manufacturer's specifications and our independent quality checking using a 4800 Plus MALDI TOF/TOF (AB SCIEX, Framingham, MA). Collision energies (CE) were determined using the default formula from Thermo (0.034 × precursor m/z + 3.3140) and in the validation set they were further optimized with 10 additional CE steps (−5 to +5 V from the default, with 1 V increments; see details in the supplemental Methods). The SRM transitions of the heavy and their corresponding light peptides were optimized and validated. For each peptide, the best three transitions, including three "Light" and three "Heavy" transitions, were selected. One-hundred and twenty-six unique, reproducibly-detected peptides that showed good SRM transition signals and dot-product ratios >0.94 were identified (see supplemental Table S3) and targeted for SRM quantification in the training set consisting of 30 PD and 30 control subjects. Seventeen peptides (Table II) with significant changes in PD compared with controls in the training set were further validated in another set consisting of 40 PD, 38 AD, and 40 healthy subjects.
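The collision energy rule and the optimization steps described above amount to simple arithmetic, sketched below in Python. The function names `default_ce` and `ce_steps` are hypothetical; only the formula coefficients and the step range come from the text.

```python
def default_ce(precursor_mz):
    """Default collision energy (V) from the formula quoted in the text."""
    return 0.034 * precursor_mz + 3.314


def ce_steps(precursor_mz, span=5, step=1):
    """The additional CE values around the default: -span..+span V, excluding 0."""
    base = default_ce(precursor_mz)
    return [base + offset for offset in range(-span, span + 1, step) if offset != 0]


mz = 500.0  # illustrative precursor m/z, not taken from the study
print(round(default_ce(mz), 3))
print([round(ce, 3) for ce in ce_steps(mz)])
```

With the default span of 5 V and 1 V increments, `ce_steps` returns the 10 additional values mentioned in the text, five below and five above the default.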
All SRM mass spectral data were processed using the Skyline software (v2.1). Typical settings applied were 0.055 Th match tolerance m/z and default peak boundary assignment informed by Savitzky-Golay smoothing. All peak boundaries were manually inspected and reassigned as needed to ensure correct peak detection and accurate integration. Information including peak area and area ratio of light/heavy peptide pair were exported for further analysis. Following peak detection and integration, peptides were considered "detectable" for each subject if (1) the peptide transitions had a signal-to-noise ratio of ≥3, and (2) at least two light SRM transitions and two heavy SRM transitions were observed. Peptides detected in less than 50% of the subjects were excluded.
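The detectability rules can be expressed as a short Python sketch. This is one possible reading, not the authors' procedure: it assumes that a transition counts as "observed" when its own signal-to-noise ratio is at least 3, and the function names and input layout are hypothetical.

```python
def is_detectable(light_sn, heavy_sn, min_sn=3.0, min_transitions=2):
    """Apply the per-subject rule to lists of signal-to-noise ratios.

    Assumption: a transition is "observed" if its S/N is at least min_sn.
    """
    light_ok = sum(1 for sn in light_sn if sn >= min_sn)
    heavy_ok = sum(1 for sn in heavy_sn if sn >= min_sn)
    return light_ok >= min_transitions and heavy_ok >= min_transitions


def keep_peptides(detections, min_fraction=0.5):
    """Drop peptides detected in less than min_fraction of the subjects.

    detections maps a peptide name to a list of per-subject booleans.
    """
    return [
        peptide
        for peptide, flags in detections.items()
        if flags and sum(flags) / len(flags) >= min_fraction
    ]


detections = {
    "peptide_1": [is_detectable([8.0, 5.1, 2.0], [30.0, 22.0, 18.0]),
                  is_detectable([4.2, 3.3, 3.1], [25.0, 19.0, 2.5])],
    "peptide_2": [is_detectable([2.1, 1.0, 0.5], [30.0, 22.0, 18.0]),
                  is_detectable([2.9, 2.0, 1.1], [12.0, 9.0, 7.0])],
}
print(keep_peptides(detections))
```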
Statistical Analysis-All analyses were performed in SPSS 18.0 (IBM, Chicago, IL) or Prism 4.0 (GraphPad Software, La Jolla, CA). Peptide data were normalized by the Box-Cox transformation (29) (SERPINC1-FAT and EPHA4: λ = 0.1; SPP1 and LRP1: λ = 0.3; SERPINC1-TSD, APOB, TIMP1, and CSF1R: λ = 0.4; GPR37 and APLP1: λ = 0.5; CP: λ = 0.6), which represents a family of power transformations that incorporates and extends the traditional options to help researchers easily find the optimal normalizing transformation for each variable. CSF α-synuclein protein data were log10 transformed. The Mann-Whitney U test was used to examine median differences between PD and healthy control groups in the training set. One-way analysis of variance (ANOVA) followed by the Tukey test was used to compare differences between groups in the validation set, without or with (through general linear model analysis) controlling for potential confounding factors such as age, sex, and CSF total protein concentrations. Receiver Operating Characteristic (ROC) curves for individual peptide analytes were generated to evaluate their sensitivities and specificities in distinguishing PD from healthy or diseased (AD) control subjects. Logistic regression was used to determine the best linear combination of peptide analytes for predicting disease status (versus healthy or diseased controls), followed by ROC analysis on the linear combination. The "optimum" cutoff value for a ROC curve was defined as the value associated with the maximal sum of sensitivity and specificity. Additionally, relationships between the analytes and age, sex, and the Unified Parkinson's Disease Rating Scale (UPDRS) motor scores were analyzed with bivariate correlation using Pearson's correlation coefficients. Partial correlations between CSF peptide levels and UPDRS scores were also conducted while controlling for potential confounding factors such as sex and age of subjects.
Stepwise multiple linear regression analysis was used to screen for the best predictors (linear combination of peptide analytes) that correlate with disease severity (UPDRS). Values with p < 0.05 were regarded as significant. Increased type I errors because of multiple comparisons were minimized by using the training and validation approach, in which a limited set of outcomes determined in the training set are repeated in the validation set, and considered significant only if the a priori determined outcomes are confirmed in the validation set.
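Two of the steps described in this section, the Box-Cox power transformation and the choice of the ROC cutoff that maximizes the sum of sensitivity and specificity, can be sketched in plain Python. The analyses in the study were run in SPSS and Prism; the code below is only an illustration, and it assumes that a higher score (for example, a logistic regression output) indicates disease. All function names are hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def optimum_cutoff(case_scores, control_scores):
    """Return (cutoff, sensitivity, specificity) maximizing sensitivity + specificity.

    A subject is called positive when its score is >= cutoff.
    """
    best = None
    for cutoff in sorted(set(case_scores) | set(control_scores)):
        sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
        specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
        if best is None or sensitivity + specificity > best[1] + best[2]:
            best = (cutoff, sensitivity, specificity)
    return best


# Illustrative scores only; these are not data from the study.
cases = [0.9, 0.8, 0.4]
controls = [0.1, 0.3, 0.5]
print(round(roc_auc(cases, controls), 3))
print(optimum_cutoff(cases, controls))
```

When several cutoffs give the same sum, this sketch keeps the lowest one; a real analysis would need an explicit rule for such ties.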

RESULTS
Identification of CSF Peptide Biomarker Candidates-For biomarker candidate selection, we first integrated an extensive compilation of 15 PD-related quantitative proteomic data sets generated from our previous studies (references (19-25) and unpublished data). All analyses included samples from patients with PD as compared with age- and sex-matched healthy or diseased controls or monkeys treated with MPTP. A total of 13,879 unique PD-related peptides (corresponding to 4062 proteins) were identified (see supplemental Table S1). There are no traditional quantitative assays (e.g. ELISAs) for the majority of the candidates, severely limiting our ability to perform follow-up verification studies.
To triage and verify a large number of peptide or protein candidates in a manner that would enable us to test as many candidates as possible while containing costs, we adapted a strategically staged pipeline (16) using targeted proteomic technologies (Fig. 1). Specifically, the technology with the lower cost and higher capacity to triage large numbers of candidates (data-dependent LC-ESI-MS/MS analysis, AIMS) was used first to determine whether the peptides can be reliably monitored in CSF. Digested pooled reference CSF samples were fractionated using SCX chromatography to improve sensitivity for detecting low abundance proteins during this initial analysis. A total of 1466 unique peptides (derived from 285 proteins) that could be reliably observed in CSF were empirically selected (MS1 filtering; supplemental Table S2).
Quantitative SRM Assay Development-The next step involved high-throughput quantification technology (SRM) to establish sensitive, accurate assays for candidate marker measurements. However, reagent costs (e.g. synthetic "heavy" standard peptides) limited the practicality of developing assays for all confirmed CSF candidates, necessitating a further prioritization step. For this, we used a bioinformatics approach (Ingenuity Pathway Analysis, ingenuity.com; and the GeneCards database, genecards.org) to identify the proteins that are most relevant to PD and other neurological disorders from the target list. A list of 192 peptides, representing 110 proteins, was generated (supplemental Table S3) to advance SRM assay development.
We configured these peptide candidates into a multiplex SRM assay with "heavy" peptides used as internal standards. Based on the performance of the assays, such as quality of light and heavy SRM transition signals, and reproducibility of the target peptides (CVs <20%), the target list was further narrowed down to 126 unique peptides (see supplemental Table S3).
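The reproducibility criterion can be illustrated with a short Python sketch that computes the coefficient of variation (CV) of replicate light/heavy ratios and keeps peptides below 20%. The replicate layout, the use of the sample standard deviation, and the function names are assumptions for illustration; the text does not specify how the CVs were calculated.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def reproducible(replicates, max_cv=20.0):
    """Names of peptides whose replicate light/heavy ratios have CV below max_cv."""
    return {
        peptide
        for peptide, ratios in replicates.items()
        if len(ratios) >= 2 and percent_cv(ratios) < max_cv
    }


# Illustrative replicate ratios only; these are not data from the study.
replicates = {
    "peptide_1": [1.0, 1.1, 0.9],
    "peptide_2": [1.0, 2.0, 0.5],
}
print(sorted(reproducible(replicates)))
```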
Peptide Marker Discovery in the Training Set-A total of 60 subjects (30 PD, 30 controls) were used in the training set to identify altered peptides/proteins in PD as compared with healthy controls. The relative peptide levels (light/heavy ratios) were measured by using the optimized 126-plex SRM assay. Seventeen (17) peptides corresponding to 16 proteins showed significant differences between PD and controls (p < 0.05, Mann-Whitney) in the training set (Table II; see the complete dataset in supplemental Table S4A and box plots in supplemental Fig. S1). Six of them were later determined to fail to meet our criteria (see Experimental Procedures) for robust and reliable detection and quantification in the training or validation set, and thus were excluded from ROC and disease severity correlation analyses. Except for CP (r = 0.256, p = 0.048, Pearson), none of these 17 peptides significantly correlated with age of subjects. Similarly, only EPHA4 significantly correlated with sex of subjects in this training set (p = 0.041).
The correlation of levels of a single peptide or a combination of peptides with PD severity (as determined by UPDRS motor scores) was also examined in the training set, with or without controlling for potential confounding factors (e.g. sex of subjects). Although no peptides could be considered individually correlated with the UPDRS scores, a 2-peptide model (derived from TIMP1 and APLP1) was identified using linear regression analysis to significantly correlate with disease severity (r = 0.381, p = 0.038, Pearson; r = 0.381, p = 0.041, controlling for sex; Fig. 2C).
Promising Marker Confirmation in the Validation Set-To further evaluate the potential of the peptides identified in the training set as PD biomarkers and determine their specificity for PD, the 17 peptides were re-screened in the validation set consisting of 40 PD, 38 AD, and 40 healthy controls. The group differences between PD and healthy controls for eight of the 11 (six of the 17 peptides were excluded because of inconsistent assay performance) promising peptides were confirmed in this set (ANOVA, Table II; see also supplemental Table S4B and supplemental Fig. S1). Five of these eight peptides (APLP1, CP, CSF1R, SERPINC1-TSD, and SPP1) and APOB also displayed significant differences between PD and AD groups. The group differences remained largely unchanged after controlling for potential confounding factors including age, sex, and CSF total protein concentrations (general linear model analysis). None of these 17 peptides significantly correlated with age of subjects in the validation, or combined (training + validation) sets. The sex effects on EPHA4 were also not confirmed (p = 0.422); however, SERPINC1-FAT (p = 0.025), APOB (p = 0.040), and LRP1 (p = 0.005) significantly correlated with sex in the validation set but not in the training set. Though these data indicate that overall the concentrations of the candidate peptide markers may not be substantially affected by sex of subjects, when the two sets were considered as one combined set, SERPINC1-FAT (p = 0.014), APOB (p = 0.045), EPHA4 (p = 0.037), and TIMP1 (p = 0.049) showed a significant correlation with sex.
In this validation set, the 5-peptide model (SPP1, LRP1, CSF1R, EPHA4, and TIMP1) produced an AUC of 0.854, and both sensitivity and specificity equal to 82.5% in differentiating patients with PD and healthy controls in the ROC analysis (Table III and Fig. 2B), confirming its diagnostic performance observed in the training set. Furthermore, this peptide panel could also differentiate the PD and AD groups well (AUC = 0.990, sensitivity = 95.0%, specificity = 97.4%; see Table III and supplemental Fig. S3). In contrast, the CSF α-synuclein protein measured with the immunoassay did not perform well.
The 2-peptide model (TIMP1 and APLP1) identified in the training set was also validated in its significant correlation with UPDRS motor scores (r = 0.339, p = 0.032, Pearson; r = 0.336, p = 0.036, controlling for sex; Fig. 2D). Interestingly, TIMP1 alone correlated with the disease severity in this validation set (r = −0.368, p = 0.019, Pearson; r = −0.376, p = 0.018, controlling for sex).

DISCUSSION
Biomarker discovery in PD and other neurodegenerative disorders has been quite challenging and ideal biomarkers are still an unmet clinical need. In this study, we employed a targeted approach, and established a staged pipeline to facilitate biomarker discovery and validation. We report a panel consisting of five peptides/proteins (SPP1, LRP1, CSF1R, EPHA4, and TIMP1) with fair robustness in regard to specificity and sensitivity in differentiating PD from healthy and diseased (AD) controls. Additionally, we report a 2-peptide/protein model (TIMP1 and APLP1) that significantly correlates with disease severity as measured by UPDRS motor scores.
The first major achievement of this study is to establish a pipeline that distills proteomic profiling results to a reasonable number of peptides that can be followed practically. This process is quite important, because the "-omics" based discovery experiments are fraught with false discoveries resulting from biological variability and the large number of hypotheses being tested in small numbers of samples (16), and thus validation studies using independent methods must be performed to verify the clinical utility of a candidate. The validation process often requires development and optimization of protein-specific assays, frequently depending on immunologic reagents that may be unavailable, poorly characterized, or insufficiently specific. Even if good antibody pairs are available, high multiplexing (>tens of simultaneous assays) of traditional immunoassays is still challenging, largely because of antibody cross-reactions and matrix effects (30). Therefore, many candidates are not carried through the validation stage because the traditional processes are simply too labor-intensive and/or require impractical sample volumes to be a reasonable tool for clinical investigation. In the current investigation, with a staged pipeline we were able to not only follow up and test a far larger number of candidates than would have been possible using conventional methods, but also successfully identify a novel panel of candidate peptide markers for PD diagnosis and severity correlation, thus marking a substantial improvement over the current state of PD biomarker evaluation.
The proteins in both panels/models we identified in this study are differentially regulated in diseased states versus healthy and/or diseased (AD) controls, and have been mechanistically implicated in various neurodegenerative processes. Of particular interest, osteopontin (SPP1; included in the 5-peptide model for diagnosis) is a glycosylated phosphoprotein expressed in neuronal cell bodies that seems to act like a double-edged sword in neurodegenerative disorders: it can be toxic to neurons and cause cell death in some instances, but is neuroprotective in others (31). Iczkiewicz and colleagues demonstrated that osteopontin expression was decreased in substantia nigra of MPTP-treated primates and in PD (32). In contrast, Maetzler et al. found it was enriched in the neuromelanin-containing zone of post mortem human PD brain tissue and present in Lewy bodies, but was absent from control brain tissue (33). The same authors also showed CSF and serum osteopontin concentrations were higher in PD patients than controls (33). A more recent study reported increased CSF and plasma concentrations in mild cognitive impairment and AD patients (34). Interestingly, the peptide (AIPVAQDLNAPSDWDSR, unmodified) used in our study displayed lower levels in PD; this could be related to changes in post-translational modifications (e.g. the increase in the extent of modifications exceeds the increase in total protein levels, resulting in a decrease of the unmodified peptide) and should be further investigated.
Three of the other candidates (LRP1, CSF1R, and EPHA4) are receptors implicated in signaling pathways associated with neurodegeneration, in particular with inflammatory response. The low density lipoprotein receptor-related protein 1 (LRP1) is a cell surface receptor expressed in different brain cell types including endothelial cells and neurons; it can regulate trafficking of amyloid-β and other ligands into the cell and their clearance from the brain (35,36), and maintain brain lipid homeostasis and associated synaptic and neuronal integrity (37). Increased expression of LRP1 was observed in PD brain (38) and functional soluble LRP1 was also detected in human brain tissue and CSF (39). Colony-stimulating factor 1 receptor (CSF1R) is the receptor for colony stimulating factor 1 and interleukin-34 (IL-34), which are key regulators of the monocyte/macrophage lineage (40). In a mouse model lacking CSF1R, neurons are more susceptible to cell death and neurodegeneration after excitotoxic injury, suggesting involvement of CSF1R signaling in their survival (40). In addition, CSF1R is required for the development of microglia, brain development, and maintenance of normal brain structure (41). Ephrin type-A receptor 4 (EPHA4) is a member of the A subclass of Ephrin receptor tyrosine kinases and interacts with both A-type and B-type ephrins (42). EPHA4 signaling through its ephrin ligands has been implicated in guiding axons during neural development, synapse formation, and regulation of long-term synaptic plasticity and memory (43). Recently, it has been found to be decreased in Huntington disease CSF (44) and also to be able to modulate the vulnerability of motor neurons to axonal degeneration in amyotrophic lateral sclerosis in animal models and in humans (42). However, no direct evidence has yet been found implicating the EPHA4 signaling pathway in PD pathogenesis.
Tissue inhibitor of metalloproteinases-1 (TIMP1) is included in both the panel for PD diagnosis and the model for disease severity correlation. This protein is abundantly expressed and functions primarily to inhibit a large class of matrix metalloproteinases (MMPs) (45). TIMP1 has been shown to be neuroprotective in various systems, which indirectly supports the damaging role of MMP-3 (46,47). MMP-3 could cleave α-synuclein to remove the negative charges in the C-terminal portion, making it more hydrophobic and prone to aggregation (46). Likewise, DJ-1, another protein implicated in PD (2), could also be cleaved by MMP-3 (46). TIMP1 has been detected in human CSF (48,49) and our finding of lower TIMP1 concentrations in CSF of PD patients is in line with the association of the aberrant and excessive activity of MMPs with the neurodegenerative processes in the PD brain (50).
Amyloid-like protein 1 (APLP1), another protein/peptide in the model for PD severity correlation, is also implicated in neurodegeneration. APLP1 is localized in the cerebral cortex postsynaptic density of rats and humans and its expression was reported to be increased in synaptic development, suggesting a role in synaptogenesis or synaptic maturation (51). Increased APLP1 expression and neurodegeneration were found in the frontal cortex of manganese-exposed nonhuman primates (52), which could be a compensatory event. Soluble forms of APLP1 (53) and, more interestingly, three APLP1-derived amyloid-beta-like peptides (54) were previously observed in human CSF.
Together, these proteins represent an important step forward in PD biomarker discovery. To date, the best-performing PD biomarkers (e.g. α-synuclein (8,9,11)) are those identified based on their known roles in disease processes, but none have provided sufficient and validated sensitivity/specificity to be of clinical use for diagnosis or progression. Although further testing (e.g. against different control groups) and larger-scale, independent validations are needed, the panel established here performed better than CSF α-synuclein and likely performs at least similarly to other best-established markers, suggesting that the strategy developed here, that is, selection of targets from proteomic studies based on biological plausibility and performance in MS-based assays, could be an efficient way to expand the search for feasible biomarkers. Additionally, the candidate CSF peptides/proteins were selected from those relatively more specific to PD (as compared with AD) in our proteomic studies, and thus the performance of the CSF peptide/protein panel on differential diagnosis between PD and AD further confirmed the efficiency of our strategy. Furthermore, two unique peptides related to neurodegeneration appeared to be related to PD severity and could potentially be used in assessing PD progression or drug treatment, if the observations can be validated in longitudinal cohorts, especially prospective ones.
One concern when using combinations of multiple markers in a relatively small number of subjects is overfitting, which means that if one investigates enough classification rules then, by chance, one of them is likely to perform well (55). To avoid overfitting, one approach is to use a training sample set to formulate the classification rules and a test/validation sample set to create the definitive ROC curve (55), as we did in the current study. Additionally, we chose to reduce the peptide numbers in the model as much as possible and present the results from a 5-peptide model instead of using more peptides. In fact, if all 11 "good" peptides were included in the model, a nearly perfect diagnosis would be achieved in the training set (AUC: 0.982; sensitivity: 90.0%; and specificity: 96.7%), with an AUC of 0.932 (sensitivity: 85.0%; specificity: 92.5%) in the validation set (Table III). Although similar results have been considered acceptable, including those reported recently in Nature Medicine (56), we took a more conservative approach because, without pathology confirmation, the typical correct rate of clinical PD diagnosis is no more than 90% even at major medical centers (4). Therefore, a biomarker model perfectly matching the imperfect clinical diagnosis is unlikely to correctly reflect the underlying disease status and should be used with great caution.
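The training/validation logic described above can be sketched in a few lines. This is a minimal illustration, not the analysis used in this study: the scores stand in for the output of some already-fitted multi-peptide model, all numbers are invented, and the helper names (`auc`, `sensitivity_specificity`, `best_threshold`) are hypothetical. The point is only that the cutoff is chosen on the training set and then applied unchanged to the validation set.

```python
from typing import Sequence, Tuple


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of the ROC area).
    """
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(
    case_scores: Sequence[float],
    control_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Call a subject positive when its score is >= threshold."""
    true_pos = sum(1 for c in case_scores if c >= threshold)
    true_neg = sum(1 for h in control_scores if h < threshold)
    return true_pos / len(case_scores), true_neg / len(control_scores)


def best_threshold(
    case_scores: Sequence[float], control_scores: Sequence[float]
) -> float:
    """Pick the cutoff maximizing sensitivity + specificity (Youden's J)."""
    candidates = sorted(set(case_scores) | set(control_scores))
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(case_scores, control_scores, t)),
    )


# Invented model scores (assumption: higher score = more PD-like).
train_pd = [0.9, 0.8, 0.7, 0.65, 0.4]
train_hc = [0.6, 0.3, 0.2, 0.1]
valid_pd = [0.85, 0.7, 0.5, 0.3]
valid_hc = [0.6, 0.4, 0.2, 0.1]

# The classification rule (here, just the cutoff) is fixed on training data...
cutoff = best_threshold(train_pd, train_hc)

# ...and evaluated without re-tuning on the independent validation data.
if __name__ == "__main__":
    print("training AUC:", auc(train_pd, train_hc))
    print("validation AUC:", auc(valid_pd, valid_hc))
    print("validation sens/spec:", sensitivity_specificity(valid_pd, valid_hc, cutoff))
```

In this toy data the validation performance is lower than the training performance, which is the pattern expected when a rule tuned on one sample set is tested on another; a large drop would be a warning sign of overfitting.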
The panel of candidate PD biomarkers identified in the current study does not contain peptides from several known potential PD CSF biomarkers (e.g. α-synuclein and DJ-1 (8,9,11)). This is not unexpected, because: (1) some proteins may not be readily identified in an unbiased proteomic profiling because of their low abundance or modifications (α-synuclein, which is usually highly modified (2, 57), was not even included in the initial candidate list compiled from our previous proteomic profiling studies); and (2) the goal of the current investigation was to reveal not only novel but also robust markers, and therefore, the experimental conditions were geared toward markers that are readily and reproducibly quantified in a multiplex assay in CSF. Consequently, some of the peptides (e.g. those from DJ-1), even though they were included in the list, failed to pass our rigorous pipeline selection criteria. That said, a peptide from complement factor C3 (C3), a potential marker identified in CSF in our previous study (58), did appear as one of the 17 peptides identified in the training set. In further investigations, we will try to optimize the conditions to include not only these robust biomarker candidates but also known potential markers such as α-synuclein and DJ-1 for an "ideal" panel of markers to be used clinically. Additionally, although the identified panel of CSF peptides/proteins could differentiate PD and AD well, to fully confirm disease specificity of these potential markers, a larger cohort including related disease controls (e.g. those with multiple system atrophy or progressive supranuclear palsy) will be needed to further test their use in PD differential diagnosis.
In conclusion, through a staged pipeline and high-throughput SRM quantification, a panel of candidate peptide biomarkers has been identified in CSF that provides good diagnostic sensitivity/specificity for PD and correlates with disease severity. These results, if validated in independent studies, particularly those with samples collected prospectively, could be used to assist in clinical diagnosis of PD and have the potential to help monitor or predict disease progression.
Acknowledgments-We thank Drs. HyeJin Hwang and Jianpeng Zhang for their contributions to the early proteomic profiling and data analysis, Mr. Michael J Hipp for his assistance in sample preparation and data analysis, and Mr. Allion A Salvador for his kind help in peptide picking. We also deeply appreciate the patients and participants for their generous participation and donation of samples. * This study was supported by generous grants from the National Institutes of Health (NIH) (U01 NS082137, P42 ES004696-5897, P30 ES007033-6364, R01 AG033398, R01 ES016873, R01 ES019277, R01 NS057567, and P50 NS062684-6221 to JZ, R01 NS065070 to CPZ, and P50 AG005131 to DRG), and partially by a pilot study award from the NIH-sponsored ADRC at the UW (P50 AG003156-30) and a National Institute of Neurological Disorders and Stroke/NIH award R21 NS085425 to MS. It was also supported in part by the University of Washington's Proteomics Resource (UWPR95794), the National Institute of Environmental Health Sciences of the NIH under Award Number P30 ES007033, and the National Natural Science Foundation of China (NSFC) projects (31200105 and 31470238). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH and other sponsors.