Discovery of biomarkers for amyotrophic lateral sclerosis and frontotemporal lobar degeneration from human cerebrospinal fluid using mass spectrometry-based proteomics

Background: Amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD) are progressive neurodegenerative diseases that share clinical and neuropathologic features. Critical to the mission of developing effective therapies for ALS and FTLD is the discovery of biomarkers that can illuminate shared mechanisms of neurodegeneration, which can then be evaluated for diagnostic, prognostic or pharmacodynamic value across the disease spectrum.

Methods: Here, we merged unbiased discovery-based approaches and targeted quantitative comparative analyses between ALS and FTLD cerebrospinal fluid (CSF) to identify proteins that are altered in ALS and FTLD.

Results: Discovery mass spectrometry (MS)-based proteomic approaches combined with tandem mass tag (TMT) quantification methods from 40 CSF samples comprising 20 patients with ALS and 20 healthy control (HC) individuals identified 19 differentially expressed candidate biomarker proteins after CSF fractionation. Notably, these candidate biomarkers included novel and previously identified proteins, thus validating our approach. Candidate biomarkers were subsequently examined using parallel reaction monitoring (PRM) MS methods on 80 unfractionated CSF samples comprising 30 patients with ALS, 19 patients with FTLD, and 31 HC individuals. Two candidate biomarkers (CNTNAP2 and CLSTN1) were downregulated in both ALS and FTLD compared to healthy controls, and 11 further candidate proteins were significantly downregulated in FTLD compared to HC.

Conclusions: Taken together, this study identifies multiple novel proteins that are altered in ALS and FTLD, which provides the foundation for their evaluation and development as biomarkers for these diseases.

identified that an expanded hexanucleotide (GGGGCC) repeat insertion into the non-coding region of C9orf72 is the most common genetic cause of both ALS and FTLD [11], and these discoveries have led to the widely accepted view that ALS and FTLD share neurodegenerative pathways and lie on a disease spectrum [12]. However, the shared pathogenic mechanisms of ALS and FTLD remain elusive, and, in

Clearly, the discovery of additional novel biomarkers is needed, but one major barrier is that many key regulatory proteins are of relatively low abundance [25], and therefore changes in protein levels are easily masked. Accordingly, the identification of useful biomarkers for ALS and FTLD requires the use of innovative, highly sensitive, and accurate methodologies.

Herein, we report findings from an initial unbiased proteomics discovery study on fractionated CSF from 20 patients with ALS and 20 healthy controls (HC). Tandem-mass-tag (TMT) technology was used for the accurate and sensitive quantification of CSF proteins, with subsequent analysis via state-of-the-art Orbitrap Fusion Lumos ETD mass spectrometry. Several proteins from the discovery phase confirmed previously identified biomarkers for ALS, validating our approach, and many new candidates were identified. Using proteins identified from the discovery phase, a subsequent targeted, quantitative, and highly sensitive analysis was conducted using parallel reaction monitoring (PRM) targeted mass spectrometry in 80 unfractionated CSF samples from 30 patients with ALS, 19 patients with either known or predicted FTLD-tau pathology and 31 HC individuals. Several novel candidate biomarkers were identified that might be of use in better understanding shared disease mechanisms and potentially differentiating ALS and FTLD from healthy controls. Further, this work highlights that our innovative mass spectrometry approach has iterative potential in other neurodegenerative diseases.
the non-fluent variant of primary progressive aphasia (nfvPPA, n = 2). Eighteen participants had either FTLD confirmed on autopsy or a known FTLD-causing genetic mutation (i.e. microtubule associated protein tau, MAPT), leading to high confidence in FTLD as causative in these cases, with the exception of one case with bvFTD where the clinical suspicion for FTLD was high. Lumbar punctures were performed using the atraumatic technique, and CSF was collected in a polypropylene tube before being transferred to a 50 ml conical polypropylene tube at room temperature (RT), which was mixed gently by inverting 3-4 times. Within 15 min of collection, CSF was centrifuged at 2,000 x g for 10 min at RT and aliquoted directly into pre-cooled polypropylene cryovials. Within 60 min of CSF collection, aliquots were frozen on dry ice and then stored at -80°C until further analysis. For samples collected prior to 2015, the protocol used is described by Scherling CS et al. [26]. Study participants provided written informed consent, and all procedures were approved by the UCSF Institutional Review Board (IRB). The demographics of CSF samples used in this study are provided in Table 1.

Four experimental sets of 10 samples were examined, with each set including a master pool (MP) sample for normalization between sets. The MP was prepared by combining an equal volume from all 40 CSF samples, including HC and ALS (Figure 1A). The CSF samples were mixed at a one-to-one ratio with a urea buffer composed of 10 M urea, 20 mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP), and 80 mM chloroacetamide (CAA) in 100 mM triethylammonium bicarbonate (TEAB).

The samples were then incubated for 1 h at RT for reduction and alkylation. Proteins were digested first with LysC (lysyl endopeptidase mass spectrometry grade, Fujifilm Wako Pure Chemical Industries Co., Ltd., Osaka, Japan) at a one-to-fifty ratio for 3 h at 37°C and then with trypsin (sequencing grade modified trypsin, Promega, Fitchburg, WI, USA) at a one-to-fifty ratio at 37°C overnight, after the urea concentration had been diluted from 5 M to 2 M by adding 50 mM TEAB.
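As a worked check of the urea concentrations quoted in this protocol, the short Python sketch below applies C1·V1 = C2·V2. Mixing CSF one-to-one with the 10 M urea buffer gives about 5 M urea, and bringing that mixture to 2 M requires about 1.5 volumes of urea-free diluent per volume of mixture. The sample volumes are hypothetical (the source does not state them), the function name is illustrative, and the calculation assumes volumes are additive.

```python
def diluent_volume(mix_volume_ul, c_start_molar, c_target_molar):
    """Volume of urea-free diluent needed to dilute from c_start to c_target.

    Uses C1 * V1 = C2 * V2 and assumes volumes are additive.
    """
    if not 0 < c_target_molar <= c_start_molar:
        raise ValueError("target must be positive and not exceed the start concentration")
    final_volume = mix_volume_ul * c_start_molar / c_target_molar
    return final_volume - mix_volume_ul


# Hypothetical volumes (not from the source): 50 uL CSF + 50 uL of 10 M urea buffer.
csf_ul, buffer_ul, buffer_urea_molar = 50.0, 50.0, 10.0
mix_ul = csf_ul + buffer_ul

# Urea contributed only by the buffer, spread over the combined volume.
urea_after_mixing = buffer_urea_molar * buffer_ul / mix_ul

# 50 mM TEAB is treated as the urea-free diluent.
teab_ul = diluent_volume(mix_ul, urea_after_mixing, 2.0)
print(urea_after_mixing, teab_ul)
```

With these assumed volumes, the mixture sits at 5 M urea and needs 150 uL of 50 mM TEAB to reach 2 M, which is consistent with the 5 M to 2 M dilution described.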

To perform TMT-based quantitative mass spectrometry, the digested peptides from CSF samples were labeled using 11-plex TMT reagents following the manufacturer's instructions (Thermo Fisher

The peptide samples were analyzed on an Orbitrap Fusion Lumos Tribrid mass spectrometer interfaced with an Ultimate 3000 RS Autosampler nanoflow liquid chromatography system (Thermo Fisher Scientific). The dried 24 fractionated peptides were reconstituted in 0.5% FA and then loaded onto a trap column (Acclaim™ PepMap™ 100 LC C18, 5 μm, 100 μm × 2 cm, Thermo Fisher Scientific) at a flow rate of 8 μL/min. Peptides were separated on an analytical column (Easy-Spray™ PepMap™ RSLC C18, 2 μm, 75 μm × 50 cm, Thermo Fisher Scientific) at a flow rate of 0.3 μL/min using a linear gradient with mobile phases consisting of 0.1% FA in water and in ACN. The total run time was 120 min. The mass spectrometer was operated in a data-dependent acquisition mode. The MS1 (precursor mass) full survey scan was acquired from 300 to 1,800 m/z (mass-to-charge ratio) in the "top speed" setting with a resolution of 120,000 at an m/z of 200. The AGC target for MS1 was set to 1 × 10^6 and the maximum injection time was 50 ms. The most intense ions with charge states of 2 to 5 were isolated in a 3-sec cycle, fragmented using higher-energy collisional dissociation (HCD) with 35% normalized collision energy, and detected at a mass resolution of 50,000 at an

For MS data, MS1 error tolerance was set to 10 ppm and the MS/MS error tolerance to 0.02 Da. The minimum peptide length was set to 6 amino acids, and proteins identified by one peptide were filtered out. Both peptides and proteins were filtered at a 1% false discovery rate. Protein quantification was performed with the following parameters and methods. The most confident centroid option was used for the integration mode, and the reporter ion tolerance was set to 20 ppm.
MS order was set to MS2. The activation type was set to HCD. Quantification value corrections for isobaric tags were disabled. Both unique and razor peptides were used for peptide quantification. Protein groups were considered for peptide uniqueness. Missing intensity values were replaced with the minimum value. Reporter ion abundance was computed based on the signal-to-noise ratio. The co-isolation threshold was set to 50%. The average reporter signal-to-noise threshold was set to 50. Data normalization was disabled. Protein grouping was performed by applying the strict parsimony principle as follows: 1) all proteins that share the same set or subset of identified peptides were grouped, 2) protein groups that have no unique peptides among the considered peptides were filtered out, 3) Proteome Discoverer iterated through all spectra and selected which peptide-spectrum match (PSM) to use in ambiguous cases to make a protein group with the highest number of unambiguous and unique peptides, and 4) final protein groups were generated. Proteome Discoverer summed all the reporter ion abundances of PSMs for the corresponding proteins in a TMT run.

Statistical and bioinformatic analyses of the results from discovery proteomics

Bioinformatic analysis was performed with the Perseus software package (version 1.6.0.7). Within each set, abundances were divided by those of the MP included in that set, and each column was then divided by its column median.

Co., Ltd., Osaka, Japan) at a one-to-fifty ratio for 3 h at 37°C and then using trypsin (sequencing grade modified trypsin, Promega, Fitchburg, WI, USA) at a one-to-fifty ratio at 37°C overnight after diluting the concentration of urea from 4 M to 2 M by adding 50 mM TEAB. Peptides were purified using C18
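The two-step normalization described for the discovery data (each set divided by its MP channel, then each column divided by its median) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the Perseus workflow itself: the function name, the nested-dictionary layout, and the toy abundances are hypothetical, and every protein is assumed to have a value in every channel.

```python
from statistics import median


def normalize_tmt_set(abundances, mp_channel):
    """Normalize one TMT set.

    abundances: {protein: {channel: reporter abundance}} for a single set
    (hypothetical layout). mp_channel: key of the master-pool channel.
    Returns {protein: {sample channel: normalized ratio}}.
    """
    # Step 1: divide every sample channel by the MP channel of the same set.
    ratios = {}
    for protein, channels in abundances.items():
        mp_value = channels[mp_channel]
        ratios[protein] = {
            ch: value / mp_value
            for ch, value in channels.items()
            if ch != mp_channel
        }

    # Step 2: divide each column (sample channel) by its median across proteins.
    sample_channels = list(next(iter(ratios.values())).keys())
    column_median = {
        ch: median(row[ch] for row in ratios.values()) for ch in sample_channels
    }
    return {
        protein: {ch: row[ch] / column_median[ch] for ch in sample_channels}
        for protein, row in ratios.items()
    }


# Toy values for illustration only.
toy_set = {
    "P1": {"s1": 10.0, "s2": 20.0, "MP": 10.0},
    "P2": {"s1": 40.0, "s2": 40.0, "MP": 20.0},
    "P3": {"s1": 90.0, "s2": 30.0, "MP": 30.0},
}
normalized = normalize_tmt_set(toy_set, "MP")
print(normalized)
```

Dividing by the MP channel puts the four sets on a common reference scale, and the column-median step then corrects for differences in total loading between samples.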

Quantitative proteome analysis of CSF samples

We first implemented an unbiased discovery-based approach to identify proteins that are differentially expressed in the CSF of patients with ALS compared with HC individuals. We performed a quantitative proteome analysis of 40 CSF samples comprising 20 patients with ALS and 20 HC individuals using TMT labeling-based mass spectrometry (Figure 1A and Table 1). The 40 samples were split into 4 batches of 10 and were labeled with an 11-plex TMT reagent. The MP, which is a pooled reference sample of equal volumes from all 40 CSF samples, was placed at the 11th channel of each 11-plex TMT experimental set for the purposes of normalization between batches. After enzyme digestion and TMT labeling, the peptides were pre-fractionated using bRPLC fractionation. The fractions were analyzed on

Table S1).

normalized data were subsequently subjected to statistical analyses to identify proteins that were differentially expressed between the two groups. Nineteen proteins were found to have differential expression between ALS and HC based on q-value < 0.05 (Table 2 and Figure 2A

prior to trypsin-digestion to enable subsequent monitoring of relevant peptides. We evaluated 52 proteins with q-value < 0.1 (Table 2), and 16 out of the 52 proteins were detectable by PRM (Table 2). Of the 16 proteins, 11 were detected by 2 or more peptides, while 5 proteins were detected by 1 peptide (Supplemental Table S2). Interestingly, most of the 16 proteins that were detectable by PRM were proteins that were downregulated in ALS in the initial discovery experiments. Of note, several previously identified protein biomarkers such as neurofilament proteins, CHIT1, CHI3L2, UCHL1 and APOB were not detectable using this method.

response curves from 100 fmol to 10 pmol (Figure 3F).
In summary, almost all peptides corresponding 28 to the 16 candidate biomarkers showed a linear response curve in the detectable range; accordingly, 29 they were considered reliable for quantitation of endogenous peptides in CSF.
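One common way to judge whether a response curve is linear is an ordinary least-squares fit with its coefficient of determination (R²). The sketch below is a minimal illustration in plain Python; the source does not state how linearity was assessed, so the fitting method, the function name, and the example values are all assumptions, and the numbers are made up rather than taken from Figure 3.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, with R^2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical spiked SIL amounts (fmol) and measured responses, for illustration only.
spiked_fmol = [100, 500, 1000, 5000, 10000]
response = [0.11, 0.52, 0.98, 5.10, 9.90]
slope, intercept, r_squared = linear_fit(spiked_fmol, response)
print(f"slope={slope:.6f} intercept={intercept:.4f} R^2={r_squared:.4f}")
```

An R² close to 1 over the tested range would support treating the peptide as quantifiable in that range; the acceptance threshold is a choice left to the analyst.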

Evaluation of target peptide repeatability in PRM analysis

We next determined the repeatability of the PRM analysis by examining whether results were consistent when the same samples were measured repeatedly over several days. We added 30, 500 and 1,000 fmol of the heavy SIL peptides to the trypsin-digested CSF samples and performed PRM analysis over 5 days in triplicate, measuring the heavy to light (endogenous) ratio of the target peptides. All peptides showed a coefficient of variation (CV) below 15%. While the peptides at 30 and 500 fmol showed CVs below 10%, the peptides at 1,000 fmol showed a relatively higher CV (Figure 4). We attribute these variations to the concentration differences between endogenous and SIL peptides.

concentrations of the endogenous peptides were lower than 5 pmol/ml, we added 5 pmol/ml of SIL peptides to avoid ambiguous detection. After the digestion of the CSF proteins with trypsin, the endogenous peptides and SIL peptides were monitored by PRM analysis (Figure 5A). Two proteins, CLSTN1 (P = 0.0341) and CNTNAP2 (P = 0.0281), showed statistically significant differences between HC and ALS and between HC and FTLD (Figure 5B), while 11 proteins showed statistically significant differences between FTLD and HC (Figure 6). Eight peptides from 3 proteins showed no changes between HC, ALS, and FTLD (Supplemental Figure S1). For APP, NCAN and NELL2, 3, 3 and 4 peptides, respectively, showed statistically significant differences between FTLD and HC (Figure 6A).

For MEGF8, MFAP4, NPTX1, NPTX2 and RTN4RL2, two peptides showed statistically significant differences between FTLD and HC (Figure 6B). For the rest of the proteins, 1 peptide showed statistically significant differences between FTLD and HC (Figure 6C). Most of the peptides that showed differential levels in FTLD did not achieve statistical significance in ALS; nevertheless, all peptides downregulated in FTLD showed a trend of reduced concentration in ALS. These results suggest that the majority of the candidate biomarkers identified are more applicable for use in FTLD, with a select number that can be utilized for both ALS and FTLD.

The objective of this study was to use an unbiased quantitative discovery approach to identify candidate proteins that are differentially expressed in the CSF of patients with ALS compared to healthy controls, and then subsequently use these protein candidates in a targeted quantitative approach to identify novel

and importantly, validates our approach as a method to identify novel candidate biomarkers.

After the discovery phase, we utilized PRM analysis of unfractionated CSF without TMT labeling to validate the candidates that we had identified in the discovery phase of our study. This approach involves stringent quantitative measurement of candidate proteins. In addition, the detection and quantitation of candidates in unfractionated CSF is an important consideration for the potential development of diagnostic or prognostic biomarkers. Notably, 69% of candidates that were found to be differentially expressed in the discovery experiments with q-value < 0.1 were not detectable in the PRM analysis, presumably due to the use of trypsin-digested CSF that was not pre-fractionated. Surprisingly, proteins that showed the largest differential expression in our discovery studies such as neurofilament proteins, CHIT1, CHI3L2 and APOB were not detectable by PRM, suggesting that these proteins are

In summary, our mass spectrometry-based approach has identified two proteins that are significantly

Table 2. Differential proteins between ALS and control identified in the discovery experiment.

The repeatability experiments of 37 detectable endogenous peptides in PRM analysis were conducted.

Three different amounts of target SIL peptides (30, 500, and 1,000 fmol) were added to the trypsin-digested CSF samples immediately before the PRM analysis. The PRM analyses were performed over 5 days in triplicate, measuring the heavy to light ratio of the target peptides.

Table S1. Complete list of proteins identified by the discovery experiment