Exploration of human cerebrospinal fluid: A large proteome dataset revealed by trapped ion mobility time-of-flight mass spectrometry

Abstract
Cerebrospinal fluid (CSF) is a biofluid in direct contact with the brain and as such constitutes a sample of choice in neurological disorder research, including neurodegenerative diseases such as Alzheimer's or Parkinson's disease. Human CSF has nevertheless been less studied with proteomic technologies than other biological fluids such as blood plasma or serum. In this work, a pool of "normal" human CSF samples was analysed using a shotgun proteomic workflow that combined removal of highly abundant proteins by immunoaffinity depletion and isoelectric focussing fractionation of tryptic peptides to alleviate the complexity of the biofluid. The resulting 24 fractions were analysed using liquid chromatography coupled to a high-resolution and high-accuracy timsTOF Pro mass spectrometer. This state-of-the-art mass spectrometry-based proteomic workflow allowed the identification of 3'174 proteins in CSF. The dataset reported herein completes the pool of the most comprehensive human CSF proteomes obtained so far. An overview of the identified proteins is provided based on gene ontology annotation. Mass and tandem mass spectra are made available as a possible starting point for further studies exploring the human CSF proteome.

Value of the data
• A comprehensive proteomic profile of "normal" human CSF, among the largest reported so far using LC-MS/MS, is provided
• The data is useful for enhanced characterization and annotation of the human CSF proteome
• The data is valuable for the proteomic community for spectral library generation and as a starting point for clinical studies focussing on CSF and neurological disorders
• The data provides information for targeted protein/peptide assay development in human CSF

Data description
The dataset presented herein comprises 3'174 identified proteins and their respective 25'227 peptides in "normal" CSF; protein and peptide lists are provided in Supplementary Table S1. The human CSF sample analyzed in this report was previously analyzed with different LC-MS/MS instrumentation to assess the throughput and robustness of an automated pipeline for biomarker discovery [4] and to deeply characterize the human CSF proteome in the quest for identification of missing proteins [1,5]. In the present work, the previously prepared sample was analyzed again using the recent timsTOF Pro mass spectrometer to evaluate its capabilities in terms of CSF proteome coverage. MS data were thus acquired by analysing CSF depleted of abundant proteins, after tryptic digestion and peptide fractionation, using a nanoElute LC system coupled to a timsTOF Pro mass spectrometer. MS raw files were then converted into peaklists with MSConvert and searched against the human UniProtKB/Swiss-Prot database using Mascot and X! Tandem. The Scaffold software, specifying a false discovery rate (FDR) of 1% at both protein and peptide level and a one unique peptide criterion, was used to report protein identifications. Gene Ontology (GO) annotation was performed with the Panther software (Fig. 1). Binding and Catalytic activity represented 78% of the molecular functions. Cellular process was the most represented biological process (i.e., 23% of all genes); lastly, Cell and Cell part (21% each) were the major cellular components identified in this dataset.
A GO enrichment analysis was also performed with GOrilla [6] to identify terms enriched in this "normal" human CSF sample with respect to the whole human proteome (Table 1). Terms related to semaphorin/neuropilin/plexin, such as "semaphorin receptor activity", "axon guidance receptor activity" or "semaphorin-plexin signaling pathway involved in neuron projection guidance", were particularly enriched in this dataset.
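As an illustration of how such an enrichment can be quantified, the following minimal Python sketch computes a fold enrichment and a one-sided hypergeometric p-value for a single GO term. It assumes the standard hypergeometric model for a target-versus-background comparison; the symbols N, B, n and b, the function names and all counts are hypothetical and are not taken from the dataset or from the tool's implementation.

```python
from math import comb


def fold_enrichment(N: int, B: int, n: int, b: int) -> float:
    """Fold enrichment (b/n) / (B/N).

    N: genes in the background, B: background genes annotated with the term,
    n: genes in the target set, b: target genes annotated with the term.
    """
    return (b / n) / (B / N)


def hypergeom_pvalue(N: int, B: int, n: int, b: int) -> float:
    """P(X >= b) for X ~ Hypergeometric(N, B, n), computed with exact integers."""
    upper = min(B, n)
    tail = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return tail / comb(N, n)


def passes_display_filter(p_value: float, enrichment: float,
                          p_max: float = 1e-5, fe_min: float = 5.0) -> bool:
    """Apply the display thresholds quoted for Table 1 (p < 1e-5, enrichment > 5)."""
    return p_value < p_max and enrichment > fe_min


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    N, B, n, b = 20000, 20, 3000, 18
    fe = fold_enrichment(N, B, n, b)
    p = hypergeom_pvalue(N, B, n, b)
    print(f"fold enrichment = {fe:.2f}, p-value = {p:.3g}, "
          f"displayed = {passes_display_filter(p, fe)}")
```

The exact-integer tail sum avoids a dependency on a statistics package; for routine use, a library implementation of the hypergeometric survival function would be a reasonable alternative.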

Sample preparation
The sample preparation was performed previously [1,5]. Briefly, 96 aliquots of 400 μL of a commercial pooled CSF sample (Analytical Biological Services) were evaporated with a vacuum centrifuge (Thermo Scientific). The dried samples were diluted in depletion Buffer A (Agilent Technologies) containing 9.65 μg/mL of β-lactoglobulin from bovine milk. Abundant CSF proteins were removed using MARS columns (Agilent Technologies) and HPLC systems (Thermo Scientific) equipped with an HTC-PAL (CTC Analytics AG) fraction collector. Buffer exchange was performed with Strata-X 33u polymeric reversed-phase (RP) (30 mg/1 mL) cartridges mounted on a 96-hole holder and a vacuum manifold, as previously described [7]. Samples were subsequently evaporated and subjected to reduction, alkylation, digestion, tandem mass tag (TMT) 6-plex (Thermo Scientific) labeling, pooling and purification using a 4-channel Microlab Star liquid handler workstation (Hamilton) in a 96-well-plate format and according to previously reported protocols [4,7-9]. Briefly, each sample was dissolved in 95 μL of 100 mM triethylammonium bicarbonate (TEAB) and 5 μL of 2% sodium dodecyl sulfate. A volume of 5.3 μL of 20 mM tris(2-carboxyethyl)phosphine was added and incubation was performed for 1 h at 55 °C. A volume of 5.5 μL of 150 mM iodoacetamide was added (incubation for 1 h in darkness). Enzymatic digestion was performed via the addition of 10 μL of trypsin/Lys-C at 0.25 μg/μL in 100 mM TEAB (incubation overnight at 37 °C). TMT labeling was performed via the addition of 0.8 mg of TMT 6-plex reagent in 41 μL of CH3CN (incubation for 1 h at room temperature). After reaction, a volume of 8 μL of 5% hydroxylamine in H2O was added to each tube to react for 15 min. Samples from a given TMT 6-plex experiment were pooled together in a new tube. Pooled

Table 1
GO term enrichment for the genes representative of the 3'174 proteins identified in the CSF dataset. GO term enrichment analysis was performed with GOrilla [6] on the three ontologies: (a) molecular function, (b) biological process and (c) cellular component. The background used for the enrichment analysis was the full human proteome (UniProtKB/Swiss-Prot 2020/02 release). In the table, only terms with a p-value below 10⁻⁵ and a fold enrichment above 5 are displayed. All the enrichment results are presented in Supplementary Tables S2-4.
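The reagent additions listed in the sample preparation can be turned into approximate working concentrations. The short Python sketch below does this arithmetic for the reduction, alkylation and digestion steps, assuming that volumes are simply additive and that nothing is lost between steps; the function and variable names are hypothetical and the results are estimates, not values reported by the authors.

```python
def final_concentration(stock_conc: float, added_volume: float,
                        volume_before: float) -> float:
    """Concentration of a reagent right after adding it, assuming additive volumes."""
    return stock_conc * added_volume / (volume_before + added_volume)


def reagent_summary() -> dict:
    """Track the per-sample volume (μL) through reduction, alkylation and digestion."""
    volume = 95.0 + 5.0                                   # TEAB + SDS
    tcep_mM = final_concentration(20.0, 5.3, volume)      # 5.3 μL of 20 mM TCEP
    volume += 5.3
    iaa_mM = final_concentration(150.0, 5.5, volume)      # 5.5 μL of 150 mM iodoacetamide
    volume += 5.5
    enzyme_ug = 10.0 * 0.25                               # 10 μL of trypsin/Lys-C at 0.25 μg/μL
    volume += 10.0
    return {
        "tcep_mM": tcep_mM,
        "iodoacetamide_mM": iaa_mM,
        "enzyme_ug": enzyme_ug,
        "volume_after_digestion_uL": volume,
    }


if __name__ == "__main__":
    for name, value in reagent_summary().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the reducing agent ends up at roughly 1 mM and the alkylating agent at roughly 7.4 mM, with 2.5 μg of enzyme per sample.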

RP-LC MS/MS analysis
The 24 purified fractions were dissolved in 50 μL of H2O/CH3CN/formic acid (FA) 96.9/3/0.1%. A volume of 3 μL of each fraction was then diluted with 7 μL of H2O/FA 99.9/0.1%, and only 2 μL of each diluted fraction was injected for separation on a 75 μm × 250 mm Aurora 2 C18 column (Ion Opticks). A typical RP gradient (Solvent A: 0.1% FA, 99.9% H2O MilliQ; Solvent B: 0.1% FA, 99.9% CH3CN) was run on a nanoflow LC system (nanoElute, Bruker Daltonik GmbH) at a flow rate of 400 nL/min. Column temperature was controlled at 50 °C. The LC run lasted for 120 min (2% to 15% of Solvent B during 60 min; up to 25% at 90 min; up to 37% at 100 min; up to 95% at 110 min and finally 95% for 10 min to wash the column). The column was coupled online to a timsTOF Pro with a CaptiveSpray ion source (both from Bruker Daltonik GmbH). The temperature of the ion transfer capillary was set at 180 °C. Ions were accumulated for 123 ms, and mobility separation was achieved by ramping the entrance potential from −160 V to −20 V within 123 ms.
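For readers who wish to reproduce or plot the gradient, the following Python sketch encodes the breakpoints given above and interpolates the percentage of Solvent B at any time point. It assumes linear ramps between breakpoints and a gradient starting at 0 min, which the text does not state explicitly; the names GRADIENT, percent_b and injected_fraction are hypothetical. The same block also works out which fraction of each dissolved sample was actually injected.

```python
from bisect import bisect_right

# (time in min, % Solvent B) breakpoints taken from the gradient description.
GRADIENT = [(0.0, 2.0), (60.0, 15.0), (90.0, 25.0),
            (100.0, 37.0), (110.0, 95.0), (120.0, 95.0)]


def percent_b(t_min: float) -> float:
    """% Solvent B at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-120 min run")
    times = [t for t, _ in GRADIENT]
    # Index of the breakpoint that closes the segment containing t_min.
    i = min(bisect_right(times, t_min), len(GRADIENT) - 1)
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def injected_fraction(dissolved_ul: float = 50.0, aliquot_ul: float = 3.0,
                      diluent_ul: float = 7.0, injected_ul: float = 2.0) -> float:
    """Share of each dissolved fraction that reaches the column."""
    return (aliquot_ul / dissolved_ul) * (injected_ul / (aliquot_ul + diluent_ul))


if __name__ == "__main__":
    for t in (0, 30, 60, 75, 105, 120):
        print(f"{t:>3} min: {percent_b(t):.1f}% B")
    print(f"injected share of each fraction: {injected_fraction():.1%}")
```

With the volumes quoted above, about 1.2% of each dissolved fraction is loaded per injection.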
The acquisition of mass and tandem mass spectra was done with an average resolution of 60,000 and 50,000 full width at half maximum (mass range 100-1700 m/z), respectively. To enable the parallel accumulation-serial fragmentation (PASEF) method, precursor m/z and mobility information was first derived from full scan TIMS-MS experiments (with a mass range of m/z 100-1700). Singly charged precursors were excluded by their position in the m/z-ion mobility plane, and precursors that reached a 'target value' of 20,000 a.u. were dynamically excluded for 0.4 min. The quadrupole isolation width for fragmentation was set to 2 Th for m/z < 700 and 3 Th for m/z ≥ 700, and the collision energies varied between 31 and 52 eV depending on precursor mass and charge. TIMS, MS operation and PASEF were controlled and synchronized using the instrument control software OtofControl 5.1 (Bruker Daltonik). LC-MS/MS data were acquired using the PASEF method with a total cycle time of 1.23 s, including 1 TIMS MS scan and 10 PASEF MS/MS scans. The 10 PASEF scans (123 ms each) contained on average 12 MS/MS scans per PASEF scan. Ion mobility resolved mass spectra, nested ion mobility versus m/z distributions, as well as summed fragment ion intensities were extracted from the raw data file with DataAnalysis 5.1 (Bruker Daltonik).
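The acquisition settings above imply a simple rule for the isolation window and an approximate MS/MS sampling rate. The Python sketch below expresses both, using only the values quoted in the text (isolation widths, 10 PASEF scans per cycle, 12 MS/MS scans per PASEF scan on average, and the stated 1.23 s cycle time); the function names are hypothetical, and the rate is a back-of-the-envelope estimate rather than a measured figure.

```python
PASEF_SCANS_PER_CYCLE = 10       # PASEF MS/MS scans per acquisition cycle
AVG_MSMS_PER_PASEF_SCAN = 12     # average MS/MS scans per PASEF scan
CYCLE_TIME_S = 1.23              # total cycle time as stated in the text


def isolation_width_th(precursor_mz: float) -> float:
    """Quadrupole isolation width: 2 Th below m/z 700, 3 Th at or above it."""
    return 2.0 if precursor_mz < 700.0 else 3.0


def msms_per_cycle() -> int:
    """Average number of MS/MS scans acquired in one cycle."""
    return PASEF_SCANS_PER_CYCLE * AVG_MSMS_PER_PASEF_SCAN


def msms_rate_hz() -> float:
    """Approximate MS/MS scans per second over one full cycle."""
    return msms_per_cycle() / CYCLE_TIME_S


if __name__ == "__main__":
    for mz in (450.0, 699.9, 700.0, 1200.0):
        print(f"m/z {mz}: isolation width {isolation_width_th(mz)} Th")
    print(f"{msms_per_cycle()} MS/MS per cycle, about {msms_rate_hz():.0f} per second")
```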

Data processing and analysis
Protein identification was performed against the human UniProtKB/Swiss-Prot database (2020/02 release) comprising 20'367 protein sequences in total. Mascot (version 2.4.6 from Matrix Science) was used as search engine. Variable amino acid modifications were oxidized methionine, deamidated asparagine/glutamine, and 6-plex TMT-labeled peptide amino terminus; 6-plex TMT-labeled lysine and carbamidomethylation of cysteine were set as fixed modifications. Trypsin was selected as the proteolytic enzyme, with a maximum of two potential missed cleavages. Peptide and fragment ion tolerances were set to 15 ppm and 0.05 Da, respectively. All Mascot result files were loaded into Scaffold Q+S 4.8.4 (Proteome Software) to be further searched with X! Tandem (The GPM, thegpm.org; version CYCLONE (2010.12.01.1)). The FDR in Scaffold was set to 1% at protein and peptide level, with a one unique peptide criterion to report protein identification.
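To make the two mass tolerances concrete, the following Python sketch shows how a relative (ppm) precursor tolerance and an absolute (Da) fragment tolerance translate into match criteria. It is only an illustration of the arithmetic and does not reproduce how the search engines apply these settings internally; all function names and the example m/z values are hypothetical.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_precursor_tolerance(observed_mz: float, theoretical_mz: float,
                               tol_ppm: float = 15.0) -> bool:
    """True if the precursor mass error is within the relative tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_fragment_tolerance(observed_mz: float, theoretical_mz: float,
                              tol_da: float = 0.05) -> bool:
    """True if the fragment mass error is within the absolute tolerance."""
    return abs(observed_mz - theoretical_mz) <= tol_da


def ppm_window_da(mz: float, tol_ppm: float = 15.0) -> float:
    """Half-width in m/z units of a ppm tolerance window at a given m/z."""
    return mz * tol_ppm * 1e-6


if __name__ == "__main__":
    # Illustrative values only.
    print(f"15 ppm at m/z 1000 is +/- {ppm_window_da(1000.0):.3f}")
    print(within_precursor_tolerance(1000.010, 1000.0))   # 10 ppm off
    print(within_fragment_tolerance(500.06, 500.0))       # 0.06 off
```

Because the precursor tolerance is relative, the accepted window widens with m/z, whereas the fragment window stays fixed at 0.05 Da across the mass range.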