Data for a comprehensive map and functional annotation of the human cerebrospinal fluid proteome

Knowledge about the normal human cerebrospinal fluid (CSF) proteome serves as a baseline reference for CSF biomarker discovery and provides insight into CSF physiology. In this study, high-pH reverse-phase liquid chromatography (hp-RPLC) was first integrated with a TripleTOF 5600 mass spectrometer to comprehensively profile the normal CSF proteome. A total of 49,836 unique peptides and 3256 non-redundant proteins were identified. To obtain high-confidence results, 2513 proteins with at least 2 unique peptides were further selected as bona fide CSF proteins. Nearly 30% of the identified CSF proteins have not been previously reported in the normal CSF proteome. More than 25% of the CSF proteins were components of CNS cell microenvironments, and network analyses indicated their roles in the pathogenesis of neurological diseases. The top canonical pathway in which the CSF proteins participated was axon guidance signaling. More than one-third of the CSF proteins (788 proteins) were related to neurological diseases, and these proteins constitute potential CSF biomarker candidates. The mapping results can be freely downloaded at http://122.70.220.102:8088/csf/, which can be used to navigate the CSF proteome. For more information about the data, please refer to the related original article [1], which has been recently accepted by Journal of Proteomics.

Value of the data
This study identified the largest high-confidence dataset of the human CSF proteome.
The abundances of some CSF proteins were quantified by the iBAQ method.
A high proportion of the CSF proteins are microenvironment components of CNS cells.
A high proportion of the CSF proteins participate in neurocyte connectivity.
A large fraction of the CSF proteins are biomarker candidates for neurological diseases.

1. Data, experimental design, materials and methods
Table 1 lists all the CSF proteins with at least 2 unique peptide identifications. Table 2 lists the proteins and their abundances, which were quantified by the iBAQ [2] method. Table 3 lists the CSF proteins that participate in the axon guidance signaling pathway. Table 4 lists the CSF proteins involved in neurological diseases.

Experimental design
CSF pooled from 14 patients (7 women and 7 men) who received spinal anesthesia before non-neurological operations was subjected to a global proteomic analysis. The pooled sample was first depleted of 14 high-abundance proteins with an immunoaffinity column. The flow-through proteins, bound proteins, and original proteins (extracted from the CSF samples that were not subjected to immunoaffinity depletion) were collected separately, digested according to the filter-aided sample preparation method [3], and then separated into 30 fractions each by high-pH RPLC. Each fraction was then subjected to proteomic analysis by nano-RPLC-MS/MS. In total, 90 LC-MS/MS runs were performed on the 90 fractions from the pooled CSF samples, and the resulting data were used to produce a comprehensive map of the human CSF proteome.

Apparatuses
A TripleTOF 5600 mass spectrometer from AB Sciex (Framingham, MA, USA) and an ACQUITY UPLC system from Waters (Milford, MA, USA) were used.

CSF collection
CSF samples were collected by lumbar puncture from patients who received spinal anesthesia before non-neurological operations at Beijing Tiantan Hospital. These patients were checked by an independent medical doctor to rule out neurological diseases and recent medication use. Following collection, a subsample of each CSF sample was sent to a clinical laboratory for routine CSF diagnostics. The remaining sample was immediately centrifuged for 10 min at 2500 × g to remove cellular components and subsequently aliquoted and stored at −80 °C for further analysis. A total of 14 samples from 14 individuals (7 women and 7 men, aged 24-55 years, with a median age of 28 years) were selected and subjected to quantitation by the Bradford method [4]. Equal protein amounts from the 14 CSF samples were mixed, resulting in the pooled CSF sample for the proteomic analyses. All selected samples had normal clinical laboratory values with respect to microbiology, chemistry, and cell counts. Approval for this study was obtained from our institutional review boards in accordance with ethical regulations.

Immunoaffinity depletion of 14 high-abundance proteins
The pooled CSF sample, which contained approximately 1 mg of CSF protein, was depleted of 14 high-abundance proteins (albumin, IgG, α1-antitrypsin, IgA, IgM, transferrin, haptoglobin, α1-acid glycoprotein, α2-macroglobulin, apolipoprotein A-I, apolipoprotein A-II, fibrinogen, complement C3, and transthyretin) using a 4.6 × 50 mm Human 14 affinity LC column (Agilent, St. Louis, MO, USA) with a Waters HPLC system (Milford, MA, USA). The separations were performed according to the manufacturer's instructions regarding column usage and loading capacity. The flow-through protein sample and bound protein sample were collected separately. The two samples and the non-depleted CSF sample (containing the original proteins) were subjected to the sample handling and analysis procedures described below.

Protein digestion
A filter-aided sample preparation method [3] was used to digest the proteins in the three samples (each contained 100 mg of CSF protein). Briefly, the proteins were reduced with 20 mM DTT at 95 °C for 3-5 min and washed once with 8 M urea on a 10 kDa filter at 14,000 × g for 40 min. The samples were then alkylated with 55 mM iodoacetamide for 30 min in darkness and washed twice with 8 M urea. Next, the proteins were washed once with 50 mM ammonium bicarbonate and digested with trypsin (1 μg/50 μg protein) overnight at 37 °C. After digestion, the three peptide mixtures derived from the flow-through, the bound sample and the non-depleted CSF sample were desalted on a Waters Oasis C18 solid-phase extraction column and lyophilized for HPLC separation.

High-pH HPLC separation
The three lyophilized peptide mixtures were fractionated with a high-pH RPLC column from Waters (4.6 mm × 250 mm, C18, 3 μm). Each peptide mixture was loaded onto the column in buffer A2 (H2O, pH = 10). The elution gradient was 5-30% buffer B2 (90% ACN, pH = 10; flow rate, 1 mL/min) for 60 min. The eluted peptides were collected as one fraction per minute, and the 60 fractions collected were re-suspended in 0.1% formic acid and pooled into 30 fractions. A total of 90 fractions produced from the three peptide mixtures were analyzed by LC-MS/MS.
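The fraction bookkeeping described above (60 collected fractions pooled into 30 per peptide mixture, 90 LC-MS/MS fractions in total) can be sketched as follows. The text does not state which collected fractions were combined; the pairing used here (fraction i with fraction i + 30, a common concatenation scheme) is an assumption made for illustration only, and the function name `pool_fractions` is hypothetical.

```python
def pool_fractions(n_collected=60, n_pooled=30):
    """Group collected fraction numbers (1-based) into pooled fractions.

    ASSUMPTION: fraction i is combined with fraction i + n_pooled.
    The source only states that 60 fractions were pooled into 30.
    """
    if n_collected % n_pooled != 0:
        raise ValueError("n_collected must be a multiple of n_pooled")
    per_pool = n_collected // n_pooled
    return [[i + k * n_pooled for k in range(per_pool)]
            for i in range(1, n_pooled + 1)]


pools = pool_fractions()
# Three peptide mixtures: flow-through, bound, and non-depleted CSF.
total_runs = 3 * len(pools)
print(pools[0], pools[-1], total_runs)
```

Under this assumed scheme each pooled fraction combines an early-eluting and a late-eluting fraction; a different pairing would change only the grouping, not the total of 90 fractions.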

LC-MS/MS
Each fraction was analyzed with a reverse-phase C18 self-packed capillary LC column (75 μm × 100 mm, 3 μm). The elution gradient was 5-30% buffer B1 (0.1% formic acid, 99.9% ACN; flow rate, 0.3 μL/min) for 40 min. A TripleTOF 5600 mass spectrometer was used to analyze the fractions. The MS data were acquired using the high-sensitivity mode with the following parameters: 30 data-dependent MS/MS scans per full scan, full scans acquired at a resolution of 40,000 and MS/MS scans at a resolution of 20,000, rolling collision energy, charge state screening (including precursors with charge states of +2 to +4), dynamic exclusion (exclusion duration 15 s), MS/MS scan range of 100-1800 m/z, and a scan time of 100 ms.

Database search
The MS/MS spectra were searched against the Swiss-Prot human database from the UniProt website (www.uniprot.org) using Mascot software, version 2.3.02 (Matrix Science, UK). Trypsin cleavage specificity was set with a maximum number of allowed missed cleavages of two. Carbamidomethylation (C) was set as a fixed modification. The searches were performed using a peptide and product ion tolerance of 0.05 Da. The resulting dataset was further filtered using the decoy database method in Scaffold (v 4.3.2).
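The general idea behind decoy-based filtering can be illustrated with a minimal target-decoy sketch. This is not Scaffold's implementation, which is not described here. The function name, the input layout of (score, is_decoy) pairs, and the `max_fdr` threshold are all hypothetical, since the text does not state the cutoff that was applied.

```python
def filter_by_decoy_fdr(psms, max_fdr=0.01):
    """Keep target matches above the loosest score cutoff meeting max_fdr.

    psms: list of (score, is_decoy) tuples; higher score = better match.
    The FDR at a cutoff is estimated as decoys / targets above that cutoff.
    ASSUMPTION: max_fdr is illustrative; the source gives no threshold.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = keep = 0
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets and decoys / targets <= max_fdr:
            keep = i
    return [p for p in ranked[:keep] if not p[1]]


example = [(90, False), (80, False), (70, True), (60, False), (50, False)]
strict = filter_by_decoy_fdr(example, max_fdr=0.01)
loose = filter_by_decoy_fdr(example, max_fdr=0.30)
```

With the toy scores above, the strict setting stops before the first decoy, whereas the loose setting accepts all four target matches because one decoy among four targets gives an estimated FDR of 0.25.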

Intensity-based absolute quantification (iBAQ) of proteins
Protein abundances were estimated using the iBAQ algorithm [2]. The detailed protocol is provided below: (1) The protein intensities were first computed by Progenesis LC-MS (v2.6, Nonlinear Dynamics, UK) as the sum of all identified peptide intensities (maximum peak intensities of the peptide elution profile, including all peaks in the isotope cluster). (2) The protein intensities were then divided by the number of theoretically observable peptides (calculated by in-silico protein digestion; all fully tryptic peptides between 6 and 30 amino acids were counted). (3) The resulting intensities were iBAQ values, which are shown as "Absolute iBAQ intensities" in column K of Supplementary Table 2. (4) The relative iBAQ intensities (in column L of Supplementary Table 2) were computed by dividing the absolute iBAQ intensities by the sum of all absolute iBAQ intensities. (5) The relative iBAQ intensities were applied to estimate the relative protein abundances (the proportions of protein amounts to total CSF protein amount). (6) The protein abundances (or concentrations) in CSF were finally calculated by multiplying the relative iBAQ intensities by 0.3 (the protein concentration of the pooled CSF sample).
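Steps (2) to (6) above can be sketched in Python as shown below. This is an illustrative sketch, not the pipeline used in the study: step (1), the summed peptide intensities from Progenesis LC-MS, is taken as given input. The function names, the toy sequences and intensities, and the rule of not cleaving before proline are assumptions. The factor 0.3 is taken from the text, which does not state its unit here.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence into fully tryptic peptides (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cleave after K or R; not cleaving before P is an assumed convention.
        if residue in "KR" and following != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def count_observable(sequence, min_len=6, max_len=30):
    """Step (2): count fully tryptic peptides of 6 to 30 amino acids."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))


def ibaq(intensities, sequences, total_concentration=0.3):
    """Steps (2)-(6): absolute iBAQ, relative iBAQ, and estimated abundance."""
    absolute = {}
    for protein, intensity in intensities.items():
        n = count_observable(sequences[protein])
        if n > 0:  # proteins without observable peptides cannot be scaled
            absolute[protein] = intensity / n
    total = sum(absolute.values())
    return {
        protein: {
            "absolute": value,
            "relative": value / total,
            # Same unit as total_concentration (0.3 in the text, unit not stated).
            "abundance": value / total * total_concentration,
        }
        for protein, value in absolute.items()
    }


# Toy input (hypothetical sequences and summed peptide intensities).
sequences = {"protA": "AAAAAAKCCCCCCCR", "protB": "GGGGGGGGK"}
intensities = {"protA": 200.0, "protB": 300.0}
result = ibaq(intensities, sequences)
```

In this toy case protA has two observable peptides and protB has one, so the absolute iBAQ values are 100 and 300 and the relative values are 0.25 and 0.75.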