Data on endogenous chicken sperm peptides and small proteins obtained through Top-Down High Resolution Mass Spectrometry

The endogenous peptides and small proteins present in chicken sperm were identified in the context of the characterization of a fertility-diagnostic method based on the use of ICM-MS (Intact Cell Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry). The interpretation and description of these data can be found in a research article, “Intact cell MALDI-TOF MS on sperm: a molecular test for male fertility diagnosis” (Soler et al., 2016) [1], and raw data derived from this analysis have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PRIDE: PXD002768. Here, we describe the inventory of all the molecular species identified, along with their biochemical features and functional analysis. This peptide/protein catalogue can be further employed as reference for other studies and reveal that the use of proteomics allows for a global evaluation of sperm cells functions.


a b s t r a c t
The endogenous peptides and small proteins present in chicken sperm were identified in the context of the characterization of a fertility-diagnostic method based on the use of ICM-MS (Intact Cell Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry). The interpretation and description of these data can be found in a research article, "Intact cell MALDI-TOF MS on sperm: a molecular test for male fertility diagnosis" (Soler et al., 2016) [1], and raw data derived from this analysis have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PRIDE: PXD002768. Here, we describe the inventory of all the molecular species identified, along with their biochemical features and functional analysis. This peptide/protein catalogue can be further employed as reference for other studies and reveal that the use of proteomics allows for a global evaluation of sperm cells functions. &

Value of the data
Here we make available for the scientific community the first repository of chicken sperm endogenous peptides and small proteins.
This data describes Top-Down mass spectrometry protein identification results using two different pre-fractionation strategies: gel filtration and reverse phase chromatography, which can be useful when designing future identification strategies using the same technique.
Data presented here include a description of the biochemical properties of the abovementioned identified sperm cells biomolecules as well as the molecular functions, biological processes and cellular components in which they are implicated.
This information can be valuable in male fertility research studies.

Data
This dataset consists of a compendium of peptidoforms and small proteoforms extracted from chicken ejaculated sperm, that were pre-fractionated using gel filtration or reverse phase chromatography and identified through a Top-Down mass spectrometry analysis. Hence, this set includes data regarding the identity of chicken sperm intact peptides and proteins as well as some structural biologically relevant information like N-terminal amino acids or post-translational modifications. See Figs. 1 and 2 and Supplementary Table S1.

Experimental design and sample collection
This dataset was produced with the objective of identifying by Top-Down mass spectrometry the peptides and small proteins contained in chicken sperm cells [1]. Sperm cells (200 μL) protein extraction was performed by sonication in 400 mL of 6 M Urea 50 mM Tris-HCl pH 8.8 buffer containing protease inhibitor cocktail (Roche, Switzerland). Samples were centrifuged (45 min at 13,000 rpm and 4°C) and supernatants containing the extracted proteins were kept for further analysis. The protein content was determined using a Bradford assay (BioRad, Marnes-la-Coquette, France).

Protein/peptide fractionation strategies, Top-Down mass spectrometry identification protocol and results
One mg of the extracted peptides/proteins were subjected to fractionation through chromatographic separation on an UltiMate 3000 RSLC system controlled by Chromeleon version 6.80 SR13 software (Thermo Scientific Dionex, Sunnyvale, USA) using reversed phase (RP) and gel filtration (GF) chromatography as described elsewhere [1]. A total of 35 and 65 fractions were obtained after RP and GF chromatography, respectively. All fractions obtained after fractionation were then analyzed by online micro-liquid chromatography tandem mass spectrometry (mLC-MS/MS) on a dual linear ion trap Fourier Transform Mass Spectrometer (FT-MS) LTQ Orbitrap Velos (Thermo Fisher Scientific, Darmstadt, Germany) coupled to an Ultimate s 3000 RSLC Ultra High Pressure Liquid Chromatographer (Thermo Scientific Dionex, Sunnyvale, USA) controlled by Chromeleon Software (version 6.8 SR11; Thermo Scientific Dionex, Sunnyvale, USA). Proteo/peptidoform identification and structural  characterization were performed using ProSight PC software v 3.0 SP1 (Thermo Fisher Scientific, Darmstadt, Germany). The detailed procedure followed for this analysis is described elsewhere [1]. Raw data derived from Top-Down analysis have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PRIDE: PXD002768.
In total, 1038 intact or fragment protein masses were detected by Top-Down mass spectrometry (Supplementary data Table S1). A total of 447 biomolecules were detected in RP-derived fractions and 591 biomolecules in GF-derived fractions. From all the identified biomolecules, 65 were identified using both fractionation methods, while the rest was identified uniquely after RP or GF chromatography separation. The distribution of molecular weight and isoelectric point of the identified masses after each pre-fractionation method is represented in Fig. 1A and B, respectively.

Functional analysis
In order to identify which cell compartments/functions/pathways were mainly represented by the data set, a systems biology analysis was performed. In brief, the UniProtKB accession numbers from all m/z masses that were confidently identified through Top-Down analysis were recovered and listed. From these, the official human gene symbols (HuGO Gene Nomenclature Committee) were retrieved from Gallus gallus annotated proteins, whereas uncharacterized proteins were mapped to the corresponding Homo sapiens orthologs by identifying the reciprocal-best-BLAST hits. When manual annotation was needed (i.e. for deleted entries), this was performed by BLASTp (E value reÀ 04), together with Gene Ontology (GO) functional annotation. Because far more human genes are annotated and more information in databases is available for humans than for chicken, the human background was employed when possible. The functional classification tool from the Database for Annotation, Visualization and Integrated Discovery (DAVID version 6.7) website (https://david.ncifcrf.gov/home.jsp) was employed to group proteins contained in our dataset based on functional similarity (Supplementary Data Table S2). Functional analysis was also performed using the Panther Functional Classification System (http://pantherdb.org/) to evidence the most represented molecular functions ( Fig. 2A) and biological processes (Fig. 2B). The functional analysis was completed using the "Set Distiller" module of GeneDecks (http://genecards.weizmann.ac.il/v3/index.php? path¼GeneDecks; Supplementary Data Table S3).
The data presented here consist in a list of intact and unmodified (by chemical treatment) endogenous low molecular weight biomolecules (o15 kDa) present in chicken sperm. This dataset can be further employed as reference for other studies focused on the research of sperm cells.