Data set for the proteomics analysis of the endomembrane system from the unicellular Entamoeba histolytica

Entamoeba histolytica is the protozoan parasite that causes amebiasis, an infectious disease of the human intestine and liver. This parasite contacts and kills human cells through an active process involving pathogenic factors. Cellular trafficking and secretion activities are poorly characterized in E. histolytica. In this work, we used a large-scale proteomic analysis to search for the principal components of the endomembrane system in E. histolytica. A total of 5683 peptides matching 1531 proteins (FDR of 1%) were identified, corresponding to roughly 20% of the total amebic proteome. Bioinformatic searches for domain homologies (SMART and InterProScan) and functional descriptions (KEGG and GO terms) allowed these data to be organized into distinct categories. These data represent the first in-depth proteomic analysis of subcellular compartments in E. histolytica and make it possible to establish a detailed map of vesicle-trafficking components in an ancient single-celled organism that lacks a stereotypical ER and Golgi apparatus. The data are related to [1].

The data are supplied with this article and have been deposited in the open-access ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository [2] with the dataset identifier PXD000770.

Specifications table
Subject area: Biology, parasitology
More specific subject area: Proteomics of the endomembrane system of Entamoeba histolytica
Type of data: Proteome Discoverer and MaxQuant results (.txt) and lists of identified proteins as tables (.xls)
How data was acquired: Liquid chromatography-tandem mass spectrometry (LC-MS/MS). Proteins from the internal membrane fraction of E. histolytica trophozoites were digested with trypsin to obtain peptides, which were separated by HPLC coupled to an LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher Scientific).
Data format: Raw and analyzed
Experimental factors: Not applicable
Experimental features: Cell fractionation of E. histolytica to obtain enriched endomembrane proteins, as described previously [4] with some modifications. Samples were then prepared for LC-MS/MS analysis (Fig. 1).

1. Data, experimental design, materials and methods

Preparation of samples for proteomics analysis
Proteins from the internal membrane fraction (50 µg) were precipitated with the methanol-chloroform method [3], and the resulting dried pellet was dissolved in freshly prepared digestion buffer (8 M urea in 25 mM NH4HCO3). Samples were reduced with 5 mM TCEP (45 min, 37 °C) and alkylated with 50 mM iodoacetamide (60 min, 37 °C) in the dark. Samples were then diluted with 25 mM NH4HCO3 to a final urea concentration of 1 M and digested overnight at 37 °C with sequencing-grade Trypsin Gold (1 µg; Promega, USA). After digestion, peptide mixtures were acidified to pH 2.8 with formic acid and desalted on MiniSpin C18 columns (The Nest Group, USA). Samples were dried under vacuum and solubilized in 0.1% formic acid and 2% acetonitrile before mass spectrometric analysis.
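The urea dilution and the enzyme amount in this protocol follow from simple C1V1 = C2V2 arithmetic. The sketch below assumes a hypothetical digestion volume of 100 µL (the text does not state one), and the helper names are illustrative only. Because the diluent (25 mM NH4HCO3) has the same bicarbonate concentration as the digestion buffer, only the urea concentration changes on dilution.

```python
def diluent_volume(v_initial: float, c_initial: float, c_final: float) -> float:
    """Volume of diluent to add so that c_initial * v_initial == c_final * v_total.

    Volumes may be in any unit (e.g. µL); both concentrations must share a unit (e.g. M).
    """
    if not 0 < c_final < c_initial:
        raise ValueError("c_final must be positive and lower than c_initial")
    v_total = c_initial * v_initial / c_final
    return v_total - v_initial


def trypsin_amount(protein_amount: float, substrate_per_enzyme: float = 50.0) -> float:
    """Trypsin needed at a substrate_per_enzyme:1 (w/w) protein-to-trypsin ratio.

    The result is in the same mass unit as protein_amount.
    """
    return protein_amount / substrate_per_enzyme


if __name__ == "__main__":
    # Hypothetical 100 µL of digestion buffer at 8 M urea, diluted to 1 M urea.
    added = diluent_volume(100.0, 8.0, 1.0)
    print(f"add {added:.0f} µL of 25 mM NH4HCO3 (final volume {100.0 + added:.0f} µL)")
    # The protocol's 50:1 protein-to-trypsin mass ratio: 50 units of protein need 1 unit of trypsin.
    print(f"trypsin: {trypsin_amount(50.0):.1f}")
```

An 8-fold drop in urea concentration therefore requires adding seven volumes of diluent, which is worth keeping in mind when choosing tube sizes for the overnight digest.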
The HPLC system was coupled to an LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher Scientific). Peptides were loaded onto the column with buffer A (2% acetonitrile, 0.1% formic acid) and eluted with a 120-min linear gradient from 2 to 40% buffer B (80% acetonitrile, 0.1% formic acid). After the gradient, the column was washed with 90% buffer B and then re-equilibrated with buffer A for the next run. Mass spectra were acquired on the LTQ-Orbitrap Velos with a full MS scan (resolving power 30,000) followed by 10 data-dependent MS/MS scans, with fragment ions detected in the FTMS in HCD mode (resolving power 7500). Target values were 1 × 10⁶ for full FT-MS scans and 5 × 10⁴ for FT-MS MSn scans. The ion selection threshold was set to 5000 counts.
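Two parts of this acquisition method are easy to misread: the gradient is expressed as % buffer B rather than % acetonitrile, and the top-10 MS/MS selection depends on the 5000-count threshold. The sketch below computes both under simplifying assumptions: the gradient is treated as perfectly linear with no dwell volume, and precursor selection ignores dynamic exclusion, charge-state screening and isolation windows, none of which are described in the text. The function names are hypothetical.

```python
from typing import List, Tuple


def percent_b(t_min: float, start: float = 2.0, end: float = 40.0, duration: float = 120.0) -> float:
    """%B at time t of a linear gradient from `start` to `end` %B over `duration` minutes.

    Times outside the gradient window are clamped; the wash and re-equilibration are not modelled.
    """
    t = min(max(t_min, 0.0), duration)
    return start + (end - start) * t / duration


def percent_acetonitrile(pct_b: float, acn_a: float = 2.0, acn_b: float = 80.0) -> float:
    """Acetonitrile (%) in the mixed mobile phase, given 2% ACN in buffer A and 80% in buffer B."""
    frac_b = pct_b / 100.0
    return (1.0 - frac_b) * acn_a + frac_b * acn_b


def select_top_n(precursors: List[Tuple[float, float]], n: int = 10,
                 threshold: float = 5000.0) -> List[float]:
    """Return the m/z of up to n most intense precursors with intensity >= threshold.

    `precursors` holds (m/z, intensity) pairs taken from one full MS scan.
    """
    eligible = [p for p in precursors if p[1] >= threshold]
    eligible.sort(key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in eligible[:n]]


if __name__ == "__main__":
    # At the end of the gradient (40% B) the column sees about 33% acetonitrile.
    print(percent_b(120.0), round(percent_acetonitrile(percent_b(120.0)), 1))
```

The conversion shows that "2 to 40% B" spans roughly 3.6 to 33.2% acetonitrile, because buffer B itself is only 80% acetonitrile.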

Proteomic data analysis
Data analysis was performed with the Thermo Proteome Discoverer software suite (version 1.4). For the SEQUEST search engine, the precursor mass tolerance was set to 10 ppm and the fragment ion mass tolerance to 0.6 Da. Carbamidomethylation of cysteine was set as a fixed modification, and oxidation of methionine and N-terminal acetylation were set as variable modifications. Spectra were searched against the E. histolytica UniProt database. To improve the peptide identification rate, the Percolator node in Proteome Discoverer was used, with the false discovery rate (FDR) set to 1% for both peptide and protein identifications. The identified proteins were further arranged into protein groups based on shared peptide matches.
For a comparative analysis of the peptide and protein lists identified in the three biological replicates (the three internal membrane samples), a merged table was generated; it is provided in Supplemental material 1, Table 1 (Sheet 1). The sample-specific protein groups and their corresponding peptide lists are presented in Table 1 (Sheets 2-6). Fig. 2 summarizes the identified proteins and their functional categories. For detailed analysis, the proteins in each category are listed in Supplemental material 1: Table 2 (ER, Golgi apparatus, heat shock, TGN-ER retrograde transport, endosomes and MVBs), Table 3 (proteins with potential enzymatic activity associated with internal membranes), Table 4 (GTPases) and Table 5 (possible cargo proteins). Proteins of unknown function present in the endomembrane fractions are listed in Supplemental material 1, Table 6, and proteins with multiple functions in Supplemental material 1, Table 7.
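The search filters described here can be illustrated with a short sketch. The first function expresses the 10 ppm precursor tolerance as a relative mass error. The second shows the target-decoy principle behind a 1% FDR cut-off: peptide-spectrum matches (PSMs) are ranked by score, the FDR above each score threshold is estimated as decoys/targets, and each q-value is the lowest FDR at which that PSM would still be accepted. This is a simplification: Percolator itself first re-scores PSMs with a semi-supervised machine-learning model, which is not reproduced here, and score ties are not treated specially. All names are hypothetical.

```python
from typing import List, Tuple


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float = 10.0) -> bool:
    """True if the precursor lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Attach target-decoy q-values to (score, is_decoy) pairs; a higher score is a better match."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at or above this one.
        fdrs.append(decoys / max(targets, 1))
    # q-value: minimum FDR over this threshold and every more permissive one.
    for i in range(len(fdrs) - 2, -1, -1):
        fdrs[i] = min(fdrs[i], fdrs[i + 1])
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, fdrs)]


def accept_at_fdr(psms: List[Tuple[float, bool]], fdr: float = 0.01) -> List[float]:
    """Scores of target PSMs whose q-value does not exceed the FDR threshold."""
    return [score for score, is_decoy, q in q_values(psms) if not is_decoy and q <= fdr]
```

In a real search, the decoy matches come from searching a decoy (for example, reversed) version of the protein database alongside the target sequences; the decoys/targets ratio is one common FDR estimator among several.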
To estimate the absolute abundance of the different proteins within a single sample, we used the iBAQ (intensity-based absolute quantification) feature of MaxQuant version 1.4.0.5 with default search parameters [5,6]. The results of the Proteome Discoverer and MaxQuant searches were combined. The mass spectrometry proteomics data have been deposited in the open-access ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository [2] with the dataset identifier PXD000770.
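iBAQ divides a protein's summed peptide intensity by the number of theoretically observable peptides, so that long proteins, which yield more peptides, are not overestimated. The sketch below reproduces that idea under simplifying assumptions: an in silico trypsin digest that cleaves after K or R except before P, no missed cleavages, and "observable" peptides taken to be 6 to 30 residues long. MaxQuant's own implementation and settings may differ in detail, and the function names are hypothetical.

```python
import re
from typing import List


def tryptic_peptides(sequence: str) -> List[str]:
    """Fully cleaved in silico digest: cut after K or R unless the next residue is P."""
    # Zero-width split points; re.split accepts empty matches since Python 3.7.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def observable_peptide_count(sequence: str, min_len: int = 6, max_len: int = 30) -> int:
    """Number of digest products whose length lies in [min_len, max_len]."""
    return sum(1 for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len)


def ibaq(peptide_intensities: List[float], sequence: str) -> float:
    """Summed peptide intensity divided by the number of theoretically observable peptides."""
    n = observable_peptide_count(sequence)
    if n == 0:
        raise ValueError("protein has no theoretically observable peptides")
    return sum(peptide_intensities) / n


if __name__ == "__main__":
    # Toy protein whose digest gives peptides of length 7, 9, 3 and 7, i.e. 3 observable peptides.
    seq = "MAAAAAK" + "LLLLLLLLR" + "SSK" + "GGGGGGG"
    print(tryptic_peptides(seq))
    print(ibaq([3.0e6, 1.5e6, 1.5e6], seq))  # 6e6 / 3
```

This normalization is what makes iBAQ values roughly proportional to molar amounts, which is why they can be used to rank proteins by abundance within one sample rather than to compare raw intensities.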

Bioinformatic analysis
The Proteome Discoverer annotation node, which is connected to the web-based ProteinCenter application, was used to download categorical GO information in the form of biological process (BP) terms. MaxQuant results and analysis are provided as a folder in Supplemental material 2 (Fig. 6).