Mass spectrometry data from label-free quantitative proteomic analysis of harmless and pathogenic strains of infectious microalgae, Prototheca spp

Here, we provide the dataset associated with our research article ‘Label-free quantitative proteomic analysis of harmless and pathogenic strains of infectious microalgae, Prototheca spp.’ (Murugaiyan et al., 2017) [1]. This dataset describes liquid chromatography–mass spectrometry (LC–MS)-based protein identification and quantification of a non-infectious strain, Prototheca zopfii genotype 1, and two strains associated with severe and mild infections, P. zopfii genotype 2 and Prototheca blaschkeae, respectively. Protein identification and label-free quantification were carried out by analysing the MS raw data with the MaxQuant-Andromeda software suite. Differences in expression levels of the identified proteins among the strains were computed using Perseus software, and the results are presented in [1]. This Data in Brief article provides the MaxQuant output file and the raw data deposited in the PRIDE repository under the dataset identifier PXD005305.


Subject area
Biology

More specific subject area
Label-free quantitative proteomics, bovine mastitis-associated infectious microalgae, Prototheca spp.

Type of data
Raw data, table and Excel output files

How data was acquired
LC-MS using an UltiMate 3000 HPLC system (Dionex) connected online to an LTQ-Orbitrap Velos (Thermo Scientific)

Data format
Raw, processed

Experimental factors
a) Cell culture, harvest and protein isolation
b) In-solution trypsin digestion and mass spectrometry analysis
c) Protein identification and quantitative proteomic analysis

Experimental features
Whole-cell proteins were extracted from Prototheca strains cultured until the mid-logarithmic phase of growth. For each sample, the protein concentration was determined using the Bradford assay (Bio-Rad). Proteins were reduced, alkylated and digested with trypsin in solution. Following LC-MS analysis, protein identification and quantification were performed with MaxQuant software, and the label-free quantitative comparison was carried out using Perseus software.

Data source location
Berlin, Germany

Data accessibility
Data available at PRIDE: PXD005305.

Value of the data
The data further validate the protein identifications presented in Murugaiyan et al. [1]. Data from the LC-MS analysis will provide researchers with detailed information on proteins associated with non-infectious, mildly infectious and severely infectious strains of Prototheca spp.
Prototheca spp. are "orphan species" whose genomes have not yet been sequenced; these raw data will therefore be useful for rapid re-analysis once a genome sequence becomes available.

Data
This mass spectrometry Data in Brief article is associated with the research article that aimed to identify differentially expressed proteins among three strains of Prototheca spp.: Prototheca zopfii genotype 1 (GT1), P. zopfii genotype 2 (GT2) and Prototheca blaschkeae [1]. The dataset comprises the raw data, the results of protein identification using the MaxQuant-Andromeda software suite and a list of proteins identified as differentially expressed between the non-infectious, severely infectious and mildly infectious strains of Prototheca spp. The raw data can be downloaded from the PRIDE repository (identifier PXD005305). A compilation of the identified proteins is presented in Supplementary table 1, and the differentially expressed proteins are listed in Table 1.

Prototheca strains
The following three strains from the culture collection of the Institute of Animal Hygiene and Environmental Health, Freie Universität Berlin, Germany were utilized for this study [3].
a. P. zopfii genotype 1 (SAG 2063T), non-infectious environmental strain.
b. P. zopfii genotype 2 (SAG 2021T), clinical strain.
c. P. blaschkeae (SAG 2064T), clinical strain.

Cell culture and protein extraction
Following retrieval from the culture collection, the strains were first streaked onto solid Sabouraud dextrose medium and incubated at 37 °C until visible colonies appeared. The species and genotypes were reconfirmed using MALDI profiling as described [4]. Cell culture and protein extraction were carried out as described [1].

Mass spectrometry analysis
The proteins were subjected to in-solution trypsin digestion as described [1]. The resulting peptides were purified by solid-phase extraction [5], separated by nanoscale C18 reversed-phase liquid chromatography on a Dionex UltiMate 3000 nanoLC (Dionex/Thermo Fisher Scientific, Idstein, Germany), ionised directly by electrospray ionization and measured in an LTQ Orbitrap Velos mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). MS survey scans (m/z 300-1700, resolution 60,000) were acquired in the Orbitrap, and the 20 most intense precursor ions were fragmented.
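The precursor selection step can be illustrated with a minimal sketch, assuming each survey scan is available as a list of (m/z, intensity) pairs. The function name, the peak representation and the example values are hypothetical and are not part of the instrument method; real acquisition methods typically apply further criteria (for example charge-state filtering or dynamic exclusion) that are not described here.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical representation


def select_top_n_precursors(
    peaks: List[Peak],
    n: int = 20,
    mz_min: float = 300.0,
    mz_max: float = 1700.0,
) -> List[Peak]:
    """Return the n most intense peaks inside the survey-scan m/z window."""
    # Keep only peaks that fall within the survey scan range (m/z 300-1700).
    in_range = [p for p in peaks if mz_min <= p[0] <= mz_max]
    # Sort by intensity, highest first, and keep the top n.
    return sorted(in_range, key=lambda p: p[1], reverse=True)[:n]


# Made-up survey scan; two of the five peaks lie outside the m/z window.
survey = [(250.1, 9e6), (445.12, 3e5), (512.30, 8e6), (733.41, 2e6), (1800.0, 5e6)]
top2 = select_top_n_precursors(survey, n=2)
```

With the default `n=20`, the function mirrors the "20 most intense precursor ions" rule stated above; `n=2` is used here only to keep the example short.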

Data analysis
MS/MS spectra were searched using the MaxQuant-Andromeda software suite [6][7][8] against the UniProt FASTA datasets of the Chlorella variabilis and Auxenochlorella protothecoides proteomes, with parameter settings as described in [1]. Table 2 shows the experimental design and the sample file naming format, and the dataset associated with the MaxQuant analysis is provided in Supplementary table 2. Differences in protein expression were computed for three comparisons: i) mildly infectious vs environmental strain, ii) severe infection-associated vs environmental strain and iii) severely infectious vs mildly infectious strain; the results are presented in Murugaiyan et al. [1].
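For readers who wish to re-examine the MaxQuant output outside Perseus, the following is a minimal sketch of one such pairwise comparison. It is not the Perseus workflow used in [1]: it only removes flagged entries, log2-transforms the LFQ intensities and computes a mean log2 fold change, and it omits imputation and statistical testing. The column names "Reverse", "Potential contaminant", "Only identified by site" and "LFQ intensity <sample>" are assumed to follow the usual MaxQuant proteinGroups.txt layout (older versions may name them differently), and the sample names and intensities below are hypothetical.

```python
import csv
import io
import math
from statistics import mean

# Flag columns assumed to be present in a MaxQuant proteinGroups.txt table.
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def read_protein_groups(handle):
    """Read a tab-separated table and drop rows flagged with '+'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if not any((row.get(col) or "").strip() == "+" for col in FLAG_COLUMNS)
    ]


def log2_lfq(row, columns):
    """Log2-transform non-zero LFQ intensities; zeros count as missing."""
    values = []
    for col in columns:
        intensity = float(row[col] or 0)
        if intensity > 0:
            values.append(math.log2(intensity))
    return values


def log2_fold_change(row, group_a, group_b, min_valid=2):
    """Mean log2 difference (A minus B), or None with too few valid values."""
    a = log2_lfq(row, group_a)
    b = log2_lfq(row, group_b)
    if len(a) < min_valid or len(b) < min_valid:
        return None
    return mean(a) - mean(b)


# Hypothetical sample names; the real ones follow the naming in Table 2.
GT2 = ["LFQ intensity GT2_1", "LFQ intensity GT2_2"]
GT1 = ["LFQ intensity GT1_1", "LFQ intensity GT1_2"]
header = ["Protein IDs", *FLAG_COLUMNS, *GT2, *GT1]
lines = [
    ["P1", "", "", "", "4000", "4000", "1000", "1000"],
    ["REV__P2", "+", "", "", "500", "500", "500", "500"],
    ["P3", "", "", "", "0", "2000", "1000", "1000"],
]
table = "\n".join("\t".join(fields) for fields in [header, *lines])

rows = read_protein_groups(io.StringIO(table))
fold_changes = {r["Protein IDs"]: log2_fold_change(r, GT2, GT1) for r in rows}
```

To apply the sketch to a real file, `io.StringIO(table)` would be replaced by an open file handle, and `GT2` and `GT1` by the actual LFQ intensity column names of the two strains being compared.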

Mass Spectrometry dataset deposit
The mass spectrometry data were deposited with the ProteomeXchange (PX) Consortium [9][10][11] via the PRIDE (PRoteomics IDEntifications) partner repository at the European Bioinformatics Institute (http://www.ebi.ac.uk/pride/) and are now accessible under the dataset identifier PXD005305.
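As a convenience for locating the deposit, the sketch below checks that an identifier has the ProteomeXchange form (the prefix PXD followed by six digits, as in PXD005305) and builds landing-page addresses from it. The two URL patterns are assumptions based on how the PRIDE Archive and ProteomeCentral web pages are commonly addressed; they may change, and the function name is hypothetical.

```python
import re

# ProteomeXchange identifiers of the form used here: "PXD" plus six digits.
PXD_PATTERN = re.compile(r"^PXD\d{6}$")


def dataset_urls(accession: str) -> dict:
    """Return assumed landing-page URLs for a ProteomeXchange accession."""
    if not PXD_PATTERN.match(accession):
        raise ValueError(f"not a PXD accession: {accession!r}")
    return {
        # Assumed URL patterns; verify against the repositories before use.
        "pride": f"https://www.ebi.ac.uk/pride/archive/projects/{accession}",
        "proteomecentral": (
            "http://proteomecentral.proteomexchange.org/cgi/GetDataset"
            f"?ID={accession}"
        ),
    }


urls = dataset_urls("PXD005305")
```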

Acknowledgements
We would like to thank Michael Kühl for excellent technical assistance. We acknowledge the assistance of the Bio-MS unit of the Core Facility BioSupraMol supported by the Deutsche Forschungsgemeinschaft (DFG). The author Murat Eravci was supported by the Deutsche Forschungsgemeinschaft (DFG, SFB958). We thank the PRIDE team for their assistance in the MS data deposition.

Transparency document. Supporting information
Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2017.04.006.