Data for the characterization of the HSP70 family during osmotic stress in banana, a non-model crop

Here, we present the data from an in-depth analysis of the HSP70 family in the non-model banana during osmotic stress [1]. First, a manual curation of HSP70 sequences from the banana genome was performed and updated on the Musa hub http://banana-genome.cirad.fr/. These curated protein sequences were then introduced into our in-house Mascot database for an in-depth look at the HSP70 protein profiles in banana meristem cultures and roots during osmotic stress. A 2D-DIGE LC MS/MS approach was chosen to identify and quantify the different paralogs and allelic variants in the HSP70 spots.


Value of the data
The data provide an overview of the identification and quantification of the HSP70 family in a non-model plant, with special attention to osmotic stress.
Manual curation of the banana HSP70 sequences was performed because correct structural annotation of a sequenced genome is essential prior to proteomic analysis.
The proteomics approach using 2DE and LC MS/MS was a deciding factor in the successful identification and quantification of the HSP70 paralogs and allelic variants.
1. Experimental design, materials and methods

Analysis of the Musa HSP70 family
Musa HSP70 nucleotide and protein sequences were obtained from GreenPhyl and the Banana Genome Hub [2][3][4]. Since many HSP70 genes were incorrectly predicted, all HSP70 genes predicted from the Musa acuminata (A genome) sequences were manually curated, as were the cytoplasmic and luminal HSP70 sequences predicted from the Musa balbisiana (B genome) sequences. The manually curated B genome cytoplasmic and luminal protein sequences can be found in Supplementary file 1. The manually curated A genome protein sequences are available at the Banana Genome Hub. Cytoplasmic HSP70 sequences from rice and Arabidopsis thaliana were retrieved from GreenPhyl with the accessions described by Jung et al. [5]. Alignments of protein sequences were created using the ClustalX 2.1 software [6]. An alignment of the main cytoplasmic HSP70 isoforms identified later in this manuscript can be found in Supplementary file 2. Phylogenetic trees were constructed in ClustalX 2.1 using the neighbor-joining algorithm with 1000 bootstrap replicates. Trees were visualized with NJplot [7]. The phylogenetic relationship between all curated cytoplasmic HSP70 protein sequences of the Musa A genome (GSMUA_Achr) and the cytoplasmic HSP70s of rice (LOC_Os) and Arabidopsis (AtHsp) can be found in Fig. 1.

In vitro meristem stress tests
In vitro plants of the selected variety Cachaco (ABB, ITC 0643) were supplied by the International Transit Centre of Bioversity International. Multiple shoot meristem cultures were initiated as described by Strosse et al. [8] and maintained on the standard control medium (MS medium supplemented with benzylaminopurine). All cultures were kept in the dark at 25-27 °C. A stress test was started by adding 0.31 M sucrose to the standard medium. Tissue samples of stressed meristem cultures were taken and frozen after 0, 1, 4 and 14 days. All samples were stored at −80 °C.

Plant root stress test
In vitro plants of the selected variety Cachaco (ABB, ITC 0643) were supplied by the International Transit Centre of Bioversity International. The plants were grown in a phytotron (Sanyo, MLR-351H). The humidity and temperature were kept constant at 75% and 25 °C, respectively. A 12 h/12 h light/dark period with an average light intensity of 183 ± 29 µmol photons m⁻² s⁻¹ was maintained throughout the experiment. To apply a physiologically more relevant osmotic stress, we lowered the osmotic concentration so that the plants experienced a moderate stress, with reduced growth but no complete growth arrest, and switched to the non-metabolizable osmolyte sorbitol. After five weeks, an osmotic stress test was started by adding 0.21 M sorbitol to the MSR medium (MSR medium according to Voets et al. [9]). Root samples of control plants were taken and frozen in liquid nitrogen at the start of the experiment and after 4 days. Root samples of sorbitol-stressed plants were taken and frozen after 0, 1, 4 and 14 days. All samples were stored at −80 °C.

Proteomics
Meristem and root proteins were extracted using the phenol extraction/ammonium acetate precipitation protocol reported by Carpentier et al. [10]. 50 µg of protein was labeled with each of the dyes Cy2, Cy3 and Cy5 (GE Healthcare) for a total of 150 µg protein per gel, separated on gel and scanned according to Carpentier et al. [11]. Data were analyzed using the DeCyder software version 7.0 (GE Healthcare). Statistical analysis of the standardized abundance of spots was performed in DeCyder. Statistical analysis of the raw spot intensities was performed using ANOVA in STATISTICA software 10 on the log of the peak height of the internal standard samples exported from DeCyder. Dynamic abundance profiles of the root HSP70 spots 1, 2, 3, 4, 5 and 6 after 0, 1, 4 and 14 days of stress (n = 3) can be found in Fig. 2.
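The ANOVA on log-transformed peak heights can be illustrated with a minimal sketch. It is not the STATISTICA procedure itself: the intensity values are hypothetical, the base-10 logarithm is an assumption (the base is not stated above), and the function name `one_way_anova_f` is introduced here for illustration only.

```python
import math

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the replicates around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical peak heights for one spot at 0, 1, 4 and 14 days (n = 3 each)
raw_peak_heights = {
    0: [1520.0, 1610.0, 1480.0],
    1: [2100.0, 1980.0, 2250.0],
    4: [2900.0, 3100.0, 2750.0],
    14: [1700.0, 1650.0, 1820.0],
}

# Log-transform before the ANOVA (log10 is an assumption)
log_groups = [[math.log10(v) for v in vals] for vals in raw_peak_heights.values()]
f_stat, df_b, df_w = one_way_anova_f(log_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Converting the F statistic to a p-value requires the F distribution, which a statistics package would supply; that step is left out of the sketch.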
For protein identification, gel pieces were processed based on the protocol of Shevchenko et al. [12] for the in-gel reduction, alkylation and destaining of the proteins. The destaining step was performed twice, after which the gel pieces were covered with 3 µL of 0.1 µg/µL trypsin and 47 µL trypsin buffer (25 mM ammonium carbonate, 10% acetonitrile (ACN)). Digestion was performed overnight at 37 °C. Peptides were extracted by adding 100 µL 5% ACN in 0.1% formic acid (FA), vortexing, centrifuging and sonicating for 5 min, after which the supernatant was transferred to a new Eppendorf tube. The whole peptide extraction process was repeated twice, with 50 µL 10% ACN in 0.1% FA the first time and 50 µL 95% ACN and 5% FA the last time. The pooled supernatant was then dried in a vacuum centrifuge and stored at −20 °C. Before analysis, the samples were resuspended in 0.1% FA and 5% ACN, desalted using C18 Zip Tips (Millipore), eluted in 10 µL Milli-Q water with 0.1% FA and 60% ACN, dried in a vacuum centrifuge and resuspended in 0.1% FA and 5% ACN.
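The in silico counterpart of the tryptic digestion can be sketched as follows. This is a minimal illustration, not the search engine's implementation: it assumes the common rule that trypsin cleaves C-terminal to K or R except when the next residue is P, and the protein sequence and the function name `tryptic_peptides` are hypothetical.

```python
def tryptic_peptides(sequence, missed_cleavages=1):
    """List tryptic peptides, allowing up to `missed_cleavages` skipped sites.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    # Positions immediately after each cleavable K/R
    sites = [
        i + 1
        for i, aa in enumerate(sequence[:-1])
        if aa in "KR" and sequence[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        for skipped in range(missed_cleavages + 1):
            end = start + 1 + skipped
            if end < len(bounds):
                peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides

# Hypothetical sequence fragment, used only to exercise the function
example = "MAGKFSDSSVQSDIKLWPFR"
for peptide in tryptic_peptides(example, missed_cleavages=1):
    print(peptide)
```

With one missed cleavage allowed, each fully cleaved peptide is accompanied by the peptide that spans it and its C-terminal neighbor, which increases the number of candidate peptides a search has to consider.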
The HPLC-MS/MS analysis was performed on a Q Exactive Orbitrap mass spectrometer (Thermo Scientific, USA). The samples (5 µL) were injected and separated on an Ultimate 3000 HPLC system (Dionex, Thermo Scientific) equipped with a C18 PepMap100 precolumn (5 µm, 300 µm × 5 mm, Thermo Scientific) and an EasySpray C18 column (Thermo Scientific). For identification, all raw data were converted into mgf files using Progenesis v4.1 (Nonlinear Dynamics, UK). The spectra were searched using Mascot (version 2.2.04) against our in-house Musa database (76,220 sequences) containing all the protein sequences of the published A and B genomes plus contaminant sequences (trypsin and keratin). Redundancy was eliminated from the database using the program CD-HIT [13]. If the A and B isoforms were identical, the B genome isoform was eliminated. The original HSP70 protein sequences were removed and replaced by the manually curated HSP70 sequences. Search parameters were set as follows: tryptic digestion, one missed cleavage allowed, 10 ppm precursor mass tolerance and 0.02 Da fragment ion tolerance, with cysteine carbamidomethylation as a fixed modification and methionine oxidation as a variable modification.
The advantage of an LC separation is illustrated by the separation of the peptides FSDSSVQSDIK (encoded by gene GSMUA_Achr7T15160) and YSDASVQSDIK (encoded by gene GSMUA_Achr10T00900). These two isoforms of the peptide have the same monoisotopic mass but, because of their different hydrophobicity, different retention times on the RP column (approximately 19 and 17 min, respectively) (Fig. 3). With MALDI-TOF/TOF MS this would have resulted in a chimeric spectrum, whereas LC-MS/MS produces separate spectra.
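That the two peptides are isobaric can be checked with a short calculation. The sketch below sums standard monoisotopic residue masses. The doubly protonated charge state is an assumption, since no charge state is given for these peptides, and the function names are introduced here for illustration.

```python
# Standard monoisotopic residue masses (Da) for the residues in these peptides
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "F": 147.06841,
    "I": 113.08406, "K": 128.09496, "Q": 128.05858,
    "S": 87.03203, "V": 99.06841, "Y": 163.06333,
}
WATER = 18.01056    # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of a proton

def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def mass_to_charge(peptide, charge=2):
    """m/z of the [M + zH]z+ ion; charge 2 is an assumption."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge

for peptide in ("FSDSSVQSDIK", "YSDASVQSDIK"):
    print(peptide, round(monoisotopic_mass(peptide), 4), round(mass_to_charge(peptide), 4))
```

The two sequences differ by F to Y (one oxygen gained) and S to A (one oxygen lost), so their elemental compositions, and therefore their monoisotopic masses, are identical. Under the 2+ assumption both give an m/z of about 606.79, so the precursor mass alone cannot distinguish them.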
An isoform was retained as positively identified in a spot if at least one tryptic specific peptide was found with an ion score higher than the Mascot identification score. Cytoscape v3.0 software was used to visualize tryptic specific peptides [14][15][16]. Supplementary file 3 contains a list of all identified HSP70 paralogs and allelic variants per spot based on this Mascot analysis. To quantify the different protein species in each spot, the Mascot emPAI was exported and the ion intensity of the proteotypic peptide of each isoform was analyzed in Progenesis v4.1. Moreover, for all isoforms positively identified in at least one spot, we searched the unidentified MS/MS spectra of each spot in which they were not identified by performing a manual SRM approach. The ion intensity of an MS/MS spectrum was added to the quantification when the peptide fragment mass corresponded to the proteotypic peptide and a specific signature m/z was identified in the MS/MS spectrum. Several spots in a trail contain multiple proteins, even on a 24 cm zoom strip spanning 3 pI units, as has already been indicated by Schmidt et al. [17]. Although all spots appear well separated on the 2-DE gels, they contain several proteins originating from neighboring spots. The isoelectric focusing of one particular isoform is not restricted to one physical location in the gel, and each isoform has its highest abundance at a particular isoelectric point (Supplementary file 4). Supplementary file 4 provides an overview of the ion intensity of the proteotypic peptide of each identified paralog and/or allelic variant in all spots.
Although the two peptides shown in Fig. 3 share an identical m/z (606.7908), their different amino acid composition leads to different hydrophobic behavior, as evidenced by their different GRAVY scores, and therefore to different retention times. GRAVY stands for grand average of hydropathicity; peptides with a more negative score are more hydrophilic and elute earlier during RP chromatography.
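A GRAVY score is the mean of the per-residue hydropathy values of a sequence. The sketch below uses the Kyte-Doolittle scale, which is the scale on which GRAVY is conventionally defined. The scale used for the scores reported here is not stated, so this choice is an assumption, and the values computed are not taken from the original figure.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(peptide):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)

scores = {p: gravy(p) for p in ("FSDSSVQSDIK", "YSDASVQSDIK")}
for peptide, score in scores.items():
    print(f"{peptide}: GRAVY = {score:.3f}")

# The peptide with the more negative score is expected to elute first
print("Expected to elute first:", min(scores, key=scores.get))
```

On this scale YSDASVQSDIK scores lower than FSDSSVQSDIK, which agrees with the elution order described in the text (approximately 17 min versus 19 min).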