Delineating the glycoproteome of elongating cotton fiber cells

The data presented here delineates the glycoproteome component of elongating cotton fiber cells, obtained using complementary proteomic approaches followed by protein and N-linked glycosylation site identification (Kumar et al., 2013) [1]. Using species-specific protein sequence databases in proteomic approaches often yields additional information that may not be obtained using cross-species databases. In this context, we have reanalyzed our glycoproteome dataset with the Gossypium arboreum, Gossypium raimondii (version 2.0) and Gossypium hirsutum protein databases, which has led to the identification of 21 N-linked glycosylation sites and 18 unique glycoproteins that were not reported in our previous study. The 1D PAGE and solution-based glycoprotein identification data is publicly available at the ProteomeXchange Consortium via the PRIDE partner repository (Vizcaíno et al., 2013) [2] under the dataset identifier PXD000178. The 2D PAGE-based protein identification data and the glycopeptide-approach-based N-linked glycosylation site identification data are available through the same repository [2] under the dataset identifier PXD002849.


The data is accessible via this article, via the related research article [1], and at the ProteomeXchange Consortium via the PRIDE partner repository [2] using the dataset identifiers PXD000178 and PXD002849 (http://proteomecentral.proteomexchange.org).

Value of the data
The present data provides valuable insights into the glycoproteins present in elongating cotton fiber cells, identified using Gossypium species-specific protein sequence databases.
A total of 127 N-linked glycosylation sites from 81 unique glycoproteins are reported, of which 21 N-linked glycosylation sites corresponding to 17 unique glycoproteins are reported exclusively in the current study.
Our analyses using five independent protein databases show that Gossypium hirsutum harbors protein sequences from its parental contributors Gossypium arboreum and Gossypium raimondii.
The elucidated glycoproteome composition indirectly provides clues about parental genome contribution and unicellular compartmental requirement behind single cell development.

Data
Concanavalin A (Con A) based lectin affinity chromatography was employed to enrich glycoproteins isolated from elongating cotton fiber (Fig. 1). Four different proteomic approaches followed by five independent database search strategies were applied to identify the glycoproteins using Nano-LC-MALDI TOF/TOF. Our data revealed 352 unique proteins, including 305 proteins (>86%) with potential N-linked glycosylation sites [1].
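As a quick check on the proportion quoted above, the share of identified proteins carrying potential N-linked glycosylation sites follows directly from the two reported counts (305 of 352). The rounding shown is ours and is given only for illustration:

```latex
\[
\frac{305}{352} \approx 0.866 \quad\Longrightarrow\quad \text{about } 86.6\% \text{ of the identified proteins}
\]
```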

Experimental design
The role of glycosylated proteins in the structural and regulatory aspects of cotton fiber development is yet to be explored. In this context, we have optimized a salt-based extraction procedure coupled with ultrasonication, followed by lectin affinity chromatography based enrichment of cotton fiber glycoproteins (Fig. 1). In our previous study [1], protein identities were obtained using the publicly available NCBInr and partially sequenced Cotton D genome (Gossypium raimondii) derived protein sequence databases, as the complete genome sequences of Gossypium hirsutum and its parental species were not available at that time. Protein identities and posttranslational modification sites obtained using cross-species and partially sequenced protein databases are often incomplete. We have therefore reanalyzed the dataset using the protein sequences from G. hirsutum (AD), G. arboreum (A) and G. raimondii (D, version 2) to explore additional protein identities and N-linked glycosylation sites that were not reported in our previous study [1].

Plant materials
Cotton plants (G. hirsutum cv. Coker 310) were grown in a climate-controlled greenhouse. Cotton bolls were collected from plants during the elongation stages (5-15 dpa), and fibers were carefully removed from the ovule, frozen immediately in liquid nitrogen and stored at −70°C until further use.

Protein extraction
In order to isolate the maximum amount of protein from cotton fibers in a form compatible with the downstream glycoprotein enrichment procedures, we have optimized a salt-based buffer extraction followed by ultrasonication. Briefly, cotton fibers were ground into a fine powder and suspended in extraction buffer containing 25 mM Tris (pH 7.5), 0.2 M CaCl2, 0.5 M NaCl, 20 mM β-mercaptoethanol (β-Me) and 1X Proteinase inhibitor cocktail (Roche). The buffer extract was left under constant shaking for 2 h, followed by intermittent vortexing at 4°C. Ultrasonication of the suspended extract was performed at 35% amplitude for 10 min under ice-cold conditions. The sample extract was then centrifuged for 20 min at 10,000 × g and the supernatant was transferred into fresh centrifuge tubes. Three volumes of extraction buffer were again added to the pellet fraction and the extraction steps were repeated. The supernatants were pooled, filtered and dialyzed overnight. All the above-mentioned steps were performed at 4°C with three independent sample replicates. Dialyzed samples were frozen and lyophilized prior to use.
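For readers preparing the extraction buffer, the stated molar concentrations can be converted into weighed masses as sketched below. This is only an illustration: the salt forms (Tris base, anhydrous CaCl2) and the helper name `grams_per_component` are our assumptions, since the text does not state which forms were used, and the sketch leaves out pH adjustment, β-mercaptoethanol and the inhibitor cocktail.

```python
# Molar masses in g/mol. The salt forms are assumptions (Tris base,
# anhydrous CaCl2); hydrates or Tris-HCl would need different values.
MOLAR_MASS_G_PER_MOL = {
    "Tris base": 121.14,
    "CaCl2 (anhydrous)": 110.98,
    "NaCl": 58.44,
}

# Final concentrations of the solid components in mol/L, as listed in the text.
EXTRACTION_BUFFER_MOLAR = {
    "Tris base": 0.025,
    "CaCl2 (anhydrous)": 0.2,
    "NaCl": 0.5,
}


def grams_per_component(volume_litres):
    """Return the mass in grams of each solid needed for the given volume."""
    return {
        name: conc * MOLAR_MASS_G_PER_MOL[name] * volume_litres
        for name, conc in EXTRACTION_BUFFER_MOLAR.items()
    }


if __name__ == "__main__":
    # Example: amounts for 0.5 L of extraction buffer.
    for name, grams in grams_per_component(0.5).items():
        print(f"{name}: {grams:.2f} g")
```

The same function can be reused for the binding buffer by swapping in its concentrations and adding the molar masses of the additional salts.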

Glycoprotein capture by lectin affinity chromatography
Lyophilized crude protein extract was dissolved in binding buffer containing 20 mM Tris (pH 7.5), 0.5 M NaCl, 1 mM CaCl2, 1 mM MnCl2 and 1 mM MgCl2 and subjected to Concanavalin A (Con A) lectin affinity chromatography (LAC) in a manually packed column as described by Catala et al. [3]. In order to achieve maximum yield, glycoproteins bound to the lectin affinity column were eluted in three consecutive steps, each with 3 column volumes (CVs) of binding buffer containing 0.5 M methyl α-D-mannopyranoside (step I), 1 M methyl α-D-mannopyranoside (step II) and 1 M glucose (step III), respectively (see Fig. 1C in [1]). Eluate fractions were pooled, buffer exchanged and concentrated with buffer containing 20 mM Tris (pH 7.5) using Amicon 10 kDa (MWCO) centrifugal filters (Vivascience).

1D and 2D SDS PAGE
Around 50 μg of the protein sample enriched using LAC was subjected to 12% linear SDS-PAGE separation [4] in replicates. The gels were stained with Coomassie, Periodic Acid Schiff (PAS) or β-glucosyl Yariv stain to visualize the protein, glycoprotein or arabinogalactan pattern, respectively. Around 100 μg of the Con A enriched protein sample was subjected to two-dimensional gel electrophoresis (2D SDS PAGE) using non-linear and linear immobilized pH gradient (IPG) strips. Briefly, for the first-dimensional separation, the sample was loaded onto 13 cm non-linear (pI 3-10) and linear (pI 4-7) IPG strips (Amersham Biosciences) and isoelectric focusing (IEF) was performed according to the manufacturer's instructions. Strips were then equilibrated and the second-dimensional separation was carried out on 12% SDS polyacrylamide gels (13 cm, 1.5 mm). Gels were silver stained to visualize the spots and stored in 1% acetic acid at 4°C until further use.

Gel phase digestion and gel free (solution phase) digestion
Glycoprotein samples resolved in 12% 1D PAGE gels were excised into 0.5 mm gel slices (18 slices) from high to low molecular weight region. Bands from 1D PAGE and spots from 2D PAGE containing the protein of interest were subjected to in-gel trypsin digestion as described by Shevchenko et al. [5] with minor modifications as described previously by Kumar et al. [1].
Solution phase glycoprotein samples were subjected to proteolysis by trypsin using the Filter Aided Sample Preparation (FASP) method with YM30/YM10 ultracentrifugal units (Millipore), as previously described [6], to obtain peptides for glycopeptide capture and the gel-free 2D LC-MALDI TOF/TOF approach. Tryptic peptides were lyophilized and stored at −80°C prior to use.

Glycopeptide capture
Tryptic peptides obtained from the glycoprotein fraction (Con A eluate) were dissolved in buffer containing 10 mM HEPES-NaOH (pH 7.5), 1 mM CaCl2, 1 mM MnCl2 and 1 mM MgCl2 and subjected to Con A based lectin affinity chromatography as described by Kaji et al. [7], with minor modifications as described previously [1]. Since plant glycoproteins are known to possess α1,3-linked core fucose, we have used both PNGase A and PNGase F to deglycosylate the glycosylated peptides and to identify the potential N-glycosylation sites. Briefly, glycopeptide fractions were dried under vacuum and dissolved either in 50 mM sodium phosphate buffer (pH 7.0) or in 50 mM citrate phosphate buffer (pH 5.0) for PNGase F or PNGase A based deglycosylation, respectively.
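PNGase F and PNGase A release the N-glycan and, in doing so, convert the formerly glycosylated asparagine into aspartic acid. This is why deglycosylated peptides are searched with deamidation of asparagine as a variable modification. The sketch below illustrates the resulting mass shift (about +0.984 Da per site) using standard monoisotopic residue masses. The function name `peptide_mass` and the example peptide are hypothetical and are not taken from the dataset.

```python
# Standard monoisotopic residue masses in Da, rounded to 5 decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the free peptide termini

# Asn -> Asp conversion left behind by PNGase release of the glycan.
DEAMIDATION_SHIFT = RESIDUE_MASS["D"] - RESIDUE_MASS["N"]


def peptide_mass(sequence, deamidated_sites=0):
    """Monoisotopic mass of a peptide with a given number of deamidated Asn."""
    if deamidated_sites < 0 or deamidated_sites > sequence.count("N"):
        raise ValueError("deamidated_sites must be between 0 and the Asn count")
    base = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return base + deamidated_sites * DEAMIDATION_SHIFT


if __name__ == "__main__":
    peptide = "AVNGTLR"  # hypothetical tryptic peptide containing N-G-T
    shift = peptide_mass(peptide, 1) - peptide_mass(peptide, 0)
    print(f"Mass shift after deglycosylation: {shift:.5f} Da")
```

Because the shift is small, it is generally resolved only with adequate mass accuracy, which is one reason a sequence-motif check is commonly applied alongside it.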

Database search strategy for protein identification
Tryptic peptides from 1D SDS PAGE, solution-based samples and deglycosylation fractions were injected into a Nano-LC system, fractionated and spotted onto PAC (Pre Anchored Chip) MALDI target plates. The plates were analyzed using an Ultraflex III MALDI TOF/TOF mass spectrometer (Bruker Daltonics). Acquired mass spectra were analyzed and annotated using Flexanalysis software version 3.0 through WARP-LC (Workflow Administration by Result driven Processing, Bruker Daltonics). The protein identification parameters were as follows: the significance threshold was set to achieve p < 0.02, the expectancy cut-off was set to 0.05, and only individual ion scores > 45 were considered for identification. These parameters led to an FDR value of < 1% in both of the above-mentioned database search strategies. The database search strategy for deglycosylated peptide identification was the same as described above, with minor additions: variable modifications included deamidation of asparagine (N), and a peptide was considered formerly glycosylated only if the deamidated asparagine (N) was followed by X-S/T (any amino acid except proline, then serine or threonine). In addition, only those peptides that were observed in three replicate sample injections are reported as formerly glycosylated peptides in the current study.
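The acceptance criteria described above (ion score > 45, deamidated asparagine within an N-X-S/T motif where X is not proline, and observation in three replicate injections) can be expressed as a simple filter. The following is a minimal sketch of that logic, not the software workflow used in the study. The `PeptideHit` record, its field names and the example peptides are hypothetical.

```python
import re
from dataclasses import dataclass

# N-X-S/T sequon with X != P. A lookahead is used so that overlapping
# sequons (e.g. in "NNST") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


@dataclass
class PeptideHit:
    sequence: str                # peptide sequence in one-letter code
    deamidated_positions: tuple  # 0-based indices of deamidated Asn residues
    ion_score: float             # individual ion score from the search
    replicates_observed: int     # replicate injections with this peptide


def sequon_positions(sequence):
    """Return 0-based indices of Asn residues that start an N-X-S/T motif."""
    return [m.start() for m in SEQUON.finditer(sequence)]


def is_formerly_glycosylated(hit, min_score=45.0, required_replicates=3):
    """Apply the score, replicate and sequon criteria to one peptide hit."""
    if hit.ion_score <= min_score:
        return False
    if hit.replicates_observed < required_replicates:
        return False
    sequons = set(sequon_positions(hit.sequence))
    return any(pos in sequons for pos in hit.deamidated_positions)


if __name__ == "__main__":
    hit = PeptideHit("AVNGTLR", (2,), ion_score=52.0, replicates_observed=3)
    print(is_formerly_glycosylated(hit))
```

One limitation of this sketch is that it inspects the peptide sequence only. An asparagine at or next to the peptide C-terminus would need the flanking residues from the parent protein sequence to be evaluated correctly.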