Glycoproteome of Elongating Cotton Fiber Cells

Cotton ovule epidermal cell differentiation into long fibers primarily depends on wall-oriented processes such as loosening, elongation, remodeling, and maturation. Such processes are governed by cell wall bound structural proteins and interacting carbohydrate active enzymes. Glycosylation plays a major role in the structural, functional, and localization aspects of the cell wall and extracellular destined proteins. Elucidating the glycoproteome of fiber cells would reflect its wall composition as well as compartmental requirement, which must be system specific. Following complementary proteomic approaches, we have identified 334 unique proteins comprising structural and regulatory families. Glycopeptide-based enrichment followed by deglycosylation with PNGase F and A revealed 92 unique peptides containing 106 formerly N-linked glycosylated sites from 67 unique proteins. Our results showed that structural proteins like arabinogalactans and carbohydrate active enzymes were relatively more abundant and showed stage- and isoform-specific expression patterns in the differentiating fiber cell. Furthermore, our data also revealed the presence of heterogeneous and novel forms of structural and regulatory glycoproteins. Comparative analysis with other plant glycoproteomes highlighted the unique composition of the fiber glycoproteome. The present study provides the first insight into the identity, abundance, diversity, and composition of the glycoproteome within single celled cotton fibers. The elucidated composition also indirectly provides clues about unicellular compartmental requirements underlying single cell differentiation.

its primary cell wall (4). The relatively increased protein content and its consistency throughout the elongation phase correlate with its stage-specific compartmental requirement (5). Elongation is the most active and vigorous phase, during which the cell extends to 2 to 6 cm in length at a rate of >2 mm/day (1,2). The increase in length involves expansive deformation of the cell wall, including loosening, expansion, and remodeling. These processes collectively determine the cell wall's yielding properties and are governed by the cellulose microfibril-matrix network and associated factors, such as wall bound structural proteins and interacting enzymes (6). These proteins play crucial roles in the elongation and maturation of numerous fiber cells on the ovule surface in a synchronized fashion (7).
Earlier efforts to understand fiber cell differentiation showed the stage-specific expression of genes encoding cell wall enzymes (8), implicating their probable role in cell elongation and post elongation events (9). Furthermore, experimental data from other plant systems highlight the roles of carbohydrate active enzymes (CAZymes), such as xyloglucan endotransglycosylases/hydrolases (XETs/XTHs) (10,11), glucanases (10,12), glycosyl transferases (GTs) (13,14), and pectin methyl esterases (PMEs) (15,16), in the wall modification occurring during cell development. Most of the earlier mentioned functions were suggested based on transcriptomic, molecular biology or biochemical tools. Transcript level information, however, does not reflect the structure, function or abundance of the gene products. In addition to CAZymes, genes encoding structural proteins, such as arabinogalactan proteins (AGPs) and fasciclin-like arabinogalactan proteins (FLAs), have been shown to play crucial roles in fiber development (17). AGPs are also known to act as signaling molecules, modulators of cell wall mechanics, pectin plasticizers (18), and stimulators of XET activity (19) and are also involved in pattern formation (20). Despite their diverse roles (21), experimental evidence concerning the heterogeneity and abundance that govern the roles of the AGP family members is still emerging.
Proteomic studies employed so far to understand fiber development have provided an overview of key metabolic events based on the expression pattern of highly abundant and detectable proteins from whole fiber extracts (5,13,22). However, no insight into the cell wall bound enzymes and structural proteins has been gathered through proteomic approaches. The majority of such CAZymes and structural proteins are known to be secreted, destined for the cell wall and N-linked glycosylated in plants (23). The glycosylation status of such proteins leads to an extended or altered conformation, which in turn is essential for crosslinking to the cell wall matrix and the strengthening of the cell wall (24). Therefore, the glycoproteome indirectly represents the proteome composition of plant cell walls and may also reflect the system specific functional properties of the wall (23,25). Exploring the glycoproteome of cotton fibers may also provide interesting clues about the single cell compartmental makeup and the probable role of these proteins during development. To our knowledge, such studies have not been performed in cotton fibers. In this context, we have characterized the glycoproteome of the cotton fiber employing lectin affinity chromatography (LAC) followed by protein identification using complementary proteomic approaches. Our study provides evidence for the identity, abundance, heterogeneity, and novel forms of glycoproteins including cell wall destined AGPs, FLAs, and CAZymes. Comparative analysis with known plant glycoproteome data sets highlighted the unique compositional makeup of the fiber glycoproteome. Further validation using quantitative real time PCR (qRT-PCR) of the glycoprotein encoding genes revealed their stage and isoform specific expression profiles, suggesting that these genes may play a regulated role in the developmental process.

EXPERIMENTAL PROCEDURES
Plant Materials-Cotton plants (Gossypium hirsutum cv. Coker 310) were grown in a climate controlled greenhouse. Bolls were excised from the plants during the elongation stages (5-15 days post anthesis, dpa), and fibers were carefully removed from the ovule, frozen immediately in liquid nitrogen, and stored until use.
Protein Extraction-Cotton fibers were ground into a fine powder in liquid nitrogen, along with 10% polyvinyl polypyrrolidone (PVPP) and 10% silicon dioxide (SiO2), in a prechilled mortar and were suspended in extraction buffer containing 25 mM Tris (pH 7.5), 0.2 M CaCl2, 0.5 M NaCl, 20 mM β-mercaptoethanol (β-ME), and 1× proteinase inhibitor mixture (Roche). Extraction was performed for 2 h with constant shaking and intermittent vortexing, followed by ultrasonication at 35% amplitude for 10 min with a pulse interval of 5 s in ice-cold conditions. The sample extracts were then centrifuged at 10,000 × g for 20 min, and the supernatant was separated from the pellet. Three volumes of extraction buffer were added to the pellet, and the extraction was repeated. The supernatants were pooled, filtered, dialyzed overnight, and lyophilized prior to use.
Glycoprotein Capture by Lectin Affinity Chromatography-Lyophilized samples were solubilized in buffer containing 20 mM Tris (pH 7.5), 0.5 M NaCl, 1 mM CaCl2, 1 mM MnCl2, and 1 mM MgCl2 and were subjected to lectin affinity chromatography (LAC) in a manually packed column containing 2 ml of concanavalin A (Con A) Sepharose resin (Sigma) (26). The bound proteins were eluted in three steps, each using 3 column volumes (CVs) of buffer, containing 0.5 M methyl α-D-mannopyranoside (step I), 1 M methyl α-D-mannopyranoside (step II), and 1 M glucose (step III), respectively. Eluate fractions were pooled, the buffer was exchanged with 20 mM Tris (pH 7.5), and the sample was concentrated using Amicon 10 kDa (MWCO) centrifugal filters (Vivascience, Germany).
One Dimensional (1D) and Two Dimensional (2D) SDS-PAGE-The protein samples that were enriched using LAC were subjected to 12% SDS-PAGE separation (27) in replicates. The gels were stained with Coomassie Blue, periodic acid-Schiff (PAS) or β-glucosyl Yariv stain to visualize the proteins, glycoproteins or arabinogalactan patterns, respectively. An aliquot of the protein sample was subjected to two-dimensional gel electrophoresis (2D-SDS-PAGE) as described previously (28). The gels were stained using a silver staining procedure to visualize the spots and were stored in 1% acetic acid at 4°C until further use.
Gel Phase Digestion and Gel Free (Solution Phase) Digestion-Glycoprotein samples resolved on 12% 1D-PAGE gels were excised into 0.5 mm gel slices (18 slices) from the high to low molecular weight regions. The bands from 1D-PAGE and the spots from 2D-PAGE were subjected to in-gel trypsin digestion as described by Shevchenko et al. (29) with minor modifications, which are described in the supplemental Methods. For the glycopeptide capture and gel-free 2D LC-MALDI TOF/TOF approaches, solution phase glycoprotein samples were subjected to trypsin proteolysis using the filter aided sample preparation (FASP) method as previously described (30). The tryptic peptides were lyophilized and stored at -80°C prior to use.
Glycopeptide Capture-Glycopeptide capture, deglycosylation and protein identification were performed on three independent replicate samples as described by Kaji et al. (31). Database search parameters for the deglycosylated peptide identification are described in the supplemental Methods. The following criteria were used for the identification of glycopeptides: (1) the significance threshold was set to p < 0.02; (2) the expectancy cut off was set to 0.05; and (3) only individual ion scores (>45) that indicated identity were considered for identification (false discovery rate (FDR) <1%). Furthermore, a peptide was considered formerly glycosylated only if the deamidated asparagine (N) lay within an N-X-S/T motif, where X is any amino acid except proline and the third residue is serine or threonine. Additionally, only those peptides that were observed in all three replicate sample injections were reported as formerly glycosylated peptides in the current study.
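The sequon and replicate criteria above can be illustrated with a minimal sketch. It is not the pipeline used in this study; the function names and the example peptides are hypothetical, and a sequon that extends beyond the C terminus of a peptide would need the parent protein sequence to be checked, which this sketch does not do.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(sequence):
    """Return 1-based positions of asparagines that sit in an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


def is_formerly_glycosylated(peptide, deamidated_positions):
    """True if at least one deamidated Asn (1-based position) lies in a sequon."""
    return bool(set(sequon_positions(peptide)) & set(deamidated_positions))


def seen_in_all_replicates(peptide, replicates):
    """True if the peptide occurs in every replicate (each a set of peptides)."""
    return all(peptide in replicate for replicate in replicates)


# Hypothetical example peptide, not taken from the data set.
print(sequon_positions("ANVTK"))  # Asn at position 2 is followed by V-T
```

Deamidation of asparagine can also occur spontaneously, which is why the sequon check is combined with the replicate requirement rather than used alone.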
Data Analysis-Data analysis was performed as shown in Fig. 1B. Briefly, mascot generic format (MGF) files were extracted from the individual samples and exported into PEAKS Studio (version 6.0) (32). Spectral files were subjected to protein identification through multiple search engines against the aforementioned databases. The following data analysis parameters were used: (1) an FDR <1%; (2) at least one unique peptide; and (3) a protein probability of >90%. The identified proteins were exported into the BLAST2GO platform (version 2.0) (www.blast2go.com/b2ghome) (33) for Gene Ontology (GO) annotation, protein motif prediction, and pathway mapping. Potential N-terminal signal peptides were predicted using the SignalP 4.0 server (www.cbs.dtu.dk/services/SignalP) (34); integral transmembrane domains were predicted using TMHMM-2.0 (www.cbs.dtu.dk/services/TMHMM), whereas potential N-linked glycosylation sites were predicted using NetNGlyc (www.cbs.dtu.dk/services/NetNGlyc), and glycosylphosphatidylinositol (GPI)-anchored proteins were predicted using the Big-PI plant predictor tool (www.mendel.imp.ac.at/gpi/plant_server.html) (35). Proteins were classified into different CAZyme families using the CAZymes Analysis Toolkit (CAT) (www.mothra.ornl.gov) (36). Leucine rich repeat sequences were predicted using the LRR finder tool (www.lrrfinder.com). Peroxidases were analyzed and classified using the Peroxibase database (www.peroxibase.toulouse.inra.fr/peroxiscan.php) (37). Spectral counting-based semiquantitative analysis, as described in the supplemental Methods, was used to calculate the relative abundance of the proteins in the current data set (38).
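As an illustration only, the per-protein acceptance criteria listed above could be applied as in the following sketch. The record layout (`ProteinHit`) and its field names are hypothetical and do not correspond to the PEAKS Studio export format; the FDR threshold is a data set level property and is assumed here to have been enforced by the search software beforehand.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    accession: str         # hypothetical identifier
    unique_peptides: int   # number of unique peptides supporting the hit
    probability: float     # protein probability expressed as a fraction (0-1)


def passes_filters(hit, min_unique_peptides=1, min_probability=0.90):
    """Keep hits with >= 1 unique peptide and a protein probability > 90%."""
    return (hit.unique_peptides >= min_unique_peptides
            and hit.probability > min_probability)


def filter_hits(hits):
    """Return only the hits that satisfy both per-protein criteria."""
    return [hit for hit in hits if passes_filters(hit)]
```

The probability comparison is strict, matching the ">90%" wording, so a hit at exactly 0.90 is rejected.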
Proteomic Data Set-The raw data from the mass spectrometry experiments have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (39) and can be found using the data set identifier PXD000178. Annotated MS/MS spectra corresponding to proteins identified from 2D-PAGE and the glycopeptide approaches are shown in supplemental Data S1 and S2.

RESULTS
Glycoproteome of Cotton Fibers-Cotton fiber glycoproteins were enriched using Con A LAC and were identified using four independent approaches (Fig. 1A, 1B and Table I).
Altogether, 334 unique proteins with ≥1 unique peptide were identified with an FDR <1%. A total of 92 proteins were identified by at least two approaches, including nine proteins that were identified by all four approaches (Fig. 2A). The molecular weight distribution of the identified proteins from the 1D-PAGE analysis is depicted in Fig. 3A. Fifty-eight protein spots were identified using 2D-PAGE (Fig. 4A, 4B and supplemental Table S1), and eight of these proteins showed isoform (chain)-like patterns. Amino acid sequence variations of the protein isoforms were confirmed using tandem mass spectrometry (MS/MS) and are highlighted in Table II and in supplemental Data S1. Furthermore, we report the presence of 92 formerly N-linked glycosylated peptides, containing 106 glycosylation sites, from 67 unique proteins (supplemental Table S2 and supplemental Data S2). The protein identification details from the independent approaches are listed in Table I and supplemental Table S3. In silico analysis identified 286 unique proteins with potential N-linked glycosylation sites accounting for >85% of the glycoproteome. Furthermore, 46%, 13%, and 5% of the identified proteins were predicted to have signal peptide cleavage sites, transmembrane domain regions and GPI anchor sites, respectively (supplemental Table S3).
[Fig. 1 legend, partial: Overlay view of the Con A lectin affinity chromatography elution profile of cotton fiber glycoproteins from three independent replicates. E01: Fractions eluted with binding buffer containing 0.5 M methyl α-D-mannopyranoside; E02: Fractions eluted with binding buffer containing 1 M methyl α-D-mannopyranoside; E03: Fractions eluted with binding buffer containing 1 M glucose (C). SDS-PAGE profile of crude, Con A unbound and Con A bound fiber proteins stained with Coomassie blue (D) and glycoprotein specific PAS stain (E). SDS-PAGE profile of Con A bound proteins stained with β-glucosyl Yariv reagent and counterstained with Coomassie blue (F).]
GO analysis annotated 67% of the proteins to be either cell wall or extracellular region destined, and 48% of the proteins were annotated as hydrolases, of which 26% were found to be involved in carbohydrate metabolism (Fig. 2C). Spectral counting-based semiquantitative analysis revealed the relative abundance of redundant (repeatedly detected) AGPs and FLAs, as well as nonredundant CAZymes (Fig. 5A). Based on these observations, the cotton fiber glycoproteome can be classified into three categories: structural proteins, enzymatic proteins and proteins with other or unknown functions. Furthermore, we observed that the fiber glycoproteome is composed of 5% AGPs and FLAs, which account for 43% of the spectral counts (SpC), 38% CAZymes, which contribute 31% of the SpC and 57% other proteins, which account for 26% of the SpC (Fig. 5A, supplemental Table S4).
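The contrast between protein number and spectral abundance reported above can be made explicit with a short calculation. The sketch below divides each category's share of the spectral counts by its share of the identified proteins; this "representation ratio" is an illustrative quantity introduced here, not a metric used in the study, and the percentages are taken directly from the text.

```python
# (share of identified proteins in %, share of spectral counts in %)
composition = {
    "AGPs and FLAs": (5, 43),
    "CAZymes": (38, 31),
    "Other proteins": (57, 26),
}


def representation_ratio(protein_pct, spc_pct):
    """Share of spectral counts divided by share of identified proteins."""
    return spc_pct / protein_pct


ratios = {
    name: round(representation_ratio(protein_pct, spc_pct), 2)
    for name, (protein_pct, spc_pct) in composition.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

A ratio above 1 indicates a category that contributes more spectra than its protein count alone would suggest, which is the case only for the AGPs and FLAs.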
Structural Proteins Comprising AGPs and FLAs Display Abundance and Heterogeneity-In the present study, we identified five unique members of the AGP family and eight unique members of the FLA family (supplemental Tables S3 and S5). Using 1D-PAGE followed by LC-MALDI analysis, we observed AGP or FLA specific peptides in 15 out of the 18 gel slices excised from different molecular weight regions (Figs. 3B and 3C). The theoretical molecular weights of the identified proteins were ~25 kDa, whereas their observed weights varied from <25 to >130 kDa as determined by 1D-PAGE (Figs. 3B, 3C and supplemental Table S5). The AGP epitope specific β-Yariv staining pattern at regions >70 kDa on the 1D-PAGE gel (Fig. 1F) suggested the presence of dominant post-translational modifications (PTMs) that might contribute to >60% of their observed molecular weight. In addition, 10 distinct spots from various pIs and molecular weights were identified as AGPs and FLAs following 2D-PAGE analysis (Figs. 4A, 4B and supplemental Table S1). Amino acid variations observed in their fasciclin (FAS) domains are highlighted in supplemental Figs. S2 and S3. We also observed a novel, unknown peptide sequence (m/z 1576.61) homologous to the fasciclin region of the FLA family (Fig. 6A). Both the AGPs and FLAs harbored 3 to 4 potential N-linked glycosylation sites within the FAS domain (supplemental Figs. S2, S3), among which we identified 1 to 3 glycosylation sites per unique member (supplemental Table S2 and supplemental Data S2). Independent deglycosylation reactions with PNGase F and A (peptide N-glycoamidase) showed that FLA6 harbored two different types of core N-linked glycans (supplemental Fig. S5A). Our study further revealed that the FLAs were comparatively more abundant than the AGPs in cotton fiber. Among these proteins, FLA1 and FLA3 together constituted 30% of the identified spectral counts (Fig. 3D).
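The >60% estimate for the PTM contribution follows from simple arithmetic on the values quoted above, as the sketch below shows. It assumes that the entire difference between the observed and theoretical molecular weights is attributable to PTMs and that gel migration reflects true mass, which is only approximately the case for heavily glycosylated proteins; the function name is hypothetical.

```python
def ptm_mass_fraction(observed_kda, theoretical_kda):
    """Fraction of the observed mass not explained by the polypeptide backbone."""
    if observed_kda <= 0:
        raise ValueError("observed mass must be positive")
    return (observed_kda - theoretical_kda) / observed_kda


# Theoretical backbone mass of ~25 kDa, with observed bands at 70 and 130 kDa.
for observed in (70, 130):
    fraction = ptm_mass_fraction(observed, 25)
    print(f"{observed} kDa band: {fraction:.0%} of observed mass")
```

Under these assumptions a 70 kDa band corresponds to roughly 64% non-backbone mass and a 130 kDa band to roughly 81%, consistent with the >60% figure for species migrating above 70 kDa.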
The conserved glycosylation site (N147VT) in AGP4 and FLA6 harbored different core N-linked glycans as determined by the deglycosylation reactions. Multiple sequence alignment followed by in silico analysis highlighted that all the identified family members contained conserved domain regions and sequence properties such as >30% PAST and AGP-like modules (supplemental Table S5). The AGP module was longer, in terms of amino acid residues, in the AGPs than in the FLAs (supplemental Figs. S2 and S3). Based on their domain organization and sequence specific properties, we designated them as chimeric AGPs (supplemental Figs. S2 and S3 and supplemental Table S5) (40). Further gene expression analysis revealed stage and isoform specific expression patterns of certain identified members (AGP, FLA11, and FLA15) of this protein family (supplemental Fig. S6).
CAZymes are the Major Players Among the Fiber Localized Enzymatic Glycoproteins-CAZymes accounted for 38% of the fiber glycoproteome and 55% of the identified enzymes (Fig. 5A). Further classification and analysis showed that glycosyl hydrolases (GHs) were relatively more abundant, accounting for 59% of the CAZymes, followed by 10% carbohydrate esterases (CEs) and 5% glycosyl transferases (GTs). Carbohydrate binding modules (CBMs) accounted for 26% of the CAZymes, and the majority of them were catalytically inactive toward carbohydrate substrates (Fig. 4C). These enzymes displayed heterogeneity in molecular weight and pI, including the presence of different isoforms; however, they were relatively less abundant and redundant compared with AGPs and FLAs.
Sixty-three unique proteins were identified as glycosyl hydrolases and were classified into 12 families. Among these proteins, GH3 constituted 25% of the GHs, followed by GH35 (15%), and other families (Fig. 4C(iv)). Pathway mapping and literature surveys suggested that these hydrolases catalyze similar reactions involving a diverse class of glycoconjugates.
In silico based screening of the GHs showed the presence of noncatalytic domains apart from hydrolase specific domain regions (supplemental Table S3). To highlight a few, members of the GH16, GH32 and GH35 families had lectin-like domains, whereas the GH17 family harbored CBM43 and X8 domains. Additionally, 2D-PAGE analysis revealed four different forms of xylosidase and three different forms of glucosidase enzymes (GH3) (Table II, Figs. 4A and 4B). Genes encoding selected members of these enzyme families were predominantly expressed during the elongation phase (Fig. 5C and supplemental Fig. S6). Three unique members of xyloglucan active enzymes of the GH16 family (gi 155966597) were identified, and their genes were found to be consistently expressed throughout the elongation and postelongation phases (5-25 dpa) (Fig. 5C and supplemental Fig. S6). Approximately 10 unique members of the glucan endo-1,3-beta-D-glucosidase (E.C. 3.2.1.39) and cellulase (beta 1-3 glucanase, E.C. 3.2.1.4) of the GH17 family were identified; three of these proteins had predicted GPI anchor sites, whereas one had a transmembrane domain (supplemental Table S3). The GH18 family included the phosphoinositide-specific phospholipase C (PI-PLC) protein family that targets phospholipids and GPI anchored proteins (GAPs) like AGPs. Invertase and fructokinase members of the GH32 family were observed in the >70 kDa, 35 kDa, and <20 kDa regions. Ten unique members of invertases were identified; only three among them had secretory signals, and these proteins could be classified as extracellular, cytosolic or vacuolar forms (supplemental Table S3). The vacuolar forms were relatively more abundant than their extracellular counterparts, and they also exhibited variants by 2D-PAGE analysis (Table II and supplemental Table S1). We also observed novel/unknown peptide regions homologous to the invertase (m/z 1614.71, Fig. 6C) and fructokinase protein family (m/z 1617.85, supplemental Fig. S4A). The GH31, GH35 and GH38 families included the N-linked glycan processing enzymes, and among them galactosidases of GH35 were more numerous and more abundant. The beta-glucuronidase and heparanase-like proteins (EC: 3.2.1.31) of the GH79 family consist of proteins involved in glucuronate interconversion.
Identified members of carbohydrate esterases (CEs) were further classified into the CE3, CE6, and CE8 classes, with the latter being relatively abundant (Fig. 4C(i), supplemental Table S3). CE3 comprised phospholipase C domain-containing proteins and three isoforms of MAP3K-like protein kinases. CE6 included polygalacturonase inhibitor proteins that were found to contain two different core N-glycans (supplemental Fig. S5D), and the CE8 class included pectinesterase and pectin methyl esterases (PME) (E.C. 3.1.1.11). Genes encoding the different forms of PMEs showed differential stage specific expression patterns; for example, PME4 was predominantly expressed during early elongation (5-10 dpa), whereas PME5 was highly expressed during the late elongation stages (15-20 dpa) (supplemental Fig. S6).
Glycosyl transferases comprising GT1, GT2, GT4, GT34, and GT55 form the third major class of the identified CAZymes (Fig. 4C(ii), supplemental Table S3). In addition, the identified members of endo-xyloglucan transferase of the GH16 family and the fructosyl transferase of the GH32 family are also known to perform glycosyl transferase-like functions in plants (41,42).
Carbohydrate binding modules form an associated class of enzymes that comprises 27 unique proteins classified into eight families (Fig. 4C(iii), supplemental Table S3). Among them, CBM43 observed within GH17 members was classified as catalytically active on carbohydrate substrates. The CBM18 family comprised oxidoreductases and oxidases and constituted >40% of the identified CBMs. In addition, glucose-methanol-choline (GMC) oxidoreductase of the CBM1 family, protease of the CBM5 family, heat shock proteins (HSPs) of the CBM13 family, purple acid phosphatase (PAP) of the CBM32 family, domain of unknown function (DUF) 1680 of the CBM35 family and lectin-like domain containing protein kinases of the CBM57 family were also identified, contributing to the fiber localized enzymes that are not catalytically active on carbohydrates. Among them, monocopper oxidase of the CBM18 family was found to contain different N-linked glycans (supplemental Fig. S5B), and PAP of CBM32 showed isoform variants (Table II).
Non-CAZymes Play Regulatory Roles in Fiber Cell Elongation-Non-CAZymes included non-CBM oxidoreductases, proteases, and proteins with interacting domains. Oxidoreductases, which included 40 unique proteins, accounted for 14.7% of the fiber glycoproteome. Among these proteins, 12 proteins were earlier classified under CBM18, and the remaining 28 proteins can be grouped into the reductase, disulfide isomerase, peroxidase, and copper binding protein families (Fig. 5A). Each of these protein groups has associated domains, such as NAD/FAD binding domains in reductases, thioredoxin-like fold/domains in disulfide isomerases and cupredoxin domains in copper binding oxidase-like proteins (supplemental Table S3). The peroxidases identified in the current study were classified as class III (secretory class) using in silico analysis (37). Class III peroxidases are known to be involved in wall loosening of cells that undergo growth through elongation rather than division. Proteases, proteasomes and protease inhibitors accounted for ~6% of the glycoproteome (Fig. 5A). The protease family included aspartic and serine carboxypeptidases, whereas the proteasomes mainly comprised B-type subunit family members, with one unique member containing an armadillo-like fold belonging to the extracellular matrix 29 (ECM29) family. Approximately 7% of the fiber glycoproteome, constituting 18 unique proteins, had interacting or binding domain(s). These proteins included lectin-like domain containing calreticulin, EF hand domain containing calmodulin, and cupin domain containing germins. Four different forms of calreticulin were identified in the current study (supplemental Table S1). In addition, we also observed proteins involved in nucleic acid, carbohydrate, and lipid metabolism. Approximately 2% of the identified proteins were classified as unknown and they contained DUF-like domains. In plants, certain DUF-like domains are predicted to have cell wall binding (DUF642) and glycosyl transferase-like roles (43). Approximately 3% of the identified proteins were predicted to have no known functional domains (Fig. 5A).
Comparative Analysis of Plant Glycoproteomes Highlights the Unique Compositional Status of Cotton Fiber-The overall composition of the fiber glycoproteome was found to be similar to other known plant glycoproteome data sets (Solanum, Arabidopsis, and Brassica sp.) (supplemental Table S6) (26,44,45,46). However, major compositional variations among the protein families were observed (Fig. 7). To highlight a few, the percentage of AGPs and FLAs in the fiber glycoproteome was comparable only with Brassica oleracea xylem sap, whereas AGPs and FLAs were relatively underrepresented (<3 proteins) in the other data sets (Fig. 7, supplemental Table S6). CBM containing enzymes that are catalytically inactive toward carbohydrate substrates were absent from the other data sets, suggesting that this is a unique feature of the cotton fiber glycoproteome. Polygalacturonases (pectinases) of the GH28 family, oxidoreductases, proteases, and proteins with interaction and binding domains were relatively low (<10%) in the fiber glycoproteome. Additionally, the protein inhibitor class included only protease inhibitors, whereas CAZyme inhibitors were relatively low in fiber (<1%) (Fig. 7, supplemental Tables S6 and S7).

DISCUSSION
The development of cotton seed epidermal cells into long fibers has been widely studied, and various factors, such as structural proteins, CAZymes, and transcription factors, have been examined for the individual roles that they play in the differentiation process. However, systems level identification and characterization of the various protein components is still lacking, and this type of analysis will help researchers better understand their roles in development. Wall yielding properties largely regulate the differentiation events that determine the fiber's length and strength. Wall-destined proteins are known to be glycosylated in plants (23), and they might in turn contribute to such yielding properties. Therefore, glycoproteome approaches can be employed to study the wall-destined proteins in plants (23). Identification and analysis of the cotton fiber glycoproteome revealed that the majority of the proteins were either destined for the cell wall or were extracellular in nature. Further, our results suggested that the structural proteins were relatively abundant compared with the CAZymes and other enzymes (Fig. 5A). In this study, by employing complementary proteomic approaches, we have been able to resolve the heterogeneity of the fiber glycoproteome. Briefly, a 1D-PAGE based approach revealed the presence of unique and identical peptides corresponding to the same protein family in different molecular weight ranges, whereas 2D-PAGE analysis showed protein isoforms with minor amino acid variations. In addition, independent deglycosylation reactions using PNGase F and A showed overlapping and unique peptides, suggesting differences in the core N-linked glycans attached to these protein molecules. Together, these approaches suggest the presence of both protein isoforms and glycoforms in cotton fiber. The presence of different forms of the same protein highlights the cellular requirement for proteins to perform similar functions at different developmental phases, in different localizations, and under different physiological conditions (Figs. 8A, 8B, 8C and 8D).
The abundance and distribution of the identified glycoproteins provided clues about the cellular makeup of the cotton fiber (47). The GHs were relatively more abundant and diverse compared with other CAZymes. Further analysis revealed that most of the GHs and non-CAZymes harbored noncatalytic carbohydrate binding modules (CBMs) or interacting domains (lectin) of broad specificity. Comparative analysis with previously reported plant glycoproteomes (26,44,45,46) revealed that enzymes (CAZymes and non-CAZymes) containing noncatalytic carbohydrate binding modules were observed only in cotton fiber, which further highlights this unique feature of the fiber localized glycosylated enzymes. In a cell such as a cotton fiber, which is rich in structural and nonstructural carbohydrates, the presence of CBMs and carbohydrate interacting domains in enzymes would be advantageous, as it might modulate enzyme activity by concentrating or stabilizing these enzymes in close proximity to their substrates (48). On the other hand, we also observed CAZymes and other enzymes devoid of such interacting or binding domains. In addition to these features, these enzymes also had isoform variants. Such diverse and discriminating features among similar enzymes reflect the cellular requirement for substrate hydrolysis, remodeling, and grafting in elongating cotton fibers (Fig. 8D).
Structural proteins have been proposed to have functional properties that might contribute to the dynamic status of the cell wall (49). In mammals, proteoglycans are known to be abundant molecules in the extracellular matrix, and they act as biological lubricants and stabilizers of cellular integrity (50). However, data concerning the abundance, heterogeneity and associated functional roles of such molecules in plant systems are still emerging. In order to withstand the diverse processes occurring during elongation, the fiber cell wall needs to contain responsive structural molecules. Plant AGPs and FLAs are reported to play major and diverse roles in cell development (51); however, these proteins were less represented in the non-fiber glycoproteome data sets (Fig. 7, supplemental Table S6). In the current study, we observed redundant and abundant members of the AGP and FLA families, suggesting that they might play major roles as structural molecules in cotton fiber cells. Heterogeneity in their distribution across various molecular weights and pIs was observed, and this may correspond to variations in amino acid sequences and PTMs. Abundance could be the major determinant of the functional parameter for such molecules. The glycan component of these proteins might play a major role during extension by acting as molecular cushions. Cell wall localized AGP signals could also stimulate enzymes such as XETs, as demonstrated in in vitro experiments (20). Our earlier studies showed that transcripts encoding these arabinogalactans and CAZymes were largely down-regulated in a lintless mutant (52). Additionally, these protein encoding transcripts were highly down-regulated during conditions of drought stress (53), suggesting that they play major roles in fiber development and related conditions.
In conclusion, we have characterized the cotton fiber glycoproteome and have revealed cell wall destined structural and enzymatic proteins. Our comprehensive analysis identified the presence, abundance and heterogeneity of fiber localized glycoprotein families such as AGPs, CAZymes and other glycoproteins. Such structural proteins and enzyme isoforms might play non-redundant roles throughout fiber development (Figs. 8C and 8D). The diverse and heterogeneous features of the identified glycoproteins reflect the tetraploid nature of G. hirsutum contributed by the A and D parental genomes. GO based functional annotation showed that the fiber glycoproteome followed a distribution pattern similar to other plant glycoproteomes, but there were compositional variations unique to cotton fiber. The identification of particular protein families and their abundance shown in this study reflects the major determinants of the structural and functional parameters governing the wall yielding properties of the tetraploid cotton fiber.
Acknowledgments-We wish to thank Ranjana Pathak and Israr Ahmad for their assistance during documentation.