Glycopeptide Analysis, Recent Developments and Applications

Glycopeptide-based analysis is used to inform researchers about the glycans on one or more proteins. The method's key attractive feature is its ability to link glycosylation information to exact locations (glycosylation sites) on proteins. Numerous applications for glycopeptide analysis are known, and several examples are described herein. The techniques used to characterize glycopeptides are still emerging, and recently, research focused on facilitating aspects of glycopeptide analysis has advanced significantly in the areas of sample preparation, MS fragmentation, and automation of data analysis. These recent developments, described herein, provide the foundation for the growth of glycopeptide analysis as a blossoming field.

Because erythropoietin (EPO) is classified as a banned substance, drug testers need to be able to distinguish between recombinant and endogenous forms, and the glycosylation on EPO provides a useful means of distinguishing the compounds. In other cases, recombinant proteins are expressed for human and animal consumption for legitimate reasons, such as to aid in fertility, as is the case with follicle-stimulating hormone (FSH) (6). In these instances, characterizing glycosylation of recombinantly expressed proteins is one important step in assessing the overall drug product quality. Finally, glycopeptide analysis can even aid in the development of new products, such as an HIV vaccine (7-10). Glycopeptide analysis has been used to extensively compare the properties of numerous HIV envelope proteins under investigation for their potential to elicit a strong immune response against the HIV-1 virus. The following section highlights, in more detail, the examples mentioned above. These highlights are by no means an exhaustive list of important problems that are being addressed with glycopeptide analysis. Rather, the examples give the reader a sense of the scope of the types of samples that are currently being studied using this technique and the types of problems that can be addressed.
Characterization of Endogenous Glycoproteins-One interesting example of the need for characterization of biologically relevant, isolated glycoproteins is the analysis of several glycopeptides from the venom of Dendroaspis angusticeps (1). These highly active polypeptide compounds were found to be both glycosylated and rich in proline; both features are unusual in snake venom compounds. Because the analytes of interest were generally under 4 kDa, no enzymatic digestion was required prior to analysis, and a top-down approach was used for assigning the protein sequence, the glycosylation site occupancy, and the glycans themselves (1). The species contained small O-linked glycans. In this research, a variety of MS techniques were used, including MALDI and ESI for ionization and CID and ETD for fragmentation (1). This study represents the first step necessary for understanding the structure/function connection for this particular snake's venom, but it also serves as an illustrative example of the need for classifying glycosylation, as a first step, for achieving a better understanding of a glycoprotein of interest.
Often, classifying the glycosylation of a protein in one species is a very useful start to understanding that protein's characteristics. However, if the protein of interest is present in multiple species, which is almost always the case, comparing the glycosylation on the same protein from different species can provide important insights into the changes in protein properties that are observed in different animals. For example, the glycosylation on the protein FSH was characterized from human and horse isolates, and more glycoforms were unique to one species or the other than were common to both (4). The protein, with four different glycosylation sites, was characterized using a proteinase K digestion and analyzed using high resolution ESI-MS and CID data, and ~30 glycoforms were detected for each species (4). The key data are shown in Fig. 1.
Because glycosylation is dependent on the local environment of the cell, this modification can even change when samples are obtained from the same species but from two different biological conditions (2,3). About 10 years ago, the glycosylation state of haptoglobin was shown to be different between healthy cells and cancer cells (11), and this finding, along with other notable works showing that the glycosylation in certain proteins from individuals with cancer varies, compared with healthy controls (12,13), opens up the possibility for using the glycosylation state both as a marker for disease state and also as a potential window into understanding the biology of the disease state itself. Although such studies are tantalizing, the analytical techniques to support the work must be in place and capable of processing clinical samples. Both the ability to detect the glycoforms from the protein of interest, from a highly complex sample, and the ability to quantify those resulting glycoforms are current challenges that are being tackled by emerging methods (2,3).
Characterizing Recombinant Glycoproteins-In addition to the need to characterize endogenous analytes, equally important is the analysis of glycosylation from recombinantly expressed proteins. An interesting example highlighting this need is provided in Ref. 5, where the goal was to understand the glycosylation profile of recombinantly expressed EPO, a banned substance for professional athletes, so that this protein could be detected and differentiated from endogenous EPO. In this case, the recombinant protein was treated with trypsin and Glu-C, and the glycopeptide products were monitored by CE-MS using a TOF detector (5). Both O-linked and N-linked glycopeptides were detected, and the authors found several glycopeptides containing Neu5Gc in the recombinant form of the protein, which is significant because this glycoform is not native to humans and could identify the sample as being recombinantly expressed. This study is a more recent example of the successful glycopeptide analysis of recombinant EPO, which has also been accomplished previously (for examples, see Refs. 14-16). Although glycopeptide analysis in the drug testing arena is a niche application, this analysis is much more widely practiced for the characterization of legally prescribed therapeutic glycoproteins. One report describes the glycopeptide analysis of recombinant FSH, a glycoprotein used to treat infertility (6). Characterizing the glycosylation on therapeutic glycoproteins is one important way that manufacturers can monitor the batch-to-batch consistency of their manufacturing process, and these analyses can also help to distinguish biosimilars, such as generic forms of therapeutic proteins. To obtain coverage of all four glycosylation sites on FSH, the investigators found that a chymotrypsin digestion was more useful than simply digesting with trypsin (6). Data were acquired on a qTOF mass spectrometer, and both high resolution MS and CID data were used to confirm the glycopeptide compositions (6).
In addition to characterizing the glycosylation of glycoprotein pharmaceuticals that are already on the market or in late stages of development, another important aspect of characterizing recombinant glycoproteins is supporting drug discovery. For the last decade, glycopeptide analysis has served to aid HIV researchers who are interested in developing the recombinant protein gp120 into an effective vaccine for HIV-1 (7-10). Although this protein is extensively glycosylated, with 20-30 N-linked glycosylation sites, depending on the exact sequence of the protein, glycopeptide analysis of this protein has been useful in developing a three-dimensional model of the full protein, including the glycoforms, which account for about half of the protein's mass (7). These analyses have also shown that the glycosylation on different HIV-1 variants changes (8-10). Furthermore, the glycoforms derived from recombinant gp120 proteins known to be on transmitted viruses are different from the glycoforms that are typically found on the recombinantly expressed version of the analogous protein found on the virus from chronically infected patients (10). Researchers have shown that this particular protein's glycosylation is also highly diverse, with over 300 glycoforms typically characterized per recombinantly expressed protein (8-10). These studies were all done using a tryptic digestion and HPLC-MS and MS/MS on high resolution instruments, and in some instances the work was supported with off-line HPLC followed by MALDI-TOF/TOF analyses (7-10).
In summary, the types of applications warranting glycopeptide analysis are quite diverse, and the methods used to solve each problem at hand also vary. Frequently, a digestion strategy must be designed that takes into account the nature of the specific protein to be characterized. In the examples above, no digestion was necessary in one case (1), and in another case, chymotrypsin was shown to be the enzyme of choice (6). Proteinase K was also chosen in one of the cited references (4). In most cases, trypsin was used as the sole enzyme; however, sometimes pairing trypsin with a second enzyme, such as Glu-C in the case of the EPO analysis, provides significant advantages over trypsin alone (5).
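The effect of enzyme choice on the resulting peptides can be explored in silico before any sample is consumed. The following is a minimal Python sketch, not a method taken from the cited studies. The cleavage rules are simplified textbook approximations (no missed cleavages, no buffer-dependent specificity), and the function name `digest` and the example sequence are hypothetical.

```python
import re

# Simplified cleavage rules (assumptions for illustration only).
# Each pattern is a zero-width match at the position where the enzyme cuts.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",        # C-terminal to K or R, not before P
    "glu-c": r"(?<=E)",                  # C-terminal to E
    "chymotrypsin": r"(?<=[FWY])(?!P)",  # C-terminal to F, W, or Y, not before P
}


def digest(sequence, enzymes):
    """Return the peptides produced by cutting with every listed enzyme."""
    peptides = [sequence]
    for enzyme in enzymes:
        pattern = CLEAVAGE_RULES[enzyme]
        next_round = []
        for peptide in peptides:
            # Splitting on an empty match requires Python 3.7 or later.
            next_round.extend(p for p in re.split(pattern, peptide) if p)
        peptides = next_round
    return peptides


if __name__ == "__main__":
    seq = "MKNGTAEELNRSPKDEWNATR"  # hypothetical sequence, not a real protein
    print(digest(seq, ["trypsin"]))
    print(digest(seq, ["trypsin", "glu-c"]))
```

Comparing the two printed peptide lists shows how adding a second enzyme shortens long tryptic peptides, which is one reason a paired digestion may be considered when a glycosylation site falls in a peptide that is too large to analyze conveniently.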
Although the analysis of recombinant glycoproteins typically does not require the use of an up-front protein purification method, this step is essential when characterizing glycoproteins from endogenous sources. Again, this step typically depends on the exact protein to be isolated, but enhancing the detection of glycopeptides in general, along with glycopeptides from a specific protein of interest, is a developing area of research that is discussed further below.
Another interesting observation can be made in considering the above-described applications. No consensus exists in the field as to the best MS method to characterize glycoproteins. In the examples above, both MALDI and ESI data were used, along with several different types of mass spectrometers, including TOF/TOF, qTOF, oaTOF, ion traps, FT-ICR, and triple quadrupoles. Additionally, both ETD and CID data were employed, and in at least one case, no MS/MS data were used to support the assignments (5).
When a diversity of MS methods is employed, a diversity of data analysis strategies naturally follows. In many of the examples above, no particular strategy for analyzing the data, other than manually matching the candidate mass to the mass of its assigned glycopeptide, was explicitly described. In other cases, CID data were acquired and manually interpreted to support the high resolution assignments. In still other cases, some tools designed to support glycopeptide analysis were cited as being important aids. Clearly, data interpretation aspects of glycopeptide analysis are an important area that must be further developed, if glycopeptide analysis is to become routinely and more widely implemented.
One final important observation should be acknowledged regarding all the glycopeptide analysis applications described above. In each case, the analysis method provides composition information about the glycans but not full structural information, i.e. sequence, branching, linkage, and anomeric assignments. Often, investigators using glycopeptide data obtain some degree of glycan connectivity information, particularly when CID data are employed, but linkage and anomeric information is absent in typical MS/MS data for glycopeptides. It is common practice in the field to depict "best guess" assignments for the glycan structures, such as those shown in Fig. 1, using a combination of the fragmentation data and "biological precedence" or a basic understanding of the types of structures that are typically formed, based on the glycosylation processing enzymes available. If complete structural data for the glycans are required, glycopeptide analysis can be paired with glycan analysis. For more information about obtaining structural information for glycans, readers may explore Refs. 17, 18.
The remaining portion of this review focuses on the emerging work in methods development supporting the field of glycopeptide analysis. Specifically, what sample enrichment and chromatographic methods are emerging? What new tools from the MS community could aid in data acquisition? What is the state of software development for supporting the MS analysis of glycopeptide data?

Emerging Methods Supporting Glycopeptide Analysis
Sample Enrichment and Separation-Sample enrichment of glycoproteins and glycopeptides is a key step prior to acquiring useful data for glycopeptide analysis. Except for recombinantly expressed proteins, where the glycoprotein of interest is already available in high concentration and high purity, affinity enrichment is necessary for a successful analysis. Various methods of affinity enrichment have recently been reviewed elsewhere (19). Briefly, common strategies involve using an antibody that recognizes the protein of interest, when just one glycoprotein is to be profiled (20,21), or using lectins to select for a group of proteins that are glycosylated (22,23). One attractive recent example of proteome-wide glycoprotein enrichment involves using a series of lectins to capture proteins with specific types of glycosylation (24). For example, Sambucus nigra agglutinin (SNA) was used to capture sialylated structures, whereas Aleuria aurantia agglutinin (AAL) was used to capture fucosylated structures (24). This strategy can also be used in conjunction with a Multiple Affinity Removal System (MARS) column to deplete abundant (nonglycosylated) proteins, further enriching the remaining sample in glycoproteins (24).
In contrast to enrichment at the protein level, which is typically only required when the sample is from endogenous sources, enrichment and/or separation of glycosylated species at the glycopeptide level is useful for all types of glycopeptide analyses, including the characterization of recombinant glycoproteins. One useful strategy for enhancing the MS signal of glycopeptides is to first remove any peptides that may co-elute (25-28). This strategy has been utilized by multiple groups, taking advantage of a simple enrichment approach using Sepharose beads, as first demonstrated by Wada et al. (25). More recently, an alternative separation strategy using magnetic beads coated with a hydrophilic interaction liquid chromatography (HILIC) stationary phase has also been described (29). Although the HILIC approach appears to be quite advantageous, the beads are not currently commercially available; however, the preparation does not seem to require extensive expertise (29).
In addition to (or instead of) the off-line separation methods listed above, on-line (LC-MS) separation and enrichment are almost always used for glycopeptide analysis. Although HPLC-MS separation of glycopeptides with reverse phase (C18) columns is the most widely adopted pre-MS separation method, HILIC columns are also becoming more commonly used. Reverse phase columns typically separate the glycopeptides based on the peptide portion, so most of the glycoforms containing the same peptide sequence co-elute. This co-elution can be an advantage, in terms of data analysis, because all the glycoforms of a given glycopeptide are easier to find in the data set. However, co-elution is also disadvantageous in that abundant glycoforms can suppress the signals of less abundant species. HILIC columns provide an alternative separation strategy in that the glycoforms interact with the hydrophilic column; thus, the glycoforms for a given peptide are more readily separated. Advances in the use of HILIC columns for glycopeptide analysis have been reviewed separately (30).
For the separation of glycopeptide isomers, a graphitized carbon column is another useful strategy that could be employed (31). Lebrilla and co-workers (31) used this stationary phase in a microchip for nanoflow separations followed by MS and MS/MS analysis. This approach was useful for detecting 13 different glycoforms for RNase B, a glycoprotein that typically shows five different compositions. (Others have also shown that porous graphitized carbon (PGC) columns provide a similar separation performance to reverse phase columns (32).) Separation at the nanoscale has significant advantages when sample quantities are limited, such as when endogenous glycoproteins are to be studied. A promising emerging method for these types of separations uses porous layer open tubular (PLOT) columns (2). Recent work has shown extensive glycosylation coverage on a biologically isolated protein from lung cancer, where a total of 13 ng of protein was used for 10 LC-MS runs (2). The implementation of these types of columns should allow the field of glycoproteomics to answer more challenging biological questions when only limited samples are available. Fig. 2 includes data from a PLOT column.
A final option for separating glycopeptides is to utilize ion mobility. In Ref. 33, high field asymmetric waveform ion mobility spectrometry was used to separate glycoforms that had the same peptide sequence but different O-linked glycosylation sites and different glycans at those sites. This example of glycopeptide separation is interesting because the species that could be separated by ion mobility co-eluted under typical (reverse phase) chromatographic conditions. Therefore, this finding suggests that in some cases ion mobility could be used as an orthogonal separation strategy, when liquid phase chromatographic separation is insufficient.
Mass Spectrometric Fragmentation Methods-Although choosing a separation strategy that maximizes the MS detection of glycopeptides is a key step in the success of glycopeptide analysis, detecting the m/z values of the glycoforms is typically not sufficient for high confidence analysis (34). Fragmentation data on the glycoforms should be used to support the assignments. Historically, CID has been the dissociation method of choice for glycopeptides. Yet this approach has its disadvantages in that little to no fragmentation across the peptide is typically observed, and depending on the glycans present, one may (or may not) obtain sufficient fragmentation along the glycan backbone. Therefore, an active area of research in glycopeptide analysis is exploring the utility and complementarity of alternative fragmentation methods.
Aside from CID, the second most widely utilized fragmentation for glycopeptide analysis is ETD. For example, the snake venom glycopeptides described above were inferred based in part on their ETD fragmentation data (1); the haptoglobin analysis also relied on ETD (2). This method is useful because extensive peptide backbone cleavage is readily observed, and this fragmentation information is very complementary to the data obtained during CID. Fig. 3 shows an example of a glycopeptide fragmented both by CID and ETD, to further illustrate the differences in the dissociation methods.
Although the implementation of ETD is not new to the field of glycopeptide analysis, particularly for N-linked glycopeptides, recent work by Packer and co-workers (32) has shown that this technique is highly suitable for characterizing the very heavily O-linked protein MUC-1. This protein can frequently possess five O-linked sites in a span of less than 20 amino acids (32), and characterization of analytes this complex is surely at the very edge of what is possible with today's technology. ETD fragmentation was highly useful in discriminating the various O-linked glycoforms on multiple glycosylated peptides.
Another option for obtaining complementary fragmentation information for glycopeptides is exploring CID regimes that input more energy into the ions. Examples of higher energy CID data would include the fragmentation typically observed in a MALDI-postsource decay experiment or in implementing "HCD" on the orbitrap. Both of these dissociation methods are also being utilized (35-38), although their fragmentation is less orthogonal to low energy CID than is ETD. A good recent example of implementing postsource decay data into a glycoproteomics workflow is provided in the analysis of the O-linked glycopeptides on MUC-1 (35). The utility of HCD for glycopeptide analysis was shown by Segu and Mechref (36) and Cordwell and co-workers (37); other groups continue to report the utility of this approach (38).
Although not yet applied to glycopeptides, ultraviolet photodissociation (UVPD) has shown great promise on glycans in that significantly more fragmentation across the glycan is present than is observed with CID (39). This additional fragmentation can be used to provide detailed structural information about the glycan, including the linkages of the monosaccharides. As mentioned above, obtaining structural information on the glycan portion of glycopeptides is not possible with currently implemented fragmentation methods. To obtain structural information, researchers must pair their glycopeptide analysis experiments with glycan analysis experiments. Even when these two sets of experiments are done simultaneously, a full view of the glycans is not present because glycan analysis loses the information regarding the site(s) of attachment to the protein.
The ability to obtain structural information on glycopeptides would represent a significant advance in the field of glycoprotein analysis. Fig. 4 shows a comparison of fragment ions obtained from CID versus UVPD for an N-linked glycan; although not yet applied to glycopeptides, this technique is the field's current best hope for obtaining structural information at the glycopeptide level.
Data Analysis-Although separating and detecting glycopeptides are important, the final hurdle that glycoproteomics practitioners must clear is assigning the MS data to glycopeptide compositions in an accurate and expedient fashion. Although analyses for simple glycoproteins, with a limited number of glycosylation sites, can often be done manually by expert researchers, automation of the analysis workflow offers several advantages. Three critical factors are as follows: 1) time savings; 2) higher confidence that the assignments are correct, because software can offer the option to calculate a false discovery rate; and 3) a lower level of expertise is necessary to interpret the results using automation versus manual analysis. Although the benefits of automation are clear, what is less clear is which automation strategy will ultimately become the approach of choice for end users. Currently, several different platforms are under development, each with its own advantages and quirks.
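To illustrate how software might report a false discovery rate, the sketch below uses a target-decoy estimate, a strategy common in proteomics. This is an assumption made for illustration; it is not necessarily how any of the tools discussed in this review compute their error rates. The function name `estimate_fdr` and the scores are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false discovery rate at a score threshold.

    Assumes a target-decoy strategy: spectra are scored against real
    (target) and deliberately incorrect (decoy) glycopeptide candidates,
    and the number of decoy hits passing the threshold approximates the
    number of false target hits.
    """
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    if targets == 0:
        return 0.0  # nothing accepted, so there is nothing to be wrong about
    return min(1.0, decoys / targets)


if __name__ == "__main__":
    # Hypothetical scores on an arbitrary 0-1 scale.
    target_scores = [0.9, 0.8, 0.7, 0.4]
    decoy_scores = [0.75, 0.3, 0.2, 0.1]
    for threshold in (0.7, 0.8):
        print(threshold, estimate_fdr(target_scores, decoy_scores, threshold))
```

Raising the threshold trades fewer accepted assignments for a lower estimated error rate, which is the kind of adjustment that is difficult to make consistently in a purely manual analysis.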
The publicly available analysis tools that currently appear most promising can be delineated into three categories as follows: 1) those that facilitate the assignment of candidate compositions from (high resolution) MS data; 2) those that score potential assignments against some type of MS/MS data, typically CID; and 3) those that use both MS and MS/MS data for inputs. Generally, both MS and MS/MS data are required to provide a high confidence assignment for any glycopeptide composition (34). Unfortunately, only a few tools are currently available to the public that make use of both types of data in one package.
For obtaining potential candidate compositions, GlycoMod (http://web.expasy.org/glycomod/) is the most highly cited tool (40). It has been in existence for over a decade and has extensive functionality for matching glycopeptide compositions to MS data, including the ability to select multiple enzymes for in silico digestion and providing both "known" glycopeptide compositions along with those that are mathematically possible, even if they have not been observed in nature yet (40). Other publicly available tools designed specifically for matching the masses of glycopeptides to the MS data include GlycoPep DB (http://hexose.chem.ku.edu/glycop.htm) (41) and GlycoSpectrumScan (http://www.glycospectrumscan.org) (42). These tools have unique advantages in that they both can handle multiply charged MS data, whereas GlycoMod only processes singly charged MS data. Additionally, GlycoPep DB uses a curated database of biologically relevant glycans, which reduces the number of irrelevant potential matches to be considered.
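The mass-matching step performed by such tools can be sketched in a few lines. The Python example below is a simplified illustration under stated assumptions, not the algorithm used by GlycoMod, GlycoPep DB, or GlycoSpectrumScan: it enumerates every mathematically possible composition within user-supplied limits, uses standard monoisotopic residue masses, and considers only protonated ions. The function names, the peptide mass, and the composition limits are hypothetical.

```python
from itertools import product

# Monoisotopic masses (Da) of monosaccharide residues, i.e. after loss of water.
RESIDUE_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "Fuc": 146.057909,
    "NeuAc": 291.095417,
}
PROTON = 1.007276  # mass of a proton (Da)


def glycopeptide_mz(peptide_mass, composition, charge):
    """m/z of a protonated glycopeptide from its neutral peptide mass."""
    glycan = sum(RESIDUE_MASS[name] * n for name, n in composition.items())
    return (peptide_mass + glycan + charge * PROTON) / charge


def candidate_compositions(observed_mz, peptide_mass, max_counts,
                           charges=(1, 2, 3, 4), tol_ppm=10.0):
    """List (composition, charge, ppm error) tuples matching observed_mz."""
    names = list(max_counts)
    hits = []
    # Enumerate every count combination from 0 up to the stated maximum.
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        composition = dict(zip(names, counts))
        for charge in charges:
            mz = glycopeptide_mz(peptide_mass, composition, charge)
            ppm = (observed_mz - mz) / mz * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((composition, charge, ppm))
    return hits


if __name__ == "__main__":
    limits = {"Hex": 9, "HexNAc": 6, "Fuc": 2, "NeuAc": 4}
    # 1000.5 Da is a hypothetical neutral peptide mass.
    for hit in candidate_compositions(1109.4687, 1000.5, limits):
        print(hit)
```

Because the enumeration is unrestricted, several compositions can fall within the mass tolerance of one peak. This is the motivation for restricting the search to a curated set of biologically relevant glycans and for confirming candidates with MS/MS data.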
After obtaining candidate compositions from high resolution MS data, researchers use a different set of tools to interpret the MS/MS data. The two most powerful tools that utilize MS/MS data to identify glycopeptide compositions are GlycoMiner (http://www.chemres.hu/ms/glycominer/tutorial.html) (43) and GlycoPep Grader (http://glycopro.chem.ku.edu/GPGHome.php) (44). GlycoMiner was developed based on glycopeptide fragmentation as observed in a qTOF mass spectrometer, and GlycoPep Grader was optimized for MS/MS data from an ion trap. Therefore, the two tools are complementary in that they are aimed at different audiences. Both have been shown to be highly effective in assigning glycopeptides, and GlycoPep Grader is particularly advantageous in that its false discovery rate was shown to be very low.
An alternative to using separate software to interpret the MS and MS/MS data is to use a workflow where both data types are input and used. One highly promising and publicly available tool for glycopeptide analysis that functions in this manner is GlypID (45,46), which is still in developmental stages. One valuable feature of GlypID is that it can make use of CID data and HCD data. It is available online at: http://mendel.informatics.indiana.edu/~chuyu/glypID/index.html. Another semi-automated strategy for interpreting both MS and MS/MS data is to input glycan types as variable modifications into an existing proteomics software package, such as Sequest (Thermo Scientific, San Jose, CA). This approach is described in Ref. 47. Although using robust commercial software that accepts both MS and MS/MS data is highly attractive, this approach suffers from the need to input all possible glycoforms as variable modifications. Therefore, this approach becomes impractical if a large number of glycan types are to be searched.
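The scaling problem with variable modifications can be made concrete with a short calculation. The sketch below assumes, for illustration only, that each glycosylation site on a peptide is either unoccupied or carries one of the glycan types entered as a variable modification; the function name and the numbers are hypothetical and do not describe the behavior of any particular search engine.

```python
def modified_forms(num_glycan_types, num_sites):
    """Count the forms of one peptide the search must consider.

    Each site is either unoccupied or carries one of the glycan types,
    so there are (types + 1) options per site.
    """
    return (num_glycan_types + 1) ** num_sites


if __name__ == "__main__":
    for glycans in (10, 100):
        for sites in (1, 2):
            print(glycans, sites, modified_forms(glycans, sites))
```

Under this assumption, moving from 10 to 100 glycan types on a peptide with two sites increases the number of forms from 121 to 10,201, which illustrates why the search quickly becomes impractical.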
In summary, the field of data analysis software to support glycopeptide assignments is at least a decade behind proteomics software development. Clearly more tools are needed, particularly as new methods of fragmenting glycopeptides become more mainstream. Once robust analysis tools are developed and broadly implemented, data analysis would no longer be the chief bottleneck in glycoproteomics workflows, and a much larger pool of users could engage in these important experiments.
Concluding Remarks-Glycopeptide-based analysis by mass spectrometry is a field in flux, with researchers currently developing critically needed solutions to problems associated with sample preparation, MS analysis, and data interpretation. Continued development of tools in these areas is needed to achieve the following goals: effective separation and ionization of all glycopeptides of interest, even among complex samples; effective fragmentation so that optimal information about the glycan and peptide components can be inferred; and effective software to interpret the data in an automated fashion. Although significant progress has been made in each of these three areas in just the last 2 years, more effort is needed to reduce to common practice many of these emerging methods and tools. Researchers have already shown that glycopeptide analysis can serve diverse and extensive needs, and the number of applications for glycopeptide analysis is expected to expand as the technology to support the work becomes more refined. There is no doubt that glycopeptide analysis will continue to blossom into an essential component of basic biological and biomedical research.