Application of Microarrays for Deciphering the Structure and Function of the Human Glycome

Glycan structures were defined historically using multiple methods to determine composition, sequence, linkage, and anomericity of component monosaccharides. Such approaches have been replaced by more sensitive MS methods to profile or predict glycan structures, but these methods are limited in their ability to completely define glycan structures. Glycan-binding proteins, including lectins and antibodies, have been found to have exquisite binding specificities that can provide information about glycan structures. Here, we show glycan-binding proteins can be used along with MS to help define glycan linkages and other determinants in unknown glycans printed as shotgun glycan microarrays.

Although exciting developments in genomics and proteomics during the past 25 years have rapidly advanced our understanding of the structure and function of genes and proteins, they have also exposed a persistent gap in our understanding of the structure and function of the human glycome. The complexity of glycan structures, the potentially large size and dynamic nature of the glycome, and technical difficulties in glycan sequencing have all contributed to this disparity (1-6); these factors will not be reviewed here. However, it is noteworthy that the biomedical community has long been aware of the following: all living organisms have a glycocalyx on their cell surfaces (7-9); its synthesis requires the expression of a significant percentage of the genome (10,11), including >700 genes involved in glycan-related processes (12); pathogens invade their hosts by attacking this "sugar" barrier (13-17); and by the late 1960s significant alterations in cell surface glycosylation were recognized as major differences between normal and cancer cells (18,19). Despite these well-known facts, this scientific area, recently more formally recognized as glycomics (20-22), has historically been somewhat overlooked by the general biomedical research establishment.
Two major obstacles to the progress of glycomics have been the lack of simple and robust methods for determining glycan structures (23,24) and the lack of facile means for glycan synthesis (25,26), in contrast to the automated methods available for proteins and nucleic acids. The greatest recent advances in glycan sequencing have involved mass spectrometry (27,28). Combined with gene expression studies (29), MS can now link structural information to biosynthetic pathways, which may provide clues to glycan sequences by predicting glycan compositions (30,31). Despite the availability of ultrasensitive MS methods (32), MS data are limited in their ability to distinguish epimers of hexoses and N-acetylhexosamines or to determine positional linkages, and they provide no direct information on the anomeric configurations of individual monosaccharides.
Multiple Approaches for Profiling a Glycome
Over the years, many approaches have been developed to provide information about glycan structures. For example, antibodies and lectins, now generally grouped as glycan-binding proteins (GBPs), have been used in indirect approaches to explore features of cell surface glycans. Antibodies to blood group antigens (ABO and Lewis antigens) (33-35) have been used to identify expression of these determinants on glycoproteins and glycolipids in cells and tissues. Lectins and anti-glycan antibodies have been useful in identifying glycan structures associated with specific glycosylation pathways (36-40), as well as in generating mutant cell lines with altered glycosylation, identified by their resistance to toxic lectins (41). Specific GBP binding can provide significant information on the presence of particular monosaccharides (e.g. sialic acid and fucose), on the identity of hexose epimers (Gal, Man, and Glc), and in many cases on anomeric configuration, but it cannot provide complete glycan sequence information. In addition, the quality of lectins and antibodies and their precise specificities are often not well defined, creating problems in the interpretation and replication of experiments.
More direct biochemical approaches to defining glycans involve disassembling the glycocalyx to investigate the structures of specific glycoconjugate classes (42,43). Free glycans generated by chemical or enzymatic means, or directly available in body fluids such as milk or urine, may be analyzed directly by MS methods, but they are often first separated by chromatographic techniques, permethylated, and then characterized by MS (44). The amphipathic glycosphingolipids (GSL), which are small relative to glycoproteins and proteoglycans, are isolated from lipid extracts of cells and tissues, separated by nonaqueous chromatographic techniques, permethylated, and characterized directly by MS methods (45). Glycans can be released enzymatically from GSL using recombinant endoglycoceramidases prior to permethylation and MS sequencing (46). For most common GSL, which have a double bond in the sphingosine moiety, ozonolysis followed by degradation at neutral pH releases the glycan and provides a simple method for obtaining free GSL-derived glycans that can be reduced, reductively aminated with a variety of tags, permethylated, and analyzed by MS methods (47).
Glycoproteins and proteoglycans may be prepared from biological fluids and from aqueous or aqueous-detergent extracts of cells or tissues and then separated by a variety of chromatographic techniques. Where a glycomic profile is considered informative, the component glycans may be released by enzymatic or chemical means: for example, by protease treatment followed by endoglycosidases such as peptide-N-glycosidase F and O-glycanase, which release N-glycans and the Ser/Thr-linked core 1 O-glycan, respectively, or by treatment with base, e.g. NaOH or hydrazine, to release Ser/Thr-linked O-glycans by β-elimination using reductive or nonreductive methods (48). The resulting free glycans, like those derived from GSL, are typically either derivatized directly with fluorescent tags, e.g. 2-aminobenzamide (49) or 2-amino-N-(2-aminoethyl)benzamide (AEAB) (50), for separation by HPLC (51), or reduced, permethylated, and analyzed by MS to obtain structural information (1,5,27,28,42,52). Thus, methods to obtain the glycans comprising the glycoconjugates of a glycome are generally available, but each has specific limitations.
Glycan Sequencing Approaches Can Be Combined to Aid in Defining the Glycome
Current ultrasensitive MS methods are routinely used for predicting glycan composition and sequence, and if sufficient standards and sophisticated tandem MS technologies are available to reproducibly identify unique fragment ions, highly skilled investigators can perform detailed glycan structural analyses. Such complex structural studies are time-consuming and difficult but can ultimately help to provide the list of glycan structures comprising a glycome. Although such a list can provide the "road map" for synthetic organic chemists to synthesize the glycans comprising a glycome, the information alone has little intrinsic value for understanding glycan function, and the unfortunate lack of facile synthetic methods puts the "time until we have a glycome" far into the future. Thus, if glycomics is to evolve along the lines of genomics and proteomics, simple and robust methods that complement MS and permit the "capture" of natural glycans for functional studies are also required. Such methods can evolve from technologies classically applied to pure compounds, which would also provide access to the glycans for functional analysis. Although NMR spectroscopy has long been considered a definitive tool for identifying complete glycan structures (53,54) and is probably absolutely required for complex microbial polysaccharides (55), its application is limited because few naturally occurring glycans of interest are available in sufficient quantities for NMR analysis.
A variety of separation techniques have been applied to glycan analysis with great success, and if carefully monitored and validated, these methods can provide significant structural information through comparison of chromatographic properties with those of known standards, as well as through direct analysis by MS. Examples of significant success in these areas include ultraperformance liquid chromatography (56), HPLC-MALDI-TOF MS (57), high performance anion-exchange chromatography (58-60), two-dimensional HPLC (61-63), high performance capillary electrophoresis (42), and fluorophore-assisted carbohydrate electrophoresis (64,65). All of these methods rely on the high resolution of the separation techniques and the sensitivity of detection of separated glycans; however, all require the availability of defined standard glycans for comparison. Methods employing multiple chromatographic strategies, including hydrophilic interaction liquid chromatography (66), can be useful in the comparative analysis of glycomes, as elegantly applied to serum glycoprotein glycosylation (23). However, most of these approaches suffer from the possibility that undefined glycans co-purify with known structures, which severely limits the discovery of previously unidentified glycans.
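The assignment-by-standards logic above, and its blind spot for co-eluting unknowns, can be illustrated with a minimal Python sketch. It assumes that retention is expressed as glucose-unit (GU) values and that a peak "matches" a standard within a fixed tolerance; the standard names, GU values, and tolerance are hypothetical placeholders, not data from any cited study.

```python
# Sketch: assigning chromatographic peaks by comparison with standards.
# ASSUMPTIONS (hypothetical): retention is expressed in glucose units (GU),
# and a peak "matches" a standard if their GU values differ by <= tolerance.
# Standard names, GU values, and the tolerance are illustrative placeholders.
from typing import Dict, List


def assign_peaks(peaks: List[float], standards: Dict[str, float],
                 tolerance: float = 0.1) -> Dict[float, List[str]]:
    """Map each peak to every standard whose retention lies within tolerance."""
    return {
        peak: [name for name, gu in standards.items() if abs(gu - peak) <= tolerance]
        for peak in peaks
    }


def describe(matches: List[str]) -> str:
    """Summarize a peak's matches; a single match is still only tentative."""
    if not matches:
        return "no standard (possible previously unidentified glycan)"
    if len(matches) > 1:
        return "ambiguous"
    return "tentative"  # an undefined glycan could co-elute with the standard


standards = {"standard_1": 5.20, "standard_2": 6.10, "standard_3": 6.18}
peaks = [5.22, 6.14, 7.40]

for peak, matches in assign_peaks(peaks, standards).items():
    print(peak, describe(matches), matches)
```

Returning every standard within tolerance, rather than only the nearest one, keeps ambiguous peaks visible; even a single match remains a tentative assignment, because an undefined glycan could co-elute with the standard.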
Serial lectin affinity chromatography (37,67,68) was an early strategy for analyzing glycans based on knowledge of the specific structural features required for their interactions with lectins. In this approach, glycans that were metabolically radiolabeled (69), end-radiolabeled, or fluorescently labeled were analyzed by their chromatographic properties and their affinity for specific lectins and antibodies. In these studies, glycan structures were determined by their specific binding to defined lectins and antibodies, by co-chromatography with standards, and by the use of highly purified, specific endo- and exoglycosidase digestions to monitor changes in chromatographic behavior (70). Anomeric configurations and linkage positions of monosaccharides in a glycan were assessed based on susceptibility or resistance to degradation by specific enzymes. This HPLC-based approach is clearly useful for analyzing extremely small amounts of material (71). In some cases it has been possible to define the branching of radiolabeled N-glycans using acetolysis (72) and to perform linkage analyses on extremely small quantities of glycans by subjecting metabolically radiolabeled glycans to permethylation, hydrolysis, and acetylation, and identifying the methylated monosaccharides by gas chromatography with detection of radioisotope in the eluted gas (67,73). Although these approaches are laborious and time-consuming, they can generate detailed structural information at levels of sensitivity equivalent, if not superior, to those of some of the more sophisticated methods used today. An example of the use of such lectins in glycomics studies is the plant lectin LPHA, the leukoagglutinin from Phaseolus vulgaris (red kidney bean). This lectin binds to a unique set of isomers of complex-type N-glycans containing a branched mannose with N-acetyllactosamine units linked β1-2 and β1-6 to an α-linked mannose (the so-called 2,6-branched mannose) (74).
In cells (75) and mice (76) lacking the β1-6 N-acetylglucosaminyltransferase, the binding of LPHA is lost. Thus, this simple plant lectin is exquisitely suited to provide detailed information about N-glycan branching and expression and to distinguish N-glycan isomers and glycoproteins containing such structures in glycoproteomic analyses (77).
Defined Glycan Microarrays Provide Clues to Understanding GBP Function
The publicly available defined glycan microarray developed by the National Institutes of Health-funded Consortium for Functional Glycomics (CFG) has had a major impact on the advancement of functional glycomic analysis. After the analysis of hundreds of GBPs on this array, investigators in many areas are beginning to appreciate the value of having access to glycans in a microarray format for generating information on GBP function. Because a glycome cannot be amplified by a method like PCR, glycans immobilized on an array play a role analogous to that of the amplified products of the genome, i.e. oligonucleotides, genes, gene fragments, and recombinant proteins, by making them available for functional studies.
The information derived by interrogating defined glycan microarrays with GBPs has led to important discoveries of GBP function. For example, the observation that galectins-3, -4, and -8 bind to human blood group glycans at physiological concentrations (78) suggested that they may play an innate immune role in humans. Humans have a limited ability to generate effective adaptive immune responses to blood group antigens, which are recognized as self, yet they are exposed to microorganisms expressing self-like blood group-related glycans. This hypothesis was confirmed by demonstrating that these galectins not only bind to bacteria expressing blood group-related antigens but are in many cases bactericidal (79). Nevertheless, data from defined glycan arrays represent only a single piece of information: the glycan-binding specificity of a GBP. GBP specificity is an important property that can be the basis for generating new hypotheses regarding GBP function, but knowledge of other biological properties of a GBP, such as its biological activity, tissue expression levels, and subcellular location, is normally required to rigorously define its function.
In our approach to defining a glycome, we and others have derivatized free glycans derived from cells and tissues with a bifunctional tag that is fluorescent and also carries a free amino group (2,6-diaminopyridine (80) and more recently AEAB (50) or 2-aminobenzamide (81)). Fluorescence provides a method for detecting glycans during their purification, and the amino function provides a reactive center to immobilize glycans for functional analyses on glycan microarrays or other solid phases.
Shotgun Glycan Microarrays Define Biologically Relevant Glycans
Defined glycan microarrays are limited to the glycans available for printing, and in many instances a defined array lacks the glycan structure(s) required to define a glycan specificity or epitope. Theoretically, a defined glycan array comprising all member glycans of the human glycome would allow us to define the specificity of any biologically relevant GBP. Currently, the CFG defined glycan microarray, with ~600 glycans, represents <10% of the number of glycans estimated to comprise the human glycome. Defining the human glycome and making it available in a format that can be screened by GBPs should be an important mission of the field of glycomics. One approach to this goal would be to determine the detailed structures of the human glycome, which is probably composed of glycans containing at least 10,000 determinants (4), and then have them generated by synthetic chemists so that the human glycome would be available for functional analyses. Based on current technology, however, it is unrealistic to imagine either that the detailed structure of the human glycome can be determined in a reasonable time or that such a large number of glycans could be chemically synthesized. It is more reasonable to expect that a few thousand glycans can be synthesized by chemical and chemoenzymatic synthesis during the next decade, but much of that effort may be spent on structures of little biological interest. To direct the efforts of such a project, we have proposed shotgun glycomics as a method to identify physiologically or biologically relevant glycans by screening them as potential ligands for GBPs of interest (82).
We developed shotgun glycomics to approach the definition of the function and structure of the human glycome in a relatively high-throughput manner using naturally expressed glycans. We used nanoscale methods to isolate glycans from natural sources and to prepare glycan libraries for direct studies of both their structure and their function in terms of GBP recognition. This approach focuses sequencing efforts on functionally relevant glycans recognized by a GBP and yields libraries of naturally occurring glycans that can be archived and retrieved for future studies. In our first example of a shotgun glycan microarray (SGM) (82), ozonolysis of the sphingosine portion of a mixture of GSLs generated free aldehydes that readily underwent reductive amination to create fluorescent GSL derivatives with a primary amino group. The mixture of GSL derivatives from bovine brain gangliosides (BBG) was resolved by two-dimensional HPLC into 40 individual derivatives that make up a BBG tagged glycan library (TGL). The tagged derivatives were quantified based on their fluorescence, characterized by MALDI-TOF/TOF analysis, and printed at equimolar concentrations on N-hydroxysuccinimide-derivatized slides. We interrogated the SGM for antibodies to GSL in serum from individuals who had been diagnosed with Lyme disease. One BBG fraction showed significant recognition by sera from individuals with Lyme disease compared with control sera, and the GSL derivative was retrieved from the TGL for further structural characterization. The predicted structure was GD1b-lactone, which is a relatively rare component of brain tissues and melanoma cells (82). Although these data warranted additional studies with a larger set of serum samples, more importantly they demonstrated the utility of an SGM for identifying a relevant but rare GSL in a crude fraction of BBG.
Similar SGMs from human erythrocytes and a cultured human cell line were described in the same report (82), demonstrating that this technology can generate purified natural glycans for defining the human glycome.
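Printing "at equimolar concentrations" after fluorescence-based quantification amounts to a calibration step followed by a C1V1 = C2V2 dilution. The Python sketch below assumes a linear fluorescence standard curve through the origin; the calibration slope, readings, target concentration, volumes, and accession labels are all hypothetical and are not values from the study.

```python
# Sketch: normalizing tagged-glycan fractions to a common (equimolar) print concentration.
# ASSUMPTION (hypothetical): concentration = fluorescence / slope, from a linear
# standard curve of the fluorescent tag. All numbers below are illustrative.


def estimate_concentration_um(fluorescence: float, slope_per_um: float) -> float:
    """Convert a fluorescence reading to micromolar using a linear standard curve."""
    if slope_per_um <= 0:
        raise ValueError("calibration slope must be positive")
    return fluorescence / slope_per_um


def dilution_volumes(stock_um: float, target_um: float, final_ul: float) -> tuple:
    """Return (stock volume, diluent volume) in microlitres from C1*V1 = C2*V2."""
    if stock_um < target_um:
        raise ValueError("stock is more dilute than the target; concentrate it first")
    stock_ul = target_um * final_ul / stock_um
    return stock_ul, final_ul - stock_ul


# Hypothetical fluorescence readings for three TGL fractions.
readings = {"TGL-001": 12000.0, "TGL-002": 4800.0, "TGL-003": 30000.0}
SLOPE = 400.0      # hypothetical fluorescence units per micromolar
TARGET_UM = 10.0   # hypothetical common print concentration
FINAL_UL = 20.0    # hypothetical final volume per print sample

plan = {}
for accession, fluor in readings.items():
    conc = estimate_concentration_um(fluor, SLOPE)
    plan[accession] = dilution_volumes(conc, TARGET_UM, FINAL_UL)

for accession, (stock_ul, diluent_ul) in plan.items():
    print(f"{accession}: {stock_ul:.2f} uL stock + {diluent_ul:.2f} uL diluent")
```

A fraction that is already more dilute than the target cannot be brought to it by dilution, so the sketch raises an error rather than silently printing it at a lower concentration.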
In this strategy for human glycomic analysis, the term "shotgun" refers to the fact that glycans are prepared from specific cells or tissues. Shotgun glycomics differs from shotgun genomics in that it does not propose to sequence directly all of the member glycans in the TGL, but rather to prioritize structural efforts and to identify glycans to be synthesized by chemists for expanding the defined array. As structural definition progresses, the number of defined structures on the SGM will increase, and ultimately the entire TGL may be defined. The shotgun glycomics approach may be applied to any organism and might be particularly important for model organisms such as Caenorhabditis elegans, Drosophila, and zebrafish.
Metadata-assisted Glycan Sequencing, a Glycomics Approach Based on MS Analysis of TGLs and Defined GBP Binding to SGMs
The TGL generated from a tissue or organism represents a significant component of a glycome, and each glycan fraction has an associated mass based on MALDI-TOF analysis carried out prior to printing the SGM. Defined GBPs, e.g. plant and animal lectins and anti-glycan antibodies, provide a rich source of reagents for detecting unique glycan determinants among the glycans printed on an SGM. To validate the printing process, we normally interrogate the SGM with defined GBPs to confirm that the glycans were printed; in the course of this validation, we therefore also generate significant structural information. Fig. 1 describes the unique glycan determinants recognized by a small selection of commercially available lectins and antibodies used to introduce this approach. Concanavalin A can detect N-glycans owing to its specificity for α-linked Man in branched Manα1-3(Manα1-6)Manα1-R (83,84), as well as its weaker interaction with the internal trimannosyl core of bi-antennary N-glycans, but not with tri- or tetra-antennary or bisected N-glycans. Sambucus nigra agglutinin (SNA) is generally considered specific for Neu5Acα2-6Galβ1-4GlcNAc (85,86), but it binds better to Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3Man sequences on N-glycans than to the terminal sequence on the 6-branch of bi-antennary N-glycans (87). Maackia amurensis lectin I (MAL) detects Neu5Acα2-3Galβ1-4GlcNAc (88), and Erythrina cristagalli lectin (ECL) is specific for Galβ1-4GlcNAc (89). Among the fucose-binding lectins, Aleuria aurantia lectin has a rather broad specificity for α-linked Fuc (90), whereas Ulex europaeus agglutinin is specific for the H-antigen (Fucα1-2Gal-R) (91).
A variety of anti-glycan monoclonal antibodies are now commercially available, and the numbers are growing rapidly (4); five examples of different antibody specificities among the dozens defined to date are shown in Fig. 1.
MALDI-TOF data provide the molecular masses, and hence the compositions, of the glycans printed on the SGM. The binding patterns of the different lectins can provide extensive structural information and can be extremely useful for differentiating glycans that have the same mass but a different arrangement of monosaccharides. In Fig. 2, we summarize the data that would be generated from a glycan microarray of 10 isobaric bi-antennary N-glycans, whose structures are shown with the pattern of binding of the 11 GBP specificities described in Fig. 1. Interestingly, no two patterns were identical, even though all of the glycans were bi-antennary N-glycans with the same composition.

FIG. 1. Glycan determinants defined by glycan-binding proteins. The determinants defined by six commercially available lectins and five monoclonal antibodies were determined by analysis on the defined glycan microarray provided by the Consortium for Functional Glycomics. The determinants, whose presence on an array can be defined by positive signals in an array analysis, are outlined in each structure. Con A, concanavalin A.
For example, glycans 6 and 7 (Fig. 2) differ only by the linkages of Gal and Fuc in the 3-branch of the bi-antennary structure, but they are readily distinguished by their binding with anti-Lewis a (Le^a) and anti-Lewis x reagents. The positive MAL binding together with antibody binding confirmed the branched structure because these reagents are specific for terminal structures. The data, however, cannot determine on which branch each determinant resides. After neuraminidase digestion, ECL binding was positive, indicating that the α3-linked sialic acid is located on Galβ1-4GlcNAc. Space does not permit the interpretation of each data point, but a large amount of data can be associated with each glycan on this example array. These metadata can be compiled in a database and used for predicting monosaccharide sequence and detailed structures of each individual glycan. We have termed this approach metadata-assisted glycan sequencing (MAGS).

FIG. 2. Predicted and unique lectin/antibody binding patterns to hypothetical isobaric N-glycans immobilized on a hypothetical glycan microarray used for structural analysis. Ten hypothetical N-glycans (1-10) are listed, and the predicted binding patterns for the hypothetical microarray of the lectins and anti-glycan antibodies described in Fig. 1 are shown. The + or − indicate positive or negative binding, and each data point is divided to provide the results of binding with no treatment (above the slash) and the results of binding after treatment with nonspecific neuraminidase (below the slash).
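The idea behind Fig. 2, that a set of +/− binding results acts as a fingerprint capable of discriminating isobaric structures, can be sketched in Python as a simple consistency filter. The candidate names, reagent keys, and predicted patterns below are hypothetical placeholders chosen for illustration; they are not the Fig. 2 data.

```python
# Sketch: matching an observed +/- binding fingerprint against predicted patterns.
# Each candidate maps (reagent, treatment) -> predicted positive binding.
# ASSUMPTION: candidate names and patterns are hypothetical, not Fig. 2 values.
from typing import Dict, List, Tuple

Pattern = Dict[Tuple[str, str], bool]

PREDICTED: Dict[str, Pattern] = {
    "candidate_A": {("ECL", "none"): False, ("ECL", "neuraminidase"): True,
                    ("anti-Le^a", "none"): True, ("anti-Le^x", "none"): False},
    "candidate_B": {("ECL", "none"): False, ("ECL", "neuraminidase"): True,
                    ("anti-Le^a", "none"): False, ("anti-Le^x", "none"): True},
    "candidate_C": {("ECL", "none"): True, ("ECL", "neuraminidase"): True,
                    ("anti-Le^a", "none"): False, ("anti-Le^x", "none"): False},
}


def consistent_candidates(observed: Pattern, predicted: Dict[str, Pattern]) -> List[str]:
    """Return candidates whose predictions agree with every observed data point.

    Only observed (reagent, treatment) pairs are checked; a candidate with no
    prediction for an observed pair is treated as inconsistent.
    """
    return [name for name, pattern in predicted.items()
            if all(pattern.get(key) == value for key, value in observed.items())]


def fingerprints_unique(predicted: Dict[str, Pattern]) -> bool:
    """True if no two candidates share an identical predicted pattern."""
    frozen = [frozenset(p.items()) for p in predicted.values()]
    return len(set(frozen)) == len(frozen)


# A partial observation is enough to narrow the list when fingerprints differ.
observed = {("ECL", "neuraminidase"): True, ("anti-Le^a", "none"): True}
print(consistent_candidates(observed, PREDICTED))  # ['candidate_A']
print(fingerprints_unique(PREDICTED))              # True
```

Because each added reagent or treatment can only remove candidates, accumulating more binding data on the same printed glycan steadily narrows the list, which is the intuition behind collecting metadata for every spot.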
MAGS is based on the analysis of many replicate arrays of undefined glycans (SGMs) interrogated by many different GBPs (Fig. 3). As the SGM is interrogated with defined GBPs, as well as with GBPs whose specificity and function are unknown, a database is continually populated with information on each glycan. When a glycan is determined to be biologically relevant based on a binding event, additional information may be obtained by retrieving it from the TGL for further analysis. In addition, structural information on all of the glycans on the SGM can be obtained by evaluating the binding profiles of defined GBPs before and after specific in situ exoglycosidase digestion on the arrays (92).
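A per-glycan metadata record of the kind MAGS relies on might look like the following Python sketch. The fields mirror the categories of information listed for Fig. 3 (charge, two-dimensional HPLC position, abundance, MALDI-TOF mass, GBP binding before and after digestion), but the class names, field names, and accession numbers are hypothetical and do not describe any published schema.

```python
# Sketch of a per-glycan metadata record for MAGS-style bookkeeping.
# ASSUMPTION: class, field, and accession names are hypothetical illustrations.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GlycanRecord:
    accession: str                                   # TGL accession number
    negative_charges: Optional[int] = None           # from ion-exchange chromatography
    hplc_position: Optional[Tuple[int, int]] = None  # (1st-dimension, 2nd-dimension fraction)
    percent_of_glycome: Optional[float] = None
    maldi_mass: Optional[float] = None
    # (GBP name, digestion or "none") -> observed positive binding
    binding: Dict[Tuple[str, str], bool] = field(default_factory=dict)
    notes: List[str] = field(default_factory=list)


class MetadataDB:
    """In-memory store; a real system would use a persistent database."""

    def __init__(self) -> None:
        self._records: Dict[str, GlycanRecord] = {}

    def add(self, record: GlycanRecord) -> None:
        if record.accession in self._records:
            raise KeyError(f"duplicate accession {record.accession}")
        self._records[record.accession] = record

    def record_binding(self, accession: str, gbp: str, bound: bool,
                       digestion: str = "none") -> None:
        self._records[accession].binding[(gbp, digestion)] = bound

    def bound_by(self, gbp: str, digestion: str = "none") -> List[str]:
        """Accessions giving positive binding with a GBP under a given treatment."""
        return [acc for acc, rec in self._records.items()
                if rec.binding.get((gbp, digestion), False)]


db = MetadataDB()
db.add(GlycanRecord("HM-001", negative_charges=2))  # hypothetical accessions
db.add(GlycanRecord("HM-002", negative_charges=0))
db.record_binding("HM-001", "MVM", True)
db.record_binding("HM-002", "MVM", False)
db.record_binding("HM-002", "ECL", True)
db.record_binding("HM-002", "ECL", False, digestion="beta1-4-galactosidase")
print(db.bound_by("MVM"))  # ['HM-001']
```

Keying binding results by both reagent and treatment keeps before-and-after digestion data side by side, which is what the structural inferences in MAGS compare.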
Analysis of Human Milk SGM by MAGS
A variety of techniques have been applied to solve the structures of the complex mixture of isomeric glycans found in human milk (93). We elected to take a functional approach to the soluble glycome of human milk using shotgun glycomics (92). The neutral, monosialyl, and disialyl glycans of a milk sample were tagged with AEAB and separated by two-dimensional HPLC into 127 nearly homogeneous, but not fully characterized, glycans that made up the human milk TGL. During the production of the TGL, we analyzed each fraction by MALDI-TOF and accumulated data for each glycan. The TGL was printed as a microarray of 127 glycans (n = 4) on an N-hydroxysuccinimide-derivatized microscope slide to produce the human milk shotgun glycan microarray (HM-SGM). To demonstrate the utility of the human milk glycan (HMG) array, we interrogated it with a variety of well characterized GBPs, including lectins and specific anti-glycan antibodies. No significant binding was observed with concanavalin A, Vicia villosa lectin, Griffonia simplicifolia lectin II (GSL-II), or M. amurensis lectin I, consistent with the absence of mannose, terminal GalNAc, terminal GlcNAc, and terminal Neu5Acα2-3Galβ1-4GlcNAc, respectively, in human milk (92). The other six lectins, A. aurantia lectin, SNA, Lotus tetragonolobus lectin, U. europaeus agglutinin, Ricinus communis agglutinin I, and ECL, bound to many HMGs on the array.
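Because each glycan was printed in replicate (n = 4), every binding call requires summarizing replicate spots. The Python sketch below averages replicate intensities and calls binding positive when the mean exceeds a fixed multiple of a background value; this threshold rule, the background value, and the intensities are hypothetical and are not the criteria used for the HM-SGM.

```python
# Sketch: summarizing quadruplicate spots and calling positive binding.
# ASSUMPTION (hypothetical rule): positive if mean signal > fold * background.
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize(replicates: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of replicate spot intensities."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate spots")
    return mean(replicates), stdev(replicates)


def call_binders(spots: Dict[str, List[float]], background: float,
                 fold: float = 3.0) -> Dict[str, bool]:
    """Mark a glycan positive if its mean signal exceeds fold * background."""
    return {acc: summarize(values)[0] > fold * background
            for acc, values in spots.items()}


# Hypothetical relative fluorescence units for one lectin on three glycans.
spots = {
    "HM-001": [5200.0, 4900.0, 5100.0, 5000.0],
    "HM-002": [310.0, 280.0, 295.0, 305.0],
    "HM-003": [1500.0, 1450.0, 1600.0, 1550.0],
}
calls = call_binders(spots, background=300.0, fold=3.0)
print(calls)  # {'HM-001': True, 'HM-002': False, 'HM-003': True}
```

The standard deviation returned by `summarize` is not used in the call itself; it is reported so that spots with poor replicate agreement can be flagged separately.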
FIG. 3. Metadata-assisted glycan sequencing is an extension of the shotgun glycan microarray concept. Beginning with the generation of the TGL, each glycan is assigned an accession number and printed on the array, and the metadata are collected for each glycan and stored in a database. Pre-printing information can include the following: number of negative charges based on ion-exchange chromatography; location of the glycan in the two-dimensional HPLC separation profiles; percentage of the total glycome that each glycan represents; MALDI-TOF data to provide information on purity and composition; additional MSn data as obtained; defined GBP binding before and after exoglycosidase digestion; and any other information deemed useful regarding the nature of the glycan.

In interrogations to identify the function of HMGs, we discovered a number of interesting features of these glycans. Some glycans contain epitopes for the monoclonal antibodies TRA-1-60 and TRA-1-81, which are specific for biomarkers of human embryonic pluripotent stem cells (94). Other specifically sialylated glycans are bound by fluorescently labeled influenza A virus and minute virus of mice (MVM), suggesting that HMGs may function as receptor decoys in an innate defense mechanism against potential pathogens (92). Overall, influenza A bound to eight glycans, MVM bound to six glycans, and the TRA-1 antibodies bound to six different glycans on the HM-SGM (a total of 20 different glycans). Whereas molecular mass data provided the compositions of the natural glycan ligands bound by these potential pathogens, more detailed MSn data were unable to solve their structures definitively. To obtain more decisive structural characterizations and demonstrate the utility of MAGS, we retrieved 22 functionally identified glycans from the TGL of the HM-SGM and printed them with 17 defined milk glycan standards. Individual arrays were interrogated with eight lectins and five anti-glycan monoclonal antibodies that had been analyzed on the CFG glycan microarray to confirm their specificity and binding activity. Such an analysis is essential in these studies because of the lack of vendor quality control of these reagents.
After the initial binding patterns were obtained, the subarrays were subjected to digestion with specific exoglycosidases, either independently or in combination, and subsequently interrogated again with the appropriate lectins or anti-glycan antibodies, with positive and negative binding indicating the presence or absence of the corresponding determinants. The results of these studies are summarized in heat maps, along with detailed descriptions of the logic behind the predicted structures of the functionally identified glycans (92). Because of space limitations, the detailed analysis of a single disialylated glycan identified as a ligand bound by MVM is provided here as an example of structural determination using the MAGS approach; the structural data for all 22 HMGs on the array, however, were obtained simultaneously. Several assumptions can be made regarding HMGs based on previous studies (93,95,96). They all have lactose at the reducing end and thus contain a single glucose residue, with Gal and GlcNAc present in linear or branched sequences of Galβ1-3/4GlcNAc (LacNAc). In linear glycans, GlcNAc is linked β1-3 to Gal, and branches occur where GlcNAc is attached β1-6 to Gal. These core glycans are then substituted with α-linked Fuc and α-linked Neu5Ac to make up an extremely complex mixture of isomeric and isobaric glycans. The results of interrogating the disialylated glycan and 10 standard milk glycans with a selection of defined lectins and antibodies are shown in Fig. 4A, and the results of interrogation after single and sequential exoglycosidase digestions are shown in Fig. 4B. The unknown glycan (predicted structure shown in Fig. 4, A and B) is an octasaccharide (mass = 1818.174) whose composition, inferred from this mass and from its origin in human milk, is three Gal, one Glc, two GlcNAc, and two Neu5Ac residues.
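The residue-sum arithmetic behind a composition assignment can be checked with a short Python sketch. It uses standard monoisotopic residue masses and adds one water for a free reducing-end glycan; it does not model the AEAB tag or the ion form behind the 1818.174 value quoted above, so its result is not expected to equal that number.

```python
# Sketch: monoisotopic mass of a free (underivatized) glycan from its composition.
# Residue masses are standard monoisotopic values for dehydrated monosaccharide
# units; one water is added for the free reducing end. The AEAB tag and the
# observed ion form are NOT modeled, so this is not the tagged-glycan m/z.
RESIDUE_MONO = {
    "Hex": 162.05282,     # Gal, Glc, Man (C6H10O5)
    "HexNAc": 203.07937,  # GlcNAc, GalNAc (C8H13NO5)
    "dHex": 146.05791,    # Fuc (C6H10O4)
    "Neu5Ac": 291.09542,  # C11H17NO8
}
WATER = 18.01056


def free_glycan_mass(composition: dict) -> float:
    """Sum residue masses plus one water for the free reducing glycan."""
    return sum(RESIDUE_MONO[unit] * n for unit, n in composition.items()) + WATER


# Composition stated in the text: 3 Gal + 1 Glc (= 4 Hex), 2 GlcNAc, 2 Neu5Ac.
octasaccharide = {"Hex": 4, "HexNAc": 2, "Neu5Ac": 2}
print(round(free_glycan_mass(octasaccharide), 4))  # 1654.5714
```

Mass alone cannot separate Gal from Glc or GlcNAc from GalNAc, which is why the composition above also relies on prior knowledge of human milk glycans and why the lectin and antibody data are needed to place the residues.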
ECL and the anti-SLe^a/LSTa antibody bind the unknown glycan, indicating that it is a branched structure containing a terminal type 2 glycan (ECL-positive) and a terminal LSTa determinant (the absence of fucose excludes SLe^a). ECL binding was weak (data not shown), presumably owing to steric hindrance by the sialylated branch, because the ECL signal increased 3-fold after neuraminidase treatment (data not shown); one branch must be a type 1 structure, which ECL does not bind. Because digestion with α2-3-neuraminidase causes no change in ECL binding (data not shown), we infer that the other sialic acid must be α2-6-linked and continues to block ECL binding even after removal of the α2-3-linked sialic acid (Fig. 4B). In addition, anti-type 1 antibody binding is observed only after removal of all sialic acid by nonspecific neuraminidase. Thus, the disialylated glycan is predicted to contain one terminal type 2 determinant and one terminal disialyl LNT (DSL) determinant. In addition, no GSL-II binding is observed after β1-4-galactosidase treatment (Fig. 4B), because GSL-II does not bind GlcNAcβ1-6Gal, which would be exposed by sequential α2-3-neuraminidase and β1-3-galactosidase digestion (Fig. 4B). However, sequential nonspecific neuraminidase and β1-3-galactosidase treatment did lead to strong GSL-II binding, consistent with the predicted structure. The low SNA binding is consistent with the lack of SNA binding to the DSL standard (Fig. 4A). Taken together, the MAGS analysis permits a relatively conclusive prediction that the glycan ligand bound by MVM has the structure shown for the unknown in Fig. 4A.
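The before/after reasoning used above (a 3-fold ECL increase after neuraminidase, anti-type 1 binding that appears only after desialylation, and no ECL change after α2-3-neuraminidase) can be expressed as a simple classification of signal changes. In this Python sketch the signals are relative values patterned qualitatively on that description, and the positivity and fold-change cutoffs are hypothetical.

```python
# Sketch: classifying lectin/antibody signal changes after an on-array digestion.
# ASSUMPTION: signals are relative values; both cutoffs are hypothetical choices.
from typing import Dict, Tuple


def classify_change(before: float, after: float, positive_cutoff: float,
                    fold_cutoff: float = 2.0) -> str:
    """Describe how a digestion changed binding for one reagent."""
    was_pos, is_pos = before >= positive_cutoff, after >= positive_cutoff
    if not was_pos and is_pos:
        return "unmasked"   # determinant exposed, e.g. by removing a capping sialic acid
    if was_pos and not is_pos:
        return "lost"       # determinant removed or blocked by the treatment
    if before > 0 and after / before >= fold_cutoff:
        return "increased"  # e.g. relief of steric hindrance
    if after > 0 and before / after >= fold_cutoff:
        return "decreased"
    return "unchanged"


# Relative (before, after) signals, patterned qualitatively on the MVM-ligand analysis.
observations: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("ECL", "nonspecific neuraminidase"): (1.0, 3.0),
    ("ECL", "α2-3-neuraminidase"): (1.0, 1.0),
    ("anti-type 1", "nonspecific neuraminidase"): (0.1, 2.0),
}
for (reagent, enzyme), (before, after) in observations.items():
    print(reagent, "|", enzyme, "|", classify_change(before, after, positive_cutoff=0.5))
```

Each label maps onto an inference in the text: "unmasked" points to a capped determinant, "increased" to steric relief, and "unchanged" to a residue the enzyme could not remove or that does not affect the reagent.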
Using this conceptual approach, we were able to propose structures, including most of the linkage positions and anomeric configurations, for the glycans that bound TRA-1 antibodies, influenza A, and MVM (92). Some redundancy of structures was observed due to the overlap of glycan fractions collected during multidimensional chromatography. Nevertheless, the correlation of proposed structures with the function defined by antibody or virus binding provided information on the glycans in human milk that are related to embryonic stem cell-specific epitopes and potential receptors for viruses.

FIG. 4. Example of MAGS of a single human milk glycan selected for structural analysis based on its binding function. A, lectin and antibody binding to standards and an unknown disialyl human milk glycan. The predicted structure of the unknown glycan identified as a glycan ligand for MVM from human milk is shown at the upper left above a list of 10 glycan standards obtained from human milk. The binding patterns for four defined lectins (SNA, ECL, A. aurantia lectin, and GSL-II) and three monoclonal antibodies (anti-Le^a, anti-type 1 glycan (Galβ1-3GlcNAc), and anti-sialyl Le^a/LSTa) from individual microarray analyses are indicated as either positive (+) or negative (−), indicating the presence or absence of the determinants as defined in Fig. 1. B, lectin and antibody binding to standards and an unknown disialyl human milk glycan before and after exoglycosidase digestion. The predicted structures of the unknown glycan and exoglycosidase products of the enzyme treatments are shown with the patterns of lectin and antibody binding from individual microarray analyses indicated as either + or −.
Specific lectins are commonly used for the immunohistochemical localization of specific glycan structures in cells and tissues. With MAGS we simply apply this approach to libraries of structurally undefined glycans, with the added ability to obtain more material from the reserved TGL entries for deeper structural characterization. MAGS alone cannot unequivocally identify a glycan structure. It can, however, complement MS in glycan structural determination, because it incorporates additional chemical and biochemical approaches used to define glycan structure, including digestion with specific exoglycosidases and specific lectin and antibody interactions. Collecting such metadata on structural analysis arrays is a rational strategy. With miniaturization, increased detection sensitivity, and automation, this approach could plausibly be developed into a high-throughput method for glycomics analysis.
Although MAGS represents a novel approach to assist in high-throughput glycan sequencing, it depends heavily on the availability of well characterized GBPs of defined specificity, such as antibodies and lectins. Many such GBPs are available, but quality control of commercial reagents is sometimes lacking. Researchers who are unaware of this may draw incorrect conclusions unless they independently verify the activity and specificity of each commercial GBP. A major limitation of the MAGS approach is thus the availability and cost of large numbers of quality-controlled GBPs. The approach also requires high quality, purified, specific exoglycosidases that are active and efficient on glycans immobilized on the arrays used for MAGS. In our studies, the concentration of enzyme required to completely hydrolyze a terminal residue from a printed glycan was significantly higher than that required to accomplish the same result in solution. Finally, the analysis and interpretation of the binding patterns of these defined glycans are relatively new activities. They are carried out by a small number of laboratories that have the specialized equipment needed to print and scan glycan microarrays. These techniques continue to be refined as more and varied presentations of glycans on microarrays are developed.
Bioinformatics will be an absolute requirement for developing and maintaining the databases generated by MAGS and, ultimately, for mining them for useful information. Hundreds of unknown glycan structures will be analyzed simultaneously on replicate printed arrays, before and after single or sequential enzyme digestions. These analyses will generate massive amounts of data that can only be handled computationally. Our initial approach to bioinformatics grew out of our need to process data from the interaction of unknown GBPs with defined glycan arrays. Its aim was to automate the discovery of GBP-binding motifs and to define the glycan-binding specificity of GBPs (97). The algorithms used for these analyses will serve as the basis for more sophisticated software to automate the analysis of hundreds of glycans on SGMs. As the glycans on each SGM are structurally defined, they become components of an ever-growing defined glycan array and continue to expand the collection of structures in the process of defining the human glycome.
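The Python sketch below illustrates the kind of motif-level computation described above. It scores candidate motifs by how well their presence on defined glycans separates glycans bound by a GBP from those not bound, using the F1 score, 2TP / (2TP + FP + FN). This is a generic enrichment heuristic chosen for brevity, not the algorithm of reference 97. The `ArrayGlycan` record, the motif annotations, and the binding calls are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayGlycan:
    """One defined glycan on an array, its annotated motifs, and a binding call."""
    glycan_id: str
    motifs: frozenset[str]
    bound: bool


def score_motifs(glycans: list[ArrayGlycan]) -> list[tuple[str, float]]:
    """Rank motifs by F1 score for predicting binding from motif presence."""
    n_binders = sum(g.bound for g in glycans)
    all_motifs = set().union(*(g.motifs for g in glycans))
    scores: dict[str, float] = {}
    for motif in all_motifs:
        tp = sum(g.bound and motif in g.motifs for g in glycans)  # bound, has motif
        fp = sum(not g.bound and motif in g.motifs for g in glycans)  # unbound, has motif
        fn = n_binders - tp  # bound, lacks motif
        scores[motif] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical annotations and binding calls, for illustration only.
glycans = [
    ArrayGlycan("g1", frozenset({"Neu5Aca2-6Gal", "Galb1-4GlcNAc"}), True),
    ArrayGlycan("g2", frozenset({"Neu5Aca2-6Gal", "Galb1-3GlcNAc"}), True),
    ArrayGlycan("g3", frozenset({"Neu5Aca2-3Gal", "Galb1-4GlcNAc"}), False),
    ArrayGlycan("g4", frozenset({"Galb1-4GlcNAc"}), False),
]

for motif, score in score_motifs(glycans):
    print(f"{motif}: {score:.2f}")
# Neu5Aca2-6Gal: 1.00
# Galb1-3GlcNAc: 0.67
# Galb1-4GlcNAc: 0.40
# Neu5Aca2-3Gal: 0.00
```

Scores of this kind are only as reliable as the motif annotations and binding calls behind them. Those, in turn, depend on quality-controlled GBPs and well characterized arrays.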