Challenges of glycosylation analysis and control: an integrated approach to producing optimal and consistent therapeutic drugs

Peiqing Zhang, Susanto Woen, Tianhua Wang, Brian Liau, Sophie Zhao, Chen Chen, Yuansheng Yang, Zhiwei Song, Mark R. Wormald, Chuanfei Yu and Pauline M. Rudd

Bioprocessing Technology Institute, 20 Biopolis Way, #06-01 Centros, Singapore
National University of Singapore, 21 Lower Kent Ridge Road, Singapore
Oxford Glycobiology Institute, South Parks Road, Oxford OX1 3QU, UK
National Institutes for Food and Drug Control, No. 2, Tiantan Xili, Dongcheng District, Beijing, China
NIBRT GlycoScience Group, NIBRT – The National Institute for Bioprocessing Research and Training, Fosters Avenue, Mount Merrion, Blackrock, Co. Dublin, Ireland


Introduction
Glycosylation is one of the most common post-translational modifications (PTMs) of proteins, present on more than 50% of the eukaryotic proteome [1]. Glycans impart a wide range of functions on their protein carriers, ranging from folding and quality control in the endoplasmic reticulum [2], intracellular and extracellular targeting [3], peptide loading onto MHC class I [4], mediation of virus binding to host cells [5] to immune modulation of Fc receptor interactions [6]. Glycosylation of biopharmaceutical drugs has a pivotal role in their safety and efficacy by modulating a wide range of drug properties, including immunogenicity [7], in vivo circulatory half-life [8] and effector functions [9]. A survey of the top 20 best-selling biopharmaceutical drugs up to 2013 shows that 11 of them are glycoproteins, out of which eight are monoclonal antibodies (mAbs) and antibody Fc-based fusion proteins [10]. Therefore, this review will focus on mAbs because they represent the major category of glycoprotein-based therapeutic drugs.
Since the approval of the first mAb product in 1986 (OKT3, which is a murine mAb against CD3 for kidney transplant rejection therapy), 49 mAb-based biotherapeutics have been approved and marketed in the USA and Europe [10]. As the growth engine of modern translational biotechnology, global sales of therapeutic mAbs totaled nearly US$75 billion in 2013, accounting for approximately half of the revenues accrued from all biopharmaceutical products [11,12]. Except for three biopharmaceuticals produced in Escherichia coli, all other products have been glycoproteins expressed in mammalian cells, highlighting the importance of understanding mammalian glycosylation and its regulation [13].
In contrast to nucleic acids and proteins, the biosynthesis of glycans is not directly template-driven but, rather, is the result of a complex network of metabolic and enzymatic reactions that are influenced by many factors, including the genetic profile of the cells in which the glycoconjugates are expressed [14], epigenetics [15] and the extracellular environment [16]. Consequently, glycoproteins, including biopharmaceutical products, always display a heterogeneous set of glycans that can be influenced by the host cell line as well as the upstream and downstream bioprocesses. Fig. 1 shows chemical structures of common monosaccharide species that form N-glycans, as well as structural models of oligosaccharides typically found on antibody drugs and erythropoietin (EPO). From a regulatory perspective, human-compatible and consistent glycosylation is required for a safe and efficacious drug product. This requires drug developers to analyze glycosylation systematically throughout the drug development and manufacturing processes. In addition, controlling glycosylation has been a long-standing challenge that requires a detailed understanding of the glycosylation pathways in cell culture processes. This review aims to update our knowledge of the role of glycosylation in drug properties, with a focus on IgG antibodies and regulatory requirements for glycosylation analysis. We will then walk through the process of producing mAbs, ranging from upstream and downstream bioprocesses to analytics and informatics tools. This will cover cell line development and tools for glyco-engineering host cell lines. Various upstream process parameters have been shown to influence glycosylation and we will provide a summary in the context of quality-by-design (QbD) as well as a review of alternative downstream practices on glycoform selection. State-of-the-art analytical technologies for structural analysis of glycosylation will be discussed. Finally, a major bottleneck in glycosylation analysis and control lies with the lack of informatics tools available for efficient data processing and pathway modeling; therefore we will review the recent progress in these two areas. Fig. 2 provides an overview of the manuscript.

FIGURE 1
Representative monosaccharide and oligosaccharide species in mammalian N-glycosylation. Glycans are oligomers of monosaccharide species that are connected together by various glycosidic bonds. (a) Structures of monosaccharides commonly found in mammalian N-glycans, such as hexoses (Hex) that include glucose (Glc), galactose (Gal), mannose (Man), as well as N-acetylhexosamines (HexNAc) that include N-acetylglucosamine (GlcNAc) and N-acetylgalactosamine (GalNAc), and sialic acids such as N-acetylneuraminic acid (Neu5Ac) and N-glycolylneuraminic acid (Neu5Gc). Neu5Gc is not synthesized in humans and is considered an immunogenic glycan epitope that can be found on glycoproteins expressed by non-human systems. (b) Structure models of major N-glycan species found on recombinant IgG antibody drugs, including FA2 (also called G0F), FA2G1 (or G1F), FA2G2 (or G2F), FA2G2S2 (or A2F) and Man5. Note the positional isomers of FA2G1, in which the terminal Gal residue can be attached to the 6-arm or the 3-arm of the core structure. (c) Structure model of a tetra-antennary, fully sialylated N-glycan (FA4G4S4) that has been reported as a major oligosaccharide species on erythropoietin (EPO). Here, we only depict the conformation in which all four Neu5Ac residues are linked by an α2,3-bond to the Gal residues. Other isomers can arise from α2,6-linkage. This also shows the complexity in detailed glycan analysis.
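The glycan names used in the Fig. 1 legend map onto simple monosaccharide compositions, and a short sketch can make that naming concrete. The Python snippet below is illustrative only: the composition table and the alias mapping follow the names given in the Fig. 1 legend, whereas the monoisotopic residue masses and the function name glycan_mass are assumptions introduced here rather than values or tools taken from this review. Masses are for the free, unlabeled reducing glycan.

```python
# Monoisotopic residue masses in Da (assumed standard values, not from this review).
RESIDUE_MASS = {
    "Hex": 162.05282,     # Glc, Gal, Man
    "HexNAc": 203.07937,  # GlcNAc, GalNAc
    "Fuc": 146.05791,     # core fucose (a deoxyhexose)
    "Neu5Ac": 291.09542,
    "Neu5Gc": 307.09033,
}
WATER = 18.01056  # added once for the free reducing end

# Oxford-style names from the Fig. 1 legend -> monosaccharide counts.
COMPOSITION = {
    "FA2": {"HexNAc": 4, "Hex": 3, "Fuc": 1},
    "FA2G1": {"HexNAc": 4, "Hex": 4, "Fuc": 1},
    "FA2G2": {"HexNAc": 4, "Hex": 5, "Fuc": 1},
    "FA2G2S2": {"HexNAc": 4, "Hex": 5, "Fuc": 1, "Neu5Ac": 2},
    "Man5": {"HexNAc": 2, "Hex": 5},
}

# Alternative names quoted in the Fig. 1 legend.
ALIASES = {"G0F": "FA2", "G1F": "FA2G1", "G2F": "FA2G2", "A2F": "FA2G2S2"}


def glycan_mass(name):
    """Return the monoisotopic mass (Da) of a free glycan given its name."""
    composition = COMPOSITION[ALIASES.get(name, name)]
    residues = sum(RESIDUE_MASS[r] * n for r, n in composition.items())
    return residues + WATER


if __name__ == "__main__":
    for glycan in ("G0F", "G1F", "G2F", "A2F", "Man5"):
        print(f"{glycan:>5s}  {glycan_mass(glycan):.4f} Da")
```

Because the 3-arm and 6-arm isomers of FA2G1 share one composition, they also share one mass in this sketch, which illustrates why mass alone cannot resolve positional or linkage isomers.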

Impact of glycosylation on antibody characteristics
IgG is the most abundant among the five classes of serum immunoglobulins. It makes up about 15-20% of serum glycoproteins. In humans, there are four subclasses of IgG (IgG1-4), which have differences in structure, antigen-binding characteristics as well as in a range of interactions with cellular receptors through the Fc region [17]. To date, most approved therapeutic antibodies are of the IgG1 subclass, which effectively initiates antibody-dependent cell-mediated cytotoxicity (ADCC). We can learn from studies of natural human antibodies and apply the knowledge to the development of recombinant mAbs. The Fc region of IgG1 contains a conserved N-glycosylation site at Asn297 in each CH2 domain. Fig. 3 illustrates the binding of alemtuzumab (Campath®), a humanized IgG1κ mAb used to treat B cell chronic lymphocytic leukemia (B-CLL), to its target CD52 expressed on the B cell surface. About 20% of the circulating IgG molecules carry additional N-glycans located in their Fab region [18-21]. With the single exception of cetuximab, which has an N-glycosylation site in the variable region of each heavy chain (Fig. 4), all approved therapeutic mAbs are only N-glycosylated in the Fc region [7,22,23]. Fig. 5 shows typical glycan profiles of human serum IgG and trastuzumab, which is a recombinant humanized IgG1 produced in Chinese hamster ovary (CHO) cells used as a targeted therapy for HER2-positive metastatic breast cancer.

Effect of Fc region glycosylation on therapeutic antibody characteristics
In contrast to the Fab region, which recognizes antigenic epitopes, the Fc region has limited structural diversity. The Fc region determines the function that will ensue after antigen binding. It can recruit molecules in the innate immune system, such as C1q, as well as cytotoxic and antigen-presenting cells via binding interactions with Fcγ receptors. IgG Fc contains two conserved N-glycosylation sites at Asn297, one on each heavy chain [24]. Variations in the structure of these glycans result in subtle structural changes that have a significant influence on the interaction of IgG with the immune system (Table 1). One of the crucial functions of Fc glycans, and indeed of serum glycoprotein N-glycans in general [25], is to regulate protein turnover and renewal. Circulating exoglycosidases progressively remodel glycoprotein glycans, rendering them vulnerable to clearance by the immune system [25]. For example, it has been shown that the G0 glycoform of IgG, which contains no galactose and therefore presents terminal N-acetylglucosamine (GlcNAc), is bound by a C-type lectin, the mannose receptor, and is cleared by dendritic cells and macrophages [26,27]. Similarly, IgGs bearing high mannose-type glycans exhibit reduced serum half-life [28]. These findings have strong implications for therapeutic antibody quality control, suggesting that such glycoforms should be carefully controlled at a low level. IgG modified by additional sugars in the CH3 domains binds more efficiently to FcRn, the neonatal receptor, which extends the half-life [29]. Fc glycans also play a part in IgG transport, as shown by the increase in galactosylation during pregnancy [30].
Fc region glycans can directly influence the affinity of IgGs to Fcγ receptors, either by changing the conformation of the Fc region [31,32] or through glycan-glycan interactions [33], thus strongly influencing their ability to recruit immune effector cells. Fc region glycosylation helps to maintain the stability of the molecule, and it has been shown that IgGs that have enzymatically truncated or removed Fc glycans show reduced thermal stability, reduced complement binding [34,35] and impaired cell-dependent cytotoxic effects [36]. Furthermore, it has also been shown that the presence of glycans containing core fucose [37] or terminal sialic acid [38] at Asn297 reduces the affinity of IgGs to Fcγ receptors. In the case of core fucose, it has been shown that either knockdown of the gene encoding α1,6-fucosyltransferase (Fut8) or overexpression of the gene encoding GnT-III transferase (Mgat3) results in a lack of core fucosylation at Asn297, increasing the affinity of IgG for FcγRIIIa by more than 100-fold [37] and thus increasing ADCC. Subsequent analysis of crystal structures of the Fc receptor in complex with either the fucosylated or afucosylated Fc regions showed that the lack of fucose allows carbohydrate-carbohydrate interactions to form between the glycans of the Fc receptor and those at Asn297, enhancing their binding affinity [33].
Fc glycans are infrequently sialylated owing to their relative inaccessibility and their interactions with the protein surface [39], and sophisticated methods are required to achieve a high degree of sialylation in vitro [40]. Several studies suggest that the presence of sialylated glycans on the Fc region reduces the ability of IgGs to recruit effector cells via Fcγ receptor binding [38,41], and that Fc glycan sialylation is highly correlated with the presence of C-terminal lysine [42]. Fc glycan sialylation has also been strongly associated with anti-inflammatory properties. These studies were initially carried out on intravenous immunoglobulin (IVIG), which is a therapeutic product consisting of IgGs extracted from the pooled plasma of 30 000-60 000 healthy donors per batch [43]. IVIG has profound yet poorly understood anti-inflammatory effects, and as such has been used for over 30 years to treat a wide range of acute and chronic autoimmune and inflammatory diseases. It was shown that sialic-acid-enriched fractions of IVIG had up to tenfold higher anti-inflammatory activity compared with unfractionated IVIG [43]. Moreover, subsequent reports showed that the Fab regions were dispensable for this effect, and that Fc regions were solely responsible [44]. Unsialylated IVIG was found to interact primarily with type I Fc receptors, which include all Fcγ receptors, whereas sialylated IVIG interacted primarily with type II Fc receptors, which include lectin receptors such as dendritic-cell-specific intercellular adhesion molecule-3-grabbing non-integrin (DC-SIGN) [38]. This suggests that differential engagement of Fc receptors on effector cells was responsible for the observed differences in activity. These findings have not been reproducible under all experimental conditions: a number of researchers maintain that the anti-inflammatory properties of IVIG are not dependent on sialylation [45-47]. Moreover, the precise anti-inflammatory mechanism is still very much in doubt. It has been proposed that IVIG potentiates the activation of basophils [48] or regulatory T cells [49,50], but several studies have shown that this is not necessarily the case [45,51].

Effect of Fab region glycosylation on antigen binding
Although the majority of IgG antibodies are modulated by glycosylation in the Fc region, some 20% also contain nonconserved N-glycosylation sites (Asn-Xaa-Ser/Thr) in the variable region. Indeed, analyses of human and animal antibodies have shown that variable region (mainly VH-region) carbohydrates are frequently present [18-21]. A survey of human cDNA sequences indicated that 9% of variable regions had a potential N-linked glycosylation site [52], and other studies indicate that about 20% of polyclonal IgGs from human serum have V-region glycosylation [53,54]. V-region glycosylation can modulate the affinity of the antibody for its antigen, potentially either enhancing binding strength or abolishing it completely. The net effect of V-region glycosylation is exquisitely sensitive to the location of the carbohydrate chain on the complementarity-determining region (CDR) loops, which determine recognition and binding. For example, a study of murine antibodies of VH subgroup IIIB indicated that glycosylation at Asn58 in the second CDR of VH enhanced the affinity of an anti-dextran antibody for its antigen by approximately tenfold [21,55]. However, when nonglycosylated anti-dextran antibodies were engineered to express N-glycosylation sites, it was found that glycosylation at Asn60 increased overall affinity only threefold, whereas glycosylation at Asn54 inhibited binding to dextran. Thus, a shift in glycosylation site by only a few amino acids can turn an enhancing glycan into an inhibitory one. To avoid any complications, during the screening of mAbs, clones bearing Fab glycans are typically discarded or the glycosylation sites mutated. In certain cases, however, Fab glycosylation can be exploited to improve drug properties. V-region glycosylation has been proposed as a way to increase the neutralizing ability of antibodies. A recent study of an anti-factor-VIII neutralizing antibody showed that V-region glycosylation enhances the ability of the antibody to neutralize factor VIII, but does not increase its binding affinity [56]. Song et al. engineered an N-glycan into the light chain of ibalizumab, a neutralizing antibody against HIV-1 that binds to the surface envelope glycoprotein gp120 [57]. The Fab N-glycan refilled a cavity caused by a mutation in gp120. In vitro data show that the glyco-engineered ibalizumab binds to a broad range of HIV-1 strains, including the ibalizumab-resistant mutants. Fab glycosylation has also been proposed as a mechanism to block self-binding sites during immune system development. Autoantibodies on anergic B cells contained V-region glycans that blocked antigen-binding sites capable of recognizing foreign and self-antigens. Reactivation of anergic B cells by foreign antigen challenge was characterized by an increase in the germinal cell progeny of IgM(low) IgD(+) anergic B cells that initially had 72% of their antigen-binding sites blocked by V-region glycans. Following somatic hypermutation, these binding sites were cleared of blocking glycans, increasing their binding affinities 100-fold [58].
In contrast to the conserved glycans in the Fc, most V-region glycosylation sites are located on solvent-exposed loops and are readily accessible to endogenous lectins. They tend to contain fully processed, sialylated glycans, with a high incidence of bisecting GlcNAc residues [59,60]. Intriguingly, in some cases oligomannose glycans have been found within the CDR antigen-binding regions [55], particularly in the case of follicular lymphoma patients [61]. These glycans are accessible to C-type lectin-binding domains in carbohydrate recognition domains such as those presented by the mannose-binding lectin. The mannose glycans were not the result of impaired glycosylation pathways because Fc region glycosylation remained complex. V-region oligomannose glycans do not always inhibit antigen binding [55]; instead, recent evidence suggests that they contribute to pathological processes by binding to lectin domains of the innate immune system such as the mannose receptor and DC-SIGN, blocking their signaling functions [62].

FIGURE 5
Glycan profiles of human serum IgG and trastuzumab (Herceptin®) analyzed by HILIC-UPLC. Glycans on human serum IgG and trastuzumab were released by PNGase F treatment and labeled with 2-AB. The labeled glycans were analyzed by HILIC-UPLC with fluorescence detection. In both IgG samples, the majority of the glycans are complex-type, neutral glycans, such as FA2 (G0F), two isomers of FA2G1 (G1F) and FA2G2 (G2F). Other truncated complex-type glycans and afucosylated structures are also found, but at lower abundance. Oligomannose structures, most notably Man5, are typically found in recombinant IgG samples. Their abundance in human serum IgG is much lower as a result of mannose-receptor-mediated clearance. Sialylated glycans exist in much higher abundance in human serum IgG compared with trastuzumab, because of clones bearing Fab glycans. Other significant differences in the two IgG samples are bisecting GlcNAc and Neu5Ac connected in α2,6-linkage, which are present in human serum IgG but not CHO-derived recombinant IgG, including trastuzumab.

TABLE 1
Examples of Fc glycan on antibody properties and functions. Columns: Fc glycan feature; Impact on antibody properties and functions; Refs.

FIGURE 6
The role of non-human sialic acid, Neu5Gc, in serum sickness. Injection of horse antiserum against diphtheria toxin led to hypersensitivity reactions in humans. Antibodies were directed against H-D antigen, and were capable of agglutinating erythrocytes from a variety of mammals, including cows. Later it was found that antibodies against H-D antigen react only with Neu5Gc-containing glycosphingolipids.

Drug Discovery Today, www.drugdiscoverytoday.com

Immune reactions to antibody glycans
Humans have baseline levels of antibodies against certain non-human glycan motifs including N-glycolylneuraminic acid (Neu5Gc) [63] (Fig. 1) and terminal α-1,3-linked galactose (α-gal) (Fig. 4) [7]. Their potential inclusion as PTMs on biologics thus presents an immunogenic risk to patients. Excluding these glycan motifs from recombinant glycoproteins cannot be achieved simply by switching production cell lines: human cells cultured in vitro retain the ability to incorporate Neu5Gc through the salvage pathway when supplied with exogenous precursors to Neu5Gc in the medium [63,64], and CHO cells, which are the workhorses of the biopharmaceutical industry, can incorporate Neu5Gc [65] and α-gal [66] into recombinant glycoproteins. This emphasizes the need for the optimization of culture conditions, as well as stringent quality control measures. The production of heterophilic antibodies upon injection with foreign serum was first described by Hanganutziu [67] and Deicher [68] in the 1920s as part of a severe immune response termed 'serum sickness' (Fig. 6). These antibodies caused higher levels of serum hemagglutination of horse, calf, sheep and rabbit erythrocytes, as well as complete adsorption to guinea-pig kidney sediment. However, their binding targets remained undetermined until the 1970s. Then, some early experiments with biologic drugs were performed when anti-human thymocyte immunoglobulins were raised in goats and injected into patients with various immunologically mediated or lymphoproliferative diseases [69]. Although none of the patients experienced severe side effects, it was found that 91% of them developed the 'H-D antibodies' originally described by Hanganutziu and Deicher. Researchers subsequently discovered that the immunological agents giving rise to H-D antibodies were Neu5Gc-containing glycosphingolipids [70], thus demonstrating that glycans on biologic therapeutics can provoke systemic immune responses (Fig. 6).
Nevertheless, hypersensitivity reactions toward Neu5Gc in biologics are rare. IVIG Fc fractionated with Sambucus nigra agglutinin (SNA), a sialic-acid-binding lectin, was the primary anti-inflammatory component of IVIG [71], and only Fc containing two or more sialic acids was capable of binding SNA owing to the spatial constraints of the CH2 domains and sialic acid exposure on the surface of the antibody [72,73]. Using ELISA and BIAcore analysis, combined with thorough characterization of glycan profiles at the intact antibody level with mass spectrometry (MS), our data [41] demonstrate that the binding reactivity of the mAbs to the anti-Neu5Gc antibody resided in the SNA-bound portion containing two or more Neu5Gc, whereas the SNA-non-bound portion harboring only one Neu5Gc showed no reactivity. The fact that most Neu5Gc epitopes are distributed singly on the Fc of mAbs suggests the low potential immunogenicity of Fc Neu5Gc.
Despite increasingly sophisticated methods for biologics production, hypersensitivity reactions still occur in 1-3% of treated patients [7,74]. Adverse immune responses to poorly controlled biologic glycosylation can be as minor as local irritation or as serious as cardiovascular failure [75]. Recent evidence indicates that hypersensitivity to glycan immunogens is strongly mediated by IgE antibodies [7,74,76] (Fig. 7). In a landmark study, Chung et al. found that a high prevalence of hypersensitivity reactions to cetuximab (Fig. 4), reported in some parts of the USA, was due to IgE antibodies against α-gal [7], with symptoms that ranged from transient flushing or rash to full-blown anaphylaxis. Not all humans have sufficiently high levels of anti-α-gal IgE to elicit hypersensitivity responses: researchers observed that the incidence of either cetuximab-associated or diet-associated α-gal hypersensitivity was higher in the south-eastern USA than in the northern states [74,76]. It has been hypothesized that environmental stimuli such as tick bites, which are prevalent in the southern states but rarer in the north, could be responsible for regional differences in circulating anti-α-gal IgE levels, accounting for differences in sensitivity to cetuximab [77]. It was found that tick bites increase the level of anti-α-gal IgE antibodies 20-fold [77], providing a neat explanation for the observed phenomenon. Interestingly, it has been found that the glycans in cetuximab responsible for provoking hypersensitivity responses lie in the Fab region as opposed to the Fc region because they are more accessible to IgE binding [22]. Other approved biologics such as infliximab and palivizumab, which have been produced in murine SP2/0 and NS0 lines, also contain appreciable amounts of α-gal, but do not provoke similar rates of hypersensitivity reactions because they are not glycosylated in the Fab region.

Regulatory perspectives on glycosylation of therapeutic antibodies
Appropriate glycosylation is one of the critical quality attributes (CQAs) that must be demonstrated to ensure the safety and potency of commercial mAbs before regulatory approval. WHO guidelines on biotherapeutics [78] and the International Conference on Harmonization (ICH) Q6B [79] state that PTMs such as glycosylation should be identified and adequately characterized. Thorough characterization of mAb glycan profiles and their potential impacts on safety and therapeutic efficacy is therefore an essential part of quality control strategy. Glycan profile characterization consists of two main steps: first, complete characterization of reference and/or conformance lots of a mAb; and, second, abbreviated tests of subsequent batches for lot release [80]. Because antibody glycosylation can be influenced even by subtle changes in the manufacturing process, the abbreviated tests are often necessary to demonstrate manufacturing consistency. However, the abbreviated tests might not be included in the lot release assay if the sponsors manage to prove, by adopting QbD approaches, that tight process control strategies can ensure the consistency of the manufacturing process. WHO guidelines on biotherapeutics suggest that the selection of tests to be included in routine control programs should be product-specific and should take into account the necessary quality attributes; so, for mAb products in which glycosylation has been identified as a CQA, sponsors are encouraged to include the abbreviated test in the lot release assays. Because glycosylation is so sensitive to environmental perturbations, therapeutic products that are subject to manufacturing process changes must undergo comparability studies as suggested by ICH Q5E. As part of these tests, glycan profile analysis should be thoroughly exercised to ensure the comparability of mAb products before and after process changes have been implemented.
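An abbreviated lot-release or comparability check of the kind described above can be pictured as reducing a glycan profile to a few derived traits and comparing them with product-specific ranges. The sketch below is a hypothetical illustration only: the peak areas, the acceptance limits, the inclusion of the afucosylated glycan A2 and the function names derived_traits and out_of_range are all assumptions made for this example, and none of them is taken from a guideline or from this review.

```python
# Structural annotations for each glycan:
# (core fucose present, number of Gal, number of sialic acids, high-mannose type)
FEATURES = {
    "FA2": (True, 0, 0, False),
    "FA2G1": (True, 1, 0, False),
    "FA2G2": (True, 2, 0, False),
    "FA2G2S2": (True, 2, 2, False),
    "A2": (False, 0, 0, False),   # afucosylated counterpart of FA2 (assumed entry)
    "Man5": (False, 0, 0, True),
}


def derived_traits(peak_areas):
    """Convert raw peak areas into percentage traits of the total profile."""
    total = sum(peak_areas.values())
    traits = {"afucosylated": 0.0, "galactosylated": 0.0,
              "sialylated": 0.0, "high_mannose": 0.0}
    for glycan, area in peak_areas.items():
        fucosylated, n_gal, n_sia, is_high_mannose = FEATURES[glycan]
        percent = 100.0 * area / total
        if is_high_mannose:
            traits["high_mannose"] += percent
        elif not fucosylated:
            traits["afucosylated"] += percent  # complex-type glycans only
        if n_gal > 0:
            traits["galactosylated"] += percent
        if n_sia > 0:
            traits["sialylated"] += percent
    return traits


def out_of_range(traits, limits):
    """Return the names of traits that fall outside their (low, high) limits."""
    return sorted(name for name, (low, high) in limits.items()
                  if not low <= traits[name] <= high)


# Hypothetical peak areas and limits, for illustration only.
reference_lot = {"FA2": 45, "FA2G1": 35, "FA2G2": 10, "FA2G2S2": 2, "A2": 5, "Man5": 3}
post_change_lot = {"FA2": 40, "FA2G1": 30, "FA2G2": 8, "FA2G2S2": 2, "A2": 5, "Man5": 15}
limits = {"high_mannose": (0.0, 5.0), "afucosylated": (2.0, 8.0)}

if __name__ == "__main__":
    print(out_of_range(derived_traits(reference_lot), limits))
    print(out_of_range(derived_traits(post_change_lot), limits))
```

In practice the choice of traits and ranges would follow from the product's own CQA assessment; the point of the sketch is only that a full profile can be reduced to a small number of reportable values.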
Biosimilars are drawing increasing attention because of the impending 'patent cliff'. Regulatory approval for a biosimilar product is based on its comparability to an originator, and a thorough quality comparability exercise is crucial for demonstrating biosimilarity [81]. WHO guidelines [78] recommend the use of the same host cell type for glycoproteins in most cases, because glycosylation patterns vary significantly between different host cell types [82]. European Medicines Agency (EMA) guidelines [79] require a detailed comparison of carbohydrate structures including the overall glycan profile, site-specific glycosylation patterns and site occupancy. The EMA also points out that the presence of glycosylation structures or variants not observed in the reference medicinal product can raise concerns and would require appropriate justification, with particular attention to non-human structures (non-human linkages, sequences or sugars). FDA guidelines [83] also require the detailed comparison of glycan profiles for biosimilars.

Cell line development
Mammalian cell lines used for manufacturing recombinant therapeutic proteins are derived from single cell clones producing high amounts of product with consistent quality to meet regulatory requirements. The process of cell line development for recombinant protein production generally starts with transfecting the mammalian host cells with plasmid vectors carrying the gene of interest and a selection marker gene. Antibiotic-based selection is next performed to isolate a pool of stably transfected cells with the plasmid integrated into their genome by killing the untransfected cells. Owing to random integration and host cell heterogeneity, the stably transfected pool contains cells that exhibit variations in productivity, stability and other product characteristics [84,85]. Cells with high productivity, long-term stable production and good product quality, all of which are required for a production cell line, are rare within a stably transfected pool. To identify cell lines suitable for the final production, hundreds to thousands of clones are screened through multiple stages of evaluation. Typically, a primary screen is first carried out on a small scale, for example 96- and 24-well plates are used for productivity assessment [86]. High-producing clones are identified and scaled up to larger volumes, using shake flasks and micro bioreactors [87,88]. Although micro reactors can better mimic the subsequent, larger-scale bioprocesses, many of the available systems are costly and can increase the development cost significantly. Using a larger culture volume at this stage provides sufficient material for more-detailed characterization of cell growth, metabolism, volumetric productivity (titer) and some product quality attributes, such as glycosylation, in either batch or simple fed-batch cultures. The shortlist of the top-performing clones is finally characterized again for productivity and comprehensive product quality attributes in laboratory-scale bioreactors with well-controlled culture environments to mimic the final large-scale manufacturing. In parallel, these top-performing clones are evaluated for long-term production stability in shake-flask cultures. Crucial product quality attributes include glycosylation profiles, charge variants, aggregate levels and protein sequence variants [89,90]. Assays used for early-stage cell line development, such as ELISA, are chosen for their high throughput and requirement of small amounts of samples, whereas those used at a late stage, such as HPLC and MS, are more complex and are able to provide more-detailed characterizations. By considering productivity, production stability and quality requirements, a final production clone and a backup clone are chosen for future large-scale manufacturing.
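The staged screening logic described above (a productivity-based primary screen, followed by quality and stability filters, ending with a production clone and a backup clone) can be sketched as a simple funnel. This is a minimal illustration under stated assumptions: the Clone fields, the thresholds, the use of high-mannose content as the only quality criterion and all clone data are hypothetical, and a real programme would weigh many more attributes.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    name: str
    titer_g_per_l: float       # volumetric productivity
    high_mannose_pct: float    # one example glycosylation attribute
    titer_retained_pct: float  # titer after long-term culture, % of initial


def select_clones(clones, top_n, max_high_mannose_pct, min_titer_retained_pct):
    """Return up to two clone names: [production clone, backup clone]."""
    # Primary screen: keep the top_n clones ranked by titer.
    shortlist = sorted(clones, key=lambda c: c.titer_g_per_l, reverse=True)[:top_n]
    # Later stages: apply product quality and production stability filters.
    qualified = [c for c in shortlist
                 if c.high_mannose_pct <= max_high_mannose_pct
                 and c.titer_retained_pct >= min_titer_retained_pct]
    return [c.name for c in qualified[:2]]


# Hypothetical screening data.
clones = [
    Clone("C01", 2.4, 3.0, 95.0),
    Clone("C02", 2.9, 12.0, 97.0),  # highest titer, but too much high mannose
    Clone("C03", 2.1, 2.5, 70.0),   # good glycans, but unstable production
    Clone("C04", 2.0, 4.0, 92.0),
    Clone("C05", 0.8, 2.0, 99.0),   # fails the primary productivity screen
]

if __name__ == "__main__":
    print(select_clones(clones, top_n=4,
                        max_high_mannose_pct=5.0,
                        min_titer_retained_pct=85.0))
```

With these invented numbers the highest-titer clone is rejected on glycosylation grounds, which mirrors the point that productivity alone does not identify a suitable production cell line.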

Impacts of expression vectors on glycosylation
During the past decade, R&D has focused on improving the efficiency of generating cell lines with high productivity [91]. With optimized expression vectors, high-throughput clonal selection methods and improved media formulation, it is now possible to generate cell lines with titers of more than 2 g/l on a routine basis. The timeline required to generate a high-producing cell line has also been reduced from over a year to a few months [92]. There is now a shift from increasing productivity toward improving product quality, motivated by improved analytical tools and the rise of biosimilars for existing blockbusters with expiring patents. Besides obtaining high titers to reduce cost-of-goods, regulatory approval of biosimilar products requires demonstration of product quality comparable to that of the innovator drugs [93,94]. Within the list of product characteristics, glycosylation stands out as one of the most crucial quality attributes and the most difficult parameter to control owing to its complex structures and sensitivity to the manufacturing process. Many steps involved during the cell line development process are crucial to the final product glycosylation profile. Known factors that contribute include the host cells, the protein itself and various cell culture environment variables [16,95]. Glycosylation differs between different mammalian host cells and even within a population of the same cells [96]. The choice of host cells is crucial to determining the glycosylation profile of a product. Few studies have looked at the impact of expression vectors on glycosylation. Different vector designs have a direct impact on the relative amounts of light chain and heavy chain for mAbs, affecting assembly and the glycosylation of the product [97,98].
Promoters and other DNA regulatory elements for enhancing recombinant protein expression could also have an impact on glycosylation because reports have shown direct and inverse correlations between protein synthesis rates and glycosylation for different proteins [95]. Antibiotic resistance genes and metabolic selectable genes, such as dihydrofolate reductase (DHFR) and glutamine synthetase (GS) have been commonly used for the generation of stably transfected cell lines [99]. Different selection markers seem to have little effect on obtaining product with the desired glycosylation despite their different mechanisms of action (Yang et al., unpublished). Many studies have looked at the impact of culture conditions and media components on glycosylation. One conclusion drawn from these studies is that the impact of each culture variable is cell line and protein specific [95].

Controlling glycosylation by targeted integration
As a consequence of our incomplete understanding of the cell culture parameters that affect glycosylation, the generation of cell lines with desired glycosylation still relies on empirical clone selection, product characterization and process development. Glycosylation varies from clone to clone in a stably transfected pool [84]. Identifying a cell line with a desired glycan profile is challenging using the current cell line development process. Detailed product characterization for a large number of cell lines at early stages of cell line development is limited by the low throughput of the currently available analytical tools. Moreover, the glycosylation behavior of a cell line at an early stage might not reflect that at a late stage because of changes in the cell culture media, process and vessel types. Identifying cell lines with a desired glycoform profile at an early stage also requires the development of low-cost micro bioreactors that better represent the final production process. Another attractive concept is to reuse cell lines with well-characterized glycosylation pathways and product characteristics. The idea involves generating a panel of master cell lines using plasmid vectors containing homologous recognition sequences. Besides having high productivity and stable production, these master cell lines are chosen for their different glycosylation potentials and would thus be able to produce products with different glycan profiles. A specific master cell line able to produce the required glycosylation could be selected for expressing a specific protein through targeted integration and cassette exchange by using either recombinase- or dsDNA-break-based technologies [92].

Glyco-engineering host cell lines
Apart from using vector design and clonal selection to arrive at a clone capable of producing drugs with pre-targeted glycosylation patterns, various tools can be utilized to engineer the glycosylation machinery of the host cell line. Alternatively, the glycans of the protein products can be chemoenzymatically remodeled. In vitro glyco-engineering of proteins has been demonstrated in several cases, such as the use of a series of exoglycosidases to remove terminal monosaccharides from β-glucocerebrosidase for the production of Cerezyme® [100] and, recently, the use of glycosyltransferases for the remodeling of IgG glycoforms to regulate the levels of terminal galactose and sialic acid [101]. In a different approach, Mimura et al. [102] demonstrated that, by substituting a phenylalanine residue at position 243 of the human IgG1 Fc region with alanine, the level of sialylated IgG glycoforms can be increased by 5-15-fold. The concept of host cell line glyco-engineering presents several advantages: (i) the glyco-engineered cell lines can have restricted glycosylation capabilities, hence reducing the overall glycan heterogeneity of their recombinant products, which could prevent the synthesis of unfavorable glycan structures or structures not observed on the reference materials; (ii) it is possible for glyco-engineered cell lines to produce optimized glycoforms that would represent only a small fraction of the total glycoforms in the wild-type cells. This would be highly desirable for the production of biobetter versions of therapeutic proteins, such as afucosylated mAbs.

Lectin-based glycosylation mutant isolation
Glyco-engineering of host cell lines can be done through untargeted screening [103,104] or targeted gene or genome editing tools [105,106]. In the former approach, a common methodology is to use cytotoxic lectins for negative selection of glycosylation mutants [103]. Lectins can be rationally selected based on the targeted glycosylation feature. Maackia amurensis agglutinin (MAA) is specific for the sialic acid (α2,3)Gal linkage [107,108], which is expressed on CHO cells [109]. We previously reported the use of MAA for the isolation of a CHO glycosylation mutant (R11, now renamed CHO-gmt1) with a nonsense mutation in the gene SLC35A1 encoding the CMP-sialic-acid transporter [110]. CHO-gmt1 was shown to contain only a basal level of sialic acids, and no significant amount of sialylated glycans was found on the recombinant interferon-γ produced from this cell line [110]. Theoretically, MAA can also be used to isolate glycosylation mutants with loss-of-function genetic lesions in the nonredundant steps before sialylation along the glycan-processing pathway. Indeed, we have isolated a CHO mutant of the UDP-galactose transporter using MAA [104] (Song et al., unpublished). Therefore, it is possible to aim for different asialo glycan structures using MAA. One intriguing discovery by Song and co-workers is that Ricinus communis agglutinin (RCA-I) only negatively selects for mutants of N-acetylglucosaminyl transferase I (GnT-I) [111]. A panel of CHO mutants selected by RCA-I treatment was shown to harbor mutations in GnT-I in every case, despite different types of genetic lesions ranging from insertion, deletion and frameshift to missense and nonsense mutations [111]. Therefore, RCA-I-based mutant isolation strategies can effectively function as a GnT-I-targeted gene editing (i.e. knockout) tool.
As a consequence of the loss-of-function mutation in GnT-I, the mutants accumulate N-glycans at the Man5 stage, making them desirable hosts for the production of oligomannose-type glycoproteins such as β-glucocerebrosidase. Interestingly, complementing such mutants with a functional GnT-I results in not only the restoration of the N-glycosylation pathway but also an increase in the sialylation of N-glycans [111]. This concept was validated in a perfusion cell culture process, where it enhanced the sialylation of recombinant erythropoietin [112].

Targeted gene editing tools
Although screening-based glycosylation mutant selection using lectins represents a highly efficient approach, it is challenging to select for glycosylation mutants where there exist multiple copies of the same glycogene or functional redundancy. With the advent of gene editing tools, such as zinc finger nuclease (ZFN), transcription-activator-like effector nuclease (TALEN) and CRISPR-Cas9, it is now possible to create any nonlethal knockout by targeting a single or a set of genes for cell engineering.

ZFNs
ZFNs are artificial nucleases consisting of arrays of zinc-finger DNA-binding proteins linked to the cleavage domain of an endonuclease, typically FokI. The zinc-finger motif specifically binds the target site on the chromosomal DNA. A pair of ZFNs binds to opposite strands of the DNA, bringing the FokI domains together to dimerize, which in turn generates a double-strand break. As the nonhomologous end-joining mechanism is activated in the cell to repair the DNA, insertion or deletion mutations (indels) can be introduced, which often results in functional knockout of the gene. The key to the success rate of ZFN-mediated gene knockouts is the target site and the corresponding zinc-finger sequence. An open-access pathway for zinc-finger design is the modular assembly method [113], in which each finger is designed to target a specific DNA triplet. The fingers that bind 5′-GNN-3′ triplets are the best studied and bind DNA strongly [113-115]. Three-fingered or four-fingered ZFNs are commonly used. For four-fingered ZFNs, the sequence should match the ideal target sequence, which is 5′-NNCNNCNNCNNCxxxxxxGNNGNNGNNGNN-3′. The web-based ZiFiT program provided by the Zinc Finger Consortium [ZiFiT: software for engineering zinc finger proteins (V3.0)] (http://bindr.gdcb.iastate.edu/ZiFiT/) [116] can help to identify potential sites for ZFNs in the genes. Using ZFNs designed through the modular assembly method, we previously demonstrated the knockout of the Golgi GDP-fucose transporter in CHO cells [105], which were then shown to produce fucose-free glycoproteins.
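The ideal four-finger target pattern quoted above can be searched for with a simple regular expression. The following is a minimal sketch, not a substitute for ZiFiT: the function name `find_zfn_sites` and the demo sequence are hypothetical, only the strand supplied is scanned, and no assessment of finger quality or off-target binding is attempted.

```python
import re

# Ideal four-finger ZFN pair site quoted in the text:
# 5'-NNCNNCNNCNNC xxxxxx GNNGNNGNNGNN-3' (N and x = any base).
# The lookahead lets overlapping candidate sites be reported.
ZFN_SITE = re.compile(
    r"(?=((?:[ACGT]{2}C){4}[ACGT]{6}(?:G[ACGT]{2}){4}))"
)

def find_zfn_sites(sequence):
    """Return (start, site) tuples for every match on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in ZFN_SITE.finditer(seq)]

# Hypothetical demo sequence, not taken from any real gene.
demo = "TTT" + "AACGGCTTCATC" + "ACGTAC" + "GAAGTTGCCGTA" + "CCC"
for start, site in find_zfn_sites(demo):
    print(start, site)
```

In practice the reverse complement would be scanned as well, and each candidate would still need to be checked against the available finger modules.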

TALENs
TALENs are customized transcription-activator-like effectors (TALEs) linked to the FokI nuclease [117,118]. TALEs contain a central DNA-binding domain with several tandem repeats of 34 amino acids. The repeats differ mainly in two amino acid residues at the 12th and 13th positions, the repeat variable di-residues (RVDs). Each TALE repeat binds one target nucleotide base as determined by its RVD, with the following specificity: NI = A, HD = C, NG = T, NN = G or A [119,120]. The online tool TAL Effector Nucleotide Targeter 2.0 (https://tale-nt.cac.cornell.edu/node/add/talen) [117] is available to assist the identification of the targeting site. Binding sites of TALEs always begin with a T (thymine). Different scaffolds are available to generate TALENs, with repeats typically ranging from 15 to 20 RVDs [117,118] and a spacer length of 12-21 bp. For the assembly of TALENs, Golden Gate methods [121] or high-throughput platforms [122] can be applied.
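The RVD code lends itself to a direct lookup. The sketch below translates a candidate binding site into an RVD array using only the specificities quoted above; the function names and the demo site are hypothetical, and treating the leading T as a position that receives no RVD is an assumption of this sketch rather than something stated in the text.

```python
# RVD code quoted in the text: NI = A, HD = C, NG = T, NN = G or A.
BASE_TO_RVD = {"A": "NI", "C": "HD", "T": "NG", "G": "NN"}

def design_rvds(site):
    """Map a TALE binding site (5'->3') to one RVD per base.

    The site must begin with T; that leading T is assumed here to
    receive no RVD of its own.
    """
    site = site.upper()
    if not site or site[0] != "T":
        raise ValueError("TALE binding sites begin with T")
    unknown = set(site) - set(BASE_TO_RVD)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [BASE_TO_RVD[base] for base in site[1:]]

def ambiguous_positions(rvds):
    """Indices of NN repeats, which per the text may bind G or A."""
    return [i for i, rvd in enumerate(rvds) if rvd == "NN"]

# Hypothetical 16-nt site (leading T plus 15 bases), not from a real gene.
rvds = design_rvds("TGACCTAGCATCGGAT")
print("-".join(rvds))
print(ambiguous_positions(rvds))
```

Flagging the NN repeats is useful because they are the positions where the array cannot distinguish G from A.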

CRISPR-Cas
The CRISPR-Cas system was first discovered in bacteria and archaea, where it directs the degradation of complementary sequences present within invading viral and plasmid DNA [123,124]. Components of CRISPR include two noncoding RNAs, crRNA (CRISPR RNA) and tracrRNA (trans-activating crRNA), and the CRISPR-associated nuclease Cas9. A crRNA fused to a tracrRNA is referred to as an sgRNA (single guide RNA) [123,124]. Requirements for CRISPR-Cas9-based gene targeting include a 20 bp crRNA target beginning with a G, which facilitates RNA Pol III transcription from the U6 promoter, followed immediately by an NGG PAM (protospacer-adjacent motif). In theory, sgRNAs can be designed to target any genomic sequence that has the consensus sequence GN19NGG to induce a double-strand break [124]. This has allowed the targeting of multiple genes simultaneously. Researchers from the Technical University of Denmark and KAIST reported the use of CRISPR-Cas9 to knock out Fut8, Bak and Bax in CHO cells [125]. This has opened the door to fine-tuning glycosylation and improving cell line characteristics (e.g. high productivity, resistance to cell death and viral infection) in a rapid and cost-efficient manner. Web-based tools are available for designing CRISPR-Cas-targeting constructs, for example CHOPCHOP (https://chopchop.rc.fas.harvard.edu/) [126]. However, off-target effects are a major concern because there is sequence homology throughout the genome [127,128]. Off-target effects that potentially impair cell culture performance, for example cell growth, recombinant gene expression and quality control, should be minimized by carefully selecting the gene target loci and optimizing the targeting constructs.
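As an illustration of the GN19NGG consensus, the sketch below lists candidate sites on both strands of a sequence. The names `find_cas9_sites` and `revcomp` and the demo sequence are hypothetical, and the scan says nothing about off-target risk, which tools such as CHOPCHOP are designed to evaluate.

```python
import re

# Consensus quoted in the text: G N19 NGG, i.e. a 20-nt protospacer
# starting with G, followed immediately by an NGG PAM.
SITE = re.compile(r"(?=(G[ACGT]{19})([ACGT]GG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_cas9_sites(sequence):
    """Return (strand, start, protospacer, pam) for both strands.

    Start positions on the '-' strand are given in reverse-complement
    coordinates.
    """
    seq = sequence.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in SITE.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits

# Hypothetical demo sequence, not taken from any real gene.
demo = "AAA" + "GACGTACGTACGTACGTACG" + "TGG" + "AAA"
for hit in find_cas9_sites(demo):
    print(hit)
```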

QbD for glycosylation control in upstream bioprocesses
QbD is a conceptual framework for manufacturing in which product quality is built into the process through detailed design at the product development stage [129]. In the pharmaceutical industry, the full implementation of QbD for biologics manufacturing is more challenging than for small-molecule drugs because of the complexity of biotechnology-derived products [130]. It requires a thorough understanding of the process and aims at ensuring end-product quality by working within a carefully outlined design space for crucial process conditions. To apply a QbD framework, the desired clinical performance of the drug must first be identified to establish the quality target product profile (QTPP). Based on this profile, related CQAs can be determined together with the process parameters that can impact these CQAs [131]. It is also necessary to monitor the process parameters in a timely fashion via process analytical technology (PAT) to ensure that the previously defined specifications for CQAs are met [130].

Drug Discovery Today, Volume 21, Number 5, May 2016 (Reviews)

Glycosylation-related CQAs
Earlier, we mentioned that, where CDC is concerned, the extent of galactosylation [132] of a mAb is considered a glycosylation-related CQA (gCQA), whereas removal of the core fucose residue can increase ADCC dramatically (100-fold) [9,133]. Apart from affecting the therapeutic efficacy of a drug, glycan motifs can also compromise its safety: the presence of α1,3-linked terminal galactose posed a problem for the safety profile of cetuximab [7]. All of the above-mentioned glycan structures can be used to construct an array of gCQAs under the QbD framework.
Pivotal to the implementation of QbD is the pinpointing of the gCQAs followed by monitoring and controlling them. Ludger has proposed a systematic approach to identifying and ranking the gCQAs via the safety and efficacy (SE) profiling and the construction of impact maps [134]. SE profiling is a visualization tool aimed at revealing those gCQAs that alter the clinical performance of the drug. Impact maps are mathematical graphs showing the effect of these gCQAs on patient safety and clinical efficacy.

Upstream bioprocess parameters that affect glycosylation
To exert control over the predefined gCQAs, it is important to understand the impact of manufacturing conditions on the glycan distribution in the product. Apart from host cell line selection and engineering, the conditions used in running the bioreactor have been shown to affect the glycan profile of biotherapeutics in a product-specific manner. Among the many process parameters, dissolved oxygen (DO) level, bioreactor culture temperature and nutrient or supplement availability are commonly controlled, and their effects on glycosylation have been studied extensively. Some of the findings are summarized in Table 2.

High-throughput and online-ready technologies for QbD implementation
Implementation of QbD in biologics manufacturing requires detailed characterization of the process and thorough exploration of the various conditions that can affect the gCQAs, as described in Table 2. Besides the ability to run many conditions in micro bioreactor systems such as the ambr™ workstation, BioLector® and Micro-24 MicroReactor System, a high-throughput glycan-profiling platform is also needed to assess the impact of process variables on the glycan profiles quickly and cost-effectively. GlykoPrep [135] is one such commercial kit that allows glycan profiling in 96-well plates using proprietary reagents and micro-cartridges. Alternative platforms for high-throughput glycan analysis have also been reported. Doherty et al. used a protein A immobilization strategy to achieve efficient glycan release by PNGase F together with sample purification and concentration [136]. Beads functionalized with hydrazide groups can conjugate with aldehyde groups from sugars, enabling the purification of glycans [137,138]. Utilizing this chemistry, Stockmann et al. reported a novel, fully automated profiling workflow [139] and used it to profile glycans from IgG in serum samples. Instead of analyzing the released glycans, Reusch et al. presented a workflow in which the IgG samples were subjected to trypsin digestion and the resulting glycopeptides analyzed by electrospray MS [140]. Although high-throughput technology can increase the number of data points one can explore simultaneously, it is also advisable to employ design of experiment (DoE) techniques to maximize the information obtained from the screening data. The application of DoE in bioprocessing has been discussed in the review by Kumar et al., with case studies of applications for chromatography and refolding steps [141].
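As a simple illustration of the DoE idea, a full factorial design enumerates every combination of the chosen factor levels, so each micro bioreactor run corresponds to one row of the design. The sketch below is only an example: the factor names and levels are hypothetical and are not taken from the studies cited here, and a fractional or response-surface design would normally be preferred once the number of factors grows.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "DO_percent": [10, 50, 90],
    "temperature_C": [32, 37],
    "glutamine_mM": [0.5, 4.0],
}

def full_factorial(factors):
    """Return one dict per run, covering every combination of levels."""
    names = list(factors)
    levels = (factors[name] for name in names)
    return [dict(zip(names, combo)) for combo in product(*levels)]

runs = full_factorial(factors)
print(len(runs), "runs")
for run in runs[:3]:
    print(run)
```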
Developing online methods for quick product characterization, including glycan profiling, is also crucial for QbD implementation because it enables real-time control over the bioreactor to inform manufacturing operations. At Genentech, an online mAb characterization strategy was proposed in which ion exchange or size exclusion separation was followed by trapping and desalting on reverse-phase cartridges with subsequent quadrupole time of flight (QToF) MS analysis [142]. Major glycoforms can be identified via MS analysis at the intact antibody level. Primack et al. from Amgen used a microchip-based CE method to profile the glycans from antibody cell culture [143]. The glycans were released from the purified mAbs and sent for CE analysis on the Caliper Labchip GXII Microchip-CE platform. M5, G0, G0F, G1F and G2F can be identified from the electropherogram. Agilent's multilayered microfluidic chip provided an integrated system for glycan release, clean-up, PGC-based separation and integrated electrospray into MS [144]. Only 100 ng of mAb was needed for the assay, and the whole analysis including sample preparation can be done within 10 min.

Quantitative models for predicting N-glycosylation in cell cultures
Real-time monitoring of the detailed glycoforms during the biomanufacturing process remains a challenge. However, now that the CHO cell genome has been sequenced [145], a systematic understanding of the pathways can help us to understand, and eventually predict, the effects of process changes on the glycan profiles of the final product. Initial computational models [146-148] approximated the Golgi apparatus as a series of continuously stirred tank reactors (CSTRs) in which reactions are centered on glycosyltransferases and generate more than 7500 potential glycan structures. Hossler et al. adopted the Golgi maturation model and used plug flow reactors in series to mimic the change in enzyme concentrations along the Golgi apparatus [149]. Follow-up models were developed to look specifically into the microheterogeneity of antibody glycosylation, with particular attention paid to the nucleotide sugar donor (NSD) level in order to link glycosylation to cellular metabolism [150,151].
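To make the CSTR-in-series picture concrete, the toy sketch below passes a linear chain of glycan intermediates (S0 -> S1 -> S2 -> S3) through four well-mixed compartments at steady state. It is not a reimplementation of the models in [146-148]: the first-order kinetics, the rate constants, the residence time and the function names are all assumptions made for illustration. For each species the steady-state balance gives out_i = (in_i + tau * k_(i-1) * out_(i-1)) / (1 + tau * k_i).

```python
def cstr_step(inlet, k, tau):
    """Steady-state outlet of one CSTR for a chain S0 -> S1 -> ...

    k[i] is the first-order rate constant for S_i -> S_(i+1);
    the last species is not consumed.
    """
    outlet = []
    for i, c_in in enumerate(inlet):
        formed = tau * k[i - 1] * outlet[i - 1] if i > 0 else 0.0
        k_out = k[i] if i < len(k) else 0.0
        outlet.append((c_in + formed) / (1.0 + tau * k_out))
    return outlet

def golgi_series(feed, k_per_compartment, tau):
    """Pass the feed through the compartments in order."""
    state = list(feed)
    for k in k_per_compartment:
        state = cstr_step(state, k, tau)
    return state

# Hypothetical rate constants (arbitrary units), one list per compartment,
# mimicking enzymes that are enriched in different cisternae.
k_per_compartment = [
    [2.0, 0.1, 0.0],
    [0.5, 1.5, 0.2],
    [0.1, 1.0, 1.0],
    [0.0, 0.5, 2.0],
]
final = golgi_series([1.0, 0.0, 0.0, 0.0], k_per_compartment, tau=1.0)
print([round(x, 3) for x in final])
```

The outlet fractions sum to the feed, which is a quick sanity check that the balances conserve mass.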

Table 2. Effects of cell culture conditions on glycosylation.

| Parameter | Effect on the bioprocess or product | Mechanism of action | Ref. |
| --- | --- | --- | --- |
| Dissolved oxygen tension (DOT) | For erythropoietin (EPO)-Fc produced in CHO, the sialylation fluctuated between 11 and 14 mol sialic acid/mol EPO-Fc for 10-90% DOT, but dropped drastically at 100% DOT | Mechanism not defined | [106] |
| Dissolved oxygen tension (DOT) | For human follicle-stimulating hormone produced in CHO, maximum sialylation at 100% DOT | Mechanism not defined | [107] |
| Dissolved oxygen tension (DOT) | For IgG1 produced in murine hybridoma cells, reduced DOT resulted in reduced galactosylation | Not clear. One suggested reason is that it is due to decreased UDP-Gal transport from the cytosol to Golgi | [108] |
| pH | … | …] inhibits SiaT and GalT by increasing the pH of the trans-Golgi compartment | [109] |
| pH | For EPO produced in CHO, at 10 mM NH4Cl, a reduction in sialylation was observed for both N- and O-linked glycans | It might be ascribed to an increase in intra-Golgi pH. Another possibility, an increase in extracellular sialidase activity, was ruled out | [110] |
| pH | For IgG3 produced in mouse hybridoma cell line, total sialic acid content is in the order of pH 7.2 > HEPES > pH 7.4 > pH 6.9; highest relative amounts of agalacto structures were found in pH 7.2 and pH 6.9 cultures | Mechanism not defined | [111] |
| pH | For interleukin (IL)-2 produced in BHK-21, at 15 mM NH4Cl there is a significant increase of tri- and tetra-antennary glycans | The increase in complexity is considered to be brought about by the increased levels of intracellular UDP-GlcNAc/GalNAc in the presence of high NH3/NH4+ | [112] |
| Temperature | For EPO produced in CHO, a temperature range of 25-37 °C was used. The proportion of tetra-antennary structures reached a maximum at 32 °C. The proportion of tetrasialylated glycans stayed roughly the same between 37 and 32 °C, but decreased by 24.4% when culture temperature dropped to 25 °C | The author found that high specific EPO productivity at high temperature is probably not the cause; instead, the observed difference might be attributed to enzyme activity reaching its optimum between 32 and 37 °C | [113] |
| Temperature | For EPO-Fc produced in CHO, when culture temperature was shifted from 37 to 30 °C there was a reduction of sialylation by 40% | Mechanism not defined | [114] |
| Nutrient supplement | For interferon (IFN)-γ produced in CHO, lipoprotein-supplemented medium helps to maintain the extent of glycosylation with time in batch culture | Mechanism not defined | [115] |
| Nutrient supplement | For monoclonal antibodies (mAbs) produced in murine hybridoma, higher glycan microheterogeneity was found for culture supplemented with fetal bovine serum (FBS); G0 content was higher with FBS supplementation; with FBS addition, increase in sialylation and decrease in fucosylation were also observed | It was mentioned that extracellular β-galactosidase activity increases threefold for serum-free medium | [116] |
| Nutrient supplement | For IFN-γ produced in CHO, glucose limitation and glutamine limitation can reduce the site occupancy of the product | Nucleotide triphosphate depletion was thought to be the cause of decreased site occupancy during glucose limitation, whereas glutamine limitation influenced glycosylation by reducing amino sugar formation | [117] |
| Nutrient supplement | For chimeric heavy chain antibody (EG2) produced in CHO, deprivation of glucose resulted in a 45% nonglycosylated mAb fraction | When glucose is not sufficient, it will be preferentially used for energy metabolism rather than replenishing the nucleotide sugar pool that serves as the precursor to glycosylation | [118] |
| Nutrient supplement | For IFN-γ produced in CHO, very low glutamine (<0.1 mM) or glucose (<0.7 mM) can lead to decrease of sialylation and increase of hybrid and high-mannose glycans | Glutamine limitation can limit the formation of UDP-GlcNAc by limiting amino sugar formation. This will in turn decrease the formation of CMP-NeuAc | [119] |
| Nutrient supplement | For human chorionic gonadotrophin (HCG) produced in CHO, in continuous culture setting, glutamine supplement concentration was alternated between 0 mM and 8 mM for 90 days. 15 glycan structures from HCG were profiled at various time points. Reduction in sialylation and antennarity of HCG was observed with a decrease in glutamine coupled with low glycolytic flux | From metabolic flux analysis and monitoring of nucleotide sugar metabolism, it was suggested that high glutamine concentration contributes to the accumulation of UDP-GlcNAc and UDP-GalNAc, which are required by glycosyltransferases when adding new glycan chains to biantennary structures. UDP-GlcNAc is also a necessary precursor for the synthesis of CMP-NeuAc and hence impacts sialylation | [120] |
| Nutrient supplement | For IgG produced in human cell line rF2N78, aglycosylated antibody was produced when glucose was depleted from the medium. Aglycosylated antibody was not observed with glucose feeding in fed-batch culture | Glucose capping on oligosaccharide is a key step in the oligosaccharide recognition by the oligosaccharyl transferase, which eventually leads to glycosylation at the asparagine site | [121] |

Downstream bioprocessing of glycoproteins
In common with other biopharmaceuticals, therapeutic glycoproteins typically require exceptional purity and constant potency, making effective and efficient downstream processing a crucial step in the overall production process [152].
The generic process for the recovery and purification of glycoproteins produced by animal cell culture consists of successive capture, purification and polishing steps, each comprising one or more procedures [153]. Protein A chromatography is the industry standard for commercial-scale purification of mAb products. In the research environment, however, the capture step might employ lectin or boronate affinity chromatography, which selectively capture the glycan component. Purification and polishing are conducted with ion exchange and hydrophobic interaction chromatography to remove aggregates, endotoxins, host cell proteins and DNA [154]. In addition, some intermediate, nonchromatographic purification steps are also employed, such as ultrafiltration and diafiltration as well as clarification techniques [155-157].

Lectin-based purification
Lectins are carbohydrate-binding proteins and have been widely used to identify and isolate glycoproteins, glycopeptides, glycolipids and oligosaccharides [158]. Lectins are found in plants and animals, and have specificity for certain types of carbohydrate residues. The most commonly used lectins are concanavalin A (Con A), which has specificity for α-D-mannose and α-D-glucose residues and bi-antennary N-glycans [159]; wheat germ agglutinin (WGA), with specificity for N-acetylglucosamine residues [160]; jacalin, with specificity for galactose and mannose residues [161]; and MAA and SNA, with specificities for Siaα2,3- and Siaα2,6-linked Gal, respectively [108]. Additional lectins are available for binding to less common glycans. EPO, influenza vaccine and other therapeutic glycoproteins can be purified using lectin affinity chromatography [162,163]. However, as a class of biological receptors, lectins have significant disadvantages, including multiple specificities, high purification costs, degradation during standard sterilization and subsequent contamination of the end product as a result of ligand leaching.

Boronate-based purification
Since 1970, boronate columns have been exploited as less expensive and more stable alternatives to lectins for the separation of glycoproteins. The most commonly used boronate adsorbent is m-aminophenylboronic acid agarose, which has been investigated for the specific capture of glycoproteins from protein mixtures; successful applications include the selective capture of IgG from CHO cell supernatant [164], EPO from mammalian cell cultures [165], ovalbumin from egg white [166], ovalbumin and ribonuclease B (RNase B) from BSA [167], horseradish peroxidase (HRP) from BSA, RNase B from myoglobin [168] and glycopeptides from nonglycopeptides of digested HRP [169], among others. However, to form stable cyclic esters between m-aminophenylboronic acid and cis-diol compounds, alkaline conditions are required (pH ≥ 8.5) [170]. It is not desirable to perform glycoprotein separation at such a high pH, because it can lead to the degradation of some labile glycoproteins. To address this challenge, new boronate ligands that work under physiological conditions have been studied and applied to the enrichment of glycoproteins [171,172].

Future needs for cost-efficient glycoform separation technologies
Unlike protein synthesis, which is under genetic control, glycan synthesis is not directly template-mediated and thus glycoproteins are typically expressed and marketed as populations of glycoforms with potentially different biological properties [173]. The different properties associated with each glycoform within the resulting heterogeneous mixture present regulatory difficulties for therapeutic glycoproteins and also problems in determining exact structure-activity relationships [173]. It is therefore important to the biopharmaceutical industry to be able to produce products with homogeneous glycoforms that can confer significant and predictable therapeutic advantages. However, protein production costs remain extremely high, with downstream processing constituting a substantial proportion (50-80%) of the overall cost [174]. It is expected that the development of new purification techniques exploiting inexpensive, specific, effective, robust and easy-to-use methods and materials will shape the future of glycoprotein purification [175].

Technologies for glycosylation analysis
The absence of a direct genetic blueprint and the nonlinear structures of glycans make glycosylation analysis one of the most challenging physicochemical characterizations of biopharmaceutical products. The past two decades have seen tremendous developments in analytical instruments and knowledge of protein glycosylation. There is now a consensus that, in glycosylation analysis, the question dictates the exact approach [176]. Broadly speaking, glycosylation analysis of therapeutic proteins can be performed at three levels, namely intact protein, glycopeptide and released glycans, as illustrated in Fig. 8.

Intact glycoprotein analysis
There are a variety of analytical methods for characterization at the intact protein level, which include chromatography, electrophoresis, MS and other spectroscopic techniques. Owing to the superior resolution, MS has become a crucial analytical tool to characterize mAbs in many phases of the developmental process. Through top-down approaches that deal with intact protein samples, information such as molecular weight, and major modifications such as glycosylation, can be obtained. In top-down analysis, high-resolution mass analyzers are required, including ToF, Orbitrap and Fourier transform ion cyclotron resonance (FT-ICR) instruments to resolve small mass differences [177]. Valeja et al. showed that it is possible to achieve isotopic resolution through the use of FT-ICR for the analysis of intact IgG1 [178]. However, even with the use of FT-ICR, it is still challenging to resolve small mass changes such as deamidation, dehydration and oxidation [179]. In addition, FT-ICR is not a very commonly used instrument in the biopharmaceutical environment because of the tedious maintenance of the superconducting magnets and its high price.
Intact protein samples for top-level analysis usually undergo an online or offline separation step (e.g. LC, CE) before ionization by electrospray (ESI) or matrix-assisted laser desorption (MALDI) MS [180]. Although direct infusion of the intact mAbs is also possible, a desalting step is often required to obtain high-quality spectra with fewer salt adducts and better ionization efficiency. Using an ESI source, intact IgG samples can form quasi-molecular ions with high charge state (z in the range of 20-60) and the resulting m/z values in the range of 2000-5000 amu, which can be measured by many types of mass analyzers. By contrast, an intact IgG sample typically carries a maximum of four charges under MALDI conditions, which diminishes the resolution and mass accuracy [181].
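The relationship between neutral mass, charge state and m/z behind these ESI charge envelopes is m/z = (M + z × 1.007276)/z for a species carrying z protons. The sketch below illustrates it, together with the recovery of z and M from two adjacent peaks. The nominal IgG mass of 148 000 Da and the function names are assumptions for illustration, and real deconvolution software fits the whole envelope rather than a single peak pair.

```python
PROTON = 1.007276  # proton mass in Da

def mz_of(neutral_mass, z):
    """m/z of a species of the given neutral mass carrying z protons."""
    return (neutral_mass + z * PROTON) / z

def deconvolve_pair(mz_z, mz_z_plus_1):
    """Recover (z, neutral mass) from two adjacent charge-state peaks.

    mz_z carries z protons and mz_z_plus_1 carries z + 1, so
    mz_z > mz_z_plus_1.
    """
    gap = mz_z - mz_z_plus_1
    z = round((mz_z_plus_1 - PROTON) / gap)
    return z, z * (mz_z - PROTON)

# Assumed nominal mass of an intact IgG, used only for illustration.
ASSUMED_IGG_MASS = 148_000.0

for z in (30, 45, 60):
    print(z, round(mz_of(ASSUMED_IGG_MASS, z), 2))

z, mass = deconvolve_pair(
    mz_of(ASSUMED_IGG_MASS, 50), mz_of(ASSUMED_IGG_MASS, 51)
)
print(z, round(mass, 1))
```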
The high mass resolution required for top-down analysis of intact monoclonal IgG antibodies is also achievable without FT-ICR, using hybrid QToF [182] or hybrid linear quadrupole ion trap-Orbitrap MS [183]. Using the Orbitrap analyzer, Bondarenko et al. employed three different delivery methods: (i) direct nano-ESI infusion; (ii) step-elution; and (iii) online reversed-phase HPLC, to determine accurately, from the deconvoluted mass spectra, the major glycoforms of intact IgG that differ in the number of terminal galactose residues (±162 Da) [184]. The analysis of intact mAbs represents a fast and robust way to determine batch-to-batch consistency (analysis time is a few minutes) and requires minimal sample preparation. However, the detection of minor glycoforms remains a challenge. In a similar fashion, antibody subunits or fragments can be analyzed by middle-down strategies based on high-resolution accurate-mass MS platforms coupled with appropriate separation front ends. Different chemical and enzymatic approaches, such as reduction of inter-chain disulfide bonds by reducing agents and digestion with specific proteases (especially IdeS), have been demonstrated to support simultaneous elucidation of glycoforms and other PTMs located in the heavy chain of antibody molecules [185].
Figure. The top, middle and bottom levels of glycosylation analysis. Depending on the exact objective, glycosylation analysis can be performed at different levels. This picture depicts the top, middle and bottom levels of glycosylation analysis in analogy to protein analysis, using monoclonal antibody drugs as an example. In top-level analysis, the intact glycoprotein is analyzed by mass spectrometry either through direct infusion or preceded by a separation system such as a liquid chromatograph. The high-resolution accurate-mass (HR/AM) mass spectrometer measures, to high precision, a series of mass:charge ratios (m/z) of the IgG in question, which arise from a consecutive series of charge states. Using a computer-assisted deconvolution algorithm, the neutral mass (molecular weight) of the IgG can be derived. Typically for monoclonal IgG molecules, several neutral masses are revealed after deconvolution that correspond to different glycoforms and/or proteoforms. By examining the mass differences among the different forms of the intact IgG, and comparing them with the mass(es) of the deglycosylated IgG, each neutral mass can be assigned to a particular glycoform or proteoform. Here, we show our data on a monoclonal antibody product, showing mainly the G0F/G0F glycoform. The top-level analysis offers a unique insight into the pairing of glycans on the two heavy chains of an IgG molecule. We refer to intact glycopeptide analysis as middle-level analysis. This typically involves proteolytic digestion of the glycoprotein, followed by LC-MS/MS analysis. Fragmentation of the glycan moiety can produce rich information on the composition and arrangement of the monosaccharide building blocks on the glycan. Middle-level analysis serves to answer the question of glycosylation microheterogeneity because it supports the elucidation of the carrier peptide sequence in addition to glycan compositional analysis. This is often necessary in the case of multiple glycosylation sites on a protein. Peptide-specific fragmentation is now better supported with the development of new ion fragmentation techniques, such as electron-transfer dissociation (ETD). In bottom-level analysis of glycosylation, glycans are cleaved off the carrier protein and analyzed as a pool. Popular techniques for released glycan analysis involve tagging the glycans with a fluorescent dye and separating the labeled glycans by LC or capillary electrophoresis (CE), with or without introducing them to a mass spectrometer. Away from the carrier proteins or peptides that often dominate the overall physicochemical properties of the glycoconjugate, bottom-level analysis of released glycans shows high sensitivity and quantitation power, which supports in-depth characterization of glycans including their compositions, sequences and glycosidic linkages. However, once the glycans are detached from the protein backbone, the information on the local context, such as site of attachment and pairing, is lost.
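The arithmetic behind charge-state deconvolution and glycoform assignment in intact-mass analysis can be illustrated with a minimal sketch. It assumes positive-mode ESI with proton adducts only and uses average masses, as is usual for intact proteins. The function names, the tolerance and the example mass are hypothetical and are not taken from any of the cited studies; real deconvolution software combines many charge states and is considerably more elaborate.

```python
PROTON = 1.00728       # mass of a proton in Da
HEXOSE_AVG = 162.14    # average residue mass of one hexose (e.g. galactose) in Da


def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge z of the peak at mz_high, given the neighbouring peak
    of charge z + 1 at mz_low (same neutral mass, proton adducts)."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral mass of a species observed at m/z with z attached protons."""
    return z * (mz - PROTON)


def hexose_difference(mass, reference_mass, tol=2.0):
    """Number of hexose residues separating two deconvoluted masses,
    or None if the difference is not a multiple of 162.14 Da within tol."""
    delta = mass - reference_mass
    n = round(delta / HEXOSE_AVG)
    if abs(delta - n * HEXOSE_AVG) <= tol:
        return n
    return None
```

Applied to two neighbouring peaks of a charge envelope, `charge_from_adjacent_peaks` gives the charge, `neutral_mass` gives the molecular weight, and `hexose_difference` indicates how many galactose residues separate two deconvoluted forms (for example, two for a G1F/G1F form relative to G0F/G0F).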

Glycopeptide analysis
The analysis of glycosylation microheterogeneity is essential to understanding the site-specific contribution of glycan structures to the safety and efficacy of drug products. As is typical of bottom-up approaches, a glycoprotein sample is first digested by proteolytic enzymes, such as trypsin, then analyzed by separation techniques (e.g. HPLC or CE) or MS. MALDI and ESI are often used as soft ionization methods to preserve the structural integrity of glycopeptides. The applicability of MALDI for high-throughput analysis has been recently demonstrated in a study profiling the N-glycans of the Fc region in IgG through glycopeptide analysis [186]. Such methods could be implemented when analyzing large batches of samples such as during process development or batch-to-batch monitoring. Competitive ion suppression (glycopeptides vs peptides) is often a major hindrance resulting from the complexity of enzymatic digests of the glycoprotein. Therefore, various methods for glycopeptide enrichment have been developed, such as the use of size exclusion chromatography [187] and hydrophilic interaction chromatography with cellulose or sepharose [188]. Alternatively, an online hydrophilic interaction liquid chromatography (HILIC) separation coupled to nano-ESI-MS or UPLC-MS could also be employed, based on the hydrophilicity of the glycopeptides [189–192].
Tandem mass spectrometry (MS/MS) is required to elucidate the sequence of the carrier peptide, the glycan composition and the site of glycan attachment. Collision-induced dissociation (CID), such as higher-energy C-trap dissociation (HCD) in Orbitrap instruments, generates distinct product ions of the peptide carrying the innermost GlcNAc residue of an N-glycan (i.e. the y1 ion), which can be used to determine the N-glycosylation site [193]. Using database searching for bottom-up proteomic measurements with the core GlcNAc as a variable modification, the identity of the glycopeptide together with the N-glycosylation site can be determined. The presence of glycan oxonium ions in the low mass range (e.g. b ions at m/z = 204 for HexNAc, 366 for HexNAc + Hex and 274 for dehydrated Neu5Ac) is also used as a marker for glycosylation. CID fragmentation of glycopeptides almost exclusively results in the cleavage of glycans from the peptide backbone, generating a series of y-fragment ions related to glycosidic cleavages in the higher mass range [193,194]. Thus, CID is a suitable technique for determining glycan compositions and even structures based on the glycan sequence tag. QToF-MS/MS has been widely used to obtain CID-type fragment ions in the analysis of sialo- and asialo-complex-type N-glycopeptides, yielding information on the glycan moiety [195]. In more recent papers, information on peptide sequence, glycan moiety and glycosylation site was obtained within one analytical run with the use of alternating high and low collision energy (MSE) [182]. Precautionary measures have to be taken when analyzing the glycan moiety from the CID spectra of glycopeptides, because of the possibility of fucose rearrangement, whose products could be mistaken for genuine b- and y-ions [196,197].
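The use of oxonium ions as a marker for glycosylation can be sketched as a simple spectrum filter. The example below is illustrative only: the monoisotopic m/z values for the HexNAc (nominal m/z 204) and HexNAc + Hex (nominal m/z 366) ions, the mass tolerance and the intensity threshold are assumptions chosen for the sketch, and the function names are hypothetical.

```python
# Monoisotopic m/z of two singly charged oxonium ions (assumed values).
OXONIUM_IONS = {
    "HexNAc": 204.0867,
    "HexNAc+Hex": 366.1395,
}


def find_oxonium_ions(peaks, tol=0.02, min_rel_intensity=0.01):
    """Return the names of oxonium ions present in an MS/MS spectrum.

    peaks: list of (m/z, intensity) tuples.
    tol: absolute m/z tolerance in Th.
    min_rel_intensity: minimum intensity relative to the base peak.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    found = []
    for name, target in OXONIUM_IONS.items():
        if any(abs(mz - target) <= tol and intensity >= min_rel_intensity * base
               for mz, intensity in peaks):
            found.append(name)
    return found


def looks_like_glycopeptide(peaks, **kwargs):
    """Flag a spectrum as a glycopeptide candidate if the HexNAc ion is seen."""
    return "HexNAc" in find_oxonium_ions(peaks, **kwargs)
```

A filter of this kind only nominates candidate spectra; peptide and glycan assignment still relies on the y-type and peptide backbone fragments discussed in the text.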
Newer fragmentation techniques, such as electron-activated dissociation (ExD), electron capture dissociation (ECD) and electron transfer dissociation (ETD), preserve labile PTMs such as glycosylation [198–200] but selectively fragment the peptide backbone, allowing the determination of peptide sequence and N-glycan attachment sites through the generation of peptide fragment ions and a mass shift resulting from the attached glycan [201]. ExD complements CID, and the two techniques can be combined to obtain fuller information about the glycopeptide. However, ExD only works efficiently for ions of high charge states and in the low m/z range; one way to circumvent this would be the addition of m-nitrobenzyl alcohol to the mobile phase to increase the charge state of glycopeptides [202]. Analysis of the glycopeptide can also be performed by MALDI-MS/MS to provide information on the peptide and glycan moieties. Depending on the configuration of the instruments, various distinct fragment ions can be observed. With the use of MALDI-ToF/ToF, cleavage of the side-chain amide bond of glycosylated asparagine usually occurs, giving a diagnostic peak together with b- and y-ions from the peptide backbone. Advances in instrumentation, particularly fragmentation techniques, now relocate the major bottleneck in glycopeptide analysis to the software solutions that are needed for high-throughput, high-confidence MS interpretation. This point will be discussed below.

Analysis of released glycans
The determination of monosaccharide composition, glycosidic linkage and relative abundance of a glycan in a mixture (Fig. 1) is highly relevant to understanding the safety, efficacy and batch-to-batch consistency of glycoprotein therapeutics. Depending on the exact analytical requirement, glycans released from a protein carrier can be profiled, characterized and/or analyzed in-depth. Glycan analysis poses significant technical challenges owing to the lack of a chromophore in the natural carbohydrate molecules and structural heterogeneity caused by anomericity, linkage positions and configurations. Therefore, released glycans are often fluorescently labeled for sensitive detection [203] or derivatized to improve the ionization efficiency and minimize in-source or post-source decay in MS analysis [204]. Over the past two decades, high-performance tools have been developed for structural analysis of released glycans, mainly based on high-resolution separations (e.g. HPLC/UPLC, CE) and MS [176].

Liquid chromatography
HPLC-based glycan analytical workflows typically involve fluorescent labeling of glycan species. Several fluorescent tags have been attached to N-glycans, such as 2-aminobenzamide (2-AB) and 2-aminobenzoic acid (2-AA) [203]. Recently, two new fluorescent labels for N-glycans have been reported, namely RapiFluor-MS™ (RFMS) [205] and 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) [206], both based on the rapid reaction of glycosylamine with N-hydroxysuccinimide-activated carbamate. UPLC has greatly reduced the time required for data collection and gives good resolution because, in comparison to HPLC, it uses smaller particles (often less than 2 μm) and can withstand the higher pressures that these require. UPLC has become the method of choice for efficient separation owing to shorter separation time, better resolution, particularly of arm-specific isomers, and relatively easy coupling to mass spectrometers [207,208]. The use of HILIC as a separation mode and a fluorescently labeled dextran ladder as an external calibration standard has enabled fast and highly reproducible glycan analyses across different systems and operators [209,210]. A dedicated experimental database (i.e. Glycobase) contains normalized retention time values (GU values) for a wide collection of glycan species analyzed by HILIC-HPLC/UPLC, in addition to data collected from RP-HPLC and CE-LIF platforms [211]. Coupled with exoglycosidase array digestions (Fig. 9), this has enabled relatively straightforward structural assignment [211,212]. By specifically removing terminal monosaccharides of particular linkages, exoglycosidase array digestion also helps identify co-eluting glycan species (Fig. 9). Recently, a robotized and low-cost N-glycan analysis platform to profile IgG was developed. The platform covers steps from rapid affinity-based IgG purification from cell culture to the labeling of released glycans in a 96- or 384-well format, with excellent reproducibility based on UPLC analysis [139]. The development of high-throughput platforms for the analysis of IgGs will prove to be invaluable for clonal selection of mAb drugs, development of biosimilar candidates and genome-wide association studies [14,213]. An added advantage of UPLC is its ease of coupling with MS technologies, which can be used as an orthogonal method for glycan analysis. One commercially available LC-MS technology for detailed glycan analysis is the UniFi® biopharmaceutical platform. The UniFi® platform incorporates HILIC-UPLC-based glycan separation, fluorescence detection and online high-resolution mass confirmation, as well as integrated informatics for glycan identification based on normalized retention time (expressed as GU values) and accurate mass. The HILIC-UPLC-MS approach is especially powerful in light of a new N-glycan labeling reagent, RFMS, which gives up to 160-times higher MS sensitivity to N-glycans compared with the traditional 2-AB label [205].
Fig. 9. Determining monosaccharide compositions and linkages using exoglycosidase array digestion of N-glycans. In case of ambiguity caused by co-elution of glycan species, an array of exoglycosidases can be used to remove monosaccharide building blocks sequentially from the nonreducing (outer) to the reducing (inner) end. The undigested and digested glycan samples are analyzed by HILIC-UPLC. By following the movement of chromatographic peaks in response to each of the enzymes, one can deduce the sequence and linkages of monosaccharides of the glycan in question. In this figure, we show two IgG-derived N-glycans that showed close retention times, namely A2G2S1 (boxed in red) and Man7 (boxed in green), in the undigested chromatogram. The array of enzymes used in this example includes: Arthrobacter ureafaciens sialidase (ABS), bovine testes β-galactosidase (BTG), bovine kidney α-fucosidase (BKF) and β-N-acetylglucosaminidase (GUH). Depending on the specificity of the exoglycosidase, co-eluting glycans can be separated. In this case, ABS treatment of A2G2S1 removed the sialic acid and resulted in a forward shift of 0.7 GU (indicated by the red arrow), which is characteristic of a Neu5Ac residue. BKF treatment on top of ABS did not change the glycan profile, thus confirming the absence of fucosylated glycans in this sample. A further digestion using BTG shifted the A2G2 structure forward by 1.6 GU to the position of A2, which suggested the removal of two Gal residues. Finally, a further digestion by GUH shifted the A2 forward by 1 GU to the position for Man3, which suggested the removal of two GlcNAc residues. Throughout the digestion, Man7 remained at the original elution position (indicated by the green arrow). Taken together, the exoglycosidase array enables the deconvolution of co-eluting structures and in-depth analysis of glycans in a highly quantitative and linkage-specific manner.
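The conversion of a retention time into a GU value against a dextran ladder can be sketched as follows. This is a simplified illustration that assumes linear interpolation between the two flanking ladder peaks; a smooth polynomial or spline fit through the ladder is often preferred in practice. The function name and the ladder values used to exercise it are hypothetical.

```python
from bisect import bisect_right


def retention_to_gu(rt, ladder):
    """Convert a retention time to glucose units (GU).

    ladder: list of (retention_time, GU) pairs from the dextran ladder,
            sorted by increasing retention time.
    Uses linear interpolation between the flanking ladder peaks.
    """
    times = [t for t, _ in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated ladder range")
    i = bisect_right(times, rt)
    if i == len(times):          # rt coincides with the last ladder peak
        return float(ladder[-1][1])
    t0, g0 = ladder[i - 1]
    t1, g1 = ladder[i]
    return g0 + (g1 - g0) * (rt - t0) / (t1 - t0)
```

Because GU values are normalized against the ladder run on the same system, peak shifts after each exoglycosidase digestion can be compared directly in GU, independently of the absolute retention times of a given instrument.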

Capillary electrophoresis
Capillary electrophoresis with laser-induced fluorescence (CE-LIF) is another powerful technique for analyzing released N-glycans that are typically labeled with 8-aminopyrene-1,3,6-trisulfonate (APTS). The separation of the labeled glycans is based on the charge:hydrodynamic-volume ratio. Similar to HILIC-HPLC/UPLC, there is an experimental database of normalized migration values, expressed in GU, of glycan species for CE-LIF-based analytical platforms (http://glycobase.nibrt.ie/) [214]. CE-LIF can be used alone or in conjunction with UPLC as a 2D method [214]. Like UPLC, CE-LIF can be coupled with MS. However, this coupling remains challenging and is less explored in the field of glycomics.
A similar technique to CE-LIF is capillary gel electrophoresis with laser-induced fluorescence (CGE-LIF) based on a DNA sequencer for APTS-labeled-glycan analysis [215]. This technique is capable of automated high-throughput glycoprofiling. However, owing to the lack of a publicly available database, assignment of the glycan structures remains a challenge.

Mass spectrometry
The most widely used ionization methods for glycan analysis by MS are MALDI and ESI. MALDI is a fast and robust method to characterize and profile neutral glycans. However, MALDI-MS leads to dissociation of labile glycosidic bonds such as those of sialic acids, hence the need for the glycans to be permethylated or the sialic acid residues derivatized [216]. ESI is a milder ionization method that can be coupled to online separation methods such as LC. Although ESI is capable of ionizing native glycans, it still suffers from poor ionization of the hydrophilic glycans. In many cases, single-stage mass spectrometry (MS1) cannot confidently assign structures to glycan-related ions because of ambiguity arising from isobaric and isomeric structures. MS/MS, using common fragmentation techniques such as CID, can resolve some of these issues but is limited in its ability to distinguish some structural isomers. Moreover, it has been shown that the use of low-energy CID on the protonated adduct of fucosylated N-glycans could lead to misleading fragment ions, caused by the rearrangement of fucose residues from other parts of the molecule [196]. Recently, Harvey et al. have shown that negative ion fragmentation provides more structural information (e.g. cross-ring fragmentation) than the positive ion mode, without the need to derivatize N-glycans [217,218]. A more recent fragmentation technique, electronic excitation dissociation (EED), has been implemented to give a more sensitive and higher-throughput analysis. Yu et al. have demonstrated that permethylation and labeling of the reducing end, together with the high resolving power of MS, enabled them to process accurately the complex yet rich structural information from EED spectra [219]. Negative electron transfer dissociation (NETD) is also a very useful fragmentation technique for the analysis of glycosaminoglycans (GAGs), because it results in a lower level of sulfate loss [220].
Advances in nanoflow LC-MS have led to the development of chip-based technologies (nanoHPLC chip-MS) that promise faster and more-sensitive glycoprofiling strategies for the analysis of glycans from monoclonal antibodies [144,221]. Recent publications demonstrate the use of ion mobility separation as an added dimension to MS (IMS-MS) for separating isobaric N-glycans from a mixture. The glycans separate on the basis of mass, charge and shape, which together determine their unique collisional cross-section (CCS) [218]. This provides complementary information to the existing analytical methods. The potential of coupled technologies, such as LC-IM-MS, in the separation and structural analysis of complex carbohydrates has already been demonstrated, where the CCS of glycans and their fragments can be estimated from a set of calibration data (e.g. a dextran ladder or negatively charged N-glycans and their fragments) [218,222]. Yamaguchi et al. have demonstrated the use of 3D methods, for example UPLC-IMS-MS, to separate pyridylaminated biantennary isomeric N-glycans successfully [223], which could also prove to be a valuable tool in future glycomic studies.

Informatics for glycosylation analysis and pathway modeling
The development of powerful analytical tools such as CE, HPLC and MS has significantly promoted glycan data accumulation [204,224]. In response, numerous software tools and databases have emerged to facilitate glycan data analysis. For example, Glycobase (http://glycobase.nibrt.ie/) documents the experimental analytical data for a large collection of N- and O-glycans generated from liquid chromatography and capillary electrophoresis [211]. UniCarb-DB curates data for glycans analyzed by LC-MS/MS [225]. There are several excellent reviews on currently available software and databases [212,226–228]. This section does not aim to review these databases but rather to focus on automated methods for MS-based glycan and glycopeptide analyses, the latter expected to gain more importance as a result of new antibody-based fusion protein drugs that contain multiple N- and O-glycosylation sites [10]. This will be followed by a review of current methods for glycosylation pathway modeling and analysis.

Data interpretation of MS-based glycosylation analysis
A multi-institutional study comparing glycan-profiling methods shows that MS is an efficient method for the characterization of glycans [229]. However, the automation of MS spectra interpretation is still a bottleneck in MS-based glycomics projects [230]. Several software platforms have been developed in the past few years to serve this purpose. Here, we summarize some of the currently available tools for glycan compositional analysis from MS spectral data. Table 3 summarizes the key search engines for glycosylation data processing based on library matching. GlycoMod is a web tool designed to find all possible glycan compositions for an experimental mass [231]. It is based on a pre-generated in silico library of glycan masses. By comparing the experimental mass with the theoretical masses stored in the library, GlycoMod returns all compositions consistent with that experimental mass. It supports parameter restrictions such as type of glycan, derivatization and number range for each monosaccharide, and also supports queries on glycopeptides when the peptide sequence is known. Input experimental masses are confined to singly charged ions or neutral species and, because biological filters are lacking, implausible structures are also included in the hit list. To avoid returning compositions that are not biologically relevant, GlycosidIQ [232] matches experimental data with theoretical glycofragment masses generated from GlycoSuiteDB (which is now integrated into UniCarbKB). Meanwhile, GlycoFragment and GlycoSearchMS, a set of web tools for interpreting MS/MS spectra [233,234], have also been developed; they employ mass fingerprinting and match experimental masses against a library of theoretical spectra extracted from SweetDB, which comprises N-linked and O-linked glycans. A similar tool that searches against carbohydrate databases with glycans reported in the literature is GlycoPep DB [235].
GlycoSpectrumScan [236] determines the glycoform compositions of glycoproteins from single-stage MS data, using a library of potential peptide masses. The user needs to input a list of possible oligosaccharide compositions of the sample and a list of peptide masses obtained by in silico digestion of the sample protein. This software supports analysis of multiply charged ions, identification of N- and O-linked glycoforms and glycan assignment for proteins with multiple glycosylation sites.
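The mass-to-composition lookup that underlies composition search tools can be illustrated with a brute-force enumeration. The sketch below is not the algorithm of GlycoMod or of any other tool named above; it assumes underivatized, neutral glycans with a free reducing end, and the monoisotopic residue masses, the tolerance and the function names are assumptions made for the illustration.

```python
from itertools import product

# Monoisotopic residue masses in Da (assumed values for this sketch).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,
    "Neu5Ac": 291.0954,
}
WATER = 18.0106  # added once for the free reducing end


def composition_mass(composition):
    """Neutral monoisotopic mass of a free glycan with the given composition."""
    return sum(RESIDUE_MASS[r] * n for r, n in composition.items()) + WATER


def find_compositions(target_mass, max_counts, tol=0.01):
    """Enumerate all compositions within the user-defined count limits
    whose mass matches target_mass within tol (Da)."""
    names = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        composition = dict(zip(names, counts))
        if abs(composition_mass(composition) - target_mass) <= tol:
            hits.append({r: n for r, n in composition.items() if n})
    return hits
```

As the text notes for library-based tools without biological filters, an enumeration of this kind can return compositions that are chemically possible but biologically implausible, so the count limits and any downstream filtering matter as much as the mass tolerance.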

Database search
Cartoonist is an algorithm that helps to label MALDI-MS peaks with cartoons for permethylated N-linked oligosaccharides released from glycoproteins [237]. It matches peaks to potential glycans by comparing a table of glycans with theoretical mass and abundance of isotopes. Rather than arbitrarily cutting off the low-intensity peaks, Cartoonist uses isotope envelopes as a filter for mass matching. Different databases are used for different species, reflecting restrictions imposed by the sample source. Further modifications have been added to enable O-linked oligosaccharide analysis from MS/MS data [238]. Apart from released glycan analysis, N-glycopeptide identification is achieved by combining single MS-characterized N-glycans with MS/MS-identified glycosylated peptides [239]. These programs help to specify oligosaccharide composition but do not give information concerning the type of bond or the isomeric form of the oligosaccharide.
Web tools matching against theoretical product ions include GlycoPep ID, GlycoPep Grader, GlycoPeptideSearch (GPS), GlycoPep Detector and GlycoMaster DB. GlycoPep ID is specialized in identifying peptide portions from glycoproteins when nonspecific enzymes are employed [240]. It calculates the theoretical m/z of all plausible product ions for the input glycoprotein and compares the result with CID data for a match. No scoring algorithm is included in this method to screen out the most likely glycopeptide candidate. GlycoPep Grader [241], another free web tool that helps to determine glycopeptide composition using MS/MS data, scores glycopeptide candidates using CID data in a charge-state-free manner. GPS is also a program for glycopeptide characterization using CID data [242]. MS/MS data are screened for glycosylation oxonium ions at m/z 204 and 366, followed by matching of intact-peptide fragment ions with an in silico digested peptide list. Glycan structures are deduced by matching glycan compositions in GlycomeDB with the precursor masses after subtraction of previously identified peptide masses. GlycoPep Detector [243] is a web tool that targets ETD data analysis. It first generates a library of all possible fragment ions for known glycopeptides and then assigns a score for each input composition after searching the m/z values of its c-, z- and y-ions against the library. Only one MS/MS spectrum is processed at a time. GlycoMaster DB [244] is software for processing HCD and ETD spectra when interpreting MS/MS data of N-linked glycopeptides. It simultaneously searches a library of glycan structures from GlycomeDB and a library of protein sequences specified by the user. Besides searching for N-linked glycopeptide diagnostic peaks, GlycoMaster DB incorporates ion ladder checks to confirm the glycopeptide spectrum. If ETD data are provided, separate lists of potential glycans and peptides are computed independently. Glycopeptides are finally identified by scoring all pairs of glycan and peptide. Methods relying on glycan databases are limited by the lack of comprehensive collections of glycan structures and are insufficient to discover new glycan structures that have not yet been reported. Another limitation of performing database searches for glycoprotein identification is the need to know the potential protein sequences in advance.
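The idea of pairing candidate peptides with candidate glycans against a precursor mass can be sketched in a few lines. This is a toy illustration and not the scoring scheme of GPS, GlycoMaster DB or any other tool described above: candidates are ranked only by precursor mass error in ppm, glycan masses are assumed to be residue sums (without the water of a free reducing end) and the function name is hypothetical.

```python
def pair_glycopeptide(precursor_mass, peptides, glycans, tol_ppm=10.0):
    """Rank (peptide, glycan) pairs whose summed mass matches the precursor.

    precursor_mass: neutral monoisotopic mass of the glycopeptide (Da).
    peptides: dict of peptide name -> neutral monoisotopic mass (Da).
    glycans: dict of glycan name -> sum of residue masses (Da), so that
             peptide mass + glycan mass equals the glycopeptide mass.
    Returns a list of (peptide, glycan, ppm_error), best match first.
    """
    hits = []
    for pep_name, pep_mass in peptides.items():
        for gly_name, gly_mass in glycans.items():
            theoretical = pep_mass + gly_mass
            ppm = (precursor_mass - theoretical) / theoretical * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((pep_name, gly_name, ppm))
    return sorted(hits, key=lambda hit: abs(hit[2]))
```

In a real workflow the precursor mass alone rarely suffices, which is why the tools above additionally score fragment ions from CID, HCD or ETD spectra before reporting a glycopeptide.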

De novo sequencing
To discover glycan structures that are not registered in any existing glycan databases, while avoiding too many implausible structures being assigned, approaches independent of database search have been invented to facilitate automatic interpretation of MS data for released glycans and glycoproteins (Table 4).
STAT is a pioneering software tool that uses MS/MS data for oligosaccharide composition analysis [245]. It determines sequence information for glycans of up to ten monosaccharide residues. After the user specifies the monosaccharide components, charge carrier, precursor and product ion masses, STAT generates all possible topologies for potential compositions. A rating system is used to list the most likely structures based on the number of bond cleavages. The limitation of this method is that the user needs to input information at each step and the determination of large N-glycans is not supported. N-glycans containing bisecting GlcNAc are also beyond its analytical scope. GLYCH (GlycanCharacterization) is an algorithm for automated MS/MS glycan identification and characterization [246]. It utilizes cross-ring fragmentation patterns to derive linkage information between monosaccharide residues and distinguishes isomeric oligosaccharides from mass spectra. Low-intensity peaks are removed before processing the spectrum in GLYCH, which might lead to the loss of peaks that are informative for the glycan structure.
Glyco-Peakfinder is a web application that determines glycan compositions by de novo annotation of glycan MS spectra [247]. It enables users to specify possible cross-ring fragments, potential modifications and charge state. Similar to GlycoFragment, theoretically possible fragments of a carbohydrate structure are enumerated. The difference is that Glyco-Peakfinder does not depend on glycan databases with theoretically computed masses or prior experimentally verified structures. All possible compositions are computed based on user-defined constraints. Because no scoring function is applied, the user must judge the most probable structure when a single peak receives multiple annotations. Automatic peak detection is not covered, so the user has to specify the monoisotopic m/z values. De novo sequencing tools can be used to find new structures not reported previously, but expert knowledge is required to pick a proper solution that matches the experimental data.

Glycosylation pathway analysis
With the growing number of databases accumulating experimental data of glycan structures, there is a need to link these structures to our existing knowledge of genes and proteins to understand complex cellular processes. To associate chemical attributes of glycans with genomic information, several groups have developed glycan-related pathway-analysis tools. This section will describe these resources.

GlycoVis
The main biosynthetic pathways of N-linked glycans have been well studied in mammalian cells. A relatively small number of enzymes and nucleotide sugars are involved but single enzymes sometimes utilize multiple substrates so that the biosynthetic pathway is more like a network of 800 or so proteins [248]. Hossler et al. created a program, called GlycoVis, to visualize the relevant reaction pathways leading to a distinct N-linked glycan structure. Distribution of glycan structures on reaction pathways is also displayed in different colors [249].

Glyco-Net
Glyco-Net is a dynamic functional presentation of relationships among various biological molecules and biological events. It is part of the Glycoconjugate Data Bank [250]. Besides glycans, biological molecules such as carbohydrate-binding proteins, genes, inhibitors, lipids and glycosphingolipids are all included. Both diseases and biosynthesis are treated as biological events. External links are also provided to obtain detailed information on biological molecules. Glyco-Net serves as an interface between different biological databases and dynamically outputs the functional network among biological molecules.

KEGG GLYCAN
KEGG GLYCAN is a nonredundant glycan database that stores carbohydrate structures that were originally derived from CarbBank [251]. It is a comprehensive glycome informatics resource integrating a knowledge-base of protein networks, genomic information and chemical information [252]. This resource includes: (i) GLYCAN, a database of glycan structures; (ii) glycan-related pathways; and (iii) composite structure map (CSM), a tool that illustrates all possible variations of carbohydrate structures. KEGG GLYCAN provides a structural BLAST-like tool called KCaM. It aligns two glycan tree structures and finds the maximum common branches. It is also used to clean redundant entries obtained from CarbBank by merging them into single entries. Another useful tool in this resource is KegDraw, which enables querying glycan structures by drawing them.
Carbohydrates are synthesized by different types of glycosyltransferases. Each enzyme recognizes certain substrates and catalyzes specific biosynthetic reactions. Once the repertoire of glycosyltransferases is known, the repertoire of glycan structures can be predicted theoretically (and vice versa). Based on this, CSM converts genomic glycosyltransferase information into variations of glycan structures [253].

GlycoVault
GlycoVault is a bioinformatics infrastructure that facilitates visualization, analysis and modeling of glycan pathways [254]. It supports storage of a variety of data formats using relational databases and ontologies. Two glycomics analysis tools are included: GlycoBrowser and GlyMpse.
GlycoBrowser is a biological pathway visualization tool that employs graphical representation of glycan structures in the biosynthetic pathways. Like Glycan Builder [255], GlycoBrowser allows users to build graphic glycan molecules. The difference lies in the restrictions it enforces during graphical editing by means of an ontology. If experimental data or relevant information are present, GlycoBrowser will overlay the experimental data on the displayed pathway.
GlyMpse is an ontology-driven simulation tool that models biochemical pathways. Biochemical entities such as glycans and enzymes are represented as nodes and enzymatic reactions as edges. Simulation datasets comprise metabolic pathways and enzyme kinetics datasets. Pathway simulation is based on firing delays and enzyme concentrations.

Glycan Pathway Predictor (RINGS)
As a part of the resource for informatics of glycomes at Soka (RINGS), Glycan Pathway Predictor (GPP) is a web-based tool that models specified glycans with substrate specificity rules [256]. It implements the mathematical model of N-glycosylation developed by Krambeck et al. [146,147]. GPP can be used to generate potential glycans catalyzed by a select set of glyco-enzymes together with detailed pathways and corresponding enzymes. Taking any glycan structure as an entry point, GPP will retrieve a possible glycan synthesis pathway map with a selected glycoenzyme set.
GPP utilizes the KEGG Chemical Function (KCF) format as the input for glycan structures but enables multiple output formats by using LinearCode®. Because it does not require detailed concentration information for substrates, GPP does not support enzyme kinetic simulations. If the enzyme set is not well chosen or the input format is wrong, GPP might fail to return any result.
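The principle of generating a pathway map from substrate specificity rules and a selected enzyme set can be illustrated with a toy network expansion. The sketch below is not the Krambeck model or the GPP implementation: it tracks only monosaccharide compositions rather than structures, ignores kinetics, and its three rules are deliberately simplified, hypothetical stand-ins for real enzyme specificities.

```python
from collections import deque

# Hypothetical, simplified rules: (enzyme, residue added, applicability test).
RULES = [
    ("GnT-II", "GlcNAc", lambda g: g["GlcNAc"] == 3),
    ("FUT8", "Fuc", lambda g: g["Fuc"] == 0 and g["GlcNAc"] >= 3),
    ("GalT", "Gal", lambda g: g["Gal"] < g["GlcNAc"] - 2),
]


def glycan_key(glycan):
    """Hashable representation of a composition dictionary."""
    return tuple(sorted(glycan.items()))


def predict_pathways(start, rules, enzymes):
    """Breadth-first expansion of the glycans reachable from start.

    Only rules whose enzyme is in the selected set are applied.
    Returns (edges, seen): edges are (substrate, enzyme, product) triples
    and seen is the set of all reachable glycans, both as glycan_key tuples.
    """
    seen = {glycan_key(start)}
    queue = deque([start])
    edges = []
    while queue:
        glycan = queue.popleft()
        for enzyme, residue, applies in rules:
            if enzyme in enzymes and applies(glycan):
                product = dict(glycan)
                product[residue] += 1
                edges.append((glycan_key(glycan), enzyme, glycan_key(product)))
                if glycan_key(product) not in seen:
                    seen.add(glycan_key(product))
                    queue.append(product)
    return edges, seen
```

Removing an enzyme from the selected set prunes every branch that depends on it, and an empty or poorly chosen enzyme set yields no reactions at all, which mirrors the behaviour noted above for an ill-chosen enzyme set.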

Experimental implementation of glycosylation pathway analysis
Glycosylation pathway analysis is based on information about glycan structures, glycosyltransferases, glycosidases and other glycan-binding molecules. New experimental technologies accelerate the accumulation of such fundamental knowledge. A medium-throughput quantitative, real-time reverse-transcriptase PCR (qRT-PCR) platform now allows the verification of glycomic pathway models that associate changes in glycan abundance with the gene expression levels of the corresponding biosynthetic enzymes [248]. Recent development of high-throughput HPLC/UPLC technologies facilitates straightforward glycan analysis and enriches the knowledge-base of glycan structures [210]. Advances in genotyping technologies now permit the identification and validation of millions of new single nucleotide polymorphisms (SNPs), which further enables hypothesis-free genome-wide association studies (GWAS) [257]. A GWAS that incorporated high-throughput glycomics analysis discovered that HNF1 and its downstream target HNF4 play an important part in the regulation of key fucosyltransferase and fucose biosynthetic genes [213]. A more recent GWAS identified five genes, with no prior report of participation in protein glycosylation, that are strongly associated with IgG N-glycosylation and autoimmune diseases [14]. As more pieces of information are added to our knowledge-base, glycomic pathway analysis will become more informative, give better guidance for new discoveries and provide better ways of controlling the cellular pathways in bioprocessing.

Concluding remarks
The biopharmaceutical industry has experienced tremendous growth during the past two decades [12]. mAbs provide a major driving force in this development. The industry will be further propelled by the rise of biosimilars [10]. The cost of goods for therapeutic mAbs has dropped significantly over the years because of advancements in the cell culture processes [91]. In addition, guidelines for regulating the biosimilars have placed an emphasis on glycosylation analysis and comparability [79,83]. Therefore, one major challenge in successfully developing therapeutic mAbs lies in controlling glycosylation. Glycosylation will play an even more important part in the light of biobetter drugs [258]. Producing optimal and consistent glycosylation requires an integrated approach. At present, efforts toward this aim are mainly concentrated on empirical adjustments of upstream cell culture parameters. With the accumulation of knowledge and development of new tools, we see opportunities in the following areas: (i) custom-making glycoforms by editing the glycosylation machinery of the host cells; (ii) development of online and at-line analytical technologies to monitor and adjust the process continuously; (iii) the effective application of QbD and PATs in real-time monitoring and controlling of glycosylation; (iv) glycoform-specific downstream purification strategies; (v) integrated informatics platforms for data processing and model-driven glycosylation predictions; and (vi) systems biology of molecular and cellular events in the cell culture. An integrated approach combining the capabilities of the above areas will truly enable the industry to harness glycosylation for successful biopharmaceutical drug development.