Recent developments in bacterial protein glycan coupling technology and glycoconjugate vaccine design

The discovery of the Campylobacter jejuni N-linked glycosylation system combined with its functional expression in Escherichia coli marked the dawn of a new era in glycoengineering. The process, termed protein glycan coupling technology (PGCT), has, in particular, been applied to the development of glycoconjugate vaccines. In this review, we highlight recent technical developments in this area, including the first structural determination of the coupling enzyme PglB, the use of glycotags for optimal glycan attachment and the possible applications of other glycosylation systems and how these may improve and extend PGCT.


Bacterial N-linked glycosylation background
Sequencing of the Campylobacter jejuni (strain NCTC 11168) genome in 2000 confirmed that a genetic locus hypothesized to be involved in general protein glycosylation (Szymanski et al., 1999) was independent of the lipooligosaccharide and flagellar O-linked glycosylation loci (Parkhill et al., 2000). Central to this region was a gene termed pglB that encodes the oligosaccharyltransferase (OST) PglB (CjPglB), which was found to have significant similarity to the eukaryotic Stt3p protein, an essential component of the eukaryotic OST complex (Zufferey et al., 1995). Subsequent studies performed with C. jejuni pgl mutants led to the identification of the oligosaccharide that is transferred to proteins by PglB (Young et al., 2002) and the characterization of the functions of the genes within the locus (Linton et al., 2005). Functional recombinant expression of the general glycosylation locus in Escherichia coli (Wacker et al., 2002) established, for the first time, that recombinant glycoproteins could be produced in a bacterial host. Additional investigation of CjPglB in the recombinant E. coli expression system demonstrated relaxed glycan specificity, with transfer of a number of polysaccharides to an acceptor protein (Feldman et al., 2005). The acceptor protein must contain the extended glycosylation sequon D/EYNXS/T, in contrast to the short eukaryotic sequon NXS/T (where X and Y are any amino acid except proline) (Kowarik et al., 2006b). The functional recombinant expression of the glycosylation locus combined with the relaxed specificity of CjPglB allows for the production of novel glycoprotein combinations with potential applications in vaccine development.
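The difference between the extended bacterial sequon and the short eukaryotic sequon can be illustrated with a simple sequence scan. The following is a minimal sketch, not a published tool: the function name acceptor_sites and the demonstration sequence are hypothetical, and a sequence match alone says nothing about whether a site is surface-exposed or actually glycosylated.

```python
import re

# The 19 standard amino acids other than proline (one-letter codes).
NOT_PRO = "ACDEFGHIKLMNQRSTVWY"

# Zero-width lookaheads, so that overlapping sequons are all reported.
BACTERIAL = re.compile(rf"(?=[DE][{NOT_PRO}]N[{NOT_PRO}][ST])")   # D/E-Y-N-X-S/T
EUKARYOTIC = re.compile(rf"(?=N[{NOT_PRO}][ST])")                 # N-X-S/T


def acceptor_sites(protein, extended=True):
    """Return 1-based positions of the asparagine in each matching sequon.

    extended=True scans for the bacterial D/E-Y-N-X-S/T sequon;
    extended=False scans for the short N-X-S/T sequon.
    """
    seq = protein.upper()
    # The asparagine is the third residue of the extended sequon
    # and the first residue of the short one.
    pattern, asn_offset = (BACTERIAL, 2) if extended else (EUKARYOTIC, 0)
    return [m.start() + asn_offset + 1 for m in pattern.finditer(seq)]


demo = "MKDQNATLLANGSAA"  # hypothetical sequence, not a real protein
print(acceptor_sites(demo))                  # candidate sites, extended sequon
print(acceptor_sites(demo, extended=False))  # candidate sites, short sequon
```

In the demonstration sequence, both asparagines lie in an NXS/T context, but only the first is also preceded by an acidic residue at the -2 position.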
The most successful human vaccines are often glycoconjugate-derived, where the combination of a protein coupled to a glycan elicits both T-cell-dependent and T-cell-independent responses, evoking a protective and lasting immunity. Examples of such vaccines currently licensed include those to combat Haemophilus influenzae, Neisseria meningitidis and Streptococcus pneumoniae. However, using current technology, it is difficult to produce these vaccines as this requires both purification of capsular polysaccharide glycan from the pathogen and subsequent multistep chemical coupling to a suitable protein carrier. A major advance is a biological approach, where a given glycan is coupled to a target protein using an E. coli recombinant system to produce an inexhaustible and readily purified supply of glycoconjugate vaccine.
In the past few years, a new technique termed protein glycan coupling technology (PGCT) has been pioneered. This allows the production of vast combinations of recombinant protein-glycan structures and has enormous implications for the design of novel glycoconjugate vaccines (Langdon et al., 2009). In this review, we highlight the most recent technological developments that may contribute towards further improvement and optimization of PGCT.

Protein glycan coupling technology (PGCT)
The growing necessity for substituting commercially available polysaccharide-only vaccines with conjugated vaccines is due to the need to shift the host immune response from a T-cell-independent response, responsible for the production of IgM which opsonizes bacteria (Adderson, 2001), to a T-cell-dependent response. This shift can be achieved when an immunogenic carbohydrate is coupled to an immunogenic carrier protein. The current conjugated vaccine production system chemically links a polysaccharide to a protein in vitro. The methodology is expensive and time consuming due to difficulties in polysaccharide isolation and purification as well as yield loss during the multiple rounds of purification (Frasch, 2009).
PGCT can be divided into three procedures. First, the target locus encoding the glycan has to be faithfully amplified, cloned and expressed in E. coli, usually on a suitable plasmid (Plasmid 1). Second, the target carrier protein has to contain the appropriate D/EYNXS/T consensus sequon and be cloned into a suitable plasmid (Plasmid 2). If necessary, the consensus sequon can be engineered onto the target protein. Third, the coupling enzyme (a bacterial OST PglB) has to recognize the reducing end sugar on the target glycan in order to transfer it to the target protein. The gene encoding PglB can be on a third plasmid or added along with the target glycan in Plasmid 1. The plasmids are transformed into an appropriate E. coli host strain to produce a recombinant glycoprotein that can be subsequently purified to form the glycoconjugate vaccine (Fig. 1).
PGCT relies on an OST to transfer a suitable polysaccharide, a given capsule or other glycan to a selected carrier protein. This technology offers several advantages compared to chemical conjugation (Fig. 1): (i) because the given polysaccharide is assembled in E. coli and not subjected to several rounds of purification, it is more likely to be structurally intact; (ii) an inexhaustible source of glycan can be produced; (iii) the use of a recombinant E. coli system eliminates the need to work with pathogenic bacteria; and (iv) the whole process requires only a single purification step, simplifying the procedure and improving overall yield.
However, PGCT also has its limitations. Firstly, CjPglB was demonstrated to transfer only glycans that have a reducing end sugar containing an acetamido group in the C2 position (Wacker et al., 2006). This limitation has a great impact as to which polysaccharides can be transferred onto the carrier protein. For example, compatible reducing end sugars have been identified in E. coli, C. jejuni, Pseudomonas aeruginosa, Shigella dysenteriae and Francisella tularensis; however, polysaccharides present in Streptococcus pneumoniae, Streptococcus suis and Actinobacillus pleuropneumoniae often lack the required acetamido modification. In order to tackle this problem, a search for other PglBs with different specificities is ongoing; also, the recent determination of the Campylobacter lari PglB structure (discussed in detail in this review) may lead the way to engineered PglBs with broader specificities.
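The reducing-end requirement can be expressed as a simple screening rule when shortlisting candidate glycans. The sketch below is illustrative only: the lookup table C2_ACETAMIDO and the function cjpglb_may_transfer are hypothetical names, and the table entries reflect general carbohydrate chemistry (whether the named sugar carries an N-acetyl group at C2) rather than transfer data from the studies cited here. A positive result indicates only that the acetamido requirement is met, not that transfer will occur.

```python
# Hypothetical lookup table: True when the sugar carries an acetamido
# (N-acetyl) group at the C2 position. Entries are general carbohydrate
# chemistry, not experimental results from the cited studies.
C2_ACETAMIDO = {
    "GlcNAc": True,   # N-acetylglucosamine
    "GalNAc": True,   # N-acetylgalactosamine
    "DATDH": True,    # 2,4-diacetamido-2,4,6-trideoxyhexose
    "Glc": False,     # glucose
    "Gal": False,     # galactose
}


def cjpglb_may_transfer(reducing_end_sugar):
    """Screen a glycan by its reducing-end sugar.

    Returns True or False for sugars in the table, and None when the
    sugar is not listed (unknown, so no prediction is made).
    """
    return C2_ACETAMIDO.get(reducing_end_sugar)


for sugar in ("GlcNAc", "Gal", "Rha"):
    print(sugar, cjpglb_may_transfer(sugar))
```

Returning None for unlisted sugars, rather than False, keeps "not compatible" distinct from "not assessed".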
Additionally, the carrier protein needs to be carefully chosen. In commercially available conjugated vaccines, the carrier proteins are often inactivated bacterial toxins from Clostridium tetani and Corynebacterium diphtheriae, which were chosen for their ability to induce an immune response (Dagan et al., 2010). Despite being highly immunogenic, none of the currently used toxins are substrates for PglB because they do not contain the glycosylation consensus sequon. If the consensus sequon is absent, it is possible to engineer it into the protein. For example, exotoxin A from P. aeruginosa is not naturally recognized by CjPglB but, following the addition of two sequons, the modified protein can accept glycans and is now being tested as a glycoconjugate vaccine (Ihssen et al., 2010). Initially, it was considered that care had to be taken when selecting the grafting site in order not to interfere with the structural stability of the protein; however, it has been recently demonstrated that tags containing the ideal consensus sequence (glycotags) can be added to the N- and C-termini of proteins (Fisher et al., 2011). This means that it is not necessary to know the three-dimensional structure of the target protein in order to select an area into which to add a sequon.
PGCT has proved to be a successful approach to produce glycoconjugate vaccines, as there are currently two vaccines in trials. The first glycoconjugate vaccine solely produced in E. coli has completed phase I clinical trials in human volunteers and was developed by GlycoVaxyn AG, a Swiss biotechnology company (8.10.10 press release). This vaccine protects against Shigella dysenteriae O1 and uses exotoxin A as carrier protein. GlycoVaxyn AG have also developed a vaccine against Staphylococcus aureus infection using capsular polysaccharide, demonstrating that the system can also be used for Gram-positive organisms.

C. lari PglB structure and contribution to the development of PGCT
Acquisition of structural data highlights important features of a protein, which facilitates understanding of its enzymic characteristics. In the context of PGCT, structural knowledge of the OST PglB may contribute to future engineering of the enzyme so that it has broader specificity and, potentially, higher activity (Kowarik et al., 2006b).
Recently, Lizak et al. (2011) reported the structure of a complete PglB enzyme from C. lari (Fig. 2a). C. lari PglB is 56 % identical to C. jejuni PglB (Schwarz et al., 2011) and is able to complement the function of C. jejuni PglB in E. coli when co-expressed with the pgl locus. Lizak et al. (2011) confirmed the presence of two domains, a periplasmic domain (amino acid positions 433-712) and a transmembrane domain (amino acid positions 1-432). Apart from the covalent linkage, the two domains are also connected by non-covalent interactions provided by an extended periplasmic loop 1 (EL1) (Fig. 2b). The periplasmic domain contains an α/β fold as previously reported for C. jejuni PglB and Pyrococcus furiosus AglB (Igura et al., 2008; Maita et al., 2010). The transmembrane domain contains 13 membrane-spanning segments connected by short cytoplasmic and external loops, with the exception of the long loops EL1 and EL5. The sequon-binding and catalytic sites are formed by transmembrane segments TM1-4 and TM10-13, which also provide the interface with the periplasmic domain. When PglB is bound to the acceptor peptide, two pockets are formed above the membrane surface; the left pocket binds the peptide and the right pocket is speculated to bind the lipid-linked oligosaccharide (LLO), since it contains the amino acid residues shown to be required for catalysis. The pockets are connected where the asparagine protrudes into the catalytic pocket (Fig. 2a).
Since the peptide co-crystallized with PglB and formed a loop of almost 180°, it was speculated that a protein containing sequons for glycosylation had to present these to PglB in an accessible, flexible and surface-exposed loop (Kowarik et al., 2006a), as it was observed that the pocket would not accommodate a fully folded protein (Lizak et al., 2011). This observation also confirms the impossibility of having a proline, a rigid amino acid, at the +1 position of the sequon D/EYNXS/T, which is in agreement with previous experimental evidence (Kowarik et al., 2006b).
The structure also elucidated the necessity for a serine or a threonine at the +2 position of the sequon, as the β-hydroxyl group establishes three hydrogen bonds, one with each of the side chains of the W463 W464 D465 motif (Fig. 2c). These hydrogen bonds saturate the hydrogen-bonding capacity of the β-hydroxyl group, contributing to the physical separation of the +2 threonine from the acceptor asparagine residue. This observation is remarkable because it demonstrates that, although essential for glycosylation, the +2 threonine is not directly involved in catalysis as previously thought but rather in the PglB glycopeptide specificity. It is also of note that when the sequon contains a threonine at the +2 position, glycosylation occurs 40 times more efficiently than when a serine is at the +2 position, owing to the van der Waals interactions established with I572 of PglB. This residue has been suggested as belonging to the motif MXXI; however, the residue is only conserved in bacteria. This indicates that isoleucine 572 is not the only residue able to establish van der Waals interactions with the residue in the +2 position. Previous literature reported that glycosylation is efficient if the sequon also contains a negatively charged residue (Asp or Glu) in the -2 position. The consensus sequon would therefore be D/EYNXS/T (Kowarik et al., 2006b). The negatively charged residue establishes a salt bridge with residue R331 of PglB (Fig. 2c), strengthening the PglB-peptide bond; however, residue R331 is conserved only in bacteria.
PglB activity is dependent on the presence of a divalent cation, which may be Mn2+ or Mg2+ (Lairson et al., 2008). The PglB catalytic pocket contains three residues with acidic side chains, D56, D154 and E319, which are likely to coordinate the divalent cation (Fig. 2c). D154 and D156 belong to the previously reported motif DXD and were shown to be catalytically relevant (Liu & Mushegian, 2003). In mutagenesis studies performed with a mannosyltransferase, an enzyme from the same superfamily as PglB, when either of these residues in the DXD motif was substituted by an alanine, the enzyme was inactive (Maeda et al., 2001).
On the other hand, residues D56 and E319 do not appear to be involved in catalysis; however, their carboxyl groups seem to interact with both the divalent cation and the amido group of the asparagine (Fig. 2c). The importance of the residues D56, D154 and E319 was clarified when the mutations D56A, D154A and E319A were found to severely impair PglB activity (Lizak et al., 2011). Substitution of acidic side chains by iso-electronic amides, such as those present in the side chains of asparagine and glutamine, demonstrated that the negative charges provided by the carboxyl groups of aspartic acid and glutamic acid are essential for PglB activity.
Based on the solved structure, Lizak et al. (2011) proposed a three-step mechanism for glycosylation. The procedure starts with the sequon on the protein binding to PglB. Once bound, EL5 is thought to immobilize the bound peptide against the periplasmic domain of PglB, restricting movement. E319 is part of EL5 (Fig. 2b, c) and, therefore, this action results in the formation of the catalytic site, where the asparagine is oriented and amide activation occurs. Finally, binding of the LLO will result in nucleophilic attack by the activated amide, resulting in glycosylation.
The elucidation of the C. lari PglB structure is a new and exciting development in the field of glycobiology. Apart from allowing an expansion of knowledge of the mechanism underlying glycosylation, it also opens the way for manipulating PglB enzymes in order to achieve different specificities, both for peptides and glycans, and perhaps enhanced enzymic activity, leading to an improved and wider application of PGCT.

Identification of novel bacterial N-glycosylation machineries in other bacteria
Identification and functional analysis of novel PglB orthologues from other bacterial species may lead to the identification of an OST with altered glycan substrate specificity. This would aid in overcoming some of the limitations of CjPglB for use in PGCT. To date, over 40 orthologues of CjPglB have been identified using different approaches (Nothaft & Szymanski, 2010), all of which have been found in δ- and ε-Proteobacteria. Many other species of Campylobacter have been found to possess orthologues to CjPglB; however, only one has been functionally characterized to date. The pgl locus of C. lari was shown to be functionally active in E. coli, and transfer by ClPglB of an oligosaccharide similar to that of C. jejuni was demonstrated. Additionally, ClPglB appears to possess a more relaxed acceptor protein specificity than CjPglB, as it glycosylated two asparagine residues not located within a 'classical' five amino acid acceptor sequon as well as a native E. coli protein (Schwarz et al., 2011). ClPglB was only tested with the native glycan as a substrate, so it is possible that the enzyme may be able to transfer different glycans. Additionally, two PglB paralogues from non-Campylobacter species have recently been functionally characterized. A paralogue from Helicobacter pullorum has been demonstrated to partially complement a mutation in CjPglB in the recombinant E. coli system (Jervis et al., 2010). However, the acceptor protein specificity of HpPglB was found to be different from the specificity of CjPglB, as only two of the four asparagine residues glycosylated by CjPglB were modified by HpPglB. HpPglB was demonstrated to transfer the C. jejuni heptasaccharide in vivo in E. coli as well as the native H. pullorum glycan in vitro (Jervis et al., 2010).
No further data were provided on the alternative glycan specificity of this enzyme, so it is possible that the enzyme may be able to transfer different glycans. A PglB orthologue from the δ-proteobacterium Desulfovibrio desulfuricans was also recently functionally characterized. DdPglB was shown to transfer the C. jejuni heptasaccharide to the acceptor protein AcrA in the recombinant E. coli system (Ielmini & Feldman, 2011). However, a distinct difference in acceptor site specificity was detected. Only one asparagine residue was modified by DdPglB, and mass spectrometry analysis of the acceptor site revealed the presence of a short, eukaryotic-like NXS/T sequon, demonstrating that DdPglB lacked the requirement for an acidic residue at the -2 position. Ielmini & Feldman (2011) also analysed the ability of DdPglB to transfer other glycans to AcrA and identified DATDH (2,4-diacetamido-2,4,6-trideoxyhexose) as well as the E. coli O7 O-antigen polysaccharide, which contains an N-acetylglucosamine residue at the reducing end, as substrates for DdPglB. In addition to the CjPglB orthologues characterized to date, novel methods for high-throughput genome sequencing and an increased interest in micro-organisms from deep-sea vent habitats have identified CjPglB orthologues in several novel species. The first examples of orthologues were found in the deep-sea vent ε-proteobacterial species Nitratiruptor tergarcus SB155-2 and Sulfurovum lithotrophicum NBC37-1 (Nakagawa et al., 2007).
In summary, the functional analyses of bacterial PglB orthologues have demonstrated differences in acceptor protein specificity as well as flexibility in glycan substrate specificity. With increasing numbers of novel orthologues being identified, further functional analysis may serve to overcome the limitations of CjPglB, such as the specificity of the reducing end sugar.

Alternative bacterial glycosylation systems for use in PGCT
Bacterial surface structures such as pili have often been found to be decorated with glycan moieties through an O-linkage to a serine or threonine residue (Craig & Li, 2008). In many cases, such as the flagella of C. jejuni, the decorations are single residue sugars that are attached individually to the protein using nucleotide-activated sugar donors (Ewing et al., 2009). However, P. aeruginosa (Castric, 1995), N. meningitidis (Stimson et al., 1995) and F. tularensis (Egge-Jacobsen et al., 2011) decorate the pilus subunit protein pilin with more complex O-linked glycan structures.
In the case of P. aeruginosa, the PilO protein was demonstrated to be the OST involved in mediating the attachment of the glycan to pilin (Castric, 1995). For F. tularensis, the protein-targeting OST PglA was identified as the enzyme responsible for glycosylation of the PilA protein with a pentasaccharide (Egge-Jacobsen et al., 2011). Recombinant expression of PglA with PilA and a glycan substrate in E. coli resulted in glycosylation of the PilA protein, suggesting a possible application of this system for PGCT. In the case of N. meningitidis, a mutation in a gene termed pglL resulted in the production of pilin protein with reduced electrophoretic mobility (Power et al., 2006). Recombinant expression of PilO or PglL in E. coli, together with the respective pilin protein as well as a lipid-linked glycan, resulted in the production of glycosylated pilin (Faridmoayer et al., 2007). PilO was shown to only process short oligosaccharides, while PglL was demonstrated to be able to transfer long polysaccharides, similar to what has been shown for CjPglB. This suggested that PglL may be an interesting enzyme for use in PGCT. Further work by Faridmoayer et al. (2008) demonstrated that PglL displayed extreme substrate promiscuity and was able to transfer a large range of glycans to pilin, including glycans that have been shown to be unsuitable as a substrate for CjPglB, such as the S. typhimurium O4 O-antigen. However, both reports used N. meningitidis pilin as the acceptor protein, and no information was provided on the ability of PglL to glycosylate other proteins in E. coli or on the presence of an acceptor sequon that is analogous to the one found for N-linked glycosylation. A report by Vik et al. (2009) identified 11 O-linked glycoproteins in N. gonorrhoeae but did not find any consensus sequence surrounding the glycosylated amino acids. Instead, they identified the presence of domains of low complexity surrounding the glycosylated amino acids. This remains a disadvantage regarding the applicability of the O-linked system for PGCT, where rational engineering of any acceptor protein as a glycan carrier is desirable. Further studies on these O-linked glycosylation systems will expand the toolbox of useful OSTs for PGCT.
A surprising finding revealed that the high molecular mass adhesin proteins HMW1 and HMW2 of the Gram-negative γ-proteobacterium non-typeable H. influenzae are N-glycosylated (Grass et al., 2003), thus expanding protein N-linked glycosylation beyond δ- and ε-Proteobacteria (Ielmini & Feldman, 2011; Nothaft & Szymanski, 2010). Around 75-80 % of non-typeable H. influenzae express HMW1 and HMW2 proteins, which are essential in establishing the adherence of the bacteria to the epithelial cells of the respiratory tract of the human host, thus promoting and initiating colonization (St Geme et al., 1993, 1998). Compositional analysis of the protein showed that glycan moieties in the glycosylated HMW1 account for 5 % of its molecular mass (Grass et al., 2003). What is noteworthy is that glycosylation affects not only the molecular mass but also the function of the protein. The absence of glycosylation of HMW1 reduces HMW1-mediated adherence of the bacteria due to the release of HMW1 from the bacterial surface, highlighting the importance of glycosylation in cell-cell recognition (Grass et al., 2003). The key protein in HMW1 glycosylation was found to be HMW1C, which forms a complex with the protein in the cytoplasm, transferring the sugar moiety from the nucleotide-activated sugar to the asparagine residue in the protein, revealing a novel pathway that is different to the conventional lipid-linked sugar intermediate (Grass et al., 2003, 2010). Thus, remarkably, HMW1C is a bifunctional enzyme with both N-glycosyltransferase activity (ability to transfer a sugar moiety to an asparagine residue) and O-glycosyltransferase activity (ability to transfer a donor sugar to an acceptor sugar, forming a hexose-hexose bond) (Grass et al., 2003), with a preference for long polypeptides (Schwarz et al., 2011).
Mass spectrometry of proteolytic fragments covering 89 % of the amino acid sequence of mature HMW1 showed that HMW1 is glycosylated at the asparagine residues of 31 sites with 47 simple non-acetylated hexoses or dihexoses (Gross et al., 2008), which is atypical for prokaryotes and eukaryotes, in which acetylated sugars are transferred. All but one of the 31 sites conform to the common consensus sequence NXS/T, the exception being an asparagine in the DTTFNVER sequence (Gross et al., 2008).
Amino acid sequence analysis has identified proteins with 42-68 % identity and 58-83 % similarity to HMW1C in many Gram-negative pathogens, including Yersinia pestis, Yersinia pseudotuberculosis, Yersinia enterocolitica, enterotoxigenic E. coli, Haemophilus ducreyi, Burkholderia spp. and A. pleuropneumoniae (Grass et al., 2010). Indeed, 65 % identity and 85 % similarity overall was found between the HMW1C protein and the A. pleuropneumoniae protein designated ApHMW1C. ApHMW1C was shown to have a dual function analogous to that of HMW1C (N-glycosyltransferase and O-glycosyltransferase) and was also able to complement the deletion of hmw1C, thus promoting adherence in whole bacteria (Choi et al., 2010). The cytoplasmic N-glycosyltransferase (NGT) may prove to be a valuable new tool and an alternative to the current OSTs. Recently, the structure of ApHMW1C has been elucidated (Kawai et al., 2011), revealing further insights into the structure and function of HMW1C-like proteins.

Engineering proteins to produce novel acceptors for PglB using glycotags
The capacity to engineer any given protein to form an acceptor for PGCT is desirable for downstream uses of recombinant glycoproteins. Using statistical analysis, Kowarik and co-workers determined the required glycosylation sequon for CjPglB to be D/EYNXS/T, where X and Y are any amino acid except proline. Additionally, they demonstrated that the engineering of an extended bacterial N-glycosylation sequon into a flexible loop of cholera toxin B enabled glycosylation of the protein by C. jejuni PglB (Kowarik et al., 2006b). Further analysis by nuclear magnetic resonance spectroscopy demonstrated that the structure of a glycosylatable loop of the native C. jejuni glycoprotein AcrA is very flexible (Slynko et al., 2009). Recent work by Fisher et al. (2011) showed that adding a repeating 'tag' of optimal acceptor sequons to the N- or C-terminus of a protein is sufficient to render the protein a substrate for C. jejuni PglB. This method was tested for up to eight sequons and for several different proteins, including a recombinant murine IgG protein, and all potential glycosylation sites were found to be occupied. It was also demonstrated that the transient presence of the protein in the periplasm was sufficient for glycosylation to occur, and glycoproteins could subsequently be trafficked to the outer membrane or even to the extracellular environment. This novel finding will be crucial for the engineering of future acceptor proteins for PGCT, particularly in the biosynthesis of novel glycoconjugate vaccines (Fisher et al., 2011).
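At the sequence level, the glycotag approach amounts to appending tandem copies of an acceptor sequon to one terminus of the carrier. The sketch below is a hedged illustration rather than the published construct design: the function add_glycotag is a hypothetical name, the default sequon DQNAT is simply one sequence that satisfies D/EYNXS/T and is assumed here for demonstration, and practical considerations such as signal peptides, linkers and codon usage are ignored.

```python
import re

# Extended bacterial sequon D/E-Y-N-X-S/T, where X and Y are not proline.
SEQUON = re.compile(r"[DE][ACDEFGHIKLMNQRSTVWY]N[ACDEFGHIKLMNQRSTVWY][ST]")


def add_glycotag(protein, repeats, terminus="C", sequon="DQNAT"):
    """Return the protein sequence with tandem sequon repeats at one terminus.

    protein  -- one-letter amino acid sequence of the carrier
    repeats  -- number of tandem sequon copies (at least 1)
    terminus -- "N" to prepend the tag, "C" to append it
    sequon   -- five-residue acceptor sequon (illustrative default)
    """
    if not SEQUON.fullmatch(sequon):
        raise ValueError(f"{sequon!r} does not match D/E-Y-N-X-S/T")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    tag = sequon * repeats
    return tag + protein if terminus == "N" else protein + tag


carrier = "MKTAYIAK"  # hypothetical carrier fragment, not a real protein
print(add_glycotag(carrier, repeats=4, terminus="C"))
```

Validating the sequon before building the tag guards against variants, such as a proline at the X or Y position, that would not be expected to act as acceptors.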

Concluding remarks
The development and improvement of PGCT is paramount for the production of optimized multicomponent glycoconjugate vaccines that will be inexpensive to produce and will potentially protect against multiple pathogens. The recent technical developments reviewed here will increase our basic knowledge of bacterial glycosylation mechanisms and contribute greatly to the development of the technology. The discovery of PglBs with broader specificity, as well as the glycotagging technique, will impact on the utility of PGCT. Furthermore, the detailed structure of ClPglB offers insight into the molecular detail, allowing the possibility of altering specificity and activity of this key enzyme. Additionally, improvement of PGCT can also derive from the cytoplasmic ApHMW1C or from the use of O-linked glycosylation. Clearly, understanding the basic mechanisms by which bacteria glycomodify proteins (a previously neglected area of research) will be fruitful for biotechnological applications and could be the dawn of a new era for the design and production of novel and inexpensive glycoconjugate vaccines.

Fig. 2. Schematic of the Campylobacter lari PglB structure. (a) Complete PglB structure (in blue) with bound peptide (in yellow). (b) Topological schematic representing helices and connecting loops. Conserved residues present in the active site are represented by blue hexagons and a red circle. Residue R331 is represented by a yellow circle. (c) Essential residues interacting with the peptide in the catalytic site and in the peptide binding site. Adapted by permission from Macmillan Publishers Ltd: Nature (Lizak et al., 2011), copyright 2011.