Investigation of sequon engineering for improved O-glycosylation by the human polypeptide N-acetylgalactosaminyl transferase T2 isozyme and two orthologues

We have been developing bacterial expression systems for human mucin-type O-glycosylation on therapeutic proteins, which is initiated by the addition of α-linked GalNAc to serine or threonine residues by enzymes in the GT-27 family of glycosyltransferases. Substrate preference across different isoforms of this enzyme is influenced by isoform-specific amino acid sequences at the site of glycosylation, which we have exploited to engineer production of Core 1 glycan structures in bacteria on human therapeutic proteins. Using RP-HPLC with a novel phenyl bonded phase to resolve intact protein glycoforms, we examined the effect of sequon mutation on O-glycosylation initiation through in vitro modification of the naturally O-glycosylated human interferon α-2b and a sequon-engineered human growth hormone. As part of the development of our glycan engineering in the bacterial expression system, we are surveying various orthologues of critical enzymes to ensure complete glycosylation. Here we present an in vitro enzyme kinetic profile of three related GT-27 orthologues on natural and engineered sequons in recombinant human interferon α-2b and human growth hormone, where we show a significant change in kinetic properties with the amino acid changes. Optimizing the protein substrate amino acid sequence using Isoform Specific O-Glycosylation Prediction (ISOGlyP, http://isoglyp.utep.edu/index.php) resulted in a measurable increase in kcat/Km, thus improving glycosylation efficiency. The Drosophila orthologue showed superior activity with our designed human growth hormone sequons compared to the human enzyme.

Biochemical Journal. This is an Accepted Manuscript. You are encouraged to use the Version of Record that, when published, will replace this version. The most up-to-date version is available at https://doi.org/10.1042/BCJ20210382. Thompson et al., Biochemical Journal submission 2021.


Introduction
Biological functions of protein O-linked glycans are essential and extensively varied in mammals. Many proteins with therapeutic applications rely on proper glycosylation for increased serum half-life through decreased proteolysis and a reduction in clearance by hepatic lectins. In contrast to serum-derived therapeutic proteins, which are often heavily glycosylated with N-linked glycans (1), proteins in the cytokine or hormone families have simple O-glycan structures. These are often missing in the recombinant versions because they are produced in bacterial expression systems, leading to the need for chemical modification with polyethylene glycol polymers to sustain serum half-life (2). A recent paper on human peptide hormone O-glycans shows that these glycans can protect peptide hormones from proteolysis, leading to increased serum half-life and better bioavailability (3). Recently we developed an expression platform to produce cytokines and hormones in a bacterial expression system, with the goal of combining ease of production with authentic human O-glycan structures (4). Our test proteins are the natively glycosylated interferon α-2b (IFNα2b) and human growth hormone (hGH), which carries an (inactivating) O-glycan on ~50% of the 24 kDa isoform produced in humans (5). These proteins share a common four-helix bundle fold (Figure 1) and offer an excellent platform to explore sequon engineering for the optimized addition of O-glycans (IFNα2b), or the introduction of a new glycosylation site (hGH), as potential means of increasing the circulating half-life of these therapeutic proteins. We are currently optimizing this expression platform so that we can use synthetic biology to produce complex O-glycans on a variety of therapeutic proteins and peptides. Those O-glycans which begin with an α-linked N-acetylgalactosamine residue are often referred to as mucin-type O-glycans, owing to the heavy presence of these glycans on all mucins.
The O-GalNAc glycans come in 8 core types and can be a single monosaccharide or large, branched structures (6). Humans have a family of 20 polypeptide α-N-acetylgalactosaminyltransferases (GalNAc-Ts) which add the first residue of these O-glycan core structures. These have been extensively reviewed (7-9), and it has been observed that these enzymes are redundant yet specific in human glycobiology (10).
The GalNAc-T enzymes belong to the CAZy GT-27 glycosyltransferase family (11). They catalyze the covalent addition of an α-GalNAc moiety from a UDP-α-D-GalNAc donor onto a serine or threonine residue, generating the structure known as the Tn antigen, which is then further elongated by multiple glycosyltransferases in the Golgi. GalNAc-Ts have distinct catalytic and lectin domains, which are important in the buildup of complex mucin-type glycosylation where multiple residues are glycosylated (7). Within the diverse GT-27 glycosyltransferase family, the GalNAc-T2 isozyme is conserved across diverse eukaryotic organisms, indicating the importance of these O-glycans. This glycosyltransferase has multiple substrate specificities and has been shown to be promiscuous (12). Unlike their N-linked counterparts, O-GalNAc glycans do not have a precise consensus amino acid sequence (sequon) where the transfer takes place. Some general considerations have emerged based on natural sites of glycosylation as well as extensive synthetic peptide modification work (8). The consensus observations are a strong bias for a proline three residues C-terminal to the site of glycosylation and a strong preference for threonine over serine (13). This was also validated in vivo in tissue-cultured cells (14). As with other GalNAc-Ts, the crystal structure of human GalNAc-T2 shows a proline pocket and other features which are consistent with the observed sequon preference (15).
Much has been published about the sequence specificity of the GalNAc-T family (7,16). This led to the development of the sequon utilization prediction software Isoform Specific O-Glycosylation Prediction (ISOGlyP, found at https://isoglyp.utep.edu/), which was developed by Gerken and co-workers (8) and can be used to tune the sequon to improve site occupancy in vivo (4). This is a boon for the synthetic biology approach because the protein glycosylation site can be tailored, rather than having to include many GalNAc-T enzymes.
For our synthetic biology approach to the bacterial addition of Core 1 glycans on therapeutic cytokines and hormones, we need to use enzymes from a variety of organisms. Previous research suggests that GalNAc-T1/T2 are the isozymes likely to initiate O-glycan synthesis (17). As we were focused on creating a single site of glycosylation, we concentrated on the GalNAc-T2 isozyme, which had been successfully expressed in bacteria. The expression of functional human GalNAc-T2 in E. coli has been reported (18,19).

Expression and purification of target proteins
The target proteins IFNα2b-WT, IFNα2b-MUT, and hGH-MUT1 have been previously described (4) and are based on the pET21b-6xHis-GB1 expression plasmids. The two additional hGH sequon mutants reported in this work were obtained as synthetic genes (Integrated DNA Technologies) and cloned in the same fashion. IFNα2b-WT/MUT (amino acids 24-188) and hGH-MUT1/MUT2/MUT3 (amino acids 27-201) were expressed in E. coli Origami™ 2(DE3) (Novagen) in 2x YT medium with 150 μg/mL ampicillin. The strains were grown at 37°C to an OD600 of 0.4, then induced with 0.5 mM IPTG at 20°C. After overnight incubation, the cell pellet was lysed and clarified as described for the transferases but using IMAC column binding buffer (50 mM HEPES, 300 mM NaCl, pH 8.0). cOmplete™ His-tag purification resin (Roche Life Science) was used to purify the polyhistidine-tagged GB1-fusion proteins with a gradient
Transferase assays were repeated with two biological replicates of each enzyme, and kinetic parameters were fitted using GraphPad Prism ver. 8.0. The specific activity of hGalNAc-T2 expressed with and without the chaperone hPDI was also assessed in 20 μL reactions containing 50 mM HEPES pH 7.4, 10 mM MnCl2, 1 mM UDP-GalNAc, 10-50 μg/mL enzyme and 0.6 mg/mL IFNα2b-MUT. Reactions were incubated and stopped in the same manner. Target conversion was ≤20%, so these were still initial rates. Specific activity is reported in micromoles/min/mg of enzyme.
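The specific activity calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name, the substrate molecular mass, and the incubation time are hypothetical inputs, since neither value is stated in this section. The only elements taken from the text are the units (micromoles/min/mg of enzyme) and the requirement that conversion stays at or below 20% for initial-rate conditions.

```python
def specific_activity(substrate_mg_per_ml, substrate_kda, fraction_converted,
                      minutes, enzyme_ug_per_ml, max_fraction=0.20):
    """Return specific activity in micromoles product / min / mg enzyme.

    All argument names are illustrative. The reaction volume cancels out
    because both product and enzyme are expressed per mL.
    """
    if not 0.0 <= fraction_converted <= 1.0:
        raise ValueError("fraction_converted must be between 0 and 1")
    if fraction_converted > max_fraction:
        # Above ~20% conversion the measurement no longer reflects an initial rate.
        raise ValueError("conversion too high for an initial-rate estimate")
    if minutes <= 0 or enzyme_ug_per_ml <= 0 or substrate_kda <= 0:
        raise ValueError("time, enzyme concentration and mass must be positive")

    # mg/mL divided by kDa gives micromoles/mL (e.g. 0.6 mg/mL of a 30 kDa protein = 0.02 umol/mL).
    substrate_umol_per_ml = substrate_mg_per_ml / substrate_kda
    product_umol_per_ml = substrate_umol_per_ml * fraction_converted
    rate_umol_per_ml_min = product_umol_per_ml / minutes
    enzyme_mg_per_ml = enzyme_ug_per_ml / 1000.0
    return rate_umol_per_ml_min / enzyme_mg_per_ml


# Hypothetical example: 0.6 mg/mL substrate of assumed mass 30 kDa,
# 10% converted in an assumed 30 min by 10 ug/mL enzyme.
example = specific_activity(0.6, 30.0, 0.10, 30.0, 10.0)
```

The conversion guard is deliberately strict: a reaction that overshoots 20% raises an error rather than returning a rate that would underestimate the true initial velocity.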

One-pot transferase assays with hGH sequon mutants using BODIPY-sialic acid detection
A one-pot synthesis reaction was performed with each of the three hGH sequon mutants as summarized in Figure 2. This one-pot reaction results in the addition of the sialyl-T antigen to the purified protein, where the sialic acid is tagged with a BODIPY fluorophore (20). The assays contained 50 mM HEPES pH 7.0, 10 mM MnCl2, 0.1 mM UDP-GalNAc, 0.2 mM UDP-Gal, 0.2 mM BODIPY-9-Neu5Ac-CMP, 1 mg/mL hGH, 25 µg/mL GalNAc-T, 100 µg/mL T synthase (C1GalTA), and 100 µg/mL pST3Gal1. Again, assays were repeated with two biological replicates of each GalNAc-T, and reactions were incubated and stopped in the same manner. The reactions were again analyzed by intrinsic fluorescence rather than by the BODIPY label, so that both starting material and product could be measured.
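Because both starting material and product are detected in the same chromatogram, the extent of modification can be estimated from integrated peak areas. The sketch below assumes, as a simplification not stated in the text, that the unmodified and modified protein have the same intrinsic fluorescence response per mole; the function name and the peak areas are hypothetical.

```python
def fraction_modified(area_unmodified, area_modified):
    """Return the fraction of protein converted to product from two peak areas.

    Assumes equal detector response for both species (an assumption of this sketch).
    """
    if area_unmodified < 0 or area_modified < 0:
        raise ValueError("peak areas must be non-negative")
    total = area_unmodified + area_modified
    if total == 0:
        raise ValueError("no signal: both peak areas are zero")
    return area_modified / total


# Hypothetical peak areas in arbitrary units.
conversion = fraction_modified(75.0, 25.0)
```

Normalizing to the sum of both peaks makes the estimate insensitive to small differences in injection volume between runs.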

Sequence comparison of GalNAc-Ts from human, Drosophila, and Caenorhabditis
A structure-based sequence alignment (Figure 3) using hGalNAc-T2 as a guide (PDB: 2FFU) indicated highly conserved regions across the three GalNAc-Ts we are investigating. We want to find the one best suited for expression in E. coli and modification of therapeutic proteins. These were the human GalNAc-T2 (hGalNAc-T2), the Drosophila melanogaster GalNAc-T2 (DmGalNAc-T2, 66% identity to human), and the Caenorhabditis elegans GalNAc-T4 (CeGalNAc-T4, 83% identity to DmGalNAc-T2 and 53% identity to hGalNAc-T2). CeGalNAc-T4 was chosen for its similarity to the human T2 isozyme, and was shown to be able to glycosylate peptides derived from human proteins (21). DmGalNAc-T2 has been examined for specificity compared to the human orthologue and was shown to have a similar activity on peptide substrates derived from human proteins (16). Secondary structure elements were extracted from the PDB file 2FFU for hGalNAc-T2 and used as a guide to align the DmGalNAc-T2 and CeGalNAc-T4 orthologues. The structure-based alignment was generated with Expresso (22). The figure was prepared with ESPript 3.0 (23).
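Percent identity values such as those quoted above are computed from a pairwise alignment. The following sketch shows one common definition, identical residues divided by the number of aligned (non-gap) columns. The text does not state which definition or tool produced the reported percentages, so this is an assumption, and the toy sequences are hypothetical rather than real GalNAc-T fragments.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns entirely
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments: five ungapped columns, four of them identical.
identity = percent_identity("ACD-EF", "ACDGEY")
```

Other conventions divide by the full alignment length or by the shorter sequence length, which gives lower values for gapped alignments, so the denominator should be reported alongside the percentage.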
The residues in red are identical for all three orthologues, those in yellow are conservative changes for all three orthologues.

Expression of the catalytic domains of GT-27 orthologues in E. coli
Our expression system makes use of the E. coli maltose binding protein as a fusion partner (24).
We used synthetic genes which were designed to remove the sequences for the type II membrane anchor and part of the stem region, as well as the lectin domain. The purity of the proteins is shown in Figure 4. Some proteolytic fragments can be seen in the profiles. The left panel shows an intact protein reaction (IFNα2b-MUT) with purified hGalNAc-T2 expressed without hPDI; the right panel is the same reaction but with purified enzyme expressed with hPDI.

Using ISOGlyP to engineer the sequons
The target proteins IFNα2b and hGH have similar protein folds in that they both have a core of four alpha-helices, with an unstructured loop on the side of the protein facing away from the receptor (IFNα2b PDB 3S9D) (26) (hGH PDB 3HHR) (27). IFNα2b is naturally glycosylated on Thr106 (28), which lies within a flexible loop that we analyzed with ISOGlyP (29) to see what changes would be required to increase the efficiency of hGalNAc-T2. As hGH is not normally O-glycosylated on the equivalent loop, we had to engineer a site for hGalNAc-T2 to use it as a substrate. As we previously published (4), a two-amino-acid substitution in IFNα2b resulted in vastly improved bacterial glycosylation (>85% modification), and a four-amino-acid rearrangement in hGH (Table S1) gave >60% protein modification for bacterial glycosylation. Still, the mechanism by which better overall glycosylation was achieved remained unclear, so we undertook to examine enzyme kinetics on the intact proteins to assess changes in enzyme parameters. Additionally, the initial engineered sequon in our hGH-MUT1 protein was suboptimally glycosylated compared to that of IFNα2b-MUT. With hGH-MUT1, we found that the first Thr, despite its higher ISOGlyP score, was not the preferred site of addition (4). We sought to improve on this by creating two more sequon variants based on much higher ISOGlyP scores (Table 1). The bold T is the threonine the GalNAc-T2 score refers to.

Enzyme kinetic assays on intact proteins
In our previous work with bacterial glycosylation, we showed that reversed phase HPLC was a valuable quantitative tool for the analysis of protein modification (4). We compared three target proteins and three GT-27 orthologues, and the data are summarized in Figure 6 and  Reaction rates are substantially higher for all three enzymes on the sequon mutant, and the sequon engineered version of hGH is used by all three orthologues. The sequon being modified is shown below each graph. Biological and technical replicates were used, and the error is shown as standard deviation on the graphs. reactivity than what was observed with the IFNα2b-WT sequon, but not as big a change as we observed for the modified IFNα2b.
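To make the kinetic comparison concrete, the sketch below estimates Vmax and Km from initial rates and then derives kcat and kcat/Km. The kinetic parameters in this work were generated with GraphPad Prism; the Hanes-Woolf linearization used here ([S]/v plotted against [S]) is a simpler stand-in chosen so that the example needs only the standard library, and it is not the authors' fitting procedure. Function names, units, and the data points are all hypothetical.

```python
def fit_michaelis_menten(substrate_uM, rates_uM_per_min):
    """Estimate (Vmax, Km) by Hanes-Woolf linear regression.

    [S]/v = [S]/Vmax + Km/Vmax, so slope = 1/Vmax and intercept = Km/Vmax.
    """
    n = len(substrate_uM)
    if n != len(rates_uM_per_min) or n < 2:
        raise ValueError("need at least two paired measurements")
    if any(v <= 0 for v in rates_uM_per_min):
        raise ValueError("rates must be positive")
    x = list(substrate_uM)
    y = [s / v for s, v in zip(substrate_uM, rates_uM_per_min)]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("substrate concentrations must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def catalytic_efficiency(vmax_uM_per_min, km_uM, enzyme_uM):
    """Return (kcat in 1/min, kcat/Km in 1/(uM*min))."""
    kcat = vmax_uM_per_min / enzyme_uM
    return kcat, kcat / km_uM


# Hypothetical, noise-free data generated with Vmax = 2.0 uM/min and Km = 50 uM.
conc = [10.0, 25.0, 50.0, 100.0, 200.0]
rates = [2.0 * s / (50.0 + s) for s in conc]
vmax_fit, km_fit = fit_michaelis_menten(conc, rates)
kcat_fit, efficiency = catalytic_efficiency(vmax_fit, km_fit, enzyme_uM=0.1)
```

Linearized fits weight experimental error differently from direct nonlinear regression, so with real, noisy data the two approaches can give somewhat different parameter estimates.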

Sequon engineering of hGH as assessed by one-pot reactions
We wanted to examine all three sequon mutations of hGH with in vitro modification reactions to examine the kinetic parameters but found that GalNAc-T2 reaction products from hGH-

Discussion
The enzymes which initiate mucin-type glycosylation have been examined extensively, and the enzymology of many GT-27 family members has been studied largely with synthetic peptides (8,16,31,32). These GT-27 GalNAc-Ts are found in many eukaryotes, as mucin-type glycosylation is highly conserved. The recombinant expression of many of these has been reported, although not in an active form in E. coli. In the development of specialized expression systems to produce active eukaryotic glycosyltransferases with multiple disulfide bonds, we have shown that co-expressed folding chaperones can be used to produce multi-milligram amounts of active and relatively pure GT-27 orthologues, which can have up to five disulfide bonds when including the lectin domain. There is some proteolysis of the fusion proteins prior to purification, as is particularly evident with the DmGalNAc-T2 construct presenting two major bands by SDS-PAGE, but this does not appear to compromise enzyme activity.
From the extensive peptide work performed by others (9,16), the isozyme hGalNAc-T2 was shown to have a wider substrate specificity compared to many other human GT-27 family members. Some similarities in activity were also noted with the DmGalNAc-T2, and to a lesser extent CeGalNAc-T4 (33). Each of these enzymes has a native set of protein substrates; however, it is not known whether the DmGalNAc-T2 or CeGalNAc-T4 ever glycosylate proteins with structures like the cytokines we used here. It is therefore not entirely surprising that the wild type sequon in IFNα2b-WT is not used very well. This is also true for the human enzyme; the IFNα2b-WT sequon is predicted by ISOGlyP to be preferred by the GalNAc-T1 isoform, which is consistent with what we have previously observed (4).
What was very striking with the IFNα2b-MUT sequon was the exceptionally large improvement in reaction rate and turn-over for DmGalNAc-T2 compared to the other enzymes.
The human enzyme showed a marked improvement with the engineered IFNα2b sequon as well. This suggests some rigidity with respect to orientation of the loop region containing the O-glycosylation sequon, thus influencing the overall surface hydrophobicity of the protein as well as accessibility of the modified threonine.
Of course, we have shown only two target protein examples here, and we are designing sequons in other cytokine-like proteins to probe their glycosylation efficacy both in vitro and in bacterial cells with the goal of establishing a versatile approach to producing human glycans on a variety of protein substrates. It will be essential for therapeutics with these sequon mutations to also maintain biological activity. We previously showed that the IFNα2b-MUT appears as active as the wild type (4), and are currently following up on the activity of the hGH variants described here. We are also looking into structural strategies for introducing multiple sequons to investigate serum half-life benefits from having multiple glycans per protein.
We have not yet tested many members of the GT-27 family, but we were successful with bovine GalNAc-T1 and hGalNAc-T2 (4), and now the D. melanogaster and C. elegans orthologues of hGalNAc-T2. It appears that the general folding of GT-27 enzymes can be efficient in E. coli with assistance from protein disulfide isomerases. We can generate enough family members and sequon combinations in folded proteins. This will expand our access to these important protein modifying glycosyltransferases, and in combination with other glycosyltransferases permit the further engineering of bioactive O-glycans.

Data availability
All data will be made available upon request.