Structural basis for the strict exclusion of proline from the N-glycosylation sequon

Oligosaccharyltransferase (OST) catalyzes oligosaccharide transfer to the Asn residue in the N-glycosylation sequon, Asn-X-Ser/Thr, where Pro is strictly excluded at position X. Considering the unique structural properties of proline, this exclusion may not be surprising, but the structural basis for the rejection of Pro residues should be explained explicitly. The crystal structure of an archaeal OST in a complex with a sequon-containing peptide and dolichol-phosphate was determined at 2.7 Å resolution. The sequon part in the peptide forms two inter-chain hydrogen bonds with a conserved amino acid motif, TIXE. We confirmed the essential role of the TIXE motif and the adjacent regions by extensive alanine-scanning of the external loop 5. A Ramachandran plot revealed that the ring structure of the Pro side chain is incompatible with the φ backbone dihedral angle around -150° in the rigid sequon-TIXE structure.
[...] chromatography using Superdex 200 10/300 GL (GE Healthcare) was subsequently performed in 20 mM Tris-HCl, pH 8.0, 300 mM NaCl, and 0.05% (w/v) DDM. For disulfide bond tethering, purified AfAglB(G617C) was incubated with a peptide at pH 8.0, at a molar ratio of 1:10. After an overnight incubation at room temperature, the AfAglB-peptide complex was separated from the unreacted peptide monomers and the byproduct peptide dimers and concentrated to 33 mg mL⁻¹ by membrane filtration, in 20 mM Tris-HCl, pH 7.5, 200 mM NaCl, and 0.05% DDM. An inverse PCR-based site-directed mutagenesis kit (SMK-101, TOYOBO) was used to generate single-point mutations of the AfAglB sequence. For the oligosaccharyl transfer and FNG generation assays, the AfAglB mutant proteins were purified by nickel affinity chromatography only, and the His-tag at the C-terminus was not removed.
X-ray data collection and data processing. The X-ray diffraction data were collected at beamlines BL32XU and BL44XU in SPring-8 (Hyogo, Japan).
The final X-ray diffraction data were collected at beamline BL32XU using an EIGER X 9M detector (Dectris, Switzerland). A micro-focused beam of 10 μm × 15 μm (horizontal × vertical) with a wavelength of 1.0000 Å was used for both the raster scan and data collection, under a cryo stream operating at 100 K. The datasets were collected in 10° wedges from microcrystals, with a frame rate of 50 Hz in a shutterless operation mode at a dose of 10 MGy. The automated data collection system ZOO, developed at SPring-8 51, was used for automatic data collection from 2,529 microcrystals supported on 5 cryoloops. Datasets indexed with consistent unit cell parameters were subjected to a hierarchical cluster analysis based on unit-cell similarity. Finally, 483 datasets were merged, integrated, and scaled to 2.7 Å using the automatic data processing system KAMO 52.
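The selection of datasets by unit-cell similarity can be illustrated with a small sketch. This is not the KAMO implementation: it assumes each dataset is summarized by its cell parameters, uses an ad hoc distance (the largest relative difference in cell lengths), and groups datasets at a fixed cutoff, which corresponds to cutting a single-linkage dendrogram at that height. The function names, the cutoff, and the example cells are all hypothetical.

```python
from itertools import combinations


def cell_distance(cell_a, cell_b):
    """Largest relative difference between corresponding cell lengths (a, b, c)."""
    return max(abs(x - y) / max(x, y) for x, y in zip(cell_a[:3], cell_b[:3]))


def cluster_cells(cells, cutoff=0.01):
    """Group datasets whose cells are linked by distances below `cutoff`.

    Returns lists of dataset indices, largest cluster first.
    """
    parent = list(range(len(cells)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cells)), 2):
        if cell_distance(cells[i], cells[j]) < cutoff:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(cells)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical unit cells (a, b, c in Å; angles in degrees), not taken from the study.
example_cells = [
    (60.0, 90.0, 120.0, 90.0, 90.0, 90.0),
    (60.2, 90.1, 120.3, 90.0, 90.0, 90.0),
    (59.9, 89.8, 119.9, 90.0, 90.0, 90.0),
    (63.5, 95.0, 126.0, 90.0, 90.0, 90.0),
]
largest_cluster = cluster_cells(example_cells)[0]
```

In this toy input the first three cells fall into one cluster and the fourth is rejected as an outlier; only the largest cluster would be passed on to merging.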

[...] X⁻²-X⁻¹-Asn⁰-X⁺¹-Ser/Thr⁺²-X⁺³-X⁺⁴-X⁺⁵. Eubacteria use an extended 5-residue sequon 13, Asp/Glu⁻²-X-Asn-X-Ser/Thr, although the presence of an acidic residue at position -2 is not absolutely required 14,15. The amino acid bias in the middle position (position +1) of the N-glycosylation sequon is an interesting phenomenon. Statistical analyses of many glycosylated sites in glycoproteins revealed little preference for a particular amino acid at position X, except for the strict Pro exclusion in eukaryotic 16,17, archaeal 18, and eubacterial glycoproteins 19. The N-oligosaccharyl transfer is catalyzed by an integral membrane enzyme, oligosaccharyltransferase (OST) 20,21. The OST enzyme determines the non-preference and the exclusion of amino acid residues at position +1 of the glycosylated sequons 22,23. To clarify the structural basis of the sequon selection rules, we need the three-dimensional structures of the OST enzymes in complexes with the two substrates, an oligosaccharide donor and an oligosaccharide acceptor. In contrast to the relatively invariable properties of the amino acid sequences of the acceptor sequon, the oligosaccharide donor is highly diverse among the three domains of life. The oligosaccharide donor has the general structure of lipid-phosphate(s)-oligosaccharide, and is thus referred to as a lipid-linked oligosaccharide (LLO). The lipid part is dolichol in Eukarya and Archaea, and polyprenol in Eubacteria 24,25. Polyprenol is a long chain isoprenoid alcohol with the general formula, [α-terminus] HO-(CH2-CH=C(CH3)-CH2)n-H [ω-terminus], and dolichol is a special type of polyprenol that contains a saturated isoprene unit at the α-terminus. A diphosphate-type LLO is commonly used as the oligosaccharide donor for the OST catalyzed transfer reactions in the three domains of life, but a subset of Archaea, Euryarchaeota, exceptionally uses a monophosphate-type LLO 26-28.
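As a worked example of the general formula above, the elemental composition of a polyprenol or dolichol chain follows directly from the number of isoprene units: each unit contributes C5H8, the two termini add H and OH, and each saturated unit adds two hydrogens. The helper below is an illustrative sketch; its name and parameters are introduced here and are not from the study.

```python
def polyprenol_formula(n_units, n_saturated=0):
    """Elemental formula of HO-(C5H8)n-H with `n_saturated` reduced isoprene units."""
    if n_units < 1 or not 0 <= n_saturated <= n_units:
        raise ValueError("need n_units >= 1 and 0 <= n_saturated <= n_units")
    carbons = 5 * n_units
    # 8 H per unsaturated unit, 2 H from the terminal H and OH, 2 H per saturated unit.
    hydrogens = 8 * n_units + 2 + 2 * n_saturated
    return f"C{carbons}H{hydrogens}O"


c55_polyprenol = polyprenol_formula(11)               # 11 units, no saturated unit
c55_dolichol = polyprenol_formula(11, n_saturated=1)  # one saturated (alpha) unit
```

With 11 units this gives C55H90O for the fully unsaturated polyprenol and C55H92O for a dolichol with a single saturated unit.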
The chemical structure of the oligosaccharide part is also diverse. Most eukaryotes use a well-conserved canonical 14-residue oligosaccharide structure, Glc3Man9GlcNAc2, and lower eukaryotes use a shorter version of the 14-residue structure, lacking the terminal glucose and/or mannose residues 29. In contrast, Archaea and Eubacteria use completely different sets of oligosaccharide structures from species to species, with respect to the number, composition, and branching pattern of the monosaccharides 8. Considering the substantial divergence of the oligosaccharide donor structures, comparisons between distantly related OST enzymes can capture the essence of substrate recognition and enzyme catalysis.
The OST enzymes are hetero-oligomeric protein complexes in most eukaryotes, and single-subunit proteins in lower eukaryotes 20,21. The archaeal and eubacterial OSTs are also single-subunit enzymes. The OST enzymes are located in the endoplasmic reticulum membranes of eukaryotic cells and the plasma membranes of archaeal and eubacterial cells. The crystal structures of the eubacterium Campylobacter lari OST (alias ClPglB) were reported in complexes with an acceptor peptide (PDB: 3RCE) 30, an acceptor peptide plus a non-hydrolyzable LLO analog (PDB: 5OGL) 31, and an inhibitory peptide plus a reactive LLO analog (PDB: 6GXC) 32. The crystal structures of the euryarchaeon Archaeoglobus fulgidus OST (alias AfAglB) were determined in complexes with a sulfate ion, which mimics the phosphate group of LLO (PDB: 3WAJ) 33, and with an acceptor peptide (PDB: 5GMY) 23. These binary and ternary complex structures provided many valuable insights into the oligosaccharyl transfer reaction. Recently, the cryo-EM single-particle structures of yeast OST and two human OST paralogs were reported 34-36. The catalytic subunits, Stt3, in the multi-subunit OST enzymes have essentially identical structures to those of ClPglB and AfAglB 21. The two human OST structures contain an endogenous dolichol-phosphate, which was co-purified with the enzyme, and one of them also contains a model for an acceptor peptide of unknown origin. Unfortunately, the resolutions (3.3-3.5 Å) of the cryo-EM structures are not sufficient for a detailed discussion of the sequon recognition and the catalytic mechanism.
Several conserved short amino acid motifs have been identified (see Fig. 1a) in the diverse Stt3, AglB, and PglB protein sequences (identity < 20%). The C-terminal globular domain contains the WWDYG and DK/MI motifs 37, where the slash delimiter indicates domain-specific conservation. The DK motif is found in Eukarya and a subset of Archaea, whereas the MI motif is present in the remaining Archaea and Eubacteria. The WWDYG and DK/MI motifs form a binding site for the Ser/Thr residue in the sequon 30,33. The DGGK motif is conserved among eubacterial PglBs and euryarchaeal AglBs and presumed to be involved in LLO binding 38. The equivalent in the eukaryotic Stt3 and the AglBs from the ASGARD and TACK superphyla of Archaea is a double sequon motif, DNXTZNX[T/S], where X and Z can be any residue 20. The N-glycan attached to the double sequon motif is involved in the interactions with other subunits in the multi-subunit OST complexes 34-36. The N-terminal transmembrane (TM) region of the Stt3/AglB/PglB proteins consists of 13 TM helices and contains two DXD motifs on the first and second external loops (EL1 and EL2) and a TIXE/SVSE motif on the fifth external loop (EL5). The TIXE motif is found in Archaea and Eubacteria, whereas the SVSE motif is present in Eukarya. The mutually independent conformational changes of the N-terminal and C-terminal halves of the EL5 loop are considered to be essential for the binding of the LLO and sequon, respectively 39,40. In response to the conformational changes of the EL5 loop, the catalytic structure dynamically forms by integrating the Glu residue in the TIXE/SVSE motif. No appropriate functional groups were found around the side-chain carboxamide group of the acceptor Asn, and thus a hypothetical "twisted amide mechanism" was proposed for the activation of the inert amide nitrogen 30. In this mechanism, the N-C bond in the carboxamide group is transiently twisted through bipartite interactions with the two carboxy groups of the conserved acidic residues in the first DXD and the TIXE/SVSE motifs. The twisting abolishes the conjugation of the lone-pair electrons on the nitrogen atom with the carbonyl group, and thus increases the nucleophilic reactivity of the amide nitrogen 41.
Here, we determined the crystal structure of the ternary complex of the A. fulgidus AglB protein (AfAglB) with a sequon peptide and a dolichol-phosphate molecule. The catalytic structure around the bound metal ion is almost the same as that of the binary AfAglB-peptide complexes determined previously 23. Our analysis of the sequon recognition revealed the special roles of the TIXE motif in the EL5 loop. Although the conservation of the TIXE/SVSE motif was previously reported, its precise role in the oligosaccharyl transfer reaction has not been identified. We now report the formation of the inter-chain hydrogen bonds between the sequon and the TIXE motif in the AfAglB protein. The requirement of a special φ dihedral angle in the rigid sequon-TIXE structure clearly explains the structural basis for the strict exclusion of Pro at the middle position of the N-glycosylation sequon.

Results
Crystallization and structure determination. We used the lipidic cubic phase (LCP) method to obtain crystals of the AfAglB in a complex with a donor LLO molecule and an acceptor peptide.
Native LLO was isolated from cultured A. fulgidus cells. The AfLLO preparation that produced diffraction-quality co-crystals was a crude mixture of LLOs with variable numbers of monosaccharides (6 and 7), isoprene units (C55 and C60), saturated isoprene units (3, 4, and 5), and the sulfate group (0 and 1) 27. The peptide used for crystallization was custom synthesized. To compensate for the weak affinity, the sequon peptide was tethered to the AfAglB protein via a disulfide bond, to shift the association-dissociation equilibrium to the bound state 23. A cysteine residue was introduced as a sole tethering point (G617C) in the AfAglB protein. To stop the transfer reaction, the Asn residue in the sequon was replaced by an L-2,4-diaminobutyrate (Dab) residue. The replacement of the amide group by an amino group is known to inhibit the oligosaccharyl transfer reaction in a competitive manner 42. The peptide sequence is TAMRA-APY(Dab)VTASCR-OH, in which (Dab)VT is the non-reactive sequon and the cysteine residue serves as the tethering point. The N-terminal α-amino group is modified with a fluorescent carboxytetramethylrhodamine (TAMRA) dye for color detection. We chose 7.7 MAG as the host lipid with consideration of the larger water channel and reduced interfacial curvature of the cubic mesophase, which is suitable for the crystallization of membrane proteins with a large soluble domain 43. Microcrystals were grown in a lipidic sponge mesophase under buffer conditions of 19-22% PEG400, 0.1 M Na-citrate, pH 6.0, and 50 mM NaCl. The positions and shapes of the crystals were easily identified by the magenta color of the TAMRA dye (Extended Data Fig. 1). Diffraction data were collected from 2,529 microcrystals at the microfocus beamline BL32XU, SPring-8, Japan. A small-wedge data set was collected from each crystal and merged to complete the data set. The structure was determined by the molecular replacement method to a resolution of 2.7 Å (Table 1).
Even though the tethered peptide contained the non-reactive Dab residue at position 0, the LLO binding site was occupied by a dolichol-phosphate, instead of an intact LLO, suggesting that the LLO was hydrolyzed during the prolonged crystallization period. An omit electron density map revealed clear densities for the dolichol-phosphate, except for the isoprene units in the middle part (Fig. 1d, orange mesh). Another omit electron density map also revealed clear density for the sequon part in the tethered peptide (Fig. 1c, red mesh). Consequently, the models were reliably built for the sequon segment, A⁻³PY(Dab)VT⁺², and the dolichol(C60)-phosphate. The construction of the models of the A⁺³SC⁺⁵ linker segment of the tethered peptide and the central part of the dolichol chain was guided by the chemical structures of the amino acids and isoprene unit. Three water molecules around the metal ion were visible in the difference map and modeled (Fig. 1c, blue mesh).
[...] Adjusted p values of 0.05 or less were considered statistically significant. *adjusted p < 0.05, ** < 0.01, *** < 0.001. The magenta bars represent the amino acid residues with a significant decrease, and the blue bars represent those with a significant increase of the oligosaccharyl transfer activity.
Attempts to uncouple the LLO hydrolysis activity from the oligosaccharyl transfer activity. [...] and His162 (close to the metal site), and Asp552, Gln571, and Lys618 (close to the peptide site). We measured the two enzymatic activities (Extended Data Fig. 3). Disappointingly, no mutations with the desired suppression of the LLO hydrolysis activity relative to the oligosaccharyl transfer activity were obtained.

Discussion
We determined the crystal structure of a ternary complex of the AfAglB protein with an acceptor peptide and dolichol-phosphate. The resolution of the present structure is one of the best (2. [...] The present structure has revealed that the AfAglB protein recognizes the sequon sequences through not only the side-chain groups of Asn⁰ and Ser/Thr⁺², but also the main-chain groups of the X⁺¹ and X⁺³ residues by the TIXE motif. The essential role of the TIXE motif was confirmed by the alanine-scanning study of the EL5 loop (Fig. 3). No similar exhaustive mutation scanning experiments of the EL5 loop have been performed for other OST enzymes.
Recognizing the structure and function of the TIXE motif is the key to understanding the sequon recognition by the OST enzyme.
The exclusion of a Pro residue at position +1 in the N-glycosylation sequon is absolutely strict. [...] (Fig. 5a). This implies a common catalytic mechanism for the two types of LLOs.
Next, we focus on the binding mode of the lipid chains. In the present AfAglB structure, the ω-terminus of the dolichol is located in a tunnel formed at the interface between TM helices 8 and 9 (Fig. 5b). This tunnel structure implies that the LLO molecule enters the binding site through the gap between TM helix 8 and helix 9 (Fig. 5c). TM helix 9 must move in concert with the conformational change of the EL5 loop, to enlarge the gap upon LLO binding. A similar "LLO entry gate" was proposed for the Stt3 subunit in the yeast OST, although no LLO molecule was bound in the determined cryo-EM structure 34. The arrangement of TM helices 8 and 9 is similar in yeast Stt3 and AfAglB, but distinct in ClPglB 21. Consistently, for ClPglB, the LLO was assumed to thread into the binding site under the disordered EL5, while the TM helix 9 stayed in place 31. The mutagenesis of the Tyr293 residue in the EL5 loop of ClPglB resulted in a 7,000-fold reduction of the glycosylation turnover rate 40. By contrast, the V349A mutation, which is located at the corresponding position in AfAglB, caused only a moderate reduction in the oligosaccharyl transfer activity, smaller than in the ClPglB case (Fig. 3). The discrepancy in the mutational effects is attributable to the different binding modes of the dolichol/polyprenol chains.
Finally, we discuss the yet-to-be-defined activation mechanism of the inert amide nitrogen in the acceptor Asn. To date, no convincing experimental evidence to support the twisted amide mechanism has been reported 32. The OST enzymes might adopt a supportive mechanism to compensate for the poor nucleophilicity of the carboxamide group of the acceptor Asn. Locher and coworkers proposed that the divalent metal ion might directly activate the glycosidic oxygen to generate a reactive electrophile 32. Alternatively, the rigid frame structure composed of the sequon and the TIXE/SVSE motif could function as a guiding device to bring the nitrogen atom into the vicinity of the C1 carbon atom (Extended Data Fig. 5). In the transition state, the amide nitrogen and the C1 carbon are forced to move within a closer reaction distance by the restriction of concerted motions to one direction. As a result, the unreactive amide nitrogen attacks the C1 carbon of LLO to perform a nucleophilic substitution, by converting energy from the conformational to chemical coordinate 48.
In conclusion, the present structural and mutagenesis studies revealed the dual roles of the TIXE motif in the sequon recognition and catalytic mechanism. First, the TIXE motif participates in the formation of the rigid sequon-TIXE frame structure to recognize sequon sequences at the main-chain level (Fig. 2). The sequon-TIXE frame forces the amino acid residues at positions +1 and +3 to adopt high φ dihedral angles (Fig. 4), which are inaccessible to Pro. This is the structural basis for the exclusion of Pro residues at the middle position and the position after the Ser/Thr residue of the N-glycosylation sequon. As the second role, the rigid sequon-TIXE frame structure effectively restricts the motion of the acceptor Asn residue, which could compensate for the poor nucleophilicity of the carboxamide nitrogen (Extended Data Fig. 5).
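The φ argument can be made concrete with a short calculation. The sketch below computes a backbone dihedral angle from four atomic positions (for φ of residue i: C of residue i-1, then N, Cα, and C of residue i) and compares it with an approximate window for proline. The window of about -63° ± 15° is a commonly quoted textbook approximation and is an assumption here, not a value from this study; the function names are likewise hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (IUPAC sign convention)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Assumed approximate phi window for proline (textbook value, not from the study).
PRO_PHI_CENTER = -63.0
PRO_PHI_TOLERANCE = 15.0


def phi_compatible_with_pro(phi_deg, center=PRO_PHI_CENTER, tolerance=PRO_PHI_TOLERANCE):
    """True if phi lies within the assumed proline window, handling angle wrap-around."""
    delta = (phi_deg - center + 180.0) % 360.0 - 180.0
    return abs(delta) <= tolerance
```

With this window, a φ angle near -150°, as required by the sequon-TIXE frame, is rejected for Pro, whereas a value near -60° is accepted.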

Lipid-linked oligosaccharide from A. fulgidus cells.
A. fulgidus cells were cultured as previously described 49. AfLLO was prepared from the cultured cells as described 27. Briefly, after the cells were disrupted in a hypoosmotic buffer, insoluble materials were collected by centrifugation and homogenized with a probe sonicator. After centrifugation for debris removal, the supernatant was ultracentrifuged at 100,000 × g to collect the membrane fractions. AfLLO was extracted from the membrane fractions by two-phase partitioning with a chloroform/methanol/water solvent system, [...]
[...] fulgidus cells. When we used purified LLO preparations, which were eluted as a single peak from a normal phase HPLC column, many microcrystals of fine appearance were obtained, but the quality of their X-ray diffractions was poor. We then switched to crude LLO preparations, prepared only by two-phase partitioning with a chloroform/methanol/water solvent system, and found that the microcrystals provided good diffraction data. Consequently, the AfLLO in the crystallization drops [...]
[...] Amino acid residues related to the sequon recognition and the catalytic function are shown and labeled. The two structures were superimposed by the PyMOL align command, using all heavy atoms belonging to the 3-residue sequon and 4-residue TIXE motif, the conserved three acidic residues (D47/D56, D161/D154, and H163/D156) in the two DXD motifs, the manganese ion, and the phosphate group directly linked to the oligosaccharide. The rms distance is 0.56 Å for 43 pairs of atoms. The /-delimited residue names represent the amino acid residue of AfAglB (first) and that of ClPglB (second). The sequon peptides containing the Dab residues are shown as yellow sticks, and the dolichol-phosphate molecule derived from the natural AfLLO, and the polyprenol-diphosphate part of the ClLLO analog are shown as salmon sticks. The GlcNAc residue of ClLLO is not displayed for clarity. The manganese ions are shown as purple spheres. Note that the resolution of the coordinates of PDB: 6GXC is 3.4 Å, but the positional accuracy of atoms is expected to be as high as that of AfAglB (2.7 Å) because the structure of 6GXC was solved by the molecular replacement method, using the PDB entry 5OGL (2.7 Å) as the template. [...]
[...] mechanisms. The amide nitrogen of a carboxamide group is poorly reactive due to electron delocalization from the amine to the carbonyl group, indicated by resonance. The amide nitrogen could be activated by an acid/base catalysis; i.e., protonation of the carbonyl oxygen and deprotonation of the amide nitrogen, or amide twisting; i.e., rotation around the N-C bond by the formation of two hydrogen bonds with the conserved acidic residues. d, Supportive mechanism that compensates for the poor nucleophilicity of the amide nitrogen. The rigid sequon-TIXE/SVSE structure restricts the motion of the side chain of the acceptor Asn in only one direction, for the effective conversion of energy from the conformational to chemical coordinates, and forces the nitrogen atom to move closer to the C1 carbon of the LLO within a reactive distance in the transition state.

Supplementary Table 1 | Data Collection and Refinement Statistics
This is a revised version of the previously determined binary AfAglB-peptide complex (PDB: 5GMY). The position of the bound Mg²⁺ ion was corrected. The revised version of the coordinates has the same PDB entry name, 5GMY, using the entry versioning system. Values in parentheses are for the highest resolution shell.