The Glycoprotease CpaA Secreted by Medically Relevant Acinetobacter Species Targets Multiple O-Linked Host Glycoproteins

CpaA is a glycoprotease expressed by members of the Acinetobacter baumannii-calcoaceticus complex, and it is the first bona fide secreted virulence factor identified in these species. Here, we show that CpaA cleaves multiple targets precisely at O-glycosylation sites preceded by a Pro residue. This feature, together with the observation that sialic acid does not impact CpaA activity, makes this enzyme an attractive tool for the analysis of O-linked human proteins for biotechnical and diagnostic purposes. Previous work identified proteins involved in blood coagulation as targets of CpaA. Our work broadens the set of targets of CpaA, pointing toward additional roles in bacterium-host interactions. We propose that CpaA belongs to an expanding class of functionally defined glycoproteases that target multiple O-linked host glycoproteins.

All of the CpaA targets tested so far were simultaneously N- and O-glycosylated proteins. To test whether CpaA can cleave N-linked glycoproteins, we incubated CpaA with RNaseB, a glycoprotein modified only with N-linked glycans. No proteolytic activity was observed with RNaseB as a substrate (Fig. S1). Together, these results indicate that CpaA possesses broad substrate specificity, being able to cleave multiple O-glycosylated proteins in addition to fV and fXII, and that CpaA activity is unaffected by sialylation.
CpaA cleaves between Pro and a glycosylated Ser/Thr. We employed an MS approach to gain insight into the common molecular features that dictate CpaA substrate recognition. We excised the gel pieces containing the proteolysis products indicated with asterisks in Fig. 1, treated them with trypsin, and subsequently subjected them to MS analysis, as done previously (37). For most of the proteins, with the exceptions of TIM4 and C1-INH, we identified nontryptic peptides resulting from CpaA activity (Fig. 3 and Fig. S2). These peptides contained an invariant C-terminal Pro residue (P1 position) (Fig. 3A and Fig. S2A, C, and D), which in the full-length protein sequences is always adjacent to a glycosylated Ser or Thr (S/T*, where the asterisk indicates a glycosylated amino acid) (Fig. 3C). We also identified nontryptic glycopeptides derived from etanercept that contained a glycosylated N-terminal Ser (Fig. 3B and Fig. S2B). The peptides were glycosylated with either N-acetyl hexosamine-hexose-N-acetyl neuraminic acid (HexNAc-Hex-NeuAc) or HexNAc-Hex moieties (Fig. 3B and Fig. S2B), reinforcing the concept that CpaA activity is indifferent to the presence of sialic acid. The CpaA-dependent cleavage products, including those of fXII (previously reported), were used as WebLogo inputs (Fig. 3C) (15,38). This analysis revealed that CpaA has a distinct peptide consensus sequence, P-S/T*, in which cleavage occurs between the Pro and the glycosylated Ser/Thr residue. As seen in Fig. 3C, the Pro residue preceding the O-glycosylation site is likely a strict requirement for CpaA targeting. Human erythropoietin (EPO) contains the same Ser-bound oligosaccharide as fetuin but has an Ala residue preceding the glycosylation site. This protein was not cleaved by CpaA (Fig. S1B), further supporting the essentiality of the Pro residue preceding the O-glycosylation site.
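As an illustration of the P-S/T* consensus, the following Python sketch scans a protein sequence for Ser/Thr residues that are directly preceded by Pro. It is a minimal sketch, not the analysis used in this study: the function name `find_candidate_sites` and the `glycosylated` argument are hypothetical, and because the primary sequence alone cannot tell whether a Ser/Thr is O-glycosylated, glycosite positions have to be supplied from experimental data.

```python
import re


def find_candidate_sites(sequence, glycosylated=None):
    """Return 1-based positions of Ser/Thr residues directly preceded by Pro.

    sequence     -- protein sequence in one-letter code
    glycosylated -- optional set of 1-based positions known to carry an
                    O-glycan (hypothetical input; must come from experiments).
                    If given, only sites in this set are reported.
    """
    sites = []
    # The lookbehind keeps the match on the Ser/Thr (P1') residue itself,
    # so the reported position is the residue after the scissile bond.
    for match in re.finditer(r"(?<=P)[ST]", sequence):
        position = match.start() + 1  # convert 0-based index to 1-based
        if glycosylated is None or position in glycosylated:
            sites.append(position)
    return sites


if __name__ == "__main__":
    # "EAPSA" is the fetuin-based fragment used for docking in this study.
    print(find_candidate_sites("EAPSA"))
```

A sequence with Ala in place of Pro before the Ser (as described for EPO) returns an empty list, in line with the lack of cleavage reported for that substrate.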
Mammalian glycan array screening. The recombinant proteins tested in this study were all expressed and purified from HEK293 cells. This cell line generates proteins O-glycosylated with the disaccharide structure galactose β1-3 N-acetylgalactosamine (Galβ1-3GalNAc), also known as core I, as well as mono- and disialylated core I and core   (39). Other glycan cores have more complex (branched) structures that may or may not be accommodated by CpaA (40). To explore this further, we employed a mammalian glycan array screening assay (performed at the Consortium for Functional Glycomics) to test if CpaA can directly bind various glycans. However, we did not detect any significant peak indicative of binding for any glycan in the array when employing two protein concentrations (5 μg/ml and 50 μg/ml) of either CpaA or its catalytically inactive mutant CpaA E520A (Fig. S3). A similar lack of binding has been reported for other glycoproteases, which may be due to low-affinity or transient interactions with the glycans (41). It is noteworthy that none of the glycans in the glycan array are conjugated to P-S/T sequences. Considering our previous results, it is likely that recognition by CpaA depends on a combination of protein sequence and glycan structure that is not represented in the array.
CD55 is removed from epithelial surfaces by secreted CpaA. Our previous in vitro results expanded the known CpaA substrates to include human glycoproteins beyond those involved in blood coagulation (Fig. 1A). To gain insight into CpaA activity in the context of an infection, we tested whether CpaA directly cleaves surface-exposed O-glycoproteins. Of all the glycoproteins tested in vitro, CD55 and CD46 are highly expressed cell surface O-glycoproteins in HeLa cells (42). Thus, HeLa cells were treated with purified CpaA and CpaA E520A, and the levels of CD55 and CD46 bound to the cell surface were quantified by flow cytometry (Fig. 4). At the two protein concentrations tested, cells treated with CpaA displayed a reduced amount of CD55 on their surfaces compared to those treated with CpaA E520A (Fig. 4A). In contrast, although CpaA was able to cleave CD46 in our in vitro assay (Fig. 1A), levels of cell surface-exposed CD46 remained unaltered after CpaA treatment, independent of the protein concentration used (Fig. 4B).
Next, we infected HeLa cells with A. nosocomialis M2 expressing CpaA and CpaA E520A at three different multiplicities of infection (MOI) (10, 100, and 1,000) and used flow cytometry analysis to quantify the levels of cell surface-exposed CD55 and CD46 postinfection (Fig. 5). CD55 and CD46 levels remained unchanged when cells were infected with A. nosocomialis M2 secreting CpaA E520A, indicating that A. nosocomialis M2 does not secrete any other proteases targeting either glycoprotein (Fig. 5). In agreement with our previous results, secreted CpaA cleaved CD55 but not CD46 from the cell surface (Fig. 5). The CD46 protein used in the in vitro assay (Fig. 1A) was expressed and purified from HEK293 cells. It is well known that glycosylation patterns differ between cell lines (43); thus, it is possible that different protein glycosylation patterns impact CpaA activity. To address this, we repeated the experiment infecting HEK293 cells. As observed with HeLa cells, secreted CpaA digested CD55 but not CD46 (Fig. S4), indicating that potential differences in glycosylation between these two cell lines do not account for these discrepancies. Importantly, an MOI of 100 was sufficient to detect cleavage of CD55 from HeLa cells, and increasing the MOI to 1,000 did not boost CD55 cleavage by CpaA (Fig. 5A). The remaining CD55 (and perhaps CD46) may be associated with proteins/ligands that prevent CpaA activity. We conclude that A. nosocomialis secretes physiological levels of CpaA that can digest host surface-exposed proteins during infection.

FIG 3 (legend, partial) (C) Sequences of the CpaA-dependent cleavage products were used as WebLogo inputs (weblogo.berkeley.edu). Factor XII cleavage sites were previously reported (15). The underlined peptide sequences were detected by mass spectrometry. The dashed line indicates the CpaA cleavage site. HexNAc, N-acetyl hexosamine; Hex, hexose; NeuAc, N-acetyl neuraminic acid.
Molecular modeling identifies a putative mode of binding of glycopeptides to CpaA. We previously determined the X-ray cocrystal structure of the CpaA-CpaB complex (13), and others have obtained crystal structures of related metzincin enzymes with peptide- and peptidomimetic-based ligands (Table 1) (44,45). X-ray structures of the related gluzincin glycopeptidases BT4244 (Bacteroides thetaiotaomicron), IMPa, and ZmpB (Clostridium perfringens ATCC 13124) have also been determined with bound glyco-amino acid and glycopeptide ligands (16). Notably, these gluzincin enzymes cleave fetuin, asialofetuin, and related synthetic glycopeptides to various degrees (Table 1). Thus, to better understand binding between CpaA and glycopeptide substrates, we performed docking experiments between a CpaA model and three different glycoforms of a fetuin-based fragment peptide (Ac-EAPSA-N-methyl [N-Me], where S is glycosylated). One of the glycoforms lacks the sialic acid moiety (Fig. 6 and Fig. S5A), while the other two are sialylated at two different positions of the Galβ1-3GalNAc core (Fig. S5B and C). The peptide portions of the docked species were able to contact the catalytic zinc ion while forming an antiparallel β-sheet with a β-strand of the active site (Fig. 6A), findings that are consistent with the aforementioned metzincin crystal structures as well as docking experiments with StcE (16,22). In the docked structures, the consensus P1 proline residue lies adjacent to W493 and is somewhat solvent exposed (Fig. 6B). An H-π interaction was observed between a proline beta hydrogen and the tryptophan indole ring (Fig. 6B). These studies suggest that CpaA selectivity may be the result of W493 (i) forming a potential H-π interaction with the prolyl ring, (ii) minimizing the prolyl residue's exposure to solvent, and/or (iii) sterically holding the substrate in the active site.
In the docked structures, the glycan moieties were found to form few interactions with CpaA and the substrate peptide. The acetyl groups of the GalNAc moieties all formed hydrogen bonds with the peptide backbone (Fig. 6A). Similar contacts were observed in docking studies using O-GalNAc-ylated peptides and StcE (22). Such contacts have been shown to affect the conformation of mucin-like glycopeptides (46), and they may bias substrates into an extended conformation that would be more easily recognized by the enzyme. We also found that the 4-OH of the GalNAc moieties formed hydrogen bonds with the side chain amide of N551 (Fig. 6A and Fig. S5A to C). This interaction would likely endow the enzyme with selectivity for peptides modified with GalNAc at P1′. Interestingly, the side chain of this residue occupies space similar to that of the side chains of tryptophan residues that are conserved in the related gluzincin enzymes BT4244, IMPa, and ZmpB (Fig. S5D and E). In the gluzincin enzymes, the indole nitrogens instead hydrogen bond to the acetyl groups of GalNAc ligands. Such a contact may be formed between CpaA and its substrates if minor conformational changes occur. In either case, N551 appears to be the main residue recognizing the unique features of the GalNAc moiety, as the other residues flanking the glycans are mostly small and nonpolar. The related gluzincin enzymes, on the other hand, form several contacts between polar side chains and their GalNAc groups.

FIG 6 (legend, partial) The enzyme (gray) and the substrate's peptide portion (green) form hydrogen bonds (dashed blue lines) between their backbones similar to an antiparallel beta sheet. The catalytic histidine residues (gray sticks) and zinc ion (gray sphere) hydrolyze the amide bond between the substrate's proline (P1) and glycosylated serine (P1′) residues. The GalNAc moiety (yellow sticks) forms hydrogen bonds with the substrate and the enzyme, whereas the Gal moiety (yellow sticks) is exposed to solvent. (B to E) The tryptophan residue of the native enzyme can form an H-π interaction (dashed orange lines) with the substrate. This interaction is weakened when this residue is mutated to phenylalanine and nonexistent with aliphatic residues.
Larger, linear glycans project subsequent sugar residues into solvent; these groups are not predicted to interact with CpaA. Branched glycans are not well accommodated by the enzyme, as the disulfide bond adjacent to the active site limits the flexibility of the enzyme and the branched sugar's ability to bind. Conversely, both IMPa and ZmpB form interactions with additional sugars of their corresponding glycans. Still, only IMPa demonstrated binding activity in the mammalian glycan array screen (16), which may explain why no hits were found in the same screen with CpaA. Collectively, our docking studies indicate that CpaA interacts with both peptide and glycan components of the substrate and identify residue W493 as a potential mediator of the interaction between CpaA and the Pro residue of its substrate.
Effect of W493 on CpaA specific activity. Docking studies suggest an H-π interaction between the indole ring of W493 of CpaA and a beta hydrogen of the Pro residue at the P1 position of the substrate (Fig. 6B). While other substrate residues can form this contact with W493, the unique rigidity and geometry of Pro allow it to optimally present its beta hydrogen to the indole ring of W493. Thus, we hypothesized that W493 of CpaA plays a critical role in CpaA selectivity. Indeed, our molecular docking studies indicate a weaker H-π interaction when peptides were docked into a mutant W493F model (Fig. 6C). The modeled W493L and W493A mutants have aliphatic residues that are unable to form this interaction with the substrate (Fig. 6D and E). Moreover, all the mutant models formed fewer van der Waals contacts with the Pro residue and further exposed it to solvent.
To complement our molecular modeling experiments, we tested the effect of these mutations on CpaA activity. All CpaA point mutation variants were expressed and secreted at levels similar to those of CpaA and CpaA E520A, and no degradation products of CpaA were observed in the whole-cell fractions of the CpaA variants (Fig. S6A and B). CpaA E520A was included as a negative control for CpaA activity. We purified all His-tagged CpaA variants and determined their in vitro activities against various substrates (Fig. 7A and Fig. S6C and S7). The different mutations affected CpaA efficiency and site recognition in a substrate-specific manner. All mutants except CpaA E520A were able to cleave fetuin, yielding a similar cleavage pattern (Fig. 7A). However, CpaAW493A cleaved fetuin with less efficiency. None of the CpaA variants were able to cleave EPO, further highlighting the essentiality of a Pro residue at P1 for targeting by CpaA (Fig. 7B). Treatment of CD55 and C1-INH with the CpaA mutants revealed that all variants are less active, as shown by an increase in the amounts of undigested substrate. Additional faint bands were observed, but we were unable to determine the cleavage site using MS analysis (Fig. S7).
We previously identified three cleavage sites for CpaA in the mid-region of etanercept: Pro207-Thr208, Pro215-Ser216, and Pro225-Ser226 (Fig. 3B and Fig. S2). Cleavage by CpaA generates two fragments of similar molecular weights that comigrate as a single band of about 36 kDa in SDS-PAGE (Fig. 1A and 7C). Notably, digestion of etanercept by CpaAW493A and CpaAW493L generated a major product of ~45 kDa instead (Fig. 7C). CpaAW493F produced three bands that migrate as ~36-, 40-, and 45-kDa fragments, respectively (Fig. 7C). MS analysis of these bands allowed the identification of the sites preferentially cleaved by the three CpaA variants (Fig. 7D). The unique ~45-kDa band results from cleavage at the Pro183-Thr184 site, which is not a preferred site for wild-type CpaA. Unlike the other cleavage sites, the Pro183-Thr184 site is in an area of low glycosylation of etanercept. Interestingly, the CpaAW493A and CpaAW493L variants were unable to cleave etanercept in the high-glycosylation-density region, which could explain their low activity against other mucin targets. Taken together, our data suggest that W493, although not essential for activity, plays a role in CpaA substrate selectivity by interacting with the Pro residue of its target protein.
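To make the relationship between cleavage sites and fragments concrete, the short Python sketch below converts a list of P1 residue numbers into the residue ranges of the resulting fragments. It is illustrative only: `fragment_ranges` is a hypothetical helper, the chain length of 467 is an assumed value used purely as an example, and residue ranges say nothing about the apparent molecular weights seen by SDS-PAGE, which depend strongly on glycosylation.

```python
def fragment_ranges(chain_length, p1_positions):
    """Return (first, last) residue ranges produced by complete cleavage.

    chain_length -- number of residues in the polypeptide (assumed input)
    p1_positions -- residue numbers of the P1 residue at each site;
                    cleavage occurs between P1 and the next residue (P1')
    """
    fragments = []
    start = 1
    for p1 in sorted(set(p1_positions)):
        if not 1 <= p1 < chain_length:
            raise ValueError(f"P1 position {p1} is outside the chain")
        fragments.append((start, p1))
        start = p1 + 1
    fragments.append((start, chain_length))
    return fragments


if __name__ == "__main__":
    ASSUMED_LENGTH = 467  # assumption for illustration, not taken from this study
    # Sites reported for wild-type CpaA: Pro207-Thr208, Pro215-Ser216, Pro225-Ser226
    print(fragment_ranges(ASSUMED_LENGTH, [207, 215, 225]))
    # Site favored by the W493A and W493L variants: Pro183-Thr184
    print(fragment_ranges(ASSUMED_LENGTH, [183]))
```

Cleaving at residue 183 rather than within residues 207 to 225 leaves a longer C-terminal fragment, which is qualitatively in line with the larger major product observed for the W493A and W493L variants.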
CpaA belongs to an expanding, functionally defined class of surface-exposed and secreted glycoproteases. Mounting evidence indicates that both commensal and pathogenic species produce and secrete glycoproteases to modulate adherence, penetrate the inner mucus layer, or evade the host immune response (18,21,47). For example, StcE contributes to immune evasion during EHEC infection by preventing immune cells from moving to the sites of infection (20,23,48,49). Additionally, StcE activity against mucins promotes access of EHEC to epithelial cells, which assists host cell colonization (20,50). P. aeruginosa produces IMPa, which cleaves the macrophage surface protein CD44, inhibiting phagocytosis (29). IMPa also cleaves P-selectin glycoprotein ligand 1 (PSGL-1), helping the bacterium to escape neutrophil attack (29). In Vibrio cholerae, secreted TagA targets host cell surface glycoproteins, modulating bacterial attachment during infection (26). These examples are indicative of the pivotal role of glycoproteases in modulating host-pathogen interactions by targeting various host proteins. Despite their relevance, only a relatively small number of bacterial glycoproteases have been biochemically characterized, and to various extents (Table 1). These enzymes are commonly encoded by bacteria isolated from mucin-rich environments, such as the human gut and lungs. These glycoproteases differ in their secretion mechanisms, protease class, domain organization, catalytic site, and recognized targets. Like IMPa, StcE, TagA, and SslE/YghJ, CpaA is secreted by a T2SS. CpaA, IMPa, and ZmpC display broad O-glycoprotease activity targeting mucins, regardless of their glycosylation density, as well as O-glycoproteins with low O-glycan chain density (fetuin). On the other hand, StcE can cleave only mucins with long mucin-like regions (such as CD43 and CD55) or short mucin-like regions (such as C1-INH). In contrast, other glycoproteases can target only a subset of mucins. For example, TagA requires more extensive densely glycosylated regions, whereas SslE/YghJ can digest major mucins such as MUC2 and MUC3, but it is inactive against mucins like CD43 and bovine submaxillary mucin (9,23,37,51).
Due to low amino acid sequence conservation, it is not possible to differentiate proteases from glycoproteases solely on the basis of primary amino acid sequence. However, structural analyses revealed the presence of Ig-fold domains in all these enzymes (Table 1, domains in yellow). Moreover, several factors impact substrate targeting by glycoproteases, including glycan chain identity and density, as well as amino acid composition (Table 1). Although some target motifs are known, the molecular bases for the recognition of specific target sequences remain poorly understood. Thus, even if structural analyses identify putative glycan-binding domains in a protein of interest, it is difficult to predict the specific substrates targeted by the putative glycoprotease. Together with the functional analysis, these studies define a functional class of secreted O-glycoproteases that mediate host-pathogen and host-commensal interactions.

DISCUSSION
Glycan chains decorate proteins to accomplish many different functions. One important role of glycosylation is the protection of proteins against proteolytic degradation. However, there is growing evidence that bacterial pathogens and commensals have evolved specific proteases that overcome the steric impediment posed by carbohydrates and indeed use glycans as recognition determinants to cleave glycoproteins right at the glycosylation site. In this work, we functionally characterized CpaA, a metzincin glycoprotease and T2SS-secreted virulence factor of several medically relevant Acinetobacter strains (4,7,8,12). Previous work identified the blood coagulation proteins fV and fXII as targets of CpaA, suggesting a potential role in dissemination by interfering with the intrinsic coagulation pathway (4,8,12). The present work expands the known targets of CpaA and indicates that CpaA is a broad-spectrum enzyme with the ability to cleave various O-linked human glycoproteins. Our MS analysis of proteolytic fragments resulting from glycoprotein treatment with CpaA revealed that CpaA has a consensus target sequence consisting of a Pro residue followed by a glycosylated Ser or Thr (P-S/T*), which is unprecedented for bacterial glycoproteases. Unlike other secreted glycoproteases, CpaA activity is not affected by sialic acid and is not restricted to highly O-glycosylated proteins (mucins). Indeed, CpaA also cleaves sparsely O-glycosylated proteins, such as fetuin. Although broad-spectrum secreted or surface-exposed glycoproteases appear to be widespread in bacteria, they cannot be identified on the basis of sequence homology, and biochemical and structural analyses are required to designate them as glycoproteases.
CpaA is composed of four very similar Ig-like domains and a catalytic domain (13). The catalytic domain located at the C terminus of CpaA exhibits all the canonical structural features of the metzincin superfamily. The four Ig-like domains are arranged in tandem, and they resemble the insertion domain of StcE, secreted by EHEC (13). These observations prompted us to further characterize CpaA activity. The StcE-specific motif S/T*-X-S/T (the asterisk denotes the glycosylation site) diverges from the P-S/T* motif recognized by CpaA (22). It is intriguing that despite recognizing different motifs, StcE and CpaA share the substrates C1-INH and CD55. Considering the different domain organization of the two proteases, it is not surprising that the proteins are classified in different metzincin subfamilies and target different proteins.
We previously showed that the substrate-binding cleft of CpaA is formed by residues from its four Ig-like domains and its catalytic domain (13). Thus, to be recognized by CpaA, the glycosylated substrate has to expose the P-S/T* peptide bond targeted for hydrolysis. It has been proposed that the interaction of the O-glycans with residues in the Ig-like domains (referred to as G sites by Noach et al. [16]) helps to position the targeted peptide bond in the correct conformation to interact with the amino acids involved in catalysis (13,16). Here, we show that CpaA activity is unaffected by the presence of sialic acid, indicating that the sialic acid can be accommodated inside the cleft but is not required for glycopeptide recognition. Our molecular modeling of CpaA with glycosylated substrates showed that linear glycans modified by sialic acid project this moiety away from the active site and into solvent, which supports our in vitro findings. Our modeling also revealed a possible interaction between the indole ring of W493 and the ring of the Pro residue in the targeted sequence. Although the digestion of glycoproteins with W493 mutants displayed substrate-specific behaviors, overall, the mutants were less active and, in some cases, exhibited a shift in glycosylation site (glycosite) preference. We propose that CpaA selectivity may, at least in part, be the result of W493 forming a potential H-π interaction with the prolyl ring, minimizing the prolyl residue's exposure to solvent, and/or sterically holding the substrate in the active site. Further structural and biochemical studies are required to uncover the structural features that enable CpaA to target such a remarkably broad range of O-linked glycoproteins.
CpaA expression and secretion occur across several medically relevant Acinetobacter strains. Deletion of CpaA resulted in attenuation of A. nosocomialis M2 virulence in a respiratory murine infection model, in which CpaA plays a role in dissemination from the lungs to the spleen (8). Previously, the only known substrates for CpaA were fV and fXII, proteins involved in blood coagulation. By digesting these proteins, CpaA increases the clotting time of human plasma (12,15). We have now shown that CpaA can indeed cleave multiple proteins in vitro. Among these are several proteins involved in regulation of complement activation, including CD55 and CD46. However, only CD55 was removed from the cell surfaces, while CD46 remained unaltered during the A. nosocomialis infection assay. We hypothesize that CD46 interacts with another protein that blocks CpaA access. Thus, not all CpaA targets identified by our in vitro experiments are bona fide targets of CpaA in vivo. An additional role of CD55 is to act as an anti-adhesive molecule that regulates the release of neutrophils (52). Degradation of CD55 increases the retention of neutrophils at the apical epithelial surface, with a concomitant reduction in the number of neutrophils that cross the epithelium (48,49). CpaA expression and secretion are conserved across several medically relevant Acinetobacter strains isolated from diverse anatomical sites (8). We aligned the amino acid sequences of CpaA from A. nosocomialis M2 and several medically relevant A. baumannii strains (Fig. S8). We observed extremely high protein identity, suggesting that our findings can be extended to CpaA orthologs secreted by A. baumannii. Considering the broad-spectrum activity of CpaA and the abundance of glycosylated proteins in the human host, we propose that the physiological role of CpaA likely extends beyond interfering with the coagulation pathway or complement cascade. Further work will be required to understand the full extent of host immunomodulation by CpaA.
Host mucins and O-glycoproteins are major components of mucus, and they are ubiquitously expressed on cellular surfaces, where they act as physical barriers, receptor ligands, and mediators of intracellular signaling (53,54). O-Glycoproteins and mucins lack a consensus sequence for O-glycosylation, and their O-linked glycans are highly heterogeneous in their glycan composition, numbers of residues, and linkages. However, aberrant mucin expression and glycosylation are also linked to various disease states, making mucins reliable biomarkers (54). For example, the mucin MUC1 is aberrantly expressed in the majority of cancers diagnosed each year in the United States (53). Thus, the assessment of the mucin glycosylation status has high relevance for diagnosis of cancer and other diseases. Mucin domains are resistant to most commercially available proteases, which makes them difficult to analyze by traditional MS strategies. Bacterial glycoproteases have recently gained attention as tools for proteomic analysis of human glycoproteins (16,22,55). Our study shows that the broad-spectrum O-linked glycoprotease activity of CpaA is not affected by sialic acids. Moreover, CpaA consistently digested the O-linked glycoproteins containing the P-S/T* sequence that were tested in vitro. These properties not only make CpaA a versatile enzyme modulating host-pathogen interactions but also highlight it as a robust and attractive new component of the glycoproteomics toolbox.

MATERIALS AND METHODS
Strains, plasmids, and growth conditions. Bacterial strains and plasmids used in this study can be found in Table S1. E. coli Stellar and A. nosocomialis M2 cells were grown in Lennox broth (LB) at 37°C. pWH1266-based plasmids were selected with tetracycline (5 μg/ml).

[...] solution of 50 mM NH4HCO3-50% ethanol for 20 min at room temperature with shaking at 750 rpm. Destained bands were dehydrated with 100% ethanol, vacuum dried for 20 min, and then rehydrated in 50 mM NH4HCO3 plus 10 mM dithiothreitol (DTT). Protein bands were reduced for 60 min at 56°C with shaking and then washed twice in 100% ethanol for 10 min to remove residual DTT. Reduced, ethanol-washed samples were then alkylated with 55 mM iodoacetamide in 50 mM NH4HCO3 in the dark for 45 min at room temperature. Alkylated samples were washed with 50 mM NH4HCO3 followed by 100% ethanol twice for 5 min to remove residual iodoacetamide and then vacuum dried. Alkylated samples were then rehydrated with 12 ng/μl trypsin (Promega) in 40 mM NH4HCO3 at 4°C for 1 h. Excess trypsin was removed, and gel pieces were covered in 40 mM NH4HCO3 and incubated overnight at 37°C. Peptides were concentrated and desalted using C18 stage tips (57,58) before analysis by liquid chromatography (LC)-MS.
Identification of CpaA digestion products and cleavage sites using reverse-phase LC-MS. Purified peptides were resuspended in buffer A* (0.1% trifluoroacetic acid [TFA], 2% acetonitrile) and separated using a two-column chromatography setup composed of a PepMap100 C18 20-mm by 75-μm trap and a PepMap C18 500-mm by 75-μm analytical column (Thermo Fisher Scientific). Samples were concentrated on the trap column at 5 μl/min for 5 min with buffer A (0.1% formic acid, 2% dimethyl sulfoxide [DMSO]) and infused into an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific) at 300 nl/min via the analytical column using a Dionex Ultimate 3000 ultraperformance liquid chromatograph (UPLC) (Thermo Fisher Scientific). Gradients (45 or 65 min) were run for each sample, altering the buffer composition from 1% buffer B (0.1% formic acid, 77.9% acetonitrile, 2% DMSO) to 28% B over 20 or 40 min, then from 28% B to 40% B over 5 min, and then from 40% B to 100% B over 2 min; the composition was held at 100% B for 3 min, dropped to 3% B over 5 min, and held at 3% B for another 10 min. For 45-min gradients, the Lumos mass spectrometer was operated in a data-dependent mode, automatically switching between the acquisition of a single Orbitrap MS scan (240,000 resolution) every 3 s and MS2 events. For each ion selected, dissociation parameters were as follows. For collision-induced dissociation (CID), Fourier transform MS (FTMS) was at 15,000 resolution, maximum fill time was 100 ms, and automatic gain control (AGC) was 2 × 10^5. For higher-energy collisional dissociation (HCD), FTMS was at 15,000 resolution, maximum fill time was 120 ms, normalized collision energy was 35, and AGC was 2 × 10^5. For electron transfer-higher-energy collision dissociation (EThcD), FTMS was at 15,000 resolution, maximum fill time was 120 ms, supplementary activation was 15%, and AGC was 2 × 10^5.
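The gradient described above can be written as a list of linear segments, which makes it easy to check that the steps add up to the stated 45- and 65-min run times. The Python sketch below is a minimal illustration under that reading of the text; the function names `build_gradient`, `total_minutes`, and `percent_b` are hypothetical, and each step is assumed to be a linear ramp or a hold.

```python
def build_gradient(first_ramp_min):
    """Return gradient segments as (duration_min, start_%B, end_%B).

    first_ramp_min is 20 for the 45-min gradient and 40 for the 65-min one.
    """
    return [
        (first_ramp_min, 1.0, 28.0),  # 1% B to 28% B
        (5, 28.0, 40.0),              # 28% B to 40% B
        (2, 40.0, 100.0),             # 40% B to 100% B
        (3, 100.0, 100.0),            # hold at 100% B
        (5, 100.0, 3.0),              # drop to 3% B
        (10, 3.0, 3.0),               # hold at 3% B
    ]


def total_minutes(segments):
    """Sum the durations of all segments."""
    return sum(duration for duration, _, _ in segments)


def percent_b(segments, t):
    """Linearly interpolate the %B composition at time t (minutes)."""
    if t < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in segments:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")


if __name__ == "__main__":
    for ramp in (20, 40):
        segments = build_gradient(ramp)
        print(ramp, total_minutes(segments), percent_b(segments, 10))
```

With a 20-min first ramp the segments sum to 45 min, and with a 40-min first ramp they sum to 65 min, matching the two gradient lengths given in the text.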
For 65-min gradients, the Lumos mass spectrometer was operated in a data-dependent mode, automatically switching between the acquisition of a single Orbitrap MS scan (120,000 resolution) every 3 s and MS2 [...]

Mass spectrometry data analysis. The assessment of the protein coverage within CpaA-digested bands and the identification of mucin glycopeptides were accomplished using MaxQuant (v1.5.3.30) (60). Searches were performed against custom databases populated with the sequences of the recombinant proteins of interest, with carbamidomethylation of cysteine set as a fixed modification. Searches were performed with semitrypsin cleavage specificity, allowing 2 miscleavage events. For the identification of glycopeptides, multiple searches were performed on each sample, allowing oxidation of methionine and a maximum of three glycan variable modifications using (i) HexNAc (S/T), HexHexNAc (S/T), HexNAcHexNAc (S/T) or (ii) HexNAc (S/T), Hex(1)HexNAc(1)NeuAc(1) (S/T), Hex(2)HexNAc(1)NeuAc(1) (S/T). The precursor mass tolerance was set to 20 ppm for the first search and 10 ppm for the main search, with a maximum false discovery rate (FDR) of 1.0% set for protein and peptide identifications. The resulting protein group output was processed within the Perseus (v1.4.0.6) (60) analysis environment to remove reverse matches and common protein contaminants. Semitryptic peptides and semitryptic glycopeptides were manually inspected for correctness. Annotation of MS-MS provided within the supplementary data was undertaken using the Interactive Peptide Spectral Annotator (60).
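Two ideas in the search settings above lend themselves to a small worked example: what makes a peptide semitryptic, and what a precursor tolerance in ppm means. The Python sketch below is a simplified illustration and not a reimplementation of MaxQuant; the helper names and the toy protein sequence are hypothetical, and the trypsin rule used here (cleavage after Lys or Arg) ignores refinements such as reduced cleavage before Pro.

```python
def classify_peptide(protein, peptide):
    """Classify a peptide as 'tryptic', 'semitryptic', or 'nontryptic'.

    A terminus counts as tryptic if it coincides with a protein terminus
    or follows the simplified rule of cleavage after K or R.
    Only the first occurrence of the peptide in the protein is considered.
    """
    if not peptide:
        raise ValueError("peptide must not be empty")
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    c_term_ok = end == len(protein) or peptide[-1] in "KR"
    if n_term_ok and c_term_ok:
        return "tryptic"
    if n_term_ok or c_term_ok:
        return "semitryptic"
    return "nontryptic"


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tolerance_ppm):
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm


if __name__ == "__main__":
    toy_protein = "MKAAPSGGRTT"  # hypothetical sequence with one P-S site
    print(classify_peptide(toy_protein, "AAP"))      # ends at Pro (P1)
    print(classify_peptide(toy_protein, "SGGR"))     # starts at Ser (P1')
    print(classify_peptide(toy_protein, "AAPSGGR"))  # uncleaved tryptic peptide
    print(within_tolerance(1000.005, 1000.000, 10))
```

In this toy case, cleavage between Pro and Ser turns one fully tryptic peptide into two semitryptic ones, which is why a semitrypsin search can recover peptides with one terminus generated by CpaA.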
Following a previously described protocol (22), the crystal structure of CpaA was prepared by adding unresolved side chains and hydrogens as well as capping termini with acetyl or N-methyl (N-Me) groups. Notably, the crystallographic chaperone protein CpaB was not included during this study. This protein arranges its C-terminal tail into the CpaA catalytic site, similar to zymogens of related metallopeptidases (62), and we assume that this portion of CpaB is displaced by substrate.
The glycopeptides were docked into the CpaA model in three steps: conformational search, virtual screen, and minimization. This process allowed exploration of all reasonable conformations of each glycopeptide prior to induced-fit docking with the CpaA model. Each glycopeptide underwent a conformational search using the Amber10:EHT force field (65) to generate a corresponding library of conformers. During this step, the side chains of the peptide and the pendant groups of each sugar were allowed to freely rotate; any sialic acid moieties were also allowed to move freely. All other atoms were fixed. This process generated small libraries of approximately 4,000 conformers for each glycopeptide species. The conformers of each glycopeptide library underwent virtual screening with the CpaA model, again using the Amber10:EHT force field. During this process, all atoms of the enzyme and each glycopeptide were fixed, and the docking score for each resulting complex was calculated using the GBVI/WSA dG scoring function (66). Finally, the top 10 complexes identified from the virtual screening were subsequently minimized using the Amber10:EHT force field. The glycopeptide substrate, the catalytic zinc ion, and residues of the CpaA model having atoms within 10 Å of the substrate were allowed to move; all other residues were fixed, and solvent molecules were omitted. The best, most consistent complexes are shown.
This screening and minimization process was repeated for the Galβ1-3GalNAcα1-modified glycopeptide conformer library with the mutant CpaA models CpaAW493F, CpaAW493L, and CpaAW493A. In all cases, the dihedral angles of the peptide backbones (φ, ψ) and side chains (χ) as well as the initial glycosidic linkages (φ, ψ) of the docked substrates were measured to ensure proper geometry.
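Two bookkeeping steps in the docking workflow, keeping the top 10 scored complexes and choosing which residues are free to move during minimization, can be sketched in a few lines of Python. This is an illustration only and does not reproduce the modeling software used here: the function names, the data layout, and the toy coordinates are hypothetical, and the sketch assumes that a lower (more negative) docking score indicates a better-scoring complex.

```python
import heapq
import math


def top_scoring(complexes, n=10):
    """Return the n complexes with the lowest scores.

    complexes -- iterable of (conformer_id, score) pairs
    Assumption: lower (more negative) scores are better.
    """
    return heapq.nsmallest(n, complexes, key=lambda item: item[1])


def residues_within(residue_atoms, substrate_atoms, cutoff=10.0):
    """Return names of residues having any atom within cutoff (in angstroms)
    of any substrate atom.

    residue_atoms   -- dict mapping residue name to a list of (x, y, z)
    substrate_atoms -- list of (x, y, z) coordinates
    """
    selected = []
    for name, atoms in residue_atoms.items():
        if any(math.dist(a, s) <= cutoff for a in atoms for s in substrate_atoms):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Toy scores and coordinates, invented for illustration.
    scores = [("conf_a", -5.0), ("conf_b", -7.5), ("conf_c", -6.1)]
    print(top_scoring(scores, n=2))

    residues = {
        "W493": [(0.0, 0.0, 0.0)],
        "N551": [(9.0, 0.0, 0.0)],
        "far_residue": [(20.0, 0.0, 0.0), (30.0, 0.0, 0.0)],
    }
    substrate = [(0.0, 0.0, 1.0)]
    print(residues_within(residues, substrate))
```

Selecting by "any atom within the cutoff" follows the wording of the minimization step, in which residues having atoms within 10 Å of the substrate were allowed to move.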
Data availability. The mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium database via the PRIDE (67) partner repository with the data set identifier PXD019941.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.