Asparagine-Linked Glycans of Cryptosporidium parvum Contain a Single Long Arm, Are Barely Processed in the Endoplasmic Reticulum (ER) or Golgi, and Show a Strong Bias for Sites with Threonine*

Cryptosporidium parvum causes severe diarrhea in infants in developing countries and in immunosuppressed persons, including those with AIDS. We are interested in the Asn-linked glycans (N-glycans) of C. parvum, because (1) the N-glycan precursor is predicted to contain five mannose and two glucose residues on a single long arm versus nine mannose and three glucose residues on the three-armed structure common in host N-glycans, (2) C. parvum is a rare eukaryote that lacks the machinery for N-glycan-dependent quality control of protein folding in the lumen of the Endoplasmic Reticulum (ER), and (3) ER and Golgi mannosidases, as well as glycosyltransferases that build complex N-glycans, are absent from the predicted proteome. The C. parvum N-glycans reported here, which were determined using a combination of collision-induced dissociation and electronic excitation dissociation, contain a single, unprocessed mannose arm ± terminal glucose on the trimannosyl chitobiose core. Upon nanoUPLC-MS/MS separation and analysis of the C. parvum tryptic peptides, the total ion and extracted oxonium ion chromatograms delineated 32 peptides with occupied N-glycan sites; these were derived from 16 glycoproteins. Although the number of potential N-glycan sites with Thr (NxT) is only about twice that with Ser (NxS), almost 90% of the occupied N-glycan sites contain NxT. The two most abundant C. parvum proteins modified with N-glycans were an immunodominant antigen on the surface of sporozoites (gp900) and the possible oocyst wall protein 1 (POWP1). Seven other glycoproteins with N-glycans were unique to C. parvum; five shared common ancestry with other apicomplexans; two glycoproteins shared common ancestry with many organisms. In summary, C. parvum N-glycans are remarkable for the absence of ER and Golgi modification and for the strong bias toward occupancy of N-glycan motifs containing Thr.

and Golgi (22). C. parvum sporozoites label with cyanovirin-N, an anti-retroviral lectin that binds to the high mannose N-glycans of gp120 in HIV (23,24). C. parvum is a rare eukaryote that lacks the machinery for N-glycan-dependent quality control of protein folding, and there is no positive selection for N-glycan sites in secreted proteins of C. parvum (25-27). Antigenic proteins on the surface of Cryptosporidium sporozoites (e.g. gp900 and gp40/gp15), oocyst wall proteins (COWPs), and possible oocyst wall proteins (POWPs) are glycoproteins with numerous predicted N-glycan sites (24, 28-40). Finally, Concanavalin A, which binds some N-glycans and other mannose-containing structures, recognizes numerous C. parvum antigens, whereas release of N-glycans reduces binding of immune sera to parasite proteins on Western blots (41).
Here we used tandem mass spectrometry to identify the proteins that contain N-glycans and to determine the structures of N-glycans released with PNGase F. We found that the N-glycans of C. parvum contain a single long arm, are barely processed in the ER or Golgi, and show an extreme bias for sequons with threonine.

EXPERIMENTAL PROCEDURES
Parasites and Reagents-C. parvum oocysts were purchased from Bunch Grass Farm (Deary, ID) and handled under BSL-2 protocols approved by the Boston University Institutional Biosafety Committee. All chemicals and reagents, including proteomics grade trypsin, were obtained from Sigma-Aldrich (St. Louis, MO), unless otherwise stated. All solvents used for LC-MS were Fisher Scientific Optima™ grade (Thermo-Fisher Scientific, Waltham, MA). PNGase F was from New England Biolabs (Ipswich, MA).
Protein Extraction-Two distinct methods were used to extract proteins from whole C. parvum oocysts. The first method used a combination of mechanical disruption and detergent extraction. Briefly, 10^9 oocysts were concentrated by centrifugation at 1000 × g for 10 min at 4°C. The oocysts were resuspended in phosphate buffered saline (PBS) with EDTA-free cOmplete™ protease inhibitor (Roche, Basel, Switzerland). The oocysts were broken using 0.5-mm glass beads with 4 × 5 min cycles of vigorous bead beating at 4°C. Samples were placed in an ice bath between cycles to mitigate any heating effect. Proteins were extracted using a buffer containing protease inhibitor (10 mM HEPES, 25 mM KCl, 1 mM CaCl2, 10 mM MgCl2, 2% CHAPS, 6 M guanidine HCl, 50 mM dithiothreitol (DTT), pH 7.4). Insoluble material was removed by centrifugation at 21,130 × g for 5 min at 4°C in an Eppendorf (Hamburg, Germany) 5424R microcentrifuge. The supernatant was removed and added to a new microcentrifuge tube; proteins were precipitated by the addition of -20°C acetone (acetone/sample v/v 8:1) and the tube was allowed to sit undisturbed for ≥18 h at -80°C. The proteins were concentrated by centrifugation at 21,130 × g for 20 min at 4°C. The supernatant was discarded, and the pellet was washed 3x with ice-cold acetone. Any remaining solvent was removed in an unheated SpeedVac Plus speed vacuum (Savant, Thermo-Fisher Scientific).
The second, chemical method used hot phenol to kill and extract total proteins from 10^9 C. parvum oocysts (42,43). C. parvum oocysts were pelleted by centrifugation, resuspended in 500 µl of distilled water, and added to a conical vial containing 1 ml of phenol, preheated to 68°C in a heating block filled with sand. The vial was sealed, and the contents mixed by inversion every 2 min for 20 min. The vial was removed, placed on ice, and gently centrifuged to facilitate good phase separation. The aqueous layer was removed and discarded. The interphase and phenol layers were carefully separated and saved. The proteins were subsequently precipitated from the phenol and interphase layers by the addition of eight volumes of -20°C MeOH containing 100 mM NH4OAc, and allowed to sit undisturbed for ≥18 h at -20°C. The precipitated proteins were concentrated by centrifugation, and pellets were washed 3x with -20°C MeOH/0.1 M NH4OAc prior to lyophilization.
Trypsin Digestions-Three sets of samples were prepared for proteomics experiments. The fraction obtained from the mechanical extraction is referred to as "CHAPS" in the analysis. Two fractions from the chemical extraction method came from the phenol layer (referred to as "phenol") and the interphase layer (referred to as "interphase"). Precipitated proteins from these three samples were dissolved in 50 mM NH4HCO3, pH 8.0, reduced with 50 mM DTT for 20 min at 60°C, cooled to RT, and then alkylated with iodoacetamide (IAA) for 20 min at RT, while protected from light. Excess IAA was quenched with DTT, and peptides were generated by digestion with proteomics grade trypsin, overnight at 37°C (1:20, w/w). The resulting tryptic peptides were dried by speed vacuum and desalted with C18 ZipTip concentrators (EMD Millipore, Danvers, MA), according to the manufacturer's protocol.
Release and Processing of N-glycans-N-glycans were released from total protein isolated from oocysts (100 µg) by overnight treatment at 37°C with ten units of glycerol-free PNGase F (New England Biolabs), according to the manufacturer's instructions, without the addition of Nonidet P-40. The product mixture was lyophilized, and the released N-glycans were separated from the proteins by addition of 0.1% trifluoroacetic acid (TFA) in LC-MS grade water. The aqueous phase was passed onto C-18 Sep-Pak cartridges (Waters Corporation, Milford, MA). The cartridges were washed with three bed volumes of 0.1% TFA/water, and the eluents were pooled and lyophilized. The N-glycan pool was reduced with 0.5 M NaBD4/2 M NH3 (aq) overnight at 55°C. The reaction was quenched by dropwise addition of glacial acetic acid. The products were washed multiple times with 10% acetic acid/MeOH, dried with a gentle stream of nitrogen, and washed again multiple times with 100% MeOH.
Permethylation was performed by published methods (44,45). Briefly, a slurry of finely ground NaOH in dimethyl sulfoxide was added to the deutero-reduced sample; the suspension was mixed and methyl iodide was added. The solution was gently mixed at RT for 1 h. To ensure complete derivatization, the process was repeated three times. The product was isolated by extraction with water/chloroform, and the chloroform layer was dried in the SpeedVac.
MALDI-TOF MS-The purified deutero-reduced sample was dissolved in 20 µl of 1:1 MeOH/water, and 0.5 µl of this solution was spotted onto a stainless steel MALDI target with 2,5-dihydroxybenzoic acid as the matrix. The mass spectra were recorded with an ultrafleXtreme MALDI-TOF/TOF MS (Bruker Daltonics, Bremen, Germany) equipped with a smartbeam II Nd-YAG laser (355 nm, 3 ns, 2 kHz). Each spectrum was acquired by summing the signals recorded after 500 shots from each of 10 locations within the sample spot.
Electronic Excitation Dissociation (EED) Fourier Transform-Ion Cyclotron Resonance (FT-ICR) MS/MS-The released, deutero-reduced, and permethylated N-glycans were dried and re-suspended in 10 µl of 50% MeOH, 20 µM sodium acetate. The solution was loaded into a pulled glass capillary tube and directly infused into the ion source of a SolariX 12-T hybrid Qh-FT-ICR mass spectrometer (Bruker Daltonics), using a nano-ESI source. Each [M + Na]+ parent ion was isolated by the quadrupole and accumulated in the collision cell for 8 s. The accumulated ions were then transferred into the ICR cell.
Fragmentation by EED was achieved via 14-eV electrons generated from a cathode source heated with 1.5 A current. Electron density and energy were modulated using the following parameters: bias, 14 V; ECD lens, -13.85 V; pulse width, 1.0 s. For each spectrum, 80 transients were averaged.
LC-MS/MS-The dried and desalted peptides were reconstituted in 2% ACN, 0.1% formic acid (FA) and separated on a NanoAcquity Ultra Performance Liquid Chromatography (UPLC) system (Waters), fitted with a nanoAcquity Symmetry C18 trap column (5-µm packing, 180 µm × 20 mm) and a BEH130C18 analytical column (1.7-µm packing, 150 µm × 10 cm). Mobile phase A was 99:1:0.1 HPLC grade water/ACN/FA, and mobile phase B was 99:1:0.1 ACN/HPLC grade water/FA. Each sample was loaded on the trapping column for 4 min at 4 µl/min flow rate and then separated on the analytical column using a 45 or 90 min 2-40% mobile phase B linear gradient at 0.5 µl/min flow rate. The column was washed between runs and equilibrated for 30 min.
The analytical column was coupled to a TriVersa NanoMate ion source (Advion, Ithaca, NY), and the ions were introduced into either an LTQ-Orbitrap-XL-ETD or a QE Plus mass spectrometer (both from Thermo-Fisher Scientific), which was operated in the positive-ion mode. MS spectra were obtained by scanning over the range m/z 350-2000. MS2 HCD spectra were acquired by isolating the top 5 (LTQ-Orbitrap) or top 20 (QE+) precursor ions with a 2-m/z window and fragmenting the selected precursor ions with 27, 35, or 45 V HCD energy. The MS2 HCD spectra were scanned from m/z 100 to a value that was dependent upon the parent ion.
Manual Interpretation of Glycopeptide MS/MS Spectra-Raw data files from LC-MS/MS experiments were manually interpreted using Qual Browser in the Xcalibur 2.2 software suite (Thermo-Fisher Scientific). HCD MS2 spectra containing oxonium ions were manually interpreted to determine the peptide sequence and the linear arrangement of the glycan. The y1 ion, corresponding to the residue lysine (K) or arginine (R), was used as the starting point for most of the manually interpreted spectra. The resulting peptide tag was then searched using the online NCBI BLASTP algorithm (https://blast.ncbi.nlm.nih.gov/Blast.cgi) against the predicted C. parvum proteome and the entire nr database (17-19). When a match was found, we determined the mass difference between the predicted trypsin-generated peptide [M + H]+ and that of the precursor, converted to [M + H]+. The glycosidic bond fragment series, typically accounting for the most abundant peaks in the spectra, were sequenced in a similar manner, so far as each series could be followed. Missing residues were accounted for by calculating the difference between the highest member of the assigned series and the total observed molecular weight. Extracted ion chromatograms were generated to aid in the assignment of the numerous glycoconjugates.
LC-MS/MS Proteomics Database Search and Analysis-Once the possible N-glycoforms were discovered from the manual interpretation, these values could be used as possible dynamic modifications in searches against the predicted C. parvum proteome. Database searches were performed using the PEAKS software suite version 7.5 (Bioinformatics Solutions Inc., Waterloo, ON, Canada). The following parameters were set for the search. The data refinement step corrected for the precursor m/z. For the PEAKSdenovo search stage, trypsin was specified as the enzyme, with 8.0 ppm parent mass error tolerance and 0.05 Da fragment mass error tolerance; carbamidomethyl cysteine was set as a fixed modification, and possible dynamic modifications included methionine oxidation, HexNAc at serine/threonine, and Hex6HexNAc2 and Hex5HexNAc2 on Asn. A maximum of five dynamic modifications was specified. The PEAKSDB search stage was identical to the PEAKSdenovo stage, with the exception that up to three missed trypsin cleavages were allowed, with the possibility of one nonspecific cleavage. Searches were performed against the C. parvum Iowa-II predicted proteome release-5.0 obtained from the Cryptosporidium Genome Resource (cryptodb.org), which contained 3803 entries (18,19). False discovery rate (FDR) estimation was enabled. For the final PEAKSPTM stage, the de novo score average local confidence (ALC) threshold was 15 and the peptide hit threshold (-10 logP) was set to 30. All possible Unimod modifications were considered for this stage. The PEAKSPTM report was exported as an mzIdentML file with the FDR set to 5%, ALC 50% for de novo only, and proteins with a score of (-10 logP) ≥ 20 containing unique peptides ≥ 2. Each data file was analyzed individually for all samples and replicates.
Scaffold Analysis-The mzIdentML files from the PEAKSPTM searches were imported into the computer program Scaffold version 4.6 for further analysis (Proteome Software, Inc., Portland, Oregon). Three "Biosamples" and two "categories" were specified for the samples. The two categories corresponded to the method of protein extraction, either "mechanical" or "chemical." The sample names correspond to the subsample classification: "CHAPS" was the portion of the mechanically broken sample that was soluble in the 2% CHAPS extraction buffer, and "phenol" and "interphase" correspond to the phenol and interphase layers from the chemical extraction procedure. Each sample was analyzed independently, with experiment-wide grouping and protein clustering. The probability model used was Peptide Prophet with delta mass correction. All spectra that were assigned by the software as possible N-glycosylated peptides were manually reviewed for quality and proper assignment to compile the final lists of glycopeptides and proteins that are available in supplemental table Excel S3.
Analysis of N-glycosylation Sites-For each protein observed to be N-glycosylated, the lists of occupied and total potential N-linked sites were compared. The "occupied" data set was created from the list of peptides modified with an N-glycan, taking for each a nine-amino acid window, centered on the modified asparagine. The same window was taken for all tryptic peptides that contained a canonical N-glycosylation sequon (Asn-Xxx-Ser/Thr, Xxx ≠ Pro) that could theoretically be generated from the group of observed glycoproteins. The program WebLogo v3.5.0 from the Department of Plant and Microbial Biology, University of California, Berkeley, was used to generate logos (46).
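The window extraction described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure used in the study: the function name `sequon_windows`, the padding character, and the toy sequence are all hypothetical, and the sketch scans a whole protein sequence rather than individual tryptic peptides.

```python
import re

# Canonical N-glycosylation sequon: Asn-Xxx-Ser/Thr with Xxx != Pro.
# The lookahead keeps the match zero-width after the Asn, so
# overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_windows(protein, flank=4, pad="-"):
    """Return (position, window, hydroxy residue) for every sequon.

    position is the 1-based index of the Asn; window holds 2*flank+1
    residues centered on that Asn (nine for flank=4), padded with `pad`
    when the site lies near a terminus.
    """
    padded = pad * flank + protein + pad * flank
    sites = []
    for match in SEQUON.finditer(protein):
        i = match.start()
        window = padded[i:i + 2 * flank + 1]
        sites.append((i + 1, window, protein[i + 2]))
    return sites

# Hypothetical toy sequence (not a C. parvum protein), for illustration only.
toy = "MKNSTTEVRAANPSGGNVSK"
sites = sequon_windows(toy)
for pos, window, hydroxy in sites:
    print(pos, window, "NxT" if hydroxy == "T" else "NxS")
```

In the toy sequence the Asn followed by Pro is skipped, as the Xxx ≠ Pro rule requires. Windows collected this way for occupied and for all potential sites are the kind of input a sequence logo program expects.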
Analysis of Released N-glycans-To assist in the interpretation of the MS/MS spectra, we used the software GlycoWorkBench 2.1 (release 146) to generate theoretical fragmentation lists. Additional theoretical m/z values were generated using Microsoft Excel. Observed and theoretical peak lists were compared to obtain the best match. Assignments within 1-ppm error were considered to be a likely match. In the event that there were isobaric ion values, annotations were preferentially assigned to the ion that would be generated from a single fragmentation event. A single cross-ring fragment in combination with one or more glycosidic cleavages was considered only if the simple glycosidic bond fragment was also observed within the spectrum. All annotations were assigned only after a thorough manual review of the spectrum using Bruker DataAnalysis software suite version 4.0 SP5 build 283. Manual inspection helped to assign ions that did not fit the list of theoretical values for expected cleavages.
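The 1-ppm matching criterion can be illustrated with a short Python sketch. The function names, fragment labels, and m/z values below are hypothetical and are not taken from the spectra in this study; the sketch simply assigns each observed peak to the nearest theoretical value and keeps the assignment only if it falls within tolerance.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def match_peaks(observed_mz, theoretical, tol_ppm=1.0):
    """Assign each observed m/z to the closest theoretical fragment.

    `theoretical` maps a fragment label to its calculated m/z.
    Returns (observed m/z, label, ppm error) for peaks within tol_ppm;
    peaks outside the tolerance are left unassigned.
    """
    assignments = []
    for mz in observed_mz:
        label, calc = min(theoretical.items(), key=lambda kv: abs(mz - kv[1]))
        err = ppm_error(mz, calc)
        if abs(err) <= tol_ppm:
            assignments.append((mz, label, err))
    return assignments

# Hypothetical values for illustration only.
theory = {"fragment_a": 500.0000, "fragment_b": 750.0000}
peaks = [500.0003, 750.0030, 612.3456]
matches = match_peaks(peaks, theory)
for mz, label, err in matches:
    print(f"{mz:.4f}  {label}  {err:+.2f} ppm")
```

A sketch like this does not resolve isobaric candidates; the preference for single fragmentation events and the manual review described above would still be needed.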
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (74) partner repository with the data set identifier PXD005503 and 10.6019/PXD005503. http://www.ebi.ac.uk/pride/archive/projects/PXD005503.

RESULTS
N-glycans of C. parvum Are Much Simpler Than Those of the Host and Most Other Parasites-The predicted N-glycan precursor of C. parvum, based upon its Alg enzymes, is Glc2Man5GlcNAc2 (supplemental table Excel S1) (16,20,21,27). MALDI-TOF MS of N-glycans that had been deutero-reduced and permethylated after being released by PNGase F from oocyst glycoproteins showed Hex6HexNAc2 ([M + Na]+ m/z 1800.906) to be the most abundant form, whereas Hex5HexNAc2 ([M + Na]+ m/z 1596.805) is less abundant (Fig. 1). The C. parvum N-glycans are much simpler than those of calf glycoproteins, suggesting oocysts, which are washed with PBS and purified on a CsCl gradient, are clean of host tissues (56). The released N-glycans also match those present on glycopeptides that were separated by reversed phase C18 nanoflow chromatography, and identified by manual interpretation of HCD MS/MS spectra (see below).
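As a consistency check on the two compositions, the theoretical [M + Na]+ values for deutero-reduced, permethylated glycans can be computed from elemental masses. The sketch below is an illustration and not part of the original analysis; it assumes standard monoisotopic masses and the usual residue formulas for permethylated hexose and N-acetylhexosamine, and the function name `sodiated_mz` is hypothetical.

```python
# Monoisotopic masses (Da); standard values, rounded to six decimals.
H, D, C, N, O = 1.007825, 2.014102, 12.000000, 14.003074, 15.994915
NA, ELECTRON = 22.989770, 0.000549

# Permethylated residues: Hex = C9H16O5, HexNAc = C11H19NO5.
HEX_ME = 9 * C + 16 * H + 5 * O
HEXNAC_ME = 11 * C + 19 * H + N + 5 * O

# End groups of a permethylated, NaBD4-reduced glycan (alditol):
# CH3 plus OCH3 as for the free glycan, one extra CH2 for the additional
# methylation site on the opened ring, and the H and D added by reduction.
END_GROUPS = (2 * C + 6 * H + O) + (C + 2 * H) + H + D

def sodiated_mz(n_hex, n_hexnac):
    """Theoretical [M + Na]+ m/z of a deutero-reduced, permethylated glycan."""
    neutral = n_hex * HEX_ME + n_hexnac * HEXNAC_ME + END_GROUPS
    return neutral + NA - ELECTRON

for n_hex, observed in [(6, 1800.906), (5, 1596.805)]:
    calc = sodiated_mz(n_hex, 2)
    print(f"Hex{n_hex}HexNAc2: calc {calc:.3f}, observed {observed:.3f}, "
          f"delta {observed - calc:+.3f}")
```

Under these assumptions the calculated values fall within a few hundredths of a dalton of the reported MALDI-TOF values, and the spacing between the two glycoforms equals one permethylated hexose.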
EED FT-ICR MS/MS was performed on the deutero-reduced and permethylated Hex6HexNAc2 in order to generate the glycosidic fragments that provide topographic information (Fig. 2) and the cross-ring fragments that provide linkage information (Fig. 3).
The glycan topology is indicated by complete glycosidic bond fragmentation, dominated by the non-reducing end (Cn-2H) series; in addition, reducing end glycosidic bond fragments of the (Yn-2H) and Zn series are prominent (Fig. 2). The single long arm topology is suggested by the sequential Z3α to Z6α ions, and the parallel sequential (Y3α-2H) to (Y6α-2H) series, where 1-4 hexoses are attached without branching. If branching were present in this tetrasaccharide moiety, there would be a gap in the linear series, and double glycosidic bond fragments might be observed along the chain. Instead, the only double glycosidic bond fragment series, (Z3α/Z3β + 2H) through (Z6α/Z3β + 2H), corresponds to cleavages involving the short arm (Fig. 3, supplemental table Excel S1). The (Z3α/Z3β + 2H) ion indicates there is no branching from the chitobiose core. The remaining double glycosidic fragments of the (Zn/Z3β) series, where Zn is Z4α to Z6α, show loss of the single hexose Z3β branch, with no branch points down the long arm (Fig. 3, supplemental table Excel S1).
The EED spectrum also allowed assignment of the linkage positions, as shown in Fig. 3. The key cross-ring fragment 0,4A5 and its paired reducing end fragment (0,4X2-2H) show that the short arm has a single hexose attached via a 1,6-linkage to the central hexose. The observation of 3,5A5 and 0,3A5 ions supports this assignment. The ion pair (1,3A5 and 1,3X2) indicates that the longer arm, containing four hexose residues, is attached at the 2 or 3 position to this central hexose. Observation of the 0,2X2 ion eliminates the 2 position as the linkage site, thus narrowing the possibility for the long arm linkage to position 3. Although the 1,3A5 and 1,3X2 ions are isobaric to the 2,4A5 and 2,4X2 ions and these might allow assignment of the linkage to the 3 or 4 position, the presence of the 3,5A5 ion rules out linkage at the 4 position. In sum, these cross-ring fragments indicate that the long arm containing four hexoses is attached by a 1,3-linkage to the central mannose.
The observation of 1,3A4α and 1,3A3α ions suggests that the second and third hexoses on the long arm are either 2- or 3-linked. 0,2A3α or 0,2A4α fragments would be expected if these hexoses were linked at the 3 position; their absence suggests that the links are on the 2 position. The presence of an internal fragment 1,3X5α/B5 indicates that the terminal hexose is 1,3-linked, and this assignment is confirmed by the observation of the 0,2A2α-2H ion, which rules out the possibility that the terminal hexose is 2-linked (Fig. 3).
EED FT-ICR MS/MS was also performed on Hex5HexNAc2, which is the less abundant C. parvum N-glycan (supplemental Figs. S1 and S2, supplemental table Excel S1). The same topology as described for the aforementioned Hex6HexNAc2 (minus a terminal hexose on the long arm) is indicated by the complete (Cn-2H), Z, (Yn-2H), and (Znα/Z3β + 2H) series where n = 3, 4, or 5 (supplemental Fig. S1 and supplemental table Excel S1). A series of cross-ring fragments similar to those of the Hex6HexNAc2 glycan was observed for this glycoform. The observation of the ion pair 0,4A4 and (0,4X2-2H) indicates that a single hexose is linked at the 6 position; this assignment is further supported by the ions 3,5A4 and 0,3A4 (supplemental Fig. S2). The longer trihexose arm is linked at the 3 position, as indicated by the ion pair 1,3A4 and (1,3X2-2H), in conjunction with the 0,2X2 ion, ruling out the possibility of a 2 link for the isobaric pair 2,4A4 and (2,4X2-2H). Therefore, it can be concluded that a single hexose is linked 1,6 to the first hexose on the core with the trisaccharide series linked 1,3 to the same residue. These results suggest the less abundant C. parvum N-glycan is likely Man5GlcNAc2 and has a structure identical to Hex6HexNAc2 without the terminal 1,3-linked hexose on the long arm. Although these methods cannot differentiate between isobaric monosaccharides (e.g. mannose from glucose), they can accurately define the topology of the glycan and the linkages connecting each monosaccharide. The structures we have defined are consistent with the N-glycan structure that has been proposed on the basis of the presence or absence of the highly conserved N-glycosylation biosynthetic pathway enzymes identified in the C. parvum genome (supplemental table Excel S1 and supplemental Fig. S3). The topology of the Hex5HexNAc2 structure defined here, presumably Man5GlcNAc2, is different from the Man5GlcNAc2 glycoform produced when host Man9GlcNAc2 is processed by ER mannosidase 1 in higher organisms, which has 1,3 and 1,6 dimannosyl branches off the first 1,6-linked Man (16,25).

Molecular & Cellular Proteomics 16 Supplement 4 S45
[Figure legend fragment: The spectrum is labeled only with glycosidic fragments that indicate the topology of the glycoform, revealing a single long arm and an unmodified core. All assignments can be viewed in supplemental table Excel S1.]

Confident Assignment of Cryptosporidium parvum N-glycosylated Peptides-To rule out the possibility of host cell contamination that could occur because C. parvum is an obligate intracellular parasite, the MS/MS spectra of glycopeptides from a whole oocyst lysate were manually interpreted. Peptides generated from a trypsin digestion of proteins isolated from oocysts were separated on a reversed phase-C18 nanoflow column interfaced to a mass spectrometer, as described in the methods section. Extracted oxonium ion chromatograms for m/z 204.08 (HexNAc) and m/z 366.13 (HexNAc-Hex) were very abundant throughout the MS/MS spectra recorded across the HPLC separation (Fig. 4A). These XICs indicated which spectra should be manually interpreted to obtain sequence information on both glycan and peptide. A representative HCD spectrum for [M + 2H]2+ m/z 1092.9426 eluting at 15 min, marked by the caret in Fig. 4A, is interpreted in Figs. 4B and 4C. Ions containing sequential glycosidic bond fragments dominate the spectrum and provide the linear sequence for Hex6HexNAc2 (Fig. 4B). The complete y-series is interpretable, starting from y1 m/z 175.1189, to the full-length peptide, observed at m/z 806.3992, thus revealing the peptide sequence NSTTEVR and indicating that the Hex6HexNAc2 was linked to Asn (Fig. 4C). Checking the peptide sequence against the NCBI nr and C. parvum proteome databases, utilizing the blastp algorithm, revealed that the peptide belongs to the C. parvum protein POWP1 (Table I). In summary, the linear glycan sequence, conjugation site, and complete peptide sequence can all be determined from a single spectrum.
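The mass bookkeeping behind this assignment can be retraced numerically. The following Python sketch is an illustration, not the authors' workflow: it assumes standard monoisotopic residue masses, includes only the residues needed for NSTTEVR, and uses hypothetical helper names. It converts the doubly charged precursor to [M + H]+, subtracts the calculated peptide [M + H]+, and compares the difference with Hex(n)HexNAc2 compositions.

```python
# Monoisotopic masses (Da); standard values.
PROTON, WATER = 1.007276, 18.010565
# Residue masses for the amino acids present in NSTTEVR only.
RESIDUE = {"N": 114.042927, "S": 87.032028, "T": 101.047679,
           "E": 129.042593, "V": 99.068414, "R": 156.101111}
HEX, HEXNAC = 162.052824, 203.079373  # underivatized glycan residues

def peptide_mh(sequence):
    """[M + H]+ of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER + PROTON

def to_mh(mz, charge):
    """Convert an m/z observed at a given charge to [M + H]+."""
    return mz * charge - (charge - 1) * PROTON

def best_glycan(delta, max_hex=9, n_hexnac=2):
    """Hex(n)HexNAc2 composition whose residue mass is closest to delta."""
    masses = {n: n * HEX + n_hexnac * HEXNAC for n in range(max_hex + 1)}
    n = min(masses, key=lambda k: abs(masses[k] - delta))
    return n, masses[n]

precursor_mh = to_mh(1092.9426, 2)   # [M + 2H]2+ precursor from Fig. 4
delta = precursor_mh - peptide_mh("NSTTEVR")
n_hex, glycan_mass = best_glycan(delta)
print(f"peptide [M + H]+ = {peptide_mh('NSTTEVR'):.4f}")
print(f"delta = {delta:.4f} Da, closest: Hex{n_hex}HexNAc2 ({glycan_mass:.4f} Da)")
```

With these inputs the calculated peptide mass agrees with the reported full-length y ion, and the residual mass is closest to six hexoses plus two N-acetylhexosamines, consistent with the assignment given in the text.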
This method of manual interpretation was continued systematically for the remaining spectra containing one or more oxonium ion(s). Many of the most abundant peptides mapped to the same protein, POWP1. Five N-glycosylation sites were mapped to this protein; all contain either Hex5HexNAc2 or Hex6HexNAc2 (Fig. 5 and Table I). Many of the MS/MS spectra assigned to glycopeptides contained Y1 and Y2 ions that arose via glycosidic cleavages adjacent to the HexNAc residues in the chitobiose core, with charge retention on the peptide fragment, but none of these MS/MS spectra contained (Y1 + 146) or (Y1 + 162) ions that would indicate the presence of a deoxyhexose or hexose branch on the inner HexNAc residue. No spectra indicated the presence of glycopeptides that did not originate from C. parvum. This result demonstrated that the preparation was free of contaminating host material, and assured that the released N-glycans are of parasite origin, as the results had already suggested (Figs. 2, 3, supplemental Figs. S1, S2). The manual interpretations of the glycopeptide spectra are consistent with the results obtained by analysis of the released glycans and underscore the very limited repertoire of N-glycans in this organism. This information could then be applied to perform semiautomated database searches to dig deeper into the spectra, allowing for faster processing of replicate samples.
[Figure legend fragment: The spectrum is labeled with only the informative cross ring fragments that provide linkage information. All assignments can be viewed in supplemental table Excel S1.]
Like gp900, two Cryptosporidium N-linked glycoproteins that are also unique (UCG1 and UCG2) contain long runs of Thr, which are likely modified by O-linked GalNAc (30,35,57). Five other unique glycoproteins with occupied N-glycan sites (UCG3 to UCG7) remain uncharacterized. Other observed glycoproteins have analogs elsewhere in apicomplexa: the glideosome-associated protein (GAP50), three putative adhesion proteins with a Limulus coagulation factor C lectin (LCCL) domain (CCP1, CCP2, and FNPA), and a copper amine oxidase (CAO) are conserved throughout apicomplexa (17-19, 52, 51, 57-59). An O-GalNAc transferase 4 is present in apicomplexans and mammalian hosts, whereas GMC oxidoreductase (GMCO) is present in apicomplexans, metazoans, fungi, plants, and bacteria (56).
Fig. 5. Occupied N-glycosylation sites of POWP1 (cgd2_490). The most densely N-glycosylated protein is represented as a cartoon schematic. The occupied peptides are shown; the bold and italicized asparagine residues indicate the site of attachment of the glycan.
Nearly 90% of the Occupied N-glycan Sites Contain Thr Rather than Ser-N-glycans of C. parvum are not used for quality control of protein folding, and there is no positive selection for N-glycan sites in its secreted proteins (20,25,26). However, we observed a large difference in the rate of occupancy of potential N-glycosylation sequons. Despite the 5:3 ratio of Thr (100) to Ser (61) in the second position relative to Asn, the number of occupied N-glycan sites overwhelmingly (9:1) favors Thr (35) over Ser (4), as shown in the WebLogo in Fig. 6 and the data in Table I (46). Notably, only 11 total spectra were assigned to peptides with the asparagine modified in an Asn-Xxx-Ser sequon, compared with the 412 that correspond to N-glycosylation on the Asn-Xxx-Thr motif (Table I, supplemental table Excel S2).

DISCUSSION
ALG enzymes, which are required for the synthesis of N-glycan precursors, are reliable predictors of the types of N-glycans transferred to the nascent peptides, because these glycosyltransferases are constitutively expressed in the ER (16,21). Two peculiarities present themselves with regard to the ALG enzymes of C. parvum. First, the ALG13 peptide of the glycosyltransferase that adds the second GlcNAc to the pyrophosphate-linked precursor can easily be identified in Cryptosporidium muris and in all other organisms that make N-glycans, but ALG13 is absent from the predicted proteins of C. parvum and C. hominis (supplemental table Excel S1) (17-19, 52). Second, although the C. parvum N-glycan precursor is predicted to be Glc2Man5GlcNAc2, Hex7HexNAc2 was absent from the N-glycans released with PNGase F and from tryptic glycopeptides (Figs. 1 to 4 and Table I). This result suggests that ALG8, which adds the second glucose to the N-glycan precursor, is not active, or a glucose residue is rapidly removed by glucosidase 2 from Glc2Man5GlcNAc2 after it is transferred to the nascent peptide. In contrast, T. gondii, which has a predicted N-glycan precursor composed of Glc3Man5GlcNAc2, has been shown to have glycoproteins containing Hex8HexNAc2 and Hex7HexNAc2 (57).
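The sequon counts reported in the Results (100 potential and 35 occupied sites with Thr; 61 potential and 4 occupied sites with Ser) can be turned into occupancy rates and a rough significance estimate. The sketch below is an illustration only and is not an analysis performed in this study. It assumes that the occupied sites are a subset of the counted potential sites and that, under the null hypothesis, occupied sites are drawn at random from all sequons; the function name is hypothetical.

```python
from math import comb

# Counts taken from the Results: sequons by residue at the +2 position.
potential = {"Thr": 100, "Ser": 61}
occupied = {"Thr": 35, "Ser": 4}

def hypergeom_upper_tail(k, n_occ, n_thr, n_total):
    """P(at least k of n_occ occupied sites are NxT) under random sampling.

    This is the one-sided Fisher exact (hypergeometric) tail probability.
    """
    upper = min(n_occ, n_thr)
    favorable = sum(comb(n_thr, i) * comb(n_total - n_thr, n_occ - i)
                    for i in range(k, upper + 1))
    return favorable / comb(n_total, n_occ)

n_occ = sum(occupied.values())
n_total = sum(potential.values())
share_thr = occupied["Thr"] / n_occ
rate = {aa: occupied[aa] / potential[aa] for aa in potential}
p = hypergeom_upper_tail(occupied["Thr"], n_occ, potential["Thr"], n_total)

print(f"NxT share of occupied sites: {share_thr:.1%}")
print(f"occupancy rate: Thr {rate['Thr']:.1%}, Ser {rate['Ser']:.1%}")
print(f"one-sided hypergeometric p = {p:.2e}")
```

The share of occupied sites carrying Thr reproduces the "nearly 90%" figure, and the per-residue occupancy rates make explicit that the bias is not explained by the roughly 5:3 excess of NxT among potential sites.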
Although it is well-known that the oligosaccharyltransferase (OST) that adds N-glycans to the nascent peptide prefers N-glycan sites with Thr over those with Ser, such a strong bias for Thr as that observed here for occupied N-glycan sites of C. parvum has not previously been described, to our knowledge (26,60). The composition of the C. parvum OST, which includes the catalytic STT3 subunit and three non-catalytic subunits (supplemental table Excel S1), is similar to that found in other apicomplexans, whereas the OSTs of some parasites (e.g. Giardia and Trypanosoma) only contain STT3 (53). The binding of the anti-retroviral lectin cyanovirin-N to C. parvum strongly suggested that the parasite contains a high mannose N-glycan (23,24). Cyanovirin-N also binds to Entamoeba and Trichomonas, each of which builds its N-glycans from a precursor composed of Man5GlcNAc2 (61-63). The N-glycan profile of C. parvum differs from those of the other parasites in the relative abundance of GlcMan5GlcNAc2 and the absence of mannosidase products (Man4GlcNAc2 and Man3GlcNAc2) and/or hybrid and complex N-glycans, which contain LacNAc arms (Trichomonas) or galactose capped with Glc (Entamoeba) (63,64). Other parasites (Trypanosoma, Leishmania, and Acanthamoeba) and Dictyostelium have N-glycan precursors with three mannose arms and make numerous complex N-glycans that contain LacNAc, fucose, and xylose (65-68). Finally, the N-glycans of the mammalian hosts (mice, humans, cats, etc.) are much more complex than those of C. parvum, which are remarkable for their simplicity (56). Whether the high mannose N-glycans of C. parvum are involved in antigen masking and/or pathogenesis, as has been shown for high mannose N-glycans on gp120 of HIV and on HA of influenza virus, remains to be determined (69-71). It has been established that many of the C. parvum proteins that elicit a strong immune response are N-linked glycoproteins (35,41). Attempts have been made to develop vaccines from several of these glycoproteins; however, critical details such as which amino acids are modified and with what glycan structure(s) were left unanswered (72,73). The results we have presented here fill in the missing details regarding the N-glycosylation of the immunodominant antigen gp900, and we also expand upon the number of N-glycosylated proteins previously described in the literature. Of particular interest is the abundant and densely glycosylated protein POWP1. The function of POWP1 remains to be determined. These details may be crucial in providing a means of finally developing an effective, synthetic glycoprotein or glycopeptide-based vaccine against cryptosporidiosis.
Supplementary material is freely available for downloading from the MCP website. It contains the EED MS/MS spectrum of the Hex 5 HexNAc 2 glycan, glycosidic and cross-ring fragment assignments, a list of the C. parvum glycosyltransferases and a figure showing the predicted glycans of C. parvum. Excel files S1 and S2 contain the lists of fragment ion assignments. Excel file S3 contains the complete list of glycopeptides, proteins, and related bioinformatics data.
Acknowledgments-We thank Yi Pu and Cheng Lin for their assistance setting up the EED FT-ICR MS/MS experiment. Furthermore, Cheng Lin provided valuable feedback and questions pertaining to the interpretation of the EED spectra. Additionally, we thank Giulia Bandini for her insight on glycosyltransferases and knowledge of T. gondii.

DATA AVAILABILITY
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (http://www.ebi.ac.uk/pride) with the dataset identifier PXD005503 and 10.6019/PXD005503 (http://www.ebi.ac.uk/pride/archive/projects/PXD005503).

* Support for this study came from NIH grants R01 AI110638 and R01 GM031318 (J.S.), and P41 GM104603, S10 RR025082 and S10 OD010724 (C.E.C.), and NIH-NHLBI contract HHSN268201000031C (C.E.C.). We thank Thermo-Fisher Scientific for loan of the Q Exactive Plus system. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.