Identification and Validation of Mannose 6-Phosphate Glycoproteins in Human Plasma Reveal a Wide Range of Lysosomal and Non-lysosomal Proteins

Acid hydrolase activities are normally confined within the cell to the lysosome, a membrane-delimited cytoplasmic organelle primarily responsible for the degradation of macromolecules. However, lysosomal proteins are also present in human plasma, and a proportion of these retain mannose 6-phosphate (Man-6-P), a modification on N-linked glycans that is recognized by Man-6-P receptors (MPRs) that normally direct the targeting of these proteins to the lysosome. In this study, we purified the Man-6-P glycoforms of proteins from human plasma by affinity chromatography on immobilized MPRs and characterized this subproteome by two-dimensional gel electrophoresis and by tandem mass spectrometry. As expected, we identified many known and potential candidate lysosomal proteins. In addition, we also identified a number of abundant classical plasma proteins that were retained even after two consecutive rounds of affinity purification. Given their abundance in plasma, we initially considered these proteins to be likely contaminants, but a mass spectrometric study of Man-6-phosphorylation sites using MPR-purified glycopeptides revealed that some proportion of these classical plasma proteins contained the Man-6-P modification. We propose that these glycoproteins are phosphorylated at low levels by the lysosomal enzyme phosphotransferase, but their high abundance results in detection of Man-6-P glycoforms in plasma. These results may provide useful insights into the molecular processes underlying Man-6-phosphorylation and highlight circumstances under which the presence of Man-6-P may not be indicative of lysosomal function. In addition, characterization of the plasma Man-6-P glycoproteome should facilitate development of mass spectrometry-based tools for the diagnosis of lysosomal storage diseases and for investigating the involvement of Man-6-P-containing glycoproteins in more widespread human diseases and their potential utility as biomarkers.

The lysosome is an acidic, membrane-delimited organelle that is responsible for the degradation and recycling of macromolecules, playing a role in endocytosis and autophagy (1). Most of the resident hydrolases (e.g. proteases, glycosidases, lipases, phosphatases, sulfatases, and nucleases) and accessory proteins (e.g. activator proteins) are transported to the lysosome by the mannose 6-phosphate (Man-6-P) targeting pathway. Like many other glycoproteins, soluble lysosomal proteins are synthesized in the endoplasmic reticulum and are cotranslationally glycosylated on select asparagine residues. As these proteins move through the secretory pathway, the lysosomal proteins are selectively recognized by a phosphotransferase that initiates a two-step reaction that results in the generation of the Man-6-P modification on specific N-linked oligosaccharides. The modified proteins are then recognized by two Man-6-P receptors (MPRs), the cation-dependent MPR and the cation-independent (CI-) MPR (2). These receptors bind the phosphorylated lysosomal proteins in the neutral environment of the trans-Golgi network and travel to an acidic prelysosomal compartment in which the low pH promotes dissociation of the receptors and ligands. The receptors then recycle back to the Golgi to repeat the process or to the plasma membrane. Here the CI-MPR can function in the endocytosis and lysosomal targeting of extracellular Man-6-P glycoproteins.
The primary location and site of function of lysosomal enzymes is by definition intracellular. However, lysosomal activities have also been identified in a variety of extracellular environments including plasma (3). These proteins may be delivered to the plasma through leakage from dead or dying cells, through mobilization of secretory lysosome or granule contents, or by the release of lysosomal residual bodies by cell defecation. It is also possible that a portion of the newly synthesized lysosomal proteins escape the intracellular targeting pathway for transport to the lysosome and are secreted. Evidence for the latter is well documented; for example, in cultured mouse embryonic fibroblasts, wild-type cells secrete a small proportion of Man-6-P glycoproteins (~10% of that secreted in the absence of both MPRs, depending on the protein) (4-7), indicating that the sorting of newly synthesized lysosomal proteins is not absolutely efficient.
The biological significance of circulating lysosomal proteins is unclear, but there are several possibilities. First, some hydrolases may be synthesized and released by one cell type and delivered to another, and so circulating lysosomal proteins may represent intermediates in this process. Second, circulating lysosomal proteins might have a specific function in plasma. Third, the presence of lysosomal proteins in plasma may simply represent the steady state levels reflecting the balance between the rate of unwanted appearance in plasma (due to leakage and/or lack of absolute fidelity in the intracellular targeting system) versus the rate of uptake and clearance.
Regardless of biological function, circulating lysosomal hydrolases potentially represent valuable biomarkers for the study and diagnosis of human diseases. Mutations in genes encoding lysosomal proteins result in over 40 storage diseases (for a review, see Ref. 8). In addition, lysosomal activities have also been indirectly implicated in more widespread pathogenic processes (9-12) including tumor invasion and metastasis, Alzheimer disease, rheumatoid arthritis, and atherosclerosis as well as in normal processes such as aging (13) and immune system function (14). As a class, lysosomal proteins represent a relatively small subgroup of the plasma proteome that should be amenable to quantitative analysis.
In this study, we set out to characterize the human plasma proteome of mannose 6-phosphate glycoproteins to provide a reference dataset as a diagnostic tool for lysosomal storage diseases that could also facilitate further studies of the role of these proteins in other human conditions. We also viewed plasma as a promising source for the identification of potentially novel lysosomal proteins in our ongoing effort to identify components of the lysosomal proteome (15-17).
Given the complexity of the plasma protein population and the wide range of protein abundances, it is not currently feasible to directly detect lysosomal proteins in unfractionated plasma using proteomic methods due to their extremely low abundance compared with classical plasma proteins (18). We have therefore used an approach in which the Man-6-P-containing lysosomal proteins are purified by affinity chromatography on immobilized MPR (15-17, 19-22). As expected, we identified many lysosomal proteins, but we also purified many proteins that have not previously been assigned either Man-6-P or lysosomal function. Many of these proteins are highly abundant plasma glycoproteins that we initially suspected to be contaminants that do not contain Man-6-P. Therefore, to distinguish between true Man-6-P glycoproteins and contaminants, we used a technique to directly identify Man-6-P-modified glycopeptides. Unexpectedly, we found that a small proportion of multiple abundant classical plasma proteins exist as Man-6-P-containing glycoforms.

EXPERIMENTAL PROCEDURES
Purification of Human Plasma Man-6-P Glycoproteins-Refrozen lots of thawed human plasma were generously provided by V. I. Technologies, Inc. (Melville, NY). Plasma Man-6-P glycoproteins were purified using a modification of the affinity protocol described for human brain (17). For each purification, ~2 liters of frozen human plasma were rapidly thawed at 37°C and then clarified by centrifugation at 13,000 × g for 90 min at 4°C. The supernatant was filtered through six layers of cheesecloth and then added to an equal volume of ice-cold PBS containing 10 mM β-glycerophosphate, 10 mM EDTA, 2% Triton X-100, 0.4% Tween 20, and 0.2 mM Pefabloc (Pentafarm, Basel, Switzerland). The diluted plasma was then applied to a column of immobilized pentamannosyl phosphate-aminoethyl agarose (100-ml bed volume) to remove circulating soluble cation-independent MPR (sCI-MPR), and the flow-through was applied to a column of immobilized sCI-MPR (100-ml bed volume at a coupling density of 3.3 mg/ml) at a flow rate of ~200 ml/h to capture circulating Man-6-P glycoproteins. The MPR resin was batch washed with 3 × 1 column volume of PBS containing 5 mM EDTA, 5 mM β-glycerophosphate, 1% Triton X-100 and then with 3 × 1 column volume of the same buffer without Triton X-100. The column was flow-washed with 20 column volumes of PBS/EDTA/β-glycerophosphate at 125 ml/h and then eluted with 2 column volumes of the same buffer containing 10 mM Man-6-P at a flow rate of 100 ml/h. Fractions containing Man-6-P glycoproteins were identified by protein and β-mannosidase assay, pooled, concentrated by ultrafiltration (Centriprep-10, Millipore, Billerica, MA), and buffer-exchanged to either 100 mM ammonium bicarbonate (single purification protocol) or PBS/EDTA/β-glycerophosphate (first elution from the double purification protocol). For repurification, the buffer-exchanged Man-6-P eluate was reapplied to the column of immobilized MPR and purified essentially as described above.
The Man-6-P eluate was buffer-exchanged with 100 mM ammonium bicarbonate. Purified proteins were stored at −80°C, and volatile salts were removed by lyophilization prior to subsequent analyses.
Tandem Mass Spectrometry-Proteolytic digests of MPR-affinity purified plasma proteins were analyzed by nanospray LC-MS/MS using a ThermoElectron LTQ linear ion trap (ThermoElectron, San Jose, CA) and a Waters Micromass Q-TOF API US (Waters, Milford, MA) mass spectrometer. For both the singly and doubly purified human plasma samples, MPR-affinity purified mixtures were digested with trypsin in solution prior to LC-MS/MS. Forty micrograms of each sample were also fractionated by 1D SDS-PAGE (Novex 10% Nu-Page gels with MOPS buffer (Invitrogen)), and ~30 gel slices were digested with trypsin. For the doubly purified sample alone, 40 μg of purified proteins were fractionated by two-dimensional (2D) gel electrophoresis, and ~60 spots were excised and digested with trypsin.
The LTQ was used to analyze digests of all the samples discussed above. Solution digests of each of the singly and doubly purified samples were analyzed in duplicate. Instrument, chromatographic conditions, and generation of peak lists were as described previously (24) except for the analysis of gel slices, which was conducted using a 50-min gradient. For the Q-TOF, tryptic digests of each MPR-affinity purified sample (5 μg) or tryptic digests of 1D gel slices were fractionated using a Micromass CapLC. Peptides were first desalted on a 300-μm × 1-mm PepMap C18 trap column with 0.1% formic acid in HPLC grade water at a flow rate of 20 μl/min. After desalting for 3 min, peptides were back-flushed onto an LC Packings 75-μm × 15-cm C18 nanocolumn (3 μm, 100 Å) at a flow rate of 200 nl/min. For solution digests of MPR-affinity purified proteins, peptides were eluted on a 170-min gradient of 3-42% acetonitrile in 0.1% formic acid. For digests of 1D gel slices, peptides were eluted on a 30-min gradient of 3-42% acetonitrile in 0.1% formic acid. The scan times for MS and MS/MS were 1.5 and 2.0 s, respectively. Mass ranges for the MS survey scan and MS/MS were m/z 300-1900 and m/z 50-1900, respectively. The top three multiply charged ions with an MS peak intensity greater than 30 counts/scan were chosen for MS/MS fragmentation with a precursor ion dynamic exclusion of 60 s. Instrument switching from MS/MS to MS mode occurred when either total MS/MS ion counts were over 5000 or when the total MS/MS scan time was over 6.0 s. Raw data were processed by ProteinLynx 2.0 to generate PKL files without peak threshold intensity limits (fast deisotoping) and with automatic charge state determination.
Database Searching-Peptide assignment and protein identification were conducted using a local implementation of X!Tandem Version 2005.12.01 (GPM-USB, Beavis Informatics Ltd., Winnipeg, Canada) to search the human proteome database (the February 2006 assembly of the National Center for Biotechnology Information (NCBI) 35, which contains a total of 33,869 entries representing 30,352 unique sequences) as described previously (24). Database searching using data generated with the Q-TOF was identical except that the parent ion mass error was ±0.5 Da. Methods for analysis of LC-MS/MS data from proteolytic digests of unfractionated mixtures or gel slices and for deglycosylated Man-6-P glycopeptides (see below) have been described previously (24). Data were exported into Microsoft Excel spreadsheets and filtered for threshold significance as described below. To avoid redundancy arising from proteins present in the human proteome database under multiple names and/or accession numbers, a list of ENSP numbers was compiled with all numbers corresponding to a single protein referenced to a single primary number. Protein assignments were compared with this database to ensure that multiple accession numbers representing the same protein were not listed.
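The Q-TOF precursor selection rule described above (top three multiply charged ions above 30 counts/scan, with 60 s of dynamic exclusion) can be illustrated with a minimal sketch. This is not the instrument vendor's acquisition software; the `Peak` type, the function name, and the m/z exclusion tolerance `mz_tol` are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float  # counts/scan in the MS survey scan
    charge: int


def select_precursors(survey_peaks, now_s, excluded_until, top_n=3,
                      min_intensity=30.0, exclusion_s=60.0, mz_tol=0.5):
    """Pick up to top_n multiply charged precursors from one survey scan.

    excluded_until maps a previously selected m/z to the time (s) at which
    its dynamic exclusion expires; it is updated in place.
    mz_tol is a hypothetical exclusion window, not a value from the text.
    """
    def is_excluded(mz):
        return any(abs(mz - ex_mz) <= mz_tol and now_s < until
                   for ex_mz, until in excluded_until.items())

    candidates = [p for p in survey_peaks
                  if p.charge >= 2                # multiply charged only
                  and p.intensity > min_intensity  # > 30 counts/scan
                  and not is_excluded(p.mz)]
    chosen = sorted(candidates, key=lambda p: p.intensity, reverse=True)[:top_n]
    for p in chosen:
        excluded_until[p.mz] = now_s + exclusion_s
    return chosen
```

Calling the function on successive survey scans with the same `excluded_until` dictionary shows how lower-intensity ions get sampled once the most intense ones are temporarily excluded.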
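The accession-collapsing step described above can be sketched as follows. The function name, the data layout, and the placeholder identifiers used in the usage note are hypothetical; the only behavior taken from the text is that all accession numbers for one protein are referenced to a single primary number. Keeping the lowest log(e) per primary number is an assumption made here for illustration.

```python
def collapse_accessions(assignments, primary_of):
    """Merge protein assignments whose accessions refer to the same protein.

    assignments: iterable of (accession, log_e) pairs.
    primary_of:  dict mapping a secondary accession to its primary accession;
                 accessions absent from the dict are treated as primary.
    Returns a dict primary accession -> best (lowest) log(e) seen.
    """
    best = {}
    for accession, log_e in assignments:
        primary = primary_of.get(accession, accession)
        if primary not in best or log_e < best[primary]:
            best[primary] = log_e
    return best
```

For example, with placeholder identifiers, `collapse_accessions([("ENSP_A1", -12.0), ("ENSP_A2", -20.5)], {"ENSP_A2": "ENSP_A1"})` reports a single entry for `ENSP_A1`.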
Purification and Identification of Proteolytic Peptides Containing Mannose 6-Phosphate-The presence of Man-6-P on MPR-purified proteins was verified using a recently developed protocol for the purification and identification of glycopeptides containing Man-6-P from an unfractionated proteolytic digest of MPR-purified proteins (24).
TABLE I Known lysosomal MPR-purified proteins in human plasma
Unfractionated solution digests of MPR-affinity purified samples were analyzed in duplicate using the LTQ, and the highest confidence (i.e. lowest) log(e) score obtained in either of the two experiments is shown for each protein. For each set of duplicate experiments, we calculated the maximum log(I) for each individual protein and show here the differential between maximum log(I) scores obtained after one versus two cycles of affinity purification (i.e. $\Delta\log(I) = \log(I)_{2\times} - \log(I)_{1\times}$). PPP, Plasma Proteome Project high confidence proteome database (25) (www.bioinformatics.med.umich.edu/hupo/ppp/).
Columns: Gene; Protein; PPP. a, Not identified in singly purified sample. b, Not identified in doubly purified sample.

TABLE II Non-lysosomal MPR-purified proteins from human plasma
Unfractionated solution digests of MPR-affinity purified samples were analyzed in duplicate using the LTQ, and the highest confidence (i.e. lowest) log(e) score obtained in either of the two experiments is shown for each protein. For each set of duplicate experiments, we calculated the maximum log(I) for each individual protein and show here the differential between maximum log(I) scores obtained after one versus two cycles of affinity purification (i.e. $\Delta\log(I) = \log(I)_{2\times} - \log(I)_{1\times}$). PPP, Plasma Proteome Project high confidence proteome database (25) (www.bioinformatics.med.umich.edu/hupo/ppp/). In total, 29 immunoglobulin chains were identified in this analysis. Given the difficulties in unambiguous assignment of mass spectral data to individual genes encoding these proteins and the fact that these appear to represent nonspecific contaminants, these immunoglobulin chains are not included here. AMBP, α1-microglobulin/bikunin precursor.
Columns: Gene; Protein; PPP. Rows recoverable here: Procollagen; Cat eye syndrome critical region 1.

Criteria for Protein Identification or Peptide Assignment-The GPM protein expectation score is a measure of the confidence of protein identification for which the lower the log value, the smaller the probability that this represents a random match. When identifying proteins in the MPR affinity-purified protein mixtures, the score of highest confidence obtained at random when searching a reversed orientation human proteome was used as the threshold for acceptance of protein assignments. Method-specific thresholds were determined for unfractionated digests (log(e) = −5.1 and −5.4 for the LTQ and Q-TOF, respectively) and for proteins fractionated by 1D PAGE when each gel slice was analyzed individually (log(e) = −4.3 and −4.1 for the LTQ and Q-TOF, respectively). Each threshold was used to filter the respective datasets obtained from searching the forward orientation human proteome. Samples were analyzed using different instruments (LTQ and Q-TOF), were singly and doubly purified, and were prepared for analysis using several different methods (solution digest and gel electrophoresis). Thus, as an additional level of stringency, proteins identified with one approach alone were rejected if the protein assignment scores failed to reach a threshold of log(e) less than −10. For analysis of purified Man-6-P glycopeptides, the threshold for acceptance of peptide assignments to proteins not previously classified as lysosomal was the score of lowest confidence of a deglycosylated peptide assigned to a known lysosomal species (peptide log(e) of −2.3 for β-glucuronidase) with the added stringency that peptides with log(e) greater than −3 were only included if the assignment could be clearly verified following manual inspection.
MS/MS Spectra and General Summary of MS Data-A summary of data obtained with each instrument with different samples is presented in Supplemental Table 1. For each protein identified, the sequence coverage and number of unique peptides assigned are summarized in Supplemental Table 2. xml files for each database search that include protein assignments and MS/MS spectra are also included as supplemental data. (Saved xml data can be viewed on line by uploading to the GPM (human.thegpm.org/tandem/thegpm_upview.html).) Experimental parameters for each xml file are summarized in Supplemental Table 3.
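The acceptance rules given under "Criteria for Protein Identification or Peptide Assignment" can be expressed as a short sketch. The function names and data layout are hypothetical, and treating the threshold as a strict inequality is an assumption; the numeric thresholds in the usage note are the ones quoted in the text for unfractionated digests.

```python
def decoy_threshold(decoy_log_e):
    """Highest-confidence (lowest) log(e) obtained at random, i.e. against
    the reversed-orientation database."""
    return min(decoy_log_e)


def accept_proteins(forward_hits, thresholds, single_method_cutoff=-10.0):
    """Filter forward-database protein assignments.

    forward_hits: dict protein -> dict method -> log(e).
    thresholds:   dict method -> decoy-derived log(e) threshold.
    A protein is kept if it passes the threshold for at least one method;
    if only one method passes, its log(e) must also be below
    single_method_cutoff (log(e) less than -10 in the text).
    """
    accepted = {}
    for protein, by_method in forward_hits.items():
        passing = {m: s for m, s in by_method.items() if s < thresholds[m]}
        if not passing:
            continue
        if len(passing) == 1 and min(passing.values()) >= single_method_cutoff:
            continue
        accepted[protein] = passing
    return accepted
```

With `thresholds = {"LTQ_solution": -5.1, "QTOF_solution": -5.4}`, a protein scoring −7.0 with the LTQ alone would be rejected, whereas one scoring −11.0 with the LTQ alone would be retained.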

RESULTS

Purification of Plasma Man-6-P Glycoproteins
Man-6-P glycoproteins were initially purified from human plasma with a yield of ~500 μg/liter, which represents ~0.001% of the total starting protein content. Staining of the mixture after fractionation by two-dimensional gel electrophoresis using pH 3-10 isoelectric gradients revealed a complex pattern of purified proteins with >1500 spots resolved (Fig. 1A).
A significant problem encountered using an affinity purification approach to identify lysosomal proteins based on the presence of Man-6-P is the identification of false positives (24). Assuming correct protein assignment, false positives can represent abundant or "sticky" nonspecific contaminants (e.g. immunoglobulins or albumin) that bind nonspecifically and continuously leach off the column throughout the washing process and during the Man-6-P elution step. Alternatively, non-Man-6-P proteins such as lectins or protease inhibitors can bind to and copurify with true Man-6-P glycoproteins, and these can be regarded as "specific contaminants" because they will elute specifically from the immobilized MPR with Man-6-P. A particular concern in this study, given that we expected lysosomal proteins to be very low abundance constituents of plasma, was the presence of nonspecific contaminants. To assess purity, we fractionated the mixture by two-dimensional gel electrophoresis, transferred the sample to nitrocellulose, and used a radiolabeled form of the sCI-MPR in a Western blot style assay (22) to visualize Man-6-P glycoproteins (Fig. 1B). When the resulting MPR blot was compared with the stained gel, it was clear that a considerable proportion of the purified proteins appeared not to contain Man-6-P. We therefore repeated the purification on our preparation and now found that most of the spots corresponding to the twice-purified proteins appeared to contain Man-6-P (Fig. 1, C and D). Given the stringency of this double purification, many of the spots that appeared not to contain Man-6-P probably represent either specific contaminants or proteolytically processed chains of lysosomal proteins that do not contain Man-6-P rather than nonspecific contaminants. Protein yields after two rounds of purification were approximately half of those obtained after a single round of purification (on average, 230 μg/liter compared with 480 μg/liter), reflecting the loss of contaminants.

Comparison of Analytical Methods
A number of mass spectrometric approaches were used to characterize the singly and doubly purified mixtures of MPR-affinity purified proteins. All proteins that reached threshold significance (see "Experimental Procedures") are listed in Supplemental Table 2 along with identification statistics achieved with each sample preparation method and mass spectrometric approach. These data are summarized in Supplemental Table 1. Both tryptic digests of the total unfractionated mixtures and gel slices of material fractionated by 1D gel electrophoresis were analyzed. Nanospray LC-MS/MS was conducted using both a linear ion trap (ThermoElectron LTQ) and a quadrupole time-of-flight instrument (Micromass Q-TOF). Results obtained using the different instruments and samples are summarized in Supplemental Table 1. The doubly purified material was also further fractionated by 2D gel electrophoresis and analyzed using the LTQ (Supplemental Table 1). For the Q-TOF, we found that prefractionation of the singly purified sample by SDS-PAGE provided more overall unambiguous lysosomal protein identifications than analysis of the unfractionated solution digest. In contrast, prefractionation of the doubly purified sample by SDS-PAGE resulted in fewer lysosomal protein identifications with this instrument when compared with the unfractionated solution digest. This likely reflects the fine balance between sample losses due to 1D SDS-PAGE and inefficiencies in tryptic digestion of proteins in gel slices versus the reduction in sample complexity. The doubly purified sample, being less complex than the singly purified sample, will thus receive less benefit from prefractionation. With the LTQ, prefractionation of either sample by SDS-PAGE resulted in fewer protein identifications than the unfractionated mixture; this probably reflects the faster duty cycle of the LTQ, meaning that sample complexity is less likely to be a limiting factor.
Prefractionation by 2D gel electrophoresis resulted in a significant decrease in the number of proteins identified. In terms of LC-MS/MS approach, of a total of 148 unique protein assignments exceeding threshold expectation values, 147 were made with the LTQ compared with 94 with the Q-TOF (Supplemental Table 2).

Identification of Purified Plasma Proteins
Data obtained using the different analytical approaches were evaluated in terms of identification of known lysosomal proteins (Supplemental Tables 1 and 2) to determine which provided the greatest number of unambiguous assignments. The most effective approach was analysis of unfractionated solution digests using the LTQ, which resulted in the identification of 44 of a total of 45 lysosomal proteins that are found when results from all methods are combined. As a single, readily comparable dataset, we have therefore restricted our analyses here (Tables I-V) to data generated with unfractionated solution digests and the LTQ, although protein assignments made using all approaches are provided in Supplemental Table 2.
Forty-four known lysosomal proteins were identified by LTQ analysis of the unfractionated solution digest of the mixture of Man-6-P glycoproteins purified from human plasma, and as would be expected, the majority (37 of 44) were identified in both the singly and doubly purified preparations. Six proteins achieved significance only in the doubly purified mixture and were presumably masked by contaminants in the more complex, singly purified mixture. Previously we examined human brain lysosomal proteins using MPR affinity chromatography (17,24), and when combined with this study, a total of 53 lysosomal proteins were identified from these two human sources. The complement of lysosomal proteins identified from both sources was for the most part very similar with 41 proteins found in both brain and plasma. However, some proteins were found in brain but not in plasma (ARSB, CTSF, GALC, GLA, GM2A, IDUA, LIPA, SMPD1, and PPT2). In contrast, three lysosomal proteins (ACP5, CPVL, and heparanase) were found only in plasma.
The Plasma Proteome Project of the Human Proteome Organization (www.bioinformatics.med.umich.edu/hupo) has cataloged a high confidence (>95%) subset of the total plasma proteome database representing 889 plasma proteins (25). It is worth noting that of the known lysosomal proteins identified in this study, only one (γ-glutamyl hydrolase) is currently included in the high confidence dataset. This reflects the relatively low concentration of lysosomal proteins in plasma and the corresponding difficulties associated with their isolation and identification in a highly complex mixture.

Identification of Lysosomal Candidates
One of the aims of this and similar MPR affinity purification studies of the human proteome is to identify novel lysosomal candidates for further study. In our previous analysis of human brain (17), the number of proteins identified that were not classified as lysosomal was relatively small compared with the number of known lysosomal proteins; thus selection of candidates, based upon their respective biological and biochemical properties, was relatively straightforward. In contrast, in our analysis of the unfractionated solution digests of the mixture of proteins purified from plasma, we identified 80 different proteins (excluding contaminant immunoglobulin chains) that are not currently classified as lysosomal (Table II). Here selection of candidates of interest is not straightforward, especially given the significant challenges inherent in distinguishing a subset of plasma proteins with specific properties (e.g. the presence of carbohydrates containing Man-6-P) from highly abundant contaminants. We have therefore developed several experimental and bioinformatic approaches to help in the process of distinguishing the more probable lysosomal candidates from other purified proteins.
One approach is based on the hypothesis that true Man-6-P glycoproteins (and specific contaminants) will be enriched after two rounds of purification compared with a single purification as contaminants are removed. We used the X!Tandem log(I) score (the sum log intensities for all matched MS/MS fragment ions for every peptide assigned to a given protein) from the LTQ analyses of unfractionated mixtures as a measure of protein abundance. As expected, of the 37 known lysosomal proteins identified in both the singly and doubly purified samples, the relative abundance of 35 increased in the doubly compared with the singly purified preparation (Fig. 2A). In contrast, of the 44 proteins that are not currently classified as lysosomal that were found in both the singly and doubly purified samples, only 27 were enriched after two rounds of purification (Fig. 2B). Sixteen proteins were identified only in the doubly purified sample. Based on our observations with known lysosomal proteins, we predict that most of the remaining 37 that were either depleted or not found after the second round of purification are likely to represent nonspecific contaminants.
Proteins identified that are not classified as lysosomal have been sorted in Table II according to a decreasing ratio of intensity scores from the double and single purifications. Many of the proteins for which the maximum intensity score after two rounds of purification was increased have known or predicted enzymatic properties that might be consistent with lysosomal function. These include several ribonucleases, a fucosidase, a homolog of acid sphingomyelinase, and two hypothetical proteins with some similarity to phospholipases. However, there are also a number of proteins without reported enzyme activity and several protease inhibitors that may have been purified while bound to true Man-6-P glycoproteins (i.e. specific contaminants).

Table footnotes: a, Potential sites of Man-6-phosphorylation that were found on doubly glycosylated peptides for which singly glycosylated equivalents were not identified. These sites clearly contain endoglycosidase H-sensitive carbohydrates (indicated by the N+203 modification), and at least one site must contain the Man-6-P modification. However, assignments of Man-6-phosphorylation to both sites should be regarded as tentative. b, Peptides containing phosphorylated glycosylation sites that were not identified in the mixture of MPR-affinity purified proteins from human brain.

Although a comparison of the intensity scores obtained for individual proteins after a single versus double purification may provide some clues toward the identification of true Man-6-P glycoproteins, this is an indirect approach that cannot differentiate between true Man-6-P glycoproteins and specific contaminants. In addition, low abundance and/or sticky lysosomal proteins might be lost due to the increased sample handling. As an alternative experimental classification approach, we have recently developed an affinity purification protocol for the isolation of Man-6-P glycopeptides in a form suitable for identification by LC-MS/MS, allowing direct verification of the presence of Man-6-P (24). Briefly, peptides containing Man-6-P from a proteolytic digest of a reduced and alkylated preparation of purified Man-6-P glycoproteins are purified by microscale MPR affinity chromatography, deglycosylated, and identified by LC-MS/MS. The presence of Man-6-P is verified when a characteristic mass increment resulting from deglycosylation with endoglycosidase H (N+203) can be assigned in the context of an N-linked glycosylation motif (NX(S/T)).
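The enrichment comparison based on maximum log(I) scores can be sketched as follows, assuming duplicate runs per sample as described for the LTQ analyses. The function name, the category labels, and the protein keys in the usage note are hypothetical and serve only to illustrate the calculation of the difference between the doubly and singly purified maxima.

```python
def delta_log_i(single, double):
    """Classify proteins by change in maximum log(I) between purifications.

    single, double: dict protein -> list of log(I) scores from replicate runs
    of the singly and doubly purified samples, respectively.
    Returns dict protein -> (label, delta), where delta is
    max log(I) after two rounds minus max log(I) after one round,
    or None when the protein was found in only one sample.
    """
    result = {}
    for protein in set(single) | set(double):
        if protein not in single:
            result[protein] = ("double_only", None)
        elif protein not in double:
            result[protein] = ("single_only", None)
        else:
            delta = max(double[protein]) - max(single[protein])
            label = "enriched" if delta > 0 else "not_enriched"
            result[protein] = (label, delta)
    return result
```

For instance, a protein with replicate scores `[6.0, 6.5]` after one round and `[7.5, 7.0]` after two rounds gives a difference of 1.0 and would be labeled as enriched.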
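The site verification criterion (an N+203 mass increment located within an NX(S/T) motif) can be checked with a small sketch. The function name and input representation are hypothetical, and the peptide sequences in the usage note are invented examples. Excluding proline at the X position follows the usual sequon convention and is an assumption, because the text states only NX(S/T).

```python
import re

# N-linked glycosylation motif; X != P is an assumed convention.
SEQUON = re.compile(r"N[^P][ST]")


def man6p_sites(peptide, modified_positions):
    """Return 0-based positions of N+203 residues that lie in N-X-(S/T).

    peptide:            amino acid sequence (one-letter codes).
    modified_positions: iterable of 0-based indices carrying the +203
                        increment left by endoglycosidase H.
    Motifs truncated by the peptide C terminus are not counted, which is a
    limitation of checking the peptide rather than the full protein sequence.
    """
    sites = []
    for pos in sorted(set(modified_positions)):
        if SEQUON.fullmatch(peptide[pos:pos + 3]):
            sites.append(pos)
    return sites
```

For example, `man6p_sites("AKNGTLR", {2})` returns `[2]`, whereas a +203 assignment on the asparagine of `"AKNPTLR"` would not be accepted under these assumptions.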

Columns: Gene code; Protein name; Man-6-P glycopeptides? Rows recoverable here: SMPDL3A, Acid sphingomyelinase-like phosphodiesterase 3A; Leukocyte elastase; CECR1, Cat eye syndrome critical region 1 (+).

... been confirmed to contain Man-6-P in the analysis of lysosomal proteins from human brain (24), although a novel Man-6-P site was found here for di-N-acetylchitobiase, and a site was also identified for CPVL, a putative peptidase not identified in brain. In brain, we identified Man-6-P glycopeptides corresponding to 39 of 50 (78%) of the lysosomal proteins identified in the total mixture. This is a greater proportion than observed for plasma and may reflect the greater complexity of the Man-6-P glycopeptide mixture from this source. Sites for Man-6-phosphorylation were also identified in 28 proteins that were not previously assigned lysosomal function (Table III). Similar to the set of known lysosomal proteins identified in plasma, these plasma proteins were enriched in the doubly purified sample. A number of these proteins (acid sphingomyelinase-like protein 3a (SMPDL3A), biotinidase (BTD), cellular repressor of E1A-regulated gene transcription (CREG1), mammalian ependymin-related protein (EPDR1), prostaglandin-H2 D-isomerase (PTGDS), and RNASET2) were also previously found to contain Man-6-P in human brain (24). However, an intriguing observation was the identification of Man-6-P glycopeptides derived from a large number of classical plasma proteins. Some of these proteins, e.g. haptoglobin, transferrin, hemopexin, and α2-HS-glycoproteins, are extremely abundant plasma species in the 0.5-3 g/liter range that we originally considered to be likely nonspecific contaminants. Classical plasma glycoproteins for which we identified Man-6-P isoforms in this study can be classified into several broad categories.
Protease Inhibitors-One striking observation was the number of protease inhibitors that were purified from human plasma by MPR affinity chromatography (Table IV). In total, 21 protease inhibitors or proteins containing protease inhibitor domains were found, representing 26% of the proteins identified that have not been assigned lysosomal function, and of these, eight are members of the SERPIN family. Given that many lysosomal Man-6-P glycoproteins are proteases, we previously suggested that such protease inhibitors probably represent specific contaminants (17). This is clearly the case with some, e.g. cystatins, which cannot contain Man-6-P as their primary sequence lacks the consensus motif for N-linked glycosylation. However, the presence of Man-6-P is demonstrated in eight of the protease inhibitors found in the mixture of MPR-purified proteins (as well as in thyroxin-binding globulin (SERPINA7), which was not identified with threshold confidence in the analysis of the unfractionated solution digests), indicating that there are glycoforms of these proteins that do contain Man-6-P. Interestingly previous studies have demonstrated the presence of Man-6-P on several proteins containing the SERPIN domain that are secreted by the uterus of pregnant sheep and pigs including uterine milk proteins and uteroferrin-associated protein (26,27). In addition, another member of the SERPIN family, C1 esterase inhibitor, may also contain Man-6-P under some circumstances (28).
Plasma Transporters-Vitamin E-binding protein afamin and metalloproteins ceruloplasmin (CP) and serotransferrin are all plasma transporters that appear to contain Man-6-P. Haptoglobin and hemopexin scavenge free hemoglobin from the bloodstream, and both were also found to contain Man-6-P. Other transport proteins found here include α1-acid glycoprotein (ORM1 and ORM2) and fatty acid-binding protein 5.
Enzymes-Twenty-two of the proteins that are not classified as lysosomal have known enzymatic function or may be predicted to have catalytic function based on sequence homology with known enzymes. This category includes a number of abundant plasma glycoproteins such as CP, cholinesterase (BCHE), BTD, and PTGDS and complement factors (e.g. C2, C3, and BF) as well as a number of less abundant species that are not found in the Plasma Proteome Project high confidence dataset. The relatively high prevalence of proteins with enzymatic function probably reflects the inclusion in this category of a number of novel lysosomal hydrolases.
FIG. 2. Estimated relative abundance of proteins following one or two rounds of MPR affinity purification. Proteins that were identified only in one purification are excluded. For each protein that achieved threshold significance in the unfractionated mixtures when analyzed using the LTQ, log(I) scores, as an indicator of abundance, in both the singly and doubly purified samples are plotted. Plots in black correspond to those proteins for which Man-6-P glycopeptides were assigned for known lysosomal proteins (A) and for proteins not previously classified as lysosomal (B).

DISCUSSION
In this study, we used an MPR affinity purification approach to isolate and characterize the proteome of circulating lysosomal proteins. In total, we identified 44 different lysosomal proteins that are associated with 23 lysosomal storage diseases. Lysosomal storage diseases are frequently difficult to diagnose using current clinical and biochemical approaches as presentation of genetically distinct diseases is often similar. In addition, different types of mutations within a single gene can result in a wide range of manifestations of disease (e.g. onset, severity, and symptom expression). One approach that may help definitive diagnoses in these diseases is the comparative proteomic profiling of disease and control samples to identify defective or deficient proteins. However, the complexity of the plasma proteome coupled with the low abundance of lysosomal proteins precludes a direct analysis of this subclass of plasma constituents. One of the aims of this study was therefore to determine whether MPR affinity purification of Man-6-P glycoproteins might form the basis for a comparative proteomic method for the global diagnosis of lysosomal storage diseases.
There are several approaches for proteomic profiling (29) that could be applicable to plasma Man-6-P glycoproteomes. One of the more frequently utilized methods is comparative two-dimensional gel electrophoresis, and in this study, we generated a 2D map of plasma Man-6-P glycoproteins (Supplemental Fig. 1 and Supplemental Table IV) as a potential clinical reference. Two problems were immediately apparent with this approach. First, LC-MS/MS analysis of even well resolved 2D gel spots (e.g. Supplemental Fig. 1 and Supplemental Table IV, gel spots 5-8) revealed a mixture of proteins; thus a deficiency in any given protein could potentially be difficult to identify. Second, limitations in the amount of clinical samples available, especially in young children undergoing diagnosis, would greatly limit use of the 2D approach. In terms of yield, we obtained ~200 μg of Man-6-P glycoproteins/liter after a double purification; thus in a clinical setting, yields of ~1 μg/5 ml of plasma could be achievable with appropriate method development. With such amounts, most of the individual spots on a stained 2D gel would be below the limits of detection. Sensitivity could be increased by visualization of Man-6-P glycoproteins using a labeled MPR probe (22), but this would result in a decrease in resolution (for example, in Fig. 1, compare A with B and C with D). Therefore, although the comparative profiling of plasma Man-6-P glycoproteins by 2D gel electrophoresis might have investigational applications, these issues combined with the technical expertise required suggest that this approach is probably unsuitable as a widely applicable method for the diagnosis of lysosomal storage diseases.
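The projected clinical yield quoted above follows from scaling the per-liter yield (approximately 200 μg/liter) to a 5-ml sample. The short sketch below reproduces that arithmetic; the function name is illustrative, and the assumption that yield scales linearly with plasma volume is ours rather than something demonstrated in this study.

```python
def projected_yield_ug(yield_ug_per_liter: float, sample_volume_ml: float) -> float:
    """Scale a per-liter yield (micrograms) to a sample volume in milliliters.

    Assumes recovery is linear in plasma volume (an illustrative assumption).
    """
    return yield_ug_per_liter * sample_volume_ml / 1000.0


# Values quoted in the text: ~200 ug/liter after double purification, 5 ml of plasma.
clinical_yield = projected_yield_ug(200.0, 5.0)
print(f"{clinical_yield:.1f} ug of Man-6-P glycoproteins from 5 ml of plasma")
```

Spread over the many proteins in the purified mixture, a total of about 1 μg leaves very little material per gel spot, which is why most spots would fall below the detection limit of a stained 2D gel.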
An alternative approach that may overcome the drawbacks of 2D gel electrophoresis is quantitative mass spectrometry for the comparison of Man-6-P glycoproteins from patients and normal reference controls (30). In this way, Man-6-P glycoproteins that are deficient as a consequence of a lysosomal storage disease may be readily identifiable, and a preliminary diagnosis may be made that can be verified using standard biochemical and/or DNA-based tests. This approach is more specific than comparative 2D gel electrophoresis and may be compatible with the projected yields of Man-6-P glycoproteins from clinical samples. Quite apart from its utility in the diagnosis of storage disorders, mass spectrometry-based clinical methods could also prove useful in investigating the role of lysosomal proteins in more widespread human diseases, including cancer, where alterations in lysosomal activities have been implicated. In addition, such approaches could also prove useful in evaluating the potential of lysosomal proteins as plasma biomarkers of prognostic or diagnostic value. Implementation of quantitative methods (e.g. isobaric peptide tagging (31)) combined with the enrichment methods outlined in this study may be useful in this respect.
Another aim was to identify Man-6-P glycoproteins for further study that could potentially represent new lysosomal proteins. A large number of proteins were identified in this study that have not been assigned lysosomal function, so we have used several criteria to filter the data to identify the most promising potential candidates. The first criterion is that a candidate should be enriched after a second round of purification, reflecting the loss of nonspecific contaminants. The second criterion for a candidate is that it should either be or be predicted to be a hydrolase like most soluble lysosomal proteins. Forty-one of 44 lysosomal proteins found in plasma here fall into this class of enzymes. The third criterion is that a candidate should not represent an abundant plasma protein. Lysosomal proteins are low abundance constituents of plasma, and this is reflected by the fact that only one of the known lysosomal proteins found here is currently represented in the Plasma Proteome Project high confidence dataset (25) of plasma proteins (Table I). Categorization of both plasma lysosomal proteins and proteins not assigned lysosomal function with respect to these three criteria is illustrated in Fig. 3. Of the known lysosomal proteins, most (38 of 44) are hydrolases that are absent from the Plasma Proteome Project and that are enriched by two rounds of affinity purification. Based upon the assumption that novel lysosomal proteins will have properties similar to the known lysosomal proteins, we were able to identify nine highly promising candidates. These are summarized in Table V. Most of these proteins, i.e. the two phospholipase-like hypothetical proteins, SMPDL3A, plasma α-fucosidase (FUCA2), and several ribonucleases, have homology to known lysosomal proteins and are thus particularly promising.
EPDR1 and CREG1 are also worth further investigation as they are low abundance, confirmed Man-6-P glycoproteins for which database searching reveals no similarity to other proteins that might provide clues to function. However, it should be emphasized that the objective of this and other large scale proteomic studies is to identify candidates of interest that will require further extensive characterization at the functional and compartmental levels before their actual cellular roles can be precisely ascertained.
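The three filtering criteria described above can be sketched as a simple classification. This is only an illustration: the record type, field names, and example entries are hypothetical placeholders rather than data from the tables, and treating the criteria as a strict conjunction is our assumption (EPDR1 and CREG1, for instance, are considered worth pursuing although no hydrolase homology is known for them).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str
    enriched_after_second_round: bool  # criterion 1
    is_hydrolase: bool                 # criterion 2 (known or predicted)
    in_plasma_proteome_project: bool   # criterion 3 requires this to be False


def criteria_met(p: Protein) -> int:
    """Count how many of the three criteria a protein satisfies."""
    return sum([
        p.enriched_after_second_round,
        p.is_hydrolase,
        not p.in_plasma_proteome_project,
    ])


def is_promising_candidate(p: Protein) -> bool:
    """Strict version: all three criteria must hold (an assumption)."""
    return criteria_met(p) == 3


# Hypothetical records; the flags are placeholders, not values from this study.
proteins = [
    Protein("hypothetical_hydrolase_A", True, True, False),
    Protein("hypothetical_abundant_protein_B", True, False, True),
    Protein("hypothetical_contaminant_C", False, True, False),
]

candidates = [p.name for p in proteins if is_promising_candidate(p)]
print(candidates)
```

Counting the criteria separately from the strict filter makes it easy to also list proteins that satisfy two of the three, which corresponds to the partially overlapping regions of a three-set Venn diagram.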
An unexpected finding of this study was that a number of highly abundant classical plasma proteins were purified as Man-6-P-containing glycoforms (Table VI). Examples of such proteins include afamin, α1-acid glycoprotein, ceruloplasmin, and haptoglobin. It is possible that detection of Man-6-phosphorylated glycoforms of these proteins may arise from a lack of absolute fidelity that occurs in biological processes (see below) and may result in a very small proportion of the proteins receiving Man-6-P. Thus, it is worth considering the relative amount of a given protein that is in the phosphorylated form in plasma. We previously used enzyme activity measurements and MPR affinity chromatography to estimate that, for any given lysosomal protein, 5-50% of the total enzyme activity present in plasma was in the Man-6-phosphorylated form (16). Here, after two rounds of purification, we obtained ~200 μg of Man-6-P glycoproteins/liter of plasma, representing 132 different proteins with an average yield of ~1.5 μg/liter for each individual protein, which is consistent with the predicted concentrations of lysosomal proteins containing Man-6-P in plasma. For example, total α-glucosidase is normally present at a median concentration of 17 μg/liter of plasma (32); thus purification of 1.5 μg/liter would correspond to 9% in the Man-6-P-containing form. In accordance, we previously determined by enzymatic assay that 28% of α-glucosidase in plasma actually contains Man-6-P (16). In contrast, we estimate that the proportion of the individual classical plasma proteins that contains Man-6-P is much lower. The average concentration of the classical plasma proteins purified here in Man-6-P glycoforms is ~500 mg/liter (Table VI); thus the average proportion of each that contains Man-6-P would be 1.5 × 10⁻⁴% of each respective total; this is many orders of magnitude lower than that observed for known lysosomal proteins. Although only a very small proportion of the classical plasma proteins appear to receive the Man-6-P modification, these glycoforms are probably detected here simply because these proteins are so abundant. These glycoproteins may represent substrates for the GlcNAc-phosphotransferase that are recognized with low affinity compared with lysosomal proteins bearing similar glycans in the same way that the non-lysosomal glycoprotein ribonuclease B can be Man-6-phosphorylated but with lower efficiency than the lysosomal protein uteroferrin (33). Thus, we expect that the GlcNAc-phosphotransferase will have a spectrum of Km values for different glycoproteins with bona fide lysosomal proteins exhibiting high substrate specificity constants (kcat/Km) for the phosphotransferase and thus being Man-6-phosphorylated preferentially. Other glycoproteins will exhibit a range of lower substrate specificity constants and, depending on the upper value of this range, may be phosphorylated at a low level. This may reflect an intrinsic balance between the requirement for the phosphotransferase to specifically recognize many different types of lysosomal proteins and its ability to distinguish non-lysosomal and lysosomal proteins. Should the concentration of low affinity non-lysosomal proteins be extremely high, then this would allow them to compete with lysosomal proteins for the phosphotransferase and receive the Man-6-P modification.
FIG. 3. A classification scheme to identify candidate lysosomal proteins. A Venn diagram illustration of the distribution of known lysosomal proteins and proteins not assigned lysosomal function or localization as a function of three independent criteria: 1) hydrolytic function, 2) enrichment after two rounds of purification, and 3) absence from the high confidence Human Proteome Organization Plasma Proteome Project database (25).
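Two quantitative points in the preceding argument can be illustrated with a short sketch. The first part reproduces the α-glucosidase estimate from the figures quoted in the text (200 μg/liter total, 132 proteins, 17 μg/liter median α-glucosidase). The second part uses the standard result for two substrates competing for one enzyme under Michaelis-Menten kinetics, in which the ratio of rates equals the ratio of (kcat/Km) × concentration. The specificity constants and concentrations in that second part are hypothetical round numbers chosen only to illustrate the principle; they are not measurements for the GlcNAc-phosphotransferase.

```python
def percent_in_man6p_form(purified_ug_per_liter: float, total_ug_per_liter: float) -> float:
    """Percentage of a plasma protein recovered as the Man-6-P glycoform."""
    return 100.0 * purified_ug_per_liter / total_ug_per_liter


def relative_rate(spec_const_a: float, conc_a: float,
                  spec_const_b: float, conc_b: float) -> float:
    """Rate ratio v_A / v_B for two substrates competing for the same enzyme.

    For competing Michaelis-Menten substrates:
        v_A / v_B = (kcat/Km)_A * [A] / ((kcat/Km)_B * [B])
    """
    return (spec_const_a * conc_a) / (spec_const_b * conc_b)


# Part 1: figures quoted in the text.
average_yield = 200.0 / 132  # ug/liter per protein
alpha_glucosidase_percent = percent_in_man6p_form(average_yield, 17.0)
print(f"average yield: {average_yield:.1f} ug/liter")
print(f"alpha-glucosidase in Man-6-P form: {alpha_glucosidase_percent:.0f}%")

# Part 2: hypothetical values (arbitrary units), for illustration only.
# A poor substrate (1000-fold lower kcat/Km) present at 1000-fold higher
# concentration is phosphorylated at about the same absolute rate.
ratio = relative_rate(spec_const_a=0.001, conc_a=1000.0,
                      spec_const_b=1.0, conc_b=1.0)
print(f"rate ratio (abundant poor substrate / scarce good substrate): {ratio:.2f}")
```

The second calculation shows why a low specificity constant does not by itself prevent modification: the absolute amount of phosphorylated product depends on the product of specificity constant and substrate concentration, even though the phosphorylated fraction of the abundant protein remains very small.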
Given the three-dimensional nature of the determinant for Man-6-phosphorylation, it is likely that the ability of the GlcNAc-phosphotransferase to distinguish between lysosomal and non-lysosomal proteins is less stringent if the latter bear some structural similarity to the former. An example is the non-lysosomal protein renin, which receives the Man-6-P modification and is structurally similar to cathepsin D (34). It is possible that the SERPIN domain bears some structural similarity to the lysosomal determinant for Man-6-phosphorylation; this could explain why we found that a number of proteins of this family contain Man-6-P. Alternatively, it is possible that such proteins are associated with bona fide lysosomal proteins in transit through the Golgi and become phosphorylated due to a proximity effect.
In addition to receiving the Man-6-P modification, isoforms of non-lysosomal proteins must also escape the cellular pathway for targeting to the lysosome to allow secretion or must be delivered into the plasma by some other mechanism (see the Introduction). Given that they may be Man-6-phosphorylated aberrantly, these proteins could also represent low affinity ligands for the MPRs that are not efficiently recognized and targeted to the lysosome but that are instead secreted. However, the high concentration of MPR used in the affinity chromatography would allow such proteins to be purified. For example, cathepsin L is a protein that binds to the MPR with relatively low affinity and, as a result, is largely secreted in cell culture (35,36), but it is still purified by MPR chromatography (15,17,21).
Irrespective of biological significance and source, the observation of Man-6-P glycoforms of abundant classical plasma proteins has important implications for the increasing number of studies directed toward the characterization of lysosomal proteomes using MPR affinity chromatography (15,17,19,20,37). Although the Man-6-P modification is suggestive of lysosomal localization and can provide a valuable clue toward novel candidates, its detection does not obviate the necessity for detailed characterization of subcellular localization and functional analyses. Regardless, the ability to detect and quantify a small and distinct subset of plasma proteins with both established and proposed biomedical importance is likely to have valuable future application in the discovery of useful biomarkers for human disease.