Targeted Mass Spectrometric Approach for Biomarker Discovery and Validation with Nonglycosylated Tryptic Peptides from N-linked Glycoproteins in Human Plasma*

A simple mass spectrometric approach for the discovery and validation of biomarkers in human plasma was developed by targeting nonglycosylated tryptic peptides adjacent to glycosylation sites in an N-linked glycoprotein, one of the most important biomarkers for early detection, prognoses, and disease therapies. The discovery and validation of novel biomarkers requires complex sample pretreatment steps, such as depletion of highly abundant proteins, enrichment of desired proteins, or the development of new antibodies. The current study exploited the steric hindrance of glycan units in N-linked glycoproteins, which significantly affects the efficiency of proteolytic digestion if an enzymatically active amino acid is adjacent to the N-linked glycosylation site. Proteolytic digestion then results in quantitatively different peptide products in accordance with the degree of glycosylation. The effect of glycan steric hindrance on tryptic digestion was first demonstrated using alpha-1-acid glycoprotein (AGP) as a model compound versus deglycosylated alpha-1-acid glycoprotein. Second, nonglycosylated tryptic peptide biomarkers, which generally show much higher sensitivity in mass spectrometric analyses than their glycosylated counterparts, were quantified in human hepatocellular carcinoma plasma using a label-free method with no need for N-linked glycoprotein enrichment. Finally, the method was validated using a multiple reaction monitoring analysis, demonstrating that the newly discovered nonglycosylated tryptic peptide targets were present at different levels in normal and hepatocellular carcinoma plasmas. The area under the receiver operating characteristic curve generated through analyses of nonglycosylated tryptic peptide from vitronectin precursor protein was 0.978, the highest observed in a group of patients with hepatocellular carcinoma. 
This work provides a targeted means of discovering and validating nonglycosylated tryptic peptides as biomarkers in human plasma, without the need for complex enrichment processes or expensive antibody preparations.

Glycosylation is one of the most important post-translational modifications of proteins secreted in serum and is related to protein folding (1), quality control, sorting, degradation, and secretion. The glycan helps to stabilize polypeptide folding and indirectly allows glycoproteins to interact with various lectins, glycosidases, and glycosyltransferases by acting as recognition "tags" (2,3). The glycoform distribution and degree of glycosylation of a glycoprotein can be significantly altered by diseases such as cancer, and glycoproteins have been reported in association with a variety of abnormal phenomena in patients with cancer (4-10). Thus, quantitative analyses of the glycosylation of proteins may be useful in discovering biomarkers for cancer.
Human plasma is clinically the most important sample, although it contains an extraordinarily large and complex proteome, with its dynamic range spanning more than 10 orders of magnitude in concentration. As conventional proteomic technologies only work within a small subset of this dynamic range (11), glycoprotein analysis at low concentrations typically requires glycoprotein enrichment methods, such as reactions with lectin (12-18), hydrazide chemistry (19,20), HILIC (21,22), or other procedures (23,24) to reduce sample complexity while enriching the glycoproteins. However, such complex pretreatment steps are a cause of poor reproducibility. Also, the identification of glycopeptides and characterization of glycan information from mass spectrometric analysis is difficult because the instrumental sensitivity for glycopeptides is lower than that for general peptides, and a general database for glycopeptides is lacking. These factors make the discovery of a glycoprotein biomarker, and its validation in real clinical samples, extremely difficult.
Two analytical strategies have been used with glycoproteins: glycan and glycopeptide analyses. Many current glycan analyses chemically or enzymatically cleave glycoproteins to generate a pool of glycan moieties prior to mass spectrometry-based analysis (25-27). These analyses provide both quantitative and structural information on glycan moieties. However, because the sample is a pool of glycan moieties from a glycoprotein mixture, these analyses are unsuitable for the detection of specific glycoforms released from a glycoprotein of interest and for the identification of glycosylation sites.
For glycopeptide analysis, deglycosylated peptides released by endoglycosidase are generally analyzed by MS (18,19,25,28-31). This method is useful for identifying glycosylation sites; however, it usually requires glycopeptide enrichment. Enrichment methods such as lectin affinity chromatography or reaction with hydrazide beads have been widely applied for the analysis of glycoproteins at low concentrations.
However, lectin affinity methods have shown unsatisfactory enrichment efficiencies. For example, in a membrane protein study of breast tumor cells, only 25 of 88 proteins (28.4%) and 34 of 152 proteins (22.3%) were identified as glycoproteins using a concanavalin A (Con A) lectin affinity column, and only 22 of 87 proteins (26.4%) and 27 of 146 proteins (18.5%) were identified as glycoproteins using a wheat germ agglutinin lectin affinity column (31). The more general hydrazide method may isolate a higher number of glycoproteins or glycopeptides than lectin affinity methods, which are designed to enrich those bearing a specific glycan moiety. Although McDonald et al. (28) reported considerable nonspecific binding of abundant proteins to hydrazide resin in their analysis of cell surface glycoproteins, they identified 589 proteins in HeLa cell plasma membranes using the hydrazide method, and 191 (32%) corresponded to glycoproteins. Lee et al. (32) demonstrated binding of nonglycosylated proteins with both the lectin affinity and hydrazide methods of glycoprotein enrichment. These low enrichment efficiencies are attributable to weak binding of glycoproteins or glycopeptides and nonspecific binding of abundant proteins, and can result in data with low reproducibility. Such methods are therefore not adequate for quantitatively analyzing a large number of samples. Thus, a simpler and more practical MS-based proteomics technology without the need for any complex and irreproducible glycoprotein enrichment steps or ambiguous glycan structural interpretations is desirable for the discovery of new glycoprotein biomarkers.
Tryptic digestions tend to be incomplete when applied to glycosylated proteins because of glycan steric hindrance (25,33). However, the ways in which post-translational modifications affect trypsin digestion can provide crucial clues for the discovery of new glycoprotein biomarkers. For example, if the degree of glycosylation of a protein differs between normal and diseased states, digestion will yield quantitatively different peptide products in accordance with the degree of glycosylation.
In particular, because peptides adjacent to N-linked glycosylation sites are sterically affected by bulky glycan groups during proteolytic digestion, the relative abundance of nonglycosylated tryptic peptides adjacent to N-linked glycosylation sites is altered post-digestion. Many studies (18,34,35) and a review article (36) have reported that the degree of glycosylation can be aberrantly high in diseased samples. Thus, we hypothesized that these nonglycosylated tryptic peptides adjacent to N-linked glycosylation sites may be valuable biomarkers, revealing quantitative differences in the glycosylation states of patients with cancer. Nonglycosylated tryptic peptides are not only detected with higher sensitivity by MS, but are also readily identified and quantified using existing proteomic technologies, because of the absence of heterogeneous glycans.
In the current study, targeted nonglycopeptide biomarkers were identified and validated among glycoproteins in human plasma. First, the effect of glycan steric hindrance in glycoproteins was demonstrated via a tryptic digestion of glycosylated alpha-1-acid glycoprotein (AGP-T) and deglycosylated AGP treated with PNGase F (AGP-PT). AGP-T and AGP-PT served as models of highly glycosylated disease plasma and less glycosylated normal plasma, respectively, according to the degree of glycosylation. Second, one pooled normal and ten hepatocellular carcinoma (HCC) human plasma samples (sample information in supplemental Table S1) were analyzed using a label-free method consisting of nano-ultrahigh-performance liquid chromatography (UPLC) followed by quadrupole time-of-flight (Q-TOF) MS to identify candidate nonglycopeptide biomarkers. Nonglycosylated tryptic peptides adjacent to N-linked glycosylation sites from glycoproteins were targeted as candidate biomarkers because of observed quantitative differences between normal and HCC plasma samples. Third, newly discovered nonglycosylated tryptic peptides were quantitatively validated by multiple reaction monitoring (MRM) using a nano-UPLC/triple quadrupole mass spectrometer in ten normal and eighteen HCC human plasma samples (sample information in supplemental Table S2).
Nonglycosylated tryptic peptide biomarkers were discovered without the need for complex glycoprotein enrichment methods using this conventional MS-based proteomics technology, where many of the limitations inherent in currently accepted glycoprotein biomarker discovery methods were avoided.
Alpha-1-Acid Glycoprotein Preparation-Two AGP standard solutions were prepared at concentrations of 100 μg/10 μl. AGP solutions were reduced by adding 1 μl of 1 M dithiothreitol at 60°C for 30 min. Proteins were alkylated by adding 5 μl of 500 mM iodoacetamide in the dark at room temperature and allowing the solution to react for 30 min. One of the 100-μg AGP solutions was deglycosylated by reacting with 1 μl of PNGase F (500,000 u/mg) at 37°C for 6 h followed by incubation and digestion in a solution of 10:1 trypsin (total protein:trypsin, by weight) at 37°C overnight (AGP-PT). The other AGP solution was incubated in 1 μl of deionized water, instead of PNGase F, at 37°C for 6 h. This solution was digested as described above (AGP-T). The two AGP digests were dried in a SpeedVac. The dried samples were diluted in mobile phase A and, as needed, spiked with isotope-labeled TEDTIFL*R prior to LC/MS/MS analyses.
Plasma Sample Preparation-Plasma samples were obtained with informed consent and in accordance with IRB guidelines from Yonsei University College of Medicine (Seoul, Korea).
Blood plasma samples from healthy donors and HCC patients were divided into four equal-volume bags with an appropriate concentration of K2EDTA. Each aliquot was frozen and stored at −80°C until use.
Depletion of Major Abundant Proteins from Human Plasma by Immunoaffinity-The six most abundant proteins in human plasma (albumin, transferrin, IgG, IgA, haptoglobin, and α1-antitrypsin) were depleted using an HP1100 LC system (Agilent) equipped with a multiple affinity removal column (MARC; Agilent). Crude human plasma samples were diluted by a factor of five with Buffer A (for example, 80 μl of Buffer A was added to 20 μl of human plasma) containing the proper amount of protease inhibitor (Complete Protease Inhibitor Mixture tablet (Roche, Indianapolis, IN)) and filtered through 0.22-μm filters by centrifugation (16,000 × g, room temperature, 1-2 min). Diluted crude plasma (70-100 μl) was injected at a flow rate of 0.25 ml/min; flow-through fractions were collected and stored at −20°C. Depleted plasma samples were desalted by centrifugal filtration using 5000-Da MWCO (molecular weight cutoff) VIVASPIN filters (product VS061; Sartorius, Göttingen, Germany).
Digestion of Plasma Proteins and Internal Standards-Aliquots (5-10 μg) of human plasma samples that had been quantitatively analyzed by Bradford protein assays were diluted with 100 mM Tris-HCl buffer (pH 8.00). An internal standard of G6PD was prepared in 100 mM Tris-HCl buffer (pH 8.00). The plasma and internal standard solutions were digested using the same AGP digestion process without the deglycosylation step. Briefly, human plasma samples and the internal standard were reduced, alkylated, and incubated as above. Digested samples were dried using a SpeedVac. Dried samples were frozen and stored at −20°C until use. For label-free quantification, digested internal standard solutions were spiked equally prior to liquid chromatography tandem MS (LC/MS/MS) analyses. For MRM quantification, stable isotope-labeled peptide mixtures were spiked equally into each plasma sample prior to LC/MS/MS analyses.
Nano-LC-ESI-MS/MS for Proof (DDA) of Glycan Steric Hindrance and Selection (DDA, MSE) of Candidate Biomarkers-Digested AGP samples were dissolved in mobile phase A, and stable isotope-labeled peptide standards of AGP were spiked equally in each AGP sample prior to nano-LC/electrospray ionization (ESI)-MS/MS analyses. Digested plasma samples were also dissolved in mobile phase A, and an internal standard of digested G6PD was added equally to each plasma sample prior to nano-LC/ESI-MS/MS analyses. MS/MS experiments for identification and quantification of proteins from peptide mixtures were performed using a nano-LC/MS system consisting of a nanoACQUITY UPLC system (Waters Corp., Milford, MA) and a Q-TOF mass spectrometer (Premier; Waters Corp.) equipped with a nano-ESI source. An autosampler was used to load 5-μl aliquots of the peptide solutions onto a C18 trap column (I.D. 180 μm, length 20 mm, and particle size 5 μm; Waters Corp.). The peptides were desalted and concentrated on the trap column for 5 min at a flow rate of 20 μl/min. The peptides were then separated at a flow of 450-500 nL/min on an analytical microcapillary column (C18, I.D. 75 μm, length 150 mm, particle size 1.7 μm; Waters Corp.). Mobile phase A consisted of water with 0.1% formic acid. Mobile phase B contained acetonitrile with 0.1% formic acid. Gradient elution began with 1% B for 0.33 min, ramped to 35% B over 84.67 min, then to 50% B over 0.5 min, held at 50% B for 6.5 min, then ramped to 95% B over 0.5 min, held for 7.5 min, and then returned to 1% B over another 0.5 min. The column was equilibrated with 1% B for 19.5 min before each run. For AGP, the total run time was 75 min with ramping to 40% B over 39.67 min. All analyses were performed in positive-ion, V-mode ESI using a nano-spray source.
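The plasma gradient program described above can be checked with a short sketch. The segment list is transcribed from the text; the reading of the final step as a 0.5-min ramp back to 1% B, linear interpolation within each segment, and the names `SEGMENTS` and `percent_b` are assumptions made for illustration, not part of the published method.

```python
# Gradient segments as (duration in min, %B at end of segment).
# The program starts at 1% B; values are taken from the plasma gradient in the text.
SEGMENTS = [
    (0.33, 1.0),    # initial hold at 1% B
    (84.67, 35.0),  # ramp to 35% B
    (0.5, 50.0),    # ramp to 50% B
    (6.5, 50.0),    # hold at 50% B
    (0.5, 95.0),    # ramp to 95% B
    (7.5, 95.0),    # hold at 95% B
    (0.5, 1.0),     # return to 1% B (assumed to be a linear ramp)
]
EQUILIBRATION_MIN = 19.5  # re-equilibration at 1% B before each run


def percent_b(t, segments=SEGMENTS, start=1.0):
    """Return %B at time t (min), assuming linear change within each segment."""
    elapsed, current = 0.0, start
    for duration, target in segments:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration
            return current + fraction * (target - current)
        elapsed += duration
        current = target
    return current  # after the last segment the composition stays constant


gradient_min = sum(duration for duration, _ in SEGMENTS)
cycle_min = gradient_min + EQUILIBRATION_MIN

if __name__ == "__main__":
    print(f"gradient: {gradient_min:.2f} min, full cycle: {cycle_min:.2f} min")
    print(f"%B at the midpoint of the main ramp: {percent_b(42.665):.1f}")
```

Under these assumptions the listed segments sum to 100.5 min, or 120 min per injection once equilibration is included.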
In label-free analyses, for accurate MS measurements, a lock mass of [Glu1]-fibrinopeptide B (GFP) at 300 fmol/μl was injected once every 30 s during a single LC/MS/MS run at a flow rate of 0.5 μl/min with a NanoLock spray source. The voltage applied to produce the electrospray was 2.2 kV and the cone voltage was 35 V. Argon was introduced as a collision gas at a pressure of 20 psi. DDA mode was used for AGP samples and to build the focused database for human plasma samples. The MS scan range was 400-1600 m/z and the spectral acquisition time in each mode was 1.0 s with a 0.1-s interscan delay for DDA mode. Data-dependent peak selection of the three most abundant MS ions was applied, and the collision energy was set according to charge state recognition. Elevated energy (MSE) mode was used for label-free quantification of plasma samples, and all samples were analyzed in triplicate.
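The data-dependent selection rule above (the three most abundant ions within the 400-1600 m/z survey range) can be sketched as follows. The peak list is invented for illustration and `select_precursors` is a hypothetical helper, not part of the instrument software.

```python
import heapq


def select_precursors(peaks, top_n=3, mz_range=(400.0, 1600.0)):
    """Pick the top_n most intense (m/z, intensity) peaks inside the survey scan range."""
    low, high = mz_range
    in_range = [(mz, intensity) for mz, intensity in peaks if low <= mz <= high]
    return heapq.nlargest(top_n, in_range, key=lambda peak: peak[1])


if __name__ == "__main__":
    # Hypothetical survey scan: (m/z, intensity)
    survey_scan = [
        (350.20, 9000.0),  # intense, but outside the 400-1600 m/z range
        (497.76, 1200.0),
        (612.30, 800.0),
        (785.84, 3000.0),
        (901.40, 500.0),
    ]
    for mz, intensity in select_precursors(survey_scan):
        print(f"m/z {mz:.2f}  intensity {intensity:.0f}")
```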
MSE is used in LC/MS studies in order to obtain fragment information about small molecules in complex mixtures, based on acquisition of exact mass data at alternating high and low collision energies. In low-energy MS mode, data were collected at a constant collision energy of 5 eV to yield peptide molecular weight information. In elevated-energy MSE mode, the collision energy was increased from 20 to 35 eV for peptide fragmentation. All MS data generated during the elution of each peptide from LC were collected for the identification and quantitative analysis of the peptide. In both low-energy (MS) and MSE modes of acquisition, the mass scan range was 50-2000 m/z with spectral acquisition times of 1.5 s and 1.0 s, respectively, for MS and MS/MS modes with a 0.1-s interscan delay in MSE mode.
Database Search-Identification for DDA Data-Peak lists were generated and processed using the MassLynx software (ver. 4.1; Waters Corp.). MS spectra were smoothed once using a 5-point Savitzky-Golay method, centered on the top 50% of each peak. The resulting raw files from each analysis were automatically processed into a single *.pkl peak list file. The resulting *.pkl peak list files were searched against the International Protein Index (IPI) Human database (version 3.22; 57,846 sequences; 26,015,783 residues) using the Mascot search engine (version 2.1; Matrix Science, London, UK). Mascot was used with monoisotopic mass selected, a precursor mass tolerance of ±1.5 Da, and a fragment mass tolerance of ±0.8 Da. Trypsin was selected as the digestion enzyme, with one potential missed cleavage. ESI-Q-TOF was selected as the instrument type. Oxidized methionine, carbamidomethylated cysteine, propionamide cysteine, and pyroglutamate (N-term E, Q) were chosen as variable modifications for identifying nano-UPLC/ESI-Q-TOF results in DDA mode. Deglycosylated AGP analyses also included deamidation (N). In a Mascot search, AGP came up as the first hit with high-scoring peptides above the search threshold (expected p < 0.05, peptide score > 42). All of the Mascot search results from the normal human plasma and the individual HCC plasma samples were combined. The focused database was composed of sequences from all of the identified proteins in the pooled normal plasma sample and the ten HCC individual plasma samples plus the sequence of the G6PD internal standard.
Database Search and Quantification for MSE Data-Continuum LC-MSE data were generated and processed using ProteinLynx Global Server software version 2.2 (PLGS 2.2; Waters Corp.). Protein identifications were performed using the focused database composed of the Mascot search results from the previous step. The ion detection, clustering of peptide components by mass and retention time, and data normalization were performed using PLGS, as described by Silva et al. (37,38). Briefly, all raw data were corrected for mass accuracy by m/z 785.8426, corresponding to the GFP ([Glu1]-fibrinopeptide B) standard that was used as the lock mass. MS spectra were smoothed with a 7-point Savitzky-Golay method, centered on the top 80% of each peak. Trypsin was used as the digestion enzyme with one potential missed cleavage. ESI-QTOF was selected as the instrument type. Oxidized methionine and carbamidomethylated cysteine were chosen as modifications. For quantification, exact mass was selected with a mass tolerance of 20 ppm, and the fine retention time window was 1 min. The intensity measurements for the entire data set were normalized to the internal standard, G6PD. Under these criteria, only peaks detected at least twice were used to quantify both normal and HCC individual plasma samples.
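The matching and normalization criteria above can be illustrated with a minimal sketch. It is not the PLGS algorithm: the function names are hypothetical, the 1-min window is assumed to mean a maximum absolute retention time difference, and "detected at least twice" is assumed to refer to the replicate injections of a sample.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def same_emrt(observed, reference, ppm_tol=20.0, rt_tol_min=1.0):
    """Decide whether two (mass, retention time) pairs describe the same component."""
    mass_ok = abs(ppm_error(observed[0], reference[0])) <= ppm_tol
    rt_ok = abs(observed[1] - reference[1]) <= rt_tol_min
    return mass_ok and rt_ok


def normalize_to_standard(intensities, standard_intensity):
    """Express intensities relative to the internal standard signal of the same run."""
    return [value / standard_intensity for value in intensities]


def quantifiable(replicate_intensities, min_detections=2):
    """Keep a peak only if it was detected in enough replicates (None = not detected)."""
    detections = sum(value is not None for value in replicate_intensities)
    return detections >= min_detections


if __name__ == "__main__":
    print(same_emrt((1000.01, 30.5), (1000.0, 30.0)))   # about 10 ppm, 0.5 min apart
    print(normalize_to_standard([200.0, 50.0], 100.0))
    print(quantifiable([1500.0, None, 1620.0]))
```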

Nano-LC-ESI-MS/MS for Validation of Selected Biomarker Candidate Peptides: MRM Quantification-Online nano-LC-MS-MRM experiments were performed with a nanoACQUITY UPLC system (Waters Corp.) and a TSQ Quantum Ultra EMR triple-quadrupole mass spectrometer (Thermo Finnigan, San Jose, CA) equipped with a nanospray source. With the exception of the gradient, LC conditions were the same as those described above for the selection of candidate biomarkers. The gradient began with 5% B for 0.33 min, then ramped to 40% B over 39.67 min, to 50% B over 10 min, and to 95% B over 0.5 min, was held at 95% B for 9.5 min, and then ramped to 5% B over 0.5 min. The column was equilibrated with 5% B for 14.5 min before the next run. Optimized conditions were used for MRM quantification: quadrupoles Q1 and Q3 were held at 0.7 m/z FWHM with a scan time of 50 ms per peptide and 2.0 kV spray voltage. Optimum transitions and collision energy parameters were determined for each peptide by infusion of 10 pmol/μl peptide solution (Table III). Three transitions for each peptide were selected and monitored. First, precursor ions with a specific mass were transmitted from Q1 to the collision cell for fragmentation. Three fragment ions were then transmitted through Q3, yielding the signals used for quantification (39).
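To make the Q1/Q3 selection concrete, the following sketch computes a theoretical precursor m/z and singly charged y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. It only illustrates how candidate transitions can be derived; the transitions actually monitored are those listed in Table III, and the function names here are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(sequence, charge=2):
    """m/z of the protonated, unmodified peptide at the given charge (the Q1 target)."""
    neutral = sum(RESIDUE_MASS[residue] for residue in sequence) + WATER
    return (neutral + charge * PROTON) / charge


def y_ions(sequence):
    """Singly protonated y-ion m/z values (candidate Q3 targets), y1 first."""
    values = []
    running = WATER + PROTON
    # Skip the N-terminal residue so the full-length peptide is not reported as a fragment.
    for residue in reversed(sequence[1:]):
        running += RESIDUE_MASS[residue]
        values.append(running)
    return values


if __name__ == "__main__":
    peptide = "TEDTIFLR"
    print(f"[M+2H]2+ = {precursor_mz(peptide):.4f}")
    for index, mz in enumerate(y_ions(peptide), start=1):
        print(f"y{index} = {mz:.4f}")
```

Isotope-labeled (heavy) standards would need the mass shift of their label added to the affected residue; that shift is not stated in this section, so it is left out.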
Statistical Analyses-For quantitative comparisons of the label-free plasma samples, the exact mass and retention time (EMRT) results from the PLGS analysis were exported to an Excel spreadsheet (Microsoft, Redmond, WA), and all data were normalized using a linear regression analysis. Validated MRM results of target nonglycopeptides from the normal and HCC groups were compared statistically using MedCalc (version 10.1.8.0, demo version). The diagnostic accuracy of each peptide from the candidate biomarkers was evaluated using receiver operating characteristic (ROC) curve analyses. The area under the curve (AUC) is reported with its 95% confidence interval (CI). The related sensitivity and specificity were determined.
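The ROC analysis was run in MedCalc; the sketch below only illustrates what the AUC, sensitivity, and specificity measure, using the rank-based (Mann-Whitney) definition of the AUC. The peptide levels are invented, and the assumption that cases show lower values than controls follows the direction expected for the targeted peptides; confidence intervals are not computed here.

```python
def roc_auc_lower_in_cases(controls, cases):
    """AUC as the probability that a random case value is below a random control value.

    Ties count as one half (Mann-Whitney formulation).
    """
    score = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value < control_value:
                score += 1.0
            elif case_value == control_value:
                score += 0.5
    return score / (len(cases) * len(controls))


def sensitivity_specificity(controls, cases, cutoff):
    """Classify a sample as a case when its value is at or below the cutoff."""
    true_positives = sum(value <= cutoff for value in cases)
    true_negatives = sum(value > cutoff for value in controls)
    return true_positives / len(cases), true_negatives / len(controls)


if __name__ == "__main__":
    # Hypothetical light/heavy ratios for one peptide
    normal = [1.00, 0.90, 1.10, 0.95]
    hcc = [0.40, 0.50, 0.92, 0.30]
    print("AUC:", roc_auc_lower_in_cases(normal, hcc))
    print("sensitivity, specificity at 0.6:", sensitivity_specificity(normal, hcc, 0.6))
```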

RESULTS AND DISCUSSION
Proof-of-Concept for the Effects of Glycan Steric Hindrance on Proteolytic Digestion Using AGP-Successful enzyme digestion results in a solution of peptides that are completely and accurately cleaved at expected sites. This only occurs when the enzymes can efficiently access the reactive cleavage sites. Thus, steric hindrance in a protein that interferes with enzyme access affects proteolytic digestion (25,33). This effect is more pronounced in proteins such as N-linked glycoproteins that have been highly modified post-translationally. Steric hindrance from the glycan group(s) in N-linked glycoproteins has been thought to significantly affect the efficiency of proteolytic digestion if the enzymatically active amino acid is adjacent to the N-linked glycosylation site, resulting in quantitatively different peptide products in accordance with the degree of glycosylation.
This concept is shown in Fig. 1. The illustrations depicting N-linked glycoproteins are based on an article by Bunkenborg (40). Both normal (left) and aberrant (right) glycosylated N-linked glycoproteins are shown. After tryptic digestion, most of the peptides that are not related to the N-linked glycosylation sites will be present at the same concentration, with a relative abundance close to unity. However, concentrations of peptides that were adjacent to N-linked glycosylation sites, such as peptide 7, will be lower due to steric hindrance by the glycan in aberrantly glycosylated N-linked glycoproteins. Because steric hindrance increases with the size of the glycan unit, the abundance of peptide 7, which lies next to a relatively large glycan moiety, will be significantly decreased. In diseased plasma samples, aberrant glycosylation of glycoproteins, such as increased levels of fucose and sialic acid, the addition of polylactosamine units, or higher-order branching of N-linked glycans, is known to occur (33). The peak area ratios (aberrant/normal) of specific nonglycosylated tryptic peptides that originate from sites adjacent to N-linked glycosylation sites are thus expected to be lower than those of other peptides. These peptides were the target of this study: nonglycosylated tryptic peptides from N-linked glycoproteins.
First, the effect of glycan steric hindrance on trypsin digestion was demonstrated using AGP as a model compound. Two AGP digest solutions were prepared and analyzed by LC/MS/MS. One consisted of a tryptic digest of AGP (AGP-T). The other consisted of a tryptic digest of deglycosylated AGP that had already been treated with PNGase F (AGP-PT). The identified peptides are listed in Table I, where the underlined sequences indicate the quantified peptides. The nearest N-linked glycosylation sites in each identified peptide are indicated as 72N and 103N (in parentheses). Methionine oxidation generally occurs when a methionine residue is exposed to oxygen during digestion, and cysteine carbamidomethylation occurs during treatment with iodoacetamide. The peak intensity ratio was much lower for the nonglycosylated tryptic peptide TEDTIFLR than for the other peptides (peptides 2, 3, and 4), which had peak intensity ratios of approximately one.
This indicates that the production of enzyme-digested peptides that had been adjacent to the N-linked glycosylation site in AGP was affected by the steric hindrance of the glycan unit. Peptide TEDTIFLR was produced by tryptic cleavage at both sites 73K and 81R. However, because of the steric hindrance by the glycan linked at 72N, trypsin's access to the 73K cleavage site was inhibited. This caused the peptide TEDTIFLR to be present at low levels in the AGP-T sample. Notably, the same peptide exhibited a much higher abundance in the AGP-PT sample, where the glycan had already been removed by PNGase F treatment prior to tryptic digestion. Furthermore, the degree of this effect would increase with the size of the glycan and decrease with the distance between the enzyme cleavage site and the N-linked glycosylation site.
Stable isotope-labeled peptide, TEDTIFL*R, was synthesized and spiked equally into the AGP-T and AGP-PT samples prior to LC/MS/MS analysis. The relative abundance of peptide TEDTIFLR (light) and peptide TEDTIFL*R (heavy) in the AGP-T and AGP-PT samples is shown in Fig. 2B. The level of TEDTIFLR peptide in the AGP-T sample was less than 10% of that in the AGP-PT sample. This experiment again demonstrates that the amount of tryptic peptide, TEDTIFLR, was quantitatively different in AGP-T and AGP-PT samples.
FIG. 1. Effects of glycan steric hindrance and the degree of glycosylation on the efficiency of proteolytic digestion are illustrated. If the degree of glycosylation or size of the glycan unit(s) differs between normal and diseased samples, then these differences will be reflected in the relative abundance of nonglycosylated tryptic peptides that were adjacent to N-linked glycosylation sites following digestion. In the diseased sample shown here, nonglycosylated tryptic peptides (4 and 7) and N-linked glycosylated peptides (5 and 8) will be cleaved with a different efficiency compared with those in a normal plasma sample. Furthermore, the relative abundance of nonglycosylated tryptic peptide 7 will differ in the mass spectra of normal and diseased samples. Peptide 4 will not be affected by glycan steric hindrance because "R" (the active site for trypsin) is distant from the N-linked glycosylation site. Thus, peptide 4 will be detected at similar levels in both the normal and diseased samples. Other peptides, except nonglycosylated tryptic peptide 7 and peptide 4, will be present at similar levels. Full lines represent general peptides and dotted lines indicate nonglycosylated tryptic peptides. Glycopeptides 5 and 8 could not be identified in the MS analyses and are not displayed in the third step.
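The light/heavy comparison of TEDTIFLR between AGP-T and AGP-PT reduces to a ratio of ratios, because the heavy standard is spiked equally into both samples. A minimal sketch with invented peak areas (the function names are hypothetical and the numbers are not the measured data):

```python
def light_heavy_ratio(light_area, heavy_area):
    """Peak area of the native (light) peptide relative to the spiked heavy standard."""
    return light_area / heavy_area


def relative_level(sample_ratio, reference_ratio):
    """Level of the light peptide in one sample relative to a reference sample."""
    return sample_ratio / reference_ratio


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary units); the heavy standard is similar in both runs.
    agp_t = light_heavy_ratio(light_area=8.0e4, heavy_area=1.0e6)    # glycosylated AGP
    agp_pt = light_heavy_ratio(light_area=1.0e6, heavy_area=1.0e6)   # deglycosylated AGP
    print(f"TEDTIFLR in AGP-T relative to AGP-PT: {relative_level(agp_t, agp_pt):.0%}")
```

Dividing by the heavy signal first removes run-to-run differences in injection and ionization, so the remaining difference reflects how much light peptide the digestion released.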
Discovery of Candidate Nonglycopeptide Biomarkers by Label-Free Analyses-A label-free quantitative LC/MS/MS method was applied to identify nonglycosylated tryptic peptide biomarkers from HCC human plasma. Many of the labeling approaches in quantitative proteomics have potential limitations, including complex and expensive sample preparation and incomplete labeling. In contrast, label-free quantification approaches do not have these limitations and, above all, are not limited by the number of samples. Label-free methods are limited only by the reproducibility of LC and the accuracy of mass measurements in mass spectrometry. In this study, all 11 plasma samples (one pooled normal and ten HCC plasma samples) exhibited highly reproducible retention times and accurate mass measurements for separated peptides. The standard deviations in the retention time and in the mass accuracy of the eight tryptic peptides from the internal standard (G6PD) were 0.19-0.30 min and 0.53-3.45 ppm, respectively (supplemental Table S3).
All of the plasma samples were analyzed in DDA mode for identification prior to label-free quantification. All identified proteins from 11 analyses (one normal and ten HCC human plasma samples) were combined to form a focused database. This database consisted of 94 proteins including single peptide identifications. The small size of this database, relative to the total human database, increased the reliability of label-free MSE results by the PLGS algorithm. Each sample was analyzed in triplicate in MSE mode by LC/MS/MS. Peptides were identified and quantified by PLGS, developed by Silva et al. (37,38). Quantitative results were statistically normalized by linear regression, although the retention times of the label-free results were normalized by the internal standard peptides in PLGS.
Between the two groups, 133 peptides were quantified; 17 nonglycosylated tryptic peptides related to N-linked glycosylation sites are listed in Table II. In terms of the average ratio from all quantified peptides for each protein, the normal and HCC plasmas were similar (Table II), generally differing by no more than ±30%. At the peptide level, however, some peptides changed by more than 30% between the two groups or showed quite different ratios in comparison with the averaged ratios at the protein level.
We selected four target peptides (1, 2, 4, and 7) as nonglycosylated tryptic peptide biomarkers that were not only adjacent to N-linked glycosylation sites, but also exhibited quantitative differences between normal and HCC human plasma samples (marked with an asterisk in Table II). Despite their proximity to N-linked glycosylation sites, peptides 8, 9, 13, 14, 15, and 17 showed small quantitative differences of less than 20% and were excluded from consideration as nonglycosylated tryptic peptide biomarkers. Nonglycosylated tryptic peptides such as 3, 5, 12, and 16 were also not selected because they were distant from the N-linked glycosylation sites. Also, peptides 10 and 11 were not selected because of their poor MS/MS spectra by MSE (supplemental Fig. S1 presents MS/MS spectra of all nonglycosylated tryptic peptides in Table II except for peptides 10 and 11; the peptide probabilities from the PLGS search algorithm were over 95 for all peptides). Isoform HMW of kininogen 1 precursor was identified by peptides 3, 4, 5, and 6. Among these peptides, peptide 4 was selected based on its proximity to an N-linked glycosylation site. Fig. 3A shows the extracted ion chromatogram of peptide TEDTIFLR, a nonglycosylated tryptic peptide biomarker (No. 1 in Table II) from AGP. Compared with those of other peptides (Figs. 3B, 3C, 3D), TEDTIFLR from HCC plasma showed a relatively low peak intensity. Such a result is consistent with our previous experimental data, obtained as a proof of principle. As expected, this demonstrated that the difference in a protein's glycosylation state between a normal and diseased sample can be determined by specific nonglycosylated tryptic peptides. Other MS-based label-free results showing only quantitative differences in specific nonglycosylated tryptic peptide biomarkers are presented in supplemental Fig. S2.
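The selection logic described above can be summarized as a three-part filter: proximity to an N-linked glycosylation site, a quantitative change of at least 20%, and an acceptable MS/MS spectrum. The sketch below uses invented values; the `Peptide` fields and the `is_candidate` helper are hypothetical and do not reproduce Table II.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    number: int
    hcc_to_normal_ratio: float   # label-free abundance ratio, HCC / normal
    adjacent_to_glycosite: bool  # cleavage site next to an N-linked glycosylation site
    good_msms: bool              # MS/MS spectrum judged acceptable


def is_candidate(peptide, min_change=0.20):
    """Apply the three selection criteria; min_change is the 20% threshold from the text."""
    change = abs(peptide.hcc_to_normal_ratio - 1.0)
    return peptide.adjacent_to_glycosite and peptide.good_msms and change >= min_change


if __name__ == "__main__":
    # Hypothetical entries illustrating each reason for selection or exclusion
    peptides = [
        Peptide(1, 0.45, True, True),    # adjacent, large change: selected
        Peptide(3, 0.50, False, True),   # distant from the glycosylation site: excluded
        Peptide(8, 0.90, True, True),    # change below 20%: excluded
        Peptide(10, 0.40, True, False),  # poor MS/MS spectrum: excluded
    ]
    print([p.number for p in peptides if is_candidate(p)])
```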
Although the label-free method showed quantitative differences in the nonglycosylated tryptic peptide TEDTIFLR of AGP, the Western blot analysis of AGP was very similar between the normal and HCC plasma samples (supplemental Fig. S3). Western blot analysis without specific glycoprotein enrichment did not reveal any distinguishing information between the normal and disease groups, despite differences in the glycan groups or degree of glycosylation reported previously (41-43). Western blot analysis shows total protein levels and cannot accurately detect changes in protein glycosylation.
At the peptide level, peptide TEDTIFLR from AGP, peptide TINPAVDCCK from afamin precursor, peptide ENFLFLTPDCK from isoform HMW of kininogen 1 precursor, and peptide GQYCYELDEK from vitronectin precursor protein could be identified as nonglycosylated tryptic peptide biomarkers for HCC, without glycoprotein enrichment.
Verification of Candidate Nonglycopeptide Biomarkers by MRM Analyses-Triple quadrupole mass spectrometry yielded very specific and sensitive results because all targeted peptides were monitored by two mass filters (MS1: parent ion, and MS2: specific fragment ions of the parent ion) during the MRM. MS-based MRM assays (44-49) have been proposed as an alternative to antibody-based biomarker verification because the two mass filter stages provide high throughput, selectivity, and sensitivity. This approach was used to verify the selected nonglycosylated tryptic peptide biomarkers in ten normal and eighteen HCC human plasma samples.
All of the plasma samples were prepared as in the previous identification step. Prior to the LC/MS/MS run, all of the plasma samples were spiked equally with four stable isotope-labeled standard peptides (heavy) of the selected native nonglycosylated tryptic peptides (light) and analyzed by triple quadrupole mass spectrometry combined with the same UPLC system used for the label-free analyses. The optimum transition ions and collision energy conditions for each peptide were determined through direct infusion nanoflow-ESI experiments with stable isotope-labeled standards (Table III). Calibration curves for each peptide also showed strong linearity in the range of concentrations from 10 to 500 fmol for the human plasma sample (2 g; supplemental Fig. S4). The MS and MS/MS tolerances were Δ1 Da and Δ0.7 Da, respectively.
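A minimal sketch of how the linearity of such a calibration curve could be checked with an ordinary least-squares fit. The choice of the light/heavy peak-area ratio as the response, the function name `linear_fit`, and the numbers in the usage note are illustrative assumptions, not values or procedures taken from this study.

```python
def linear_fit(amounts_fmol, responses):
    """Ordinary least-squares line through (amount, response) points.

    Returns (slope, intercept, r_squared). The response is assumed here
    to be a light/heavy peak-area ratio; that choice is hypothetical.
    """
    n = len(amounts_fmol)
    if n < 2 or n != len(responses):
        raise ValueError("need at least two paired points")
    mean_x = sum(amounts_fmol) / n
    mean_y = sum(responses) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in amounts_fmol)
    syy = sum((y - mean_y) ** 2 for y in responses)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(amounts_fmol, responses))
    if sxx == 0 or syy == 0:
        raise ValueError("amounts and responses must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared
```

For example, calling `linear_fit([10, 50, 100, 250, 500], ratios)` with made-up ratios that lie close to a straight line would return an `r_squared` near 1, which is the sense in which a calibration curve is described as strongly linear.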
Quantitative MRM analyses were run in triplicate for each sample. Data that resulted in detection events in two or more of the three trials, with a coefficient of variation (CV) < 30%, were accepted. Supplemental Table S4 shows the CV% of light/heavy ratios of targeted nonglycosylated tryptic peptides from normal and HCC plasma samples by MRM. Data that yielded results under the limit of quantification for each peptide were removed. MRM chromatograms of the targeted nonglycosylated tryptic peptides and those of the stable isotope-labeled standard peptides are shown in supplemental Fig. S5.
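The acceptance rule described above (detection in at least two of three replicates, CV below 30%) can be sketched as follows. This is an illustration under stated assumptions: the function name `accept_measurement` is hypothetical, undetected replicates are represented as `None`, and the sample standard deviation is used for the CV, which the text does not specify. The separate limit-of-quantification filter is not modeled.

```python
from statistics import mean, stdev


def accept_measurement(ratios, max_cv=30.0, min_detections=2):
    """Apply a replicate acceptance rule to light/heavy ratios.

    ratios: one value per replicate, None where the peptide was not
    detected (an assumed encoding). Returns (accepted, cv_percent);
    cv_percent is None when the CV cannot be computed.
    """
    detected = [r for r in ratios if r is not None]
    # A CV needs at least two values, whatever min_detections says
    if len(detected) < max(min_detections, 2):
        return False, None
    avg = mean(detected)
    if avg == 0:
        return False, None
    # Sample standard deviation (n - 1 denominator); an assumption
    cv_percent = 100.0 * stdev(detected) / avg
    return cv_percent < max_cv, cv_percent
```

With made-up inputs, `accept_measurement([1.0, 1.1, None])` is accepted because two replicates were detected and their CV is well under 30%, whereas `accept_measurement([1.0, None, None])` is rejected for having a single detection.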
The MRM results were analyzed statistically using MedCalc (version 10.1.8.0). ROC curves and quantitative scatter plots of the MRM results for the normal and HCC groups are shown in Fig. 4. The target nonglycosylated tryptic peptides were present at lower levels in the HCC samples than in the normal group, which corroborates the results of the label-free quantitative analysis (Table II). The area under the ROC curve (AUC) for each peptide is shown in Table IV. Sensitivity for the four targets ranged from 55.6 to 88.9%. Specificity was between 75 and 100% at the 95% confidence interval (CI). The AUC of peptide GQYCYELDEK, from vitronectin precursor protein, was particularly high (0.978).
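For readers unfamiliar with these statistics, the sketch below shows one standard way to compute an empirical AUC (the Mann-Whitney formulation) and the sensitivity and specificity at a given cutoff when lower peptide levels indicate disease, as observed here. It is not a reproduction of MedCalc's implementation; the function names and the example values in the usage note are hypothetical.

```python
def auc_lower_is_positive(controls, cases):
    """Empirical AUC when LOWER values indicate disease.

    Equals the fraction of (case, control) pairs in which the case
    value is below the control value, counting ties as one half.
    """
    if not controls or not cases:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in cases:
        for control in controls:
            if case < control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(controls, cases, cutoff):
    """Classify a sample as diseased when its value is <= cutoff."""
    true_pos = sum(1 for v in cases if v <= cutoff)
    true_neg = sum(1 for v in controls if v > cutoff)
    return true_pos / len(cases), true_neg / len(controls)
```

As an illustration with invented light/heavy ratios, controls of `[1.0, 1.2, 0.9, 1.1]` and cases of `[0.5, 0.6, 0.95, 0.4]` give an AUC of 15/16, because only one case-control pair is ordered the "wrong" way. An AUC of 1.0 would mean every HCC sample had a lower value than every normal sample.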
Recently, an antibody-lectin sandwich assay that recognizes asialo-alpha-1-acid glycoprotein (AsAGP) was developed and applied to 610 serum specimens from patients with hepatic diseases, 41 healthy donors, and 155 patients with nonhepatic diseases. AsAGP was validated as a marker for hepatic disease, with an AUC value of 0.919 (42). Pei et al.

CONCLUSIONS
In this study, we developed a novel mass spectrometric approach for the discovery and validation of biomarkers in human plasma that targets specific nonglycosylated tryptic peptides in N-linked glycoproteins. Our hypothesis was that steric hindrance by glycan groups in N-linked glycoproteins significantly affects the efficiency of tryptic digestion when enzymatically active amino acids are adjacent to an N-linked glycosylation site, so that digestion yields quantitatively different peptide products in accordance with the degree of glycosylation. We identified four nonglycosylated tryptic peptide targets using one pooled normal and ten HCC plasma samples, and verified them using ten normal and eighteen HCC plasma samples.
The current approach has significant benefits for the discovery and validation of biomarkers. In terms of sample preparation, it does not require the complex, time-consuming, and poorly reproducible procedures, such as glycoprotein/glycopeptide enrichment, that conventional glycoprotein research has relied on. In terms of sensitivity, because the targets are nonglycosylated tryptic peptides rather than glycopeptides, they can readily be detected and identified by MS without PNGase F treatment. This approach also reveals differences in the degree of glycosylation of glycoproteins that are not seen by Western blot analyses. Above all, the approach is simple and practically applicable to large numbers of complex clinical samples, such as plasma.