Development of Glycoprotein Capture-Based Label-Free Method for the High-throughput Screening of Differential Glycoproteins in Hepatocellular Carcinoma

A robust, reproducible, and high-throughput method was developed for the relative quantitative analysis of glycoprotein abundances in human serum. Instead of quantifying glycoproteins by glycopeptides, as in conventional quantitative glycoproteomics, glycoproteins were quantified by nonglycosylated peptides derived from the glycoprotein digest. The workflow consists of the capture of glycoproteins in serum samples and the release of nonglycopeptides by trypsin digestion of the captured glycoproteins, followed by two-dimensional liquid chromatography-tandem MS analysis of the released peptides. Protein quantification was achieved by comparing the spectral counts of identified nonglycosylated peptides of glycoproteins between different samples. This method was demonstrated to have almost the same specificity and sensitivity in glycoprotein quantification as capture at the glycopeptide level. The differential abundance of proteins present at levels as low as nanograms per milliliter was quantified with high confidence. The established method was applied to the analysis of human serum samples from healthy people and patients with hepatocellular carcinoma (HCC) to screen differential glycoproteins in HCC. Thirty-eight glycoproteins were found with substantial concentration changes between normal and HCC serum samples, including α-fetoprotein, the only clinically used marker for HCC diagnosis. The abundance changes of three glycoproteins, i.e. galectin-3 binding protein, insulin-like growth factor binding protein 3, and thrombospondin 1, which were associated with the development of HCC, were further confirmed by enzyme-linked immunosorbent assay. In conclusion, the developed method is an effective approach to quantitatively analyze glycoproteins in human serum and could be further applied in biomarker discovery for HCC and other cancers.

Mass spectrometry (MS)-based quantitative proteomics has become the most commonly used approach for studying expression changes of proteins in large-scale studies (1). A variety of mass spectrometry-driven protein quantification methods have been proposed that involve stable isotope labeling of proteins or peptides coupled with tandem mass spectrometry (MS/MS) sequencing, e.g. isotope-coded affinity tags (2), stable isotope labeling by amino acids in cell culture (3), and multiplexed quantification using isobaric tagging reagents (4). In contrast, label-free methods have received increasing attention as promising alternatives that avoid some of the disadvantages of stable isotope labeling. Various label-free quantification strategies have also been developed in recent years, based on comparing either the direct mass spectrometric signal intensity for a given peptide (5) or the number of acquired spectra matched to a protein (6), which is referred to as the spectral counting method. With the advances of these quantification methods, mass spectrometry-based quantitative proteomics is becoming an emerging technology in biomarker discovery of human diseases (7).
Human serum is generally considered the primary clinical specimen in disease diagnosis and therapeutic monitoring (8). It represents the largest and deepest version of the human proteome present in any sample. Besides the classical serum proteins, it contains various tissue proteins (as leakage markers) plus numerous distinct immunoglobulin sequences. However, it is challenging to perform comprehensive serum proteome analysis because of the extraordinary dynamic range and complexity. Several separation methods have been developed to reduce the complexity of the serum sample. Because most of the disease-related proteins, always secreted or shed from cell surfaces or released from tissue, are glycosylated (9), the enrichment of glycoproteins has become one of the key issues in biomarker discovery and various discovery methods have been developed, such as the capture of glycoproteins with lectins (10). Glycoprotein capture with either a single lectin or a combination of lectins with affinity to different types of glycan linkages, coupled with mass spectrometry-based quantitative proteomics methods, has been widely applied. However, the major drawback of the lectin capture method is the relatively weak binding of lectins to oligosaccharides, which may introduce large amounts of nonspecific adsorption of abundant proteins such as human serum albumin, which accounts for as much as 55% of total serum protein. For example, it was reported that only 34 of the 152 proteins (22.3%) bound to a concanavalin A (Con A) column in a membrane protein study of breast tumor cells were found to be glycoproteins (11). This nonspecific adsorption would affect the accuracy and reproducibility of glycoprotein quantification. In addition, glycoproteins of low abundance are not likely to be quantified with lectin capture methods because the isolation ability of lectins is limited.
Although depletion of some abundant proteins before lectin capture may reduce nonspecific adsorption and enhance analytical performance (12), this procedure may also deplete some proteins of low abundance that are potential biomarkers, because it has been found that immunosorbent columns and antibodies bind more than 100 proteins nonspecifically, some of which are glycosylated (13).
Another popular method for the isolation, identification, and quantification of glycoproteins is the capture of oxidized glycopeptides by hydrazide chemistry (14). This method shows high specificity for the solid phase extraction of N-linked glycopeptides. The captured glycopeptides can be released by PNGase F and labeled at their N termini, which makes quantification of N-glycosites feasible. However, isotopic labeling increases sample complexity because of the differential labeling and combining of the samples (15). For a glycoprotein with only one glycosite, the quantification accuracy may be affected by the labeling efficiency of different samples because there is only one peptide to be quantified. Moreover, because there exists great heterogeneity in the occupancy of different N-glycosites in glycoproteins with multiple glycosylated sites, quantification of glycoproteins by glycopeptides would lead to ambiguous results.
Compared with isotope labeling, label-free methods have received increasing attention as promising alternatives. Glycoproteins can be quantified by comparing liquid chromatography (LC)-MS maps of deglycosylated peptides that are generated by high resolution linear trap quadrupole-Fourier transform mass spectrometry using bioinformatics tools (16).
Although promising reproducibility and sensitivity could be achieved, the variability of signal intensities among replicates increases with decreasing signal intensity, which would introduce ambiguity in the quantification of glycoproteins of low abundance or with few N-glycosylation sites. Instead of quantification by the peak intensity of MS1 in a high accuracy MS instrument, another viable label-free quantitative strategy is spectral counting, where the number of spectra matched to peptides from a protein is used as a surrogate measure of protein abundance (6). Recent studies have demonstrated that spectral counting can be as sensitive as ion peak intensities in terms of detection range, while retaining linearity, and is highly reproducible (17). However, there are only a few studies in which spectral counting was applied to quantitatively analyze glycoproteins in a high-throughput manner (11), and only one study combined spectral counting with the hydrazide chemistry approach, using the spectrum number of glycopeptides as the index of glycosylation level (18).
In this study, we propose an approach that combines the specificity of hydrazide chemistry and the convenience of the spectral counting method to quantify the abundance changes of glycoproteins in human serum samples. Unlike the existing glycopeptide-based methods, only nonglycosylated peptides were released by trypsin digestion after glycoprotein capture, and the spectral counts of the identified nonglycopeptides were used to determine the abundance of the glycoproteins. We demonstrate that this approach is accurate, sensitive, and highly reproducible over a wide dynamic range of protein concentrations. Most importantly, unlike the measurement of the glycosylation level of N-glycosites in the conventional approach using glycopeptides, the application of this approach to the analysis of pooled human serum of both healthy people and patients with HCC indicated that this approach quantifies glycoprotein expression. A total of 38 glycoproteins were found to have differences in concentration, including α-fetoprotein (AFP), which is the only current clinical biomarker for HCC, and the results for most identified differential glycoproteins were in accordance with previously published studies. Moreover, three selected glycoproteins were validated by ELISA with another HCC cohort, which proved that this method is an effective approach for high-throughput screening of differential glycoproteins.

EXPERIMENTAL PROCEDURES
Sample Collection-The study was approved by the Institutional Review Board at the Second Military Medical University. Informed consent was obtained from every patient enrolled in this study. Serum samples were collected from April to December 2008 in the Eastern Hepatobiliary Surgery Hospital, Second Military Medical University (Shanghai, China). Serum samples were processed from each individual using a 12G BD Vacutainer safety-Lok™ blood collection system. Blood samples were collected in "gold top" tubes containing gel separator and clot activator. After collection they were immediately placed on ice and allowed to stand for 30 min, and were then centrifuged at 3000 rpm for 15 min. Serum samples were stored at -80°C until analysis. One serum cohort, composed of 27 HCC cases and 20 age- and gender-matched normal controls, was pooled by group for quantitative glycoproteomic analysis. Another serum cohort, including 58 HCC cases and 30 normal controls, was collected for ELISA validation. HCC diagnoses were confirmed by histopathologic study. Normal controls were selected from those without a history of liver disease, with normal liver biochemical function, and without hepatitis B or hepatitis C infection. The clinical information of HCC samples and normal controls for proteomics analysis and ELISA validation is listed in Table I.
Materials and Chemicals-All the water used in this experiment was prepared using a Milli-Q system (Millipore, Bedford, MA); iodoacetamide was purchased from AMERESCO (Solon, OH); sodium periodate, dithiothreitol, ammonium bicarbonate, urea, trifluoroacetic acid (TFA), formic acid, and acetonitrile were all obtained from Aldrich (Milwaukee, WI). All the chemicals are of analytical grade except acetonitrile, which is of high performance liquid chromatography grade. Affi-Gel ® Hz Hydrazide Gel was purchased from Bio-Rad (Hercules, CA).
Capture of Glycoprotein by Hydrazide Resin-The procedure for the capture of glycoproteins from pooled serum samples was based on the method reported previously with minor modification (19). Briefly, 10 µl of human serum was first diluted with 30 µl of oxidation buffer (100 mM NH4Ac, 150 mM NaCl, pH = 5) and exchanged to the oxidation buffer using a Spin P-6 (Bio-Rad) protein desalting column, then diluted to 180 µl with the same buffer, and 20 µl of 100 mM sodium periodate was added. The reaction was kept in the dark for 1 h at room temperature. The oxidant was removed with a YM-3 ultrafiltration kit (Millipore) with a molecular weight cutoff of 3000 Da. The oxidized glycosylated proteins were coupled to 50 µl of hydrazide resin (bead volume) overnight in coupling buffer at room temperature. After supernatants were removed, proteins were denatured by incubation with 20 mM dithiothreitol and 8 M urea in 100 mM ammonium bicarbonate at 37°C for 2 h. After that, the resin was washed with 100 mM ammonium bicarbonate and 20 mM iodoacetamide was added. The alkylation was carried out at 25°C in the dark for 30 min. The resin was washed three times with 0.5 ml each of 100 mM ammonium bicarbonate containing 8 M urea, 1.5 M NaCl, and 100 mM ammonium bicarbonate to remove unbound nonglycosylated proteins. The nonglycopeptides of coupled glycoproteins were released by incubation with 5 µg of proteomics grade trypsin (Sigma-Aldrich, Milwaukee, WI) in 100 µl of 50 mM NH4HCO3 overnight at 37°C. Released nonglycopeptides were collected, lyophilized with a SpeedVac, and reconstituted in 100 µl of 0.1% formic acid solution for further analysis. Glycopeptides were released by further incubating the beads with 2 µl of PNGase F in 100 µl of 50 mM NH4HCO3 overnight at 37°C. Collected glycopeptides were also lyophilized with a SpeedVac and reconstituted in 10 µl of 0.1% formic acid solution for further analysis.
Analysis of Isolated Peptides-The analysis of the released nonglycopeptide samples was carried out by two-dimensional nano-LC-MS/MS according to our previous method (20). A 25-µl aliquot of the reconstituted peptide sample was loaded onto a phosphate strong cation exchange (SCX) monolithic column (150 µm i.d. × 7 cm) by air pressure. After that, the SCX column with loaded peptides was manually connected to a C18 column (75 µm i.d. × 10 cm) in tandem by a union. The peptides were eluted onto the analytical reversed phase column using stepwise salt solutions generated with 1000 mM ammonium acetate (pH = 2.7) and 0.1% formic acid. The salt concentrations of the buffer used for the ten stepwise elutions were 50, 100, 150, 200, 250, 300, 350, 400, 500, and 1000 mM, respectively. Each salt step lasted 10 min and was followed by equilibration with 0.1% formic acid for an additional 10 min. After each elution step, a subsequent reversed phase (RP)LC-MS/MS run was executed with a 120 min gradient from 5% to 35% acetonitrile. The RPLC-MS/MS was performed on a nano-RPLC-MS/MS system. A Finnigan surveyor MS pump (Thermo Finnigan, San Jose, CA) was used to deliver the mobile phase. For the C18 capillary separation column, one end of the fused-silica capillary was manually pulled to a fine point of ~5 µm with a flame torch. The columns were packed in-house with C18 AQ beads (5 µm, 120 Å) from Michrom BioResources (Auburn, CA) using a pneumatic pump. The nano-RPLC column was directly coupled to a linear trap quadrupole linear ion trap mass spectrometer from Thermo Finnigan (San Jose, CA) with a nanospray electrospray ionization (ESI) source. The mobile phase consisted of mobile phase A containing 0.1% formic acid (v/v) in water, and mobile phase B containing 0.1% (v/v) formic acid in acetonitrile. The ESI voltage was 1.8 kV and the linear trap quadrupole instrument was operated in positive ion mode. Normalized collision energy was 35.0%. One microscan was set for each MS and MS/MS scan.
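As a rough check on instrument time, the stepwise elution schedule described above can be tabulated. The sketch below assumes that each of the ten salt steps costs one 10 min salt plug, one 10 min equilibration, and one 120 min gradient, with no allowance for sample loading or column re-equilibration; all names are illustrative and are not part of the published method.

```python
# Salt steps (mM ammonium acetate) as listed in the text.
SALT_STEPS_MM = [50, 100, 150, 200, 250, 300, 350, 400, 500, 1000]

SALT_PLUG_MIN = 10      # duration of each salt step
EQUILIBRATION_MIN = 10  # 0.1% formic acid equilibration after the salt step
GRADIENT_MIN = 120      # reversed phase gradient, 5% to 35% acetonitrile


def minutes_per_step() -> int:
    """Time for one salt step plus its reversed phase run (assumed additive)."""
    return SALT_PLUG_MIN + EQUILIBRATION_MIN + GRADIENT_MIN


def total_run_minutes() -> int:
    """Total 2D-LC time for one sample under the assumptions above."""
    return len(SALT_STEPS_MM) * minutes_per_step()


if __name__ == "__main__":
    print(f"{total_run_minutes()} min per sample "
          f"({total_run_minutes() / 60:.1f} h)")
```

Under these assumptions a single two-dimensional analysis occupies the instrument for roughly a day, which is worth keeping in mind when planning replicate runs.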
All MS and MS/MS spectra were acquired in the data dependent mode. The mass spectrometer was set so that one full MS scan was followed by six MS/MS scans on the six most intense ions. The dynamic exclusion function was set as follows: repeat count 2, repeat duration 30 s, and exclusion duration 90 s. System control and data collection were done by Xcalibur software version 1.4 (Thermo). The scan range was set from m/z 400 to m/z 2000.
Database Searching and Data Processing-The MS/MS spectra were generated by Extract_msn in Bioworks 3.2 (Thermo) and searched with SEQUEST (version 2.7) against a composite database including both the original human protein database of the International Protein Index (ipi.human.3.17.fasta, including 60234 entries, http://www.ebi.ac.uk/IPI/IPIhuman.html) and the reversed version of the forward one. For the database search of spiked samples, the sequences of bovine ribonuclease B and alpha-2-HS glycoprotein were added. The enzyme cleavage site was set as trypsin, KR/P. The enzyme limit was set to fully enzymatic, cleaving at both ends. A maximum of two missed cleavages was allowed. For the nonglycopeptide search, carboxyamidomethylation (+57) was set as a static modification and oxidation of methionine (+16) was set as a dynamic modification. For the glycopeptide search, a dynamic modification for PNGase F-catalyzed deamidation of asparagine (+1 Da) was added. MS tolerance for precursor ions and fragment ions was 2 Da and 1 Da, respectively. The database search results were filtered with the criteria Xcorr ≥ 1.9, 2.2, and 3.75 for the +1, +2, and +3 charge states, respectively; DeltaCn was set to ≥ 0.1 for all three charge states. Validated peptide sequences were assembled into protein identifications with the nomenclature that provided a minimal list of proteins sufficient to explain all observed peptides, accounting for groups of isoforms and splice variants that share common peptides (21).
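The charge-dependent filtering criteria can be expressed compactly in code. The following is a minimal sketch, assuming each peptide-spectrum match has already been parsed into a simple record; the `PeptideMatch` type and its field names are hypothetical and do not correspond to any SEQUEST or Bioworks data structure. Only the threshold values are taken from the text.

```python
from dataclasses import dataclass

# Minimum Xcorr per precursor charge state, as stated in the text.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.1  # applied to all three charge states


@dataclass
class PeptideMatch:
    """Hypothetical record for one peptide-spectrum match."""
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float


def passes_filter(match: PeptideMatch) -> bool:
    """Return True if the match meets the charge-dependent thresholds."""
    threshold = XCORR_MIN.get(match.charge)
    if threshold is None:
        # Charge states other than +1, +2, +3 are not covered by the criteria.
        return False
    return match.xcorr >= threshold and match.delta_cn >= DELTA_CN_MIN


if __name__ == "__main__":
    example = PeptideMatch("LVNEVTEFAK", charge=2, xcorr=2.8, delta_cn=0.25)
    print(passes_filter(example))
```

Rejecting charge states outside +1 to +3 is a design choice of this sketch; the text does not say how such matches were handled.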
Glycoproteins identified by nonglycopeptides were considered positive if at least two unique peptides matched or a total spectral count of more than four spectra was evident (17). The false discovery rate of peptide identification was determined by doubling the number of peptides from the reversed database and dividing by the total number of peptides. The false discovery rate of nonglycopeptides for glycoprotein quantification was below 1% with the filtering criteria described above. Glycopeptides were identified with the N*XS/T motif, where X could be any amino acid except proline, as previously described (22). For the analysis of clinical samples, the isolated peptides from both healthy and cancerous serum were run in two replicates by two-dimensional LC-MS/MS and the spectral counts were averaged. A spectral count fold-change of more than two was considered differentially expressed between normal and cancer samples.
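Two of the rules in this paragraph lend themselves to a short illustration: the target-decoy false discovery rate and the N-X-S/T sequon check. The sketch below is an assumption-laden example rather than the software actually used; the function names are invented, and the sequon search operates on plain one-letter sequences without any modification marker on asparagine.

```python
import re


def false_discovery_rate(n_reversed: int, n_total: int) -> float:
    """FDR as described in the text: 2 x reversed-database hits / total hits."""
    if n_total == 0:
        return 0.0
    return 2 * n_reversed / n_total


# N followed by any residue except P, then S or T. A lookahead is used so
# that overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(sequence: str) -> list[int]:
    """Return 0-based positions of asparagines that sit in an N-X-S/T motif."""
    return [m.start() for m in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    print(false_discovery_rate(n_reversed=5, n_total=1000))
    print(sequon_positions("NGTNPSANAT"))
```

In the example sequence the asparagine followed by proline is skipped, which mirrors the "any amino acid except proline" condition.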
ELISA-Serological measurement of galectin-3 binding protein (Mac-2 BP or S90K), insulin-like growth factor binding protein 3 (IGFBP-3), and thrombospondin 1 (TSP-1) was performed using enzyme-linked immunosorbent assay (ELISA) kits from R & D Systems (Minneapolis, MN) according to the suggested protocols, with another cohort consisting of 58 HCC and 30 normal controls. The differences between HCC and control groups were evaluated by unpaired t test with Welch's correction using GraphPad Prism version 4 for Windows (GraphPad software, San Diego, CA).
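For readers unfamiliar with Welch's correction, the statistic and its approximate degrees of freedom can be computed directly. This is a minimal sketch using only the Python standard library, not a reproduction of the GraphPad Prism analysis; the function name is illustrative, the input values are made up, and converting the statistic to a p value would additionally require a t-distribution routine that is not shown here.

```python
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Unlike Student's t test, the two groups are not assumed to share a
    common variance, so each group's variance enters separately.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / (se2_a + se2_b) ** 0.5
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


if __name__ == "__main__":
    # Made-up concentrations purely for illustration.
    t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
    print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The correction matters here because the HCC and control groups differ in size (58 versus 30) and need not have equal variances.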

RESULTS
Glycoprotein Identification with Nonglycopeptides-The purpose of this research is to establish a method for the high-throughput monitoring of differential abundance of glycoproteins in clinical serum samples. Glycoprotein identification with the glycopeptide capture method has been a popular protocol for several years (23). The specificity of glycopeptide isolation with hydrazide chemistry has been well studied in a previous report (19). It has the advantages of high specificity and high efficiency in extracting N-glycopeptides from complex samples like human liver tissue (21). In this method, glycoproteins are digested first and only the glycosylated peptides are captured and analyzed, whereas the nonglycopeptides, which make up most of a glycoprotein, are discarded. Recent research showed that the nonglycopeptides can also be used to identify glycoproteins and that there is a good overlap between the identifications obtained by glycopeptides and by nonglycopeptides (24). This finding prompted us to quantify serum glycoproteins by nonglycopeptides. In a typical procedure of our approach, glycoproteins in serum samples were first oxidized with sodium periodate and then coupled to solid beads with hydrazide chemistry. After thorough washing, the nonglycopeptides of captured glycoproteins were released by trypsin digestion and the digest was analyzed with two-dimensional LC-MS/MS. Glycoproteins were identified with matched nonglycopeptides and could finally be quantified with the spectral counting approach, as shown in Fig. 1. Compared with the method of glycopeptide capture, identification with nonglycopeptides has its own advantages. First, glycoproteins are captured by the same hydrazide chemistry, which can guarantee the specificity and efficiency. We optimized the glycoprotein capture and washing procedure to reduce the nonspecific adsorption of albumin and related proteins that may bind to albumin.
After the captured proteins were denatured, the hydrazide beads were washed with 8 M urea in 100 mM NH4HCO3 another three times. This washing step could wash off albumin and the nonglycoproteins that adhere to albumin and to captured glycoproteins. A serum sample was processed with this optimized procedure. In a randomly chosen two-dimensional LC-MS/MS run, more than 1000 proteins were identified. Of these, 222 proteins were identified with at least two unique matched peptides or 4 MS/MS spectra, and are therefore highly confident. For comparison, the glycopeptides on the beads were released by PNGase F, after the release of nonglycopeptides, and analyzed by LC-MS/MS. Fig. 2 shows the overlap of glycoprotein identifications between the glycopeptide method and the nonglycopeptide method, which demonstrated that serum glycoproteins can also be identified by nonglycopeptides. It was observed that some glycoproteins identified by glycopeptides were not identified by nonglycopeptides. This is because much stricter criteria, requiring two unique peptide identifications, were applied in glycoprotein identification with nonglycopeptides. In fact, the coverage of serum glycoprotein identification by nonglycopeptides was better than that with glycopeptides, as more glycoproteins were identified. We investigated the 222 proteins identified by nonglycopeptides in the Swiss-Prot database. A total of 174 proteins were annotated and 120 (67.0%) of them were found to be glycoproteins. Although the percentage of glycoproteins was still lower than that in glycopeptide identification, this specificity is much better than that of the lectin chromatography method, which is only 22.3% as reported by Wang et al. (11). The percentage of glycoproteins was also higher than in the previous work using nonglycopeptides for glycoprotein identification, in which 191 of 589 proteins (32.4%) identified in HeLa cells were glycosylated (24).
Low specificity would reduce the accuracy and sensitivity in identification and quantification of low abundance glycoproteins in serum biomarker discovery because the most abundant serum protein, albumin, is nonglycosylated and some proteins bind to albumin to form protein complexes. However, nonspecific adsorption such as this could be removed with additional washing steps after the proteins were denatured. Moreover, the spectra of glycoproteins accounted for more than 95% of the total number of spectra, which demonstrated that the nonspecific proteins had little effect on glycoprotein identification and quantification. Second, there are many more nonglycopeptides than glycopeptides in a glycoprotein, so the increased number of peptides may increase the sensitivity and confidence in identifying low abundance glycoproteins, especially for proteins with only one glycosylated site, e.g. α-fetoprotein. Third, in a glycoproteomics study of human liver tissue, it was found that some glycoproteins cannot be identified by glycopeptide capture with only trypsin digestion because the tryptic glycopeptides are beyond the detection range of mass spectrometry (21). This is not a problem when nonglycopeptides are used to identify glycoproteins. Fourth and importantly, identification with nonglycopeptides facilitates the quantification of glycoproteins by the spectral counting method, which is a more sensitive method for detecting protein abundance changes (17).
Relative Quantification of Glycoproteins with Nonglycopeptides-A method developed to find potential biomarkers must be capable of assessing the relative abundance of proteins at various concentration levels in a reproducible and high-throughput way, from abundant glycoproteins like antitrypsin (mg/ml) to typical biomarkers like α-fetoprotein (ng/ml). To investigate the capability of our method, the linearity and sensitivity were evaluated by processing commercially available human serum spiked with different amounts of standard glycoproteins, i.e. bovine ribonuclease B and α-2-HS glycoprotein. The spiked concentrations of glycoprotein standards ranged over two orders of magnitude. The linearity between the spectral counts and the amount of spiked standard glycoproteins is shown in Fig. 3. The spectral counts correlated well with the amount of spiked standard glycoproteins over a linear range of at least two orders of magnitude, which demonstrated that the method could sensitively quantify differences in glycoprotein abundance over a wide concentration range. Similar linearity and sensitivity have also been reported for the ordinary spectral counting approach (6). It should be noted that the glycosylated peptides released by PNGase F were not taken into account when calculating the spectral counts because, compared with the number of nonglycopeptides, the spectral counts of glycopeptides are negligible and the ratio of spectral counts would not be influenced. Quantification by nonglycopeptides should theoretically be more sensitive than quantification by glycopeptides because many more nonglycosylated peptides than glycosylated peptides are generated from a glycoprotein. However, the increased number of peptides may also complicate the sample loaded onto the MS instrument. To overcome the limited peak capacity of a 1D nano-LC column, a two-dimensional LC-MS/MS system was used to analyze the complex tryptic peptides.
The increased number of peptides facilitated the quantification by spectral counting method, which is an advance over quantification by MS peak intensity.
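The linearity assessment described above amounts to an ordinary least-squares fit of spectral count against spiked amount, together with the coefficient of determination. The sketch below shows one way to compute both with the standard library alone; the function name and the example numbers are hypothetical and are not the measured values behind Fig. 3.

```python
def linear_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical spiked amounts (ug/ml) and spectral counts, for illustration only.
    spiked = [5.0, 10.0, 50.0, 100.0, 500.0]
    counts = [3.0, 5.0, 22.0, 41.0, 205.0]
    slope, intercept, r2 = linear_fit(spiked, counts)
    print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, R^2 = {r2:.4f}")
```

The same function can be applied to the spectral counts of two replicate runs to obtain the R-squared values used as a reproducibility measure.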
In the quantification of cell surface proteome changes by analysis of LC-MS maps of isolated N-glycopeptides (16), a high squared Pearson correlation R² was found for technical replicates (0.986-0.989) and for biological replicates (0.925-0.962). However, the variability of signal intensities among replicates increases with decreasing signal intensity. High variability in the quantification of low-intensity peptides was likely due to the low signal-to-noise ratio as well as the interference of higher abundance peptides (25), which is more prominent in serum sample analysis. Although the reproducibility can be improved by using more than one peptide for quantification, this is not practical for quantification by glycopeptides because an average of fewer than two glycopeptides was identified per glycoprotein in large-scale glycoproteomics studies (19, 21). To assess the reproducibility of quantifying glycoproteins with glycoprotein capture and spectral counting, the spectral counts of identified glycoproteins in two technical replicates and two sample replicates were plotted against each other. For technical replicates, the same nonglycopeptide mixtures were analyzed in replicate with the online two-dimensional LC-MS/MS system. For sample replicates, two serum samples of the same amount were processed in parallel through glycoprotein capture and nonglycopeptide release. Released nonglycopeptides were each analyzed with two-dimensional LC-MS/MS. The scatter plots and R-squared values for the two-dimensional LC-MS/MS runs of the two technical replicates and the two sample replicates are given in Fig. 4. The spectral counts were highly reproducible (R² = 0.9942 and 0.9843, respectively). This could be attributed to the high efficiency and specificity of the simplified glycoprotein capture procedure, which needs no immunodepletion.
Besides, as many more nonglycopeptides than glycopeptides are identified for one glycoprotein, the influence of variances in instrument condition, glycoprotein capture procedure, and other experimental processes can be offset, whereas such variances would certainly compromise the reproducibility and accuracy of quantification with the peak intensity of glycopeptides. The spectrum counting approach was shown to be more reproducible than the MS1 ion chromatogram approach in the analysis of S. cerevisiae crude membrane fractions (26). It was suggested that quantification with the peak area of a peptide in MS1 is a peptide-based or local quantitative approach, whereas spectrum counting is a protein-based or global approach. Large standard deviations would occur with ion chromatograms when post-translational modifications exist. In addition, spectrum counting has a wider dynamic range than ion chromatogram ratios.
Briefly, the combination of the selective capture of glycoproteins in human serum and label-free quantification of the nonglycopeptides is a feasible way to quantify concentration changes of glycoproteins in human serum samples. This approach demonstrated high selectivity in enriching the glycoproteins in human serum without the need for an immunodepletion step. The relative quantification of glycoproteins by comparing the spectral counts of released nonglycopeptides is an accurate, sensitive, and reproducible approach to detect differences in the concentration of glycoproteins over a linear range of more than two orders of magnitude.
FIG. 3. Spectral counts of glycoproteins versus their spiked concentrations. A, α-2-HS glycoprotein, from 5 µg/ml to 500 µg/ml; B, ribonuclease B, from 5 µg/ml to 100 µg/ml. Different amounts of glycoprotein standards were spiked into 10 µl of human serum from healthy donors and the spiked samples were processed with the procedure described above.
Application in the Differential Glycoprotein Screening in Hepatocellular Carcinoma-Once this accurate, reproducible, and high-throughput approach to quantitatively analyze glycoproteins in human serum had been developed, it was applied to find differentially expressed serum glycoproteins in HCC. The pooled serum samples of 20 healthy people and 27 patients with HCC were analyzed with the established workflow. Each sample digest was analyzed by two-dimensional LC-MS/MS in two replicates. The spectral counts for a protein in each replicate run were averaged, and the averages for the healthy and HCC samples were compared. In total, 148 glycoproteins were quantified with high confidence when the strict criteria described in the experimental section were applied to filter the dataset. All information about the spectral counts and identified peptides of these proteins is given in the supporting materials. Fig. 5 illustrates the changes in spectral counts of these proteins. The majority of identified glycoproteins, except some highly abundant glycoproteins like antitrypsin, did not show substantial concentration changes, with log2 spectral count ratios within the range of -1 to +1, which suggested there was no analytical bias toward different samples. However, 38 glycoproteins were found with significant differences in concentration between the two sets of samples; these are listed in Table II. The good accordance of the results with previous publications demonstrated the reliability of our approach in differential glycoprotein identification. AFP, the only clinically used marker for HCC diagnosis, was detected only in HCC serum. Cholinesterase, formed in the liver, is widely measured as a liver function test and its activity was reported to decrease in liver cirrhosis or hepatocellular carcinoma (27), which is in accordance with our results.
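The comparison rule used for this screen (average the replicate spectral counts, then flag proteins whose fold-change exceeds two, i.e. whose log2 ratio lies outside -1 to +1) can be sketched as follows. This is an illustrative reading of the criterion, not the authors' software; the function names and labels are invented, and the handling of proteins detected in only one sample set (as for AFP) is an assumption of the sketch.

```python
import math


def average_count(replicate_counts: list[float]) -> float:
    """Average the spectral counts of a protein over replicate runs."""
    return sum(replicate_counts) / len(replicate_counts)


def classify(count_hcc: float, count_normal: float, fold: float = 2.0) -> str:
    """Label a protein by its HCC/normal spectral count ratio.

    A protein is called differential only if the fold-change is strictly
    greater than `fold` in either direction.
    """
    if count_hcc == 0 and count_normal == 0:
        return "not detected"
    if count_normal == 0:
        return "HCC only"      # ratio undefined; reported separately
    if count_hcc == 0:
        return "normal only"
    log2_ratio = math.log2(count_hcc / count_normal)
    limit = math.log2(fold)
    if log2_ratio > limit:
        return "up in HCC"
    if log2_ratio < -limit:
        return "down in HCC"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical replicate counts for one protein.
    hcc = average_count([12, 8])
    normal = average_count([3, 5])
    print(classify(hcc, normal))
```

Treating zero counts as a separate category avoids dividing by zero and keeps proteins such as AFP, which had no matched spectra in the normal pool, from being assigned an arbitrary ratio.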
Vitronectin, which is a multifunctional plasma glycoprotein that participates in the regulation of coagulation, fibrinolysis, and the complement cascade (28), was also found in decreased concentration in serum of HCC patients. The plasma vitronectin level was low in all liver disease groups as compared with the healthy controls. The difference from the controls was significant in patients with hepatocellular carcinoma and decompensated cirrhosis. Moreover, the plasma vitronectin level was positively correlated with the levels of serum cholinesterase (29). The function and interaction of this protein has attracted extensive attentions. It has been demonstrated that increased expression of vitronectin associated with areas of lymphocyte infiltration in chronically inflamed liver and in primary and metastatic tumors, supporting the adhesion and migration of tumor-infiltrating lymphocytes (TIL), which suggested that vitronectin play an important role in recruiting and

FIG. 4. Scatter plots and R-squared values for the two-dimensional LC-MS/MS runs of (A) two technical runs and (B) two sample replicates. The linearity of the plots demonstrates the reproducibility of the developed approach in both mass spectrometry identification and sample processing.
FIG. 5. Log2 ratio of spectral counts (HCC versus normal) for each glycoprotein versus its average spectral count, showing the distribution of identified glycoproteins.
positioning lymphocytes within inflamed and malignant liver tissue (30). The 8900-Da C-terminal part of the V10 fragment of vitronectin was found at increased concentration in the sera of HCC patients as compared with the sera of patients with cirrhosis (31). All these results indicate that vitronectin needs to be studied further to establish its role and function in HCC. Some other glycoproteins that interact with vitronectin were also found at different concentrations in healthy and cancerous samples. It has been found that vitronectin associates with the γ-chain of fibrinogen during coagulation and may thereby modulate hemostasis and inflammation (32). In our results, significant up-regulation of the γ-B chain of fibrinogen was found, in accordance with the results reported by Comunale et al. (33). The increased level of complement factor B was also consistent with their results.
a Proteins were confirmed as glycoproteins by identification of N-glycopeptides in this study and previous large-scale studies of the human liver N-glycoproteome (21).
b The spectral count was the average of the spectral counts of three replicate LC-MS/MS runs.
c ND denotes that no mass spectra matched to this protein in this sample set.

Confirmation of Differential Level of IGFBP-3, S90K and TSP-1 in HCC Serum by ELISA-Apart from alpha-fetoprotein, which is currently the only clinically used biomarker for HCC, other glycoproteins with substantial expression changes were also found, e.g. insulin-like growth factor binding protein 3 (IGFBP-3), galectin-3 binding protein (S90K, or Mac-2BP), and thrombospondin 1 (TSP-1). To confirm the differential expression of galectin-3 binding protein, IGFBP-3, and TSP-1, other cohorts of serum samples from 58 HCC patients and 30 normal control subjects were analyzed with commercial ELISA kits. Serum levels of galectin-3 binding protein were significantly elevated in HCC patients (median = 4.49 μg/ml) in comparison with healthy individuals (median = 3.43 μg/ml) (p < 0.0001). The IGFBP-3 serum levels in HCC were significantly lower (median = 0.5526 μg/ml) than those in healthy controls (median = 2.143 μg/ml) (p = 0.0085). The TSP-1 serum levels in HCC were also lower (median = 0.7092 μg/ml) than those in healthy controls (median = 1.058 μg/ml) (p < 0.0001) (Fig. 6). All the ELISA results were in accordance with those obtained with the label-free quantitative glycoproteomics approach, which suggested that the developed approach was accurate and reliable in differential glycoprotein identification. In a preliminary assessment of these glycoproteins for the discrimination between HCC patients and healthy people, the diagnostic sensitivity was calculated: galectin-3 binding protein, 72%; IGFBP-3, 98%; TSP-1, 78% (Fig. 7).
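For readers less familiar with the term, diagnostic sensitivity at a fixed cutoff is the fraction of patients classified as positive. The sketch below shows the calculation for a marker that decreases in disease, as IGFBP-3 and TSP-1 do here. The concentrations and the cutoff are hypothetical; the text does not state how the cutoffs behind Fig. 7 were chosen.

```python
def sensitivity(patient_values, cutoff, elevated_in_disease=True):
    """Fraction of patients classified positive at the given cutoff.

    A sample is positive when it lies above the cutoff for a marker that
    rises in disease, or below the cutoff for a marker that falls.
    """
    if elevated_in_disease:
        positives = sum(v > cutoff for v in patient_values)
    else:
        positives = sum(v < cutoff for v in patient_values)
    return positives / len(patient_values)


def specificity(control_values, cutoff, elevated_in_disease=True):
    """Fraction of controls classified negative at the same cutoff."""
    return 1.0 - sensitivity(control_values, cutoff, elevated_in_disease)


# Hypothetical serum concentrations (ug/ml) for a marker that decreases in disease.
hcc = [0.4, 0.6, 0.5, 1.9, 0.7]
controls = [2.1, 1.8, 2.4, 2.0, 1.2]
cutoff = 1.5  # hypothetical decision threshold

sens = sensitivity(hcc, cutoff, elevated_in_disease=False)
spec = specificity(controls, cutoff, elevated_in_disease=False)
```

With these made-up values, four of the five patients and one of the five controls fall below the cutoff, which illustrates the kind of overlap between groups that limits single-marker specificity.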

DISCUSSION
Although great technical advances have been made in the past few years in measuring global protein expression with mass spectrometry-based quantitative proteomics, there are still unresolved issues in the high-throughput and accurate quantification of proteins with modifications such as glycosylation. Glycoprotein quantification with lectin-based approaches has been very popular in the discovery of serum glycoprotein biomarkers, as lectins can recognize certain types of glycan linkages and glycan structures, which may reveal distinct glycosylation pattern changes in human cancers. Unlike lectin-based approaches, the hydrazide chemistry-based approach captures oxidized glycoproteins onto solid beads via covalent linkage. Although the glycan information is lost during periodate oxidation, the efficiency and selectivity of glycoprotein extraction from complex samples such as serum can be increased. In the hydrazide chemistry approach, glycoproteins were traditionally identified and quantified with the deglycosylated peptides released by PNGase F. Recent studies reported that the intensities of glycopeptides in mass spectrometry were not consistent with protein abundance, which indicates that the quantification of glycopeptides is only a relative reflection of the site-specific N-glycosylation occupancy. Generally, most current quantitative glycoproteomics approaches quantify the change in glycosylation rather than in protein expression. Since most assay methods in disease diagnosis measure protein expression, effective methods to measure the actual amount of glycoprotein in clinical samples such as serum are needed. We developed this glycoprotein capture-based label-free approach, which can quantify serum glycoproteins in an accurate and high-throughput way. The performance of the developed approach has been demonstrated with the control experiments described in the results section.
As most secreted glycoproteins are glycosylated in the ER and Golgi apparatus and then secreted through the plasma membrane into the blood, it can be assumed that the total amount of such a protein in serum equals the amount of its glycosylated form. Because the developed approach efficiently captures both N-linked and O-linked glycoproteins at the protein level, this method actually quantifies the change in protein expression rather than the change in glycosylation. In this study, the quantitative results for most of the identified differential glycoproteins were in accordance with those reported with traditional methods. Moreover, the relative changes of three glycoproteins, IGFBP-3, galectin-3 binding protein, and TSP-1, were validated by ELISA and were in accordance with those obtained with our quantitative proteomics approach. Although these differential glycoproteins were identified, their levels in HCC and normal samples still overlap. As most of the marker candidates found share this problem of overlap, the specificity could be increased by combining multiple proteins. Another possible way to increase the specificity of cancer diagnosis is to combine changes in protein expression with changes in glycosylation pattern. For example, the specificity of AFP can be increased by using lectins such as Lens culinaris agglutinin to detect fucosylation changes (34). Accordingly, the glycosylation patterns of the identified differential glycoproteins could be analyzed with lectin-based approaches to see whether the overlap can be reduced.
Estimates from the year 2000 indicate that ~564,000 new cases of HCC occur worldwide per year. HCC remains the fifth most common malignancy in men and the eighth in women worldwide (35). Early detection of HCC is of great importance for improving the prognosis as therapeutic options increase. AFP is currently the only clinical marker that is widely used for the serological diagnosis of human HCC (36). However, its specificity is low, especially in chronic liver diseases and cirrhosis, and its sensitivity is only 60 to 80% at a cutoff value of 20 μg/L in serum (37). As there is an urgent need for another noninvasive, reliable method for detecting HCC as early as possible, the application of high-throughput screening technologies such as proteomics is becoming the trend in biomarker discovery.
In this study, the developed method was applied to differential glycoprotein screening between normal people and HCC patients. The three validated differential glycoproteins have already been found to be associated with the generation and development of human cancer. The IGF signaling pathway is regulated by a six-member family of IGF binding proteins (IGFBP-1 to -6). Among them, IGFBP-3 has the highest affinity for IGF-1 and is the most abundant IGFBP family member in the circulation. IGFs are synthesized mainly in the liver and also in peripheral tissues (including the prostate). IGFBP-3 can sequester the ligand, preventing IGF-1-induced IGF-R1 autophosphorylation and signaling. Furthermore, IGFBP-3 has been shown to exhibit effects on proliferation, migration, and apoptosis that are independent of its involvement in IGF signaling (38, 39). IGFBP-3 was found to be inversely associated with the risk of benign prostate hyperplasia, although the findings were statistically significant only for men with severe symptoms (40). Epidemiologic studies have also suggested that high circulating levels of IGF-1 and low levels of IGFBP-3 are associated with an increased risk of premenopausal breast cancer (41, 42) and lung cancer (43). In liver cancer, only one report has shown that, of 12 human HCCs, nine had reduced expression of IGFBP-3 (44).
Galectin-3-binding protein (lectin, galactoside-binding soluble, 3-binding protein), also named Mac-2BP or S90K, is a large oligomeric glycoprotein first identified as a tumor-associated antigen in breast cancer (45). Mac-2BP binds to multiple proteins and molecules that mediate cell-matrix and cell-cell adhesion, such as collagens, fibronectin, and nidogen, which are critical during tumor cell invasion and migration (46). In the immune system, Mac-2BP stimulates the activity of natural killer cells and lymphokine-activated killer cells, partly through the induction of IL-2. In noninflammatory cells, Mac-2BP was identified as an IL-6 stimulatory factor in BMSC and a critical mediator of bone invasion in metastatic neuroblastoma (47). Mac-2BP is elevated and correlates with prognosis in patients with a variety of cancers, including breast, nasopharyngeal, and lung cancer (48-50). More recently, in defining a molecular signature of potential prognostic value for Ewing's sarcoma (EWS), Mac-2BP was identified and confirmed as the most important predictor of event-free survival and overall survival. Cells with forced expression of Mac-2BP showed a reduced ability to form colonies and to migrate, as well as higher cellular aggregation. In addition, cells with higher expression of Mac-2BP displayed stronger adhesion to vitronectin but reduced adhesion to laminin and collagen IV, the two major components of the basal lamina. This implies that, by favoring cell-cell aggregation and inhibiting cell adhesion to the basal lamina, Mac-2BP may keep EWS cells in the blood circulation, where they are more prone to die, and prevent EWS cell adhesion to the endothelium and extravasation (51). In liver diseases, serum 90K/Mac-2BP was found to be an independent predictor of disease severity during HCV infection (52). In another report, serum Mac-2BP was evaluated in 11 patients with chronic active hepatitis, 48 with liver cirrhosis, and 36 with hepatocellular carcinoma. At a cut-off point of 14 μg/ml (100% specificity in 50 controls), increasing positivity was observed from chronic active hepatitis to cirrhosis and then to HCC (27, 50, and 78%, respectively) (53).
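A cut-off with 100% specificity in controls, as quoted from that report, can be illustrated with a short sketch. One simple way to obtain such a cut-off is to take the highest control value and count how many samples in each disease group exceed it. All values below are hypothetical, and taking the maximum of the controls is an assumption made for illustration; the rule used in the cited report is not given here.

```python
def cutoff_at_full_specificity(control_values):
    """Lowest cutoff at which no control exceeds the threshold."""
    return max(control_values)


def positivity(values, cutoff):
    """Fraction of samples strictly above the cutoff."""
    return sum(v > cutoff for v in values) / len(values)


# Hypothetical serum marker concentrations (ug/ml).
controls = [6.0, 9.5, 12.0, 14.0]
groups = {
    "chronic active hepatitis": [8.0, 15.0, 10.0, 13.0],
    "cirrhosis": [16.0, 12.0, 20.0, 11.0],
    "HCC": [22.0, 18.0, 13.0, 30.0],
}

cutoff = cutoff_at_full_specificity(controls)
rates = {name: positivity(vals, cutoff) for name, vals in groups.items()}
```

Fixing the cutoff on the controls first keeps the specificity at 100% by construction, so the positivity rates of the disease groups can be compared directly.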
Thrombospondin-1 (TSP-1) is a large matricellular glycoprotein secreted by many cell types. It is a component of the extracellular matrix during active and subacute processes. TSP-1 has various functions in processes such as hemostasis, angiogenesis, tumor cell biology, wound healing, and arterial remodeling (54). It has been found that TSP-1 is an endogenous inhibitor of tumor growth and angiogenesis through direct effects on endothelial cell migration and survival and through indirect effects on growth factor mobilization. TSP-1 present in the tumor microenvironment also acts to suppress tumor cell growth through activation of transforming growth factor β (TGF-β) in those tumor cells that are responsive to TGF-β (55). TSP-1 has been shown to be highly expressed in human malignant tissues and present at higher than normal levels in the plasma of patients with cancers (56) such as breast, lung, and colorectal carcinoma (57), which is in contrast with our result, suggesting a possibly unique feature of hepatocellular carcinoma.
In summary, an effective approach for serum glycoprotein quantification was developed and applied to biomarker discovery for HCC diagnosis. Thirty-eight glycoproteins with abundance changes were identified, including AFP, the clinical marker for HCC. The concentration changes of three glycoproteins, i.e. insulin-like growth factor-binding protein 3, galectin-3-binding protein, and thrombospondin-1, were confirmed by ELISA with another sample collection and demonstrated good specificity for the discrimination between HCC patients and healthy people. However, it should be noted that the aim of this study was to establish a method for high-throughput screening of differential glycoproteins in human serum, and the developed method has been shown to be accurate and reliable. The clinical applicability of these proteins could be examined by further validation with samples from patients with liver diseases such as cirrhosis and with early-stage HCC. This approach can also be applied to biomarker discovery for other diseases.