Glycoproteomic Analysis of Prostate Cancer Tissues by SWATH Mass Spectrometry Discovers N-acylethanolamine Acid Amidase and Protein Tyrosine Kinase 7 as Signatures for Tumor Aggressiveness

The identification of biomarkers indicating the level of aggressiveness of prostate cancer (PCa) will address the urgent clinical need to minimize the general overtreatment of patients with non-aggressive PCa, who account for the majority of PCa cases. Here, we isolated formerly N-linked glycopeptides from normal prostate (n = 10) and from non-aggressive (n = 24), aggressive (n = 16), and metastatic (n = 25) PCa tumor tissues and analyzed the samples using SWATH mass spectrometry, an emerging data-independent acquisition method that generates a single file containing fragment ion spectra of all ionized species of a sample. The resulting datasets were searched using a targeted data analysis strategy in which an a priori spectral reference library representing known N-glycosites of the human proteome was used to identify groups of signals in the SWATH mass spectrometry data. On average we identified 1430 N-glycosites from each sample. Out of those, 220 glycoproteins showed significant quantitative changes associated with diverse biological processes involved in PCa aggressiveness and metastasis and indicated functional relationships. Two glycoproteins, N-acylethanolamine acid amidase and protein tyrosine kinase 7, that were significantly associated with aggressive PCa in the initial sample cohort were further validated in an independent set of patient tissues using tissue microarray analysis. The results suggest that N-acylethanolamine acid amidase and protein tyrosine kinase 7 may be used as potential tissue biomarkers to avoid overtreatment of non-aggressive PCa.

Prostate cancer (PCa) is the most common noncutaneous cancer and the second leading cause of cancer-related death in men in the United States (1). Most diagnosed cases represent slow-growing, non-lethal forms of cancer. Unfortunately, neither the currently available diagnostic biomarkers, such as serum prostate-specific antigen (PSA), nor histological examination of (biopsied) tumor tissue can distinguish aggressive (AG) PCa from non-aggressive (NAG) PCa.
This situation leads to the undertreatment of AG PCa and, more important, the overtreatment of NAG PCa (2). In fact, up to 90% of men with PCa harbor localized tumors that are unlikely to cause significant symptoms or mortality. Of these, many are overtreated because of a lack of clear (molecular) indicators that guide physicians to the appropriate treatment. All available treatment options, including surgery, radiation therapy, and hormonal therapy, carry a risk of complications and show a range of side effects that impact the patient's long-term quality of life. There is, therefore, a pressing clinical need to identify new PCa biomarkers in clinical tissue or blood that distinguish AG from NAG prostate tumors.
PCa tissue samples (e.g. those obtained from biopsies) are routinely subjected to histopathological examination, and the results are reported by a Gleason score, a grading score ranging from 2 to 10 that is calculated by adding the score of the predominant grade pattern and that of the second most common grade pattern in a specific sample. The Gleason score helps guide patient treatment, but sometimes it fails to do so sufficiently because it cannot be used to distinguish significant molecular heterogeneities of PCa and a range of clinical trajectories (3). For example, the clinical outcome is unpredictable for most Gleason 7 PCas (4,5). Molecular-level phenotyping has been proposed as a means to develop a more highly resolving scoring system capable of correctly classifying clinically important PCa types. In principle, clinical samples can be phenotyped by different types of measurements (e.g. genomic (6), epigenomic (7), transcriptomic (3), metabolomic (8), and proteomic (5)). To date, transcript profiling has been used most extensively, mainly because of the relatively advanced maturity and accessibility of the respective measurement techniques (9). However, proteomic measurements should be equally or more informative, because proteins are more dynamic and diverse and more directly reflective of cellular physiology than nucleic-acid-based markers (10). Moreover, PSA and other approved protein markers (11) exemplify the potential information contents of proteins.
The glycoproteome represents a subproteome that is particularly relevant for clinical research because glycoproteins are usually found on the cell surface or secreted by tissues and are more likely to be detected in the blood stream as non-invasive biomarkers (12-17). In fact, all current blood tumor biomarkers, including PSA in the case of PCa, that are approved by the U.S. Food and Drug Administration are glycoproteins (14). We previously developed a protocol for the solid phase extraction of glycopeptides (SPEG) to robustly isolate the glycoproteome based on chemical immobilization and enzymatic release of N-linked glycopeptides with high specificity (18,19) that was thereafter successfully applied in different cancer biomarker discovery studies (18, 20-25).
In this study we used SPEG to profile the N-glycoproteome of PCa histotypes to identify glycoproteins associated with tumor aggressiveness. The isolated de-N-glycopeptide samples from prostate tissues were analyzed via the recently developed SWATH mass spectrometry (SWATH-MS) technology. SWATH-MS is a data-independent acquisition method (26) that essentially allows one to convert all the peptides ionized from a clinical sample into a perpetually reusable digital map (27). When combined with a targeted data analysis strategy, SWATH-MS was demonstrated to achieve the favorable accuracy, dynamic range, and reproducibility of selected reaction monitoring (SRM), the gold-standard quantitative proteomic technology, while greatly extending the degree of multiplexing to thousands of peptides (26,28,29). We have recently demonstrated that the combination of SWATH-MS and de-N-glycopeptide isolation has promising quantitative performance for biomarker verification in human plasma (28). Here we establish that this integrated technology is also highly efficient for "molecular phenotyping" of tissue specimens because, once acquired, the quantitative data files representing control and disease-affected human tissues support iterative in silico biomarker discovery. To facilitate the targeted analysis of SWATH maps, we generated a spectral library covering a large part of the human N-glycoproteome, specifically optimized for SWATH-MS analysis. This library will also provide the community with a high-quality set of reference assays for future MS analyses of the global human N-glycoproteome and for related clinical applications. Furthermore, our SWATH dataset led to the identification of regulated proteins and pathways that might serve a predictive role in discriminating AG and NAG PCas.
Clinical Samples-Samples and clinical information were obtained with informed consent, and procedures were performed with the approval of the Institutional Review Board of the Johns Hopkins University. NAG and AG primary prostate tumors were collected via radical prostatectomy or transurethral resection of the prostate at Johns Hopkins Hospital and Johns Hopkins Bayview Medical Center under the National Cancer Institute-funded Johns Hopkins prostate cancer SPORE project. The NAG PCa group included 22 tumor specimens with Gleason scores of 6 and 2 tumor specimens from tumors with Gleason scores of 7 with no evidence of recurrence in up to 15 years of follow-up. The AG PCa group included 11 tumor specimens with Gleason scores of 8 or 9 and 5 tumor specimens with Gleason scores of 7 from patients who either died of cancer metastasis within 6 years of surgery or were positive for metastatic tumor at the time of surgery (supplemental Table S1). The 25 metastatic tumors were from men who died of PCa and underwent autopsy as part of the Project to Eliminate Lethal Prostate Cancer rapid-autopsy program of the Johns Hopkins Autopsy Study of Lethal Prostate Cancer, initiated in 1994. All subjects underwent androgen deprivation during the course of their treatment. The 10 normal prostate tissues were from healthy transplant donors who died from accidents or suicide. The primary prostate tumor tissues were immediately frozen after resection from surgery. The normal prostate tissues were immediately frozen after resections from transplant donors. The metastatic tumor tissues were acquired via rapid autopsy (a few hours to a day after death). All specimens were snap-frozen, embedded in optimal cutting temperature compound, and stored at −80°C until use.
Isolation of de-N-glycopeptides and Sample Preparation-Frozen prostate tissues embedded in optimal cutting temperature compound were sectioned and stained with hematoxylin and eosin (H&E). The H&E staining was used to guide cryostat microdissection for enrichment of the tumor content of tissue. After cryostat microdissection, 6-μm-thick tissue sections for each specimen were collected in sterile screw-cap bullet tubes. Proteins were extracted using cell lysis buffer (50 mM Tris, pH 8.0, 150 mM NaCl, 0.1% SDS, 0.5% sodium deoxycholate, 1% Triton X-100). BCA assay was performed, and 100 μg of total protein mass per specimen was used to extract formerly N-linked glycopeptides via the SPEG procedure as described previously (20,22). Briefly, the proteins were alkylated and digested into peptides, which were cleaned up by means of C18 chromatography prior to SPEG. The peptides were treated with sodium periodate to oxidize the glycan moieties of glycopeptides and purified by G-10 gel filtration cartridges (Nest Group Inc., Southborough, MA). The sample was then conjugated to Affi-gel Hydrazine resin (Bio-Rad) overnight. The unbound peptides were removed through an extensive washing procedure. N-linked glycopeptides were released by PNGase F. Finally, de-N-glycopeptides were used for downstream MS analysis. To generate a SWATH spectral library and to quantify the de-N-glycopeptides from each tissue group, equal amounts of peptide samples from each tissue group were pooled together and analyzed via LC-MS/MS. In addition, small-scale sample pools (from five individual samples each) were generated for non-aggressive, aggressive, and metastatic prostate tumors. Eleven retention time anchor peptides (iRT peptides, Biognosys AG, Zurich, Switzerland (30)) were added into each sample at a ratio of 1:30 v/v.
For each SWATH analysis, equal amounts of sample (estimated to be roughly 1 μg of total peptide mass) derived from pooled tissue were analyzed so that a meaningful comparison could be achieved across groups.
SWATH-MS Measurement-SWATH-MS datasets (or SWATH maps) were acquired using an AB Sciex 5600 TripleTOF mass spectrometer (Concord, Ontario, Canada) interfaced to an Eksigent NanoLC Ultra 2D Plus HPLC system (Dublin, CA) as previously described (26,28,29). Peptides were directly injected onto a 20-cm PicoFrit emitter (New Objective, self-packed to 20 cm with Magic C18 AQ 3-μm, 200-Å material) and then separated using a 120-min gradient of 2% to 35% buffer (buffer A: 0.1% (v/v) formic acid, 2% (v/v) acetonitrile; buffer B: 0.1% (v/v) formic acid, 90% (v/v) acetonitrile) at a flow rate of 300 nl/min. In SWATH-MS mode, the instrument was specifically tuned to optimize the quadrupole settings for precursor ion selection windows 25 m/z wide. Using an isolation width of 26 m/z (containing 1 m/z for the window overlap), a set of 32 overlapping windows was constructed covering the precursor mass range of 400-1200 m/z. The effective isolation windows can be considered as 399.5-424.5, 424.5-449.5, etc. SWATH MS2 spectra were collected from 100 to 2000 m/z. The collision energy was optimized for each window according to the calculation for a charge 2+ ion centered upon the window with a spread of 15 eV. An accumulation time (dwell time) of 100 ms was used for all fragment-ion scans in high-sensitivity mode, and for each SWATH-MS cycle a survey scan in high-resolution mode was also acquired for 100 ms, resulting in a duty cycle of ~3.4 s.
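The window scheme and cycle time described above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the 1 m/z overlap is split evenly between neighboring windows, which is one reading that yields the stated effective windows (399.5-424.5, 424.5-449.5, etc.). The nominal cycle time it computes (32 fragment-ion scans plus one survey scan at 100 ms each) is 3.3 s; the remaining difference from the reported ~3.4 s is presumably instrument overhead, which is not modeled here.

```python
def swath_windows(first_lower=399.0, n_windows=32, isolation_width=26.0, overlap=1.0):
    """Build overlapping isolation windows and their effective ranges.

    Assumption: adjacent windows share `overlap` m/z, split evenly between
    them, so each effective window is isolation_width - overlap wide.
    """
    step = isolation_width - overlap  # 25 m/z between window starts
    windows = []
    for i in range(n_windows):
        lower = first_lower + i * step
        upper = lower + isolation_width
        effective = (lower + overlap / 2, upper - overlap / 2)
        windows.append({"isolation": (lower, upper), "effective": effective})
    return windows


def nominal_cycle_time(n_windows=32, ms2_accumulation_s=0.1, ms1_accumulation_s=0.1):
    """Sum of accumulation times per cycle, ignoring any instrument overhead."""
    return n_windows * ms2_accumulation_s + ms1_accumulation_s


if __name__ == "__main__":
    windows = swath_windows()
    print(windows[0]["effective"], windows[1]["effective"])
    print(f"nominal cycle time: {nominal_cycle_time():.1f} s")
```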
Shotgun Measurement of Isolated de-N-glycopeptides from PCa Samples-For shotgun acquisition, peptides from subpools of each PCa group were first measured on an Orbitrap XL (Thermo Scientific) in collision-induced dissociation mode to check the sample quality, and then de-N-glycopeptides from all four tissue groups were pooled equally as a super-mixture and analyzed using classical shotgun data acquisition on a TripleTOF 5600 instrument via four injection replicates. For measurements on the Orbitrap XL, a 90-min gradient was used for each sample using the acquisition method published previously (31). For shotgun MS/MS on the TripleTOF, the same chromatographic system and settings as described above for SWATH-MS were used. MS1 spectra were collected in the range of 360-1460 m/z for 250 ms. The 20 most intense precursors with charge states of 2 to 5 that exceeded 250 counts per second were selected for fragmentation, and MS2 spectra were collected in the range of 50-2000 m/z for 100 ms. The precursor ions were dynamically excluded from reselection for 20 s.
Shotgun Measurement of Synthetic Peptides for the Generation of an N-glycoprotein SWATHatlas-We previously published an SRM assay library for 2007 human N-glycosylated proteins (N-glycoprotein SRMAtlas) for targeted proteomic analysis (31). In that work the SRM assays were generated mainly by SRM-triggered MS2 acquisition on a QTrap instrument. For this study, all human synthetic peptides from the SRM assay library (31) were re-acquired for spectral library generation, but this time using the shotgun mode on the 5600 mass spectrometer. Basically, the peptide selection sources were the N-glycosites identified in large discovery-driven MS-based experiments in diverse human tissues, cell lines, and plasma and the N-glycosites that were selected from the UniProt database (13,31), with additional peptide targets from recent shotgun datasets (28), yielding an identified protein list with a total of 2460 glycoproteins with the high sensitivity of the 5600 mass spectrometer. Peptides were synthesized using SPOT-synthesis technology (JPT Peptide Tech, Berlin, Germany) (32). About 800 peptides were mixed together with iRT peptides for each separate shotgun analysis using the 5600 instrument, with the same LC-MS settings described above. Shotgun measurements were repeated for certain plate-pools to maximize peptide coverage.
Spectral Library Generation-Profile-mode .wiff files from shotgun data acquisition were converted to mzML files in centroided format using AB Sciex Data Converter v.1.3 (default parameter) and then further converted to mzXML files using MSConvert v.3.04.238. The MS2 spectra were queried against the canonical Swiss-Prot complete proteome database for human (November 2012) appended with common contaminants, iRT peptide sequences, and the corresponding reversed sequence decoys (33) (40,951 protein sequences including decoys). The SEQUEST database search (34) through Sorcerer PE version 4.2 included the following criteria: trypsin as digestion enzyme; semi-tryptic peptides and peptides with up to two missed cleavages were allowed; static modifications of 57.02146 Da for cysteines; variable modifications of 15.99491 Da for methionine oxidations; and variable modifications of 0.98406 Da for asparagines (formerly N-glycosylated asparagines are converted to aspartic acids upon PNGase F treatment). The mass tolerances of the monoisotopic parent and fragment ions were set as 50 ppm. The identified peptides were processed and analyzed using Trans-Proteomic Pipeline 4.5.2 (35), and search results were validated using the PeptideProphet score (36). N-glycosylation motif information was used in PeptideProphet. For the synthetic peptides, the database for shotgun searching was generated by concatenating peptides plus scrambled pseudo protein sequences (31). The SEQUEST database searching parameters were identical to those for the shotgun analysis of the biological sample, except for the specifications of fully tryptic digestion and up to one missed cleavage. All the peptides were filtered at a false discovery rate (FDR) of 1%, as estimated by PeptideProphet at the peptide spectrum match (PSM) level (33). Peptides mapped to redundant protein identities were excluded for SWATH-MS quantification at the protein level.
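To make the role of the N-glycosylation motif concrete, the sketch below checks whether a deamidated asparagine (the +0.98406 Da modification listed above) lies within the canonical N-X-S/T sequon, where X is any residue except proline. This is an illustrative Python sketch under stated assumptions: the function names are hypothetical, and it does not reproduce how PeptideProphet actually weights motif information. It also inspects only the peptide sequence, so a sequon whose serine or threonine lies beyond the peptide's C terminus would be missed.

```python
import re

# Mass shift for Asn -> Asp conversion after PNGase F release (value from the text).
DEAMIDATION_DELTA_DA = 0.98406

# Canonical sequon N-X-[S/T] with X != P; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(peptide):
    """Return 0-based indices of Asn residues that start an N-X-S/T sequon."""
    return [match.start() for match in SEQUON.finditer(peptide)]


def is_candidate_glycosite(peptide, deamidated_positions):
    """True if at least one deamidated Asn (0-based index) sits in a sequon."""
    sequons = set(sequon_positions(peptide))
    return any(pos in sequons for pos in deamidated_positions)


if __name__ == "__main__":
    # Hypothetical peptide sequences, for illustration only.
    print(sequon_positions("AANGTK"))             # Asn at index 2 starts N-G-T
    print(is_candidate_glycosite("AANGTK", [2]))
    print(is_candidate_glycosite("ANPSK", [1]))   # proline at X breaks the motif
```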
The raw spectral libraries were generated from all valid PSMs in shotgun experiments of both natural samples and synthetic peptides and then refined into the nonredundant consensus libraries (29) using SpectraST (37). To generate the spectral library for SWATH-MS, the highest priority was given to the shotgun spectra acquired from the shotgun analysis on the TripleTOF 5600, because those fragment ion spectra most closely resembled those generated in SWATH-MS. This means that the spectra of unique shotgun identifications from the Orbitrap XL analyses were only accepted if the corresponding peptides were not identified with the TripleTOF. For each peptide, the retention time was mapped into the iRT space (30) with reference to a linear calibration constructed for each shotgun run, as previously described (29). The MS assays constructed from the top six most intense fragments with a Q3 range from 400 to 1200 m/z, excluding those falling in the precursor SWATH window, were used for targeted data analysis of SWATH maps.
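The fragment selection rule in the last sentence can be sketched as a simple filter. The Python example below is a minimal illustration, not the library-building code used in the study: the function name, the input layout (a list of `(m/z, intensity)` pairs), and the example values are all hypothetical, and window boundaries are treated as inclusive by assumption.

```python
def select_assay_fragments(fragments, precursor_window,
                           q3_range=(400.0, 1200.0), top_n=6):
    """Pick the most intense fragments eligible for a SWATH assay.

    fragments: iterable of (mz, intensity) pairs from a consensus spectrum.
    precursor_window: (lower, upper) m/z of the SWATH window that isolates
        the precursor; fragments inside it are excluded.
    """
    window_lo, window_hi = precursor_window
    eligible = [
        (mz, intensity)
        for mz, intensity in fragments
        if q3_range[0] <= mz <= q3_range[1]
        and not (window_lo <= mz <= window_hi)
    ]
    # Rank by intensity, highest first, and keep the top N.
    eligible.sort(key=lambda fragment: fragment[1], reverse=True)
    return eligible[:top_n]


if __name__ == "__main__":
    # Hypothetical consensus spectrum for a precursor in the 624.5-649.5 window.
    spectrum = [
        (350.2, 900.0), (450.2, 100.0), (500.3, 700.0), (630.3, 800.0),
        (700.4, 600.0), (800.4, 500.0), (900.5, 400.0), (1000.5, 300.0),
        (1100.6, 200.0), (1300.7, 950.0),
    ]
    for mz, intensity in select_assay_fragments(spectrum, (624.5, 649.5)):
        print(mz, intensity)
```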
Targeted Data Analysis for SWATH Maps-SWATH-MS .wiff files were first converted to mzXML profile using ProteoWizard (38). As described previously, the SWATH targeted data analysis was carried out using OpenSWATH (39) running on an internal computing cluster. OpenSWATH automatically integrates peak group extraction and a decoy scoring system using mProphet (40) to estimate FDR. Based on the generated spectral libraries, OpenSWATH identified the peak groups from the SWATH maps at FDR = 1% and aligned them between SWATH maps from different samples based on the clustering behaviors of the retention time in each run. Specifically, features were considered for alignment based on a nonlinear alignment algorithm (41) with a maximum FDR quality of 0.25 (quality cutoff to still consider a feature for alignment) and/or the further constraint of a retention time difference of less than 60 s in LC gradient after iRT normalization.
Statistics and Functional Annotation-The peak intensities of unique peptides were reported by OpenSWATH for label-free quantification. First, a simple global normalization based on the total intensity was done for each sample (42). Hierarchical clustering analysis was done at the N-glycosite level using Cluster 3.0 on the log-transformed, two-dimensional centered and normalized peptide intensities (43). The hierarchical clustering analysis result was visualized with Treeview (44). Next, to quantify the protein abundances across samples, we summed up the most abundant identified peptides for each protein (top three if more than three peptides were identified). This proxy allows one to reliably estimate global protein abundance changes, as shown in previous studies (23,28,41,45). Peptides identified and aligned in <80% of samples were discarded at this step for quantitative glycoproteomic profiling. Principal component analysis was performed by R software using protein intensities. The candidates between PCa subtypes were prioritized by analysis of variance executed on Pomelo II (46), which also reported the differential glycoprotein list between AG and NAG groups. The GO cellular component classification was done by the BINGO 2.44 plugin in Cytoscape (47,48). The annotation of biological pathways and functional processes was done using the DAVID bioinformatics resource (49), and the enrichment analysis was performed by taking the entire N-glycoprotein SWATHatlas as background. The protein list in each category and the enrichment p values were then downloaded, manually compiled, and visualized by Cytoscape (48). SignalP 4.1 (50) and TMHMM 2.0 prediction (51) were used to predict the existence of signal peptides and transmembrane helices in a protein sequence for their possible involvement in classical secretion pathways or in integral membrane structures, with these protein features visualized by Protter (52).
The receiver operating characteristic (ROC) curve analysis was done by PanelComposer (53), with values of the area under the ROC curve provided for individual proteins and the combined panel using a logistic regression model.
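The quantification steps described in this section (total-intensity normalization, the 80% completeness filter, and top-three peptide summation) can be sketched as follows. This Python example is only an illustration under stated assumptions: the function names and data layout are hypothetical, each sample is scaled to the mean total intensity across samples (the text does not specify the scaling target), and the top three peptides are chosen per sample rather than fixed across samples.

```python
from collections import defaultdict


def total_intensity_normalize(samples):
    """Scale each sample (dict: peptide -> intensity) to the mean total intensity."""
    totals = [sum(sample.values()) for sample in samples]
    target = sum(totals) / len(totals)
    return [
        {pep: value * target / total for pep, value in sample.items()}
        for sample, total in zip(samples, totals)
    ]


def filter_by_completeness(samples, min_fraction=0.8):
    """Drop peptides quantified in fewer than min_fraction of the samples."""
    counts = defaultdict(int)
    for sample in samples:
        for pep in sample:
            counts[pep] += 1
    keep = {pep for pep, n in counts.items() if n / len(samples) >= min_fraction}
    return [{p: v for p, v in sample.items() if p in keep} for sample in samples]


def top_n_protein_abundance(sample, peptide_to_protein, n=3):
    """Sum the n most intense peptides per protein (all peptides if fewer than n)."""
    by_protein = defaultdict(list)
    for pep, value in sample.items():
        by_protein[peptide_to_protein[pep]].append(value)
    return {
        protein: sum(sorted(values, reverse=True)[:n])
        for protein, values in by_protein.items()
    }


if __name__ == "__main__":
    # Hypothetical peptides a-e mapped to hypothetical proteins P1 and P2.
    mapping = {"a": "P1", "b": "P1", "c": "P1", "d": "P1", "e": "P2"}
    sample = {"a": 10.0, "b": 5.0, "c": 3.0, "d": 1.0, "e": 7.0}
    print(top_n_protein_abundance(sample, mapping))
```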
Network Analysis of the Relationship between Protein Sets and Public Genomic Data-The Reactome Functional Interaction Network (RFIN) (54,55) was used to investigate the functional relationships between regulated glycoproteins and altered genes. We manually compiled a list of 351 altered genes from seven genomic, transcriptomic, and epigenomic studies of prostate cancer (6, 7, 56-60). These genes were found to be significantly mutated, bi-allelically mutated, epigenetically silenced, or recurrently fused or to have insertions and deletions in the previous studies. The statistical significance of the functional relationships between regulated glycoproteins and altered genes was assessed in RFIN and its 100 random instances. These random instances in the RFIN were created using a "switching algorithm" (61) in which the interaction partners of the nodes are randomized. The resulting network is of the same size and degree distribution as the original network, and it preserves the degree of each node. We used the switching algorithm implemented in the Random Network Plugin for Cytoscape (55), and the network graphs were visualized in Cytoscape (48).
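The idea behind degree-preserving randomization by edge switching can be illustrated with a short sketch. The Python code below is a generic double-edge-swap routine written for illustration; it is not the Random Network Plugin implementation, and the function names, the number of swaps, and the toy network are assumptions. It expects an undirected edge list without self-loops or duplicate edges.

```python
import random


def switch_randomize(edges, n_swaps, seed=None):
    """Randomize an undirected network by repeated double-edge swaps.

    Each accepted swap rewires (a-b, c-d) into (a-d, c-b), so every node
    keeps its degree. Swaps that would create a self-loop or a duplicate
    edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(edge) for edge in edges]
    edge_set = {frozenset(edge) for edge in edge_list}
    swaps_done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against networks with few valid swaps
    while swaps_done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # shared node: the swap would create a self-loop
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        if new_1 in edge_set or new_2 in edge_set:
            continue  # would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        swaps_done += 1
    return edge_list


def degrees(edges):
    """Count how many edges touch each node."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts


if __name__ == "__main__":
    ring = [(i, (i + 1) % 10) for i in range(10)]  # toy 10-node ring network
    shuffled = switch_randomize(ring, n_swaps=20, seed=1)
    print(degrees(ring) == degrees(shuffled))
```

Repeating such a randomization many times (100 instances in the analysis above) gives a null distribution against which the observed connectivity can be compared.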
Immunoassay Measurements of PSA in Clinical Specimens-Proteins used for immunoassay measurements of PSA from tissue specimens were the same protein extracts used for glycopeptide isolation. After protein extracts had been adjusted to 1 μg/ml with PBS, tissue PSA levels were measured with an Access® Hybritech PSA assay (Beckman Coulter, Inc., Brea, CA) in a laboratory certified under the Clinical Laboratory Improvement Amendments at Johns Hopkins Medical Institutions.
Immunohistochemical Staining and Scoring-Staining was performed on formalin-fixed paraffin-embedded prostate tissue slides from six individuals with primary prostate tumors, as well as on a 336-core prostate tissue microarray (TMA). The TMA contained 56 cases of primary PCa. Each case included four cores of the tumor regions and two cores of the matching adjacent normal prostate tissues. IHC staining was performed as previously described (62,63). Briefly, slides with sections of tissues or TMA were deparaffinized and rehydrated. Tissue slides were incubated in antigen retrieval buffer at 92°C to 95°C for 10 min. Tissues were blocked by peroxidase block and 3% BSA/PBS for 30 min each followed by 2.5% horse serum blocking for 15 min at room temperature. The tissues were then incubated with a monoclonal mouse anti-NAAA or monoclonal mouse anti-PTK7 primary antibody in antibody dilution buffer at 50 μg/ml or 10 μg/ml, respectively, followed by incubation with anti-mouse antibody labeled polymer-HRP for 30 min each. The staining was detected with DAB chromogen. The intensity of NAAA and PTK7 was visually graded by a board-certified pathologist as 0 (no staining), 1 (weak staining), 2 (medium staining), or 3 (strong staining) in the epithelial compartments. A total of 56 cases (tumors and normal tissues) were scored. The Gleason scores of the tumor cores included in the TMA are listed in supplemental Table S2. IHC score differences were calculated by subtracting the IHC score of the matching adjacent normal cores from the tumor cores. The Wilcoxon-Mann-Whitney rank sum test was used to calculate the statistical significance between normal and tumor tissues, as well as that between tumors with Gleason scores less than or equal to 3 + 4 and tumors with Gleason scores greater than or equal to 4 + 3.
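The score difference and the rank-sum statistic can be illustrated with a small sketch. The Python code below is not the analysis script used in the study: the function names and the example scores are hypothetical, and averaging the four tumor cores and the two normal cores of a case before subtraction is an assumption, because the text does not state how multiple cores were combined. Only the Mann-Whitney U statistic is computed; deriving a p value (with tie correction) is left to a statistics package.

```python
def ihc_score_difference(tumor_core_scores, normal_core_scores):
    """Mean tumor score minus mean matched-normal score for one case.

    Assumption: cores of the same type are averaged before subtraction.
    """
    tumor_mean = sum(tumor_core_scores) / len(tumor_core_scores)
    normal_mean = sum(normal_core_scores) / len(normal_core_scores)
    return tumor_mean - normal_mean


def mann_whitney_u(x, y):
    """U statistic for group x: pairs with x_i > y_j count 1, ties count 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


if __name__ == "__main__":
    # Hypothetical staining grades (0-3) for illustration only.
    print(ihc_score_difference([3, 2, 3, 2], [1, 0]))
    high_grade = [1, 2, 3]
    low_grade = [0, 0, 1]
    print(mann_whitney_u(high_grade, low_grade))
```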
Data Availability-The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (64) with the dataset identifier PXD000704.

Identification of Glycoproteomic Signals in SWATH Maps of PCa Tumor Tissues-The experimental scheme of the present study is shown in Fig. 1. It principally combined SPEG isolation of de-N-glycopeptides from selected tissue specimens with SWATH-MS to discover glycoprotein candidates; the candidates were subsequently validated by means of TMA analysis. Overall, in this study we analyzed de-N-glycopeptide samples from 10 cases of normal prostate tissue, 24 cases of NAG PCa tissue, 16 cases of AG PCa tissue, and 25 cases of metastatic tumor samples. The level of aggressiveness was defined based on the Gleason score and the evidence of recurrence within up to 15 years (see "Experimental Procedures"). All PCa samples selected had a high tumor cell percentage (~70%; supplemental Table S1). To gain sufficient starting material for the glycoproteomic analysis, and to focus on the general difference between normal prostate and the three different PCa types, samples were initially analyzed by equally pooling all samples from the same tissue group for the determination of a quantitative expression ratio between different tissue groups, based on total intensity normalization. Additional samples pooled with a smaller sample size from each tissue group were then analyzed to determine the p values of each glycoprotein between groups (supplemental Table S1).
The targeted identification of peptides in SWATH-MS datasets (or SWATH maps) requires a priori generation of a spectral library that includes essential coordinates such as precursor ion masses, fragment ion masses, fragment ion intensities, and retention times for each targeted peptide (27). We thus first generated a spectral library through shotgun sequencing of the isolated de-N-glycopeptides from the same prostate samples. Glycopeptide pools were analyzed using classical shotgun data acquisition on Orbitrap XL and TripleTOF 5600 instruments (see "Experimental Procedures"). A total of 1919 N-glycosites assigned to 548 distinct glycoproteins were identified in all the shotgun experiments (see data supplement and supplemental Tables S3 and S4 for annotated MS2 spectra and identification summary on peptide and protein levels). We then used these fragment ion spectra to build a spectral library. Using this library, we identified and quantified 882 to 1296 N-glycosites in the SWATH maps generated from the clinical samples at an estimated FDR of 1% (Fig. 2A).
To augment the spectral library with coordinates of N-glycosites that might be present in the PCa samples but not detected via the data-dependent acquisition (DDA) of de-N-glycopeptides from PCa tissues, in part because of the issue of stochastic precursor-ion selection in DDA (27,65,66), we built a more comprehensive spectral library by generating fragment ion spectra for all synthetic peptides from the human N-glycoprotein SRMAtlas (31) using a 5600 mass spectrometer operated in DDA mode. This resulted in reference fragment ion spectra under conditions that mimicked those encountered in SWATH-MS, thus eliminating instrument bias (26). These data resulted in the generation of a synthetic consensus spectral library containing 5422 N-glycosites of 2460 human glycoproteins, covering nearly 50% of the human proteins that are annotated in the UniProt database as glycoproteins. This N-glycoprotein SWATHatlas library is the de facto model 5600 version of N-glycoAtlas. The coordinates that constitute a definitive mass spectrometric assay for each identified N-glycosite are provided in EXCEL format for future applications in supplemental Table S5. Intriguingly, by using the spectral library augmented with the spectra from synthetic peptides, we were able to detect 218 to 414 additional N-glycosites from the individual PCa SWATH maps, compared with the peptides identified using the tissue DDA library alone (Fig. 2A). With these approximately 30% additional peptide identifications, the augmented library achieved significantly deeper analysis of the clinical samples than the library generated from tissue DDA data only, demonstrating that SWATH-MS analysis consistently identified and quantified N-glycosites that were difficult to detect with DDA analysis of the same samples (27,29). These data also illustrate the opportunity offered by SWATH-MS to iteratively reanalyze the same digital datasets with improved or alternative spectral libraries.

FIG. 1. Experimental design. Formerly N-glycosylated peptides of well-characterized PCa tissue samples were isolated via solid phase extraction of glycopeptides (SPEG) and subjected to SWATH-MS. A spectral library for the targeted identification and quantification of specific N-glycosites from the SWATH maps was generated via shotgun sequencing of the de-N-glycopeptides from clinical samples and synthetic reference peptides. The in-house-developed OpenSWATH software was used to identify and quantify peaks in SWATH maps, and quantification was followed by bioinformatic analysis and immunohistochemistry (IHC)-based tissue microarray validation.

After combining the data from the tissue-derived and synthetic peptide libraries, we identified 2188 N-glycosites corresponding to 897 N-glycoproteins from all SWATH maps (supplemental Table S6) and detected on average 1430 N-glycosites in each sample, presenting a much deeper and more consistent glycoproteomic survey than previous studies (22,24,66).
Quantitative Profiling by SWATH-MS Reveals Distinct N-glycoproteomic Signatures Associated with PCa Aggressiveness-To assess the technical quality of the glycoproteomic data with respect to reproducibility and quantitative accuracy, we compared datasets from the same samples acquired at different time points and validated some data points with an orthogonal method. Pooled N-glycosites from normal tissue (N_exp1 & 2) were analyzed 3 months apart via SWATH-MS. The integrated peak areas of the N-glycosites detected in the repeat analyses highly correlated with each other (r = 0.919) (Fig. 2B and supplemental Fig. S1). To validate the quantification accuracy of SWATH-MS with an orthogonal method, we measured the PSA levels of all individual tissue specimens using an Access® Hybritech PSA assay and compared the values to the relative peak intensities of PSA in respective glycoproteomic SWATH maps. Fig. 2C indicates that ELISA results and SWATH signals were strongly concordant (r = 0.973). We conclude from these data that the quantitative values obtained via SWATH-MS accurately reflect real glycoprotein abundance changes in the clinical samples tested.
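For readers who wish to reproduce this kind of reproducibility check on their own data, the correlation coefficient between two sets of paired peak areas can be computed as in the following Python sketch. The function name and the example values are hypothetical, and whether the reported r values were computed on raw or log-transformed intensities is not stated in the text, so the choice of scale is left to the user.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical peak areas of the same N-glycosites in two repeat runs.
    run_1 = [1200.0, 540.0, 9800.0, 310.0, 4500.0]
    run_2 = [1100.0, 600.0, 9100.0, 280.0, 5000.0]
    print(round(pearson_r(run_1, run_2), 3))
```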
To analyze the correlation of the glycoproteome patterns of three different types of PCa and normal prostate tissues, we performed an unsupervised hierarchical clustering analysis of the samples. In total, 1057 N-glycosites were quantified in at least 8 out of 10 SWATH maps (Fig. 2B). The cluster graph shows that samples of the same subtype clustered together tightly. Moreover, we noticed that the overall glycoproteomes were markedly distinct among different PCa subtypes, with the most significant alteration observed between metastatic and nonmetastatic tumors. In contrast, the separation of the AG and NAG groups was moderate. Principal component analysis of the four samples of AG PCa and the three samples of NAG PCa consistently demonstrated that the AG and NAG groups could be separated by two principal components (Fig. 2D). Overall, these results position the dataset generated in this study as a valuable resource for further analysis of the glycoproteome as an indicator of the clinical behavior of various PCa subtypes.
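The completeness criterion used before clustering (quantified in at least 8 of 10 SWATH maps) can be expressed as a simple filter. The sketch below is illustrative only; the function name, the site labels, and the use of `None` for a missing value are assumptions, not part of the published workflow.

```python
def filter_by_completeness(intensities, min_fraction=0.8):
    """Keep features quantified in at least `min_fraction` of the runs.

    `intensities` maps a feature name to a list with one entry per run;
    None marks a run in which the feature was not quantified.
    """
    kept = {}
    for feature, values in intensities.items():
        observed = sum(v is not None for v in values)
        if values and observed / len(values) >= min_fraction:
            kept[feature] = values
    return kept

# Hypothetical N-glycosite labels and peak areas across 10 SWATH maps.
example = {
    "site_A": [5.1, 4.9, None, 5.3, 5.0, 5.2, None, 4.8, 5.1, 5.0],   # 8 of 10
    "site_B": [2.0, None, None, 2.2, None, 2.1, 1.9, 2.0, 2.3, 2.1],  # 7 of 10
    "site_C": [7.4, 7.5, 7.3, 7.6, 7.2, 7.5, 7.4, 7.3, 7.6, 7.5],     # 10 of 10
}
kept = filter_by_completeness(example)
print(sorted(kept))
```

Only the features that pass this filter would then enter clustering or principal component analysis, which keeps missing values from driving the grouping.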

FIG. 2. Quantitative profiling of tissue N-glycoproteome between different PCa groups.
A, numbers of N-glycosites identified from SWATH maps. Note that identification of de-N-glycopeptides using the SWATHatlas library from synthetic reference peptides increased the number of identified peptides by approximately 30%. B, hierarchical clustering analysis of 1057 N-glycosites quantified among >80% of the samples. C, accuracy of SWATH-MS quantification illustrated with PSA, which was also measured by tissue ELISA. D, principal component analysis of the NAG and AG cases tested.

Identification of Specific Glycoproteins as Candidates Distinguishing Non-aggressive, Aggressive, and Metastatic PCa-We employed analysis of variance to associate the expression patterns of specific glycoproteins with the normal, AG, NAG, and metastatic groups. Of the consistently quantified proteins, 220 proteins were differentially expressed across the groups with high significance (Table I and supplemental Table S7). Among these, 50 proteins were significantly altered between the AG and NAG groups. Extensive literature searching uncovered evidence that 125 (i.e. 56.8%) of the 220 significantly altered proteins are directly related to prostate cancer or the level of aggressiveness and metastatic potential of other human cancers (supplemental Table S7). Two-thirds of the proteins (i.e. 142 out of 220) showed clear positive staining in >25% of the prostate cancer tissue specimens documented in the Human Protein Atlas database (supplemental Table S7) (67). Moreover, the list contains clinically applied PCa biomarkers, including PSA and prostatic acid phosphatase (68,69).
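For readers less familiar with analysis of variance, the statistic computed per glycoprotein is the ratio of between-group to within-group variance. The following minimal sketch computes a one-way F statistic for a single hypothetical protein across four groups; the abundance values are invented for illustration, and the exact model and significance threshold used in the study are not restated here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical log2 abundances of one glycoprotein in four tissue groups.
normal = [10.1, 10.4, 9.9]
nag = [10.8, 11.0, 10.6]
ag = [11.9, 12.2, 12.0]
metastatic = [13.1, 12.8, 13.3]

f_stat, df_b, df_w = one_way_anova_f([normal, nag, ag, metastatic])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
# A p-value is the upper tail of the F distribution, for example
# scipy.stats.f.sf(f_stat, df_b, df_w) if SciPy is available.
```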
Differentially expressed tissue glycoproteins are particularly interesting as biomarker candidates because of the high likelihood of their detectability in blood plasma. Using a bioinformatic approach, we assessed the potential detectability of our candidates in blood plasma. The program SignalP predicted 145 of the 220 (65.9%) proteins as classical secretory proteins (50), and TMHMM predicted 124 of 220 (56.4%) proteins as transmembrane (51). In all, 199 of 220 proteins (90.5%) were predicted as secretory or transmembrane proteins (supplemental Table S7). These values predict a high likelihood of detecting these proteins remotely in the blood stream, and the probability is even greater than that determined from secretome studies in which conditioned medium of cultured cancer cells was analyzed (70,71). Indeed, our list covered 25 out of 54 PCa biomarker candidates that were discovered from a previous glycoproteomic analysis of a phosphatase and tensin homolog conditional knockout mouse model (supplemental Table S7) (23). Even more intriguingly, 17 of these 25 glycoproteins were found to be detectable in human serum via SRM (Table I) (23). These include seven proteins (i.e. metalloproteinase inhibitor 1, attractin, asporin, cell adhesion molecule 1, biotinidase, hypoxia up-regulated protein 1, and neural cell adhesion molecule 1) that were finally verified as blood biomarkers predictive for diagnosis or grading of PCa (23). Furthermore, serum cell adhesion molecule 1, cathepsin F, and periostin were found in another study as prognostic markers for survival when serum from 57 metastatic castration-resistant prostate cancer patients was studied (72). A direct comparison of these 220 proteins to the human plasma PeptideAtlas database (73) shows that 125 proteins were previously observed in human plasma in large-scale shotgun experiments.
Accordingly, >75% of the proteins mapped to PeptideAtlas are expressed at a concentration range of <100 ng/ml in human plasma (supplemental Fig. S2), indicating the strong relevance of this list for biomarker discovery, as many tissue leakage products are found in blood at this concentration level (16,28). Table I lists representative biomarkers that were reported as diagnostic and prognostic biomarkers according to the literature, with a focus on the proteins differentially expressed between the AG and NAG groups and their detectability in human blood.
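The SignalP and TMHMM counts reported above can be cross-checked with a short inclusion-exclusion calculation. The sketch below assumes that the 199 proteins are exactly the union of the two prediction sets; under that assumption it derives how many proteins were predicted by both tools and how many by neither, figures that are implied by, but not stated in, the text.

```python
TOTAL = 220          # significantly altered glycoproteins
SECRETORY = 145      # predicted as classical secretory proteins (SignalP)
TRANSMEMBRANE = 124  # predicted as transmembrane proteins (TMHMM)
EITHER = 199         # predicted as secretory or transmembrane

def percent(count, total=TOTAL):
    """Percentage of the altered protein set, rounded to one decimal."""
    return round(100 * count / total, 1)

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both = SECRETORY + TRANSMEMBRANE - EITHER
neither = TOTAL - EITHER

print(percent(SECRETORY), percent(TRANSMEMBRANE), percent(EITHER))
print(both, neither)
```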
Annotation of Glycoproteins Significantly Changed in PCa Subtypes with Altered Pathways-We next investigated biological pathways and processes of the 220 significantly altered glycoproteins by functional GO annotation. The cellular component distribution (Fig. 3A) agreed with the prediction results above, as the majority of proteins were annotated to reside either in the plasma membrane (45.4%) or in the extracellular compartment (36.3%). The GO biological processes and PANTHER pathways associated with the altered protein set were displayed in a network with corresponding proteins (Fig. 3B) so that we could manually remove the redundant GO items annotated by the same proteins. Taking the N-glycoprotein SWATHatlas as a comparison background, we found several functionally interrelated processes enriched as clusters. Notably, one cluster was related to lysosome-based proteolysis, and a second was associated with the cell adhesion process (p < 0.01 in all the processes; Fig. 3B). The processes of protein maturation by peptide cleavage and acute inflammatory response were also enriched. These interlaced biological processes might be helpful in synoptically delineating the molecular variability of PCa subtypes.

Reactome Network Analysis Indicates That Glycoproteomic Regulation Is Closely Connected to PCa Genomic Events-Cancer proteomic studies sometimes primarily reflect inflammation processes and acute phase responses that coincide with the primary disease, owing to the difficulty of achieving substantial analytical depth (74). Therefore, to explore whether the regulated glycoproteins identified in this study were directly associated with PCa, we related the altered protein set to the genes found altered in previous PCa genomic studies on a functional level (6, 7, 56-60). We utilized RFIN to investigate functional relationships between regulated glycoproteins and altered genes. Of the 351 altered genes, 184 were found in the RFIN. Using the RFIN network and 100 randomized instances of the same size and degree distribution, we determined that the sets of altered genes and the 220 differentially expressed glycoproteins (from analysis of variance) were significantly interconnected (p = 0.0099; supplemental Fig. S3). Further, the altered glycoproteins between AG and NAG groups were also found to be functionally related to the altered genes known in PCas, such as P53, PI3K, and AR mutations (p = 0.021; Fig. 4). The RFIN result supports the notion that our tissue glycoproteomic investigation was rather deep and correlates well with genomic events commonly encountered in PCa with statistical significance (supplemental Fig. S4), although future studies are needed to establish a causative role between proteomic and genomic observations.

Tissue Microarray Verification of NAAA and PTK7 as Signatures of PCa Aggressiveness-We next pursued TMA analysis to further validate the biomarkers of PCa aggressiveness. We selected NAAA and PTK7 in this phase based on the following considerations: 1. In the differential N-glycosite analysis described above, NAAA and PTK7 stood out as proteins with significantly altered expression between NAG and AG tumors. NAAA expression was significantly higher in the NAG group, whereas PTK7 was higher in AG cases (Table I and supplemental Fig. S5). 2. Both proteins showed prevalent positive staining (>90%) in PCa specimens according to the Human Protein Atlas (67). 3. Both were predicted as secretory proteins (supplemental Fig. S5). 4. To the best of our knowledge, they have not been tested as tissue markers for PCa. We thus carried out preliminary IHC staining on six cases of PCa tissues. For both proteins, staining was primarily observed in the epithelial compartment, and increased expression of NAAA and PTK7 was detected in the tumor epithelium relative to normal epithelium (supplemental Fig. S6). In addition, whereas NAAA showed medium to strong staining in Gleason grade 3 tumors (Figs. 5A and 5B), the staining was rather faint in Gleason grade 4 tumors. In contrast, PTK7 staining was more intense in Gleason grade 4 tumors than in Gleason grade 3 tumors (Figs. 5C-5E).
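As a note on the network randomization test in the Reactome analysis above: a common way to turn a set of randomized networks into a p-value is the add-one empirical estimator, (k + 1)/(n + 1), where k is the number of randomized instances scoring at least as high as the observed network. With n = 100 the smallest attainable value is 1/101, or about 0.0099, which coincides with the first reported value. Whether the authors used this estimator is an assumption; the sketch below only illustrates the calculation with placeholder scores.

```python
def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the add-one correction, (k + 1) / (n + 1).

    k counts the null (randomized) values that are at least as large as the
    observed statistic; n is the number of randomizations.
    """
    k = sum(v >= observed for v in null_values)
    return (k + 1) / (len(null_values) + 1)

# Placeholder interconnection scores for 100 randomized networks (values 0 to 6).
null_scores = [float(i % 7) for i in range(100)]

# Hypothetical observed score that none of the randomized networks reaches.
p = empirical_p_value(10.0, null_scores)
print(round(p, 4))  # 0.0099
```

The add-one correction keeps the estimate from ever being exactly zero, so the resolution of the test is bounded by the number of randomizations.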
To further evaluate NAAA and PTK7 expression in prostate tissues, TMA analysis was performed with 56 prostate adenocarcinoma cases. Each case contained four cores of tumor and two cores of patient-matched adjacent normal tissue (336 cores in total; supplemental Table S2). Among the 56 cases, 21 (37.5%) had a Gleason score of 3 + 3; 4 (7.1%) had a Gleason score of 3 + 4; 10 (17.9%) had a Gleason score of 4 + 3; 11 (19.6%) had a Gleason score of 4 + 4; 4 (7.1%) had a Gleason score of 4 + 5; 5 (8.9%) had a Gleason score of 5 + 4; and 1 (1.8%) had a Gleason score of 5 + 5. The prostate tumor cases were classified into two categories, with 25 cases (44.6%) of Gleason 3 + 4 or less and 31 cases (55.4%) of Gleason 4 + 3 and above (Table II). NAAA and PTK7 staining intensities were then evaluated and scored by a board-certified pathologist in the epithelial compartment. We first confirmed the different expression of these two proteins between the matching adjacent normal tissue and tumor tissues. Figs. 5F and 5G show histograms of NAAA and PTK7 IHC score differences; 66.1% and 55.4% of the cases showed an increase in NAAA and PTK7 staining, respectively, in tumors relative to normal tissues (Wilcoxon-Mann-Whitney rank sum test; p < 0.001 for both proteins). Most important, whereas the majority (75%) of the adjacent normal tissues stained negative or weakly for NAAA, 54.8% of the tumors (Gleason score ≥ 4 + 3) and 88% of the tumors (Gleason score ≤ 3 + 4) stained with medium to strong intensity. In contrast, 94.6% of the adjacent normal tissues stained negatively or weakly for PTK7, whereas 28% of the tumors (Gleason score ≤ 3 + 4) and 38.7% of the tumors (Gleason score ≥ 4 + 3) stained with medium to strong intensities (Table II). Relative to expression in tumors with a Gleason score equal to or less than 3 + 4, NAAA expression was significantly decreased and PTK7 expression was significantly increased (p < 0.05; Fig. 5H) in tumors with a Gleason score equal to or more than 4 + 3. From the ROC analysis based on TMA staining intensities in the tumor specimens, we found that values of the area under the ROC curve of 0.743 and 0.709 were achieved by NAAA and PTK7 expression, respectively, both with statistical significance. Interestingly, a combined panel of these two proteins showed greater predictive power (area under the curve = 0.801) than each of them alone, indicating a potential power of the two proteins combined to discriminate AG and NAG.

DISCUSSION

The molecular mechanisms of systemic diseases such as cancer are still poorly understood. Currently in clinics, disease symptoms are recorded, for example, through x-ray images, computed tomography scans, and bio-fluid test results. More accurate records of high fidelity, particularly those indicating molecular-level differences between healthy and disease-affected cells, might be helpful in understanding disease biology and for directing optimal treatment. Because proteomic technologies have lagged behind genomic technologies in terms of throughput, sensitivity, and reproducibility, most molecular analyses of disease tissues have been carried out at the genomic or transcriptomic level, even though it is widely acknowledged that proteomic measurements reflect the functional state of a tissue more closely than genomic patterns.
In this study we used an emerging proteomic method, SWATH-MS, to quantify the N-glycoproteome of a sample cohort consisting of normal tissue as well as non-aggressive, aggressive, and metastatic PCa tissue samples with the intent to identify proteins capable of orchestrating and distinguishing these groups. In contrast to the more widely used DDA methods, in which specific precursor ions are selected from the pool of available precursors, in SWATH-MS and other data-independent acquisition methods, all the precursor ions generated by the mass spectrometer from a particular sample are fragmented, and the fragment ions are recorded in the form of convoluted composite fragment ion spectral maps. In essence, SWATH-MS acquires a complete and permanent digital record for all the detectable components of a sample (26,27) and is therefore an appealing technique for the analysis of unique, nonrenewable clinical samples (e.g. biopsied tissues). In this study, SWATH-MS was applied to the analysis of glycoproteins because this subproteome was assumed to be enriched for potential biomarkers (13,16,17). The data demonstrate that the combination of SPEG and SWATH-MS achieved a high degree of reproducibility for quantifying glycoproteins (Fig. 2) and correlated well with clinical quantitation of PSA. This is consistent with our previous observation that SWATH-MS allows equal variability (CV < 20%) and similar accuracy relative to SRM (28).

Because SWATH-MS generates composite fragment ion spectra of multiple precursor ions that are concurrently fragmented, we implemented a targeted data analysis strategy to reliably identify and quantify specific peptides from the acquired datasets. The strategy requires the generation of a high-quality spectral library as prior information for the quantification of peptides from SWATH-MS datasets (27). In essence, an ensemble of fragment ion coordinates, including the m/z ratio of fragment ions, their relative signal intensity, the elution patterns of the fragment ion signals, etc., is extracted from the spectral library and used to confidently identify a target peptide in the SWATH-MS map using a dedicated search tool such as OpenSWATH (39). To support this type of data analysis for the N-glycoproteome, we generated an N-glycoprotein SWATHatlas derived from synthetic peptides as an easily transferable resource that will also support future targeted proteomic studies focused on the N-glycoproteome. The assays from the N-glycoprotein SWATHatlas provided an unprecedented chance to identify and quantify 5422 N-glycosites of 2460 glycoproteins that essentially cover ~50% of the annotated human N-glycoproteome. The use of these assays helped us to retrieve 30% more identifications in every SWATH map as an extra benefit. More important, this increase highlights the possibility of data re-mining in the one-time acquired SWATH maps when there are more MS assays available for novel protein targets. This data re-mining feature is a unique advantage of SWATH-MS over traditional DDA or SRM approaches.

E, representative NAAA and PTK7 staining in TMA with corresponding H&E staining. F, histogram of the difference in IHC score between tumor and its matched adjacent normal tissue for NAAA. G, histogram of the difference in IHC score between tumor and its matched adjacent normal tissue for PTK7. Wilcoxon signed-rank test (paired, two-sided) was performed for NAAA and PTK7 between tumors and matched adjacent normal tissues (p < 0.0001). H, a box plot was generated for NAAA and PTK7 between tumors with a Gleason score less than or equal to 3 + 4 and tumors with a Gleason score greater than or equal to 4 + 3. *p < 0.05; **p < 0.01. I, ROC analysis of NAAA and PTK7 based on their IHC staining intensities in the tumor specimens.
It should be noted that the addition of unique shotgun identifications from an instrument other than that used for the SWATH-MS analysis (e.g. Orbitrap XL-derived spectra added to a library generated on a 5600 instrument) is not ideal because the fragment ion spectra are less similar across instrument types than within instrument types (75). In this study, because of the limited amount of peptide available from the microdissected clinical samples, we nevertheless extended the 5600 instrument-derived spectral library with spectra generated on an Orbitrap XL instrument. The direct transferability for targeted proteomics of peptide fragmentation data across instrument platforms and the outcome were assessed in a recent study (75) in which Toprak et al. reported that TripleTOF shotgun peptide fragmentation provided the fragmentation most similar to that observed with SWATH-MS targeted data analysis, compared with other shotgun solutions (e.g. those provided by an Orbitrap instrument). Therefore, we again recommend that one acquire, when possible, instrument-specific libraries or specifically tune the fragmentation parameters of the MS instrument available.
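One simple way to quantify how similar two fragment ion spectra of the same peptide are, for example a library spectrum from one instrument type and a spectrum from another, is the normalized dot product of their relative fragment intensities. The sketch below illustrates that generic measure; it is not the scoring used by OpenSWATH or by the study cited above, and the fragment labels and intensities are hypothetical.

```python
import math

def normalized_dot_product(spectrum_a, spectrum_b):
    """Cosine similarity of two spectra given as {fragment label: intensity}.

    Fragments missing from one spectrum count as zero intensity there.
    Returns a value between 0 (no shared signal) and 1 (identical pattern).
    """
    labels = set(spectrum_a) | set(spectrum_b)
    dot = sum(spectrum_a.get(k, 0.0) * spectrum_b.get(k, 0.0) for k in labels)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical relative intensities of the same peptide's fragments
# recorded on two different instrument types.
library_instrument_1 = {"y4": 0.30, "y5": 1.00, "y6": 0.55, "b3": 0.20}
library_instrument_2 = {"y4": 0.70, "y5": 0.60, "y6": 0.15, "b3": 0.90}

similarity = normalized_dot_product(library_instrument_1, library_instrument_2)
print(f"similarity = {similarity:.2f}")
```

A value well below 1 for the same peptide would indicate that relative fragment intensities do not transfer cleanly between the two platforms, which is the concern raised above.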
We previously determined the limit of detection of SWATH-MS in human plasma N-glycoproteome at 5 ng/ml, a level close to the PSA concentration in blood of PCa patients (28). Here, substantial analytical depth of the tissue glycoproteome was also achieved in glycoproteomic SWATH analysis, which uncovered pathways (Fig. 3) associated with PCa progression including cell adhesion (76), integrin signaling (77), and proteolysis (78), indicating possible linkage between glycoproteomic regulation and common genomic mutations in PCa (Fig. 4). These results shed more light on the disease biology related to different PCa types, which might be interesting for future investigations. Although we conducted an extensive literature search for each putative biomarker candidate from the discovery phase (supplemental Table S7), defining the mechanism of their relationship with prostate cancer aggressiveness and metastasis is beyond the scope of this study. Nevertheless, our SWATH dataset overlapped well with already described PCa markers and, again, allows for the re-quantification and verification of newly discovered biomarkers or proteins of interest at any time in the future.
To summarize the above discussions, the combination of SPEG and SWATH-MS provides a reproducible, deep, and quantitative glycoproteomic reference that can be easily reexamined and therefore provides a significant advance for biomarker studies. The high specificity (75% to 90%) of SPEG was repeatedly tested in many previous studies (18-25). Here, according to our shotgun data, we found that only 2.6% of the identified de-N-glycopeptides were also detectable in a natural, un-deamidated form, demonstrating a practically high specificity of glycoproteomic survey by our combined method. In the instrument configuration for SWATH-MS used in this study, 1 to 3 μg of total peptide mass represented an optimal sample load. This translates into a need for about 10^5 to 10^6 cells for proteomic profiling and at least 10^7 cells for the glycoproteomic survey by SWATH analysis following SPEG enrichment (19,79). Thus, a more efficient protein extraction strategy and a specific SPEG protocol (80), both optimized for small amounts of clinical samples such as those derived from microdissected tissues or biopsied samples, make an important contribution toward the reproducible glycoproteomic phenotyping of clinical samples via SWATH-MS.
Two glycoproteins, NAAA and PTK7, were for the first time reported and verified as novel potential biomarkers for PCa aggressiveness. NAAA is a lysosomal enzyme that degrades bioactive fatty acid amides to their corresponding acids, with primary reference to the anti-inflammatory substance N-palmitoylethanolamine. NAAA also exhibits weak hydrolytic activity against ceramides N-lauroylsphingosine and N-palmitoylsphingosine (81). It has been reported that NAAA is present in cell lines derived from human blood cells (82). Wang and coworkers found that among human tissues, the prostate showed the highest NAAA mRNA level (83). In addition, IHC staining images from the Human Protein Atlas also showed stronger staining with NAAA antibody relative to all the other 19 types of human cancers (67). Moreover, Wang et al. also reported that NAAA is functionally active in PCa cells and is released as a secretory protein. Interestingly, they illustrated that the NAAA expression in LNCaP cells, a PCa cell line that is responsive to androgen stimulation, is higher than that in PC-3 and DU-145 cells, which are androgen-insensitive cells representing the androgen-refractory phase of advanced PCa (83). In this study, we showed that NAAA has a significantly lower expression in PCa with a Gleason score ≥ 4 + 3, which for the first time established NAAA not only as a potential PCa biomarker, but also as a promising signature for tumor aggressiveness. However, the physiological role of lower NAAA expression in high-grade PCa tissues is unclear and requires further studies.
PTK7, also known as colon carcinoma kinase-4, is an inactive tyrosine kinase involved in various processes. PTK7 is an essential component of the Wnt/planar cell polarity pathway that controls tissue polarity and cell movement. Unlike NAAA, PTK7 has been previously annotated as a cancer-associated protein.
In previous reports, the expression of PTK7 has been shown to be increased in leukemia cell lines (84), acute myeloid leukemia specimens (85), esophageal squamous cell carcinoma tissues (86), and colorectal carcinomas (87). Also, PTK7 was reported as a potential prognostic biomarker for a number of cancers, such as gastric cancer (88), lung adenocarcinoma (89), and triple-negative breast cancer treated with chemotherapy (90). In the case of gastric cancer, positive staining of PTK7 was noted in 114 of 201 tissue samples (88). However, until now PTK7 expression has not been reported to be associated with prostate cancer. We found that PTK7 was dysregulated in PCa tumors relative to the matched adjacent normal tissues. More important, PTK7 was up-regulated in aggressive PCa tumors relative to non-aggressive PCa. The exact functional role of PTK7 in PCa progression remains to be determined.
We observed a certain degree of synergy in applying these two markers for identifying NAG cases (Figs. 5H and 5I). It would therefore be very interesting to explore, in future studies, a biomarker panel based on these two markers to avoid overtreatment of NAG PCa. Because both NAAA and PTK7 were predicted as secretory proteins (Table I), we envision further investigations into their detectability in human plasma and their discriminating power between PCa cases in blood. Also, the relationship of their expression with PCa metastasis needs to be ascertained. TMA analyses (56 cases, 336 cores) were performed for NAAA and PTK7, and the expression of both proteins was significantly altered between AG and NAG samples. These data suggest that NAAA and PTK7 might be potential IHC markers for staging prostate cancer. However, larger-scale TMA analyses with different cohorts need to be done to validate their clinical utility, because of the practical difficulty of discriminating AG and NAG PCa (as we note in Fig. 2). All the newly identified markers for PCa aggressiveness have to be validated in prospective studies in the future.