Targeted Proteomics for Multiplexed Verification of Markers of Colorectal Tumorigenesis*

Targeted proteomic methods can accelerate the verification of multiple tumor marker candidates in large series of patient samples. We utilized the targeted approach known as selected/multiple reaction monitoring (S/MRM) to verify potential protein markers of colorectal adenoma identified by our group in previous transcriptomic and quantitative shotgun proteomic studies of a large cohort of precancerous colorectal lesions. We developed SRM assays to reproducibly detect and quantify 25 (62.5%) of the 40 selected proteins in an independent series of precancerous and cancerous tissue samples (19 adenoma/normal mucosa pairs; 17 adenocarcinoma/normal mucosa pairs). Twenty-three proteins were significantly up-regulated (n = 17) or down-regulated (n = 6) in adenomas and/or adenocarcinomas, as compared with normal mucosa (linear fold changes ≥ ±1.3, adjusted p value <0.05). Most changes were observed in both tumor types (up-regulation of ANP32A, ANXA3, SORD, LDHA, LCN2, NCL, S100A11, SERPINB5, CDV3, OLFM4, and REG4; down-regulation of ARF6 and PGM5), and a five-protein biomarker signature distinguished neoplastic tissue from normal mucosa with a maximum area under the receiver operating characteristic curve greater than 0.83. Other changes were specific for adenomas (PPA1 and PPA2 up-regulation; KCTD12 down-regulation) or adenocarcinomas (ANP32B, G6PD, RCN1, and SET up-regulation; AKR1B1, APEX1, and PPA1 down-regulation). Some changes correlated significantly with a few patient- or tumor-related phenotypes. Twenty-two (96%) of the 23 proteins have the potential to be released from the tumors into the bloodstream, and their detectability in plasma has been previously reported. The proteins identified in this study expand the pool of biomarker candidates that can be used to develop a standardized precolonoscopy blood test for the early detection of colorectal tumors.

Colorectal malignancies are a major driver of cancer-related morbidity and mortality in adults (1,2). Most of these cancers originate as benign, adenomatous lesions of the colorectal mucosa (3-5). Although they are noninvasive, colorectal adenomas already harbor genetic and epigenetic alterations that disrupt the homeostatic equilibrium between proliferation, differentiation, and apoptosis (particularly those involving components of the Wnt-APC-β-catenin pathway). With time, some adenomas acquire additional mutations that result in their transformation. Colorectal adenomas and early-stage adenocarcinomas can be eradicated with ease, and the prognosis in these cases is generally good (2,6,7).
Early detection is thus the best way to reduce the incidence of colorectal cancer (CRC). Major progress on this front has been hampered, however, by the limitations of available screening techniques (8-10). Colonoscopy offers high diagnostic accuracy and allows prompt removal of suspicious early-stage lesions, but endoscopic screening is invasive, costly, and time-consuming. Fecal assays have none of the latter drawbacks, but their specificity is low, and they are reportedly less effective in the detection of adenomas than of adenocarcinomas (10-12).
One of the most promising solutions for filling this diagnostic gap involves the use of targeted proteomic methods to identify and quantify tumor-associated proteins in body fluids.
Major ongoing advances in proteomics technology are providing researchers with methods that are more sensitive and reproducible than data-dependent techniques (13-16). Selected reaction monitoring (SRM), also known as multiple reaction monitoring (MRM), is a sensitive targeted proteomic technique for the reproducible quantification of specific proteins within a complex sample background (14,17,18). Importantly, in a single SRM experiment, numerous precursor/fragment ion pairs (i.e. transitions), and therefore several proteins, can be monitored across a relatively large number of samples. The resulting chromatographic trace provides information on the retention time and signal intensity per transition, which can be used to determine the relative or absolute abundance of several target peptides. If results are normalized to a defined amount of a stable isotope-labeled internal peptide, endogenous protein abundance can also be compared in various samples and/or under diverse test conditions (17,19,20).
Proteomic studies have already pinpointed several potential biomarker candidates for the detection of CRC. However, most studies stop short of verifying the candidates in a systematic and reproducible manner, an essential step for translating biomarker discoveries to clinical application (6,8,9,21). Verification is an arduous process, which is made even more difficult by the lack of appropriate tools for identifying and reproducibly quantifying levels of candidate proteins across multiple samples (6,21). Enzyme-linked immunosorbent assays (ELISAs) are still the preferred methods for preclinical verification of candidate biomarkers, but the antibodies used for these assays are characterized by high specificity requirements and elevated production costs, which render them less than ideal for testing large numbers of tumor biomarkers (19,22). Targeted proteomic techniques have been proposed as standard complementary methods for the clinical verification and validation of candidate disease biomarkers, and these methods are being used in an increasing number of clinical proteomic-based cancer studies (19,23-33).
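The normalization to a labeled internal standard described above can be illustrated with a minimal Python sketch. It assumes that transition peak areas have already been integrated and that light (endogenous) and heavy (labeled) areas are listed in the same transition order; the function names are hypothetical and are not taken from any software used in the study.

```python
def light_to_heavy_ratio(light_areas, heavy_areas):
    """Return the summed light/heavy peak-area ratio for one peptide.

    Both lists hold integrated peak areas for the same transitions,
    in the same order (an assumption of this sketch).
    """
    if len(light_areas) != len(heavy_areas):
        raise ValueError("light and heavy transitions must be paired")
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("heavy standard signal must be positive")
    return sum(light_areas) / heavy_total


def relative_abundance(ratio_sample_a, ratio_sample_b):
    """Compare two samples that received the same amount of heavy standard."""
    return ratio_sample_a / ratio_sample_b


# Made-up peak areas for three transitions of one peptide.
tumor_ratio = light_to_heavy_ratio([300.0, 600.0, 900.0], [200.0, 400.0, 600.0])
normal_ratio = light_to_heavy_ratio([100.0, 200.0, 300.0], [200.0, 400.0, 600.0])
fold_change = relative_abundance(tumor_ratio, normal_ratio)
```

Because every sample receives the same amount of heavy peptide, the ratio of two L/H values approximates the relative abundance of the endogenous peptide in the two samples.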
In a comprehensive shotgun proteomics study of human colorectal adenomas and patient-matched samples of normal colon mucosa, we recently identified several putative diagnostic markers representing tumor stages along the adenoma-adenocarcinoma pathway (34). In the present study, we developed targeted SRM assays (27) and used them to profile the expression of these candidate protein markers in a large, independent group of colorectal tumors, precancerous and invasive. The results revealed potential markers that significantly distinguished CRC tumors from healthy tissues, showed substantial correlation with clinical parameters and are apparently detectable in human plasma.

EXPERIMENTAL PROCEDURES
Human Tissue Samples-Tissue samples were used with local ethics committee approval and in conformity with the principles set forth in the Declaration of Helsinki. Human colorectal tissues were collected prospectively during colonoscopies performed at the Istituti Ospitalieri of Cremona, Italy. All donors provided written consent to sample collection, testing, and data publication. Samples were identified by numerical codes to protect donors' rights to confidentiality. The series comprised 19 adenomas and 17 adenocarcinomas, each with a matched sample of normal mucosa from the colon segment bearing the tumor (>2 cm from the lesion). The characteristics of the paired normal and tumor tissue samples are summarized in Table I. Immediately after collection, samples were frozen in liquid nitrogen and stored at −80°C.
Protein Extraction from Tissues-Rapidly weighed frozen tissue samples (weight range: 10 to 70 mg) were placed in solution (180 µl) containing 100 mM triethylammonium bicarbonate (Sigma, St. Louis, MO, USA), 1X Complete Ethylenediaminetetraacetic acid (EDTA)-free Protease Inhibitor Mixture (Roche, Mannheim, Germany), 1 M urea, 5 mM β-glycerophosphate disodium salt hydrate, 1 mM sodium orthovanadate, and 5 mM sodium fluoride (Sigma) and homogenized on ice in a Wheaton glass borosilicate grinder. The homogenate was transferred to a 1.5-ml Eppendorf tube, the grinder was washed with 60 µl of lysis buffer, and the combined volumes were sonicated with a Bioruptor (Diagenode, Denville, NJ) (high power, five cycles of 10 s ON/10 s OFF) and centrifuged (16,000 × g for 5 min at 4°C). The supernatant containing the tissue protein extract was collected and stored at −80°C.
Protein Digestion-For each sample, 50 µg of protein in denaturing buffer (initial reaction volume = 62.5 µl in 4 M urea) were reduced with 10 mM dithiothreitol (1.25 µl of a 500-mM stock solution) at 35°C for 45 min and alkylated with 50 mM iodoacetamide (6.4 µl of a 500-mM stock solution) at room temperature for 1 h in the dark. Dithiothreitol (50 mM: 7 µl of a 500-mM solution) was added to terminate the alkylation reaction, and the samples were incubated at room temperature for 10 min. Afterward, 5 µl of 0.1 µg/µl sequencing-grade endoproteinase Lys-C (Roche) was added to each sample and incubated at room temperature for 3 h. After Lys-C digestion, the samples were diluted with 50 mM ammonium bicarbonate to a final urea concentration of 1.4 M, and digested with 2.5 µl of 0.4 µg/µl sequencing-grade modified trypsin (Promega, Madison, WI) at 37°C for 15 h. To stop the digestion, we added acetonitrile and trifluoroacetic acid (TFA) to final concentrations of 3% acetonitrile and 0.1% TFA, giving a pH of 2-3. The pH of acidified samples was verified with pH-indicator strips (MColorpHast™, Merck KGaA, Darmstadt, Germany). Compared with in-solution trypsin digestion, serial proteolysis using Lys-C and trypsin increases the yield of fully cleaved peptides, thereby improving the accuracy and sensitivity of SRM protein quantification (35). Peptide solutions were desalted on SepPak C18 columns, lyophilized in a vacuum centrifuge, and resolubilized in 50 µl (1:1 v/w) of a solution of 3% acetonitrile, 0.1% formic acid, and indexed Retention Time (iRT) 10× Mix (Biognosys AG, Zurich, Switzerland) (36). The iRT mix was prepared in accordance with the manufacturer's protocol.
Stable Isotope-labeled Standard Peptides (Heavy Peptides) for SRM Assay Library-Isotope-labeled heavy peptides corresponding to the proteotypic peptides selected for the study (supplemental Table S1), and containing either a C-terminal (13C(6) 15N(4)) arginine or a (13C(6) 15N(2)) lysine residue were purchased from JPT Peptide Technologies GmbH (Berlin, Germany). A pool containing known quantities of each standard peptide in the sample matrix solution was subjected to nano-liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis carried out on an LTQ Orbitrap Velos (Thermo Fisher Scientific, Bremen, Germany). Mascot output .dat files were imported into Skyline (37) and used to generate the spectral library for SRM transitions.
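The reagent concentrations quoted for reduction and alkylation follow from simple dilution arithmetic, sketched below in Python. The sketch assumes that all volumes are in microliters and that each reagent is added to the cumulative reaction volume; the helper name is hypothetical.

```python
def final_concentration_mM(stock_mM, added_ul, volume_before_ul):
    """Concentration of a reagent immediately after it is added to the reaction."""
    return stock_mM * added_ul / (volume_before_ul + added_ul)


volume_ul = 62.5  # initial reaction volume stated in the text

# Reduction: 1.25 ul of 500 mM dithiothreitol.
dtt_mM = final_concentration_mM(500, 1.25, volume_ul)
volume_ul += 1.25

# Alkylation: 6.4 ul of 500 mM iodoacetamide.
iaa_mM = final_concentration_mM(500, 6.4, volume_ul)
volume_ul += 6.4

print(round(dtt_mM, 1), round(iaa_mM, 1))  # 9.8 45.6
```

Under these assumptions the computed values (about 9.8 mM and 45.6 mM) are close to the nominal 10 mM and 50 mM given in the protocol, which are presumably rounded.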
SRM Assay Development and Time-scheduled SRM Analysis-The SRM assay was developed on a TSQ Vantage Triple Quadrupole Mass Spectrometer (Thermo Fisher Scientific) using a sample mixture containing defined amounts of each isotope-labeled peptide and data from the spectral library. The spectral library consisted of transitions measured in data-independent mode in the sample mixture. A preliminary SRM-transition list for at least two peptides per protein was created from the spectral library in Skyline. For each peptide, we selected four to eight dominant transitions (precursor and most intense fragment ion pair) consisting of doubly and triply charged precursors (2+ and 3+ charge states) and y- and b-type fragment ions with charge states of 1+ and 2+. Peptides were analyzed on the TSQ Vantage in the SRM mode, which involves repeated cycling through a list of transitions, with predefined dwell times on each transition (17). Retention times extracted from the SRM scans were used to calculate iRT values (36) for each peptide in Skyline. SRM scans were searched to confirm the near absence of endogenous transitions in the crude peptide mix. We selected transitions with optimum SRM properties (14,17) and iRT values for defining time-scheduled SRM methods to monitor our target proteins. In time-scheduled SRM mode, the full cycle time is used to detect and quantify peptides expected to elute within a given retention-time window. This restricts the acquisition of defined transitions to a window around the elution time of the corresponding peptide. We optimized SRM parameters, including collision energy (CE), as previously described (38), dwell time, and crude peptide spike-in amount. Retention time scheduling for all transitions was selected based on Skyline scheduling predictions. A window of ±2.5 min was used to schedule all transitions. The final transition list consisted of 600 transitions.
To facilitate the detection of low-abundance peptides without compromising the dwell time of 0.020 s per transition, the 600 transitions were divided into two groups of 300 (based on protein name) and analyzed with two different time-scheduled SRM acquisition methods (Method 1, Method 2). Complete information on the two groups of transitions is provided in supplemental Tables S2 and S3. Two proteins (GAPDH and HSP90AA1) generally used as "housekeeping" proteins were included in our protein target list to explore their utility as global standards in colorectal tissues, but they were not used to normalize the intensities of target transitions. SRM transitions for these proteins and iRT peptides were monitored in each method.
Liquid Chromatography-Tandem Mass Spectrometry (LC-MS/MS)-Scheduled SRM measurements were performed on the TSQ Vantage instrument equipped with a nanoelectrospray ionization source and coupled to a NanoLC Ultra 2D HPLC system (Eksigent Technologies, Dublin, CA). The instrument was used in the SRM acquisition mode with the following settings: Q1 resolution, 0.7 FWHM; Q3 resolution, 0.7 FWHM; cycle time, 2 s. A spray voltage of 1.3 kV was used with an ion transfer tube heated to a temperature of 270°C. Argon was used as the collision gas at a nominal pressure of 1.5 mTorr. Chromatographic separation of peptides was performed on a frit column (150 mm × 75 µm) made in-house and coupled to a fused silica emitter (100 mm × 75 µm) (New Objective Inc., Woburn, MA). The columns were packed with reverse-phase C18 material (AQ, 3 µm, 200 Å, Bischoff GmbH, Leonberg, Germany) and maintained at 50°C with an automatic heater during all SRM experiments. Peptides were loaded onto the column from a cooled (4°C) Eksigent autosampler and separated with a linear gradient of acetonitrile/water containing 0.1% formic acid at a flow rate of 300 nl/min. An elution gradient of 5% to 35% acetonitrile over 60 min was used. Four microliters of sample, corresponding to 1.25 µg peptide mass, was injected. Blank runs were performed between SRM measurements of paired biological samples to avoid sample carryover.
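The trade-off between dwell time and the number of co-scheduled transitions can be made concrete with a short Python sketch. It assumes the 2-s cycle time given in the LC-MS/MS section, ignores any inter-transition overhead of the instrument, and uses made-up retention times; the function names are hypothetical.

```python
def max_concurrent_transitions(cycle_time_ms, dwell_time_ms):
    """Transitions that fit in one cycle if each gets the full dwell time."""
    return cycle_time_ms // dwell_time_ms


def peak_concurrency(retention_times_min, half_window_min):
    """Largest number of scheduling windows that overlap at any time."""
    events = []
    for rt in retention_times_min:
        events.append((rt - half_window_min, 1))   # window opens
        events.append((rt + half_window_min, -1))  # window closes
    # At equal times, process closings (-1) before openings (+1).
    events.sort(key=lambda event: (event[0], event[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


capacity = max_concurrent_transitions(cycle_time_ms=2000, dwell_time_ms=20)

# Hypothetical expected retention times (min), one entry per transition.
example_rts = [10.0, 11.0, 12.0, 20.0]
busiest = peak_concurrency(example_rts, half_window_min=2.5)
fits = busiest <= capacity
```

With a 20-ms dwell time and a 2-s cycle, about 100 transitions can be monitored concurrently, which is one way to see why splitting 600 transitions into two scheduled methods helps preserve the dwell time.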
Sample Randomization and Blocking-Our experimental design employed a block-randomized sample strategy (39). As shown in Fig. 1, the order in which the adenoma/normal mucosa and adenocarcinoma/normal mucosa pairs were analyzed was determined randomly, and balanced allocation in each group was maintained during both the sample preparation and SRM analysis phases. Proteins were extracted from four tissue samples at a time (i.e. 1 adenoma/normal mucosa pair and 1 adenocarcinoma/normal mucosa pair). For the digestion of the protein extracts, we divided the 36 sample pairs into three groups. One sample pair randomly selected from each group was digested twice to control for unwanted variations in peptide intensity at this step of sample preparation. As a result, the total number of samples for scheduled SRM analysis increased from 72 to 78. The order of spectral acquisition on the TSQ Vantage was also randomized. For each SRM method (Fig. 1), we created a sample queue using the random function in Microsoft Excel. Samples were randomized based on protein digestion group, protein extraction pairs, day of sample processing, and SRM data acquisition. Each patient sample was analyzed with both time-scheduled methods, thus producing 156 acquired data files. Sample analysis time was 270 h, including run times for iRT and blank controls. SRM data were statistically analyzed to detect differences between the adenoma and normal mucosa groups and between the adenocarcinoma and normal mucosa groups.
Data Analysis-SRM data were analyzed as outlined in Fig. 1. Peak group identification and peptide scoring were performed using mProphet, a computational tool for statistically validating SRM mass spectrometry data (40), in SpectroDive software (Version 5.5.5478.20997, Biognosys AG). Input files for this analysis consisted of the data files containing the SRM spectra acquired for each sample during the scheduled runs and the transition assay list detailing specified parameters such as Q1, Q3, CE, iRT score and relative fragment intensity (supplemental Table S4). SpectroDive processes SRM data using the assay list of target and heavy standard peptides and applies mProphet's extraction and scoring strategy. mProphet employs a "decoy-transition" approach to probabilistically score individual features in SRM peaks and integrates them into a combined discriminant score (Cscore) while controlling for the false discovery rate (FDR) (40). Decoy transitions are transitions for peptide species that are known to be absent in the biological sample; they therefore function as negative controls. We employed synthetic decoys because our workflow involved the use of isotope-labeled internal standards (supplemental Table S4). Peptides that were detected in at least 6 samples and had an FDR <1% corresponding to a Cscore >10 were further processed for quantification (supplemental Table S5).
Protein significance was analyzed with the MSstats R-package (MSstats.daily 2.1.6) in R statistical software (version 3.2.3) (41). The input file for this analysis contained values for Condition, Bioreplicate, and Run, as pre-assigned in Skyline according to the experimental design (supplemental Tables S6 and S7). Data were first processed in MSstats, and all transition intensities were transformed into log2 values. Next, normalization based on "equalizeMedians" (41) was performed on all transitions using the transition intensities of the isotope-labeled standard peptides. This equalized the median of standard intensities across runs and applied similar between-run shifts to endogenous peptide intensities in the experiment. Quantification of protein abundance and analysis of differential abundance in patient groups were performed with the linear mixed effect model for SRM workflows utilizing stable isotope-labeled standard peptides (41,42) to normalize the intensities of respective endogenous peptides. The resulting p values were corrected for multiple comparisons (41). Significant tumor-associated dysregulation was defined as differential expression in neoplastic (adenoma and/or adenocarcinoma) and normal mucosa characterized by an adjusted p value of <0.05 and a linear fold change ≥ ±1.3 (corresponding to a log2 fold change of approximately ±0.4).
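The block-randomized design can be sketched in Python as follows. This is an illustration only, not the procedure actually used (the queues in the study were generated in Microsoft Excel): the pair identifiers, the round-robin dealing into batches, and the choice to keep the two members of a pair adjacent in the queue are all assumptions of the sketch.

```python
import random


def block_randomize(adenoma_pairs, carcinoma_pairs, n_batches=3, seed=0):
    """Deal shuffled tissue pairs into digestion batches, balanced by tumor type."""
    rng = random.Random(seed)
    batches = [[] for _ in range(n_batches)]
    dealt = 0
    for group in (adenoma_pairs, carcinoma_pairs):
        shuffled = list(group)
        rng.shuffle(shuffled)
        for pair_id in shuffled:
            batches[dealt % n_batches].append(pair_id)
            dealt += 1
    # One randomly chosen pair per batch is digested twice.
    digests = []
    for batch_no, batch in enumerate(batches, start=1):
        duplicate = rng.choice(batch)
        for pair_id in batch:
            replicates = 2 if pair_id == duplicate else 1
            for rep in range(1, replicates + 1):
                digests.append((batch_no, pair_id, rep))
    return batches, digests


def acquisition_queue(digests, seed=0):
    """Randomize run order; both tissues of a pair stay adjacent."""
    rng = random.Random(seed)
    order = list(digests)
    rng.shuffle(order)
    queue = []
    for _, pair_id, rep in order:
        members = ["normal", "tumor"]
        rng.shuffle(members)
        queue.extend((pair_id, tissue, rep) for tissue in members)
    return queue


adenomas = [f"A{i:02d}" for i in range(1, 20)]    # 19 hypothetical pair IDs
carcinomas = [f"C{i:02d}" for i in range(1, 18)]  # 17 hypothetical pair IDs
batches, digests = block_randomize(adenomas, carcinomas)
queue_method_1 = acquisition_queue(digests, seed=1)
queue_method_2 = acquisition_queue(digests, seed=2)
```

With 36 pairs and three duplicated pairs, each queue holds 78 samples, and running both scheduled methods gives the 156 acquisitions reported in the text.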
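The idea behind decoy-based error control can be illustrated with a deliberately simplified Python sketch. It is not mProphet's algorithm, which builds a combined discriminant score and a more refined error model; here the false discovery rate at a score cutoff is simply estimated from the proportion of decoy peak groups that pass it. All scores below are made up.

```python
def decoy_fdr(target_scores, decoy_scores, cutoff):
    """Estimate the FDR among target peak groups scoring at or above `cutoff`.

    Decoys passing the cutoff estimate the number of false targets,
    rescaled by the ratio of target to decoy counts.
    """
    targets_passing = sum(score >= cutoff for score in target_scores)
    decoys_passing = sum(score >= cutoff for score in decoy_scores)
    if targets_passing == 0:
        return 0.0
    scale = len(target_scores) / len(decoy_scores)
    return min(1.0, decoys_passing * scale / targets_passing)


# Hypothetical discriminant scores for target and decoy peak groups.
targets = [15, 14, 13, 12, 11, 4, 3, 2, 1, 0]
decoys = [11.5, 5, 4, 3, 2, 1, 0, -1, -2, -3]
estimated_fdr = decoy_fdr(targets, decoys, cutoff=10)
```

Raising the cutoff lowers the estimated FDR at the cost of accepting fewer target peak groups, which is the trade-off behind the Cscore threshold described above.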
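The significance criteria can be expressed as a small Python sketch. The text states only that p values were corrected for multiple comparisons; the Benjamini-Hochberg procedure used here is an assumption, and the p values are made up.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_dysregulated(log2_fold_change, adjusted_p, min_linear_fc=1.3, alpha=0.05):
    """Apply the two thresholds: adjusted p < alpha and |fold change| >= 1.3."""
    return adjusted_p < alpha and abs(log2_fold_change) >= math.log2(min_linear_fc)


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
threshold_log2 = math.log2(1.3)  # about 0.38
```

A linear fold change of 1.3 corresponds to a log2 fold change of about 0.38, so a protein with a log2 fold change of -0.3 would not pass the filter even with a small adjusted p value.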
Statistical and Functional Network Analysis-Protein abundance ratios (endogenous/isotope-labeled standard) were visualized using the TreeView software (43). The pROC package in R was used to generate receiver operating characteristic (ROC) curves and to calculate areas under the curves (AUCs) and corresponding confidence intervals. The 95% confidence interval of the AUC was computed with 2000 stratified bootstrap replicates for each case. Relationships between protein abundance and clinical variables were assessed with linear models (lm) using single stratum analysis of variance (ANOVA) (categorical variables) or regression (numerical variables) analysis, both performed in R. Intraclass correlation coefficients (ICC) (44) were calculated in R statistical software to assess the agreement between protein expression changes measured with SRM and those obtained in our previous study (34) with iTRAQ (isobaric tags for relative and absolute quantification) methods. Deconvoluted protein ratios were calculated to compare SRM- and iTRAQ-based measurements on the same scale. The deconvoluted protein ratios (e.g. normal mucosa versus adenoma or adenoma versus normal mucosa) derived from SRM measurements were calculated as the ratio of endogenous peptide (L) to heavy peptide (H) in each normal mucosa versus adenoma pair (i.e. normal L/normal H:adenoma L/adenoma H). Deconvoluted iTRAQ protein ratios for normal mucosa versus adenoma were calculated as previously described (34).
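The AUC and its bootstrap confidence interval can be sketched in plain Python. This is not the pROC implementation: it computes the AUC as the Mann-Whitney probability that a tumor value exceeds a normal value, and it uses a percentile interval from resamples drawn separately within each group (stratified). The L/H ratios below are made up.

```python
import random


def auc(normal_values, tumor_values):
    """Probability that a tumor value exceeds a normal value (ties count 0.5)."""
    wins = 0.0
    for tumor in tumor_values:
        for normal in normal_values:
            if tumor > normal:
                wins += 1.0
            elif tumor == normal:
                wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))


def bootstrap_auc_ci(normal_values, tumor_values, n_boot=2000, level=0.95, seed=0):
    """Percentile confidence interval from stratified bootstrap resamples."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        normal_sample = [rng.choice(normal_values) for _ in normal_values]
        tumor_sample = [rng.choice(tumor_values) for _ in tumor_values]
        stats.append(auc(normal_sample, tumor_sample))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical L/H ratios for one protein in normal and tumor tissue.
normal_lh = [0.8, 1.0, 1.1, 1.3, 0.9, 1.2]
tumor_lh = [1.2, 1.6, 1.9, 2.4, 1.1, 2.0]
point_estimate = auc(normal_lh, tumor_lh)
ci_low, ci_high = bootstrap_auc_ci(normal_lh, tumor_lh)
```

For a protein that is down-regulated in tumors, the same functions can be applied after swapping the two groups or negating the values.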
Functional network analysis to identify potential interaction partners was performed with the GeneMANIA algorithm (genemania.org). The methods used for weighting and for analyzing network associations were automatically selected in GeneMANIA. Bar plots and dot plots were prepared in GraphPad Prism (version 6). Protein association with diseases was examined using the web version of FunDO (45). Briefly, the disease association of each gene in the genome was annotated using the Disease Ontology and peer-reviewed evidence from GeneRIF (46). A condensed version of the Disease Ontology, Disease Ontology Lite, was used for the statistical analysis (47), and the significance of each disease association was evaluated by Fisher's exact test (48).
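The enrichment logic of Fisher's exact test can be illustrated with a short Python sketch based on the hypergeometric distribution. It computes a one-sided p value (over-representation); whether FunDO reports a one-sided or two-sided value is not stated in the text, and the counts below are made up.

```python
from math import comb


def enrichment_p_value(hits, list_size, annotated, background):
    """One-sided Fisher's exact test for over-representation.

    hits:       genes in the list annotated to the disease
    list_size:  genes in the list
    annotated:  genes in the background annotated to the disease
    background: all genes considered
    """
    upper = min(list_size, annotated)
    total = comb(background, list_size)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 6 of 23 proteins annotated to a disease term
# that covers 300 of 20000 background genes.
p_value = enrichment_p_value(hits=6, list_size=23, annotated=300, background=20000)
```

A small p value indicates that the disease term is annotated to more of the listed proteins than would be expected by chance under random sampling from the background.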

SRM Assays for Verification of Candidate Colorectal Tumor Markers-We performed an independent validation study to verify findings from our earlier discovery-based transcriptomic (49) and shotgun proteomic (34) studies of a large cohort of precancerous colorectal lesions. Our objective was to create a high-throughput SRM-based workflow for detecting and quantitating the relative abundance of candidate biomarkers in an independent series of colorectal adenomas. We also examined colorectal adenocarcinomas to validate protein abundance changes during tumor progression. The characteristics of the 19 adenomas and 17 adenocarcinomas analyzed are listed in Table I. Protein abundance in each tumor was normalized to that found in patient-matched samples of normal mucosa. The 40 target proteins selected for this study (supplemental Table S1) included: 29 candidate biomarkers of colorectal neoplasia identified in our previous shotgun proteomics study (34); six proteins putatively involved in the same or related pathways as some of these 29 proteins; three candidate biomarkers of colorectal neoplasia identified in a previous transcriptomic study by our group (49); and two proteins widely used as housekeeping proteins (GAPDH and HSP90AA1). Proteotypic peptide sequences for each protein were selected based on the results of our previous study (34), and from other proteomics data sets available in the Peptide Atlas database (50). Additionally, peptide sequences conformed with empirical selection criteria including the length of the tryptic peptide sequence, hydrophobicity, uniqueness, and absence of missed cleavages (17,51,52). When possible, we also avoided the use of sequences with amino acids prone to chemical modification. This selection resulted in a total of 140 proteotypic peptides, with a minimum of two peptides per protein (supplemental Table S1).
Targeted SRM assays for the 40 selected proteins in unfractionated extracts were developed and optimized as illustrated in supplemental Fig. S1 (see also Experimental Section). Although sample fractionation improves sensitivity when combined with shotgun-proteomic workflows, it is less useful in SRM workflows (23,53). The additional sample preparation steps increase the chances of variation in the measurement of protein abundance. The sample throughput level also declines because multiple fractions from each sample have to be measured individually.
Stable isotope-labeled heavy standard peptides with amino acid sequences identical to those of the corresponding 140 endogenous peptides were synthesized and analyzed by LC-MS/MS to validate the peptide sequence. This step was necessary to ensure confident identification of each peptide monitored, to compensate for suppression effects, and to improve measurement precision (27). Spectral libraries were generated in Skyline with data acquired by shotgun analysis of a mixture of pooled aliquots of the 140 crude synthetic peptides.
Of the 140 peptides analyzed, only 74, corresponding to 31 proteins (supplemental Table S8), were detected. Nondetection of the other peptides may have been related to several factors, including suboptimal MS ionization properties and/or matrix effects caused by other components in the unfractionated samples.
Targeted Analyses of Colorectal Tissue Samples by LC-SRM-The optimized SRM assays were used to quantify 31 target proteins in colorectal tissue samples. Tissue peptide digests were assayed in randomized order (Fig. 1). Equal amounts of each standard peptide were introduced into each tissue peptide sample from a pooled mixture of 74 standard peptides. Standard isotope-labeled peptides were used as internal controls during statistical analysis of our SRM data and for the measurement of relative changes in protein abundance across tissue samples. Each tissue peptide digest also contained a known concentration of a mixture of iRT reference peptides (36), which served as internal controls for monitoring instrument and technical variations during time-scheduled SRM measurements. Based on precursor and fragment ion intensities, two to four transitions were selected for each standard (heavy) and endogenous (light) peptide.
Reproducibility of SRM Measurements-The sensitivity and validity of SRM measurements can be affected by pre-analytical variability (introduced by factors such as sample preparation) and/or analytical variability (resulting from undesirable variation of instrument parameters during SRM data acquisition) (19). Our block-randomized (39) study design (Fig. 1, explained in Experimental Procedures) ensured adequate sample randomization from the processing phase through SRM spectral acquisition. The paired feature (normal and tumor) of colorectal tissue for each patient was maintained in our randomized design. We examined our SRM data for unwanted bias caused by variability in instrument performance.
As shown in supplemental Fig. S2, a retention time shift of less than 1 min was maintained for each iRT peptide in all samples analyzed by time-scheduled SRM. Although each tissue sample was analyzed with both time-scheduled methods, chromatographic performance remained stable across all 156 SRM measurements. The reproducibility of our SRM spectral peak measurements for each peptide was also assessed by comparing the endogenous (light) to standard (heavy) peptide (L/H) peak ratios for each peptide monitored in duplicate peptide digests from the same protein extract. As shown in Fig. 1, protein extracts from three paired samples of normal and tumor tissues (total: 6 samples) were digested twice, and the resulting 12 digests were analyzed separately on the mass spectrometer, applying a randomized SRM sequence. For each duplicate pair, an R² value of >0.9 and an intercept of ~1 were observed, indicating that a reproducible peak area was obtained for all peptides measured in each sample duplicate (supplemental Fig. S3). This demonstrates that the confounding effects of instrument variance on our data and the inferred results were adequately minimized.
Peptide Identification and Quantification of Protein Abundance in Colorectal Tissues-Confident identification of peptides in the 72 tissue peptide digests analyzed was achieved using the mProphet statistical tool (40). A comprehensive list of peptide identification parameters is provided in supplemental Table S5. Peptides were considered identified if peak groups had a Cscore (40) >10 at a controlled FDR (Qvalue) cutoff of <1%. Applying this stringent cutoff (supplemental Table S5), we identified 53 endogenous peptides and achieved a sensitivity of 71.5% in peak group selection. Moreover, we manually confirmed the coelution of heavy and light peptides for these 53 peptides.
FIG. 1. Block-randomized experimental design for SRM analysis. Extraction of protein from samples was randomized based on the tumor type. Two paired samples (1 consisting of adenoma and adjacent normal mucosa, the other of adenocarcinoma and adjacent normal mucosa) were extracted at the same time. Enzymatic digestion of protein extracts was performed in three batches, each comprising 24 randomly allocated samples. In each batch, one randomly selected matched tissue pair (tumor/normal mucosa) was digested twice (duplicates). Transitions for time-scheduled SRM were divided into two groups, and each sample was analyzed twice according to a randomized queue. More details in Experimental Section and Results.
A peptide peak group was chosen for protein quantification only if the corresponding standard synthetic peptide transitions were observed in peptide digests from all 72 tissue samples analyzed. The 39 peptides that fulfilled this requirement were used to quantify 25 proteins. Overall, the SRM assays developed and refined in the present study to quantify selected proteins in colorectal tissues achieved a success rate of 62.5% (25 out of 40 proteins) (Fig. 2A and supplemental Fig. S4).
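The comparison of duplicate digests amounts to a simple linear regression of one replicate's L/H ratios on the other's. The Python sketch below shows an ordinary least-squares fit and the coefficient of determination; the ratios are made up and the function name is hypothetical.

```python
def linear_fit(x_values, y_values):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(x_values)
    if n != len(y_values) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    if sxx == 0 or syy == 0:
        raise ValueError("values must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical L/H ratios for the same peptides in two digests of one extract.
digest_1 = [0.21, 0.48, 1.05, 1.90, 3.10]
digest_2 = [0.23, 0.45, 1.10, 1.85, 3.20]
slope, intercept, r_squared = linear_fit(digest_1, digest_2)
```

For well-reproduced digests the fitted line lies close to the identity line and the R² value approaches 1.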
SRM Verification of Putative Biomarkers for Colorectal Neoplasia-Our SRM assays successfully detected 25 of the 40 candidate protein biomarkers in unfractionated digests of the tissue proteome. We subsequently confirmed that these assays could be used to reproducibly quantify levels of those 25 proteins across tumor samples and to identify their differential abundance between tumor and normal mucosa samples. All 25 proteins were reliably quantified across samples (Fig. 2B). Statistical analysis to identify significantly dysregulated proteins was performed with MSstats (41, 42), as described in Experimental Procedures. Proteins with an adjusted p value of Ͻ0.05 and a fold change Ն Ϯ1.3 were considered significantly altered (Fig. 3). One of the housekeeping proteins, GAPDH, displayed significantly different abundance level in adenomas and normal mucosa samples after the data were normalized with heavy peptides (equalizedMedians option in MSstats, see Experimental Procedures). Therefore, normalizing data to "housekeeping" proteins did not appear to be a reliable method for maximizing quantification of colorectal tumor-related protein variations in this study.
The diagnostic potential of the five most markedly dysregulated proteins (selected on the basis of p values) in each tumor type was assessed using ROC analysis on protein intensity ratios (L/H). Fig. 4 depicts the trade-off between sensitivity and specificity for five adenoma-associated proteins (S100A11, SORD, SERPINB5, NCL, and PGM5) with AUC values Ն0.87. A similar analysis of adenocarcinoma samples yielded AUC values Ͼ0.80 for SERPINB5, ARF6, ANXA3, CDV3, and NCL, indicating a fairly good capacity for distinguishing cancer lesions from normal mucosa (Fig. 5).
In summary, most of the proteins whose expression levels were significantly altered in tumor samples showed directionally similar forms of dysregulation (as compared with normal mucosa) in both adenomas and adenocarcinomas (Table II, supplemental Fig. S5): 11 of these proteins (ANP32A, ANXA3, SORD, LDHA, LCN2, NCL, S100A11, SERPINB5, CDV3, OLFM4, and REG4) were significantly up-regulated, whereas expression levels of ARF6, and PGM5 were decreased in both tumor types. However, some proteins showed significant expression changes in only one of the two tumor types: upregulation of PPA1 and PPA2 and downregulation of KCTD12 were seen exclusively in adenomas, whereas up-regulation of ANP32B, G6PD, RCN1, and SET and down-regulation of AKR1B1, APEX1, and PPA1 were specific to adenocarcinomas. In ϳ80% of both types of tumors the direction of dysregulation (increased or decreased protein levels) (r ϭ 0.77, p value ϭ 0.001) was the same in both tumor groups. PPA1 was an exception, displaying higher-than-normal abundance in adenomas and lower-than-normal abundance in carcinomas (supplemental Figs. S5 and S6). In a combined ROC analysis using adenoma and adenocarcinomas as a single classifier (i.e. neoplastic tissue), a panel of five proteins (S100A11, SERPINB5, NCL, SORD, and ANXA3) discriminated between normal mucosa and neoplastic tissue with sensitivity ranging from 70% to 100% at 80% specificity (Fig. 6).
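Reading sensitivity at a fixed specificity off a ROC curve can be sketched in Python as follows. The sketch assumes an up-regulated marker (higher values indicate neoplasia) and classifies a sample as positive when its value exceeds the threshold; the values are made up and the function name is hypothetical.

```python
def sensitivity_at_specificity(normal_values, tumor_values, target_specificity=0.8):
    """Sensitivity at the lowest threshold that reaches the target specificity.

    A sample is called positive when its value is strictly above the threshold.
    """
    n_normal = len(normal_values)
    for threshold in sorted(set(normal_values)):
        true_negatives = sum(value <= threshold for value in normal_values)
        if true_negatives / n_normal >= target_specificity:
            true_positives = sum(value > threshold for value in tumor_values)
            return threshold, true_positives / len(tumor_values)
    raise ValueError("no threshold found")  # unreachable for non-empty input


# Hypothetical L/H ratios in normal mucosa and neoplastic tissue.
normal_lh = [1.0, 2.0, 3.0, 4.0, 5.0]
tumor_lh = [3.5, 4.5, 6.0, 7.0, 8.0]
threshold, sensitivity = sensitivity_at_specificity(normal_lh, tumor_lh)
```

In this toy example the threshold of 4.0 leaves four of five normal samples below it (80% specificity) and four of five tumor samples above it (80% sensitivity). For a down-regulated marker the values can be negated before calling the function.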
We investigated the correlation (p values and Pearson's r score) between the abundance of these five proteins and different clinical features of the tumors using linear models for regression analysis or ANOVA (see Experimental Procedures). As shown in Fig. 7, SORD and NCL levels in adenomas and cancers tended to be higher in larger lesions (panel A). Furthermore, levels of SORD, S100A11, and SERPINB5 were more positively correlated with adenoma-associated mucosal pit patterns (IIIs, IIIl, and IV) and histology (TA, TVA, VA) than with cancer-associated pit pattern (type V) and histology (panels B and C, respectively) (3). When correlation analysis was extended to all the proteins that were significantly dysregulated in adenomas and/or cancers, PGM5 and G6PD levels displayed significant correlation with patient age, whereas SET, AKR1B1, and PPA2 abundance showed significant correlations with tumor size based on p values (supplemental Fig. S7). Furthermore, an array of proteins outlined in supplemental Fig. S8 was significantly associated with tumor mucosal pit pattern, tumor histology, and the colon segment where the tumor was localized (Table II).
SRM-based measurements of the selected 23 proteins were in general consistent with corresponding figures obtained with iTRAQ technology in our previous study in another adenoma series (34): ICC (A,1) = 0.65, p value = 0.00010, 95% CI = 0.353-0.828. We also calculated the Pearson's correlation coefficient between SRM and iTRAQ measurements using the average deconvoluted normal versus adenoma ratio for each protein (see Experimental Procedures for explanation of protein deconvoluted ratios). A Pearson's r value of 0.82 showed a positive correlation between both data sets (Fig. 8A). However, a few iTRAQ changes were not validated by SRM. The up-regulation of APEX1, RCN1, and SET observed with iTRAQ in adenomas was not confirmed by SRM, but these changes were detected in the adenocarcinomas analyzed by SRM (Fig. 3). ARF6 levels were up-regulated in the adenomas analyzed by iTRAQ and downregulated in the adenomas of the present study (Fig. 8A). To explore this discrepancy, we recalculated the protein abundance ratio of ARF6 in the iTRAQ adenomas, first using the sum of reporter ion channels from all confidently identified peptides (34), then with reporter ion intensity from the unique peptide employed for SRM quantification (supplemental Table S8), and compared the resulting deconvoluted protein ratios with the one from the SRM study (Fig. 8B). As expected, ARF6 protein ratio based on all peptides confidently identified by iTRAQ indicated up-regulation in adenomas, whereas that based solely on the proteotypic peptide employed for SRM quantification showed a similar regulation trend with no significant change in expression from that of normal mucosa. This inconsistency in iTRAQ and SRM measurements for ARF6 might be attributed to biological variance inherent to patient samples. The adenoma-related down-regulation of ARF6 detected in the patient samples examined by SRM is most likely accurate because this protein is even more substantially downregulated in adenocarcinomas (Fig. 3 and Table II), and its expression is positively correlated with a more advanced (type V) mucosal pit pattern (Fig. 8C and Table II).
DISCUSSION
This study demonstrates that the SRM technique can be used to quantify levels of multiple proteins that are putative markers of colorectal neoplasia across a relatively large set of tissue samples. In the assay we developed, tissue protein extracts could be analyzed without being subjected to sample fractionation or depletion of high-abundance proteins. Randomization and blocking were incorporated into the experimental design to maximize the chances of detecting true quantitative differences between normal and neoplastic (adenomas or carcinomas) colorectal tissues. Thanks to their high reproducibility, our sample preparation and SRM measurement protocols (supplemental Figs. S2 and S3) can be readily adapted for use in various laboratories. Our panel of five proteins displaying tumor-associated dysregulation predicted the presence of both precancerous and invasive colorectal tumors.
FIG. 6. Potential markers for colorectal tumors (adenoma or adenocarcinoma). ROC analysis was performed for the five proteins displaying the highest dysregulation in adenomas and adenocarcinomas (ranked according to p values). ROC curves show the power of each protein to distinguish between tumors and normal mucosa (upper panel). Protein intensities used for ROC analysis are shown in the box plot (lower panels). N, normal mucosa samples; T, adenomas and adenocarcinomas.

FIG. 7. Correlation between the five-marker protein panel and clinical features of each tumor.
A, Linear fit shows the positive correlation between SORD or NCL abundance and lesion diameter. B, SORD, S100A11, and SERPINB5 were also positively associated with type V pit pattern in the neoplastic mucosa. C, Tumor expression of SORD, S100A11, and SERPINB5 was higher in adenoma subtypes than in adenocarcinomas (see Table I). Cancer, adenocarcinomas; SSA, sessile serrated adenoma; TA, tubular adenomas; TVA, tubulovillous adenomas; VA, villous adenomas.
FIG. 8. Correlation between SRM measurements of target proteins in adenomas in the present study and corresponding iTRAQ measurements from our previous study (34). A, Analysis of the average deconvoluted protein ratios (normal mucosa/adenoma) obtained with the two methods shows fairly good correlation between the two sets of measurements. p value = 0.000196, Pearson's r = 0.82. B, Deconvoluted log2 ratio (normal/adenoma) of ARF6 in tumors investigated with iTRAQ and with SRM. ARF6 protein ratio in matched normal and adenoma samples was quantified based on (1) the intensity of the unique peptide consistently quantified in samples in the SRM study (SRM 1 unique peptide), (2) the intensity of this same peptide measured in samples from the iTRAQ group (iTRAQ 1 unique peptide), and (3) the intensity of all confidently identified peptides measured with iTRAQ 8-plex method (iTRAQ all peptides). Protein ratio in adenomas and matched normal samples was compared with a paired t test, and ^ indicates a significant p value. C, ANOVA of fit between ARF6 abundance and tumor pit pattern.
The fact that our SRM assays failed to quantify 12 of the 40 proteins we targeted might be related to the detection limit we set for low-abundance proteins in our unfractionated, complex samples (54,55), because it was higher than that used for fractionated peptide digests in our previous study (34). It is also possible that the endogenous proteotypic peptides we selected to represent these proteins had suboptimal MS signal responses. Despite these shortcomings, our experience in this study clearly highlighted the advantages of targeted proteomics platforms over the shotgun (or discovery) MS methods (e.g. the iTRAQ approach used in our previous study (34)), which include higher quantitative sensitivity, improved reproducibility, and multiplex capabilities (56). The emerging targeted MS methods, Sequential Window Acquisition of all Theoretical Spectra (SWATH) and Parallel Reaction Monitoring, may well provide even more comprehensive proteome profiles and further expand the degree of multiplexing achieved with SRM (14-16).
The proteotypic peptides targeted by SRM are selected a priori to facilitate differentiation between protein isoforms or proteins with shared peptides. In addition, the targeted peptide and fragment ions are reproducibly monitored in all samples, which improves selectivity (14,17,57). In our previous shotgun MS study (34), for example, adenoma-associated expression changes in LDH, ANP32, and PPA could not be attributed to a specific isoform of the enzymes because similar peptides were used for the relative quantification of both isoforms (LDHA versus LDHB, ANP32A versus ANP32B, and PPA1 versus PPA2). In the present study, we chose peptides that were unique to each protein/isoform and thus obtained isoform-specific information on the altered abundance of these enzymes in colorectal tumors (Table II and supplemental Fig. S9).
On the whole, SRM-based measurements of the proteins of interest displayed good agreement and correlation with corresponding figures obtained with an 8-plex iTRAQ approach (Fig. 8A). The main outlier was ARF6, which displayed reduced adenoma-related expression in the present study, whereas iTRAQ analysis had revealed increased levels in the precancerous tumors. This inconsistency mainly reflects heterogeneity in ARF6 expression across the patient tumors examined. It is not wholly related to the use of one unique peptide for SRM quantification of ARF6 because the reported expression trend for this protein in the iTRAQ cohort did not change when protein intensity was recalculated based on only this peptide (Fig. 8B). Moreover, SRM measurements based on the unique peptide for ARF6 demonstrated adenoma-related down-regulation of this protein in an independent cohort, which was supported by the positive correlation with more severe mucosal changes (Fig. 8C) and progression to cancer (Fig. 3).
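The correlation underlying this comparison is a Pearson coefficient computed over per-protein average ratios from the two platforms. A minimal Python sketch of that calculation follows; the function name and the log2 ratios are hypothetical and serve only to show the arithmetic.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-protein log2 ratios from two platforms (not study data).
srm_ratios = [1.0, 2.0, 3.0]
itraq_ratios = [1.0, 3.0, 2.0]
r = pearson_r(srm_ratios, itraq_ratios)
```

A protein whose ratios have opposite signs on the two platforms, as reported here for ARF6, lowers r and appears as an outlier in a scatter plot of the two sets of ratios.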
Compression of signal intensity ratios secondary to coisolation and fragmentation of tagged precursors is a known drawback of iTRAQ labeling (58-60). Others include under-sampling because of stochastic selection of precursor ions and intensity-biased selection of precursor ions (59,60). All of these phenomena can cause variability across samples in protein identification and quantification (17,61), underscoring once again the need for more sensitive and reproducible methods such as SRM for the validation of potential biomarker candidates.
Regular colonoscopic screening is recommended for individuals with age-related risk factors for CRC. The reported sensitivity of this approach ranges from 89% to 98% for the detection of adenomas 10 mm or larger and from 75% to 93% for those measuring 6 mm or more (62). Colonoscopy rarely misses adenomas 10 mm or larger (2.1%), but miss rates increase with smaller adenomas (63). In addition, the efficacy of colonoscopic screening programs is often reduced by low patient participation rates (64,65). Efforts have thus been made to find cheaper, noninvasive alternatives that can detect both CRC and advanced precancerous lesions (i.e. adenomas >10 mm) with high sensitivity, thereby improving patient compliance and clinical outcomes. The past decade has witnessed the discovery of a number of potential biomarkers for the diagnosis of colorectal neoplasms (6,66,67), but the only noninvasive screening methods for CRC approved thus far involve immunoassays for fecal hemoglobin (68). Studies are underway to assess the efficacy of a more recently developed multi-target stool DNA test, which combines hemoglobin detection with quantitative molecular assays for KRAS mutation, β-actin, and aberrant NDRG4 and BMP3 methylation (68). In a large cross-sectional validation study on asymptomatic individuals at average risk for CRC, this new approach detected significantly more cancers than the fecal immunochemical test, but it also yielded more false-positive results. Its sensitivity for the detection of advanced adenomas, although admittedly mediocre (42.4%), was clearly higher than that of the fecal immunochemical test (23.8%).
Unlike DNA, protein-based markers provide an accurate representation of the functional cellular conditions being investigated and are fundamental for the phenotypic diagnosis of a disease. For example, mutation of a gene might affect different domains of the protein it encodes, resulting in diverse phenotypes. Furthermore, DNA aberrations generated by double-strand breaks can be amplified into a variety of protein modification events. However, assays for measuring protein levels in fecal samples are not as well established as stool DNA assays despite reports that stool samples from patients with CRC can be distinguished from those of controls based on levels of several tumor-derived proteins, including pyruvate kinase M2 (69,70), carcinoembryonic antigen (CEA) (70), secreted clusterin isoform (71), and minichromosome maintenance proteins (72). The application of proteomics to the analysis of stool samples has been hindered by the biological processes involving partial digestion and decay of proteins throughout the gastrointestinal tract, as well as by technical challenges related to sample preservation and processing.
Disease-specific eukaryotic protein markers enter the extracellular space via conventional and/or unconventional secretory pathways (73). Proteins with signal peptides are usually secreted through conventional pathways involving the endoplasmic reticulum-Golgi system. Unconventional secretory pathways include (1) translocation across the plasma membrane, (2) lysosome-dependent pathways, and (3) extracellular vesicles-dependent secretion (74). We examined probable secretion pathways for the 23 candidate proteins.
First, the SignalP server 4.0 (75) was used to identify protein(s) that possessed a signal peptide and were actively secreted via the classical pathway, whereas the SecretomeP server 2.0a (76) was used to predict protein secretion via non-classical pathways for proteins with or without signal peptides (Table II). At a Discrimination score (D-score) cutoff ≥0.45, NGAL, OLFM4, RCN1, and REG4 are most likely secreted via the classical pathways. Four additional proteins (APEX1, ARF6, S100A11, and KCTD12) were predicted by SecretomeP (Neural Network output or NN-score ≥0.60) to undergo non-classical secretion.
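The two cutoffs can be combined into a simple decision rule, sketched below in Python. The sketch assumes that the SignalP D-score is checked first and that the SecretomeP NN-score is consulted only for proteins not classified as classically secreted; that ordering, the function name, and the example scores are illustrative assumptions, and the code does not call either prediction server.

```python
def predicted_secretion_route(d_score, nn_score, d_cutoff=0.45, nn_cutoff=0.60):
    """Classify a protein from precomputed SignalP and SecretomeP scores."""
    if d_score >= d_cutoff:
        return "classical"      # signal peptide predicted (D-score cutoff)
    if nn_score >= nn_cutoff:
        return "non-classical"  # SecretomeP NN-score cutoff
    return "not predicted"

# Hypothetical scores (not values from Table II).
example_scores = {
    "protein_A": (0.80, 0.30),
    "protein_B": (0.10, 0.75),
    "protein_C": (0.10, 0.20),
}
routes = {
    name: predicted_secretion_route(d, nn)
    for name, (d, nn) in example_scores.items()
}
```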
Second, because non-classical secretion of proteins can also occur via their externalization in extracellular vesicles (e.g. microvesicles, exosomes, and apoptotic bodies), we analyzed our candidates using Vesiclepedia (77), a manually curated compendium of molecules identified in different classes of extracellular vesicles. Surprisingly, all our candidate proteins have been detected in microvesicles and exosomes from colorectal cells (Table II). Tumor cells can also be shed into the blood. Because extracellular vesicles and circulating tumor cells can be robustly and reproducibly detected in blood and other body fluids, it seems reasonable to expect that our candidate protein biomarkers are potentially detectable in blood (78). This conclusion is supported by the fact that 61% of our target proteins (Table II) were found in the high-confidence human plasma proteome reference set reported by Farrah and colleagues (79), and 96% (all except OLFM4) were present in the largest high-confidence plasma proteome data set available to date, that generated by Keshishian and colleagues using a novel plasma-based biomarker discovery workflow that increased sample throughput and improved the depth of proteome identification and relative quantification (80). Leakage into the bloodstream of proteins from early-stage adenomas is unlikely given the small size of these tumors and the barriers the proteins would have to circumvent prior to reaching the blood. There is a higher chance of detecting proteins from advanced adenomas or adenocarcinomas in the plasma/serum because of leakage, shedding, or protein secretion. For example, studies on one of our candidate biomarkers, SERPINB5, confirmed that levels of this protein are increased in serum from patients with colorectal adenomas or adenocarcinomas, as compared with healthy individuals (81).
Certain molecular changes associated with tumor occurrence and progression can be observed in multiple types of cancer and even in certain chronic inflammatory diseases. Their detection in body fluids might thus be fairly nonspecific. We examined 23 of the 25 protein targets listed in Table II for possible associations with other cancer types and/or with inflammatory bowel diseases. A review of Polanski and Anderson's list of 1261 proteins reported to be differentially expressed in human cancers (82) indicated that eight of these proteins (35% of our protein list) were associated with cancers other than CRC (Table II). FunDO (45) analysis of the genes encoding these 23 candidates revealed six additional proteins that were associated with multiple cancers, but not with inflammatory bowel diseases (supplemental Fig. S10). These findings indicate that our proteins of interest are not associated with inflammation of the digestive tract; however, fourteen (56%) of them have links to multiple cancers.
Carcinoembryonic antigen is the only protein currently used as a circulating biomarker for CRC recurrence and metastasis (83). The proteins that were consistently up-regulated in our colorectal tumors and that distinguished adenoma and/or adenocarcinomas from normal mucosa with appreciable sensitivity and specificity (Figs. 4 to 6) are promising candidates for use in blood-based assays for the detection of these tumors. Some of these proteins have been already investigated in other proteomics studies as prognostic serum biomarkers in patients with CRC. Elevated serum levels of REG4 and of total LDH, for example, have both proved to be significant indicators of liver metastasis in this setting (84,85), whereas increased serum levels of LCN2 are reportedly associated with higher neoplastic volume and disease recurrence (86).
Our data suggest that the presence in body fluids of S100A11, SERPINB5, NCL, SORD, and ANXA3 might be an indication for colonoscopy, although this five-marker signature cannot distinguish whether the patient is likely to be harboring an adenoma or an adenocarcinoma. Three of these markers have also been reported in blood-based validation studies on CRC. For instance, a nine-protein multiplex serum array biochip that includes S100A11 and CEA showed high diagnostic potential for colon cancer (87). Serum levels of SERPINB5 have also been found to be elevated in patients with colorectal adenoma or CRC (81), and ANXA3 was recently validated as a potential diagnostic and therapeutic serum marker for CRC (88).
The substances secreted in detectable amounts by CRCs also include products of tumor metabolism. The analysis of volatile organic compounds is an emerging field in the search for noninvasive metabolite tumor-marker profiles (89). Electronic-nose technology and canine and Caenorhabditis elegans scent detection have been used to analyze the presence of these compounds in breath, urine, and stool samples (89-91) from CRC patients. The reported sensitivity and specificity of these approaches in distinguishing between individuals with and without CRC or advanced adenomas are comparable to those of conventional colonoscopy (>90%) (90,91). In the present study, we found significantly increased levels of metabolic enzymes in the adenomas (SORD, LDHA) and adenocarcinomas (SORD, G6PD, LDHA) we examined. Metabolite products from reactions catalyzed by these enzymes and associated pathway enzymes are potential indicators of tumor occurrence. Interestingly, lactate levels in the adenomas we examined previously (34) were significantly higher than in healthy control tissue, suggesting that aerobic glycolysis occurs in benign colorectal tumors.
The potential translational impact of our findings on the future clinical management of patients with early colorectal lesions is obvious. However, they can also be exploited to improve our understanding of the molecular mechanisms underlying colorectal tumorigenesis. To this end, we probed the five proteins that most effectively discriminated between normal mucosa and colorectal tumors (Fig. 6) for functional relations with the rest of the proteome. Using the GeneMANIA tool, we queried association data on protein and genetic interactions, pathways, coexpression, colocalization, and protein domain similarity. As shown in supplemental Fig. S11, nucleolin (NCL) was the signature protein with the highest number of interactions with cancer-related proteins. It directly associates with TERT and TOP1, which are key players in oncogenesis (92,93). TERT mRNA and protein expression occurs early in in vivo colorectal tumorigenesis, beginning in the early premalignant phase and increasing gradually during progression to invasive stages (94). As for TOP1, inhibitors of this enzyme are used as first-line chemotherapeutics for CRC (93). The calcium binding protein S100A11 is best described as a dual mediator with effects that depend on its localization and interaction partners. It therefore functions as a tumor suppressor in some cancers (e.g. those of the breast) and as an oncogene in others, including CRC. It interacts directly with NCL, S100B, ANXA1, and UBE2F (95-98) (supplemental Fig. S11). The peptidase inhibitor, SERPINB5, has also been reported to exert tumor-promoting as well as tumor-suppressor effects. Recent protein biomarker studies have revealed that elevated SERPINB5 expression in CRCs correlates with high CEA levels and poor prognosis (99). SORD was found to physically interact with three proteins: AKR1B1, which-like SORD-is involved in the polyol pathway; GCLM, and C16orf13 (98). The roles of GCLM and C16orf13 in tumorigenesis are unclear. 
Annexin A3 is a calcium-dependent phospholipid-binding protein with potential roles in maintaining cell proliferation and invasion (88). It physically interacts with ANXA11 (98).
In conclusion, SRM allows reproducible, simultaneous detection and quantification of multiple proteins in numerous samples. Protein abundance changes in adenomas relative to the normal colorectal mucosa are unambiguously indicative of phenotypic alterations present in early stage tumors. Our study is limited in regard to the validation of these potential biomarkers in serum/plasma of patients undergoing colonoscopic screening, and patients diagnosed with adenomas and/or CRC. Future work should assess levels of the proteins we verified in colorectal tissue samples in blood samples collected from patients. Efforts of this type will undoubtedly reveal additional, related proteins or metabolites with complementary diagnostic attributes that might also prove useful in precolonoscopic, non-invasive tests for the early diagnosis of colorectal tumors.
Additional Information-The MS/MS data from this study have been submitted to the PeptideAtlas SRM Experiment Library (PASSEL) (http://www.peptideatlas.org/passel/). The SRM data can be accessed there using the identifier PASS00900.