Integrated Proteomic Profiling of Cell Line Conditioned Media and Pancreatic Juice for the Identification of Pancreatic Cancer Biomarkers

Pancreatic cancer is one of the leading causes of cancer-related deaths, for which serological biomarkers are urgently needed. Most discovery-phase studies focus on the use of one biological source for analysis. The present study details the combined mining of pancreatic cancer-related cell line conditioned media and pancreatic juice for identification of putative diagnostic leads. Using strong cation exchange chromatography, followed by LC-MS/MS on an LTQ-Orbitrap mass spectrometer, we extensively characterized the proteomes of conditioned media from six pancreatic cancer cell lines (BxPc3, MIA-PaCa2, PANC1, CAPAN1, CFPAC1, and SU.86.86), the normal human pancreatic ductal epithelial cell line HPDE, and six pancreatic juice samples from ductal adenocarcinoma patients, combined into two pools of three samples each. All samples were analyzed in triplicate. Between 1261 and 2171 proteins were identified with two or more peptides in each of the cell lines, and an average of 521 proteins were identified in the pancreatic juice pools. In total, 3479 nonredundant proteins were identified with high confidence, of which ∼40% were extracellular or cell membrane-bound based on Gene Ontology classifications. Three strategies were employed for identification of candidate biomarkers: (1) examination of differential protein expression between the cancer and normal cell lines using label-free protein quantification, (2) integrative analysis, focusing on the overlap of proteins among the multiple biological fluids, and (3) tissue specificity analysis through mining of publicly available databases. Preliminary verification of anterior gradient homolog 2, syncollin, olfactomedin-4, polymeric immunoglobulin receptor, and collagen alpha-1(VI) chain by ELISA in plasma samples from pancreatic cancer patients and healthy controls showed a significant increase (p < 0.01) of these proteins in plasma from pancreatic cancer patients. The combination of these five proteins showed an improved area under the receiver operating characteristic curve compared with CA19.9 alone. Further validation of these proteins is warranted, as is the investigation of the remaining group of candidates.

Pancreatic cancer is the fourth leading cause of cancer-related deaths and one of the most highly aggressive and lethal of all solid malignancies (1). Because of the asymptomatic nature of its early stages, coupled with inadequate methods for early detection, the majority of patients (>75%) present with locally advanced and inoperable disease at the time of diagnosis (1). At these advanced stages, chemotherapy, radiation, and combinatorial therapies are largely anecdotal, and less than 5% of patients survive up to five years postdiagnosis (1,2).
One way to aid in the clinical management of cancer patients is through the use of serum biomarkers. Currently, the most widely used biomarker for pancreatic cancer is carbohydrate antigen 19.9 (CA19.9), a sialylated Lewis A antigen found on the surface of proteins (3,4). Although CA19.9 is elevated mainly in late stage pancreatic cancer, it is also elevated in benign diseases of the pancreas and in other malignancies of the gastrointestinal tract (5). Other tumor markers such as members of the carcinoembryonic antigen (CEA) (6,7) and mucin (MUC) (8-10) families have also been associated with pancreatic cancer. When used in combination, with or without CA19.9, some of these markers have shown enhanced sensitivity and specificity; however, none has become a constant fixture in the clinic. The lack of a single highly specific and sensitive marker has led to a growing consensus in the field toward the development of multiparametric panels of biomarkers, whereby the combinatorial assessment of multiple molecules can likely achieve increased sensitivity and specificity for disease detection and management (11-13).
Protein-based biomarkers that can be detected in circulation are typically proteins that are secreted, shed, or cleaved from tumor cells, or ones that might leak out because of local tissue destruction during disease progression (14). As such, biological fluids in close proximity to tumor cells likely serve as enriched sources of potential biomarkers before they enter the circulation and become vastly diluted and potentially masked by proteins of high abundance (15-18). With respect to pancreatic cancer, proteomic analyses of biological fluids such as pancreatic juice, cyst fluids, and bile have been conducted (19-26). Between 22 and 170 proteins have been identified in six pancreatic juice studies using a variety of different MS-based approaches (19, 20, 22, 24-26), over 460 proteins in a cyst fluid study (23), and 127 proteins in the bile proteome from patients with bile duct stenosis (21).
Tissue culture supernatants or conditioned media (CM) is another relevant fluid, the use of which, for the identification of novel biomarkers, has been demonstrated in multiple cancer sites by our group (27-30), and others (31-36). What is lacking in the field is integrative analysis and mining of the proteomes from different biological sources pertaining to a disease type for biomarker discovery. The utility of using an integrative approach to biomarker discovery has been described recently (15,16). Given that cancer is a highly heterogeneous disease, through integration and comparison of proteomes from multiple biological sample types, the advantages of one source might compensate for the shortcomings of others, resulting in more relevant and stronger candidates for verification in plasma.
As such, in the present study, we performed in-depth proteomic analyses, integrating and comparing the proteomes of CM from six pancreatic cancer cell lines (MIA-PaCa2, BxPc3, PANC1, CAPAN1, CFPAC1, and SU.86.86), the normal human pancreatic ductal epithelial cell line (HPDE) and two pools of pancreatic juice (each pool containing three samples). All samples were analyzed in triplicate using strong cation exchange chromatography followed by liquid chromatography (LC)-tandem MS (MS/MS) on a linear trap quadrupole (LTQ)-Orbitrap mass spectrometer, and a total of 3479 nonredundant proteins were identified with two or more peptides in our comparative proteomics analysis.
One of the challenges in high throughput proteomics-based discovery studies is in the selection of the most promising candidates for verification (37). In the present study, we used three strategies for generation of candidates. First, given the relatively large number of cancer cell lines profiled, proteins overexpressed in the cancer cell lines in comparison to the HPDE cell line were examined through label-free quantification. Four hundred and eighty-three proteins were found to be overexpressed more than 5-fold in at least one cancer cell line in comparison to HPDE, 63 of which were annotated as extracellular or cell surface and overexpressed in at least three cancer cell lines. These included several proteins previously studied as pancreatic cancer serum biomarkers, which lends credence to our label-free quantitative approach. As a second strategy, we performed a comparative analysis, integrating the pancreatic juice proteome with that of the cell lines and selecting proteins that are common to the cancer cell lines and pancreatic juice. Comparison to a third biological fluid (pancreatic cancer ascites, Kosanam et al., unpublished) and focusing on proteins also identified in the ascites proteome helped to further filter candidates. Tissue specificity is also a desired criterion for biomarkers (37,38), and our third strategy entailed tissue specificity analysis of identified proteins based on mining of the Tissue-Specific Genes Database (TiSGeD) (39), Tissue-specific Gene Expression and Regulation (TiGER) (40), Unigene (41), and Human Protein Atlas (42) databases, focusing on proteins specific to or highly expressed in the pancreas.
Of the derived candidates, initial verification studies using ELISAs in plasma samples from patients with established pancreatic cancer and controls resulted in the identification of five proteins that showed a significant increase (p < 0.01) in plasma from pancreatic cancer patients: anterior gradient homolog 2 (AGR2), polymeric immunoglobulin receptor (PIGR), olfactomedin-4 (OLFM4), syncollin (SYCN), and collagen alpha-1(VI) chain (COL6A1). AGR2, PIGR, and COL6A1 were proteins overexpressed in three or more cell lines, OLFM4 was a protein consistently identified in the multiple biological fluids, and SYCN was a protein identified in our proteomic analysis that showed high tissue specificity to the pancreas. CA19.9 levels were also assessed in the plasma samples, and the combination of the five proteins showed an improved area under the receiver operating characteristic (ROC) curve compared with CA19.9. Our verification studies in plasma provide evidence that our approach can identify candidates with potential diagnostic utility for pancreatic cancer. Further validation of AGR2, PIGR, OLFM4, SYCN, and COL6A1 in a larger number of plasma samples is warranted, as is the investigation of the remaining group of candidates.
Cell culture media specified by ATCC for each of the six pancreatic cancer cell lines were used and are as follows: Dulbecco's modified Eagle's medium (DMEM) (30-2005) with 10 and 20% fetal bovine serum was used for the CFPAC-1 and Capan1 cell lines, respectively. The HPDE cell line was grown in keratinocyte serum free media (Catalog No. 17005-042; Invitrogen) supplemented with bovine pituitary extract and recombinant epidermal growth factor. All cells were cultured in an atmosphere of 5% CO2 in air in a humidified incubator at 37°C.
Cell Culture-Cells were cultured in T-175 cm² flasks at determined optimal seeding densities of ~10 × 10⁶ for MIA-PaCa2, Panc1, and Capan1, 14 × 10⁶ for BxPc3, 3 × 10⁶ for HPDE, 13 × 10⁶ for CFPAC1, and 4 × 10⁶ for SU.86.86, in three replicates per cell line. Cells were first cultured for 48 h in 40 ml of their respective growth media to obtain adherence to culture flasks. The media was then removed and the cells and flasks were subjected to two gentle washes with 30 ml of phosphate-buffered saline (Invitrogen). Forty milliliters of chemically defined Chinese hamster ovary serum-free medium (Invitrogen) supplemented with 8 mM glutamine (Invitrogen) was then added and the cells were left to culture for determined optimal incubation periods of 72 h for Capan1, CFPAC1, and SU.86.86, 96 h for BxPc3 and HPDE, and 144 h for MIA-PaCa2. The chemically defined Chinese hamster ovary media in which the cells were grown was subsequently collected and centrifuged at 1500 rpm for 10 min to remove cellular debris. Total protein concentration was measured in each of the three replicates using a Coomassie (Bradford) total protein assay (46), and a volume corresponding to 1 mg of total protein from each of the replicates was subjected to the sample preparation protocol below.
Before analysis of samples, the seeding density and incubation periods were optimized through a procedure described previously (28,29). Briefly, cells were left to culture for a period of 5-7 days using different seeding densities. Each day, a 1 ml sample of media was collected and protein secretion was assessed by measuring, through ELISAs, the levels of proteins known to be secreted from cells (e.g. multiple kallikreins, KLK). Cell death was assessed with an automated activity assay for LDH, an intracellular enzyme that is released into the media on cell death or lysis. An "optimal" incubation period and seeding density that supported increased protein secretion (i.e. increased levels of KLK) with minimal cell death (i.e. low LDH) were then selected (28,29).
Pancreatic Juice-Pancreatic juice samples were provided by Dr. Felix Rückert, Dresden, Germany. Approximately 50-500 µl of pancreatic juice was collected from the main pancreatic duct of patients undergoing pancreatic surgery. The study was approved by the local ethics committee and all patients had given written informed consent before operation. On collection, the samples were stored at −80°C and were not thawed until use in this study. Samples from patients with clinically confirmed cases of pancreatic ductal adenocarcinoma without visible signs of blood were selected for analysis. Six pancreatic juice samples met these criteria. The samples were centrifuged at 16,000 rpm for 10 min at 4°C to remove tissue debris. Total protein concentration of each sample was measured using the Biuret method (47). Keeping in line with the cell line conditioned media analysis, it was desirable to use a total protein amount of 1 mg for analysis of each of the three replicates per sample. As a result, two pools of pancreatic juice (pools A and B) were made, containing three samples each, with total protein concentrations of 2.65 mg/ml and 2.32 mg/ml for pool A and B, respectively. A volume corresponding to 1 mg of total protein was retrieved from each pool, in triplicate, and subjected to the standardized sample preparation protocol below (with the exception of dialysis).
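The volume that corresponds to 1 mg of total protein follows directly from the pool concentrations reported above. The following is a minimal sketch of that arithmetic; the function name is illustrative and the rounding to whole microliters is an assumption, not a detail given in the text.

```python
def volume_for_protein_mass_ml(target_mg, concentration_mg_per_ml):
    """Return the sample volume (ml) that contains target_mg of total protein."""
    if concentration_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return target_mg / concentration_mg_per_ml


# Pool concentrations reported in the text (mg/ml).
pool_concentrations = {"A": 2.65, "B": 2.32}

# Volume per replicate in microliters, rounded to the nearest microliter
# (the rounding is an assumption made for readability).
volumes_ul = {
    pool: round(volume_for_protein_mass_ml(1.0, conc) * 1000)
    for pool, conc in pool_concentrations.items()
}
print(volumes_ul)  # {'A': 377, 'B': 431}
```

Under these figures, roughly 377 µl of pool A and 431 µl of pool B would be needed per replicate.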
Sample Preparation-Samples were processed as described previously (28). Briefly, samples were dialyzed using a 3.5 kDa molecular weight cut-off membrane (Spectrum Laboratories, Inc., Compton, CA) in 5 L of 1 mM NH4HCO3 buffer solution at 4°C overnight and subsequently frozen and lyophilized to dryness to concentrate proteins using a ModulyoD Freeze Dryer (Thermo Electron Corporation). Proteins in each lyophilized replicate were denatured using 8 M urea and reduced with the addition of 200 mM dithiothreitol (final concentration of 13 mM) in 1 M NH4HCO3 at 50°C for 30 min. Samples were then alkylated with the addition of 500 mM iodoacetamide and incubated in the dark, at room temperature, for 1 h. Each replicate was then desalted using a NAP5 column (GE Healthcare), frozen and lyophilized. Last, samples were trypsin-digested (Promega, sequencing-grade modified porcine trypsin) through an overnight incubation at 37°C using a ratio of 1:50 trypsin to protein concentration. Tryptic peptides were frozen in solution at −80°C to inhibit trypsin function and lyophilized.
Strong Cation Exchange (SCX) on a High Pressure Liquid Chromatography (HPLC) System-The tryptic peptides were resuspended in 510 µl of a solution containing 0.26 M formic acid in 10% acetonitrile with pH 2-3 (mobile phase A) and loaded directly onto a 500 µl loop connected to a PolySULFOETHYL A™ column (The Nest Group, Inc. Southborough, MA). The column has a silica-based hydrophilic, anionic polymer (poly-2-sulfoethyl aspartamide) with a pore size of 200 Å and a diameter of 5 µm. The SCX chromatography and fractionation was performed on an HPLC system (Agilent 1100) using a 1-hour procedure with a linear gradient of mobile phase A. For elution of peptides, an elution buffer, which contained all components of mobile phase A with the addition of 1 M ammonium formate, was introduced at 20 min in the 60 min method, and a wavelength of 280 nm was used to monitor the eluent (28). Fractions were collected every minute from the 20 min time point onwards, resulting in the collection of 40 one-min fractions. Collected fractions were left unpooled or subsequently combined into 2, 3, or 5 min pools, according to the elution profile of the resulting SCX chromatogram. As a general strategy, where the absorbance readings of the elution profile were greater (typically the first 10-15 min of elution), fractions were left unpooled or pooled every 2 min to keep sample complexity at a minimum. Where the absorbance readings were lower (toward the end of the method), fractions were pooled in 3 or 5 min pools. The same pooling method was used for the three replicates of each sample.
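The pooling rule described above (narrow pools where absorbance is high, wide pools where it is low) can be illustrated with a short sketch. The text does not give numeric absorbance cutoffs, and the pooling appears to have been guided by the chromatogram profile rather than by a fixed algorithm, so the cutoffs, the toy readings, and the function names below are all hypothetical.

```python
def pool_size_for(reading, cutoffs):
    """Map an absorbance reading to a pool width in minutes.

    cutoffs is a list of (minimum_absorbance, pool_width) pairs ordered from
    the highest cutoff to the lowest; readings below every cutoff get the
    widest pool (5 min).
    """
    for minimum_absorbance, width in cutoffs:
        if reading >= minimum_absorbance:
            return width
    return 5


def pool_fractions(absorbance, cutoffs):
    """Group consecutive one-minute fractions into pools of 1, 2, 3, or 5.

    The width of each pool is chosen from the absorbance of its first
    fraction. Returns a list of pools, each a list of fraction indices.
    """
    pools = []
    i = 0
    while i < len(absorbance):
        width = pool_size_for(absorbance[i], cutoffs)
        pools.append(list(range(i, min(i + width, len(absorbance)))))
        i += width
    return pools


# Hypothetical cutoffs (absorbance units at 280 nm) and toy readings.
CUTOFFS = [(0.8, 1), (0.4, 2), (0.2, 3)]
readings = [0.9, 0.9, 0.5, 0.5, 0.3, 0.3, 0.3, 0.1, 0.1, 0.1]

pools = pool_fractions(readings, CUTOFFS)
print(pools)  # [[0], [1], [2, 3], [4, 5, 6], [7, 8, 9]]
```

Indices here count one-minute fractions from the start of collection (the 20 min time point), and every fraction lands in exactly one pool.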
Mass Spectrometry (LC-MS/MS)-The SCX fractions or pools were purified through OMIX Pipette Tips C18 (Varian Inc.) to further remove impurities and salts and eluted in 4 µl of 70% MS Buffer B (90% acetonitrile, 0.1% formic acid, 10% water, 0.02% trifluoroacetic acid) and 30% MS Buffer A (95% water, 0.1% formic acid, 5% acetonitrile, 0.02% trifluoroacetic acid). Eighty microliters of MS Buffer A was added to the eluent, and 40 µl of sample was loaded onto a 3 cm C18 trap column (with an inner diameter of 150 µm; New Objective), packed in-house with 5 µm Pursuit C18 (Varian Inc.). A 96-well microplate autosampler was used for sample loading. Eluted peptides from the trap column were subsequently loaded onto a resolving analytical PicoTip Emitter column, 5 cm in length (with an inner diameter of 75 µm and 8 µm tip, New Objective) and packed in-house with 3 µm Pursuit C18 (Varian Inc, Palo Alto, CA). The trap and analytical columns were operated on the EASY-nLC system (Proxeon Biosystems, Odense, Denmark), and this liquid chromatography setup was coupled online to an LTQ-Orbitrap XL hybrid mass spectrometer (Thermo Fisher Scientific, San Jose, California) using a nanoelectrospray ionization (ESI) source (Proxeon Biosystems, Odense, Denmark). Samples were analyzed using a gradient of either 54 or 90 min (for 5 min pools, a 90 min gradient was used, and for 2 min, 3 min, and nonpooled samples, a 54 min gradient was used). Samples were analyzed in data-dependent mode: full MS1 scan acquisition from 450-1450 m/z occurred in the Orbitrap mass analyzer (resolution 60,000), whereas MS2 scan acquisition of the top six parent ions occurred in the linear ion trap (LTQ) mass analyzer. The following parameters were enabled: monoisotopic precursor selection, charge state screening, and dynamic exclusion. In addition, charge states of +1, >4, and unassigned charge states were not subjected to MS2 fragmentation.
Protein Identification-XCalibur software (version 2.0.5; Thermo Fisher) was used to generate RAW files of each MS run. The RAW files were subsequently used to generate Mascot Generic Files (MGF) through extract_msn on Mascot Daemon (version 2.2.2). Once generated, MGFs were searched with two search engines, Mascot (Matrix Science, London, UK; version 2.2) and X!Tandem (Global Proteome Machine Manager; version 2006.06.01), to confer protein identifications. Searches were conducted against the nonredundant Human International Protein Index database (version 3.62), which contains a total of 167,894 forward and reverse protein sequences, using the following parameters: fully tryptic cleavages, 7 ppm precursor ion mass tolerance, 0.4 Da fragment ion mass tolerance, allowance of one missed cleavage, fixed modifications of carbamidomethylation of cysteines, and variable modification of oxidation of methionines. The files generated from Mascot (DAT files) and X!Tandem (XML files) for the three replicates of each cell line or pancreatic juice pool were then integrated through Scaffold 2 software (version 2.06; Proteome Software Inc., Portland, Oregon), resulting in a nonredundant list of identified proteins per sample. Results were filtered separately for each individual cell line (three replicates combined) and pancreatic juice pool (three replicates combined) using the X!Tandem LogE filter and Mascot ion-score filters on Scaffold to achieve a protein false discovery rate (FDR) <1.0%. FDR was calculated as [2 × FP/(TP + FP)] × 100, in which FP (false positive) is the number of proteins that were identified based on sequences in the reverse database component and TP (true positive) is the number of proteins that were identified based on sequences in the forward database component. Protein isoforms and individual members of a protein family would be separately identified only if peptides that enable differentiation of isoforms were identified based on generated MS/MS data.
If such peptides were not identified, Scaffold software would group all isoforms under the same gene name. Different proteins that contained similar peptides and which were not distinguishable based on MS/MS data alone were grouped to satisfy the principles of parsimony.
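The FDR expression given above, [2 × FP/(TP + FP)] × 100, can be written as a small function. This is only a sketch of the stated formula; the function name and the example hit counts are hypothetical and are not taken from the study.

```python
def protein_fdr_percent(forward_hits, reverse_hits):
    """Protein FDR (%) from a forward/reverse database search.

    forward_hits: proteins matched to forward sequences (TP in the text).
    reverse_hits: proteins matched to reverse sequences (FP in the text).
    Implements FDR = [2 * FP / (TP + FP)] * 100.
    """
    total = forward_hits + reverse_hits
    if total == 0:
        raise ValueError("no protein identifications")
    return 2 * reverse_hits / total * 100


# Hypothetical example: 1995 forward and 5 reverse identifications.
example_fdr = protein_fdr_percent(1995, 5)
passes_filter = example_fdr < 1.0  # the study's threshold was FDR < 1.0%
```

With these illustrative counts the FDR is about 0.5%, which would satisfy the <1.0% threshold.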
Data Analysis-Scaffold prot-XML reports were generated and uploaded onto Protein Center Professional Edition (version 3.5.2.1; Proxeon Bioinformatics, Odense, Denmark) to facilitate comparisons between cell line CM and pancreatic juice proteomes, and to obtain Gene Ontology information. Specifically, cellular localization, function, and process annotations were extracted by Protein Center from the Gene Ontology (GO) Consortium (http://www.geneontology.org/GO.tools.shtml). Because of the large number of different GO annotations per localization, function, and process, Protein Center reduces terms to ~20 high-level terms that are used for filtering. Details can be found at http://tgh.proteincenter.proxeon.com/ProXweb/Help/Manual/apd.html. A Microsoft Excel Macro developed in-house by Dr. Irvin Bromberg, Mount Sinai Hospital, was also used for comparison of protein lists based on accession number or gene name. Hierarchical clustering analysis of proteomic data was performed using PermutMatrix, available freely online at http://www.lirmm.fr/~caraux/PermutMatrix/EN/index.html. PermutMatrix is software originally developed for gene expression analysis (48). More recently it has been used and validated for proteomics (49). For clustering analysis, a standardization method described previously (32) was adapted, in which average exponentially modified protein abundance index (emPAI) values from the triplicate analysis of the samples were exported from Protein Center into a space delimited Microsoft Excel file. For visualization, comparison and data analysis purposes, cell line or pancreatic juice samples with missing emPAI values for a particular protein were assigned half the minimum emPAI value for that protein in the data set (32). The emPAI values were imported into PermutMatrix and transformed to Z score values for normalization.
Two-way hierarchical clustering analysis was performed using the Pearson and Ward's minimum variance methods for distance and aggregation, respectively. Resultant dendrograms with cell lines and pancreatic juice samples on the x axis and gene name on the y axis were exported. Last, Ingenuity Pathway Analysis (IPA, Ingenuity Systems, www.ingenuity.com), which uses a knowledge-base from literature, was used to obtain disease associations of groups of proteins and their associated pathway interactions.
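The preprocessing applied before clustering (half-minimum imputation of missing emPAI values, then Z score transformation) can be sketched as follows. This illustration uses the Python standard library rather than PermutMatrix, assumes that Z scores are computed per protein across samples using the sample standard deviation, and uses made-up emPAI values; none of these details are specified in the text. The clustering step itself is not reproduced.

```python
from statistics import mean, stdev


def impute_half_minimum(values):
    """Replace missing values (None) with half the minimum observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("protein has no observed emPAI values")
    fill = min(observed) / 2
    return [fill if v is None else v for v in values]


def z_scores(values):
    """Standardize one protein's values across samples (mean 0, SD 1)."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation; an assumption
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Made-up emPAI values for one protein across four samples; None = missing.
raw = [2.0, None, 4.0, 1.0]
imputed = impute_half_minimum(raw)   # the missing value becomes 1.0 / 2
standardized = z_scores(imputed)
```

Imputing with half the minimum keeps a missing protein below every observed value for that protein without forcing it to zero.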
Label-free Protein Quantification-Relative label-free protein quantification analysis was conducted between the cancer cell lines and the HPDE normal pancreatic ductal epithelial cell line to ascertain proteins over- or under-expressed in the cancer cell lines based on spectral counting (50-52). The "Quantitative Value" function of Scaffold 2.06 software, which provides normalized spectral counts based on the total number of spectra identified in each sample, was used. One Scaffold file containing all of the normalized spectral counts of each of the three replicates from the seven cell lines was generated for proteins identified with two or more peptides. One-way ANOVA was conducted to determine proteins that show a significant difference among the seven cell lines (p < 0.05). For proteins that showed a p value <0.05, the average spectral count for the three replicates was calculated and fold-change was determined by dividing the average counts from each of the cancer cell lines by that of the normal HPDE cell line and vice versa. Not all proteins were identified in all of the cell lines, and unidentified proteins or missing values in a particular biological sample were assigned a normalized spectral count of 1 to keep from dividing by zero and to prevent overestimation of fold-changes. In addition, all proteins had to have been identified by ten or more spectra in at least one biological sample to be included. Percent coefficient of variance (%CV) was evaluated as a measure of variance for the triplicates of each sample, and proteins with ambiguous peptides were searched individually to ensure that normalization of spectral counts did not significantly alter assigned spectral count values. Quantification of multiple isoforms occurred only if the isoforms were identified as individual entries (i.e. with distinct gene names) through Scaffold Software.
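The fold-change step described above can be sketched in a few lines. This is an illustration only: the spectral counts are made up, the function name is hypothetical, and applying the count of 1 to a sample whose average is zero is one reading of the rule in the text. The ANOVA filter and the ten-spectra inclusion rule are not reproduced here.

```python
from statistics import mean


def fold_change(cancer_counts, normal_counts, missing_value=1.0):
    """Ratio of average normalized spectral counts (cancer / normal).

    A sample in which the protein was not identified (average count of 0)
    is assigned missing_value, which avoids division by zero and limits
    overestimation of the fold-change.
    """
    cancer_avg = mean(cancer_counts) or missing_value
    normal_avg = mean(normal_counts) or missing_value
    return cancer_avg / normal_avg


# Made-up triplicate counts for two hypothetical proteins.
absent_in_normal = fold_change([12, 15, 9], [0, 0, 0])    # 12 / 1
present_in_both = fold_change([30, 30, 30], [5, 6, 4])    # 30 / 5

# Threshold used in the study: overexpressed more than 5-fold versus HPDE.
overexpressed = [fc > 5 for fc in (absent_in_normal, present_in_both)]
```

Swapping the two arguments gives the reverse ratio, which corresponds to the "vice versa" comparison used to find under-expressed proteins.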
Plasma Samples-Blood samples were collected from pancreatic cancer patients at the Princess Margaret Hospital GI Clinic in Toronto, Canada, or from kits sent directly to consenting patients recruited from the Ontario Pancreas Cancer Study at Mount Sinai Hospital following a standardized protocol (age range 55-86; median age 68; 10 female and 10 male). Samples were collected with informed consent, and with the approval of the institutional ethics board. All cases were histologically confirmed adenocarcinomas of the pancreas and all samples were from patients not receiving chemotherapy at the time of collection. Samples from healthy controls were obtained from the Familial Gastrointestinal Cancer Registry. The controls are nonblood relatives of patients in the Familial Gastrointestinal Cancer Registry studies (age range 46-84; median age 60; 9 female and 11 male). Blood was collected in ACD (anticoagulant) vacutainer tubes and plasma samples were processed within 24 h of blood draw. To pellet the cells, blood samples were centrifuged at room temperature for 10 min at 1000 × g. Immediately after centrifugation, the plasma samples were aliquoted into 250-µl cryotubes and stored at −80°C or in liquid nitrogen until use in this study.
Statistical Analysis-Mann-Whitney U-tests were applied to verification experiments in plasma to determine if differences in the medians were significant between cancer and control groups using GraphPad Prism 4 Software. The five candidates that showed a statistically significant difference (p < 0.05) were then assessed in combination, in comparison to CA19.9, through ROC curve analysis. The areas under the curve (AUC) were calculated using ROCR software and the corresponding variances were calculated with a bootstrap method.
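To make the AUC and bootstrap variance concrete, the sketch below computes an empirical AUC as the Mann-Whitney probability that a case scores higher than a control, and estimates its variance by resampling. It is written in plain Python rather than ROCR, and it takes one score per subject; how the five markers were combined into a single score is not described in the text. The scores, the number of resamples, and the seed are all made up for illustration.

```python
import random


def auc(case_scores, control_scores):
    """Empirical AUC: P(case > control), with ties counted as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def bootstrap_auc_variance(case_scores, control_scores, n_resamples=1000, seed=0):
    """Variance of the AUC over bootstrap resamples drawn within each group."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        cases = [rng.choice(case_scores) for _ in case_scores]
        controls = [rng.choice(control_scores) for _ in control_scores]
        estimates.append(auc(cases, controls))
    center = sum(estimates) / len(estimates)
    return sum((e - center) ** 2 for e in estimates) / (len(estimates) - 1)


# Toy scores, not data from the study.
toy_cases = [3.1, 4.2, 5.0, 2.8]
toy_controls = [1.0, 2.9, 2.0, 3.0]

toy_auc = auc(toy_cases, toy_controls)  # 14 of 16 pairs favor the case: 0.875
toy_variance = bootstrap_auc_variance(toy_cases, toy_controls)
```

Resampling cases and controls separately keeps the group sizes fixed in every bootstrap replicate.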

RESULTS
Increasing Protein Yield-Once the cell lines were grown and CM collected, the samples were subjected to a two-dimensional (2D)-LC-MS/MS analysis, which combined SCX liquid chromatography on an HPLC system, followed by LC-MS/MS. A schematic of the workflow is provided in Fig. 1. Guided by our previous experience (27-30), SCX fractions were initially collected at 5-min intervals during peptide elution, resulting in approximately eight fractions that were analyzed using a ~2 h reverse-phase method on the LTQ-Orbitrap mass spectrometer. This resulted in the identification of 1305, 1468, and 1755 proteins (≥1 peptide) in the triplicate analysis of the BxPc3, HPDE, and MIA-PaCa2 cell lines, respectively (supplemental Table S1). In some of the individual 5-min fractions analyzed (specifically the fractions that contained the highest absorbance readings during SCX peptide elution) >700 proteins were identified per fraction (data not shown). Based on previous experience, this was a relatively large number of proteins to have been identified in individual fractions. Consequently, we opted to employ a different fraction collection and pooling strategy. By collecting fractions every minute from SCX and pooling fractions based on the intensity of peaks eluting on the SCX chromatogram (as described in the Experimental Procedures; supplemental Fig. S1), we identified 2017 proteins for the BxPc3 cell line, 2297 for HPDE, and 2756 for the MIA-PaCa2 cell line (≥1 peptide) subjected to the same growth and sample processing conditions, in triplicate. To ensure that this increase in protein identification was not because of variation in cell growth or sample collection, an additional replicate using MIA-PaCa2 CM left over from the initial analysis, which had been stored at −80°C, was also run and 2348 proteins were identified (a 52-56% increase from the individual replicates of the first run of MIA-PaCa2) (supplemental Table S1).
Over 90% of the proteins from the first analysis were reidentified and the new pooling strategy resulted in approximately a 54-58% increase in protein yield across the cell lines. This improved strategy was used for proteomic analysis of the remaining cell lines and the pancreatic juice samples.
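The increase in protein yield can be checked from the protein counts reported above for the two pooling strategies. The short sketch below does that arithmetic; the variable and function names are illustrative.

```python
# Proteins identified (>=1 peptide) with 5-min fractions and with the
# revised one-minute collection and pooling strategy, as reported above.
initial = {"BxPc3": 1305, "HPDE": 1468, "MIA-PaCa2": 1755}
revised = {"BxPc3": 2017, "HPDE": 2297, "MIA-PaCa2": 2756}


def percent_increase(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100


increases = {
    name: round(percent_increase(initial[name], revised[name]), 1)
    for name in initial
}
print(increases)  # {'BxPc3': 54.6, 'HPDE': 56.5, 'MIA-PaCa2': 57.0}
```

All three values fall within the approximate 54-58% range quoted in the text.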
Protein Identification through LC-MS/MS-Six human pancreatic cancer cell lines, the "near normal" HPDE cell line, and six pancreatic juice samples from ductal adenocarcinoma patients (in two pools) were analyzed in triplicate in this study (Fig. 2). Using both Mascot and X!Tandem search engines, between 2017 and 3250 proteins were identified in the seven cell lines and 1018 and 957 proteins were identified from pool A and B of pancreatic juice, respectively (Table I). These numbers represent proteins identified in the three replicates combined, with 1 or more peptides and with protein FDR of <1.0%. FDR was calculated as described in the Experimental Procedures section (55-57). For increased stringency and assurance of protein identification, only proteins identified with two or more peptides were included in the remainder of the analysis, resulting in between 1261 and 2171 proteins for each of the cell lines and a total of 648 nonredundant proteins from the pancreatic juice analysis. These data are summarized in Table I, and a complete list of proteins identified (≥2 peptides), including number of unique peptides, International Protein Index accession number, number of assigned spectra and percent sequence coverage, is presented in supplemental Table S2 for each cell line and pancreatic juice pool. Further and more specific details on peptide/protein identification are provided in supplemental Tables S3-S5 (supplemental Table S3: BxPc3, CAPAN1, and CFPAC1; supplemental Table S4: HPDE, MIA-PaCa2, and PANC1; supplemental Table S5: SU.86.86, pancreatic juice pool A, and pancreatic juice pool B).
FIG. 1. Schematic outline of proteomic analysis methodology. Sample processing, followed by strong cation exchange (SCX) and reverse-phase chromatography coupled online to an LTQ-Orbitrap mass spectrometer, and subsequent data analysis is outlined. CM, conditioned media; LC-MS/MS, liquid chromatography tandem mass spectrometry.
Protein Overlap Among Samples-From our combined analysis, a total of 12,805 proteins were identified with ≥2 peptides. The majority of these proteins were common to multiple biological samples and our study resulted in the identification of 3479 nonredundant proteins (3324 in the cell lines and 648 in the pancreatic juice analysis; supplemental Table S6). Six hundred and forty-four proteins (of 3324; 19.4%) were common to all cell lines and an average of 143 proteins were unique to each (Table II). Similar to previous findings (32), proteins common to an increasing number of cell lines were, on average, of higher abundance (Table II). From our preliminary studies of the three cell lines described in the "increasing protein yield" section, 81 additional nonredundant proteins (≥2 peptides) were identified. These were not included in the remainder of the analyses but have been included in supplemental Table S6 under a separate tab.
Significant overlap was noted between the pancreatic juice and CM proteins. Approximately 76% (493 of 648) of proteins identified in the pancreatic juice samples were also identified in the cell line analysis (Fig. 2B), which underscores the similarity in the proteomes among these biological fluids; however, many proteins that are largely associated with exocrine pancreatic function were unique to the pancreatic juice and were not identified in the cell lines. Analysis of overrepresented Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways through Protein Center (supplemental Table S7) further revealed the KEGG pancreatic secretion pathway (hsa04972; supplemental Fig. S2) to be one of three pathways overrepresented in the pancreatic juice proteome in comparison to the combined cell line proteome (p = 3.611 × 10⁻⁵) (58,59).
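The reported totals are mutually consistent, which can be confirmed by inclusion-exclusion on the counts given in the text. The sketch below performs that check; the variable names are illustrative.

```python
# Counts reported in the text (proteins identified with >=2 peptides).
cell_line_proteins = 3324   # nonredundant proteins across the seven cell lines
juice_proteins = 648        # nonredundant proteins in pancreatic juice
shared = 493                # juice proteins also found in the cell lines

# Inclusion-exclusion: size of the union of the two protein lists.
total_nonredundant = cell_line_proteins + juice_proteins - shared

# Share of the juice proteome also seen in conditioned media, and the
# number of proteins found only in pancreatic juice.
percent_juice_shared = shared / juice_proteins * 100
juice_only = juice_proteins - shared

print(total_nonredundant, round(percent_juice_shared), juice_only)  # 3479 76 155
```

The union of 3479 matches the total number of nonredundant proteins reported for the study, and the 155 juice-only proteins are the set described above as unique to the pancreatic juice.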
Gene Ontology-Function, Process, and Cell Localization Classifications-Gene ontology classifications, which include function, process, and cell localization, were obtained for all identified proteins. Proteins that are secreted into the extracellular milieu or cleaved from the plasma membrane of cells have the highest chance of entering the circulation and serving as serological biomarkers. Between 34.1 and 42.6% of proteins in each of the cell lines and 57% of proteins in the pancreatic juice samples were annotated as belonging to the extracellular and cell surface compartments (Table I). In total, 1376 (40%) of 3479 proteins were associated with these two annotations. The cytoplasm received the greatest number of annotations in both biological fluids and ~2.9% of the total contingent of proteins did not contain cell localization information and remained unannotated (Fig. 3A). It is important to note that proteins can be classified as belonging to multiple cellular localizations, processes, and functions and as a result, the categories for each are nonexclusive and the sum of the percentages can be >100%.
The top three molecular functions for the cell lines and the pancreatic juice were the same: protein binding (~80.2%, 79.9%), catalytic activity (69.4%, 70.2%), and metal ion binding (45.7%, 49.7%), respectively. Both fluids also shared the top two biological processes-metabolic process (81.2%, 83.5%) and regulation of biological process (61.9%, 63.1%), respectively (supplemental Fig. S3A, B). In a comparison between the cell line and pancreatic juice proteomes as a whole, several molecular functions related to enzyme activity and biological processes related to digestion and proteolysis were found overrepresented in the pancreatic juice proteome (Fig. 3B). The only GO category overrepresented in the cell line proteome was the biological process "biopolymer metabolic process" (GO:0043283; p = 4.053 × 10⁻⁷; FDR p = 2.524 × 10⁻³). No GO terms were over- or under-represented in a comparison between the cancer cell lines and HPDE. Specifics for localization, function, and process for each protein are provided in supplemental Table S6.
TABLE II footnotes.
a Indicates the number of proteins with two or more peptides that were commonly identified in the reported number of cell lines.
b A total of 1001 proteins were only identified in one of the seven cell lines; an average of 143 proteins were unique to each cell line.
c Average emPAI values (semiquantitative measure of protein abundance) were obtained from the triplicate analysis of each cell line for each protein and then an average was calculated for each protein if the protein was identified in multiple cell lines. Subsequently an average (and standard deviation) for each group of proteins common to the stated number of cell lines was calculated and reported. The proteins common to multiple cell lines had a higher average emPAI than those common to a small number of cell lines or unique to one cell line, indicating that proteins common to a larger number of cell lines were on average of higher abundance.
d Indicates the percentage of proteins common to the multiple cell lines that were also identified in the pancreatic juice proteome.

Hierarchical Clustering-One of the difficulties in dealing with large data sets is visualizing the proteomes as a whole and identifying subsets of proteins that might be of importance within certain biological contexts. In an initial attempt to mine and explore the CM and pancreatic juice proteomes, unsupervised two-way hierarchical clustering analysis was performed using average emPAI values of the three replicates of each sample, normalized through Z-scores. Through these means, proteins were clustered based on abundance within each sample. The concentrations of two proteins (KLK6 and KLK10) were assessed in the CM through ELISA to determine if Z-scores of emPAI values are a suitable indicator of protein abundance. Good correlation (r² = 0.74) was seen between Z-scores of ELISA concentrations and Z-scores of emPAIs (Fig. 4A) (supplemental Table S8). In addition, the lowest ELISA concentration measured was 0.80 μg/L for KLK10 in CAPAN1 CM, which indicates the sensitivity of our mass spectrometry analysis, in general, to be at least in the low μg/L range for the CM analysis.
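The comparison of Z-scored emPAI values with Z-scored ELISA concentrations can be sketched as follows. This is a minimal illustration, not the authors' code: the use of the sample standard deviation, the function names, and all numeric values are assumptions made for the example and are not study data.

```python
from math import sqrt
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to zero mean and unit sample standard deviation (assumed convention)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for one protein across seven cell lines (NOT study data).
empai = [0.10, 0.45, 0.30, 1.20, 0.80, 0.05, 0.60]
elisa_ug_per_l = [1.0, 5.2, 2.9, 14.0, 8.1, 0.9, 6.5]

r = pearson_r(z_scores(empai), z_scores(elisa_ug_per_l))
print(f"r^2 = {r ** 2:.2f}")
```

Because Z-scoring is a linear transformation, the correlation between the Z-scores of a single protein equals the correlation between the raw values; the normalization matters when several proteins with different concentration ranges are pooled in one plot, as in Fig. 4A.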
Hierarchical clustering analysis was performed on the entire data set of 3479 proteins and based on normalized emPAI values, the pancreatic juice samples were distinctly clustered separately from the cell lines, and within the cell lines, the three derived from metastatic sites (SU.86.86, CFPAC1, and CAPAN1) were clustered together. MIA-PaCa2, PANC1, and BxPc3 are cell lines derived from the primary tumor site of three patients (43). The MIA-PaCa2 and PANC1 proteomes were clustered together, as were the BxPc3 and HPDE cell lines. Heat-map visualization facilitated a first exploration of the data set and the identification of several regions or protein clusters of interest. Among them, two clusters containing 34 proteins were shown to be highly expressed in multiple cancer cell lines and the pancreatic juice samples, all with minimal expression in HPDE (Fig. 4B) (supplemental Table S9). This included proteins such as MUC1 (Mucin-1) (8) and RNASE1 (pancreatic ribonuclease) (60), which have been shown to be elevated in pancreatic cancer and studied previously as pancreatic cancer biomarkers in serum. This prompted us to further examine proteins that are differentially expressed between the cancer cell lines and the HPDE cell line. As an additional note, from Fig. 4B, the pancreatic juice protein expression profile appears more closely associated with that of the metastatic cell lines. In general, a two-sample test for equality of proportions with continuity correction showed that there was a closer association of the pancreatic juice proteome with the combined metastatic cell line proteome than with the combined primary cell line proteome (p value = 8.655 × 10⁻⁵).
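A two-sample test for equality of proportions with continuity correction can be written as a chi-square test on a 2 × 2 table with Yates' correction. The sketch below is a generic implementation of that test under this assumption; the function name and the counts passed to it are hypothetical, since the text does not report the counts that entered the comparison.

```python
from math import erfc, sqrt


def two_proportion_test(x1, n1, x2, n2):
    """Chi-square test (1 df) for equality of two proportions, with Yates' correction.

    x1 of n1 and x2 of n2 are the 'success' counts in the two groups.
    Returns (chi_square_statistic, p_value). Assumes no empty row or column.
    """
    a, b = x1, n1 - x1
    c, d = x2, n2 - x2
    n = n1 + n2
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * corrected ** 2 / (n1 * n2 * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts (NOT study data): proteins shared with group 1 vs. group 2.
stat, p = two_proportion_test(300, 648, 230, 648)
print(f"chi-square = {stat:.2f}, p = {p:.3g}")
```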
Differential Expression of Proteins in Cancer versus Normal Cell Lines-Normalized spectral counts of the cancer cell lines were compared with those of the normal HPDE cell line as described in the Experimental Procedures section. The Pearson correlation coefficient was evaluated for all pairs of the 21 replicates from the seven cell lines using normalized spectral count values (supplemental Table S10). With the exception of replicate 2 from CFPAC1, which showed 0.727 and 0.851 correlation with CFPAC1 replicates 1 and 3, good correlation (Pearson r ranging from 0.944 to 0.993) was seen among replicates of each cell line (including CFPAC1 replicates 1 and 3) indicating good reproducibility (supplemental Table S10).
Analysis of variance (ANOVA) identified 1293 proteins (each with a minimum number of 10 spectra in at least one cell line), with a statistically significant difference in normalized spectra among the seven cell lines (p < 0.05). Based on the criteria described in the Experimental Procedures, 483 of these proteins showed ≥fivefold increase in at least one cancer cell line compared with HPDE. These proteins, along with normalized spectral count values, %CV, fold-change and other details are provided in supplemental Table S11. One-hundred and nineteen proteins further demonstrated ≥fivefold increase in at least three cancer cell lines in comparison to HPDE, of which 53 proteins showed over 10-fold increase and 18 showed over a 20-fold increase in at least three cancer cell lines. Sixty-three of the 119 proteins were extracellular and cell surface-annotated and are listed in Table III. Pathway analysis showed cellular movement, cancer, and cell-to-cell signaling and interaction as the top pathways assigned to this set of proteins (supplemental Fig. S4). In addition, 17 of these proteins have been previously shown to be up-regulated in pancreatic cancer in at least four studies (61), and 10 have been shown to be elevated in pancreatic cancer serum in comparison to controls (60, 63-71) (Table III). The unstudied proteins might yield promising new candidate biomarkers for pancreatic cancer.
FIG. 4. Comparison of z-scores between emPAI values and ELISA concentrations for two control proteins, KLK6 and KLK10 (A). Good correlation was seen (R² = 0.74). Clustering analysis using Pearson and Ward's minimum variance methods for distance and aggregation (B) depicting the seven cell lines and two pancreatic juice pools on the x axis and proteins on the y axis. Shown is a segment of the resulting dendrogram depicting two clusters of proteins found highly expressed in the cancer cell lines and pancreatic juice (low expression in the normal HPDE cell line). See also supplemental Table S9 for z-score values for each protein in Fig. 4B. KLK6, kallikrein 6; KLK10, kallikrein 10; emPAI, exponentially modified protein abundance index; EC, extracellular; CS, cell surface.

Many of the proteins were also identified in a comprehensive database of human plasma proteins (72) [...]ied (supplemental Table S11). Examination of underexpressed proteins revealed 21 proteins consistently decreased at least fivefold in all six cancer cell lines and 46 consistently decreased in five cancer cell lines in comparison to HPDE (supplemental Table S11, separate tab). It should be noted that all proteins had to be identified with at least 10 spectra in one cell line to be included in the quantitative analysis described above and proteins with %CVs greater than 50% (for normalized spectral counts in triplicate analysis) were not included. Mean %CVs ranged from 13.2 to 21.8 for the proteins that showed over fivefold increase in the seven cell lines (supplemental Table S11). Percent CVs >50% were only accepted in comparison cell lines (i.e. comparison cell line refers to the normal HPDE cell line when overexpression between the cancer versus HPDE cell line was the focus of analysis, or each of the cancer cell lines when underexpression between the cancer versus HPDE cell lines was the focus of analysis) in which proteins were identified with <10 spectra in the comparison cell line.
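The inclusion criteria described above (a minimum of 10 spectra, %CV of at most 50% across triplicates, and at least a fivefold increase over HPDE) can be sketched as a simple filter. This is an illustration under stated assumptions rather than the procedure actually used: the spectra threshold is applied here to the mean normalized spectral count of the cancer cell line, and a pseudocount of 1 is added before taking the ratio so that proteins absent from HPDE do not cause division by zero. The function names and the pseudocount are hypothetical.

```python
from statistics import mean, stdev

MIN_SPECTRA = 10    # minimum spectra, applied to the cancer-line mean (assumption)
MAX_CV = 50.0       # maximum percent coefficient of variation across triplicates
MIN_FOLD = 5.0      # minimum fold increase over the comparison (HPDE) cell line
PSEUDOCOUNT = 1.0   # hypothetical; avoids division by zero when HPDE has no spectra


def percent_cv(replicates):
    """Percent coefficient of variation of triplicate normalized spectral counts."""
    m = mean(replicates)
    return 100.0 * stdev(replicates) / m if m > 0 else float("inf")


def is_overexpressed(cancer_reps, hpde_reps):
    """Return True if a protein passes the spectra, %CV, and fold-change criteria."""
    cancer_mean, hpde_mean = mean(cancer_reps), mean(hpde_reps)
    if cancer_mean < MIN_SPECTRA:
        return False
    if percent_cv(cancer_reps) > MAX_CV:
        return False
    # A high %CV is tolerated in the comparison line only when it has <10 spectra.
    if hpde_mean >= MIN_SPECTRA and percent_cv(hpde_reps) > MAX_CV:
        return False
    fold = (cancer_mean + PSEUDOCOUNT) / (hpde_mean + PSEUDOCOUNT)
    return fold >= MIN_FOLD


# Hypothetical triplicate counts (NOT study data).
print(is_overexpressed([50, 55, 45], [2, 0, 1]))
```

Counting, for each protein, the number of cancer cell lines for which this filter returns true would give the "at least three cancer cell lines" tally used to build a shortlist of the kind shown in Table III.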
Further Prioritization of Candidates through Integration of Biofluids and Tissue Specificity-Recent evidence suggests the integration of data from different biological fluids might yield stronger candidates for verification phases of biomarker discovery (15,16). As such, in addition to the differential protein expression analysis of the cell line CM, we also applied a comparative, or integrative, approach based on overlap of proteins between different biological sources and cellular localization for generation of additional candidates. Of the 488 proteins common to the pancreatic juice and cell lines (Fig. 2B), 235 had been annotated to extracellular and cell surface compartments. One-hundred and nine of these proteins were also identified in the proteome of ascites fluid from three patients with pancreatic adenocarcinoma (Kosanam H., Makawita S., Judd B., Newman A., and Diamandis E.P. Mining the malignant ascites proteome for pancreatic cancer biomarkers, unpublished data), and of these, 43 were not identified in the HPDE cell line (supplemental Table S12).
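The successive filters in this integrative approach amount to a chain of set operations. The sketch below illustrates the idea with placeholder identifiers; the function name, argument names, and toy sets are hypothetical and do not correspond to proteins in the study.

```python
def integrative_candidates(juice, cell_line_cm, ascites, hpde, extracellular_or_surface):
    """Apply the overlap filters in sequence and return the surviving protein set."""
    shared = juice & cell_line_cm                    # common to juice and cell line CM
    shared_ec = shared & extracellular_or_surface    # annotated extracellular / cell surface
    in_ascites = shared_ec & ascites                 # also seen in the ascites proteome
    return in_ascites - hpde                         # not identified in the normal cell line


# Toy example with placeholder identifiers (NOT study data).
juice = {"P1", "P2", "P3", "P4", "P5"}
cell_line_cm = {"P1", "P2", "P3", "P4", "P9"}
ascites = {"P1", "P2", "P3", "P7"}
hpde = {"P3", "P9"}
extracellular_or_surface = {"P1", "P2", "P3", "P5"}

print(sorted(integrative_candidates(juice, cell_line_cm, ascites, hpde,
                                    extracellular_or_surface)))
```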
Because there might be pertinent proteins in the pancreatic juice that might not be identified in the CM and vice versa, proteins identified in either proteome that were shown to be highly expressed in or highly specific to the pancreas were also included. To examine tissue specificity, we used microarray, expressed sequence tags (ESTs), and immunohistochemistry data using TiSGeD (39), TiGER (40), Unigene (41), and the Human Protein Atlas (42) databases, respectively. Specifically, we compared our list of identified proteins to 150 proteins reported as specific to pancreas tissue using TiSGeD specificity measure >0.90 (39), 55 "pancreas-restricted" proteins from Unigene (41), 205 proteins preferentially expressed in the pancreas based on the TiGER database, and 198 proteins showing "strong" pancreatic exocrine cell staining and annotated on the Human Protein Atlas. Twenty proteins were common to three or more of the databases, of which two proteins, PRSS1 and SPINK1, were identified in the cell line CM, and 15 proteins (which included PRSS1 and SPINK1) were identified in the pancreatic juice proteome (Table IV). Twelve of these proteins have been previously shown to be elevated in serum and plasma of patients with pancreatitis or pancreatic cancer (74-84), leaving CTRC (chymotrypsin C), SYCN (syncollin), and REG1B (Lithostathine-1-beta) (Table IV).

Candidate Verification in Plasma-Several candidates identified through the differential protein expression, integration of multiple biological fluids, and tissue specificity analyses, as described above and listed in Tables III and IV and supplemental Table S12, and not previously shown to be increased in plasma and serum from pancreatic cancer patients, were verified based on availability of enzyme-linked immunosorbent assays.
Specifically, seven proteins were verified, of which five, Anterior Gradient Homolog 2 (AGR2), Olfactomedin-4 (OLFM4), Syncollin (SYCN), Collagen alpha-1(VI) chain (COL6A1), and Polymeric Immunoglobulin Receptor (PIGR) showed a significant increase in plasma from pancreatic cancer patients in our sample set (Fig. 5). Two other proteins (NUCB2, nucleobindin-2; PLAT, tissue-type plasminogen activator) did not show a significant increase in pancreatic cancer plasma (data not shown).
In the CM analysis, AGR2 showed over 10-fold increase in the BxPc3, CAPAN1, CFPAC1, and SU.86.86 cell lines compared with the near normal HPDE cell line (Table III). As well, AGR2 was common to the CM and pancreatic juice proteomes and was identified in the cluster of proteins highly expressed in many cancer cell lines and pancreatic juice in comparison to HPDE (Fig. 4B). In plasma, AGR2 levels were significantly increased in pancreatic cancer patients (p < 0.0001) in comparison to controls (Fig. 5A). Mean and median plasma levels in the pancreatic cancer patients were 8.8 μg/L and 2.1 μg/L, whereas mean and median levels in controls were 0.33 μg/L and 0.28 μg/L.
OLFM4 was a protein identified based on the integrated method (supplemental Table S12), and also identified in the cluster shown in Fig. 4B. In the plasma samples, OLFM4 also showed a significant elevation (p < 0.0001) in cancer (mean = 161 μg/L, median = 90 μg/L) in comparison to controls (mean = 51 μg/L, median = 38 μg/L) (Fig. 5B). SYCN was a protein identified solely in the pancreatic juice samples. It is monospecific to the pancreas based on TiGER, TiSGeD, and Unigene databases (data were unavailable in the Human Protein Atlas) (Table IV). This protein is part of the secretory granule membranes of the exocrine pancreas, and because of its tissue specificity, it was identified for verification. In plasma, SYCN also showed a significant increase in pancreatic cancer patients (p = 0.0011; mean cancer = 18.2 μg/L, median cancer = 13.5 μg/L; mean controls = 5.1 μg/L, median controls = 2.9 μg/L) (Fig. 5C).
TABLE IV (partial). Protein names: CPA1, Carboxypeptidase A1; cDNA FLJ53709, highly similar to Carboxypeptidase A1; Isoform Alpha of Pancreatic secretory granule membrane major glycoprotein GP2; Isoform 1 of Pancreatic secretory granule membrane major glycoprotein GP2; Pancreatic secretory trypsin inhibitor.

COL6A1 was expressed over 20-fold in all of the cancer cell lines except for the BxPc3 cell line in comparison to the HPDE cell line. Similarly, PIGR was expressed over 20-fold in three of the cancer cell lines (Table III). Both proteins showed a significant increase in pancreatic cancer plasma in our preliminary analysis (p = 0.0098; mean cancer = 3.3 mg/L, median cancer = 2.1 mg/L; mean controls = 1.5 mg/L, median controls = 0.73 mg/L for COL6A1 and p < 0.0001; mean cancer = 16.8 mg/L, median cancer = 12.3 mg/L; mean controls = 9.2 mg/L, median controls = 8.96 mg/L for PIGR) (Fig. 5D, 5E). At present, CA19.9 is the most widely used pancreatic cancer biomarker and CA19.9 levels were also assessed in our screening set of plasma samples. Because we used mostly pancreatic cancer patients with established disease, the AUC of CA19.9 alone in this set of samples was 0.97 (Fig. 6A). The AUCs for each of our five newly discovered biomarkers alone ranged from 0.74 to 0.95 (Fig. 6A). Each one of our biomarkers was able to improve the CA19.9 AUC to 1.00 (CA19.9 + AGR2, data not shown) or to 0.98 (Fig. 6B). The scatterplot of Fig. 6C shows that AGR2 was able to identify two patients with pancreatic cancer who did not show a CA19.9 elevation (Fig. 6D). Fig. 6E shows that our panel of five newly discovered biomarkers is slightly more discriminatory than CA19.9 alone in this sample set.
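For a single marker, the area under the ROC curve equals the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half. The sketch below computes the AUC this way; it is a generic illustration with hypothetical concentrations, not the software or data used for Fig. 6.

```python
def roc_auc(cases, controls):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical plasma concentrations (NOT study data).
cases = [2.1, 8.8, 0.9, 15.0, 3.4]
controls = [0.3, 0.2, 1.1, 0.4, 0.5]
print(f"AUC = {roc_auc(cases, controls):.2f}")
```

An AUC of 1.00 corresponds to complete separation of cases from controls, and 0.5 to no discrimination.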

DISCUSSION
The proteome presented in this article is, to our knowledge, one of the largest and most comprehensive proteomes to date for pancreatic cancer-related biological fluids in a single study.
Although many notable differences exist, the genomic and transcriptional make-up of cancer cell lines have been shown to recapitulate many salient aspects of primary tumors (43,44,85,86). In addition, the identification of many known biomarkers in the conditioned media of cancer cell lines for numerous cancer sites, supports that it is a good source to mine (37,87). Previously, our group has characterized the CM of breast, ovarian, prostate, and lung cancer-related cell lines using three to four cell lines per cancer site (27)(28)(29)(30). Using an LTQ mass spectrometer, 1139, 1830, 2124, and 2039 proteins were identified with at least one peptide in the breast (28), lung (29), prostate (30), and ovarian cancer (27) analyses, respectively. Given the vast heterogeneity of cancer, from our previous work we concluded that a larger number of cell lines per cancer site, as well as study of proximal biological fluids from patients, might provide a more complete picture of disease heterogeneity and the tumor-host interface, thereby facilitating the identification of stronger candidates for verification.
In the present study, we applied such an approach to pancreatic cancer. By using 2D LC-MS/MS, we characterized the proteomes of conditioned media from six pancreatic cancer cell lines, one near normal pancreatic ductal epithelial cell line and six pancreatic juice samples in two pools. All experiments were performed in triplicate and multiple search engines (MASCOT and X!Tandem), which employ different search algorithms, were used for protein identification. Previously, it has been reported that use of multiple search engines results in increased confidence of the proteins identified (88, ...).

In the first part of the study, an increase in protein yield of ~50% was achieved by applying a prefractionation strategy that was tailored to the SCX elution profile. SCX was the first dimension of fractionation in our multidimensional approach. Different modes of fractionation from isoelectric focusing (IEF) to SDS-PAGE fractionation and SCX have been previously compared, with different studies reporting different methods as the most effective, when coupled with MS analysis (90-93). Fractionation of complex samples before MS analysis minimizes sample complexity and allows deeper mining into the proteome, thereby achieving increased coverage of proteins. A corollary of increased fractionation is decreased throughput. In the present study, reduced gradient times during the second dimension of separation (reverse-phase) helped to keep any increase in analysis time to a minimum.
Not all proteins identified in shotgun proteomics-driven discovery approaches will be suitable as serological biomarkers, and one of the challenges in the field is in the selection of the most promising candidates for further investigation. In the present study, we used three strategies: (1) semiquantitative analysis through label-free protein quantification between the cancer and normal cell lines, (2) integrative and comparative analysis of the multiple biofluids, and (3) tissue specificity analysis. Label-free approaches typically employ chromatographic ion intensity-based methods or spectral count-based means to obtain relative quantification of proteins among LC-MS/MS run samples (50,52). Further approximations of absolute protein abundance can be obtained through reported indices such as emPAI and absolute protein expression (APEX) (94,95). Normalized spectral counts have been reported previously to be reliable indicators of protein abundance in studies comparing different label-free methods, and strong correlation between spectral counts and protein abundance have been shown (51). When restricting analysis to proteins identified with five or more spectra, results comparable to label-based approaches are obtainable (96). In the present study, this method was used for relative quantification between the cancer cell lines and the HPDE cell line.
FIG. 6. Receiver Operating Characteristic curve analysis for CA19.9 and candidates and scatter plot of CA19.9 and AGR2. AUC (area under curve) is given at 95% confidence intervals. AUC of CA19.9, AGR2, OLFM4, SYCN, COL6A1, and PIGR is depicted individually (A). CA19.9 performs best individually in this sample set of 20 cancer and 20 controls (AUC of 0.97). The combination of each candidate with CA19.9 shows improved AUC to CA19.9 alone (B). The combination of AGR2 and CA19.9 produced a complete separation of cases from controls in this sample set and was not modeled. Instead, the combination of AGR2 and CA19.9 is depicted in the log-transformed scatterplot showing complete separation of cases (blue circles) from controls (red crosses) (C) in comparison to the separation of cases from controls for CA19.9 alone (D). Note that two cancer samples which had been missed by CA19.9 had elevated levels of AGR2. The combination of AGR2, OLFM4, SYCN, COL6A1, and PIGR in panel shows an improvement to the AUC of CA19.9 alone (E).

Using the criteria outlined in the Experimental Procedures, 119 proteins were found to be consistently expressed over fivefold in at least three cancer cell lines (compared with the HPDE cell line), 63 of which were extracellular or cell surface annotated (Table III). Included in this list were many proteins previously shown to be up-regulated in pancreatic cancer. For instance, the protein GDF15, also known as macrophage inhibitory cytokine 1 (MIC1), showed >10-fold increase in the CAPAN1, CFPAC1, PANC1, and SU.86.86 cell lines. Increased GDF15 mRNA and protein levels have been shown previously in pancreatic cancer tissue in comparison to adjacent normal controls (66) and evaluation of this protein in serum has also shown it to have diagnostic potential (67).
Similarly, neutrophil gelatinase-associated lipocalin (LCN2) (71), matrix metalloproteinase 7 (MMP7) (68), complement component 3 (C3) (63,70), and leucine-rich alpha-2-glycoprotein (LRG1) (97) have been reported to be elevated in serum of pancreatic cancer patients, whereas mesothelin (MSLN) (98), tissue-type plasminogen activator (PLAT) (99), C-X-C motif chemokine 5 (CXCL5) (100) and other proteins highlighted in Table III have been shown to be up-regulated in pancreatic cancer or pancreatic neoplasia at the level of tissue and/or mRNA. In addition, other proteins, such as Lysyl oxidase-like 2 (LOXL2), have been associated with pancreatic cancer pathology and tumorigenicity. For instance, when inhibited, LOXL2 has been shown to result in reduced viability of pancreatic cancer cells and their increased sensitivity to chemotherapy (101).
Identification of these proteins lends some credence to our label-free discovery approach; however, proteomic comparisons between nonmalignant and malignant biological sources are limited by the possibility that the observed differences arise from many factors and not solely from differences in tumorigenic potential. This was demonstrated by the fact that two of the seven proteins verified did not show a significant increase in plasma from pancreatic cancer patients. These proteins, PLAT and NUCB2, were expressed over fivefold in three and one of the pancreatic cancer cell lines, respectively; however, this failed to translate into our plasma analysis (individual value plots for these two proteins are not shown; fold change in CM is available in supplemental Table S11).
We further investigated three other proteins, AGR2, PIGR, and COL6A1, which showed over fivefold increase in four, three, and five cancer cell lines, respectively (Table III). Our analysis of these proteins in human plasma also showed a significant increase in pancreatic cancer patients. Except for AGR2, to the best of our knowledge, neither of these proteins has previously been studied in sera/plasma of pancreatic cancer patients. AGR2 is an ortholog of the Xenopus laevis protein XAG-2, which is a protein shown to play a role in ectodermal patterning (102). The function of AGR2 in normal human states is largely unknown; however, in human cancers, AGR2 has been associated with several cancer types (103-105) and recently, increased AGR2 levels were reported in pancreatic juice (62). In this latter study, Chen et al. used quantitative proteomics to profile pancreatic juice samples from pancreatic intraepithelial neoplasia (PanIN) patients in comparison to controls, and AGR2 was one of the proteins with an over twofold increase in PanIN-stage III. Although Chen et al. found diagnostic relevance for AGR2 in pancreatic juice, their analysis in six paired serum and pancreatic juice samples from PanIN patients found no correlation between serum and pancreatic juice AGR2 levels. Further analysis by this group in serum of nine pancreatic cancer and nine cancer-free controls likewise showed no significant difference in AGR2 levels (62). Despite this, given that AGR2 was highly elevated in the majority of cancer cell line CM (based on spectral counting in this study) and was also identified in pancreatic juice, we tested its levels in our screening set of plasma samples by ELISA and found a significant elevation in AGR2 levels in pancreatic cancer plasma versus controls (Fig. 5A).
AGR2 has been previously shown to play a role in invasion and metastasis (103,106,107), and it might be that elevated levels of this protein occur in blood in the later stages of pancreatic cancer; however our initial results warrant further evaluation of this protein in plasma and sera in larger sample sets.
PIGR has been shown previously through MRM to be increased in endometrial cancer tissue homogenates (108); however it has not been studied in clinical samples from many other cancer sites. In the present study we demonstrate its significant increase in pancreatic cancer plasma. COL6A1 is an important component of microfibrillar network formation, associating closely with basement membranes in many tissues. It is an extracellular matrix protein also found in stromal tissue (109). Mutations in this gene play a role in muscular disorders and differential COL6A1 gene expression has been associated with astrocytomas (110,111); however it has not been studied in pancreatic cancer and was found to be significantly increased in our preliminary assessment in plasma. Taken together, the increased levels of these proteins in pancreatic cancer plasma demonstrate the utility of our label-free differential protein quantification approach to identify proteins relevant for study as potential serological biomarkers of pancreatic cancer.
The identification of cancer-derived protein alterations through integration of different biological sources is another area of interest in cancer proteomics and the integrative mining of multiple biological fluids might result in the identification of relevant candidates (15). For instance, in a recent analysis done in our laboratory (16), which compared the proteins and genes identified in six publications chosen arbitrarily to represent various biological sources and both proteomic and genomic data pertaining to ovarian cancer (two cell line CM studies, two ascites, one tissue proteomics study, and one microarray study), no proteins were found common to all six; however, two proteins were found common to four of the studies. The proteins identified were WAP four-disulfide core domain protein 2 precursor (HE4) and GRN (granulin). Both have been implicated in ovarian cancer and WAP four-disulfide core domain protein 2 precursor is a recently FDA-approved ovarian cancer biomarker (112). In this respect, we also looked at proteins common to the cancer CM and pancreatic juice for identification of further candidates. These proteins were also compared with a pancreatic cancer ascites proteome (Kosanam H., Makawita S., Judd B., Newman A., Diamandis E.P. Mining the malignant ascites proteome for pancreatic cancer biomarkers, unpublished data) for additional filtering. Most, if not all, current biomarkers, such as prostate specific antigen for prostate cancer, CA125 for ovarian cancer, hCG for testicular cancer, etc., are secreted and shed proteins. Consequently, special focus was given to extracellular and cell surface proteins, as they have the highest likelihood of entering into the circulation (14,113). Focus was also given to proteins highly or specifically expressed in the pancreas. If a protein is only expressed in one tissue in healthy individuals, that tissue is likely the only contributor to endogenous serum levels of that protein.
As such, increasing serum contributions of such a protein by a growing tumor might be more easily detected. Furthermore, many good currently used cancer biomarkers, such as prostate specific antigen, are highly specific to one tissue (38).
Interestingly, the great majority of pancreas-specific proteins (as denoted by several databases) were unique to the pancreatic juice and were not identified in the cell lines (Table IV). Similarly, the KEGG pancreatic secretion pathway was overrepresented in the pancreatic juice proteome (supplemental Fig. S2). In the exocrine pancreas, acinar cells are responsible for secretion of enzymes (zymogens) whereas ductal cells secrete primarily an alkaline fluid (114,115). Although the majority of pancreatic cancers are ductal adenocarcinomas with pancreatic ductal cell-like properties, the cell of origin of these cancers is still unclear (116,117). Previously, it has been shown that acinar cells, once having undergone a transformation to ductlike cells, show a reduced secretion of zymogens (117). The lack of pancreas-specific proteins (enzymes, zymogens, etc.) in the cell line CM likely reflects the ductal-like nature of the cell lines, whereas the presence of such proteins in the pancreatic juice might reflect its acinar cell contributions.
Among the proteins common to all three biological fluids (supplemental Table S12) were several proteins shown previously to be increased in the serum of pancreatic cancer patients and studied as pancreatic cancer biomarkers, such as MUC1 (mucin 1) and CEACAM5 (CEA) (6-8). Two proteins identified through integration of the biological fluids and tissue specificity analysis, not previously assessed in the serum/plasma of pancreatic cancer patients, OLFM4 and SYCN, were verified in this study. OLFM4 has been shown to promote proliferation in the PANC1 cell line by Kobayashi et al. (118). Its mRNA levels were shown to be elevated in five cancerous versus noncancerous pancreatic tissue samples in the same study, and OLFM4 serum protein levels have shown potential diagnostic utility for gastric cancer (119). In this study, OLFM4 showed over fivefold expression in the CAPAN1 cell line in comparison to the HPDE cell line. It was also identified by us in the pancreatic juice and ascites and our preliminary assessment shows that it is significantly increased in plasma from pancreatic cancer patients (Fig. 5B). Syncollin is a zymogen granule protein specific to the pancreas and is believed to play a role in the concentration and/or efficient maturation of zymogens (120). Syncollin has been previously identified through mass spectrometry in human pancreatic juice and shown to be increased in the quantitative proteomic analysis of plasma from a murine pancreatic cancer model (22,121); however, little is known about the role of this pancreas-specific protein in pancreatic cancer and other pathologies. Our data show that it is significantly elevated in human pancreatic cancer plasma through ELISA (Fig. 5C).
The growing consensus in this field is toward the development of panels of biomarkers, as the combined assessment of multiple molecules can result in increased sensitivity and specificity in comparison to the assessment of molecules individually. In the present study, this was preliminarily demonstrated, as the combined assessment of AGR2, OLFM4, PIGR, SYCN, and COL6A1 showed improved AUC compared with CA19.9 alone. CA19.9 has reported sensitivity and specificity values between 70 and 90% (median ∼79%) and 68 and 91% (median ∼82%), respectively, for detection of pancreatic cancer (note: sensitivity decreases to ∼55% in early-stage disease and CA19.9 is often undetectable in asymptomatic individuals; specificity decreases with benign disease) (3). In the present study, CA19.9 showed a very high AUC (0.97), likely because the cancer plasma samples used were from patients with established (primarily late-stage) pancreatic ductal adenocarcinoma. We used such samples because the goal of the present study was to determine the utility of our approach for identifying proteins increased in serum/plasma of pancreatic cancer patients and to preliminarily examine whether the candidates are informative. Our marker panel requires further validation with samples that have low or normal CA19.9 values and that include patients with early-stage disease, as well as patients with benign abdominal pathologies. Of considerable note was the ability of one of our proteins (AGR2) to supplement CA19.9 and raise its sensitivity, in this small series, from approximately 90% to 100%, at 100% specificity (Fig. 6C, 6D).
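To make the reasoning about supplementing one marker with another concrete, the following is a minimal sketch in Python. It assumes a simple "either marker positive" rule with each cutoff set at the highest control value (so that specificity stays at 100%), and it computes AUC as the Mann-Whitney statistic. The combination rule, the function names, and all numbers are illustrative assumptions; the values are synthetic and are not data from this study, and the sketch is not the statistical procedure used to generate Fig. 6.

```python
# Synthetic, arbitrary-unit values for two hypothetical markers (not study data).
cases_a = [50, 80, 12, 95, 70]
controls_a = [10, 15, 20, 8, 12]
cases_b = [5, 6, 30, 4, 7]
controls_b = [3, 9, 5, 6, 8]


def cutoff_at_full_specificity(control_values):
    """Cutoff at which no control is called positive (positive means value > cutoff)."""
    return max(control_values)


def sensitivity(case_values, cutoff):
    """Fraction of cases whose value exceeds the cutoff."""
    return sum(v > cutoff for v in case_values) / len(case_values)


def combined_sensitivity(values_a, values_b, cutoff_a, cutoff_b):
    """Fraction of cases positive for marker A or marker B (assumed OR rule)."""
    hits = sum((a > cutoff_a) or (b > cutoff_b) for a, b in zip(values_a, values_b))
    return hits / len(values_a)


def auc(case_values, control_values):
    """AUC as the probability that a case outranks a control (ties count one half)."""
    wins = 0.0
    for c in case_values:
        for h in control_values:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


cut_a = cutoff_at_full_specificity(controls_a)
cut_b = cutoff_at_full_specificity(controls_b)
sens_a = sensitivity(cases_a, cut_a)                              # 0.8
sens_b = sensitivity(cases_b, cut_b)                              # 0.2
sens_ab = combined_sensitivity(cases_a, cases_b, cut_a, cut_b)    # 1.0
auc_a = auc(cases_a, controls_a)                                  # 0.9
print(sens_a, sens_b, sens_ab, auc_a)
```

In this toy example the second marker is weak on its own, yet it detects the one case missed by the first marker, which is the sense in which a complementary marker can raise sensitivity without lowering specificity.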
Three of the five proteins increased in pancreatic cancer plasma (AGR2, PIGR, and OLFM4) were also identified in relevant clusters through hierarchical clustering analysis. emPAI is another means of label-free protein quantification (94), so the identification of these three proteins, and of several others that were also identified through spectral counting, by emPAI-based quantification is not unexpected; it further corroborates our results. Recently, Wu et al. (32) used emPAI values of proteins normalized through z-scores for pathway-based biomarker discovery as part of their study of 23 human cancer cell lines. In the present study, we used normalized emPAI values of proteins to gain a preliminary understanding of the data set through hierarchical clustering analysis. The six cancer cell lines chosen for analysis in the study are well characterized and already highly studied. They contain many of the major genetic aberrations present in pancreatic cancer, such as mutations in Kras, SMAD4, CD16, and TP53 (43,44). Interestingly, the cancer cell lines derived from metastatic sites (SU.86.86, CFPAC1, and CAPAN1) clustered together, whereas MIA-PaCa2 and PANC1, which are cell lines derived from a primary tumor site, clustered together, as did the BxPc3 and HPDE cell lines. The pancreatic juice proteome was more closely associated with the metastatic cell lines than with the primary cell lines. This might be because the patients had progressed to late-stage (metastatic) disease at the time the pancreatic juice was obtained. BxPc3 is a cancer cell line derived from a primary tumor site, and HPDE is a widely used surrogate for normal pancreatic ductal epithelial cells. Incidentally, these two cell lines were the only ones with wild-type Kras expression (43), a gene that is mutated in the vast majority (>90%) of pancreatic cancers; however, firm conclusions cannot be drawn regarding the clustering without further investigation. Nonetheless, the identification of three of the five proteins verified in this study within relevant clusters suggests that clustering of normalized emPAI values is a potentially viable means for the generation of biologically relevant leads.
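As an illustration of the kind of analysis described above, the following minimal Python sketch computes emPAI values (10 raised to the ratio of observed to observable peptides, minus 1), normalizes them to z-scores, and performs average-linkage hierarchical clustering on Euclidean distances. Several choices here are assumptions rather than the settings used in this study: z-scores are taken within each sample across proteins, the distance and linkage choices are arbitrary, and the sample names and peptide counts are synthetic placeholders.

```python
import math
from itertools import combinations


def empai(n_observed, n_observable):
    """emPAI = 10^(observed peptides / observable peptides) - 1."""
    return 10 ** (n_observed / n_observable) - 1


def zscores(values):
    """Center and scale a list of values using the sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_distance(c1, c2, profiles):
    """Average pairwise distance between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they occur."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], profiles))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


# Synthetic peptide counts for four hypothetical proteins (not study data).
observable = [10, 20, 5, 8]
observed = {
    "sample_A": [5, 2, 4, 1],
    "sample_B": [6, 2, 4, 1],
    "sample_C": [1, 15, 0, 6],
}

# Assumption: normalize within each sample, across proteins.
profiles = {
    name: zscores([empai(obs, total) for obs, total in zip(counts, observable)])
    for name, counts in observed.items()
}
merges = average_linkage(profiles)
print(merges)
```

With these placeholder counts, the two samples with nearly identical peptide profiles merge first, and the dissimilar sample joins last. In practice a dedicated clustering library and a considered choice of normalization axis would be preferable.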
Pancreatic cancer has one of the lowest five-year survival rates (<5%) of all cancer types (1). This is largely associated with the existence of locally advanced or metastatic disease in the majority of patients at the time of diagnosis. Genomic sequencing studies reveal that a broad window might exist for the detection of pancreatic cancer between the initial stages of tumor development and dissemination to secondary sites (122). Here, we present the proteomic analysis of pancreatic cancer-related cell lines and pancreatic juice for the identification of novel diagnostic leads. Label-free protein quantification methods revealed a group of proteins differentially expressed in pancreatic cancer. Included within this group were numerous proteins previously studied as pancreatic cancer biomarkers and associated with pancreatic cancer pathology. Further candidates were generated through integrative analysis of multiple biological fluids and tissue specificity analysis. Through a preliminary assessment, five proteins (AGR2, OLFM4, PIGR, SYCN, and COL6A1) were shown to be significantly increased in plasma from pancreatic cancer patients. Appropriate validation of potential biomarkers requires the use of clearly defined clinical specimens, appropriate controls, and a large number of samples (preclinical, early- and late-stage, benign disease, healthy controls) (123). Our preliminary assessment warrants further validation of these five proteins in larger cohorts of samples (early- and late-stage pancreatic cancer, benign disease, and healthy controls), as well as consideration of these proteins in the development of biomarker panels for pancreatic cancer.
The current state of cancer proteomics boasts a large number of discovery studies that have generated many potential diagnostic and therapeutic leads; however, due in part to a lack of parallel high-throughput or multiplexed quantitative technologies, subsequent verification and validation of these leads is lagging. In this regard, the proteomic data set presented in this study, when combined with existing repositories or compendiums (61,124), might also aid in further prioritizing candidates for future diagnostic and therapeutic applications.