In-depth Proteomic Analysis of Nonsmall Cell Lung Cancer to Discover Molecular Targets and Candidate Biomarkers*

Advances in proteomic analysis of human samples are driving critical aspects of biomarker discovery and the identification of molecular pathways involved in disease etiology. Toward that end, in this report we are the first to use a standardized shotgun proteomic analysis method for in-depth tissue protein profiling of the two major subtypes of nonsmall cell lung cancer and normal lung tissues. We identified 3621 proteins from the analysis of pooled human samples of squamous cell carcinoma, adenocarcinoma, and control specimens. In addition to proteins previously shown to be implicated in lung cancer, we have identified new pathways and multiple new differentially expressed proteins of potential interest as therapeutic targets or diagnostic biomarkers, including some that were not identified by transcriptome profiling. Up-regulation of these proteins was confirmed by multiple reaction monitoring mass spectrometry. A subset of these proteins was found to be detectable and differentially present in the peripheral blood of cases and matched controls. Label-free shotgun proteomic analysis allows definition of lung tumor proteomes, identification of biomarker candidates, and potential targets for therapy.

that modern medicine has to offer, the five-year survival rate remains less than 15%. Although a small subset of tumors have been found to be driven by single mutated oncogenes for which active, but still noncurative, therapies are available, the vast majority of patients have complex multifactorial disease with few effective therapeutic options. New early detection strategies and molecular therapeutic targets are urgently needed to improve patient survival.
Genomic analysis has enabled us to measure the sequence, copy number, and expression changes of thousands of genes simultaneously, which can be used to discover transcripts specifically altered or expressed in tumor tissues (2)(3)(4). Although genomic studies have given important new insights into the mechanisms of carcinogenesis, therapeutic targets and most practical biomarkers are protein products, and the correlation between transcript sequence or level and protein function remains complex and poorly understood. Protein expression, in part, depends on transcript levels, but it is clear that significant translational and post-translational regulation of protein levels and function occurs, adding another level of complexity in the regulation of activity, especially in tumor cells (5,6). It would be ideal to have a comprehensive understanding of the novel changes in protein expression levels and the modifications of proteins in cancer cells, but the technology to directly study proteomes has lagged behind that to assess genomes and transcriptomes. We and others have used matrix-assisted laser desorption and ionization-time of flight mass spectrometry protein profiling to better understand protein expression pattern alterations and discover biomarkers, but the number of proteins detected is far from satisfactory. Matrix-assisted laser desorption and ionization-time of flight mass spectrometry-based proteomic profiling (7,8) yields only a couple of hundred anonymous signals predominantly derived from low-molecular weight and high abundance proteins, and identification of the proteins that generate these signals is problematic. Proteome analyses based on an alternative method using two-dimensional gel electrophoresis (9) are difficult to reproduce and typically yield only several hundred proteins that can be adequately compared between phenotypes.
Shotgun proteomic analysis based on multidimensional liquid chromatography-tandem mass spectrometry (LC-MS/MS) provides high-throughput peptide sequence identification of complex peptide mixtures (10). This approach has been successfully used for proteomic analysis not only of tissues (11)(12)(13)(14), but also of pleural fluid and plasma from lung cancer patients (15,16). The major advantage of this technique is sensitivity, with thousands of proteins directly identified in typical analyses (13). Detection of low abundance proteins is possible and quantitative information can be obtained from the spectral counts obtained for each peptide sequence (17)(18)(19). Recent studies by the National Cancer Institute Clinical Proteomic Technology Assessment for Cancer (CPTAC) network, in which we participate, have demonstrated high reproducibility and sensitivity of shotgun proteomics platforms (20,21). A related proteomic technology platform, liquid chromatography-multiple reaction monitoring mass spectrometry (LC-MRM-MS), provides targeted quantitative analyses of proteins through sensitive measurements of their component peptides (22).
In this report, we demonstrate not only that we can efficiently mine the proteome of lung tumors and noninvolved lung tissue specimens with great accuracy and sensitivity (3621 proteins identified, with rigorous definitions), but also that this approach uncovered a potentially important new molecular pathway in lung cancer progression (PAK2). Combining shotgun proteomic analysis of primary lung tumors with MRM analysis represents a powerful new approach to identify and quantify proteins and molecular pathways that may be altered in the pathogenesis of lung cancer. Finally, we show that we can translate differentially expressed proteins to an ELISA platform and interrogate the plasma of individuals at risk (including patients with lung nodules and COPD) or with lung cancer with reproducible prediction accuracy.

EXPERIMENTAL PROCEDURES
Study Subjects-We selected cases of stage I lung carcinomas in our lung tissue biorepository that were surgically resected with curative intent at the Vanderbilt Medical Center and the Nashville VA Medical Center between January 2001 and February 2007. All patients provided informed consent for participation and this project was approved by the Institutional Review Board at both institutions. All specimens were collected immediately after surgery, snap frozen, and stored in liquid nitrogen until the time of analysis to minimize the effects of storing and handling the tissue. Tissue specimens used in this analysis included cancer tissue from pathological Stage I lung cancer patients with no previous cancer history. In addition, normal lung tissue specimens were obtained from patients undergoing lung resection for suspicion of lung cancer but not carrying a diagnosis of lung or other cancer. We evaluated normal lung from cases resected because of clinical suspicion of lung cancer rather than adjacent normal-appearing lung from cancer resections in order to avoid referring to premalignant "field-cancerization" alterations, known to be present in lung cancer patients, as our "normal" control. These patients had similar demographic characteristics as shown in supplemental Table S1.
Shotgun Analysis Sample Preparation-To increase the efficiency of our analysis, pooled protein lysates from sets of 19 to 20 samples per phenotype were analyzed. Four protein lysate pools were generated: two from noninvolved lung tissue (normal control, n = 20 and n = 19, respectively), one from stage I adenocarcinomas (ADC, n = 20), and one from stage I squamous cell carcinomas (SCC, n = 20). Patient characteristics are described in supplemental Table S1. Hematoxylin and eosin (H&E) stained sections were reviewed by a pathologist (ALG) to identify areas containing tumor cells and to determine tumor percentage. Specimens were selected with at least 80% tumor cells. After aligning the H&E section with the tumor block, the tumor tissue was macrodissected with a razor blade.
Tissue proteins were extracted and digested by the method of Wang et al. (23). Five to 20 macrodissected sections of tissue were suspended in 200 µl of 50% 2,2,2-trifluoroethanol (Acros Organics, Belgium), 50% 50 mM ammonium bicarbonate (Fisher Scientific) (v/v). The tissue lysates were homogenized using sonication with three 20 s cycles at 30 s intervals, followed by incubation at 60°C for 1 h with shaking. After the 1-h incubation, the sonication cycle was repeated. After the second sonication cycle, the protein concentration of each individual tissue lysate was measured using the BCA protein assay (Pierce Biotechnology, Rockford, IL) with bovine serum albumin as a protein standard. A total of 1 mg of pooled protein lysate was created by adding equal amounts of protein from each individual sample from each of the four pools in a total volume of 50 µl each. These pooled lysates (1 mg) were reduced by diluting with 100 µl of 40 mM tris(2-carboxyethyl)phosphine hydrochloride (Pierce Biotechnology) with 100 mM dithiothreitol (Acros Organics) and incubating at 60°C for 30 min with shaking. After cooling down the tubes, 100 µl of 200 mM iodoacetamide was added, and the samples were incubated for 20 min at room temperature in the dark. Samples were diluted with 600 µl of 50 mM ammonium bicarbonate. To generate peptides suitable for MS/MS analysis, these pooled lysates were digested by adding trypsin (20 µg, trypsin/protein ratio of 1:50 (w/w), Promega, Madison, WI), and digestion was carried out at 37°C overnight.
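The pooling and digestion quantities above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the published protocol; the function names are hypothetical and only the 1 mg pool size, the pool sizes of 19 and 20 samples, and the 1:50 (w/w) trypsin/protein ratio are taken from the text.

```python
def per_sample_protein_ug(pool_total_ug: float, n_samples: int) -> float:
    """Equal protein contribution of each sample to a pooled lysate."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return pool_total_ug / n_samples


def trypsin_ug(protein_ug: float, ratio_denominator: float = 50.0) -> float:
    """Trypsin mass for a trypsin/protein ratio of 1:ratio_denominator (w/w)."""
    return protein_ug / ratio_denominator


pool_ug = 1000.0  # 1 mg of pooled lysate, as stated in the text

print(trypsin_ug(pool_ug))                 # trypsin needed at 1:50
print(per_sample_protein_ug(pool_ug, 20))  # contribution per sample, 20-sample pool
print(per_sample_protein_ug(pool_ug, 19))  # contribution per sample, 19-sample pool
```

At 1:50 (w/w), 1 mg of protein corresponds to the 20 µg of trypsin quoted in the protocol.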
After lyophilizing the resulting peptide mixture, samples were reconstituted in distilled water and applied to Sep-Pak C18 cartridges (Waters, Milford, MA). After washing the column with 1 ml of distilled water, digested peptides were eluted from the column with 1 ml of 80% acetonitrile. Eluted peptides were evaporated to dryness in a SpeedVac (Thermo-Fisher) and reconstituted with 2.5 ml of 6 M urea for isoelectric focusing (IEF) of peptides.
IEF Fractionation of Peptide Digests and LC-MS/MS Analyses-Four independent IEF peptide separations were performed on aliquots of each pool digest equivalent to 200 µg protein. Immobilized pH gradient (IPG) strips, 24 cm, pI 3.5-4.5 (IPGphor, GE Health Care, NJ), were rehydrated overnight, then loaded and focused using an Ettan IPGphor 3 IEF system (GE Health Care) for 25 h as described previously (13). Immediately after focusing, IPG strips were cut into 20 pieces and peptides were extracted. The extracts were dried down, desalted, dried down again, and then reconstituted in 0.1 ml of 0.1% formic acid for LC-MS/MS analysis (13).
LC-MS/MS analyses were performed on an LTQ-Orbitrap hybrid mass spectrometer (Thermo Fisher) equipped with an Eksigent 1D Plus NanoLC pump and autosampler (Dublin, CA). Peptides were separated on a packed capillary tip (Polymicro Technologies, 100 µm × 11 cm) with Jupiter C18 resin (5 µm, 300 Å, Phenomenex) using an in-line solid-phase extraction column (100 µm × 6 cm) packed with the same C18 resin using a frit generated with liquid silicate Kasil 1 (24) similar to that previously described (25). Mobile phase A consisted of 0.1% formic acid and mobile phase B consisted of 0.1% formic acid in acetonitrile. A 95 min gradient was performed with a 15 min washing period (100% A at a flow rate of 1.5 µl min⁻¹ for the first 10 min followed by a gradient to 98% A at 15 min) to allow for solid-phase extraction and removal of any residual salts. Following the washing period, the flow rate was reduced to 0.6 µl min⁻¹ and the gradient was increased to 25% B by 50 min, followed by an increase to 90% B by 65 min and held for 9 min before returning to the initial conditions. Centroided MS/MS scans were acquired on the LTQ-Orbitrap using an isolation width of 2 m/z, an activation time of 30 ms, an activation q of 0.250 and 30% normalized collision energy using 1 microscan with a max ion time of 100 ms for each MS/MS scan and 1 microscan with a max ion time of 500 for each full MS scan. The mass spectrometer was tuned prior to analysis using the synthetic peptide TpepK (AVAGKAGAR), so that some parameters may have varied slightly from experiment to experiment, but typically the tune parameters were as follows: spray voltage of 2 kV, a capillary temperature of 150°C, a capillary voltage of 50 V and tube lens of 120 V. The AGC target value was set at 500,000 for the full MS and 10,000 for the MS/MS spectra.
A full scan of eluting peptides in the range of 400-2000 atomic mass units (amu) was collected on the Orbitrap portion of the instrument at a resolution of 60,000, followed by five data-dependent MS/MS scans on the LTQ portion of the instrument with a minimum threshold of 1000 set to trigger the MS/MS spectra. MS/MS spectra were recorded using dynamic exclusion of previously analyzed precursors for 60 s with a repeat of 1 and a repeat duration of 1.
Peptide Identification from MS/MS Data, Protein Assembly, and Filtering-Captured peak lists from the mass spectral .RAW files were transcoded to mzML version 1.1 format by the ProteoWizard MSConvert tool (26). The software was configured to transcode only tandem mass spectra; MS scans were excluded. The MS/MS data were searched using the MyriMatch version 1.6.57 search algorithm (27) against the International Protein Index (IPI) human database version 3.64 supplemented with potential contaminant sequences for a total of 84,079 sequences in forward and reverse orientation. The search results were filtered and assembled using IDPicker version 2.0 (28). Any number of miscleavages was allowed and peptides were allowed to have one nontryptic end. A static modification for carbamidomethylation was defined for cysteines, whereas dynamic modifications reflecting oxidation of methionines and formation of N-terminal pyroglutamines were allowed. Precursor mass tolerance was set at 10 ppm and product ion mass tolerance was set at 0.5 m/z. Peptide identification stringency was set at a maximum of 2.5% reversed peptide identifications (5% overall peptide false discovery rate (FDR)) and a minimum of two unique peptides to identify a given protein within the full data set. IPI database entries that mapped to the identical set of peptide identifications were grouped into "protein groups," which consist almost exclusively of isoforms or identical proteins resulting from redundancy in the database (29). Because the majority of false identifications occur with low frequency and such low-count identifications are unlikely to yield statistically significant results, an additional filter was applied that removed all protein groups identified by seven or fewer spectra in the comparisons between normal control and ADC or SCC, and by three or fewer spectra in the comparison between ADC and SCC.
The full and unfiltered IDPicker output data set is provided as supplementary data (supplemental Table S2, see data sharing below) and includes a complete list of protein IDs and their sequence relationships, the number of distinct peptides and peptide coverage observed per protein, the number of spectra observed per protein, and full peptide sequences.
Comparison of Proteome Inventories Based on MS/MS Spectral Count Data-Previous results have shown that a frequency-based analysis approach using the number of observed spectra (spectral counting) provides a measure of protein concentration in complex protein mixtures, especially for more abundant proteins (17,30). We developed a statistical method to model and compare different shotgun data sets for proteins that were likely to be present at different levels in the samples analyzed (31). To account for the specific properties of spectral count data, this method uses a quasi-likelihood model (32), which has no restriction on distribution assumptions. This approach also accounts for the type of overdispersion and/or underdispersion usually observed in shotgun data. To protect against multiple comparison issues while simultaneously testing thousands of proteins, we applied the FDR method (33). Normalization between different runs was achieved by adding the number of confident identifications into the model as the offset. This serves as the size variable, which determines the number of opportunities for proteins to occur. The model generates quasi-p values for each of the protein entries in the data set and estimates an average spectral count () across the replicate analyses. Three separate comparisons were performed: the combined spectral counts from the two control groups were compared with (a) the pooled ADC data set and (b) the pooled SCC data set, and (c) the pooled ADC data set was compared with the pooled SCC data set. Proteins were considered statistically different if their spectral count log ratio was more than 3 (in the SCC versus normal comparison) or 2 (in the ADC versus normal and ADC versus SCC comparisons) and if their quasi p value was less than 0.01. The log ratio is the logarithm transformation of the ratio between the peptide counts of a protein in one pool over the same protein in the other pool.
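The selection logic described above can be illustrated with a short sketch. It does not reproduce the quasi-likelihood model of reference (31); it only shows a Benjamini-Hochberg adjustment, a log ratio normalized by run size (playing the role of the offset), and the cutoff rule. The function names, the base-2 logarithm, the optional pseudocount, and the two-sided reading of the log ratio cutoff are assumptions made for illustration.

```python
import math


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def normalized_log_ratio(count_a, count_b, size_a, size_b,
                         base=2.0, pseudocount=0.0):
    """Log ratio of spectral counts after dividing each by its run size.

    The base and the pseudocount are assumptions; a nonzero pseudocount
    is needed whenever either count is zero.
    """
    rate_a = (count_a + pseudocount) / size_a
    rate_b = (count_b + pseudocount) / size_b
    return math.log(rate_a / rate_b) / math.log(base)


def is_differential(log_ratio, quasi_p, min_log_ratio, max_p=0.01):
    """Cutoff rule from the text; the two-sided reading is an assumption."""
    return abs(log_ratio) > min_log_ratio and quasi_p < max_p


# Hypothetical protein: 80 spectra in one pool, 10 in the other, equal run sizes.
ratio = normalized_log_ratio(80, 10, 1000, 1000)
print(ratio, is_differential(ratio, 0.001, min_log_ratio=2))
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Dividing by run size before taking the ratio is the simplest analogue of an offset term; a full quasi-likelihood fit would additionally estimate a dispersion parameter from the replicate analyses.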
MRM Analysis-LC-MRM-MS Analyses-Extracts from unfractionated tryptic peptide digests from lung tissues were prepared as described above and 2 µl aliquots were analyzed on a TSQ Quantum Ultra mass spectrometer (Thermo-Fisher) equipped with a Thermo Surveyor solvent delivery system, autosampler, and a microelectrospray source, and with a 6 cm (150 µm inner diameter) fused silica capillary precolumn packed with C18 resin (5 µm, 300 Å) (Phenomenex Inc., Torrance, CA). Peptides were loaded on the precolumn and desalted for 6 min with 1% acetonitrile/0.1% formic acid at 1.8 µl min⁻¹ and then resolved by reversed-phase chromatography on an 11 cm fused silica capillary column (100 µm inner diameter) packed with the same C18 media at a flow rate of 1 µl min⁻¹. The mobile phase consisted of 0.1% formic acid in either HPLC grade water (A) or acetonitrile (B). Peptides were eluted with a linear gradient from 98% A at 6 min to 75% A at 45 min, then programmed to 50% A at 55 min to 10% A from 65-76 min and returned to 99% A from 76-85 min. For MRM analysis, 3-4 optimized transitions for each peptide from the corresponding proteins were monitored. Instrument parameters included Q2 gas 1.5 mTorr, scan width 0.002 m/z, and scan time 5 ms; Q1 and Q3 FWHM resolution were 0.2 and 0.7, respectively. Collision energy was continuously adjusted according to the relationship CE = 3.314 + 0.034 × precursor m/z. The integrated chromatographic peak areas for the transitions of each targeted peptide were summed. When data for multiple peptides were collected for a candidate protein, the reported values are for the peptide yielding the highest mean signal.
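The collision energy relationship and the peak area summation described above can be written out as a small sketch. The function names and the example peptide data are hypothetical; only the coefficients 3.314 and 0.034, the summation of transition areas, and the choice of the peptide with the highest mean signal come from the text.

```python
def collision_energy(precursor_mz: float) -> float:
    """CE = 3.314 + 0.034 x precursor m/z, as stated in the text."""
    return 3.314 + 0.034 * precursor_mz


def peptide_signal(transition_areas):
    """Sum the integrated peak areas of all transitions of one peptide."""
    return sum(transition_areas)


def reporting_peptide(signals_by_peptide):
    """Return the peptide whose summed signal has the highest mean across samples.

    signals_by_peptide maps a peptide label to a list of per-sample signals.
    """
    def mean(values):
        return sum(values) / len(values)

    return max(signals_by_peptide, key=lambda pep: mean(signals_by_peptide[pep]))


print(collision_energy(500.0))
print(peptide_signal([1200.0, 800.0, 500.0]))
# Hypothetical per-sample signals for two peptides of one candidate protein.
print(reporting_peptide({"peptide_1": [10.0, 14.0], "peptide_2": [30.0, 20.0]}))
```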
Peptide Selection and MRM Transitions-For each protein candidate, up to four proteotypic peptides from the shotgun data set having the highest spectral counts for a given charge state were selected for MRM analysis. The MS/MS spectrum with the greatest overall signal intensity for each selected peptide was extracted and the 3-5 most intense y ions were selected. Observed y ions were required to be within 1.0 m/z of the predicted values and ions within a window below 20 m/z of the precursor were excluded. Precursor m/z, selected fragment m/z and computed collision energies were saved in a text file, which was imported into the Xcalibur method MRM setting tables. Peptide sequences used for analysis are provided in a separate table (supplemental Table 3).
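The transition selection rule above can be sketched as a filter over a single MS/MS spectrum. This is an illustration under stated assumptions, not the script used in the study: the function name and the example spectrum are hypothetical, and the exclusion rule is read here as removing fragments whose m/z lies between 20 m/z below the precursor and the precursor itself, which is one possible interpretation of the text.

```python
def select_y_ions(observed, predicted_y_mz, precursor_mz,
                  max_ions=5, match_tol=1.0, exclusion_window=20.0):
    """Pick the most intense observed y ions for use as MRM transitions.

    observed: list of (mz, intensity) pairs from one MS/MS spectrum.
    predicted_y_mz: predicted y-ion m/z values for the peptide.
    """
    candidates = []
    for mz, intensity in observed:
        # Keep only peaks within match_tol of some predicted y ion.
        matches_y = any(abs(mz - pred) <= match_tol for pred in predicted_y_mz)
        # Assumed reading of the exclusion rule: drop the window just below the precursor.
        excluded = precursor_mz - exclusion_window <= mz <= precursor_mz
        if matches_y and not excluded:
            candidates.append((mz, intensity))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:max_ions]


# Hypothetical spectrum for a precursor at m/z 600.0.
spectrum = [(300.2, 50), (410.3, 200), (595.0, 500), (700.4, 120),
            (805.5, 90), (150.1, 10), (900.6, 30), (455.0, 999)]
predicted = [150.1, 300.2, 410.3, 595.0, 700.4, 805.5, 900.6]
for mz, intensity in select_y_ions(spectrum, predicted, 600.0):
    print(mz, intensity)
```

In this toy spectrum the peak at 595.0 is dropped because it falls in the excluded window, and the peak at 455.0 is dropped because it matches no predicted y ion, despite being the most intense.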
Webgestalt Cellular Component Analysis and GO Enrichment Analysis-To categorize the cellular compartments from which the identified proteins in our data set were derived, we used Webgestalt (34) cellular component analysis. To do this, we first converted the IPI accessions to Ensembl gene identifiers before applying the Webgestalt program. The second analysis characterized the proteins differentially expressed between the SCC and ADC pools. GOTM (35) was used for this GO enrichment analysis. For the analysis of proteins differentially expressed between ADC and SCC, we selected 54 proteins using a cutoff log ratio of more than two and a quasi p value of less than 0.01. We used all identified proteins in our data set as the reference data set. Enrichment analysis was performed with a statistical cutoff of 0.01 (adjusted p value, by Benjamini & Hochberg (33)).
PAK2 Expression in Nonsmall Cell Lung Cancers (NSCLCs)-A tissue microarray (TMA) of NSCLC was prepared from paraffin-embedded formalin-fixed tumor tissues from the patients whose frozen tissues were used for proteomic analyses. Paraffin-embedded formalin-fixed tissue blocks representing 20 normal lung, 19 adenocarcinoma, and 19 squamous cell carcinoma were used to construct the TMA following protocols described earlier (36). For all tissue blocks, H&E-stained sections were reviewed and the areas to be punched for array production were carefully marked. One millimeter diameter cores were punched in duplicate from the selected area of each specimen and inserted into a recipient paraffin block for a total of 116 cores from 58 patients. Five micron sections were cut from the TMA block and mounted onto charged slides. Immunohistochemical staining was performed by the avidin-biotin complex method using the Vectastain Elite ABC kit (Vector Laboratories, Burlingame, CA) as described previously (36). Slides were deparaffinized in xylene and hydrated with successive 100 and 95% ethanol washes (v/v). Antigen retrieval was performed in citrate buffer (DAKO, Carpinteria, CA) with heating by microwave for 10 min followed by 20 min at room temperature. Slides were washed with distilled water twice and placed in 3% H2O2 blocking solution for 10 min to inhibit endogenous peroxidase activity. After washing with distilled water, blocking serum was applied for 1 h. Slides were incubated with PAK2 antibody (diluted 1:100, Epitomics, Burlingame, CA) overnight at 4°C. Universal secondary antibody from the kit and the ABC Elite reagent were applied for 30 min each, followed by washing of the slides with TBST (1% Tween-20). Reactions were developed using diaminobenzidine (DAKO) and counterstained with hematoxylin.
A pathologist (ALG) scored each of the tissues represented on the TMA and each of the cores was classified as either positive or negative. A sample was defined as partially positive if one core was positive and the other core was negative. The immunostained TMA slide was also analyzed by an Ariol SL-50 automated slide scanner (Applied Imaging, San Jose, CA). Because the PAK2 antibody demonstrated primarily cytoplasmic staining, we trained the Ariol scanner to accurately distinguish positive areas (DAB staining) and whole core (counter staining). The positive area divided by whole core (percent positive area) was recorded. We manually excluded inappropriate areas such as connective tissue and folded tissue. The data was analyzed using the restricted residual maximum likelihood (REML) based mixed effect model.
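The two scoring schemes above reduce to simple rules, sketched here for clarity. The function names are hypothetical. The text defines only the "partially positive" category explicitly; labeling a sample "positive" when both cores are positive and "negative" when neither is positive is an assumption.

```python
def classify_sample(core_a_positive: bool, core_b_positive: bool) -> str:
    """Combine the pathologist's calls on duplicate cores into a sample call."""
    if core_a_positive and core_b_positive:
        return "positive"            # assumed: both cores positive
    if core_a_positive or core_b_positive:
        return "partially positive"  # defined in the text: one of two cores positive
    return "negative"                # assumed: neither core positive


def percent_positive_area(positive_area: float, whole_core_area: float) -> float:
    """DAB-positive area divided by the whole (counterstained) core area, in percent."""
    if whole_core_area <= 0:
        raise ValueError("whole_core_area must be positive")
    return 100.0 * positive_area / whole_core_area


print(classify_sample(True, False))
print(percent_positive_area(25.0, 100.0))
```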

SDS-PAGE and Immunoblot Analysis-Tumor and normal tissues were lysed with RIPA buffer (Sigma-Aldrich) containing Complete Protease Inhibitor (Roche Diagnostics, Indianapolis, IN). The tissue lysate was sonicated three times for 20 s at 30-s intervals. The protein concentration was measured, and 25 µg of protein from each sample was separated on SDS-polyacrylamide gels and transferred to nitrocellulose membranes (Bio-Rad, Hercules, CA). The membranes were then blocked with 5% milk in Tris-buffered saline containing 0.1% Tween 20 (TBST) and were incubated overnight with primary antibodies against anterior gradient homolog (AGR2), PTGES3 (Abcam, Cambridge, MA), STRAP (Becton-Dickinson, Franklin Lakes, NJ), AKR1B10 (Abcam, Cambridge, MA), and beta-actin (Sigma-Aldrich, St. Louis, MO). After washing with TBST, the membrane was incubated with horseradish peroxidase-conjugated secondary antibody and was developed using the chemiluminescent detection kit (Pierce, Rockford, IL).
Down-regulation of PAK2 by shRNA-The MISSION shRNA expression constructs (TRC0000002118) and control (nontarget) were purchased from Sigma-Aldrich (St Louis, MO) as glycerol stocks. The shRNA vector and packaging vectors (pMD2.G and pCMV dR7.74ps PAX2) were cotransfected into 293FT cells by Fugene 6 according to the manufacturer's protocol (Roche Diagnostics, IN). The culture medium was changed 7 h after transfection. Two days post-transfection, virus-containing culture medium was collected, centrifuged to remove cells and filtered with a 45-µm pore size membrane. Target cells were prepared in a six-well plate at a confluency of 70-90%. Transduction was performed by replacing the culture medium with lentivirus-containing media for 6 h. After transduction, the culture medium was replaced, and the cells were incubated for 4 days before the experiments were conducted. Four days after transduction, the cells were harvested for the colony formation assay and the growth rate assay. For the colony formation assay, 10³ cells were seeded in each well of a six-well plate and were incubated for 7 days. Cells were fixed with 4% paraformaldehyde in PBS (v/v) and were stained with crystal violet. For the growth rate assay, 1000 cells were seeded in each well of a 96-well plate, and cell viability (the number of living cells) was measured by WST-1 reagent (Clontech, Mountain View, CA) in triplicate each day for 4 days according to the manufacturer's protocol. The PAK2 knockdown level was confirmed by immunoblot analysis of the cells after 4 days of transduction.
Transmembrane Migration Assay With a PAK Inhibitor (IPA-3)-Cells were serum-starved for 18-24 h then seeded into a chamber containing an 8.0 µm porous membrane (24-well plate format) at a density of 2 × 10⁴ H1299 cells and 4 × 10⁴ A549 cells. Wells contained serum free medium with a DMSO control or IPA-3 at 1 or 10 µM. After a 45 min preincubation in inhibitor, each chamber was moved into a well containing either 0.1% FBS, 10% FBS + DMSO, 10% FBS + 1 µM IPA-3, or 10% FBS + 10 µM IPA-3. After 6 h (H1299) or 12 h (A549) of incubation, cells were fixed and stained with the Diff quick Stain kit (SIEMENS, Munich, Germany). Cells not passing through the membrane were removed by wiping with a cotton swab. Cells in eight randomly selected fields were counted under a microscope using the 20× objective lens.
Matrigel Invasion Assay With a PAK Inhibitor (IPA-3)-Cells were prepared and seeded onto Matrigel invasion chambers (Becton Dickinson, Franklin Lakes, NJ) in the same manner as for the migration assay with the exception that 25 × 10³ cells were seeded for both H1299 and A549. Cells on the chamber membrane were fixed and stained after 24 h (H1299) or 48 h (A549) of incubation. Cell counts were obtained as described for the transmembrane migration assay. For statistical analysis, a mixed-effect model was used to assess the effect of the four treatment groups (0.1% FBS, 10% FBS + DMSO, 10% FBS + 1 µM IPA-3, or 10% FBS + 10 µM IPA-3) on cell count; a square root transformation was applied to the cell count data to meet the normality assumption of the mixed-effect model.
Plasma Collection and Preparation-Our initial signature evaluation set consisted of 45 plasma samples obtained from individuals with and without lung cancer from our Lung SPORE repository. Thirty samples were from patients with histologically proven lung cancers (stages IA-IIIB) of either squamous cell carcinoma or adenocarcinoma subtypes. Another 15 control plasma samples were obtained from individuals matched for age, gender, and smoking history. Control individuals had no evidence of lung cancer at one-year follow-up. Plasma was prepared following a standard operating procedure (supplemental Table S11), aliquoted, and stored at −80°C until analysis.
Our independent set consisted of 169 blood samples, with cases carefully matched to controls for relevant clinical characteristics including COPD. In addition, the majority of our controls had small pulmonary nodules that were thought to be cancer but were ultimately proven not to be cancer. This is one of the most rigorous comparisons possible. The clinical characteristics of this set are given in supplemental Tables S12 and S13.
ELISA Analysis of Plasma Samples-The plasma protein concentration measurements were tested in two phases. First, we verified the differential protein expression expected from the shotgun and MRM analysis for 12 candidate proteins in 45 samples consisting of 15 SCC and 15 ADC of predominantly advanced stages and matched controls (supplemental Table S11). Second, we validated a subset of nine candidate biomarkers in a case control study of 75 samples comprising 41 cases of SCC and 34 controls (supplemental Table S12) and a subset of six candidates in a second case control study of 94 samples comprising 45 cases and 49 matched controls (supplemental Table S13).
Candidate biomarker protein levels were measured in plasma from controls and patients with NSCLC using commercially available sandwich ELISA kits. The optimal plasma dilution falling within the linear range of detection of each assay was determined empirically for each analyte. Samples were diluted in ELISA kit diluent buffer following the manufacturer's recommendations. The ELISA for Calprotectin (a heterodimer of S100A8 and S100A9), which was used as a surrogate to measure levels of S100A8 and S100A9, was purchased from Hycult biotec (Canton, MA) and used at a plasma dilution of 1:40. Levels of LGAL7, LGAL3A, CSTB, MSLN, and advanced glycosylation end product-specific receptor were measured in plasma using ELISA assay kits purchased from R&D Systems (Minneapolis, MN) using plasma dilutions of 1:3, 1:2, 1:10, 1:50, and 1:4 (v/v), respectively. PPBP was measured using an ELISA kit specific for a proteolytic fragment of this protein, neutrophil activating protein-2 (NAPII), from R&D Systems (Minneapolis, MN) at a 1:1000 plasma dilution. The ELISA for Krt19 (proteolytic fragments CYFRA 21-1) was purchased from DRG International, Inc (Mountainside, NJ) and used at a 1:4 dilution. Plasma measurements for MMP2, matrix metallopeptidase 10, NAMPT, and IGBP2 were performed at Aushon BioSystems, Inc (Billerica, MA) using a chemiluminescence detection system. The characteristics of the ELISA assays used in the two test sets are summarized in supplemental Table S14.

RESULTS
NSCLC Shotgun Proteomic Analysis Detection, Coverage, and Reproducibility-Proteomic analysis was performed on pools of samples of two lung tumor types, adenocarcinoma (ADC) and squamous cell carcinoma (SCC), with each pool composed of 20 individual tissue specimens. We used two pools of normal lung tissue samples, with 20 and 19 tissue specimens, respectively, as normal controls. Lung cancer specimens were obtained from patients with pathological stage I disease. The control pools consisted of normal lung tissues from patients found not to have lung cancer during resection for suspected lung masses.
Spectral counting was used as our primary quantitative metric to examine protein expression profiles for each tissue type. Shotgun proteomics using LC-MS/MS is essentially a sampling technique, in which probability of detection is a function of protein abundance and quantitation is assessed by counting the numbers of spectra that map to identified proteins. This method has been evaluated in several laboratories and displays robust performance and broad dynamic range (17)(18)(19). To reduce the complexity of the mixture for each LC-MS/MS analysis and to improve detection of low abundance proteins we used peptide IEF separation as a prefractionation method (13), followed by LC-MS/MS analysis of each IEF fraction on an LTQ-Orbitrap-hybrid instrument (IEF-LC-MS/MS). Following a database search to identify peptide sequences that match the acquired MS/MS spectra, IDpicker (29) was used to filter the peptide identifications to a uniform FDR and to generate a minimum set of proteins to account for the identified peptide sequences. We refer to these assignments as "protein groups." Although some peptide sets map to multiple protein database entries that comprise a group, the vast majority of the assigned protein groups map to a single database entry.
The complete IDpicker data set of proteins identified by at least two distinct peptides consisted of 5923 protein groups, of which eight were classified as "contaminant proteins" (common laboratory contaminant proteins such as trypsin and human epidermal keratins, which may or may not be true contaminants) and 631 as "reverse proteins." In large data sets, a threshold of two unique peptides per protein identification yields a large number of reverse proteins-in this case leading to an estimated protein-level FDR of 21.3%. We thus required at least eight spectral counts per protein to accept identifications into this data set, which resulted in a calculated 2.3% protein-level FDR. The resulting data set contained 3621 protein groups, including six "contaminant proteins" and 42 "reverse proteins" ( Table I). The number of protein groups in the ADC, SCC, and normal pools were 3513, 3558, and 2968, respectively (Table I; Fig. 1A).
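The two FDR figures above can be reproduced with a simple decoy-based estimate. The sketch below assumes the protein-level FDR is computed as twice the number of reverse (decoy) protein groups divided by the total number of protein groups; this formula is not stated in the text, but it matches both reported values. The function name is illustrative only.

```python
def decoy_protein_fdr(n_reverse: int, n_total: int) -> float:
    """Estimate protein-level FDR from reverse-database hits.

    Assumption (not stated in the text): FDR = 2 * reverse / total,
    i.e. each reverse hit implies one equally likely false forward hit.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 2 * n_reverse / n_total


# Two-peptide filter only: 631 reverse entries among 5923 protein groups.
fdr_two_peptides = decoy_protein_fdr(631, 5923)

# After also requiring at least eight spectral counts: 42 reverse among 3621.
fdr_eight_counts = decoy_protein_fdr(42, 3621)

print(f"{fdr_two_peptides:.1%}, {fdr_eight_counts:.1%}")  # prints "21.3%, 2.3%"
```

Under this assumption, the spectral count threshold removes most reverse entries (631 to 42) while retaining 3621 of the 5923 protein groups.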
To increase the peptide coverage and to allow assessment of the spectral count distributions within and across groups, we carried out four independent experiments on the same peptide mixture. A heatmap view of the spectral counts from four independent experiments in the SCC pool (Fig. 1B) shows that a high percentage, 62.2%, of all discovered protein groups was observed in all of the experiments, whereas ~17.3% of the proteins were observed only once. We subjected all of the identified proteins to gene ontology (GO) cellular component analysis to provide information about the cellular compartments from which the identified proteins came. The analysis showed that whereas most of the proteins were cytoplasmic and nuclear (including chromosomal), proteins associated with the membrane, extracellular matrix, and with organelles such as the ER, Golgi, and mitochondria (Fig. 1C) were also successfully identified. This result shows that shotgun proteomics provides a comprehensive analysis that can identify proteins from every cellular compartment.
Differential Protein Levels in ADC and SCC versus Normal Controls-Our primary goal was to determine the differences in protein expression profiles between tumor tissue and tissue with normal histology. Of the 3621 protein groups, 2863 protein groups (79.1%) were observed across all pools, whereas 598 protein groups (16.5%) were shared among the tumor pools (Fig. 1D). A Venn diagram depicts the number of overlapping and unique protein groups identified in the various tissues (Fig. 1A). The number of protein groups observed in each pool but below the detection threshold in the others was 11 in the combined control pools, 11 in the ADC pool and 44 in the SCC pool (supplemental Tables S4A-4C).
Spectra assigned to some known tumor markers were detected in the tumor pools, but not in normal tissues in our analysis. Carcinoembryonic antigen (CEA) is one of the first tumor markers to be described and is elevated in a number of NSCLCs (37). This protein was detected in our ADC and SCC pools with spectral counts of 14 and 10, respectively (supplemental Table S5). Squamous cell carcinoma antigen (SCC) is also a well-known tumor marker frequently observed in lung and laryngeal cancer patients' sera (38,39). We observed this molecule with a spectral count of 13 in the SCC pool and three in the ADC pool. Other known tumor markers, CYFRA 21-1(KRT19) and neuron-specific enolase, were also identified in our data set, but were not significantly different from normal tissue (data not shown).
Twenty-five significantly up- or down-regulated proteins are listed along with their spectral counts (Tables IIA, IIB, IIIA, and IIIB). No spectra were detected in normal tissue for any of the listed up-regulated proteins in either cancer histology, so a log ratio could not be calculated. Of the top 25 up-regulated proteins listed for each histology group, most are shared between the ADC and SCC pools. However, six proteins are unique to ADC (calcitonin-related polypeptide alpha, Chromogranin B, IPI00911047, proprotein convertase subtilisin/kexin type 1, and nerve growth factor inducible) whereas two proteins are found only in the SCC pool (visinin-like 1 and matrix metallopeptidase 10). For the down-regulated proteins, many have peptides that can be detected at some level in tumor tissues. Some of the identified proteins were (as might be expected) hemoglobins from vascular elements. However, the presence of embryonic (epsilon) and fetal (gamma) hemoglobin proteins is interesting; these have not previously been reported to be expressed in lung or lung cancer. Only advanced glycosylation end product-specific receptor had no detectable spectra in either cancer histology. A complete list of up-/down-regulated proteins is provided in supplemental Tables S6A-S6D with spectral counts and, when detected in the normal samples, a log ratio.
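The reason a log ratio cannot be reported when no spectra are detected in one group can be illustrated with a short sketch. It assumes a base-2 logarithm of the raw tumor-to-normal spectral count ratio; the text does not specify the base or any normalization, and the function name and the example counts are hypothetical.

```python
import math
from typing import Optional


def log2_count_ratio(tumor_count: int, normal_count: int) -> Optional[float]:
    """Return log2(tumor / normal), or None when the ratio is undefined.

    Assumption: raw spectral counts and base 2. A zero count in either
    group makes the ratio 0 or infinite, so no finite log ratio exists.
    """
    if tumor_count <= 0 or normal_count <= 0:
        return None
    return math.log2(tumor_count / normal_count)


# Hypothetical counts: detected in both groups, so a ratio can be reported.
print(log2_count_ratio(40, 10))  # 2.0

# Detected only in tumor: the ratio is undefined and None is returned.
print(log2_count_ratio(24, 0))  # None
```

Proteins in the second situation are therefore listed with spectral counts only.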

Verification of Shotgun Data by LC-MRM-MS and Western Blot-LC-MRM-MS is a targeted, quantitative proteomics method that enables multiplexed quantitation of proteins through measurements of representative peptides in a medium-throughput manner (22). LC-MRM-MS provides a means to more precisely measure levels of biomarker candidate proteins without the use of antibodies. We chose 14 proteins of interest for analysis by LC-MRM-MS. Each protein was quantified by measurements of at least two peptides and each peptide was quantified as the sum of four MRM transitions.
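The quantification rule described above (four transitions summed per peptide, at least two peptides per protein) can be sketched as follows. How peptide-level values were combined into a protein-level value is not stated in the text; averaging is used here purely as an assumption, and the peptide labels and peak areas are made up for illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def peptide_area(transition_areas: Sequence[float]) -> float:
    """Sum the peak areas of the four MRM transitions of one peptide."""
    if len(transition_areas) != 4:
        raise ValueError("expected exactly four MRM transitions per peptide")
    return sum(transition_areas)


def protein_signal(peptides: Dict[str, Sequence[float]]) -> float:
    """Combine peptide areas into one protein-level value.

    Assumption: the mean of the peptide areas is used; the text only says
    that each protein was measured through at least two peptides.
    """
    if len(peptides) < 2:
        raise ValueError("expected at least two peptides per protein")
    return mean(peptide_area(areas) for areas in peptides.values())


# Hypothetical peak areas for two peptides of one protein.
example = {
    "peptide_1": [1.0, 2.0, 3.0, 4.0],  # sums to 10.0
    "peptide_2": [2.0, 2.0, 2.0, 2.0],  # sums to 8.0
}
print(protein_signal(example))  # 9.0
```

In practice the summed areas would be normalized before samples are compared, as the comparison of normalized peak areas in the following paragraph indicates.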
When we analyzed the original individual samples making up the pools of 20 ADC, 20 SCC, and 22 normal controls used for the MS-MS analyses, we successfully confirmed the presence and relative expression levels of AGR2 ( Fig. 2A), proliferating cell nuclear antigen (PCNA) (Fig. 2B), desmoglein 2 (Fig. 2C), cellular retinoic acid binding protein 2 (Fig. 2D), and advanced glycosylation end product-specific receptor (Fig. 2E). These four up-regulated and one down-regulated proteins showed significant differences based on spectral count data from the shotgun analyses. These differences could be verified by LC-MRM-MS in individual sample analyses based upon the comparison of normalized peak areas from the target peptides. The results of an additional nine up-regulated proteins are shown in supplemental Figs. S1A-S1I. With most of the candidate proteins, there are clear differences in peak areas of the peptides monitored between the tumor and normal samples. However, cathepsin B (CTSB) (supplemental Fig.  S1F) and eukaryotic translation initiation factor 5A (supplemental Fig. S1G), while appearing to have higher protein levels in most tumor samples, also have significant levels of protein in the selected normal samples. We also validated protein expression level of several up-regulated proteins by Western blot (Fig. 2F). We examined four proteins that were detected as being highly expressed in tumor pools, which included AGR2 (spectral count ADC 24, SCC 24, normal 0), STRAP (spectral count ADC 18, SCC 20, normal 0), PTGES3 (spectral count ADC 15, SCC 14, normal 0) and AKR1B10 (spectral count ADC 16, SCC 43, normal 0). Immunoblot analysis showed a clear difference in expression level between tumor samples and normal controls, indicating our proteomics analysis can routinely be confirmed by antibody based immunoassays. 
In some cases, however, there is discordance that requires further investigation and that may be related to the differing specificities of MS proteomic detection and relatively less specific antibody-based methods.
To further assess the concordance between shotgun and MRM analyses of differentially expressed proteins, we analyzed three additional sets of normal lung (n = 21), ADC (n = 20), and SCC (n = 20) tissues by LC-MRM-MS (supplemental Table S7). The majority of these samples (70%) were stages I-II, whereas the remainder (30%) were stages III-IV. These analyses measured 95 proteins that had been observed in the shotgun data sets for normal, ADC, and SCC tissues. A comparison of these MRM measurements with the spectral count data for shotgun analyses of the pooled samples is presented in supplemental Table S8. In the ADC versus normal comparison, 44 of the 50 proteins that were differentially expressed were verified by MRM (88%). In the SCC versus normal comparison, there were also 50 differentially expressed proteins, and 42 of them were validated by MRM (84%). Thus, most of the protein expression differences measured by shotgun analyses of the normal, ADC, and SCC pools were confirmed in an independent set of specimens.

Gene Ontology Enrichment Analysis of ADC versus SCC Reveals Discriminate Proteins Between the Two Histologies-Although they are usually readily distinguishable morphologically, biological differences between ADC and SCC at the molecular level have not been fully defined. To elucidate molecular differences at the protein level, the protein profile for ADC was compared with that derived for SCC. The top 20 proteins significantly differentially expressed between histologies are shown in supplemental Tables S9A and S9B. In addition to identifying enriched proteins in each of the histologies, we used GOTM (35) to conduct GO enrichment analysis. GO enrichment analysis identifies functions and pathways that are differently activated between two different histology subtypes (supplemental Fig. S2). For GO functional analysis, we used 54 differentially expressed proteins. All proteins in our data set were used as a reference to identify [...] S2A; supplemental Table S10A), which included several keratin family members, KRT5, KRT6A, 6B, and KRT13 enriched in SCC. The ADC histology was enriched in cellular retinoic acid-binding protein and sciellin (SCEL). In the cellular component analysis (supplemental Fig. S2B; supplemental Table S10B), the extracellular region category was enriched. This category includes SCC-enriched matrix metallopeptidase 10, SERPINB2, and SERPINB5, as well as ADC-enriched proteins such as mucins (MUC1, MUC5B). There are two known ADC markers in this category, surfactant protein B (SFTPB), which was reported as having ADC-specific expression by immunohistochemistry (40), and polymeric immunoglobulin receptor (PIGR). These results show that proteins associated with tissue development and structural molecular activities are differentially expressed in ADC and SCC.
Identification of p21-activated Kinase, PAK2-In addition to biomarker identification, our shotgun approach provides insight into potential molecular pathways that are aberrantly regulated in NSCLC. We were especially interested in proteins that are potential therapeutic targets.
One group of proteins, the Group I p21-activated kinases (PAKs), was of interest because PAK2 was identified in each histology, but with much higher spectral counts in the SCC pool (Table IIB). PAK1 was also present, but was much less abundant than PAK2. The PAK proteins have been reported to be up-regulated in multiple cancer types (41-43) but have not yet been implicated in NSCLC. To confirm the increased expression levels of PAK2 in NSCLC seen in our shotgun proteomics analysis, we stained a tissue microarray from the same patients whose frozen tissue blocks were used for proteomics analysis. After pathologic (ALG) review, it was determined that 12 of 19 SCC samples were strongly stained, whereas three of 14 normal samples were stained (Table IV). A representative image of normal and squamous cell carcinoma tissue stained with a PAK2-specific antibody shows high protein levels of PAK2 in the tumor (Fig. 3A). In the ADC samples we did not see high expression as frequently as in SCC, with only three positives out of 19 samples. We observed a varying degree of cytoplasmic staining of PAK2 in tumor cells, especially in SCC histology where weak staining was observed in some normal type two alveolar cells, inflammatory cells, and vascular endothelial cells. We also quantified positive staining area by image analysis. The distribution of percent PAK2-positive area for each histology is shown in Fig. 3B. This method of analysis supported the previous result of high PAK2 in SCC and also provided evidence of moderately higher PAK2 in ADC when compared with normal tissue.
To investigate whether PAK2 is important for cell survival in lung cancer cell lines we knocked down PAK2 with several shRNA expressing lentiviruses in H1299 and A549 cells. Loss of PAK2 protein ( Fig. 4A; supplemental Fig. S3A) resulted in a significant reduction in cell growth on plastic and a dramatic reduction in the number of colonies formed (Figs. 4B and 4C; supplemental Figs. S3B and S3C). A small molecule inhibitor of the Group I PAKs, IPA-3, has recently been identified through a screening effort to identify allosteric inhibitors of these proteins (44). This in vitro screen took advantage of the auto-inhibitory domain present in the Group I PAKs and the conformational change that takes place upon CDC42 binding and subsequent activation of the kinase activity. Allosteric inhibitors, unlike ATP mimetics, provide a higher level of specificity because they rely on protein structure rather than the ATP-binding domain that is so similar among the ATP-binding proteins. However, because of the structural similarity of the group I PAK proteins, IPA-3 will inhibit PAKs 1, 2, and 3 (44). To assess the ability of IPA-3 to interfere with migration and invasion of NSCLC cell lines we performed migration and invasion assays (Figs. 4D and 4E; supplemental Figs. S3D and S3E). In both analyses, IPA-3 inhibited the migration and invasion abilities in a dose-dependent manner. These observations support the idea that p21-activated kinase may have an important role in lung cancer tumorigenesis and as potential therapeutic targets.
Translation of the Tissue-based Approach Into Candidate Plasma Biomarkers-To test how the large protein inventories obtained from shotgun proteomics analysis of lung tumors may identify tumor-derived proteins that are elevated in the plasma of early stage lung cancer patients and thus of potential use as candidate diagnostic biomarkers, we tested the concentration of shotgun proteomic candidate biomarkers in the plasma of individuals with and without lung cancer. To prioritize our candidates, we used a combination of statistical and biological criteria including: the expression level in cancer tissues, presence in plasma and other biofluids, and statistical significance using commercial and publicly available bioinformatics tools (including Ingenuity Pathway Analysis (IPA), Webgestalt). The resulting list of candidates was further refined by cross-examination against other publicly available databases such as the Plasma Proteome Project of Human Proteome Organization (PPP-HUPO). This process yielded 164 candidate biomarkers that are differentially expressed in early stage squamous or adenocarcinomas of the lung as compared with matched controls.
The plasma protein concentration measurements were tested in two phases (supplemental Fig. S4). First, in a proof of concept experiment, we verified the differential protein expression expected from the shotgun and MRM analysis for 12 candidate proteins for which commercial ELISA kits were available. We tested those in 45 samples consisting of 15 SCC, 15 ADC with predominantly advanced stages and matched controls (supplemental Table S11). The data presented in supplemental Fig. S5 show that all 12 of the selected proteins were detected in the plasma from control, ADC, or SCC samples. The concentrations of MMP2 were significantly higher in plasma from SCC patients compared with controls, and KRT19 was nearly significant (p = 0.06), consistent with previously published reports (45,46). Here we also report for the first time elevated plasma concentrations of LGALS7 in patients with SCC of the lung.
Second, we developed and carefully validated custom-made ELISAs for 30 proteins using in-house raised and purified polyclonal antibodies for detection in plasma. These were first tested in a pilot study for protein concentration in 45 plasma samples of patients with advanced disease and matched controls (data not shown). From these 30, the nine candidate biomarkers that displayed a consistent relationship to cancer phenotypes in this proof of concept phase were then tested by ELISA in a larger data set for confirmation of these findings. The performance characteristics of these nine ELISAs are presented in supplemental Table S14. These nine candidate biomarkers were tested in a case-control study of 75 samples comprising 41 cases of SCC and 34 controls (supplemental Table S12). A subset of six candidates was tested in a second case-control study of 94 samples comprising 45 cases of ADC and 49 matched controls (supplemental Table S13). The results of this independent and carefully case-control matched set of 135 patients are shown in supplemental Table S15 (the 169 (75 + 94) cases and controls from these two data sets correspond to a total of 135 individuals because controls overlap between the groups). A multivariable logistic regression model was built to assess these biomarkers' ability to differentiate cancers from controls. The prediction performance was measured primarily by AUC metrics. The results for prediction of the diagnosis of squamous cell carcinoma (n = 41 cases and 34 matched controls) showed an AUC of 0.72 [95% CI: 0.62, 0.82]. The AUC for the prediction of the diagnosis of the adenocarcinomas (45 cases, 49 matched controls) was 0.59 [95% CI: 0.5, 0.67]. Note that both the reported point estimates and their corresponding 95% confidence intervals were bias-corrected by the bootstrap method. We also investigated the support vector machine algorithm to address performance accuracy and obtained very similar results (supplemental Table S15).
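As an illustration of how an AUC and a bootstrap confidence interval can be obtained from model scores, the following is a minimal sketch. It uses a rank-based AUC and a plain percentile bootstrap; it is not the bias-correction procedure used for the reported values, whose details are not given here. The scores are synthetic and all function names are hypothetical; only the group sizes (41 cases, 34 controls) are taken from the SCC data set.

```python
import random
from typing import List, Sequence, Tuple


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties = 0.5)."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(cases: Sequence[float], controls: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling cases and controls separately."""
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        stats.append(auc(boot_cases, boot_controls))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic model scores (NOT study data); group sizes mirror the SCC set.
rng = random.Random(1)
case_scores = [rng.gauss(0.6, 1.0) for _ in range(41)]
control_scores = [rng.gauss(0.0, 1.0) for _ in range(34)]

point = auc(case_scores, control_scores)
low, high = bootstrap_auc_ci(case_scores, control_scores)
print(f"AUC = {point:.2f} [95% CI: {low:.2f}, {high:.2f}]")
```

An optimism-corrected estimate would additionally refit the model on each bootstrap sample and subtract the average gap between resampled and original performance, which this sketch omits.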
It is important to note that not only were the cases and controls very closely matched for clinical characteristics including age, gender, pack-year smoking history, and COPD, but the majority of the "controls" were patients who presented for evaluation of pulmonary nodules suspicious for cancer and were proven not to have cancer after a one-year follow-up.
DISCUSSION
Lung cancer is by far the largest single cause of cancer death in the western world, responsible for the deaths of more people than the next four most frequent cancers combined. Significant progress has been made in the treatment of this disease with the identification of subsets of lung cancer containing mutations in the epidermal growth factor receptor and fusions of the anaplastic lymphoma kinase gene, but together these account for only about 15% of cases, and even in these cases therapy is palliative and not curative. Technology is becoming available for the practical sequencing of the entire genome or transcriptome, which will undoubtedly identify other potentially targetable lesions, though preliminary data suggest that they will be in even smaller subsets, and many will not have clear therapeutic interventions. It is very clear, however, that genes can also be activated or repressed by mechanisms other than coding region mutations, including methylation and histone acetylation, and significant progress has been made using expression array technology to address the genome-wide alterations in RNA expression that result from these mechanisms.
However, in virtually every case, it is neither DNA nor RNA that is the ultimate biologically functional moiety, but protein. Protein expression and activity levels can be affected by another layer of postgenomic regulation, including post-translational modification and alteration in protein degradation. Thus the most complete portrait of the dysregulated functional networks in a cancer cell will very likely be achieved only at the protein level, and a complete knowledge of the state and structure of the proteins in a cancer cell will undoubtedly be more informative than that of the genome. This would be useful not only for better understanding the biology of cancer, but also for defining new targets for therapy and even practical biomarkers for the early identification of disease while it is still in a readily curable stage. However, previous proteomic technologies have limited analysis to a very small and superficial subset of the cancer proteome.
Although other studies have yielded inventories of selected subsets of proteins, such as phosphopeptides (47), the shotgun proteomic analysis presented here is the most comprehensive analysis of the global cancer proteome in the literature to date. We have identified and assessed the levels of more than 3500 proteins in the two major subtypes of nonsmall cell lung cancer and uninvolved lung tissues, significantly greater than any previous study (48-54). The success of this multidimensional proteomic analysis from trypsin-digested biological specimens has been partly owing to the careful selection of prefractionation technologies. An optimized prefractionation method is a key step if shotgun proteomic analysis is to have good sensitivity for the detection of low-abundance peptides and to provide reproducible results. Our previous investigation (13) indicated that the IEF prefractionation method may have an advantage in terms of reproducibility over strong cation exchange-based multidimensional LC-MS/MS (MudPIT) (55) and that four replicate analyses of each pooled sample achieve ~90% of the identifications that would be made by exhaustive analysis (nine replicates). The other critical element of the analysis is our approach to database searching of the obtained MS/MS spectra, the filtering of results, and inference of protein identifications. We used reversed sequence database searching to enforce a peptide identification FDR of 0.05 and required two distinct peptide sequences per protein identification. We also used parsimonious assembly to minimize redundant protein identifications (29) and required eight or more spectral counts across the pools for protein identifications in the final data set. These latter steps significantly reduced the number of protein identifications, but produced a much more reliable data set, as demonstrated by a 2.3% protein FDR.
Thus, although the final data set represents a conservative inventory of protein expression in the tissues, the inventories are sufficiently deep to reflect the biological diversity of both ADC and SCC.
With the approach presented here, we have generated a large data set of identified proteins whose levels were significantly different between lung tumor groups and normal lung tissues. Previous proteomic analyses (e.g. those on two-dimensional gel platforms) provided relatively superficial coverage of the proteome and detected only the most abundant proteins (9). One indication that we have probed deeply enough into the proteome to potentially identify novel tumor biomarkers was that spectra were detected for known lung tumor biomarkers such as carcinoembryonic antigen in the ADC pool and squamous cell carcinoma antigen in the squamous cell carcinoma pool (supplemental Table S5). To further substantiate our shotgun results, targeted proteomic analysis employing LC-MRM-MS, Western blot, and immunohistochemistry was used to examine individual patient tumor samples for expression of select proteins identified in the shotgun analysis. The high percentage of proteins verified as up- or down-regulated (11 of 13 up and 1 of 1 down) with separate methods of analysis increases our confidence that differences detected with the initial shotgun analysis provide useful information regarding protein levels in NSCLC. MRM analyses of 95 proteins in three additional sets of normal, ADC, and SCC tissues confirmed many additional protein expression differences measured in our shotgun analyses of the pools and provide further evidence of the reliability of our approach.
In this first study of its kind in lung cancer, we decided to pool samples from each of these two subtypes. We fully understand that there is significant heterogeneity within these morphologically defined subtypes (e.g. 10% frequency of EGFR mutations in adenocarcinomas), but our intent was to attempt to uncover common alterations in these two major categories that would be detectable by a pooled approach. This is particularly important when seeking to define candidate diagnostic biomarkers applicable to patients with suspected tumors. In addition, a pooled approach allows "signal averaging" to emphasize the dominant signals and reduce noise. The efficacy of our approach is demonstrated by the MRM data showing significant heterogeneity in single samples for our marker proteins, but clear confirmation of the histologic class distinctions. Single sample analyses are underway, but these will take a long time to achieve statistically significant cohort sizes, especially within these heterogeneous subgroups.
The decision not to use adjacent "normal" lung as our normal but rather patients who underwent resection for suspected lung cancer but were found to have benign lesions was also deliberate. Lung cancers have clear molecular abnormalities that have been observed in adjacent normal-appearing tissue ("field effect") that are clearly found in cancer patients and not matched normal (56). With our intent to discover differences between cancer and noncancer normal, and for biomarker discovery, we felt that clinically matched normal controls were the optimal choice, especially for a pooled strategy.
Among the proteins differentially expressed between normal and tumor, several have already been shown to be upregulated in lung cancer. These would include serine/threonine kinase receptor associated protein (STRAP), aldo-keto reductase family 1, member B10 (AKR1B10), and proliferating cell nuclear antigen (PCNA), which has been routinely used as a proliferation marker in many cancer types (57,58). AKR1B10 is involved in retinoid metabolism and may be responsible for the alterations in retinoid signaling known to be important in lung carcinogenesis (59). Maspin (SERPINB5) was higher in our SCC pool and is a well-described marker of the SCC histology (60). This is supported by data from the public immunohistochemistry database, the Human Protein Atlas (61), which shows high levels of maspin in SCCs of the lung. Proteins up-regulated in our study not previously associated with lung cancer are creatine kinase, mitochondrial 1B (CKMT1B), which was found to be up-regulated in breast cancer by expression arrays and associated with poor prognosis (62), and FAM3C (63), a secreted interleukin-like protein that was identified in proteomic screens for secreted proteins from pancreatic cancer cells (62,64). More recently FAM3C was discovered as a translationally controlled protein implicated in EMT and breast cancer progression (63). Also discovered through a proteomics approach in CRC was proteasome (prosome, macropain) activator subunit 3, a proteasome-associated protein that was recently identified as a novel serum tumor marker in this cancer type (65). The expression of calcitonin-related polypeptide alpha, a peptide associated with increased angiogenesis and endothelial cell proliferation in placental development, but not previously associated with cancer, was higher in ADC versus normal tissue (66). Notably, the mini-chromosome maintenance (MCM) proteins, MCM2, MCM3, MCM4, MCM5, and MCM6, were also present in our list of up-regulated proteins. 
These proteins are reported to exist in a heterohexameric complex with MCM7 (67). The fact that five of the six proteins in this complex were detected in our analysis is further support that this methodology is robust. The MCM proteins, and MCM2 in particular, have been investigated as proliferation markers in many types of cancer because they appear to be more reliable than Ki67 at detecting dysplastic cells and correlating with survival in lung cancer (68,69).
Our analysis also identified a potential drug target not previously reported to be important in lung cancer. The p21-activated kinases (PAKs) were identified in this study as being dysregulated in NSCLC, and we demonstrated that knockdown or drug inhibition of the PAKs has significant functional consequences in vitro. The PAKs have been studied in multiple cancer types, but have not yet been examined for their role in lung cancer. The Group I PAKs are known to bind to and phosphorylate multiple substrates that have been shown to be involved in proliferation, survival, and cytoskeletal remodeling (70). Ample evidence exists for the involvement of the Group I PAKs in breast, colon, prostate, and many other cancers (70). With regard to lung cancer, indirect evidence exists for the involvement of the PAKs in tumor development and progression. In a mouse model of KRAS-induced lung tumors, Rac1, an upstream activator of the PAKs, was necessary for tumor formation and progression (71). Our group's finding through a proteomic analysis that PAKs are overexpressed in NSCLC, and that knockdown or drug inhibition affects the tumorigenic properties of lung cancer cell lines, would be the first to directly implicate PAKs as having a role in lung tumor biology, and suggests their potential as therapeutic targets.
In addition to identification of dysregulated pathways and proteins to better understand NSCLC biology or find new therapeutic targets, our analysis also identified novel candidates as blood biomarkers for the early detection of lung cancer. It is possible that a subset of the proteins found to be over-expressed in lung tumors might be detectable by highly sensitive and specific assays in the peripheral blood and be useful in the clinical setting. In this study, we show results from an early assessment of the plasma levels of candidate biomarkers derived from this shotgun proteomics analysis. Although direct MRM analysis of plasma could have been employed, detection of low abundant target analytes without enrichment techniques would not have afforded the same level of sensitivity as that of the ELISA platform. Therefore, ELISA assays were initially performed to test candidates for which commercial assays for the protein candidates of interest were available. We then developed and carefully characterized in-house ELISAs and tested our most promising candidates in 135 clinical specimens including early stage lung cancers and matched controls with very encouraging performance characteristics. The large inventories of differentially expressed proteins identified by this platform yield many more candidates for diagnostic plasma biomarkers that we are in the process of evaluating.
Proteomics analysis may help identify molecular differences not easily addressed by microarray or genomic analysis, as it is clear that many very important cellular processes are primarily regulated by protein stability or post-translational modification. Further development of this technology will undoubtedly increase the analytic depth and can be extended to catalog not only protein levels but also specific somatic mutant sequences and post-translational modifications adding further dimensions to our understanding of the cancer cell inaccessible to genomic and transcriptomic technologies. Our results demonstrate that a tissue-based in depth proteomic approach allows the identification and validation of candidate dysregulated pathways and diagnostic biomarkers. Future work will be required to validate potential candidate targets and to determine whether the identified plasma biomarkers may add value to the performance of the screening chest CT scans and change clinical management.