Proteomic Analysis of Human Saliva From Lung Cancer Patients Using Two-Dimensional Difference Gel Electrophoresis and Mass Spectrometry

Lung cancer is often asymptomatic or causes only nonspecific symptoms in its early stages. Early detection represents one of the most promising approaches to reducing the growing lung cancer burden. Human saliva is an attractive diagnostic fluid because its collection is less invasive than that of tissue or blood. Profiling of salivary proteins over the course of disease progression could reveal potential biomarkers indicative of oral or systemic diseases, which may be used extensively in future medical diagnostics. Seventy-two subjects were enrolled in this study for saliva sample collection according to the approved protocol. Two-dimensional difference gel electrophoresis combined with MS was used to separate, quantify, and identify salivary proteins in two pooled samples. Candidate proteomic biomarkers were verified and prevalidated using immunoassay methods. Two-dimensional difference gel electrophoresis and MS revealed 16 candidate protein biomarkers. Three proteins were further verified in the discovery sample set, the prevalidation sample set, and lung cancer cell lines. In discriminating lung cancer patients from healthy control subjects, these candidate biomarkers reached 88.5% sensitivity and 92.3% specificity, with an AUC of 0.90. These preliminary data demonstrate that proteomic biomarkers are present in human saliva when people develop lung cancer. The discriminatory power of these candidate biomarkers indicates that a simple saliva test might be established for lung cancer clinical screening and detection.

Lung cancer has long been a significant worldwide public health issue. In the United States, lung cancer is the most common cause of cancer-related death in both men and women. According to the American Cancer Society, the major type of lung cancer is non-small-cell lung cancer (NSCLC), which comprises ~85% of lung cancers (1,2). Small cell lung cancer (SCLC) is the other main type, representing about 15% of cases (3). Cigarette smoking is by far the major risk factor for lung cancer (4,5,6). Lung cancer has a high mortality rate among malignancies, in part because symptoms are frequently absent until the disease is already metastatic and, therefore, incurable (7,8,9). Early detection represents a very promising approach to reduce lung cancer incidence and mortality. However, conventional diagnostic methods for lung cancer are unsuitable for widespread screening because they are expensive or invasive and occasionally miss tumors or invasive cancer (10,11,12,13). Computed tomography has been widely used for early lung cancer screening, although it often produces a high rate of false positives (7,14,15). Better diagnostic methods are urgently needed to improve the detection of lung cancer.
Tissue and blood have been extensively used in attempts to detect lung cancer earlier (1,16,17,18). Sputum has also been used to predict lung cancer by detecting aberrant promoter methylation (19,20) or metal ions (21); however, such screening may be limited by sample availability or compositional variability. Various proteomic technologies have been applied to this type of biomarker discovery. Quantitative proteomic technologies, such as iTRAQ (22,23,24) and two-dimensional gel electrophoresis (2-DE) (5,25), have been extensively used for proteome analysis of different types of lung cancer samples.
Human saliva is an attractive early detection biofluid (26,27,28) because its collection is noninvasive, and it contains a large array of proteins, many of which have been shown to be informative for the detection of oral (29,30) and systemic diseases (31,32,33). Systemic diseases, such as lung cancer, may affect salivary glands directly or indirectly and may influence both the quantity and the composition of the saliva produced (26,27,34). A recent study showed that salivary biomarkers are significantly and consistently altered in a mouse lung cancer model, suggesting that the salivary glands may be regulated by mediators released from remote tumors (35).
In this study, human saliva samples were collected from lung cancer patients and matched healthy control subjects. Salivary proteins were analyzed and compared between two pooled samples, one from each group. We hypothesized that lung cancer-related proteins exist in human saliva and could be used clinically to discriminate lung cancer patients from healthy control subjects. With the discovery and validation of discriminatory proteomic markers in saliva, lung cancer could be detected noninvasively with high specificity and sensitivity.

EXPERIMENTAL PROCEDURES
Patients and Study Design-This study consisted of two phases: a discovery phase and a confirmation phase (Fig. 1). In total, 72 saliva samples were used. All saliva samples were collected under a protocol approved by our institutional review board (IRB), and all patients provided written informed consent. All samples were collected, processed, and stored in a similar fashion. Unstimulated saliva samples were collected, stabilized, and preserved as previously described (31). The sample supernatants were kept at −80°C prior to assay. Identified proteins were first verified in the discovery sample set (10 lung cancer samples and 10 healthy control samples). A prevalidation sample set (26 lung cancer samples and 26 healthy control samples) was used in the biomarker confirmation phase.
Proteomics Platform-Two-dimensional difference gel electrophoresis (2D-DIGE) was used to separate and quantify the salivary proteins from lung cancer patients and matched healthy control subjects. MALDI-TOF MS and LC-MS/MS were used to identify the proteins in the selected gel spots.
2D-DIGE-Saliva protein concentration was determined with the BCA Protein Assay Kit (Thermo Scientific Pierce Protein Research Products, Rockford, IL). Pooled saliva samples were prepared from the 10 lung cancer samples and the 10 healthy control samples, respectively, by taking an equal amount of protein from each individual sample. For each pooled saliva sample, 200 μg of protein was precipitated with ethanol and then resuspended in 2D cell lysis buffer (30 mM Tris-HCl, pH 8.8, containing 7 M urea, 2 M thiourea, and 4% 3-[(3-cholamidopropyl)dimethylammonio]propanesulfonate). The total proteins of the two pooled saliva samples, from lung cancer patients and healthy control subjects, were labeled with minimal CyDye Cy3 and Cy5, respectively. The two labeled samples were then combined and subjected to 2D-DIGE. Briefly, after the labeled samples were loaded, isoelectric focusing (pH 3-10) was run following the protocol provided by Amersham Biosciences. The 17 cm immobilized pH gradient strip was rinsed in SDS-gel running buffer before transfer onto a 13.5% SDS gel. The SDS gels were run at 15°C until the dye front ran out of the gels. Immediately after SDS-PAGE, gel images were scanned with a Typhoon TRIO (Amersham Biosciences, GE Healthcare, Waukesha, WI). The fold change in protein expression levels was obtained manually from in-gel analysis using Progenesis SameSpots software (version 3.3, Nonlinear Dynamics Limited). After in-gel image analysis, 253 spots were located on the gel, with fold changes ranging from 1.1 to 8. Based on their fold change (≥1.5), abundance, and relative location on the gel, 30 spots were excised and digested in-gel with trypsin.
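The fold-change part of the spot-selection step can be sketched as a simple filter. This is a minimal illustration, not the Progenesis SameSpots workflow: it assumes each spot is summarized by a hypothetical normalized cancer/control (Cy3/Cy5) volume ratio, takes the fold change as the larger of that ratio and its reciprocal so that down-regulated spots also pass, and routes spots with fold change ≥2 to MALDI-TOF MS and those between 1.5 and 2 to LC-MS/MS, following the split reported in the Results. Abundance and gel location, which were also weighed, are not modeled, and the spot IDs and ratios are invented.

```python
from typing import List, Tuple


def fold_change(ratio: float) -> float:
    """Fold change from a cancer/control volume ratio; down-regulation uses 1/ratio."""
    if ratio <= 0:
        raise ValueError("volume ratio must be positive")
    return ratio if ratio >= 1.0 else 1.0 / ratio


def select_spots(spots: List[Tuple[str, float]], min_fc: float = 1.5,
                 maldi_fc: float = 2.0) -> List[Tuple[str, float, str]]:
    """Keep spots with fold change >= min_fc and assign an MS platform to each."""
    selected = []
    for spot_id, ratio in spots:
        fc = fold_change(ratio)
        if fc >= min_fc:
            platform = "MALDI-TOF MS" if fc >= maldi_fc else "LC-MS/MS"
            selected.append((spot_id, round(fc, 2), platform))
    return selected


# Hypothetical spots: (spot ID, normalized cancer/control volume ratio).
spots = [("spot_01", 2.4), ("spot_02", 0.5), ("spot_03", 1.7), ("spot_04", 1.2)]
for row in select_spots(spots):
    print(row)
```

Here spot_02 (ratio 0.5) is kept as a 2-fold down-regulated spot, while spot_04 falls below the 1.5 cutoff.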
In-gel Trypsin Digestion and Mass Spectrometry Identification-For matrix-assisted laser desorption ionization/time of flight (MALDI-TOF) MS analysis, selected spots were cut and washed multiple times to remove staining dye and other inhibitory chemicals. Dried gel spots were rehydrated in digestion buffer containing sequencing grade modified trypsin (Promega, Madison, WI). Proteins were digested in-gel at 37°C overnight. Digested peptides were extracted from the gel with trifluoroacetic acid extraction buffer under shaking. The tryptic peptides were desalted using C-18 ZipTips (Millipore, Billerica, MA). The desalted peptides were mixed with CHCA matrix (α-cyano-4-hydroxycinnamic acid) and spotted into wells of a MALDI plate for MALDI-TOF MS/MS (ABI 4800) identification. Protein identification was based on peptide mass fingerprinting (using MS spectra) and peptide fragmentation mapping (using MS/MS spectra). Combined MS and MS/MS spectra were submitted for database searching using Global Protein Server Explorer software version 3.6 (Applied Biosystems, Foster City, CA) with the MASCOT search engine (version 2.2.0, http://www.matrixscience.com) to identify proteins from the National Center for Biotechnology Information nonredundant Homo sapiens amino acid sequence database (224,815 sequences out of 9,937,670). Search parameters were as follows: enzyme, trypsin; up to 1 missed cleavage; fixed modification, carbamidomethyl (C); variable modification, oxidation (M); peptide mass tolerance, ±0.5 Da; fragment mass tolerance, ±0.5 Da; peptide charge, 1+; monoisotopic masses. Only significant hits, as defined by MASCOT probability analysis (p < 0.05), were accepted. Proteins were considered identified when supported by at least two peptides with C.I.% > 95.
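Peptide mass fingerprinting rests on comparing observed peptide masses with masses predicted from an in silico digest. The sketch below illustrates that idea under the search parameters listed above (trypsin, up to 1 missed cleavage, fixed carbamidomethyl Cys, monoisotopic [M+H]+ at charge 1+, ±0.5 Da tolerance). It is not MASCOT, which additionally scores matches probabilistically against a whole database; variable Met oxidation is omitted for brevity, and the helper names, input sequence, and observed mass are hypothetical.

```python
# Monoisotopic residue masses (Da); Cys carries fixed carbamidomethylation (+57.02146).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919 + 57.02146, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
    "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def cleavage_bounds(seq):
    """Classic trypsin rule: cut after K or R unless the next residue is P."""
    cuts = [i + 1 for i in range(len(seq) - 1) if seq[i] in "KR" and seq[i + 1] != "P"]
    return [0] + cuts + [len(seq)]


def digest(seq, missed_cleavages=1):
    """All tryptic peptides containing up to `missed_cleavages` uncut internal sites."""
    b = cleavage_bounds(seq)
    return [seq[b[i]:b[j]]
            for i in range(len(b) - 1)
            for j in range(i + 1, min(i + 2 + missed_cleavages, len(b)))]


def mh_mass(peptide):
    """Monoisotopic singly protonated mass [M+H]+."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def match_masses(observed, seq, tol=0.5, missed_cleavages=1):
    """Pair each observed m/z (charge 1+) with theoretical peptides within +/- tol Da."""
    theoretical = [(p, mh_mass(p)) for p in digest(seq, missed_cleavages)]
    return [(obs, p, round(m, 4))
            for obs in observed for p, m in theoretical if abs(obs - m) <= tol]


print(digest("AAKGGRCC"))
print(match_masses([147.4], "AAKK"))
```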
For LC-MS/MS analysis, the proteins in the selected spots were digested as for MALDI-TOF MS analysis and loaded onto an LC-MS/MS system (Eksigent NanoLC-2D with Thermo LTQ XL). Peptides were first enriched on a reverse phase trap column (ProteoPep II, 100 μm × 2.5 cm, C18, 5 μm, 300 Å, New Objective) and then eluted onto an analytical column (ProteoPep II, 75 μm × 10 cm, C18, 5 μm, 300 Å, New Objective). The mobile phase consisted of buffer A (5% acetonitrile and 0.1% formic acid in water) and buffer B (0.1% formic acid in acetonitrile). Peptides were separated at a flow rate of 400 nL/min with a 60-min gradient from 15% B to 95% B. Spectra were collected and processed with Xcalibur software version 3.3.0 (Thermo Scientific, Waltham, MA). Combined MS and MS/MS spectra were converted from RAW to mzXML (ReAdW version 4.3.1) and searched against the human Swiss-Prot database using X!Tandem (version 2010.04.21). Search parameters were as follows: enzyme, trypsin; up to 1 missed cleavage; fixed modification, carbamidomethyl (C); variable modification, oxidation (M); parent ion tolerance, 4 Da; fragment mass tolerance, ±0.4 Da. Proteins were considered identified when supported by at least two peptides with log(E-value) < −10.
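The identification rules stated for the two search pipelines can be written as explicit filters. This is a hypothetical sketch: the `ProteinHit` record and its field names are invented for illustration and do not mirror the output formats of GPS Explorer, MASCOT, or X!Tandem; "two peptides" is read as "at least two peptides", and the X!Tandem log(E-value) is assumed to be base 10.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    accession: str
    n_peptides: int          # distinct peptides assigned to the protein
    mascot_p: float = 1.0    # MASCOT probability of a random match (MALDI-TOF branch)
    mascot_ci: float = 0.0   # confidence interval percentage (MALDI-TOF branch)
    log10_e: float = 0.0     # X!Tandem protein log(E-value) (LC-MS/MS branch)


def accept_mascot(hits: List[ProteinHit]) -> List[str]:
    """MALDI-TOF rule: p < 0.05, at least two peptides, and C.I.% > 95."""
    return [h.accession for h in hits
            if h.mascot_p < 0.05 and h.n_peptides >= 2 and h.mascot_ci > 95.0]


def accept_xtandem(hits: List[ProteinHit]) -> List[str]:
    """LC-MS/MS rule: at least two peptides and log(E-value) < -10."""
    return [h.accession for h in hits if h.n_peptides >= 2 and h.log10_e < -10.0]


hits = [ProteinHit("hit_A", 3, mascot_p=0.01, mascot_ci=99.0),
        ProteinHit("hit_B", 1, mascot_p=0.01, mascot_ci=99.0),
        ProteinHit("hit_C", 2, log10_e=-12.5),
        ProteinHit("hit_D", 4, log10_e=-8.0)]
print(accept_mascot(hits), accept_xtandem(hits))
```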
Western Blot-Reduced saliva protein or cellular protein (15 μg of total protein per lane) was loaded onto a 10% Bis-Tris gel and run at 150 V in MES SDS Running Buffer for 1 h. A prestained protein standard (Invitrogen, Carlsbad, CA) was used to track protein migration. The proteins were transferred to a nitrocellulose membrane using iBlot (Invitrogen). The membrane was then washed in wash buffer (10 mM Tris-HCl, pH 7.6, 150 mM NaCl, and 0.1% (v/v) Tween-20 (Sigma-Aldrich, St. Louis, MO)) before blocking for 1 h in wash buffer containing 5% nonfat dry milk (Santa Cruz, Santa Cruz, CA). After further washes in wash buffer, the membrane was incubated with the primary antibody (mouse monoclonal to annexin A1 (ANXA1) (ab2487, Abcam, Cambridge, MA), mouse monoclonal to haptoglobin hp2 (HP) (LS-B2863, Lifespan Bioscience), mouse monoclonal to zinc α2-glycoprotein (AZGP1) (sc-13585, Santa Cruz Biotechnology), or mouse monoclonal to β-actin (A1978, Sigma-Aldrich)) in blocking buffer at room temperature for 2 h according to the manufacturers' instructions. The membrane was then washed before application of the secondary antibody (anti-mouse IgG, peroxidase-linked species-specific whole antibody from sheep, GE Healthcare) for 1 h at room temperature according to the manufacturer's instructions. Finally, the membrane was washed and visualized using the ECL Plus detection kit (GE Healthcare). Band signal intensity was measured using ImageJ software (National Institutes of Health, Bethesda, MD), and the corresponding β-actin signal was used as the reference.
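Using β-actin as the reference amounts to a per-lane ratio of target to β-actin band intensity. The sketch below assumes the band intensities have already been measured in ImageJ; expressing the normalized values relative to the mean of the control lanes is an illustrative extra step not described in the text, and the function names and intensities are hypothetical.

```python
from typing import List, Sequence


def normalize_to_actin(target: Sequence[float], actin: Sequence[float]) -> List[float]:
    """Divide each lane's target band intensity by the same lane's beta-actin intensity."""
    if len(target) != len(actin):
        raise ValueError("one beta-actin band is needed per lane")
    return [t / a for t, a in zip(target, actin)]


def relative_to_controls(normalized: Sequence[float], control_lanes: Sequence[int]) -> List[float]:
    """Express each normalized value relative to the mean of the control lanes."""
    ref = sum(normalized[i] for i in control_lanes) / len(control_lanes)
    return [v / ref for v in normalized]


# Hypothetical integrated densities: lanes 0-1 controls, lanes 2-3 cancer.
hp = [1200.0, 1000.0, 2600.0, 3300.0]
actin = [1000.0, 1000.0, 1300.0, 1100.0]
norm = normalize_to_actin(hp, actin)
print(norm, relative_to_controls(norm, [0, 1]))
```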
ELISA-The ELISA for human calprotectin was performed according to the manufacturer's instructions (Cell Sciences). All saliva samples were diluted 100-fold with sample diluent. All standards and saliva samples were loaded in duplicate. For cell lysates, 3 μg of total protein was loaded into each well in duplicate.
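Converting duplicate absorbance readings into a saliva concentration involves averaging the duplicates, reading the mean off the standard curve, and multiplying by the 100-fold dilution. The sketch below uses piecewise-linear interpolation between bracketing standards as a simple stand-in; the kit instructions may specify a different curve fit (four-parameter logistic fits are common), and the function names, standard concentrations, and optical densities are hypothetical.

```python
from typing import Sequence, Tuple


def interpolate_concentration(od: float, standards: Sequence[Tuple[float, float]]) -> float:
    """Piecewise-linear read-off from (concentration, mean OD) standard pairs."""
    pts = sorted(standards, key=lambda s: s[1])  # assumes OD rises with concentration
    for (c0, y0), (c1, y1) in zip(pts, pts[1:]):
        if y0 <= od <= y1:
            if y1 == y0:
                return c0
            return c0 + (od - y0) * (c1 - c0) / (y1 - y0)
    raise ValueError("OD lies outside the range of the standard curve")


def saliva_concentration(od_duplicates: Sequence[float],
                         standards: Sequence[Tuple[float, float]],
                         dilution: float = 100.0) -> float:
    """Average duplicate wells, interpolate, and correct for the sample dilution."""
    mean_od = sum(od_duplicates) / len(od_duplicates)
    return interpolate_concentration(mean_od, standards) * dilution


# Hypothetical standards: (concentration in ng/mL, mean OD).
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
print(saliva_concentration([0.34, 0.36], standards))
```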
Data Analysis-GraphPad Prism (version 5.01) and R were used for all data analyses. p values were calculated with the Wilcoxon test, and p < 0.05 was used as the cutoff for significance. Logistic regression was used to build the prediction model. For each validated biomarker, we constructed the receiver operating characteristic (ROC) curve and computed the area under the curve (AUC) by numerical integration of the ROC curve. The confirmed lung cancer-related proteins were analyzed by logistic regression, and stepwise backward selection was performed to determine the final combinations of biomarkers. The sensitivity and specificity of each biomarker combination were estimated at the threshold of the predicted probability that yielded the highest sum of sensitivity and specificity.

RESULTS
Study Design-The study design is briefly shown in Fig. 1. All saliva samples were collected under a protocol approved by our IRB (IRB#10-000505), and all enrolled subjects provided written informed consent. The 36 lung cancer saliva samples were collected from patients who had been diagnosed with lung cancer by biopsy at the Ronald Reagan UCLA Medical Center and were scheduled to proceed to surgery; most of them had NSCLC (Table I). Saliva samples from 36 healthy control subjects were collected as controls, matched to the cancer group by age, sex, and ethnicity (p > 0.05). Smoking history was matched generally by current or former smoking status and by smoking duration and intensity. Patient demographics and clinical profiles are presented in Table I.
Lung Cancer Saliva Proteome Analysis-For proteomic biomarker discovery, 10 untreated lung cancer patients and 10 matched healthy control subjects were recruited for saliva sample collection. A pooled saliva sample was prepared for each of the cancer and control groups by pooling an equal amount of protein from each individual sample. The two pooled saliva samples were then analyzed by 2D-DIGE. The merged 2D-DIGE image is shown in Fig. 2. After in-gel image analysis, 253 spots were located on the gel, with fold changes ranging from 1.1 to 8. Based on their fold changes, abundance, and relative location on the gel, 30 spots were excised, digested in-gel with trypsin, and analyzed by mass spectrometry. Of these 30 spots, 19 had a fold change ≥2. Their digested peptides were loaded onto a MALDI-TOF plate, and the data were searched with the MASCOT search engine against the National Center for Biotechnology Information nonredundant Homo sapiens amino acid sequence database. Fourteen of these spots were identified as 11 unique proteins (Table IIA). The remaining 11 of the 30 spots had a fold change ≥1.5 but <2; their digested peptides were analyzed by LC-MS/MS, and the data were searched with X!Tandem against the human Swiss-Prot database. Eight of these spots were identified as seven unique proteins (Table IIB).
In total, 22 spots were identified with high confidence by database searching, corresponding to 16 unique proteins. All the spots are marked on the merged 2D-DIGE image in Fig. 2. Detailed information about these identified proteins is listed in Table II. Based on their fold changes, biological function, and the availability of validation reagents, four proteins were selected for verification: HP, ANXA1, AZGP1, and human calprotectin. Given the relative standard deviation of current immunoassay methods, only spots with fold change ≥2 were selected for biomarker verification.
Verification of Candidates in Discovery Sample Set-Immunoassays were used to verify these four proteins. ANXA1, HP, and AZGP1 were tested by Western blot, and human calprotectin was tested with an ELISA kit. All four proteins except ANXA1 were reliably detected. The Western blots of HP, AZGP1, and the corresponding β-actin are shown in Fig. 3. The levels of HP, AZGP1, and human calprotectin differed significantly between the lung cancer and healthy control groups (p = 0.0041, 0.0040, and 0.015, respectively).
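The group comparisons rely on the Wilcoxon test. As a hedged illustration of what that test computes, the sketch below implements a two-sided Wilcoxon rank-sum (Mann-Whitney) test for two independent groups using a tie-corrected normal approximation. It is not the GraphPad Prism or R implementation: R's `wilcox.test`, for example, uses an exact distribution for small samples without ties and applies a continuity correction otherwise, so p values can differ slightly. The function name and the data are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U for x, p value)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


print(rank_sum_test([1.1, 2.3, 2.9], [4.0, 5.2, 6.8]))
```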
Confirmation in a Prevalidation Sample Set-To further confirm these proteins' presence in human saliva, a prevalidation sample set (26 lung cancer samples and 26 healthy control samples) was used. With the same immunoassay methods, all three salivary proteins (HP, AZGP1, and human calprotectin) again differed significantly between lung cancer patients and healthy control subjects (p = 1.48E-4, 1.05E-4, and 4.56E-5, respectively).
Biomarker Performance-The performance of these validated protein markers for the detection of lung cancer was further evaluated in the prevalidation sample set. Dot plots for the three proteins are shown in Figs. 4A, 4B, and 4C. The ROC curve for human calprotectin is shown in Fig. 4D. The corresponding AUC values for HP, AZGP1, and human calprotectin are 0.807, 0.813, and 0.817, respectively.
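The AUC values above come from numerical integration of the ROC curve, as described under Data Analysis. A minimal sketch of that computation, assuming a higher marker level indicates cancer, sweeps the threshold over every observed value and integrates with the trapezoidal rule; the function names, scores, and labels are hypothetical, not study data. With ties handled this way, the trapezoidal AUC equals the probability that a randomly chosen patient scores higher than a randomly chosen control (ties counted as one half).

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs, sweeping the threshold from the highest to the lowest score.
    labels: 1 = lung cancer, 0 = control."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points  # ends at (1.0, 1.0) once the threshold reaches the lowest score


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical marker levels for three patients and three controls.
scores = [0.92, 0.61, 0.75, 0.40, 0.68, 0.15]
labels = [1, 1, 1, 0, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)))
```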
Logistic regression in R was used to combine different markers. The biomarker performance of all combinations of the three prevalidated salivary proteomic biomarkers is listed in Table III. For each combination, the threshold was chosen to maximize the sum of sensitivity and specificity. In Table IIIA, the threshold was derived from the prevalidation sample set, whereas in Table IIIB, it was derived from the discovery sample set. Because the threshold does not change the ROC curve, the AUC values are the same in both tables; however, the thresholds in Table IIIB are larger than those in Table IIIA (except for the combination of HP and human calprotectin), which leads to different sensitivities and specificities. The dot plot of the logistic regression combination of human calprotectin, HP, and AZGP1 is shown in Fig. 5A, and the corresponding ROC curve is shown in Fig. 5B. As listed in Table IIIA, the combination of HP, AZGP1, and human calprotectin reaches 88.5% sensitivity and 92.3% specificity, with AUC = 0.90. The corresponding positive predictive value (PPV) and negative predictive value (NPV) are also shown.
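Combining markers by logistic regression and then choosing the probability cutoff that maximizes sensitivity plus specificity can be sketched in plain Python. This is an illustration only: it fits the model by batch gradient descent rather than the iteratively reweighted least squares used by R's `glm`, omits the stepwise backward selection described under Data Analysis, and assumes the marker levels have already been put on comparable scales. The function names and the toy feature rows are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
    return w, b


def predict(w: Sequence[float], b: float, X: Sequence[Sequence[float]]) -> List[float]:
    """Predicted probability of lung cancer for each row."""
    return [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def best_threshold(probs: Sequence[float], labels: Sequence[int]):
    """Cutoff on the predicted probability that maximizes sensitivity + specificity."""
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for thr in sorted(set(probs)):
        sens = sum(1 for p, l in zip(probs, labels) if p >= thr and l == 1) / pos
        spec = sum(1 for p, l in zip(probs, labels) if p < thr and l == 0) / neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (thr, sens, spec)
    return best


# Hypothetical standardized levels of three markers per subject; 1 = cancer.
X = [[-1.2, -0.8, -1.0], [-0.9, -1.1, -0.6], [-0.4, -0.2, -0.7],
     [0.8, 1.0, 0.6], [1.1, 0.7, 1.2], [0.6, 0.9, 0.5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
probs = predict(w, b, X)
print(best_threshold(probs, y))
```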
Confirmation in Lung Cancer Cells-To determine whether the three salivary proteins were lung cancer-related, all three proteins were tested in NSCLC cells (NCI-H1299 and NCI-H460 cell lines) and healthy lung cells (MRC-5 cell line). The concentration of human calprotectin was 0.156 ng/μg total protein in MRC-5 cells and increased to 0.235 ng/μg and 0.336 ng/μg total protein in NCI-H460 and NCI-H1299 cells, respectively. The Western blots of HP, AZGP1, and the corresponding β-actin are shown in Fig. 6A. The quantification data (Fig. 6B) show that these two proteins were also elevated in lung cancer cells. Collectively, the levels of all three proteins were increased in the two lung cancer cell lines compared with the healthy lung cells. These results are consistent with our findings in human saliva and may help explain why these three proteins were elevated in the saliva of lung cancer patients.
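Expressed as fold increases over MRC-5 cells, the calprotectin concentrations reported above work out to:

```latex
\text{NCI-H460: } \frac{0.235}{0.156} \approx 1.51, \qquad
\text{NCI-H1299: } \frac{0.336}{0.156} \approx 2.15
```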
Influence of Smoking-Of the 52 subjects in the prevalidation sample set, 33 have a smoking history and 19 are nonsmokers. Among the 26 lung cancer patients, 17 have a smoking history and nine are nonsmokers. The distributions of the three biomarkers in nonsmoking and smoking subjects were comprehensively compared (Fig. 7), and significant differences between groups are indicated in the figure.

DISCUSSION
In this manuscript, we have demonstrated that discriminatory proteomic biomarkers are present in human saliva when lung cancer develops. Although this is a proof-of-concept study, it is the first to use a de novo biomarker discovery approach in saliva for lung cancer detection. Three proteins were first discovered by 2D-DIGE and MS and then further verified in the discovery sample set, the prevalidation sample set, and lung cancer cell lines. In discriminating lung cancer patients from healthy control subjects, these candidate biomarkers reached 92.0% PPV and 88.9% NPV (Table III). Collectively, these data demonstrate that these three salivary proteins could potentially be used as discriminatory biomarkers to differentiate lung cancer patients from healthy control subjects.
Candidate Biomarkers-In this study, our goal was to develop discriminatory proteomic biomarkers for the early detection of lung cancer. Although 253 spots were located on the gel after in-gel image analysis, only spots with a relatively large fold change (≥1.5) were excised for further identification and verification. Interestingly, most of the identified proteins were lung cancer-related. Of note is HP, which has been found in serum and used as a biomarker for the diagnosis of NSCLC (2,36). The overexpression of this acute phase protein in human lung cancer tissues has been confirmed by immunohistochemistry and Western blot (37). Notably, primary lung tumors could release soluble factors that induce the expression of human calprotectin, which could facilitate the survival and proliferation of metastasizing cancer cells (38). Human calprotectin is the complex of S100A8 and S100A9. It has antibacterial, antifungal, immunomodulating, and antiproliferative effects in human saliva. Whereas S100A8 alone did not inhibit fungal growth, S100A9 by itself had a moderate antifungal effect (39).
In our data, human calprotectin was up-regulated whereas S100A8 and S100A9 were down-regulated (Table II). One possible explanation is that, in disease, more S100A8 and S100A9 associate to form the calprotectin complex, which would raise calprotectin levels while lowering the levels of S100A8 and S100A9. The up-regulation of S100A8 and S100A9 in lung cancer tissue has been shown by immunohistochemical staining (40). The mRNA levels of AZGP1 in human lung tissue have also been reported to correlate with lung cancer disease status (41). The expression of AZGP1 protein in lung tissue has been demonstrated by immunohistochemical staining (6). The presence of serum autoantibodies to AZGP1 may be used as a prognostic marker in patients with lung adenocarcinoma. The up-regulation of AZGP1 mRNA in lung adenocarcinoma may be affected by chromatin remodeling through histone acetylation (42). In addition, annexins are cytosolic proteins that can associate with cell membranes in a calcium-dependent manner, and ANXA1 has been found to be up-regulated in lung tumors (43,44). Of the four selected candidates, three were reliably detected. The negative result for ANXA1 may be due to its low abundance in saliva or the limited specificity of the antibody.
Based on the biomarker performance in the prevalidation sample set (Table III), different sensitivities and specificities were calculated for this sample set depending on the selected thresholds. In Table IIIA, the threshold was derived from the prevalidation sample set by maximizing the sum of sensitivity and specificity. To validate the model built in the discovery sample set, its threshold was applied to the prevalidation sample set (Table IIIB). Compared with Table IIIA, the sum of sensitivity and specificity in Table IIIB is generally lower because of the different thresholds. For the combination of human calprotectin and HP, the threshold from the discovery sample set is lower than that from the prevalidation sample set, which yielded high sensitivity (92.3%) but poor specificity (46.2%). For the combination of all three proteins, even though a much higher threshold was applied, good sensitivity (73.1%) and specificity (96.2%) were achieved. These data demonstrate the robust discriminatory power of our discovered biomarkers in differentiating lung cancer patients from normal control subjects.
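Applying a threshold fixed in one sample set to another reduces to counting a confusion matrix and deriving four ratios. In the minimal sketch below, the helper names are hypothetical. For 26 cases and 26 controls, the reported 88.5% sensitivity and 92.3% specificity (Table IIIA) correspond to 23 true positives and 24 true negatives, and those counts reproduce the 92.0% PPV and 88.9% NPV quoted earlier in the Discussion. Because the sample set has a 1:1 case-control ratio, PPV and NPV computed this way would not carry over to a screening population with a much lower lung cancer prevalence.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def metrics_at_threshold(probs: Sequence[float], labels: Sequence[int],
                         threshold: float) -> Dict[str, float]:
    """Call a subject 'cancer' when the predicted probability >= threshold (1 = cancer)."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return confusion_metrics(tp, fn, tn, fp)


# Counts implied by 88.5% sensitivity and 92.3% specificity with 26 cases and 26 controls.
for name, value in confusion_metrics(tp=23, fn=3, tn=24, fp=2).items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity: 88.5%, specificity: 92.3%, ppv: 92.0%, npv: 88.9%
```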
The Effects of Smoking on Salivary Biomarkers-Smoking is by far the leading risk factor for lung cancer (3,4,17,45), so its effects on the salivary biomarkers were comprehensively evaluated. In the cancer group, smokers showed higher levels of AZGP1 and human calprotectin and lower levels of HP than nonsmokers (Fig. 7); however, these differences were not statistically significant. Notably, HP and AZGP1 differed significantly between the healthy control and lung cancer groups regardless of smoking status. In the healthy control group, smoking had no significant effect on the levels of these proteins. These results indicate that the elevation of these verified protein biomarkers is attributable mainly to lung cancer rather than to smoking.
The Specificity of Salivary Proteomic Biomarkers to Lung Cancer-The clinical samples used in this study were matched by age, sex, and ethnicity between the cancer and control groups. Smoking history was matched by smoking status (current or former smoker) and by smoking duration and intensity.
Most lung cancer patients also have chronic obstructive pulmonary disease (COPD) because both conditions are mainly caused by smoking. Our verified salivary biomarkers have also been reported to be related to smoking (5,6), and previous studies have linked all three to COPD in tissue (37), bronchoalveolar lavage fluid (46), and blood (16,47). However, only 1 or 2% of COPD patients go on to develop lung cancer. Although COPD is considered a "transition state" between normal and lung cancer, its protein profile is not necessarily intermediate between the two. The goal of this study was to discover biomarkers that differentiate normal subjects from lung cancer patients; thus, COPD patients were not included in this discovery study. However, to further verify the specificity of the discovered lung cancer biomarkers, COPD patients should be included in future definitive validation.
The Proof of Concept-Through the two phases of this pilot study of salivary biomarker development for lung cancer detection, three proteins were identified and preclinically validated. All three were also confirmed to be elevated in lung cancer cells. The performance of these proteins was further evaluated, and their distributions in the saliva of lung cancer patients and matched healthy control subjects demonstrated their discriminatory power for lung cancer detection. To the best of our knowledge, the present study is the first proof-of-concept report on de novo salivary proteomic biomarker discovery and prevalidation for lung cancer detection. Although definitive validation in a larger sample set is still necessary, these discovered and confirmed salivary protein biomarkers have the potential to be used for the detection of lung cancer. A saliva test could be easily conducted in the clinic and could detect disease at an earlier time point, when the likelihood of curative therapy is greater.
Acknowledgments-We thank Dr. Jieping Yang for kind help with the lung cancer cell experiments. We thank David Akin for assistance with clinical sample collection, processing, and storage. DTW is also supported by the Felix & Mildred Yip Endowed Professorship.