In-depth Proteomic Analysis of Six Types of Exudative Pleural Effusions for Nonsmall Cell Lung Cancer Biomarker Discovery

Pleural effusion (PE), a tumor-proximal body fluid, may be a promising source for biomarker discovery in human cancers. Because a variety of pathological conditions can lead to PE, characterizing the relative proteomic profiles of different types of PEs would accelerate the discovery of PE biomarkers specific for the diagnosis of pulmonary disorders. Using quantitative proteomic approaches, we identified 772 nonredundant proteins from six types of exudative PEs, including three malignant PEs (MPE, from lung, breast, and gastric cancers), one lung cancer paramalignant PE, and two benign diseases (tuberculosis and pneumonia). Spectral counting was used to semiquantify PE protein levels. Principal component analysis, hierarchical clustering, and Gene Ontology analysis of cellular processes revealed differential levels and functional profiles of proteins in each type of PE. We identified 30 candidate proteins with twofold higher levels (q < 0.05) in lung cancer MPEs than in the two benign PEs. Three potential markers, MET, DPP4, and PTPRF, were further verified by ELISA using 345 PE samples. The protein levels of these potential biomarkers were significantly higher in lung cancer MPE than in benign diseases or lung cancer paramalignant PE. The area under the receiver-operator characteristic curve for the three combined biomarkers in discriminating lung cancer MPE from benign diseases was 0.903. We also observed that PE protein levels discriminated more clearly in effusions with a positive cytological examination and that they could help rescue false-negative cytological results in the diagnosis of nonsmall cell lung cancer MPE. Western blot analysis further demonstrated that MET overexpression in lung cancer cells could contribute to the elevation of soluble MET in MPE. 
Our results collectively demonstrate the utility of label-free quantitative proteomic approaches in establishing differential PE proteomes and provide a new database of proteins that can be used to facilitate identification of pulmonary disorder-related biomarkers.

The lungs are covered by parietal and visceral pleural membranes, which enclose a small amount of fluid (10-20 ml) in the pleural cavity that helps the lungs expand and contract smoothly. Pleural effusion (PE), an accumulation of pleural fluid, contains proteins that originate from the plasma filtrate or are released by inflammatory or epithelial cells. PE is triggered by a variety of etiologies, including malignancies and benign diseases such as pneumonia (PN), tuberculosis (TB), pulmonary embolism, heart failure, renal dysfunction, and autoimmune disease (1). Based on their biochemical characteristics, PEs are classified as transudative or exudative; determination of the PE type is a crucial step in the differential diagnosis and management of PEs. Transudative effusions, generally caused by systemic diseases, can be effectively distinguished from exudative PEs using the established modified Light's criteria (2,3). However, further discrimination among different exudate types such as malignant and nonmalignant effusions (e.g. paramalignancies or acute and chronic inflammatory diseases) is sometimes diagnostically challenging because of similar biochemical and/or cellular profiles. For example, neutrophil-rich fluid is generally observed in patients with bacterial PN, whereas lymphocytic effusions are generally observed in cancer or chronic inflammatory diseases such as TB (4).
PEs caused by cancer are generally divided into two categories, malignant (MPE) and paramalignant (PMPE). MPEs result when cancer cells metastasize to the pleural cavity (stage IV), wherein exfoliated malignant cells are observed in pleural fluid by cytological examination or detected in percutaneous pleural biopsy, thoracoscopy, thoracotomy, or at autopsy (5). PMPE occurs in cancer patients with no evidence of tumor invasion in the pleural space and may be caused by airway obstruction with lung collapse, lymphatic obstruction, or the systemic effects of cancer treatment (5). A high percentage of MPEs (>75%) arise from lung, breast, and ovarian cancer or lymphoma/leukemia. Lung cancer is a major etiology underlying MPE (6); however, only ~40-87% of patients with MPE can be accurately diagnosed upon initial examination (7). Inaccurate diagnosis of MPE and PMPE underestimates or overestimates the disease stage and leads to inappropriate therapy. Thus, it is important to identify a specific and powerful biomarker to distinguish MPE from benign diseases and PMPE.
Notably, tumor-proximal body fluids are promising sources for biomarker discovery because they represent a reservoir of in vivo tumor-secreted proteins without the large dynamic range or complexity of plasma or serum (8). Tumor-proximal fluids include PEs, nipple aspirate, stool, saliva, lavage, and ascites fluid. Previously, we used high-abundance protein depletion followed by one-dimensional SDS-PAGE combined with nano-LC-MS/MS (GeLC-MS/MS) to generate a comprehensive MPE proteome data set from 13 pooled nonsmall cell lung cancer (NSCLC) patients (9). Because a variety of pathological conditions can lead to exudative effusions, generating different PE proteomic profiles would accelerate discovery of potential PE biomarkers that can be used to discriminate between malignant and nonmalignant pulmonary disorders. The aim of this study is to establish differential PE proteomes from six types of exudative PEs, including three MPEs (from NSCLC, breast, and gastric cancers), one PMPE from NSCLC, and two benign diseases (TB and PN), using a label-free semiquantitative proteomics approach. Our results were verified by clinical validation of three potential biomarkers using an enzyme-linked immunosorbent assay (ELISA; Fig. 1).

EXPERIMENTAL PROCEDURES
Patient Population and Clinical Specimens-This study was approved by the Institutional Review Board for Research Ethics at the Chang Gung Memorial Hospital, Linkou, Tao-Yuan, Taiwan. Written informed consent was received from all patients prior to sample collection. Medical records of patients were reviewed, and all patient identities were protected. All PE samples were obtained from patients subjected to PE aspiration at Chang Gung Memorial Hospital, Linkou, Tao-Yuan, Taiwan. Patients with PMPE were radiologically monitored regularly over 6 months to exclude the possibility of occult malignancy within the effusion. For biomarker discovery, we used 60 PEs: 10 lung adenocarcinoma MPEs, 10 lung adenocarcinoma PMPEs, 10 TB PEs, 10 PN PEs, 10 gastric cancer (GC) PEs, and 10 breast cancer (BC) PEs. Demographics of these 60 patients are summarized in supplemental Table S1. To validate potential biomarkers by ELISA, 345 PE samples from six types of PE were used: 109 MPEs and 43 PMPEs from NSCLC, 61 TB, 68 PN, 45 breast cancer, and 19 gastric cancer. Demographics of these individuals, including age, gender, and smoking behavior, are summarized in supplemental Table S2. PE samples were centrifuged at 2000 × g for 15 min at 4°C. The cell-free supernatants were transferred to a new tube with a protease inhibitor mixture (Roche, Mannheim, Germany, cat. no. 11836145001) and stored at −80°C until analysis. To detect protein expression in lung cancer tissues by Western blotting, four pairs of specimens of surgically resected primary lung adenocarcinoma lesions and adjacent noncancerous tissues were obtained from four patients (two stage IA and two stage IV). The fresh frozen tissues were stored at −80°C until analysis.
1D-SDS-PAGE and In-gel Protein Digestion-After depletion of high-abundance proteins, equal protein amounts obtained from each PE sample type were pooled (10 patients/group), and 40 μg protein samples were resolved on 10% SDS-PAGE and stained by Coomassie Brilliant Blue G-250 (AppliChem GmbH, Darmstadt, Germany). The entire gel lane was cut into 30 pieces and subjected to in-gel tryptic digestion as described previously (9). Briefly, gel pieces were destained in 50 mM NH4HCO3/ACN (3:2, v/v) three times for 25 min each and then dehydrated in ACN and dried in a SpeedVac. In-gel proteins were reduced with 10 mM dithiothreitol in 25 mM NH4HCO3 at 56°C for 45 min, allowed to stand at room temperature (RT) for 10 min, and then alkylated with 55 mM iodoacetamide in the dark for 30 min at RT. After the proteins were digested by sequencing grade modified porcine trypsin (1:100; Promega, Madison, WI) overnight at 37°C, peptides were extracted from the gel with ACN to a final concentration of 50%, dried in a SpeedVac, and then stored at −20°C until further use.
Reverse-Phase LC-MS/MS-After trypsin digestion, each peptide mixture was reconstituted in high-performance LC buffer A (0.1% formic acid; Sigma, St. Louis, MO), loaded into the trap column (Zorbax 300SB-C18, 0.3 × 5 mm; Agilent Technologies) at a flow rate of 20 μl/min, and washed with buffer A at a flow rate of 20 μl/min for 10 min. Desalted peptides were then separated on a 10 cm analytical C18 column (75 μm inner diameter; New Objective, Woburn, MA). The peptides were eluted by a linear gradient of 5-30% buffer B (99.9% ACN containing 0.1% formic acid) for 47 min, 30-45% buffer B for 5 min, 45-95% buffer B for 2 min, and 95% buffer B for 4 min at a flow rate of 0.25 μl/min. The LC setup was coupled with a 2D-linear ion trap LTQ-Orbitrap MS (Thermo Fisher, San Jose, CA) operated by Xcalibur 2.0.7 software (Thermo Fisher). The MS full scan was performed over a range of 400-2000 Da and at a resolution of 60,000 at m/z 400. Internal calibration was performed using the (Si(CH3)2O)6H+ ion signal at m/z 445.120025 as a lock mass (10). A data-dependent procedure that alternated between one MS scan and 10 MS/MS scans for the 10 most abundant precursor ions in the MS survey scan was applied. The m/z values selected for MS/MS were dynamically excluded for 40 s; the electrospray voltage applied was 1.8 kV. Both MS and MS/MS spectra were acquired using a microscan with maximum fill times of 1000 and 150 ms for MS and MS/MS analyses, respectively. Automatic gain control was used to prevent overfilling the ion trap; 5 × 10^3 ions were accumulated in the ion trap for generation of MS/MS spectra. For triplicate GeLC-MS/MS analysis, PE samples were subjected to three independent rounds of 1D-SDS-PAGE and in-gel protein digestion followed by reverse-phase LC-MS/MS analysis.
Database Searching-The resulting MS/MS spectra were searched using the Mascot algorithm (version 2.2.06, Matrix Science, London, U.K.) against the Swiss-Prot database (released Jun 15, 2010, selected for Homo sapiens, 20367 entries). The search parameters were set as follows: carbamidomethylation (C) as the fixed modification, oxidation (M) as the variable modification, 10 ppm for MS tolerance, 0.5 Da for MS/MS tolerance, and one missed cleavage allowed. Validation of MS/MS-based peptide and protein identifications was completed with Scaffold proteome software (version 3.3.2, Proteome Software Inc., Portland, OR), in which the peptide and protein probability thresholds were set at a minimum of 95.0% with a minimum of two peptides.
Calculation of Spectral Counts and Bioinformatic Analysis-To generate comparative PE proteome databases, label-free semiquantitation of protein levels was determined by spectral counts. The number of spectra assigned to each protein was exported from the Scaffold software. The total spectral count was calculated by summing the spectral counts obtained in each experiment (30 LC-MS/MS runs). The normalized spectral count of each protein in the experiment was obtained by dividing the spectral count of a given protein by the total spectral count of the experiment. After normalization, a q value was used to evaluate significant differences between different types of PEs. First, we determined the p value by β-binomial modeling (11), followed by the calculation of a q value according to Storey et al. (12); q < 0.05 between different PE sample types was considered significantly different. The fold change was determined by dividing the average spectral counts of any two different PE sample types. Not every protein was identified in every experiment (triplicate for each PE sample type); unidentified proteins or missing values in a particular experiment were assigned a spectral count of one to avoid dividing by zero and to prevent overestimation of fold changes. After quantification analysis, differentially expressed proteins of interest were converted into Swiss-Prot accession numbers and uploaded into MetaCore Version 6.13 build 61585 (GeneGo, St. Joseph, MI) for Gene Ontology (GO) analysis of cellular processes.
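The normalization and fold-change steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the example counts are hypothetical, the fold change is computed here on raw counts (the text does not state whether raw or normalized counts were used), and the β-binomial p value and q value calculations are not reproduced.

```python
from statistics import mean

def normalize(counts):
    """Divide each protein's spectral count by the experiment's total count."""
    total = sum(counts.values())
    return {protein: c / total for protein, c in counts.items()}

def fold_change(counts_a, counts_b, floor=1):
    """Ratio of average spectral counts between two PE types.

    Missing values (zero counts) are replaced by `floor` (one, as in the
    text) so the ratio never divides by zero.
    """
    avg_a = mean([max(c, floor) for c in counts_a])
    avg_b = mean([max(c, floor) for c in counts_b])
    return avg_a / avg_b

# Hypothetical triplicate counts for one protein in two PE types.
mpe_counts = [12, 10, 14]
benign_counts = [0, 2, 1]   # the zero is treated as a count of one
ratio = fold_change(mpe_counts, benign_counts)
```

Flooring at one keeps a protein that is absent in one group from producing an infinite ratio, at the cost of slightly understating large differences.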
Cluster Analysis of PE Proteomes-All spectral counts were imported into Microsoft Excel and transformed to Z-scores, a common normalization approach used in microarray data analysis (13). Z-scores were calculated as $Z = (X - \mu_x)/\sigma_x$, where $X$ represents an individual spectral count, $\mu_x$ represents the mean of the spectral counts for an identified protein across the different PE types, and $\sigma_x$ is the standard deviation associated with $\mu_x$. A spreadsheet containing the Z-scores was uploaded to Partek® Genome Suite (Partek Inc., St. Louis, MO) and analyzed using principal component analysis (PCA) as well as a two-way hierarchical clustering algorithm (HCL); the parameters used in HCL were Pearson distance and Ward's aggregation method. All quantified proteins were arranged into mock phylogenetic trees (dendrograms), wherein the y axis displays PE triplicates and the x axis shows proteins.
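As a small illustration of the Z-score transformation, the sketch below standardizes the spectral counts of one protein across samples. The function name and the example values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize one protein's spectral counts across PE samples."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption)
    if sigma == 0:
        # A protein with identical counts everywhere carries no contrast.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical counts of one protein in three samples.
example = z_scores([2, 4, 6])
```

After this step every protein has mean zero and unit variance across samples, so abundant proteins do not dominate PCA or clustering.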
ELISA-Hepatocyte growth factor receptor (MET) and dipeptidyl peptidase IV (DPP4) protein levels in PEs were determined by sandwich ELISA (R&D Systems, Minneapolis, MN). Receptor-type tyrosine-protein phosphatase F (PTPRF) levels in PEs were determined by an in-house ELISA as described by Whitmore (14). Briefly, monoclonal rat anti-human PTPRF antibodies (MAB3004; R&D Systems), polyclonal goat anti-human PTPRF antibodies (AF3004; R&D Systems), and purified NS0-derived recombinant human PTPRF (aa27-1251; R&D Systems) were used. Monoclonal rat anti-human PTPRF antibodies were coated on ELISA plates (250 ng per well) overnight at 4°C, followed by six washes using wash buffer (0.05% Tween 20 in PBS) and blocking with reagent diluent (1% bovine serum albumin in PBS) for 2 h at room temperature (RT). PE samples diluted 1:50 in reagent diluent and various amounts of recombinant PTPRF (standards; 0.3125-20 ng/ml) were added to the wells, followed by incubation for 1 h at RT. After another six washes, polyclonal goat anti-human PTPRF antibodies (1:400) were added and incubated on a shaker for 1 h at RT. After washing, horseradish peroxidase (HRP)-labeled donkey anti-goat IgG antibodies (Santa Cruz Biotechnology, Santa Cruz, CA) were added to the wells and incubated for 1 h at RT. After six washes, tetramethylbenzidine substrate was added to the wells for 10 min on a shaker in the dark, and the reaction was stopped by the addition of 2 N H2SO4. The resulting signals were measured with a SpectraMax® M5 Multi-Mode Microplate Reader (Molecular Devices, Sunnyvale, CA) at 540 and 450 nm. The performance of the ELISA was examined by the intra-plate and inter-plate coefficient of variation (CV%) values determined using five individual PE samples. The results are shown in supplemental Fig. S1. All of the ELISA assays were performed twice for each sample. 
In addition, four PE samples were used as the internal standards in every ELISA plate, and all the protein levels determined in each batch were normalized based on these internal standards.
Cell Culture-CL1-0 and CL1-5 cells were kindly provided by Dr. P.C. Yang (Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan, Republic of China). The CL1 cell line was established from a 64-year-old man with a poorly differentiated adenocarcinoma, and the subpopulations CL1-0 and CL1-5 were selected from CL1 cells according to their differential invasiveness and metastatic ability in vitro and in vivo (15). The human breast cancer cell line MDA-MB-231 was obtained from the American Type Culture Collection (ATCC, Rockville, MD). CL1-0 and CL1-5 cells were cultured in RPMI 1640 (Invitrogen, Grand Island, NY) supplemented with 10% FBS. MDA-MB-231 cells were cultured in DMEM (Invitrogen) supplemented with 10% FBS. All cells were cultured at 37°C in a humidified atmosphere of 95% air/5% CO2. Conditioned media (CM) from the various cancer cell lines were collected and processed as previously described (16).
Western blot-Protein samples prepared from CL1-0, CL1-5, and MDA-MB-231 cancer cell lines were separated on a 7.5% SDS-PAGE gel, then transferred onto a PVDF membrane, blocked with skim milk for 1 h, and incubated for 16 h at 4°C with goat anti-MET polyclonal (R&D Systems, cat. BAF358) or rabbit anti-MET monoclonal (Spring Bioscience, Pleasanton, CA, cat. M3440) antibodies. Bound primary antibodies were detected by HRP-labeled donkey anti-goat (Santa Cruz Biotechnology, Santa Cruz, CA) or HRP-labeled donkey anti-rabbit IgG secondary antibodies (GE Healthcare), respectively. Target proteins were visualized using an enhanced chemiluminescence system (Millipore).
Statistical Analysis-All data were processed using SPSS 12.0 (SPSS Inc., Chicago, IL). All continuous variables were expressed as the mean ± standard deviation (S.D.). Protein levels in different PE types were compared using the nonparametric Mann-Whitney U test, and linear regression was used to analyze variations in ELISA results with respect to different clinical parameters; p < 0.05 was considered statistically significant. Receiver-operator characteristic (ROC) curves were constructed by plotting sensitivity versus 1-specificity, and the areas under the curves (AUC) were analyzed by the Hanley and McNeil method (17). The optimal cut off for establishing an accuracy score in each case was determined using Youden's index (J) (18). The ROC and AUC values of combined biomarker candidates were calculated using binary logistic regression. Briefly, we applied binary logistic regression to calculate a probability from the protein concentrations of the combined biomarkers, and this probability was then used to obtain the ROC and AUC values for the combined biomarkers. To analyze the AUC performance of individual and combined markers, we applied PanelComposer, a web-based panel construction tool developed by Professor Young-Ki Paik's research team for multivariate analysis of disease biomarker candidates (19). In this method, pairwise comparison between a panel of biomarkers and the individual proteins is performed using the Mann-Whitney U test and represented with a p value.
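To make the cut-off selection concrete, the sketch below computes Youden's index, J = sensitivity + specificity − 1, at every candidate cut off and returns the one that maximizes J. It also estimates the AUC as the probability that a positive sample scores higher than a negative one, which is a plain rank-based estimate and not the SPSS or PanelComposer procedure used in the study. The function names and the concentrations are hypothetical.

```python
def split_by_label(scores, labels):
    """Separate scores into positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg

def youden_cutoff(scores, labels):
    """Return (J, cutoff, sensitivity, specificity) at the maximal J.

    A sample is called positive when its score is >= the cutoff.
    """
    pos, neg = split_by_label(scores, labels)
    best = None
    for cutoff in sorted(set(scores)):
        sensitivity = sum(s >= cutoff for s in pos) / len(pos)
        specificity = sum(s < cutoff for s in neg) / len(neg)
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best

def auc(scores, labels):
    """Rank-based AUC: P(positive > negative), ties counted as one half."""
    pos, neg = split_by_label(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical marker concentrations (ng/ml); 1 = malignant, 0 = benign.
scores = [100, 140, 180, 200, 220, 240, 260, 300, 340]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]
j, cutoff, sens, spec = youden_cutoff(scores, labels)
area = auc(scores, labels)
```

For a marker panel, the same two functions could be applied to the predicted probabilities from a logistic regression model in place of a single concentration.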

Generation of Proteomic Data Sets from Six PE Types-To accelerate discovery of potential PE biomarkers, we performed quantitative PE proteomics analysis (immunodepletion and GeLC-MS/MS) on samples from patients with six different PE types, including MPE and PMPE from NSCLC, TB, PN, breast cancer, and gastric cancer (Fig. 1). Validation of depletion efficacy and homogeneity of fractionated PE samples are shown in supplemental Fig. S2. Ten samples from each PE type were pooled and subjected to 1D-SDS-PAGE analysis and in-gel protein digestion followed by reverse-phase LC-MS/MS analysis; spectral searching identified 772 nonredundant proteins with high confidence (95.0% minimum peptide probability, 95.0% minimum protein probability, and a minimum of two peptides). The reliability of the PE data sets was confirmed by the protein and peptide false discovery rate (FDR; supplemental Table S3). The highest FDRs for protein and peptide identification were 0.74% and 0.13%, respectively. The overlap of identified proteins between any two of three independent LC-MS/MS experiments for each type of PE was ~84% (supplemental Fig. S3). Detailed information for proteins identified in triplicate experiments of six pleural effusion samples is shown in supplemental Table S4; this information includes name, gene symbol, protein probability, best peptide probability, unique peptide number, unique spectral number, spectral counts, sequence coverage, best Mascot ion score, and peptide sequence. A list of the 772 nonredundant proteins identified in the different PE sample types is summarized in supplemental Table S5. The overlap and intersection of the PE proteomes were analyzed. A total of 721 proteins were identified in NSCLC-MPE or benign disease (TB and PN); 472 were common to these two types of PEs (supplemental Fig. S4A). There were 559 proteins identified in NSCLC-MPE or NSCLC-PMPE. Of these, 387 were detected in both types of PEs, yielding an overlap of 69.2% (supplemental Fig. S4B). 
We identified 633 proteins in NSCLC or nonlung cancer (BC and GC), and 477 were overlapping in these two PE types (supplemental Fig. S4C). Furthermore, the identified proteins of five PE types (NSCLC-MPE, TB, PN, NSCLC-PMPE, and nonlung cancer) were analyzed for the overlap. Among the 772 unique proteins identified in the present study, 363 (47.0%) were common to all types of PEs (supplemental Fig. S4D).
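The overlap percentages quoted above are consistent with the number of shared proteins divided by the number of proteins found in either proteome. A minimal sketch of that calculation follows; the function name and the toy protein sets are hypothetical, and only the counts 387/559 and 363/772 come from the text.

```python
def overlap_percent(proteome_a, proteome_b):
    """Shared proteins as a percentage of all proteins found in either set."""
    a, b = set(proteome_a), set(proteome_b)
    return 100 * len(a & b) / len(a | b)

# Toy example with hypothetical identifications.
toy = overlap_percent({"ALB", "MET", "DPP4"}, {"MET", "DPP4", "PTPRF"})

# Reproducing the reported figures directly from the counts in the text.
mpe_vs_pmpe = round(100 * 387 / 559, 1)   # NSCLC-MPE vs NSCLC-PMPE
all_types = round(100 * 363 / 772, 1)     # common to all PE types
```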
PCA and HCL Analysis of Identified Proteins-To generate differential proteome data sets, we used spectral counts to build label-free semiquantitative proteomic data sets for the six PE types. To examine the accuracy of quantification and reproducibility, we analyzed the correlation (r value) and CV% values among the three independent GeLC-MS/MS experiments. For each PE type, the r value between any two of the three independent GeLC-MS/MS experiments was higher than 0.987 (supplemental Fig. S5). The average CV% value of the 772 quantified proteins obtained from the three independent GeLC-MS/MS experiments of six PE types was 20.87%, and the CV% value for each quantified protein is shown in supplemental Table S6. All 772 proteins were included in the PCA and HCL analysis; none was excluded. In the PCA, the different PE types clearly separated into six groups, and the triplicate experiments of each PE type grouped together (Fig. 2A). The resulting heat map and HCL dendrogram also showed a similar separation of the six PE types as well as a clustering of triplicate experiments (Fig. 2B). Notably, the PEs from different cancers (lung, breast, and gastric cancers) clustered together but were separated from benign diseases (TB and PN), demonstrating distinct protein profiles in each PE type. Furthermore, these results support the high reproducibility of our triplicate experiments, suggesting that GeLC-MS/MS combined with spectral counting is a reliable method for generating both quantitative and qualitative PE proteomic data sets.
Comparison of Differentially Expressed Proteins in Pulmonary Disorders-Multiple lines of evidence indicate a link between inflammation and cancer. Additionally, inflammatory microenvironments produced by chronic infections have been suggested to lead to cancer-related inflammation. Consequently, discovery of specific PE biomarkers that distinguish between lung cancer and benign diseases is necessary to assist in the diagnosis of pulmonary disorders. Therefore, we focused our analysis on differentially expressed proteins in MPE from NSCLC (NSCLC-MPE), TB, and PN. We [...] Table S8). The pathophysiological status and potential pulmonary disorder diagnostic value of these proteins were assessed using MetaCore bioinformatic software to analyze the GO cellular processes involved in chronic (TB) and acute (PN) inflammatory diseases. The three most significant cellular processes for the 90 proteins up-regulated in TB compared with NSCLC-MPE were those related to stress, immune system process, and defense response (Fig. 2C), whereas for the 195 proteins up-regulated in PN compared with NSCLC-MPE, the processes were response to stress and catabolic and immune processes. However, the three most significant cellular processes of the 75 up-regulated proteins in NSCLC-MPE compared with benign diseases (TB and PN) (Fig. 3) were related to cell adhesion, biological adhesion, and regulation of anatomical structure morphogenesis. These results further confirm that integration of multiple quantitative PE proteomes coupled with bioinformatic analysis is feasible and valuable for the discovery of potential pulmonary disorder biomarkers. Detailed information pertaining to GO cellular processes, p values, and the protein list is summarized in supplemental Table S9.
Selection of Potential MPE Biomarkers for NSCLC-To identify malignancy-related proteins in NSCLC, we first combined 63 proteins that were increased in NSCLC-MPE compared with TB (Fig. 3; supplemental Table S7) with 42 proteins that were increased in NSCLC-MPE relative to PN (Fig. 3; supplemental Table S8). Accordingly, we identified 30 potential proteins that had higher protein levels (twofold change, q < 0.05) in NSCLC-MPE compared with these two benign diseases (Table I). To narrow down the possible candidates that could be used for efficient NSCLC diagnostics, we established the following protein candidate selection criteria: First, we selected proteins with a twofold increase (q < 0.05) in NSCLC-MPE relative to NSCLC-PMPE (supplemental Fig. S6C); discrimination between MPE and PMPE has profound implications in the therapy and prognosis of lung cancer. Second, we stipulated that mRNA levels of candidates should be up-regulated in NSCLC tissues compared with normal tissues to establish expression levels of secreted proteins in primary cancer and noncancerous tissues. Third, the biological functions of candidates were correlated to tumorigenesis. Finally, candidates had to represent novel PE biomarkers for NSCLC and have commercially available antibodies or ELISA kits (Fig. 3). Following application of the first selection criterion, 25 candidates were identified as NSCLC biomarkers (Table I; supplemental Table S10). Next, 10 mRNA data sets of lung adenocarcinoma deposited on the Oncomine 4.4.4.4 database, a cancer microarray database with web-based data-mining characteristics (20), were examined. These data sets included Beer (21) (29), and Yamagata (30) lung. We observed that 19 of the 25 candidate proteins were dysregulated in human cancers (supplemental Table S10), which suggests that our integrated approach provides a reliable method for identifying multiple potential NSCLC biomarkers. 
Among these 25 candidate proteins, the mRNA levels of 11 candidates were overexpressed in NSCLC tissues compared with noncancerous tissues in at least five data sets (p < 0.05; Table I; supplemental Table S10). Upon application of the remaining selection criteria, three of the 11 candidates were selected for further verification by ELISA: MET, PTPRF, and DPP4.
Validation of MET, PTPRF, and DPP4 as Potential PE Biomarkers for NSCLC-Clinical verification of MET, PTPRF, and DPP4 was conducted using 345 PE samples from six PE types (supplemental Table S2; Table II). The average CV% values of the ELISA assays for duplicate measurements of the 345 PE samples for MET, PTPRF, and DPP4 were 1.75%, 3.23%, and 2.15%, respectively [...] PN). Statistical analysis showed that malignancy was an independent factor of MET, PTPRF, and DPP4 levels (all p < 0.001) in 238 lung-disease PEs (supplemental Table S12). There was no significant correlation between the PE level of MET and sex, age, or smoking history; a similar result was also observed with PTPRF. However, both age and malignancy were independent factors of DPP4 level (p = 0.017 and p < 0.001, respectively) in patients with lung diseases. Thus, these findings collectively indicate that MET and PTPRF are effective PE biomarkers that distinguish NSCLC-MPE from benign inflammatory lung diseases. The ROC curves discriminating between NSCLC-MPE and benign diseases revealed that the AUC values of MET, PTPRF, and DPP4 were 0.892, 0.803, and 0.612, respectively (p < 0.05 for all three proteins; Fig. 4D; Table III). The combination of these three markers exhibited higher diagnostic capacity than any marker alone (AUC = 0.903; Fig. 4D; Table III). To analyze the AUC performance of MET and combined markers, we applied PanelComposer, a web-based panel construction tool developed by Professor Young-Ki Paik's research team for multivariate analysis of disease biomarker candidates (19). Supplemental Table S13 compares the effectiveness of the combined markers (MET, PTPRF, and DPP4) with that of the MET protein alone in distinguishing NSCLC-MPE from benign diseases, NSCLC-PMPE, or nonlung MPE. We observed that the AUC of A (combined markers) in distinguishing NSCLC-MPE from benign diseases (TB, PN) or NSCLC-PMPE was better than that of B (MET), and the significance of the difference is denoted by the p value. 
For example, with a given specificity of 90%, the sensitivities of using MET, PTPRF, and DPP4 alone to distinguish NSCLC-MPE from benign diseases were 73.39%, 60.55%, and 29.36%, respectively. Significantly, combining the three markers enhanced the sensitivity for NSCLC-MPE detection (78.90%, supplemental Table S13) compared with an individual marker. When applying any one of the three markers with a given cut-off value to discriminate NSCLC-MPE from benign diseases, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 88.07%, 65.89%, 68.57%, and 86.73%, respectively (supplemental Fig. S7).
Next, we examined the potential application of these candidates in discriminating NSCLC-MPE from NSCLC-PMPE. The protein concentrations of MET, PTPRF, and DPP4 in NSCLC-PMPE were 157.57 ± 88.75, 96.33 ± 88.17, and 154.31 ± 54.00 ng/ml, respectively. The protein levels of these three candidates in NSCLC-MPE and NSCLC-PMPE were significantly different (p < 0.05; Table II; supplemental Table S11 and supplemental Table S15). The AUC values of MET, PTPRF, and DPP4 in distinguishing NSCLC-MPE from NSCLC-PMPE were 0.875, 0.789, and 0.761, respectively (p < 0.05 for all three proteins; Fig. 4E; Table III). MET was the marker whose AUC best discriminated NSCLC-MPE from benign diseases or NSCLC-PMPE, with cut offs of 200.755 and 186.093 ng/ml, respectively; the corresponding specificity and sensitivity are shown in Table III. Unexpectedly, the AUC values of a combination of any two or all three candidates either did not improve or improved only slightly relative to MET alone in distinguishing between NSCLC-MPE and benign diseases or NSCLC-PMPE (supplemental Table S14). To confirm the validation of these three potential PE biomarkers for distinguishing malignant (NSCLC-MPE) from nonmalignant pulmonary disorders (TB, PN, and NSCLC-PMPE), we performed the ELISA assays using an independent sample set, including 59 NSCLC-MPE, 30 benign diseases, and 28 NSCLC-PMPE. We examined the clinical characteristics of the three sample groups before performing the ELISA verification. Supplemental Table S17 shows that there was no significant difference (at the p < 0.05 level) in age, sex and gender between NSCLC-MPE and NSCLC-PMPE or benign disease (TB and PN) patients. Consistent with the results shown in Table II, MET, PTPRF, and DPP4 were validated as potential PE biomarkers for NSCLC-MPE using the second independent cohort (Table IV). 
Notably, we observed that the levels of these three potential PE biomarkers were significantly associated with malignancy (lung MPE compared with TB and PN; lung MPE compared with lung PMPE) and lung cancer histology (adenocarcinoma compared with nonadenocarcinoma). Although the sample populations of adenocarcinoma (n = 181) and nonadenocarcinoma (n = 58) are unbalanced in the current study (Table II and Table IV), we used multivariate analysis to analyze the independent factors of these three potential biomarkers. Table V shows that both MPE (p < 0.001) and histology (p = 0.032) were independent factors of MET levels; however, MPE status was the only independent factor of PTPRF and DPP4 levels. Based on these results, we conclude that the three PE protein levels may be used as potential NSCLC biomarkers. However, we emphasize that MET was a potential PE biomarker for distinguishing adenocarcinoma MPE, the most common histology type of NSCLC-MPE, from nonmalignant pulmonary diseases. [...] Fig. 4 and Table II). Notably, significant alteration of MET and PTPRF in lung malignancy (n = 109) compared with other malignancies (n = 43) was observed (Fig. 4A-4C; Table II; supplemental Table S16). The AUC values of MET, PTPRF, and DPP4 used to distinguish between NSCLC-MPE and nonlung cancer MPE were 0.787, 0.656, and 0.600, respectively (Fig. 4F; Table III). In addition, we showed that the AUC values of MET, PTPRF, and DPP4 used to distinguish NSCLC-MPE from all other tested pleural types (TB, PN, NSCLC-PMPE, BC, and GC) were 0.870, 0.709, and 0.642, respectively. These results imply that MET may be a specific lung cancer MPE biomarker; however, more samples from nonlung cancer MPE are required to support this conclusion.
Considering its potential as a PE biomarker for lung cancer, we investigated the source of MET production in PEs. The MET protein contains a disulfide link between the α-(50 kDa) and β-subunits (145 kDa), forming an α/β-heterodimer (31-33). Notably, soluble MET (sMET) is generated via ectodomain shedding, in which the β-subunit is proteolytically cleaved and released (34). We confirmed that the molecular weight of the MET isoform detected in our PE sample was equivalent to that of sMET (90 kDa; Fig. 5A). Because met gene overexpression has been reported in lung cancer (35,36), we posited that increased sMET results from overexpression and/or proteolysis of membrane-bound MET (mMET) in lung cancer tissues. To test this hypothesis, we examined the expression of mMET in lung cancer cell lines and tissues. Western blot analysis showed that mMET levels in crude extracts and sMET levels in conditioned media of CL1-5 lung adenocarcinoma cells (high malignancy) were higher than those of CL1-0 lung adenocarcinoma cells (low malignancy; Fig. 5A). Advanced stage cancer tissues (stage IV) also showed high levels of mMET compared with adjacent normal tissue (Fig. 5B). These results suggest that sMET was derived from cancerous cells and tissues and support the positive correlation we observed between MET levels and cancer malignancy.

The PE Protein Levels as a Useful Adjunct if Combined with Cytological Evaluation in Diagnosis of NSCLC with Pleural Cavity Metastasis-Considering that cytological examination is one of the gold standards for diagnosis of pleural cavity metastasis in NSCLC, it is worthwhile to examine the potential clinical applications of PE biomarkers when combined with cytological examination. We observed that the sensitivity of cytological examination for NSCLC-MPE diagnosis in the present study was 73.80%, indicating that only 124 of 168 NSCLC-MPE samples were cytology-positive. When any one of the three biomarkers with a given cut-off value (MET: 186.093 ng/ml; PTPRF: 78.113 ng/ml; DPP4: 191.036 ng/ml) was applied for NSCLC-MPE diagnosis, the sensitivity was 93.45% (157/168, supplemental Fig. S8A). This result indicates that 39 of the 44 NSCLC-MPE samples missed by cytological examination were rescued by PE biomarkers. When cytological examination was combined with PE biomarkers, the sensitivity was 97.02% ([(124 + 39)/168] × 100% = 97.02%) (supplemental Fig. S8A). We also showed that the sensitivities of repeated cytological examination, adjunctive methods (pleural biopsy, pleural seeding nodules), and any one of the three markers in the diagnosis of these 44 NSCLC-MPE samples missed by cytological examination were 77.3% (34/44), 22.7% (10/44), and 88.6% (39/44), respectively (supplemental Fig. S8B). In addition, when we applied the same cut-off values of the three PE biomarkers described above to discriminate NSCLC-MPE from all other tested pleural types in the current study (TB, PN, NSCLC-PMPE, BC, and GC), the sensitivity, specificity, PPV, and NPV were 93.45%, 27.55%, 42.43%, and 88.04%, respectively (supplemental Fig. S9). These results collectively suggest that cytological examination (100% specificity and appropriate sensitivity) combined with PE biomarkers (high sensitivity and NPV) would improve the overall clinical diagnostic efficacy for NSCLC-MPE.
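The sensitivity figures above follow directly from the sample counts given in the text. The short Python sketch below recomputes them; it is an illustration only, and the helper name `pct` and the variable names are our own assumptions rather than part of the study's analysis pipeline. Note that 124/168 rounds to 73.81% at two decimals, whereas the text reports 73.80%, presumably a truncated value.

```python
def pct(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator

# Counts taken from the text above (names are illustrative).
total_mpe = 168            # all NSCLC-MPE samples
cytology_positive = 124    # detected by cytological examination
marker_positive = 157      # detected by any one of the three markers
rescued_by_markers = 39    # cytology-negative but marker-positive

# Samples missed by cytology (false negatives of cytology).
missed_by_cytology = total_mpe - cytology_positive

sens_cytology = pct(cytology_positive, total_mpe)
sens_markers = pct(marker_positive, total_mpe)
sens_combined = pct(cytology_positive + rescued_by_markers, total_mpe)

# Sensitivity within the cytology-negative subgroup only.
sens_rescue = pct(rescued_by_markers, missed_by_cytology)

for label, value in [
    ("cytology alone", sens_cytology),
    ("any one marker", sens_markers),
    ("cytology + markers", sens_combined),
    ("markers in cytology-negative samples", sens_rescue),
]:
    print(f"{label}: {value:.2f}%")
```

The combined sensitivity counts each sample once: the 124 cytology-positive samples plus the 39 samples that only the markers detected.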
Notably, the levels of MET and PTPRF in NSCLC-MPE patients with positive cytology results (n = 124) were significantly higher than in patients with negative cytology results (false negative, n = 44) (Table VI). Moreover, the protein levels of all three potential PE markers (MET, PTPRF, and DPP4) in NSCLC-MPE patients with negative cytology results (n = 44) were significantly higher than the levels in NSCLC-PMPE (true negative, n = 71) (Table VI). These results suggest that the PE protein markers were more clearly discriminated in effusions in which the cytology sample was positive. Accordingly, we propose that PE protein levels would provide a useful adjunct if combined with cytological evaluation in diagnosis of NSCLC with pleural cavity metastasis.

DISCUSSION
Lung cancer is the most common malignancy in humans and the leading cause of cancer-related death worldwide (37). Previous studies collectively demonstrate the utility of MPE proteomics in biomarker discovery for human cancers (8). The first proteomics study on MPE was published in 2004; in this study, Bard et al. focused on identification of protein components derived from MPE exosomes in three cancer types (mesothelioma, lung, and breast cancer) by MALDI-TOF. This study reported 18 proteins identified in exosomes isolated from the PE of one lung cancer patient (38). Previously, we created a comprehensive MPE proteome data set with 482 proteins and established the clinical relevance of potential biomarkers in NSCLC (9). Wang et al. recently also identified 16 differentially expressed proteins between lung adenocarcinoma and benign inflammatory PEs by two-dimensional difference gel electrophoresis combined with MALDI-TOF (39). We identified/quantified 15 of the 16 (93.75%) differentially expressed proteins reported by Wang et al.; only one protein, Jumonji domain containing 5, was not identified in our PE proteome (supplemental Table S18).
The present study is the first comprehensive, label-free, quantitative proteomic study of six types of exudative PEs with 772 identified/quantified proteins. Our results established differentially expressed PE proteomes from patients with malignancy (lung, breast, and gastric cancers) and nonmalignancy (TB, PN, and paramalignancy). To the best of our knowledge, the most comprehensive PE data set, with more than 1300 proteins, was recently generated by Mundt et al. (40). The authors used immunodepletion (Top-14) and narrow-range immobilized pH gradient/high-resolution isoelectric focusing (pH 4-4.25), followed by LC-MS/MS, to profile the PE proteome from mesothelioma, lung adenocarcinoma, and benign pleurisy patients. Therefore, application of multidimensional protein fractionation technology will likely be necessary to increase the number of identified PE proteins in the near future. The four proteins (AHSG, ANG, CST3, and IGFBP2) reported as potential PE biomarkers in our previous NSCLC-MPE data set (9) were also identified/quantified in the current study (supplemental Tables S4 and S5). Consistent with our previous findings, the label-free quantification of these potential PE markers revealed that the protein levels in malignant PE were higher than the levels in nonmalignant PE (TB, PN, and PMPE), although the average ratio of NSCLC-MPE/PN for IGFBP2 was 0.97 (supplemental Table S19). Current bioinformatic analyses revealed distinct expression profiles and biological processes in lung cancer and inflammatory diseases. GO cellular process analysis supports the pathophysiological status of the pulmonary disorders studied herein, including TB, PN, and malignancy. We also verified the PE levels of three potential biomarkers using two cohorts of clinical samples. Our results support the novelty of these three PE biomarkers and the utility of the currently established PE proteomes for biomarker discovery in pulmonary disorders.
The three biomarker protein candidates (MET, PTPRF, and DPP4) identified herein were selected based on five criteria (Fig. 3) and validated by ELISA analysis of two independent sample sets. MET, encoded by the met protooncogene, is a prototypical member of a subfamily of receptor tyrosine kinases and is predominantly expressed in epithelia (33). The main ligand that activates MET is hepatocyte growth factor (HGF), also known as scatter factor (41,42). Mature MET is expressed on the cell surface to facilitate ligand binding and activation of related transduction molecules through autophosphorylation of tyrosine residues in the tyrosine kinase and juxtamembrane domains. The HGF/MET pathway regulates important processes involved in cellular development, including differentiation, proliferation, motility, the cell cycle, and cell death (43,44). Inappropriate HGF signaling has been observed to dysregulate proliferation, motility, and invasion in several human malignancies (45). Specifically, treatment of lung cancer cells with HGF to activate the MET pathway has been reported to stimulate cellular motility, migration, and invasion (35,36,46).
According to our results shown in Fig. 5, we proposed that MET overexpression in advanced stage NSCLC cells elevated  (47). These phenomena were similar to those shown in a breast cancer study in which MET shedding correlated with malignancy in cultured cells and tumor burden in tumor xenograft mouse models (48). Conversely, Yang et al. demonstrated the beneficial effects of high sMET concentrations in human plasma, showing that the overall median plasma concentration of sMET in patients with gastric cancer was lower than in controls, and sMET levels decreased as the onset of cancer drew nearer (49). This study also investigated the interactions between CagA-related genes and sMET protein concentration in the development of gastric cancer, suggesting that the genetic background of different cancer types may influence the application of MET concentration in the diagnosis/prognosis of human cancers.
A second biomarker candidate in this study was PTPRF, a member of the receptor protein tyrosine phosphatase type IIa subfamily, which exclusively comprise extracellular Ig domains and fibronectin III repeats (50). This phosphatase family has been implicated in several signaling pathways by regulating receptors such as EGFR, RET, and MET (51). PTPRF is expressed in various cell types, including epithelial cells, neuronal cells, and fibroblasts (52), and plays an important role in cell adhesion and migration by directly interacting with integrins in focal adhesions (53). In addition, PTPRF expression was observed to be associated with the potential for metastasis in the well-characterized 13762NF rat mammary adenocarcinoma clones (54).
Another protein candidate herein was DPP4, also known as CD26, a widely distributed 110 kDa transmembrane glycoprotein with peptidase activity. DPP4 is expressed as a cell surface antigen in melanocytes, epithelial cells, and lymphocytes (55-57), is a functional receptor for collagen, and is essential for normal immune function (58).
Moreover, there are significant levels of DPP4 activity in plasma, serum (sCD26), and urine (59). Similar to sMET, sCD26 originates from the shedding of surface-bound CD26. Overexpression of DPP4 is observed in various human cancers, including thyroid, breast, prostate, and ovarian cancers and is involved in tumor development, invasion, and metastasis (60,61). However, previous studies have also observed that sCD26 was deficient in total homogenates of colon, kidney, lung, and liver tumors (62,63) as well as in different transformed and cancer-derived cell lines (64).
To further explore the clinical application of PE markers, we examined the correlation between PE and serum protein levels obtained from the same individuals; the levels of MET and DPP4 in PE and serum were positively correlated (p < 0.05; n = 36, supplemental Fig. S10). It is notable that MET levels in PEs (475.879 ± 650.286 ng/ml) were higher than levels in paired serum samples (228.711 ± 66.725 ng/ml), supporting the benefits of using PEs for pulmonary disease-related biomarker discovery. Although future work is warranted to identify and validate additional protein biomarker candidates using a large cohort of PE and serum/plasma samples, the 30 proteins identified as associated with malignancy in the present study (Table I) are viable candidates for establishing a useful panel of biomarkers for the diagnosis and/or prognosis of NSCLC.