Proteome Screening of Pleural Effusions Identifies Galectin 1 as a Diagnostic Biomarker and Highlights Several Prognostic Biomarkers for Malignant Mesothelioma

Malignant mesothelioma is an aggressive asbestos-induced cancer, and affected patients have a median survival of approximately one year after diagnosis. It is often difficult to reach a conclusive diagnosis, and ancillary measurements of soluble biomarkers could increase diagnostic accuracy. Unfortunately, few soluble mesothelioma biomarkers are suitable for clinical application. Here we screened the effusion proteomes of mesothelioma and lung adenocarcinoma patients to identify novel soluble mesothelioma biomarkers. We performed quantitative mass-spectrometry-based proteomics using isobaric tags for quantification and used narrow-range immobilized pH gradient/high-resolution isoelectric focusing (pH 4–4.25) prior to analysis by means of nano liquid chromatography coupled to MS/MS. More than 1,300 proteins were identified in pleural effusions from patients with malignant mesothelioma (n = 6), lung adenocarcinoma (n = 6), or benign mesotheliosis (n = 7). Data are available via ProteomeXchange with identifier PXD000531. The identified proteins included a set of known mesothelioma markers and proteins that regulate hallmarks of cancer such as invasion, angiogenesis, and immune evasion, plus several new candidate proteins. Seven candidates (aldo-keto reductase 1B10, apolipoprotein C-I, galectin 1, myosin-VIIb, superoxide dismutase 2, tenascin C, and thrombospondin 1) were validated by enzyme-linked immunosorbent assays in a larger group of patients with mesothelioma (n = 37) or metastatic carcinomas (n = 25) and in effusions from patients with benign, reactive conditions (n = 16). Galectin 1 was identified as overexpressed in effusions from lung adenocarcinoma relative to mesothelioma and was validated as an excellent predictor for metastatic carcinomas against malignant mesothelioma. Galectin 1, aldo-keto reductase 1B10, and apolipoprotein C-I were all identified as potential prognostic biomarkers for malignant mesothelioma. 
This analysis of the effusion proteome furthers our understanding of malignant mesothelioma, identified galectin 1 as a potential diagnostic biomarker, and highlighted several possible prognostic biomarkers of this disease.

Malignant mesothelioma affects tissue that covers the serous cavities of the body. Approximately 80% of mesotheliomas are of pleural origin, and exposure to high concentrations of asbestos is the most common cause. The latency period ranges from 20 to 40 years, and by the time patients present with clinical symptoms, the disease has often progressed to an advanced stage with limited treatment possibilities (1). Reaching a conclusive mesothelioma diagnosis is often difficult (1,2). The first symptom is frequently pleural effusion that needs to be drained to relieve the patient's discomfort, and this effusion is often the first biological material that is available for diagnostic analysis. Identification of soluble biomarkers of malignant mesothelioma in pleural effusions might complement the morphological examination and shorten the time needed to reach a conclusive diagnosis.
To date, several molecular markers for malignant mesothelioma have been analyzed at the tissue and cellular levels, but few markers are of value when measured in effusions or in serum. The two best-established soluble biomarkers are mesothelin, a protein also known as pre-pro-megakaryocyte-potentiating factor, and hyaluronan, which is a linear polysaccharide. Mesothelin is expressed by both benign and malignant mesothelial cells (3). This protein is proteolytically cleaved into two fragments, one that is cell bound (C-ERC/mesothelin) and one that is soluble (megakaryocyte potentiating factor or N-ERC/mesothelin). These fragments have similar diagnostic capabilities (4), with moderate specificity and sensitivity for malignant mesothelioma (5-10). Mesothelin has limited specificity for diagnosis because it is also secreted by tumors such as ovarian and pancreatic adenocarcinomas (11,12); in addition, mesothelin levels increase with age and declining renal function (13-15). Hyaluronan is synthesized by mesothelial cells, and high levels in mesothelioma effusions were noted as far back as the early 1940s (16). This linear polysaccharide is produced in the cell membrane and has a high specificity, but only moderate sensitivity, for mesothelioma (7, 16-25).
Osteopontin, also called secreted phosphoprotein-1, has been linked to mesothelioma by transcriptomics analysis (26). Although an initial study confirmed the diagnostic value of osteopontin (27), most studies ultimately found that osteopontin was insufficient for diagnostic purposes (4,28,29). Hegmans et al. used surface-enhanced laser desorption/ionization TOF-MS to identify apolipoprotein C-I in the serum of mesothelioma patients (30). With an area under the curve (AUC) of 0.76, apolipoprotein C-I showed good discriminatory properties but did not outperform C-ERC/mesothelin as a diagnostic measure. Recently, fibulin-3 was shown to have promising discriminatory capabilities for mesothelioma (31). However, further studies are needed to confirm its diagnostic usefulness. The current biomarkers identify only a proportion of mesotheliomas, and additional markers are needed to improve diagnostic sensitivity.
In this study we aimed to identify additional biomarkers for malignant mesothelioma for use in conjunction with morphological diagnosis. Accordingly, we performed discovery proteome screening of pleural effusions from mesothelioma and lung adenocarcinoma patients, and candidate biomarkers were validated in a larger patient cohort.

EXPERIMENTAL PROCEDURES
Ethics Statement-All patients included in this study provided written informed consent. The study was approved by regional ethics committees in Stockholm and at the Eskisehir Osmangazi University in Turkey.
Patients and Effusion Characteristics-Discovery Population-Pleural effusions in the discovery population were subjected to shotgun proteomics to identify biomarker candidates. Pleural drainage of effusions from patients was performed at the Chest Diseases Department of Eskisehir Osmangazi University in Eskisehir, Turkey. After collection, effusions were left for 10 min and then centrifuged at 2,000 rcf for 10 min; acellular supernatants were collected for analyses and stored without additives at −80°C. Effusions from patients diagnosed with pleurisy were collected at the Department of Laboratory Medicine, Division of Pathology, Karolinska Institutet, Stockholm, Sweden. These seven effusions were centrifuged at 1,700 rcf for 10 min, and acellular supernatants were stored without additives at −20°C before analyses. Effusions were analyzed from six patients with epithelioid mesothelioma, six patients with lung adenocarcinoma, and seven patients with benign pleurisy/mesotheliosis (Table I). All cancer diagnoses were verified by histology and immunohistochemistry, and all patients with benign pleurisy were alive without a malignant diagnosis three years after sample collection. No patients had a history of systemic diseases (such as diabetes, rheumatoid arthritis, or systemic lupus) or other malignancies and none had received chemotherapy prior to collection of the effusion samples. Cytology on cell pellets collected after centrifugation was used to ensure a high mesothelial cell content in the effusions from pleurisy patients.
Validation Population-Biomarker candidate proteins were validated in a set of effusions from nonconsecutive patients (all samples were collected in Eskisehir, Turkey, as described above) with malignant mesotheliomas (n = 37), pleural metastases of adenocarcinomas (n = 22), squamous carcinoma of the lung (n = 2), adenosquamous non-small cell lung cancer (n = 1), and benign reactive conditions (n = 16) (Table II). The patient group with pleural metastases thus comprised 25 patients in total.

The abbreviations used are: AUC, area under the curve; ANOVA, analysis of variance; CV, cross-validation; FDR, false discovery rate; HiRIEF, high-resolution isoelectric focusing; IPG, immobilized pH gradient; iTRAQ, isobaric tags for relative and absolute quantification; OPLS-DA, orthogonal partial least square discriminant analysis; PCA, principal component analysis.

Depletion of Abundant Proteins by the Multiple Affinity Removal System-The 14 most abundant serum proteins were removed using the Agilent Human 14 Multiple Affinity Removal System (MARS-14) column (Agilent Technologies, Inc., Santa Clara, CA) to reduce the dynamic range of the pleural effusions. The DC™ Protein Assay (Bio-Rad Laboratories, Inc.) was used to determine protein concentrations in each sample, and aliquots of each sample containing 1.6 mg of protein were depleted using the MARS-14 column coupled to an ÄKTA™ chromatography system (GE Healthcare) following the manufacturer's instructions. Proteins in the flow-through and retained proteins were followed using UV light absorbance (280 nm). Removal of abundant proteins was verified on NuPAGE® Novex® Bis-Tris mini gels (10%; Invitrogen) stained with Coomassie Brilliant Blue. Protein sizes were determined via comparison to a prestained protein ladder (SeeBlue®Plus2, Invitrogen).
Preprocessing of Effusion Proteins-Tryptic Digestion Using Filter-Aided Sample Preparation-All samples were individually reduced and alkylated before digestion with trypsin. In brief, samples were placed in centrifugation tubes containing Microcon YM-10 filters (10-kDa cutoff; Nanosep® Centrifugal Devices with Omega™ Membrane, 10 k). For further details, please see Ref. 32. Samples were centrifuged (14,000 rcf for 15 min) in urea buffer #1 (8 M urea, 1 mM dithiothreitol, 25 mM HEPES, pH 7.6). The material retained on the filter was then alkylated with urea buffer #2 (4 M urea, 55 mM iodoacetamide, 25 mM HEPES, pH 7.6) for 20 min at room temperature. Another centrifugation step preceded treatment with urea buffer #3 (4 M urea, 25 mM HEPES, pH 7.6). Two additional centrifugation steps were performed to wash the filters. All buffers were prepared fresh prior to use, and centrifugation was conducted at room temperature. Samples were trypsinized (Promega, Madison, WI; 0.25 M urea, 100 mM HEPES, pH 7.6 plus trypsin at an enzyme-to-protein ratio of 1:50) overnight at 37°C. Finally, the filter units were centrifuged for 15 min at 14,000 rcf and then subjected to another centrifugation with Milli-Q water. The flow-through, which contained the tryptic peptides, was collected.
Isobaric Labeling for Quantification-Labeling with isobaric tags for relative and absolute quantification (iTRAQ) was performed following the manufacturer's instructions (AB Sciex, Framingham, MA). Two iTRAQ 8-plexes were used, and 100 μg of peptides from each sample were labeled with each iTRAQ tag (114-119 and 121 Da). Both iTRAQ 113 tags were mixed with 14.3 μg of peptides from all 14 samples (for a total of 200 μg of peptides); this was divided into two 100-μg aliquots for use as an internal standard to link the two 8-plexes. The pooled iTRAQ samples were cleaned using Strata-X-C columns (Phenomenex Inc., Torrance, CA) following the manufacturer's instructions (33,34). The two different iTRAQ 8-plex samples are referred to as iPool 1 and iPool 2; each pool contained peptides from three mesotheliomas, three metastatic lung adenocarcinomas, one benign pool composed of either three or four benign effusions, and the internal standard.
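The amounts in the labeling scheme can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated above; the variable names are illustrative and are not part of the original protocol.

```python
# Figures taken from the labeling description; variable names are illustrative.
PEPTIDE_PER_SAMPLE_UG = 14.3   # amount each sample contributes to the internal standard
N_SAMPLES = 14                 # 2 plexes x (3 mesotheliomas + 3 lung adenocarcinomas + 1 benign pool)
N_PLEXES = 2

pooled_ug = PEPTIDE_PER_SAMPLE_UG * N_SAMPLES
standard_per_plex_ug = pooled_ug / N_PLEXES

# Channels per 8-plex: 3 mesotheliomas + 3 lung adenocarcinomas + 1 benign pool + 1 internal standard
channels_per_plex = 3 + 3 + 1 + 1

print(f"pooled internal standard: {pooled_ug:.1f} ug")
print(f"internal standard per 8-plex: {standard_per_plex_ug:.1f} ug")
print(f"channels per 8-plex: {channels_per_plex}")
```

The pooled amount comes to roughly 200 μg, which matches the two 100-μg aliquots used to link the 8-plexes.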
Isoelectric Focusing of Peptides-To reduce the sample complexity, the two iPools were fractionated by means of high-resolution isoelectric focusing (HiRIEF) on an ultra-narrow immobilized pH gradient (IPG) covering a pH range of 4-4.25 as described in Ref. 35 (IPG-HiRIEF kindly supplied by GE Healthcare). In silico digestions of mesothelin (UniProt entry #Q13421), osteopontin (#P10451), and CA125 (#Q8WXI7) were performed to predict a suitable pH interval that would include peptides from these proteins in the range of the IPG-HiRIEF strip. For each iPool, 390 μg of peptides were diluted in 8 M urea and bromphenol blue and applied to an IPG strip (24 cm), and the strips were allowed to swell overnight. The two IPG-HiRIEFs were run for at least 100 kVh on an IPGphor-II (GE Healthcare Bio-Sciences AB) before elution with Milli-Q water three times for 45 min into 72 fractions using a prototype liquid handling robot (GE Healthcare Bio-Sciences AB). The fractions were dried in a speed vacuum centrifuge and stored at −80°C. For further details, please see Ref. 35.
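An in silico digestion of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the tool used in the study: it applies a commonly used simplified trypsin rule (cleave after K or R unless the next residue is P), the function name is hypothetical, and the input is a toy sequence rather than a real UniProt entry. Predicting the isoelectric point of each peptide would be a separate step that is not shown.

```python
import re

def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Return tryptic peptides under a simplified rule:
    cleave C-terminal to K or R, except when the next residue is P."""
    # Zero-width split after K/R when not followed by P (requires Python 3.7+).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` additional neighbouring pieces.
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

# Toy sequence (hypothetical, for illustration only).
toy = "AKRPGKC"
print(tryptic_digest(toy))                      # fully cleaved peptides
print(tryptic_digest(toy, missed_cleavages=1))  # allowing one missed cleavage
```

The resulting peptide lists could then be passed to any isoelectric point predictor to see which peptides fall in the pH 4-4.25 window.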
Liquid Chromatography Electrospray Ionization LTQ-Orbitrap Analysis-Peptides were separated using an Agilent 1200 nano-LC system before analysis on the LTQ Orbitrap Velos (Thermo Fisher Scientific). Samples were trapped on a Zorbax 300SB-C18 and separated on an NTCC-360/100-5-153 (Nikkyo Technos, Ltd., Tokyo, Japan) nano-LC column at 0.4 μl/min using a 50-min linear gradient ranging from 3% to 40% acetonitrile in 0.1% formic acid. The LTQ Orbitrap Velos was operated in a data-dependent manner. Five precursors were selected for sequential fragmentation by collision-induced dissociation and higher-energy collisional dissociation, and the results were analyzed by the linear ion trap and Orbitrap, respectively. The survey scan was performed in the Orbitrap at a resolution of 30,000 in the profile mode from 300-2,000 m/z, using a lock mass of m/z 445.120025, with a maximum injection time of 500 ms and the automatic gain control set to 1 × 10^6 ions. For the generation of higher-energy collisional dissociation fragmentation spectra, a maximum ion injection time of 500 ms and an automatic gain control of 5 × 10^4 were used before fragmentation at 50% normalized collision energy and 100-ms activation time. For Fourier transform MS2 spectra, a normal mass range was used, centroiding the data at a 7,500 resolution. Peptides for collision-induced dissociation were accumulated for a maximum ion injection time of 200 ms and an automatic gain control of 3 × 10^4, fragmented with 35% collision energy, wideband activation on, activation q 0.25, and an activation time of 10 ms before analysis at the normal scan rate and mass range in the linear ion trap. Precursors were isolated with a width of 2 m/z and put on the exclusion list for 90 s. Single and unassigned charge states were rejected from precursor selection.
Peptide and Protein Identification-Orbitrap spectra were initially mapped by SEQUEST and MASCOT against the UniProt human canonical sequence protein database (January 18, 2011; Proteome Discoverer 1.2), and the results were filtered using a 5% false discovery rate (FDR) cutoff. Later, the spectra were searched again using the SEQUEST search engine and the Percolator algorithm with Proteome Discoverer 1.3 software (Thermo Scientific) against the UniProt human canonical sequence protein database (October 7, 2011; 56,869 entries). The results were filtered using a 5% FDR cutoff. A precursor mass tolerance of 10 ppm and product mass tolerances of 0.02 Da for higher-energy collisional dissociation Fourier transform MS and 0.8 Da for collision-induced dissociation ion trap MS were used. The search specified trypsin as the enzyme with one missed cleavage allowed; fixed modifications were alkylation of cysteine by iodoacetamide and iTRAQ 8-plex on lysine and the N terminus, and oxidation of methionine was set as a variable modification. Quantification of the iTRAQ 8-plex reporter ions was performed using Proteome Discoverer on higher-energy collisional dissociation Fourier transform MS tandem mass spectra using an integration window tolerance of 20 ppm and median centering for each sample. Only unique peptides in the dataset were used for quantification determinations. The raw data associated with this manuscript have been deposited in the ProteomeXchange Consortium via the PRIDE partner repository (36) with the dataset identifier PXD000531. For proteins identified with one unique peptide, annotated spectra can be viewed in supplementary File S1.

Table abbreviations: BAP, benign asbestos pleuritis; PPP, parapneumonic pleurisy; TBC, tuberculosis; CHF, congestive heart failure. All pleural effusions were collected at the Chest Diseases Department of Eskisehir Osmangazi University in Eskisehir, Turkey.
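For readers less familiar with relative mass tolerances, a 10 ppm precursor tolerance translates into an m/z-dependent absolute window. The sketch below shows the conversion; the function names and the example m/z values are illustrative assumptions and do not reproduce the search engine's internal logic.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_ppm(observed_mz: float, theoretical_mz: float, tol_ppm: float = 10.0) -> bool:
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

# Illustrative values only: at m/z 500, 10 ppm corresponds to 0.005 m/z units.
print(round(ppm_error(500.004, 500.0), 3))   # about 8 ppm, accepted
print(within_ppm(500.004, 500.0))
print(within_ppm(500.006, 500.0))            # about 12 ppm, rejected
```

The fragment tolerances quoted above (0.02 Da and 0.8 Da) are absolute rather than relative, so they do not scale with m/z in this way.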
Statistical Analyses and Bioinformatics-Multivariate Analysis-Multivariate analysis was performed on proteins found in both iPool 1 and iPool 2 (i.e. overlapping proteins). Principal component analysis (PCA) and orthogonal partial least square discriminant analysis (OPLS-DA) were performed using SIMCA-P+ (v.13) software (Umetrics, Umeå, Sweden). PCA was performed on unit variance-scaled and median-centered data to view the data distribution and to identify possible outliers. OPLS-DA model building is a supervised classification to determine the quantified proteins that, together, can discriminate between diseases such as mesotheliomas versus lung cancers. To refine the OPLS-DA models, proteins with low variable importance scores (<1) and proteins with nonsignificant weights (i.e. a 95% confidence interval that includes 0) were excluded in successive optimization steps. The OPLS-DA model was validated internally using 7-fold cross-validation (CV), and model validity was assessed with CV-ANOVA (with the p value indicating the probability that the model is the result of chance alone). R2Y denotes the percentage variance in Y that is explained by the model, and Q2 is the fraction of the total variation of Y that can be predicted by the model.
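The two model statistics can be written down in a few lines. The sketch below assumes the usual definitions, R2Y = 1 − RSS/TSS computed from fitted values and Q2 = 1 − PRESS/TSS computed from cross-validated predictions; it does not reproduce SIMCA's implementation, and the class labels and predictions are made-up numbers for illustration.

```python
def explained_fraction(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean of y)."""
    mean_y = sum(y_true) / len(y_true)
    ss_total = sum((y - mean_y) ** 2 for y in y_true)
    ss_resid = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_resid / ss_total

# Made-up example: class membership coded 1 (mesothelioma) or 0 (lung adenocarcinoma).
y = [1, 1, 1, 0, 0, 0]
fitted = [0.9, 0.8, 1.0, 0.1, 0.2, 0.0]        # predictions from the model fitted on all samples
cross_validated = [0.7, 0.6, 0.9, 0.3, 0.4, 0.2]  # predictions for samples held out during CV

r2y = explained_fraction(y, fitted)            # goodness of fit
q2 = explained_fraction(y, cross_validated)    # goodness of prediction
print(f"R2Y = {r2y:.2f}, Q2 = {q2:.2f}")
```

Because held-out samples are predicted less well than fitted ones, Q2 is typically lower than R2Y, and a large gap between the two suggests overfitting.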
Univariate Analysis-Significance analysis of microarrays (SAM 4.0, Stanford Tools, Stanford, CA) was used to identify deregulated proteins by assessing the fold-change and the FDR (37). Values were log2-transformed and median-centered for the analysis. The FDRs are presented as individual q-values. Significance analysis of microarrays cannot be used for groups with fewer than five samples; therefore, the analysis was performed on only the overlapping proteins from both iPools. The primary comparison was of all mesotheliomas against all lung carcinomas. The two pools of pleurisy samples were used only as a benign reference. Different datasets were superimposed using the Web-based tool BioVenn (38).
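The preprocessing described above (log2 transformation followed by median centering) and a simple group comparison can be sketched as below. This is an illustration under stated assumptions: the function names and all numbers are hypothetical, the fold-change is expressed as a difference of log2 means, and the permutation-based q-value estimation performed by the SAM software is not reproduced.

```python
import math
from statistics import mean, median

def log2_median_center(ratios):
    """Log2-transform one sample's protein ratios and subtract the sample median."""
    logs = [math.log2(r) for r in ratios]
    center = median(logs)
    return [x - center for x in logs]

def log2_fold_change(group_a, group_b):
    """Difference of mean log2 values (group_a relative to group_b) for one protein."""
    return mean(group_a) - mean(group_b)

# Hypothetical reporter-ion ratios for three proteins in a single sample.
print(log2_median_center([2.0, 4.0, 8.0]))   # [-1.0, 0.0, 1.0]

# Hypothetical centered log2 values for one protein across the two groups.
meso = [1.0, 1.2, 0.8]
lung = [0.0, 0.2, -0.2]
fc = log2_fold_change(meso, lung)
print(f"log2 fold-change = {fc:.2f}, linear fold-change = {2 ** fc:.2f}")
```

A log2 fold-change of 1 corresponds to a two-fold higher level in the first group.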
Network Analyses (DAVID and Ingenuity Pathway Analysis™)-Selected proteins were uploaded into the DAVID database (Database for Annotation, Visualization and Integrated Discovery) to evaluate possible associations with known pathways (39). DAVID utilizes data from other databases, such as KEGG and BioCarta, to determine a protein group's involvement in biological processes.
Deregulated proteins (fold-change of ≥1.5 or ≤−1.5) from the overlap of the iPools were entered into the Ingenuity Pathway Analysis software (Ingenuity Systems, Qiagen, Hilden, Germany), a network-building and pathway annotation tool. Ingenuity Pathway Analysis employs Fisher's exact test to calculate the probability that chance alone explains the overlap of identified proteins and canonical pathways. Associations with a p value < 0.05 were considered statistically significant.
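The logic of such an overlap test can be illustrated with a right-tailed Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below is one common formulation and is not taken from the Ingenuity software; the function name and the counts in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(n_background: int, n_pathway: int,
                       n_selected: int, n_overlap: int) -> float:
    """Probability of observing at least n_overlap pathway members among
    n_selected proteins drawn at random from n_background proteins,
    of which n_pathway belong to the pathway."""
    total = comb(n_background, n_selected)
    max_overlap = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical counts: 20 background proteins, 5 in the pathway,
# 5 deregulated proteins selected, 3 of which are pathway members.
p = enrichment_p_value(20, 5, 5, 3)
print(f"p = {p:.4f}")
```

With these toy counts the overlap would not reach the p < 0.05 threshold used above.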
Validation of Biomarker Candidates in an Extended Patient Group-Candidate Selection-Proteins with high fold-changes and q-values of 0% were automatically considered prime biomarker candidates. A second selection accepted higher q-values, and the emphasis was on proteins that were selected by the OPLS-DA models or that seemed to be biologically relevant to mesothelioma or cancer. "Suggested biological relevance" was defined as top-scoring proteins involved in known pathways (as determined in the Ingenuity Pathway Analysis and DAVID databases) or that seemed relevant based on evidence in the scientific literature.
Enzyme-linked Immunosorbent Assay-A subset of selected candidate biomarkers were validated in a larger set of patient effusions using commercial enzyme-linked immunosorbent assays (ELISAs). Manufacturer information and the sample dilution factors are listed in supplementary Table S1.
Comparison of Groups Using One-way ANOVA-The three sample groups (mesothelioma, metastatic carcinomas, and benign conditions) were compared using the nonparametric Kruskal-Wallis test, with comparisons between groups performed using Dunn's post hoc test to obtain multiplicity-adjusted p values. Analyses were performed and figures were generated using GraphPad Prism software (v. 5.04, GraphPad Software Inc., La Jolla, CA).
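For orientation, the Kruskal-Wallis statistic can be computed by hand as sketched below. The sketch makes several simplifying assumptions that should be kept in mind: it omits the tie correction, it uses the chi-square approximation for the p value (which for three groups, i.e. two degrees of freedom, reduces to exp(−H/2)), and it does not implement Dunn's post hoc test. The data are made up, and GraphPad Prism may handle small samples differently.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic without tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Made-up biomarker levels for three groups.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
p_approx = math.exp(-h / 2)            # chi-square survival function for 2 degrees of freedom
print(f"H = {h:.2f}, approximate p = {p_approx:.4f}")
```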
Receiver Operator Characteristics-The true positive rate (sensitivity) was plotted against the false positive rate (100% − specificity), and the AUC values are reported with the 95% confidence interval as an estimate of diagnostic usefulness. The curves were created and the analyses were performed using GraphPad Prism software. These analyses were performed on all patients in the validation population. Differences between receiver operator characteristics were calculated using the StAR Web-based application. This method uses the nonparametric Mann-Whitney U-statistic to compare distributions and the AUC when computed by the trapezoidal rule (40). Patient expression levels from two reference biomarkers, hyaluronan and N-ERC/mesothelin, were extracted from an earlier study (41).
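The link between the AUC and the Mann-Whitney U-statistic mentioned above can be made concrete: the AUC equals the probability that a randomly chosen case has a higher marker level than a randomly chosen control, with ties counted as one half. The following sketch computes this directly; the function name and the values are illustrative and not taken from the study data.

```python
def roc_auc(cases, controls):
    """AUC as the Mann-Whitney probability that a case exceeds a control
    (ties contribute 0.5). Equivalent to the trapezoidal area under the ROC curve."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Made-up marker levels.
auc = roc_auc(cases=[3.0, 4.0, 5.0], controls=[1.0, 2.0, 3.5])
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.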
Survival Analysis-Survival analysis was performed for patients in the validation population. The Web-based threshold selection tool Cutoff Finder was used to determine a threshold for each biomarker based on its most significant hazard ratio (42). The Kaplan-Meier estimator was used to evaluate possible prognostic information. The log-rank (Mantel-Cox) test was performed to compare survival curves and to estimate hazard ratios. Analyses were performed and graphs were created using GraphPad Prism software.
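The Kaplan-Meier estimator itself is compact enough to sketch. The example below uses made-up survival times and takes the median survival as the first time at which the estimated survival drops to 50% or below; the function names are hypothetical, and the log-rank test and the Cutoff Finder threshold search are not reproduced.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.
    events: 1 = death observed, 0 = censored."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ev in zip(times, events) if ti == t and ev)
        leaving = sum(1 for ti in times if ti == t)   # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

def median_survival(curve):
    """First time at which survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Made-up follow-up times in months; 0 marks a censored patient.
times = [2, 4, 4, 6, 8, 10]
events = [1, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
print("median survival:", median_survival(curve), "months")
```

Applying the same functions separately to patients above and below a biomarker cutoff gives the two curves that a log-rank test would then compare.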

RESULTS
Pleural Effusion Processing before Proteome Screening-The mesothelioma and lung adenocarcinoma patients in the discovery population had similar smoking habits and asbestos exposure, although the range was wide for both patient groups. Patients in all groups were of similar age, and all were male except for three female pleurisy patients. The patient characteristics of the discovery population are shown in Table I. Affinity purification (MARS-14 column, Agilent) removed ~80% of the proteins in each sample as judged by UV absorption at 280 nm and the DC™ Protein Assay (data not shown). Coomassie Brilliant Blue staining verified an overall loss of protein load after purification, especially at the size of albumin (67 kDa; data not shown). A schematic workflow for this proteomic discovery phase is shown in Fig. 1A.
Protein Detection and Identification-We used IPG-HiRIEF with the pH interval 4.00-4.25 to simplify the samples before nano-LC-MS/MS. Each iPool's IPG-HiRIEF gel strip was fractionated into 72 parts before elution of focused peptides. For further details, please see Ref. 35. Spectra from the LTQ Orbitrap Velos were matched against the UniProt database using SEQUEST and were post-processed with Percolator. With a 5% FDR cutoff, 1,184 proteins were identified in iPool 1 and 569 were identified in iPool 2; 382 proteins overlapped (i.e. were found in both groups) (supplementary File S2 and Fig. 1B). A total of 1,371 proteins were identified in the screened pleural effusions.
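The reported total follows from the two identification counts and their overlap by inclusion-exclusion, as the short check below shows. Only the three counts stated above are used; the variable names are illustrative.

```python
# Counts reported for the two iTRAQ pools at a 5% FDR cutoff.
n_ipool1 = 1184
n_ipool2 = 569
n_overlap = 382

# Inclusion-exclusion: proteins in either pool, counting shared ones once.
n_total = n_ipool1 + n_ipool2 - n_overlap
only_ipool1 = n_ipool1 - n_overlap
only_ipool2 = n_ipool2 - n_overlap

print(f"unique to iPool 1: {only_ipool1}")
print(f"unique to iPool 2: {only_ipool2}")
print(f"total identified:  {n_total}")
```

The 382 shared proteins form the dataset on which the multivariate and univariate analyses were run.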
Statistical Analyses and Bioinformatics-Multivariate Analysis-Unsupervised PCA-PCA showed high homogeneity for most samples. One of the lung cancers from iPool 1 was flagged as a possible outlier (sample L118a in Fig. 2A), and two mesothelioma patients showed a different distribution than the bulk of the patients (M115a and M115b in Figs. 2A and 2B). Proteins clustering around the L118a lung cancer sample were uploaded to the DAVID database, and significant BioCarta pathways were involved in the complement system and in prothrombin-activated coagulation. However, the fluid was not visibly contaminated with blood/hemoglobin (i.e. it was a milky yellow color). Proteins separating the two M115 mesothelioma patients from the rest were involved in the glycolysis pathway, the complement and coagulation cascades, and response to hypoxia-induced stress. Uploading all proteins into DAVID showed associations with the complement system but not with prothrombin-activated coagulation. No clustering of disease groups could be discerned with PCA. There were no outliers when PCA was performed with only mesothelioma and lung adenocarcinoma samples (Fig. 2B).
Multivariate Analysis-Supervised OPLS-DA-Supervised OPLS-DA was performed on the six mesothelioma and six lung cancer samples for multivariate supervised classification between the groups based on quantitative proteomics data. Table III with fold-changes and q-values. When the analysis was performed without the possible outlier (L118a), the optimized model included 10 proteins (Fig. 3C) and showed a better predictive ability (R2Y = 0.71 and Q2 from CV of 0.64; CV-ANOVA p value = 0.016). In the earlier model based on results from a combined search with SEQUEST and MASCOT, galectin 1 was part of the model that distinguished malignant mesothelioma from lung cancer. However, galectin 1 was not part of the model after the data were searched with an updated database and updated search software (see supplemental Fig. S1 for the initial OPLS-DA model).
Univariate Analysis-Deregulated proteins had high q-values regardless of which groups were compared. A comparison of overlapping proteins in iPool 1 and iPool 2 from all mesothelioma samples against all lung adenocarcinoma samples resulted in the list of deregulated proteins. Considering the low number of samples and the large biological variance between human samples, it might be appropriate to use a threshold that is more lenient than q-value < 5%. Four proteins were up-regulated with a q-value of 0%, 9 proteins were up-regulated at q-value < 20%, and a < 35% q-value cutoff expanded the list to 21 proteins (Table IV). No down-regulated proteins are shown in the table, as all had high q-values (>40%).
Bioinformatics Analyses-Proteins identified as important for distinguishing between groups in the OPLS-DA models were matched to pathways from the DAVID database. These pathways were associated with complement activity and with coagulation and also included immune-system-associated pathways. Ingenuity Pathway Analysis that compared mesothelioma patients with either all non-mesothelioma patients or with patients with lung adenocarcinoma showed that the biological functions that differed included cell-to-cell signaling and interaction, cellular movement, cell cycle, cellular growth and proliferation, free radical scavenging, and antigen presentation (supplementary Table S2).
Validation of Biomarker Candidate Proteins in an Extended Patient Group-Candidate biomarkers for mesothelioma were selected based on the following criteria: fold-change and q-value, contribution to the OPLS-DA model, and suggested biological relevance. Commercial ELISA assays were used to analyze samples from an extended population to validate seven biomarker candidates: aldo-keto reductase 1B10, apolipoprotein C-I, galectin 1, myosin-VIIb, superoxide dismutase 2, tenascin C, and thrombospondin 1. Fig. 4 shows the initial fold-changes and q-values along with expression levels in the validation population as measured by MS and ELISA, respectively. Receiver operator characteristic AUCs are reported in Table V. Receiver operator characteristic curves were constructed, and of the seven validated proteins, superoxide dismutase 2, aldo-keto reductase 1B10, and galectin 1 had AUCs that were significantly different from 0.5 (Fig. 5). Galectin 1 was significantly different in the mesothelioma group compared with the metastatic carcinoma group (p value < 0.0001).

FIG. 2. Principal component analysis (PCA) performed on the 382 overlapping proteins in the two iPool samples. A, the PCA score plot of components t1 and t2 shows the most variation in the data. L118a is shown as a possible outlier value based on protein expression (i.e. outside the 95% confidence interval, which is indicated by the circle). B, PCA of samples from mesothelioma and lung adenocarcinoma patients only (L118a is now just within the 95% confidence interval). M, mesothelioma; L, lung adenocarcinoma; B, benign. The number indicates the iTRAQ ion used to label the individual sample, and lowercase "a" and "b" indicate iPool 1 and iPool 2, respectively.
The AUCs for galectin 1 for the comparison of mesothelioma patients against (i) patients with a benign condition, (ii) all non-mesothelioma patients, and (iii) patients with metastatic carcinoma were 0.78 (95% confidence interval = 0.65-0.90), 0.87 (0.79-0.95), and 0.93 (0.86-0.99), respectively.
Survival Analyses-Cutoff selections for each biomarker are reported in supplemental Fig. S2. Aldo-keto reductase 1B10, apolipoprotein C-I, and galectin 1 showed significant prognostic trends (Fig. 6). For aldo-keto reductase 1B10, there was a survival difference of 5.5 months for patients below and above 0.65 ng/ml (11- and 5.5-month median survival, respectively; p value = 0.01), and the corresponding Mantel-Cox hazard ratio was 3.4 between the two groups (95% confidence interval = 1.33-8.63). Patients with high apolipoprotein C-I levels (>5.33 μg/ml) also had a mean survival time that was 5.5 months longer (10 months compared with low expressing patients' survival of 4.5 months; p value = 0.02) with a reciprocal hazard ratio of 2.30 (1.4-9.0). The last biomarker candidate to show significant prognostic value was galectin 1. At a cutoff of 22.5 ng/ml, galectin 1 separated high- and low-expressing patients with median survival times of 4.5 and 9.5 months, respectively (p value = 0.04), with a hazard ratio of 2.34 (1.02-5.41).

DISCUSSION
The role of soluble biomarkers in the diagnostic work-up of malignant mesothelioma is a topic under ongoing debate (43). With the correct ancillary techniques, minimally invasive methods such as effusion cytology are accurate and reliable diagnostic routes; one such ancillary technique is to measure soluble biomarkers (44). To date the most investigated biomarkers are hyaluronan and soluble mesothelin-related proteins. The soluble mesothelin-related proteins are useful for discriminating malignant mesothelioma from nonmalignant diseases but show limited discrimination against other cancers, whereas hyaluronan lacks sensitivity. Other biomarkers have been evaluated, but no single one is accurate enough to diagnose a mesothelioma from all other conditions (45,46); thus, the search for additional mesothelioma biomarkers is of great importance.

FIG. 3. [...] 114) indicate the iTRAQ ion, and "a" and "b" indicate iPool 1 and iPool 2, respectively. B, proteins that differed between samples from mesothelioma and lung adenocarcinoma patients in the OPLS-DA model. Loading scores are derived from 7-fold cross-validation. Positive loading scores indicate that the protein was up-regulated in mesothelioma effusion relative to lung adenocarcinoma effusion, whereas a negative loading score indicates that the protein was down-regulated in mesothelioma effusion relative to lung adenocarcinoma effusion. The fold-changes and q-values of these 37 proteins are listed in Table II. C, an OPLS-DA model based on 10 proteins was constructed after excluding lung adenocarcinoma sample L118a. This model gave R2Y = 0.71 and Q2 = 0.64 with a CV-ANOVA p value = 0.016. Error bars indicate the confidence intervals of the coefficients. *Model 3C's only unique protein when compared with model 3B.
In this study, we screened the proteomes of pleural effusions from patients with epithelioid mesothelioma and compared them to the effusion proteomes from patients with lung adenocarcinoma to identify new candidate biomarker proteins. Some of the identified proteins have already been proposed as markers for malignant mesothelioma, including osteopontin (4, 27-29), apolipoprotein C-I (30), superoxide dismutase 2 (47), and mesothelin (5-10). These findings support the validity of our study design. Significance analysis of microarrays indicated high q-values for the majority of the deregulated proteins. The high q-values were most likely due to the small number of samples in each group and to the high interpatient and technical variance. This was also reflected by the predictability of the OPLS-DA models, which was moderate, indicating high variability within patient groups in the dataset. However, it is hard to avoid high q-values in a study with few clinical replicates that are screened with sensitive techniques (48). Proteins with high q-values were considered candidate biomarkers only when additional biological relevance had been described or if the protein was included in an OPLS-DA model.

Table note: Fold-change (FC) and q-values (false discovery rate in percentages) were calculated with significance analysis of microarrays based on the same patients included in the model. Positive FC = up-regulated and negative FC = down-regulated in effusions from mesothelioma patients relative to those from lung adenocarcinoma patients.
Some of the proteins that were highly expressed in the mesothelioma effusions were factors related to metastasis, angiogenesis, redox regulation, proliferation, and immune evasion. Taken together, these results reflect a malignant state in which the immune response is modulated, cells are primed for proliferation, and the microenvironment favors angiogenesis and invasion. Notably, our data showed links to several mesothelioma-specific pathways (Fig. 7). The pathogenesis of mesothelioma is linked to oxidative stress (49) caused by iron-coated asbestos fibers in addition to incomplete digestion of these fibers by macrophages. Reactive oxygen species, which contribute to oxidative stress, induce expression of both superoxide dismutase 2 and catalase (47,50). Additionally, in malignant mesothelioma, physical activation of the TNF-α receptor by asbestos fibers also induces superoxide dismutase 2 via the NFkB pathway (47); this enzyme has a protective role in that it dismutates superoxide anions into hydrogen peroxide and oxygen.
The validation experiments in this study only indicated diagnostic and prognostic trends for superoxide dismutase 2 (Figs. 4 and 6).
Kallistatin reduces the activity of VEGF, TNF-α, and NFkB in a lung cancer model (51), and a recent analysis of a panel of biomarkers showed that its down-regulation predicts malignant mesothelioma (52). Tenascin C, in contrast, induces NFkB via the Wnt-β-catenin axis or via fibronectin and integrins. Fibronectin/integrin activation can also be dependent on thrombospondin 1 or osteopontin, resulting in an intracellular signaling cascade operating through the Raf-pERK-AP1 axis (53). The AP1 complex is deregulated in mesotheliomas (47) as demonstrated by direct epidermal growth factor receptor activation by asbestos fibers (54). The collagen α-1(VI) chain is an inducer of metastasis (55,56), whereas osteoglycin and pigment epithelium-derived factor have both been reported to inhibit cancer metastasis (55-59), tumor progression, and angiogenesis in a murine mesothelioma model (60). Up-regulation of collagen α-1(VI) and simultaneous down-regulation of osteoglycin and pigment epithelium-derived factor could explain in part the aggressive and highly invasive nature of mesotheliomas.
Galectin 1 inhibits the immune system, both in normal conditions and in pathological states (61). Galectin 1 induces T-cell apoptosis in lung cancer, suppressing the immune system and increasing invasion and metastasis (62,63). As a consequence, lung cancer patients with high levels of galectin 1 have a worse prognosis than those expressing lower levels (64). Furthermore, galectin 3, another galectin family member, has been reported to be up-regulated in pleural effusions from adenocarcinomas relative to effusions from malignant mesotheliomas (65).
Seven of the identified proteins were validated in a larger population composed of mesotheliomas of different phenotypes, the adenocarcinomas that most commonly metastasize to the pleura, and effusions from nonmalignant conditions such as exudates due to inflammatory conditions and transudates caused by congestive heart failure. This population represents an initial validation cohort, covering the most commonly seen causes of pleural effusions, and acts as a good starting point for validating novel biomarkers. Although the most common metastases that cause pleural effusions are adenocarcinomas of pulmonary and mammary origin, the candidates identified in this study need to be validated against a broader spectrum of malignancies, such as ovarian and gastrointestinal cancers.
Table notes. a, Proteins with known links to malignant mesothelioma and/or mesothelioma diagnosis. All fold changes (FC) are positive, indicating that proteins were up-regulated in effusions from the screened mesothelioma patients relative to those from lung adenocarcinoma patients (q-values in percentage).

In the initial phase of this study, MASCOT and SEQUEST were used to search the data. Galectin 1 was then included in an early OPLS-DA model (supplemental Fig. S1). Literature searches revealed galectin 1's involvement in cancer progression and prognosis, as described above, which led us to include galectin 1 as a candidate for validation. Subsequent re-analysis of our raw data with an updated database and a newer version of Proteome Discoverer, using SEQUEST and Percolator, resulted in slightly different levels of galectin 1. Nevertheless, galectin 1 was still included in the dataset and predicted to be down-regulated in malignant mesothelioma relative to lung cancer. The protein was also included in OPLS-DA models; however, upon OPLS-DA model trimming (conditioning for high variable importance scores and significances) it was excluded before the final model was constructed, as presented in Fig. 3. Validation of galectin 1 confirmed negative prediction for mesothelioma when compared with metastatic carcinomas (Figs. 4 and 5). With an AUC of 0.93 (95% confidence interval = 0.86-0.99), galectin 1 showed excellent performance as a negative discriminator and compared well to the reference markers hyaluronan (0.82 (0.70-0.94)) and N-ERC/mesothelin (0.87 (0.77-0.96)). Although galectin 1 must be validated in a larger cohort of patients, measuring galectin 1 levels in patient effusions shows great promise as an accurate way to discriminate between malignant mesothelioma and metastatic cancers. As a strong negative predictor, galectin 1 would be a key candidate to be combined in a biomarker panel with positive diagnostic markers such as mesothelin and fibulin 3.
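To make the notion of an AUC for a negative discriminator concrete, the following minimal Python sketch computes an AUC as the probability that a randomly chosen carcinoma effusion has a higher marker level than a randomly chosen mesothelioma effusion, together with a percentile bootstrap confidence interval. The marker values are invented for illustration and are not data from this study; the function names `auc` and `bootstrap_ci` are hypothetical, and the bootstrap is an assumption, because the text does not state how the reported confidence intervals were obtained.

```python
import random

def auc(high_group, low_group):
    """Probability that a random value from high_group exceeds a random
    value from low_group (ties count as 0.5); equals the ROC AUC."""
    wins = 0.0
    for h in high_group:
        for l in low_group:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(high_group) * len(low_group))

def bootstrap_ci(high_group, low_group, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        h = [rng.choice(high_group) for _ in high_group]
        l = [rng.choice(low_group) for _ in low_group]
        stats.append(auc(h, l))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Invented marker levels in arbitrary units, NOT measurements from the study.
carcinoma = [9.1, 7.4, 8.8, 6.2, 10.5, 7.9]
mesothelioma = [3.2, 4.1, 2.8, 6.5, 3.9, 5.0]

# For a negative marker of mesothelioma, carcinoma is the "high" class.
point = auc(carcinoma, mesothelioma)
lower, upper = bootstrap_ci(carcinoma, mesothelioma)
print(f"AUC = {point:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With a negative marker, swapping the two groups gives 1 minus this AUC, so the direction of the comparison should always be stated alongside the value.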
Furthermore, we showed a possible link connecting aldo-keto reductase 1B10, apolipoprotein C-I, galectin 1, and the survival of malignant mesothelioma patients (Fig. 6). As already stated, the expression of galectin 1 affects the immune system and is prognostic for lung cancer (61). Even though galectin 1 has a relatively low expression in malignant mesothelioma, the mere presence of galectin 1 might still be potent enough to affect the immune system and allow a more aggressive phenotype of the cancer to progress, thereby reducing patient survival.

FIG. 6. Kaplan-Meier survival estimates. Patients were separated into "high expressers" and "low expressers" based on their hazard ratios. The analysis was performed on all mesothelioma patients with available follow-up data. The p values were calculated via a logrank (Mantel-Cox) test.
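The survival comparison summarized for Fig. 6 (Kaplan-Meier estimates compared with a logrank test) can be sketched as follows. This is an illustrative Python implementation under stated assumptions, not the software used in the study: the follow-up times and event flags are invented, the two groups are taken as already defined, and the function names `kaplan_meier` and `logrank` are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events[i] is 1 for an observed death and 0 for a censored patient."""
    data = list(zip(times, events))
    surv, curve = 1.0, []
    for t in sorted({u for u, e in data if e == 1}):
        at_risk = sum(1 for u, _ in data if u >= t)
        deaths = sum(1 for u, e in data if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank(times1, events1, times2, events2):
    """Two-group logrank test; returns (chi-square with 1 df, p value)."""
    g1 = list(zip(times1, events1))
    g2 = list(zip(times2, events2))
    observed1 = expected1 = variance = 0.0
    for t in sorted({u for u, e in g1 + g2 if e == 1}):
        n1 = sum(1 for u, _ in g1 if u >= t)
        n2 = sum(1 for u, _ in g2 if u >= t)
        d1 = sum(1 for u, e in g1 if u == t and e == 1)
        d2 = sum(1 for u, e in g2 if u == t and e == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group 1 death count at time t.
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented follow-up times in months, NOT data from the study.
high_t, high_e = [4, 6, 7, 9, 11], [1, 1, 1, 1, 0]
low_t, low_e = [10, 14, 15, 20, 24], [1, 1, 0, 1, 0]

high_curve = kaplan_meier(high_t, high_e)
chi2, p_value = logrank(high_t, high_e, low_t, low_e)
print(high_curve, chi2, p_value)
```

Censored patients contribute to the number at risk until their last follow-up but never count as events, which is why both functions treat the event flag separately from the time.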
Aldo-keto reductase 1B10 is up-regulated in several cancers (66-69) and precancerous lesions (70). This enzyme is also associated with cervical cancer recurrence after surgical resection, may have a predictive role in colorectal cancer cells in vitro, and is a diagnostic marker for non-small cell lung cancer (71). Aldo-keto reductase 1B10 has many functions, including acting as a retinaldehyde reductase and repressing retinoic acid synthesis, thereby increasing proliferation (72). Moreover, several studies link aldo-keto reductase 1B10 to resistance against several cytostatic drugs in vitro (68,73), and tissue samples from bladder cancer patients show induction of this enzyme after carboplatin and gemcitabine treatment. Furthermore, bladder cancer patients with high levels of aldo-keto reductase 1B10 after chemotherapy had significantly lower disease-free survival rates (74). Aldo-keto reductase 1B10 expression has been described as being promoted by EGF and AP1 (c-fos/c-jun), which links it to malignant mesothelioma (75). In the future, the role of aldo-keto reductase 1B10 in mesothelioma should be investigated in terms of patient survival, treatment, and tumor stage. Adjuvant treatment with aldose-reductase inhibitors such as Tolrestat, which is given to diabetes patients, could inhibit the enzyme and increase the effect of a particular chemotherapy, thus prolonging the survival of mesothelioma patients.
In the present study, apolipoprotein C-I did not show diagnostic value for mesothelioma (Fig. 4). This was surprising because this protein has previously been described as a specific mesothelioma biomarker. Hegmans et al. used surface-enhanced laser desorption/ionization MS to discover and validate apolipoprotein C-I as a serum biomarker of mesothelioma (30). In our study, we validated our initial MS findings using an antibody-based ELISA technique. The choice of validation method might explain the discrepancy between these studies, but the difference also might be due to the use of a particular ELISA (76). Nevertheless, apolipoprotein C-I displayed a significant prognostic role. This indicates that even though the diagnostic potential is dependent on the method to quantify certain fragments, the relative expression within the malignant mesothelioma patient group might carry prognostic information.
In summary, here we identified three prognostic candidate biomarkers for malignant mesothelioma. Future studies are needed to evaluate the clinical role of these proteins. We also identified galectin 1 as a negative predictor of malignant mesothelioma. Its discriminatory capacity for malignant mesothelioma seems to be equal to or greater than those of both N-ERC/mesothelin and hyaluronan. Thus, galectin 1 should be considered a prime candidate for clinical validation for use in mesothelioma diagnosis.