Data from a comparative proteomic analysis of tumor-derived lung-cancer CD105+ endothelial cells

Increasing evidence indicates that tumor-derived endothelial cells (TECs) are more relevant for the study of tumor angiogenesis and for screening antiangiogenic drugs than normal endothelial cells (NECs). In this data article, the proteomes of high-purity (>98%) primary CD105+ NECs and TECs purified from a mouse Lewis lung carcinoma model bearing 0.5 cm tumors were compared using 2D-PAGE and matrix-assisted laser desorption/ionization tandem mass spectrometry (MALDI-MS/MS). All the identified proteins were categorized functionally by Gene Ontology (GO) analysis and annotated with pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG). Finally, protein–protein interaction networks were built. The proteomics and bioinformatics data presented here provide novel insights into the molecular characteristics and the early modulation of the TEC proteome in the tumor microenvironment.


Specifications Table
Subject area: Biology
More specific subject area: Tumor microenvironment
Type of data: The proteins were separated using 2D-PAGE, digested in-gel, and analyzed using MALDI TOF/TOF
Data source location:
Data accessibility: The data are available with this article

Value of the data
A highly optimized method for proteomic analysis by 2D-PAGE of primary ECs isolated from tumor tissues. The bioinformatics data can be useful for clarifying the heterogeneity of tumor-derived ECs. The differentially expressed proteins indicate the potential functions of TECs in the tumor microenvironment.

Data
The data are related to the identification and verification of transgelin-2 as a potential biomarker of tumor-derived lung-cancer endothelial cells by comparative proteomics [1]. A highly optimized method for proteomic analysis of primary CD105+ NECs and TECs by 2D-PAGE and MALDI-MS/MS is presented here. All the identified proteins were characterized by GO, KEGG, and protein-protein interaction analysis to clarify the function of TECs in the tumor microenvironment.

Experiment design, materials and methods
Primary CD105+ NECs and TECs were isolated from a mouse Lewis lung carcinoma model bearing 0.5 cm tumors. Differentially expressed proteins were identified using 2D-PAGE and matrix-assisted laser desorption/ionization tandem mass spectrometry (MALDI-MS/MS). 2D-PAGE was performed using the GE Ettan™ IPGphor™ 3 and DALTsix systems. Proteins were visualized by silver staining, and images were recorded on a GE ImageScanner III system and analyzed with the ImageMaster 2D Platinum software. Mass spectrometry data were obtained in an automated analysis loop using a 4800 Plus MALDI TOF/TOF™ Analyzer (Applied Biosystems, USA), collected using the 4000 Series Explorer™ software, and submitted to database search via GPS Explorer™ (Applied Biosystems). MASCOT Server version 2.2 and the NCBI non-redundant database were used for protein identification. A total of 63 spots that changed at least 1.5-fold in TECs led to the identification of 48 unique proteins (28 up-regulated and 20 down-regulated). All the identified proteins were categorized functionally by Gene Ontology (GO) analysis. Gene-pathway annotations were compiled from the Kyoto Encyclopedia of Genes and Genomes (KEGG), BioCarta, BioCyc, and Reactome. Protein-protein interaction networks were built using the DIP, MINT, BioGRID, IntAct, and STRING databases, and the data were imported into Cytoscape to visualize the graphs.

Establishment of ECs cultures
Primary ECs were purified by combining enzymatic digestion, differential adherence, and magnetic cell sorting with a CD105 MultiSort Kit, according to the procedure described in the Journal of Proteomics paper [1]. Endothelial phenotype and purity were confirmed by cytofluorimetric analysis on the basis of positive expression of a panel of endothelial markers [2,3], and isotype control stainings are shown in Fig. 1. ECs at first passage (CD105 expression >98%) were used for the proteomic analysis to retain most of the properties of the in vivo state [4].

Comparative proteomic analysis of NECs and TECs
NECs and TECs were harvested and suspended in lysis buffer containing 7 M urea, 4% CHAPS, 2 M thiourea, 60 mM DTT, 10 mM Tris, 1 mM EDTA, 0.002% bromophenol blue, and 2% ampholine (pH 3-10) [5]. Cells were disrupted on ice by five 15 s pulses of sonication, followed by five freeze-thaw cycles: 5 min in liquid nitrogen, 1 min in a 37°C water bath, and 3 min at room temperature. Supernatant fractions were then collected after centrifugation at 14,000 × g for 40 min at 4°C and stored at −80°C. Protein concentration was determined using a Bradford assay kit. A non-linear pH gradient of 3-10 was chosen for isoelectric focusing (IEF). The second dimension was run on a 12.5% SDS-PAGE gel to optimize the separation of proteins from 12 to 97 kDa. Before IEF, a solution containing 50 mM MgCl2, 1 mg/mL DNase I, and 0.25 mg/mL RNase A was added to the protein samples at a ratio of 1:20 (v/v). Aliquots containing 100 μg of protein were resuspended in 250 μL of rehydration solution. Equal amounts of sample were loaded in triplicate. After 18 h of rehydration of the IPG strips (GE Healthcare, USA), IEF was performed using the GE Ettan™ IPGphor™ 3 system for a total of 67,860 V·h. After focusing, the strips were first equilibrated for 15 min in a buffer containing 6 M urea, 20% glycerol, 2% SDS, and 2% DTT, and then for 15 min in the same buffer containing 2.5% iodoacetamide instead of DTT. SDS-PAGE was performed on a GE Ettan™ DALTsix system. Finally, the proteins were visualized by silver staining. Briefly, gels were soaked in fix solution (50% ethanol, 10% acetic acid) for at least 45 min, then rinsed in 30% ethanol and in double-distilled water (ddw) for 3 × 10 min each. To sensitize, gels were soaked in sensitivity-enhancing solution (2 mL of 10% sodium thiosulfate solution per liter) for 2 min (one gel at a time) and then rinsed in ddw for 2 × 1 min.
For the silver reaction, gels were submerged in 0.1% silver stain solution [0.1% silver nitrate with 0.08% formalin (37%)] for 20 min and then rinsed in ddw for 2 × 1 min. The image was developed in development solution [2% sodium carbonate with 0.04% formalin (37%)] until the desired staining intensity was reached; gels were then quickly washed in 5% acetic acid for 10 min and rinsed in ddw for 5 min to stop the staining. Finally, all gels were rinsed with water (several changes) prior to drying or densitometry. Images were recorded on a GE ImageScanner III system. The gels were analyzed with the ImageMaster 2D Platinum software, using automatic spot matching together with detailed manual checking of the spot finding, to identify proteins in both the NECs and TECs. The quality of the gels was verified using the quality control function of the software. Spot intensities were expressed as the percentage of the integrated spot density (volume) over the total density of all measured spots. Significantly over-abundant spots were detected at a significance level of 5% (p < 0.05) and a fold change of >1.5. After statistical analysis, 63 spots were identified as differentially expressed in TECs compared with NECs, and the histograms in Fig. 2 show the relative levels of signal intensity. The histograms contain information about spot ID, spot intensity, relative ratio, and the statistical result of triplicate repeats. Spots that were differentially expressed between NECs and TECs were then isolated and identified using mass spectrometry as described below.
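The spot-selection criteria described above (relative spot volume, a fold change of 1.5, and a 5% significance level) can be illustrated with a minimal sketch. The function names are hypothetical, and the p-value is passed in by the caller because the text does not name the statistical test used by the imaging software; treating the threshold as symmetric for up- and down-regulated spots is also an assumption.

```python
from statistics import mean


def percent_volumes(spot_volumes):
    """Express each spot volume as a percentage of the total volume on one gel."""
    total = sum(spot_volumes.values())
    return {spot: 100.0 * vol / total for spot, vol in spot_volumes.items()}


def fold_change(tec_values, nec_values):
    """Ratio of the mean TEC to the mean NEC relative volume across replicate gels."""
    return mean(tec_values) / mean(nec_values)


def is_differential(tec_values, nec_values, p_value, min_fold=1.5, alpha=0.05):
    """True if the spot changes at least min_fold in either direction and p < alpha.

    p_value comes from whatever test the analyst ran (assumption: not computed here).
    """
    ratio = fold_change(tec_values, nec_values)
    changed = ratio >= min_fold or ratio <= 1.0 / min_fold
    return p_value < alpha and changed
```

For example, a spot with triplicate relative volumes of 3.0 in TECs and 1.0 in NECs and p = 0.01 passes the filter, whereas a 1.2-fold change does not.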
Differentially expressed protein spots were picked manually, and in-gel enzymatic digestion was carried out according to the procedure of Zimmerman et al. with some modifications [6]. Briefly, dried gel pieces were incubated with 10 μL of 25 μg/mL sequencing-grade trypsin (Promega) in 40 mM ammonium bicarbonate for 30 min at 4°C. Another 20 μL of 40 mM ammonium bicarbonate was then added to ensure that the pieces were completely covered. Digestion was carried out at 37°C for 12 h, and peptides were recovered by sequential extractions with 25 mM ammonium bicarbonate, 50% ACN/0.1% TFA, and 100% ACN; all steps were repeated once more.

Database searching
MALDI-MS/MS data were obtained in an automated analysis loop using a 4800 Plus MALDI TOF/TOF™ Analyzer (Applied Biosystems, USA). Digested peptides were desalted using C18 ZipTips® (Millipore, USA). MS and MS/MS spectra were collected using the 4000 Series Explorer™ software and submitted to database search via GPS Explorer™ (Applied Biosystems). MASCOT Server version 2.2 (Matrix Science, London, UK) and the NCBI non-redundant database were used for protein identification. The search parameters were set as follows: taxonomy, mouse; mass values, monoisotopic; precursor mass tolerance, ±1 Da; fragment mass tolerance, ±0.3 Da; enzyme, trypsin; maximum missed cleavages allowed, 1; modifications, carbamidomethyl Cys (permanent); methionine
Protein-protein interaction networks for differentially expressed proteins in TECs. Proteins were uploaded into the Ingenuity Pathway Analysis (IPA) software server. The network was built using the STRING (http://string-db.org/) database, and the data were imported into Cytoscape (www.cytoscape.org) for visualization.
Table 3. The top 10 differentially expressed proteins sorted by network betweenness.
Table 1. For these candidate biomarkers, our results are in agreement with published data.

Real-time PCR analysis of selected proteins
On the basis of the GO annotations (20 proteins included in the top 10 GO BP terms) and the protein-protein interaction analysis results (3 proteins), the mRNA levels of 23 differentially expressed proteins were analyzed by real-time RT-PCR. Total RNA was extracted using TRIzol. RT-PCR analysis was performed using the SYBR® Green I RT-PCR Master Mix kit from Bio-Rad Laboratories, Inc. on a Rotor-Gene 3000 system. The relative mRNA levels of differentially expressed proteins were normalized to that of GAPDH, and NECs were used for calibration. Primers for the selected proteins are listed in Table 2. Measurement of ΔCt was performed in triplicate. RT-PCR data were analyzed for relative gene expression using the ΔΔCt method. The results of the RT-PCR analysis were mostly consistent with those obtained in the 2D-PAGE analysis (see Fig. 3A).
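The ΔΔCt calculation described above, with GAPDH as the reference gene and NECs as the calibrator, can be sketched as follows. This is an illustrative example with hypothetical function names; it assumes roughly 100% amplification efficiency (a doubling per cycle), which the text does not state.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene (GAPDH here)."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(tec_target, tec_gapdh, nec_target, nec_gapdh):
    """Fold change of TECs relative to NECs by the 2^(-ddCt) method.

    Assumes a doubling of product per cycle (efficiency of about 100%).
    """
    ddct = delta_ct(tec_target, tec_gapdh) - delta_ct(nec_target, nec_gapdh)
    return 2.0 ** (-ddct)
```

With triplicate Ct values of 22 (target) and 18 (GAPDH) in TECs, and 24 and 18 in NECs, ΔΔCt is −2 and the relative expression in TECs is 4-fold.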

Bioinformatics analysis of the identified proteins
To obtain a precise prediction, multiple bioinformatics methods were applied. First, the identified mouse genes were associated with their putative human orthologs using NCBI's HomoloGene resource. HomoloGene annotations were downloaded from ftp://ftp.ncbi.nih.gov/pub/HomoloGene/build67/homologene.data. The molecular functions of all the identified proteins were then assigned on the basis of a search against the Human Protein Reference Database (HPRD, HPRD_Release9_041310.tar.gz). The results, including biological process, cellular component, and molecular function, are shown in Fig. 3B-D. Second, all the identified proteins were categorized functionally by GO analysis. GO was downloaded from the GeneOntology website (geneontology.org/ontology/geneontology_edit.obo). Corresponding mouse GO-gene annotations were downloaded from the NCBI Entrez Gene ftp website (ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz). The GO analysis results, including the biological process (BP), cellular component (CC), and molecular function (MF) categories, were generated. Gene set enrichment analysis revealed that the differentially expressed proteins were enriched in 99 GO terms (p < 0.05), including 58 BP, 23 MF, and 18 CC terms. The top 10 GO terms ranked according to their significance level are listed in Fig. 4A.
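The mouse-to-human ortholog mapping in the first step can be sketched as below. This is a minimal illustration, not the authors' script: it assumes the tab-separated layout of homologene.data (group ID, taxonomy ID, gene ID, gene symbol, protein GI, protein accession) and uses the NCBI taxonomy IDs 10090 (mouse) and 9606 (human). The sample rows are invented for demonstration, and `MouseOnly1` is a hypothetical gene name.

```python
from collections import defaultdict

MOUSE_TAXID = "10090"
HUMAN_TAXID = "9606"


def mouse_to_human(lines):
    """Map mouse gene symbols to human ortholog symbols via shared group IDs."""
    groups = defaultdict(lambda: {MOUSE_TAXID: [], HUMAN_TAXID: []})
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        group_id, taxid, _gene_id, symbol = fields[:4]
        if taxid in (MOUSE_TAXID, HUMAN_TAXID):
            groups[group_id][taxid].append(symbol)
    mapping = {}
    for members in groups.values():
        for mouse_symbol in members[MOUSE_TAXID]:
            # An empty list means no human ortholog is recorded in that group.
            mapping[mouse_symbol] = sorted(members[HUMAN_TAXID])
    return mapping


# Invented rows for illustration; the IDs and accessions are not real records.
SAMPLE_ROWS = [
    "1\t9606\t101\tTAGLN2\t0\tNP_000001.1\n",
    "1\t10090\t102\tTagln2\t0\tNP_000002.1\n",
    "2\t10090\t103\tMouseOnly1\t0\tNP_000003.1\n",
]
```

In practice the same function could be applied to the open file handle of the downloaded homologene.data file, since it only iterates over lines.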
Third, gene-pathway annotations were compiled from the Kyoto Encyclopedia of Genes and Genomes (KEGG), BioCarta (http://www.biocarta.com/), BioCyc, and Reactome. A hypergeometric test was chosen for statistical analysis, and significantly enriched pathways were identified at a corrected p-value of < 0.05. The results are listed in Fig. 4B.
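The hypergeometric enrichment test mentioned above can be written out as a short sketch. Here N is the number of annotated background genes, K the number of background genes in a pathway, n the number of differentially expressed genes, and k the number of those that fall in the pathway. The text does not say which multiple-testing correction was applied; the Benjamini-Hochberg procedure shown here is an assumption chosen for illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K belong to the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the input order (assumed correction method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway would then be reported as enriched when its adjusted p-value falls below 0.05.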
Fourth, protein-protein interaction networks were built using the Database of Interacting Proteins (DIP), Molecular INTeraction database (MINT), Database of Protein and Genetic Interaction (BioGRID), IntAct molecular interaction database (IntAct), and STRING (http://string-db.org/) databases, and the data were imported into Cytoscape to visualize the graphs. The graph is shown in Fig. 4C, and the details of the top 10 proteins, including the degree, betweenness, gene ontology, and KEGG pathway, are listed in Table 3.
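The degree and betweenness values used to rank proteins in the network can be illustrated with a small, dependency-free sketch. It computes unnormalized shortest-path betweenness with Brandes' algorithm on an unweighted, undirected graph; the exact betweenness variant or normalization used for Table 3 is not stated, so this is only one plausible definition. The function names and the edge list format are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph):
    """Unnormalized shortest-path betweenness (Brandes' algorithm, unweighted)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: s / 2.0 for v, s in score.items()}


def top_by_betweenness(edges, n=10):
    """Return (protein, degree, betweenness) tuples, highest betweenness first."""
    graph = build_graph(edges)
    scores = betweenness(graph)
    ranked = sorted(graph, key=lambda v: (-scores[v], v))
    return [(v, len(graph[v]), scores[v]) for v in ranked[:n]]
```

For a star-shaped toy network in which protein B interacts with A, C, and D, B lies on all three shortest paths between the other proteins and therefore ranks first.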

Verification of candidate proteins in clinical samples
Lung cancer specimens from 30 patients (11 with lung squamous cell carcinoma and 19 with adenocarcinoma) were chosen for IHC analysis. Histopathology reports were obtained along with the samples and are shown in Table 4. Serum samples from 54 lung cancer (LC) patients, 31 colorectal cancer patients, 31 esophageal cancer patients, and 84 normal individuals were used for the ELISA analysis. The clinical data of the LC patients are presented in Table 5.