Analysis of gene expression changes associated with human carcinoma-associated fibroblasts in non-small cell lung carcinoma

This study aimed to investigate the gene expression changes associated with carcinoma-associated fibroblasts (CAFs) involving in non-small cell lung carcinoma (NSCLC). We downloaded the GEO series GSE22862, which contained matched gene expression values for 15 CAF and normal fibroblasts samples, and series GSE27289 containing SNP genotyping for four matched NSCLC samples. The differentially expressed genes in CAF samples were identified using the limma package in R. Then we performed gene ontology (GO) and pathway enrichment analysis and protein–protein interaction (PPI) network construction using the identified DEGs. Moreover, aberrant cell fraction, ploidy, allele-specific copy number, and loss of heterozygosity (LOH) within CAF cells were analyzed using the allele-specific copy number analysis. We obtained 545 differentially expressed genes between CAF and normal fibroblasts samples. The up-regulated genes are mainly involved in GO terms such as positive regulation of cell migration and extracellular region, while the down-regulated genes participate in the lung development and extracellular region. Multiple genes including bone morphogenetic protein 4 (BMP4) and transforming growth factor, beta 3 (TGFB3) are involved in the TGF-β signaling pathway. Genes including BMP4, TGFBI and matrix Gla protein (MGP) were hub genes. Moreover, no LOH event for BMP4 and MGP was found, that for sphingosine kinase 1 (SPHK1) was 70%, and for TGFBI was 40%. Our data suggested that BMP4, MGP, TGFBI, and SPHK1 may be important in CAFs-associated NSCLC, and the abnormal expression and high LOH frequency of them may be used as the diagnosis targets of CAFs in NSCLC.


Background
Lung cancer is one of the most common leading cause of cancer deaths worldwide [1]. Statistics show that the non-small cell lung carcinoma (NSCLC) accounts for about 85% of all lung cancers in the world [2]. Treatment methods such as surgery, drug therapy and chemotherapy have played certain roles in curing NSCLC [2]. However, in [3] and [4], the authors suggest that 5-year survival rate is poor due to the difficulties in the early diagnose of NSCLC and the easy invasion and metastasis of NSCLC cells. Therefore, exploring the mechanism in NSCLC metastasis and invasion will be of great significance to provide basis for NSCLC diagnosis and treatment.
In [5], authors suggest that carcinoma-associated fibroblasts (CAFs) are the most important component of developing cancers. Recent studies implied that CAFs played important roles in cancers biology occurring in epithelia, such as neoplastic progression, tumor growth, invasion and metastasis [6][7][8]. CAFs constitute a major portion of the reactive tumor stroma and are related to the cell invasion and metastasis in NSCLC during malignancy [9]. Also, Neta et al. [10] referred that CAFs activities promoted the macrophage recruitment, neovascularization, and tumor growth in incipient neoplasia to orchestrate tumor inflammation via the NF-kB signaling pathway. Furthermore, an increasing number of evidences show that mutations of genes for CAFs are crucial for cancer metastasis and growth. For examples, some changes in DNA copy number of cluster CAFs may contribute to the metastasis for breast cancer cells [11]. In spite of many researches devoted to the exploration of the role of CAFs in NSCLC, the mechanism of CAFs in NSCLC still remains not fully understood.
Using the GEO (Gene Expression Omnibus) series GSE22862, Horie et al. [12] proved that RNAi-mediated targeting of transforming growth factor, beta 1 (TGFB1) ligands were beneficial for lung cancer treatment through its action on cancer and stromal cells. In this study, we screened the differentially expressed genes (DEGs) in NSCLC samples compared with the normal samples using the same GEO series GSE22862 [13]. Bioinformatics methods were used to analyze the functions and pathways of the DEGs, as well as the allele-specific copy number (ASCN) of them to predict hub genes which were related with the CAFs in NSCLC. This study aimed to explore the underlying genes associated with pathomechanism of CAFs in NSCLC, which may help to search for diagnosis and treatment targets for this disease.

Microarray data
We downloaded the GEO series GSE22874 [13] from the GEO database in NCBI (http://www.ncbi.nlm.nih. gov/geo/). This series contains four subseries and two of them (GSE22862 and GSE27289) were analyzed in this study. The single nucleotide polymorphism (SNP) series GSE27289 is generated from four paired primary NSCLC CAF and normal fibroblasts (NF) samples based on the platform of GPL13135 HumanOmniExpress Bead-Chip. GSE22862 is a CAF series, which originates from matched gene expression values from 15 CAF and NF samples of resected NSCLC tissues based on the platform of GPL5175 Affymetrix Human Exon 1.0 ST Array (Affymetrix Inc., Santa Clara, California, USA). At present, we chose four matched samples (4 CAF samples and 4 NF samples) of two stages I squamous cell carcinomas and 2 stage II adenocarcinoma from the 30 samples of GSE28682.

Microarray data preprocessing and conversion
The CEL file data of GSE22862 download from the GEO database have been normalized using the Robust Multi-array Analysis method [14] in affy (http://www. bioconductor.org/packages/release/bioc/html/affy.html) package in Bioconductor. When multiple probes correspond to the same gene, the mean expression value was calculated and considered as the expression value of this gene. Meanwhile, the CEL file data of GSE27289 that have been preprocessed using the GenomeStudio [15], were extracted for the SNP symbols, chromosome number, chromosome location, B allele frequency and Log R values both in CAF samples and in NF samples.

Screening and functional enrichment analysis of DEGs
We screened the DEGs between CAF and NF samples in GSE22862 series using the limma package in R (http:// www.bioconductor.org). The p value <0.05 and |log 2 (fold change)| ≥0.58 (fold change ≥1.5 or ≤0.67) were chosen as the thresholds. In the present study, we did not performed multiple test correction for the p values, because the corrected p values were too low to select enough significant genes.
In addition, we conducted gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analyses for the DEGs using the Database for Annotation Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/) [16] online tool. p value <0.05 was chosen as threshold.

Protein-protein interaction (PPI) network construction of DEGs
PPI analysis can provide new insights into protein function, besides, it may help to uncover the generic organization principles of functional cellular networks [17], therefore, we would construct PPI network to further analyze the DEGs. Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, http://string.embl. de/) [18] that provided functional associations between proteins was used to predict the interaction pairs of the selected DEGs. In this study, we only mapped the DEGs into STRING database to predict the PPI pairs because we intended to investigate the interactions between DEGs. Cytoscape (http://www.cytoscape.org/) [19] that was widely used to integrate biomolecular interaction networks into models was used to construct the PPI network of the DEGs. Most of previous obtained biological networks were found to obey the scale-free attribution [20]. Thus, we analyzed the connectivity degree of nodes in the PPI network by topological analysis to obtain the important nodes with higher degrees (hub proteins) [21]. Genome-wide studies have shown that deletion of a hub protein is more likely to be lethal than deletion of a non-hub protein, thus, we think that the hub nodes may play important roles in the CAF of NSCLC.

ASCN analysis
ASCN analysis of tumor [22,23] is a method that can be used to analyze the aberrant cell fraction, ploidy, ASCN, and loss of heterozygosity (LOH) of tumor cells. The ASCN for one gene was calculated based on the Log R and B allele frequency values of allele B. In this study, we used ASCN method to determine the values of the aberrant cell fraction, ploidy, ASCN and LOH of the selected genes. Twenty-two autosomes plus two gender associated chromosomes were chosen as parameters of the number of the chromosomes.

Candidate gene identification
On account of GO and KEGG pathway enrichment results, as well as PPI network of the DEGs, we identified the potential critical genes combining with the results of ASCN analysis. Additionally, based on the databases of tumor suppressor (TS) gene [9] and tumor-associated gene (TAG) [10], we identified the DEGs that function as transcription factors, tumor suppressors or oncogenes. Finally, the ASCN values of these identified genes were analyzed.

Identified DEGs
With p value <0.05 and |log 2 (fold change)| ≥0.58, we identified 545 DEGs including 66 up-regulated and 479 downregulated DEGs in CAF samples compared with NF samples in GSE22862. The number of down-regulated DEGs was far more than that of up-regulated DEGs, suggesting the important roles of down-regulated genes in NSCLC.

Functional enrichment analysis of DEGs
The GO functions and pathways of up-and down-regulated DEGs in CAF samples were shown in Tables 1, 2, respectively. Up-regulated genes are significantly involved in GO terms such as cell surface receptor linked signal transduction, G-protein couple receptor protein, positive regulation of cell migration and extracellular region, while down-regulated genes are associated with GO terms including lung development, respiratory tube development and extracellular region (Table 1).

Table 1 The enriched gene ontology terms and pathways of the differentially expressed genes (DEGs) in carcinoma-associated fibroblasts (CAF) of non-small cell lung carcinoma (NSCLC)
BP biological process, CC cellular component, MF molecular function, Count number of genes In addition, only pathway of Olfactory transduction is enriched by up-regulated genes, such as olfactory receptor, family 6, subfamily B, member 1 (OR6B1), olfactory receptor, family 5, subfamily L, member 2 (OR5L2) and olfactory receptor, family 10, subfamily H, member 1 (OR10H1). The down-regulated genes, such as bone morphogenetic protein 4 (BMP4), TGFB3 and SMAD family member 3 (SMAD3), are mainly involved in TGF-β signaling pathway ( Table 2).

Analysis of copy number of alleles
The aberrant cell fraction and ploidy of genes of tumor cells in CAF samples were respectively 77% and 2.002 based on the ASCNT analysis. In addition, based on the functional enrichment analysis and PPI network, we obtained 16 critical genes. ASCN results of the 16 critical genes were shown in Table 3. The copy number of alleles of most genes were abnormal except for MGP, BMP4, PPARG and TLR4, moreover, they were all positive for LOH. For instance, the LOH proportion of TGFB3 and sphingosine kinase 1 (SPHK1) was 70%, while that of MGP was 0% (Table 3).

Identified candidate genes
The result of function annotation for DEGs in CAF samples was shown in Table 4. The results displayed that six up-regulated genes and ten down-regulated genes function as transcription factors. In addition, SPHK1 and pre-B-cell leukemia homeobox 1 (PBX1), etc. were oncogenes, while runt-related transcription factor 3 (RUNX3) and TGFBI, etc. were tumor suppressor genes ( Table 4).

Discussion
In this study, we screened a total of 66 up-regulated and 479 down-regulated DEGs by comparing gene expression between CAFs tissues and NF tissues in NSCLC. DEGs including BMP4, TGFB3 and MGP involving in the TGF-β signaling pathway were found to be hub genes in the PPI network. Moreover, no LOH was found in both BMP4 and MGP. Besides, LOH of the tumor oncogene SPHK1 was 70%, and of tumor suppressor gene TGFBI was 40%.
BMP4 is a member of the bone morphogenetic protein family which belongs to the transforming growth factor (TGF-β) superfamily [24], while no LOH events were found about it. In addition, it was found that BMP4 was a hub gene in the PPI network and involved in lung

Table 3 Copy numbers of allele and loss of heterozygosity (LOH) for the genes using the allele-specific copy number (ASCN) analysis
Gene symbol the name of one gene, proportion the percentage of copy number of A allele minus copy number of B allele, Description the frequency of LOH, FC fold change  [25] reported that the TGF-β factor signaling which was regulated by the transcriptional co-activator p/CAF (a histone acetyltransferase), played key roles in breast cancer cell migration and invasion. In previous studies, BMP4 was found play important roles in developmental and many cellular processes including invasion and migration of various cancer cells [26,27]. Since BMP2 is closely associated with NSCLC metastasis [28], BMP4 may be a contributor to affect the behavior and function of CAFs in NSCLC via TGF-β signaling pathway. Meanwhile, TGFB3 as another member of the TGF-β family is also enriched in lung development and TGF-β signaling pathway. It was detected with a high frequency of LOH (70%). Some studies indicated the abnormal expression of TGFB3 in NSCLC [29,30]. Therefore, abnormal expression of BMP4 and TGFB3, as well the high LOH frequency of TGFB3 may be used as biological indicators for malignant CAFs in NSCLC.
Our data also showed that tumor suppressor TGFBI was a hub protein in the PPI network and LOH value of it was 40%. TGFBI has been reported to suppress tumor cell growth in NSCLC [31] and other types of human lung cancers [32]. Therefore, TGFBI may be a tumor suppressor for NSCLC. Besides, Muraoka et al. [33] proved that blocking TGFBI expression enhanced inhibition of tumor cell migration and metastasis. Also, Levy et al. [34] proved that LOH of TGFBI located on chromosome 19q13.1 contributed to the metastasis of breast cancer cells. Thus, LOH may lead to the down-regulation of TGFBI in this study. On the other hand, Fong et al. [35] said that TGFBI promoted the migration of lung cancer cells. CAFs are important for breast cancer cell migration and metastasis [36]. Based on the previous evidences, we speculated that TGFBI may be a tumor suppressor for NSCLC and may be responsible for the CAFs migration in NSCLC.
SPHK1 catalyzes the phosphorylation of sphingosine to form sphingosine-1-phosphate (S1P), a lipid mediator with both intra-and extracellular functions [37]. Presneau and his colleagues proved that the LOH of SPHK1 assigned to chromosome region at 17q25.1-q25.2, suggesting its key role in regulating early events during the development of sensory ganglia in ovarian   -regulated TBX4, RORB, PPARG, KLF4, IFI16, PBX1, SOX5,  NFIA, SMAD3, FOXP2   TEC, MLF1, PBX1, BANF, MLLT3,CDON LXN, DPP4, CLU, RARRES1, TGFBI, IGFBP5,  ERRFI1, CADM1, JUP, DIRAS3, TXNIP,  RARRES3, RPS6KA6, DNAJB4, WNT5A,  SFRP1, ITGA7, DAPK1, RASL11A, EPB41,  TP53INP1, SCARA3, HBP1, LIMD1, FBXO32,  SMAD3, RNASET2, KCNRG, SEMA3B,  NRCAM cancer [38]. Thus, the higher LOH value may be related to the abnormal expression of SPHK1 in NSCLC samples. Therefore, the positive result of SPHK1 at LOH analysis was consistent with the up-regulation of it. In addition, GO analysis showed that SPHK1 enriched in multiple cancer-related processes including positive regulation of cell migration. Chen et al. [39] reported that the tumor oncogene SPHK1 which was up-regulated was responsible for tumor cell migration in hepatocellular carcinoma. Moreover, SPHK1 was found to enhance the NSCLC cell apoptosis via activating PI3K/Akt pathway [40]. Besides, it can induce breast cancer cell migration [41]. Therefore, we speculated that over-expressed SPHK1 may be associated with CAFs migration in NSCLC. Interestingly, some of our results were in accordance with the original publication by Navab et al. [13]. In their study, microarray gene-expression analysis of the 15 matched CAF and NF cell lines identifies 46 differentially expressed genes which are significantly enriched for extracellular proteins regulated by the TGF-β signaling pathway. Our study also found that BMP4 and TGFB3 are involved in TGF-β signaling pathway, suggesting the important role of this pathway in CAF-associated NSCLC. Additionally, their study revealed prominent involvement of the focal adhesion and MAPK signaling pathways, which was not identified in our study. This difference may be due to different sample size and different screening threshold for DEGs.
In conclusion, genes such as BMP4, TGFB3, TGFBI, and SPHK1 may play important roles in CAFs of NSCLC. BMP4 and TGFB3 may be contributor to make effects on CAFs in NSCLC via TGF-β signaling pathway. SPHK1 promoted while TGFBI inhibited the NSCLC progression, and both of them may be associated with CAFs migration in NSCLC. In addition, high frequency of TGFB3, TGFBI and SPHK1 may be useful for distinguishing the normal fibroblasts from malignant CAFs in NSCLC. However, further experimental studies are still needed to confirm our results.