Biomarkers Associated With Metastasis and Prognosis of Lung Cancer Based on Microarray Data

Background: We aimed to discover biomarkers associated with lung cancer metastasis and prognosis. Methods: The mRNA/lncRNA expression profiles, which included three highly metastatic and three weakly metastatic samples, were downloaded from a publicly available database. Differentially expressed genes and lncRNAs were identified, and survival analysis was performed based on the TCGA database. A prognosis-associated protein-protein interaction (PPI) network and an mRNA-lncRNA coexpression network were constructed, followed by function and pathway enrichment analysis. Results: A total of 256 differentially expressed genes and 2 lncRNAs were found to be closely related to prognosis. The PPI network contained 222 nodes and 1464 edges, and two modules were identified from it. Genes in module A were significantly enriched in cell cycle checkpoint, chromosome segregation, and mitotic cell cycle checkpoint. Module B was closely related to pyridine nucleotide metabolic process, nicotinamide nucleotide metabolic process, and carbon metabolism. The coexpression network revealed that lncRNA H19 and lncRNA SNHG12 were significant nodes. SNHG12 was closely related to GO:0006260~DNA replication, GO:0055114~oxidation-reduction process, and hsa00010:Glycolysis/Gluconeogenesis. H19 was enriched in GO:0006555~methionine metabolic process and GO:0046655~folic acid metabolic process. Conclusion: TTK, CCNB1, and lncRNA SNHG12 may be biomarkers associated with metastasis and prognosis of lung adenocarcinoma.


Introduction
Lung cancer originates from uncontrolled cell proliferation in lung tissues and can spread to other tissues through metastasis. It is classified into two main types: small-cell lung carcinoma (SCLC) and non-small-cell lung carcinoma (NSCLC) (1). Smoking is one of the risk factors for lung cancer. Although the incidence of lung cancer is declining in developed countries, the increasing rate of smoking in developing countries is expected to raise its incidence, especially in China and India (2,3). Lung adenocarcinoma, the most common histological subtype of NSCLC, is closely related to high mortality and a high metastasis rate (4). Because of its poor prognosis, the treatment of lung adenocarcinoma has been widely investigated.
Gene alterations are implicated in the progression of cancers, and studies of these alterations drive the discovery of biomarkers for the diagnosis and treatment of lung cancer. MALAT1 (metastasis associated lung adenocarcinoma transcript 1) is reported to be overexpressed in lung cancer cells and to be a biomarker for predicting metastasis and prognosis of NSCLC (5). In addition, lncRNAs are found to be dysregulated in lung cancers and to play a regulatory role in the metastasis of tumor cells (6,7). LncRNA ANRIL is reported to be overexpressed in lung cancer tissues and to serve as a marker for predicting the prognosis of NSCLC patients (8). Recent evidence suggests that lncRNA CAR10 (chromatin-associated RNA 10) is up-regulated in lung tumor tissues and promotes the metastasis of lung adenocarcinoma cells (9). Despite the advances in exploring the pathogenesis of lung cancer, biomarkers that predict the prognosis of lung adenocarcinoma are not yet fully understood. Therefore, in this paper, we downloaded the lncRNA/mRNA microarray data (10) associated with NSCLC cell metastasis and re-analyzed the differentially expressed mRNAs and lncRNAs between highly metastatic and weakly metastatic lung adenocarcinoma cells. Unlike that study, we also downloaded the mRNA/lncRNA expression profiles of lung adenocarcinoma and predicted the prognosis-associated genes and lncRNAs through survival analysis. A prognosis-associated gene-lncRNA network was then constructed. We aimed to explore biomarkers associated with the metastasis and prognosis of lung adenocarcinoma.

Data preprocessing and differential expression analysis
The raw data were downloaded and preprocessed with the limma package (12), including background correction, normalization, and concentration prediction. The expression values of mRNAs and lncRNAs were calculated based on the annotation information of the probes. Differential expression analysis between SPC-A-1sci and SPC-A-1 samples was performed with the limma package. The classic Bayes test was used for significance testing and the Benjamini-Hochberg method for multiple-testing correction. LncRNAs or mRNAs with adj.P.Val < 0.05 and |log2 FC (fold change)| > 2 were considered differentially expressed (dif). Heatmaps of dif-mRNAs and dif-lncRNAs were drawn with pheatmap (13) (version 1.0.10, https://cran.r-project.org/web/packages/pheatmap/index.html) in R.
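The selection rule above can be sketched in a few lines. This is a minimal illustration, not the limma workflow itself: it assumes raw p-values and log2 fold changes are already available, applies a plain Benjamini-Hochberg adjustment, and keeps entries with adjusted p < 0.05 and |log2 FC| > 2. The function names and the toy values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_differential(records, adj_p_cutoff=0.05, lfc_cutoff=2.0):
    """records: list of (name, log2_fold_change, raw_p_value) tuples."""
    adjusted = benjamini_hochberg([p for _, _, p in records])
    selected = []
    for (name, lfc, _), q in zip(records, adjusted):
        if q < adj_p_cutoff and abs(lfc) > lfc_cutoff:
            direction = "up" if lfc > 0 else "down"
            selected.append((name, direction, q))
    return selected


# Hypothetical toy values, not taken from the study.
toy = [
    ("geneA", 3.1, 0.0004),
    ("geneB", -2.6, 0.0010),
    ("geneC", 1.2, 0.0020),   # fails the fold-change cutoff
    ("geneD", -4.0, 0.2000),  # fails the adjusted p-value cutoff
]
hits = select_differential(toy)
for name, direction, q in hits:
    print(name, direction, round(q, 4))
```

Both cutoffs must hold at once, so a large fold change with a weak adjusted p-value (geneD) is dropped just like a significant but small change (geneC).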

Survival analysis
The data of lung adenocarcinoma samples with prognostic information were downloaded from the TCGA database. The expression values of the dif-lncRNAs and dif-mRNAs (genes) and the corresponding clinical information were extracted. The samples were classified into high expression and low expression groups based on the median expression value of a given lncRNA or gene. Kaplan-Meier survival curves were analyzed with the survival package (14) (version 2.42-6, https://cran.r-project.org/web/packages/survival/index.html) in R, and p < 0.05 was considered significant.
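The median split and the Kaplan-Meier estimate described above can be illustrated as follows. This is a sketch under stated assumptions: the text does not name the test behind p < 0.05, so a two-group log-rank test is assumed here, and the cohort values, variable names, and function names are all hypothetical.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censored samples both leave the risk set
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (assumed here; chi-square with 1 df)."""
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical toy cohort: expression, follow-up in months, 1 = death, 0 = censored.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
months = [30, 28, 25, 20, 12, 9, 6, 3]
died = [0, 1, 0, 1, 1, 1, 1, 1]

labels = split_by_median(expression)
high = [i for i, g in enumerate(labels) if g == "high"]
low = [i for i, g in enumerate(labels) if g == "low"]
high_t, high_e = [months[i] for i in high], [died[i] for i in high]
low_t, low_e = [months[i] for i in low], [died[i] for i in low]

km_high = kaplan_meier(high_t, high_e)
km_low = kaplan_meier(low_t, low_e)
p_value = logrank_p(high_t, high_e, low_t, low_e)
print(km_high, km_low, p_value)
```

In this toy cohort the high-expression group dies earlier, so the two curves separate and the assumed log-rank test falls below the 0.05 threshold.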

Functional analysis
The dif-mRNAs and dif-lncRNAs that were significantly associated with prognosis were subjected to enrichment analysis of GO (Gene Ontology) biological process (BP) terms and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways (15) with the clusterProfiler package (16) (version 2.4.3, http://bioconductor.org/packages/3.2/bioc/html/clusterProfiler.html) in R. GO terms or pathways with P value < 0.05 and count ≥ 2 were considered significant.
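The enrichment criterion above (P value < 0.05 and count ≥ 2) can be sketched as a simple over-representation test. The block below assumes a one-sided hypergeometric test, which is the usual basis for this kind of analysis; it is not a reimplementation of clusterProfiler, and the gene identifiers, term names, and function names are hypothetical.

```python
from math import comb


def enrichment_p(universe_size, term_size, query_size, overlap):
    """P(X >= overlap) when query_size genes are drawn from the universe."""
    upper = min(term_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(query, term_to_genes, universe, p_cutoff=0.05, min_count=2):
    """Return [(term, count, p)] for terms passing both thresholds."""
    query = set(query) & set(universe)
    results = []
    for term, genes in term_to_genes.items():
        genes = set(genes) & set(universe)
        count = len(query & genes)
        if count < min_count:
            continue  # the count filter is applied before testing
        p = enrichment_p(len(universe), len(genes), len(query), count)
        if p < p_cutoff:
            results.append((term, count, p))
    return sorted(results, key=lambda item: item[2])


# Hypothetical toy annotation: 20 background genes and two terms.
universe = [f"g{i}" for i in range(20)]
terms = {
    "term_1": ["g0", "g1", "g2", "g3", "g4"],
    "term_2": ["g10", "g11", "g12", "g13"],
}
query_genes = ["g0", "g1", "g2", "g10"]
enriched = enrich(query_genes, terms, universe)
print(enriched)
```

Here term_1 shares three of the four query genes and passes both filters, while term_2 shares only one gene and is removed by the count threshold regardless of its p-value.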

Prognosis associated PPI (protein-protein interaction) network analysis
The interactions between the proteins encoded by the dif-mRNAs were predicted by STRING (17), and modules in the resulting network were identified with the MCODE plugin (http://apps.cytoscape.org/apps/MCODE). Modules with score ≥ 5 were retained. The functions and pathways associated with the module genes were then analyzed, with P value < 0.05 and count ≥ 2 as thresholds.

Coexpression analysis of prognosis associated lncRNA and mRNA
The Pearson correlation coefficient between each prognosis-associated lncRNA and mRNA was calculated. LncRNA-mRNA coexpression pairs with r > 0.99 and p value < 0.05 were collected, and the lncRNA-mRNA coexpression network was constructed. The functions and pathways associated with each lncRNA were further analyzed.
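The pairing step can be illustrated with a short sketch of the correlation filter. It computes the Pearson coefficient for every lncRNA-mRNA pair and keeps pairs with r > 0.99; the p value filter is deliberately left out here, and the expression vectors, identifiers, and function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_pairs(lnc_expr, mrna_expr, r_cutoff=0.99):
    """Return [(lncRNA, mRNA, r)] for pairs whose r exceeds the cutoff."""
    pairs = []
    for lnc, lnc_values in lnc_expr.items():
        for gene, gene_values in mrna_expr.items():
            r = pearson_r(lnc_values, gene_values)
            if r > r_cutoff:
                pairs.append((lnc, gene, r))
    return pairs


# Hypothetical expression values across six samples.
lnc_expr = {"lncRNA_X": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}
mrna_expr = {
    "gene_P": [2.1, 4.0, 6.1, 8.0, 10.1, 12.0],  # nearly proportional
    "gene_Q": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated
    "gene_R": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],    # moderately correlated
}
edges = coexpression_pairs(lnc_expr, mrna_expr)
print([(lnc, gene) for lnc, gene, _ in edges])
```

Because the cutoff is r > 0.99 rather than |r| > 0.99, a strongly negative correlation such as gene_Q is not kept as a coexpression edge.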

RNAs with differential expression
Based on the threshold values, 1457 dif-mRNAs (64 up-regulated and 1393 down-regulated) and 119 dif-lncRNAs (19 up-regulated and 100 down-regulated) were obtained. The heatmap illustrated that the SPC-A-1sci and SPC-A-1 samples were clearly distinguished based on the expression profiles of dif-mRNAs (Figure 1A) and dif-lncRNAs (Figure 1B).

Survival analysis
To identify the mRNAs and lncRNAs closely associated with prognosis, Kaplan-Meier (KM) survival curve analysis was performed. The expression of 256 mRNAs was associated with prognosis, including 11 up-regulated and 245 down-regulated mRNAs. Two down-regulated lncRNAs (H19 and SNHG12) were identified as associated with prognosis. The down-regulated mRNAs were closely related to 34 KEGG pathways and 558 GO-BP terms, such as glyoxylate and dicarboxylate metabolism (hsa00630), pentose phosphate pathway (hsa00030), organelle fission (GO:0048285) and nuclear division (GO:0000280). The top 10 GO terms and pathways are displayed in Figure 2.

PPI network and modules
Figure 3A illustrates that the PPI network consisted of 1464 interactions connecting 222 nodes. With the MCODE plugin, two modules with score ≥ 5 were captured: module A (score = 30.848) and module B (score = 6.5). Module A comprised 34 nodes and 509 edges (Figure 3B), and module B contained 13 nodes and 39 interaction pairs (Figure 3C). The module A genes were primarily enriched in cell cycle and DNA replication related pathways and in chromosome segregation and cell cycle checkpoint related GO-BP terms (Figure 4A and B). The genes in module B were closely related to the pathways of glycolysis/gluconeogenesis and central carbon metabolism in cancer, and to the GO functions of pyridine nucleotide metabolic process and nicotinamide nucleotide metabolic process (Figure 4A and B).
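The reported module scores can be checked against the node and edge counts. The sketch below assumes that the module score equals the module density (edges divided by the maximum possible number of edges, self-loops excluded) multiplied by the number of nodes; under that assumption the counts given above reproduce both scores. The function names are hypothetical.

```python
def module_density(n_nodes, n_edges):
    """Fraction of possible undirected edges present (no self-loops)."""
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


def module_score(n_nodes, n_edges):
    """Assumed score definition: density multiplied by the number of nodes."""
    return module_density(n_nodes, n_edges) * n_nodes


score_a = module_score(34, 509)  # module A: 34 nodes, 509 edges
score_b = module_score(13, 39)   # module B: 13 nodes, 39 edges
print(round(score_a, 3), round(score_b, 3))  # 30.848 6.5
```

Module A is therefore close to a complete subgraph (density about 0.91), whereas module B has exactly half of its possible edges.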

LncRNA-mRNA coexpressed network
All the prognosis-associated genes and lncRNAs were subjected to coexpression analysis. A total of 129 lncRNA-mRNA interaction pairs were obtained, and the coexpression network was constructed with 101 nodes, comprising 2 lncRNAs and 99 genes (Figure 5). LncRNA H19 and lncRNA SNHG12 were the significant nodes in the coexpression network. GO function and pathway analysis showed that the target genes of SNHG12 were significantly enriched in 5 KEGG pathways and 7 GO-BP terms, such as GO:0006260~DNA replication, GO:0000082~G1/S transition of mitotic cell cycle and hsa00010:Glycolysis/Gluconeogenesis. H19 was closely related to GO:0006555~methionine metabolic process, GO:0046655~folic acid metabolic process, hsa01130:Biosynthesis of antibiotics and hsa04115:p53 signaling pathway (Figure 6).

Discussion
Lung adenocarcinoma is the most common type of lung cancer, reported to account for 80% of all lung cancers (20), and it is closely related to metastasis and poor prognosis. Histopathology is necessary for cancer diagnosis, but it is inadequate for predicting disease progression and prognosis of lung adenocarcinoma (21). Gene expression profile analysis based on microarray data facilitates the discovery of biomarkers for predicting patient survival in lung adenocarcinoma. In this paper, we used the lncRNA/mRNA expression profiles of NSCLC cells to identify biomarkers for predicting metastasis and prognosis of lung adenocarcinoma patients. A PPI network was constructed for the differentially expressed genes. Our data showed that CCNB1 and TTK were significant nodes in the PPI network and that both genes were clustered in module A. CCNB1, which encodes the cyclin B1 protein, is a member of the conserved cyclin family that plays a regulatory role in the cell cycle (22). The function and pathway analysis showed that CCNB1 was involved in cell cycle related biological processes, such as cell cycle checkpoint and chromosome segregation, which is consistent with a previous report (23). Genetic polymorphisms of CCNB1 were found to be related to the susceptibility, progression, and survival of breast cancer in Han Chinese (23). CCNB1 has been proposed as a marker for predicting the prognosis of patients with ER+ breast cancer (24). In addition, the downregulation of CCNB1 is closely related to impaired cell proliferation and tumor growth in colorectal cancer (25). Moreover, a previous gene expression profiling analysis showed that cell cycle genes such as CCNB1 were altered at the early stage of lung adenocarcinoma (26). Previous evidence has shown that CCNB1 expression is associated with tumor progression and prognosis of lung adenocarcinoma (27,28). Our survival analysis also showed that CCNB1 was a prognosis-associated gene, and it is expected to be a biomarker for predicting the prognosis of lung adenocarcinoma.
TTK (threonine and tyrosine kinase), also known as Mps1 (monopolar spindle 1), is a core component of the spindle assembly checkpoint, which plays a key role in chromosome allocation. TTK has been found to be overexpressed in several types of human cancers, such as glioma, breast cancer, and colon cancer (29). TTK protein is reported to be significantly up-regulated in liver tumor tissues compared with adjacent normal hepatic tissues (30). Inhibition of TTK expression by TTK siRNA significantly suppressed liver tumor growth and the spread of tumor cells, and targeting TTK has been proposed as an adjunct therapy for liver cancer. In addition, TTK has been found to be specifically overexpressed in triple-negative breast cancer, an aggressive subtype of breast cancer (31). The expression of TTK has been used to evaluate the prognosis of colon and breast cancer (32,33). Although the role of TTK in cancers has been widely reported, studies of TTK in lung adenocarcinoma are limited. In this paper, TTK was found to be a prognosis-associated gene and was prominently enriched in cell cycle checkpoint and chromosome segregation. We suggest that TTK is involved in cell proliferation and may be a biomarker for predicting the prognosis of lung adenocarcinoma.
In addition, the lncRNA-mRNA interaction network showed that CCNB1 was a target of lncRNA SNHG12, which was the most significant node in the lncRNA-mRNA network. A recent study suggested that SNHG12 mediated oxygen-glucose deprivation and induced reoxygenation damage in neurons underlying ischaemic stroke. Our function enrichment analysis showed that SNHG12 was closely related to GO:0055114~oxidation-reduction process and the hsa00010:Glycolysis/Gluconeogenesis pathway. These findings may help explain the role of SNHG12 in ischaemic stroke, which supports the relevance of our results. SNHG12 was also involved in GO:0006260~DNA replication and GO:0000082~G1/S transition of mitotic cell cycle, which suggests that SNHG12 could regulate cell proliferation. Recent evidence shows that lncRNA SNHG12 promotes proliferation and metastasis of osteosarcoma and papillary thyroid carcinoma (34,35), which is consistent with our findings. An oncogenic role of lncRNA SNHG12 has also been reported in cervical cancer (36) and prostate cancer (37). However, its prognostic role in lung adenocarcinoma has rarely been reported. In this paper, survival analysis showed that lncRNA SNHG12 was closely associated with prognosis. Thus, we suggest lncRNA SNHG12 as a biomarker for the prognosis of lung adenocarcinoma.
In conclusion, analysis of the mRNA and lncRNA expression profiles based on microarray data revealed differentially expressed mRNAs and lncRNAs. TTK and CCNB1 were differentially expressed in highly metastatic lung adenocarcinoma cells and were significant nodes in the PPI network. LncRNA SNHG12 was identified as a differentially expressed lncRNA in lung adenocarcinoma cells and was a core node in the coexpression network. TTK, CCNB1 and lncRNA SNHG12 are proposed as biomarkers for the metastasis and prognosis of lung adenocarcinoma.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data generated or analysed during this study are included in this published article [and its supplementary information files].

Competing interests
The authors declare that they have no competing interests.

Funding
None.

Authors' contributions
JD designed the experiment and drafted the manuscript. HQ collected, interpreted and analyzed the data. DX reviewed the manuscript for important intellectual content. All authors read and approved the final manuscript.

Figure 1. Heatmap of differentially expressed genes and differentially expressed lncRNAs. The samples in the two groups are clearly distinguished based on the expression profiles of differentially expressed genes (A) and differentially expressed lncRNAs (B).
Figure 3. Protein-protein interaction network (A), module A (B) and module B (C). Red circles, up-regulated genes; green rhombuses, down-regulated genes; node size represents the node degree.
Figure 5. LncRNA-mRNA interaction network. Green rhombuses represent genes and blue hexagons represent lncRNAs.