Comprehensive network analysis of different subtypes of molecular disorders in lung cancer

Lung cancer is the leading cause of cancer-related death worldwide. In this study, we attempted to identify the common pathogenesis of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) based on a modular and comprehensive analysis method. Data were downloaded and the differences analyzed in LUAD samples, LUSC samples, and normal samples, respectively. Co-expression analysis, enrichment analysis, and hypergeometric testing were used to predict transcription factors (TFs) and ncRNAs, as well as target genes. We obtained 4,596 differentially expressed genes which were clustered into 14 modules dysfunction. The 14 clustered genes (including DOK2, COL5A1, and TSPAN8) were identified as the core genes of the module. Module genes are substantially involved in biological processes, such as extracellular matrix, carbohydrate binding and renal system development, and signal transduction as well, including PPAR signal transduction, cGMP-PKG signal transduction, PI3K-Akt signal transduction, and Apelin signal transduction. We identified ncRNA (miR-335-5p, ANCR, TUG1) and transcription factors (RELA, SP1) to regulate dysfunction module genes essentially. The analysis showed that comprehensive co-expression analysis contributes to understanding the TF ncRNA. Moreover, it assisted in further understanding of the molecular pathogenesis of co-expression of modular genes that regulate LUAD and LUSC. It provided a precious resource and theoretical basis for further experiments.


Introduction
Lung cancer remains one of the cancers with the highest incidence and ranks first in cancerrelated deaths [1]. Lung cancer is generally categorized as non-small cell lung cancer (NSCLS) and small cell lung cancer, of which the major histological subtypes of NSCLS are LUAD and LUSC [2]. Patients with lung cancer usually have complications, and brain metastasis remains the most common neurological complication [3]. Lung cancer is a heterogeneous disease comprising several subtypes with pathologic and clinical relevance. Both environmental and genetic factors contribute to lung cancer development. Environmental factors include exposure to alfalfa, cooking fumes, asbestos, heavy metals and environmental tobacco fumes, human papillomavirus infection etc. [4,5]. In patients with non-small cell lung cancer, smokers are found to be more con-nected with LUSC than adenocarcinoma, and adenocarcinoma is more common in nonsmokers [6]. Genetically, there is a strong association between SNP rs920778 and rs1899663 in HOTAIR and susceptibility to primary lung cancer [7]. Mir-196a2 polymorphism affects the susceptibility of lung cancer [8]. The polymorphism of MIRLET7BHG (the MIRLET7B host gene at 22q13.31) may be a significant predictor of asbestos exposure-connected lung cancer [9]. Researchers studied the pathogenesis of lung cancer subtypes from all aspects and achieved specific results. For example, Jin's study detected that miR-375 is essentially upregulated in LUAD and small cell lung cancer. However, the expression is down-regulated in LUSC, and it was found that miR-375 targets ITPKB to promote cell growth of small cell lung cancer [10]. Down-regulation of lncRNAs LINC00222 inhibits the proliferation and migration of LUAD cells [11]. Knockout USP33 inhib-its migration, invasion, and metastasis of LUAD cells via IL-6 and SLIT2/ROBO1 signal transduction pathways [12]. MiR-372-3p targeting FGF9 promotes cell proliferation and metastasis in LUSC [13]. MicroRNA-588 targets GRN to inhibit migration and invasion of LUSC [14]. Although many studies have also found important molecules in LUSC and LUAD, we are still not clear about the core molecules. We conducted a systematic modular analysis and exploration, and identified common dysfunction modules and core molecules between LUAD and LUSC to further investigate pathways involved in the pathogenesis of LUAD and LUSC. Comprehensive co-expression analysis has clinical significance to our understanding of the molecular disorders in lung cancer. Moreover, it helps us to understand further the molecular pathogenesis of co-expression of modular genes regulating different subtypes of network. It provides a precious resource and theoretical basis for further experiments.

Data resource
The study aimed to apply high-throughput genomic analysis techniques to obtain a better understanding of cancer and medically contribute to prevention, diagnose and treatment of cancer. The expression profile data of LUAD included 61 LUAD paired samples and 56 normal samples. Expression profile data for LUSC included 48 LUSC paired samples and 48 normal samples. Raid v2.0 database [27899615] has registered more than 5.27 million RNA related interactions, involving 130,000 RNA in 60 species/protein symbols, which provides enough to comprehensively observe RNA related interactions. We screened 431,937 ncRNA-mRNA interaction pairs and 5,431 ncRNAs (score ≥ 0.5) from raid v2.0 database. At the same time, all human TF data were downloaded and used in TRRUST v2 database for transcription research. And the study has obtained the approval from the ethic committee of our hospital.

Statistical analysis
Most of the statistical analysis was undertaken on SPSS software (version 19.0, IBM Corp., Armonk, NY, USA). P-value less than 0.05 was considered indicative of statistical significance.

Difference analysis
For RNA-Seq data on LUAD and LUSC on TCGA, we used R language DEseq2 for differential analysis [25516281]. The R language DEseq2 analysis process consisted of three main steps, including normalization, dispersion estimation, and differential expression testing. Normalization was performed using weighted conditional likelihoods. By simulating the dispersion in all samples, we observed and corrected for too low dispersion estimates. BBSeq simulated the dispersion on the mean. DSS used the Bayesian method to estimate the dispersion of individual genes. These genes could explain the heterogeneity of the dispersion values of different genes. BaySeq and ShrinkBayes estimated the a priori of the Bayesian model for all genes. The findings provided posterior probabilities and false discovery rates for differential expression. For differential genes, we screened for highly differential multiples of P < 0.01 and |logFC| > 1.

Co-expression analysis
We used WGCNA to explore the co-expression of high differential folding genes in 4,596 LUADs and LUSC [19114008]. We analyzed the expression profile matrix of high differential multiple genes and found synergistic expressed gene modules. We use the N-th power of the gene correlation coefficient and the weight of the correlation coefficient to obtain the correlation coefficient between any two genes. In addition, the connections between genes in the network were upon a scale-free network distribution. Correlation coefficients between genes constructed a hierarchical clustering tree. We resorted to the association between ME and clinical features to determine the relevant modules. Gene significance (GS) was defined as the log 10 conversion of the P value (GS = 1 g P) in a linear regression between gene expression and clinical information. Module significance (MS) was defined as the average GS of all genes in the module, with the modules defined as clinical traits.

Enrichment analysis
The exploration of gene function and its signal transduction are often effective methods to study the molecular mechanism of disease. For the genes of 14 important modules of LUAD and LUSC, we used the R language Cluster pro-filer package [22455463] for enrichment analysis on KEGG pathway and Go function (P-value cutoff = 0.05, q-value Cutoff = 0.05).

TFs and ncRNAs regulating dysfunctional modules
For each dysfunctional module, we specified that the pivot regulator refers to the target adjustment times between each adjuster and each module more than twice. Based on the hypergeometric test P-value < 0.01, we obtained the interaction between the regulator and the module. In the study, we used the ncRNA target data as the established background prediction. We wrote the R program to predict and got the key regulator of the dysfunction module. Based on the human ncRNA-mRNA interaction pairs in the RAID v2.0 database and all human TF target data in the TRRUST v2 database, the predicted ncRNA and TF were retrospected to the target gene. We obtained the target gene and detected significant difference in multiple gene intersections.

Determining the typical expression of dysregulated molecules in LUSC and LUAD
To screen for genes that may cause ADC and SCC, based on microarray expression profiling data, we conducted differential gene screening for ADC, SCC, and normal samples in TCGA. The intersection of the two differential genomes showed that 4,596 differences were common matrix was set. Through WGCNA, the study found genes that showed significant group coexpression in disease samples. We obtained 14 functional disorder modules, including 2,965 differential genes ( Figure 3A, 3B and Table S3) and identified the co-expression panel as one module. Central gene with the highest connectivity in each module was identified based on the gene connectivity degree of the network, and 14 central genes including DOK2, COL5A1, and TSPAN8 were acquired (Table S4). The co-expression network indicated that the central gene was an essential gene of the dysfunction module. The blue module and LUAD were negatively correlated with LUSC ( Figure 3C), which suggested that the turquoise module might have functions in tumorigenesis of SCC, and the blue module had contributions in the development of ADC.

Module genes involved in functions and pathways
The exploration of functional pathways involved in the exploration of dysfunctionality modules was conducive to defining the relationship of genes in the same pathway within a module. They also facilitate to build molecular bridges between modules and diseases in systems biology. Enrichment analysis was performed on 14 modules and obtained 6,431 biological processes, 728 cells, 1,290 molecular functions, and 180 KEGG pathways (Table S5). What's more, the genes of the six modules essentially participated in related biological processes ( Figure 1). The results showed that there were 5,952 differential genes in LUAD, and there were 8,055 differential genes in LUSC (Tables S1, S2 and Figure 2A, 2B). Among these genes, there might be related genes that have a significant effect on the development of ADC and SCC, which required further analysis.   such as extracellular matrix, carbohydrate binding and renal system development ( Figure 4A). In addition, two modules of genes were involved in PPAR signal transduction, cGMP-PKG signal transduction, PI3K-Akt signal transduction, and Apelin signal transduction ( Figure 4B).

TF and ncRNA driving lung cancer subtype progression
We carried out a pivotal analysis of the co-modules based on the targeted regulatory relationship of TF and ncRNA to the modular genes and investigated vital transcriptional regulators that regulate the progression of LUAD and LUSC. The results (Tables S6, S7) showed that a total of 341 ncRNAs involved 360 ncRNA-module regulatory pairs and 57 TFs contained 62 TF-module target pairs. Besides, the number of control modules was analyzed, and the most dysfunctional modules with ncRNA (miR-335-5p, ANCR, TUG1, miR-29c-3p) and TF (RELA, SP1) were obtained. These TFs and noncoding RNA were found to regulate the progression of LUAD and LUSC by regulating dysfunctional modules. Potential regulators were identified as dysfunctional molecules of LUAD and LUSC.

Retrospective target gene
The study found that miRNA, snRNA, snoRNA, asRNA, and piRNA in the cell were all synthesized by non-coding genes. They participated in biological processes including gene activation, gene silencing, gene imprinting, does compensation, protein synthesis, and function regulation, and metabolic regulation. Furthermore, a combination of four core ncRNAs and five core TFs that favored the LUAD and LUSC co-expression modules, and regulated genes and genes involved in the pathway ( Figure 5A, 5B). The study observed a relationship between the regulation of common differential gene regulation (transcriptional and post-transcriptional) and gene-dependent pathways in LUAD and LUSC. By modulating the genes of modules 1, 2, 3, 6, 7, 13, the study observed four core ncRNAs while understanding their involvement in the regulation of multiple signal transductions. Among the five core TF regulatory modules, genes 1, 2, 3, 7, and 10 participated in the regulation of multiple pathways. Among them, the most regulated genes of TUG1 and SP1 had the strongest connectivity and had contributions in the pathogenesis of LUAD and LUSC, and were considered to be the most nuclear regulatory factors.

Discussion
Lung cancer is a heterogeneous disease [15,16]. It is pathologically categorized into subtypes of non-small cell lung cancer, such as LUAD, LUSC, and large cell neuroendocrine carcinoma [17]. This study collected the RNA-Seq data of LUAD and LUSC from TCGA for differential analysis, and obtained differential genes with high different fold. Therefore, these differential genes were found to have significant contributions to the dysfunctional mechanisms of LUAD and LUSC. Combining the weighted gene co-expression network, the study identified 14 co-expression modules, among which genes had synergistic expression behavior, and we believe that the synergistic expression of these genes promotes the occurrence of the disease. Furthermore, pathways and functions in the module were observed, as well as the signal transductions involved in the two modules including PPAR signal transduction, cGMP-PKG signal transduction, PI3K-Akt signal transduction, and Apelin signal transduction. TGFβ induced PPAR γ signal transduction to promote EMT and played a vital part in the invasion and migration of lung cancer cells [18]. For patients with advanced LUAD, signal transductions (MAPK, PI3K-Akt, Ras, and cGMP-PKG) are considered to be most likely connected with platinum resistance [19]. Also, CK2α might regulate the invasion and migration of LUAD cells through the PI3K-Akt signaling pathway [20]. The silencing receptor tyrosine kinase ROR1 inhibited the proliferation of LUAD cells via the PI3K/AKT/mTOR signal transduction [21]. P53 regulated the survival of LUSC cells by inhibiting PI3K/AKT signaling [22]. At the molecular level, we identified 14 central genes such as DOK2, COL5A1, and TSPAN8 through a coexpression network. In addition, these genes also remained an important regulatory role in the dysfunction module. DOK2 had been identified as a gene for tumor suppressor of EGFR mutant LUAD [23]. Liu's study found that COL5A1 might promote the metastasis of LUAD cells [24]. Up-regulation of TSPAN8 promoted cell viability and proliferation, which in turn caused non-small cell lung cancer [25]. It was also identified that 341 ncRNA-driven modules  that functioned, including the long non-coding gene (ANCR, TUG1) and the small non-coding gene (miR-335-5p, miR-29c-3p). ANCR downregulated the TGF-β1 pathway to inhibit migration and invasion of NSCLC cells [26]. The upregulation of TUG1 was connected with increased tumor size, a degree of differentiation, lymph node metastasis, distant metastasis, and TNM staging. It is the most promising diagnostic marker for patients with LUAD [27]. TUG1-mediated HOXB7 expression affected cell proliferation in non-small cell lung cancer [28]. Increased expression of miR-335-5p inhibits cell proliferation in NSCLC cells [29]. We examined 57 TFs (HDAC2, NANOG, RELA, SP1, and SP3) that mediate differential gene co-expression networks in LUAD and lung squamous cell lung cancer, thereby regulating the pathogenesis of lung cancer subtypes. HDAC2 upregulated fibronectin by NF-κB to initiate migration and invasion of NSCLC cells [30]. NANOG is expressed in a variety of cancers, and its expression is connected with poor survival in cancer patients [31]. With platinumbased chemotherapy to advanced non-small cell lung cancer, NANOG can be a poor predictor [32]. Co-expression of RELA and ACTN4 induces apoptosis in non-small cell lung cancer cells [33]. In the early stage of lung cancer, SP1 mediates the expression of miR-182, which inhibits the expression of FOXO3 and stimulates the proliferation of lung cancer cells. Down-regulation of advanced SP1 and miR-182 increases the expression of FOXO3 caused metastasis of lung cancer cells [34,35]. SP1 regulates cell proliferation during the development of non-small cell lung cancer [36]. Key regulatory factors have regulatory contributions and have an essential impact on the formation of LUAD and LUSC. This study has a certain limitations for that it conducted data screening through existing data and did not possess genetic characteristics for specific populations. Therefore, it may be limited by geography when the research results are applied in clinical practice.
In conclusion, this study obtained 4,596 differentially expressed genes clustered into 14 modules of dysfunction. The 14 gene clusters were identified as the core genes of the module. Module genes take substantial part in biological processes of lung cancer. It also identified ncRNA pivot (miR-335-5p, ANCR, TUG1) and Transcription Factors pivot (RELA, SP1) to regulate dysfunction module genes essentially, which expected to set a theoretical basis for future research.

Disclosure of conflict of interest
None.
Address correspondence to: Yun Zhang, Department of Nursing, The First Affiliated Hospital of Shandong First Medical University, Jinan, Shandong, China. Tel: +86-18663759461; E-mail: zy18663759461@163.com Figure 5. Regulation of the core regulatory factor regulatory module genes involved in signal transduction. A. Core ncRNA-Gene-pathway network diagram. The red hexagon repersents the core ncRNA, the light red diamond means the signal transduction, and the rest are the genes within the module. B. Core TF-Gene-pathway network diagram. The red arrow represents the core TF, the light blue diamond represents the signal pathway, and the rest represents the genes within the module.