Identification of DEGs
In the present study, the DEGs between thymoma and normal thymus tissues adjacent to the tumor were screened by using microarray. An adjusted P value < 0.05 and ∣Log2FC∣≥1 were set as cut-off criteria at first. We selected 738 DEGs, including 256 upregulated and 482 downregulated genes. The heatmap indicated that 738 DEGs were indeed differentially expressed between thymoma and normal thymus tissues, as illustrated in Fig. 1A. The top 10 upregulated DEGs included CCL25, E2F2, DNTT, PASK, HIST1H2BD, S100A14, NPTX1, HIST1H2AE, RAG2, and HIST1H1D. While the top 10 downregulated DEGs included IL-6, MYOC, ADH1A, PLIN1, ADIPOQ, ADH1C, MGST1, LPL, CIDEC, and FMO2 (Fig. 1B).
Enrichment analysis of DEGs
To find out the biological functions and signaling pathways involved in 738 DEGs, GO and KEGG pathways enrichment analyses were conducted using DAVID. The top 10 for each GO term and the top 10 KEGG pathways are listed. The results of GO showed that: for biological processes, the upregulated DEGs were significantly enriched in cell cycle, cell division, mitotic cell cycle process, chromosome segregation, mitotic nuclear division, while downregulated DEGs in cellular response to chemical stimulus, blood vessel development, vasculature development, cardiovascular system development, blood vessel morphogenesis; for cellular component, upregulated DEGs were mainly enriched in condensed chromosome, chromosomal region, kinetochore, condensed chromosome kinetochore, microtubule cytoskeleton, while downregulated DEGs in extracellular matrix, extracellular region, extracellular organelle, extracellular exosome, cell surface; for molecular function, upregulated DEGs were particularly enriched in chromatin binding, kinase binding, kinetochore binding, protein kinase binding, tubulin binding, while the downregulated DEGs in heparin binding, sulfur compound binding, signaling receptor binding, structural molecule activity, growth factor binding.
The result of KEGG indicated that the upregulated DEGs were particularly enriched in Systemic lupus erythematosus, Alcoholism, Viral carcinogenesis, Cell cycle, and Primary immunodeficiency, while the downregulated DEGs in Complement and coagulation cascades, Malaria, Viral protein interaction with cytokine and cytokine receptor, PPAR signaling pathway, and Vascular smooth muscle contraction (Fig. 2).
PPI network and module selection
We imported 738 DEGs into the STRING online database to obtain the PPI network. After removing free genes, we constructed a PPI network complex with 675 nodes and 6745 edges, based on STRING and Cytoscape analysis. MCODE was used to identify the significant cluster modules in the PPI network and we obtained the top 2 significant modules(Figure 3). For details, module 1 was composed of 66 nodes and 1944 edges, including 66 upregulated DEGs, while module 2 was composed of 64 nodes and 264 edges, including 9 upregulated DEGs and 55 downregulated DEGs. Following GO annotation screening, module 1 was revealed to be associated with nuclear division (BP), chromosomal region (CC), and ATPase activity (MF), and the results were listed in Table 1. Meanwhile, module 2 was revealed to be associated with the regulation of smooth muscle cell proliferation (BP), collagen-containing extracellular matrix (CC), and extracellular matrix structural constituent (MF), and the results were listed in Table 2. The KEGG pathway enrichment analysis revealed that module 1 was mainly enriched in the Cell cycle, DNA replication, and Fanconi anemia pathway (Table 1), while module 2 was mainly enriched in the Hematopoietic cell lineage, FoxO signaling pathway, and AMPK signaling pathway (Table 2).
Table 1
GO enrichment and KEGG pathways analysis of module 1
Terms | ID | Description | p.adjust | qvalue |
BP | GO:0000280 | Nuclear division | 2.6765E-31 | 1.64261E-31 |
BP | GO:0007059 | chromosome segregation | 2.6765E-31 | 1.64261E-31 |
BP | GO:0048285 | organelle fission | 3.90102E-30 | 2.39412E-30 |
BP | GO:0140014 | mitotic nuclear division | 3.90102E-30 | 2.39412E-30 |
BP | GO:0000819 | sister chromatid segregation | 1.64451E-27 | 1.00926E-27 |
CC | GO:0098687 | chromosomal region | 9.76794E-32 | 4.99413E-32 |
CC | GO:0000776 | kinetochore | 6.2444E-25 | 3.19263E-25 |
CC | GO:0005819 | spindle | 3.4413E-18 | 1.75946E-18 |
CC | GO:0005874 | microtubule | 9.8966E-10 | 5.05991E-10 |
CC | GO:0030496 | midbody | 3.27481E-09 | 1.67434E-09 |
MF | GO:0016887 | ATPase activity | 2.0207E-08 | 1.27623E-08 |
MF | GO:0008017 | microtubule binding | 1.09298E-07 | 6.90301E-08 |
MF | GO:0003777 | microtubule motor activity | 3.70237E-07 | 2.33834E-07 |
MF | GO:0015631 | tubulin binding | 1.50005E-06 | 9.47399E-07 |
MF | GO:0035173 | histone kinase activity | 7.73364E-06 | 4.8844E-06 |
KEGG | hsa04110 | Cell cycle | 2.2813E-09 | 1.57804E-09 |
KEGG | hsa03030 | DNA replication | 1.12271E-06 | 7.76613E-07 |
KEGG | hsa03460 | Fanconi anemia pathway | 0.000139817 | 9.67152E-05 |
KEGG | hsa04115 | p53 signaling pathway | 0.000385498 | 0.00026666 |
KEGG | hsa04218 | Cellular senescence | 0.006065831 | 0.004195913 |
Table 2
GO enrichment and KEGG pathways analysis of module 2
Terms | ID | Description | p.adjust | qvalue |
BP | GO:0048660 | regulation of smooth muscle cell proliferation | 7.16734E-06 | 4.67E-06 |
BP | GO:0019216 | regulation of the d metabolic process | 8.30286E-06 | 5.41E-06 |
BP | GO:0006638 | neutral lipid metabolic process | 8.30286E-06 | 5.41E-06 |
BP | GO:0006639 | acylglycerol metabolic process | 8.30286E-06 | 5.41E-06 |
BP | GO:0006641 | triglyceride metabolic process | 2.99665E-05 | 1.95E-05 |
CC | GO:0062023 | collagen-containing extracellular matrix | 6.36603E-10 | 4.44E-10 |
CC | GO:0005811 | lipid droplet | 3.8072E-05 | 2.65E-05 |
CC | GO:0005788 | endoplasmic reticulum lumen | 0.000296028 | 0.000206 |
CC | GO:0045121 | membrane raft | 0.000296028 | 0.000206 |
CC | GO:0098857 | membrane microdomain | 0.000296028 | 0.000206 |
MF | GO:0005201 | extracellular matrix structural constituent | 6.30957E-08 | 4.48E-08 |
MF | GO:0019838 | growth factor binding | 4.60891E-05 | 3.27E-05 |
MF | GO:0071723 | lipopeptide binding | 0.00033957 | 0.000241 |
MF | GO:0048018 | receptor ligand activity | 0.001948086 | 0.001382 |
MF | GO:0030546 | signaling receptor activator activity | 0.001948086 | 0.001382 |
KEGG | hsa04640 | Hematopoietic cell lineage | 2.52995E-05 | 2.17E-05 |
KEGG | hsa04068 | FoxO signaling pathway | 0.006500142 | 0.005564 |
KEGG | hsa04152 | AMPK signaling pathway | 0.018413393 | 0.015763 |
KEGG | hsa03320 | PPAR signaling pathway | 0.022333974 | 0.019119 |
KEGG | hsa04270 | Vascular smooth muscle contraction | 0.024157616 | 0.02068 |
Hub genes identification and analysis
The PPI network was imported into Cytoscape software to visualize and hub genes were selected by the cytoHuba plug-in. We ranked the top 15 hub genes by the Degree, MCC, and Closeness methods, and the results revealed the top 10 genes identified as hub genes, including CDK1, E2F2, IGF1, IL6, JUN, MCM2, MKI67, MYC, TOP2A, and VWF (Fig. 3). We used GEPIA to detect the transcript levels of top10 hub genes in 118 thymoma tissues and 339 peri-tumor tissues from the TCGA database. The results indicated that there was no difference between tumor tissues and peri-tumor tissues in IL6, while CDK1, E2F2, IGF1, JUN, MCM2, MKI67, MYC, TOP2A, and VWF were significantly highly expressed in tumor tissues compared with peri-tumor tissues (Fig. 4).
Hub genes survival analysis and validation
The prognostic value of 10 hub genes was evaluated scientifically by Kaplan-Meier-Plotter. As illustrated in Fig. 5, it was observed that a total of 118 thymoma patients were available for the analysis of overall survival (OS) and six hub genes had a close connection with prognosis. The relative higher expression of JUN(HR:4.95[1-24.47], p = 0.031) was associated with a worse OS in thymoma patients, while the lower expression of CDK1(HR:0.17[0.03–0.83], p = 0.015), E2F2(HR:0.16[0.03–0.83], p = 0.015), MCM2(HR:0.16[0.03–0.82], p = 0.014), MKI67(HR:0.18[0.04–0.89], p = 0.02) and TOP2A(HR:0.12[0.02–0.58], p = 0.0018) was associated with a worse OS in thymoma patients. Survival analyses of the other hub genes (IL6, VWF, MYC, IGF1) did not show any statistical significance (p > 0.05).
To validate the bioinformatics analysis results, we collected 80 samples of thymoma tissues and 10 samples of paraneoplastic thymic tissues to perform immunohistochemical analysis. Compared with thymus tissues, 6 hub genes (CDK1, E2F2, MCM2, MKI67, TOP2A, and JUN) presented more positive immunotherapy results in thymoma tissues (Fig. 6). Above all, we demonstrated that CDK1, E2F2, MCM2, MKI67, TOP2A, and JUN were significantly correlated to patients’ outcomes and validated their high expression in thymoma by immunohistochemical analysis.
Relationships between hub genes’ expression and clinical characteristics of thymoma
To explore the relationship between six prognosis-related genes and thymoma patients’ clinical characteristics, we compared the expression levels of six genes with pathological subtypes, tumor stage, sex, and age, based on the IHC scores of 80 samples of thymoma tissue and 10 samples of paraneoplastic thymus tissue. The results indicated that the expression levels of the six genes were not different when comparing tumor stage, age, and sex. However, the expression levels of six hub genes showed significant differences between 5 subtypes of thymoma (including A, AB, B1, B2, and B3). Among the six genes, the expression of JUN and MKI67 were higher than normal thymus tissue and showed no difference in 5 subtypes of thymoma. TOP2A and CDK1 were highly expressed in AB, B1, B2 and B3 subtypes of thymoma compared with normal thymus tissue. Furthermore, E2F2 was particularly overexpressed in type B2 thymoma compared with the other subtypes and normal thymus tissue (Fig. 7).
Relationship between hub genes expression and tumor purity and immune infiltration
The tumor microenvironment (TME) is a complex and continuously evolving entity, including tumor cells, stromal cells, and infiltrating immune cells. TIMER database provides a systematical analysis of immune infiltrates across diverse cancer types. In our study, we used the TIMER database to analyze the association of specific gene expression levels with thymoma immune populations.
Among the 6 hub genes, interestingly, only E2F2 (r=-0.198, p = 3.35E-02) was negatively correlated with tumor purity. Moreover, the expression of CDK1, E2F2, JUN, and MCM2 was significantly correlated with the infiltration of B cells, CD8 + T cells, CD4 + T cells, macrophages, neutrophils, and DCs. The expression of MKI67 and TOP2A was significantly correlated with the infiltration of B cells, CD8 + T cells, CD4 + T cells, macrophages, and DCs (Fig. 9). On the other hand, the copy number variation (CNV) OF E2F2 and JUN was significantly correlated with infiltration levels of CD4 + T cells, macrophages, and DCs. CDK1 was associated with CD8 + T cells, macrophages, and DCs. MCM2 was associated with CD4 + T cells and DCs. MKI67 was significantly correlated with B cells, macrophages, and DCs. The expression of TOP2A was associated with B cells, CD8 + T cells, CD4 + T cells, macrophages, and DCs (Fig. 8).
Hub genes GESA analysis
To gain an in-depth understanding of the potential biological function of the 6 hub genes in thymoma, we performed GSEA on TCGA-thymoma RNA-seq data. We acquired a total of 100 significant genes by GSEA, with both positive and negative correlations. GSEA was used to conduct a hallmark for CDK1, E2F2, JUN, MCM2, and TOP2A. As is shown in Fig. 9 the results indicated that 6 hub genes were all enriched through the cell cycle and p53 signaling pathway. Meanwhile, the 4 genes (CDK1, JUN, MKI67, and TOP2A) were all enriched in RNA degradation. Moreover, CDK1 and MCM2 were enriched in the TGF-β signaling pathway. E2F2 and TOP2A were enriched in nucleotide excision repair. Interestingly, the signaling pathway most involved MCM2 contained DNA replication regulation of autophagy, cell cycle, oxidative phosphorylation, p53 signaling pathway, and TGF-β signaling pathway. Some of these pathways had a close connection with the development and metabolism of tumor.