Bioinformatics analysis of the key potential ceRNA biomarkers in human thymic epithelial tumors

Abstract Background: Thymic epithelial tumors (TETs), originating from the thymic epithelial cells, are the most common primary neoplasms of the anterior mediastinum. Emerging evidence demonstrated that the competing endogenous RNAs (ceRNAs) exerted a crucial effect on tumor development. Hence, it is urgent to understand the regulatory mechanism of ceRNAs in TETs and its impact on tumor prognosis. Methods: TETs datasets were harvested from the UCSC Xena as the training cohort, followed by differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs), and miRNAs (DEmiRNAs) at different pathologic type (A, AB, B, and TC) identified via DESeq2 package. clusterProfiler package was utilized to carry out gene ontology and Kyoto encyclopedia of genes and genomes functional analysis on the DEmRNAs. Subsequently, the lncRNA-miRNA-mRNA regulatory network was constructed to screen the key DEmRNAs. After the key DEmRNAs were verified in the external cohort from Gene Expression Omnibus database, their associated-ceRNAs modules were used to perform the K-M and Cox regression analysis to build a prognostic significance for TETs. Lastly, the feasibility of the prognostic significance was validated by receiver operating characteristic (ROC) curves and the area under the curve. Results: Finally, a total of 463 DEmRNAs, 87 DElncRNAs, and 20 DEmiRNAs were obtained from the intersection of differentially expressed genes in different pathological types of TETs. Functional enrichment analysis showed that the DEmRNAs were closely related to cell proliferation and tumor development. After lncRNA-miRNA-mRNA network construction and external cohort validation, a total of 4 DEmRNAs DOCK11, MCAM, MYO10, and WASF3 were identified and their associated-ceRNA modules were significantly associated with prognosis, which contained 3 lncRNAs (lncRNA LINC00665, lncRNA NR2F1-AS1, and lncRNA RP11-285A1.1), 4 mRNAs (DOCK11, MCAM, MYO10, and WASF3), and 4 miRNAs (hsa-mir-143, hsa-mir-141, hsa-mir-140, and hsa-mir-3199). Meanwhile, ROC curves verified the accuracy of prediction ability of the screened ceRNA modules for prognosis of TETs. Conclusion: Our study revealed that ceRNAs modules might exert a crucial role in the progression of TETs. The mRNA associated-ceRNA modules could effectively predict the prognosis of TETs, which might be the potential prognostic and therapeutic markers for TETs patients.


Introduction
Thymic epithelial tumors (thymoma and thymic carcinoma, TETs) were the most common primary tumors of anterior mediastinum in humans, which originated from thymic epithelial cells. In light of the 2015 World Health Organization (WHO) classification, TETs were categorized as thymomas (A, AB, B1, B2, and B3 subtypes) and thymic carcinomas (TCs) based on the tumor cell morphology, degree of atypia, and extent of the thymocyte component. [1] Thymoma has the basic structure of the thymus, whereas the tissue structure in TC is unclear. The WHO histological classification of thymoma is closely related to its prognosis. Type A, AB, and B1 thymoma progress slowly, and the 5 to 10 years overall survival rate is more than 80%. [2] While thymomas B2 and B3 have an intermediate behavior, and their 5-year survival rate is close to 50%. [3] Early TETs can be treated surgically, but the advanced and recurrent TETs have only palliative therapy with chemotherapy. Although targeted therapy had shown promising results in other types of tumors, the advance of targeted therapy in TETs is still slow. [4][5][6] Therefore, screening the biomarkers for early identification and potential molecular targets for advanced and recurrent TETs was greatly significant.
The ceRNAs competitively bind specific miRNAs with other RNA molecules to regulate the expression of target genes. [7][8][9] Considering that any transcripts containing miRNA response elements can theoretically function as ceRNAs, they may represent a widespread form of post-transcriptional regulation of gene expression, whether in pathological or physiological situations. The activity of ceRNA could be affected by many factors, such as the abundance, subcellular localization, the binding affinity of miRNAs to sponge, RNA secondary structure, and RNA binding protein. The distortions of these factors may release the control of the ceRNA network, implicated in human diseases, including cancer. [10] The endogenous RNA molecules with miRNA action sites can compete with miRNA combinations, thereby indirectly regulating the expression of miRNA target genes. This competitive binding miRNA role is called miRNA molecular sponges. [11] It has Chen et al. Medicine (2021) 100: 24 Medicine been shown that various RNA molecules (including mRNA, lncRNA, miRNA, circular RNA, and Pseudogenes) could be mutually regulated through the ceRNAs mechanism, thereby affecting the proliferation and metastasis of the tumors. [12] In addition, ceRNAs are also considered as a specific mechanism involved in the regulation of tumor-related proteins. [13] Therefore, the research for ceRNAs molecular related to TETs would provide new perspectives on the diagnosis and treatment of TETs.
In this study, we comprehensively analyzed and identified the differentially expressed mRNA (DEmRNAs), lncRNAs (DElncR-NAs), and miRNAs (DEmiRNAs) that were associated with TETs progression. The potential key DEmRNAs were screened via lncRNA-miRNA-mRNA network and their ceRNAs modules. The K-M and Cox regression analysis were used to assess the prognostic significance for TETs. The predictive accuracy was validated by receiver operating characteristic (ROC) curves and area under the curve (AUC) validation. The study identified the novel diagnostic ceRNAs modules of TETs and provided new evidence for the early screening and therapeutic target for TETs patients.

Data collection
The mRNAs, lncRNAs, and miRNAs expression profiles of TETs were downloaded from the UCSC Xena (https://xena.ucsc.edu/) database, containing 16 samples of type A, 35 samples of type AB, 57 samples of type B, 11 samples of TC, as well as 2 samples of normal tissue. The flow chart of the study was presented in Fig. 1. And our study had been approved by the ethics committee of The Second Affiliated Hospital of Harbin Medical University.

Differently expression genes identification
Firstly, the genes that were expressed in less than 10% of the samples would be eliminated. Next, the R package DESeq2 [14] was used to screen the differently expression genes (DEGs) between normal tissues and different pathologic type of TETs (A, AB, B, and TC), under the threshold of false discovery rate (FDR) < 0.25 and Log 2 (Fold Change (FC)) > 1.

GO and KEGG pathway annotation of DEmRNAs
Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis could be used to investigate the potential biological function of the target multigene or geneset. In our study, the R package clusterProfiler (enrichGO and enrichKEGG function) [15] was used to conduct the GO and KEGG enrichment analysis. P < 0.05 was considered to be significant.

Construction of ceRNAs network
The lncRNA-miRNA-mRNA interaction network was constructed via Cytoscape according to the relationship between lncRNA, mRNA, and miRNA. The criteria adopted to determine the ceRNA relationship was as follows: (1) mRNAs and lncRNAs were competitively bound to miRNA. The hypergeometric method used to verify the enrichment significance of miRNAs between mRNA and lncRNA via the following formula (P < .05)

Verification of DEmRNAs in an external validation cohort
To further verify the reliability of the results, an external cohort of GSE79978 from Gene Expression Omnibus (GEO) database which contained 13 samples of TETs and 3 samples of the normal thymus, was utilized to perform the DEGs analysis via the Wilcoxon Rank sum test. (P < .05).

Overall survival analysis and establishment of the TETs-specific prognostic significance
To explore the relationship of selected ceRNAs modules with the prognosis of TETs patients, the survival analysis was conducted via Kaplan-Meier survival analysis and log-rank test (cutoff point: median value). Univariate cox proportional hazards regression was used to identify the relationship between  lncRNA-miRNA-mRNA and overall survival. For each ceRNA module, the multivariate survival analysis risk score was calculated to determine the impact of ceRNA on the prognosis of TETs via the survival package in R. In addition, the survival ROC package [16] was utilized to build the ROC curves and estimate the area under the ROC curve (AUC) for the 4 ceRNA models to evaluate the accuracy for the prediction of prognosis.

Differential RNA expression analysis in TETs
Among the TETs datasets from UCSC Xena, a total of 6632 mRNAs, 2798 lncRNAs, and 316 miRNAs were obtained. We

GO and KEGG pathway annotation of DEmRNAs
A total of 463 abnormally expressed DEmRNAs were analyzed in this part. The enrich GO/KEGG function was used for GO and KEGG annotation. GO analysis could be divided into 3 functional groups: biological process (BP), cellular component (CC), and molecular function (MF) groups (Fig. 4). The most enriched GO biological process terms were "cell-cell junction," "membrane raft," "G-protein coupled peptide receptor activity," and "extracellular matrix structural constituent" while the most significant KEGG pathways were "PPAR signaling pathway," "Calcium signaling pathway," "Vascular smooth muscle contraction," "Insulin signaling pathway" (Fig. 5). Therefore, the functional enrichment results suggested that DEmRNA might be involved in the progression of TETs by regulating the above related biological processes and pathways.

Construction and analysis of ceRNA network in TETs
To explore the potential relationship of DEGs, miRanda and starBase v2.0 database were used to predict the interaction of miRNAs with lncRNAs and mRNAs. According to the pairs of miRNA-mRNA and lncRNA-miRNA, an integrated lncRNA-miRNA-mRNA-based ceRNA network was built via Cytoscape software. The ceRNA network consisted of 12 lncRNA, 25 mRNAs, and 13 miRNAs, with a total of 50 nodes and 87 edges (Fig. 6). Based on the molecular level, lncRNA LINC00665, lncRNA NR2F1-AS1, lncRNA RP11-285A1.1 ranked the first, second, and fifth, respectively. mRNA MCAM, mRNA MYO10, mRNA WASF, and mRNA DOCK11 ranked fifth, sixth, sixth, and seventh, respectively.

GEO data verifies the expression of key ceRNA
The high throughput sequencing profile GSE79978 of TETs from GEO database was utilized to validate the expression of key DEmRNAs in the ceRNA network. The mRNAs DOCK11, MCAM, MYO10, and WASF3 were proved to be consistent with previous studies. As shown in Figure 7 and Table 1, DOCK11 showed significantly lower levels in tumor datasets, while MCAM, MYO10, and WASF3 exhibited significantly higher levels in tumor datasets, compared with the normal group.

Discussion
TETs, as a kind of low incidence tumor, mainly originate from thymic epithelial cells. The pathological classification provides a powerful tool for grading patient's prognosis, [17] in which TETs are further subdivided into thymomas (A, A/B, B1, B2, and B3 subtypes) and TCs. Thymomas are less aggressive and the 5-year survival rate for patients with thymoma is about 90%. [18,19] While TCs is highly malignant and invasive and the 5-year survival rate of TCs is only 55%. [20,21] Therefore, exploring the molecular pathological mechanism of TETs would help to provide a theoretical reference for the treatment of TETS in the future.
With the development of biotechnology, high-throughput sequencing technology and microarray had been applied to the research of TETs genome, such as epigenetic analysis, gene mutation detection, transcriptome analysis, and copy number variants. [22][23][24] Recently, some researchers have revealed that some RNA molecules are widely involved in tumor initiation and progress. For example, circulating miR-21-5p and miR-148a-3p  www.md-journal.com might act as a new biomarker in plasma to evaluate the efficacy and prognosis of TETs. [25] Up-regulation of miR-145-5p could inhibit the proliferation and migration of TC1889 cells by inhibiting its target gene expressions. [26] In this study, DEGs were screened by downloading the lncRNA, miRNA, and mRNA profiles of different pathologic types in TETs from the UCSC Xena database. A total of 2798 DElncRNAs, 316 DEmiRNAs, and 6632 DEmRNAs in the TETs group were distinguished. And 463 common DEmRNAs of different pathologic types in TETs were performed the functional enrichment analysis. The GO analysis showed that the DEGmRNAs were primarily involved in "cell-cell junction," "membrane raft," "G-protein coupled peptide receptor activity," "extracellular matrix structural constituent." KEGG analysis confirmed that "PPAR signaling pathway," "Calcium signaling pathway," "Vascular smooth muscle contraction," "Insulin signaling pathway" might be involved in the development of tumors. In these results, PPAR signaling pathway had been proved that it might be a new molecular target for the treatment of TCs. [27]     An increasing number of studies have proved that ceRNAs are involved in the development of various tumors, and it could exert diverse biological functions in this process. ceRNAs regulatory network provides a new perspective to understand the relationship between miRNAs-mediated lncRNAs and mRNAs. Here, the ceRNA network of TETs was constructed, which may regulate the progression of TETs through a variety of mechanisms. In this ceRNA network, there were 50 nodes and 87 edges, including12 DElncRNAs, 13 DEmiRNAs, and 25 DEmRNAs. After verification with the external cohort, we found that DEmRNAs DOCK11, MCAM, MYO10, and WASF3 were consistent with the results from the training cohort. And finally, the 4 DEmRNA associated-ceRNA modules (DOCK11-RP11-285A1.1-hsa-mir-143, MCAM-NR2F1-AS1-hsa-mir-141, MYO 10-LINC00665-hsa-mir-140, and WASF3-LINC00665-hsa-mir-3199) were selected for the follow-up study. In the ceRNAs network, we found that LINC00665 had the highest degree which may suggest its important role in TETs. lncRNA LINC00665 could bind with mRNAs MYO10 and WASF3 through miRNAs hsa-mir-140, hsa-mir-3199. This indicated that lncRNA LINC00665 might be the key molecule in the molecular regulatory mechanism of TETs. Studies had revealed that lncRNA LINC00665 was significantly upregulated in lung cancer and was an independent predictor of survival in lung cancer patients. And lncRNA LINC00665 could enhance the proliferation and invasion ability of lung adenocarcinoma cells via absorbing mir-98 and being involved in ERK signaling pathway. [28] The lncRNA NR2F1-AS1 was the second degree in the ceRNA network, which could indirectly interact with the mRNAs FBN1, GALNT16, HAND2, and MCAM via hsa-mir-140, hsa-mir-139, and hsa-mir-141. Research had shown that NR2F1-AS1 upregulated FOXA1 through the adsorption of miR-483-3p, thus increasing the malignant degree of osteosarcoma and suggested it might be a potential therapeutic target for osteosarcoma. [29] To further explore the biological function of the ceRNAs network, we used the Kaplan-Meier curve analysis to screen the DEGs related to the prognosis of TETs patients. Our analysis found that the above 4 ceRNAs modules, composed by 3 lncRNAs, 4 miRNAs, and 4 mRNAs., were significantly associated with the overall survival rate of TETs patients (P < 0.01). These 4 mRNAs MCAM, MYO10, WASF3, and DOCK11 were fifth, sixth, sixth, and seventh in the ceRNA network. As a new molecular marker, DOCK11 was reported to be highly expressed in TCs. [30] MCAM, MYO10, and WASF3 were reported to be closely related to the regulation of the proliferation and metastasis of malignant tumors. [31][32][33] However, our study still had some limitations. We just identified the ceRNA with mRNA as the target through the method of bioinformatic analysis without further experimental verification. Meanwhile, the 4 identified ceRNAs modules still need to be closely integrated with clinical practice to verify its reliability. To compensate for these shortcomings, we plan to collect TETs samples and relevant clinical information and conduct experiments to verify our results.

Conclusion
In this study, we analyzed the abnormal expression of key RNA molecules in different pathological types of TETs. Furthermore, the common DEmRNAs were used to conducted GO and KEGG enrichment analysis and revealed their potential biological function in TETs. A ceRNAs network was constructed based on the DEmRNA, DElncRNA, and DEmiRNA where 4 crucial DEmRNAs (DOCK11, MCAM, MYO10, and WASF) were screened and validated via an external cohort. The 4 ceRNAs modules associated with DOCK11, MCAM, MYO10, and WASF were closely related to the prognosis and survival of TETs which were verified via ROC analysis. In summary, these relevant ceRNA molecules not only provided the basis for the diagnosis and prognosis of TETs but also may be new therapeutic targets. Our study also further explored the molecular mechanism of RNA in TETs development, which would pave the way for future clinical practice.