Bioinformatics analysis reveals the ceRNA co-expression network in tumor microenvironment and prognostic biomarkers in Sarcomas


 Background

Sarcomas (SARCs) are rare, heterogeneous mesenchymal neoplasia. To understand the tumor microenviroment (TME) and identify potential biomarkers for prognosis that associated with TME of SARCs might provide effective clues for immune therapy.
Methods

We evaluated the immune scores and stromal scores by using the RNA sequencing dataset of SARCs patients from The Cancer Genome Atlas (TCGA) database and the ESTIMATE algorithm. Then the differentially expressed mRNAs (DEGs), miRNAs (DEMs) and lncRNAs (DELs) were filtered after comparing the two high- and low- scores groups. Next, based on these DERNAs, we established a competing endogenous RNA (ceRNA) network and explored the prognostic roles of biomarkers involved in the network with the help of bioinformatics analysis.
Results

High immune scores were significantly associated with favorable overall survival of SARCs patients. Next, a total of 328 DEGs, 18 DEMs and 67 DELs that commonly regulated in the immune and stromal scores groups were obtained. And for the DEGs, some of the Gene Ontology (GO) terms and pathways were mainly associated with immune processes. A ceRNA network were constructed with 142 nodes and 424 edges, in which hsa-miR-9-5p, hsa-miR-490-3p, hsa-miR-133a-3p, hsa-miR-133b and hsa-miR-129-5p were the top 5 nodes. Additionally, the protein-protein Interaction (PPI) network identified MMP9, TYROBP, CSF1, CXCR4, FBN1, FLNA, PDGFRB, CYBB, FCGR3A and MYH11 as hub nodes with considerable importance that functioned in the network. Finally, the Kaplan-Meier survival analysis demonstrated that 9 mRNAs (APOL1, EFEMP1, LYZ, MEDAG, MYH11, RARRES1, TNFAIP2, TNFSF10 and ZNF385A), 2 miRNAs (hsa-miR-9-5p and hsa-miR-183-5p) and 3 lncRNAs (CTD-2228K2.7, HOTAIRM1 and NCF1C) were closely associated with the overall survival of SARCs patients.
Conclusions

Taken together, our study confirmed that the prognosis value of immune scores for SARCs patients, also we identified a list of TME-related biomarkers which might contribute to prognostic prediction and help improve the efficacy of immune therapy.


Introduction
Sarcomas (SARCs) are rare, heterogeneous mesenchymal neoplasia and typically divided into two groups (bone SARCs and soft tissue SARCs) [1]. In the US, there were about 10,700 SARC diagnoses and 3,800 deaths per year, while in Europe, SARCs accounts for only 1-2% of all cancers in adults with an overall annual incidence of 5.6 cases per 100,000 adults [2,3]. At present, clinical treatments including surgical resection, radiotherapy and chemotherapy which were adopted singly or together could be effective ways in treating SARCs. However, more than half SARCs patients suffered with a high-risk of metastasis and death for the blood transfer of SARCs in early stages [4].
Recent elucidation of the relationships between cancer cells and immune systems has contributed to the development of immunotherapy [5]. One of the most successful strategies involve immune checkpoint inhibitors, for example, PD-1 (programmed cell death-1 (PD-1) and programmed death-ligand 1 (PD-L1) expressions were signi cantly correlated with CD8 + tumor in ltrating lymphocytes [6]. And their expression were also associated with clinical stage, distant metastasis, poor differentiation of tumor, overall survival and event-free survival in SARCs patients [7]. Thus, understanding the tumor microenviroment (TME) and identifying potential biomarkers that associated with TME of SARCs is critical to improve the e cacy of immune therapy. TME is consisted of two essential components (in ltrating immune and stromal cells) that impact cancer prognosis [8]. And the emergence of Estimation of STromal and Immune cells in Malignant Tumours using Expression data (ESTIMATE) algorithm makes it possible to predict the in ltration of stromal and immune cells in tumors by generating stromal and immune scores based on Gene Set Enrichment Analysis of each tumor samples [9]. Recently, the ESTIMATE has been applied in several cancers, such as head and neck squamous cell carcinoma [10], glioblastoma [11] and acute myeloid leukemia [12]. A previous study used ESTIMATE algorithm to reveal the immune-related gene signature in bone SARCs [13], however, whether this algorithm could be used to investigate prognostic biomarkers in SARCs remains to be elucidated.
In this study, by using 262 SARCs tumor samples from The Cancer Genome Atlas (TCGA) database and the ESTIMATE algorithm, we aimed to establish a competing endogenous RNA (ceRNA) network and identify a set of TME-related biomarkers that are associated with survival in SARCs patients.

Data processing and immune and stromal scores determination
The RNA expression data and the basic clinical information downloaded from the TCGA database consists of 261 SARCs tumor samples (https://gdc.nci.nih.gov/). The immune and stromal scores were evaluated by applying the ESTIMATE algorithm using the estimate R package (R version 3.5.3) [9]. The clinical data of SARCs patients was shown in (Table S1). The scores were used to re ect the level of immune cell and stromal cells in ltration of tumor tissue.
DEGs, DEMs and DELs obtained based on immune and stromal scores According to the median of immune and stromal scores, these SARCs samples were categorized into high-and low-score groups. P < 0.05, |Fold change (FC)| > 1.5 and FDR < 0.05 were set as cutoff criteria to lter for DEGs, DEMs and DELs of these two comparisons. Then, intersect DERNAs of immune and stromal scores groups showed with the Veen diagrams were selected for further analysis.

GO and KEGG analysis of DEGs
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of the intersection DEGs obtained above were performed with DAVID version 6.8 [14,15]. GO terms including biological processes (BP), molecular functions (MF), and cellular components (CC) were evaluated. P < 0.05 represents signi cant difference.

Construction of the ceRNA and protein-protein Interaction (PPI) Network
Target genes of DEMs were predicted by utilizing Miranda (http://www.microrna.org/) and TargetScan (http://www.targetscan.org/), target lncRNAs of DEMs were predicted by utilizing MiRanda and PITA (https://genie.weizmann.ac.il/pubs/mir07/mir07_data.html). Then the common negatively regulated pairs of the MiRanda and TargetScan results, and MiRanda and PITA results, which were also differentially expressed were chosen for construction of the ceRNA network. In addition, for the DEGs in the ceRNA network, PPI network were performed to predict the interactions between them using STRING ( https://string-db.org/). The networks were visualized by Cytoscape software (version 3.6.1).

Survival analysis
Based on the overall survival time of SARCs patients, Kaplan-Meier analysis was conducted to evaluate the prognostic effect of immune scores, stromal scores and all biomarkers regulated in the ceRNA network. A P value less than 0.05 was considered as statistical signi cance.

Results
Immune scores are signi cantly associated with SARCs patients RNA sequencing datasets of 261 patients with SARCs from the TCGA database were analyzed. With the ESTIMATE algorithm, we found that the immune scores for the patients ranged from − 1750.11 to 3630.51, and the stromal scores ranged from − 1384.46 to 2518.94 (Table S2).
Subsequently, in order to determine the relationship between immune and stromal scores and survival in SARCs samples, Kaplan-Meier survival analyses was performed. The results showed that SARCs patients with high immune scores had signi cantly longer overall survival time when compared with those with low immune scores (P = 0.0283; Fig. 1A). Meanwhile, SARCs patients with high stromal scores also had favorable outcomes than those with low stromal scores, though there was no statistical signi cance (P = 0.293; Fig. 1B).
GO and KEGG analysis of DEGs 328 commonly regulated DEGs above were used to explore their potential function. Top GO terms upregulated included neutrophil degranulation and innate immune response in BP; protein binding and serine-type endopeptidase activity in MF; extracellular exosome and extracellular region in CC. While the GO terms downregulated included muscle contraction and platelet aggregation in BP; protein binding and actin lament binding in MF; cytosol and cytoplasm in CC (Fig. 3A, 3B).
Additionally, for the KEGG pathways, the upregulated DEGs were signi cantly enriched in staphylococcus aureus infection, phagosome and tuberculosis, the downregulated DEGs were mainly involved in vascular smooth muscle contraction, focal adhesion and adrenergic signaling in cardiomyocytes (Fig. 3A, 3B).
Furthermore, some of the GO terms and pathways were mainly associated with immune processes.

Construction of ceRNA and PPI Network
We rstly used the 18 common DEMs to predict mRNAs and lncRNAs using the algorithm mentioned above. 347 mRNAs and 260 lncRNAs that existed in both groups were predicted (Table S3, S4). After we compared these results of negatively regulated pairs predicted with the DEGs and DELs, 89 DEGs, 14 DEMs and 38 DELs were nally obtained to construct the ceRNA network. The network consisted of were constructed with 142 nodes and 424 edges, and the degrees of the top 5 nodes (hsa-miR-9-5p, hsa-miR-490-3p, hsa-miR-133a-3p, hsa-miR-133b and hsa-miR-129-5p) were 32, 27, 23, 22 and 17, respectively (Fig. 4A).
Then the PPI network which consisted of 67 nodes and 180 edges were conducted to show the interactions between DEGs in the ceRNA network. The top 10 genes (MMP9, TYROBP, CSF1, CXCR4, FBN1, FLNA, PDGFRB, CYBB, FCGR3A and MYH11) were considered as hub genes based on the degree of importance (Fig. 4B).

Survival analysis
We analyzed that association between 89 DEGs, 14 DEMs and 38 DELs in the ceRNA network and overall survival time of SARC patients. 9 mRNAs (APOL1, EFEMP1, LYZ, MEDAG, MYH11, RARRES1, TNFAIP2, TNFSF10 and ZNF385A) among 89 DEGs closely related to the overall survival of SARCs patients. And the high expression of all these 9 mRNAs were related to high survival rates in SARCs patients. we also observed that the low expression of 2 miRNAs (hsa-miR-9-5p and hsa-miR-183-5p) among the 14 DEMs were related to favorable survival outcomes. In addition, 3 lncRNAs (CTD-2228K2.7, HOTAIRM1 and NCF1C) were closely associated with the overall survival of SARCs patients. For CTD-2228K2.7 and HOTAIRM1, their low expressions were related to a high overall survival rate in SARCs patients. While for NCF1C, its high expression was correlated with good survival time (Fig. 5).

Discussion
The current study was the rst to investigate the ceRNA network that associated with TME of SARCs based on TCGA database. We rst used the ESTIMATE algorithm and found that SARCs patients with high immune scores had longer overall survival time. This was similar to a previous study which reported the high immune scores has favorable outcomes in bone SARCs patients [13]. While for the correlation between stromal scores and survival in SARCs samples, we failed to nd any signi cant difference.
Expect for TNFSF10 (TRAIL), which has been reported in several subtypes of SARCs, the rest prognostic biomarkers are novel for SARCs. APOL1 is a novel BH3-only protein, and its overexpression could induce autophagy and autophagy-associated cell death in several kinds of cancer cells [16,17]. It was overexpressed in pancreatic cancer, lung adenocarcinoma and papillary thyroid carcinomas compared to matched normal tissues, and their prognostic roles were found in pancreatic cancer and lung adenocarcinoma [18][19][20]. EFEMP1 can be found in different human tissues, is a member of the bulin family of extracellular glycoproteins [21]. Its high expression helps enhance the tumor growth in pancreatic carcinoma cells via binding the EGF receptor and activate the MAPK and Akt pathways [22]. Also, EFEMP1 could promote the migration and invasion of bone SARCs via MMP-2 with induction by AEG-1 via NF-κB signaling pathway, and EFEMP1 was also reported to be a poor prognostic indicator of bone SARCs [23]. LYZ encodes human lysozyme and acts as a macrophage marker, its expression levels positively correlate with the numbers of CD68 + pSTAT1 + macrophages [24]. In addition, it could interact with CD34 + cells and neutrophils that may predict an increased risk of thrombosis in essential thrombocythemia patients [25]. MEDAG was involved in the processes of lipid accumulation, adipocyte differentiation, and glucose uptake in mature adipocytes [26]. Overexpression of MEDAG was related to lymph node metastasis and poor prognosis in papillary thyroid microcarcinoma, which was also the rst to report its roles in cancer [27]. MYH11, RARRES1 and TNFAIP2 have been commonly investigated in many cancers. MYH11 is a protein that participates in muscle contraction through the hydrolysis of adenosine triphosphate [28]. Its expression have been found to be associated with many kinds of cancer, such as lung cancer, bladder cancer [29], head and neck cancer [30] and colorectal cancer [31]. RARRES1 is also gene that dysregulated in several cancers. It could induce autophagy in prostate cancer and cervical cancer cells [32,33]. A recent study con rmed that RARRES1 contributed to the regulation of dendritic cells, and acted as a novel immune-related biomarker for glioblastoma [34]. TNFAIP2 is abundant in immune cells like myelomonocytic cells, endothelial cells, peripheral blood monocytes, dendritic cells, intestinal M cell, and macrophages [35][36][37]. And it played essential roles in in ammation, cell proliferation, angiogenesis, migration, membrane nanotube formation [38]. ZNF385B belongs to the family of zinc-nger genes and encodes transcription factors. In serous ovarian cancer, the increased mRNA levels of ZNF385B contributed to shorter survival time of ovarian cancer patients [39].
For the 2 miRNAs (hsa-miR-9-5p and hsa-miR-183-5p) and 3 lncRNAs (CTD-2228K2.7, HOTAIRM1 and NCF1C), the expression of hsa-miR-9-5p could be downregulated by lncRNA TUG1, and TUG1 overturned the effect of miR-9-5p on the proliferation, colony formation, cell cycle arrest, and apoptosis in bone SARCs cells [40]. Hsa-miR-183-5p was the rst found in SARCs, highly expressed hsa-miR-183-5p might be associated with a poor prognosis for Clear cell renal cell carcinoma patients [41]. In addition, it was also considered as the optimal diagnostic biomarkers for hepatocellular carcinoma [42]. The lncRNA HOTAIRM1(HOXA transcript antisense RNA myeloid-speci c 1) is a natural antisense transcript of HOXA1 gene and expresses in the myeloid lineage and induced during neuronal differentiation [43,44]. The authors reported that HOTAIRM1/HOXA1 could downregulate the expression of suppressive molecules released by MDSCs and increase the antitumor immune response [45]. It has been reported to be involved in many malignant diseases [45][46][47][48], while it was the rst to be found in SARCs. However, there is no study on the roles of CTD-2228K2.7 and NCF1C in TME of SARCs patients, which means that more investigations are needed.

Declarations
Authors' contributions DZ and YW: conception and design of the research. MW and BZ: acquisition of data and analysis and interpretation of data. FH, YL and BZ: statistical analysis. DZ and YW: drafting the manuscript. FH, YL, BZ DZ and YW: revision of manuscript. All authors have read and approved the nal manuscript. Figure 1 The association between immune scores (A) and stromal scores (B) and overall survival with SARCs patients.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.