Identification of crucial genes in metastatic osteosarcoma and prognostic biomarker in osteosarcoma patients


 Background Osteosarcoma is still a challenging cancer that poses a huge threat to human health worldwide, especially in young people. The aim of this study was to identify prognostic genes associated with neoplasm metastasis in osteosarcoma, and construct a prognostic nomogram to help the orthopedist to assess prognosis of osteosarcoma patients.Methods Gene expression and clinical information were extracted from the TARGET database. Differentially expressed genes (DEGs) associated with osteosarcoma metastasis were identified and subjected to GO functional and KEGG pathway enrichment analyses. Survival and Cox analyses were used to identify prognostic genes.Results First, 214 DEGs were identified as crucial genes associated with osteosarcoma metastasis. Then functional enrichment analyses showed that DEGs were enriched in cell-cell signaling, inflammatory response, and immune response. Survival and Cox analyses identified five prognostic genes (MUC17, MYC, TAC4, HERC5, OR8G5) that were related to the prognosis of patients. This five-genes signature has a great diagnostic value for prognosis because it was not affected by age at diagnosis, gender, metastasis or relapse.Conclusions The present study identifies five target genes related to the metastasis of osteosarcoma. This five-genes signature can be used to predict the prognosis of osteosarcoma patients and may become new potential therapeutic targets for the treatment of osteosarcoma.


Introduction
Osteosarcoma is the most common primary malignant bone tumor, generally affecting children and young adults, with a global incidence of one to three cases per million people annually. Although multiagent chemotherapy regimens have effectively improved the long-term survival rates, osteosarcoma remains a challenging tumor because the pathogenesis is not completely clear [1]. The 5-year survival rate of the young cohort (0-24 years) is 61.6%, and decreases with age [2]. About 15-20% of patients present with metastasis upon diagnosis, frequently in the lung (85-90%), which leads to a poor prognosis [3]. This highlights the importance of identifying molecular targets of pulmonary metastasis for improving the prognosis of osteosarcoma patients.
Extensive efforts have been made to explore, but the pathogenesis of pulmonary metastasis in osteosarcoma is not completely clear yet. MicroRNA-645 is up-regulated in osteosarcoma tissue and promotes metastasis by inhibiting the tumor suppressor NME2 [4]. Secreted protein acidic and rich in cysteine -like 1 (SPARCL1) inhibits osteosarcoma metastasis by activation of canonical WNT/β-catenin signaling. SPARCL1 stabilizes the interaction between canonical WNT ligands and their receptors through physical interaction with FZDs and LRP5/6 [5]. Overexpression of microRNA-511 may result in low expression of MAPK1, and inhibiting metastasis of osteosarcoma [6]. KEAP1 promotes lung metastasis in osteosarcoma patients [7]. Overexpression of integrin-β1 promoted osteosarcoma metastasis by inhibiting cell apoptosis [8]. However, the complex pathogenesis of metastatic osteosarcoma is still poorly understood.
Currently, microarray analysis has been applied to screen for potential molecular targets in many cancers,

Expression analysis of DEGs associated with metastatic osteosarcoma
Original gene expression data were converted to gene symbols according to gene annotation information of the human (http://www.ensembl.org). Differentially expressed genes were unveiled by package edgeR of R platform [9]. The criterion for comparison was whether metastasis had occurred. Genes meeting the cut-off criteria of |log2 fold-change| > 1, and adjusted p-values < 0.05 were selected as DEGs. A volcano plot of the DEGs was drawn by the ggplot package in R.

GO and KEGG pathway enrichment analysis
The Database for Annotation Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov) is a website that provides a comprehensive set of functional annotation tools for researchers to investigate the biological meaning of genes [10]. Identi ed DEGs were investigated further using DAVID (version 6.8), in the Gene Ontology (GO) functional annotation analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis [11][12]. A p-value < 0.01 and gene count > 5 was considered statistically signi cant in the GO analysis, while the KEGG pathway enrichment analysis used a criterion of P < 0.05 and gene count > 5.

Protein-protein interaction (PPI) network construction
The Search Tool for the Retrieval of Interacting Genes database (STRING; version 11.0; https://stringdb.org) was used to analyze the interactive relationships among the DEGs [13]. Interactions with a combined score > 0.4 were de ned as statistically signi cant.

Survival analysis of DEGs
The osteosarcoma cases were used to estimate the correlation between prognostic genes with overall survival time and vital status by Kaplan-Meier (K-M) survival curves. A P < 0.05 was considered statistically signi cant for prognostic capacity.

Univariate and multivariate Cox proportional-hazards regression analysis
Univariate Cox analysis was used to examine DEGs affecting the overall survival time of the osteosarcoma patients. Genes with a P < 0.05 were considered as candidate genes. The genes, which that had a P < 0.05 in K-M survival analysis and univariate Cox analysis simultaneously, were used to multivariate Cox proportional hazard regression analysis. Some clinical parameters including age at diagnosis, gender, metastasis, and relapse were also included in multiple regression analyses. Then, an optimal formula that can calculate the risk score of each sample was constructed. Survival analysis and receiver operating characteristic curve (ROC curve) analysis was used to evaluate the value of the riskscore formula.

Identi cation of DEGs associated with metastatic osteosarcoma
Comparing the metastatic and non-metastatic groups, 214 DEGs were identi ed from the datasets, including 114 upregulated genes and 100 downregulated genes. A volcano plot for the identi ed DEGs was constructed (Fig. 1).
Functional enrichment analyses of DEGs and PPI network GO function and KEGG pathway enrichment analyses for the DEGs were performed using DAVID (Table 1). The enriched GO terms were divided into cell component (CC), biological processes (BP), and molecular function (MF) ontologies. The results of the GO analysis indicated that the DEGs were mainly enriched in BP, including transcription from RNA polymerase II promoter, cell-cell signaling, in ammatory response, and immune response. MF analysis showed that the DEGs were signi cantly enriched in cytokine activity, transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-speci c binding, and hormone activity. For the CC, the DEGs were enriched in extracellular space, extracellular region, integral component of plasma membrane, and proteinaceous extracellular matrix. KEGG pathway analysis showed that DEGs were mainly enriched in neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, and Wnt signaling pathway. Protein interactions among the DEGs were predicted with STRING tools. A total of 201 nodes and 185 edges were involved in the PPI network (Fig. 2).

Correlation of DEGs expression with survival
The Kaplan-Meier survival curves were used to investigate the prognostic values of the hub genes. Twenty two prognostic genes had a P < 0.05 and were considered prognostic ( Table 2).  Table 3) were prognostic genes for metastatic osteosarcoma. To t the optimal risk-score formula, ve genes were used to construct the risk-score model. The risk score was calculated by the risk-score formula: risk score = 0.296 × Exp MUC17 + 0.738 × Exp MYC + 0.539 × Exp TAC4 − 0.372 × Exp HERC5 − 0.550 × Exp OR8G5 . All samples were divided into the low-risk group and the high-risk group by the risk score. In order to evaluate the value of ve prognostic genes, K-M survival curves, and the 5-year ROC curve was made and showed a strong predictive power (Fig. 3). About the ve prognostic genes, low expression of two genes (HERC5, OR8G5) and high expression of three genes (MUC17, MYC, TAC4) were associated with worse prognosis (Fig. 4), indicating that MUC17, MYC and TAC4 expression may be a risk factor for osteosarcoma metastasis. In addition, high expression of HERC5 and OR8G5 forecast improved survival outcomes in osteosarcoma patients. Validation of risk score formula In order to evaluate the effect of different clinical traits on the risk formula, all samples were divided into two groups according to age at diagnosis (group 1: age ≤ 18, group 2: age > 18; Fig. 5A). Gender (group 1: Male; group 2: Female; Fig. 5B), distant metastasis (group 1: non-metastasis; group 2: metastasis; Fig. 5C), and relapse (group 1: non-relapse; group 2: relapse; Fig. 5D) also were analyzed by the same way. The results of ROC analysis were used to estimate the effect of the risk score formula. The results showed that risk formula was accurate in each group, and not affected by ages, gender, metastasis or relapse.

Discussion
Osteosarcoma is challenging cancer with high chemotherapeutic resistance and metastatic incidence.
Because the prognosis of osteosarcoma is still poor, identifying prognostic genes involved in the pathogenesis of metastatic osteosarcoma and constructing a predictive model of survival are of crucial importance for management policy. In this study, we used the TARGET datasets to identify DEGs between metastatic osteosarcoma and non-metastatic osteosarcoma. In total, 214 genes were consistently expressed differentially in metastatic osteosarcoma (114 up-regulated and 100 down-regulated). Twentysix signi cantly enriched functions were obtained in GO function and KEGG pathway enrichment analyses, such as cell-cell signaling, in ammatory response, cytokine activity, immune response, and cytokine-cytokine receptor interaction. A PPI network was constructed to investigate the interrelationship of the DEGs. Survival analysis and univariate Cox analyses were used to nd out candidate genes for multivariate analyses. Fifteen genes were screened from the survival analysis and univariate Cox analyses. Multivariate analyses demonstrated that ve genes were independent prognostic indicators for osteosarcoma, speci cally three were up-regulated (MUC17, MYC, TAC4) and two were down-regulated (HERC5, OR8G5). After veri cation, we found that this ve-genes signature was not affected by clinical traits such as age at diagnosis, gender, metastasis, and relapse. Not only did the ve genes have a high potential of distinguishing the patient with a high risk of metastasis from osteosarcoma patients, but it also could be regarded as a signature to predict the prognosis of osteosarcoma patients.
The MYC proto-oncogene, a bHLH transcription factor (MYC), also known as MRTL, MYCC, c-Myc, and bHLHe39, is a proto-oncogene and encodes a nuclear phosphoprotein that plays a role in cell cycle progression, apoptosis, and cellular transformation. Ampli cation of this gene is frequently observed in numerous human cancers and also has been reported in osteosarcoma. MYC gene up-regulation in osteosarcoma has been associated with poor prognosis of osteosarcoma patients [14][15]. HOTTIP upregulates MYC, and the positive feedback loop formed by HOTTIP and MYC promotes osteosarcoma cell migration and invasion [16]. Overexpression of miR-34a up-regulates MYC expression and accelerates osteosarcoma cell apoptosis induced by cisplatin. MYC was necessary for osteosarcoma apoptosis induced by the combination of miR-34a and cisplatin [17]. Recombinant adenovirus encoding antisense c-myc (Ad-Asc-myc) increases the sensitivity of osteosarcoma cells to cisplatin in vitro and induced apoptosis [18].
HECT and RLD domain containing E3 ubiquitin protein ligase 5 (HERC5), also known as CEB1 and CEBP1, is a member of the HERC family of ubiquitin ligases and encodes a protein with a HECT domain and ve RCC1 repeats. Zhu et al reported that HERC5 was negative correlated with SOX18 expressions in osteosarcoma cell, and SOX18 was overexpressed in OS and promoted migration and invasion of ostesarcoma [19]. But them did not elaborate further on the role of HERC5 in osteosarcoma.
None of the other three prognostic genes (MUC17, TAC4, OR8G5) have been reported in osteosarcoma, but MUC17 and TAC4 have been shown to play an important role in many cancers. MUC17 was more highly expressed in gastric cancer, and inhibited the progression of gastric cancer through a MYH9-p53-RhoA regulatory feedback loop to limited im ammatory responses [20]. TAC4 was a member of the tachykinin family of neurotransmitter-encoding genes. Tachykinin proteins are cleaved into small, secreted peptides that activate members of a family of receptor proteins. Hemokinin-1 (HK-1), the newest tachykinin encoded by the TAC4 gene, promoted migration of melanoma cells [21]. Studies on OR8G5 are still infrequent at present, and there are no relevant studies on tumor eld.
Our study indicates that the ve prognostic genes play a potentially important role in the pathogenesis of metastatic osteosarcoma and may serve as target biomarkers for the prognosis of osteosarcoma. But four of them have not been researched in osteosarcoma. Further experiments are needed to validate and re ne their function and signaling pathways. Besides, there are still several limitations that cannot be ignored in our study. First, the number of eligible cases was not particularly su cient and some meaningful clinical data were incomplete, so we had to remove some cases, which may cause relevant bias. Second, lack of independent large-scale biological tissue samples or cell lines to verify the results obtained in this study and completely demonstrate the underlying mechanism of ve genes in osteosarcoma. Our ndings will be more reliable if another independent dataset could be used for validation.
Osteosarcoma is a complex regulatory network, multigene as biomarkers achieved higher speci city and sensitivity compared with single-gene. The present study identi es ve target genes related to the metastasis of osteosarcoma. This ve-genes signature can be used to predict the prognosis of osteosarcoma patients and may become new potential therapeutic targets for treatment of osteosarcoma.  Volcano plot for the 214 DEGs. Nodes represent the genes; red nodes represent 114 upregulated genes, green nodes represent 100 downregulated genes, black nodes represent non-differentially-expressed genes.

Figure 2
Protein-protein interaction network constructed with the differentially-expressed genes. Nodes represent the genes; edges represent the co-expression relationship of the DEGs).