A panel of eight mRNA signatures improves prognosis prediction of osteosarcoma patients

Abstract Genetic alterations are vital to the progression of osteosarcoma carcinoma. The present study investigated a panel of gene signatures that could evaluate prognosis in osteosarcoma based on data from the Therapeutically Applicable Research To Generate Effective Treatments initiative. Osteosarcoma messenger RNA (mRNA) profiles and clinical data were downloaded from the therapeutically applicable research to generate effective treatments database. Patients with osteosarcoma were divided into two groups based on findings at diagnosis: with and without metastasis. Differentially expressed mRNAs were compared and analyzed between groups. Univariate and multivariate Cox regression analyses identified a set of eight mRNAs with the ability to classify patients into high-risk and low-risk groups with significantly different overall survival times. Further analysis indicated that the eight-mRNA signature was an independent prognostic factor after adjusting for other clinical factors. Receiver operating characteristic curve analysis demonstrated a good performance of the eight-mRNA signature. Further, the biological processes and signaling pathways of the eight-mRNA signature were reviewed using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes resources. Finally, the results of the TCGA analysis were verified by other cohorts from Gene Expression Omnibus database. The identification of an eight-mRNA signature not only provides a prognostic biomarker of osteosarcoma but also offers the potential of novel therapeutic targets for its treatment.


Introduction
Osteosarcoma is the most common type of primary malignant bone cancer, constituting 15% of all diagnosed bone tumor cases. [1][2][3][4] It is a highly heterogeneous malignancy, with a large percentage of patients having a distant metastasis when first diagnosed. [5,6] The current standard treatment [7,8] is a combination of local control surgery with post/preoperative systemic multiagent chemotherapy including cisplatin, epirubicin, etopside, methotrexate, and cyclophosphamide. [9,10] Although the standard therapy has been proven to be an effective treatment for osteosarcoma, patients still have a poor prognosis with a greater than 40% 5-year mortality rate. [11][12][13] Therefore, it is important to increase our understanding of the underlying molecular mechanisms of osteosarcoma, and to establish a molecular biomarker model for diagnosis and prognosis to improve patient outcome.
With the rise of genetic sequencing, we now have a better understanding of messenger RNAs (mRNAs). These are a large family of RNA molecules that convey genetic information from DNA to the ribosome, where they specify the amino acid sequence of the protein products of gene expression. A large number of differentially expressed mRNAs have been observed in tumors compared with healthy adjacent tissues. mRNAs are closely associated with cell differentiation, proliferation, and apoptosis, and accumulating evidence indicates that dysregulated mRNAs play an oncogenic or tumor suppressor role in cancer development. [14][15][16][17] Given the fundamental roles and intrinsic characteristics of mRNAs, many have been regarded as diagnostic and prognostic biomarkers in cancers, including BRAF, [18] KRAS, [19] and Her-2. [20] Although several studies have previously classified mRNA biomarkers for prognosis prediction in osteosarcoma, [21][22][23] a systematic investigation of the prognostic significance of mRNA signatures in osteosarcoma patients has not been undertaken.
The present study downloaded and analyzed mRNA expression profiles of a large number of osteosarcoma patients from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) project to systematically investigate the prognostic significance of mRNAs. We identified eight biologically relevant mRNAs that were significantly connected with the prognosis of osteosarcoma patients using survival analysis and Cox regression analysis. These were used to establish an eight-mRNA signature that was confirmed as an independent prognostic factor for use in predicting overall survival in patients with osteosarcoma.

Data source
RNA sequencing (RNA-seq) data and clinical information about osteosarcoma patients in the TARGET project were downloaded from the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/) in October 2018. First, RNAseq data files were merged into a matrix file using the merge script of the Perl language (http://www.perl.org/). Then, the gene name was converted from the Ensembl id to the matrix of the gene symbol through the Ensembl database (http://asia. ensembl.org/index.html). We selected mRNA data derived from osteosarcoma tumor tissue samples. A total of 101 osteosarcoma TARGET mRNA data samples were downloaded from the database. According to the findings at diagnosis, patients were divided into two groups: Group NM (osteosarcoma patients without metastasis) and Group M (osteosarcoma patients with metastasis). The R package edgeR was used to identify genes that were differentially expressed (DEGs) between Group NM and Group M, with false discovery rate <0.05 and jlogFCj>2 set as the threshold. Gene expression levels based on microarray data were calculated using R package limma with RMA correction.

Gene information acquisition and clinicopathological features
Gene information was obtained from downloaded data. We excluded patients with incomplete clinicopathological parameters and those with missing prognostic follow-up data. A total of 95 patients were analyzed in our study.

mRNA-related prognostic model
To discover potential mRNA factors affecting the prognosis of osteosarcoma patients, univariate Cox proportional hazard regression analysis was applied using the R survival package to evaluate the association between mRNA expression and overall survival. Only those mRNAs with P < .05 were considered candidate mRNA prognostic markers of osteosarcoma. Multivariate Cox proportional hazards regression was performed to identify optimal prognostic mRNAs that impact on the survival of osteosarcoma patients. An individual's risk score model was established to evaluate survival risk as follows: Here, mRNAi is the identifier of the ith selected mRNA. The risk score model was a measure of prognostic risk for each osteosarcoma patient.

Risk stratification and ROC curve
The risk score was calculated according to the predictive mRNA signature model. Using the median risk score as the cutoff, enrolled patients were classified into high-risk and low-risk groups. A highrisk score indicates poor survival for osteosarcoma patients. Kaplan-Meier analysis was used to generate overall survival curves, and log-rank tests were performed to assess overall survival differences between high-risk and low-risk patient groups. The performance of the mRNA prognostic model was evaluated by calculating the area under the curve (AUC) of the receiver operating characteristic (ROC) curve in the R package of "survival ROC". Univariate and multivariate analyses with Cox proportional hazards regression for overall survival were performed on individual clinical risk factors with and without the eight-mRNA signature. Hazard ratios and 95% confidence intervals were estimated.

Functional enrichment analysis
The mRNA data samples from osteosarcoma tumor tissues in the TARGET project were downloaded from the Genomic Data Commons Data Portal. According to univariate and multivariate analyses, eight prognostic mRNAs were associated with the prognosis of osteosarcoma patients. Pearson correlation coefficients between the expression profiles of the eight prognostic mRNAs and co-expressed protein-coding genes (PCGs) were computed to identify co-expression relationships of mRNAs and PCGs. PCGs with jPearson correlation coefficientj >0.40 and P < .01 were considered to be mRNA-related PCGs. Functional enrichment analysis of PCGs with eight prognostic mRNAs was conducted for gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways using Database for Annotation, Visualization and Integrated Discovery Bioinformatics Resources (version 6.8). Significant functional categories were identified when adjusted P values by Benjamini were <.1.

GEO database verification
The datasets of osteosarcoma patients were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi. nlm.nih.gov/geo/). GSE39058 was selected in the present study and the expression profiles were normalized by log2-conversion, the clinical characteristics of which are also attached.

Statistical analysis
Statistical evaluation was performed with SPSS 21.0 software package (IBM, Armonk, NY). Measurement data are presented as means ± standard deviations (X± S). Continuous variables with normal distributions were tested by independent sample t-tests between intergroup comparisons; otherwise, the one way Mann-Whitney test was used. Enumeration data are presented as frequency and rate. Chi-square tests were used to analyze categorical variables. Univariate and multivariate Cox regression analyses were used to identify factors that were related independently to the outcome of osteosarcoma patients. Prognostic performance was estimated by ROC analysis. P < .05 was considered statistically significant.

Clinical data of samples
Ninety-five patients were analyzed in our study. Their clinical data are shown in Table 1.

Differentially expressed mRNAs between Group NM and Group NM osteosarcoma patients
mRNA expression in Group NM and Group M was investigated using gene expression information from TARGET database profiles. In total, 701 differentially expressed mRNAs were identified between Group NM and Group M patients, of which 404 were upregulated and 297 downregulated ( Fig. 1).

Identification of prognostic mRNAs
To identify prognostic mRNAs, expression data of each differentially expressed mRNA were subjected to univariate Cox proportional hazards regression analysis. This identified 14 mRNAs that were significantly associated with overall survival (adjusted P < .01). Multivariate Cox proportional hazards regression was used to select the optimal eight mRNAs as independent remarkable prognostic factors (Table 2). Among these eight prognostic mRNAs, four were risk factors because they had positive coefficients in Cox regression analysis and their high expression was associated with a shorter overall survival time. The other four mRNAs could be regarded as protective factors because their high expression was closely associated with longer patient survival.

The eight-mRNA prognostic model
The eight prognostic mRNAs were combined to establish the eight-mRNA signature, which is associated with the overall survival of osteosarcoma patients. This signature was developed as a linear combination of the expression levels of the eight mRNAs weighted by their relative regression coefficients in multivariate Cox regression as follows: Risk Score = (0.7661 Â expression value of UGT3A2) + (-1.0487 Â expression value of SAXO2) + (-0.8840 Â expression value of PLEKHG1) + (0.6846 Â expression value of DSCR8) + (0.6977 Â expression value of MYL1) + (-0.8993 Â expression value of TMEM125)+ (0.3215 Â expression value of GAGE1)+ (-1.1019 Â expression value of SCN1A). According to the risk score model, we calculated an mRNA expression-based risk score for each patient and classified them into high-risk (n =47) or low-risk groups (n = 48) using the median risk score as the cutoff point. The overall survival of the two groups differed significantly (P = 3.896eÀ11, log-rank test) (Fig. 2). The median survival time for patients in the high-risk and low-risk groups was 1.8 and 4.9 years, respectively, while overall survival at 5 years was 18.1% and 88.3%, respectively. The univariate Cox regression model of overall survival indicated that the high-risk group was more likely to have higher mortality than the low-risk group (odds ratio 12.66 [95% confidence interval 4.891-32.750, P < .0001]) ( Table 3).

ROC curve analysis and risk stratification of the eight-mRNA signature
Time-dependent ROC curves were used to measure the predictive performance of the eight-mRNA prognostic risk model. The optimal risk score cutoff obtained from ROC curve analysis was 0.730 (sensitivity 86.8% and specificity 75.4%). The AUC of the ROC curve for the eight-mRNA prognostic model was 0.882 at 5 years overall survival (Fig. 3A). The survival status of the osteosarcoma patients is shown on the dot plot (Fig. 3B), and the heatmap shows expression patterns of the eight prognostic mRNAs between high-risk and low-risk groups (Fig. 3C). In the high-risk group, the four risk mRNAs were up-regulated and the four protective mRNAs were down-regulated compared with the low-risk group. The high expression of protective mRNAs in the low-risk group was associated with longer survival time.

The eight-mRNA signature prognostic value is independent of other clinical variables
To evaluate whether the prognostic value of the eight-mRNA signature was independent of other clinical variables, we performed multivariate Cox regression analysis using age, sex, ethnicity, relapse or metastasis, primary tumor site, tumor necrosis rate, and prognostic risk score model as co-variables. We observed that the eight-mRNA signature maintained an independent correlation with overall survival when adjusted for age, sex, ethnicity, relapse or metastasis, primary tumor site, and tumor necrosis rate (Table 3). However, relapse or metastasis and the tumor necrosis rate were significant in the multivariate analysis. Therefore, stratification analysis was performed to examine the independence of the eight-mRNA signature according to relapse or metastasis and tumor necrosis rate.
All 95 osteosarcoma patients were stratified into two groups either with relapse and metastasis (n = 46) or without relapse or metastasis (n = 49). Then, according to the eight-mRNA signature, patients without relapse or metastasis were classified into high-risk or low-risk groups. We detected a significant difference in overall survival between high-risk and low-risk groups (log-rank test P = 5.673eÀ05) (Fig. 4A). Similarly, patients with relapse and metastasis were divided into the high-risk group with shorter overall survival or the low-risk group with longer overall survival (log-rank test P = 7.941eÀ03) (Fig. 4B). Osteosarcoma patients were also stratified into those with a tumor necrosis rate 90% (n = 55) and those with a tumor necrosis rate >90% (n = 40). We found that the eight-mRNA signature separated patients in each of these groups into high-risk or low-risk groups, with associated significant differences in overall survival (log-rank test P = 1.647eÀ07 for the tumor necrosis rate 90% patient group and log-rank test pP = 4.919eÀ05 for the tumor necrosis rate >90% patient group) (Fig. 5A, B).

Functional implication of the eight-mRNA signature
To explore the functional implication of the eight-mRNA signature, we performed GO and KEGG functional enrichment analysis to reveal particular functional categories of co-expressed PCGs. We measured the co-expression relationships of mRNAs and PCGs by calculating Pearson's correlation coefficient between the expression profiles of the eight prognostic mRNAs and PCGs. PCGs with jPearson correlation coefficientj >0.40 and P < .01 were considered to be mRNA-related PCGs. Functional enrichment analysis demonstrated that co-expressed PCGs clustered most significantly in eight GO Biological Process (BP) terms (Fig. 6A), 12 GO Cell Component (CC) terms (Fig. 6B), 10 GO Molecular Function (MF) terms (Fig. 6C), and five KEGG pathways (Fig. 6D). For GO BP terms, co-expressed PCGs were mainly enriched in chloride transmembrane transport, the oxidation-reduction process, and pigmentation. For GO CC terms, co-expressed PCGs were mainly enriched in the plasma membrane, the perikaryon, and dendrites. For GO MF terms, coexpressed PCGs were mainly enriched in oxygen binding, iron ion binding, and heme binding. Five KEGG pathways were enriched, which mainly focused on neuroactive ligand-receptor interactions, the peroxisome proliferator-activated receptor (PPAR) signaling pathway, GABAergic synapses, and metabolic pathways.

GEO verification
In order to further confirm the previous findings from TCGA analysis, we selected GSE39058 to verify the accuracy of the above results. According to the risk score model, we calculated an mRNA expression-based risk score for each patient and classified them into high-risk (n = 20) or low-risk groups (n = 21) using the median risk score as the cutoff point. The expression of eight selected mRNA were significantly different between high-and low-risk groups (P < .05) (Fig. 7). Moreover, the overall survival of the two groups was significantly different; patients in the highrisk group had a shorter overall survival compared with patients in low-risk group ( Fig. 8; P = 3.068eÀ05). The outcomes of GEO datasets were consistent with results of the abovementioned TCGA profiles.

Discussion
Although current standard treatments have improved the survival of osteosarcoma patients, their outcome remains unsatisfactory. During the past decade, several conventional prognostic systems for osteosarcoma have been proposed to predict patient prognosis. These systems typically take into account a series of clinical characteristics, such as age, sex, surgical approach, tumor necrosis rate, and chemotherapy, but often result in an insufficient prediction for risk classification and clinical outcome evaluation. Recent advances have demonstrated the molecular heterogeneity of osteosarcoma, which underlies the fact that individual patients have different prognoses and tumor responses to therapy. [24,25] Thus, novel molecular prognostic indicators that can effectively predict the prognosis of osteosarcoma require identification. Given their inherent characteristics and importance, accumulating evidence has revealed roles for mRNAs in cancer diagnosis and prognosis. [26][27][28][29][30][31][32] In this study, we investigated the prognostic value of mRNAs by analyzing the correlation between mRNA expression and overall survival in osteosarcoma patients. Using Cox regression analysis and risk scoring methods, we identified eight mRNAs that were significantly associated with the outcome of osteosarcoma patients. We developed an eight-mRNA signature that categorized osteosarcoma patients into high-risk and low-risk groups with significantly different overall survival times. ROC     Multivariate Cox analysis also revealed that the eight-mRNA signature is independent of conventional clinical factors, including age, sex, ethnicity, relapse or metastasis, primary tumor site, and tumor necrosis rate. Additionally, we found the eight-mRNA signature clearly distinguished patients at high-risk from those at low-risk within subgroups by performing subgroup stratified analysis. Taken together, these results indicate that the eight-mRNA signature has the potential to allow clinicians to determine and select individualized and effective treatment for patients with different clinical or molecular characteristics.
Recently, a wealth of evidence has revealed that mRNAs show over-expression or reduced expression in the development of osteosarcoma. [33][34][35] Furthermore, many studies demonstrate that mRNAs are involved in a diverse range of biological processes by regulating protein translation, [36,37] and that coexpressed genes are often involved in the same process or signaling pathway. [38] Therefore, it is reasonable to reveal  www.md-journal.com potential roles of the eight-mRNA signature by functional views of PCGs that are co-expressed with the prognostic mRNAs. We found that the prognostic mRNAs were enriched in chloride transmembrane transport, the oxidation-reduction process, pigmentation, the plasma membrane, perikaryon and dendrites, oxygen binding, iron ion binding, heme binding, neuroactive ligand-receptor interaction, the PPAR signaling pathway, GABAergic synapses, and metabolic pathways, which were closely associated with cellular metabolism, cell activity, and tumor regression. Neuroactive ligand-receptor interactions are considered to be critical in apoptosis, cell proliferation, and metastasis. [39,40] Apoptosis and cell proliferation are associated with neoplasia and tumor progression. PPARs (Peroxisome proliferator-activated receptors) are ligand-activated transcription factors that are crucial regulators of glucose and lipid metabolism, cellular homeostasis, and inflammation. [41,42] Moreover, cellular growth and differentiation are key processes for carcinoma development and progression. The findings in our study are consistent with recently reported results, [43] which show that neuroactive ligand-receptor interaction and the PPAR signaling pathway are highly associated with osteosarcoma.
In conclusion, the present study identified an eight-mRNA signature panel that can be considered a composite prognostic marker for survival risk stratification in osteosarcoma patients. This signature not only provides new insights into the molecular heterogeneity of osteosarcoma, but also has an independent prognostic value beyond conventional clinicopathological factors to predict the outcome of osteosarcoma. However, there are some limitations in our study that should be considered. The present research lacks effective validation in an independent cohort. We only analyzed TARGET data and the dependability of our findings has not been verified by in vitro or in vivo experiments. Further in vitro and in vivo approaches are needed to fully evaluate the role of the eight-mRNA signature in osteosarcoma.

Author contributions
Zhaoming Ye and Bo Wu designed the study. Bo Wu, Zhan Wang, Nong Lin, Xiaobo Yan, Zhangchun Lv, Zhimin Ying performed the data collection and analysis. All authors participated in the writing of the manuscript. All the authors have read and approved the final version of this manuscript.