Prognostic value of long non-coding RNA signatures in bladder cancer

Bladder cancer (BLCA) is a devastating cancer whose early diagnosis can ensure a better prognosis. The aim of this study was to evaluate the potential utility of lncRNAs in constructing lncRNA-based classifiers of BLCA prognosis and recurrence. Based on BLCA data retrieved from TCGA, lncRNA-based classifiers for overall survival (OS) and relapse-free survival (RFS) were built in the training cohorts using the least absolute shrinkage and selection operator (LASSO) Cox regression model. More specifically, a 14-lncRNA-based classifier for OS and a 12-lncRNA-based classifier for RFS were constructed. According to the predicted risk score, patients were divided into high- and low-risk groups using the median risk score as the cut-off. The log-rank test showed significant differences in OS and RFS between the low- and high-risk groups in the training, validation and whole cohorts. In the time-dependent ROC curve analysis, the AUCs for OS in the first, third, and fifth year were 0.734, 0.78, and 0.78, respectively, and the prediction capability of the 14-lncRNA classifier was superior to that of a previously published lncRNA classifier. For RFS, the AUCs in the first, third, and fifth year were 0.755, 0.715, and 0.740, respectively. In summary, the two lncRNA-based classifiers could serve as novel and independent prognostic factors for OS and RFS, respectively.


INTRODUCTION
Bladder cancer (BLCA) is the ninth most common malignant cancer, with high incidence and recurrence rates [1,2]. The risk evaluation of prognosis and recurrence has a critical impact on clinical decision-making and patient consultation [3]. The most significant factors involved in this evaluation include the general condition of patients, clinicopathological characteristics, clinical treatment and disease progression [1,4,5]. Additionally, the tumor-node-metastasis (TNM) staging system is currently applied in clinical practice as the most common prediction tool [4,6].
Nevertheless, this single clinical prediction model is considered less accurate than models that merge several clinical characteristics [7]. Moreover, the current clinical prediction model cannot easily incorporate novel factors, such as molecular biomarkers and complex external environmental factors [5].
Over the years, scientists have proposed numerous potential molecular signatures as predictors of the risk of cancer progression, the most important being DNA methylation-based models [8-10], mRNA-based models [11,12], microRNA (miRNA)-based models [13] and long non-coding RNA (lncRNA)-based models [14,15]. Increasing evidence has indicated a critical role of lncRNAs in BLCA prognosis and recurrence, as they are involved in cancer initiation, progression and metastasis [16]. However, the prognostic value of lncRNAs in BLCA has not yet been adequately explored.
In this study, to assess the potential utility of lncRNAs in predicting the prognosis and recurrence of BLCA, we constructed a 14-lncRNA-based classifier for overall survival (OS) and a 12-lncRNA-based classifier for relapse-free survival (RFS) using least absolute shrinkage and selection operator (LASSO) Cox regression. Both lncRNA-based classifiers could improve the predictive ability of the current TNM staging system. Our results demonstrate that these lncRNA-based classifiers could be used as reliable prognostic predictors of BLCA survival and recurrence.

RESULTS
Data source and processing
The lncRNA expression profiles of BLCA tissues (n=414) and adjacent non-tumor tissues (n=19) were obtained from the TCGA database. As shown in Figure 1, a total of 1643 differentially expressed lncRNAs (DElncRNAs; Figure 2A) with |logFC| > 1 and padj < 0.05 were identified using edgeR. Additionally, lncRNAs with p < 0.05 were selected by applying univariate Cox regression to the entire dataset. Following this, 463 lncRNAs (OS, Figure 2B) and 201 lncRNAs (RFS, Figure 2C) were retained for the next step of the analysis. For OS, the samples (n=406) were randomly split into training (n=271) and validation (n=135) sets at a 2:1 ratio. Similarly, for RFS, the samples (n=337) were randomly split into training (n=225) and validation (n=112) sets at a 2:1 ratio. The LASSO Cox selection method with 20-fold cross-validation was applied in the training cohort to construct the prognosis-predicting models (OS: Figure 2D, 2E; RFS: Figure 2F, 2G).
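The 2:1 random split described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `split_two_to_one`, the fixed seed and the placeholder sample IDs are assumptions, and the source does not state how the split was seeded or implemented.

```python
import random

def split_two_to_one(sample_ids, seed=0):
    """Shuffle sample IDs and split them into training and validation sets at 2:1."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)   # reproducible shuffle; the seed is an assumption
    n_train = round(len(ids) * 2 / 3)  # two thirds of the samples go to training
    return ids[:n_train], ids[n_train:]

# Placeholder integer IDs standing in for the 406 OS samples and 337 RFS samples
os_train, os_valid = split_two_to_one(range(406))
rfs_train, rfs_valid = split_two_to_one(range(337))
print(len(os_train), len(os_valid), len(rfs_train), len(rfs_valid))
```

Rounding two thirds of each cohort reproduces the reported set sizes (271/135 for OS and 225/112 for RFS).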

Construction of lncRNA classifiers for OS and RFS
In the training cohort, a 14-lncRNA-based classifier for OS and a 12-lncRNA-based classifier for RFS were constructed using the LASSO Cox regression model with 20-fold cross-validation. Detailed information on these lncRNAs is shown in Table 1. According to the predicted risk score, patients were divided into high- and low-risk groups using the median risk score as the cut-off. The Kaplan-Meier log-rank test showed significant differences in OS and RFS between the low- and high-risk groups in the training cohorts (Figure 3A, 3B), the validation cohorts (Figure 3C, 3D) and the whole cohorts (Figure 3E, 3F).
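The comparison of the high- and low-risk groups rests on the two-group log-rank test. The following is a minimal pure-Python sketch of that test, assuming right-censored data coded as 1 for an event and 0 for censoring. The function name `logrank_test` and the toy follow-up times are illustrative assumptions; the published analysis was run in R or SPSS and is not reproduced here.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    groups : 1 for the high-risk group, 0 for the low-risk group
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                                 # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)     # at risk, group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)      # events, both groups
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)                            # events, group 1
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Toy data: the high-risk group (1) fails earlier than the low-risk group (0)
chi2, p = logrank_test([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, [1, 1, 1, 1, 0, 0, 0, 0])
print(round(chi2, 2), p < 0.05)
```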

Correlation between lncRNA classifiers and clinicopathologic characteristics
Because the samples were randomly split into training and validation sets at a 2:1 ratio, there were no significant differences between the training and validation cohorts (Tables 2-5). As shown in Table 2, for OS, the clinical characteristics subtype, pT, pN and grade differed significantly between the two risk groups in the whole cohort. For RFS, however, no clinical characteristic except pT varied significantly between the two groups in the whole cohort (Table 3). Although the lncRNA-based risk scores for OS and RFS were independent of several clinical characteristics, positive associations were detected between them (Figure 4). Patients with high pT, pN or grade tended to have a high risk score.

Prognostic value of lncRNA classifiers for assessing clinical outcome
In the time-dependent ROC curve analysis, the AUCs for OS (Figure 5A) in the first, third, and fifth year were 0.734, 0.78, and 0.78, respectively, and the prediction capability of the 14-lncRNA classifier was superior to that of the previously published lncRNA classifier [17]. For RFS (Figure 5B), the AUCs in the first, third, and fifth year were 0.755, 0.715, and 0.740, respectively; the 12-lncRNA-based classifier was built primarily as a prognostic predictor of BLCA recurrence. As shown in Table 4, the 14-marker-based classifier, age, pT, pN and pM were significantly associated with OS in the univariate Cox regression analyses. In the multivariate Cox regression analyses of these factors, only the 14-marker-based classifier was retained as a dependable and independent prognostic factor for OS (p < 0.001) in the whole cohort. In the univariate Cox regression analyses, the 12-marker-based classifier, subtype, pT, pN and pM were significantly associated with RFS (Table 5). Finally, the multivariate Cox regression analyses revealed that only the 12-marker-based classifier was a novel and independent prognostic factor for RFS (p = 0.001) in the whole cohort.
In clinical practice, the most commonly used risk classification is TNM staging. Therefore, the association between the lncRNA-based classifier models and TNM staging was explored. In the ROC curve analysis, the lncRNA-based classifier models showed clearly better predictive accuracy than TNM staging. The results also indicated that combining the lncRNA-based classifier models with TNM staging could enhance the ability to predict survival and recurrence (Figure 5C, 5D). The Kaplan-Meier curves revealed that patients stratified by the combination of the lncRNA-based risk scores and TNM staging had markedly different prognoses (p < 0.0001, Figure 5E, 5F).
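To make the idea of an AUC at a fixed time point concrete, the sketch below computes a deliberately naive version in Python. It is not the estimator implemented in 'survivalROC', which handles censoring through Kaplan-Meier or nearest-neighbour weighting. The simplification assumed here treats patients with an event by the horizon as cases, treats patients followed beyond the horizon as controls, and drops patients censored before the horizon. The function name `naive_auc_at` and the toy data are illustrative assumptions.

```python
def naive_auc_at(times, events, scores, horizon):
    """Naive AUC of a risk score at a given time horizon.

    Cases    : event observed at or before the horizon
    Controls : still under follow-up after the horizon
    Patients censored before the horizon are ignored (a simplification).
    """
    cases = [s for t, e, s in zip(times, events, scores) if e == 1 and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    # Probability that a random case has a higher score than a random control
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Toy data: times in years, the third patient is censored at year 3 and is ignored
toy_times = [1, 2, 3, 6, 7]
toy_events = [1, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.5, 0.2, 0.1]
print(naive_auc_at(toy_times, toy_events, toy_scores, horizon=5))
```

In this toy example every case scores higher than every control, so the risk score separates the two groups perfectly at the five-year horizon.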

DISCUSSION
Patients with BLCA, especially muscle-invasive bladder cancer (MIBC), still have significant risks of relapse and death, in spite of radical cystectomy [4,6,18,19]. To a certain extent, the aggressiveness of BLCA cannot be accurately stratified by the TNM staging system, which mostly depends on the pathological staging without any molecular biological features [20,21]. On that account, finding new and effective prognostic biomarkers is critical for patients with MIBC due to the disappointing clinical outcomes.
Increasing evidence has demonstrated that dysregulated lncRNAs may contribute to cancer initiation, progression and metastasis [22]. Several lncRNA-based signatures have been applied to predict the risk of cancer progression in patients with different cancer types, such as renal cell carcinoma [14] and colon cancer [15]. As for BLCA, although the prognostic value of lncRNAs has been explored by some authors [17,23], several aspects remain to be improved: (1) an internal validation dataset is needed to validate the stability of the constructed model; (2) a comparison between the constructed model and the existing TNM staging system is indispensable; and (3) the prognostic value of lncRNAs for BLCA recurrence should be further explored. Therefore, in this study, based on the TCGA-BLCA cohort, we established and validated novel prognostic lncRNA-based signatures for OS and RFS in order to improve the prediction of mortality and disease recurrence. The LASSO Cox regression model, a popular tool for regression with high-dimensional predictors, has previously been used in the study of colon cancer but had not yet been applied to BLCA. Thus, in this study, the LASSO Cox regression model was applied to select, through LASSO penalization, lncRNAs with high expression variance, significant prognostic value and low correlation. A 14-lncRNA-based classifier for OS and a 12-lncRNA-based classifier for RFS were constructed and validated to improve prognostic prediction for BLCA patients. The results indicated that the two classifiers could successfully divide BLCA patients into high- and low-risk groups with significant differences in OS and RFS in the training cohorts. The prognostic value of the two classifiers was confirmed in the validation cohorts, indicating the reproducibility and practicability of the two lncRNA-based classifiers for the prognostic prediction of OS and RFS.
As shown in Table 4 and Table 5, the 14-marker-based classifier, age, pT, pN and pM were significantly associated with OS, while the 12-marker-based classifier, subtype, pT, pN and pM were significantly associated with RFS in the univariate Cox regression analyses. In the multivariate Cox regression analyses, only the 14-lncRNA-based classifier was retained as a dependable and independent prognostic factor for OS (p < 0.001), and only the 12-lncRNA-based classifier qualified as a novel and independent prognostic factor for RFS (p = 0.001).
In clinical practice, the most commonly used risk classification is TNM staging. We therefore explored the association between the lncRNA-based classifier models and TNM staging. In the ROC curve analysis, the lncRNA-based classifier models had clearly better predictive accuracy than TNM staging, and combining the lncRNA-based classifier models with TNM staging could enhance the ability to predict survival and recurrence. The Kaplan-Meier curves revealed that patients stratified by both the lncRNA-based risk scores and TNM staging had markedly different prognoses.
Our study has shown that the 14-lncRNA-based classifier for OS and the 12-lncRNA-based classifier for RFS were both strongly associated with the prognosis of BLCA. However, most of the lncRNAs in our classifiers have not yet been fully characterized or functionally annotated. On the other hand, several lncRNAs used in our classifiers have been explored in previous studies. MAFG-AS1 has been shown to function as a ceRNA that increases the expression of MMP15 and NDUFA4 by competing for miR-339-5p and miR-147b, thus exerting its oncogenic function in non-small-cell carcinoma [24] and colorectal cancer [25]. LINC01138 promotes malignancy by activating arginine methyltransferase 5 in hepatocellular carcinoma [26] and by interacting with PRMT5 to promote SREBP1-mediated lipid desaturation in clear cell renal cell carcinoma [27]. Given their strong relevance to prognosis, these lncRNAs should be explored in the future, especially in relation to BLCA.
Inevitably, the present study has some inherent limitations that need to be addressed. Firstly, the study was retrospective in nature, since it was based on data from the TCGA dataset and was not validated in a prospective clinical trial. Secondly, the mechanisms of action of the lncRNAs in our classifiers remain unclear. Hence, further studies of the specific lncRNAs are needed, as they can contribute to a clearer understanding of the role of lncRNAs in BLCA initiation and progression. Moreover, information on several important clinicopathological features, such as treatments, was not available in the TCGA-BLCA cohort. Despite these drawbacks, the results demonstrate that our lncRNA-based classifiers could be used as reliable prognostic predictors of BLCA survival and recurrence.
In summary, a 14-lncRNA-based classifier for OS and a 12-lncRNA-based classifier for RFS were constructed using the LASSO Cox regression model. These classifiers could be novel and independent prognostic factors for OS and RFS, respectively, while improving the predictive ability of the current TNM staging system. Nevertheless, future large-scale, multi-center studies are necessary to confirm our results before the lncRNA-based signatures can be applied in the clinic.

MATERIALS AND METHODS
Patient datasets
The TCGA-BLCA RNA sequencing dataset and the corresponding clinical characteristics of patients were downloaded from the TCGA website (https://cancergenome.nih.gov/), including 414 BLCA tissues and 19 adjacent non-tumor tissues. The RFS data were downloaded from the UCSC Xena website (https://xena.ucsc.edu/). We excluded lncRNAs whose expression (read count) was zero in 90% of the BLCA patients.
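The low-expression filter can be sketched in Python as follows. The text does not say whether the 90% threshold is inclusive, so this sketch assumes that an lncRNA is dropped when its read count is zero in at least 90% of samples. The function name `keep_expressed`, the gene labels and the counts are hypothetical.

```python
def keep_expressed(counts, max_zero_fraction=0.9):
    """Keep lncRNAs whose read count is zero in fewer than max_zero_fraction of samples.

    counts maps an lncRNA name to its list of read counts across samples.
    The inclusive reading of the 90% threshold is an assumption.
    """
    kept = {}
    for gene, values in counts.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction < max_zero_fraction:
            kept[gene] = values
    return kept

# Hypothetical read counts for ten samples
toy_counts = {
    "lnc_A": [0, 0, 0, 0, 0, 0, 0, 0, 0, 7],   # zero in 90% of samples: dropped
    "lnc_B": [0, 0, 0, 0, 0, 0, 0, 0, 3, 5],   # zero in 80% of samples: kept
    "lnc_C": [12, 4, 9, 0, 6, 8, 15, 2, 1, 3], # broadly expressed: kept
}
print(sorted(keep_expressed(toy_counts)))
```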

Data processing
BLCA data were annotated using the GENCODE (v26) GTF file. As shown in Figure 1, we applied edgeR to the entire dataset to identify the differentially expressed lncRNAs (DElncRNAs) with |logFC| > 1 and padj < 0.05 between tumor and normal samples. Meanwhile, we conducted a univariate Cox regression for all lncRNAs in cancer samples and selected the lncRNAs with p < 0.05 for the next analysis. The DElncRNAs with |logFC| > 1 and padj < 0.05 were then overlapped with the lncRNAs with p < 0.05 in the univariate Cox regression. Afterwards, the samples were randomly split into training and validation sets at a 2:1 ratio. Following this, we applied the LASSO Cox selection method with 20-fold cross-validation to construct the survival-predicting models. The predictive ability of the model in the training, validation and whole cohorts was evaluated by the Kaplan-Meier log-rank test, time-dependent ROC curve analysis and multivariate Cox regression analysis.
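The overlap step can be illustrated with a short Python sketch. It assumes the differential expression results and the univariate Cox p-values are already available as plain dictionaries; the function name `select_candidates`, the gene labels and all numbers are hypothetical, and the edgeR and Cox fits themselves are not reproduced.

```python
def select_candidates(de_results, cox_pvalues,
                      lfc_cut=1.0, padj_cut=0.05, cox_cut=0.05):
    """Return lncRNAs that pass both the differential expression and Cox filters.

    de_results  : lncRNA name -> (logFC, adjusted p-value)
    cox_pvalues : lncRNA name -> univariate Cox p-value
    """
    de_hits = {g for g, (lfc, padj) in de_results.items()
               if abs(lfc) > lfc_cut and padj < padj_cut}
    cox_hits = {g for g, p in cox_pvalues.items() if p < cox_cut}
    return sorted(de_hits & cox_hits)  # overlap of the two filtered sets

# Hypothetical inputs
toy_de = {
    "lnc_A": (2.3, 0.001),   # passes the DE filter
    "lnc_B": (-1.8, 0.010),  # passes the DE filter (down-regulated)
    "lnc_C": (0.4, 0.001),   # fails on |logFC|
    "lnc_D": (3.1, 0.200),   # fails on padj
}
toy_cox = {"lnc_A": 0.03, "lnc_B": 0.40, "lnc_C": 0.01, "lnc_D": 0.02}
print(select_candidates(toy_de, toy_cox))
```

Only `lnc_A` passes both filters in this toy input, which mirrors how the candidate list is narrowed before the LASSO Cox step.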

Construction of lncRNA signature and statistical analysis
The lncRNA-based prognostic risk score was constructed as a linear combination of the expression levels of the selected lncRNAs, each multiplied by its regression coefficient (β) derived from the LASSO Cox selection method [28-30] with 20-fold cross-validation. Using the median risk score as the cut-off, BLCA patients were divided into high- and low-risk groups. Kaplan-Meier survival curves were produced for the cases predicted to have low or high risk. All analyses were implemented in SPSS version 23.0 or R version 3.5.2 with the following packages: 'edgeR', 'glmnet', 'survivalROC' and 'gplot'. All tests were two-sided, and P < 0.05 was considered statistically significant.
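The risk score and the median split can be sketched in Python as shown here. The coefficients and expression values are hypothetical placeholders rather than the fitted values reported in Table 1, the function names `risk_scores` and `split_by_median` are illustrative, and assigning patients whose score equals the median to the low-risk group is an assumption.

```python
from statistics import median

def risk_scores(expression, coefficients):
    """Risk score per patient: sum over lncRNAs of coefficient * expression level."""
    return {
        patient: sum(beta * profile[gene] for gene, beta in coefficients.items())
        for patient, profile in expression.items()
    }

def split_by_median(scores):
    """Label each patient high or low risk using the median score as the cut-off."""
    cut_off = median(scores.values())
    # Scores equal to the median are labeled low risk (an assumption)
    return {patient: ("high" if s > cut_off else "low") for patient, s in scores.items()}

# Hypothetical LASSO coefficients and expression levels
toy_betas = {"lnc_A": 0.5, "lnc_B": -0.25}
toy_expression = {
    "patient_1": {"lnc_A": 2.0, "lnc_B": 4.0},  # score 0.0
    "patient_2": {"lnc_A": 4.0, "lnc_B": 0.0},  # score 2.0
    "patient_3": {"lnc_A": 0.0, "lnc_B": 4.0},  # score -1.0
    "patient_4": {"lnc_A": 6.0, "lnc_B": 0.0},  # score 3.0
}
toy_scores = risk_scores(toy_expression, toy_betas)
toy_groups = split_by_median(toy_scores)
print(toy_scores, toy_groups)
```

A positive coefficient raises the score as expression increases and a negative coefficient lowers it, so the sign of each β indicates whether the lncRNA is associated with higher or lower risk in the fitted model.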

AUTHOR CONTRIBUTIONS
A.H and S.H: design, analysis and interpretation of data, drafting of the manuscript, critical revision of the manuscript; A.H, D.P and Y.Z: statistical analysis; Y.L and Z.C: acquisition of data; Y.G, X.L and L.Z: critical revision of the manuscript for important intellectual content, administrative support, obtaining funding, supervision. All authors read and approved the final manuscript.