An eight-long non-coding RNA signature as a candidate prognostic biomarker for bladder cancer

Backgroud: Bladder cancer (BLCA) is one of the most fatal types of cancer worldwide. However, there are limited methods for us to provide a prognostic prediction of BLCA patients. Therefore, we aimed at developing a lncRNA signature to improve the prognosis prediction of BLCA. Results: An eight-lncRNA signature was significantly associated with recurrence free survival in BLCA patients from both discovery and validation groups. Furthermore, genes involved in the signature were enriched in extracellular matrix organization pathway. Finally, functional experiments demonstrated that six out of the eight lncRNAs significantly regulated the invasion of BLCA cells. Method: A total of 343 BLCA patients from The Cancer Genome Atlas (TCGA) were employed and randomly divided into training (n=172) and validating (n=171) groups. The lncRNA expression profiles of BLCA patients were screened and a risk-score formula were created and validated according to the Cox regression analysis. Next, WGCNA method was employed to cluster genes that highly correlated with the risk scores based on the profiling data of TCGA dataset and transwell assay was also performed to further investigate the role of these lncRNAs. Conclusions: Our results suggested that the eight-lncRNA signature was a candidate prognostic biomarker for predicting tumor recurrence of patients with BLCA.


INTRODUCTION
Bladder cancer (BLCA) is the ninth most common cancer worldwide, accounting for 7% of all cancers and 3% of all cancer deaths [1]. Despite diverse treatment methods including surgery, radiotherapy, chemotherapy, and Bacillus Calmette-Guérin (BCG) therapy [2], the risk of recurrence after 5 years ranges from 50% to 90% in non-muscle-invasive bladder cancer [3]. The high recurrence rate of bladder cancer is partly due to the lack of effective prognostic biomarkers. Therefore, developing an effective screening method for early detection of bladder cancer is critical.
Long non-coding RNAs (lncRNAs) are defined as RNA transcripts longer than 200 bases that that are not translated into proteins [4,5]. Although the functions of only a limited number of lncRNAs have been fully explained, numerous studies have suggested that lncRNAs are involved in many biological processes, including cell proliferation [6], differentiation [7], and chromatin modification [8]. Accumulating evidence has suggested that lncRNAs are frequently deregulated in cancer cells and involved in the development and progression of cancers. For example, in prostate cancer, lncRNA HULC was up-regulated in cancer tissues and associated with a poor overall survival of prostate cancer patients [9]. Li et al. have found that lncRNA FAL1 was positive in hepatocellular carcinoma (HCC) tissues and functioned as an oncogene [10]. Ye et al. have reported that LINC00460 might be a potential prognostic biomarker in lung cancer [11]. Recent studies have also demonstrated emerging roles of lncRNAs in BLCA. For instance, Zhu et al. have found that lncRNA LSINCT5 was significantly over-expressed in human BLCA specimens, and facilitated BLCA progression by enhancing Wnt/β-catenin signaling activation and epithelial mesenchymal transition (EMT) [12]. These findings strongly suggested lncRNAs could serve as diagnostic and prognostic biomarkers in human cancer.
Currently, with the advancements in transcriptome profiling, lncRNA profiling could be achieved by mining previously published gene expression microarray data. Therefore, searching a lncRNA signature might be better strategy to find a novel biomarker for the accurate prognosis prediction of patients with cancer. For example, Yang et al. have identified a six-lncRNA signature associated with recurrence of ovarian cancer [13]. In addition, Song et al. developed a lncRNA signature with prognostic value for survival outcomes of gastric cancer [14]. Subsequent studies also discovered lncRNA signatures that were significantly associated with the survival of patients with thyroid papillary carcinoma [15], pancreatic cancer [16], and oesophageal squamous cell carcinoma [17]. However, the prognostic power of lncRNA signatures in predicting the survival of patients with BLCA has not yet been investigated.
In the present study, we conducted a comprehensive study of lncRNA expression profiles in a cohort of 343 BLCA patients from The Cancer Genome Atlas (TCGA) database. We identified an eight-lncRNA signature with the ability to predict the recurrence free survival of patients with BLCA and validated their biological function in human BLCA cells.

Derivation of an eight-lncRNA prognostic signature from BLCA patients in the training dataset
The BLCA samples (n=343) were randomly divided into training and validating series (Table 1). There is no significant difference in age, race, pathological grade, disease stage and recurrence status between the two series except the proportion of male patients (68% VS 81.3%). To explore the correlation between lncRNA expression signatures and the recurrence free survival (RFS) of BLCA patients, we firstly screened the lncRNA expression profile from training series (n= 172) and then evaluated in the validating series (n=171). By subjecting the lncRNA expression data derived from the training series to univariable Cox proportional hazards regression analysis, we identified some lncRNAs that were strongly correlated with patients' recurrence free survival (P value were less than 0.05). As a result, 8 genes were finally screened out as the predictors. Among these genes, positive coefficients indicated that the higher expression levels of six genes (APCDD1L-AS1, FAM225B, LINC00626, LINC00958, LOC100996694 and LOC441601) were associated with shorter survival. The negative coefficients for the remaining two genes (LOC101928111 and ZSWIM8-AS1) indicated that their higher levels of expression were associated with longer survival ( Table 2).

An eight-lncRNA signature predicts survival of BLCA patients in the training dataset
To investigate whether the eight-lncRNA signature could provide an accurate prediction of RFS in BLCA patients, a risk-score formula was created according to the expression of these 8 lncRNAs for RFS prediction, as follows: Risk score = (0.37 × expression value of APCDD1L-AS1) + (2.13 × expression value of FAM225B) + (0.68 × expression value of LINC00626) + (0.03 × expression value of LINC00958) + (0.18 × expression value of LOC100996694) + (−0.56 × expression value of LOC101928111) + (0.11 × expression value of LOC441601) + (−4.26 × expression value of ZSWIM8-AS1). Then the eight-lncRNA signature risk score were calculated for each patient in the training series. As such, patients were ranked according to their risk scores and divided into a high-risk group (n = 86) or a low-risk group (n = 86) using the median risk score of the training series as the cutoff point ( Figure 1A). As expected, a higher recurrence rate was noted for BLCA patients with high-risk scores than for those with low-risk scores ( Figure 1B). Moreover, tumor tissues obtained from patients with high-risk scores tended to express high level of risky lncRNAs (APCDD1L-AS1, FAM225B, LINC00626, LINC00958, LOC100996694 and AGING

Validation of the eight-lncRNA signature for survival prediction
To confirm our findings, the prognostic power of the eight-lncRNA signature was further evaluated in the AGING validating series. According to the same risk formula, patients in this cohort were divided into high-risk group (n = 86) and low-risk group (n = 85). Kaplan-Meier curves revealed that the high-risk scores of eight-lncRNA signature were significantly associated with lower RFS of BLCA patients (HR = 1.73, 95% CI: 1.09-2.79, p =0.022) ( Figure 1E), which were similar to those observed in the training series.

Survival prediction by the eight-lncRNA signature is independent of clinical features
We performed multivariable Cox regression analysis to evaluate whether the eight-lncRNA signature was an independent predictor of BLCA patient's survival. Clinical features including age, gender, race, pathological grade and TNM stage were defined as covariates. Our results from the validation series showed that the prognostic power of the eight-lncRNA signature risk score (high-risk group vs. low-risk group, HR = 1.73, 95% CI: 1.09-2.79, P = 0.022) was independent of these clinical features (Table 3). In addition, similar results were also obtained from all the BLCA samples (n=343, Table  4). Moreover, we performed stratified analysis to identify the subgroups of appropriate using the eight-lncRNA signature. The results showed effective prognostic power of the eight-lncRNA signature in female patients from the validation series (HR = 5.58, 95% CI: 1.88-20.12, P = 0.003; HR = 3.42, 95% CI: 1.78-6.58, P < 0.001). The results also showed effective prognostic power of the eight-lncRNA signature in Caucasians patientsand high pathological grade patients.

Identification of eight lncRNA signature associated biological pathways and processes
In order to explore the potential mechanisms of the eight lncRNA signature, we performed WGCNA method to cluster genes that highly correlated with the risk scores based on the profiling data of TCGA dataset. We identified a total of 14 modules and found that cyan and green modules were most significantly correlated with the risk-score (Figure 2A). Pathway enrichment analysis was then performed using the genes in cyan and green modules. As shown in Figure 2B and 2C, genes were significantly enriched in cancer-related networks, including extracellular matrix organization pathways, interferon alpha/beta signaling, cytokine production pathway et al., suggesting the activation of these pathways might contribute to higher mortality risk in patients with high risk scores. Our data showed that knockdown of CDD1L-AS1, FAM225B, LINC00626 or LINC00958 significantly inhibited cell invasion. In contrast, knockdown of LOC101928111 or ZSWIM8-AS1 exerted opposite effects. It is noteworthy that the effects of LOC100996694 and LOC441601 on cell invasion were not obvious (Figure 3).

DISCUSSION
Accumulating evidence suggested that lncRNA are involved in cancer development and these dysregulated lncRNAs have already shown great potential as novel molecular biomarkers in early diagnosis, therapeutic process monitoring and prognostic evaluation of cancer [4]. Nevertheless, single lncRNA may not be accurate enough for predicting the prognosis of cancer patients [18]. In recent years, transcriptional profiling analyses have discovered a number of tissue-specific lncRNAs in normal tissues and dysregulated lncRNAs in a variety of human cancers. Therefore, the expression profile-based prognostic lncRNA signature has been investigated in breast cancer, colorectal cancer, gastric cancer and lung cancer [4,[19][20][21]. However, the prognostic values of lncRNA signature in BLCA have not yet been investigated. To explore the prognostic lncRNA signature, we profiled lncRNA by mining the existing lncRNA expression data in TCGA and identified an eight-lncRNA signature which was closely related to the prognosis of patients with BLCA.
Although men are more likely to suffer from bladder cancer, women present generally with more advanced disease and have worse oncologic outcomes [22]. One possible reason lies in the difference in sex steroid hormones and their receptors between men and women, which plays an important role in bladder cancer development and progression. Interestingly, in the present study, we find that the prognostic values of the eight-lncRNA signature were more favorable in female.
In addition, the relationship between race and BLCA is also complex. DeDeugd et al. found that African-Americans initially present with more aggressive BLCA, however, African-Americans actually have an improved overall survival compared with Caucasians [23]. However, Schinkel et al. reported that white and black patients with BLCA were not significantly different in overall and recurrence-free survival regardless of muscle invasion [24]. In the present study, when the patients with BLCA were stratified by race, the results showed that for the Caucasians, they can be divided into either a high-risk group with shorter survival or a low-risk group with longer survival according to the eight-lncRNA signature. For other races, however, the model has lost its prognostic power.  [27]. In accordance with these studies, our data also showed that knockdown of LINC00958 significantly inhibited invasion of BLCA cells. However, the biological functions of the other seven signature lncRNAs were not reported before. Thus, our results still need to be further investigated and validated in human cancers.
Several limitations of this study should be noted. First, the small sample size and lack of validation data from an independent cohort. Second, there are two lncRNAs involved in the signature (LOC100996694 and LOC441601) had no effects on cell invasion without a reasonable explain. Finally, the detailed mechanism still needs further experiments.
In summary, we identified an eight-lncRNA signature, which was significantly associated with recurrence free survival in BLCA patients. Further analysis revealed that genes involved in the signature were enriched in extracellular matrix organization pathway. Moreover, six out of the eight lncRNAs significantly regulated the invasion capability of BLCA cells.

Human BLCA cell line
The BIU-87 human BLCA cell line was preserved in our department and routinely cultured in DMEM medium supplemented with 10% FBS. All siRNAs were designed and synthesized by GenePharma (Shanghai, China). The siRNAs were transfected with LipoGene™ 2000 PLus Transfection Reagent (US Everbright Inc. Suzhou, China) reagent according to the manufacturer's protocol.

Bladder cancer datasets and patient information
Clinical information and the FPKM (Fragments Per Kilobase of transcript per Million fragments mapped) values for LncRNAs in BLCA tissues were directly download from TCGA using an online software (https://shengxin.ren). Then the clinical information files were converted into matrix format and the ENSG ID in RNA-Seq were also converted into gene Symbol using the same software. To analyze the correlation of lncRNA expression signatures with the recurrence free survival of bladder cancer patients, a total of 343 patients with recurrence information were enrolled in the study, after filtering out samples without clinical survival information. Then, the 343 patients were divided into training and validating groups randomly according to their batch numbers.

Data processing and risk-score calculation
The lncRNAs were subjected to univariable Cox regression proportional hazards regression analysis to select lncRNAs which were associated with RFS of BLCA patients. Those lncRNAs with a statistical significance in univariable Cox regression were then selected into multivariable Cox regression to obtain the coefficients. By linearly combining the expression value of selected lncRNAs weighted by their coefficients, a risk-score formula was constructed as following: is the number of prognostic lncRNA genes; Expi is the expression value of lncRNA, and Coei is the estimated regression coefficient of lncRNA in the multivariable Cox regression analysis. Such a risk score model is fully taken into account in the power of each of the prognostic lncRNA genes. Each patient has been given a risk score that is a linear combination of the expression levels of the significant lncRNAs weighted by their respective Cox regression coefficients.

Weighted correlation network analysis (WGCNA)
Considering that our risk score model was based on the expression levels of eight lncRNA, we construct a "risk scrore-gene" co-expression analysis to predict the potential biological function of our model. In this study, we select WGCNA method to find the gene modules associated with our risk scores using the R package "WGCNA" according to previous reports [28]. The soft thresholding power was selected to 9 to produce a weighted network. The enrolled genes were hierarchically clustered into 14 modules.

Pathway enrichment analysis
The cyan and green modules, the most significant modules being associated with risk score, were picked out to perform the pathway enrichment analysis. Pathway enrichment was carried out using an onlinebased web tool "Metascape" (http://metascape.org/). The significance threshold of false discovery rate (FDR) for the significantly enriched biological processes and pathways was set at 0.05.

Invasion assay
The invasive capability of BLCA cells was determined by the transwell assay. The membrane was coated with the Matrigel (200 ng/mL; BD Biosciences). Then BLCA cells transfected with lncRNA siRNAs or control siRNA were seeded in the upper chamber. The DMEM medium supplemented with serum was placed in the lower chamber. The cells on the lower side of the filters were defined as invasive cells.

Statistical analysis
The Kaplan-Meier method was used to estimate survival time for training group and validating group. Then the two-sided log rank test was performed to compare the differences in survival times between the low-risk and high-risk groups in both sets. Furthermore, multivariate Cox analysis and data stratification analysis were performed to test whether the risk score was independent of other clinical features, including age, gender, race, pathological grade and TNM stage, which were used as covariates. P-values less than 0.05 were considered statistically significant.