Combined eight-long noncoding RNA signature: a new risk score predicting prognosis in elderly non-small cell lung cancer patients

Elderly patients account for the majority of non-small cell lung cancer (NSCLC) cases. Compared with predictive guidance derived from the overall population, guidance tailored to elderly patients can better direct postoperative treatment and improve overall survival (OS) and disease-free survival (DFS). Recently, long non-coding RNAs (lncRNAs) have been found to play an important role in predicting tumor prognosis. To identify lncRNAs that predict survival in elderly patients with NSCLC, we selected 456 elderly patients with NSCLC and analyzed differentially expressed lncRNAs from four Gene Expression Omnibus (GEO) datasets (GSE30219, GSE31546, GSE37745 and GSE50081). We then constructed an eight-lncRNA formula to predict prognosis in elderly patients with NSCLC. Furthermore, we validated the prognostic value of the new risk model in two independent datasets, TCGA (n=670) and GSE31210 (n=130). Our data suggested a significant association between the risk model and patients' prognosis. Stratification analysis further revealed that the eight-lncRNA signature was an independent factor predicting OS and DFS in stage I elderly patients from both the discovery and validation groups. Functional prediction revealed that the eight lncRNAs have potential effects on tumor immune processes such as lymphocyte activation and TNF production in NSCLC. In summary, our data provide evidence that the eight-lncRNA signature could serve as an independent biomarker to predict prognosis in elderly patients with NSCLC, especially those with stage I disease.


INTRODUCTION
majority of lung cancer cases [3]. Moreover, there is evidence that age is an important risk factor for NSCLC patients [4]. If lung cancer in the elderly could be prevented in time and treated optimally, its incidence, mortality and recurrence rate would be greatly reduced. Therefore, it is necessary to find more targeted diagnostic and prognostic indicators for elderly patients with lung cancer.
Long noncoding RNAs (lncRNAs) are a group of RNAs more than 200 nucleotides in length. Although they have no significant protein-coding capacity, lncRNAs play important roles in regulating gene expression at the epigenetic, transcriptional and posttranscriptional levels [5]. Accumulating evidence suggests that lncRNAs can serve as novel biomarkers for prognosis prediction in cancers [6][7][8]. A growing number of lncRNAs, such as XIST, PVT1 and HOTAIR, have been found to be closely associated with patient outcome in lung cancer [9][10][11]. Moreover, risk score models are increasingly applied to tumor prognosis. In gastric cancer, a 24-lncRNA signature was found to predict patient outcome [12]. Similarly, two studies have reported lncRNA signatures that predict prognosis in NSCLC [13,14]. However, the study of outcome-related lncRNAs in elderly patients with NSCLC is still in its infancy and requires long-term effort.
With the rapid development of the big data era, public databases such as the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) greatly facilitate the analysis of high-throughput and clinical data. Using appropriate statistical analysis, researchers have identified a number of prognostic biomarkers for various malignancies. In lung cancer, Meng Zhou et al. [13] verified the prognostic power of an eight-lncRNA signature in three non-overlapping independent NSCLC cohorts obtained from the GEO database. Ting Lin identified a seven-lncRNA signature associated with overall survival in NSCLC through a comprehensive analysis of TCGA and GEO data [14]. However, lncRNA biomarkers that can effectively predict the prognosis of elderly patients with NSCLC have not been fully elucidated.
Bearing this in mind, we analyzed 456 elderly patients with NSCLC from the GEO database to select optimal lncRNAs for prognostic prediction according to the corresponding risk score. The TCGA dataset and another GEO dataset were then used to validate the screened lncRNAs. Furthermore, taking the clinical characteristics of patients into account, we explored the potential of the eight lncRNAs in different clinical subgroups.

Identification of eight lncRNAs for prognosis prediction in the training group
After data normalization and combination, a large group comprising 682 NSCLC samples was constructed from four GEO datasets (GSE30219, GSE31546, GSE37745 and GSE50081). Of these, 456 elderly NSCLC patients (age >= 60 years) were selected as the training group. Univariable Cox proportional hazards regression analysis was performed to identify prognosis-related lncRNAs (log2|fold change| > 1 and adjusted P < 0.05). A total of 281 lncRNAs were chosen for further analysis. Among them, 11 lncRNAs were significantly correlated with both OS and DFS (both P < 0.01). After adjustment for gender, pathological subtype, smoking status and AJCC stage using multivariable Cox proportional hazards regression, eight lncRNAs were finally identified as independent prognostic biomarkers for elderly NSCLC patients: LOC284632, LINC00869, LINC00703, LINC00662, LINC00324, ITGA9-AS1, HOXA11-AS and DHRS4-AS1. Detailed information on these eight lncRNAs is shown in Table 1.
We calculated the risk scores of the 456 patients in the training group using the eight-lncRNA risk formula. The median risk score was then used as the cut-off value to divide the training set into a high-risk group (n = 228) and a low-risk group (n = 228). The ranked risk scores of patients in the training set are shown in Figure 1A. A heatmap describes the expression profiles of the eight lncRNAs in the training group, with samples ranked according to their risk scores (Figure 1B). Among the eight lncRNAs, LINC00324, ITGA9-AS1 and DHRS4-AS1 received negative coefficients and acted as protective factors. The other five lncRNAs, LOC284632, LINC00869, LINC00703, LINC00662 and HOXA11-AS, had positive coefficients and acted as risk factors. In addition, the vital and disease status of each patient was plotted, and the proportion of death and recurrence events in each risk group was analyzed (Figure 1C-D). Patients in the high-risk group showed higher mortality and recurrence rates than those in the low-risk group.
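The scoring and median split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficient values and patient expression values below are hypothetical (they are not the values in Table 1), only two of the eight lncRNAs are shown, and the helper names `risk_score` and `split_by_median` are invented for this sketch. Only the signs of the two coefficients follow the text (positive for HOXA11-AS, negative for DHRS4-AS1).

```python
from statistics import median

def risk_score(expression, coefficients):
    """Linear combination of lncRNA expression values and Cox coefficients."""
    return sum(coefficients[name] * expression[name] for name in coefficients)

def split_by_median(scores):
    """Label a patient 'high' if the score is above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical coefficients: NOT the published values, signs only follow the text.
coefficients = {"HOXA11-AS": 0.5, "DHRS4-AS1": -0.3}

# Hypothetical expression values for four made-up patients.
patients = {
    "P1": {"HOXA11-AS": 2.0, "DHRS4-AS1": 1.0},
    "P2": {"HOXA11-AS": 1.0, "DHRS4-AS1": 3.0},
    "P3": {"HOXA11-AS": 3.0, "DHRS4-AS1": 0.0},
    "P4": {"HOXA11-AS": 0.0, "DHRS4-AS1": 2.0},
}

scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

With an even number of patients and no tied scores, a strict "above the median" rule yields two equal-sized groups, which matches the 228/228 split reported for the training set.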
Moreover, Kaplan-Meier analysis was used to evaluate the impact of the above prognostic signature on the survival and recurrence of NSCLC patients in the training group. The high-risk group had significantly poorer OS and DFS than the low-risk group (Figure 2A-B). We used time-dependent ROC analysis to assess the prognostic significance of the eight lncRNAs. The area under the ROC curve (AUC) of the eight-lncRNA signature for OS and DFS was 0.669 and 0.659, respectively, indicating a favorable prognostic value in predicting patients' survival (Figure 2C-D).

The prognostic values of eight lncRNA signature in two independent validation groups
To clarify the significance of the eight lncRNAs in elderly patients with NSCLC, we used two further independent datasets (TCGA and GSE31210) as validation groups. The corresponding risk scores were calculated according to the constructed formula. The elderly NSCLC patients in the TCGA (validation group-1, n=670) and GSE31210 (validation group-2, n=130) datasets were each divided into high-risk and low-risk groups by dichotomization. The scatter plots of death and recurrence events for validation group-1 and validation group-2 are shown in Figure 3. Kaplan-Meier analyses were carried out in validation group-1 (Figure 3A-B). Elderly patients with NSCLC in the high-risk group showed worse OS (log-rank test P =0.001) and DFS (log-rank test P =0.006) than patients in the low-risk group. Next, we performed the same analysis on validation group-2 (Figure 3C-D). Consistent with the results in the training group and validation group-1, high risk scores on the eight-lncRNA signature indicated that elderly patients with NSCLC may have worse OS (log-rank test P =0.017) and DFS (log-rank test P <0.001). These results demonstrated that the eight-lncRNA signature has great potential for predicting OS and DFS in elderly patients with NSCLC.

The eight-lncRNA signature was associated with prognosis in stage I patients
To further investigate the utility of the eight-lncRNA signature, stratification analyses for OS and DFS were performed based on clinicopathological factors, including gender, smoking status, pathological subtype and AJCC stage (Table 2 and Table 3). The eight-lncRNA signature had strong predictive power for OS in elderly male patients with NSCLC. However, differences in DFS between the high-risk and low-risk groups were observed in the training group and validation group-2 only. In addition, the eight-lncRNA signature acted as an independent risk factor for patients with either squamous cell carcinoma or adenocarcinoma. This result could be confirmed only in the training group and validation group-1 because the second validation group did not contain pathological information.
Furthermore, we performed stratified analysis across AJCC stages. The results showed that the eight-lncRNA signature predicted prognosis in stage I only. Kaplan-Meier curves for the high- and low-risk groups in stage I patients were plotted. Patients with high risk scores exhibited poorer OS than those with low risk scores. These results were confirmed in both the training group (Figure 4A, log-rank test P <0.001) and the two validation groups (Figure 4B, log-rank test for validation 1: P =0.003; Figure 4C, log-rank test for validation 2: P =0.015). Similarly, the eight-lncRNA signature was associated with DFS of stage I NSCLC patients in all three groups (Figure 4D-F). These findings suggested that the eight-lncRNA signature might be a prognostic biomarker for patients with early-stage NSCLC.

Functional characteristics of eight prognostic lncRNAs
To further explore the potential function of the eight lncRNAs in NSCLC, we analyzed their co-expressed genes by calculating the Pearson correlation between the eight lncRNAs and 7600 protein-coding genes in the TCGA dataset. Protein-coding genes were selected if they were positively associated with at least one lncRNA (Pearson coefficient > 0.4, P < 0.01) (Figure 5A). A total of 126 genes were selected for pathway enrichment analysis. The 126 co-expressed genes were mostly enriched in 18 pathways, especially immune regulatory pathways such as lymphocyte activation and antigen processing and presentation of exogenous peptide antigen (Figure 5B-C). This suggested that the eight lncRNAs might be involved in regulating tumor immune status.
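The co-expression screen can be illustrated with a short, dependency-free Python sketch. It is not the pipeline used in the study: the gene names and expression values are toy data, the function names are invented for illustration, and for brevity only the correlation cutoff (r > 0.4) is applied, while the accompanying significance test (P < 0.01) is omitted.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / sqrt(var_x * var_y)

def coexpressed_genes(lncrna_expr, gene_expr, r_cutoff=0.4):
    """Genes positively correlated (r > r_cutoff) with at least one lncRNA."""
    selected = []
    for gene, g_values in gene_expr.items():
        if any(pearson_r(l_values, g_values) > r_cutoff
               for l_values in lncrna_expr.values()):
            selected.append(gene)
    return selected

# Toy expression values across four samples (hypothetical).
lncrna_expr = {"LINC00324": [1.0, 2.0, 3.0, 4.0]}
gene_expr = {
    "GENE_A": [2.0, 4.1, 5.9, 8.0],  # rises with the lncRNA
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # falls with the lncRNA
    "GENE_C": [1.0, 3.0, 3.0, 1.0],  # uncorrelated with the lncRNA
}
selected = coexpressed_genes(lncrna_expr, gene_expr)
```

Because the criterion is a positive association, strongly negatively correlated genes such as `GENE_B` in the toy data are excluded even though their absolute correlation is high.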

DISCUSSION
In the present study, we identified a potential eight-lncRNA signature for predicting OS and DFS of elderly NSCLC patients. A total of five GEO and two TCGA datasets were employed. After a comprehensive analysis, an eight-lncRNA signature was constructed and found to be associated with prognosis in elderly NSCLC patients. Its ability to predict prognosis was also confirmed in two other independent datasets. Furthermore, stratified analysis showed that the eight-lncRNA signature predicted OS and DFS with high accuracy in patients with early-stage NSCLC.
It is well known that population aging has become a global issue. It will cause a rapid increase in primary lung cancer, as well as in the number of lung cancer operations, among elderly patients. Therefore, effective disease prevention and treatment strategies for the elderly are necessary. During the past few decades, studies on the prevention, diagnosis and treatment of elderly patients with lung cancer have been reported. In a study investigating the efficacy of metronomic vinorelbine in patients with advanced unresectable NSCLC, age was found to be an important factor affecting treatment efficacy [15]. The search for effective indicators for elderly cancer patients has been drawing increasing attention. In the present study, we identified, for the first time, a risk model containing eight lncRNAs that can effectively predict the prognosis of elderly patients with NSCLC. Moreover, it can predict overall survival and disease-free survival at the same time.
Because of the critical limitations of the TNM staging system and other current scoring systems, new molecular markers are needed to support the clinical evaluation of prognosis and diagnosis. Many studies have reported that certain protein-coding genes and microRNAs can predict the prognosis and diagnosis of lung cancer patients [16,17]. For example, high serum expression of miR-155 can help diagnose non-small cell lung cancer, and serum samples are convenient to obtain [18]. Moreover, thanks to the development of gene chip (microarray) technology, a large number of lncRNAs aberrantly expressed in tumor tissues have been discovered [19][20][21]. Many of them have been confirmed to be closely related to the occurrence, development and recurrence of tumors [22,23]. Accumulating evidence suggests that lncRNAs are involved in oncogenic and tumor-suppressive pathways, and these dysregulated lncRNAs have already shown great potential as novel molecular biomarkers for the diagnosis, prognosis and treatment of cancer. For example, lncRNA AFAP1-AS1 could affect NSCLC patients' survival and epigenetically repress the expression of p21, a key molecule in tumor progression [24]. In our study, instead of looking for a single lncRNA as a predictor of lung cancer prognosis, we used multiple lncRNAs to predict tumor prognosis. We identified a total of eight lncRNAs (LOC284632, LINC00869, LINC00703, LINC00662, LINC00324, ITGA9-AS1, HOXA11-AS and DHRS4-AS1) and built a prognostic formula. Kaplan-Meier analysis showed that this risk score model predicts prognosis well. Furthermore, we employed two independent groups (the TCGA and GSE31210 datasets) as validation groups to minimize the bias generated by small-scale data analysis. Our results confirmed that the eight-lncRNA signature was a robust and reproducible prognostic biomarker.
Stratification analysis based on clinical characteristics was performed in this study. After analyzing the prognostic value in different AJCC stages, we found that the eight-lncRNA signature was significantly associated with OS and DFS in patients with stage I disease. Considering that surgery is the recommended first-line therapy for stage I patients [25], our eight-lncRNA signature could help physicians predict patients' prognosis after surgery and implement effective treatment options. In addition, a large number of studies have successfully detected microRNAs in plasma/serum. For example, miR-155 could be sensitively and specifically measured in serum, and its overexpression in serum specimens could constitute a diagnostic marker for the early detection of lung adenocarcinoma [18]. Like microRNAs, lncRNAs play an important role in tumor diagnosis and prognosis, and techniques for detecting lncRNAs in plasma/serum could contribute to diagnosing disease and predicting prognosis. One study identified plasma HDRF and RDRF, RNA fragments in plasma/serum derived from lncRNA HOTTIP-005 and lncRNA RP11-567G11.1, in pancreatic cancer (PC); they could be used as prognostic and diagnostic biomarkers of PC [26]. Therefore, we believe that the expression level and significance of these eight lncRNAs in the plasma/serum of patients with NSCLC need further study. This could improve the early diagnosis rate, reduce the recurrence rate and improve the survival rate of patients with NSCLC.
Among the eight lncRNAs, five (LOC284632, LINC00869, LINC00703, LINC00662 and HOXA11-AS) acted as risk factors for NSCLC, and the other three (LINC00324, ITGA9-AS1 and DHRS4-AS1) were protective factors. Except for HOXA11-AS and DHRS4-AS1, the other six lncRNAs have not been reported in the literature. Moreover, except for HOXA11-AS, the other seven lncRNAs in this study were reported as biomarkers in NSCLC for the first time. DHRS4-AS1 functions as a tumor inhibitor by preventing proliferation and invasion, inhibiting cell cycle progression and promoting apoptosis of clear cell renal cell carcinoma cells [27]. HOXA11-AS has been studied as an oncogene in NSCLC, gastric cancer, liver cancer, osteosarcoma, and breast cancer [28][29][30][31][32][33]. HOXA11-AS was markedly overexpressed in NSCLC and was associated with patients' prognosis [28]. Experimental evidence suggested that HOXA11-AS was involved in cellular proliferation, migration and invasion. HOXA11-AS also mediated cisplatin resistance of NSCLC cells [34]. Several signaling pathways, such as the TGF-beta (TGF-β) pathway, were regulated by HOXA11-AS [35]. This provides new ideas for the study of the mechanisms of non-small cell lung cancer.
Because the functions of the eight lncRNAs in NSCLC are unclear, we also performed pathway enrichment analysis to find their potential biological functions. The most enriched pathways were involved in immune regulation, including lymphocyte activation, antigen processing and presentation of exogenous peptide antigen, and regulation of tumor necrosis factor (TNF) production. This indicated that the eight lncRNAs might function as tumor immunomodulators in NSCLC. To date, investigations of lncRNAs in tumors have mainly focused on gene imprinting and tumor cell differentiation. A few studies have also reported that lncRNAs are involved in regulating the immune response of cancer patients. It was reported that CD8+ T cells and CD4+ T cells expressed a large number of lncRNA genes, many of which were specific to lymphocytes and were dynamically regulated during differentiation or activation [36,37]. Moreover, we also predicted that the eight lncRNAs might affect the production of TNF. These findings need further experimental studies for confirmation.
In summary, we identified an eight-lncRNA signature to predict OS and DFS in NSCLC patients. The eight-lncRNA signature showed great potential for prognostic prediction, particularly in patients with early-stage disease. To our knowledge, this is the first study to identify an lncRNA signature in elderly NSCLC patients. Our findings provide evidence for developing effective prognostic biomarkers for NSCLC patients.

Patient information and study design
A total of seven datasets containing genetic information and clinical data of NSCLC patients were selected for the study. Five of them (GSE30219, GSE31546, GSE37745, GSE50081 and GSE31210) were downloaded from the Gene Expression Omnibus (GEO) and two (TCGA-LUSC and TCGA-LUAD) from The Cancer Genome Atlas (TCGA) websites. Among them, four GEO datasets (GSE30219, GSE31546, GSE37745 and GSE50081) were integrated via data normalization into a training group of 456 patients. Meanwhile, 670 patients from the TCGA dataset (a combination of TCGA-LUSC and TCGA-LUAD) and 130 patients from another GEO dataset (GSE31210) were employed as two independent validation groups. All patients included in this study were NSCLC patients aged >=60 years. Patients under the age of 60 and patients with missing or no clinical data were excluded. The clinicopathological parameters of the NSCLC patients in each group are listed in Table 4.

Normalization and lncRNA annotation of GEO data
Because gene profiling was inconsistent across the four GEO datasets (GSE30219, GSE31546, GSE37745 and GSE50081), the raw data, downloaded as probe-level CEL files, were quantile-normalized using the Robust Multi-array Average (RMA) method. The Affymetrix U133 Plus 2.0 array annotation, downloaded from the Affymetrix website (http://www.affymetrix.com), contained 2986 lncRNA-specific probes.
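To make the quantile-normalization step concrete, the following Python sketch forces every array onto a shared reference distribution. It is a simplified illustration and not the RMA implementation itself: RMA additionally performs background correction and probe-set summarization, both omitted here. The function name `quantile_normalize` and the intensity values are assumptions made for this example.

```python
def quantile_normalize(samples):
    """Quantile-normalize arrays so that all share the same value distribution.

    samples: list of equal-length lists, one list of probe intensities per array.
    Ties within an array are broken by probe order (a simplification).
    """
    n_probes = len(samples[0])
    sorted_arrays = [sorted(array) for array in samples]
    # Reference distribution: mean of the k-th smallest value across all arrays.
    reference = [
        sum(array[k] for array in sorted_arrays) / len(samples)
        for k in range(n_probes)
    ]
    normalized = []
    for array in samples:
        # Probe indices ordered by ascending intensity within this array.
        order = sorted(range(n_probes), key=lambda i: array[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays measured on three probes.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
```

After normalization each array contains exactly the same set of values, so only the within-array ranking of probes differs between samples; this is what makes expression values comparable across datasets generated in different batches.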

Construction of the risk formula for prognostic prediction
First, lncRNAs whose expression could not be detected (value=0) in more than 10% of all samples were eliminated. Univariate Cox proportional hazards regression was then performed to identify lncRNAs significantly associated with the OS of elderly patients with NSCLC in the training group. LncRNAs with a P value of less than 0.05 were included in the subsequent analysis. Next, stepwise multivariate Cox regression was used to identify the optimal lncRNAs that were independently associated with prognosis. Finally, a prognostic risk formula was established as a linear combination of the expression levels of these lncRNAs, each multiplied by its regression coefficient from the multivariate Cox regression model.
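Written out, the linear combination described above takes the following general form. The symbols are notation introduced here for illustration: $\beta_i$ denotes the multivariate Cox regression coefficient of the $i$-th lncRNA and $\mathrm{Expr}_i$ its expression level in a given patient. The fitted coefficient values are not reproduced here.

```latex
\text{Risk score} = \sum_{i=1}^{8} \beta_i \cdot \mathrm{Expr}_i
```

Under this form, an lncRNA with $\beta_i > 0$ raises the score as its expression increases (a risk factor), whereas an lncRNA with $\beta_i < 0$ lowers it (a protective factor).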

Statistical analysis
Cox proportional hazards regression was used to identify survival-related biomarkers. Prognosis in the high-risk and low-risk groups was compared using Kaplan-Meier survival curves and the log-rank test. Time-dependent ROC curves were plotted to assess the specificity and sensitivity of the prognostic prediction. The above analyses were performed using R (version 3.3.1). The stratification analysis based on clinicopathological parameters and the univariate and multivariate Cox regression analyses were performed using SPSS software (version 24.0).
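For readers unfamiliar with the Kaplan-Meier estimator, a minimal pure-Python sketch is given below. It is not the R code used for the analyses; the function name `kaplan_meier` and the follow-up data are hypothetical and serve only to show how censored observations enter the estimate.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time of each patient
    events: 1 if the event (death or recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events observed at time t
        leaving = 0   # patients leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data for five patients (time units are arbitrary).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored patients reduce the number at risk at later time points without lowering the survival estimate themselves. Computing one such curve per risk group gives the curves that the log-rank test then compares.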