Immune-related lncRNAs as predictors of survival in breast cancer: a prognostic signature

Breast cancer is a highly heterogeneous disease, this poses challenges for classification and management. Long non-coding RNAs play acrucial role in the breast cancersdevelopment and progression, especially in tumor-related immune processes which have become the most rapidly investigated area. Therefore, we aimed at developing an immune-related lncRNA signature to improve the prognosis prediction of breast cancer. We obtained breast cancer patient samples and corresponding clinical data from The Cancer Genome Atlas (TCGA) database. Immune-related lncRNAs were screened by co-expression analysis of immune-related genes which were downloaded from the Immunology Database and Analysis Portal (ImmPort). Clinical patient samples were randomly separated into training and testing sets. In the training set, univariate Cox regression analysis and LASSO regression were utilized to build a prognostic immune-related lncRNA signature. The signature was validated in the training set, testing set, and whole cohorts by the Kaplan–Meier log-rank test, time-dependent ROC curve analysis, principal component analysis, univariate andmultivariate Cox regression analyses. A total of 937 immune- related lncRNAs were identified, 15 candidate immune-related lncRNAs were significantly associated with overall survival (OS). Eight of these lncRNAs (OTUD6B-AS1, AL122010.1, AC136475.2, AL161646.1, AC245297.3, LINC00578, LINC01871, AP000442.2) were selected for establishment of the risk prediction model. The OS of patients in the low-risk group was higher than that of patients in the high-risk group (p = 1.215e − 06 in the training set; p = 0.0069 in the validation set; p = 1.233e − 07 in whole cohort). The time-dependent ROC curve analysis revealed that the AUCs for OS in the first, eighth, and tenth year were 0.812, 0.81, and 0.857, respectively, in the training set, 0.615, 0.68, 0.655 in the validation set, and 0.725, 0.742, 0.741 in the total cohort. Multivariate Cox regression analysis indicated the model was a reliable and independent indicator for the prognosis of breast cancer in the training set (HR = 1.432; 95% CI 1.204–1.702, p < 0.001), validation set (HR = 1.162; 95% CI 1.004–1.345, p = 0.044), and whole set (HR = 1.240; 95% CI 1.128–1.362, p < 0.001). GSEA analysis revealed a strong connection between the signature and immune-related biological processes and pathways. We constructed and verified a robust signature of 8 immune-related lncRNAs for the prediction of breast cancer patient survival.


Background
Among women, breast cancer is the most commonly diagnosed cancer and the leading cause of cancer-related death [1,2]. Globally, it is reported that nearly 2.1 million new breast cancer cases were diagnosed, and more than 0.6 million women died of breast cancer in 2018 [3]. Breast cancer is a highly heterogeneous disease and, thus, its etiology, pathological manifestations, and outcomes vary from person to person [4,5]. Considering the high mortality and heterogeneity, it is urgent to identify suitable detection approaches for breast cancer prognosis biomarkers. Many studies have shown that some genes and mRNAs play significant roles as prognostic molecular markers in malignancies [6][7][8]. Recently, research into tumor immunity has become the most rapidly advancing area within cancer. Immunotherapy provides the unprecedented opportunity to effectively treat malignancies owing to the essential involvement of the immune system in tumor development, progression, and therapy [9], especially in some malignancies such as hepatocellular carcinoma [10], early-stage squamous cell cancer of the anal canal [11], prostate cancer [12].
Long non-coding RNAs (lncRNAs) are classified as transcripts that are longer than 200 nucleotides and do not encode proteins. However, lncRNAs can physically interact with DNA, RNA, or protein. Through such interactions, lncRNAs are able to regulate gene expression at various levels such as transcriptional, post-transcriptional, and translational regulation.As a result, lncRNAs have important roles in the occurrence, progression, and prognosis of cancers and various other diseases [13][14][15]. In addition, recent studies have indicated that lncRNAs have crucial functions in different phases of cancer immunity such as antigen presentation, immune activation, and immune cell infiltration [16,17]. Thus, immune-related lncRNAs have drawn considerable attention. One study has reported the prognostic merit of an immune-related lncRNA signature in the prediction of overall survival (OS) in breast cancer [5]. However, this immune-related lncRNA signature has not been validated externally by other researches yet.
In the present study, we identified and validated a robust and reliable molecular signature for the prediction of survival in breast cancer patients. Our results validated a risk scoring model based on 8 immune-related lncRNAs. The model can be used as a reliable prognostic predictor, and the 8 lncRNAs could be potential therapeutic targets for breast cancer.

Data source and preprocessing
We downloaded RNA sequencing data sets as well as the corresponding clinical characteristics of breast cancer samples and normal samples from the TCGA website (https ://cance rgeno me.nih.gov/) in February 2020. Male breast cancer samples or samples with a follow-up time of less than 30 days were excluded. The expression levels were normalized using the Trimmed Mean of M-values method from the "edge" R package and underwent a log2(x + 1) transformation. R3.6.2 software was applied to normalize, process, and analyze the data. Perl (https :// www.perl.org/) was used to integrate the RNAseq value of each sample into a matrix file. A list of immune-related genes was downloaded from the gene list resources in Immunology Database and Analysis Portal (ImmPort), an open repositories of subject-level human immunology database for translational and clinical research (https :// www.immpo rt.org/) [18].

Identification of immune-related lncRNAs
Immune-related lncRNAs were screened by co-expression analysis of immune-related genes from the list as described above. All samples were randomly split into training and validation sets using the R software, at a 1:1 ratio. Univariate Cox proportional hazard regression analysis was conducted to screen the immune-related lncRNAs significantly associated with overall survival (OS) in the training cohort with p < 0.01 as the criterium. The least absolute shrinkage and selectionoperator (LASSO) regression method was used for the identification of immune-related lncRNAs most correlated with overall survival using the "glmnet" package for R.

Construction of the immune-related risk prognostic system
Multivariate Cox regression analysis was used to identify significant lncRNAs for construction of the prognostic signature. We then calculated the risk score based on the expression levels of lncRNAs for each patient through following formula [19]: Riskscore = exp1*β1 + exp2*β2 … + expi*βi (expi was the expression value of each lncRNA, and βi was the regression coefficient of the multivariate Cox analysis for the target lncRNA). According to the median risk score in the training set, breast cancer patients were divided into a high-risk group and a lowrisk group.
Keywords: Breast cancer, Long non-coding RNA, Immune-related predictors, Prognostic signature, TCGA , Overall survival Page 3 of 13 Ma et al. J Transl Med (2020) 18:442 Application and validation of the risk scoring system In order to validate the predictive value of the model, we performed the Kaplan-Meier log-rank test, timedependent ROC curve analysis, univariate analysis, and multivariate Cox regression analysis for comparison of the survival between the high-and low-risk group in the training, validation, and the total cohort using the R packages "survival" and "survivalROC". We then analyzed the correlation between the expression of the 8 immunerelated lncRNAs and clinicopathological characteristics.

Gene set enrichment analysis
In order to identify different functional phenotypes between the high-risk and low-risk groups, we performed a Gene set enrichment analysis (GSEA) 4.0.3(https :// www.broad insti tute.org/gsea/index .jsp) [20]. The mRNA expression profiles of breast cancer samples from the TCGA dataset, which were divided into two groups according to risk score, were performed on GO gene sets. The study included 1000 random sample permutations and enriched gene sets with a nominal p < 0.05and FDR < 0.25 were considered statistically significant. All other parameters were set based on their default values.

Data source and processing
Initially, we obtained a total of 14143 lncRNA expression and 19659 gene expression profiles from 1053 breast cancer samples and 111 normal samples. In addition, the corresponding clinical data of 986 patients were downloaded from TCGA. The immune-related gene list downloaded from the ImmPort database contained 1534 immune related genes (Additional file 1: Table S1). We then obtained 937 immune-related lncRNAs through coexpression analysis of the immune gene list (p < 0.001). The top 10 positively and negatively correlated lncRNAs are shown sorted by correlation coefficient in Table 1.

Identification of immune-related lncRNAs and construction of the prognostic model
The data of breast cancer patients was allocated randomly to the training and validation cohort, 494 patient samples in the training cohort, 492 patient samples in the validation cohort. We carried out univariate Cox regression analysis on the expression profiles of the lncRNAs in the training set and obtained 15 candidate immune-related lncRNAs, significantly associated with survival, p < 0.01 (Fig. 1a, Table 2). We performed Lasso regression on these lncR-NAs. In order to avoid over fitting of the predicted signal, the prediction accuracy wasestimated by tenfold cross-validation (Fig. 1b, Figure S1 and Additional file 3: Table S2, then we compared their expression between cancer samples and normal samples (Additional file 4: Figure S2). Except for AL161646.1 and LINC00578 (p < 0.0001), lncRNAs, including OTUD6B-AS1, AC245297.3, AC136475.2, AL122010.1, AL161646.1 and LINC01871, were high expressed between breast cancers and normal tissues. Some of the correlation between the lncRNAs and immune genes were shown in Additional file 5: Figure Table 3).

The immune-related lncRNA model is a robust prognostic tool for breast cancer
Breast cancer patients were divided into low-and high-risk groups according to the median risk score in the training set. Figure  Kaplan-Meier test. The p value of the log-rank test was 1.215e − 06, indicating that patients in the low-risk group had a 10 year longer median OS compared with the high-risk group. To assess the accuracy of the prognostic model, further examinations in both the testing set and the whole set were performed with the same algorithm cutoff. Both sets yielded similar results. The low-risk group exhibited remarkably better overall survival (OS) than the high-risk group, which indicated that the prognostic signature was effective (p = 0.0069 in the validation set; p = 1.233e − 07 in the total set) (Fig. 2b, c). The ROC curve analysis of the model in the training set demonstrated its promising predictive value for breast cancer survival (1-year AUC = 0.812, 5-year AUC = 0.772, 8-year AUC = 0.81, 10-year AUC = 0.857, Fig. 2d). We then validated the model in the testing set, and the 1-year AUC was 0.615, the 5-year AUC was 0.599, the 8-year AUC was 0.68, and the 10-year AUC was 0.655 (Fig. 2e). As for the total cohort, the 1-year AUC was 0.725, the 5-year AUC was 0.678, the 8-year AUC was 0.742,and the 10-year AUC was 0.741 (Fig. 2f ).
Principal component analysis (PCA) of the training, testing, and total breast cancer cohort demonstrated a different distribution pattern between the high-and lowrisk groups, based on the expression of the 8 immunerelated lncRNAs. This was indicative of the difference between the immune phenotypes of the groups (Additional file 6: Figure S4).

Assessment of the correlation between candidate lncRNAs and clinicopathological characteristics
We generated risk curves and scatter plots to display the risk score and survival status of each breast cancer patient, not only in the training set, but also in the testing and in the total set. The risk coefficient and mortality in the low-risk group were lower than those in the high-risk group (Fig. 3a-f ).

Evaluation of the immune-related lncRNA signature as an independent prognostic factor in patients with breast cancer
We carried out univariate and multivariate Cox regression analyses to verify that the model could serve as an independent prognostic factor for breast cancer, also accounting for certain clinicopathological variables (such as age, ER status, PR status, AJCC 7th T stage, etc.) (Fig. 5)  analyses of the above mentioned factors indicated that the immune-related lncRNA model was a reliable and independent prognostic factor for OS in the training set (HR = 1.432; 95% CI 1.204-1.702, p < 0.001), validation set (HR = 1.162; 95% CI 1.004-1.345, p = 0.044), and whole set (HR = 1.240; 95% CI 1.128-1.362, p < 0.001).

Gene set enrichment analysis for functional annotation of the immune-related risk signature
GSEA of the risk signature was performed using the GSEA software. The results revealed that immunerelated responses were further enriched in low-risk groups compared to high-risk groups. We demonstrated 5 immune-related gene ontology terms in the GSEA results with FDR < 0.25 (Fig. 6), including the positive regulation of immune effector processes, positive regulation of the adaptive immune response, positive regulation of lymphocyte activation, regulation of T cell activation, and the T cell receptor signaling pathway.

Discussion
With the in-depth researches on lncRNAs and the immune system [21][22][23][24], scholars have realized that immune-related lncRNAs may prove to be useful not only as potential prognostic biomarkers but could also provide novel therapeutic options. Nevertheless, the lack of validation cohorts is a limitation for the proper evaluation of the prognostic merit of potential biomarkers. In the current study, we identified a novel immune-related lncRNA molecular signature using Cox and Lasso regression analyses. The signature was then validated in a testing group and a total group, indicative of its robustness and reliability. The signature demonstrated good predictive performance and could effectively classify breast cancer patients into a high-risk and a low-risk group within the training and validation sets. The low-risk group had a significantly longer overall survival compared with the high-risk group. Further, the signature proved to be an independent prognostic factor based on multivariate Cox analysis, which revealed the signature's reproducibility and reliability for breast cancer prognosis. Among the 8 candidate lncRNAs, AC136475.2, AL161646.1, LINC01871, and AP000442.2 had not been previously reported. OTUD6B-AS1 and LINC00578 were discovered as prognostic signatures in breast cancer for the first time. Although AL122010.1 and AC245297.3 had been previously reported as prognostic signatures in breast cancer [25], our research characterized their functional involvement as lncRNAs associated with tumor immunity. OTUD6B-AS1 is transcribed from the opposite strand of the OTUD6B gene, which is located on chromosome 8 in head-to-head orientation to OTUD-6BAS1 [26]. It was reported that high OTUD6B-AS1 expression indicates poor prognosis in ovarian cancer [27]. However, Gang Wang et al. [28] found that high OTUD6B-AS1 expression was associated with improved survival and inhibited clear cell renal cell carcinoma proliferation via the Wnt/β-catenin signaling pathway. Zhuolu Wang et al. found that OTUD6B-AS1 inhibits viability, migration, and invasion of Thyroid Carcinoma by Targeting miR-183-5p and miR-21 [29]. Further more, it was reported that the expression of miR-21 in cells of the tumor immune infiltrate, and in particular in macrophages, was responsible for promoting tumor growth [30]. Activation-induced up regulation of miR-21 biases the transcriptome of differentiating T cells away from memory T cells and toward inflammatory effector T cells [31]. In the current study, we found that the overall survival was lower under high OTUD6B-AS1 expression. Further, expression was higher in the N1 and N2 groups when compared to the N0 group. We speculated that OTUD6B-AS1 affects macrophages and differentiating T cells through miR-21, then affects immune status, and finally affects the occurrence and development of tumor. Therefore, further research and mechanistic insight are required. LINC00578 was found as a potential biomarker in lung adenocarcinoma [32,33], major depressive disorder [34], and pancreatic cancer [35]. Although the AC243960.1 and OIP5-AS1 lncRNAs were the most positively and negatively correlated, respectively, with immune genes, they did not appear in the signature. It was indicated that high AC243960.1 expression indicates better prognosis in breast cancer, however, no statistically significant interactions between OIP5-AS1 expression and OS was found (Additional file 7: Figure S5). This may be due to the deletion of some relevant data during regression analysis.
As breast cancer is a highly heterogeneous disease, scientists divide breast cancers into different clinically relevant molecular subtypes based on the expression levels of the estrogen receptor (ER), progesterone receptor (PR), and HER2 [36][37][38]. Distinct prevalence, prognosis, and systemic therapies are utilized in the management of these different breast cancer subtypes [39][40][41]. HER2-overexpressing (ER and PR− , HER2 +) and triplenegative (ER and PR− , HER2−) subtypes are known to be more aggressive and have poorer outcomes [38,42]. Our results indicated that the expression of LINC01871 was high on both subtypes. Meanwhile, and LINC01871 shows a strong positive correlation with immune genes such as GZMB, CTLA4, PDCD1 etc. (Additional file 5: Figure S3). GZMB, the most potent cytotoxic molecules, act mainly as antitumoral and anti-infectious factors. However, when expressed by immune regulatory cells it may contribute to immune evasion of specific cancer types [43]. Cytotoxic T lymphocyte-associated antigen 4 (CTLA4) and programmed death-1 (PD-1; encoded by the PDCD1 gene) represent crucial immune checkpoints, the blockade of which can potentiate anti-tumour immunity [44,45]. Therefore, it is suggested that LINC01871 may play an important role, particularly related to the above immune processes and immune genes, in the development of breast cancer in the two phenotypes, which requires in-depth investigation in the future.
Finally, GSEA further confirmed the robust connection of the signature with the immune response. Samples from patients with low-risk scores were associated with positive regulation of adaptive immune response. Further, patients with a high-risk score exhibited greater adaptive immune resistance. Adaptive immune resistance is a process during which cancers change their phenotype in response to a cytotoxic or pro-inflammatory immune response, thereby evading it [46]. Inhibition of adaptive immune resistance is the mechanistic foundation of responses to PD-1 or PD-L1 inhibition [47,48], which has made a significant contribution to the treatment of breast cancer [49,50]. Our prognostic signature may provide directions for predicting the efficacy or studying the mechanism of PD-1 or PD-L1 inhibition immunotherapy. Lymphocytes, including T cells, B cells, and natural killer cells, are the main force underlying immune defense mechanisms. For them to perform their immune function, lymphocytes must be activated either through the recognition and binding of antigens or through stimulation by cytokines [51][52][53]. In this study, samples from patients with low-risk scores were associated with the positive regulation of lymphocyte activation, which may indicate that low-risk patients have a more active immune status and better immune defense than do highrisk patients. T cells have been considered as having a significant role in immune surveillance and tumor eradication. On the basis of this paradigm, over the past quarter century, T cell-based cancer therapies have achieved success in patients [54,55].
The advantage of the current study was that our signature is based on population databases and high-throughput sequencing data. Further, both exploration and validation were used in order to evaluate the risk score method. Tumor immunology has become the most rapidly advancing area of cancer research, and immunotherapy has provided promising treatment in recent years. The current study utilized a new immune-related prognostic approach for breast cancer. However, there were some limitations in this study. First, as the analyzed data was obtained from online databases, the study was of retrospective nature. Second, there is no in vitro or in vivo experimental data confirming our findings. In addition, we did not explore the potential mechanisms of investigated lncRNAs. Thus, more functional studies on the 8 lncRNAs, alone and in combination, should be carried out to further test the predictive accuracy of the signature and discover potential immune-related mechanisms. Of note, we are currently working on the clinical validation and mechanistic elucidation of these results. In various studies, gene expression differences between cancer and normal tissues are compared for the screening of prognostic genes. This may leave out certain genes with little expression differences between cancer and normal tissues. Such genes with no obvious expression difference may have a great influence on the biological behavior of tumors, chemotherapy, immunotherapy, and other factors affecting the survival of patients. Despite these limitations, to the best of our knowledge, this study is the first to report the external validation of an established immune-related lncRNA signature for breast cancer. The eight immune-related lncRNAs had never been studied in breast cancer.

Conclusions
In conclusion, we have identified an 8 immune-related lncRNA signature as a potential prognostic tool for breast cancer patients. It is strongly connected to the risk value, tumor status, and OS. The signature provides a novel insight into immune-related lncRNAs in breast cancer and identifies potential biomarkers for prognosis and immunotherapy.