Identication of Immune-Related lncRNA Biomarkers Associated With Prognosis In LUAD

Background: Long non-coding RNAs (lncRNAs) are involved in regulation of immune response and could serve as biomarkers for tumors. Methods: WGCNA were used to construct an immune-related gene interaction network and elucidated possible functions of the lncRNAs by gene enrichment analyses. 6 lncRNAs were selected and used to construct an immune-related lncRNAs risk score (ILRS) classier. Results: Patients with high and low ILRS showed signicant differences in microenvironment and immune cell inltration type. Furthermore, enrichment analyses showed that these lncRNAs regulated immune-related pathways. Finally, four clusters derived from LUAD patients showed a signicant correlation with overall survival of patients. Conclusions: This work provide new insights for subsequent development of clinical strategies against the disease.

Reports indicate that only 2% of the human DNA encodes proteins while 90% of them could not be transcripted [14]. Transcripts that do not encode proteins are referred as non-coding RNAs [15] with those ranging from 200 nucleotides to 100 kilobases in length considered long non-coding RNAs (lncRNAs) [16,17]. Studies have demonstrated that are involved in many biological processes including normal developmental processes and cancers, as well as immune responses [18]. For instance, a host-derived lnc-Lsm3b was found to compete with viral RNAs at the late stage of innate immunity [19]. Similarly, LincR-Ccr2-5'AS was shown to act as a Th2-speci c lncRNA, regulating expression of immune-related genes in Th2 cells and movement of mouse Th2 cells into the lungs [20,21]. With regards to tumor information and development, an altered expression of lncRNAs has been shown to affect development, invasion and metastasis of many cancers due to regulation of gene expression is regulated by lncRNAs at epigenetic, transcriptional, and post-transcriptional levels [22]. Moreover, different lncRNAs have been shown to enhance the chemoresistance of cancer cells by improving DNA repair, modulating cellular apoptosis and changing drug metabolism [23]. According to previous studies, several lncRNAs related to LUAD process have been identi ed. These include, DGCR5 that promotes LUAD progression by inhibiting hsa-mir-22-3p and HOXA11-AS that drives cisplatin resistance of human LUAD cells by modulating miR-454-3p/Stat3 [24,25]. In addition, immune-related lncRNAs have been identi ed as potential therapeutic targets in anaplastic gliomas, breast and colorectal cancer [26][27][28]. However, speci c roles played by lncRNAs in LUAD-in ltrating immune cells remain unknown.
In this study, we investigated the relationship between immune-related lncRNAs and clinical outcomes in LUAD patients using genome-wide comparative analysis of RNA expression pro les. We constructed an immune-related gene interaction network based on lncRNA using WGCNA analysis, and further modularized the network. We then chose a prognosis-related module for LASSO COX analysis and selected the lncRNA set which was signi cantly related to the overall survival. We constructed the immune lncRNA risk score (ILRS) classi er based on these lncRNA sets. The relationship between ILRS and LUAD was evaluated and proved that ILRS was in uenced by the immune invasion score. Furthermore, enrichment analysis revealed that immune-related lncRNAs participate in many immune regulation processes. Finally, we strati ed TCGA-LUAD population into 4 clusters that had signi cant relationships with clinical outcomes based on the immune-related lncRNA pro les.

Expression pro les and acquisition of clinical data
The RNA expression pro les for LUAD patients was obtained from Gene Expression Omnibus (GEO) database [29]. This dataset was sequenced on the GPL16791 (Illumina HiSeq 2500 (Homo sapiens)) platform and can be accessed using accession number GSE81089. The database contained RNA expression pro les of 199 LUAD patients as well as other relevant clinical information such as age, gender, tumor stage, life style and ps who (WHO performance status). This data-set was mainly used for subsequent acquisition of immune-related co-expression modules. TCGAbiolinks [30] package was then used to download RNA-seq data and related clinical information of LUAD patients from the TCGA database. The dataset contained expression pro les of 524 patients and was mainly used for construction of prognostic models based on immune-related lncRNAs. RNA sequence data for these samples were generated using the IlluminaHiSeq platform. Clinical metadata of these patients included their age, TNM stage, stage, overall survival data and outcome.

Acquisition of immune-related genes
Immune-related genes were downloaded from innateDB [31] and ImmPort [32] (Immunology Database and Analysis Portal). Innate immune-related genes from 35,747 publications were included in the two databases (innateDB and ImmPort) and included all genes involved in different immune processes from a variety of experimental studies. We combined the immune-related RNAs from these two databases to obtain a total of 1,238 genes (Table S1).

Weighted gene co-expression network analysis
We employed the Weighted gene co-expression network analysis (WGCNA) [33,34] to obtain the candidate immune-related lncRNAs and GSE81089 to construct a co-expression network. In order to establish a reliable RNA co-expression network, we rst screened genes and lncRNAs according to the following criteria: 1) for each gene and lncRNA, RNA with zero expression on more than 60% of LUAD samples was deleted. 2) RNA expression data with a variance equal to 0 was deleted. Then, the WGCNA package [34] was used to construct a topological overlap matrix (TOM) of immune-related RNA and lncRNAs which represents the relative interconnectedness between each pair of genes. To enable better characterization of the biological network in the constructed TOM, we selected the optimal soft threshold here to satisfy the scale-free topology of the network as previously described [34]. Then, we proceeded to do modular analysis of immune-related gene and lncRNAs, and calculated Pearson correlation coe cients between different modules as well as clinically related indices. If the proportion of immunerelated genes contained in a module was greater than 1/4, and the module was signi cantly correlated to prognostic indicators (p-value < 0.05), then the module was selected for immune-related prognostic module. LncRNAs in these modules are de ned as candidate immune-related lncRNAs.

Identi cation of immune-related lncRNAs
To further screen the immune-related lncRNAs controlling LUAD prognosis, the Cox proportional-hazards (Coxph) model, Least Absolute Shrinkage and Selection Operator (LASSO) regression models were applied during WGCNA analysis. Brie y, the TCGA-LUAD samples were split into a training and testing set in the ratio 2:1 for developing and validating the prognostic model. The univariate Coxph model, adjusting for age, gender, stage, T stage, N stage and M stage of the patients, was used to select the lncRNAs with prognostic signi cance (p <0.05). Then these prognostic biomarkers were screened using multivariate LASSO regression via the glment [35] and survival [36] packages implemented in the R version 3.5.2 [37].
Establishment of a risk assessment model Multivariate Coxph analysis was rst performed on each immune-related lncRNA based on the TCGA-LUAD training dataset. For further construction of the prognostic risk score model, a coe cient for each immune-related lncRNA was extracted and the model constructed based on linear combination of their expression levels as follows: where γ i is the coe cient of lncRNA i obtained from multivariate Coxph regression, E i is the expression of RNA i, and N is the number of lncRNAs in the Coxph model. Then, the risk score for each patient was calculated based on expression of the selected lncRNAs. The patients in the TCGA-LUAD training, testing and entire dataset were divided into high and low-risk subgroups for further analysis according to the median cut-off value obtained from the risk scores. Association between the risk score and each clinical feature was assessed using Fisher's exact test, with p < 0.05 considered a signi cant cut-off. TimeROC package [38] was further used to separately draw receiver operator characteristics (ROC) curves for the datasets with the area under the ROC curve (AUC) calculated to examine and compare performance of the classi er.

Determination of immune cell in ltration and leukocyte subtypes in the LUAD samples
To investigate the relationship between ILRS and the degree of immune cell in ltration, we used the ESTIMATE package [39] to generate stromal and immune scores as well as tumor purity of TCGA-LUAD patients. The stromal and immune scores represented presence of stromal and immune cells, respectively in tumor tissues while tumor purity was calculated based on the combination of these scores. They were obtained on the basis of RNA expression pro les of the LUAD samples. Furthermore, we used CIBERSORTx [40] to assess the relative fraction of 22 leukocyte subtypes based on all mRNA transcripts in the TCGA-LUAD cohort, with samples showing statistical signi cance (p < 0.05) kept for further analysis. A student's t test was conducted using the ggpubr package [41] to compare differences in immune cell in ltration and leukocyte fractions across the high and low ILRS groups.

Enrichment analysis
We used the Limma package [42] to obtain differentially expressed genes (DEGs) at FDR< 0.01, and |fold change| > 1.5 between patients with a high and low risk scores. We then used the DAVID Functional Annotation Bioinformatics Microarray Analysis [43] to perform gene ontology (GO) and KEGG enrichment analyses on the DEGs. GO enrichment analysis included biological process (BP), molecular function (MF) and subcellular localization (CC). Enrichment analysis and visualization were conducted using the TCGAanalyze_EAcomplete and TCGAvisualize_EAbarplot2 functions implemented in TCGAbiolinck package [30] respectively.

Survival analysis
To evaluate the prognostic value of different variables for LUAD, including lncNRA expression level, risk score and immune in ltration, we performed the Kaplan-Meier (KM) analysis and considered p <0.05, obtained by log-rank test to be statistically signi cant. KM analysis and log-rank test were mainly implemented by the "sruvival" package [36] with the "survminer" package [44] used for drawing the KM curve.

Patient strati cation
We used expression pro les from immune-related lncRNAs to classify the LUAD patients. First, a z-score transformation was used to normalize immune-related lncRNA expression in the TCGA-LUAD cohort as previously described [45]. Then, a Pearson's correlation coe cient (PCC) between samples was computed to assess their similarity followed by the use of partition around medoids algorithm to subsequently divide the samples into k subgroups [63]. The maximizing cluster reliability approach was adopted to determine the optimal number of subgroups as previously described. Finally, a consensus clustering analysis was carried out using the "ConsensusClusterPlus" package [46].
To further validate and visualize the immune-related lncRNA derived clustering, a principal component analysis (PCA) was performed among the TCGA-LUAD tumor samples. We conducted the PCA analysis based on normalized expression data using the "prcomp" function from the "stats" module implemented in R 3.5.2 software. We then selected the rst two principal components that had most variance and projected each sample into two-dimensional space. Finally, "ggplot2" package [47] was used to visualize the expression pattern of all samples as well as the immune-related lncRNA -derived clusters.

Results
Identi cation of candidate immune-related RNA modules WGCNA was successfully used to obtain immune-related lncRNAs. First, the similarity of RNA expression pro les in 108 LUAD patients (GSE81089) was calculated and unsupervised clustering further used to detect outliers. Three obviously abnormal samples were deleted ( Figure S1). Then, we constructed a regulatory network containing lncRNAs and immune-related genes. To ensure that the distribution of the regulatory network degree obeys the power-law distribution, we chose a soft threshold of 2 (Fig. 1A, Figure S2). LncRNAs and mRNAs were clustered into 11 modules using WGCNA (Fig. 1B). The numbers of RNAs (lncRNA and mRNA) contained in each module are outlined in table S1. Pearson correlation coe cients of each module associated with the prognosis of patients' clinical phenotypes was calculated with the results showing that red, black and turquoise modules were respectively related with the patients' status, ps who, overall survival days (Fig. 1C, p = 0.05, 0.02 and 0.05 respectively). Furthermore, we found that the proportion of immune-related genes in these modules was greater than 25% (Fig. 1D), which suggested that lncRNAs in these modules may be highly related to immune responses. Consequently, resultant lncRNAs from the three modules were selected as candidates for further analysis.
Construction of prognostic classi ers based on candidate immune-related lncRNAs We rst integrated 156 lncRNAs from three modules; 42, 27 and 87 in the red, black and turquoise module respectively. The data from LUAD patients downloaded from TCGA were then divided into training and testing datasets in the ratio of 2:1, and respectively used to construct the prognostic classi er model for further validation. In the training dataset, the univariate Coxph regression analysis adjusting for age, gender, stage and TNM stage was used to nd 7 lncRNAs (p < 0.05) associated with overall survival. Then, LASSO Coxph regression was employed to obtain 6 immune-related prognostic biomarkers. A multivariate Coxph regression model was constructed and relevant parameters calculated based on the training dataset. A forest map showed that 3 of these immune-related lncRNAs (RP6-91H8.3, RP11-504A18.1 and AC006129.2) were potential risk while the other 3 (RP11-1275H24.1, LINC01003 and AC006129.2) were potential protection factors ( Fig. 2A). This result was further con rmed by KM curve, in which these lncRNAs were found to be signi cantly correlated with overall survival (logrank test p < 0.05) ( Fig. 2B − 2G). WGCNA analysis was used to evaluate the interaction between these lncRNAs and immune-related genes and further construct an interaction network (Fig. 2H). Results showed that these lncRNAs came from different modules, including RP691H8.3 from the black, AC006129.2 from the red, and the remaining lncRNA from the turquoise module. This indicated that these lncRNAs may control the prognosis of LUAD patients from different aspects. More importantly, each lncRNA interacts with multiple immune-related genes, suggesting their potential roles in immune regulation.
Furthermore, a prognostic classi er model was constructed using the multivariate Coxph regression coe cient of lncRNA and the immune lncRNA-based risk score (ILRS) for each patient in the training dataset calculated. Patients were divided into high and low risk score groups (Fig. 3A). We found that patients with a high ILRS had a signi cantly (Fisher's test p < 0.05) increased risk of dying (Fig. 3B). In addition, we found that immune-related lncRNAs showed different expression patterns between the groups and ILRS was also related to various clinical information, including Stage (Fisher's test p < 0.05) (Fig. 3C). This phenomenon was consistent in the testing (Fig. 3D − 3F) as well as the entire datasets ( Fig. 3G − 3I). In conclusion, the classi er based on immune-related lncRNAs has potential to be a prognostic indicator for patients.
We evaluated ILRS performance using ROC curve in order to verify its e cacy in predicting prognosis of LUAD patients. In the test dataset, a 1, 3, and 5-year survival of LUAD patients of 0.71, 0.61, and 0.54, respectively was predicted (Fig. 4A). The KM curve showed a signi cant correlation (logrank-test p < 0.05) between ILRS and the overall survival (Fig. 4B). We also observed the predictive effect of ILRS on patient survival in the entire dataset (Fig. 4C, 4D). Notably, ILRS outperformed traditional clinical indicators, including stage and T Stage, in predicting 3 and 5-year survival (Fig. 4E, 4F). Furthermore, results from the univariate (p = 4.60E-10) and multivariate (p = 3.42E-06) Coxph analyses indicated that ILRS was the most signi cant and independent risk factor in the entire TCGA cohort (Table 1).

Differences in immune in ltration scores between ILRS groups
To examine the relationship between ILRS and immune in ltration, we used Estimate package [39] to rst determine the stromal and immune scores as well as tumor Purity of each LUAD patient based on data from TCGA. The results showed no signi cant relationship (logrank test p > 0.05) between Stromal score and survival. However, the extension trend of high stromal scores in the overall survival was still observed (Fig. 5A). Immune Score and tumor purity were both signi cantly (logrank test p < 0.05)correlated with the overall survival of patients (Fig. 5B,5C). We then investigated the effects of high and low ILRS on stromal, and immune scores as well as tumor purity and found signi cant differences in the training, testing and entire datasets (Fig. 5D -5F). Among them, the stromal and immune scores of patients in the high ILRS group were signi cantly lower than those of patients in the low ILRS group across the three datasets (training, testing, entire). On the other hand, tumor purity in the high ILRS group was signi cantly higher than that in the low group. These results indicated that ILRS is linked to immune in ltration. In addition, this process may to some extent be regulated by the immune-related lncRNAs from LUAD.

Differences in leukocyte cell subsets between ILRS groups The abundance of different types of immune response cells is directly related to prognosis in tumor patients [48]. To gain insights into the relationship between ILRS and different types of immune cell in ltration, we rst estimated the proportion of 22 types of immune cells in each LUAD patients identi ed
from the TCGA database using CIBERSORTx [40], and selected samples with a p-value < 0.05 for further analysis. From the analysis, we observed that the types (total > 70%) of in ltrating immune cells in the testing dataset were mainly T, and B cells as well as macrophages (Fig. 6A). All the three types of immune cells were found to be closely related to prognosis of tumor patients. Consistent results were observed in the entire dataset. Next, we compared abundance of the 22 types of immune cells in LUAD patients with high and low ILRS (Fig. 6C − 6D). It is worth noting that the tumor killing immune cells, such as CD8 T, plasma and NK cells, were signi cantly lower in high than in low ILRS LUAD patients. However, immunosuppression-related cells were more abundant in high ILRS. This indicates that immune-related lncRNAs may be mainly involved in the regulation of these immune cells (CD8 T cells, plasma cells, NK cells).
GO and pathway enrichment analyses identify ILRS associated genes GO and pathway enrichment analyses were carried out to explore the potential mechanism of action of different prognosis and immune in ltration in high and low ILRS patients. Differential gene expression analysis revealed a total of 1,214 genes in patients with high and low ILRS. In Go analysis, the Biological Process (GOBP), Cellular Component (GOCC), Molecular Function (GOMF) and pathway enrichment analyses were performed for these genes. From these, we identi ed 15 most signi cant items for each enrichment analyses and these are outlined in Fig. 7. Most of these items were directly related to immune response. Based on the GOBP enrichment, we observed that these DEGs were related to immune system regulation, T cell activation and antigen presentation among other functions (Fig. 7A). In the enrichment of GOCC and GOMF, we found that these genes are involved in production of MHCII class molecules, as well as cytokines (Fig. 7B, 7C). Consistent results were observed in pathway enrichment analysis, which indicated that these genes were primarily enriched in T helper, and cytotoxic T cells as well as dendritic cell-related pathways (Fig. 7D).
Strati cation of LUAD patients based on candidate immunerelated lncRNA We used molecular cancer pro les to stratify patients in cancer informatics studies due to the complexity and heterogeneous of this disease [49]. Based on our earlier results that demonstrated the remarkable prognostic values of immune-related lncRNAs, we investigated whether these immune-related lncRNAs are capable of classifying LUAD patients into clinically-relevant subtypes. Here, we used expression pro les of the immune-related lncRNAs of LUAD patients from TCGA database as genomic signatures with unsupervised consensus clustering approach for discovery of distinct subgroups. According to the results, LUAD patients clustered into four distinct clusters with different patient numbers of 113, 186,121 and 104 (Fig. 8A). We further validated the separation between different patient groups by using a principal component analysis (PCA) [29]. The PCA map showed lower intra-cluster patient-to-patients compared to the inter-cluster similarities with the rst two principal components contributing up to 74% of the total variation. In addition, patients in different subgroups exhibited unique expression pro les (Fig.  8B). To assess whether the strati cation determined by the immune-related lncRNAs were associated with clinical outcomes, a KM survival analysis was adopted for evaluation of prognostic performance in the clusters with respect to overall survival. The immune-related lncRNA-based subtypes were signi cantly (log-rank test, p < 0.01) associated with overall survival in this LUAD population (Fig. 8C). Taken together, these ndings demonstrate that the immune-related lncRNAs could be used as probes to identify novel patient subgroups with signi cant clinical outcomes expected.

Discussion
The immune system plays a vital role in development of LUAD. Technological advancement such as highthroughput technology has enabled partial characterization of biological features of LUAD in ltrating immune cells at gene and mRNA levels [50,51]. Exploring more regulatory mechanisms of the immune system is of primary importance towards designing and innovating immunotherapeutic strategies. In recent years, the discovery of lncRNAs has provided a new perspective on gene regulation in diverse biological processes [52] although the role played by these factors during LUAD remains unclear. In order to provide insights into mechanisms regulating immune responses during LUAD, we obtained and analyzed RNA expression pro les of 101 LUAD patients from GEO database. A total of 11 modules were clustered using WGCNA and 156 lncRNAs in 3 immune-related modules were selected (Fig. 1). Immunerelated clusters showed a signi cant correlation to the overall survival of patients.

Page 12/24
To further verify the candidate immune-related lncRNAs in LUAD, 524 more patients from the TCGA database were divided into training and testing datasets and used to construct a prognostic classi er model. From the LASSO Coxph regression results, we identi ed RP6-91H8.3, RP11-504A18.1 and AC092171.4 as potential protection factors while RP11-1275H24.1, LINC01003 and AC006129.2 were found to be potential risk factors (Fig. 2). Four of these lncRNAs, except RP6-91H8.3 and RP11-1275H24.1 have been studied. Rong-quan He et al. reporting RP11-504A18.1 and AC006129.2 as independent prognostic indicators for soft-tissue sarcoma in patients [53]. Similarly, LINC01003 was reported to be a prognostic predictor for acute myeloid leukemia [54] while high expression of AC092171.4 in hepatocellular carcinoma patients correlated with poor survival [55]. However, the role of these immune-related lncRNAs during LUAD has not been studied. The results this, therefore, provides insights into this little-understood phenomenon.
Patients from TCGA database were divided into high and low ILRS score groups based on multivariate Coxph regression coe cient analysis of each patient. Recently, Huang et al.
[56] constructed a lncRNAbased risk score model to predict prognosis of lung squamous cell carcinoma and con rmed that 9 lncRNAs were strongly related to this disease. In the current study, our results showed that a risk score was highly correlated to various clinical information in LUAD patients (Fig. 3). ILRS based ROC curve further exhibited a 1, 3, and 5-year survival of LUAD patients with AUC values of 0.71, 0.61, and 0.54, respectively, which were superior to traditional clinical indicators in predicting 3 and 5-year survival (Fig. 4). These results indicate that ILRS can serve as a new and independent risk factor for prediction of prognosis in LUAD patients.
Immune in ltration of the tumor can be directly related to prognosis of patients. In addition, tumorin ltrating immune cells acquire unique characteristics in LUAD [57,58]. To determine whether ILRS is related to immune in ltrating cells, we estimated the stromal, and immune scores as well as tumor Purity of each LUAD patient. We found that the immune score and tumor purity were signi cantly correlated with overall survival (Fig. 5), and up to 70% of the in ltrating immune cell types in the test dataset were T, B cells as well as macrophages (Fig. 6). Reports have demonstrated that activation of cytotoxic T cell responses is essential for tumor cell elimination and the adoptive immunotherapy requires isolation and expansion of speci c CD8 + T cells [59,60] To gain more insights into the functional roles of the immune-related lncRNAs, GO and pathway enrichment analyses performed in this study revealed that most of the signi cant items for each enrichment analysis were directly related to immune response (Fig. 7). The participation of immune-related lncRNAs in dendritic cell maturation, T cell activation and cytotoxic T lymphocyte associated antigen-4 (CTLA-4) signaling indicates their importance in regulation of immune responses to LUAD. CTLA-4 antibodies were the rst immune checkpoint inhibitors approved by the US Food and Drug Administration [69]. A combination of checkpoint inhibition therapy is being extensively evaluated for potential clinical bene ts in tumor-related histology [70]. Therefore, it is possible that lncRNAs related to CTLA-4 signaling may guide the application of checkpoint inhibitors.

Conclusions
From the ndings of this study, we con rm outstanding prognostic values of the immune-related lncRNAs following analysis of the role of immune-related lncRNAs in classi cation of LUAD patients. The unsupervised consensus clustering approach reveled four distinct clusters with unique expression pro les. In addition, the study revealed that immune-related lncRNA-based subtypes were signi cantly associated with overall survival in the LUAD population (Fig. 8). Taken together, our results present evidence that different immune-related lncRNA patterns are related to LUAD progression, implying that these speci c lncRNAs may provide additional information for LUAD classi cation, prognosis as well as immunotherapy.

Declarations
Ethics approval and Consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The gene expression data and clinical data in this study can be found online at the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/, GSE81089) and TCGA (https://www.genome.gov/Funded-

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.