An immune-related lncRNA risk coefficient model to predict the outcomes in clear cell renal cell carcinoma

Objective: Using model algorithms, we constructed an immune-related long non-coding RNAs (lncRNAs) risk coefficient model to predict outcomes for patients with clear cell renal cell carcinoma (ccRCC) to understand the infiltration of tumor immune cells and the sensitivity to immune-targeted drugs. Methods: Open genes data were downloaded from The Cancer Genome Atlas and The Immunology Database and Analysis Portal, and immune-related lncRNAs were obtained through Pearson correlation analysis. R language software was used to obtain differentially expressed immune-related lncRNAs and immune-related lncRNA pairs. The model was constructed using least absolute shrinkage and selector operation regression analysis, and receiver operator characteristic curves were drawn. The Akaike information criterion was used to distinguish the high-risk from the low-risk group. We also conducted correlation analysis for the high- and low-risk subgroups. Results: We identified 27 immune-related lncRNAs pairs, 16 of which were included in the model construction. After merging clinical data, the areas under the curve of 1 -year, 3-year, and 5-year survival times of ccRCC patients were 0.867, 0.832, and 0.838, respectively. Subgroup analyses were conducted according to the cut-off value. We found that the high-risk group was associated with poor outcomes. The risk score and tumor stage were independent predictors of the outcome of ccRCC. The risk model predicted specific immune cell infiltration, immune checkpoint gene expression levels, and high-risk groups more sensitive to sunitinib targeted therapy. Conclusion: We obtained prognostic-related novel ccRCC markers and risk model that predicts the outcome of patients with ccRCC and helps identify those who can benefit from sunitinib.


INTRODUCTION
Renal cancer is among the top ten most common cancers worldwide; it accounts for 5% of new cases each year. Approximately 45,520 people were diagnosed with renal cancer in the United States in 2020 [1]. Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cancer, accounting for about 85% of the total cases, with a male-to-female ratio of 1.7:1. Most patients are middle-aged and elderly, with an average age of 64 [2]. AGING CheckMate 025, nivolumab was more effective than everolimus in treating patients with advanced ccRCC who had previously received treatment (the 5-year survival rate was about 26% vs. 18%). The (Food and Drug Administration) FDA approved Nivolumab for the treatment of ccRCC [8]. Although some of these drugs can improve outcomes, their effectiveness is limited [9]. Therefore, we need to explore the mechanisms of ccRCC occurrence and development and identify new biomarkers to predict related drug sensitivity.
The tumor microenvironment of ccRCC is heterogeneous, and several studies showed that the degree of immune cell infiltration in the tumor microenvironment was related to outcomes. Studies showed that high levels of CD8+ T cell infiltration in ccRCC was associated with poor outcome. Macrophages were also an essential part of the tumor microenvironment; high degrees of infiltration of M2type macrophages are associated with tumor invasion and poor outcome of ccRCC [5]. Therefore, it is necessary to identify immune-related tumor prognostic markers.
Long non-coding RNA (lncRNA) does not code for protein. The length is more than 200 bp. It was initially considered a by-product of RNA polymerase II transcription and did not have biological functions. Many studies showed that lncRNA has a conserved secondary structure that can interact with proteins, DNA and RNA, and participate in regulating various biological processes, especially in tumors. It plays critical regulatory roles in tumors, including chromatin modification, transcription activation and inhibition, post-transcriptional mediation, and miRNA-induced molecular interference with gene expression [6][7][8].
Many studies showed that changes in molecular biology are closely related to the occurrence and development of ccRCC. Transcriptome studies described the abnormal expression of specific long non-coding RNAs, and the occurrence and progression of ccRCC have a close relationship. The expression level of lncRNA HOTAIRM1 in ccRCC decreased and inhibited the hypoxic pathway of tumor development [10]. Another study showed that lncRNA URRCC and EGFL7/P-AKT/FOXO3 signaling was related to poor outcomes and promoted the proliferation and invasion of ccRCC [11]. Another study showed that lncARSR transported by exosomes promoted the expression of AXL and c-MET in ccRCC cells by competitively binding miR-34/miR-449, rendering ccRCC patients resistant to sunitinib [12]. The binding lncRNA-LET and miR-373-3p induced the up-regulation of DKK1 and TIMP2 levels and reduced the anti-tumor effect of lncRNA-LET-mediated by ccRCC cells [13]. Other studies demonstrated that immune-related lncRNA has vital clinical significance in predicting outcomes in patients with ccRCC and as a target for targeted therapy [14][15][16][17][18][19]. Therefore, the present study aimed to construct an immune-related lncRNA risk coefficient model using a model algorithm, lncRNA pairing, and iteration to predict outcomes in patients with ccRCC, understanding the tumor immune cell infiltration and the sensitivity of targeted drugs.

RESULTS
The illustration of summary highlight was provided in Figure 1.

Analysis of differential expression of immunerelated lncRNAs in ccRCC
The transcriptome and immune gene-related data of ccRCC were obtained from The Cancer Genome Atlas (TCGA) database and The Immunology Database and Analysis Portal (ImmPort). The Ensembl database was used to annotate and distinguish transcriptome data. Using Pearson correlation analysis, with co-expression correlation coefficient >0.4 and P < 0.001 as the identifying criteria, 433 immune-related lncRNAs were identified. We used differential expression analysis, with |log Fold Change| >1.5 and false discovery rate (FDR) <0.05 as the identifying criteria. We obtained 90 differentially expressed immune lncRNAs, and the gene heatmap (Figure 2A) was generated using R software. Sixteen lncRNAs expressions were downregulated, and 74 lncRNAs were upregulated in ccRCC ( Figure 2B and Table 1).

Establishment of differentially expressed immune lncRNA pairs and risk coefficient scoring model
By matching the 90 differentially expressed immune lncRNA pairs for multiple cycles, a total of 2663 differentially expressed immune lncRNA pairs were obtained. Next, 27 immune lncRNA pairs were identified using least absolute shrinkage and selector operation (LASSO) regression analysis and Cox univariate regression analysis ( Figure 3A). Then, a Cox multivariate regression analysis was performed based on these 27 immune lncRNA pairs, 16 of which can participate in constructing the risk coefficient scoring model ( Figure 3B) and the risk coefficient of each immune lncRNA pair was obtained ( Table 2).

Evaluation of the prognostic predictive power of the risk model
Above 27 prognostic-related immune lncRNA pairs were used to construct the 1-year, 3-year, and 5-year AGING receiver operator characteristic (ROC) curves of patients ( Figure 4A), and the 1-year area under the curve (AUC) was calculated to be the largest AUC of 0.867 ( Figure 4B). In addition, the 3-year and 5-year AUC obtained were 0.832 and 0.838, respectively, which also had predictive power. Through the best fit, the cut-off value for distinguishing between high-and low-risk groups of ccRCC patients was 2.822. We included 190 patients in the low-risk group and 360 patients in the high-risk group.

Analysis of the correlation of clinical indicators by the risk model
The relationship between the risk factor score and the risk subgroup patients ccRCC was analyzed via R language software ( Figure 5A). According to the time process, the relationship between the patient's survival status and the risk coefficient score were obtained ( Figure 5B), and the Kaplan-Meier curve was constructed according to the survival status of the high-and low-risk groups ( Figure  5C). The result was that the survival rate of patients in the low-risk group was significantly higher than that in the high-risk group (P < 0.001).
The heatmap in Figure 6A described the relationship between the level of risk scores and clinically relevant indicators. We found that the survival status of patients with ccRCC (P < 0.001), tumor grade (P < 0.001), tumor stage (P < 0.001), T stage (P < 0.001), M stage (P < 0.001), and N stage (P < 0.01) were related to risk coefficient score significantly. It can be seen from the box plot we constructed that ccRCC patients with a higher risk factor had a higher chance of death ( Figure  6B). Furthermore, tumor grade ( Figure 6C), tumor clinical stage ( Figure 6D), T stage ( Figure 6E), N stage ( Figure 6F), and M stage ( Figure 6G) were also higher.
A Cox univariate and a multivariate regression analysis were performed on the risk score and clinical correlation indicators. Then the R language software's survival package was used to visualize the data, and forest maps were done ( Figure 7A and Figure 7B). It was found that the tumor grade, tumor stage, TNM stage, and risk coefficient score were related to the outcome of the Cox univariate analysis, but in the Cox multivariate analysis, age, gender, and risk coefficient score were independent predictors of outcome. The ROC curve of clinical-related indicators and the 1-year risk coefficient score were compared in the same chart ( Figure 7C). The result was that the patient's risk coefficient score (AUC = 0.867) and tumor stage (AUC = 0.868) had the highest predictive efficacy.  Table 1. Immune-related lncRNAs of ccRCC obtained after differential expression analysis. Abbreviations: logFC: log fold change; FDR: false discovery rate.

Correlation analysis between risk coefficient model and immune cell infiltration
XCELL, TIMER, QUANTISEQ, MCPCOUNTER, EPIC, CIBERSORT-ABS, and CIBERSORT were used to estimate the proportion of immune cells in these samples of ccRCC patients based on marker gene and deconvolution algorithm. Pearson correlation test was used to analyze the correlation between the risk coefficient model and tumor immune infiltrating cells with screening criteria P < 0.05, and R language software was used for data visualization (Figure 8). We found that the samples of the high-risk group were positively correlated with the infiltration of NK cells, regulatory T cells, and M1 macrophages in ccRCC and negatively correlated with the infiltration of neutrophils in ccRCC.

Correlation analysis of risk coefficient model with genes
Immune-targeted therapy is one of the most popular drugs for the treatment of renal clear cell carcinoma. We further explored the relationship between the risk coefficient model and genes, and found that among the high-risk patients, the expression levels of CTLA4 (P < 0.001; Figure 9A), LAG3 (P < 0.001; Figure 9B), PDCD1 (P < 0.001; Figure 9C), GAL9 (P < 0.001; Figure 9D), and TIGIT (P < 0.001; Figure 9E) increased. The expression level of PDCD1LG2 increased but not significantly (P > 0.05; Figure 9F). The expression levels of CD274 (P < 0.01; Figure 9G), HAVCR2 (P < 0.01; Figure 9H) decreased in this model. These genes are the potential therapeutic targets for ccRCC.  Red dots: upregulation with significant differential expression, green dots: downregulation with significant differential expression, black dots indicate no significant difference.

Correlation analysis between risk coefficient model and targeted therapy drugs
Targeted therapy drugs are the first-line therapy for patients with advanced ccRCC. We also analyzed the relationship between the risk coefficient scoring model and the sensitivity of targeted therapy drugs. The IC50 was used to evaluate the efficacy of drugs. Lower IC50 suggests higher sensitivity. We found that the high-risk group was associated with higher sensitivity of sunitinib  ( Figure 10A), which was statistically significant (P = 3 e-08); axitinib ( Figure 10B), bevacizumab ( Figure 10C), pazopanib ( Figure 10D), and sorafenib ( Figure 10E) were not significantly different in the high-and low-risk groups.

DISCUSSION
Many studies found that transcriptome RNA expression levels are related to the outcomes of malignant tumors [20][21][22]. Recent research on the role of non-coding RNA in ccRCC is also a focus of research [10,11,13,23]. In previous studies, lncRNA-related models of ccRCC were constructed based on the expression level of transcriptome data [14,16,24]. In the present study, we used immune-related lncRNAs pairs to construct risk coefficient models to assess the outcome of patients with ccRCC, not based on expression levels of lncRNA. We first used TCGA and ImmPort to obtain the lncRNA and immune-related gene data of patients with ccRCC and then used R software to identify immunerelated lncRNAs. Then, the differential expression of ccRCC and normal samples adjacent to the cancer were analyzed, and immune-related lncRNA pairs were obtained. We obtained the risk coefficient of each sample of ccRCC patients and established a risk coefficient model using Cox univariate factors, multivariate regression analysis, and LASSO regression analysis. By generating ROC curves, we found that the AUC for one-year outcome was the largest, and the Akaike information criterion (AIC) optimal fitting was used to obtain the critical value for distinguishing high from low-risk groups. The survival analysis of the highand low-risk groups showed that the survival rate of patients in the low-risk group was significantly higher than that of the high-risk group (P < 0.001).
We also calculated the correlation between the risk coefficient score of each ccRCC sample and various clinical indicators. We found that age, gender, and risk coefficient score were independent predictors of outcome through Cox multivariate regression analysis. We also constructed the ROC curve of clinical-related indicators, which compared the ROC curve of the 1year risk coefficient score in the same chart. We found that the one-year outcome risk coefficient score and tumor stage were the best predictors of ccRCC outcome, suggesting the reliability of the risk coefficient model.
To analyze the relationship between risk factor score and immune cell infiltration, the immune cell infiltration data of patients with ccRCC, we used XCELL, TIMER, QUANTISEQ, MCPCOUNTER, EPIC, CIBERSORT-ABS, and CIBERSORT and correlation analysis. We found that the level of infiltration of NK cells, regulatory T cells, and M1 macrophages in the high-risk group was high. AGING functions [25]. A study showed that high levels of infiltration of dendritic cell quiescence, dendritic cell activation, mast cell quiescence, mast cell activation, and eosinophils were associated with a good outcome in ccRCC patients, while B cell memory, T cell follicular helper cells, and T cell regulation associated with poor outcome in ccRCC [26]. Xu et al. found that HK3 promoted the infiltration of monocytes and macrophages that present cell surface antigens and regulated the critical genes PD-1 and CTLA-4 of debilitating T cells, thereby affecting the immune escape process [27].
We also performed correlation analysis on the risk model for immune checkpoint genes and targeted therapy drugs and found that expression levels of CTLA4, LAG3, PDCD1, GAL9, and TIGIT increased, while expression levels of CD274 and HAVCR2 decreased in samples from patients in the high-risk group; these can be used as immune targeted therapy with potential therapeutic targets.
ccRCC discovered early and mid-term can be removed surgically, while patients with advanced ccRCC experience poor outcomes because of metastasis and missing the optimal time for surgery. Immune-targeted drug therapy has become the first-line treatment for patients with advanced ccRCC. Of these, vascular endothelial growth factor monoclonal antibody and tyrosine kinase inhibitors are the primary drugs used for anti-tumor angiogenesis therapy [28][29][30]. However, changes in the tumor microenvironment may be associated with the emergence of resistance of ccRCC to immune-targeted drugs. Therefore, identifying sensitive drugs may reduce treatment costs and reduce the side effects of immune-related drugs.
In the risk coefficient model, we included sunitinib, axitinib, bevacizumab, pazopanib, and sorafenib and found that patients with ccRCC in the high-risk group were more sensitive to sunitinib than the low-risk group.
This study has some limitations, although we adopted rigorous methods and algorithms to build the model. It is necessary to validate the reliability of our risk  Compared with the low-risk group, the sunitinib IC50 value of high-risk patients was lower (A), which was statistically significant (P = 3e-08). Axitinib (B), bevacizumab (C), and pazopanib (D), and sorafenib (E) were not significantly different between the high-and low-risk groups.
AGING coefficient model using external data. In a future study, we will collect more clinical data and expand the sample size.
Outcome-related novel ccRCC markers and risk coefficient model were obtained by constructing immune-related lncRNA pairs, which can predict outcomes of ccRCC and help distinguish those who can benefit from sunitinib.

Data acquirement
The GDC Data Transfer Tool was used to download open transcriptome data from TCGA (https://cancergenome.nih.gov/) [31] of ccRCC and normal tissues adjacent to the cancer, which included 539 cases of ccRCC samples and 72 cases of normal tissue samples adjacent to cancer. Then gene transfer format (GTF) files were downloaded using Ensembl (http://asia.ensembl.org) [32] to annotate and distinguish the mRNA and lncRNA of the transcriptome data. Immune-related genes were obtained from ImmPort (http://www.immport.org).

Differential expression analysis of immune-related lncRNAs
Immune-related lncRNAs were identified based on the co-expression strategy and Pearson correlation analysis according to the co-expression correlation coefficient > 0.4 and P < 0.001. DEirlncRNA was selected using the limma package of R language software [33] with |log Fold Change|>1.5 and FDR < 0.05. Obtained lncRNAs were visualized using the heatmap package.

Construction of immune-related lncRNA pairs
DEirlncRNAs were identified using multiple rounds of pairing. Parameter values of 0 or 1 were used for definition, and α was defined as the parameter value. If the expression of lncRNA A was greater than that of lncRNA B in an immune-related lncRNA pair in a sample, then the α value of the lncRNA pair was 1; otherwise, the value of α was 0. If the ratio of the α value (either 0 or 1) of the immune-related lncRNA pair in all samples is less than 80%, it means that the immune-related lncRNA pair was an effective match; otherwise, it needs to be re-paired.

Acquisition of clinical data and establishment of model
First, clinical data related to ccRCC were downloaded from TCGA. Then, the limma package of R software was used to match the immune-related lncRNA pairs in the previous step. We then took the intersection and deleted the repeated clinical data with no follow-up time. A single factor regression analysis was performed on the immune-related lncRNA pairs initially obtained, and the immune-related lncRNA pairs related to the survival status were found. The significance screening criterion was P < 0.01.
To prevent over-fitting, the glmnet package of R language was used to perform LASSO regression analysis [34] on the obtained immune lncRNA pairs, run 1000-repeated random cycles, and immune lncRNA pairs with a matching frequency of more than 100 times identified those with P < 0.05 after the second crossvalidation. The best pairing combination was selected to obtain immune lncRNA pairs that can participate in constructing the Cox risk coefficient model. By constructing Cox univariate and multivariate analysis models, the risk coefficient of each immune lncRNA pair related to the outcome was obtained, and the risk score of each patient's tumor sample was determined. The total risk score of each ccRCC patient sample was equal to the sum of the expression amount of each immune lncRNA pair in the sample multiply risk coefficient. The formula is following: The Cox analysis results were visualized using the survminer and survival packages of R software.

Construction of the ROC curve with risk coefficient model
ROC curves were constructed using the survivalROC package of the R software, which included ROC for 1-, 3-, and 5-years and the AUC values were calculated to determine the value predicted by the model. We found that the 1-year ROC curve had the largest AUC value. According to the AIC best fit [35], it was possible to distinguish low-risk with high-risk patients by finding a critical value with the largest sum of specificity and sensitivity.

Clinical correlation analysis with risk coefficient model
Survival and survminer packages of the R language software were used to compare the survival differences between the high-and low-risk groups. P < 0.001 indicated a significant difference. A Kaplan-Meier curve was constructed to visualize the data. The relationships between the risk score and the previously obtained clinical indicators (survival status, age, gender, AGING tumor grade, tumor stage, and T, N, and M stages) were analyzed using the chi-square test. The relationships between the risk score and the different subgroups of clinical indicators were analyzed using the Wilcoxon rank-sum test. The limma package and ggpubr package of the R language software were used to visualize the data. To determine whether the risk score can be used as an independent predictor related to the outcome of patients with ccRCC, we performed Cox univariate and multivariate regression analysis on the risk score and clinical correlation indicators, which used the hazard ratio to evaluate. P < 0.05 was the identification criterion, and the survival package of R software was used to visualize the data. To compare the accuracy of the risk score and clinically relevant indicators in predicting survival and outcome, we compared the ROC curves obtained for the 1-year follow-up with the ROC curve of clinically relevant indicators in the same graph.

Gene correlation analysis
We found that CD274, CTLA4, HAVCR2, LAG3, LGALS9, PDCD1, PDCD1LG2, and TIGIT were abundantly expressed in ccRCC samples. To determine whether these genes differed between the high-and low-risk groups of the risk model, the limma and ggpubr packages of the R language software were used to analyze and visualize the data using violin charts.

Correlation analysis of targeted drugs
To determine whether there was a difference in patients' response in the high-and low-risk groups of ccRCC patients to targeted drugs, the half-inhibition rate (IC50) of the drug was used as an index to measure drug sensitivity. The data were analyzed and visualized using the limma package, ggpubr, ggplot, and pRRophetic package in R software.