MiRNA-based model for predicting the TMB level in colon adenocarcinoma based on a LASSO logistic regression method

Abstract Some patients with advanced colon adenocarcinoma (COAD) are not sensitive to radiotherapy and chemotherapy, and as such, immunotherapy has become the most popular option for these patients. However, different patients respond differently to immunotherapy. Tumor mutational burden (TMB) has been used as a predictor of the response of advanced COAD patients to immunotherapy. A high TMB typically indicates that the patient's immune system will respond well to immunotherapy. In addition, while microRNAs (miRNA) have been shown to play an important role in treatment responses associated with the immune system, the relationship between miRNA expression levels and TMB has not been clarified in COAD. We downloaded miRNA data and mutational files of COAD from the Cancer Genome Atlas database. Differentially expressed miRNAs were screened in the training group, and miRNAs used to construct the model were further identified using the LASSO logistic regression method. After building the miRNA-based model, we explored the correlation between the model and TMB. The model was verified by a receiver operating characteristic curve, and the correlation between it and 3 widely used immune checkpoints (programmed death receptor-1, programmed death-ligand 1, and cytotoxic T-lymphocyte associated protein-4) was explored. Functional enrichment analysis of the selected miRNAs was performed, and these respective miRNA target genes were predicted using online tools. Our results showed that a total of 32 differentially expressed miRNAs were used in the construction of the model. The accuracies of the models of the 2 datasets (training and test sets) were 0.987 and 0.934, respectively. Correlation analysis showed that the correlation of the model with programmed death-ligand 1 and cytotoxic T-lymphocyte associated protein-4, as well as TMB, was high, but there was no correlation with programmed death receptor-1. The results of functional enrichment analysis indicated that these 32 miRNAs were involved in many immune-related biological processes and tumor-related pathways. Therefore, this study demonstrated that differentially expressed miRNAs can be used to predict the TMB level, which can help identify advanced COAD patients who will respond well to immunotherapy. The miRNA-based model may be used as a tool to predict the TMB level in patients with advanced COAD.


Introduction
Colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide, accounting for a third of the morbidity and mortality rates in both males and females. It is estimated that there will be 147,950 new CRC cases and 53,200 deaths from CRC in the United States in 2020. [1] Colon adenocarcinoma (COAD), which is one type of CRC, represents the majority of all of the CRC cases in the United States. [2] Traditional treatments for COAD mainly include surgery, radiotherapy, and chemotherapy. With early screening and effective treatment, a 5-year survival rate of 90% can be achieved. [3] However, due to the subtle symptoms in the early stage, some patients have metastases at the time of initial diagnosis, resulting in a 5-year relative survival rate of only 14%, although they acquire systemic therapy. [3] Therefore, it is urgent to explore effective treatment schemes for patients with advanced COAD.
Immunotherapy is to control the time and place of immune responses to increase antitumor activity through an immune checkpoint blockade. Immunotherapeutics, including high dose interleukin-2 and antibodies that block programmed death receptor-1 (PD-1)/programmed death-ligand 1 (PD-L1), and cytotoxic T-lymphocyte associated protein-4 (CTLA-4) can induce durable responses across numerous types of solid tumors [4][5][6][7][8][9][10] and hematologic malignancies. [11,12] Immunotherapy has been shown to be highly effective in cancer. The use of tumor PD-L1 expression as a biomarker has been studied extensively. [13] However, there is a general need to better identify responders, as only 25% to 30% of patients under checkpoint treatment show long-term responses and these might not be exclusively identified by PD-L1 expression. [14,15] It is an unmet need for biomarkers that will identify patients more likely to respond to PD-1/PD-L1 blockade as well as other immunotherapeutics.
Thus, identifying patients who are most likely to benefit from immunotherapy is crucial. Tumor mutational burden (TMB), measured by hybrid based NGS, is a new biomarker for response to immunotherapy. [16,17] Cancers are caused by the accumulation of somatic mutations that can result in the expression of neoantigens. [18] A high TMB can lead to modifications of the proteins encoded by the mutated genes, which would make them more easily recognized by the immune system because of the neoantigens. [19] It has been suggested that tumors with numerous antigens are more sensitive to immune checkpoint inhibitors (ICIs) [20,21] and have a higher TMB level, and have been shown to exhibit a strong response to immunity inhibitors in non-small cell lung carcinoma, melanoma, and CRC. [18,22,23] High TMB can lead to modifications of the proteins encoded by the mutated genes. The modified proteins may be recognized by the immune system as "nonself" and activate specific. [24] The translation of the mutated gene into a modified protein requires posttranscriptional regulation, and microRNAs (miRNAs) are important molecules involved in posttranscriptional regulation.
miRNA are a class of small RNAs with no coding potential. By complementary pairing to the 3 0 -untranslated region of messenger RNA, miRNAs exert posttranscriptional control of protein expression, which are often expressed aberrantly in cancer. [25,26] Since miRNAs are involved in the regulation of various cancer hallmarks, miRNAs may be promising outcome predictors for various types of cancers. [27][28][29] Studies have shown that miRNAs play important roles in mediating and controlling several immune and cancer cell interactions. [30] Therefore, we hypothesized that miRNA expression patterns could be used as biomarker for predicting TMB levels different. To confirm our hypothesis, we downloaded the datasets of COAD, including mutation annotation files and miRNA expression profiles from the Cancer Genome Atlas (TCGA) database, in order to establish an miRNA-based model for predicting TMB levels in COAD.

Data acquisition and processing
The miRNA expression profiles contained 458 samples (450 COAD tissue samples and 8 matched healthy colon tissue samples) were obtained from the TCGA database (https://gdc. cancer.gov/). The mutation data of COAD were also downloaded from TCGA. All data were downloaded from public databases (TCGA), so ethical approval did not apply to this study. TMB was defined as the number of somatic variants per megabase (MB) of genome. [31] This study estimated the size of the exome to be 38 MB. [32] We chose the Varscan2 pipeline as the somatic mutation calling workflow. We also defined those genes with mutations <10 per MB as a low TMB level and with ≥10 mutations per MB as a high TMB level. [33] In total, 383 samples (309 low TMB samples and 74 high TMB samples) with miRNA profiles and TMB expression values were identified for further analysis using the "limma" package in R (version 4.0.0; https://www.r-project.org/). These samples were divided into the training set (60%) and test set (40%) using "caret" package based on clinical characteristics (Table 1, all P-value > .05) after using bootstrapping method.

Screening of differentially expressed miRNAs
The miRNAs that were expressed <10% in the COAD samples were excluded from the training set. The differentially expressed miRNAs in the training set were analyzed using the "limma" package. [34] The Table 1 Information of the training and test sets. fold change (FC) in the expression of each miRNA was calculated, and the miRNAs that met the requirement of jlogFCj > jlog1.5j (FC = 1.5) and P-value < .01 (adjusted by false discovery rate) were considered as differentially expressed. The expression of miRNA was visualized in a heatmap using "pheatmap" package.

miRNA-based model for predicting the TMB level
Least absolute shrinkage and selection operator (LASSO) regression, which was used to reduce the dimension in multiple highly correlated features, helped with the selection of optimal miRNAs in this study. Based on the remaining miRNAs weighted by their own coefficients in LASSO regression, we constructed a model to estimate TMB levels. A model index for each sample could be created by the following formula: where the "Intercept" is a constant of the created model, "Exp" represents the expression value of a selected miRNA, and "Coef" represents the respective weighting coefficient. An index of ≥0.5 was considered as a high TMB level, and an index of <0.5 was considered to be low TMB. The above steps were completed with the "glmnet" package. [35] We performed the validation in the test set to estimate the accuracy and applicability of the prediction model. The efficiency of the model was assessed by 5 frequently used aspects of accuracy: sensitivity, specificity, positive predictive value, negative predictive value, and area under curve (AUC). ROC curves were drawn and compared using the "pROC" package [36] in R.

Principal component analysis (PCA)
In order to make the model as recapitulative and low-dimensional as possible, we performed PCA within gene profiles of differentially expressed miRNAs before and after the feature dimension reduction in LASSO. The above steps were performed using the "ggplot2" package in R. The outputs of the PCA are shown in 2-dimensional scatter plots ( Fig. 3B and C).

Correlation between the miRNA-based model and the expression of 3 immune checkpoints and TMB levels
In the total set, the model index of each sample was calculated. We then estimated the linear relationship between the model and TMB, as well as the expression of 3 widely known immune checkpoints (PD-1, PD-L1, and CTLA-4) using the "limma," "ggplot2," and "ggpubr" packages in R.

Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis
DIANA-mirPath web-server45 (http://snf-515788.vm.okeanos. grnet.gr/) was used to perform KEGG pathway and GO enrichment analysis for selected miRNAs of the model. The TarBase 7.0 [37] tool in the DIANA-mirPath web-server was utilized in this study. A P-value of <0.01 was considered to be significantly enriched. The results of GO and KEGG pathway analysis were visualized in the bubble plots using the"ggplot2" package in R.

Target genes of selected miRNA
The MiRDB (http://mirdb.org/), miRTarBase (http://mirtarbase. mbc.nctu.edu.tw/php/index.php), and TargetScan (http://www. targetscan.org/vert_72/) databases were applied to these miRNAs in order to investigate their target genes. We identified these genes that were simultaneously recognized by the 3 above database as target genes of selected miRNAs. The results were also visualized in Venn diagrams using the "VennDiagram" package in R.

Differentially expressed miRNAs
The workflow of our research is shown in Figure 1. Conventional clinicopathological characteristics did not differ significantly between the training sets and the test sets (Table 1). There were 230 samples in the training sets, including 181 with low TMB levels and 49 with high TMB levels. A total of 63 differentially expressed miRNAs, including 39 upregulated miRNAs and 24 downregulated miRNAs, met the cut-off criteria (P < .01 and jlogFCj > jlog1.5j) in the training set. Figure 2 shows a heat map representing the results of the differentially expressed analysis.
The expression values of differently expressed miRNAs were related to the TMB level of the samples, which could distinguish the samples with high expression of TMB from those with low expression.

Enrichment analysis
The GO and KEGG enrichment analysis of these 32 miRNAs are shown in Figure 5A and B, from which it could be seen that they were enriched in numerous immune-related biological processes, as well as cancer-related pathways. This result suggested that these miRNAs play nonnegligible roles in cancer-related immune processes.

Prediction of the target genes of the 32 miRNAs via miRDB, miRTarBase, and TargetScan databases
After a total of 32 miRNA were identified by the LASSO regression method, we visualized the forecast results of the above 3 databases via drawing Venn diagrams. Certain genes were identified as the target genes of those selected miRNAs, including miR-92b-3p, miR-99a-5p, and miR-146b-5p ( Fig. 6A-C).

Discussion
Although immunotherapy shows a satisfactory curative effect for tumor therapy, only a fraction of patients benefit from it. One reason for the low success rate may be that low TMB levels cannot trigger a proper immune response in the body. A previous study showed that high somatic mutation loads were associated with prolonged progression-free survival. [38] Further research found that the large proportion of mutant neoantigens in mismatch repair deficient cancers make them sensitive to immune checkpoint blockade, regardless of the tissue origin of the cancer cells. [39] The TMB, as a genomic marker and predictor of ICIs treatment response, has been reported in many cancers, such as lung cancer [40] and bladder cancer. [41] Thus, predicting the TMB levels of patients with cancer will allow for a more personalized treatment plan. Previous studies have found that miRNAs simulate the therapeutic efficacy of ICIs via regulating the expression of checkpoint receptors either directly or indirectly. [25] Recent advances have revealed that miRNAs are being recognized as an important role in ICI therapy. Some studies have begun to highlight the prognostic value of a miRNA-based model for CRC. [42] However, it is unknown if miRNA expression is related to TMB in COAD. Therefore, we hypothesized that miRNA  expression could be used as a biomarker to predict the TMB level in advanced COAD patients, which would allow clinicians to identify patients who are more sensitive to immunotherapy. Thus, we established a miRNA-related model to verify this hypothesis. We identified 32 miRNAs for model construction.
The ROC curves showed that this model was very accurate for predicting the TMB level in both the training and test sets (Fig. 4).
For this result, we believe that the accuracy of this 32 miRNAbased model was so high because of the typical COAD cases in TCGA and the relatively large number of miRNAs screened. Therefore, external verification that contains more cases are necessary in the future. The PCA showed that the 32 selected miRNAs were able to distinguish patients with different TMB levels in 2 dimensions, which was similar to the results of the PCA of the 63 differentially expressed miRNAs. This indicated that the miRNA-based model was robust and available.
The results of the correlation analysis between the model index and the 3 immune checkpoints showed that this model has a positive correlation with PD-L1 and CTLA-4, but had no correlation with PD-1. This discovery aroused our interest. PD-L1 expression is strictly associated with miRNAs function in cancer cells. [43] Cancer cells highly express PD-L1 which help them in evading immune responses. [44] PD-L1 is a transmembrane protein, highly expressed on antigen presenting cells and is involved in imparting self tolerance. A previous study showed that certain miRNA expression could decrease PD-L1 expression in patients with COAD. [45] In addition, Several studies also have revealed that miRNAs could decrease PD-L1 expression by binding to 3 0 -untranslated region of PD-L1, suggesting that miRNAs were negatively related with PD-L1 expression. [46][47][48] Moreover, miRNAs regulate PD-L1 expression and have potential therapeutic uses. For instance, miR-200/ZEB axis is strongly correlated with high PD-L1 expression, consistent annihilation of CD8+ cell infiltration, and high EMT scores. [49] Another miRNA, miR-197-5p is negatively correlated with PD-L1 expression via CDC28 protein kinase regulatory subunit 1B/ STAT3 axis. [50] Study had shown that circFGFR1 could directly interact with miR-381-3p and subsequently act as a miRNA sponge to upregulate the expression of the miR-381-3p target gene C-X-C motif chemokine receptor 4, which promoted nonsmall cell lung carcinoma progression and resistance to antiprogrammed cell death 1-based therapy. [51] However, in our research, this miRNA expression signature was positively related with PD-L1 expression. Besides, there are few related literatures between miRNAs and PD-1, and the regulatory network between them is still unclear. Study have found that the miRNAs were positively related with PD-1 expression in CRC, [52] But it was reversed in another study. [53] This reveals that the relationship between miRNAs and PD-1 expression is still controversial. And the result of our research may be another relationship between miRNAs and PD-1. Thus, future researches are required to explore the underlying mechanism between these TMB-related miRNAs and PD-L1 and PD -1 expression. Besides, the miRNArelated expression signature showed a median positive correlation with TMB, indicating that this miRNA-related expression signature predicted the TMB level from a biological perspective of the anticancer immune response. This result was consistent with our expectations.
GO analysis demonstrated that 32 miRNAs were involved in a number of important biological processes associated with the immune response, such as "immune system process," "innate immune response," and "toll-like receptor signaling pathway." KEGG analysis indicated that the 32 miRNAs were also involved in a relatively unique pathway associated with various human tumors, such as " CRC," bladder cancer," and "pancreatic cancer." These initial results indicated that the miRNA-based model was feasible for predicting TMB levels in patients with COAD. Although this result is exciting, more research is needed to verify this result.
Although this miRNA-based model showed excellent results, there are still several potential limitations in the present research. First, the threshold of TMB levels may vary owing to different methods. [54] Second, the number of miRNAs in this model is large compared with another study, [33] which may be responsible for the high accuracy of ROC curves. Third, further studies with a larger sample size are needed in order to validate the forecast effect of the signature of the 32 miRNAs. Fourth, further validation of the selected miRNA target genes is required.

Conclusions
We demonstrated that the differential expression patterns of 32 miRNAs have a high correlation with the TMB values of COAD patients. The miRNAs-based model may provide clinicians with a predictor of TMB levels in advanced COAD patients. The results from this study have the potential to help distinguish patients with a high TMB who will benefit from immunotherapy.