Construction of an M1 macrophage-related lncRNA signature for predicting the tumor immune microenvironment

Bioinformatic analysis screened a 7-lncRNA signature that predicted M1 macrophage infiltration and prognosis.


Introduction
A total of 1.9 million colon cancer patients were newly diagnosed in 2020, causing over 900,000 deaths, ranking second in all cancer-related deaths [1] .The incidence rate continues to rise around the world.Studies have identified that bad lifestyles and habits, such as smoking [2,3] , high-fat diet [4] and alcoholism [5] , increase the incidence risk of colon cancer.Apart from external environmental factors, approximately 2% to 5% of colon cancer patients are derived from intestinal familial genetic diseases such as Lynch syndrome and familial adenomatous polyposis (FAP) [6,7] .Gene mutations in hMSH2 and hMLH1, which are responsible for mismatch repair (MMR), were detected in most Lynch syndrome patients, which greatly increases the risk of colon cancer oncogenesis if left untreated [8] .APC loss or mutation is a hazard factor that contributes to Wnt-β-catenin signaling pathway dysregulation, further leading to FAP or even colon cancer [8,9] .
Long noncoding RNAs (lncRNAs) are defined as >200 nt noncoding RNAs that cannot be translated into functional proteins.Recently, lncRNAs, as regulatory molecules, have displayed attractive prospects in the functional regulation of cells, especially in the field of cancer research.Abnormal lncRNA transcription has been found in various types of cancer.For instance, PCGEM1 and PRNCR1 can bind to the androgen receptor and promote prostate cancer progression [10] .In pace with the development and rise of immunotherapy, light has been shed on the relationships between lncRNAs and the tumor immune microenvironment [11−14] .
Macrophages play a crucial role in innate and adaptive immune processes, which work for pathogen or senescent cell clearance and homeostasis maintenance.A classic classification of macrophages is based on specific functions, categorizing macrophages into M1 and M2 subtypes.M1 macrophages are generally responsible for pro-inflammatory functions and kill cancerous cells [15,16] , while M2 macrophages mainly exert repairing or anti-inflammatory activities in injured tissue [15,17,18] and pro-tumor function in the background of cancer [19,20] .Tumor-associated macrophages (TAMs) are now attracting increasing attention.Studies have demonstrated that a complex TME can reprogram macrophages into different subtypes [21−23] .At the same time, the question about where TAMs originate also arose, and many studies have described that TAMs are recruited by chemotactic factors secreted from tumor cells [24] , such as CCL2 [25] , CCL5 [26] , and CXCL12 [27] .
In this study, we mainly focused on TME M1 macrophages, and bioinformatic approaches were used to find differences in transcriptome levels between different COAD patient populations, which were divided by M1 proportions.M1 macrophage-related lncRNAs were identified and further screened by survival analysis.The remaining lncRNAs were collected into a signature to test its prognostic value and performance in the TCGA-COAD cohort and other independent validation datasets.

Tumor microenvironment infiltrated immune cell proportion
The CIBERSORT algorithm [28] was used to deconvolute immune cell proportions from bulk RNA-Seq data based on the LM22 immune cell signature, which contains B cells (naive and memory), plasma cells, CD8 T cells, CD4 T cells, follicular helper T cells, regulatory T cells (Tregs), gamma delta T cells, NK cells (resting and activated), monocytes, macrophages (M0, M1, M2), dendritic cells (resting and activated), mast cells (resting and activated), neutrophils and eosinophils.M1 macrophages were extracted for further evaluation and analysis.

Differentially expressed lncRNA detection and lncRNA signature construction
⩽ 0.05 DESeq2 [29] was used to find differentially expressed genes between the high (M1-Mφ high ) and low (M1-Mφ low ) M1 macrophage infiltration groups, which were stratified by the survival package and survminer package in R. |LogFC| > 1 and adjusted p value were set to filter upregulated and downregulated genes.LncRNA information was downloaded from the lincpedia database (https://lncipedia.org/) [30].The upregulated genes in the M1-Mφ low group were intersected with the lncRNA list to obtain upregulated lncRNAs in the M1-Mφ low group.These lncRNAs were further narrowed by survival analysis to filter survival-correlated lncRNAs for lncRNA signature construction.

Gene set enrichment analysis and functional enrichment analysis
The score of the lncRNA signature was calculated by GSVA, and differential biological processes (BPs) between the lncRNA score high and lncRNA score low groups were analyzed through gene set enrichment analysis (GSEA) software developed by the Broad Institute [31,32] .Gene sets were downloaded from Molecular Signatures Database v7.5.1, GSEA official website (http://www.gsea-msigdb.org/gsea/msigdb/index.jsp), C5 ontology gene sets and H hall mark gene sets ⩽ ⩽ were queried for downstream analysis.In addition, mRNAs negatively correlated (cor −0.3, p value 0.05) with the lncRNA signature were selected for further gene ontology (GO) and functional enrichment analysis, which were performed by the online enrichment tool g:profiler (https:// biit.cs.ut.ee/gprofiler/gost) [33] .

Genomic mutation profile
Mutation information of TCGA-COAD was downloaded from the TCGA official website, and the package maftools [34] was used to visualize the mutation profile.The top 30 mutated genes in lncRNA score high and lncRNA score low patients were extracted for further analyses.

Statistical analysis
Kaplan-Meier survival plots and log rank p values were carried out to identify different survival probabilities between the M1high and M1low groups, which were determined by the "surv_cutpoint" function in the "survminer" R package.Multivariate Cox regression analyses were performed by the "coxph" function in the "survival" R package.The Spearman correlation coefficient and corresponding p values were calculated by the "cor.test"function.Mann-Whitney tests were performed to distinguish differences between two different groups.Statistical analyses were performed using R v4.1.3with the necessary R packages and GraphPad Prism 7.0.

Identification of differentially expressed lncRNAs between M1-Mφ high and M1-Mφ1 low
To discover the relationship between infiltration of M1 macrophages in the tumor microenvironment and patient prognosis, the M1 macrophage (M1-Mφ) proportion of each TCGA-COAD patient calculated by Cibersort was grouped into two cohorts, the M1-Mφ low (n=47) group and the M1-Mφ high (n=369) group, which were divided by the "surv_cutpoint" function from the survival package in R (cutpoint=0.007690675).Patients with lower M1 macrophage infiltration had significantly shorter overall survival times than patients with higher M1 macrophage infiltration (Fig. 1a, log rank p=0.0082).The same result was obtained in another independent validation dataset (Fig. 1b, GSE14333, log rank p=0.0068, cutpoint=0.0453332;Fig. 1c, integrated dataset of GSE17536 and GSE17538, log rank p=0.0472;Fig. 1d, GSE39582, cutpoint=0.003866829,log rank p=0.0088, cutpoint=0.005733291).To investigate the differences between the M1-Mφ low and M1-Mφ high groups, differentially expressed genes were calculated by the DESeq2 package, and a total of 1634 DEGs were detected, which contained 194 upregulated genes and 1440 downregulated genes in the M1-Mφ low group.Then, differentially expressed lncRNAs (DElncRNAs) were extracted by intersecting DEGs with the lncRNA list from the LNCipedia database, and a total of 71 DElncRNAs were obtained (Fig. 1e), which contained 13 upregulated lncRNAs and 58 downregulated lncRNAs (Fig. 1f ther analysis, and the overexpression of these lncRNAs may be responsible for lower M1 macrophage infiltration in the TME.
To assess the lncRNA signature performance on prognosis prediction, the lncRNA signature score of each TCGA-COAD patient was calculated by the GSVA package in R. All scored patients were stratified into 2 groups based on the median lncRNA signature score (median score = −0.04486603),and the Kaplan-Meier curve showed that patients with higher lncRNA signature scores had significantly shorter overall survival times (Fig. 2h, left, log rank p=0.00024).Additionally, in the lncRNA score high group, patients had lower infiltration of M1 macrophages, suggesting that the lncRNA signature score is negatively correlated with M1 macrophages in the TME (Fig. 2i, p<0.0001,Mann-Whitney test).In the validation cohort of GSE17537, the lncRNA signature score also correlated with a shorter overall survival time (Fig. 2h, right, log rank p=0.011, cutpoint=−0.4367792).
Multivariate Cox regression analysis of the lncRNA signature showed that it served as an independent risk factor that significantly correlated with patient prognosis (Fig. 2j).A lower lncRNA score acted as a protective factor, which is consistent with earlier survival analysis results.In addition, with the progression of colon cancer stage, the lncRNA score increased (Fig. 2k).The analysis above indicated that the lncRNA signature is a significant risk factor correlated with patient prognosis.
The tumor immune microenvironment was assessed via the ESTIMATE package in R. The tumor immune score of TCGA-COAD patients was calculated, and its correlation with the lncRNA signature score was also tested.The results suggested that a higher lncRNA signature score was negatively correlated with the immune score (Fig. 3b, Rs= −0.302,

⩽ ⩽
p=3.935×10 −10 ), indicating a lack of immune cells within the TME of colon cancer.Commonly, lncRNAs exert their functions by manipulating mRNAs to regulate corresponding downstream pathways.Hence, we calculated the relationships between protein-coding gene expression and the lncRNA signature score.mRNAs with a Spearman correlation coefficient −0.3 and p 0.05 (n=141) were removed for GO enrichment and functional enrichment analyses.Notably, the results showed that these mRNAs were largely enriched in immune-associated reactome pathways (Fig. 3c, Fisher's exact test) and biological processes (Fig. 3d, Fisher's exact test), indicating that members in the lncRNA signature may impede immune-associated gene expression and function.

lncRNA signature negatively correlates with antigen presentation and processing and immune cell chemotaxis
Immune cell recruitment to the TME depends on tumorrelated antigens that can be recognized by antigen-presenting cells such as DCs and macrophages.In addition, chemokines secreted by tumor cells or stromal cells in the TME mediate extratumor boundary immune cell chemotaxis.Antigen processing-and presentation-related gene sets were included to evaluate their relationship with the lncRNA signature.The results showed that the higher the lncRNA set score was, the lower the score for antigen presentation and processing (Fig. 4a, Rs = −0.307,p=2.054×10 −10 ).Additionally, the chemotaxis score of each sample in the lncRNA score high and lncRNA score low groups was assessed, and the results showed that the lncRNA score high group had a lower chemotaxis score, which indicated an immune exclusion microenvironment (Fig. 4b, Mann-Whitney test).These results provide additional evidence that patients with a high lncRNA score may also have an immune-cold TME.In addition, we tested the relationship between the lncRNA score and genes that positively regulate antigen processing and presentation and positively regulate chemotaxis.We found that these immune-positiveregulating genes were mostly negatively correlated with the lncRNA score (Fig. 4c-d), which further confirmed the inertness of the immune-related pathway in lncRNA score high patients.

lncRNA signature predicted immune checkpoint inhibition therapy
We then investigated the performance of the lncRNA signature in predicting immune checkpoint inhibition (ICI) therapy outcomes.The immune checkpoint genes PD1 and PD-L1 were extracted, and their correlations with the lncRNA signature score were calculated.The differences in PD1 and PD-L1 expression between the lncRNA score high and lncRNA score low groups were tested, which revealed that patients with lower lncRNA scores also had higher PD1 and PDL1 expression (Fig. 5a   to ICI therapy compared to the other group of patients.Then, we grouped TCGA-COAD patients into PD1 high and PD1 low groups and PD-L1 high and PD-L1 low groups based on the median expression of PD1 and PD-L1, respectively.Then, we combined the PD1/PD-L1 groups with the previous lncRNA score groups to investigate the cross impact of the lncRNA score and immune checkpoint gene expression on patient prognosis.The results suggested that patients with low lncRNA scores and high PD1 (Fig. 5c, left, log rank p=0.0042) or PD-L1 (Fig. 5c, right, log rank p=0.0046) expression had prolonged survival times compared with the remaining 3 groups, while for patients with high lncRNA scores, even if they had higher immune checkpoint expression, their prognosis was still undesirable.
Studies have reported the relationship between MSI subtypes in colon cancer and their potential value in predicting ICI therapy outcomes [35−37] .MSI information of TCGA-COAD patients was downloaded and combined with the lncRNA signature score.In the lncRNA score high group, only 10 patients had the MSI-H phenotype, while in the lncRNA score low group, 59 patients had the MSI-H phenotype, suggesting that the lncRNA score high and lncRNA score low groups showed dif-ferent MSI phenotypes (Fig. 5d, p<0.05, chi-square test).These results further support that the lncRNA signature score might serve as a predictor for ICI therapy response.

LncRNA score indicates tumor mutation landscape differences
Tumor mutation burden (TMB) is considered a biomarker for predicting ICI therapy outcomes [38,39] , and tumors with higher TMB tend to load more neoantigens that can be recognized by immune cells such as T cells [40] .The TMB of lncRNA score high and lncRNA score low samples was calculated via the maftools package, and the results showed that TMB in the lncRNA score low group was significantly higher than that in the lncRNA score high group (Fig. 6a, p = 0.0029, Mann-Whitney test).This feature was consistent with the analyses above that a lower lncRNA score is correlated with a higher immune score, indicating a better immune cell infiltration level due to more loading of neoantigens.The genome mutation landscapes of lncRNA score high and lncRNA score low were inspected and visualized.The top 30 mutated genes were selected and are displayed in Fig. 6b.After excluding shared mutated genes (n=21, APC, TP53, KRAS, TTN, PIK3CA, SYNE1, MUC16, FAT4, RYR2, DNAH5, FLG, ZFHX4, CSMD1, LRP1B, PCLO, USH2A, OBSCN, DNAH11, AD-GRV1, LRP2 and NEB), the remaining genes were extracted.

Discussion
Immune cell infiltration in the TME is closely correlated with tumor progression and has been widely recognized as an important factor in predicting the prognosis of cancer pa- tients.However, clinical identification of immune cells basically depends on pathologists' diagnosis via hematoxylin and eosin (HE) staining or immunohistochemistry (IHC) staining, which introduces diagnostic bias of individual pathologists.
With the rapid increase in transcriptome data, immune cellspecific signatures have been discovered, and their relationship with patient prognosis has been described in detail.addition, with the development of sequencing techniques, some noncoding RNAs have been detected, and their regulatory functions on protein-coding mRNAs have attracted increasing attention.lncRNA molecular markers have been reported and validated.Recent studies have also discovered that lncRNAs are correlated with tumor-infiltrating immune cells.Huang et al. [12] reported that the lncRNA NKILA induced Tcell death to escape immune killing.Zhou et al. [41] built a Bcell-based lncRNA signature as a prognostic predictor for bladder cancer.M1 macrophage infiltration in the tumor region is considered a factor that predicts better clinical outcomes [42−44] .In addition, the role of noncoding RNAs in macrophage polarization cannot be ignored [43] .In our work, patients with lower M1 macrophage residence were closely focused on, and their differentially expressed lncRNAs were queried and investigated.Seven significantly upregulated lncRNAs were identified in the M1low cohort.As we assumed, this 7-lncRNA signature correlates with worse patient prognosis.When the lncRNA signature score was combined with other clinical characteristics, it still acted as an independent prognostic factor.The functions of individual members in the lncRNA signature were also reported in some articles.LINC01494 promotes glioma development via the miR-122-5p-CCGN1 axis [45] , and the overexpression of LINC01494 can inhibit the sponge of miR-122-5p and further upregulate CCGN1, a cell cycle-related gene, causing glioma progression.The relationship between EVX1-AS and colon cancer was predicted in the lncRNADisease v2.0 database (http://www.rnanut.net/lncrnadisease/index.php/home) [46] .However, the role of these lncRNAs in macrophage development and polarization is still unknown, and more experimental evidence is needed to confirm this hypothesis.According to the canonical function of lncRNAs, they may affect macrophage function in a direct way, which regulates macrophage differentiation and polarization [47−48] , or in an indirect way, lncRNAs affect tumor cell metabolism or export by exosomes to modulate macrophage functions [49−50] .
The colon cancer immune environment was reported as an important feature that predicts ideal outcomes [51,52] .The ES-TIMATE immune score, an index indicating the immune cell infiltration level, was negatively correlated with the lncRNA signature, suggesting an immunodeficient environment in the highly scored lncRNA signature group.We investigated factors that affect TME immune cell infiltration, and the potential cause of such an immune-excluded TME might be mediated by impeding antigen presentation and weakening the recruitment of various immune cells.
ICI therapy is a novel and prospective approach for cancer treatment, and even if limited patients could benefit from ICI [49] , it still attracts many researchers' attention.Meanwhile, biomarkers used to evaluate ICI response still need further discovery and assessment.Currently, the immune checkpoint molecules PD1 and PD-L1 and their inhibitors are widely used and investigated in clinical and research situations.Hence, an effect evaluation signature is crucial for predicting ICI therapy outcomes.By combining PD1 or PD-L1 expression with the lncRNA signature score, we identified that low lncRNA score cohort patients had significantly higher expression of PD1 and PD-L1 than patients with higher lncRNA scores.By comparing survival time among the lncRNA score - high -PD1 high /PD-L1 high , lncRNA score high -PD1low/PD-L1 low , lncRNA score low -PD1high/PD-L1 high and lncRNA score low -PD1low/PD-L1 low groups, patients with low lncRNA scores and high PD1/PD-L1 expression showed better prognosis than the other three groups.Notably, patients with low lncRNA score, even if higher PD1/PD-L1 expression detected, their overall survival time still significantly shorter than low lncRNA score patients, which provided evidence that PD1/PD-L1 expression alone was not enough for predicting ICI therapy outcomes, linking PD1/PD-L1 expression level with lncRNA score could serve a robust predicting result with enhanced precision.MSI, especially MSI-H, as a predictor of better prognosis of colon cancer has been reported [50−52] .MSI colon cancers deficient in the DNA mismatch repair (dMMR) system are responsible for monitoring and rectifying microsatellite-associated genome errors.MSI colon cancer generally possesses mutations in MLH1, MSH2, MSH6 and PMS2, which are crucial for the dMMR system [52] .Loss of function of these four genes leads to mismatch repair dysfunction, and cancer cells are prone to display more neoantigens that can be detected by immune cells.In addition to colon cancer, studies have found that MSI is a general phenomenon across various tumor types, such as endometrial cancer [53] , gastric cancer [54] , and ovarian cancer [55] .Grouping patients via lncRNA signature score, we found that MSI patients, especially MSI-H patients, were mostly grouped into the lncRNA score low group (n=59), while only 10 patients were characterized as MSI-H in the lncRNA score high group, indicating better clinical outcomes in lncRNA score low patients.
Tumor mutation burden, which is considered a better prognostic factor, was significantly higher in lncRNA score low group patients, concordant with previous survival analyses.Additionally, four MSI-related genes (MLH1, MSH2, MSH6 and PMS2) were more frequently mutated in the lncRNA score low group, causing MSI in colon cancer and further linked to an ideal clinical prognosis.Notably, FXBW7, a gene that was previously reported to be closely related to anti-PD1 therapy resistance in melanoma, was highly mutated in the lncRNA score high group (16% vs 10%) [56] .Loss of function of FXBW7 changed TIM by inhibiting IFN-I and MHC-I expression, and restoration of FXBW7 rescued PD-1 blockade therapy.This result provided evidence that lncRNA score low patients may be resistant to ICI therapy due to high mutation of FXBW7, which is consistent with the analysis that the lncRNA signature and PD-1/PD-L1 expression together could predict ICI therapy outcomes.
With the development of single-cell RNA sequencing, the Construction an M1 macrophage-related lncRNA signature for predicting the tumor immune microenvironment Wu et al.
classification of M1 and M2 macrophages is now overly simple and rough.More newly discovered and defined subpopulations of macrophages have been discovered and investigated.For example, SPP1 + macrophages together with FAP+ fibroblasts promote the progression of colorectal cancer [61] .The infiltration of TREM2 + macrophages remodels an immunosuppressive microenvironment in the tumor region [62−63] .In contrast, FOLR2+ macrophages could boost the immune response, leading to better clinical outcomes [64−65] .However, the roles of lncRNAs in macrophage development and differentiation are still unclear.Therefore, more experiments and bioinformatic analyses are needed to elucidate how lncRNAs regulate the development and transformation of macrophage subtypes.

Fig. 3 .
Fig. 3.The tumor immune microenvironment is negatively correlated with the lncRNA signature.(a) GSEA plots of differentially enriched cancer hallmark pathways between the lncRNA score high and lncRNA score low cohorts.(b) Spearman's correlation between lncRNA score and ESTIMATE immune score (Rs=−0.302,p=3.935×10 −10 ).(c) Reactome enrichment plot of mRNAs that negatively correlated with the lncRNA set (n=23, Fisher's exact test).(d) GO:BP enrichment plot of mRNAs that negatively correlated with the lncRNA signature.The top 20 enriched biological processes are displayed (Fisher's exact test).

Fig. 4 .Fig. 5 .
Fig. 4. Antigen-presenting processes were downregulated in the lncRNA score high group.(a) Spearman correlation between lncRNA score and antigen presenting and processing score (Rs=−0.307,p=2.054×10 −10 ).(b) Box plot of differences in chemotaxis score between the high lncRNA group and the low lncRNA group (**: p<0.05, ***: p<0.0001,Mann-Whitney test, data are represented as the mean ± SD).(c) Spearman correlation of lncRNA score and genes that positively regulate antigen processing and presentation.(d) Spearman correlation of lncRNA score and genes that positively regulate chemotaxis.

Fig. 6 .
Fig. 6.Relationship between tumor mutation burden and lncRNA signature.(a) Swarm diagram shows the differences in TMB between the lncRNA scorehigh and lncRNA scorelow groups (data are represented as the mean ± SD, Mann-Whitney test).(b) Waterfall plot shows the top 30 mutated genes in the lncRNA scorehigh (upper panel) and lncRNA scorelow groups (lower panel).Genes in black suggest shared mutated genes across the lncRNA scorehigh and lncRNA scorelow groups.Genes in red were highly mutated in the lncRNA scorehigh group.Genes in cyan were highly mutated in the lncRNA scorelow group.(c) Comparing MSI-related gene mutations between the lncRNA scorehigh and lncRNA scorelow groups.