A Novel Potential Diagnostic and Prognostic Biomarker for Esophageal Cancer


 Background: Esophageal cancer (EC) is a lethal malignancy of the digestive tract with high morbidity and mortality rates, lacking molecular biomarkers for EC diagnosis and prognosis with high specificity and sensitivity.Methods: Downloaded from the Cancer Genome Atlas (TCGA), the high-throughput transcriptome sequencing mRNA data was normalized, after which dysregulated genes were identified via limma package. Weight gene coexpression network analysis (WGCNA) was performed to select diagnostic modules and candidate genes. Later, Hub genes were screened through functional enrichment, LASSO Cox regression and survival analysis. Furthermore, we performed comparation with other biomarkers and investigated the association between our target gene and immune microenvironment.Results: Esophageal adenocarcinoma subtype and esophageal squamous cell carcinoma subtype displayed similar expression patterns in terms of differentially expressed genes, which indicates the existence of consensus biomarkers. After a series of analyses, including weighted gene coexpression network analysis, Gene Ontology enrichment analysis and Cox regression analysis, CMA1 was identified as an abnormally upregulated biomarker with highest diagnostic and prognostic potential. Moreover, CMA1 might exert its effect by influencing the immune microenvironment. Conclusion: CMA1 is a potential biomarker for EC in not only diagnosis but also prognosis. Future prospective clinic trials are expected to evaluate its performance in clinical practice.


Introduction
Esophageal cancer (EC) is a lethal malignancy of the digestive tract that occurs worldwide. It was estimated that there were over half a million new cases and related deaths in 2018. [1] Because of the lack of speci c early symptoms and early screening methods, most patients with EC are diagnosed at advanced stages and miss the best treatment opportunity. Statistics show that more than 30% of patients with EC are diagnosed at an advanced stage, and the overall 5-year survival rate of EC is only 16%. [2] Traditional treatments, including surgery, radiotherapy and chemotherapy, are still the main strategies in EC. However, their outcomes are far from satisfactory. [3] Therefore, physicians and researchers are paying increasing attention to molecular biomarkers as both diagnostic and prognostic tools for EC, as they are e cient means for understanding the molecular basis of this disease and could also be exploited in new treatment strategies. Hence, an important task is to identify these biomarkers, which could thereby facilitate EC diagnosis and prognosis.
In the eld of the transcriptome research, several studies have been published on the identi cation of biomarkers in EC; some biomarkers were identi ed for diagnosis or prognosis, and some were identi ed to distinguish between the esophageal squamous cell carcinoma (ESCC) or esophageal adenocarcinoma (EAC) subtypes of EC. [4][5][6] In general, the identi ed biomarkers are not broadly applicable. This situation leads to the lower feasibility of the identi ed biomarkers because only small groups of patients may bene t.
Unlike these studies, we tried to identify important risk-related genes that have the ability not only to diagnose tumors but also to predict overall survival (OS) for EC within a set of differentially expressed protein-coding genes (PCGs). Unlike most of the studies that tested the reproducibility of the results in the last step, we tried to validate the results in most steps of our analyses.

Overall design
A schematic of our procedure is shown in Figure S1. An online high-throughput transcriptome sequencing mRNA data was extracted for the analyses. Raw data were processed and normalized, and dysregulated genes were identi ed between tumor samples and normal tissue samples. Hence, normalized data and differentially expressed gene (DEG) data were obtained. DEG data were divided into normal reference and tumor DEG data, and the latter set was further divided into ESCC and EAC subgroups. Network concepts were used to measure the similarity between ESCC and EAC. Tumor DEG data were further split into two sets, a tumor test set and a tumor reference set, according to the OS status of the patients. The OS of the patient corresponding to each sample was uncensored in the tumor test set and censored in the tumor reference set.
Duplicated tumor samples from a single case were removed to ensure that the corresponding clinical information had a unique match within the expression pro le data. Moreover, outlier samples were removed from the tumor test set, tumor reference set and normal reference set. The tumor test set was used for the subsequent procedures, including weighted network construction, module detection, and module selection in the network. The tumor reference set was used to check the reproducibility of the network, examine the preservation of the modules, and match a diagnosis to the selected module. The normal reference set was also used for comparisons.
Hub genes were selected, and Gene Ontology (GO) enrichment analysis was performed within the selected module. Genes in the intersection of the hub gene set and the enriched gene set in the GO enrichment analysis were selected as candidates. Least absolute shrinkage and selection operator (LASSO) Cox regression was used to identify the biomarker among the candidates. Finally, the selected biomarker was widely explored and comprehensively compared with other biomarkers.

Data sources
High-throughput transcriptome sequencing data of EC samples were derived from the TCGA-ESCA project in The Cancer Genome Atlas (TCGA) platform, and mRNA expression pro les were extracted.
were tumor samples and 11 were normal tissue samples. Of the tumor samples, 78 were from EAC cases, and 81 were from ESCC cases.
Another dataset was derived from the Genotype-Tissue Expression (GTEx) project for diagnostic validation purposes. The dataset included information for 1445 normal esophageal tissues.

Identi cation of DEGs
First, raw data were imported, organized, and ltered. The processed data were then transformed into logcounts per million (log-CPM) values, and the trimmed mean of M-values (TMM) was calculated as a scaling factor to normalize the data. Comparisons were made between tumor samples and normal tissue samples. Second, with the limma package, the mean-variance trend was investigated, and precision weight values were calculated to remove the heteroscedacity of the data. Finally, the empirical Bayes and treat algorithms were used to assess differential expression and perform gene set testing in linear models. DEGs were identi ed as signi cant if the fold change (FC) > 2 or < 0.5 and adjusted p-value < 0.05.

Similarity comparison for EAC and ESCC and assessment of the reproducibility of the network between the tumor test set and the tumor reference set
Outlier samples were detected and removed in advance. Standardized connectivity [7] was used as the statistic for detecting outliers, and the critical values were set at ± 5. The similarity between EAC and ESCC patients was explored by assessing network features, including the density, mean of the clustering coe cient, centralization, connectivity correlation, and adjacency correlation. A correlation-based adjacency matrix was used to explore the patterns. The same strategy was used to examine the reproducibility of the network between the tumor test set and the tumor reference set.
2.5. Weight gene coexpression network analysis (WGCNA) 2.5.1. Network construction, module exploration and preservation A power function was used to construct the network, and a soft threshold that led the network to satisfy the feature of scale-free topology was determined. Distances were de ned by dissimilarity of the topological overlap matrix (TOM), and then dynamic tree cut[8] techniques were used to detect modules.

Module selection and diagnosis
The module of interest was selected based on three aspects. The rst aspect is based on the concept of mean gene signi cance (GS). In a given module, the mean GS is de ned by the mean absolute value of the correlation between gene expression and uncensored survival time. The second aspect was the correlation between the module and survival. This can be measured by assessing the association between module eigengenes and uncensored survival time, where module eigengenes are de ned as the rst principal component of the modules. In addition, the association between intramodular connectivity and the absolute value of GS also was considered, as it is crucial to select hub genes in the downstream analysis. To ensure that the selected module was credible, further graphics were used to assign a diagnosis to the selected module of interest.

Hub genes selection and GO enrichment analysis
The relationship between intramodular connectivity and eigengene-based connectivity in the selected module was explored graphically; eigengene-based connectivity was represented as module membership (MM) and was de ned as the absolute value of the correlation between the eigengenes and the expression values of each gene. Moreover, the association between MM and GS was investigated to decide the criterion for selecting the hubs.
The GO enrichment analysis results for the genes in the selected module were explored with the Database for Annotation, Visualization and Integrated Discovery (DAVID). [10,11] Important GO terms were selected under the following criteria: p-value > 0.05, fold enrichment > 2, and proportion of genes involved > 0.1.

LASSO Cox regression and survival analysis
Genes related to important GO terms were identi ed, and those intersecting with hub genes were considered candidates. LASSO Cox regression analysis [12,13] was used to select the nal biomarker at the intersection. In brief, in one experiment, ten folds cross-validation was used to estimate the values of lambda. Then, lambda.min, the value of lambda that led to the minimum for the mean cross-validated error, was used to predict the coe cients of candidates. Inclusion or exclusion of the candidates was decided according to their corresponding nonzero or zero coe cients. The experiments were repeated 1000 times. The most frequent candidate was nally selected as the biomarker in our study. For the selected biomarker, OS features were investigated in the null model; Kaplan-Meier (KM) survival curves were generated, and the log-rank test was used to measure the difference in survival between patients with low expression levels and high expression levels of the marker.

Comprehensive comparisons with other biomarkers
A brief summary of recent studies that identi ed biomarkers in EC was made. These biomarkers include MAGEA6 [4], CXCL12, and CXCR4. [5,6] We broadly compared the performance of the biomarkers identi ed in our study and others. In addition, traditional biomarkers, including PTGS2, EGFR, ERBB2, and TP53, were also included in the comparisons.
In the diagnostic aspect, the diagnostic ability of the biomarkers was validated externally by comparing TCGA tumor samples and GTEx normal tissue samples. Boxplots were used to depict the differences, and the predictive ability of the biomarkers was measured via random forest models and corresponding receiver operating characteristic (ROC) curves. In the prognostic aspect, bootstrap-validated C-index values were calculated to indicate the predictive ability of the biomarkers.

Target biomarker analysis
The Tumor IMmune Estimation Resource [14] (TIMER, https://cistrome.shinyapps.io/timer/) web server provides systemic analysis of immune in ltrates in various cancer types. Here, we mainly made use of the differential expression (Diff Exp) module, gene module, somatic copy number alteration (SCNA) module and correlation module. The Diff Exp module presents information on DEGs of intertest between tumor tissues and adjacent normal tissues across all TCGA tumor types, and signi cance is evaluated by Wilcoxon test. The gene module helped to estimate tumor purity [15] and the abundances of B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages, and dendritic cells based on statistical deconvolution of immune in ltrates.
[16] In the SCNA module, SCNAs are de ned by GISTIC 2.0. The in ltration level for each SCNA category was compared between tumor samples and normal samples using a two-sided Wilcoxon rank-sum test. In addition, the correlation between the target biomarker and immune biomarkers was explored with adjustments for tumor purity.

Statistics
Statistical analyses were performed by R software (version 3.6.1). The main package used in our research was WGCNA. In addition, edgeR [17] was used to process raw data, and limma[18] was used to identify DEGs; survival [19], rms [20], pec [21] and glmnet [12] were used to analyze the data; ggplot2 [22], survminer [23], GOplot [24], ggpubr [25] and RColorBrewer[26] were used to create graphs. Figure 1 shows the procedures used to process, normalize data, and identify DEGs. Initially, none expressed genes in the pro les were ltered. In total, 969 out of 19645 genes were removed. Then, the sequencing depths of the mRNA pro les were explored and are shown in Fig. 1A. The maximum value was over 3 times the lowest value. Hence, normalization factors were calculated via the TMM method, and the corresponding results are shown in Fig. 1B. In addition, the data were transformed to a log-CPM scale for normalization.

Identi cation of DEGs
The mean-variance trend was explored and the voom plot shows a decreasing trend between the means and variances in Fig. 1C, where the horizontal axis represents the normalized value and vertical axis represents the square-root of the standard deviations of the residual variances extracted from the tting linear models in the voom function. Hence, normalized data were accommodated via precision weights; the result is shown in Fig. 1D, where the blue line represents the average log 2 residual standard deviation.
Finally, empirical Bayes moderation was carried out, and the treat method was used to identify DEGs between tumor samples and normal tissue samples. In brief, 542 DEGs were identi ed, of which 388 were downregulated and 154 were upregulated. The results are shown visually as a volcano plot in Fig. 1E, where the green dots represent downregulated DEGs and the red dots represent upregulated DEGs.

Similarity comparison for EAC and ESCC and reproducibility checking of the network between the tumor test set and the tumor reference set
The similarity between EAC and ESCC subtypes was examined via network feature assessment. We also used the normal reference set to indirectly illustrate the similarity or dissimilarity between the sets. Outlier samples were removed before the analysis. One sample each was removed from the EAC and ESCC subsets. Figure 2A-B shows the corresponding cluster trees and heatmaps for the EAC group, ESCC group and normal reference group. The darker the color is, the higher the absolute correlation value. The EAC and ESCC groups showed similar patterns in the cluster tree and heatmap plot, whereas the normal reference group exhibited a quite different pattern.
We next looked at network features for each set. Mathematically, density values range from 0 to 1. The closer the value is to 1, the tighter the connection strength of the nodes in a network. The summarized table shows that the density values were 0.22, 0.21, and 0.51 for the EAC, ESCC, and normal reference sets, respectively. The mean clustering coe cient was used to quantify the internal extent of the network structure. The values of the two subtypes were 0.28 and 0.26. Centralization was used to measure star topological structure of a network. A value of 0 indicates that the connectivity of all nodes is equal, and a value of 1 indicates that a star topology exists. The results show that the values were relatively low in the three groups, which suggests that the network is far from having star topology. In addition, the correlation values for connectivity and adjacency were high in the EAC and ESCC subsets. In summary, the two subsets are quite similar. The similarity may be indirectly illustrated via the normal reference group, which exhibits a quite different pattern. In short, EAC and ESCC appear similar; thus, it may be feasible to identify a consensus biomarker for the EAC and ESCC groups.
The same strategy was used to examine the reproducibility of the network between the tumor test set and the tumor reference set. The corresponding results are shown in Fig. 2C-D. In brief, the two tumor sets are similar, which means that the network is stable and reproducible.

Network construction, module exploration and preservation
A power function was used to transform the correlation-based similarity matrix into an adjacency matrix to construct the weighted coexpression network, and the corresponding cluster dendrogram is shown in Fig. 3A. Under the TOM and dynamic tree cut algorithms, four modules were identi ed and labeled in different colors. Notably, the gray module did not hold statistical properties. Hence, we were not interested in this module, as hubs could not be identi ed via statistical procedures. The second largest module was the turquoise module, which contained 100 DEGs, and the smallest module was the brown module, which contained 68 DEGs. To better illustrate the clustering results, a 3D plot is shown in Fig. 3B.
Module preservation was measured via Zsummary and medianRank. In brief, the module is preserved if Zsummary > 10, is not preserved if Zsummary < 2, and is weakly to moderately preserved if 2 < Zsummary < 10. In addition, a high-ranked module (according to the statistic of medianRank) is more preserved than a low-ranked module. Figure 3C-D show the preservation values for the tumor reference and normal reference sets. All modules were well preserved in the tumor reference set and less preserved in the normal reference set. In addition, the turquoise module was ranked as the highest module in terms of medianRank. In conclusion, the modules are preserved in the tumor reference set, which indicates that the identi ed modules are reproducible.

Module selection and diagnosis
The module of interest was selected based on Fig. 4A-C. GS values across modules were rst explored. The analysis showed that the blue and turquoise modules had higher GS values than other modules (Fig. 4A). In addition, relations between module eigengenes and survival times were investigated via heatmaps (Fig. 4B). The red and blue colors represent positive and negative correlations. The darker the color is, the stronger the correlation. Both the blue module and the turquoise module were signi cantly correlated with survival time. Figure 4C shows the correlation between intramodular connectivity and GS. The blue module had the highest correlation among all the modules. Based on the above ndings, the blue module was selected as the candidate module.
The blue module was then examined graphicly. The relationships between eigengenes and the expression level of each gene are shown in Fig. 4D. The plot shows that the highly expressed genes have corresponding positive eigengene values, and weakly expressed genes have negative eigengene values.
Hence, eigengenes are representative of all the genes within the modules. In addition, Fig. 4E-F depicts the network plots of the blue module in the tumor test set and the tumor reference set. The line is colored red if the correlation is positive and blue if it is negative. The thickness of the line re ects the absolute correlation. The size of each black circle indicates the connectivity of the corresponding gene. Genes were ordered by connectivity in the tumor test set. The results indicated that all genes were positively correlated, as no blue line was observed. Subjectively, the two circle plots look similar, which indicates that the blue module is preserved and reproducible. Figure 5A shows that MM and intramodular connectivity are high correlated. This indicates that the selection of hub genes via MM is practicable. Figure 5B shows the correlation between MM and GS. Conservatively, we set MM > 0.7 and GS > 0.2 as the criteria to select the hubs, as MM and GS are not highly correlated. As a result, 32 genes were identi ed as hubs.

Hub genes selection and GO enrichment analysis
In addition, GO enrichment analysis was performed for the blue module genes. Five important GO terms were identi ed, and the corresponding results are shown in Fig. 5C. Thirty-two genes were involved in these GO terms, and the results are shown in a chord plot (Fig. 5D).

LASSO Cox regression and survival analysis
Twelve candidates were identi ed in the intersection between hub genes and highly enriched genes. The results of LASSO Cox regression for these 12 genes are shown graphically. Figure 6A shows the coe cients for candidates under different values of lambda, and Fig. 6B shows how many candidates were kept in the model under lambda.min in just one experiment. The results show that the log value for lambda.min is approximately − 2.9 in Fig. 6B, and only one coe cient was nonzero (Fig. 6A), which indicates that only one gene was retained in the model in the experiment. Figure 6C exhibits the results for 1000 experiments. CMA1 was shown to be the most frequent candidate in LASSO Cox regression.
For CMA1, the null model for OS was explored in Fig. 6D. Five-year predictive ability was ignored in the following analyses because only 3 patients were at risk after the fth year of survival. The predictive ability of CMA1 for OS is shown in Fig. 6E, and the log-rank test showed that it is a signi cant predictor (P = 0.009).

Comprehensive comparisons with other biomarkers
Broad comparisons were made within PCG biomarkers for diagnostic and prognostic purposes. Important results are shown in Fig. 7. Boxplots for the expression values of the identi ed biomarkers between TCGA tumor samples and GTEx normal tissue samples are shown in Fig. 7A. It seems that the diagnostic ability of MAGEA6, CXCL12, and CMA1 is acceptable, as the boxes overlapped less than other candidates between tumor tissues and normal tissues. The predictive abilities of the biomarkers were further examined via random forest models and corresponding ROC curves (Fig. 7B). CMA1 ranked 3rd among all the biomarkers in terms of the area under the ROC curve (AUC) criterion.
The prognostic ability of the biomarkers was assessed according to the C-index, and its corresponding 95% bootstrap validated con dence interval (Fig. 7C). CMA1 was the best biomarker, and its performance was slightly greater than that of EGFR. Furthermore, since only 15 patients assessed were at risk after 3 years of survival, we paid more attention to comparisons of biomarkers within 1 to 3 years of survival. The bootstrap cross-validated C-index was measured at a series of time points within one to three years, and the corresponding results are shown in Fig. 7D. Again, CMA1 was the best biomarker at the selected time points. In summary, CMA1 is the best choice for both diagnosis and prognosis prediction for all the identi ed biomarkers.

CMA1 expression correlated with the immune in ltration level in EC
In many types of malignancies, CMA1 was downregulated in tumor tissues compared to normal tissues (Fig. 8A). Tumor purity was de ned as the proportion of cancer cells in the admixture [15]. Herein, the CMA1 expression level was adversely associated with purity in ESCA, suggesting that CMA1 is highly expressed in the microenvironment. Positive correlations were found between CMA1 expression and CD4 + T cell (p < 0.01), macrophage (p < 0.01), and dendritic cell (p < 0.001) in ltration levels (Fig. 8B). We further explored the association between the expression of CMA1 and immune markers, including HLA-DPB1 for dendritic cells, FOXP3 for regulatory T cells (Tregs) and PD-1 (PDCD1) for exhausted T cells. All the markers in the scatter plots demonstrated a positive partial correlation (p < 0.01) (Fig. 8C). Figure 8D shows the distributions of each immune subset for each copy number status in EC. Only high ampli cation of B cells and arm-level deletion of CD4 + T cells and dendritic cells displayed signi cant differences (p < 0.05) between tumor tissues and normal tissues.

Discussion
EC is one of the most lethal malignancies of the digestive tract worldwide and has high morbidity and mortality rates. Traditional methods are unsatisfactory. Hence, clinicians and physicians are paying more attention to molecular biomarkers for EC diagnosis and prognosis. There are two main histological subtypes of EC: EAC and ESCC. Studies at the molecular level are usually based on one of these two subtypes of EC. In addition, diagnostic and prognostic studies are usually separated. In brief, the identi ed biomarkers are not broadly applicable. However, no evidence has shown that different molecular therapy methods should be developed that apply to different histological types of EC.
We aimed to identify the risk genes among PCGs that can be applied in both diagnosis and OS prediction. The feasibility of our aim was explored in advance via network analysis. After con rmation of the applicability of the design, a comprehensive study was generated to identify biomarkers via weighted coexpression network analysis, GO enrichment analysis, and LASSO Cox regression analysis. The reproducibility of the network and preservation of the modules were also validated in the analysis. These procedures are crucial, as they are prerequisites to ensure that the results are repeatable. Finally, CMA1 was identi ed as a biomarker, and it was widely explored and comprehensively compared with other biomarkers identi ed in other studies of EC. The results show that CMA1 is a promising candidate among these biomarkers due to its superior performance in both diagnostic and prognostic roles.
Downregulation of CMA1 expression was observed in various cancers, and the difference was relatively high between tumor and nontumor tissues in the EC dataset (Fig. 8A). Tumor in ltrating lymphocytes are prognostic biomarkers in various cancers. [27,28] TIMER analysis revealed a correlation between the in ltration of many types of immune cells, especially dendritic cells, and the expression of CMA1, which indicated the possibility that CMA1 in uences tumor progression through immune in ltration.
Mast cells are recognized as a crucial part of the immune microenvironment in tumor tissues and modulate tumor progression by releasing protumorigenic and antitumorigenic molecules, [29] including transforming growth factor-beta, tumor necrosis factor-alpha, interleukin-8, FGF-2, and VEGF. [30] Increased activity of chymase and tryptase secreted by mast cells was observed at all stages of tumor progression and was accompanied by an increase in the number and size of blood vessels. [31] In a breast cancer immunotyping model based on tumor-in ltrating immune cell subsets published in 2020, [32] patients with high B cell, NK cell, CD8 + cell, activated CD4 + memory T cell levels and low γδT cell and activated mast cell levels had a better prognosis than those with other in ltration patterns, so activated mast cells are unfavorable factors in terms of breast cancer patient survival. Moreover, an in vitro animal experiment showed that coinjection of mast cells and HER2-positive breast cancer cells increased tumor engraftment and outgrowth in mice. [33] Additionally, a high level of mast cells in extratumoral benign prostate tissues was considered a risk factor for a poor prognosis in prostate cancer patients. [34] Mechanistically, it was observed that tumor-derived microvesicles from non-small-cell lung cancer (NSCLC) cells are internalized by mast cells, which enhances the migratory ability of mast cells and promotes the release of TNF-alpha and MCP-1. [35] In EC, however, there are some con icting reports about the effect of mast cell in ltration on prognosis. Tinge et al found that the number of mast cells was not related to prognosis in EC patients,[36] but Fakhrjou et al pointed out that high mast cell density in the invasive edge of the tumor correlated with tumor progression and poorer survival. [37] Further studies are required to con rm the prognostic signi cance of mast cell in ltration in malignancies due to the complicated functions of mast cells.
Located at 14q11.2 in a cluster of genes encoding other proteases, CMA1 encodes a chymotryptic serine proteinase belonging to the peptidase family S1.
[38] It is secreted by mast cells and has been reported to play an essential role in the degradation of the extracellular matrix, the regulation of submucosal gland secretion, and the generation of vasoactive peptides. In the heart and blood vessels, the major convertor from angiotensin I to the vasoactive peptide angiotensin II is not angiotensin converting enzyme but CMA1, so CMA1 aroused the interest of experts in heart [39] and hypertension research. [40] In the eld of oncology, however, CMA1 is not well studied. There are only a few studies concerning CMA1 in oncology. It was reported in 2015 that chymase has angiogenic effects and can induce tumor growth. [41] Interestingly, miR-9 induced an elevation of CMA1 expression in both P815 (a mouse malignant mast cell line expressing activating KIT mutations) and mouse bone marrow-derived mast cells, supporting that miR-9 promotes metastasis by increasing the expression of proteases crucial for physical remodeling of the extracellular matrix. [42] Chymase-positive mast cells seem to be a biomarker of poor prognosis in breast cancer, [43] gastric cancer, [30] and lung cancer. [44] In different malignancies, the mRNA levels of CMA1 have various prognostic implications. Elevated CMA1 mRNA expression levels were found in benign prostate hyperplasia tissue compared to prostate cancer tissue or normal prostate tissue.
[45] Of note, the CMA1 mRNA levels were lower in oral squamous cell carcinoma tissue than in normal tissue, and patients with high CMA1 expression had apparently better prognoses (p = 7*10 − 6 ).
[46] The CMA1 mRNA levels were elevated in the normal mucosa compared to the normal tissues and tumor tissues of colorectal carcinoma patients.
[47] The upregulation of THBS2 and SPARC at the same time also implied that in ammatory activation of stromal broblasts is not only an integral part of colorectal carcinoma but can facilitate its invasion of the surroundings.
[47] Increased CMA1 mRNA was associated with an unfavorable prognosis in gastric cancer but a favorable prognosis in ovarian cancer.
[48] Mechanistically, CMA1 may in uence the prognosis of cancer patients by altering the immune microenvironment. In our EC research, the CMA1 expression level was positively correlated with the in ltration of dendritic cells, regulatory T cells, and exhausted T cells (Fig. 8B-C). The metastasis promotion potential of dendritic cells is related to inducing Treg cell cytotoxicity and reducing the cytotoxicity of CD8 + T cells. In addition, overexpression of CMA1 was correlated with poorer OS in gastric cancer patients at lymph node stages 1-3 but not in patients without lymph node metastasis. This discovery strongly suggests that the CMA1 expression level is a potential predictive marker of metastasis.
[48] However, if CMA1 plays only a driving role in tumor development, it is di cult to explain its controversial prognostic implications across different tumors. Further studies are required to uncover the intricate regulatory network between CMA1 and the immune microenvironment and develop it as a potential immune therapy target.

Conclusion
In summary, we revealed that CMA1 is a potential diagnostic and prognostic biomarker of EC through a comprehensive analysis; CMA1 might exert its effect by in uencing the immune microenvironment. This nding may facilitate the individualized treatment of EC patients.

Declarations
Ethics approval and consent to participate All the data were obtained from public databases according to their guidence, so ethics approval and consent to participate were not in requirement.

Consent for publication
All the authors have read the nal edition of this manuscript and approve the publication.
Availability of data and material RNA sequencing data and associated clinical features of ESCA cohort is available in the Cancer Genome Atlas (TCGA) database, and Genotype-Tissue Expression (GTEx) project is also accessible online for academy usages.

Competing interests
The authors declare that they have no competing interests.