Identification of HMMR as a prognostic biomarker for patients with lung adenocarcinoma via integrated bioinformatics analysis

Background Lung adenocarcinoma (LUAD) is the most prevalent tumor in lung carcinoma cases and threatens human life seriously worldwide. Here we attempt to identify a prognostic biomarker and potential therapeutic target for LUAD patients. Methods Differentially expressed genes (DEGs) shared by GSE18842, GSE75037, GSE101929 and GSE19188 profiles were determined and used for protein-protein interaction analysis, enrichment analysis and clinical correlation analysis to search for the core gene, whose expression was further validated in multiple databases and LUAD cells (A549 and PC-9) by quantitative real-time PCR (qRT-PCR) and western blot analyses. Its prognostic value was estimated using the Kaplan-Meier method, meta-analysis and Cox regression analysis based on the Cancer Genome Atlas (TCGA) dataset and co-expression analysis was conducted using the Oncomine database. Gene Set Enrichment Analysis (GSEA) was performed to illuminate the potential functions of the core gene. Results A total of 115 shared DEGs were found, of which 24 DEGs were identified as candidate hub genes with potential functions associated with cell cycle and FOXM1 transcription factor network. Among these candidates, HMMR was identified as the core gene, which was highly expressed in LUAD as verified by multiple datasets and cell samples. Besides, high HMMR expression was found to independently predict poor survival in patients with LUAD. Co-expression analysis showed that HMMR was closely related to FOXM1 and was mainly involved in cell cycle as suggested by GSEA. Conclusion HMMR might be served as an independent prognostic biomarker for LUAD patients, which needs further validation in subsequent studies.


INTRODUCTION
Lung cancer remains the most common cause of cancer-related death worldwide. More than 1,600,000 patients are newly diagnosed with lung cancer each year, which reduces patients' life quality and brings heavy financial burden to the patients (Qu et al., 2020). Lung adenocarcinoma (LUAD) is the most frequent histological type among lung cancers, which occupies approximately 40% (Bai et al., 2019a). Although the clinical strategies treating LUAD, such as chemotherapy, radiotherapy, targeted therapy, surgery, and immunotherapy, have been developed in recent decades, the 5-year survival rate of LUAD remains unsatisfactory, mainly because of the lack of effective prognostic biomarkers and therapeutic targets involved in the development and progression of the lung carcinoma (Chen et al., 2019a). It is therefore crucial to perform an integrated study to identify the core genes and comprehensively understand its relevant signaling pathways during the LUAD occurrence and progression.
Nowadays, the combination of genomics technology and bioinformatics analysis facilitated the wide application of gene expression profiles in a range of human cancers, providing a new insight into screening tumor-associated genes and identifying the core prognosis factors (Kaushik et al., 2020;Huang et al., 2019;Liu et al., 2019). Meanwhile, high-throughput technologies simplified the procedure of gene expression profiling and a growing body of expression data can be retrieved from public databases, which allows us to further study the underlying molecular mechanisms and medicine targets in LUAD (Guo et al., 2020;Sun et al., 2020). Although many studies have identified hundreds of differentially expressed genes (DEGs) and indicated corresponding biological pathways associated with lung cancer, the results were not consistent because of various reasons (Long et al., 2019;Mao et al., 2019;Lu et al., 2020). Considering the limitation of a single microarray analysis, characterized by limited samples, unbalanced datasets and serious systematic error, we herein conducted an integration analysis of genetic data with multiple gene expression profiles and public databases to overcome the shortage and obtain reliable diagnosis markers.
In this work, multiple gene expression profiles retrieved from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) were used for identification of DEGs and candidate core genes. Protein-protein interaction (PPI) was then constructed following the identification and validation of the hub gene. In addition, survival analysis, meta-analysis, prognostic analysis, and co-expression analysis were performed successively to evaluate the potential of the core gene as a prognostic factor in LUAD. The biological functions of the hub gene were explored through the Gene Set Enrichment Analysis (GSEA) in our study. The workflow of this work is provided in Fig. 1.

Data collection and processing
Eight gene expression profiles were obtained from the GEO online public database, which was provided in Table S1. The adjusted p-value < 0.05 and |log 2 (foldchange)| ≥ 2 were set as cut-off values for screening DEGs in GSE18842, GSE19188, GSE75037 and GSE101929 profiles using R package "limma". Subsequently, the common DEGs shared by the four profiles were used for further study. Additionally, transcriptome RNA-sequencing and proteomics data were extracted from the Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/) andthe Clinical Proteomic Tumor Analysis Consortium (CPTAC) database (https://cptac-data-portal.georgetown.edu/), repectively.

PPI network construction and enrichment analysis
Construction of PPI network and identification of key module were constructed as previously described in Li et al. (2021). Then, the genes with the highest connectivity degrees in key module were recognized as candidate hub genes, whose mRNA expression levels were further validated via GSE19804 and TCGA datasets. The FunRich software (version 3.1.3) was applied to perform enrichment analysis in biological process, cellular component, molecullar function and biological pathway with the cutoff criteria of p < 0.05 for the corresponding genes in key module.

Hub gene identification and validation
Correlations between candidate hub genes and tumor stages were determined by Pearson's correlation analysis in GSE31210 profile and TCGA dataset using R package "corrplot". Then, the common genes shared by the two datasets were identified as hub genes. Furthermore, the protein expression levels of hub geneswere further verified utilizing CPTAC data and immunohistochemical results from Human Protein Atlas (HPA) (https://www.proteinatlas.org). In addition to public datasets, we performed qRT-PCR and western blotting to measure the expression pattern of the core gene in LUAD cells (A549 and PC-9).

Survival analysis and prognostic value analysis
The association of hub genes with overall survival (OS), progress-free survival (PFS) and disease-free survival (DFS) in LUAD patients were conducted as previously described in Li et al. (2021). Meanwhile, Kaplan-Meier Plotter (www.kmplot.com) and Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn/) platforms were also used to perform survival analysis for the corresponding genes. Additionally, the prognostic estimation of hub genes using meta-analysis was performed based on the methods in Li et al. (2021). Finally, Cox regression analysis was conducted following the protocols in Li, Qi & Li (2020).

Co-expression analysis and gene set enrichment analysis (GSEA)
Co-expression analyses for hub genes were conducted using the Oncomine database (https://www.oncomine.org) in order to identify a significant factor associated with hub genes, which was further verified in both GSE31210 profile and TCGA dataset using R package "corrplot" and GraphPad Prism software (version 7.0). Additionally, GSEA software (version 4.0.3) was utilized to conduct enrichment analysis for DEGs between high and low hub gene expression subgroups to investigate the biological functions of hub gene. FDR < 0.05 and p < 0.05 were set as the cut-off criteria.

Cell culture
HBE, A549 and PC-9 cells were cultured according to protocols in Li et al. (2021).

Quantitative RT-PCR analysis
Quantitative RT-PCR analysis was conducted following the instructions in Li et al. (2021).

Western blot analysis
Westerm blot analysis was subsequently performed according to protocols in Li et al. (2021). The primary antibody against HMMR was purchased from Abcam (Cambridge, UK, diluted 1:1000, ab124729).

Statistical analysis
The R (version 3.6.0) and Graphpad prism software (version 7.0) were used to perform statistical analysis. Cochran's Q test and Higgin's I 2 statistics were applied for the heterogeneity estimation in the meta-analysis in meta-analysis. The statistical differences in the Kaplan-Meier analysis were calculated by log-rank test. Besides, data are presented as the mean ± SD of at least three independent experiments. The 2 −DDCt method was utilized to analyze the results of qRT-PCR and Student's t-test was applied to assess the significance of differences between groups. A p-value of < 0.05 was considered statistically significant.  Table S5). Meanwhile, Kinesins, Cell Cycle, FOXM1 transcription factor network were the main enriched biological pathways for these 24 DEGs ( Fig. 3C and Table S6).

Identification of HMMR as a hub gene in LUAD
Besides, among the 24 DEGs with high connectivity degrees in the whole PPI network (Figs. 3A and 4A), 19 DEGs were identified with leading intramodular connectivity according to the key module (cluster 1) PPI network (pink nodes constructing the network in Fig. 3A) and meanwhile, they were considered as the candidate hub genesin the work (Fig. 4B). Afterwards, the candidate hub genes were identified in the GSE19804 and GSE31210 profiles (Figs. 4C and 4D). In addition to the difference analysis, the GSE31210 profile and the TCGA dataset were used to evaluate the correlation between the expression of the candidate hub genes and tumor stages ( Table 1). The results revealed that HMMR was uniquely shared by these two datasets among the top three genes most strongly correlated with tumor stages (Fig. 5A). Though p-values were significant, R values in correlation analyses of HMMR expression and tumor stage were not satisfied. Therefore, we further performed differential analysis for HMMR expression in LUAD patients with different tumor stage aiming to supply the unsatisfied correlation. In the work, HMMR was considered as the hub gene. It was highly expressed in LUAD tissues  HMMR served as a prognostic factor in LUAD It was found that high HMMR expression significantly deteriorated overall survival (OS) in TCGA dataset, GSE68465 profile, GSE50081 profile, Kaplan-Meier Plotter and GEPIA platforms (Figs. 6A-6E), as presented in GSE31210 profile (Fig. S2). Besides, HMMR was negatively associated with progression-free survival (PFS) in TCGA dataset, GSE68465 profile, Kaplan-Meier Plotter platform (Figs. 6F-6H). The associations of HMMR with inferior disease-free survival (DFS) were also observed in Figs. 6I and 6J using GSE50081 profile and GEPIA platform. Because of the unsignificant heterogeneity (I 2 < 50%, p > 0.05), we selected solid model to perform the meta-analysis. The results of the meta-analysis implied that HMMR served as a prognostic factor for the OS of LUAD patients (Fig. 6K). Moreover, the Cox regression analysis revealed that HMMR acted as an independent prognostic factor for the survival of LUAD patients (Table 2).

Prognostic value of HMMR in LUAD
The HMMR expression was significantly impacted by gender (female vs. male), tumor stage (stage I & II vs. stage III&IV), T classification (T1 vs. T2-4), N classification (N0 vs. N1-3), and M classification (M0 vs. M1), whereas not by age (<=65 vs. >65) (Fig. 7A). The LUAD patients were classified into high and low HMMR expression subgroups according to the median of HMMR expression levels. Subsequent survival analysis showed that high HMMR expression group exhibited significantly poor OS in >65 age, female, male, stage III&IV, M0 and N1-3 subgroups (Fig. 7B). Interestingly, there was no statistical significance for HMMR expression between >65 age subgroup and <=65 age subgroup, while expression level of HMMR significantly impacted the OS for LUAD patients with above 65 age. To some extent, there was contradictory for the results might due to limited samples and unbalanced data, which was needed further investigation.
HMMR co-expressed with FOXM1 in LUAD According the results of pathway enrichment analysis, we have identified the FOXM1 transcription factor network and cell cycle were the main biological pathway enriched by the common 24 DEGs (Fig. 3C). Among the pathways, FOXM1 (forkhead box protein M1) was an essential proliferation-associated transcription factor. It was widely expressed during cell cycle and involved in cellular growth, self-renewal, and tumorigenesis (Liao et al., 2018). Additionally, through Oncomine co-expression analysis, we found that the expression of HMMR was positively associated with that of FOXM1 (r = 0.890) (Fig. 8A), which was consistent with our expectation. The mRNA expression of FOXM1 exhibited significant elevation in LUAD tissues relative to the matched normal lung tissues in both the GSE31210 profile and the TCGA data. In line with Oncomine database, the GSE31210 profile and the TCGA data also suggested that the expressions of HMMR and FOXM1were positively correlated via Pearson's correlation analysis, even with similar correlation strengths, with r = 0.723, p < 0.001 and r = 0.730, p < 0.001, respectively (Figs. 8B, 8C).

HMMR was involved in cell cycle in LUAD
GSEA was performed to illuminate the potential biological functions of HMMR in LUAD progression using DEGs between high and low HMMR subgroups according to the median  HMMR expression level. As shown in Fig. 9A and Table S7, the functions of these genes were significantly associated with cell cycle, G2/M phase transition, cell cycle process, chromosome segregation, and nuclear division pathways, as indicated by biological process enrichment analysis. DNA repair, E2F targets and G2/M checkpoint were the major enriched HALLMARK terms ( Fig. 9B and Table S8). Cell cycle, DNA replication, p53 signaling pathway, and other biological pathways were the major enriched HALLMARK terms ( Fig. 9C and Table S9). Taken together, those results indicated that the implication of HMMR in LUAD progression might be mediated by cell cycle regulation.

DISCUSSION
Currently, LUAD is considered to be the most common subtype of lung cancer in clinical practice, with a 5-year overall survival rate varying from 4% to 17% (Hirsch et al., 2017). The exact pathogenesis of LUAD remains elusive and numerous diverse and complicated processes have been reported to play a role. For instance, the presence of aberrant gene expression, autophagy activation, unexpected tumor microenvironment, immune cell infiltration, abnormal cell cycle, DNA methylation, epigenetic interactions, and other molecular and cellular events, have been identified to be associated with the development of LUAD (Fan et al., 2019;Chen et al., 2019b;Wang et al., 2019a;Lazarus et al., 2018). It has also been reported that several signaling pathways, such as MAPK pathway, AKT-mTOR signaling and Wnt/β-catenin signaling pathway, are frequently activated in LUAD (Hou et al., 2019;Liu et al., 2020;Luo et al., 2019a). In spite of the great progress in illustrating the pathogenesis of LUAD and treating LUAD, effective prognostic biomarkers and treatment targets are stilling lacking. Hence, there is an urgent need to find a precise and effective prognostic factor and a therapeutic target in order to improve LUAD-associated survival rate.
Bioinformatics is a data-driven subtype of science and currently frequently applied in aspects like analyzing genetic data, studying tumor progression, screening core genes, and identifying the medical targets through the usage of bioinformatic analytical tools including software plug-ins and packages, as well as database platforms (Luo et al., 2019b;. Previously, an integrated regulatory network, containing DEGs, transcription factors, and miRNAs, has been constructed using GSE37764 dataset, which attempted to find out the important factors in the pathogenesis of lung adenocarcinoma and lay the foundation for the further investigation (Du & Zhang, 2015). Meanwhile, an eight-gene prognostic signature (DLGAP5, KIF11, RAD51AP1, CCNB1, AURKA, CDC6, OIP5 and NCAPG) also has been identified for the survival of LUAD using GSE19188 and GSE33532 profiles . Besides, Zhou et al. (2019) aimed to screen out the driver genes in smoking-associating lung adenocarcinoma via bioinformatics, and then they suggested that the seven genes (CYP17A1, PKHD1L1, RPE65, NTSR1, FETUB, IGFBP1 and G6PC) could be promising prognostic factors for lung adenocarcinoma according to their negative correlation with the patient survival.
In this study, we aimed to find a core gene that has both prognostic value and the potential to become a treatment target for LUAD via integrated analysis. Firstly, shared DEGs by the four GEO profiles were obtained and used for PPI network construction followed by key module identification in order to identify candidate hub genes. Secondly, biological process and pathway enrichment analyses for the identified genes were carried out by the FunRich software. The enrichment results suggested that the potential functions of those candidate genes were specifically associated with cell cycle, spindle assembly, FOXM1 transcription factor network, M phase, and other pathways. It is well known that cell cycle and M phase transition play essential roles in manipulating the proliferation of cells and the occurrence of tumors. For example, repressing p21 expression, the check point of cell cycle, could promote tumor proliferative capacity and accelerate cancer procession . Likewise, many molecules, like Rac3, SMAD3, CDCA7 and DMBX1, which regulate cell growth via cell cycle, have similar functions Wang et al., 2016aLuo et al., 2019c). Accumulating studies have reported the involvement of FOXM1 in the progression of tumors including lung cancer, gastric cancer, breast cancer, and epithelial ovarian cancer (Hsieh et al., 2019;Bai et al., 2019b;Ring et al., 2018;Wang et al., 2016b). Specifically, in lung cancer, FOXM1 has been found to be co-expressed with CENE and could regulate the expression of MMP2, contributing to LUAD growth and metastasis (Shan et al., 2019;Hsieh et al., 2019). Besides, supporting our findings in the present study, FOXM1 transcription factor network was identified as a major predictor of poor outcomes in pan-cancer in the report of Gentles et al. (2015). Finally, among these candidate genes, HMMR was identified as the hub gene in this study due to its significant role in predicting survival and its association with tumor stage, which was further validated at the protein level in the HPA database and at the mRNA expression level on the CCLE platform. The potential functions of the hub gene were explored by GSEA, and the results indicated that mainly cell cycle and its relevant pathways were activated pathways in LUAD highly expressing HMMR.
The above results confirmed the prognostic value of HMMR expression in LUAD. HMMR, a sulfonated glycosaminoglycan, is a receptor for hyaluronic acid (HA), which accumulated during pulmonary inflammation. It has been found that high expression of HMMR is associated with multiple human malignancies, such as gastric cancer, breast cancer, prostate cancer, ovarian cancer, bladder cancer, with characteristics of promoting cancer progression and indicating poor prognosis in patients (Huang et al., 2017;Yeh et al., 2018;Rizzardi et al., 2014). For example, a zebrafish xenograft assay verified that highly expressed HMMR, under the control of both TGFβ signaling and Hippo pathway, contributed to sarcoma genesis and metastasis (Ye et al., 2020). In gastric cancer patients, HMMR over-production was remarkably associated with tumor relapse and poor prognosis, and resulted in resistance to the chemotherapy via promoting epithelialmesenchymal transition and modulating cancer stem cell properties . Additionally, HMMR also interacted with CD44, another HA receptor characterized by forming complexes with ERK1/2, to exert its functions in breast cancer (Telmer et al., 2011). Besides, HMMR was identified as a promising diagnostic biomarker and an independent prognostic factor for hepatocellular carcinoma (HCC) via bioinformatics, the expression of which was positively correlated with HCC tumor grade and stage in HCC patients (Lu et al., 2020).
We found that high expression of HMMR was significantly correlated with poorer OS and survival rate in subgroups with clinical stage III/IV, lymph node metastasis classification 1/2/3, male, female and patients with an age of above 65. Multivariate COX regression analysis further suggested that the HMMR expression level could serve as an independent prognostic indicator in LUAD. These findings highlighted the prognostic value of HMMR expression in LUAD. We further conducted correlation analysis based on the Oncomine database and the results indicated the expression of HMMR was positively associated with that of FOXM1, which encoded transcription factor of fork head family and intriguingly was also found to negatively impact prognosis in many solid tumors (Liang et al., 2019;Liu et al., 2018). Additionally, cell cycle, DNA replication, p53 signaling pathway, RNA degradation, spliceosome, and ubiquitin mediated proteolysis were significantly enriched pathways in LUAD highly expressing HMMR. It was well known that these six signaling pathways were typically involved in the occurrence and development of cancers (Aubrey et al., 2018;Fish et al., 2019;Dvinge et al., 2019;Senft, Qi & Ronai, 2018). For example, the core protein and RNA components of the spliceosome were essential for splicing decision and able to manipulate the splice sites during pre-mRNA processing, thereby playing importantly regulatory roles in tumorigenesis and metastasis of malignant cells (Hsu et al., 2015). However, there was no inhibitor of HMMR currently undergoing preclinical or clinical testing anywhere in the world.
In the work, we identified HMMR as the core gene via integrated bioinformatics analysis and provided robust evidence for the potential prognostic and therapeutic role of HMMR in LUAD, though the prognostic value was not significant in patients with clinical stage I/II and an age of below 65. It might also be a limitation, as only RNA-based bioinformatics analysis was performed in this study and no functional experiment was conducted. Therefore, further studies are needed to improve the reliability of the results.

CONCLUSION
In summary, HMMR might serve as an independent prognostic factor for the OS in LUAD patients. This work would facilitate the development of novel prognostic biomarkers and therapy targets for LUAD, though further experiments are needed to verify these findings.