Bioinformatics analysis deciphering the transcriptomic signatures associated with signalling pathways and prognosis in the myelodysplastic syndromes

ABSTRACT Background: Several studies scatteredly identified the myelodysplastic syndromes’ transcriptomic profiles (MDS). However, the exploration of transcriptional signatures, key signalling pathways, and their association with prognosis and diagnosis in the integrated multiple datasets remains lacking. Methods: We integrated the GSE4619, GSE19429, GSE30195, and GSE58831 microarray datasets of CD34 + cells for identifying the differentially expressed genes (DEGs) in the MDS. The series of bioinformatics methods are applied to identify the key hub genes, gene clusters, prognostic hub genes, and genes associated with diagnostic efficacy. Finally, we validated the expression differences of hub genes in the GSE114922 dataset. Results: We explored the DEGs related to gene ontology enrichment and KEGG pathways. We identified significant hub genes, including 168 upregulated hub genes (such as STAT1, IFIH1, EPRS, GRB2, RAC2, MAPK14, CASP1, and SPI1) and 52 downregulated hub genes (such as CREBBP, HIF1A, PIK3CA, EZH2, PIK3R1, MDM2, IRF4, CXCR4, PCNA, and CD19) in the MDS. In addition, we identified six significant molecular complex detection (MCODE)-derived upregulated gene clusters and one downregulated gene cluster, respectively. Moreover, we found that the higher expression level of MX2, GBP2, PXN, IFI44, FDXR, PLCB2, ASS1, ERCC4, PML, and RRAGD and the lower expression level of CD19, PAX5, TCF3, LEF1, NUSAP1, and TIMELESS hub genes are significantly correlated with shorter survival times of MDS patients. Furthermore, the area value under the ROC curve (AUC) of PXN, FDXR, PLCB2, PML, CD19, PAX5, and LEF1 prognostic genes are more than 0.80, indicating that these genes could be effectively used for the diagnostic efficacy of MDS patients. Conclusions: Identifying key hub genes and their association with the prognosis and diagnostic efficacy may provide substantial clues for the treatment and diagnosis of MDS patients.


Introduction
The most common malignant blood disorder is myelodysplastic syndromes (MDS) characterized by dysfunctional haematopoiesis [1]. The major molecular cause of MDS is genetic instability, which in turn transformed into acute myeloid leukaemia (AML) in a larger number of cases [1]. MDS is frequently occurring in older males and in other individuals who are earlier exposure to cytotoxic therapy [2]. Some previous studies demonstrated the genetic abnormalities and molecular pathogenesis of MDS patients [3][4][5][6]. It has been reported that the pathogenesis, initiation of disease, disease progression, and heterogeneity of MDS are crucially associated with the genetic abnormalities [3]. The deregulated transcriptomes have a greater contribution to the development of MDS [4]. The primary hallmark of MDS development is genes or chromosomal changes, copy-number alterations, and somatic mutations [6]. For example, Ying le demonstrated that the dysfunctional genes and signalling pathways are altered during the progression of MDS [5]. Aberrantly expressed transcriptomes are substantially associated with the clinical outcome of MDS patients [4]. Autophagy-associated genes predicted the poor prognosis of MDS patients [7]. Also, deregulated metabolic genes are correlated with poor prognosis of MDS patients [8]. These MDS-associated studies provided a clue that the deregulated and genetically altered transcriptomes are associated with the pathogenesis, development, progression, and clinical outcomes of the MDS patients.
Herein, a meta-analysis integrated the four microarray gene expression datasets of MDS to identify the differentially expressed genes (DEGs). Then, we explored the functional enrichment analysis, included biological processes, molecular functions, cellular components, and KEGG pathways that are significantly associated with the upregulated and downregulated genes, respectively. In addition, we constructed the protein-protein interaction (PPI) network of DEGs and identified the key hub genes based on the degree of interaction of DEGs. We investigated the expression validation of top hub genes in the independent datasets of MDS patients. Moreover, we explored the association of hub genes with the survival prognosis of MDS patients. Finally, we investigated the diagnostic efficacy of prognostic hub genes in MDS patients.

Datasets
We searched the NCBI gene expression omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) using the keywords 'myelodysplastic syndromes ', 'bone marrow', 'CD34 + cells' ' CD34 + cells in MDS', 'bone marrow in MDS' and 'transcriptomes in MDS', and identified a five gene expression datasets of CD34 + cells that are associated with MDS of human. We downloaded and integrated the GSE4619 (n = 66) [9], GSE19429 (n = 200) [10], GSE30195 (n = 19) [11], and GSE58831 (n = 176) [4] to identify the DEGs in the MDS. The platform of these datasets is Affymetrix human genome U133 plus 2.0 array. The integrated dataset (combining four datasets) has 461 samples, including 412 MDS samples and 49 healthy controls. Then, we used the GSE114922 (n = 90) [12] for validating the expression differences of key genes between the MDS and control samples. The platform of the GSE114922 is GPL20301 (Illumina HiSeq 4000 (Homo sapiens)). All data were transformed into log2-base normalization. For analyzing the survival differences between the two groups of patients, we downloaded the clinical data of GSE58831 [4].

Identification of differentially expressed genes (DEGs)
We used the web tool NetworkAnalyst software [13] to identify the DEGs between MDS and normal samples. The multiple probes of a single gene were averaged and normalized by log2 normalization. We reduced the batch effects of multiple datasets by using the ComBat method [14]. A total of 19,547 common genes were found after integrating the four datasets. We employed the R package 'limma' to identify the significant DEGs between MDS and normal samples [15]. We identified the DEGs with a threshold absolute value of combined effect size (ES) >0.585 and adjusted P-value≤0.05.

Gene ontology and gene-set enrichment analysis
We performed the DEGs' functional and gene-set enrichment analysis by using the GSEA [16]. We used the GSEA tool for identifying the gene ontology terms and KEGG pathways for the significant upregulated and downregulated DEGs in the MDS. The biological processes, cellular components, molecular functions, and KEGG [17] pathways that are significantly associated with the upregulated DEGs and the downregulated DEGs were identified, respectively. The FDR-value < 0.05, calculated by the Benjamini -Hochberg method [18], was considered significant when selecting the functionally enriched terms.

Identifying hub genes and gene clusters from the protein-protein interaction (PPI) network of DEGs
We constructed the PPI network using the STRING database to understand better the regulatory relationship among the identified DEGs [19]. To determine the rank of hub genes, we used Cytoscape plug-in tool cyto-Hubba [20]. Hub genes were identified based on the degree of interactions with their neighbour genes. To construct the PPI network of DEGs, we selected the minimum required interaction score of 0.40. Hub genes were defined as a gene that was connected to a minimum of 20 other DEGs in the original PPI. We visualize the PPI networks by utilizing the Cytoscape 3.6.1 software A Cytoscape plug-in molecular complex detection (MCODE) was used to detect the significant modules from the original PPI network [21]. We identified the significant modules based on the MCODE score and node number. The threshold of the MCODE analysis was MCODE score > 5.0, Node Score Cut-off: 0.2, Haircut: true, K-Core: 2, and maximum depth from Seed: 100.

Validation of Hub gene expression levels in the independent dataset
The expression levels differences of top hub genes were further validated using the GSE114922 (n = 90) [12] of MDS patients. The GSE114922 is RNA-sequencing data that was performed on bone marrow CD34 + hematopoeitic stem and progenitor cells from patients with MDS (n = 82) and healthy controls (n = 8). We used the R package 'limma' to identify the significant DEGs between MDS and healthy samples [15]. The expression differences of hub genes with P < 0.05 were considered statistically significant between the two groups.

Survival analysis of Hub genes
We utilized the R package 'survival' to investigate the survival prognosis of MDS patients. We compared the overall survival (OS) of MDS patients classified based on the expression levels of all hub genes (High expression group > mean > Low expression group). Kaplan-Meier survival curves were used to show the survival differences, and the log-rank test was utilized to evaluate the significance of survival differences. The prognostic roles of screened hub genes in the MDS were analyzed using the expression and clinical data of GSE58831 [4]. Cox P< 0.05 was taken as the significant between two groups of MDS patients.

Diagnostic efficacy evaluation for the prognostic hub genes in the MDS
To evaluate the diagnostic values of the prognostic key upregulated and downregulated hub genes, the receiver operating characteristic (ROC) curve was plotted and the area under the ROC curve (AUC) was calculated using the 'pROC' R package [22], indicating the capability to distinguish MDS and healthy control. The diagnostic roles of screened hub genes in the MDS were analyzed using the expression data of GSE58831 [4]. The more excellent AUC value of individual genes indicated the differences between MDS and normal samples, and the key gene with AUC > 0.5 in the MDS datasets was defined as a diagnostic efficiency of the gene [23].
3.2 Bone marrow of CD34+ cells-derived transcriptomes are associated with gene ontology and the enrichment of KEGG pathways We used the GSEA tool to identify the enriched gene ontology terms and KEGG pathways that were associated with identified DEGs (Figures 1 and 2). The identified gene ontology included biological processes, cellular components, and molecular functions ( Figure  1A and 1B). The upregulated genes are significantly associated with the enrichment of biological processes (Such as small molecule metabolic process, cellular macromolecule localization, establishment of protein localization, protein containing complex subunit organization, intracellular transport, regulation of intracellular signal transduction, immune effector process, lipid metabolic process, and positive regulation of biosynthetic process) (Supplementary Table  S3), cellular components (mitochondrion, envelope, vacuole, mitochondrial envelope, endosome, organelle subcompartment, Golgi apparatus, mitochondrial matrix, Golgi apparatus subcompartment,     Table  S8). Figure 1A and Figure 1B illustrated the top ten upregulated and downregulated GO, respectively. Moreover, we identified the significantly enriched KEGG pathways ( Figure 2) for upregulated and downregulated genes, respectively. The upregulated pathways are mainly associated with immune regulation (RIG-Ilike receptor signaling pathway, adipocytokine signaling pathway, chemokine signaling pathway, Fc epsilon RI signaling pathway, Toll-like receptor signaling pathway, Fc gamma R-mediated phagocytosis, natural killer cell mediated cytotoxicity, B cell receptor signaling pathway, T cell receptor signaling pathway, epithelial cell signaling in Helicobacter pylori infection, leukocyte transendothelial migration, NOD-like receptor signaling pathway, cytokine-cytokine receptor interaction), metabolisms (lysine degradation, propanoate metabolism, valine, leucine and isoleucine degradation, fatty acid metabolism, beta-Alanine metabolism, butanoate metabolism, Fructose and mannose metabolism, glycerolipid metabolism, glycerophospholipid metabolism, inositol phosphate metabolism, limonene and pinene degradation, amino sugar and nucleotide sugar metabolism, alanine, aspartate and glutamate metabolism, glycosaminoglycan biosynthesischondroitin sulfate, sulfur metabolism), cellular signaling and development (lysosome, MAPK signaling pathway, cytosolic DNAsensing pathway, peroxisome, apoptosis, endocytosis, VEGF signaling pathway, Jak-STAT signaling pathway, homologous recombination, Wnt signaling pathway, phosphatidylinositol signaling system, oxidative phosphorylation, tight junction, nucleotide excision repair, ErbB signaling pathway, basal transcription factors), cancers (acute myeloid leukemia, pathways in cancer, and pancreatic cancer) (Figure 2A and Supplementary  Table S9). Besides, we identified significantly enriched KEGG pathways that are associated with downregulated genes in the MDS. Some of these downregulated pathways are B cell receptor signaling pathway, primary immunodeficiency, cell cycle, circadian rhythmmammal, cytokine-cytokine receptor interaction, p53 signaling pathway, neuroactive ligand-receptor interaction, cell adhesion molecules (CAMs), hematopoietic cell lineage, renin-angiotensin system, Oocyte meiosis, and apoptosis ( Figure 2B and Supplementary Table S10).

STRING-based PPI analysis identifying key hub genes and modules in the MDS
We constructed the PPI of all significant upregulated and downregulated genes to identify the key hub  Table S11) and 632 downregulated genes are (Supplementary Table S12) associated with the PPI network. Then, we investigated the rank of all upregulated and downregulated genes based on the Cytoscape plug-in CytoHubba tool. We identified the168 upregulated hub genes (minimum degree of interaction is 20 with other neighbours DEGs) (Supplementary Table S11) and 52 downregulated hub genes (minimum degree of interaction is 20 with other neighbours DEGs) (Supplementary Table S12) in the MDS. The PPI networks of upregulated and downregulated hub genes were illustrated in Figure 3A and B, respectively. The top ten upregulated hub genes are STAT1, STAT3, IFIH1, EPRS, GRB2, RAC2, MAPK14, CASP1, SPI1, and TBK1. The identified top ten downregulated hub genes are CREBBP, HIF1A, PIK3CA, EZH2, PIK3R1, MDM2, IRF4, CXCR4, PCNA, and CD19. The top upregulated and downregulated hub genes and their regulatory status, names, the rank of hub genes, degree of interaction, combined ES, and adjusted P values are shown in Table 3.
In addition, we investigated the significant gene clusters for the downregulated genes. We revealed that one cluster (Supplementary Table S13), with an MCODE score of 13.935, is significantly derived from the original PPI of downregulated genes. The 32 genes are interconnected in the cluster 1 network ( Figure 5A and Supplementary Table S13). The highly connected nodes in cluster 1 are EZH2, HELLS, MKI67, CD19, PAX5, RRM2, TCF3, EBF1, CD79A, VPREB1, KIF11, NUSAP1, E2F8, and IL7R. We found that the 3 pathways, included primary immunodeficiency, B cell receptor signaling pathway, and hematopoietic cell lineage are significantly enriched in cluster 1 ( Figure 5B).

Expression validation of top hub genes in the independent dataset
To determine the aberrant expression of MDS-associated hub genes in RNA-sequence data, we investigated the expression level of top ten upregulated and top ten downregulated hub genes in the independent dataset GSE114922. Interestingly, we found that the upregulated STAT1, IFIH1, RAC2, and CASP1 are significantly upregulated in the GSE114922 when compared to the MDS patients with controls ( Figure 6). Besides, the 9 hub genes out of the top ten downregulated hub genes (CREBBP, HIF1A, PIK3CA, EZH2, PIK3R1, MDM2, IRF4, CXCR4, and CD19), are consistently downregulated in the GSE114922 (Figure 6). This result indicates that the aberrant expression of these key hub genes is associated with the pathogenesis of MDS.

Hub genes are associated with poor survival prognosis in the MDS
To find the correlation of hub genes with the survival time of MDS patients, we investigated the survival prognosis of 168 upregulated and 52 downregulated hub genes based on the mean expression of individual genes. We found that upregulated hub genes' expression levels included MX2, GBP2, PXN, IFI44, FDXR, PLCB2, ASS1, ERCC4, PML, and RRAGD are significantly correlated with shorter survival times of MDS patients ( Figure 7A). On the other hand, the lower expression level of downregulated hub genes, including CD19, PAX5, TCF3, LEF1, NUSAP1, and TIMELESS are significantly associated with the shorter survival time of MDS patients ( Figure 7B). Since, these hub genes are involved in the PPI network with other neighbour genes, indicating that the prognostic hub genes are associated with the clinical outcome of MDS patients through the PPI-based signaling in the cellular environment. Interestingly, the functional enrichment analysis revealed that the sixteen hub genes are related to the enrichment of acute myeloid leukemia, melanogenesis, Wnt signaling pathway, chemokine signaling pathway, and pathways in cancer (P

Discussions
Genome-wide expression profiling is the fundamental step to identify the differentially expressed genes in disease conditions [24]. We combine the four microarray gene expression profiling datasets into a single integrated dataset for investigating the differential expression of genes in the MDS relative to the healthy controls. We revealed that the 1619 upregulated (Supplementary Table S1) and 785 downregulated genes (Supplementary Table S2) in the MDS patients by a meta-analysis (Combined ES > 0.585, Adjusted P < 0.05). Previous studies consistently explored that CD34 + cells-derived transcriptomes are associated with MDS and acute myeloid leukemia. For example, the expression level of the ALDH3B1 gene is elevated in the MDS [10]. E2F6, another upregulated gene, is associated with CHK proteins in cellcycle checkpoint control and consistently deregulated in the MDS [10]. Deregulated expression of IRF4 is associated with acute myeloid leukemia and myelodysplastic syndrome susceptibility [25]. AKAP12, a downregulated gene with tumour suppressor properties, is related to the myeloid malignancies [26].
It was previously reported that the functional enrichment analysis, including gene ontology and the enrichment of pathways, are substantially associated with significant DEGs in the MDS [5,8]. We found that the upregulated and downregulated DEGs are significantly related to various biological processes, cellular components, and molecular functions, respectively ( Figure 1). Since deregulated pathways are critically associated with the molecular pathogenesis of MDS [27], we identified the KEGG pathways that are significantly associated with identified DEGs. The upregulated pathways are mainly involved in immune system deregulation, metabolic abnormalities, and aberrant cellular signalling and developmental process (Figure 2A and Supplementary Table S9). In addition, the downregulated genes are aberrantly involved with the deregulation of immunity, cancer, and cellular signalling in the MDS (Figure 2A and Supplementary Table S10). A. Pellagatti et al. demonstrated that the Wnt pathways, immunodeficiency, apoptosis, and chemokine signaling are the deregulated pathways in the MDS [10]. Ying le et al. reported that hematopoietic cell lineage is deregulated in the Table 3. The top ten upregulated and top ten downregulated hub genes were extracted from the original PPI. The hub genes were identified by using the Cytoscape plug-in CytoHubba tool. MDS [5]. B cell receptor signaling pathway is associated with shared gene signature between MDS and AML [28]. Altogether, these results indicated that the aberrantly expressed genes are involved with the deregulation of signaling pathways and key biological functions in the MDS. Various human diseases are originated from the result of abnormal protein-protein interactions [29]. We constructed and identified the key interacting genes from the PPI network of DEGs (Figure 3). We found that the STAT1, STAT3, IFIH1, EPRS, GRB2, RAC2, MAPK14, CASP1, and SPI1 were the top upregulated hub genes (Supplementary Table S11) and CREBBP, HIF1A, PIK3CA, EZH2, PIK3R1, MDM2, IRF4, CXCR4, PCNA, and CD19 (Supplementary Table  S12)   and MDS [30]. In stem cells of MDS patients, the master myeloid differentiationdriving transcription factor SPI1 was expressed at higher levels [31]. In our expression validation analysis, we checked the expression differences of the top 10 upregulated and top 10 downregulated hub genes in the independent dataset (GSE114922). Interestingly, we revealed that the thirteen genes are consistently deregulated in the validation cohort ( Figure 6), indicating that the expression of these genes is consistently deregulated in the multiple datasets. Some previous studies reported the consistent results. For example, the elevated IFN-γ/ STAT1 signaling is associated with MDS progression and treatment response [32]. Rac2 may play a crucial role in the initial development of myelodysplastic syndrome [33]. The expression of CREBBP is associated with hematopoietic malignancies, including myelodysplastic syndrome (MDS) [34]. Variants in IRF4 related to acute myeloid leukemia and MDS susceptibility [25]. Anke C. Spoo et al. suggested that CXCR4 should be incorporated into the risk assessment of AML patients [35]. The expression level of CD19 is downregulated and one of the crucial genes in the MDS [36]. Since genes with similar characteristics have tended to interact with each other through PPI networks in the diseases/disorders, clustering analysis provides the information to predict human disease-related gene clusters/ modules [37]. We investigated the significant gene clusters (MCODE score > 5.0) in the MDS, and we revealed that these gene clusters are substantially related to the enrichment of KEGG pathways in the MDS (Figures 4 and 5 and Table 4), indicating that these gene clusters are crucially associated with the pathogenesis of MDS. Furthermore, we analyzed the association of hub genes with the survival prognosis of MDS patients. The upregulation of MX2, GBP2, PXN, IFI44, FDXR, PLCB2, ASS1, ERCC4, PML, and RRAGD hub genes are significantly correlated with shorter survival times of MDS patients ( Figure 7A) and the downregulation of CD19, PAX5, TCF3, LEF1, NUSAP1, and TIMELESS ( Figure 7B) are significantly correlated with the overall survival prognosis of MDS patients. It was stated that the overexpression of PXN is associated with poor prognosis in AML patients [38]. M. A. ElBaiomy et al. concluded that the higher expression level of LEF1 is a favourable prognostic marker in the AML patient risk [39]. Finally, we found that these survival-associated hub genes have diagnostic efficacy in the MDS when compared with healthy control (Figure 8). Altogether, this study provides the clue that these deregulated sixteen genes (MX2, GBP2, PXN, IFI44, FDXR, PLCB2, ASS1, ERCC4, PML, RRAGD, CD19, PAX5, TCF3, LEF1, NUSAP1, and TIMELESS) may be associated with the prognostic and diagnostic value of MDS patients. The aberrant expression level of CD19 is indicated as the marker of monocytes lineage in acute myelogenous leukemia [40]. In hematolymphoid, it was demonstrated that the lower expression of PAX-5 helps to establish B-cell lineage and increases diagnostic yield [41]. In the myelodysplastic syndromes, downregulation of the LEF1 is associated with disease progression [42]. The major limitation of our study is that the deregulated key genes and networks identified by bioinformatics analysis have not been validated by experimental research in the MDS patients. Another drawback of our study is that MDS is a group of heterogonous diseases, the different classification subgroup and the different prognosis subgroup should have other hub genes. However, we did not identify the hub genes in the different subtypes of MDS patients. Thus, although our findings could provide potential biomarkers for the MDS prognosis and diagnosis, as well as therapeutic targets, further experimental and clinical validation would be warranted to transform these findings into the clinical application of MDS and different subtypes of MDS patients.

Conclusions
This bioinformatics analysis provided compelling evidence about the deregulated CD34 + cells-derived hub transcriptomes that are associated with an inferior prognosis and diagnostic efficacy in the MDS patients. The association of these prognostic-hub genes should be experimentally validated in different subtypes of MDS patients. In summary, identifying key hub genes and their association with the prognostic and diagnostic efficacy may provide substantial clues for the diagnosis and treatment of MDS patients.

Disclosure statement
No potential conflict of interest was reported by the author(s).