Screening and identification of key candidate genes and pathways in myelodysplastic syndrome by bioinformatic analysis

Myelodysplastic syndrome (MDS) is a heterogeneous hematologic malignancy derived from hematopoietic stem cells and the molecular mechanism of MDS remains unclear. This study aimed to elucidate potential markers of diagnosis and prognosis of MDS. The gene expression profiles GSE19429 and GSE58831 were obtained and downloaded from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) in MDS were screened using GEO2R and overlapped DEGs were obtained with Venn Diagrams. Then, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway functional enrichment analyses, protein–protein interaction network establishment and survival analyses were performed. Functional enrichment analysis indicated that these DEGs were significantly enriched in the interferon signaling pathway, immune response, hematopoietic cell lineage and the FOXO signaling pathway. Four hub genes and four significant modules including 25 module genes were obtained via Cytoscape MCODE. Survival analysis showed that the overall survival of MDS patients having BLNK, IRF4, IFITM1, IFIT1, ISG20, IFI44L alterations were worse than that without alterations. In conclusion, the identification of these genes and pathways helps understand the underlying molecular mechanisms of MDS and provides candidate targets for the diagnosis and prognosis of MDS.


INTRODUCTION
Myelodysplastic syndrome (MDS) is a clonal hematopoietic stem cell (HSC) disease, mainly involving cytogenetic changes and/or genetic mutations, and there is also a widespread gene hypermethylation at advanced stage (Adès, Itzykson & Fenaux, 2014). The main features of MDS are myeloid cell cytopenias, morphologic dysplasia, ineffective hematopoiesis and a high risk of transformation to acute myeloid leukemia (AML) (Weinberg & Hasserjian, 2019). Although there are many new drugs for the treatment of MDS in recent years, about one-third of patients with MDS experience transformation to AML and the overall survival (OS) of MDS remains not ideal (Duchmann & Itzykson, 2019). There is accumulating evidence that MDS are associated with abnormal genetic These two gene expression datasets are based on the GPL570 platform (HG-U133_Plus_2 Affymetrix Human Genome U133 Plus 2.0 Array). GSE19429 dataset contained 183 MDS patients and 17 healthy controls. GSE58831 dataset contained 159 MDS patients and 17 healthy controls. Bone marrow samples were obtained and CD34+ cells isolated from MDS patients and healthy controls.
Identification of DEGs and data processing GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) is an interactive web tool that can compare two or more sets of samples from the GEO series to identify different genes under different experimental conditions. The DEGs between MDS patients and healthy controls samples were screened using GEO2R. Probe sets without corresponding gene symbols were removed. In this study, P-value < 0.05 and (logFC (fold change)) ≥ 1 were considered statistically significant. I also used the online tool Venn diagrams (http://bioinformatics.psb.ugent.be/webtools/Venn/) to identify the overlapped DEGs in the GSE19429 and GSE58831 datasets.
GO and KEGG pathway enrichment analyses DAVID (http://david.ncifcrf.gov) is a bioinformatics database that integrates biological data and analysis tools to provide systematic, comprehensive biological functional annotation information for large-scale genes or proteins. I used DAVID to perform a functional enrichment analysis, which included GO and KEGG pathway analyses. GO term enrichment analysis includes molecular function (MF), cellular component (CC) and biological process (BP). KEGG is a database resource that combine of genomic information and higher-level functional information by computerizing processing of known BPs in cells and systematically analyzing the existing gene functions. Pathway analysis can graphically show intracellular BPs such as metabolism, membrane transporting, signaling and cell growth cycles. P-value < 0.05 was considered statistically significant.

PPI network establishment and module analysis
Search Tool for the Retrieval of Interacting Genes (STRING, http://string-db.org) is an open, free online system that provides predicted and experimental interactions of genes and proteins. I used STRING online database to predict and construct the PPI network. In the present study, STRING was used to construct PPI network of DEGs for MDS and the confidence score of the interaction >0.4 was considered statistically significant. Cytoscape is a public software that can graphically display the network and analyze and edit. In this study, I used cytoscape to annotate and beautify the up-downregulated genes of PPI. Molecular Complex Detection (MCODE) is an APP of a plug-in Cytoscape to find densely connected regions for clustering a given network. The PPI networks were mapped using Cytoscape and the significant modules in the PPI networks were identified using MCODE. The screening criteria for the module genes were as follows: degree cutoff = 2, node score cutoff = 0.2, k-score = 2 and Max depth = 100. Subsequently, I used STRING to perform GO and KEGG pathway enrichment analyses on the module genes.

Hub genes selection and co-expression analysis
The screening of hub genes is mainly carried out through the MCODE plug-in of cytoscape. CBioPortal (http://www.cbioportal.org) is an online analysis platform for multidimensional cancer genomic data that visualizes genes, samples and data types. In this study, I constructed a network of hub genes and their co-expression genes via cBioPortal. Then I used the BiNGO of Cytoscape plug-in to perform BP analysis and visualization of the hub genes. Finally, I performed hierarchical clustering of the hub genes through UCSC Cancer Genomics Browser (http://genome-cancer.ucsc.edu).

Survival analysis
The OS analysis of hub genes and module genes were performed using Kaplan-Meier curve in cBioPortal. P-value < 0.05 was considered statistically significant.

Identification of DEGs in MDS
I obtained gene expression profiles from the GSE19429 to GSE58831 datasets of CD34+ cells of bone marrow samples from MDS patients and healthy controls and analyzed DEGs using GEO2R. As shown in Figs. 1A and 1B, I identified 179 and 500 DEGs from GSE19429 to GSE58831, respectively. The overlap among the two datasets between MDS patients and healthy control contained 133 genes as shown in the Venn diagram (Fig. 1C).

GO and KEGG pathway enrichment analyses of DEGs
I used DAVID to perform and analyze the biological classification of DEGs, which included GO analysis and KEGG pathway enrichment analysis. GO analysis showed that the function of upregulated genes in DEGs were mainly enriched in type I interferon signaling pathway, defense response, response to external stimulus, immune effector, cell surface receptor signaling pathway, cytokine-mediated signaling pathway, regulation of cell action potential, sensory perception, immune response, Notch signaling pathway (Table 1). Downregulated genes in BP were mainly enriched in immune response, cell surface receptor signaling pathway, cell activation, cell differentiation, regulation of immune system, regulation of macromolecule metabolic, cell-cell adhesion, biosynthetic process, cell proliferation, cell death, specific DNA binding, apoptotic process, regulation of metabolic process, epithelium development, blood vessel morphogenesis, cell migration, muscle tissue development. For CC, downregulated genes were mainly enriched in side of membrane and extracellular space. For MF, enrichment of downregulated genes was primarily in transcription factor activity, receptor binding, signal transducer activity, steroid hormone receptor activity, hormone receptor binding, core promoter binding, macromolecular complex (Table 1). KEGG pathway analysis revealed that enrichment of DEGs was mostly in primary immunodeficiency, hematopoietic cell lineage, FOXO signaling pathway, hippo signaling pathway, transcriptional misregulation in cancer (Fig. 2).

PPI network establishment and module analysis
The PPI network of DEGs was established based on the STRING online database and Cytoscape software (15 upregulated and 103 downregulated) (Fig. 3A). There were 133 DEGs in the two datasets, however, 15 intersection genes were not found in the STRING. We obtained four significant modules based on the degree of importance by utilizing cluster analysis of the PPI network in Cytoscape MCODE. Module 1 contained 12 nodes and 53 edges (  (Fig. 3E). The enrichment analyses of genes involved in these modules were preformed using DAVID. As showed in Table 2, GO analysis indicated these genes in modules were mainly concentrated in immune response, immune cell differentiation, side of membrane, transcription factor, DNA binding, cell adhesion, hemopoiesis, cell activation, defense response, cell surface receptor, DNA metabolic, kidney development. As for KEGG pathway analyses, module genes mainly enriched in primary immunodeficiency, hematopoietic cell lineage, B cell receptor signaling pathway, FOXO signaling pathway.

Hub genes selection and biological analysis
In study, a total of four hub genes were identified by MCODE, as IFI44L, IRF4, IL7R, SKIL. I constructed a network of hub genes and their co-expression genes via cBioPortal, but did not find the co-expressed gene of IFI44L in cBioPortal (Fig. 4A). Then I analyzed the BPs of hub genes through BINGO as showed in Fig. 4B. Hierarchical clustering of hub genes was constructed using UCSC and blood derived cancers and healthy controls were roughly identified by hub genes (Fig. 4C).

Survival analysis
Kaplan-Meier curve was employed to predict the prognosis of the 25 identified module genes. First I had a simple understanding of these module genes, such as the full name, abbreviation and function of the genes (Table 3). Among the genes examined, MDS patients having BLNK, IRF4, IFITM1, IFIT1, ISG20, IFI44L alterations showed worse OS (Fig. 5).
Although the difference between IFIT1 and IFI44L was not statistically significant, the OS of patients having IFIT1 and IFI44L alteration showed an obvious decline tendency.

DISCUSSION
MDS is a type of stem cell-derived bone marrow clonal disease with a relatively uneven presentation profile. The incidence of MDS increases with age, the onset of MDS before the age of 40 years is rare. However, when the age is older than 70, the incidence rate exceeds 50 per 100,000 people. In recent years, due to the gradual increase of the population aging problem, the number of patients with MDS has also increased year by year. The exact cause of MDS is uncertain and its relatively risk factors include chemotherapy or radiation therapy and tobacco or poisonous solvents or agricultural chemicals (Tefferi & Vardiman, 2009). In recent years, molecular biology and immune regulation have become increasingly important in MDS (Hosono, 2019;Liu et al., 2019). MDS, also known as pre-leukemia, is characterized by a high risk of transformation to AML. This may be one of the causes of poor prognosis in MDS, so identifying potential markers for MDS diagnosis and prognosis is extremely urgent.
Recently, gene chips have been widely used in research and the corresponding data have been stored in public databases. A number of studies have been conducted based on microarray data profiles to elucidate the mechanisms of disease. But if there is only one data, errors may occur. Therefore, combining bioinformatics methods with expression profiling techniques can partially reduce bias. In this study, I analyzed the expression of genes in two mRNA microarray datasets based on CD34+ cells of bone marrow samples from MDS patients and healthy controls. A total of 133 DEGs were identified, including 15 up-regulated genes and 103 down-regulated genes, however, 15 DEGs were not found in STRING. GO and KEGG pathway enrichment analysis was performed by     (Grivennikov, Greten & Karin, 2010;Hanahan & Weinberg, 2011). In a mouse model, IFN-γ-induced apoptosis gene expression promotes destruction of HSCs and progenitor cells (Chen et al., 2015). Interferons are proteins secreted by human immune cells in response to infections (mainly viral). Most of the genes I have identified are immune genes and/or related to the interferon pathway. Therefore, I speculate that the pathogenesis of MDS may be related to immune factors, similar to the study by Silzle et al. (2019) which showed that lymphopenia has a significantly reduced OS in low-risk MDS patients, these data subtly support the role of the immune system in MDS.
In the current study, I selected IFI44L, IRF4, IL7R, SKIL4 as hub genes with the highest degrees. IFI44L, IRF4 and IL7R are immune genes and/or related to the interferon pathway. IFI44L, a type I interferon-stimulated gene, has been shown to be involved in human carcinogenesis and tumor progression (Gao et al., 2013). Pellagatti et al. (2018Pellagatti et al. ( , 2010 showed that IFI44L was upregulated by more than twofold in at least 60% of the MDS patients and indicated that IFI44L is one of the significant survival predictors in MDS. A recent study found that down-regulation of IRF4 is a necessary and independent effector of immunomodulatory drugs in primary effusion lymphoma (Patil, Manzano & Gottwein, 2018). IRF4 has oncogenic implications in a variety of hematological tumors including multiple myeloma, leukemia and lymphoma (Agnarelli, Chevassut & Mancini, 2018;Cherian et al., 2018;So et al., 2014;Zhang et al., 2013). IL7R is over expressed in adult acute lymphoblastic leukemia (ALL) and is associated with JAK/STAT pathway mutations (Gianfelici et al., 2019). In contrast, IL7R is down-regulated in MDS and this differential expression may be contributed to identifying MDS subtypes and leukemia (Pellagatti et al., 2004). SKIL, also known as SNON, promotes cell invasion of immortalized human mammary epithelial cells. These differential expression could be either cause of the ailment or an effect (or an indirect effect).
The PPI network of DEGs was established and four modules were considered significant in this network. A total of 25 genes were identified in the module models and the functions of these genes were analyzed and Kaplan-Meier curves were employed to predict prognosis. Moreover, the BLNK, IRF4, IFITM1 and ISG20 alteration are significantly associated with worse OS. While the IFIT1 and IFI44L alteration were not statistically significant, they showed an obvious downward tendency. IFI44L and IRF4 have been introduced previously. BLNK (B Cell Linker Protein), which is a selective target of repression, can result in the development of ALL by differentiation block of PAX5-PML protein (Imoto et al., 2016). The results of Nakayama et al. (2009) are consistent with this and somatic cell loss and mutation of BLNK lead to the generation of pre-B-cell leukemia. IFIT1 and IFITM1 are up-regulated in the CD34+ cells of most patients with MDS. IFIT1 and IFITM1 are interferon-stimulated genes (ISG) and up-regulation of IFIT1 is a potential diagnostic marker for MDS (Pellagatti et al., 2006). The ISG20 gene was first introduced by Gongora et al. (1997) as a promyelocytic leukemia nuclear body (PML-NB)-associated protein. The expression of ISG20 is reduced in IDH mutant tumors and ISG20 can promote local tumor immunity and may serve as a potential therapeutic target of human glioma (Gao et al., 2019). This is very interesting: all these identified genes are somehow related to interferons and/or immune response. However, due to a shortfall of the underlying data acquisition technology, false positive results may occur, which is also the biggest flaw of this paper. Next, I plan to collect samples of MDS patients and verify the conclusions of this paper through experiments.

CONCLUSION
In conclusion, this study aimed to perform a comprehensive bioinformatics analysis of datasets for MDS patients and healthy controls. Identify DEGs that may be involved in MDS progression and prognosis. Interestingly, most of prognostic genes and hub genes are interferons and immune-related genes. However, further studies are warranted to elucidate the biological function of these genes in MDS.