Identification of Novel Therapeutic Targets in Myelodysplastic Syndrome Using Protein-Protein Interaction Approach and Neural Networks

A Myelodysplastic syndrome (MDS) is a disorder characterized by active but ineffective hematopoiesis that leads to pancytopenia. MDS, also termed as myeloid neoplasms, is described by different level of cytopenia that is a different level of blood cells in the body. Various genes mutations have been reported to associate with MDS. To investigate the mechanisms at molecular level underlying MDS patients carrying genetic mutations, the gene expression profiles of MDS the patients were compared to that of healthy individuals and analyzed by bioinformatics tools. In biological networks, genes having important functional roles can be identified by a measure of the node. Networks of genes an in co-expression, candidate hubs also called extremely associated genes have been connected with the key disease-related pathway. Thus, this technique was used to discover the MDS related genes hub. Affymetrix Human Genome U133 plus 2.0 gene expression dataset of microarray GSE58831 was retrieved from GEO (Gene Expression Omnibus) database that contained four 159 diseased samples and 17 samples of control. Based on statistical method and co-expression networking, DEGs gene was detected. DAVID an online tool was employed for Gene ontology (GO) function and KEGG pathway enrichment analysis of DEGs. Besides, PPI (Protein-protein interaction) networks were developed by mapping the DEGs with respect to protein-protein interaction set available in databases for the identification of the pathways involving DEGs. PPI interaction networks were divided into subnetworks via MCODE algorithm and were examined by Cytoscape. Interferon Signaling Pathway, cellular response to zinc ions and negative growth regulation. Immune response, negative regulation of transcription from RNA polymerase II promotor, positive regulation of smooth muscle cell proliferation and cellular response to Dexamethasone stimulus, extracellular matrix, extracellular space, and extracellular region were the main enriched processes and pathways in these DEGs and many of the hub genes’ (UBC, TP53, EGFR, GADPH, CREBBP, HDAC1, STAT1, IL6, ESR1, SMAD4) reported in this study were purposed as novel therapeutic targets against MDS disease. Identification of Novel Therapeutic Targets in Myelodysplastic Syndrome Using Protein-Protein Interaction Approach and Neural Networks


Introduction
A Myelodysplastic syndrome (MDS), also known as preleukemia, is a state of disease characterized by active but ineffective hematopoiesis leading to pancytopenia [1]. Fatigue, breath shortness, paleness, bleeding, and rashes are the general symptoms observed in patients with MDS and also known as myeloid neoplasms, described by different level of Cytopenia that is a different level of blood cells in the body. Association of Cytopenia with dysplasia usually led to acute myeloid leukemia. Epidemics of MDS are frequently reported in older individuals [2,3]. Many factors have been reported as causes of MDS including gene mutation that is broadly considered as a major factor contributing to MDS. Gene mutations result in genotypic alteration and thus, lead to cytogenetic shifts in gene expression. These cytogenetics shifts are usually characterized by abnormal transcription of the gene, epigenetic, cell signaling and effects of gene dosage. In addition, many frequent cytogenetic aberrations were observed including a long arm of chromosome 7, 20 and 5 that lead to a complex karyotype. In MDS, most commonly mutated genes are of RNA splicing regulators and epigenetic modifiers, along with pathways of signal transduction and transcription factors have been frequently targeted towards this syndrome [4][5][6]. MDS was reported to occur because of mutations in different genes primarily includes SF3B1, SRSF2, ZRSR2, U2AF1, DNMT3A, EZH2, TP53, RUNX1, and TET2. SF3B1 and demonstrated independent expression with low mutation frequency, reduced expression of TET2 in the stem and progenitor cells, and increased DNA methylation activity in MDS [7]. Mutations in TET2 occurred with same frequency in MDSs [8]. Different mutations were identified but a mutation in SF3B1 gene was concluded as a most important factor. MDS epidemics due to a mutation in SF3B1 gene contributed a total of 19.9% of all the reported cases. Patients with SF3B1 gene were usually reported with more complications during the lifespan in comparison to the patient with MDS due to other risk factors. SF3B1 gene was also documented with mutations in a variety of other tumor types [9]. According to IPSS (International Prognostic Scoring System), MDS was categorized in different risk groups, such as primary and secondary MDS [10]. Abnormalities were observed like clonal karyotype that formed nearly 40-50% of primary MDS and about 90% of secondary MDS [11]. To date, no inclusive treatment other than Azacitidine, Decitabine, and Lenalidomide is available in the markets and approved by Food and Drug Administration of United States (US) for MDS. However, allogenic therapy was reported as conclusive but currently, less than 10% patients undergo such stem cell transplant [12]. Pellagatti [13] detected several downregulated genes and gene pathways in MDS using gene expression profiling [13][14][15][16][17][18]. Recently Gerstung et al. [19] described a huge mutation in 738 patients with MDS and presented a comprehensive map for the mutational landscape of myelodysplasia screen in 111 cancer genes. Expression changes were typically lower than 10% as a reoccurrence of mutation and could not be reliably mapped in unknown and small subgroups. The genomics data reported by Gerstung et al. [19] characterized the gene expression profiles of 159 MDS patients by comparing with 17 normal individuals to explore the expression pattern of the genome in MDS affected individuals [19].
To date, the only agent with high efficacy such as Hypo ethylating agent (HMA) has been employed for the treatment that improved clinical outcomes in 40-60% of patient, however, no universal and inclusive drugs are till date available against MDS. Therefore, we used microarray gene expression data reported by Gerstung et al. [19] to discover the potential drug targets. This study follows the statistical test models along with genes enrichment and protein-protein interaction analysis to identify the novel drug targets as a treatment for MDS. This examination may facilitate and give a better understanding of the detailed molecular mechanisms underlying MDS and thus, assist in the selection of suitable and effective treatment strategies for patients with MDS.

Microarray data
Gene expression profiling data of MDS was download from GEO (Gene Expression Omnibus) [20] reported by Gerstung et al. [19], under accession number GSE58831 [19]. This dataset was based on Affymetrix Human Genome U133 plus 2.0 Array GPL570 platforms. A total of 176 samples were reported in this dataset, that includes 17 normal candidates and remaining 159 candidates carried different mutations for MDS. Quantile normalization [20] was carried out to normalize the dataset via integrated GCRMA package of R v3.0.2 [21,22]. On the basis of Gerstung et al. [19] reported information, data were divided into two groups as normal and diseased ( Table 1).

Identification of DEGs
For the identification of differentially expressed genes, statistical tools were applying to compare the normal samples with the diseased samples to understand the molecular markers disturbed in MDS. Student's t-test, Pearson correlation test, and Benjamin-Hochberg method were also applied for multiple testing via R v 3.0.2 as a standardized method to identify the database of essential genes (DEGs). The parameters were fixed for identification of these essential genes contained Benjamin-Hochberg [22] multiple testing methods (FDR <0.05) with fold-change ≥ 1 and the adjusted p value <0.05.

Gene ontology and pathway enrichment analysis of DEGs
Gene ontology (GO) analysis such as GO Biological Processes, Molecular Function and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways analysis are the most common and useful annotation of different Genes and its products. Attributes of high throughput genomics and transcriptomic data could be obtained through GO [23,24]. For the functional analysis, DAVID is an essential and most frequent functioning online server which can functionally annotate genes with high success [25]. Here, DAVID was used with a P value <0.05 to annotate the functional role, KEGG pathways and GO enrichment analysis of the identified DEGs [26].

PPI Network Generation
Cytoscape [13] is reliable software for the construction, mapping, visualization, and analysis of protein-protein interaction (PPI) networks. It works parallel with large databases which provides information regarding protein-protein, protein-DNA, and genetic interactions. STRING [27], BioGrid [28], GeneMANIA [29] for the retrieval of protein interactions. In the present study, we used GeneMANIA, STRING, and BioGrid to retrieve the interactions and construction of Protein-protein interaction network for the identified DEGs. Many topological parameters are available to analyze and compare the network. Cytoscape is freely available software which also provides an integrated function "Network Analyzer" to analyze the gene/protein network. We also used "Network Analyzer" to calculate the parameters for all the constructed networks. The primary parameters which were analyzed include power law of node distribution, distribution of node degree, clustering coefficient, network centralization and density to distinguish the three constructed networks [30].

Hub Genes Identification and Molecular Complex Detection Analysis of DEGs
A number of plugins are available for Cytoscape to perform different analyses. Identifying hub genes which can be employed as probable drug targets were identified using a well-known integrated plugin Cytohubba [31]. Cytohubba provides the eleven methods of topological analysis comprising degree, Edge Percolated Component, Maximum Neighborhood Component, Density of Maximum Neighborhood Component, Maximal Clique Centrality and six centralities (Bottleneck, Eccentricity, Closeness, Radiality, Betweenness, and Stress) based on shortest paths. It uses ranking features to rank different nodes in a network and based on their values hub genes are reported. "Molecular Complex Detection" (MCODE) is a novel clustering algorithm which identifies sub-modules, as shown in Figure 1, in large PPI networks. It allows fine-tuning of clusters of interest for protein networks. We used MCODE [32] along with Cytohubba to identify the interconnected dense sub-modules in the network. The hub genes and sub-modules were subjected to enrichment analysis again for subsequent verification using BinGO [33], an integrated app in Cytoscape.
Step 2: Input the data for training, the interrelated values of input and output execute for training using neural network algorithms.

Start
[1] Define the sample input and define buffer to store all the samples.
[5] Input one sample and if the desired job output for the sample is a new process; there are no knowledge datasets under this process. Assign a new process and put new knowledge datasets under this process, Go to step 6.
Step 4: Calculate the neurons of output, every neurons output signals calculated using net j =∑ i=1~m w ji x i +b j and sigmoidal function is making use of change net j for every neuron of hidden layers.
Step 5: Signal of output layers calculation using net k =TV k +δ K L .
Where TV k is target value of output neurons and δ K L is the error of neuron.
Step 6: Compute the error of neuron k until network is congregate and the error is computed using SSE=∑ i=1~n (T i -Y i ) 2 . Where T i is actual assessment and Y i is estimated assessment.

DEGs analysis and co-expression network
The complex biological system is composed of thousands of genes and its products. Genes and respective products interacted randomly and formed a more complicated network. The expression of these genes and respective proteins performed and conveyed much information such as signaling, transport, immunity, and defense and disease susceptibility. Microarray expression analysis may pose many regular variations. Herein, we used "Array Quality Metrics" and "GCRMA" to perform the normalization. Comparing two types of datasets, control and diseased, using statistical analysis; Student's t-test, Pearson correlation test and Benjamin-Hochberg methods, we identified a total of 585 DEGs, wherein the 224 and 361 genes were recorded as Upregulator and down-regulator genes, respectively.

GO function and KEGG pathway enrichment analysis
GO analysis approach is the easy and accessible approach for the functional annotation of genomics data. Using DAVID database, we explored the functional changes in a patient with MDS. GO enrichment analysis was performed using the identified DEGs results into a diverse array of processes and functions as shown in Table 2. Biological processes of up regulated genes were mainly found in Interferon Signaling Pathway, Cellular Response to Zinc Ions and negative regulation of growth. Immune Response, negative regulation of transcription from RNA polymerase II promotor, positive regulation of smooth muscle cell proliferation and cellular response to Dexamethasone stimulus were the enriched biological processes in down regulated DEGs. While only molecular function in down regulated DEGs described the transcriptional activator activity, RNA polymerase II core promotor proximal region sequence-specific binding with FDR 0.04. Furthermore, up regulated DEGs were found in extracellular matrix, extracellular space, and extracellular region part. The analysis of KEGG pathway enrichment revealed that down regulated DEGs were mainly involved in only pathway primary for immunodeficiency while up regulated genes were not enriched in any of KEGG pathways (Table 3).

Protein-Protein Interaction Network
A powerful tool to identify and understand the mechanism of cellular networking is protein-protein interaction network. It provides the basic understanding of both healthy and diseased conditions by understanding the distortion in protein cellular network. Nodes are proteins/genes while their interactions are edges in the protein-protein interaction networks. Herein, DEGs obtained from the analysis of microarray gene expression dataset of MDS were mapped and a protein interaction network was constructed.
A total of 585 DEGs were mapped onto protein-protein interaction network in Cytoscape. These DEGs were retrieved from available public databases such as GeneMANIA, STRING, BioGrid and hub genes were identified using a Cytoscape app, Cytohubba. Based on the degree of connectivity between nodes and hub genes were identified as well as ranked. The red nodes were highly connected nodes while the others were low and medium. Cytohubba was used to calculate the degree of each node in the network, that detected the hubs as nodes with degree value >10 (Table 4). The complete DEGs network was divided into a highly dense interconnected module using MCODE algorithm. A total of 9 modules (Sub-networks were identified) using the k-core value of 2.0, node score cut-off of 0.2, maximum depth from the seed node of 100 and graphics processing-unit-based parallelization was employed to find modules efficiently. Only sub-networks with a number of nodes greater than 15 were selected ( Figure 2). Finally, functional enrichment of sub-networks was carried out. First, hub genes were checked in subnetworks and presence of hub genes in the sub-networks revealed that our results were reliable. Figure 2 showing hub genes which mean nodes with high degree. Red nodes are highly connected genes, yellow and orange color are medium and low connected genes. Subnetworks 1,2,3 and 4 were enriched in GO terms related to chemical component extracellular region part, extracellular matrix. Subnetworks 1 and 2 were enriched in Immune response, Negative regulation of transcription from RNA Polymerase II promotor. Subnetwork 3 and 4 were enriched in the Positive regulation of smooth muscles cell proliferation, Cellular response to Zinc ion. P-values of all the enriched GO terms were in the range of 1.50E-07 to 8.78E-07 (Table 5 and Figure 1).

Interaction of Hub Nodes with Interaction
Calculate the neurons of output, every neurons output signals calculated using neural network algorithms where Table 6 indicate the best optimized proteins interactions.

Discussion
Protein-protein interaction network has become a powerful tool for identification of targets and analysis of different diseases. In the current era, PPI network analysis has been widely utilized to understand the mechanism of different diseases, identifying drug targets and metabolic process. Analysis of gene expression dataset and identification of differentially expressed genes in a disease condition compared to the normal run a way of targeting different nodes for the discovery of novel drug candidates. Here, we used microarray gene expression dataset submitted to GEO under accession number GSE58831. Different statistical tools were used such as, Student's t-test, Pearson correlation, and Benjamin Hochberg multiple testing method (FDR<0.05 with a fold change>_1) and adjusted P-value 0.05) for the identification of DEG, that result into a total of 585 differentially expressed genes, in which 361 were downregulated and 224 were upregulated. Among the downregulated genes, RAG1 (recombination activating gene 1) was found to be the most downregulated one with a -4.69-fold change followed by MME (Membrane metallic-endopeptidase) and ARPP21 (cAMP-regulated phosphoprotein 21) with 444 Fold change and -4.43, respectively. Of the identified upregulated, DEGsHBG2/// HBG1 (hemoglobin subunit gamma 2///hemoglobin subunit gamma 1), HBG2///HBG1 (hemoglobin subunit gamma 2///hemoglobin subunit gamma 1) and HBG2///HBG1 (hemoglobin subunit gamma

EGFR
Epidermal growth receptor factor 108 Positive regulation of smooth muscles cells Proliferation.

Histone Deacetylase 1 82
Negative regulation of transcription from RNA polymerase II promotor.

HDAC1
Histone Deacetylase 1 76 Negative regulation of transcription from RNA polymerase II promotor.

STAT1
signal transducer and activator of transcription 1 66 Positive regulation of smooth muscles cells Proliferation

IL6
Interleukin 6 64 Positive regulation of smooth muscles cells Proliferation.

SMAD4
Mothers against decapentaplegic homolog 4 57 Cell signaling. The enrichment of hub genes given in Table 4 reports the same category of processes and thus validated our results. The Ubiquitin C (UBC) and Tumor protein (TP53) were found as "super hubs" genes. Zhang et al. [33] reported the Tumor protein 53(TP53) genes mutations occurring in patients with myelodysplastic syndrome (MDS) which was associated with abnormalities like a high risk of karyotype including 17p and complex cytogenetics. The existence of TP53 mutations in 17% of MDS was isolated with del(5q). TP53 was reported as a most frequently mutated gene with a complex karyotype [34,35]. Signal transducer and activator of transcription 1 play a vital role in body's immune response. Herold et al. [36] reported that interleukin-6(IL-6) exerts positive effect and played a great role in infections, inflammations and possible involvement in leukemogenesis. Mothers against decapentaplegic homolog 4(SMAD4) gene was an important element of tumor suppressor as reported by Dolatshad et al. [36] that also showed differential Exon usage overlapping with other gene. Li et al. [37] documented the epidermal growth factor receptor (EGFR) gene represent somewhat the malignant proliferation and suppress apoptosis through an unknown mechanism in MDs because it overexpressed in most cases [37]. Furthermore, our finding revealed that a number of Hub genes were associated with proteins because they disturbed the progression and development of MDS, and provided the important route towards therapeutics development against MDS [38].

Conclusion
To examine the relationships between the importance of genes and several topological characteristics in human PPI network, system biology approach was used in the present study. There are several previously reported works with mutations and pathways associated with different gene mutation and pathways involved in its pathogenesis. Because of this, gene expression data of MDS patients and Healthy individual's samples were used to identify the DEGs relation to MDS.