Bioinformatics analysis of the molecular mechanism underlying Huntington’s disease

We explore the underlying molecular mechanisms and to identify key molecules in Huntington's disease by utilizing bioinformatics methods. The gene expression profile of GSE3621 was extracted from the gene expression om-nibus. Differentially expressed genes between the R6/1 transgenic mouse model of Huntington's disease and controls at different time points were screened by limma package in R. Kyoto encyclopedia of genes and genomes database. It was used to analyze the pathways of differentially expressed genes. A searching tool of the protein-protein interaction network was constructed and visualized by Cytoscape. Molecular complex detection was utilized to performed module analysis. There were 513, 483, and 528 differentially expressed genes identified at weeks 18, 22 and 27, respectively, when compared with the control samples. Also, 24 significantly enriched R. Kyoto encyclopedia of genes and genomes database pathways were identified (9 in week 18, 6 in week 22, 9 in week 27), and 31 significant modules were identified from the protein-protein interaction network (13 in week 18, 8 in week 22, 10 in week 27). Hoxd8, Atf3, and Egr2 were confirmed as transcription factors related to Huntington's disease. There are widespread gene expression changes in Huntington's disease at different time points. Some hub genes, such as Usp18, Oasl2, and Rtp4, may play important roles in the pathogenesis of Huntington's disease.


Introduction
Huntington's disease (HD) is an autosomal dominant inherited neurodegenerative disease caused by a repeat expansion of prolonged CAG in the huntingtin (HTT) gene (Rodrigues and Wild, 2018). Mutant protein and production of transcriptional disorders, proteasome, autophagy, apoptosis, mitochondria, and metabolic dysfunction, neuroinflammation, oxidative stress, and consequent neurodegenerative changes, especially in the striatum. Among these, transcriptional dysregulation is one of the most important pathological mechanisms in HD (Bithell et al., 2009;Seredenina and Luthi-Carter, 2012;Vuono et al., 2015). The clinical character-istics of HD are progressive motor, cognitive as well as emotional disturbances (Bano et al., 2011). There is a wide range of gene expression changes in HD, and there is evidence that these changes are due in part to complex clinical manifestations (Seredenina and Luthi-Carter, 2012).
Inflammation is an emerging research area in neurodegenerative diseases (Kwan et al., 2012). Elevated circulating monocytes and levels of serum inflammatory cytokines are found in HD patients that considered to be hyper-responsive to immune stimuli (Ellrichmann et al., 2013). The elevated level of serum inflammatory cytokines was also found in some HD mouse models (Wild et al., 2008). Trager et al. (2015) found myeloid cells from the HD mouse model showed increased cytokine production when stimulated by lip polysaccharides (LPS) in vitro, macrophage isolated from R6/2 mouse showed elevated level of phagocytosis, similarly to results in HD patients.
Gene expression profiling is a rapid, high-throughput method for identifying mRNA expression in cells. By comparing the different gene expressions of the HD with controls, we can better understand the pathogenesis of HD and help to find potential target genes for treatment. Recently, bioinformatics mining and network construction were considered to play a critical role in researching and predicting the pathogenesis of various neurodegenerative disorders, including HD (Thanh-Phuong et al., 2014;Trager et al., 2015). However, up to the present, there is no bioinformatics study on HD differential gene expression profiles at different time points. Hodges et al. (2008b) analyzed the differential gene expression in the R6/1 mouse line throughout the progression of phenotypic signs from 18 to 27 weeks and uploaded data at the GEO accession GSE3621. By using GSE3621, we aim to further mine the key molecules and the potential molecular mechanisms related to HD pathogenesis. In this study, significant DEGs the R6/1 transgenic mouse model of Huntington's disease and controls at different time points were screened, followed by functional annotation. Potential HD related genes were revealed by PPI network analysis of proteins encoded by DEGs. In addition, some hub genes were also identified by MCODE analysis. The present study may provide new insights into the molecular mechanisms at different time points of HD pathogenesis.

Microarray data
Gene expression profile of GSE3621 was downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/) database. Researches were designed to elucidate the relationship between mRNA differential expression and the emergence of the behavioral phenotype in the R6/1 mouse model of HD. Nine R6/1 transgenic mice (expressing full-length versions of mutant 122-131 glutamines) at week 18 (n = 3), week 22 (n = 2), week 27 (n = 4) and 9 R6/1 wild controls at week 18 (n = 3), week 22 (n = 3) and week 27 (n = 3) were used. Hodges et al. (2008a) detected a difference in activity at 18 weeks, such as rotarod performance, while novel object exploration and locomotor activity changed significantly at 22 and 27 weeks, so three-time points were selected to detect DEGs.The base data built on the platform of GPL1261 was analyzed by the Affymetrix mouse genome 430 2.0 array.

Data pre-processing and analysis of DEGs.
Identifiable expression forms of data were first converted from the original raw one. The limma package (linear models for microarray data) in the R language (Smyth, 2005) was used to identify DEGs between R6/1 transgenic mice and controls. The student's t-test was used to select the DEGs by the criteria of P values < 0.05 and |log 2 FC| > 1.

Cluster analysis of DEGs
To elucidate the changes of DEG expression at the three different time points, a clustered heatmap of DEGs was constructed by using Bioconductor (gplots package in R). Mean expression values of DEGs among the three different time-point between R6/1 transgenic mice and controls were utilized to constituent the expression matrix.

KEGG pathway enrichment analysis of DEGs
DAVID (Database for annotation visualization and integrated discovery) online analysis tools (http://david.abcc.ncifcrf. go) form an integrated bioinformatics database. The system can excavate the biological functions of numerous genes and protein ID, and play a vital role in further extraction of biological gene information (Huang et al., 2009). The KEGG (www.genome.jp/ kegg/) pathway enrichment analysis was performed to identify DEGs by using DAVID (Kanehisa et al., 2008). Enriched terms including more than two genes and P values < 0.01 were considered statistically significant.

PPI network construction and analysis
The STRING (search tool for the retrieval of interacting genes, http:/string.embl.de/) is an online comprehensive perspective database designed to evaluate protein interaction information (Szklarczyk et al., 2011). In this study, STRING was designed to obtain a PPI network of DEGs and visualized by Cytoscape subsequently (Smoot et al., 2011). The confidence score of 0.4 was confirmed as the cut-off standard. MCODE (Bader et al., 2003) is the preferred computational method to analyze protein complexes from PPI networks. MCODE can generate overlapping clusters by identifying densely connected subgraphs and filtering non-dense subgraphs (Spirin and Mirny, 2003). In our study, MCODE was applied to screen modules of the PPI network with a node score cut-off = 0.2, degree cut-off = 2, max depth = 100 and k-core = 2.

TF annotation
TFs were annotated among DEGs basing on TRANSFAC version 6.0 (http://www.gene-regulation.com) (Matys et al., 2003). The differences and similarities between TFs at the three timepoints as well as the degrees of TFs were analyzed.

Identification of DEGs at three time points
A total of 513, 483, and 528 DEGs were identified at week 18, week 22, and week 27, respectively, when comparing with the control samples. Upregulated and downregulated DEGs are listed in Table 1. There are more downregulated DEGs than upregulated DEGs at the first two time-points, at week 27, the up and downregulated genes are almost equal.

Cluster analysis of the DEGs
To further define the changes of DEG expression level at threetime points in HD transgenic mice, cluster analysis is conducted. Cluster heat maps are performed by comparing the R6/1 transgenic mice and controls at three-time points, respectively. The top fifty DEGs are shown in Fig. 1.

KEGG pathway enrichment analysis
The KEGG pathways enriched for dysregulated genes are listed in Table 2. The results show that the pathways of dysregulated genes are relatively the same between week 18 and week 22, which are enriched in the pathways of neuroactive ligandreceptor interaction, protein digestion, and absorption, measles, influenza A and cytosolic DNA-sensing pathway. At week 27, downregulated genes are enriched in pathways related to cocaine addiction, amphetamine addiction, serotonergic synapse,and alcoholism; upregulated genes are enriched in pathways related to retinol metabolism, chemical carcinogenesis, rheumatoid arthritis, and cytokine-cytokine receptor interaction, respectively. In general, DEGs are predominantly involved in pathways of immune (infective disease) and neuropsychological system-related diseases.

TFs
TFs among the DEGs at the three-time points are listed in Table 4. The maximum number of TFs is found at week 18, including Hoxd8, Atf3and Stat4. The minimum number of TFs is found at week 22, including Egr2 and Atf3. By analyzing the PPI network, Atf3, Hoxd8, and Egr2 were the critical TFs in the development of HD.

Discussion
So far, the exact pathophysiological mechanism of HD is still not clear (Kieburtz et al., 2018). In this study, we aim to explore the underlying molecular mechanisms and to identify key molecules in HD by using bioinformatics methods. By analyzing the gene expression profiles of the brain from the HD mouse model at three different time points, hundreds of DEGs are found in this study. DEGs are identified to be enriched in pathways related to immune (infective disease) and neuropsychological systems. According to the PPI network and MCODE analysis, some hub genes are identified, including Rtp4, Oasl2, and Usp18. TFs among DEGs are also identified, including Atf3, Hoxd8, and Egr2. We infer that these genes may participate in the pathophysiological mechanism of HD. (Saito et al., 2004). A study on the gene in neurodegenerative disease is relatively rare, and we do not retrieve the relevant research on RTP4 and HD. In the study, we find RTP4 is related to the pathway of signaling by GPCR (G protein-coupled receptor). Yao et al. (2015) have demonstrated GPCR modulate huntingtin levels and toxicity in vitro and in vivo in STHdh Q7/Q111 cells, their research revealed that by modulating the GPCR function, mHTT levels and toxicity could be stabilized or reduced. However, the specific mechanism of RTP4 in GPCR and its possible impact on HD still need further study. OASL2 is an interferon stimulating gene and involves in innate immune response (Ellrichmann et al., 2013). It has been proved to be an important reaction module for macrophage activation (Zhu et al., 2014). Kwan et al. (2012) demonstrate that migration of macrophage under inflam-matory stimulation is impaired in their BACDD mouse model of HD, which may predict one of the potential mechanisms of HD (Kwan et al., 2012). McDermott et al. (2011) also suggest that OASL2 takes part in the responses which are crucial for macrophage activation and migration. OASL2 dysregulation may interrupt activation and migration of macrophage in the HD mouse model or interaction with mHTT fragments in immunological manners. USP18 (ubiquitinspecific protease 18) is an interferon-inducible protein and participate in regulation of interferon response on viral infection . Goldmann et al. (2015) demonstrate that Usp18 is essentially contributedto microglial quiescence in white matter microglia. Microglia are regardedas macrophages in the central nervous system and are important for tissue homeostasis. Microglia dysfunctionis thought to be involved in the pathogenesis of certainneurodegenerative and neuroinflammatory disorders. (Goldmann et al., 2015) further report that microglial Usp18 could negatively regulate the activation of Stat1 and inductinterferon-induced genes concomitantly, thereby terminating interferon signaling. Valekova et al. (2016) find that prominent decline of IFNα in central nervous system of HD animals could influenceIFNαrelated innate immune response, lead to enhanced disease progression, further suggesting a highly critical role for IFNα in HD. These results conclude Usp18 as a key negative regulator of microglia activation and identify a protective role of Usp18 for microglia function on regulating the interferon pathway. By preventing destructive microgliopathy and influencing interferon levels, USP18 may take part in the pathogenesis and progression of HD.

RTP4 (Receptor Transporter Protein 4) is a protein-coding gene
ATF3 is a member of the mammalian activated transcription factor protein family. In our study, we findthat ATF3 is differentially expressed at all threetime points. ATF3 is considered to be an effective marker for regeneration after nerve injury and thought to be a new indicator for nerve injury (Linda et  al., 2011). ATF3 can bind to other members of the ARF/CREB (cAMP-response element-binding protein) family, including C-Jun, ATF2and JUNB, by forming dimmers to actas a transcriptional activator and inhibitor (Wang et al., 2015).
Previously enormous researches have demonstrated the role of CREB in HD pathogenesis and disease progression (Chaturvedi et al., 2012;Choi et al., 2009;Giralt et al., 2013).
ATF3 may participate in HD by influencing CREB signaling.
Hoxd8 is a protein encoded by the HOXD8 gene (Wilming et al., 2015). It is a highly conserved family of transcription factors and plays a key role in the morphogenesis of multicellular organisms (Brison et al., 2012). Hoss et al. (2014) have demonstrated that a large amount of HOX genes differentially expressed in HD patients. Hox genes are a major transcription factor family in embryonic development (Lemons and McGinnis, 2006). They are highly associated with most aspects of early development and are significantly expressed in neural differentiation (Pearson et al., 2005). An increase in HOX transcriptional activity may be compensatory, helping to maintain or reconstitute cell polarity or indirect epigenetic regulation. EGR2 (Early growth response protein 2) is a protein encoded by the EGR2 gene. EGR2 is a transcriptional regulator containing two zinc finger DNA binding sites and is highly expressed in migrating neural crest cells (Swiatek and Gridley, 1993).
Many studies have focused on the abnormal expression of EGR2 on the Charcot-Marie-Tooth disease, which is a hereditary motor and sensory neuropathy (Krajewski et al., 2000). So far, we can confirm that EGR2 playsimportant roles in early neurodevelopment; however, its role in HD needs further study.
A large number of previous enrichment analyses have been applied on HD, and many KEGG pathways are involved, including DNA replication, p53, GnRH, and NF-kappa B signaling pathway . In this study, we find infectious diseases, such as measles, herpes simplex infection, and immune system such as neuroactive ligand-receptor interaction and rheumatoid arthritis enriched pathways are more common than other pathways. Neurotoxicity of mHTT is the trigger point for HD, Tai et al. (2007) suggested that the presence of mHTT aggregation could initiateimmune responses in CNS and pointed out microglia and astrocytes were the major contributors to the inflammatory reaction. Tai et al. (2007) demonstrated that microglial activation in striatum was closely associated with the occurrence of cognitive function (Tai et al., 2007). Elevated IL-6 and IL-8 levels were detected in premanifest subjects in the study of Björkqvist (2016) and they extrapolated the results were possibly corresponding to the innate immune response (Wild et al., 2008). Chang et al. (2015) also observed plasma inflammatory proteins, such as MMP-9 and TGF-β 1 altered in pre-symptomatic, early, or late stages of HD (Ionescu et al., 2011). In addition to the indirect effects of mHTT neurotoxicity on immune system, ? demonstrated mHTT could directly interact with IKKγ, which was responsible for immune response. Khoshnan et al. (2004) also confirmed the mHTT could bind to and directly activate IKKβ , which regulated mHTT neurotoxicity via triggering the caspase-dependent cleavage of mHTT. Summarizing previous studies, we concluded that immune responses mightplay important roles in the pathogenesis of HD, and some immune-related cytokines can be used as biomarkers for disease progression and evaluate the prognosis of the disease. At the same time, interfering with certain immune pathway may prevent progression or improve prognosis in HD patients.
Our research had some limitations. First, we used bioinformatics to analyze DEGs based on R6/1 transgenic mouse model, which had 5' end of the human HD gene carrying115-150 CAG repeat at expansions expressed ubiquitously in the mouse genome (Menalled et al., 2002), such genetic manipulation may potentially alter the overall number of genes express or silenced, mismatching the preclinical results presented in our study with gene expression in HD patients. Secondly, potential genetic rearrangements of cellular or animal models on each generation leading to the unstable expansion or retraction of the CAG length repeat may also affect the results of DEGs (Gatto et al., 2015). Moreover, the regulation on the expression the microinjection fragments from the 5' end of the human HD gene isolated from a phage genomic clone (Mangiarini et al., 1996) are under the regulation of murine genes which may not be operated by the same gene regulation as in the human disease. However, the use of other full-length mouse models (such as the BACHD or YAC128 mice) as well as knock-in HD mice models (HdHQ125 and zQ175 mice), which includes on their genomic insertion part of the regulatory genes, could improve such limitations (Gatto et al., 2019).
In conclusion, by performing bioinformatics analysis of gene expression, hundreds of genes related to HD were identified, including Rtp4, Oasl2, and Usp18. Furthermore, our study suggests that HD may have similar pathways involved in several diseases, including measles, herpes simplex infection, and rheumatoid arthritis, and most of the involved pathways are mainly focused on the immune system. Our study provided novel insight into the molecular mechanisms of HD, which may help to explore the pathogenesis of HD and may improve the treatment strategy of the HD.