A Genome-Wide Association Study and Complex Network Identify Four Core Hub Genes in Bipolar Disorder

Bipolar disorder is a common and severe mental illness with unsolved pathophysiology. A genome-wide association study (GWAS) has been used to find a number of risk genes, but it is difficult for a GWAS to find genes indirectly associated with a disease. To find core hub genes, we introduce a network analysis after the GWAS was conducted. Six thousand four hundred fifty eight single nucleotide polymorphisms (SNPs) with p < 0.01 were sifted out from Wellcome Trust Case Control Consortium (WTCCC) dataset and mapped to 2045 genes, which are then compared with the protein–protein network. One hundred twelve genes with a degree >17 were chosen as hub genes from which five significant modules and four core hub genes (FBXL13, WDFY2, bFGF, and MTHFD1L) were found. These core hub genes have not been reported to be directly associated with BD but may function by interacting with genes directly related to BD. Our method engenders new thoughts on finding genes indirectly associated with, but important for, complex diseases.


Introduction
Bipolar disorder (BD) is a common and severe mental disorder characterized by alternative episodes of mania/hypomania and depression [1]. It affects 1-5% of the world's population [2][3][4]. Genetic studies have shown that bipolar disorder is a complex genetic disease that involves the interaction of multiple genes and the environment. Genetic factors can account for up to 60-85% of the risk [5][6][7][8]. The strong genetic basis of BD inspires plenty of research focused on finding candidate genes or single nucleotide polymorphisms (SNPs) associated with this disease.
Over the past few decades, traditional family-based linkage analysis and population-based case-control association analysis have been common means of identifying bipolar disorder susceptibility genes. With the advent of the third-generation polymorphism genetic marker SNPs, genome-wide association studies (GWASs) have also been applied to large-scale scanning of new BD susceptibility gene loci and a number of genes, such as CACAN1C [9,10], ANK3 [10,11], SYNE1 [12], CSMD1 [12], ITIH1 [11], KIT [11], and DGKH [13], have been found.
GWASs have proven to be useful in finding susceptibility genes of diseases. However, when used alone, it is difficult to determine genes that have relatively high GWAS p-values but may play a role through interaction with the genes directly associated with BD. The complexity of the disease makes it even more difficult to elucidate its molecular mechanism. Therefore, although the previous study has found a lot of genetic factors with significant effects on BD, its molecular mechanism remains unresolved. In this case, a comprehensive analysis focusing on gene interactions and biological functions will provide valuable information to explore the pathogenesis of BD. It has been found that the distribution of genetic marker loci on chromosomes and the interaction between SNPs are one of the major genetic basis for complex diseases [14]. The gene network is often used to reveal complex relationships among genes.
Considering that complex mental phenotypes may be affected by many genes with small or mild effects rather than one or two genes with a major impact [15], a comprehensive analysis of the underlying genes in the pathway or network framework may provide more insights into its molecular mechanism. It will be more efficient to understand the role of genes in complex diseases using network study. Some methods have been developed in this area, but the problem is far from being solved. There is scarce known molecular interaction mechanism and systematic gene network analysis for BD. Construction of a gene interaction network can be used to explore the synergistic effect of multiple genes on BD.
In this study, we performed a GWAS to obtain BD-related genes and confirmed their function by functional enrichment analysis. To further explore the association between these genes and BD, a network was constructed using a human protein-protein interaction database, and the BD-risk genes identified in the GWAS were mapped onto the network to find core hub genes. This will provide more insight into the molecular mechanisms of BD by determining the core hub genes of the network.

GWAS Results
A total of 482,247 SNPs located on 22 chromosomes of 1868 BD cases and 2938 controls satisfies the quality control. The number of SNPs decreases to 354,282 after the Hardy-Weinberg equilibrium test. Finally, a total of 6458 SNPs is qualified in the GWAS where p < 0.01 and used for further analysis. The result is shown in Figure 1.

Gene Functional Analysis
A total of 2045 risk genes was obtained after mapping the 6458 SNPs onto human genes. These genes were then classified into three Gene Ontology (GO) sections: cellular components, molecular functions, and biological processes. The first 10 GO items (p < 0.01) are shown in Tables 1-3. Genes with transferase and kinase function dominate in molecular functions. In cellular components, most gene products are located in the nervous system. This coordinates with the biological process result in which most genes are involved in nervous system development.

Gene Functional Analysis
A total of 2045 risk genes was obtained after mapping the 6458 SNPs onto human genes. These genes were then classified into three Gene Ontology (GO) sections: cellular components, molecular functions, and biological processes. The first 10 GO items (p < 0.01) are shown in Tables 1-3. Genes with transferase and kinase function dominate in molecular functions. In cellular components, most gene products are located in the nervous system. This coordinates with the biological process result in which most genes are involved in nervous system development.

Overlapped Genes in Different Mental Illnesses
We enriched these candidate genes in BD and other three mental illnesses: schizophrenia, intellectual disability, and autistic disorder. In the total 2045 risk genes, the numbers of genes associated with schizophrenia, intellectual disability, autism, and bipolar disorder are 151, 123, 84, and 84, respectively. A Venn map of the overlap genes of these four diseases shows that, out of the 84 genes associated with bipolar disorder, 55 genes are in common with schizophrenia, 17 with intellectual disorder, and 28 with autism ( Figure 2).

Protein Interaction Network
The 2045 genes from the GWAS result are mapped onto the protein-protein interaction network constructed using data from the STRING database ( Figure 3).

Protein Interaction Network
The 2045 genes from the GWAS result are mapped onto the protein-protein interaction network constructed using data from the STRING database ( Figure 3).

Protein Interaction Network
The 2045 genes from the GWAS result are mapped onto the protein-protein interaction network constructed using data from the STRING database ( Figure 3).  There are 1083 nodes in the network. The average node degree of the network is 7.555. The clustering coefficient is 0.232, and the characteristic path length is 3.393. The properties of the network are further analyzed and the results are shown in Figure 4. The connectivity of the network exhibits characteristic power distribution. Figure 4b shows that the shortest path with the highest frequency among the candidate genes of BD is between 3 and 4, indicating that the network is not a stochastic network but a complex network with characteristics of biological molecular network. The number of neighbors shared by the network nodes has a significant inverse relationship with its topological coefficients (Figure 4c), but shows a positive correlation with the node's identity (Figure 4d). There are 1083 nodes in the network. The average node degree of the network is 7.555. The clustering coefficient is 0.232, and the characteristic path length is 3.393. The properties of the network are further analyzed and the results are shown in Figure 4. The connectivity of the network exhibits characteristic power distribution. Figure 4b shows that the shortest path with the highest frequency among the candidate genes of BD is between 3 and 4, indicating that the network is not a stochastic network but a complex network with characteristics of biological molecular network. The number of neighbors shared by the network nodes has a significant inverse relationship with its topological coefficients (Figure 4c), but shows a positive correlation with the node's identity (Figure 4d).

Hub Genes of the Network
One hundred twelve gene nodes with a degree >17 is chosen as hub genes from the network for further analysis (Tables 4 and A1).

Hub Genes of the Network
One hundred twelve gene nodes with a degree >17 is chosen as hub genes from the network for further analysis (Tables 4 and A1).  Of these 112 hub genes, 45 were reported associated with BD in previous studies. Another 24 were reported associated with other mental illnesses. Gene nodes with higher degrees have a higher ratio of genes being reported associated with BD. Only five genes with a degree >23 (51 genes) are not reported associated with BD or other mental illnesses, while 24 with a degree ≤23 (61 genes) are not found reported directly associated to any mental illnesses. Obviously, risk genes with more degrees have a closer connection to BD than those with fewer degrees.

Significant Modules of the Network and Core Hub Genes
Five significant gene modules are found in the network containing 112 hub genes with Cytoscape. Four core hub genes are found in these modules: FBXL13, WDFY2, bFGF (FGF2), and MTHFD1L. No core hub gene is found for one module (Cluster 4) (Table 5, Figure 5).

Most BD Risk Gene Products Are Located in the Nervous System
Our gene functional analysis of 2045 BD risk genes shows that most of their products are located in the nervous system, such as synapse and postsynapse. The result of GO biological process analysis shows most genes are involved in nervous system development. These two results verify each other and are consistent with previous studies [104]. BD risk genes may affect patients in two aspects: shortterm and permanent. Environmental or internal factors may cause ectopic expression of some of the risk genes, which in turn cause episodes of BD. Some genes may work in the development of the nervous system and have a permanent effect on patients. This may explain why 60% patients will relapse into depression or mania within two years after treatment [105].

Intense Overlappings of Genes Associated with BD and Other Mental Disorders
Many symptoms and signs overlap between different mental disorders and patients often present with features of more than one disorder [106]. This may be caused by underlying genetic reasons. We compared BD risk genes with those of three other mental disorders and found intense overlaps. Similar results were also reported in other studies [28,52,86,89,106,107].
Schizophrenia and BD share the most associated genes. Previous work also found a significant correlation between a BP polygenic risk score and the clinical dimension of mania in schizophrenia patients [86]. PRKG1 was reported to be significantly associated with schizophrenia. In this study, we also find it is a hub gene in the network of BD risk genes. This gene encodes a cGMP-dependent protein kinase which acts as key mediator of the nitric oxide (NO)/cGMP signaling pathway. Another gene, SMARCA2, was also found to play a role in the pathophysiology of schizophrenia [27]. Its product is a transcription activator and involved in neuron differentiation. Many other risk genes are also found involved in signal transduction and nervous system development. This suggests that these two diseases may share some common underlying pathways.

Most BD Risk Gene Products Are Located in the Nervous System
Our gene functional analysis of 2045 BD risk genes shows that most of their products are located in the nervous system, such as synapse and postsynapse. The result of GO biological process analysis shows most genes are involved in nervous system development. These two results verify each other and are consistent with previous studies [104]. BD risk genes may affect patients in two aspects: short-term and permanent. Environmental or internal factors may cause ectopic expression of some of the risk genes, which in turn cause episodes of BD. Some genes may work in the development of the nervous system and have a permanent effect on patients. This may explain why 60% patients will relapse into depression or mania within two years after treatment [105].

Intense Overlappings of Genes Associated with BD and Other Mental Disorders
Many symptoms and signs overlap between different mental disorders and patients often present with features of more than one disorder [106]. This may be caused by underlying genetic reasons. We compared BD risk genes with those of three other mental disorders and found intense overlaps. Similar results were also reported in other studies [28,52,86,89,106,107].
Schizophrenia and BD share the most associated genes. Previous work also found a significant correlation between a BP polygenic risk score and the clinical dimension of mania in schizophrenia patients [86]. PRKG1 was reported to be significantly associated with schizophrenia. In this study, we also find it is a hub gene in the network of BD risk genes. This gene encodes a cGMP-dependent protein kinase which acts as key mediator of the nitric oxide (NO)/cGMP signaling pathway. Another gene, SMARCA2, was also found to play a role in the pathophysiology of schizophrenia [27]. Its product is a transcription activator and involved in neuron differentiation. Many other risk genes are also found involved in signal transduction and nervous system development. This suggests that these two diseases may share some common underlying pathways.

Core Hub Genes Give New Insights of BD
We combined protein-protein network and genome wide association analysis in this study and found four core hub genes. Although genes with higher degrees are more frequently reported to be associated with BD, two core hub genes (WDFY2 and FBXL13) have relatively low degrees (20 and 19, respectively).
bFGF has not been reported to be associated with BD before, but is usually used for treatment of neurodegenerative diseases such as Alzheimer's disease [42]. It plays an essential role in regulation of cell proliferation, differentiation, and migration. bFGF is found as a core hub gene implies the abnormal nervous development of BD patients.
There is no obvious evidence for another core hub gene, MTHFD1L, to be associated with BD, but it is thought to have an important effect on the pathophysiology of depression through rumination, and maybe via this cognitive intermediate phenotype on other mental and physical disorders [38].
WDFY2 is not directly associated with BD, but its product interacts with AKT1 [108], which has been found involved in BD and schizophrenia [109]. This result suggests that the pathophysiology of BD is even more complicated than we thought. Some genes may play a role through its interaction with genes directly associated with BD.
FBXL13 functions in the maturation of human dendritic cells [68] which are key regulators in the immune system and show mild aberrancies in bipolar disorder that can be fully restored to even activation after in vivo lithium treatment [67].
Interestingly, all the four core hub genes are not directly associated with BD. Although the role of these genes in the pathophysiology of BD requires further investigation, our method inspires new initiatives to find those genes that are important for BD but overlooked by studies using GWAS alone.

Effectiveness of GWAS Followed by Gene Network Analysis
GWAS is a successful tool for identifying human disease-associated genes. However, results of different studies often vary due to sampling even when a strict significant p-value of 5 × 10 −8 is used [110]. In this study, a loose p-value threshold of 0.01 was used for the GWAS, and a gene network analysis was then used to find BD-associated genes in the GWAS result. Many resulted genes with high network degrees but relatively high GWAS p-values are reported to be associated with BD and/or other mental illnesses (Table 4), which suggests that the combination of the two methods is efficient in finding disease-related genes. It is necessary to use a loose p-value threshold in the first step to provide enough input genes for the following network analysis. A second screening using network degrees can help to make the final result more reliable. Sklar et al. conducted a combined GWAS with 7481 BD cases and 9250 controls and identified CACNA1C and a miRNA located in the first intron of ODZ4 as BD-associated genes [87]. The calcium channel subunit coding gene CACNA1C has also been found to be associated with BD in previous studies [9,52,86] and is confirmed with a relatively high degree (30) in our results. However, the miRNA is not detected in this study, probably due to our relatively smaller sample size.

Bipolar Disorder Datasets
The dataset is from a study published by Wellcome Trust Case Control Consortium (WTCCC), which conducted a genome-wide scan of all SNPs of 17,000 British Caucasian loci by human SNPs genotyping chips. This dataset includes 14,000 disease samples from seven common complex diseases: bipolar disorder, bipolar depression, Crohn's disease, hypertension, rheumatoid arthritis, type 1 diabetes, type 2 diabetes, and 3000 healthy control samples, which has been completed by more than 50 research teams [111]. The dataset is downloaded from WTCCC website [112]. This study uses the BD part of the dataset. Human SNP annotation data and human reference sequence data are downloaded from NCBI (https://www.ncbi.nlm.nih.gov/), which contain 336,843,011 SNPs on 24 human chromosomes and the start and end of genes in which they are located [113].

Screening of Risk SNPs
SNP sites that do not meet one of the following criteria are excluded for quality control: Hardy-Weinberg equilibrium test (Bonferroni corrected p < 5 × 10 −7 ), missingness >5%, minor allele frequency <5%, and odds ratio R 2 > 0.8. Risk SNPs are screened under p < 0.01. Quality control and risk gene screening are finished with Plink software [114].

Mapping Significant Risk SNPs to Genes
Risk SNPs are mapped onto human genes by comparing them with transcription start sites and stop sites. An SNP will be mapped onto its nearest gene within 5 kb if it is not located within any gene. SNPs located outside of the 5 kb of genes are removed.

Gene Function and Disease Enrichment Analysis
FunRich [115] software is used to carry out gene enrichment analysis with p < 0.01. Results are reversely ordered by FDR-values, and only the first 10 results are listed in each GO section. ToppGene [116] is used to enrich genes in four different mental illnesses.

Protein Network Analysis
STRING database [117] is used to find a protein-protein relationship and FunRich is then used to map BD risk genes to the protein-protein network. Those protein (gene) nodes with degree >17 are sifted out as hub genes, which are further analyzed with the MCODE plugin [118] of Cytoscape [119] to find out network clusters (modules) and core hub genes. The node gene with the highest MCODE node score in a cluster is designated as its core hub gene, which is crucial for the cluster.
The topological properties of a gene cluster include [120,121] the following: (1) degree, the number of genes directly connected to a gene, (2) the cluster coefficient (CC), the coincidence of the common regulatory genes between two adjacent genes, defined as where n i represents the number of edges of the k i neighbors that connect to node i-the mean of the clustering coefficients of all nodes is designated as the clustering coefficient of the network-(3) the shortest path, the path with the least edges between two nodes, and (4) betweenness (B(v)), the sum of the ratios of number of shortest paths connecting to a node to that of all shortest paths in a network where δ st is total number of shortest paths from node s to t, and δ st (ν) is the number of those paths that pass through v.