Integrating genome-wide association study with regulatory SNP annotation information identified candidate genes and pathways for schizophrenia

Background: Schizophrenia is a complex mental disorder whose genetic mechanism remains elusive. Methods: We conducted a large-scale integrative analysis of two genome-wide association studies of schizophrenia with functional annotation datasets of regulatory single-nucleotide polymorphisms (rSNPs). The significant SNPs identified by each genome-wide association study were first annotated to obtain schizophrenia-associated rSNPs and their target genes and proteins. We then compared the integrative analysis results to identify the common rSNPs, and their target regulatory genes and proteins, shared by the two genome-wide association studies of schizophrenia. Finally, the DAVID tool was used to conduct gene ontology and pathway enrichment analyses of the identified target genes and proteins. Results: We detected 53 schizophrenia-associated target genes of rSNPs, such as FOS (P value = 2.18 × 10⁻²⁰), ATXN1 (P value = 5.22 × 10⁻²¹) and HLA-DQA1 (P value = 1.98 × 10⁻¹⁰). Pathway enrichment analysis identified 24 pathways for transcription factor binding regions, chromatin interacting regions, long non-coding RNAs, topologically associated domains, circular RNAs and post-translational modifications, such as hsa05034:Alcoholism (P value = 2.57 × 10⁻⁷) and hsa04612:Antigen processing and presentation (P value = 6.82 × 10⁻⁸). Conclusion: We detected multiple candidate genes, gene ontology terms and pathways for schizophrenia, supporting the functional importance of rSNPs and providing novel clues for understanding the genetic architecture of schizophrenia.

AGING
Schizophrenia (SCZ) is a multi-factorial disease whose occurrence depends on both environmental and genetic factors. Its heritability is high, estimated at up to 80%, pointing to a major role for inherited genetic variants in the etiology of SCZ [1,2]. In recent years, considerable progress has been made in the genetic study of SCZ [4], and multiple susceptibility genes have been identified [2,5], such as NRG1 [5], DISC1 [5] and DRD2 [2]. The major histocompatibility complex (MHC) region on chromosome 6p [1] is the most significant region for SCZ and contains many markers reaching genome-wide significance [4]. However, the genetic mechanism of SCZ remains largely unknown.
Genome-wide association studies (GWAS) have great power to identify genetic susceptibility loci for complex diseases, and over the years a number of credible candidate genes for SCZ have been identified, largely by GWAS [2]. However, because the stringent genome-wide significance threshold requires very large sample sizes to reliably identify risk genes, the significant loci identified by GWAS are usually few and functionally independent, providing limited information for mechanistic studies of SCZ. Moreover, a large proportion of the genetic variants detected by GWAS are located in non-coding chromosomal regions [4], and it is usually unclear how these non-coding regions are involved in the development of SCZ.
Functional SNPs located in protein-coding and non-coding genes are termed regulatory single nucleotide polymorphisms (rSNPs), and they often have major impacts on gene function [6]. rSNPs mainly lie in transcription factor binding regions (TFBRs), chromatin interactive regions (CIRs), topologically associated domains (TADs), long non-coding RNA (lncRNA) coding regions and circular RNA (circRNA) regions. Additionally, some SNPs located in protein-coding regions can alter protein post-translational modifications (PTMs) [7], such as phosphorylation, methylation, acetylation, ubiquitination and glycosylation. The involvement of rSNPs in the development of complex diseases has been well documented in previous studies [8][9][10]. Integrating GWAS with functional rSNP annotation information has improved GWAS power and provided novel clues for the genetic study of complex diseases [11][12][13], such as periodontal diseases [13] and breast cancer [12]. To the best of our knowledge, little effort has been made to explore the functional relevance of rSNPs to SCZ.
In this study, we conducted a large-scale integrative analysis of two GWAS datasets of SCZ with functional annotation datasets of rSNPs. The significant SNPs identified by each GWAS were first annotated to obtain SCZ-associated rSNPs and their target genes and proteins. We then compared the integrative analysis results to identify the common rSNPs, and their target genes and proteins, shared by the two GWAS of SCZ. Finally, the DAVID tool was used to conduct gene ontology (GO) and pathway enrichment analyses of the identified target genes and proteins shared by the two GWAS of SCZ.

DISCUSSION
Because most SCZ-associated variants identified by GWAS are not themselves causal, integrating GWAS results with functional rSNP information is a powerful approach to discovering novel candidate genes for SCZ [4]. To evaluate the roles of rSNPs in the pathogenesis of SCZ, we conducted a large-scale integrative genomics analysis of two GWAS datasets of SCZ with functional annotation datasets of rSNPs and identified multiple candidate genes, GO terms and biological pathways for SCZ. Our results support the functional importance of rSNPs in the genetic mechanism of SCZ and provide novel clues for understanding its genetic architecture.

We identified several candidate genes for SCZ, such as FOS, GABBR1, MDK, ATXN1 and ZSCAN31. FOS is one of the immediate early genes (IEGs), which encode not only transcription factors but also a much wider variety of proteins, including signaling molecules, growth factors and cytoskeletal proteins [14]. Altered expression of IEGs is linked to neuron generation and neuronal cell death. Cattane et al. reported that FOS was significantly upregulated in fibroblasts from SCZ patients [14]. Defects in synaptic plasticity are involved in the pathophysiology of SCZ; interestingly, SNPs either protective against (rs1063169) or positively associated with (rs7101T) SCZ have been identified, supporting the importance of the transcription factor c-fos in the regulation of synaptic plasticity [15]. Huang et al. observed higher FOS expression in non-neural peripheral samples and lower FOS expression in brain tissues of SCZ patients compared with healthy controls, suggesting that FOS is a sensitive marker for SCZ [16]. In addition, detection of FOS in blood samples may be helpful for SCZ diagnosis [16]. Together, these results support the functional relevance of FOS to SCZ [14][15][16], which is consistent with our findings.
GABBR1 is another SCZ-associated gene identified by this study. γ-Aminobutyric acid (GABA) is an inhibitory neurotransmitter acting at neuronal synapses, in part through GABA type B (GABAB) receptors. A functional GABAB receptor requires both the GABBR1 and GABBR2 subunits. GABAB receptors can modulate the release of a number of neurotransmitters, including dopamine, serotonin, noradrenaline, somatostatin, glutamate and GABA [17]. Interestingly, reduction of GABBR1 in pyramidal cells, and the consequent reduction of GABAB receptors, can result in dysfunction of inhibitory mechanisms and increased signal output [18]. Previous studies have also observed an association between GABBR1 and SCZ. Fatemi et al. observed a significant reduction of GABBR1 protein in the lateral cerebellum of subjects with SCZ, bipolar disorder and major depression [17]. In addition, Zai et al. conducted a case-control study and detected a positive association between GABBR1 and SCZ [19].
Genetic variation in a region on chromosome 11 that contains MDK has been significantly associated with SCZ [20]. MDK has also been found to accumulate in senile plaques in the hippocampus of patients with Alzheimer's disease [20,21]. Notably, ATXN1 is one of the hub genes in a protein-protein interaction network containing many known SCZ risk genes [22]. ATXN1 is highly expressed in the prefrontal cortex, and SCZ patients showed significantly decreased ATXN1 expression [22]. In addition, Saito et al. reported an association of rs7779855 in ZSCAN31 with SCZ [23].
GO analysis detected several GO terms for SCZ. One important finding of this study is the MHC class II protein complex term (GO:0042613). MHC class II molecules, such as the human leukocyte antigens HLA-DQA1 and HLA-DQB1, present antigens to CD4-positive T lymphocytes. The association between SCZ and the immune system has been repeatedly revealed in genetic, epidemiological and post-mortem studies [24]. The protein encoded by HLA-DQA1 binds to the protein encoded by HLA-DQB1; together they form a functional protein complex, the antigen-binding DQαβ heterodimer, which displays foreign peptides to the immune system to trigger an immune response. Interestingly, HLA alleles have previously been shown to be associated with SCZ risk [25]. rs9272105 within HLA-DQA1 explained 1-3% of the variation in attentional control and, to a lesser extent, in premorbid intelligence quotient in psychotic patients, and was also individually associated with variation in neuropsychological function [26]. HLA-DRB1 levels in SCZ have been shown to be decreased in peripheral blood samples but increased in the prefrontal cortex [27]. Moreover, HLA-DPA1 and CD74, which are integral components of the MHC class II protein complex, were both reduced in the hippocampus, amygdala and dorsolateral prefrontal cortex in SCZ [28].
T helper 1 type immune response (GO:0042088) is another significant GO term detected by this study. The delayed-type hypersensitivity reaction is the classic cell-mediated immune response involving sensitized T helper 1 cells. Michael et al. found attenuated skin reactivity to various antigens in SCZ patients, as well as significantly diminished responses of these patients to stimulation with tetanus and diphtheria antigens [29]. A significantly increased serum level of interleukin-18, a cytokine known to activate T helper 1 cells in immune responses, has also been observed in SCZ patients [30].
We also identified several other GO terms associated with SCZ, such as Golgi membrane (GO:0000139) and GABA signaling pathway (GO:0007214). Differentially expressed genes related to the Golgi apparatus, vesicular transport and membrane association were previously shown to be over-represented in SCZ [31]. In line with this, Devor et al. identified a large number of GABA-associated genes for SCZ [32].
The involvement of cell adhesion molecules (CAMs) in the pathophysiology of SCZ has long been hypothesized. In this study, the CAMs pathway (hsa04514) was detected for SCZ. This pathway is implicated in a variety of neurocognitive processes, including memory and attention-related reaction time.
Multiple CAM genes have been reported to be associated with SCZ risk [26]. Neural CAMs are important members of the group of molecules responsible for the precise wiring of the nervous system, serving as "glue" in cell-to-cell adhesion and contact-mediated attraction [33]. They interact with numerous matrix components and are involved in many aspects of neuronal development, synaptogenesis and myelination, which guide brain morphology and support highly coordinated brain activity [33]. Cerebrospinal fluid levels of neural CAMs were significantly increased in SCZ patients [34], and plasma levels of intercellular adhesion molecule-1 were also elevated in SCZ patients [35]. Honer et al. found increased syntaxin immunoreactivity in the cingulate cortex of patients with SCZ, along with increased neural CAMs and an increased ratio of neural CAMs to synaptophysin [36]. Besides, L1 cell adhesion molecule interactions have been reported to be involved in neuronal function [37].
Another interesting SCZ-associated pathway is alcoholism (hsa05034). Recent studies have suggested that alcoholism has a wide-reaching influence on the clinical course of SCZ, contributing to hippocampal and other subcortical shape abnormalities [38]. We also observed that the systemic lupus erythematosus (SLE) pathway (hsa05322) was associated with SCZ. Although SCZ is not classified as a typical autoimmune disease, dysregulation of the immune system cannot be excluded [39,40]. Interestingly, DNA- and myelin basic protein (MBP)-hydrolyzing antibodies, which play an important harmful role in SLE pathogenesis, have also been detected in the sera of SCZ patients. In addition, the light chains of IgGs from SCZ patients were similar to those of SLE patients [41].
The majority of SNPs identified by GWAS are located in non-coding regions [4] and contribute to complex traits and diseases through various molecular mechanisms, including effects on transcription factor binding affinity that can result in differential gene expression [11]. However, significant loci identified by GWAS have thus far rarely been traced to causal polymorphisms. Integrative analysis of GWAS with functional rSNPs can help improve GWAS power and provide novel clues for pathogenetic studies of SCZ. Notably, functional SNP data can help exclude unlikely genes or loci, effectively reducing the number of tests needed for unbiased searches across the genome and thus improving the power to discover novel causal loci [4]. To the best of our knowledge, this is the first large-scale integrative analysis of GWAS and rSNPs for SCZ, in which the involvement of rSNPs in the development of SCZ was systematically investigated across TFBRs, CIRs, lncRNA regions, TADs, circRNAs and PTMs. Nevertheless, one limitation of this study should be noted: the SCZ-associated SNP sets were derived from previous GWAS, so the accuracy of our integrative analysis may be affected by the power of those studies. Therefore, further studies are warranted to confirm our findings.
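The power argument above (fewer tests, less stringent per-test threshold) can be illustrated with a simple Bonferroni correction. This is a minimal sketch under stated assumptions, not part of the analysis performed here: the conventional genome-wide threshold of 5 × 10⁻⁸ corresponds to a family-wise α of 0.05 divided by roughly one million independent tests, while the 10,000 "prioritized" tests and the P value of 10⁻⁷ below are hypothetical numbers chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be >= 1")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value passes the Bonferroni threshold for n_tests tests."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Genome-wide scan: about one million independent tests gives ~5e-8.
genome_wide = bonferroni_threshold(0.05, 1_000_000)
# Hypothetical prioritized scan restricted to 10,000 functionally annotated loci.
prioritized = bonferroni_threshold(0.05, 10_000)

p = 1e-7  # illustrative P value, not taken from SCZ1 or SCZ2
print(genome_wide, prioritized)
print(is_significant(p, 0.05, 1_000_000), is_significant(p, 0.05, 10_000))
```

The same P value fails the genome-wide threshold but passes the threshold for the smaller, prioritized test set, which is the sense in which excluding unlikely loci can raise power.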
In conclusion, we conducted a large-scale integrative genomics analysis of two GWAS datasets of SCZ with functional annotation datasets of rSNPs to explore the role of rSNPs in the pathogenesis of SCZ, and identified multiple candidate genes, GO terms and pathways for SCZ. We hope that our results will provide novel clues for pathogenic and therapeutic studies of SCZ.

The first GWAS dataset of SCZ (SCZ1)
Data from a large GWAS meta-analysis of SCZ were derived from the Psychiatric Genomics Consortium (PGC) [42], comprising a total of 33,426 SCZ cases and 32,541 controls. Genotypes from all studies were processed by the PGC using unified quality control procedures, followed by imputation of SNPs and insertion-deletions using the 1000 Genomes Project reference panel [43]. Quality control and imputation were performed on each study cohort dataset according to the standard procedures established by the PGC [42]. Genotype imputation used the prephasing/imputation stepwise approach implemented in IMPUTE2 [44] and SHAPEIT [45]. The imputation reference set consisted of 2,186 phased haplotypes from the full 1000 Genomes Project dataset. Association was tested by logistic regression, controlling for 13 ancestry components, study site and genotyping platform. Detailed descriptions of sample characteristics, experimental design, statistical analysis and quality control can be found in the original study [42].

The second GWAS dataset of SCZ (SCZ2)
Data from another independent GWAS of SCZ [2] were also used. Briefly, this GWAS included 36,989 SCZ cases and 113,075 controls from 49 ancestry-matched, non-overlapping case-control samples (46 of European and three of East Asian ancestry; 34,241 cases and 45,604 controls) and 3 family-based samples of European ancestry (1,235 parent/affected-offspring trios). Genotypes from all studies were processed by the PGC using unified quality control procedures, and the 1000 Genomes Project reference panel was used for SNP imputation [43]. In each sample, association testing was conducted using imputed SNP dosages, with principal components included to control for population stratification. After quality control (imputation INFO score ≥ 0.6, MAF ≥ 0.01, and successful imputation in ≥ 20 samples), approximately 9.5 million variants were retained. An inverse-variance-weighted fixed-effects model was used for the final meta-analysis. Detailed descriptions of sample characteristics, experimental design, statistical analysis and quality control can be found in the original study [2].
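The quality-control filters and fixed-effects meta-analysis described above can be sketched as follows. This is a minimal illustration, not the PGC pipeline: the per-cohort record layout (CohortResult) and the function names are hypothetical, the INFO and MAF filters are assumed to be applied per cohort before pooling, "successfully imputed in ≥ 20 samples" is interpreted as at least 20 contributing cohorts, and pooling uses standard inverse-variance weights w_i = 1/SE_i² with a two-sided P value from the normal approximation.

```python
import math
from dataclasses import dataclass


@dataclass
class CohortResult:
    """Per-cohort association result for one SNP (hypothetical layout)."""
    beta: float  # log odds ratio estimated in this cohort
    se: float    # standard error of beta
    info: float  # imputation INFO score in this cohort
    maf: float   # minor allele frequency in this cohort


def passes_qc(r: CohortResult, min_info: float = 0.6, min_maf: float = 0.01) -> bool:
    """Apply the INFO >= 0.6 and MAF >= 0.01 filters described in the text."""
    return r.info >= min_info and r.maf >= min_maf


def ivw_fixed_effects(results, min_cohorts: int = 20):
    """Inverse-variance-weighted fixed-effects meta-analysis of one SNP.

    Cohorts failing QC are dropped; None is returned if fewer than
    `min_cohorts` cohorts remain. Otherwise returns (pooled beta, pooled SE,
    two-sided P value).
    """
    kept = [r for r in results if passes_qc(r)]
    if len(kept) < min_cohorts:
        return None
    weights = [1.0 / (r.se ** 2) for r in kept]  # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    beta = sum(w * r.beta for w, r in zip(weights, kept)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p


# Toy example: the third cohort fails the INFO filter and is excluded.
cohorts = [
    CohortResult(beta=0.10, se=0.05, info=0.90, maf=0.20),
    CohortResult(beta=0.20, se=0.05, info=0.95, maf=0.30),
    CohortResult(beta=5.00, se=0.01, info=0.50, maf=0.20),
]
print(ivw_fixed_effects(cohorts, min_cohorts=2))
```

With equal standard errors the two passing cohorts receive equal weight, so the pooled estimate is their average (approximately 0.15), and the pooled standard error shrinks by a factor of √2 relative to each cohort.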

rSNPs annotation datasets
rSNP annotation information was derived from the rSNPBase 3.1 database (http://rsnp3.psych.ac.cn) [46] and the AWESOME database (http://www.awesomehust.com) [7]. rSNPBase 3.1 provides rich functional annotation of human SNP-related regulatory elements and their target regulatory genes, including TFBRs, CIRs, mature microRNA (miRNA) regions, predicted miRNA target sites, lncRNA regions, TADs and circRNAs. AWESOME is a database and analysis tool that systematically evaluates the effects of SNPs on nearly all kinds of PTMs using 20 available tools; it provides a comprehensive platform that collects and integrates SNP and PTM information from 24 published databases or tools. In AWESOME, 1,043,608 germline missense variants from dbSNP were used, and each SNP was matched with its protein sequence. Detailed descriptions of the two rSNP annotation databases can be found in the original publications [7,46].

Statistical analysis
SNPs reaching genome-wide significance (GWAS P value < 5.0 × 10⁻⁸) were selected from each of the two GWAS of SCZ (SCZ1 and SCZ2). The selected SCZ-associated SNPs were then annotated using the rSNPBase 3.1 [46] and AWESOME [7] databases to obtain SCZ-associated rSNPs and their target regulatory genes and proteins. We then compared the integrative analysis results to identify the common rSNPs, and their target genes and proteins, shared by the two GWAS of SCZ. To explore the functional relevance of the identified target regulatory genes and proteins to SCZ, GO and pathway enrichment analyses of the common target genes and proteins shared by SCZ1 and SCZ2 were performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) tool [47].
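The selection, annotation, comparison and enrichment steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each GWAS is assumed to be a dict of SNP ID to P value, the rSNPBase 3.1 and AWESOME lookups are replaced by a hypothetical in-memory dict of (category, target) pairs, the SNP and gene IDs in the toy example are placeholders, and enrichment is shown as a plain one-sided hypergeometric (Fisher exact) test. DAVID's EASE score is a more conservative modification of this Fisher exact test, so its values will differ.

```python
import math
from collections import defaultdict

GWS_THRESHOLD = 5.0e-8  # genome-wide significance threshold used in the text


def significant_snps(gwas_pvalues, threshold=GWS_THRESHOLD):
    """Return IDs of SNPs whose GWAS P value is below the threshold."""
    return {snp for snp, p in gwas_pvalues.items() if p < threshold}


def common_rsnp_targets(gwas1, gwas2, annotation, threshold=GWS_THRESHOLD):
    """Find annotated rSNPs significant in both GWAS and collect their targets.

    annotation: hypothetical dict mapping SNP ID -> list of (category, target)
    pairs, standing in for rSNPBase 3.1 / AWESOME lookups.
    Returns (set of common rSNP IDs, dict category -> set of targets).
    """
    shared = significant_snps(gwas1, threshold) & significant_snps(gwas2, threshold)
    common = {snp for snp in shared if snp in annotation}
    targets = defaultdict(set)
    for snp in common:
        for category, target in annotation[snp]:
            targets[category].add(target)
    return common, dict(targets)


def hypergeom_upper_tail(k, n, big_k, big_n):
    """One-sided enrichment P value, P(X >= k), for X ~ Hypergeometric.

    k:     list genes annotated to the GO term or pathway
    n:     size of the target gene list
    big_k: background genes annotated to the term
    big_n: size of the background gene population
    """
    upper = min(n, big_k)
    numerator = sum(
        math.comb(big_k, i) * math.comb(big_n - big_k, n - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / math.comb(big_n, n)


if __name__ == "__main__":
    # Toy inputs with placeholder IDs; not data from SCZ1 or SCZ2.
    gwas1 = {"snp1": 1e-10, "snp2": 3e-9, "snp3": 0.01}
    gwas2 = {"snp1": 2e-12, "snp2": 0.2, "snp3": 1e-9}
    annotation = {
        "snp1": [("TFBR", "GENE_A"), ("lncRNA", "GENE_B")],
        "snp2": [("CIR", "GENE_C")],
    }
    common, targets = common_rsnp_targets(gwas1, gwas2, annotation)
    print(common, targets)
    print(hypergeom_upper_tail(k=3, n=50, big_k=40, big_n=20000))
```

Intersecting the significant SNP sets before annotation keeps only rSNPs supported by both GWAS; exact integer arithmetic via `math.comb` avoids underflow in the hypergeometric tail for realistic gene counts.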

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.