A large-scale integrative analysis of GWAS and common meQTLs across whole life course identifies genes, pathways and tissue/cell types for three major psychiatric disorders

Attention deficit hyperactivity disorder (ADHD), bipolar disorder (BP) and schizophrenia (SCZ) are complex psychiatric disorders. We conducted a large-scale integrative analysis of genome-wide association studies (GWAS) and life course consistent methylation quantitative trait loci (meQTLs) datasets. The GWAS data of ADHD (including 20,183 cases and 35,191 controls), BP (including 7481 cases and 9250 controls) and SCZ (including 36,989 cases and 113,075 controls) were derived from published GWAS. Life course consistent meQTLs dataset was obtained from a longitudinal meQTLs analysis of 1018 mother-child pairs. Gene prioritization, pathway and tissue/cell type enrichment analysis were conducted by DEPICT. We identified multiple genes and pathways with common or disease specific effects, such as NISCH (P = 9.87 × 10-3 for BP and 2.49 × 10-6 for SCZ), ST3GAL3 (P = 1.19 × 10-2 for ADHD), and KEGG_MAPK_SIGNALING_PATHWAY (P = 1.56 × 10-3 for ADHD, P = 4.71 × 10-2 for BP, P = 4.60 × 10-4 for SCZ). Our study provides novel clues for understanding the genetic mechanism of ADHD, BP and SCZ.


Introduction
Attention deficit hyperactivity disorder (ADHD), bipolar disorder (BP) and schizophrenia (SCZ) are common complex psychiatric disorders, which are characterized by clinical and genetic heterogeneity that manifests in a wide range of deficits. ADHD is a neurodevelopmental psychiatric disorder, affecting around 5% of children and adolescents and 2.5% of adults worldwide (Faraone and Doyle, 2015). BP has a community lifetime prevalence of 4% in its broadest sense (Ketter, 2010). SCZ affects more than 1% of the population worldwide (Owen et al., 2005). Psychiatric disorders have enormous effects on morbidity and are one of the most crucial causes of disability, accounting for around one-third of years lost due to disability among adults (World Health Organization, 2008). Moreover, BP and SCZ are among the top 15 leading causes of disability (World Health Organization, 2008).
Family studies have provided consistent evidence that genetic factors contribute greatly to the development of psychiatric disorders (Kendler and Eaves, 2005). Genome-wide association studies (GWAS) have identified several risk loci for ADHD, BP and SCZ (Demontis et al., 2017;Group, 2011;Consortium, 2014). Despite the success of GWAS in identifying loci associated with psychiatric disorders, the genetic basis of major psychiatric disorders remains elusive now. Due to the strict genome-wide significance threshold, the significant loci identified by GWAS are usually limited and functionally independent, providing limited information for mechanism studies of psychiatric disorders. Furthermore, a large part of loci identified by GWAS locate at noncoding chromosomal regions and are likely to be ignored in previous GWAS.
Recent years have seen a surge of interest in non-coding regulatory loci. Zhang et al. demonstrated the importance of non-coding regulatory loci (such as expression quantitative trait loci, eQTLs) in the pathogenesis of complex diseases (Zhang and Lupski, 2015). It was also found that non-coding regulatory loci are enriched in the disease associated loci detected by GWAS (Hindorff et al., 2009). These enriched non-coding regulatory loci can provide essential links between the loci in a GWAS region and the biological processes they affect, which would help to understand the genetic architecture of complex diseases and traits. Therefore, integrative analysis of GWAS and regulatory loci information can provide useful clues for pathogenetic studies. This method has become a focus of interest for researchers attempting to study the genetics of complex diseases (Wray et al., 2018). Gamazon et al. applied integrative approaches to eQTLs from 44 tissues and GWAS data, providing comprehensive analyses of the contribution of eQTLs to trait variation. They observed that eQTLs in whole blood were enriched for associations with about half of the traits tested, which demonstrated the utility of blood for broadly studying the underlying genetic mechanisms of complex diseases (Gamazon et al., 2018).
DNA methylation refers to the addition of a methyl group at the fifth position of cytosines in CpG dinucleotides (Rakyan et al., 2011), which is influenced by both genetic and environmental factors. DNA sequence variants that associate with methylation status, namely methylation quantitative trait loci (meQTLs), have been found throughout the genome for multiple tissues (Gibbs et al., 2010;Schalkwyk et al., 2010). Recent studies have demonstrated the functional relevance of meQTLs with the development of complex human diseases (Mcclay et al., 2015). It has been observed that meQTLs are more enriched in the SNP sets with low GWAS P values than randomly sampled SNPs for multiple diseases, including Alzheimer's disease and SCZ (Gaunt et al., 2016). A previous research observed significant overlap of meQTLs among different developmental stages (Smith et al., 2014). Furthermore, recently, Gaunt, T.R. et al. again confirmed that DNA methylation contains a significant heritable component that is highly stable across life course (Gaunt et al., 2016).
To better understand the underlying genetic basis of ADHD, BP and SCZ, we conducted a system integrative analysis of GWAS and common meQTLs across whole life course. Briefly, we first selected a life course consistent genome-wide meQTLs annotation dataset from a recent meQTL longitudinal analysis conducting at five time points through the life course (Gaunt et al., 2016). The life course consistent meQTLs set was then integrated with the GWAS data of ADHD, BP and SCZ (Demontis et al., 2017;Group, 2011;Consortium, 2014) to identify the meQTLs sets showing significant associations for ADHD, BP and SCZ, respectively. The identified three psychiatric disorders associated meQTLs were finally subjected to DEPICT (Pers et al., 2015) for detecting the genes that account for associations as well as the pathways and tissue/cell types in which the genes' actions affect the risks of ADHD, BP and SCZ, respectively. Our study results provide novel useful information for clarifying the genetic mechanism of ADHD, BP and SCZ.

GWAS summary dataset of ADHD
A recent large-scale GWAS summary dataset of ADHD was applied here (Demontis et al., 2017). In brief, this GWAS dataset contained 20,183 ADHD cases and 35,191 controls collected from 12 cohorts. Quality control was performed using the bioinformatics pipeline Ricopili (available at https://github.com/Nealelab/ricopili) prior to analysis. Related individuals were excluded, and genetic outliers were removed based on principal component analysis. The 1000 Genomes Project Phase 3 reference panel was used for imputation. GWAS was conducted in each cohort using logistic regression with the imputed additive genotype dosages. Principal components were included as covariates to correct for population stratification, and variants with imputation INFO score < 0.8 or minor allele frequency < 0.01 were removed. The GWAS were then meta-analyzed using an inverse-variance weighted fixed effects model. Detailed information of cohorts, genotyping, imputation, and quality control approaches can be found in the published study (Demontis et al., 2017).

GWAS summary dataset of BP
GWAS summary dataset of BP was driven from a combined GWAS of BP (Group, 2011). Briefly, this study consisted of 16,731 individuals, including 7481 cases and 9250 controls. Primary genotype and phenotype data were received and divided into 11 case and control groupings. 46,234 SNPs were genotyped in all 11 groups and 1,016,924 SNPs were genotyped in between 2 and 11 groups. The missing genotypes were imputed using BEAGLE (Browning and Browning, 2007) based on the reference haplotypes from the HapMap phase 2 CEU samples. Logistic regression of case status was performed on imputed SNP dosage using PLINK. Detailed description of cohorts, genotyping, imputation, and quality control approaches can be found in the published study (Group, 2011).

GWAS summary dataset of SCZ
GWAS summary dataset of SCZ was driven from a multi-stage GWAS of SCZ, including 36,989 cases and 113,075 controls (Consortium, 2014). In brief, 49 ancestry matched, non-overlapping case-control samples (46 of European and 3 of East Asian ancestry, 34,241 cases and 45,604 controls) and 3 family-based samples of European ancestry (1235 parent affected-offspring trios) were constructed from genomewide genotype data. Genotype data from all studies were processed by the Psychiatric Genomics Consortium using unified quality control procedures. Imputation was performed using the 1000 Genomes Project reference panel. In each sample, association testing was conducted with PLINK using imputed marker dosages as well as principal components for controlling possible population stratification. An inverse-weighted fixed effects model was used for result combination. Detailed description of sample characteristics, experimental design, statistical analysis and quality control can be found in the previous study (Consortium, 2014).

meQTLs datasets of five time points of life course
A comprehensive genome-wide cis and trans blood meQTL longitudinal analysis was used in this study (Gaunt et al., 2016). The longitudinal analysis involved three time points of a large number of participants in the Avon Longitudinal Study of Parents and Children (ALSPAC) (Boyd et al., 2013) and two time points of their mothers (Fraser et al., 2013). Blood was selected from 1018 mother-child pairs. For children, samples were collected at birth (cord blood, n = 771), childhood (n = 834) and adolescence (n = 837). For their mothers, samples were collected during pregnancy (n = 764) and in middle age (n = 742) (Relton et al., 2015). DNA methylation levels were determined by the Illumina Infinium HumanMethylation450 BeadChip at five time points across the life course. Exhaustive whole-genome meQTL analysis was performed by testing approximately 8.3 million common SNPs against each reliable CpG probe for each time point. The meQTLs with genome-wide association testing P values < 5.0 × 10 −8 were used in our study. Fig. 1, presents the overlap levels of meQTLs among the five life time points.

Statistical analysis
The meQTLs with P values < 5 × 10 −8 of five time points were first selected and compared to obtain the meQTLs shared by all of the five time points of life course. The obtained life course consistent meQTLs were then aligned with the GWAS data to get the meQTLs showing association evidence (GWAS P values < 1.0 × 10 −5 ) with ADHD, BP and SCZ, respectively. We chose P value < 1.0 × 10 −5 as the GWAS significance threshold, which was recommend by the developers of the DEPICT software (Pers et al., 2015). Using the standard procedure and default parameters of the DEPICT software (Pers et al., 2015), the ADHD, BP and SCZ associated meQTLs were input to DEPICT for gene prioritization, pathway enrichment and tissue/cell type enrichment analysis, respectively. DEPICT is a powerful integrative tool that is capable of prioritizing likely casual genes at associated loci and testing enriched gene sets based on predicted gene function. Briefly, DEPICT prioritizes genes based on the assumption that truly associated genes should share functional annotations. Gene set enrichment analysis is conducted by identifying whether genes at given loci are enriched in 14,461 reconstituted gene sets. For tissue/cell type enrichment analysis, DEPICT utilizes 37,427 human Affymetrix microarray data to assess whether the genes at associated loci are highly expressed in any of the 209 Medical Subject Heading (MeSH) tissue and cell type annotations. The detailed description of DEPICT analysis approach is available in the published study (Pers et al., 2015). It has been widely used in previous studies for pathway analyses (Marouli et al., 2017;Ehret et al., 2016).

DEPICT gene prioritization results
A total of 1,424,656 meQTLs with P values < 5.0 × 10 −8 were shared by the five time points of life course and used in this study. After aligning the 1,424,656 meQTLs with the significant loci identified by the GWAS, we detected 935 meQTLs associated with ADHD, 189 meQTLs associated with BP and 17,012 meQTLs associated with SCZ. Gene prioritization analysis results are shown in Table S1. The top 30 genes of three psychiatric disorders are listed in Table 1. We identified three genes shared by BP and SCZ, including NISCH (P value = 9.87 × 10 −3 for BP, P value = 2.49 × 10 −6 for SCZ), TWF2 (P value = 3.02 × 10 −2 for BP, P value = 7.54 × 10 −3 for SCZ) and TRANK1 (P value = 3.10 × 10 −2 for BP, P value = 1.14 × 10 −2 for SCZ).

DEPICT tissue/cell type enrichment results
Tissue/cell type enrichment analysis results are presented in Table 3. As expected, most of the enriched tissue/cell types were related to various brain regions. We observed 14 brain related tissue/cell types shared by BP and SCZ, such as cerebral cortex (P value = 1.62 × 10 −2 for BP, P value = 9.61 × 10 −3 for SCZ) and cerebrum (P value = 1.36 × 10 −2 for BP, P value = 9.45 × 10 −3 for SCZ). Additionally, frontal lobe was shared by ADHD and BP (P value = 4.03 × 10 −2 for ADHD, P value = 2.55 × 10 −2 for BP) and basal ganglia was shared by ADHD and SCZ (P value = 4.54 × 10 −2 for ADHD, P value = 4.53 × 10 −2 for SCZ).

Discussion
Over the past few years, GWAS has overtaken other approaches of gene finding, such as candidate gene and linkage analyses, and has unveiled multiple new genes for major psychiatric disorders (Demontis et al., 2017;Group, 2011;Consortium, 2014). However, a large part of loci identified by GWAS locate at non-coding chromosomal regions and are likely to be missed. Therefore the genetic basis of major psychiatric disorders still remains elusive now. Recent studies have demonstrated the functional relevance of meQTLs with the development of complex diseases (Mcclay et al., 2015). Additionally, meQTLs have been observed enriched in low GWAS P values for several neuropsychiatric disorders (Gaunt et al., 2016), providing essential links between the loci in a GWAS region and the biological processes they affect. Considering that meQTLs are highly stable across the life span (Smith et al., 2014;Gaunt et al., 2016), we analyzed the role of life course consistent meQTLs in the pathogenesis of ADHD, BP and SCZ by integrating the GWAS data with the meQTLs shared by the five time points of life course. Comparing DEPICT results of ADHD, BP and SCZ, we successfully identified multiple common and disease specific genes, pathways and tissue/cell types for the three major psychiatric disorders.
Notably, one important finding of this study is that NISCH, TWF2 and TRANK1 were associated with both BP and SCZ. NISCH encodes a cytosolic protein Nischarin. It has been demonstrated that Nischarin is widely distributed throughout the brain, with a higher expression in the cerebral cortex and hippocampus (Ding et al., 2013). In addition, NISCH has been identified as a putative SCZ risk gene (Lin et al., 2016). TWF2 encodes twinfilin actin binding protein 2, which is involved in neurite outgrowth (Yamada et al., 2007). As for TRANK1, interestingly, in an investigation exploring shared risk loci for BP and SCZ, the most strongly associated SNP was located near TRANK1 (Forstner et al., 2017). Based on previous and our study results, we suggest the important impact of NISCH, TWF2 and TRANK1 on the development of both BP and SCZ.
Additionally, we identified several genes only associated with a certain psychiatric disorder. For instance, ST3GAL3 was the top gene only associated with ADHD. It has been found that ST3GAL3 mutation can lead to nonsyndromic intellectual disability (Hu et al., 2011). Furthermore, in a recent research, probe annotated to ST3GAL3 was among the identified 13 probes which were differentially methylated between high and low trajectories of ADHD symptoms at birth (Walton et al., 2016), indicating that there is a relationship between ST3GAL3 and ADHD. Our results further confirmed the functional relevance of ST3GAL3 with ADHD. SPTBN2 was the top gene only associated with BP. SPTBN2 encodes β-III spectrin, which is crucial for trafficking and anchoring of important neurotransmitter transporters and ion channels to the neuronal cell membranes (Clarkson et al., , 2014Perkins et al., 2010). In particular, mutations in SPTBN2 can lead to two human neurodegenerative diseases, spinocerebellar ataxia type-5 (SCA5) (Ikeda et al., 2006) and spectrin-associated autosomal recessive cerebellar ataxia type-1 (SPARCA1) (Lise et al., 2012). DGKZ was the top gene only associated with SCZ. Interestingly, Rietschel et al. has observed association between DGKZ and the risk of SCZ (Rietschel et al., 2012).
We detected 8 pathways shared by ADHD, BP and SCZ, such as KEGG_MAPK_SIGNALING_PATHWAY and REACTOME_CELL_DEATH_ SIGNALLING_VIA_NRAGE_NRIF_AND_NADE.
Several genes of  KEGG_MAPK_SIGNALING_PATHWAY have been suggested to be functionally related to nervous system or psychiatric disorders, such as AKT3, CACNG2 and PLA2G4B. AKT3 is highly expressed in human fetal brain (Wu et al., 2009). Mice with AKT3 knockout result in a decrease in neuronal number and size, which lead to selective reduction in brain size (Easton et al., 2005). CACNG2 was identified as a susceptibility gene of SCZ (Liu et al., 2008). PLA2G4B has been observed to be involved in the development of SCZ (Tao et al., 2005). For REACTOME_CELL_DEATH_SIGNALLING_VIA_NRAGE_NRIF_AND_ NADE, some of the genes in this pathway have been shown to be associated with SCZ, such as ARHGAP4 and NCSTN (Lin et al., 2016;Hatzimanolis et al., 2013). It is also noteworthy that the ARHGEF9 gene of this pathway encodes a crucial neuronal synaptic protein, dysfunction of which results in severe intellectual disability and epilepsy (Alber et al., 2017). Tissue/cell type enrichment analysis observed that three psychiatric disorders associated meQTLs were mainly related to brain as well as hemic and immune systems. Interestingly, abundant researches have observed the associations between immune system and psychiatric disorders. For instance, high levels of the acute-phase protein CRP were observed in major depressive disorder patients (Howren et al., 2009). In addition, previous study found that cognitive processes, including reaction time and working memory, could be influenced by cytokines (Brydon et al., 2008). Using rodent and non-human primate animal models, researchers have clearly demonstrated a causal relationship between maternal infection and autism spectrum disorder as well as SCZ related behavioral abnormalities (Estes and Mcallister, 2016). Furthermore, in a population based longitudinal study, high concentrations of serum interleukin 6 in childhood have been associated with increased risk of depression and psychosis in young adult life (Khandaker et al., 2014).
Psychiatric disorders generally have both common and distinct genetic basis. Previous studies have explored risk loci and pathways with shared effects on some psychiatric disorders (Consortium, 2013;Forstner et al., 2017). However, few of them were from the perspective of meQTL. In this study, we compared gene prioritization, pathway enrichment and tissue/cell type enrichment analyses results of the three psychiatric disorders associated meQTLs. Taken together, findings from our study demonstrate that the three major psychiatric disorders have both common and specific causal genes and pathways. Some of our findings have already been related to psychiatric disorders by previous studies (Lin et al., 2016;Walton et al., 2016;Rietschel et al., 2012).
There are two limitations that should be noted. First, through integrating GWAS and five life time points consistent meQTLs datasets, we identified multiple candidate genes and pathways associated with ADHD, BP or SCZ. However, to the best of our knowledge, the biological mechanism of identified genes and pathways involving in the development of ADHD, BP and SCZ remains largely unknown. Further biologic mechanism studies, such as cell and animal experiments are needed. Additionally, it is well known that environment factors are capable of affecting DNA methylation levels (Christensen et al., 2015). Given the important impacts of environmental factors on the risks of psychiatric disorders, it is worthy to explore the joint effects and related biological mechanism of environmental risk factors, candidate genes and pathways in the pathogenetic studies of psychiatric disorders. Second, it is worth noting that the meQTLs datasets of birth, childhood and adolescence time points were derived from children, and the meQTLs datasets of pregnancy and middle age time points were derived from their mothers. Although all study subjects are Caucasians, the difference of the study population among different life stages may cause discrepancies that have nothing to do with age, and therefore exaggerate the meQTL differences among different life stages.
In conclusion, we conducted an integrative analysis of GWAS and life course consistent meQTLs datasets for ADHD, BP and SCZ, respectively. We identified multiple genes, pathways and tissue/cell types with common or disease specific effects. It is important for future work to continue to investigate the detailed biological mechanism. We hope that our work could provide novel clues for exploring the pathogenesis of ADHD, BP and SCZ.

Funding
This study was supported by the National Natural Scientific Foundation of China [81472925,81673112]; and the Key projects of international cooperation among governments in scientific and technological innovation [2016YFE0119100].