Common and Rare Variant Analysis in Early-Onset Bipolar Disorder Vulnerability

Bipolar disorder is one of the most common and devastating psychiatric disorders whose mechanisms remain largely unknown. Despite a strong genetic contribution demonstrated by twin and adoption studies, a polygenic background influences this multifactorial and heterogeneous psychiatric disorder. To identify susceptibility genes on a severe and more familial sub-form of the disease, we conducted a genome-wide association study focused on 211 patients of French origin with an early age at onset and 1,719 controls, and then replicated our data on a German sample of 159 patients with early-onset bipolar disorder and 998 controls. Replication study and subsequent meta-analysis revealed two genes encoding proteins involved in phosphoinositide signalling pathway (PLEKHA5 and PLCXD3). We performed additional replication studies in two datasets from the WTCCC (764 patients and 2,938 controls) and the GAIN-TGen cohorts (1,524 patients and 1,436 controls) and found nominal P-values both in the PLCXD3 and PLEKHA5 loci with the WTCCC sample. In addition, we identified in the French cohort one affected individual with a deletion at the PLCXD3 locus and another one carrying a missense variation in PLCXD3 (p.R93H), both supporting a role of the phosphatidylinositol pathway in early-onset bipolar disorder vulnerability. Although the current nominally significant findings should be interpreted with caution and need replication in independent cohorts, this study supports the strategy to combine genetic approaches to determine the molecular mechanisms underlying bipolar disorder.


Introduction
Numerous genome-wide association (GWA) studies have recently been performed on bipolar disorder (BD), but few signals were replicated and meta-analyses identified only a couple of associated genes, such as CACNA1C, ODZ4 and NCAN, with a small effect size [1][2][3][4]. Although convergent evidences argue in favour of a role for these genes in vulnerability to BD [5][6][7][8][9], more extensive studies suggest it might be involved in other psychiatric disorders [3,10,11]. The difficulty to identify susceptibility genes specific to BD could be due to the ethnic heterogeneity, as a consequence of the huge number of patients and controls required to get genome-wide significant signals, or to the clinical and a genetic heterogeneity of the DSM-IV ''bipolar disorder'' entity.
In the current study, we focused on patients with BD selected with an early age at onset, a clinical characteristic known to identify a subgroup of patients with a higher morbid risk for BD in relatives [12][13][14][15][16][17]. We then compared these data to those observed in two previously published GWA studies of BD, conducted by the Wellcome Trust Case Control Consortium (WTCCC) [18] and the Genetic Association Information Network Bipolar Sample (GAIN-BP) [19]. Finally, we screened our regions of interest for rare variants, suspected to have a larger effect size, and suggested the importance of combined approaches in the identification of vulnerability genes to BD.

Ethics Statement
Protocols and procedures were approved by the research ethics board of the Pitié-Salpêtrière Hospital in Paris for the French sample and the ethics committees of the Faculties of Medicine at the Universities of Bonn and Mannheim/Heidelberg for the German sample. Written informed consent was obtained from all subjects prior to study participation.

Sample
French sample. Two hundred and twenty patients of French descent with at least three grandparents from mainland France were collected through a French national network for mental health (Fondation FondaMental) in three university-affiliated psychiatry departments (Paris-Créteil, Bordeaux and Nancy) as previously described [12]. All patients met DSM-IV criteria [20] for BD type I (N = 172), type II (N = 46) or BD not otherwise specified (N = 2) before the age of 22. The age at onset was defined by the first mood episode (depressive, manic or hypomanic) and was retrospectively assessed using medical case notes and information obtained with the Diagnostic Interview for Genetic Studies (DIGS) [21]. The threshold of 22 was chosen on the basis of previous admixture analyses [16,[22][23][24][25].
German sample. One hundred and sixty-seven German patients were recruited from consecutive admissions to the psychiatric inpatient units of the Central Institute of Mental Health in Mannheim and the Department of Psychiatry and Psychotherapy of the University of Bonn (BOMA bipolar study). These patients received a lifetime diagnosis of BD type I based on DSM-IV criteria and had an age at onset lower than 22.
Control population. Genotyping data from 1,823 French controls and 629 German controls were provided by the Centre National de Génotypage (CNG, Evry, France). These samples were previously collected and were not screened for psychiatric disorders. They have been genotyped for association and population studies as described elsewhere [26] and permission to use them were obtained from the original investigators. DNA from 380 additional German controls were collected in the above mentioned German hospitals. An additional sample of 293 unaffected French controls screened for personal history of psychiatric disorders using the DIGS has been screened for mutation in the PLCXD3 exon 2, in order to determine the specificity of the p.R93H mutation.

DNA extraction and genotyping
DNA was isolated from lymphocytes either directly from venous blood sample or after transformation by Epstein Barr virus. Isolation was performed by salting-out with saturated sodium chloride solution [27]. French and German cases were genotyped at the CNG, using HumanHap550 or Human 610-Quad BeadArrays and the Infinium II assay (Illumina, San Diego, CA, USA). Control samples were genotyped on HumanHap300 (N = 2,503) or on HumanHap550 BeadArrays (N = 380). In order to determine whether the use of multiple arrays might result in a population stratification, we performed a multidimensional scaling analysis based on pairwise identity-by-state distance between pairs of individuals. The good clustering of all individuals showed that there was no stratification due to the use of multiple arrays ( Figure  S1 in File S1).

Quality control criteria
All samples were passed through the standard quality-control procedures followed at the CNG for GWA studies, as described elsewhere [26]. For the French population, the quality control was performed using the PLINK toolset [28] on a total of 317,131 commons SNPs genotyped in 2,043 DNA samples (Table S1 in File S1). We removed monomorphic SNPs, as those with a minor allele frequency (MAF) lower than 0.01 and SNPs with a call rate lower than 0.97. We performed a missing chi-square test that compares, for each SNP, missingness between cases and controls and excluded SNPs with P,10 23 . Then, we removed SNPs that did not show Hardy-Weinberg proportions in controls, using a significant threshold at P,10 23 . In parallel, we removed male samples with more than 0.5% or female samples with less than 20% heterozygous markers on the X chromosome and individuals with a call rate lower than 0.97. We estimated the average genome-wide identity-by-state (IBS) sharing between individuals and analysed the clustering using a multidimensional scaling (MDS) plot. Hence, we excluded 15 individuals (3 cases and 12 controls) with a low level of identity. After pruning, the final data set consisted in 261,525 SNPs genotyped in 1,930 individuals (211 cases and 1,719 controls) that were used for association study. Identical quality control criteria were used for the replication sample from Germany (Table S1 in File S1) and MDS plot showed 6 controls to be removed. This led to a final data set of 288,167 SNPs and 1,157 individuals (159 patients and 998 controls).

Association analysis
Basic case/control association analyses were performed using the PLINK software v1.07 [28]. A haplotype analysis was performed on the two most associated regions using sliding windows of two to five SNPs shifting by one SNP at a time. We tested for case/control haplotype-based association using a chisquare test with one degree of freedom. Only window with best Pvalues are reported for each region. Imputation data were based on the 3,967,651 SNPs genotyped in 90 subjects of the CEU population of the HapMap project (http://www.hapmap.org). Imputation was performed using PLINK and consisted in estimating the allele frequencies of an ungenotyped SNP based on its surrounding haplotypic background.
Genotyping data from the WTCCC [18] and the GAIN-TGen collection [19] were provided for BD patients with an age at onset lower than 22 and for controls. In the WTCCC, BD was defined according to Research Diagnostic Criteria [29] and included BD type I, type II, schizoaffective disorders and manic disorders. The WTCCC sample was genotyped on Affymetrix 500K array, containing only few SNPs in common with the Illumina arrays used in other samples. Thus, the genotype correlation data were obtained for SNPs within a 1 Mbp window, based on individuals and SNPs in the CEU HapMap filtered set (http://www.hapmap. org) and the 2,938 WTCCC control sample combined [18]. An association test between 764 early-onset patients and 2,938 controls was performed, using SNPs with a genotyped correlation higher than 0.2 with associated SNPs at P,5610 25 identified through the meta-analysis.
All patients from the GAIN-TGen cohort met DSM-IV criteria for BD type I. Imputed data from the GAIN-TGen bipolar casecontrol sample [30] were obtained, using the MACH software with phased haplotypes from the 60 HapMap-CEU founders (release 22). Allele frequencies of 15 SNPs out of the 16 associated SNPs at P,5610 25 , were compared between 1,524 patients with an early age at onset and 1,436 controls.
A flow chart summarizing all steps of the genetic analysis is shown on the Figure S2 in File S1.

Detection of copy number variations
Copy number variations were detected by fluorescence level comparison through the chromosome, using the SnipPeep software (R. Toro, Pasteur Institute, France). The deletion in PLCXD3 was confirmed by quantitative PCR analysis using one internal and two external couples of primers and Mesa Green qPCR master mix plus (Eurogentec, Liege, Belgium). PCR were performed and run in a Mastercycler ep realplex2S (Eppendorf, Hamburg, Germany) for the deleted patient and two unaffected controls, known to have no deletion in the regions under study. Each real-time PCR was performed in triplicate and assessed by comparing Ct at a determined threshold between the three amplicons and the three individuals.

Mutation screening in PLCXD3
The PLCXD3 coding regions were amplified for 334 French subjects with BD, sequenced by the Sanger's method using the BigDye terminator v3.1 cycle sequencing kit (Life Technologies, Carlsbad, CA, U.S.A.) and run on a 16-Capillary ABI PRISM 3130xl genetic analyzer. The PLCXD3 exon 2 was sequenced for 293 unaffected French controls. All primers used for PCR amplifications and sequence analyses are available on request.

Analysis of a French cohort
We carried out an association study on 211 patients with earlyonset BD of French origin and 1,719 matched controls, who have been genotyped on an Illumina platform with 261,525 single nucleotide polymorphisms (SNPs) passing quality control. The estimation of an inflation factor [31] showed a good genetic homogeneity in our sample (l = 1.03), suggesting that no correction for genomic control was needed. According to the genome-wide significance threshold of P = 5610 28 , no significant difference in allele frequencies was observed in our sample, but 17 SNPs showed a nominal P,5610 25 (Table S2 in File S1), suggesting these loci may be associated with vulnerability to earlyonset forms of BD. Interestingly, the lowest P-value (rs10096683, P = 4.11610 26 , OR = 2.26), was observed for a SNP located on chromosome 8p22, a region that has previously been associated with schizophrenia [32][33][34][35] and suicide behaviour in patients with recurrent major depression [36].

Replication on a German cohort
Next, we selected 14,037 SNPs with a P,0.05 to perform a replication study on an independent sample of 159 German subjects with an early-onset BD and 998 matched controls (Table  S1 in File S1). Again, a good genetic homogeneity (l = 1.04) was observed after quality control for this sample. Out of the 14,037 SNPs previously selected, 13,734 were genotyped in the German sample, and 739 showed a P,0.05, showing a significant overrepresentation (P,0.03) of nominally significant P-values. The most associated SNP in the German sample was located on chromosome 12 upstream to the PLEKHA5 gene (rs2970836, P = 5.3610 26 , OR = 1.76).

Meta-analysis on the French and the German cohorts
We subsequently performed a meta-analysis, combining data from the French and the German cohorts, yielding a sample of 370 patients with early-onset BD and 2,717 controls. Again, no locus reaches the genome-wide significant threshold. The largest difference in allele frequencies between cases and controls was observed for chromosome 12 (rs2961365, P meta = 1.6610 26 , OR meta = 1.57) with no evidence of heterogeneity between the two cohorts (Q = 0.94, 1 degree of freedom, I 2 = 0). In this region, 5 SNPs showed a difference in allele frequencies at P,5610 25 in 355 Kbp ( Figure 1, Table 1). These SNPs span two genes, PLEKHA5 and AEBP2. The second strongest association with no heterogeneity (Q = 0.59, 1 degree of freedom, I 2 = 0) was observed for chromosome 5p13 (rs10512793, P meta = 2.7610 26 , OR meta = 1.61), in which two SNPs, spaced by 196 Kbp, showed a difference in allele frequencies at P,5610 25 ( Figure 1, Table 1). These two SNPs were located in the OXCT1 gene, and upstream to PLCXD3. In order to further explore the associated peak in these two regions, we used a three-SNP sliding window and performed a haplotype analysis. Whereas no further significant difference, as compared to single SNP analysis, was observed on chromosome 12p12, a more significant difference between patients and controls was observed on chromosome 5p13 for the combination of rs624097-rs316762-rs10512793, for which an overall P meta = 2.6610 27 was observed ( Figure 1).

Imputation data
In order to refine the association signal in these two regions, we imputed data for ungenotyped SNPs in the French and the German cohorts based on the 3.9 million SNPs genotyped in the CEU population of the HapMap project. The results of the two populations were then combined to perform an association study. After imputation, we observed an increased signal for chromosome 12 with the lowest estimated P-value for rs10743315 (P = 2.1610 27 , OR = 1.61) located in PLEKHA5 (Figure 1). On the contrary for chromosome 5, the lowest P-value was observed for the genotyped SNP, rs10512793.

Replication studies on the WTCCC and the GAIN-TGen cohorts
We analysed the top-16 SNPs (P,5610 25 ) out of the metaanalysis in two additional samples of early-onset bipolar patients, already genotyped on different platforms through the WTCCC and the GAIN and performed a mega-analysis including the four samples (Table S3 in

Structural abnormalities in region of interest
Fluorescence level of genotyped SNPs was only available for a subsample of early-onset patients of French and German origin (N = 321). We analysed this fluorescence level in the two 1.5 Mbp regions of interest on chromosome 5p13 and 12p12. On chromosome 5p13, we identified, in a French patient who manifested a first mood episode by the age of 17, a 90-Kbp deletion removing the PLCXD3 promoter region as well as the first coding exon of the gene (Figure 2), whereas no copy number variation has been previously reported in this region in the database of genomic variations (http://projects.tcag.ca/variation/). Although DNA was not available for first-degree relatives of this patient, the deletion was confirmed using quantitative PCR analysis (data not shown). In order to determine whether independent family specific mutations might be frequently found in bipolar patients, we screened all coding exons of PLCXD3 in 334 French subjects (204 patients with early-onset BD and 130 patients with late-onset BD). Three synonymous variations (p.V58, p.A130 and p.P251) were found in these patients, and one amino acid change (p.R93H) was identified in one subject with early-onset BD type I and obsessive compulsive disorder. This variation was not found in 293 unaffected matched controls. This amino acid is highly conserved through evolution and, according to PolyPhen-2 [37], this variation is predicted to strongly affect the function of the protein. Further analysis of the available DNA sample in this family showed that this variation was transmitted from the father ( Figure S3 in File S1). The patient manifested first mood symptoms by the age of 19. Although his father did not meet criteria for mood disorder according to DSM-IV, his first cousin manifested major depressive disorder by the age of 65. Unfortunately, no DNA was available for this subject.

Discussion
The important clinical heterogeneity of BD has probably hampered the identification of vulnerability genes so far. After tumultuous starts, GWA studies have enabled the identification of robust and replicable genetic findings of weak effect, in regions of CACNA1C, ODZ4 and NCAN, but without disease specificity [4]. In this study, we conducted a genome-wide approach on a highly heritable phenotype, focusing on patients with an early age at onset. Based on the genotyping data of 261,525 SNPs, we did not identify SNPs that reached the genome-wide significant threshold in our patients and controls of French origin. This was most likely due to the small sample size of our cohort. We calculated the statistical power of our study, using the CaTS software [38] and estimated that we had 80% chance to detect only SNPs with a genetic risk factor higher than 1.7 ( Figure S4 in File S1). However, recent data from larger cohort studies showed a polygenic component in BD [3,39], consistent with a combine vulnerability resulting from many variants of weak effect. We thus conducted a replication study on a German cohort of patients with early-onset BD and performed a subsequent meta-analysis. Although it did not reach the genome-wide significant threshold, the most associated region in our meta-analysis was located on chromosome 12p12. This region has been previously associated with bipolar disorder [40][41][42] and more specifically when patients had an early age at onset of mania [40]. Five SNPs in this region showed a difference in allele frequencies between patients and controls, spanning two genes, PLEKHA5 and AEBP2. The genetic refinement using imputed data showed the lowest P-value for a SNP located in PLEKHA5 (rs10743315, P = 2.1610 27 ). PLE-KHA5 encodes a protein containing a pleckstrin homology domain, which interacts with phosphatidylinositol 3-phosphate [43]. Few elements are known about this gene except its ubiquitous pattern of expression, including foetal and adult brains. Nevertheless, there is compelling evidence for the implication of the phosphatidylinositol signalling pathway in the etiopathogeny of BD and the two most effective mood stabilisers for BD (lithium and valproate) directly inhibit this pathway [44]. Further exploration of response to mood stabilizer in regards to the PLEKHA5 genotypes might help in understanding the difference observed according to the patients' age at onset [15].
The second most associated region in our meta-analysis was located on chromosome 5p13. In this region, the highest difference in allele frequencies between patients with early-onset BD and controls was observed for rs10512793. This SNP was located in OXCT1, which encodes a 3-oxoacid CoA transferase 1. This enzyme catalyzes the reversible transfer of CoA from succinyl-CoA to acetoacetate and is thus the first step of ketone body utilization, the main source of lipid-derived energy for the brain  [45]. Mutations in this gene are associated with succinyl CoA:3oxoacid CoA transferase (SCOT) deficiency, characterised by episodes of severe ketoacidosis [46,47], which can also emerge during treatment with some atypical antipsychotics [48]. However, the haplotype analysis of this region suggested that the associated peak might be located more likely downstream to OXCT1 and upstream to PLCXD3. This result is consistent with the 90 Kbpdeletion reported in one of the French patients and that removed the promoter region and the first coding exon of PLCXD3. Further exploration of this gene revealed a mutation predicting to change the amino acid sequence of the protein (p.R93H) and transmitted from his father, who had a first cousin with major depressive disorder. Interestingly, this gene encodes a phosphatidylinositol-specific phospholipase C, strengthening the involvement of the phosphoinositide-signalling pathway in vulnerability to BD. Note that tricyclic antidepressants (e.g. desipramine), which are known to induce rapid cycling or induce manic or hypomanic episodes in some patients with BD [49], stimulate phospholipase C activity and the production of the second messenger inositol 1,4,5trisphosphate [50]. In addition, experiments performed using rat cultured hippocampal neurons revealed that desipramine rapidly enhanced the spontaneous SNAP25-dependent vesicular release of glutamate [51]. This Ca 2+ dependent mechanism is consistent with our previously reported association between SNAP25 and earlyonset BD [12] and with the strong associations observed between SNPs located in CACNA1C and BD [1,3].
The main limitation of our study is not to reach the stringent genome-wide threshold. Although this might translate an absence of common genetic variations for a clinical subgroup based on an early age at onset, this is also a direct consequence of the small sample size and the loss of power inherent to the subphenotype analysis. Similar study on larger sample should be performed to confirm our data and valid the relevance of such approach. Nevertheless, results observed in our French subsample with an early age at onset have been replicated in a German sample and supported by the data from the WTCCC with similar phenotype criteria.
Another possible cause of the loss of power in our analysis is the categorical exploration of the age at onset. Despite a widely valid threshold for the age at onset [16,17,[22][23][24][25], a continuous exploration of this variable would allow to determine whether specific loci might lead to various age at onset, which would result, under multiplicative or additive models, in an overall early age-atonset in patients. Such hypothesis is supported by the recent observation of the International Schizophrenia Consortium, which showed the molecular genetic evidence for a substantial polygenic component to the risk of BD involving thousands of common alleles of very small effect [39]. Similar analysis should thus be performed according to the patients' age at onset, but this should require larger samples for the study and the replication analysis.
In conclusion, although we did not reach the genome-wide significant threshold in our association study, we report here convergent data between our GWA study on early onset BD, the structural abnormality observed in one patient and a missense mutation reported in one family with early-onset BD. The difficulties to reach the stringent genome-wide significant threshold suggest that larger samples should be included, which will also help in better defining the most convenient strategy to take into account the age at onset of patients. Although the current nominally significant findings should be interpreted with caution and need replication in independent cohorts, our results displayed that the combined exploration of common and rare variants in BD can reveal identical pathways suggesting both approaches should be systematically used in further genetic exploration of this disorder. Using these approaches, we provided evidence that variations in phosphoinositide secondary messenger signalling pathway, and for the first time, that the PLEKHA5 and PLCXD3 genes, might confer vulnerability to early-onset BD. Further exploration of these mechanisms might be used to better identify markers of prognosis and help the development of innovative treatment for these severe forms of the disease.