Whole Genome Association Study in a Homogenous Population in Shandong Peninsula of China Reveals JARID2 as a Susceptibility Gene for Schizophrenia

DNA pooling can provide an economical and efficient way to detect susceptibility loci for complex diseases. We carried out a genome screen with 400 microsatellite markers spaced at approximately 10 cM in two DNA pools consisting of 119 schizophrenia (SZ) patients and 119 controls recruited from a homogenous population in the Chang Le area of the Shandong peninsula of China. Association of SZ with D6S289, a dinucleotide repeat polymorphism in the JARID2 gene, was found and confirmed by individual genotyping (χ² = 17.89; P = .047). To refine the signal, we genotyped 14 single nucleotide polymorphisms (SNPs) covering JARID2 and the neighboring gene, DTNBP1, in an extended sample of 309 cases and 309 controls from the Shandong peninsula (including the samples from the pools). Two SNPs in JARID2, rs2235258 and rs9654600, showed association with SZ in allelic, genotypic, and haplotypic tests in patients from the Chang Le area. Although this association was not replicated in the extended sample, we conclude that JARID2 could be a susceptibility gene for SZ.


Introduction
Schizophrenia (SZ; MIM 181500) is a severe psychiatric disorder that affects ∼1% of the general population and has a strong genetic component [1]. However, its mode of transmission is complex and most likely polygenic [2]. Over the last two decades, linkage and association studies have identified a number of chromosomal regions and several promising candidate genes [3,4]. Those with the strongest evidence are neuregulin 1 (NRG1), dysbindin (DTNBP1), Disrupted-in-Schizophrenia 1 (DISC1), and regulator of G-protein signalling 4 (RGS4). In addition, D-amino acid oxidase activator (DAOA) has been implicated in multiple studies of schizophrenia and bipolar disorder (BP) [5]. Support for the involvement of these genes in the pathophysiology of psychotic disorders comes from expression and neurobiological studies as well as from linkage and association studies [6][7][8][9]. However, in spite of what might appear to be an impressive weight of evidence, in no case has there been consistent replication of the same markers and haplotypes across studies, and specific risk or protective nucleotide variants have not been consistently and unambiguously identified. While such consistency might emerge as these genes are subjected to more detailed and systematic analysis in sufficiently large samples, the genetic association evidence for these candidates is currently neither as strong nor as consistent as that reported for other complex disorders such as type 1 and type 2 diabetes, rheumatoid arthritis, and macular degeneration [10][11][12][13][14][15][16]. Much hope is now pinned on genome-wide association (GWA) studies, which can now test hundreds of thousands of SNPs. However, such studies are expensive and require very large samples to compensate for the necessary correction for multiple testing, and are therefore beyond the reach of smaller research teams.
Small laboratories can, however, rely on DNA pooling studies, which dramatically reduce the cost [17].
Although the sensitivity and specificity of pooled DNA analyses are imperfect, and clearly not all associated loci will be detected, previous research suggests that pooling offers an economical alternative to individual genotyping. Association studies using microsatellite markers spaced every 100 to 150 kb across the human genome have been reported to have distinct advantages over SNP typing, because of the much longer extent of linkage disequilibrium (LD) around microsatellites and their higher informativeness [18].
When an appropriate isolated population of relatively recent origin can be studied, it may be possible to screen the entire genome with fewer microsatellite markers [19]. On this basis, we investigated the immigration history of the population residing in the Shandong peninsula of China, and identified the population of the Chang Le area as having the characteristics of an isolate. This area lies in the east-central part of the Shandong peninsula, and its population expanded from a limited number of ancestors who emigrated from the Hong Dong district of Shan Xi in 1388. We reasoned that the 600-year history of this population is short enough to preserve long-range LD, so that some of it might be captured by only 400 microsatellite markers spaced at about 10 cM throughout the human genome. We acknowledge that there is insufficient evidence that this population is homogenous and that our coverage of the genome is likely to be incomplete. Ideally, we would have covered the genome with 3 000-6 000 markers, but this was beyond our resources.

Subjects.
A total of 119 unrelated SZ cases and 119 controls from Chang Le County in the Shandong peninsula of China were recruited for the first stage, the DNA pooling genome screen. According to historical records (Ming Taizu Record), the population of the Shandong peninsula in the fourteenth century was reduced dramatically by continuous war and natural disasters (flood, plague, and grasshoppers), and most villages were devastated. Many people left the area. In contrast, there was no war or natural disaster in Shan Xi province at that time. The two provinces are separated by a distance of 1000 km and by high mountains that serve as a natural barrier. To rebalance the population distribution in China, the Ming dynasty emperor (Y.Z. Zhu) initiated two massive waves of migration from Shan Xi to the Shandong peninsula from 1388 to 1389 to repopulate the region. The present-day population of the Shandong peninsula therefore appears to descend mostly from several groups of a limited number of ancestors who settled there 600 years ago. We thus expect a higher degree of homogeneity in the genetic structure of this population than in the outbred populations formed over the past 50 years in many other parts of China, especially in the big cities. The mean age of the cases was 33.98 ± 11.99 years, with a male/female ratio of 1.05 (61/58); the mean age of the controls was 22.14 ± 5.59 years, with a male/female ratio of 1.05 (61/58). The difference in mean age between cases and controls is significant.
Two further groups of SZ cases and controls were recruited from the Jinan and Zichuan areas, giving a total of 309 cases and 309 controls for the second stage of the study. All SZ cases were recruited from the inpatient departments of local psychiatric hospitals. Their mean age was 27.52 ± 10.84 years and the male/female ratio was 1.73 (196/113). All controls were recruited from local blood transfusion centers geographically matched to the three psychiatric hospitals. Controls were not screened for mental illness. The mean age of the controls was 34.21 ± 22.22 years and the male/female ratio was 1.83 (200/109). All cases and controls are Han Chinese. Diagnoses were made according to DSM-IV criteria [20] by at least two experienced psychiatrists, on the basis of empirical diagnostic interviews and review of medical records. Written consent was obtained from all subjects, and the study was approved by the Ethics Committee of the Institute of Basic Medicine of Shandong Academy of Medical Sciences.

DNA Pools Construction.
Genomic DNA was extracted from peripheral venous blood by a modified phenol-chloroform method. DNA concentration was measured twice on a DU-530 Life Science UV/Vis spectrophotometer (Beckman). Samples whose two readings differed by > 10% were measured a third time, and the mean of the three measurements was used. DNA samples were diluted to a working concentration of 20 ng/µL, and two DNA pools, one of SZ cases and one of controls, were constructed by combining 10 µL of each sample.
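The concentration check and pooling arithmetic described above can be sketched as follows. This is a minimal illustration, not the laboratory's actual procedure: it assumes the 10% difference is measured relative to the mean of the two readings, that samples passing the check use the mean of their two readings (the text only specifies the three-reading case), and it uses a hypothetical final dilution volume of 100 µL.

```python
def qc_concentration(readings, tolerance=0.10):
    """Concentration (ng/uL) to use for one sample, or None if a third reading is needed.

    Assumptions (not stated in the text): the 10% difference is relative to the mean
    of the first two readings, and passing samples use the mean of those two readings.
    """
    first, second = readings[0], readings[1]
    mean_two = (first + second) / 2
    if len(readings) >= 3:
        # Three readings available: use the mean of all three, as described.
        return sum(readings[:3]) / 3
    if abs(first - second) / mean_two > tolerance:
        return None  # readings disagree by more than 10%: measure again
    return mean_two


def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=20.0, final_ul=100.0):
    """Stock and diluent volumes (uL) from C1*V1 = C2*V2; final_ul is hypothetical."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is below the working concentration")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul


def pool_totals(n_samples, ul_per_sample=10.0, ng_per_ul=20.0):
    """Total pool volume (uL) and DNA mass (ng) when each sample adds the same volume."""
    volume_ul = n_samples * ul_per_sample
    return volume_ul, volume_ul * ng_per_ul


if __name__ == "__main__":
    print(qc_concentration([100.0, 104.0]))         # 102.0
    print(qc_concentration([100.0, 120.0]))         # None (re-measure)
    print(qc_concentration([100.0, 120.0, 110.0]))  # 110.0
    print(dilution_volumes(100.0))                  # (20.0, 80.0)
    print(pool_totals(119))                         # (1190.0, 23800.0)
```

With 20 ng/µL working stocks and 10 µL per sample, each of the 119 individuals contributes 200 ng, so every person is represented equally in a 1190 µL pool containing 23.8 µg of DNA.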

Genotyping.
We used microsatellite markers from the ABI PRISM Linkage Mapping Set MD-10 (Applied Biosystems, Foster City, Calif, USA), comprising 400 markers with an average spacing of 10 cM. Polymerase chain reaction (PCR) was performed in a 15 µL volume containing 20 ng of genomic DNA, 1.5 mM MgCl2, 10x reaction buffer, 0.2 mM of each dNTP, 0.3 µM of each primer, and 1 U of Taq polymerase. Amplification was performed on a GeneAmp 9700 thermocycler (Applied Biosystems) with the following program: initial denaturation at 95 °C for 12 minutes; 10 cycles of 94 °C for 15 seconds, 55 °C for 15 seconds, and 72 °C for 30 seconds; 20 cycles of 89 °C for 15 seconds, 55 °C for 15 seconds, and 72 °C for 30 seconds; final extension at 72 °C for 10 minutes; and a 4 °C hold. Electrophoresis was carried out on an ABI PRISM 3100-Avant Genetic Analyzer, with up to nine PCR products pooled per run, chosen so that no two products labelled with the same dye overlapped in size range. Genotypes were called with GeneMapper 3.5 software (Applied Biosystems).
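The multiplexing constraint above (at most nine products per electrophoresis run, with no size overlap between products carrying the same dye) can be expressed as a simple first-fit grouping. This is a hypothetical sketch: the greedy strategy, marker names, and size ranges are illustrative assumptions, not the panel design actually used, and FAM and VIC appear only as example dye labels.

```python
from typing import List, NamedTuple


class Product(NamedTuple):
    marker: str
    dye: str
    min_bp: int
    max_bp: int


def conflicts(a: Product, b: Product) -> bool:
    """Two products conflict if they share a dye and their size ranges overlap."""
    return a.dye == b.dye and a.min_bp <= b.max_bp and b.min_bp <= a.max_bp


def build_panels(products: List[Product], max_per_panel: int = 9) -> List[List[Product]]:
    """Greedy first-fit: put each product in the first panel it fits, else open a new one."""
    panels: List[List[Product]] = []
    # Heuristic: place products with the widest size ranges first.
    for p in sorted(products, key=lambda q: q.max_bp - q.min_bp, reverse=True):
        for panel in panels:
            if len(panel) < max_per_panel and not any(conflicts(p, q) for q in panel):
                panel.append(p)
                break
        else:
            panels.append([p])
    return panels


if __name__ == "__main__":
    example = [
        Product("M1", "FAM", 100, 150),
        Product("M2", "FAM", 140, 200),  # overlaps M1 in the same dye
        Product("M3", "VIC", 100, 150),
        Product("M4", "FAM", 210, 260),
    ]
    for i, panel in enumerate(build_panels(example), start=1):
        print(i, [p.marker for p in panel])
```

First-fit is not guaranteed to find the minimum number of runs, but it always respects both the per-run limit and the same-dye size constraint.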
SNPs for fine mapping were selected from the HapMap database (www.hapmap.org), ensuring that they are polymorphic in the Chinese population (CHB). LD blocks were determined using HAPLOVIEW ver. 3.2 software [21] with the criteria r² > 0.80 and a minimum MAF of 2%. Haplotype tag SNPs (htSNPs) were defined as those capturing 90% of the haplotype diversity within each LD block, with no minimum minor allele frequency exclusion criterion. Fourteen SNPs were selected around DTNBP1 and JARID2 (Table 2). Thirteen of these were genotyped using the Sequenom MassARRAY system according to the manufacturer's instructions, with either hME or iPLEX chemistry. SNP rs4715984 was genotyped with allele-specific PCR using the Amplifluor system [22,23], as it could not be fitted on a Sequenom panel.
All assays used to type the full association sample were first optimized by genotyping DNA from the 30 CEPH parent-offspring trios used in the International HapMap Project. All genotypes were called blind to sample identity and affection status.
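The r² criterion used in the SNP selection above measures pairwise LD between two biallelic SNPs. A minimal sketch of the standard calculation follows, assuming phased two-locus haplotype counts are already available (HAPLOVIEW itself estimates haplotype frequencies from unphased genotypes, which this sketch does not do). The helper `passes_selection` and the example counts are hypothetical.

```python
def ld_stats(hap_counts):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotype counts.

    hap_counts maps the two-locus haplotypes "AB", "Ab", "aB", "ab" to counts;
    missing keys are treated as zero.
    """
    n = sum(hap_counts.values())
    n_AB = hap_counts.get("AB", 0)
    p_AB = n_AB / n
    p_A = (n_AB + hap_counts.get("Ab", 0)) / n  # frequency of allele A at SNP 1
    p_B = (n_AB + hap_counts.get("aB", 0)) / n  # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("a SNP is monomorphic in these haplotypes")
    r2 = d * d / denom
    # D' scales D by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, abs(d) / d_max, r2


def passes_selection(r2, maf_1, maf_2, r2_min=0.80, maf_min=0.02):
    """Apply the r^2 > 0.80 and MAF >= 2% thresholds quoted in the text."""
    return r2 > r2_min and min(maf_1, maf_2) >= maf_min


if __name__ == "__main__":
    d, d_prime, r2 = ld_stats({"AB": 40, "Ab": 10, "aB": 10, "ab": 40})
    print(round(d, 3), round(d_prime, 3), round(r2, 3))  # 0.15 0.6 0.36
    print(passes_selection(r2, 0.5, 0.5))  # False: r^2 = 0.36 is below 0.80
```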

Statistical Analysis.
For the DNA pools, allele frequencies of each microsatellite in cases and controls were first estimated by dividing the height of each allele peak, as reported by GeneMapper v3.5, by the sum of all peak heights for that marker. These ratios were converted into allele counts, and statistical significance was assessed with CLUMP [24], which can analyze multiallelic markers. Individual genotypes of D6S289 were also analyzed with CLUMP. Tests of allelic and haplotypic association and of Hardy-Weinberg equilibrium were performed with HAPLOVIEW 3.32 [21]; genotypic and allelic association analyses were performed with PLINK [25]. Haplotype association of the microsatellite marker D6S289 and SNPs was tested with UNPHASED [26].
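The conversion from pooled peak heights to allele counts described above can be sketched as follows. The text does not say how fractional counts were rounded; this sketch assumes largest-remainder rounding so that counts sum exactly to 2N, and it uses raw peak-height ratios with no correction for stutter or differential amplification. The peak heights in the example are hypothetical.

```python
import math


def pooled_allele_counts(peak_heights, n_individuals):
    """Estimate allele frequencies from pooled peak heights and scale them to 2N counts."""
    total = sum(peak_heights.values())
    freqs = {allele: h / total for allele, h in peak_heights.items()}
    n_alleles = 2 * n_individuals
    raw = {allele: f * n_alleles for allele, f in freqs.items()}
    counts = {allele: math.floor(v) for allele, v in raw.items()}
    # Largest-remainder rounding (an assumption): give the leftover allele copies
    # to the alleles with the largest fractional parts.
    shortfall = n_alleles - sum(counts.values())
    by_remainder = sorted(raw, key=lambda a: raw[a] - counts[a], reverse=True)
    for allele in by_remainder[:shortfall]:
        counts[allele] += 1
    return freqs, counts


if __name__ == "__main__":
    # Hypothetical peak heights (arbitrary fluorescence units) for a 119-person pool.
    freqs, counts = pooled_allele_counts({207: 10.0, 209: 30.0, 211: 60.0}, 119)
    print(counts)  # {207: 24, 209: 71, 211: 143}
```

The resulting case and control count vectors form the 2 x k table that a multiallelic test such as CLUMP takes as input.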

Results
All 400 microsatellites were genotyped, some after repeated PCR and/or electrophoresis. A CLUMP P value < .05 was set as the cut-off for follow-up with individual genotyping. PCR and electrophoresis were repeated for markers meeting this criterion, between 2 and 5 times depending on the quality of the genotypes achieved. Of the 126 loci repeated, only three markers remained significant: D4S403, D6S276, and D6S289.
D6S289 is located on chromosome 6p23 within the schizophrenia linkage locus SCZD3. It lies within JARID2 and immediately adjacent to DTNBP1, a strong candidate gene for SZ. We therefore prioritized it for individual genotyping, which confirmed the predicted association: χ² = 17.89, P = .047. Eleven alleles were observed in total, and three alleles (209, 211, and 213 bp) were more common in cases than in controls (Table 1).
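As a check on the reported statistic, the sketch below recomputes the ordinary chi-square (CLUMP's T1) from the D6S289 allele counts in Table 1 and estimates an empirical P value by randomly reassigning the 476 observed allele copies to cases and controls, which keeps the table margins fixed. This is our own approximation in the spirit of CLUMP's simulation, not CLUMP itself; the number of simulations and the random seed are arbitrary choices, so the simulated P value is approximate and varies slightly with them.

```python
import random

# Allele counts for D6S289 from Table 1 (alleles in bp).
ALLELES = [207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227]
CASES = [24, 20, 20, 18, 62, 64, 22, 6, 1, 0, 1]
CONTROLS = [39, 17, 7, 14, 65, 61, 19, 8, 5, 3, 0]


def chi_square_2xk(row1, row2):
    """Pearson chi-square for a 2 x k table (CLUMP's T1 statistic)."""
    n1, n2 = sum(row1), sum(row2)
    n = n1 + n2
    stat = 0.0
    for a, b in zip(row1, row2):
        col = a + b
        if col == 0:
            continue  # an empty column contributes nothing
        for observed, row_total in ((a, n1), (b, n2)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p(row1, row2, n_sim=10_000, seed=2024):
    """Fraction of label-shuffled tables whose chi-square is >= the observed one."""
    observed = chi_square_2xk(row1, row2)
    # One entry per observed allele copy, recording its column (allele) index.
    copies = [j for j, (a, b) in enumerate(zip(row1, row2)) for _ in range(a + b)]
    n1, k = sum(row1), len(row1)
    col_totals = [a + b for a, b in zip(row1, row2)]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(copies)
        sim1 = [0] * k
        for j in copies[:n1]:  # the first n1 shuffled copies become "cases"
            sim1[j] += 1
        sim2 = [t - s for t, s in zip(col_totals, sim1)]
        if chi_square_2xk(sim1, sim2) >= observed - 1e-9:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    print(round(chi_square_2xk(CASES, CONTROLS), 2))  # 17.89
    print(monte_carlo_p(CASES, CONTROLS))
```

Several cells of the table are very sparse (alleles 223, 225, and 227), which is why a simulated P value is preferable to the asymptotic chi-square distribution with 10 degrees of freedom here.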
SNP genotyping was performed on the samples included in the pools and on additional cases and controls, giving a total of 309 cases and 309 controls. Genotypes of the 14 SNPs in JARID2 and DTNBP1 in the same 90 CEU DNA samples used in the HapMap Project were 100% concordant with HapMap data. Thirteen of the 14 SNPs were in Hardy-Weinberg equilibrium; rs4715984 deviated from it (HWE P = .001). No significant allelic or genotypic association with illness was observed for any single marker in the whole cohort of 309 cases and 309 controls (data not shown).
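Deviation from Hardy-Weinberg equilibrium, as reported for rs4715984, is commonly screened with a one-degree-of-freedom chi-square test that compares observed genotype counts with the expected proportions p², 2pq, and q². The sketch below shows that approximation only; exact HWE tests (as implemented, for example, in PLINK) are preferable for rare alleles, and the genotype counts used here are hypothetical, not the study's data.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """One-df chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts with a heterozygote deficit.
    print(hwe_chi_square(30, 40, 30))  # statistic 4.0, P about 0.046
```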

Discussion
In this study, we first performed a DNA pooling study with 119 SZ cases and 119 healthy controls from an isolated population in China. We covered the entire genome with 400 microsatellite markers, assuming that in an isolated population LD would extend over larger distances, so that some association signals could be detected at this marker spacing. LD in general populations is unlikely to extend beyond 1 Mb [27]. Therefore, genome screens to map disease loci with a case-control design usually require at least 3 000-6 000 highly polymorphic, evenly spaced microsatellite markers. However, when an appropriate isolated population of relatively recent origin is available for association mapping, a few hundred markers could be sufficient to scan the whole genome, provided that LD extends to ≥ 10 Mb [19]. On this basis, we investigated the immigration history of the population of the Shandong peninsula and chose samples from three geographical areas.
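The marker numbers quoted above follow from simple division: if LD is assumed to extend over a distance L, markers spaced roughly L apart are needed to tag the genome. The sketch below makes this back-of-the-envelope reasoning explicit, assuming a haploid genome of roughly 3 000 Mb and ignoring uneven marker spacing, variation in recombination rate, and the conversion between cM and Mb.

```python
import math

GENOME_MB = 3000  # rough haploid human genome length (an assumption for illustration)


def markers_needed(ld_extent_mb, genome_mb=GENOME_MB):
    """Markers required if one marker per LD interval is enough to tag the genome."""
    return math.ceil(genome_mb / ld_extent_mb)


def average_spacing_mb(n_markers, genome_mb=GENOME_MB):
    """Average distance between evenly spaced markers."""
    return genome_mb / n_markers


if __name__ == "__main__":
    for ld in (1.0, 0.5, 10.0):
        print(f"LD over {ld} Mb -> about {markers_needed(ld)} markers")
    print(f"400 markers -> one every {average_spacing_mb(400)} Mb")
```

Under these assumptions, LD extending 0.5 to 1 Mb implies 3 000 to 6 000 markers, whereas LD of ≥ 10 Mb brings the requirement down to a few hundred, which is the premise of the 400-marker screen.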
From the pooling study, we identified a positive signal on chromosome 6 near DTNBP1, and confirmed the association by individual genotyping of the microsatellite marker (χ² = 17.89, P = .047). We then saturated the region with 14 SNPs and typed them individually in an extended sample of 309 cases and 309 controls (including 89 cases and 88 controls from the pools). Two of these SNPs showed association in the original sample included in the pools: rs2235258 (allelic P = .0087) and rs9654600 (allelic P = .046). The association was not replicated in the additional samples from other regions. LD between the microsatellite marker and SNPs in the region extended up to 8 Mb in the relatively isolated Chang Le population, although we do not know the corresponding extent of LD in the other Chinese samples, as they were not genotyped for the microsatellite. The genetic homogeneity of the Chang Le population has been maintained mainly because its economy has long been dominated by agriculture. According to the "Ming Tai Zu Record," this population expanded from a limited number of ancestors who were resettled from Shanxi province by Emperor YZ Zhu during the Ming dynasty (AD 1388). No further immigration occurred in the following 600 years. Relatively isolated populations have produced some promising findings in psychiatric genetics in the past [28], such as for the genes NRG1 [29] and DAOA [30].
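The single-SNP allelic tests reported above compare allele counts between cases and controls in a 2 x 2 table, as PLINK's basic allelic association test does. A minimal sketch of that 1-df chi-square and the allelic odds ratio follows; the function name is our own, and the counts are hypothetical, not the data for rs2235258 or rs9654600.

```python
import math


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """1-df Pearson chi-square, P value and odds ratio for a 2 x 2 allele-count table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom   # no continuity correction
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c) if b * c > 0 else math.inf
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical counts: 89 cases (178 alleles) and 88 controls (176 alleles).
    chi2, p, odds = allelic_test(case_alt=40, case_ref=138, ctrl_alt=22, ctrl_ref=154)
    print(f"chi2 = {chi2:.2f}, P = {p:.4f}, OR = {odds:.2f}")
```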
One weakness of the current study is that only 89 cases from Chang Le were available for SNP genotyping. Such a small sample size could lead to a false positive finding (type I error). However, the possibility of a type I error is reduced because the analysis was carried out in an isolated population with reduced genetic heterogeneity. In addition, the association findings were based on analysis of a gene identified by our hypothesis-free genome-wide screen, in a region previously linked to schizophrenia by many groups.
The association was not present in additional samples from the same peninsula. This could indicate that the genetic structure of the population of the Shandong peninsula is not uniform, and that the relatively homogenous population might be restricted to a limited geographic area (100-150 km according to this study). Our findings suggest that the power to detect association depends not merely on sample size but, more importantly, on the genetic structure of the samples. Therefore, using a heterogeneous population increases the chance of a false negative result (type II error).
Table 1: Allele counts for D6S289, individually genotyped in 119 SZ cases and 119 controls from a homogenous population in the Shandong peninsula of China. Normal chi-square (T1) = 17.89, P = .047; a T1 value at least this large was reached in 4657 of 100 000 simulations in the CLUMP analysis.
Allele (bp)   207  209  211  213  215  217  219  221  223  225  227
Case           24   20   20   18   62   64   22    6    1    0    1
Control        39   17    7   14   65   61   19    8    5    3    0
JARID2 (jumonji, AT rich interactive domain 2) is an ortholog of the mouse jumonji gene, which encodes a nuclear protein essential for mouse embryogenesis, including neural tube formation. Overexpression of mouse jumonji negatively regulates cell proliferation. The jumonji proteins contain a DNA-binding domain, called the AT-rich interaction domain (ARID), and share regions of similarity with human retinoblastoma-binding protein 2 and the human SMCX protein. The first report of JARID2 as a candidate gene for 6p23-linked SZ came from the USA [31]; that study found a significant difference in allele distribution for a tetranucleotide repeat polymorphism in the JARID2 gene that affects an AT-rich domain. Our study suggests that JARID2, rather than DTNBP1, is a candidate gene for 6p22.3-linked SZ. For D6S289, a dinucleotide repeat polymorphism with 11 alleles in JARID2, alleles of 209, 211, and 213 bp were more frequent in SZ cases than in controls from the Chang Le area of the Shandong peninsula. This association was also present for two SNPs within the gene, rs2235258 and rs9654600. Straub et al. described the results of a family-based association analysis on chromosome 6p and first reported genetic variation in the DTNBP1 gene, encoding dysbindin, that was associated with schizophrenia and related phenotypes [32].
The association has been well replicated in several populations, including Caucasian, Chinese (Shanghai, Sichuan), and Japanese samples [33][34][35][36][37][38][39][40][41]. Furthermore, variation in DTNBP1 has been reported to confer susceptibility to schizophrenia through reduced expression [6,42]. However, no disease-causing alleles have been identified so far, and some replication studies have failed to provide positive results [43][44][45][46][47][48].
Including the current one, there have been a total of four studies of DTNBP1 in Chinese SZ samples; three of these samples were recruited from south of the Yangtze River, which is the largest natural barrier in China and, from an anthropological point of view, divides the Chinese population into two parts. In Sichuan samples, a single haplotype comprising rs2619538 and P1583, and another, rare haplotype composed of P1320 and P1757, were significantly associated with SZ [37]. In samples from Shanghai, only a haplotypic association (TGTCA) was observed, with no allelic or genotypic association [49]. No association with SZ was reported in families from Taiwan either [45]. Therefore, the evidence for association of DTNBP1 with SZ in the Chinese population is far from convincing. Nevertheless, the evidence implicating DTNBP1 as a susceptibility gene, reported by so many investigators, cannot be regarded as invalid, and the extremely close proximity of JARID2 and DTNBP1 (within 1553 bp) makes them very likely to be coordinately regulated. DTNBP1 could be a downstream target of the JARID2 transcription factor, or both could be coordinately regulated by the same cis- or trans-acting factors [31]. The SNP genotyping was originally designed to test DTNBP1 association in these Chinese SZ samples; therefore, the 5' end of JARID2 was not analyzed in this study.
6 Journal of Biomedicine and Biotechnology