Exonic Re-Sequencing of the Chromosome 2q24.3 Parkinson’s Disease Locus

Genome-wide association studies (GWAS) in Parkinson’s disease (PD) have identified over 20 genomic regions associated with disease risk. Many of these loci include several candidate genes, making it difficult to pinpoint the causal gene. The locus on chromosome 2q24.3 encompasses three genes: B3GALT1, STK39, and CERS6. To determine whether the causal variants are simple missense changes, we sequenced all 31 exons of these three genes in 187 patients with PD. We identified 13 exonic variants, including four non-synonymous variants and three insertion/deletion variants (indels). These seven variants and rs2102808, the GWAS tag SNP, were genotyped in three independent series consisting of a total of 1976 patients and 1596 controls. Our results show that the seven identified 2q24.3 coding variants are not independently responsible for the GWAS association signal at the locus; however, a haplotype containing both rs2102808 and a 6 bp indel variant in STK39 exon 1 is significantly associated with PD risk (Odds Ratio [OR] = 1.35, 95% CI: 1.11–1.64, P = 0.003). This haplotype is more strongly associated than either of the two variants independently (OR = 1.23, P = 0.005 and OR = 1.10, P = 0.10, respectively). Our findings suggest that the risk variant is likely located in a non-coding region. Additional sequencing of the locus, including promoter and regulatory regions, will be needed to pinpoint the variant responsible for the increased risk of PD at this locus.


Introduction
Parkinson's disease (PD) was not historically considered a genetic disease until in-depth studies of the segregation of genetic variants in families revealed inherited mutations in several genes, such as SNCA, LRRK2, and PARK2 [1,2]. These first discoveries were followed by population-based genome-wide association studies (GWAS) aimed at identifying risk factors for sporadic PD, which represents up to 90% of PD cases [3]. To date, GWAS have nominated over 20 loci influencing the risk of PD [4]. Causal genes have been nominated for a few of these loci (mostly because they overlap with familial PD genes), but the majority of GWAS loci are defined by large regions of linkage disequilibrium (LD) containing several different genes.
The chromosome 2q24.3 locus was associated with increased PD risk in 2011 in a meta-analysis of GWAS published by Nalls et al. [5] and has since been replicated in even larger meta-analyses [6,7]. It has also been independently replicated in Caucasian populations, although the association has not been observed in Asian series of Han Chinese descent [8][9][10][11]. A recent GWAS in Ashkenazi Jewish patients also identified an association signal in the region, although the sample size was limited and significance was not achieved [12]. Serine Threonine Kinase 39 (STK39) has been put forward as the causal gene, but the locus, as defined by Nalls et al., contains two other candidate genes: UDP-Gal:BetaGlcNAc Beta 1,3-Galactosyltransferase, Polypeptide 1 (B3GALT1) and Ceramide Synthase 6 (CERS6). To identify the potential causal variant(s) responsible for the GWAS signal, the region needs to be re-sequenced and fine-mapped.
Variants in coding regions can alter protein structure and function and thus may have a large impact on phenotype. Therefore, in the present study we screened all 31 exons of the B3GALT1, STK39 and CERS6 genes in 187 patients with PD. We identified 13 exonic variants, including four non-synonymous and three insertion/deletion variants. After validation in controls, we genotyped the seven non-synonymous and insertion/deletion variants in three independent series (US, Irish, and Polish) consisting of a total of 1976 patients and 1596 controls. We did not identify a single variant responsible for the risk at the 2q24.3 locus, but we observed a haplotype including an STK39 coding variant that was significantly associated with PD risk.

Study subjects
A total of 1976 patients with clinically diagnosed PD and 1596 controls were included in this case-control study. The patients are all unrelated non-Hispanic Caucasians of European descent. Subjects were from a US series collected at Mayo Clinic's Florida campus (895 patients, 976 controls), an Irish series (368 patients, 368 controls), and a Polish series (713 patients, 252 controls). Characteristics of the subjects in each series are summarized in Table 1. Patients were diagnosed with PD using standard criteria [13]. Controls were individuals free of PD or a related movement disorder at the time of examination. The study was approved by the Mayo Clinic Institutional Review Board, local IRB approvals were obtained from the review boards of the Mater Misericordiae University Hospital (Ireland), the Polish Academy of Sciences, the Medical University of Silesia, Jagiellonian University and the Central Hospital of the Ministry of Interior and Administration (Poland), and all subjects provided written informed consent.
In the screening stage, all exons of B3GALT1 (NM_020981.3, 2 exons), STK39 (NM_013233, 18 exons), and CERS6 (NM_001256126.1, 11 exons) were sequenced in 187 patients with familial late-onset PD from the US series. This subset consists of 129 males (64%), with a mean age of 80.4±8.1 (62-97) years and a mean age at onset of 65.4±8.0 (51-83) years. Non-synonymous variants were then validated (validation stage) by sequencing 376 control samples from the US series (168 males (45%), mean age 67.1±12.3 (29-88) years). Bidirectional sequencing was performed as previously described [14]. In addition, the three insertions/deletions in exon 1 of STK39 were genotyped by fragment sizing: PCR was performed using a fluorescently labeled primer, amplicons were run on an ABI 3730XL DNA sequencer (Applied Biosystems, Foster City, CA, USA), and fragment data were analyzed using GeneMapper 5 software (Life Technologies, Carlsbad, CA, USA). For the replication stage, all samples from the US, Irish, and Polish series (including the aforementioned 187 US PD patients and 376 US controls) were genotyped. The STK39 exon 1 insertion/deletion variants (del6: ss1570217805, ins3: ss1570217817, del21: ss1570217825) were genotyped by fragment sizing as described above, and the other identified variants (rs141683896, rs56031549, rs4496303, rs34110122) as well as the GWAS tag SNP rs2102808 were genotyped using TaqMan Allelic Discrimination Assays on an ABI 7900HT Fast Real-Time PCR system (Applied Biosystems, Foster City, CA, USA); data were analyzed using TaqMan Genotyper Software Version 1.3 (Applied Biosystems, Foster City, CA, USA). Primer sequences and amplification conditions are available upon request. The call rate for sequencing and genotyping was 98% at each stage.
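Because the three STK39 exon 1 indels change the amplicon length by different amounts (-6, +3 and -21 bp), every combination of them on one chromosome yields a distinct fragment size, so sizing can in principle recover both genotype and phase. The sketch below illustrates that decoding step under stated assumptions: the 200 bp wild-type amplicon length, the 1 bp sizing tolerance and the helper names are hypothetical, and it assumes all three indels fall within a single amplicon, which the text does not state explicitly.

```python
from itertools import product

# Hypothetical wild-type amplicon length; the real primer positions are not given.
WT_LENGTH = 200

# Length change contributed by each STK39 exon 1 indel, relative to wild type.
INDEL_SHIFTS = {"del6": -6, "ins3": +3, "del21": -21}


def build_size_table(wt_length=WT_LENGTH):
    """Map the fragment length of every indel combination on one chromosome to its allele label."""
    names = list(INDEL_SHIFTS)
    table = {}
    for present in product([False, True], repeat=len(names)):
        size = wt_length + sum(INDEL_SHIFTS[n] for n, p in zip(names, present) if p)
        # Label in del6-ins3-del21 order, e.g. "del6-wt-del21".
        label = "-".join(n if p else "wt" for n, p in zip(names, present))
        if size in table:
            raise ValueError(f"ambiguous size {size}: {table[size]} vs {label}")
        table[size] = label
    return table


def call_genotype(peak_sizes, table, tolerance=1.0):
    """Assign each observed peak to the allele with the nearest expected size."""
    calls = []
    for peak in peak_sizes:
        nearest = min(table, key=lambda s: abs(s - peak))
        if abs(nearest - peak) > tolerance:
            raise ValueError(f"peak {peak} does not match any expected allele")
        calls.append(table[nearest])
    if len(calls) == 1:  # a single peak is treated as a homozygote
        calls *= 2
    return tuple(sorted(calls))
```

For example, peaks at about 194 and 173 bp would be read as a del6-wt-wt / del6-wt-del21 carrier. Raising an error on duplicate sizes makes the "one size per phased allele" assumption explicit rather than silent.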

Statistical analysis
For stage 2 (validation), chi-square tests were used to compare the frequency of each variant between the 187 PD patients and 376 controls included in that stage. For stage 3 (replication), the association of each variant with PD was evaluated using logistic regression. ORs and 95% confidence intervals (CIs) were estimated, and each variant was considered under an additive model (i.e. the effect of each additional minor allele). Additionally, to evaluate the effect of the coding variants on the GWAS association signal, we fitted logistic regression models for rs2102808 (under an additive model) that adjusted for each coding variant individually and for all of them together. To test the combined effect of alleles, haplotype-based logistic regression analyses were performed on the variants with minor allele frequency (MAF) >1%, considering only haplotypes with a frequency of 1% or greater. All regression models were adjusted for age, gender, and series (combined series only). Where indicated (P corr), P-values were corrected using the Bonferroni correction. P-values of 0.05 or lower were considered statistically significant. All analyses were performed using PLINK v1.7 (http://pngu.mgh.harvard.edu/purcell/plink/) [15].
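For readers less familiar with these models, the sketch below shows two of the building blocks named above: additive coding of a genotype as its minor-allele count, and a Pearson chi-square test (1 df) on a 2x2 table. It is an illustration, not the study's pipeline: it assumes the stage 2 comparison used allele counts rather than carrier counts, the function names are hypothetical, and the counts in the usage note are made up rather than taken from Table 2.

```python
import math


def additive_code(genotype, minor_allele):
    """Additive model: count copies of the minor allele (0, 1 or 2)."""
    return sum(1 for allele in genotype if allele == minor_allele)


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of allele counts."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

For example, with illustrative counts `allelic_chi_square(10, 20, 20, 10)` gives a chi-square of 20/3 (about 6.67) and a P-value just under 0.01.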

Results
We aimed to explain the PD GWAS signal at the chromosome 2q24.3 locus. To detect putative causal PD risk variants, we sequenced all 31 exons of the B3GALT1, STK39, and CERS6 genes in 187 PD cases from our US series. Sequencing of STK39 exon 1 revealed a repeat-rich region containing three in-frame indels: an insertion of three base pairs, a deletion of six base pairs and a deletion of 21 base pairs (see Fig 1). We complemented our sequencing with fragment sizing to fully genotype these exon 1 variants. In total, this screening identified 13 variants, including four non-synonymous changes and three indels (Table 2). One non-synonymous SNP was located in B3GALT1 (exon 2); three indels (exon 1) and one non-synonymous SNP (exon 11) were located in STK39; and two non-synonymous SNPs were located in CERS6 (exons 1 and 5).
We prioritized the non-synonymous and indel variants, as they are more likely to have a functional impact. To validate these variants, we sequenced (and genotyped by fragment sizing) 376 controls from our US series. Although none of the variants differed significantly in frequency between these controls and the aforementioned 187 PD cases (Table 2), odds ratio estimates in this small patient-control group suggested that some may increase the risk of PD. After evaluating our statistical power to detect an association in the replication cohort, we decided to follow up all seven non-synonymous and indel variants.
To assess the role of the coding variants in PD risk at this locus, we genotyped all seven variants in all 1976 patients and 1596 controls from our three series (US, Irish, and Polish) and compared their association signals with that of the GWAS locus tag SNP rs2102808 (chromosome 2, position 169117025, assembly GRCh37.p13). Fluorescent PCR fragment sizing of the indels allowed phasing of the three variants. Four alleles exist at the locus (listed in del6-ins3-del21 order): wt-wt-wt (70.3%), del6-wt-wt (20.9%), del6-wt-del21 (6.44%), and wt-ins3-wt (2.36%). The wt-ins3-wt allele lies on a haplotype with rs2102808 allele G (protective), whereas the del21 allele is more frequently observed with rs2102808 allele T (risk allele). The linkage disequilibrium between the four STK39 variants and rs2102808 is presented in S1 Fig. The results of the single-variant association tests are shown in Table 3. The STK39 ins3 variant, an in-frame insertion of three base pairs in a repeat region of exon 1, showed association with risk of PD in the US series before correction for multiple testing (OR: 1.59, 95% CI: 1.01-2.51, P = 0.046), but this was not seen in the other series or in the large combined series (OR: 1.27, 95% CI: 0.91-1.75, P = 0.16), possibly owing to population heterogeneity. The only variant significantly associated with PD in the combined patient-control series was the GWAS tag SNP rs2102808 (OR: 1.23, 95% CI: 1.06-1.42, P = 0.005, P corr = 0.04). Additionally, in logistic regression analyses adjusting for the 2q24.3 coding variants, the GWAS association signal for rs2102808 was not altered (data not shown).
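The Bonferroni correction multiplies each P-value by the number of tests and caps the result at 1. Assuming the single-variant analysis counted eight tests (the seven coding variants plus rs2102808), an assumption that happens to reproduce the reported P corr = 0.04 for rs2102808, the arithmetic can be sketched as:

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Assumption: eight single-variant tests (seven coding variants plus rs2102808).
N_TESTS = 8

p_corr_rs2102808 = bonferroni(0.005, N_TESTS)  # combined series, rs2102808
p_corr_ins3_us = bonferroni(0.046, N_TESTS)    # US series, STK39 ins3
```

Under the same assumption, the nominal US-series result for ins3 (P = 0.046) corresponds to a corrected P of about 0.37, consistent with it being reported as significant only before correction.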
We next tested whether haplotypes consisting of the GWAS SNP and the 2q24.3 coding variants carried a greater risk of PD than single variants. Results for the three series are shown in Table 4. One haplotype, defined by the rs2102808 minor allele (G>T) and the six base pair deletion in exon 1 of STK39 (del6, CGGGGC>-), was significantly associated with PD in the combined series (OR = 1.35, 95% CI: 1.11-1.64, P = 0.003, P corr = 0.02). This haplotype is more significantly associated than rs2102808 alone in the combined series (P corr = 0.02 for the haplotype versus 0.04 for rs2102808), and its OR suggests that it confers a slightly greater risk of PD than the GWAS variant (1.35 [1.11-1.64] compared with 1.23 [1.06-1.42]).
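Because a Wald 95% CI is symmetric on the log-odds scale, the standard error, z statistic and two-sided P-value can be approximately recovered from a reported OR and its CI, which is a quick consistency check on the estimates above. The sketch assumes Wald intervals (not confirmed for these analyses), and the recovered values are only approximate because the published ORs and limits are rounded to two decimals. The helper name `wald_from_ci` is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_from_ci(odds_ratio, lower, upper, z_crit=Z_975):
    """Approximate SE of log(OR), z statistic and two-sided P from an OR and its 95% CI.

    Assumes a Wald interval: log(lower), log(upper) = log(OR) -/+ z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return se, z, p


# Combined-series estimates reported in the Results.
estimates = {
    "rs2102808": (1.23, 1.06, 1.42),
    "rs2102808 T + del6 haplotype": (1.35, 1.11, 1.64),
}
for label, (or_, lo, hi) in estimates.items():
    se, z, p = wald_from_ci(or_, lo, hi)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, P~{p:.4f}")
```

With these inputs the recovered P-values are roughly 0.0055 for rs2102808 and roughly 0.0026 for the haplotype, close to the reported P = 0.005 and P = 0.003.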

Discussion
Our screening at the PD GWAS locus at 2q24.3 identified a risk haplotype defined by rs2102808 allele T and a six base pair deletion in exon 1 of the STK39 gene. This haplotype is associated with an increased risk of PD (P = 0.003, OR = 1.35), with an estimated effect size greater than that observed when these alleles are tested independently (OR = 1.10 [del6] and 1.23 [rs2102808]). Given that the association is stronger for this haplotype than for either allele individually (P = 0.10 and 0.005, respectively), del6, or an untested variant in LD with it, may contribute to PD risk, although with a small effect size. Our screening did not identify a single common coding variant responsible for the locus association signal, which suggests at least two scenarios: 1) the causal risk factor at the locus consists of several different low-frequency variants and our sample size is too small to detect their individual effects, or 2) the causal variant is located outside the coding region. In the latter case, screening of non-coding regions might identify variants in regulatory elements, such as promoters and enhancers, that modulate gene expression levels. Of interest for the study of PD, BioGPS reports that STK39 mRNA levels measured with the Affymetrix U133 expression microarray are highest in brain regions compared with the other tissues tested, whereas CERS6 mRNA levels are higher in dendritic cells and the pineal gland, and B3GALT1 is expressed ubiquitously [16]. The STK39 protein is a kinase involved in the phosphorylation and activation of Na+-K+-Cl− cotransporters. These transporters are implicated in the depolarizing neuronal response to the neurotransmitters GABA and glycine through changes in the intracellular Cl− concentration [17]. STK39 knockout mice have been shown to have a higher nociceptive threshold, impaired motor function and increased anxiety [18].
Although this study cannot establish the location of the causal variants, STK39 exon 1 is an interesting candidate region in the search for regulatory variants, as it contains many repeat elements. The exon encodes a proline/alanine-rich region (amino acids 12 to 53) called the PAPA box, whose precise function is still unknown [17]. Based on ENCODE ChIP-seq data, the PAPA box is designated as an active promoter region and includes a CTCF binding site [19]. CTCF is a ubiquitously expressed protein that functions as a transcriptional repressor, an activator, or an insulator blocking enhancer activity, and thus influences gene expression [20]. We identified three indels located in the PAPA box, but none of these variants was significantly associated with PD risk in the combined series. The variant on the associated haplotype (del6) is a two-amino-acid deletion with a minor allele frequency of 26%. It lies in a repeat motif (unit: CGGGGC), with the major allele carrying five repeat units and the minor allele four.
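The del6 allele described above can be pictured as the loss of one CGGGGC unit from a five-unit tandem run. The sketch counts uninterrupted copies of the unit and checks that losing one unit removes a whole number of codons, i.e. that the deletion is in frame. The flanking bases ("AAT"/"TTA") and helper names are hypothetical placeholders, not the real STK39 exon 1 sequence.

```python
import re

UNIT = "CGGGGC"  # repeat unit reported for the del6 site


def longest_tandem_run(seq, unit=UNIT):
    """Number of copies in the longest uninterrupted tandem run of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def codons_removed(length_change_bp):
    """Codons removed by an in-frame deletion, or None if the change would shift the frame."""
    return length_change_bp // 3 if length_change_bp % 3 == 0 else None


# Hypothetical flanks around the repeat; not the real exon 1 sequence.
major = "AAT" + UNIT * 5 + "TTA"  # five repeat units (major allele)
minor = "AAT" + UNIT * 4 + "TTA"  # four repeat units (del6 allele)

units_major = longest_tandem_run(major)
units_minor = longest_tandem_run(minor)
deleted_bp = len(major) - len(minor)
```

Here the two alleles differ by 6 bp, which `codons_removed` maps to two codons, matching the two-amino-acid deletion; the same check applied to the 21 bp deletion gives seven codons.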
Resolving the underlying genetic variation at each GWAS locus associated with disease susceptibility is critical to our understanding of not only clinical relevance but also disease mechanisms. This goal is challenging: even for the loci that overlap with known familial PD genes (e.g. SNCA and LRRK2), the functional variants accounting for the GWAS signal have not yet been identified. The exonic portion of the LRRK2 gene, recognized as the most common genetic cause of both familial and sporadic PD, has been extensively studied by our group [21]. Although low-penetrance variants have been identified and confirmed to modestly increase or decrease disease risk, these associations do not explain the GWAS signal at the LRRK2 locus [22]. This most likely reflects functional or regulatory variants located outside the coding region that account for the association signal. The same appears to be true for SNCA, in which no common coding variation has been observed, and it may be a common phenomenon for many other GWAS-nominated loci. If so, additional sequencing studies with larger sample sizes and a focus extended to non-coding regulatory regions will be needed to pinpoint the precise variants responsible for the association signal at the chromosome 2q24.3 locus.

S1 Fig. Linkage disequilibrium between the four STK39 variants and rs2102808. A. Numbers in the squares represent D' (x100) between two variants; squares without a number have D' = 1. A white square represents a LOD score less than 2 and D' less than 1 (low LD); a light blue square represents D' = 1 but a LOD score less than 2. Shades of pink represent D' less than 1 and a LOD score greater than 2, and bright red squares show variants in strong LD (D' = 1 and LOD score greater than 2). B. Shades of grey represent the correlation between variants expressed as r2 (x100). Del6, del21 and rs2102808 are in high LD but have very different minor allele frequencies, hence the high D' and low r2. (PDF)
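The S1 Fig point that high D' can coexist with low r2 can be checked from the phased STK39 exon 1 allele frequencies reported in the Results. Because del21 was only seen on a del6 background, D' between the two is 1 by construction, while their very different frequencies (about 27% versus 6%) keep r2 well below 1. The sketch uses the standard definitions of D' and r2 for two biallelic loci; it works from rounded summary frequencies and is an illustration, not a reproduction of S1 Fig. The helper name `ld_stats` is hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """D, D' and r^2 for two biallelic loci, from the A-B haplotype frequency and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Phased STK39 exon 1 allele frequencies from the Results (del6-ins3-del21 order).
alleles = {
    "wt-wt-wt": 0.703,
    "del6-wt-wt": 0.209,
    "del6-wt-del21": 0.0644,
    "wt-ins3-wt": 0.0236,
}

p_del6 = alleles["del6-wt-wt"] + alleles["del6-wt-del21"]
p_del21 = alleles["del6-wt-del21"]
p_both = alleles["del6-wt-del21"]  # del21 is only observed together with del6

d, d_prime, r2 = ld_stats(p_both, p_del6, p_del21)
```

With these inputs D' is 1 while r2 is only about 0.18, mirroring the high-D', low-r2 pattern described in the S1 Fig legend.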