Selection in the dopamine receptor 2 gene: a candidate SNP study

Dopamine is a major neurotransmitter in the human brain and is associated with various diseases. Schizophrenia, for example, is treated by blocking the dopamine receptors type 2. Shaner, Miller & Mintz (2004) stated that schizophrenia was the low fitness variant of a highly variable mental trait. We therefore explore whether the dopamine receptor 2 gene (DRD2) underwent any selection processes. We acquired genotype data of the 1,000 Genomes Project (phase I), which contains 1,093 individuals from 14 populations. We included single nucleotide polymorphisms (SNPs) at two minor allele frequency (MAF) thresholds in the analysis: MAF > 0.05 and MAF > 0.01. This corresponds to 151 SNPs (MAF > 0.05) and 246 SNPs (MAF > 0.01) for DRD2. We used two different approaches (an outlier approach and a Bayesian approach) to detect loci under selection. The combined results of both approaches yielded nine (MAF > 0.05) and two (MAF > 0.01) candidate SNPs under balancing selection. We also found weak signs of directional selection on DRD2, but in our opinion these were too weak to draw any final conclusions on directional selection in DRD2. All candidates for balancing selection are in the intronic region of the gene and only one (rs12574471) has been mentioned in the literature. Two of our candidate SNPs are located in specific regions of the gene: rs80215768 lies within a promoter flanking region and rs74751335 lies within a transcription factor binding site. We strongly encourage research on our candidate SNPs and their possible effects.

In humans, five different dopamine receptors exist. They are classified into two categories based on their structure and their pharmacological and biochemical properties. The D1-class includes the dopamine receptors 1 and 5, while the D2-class consists of the dopamine receptors 2, 3 and 4 (Andersen et al., 1990; Niznik & Van Tol, 1992; Sibley & Monsma, 1992; Sokoloff et al., 1992; Civelli, Bunzow & Grandy, 1993; Vallone, Picetti & Borrelli, 2000). The focus of our study is on the dopamine receptor 2 and its gene DRD2. The dopamine receptor 2 gene lies on the long arm of chromosome 11 (11q23.1). It spans from 113,280,317 to 113,346,413 for a total of 66,096 base pairs (bp) (information accessed on NCBI in the GRCh37 assembly). For the gene card, see Fig. 1. DRD2 has six introns (Gingrich & Caron, 1993). Alternative splicing of an 87 bp exon between introns 4 and 5 generates two variants of the dopamine receptor 2. The difference between D2S (short) and D2L (long) is a 29-amino-acid-long chain in the third intracellular loop of the protein (Giros et al., 1989; Monsma et al., 1989). While the short form (D2S) is mainly expressed at the presynapse, the long form (D2L) is expressed postsynaptically (Usiello et al., 2000; De Mei et al., 2009). The D2S receptors are mainly autoreceptors, i.e., they reduce the release of dopamine when activated. This leads to an important negative feedback mechanism (Wolf & Roth, 1990; Missale et al., 1998; Sibley, 1999) (as reviewed by Beaulieu & Gainetdinov, 2011).
Among the many single nucleotide polymorphisms (SNPs) of DRD2, one prominent example is rs6277, also known as C957T. It has been associated with schizophrenia in Han Chinese in Taiwan (Glatt et al., 2009), in Russians (Monakhov et al., 2008) and in Bulgarians (Betcheva et al., 2009). Together with the -141C allele, the 957T allele is associated with the diagnosis of anorexia nervosa (Bergen et al., 2005). A meta-analysis showed that the Ser311Cys polymorphism (rs1801028) in DRD2 is a risk factor for schizophrenia.
The heterozygotes (Ser/Cys) and the homozygotes for Cys were both at elevated risk for schizophrenia when compared to the Ser/Ser genotypes (Glatt & Jönsson, 2006). In a study with alcoholic patients and controls, the A allele of rs1076560 was more frequent in alcoholic patients (Sasabe et al., 2007). In 2012, Mileva-Seitz et al. conducted a study with Caucasian mothers and their infants. They taped mother-infant behaviour and genotyped various SNPs of DRD2 and also DRD1. rs1799732 and the previously mentioned rs6277 were both associated with direct vocalization of the mother towards the infant.
The body of literature on SNPs and their possible effects is growing rapidly. Considering the influences those SNPs could have on human behaviour, and bearing in mind the different ecological habitats of Homo sapiens, we explore whether DRD2 underwent any selection processes. In 2004, Shaner, Miller & Mintz proposed that schizophrenia is the low fitness variant of a highly variable mental trait. Based on the connection between dopamine receptor 2 and schizophrenia, as stated above, we focused our analysis on DRD2.
To reduce false positives, we used two selection detection algorithms to explore DRD2. This exploratory ("hypothesis-free") approach is designed to find candidate SNPs that were under selection. Our analysis is based on the 1,000 Genomes Project samples.

MATERIAL AND METHODS
We acquired data from the 1,000 Genomes Project (phase I) through SPSmart engine v5.1.1 (http://spsmart.cesga.es/engines.php; Amigo et al., 2008), using the search term "DRD2." We included all single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) greater than 0.05 (N = 151 SNPs) to include only the more frequently occurring SNPs. To verify our results on a larger set of SNPs that also contains less frequent variants, we repeated the same analysis on a MAF > 0.01 sample (N = 246 SNPs; data presented in Supplemental Information). The structure of the DRD2 gene (113,280,317-113,346,413 in the GRCh37.p13 primary assembly) is shown in Fig. 1. The populations used for our analysis are shown in Table 1.
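The MAF filter described above can be illustrated with a minimal sketch. It assumes biallelic SNPs and allele counts pooled over all individuals; whether the original filter was applied to pooled or per-population frequencies is not stated here. The function names and the example counts are hypothetical and are not taken from the 1,000 Genomes data.

```python
from typing import Dict, Tuple


def minor_allele_frequency(n_ref: int, n_alt: int) -> float:
    """Return the minor allele frequency of a biallelic SNP from its two allele counts."""
    total = n_ref + n_alt
    if total == 0:
        raise ValueError("no alleles observed")
    return min(n_ref, n_alt) / total


def filter_by_maf(allele_counts: Dict[str, Tuple[int, int]], threshold: float) -> Dict[str, float]:
    """Keep only SNPs whose MAF is strictly greater than the threshold."""
    kept = {}
    for snp_id, (n_ref, n_alt) in allele_counts.items():
        maf = minor_allele_frequency(n_ref, n_alt)
        if maf > threshold:
            kept[snp_id] = maf
    return kept


# Hypothetical allele counts for 1,093 individuals (2,186 chromosomes).
example_counts = {
    "snp_common": (1500, 686),  # MAF about 0.31
    "snp_low": (2120, 66),      # MAF about 0.03
    "snp_rare": (2180, 6),      # MAF about 0.003
}

common_sample = filter_by_maf(example_counts, 0.05)    # analogue of the MAF > 0.05 sample
extended_sample = filter_by_maf(example_counts, 0.01)  # analogue of the MAF > 0.01 sample
```

By construction, the MAF > 0.05 sample is a subset of the MAF > 0.01 sample, which is why the second sample can serve as a check on the first.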
Two different programs were used to detect selection; both use FST approaches to detect outliers. The program LOSITAN calculates FDIST, which uses FST and the expected heterozygosity. It assumes an island model of migration with neutral markers. An expected distribution of Wright's inbreeding coefficient is calculated and then outliers are identified. A neutral mean FST was computed by the program before the 50,000 simulations were performed. The infinite alleles model was used. To avoid false positive detection, we set the significance level to p < 0.01 (P(Simulation FST < sample FST)) (Antao et al., 2008).
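To make the two quantities used by the outlier approach concrete, the following sketch computes the expected heterozygosity and a simple Wright-style FST for one biallelic locus from per-population allele frequencies. It is a simplification under stated assumptions (equal population weights, no sample-size correction) and is not the estimator or the simulation procedure implemented in LOSITAN/FDIST; the function names are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) of a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def wright_fst(pop_freqs: Sequence[float]) -> float:
    """Simple FST = (H_T - H_S) / H_T with equally weighted populations.

    H_S is the mean within-population expected heterozygosity,
    H_T the expected heterozygosity at the mean allele frequency.
    """
    if not pop_freqs:
        raise ValueError("at least one population is required")
    h_s = sum(expected_heterozygosity(p) for p in pop_freqs) / len(pop_freqs)
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic locus: no differentiation can be measured
    return (h_t - h_s) / h_t


# Similar frequencies in all populations give an FST near zero,
# strongly diverging frequencies give a high FST.
fst_similar = wright_fst([0.40, 0.42, 0.41])
fst_diverged = wright_fst([0.05, 0.50, 0.95])
```

In an outlier test of this kind, a locus whose FST is unusually low for its heterozygosity is a candidate for balancing selection, while an unusually high FST points towards directional selection.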
BayeScan is a Bayesian statistics program. For every locus it compares two models: one that assumes the locus is under selection and one in which this assumption is dropped. It splits the FST coefficient into two parts: the alpha value is a locus-specific component shared by all populations, and the beta value is a population-specific component shared by all loci. This decomposition is achieved via logistic regression and provides insight into selection. The alpha value serves as an indicator for selection: significant positive values of alpha indicate directional selection, whereas significant negative values indicate balancing selection. The posterior probabilities are estimated using a reversible-jump Markov chain Monte Carlo (MCMC) approach, by counting how many times alpha is included in the model. Before computing the Markov chains, we performed 20 pilot runs with 5,000 iterations each. The initial burn-in was set to 50,000 steps and the chains were run with 5,000 iterations and a thinning interval of 10. The program output consists of a posterior probability, the logarithm (base 10) of the posterior odds and a q value; these three values all refer to the model with selection. Furthermore, the alpha value is reported along with an FST coefficient averaged over all populations per locus. In BayeScan, we used a threshold of a posterior P > 0.99 and a log10(PO) of 2 or higher. This threshold is labelled as "Decisive" by BayeScan (see the program manual at http://cmpg.unibe.ch/software/BayeScan/files/BayeScan2.1_manual.pdf) (Foll & Gaggiotti, 2008).
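The relation between the posterior probability and the posterior odds threshold can be sketched as follows. The sketch assumes that the posterior odds are the ratio P / (1 - P) of the posterior probabilities of the two models; the function names and the example values are hypothetical and do not reproduce BayeScan output.

```python
import math


def log10_posterior_odds(posterior_prob: float) -> float:
    """Return log10 of the posterior odds P / (1 - P) for the model with selection."""
    if not 0.0 < posterior_prob < 1.0:
        raise ValueError("posterior probability must lie strictly between 0 and 1")
    return math.log10(posterior_prob / (1.0 - posterior_prob))


def classify_locus(posterior_prob: float, alpha: float, min_log10_po: float = 2.0) -> str:
    """Label a locus using the log10(PO) threshold and the sign of alpha."""
    if log10_posterior_odds(posterior_prob) < min_log10_po:
        return "neutral"
    if alpha > 0:
        return "directional"  # positive alpha: directional selection
    if alpha < 0:
        return "balancing"    # negative alpha: balancing selection
    return "neutral"


# Hypothetical loci: (posterior probability, alpha).
label_a = classify_locus(0.995, -1.2)  # log10(PO) is about 2.3
label_b = classify_locus(0.995, 0.8)
label_c = classify_locus(0.90, -1.0)   # log10(PO) is about 0.95
```

Note that a log10(PO) of exactly 2 corresponds to a posterior probability of 100/101 (about 0.990), so the two threshold criteria named above nearly coincide.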
To compute linkage disequilibrium (LD) of the SNPs, we used the R package "genetics" (http://cran.r-project.org/web/packages/genetics/genetics.pdf; Warnes et al., 2013). Mueller (2004) states that D′ is particularly useful to assess the probability for historical recombination in a given population and r² is useful in the context of association studies. We therefore primarily calculated D′, but we also calculated r², which is presented in Supplemental Information.
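For readers unfamiliar with the two LD measures, the sketch below computes D, D′ and r² for a pair of biallelic loci. It assumes that the haplotype frequency is already known; estimating haplotype frequencies from unphased genotypes, as LD software has to do, is not covered. The function name and the example frequencies are hypothetical.

```python
from typing import Tuple


def ld_measures(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical pair: the rare allele B always occurs together with allele A.
d, d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
```

In this example D′ is 1 (no evidence of historical recombination between the loci) while r² is only about 0.11, which illustrates why the two measures answer different questions.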
In most populations one or more SNPs had to be excluded to successfully run the computation. The population IBS was excluded entirely from this computation. IBS is a very small population (n = 14), and 30 SNPs caused the computation to fail. For a detailed view on all excluded SNPs, see Table S1.
We accessed information on the gene via NCBI (http://www.ncbi.nlm.nih.gov/) and on the specific SNPs via Ensembl (http://www.ensembl.org/). See Table S2 for the exact results on the SNPs.

RESULTS
The combined results of LOSITAN and BayeScan yielded nine candidate SNPs under balancing selection (MAF > 0.05); see Table 2. Figure 2 shows the graphical output of LOSITAN and Fig. 3 the output of BayeScan.
For a detailed view on the results of LOSITAN and BayeScan for all SNPs, see Table S2 in the supplementary material. Figure 1 provides a gene view of DRD2 with labels for the candidate SNPs.
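The combination rule, keeping only SNPs flagged by both programs, amounts to a set intersection. The following minimal sketch uses placeholder SNP identifiers that are purely hypothetical and are not results of this study.

```python
from typing import Iterable, List


def combine_candidates(lositan_hits: Iterable[str], bayescan_hits: Iterable[str]) -> List[str]:
    """Return the SNP IDs flagged by both methods, sorted for stable output."""
    return sorted(set(lositan_hits) & set(bayescan_hits))


# Hypothetical placeholder IDs for SNPs flagged under balancing selection.
lositan_balancing = ["snp_01", "snp_02", "snp_05"]
bayescan_balancing = ["snp_02", "snp_03", "snp_05"]

candidates = combine_candidates(lositan_balancing, bayescan_balancing)
```

Requiring agreement between two methods with different assumptions is conservative: it can miss true signals found by only one program, but it lowers the chance that a candidate is an artefact of a single algorithm.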
Three SNPs (rs6277, rs12800853, rs11608109) that do not reach significance for directional selection in the MAF > 0.05 sample (Table S3A) do reach significance in the MAF > 0.01 sample (Table S3B). In LOSITAN, they barely reach significance (P < 0.01) based on the MAF > 0.01 sample (Fig. S1 and Table S3B). However, if we apply more stringent detection prerequisites in LOSITAN ("force mean FST" and "neutral mean FST", which increase the computational load but also improve convergence and lower the bias in FST estimation), none of these three SNPs reaches significance (Fig. S2). The results for balancing selection in BayeScan remained nearly unchanged in the MAF > 0.01 sample, with the exception of rs12574471, which did not reach significance (Table S3B).
[Figure legend fragment: squares mark the nine SNPs in the MAF > 0.05 sample that both BayeScan and LOSITAN find; see Table S2.]
All nine SNPs detected based on MAF > 0.05 are intron variants (Fig. 1). Only rs12574471 (3) is mentioned in the literature, where it is noted for lying near a supposed recombination hotspot (Glatt et al., 2009). rs80215768 (4) lies within a promoter flanking region; rs74751335 (7) lies within a transcription factor binding site. Nonetheless, we found no known associations for those two SNPs.
The FST values of these nine loci indicate an overall low genetic differentiation, as well as a low differentiation between populations (Table 2). This is in accordance with balancing selection acting on the gene. The differences in FST values stem from different algorithms used by the programs.
The linkage disequilibrium measurements D′ and r² were used. The heat maps for all 13 analysed populations are shown in the supplementary material (Figs. S3-S15 for D′ and Figs. S16-S28 for r²). The relative positions of the marked SNPs change because different populations had different SNPs excluded (see Table S1 for the list).

DISCUSSION
We found nine SNPs to be candidates for balancing selection based on the MAF > 0.05 sample; two of these were also detected under balancing selection based on the MAF > 0.01 sample. For directional selection, we found no SNP in either the MAF > 0.05 or the MAF > 0.01 sample that is detected by both algorithms (at P < 0.001 in LOSITAN) and that remains significant when more stringent detection criteria are applied in LOSITAN. We therefore conclude that, if directional selection has been acting on DRD2, the signs are too weak to draw definitive conclusions.
Checking all nine SNPs under balancing selection based on the MAF > 0.05 sample with Ensembl reveals that they are all intronic region variants. For rs60599314 (1) and rs79549222 (2), which are detected by LOSITAN and BayeScan in both the MAF > 0.05 sample and the MAF > 0.01 sample, no particular additional information is known. We therefore suggest that these two SNPs may provide interesting candidates for future functional studies.
rs80215768 (4) lies within a promoter flanking region and rs74751335 (7) lies within a transcription factor binding site (TFBS) (both SNPs detected based on the MAF > 0.05 sample). Many studies are available on the possible effects of mutations in such regions (Hayashi, Watanabe & Kawajiri, 1991; In et al., 1997; or for a more general review on the topic, Jaenisch & Bird, 2003). Nonetheless, the SNPs show low FST values, which is congruent with the finding of balancing selection. Sewall Wright's guidelines for interpreting FST values suggest little genetic differentiation in our populations (as cited by Jobling et al., 2013; Chapter 5, Box 5.2). As silent mutations in DRD2 are known to alter the mRNA stability and even the synthesis of the receptor itself (Duan et al., 2003), we call for exploring the possible effects of these SNPs.
Additionally, the levels of the linkage disequilibrium measurement D′ are typical for the respective populations: African populations show a dispersed pattern and no clear LD blocks (Figs. S3, S11, and S15). While the LD blocks are visible in American populations (Figs. S7, S12 and S13), they are not as clear as in Asian (Figs. S5, S6 and S10) or European populations (Figs. S4, S8, S9 and S14). Our candidate SNPs are part of tight LD blocks (D′ > 0.8), which prevents us from making any further interpretations. We also examined the measurement r² for all populations (again, excluded SNPs are listed in Table S1), which revealed no new insight.
The finding of balancing selection suggests that in our sample the minor alleles bear some fitness disadvantage. Note that some individuals are homozygous for the minor allele (0.8-2.3% per SNP, over all populations). Fitness is altered if survival or reproduction of an organism is affected. This raises the possibility of a connection between our candidate SNPs and diseases or malfunctions of dopamine receptor 2. In the list of diseases associated with dopamine (see "Introduction") the most striking example is schizophrenia because dopamine receptor 2 blockers can successfully treat patients.
Although we aim to avoid overhasty conclusions regarding directional selection on DRD2, the three SNPs that BayeScan detects under positive selection, and that LOSITAN finds just below significance, could be of interest. This is particularly the case for rs6277, with its known phenotypic associations: rs6277 has been associated with schizophrenia in Han Chinese in Taiwan (Glatt et al., 2009), in Russians (Monakhov et al., 2008) and in Bulgarians (Betcheva et al., 2009).
Nonetheless, rs6277 was not identified among the 108 schizophrenia-associated loci recently published by the Schizophrenia Working Group of the Psychiatric Genomics Consortium based on 36,989 cases and 113,075 controls (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). DRD2 itself, however, is one of the genes that the Working Group confirmed as being associated with schizophrenia. Thus, even weak signs of directional selection on DRD2 might be important (i) to identify potential new disease-related phenotypic associations, (ii) to speculate on what the "selective force" bringing mutations in DRD2 towards fixation could have been and (iii) to consider what potential condition-related consequences selection on DRD2 could have. The question is whether these conditions affect fitness. Bassett et al. (1996) showed that reproductive fitness is reduced in groups of familial schizophrenia, which suggests a selection process. Puzzlingly, they also found some evidence for an increased fitness of a small subsample of sisters. Shaner, Miller & Mintz (2004) proposed that schizophrenia is the low-fitness variant of a highly variable mental trait. They argue that the persistence of the illness at about 1% globally is too high to be explained by new mutations. Thus, balancing selection in particular would fit this hypothesis very well, and our candidate SNPs under balancing selection could be viable indicators for this.
DRD2 is clearly associated with schizophrenia. Schizophrenia, however, is a "polygenic condition" including genetic loci over the whole human genome (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). Accordingly, the importance of DRD2 should not be over-estimated. Moreover, the method of selection detection does not allow direct inferences about a phenotype (e.g., schizophrenia). Our overall results can serve as a valuable precursor to future studies on the subject.
To untangle the possible effects of our SNPs, we propose a study in which our candidate SNPs are investigated in schizophrenic and non-schizophrenic persons. A simple comparison of the SNPs and the different haplotypes between the two groups should efficiently help assess our findings. If this proposed study finds differences in those two groups, then the mechanisms of those SNPs and their possible haplotypes must be investigated.

CONCLUSION
We found nine candidates for balancing selection on DRD2 but only weak signs of directional selection. Interestingly, rs6277, a SNP known to be associated with schizophrenia, is among those SNPs for which we found weak evidence for directional selection. Some of the SNPs under balancing selection are potentially associated with various diseases. These SNPs could be important as biomarkers due to their very low FST values: the genetic differentiation of one population compared with the whole sample is very small. While all candidate SNPs may be worth exploring, we particularly recommend rs60599314 and rs79549222 (as these were detected under balancing selection based on both the MAF > 0.05 and the MAF > 0.01 sample). We also recommend rs80215768 and rs74751335, found under balancing selection, for further studies on DRD2 because the former lies within a promoter flanking region and the latter lies within a transcription factor binding site.