Polymorphisms in genes involved in the absorption, distribution, metabolism, and excretion of drugs in the Kazakhs of Kazakhstan

Studies of genes involved in the absorption, distribution, metabolism, and excretion (ADME) of drugs are crucial to the development of therapeutics in clinical medicine. Such data may improve our understanding of individual differences in sensitivity or resistance to certain drugs, thereby helping to avoid adverse drug reactions (ADRs) and to improve the quality of therapy. Here, we aimed to analyse single nucleotide polymorphisms (SNPs) involved in the ADME of multiple drugs in Kazakhs from Kazakhstan. A total of 158 SNPs involved in the ADME of various drugs were studied in 320 Kazakh DNA samples using OpenArray genotyping. Of the 158 SNPs, 75 were not detected in either the heterozygous or the homozygous state. Comparative analysis between Kazakhs and other world populations showed a fairly high degree of population differentiation. These results provide further information for pharmacogenetic databases and may contribute to the development of personalized approaches and safer therapies for the Kazakh population. Moreover, these data provide insights into the different racial groups that may have contributed to the Kazakh population.

Hordes); historically, these three groups had demarcated territories. Additionally, there were several tribes in each Zhuz [6]. Every Kazakh knows to which tribe and Zhuz he or she belongs, and representatives of the same tribe are considered relatives as they have descended from a common ancestor. Therefore, according to the seven generations law, marriage between members of the same tribe is only possible after seven generations from a common ancestor [7].
Many genes associated with the ADME of drugs have been identified. A team including representatives of the pharmaceutical industry and an academic centre developed a core list of 32 ADME genes, which includes 184 markers that can be used to screen patients in clinical trials. These data are available on the PharmaADME website (http://pharmaadme.org/).
In this study, we aimed to analyse single nucleotide polymorphisms (SNPs) involved in the ADME of multiple drugs in Kazakhs from Kazakhstan using an OpenArray PGx Panel derived from the PharmaADME Core Marker List.

Allele and genotype frequency analysis
Allele and genotype frequency data were obtained for 158 SNPs (Additional file 1). For 75 of the 158 SNPs, neither heterozygous nor homozygous carriers of the variant allele were found (Additional file 2). The allele and genotype frequencies of the remaining 83 SNPs are summarized in Table 1.
The agreement of the genotype frequency distributions with Hardy-Weinberg equilibrium was assessed using exact tests with a modified version of the Markov-chain random walk algorithm [8].
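The text states only that Hardy-Weinberg equilibrium was assessed with exact tests using a modified Markov-chain random walk algorithm [8]. For a single biallelic SNP, the same exact test can instead be computed by enumerating every possible heterozygote count given the observed allele counts, which avoids Markov-chain sampling altogether. The sketch below takes that enumeration approach; it is a swapped-in technique, not the authors' Markov-chain implementation, and the function name `hwe_exact_p` is hypothetical.

```python
import math


def hwe_exact_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact Hardy-Weinberg test for one biallelic SNP (full enumeration).

    Conditional on the allele counts, the probability of observing `het`
    heterozygotes among n individuals is
        n! * 2**het * n_a! * n_b! / (hom_a! * het! * hom_b! * (2n)!).
    The p-value sums the probabilities of all configurations that are
    no more likely than the observed one.
    """
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab            # copies of allele A
    n_b = 2 * n - n_a                # copies of allele B

    def log_prob(het: int) -> float:
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) + het * math.log(2)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1) - math.lgamma(2 * n + 1))

    # The heterozygote count has the same parity as the minor-allele count
    # and cannot exceed it.
    minor = min(n_a, n_b)
    probs = {het: math.exp(log_prob(het))
             for het in range(minor % 2, minor + 1, 2)}
    p_obs = probs[n_ab]
    # A small relative tolerance keeps ties from being lost to rounding error.
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical genotype counts for one SNP, for illustration only.
print(hwe_exact_p(230, 80, 10))
```

Full enumeration is cheap for biallelic SNPs (at most n/2 + 1 configurations), which is why it is often preferred over sampling-based approaches for this case.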
Next, we compared genotype frequencies between the Kazakh population and world populations using data from the HapMap database. For individuals of African ancestry living in the southwest USA (ASW), only 35 of a total of 56 SNPs were analysed, because the remaining data for this population were not included in the HapMap database. Twenty of these 35 SNPs differed significantly from those in the Kazakh population. These SNPs were located in genes encoding drug transporters (ABCB1, ABCC2, ABCG2, SLC15A2, SLC22A2, SLCO1B1, SLCO1B3, and SLCO2B1) and phase I (DPYD, CYP1A1, and CYP2B6) and phase II (GSTP1, TPMT, UGT2B7, and UGT1A1) drug-metabolizing enzymes. In contrast, no significant differences were found for SNPs in the N-acetyltransferase gene NAT2.
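This section does not state which statistical test was used for the between-population comparisons; it gives only the significance level (0.05). One common option, assumed here purely for illustration, is a chi-square test of homogeneity on a 2 × 3 table of genotype counts. The sketch uses only the standard library: with 2 degrees of freedom the chi-square survival function is exactly exp(-x/2), and with 1 degree of freedom it is erfc(sqrt(x/2)). The function name `genotype_chi2` and the example counts are hypothetical, and HapMap frequencies would first have to be converted to counts using each panel's sample size.

```python
import math


def genotype_chi2(counts_a, counts_b):
    """Chi-square test of homogeneity for two populations' genotype counts.

    counts_a, counts_b: (hom_ref, het, hom_alt) counts for each population.
    Returns (chi-square statistic, degrees of freedom, p-value).
    """
    # Drop genotype classes absent from both populations; they would give
    # zero expected counts and carry no information.
    cols = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    n_a = sum(a for a, _ in cols)
    n_b = sum(b for _, b in cols)
    if n_a == 0 or n_b == 0:
        raise ValueError("each population needs at least one genotyped sample")
    total = n_a + n_b

    chi2 = 0.0
    for a, b in cols:
        col_total = a + b
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * col_total / total
            chi2 += (observed - expected) ** 2 / expected

    df = len(cols) - 1
    if df > 2:
        raise ValueError("expected at most three genotype classes")
    if df == 0:
        p = 1.0
    elif df == 1:
        p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    else:
        p = math.exp(-chi2 / 2)             # chi-square survival function, 2 df
    return chi2, df, p


# Hypothetical genotype counts, for illustration only.
chi2, df, p = genotype_chi2((230, 80, 10), (60, 30, 10))
print(f"chi2={chi2:.2f}, df={df}, p={p:.3g}")
```

With small expected counts (for example, rare homozygotes), an exact test would be the safer choice than this asymptotic approximation.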
For Utah residents with Northern and Western European ancestry from the CEPH collection (CEU), population analysis was carried out for 50 SNPs; 21 of these SNPs showed significant differences compared with the Kazakh population. These SNPs were located in genes encoding drug transporters (ABCB1, ABCC2, SLC22A1, SLCO1B1, SLCO1B3, and SLCO2B1) and phase I (CYP1A1, CYP2C8, CYP2C9, and CYP2C19) and phase II (NAT2, GSTP1, and UGT1A1) drug-metabolizing enzymes. SNPs in genes belonging to solute carrier family 15 (H+/peptide transporter) did not differ between the Kazakh and CEU populations.
Twenty-six of 50 SNPs showed significant differences between the Kazakh population and the Han Chinese population in Beijing, China (CHB). For the Chinese population in Metropolitan Denver, CO (CHD), population analysis was carried out for 34 SNPs; 14 of these SNPs showed significant differences from the Kazakh population. Significant differences were also observed for 24 of 51 SNPs in the Japanese population in Tokyo, Japan (JPT). For the Gujarati Indian population in Houston, TX (GIH), population analysis was carried out for 38 SNPs; 23 of these SNPs showed significant differences from the Kazakh population. Notably, comparisons for rs12720461, rs28371685, rs1058930, and rs3918290 were carried out only for the GIH population because HapMap frequency data were available only for this population. Of these SNPs, only rs3918290 differed significantly from the Kazakh population. A significance level of 0.05 was used for these comparisons.
When the number of significantly different SNPs was expressed as a proportion of the number of SNPs analysed for each population, the YRI population showed the greatest differences from the Kazakh population. However, as with the CEU population, no statistically significant differences were found for SNPs in genes belonging to solute carrier family 15 (H+/peptide transporter).
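The normalisation described above simply divides the number of significantly different SNPs by the number of SNPs analysed for each population. The sketch below applies it to the counts reported in this section for ASW, CEU, CHB, CHD, JPT, and GIH; the YRI counts, which the text identifies as giving the highest proportion, are not stated in this section and are therefore not included, so the ranking shown covers only the listed populations. The dictionary and function names are hypothetical.

```python
# (significantly different SNPs, SNPs analysed) per population, as reported
# in the text; YRI is omitted because its counts are not given here.
COUNTS = {
    "ASW": (20, 35),
    "CEU": (21, 50),
    "CHB": (26, 50),
    "CHD": (14, 34),
    "JPT": (24, 51),
    "GIH": (23, 38),
}


def differentiation_ratios(counts):
    """Proportion of analysed SNPs that differed significantly."""
    return {pop: sig / n for pop, (sig, n) in counts.items()}


ratios = differentiation_ratios(COUNTS)
for pop, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pop}: {r:.2f}")
```

Comparing proportions rather than raw counts matters here because the number of SNPs with HapMap data ranges from 34 (CHD) to 51 (JPT).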
Tag-SNP analysis was also carried out using the aggressive tagging strategy (r² threshold, 0.8; logarithm (base 10) of odds [LOD] threshold, 3.0; minimum distance between tags, 0 kb). The results are shown in Table 4. We found that rs1143672 was a tag-SNP for block 4. Therefore, block 4 was likely formed by four SNPs (rs2293616, rs2257212, rs1143671, and rs1143672) rather than three.
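The r² threshold above refers to the squared correlation between alleles at two SNPs. Haploview's tagger, in aggressive mode, additionally considers multi-marker haplotype tests filtered by the LOD threshold; the sketch below shows only the pairwise criterion, computed from a haplotype frequency and the two allele frequencies, assuming two biallelic SNPs. It is an illustration, not Haploview's implementation, and the function names are hypothetical.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b                        # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def passes_tag_threshold(r2: float, threshold: float = 0.8) -> bool:
    """Pairwise tagging criterion matching the r^2 threshold used in the text."""
    return r2 >= threshold


# Perfect LD: only the AB and ab haplotypes occur (frequencies 0.6 and 0.4).
print(passes_tag_threshold(ld_r2(0.6, 0.6, 0.6)))
# Linkage equilibrium: the haplotype frequency equals the product of allele frequencies.
print(passes_tag_threshold(ld_r2(0.25, 0.5, 0.5)))
```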

Comparative analysis of haplotype frequency
Next, we carried out a comparative analysis of the haplotype frequencies of the samples from the Kazakh population and published data from the HapMap database for 11 worldwide populations. All of the SNPs described in Fig. 1 were used for the analysis; however, not all of these SNPs were present in the HapMap database. Blocks were generated using the default confidence-interval algorithm (Haploview 4.2, MAF < 0.05). Block generation results for all 11 populations are presented in Additional file 3. From these data, only the CEU population formed a block in the NAT2 gene similar to that in the Kazakh population, consisting of rs1041983, rs1801280, rs1799929, rs1799930, and rs1208. The CEU block contained seven haplotypes, whereas the Kazakh block contained only six, and the haplotype frequencies differed (Table 5). The GIH, LWK, MKK, and TSI populations generated blocks consisting of only four SNPs (rs1041983, rs1799929, rs1799930, and rs1208), whereas the MEX and YRI populations generated blocks consisting of three SNPs (rs1041983, rs1799929, and rs1799930). The JPT population generated a block consisting of two SNPs (rs1041983 and rs1799930). No blocks were generated in the ASW, CHB, or CHD populations.
Additionally, the CEU, CHB, JPT, and YRI populations generated blocks in the SLCO1B3 gene similar to that of the Kazakh population, consisting of two SNPs (rs4149117 and rs7311358) (Additional file 3). These populations had four haplotypes that differed in frequency (Fig. 2). The highest frequency of haplotype GA was found in the CEU population (0.852), and the lowest in the YRI population (0.342). The value closest to that in the Kazakh population for haplotype GA (0.758) was found in the CHB population (0.710). The highest and lowest frequencies of haplotype TG were found in the YRI (0.658) and CEU (0.148) populations, respectively.
The value closest to that in the Kazakh population for haplotype TG (0.213) was also found in the CHB population (0.265). The TA haplotype was found only in the JPT (0.038) and CHB (0.025) populations, and the GG haplotype was found only in the Kazakh population (0.030). The remaining populations did not generate blocks in SLCO1B3.
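The comparison above is made one haplotype at a time. One way to summarise it, assumed here and not necessarily the authors' approach, is a single distance between haplotype frequency vectors. The sketch uses the L1 distance (sum of absolute differences) over the SLCO1B3 haplotype frequencies quoted in this section, treating a haplotype not reported for a population as frequency 0; JPT is omitted because its GA and TG frequencies are not given here. The names are hypothetical.

```python
# SLCO1B3 (rs4149117-rs7311358) haplotype frequencies quoted in the text.
KAZAKH = {"GA": 0.758, "TG": 0.213, "GG": 0.030}
OTHERS = {
    "CEU": {"GA": 0.852, "TG": 0.148},
    "CHB": {"GA": 0.710, "TG": 0.265, "TA": 0.025},
    "YRI": {"GA": 0.342, "TG": 0.658},
}


def l1_distance(freqs_a: dict, freqs_b: dict) -> float:
    """Sum of absolute frequency differences; missing haplotypes count as 0."""
    haplotypes = set(freqs_a) | set(freqs_b)
    return sum(abs(freqs_a.get(h, 0.0) - freqs_b.get(h, 0.0)) for h in haplotypes)


def closest_population(reference: dict, others: dict) -> str:
    """Population whose haplotype frequencies are nearest to the reference."""
    return min(others, key=lambda pop: l1_distance(reference, others[pop]))


for pop, freqs in OTHERS.items():
    print(pop, round(l1_distance(KAZAKH, freqs), 3))
print("closest:", closest_population(KAZAKH, OTHERS))
```

Under this metric CHB is nearest, consistent with the haplotype-by-haplotype comparison in the text; a different metric could weight the rare GG and TA haplotypes differently.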
The Kazakh block consisting of rs7662029 and rs7668258 in the UGT2B7 gene was found in all 11 populations (Additional file 3). The highest and lowest frequencies of haplotype GC were found in the YRI (0.824) and CEU (0.490) populations, respectively, and the highest and lowest frequencies of haplotype AT were found in the CEU (0.510) and ASW (0.176) populations, respectively. The GC (0.464) and AT (0.525) haplotype frequencies in the Kazakh population were close to the respective frequencies in the CEU population (Fig. 3).
All 11 populations generated blocks in the SLC15A2 gene (Additional file 3); however, these blocks contained different numbers of SNPs. The CEU, CHB, JPT, and YRI populations generated blocks consisting of four SNPs (rs2293616, rs2257212, rs1143671, and rs1143672), whereas the blocks of the other populations consisted of three SNPs (rs2293616, rs2257212, and rs1143671). The highest and lowest frequencies of haplotype GCC were found in the MEX (0.728) and CEU (0.253) populations, respectively (Fig. 4). The highest and lowest frequencies of haplotype GCCG were found in the CEU (0.540) and JPT (0.233) populations, respectively. The highest frequencies of haplotypes ATT and ATTA were found in the CHD (0.747) and CHB (0.750) populations, respectively, whereas the lowest frequencies were found in the GIH (0.295) and CEU (0.450) populations, respectively.
Fig. 2 Haplotype analysis results of rs4149117 and rs7311358 in the SLCO1B3 gene (chromosome 12)
Taking into account the tagging result for rs1143672 in the Kazakh population and assuming that block 4 consisted of four SNPs, the frequencies of the GCCG and ATTA haplotypes were 0.459 and 0.537, respectively. These values were nearly identical to those of the YRI population.

Discussion
In this study, we examined the frequencies of specific SNPs in the Kazakh population and compared the results with those in the HapMap database for 11 other populations throughout the world. The results showed a fairly high percentage of population differentiation, providing insights into the different racial groups that may have contributed to the Kazakh population.
The Kazakh population is an interesting model in population genetics, and the process through which the Kazakh population formed is poorly understood.
Some researchers believe that the Kazakh population was formed by admixture of Asian and Caucasoid populations [6], based on the observation that some Kazakh individuals have distinctly Asian and/or Caucasoid traits. Additionally, the Kazakh people are divided into three Zhuzes, each further divided into distinct tribes. The historical division into Zhuzes may reflect different origins of each Zhuz, which could produce differences in SNP frequencies within the population. However, in our previous study, which used a larger sample collection, we compared SNP frequencies among the three Zhuzes and found no significant differences between them [7]. We therefore concluded that all samples could be combined into one sample collection.
Genotyping of 158 SNPs in 320 DNA samples showed that the variant alleles of 75 SNPs were not found in the studied samples (Table 1, Additional file 2). Many of these SNPs also have very low frequencies in other populations [10]. However, we cannot conclude that these variants are absent from the Kazakh population; they may be present at frequencies too low to be detected in a sample of this size. In addition, seven of the 83 SNPs identified in the Kazakh population deviated from Hardy-Weinberg equilibrium. We suspect that this result may reflect the limited power of the study.
In this study, we selected SNPs involved in the ADME of drugs for genotyping. Of the 83 SNPs occurring in the Kazakh population, 19 were associated with drugs used in the treatment of cardiovascular diseases (statins, beta-blockers, anticoagulants, and antiplatelet agents). The recommended dosage for the cholesterol-lowering agent simvastatin is 80 mg (U.S. Food and Drug Administration [FDA], www.fda.gov). Moreover, the FDA recommends dose adjustment when simvastatin is used with certain drugs that increase simvastatin concentrations and thereby the risk of myopathy. In patients with the C allele at SNP rs4149056 in the SLCO1B1 gene, myopathy risk is modestly increased even at lower doses of simvastatin (40 mg daily); if optimal efficacy is not achieved with a lower dose, alternative agents should be considered [11]. The TT genotype frequency in our study was 72 % in Kazakhs, compared with 91 %, 71 %, 60 %, and 98 % in the ASW, CHB, TSI, and YRI populations, respectively. Moreover, individual responses to statins are associated with ABCB1 (rs2032582), ABCC2 (rs717620), ABCG2 (rs2231142), SLCO1B1 (rs2306283), CYP2C8 (rs10509681), and CYP2C9 (rs1799853, rs1057910). Comparison of the frequencies of these SNPs between the Kazakh and ASW populations showed significant differences for all SNPs except those in the cytochrome P450 genes. In contrast, for the CEU population, only the SNPs in the cytochrome P450 genes and SLCO1B1 (rs2306283) differed significantly from those in the Kazakh population.
The VKORC1 gene on chromosome 16 is one of the main genes associated with the dosage of coumarin anticoagulants, and several mutations in this gene are associated with enzyme deficiency. An allelic variant in VKORC1 (c.-1639G>A) determines up to 30 % of the variability in warfarin dosage [12,13]. In a previous study, the VKORC1 c.-1639G>A mutation was found to be linked with the VKORC1 c.173+1369G>C (rs8050894) and c.173+1000C>T (rs9934438) mutations [14]. Among both African Americans and Caucasians, subjects carrying the 1173T (rs9934438) allele required a lower maintenance dose of warfarin than subjects with the CC genotype. Before the maintenance dose was reached, only Caucasians with the T allele had a significantly increased risk of an elevated international normalized ratio (INR) compared with Caucasians with the CC genotype. Polymorphisms in the VKORC1 gene are associated with warfarin maintenance dose requirements among both African Americans and Caucasians [15]. Interestingly, the frequency of the G allele of VKORC1 rs8050894 (c.173+1369G>C) is as high as 94 % in Asian populations, compared with about 37 % in Caucasians. In the Kazakh population, we found a G allele frequency of 63 %. Importantly, the response to anticoagulant drugs (e.g., warfarin) is also associated with CYP1A1 (rs1048943) and CYP2C9 (rs1057910, rs28371685, and rs1799853). All of these SNPs except rs28371685 differed significantly between the Kazakh and YRI populations. Most of the corresponding data for other populations were not available in the HapMap database (Table 2).
Cardiovascular disease is often treated with Plavix (clopidogrel). The influence of genetics on the pharmacokinetic and pharmacodynamic responses to clopidogrel has been examined in previous studies [16]. Several polymorphic cytochrome P450 enzymes are involved in the activation of clopidogrel. The CYP2C19 isoenzyme is involved in the formation of both the intermediate metabolite, 2-oxoclopidogrel, and the active metabolite. The pharmacokinetics and antiplatelet effects of the active metabolite, investigated by ex vivo platelet aggregation, vary with CYP2C19 genotype. The CYP2C19*1 allele is associated with normal metabolic function, whereas the CYP2C19*2 and CYP2C19*3 alleles are associated with decreased metabolism. The frequency of the rs4244285 A allele (CYP2C19*2) in our study was 17 % in Kazakhs, compared with 15.5 %, 28 %, and 14 % in the CEU, JPT, and YRI populations, respectively. For rs4986893 (CYP2C19*3), the A allele frequency was 4 % in Kazakhs; no HapMap data were available for the other populations. Other reduced-function alleles (CYP2C19*4, *5, *6, *7, and *8) have been identified; however, these alleles were rare in our population. The response to antiplatelet agents (Plavix) is also associated with ABCB1 (rs2032582), CYP1A1 (rs1048943), CYP1A2 (rs762551), CYP2B6 (rs3745274), CYP2C8 (rs10509681), CYP2C9 (rs1799853), and CYP2C19 (rs12248560). All of these SNPs except rs2032582 differed significantly between the Kazakh and YRI populations, although most data were not available in the HapMap database. No significant differences in ATP-binding cassette (ABC) transporter genes were found between the Kazakh and JPT populations (Table 2).
Labetalol is a nonselective β-adrenergic antagonist with additional α1-adrenergic antagonist properties. CYP2C19 is involved in the metabolism of several important groups of drugs, including a number of β-blockers such as propranolol and labetalol [17]. A previous study showed that labetalol disposition is significantly affected by common CYP2C19 polymorphisms in individuals of Chinese ethnicity: subjects with the CYP2C19*2/*2 (rs4244285) genotype had a higher peak concentration and area under the concentration-time curve than subjects with the CYP2C19*1/*1 genotype, and heterozygotes had intermediate values [18]. In the Kazakh population, the AA genotype was found in 2 % of individuals, compared with 5.2 %, 6.8 %, and 3.4 % of individuals in the CEU, JPT, and YRI populations, respectively.
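As a rough consistency check on these figures (an illustration added here, not an analysis from the study): under Hardy-Weinberg equilibrium, a variant allele frequency q implies an expected homozygote frequency of q². With the 17 % rs4244285 A-allele frequency reported for Kazakhs, q² is about 2.9 %, of the same order as the observed 2 % AA frequency. The function name is hypothetical.

```python
def hwe_genotype_freqs(q: float) -> dict:
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    q: frequency of the variant allele (here, the rs4244285 A allele).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return {"GG": p * p, "GA": 2 * p * q, "AA": q * q}


expected = hwe_genotype_freqs(0.17)
print(f"expected AA frequency: {expected['AA']:.3f}")  # prints: expected AA frequency: 0.029
```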
Responses to β-blockers are associated with ABCB1 (rs1128503) and UGT1A1 (rs4148323 and rs4124874). All of these SNPs differed significantly between the Kazakh and YRI populations, although most data were not available in the HapMap database. No significant differences in ATP-binding cassette or UDP-glucuronosyltransferase genes were observed between the Kazakh and JPT populations. Moreover, SNPs in the UGT1A1 gene did not differ significantly between the Kazakh population and the CHD and TSI populations (Table 2).
SNPs in ABCB1 (rs1045642) and CYP2C19 (rs4244285) are associated with the response to β-blockers, anticoagulants, and antiplatelet agents. Importantly, the frequency of rs1045642 differed significantly between the Kazakh population and the ASW, CEU, GIH, MKK, and YRI populations, and that of rs4244285 differed significantly between the Kazakh population and the CHB and JPT populations (Table 2).
Analysis of haplotype frequencies among the populations examined in this study showed substantial and significant variation. For example, only four populations generated an SLCO1B3 block similar to that of the Kazakh population. Of these, the CHB population had the haplotype frequencies most similar to those of the Kazakh population. However, the haplotypes present differed: the CHB population had the GA, TG, and TA haplotypes, whereas the Kazakh population had the GA, TG, and GG haplotypes. Only eight populations generated blocks in the NAT2 gene, and the analysed SNPs formed 24 haplotypes. From these results, none of the examined populations was similar to the Kazakh population with regard to this gene. However, all 11 populations generated haplotype blocks in the UGT2B7 and SLC15A2 genes; the CEU population had the frequencies closest to those of the Kazakh population for UGT2B7, whereas the YRI population had the closest frequencies for SLC15A2. Thus, for these three genes (UGT2B7, SLC15A2, and SLCO1B3), the Kazakh population showed similarities with three different populations (CEU, YRI, and CHB, respectively). All three of these populations showed significant differences in these three genes.

Conclusion
In summary, our data provide important information for personalised medicine in the Kazakh population and support genotyping of specific SNPs before drug administration, taking the patient's ethnicity into account. The allele frequencies of the studied SNPs in the Kazakh population differed considerably from those of all of the other populations examined. Moreover, we could not classify the Kazakh population as Asian or Caucasian, suggesting that it may have been formed from several populations belonging to different racial groups.
Our study had several limitations. First, the number of samples was small. Second, a comparative analysis of SNP frequencies among the different Zhuzes would help confirm that combining samples from all Zhuzes is acceptable; unfortunately, in this study, individuals could be classified only by nationality, not by Zhuz. In future studies, we plan to increase the number of samples and to examine additional SNPs.

Characteristics of the study populations
A total of 320 individuals living in Astana during 2012-2013 and belonging to the Kazakh nationality participated in this study. All individuals included in the present study were unrelated and randomly selected from different regions of Kazakhstan. The mean (± standard deviation [SD]) age of the participants was 44.06 ± 17.98 years (age range: , and the population included 239 men and