Prediction of the CYP2D6 enzymatic activity based on investigating of the CYP2D6 genotypes around the vivax malaria patients in Yunnan Province, China

In recent years, the incidence rate of vivax malaria recurrence still had 3.1% in Yunnan Province population after eradication therapy using primaquine (PQ). In order to understand the specific failure reasons for preventing vivax malaria relapses, a preliminary exploration on the CYP2D6 enzyme activity was carried out in the vivax malaria patients in Yunnan Province population by analysing mutational polymorphism in the coding region of CYP2D6 gene. Blood samples were collected from vivax malaria patients with suspected relapse (SR) and non-relapsed (NR) malaria in Yunnan Province. The DNA fragments containing 9 exons regions of human CYP2D6 gene were amplified by performing PCR and sequenced. The sequencing results were aligned by using DNAStar 11.0 to obtain the coding DNA sequence (CDS) of CYP2D6 gene. DnaSP 6.11.01 software was used to identify mutant polymorphisms and haplotypes of the CDS chain. The waterfall function of GenVisR package in R was utilized to visualize the mutational landscape. The alleles of CYP2D6 gene were identified according to the criteria prescribed by Human Cytochrome P450 (CYP) Allele Nomenclature Committee Database and the CYP2D6 enzyme activity was predicted based on diploid genotype. A total of 320 maternal CDS chains, including 63 from SR group and 257 from NR group, were obtained. Twelve mutant loci, including c.31 (rs769259), c.100 (rs1065852), c.271 (rs28371703), c.281 (rs28371704), c.294 (rs28371705), c.297 (rs200269944), c.336 (rs1081003), c.408 (rs1058164), c.505 (rs5030865), c.801 (rs28371718), c.886 (rs16947), and c.1,457 (rs1135840) were observed on the 640 CDS chains (including 320 maternal and 320 paternal chains). The high-frequency mutation at rs1135840 (0.703) and low-frequency mutation, such as rs28371703, were detected only in the SR group. The frequency of mutant rs1058164 and rs1135840 were significantly increased in the SR group (x2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${x}^{2}$$\end{document}= 4.468, 5.889, P < 0.05), as opposed to the NR group. Of the 23 haplotypes (from Hap_1 to Hap_23), the nomenclatures of 11 allelic forms could be found: Hap_3 was non-mutant, Hap_2 accounted for the highest frequency (36.9%, 236/640), and Hap_9 had the most complex sequence structure, containing 7 loci mutations. Allele *10 was the most frequent among these genotypes (0.423). Among the allele *10 standard named genotypes, *1/*10, *1/*1 and *2/*10 were significantly more frequent in the NR group (x2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${x}^{2}$$\end{document}= 3.911, P < 0.05) and all showed uncompromised enzyme activity; the impaired genotype *10/*39 was more frequent in the SR group (x2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${x}^{2}$$\end{document}= 10.050, P < 0.05), and genotype *4/*4was detected only in the SR group. In the patients receiving PQ dosage in Yunnan Province population, both rs1135840 single nucleotide polymorphism and *10 allele form was common in the CYP2D6 gene. Low-frequency mutation sites, such as rs28371703, were only presented in patients with vivax malaria relapse.


Background
PQ currently remains the only reliable drug to eliminate the hypnozoites of Plasmodium vivax [1]. However, the production of its active ingredient, 5-hydroxy-primaquine, relies on the catalysis of CYP2D isoenzymes in human hepatocytes [2][3][4]. CYP2D6 is the only monooxygenase in the P450 family that cannot be induced by the environment. CYP2D6 is characterized by low capacity, high catalytic efficiency, and complex heterogeneity; more than 30% of commonly used clinical drugs are catalysed by CYP2D6 [4][5][6]. Previous study has revealed the association between CYP2D6 gene polymorphism and the ineffective potency of PQ in the radical cure of vivax malaria [7,8]. Evidence revealed the notably high frequency of rs1065852 and rs1135840 locus mutations with vivax malaria relapse patients in a Brazilian population [7,9].
Recently, the World Health Organization (WHO) classified the impaired CYP2D6 function as one of the four risk factors for PQ therapy, in addition to G6PD deficiency, pregnancy and/or lactation and infancy [1,13,14]. The WHO further advocated the personalized dosage of PQ therapy for the eradicative treatment of vivax malaria based on acquired understanding of patients' CYP2D6 genetic polymorphism and heterogeneity of enzyme activity [14,15].
Molecular epidemiological investigation on G6PD deficiency in vivax malaria patients has been initiated in Yunnan Province [16,17] to carry out polymorphism analysis of the full coding region of G6PD gene, revealing that G > T (rs72554665) missense mutation at c.1376 locus could be used as the molecular marker for indicating G6PD deficiency. It also found that vivax malaria patients with genotype *4/*4 of CYP2D6 poor enzyme activity were prone to the failure of PQ radical treatment for vivax malaria [7], suggesting that the influence of biological factors may cause obstacles to effectively block the transmission of vivax malaria and effectively eliminate sources of malaria infection in Yunnan Province. The current study further analysed CYP2D6 genotype and its predicted enzyme activity in vivax malaria patients from Yunnan Province population who underwent PQ treatment in order to systematically understand factors that may adversely affect deployment of PQ for radical cure of vivax malaria in Yunnan Province.

Study subjects and blood sampling
All vivax malaria cases included in the current study had been diagnosed and reported by county-level laboratories in Yunnan Province since 2014. The initial diagnosis from county-level laboratories were re-tested by Yunnan Provincial Malaria Diagnostic Reference Laboratory (YPMDRL) using both microscopic examination and genetic testing (Additional file 1) of Plasmodium in peripheral blood [18]. YPMDRL was formally certified as a member of the China Malaria Diagnosis Reference Laboratory Network in 2012 [19], which is composed of provincial microscopists and molecular biology experts. Based on collaborative relationship between YPMDRL and 129 county-level malaria diagnosis laboratories, every malaria case reported from Yunnan Province was accurately identified [20]. During the period, China had implemented a malaria epidemic reporting and review system to ensure the integrity of all malaria cases diagnosed and reported since 2005. The peripheral blood samples from these cases had been collected in YPMDRL since 2014. The 0.6 ml of venous blood collected from every subject was stored in a dried tube at − 80 °C before DNA extraction. Dried blood spot (DBS) sampling was used to help with genetic analysis.
Anti-malarial treatment for malaria cases was carried out in county-level hospitals after completing initial diagnosis. All vivax malaria patients received oral chloroquine therapy (1550 mg in total) within 3 days, followed by the subsequent 8-day course of primaquine therapy (22.5 mg/day) (commonly known as 8-day chloroquine/primaquine therapy). The vivax malaria recurrence cases were found by the uncompromised enzyme activity; the impaired genotype *10/*39 was more frequent in the SR group ( x 2 = 10.050, P < 0.05), and genotype *4/*4was detected only in the SR group.

Conclusion:
In the patients receiving PQ dosage in Yunnan Province population, both rs1135840 single nucleotide polymorphism and *10 allele form was common in the CYP2D6 gene. Low-frequency mutation sites, such as rs28371703, were only presented in patients with vivax malaria relapse.
Keywords: Yunnan Province, Vivax malaria patient, CYP2D6, Genotype, Enzyme activity, Prediction county-level laboratories, too. Everyone received 8-day chloroquine/primaquine therapy was able be identified as recurrence case when his peripheral blood showed Plasmodium infection again after 28 days under no longer had the experience of exposure to malaria [18].
Then, it is necessary that the paired infection strains between previous and recurrence episodes were molecularly identified by YPMDRL after that. The paired strains, between the previous and recurrent infections, were molecularly identified by YPMDRL, and verified as the single clone with homologous or the sibling clones with weak heterologous DNA sequences including both pvcsp gene and pvmsp-1 gene [18,21] (Published in another manuscript: Molecular identification of vivax malaria recurrent episodes in Yunnan Province basing on analysis of the homology between pvcsp genes in different Plasmodium vivax stains), the case with recurrent experience can be confirmed as vivax malaria suspected-relapse.
In order to systematically reveal the heterogeneity of CYP2D6 enzymatic activity in vivax malaria cases in Yunnan Province, the current study adopted a grouping and cluster sampling survey. The vivax malaria patients were divided into suspected-relapse group (SR group) and non-relapse group (NR group). The NR group, which included the cluster patients from 2018 to 2020, had not experienced recurrent symptoms after being clinically cured by 8-day chloroquine/ primaquine therapy. In the SR group, all patients from 2014 to 2020 were included.

Extraction of human genomic DNA and PCR amplification of CYP2D6 gene fragments
The DBS method was used on filter paper dried blood stain (diameter = 5 mm). QIAgen Mini Kit (DNA Mini Kit, QIAamp, Germany) was used to extract human genomic DNA from the blood stain, according to manufacturer's instructions. DNA samples were stored at -20 °C for subsequent assay.
Reference sequence (ID: NC_000022.11) was used as the template to design the PCR primers and to determine the reaction conditions for CYP2D6 gene, according to methods described in previous studies [18,22,23]. The coding regions covering exons 1-4 and exons 5-9 were amplified by segmentation. It is predicted that the amplification regions contain from 42131088 to 42128678 and from 42126035 to 42128422 and amplification products with length of 2411 bp and 2388 bp, respectively. The amplification products were sequenced by Shanghai Meiji Biomedical Technology Co. Ltd, using the Sanger method.

Analysis of CYP2D6 gene coding region polymorphism and prediction of enzyme activity
The sequencing results were collated by using DNAStar 11.0 and BioEdit 7.2.5, and the first coding DNA sequence (CDS) chain of CYP2D6 gene of each sample, known as maternal CDS chain, was obtained by the splicing method [18]. The maternal CDS chain was used to derive the paternal CDS chain by replacing the bases of the unread double allelic heterozygous sites [24], which were identified by searching the DNA sequencing peak chromatograms. DnaSP 6.11.01 software was used to identify site-specific mutations and haplotypes in the CDS chain, to calculate the nucleic acid diversity index (π) and expected heterozygosity (He), which were aligned with referent sequence and determined the SNP ID of each mutation on the National Center for Biotechnology Information (NCBI) platform (https:// www. ncbi. nlm. nih. gov/ snp/). GenVisR package in R (version: 4.0.2) was used to visualize the mutational waterfall plots of the haplotypes of the loci in the allelic form.
The haplotypes (allelic forms) of the CDS chain of CYP2D6 gene were determined in accordance with the criteria of the Human Cytochrome P450 Allele Nomenclature Committee Database (NM_000106.5) [25] and published literature [18,26]. The sub allelic forms can be directly defined by haplotype. Haplotypes devoid of criteria for allelic forms were customized as * + a single lowercase letter [18].
The genotype of each sample was presented in the diploid form consisting of maternal and paternal CDS chains, such as *10/*39, *1/*10 and so forth. The diploid allelic form of the genotype was no longer refined to the sub-allelic level, as the sub-allelic forms should be merged in the first place. For example, *10.001 and *10.002 could be combined into the allelic form *10 to count the frequency.
For the genotypes with standard nomenclature, the CYP2D6 enzyme activity was predicted by calculating the genotype score based on diploid genotypes, following the previously described method [11,27,28]. The genotype score is the sum of two allelic form scores. For instance, a score of 0.75 for genotype *10/*39 is the sum of 0.25 score for allelic form *10, and 0.5 score for allelic form *39. In the event of genotype score being 0.0, the enzyme activity is absent and hence poor metabolizer (PM). The genotype score ranging from 0.25 to 1.0, 1.25 to 2.25, and > 2.25 indicates intermediate metabolizer (IM), normal metabolizer (NM), and ultrarapid metabolize (UM), respectively [11,22,23,[29][30][31]. Meanwhile, the online tool PROVEAN (http:// prove an. jcvi. org/ index. php) was used to predict the function of CYP2D6 enzyme protein of different genotypes as deleterious (DT) or neutral (NT).

Statistics
The frequency of mutation site was presented by the ratio of the number of base substitutions to the total number of CDS chains. Each study subject had two CDS chains, including maternal chain and paternal chain. Chi-square test x 2 was used to analyse the differences in locus mutations and genotypes between the two groups. P < 0.05 was regarded statistically significant.

Sample information and PCR amplification of CYP2D6 gene fragment
A total of 345 blood samples were collected from vivax malaria patients for subsequent PCR amplification of human genomic DNA, which covers exon1-4 and exon5-9 regions of CYP2D6 gene. PCR amplification products showed the target fragment more than 2000 bp bands at gel electrophoresis. After collating the sequencing results of PCR amplification products, all 320 maternal CDS chains of CYP2D6 gene (1491 bp in length) were obtained in 345 subject blood samples (92.8%, 320/345), including 63 SR cases and 257 NR cases.
The blood samples of obtained 320 maternal CDS chains were collected from diagnosed vivax malaria cases in the following prefectures of Yunnan Province: Yingjiang (182), Mangshi (15) The demographic and clinical characteristics of 320 cases are listed in Table 1 and Additional file 2. The male to female ratio was 2.6:1. Most of the vivax malaria patients were imported into Yunnan Province after infection in Myanmar (98.4%, 315/320) and presented one suspected relapsed episode (95.2%, 60/63) ( Table 1).

CDS chain polymorphism and allelic forms of CYP2D6 gene
The polymorphism of 640 CDS chains, 320 maternal and 320 paternal, of CYP2D6 genes revealed base  Table 2).

Discussion
Located in the non-telomeric region of the long arm of human chromosome 22, CYP2D6 gene has 4,383 bp in length and is composed of 8-9 exons and 7-8 introns. The CDS chain of CYP2D6 has 1,338-1,491 bp to encode Table 3 Prediction of CYP2D6 enzyme activity based on genotyping of CDS chain of vivax malaria patients in Yunnan Province a CR is acronym for constituent ratio b TPR is acronym for Test positive rate c Chi-square test (significant level at P < 0.05) d The diploids with at least one undocumented allele (*m, *n, *o, *p, *q,*r, *s,*t, *u, *v, *w and *x) 446-497 amino acids. To be more specific, CYP2D6 gene with 9 exons has an inserted sequence encoding 51 amino acids chains of NVFLARYGPAWREQRRFSVSTL-RNLGLGKKSLEQWVTEEA. ACLCAAFANHSGC in the downstream of exon2, compared with that has 8 exons by aligning sequence [32]. The CYP2D6 genes analysed in this study all have 9 exons, and as high as 92.8% of the maternal CDS chains were obtained. Such a high efficiency of PCR amplification of CYP2D6 gene could be attributable to the absence of Poly-T/A structure in the CYP2D6 gene and the lower degree of DNA polymerase slippage on the template to facilitate the extension of DNA chain.
To date, as many as 300 allelic and sub-allelic variants of CYP2D6 gene have been identified [25]. Although the 12 locus mutations detected in this study at the CYP2D6 locus, such as c.31 (rs769259), c.100 (rs1065852), c.886 (rs16947), and c.1,457 (rs1135840), were all validated SNPs, up to 52.2% (12/23) of the star allelic variants by combination [32] have not been documented or assigned [24][25][26][27]. This could hinder accurate classification of CYP2D6 enzyme activities based on the genotype score of biallelic mutation for prediction. The insensitivity in the previous confirmation criteria of allele [25,26] to the combination of mutant loci obtained by whole gene sequencing is the most probable reason, in addition to the lack of accuracy due to the inclusion of intron SNPs in CYP2D6 gene allele identification.
Intriguingly, low-frequency mutant loci, such as c.271 (rs28371703), c.281 (rs28371704), c.294 (rs28371705), and c.505 (rs5030865), were present only in samples of SR group (Table 2). This study observed that genotype *4.001 and diploid *4/*4indicate null CYP2D6 enzyme function (PM), a conclusion consistent with the findings of Bennett et al. [2], who validated that allele *4 signifies severely impaired CYP2D6 enzyme activity. Although the frequency of homozygous diploid *4/*4 in the population of vivax malaria patients in Yunnan Province was 0.003 (1/320), which is much lower than 0.12 (27/220) in Brazilians [33], and 0.008 (2/250) in Whites [34]. However, Silvino's study showed that the most common CYP2D6 diploid from vivax malaria recurrence patients were predicted as reduced metabolism *2/*4 [34], indicating that allele *4 would have a negative effect on CYP2D6 enzyme activity. Thus, the rational suggestion put forward has been that those mutations at rs28371703, rs28371704, rs28371705 as well as other loci may be applied as the molecular markers of impaired CYP2D6 enzyme activity in clinical practice. Mutant variant rs16947, which was validated as damaged CYP2D6 enzyme activity in a previous study [35], showed a frequency of merely 17.8% in the present study. However, the allelic form *2 and its homozygous diploid *2/*2 did not display preferential distribution in the SR group or the NR group, let alone presaging the phenotype of abnormal enzyme activity (Table 3). This could be attributed to the rigorous scoring and classification of CYP2D6 enzyme activity, which could not reflect the subtle disparities in the genetic structure.
In contrast, the PROVEAN method, by taking every amino acid substitution into the CYP2D6 peptide chain rather than calling the known allelic combination like scoring method preserves the full amount of polymorphic information of the gene structure in its prediction results. Furthermore, in this study, the reduced activity allelic form *10 was detected in 57.8% (185/320) of the samples [36,37], and exhibited the highest frequency (0.423) in the samples. This is consistent with the previous conclusion that allele *10 is more widely distributed in Southeast Asian populations [38], yet the notable preferential distribution of genotype *10/*10 cannot be found, either in the SR group or the NR groups. In contrast, genotype *10/*39 indicates both IM and DT, and was commonly observed in the SR population. It is puzzling that predicted enzyme activity phenotype level of homozygous diploid *39/*39 by PROVEAN method is not lower or the same as genotype *10/*39. The speculated reason may be related to different allelic by combined inappropriately from some loci. For example, both allele *2.001and allele *10.001 were composed of three loci mutations at c.408, c.1,457, c.886 and at c.408, c.1,457, c.100 (Fig. 1), respectively, while the allele score of the former was higher than the latter. In addition to other Southeast Asian countries, the allele *39 and homozygous diploid *39/*39 were found in Yunnan Province, suggesting that allele *39 was not an accidental existence in the local population although the 0.03 frequency of allele *39 was lower than 0.2 in central/ South Asia and East Asia [39]. Such findings remind that during the process of effectively eliminating the infectious source of imported vivax malaria, Yunnan Province should pay attention to the effect of allele *39 existence, which may be one of the biological factors affecting eradication of malaria with PQ. Moreover, although the recent study shows that genotype *10/*39 has been a risk factor affecting PQ efficacy, whether sequencing CYP2D6 gene for PROVEAN assessment or not, it still needs to be selected according to the roles of different laboratories. After all, when there are residual blood samples of cases exposed to PQ, it is no technical obstacle to supply the sequencing analysis of CYP2D6 gene, and it will not bring any negative impact on these cases' health. However, the heterogeneity of CYP2D6 enzymatic activity predicted only by molecular markers cannot be used as the basis for adjusting drug policy on PQ in Yunnan Province. The authors promoted the testing of substrate metabolic activity from different CYP2D6 genotypes according to previous standard [12,30] to observe the phenotype of CYP2D6 enzyme activity.
The radical cure of vivax malaria is a critically important intervention to prevent the re-transmission and repeated attack of malaria [38]. As we know, the study of the CYP2D6 gene allelic polymorphism and its distribution in the PQ-exposed population in Yunnan Province was firstly reported, and the experimental approaches are developed and introduced to press for the accurate screening of population suitable for PQ therapy in Yunnan Province.
This study has certain shortcomings. First, the determination of the recurrence of vivax malaria was somewhat flawed, as the Plasmodium reservoir in bone marrow into peripheral blood [40,41] or the resistance of Plasmodium to PQ [42], despite efforts to validate the genetic consistency of the infected strains of patients with recurrent vivax malaria. Second, the effect of gene copy number on the prediction of CYP2D6 enzyme activity was not ruled out. It was shown that once the normal allele of CYP2D6 gene activity has more than two copies, CYP2D6 enzyme activity will be changed from the phenotype of NM into UM [43, 44]. Third, SNPs or single nucleotide variant (SNV) in the intron region of CYP2D6 gene were not identified nor applied to define the allelic form, hence more undocumented allelic forms of CYP2D6 gene require further validation. In the next stage, the research team will refine and optimize the experimental protocol by making constant adjustment.

Conclusions
In this study, only 12 mutation loci, including c.31, c.100, c.271, c.281, c.294, c.297, c.336, c.408, c.505, c.801, c.886, and c.1,457 were detected in Yunnan Province vivax malaria patients, which is consistent with the findings of a previous small sample. In those patients receiving PQ dosage in Yunnan Province, both rs1135840 SNP and allele form *10 was common in the CYP2D6 gene coding region. Low-frequency mutation sites, such as rs28371703, were presented only in patients with vivax malaria relapse. Genotype *10/*39 could be suitable for indicating impaired CYP2D6 enzyme activity in this population to be treated with PQ.