Characterization of CYP2B6 and CYP2A6 Pharmacogenetic Variation in Sub‐Saharan African Populations

Genetic variation in CYP2B6 and CYP2A6 is known to impact interindividual response to antiretrovirals, nicotine, and bupropion, among other drugs. However, the full catalogue of clinically relevant pharmacogenetic variants in these genes is yet to be established, especially across African populations. This study therefore aimed to characterize the star allele (haplotype) distribution in CYP2B6 and CYP2A6 across diverse and understudied sub‐Saharan African (SSA) populations. We called star alleles from 961 high‐depth full genomes using StellarPGx, Aldy, and PyPGx. In addition, we performed CYP2B6 and CYP2A6 star allele frequency comparisons between SSA and other global biogeographical groups represented in the new 1000 Genomes Project high‐coverage dataset (n = 2,000). This study presents frequency information for star alleles in CYP2B6 (e.g., *6 and *18; frequency of 21–47% and 2–19%, respectively) and CYP2A6 (e.g., *4, *9, and *17; frequency of 0–6%, 3–10%, and 6–20%, respectively), and predicted phenotypes (for CYP2B6), across various African populations. In addition, 50 potentially novel African‐ancestry star alleles were computationally predicted by StellarPGx in CYP2B6 and CYP2A6 combined. For each of these genes, over 4% of the study participants had predicted novel star alleles. Three novel star alleles in CYP2A6 (*54, *55, and *56) and CYP2B6 apiece, and several suballeles were further validated via targeted Single‐Molecule Real‐Time resequencing. Our findings are important for informing the design of comprehensive pharmacogenetic testing platforms, and are highly relevant for personalized medicine strategies, especially relating to antiretroviral medication and smoking cessation treatment in Africa and the African diaspora. More broadly, this study highlights the importance of sampling diverse African ethnolinguistic groups for accurate characterization of the pharmacogene variation landscape across the continent.

Genetic variation in the cytochrome P450 (CYP) supergene family is a major contributor to drug response variability within and between populations. Of the 57 known functional CYP genes, 12 encode enzymes responsible for the metabolism and bioactivation of 70-80% of all clinically prescribed medications. 1,2 In particular, CYP2B6 and CYP2A6 combined are important for the metabolism of over 10% of drugs that have predominantly CYP-mediated pathways, including antiretrovirals (e.g., efavirenz and nevirapine), nicotine, and bupropion, among other substrates. 1,3 The human CYP2B6 and CYP2A6 genes are located on chromosome 19q13 within the large CYP2ABFGST gene cluster. 4 Both of these genes are in close proximity to homologous pseudogenes, CYP2B7 and CYP2A7, respectively. CYP2B6 and CYP2A6 are highly polymorphic, with over 37 and 45 star alleles (haplotypes) catalogued for these genes, respectively, by the Pharmacogene Variation Consortium (https://www.pharmvar.org). Star alleles comprise various combinations of single nucleotide variations (SNVs), small insertions and deletions (indels), and/or structural variants, which include copy number variations and other more complex rearrangements. 5 A number of star alleles, such as CYP2B6*6 (decreased function), CYP2B6*22 (increased function), CYP2A6*4 (gene deletion), and CYP2A6*46 (58 bp 3′-UTR gene conversion), are known to contribute to variability in patient response to the aforementioned medications metabolized by CYP2B6 and CYP2A6, respectively. 6,7 However, the full catalogue of pharmacogenomically relevant star alleles is yet to be determined, especially in African populations, and across other under-represented biogeographical groups. 2,8
African populations have higher genetic diversity than any other global superpopulation. Therefore, the paucity of information on CYP2B6 and CYP2A6 star allele distributions in under-represented African populations poses challenges for effective phenotype prediction 9,10 based on variants or diplotypes determined via various next-generation sequencing (NGS)-based platforms and bioinformatics pipelines. This effectively hampers efforts aimed at optimizing drug therapy adjustments in clinical settings, as there is often variability in measured drug response even within the same genotype-predicted metabolizer phenotype categories. The presence of the neighboring CYP2B7 and CYP2A7 pseudogenes and complex structural variant star alleles makes diplotyping CYP2B6 and CYP2A6 challenging and oftentimes labor-intensive. 8,11 However, the recent availability of high coverage African genomes generated by various international 12 and Africa-based projects, 13 and the development of star allele calling bioinformatics tools 14-16 provide the opportunity to study CYP2B6 and CYP2A6 pharmacogenetic variation across African populations at scale. Furthermore, recent NGS technologies, such as single-molecule real-time (SMRT) sequencing, can facilitate validation of novel star alleles in these genes, as previously applied to other complex pharmacogenes, such as CYP2D6. 17,18 This study therefore aims to extensively characterize the distribution of known and potential novel CYP2B6 and CYP2A6 star alleles, in particular across diverse sub-Saharan African (SSA) populations.
Our findings have significant implications for precision medicine strategies, particularly involving optimizing antiretroviral treatment, smoking cessation drug therapy, and treatment of major depressive disorders, across clinical settings in SSA, Africa at large, and those serving the African diaspora. Furthermore, our star allele distribution comparisons with other global biogeographical groups provide key insights into previously uncharacterized CYP2B6 and CYP2A6 pharmacogenetic variation globally. In addition, this study highlights the use of targeted SMRT sequencing for the validation of CYP2B6 and CYP2A6 star alleles.

Study Highlights
WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?
CYP2B6 and CYP2A6 genetic variation contributes to clinically relevant differences in response to antiretrovirals, nicotine, and bupropion (among other drugs) across individuals and populations. CYP2B6 and CYP2A6 are therefore important genes in clinical pharmacogenetic implementation initiatives globally. Current catalogues of CYP2B6 and CYP2A6 star alleles (haplotypes) are incomplete in part due to the high polymorphism in these genes and difficulty in interrogating their genomic loci.

WHAT QUESTION DID THIS STUDY ADDRESS?
To date, the proportion of individuals with African ancestry has been relatively low across pharmacogenetic studies focused on CYP2B6, CYP2A6, and other key pharmacogenes. In particular, continental African populations have been underrepresented. This study addresses the paucity of information about the distribution of known star alleles in CYP2B6 and CYP2A6 across diverse African populations, and also highlights novel African-ancestry star alleles, with a view toward enabling relevant precision medicine strategies in Africa.

WHAT DOES THIS STUDY ADD TO OUR KNOWLEDGE?
This study highlights the varying distributions of known and novel CYP2B6 and CYP2A6 star alleles, and predicted metabolizer phenotypes (for CYP2B6) across previously under-represented African populations, and in comparison with global populations. For each of the two pharmacogenes, over 4% of the sub-Saharan African participants had predicted novel star alleles. Predicted novel CYP2B6 and CYP2A6 star alleles from our comparative analysis involving other global biogeographical groups are also provided. Furthermore, this study exemplifies the utility of high coverage whole genome sequence data and validated bioinformatics algorithms in catalyzing the investigation of haplotypes in hypervariable pharmacogenes, such as CYP2B6 and CYP2A6. In addition, this is one of the first studies to demonstrate the use of targeted high-fidelity single-molecule real-time sequencing for characterizing novel CYP2B6 and CYP2A6 star alleles.

HOW MIGHT THIS CHANGE CLINICAL PHARMACOLOGY OR TRANSLATIONAL SCIENCE?
This study highlights the need for clinical pharmacogenetics implementation strategies across Africa for substrates such as antiretrovirals, nicotine, and bupropion. Moreover, our findings (such as the relatively high number of novel star alleles) emphasize potential pitfalls in transferability of CYP2B6 and CYP2A6 phenotype prediction strategies based predominantly on European-ancestry populations. Therefore, pharmacogenomic studies and relevant variant functional impact assays involving more understudied populations, particularly in Africa, are warranted to inform effective drug efficacy and safety optimization across Africa, the African diaspora, and other global settings.

Study population and whole genome sequence data sources
SSA populations represented in this study and the primary data sources are summarized in Table 1. We analyzed 961 SSA whole genome sequence (WGS) samples in total. This included 272 genomes that were generated as part of the Human Heredity and Health in Africa (H3Africa)-Baylor dataset 13,19 (see Table 1 for details), 100 genomes of south-eastern Bantu speakers (SEB) that are part of the Africa Wits-INDEPTH Partnership for Genomics Research (AWI-Gen) project, 20 40 South African genomes generated by the Cell Biology Research Laboratory (CBRL; 39 of African ancestry and one of mixed ancestry), 15 of the 24 genomes (7 excluded due to recent admixture) generated by the Southern African Human Genome Programme (SAHGP), 21 31 African genomes generated by the Simons Genome Diversity Project, 22 and 504 continental African genomes generated by the 1000 Genomes Project. 12 In addition, we analyzed the rest (n = 2,000) of the 1000 Genomes Project high-coverage WGS samples, including African American/Afro-Caribbean participants (n = 157), and participants of European (n = 503), East Asian (n = 504), South Asian (n = 489), and Admixed American (n = 347) ancestry, in order to compare the CYP2B6 and CYP2A6 star allele distribution in Africa with that in global populations. All the genomes analyzed in this study were sequenced on Illumina platforms to a minimum depth of 30× by the primary research projects, and they were aligned to the GRCh38 reference genome.

DNA samples for star allele validation
Genomic DNA from 192 study participants was used during our long-read-based CYP2B6 and CYP2A6 star allele validation under ethics amendment terms in protocol M200993. This included DNA samples from the CBRL South African participants, the SEB AWI-Gen participants, and aliquots of samples (from Ghana and Burkina Faso participants) provided by AWI-Gen to H3Africa-Baylor for high coverage sequencing.

Star allele analysis
CYP2B6 and CYP2A6 star alleles were called from WGS datasets using three separate tools: StellarPGx version 1.2.6, 14 Aldy version 4.4, 16 and PyPGx version 0.19.0 23 (successor to Stargazer 15), in order to minimize the possibility of false-negative calls. Binary Alignment Map files were provided as input to StellarPGx and Aldy, which perform combinatorial-based diplotype assignment. For PyPGx, we supplied Variant Call Format files and depth of coverage files generated using the in-built create-input-vcf and prepare-depth-of-coverage scripts. 23 The 1000 Genomes Reference Panel was used for PyPGx's statistical phasing. Consensus diplotype calls were determined by considering star alleles called by at least two of the algorithms. However, for samples where complex structural variants (such as CYP2A6*46, which is defined by a 58 bp 3′-UTR conversion to CYP2A7) were called by at least one tool, we performed manual visual inspections of the read coverage using the Integrative Genomics Viewer, 24 in addition to considering copy number and allele fraction profile plots output by PyPGx.
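
As an illustration only, the two-of-three consensus rule described above might be sketched in Python as follows. The function names, the dictionary layout, and the choice to compare whole diplotypes (rather than individual star alleles) are assumptions made for this sketch and are not taken from the study's pipeline; samples with structural variant calls would still require the manual inspection described above.

```python
from collections import Counter
from typing import Optional


def normalize(diplotype: str) -> str:
    """Order the two haplotypes so that '*6/*1' and '*1/*6' compare equal."""
    return "/".join(sorted(diplotype.strip().split("/")))


def consensus_diplotype(calls: dict, min_support: int = 2) -> Optional[str]:
    """Return the diplotype reported by at least `min_support` tools.

    `calls` maps a tool name to its diplotype call, or to None when the
    tool produced no call. Returns None when no diplotype has enough
    support (hypothetical convention for "needs manual review").
    """
    counts = Counter(normalize(call) for call in calls.values() if call)
    if not counts:
        return None
    diplotype, support = counts.most_common(1)[0]
    return diplotype if support >= min_support else None


# Hypothetical calls for one sample; two of three tools agree.
example = {"StellarPGx": "*1/*6", "Aldy": "*6/*1", "PyPGx": "*1/*18"}
print(consensus_diplotype(example))
```

Returning None rather than picking one tool's call keeps discordant samples visible for follow-up instead of silently resolving them.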

Metabolizer phenotype prediction
The CYP2B6 consensus diplotype calls in this study were translated to CYP2B6 metabolizer phenotypes based on the Clinical Pharmacogenetics Implementation Consortium guidelines for efavirenz 9 as there were no corresponding guidelines for other CYP2B6-drug pairs at the time of this study. Participants with potentially novel star alleles were assigned an indeterminate metabolizer status. CYP2A6 phenotypes were not predicted in this study (see Discussion).
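
A minimal sketch of this diplotype-to-phenotype translation is given below. The allele function table contains only the star alleles whose function is stated in this article, and the combination rules reflect our reading of the efavirenz guideline; both should be checked against the current guideline tables before any real use. The names ALLELE_FUNCTION and predict_phenotype are hypothetical.

```python
# Allele functions as stated in this article (deliberately incomplete).
ALLELE_FUNCTION = {
    "*1": "normal", "*2": "normal", "*5": "normal", "*17": "normal",
    "*6": "decreased",
    "*18": "none",
    "*4": "increased", "*22": "increased",
}


def predict_phenotype(diplotype: str) -> str:
    """Translate a CYP2B6 diplotype such as '*1/*6' to a metabolizer phenotype.

    Any allele missing from ALLELE_FUNCTION (for example a novel allele)
    yields an indeterminate status.
    """
    functions = [ALLELE_FUNCTION.get(allele) for allele in diplotype.split("/")]
    if len(functions) != 2 or None in functions:
        return "Indeterminate"
    reduced = sum(f in ("decreased", "none") for f in functions)
    increased = functions.count("increased")
    if reduced == 2:
        return "Poor metabolizer"
    if reduced == 1:
        # One decreased/no function allele plus a normal or increased allele.
        return "Intermediate metabolizer"
    if increased == 2:
        return "Ultrarapid metabolizer"
    if increased == 1:
        return "Rapid metabolizer"
    return "Normal metabolizer"


for d in ("*1/*1", "*1/*6", "*6/*18", "*1/*22", "*1/*novel"):
    print(d, "->", predict_phenotype(d))
```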

CYP2B6 and CYP2A6 long-range PCR
As CYP2B6 is a relatively large gene (~27 kb), multiple long-range PCR (XL-PCR) fragments were generated to cover various regions (see Supplementary Material S1). CYP2B6 Frag1 (~9 kb) stretched from the CYP2B6 upstream region to intron 1; Frag3 (~8.5 kb) covered exon 2 to exon 6; Frag4 (~10.3 kb) covered exon 4 to exon 9; and Frag5 (~7.1 kb) covered exon 8 to the CYP2B6 downstream region. We ran multiple PCR optimizations to amplify Frag2 (~13 kb, exon 1 to exon 3) but none were successful. The XL-PCR primers and cycling conditions for each fragment are detailed in Supplementary Material S1.
For CYP2A6, 9.2 kb-long amplicons (FragA) that comprise the CYP2A6 gene as well as upstream and downstream non-coding regions were generated following the XL-PCR protocols described by Wassenaar et al. 11 with some modifications. The XL-PCR forward and reverse primers used and PCR cycling conditions are detailed in Supplementary Material S1. We used previously published primers 25,26 to ascertain the presence of CYP2A6 gene duplication(s). See Supplementary Material S1 for depictions of the CYP2A6 XL-PCR fragments targeted in this study.

Amplicon pooling and barcoding
Amplicon pooling, barcoding, and the subsequent high fidelity (HiFi) sequencing were performed by Inqaba Biotech. Quality control of the PCR products from the first-round PCR was carried out using a 0.8% agarose gel for visual inspections and the Agilent 4200 TapeStation (Diagnostech, Johannesburg, South Africa) for quantification using the D5000 ScreenTape kit. For each participant, CYP2B6 amplicons were pooled equimolarly with CYP2A6 amplicons, and also with CYP2D6 amplicons from the related study. 17 All amplicons were in the size range of 5-12 kb. The amplicon pools were purified using AMPure PB bead purification (Pacific Biosciences, California, USA). Thereafter, barcodes were added to the purified amplicons via a second round of PCR. The 25 μL reaction mix contained 5-10 ng/μL of initial pooled PCR product, 11 μL of 2X LongAmp Taq ReadyMix (New England Biolabs, USA), 1 μL of DMSO (100%), and 5.0 μL of 2 μM barcoded universal primers (Inqaba Biotech). The PCR cycling conditions were as follows: 20 cycles of 95°C for 1 minute, 65°C for 30 seconds, and 72°C for 11 minutes.
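
For readers unfamiliar with equimolar pooling, the arithmetic can be sketched as follows. This is not the service provider's protocol: it assumes the commonly used approximation of 660 g/mol per base pair of double-stranded DNA, and the concentrations and target amount in the example are invented for illustration. Only the fragment lengths are taken from the text.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation)


def molarity_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration in ng/uL to nM (equivalently fmol/uL)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(amplicons: dict, fmol_each: float) -> dict:
    """Volume (uL) of each amplicon needed to contribute `fmol_each` fmol.

    `amplicons` maps a fragment name to (concentration in ng/uL, length in bp).
    """
    return {
        name: fmol_each / molarity_nM(conc, length)
        for name, (conc, length) in amplicons.items()
    }


# Fragment lengths from the text; concentrations are hypothetical.
pool = {
    "CYP2B6_Frag1": (20.0, 9000),
    "CYP2B6_Frag5": (20.0, 7100),
    "CYP2A6_FragA": (20.0, 9200),
}
for name, volume in equimolar_volumes(pool, fmol_each=10.0).items():
    print(f"{name}: {volume:.2f} uL")
```

At equal mass concentration, longer fragments contain fewer molecules per microliter, so they require proportionally larger volumes to reach the same molar amount.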

Single-molecule real-time sequencing
SMRTbell libraries were constructed from ~500 ng of pooled barcoded fragments by following standard end-repair, adapter ligation, and purification strategies detailed in the Pacific Biosciences protocols (https://www.pacb.com/wp-content/uploads/Procedure-Checklist-Preparing-HiFi-SMRTbell-Libraries-using-SMRTbell-Express-Template-Prep-Kit-2.0.pdf). Annealing and binding of SMRTbell templates was performed using the Sequel II Binding kit 2.2 and sequencing primer version 5, and Circular Consensus Sequencing was performed for a movie time of 30 hours to generate HiFi reads via the SMRT Link software on the Sequel IIe instrument (Pacific Biosciences) at Inqaba Biotech. The star allele definitions followed in this study are according to PharmVar v5.

ARTICLE
Raw HiFi sequence data were demultiplexed and processed into individual samples according to the corresponding barcode sequences using the NGSutils NGS data analysis software kit. 27 CYP2B6 and CYP2A6 HiFi reads were aligned to the corresponding regions in GRCh38, and also to the NG_007929.1 and NG_008377.1 reference sequences, respectively, using pbmm2 v1.7.0 (https://github.com/PacificBiosciences/pbmm2). Variant calling was done using DeepVariant. 28 Thereafter, variant phasing and read haplotagging were carried out for samples containing more than one heterozygous variant using WhatsHap. 29

Variant functional prediction
CYP2B6 and CYP2A6 variants were annotated using the Ensembl Variant Effect Predictor (VEP). 30 The functional effects (corresponding to the NM_000767.5 and NM_000762.6 transcripts) of potential novel star allele-defining variants on these genes were predicted using VEP plugins including SIFT, 31 PolyPhen-2, 32 CADD, 33 LRT, 34 PROVEAN, 35 and VEST4, 36 taking into account the absorption, distribution, metabolism, and excretion (ADME)-optimized parameters suggested by Zhou et al. 37 SIFT Indel 38 was used to annotate frameshift variants while LOFTEE 39 was used to identify loss of function variation.

Statistical analysis
Star allele frequencies were summarized using percentages. Deviations from Hardy-Weinberg equilibrium were investigated using the genetics package (https://www.rdocumentation.org/packages/genetics/versions/1.3.8.1.3) in R version 4.1.3 (https://www.r-project.org). Fisher's exact test was used to determine significant differences in population CYP2B6 and CYP2A6 star allele frequencies. Any P values of < 0.05 were considered statistically significant.
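
The two tests named above can be illustrated with a short, dependency-free Python sketch. This is not the R code used in the study: the Hardy-Weinberg check shown here is a simple biallelic chi-square test with one degree of freedom (the text does not state which test variant was applied), and the function names are hypothetical. For Fisher's exact test, the 2 × 2 table would hold counts of a given star allele versus all other alleles in two populations.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p_x = prob(x)
        if p_x <= p_obs * (1 + 1e-9):  # tolerance for floating-point ties
            p_value += p_x
    return min(1.0, p_value)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic site.

    Returns (chi-square statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical allele counts: [allele, other alleles] in two populations.
print(fisher_exact_two_sided(12, 188, 30, 170))
print(hwe_chi_square(60, 30, 10))
```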

Ethics statement
This study was approved by the Human Research Ethics Committee (Medical) of the University of the Witwatersrand under protocol numbers M190631 and M200993. We performed secondary analysis of full genomes generated by contributing studies/centers based across Africa (each of which obtained local ethics approval), and further supplemented this with analysis of data from public repositories.

CYP2B6 star allele frequencies
Among the normal function CYP2B6 star alleles, CYP2B6*1 and *2 were the most frequent in SSA followed by *17 (Table 2). However, CYP2B6*17 was not observed among the Berom in Nigeria and the Fon in Benin, whereas CYP2B6*2 was not observed among the participants from Cameroon (Table 3). We found CYP2B6*5 to be rare in SSA compared with frequencies among European, admixed American, and South Asian participants (Table 2).
Among the known no function CYP2B6 star alleles, only CYP2B6*18, which is defined by rs28399499 (I328T), was present in SSA (allele frequency [AF] = 9.5%). The frequency of *18 was lower in African American/Afro-Caribbean participants (AF = 7.5%) but this difference was not statistically significant (P = 0.3). In comparison, CYP2B6*18 was found to be virtually absent among the European, East Asian, and South Asian participants represented in the 1000 Genomes Project dataset (Table 2). The frequency of *18 varied across the SSA populations, ranging from 2.2% among the Berom in Nigeria to 18.8% among the participants from Cameroon (Table 3).
With regard to the increased function CYP2B6 alleles, CYP2B6*4, defined by rs2279343 (K262R) without rs3745274 (Q172H) in phase, was found in only one participant (among GWD) across SSA. Conversely, CYP2B6*22 (defined by rs34223104) was more common and present in all SSA populations included in this study (combined AF = 1.1%), except for the Fon in Benin, Berom in Nigeria, Bantu speakers from Zambia, and Ghanaian participants.

CYP2A6 star allele frequencies
The frequencies of established CYP2A6 star alleles are summarized in Table 2 for SSA and other global populations (for comparison) represented in this study. CYP2A6*1 (normal function star allele) was observed at the highest frequency across the majority of the populations in this study.
CYP2A6*46 (formerly *1B; defined by a 58 bp gene conversion to CYP2A7 in the 3′-UTR) was observed at a frequency of 6.1% in SSA. Conversely, this star allele was observed at significantly higher frequencies in European, East Asian, and South Asian populations (Table 2). The frequency of CYP2A6*46 varied across SSA populations in the study, ranging from 1.5% among the Luhya in Webuye (Kenya) to 11% among the Fon in Benin (Table 3).
Among the previously characterized duplication star alleles, we detected CYP2A6*1x2 in this study. This star allele was present at a frequency of 0.8% in SSA, which was less than the frequency among African American/Afro-Caribbean participants (2.6%; P = 0.01), comparable to the frequency in European, admixed American, and South Asian participants, but absent from the East Asian populations in this study (Table 2). Among the SSA participants, CYP2A6*1x2 was most frequent among the participants from Ghana (AF = 1.9%) and the Mende in Sierra Leone (AF = 1.8%; Table 3).
Among the relatively large number of CYP2A6 star alleles that cause decreased CYP2A6 expression and/or activity, *9 (defined by the rs28399433 variant in the TATA box) and *17 (rs28399454, V365M) were the most frequent across SSA (Table 2), but observed at non-uniform frequency distributions (Table 3). The frequency of the CYP2A6*9 haplotype in SSA (7.5%) was comparable to that in other biogeographical groups, except in the East Asian populations (AF = 19%; Table 2). In contrast, CYP2A6*17 was largely African-specific as it was observed at a frequency of 11% in SSA and 12.3% in African American/Afro-Caribbean participants, but absent among the European, East Asian, and South Asian participants (Tables 2, 3).
We observed CYP2A6*4 (CYP2A6 gene deletion) at a frequency of 3.1% in SSA, which was similar to the *4 frequency in the African American/Afro-Caribbean participants and South Asian participants. However, CYP2A6*4 was less frequent among the European populations and highest among the East Asian populations (Table 2). Among the SSA participants in this study, CYP2A6*4 was most frequent among the Berom in Nigeria and Bantu speakers from Zambia (AF = 6.1%), but it was not observed among the participants from Ghana.

Table 4 footnotes: b HiFi SMRT sequencing data later revealed an exon 3/intron 3 conversion to CYP2A7 in phase with this core variant (see Figure 3). This core variant may occur as part of the exon 3/intron 3 conversion to CYP2A7. d The duplicated form of CYP2A6*56 is yet to be fully validated.

For haplotype 11 (Table 4, Figure 2), HiFi data from a South African participant confirmed that rs142421637 (NG_007929.1:g.17856C>T, R109W) was in phase with CYP2B6*6-defining variants. For the other participants that have been successfully resequenced so far, the star alleles predicted from the short-read WGS data were concordant with the ones identified from the long-read data. Moreover, we also inferred phasing information for suballeles from the HiFi data mainly from exons 2-9.
From the targeted SMRT sequencing of the CYP2A6 XL-PCR fragments, we further characterized CYP2A6*54, which was identified in 2 SEB South African participants. This haplotype has rs558145012 (NG_008377.1:g.5128G>T, G36V) on a *46 background (see haplotype 1 in Table 4, Figure 3). In addition to rs558145012, we observed 4 other missense variants (Figure 3) on this haplotype arising due to a potential partial exon 3/intron 3 gene conversion (NG_008377.1:g.6798-6846 bp). The second CYP2A6 haplotype for this participant was a novel *17 suballele (Figure 3). The second novel major CYP2A6 star allele validated in this study is CYP2A6*55, defined by rs114558780 (NG_008377.1:g.10126C>T, T378I) on a *1 background (see haplotype 13 in Table 4, Figure 3). The CYP2A6*55 suballele depicted in Figure 3 is from a South African participant. This haplotype (allele count = 13) was also identified in participants from Botswana, Nigeria, Congo, and the Gambia (Table 4), which adds another layer of validation in terms of its occurrence and definition. In addition, HiFi data from a South African participant enabled validation of CYP2A6*56, defined by rs113558392 (NG_008377.1:g.9394T>G, V292G) on a *46 background (see haplotypes 16 and 17 in Table 4, Figure 3). The WGS data for this participant indicated the presence of a duplication of *56; however, the duplicated allele was not successfully amplified for SMRT sequencing in this study. The potentially novel CYP2A6*56 duplication had a frequency of 0.5% (allele count = 10) in SSA based on all the WGS datasets in this study.
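
As a worked check of how such frequencies relate to allele counts, each participant contributes two haplotypes, so an allele count of 10 among the 961 SSA participants corresponds to 10/(2 × 961) ≈ 0.5%. The Python sketch below shows this arithmetic and, under the assumption that diplotypes are written as strings such as "*1/*6", how star allele frequencies could be tallied from a list of calls. The function names and the example diplotypes are hypothetical.

```python
from collections import Counter


def allele_frequency(allele_count: int, n_participants: int) -> float:
    """Allele frequency given that each participant carries two haplotypes."""
    return allele_count / (2 * n_participants)


def star_allele_frequencies(diplotypes: list) -> dict:
    """Tally star allele frequencies from diplotype strings such as '*1/*6'."""
    counts = Counter(hap for d in diplotypes for hap in d.split("/"))
    total = 2 * len(diplotypes)
    return {allele: n / total for allele, n in counts.items()}


# Allele count and cohort size as reported in the text.
print(f"{100 * allele_frequency(10, 961):.1f}%")

# Hypothetical diplotype calls for four participants.
print(star_allele_frequencies(["*1/*6", "*6/*18", "*1/*1", "*1/*6"]))
```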

DISCUSSION
CYP2B6 and CYP2A6 are important pharmacogenes as genetic variation in these genes is known to impact the metabolism of and response to medications, such as efavirenz and nevirapine (antiretrovirals), bupropion (antidepressant), and nicotine (major psychoactive component in cigarette smoke). These medications are important in the African context given the existing high HIV burden and the increasing prevalence of major depressive disorders and smoking-related non-communicable diseases. In this study, we report the distribution of CYP2B6 and CYP2A6 star alleles across SSA based on the comprehensive analysis of 961 high coverage genomes representative of diverse populations from central, eastern, western, and southern Africa. These (short-read) data mainly include genomes generated by H3Africa projects (https://h3africa.org) and other collaborations within Africa, 13,20,21,41 and data from the 1000 Genomes Project Consortium. 12 For CYP2B6, we further present the distribution of efavirenz-based predicted phenotypes. Our analysis also includes comparisons between the CYP2B6 and CYP2A6 allele distributions in SSA and other global populations. In addition, we infer 50 potentially novel African-ancestry alleles for CYP2B6 and CYP2A6 combined, and perform long-read-based characterization for some of these star alleles.
The known CYP2B6 star allele distributions across SSA populations are mainly from studies among participants from Ghana, 42 Uganda, 43 Zimbabwe, 6 SEB in South Africa, 44 and the 5 SSA populations represented in the 1000 Genomes Project phase III dataset which largely comprises low coverage WGS data and whole exome sequence data. 45 For CYP2A6, the available frequency data for individuals of African ancestry is predominantly from African American populations, as reviewed by Tanner and Tyndale. 3 In comparison, this study presents CYP2B6 and CYP2A6 star allele distributions from a more diverse set of genomes from continental SSA populations, including previously understudied populations, for example, the Berom in Nigeria, Fon in Benin, Bantu speakers from Zambia, participants from Botswana, participants from Cameroon, and South African populations represented in the AWI-Gen and CBRL datasets (Table 1). Our findings highlight the varying allele distributions for both CYP2B6 and CYP2A6 (Table 3) across the SSA populations in this study. Some significant star allele frequency differences were also observed between populations in some neighboring African countries and/or ethnolinguistic groups within the same country, which typifies the complex pharmacogenomic variation landscape observed across Africa by previous studies. 46
As expected, CYP2B6*6 was the most frequent decreased function star allele in SSA whereas the no function CYP2B6*18 allele was also common but with varying frequencies. These two alleles are partly responsible for the high proportion of efavirenz poor metabolizer (PM) phenotypes predicted in SSA in this study (Figure 1), which is consistent with previous research studies in SSA. 6,47,48 The pharmacogenetics analysis in these studies focused mainly on CYP2B6*6 and *18. However, based on the results in this study, genotyping only *6 and *18 among SSA populations, where CYP2B6*9, *20, *22, *29, and *36 frequencies can be > 1%, could lead to inaccurate star allele calls and phenotype predictions. Furthermore, the CYP2B6*6 frequency differences (Table 3) and presence of potential novel star alleles on a CYP2B6*6 backbone (in addition to those on *1, *2, and *22 backbones; see Table 4) exemplify the caveats of blanket precision medicine implementation strategies. Among the CYP2B6 star alleles with unknown/uncertain function currently catalogued by PharmVar, we only identified *11 and *33 in SSA, both of which were singletons. This was in contrast to the frequency for *11 (7.1%) inferred from previous studies across SSA in the PharmGKB CYP2B6 reference materials (https://www.pharmgkb.org/page/cyp2b6RefMaterials), and emphasizes the importance of star allele assignment based on full haplotype information.
For CYP2A6, this study provides insights into the distribution of key star alleles, such as CYP2A6*17 and *9 (associated with decreased CYP2A6 activity) across diverse continental African populations. Furthermore, the WGS data used in this study enabled detection of CYP2A6 structural variants, including the CYP2A6*1x2 and *46 star alleles, which are associated with greater in vivo nicotine metabolism. 3 CYP2A6*46 (defined by a 58 bp gene conversion in the 3′-UTR) is challenging to call from short-read WGS as the gene conversion causes read misalignments to CYP2A7, and it occurs in linkage disequilibrium with multiple other star alleles. StellarPGx was the only tool that enabled calling of CYP2A6*46 in this study. The 3′-UTR gene conversion is associated with increased mRNA stability, thus contributing to increased CYP2A6 expression. 49 In general, *46 occurred at a lower frequency among SSA populations (6.1%) in comparison to other global populations (Table 2). For CYP2A6*1x2 (frequency of 0.8% in SSA), we could not differentiate CYP2A6*1x2A from CYP2A6*1x2B via the tools and WGS data used in this study. CYP2A6*1x2 results from unequal crossover involving CYP2A6 and the neighboring CYP2A7 pseudogene during recombination. 26 The reciprocal of this unequal crossover is the no function CYP2A6*4 (gene deletion) allele, which we observed at frequencies as high as 6.1% in the Berom in Nigeria and Bantu speakers from Zambia, and contrastingly lower frequencies among some of the other SSA populations (Table 3).
Regarding the CYP2B6 metabolizer phenotypes, we observed unique distributions for SSA compared with other global biogeographical groups (Figure 1a). This was mainly due to the differences in the diplotype frequencies and largely consistent with estimates in the PharmGKB CYP2B6 reference materials. 2 Notably, the high proportions of participants with CYP2B6 poor metabolizer (PM) and/or intermediate metabolizer (IM) status observed across SSA (Figure 1a,b) emphasize the need for precision medicine implementation across Africa for medications that rely on CYP2B6-mediated metabolism. However, the predicted CYP2B6 phenotypes should be interpreted with caution as there are a number of factors not investigated in this study (e.g., substrate specificity, phenoconversion, and environmental factors) that could influence the CYP2B6 phenotype. Therefore, pharmacokinetic studies in people with various CYP2B6 diplotypes in African populations are needed to determine appropriate drug dosage optimization algorithms.
This study inferred multiple potential novel African-ancestry star alleles for both CYP2B6 and CYP2A6 (Table 4). The core variants defining these star alleles were all rare and are not novel per se, but rather they are nonsynonymous variants that are either not currently catalogued as allele-defining by PharmVar or have been found in different combinations in our study. The functional impact of these novel star alleles is yet to be ascertained. However, CYP2B6 novel haplotypes defined by rs373926269 (splice-donor) and rs370958436 (stop-gained), and the CYP2A6 novel haplotype defined by rs61605570 (stop-gained; see Table 4) are likely to be nonfunctional as they have protein-truncating consequences. Although all the predicted novel star alleles in this study are individually relatively rare, collectively (frequency of 2% and 3% for CYP2B6 and CYP2A6 novel star alleles, respectively) they represent a significant challenge to pharmacogenetics strategies across SSA, if tests based only on common variants are implemented. Furthermore, the relatively high number of these previously uncharacterized haplotypes exemplifies the considerable genetic diversity known to occur among African populations, including diversity in the pharmacogene variation landscape. 46,50
Regarding star allele validation, this is the first study to perform targeted SMRT sequencing to characterize novel CYP2B6 and CYP2A6 star alleles in an African setting. Three of the novel major CYP2A6 star alleles (*54, *55, and *56) inferred from the short-read WGS were further characterized via SMRT sequencing, as were multiple novel suballeles, and they have also been reviewed and designated by PharmVar. It is important to note that the partial exon 3/intron 3 conversion in CYP2A6*54 can pose diplotype assignment challenges (similar to the CYP2A6 3′-UTR conversion) when using short-read WGS as some of the "conversion SNVs" may not be detected during variant calling, which is a result of read misalignments to CYP2A7 (see Supplementary Material S3). CYP2B6 targeted SMRT sequencing presented considerable challenges (discussed below in the limitations). However, it also provided resolution for 3 novel CYP2B6 star alleles. The process of submitting these novel star alleles to PharmVar for naming is ongoing.
There were some limitations in this study. First, we used mostly short-read WGS data for our analysis. Therefore, computationally inferred novel star alleles for both CYP2B6 and CYP2A6 should be interpreted with caution given the difficulties associated with diplotyping these genes.8,11 In the same vein, we were unable to computationally resolve CYP2B6 diplotypes for 26 SSA participants and CYP2A6 diplotypes for 4 SSA participants (Supplementary Material S2), either due to the presence of novel core variants that could not be phased or due to potential uncharacterized structural variations that were novel to all the algorithms used in this study. We have reported the background star alleles for these participants and indicated the potential novel allele-defining variants (which require further experimental validation), and where possible provided sample IDs (Coriell and SGDP samples). Future NGS studies involving long-read platforms may be more informative in resolving CYP2B6 and CYP2A6 diplotypes for samples not further characterized in this study, and more generally when analyzing variation in these complex pharmacogenes across understudied populations. Second, predicted CYP2A6 phenotypes (relating to nicotine metabolism) could not be assigned for SSA participants in this study, as the recently developed genetic risk score 10 has only been validated in African American/Afro-Caribbean individuals and not in continental African populations. In addition, this genetic risk score only considers a few well-characterized CYP2A6 star alleles and/or variants, which could be misleading given the extent of novel star alleles predicted in this study. In the context of smoking initiation and cessation, we anticipate that CYP2A6 PMs are less likely to become smokers and, if they do, that they would smoke fewer cigarettes per day. For CYP2B6, the predicted phenotypes were based on efavirenz metabolism, as there are currently no Clinical Pharmacogenetics Implementation Consortium guidelines for other drugs. About 7% of individuals had indeterminate phenotypes due to harboring novel/known star alleles with unknown function and/or ambiguous diplotypes. Last, in our laboratory validation, CYP2B6 XL-PCR products for 27 participants failed to barcode during the HiFi sequencing library preparation due to technical challenges that arose from pooling the CYP2B6 amplicons with those from CYP2A6 (for which 28 samples failed barcoding). We mitigated this by optimizing barcoding for select individual amplicons from samples that had predicted novel star alleles or suballeles.
Future work entailing pharmacogenomics analysis based on WGS datasets from other under-represented African populations (e.g., Nilo-Saharan, Afroasiatic, and non-Bantu language families) would be critical for precision medicine across Africa and for addressing disparities in comparison to global settings. Furthermore, in-depth characterization of the CYP2B6 and CYP2A6 novel star alleles not validated in this study, as well as analyses to determine their clinical and functional impact, would be important in supporting clinical pharmacogenomics implementation strategies across Africa and the African diaspora. In the context of HIV treatment, it is important to assess how known and novel star alleles in CYP2B6, CYP2A6, and other pharmacogenes might affect response to new first-line drugs such as dolutegravir. Similarly, in the context of nicotine response, more studies are needed to assess the transferability of the CYP2A6 weighted genetic risk score 10 for phenotype prediction across various continental African populations, and to further improve its accuracy through inclusion of more African-ancestry star alleles.
In conclusion, this study presents an extensive characterization of the CYP2B6 and CYP2A6 pharmacogenetic variation in diverse SSA populations based on analysis of high-depth genomes generated by multiple projects. The differences in CYP2B6 and CYP2A6 star allele frequencies across SSA, and compared with other global populations, as well as the high number of potentially novel alleles in SSA, emphasize the need for pharmacogenomic studies across under-represented populations for effective precision medicine implementation in Africa. Furthermore, these findings emphasize the advantage that sequencing-based strategies would present over targeted SNV genotyping tests for precision medicine purposes in genetically diverse populations. In addition, from our comparative star allele analysis we highlight potentially novel CYP2B6 and CYP2A6 star alleles across other global populations, which is relevant in informing pharmacogenetic testing strategies worldwide.

Figure 1 Distribution of CYP2B6 phenotypes predicted in relation to efavirenz metabolism. (a) Comparison of CYP2B6 phenotypes across the global biogeographical groups included in this study. In general, SSA populations have a higher proportion of CYP2B6 poor and intermediate metabolizers compared with other biogeographical groups. This is in part accounted for by the high frequency of the CYP2B6*6 (decreased function) and CYP2B6*18 (no function) alleles across Africa. (b) CYP2B6 phenotype distribution across the SSA populations in this study. BFA, participants from Burkina Faso; BOT, participants from Botswana; BRN, Berom in Nigeria; BSZ, Bantu speakers from Zambia; CAM, Cameroonian participants; ESN, Esan in Nigeria; FNB, Fon in Benin; GHA, Ghanaian participants; GWD, Gambian in Western Division (Mandinka); IM, intermediate metabolizer; LWK, Luhya in Webuye (Kenya); MSL, Mende in Sierra Leone; NM, normal metabolizer; PM, poor metabolizer; RM, rapid metabolizer; SEB, south-eastern Bantu in South Africa; UM, ultrarapid metabolizer; YRI, Yoruba in Ibadan, Nigeria.

Figure 3 Novel CYP2A6 star alleles characterized via targeted single-molecule real-time (SMRT) sequencing. Panel (a) depicts a CYP2A6 diplotype observed in a South African participant with the novel CYP2A6*54 star allele, which is defined by rs558145012 (p.G36V) and core single nucleotide variations (SNVs) arising from a partial exon 3/intron 3 CYP2A7 conversion, on a CYP2A6*46 (58 bp 3′-UTR conversion to CYP2A7) backbone. High fidelity (HiFi) data facilitated the unambiguous read alignment spanning the entire CYP2A6 region, including the 2 gene conversions. The second haplotype was characterized as a CYP2A6*17 novel suballele (CYP2A6*17.002). Panel (b) depicts a CYP2A6 diplotype observed in a South African participant with the novel CYP2A6*55 star allele defined by rs114558780 (p.T378I) on a CYP2A6*1 background. Panel (c) depicts a CYP2A6 diplotype observed in a South African participant with the novel CYP2A6*56 star allele defined by rs113558392 (p.V292G) on a CYP2A6*46 background. The *56 allele appeared to be duplicated based on whole genome sequence data. However, XL-polymerase chain reaction (XL-PCR) for the duplicated gene copy was unsuccessful in this study. The second haplotype was characterized as a CYP2A6*35 novel suballele (CYP2A6*35.003). Panel (d) depicts 2 novel CYP2A6 suballeles (*9.002 and *31.003) observed in a Ghanaian participant.

Table 2
CYP2B6 and CYP2A6 star allele frequencies (%) in sub-Saharan African populations compared with global populations
The star allele definitions followed in this study are according to PharmVar v5.2.14.1. CPIC, Clinical Pharmacogenetics Implementation Consortium; n, individuals; PharmGKB, Pharmacogenomics Knowledge Base; PharmVar, Pharmacogene Variation Consortium; SSA, sub-Saharan African populations.
a The functional impacts of CYP2A6 star alleles mentioned in this table were based on a review by Tanner and Tyndale, 2 but they have not yet been curated by the PharmVar CYP2A6 expert panel and the PharmGKB team.

Table 4
Potentially novel CYP2B6 and CYP2A6 haplotypes inferred from African short-read whole genome sequence datasets in this study