Pharmacokinetics and pharmacogenomics of clozapine in an ancestrally diverse sample: a longitudinal analysis and genome-wide association study using UK clinical monitoring data

Summary Background The antipsychotic, clozapine, is the only licensed drug against the treatment-resistant symptoms that affect 20–30% of people with schizophrenia. Clozapine is markedly underprescribed, partly because of concerns about its narrow therapeutic range and adverse drug reaction profile. Both concerns are linked to drug metabolism, which varies across populations globally and is partly genetically determined. Our study aimed to use a cross-ancestry genome-wide association study (GWAS) design to investigate variations in clozapine metabolism within and between genetically inferred ancestral backgrounds, to discover genomic associations to clozapine plasma concentrations, and to assess the effects of pharmacogenomic predictors across different ancestries. Methods In this GWAS, we analysed data from the UK Zaponex Treatment Access System clozapine monitoring service as part of the CLOZUK study. We included all available individuals with clozapine pharmacokinetic assays requested by their clinicians. We excluded people younger than 18 years, or whose records contained clerical errors, or with blood drawn 6–24 h after dose, a clozapine or norclozapine concentration less than 50 ng/mL, a clozapine concentration of more than 2000 ng/mL, a clozapine-to-norclozapine ratio outside of the 0·5–3·0 interval, or a clozapine dose of more than 900 mg/day. Using genomic information, we identified five biogeographical ancestries: European, sub-Saharan African, north African, southwest Asian, and east Asian. We did pharmacokinetic modelling, a GWAS, and a polygenic risk score association analysis using longitudinal regression analysis with three primary outcome variables: two metabolite plasma concentrations (clozapine and norclozapine) and the clozapine-to-norclozapine ratio. Findings 19 096 pharmacokinetic assays were available for 4760 individuals in the CLOZUK study. After data quality control, 4495 individuals (3268 [72·7%] male and 1227 [27·3%] female; mean age 42·19 years [range 18–85]) linked to 16 068 assays were included in this study. We found a faster average clozapine metabolism in people of sub-Saharan African ancestry than in those of European ancestry. By contrast, individuals with east Asian or southwest Asian ancestry were more likely to be slow clozapine metabolisers than those with European ancestry. Eight pharmacogenomic loci were identified in the GWAS, seven with significant effects in non-European groups. Polygenic scores generated from these loci were associated with clozapine outcome variables in the whole sample and within individual ancestries; the maximum variance explained was 7·26% for the metabolic ratio. Interpretation Longitudinal cross-ancestry GWAS can discover pharmacogenomic markers of clozapine metabolism that, individually or as polygenic scores, have consistent effects across ancestries. Our findings suggest that ancestral differences in clozapine metabolism could be considered for optimising clozapine prescription protocols for diverse populations. Funding UK Academy of Medical Sciences, UK Medical Research Council, and European Commission.


Supplementary Methods
Curation of the pharmacokinetic data 19,096 pharmacokinetic assays were available for 4,760 CLOZUK individuals, including information on the clozapine and norclozapine plasma concentrations, the daily clozapine dose, and the time of both the drug intake and blood draw.All plasma concentrations were determined by a standard high-performance liquid chromatography mass spectrometry (HPLC-MS) procedure at Magna Laboratories (Ross-on-Wye, U.K.), and further details are provided in a previous publication [1].The pharmacokinetic assay dataset was assessed for potential clerical errors (e.g.negative time periods, multiple samples recorded at the same time point) and cross-checked with ZTAS records to ensure data integrity (e.g.correct year of birth for each individual).Data from all assays containing any erroneous information was removed.
Additionally, assays were excluded from further analyses if they fitted any of the following criteria: (i) blood drawn outside of a "trough sample" interval of 6-to 24-hours postdose [2]; (ii) clozapine or norclozapine concentration <50 ng/mL, outside the minimum detection range of the HPLC-MS instrument; (iii) clozapine concentration >2000 ng/mL, reaching the range of potential toxicity [3]; (iv) clozapine:norclozapine ratio outside of the 0.5-3.0interval, indicating potential non-adherence to treatment [4]; and (v) clozapine dose >900 mg/day, the maximum advised by the British National Formulary.Finally, due to potentially different treatment regimes, we also removed all data from individuals under 18 years of age.

Curation of the genomic data
The CLOZUK2 samples underwent genotyping, quality control and imputation as previously described [1].Briefly, 7,417 samples were genotyped by deCODE Genetics (Reykjavík, Iceland), using an Illumina HumanOmniExpress-12 array with 719,665 SNPs.Quality control was performed using PLINK v1.9 [5].Samples and markers with a missingness rate of >2% and samples with an inbreeding coefficient of F>0.2 were excluded from further analyses.After curation and merging with the pharmacokinetic assay dataset, 3,578 samples genotyped at 698,442 SNPs remained in CLOZUK2.
For CLOZUK3, 1,439 samples were genotyped at the Icahn School of Medicine at Mount Sinai (New York City, USA) using an Illumina® Infinium Global Screening Array-24 (GSA-24) with 654,027 SNPs.The curation of both samples and markers was performed using the same procedures as CLOZUK2, implemented in the DRAGON-Data pipeline [6].After curation and merging with the pharmacokinetic assay dataset, 917 samples genotyped at 537,334 SNPs remained in CLOZUK3.
Genotype imputation for both cohorts was performed using the Haplotype Reference Consortium (HRC) panel through the Michigan Imputation Server [7].All imputed genotype dosages of CLOZUK2 and CLOZUK3 were further curated within each cohort with the following parameters: Imputation quality r 2 ≥ 0.7; hard-call genotype probability ≥ 80%; hardcall missingness ≤ 5%; minor allele count (MAC) ≥ 2. The remaining variants in common between both samples were then merged and a second round of dosage curation was performed with the following parameters: Hard-call missingness ≤ 2%; MAC ≥ 400 and Hardy-Weinberg equilibrium (HWE) mid-p > 10 -6 .This led to over 2.9 million high-quality markers for GWAS and polygenic risk score (PRS) analyses.For context, the choice of setting quality control thresholds on MAC instead of minor allele frequency (MAF) was motivated by evidence relating this parameter to the power and robustness of multiple genomic association tests [8,9], for which usual MAF thresholds are not always relevant.

Prediction of biogeographic genomic ancestry
Genomic data was also used for biogeographic ancestry prediction using a linear discriminant analysis (LDA) model based on ancestry-informative markers [AIMs; 10].Briefly, genotype data was merged with a public geo-localised reference panel based on the Human Genome Diversity Project [HGDP ; 11].The population differentiation statistic FST was used to identify AIMs by selecting those overlapping SNPs maximally differentiated between HGDP superpopulations.Genotype principal components (PCs) were then generated based on these AIMs and processed through a LDA model with the HGDP samples of known ancestry as training set and CLOZUK2/CLOZUK3 as test sets.The model resulted in a set of probabilities, for each sample, of belonging to each of five global biogeographic ancestries: "European", "North African", "Sub-Saharan African", "East Asian" and "Southwest Asian" [10].For comparability with other studies, it should be noted that these ancestries are equivalent in definition to the non-admixed standardised biogeographic groups recently proposed for use in pharmacogenomics research by Huddart et al. [12].Individuals were assigned to one specific ancestry if their LDA probability surpassed a threshold of 80%, and to a group labelled as "admixed/unknown" otherwise.For carrying out ancestry-specific GWAS, given the small sample size of some of these subgroups, the merged CLOZUK dosages were curated within each ancestry with the following parameters: Hard-call missingness ≤ 2%; hard-call MAC ≥ 40 and HWE mid-p > 10 -4 .

GWAS model fitting and statistical fine-mapping
The equation of a generalised linear mixed model (GLMM), as used for pharmacokinetic, GWAS, and PRS association tests, takes the following form in matrix notation [13]: Where y is the outcome variable, in our case the plasma concentrations of either clozapine or norclozapine, or the clozapine:norclozapine metabolic ratio.X is a matrix of variables with a corresponding matrix  of fixed effects, in our case formed by genetic (SNPs/PRSs, PCs, ancestry probabilities) and non-genetic predictors (sex, age, clozapine dose, time between dose intake and blood sample).Z is a matrix of grouping variables with a corresponding matrix  of random effects, in our case a CLOZUK sample ID which uniquely identifies all pharmacokinetic assays from the same person, Finally, e is an error term of unexplained ("residual") variability.
For our main GWAS analyses, we estimated effect sizes, standard errors and p-values for each SNP using the Wald test implemented in TrajGWAS.A thorough description of this method, including power curves for a range of variant effect sizes and MAFs can be found in Ko et al. [14].TrajGWAS uses the "within-subject variance estimator by robust regression" (WiSER) approach [15], a GLMM framework that can accommodate predictors of the outcome mean and variance while being robust to multiple outcome and random effect distributions.However, computational problems with the WiSER procedure may occur while modelling long-tailed outcome distributions, requiring changing the numerical optimisation procedures for particular genomic regions or SNPs [14].As long tails and extreme values are often features of pharmacokinetic metrics [16], we sought to avoid this potential problem by normalising and standardising all of our outcomes before the TrajGWAS analysis.For this we used a cube root transformation for the gamma-distributed clozapine and norclozapine concentrations [17,18], and a logarithmic transformation for the log-normally-distributed metabolic ratio.All covariates were also standardised, as recommended to improve the computational performance of GLMMs [19].These procedures were also used in the genomic glmmTMB analyses.
Taking advantage of the WiSER modelling capabilities, TrajGWAS models included sex as a predictor of both the between-person mean and the within-person variance, reflecting the observation that differential smoking habits in males and females might account for some of the "noise" in clozapine pharmacokinetic assays [20].When replicating the results of specific SNPs in glmmTMB (Supplementary Table 3), we approximated this approach by setting sex as a predictor of both the mean ("location") and the residual variance ("scale") in the GLMM.
We also included a second-order polynomial term (age 2 ) among our fixed effects in all models.This is a standard way of modelling non-linear relationships between outcomes and predictors in linear models [21], in this case reflecting the potentially steeper change in drug pharmacokinetics that may occur with aging [22].
By implementing a generic SNP-level association test to longitudinal genomic datasets, trajGWAS can be employed in both ancestry-specific and cross-ancestry experimental designs.
It has been shown that GWAS approaches employing the cross-ancestry framework can achieve greater discovery power than ancestry-specific studies by avoiding the exclusion of admixed individuals [23], as well as by taking advantage of similarities in variant effect size across ancestries [24].However, the greater diversity in cross-ancestry samples requires a tighter control of population stratification to prevent genomic inflation, which was addressed in our main analysis by covarying for 10 PCs and 4 biogeographical ancestry LDA probabilities.These are two related but not redundant measures of "global" genetic ancestry [24], with the latter being a quantitative approximation to the ancestral background of the individual that might incorporate non-genetic (cultural or environmental) diversity relevant to the phenotype under study, as well as genetic information not captured by the top PCs [25].It should be noted that trajGWAS, although made for fitting GLMMs to genetic data, does not incorporate an option for using a kinship or genetic relatedness matrix (GRM) as the covariance structure of its individual-level random effect term to account for ancestry, cryptic relatedness or family structure.That function is characteristic of other statistical genomics software derived from the "variance components" GLMM framework [26], such as GCTA [27] or GENESIS [28].These were not designed for fitting complex regression models to longitudinal data and carrying out the adaptations and tests required for this purpose was beyond the scope of our study.The use of variance components GLMMs in cross-sectional datasets with population stratification has been reviewed elsewhere [29].
Secondary ancestry-specific GWAS were carried out using the saddlepoint approximation (SPA) to the score test also implemented in TrajGWAS, which is appropriate even for small sample sizes and rare allele frequencies [14].These analyses did not include ancestry probabilities as covariates, though still included the full set of 10 PCs to control for population stratification.Ancestry-specific GWAS were not run in the sample of individuals classified as "admixed/unknown" due to their potential ancestral complexity; nor in those of East Asian ancestry due to the small sample size of this group (n=41).
Genomic inflation statistics (λ values) were calculated on the output of each GWAS using the fastman R package [30].LD clumping was also performed to summarise the results and identify tagged genes.LD clumps were formed around genome-wide significant ("index") SNPs and variants with r²>0.1 within 3000 kb were assigned to these clumps.Fine-mapping of genomewide significant clumps was carried out with FINEMAP v1.41 [31], setting the maximum number of causal SNPs ("k") parameter to k=5, and employing locus-wide LD panels created from the CLOZUK dosage data.Clumps in common between different outcomes were additionally run through flashfm v1.0 [32] using the top 1,000 causal configurations identified by FINEMAP, in order to pinpoint potentially shared causal SNPs.Additionally, for each genome-wide significant GWAS locus and phenotype, dosages from the fine-mapped SNP with the largest probability of being causal were extracted and analysed in R (using glmmTMB) to estimate GLMM effect sizes in the scale of the original pharmacokinetic variables.

Estimation of SNP-based heritability
We used MiXeR v1.3 to estimate the heritability of clozapine metabolism phenotypes using summary statistics, given its reported good performance across a range of genomic architectures including oligogenic traits [33].An LD reference panel was generated directly from CLOZUK best-guess genotypes and all analyses were ran using default parameters.As recommended by the software developers, results were generated from 20 MiXeR runs using random sets of ~500,000 LD-independent SNPs (MAF > 0.05; r 2 < 0.9).

Polygenic score association analysis
As described in the main text, GLMM regression models to estimate PRS effect sizes followed the main pharmacokinetic analyses.As an index of the proportion of variance explained by PRS in the context of other fixed and random effects, a semi-partial R 2 statistic was estimated in R using partR2 [34].Given the current implementation of this method cannot accommodate gamma-distributed GLMMs, all PRS models were re-fitted for R 2 estimation using normalised phenotypes.Following a recent recommendation by Rights and Sterba [35], time-varying fixed-effect predictors (age, age 2 , clozapine daily dose and TDS) were also centred at the individual level ("de-meaned") in the re-fitted LMM models.

Statistical fine-mapping of GWAS loci
The best-fitting FINEMAP models for CYP1A1/1A2 and POR included just a single causal variant (k=1); while UGT1A*, UGT2B10 (in norclozapine) and CYP2C18 had two (k=2).The metabolic ratio locus at UGT2B10 could not be confidently fine-mapped with the default software settings (best-fitting k ≥ 5).We reran FINEMAP at this locus using a conditional analysis approach, which inferred a best causal model with k=7.Based on the data from the best FINEMAP model for each locus, we then defined credible sets with 95% posterior probability of including all causal SNPs at a locus.These contained between 1 and 411 variants, each with an estimated PPI value indicating its probability of being a causal variant (Supplementary Table 5).Apart from the metabolic ratio CYP2C18 locus, all FINEMAP credible sets contained fewer SNPs than a previous fine-mapping effort [1].Within credible sets, one SNP was confidently identified as the only causal signal behind the CYP1A1/1A2 locus for both clozapine and norclozapine (rs2472297, PPI=1).SNPs with high causal probabilities (PPI ≥ 0.33) were also found within the boundaries of two genes: UGT2B10 (norclozapine and metabolic ratio) and POR (metabolic ratio).These genes themselves cumulatively contained most of the PPI within each locus, suggesting they are the most likely genes in the region to be causally related to the association signal [36].
The other SNPs were not found within these results.On the UGT1A4 locus, we instead found several other missense SNPs within the FINEMAP credible sets, but the majority were within a UGT1A5 exon (4/8 for clozapine, cumulative PPI=0.119;4/7 for norclozapine, cumulative PPI=0.110).This result was consistent with the smaller flashfm credible sets, which had most of their missense variants also mapped to UGT1A5 (5/6 for clozapine, cumulative PPI=0.127;5/6 for norclozapine, cumulative PPI=0.120).On the UGT2B10 locus, for either norclozapine or the metabolic ratio, none of our analyses included any missense variants as part of credible sets, and in fact those were largely different between the assessed phenotypes.It should be noted this result could have been confounded by the large number of causal SNPs inferred for the metabolic ratio GWAS, which itself might have arisen due to the complexity and breadth of the association signal (1385 SNPs).Finally, rs28379954 could not be evaluated in our GWAS due to the poor imputation quality of the NFIB gene within CLOZUK2 and CLOZUK3 (imputation r 2 < 0.6).
Additional analyses using flashfm were undertaken for loci shared between multiple phenotypes.This led to further shrinking in the UGT1A* credible set, from 90-107 SNPs in the norclozapine and clozapine analyses, respectively, to 74 SNPs (Supplementary Table 5).
However, the credible sets of CYP1A2 and UGT2B10 widened, likely because flashfm uses information from all the FINEMAP models, not just those that are the best-fit, and thus reflects uncertainty in the number of causal variants underpinning the GWAS signal.Using flashfm also revealed a set of SNPs in high LD (r 2 >0.7) within the UGT2B10 locus that were putatively causal for both phenotypes and partially within the coding region of this gene (SNP group "A"; norclozapine cumulative PPI=0.169;ratio cumulative PPI=1; Supplementary Table 5).

Ancestry-specific effect sizes of main GWAS SNPs
The cross-ancestry consistency of the markers in Table 4 was explored by re-fitting the regression models within the CLOZUK ancestry groups.Across 40 tests (10 SNPs x 4 ancestries), we saw consistency (in direction and magnitude) of all SNP effect sizes with the main analysis in 38 instances, including every comparison in which the ancestry-specific association statistics were at least nominally significant (Supplementary Table 6).

Heritability and polygenicity of clozapine metabolism
The univariate analyses models implemented in MiXeR supported the oligogenicity of clozapine metabolism, with less than 0.001% of genetic variants genome-wide having nonzero effects for any of the analysed phenotypes.Common SNP-based heritability values were estimated at 2.87% (SE=0.30%)for clozapine; 4.86% (SE=0.28%)for norclozapine and 8.62% (SE=0.59%)for the metabolic ratio.Despite these low values, all the model fitting performance metrics (Akaike's Information Criterion, AIC; Bayesian Information Criterion, BIC) for these models were positive with the only exception of the BIC value of clozapine, supporting that the current TrajGWAS analysis is sufficiently powered for this procedure.

Supplementary Table 2
Additional fixed-effect covariates for the four clozapine pharmacokinetics models reported in Table 2, as estimated with GLMM regression.
Predictor names indicate their unit of measurement or, in the case of binary predictors, their non-reference level.Effect sizes (β) indicate the positive or negative impact of a one-unit increase of the predictor in the average of the outcome, across all the individuals and longitudinal assays of the cross-ancestry CLOZUK sample.* Putatively shared causal variant for norclozapine and the clozapine:norclozapine metabolic ratio reported by flashfm.

Supplementary Table 7
Pharmacokinetic outcomes stratified by genotype of genome-wide significant SNPs associated with clozapine metabolism phenotypes.To account for both longitudinal measurements and covariates, intercept-free GLMMs were used to calculate outcome means and standard errors, as suggested by Schielzeth [21].Scales used are ng/mL for plasma concentrations and a natural scale for the clozapine:norclozapine ratio.

Table 3
Effect sizes of fine-mapped genome-wide significant SNPs associated with clozapine metabolism phenotypes, in a ng/mL scale for plasma concentrations and a log scale for the ratio.PPI: Largest posterior probability of the variant being causal across FINEMAP and flashfm analyses.Unadjusted effect size estimates, controlled only for genomic covariates and a random effect, are given in Supplementary Table7.
Putatively shared causal variant for norclozapine and the metabolic ratio reported by flashfm.