A Bayesian approach to estimate the probability of resistance to bedaquiline in the presence of a genomic variant

Background Bedaquiline is a core drug for treatment of rifampicin-resistant tuberculosis. Few genomic variants have been statistically associated with bedaquiline resistance. Alternative approaches for determining the genotypic-phenotypic association are needed to guide clinical care. Methods Using published phenotype data for variants in Rv0678, atpE, pepQ and Rv1979c genes in 756 Mycobacterium tuberculosis isolates and survey data of the opinion of 33 experts, we applied Bayesian methods to estimate the posterior probability of bedaquiline resistance and corresponding 95% credible intervals. Results Experts agreed on the role of Rv0678, and atpE, were uncertain about the role of pepQ and Rv1979c variants and overestimated the probability of bedaquiline resistance for most variant types, resulting in lower posterior probabilities compared to prior estimates. The posterior median probability of bedaquiline resistance was low for synonymous mutations in atpE (0.1%) and Rv0678 (3.3%), high for missense mutations in atpE (60.8%) and nonsense mutations in Rv0678 (55.1%), relatively low for missense (31.5%) mutations and frameshift (30.0%) in Rv0678 and low for missense mutations in pepQ (2.6%) and Rv1979c (2.9%), but 95% credible intervals were wide. Conclusions Bayesian probability estimates of bedaquiline resistance given the presence of a specific mutation could be useful for clinical decision-making as it presents interpretable probabilities compared to standard odds ratios. For a newly emerging variant, the probability of resistance for the variant type and gene can still be used to guide clinical decision-making. Future studies should investigate the feasibility of using Bayesian probabilities for bedaquiline resistance in clinical practice.


Introduction
In 2020, tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), was the 13 th leading cause of death and the second leading infectious killer after COVID-19 [1]. The public health threat posed by TB is worsened by the occurrence of drug-resistant TB. In 2019, close to half a million people developed rifampicin-resistant TB (RR-TB) worldwide [2]. In the past decade, important progress has been made, including rapid diagnosis of RR-TB and introduction of short all-oral treatment regimens [3].
Bedaquiline (BDQ) is an essential component of the novel short all-oral RR-TB treatment regimens [4]. Its use is rapidly increasing, with 90 countries using BDQ in 2018 [5]. Soon after its introduction, cases of acquired and primary BDQ resistance and cross-resistance between BDQ and clofazimine (CFZ), another drug frequently used to treat RR-TB, were reported [6][7][8][9]. To prevent acquisition and transmission of BDQ resistant TB, the use of BDQ should be accompanied with access to BDQ drug susceptibility tests (DST). Unfortunately, phenotypic DST (pDST) for BDQ, the gold standard, is slow, not yet standardized, and has a low positive predictive value in settings with low prevalence of BDQ resistance [10]. A genotypic DST (gDST) assay could provide accurate and timely information if knowledge on genetic variants and their association with resistance to BDQ is comprehensive [11][12][13].
Several candidate BDQ resistance genes have been identified [14]. Mutations in the atpE gene can cause a loss of binding affinity of BDQ with its drug target, resulting in high-level BDQ resistance in vitro [15], but they are rarely observed in clinical isolates [16]. Overexpression of the MmpS5/MmpL5 efflux system caused by mutations in the Rv0678 gene is the main mechanism of clinical resistance to BDQ [6,13,17]. Lastly, mutations in the pepQ and Rv1979c genes have also been implicated in BDQ resistance, but evidence is lacking [8,18].
A recent systematic review analyzed the association between phenotypic BDQ resistance and genomic variants in atpE, Rv0678, Rv1979c, and pepQ [11]. When applying standard statistical methods, only two out of 313 variants reported in the literature could be statistically associated with resistance [19]. Similarly, no mutations satisfied the criteria for BDQ resistance in the 2021 WHO catalogue of mutations in Mtb complex [14]. Consequently, the body of evidence on BDQ genotype-phenotype association cannot inform the use of gDST for patient care.
Bayesian methods can overcome data sparsity by combining data with expert opinion [20]. A Bayesian approach could be of great value in a clinical setting as it is conceptually similar to how clinicians approach the diagnostic process. Furthermore, the credible interval is well suited for clinical practice as it has a more intuitive interpretation [21,22]. We applied a Bayesian method to estimate the probability of BDQ resistance in the presence of genomic variants.

Materials and methods
Bayesian methods were used to estimate the probability of BDQ resistance in the presence of a mutation. First, a survey was conducted to obtain expert opinion (prior information) on genotype-phenotype associations for BDQ. Next, an update of a recent systematic literature review was performed to obtain the most comprehensive phenotypic-genotypic data [11]. Finally, Bayesian analyses were used to obtain the posterior distribution of the probability of BDQ resistance, the posterior median and its 95% credible interval (CrI) [23].

Survey
Experts were selected based on their active involvement in the field of genotype-phenotype associations in Mtb as determined by scientific publications or participation in the development of the WHO catalogue of mutations in Mtb [14,24]. Experts were asked to share contact details of relevant colleagues. The online "Bedaquiline resistance survey" tool (see Survey, S3 File, which includes the consent form and survey presented to experts) focused on the expert rules introduced by Miotto et al. [19] and Köser et al. [25]. For the atpE, Rv0678, Rv1979c, and pepQ genes, experts were asked if they believe that a mutation can cause BDQ resistance. If the expert answered yes, their opinion on the probability of resistance in the presence of different types of mutations (in-frame indel, frameshift indel, synonymous, nonsense, missense, and homoplastic mutation) was obtained using a six-point Likert scale, except for loss of function mutations (nonsense and frameshift mutation) in the atpE gene as these have not been observed in Mtb isolates. The survey was piloted by five experts.
Experts' responses were eligible for inclusion in the analysis if the expert responded "yes" to "Do you believe that a mutation in the atpE gene can confer resistance to BDQ?", indicated he/ she was "sure" or "relatively sure" about their responses, and there was no evidence of poor response quality.

Systematic literature review
To obtain the most comprehensive data on BDQ genotype-phenotype, the individual isolate systematic review (publications between 2008 and October 2020) was updated using the same search engines (Europe PubMed Central and Scopus), search terms and methodology for the period of November 2020 to December 30, 2021 [11]. to identify studies that reported genotypic and phenotypic bedaquiline resistance for individual clinical or non-clinical Mtb isolates. New records of individual isolates were added to the review database (see S2 File, which includes the screened literature, all extracted data, the meta-analysis, expert survey responses, and Bayesian analysis).

Statistical analysis
All analyses were done using R statistical software (version 4.0.5) (see S1 File). The correlation between survey responses for the same mutation types in different genes was assessed using the Spearman correlation coefficient [26].

Construction of prior distributions
Experts' responses on probability of BDQ resistance in the presence of a variant were translated into prior distributions. For the genes where all experts agreed on its potential role in BDQ resistance, a beta distribution (with shape parameters denoted as α and β) was used to parametrically model the prior distribution for the probability of BDQ resistance for each type of mutation (synonymous mutation, nonsense mutation, frameshift indel, missense mutation, inframe indel, homoplastic mutation). In order to obtain maximum likelihood (ML) estimates for the α-and β-parameters of the prior distribution for each mutation type, the likelihood function was maximized using the optim function (in the stats R package) with default Nelder-Mead line search method [27]. To deal with the interval-censored nature of the Likert scale data (i.e., x i = [ll i , ul i ), for i = 1, . . ., n), the (log)likelihood function, denoted hereunder as LL(x | α, β), for the estimation of the beta density was adjusted as follows [28]: where x = (x 1 ,. . ., x n ) represents the vector of reported probability intervals of BDQ resistance, given the presence of a specific type of mutation, for all experts and F(s | α, β) is the cumulative distribution function of a beta distributed random variable with parameters α and β evaluated in s, and ul i and ll i the upper and lower limits, respectively, of the probability interval associated with the Likert scale category chosen by expert i. For genes where experts expressed uncertainty about the role in BDQ resistance, a mixture distribution was used as prior. When all three answers ("yes", "no", "I do not know") were observed for the question "can mutations in gene X confer resistance to BDQ", a three-component mixture prior distribution was defined using empirical probability weights w j : where f j (p i ) represents the probability density for mixture distribution j = 1, 2, 3 and nonnegative weights w j such that 0 < w j < 1 for all j and P 3 j¼1 w j ¼ 1 [29]. More specifically, f 1 is the probability density function (pdf) of a beta distribution with shape parameters α 1 and β 1 , f 2 is the discrete probability distribution of a degenerate distribution at zero (i.e., f 2 (p i ) = Pr(P i = p i ) = I(p i = 0) for 0 < p i < 1, where I(.) is the indicator-function) and f 3 is the pdf of a uniform distribution on the unit interval, which is the same as a beta distribution with α 3 = β 3 = 1 [30]. The mixture proportions of each component were set equal to the observed probability of responding "yes", "no" or "I do not know", respectively.
When only two answers were observed, a two-component mixture prior distribution was used: To capture the experts' opinion on the assumption that the effect of mutations on the phenotype that occurs in laboratory experiments can be extrapolated to the phenotype of clinical isolates containing the same variant, a single unconditional prior distribution was constructed combining the information from experts answering "yes" and the uncertainty from experts answering "I don't know" or "no." [31]: More specifically, f ðp i Þ ¼ represents the conditional distribution of (P i | X = x j ), j = 1, 2, for experts answering "yes" (j = 1) and "I don't know" or "no" (j = 2), respectively. Finally, f represents the discrete probability distribution for the aforementioned response categories, i.e., f(x j ) = Pr(X = x j ).

Data likelihood
The genotype-phenotype individual isolate analysis of the updated systematic literature was used for the construction of the data likelihood. Observing phenotypic BDQ resistance (Z k = 1) or susceptibility (Z k = 0) in the presence of a specific mutation (for k = 1, . . .,m) can be viewed as a sequence of independent Bernoulli trials with the same probability of resistance, thereby implying that the number of resistant isolates, denoted by Y ¼ P m k¼1 Z k � Bðm; pÞ; follows a binomial distribution with m the number of times a specific genetic variant in the target gene of interest was reported in the literature, and π the probability of phenotypic BDQ resistance of an isolate containing such variant.

Posterior distribution and Bayesian inference
To estimate the probability of BDQ resistance and its 95% CrI, inferences were made based on the posterior distribution. Dependent Markov Chain Monte Carlo (MCMC) sampling of the posterior is performed using a Metropolis-Hastings (MH) algorithm. Convergence of the MCMC chains was checked by inspection of the trace plots and formal Gelman-Rubin convergence statistics, denoted by R. Three chains with starting values 0.25, 0.5 and 0.75 for the resistance probability were run and convergence was declared when R was less than 1 [32][33][34]. An autocorrelation plot was used to evaluate the dependence of the parameter to the Markov process after the burn-in period [35]. Thinning of the chain was performed every 5 th , 10 th , or 15 th iteration, depending on the results of the autocorrelation plot, and the iteration number was increased after thinning. The sufficiency of the iteration number to estimate the 2.5% and 97.5% quantiles was checked by the Raftery diagnostic [36,37].

Ethical approval
The study has been reviewed and approved by the Research Ethics Committee of the University Hospital of the University of Antwerp (REF number: 21/06/093). The survey was anonymous, all participated experts gave written informed consent.

Expert survey results and estimation of the prior
Of the 120 invited researchers, 46 (38.3%) completed the survey. Thirteen (28.3%) respondents were excluded from the analysis because they did not fulfil the predefined inclusion criteria: 11 participants reported they were not sure about their response and the responses by 2 participants were of poor quality (same answers to all questions and completion of the full survey in less than 5 minutes). Of the 33 experts whose results could be included, most (n = 25, 75.8%) were laboratory researchers and almost all (n = 27, 81.8%) resided in a high-income country ( Table 1).
Regarding general expert rules on genotype-phenotype associations, most experts believed that resistance-conferring mutations occurring in isolation also confer resistance when occurring in combination with other mutations (n = 26, 78.8%), that any increase above the critical concentration is clinically relevant (n = 20, 60.6%), and that the phenotypic effect of a mutation in a laboratory strain can be extrapolated to the effect in a clinical isolate (n = 22, 66.7%) ( Table 2).
Regarding the role of specific genes in BDQ resistance, all experts believed that a variant in atpE or Rv0678 can cause resistance to BDQ. For the pepQ and Rv1979c genes, 51.5% (17/33) and 78.1% (25/32) were uncertain whether a variant can cause resistance to BDQ, respectively.

Updated systematic literature review results
The updated search identified 503 additional studies of which five studies [38][39][40][41][42] were eligible for inclusion in the analysis, bringing the total number of included studies to 46 (see S1 Fig, showing the literature update flowchart). The majority of records were removed because they were reviews (n = 140), did not report on BDQ (n = 152), or studied non-tuberculous mycobacteria (n = 47). In the 46 included studies, 313 unique variants (223 from the published systematic review and 90 additional unique variants) were reported in 756 isolates. Most variants (92.3%, n = 289) occurred in clinical isolates of which 98 were reported in isolates phenotypically resistant to BDQ. Of the 189 (25.0%) BDQ resistant isolates, 142 (75.1%) were clinical isolates. Among these 142 phenotypically resistant clinical isolates, the presence of one or more

Posterior median probability of BDQ resistance for specific mutations
For single nucleotide polymorphisms (SNPs), the probability of resistance ranged from 0.4% for the silent mutations (Rv0678_C108T, Rv0678_T798G and atpE_G183A) to �95% for the nonsense mutation Rv0678_C400T and missense mutation atpE_G187C. However, some missense and nonsense mutations had a substantially lower median probability, while some silent mutations had a much higher probability of resistance (Tables 5 and 7). For SNPs, the probability of resistance ranged from 0.3% to 97.1% in atpE, from 0.4% to 96.9% in Rv0678, from 22.1% to 28.7% in pepQ, and from 2.9% to 29.6% in Rv1979c. For indels, the probability of resistance ranged from 10.0% to 98.2% in Rv0678, and from 18% to 27% in pepQ (Tables 6 and  7). The 95% CrIs were wide for almost all variants.
To overcome these challenges, we combined expert opinions and published genotype-phenotype data to estimate the Bayesian probability of BDQ resistance in the presence of a genomic variant. Our results showed that many experts were uncertain about the role of pepQ and Rv1979c variants in BDQ resistance and that experts overestimated the probability of resistance for most variant types, resulting in lower posterior (median) probabilities as compared to prior estimates. As expected, the posterior probability of BDQ resistance was low for synonymous mutations in atpE and Rv0678 (<5%) and relatively high for missense mutations in atpE (60.8%) and nonsense mutations in Rv0678 (55.1%). However, some missense and nonsense mutations had a substantially lower median probability, while some synonymous mutations had a much higher probability of resistance. This could either suggest that the presence of a specific type of mutation is not sufficient to determine the resistance phenotype or that the contradictory findings are a consequence of limited data availability. Surprisingly, the posterior probability of BDQ resistance was relatively low for missense mutations and frameshift in Rv0678 (�35%) and low for missense mutations in pepQ and Rv1979c (<3%). For specific SNPs and indels, the probability of resistance varied greatly between variants of the same type, even in the same gene, and 95% CrIs were wide due to sparse data.
In our analysis, the use of a Bayesian approach did not substantially improve the ability to predict bedaquiline resistance in the presence of specific variants. Nevertheless, we believe this is an important step forward for two reasons. First, the Bayesian approach is flexible and allows inclusion of different types of data. When, for example, data in the effect of variants on protein structure becomes available, this information could be used to improve the Bayesian prediction model. Second, for clinicians, the interpretation of a probability may be more intuitive than the interpretation of an odds ratio. For example, for variant Rv0678_A202G, the odds ratio of 3.8 (95% CI of 0.05-298.8) is interpreted as: the odds of resistance to BDQ for an Mtb isolate with a A202G variant is 3.8 times that of isolates without the A202G variant, independent of the presence of wild type or any other variant, and if we performed the analysis 100 times using different datasets (based on different samples), the estimate of the odds ratio would lie between the lower and upper limit of the constructed CI in 95% of the analyses. In contrast, the interpretation of the probability of resistance to BDQ for an Mtb isolate with a A202G variant derived from a posterior distribution with median (or mean) of 58.7% (95% CrI 16.1-92.8%) is more intuitive: the probability of resistance to BDQ for the isolate containing the A202G variant is estimated to be 58.7% and we are 95% certain that the true probability of resistance lies between 16.1% and 92.8%, given the data at hand.
The results of our study should be interpreted in light of some limitations. First, while the most accurate priors would be obtained by limiting the survey to the most knowledgeable experts, we deliberately extended the survey to obtain a broad and globally representative perspective. While this could have caused the wide range of opinions observed, the responses on the probability of resistance by mutation type in the Rv0678 and atpE genes were similar for experts with a laboratory versus clinical, epidemiological, or public health background. To limit the impact of extreme values in the survey responses, we used the median instead of the mean to summarize the prior probabilities. In the face of discrepancies, determining whether the prior or the data is correct is challenging [23]. Second, in the construction of the prior distributions, we did not account for the uncertainty around the estimated shape parameters of the beta distributions. In order to do so, one would need to specify hyperprior distributions for α and β thereby increasing posterior uncertainty further. Third, the published expert rules were developed for resistance to other drugs used to treat TB and may not fully apply to BDQ. For example, while Miotto et al assumed that nonsense and frameshift mutations can be predicted with confidence to cause a loss-of-function phenotype [19,25], we found that the posterior probability of BDQ resistance in the presence of a nonsense mutation in Rv0678 was only 55.1% and the probability of resistance for frameshift mutations in Rv0678 was only 29.9%. Fourth, the phenotype-genotype reported in studies included in the analysis may not always be accurate due to lack of standardization of BDQ pDST and the use of the current critical concentration to determine the BDQ resistance phenotype [24]. This may especially be a problem for mutations that confer a modest increase in BDQ MIC, which overlaps with the wild-type MIC distribution [41]. Fifth, to maximally use all reported data on BDQ resistance given a specific mutation, data on clinical and non-clinical isolates was combined and analyzed. Furthermore, the published data may not reflect the actual sampling distributions of the phenotype for a specific variant. Finally, posterior distributions can only be estimated for variants that have been reported in the literature. Given that the introduction of BDQ is recent, it is possible that many variants have not yet been observed or reported in the literature. The strength of the Bayesian approach is that it also provides a resistance probability for different variant types in the candidate resistance genes, which can give guidance to the clinician when confronted with a novel variant.
Probabilistic approaches to predict phenotypic resistance to BDQ from genotypic data could be an alternative approach to the standard statistical methods and help guide treatment decisions as it presents physicians with more intuitively interpretable probabilities (and 95% CrI) as compared to odds ratios (and corresponding 95% CI). For a variant not previously reported, the probability of resistance for a type of variant in a specific gene can still be used to guide clinical decision making. Since a probabilistic approach is new for clinicians, studies should investigate how feasible it is for physicians to use probabilities of BDQ resistance in clinical practice.