Compensatory mutations are associated with increased in vitro growth in resistant clinical samples of Mycobacterium tuberculosis

Mutations in Mycobacterium tuberculosis associated with resistance to antibiotics often come with a fitness cost for the bacteria. Resistance to the first-line drug rifampicin leads to lower competitive fitness of M. tuberculosis populations when compared to susceptible populations. This fitness cost, introduced by resistance mutations in the RNA polymerase, can be alleviated by compensatory mutations (CMs) in other regions of the affected protein. CMs are of particular interest clinically since they could lock in resistance mutations, encouraging the spread of resistant strains worldwide. Here, we report the statistical inference of a comprehensive set of CMs in the RNA polymerase of M. tuberculosis, using over 70 000 M. tuberculosis genomes that were collated as part of the CRyPTIC project. The unprecedented size of this data set gave the statistical tests more power to investigate the association of putative CMs with resistance-conferring mutations. Overall, we propose 51 high-confidence CMs by means of statistical association testing and suggest hypotheses for how they exert their compensatory mechanism by mapping them onto the protein structure. In addition, we were able to show an association of CMs with higher in vitro growth densities, and hence presumably with higher fitness, in resistant samples in the more virulent M. tuberculosis lineage 2. Our results suggest the association of CM presence with significantly higher in vitro growth than for wild-type samples, although this association is confounded with lineage and sub-lineage affiliation. Our findings emphasize the integral role of CMs and lineage affiliation in resistance spread and increase the urgency of antibiotic stewardship, which implies accurate, cheap and widely accessible diagnostics for M. tuberculosis infections to not only improve patient outcomes but also prevent the spread of resistant strains.

Table S1: Median growth of resistant samples with specific resistance mutations. Mann-Whitney p-values are given with respect to the subscripted sample type and n indicates the sample size. Growth distributions of these samples are shown in Figure 2.
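The Mann-Whitney comparison of two growth distributions referred to in these tables can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a two-sided test with a normal approximation and tie correction (no continuity correction), and the function name `mann_whitney_u` and the example values are hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of x, p-value). Assumption: no continuity correction.
    """
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    # Assign midranks: tied values share the average of the ranks they occupy.
    counts = Counter(x + y)
    rank_of = {}
    start = 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = start + (c - 1) / 2
        start += c

    rank_sum_x = sum(rank_of[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Mean and tie-corrected variance of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    tie_term = sum(c**3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference

    z = (u_x - mean_u) / var_u**0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p_value


# Hypothetical growth values (percent of covered well area), for illustration only.
resistant_growth = [1, 2, 3]
susceptible_growth = [4, 5, 6]
u_stat, p_val = mann_whitney_u(resistant_growth, susceptible_growth)
```

For the small sample sizes in this toy example an exact test would be preferable; the normal approximation is more appropriate for the large sample sizes reported in the tables.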
Table S2: Reference CMs used for evaluating our approach to identifying new CMs. Putative CMs are shown on the left. We included reference CMs that were either proven experimentally or identified in at least three of the reference papers. Fisher indicates if the CM came up as significantly resistance associated in our statistical association test and was located below the heuristic p-value threshold and showed homoplasy.

Table S6: Lineage-wise median growth of samples with different compensatory mutations compared to pan-susceptibles and samples with only resistance. The confidence interval (CI) for the median is calculated using bootstrapping where 'CI low' indicates the lower threshold and 'CI high' the upper threshold. P-values are given with respect to resistant (p-value_r) and pan-susceptible sample growth (p-value_s) and n indicates the sample size.

Figure S1: Growth distributions for pan-susceptible samples vs samples with specific rifampicin (RIF) resistance mutations in M. tuberculosis. (A-C) Distributions of growth in percent of covered well-area as measured in the CRyPTIC project 9 were plotted as a histogram against the proportion of samples that display this amount of growth. Samples with the resistance mutation indicated in the legend and no other potentially interfering mutations are plotted in red, samples that were classified as pan-susceptible are plotted in green. Vertical lines indicate the respective medians. The medians and Mann-Whitney p-values of the distributions are listed in Table 1.

Figure S2: Sensitivity and number of significant hits (putative compensatory mutations) depending on p-value. The graph shows the number of significant hits and reference hits detected depending on the log10 p-value cut-off shown on the x-axis. The left y-axis refers to the percentage of found reference hits from a compiled list, also termed sensitivity or true positive rate (TPR). The right y-axis shows the number of mutations that were classified as significantly resistance associated under the respective cut-off. The vertical lines indicate the p-value cut-off with Bonferroni correction and our heuristic p-value cut-off at the 98% quantile, respectively.
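The two quantities plotted against the cut-off can be sketched as below. This is an illustrative sketch under stated assumptions: the significance level of 0.05, the helper names `bonferroni_cutoff` and `sensitivity_and_hits`, and all mutation names and p-values are hypothetical and not taken from the study. The heuristic 98% quantile cut-off is not reproduced here because its exact definition is not given in this caption.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Bonferroni-corrected p-value cut-off: alpha divided by the number of tests."""
    return alpha / n_tests


def sensitivity_and_hits(p_values, reference, cutoff):
    """Return (sensitivity, number of significant hits) for a given cut-off.

    p_values:  dict mapping mutation name -> p-value of the association test
    reference: set of reference CM names used to evaluate the approach
    """
    hits = {mutation for mutation, p in p_values.items() if p <= cutoff}
    found_reference = hits & reference
    # Sensitivity (TPR): fraction of reference CMs recovered as significant.
    tpr = len(found_reference) / len(reference) if reference else float("nan")
    return tpr, len(hits)


# Hypothetical example data, for illustration only.
example_p_values = {"mutA": 1e-12, "mutB": 1e-6, "mutC": 0.02, "mutD": 0.5}
example_reference = {"mutA", "mutC"}

cutoff = bonferroni_cutoff(0.05, len(example_p_values))
tpr, n_hits = sensitivity_and_hits(example_p_values, example_reference, cutoff)
```

Evaluating `sensitivity_and_hits` over a grid of cut-offs would give the two curves described in the caption.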

Figure S3: Location of two high-confidence compensatory mutations (CMs) on the RNA polymerase (RNAP). (A) The CM G332S is located on the β′ subunit, in a contact region to the β subunit (magenta). The change from Glycine (stick representation) to Serine (negatively charged side chain) might enable an interaction with the close-by Arginine (positively charged side chain) on the β subunit. The bound drug rifampicin (light blue) can be seen in the background. (B) The CM G433S is located on the β′ subunit, in a contact region to the β subunit. The change from Glycine (stick representation) to Serine might enable an interaction with the close-by Histidine (positively charged side chain) on the β subunit.

Figure S4: Growth distributions of M. tuberculosis samples within different lineages. (A) Distributions of growth in M. tuberculosis Lineage 1 in percent of covered well-area as measured in the CRyPTIC project 9 were plotted as a histogram against the proportion of samples that display this amount of growth. Samples with rifampicin (RIF) resistance mutations but no putative compensatory mutations (CMs) are plotted in red, samples that were classified as pan-susceptible are plotted in green. Samples that have RIF resistance mutations and at least one CM are shown in blue. Vertical lines indicate the respective medians. The medians and Mann-Whitney p-values of the distributions are shown in Supplementary Table S5. (B) Plot layout as in (A), but samples derive from M. tuberculosis Lineage 2. (C) Plot layout as in (A), but samples derive from M. tuberculosis Lineage 3. (D) Plot layout as in (A), but samples derive from M. tuberculosis Lineage 4.

Table S3: Hit list resulting from Fisher's exact test for association of resistance with co-occurring mutations, after removing synonymous mutations. The first column indicates the resistance mutation that the putative compensatory mutation (CM) in the second column is associated with. 'Only CM' indicates how often the CM occurs on its own, without the corresponding resistance mutation, and 'both' indicates how often we see the two mutations occur together. The last two columns indicate if the CM has been mentioned in the literature and if it shows homoplasy, respectively.
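As an illustration of how the 'only CM' and 'both' counts enter the association test, the following sketch computes a Fisher's exact test p-value from a 2x2 contingency table. It assumes a one-sided test for enrichment of co-occurrence, which may differ from the test variant used in the study; the function name, argument names and counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(both, only_cm, only_res, neither):
    """One-sided Fisher's exact test for enrichment of co-occurrence.

    2x2 table of sample counts:
                      resistance    no resistance
        CM present    both          only_cm
        CM absent     only_res      neither

    Returns P(co-occurrence count >= both) under the hypergeometric null
    with fixed row and column totals.
    """
    n_cm = both + only_cm            # samples carrying the putative CM
    n_res = both + only_res          # samples carrying the resistance mutation
    n_total = both + only_cm + only_res + neither

    total_tables = comb(n_total, n_res)
    max_both = min(n_cm, n_res)
    tail = sum(
        comb(n_cm, k) * comb(n_total - n_cm, n_res - k)
        for k in range(both, max_both + 1)
    )
    return tail / total_tables


# Hypothetical counts, for illustration only.
p_example = fisher_exact_greater(both=3, only_cm=1, only_res=1, neither=3)
```

The calculation uses exact integer arithmetic from `math.comb`, so it stays accurate for large counts, although a dedicated statistics library would be faster on tens of thousands of genomes.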

Table S4: Median growth of resistant samples with compensatory mutations compared to pan-susceptible samples and samples with only resistance mutations. The confidence interval (CI) for the median is calculated using bootstrapping where 'CI low' indicates the lower threshold and 'CI high' the upper threshold. P-values are given with respect to resistant (p-value_r) and pan-susceptible sample growth (p-value_s) and n indicates the sample size.
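A bootstrapped confidence interval for the median, as reported in the 'CI low' and 'CI high' columns, can be sketched as follows. This is a minimal sketch assuming the percentile bootstrap; the number of resamples, the 95% level, the seed, the function name `bootstrap_median_ci` and the example values are assumptions for illustration and are not specified in the caption.

```python
import random
import statistics


def bootstrap_median_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median.

    Resamples `values` with replacement, computes the median of each
    resample, and returns the (lower, upper) percentiles of those medians.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Indices of the lower and upper percentile, clamped to the valid range.
    lo_idx = max(0, int((1 - level) / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 + level) / 2 * n_resamples) - 1)
    return medians[lo_idx], medians[hi_idx]


# Hypothetical growth values (percent of covered well area), for illustration only.
example_growth = [12.0, 15.5, 9.8, 20.1, 14.3, 17.6, 11.2, 13.9, 16.4, 18.0]
ci_low, ci_high = bootstrap_median_ci(example_growth)
```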

Table S5: Median growth of pan-susceptible samples from different M. tuberculosis lineages. The confidence interval (CI) for the median is calculated using bootstrapping where 'CI low' indicates the lower threshold and 'CI high' the upper threshold. P-values are given with respect to each lineage, indicated by the subscript x (p-value_x) and n indicates the sample size.