Expression Dysregulation as a Mediator of Fitness Costs in Antibiotic Resistance

ABSTRACT Antimicrobial resistance (AMR) poses a threat to global health and the economy. Rifampicin-resistant Mycobacterium tuberculosis accounts for a third of the global AMR burden. Gaining the upper hand on AMR requires a deeper understanding of the physiology of resistance. AMR often results in a fitness cost in the absence of drug. Identifying the molecular mechanisms underpinning this cost could help strengthen future treatment regimens. Here, we used a collection of M. tuberculosis strains that provide an evolutionary and phylogenetic snapshot of rifampicin resistance and subjected them to genome-wide transcriptomic and proteomic profiling to identify key perturbations of normal physiology. We found that the clinically most common rifampicin resistance-conferring mutation, RpoB Ser450Leu, imparts considerable gene expression changes, many of which are mitigated by the compensatory mutation in RpoC Leu516Pro. However, our data also provide evidence for pervasive epistasis—the same resistance mutation imposed a different fitness cost and functionally distinct changes to gene expression in genetically unrelated clinical strains. Finally, we report a likely posttranscriptional modulation of gene expression that is shared in most of the tested strains carrying RpoB Ser450Leu, resulting in an increased abundance of proteins involved in central carbon metabolism. These changes contribute to a more general trend in which the disruption of the composition of the proteome correlates with the fitness cost of the RpoB Ser450Leu mutation in different strains.

1 Identifying the basis of RpoB Ser450Leu fitness cost.
As we point out in the main text, we expect the physiological changes associated with the fitness cost of RpoB Ser450Leu to manifest as deviations in gene expression. Because the secondary mutation RpoC Leu516Pro has a compensatory role, we expect that comparative analysis will allow us to define the subset of expression changes most relevant to understanding the fitness cost of rifampicin resistance. We used the global profiling tools RNAseq and SWATH-MS to achieve this. [...] which impacts the protein compartment in a more idiosyncratic way.

Defining the signature of compensation
Next, we used differential expression analysis to identify the functional consequences of rifampicin resistance. Based on the data overview, we expected that an RNAP mutation would have a pleiotropic effect, albeit one of small magnitude.
We envisaged that only a subset of all expression differences specific to RifR would coherently point to a likely biological basis for the reduced growth rate. We hypothesised that this subset would be characterised by a reversal of RpoB Ser450Leu-mediated dysregulation through the phenotypic effect of RpoC Leu516Pro. We named this trend a signature of compensation (see Figure 2B in the main text), and we derived it by identifying genes that are uniquely differentially expressed in RifR compared to the other three strains in our dataset. To maximise the probability of identifying the signature of compensation, we chose an inclusive definition of differential expression: an adjusted p-value of less than 0.05 for the negative binomial or linear mixed models for transcriptomic and proteomic data, respectively (see Methods in the main text). In keeping with our inclusive approach, we deliberately did not use an effect size threshold (e.g. a minimum log-fold change). Using these criteria, we identified 536 transcripts that could be involved in the cost of resistance: 289 transcripts were less abundant and 247 were more abundant in RifR compared to the other samples (see Figure 2A in the main text and Supplementary Figure 2). To assess the probability of detecting this many differentially expressed genes by chance, we scrambled the sample labels and repeated the comparison 1,000 times. Of the simulated comparisons, 70% did not lead to the identification of any differentially expressed genes, and the empirical 95% confidence interval spanned 0-95 significantly differentially expressed genes. The observed 536 genes represented the most extreme outcome among all comparisons. The pattern was similar for the proteomic data: we found 536 proteins that showed a significant signature of compensation in RifR (260 more and 276 less abundant proteins, see Supplementary Figure 3), and 92.8% of the permutation iterations yielded no significant changes.
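The label-scrambling procedure described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the per-gene call here is a simple Welch t-statistic cutoff standing in for the model-based adjusted p-value, and the function names (`count_de_genes`, `permutation_null`), the cutoff `t_cutoff`, and the toy expression values are all assumptions made for illustration.

```python
import random
import statistics


def count_de_genes(expr, labels, target="RifR", t_cutoff=4.0):
    """Count genes whose |Welch t| between target and all other samples exceeds t_cutoff.

    Assumption: this cutoff is a stand-in for an adjusted p-value < 0.05
    obtained from a negative binomial or linear mixed model.
    """
    n_hits = 0
    for values in expr:
        a = [v for v, lab in zip(values, labels) if lab == target]
        b = [v for v, lab in zip(values, labels) if lab != target]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        if se > 0 and abs(statistics.mean(a) - statistics.mean(b)) / se > t_cutoff:
            n_hits += 1
    return n_hits


def permutation_null(expr, labels, n_perm=1000, seed=0, **kwargs):
    """Scramble sample labels n_perm times and recount 'significant' genes each time."""
    rng = random.Random(seed)
    observed = count_de_genes(expr, labels, **kwargs)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_counts.append(count_de_genes(expr, shuffled, **kwargs))
    # Empirical p-value: share of permutations at least as extreme as the observed count.
    p_emp = (1 + sum(c >= observed for c in null_counts)) / (n_perm + 1)
    return observed, null_counts, p_emp


# Toy data (hypothetical): 3 RifR samples followed by 9 samples from other strains.
labels = ["RifR"] * 3 + ["other"] * 9
shifted_gene = [10.0, 10.1, 9.9] + [1.0, 1.1, 0.9] * 3   # differs in RifR only
flat_gene = [1.0, 1.1, 0.9] * 4                          # no strain effect
expr = [shifted_gene] * 5 + [flat_gene] * 20

observed, null_counts, p_emp = permutation_null(expr, labels, n_perm=200)
print(observed, max(null_counts), round(p_emp, 4))
```

The distribution of `null_counts` plays the role of the simulated comparisons in the text: its quantiles give an empirical interval for the number of genes expected by chance, against which the observed count can be placed.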

Estimating the impact of compensation
The impact of individual mutations on the overall expression profile of a bacterial strain can be estimated by fitting a linear model to the log of the fold changes of treatments of interest [1].

Rose et al. [11] previously showed that the genetic distance between two Mtb strains correlates with their baseline gene expression. This phenomenon might also underlie the differences we observed in our system. Starting from this possibility, we expected that correlating rpoB mutant-imposed expression differences with the corresponding genetic distances could shed light on the forces that impact gene expression in drug-resistant strains.
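As an illustration of the linear-model idea mentioned at the start of this section, the sketch below fits ordinary least squares to the log2 expression of a single gene, using indicator variables for the presence of each mutation. The design (susceptible ancestor, RpoB Ser450Leu, and RpoB Ser450Leu plus RpoC Leu516Pro, two replicates each), the expression values, and the helper names `solve_linear_system` and `fit_ols` are assumptions for illustration only and do not reproduce the model of [1].

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Design columns (hypothetical): intercept, has rpoB mutation, has rpoC mutation.
design = [
    [1, 0, 0], [1, 0, 0],   # susceptible ancestor
    [1, 1, 0], [1, 1, 0],   # RpoB Ser450Leu
    [1, 1, 1], [1, 1, 1],   # RpoB Ser450Leu + RpoC Leu516Pro
]
log2_expr = [5.0, 5.2, 6.1, 5.9, 5.2, 5.0]   # made-up log2 abundances for one gene

intercept, rpob_effect, rpoc_effect = fit_ols(design, log2_expr)
print(intercept, rpob_effect, rpoc_effect)
```

In this toy gene the coefficient for the rpoC indicator has the opposite sign to the coefficient for the rpoB indicator, which is the pattern one would expect for a gene contributing to a signature of compensation.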

Genetic distance is independent from expression distance
We hypothesised that a positive correlation between genetic distance and expression distance would provide compelling evidence that the response to rpoB mutations is modulated by the constellation of mutations that a strain accumulated over its evolutionary history. If true, this result would support the notion that strains belonging to specific lineages or sub-lineages respond to resistance mutations in a common way, with the consequence that some backgrounds are more likely to develop resistance, as has been suggested for the Lineage 2-Beijing family of strains [13;14]. The absence of correlation, on the other hand, could mean either that the impact of the rpoB mutation is independent of genetic distance, or that resistance-driven changes in gene expression are modulated by mutations acquired more recently in evolutionary time, making the perturbation of each strain unique. We can distinguish between the two by looking at the variability in expression distance: in the former case we would expect the variance to be low, while in the latter we would expect a large amount of heterogeneity.

In order to test our hypothesis, we first needed to define a metric of expression distance. We complemented our earlier approach, based on the overlap of differentially expressed genes, by calculating the Hamming distance across genes between strain pairs. We also used a fundamentally different measure of similarity that is rooted in expres-

Figure 1: Relationship between bacterial fitness, measured as growth in vitro, and transcriptional activity, expressed as the relative rate of transcript elongation. In both cases, the wild type strain is used as reference and the parameters for different RNAP mutants are expressed as a proportion thereof. E. coli data were obtained from [5], P. aeruginosa from [6]. M. tuberculosis fitness data were obtained from [7], and the relative transcription rate from [8]. In the case of E. coli and P. aeruginosa transcriptional efficiency is mea-