Combining structure and genomics to understand antimicrobial resistance

Antimicrobials against bacterial, viral and parasitic pathogens have transformed human and animal health. Nevertheless, their widespread use (and misuse) has led to the emergence of antimicrobial resistance (AMR) which poses a potentially catastrophic threat to public health and animal husbandry. There are several routes, both intrinsic and acquired, by which AMR can develop. One major route is through non-synonymous single nucleotide polymorphisms (nsSNPs) in coding regions. Large scale genomic studies using high-throughput sequencing data have provided powerful new ways to rapidly detect and respond to such genetic mutations linked to AMR. However, these studies are limited in their mechanistic insight. Computational tools can rapidly and inexpensively evaluate the effect of mutations on protein function and evolution. Subsequent insights can then inform experimental studies, and direct existing or new computational methods. Here we review a range of sequence and structure-based computational tools, focussing on tools successfully used to investigate mutational effect on drug targets in clinically important pathogens, particularly Mycobacterium tuberculosis. Combining genomic results with the biophysical effects of mutations can help reveal the molecular basis and consequences of resistance development. Furthermore, we summarise how the application of such a mechanistic understanding of drug resistance can be applied to limit the impact of AMR.


Antimicrobial resistance (AMR)
Drugs against bacterial, viral and parasitic pathogens have truly revolutionised modern medicine, transforming human health and saving millions of lives. This transformation, however, is under threat due to emerging and widespread resistance to these drugs [1]. This threat is termed antimicrobial resistance (AMR), and is a natural and expected consequence of the Darwinian principle of "survival of the fittest". Almost all antimicrobial drugs have seen resistance arise within 5-10 years of their introduction [2]. AMR poses a catastrophic public health threat: it is responsible for over 700,000 deaths annually [3], prolonged hospital stays, poor disease outcomes, less effective treatments, and potentially untreatable diseases. Considering antibiotic resistance alone, the toll is predicted to rise above 10 million deaths per year by 2050 if left unchecked. The associated global economic burden is estimated at 100 trillion USD [3].
The disease burden of AMR has been accelerated by the overuse and misuse of antimicrobials in health, animal and agricultural industries. This burden is further compounded by a lack of market incentives for antimicrobial drug development [3]. Nearly all major infectious diseases are affected by either prevailing or emerging resistance. For example, it is estimated that people with MRSA (Methicillin-Resistant Staphylococcus aureus) are 64% more likely to die than people with a non-resistant form of the infection [1]. Similarly, resistance to artemisinin-based combination therapy, the first-line treatment for malaria caused by Plasmodium falciparum (P. falciparum), has been confirmed in 5 countries in the Greater Mekong Region in 2016 [1]. Likewise, in 2010, an estimated 7-15% of patients starting antiretroviral therapy (ART) in developing countries had drug-resistant HIV, with up to 40% resistance observed in patients re-starting treatment [1].
Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), is a major global health problem, with increasing drug resistance making disease control difficult [4]. In 2017, 558,000 cases of rifampicin-resistant TB were reported, among which 82% had additional resistance to isoniazid, leading to multidrug-resistant TB (MDR-TB). Among these MDR cases, ~9% were further resistant to one fluoroquinolone and one injectable second-line drug, leading to extensively drug-resistant TB (XDR-TB) [5,6].
Resistance is attributed to multiple factors including selective pressure on Mtb from repeated exposure to the same antibiotic, a lack of access to new therapies, and patient non-compliance due to long treatment regimens and drug toxicity effects [7,8]. Both phenotypic and genotypic routes are involved in the development of Mtb resistance. While epigenetic changes and post transcriptional modifications drive the phenotypic route to resistance [9,10], the genetic route is chiefly acquired via accumulation of mutations in the absence of horizontal gene transfer. Resistance-associated point mutations have been described across all anti-TB drugs, including newer ones (fluoroquinolones, bedaquiline) [11,12].

Drivers of AMR
The drivers of AMR can be either intrinsic or acquired. Intrinsic resistance refers to the innate mechanisms present within microbes to combat the action of drugs, and is considered to be independent of previous drug exposure. Intrinsic mechanisms include:
(i) the presence of an additional impermeable outer membrane in Gram negative bacteria, making them naturally resistant to antibiotics that target cell wall synthesis such as vancomycin [13].
(ii) the presence of enzymes that either prevent drug binding within an organism, or destroy the drug. An example of the former is the low binding affinity of the penicillin-binding proteins (PBPs) of Gram positive bacteria, which are required for the synthesis of peptidoglycan in the cell wall, making these bacteria naturally resistant to the β-lactam antibiotic aztreonam. An example of the latter is the production of β-lactamase by Gram negative bacteria, which destroys β-lactam antibiotics before they can reach their PBP targets [14].
(iii) the presence of multi-drug efflux pumps, which are complex bacterial molecular machines capable of removing drugs and toxic compounds from the cell. For example, efflux-mediated resistance to tetracycline relies on the Tet efflux pumps, which use proton exchange as their energy source to expel the antibiotic [15].
(iv) the lack of enzymes or metabolic pathways in aerobic bacteria to chemically reduce the drug metronidazole to its active form [13].
(v) the co-evolution of microbes with their surroundings containing a variety of toxic and benign molecules and compounds, which is commonly observed in environmental microbes. For example, soil-dwelling actinomycetes harbour an intrinsic 'resistome' to the many antibiotics they produce [16,17].
(vi) the phenomenon of bacterial persistence, notably observed in asymptomatic and chronic infections such as typhoid and TB. Persisters are a sub-population of antibiotic-tolerant cells that exhibit low metabolic activity and arrested growth, contributing to increased drug tolerance and resistance [18].
Acquired drug resistance is typically driven by genetic variation including point mutations (missense mutations or nonsynonymous single nucleotide polymorphisms; nsSNPs) and insertions/deletions (INDELs) such as frameshift mutations. Such mutations can alter drug activation, binding affinity and permeability, efflux pump activity, and biofilm formation [19]. Furthermore, horizontal gene transfer (HGT), also called lateral gene transfer (LGT), has been a significant cause of widespread drug resistance. HGT/LGT is found almost exclusively in bacteria, where resistance-conferring genes are transferred between bacterial species [20,21]. Although the two routes of resistance are distinct, intrinsic mechanisms may be driven by adaptive/acquired routes. For example, the efficacy of drug efflux pumps in Mtb is modulated by SNP mutations [22,23]. The drivers of AMR and the various mechanisms beyond point mutations (which form the focus of this review) have been extensively reviewed elsewhere: antibiotic resistance [13,14], antifungal resistance [24][25][26], antiviral resistance [27,28] and antiparasitic drug resistance [29][30][31].

Point mutations linked to AMR
A major route to AMR is driven by point mutations. For example, in Mtb, mutations in several genes have been associated with resistance to rifampicin (rpoB), isoniazid (katG, inhA and ahpC), streptomycin (gidB, rrs and rpsL), pyrazinamide (pncA), ethambutol (embB) and fluoroquinolones (gyrA and gyrB). More generally, mutations within gyrA confer low-level fluoroquinolone resistance in Gram negative bacteria, while additional mutations in parC and gyrB are responsible for high-level resistance [32]. Ribosomal mutations affecting ribosome assembly are particularly problematic since these lead to large scale transcriptomic and proteomic changes. In Mycobacterium smegmatis, such mutations have led to downregulation of KatG catalase (activating enzyme for the drug isoniazid) and upregulation of the transcription factor WhiB7 involved in innate antibiotic resistance. Further, the fitness cost of these mutations is alleviated in a multi-drug environment which promotes the evolution of high-level, target-based resistance [33]. Antiviral resistance is mainly an adaptive process, chiefly driven by mutations [27]. In the case of antiretrovirals used in HIV treatment, the primary mechanism of resistance to most Nucleoside Reverse Transcriptase Inhibitors (NRTIs) is through accumulation of mutations near the drug binding site [34]. In Hepatitis B virus (HBV), multiple missense point mutations have been linked to resistance to several drugs, along with cross resistance observed between drugs [35]. Point mutations in the preS/S region are associated with vaccine failure, immune escape, occult HBV infection and the occurrence of hepatocellular carcinoma (HCC). Similarly, nsSNPs in the preC/C region are related to HBeAg negativity, immune escape, and persistent hepatitis, while those in the X region are implicated in promoting HCC [36]. Likewise, antifungal resistance in Aspergillus fumigatus is also primarily driven by mutations in the azole target cyp51A gene [37], while resistance to artemisinin in P. falciparum malaria is driven by multiple mutations in the Kelch 13 (K13) propeller protein.

Genomics to identify point mutations linked to AMR
High-throughput genomic platforms based on next generation sequencing (NGS) technologies, such as whole genome sequencing (WGS), together with genotyping arrays, have enabled large scale investigations of AMR to identify resistance-determining genetic variants such as SNPs, INDELs, copy number variations, and frameshift mutations [38][39][40][41][42][43]. Genetic variants, in particular SNPs, have been implicated in drug resistance by several studies [44][45][46][47]. Building on human complex disease applications [48][49][50], genome-wide association studies (GWASs) have been applied to reveal genotype-AMR phenotype associations at a locus or variant level. Furthermore, GWAS regression models allow the estimation of mutation or genotype effect sizes (e.g. odds ratios). Examples of GWAS analysis in the context of AMR include studies of Burkholderia multivorans [51], Mtb [11,52,53], severe malaria [50] and fungal pathogens [54].
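To make the notion of an effect size concrete, the following minimal sketch computes an unadjusted odds ratio, with an approximate 95% Wald confidence interval, for a single variant from a 2x2 table of isolate counts. It is an illustration only: published GWAS analyses typically use regression models that also account for population structure and lineage, and the function name, argument names and the Haldane-Anscombe correction used for zero cells are assumptions made for this example.

```python
import math


def odds_ratio(res_mut, res_wt, sus_mut, sus_wt, correction=0.5):
    """Unadjusted odds ratio and 95% Wald interval for one variant.

    res_mut / res_wt: resistant isolates with / without the mutation.
    sus_mut / sus_wt: susceptible isolates with / without the mutation.
    If any cell is zero, `correction` is added to every cell
    (Haldane-Anscombe correction) so the ratio stays finite.
    """
    cells = [res_mut, res_wt, sus_mut, sus_wt]
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    a, b, c, d = cells
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - 1.96 * se)
    upper = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from any study.
    print(odds_ratio(40, 10, 20, 80))
```

An odds ratio well above 1 with an interval excluding 1 suggests that the mutation is enriched among resistant isolates, although it says nothing about the mechanism involved.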
Bioinformatic approaches exploiting output from WGS technologies and GWAS analyses have enabled AMR prediction and surveillance. Leveraging this wealth of information has enabled novel applications of artificial intelligence and machine learning (AI/ML) in the pan-genome identification of resistance genes, pathways, mechanisms [55][56][57][58], as well as resistance prediction [59][60][61]. Bioinformatic approaches have also been used to identify novel drug targets like Inositol-3-phosphate synthase (I3PS) in Mtb, opening up new avenues in TB drug discovery [62].
Despite the immense utility provided by genomic analysis, these methods lack the mechanistic underpinning required to develop robust prediction tools [63], necessitating follow-up functional studies [64]. To strengthen genomic analysis, it is important to supplement genomic associations with the functional consequences of mutations on drug targets. One way to achieve this is via biophysical assessment of the effects of mutations on drug-target structure and interactions.

Biophysical consequences of point mutations on protein structure
The biophysical consequences of protein mutations are mainly studied by assessing thermodynamic stability, which is often used as a proxy for function [65]. This relationship has been clearly demonstrated in the evolution of influenza nucleoprotein which appears to be constrained to avoid low-stability sequences [66]. The synergy between the fields of protein biophysics and protein evolution helps contextualise and rationalise concepts of thermodynamic stability, mutational robustness, evolvability and epistasis in resistance development [67][68][69]. Missense mutations resulting in a change in the amino acid may disrupt downstream function by altering protein stability and its associated interactions [70]. For example, three missense point mutations within the Mtb gidB gene lead to gidB mutants with lower thermodynamic stability and higher flexibility, considered to be a major driving factor in the emergence of high-level streptomycin resistance [71]. Equally, structural insights into the stability-function relationship have highlighted the rationale for such a trade-off in the development of antibiotic resistance [72].

Using structure to understand impact of point mutations linked to AMR
Structural consequences of point mutations can provide functional insights for resistance phenotypes. For example, point mutations in the Penicillin-Binding Proteins confer resistance to β-lactam antibiotics by making the active site amenable to hydrolysis, or reducing binding affinity for the antibiotic [73]. Structure guided design demonstrated the potential of boronate-based PBP inhibitors to overcome β-lactam resistance in Gram positive organisms [74]. Similarly, missense mutations in the Mtb gidB gene (target for the antibiotic streptomycin) are responsible for drug resistance through distortion of the binding pocket affecting SAM (co-factor) binding [71]. Likewise, mutations in the Mtb pncA gene (target for the pro-drug pyrazinamide) are responsible for the loss of enzyme activity [75]. The underlying mechanisms of mutations in the gidB gene conferring low and high-level streptomycin resistance in Mtb were found to be associated with distortion in the active site morphology by proximal and distal residues affecting the overall structure [76]. Further, the prominent mutation H275Y within the neuraminidase enzyme of the H1N1 pandemic strain renders the drug oseltamivir ineffective due to distortion in the binding pose of the drug within the active site [77]. Structural analysis of C580Y and R539T mutations in the K13 propeller gene (associated with artemisinin resistance) in P. falciparum malaria revealed local conformational disruption in the mutant and two solvent-exposed patches at conserved sites affecting protein-protein interactions [78]. Structural insights can aid in the absence of phenotypic data [79] as well as provide a physical basis for a more comprehensive understanding of mutational impact on the underlying biological mechanisms. Therefore, computational tools measuring the biophysical effects of resistance-linked mutations can aid mechanistic understanding and inform functional studies. Understanding mutational consequences with respect to global (drug-target structure) and local (protein-ligand, protein-protein and protein-nucleic acid) stability effects [80] can be further extended to predict drug resistance for novel mutations [81,82].
Here, we review several of the principal computational tools and methods currently available for measuring mutational consequences, focusing on those tools which have been used to analyse variation within a pathogen genome and their application in the context of AMR. It is not meant to be an exhaustive list, with other tools available centred on important questions like assessing cancer variations and other human mutations. As such, these go beyond the scope of this review and have been extensively reviewed elsewhere [83][84][85].
Different tools can be used to describe the effect of mutation on protein function, which may provide an explanation for the AMR phenotype. Some are primarily based on conservation or substitution matrices, and do not require a protein structure as input (Sequence-based methods). Others consider the local environment of the variant within the protein structure in their calculation (Structure-based methods). In the presence of a known AMR-related phenotype, these tools are useful as they provide mechanistic insight which may explain how resistance is brought about at the protein level. Therefore, when analysing specific proteins, it can be beneficial to use different methodologies, as different strategies may give complementary information. Summaries of the types of methods are given below and represent some of the principal tools currently available. Table 1 summarises the main features of some of the currently available tools for analysing effects of pathogen mutations.

Sequence-based methods
As these methods rely solely on the gene or protein sequence, they are often useful in the absence of a known protein structure or when homology modelling is not possible. The predictions from these tools are generally based on sequence alignments, predicted secondary structures and subsequent conservational trends. Most methods determine a score with cut-offs leading to functional classification of mutations into deleterious or neutral. This functional classification is not always applicable to AMR mutations, as variants may be 'deleterious' to protein conservation, but gain-of-function through survival in the presence of drug. For example, when analysing rifampicin resistant Mtb mutations we found that they tended to cluster within more conserved regions of the rpoB gene [80] (Portelli and Ascher, personal communication). Similar analysis carried out on pyrazinamide [82] and bedaquiline [81] revealed that known resistant Mtb mutations were more likely to lead to deleterious effects compared to susceptible variants in the same gene [100]. However, when measuring mutational tolerance [101], strong evidence of positive selection for resistant mutations was observed. Therefore, the utility of these tools in understanding AMR mechanisms lies in the actual scores, where a comparison of different scores across variants, accounting for their genetic position, can uncover important underlying mechanisms and trends related to evolutionary conservation. We have previously shown that this sequence information is also complementary to structural information, particularly within the context of machine learning [102]. Several of the major methods which are applicable across pathogens and human genomes are:

a. SIFT
SIFT (Sorting Intolerant From Tolerant) can be used to analyse missense mutations and INDELs. The SIFT scoring method combines sequence alignment with a position-specific scoring matrix (PSSM), which accounts for the likelihood of an amino acid occurring at a specific position. The amino acid chemical properties are also incorporated to determine a scaled probability of the mutation (SIFT score), on which the output (tolerated or deleterious) is based [100]. SIFT has been used to build the Variant Effect Predictor (VEP) tool developed as part of the Ensembl 2018 project [103].

b. PROVEAN
PROVEAN (Protein Variant Effect Analyzer) is able to account for (multiple) missense mutations and INDELs. It uses the BLOSUM62 substitution matrix as an amino acid probability matrix and combines this with differences in sequence similarity between wildtype and mutant sequences. The sequence context in which variation occurs is also considered, to represent environmental surroundings and effects. A numerical score is generated for each variant, which enables the functional classification into deleterious or neutral [104]. PROVEAN scores have provided the evolutionary basis for the recently deployed web-based tool SUSPECT-PZA [82] which predicts pyrazinamide (PZA) resistance mutations in the Mtb pncA gene.

c. SNAP2
SNAP2 (Screening for Non-Acceptable Polymorphisms v.2) characterises the effect of all possible missense mutations as either neutral or deleterious. It is a machine learning-based predictor built on neural networks. It also accounts for amino acid position probabilities using position-specific independent counts, based on the BLOSUM62 matrix. This predictor considers other features such as protein fold (Pfam, PROSITE) and functional annotations (SWISS-PROT) during training, and as such is the tool that spans the most comprehensive feature space [105]. As well as forming part of the SUSPECT-PZA tool [82], SNAP2 scores have provided the evolutionary basis for a similar tool called SUSPECT-BDQ [81]. This tool predicts the effect of missense mutations on resistance to the anti-TB drug bedaquiline, which is reserved to treat MDR and XDR TB.

d. ConSurf
ConSurf estimates an evolutionary rate score for every position across the sequence, unlike the tools above which base functional classification on score thresholds. In the context of drug resistance, it can help identify sites which are likely to lead to resistance if mutated. The ConSurf score is based on a multiple sequence alignment, which is used to generate probabilistic evolutionary models and phylogenetic links. Through this score, more conserved sites (having slower evolutionary rates), which have important functional and structural consequences, are identified [106]. ConSurf has been used to estimate and visualise conserved regions within SARS-CoV-2 [107] and the SARS-CoV nsp12 polymerase domain [108].
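As a rough illustration of per-position conservation scoring, the sketch below computes the Shannon entropy of each column of a multiple sequence alignment. This is a deliberately simpler, swapped-in measure: it is not ConSurf's method, which estimates evolutionary rates from probabilistic models and a phylogeny. The function names and the toy alignment are assumptions made for this example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    Lower entropy means a more conserved position. Returns None when
    the column contains only gap characters.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy for every column of an alignment of equal-length strings."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*alignment)]


if __name__ == "__main__":
    # Toy alignment, purely illustrative.
    toy = ["MKTA", "MKSA", "MRTA", "MKTG"]
    for position, score in enumerate(conservation_profile(toy), start=1):
        print(position, score)
```

Resistance-associated positions can then be compared against the rest of the profile, which mirrors the point above that the scores themselves, rather than a deleterious/neutral label, carry the useful signal.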

Table 1
Sequence and structure-based tools that predict effect of pathogen missense mutations. The table is an up-to-date list of currently available tools (as of 3rd August 2020). The type of method for each tool is specified using a letter code (S: sequence-based). [Table body not recoverable from the source.]

e. MAPP
MAPP (Multivariate Analysis of Protein Polymorphism) predicts the functional impact of all possible missense mutations. It combines evolutionary conservation and physicochemical information. It uses multiple sequence alignments of orthologs to estimate, for each position, a mean for each of six physicochemical properties (hydropathy, polarity, charge, volume, free energy in alpha helices, and free energy in beta strands). A single composite value for each physicochemical property is generated based on the deviation from the mean for all twenty amino acids. High MAPP scores indicate highly conserved sites, which in the context of drug resistance can indicate resistance-promoting sites [110]. MAPP has been used to develop the ProPhylER [111] tool, used for proteome-wide investigation of mutational impact on eukaryotic proteins.

Structure-based methods
When analysing missense mutations, structure-based methods can offer a 3-dimensional explanation of molecular consequences of mutations, which may not be evident from sequence analysis alone [86,89]. These methods include the analysis of the protein structural and functional consequences of mutations, including those on protein folding, stability, dynamics, and alterations to interactions with normal ligands. Protein structure information can be incorporated through rule-based or machine learning based approaches (see Table 1). As acquired resistance can develop through missense mutations, analysing their effects can inform on underlying mechanisms of resistance. In previous analyses, we observed that known resistance mutations arising in the drug-target tend to significantly reduce functional affinities, such as nucleic acid affinity [80][81][82][93][94][95]. Resistance mutations in drug activators are associated with large decreases in protein stability or activity [79,80], and those in drug exporters tend to increase protein flexibility to promote drug export [91]. To run these predictors, a crystal structure of the protein or a homology model is required. A summary of the principal methods and applications is given below:

Measures of protein stability
The introduction of resistance-causing missense mutations to a protein structure rarely comes at a negligible cost to protein stability, whether decreasing local stability and affecting protein folding, or increasing local stability and compromising wild-type protein dynamics [112]. Therefore, quantifying the effect of missense mutations on stability presents a good starting point in understanding the basic variant protein changes. Computational tools predicting thermodynamic stability of a protein do so by estimating the Gibbs free energy (ΔG, kcal/mol). The subsequent impact of a point mutation on protein stability is then estimated as a change in the Gibbs free energy (ΔΔG, kcal/mol) between wild-type and mutant proteins, or vice versa. Additionally, these tools provide both the extent (the actual value of ΔΔG) as well as the direction (destabilising/stabilising) of the resulting mutational effect. Different in silico protein stability predictors are available, of which we highlight a few, based on the methodologies considered in their approximations. Further details for these (and additional) methods can be found in Table 1.

a. FoldX
FoldX is an empirical predictor which provides information on how a single point mutation alters the stability of a protein. It constructs structure models of the protein with the mutation and estimates the stability (ΔG) associated with the mutant protein. Estimation of stability is based on intramolecular interactions such as van der Waals' forces, solvation energies, interactions with water, hydrogen bonds, electrostatic effects and main and side chain entropies. Mutational impact is calculated through a weighted summation of all the intramolecular interactions, and estimated as a change in stability (ΔΔG) between mutant and wild-type structures. In this way, ΔG for each mutant protein, ΔΔG upon mutation, and the contribution of each intramolecular interaction, are made available to the user. The extent of the mutational impact (the value of ΔΔG) as well as the direction of change (ΔΔG < 0: stabilising, ΔΔG > 0: destabilising) are captured by the predictions [113,114].

b. PoPMuSiC
PoPMuSiC (v2.1) is a statistical method which uses knowledge-based potentials to predict mutational impact on the stability of a protein. It returns the predicted ΔΔG of a single point mutation of a protein and is able to systematically analyse this for all possible point mutations for a given protein. Additionally, an 'optimality' score for each amino acid in the sequence with respect to stability is returned. The optimality score identifies sites of structural weakness, i.e. clusters of residues that are considered non-optimal from an evolutionary perspective. Therefore, mutations with desired stability properties (ΔΔG < 0: stabilising, ΔΔG > 0: destabilising) and poorly optimised positions can be identified. These sites can relate to the protein's function, and be used for rational protein design and other experimental studies. In PoPMuSiC, a protein is represented as a statistical potential based on individual residue properties such as sequence position, conformation, solvent accessibility, or a combination of inter-residue distances. The optimality score is computed from the sum of the predicted ΔΔG of all stabilising mutations at a given position in the sequence. Since the majority of the mutations have a destabilising effect, this score is expected to be close to zero for most positions in the sequence, with high negative values indicating sites with strongly stabilising mutations and/or several stabilising mutations with mild effect [115].

c. I-Mutant
I-Mutant (v2.0) is an ML based predictor which computes mutational stability changes using support vector machines. It provides an estimate of the ΔΔG upon a single point mutation based on protein structure (or sequence). The resulting ΔΔG highlights the extent as well as the direction of impact (ΔΔG < 0: destabilising, ΔΔG > 0: stabilising) on the protomer. The predictions consider the mutated residue environment as a 9 Å region (structure) and a 19-residue window (sequence) surrounding the mutation. This environment is combined with experimental pH and temperature conditions, enabling the user to define different pH and temperature conditions on a case-by-case basis to better encompass protein biological conditions [116].

d. STRUM
STRUM is an ML based predictor and returns an estimate of the ΔΔG of a single point mutation on 3D models based on wild-type sequences. It can be used to analyse single mutations or all possible mutations within a specified region of the protein. Similar to the methods above, both the magnitude of change as well as the direction (ΔΔG < 0: destabilising, ΔΔG > 0: stabilising) are encapsulated in the predictions. The 3D models are generated using iterative threading assembly and combined physics- and knowledge-based energy functions. Predictors are trained based on 3 groups of features: sequence, threading, and I-TASSER structure. A total of 120 features are trained through Gradient Boosted Regression Trees (GBRT) to overcome overfitting effects [117].
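Because the four predictors described above do not share a sign convention for the change in Gibbs free energy (ΔΔG), values from different tools cannot be compared directly. The following minimal sketch encodes the conventions exactly as stated in this section and converts a prediction to a common scale. The function names, the dictionary and the choice of "negative means destabilising" as the common scale are assumptions made for this example, and the sketch does not call any of the tools themselves.

```python
# Sign conventions as described in the text: True means that a negative
# ddG is reported as stabilising by that tool.
NEGATIVE_IS_STABILISING = {
    "FoldX": True,
    "PoPMuSiC": True,
    "I-Mutant": False,
    "STRUM": False,
}


def classify(tool, ddg):
    """Label a predicted ddG (kcal/mol) using the tool's own convention."""
    negative_stabilising = NEGATIVE_IS_STABILISING[tool]
    if ddg == 0:
        return "neutral"
    stabilising = (ddg < 0) == negative_stabilising
    return "stabilising" if stabilising else "destabilising"


def to_common_scale(tool, ddg):
    """Convert ddG to a scale where negative values are destabilising."""
    return -ddg if NEGATIVE_IS_STABILISING[tool] else ddg


if __name__ == "__main__":
    # Hypothetical predictions for one mutation, not real tool output.
    predictions = {"FoldX": 1.2, "PoPMuSiC": 0.8, "I-Mutant": -0.9, "STRUM": -1.1}
    for tool, value in predictions.items():
        print(tool, classify(tool, value), to_common_scale(tool, value))
```

Converting to one scale before plotting or averaging avoids the situation where a destabilising mutation appears to be contested between tools purely because of their opposite conventions.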

Measures of global and local stability within a single framework
The mCSM (mutation Cut-off Scanning Matrix) suite of computational tools accounts for the changes in protein stability, dynamics [118], and interactions with other proteins [119], ligands [88] and nucleic acids [120] upon introduction of a missense mutation. It estimates the change in stability (ΔΔG) and the change in binding affinity of the ligand. Measuring the impact of missense mutations beyond protein stability, by looking at functional affinities, is crucial to characterise the mechanisms of AMR-associated mutations. This is because affinities to ligands, nucleic acids and other proteins are highly dependent on specific interaction sites, irrespective of protein stability changes. Functionally, the change in a protein's affinity for its ligand is especially important in AMR, as it enables the identification of mutations directly affecting ligand binding. The extent of this importance, however, relates to the drug mode of action, meaning that other functional affinities should also be considered to identify mechanisms beyond direct ligand binding. The mCSM suite of tools quantifies these stability and functional measurements using graph-based signatures [121], which summarise the global environment of the protein as a series of nodes for each atom, and represent the local environment at the mutation site as edges on the graph between the nodes at similar distances from the mutation. A pharmacophore count is appended to these signatures to account for any physicochemical changes imparted by the missense mutations [122] (Fig. 1). Through this graph-based network, the impact of a missense mutation over the whole protein can be calculated. All methods within the mCSM suite are based on ML approaches in quantifying missense mutational changes, and are freely available via their respective web servers.
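The idea of a distance-based signature can be sketched as follows: atoms around the mutation site are labelled with a class, and pairs of atoms are counted cumulatively at a series of increasing distance cut-offs. This is an illustrative simplification under stated assumptions, not the mCSM implementation: the atom classes, cut-off values, function name and toy coordinates are all hypothetical, and the published tools feed such signatures into trained ML models that are not reproduced here.

```python
import math
from itertools import combinations


def cutoff_scanning_signature(atoms, cutoffs, classes):
    """Cumulative counts of atom pairs per class pair at each cut-off.

    atoms:   list of (class_label, (x, y, z)) tuples.
    cutoffs: increasing distance cut-offs in angstroms.
    classes: ordered list of class labels; fixes the feature order.
    Returns a flat list ordered by cut-off, then by class pair.
    """
    pair_keys = [(a, b) for i, a in enumerate(classes) for b in classes[i:]]
    # Pre-compute each pairwise distance once, keyed by ordered class pair.
    distances = []
    for (label_a, pos_a), (label_b, pos_b) in combinations(atoms, 2):
        key = tuple(sorted((label_a, label_b), key=classes.index))
        distances.append((key, math.dist(pos_a, pos_b)))
    signature = []
    for cutoff in cutoffs:
        counts = dict.fromkeys(pair_keys, 0)
        for key, distance in distances:
            if distance <= cutoff:
                counts[key] += 1
        signature.extend(counts[key] for key in pair_keys)
    return signature


if __name__ == "__main__":
    # Toy environment of three atoms, purely illustrative.
    toy_atoms = [
        ("hydrophobic", (0.0, 0.0, 0.0)),
        ("hydrophobic", (3.0, 0.0, 0.0)),
        ("polar", (0.0, 4.0, 0.0)),
    ]
    print(cutoff_scanning_signature(toy_atoms, [3.5, 5.5], ["hydrophobic", "polar"]))
```

The resulting fixed-length vector is the kind of numerical representation that a supervised model can be trained on to predict ΔΔG or affinity changes.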
Ensemble methods like DUET [102] generate a consensus prediction based on two different tools, while the meta-predictor tool by Broom, et al. [123] combines predictions from eleven available tools. Similarly, the ELASPIC method [124] combines semi-empirical energy terms, sequence conservation, and several molecular features to predict mutational effect on stability and affinity. Likewise, DynaMut [118] combines graph-based structural predictions with Normal Mode Analysis to account for protein dynamics and molecular motion to assess mutational impact. Consensus approaches have the advantage of improved accuracy over individual tools, but are tightly coupled to the underlying tools and sensitive to their availability.

Insights from molecular dynamics simulation experiments
Despite not providing direct thermodynamic measures of mutations, molecular dynamics (MD) remains an invaluable technique for analysing mutational effects on protein conformational movement, especially considering that other techniques run on static protein structures. In the context of AMR, MD simulations enable comparison between wild-type and mutant protein trajectories. Visualising these differences can highlight co-occurring mutations and sites with local protein rigidification. Different MD techniques may be used, depending on computational cost and the level of throughput required.
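One simple way to compare wild-type and mutant trajectories is through per-atom root-mean-square fluctuation (RMSF). The sketch below is written under stated assumptions: the trajectories are plain lists of already-superimposed coordinate frames, and the function names and toy values are illustrative rather than taken from any of the studies discussed here.

```python
from math import sqrt


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation.

    trajectory: list of frames, each a list of (x, y, z) tuples per atom,
    assumed to be already superimposed on a common reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean_pos = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(sqrt(msd))
    return values


def rmsf_difference(wild_type, mutant):
    """Mutant minus wild-type RMSF; negative values suggest local rigidification."""
    return [m - w for w, m in zip(rmsf(wild_type), rmsf(mutant))]


# Toy two-atom, two-frame trajectories (angstroms), for illustration only.
wt = [[(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]]
mut = [[(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]]
delta = rmsf_difference(wt, mut)
```

In this toy case the first atom fluctuates less in the mutant than in the wild type, so its difference is negative, while the second atom is static in both.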
An all-atom MD method has been adopted to study the co-occurring missense mutations V82F/I84V (known to confer resistance to target inhibitors) within HIV-1 protease [125]. This analysis enabled the characterisation of an equilibrium shift imparted by these mutations from a closed to a semi-open conformation as a possible cause of drug resistance [125]. More recently, MD analysis of the G140S mutation in the HIV-1C integrase (IN) protein provided insight into dolutegravir resistance: decreased stability of IN and higher flexibility around the 140 loop region in the mutant system reduced drug affinity [126]. Similarly, MD simulations have been used to examine artemisinin resistance in malaria. Simulations of the R539T and C580Y mutations in the P. falciparum K13 region revealed local structural destabilisation of the Kelch-repeat propeller (KREP) domain but not of the overlapping shallow pocket [78]. In fungal and bacterial enzymes, MD investigation of the interaction of triazole drugs with their target, CYP51, has highlighted the potential to design inhibitors with greater ortholog specificity. While protein-fluconazole interactions were strongly mediated by ligand-haem interactions in fungal enzymes, they were mediated by polar interactions in the bacterial counterpart (CYP51 Mtb) [127]. Stereochemical changes, rather than electrostatic effects, of ten point mutations in Mtb katG led to isoniazid (INH) resistance by restricting access of the drug to its catalytic site [128]. Likewise, conserved motions and unbinding events of 82 point mutations in Mtb pncA, linked to PZA resistance, were discerned through MD simulations. Coupled expansions and contractions of the pncA lid and the side flap were observed in the unbinding of PZA in some mutants, while destabilisation of the "hinge" or nearby residues facilitated lid opening and PZA release from the active site [129].
MD studies have also shed light on AMR mutations in biological pathways. For example, mutations Y59H, M84I and E160D within the RamR homodimerization domain on the ramA promoter were shown to affect structural stability and binding affinity. These mutations led to dysregulation of the multidrug efflux pump RND, and consequent drug resistance in Salmonella enterica [130]. In another example, extensive modifications modelled by MD simulations of six missense mutations in thymidylate synthase A (ThyA), a key enzyme in the Mtb folate pathway, provided a deeper understanding of para-aminosalicylic acid resistance [131]. Likewise, investigation of inhA-INH resistance in Mtb revealed a ligand "locking" mechanism together with increased vibrational coupling between inhA cofactor binding site residues, responsible for the inhibitory function of the wild-type complex. This insight provided an explanation of how the resistant mutation S94A circumvents these subtle changes in global structural dynamics, with downstream effects in the fatty acid synthase pathway [132]. All-atom MD simulations have also been used to understand the mechanism of anti-microbial peptides within biofilms, which can potentially serve as alternative therapies in the presence of AMR [133].
Although an all-atom MD approach offers detailed analysis of specific mutations, it is often computationally expensive, making it impractical for large mutational datasets. In such cases, an approximate MD technique known as normal mode analysis (NMA) can be adopted. NMA uses harmonic motion to summarise protein dynamics arising from vibrational entropy changes. This approach is the basis for DynaMut [118] (part of the mCSM suite of computational tools described above), which predicts missense mutational impact on proteins while accounting for their molecular motions.
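To make the harmonic approximation concrete, the sketch below builds a coarse-grained elastic network model, which is one common form of NMA. It is an illustrative assumption and not necessarily the formulation used by DynaMut: nodes within a distance cut-off are joined by identical springs, and the function names, the 7 Å cut-off, the unit spring constant and the cube-shaped toy structure are all hypothetical choices.

```python
import numpy as np


def anm_hessian(coords, cutoff=7.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an anisotropic elastic network model."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue  # no spring between distant nodes
            block = -gamma * np.outer(d, d) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal blocks are minus the sum of the off-diagonal blocks.
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def normal_modes(coords, cutoff=7.0):
    """Eigenvalues (ascending) and eigenvectors (columns) of the network Hessian."""
    return np.linalg.eigh(anm_hessian(coords, cutoff))


def mode_fluctuations(eigenvalues, eigenvectors, n_rigid=6):
    """Relative per-node mean-square fluctuation from the non-rigid-body modes."""
    vals = eigenvalues[n_rigid:]
    vecs = eigenvectors[:, n_rigid:]
    per_coord = (vecs ** 2 / vals).sum(axis=1)
    return per_coord.reshape(-1, 3).sum(axis=1)


# Toy structure: eight nodes on the corners of a cube with 3.8 angstrom edges.
cube = 3.8 * np.array(
    [[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float
)
eigenvalues, eigenvectors = normal_modes(cube)
fluctuations = mode_fluctuations(eigenvalues, eigenvectors)
```

The six lowest eigenvalues are numerically zero and correspond to rigid-body translation and rotation; the remaining modes describe internal motion. Because only a single matrix diagonalisation is needed, this kind of calculation is far cheaper than an all-atom simulation, which is what makes it practical for large mutational datasets.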

Applications of the computational tools for characterising drug resistance in TB and other infectious diseases
The tools described above for measuring the effects of mutations within a gene have been used to provide a molecular understanding of how variants can affect pathogen drug resistance in Mtb [80,92] and P. vivax [134]. In all cases, the different tools have provided complementary information to describe mutational effects under selective pressure as a balance of fitness costs across different protein properties.
To demonstrate the utility of this approach, we explore in more detail Mtb variants in two genes, katG (resistance to isoniazid) and rpoB (resistance to rifampicin), which have been associated with drug resistance from GWAS analyses [11,45]. Most katG mutations conferred resistance through a disruption of protein stability [80]. Functionally, it is thought that Mtb renders the non-essential KatG unstable to impede the activation of the prodrug isoniazid, thereby conferring resistance. When considering rifampicin-resistant mutations within the gene rpoB, we found that most mutations disrupt protein-protein interactions, leading to a loss in nucleic acid affinity. Structurally, the effects of these mutations within RpoB, the β subunit of RNA polymerase, are compensated for by mutations within RpoC, which is the β′ subunit, thereby restoring normal functioning of the RNA polymerase, with an added resistance property [135][136][137]. Within this analysis, two distinct classes of mutations were observed: (i) those having high allele frequency within GWAS, but which had mild overall effects on protein stability and affinities to ligands, other proteins and nucleic acids, and (ii) those having lower allele frequency but more drastic effects on protein properties. Theoretically, it is thought that the high incidence of class (i) mutations results from a lower likelihood of evolutionary purging compared with class (ii) mutations, reflecting the structural and functional effects imparted at the protein level. Mutations from each class were also seen to co-occur as haplotypes, where they are thought to compensate for each other in terms of protein fitness [80].
Using 571 missense SNPs in katG across 19265 Mtb isolates, we tested for an association between mutation odds ratio and allele frequencies with the biophysical effect on protein stability (Fig. 2). This analysis suggests a higher proportion of destabilising mutations (~84%, n = 480 vs ~55.5%, n = 105) with only a small proportion of mutations lying within 10 Å of the active site (~10%, n = 57 vs ~15%, n = 28), highlighting the importance of allosteric mutations in INH drug resistance. There is a weak negative correlation between protein stability and odds ratio (ρ = −0.15, P < 0.001), and between protein stability and allele frequency (ρ = 0.31, P < 0.001) (Fig. 3a). Analysis of biophysical effects (destabilising vs stabilising) of katG mutations by Mtb lineage revealed statistically significant differences (Fig. 3b, Kolmogorov-Smirnov P 1.3e-08).
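A rank correlation of the kind reported above can be computed in a few lines. The sketch assumes that ρ denotes a Spearman rank correlation, which the text does not state explicitly; the numbers are invented for illustration and are not the katG data. It returns only the coefficient and does not compute a P value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical stability changes and odds ratios, for illustration only.
ddg = [-2.1, -0.4, 0.3, -1.5, -0.9]
odds_ratio = [8.0, 1.2, 0.9, 3.0, 5.0]
rho = spearman_rho(ddg, odds_ratio)
```

With these toy values the coefficient is strongly negative, meaning that more destabilising mutations (more negative stability change) tend to have higher odds ratios.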
This type of analysis can be implemented on proteins encoded on plasmids (a common vector of resistance), where this approach has been used to explain the evolution of carbapenem resistance in Acinetobacter baumannii [91].

Computational structural tools predicting drug resistance
A limitation of current genomic sequencing-based resistance diagnostic approaches is that they require pre-existing knowledge about the phenotypic consequences of a variant. This means we often cannot detect a resistance variant until it has become established within the population. By contrast, we have shown that using these tools we can pre-emptively identify likely drug-resistant mutations in the absence of previous genomic data. These insights are of particular relevance for new drugs without extensive clinical data, and drugs which lack approved diagnostic tests. We have therefore used this approach to explore resistance against the TB drugs BDQ [81] and PZA [82]. The use of our PZA predictive model within the clinic was the first successful translational application of structure-guided resistance detection. This revealed the power of combining structural interpretation within existing diagnostic sequencing frameworks [93]. Additionally, other ML-based approaches have also been used in predicting drug resistance in Mtb [56,138].

Designing better antibacterial drugs
It has been suggested that a way to minimise the development of resistance is by making compounds that interact similarly to a natural ligand [139]. The rationale is that any resistance hot-spot would then carry a higher fitness cost. This led to one of the first successful structure-guided drug discovery projects on neuraminidase inhibitors. Computational tools aid molecular characterisation of novel genomic variants, which provides opportunities to pre-empt likely resistant mutations. Anticipating these variants before they arise in a population can inform the drug discovery pipeline, especially in developing compounds less prone to resistance emergence. Such an approach has already been used as part of the drug development efforts against the TB drug target IMPDH [99]. The mutation predicted was the only resistant variant detected in subsequent in vitro resistance assays. Further, compounds designed to avoid this hot-spot were less prone to develop resistance [96][97][98]. This type of analysis complements the development of new tools that integrate genomic and structural data such as the Target-Pathogen online resource [140], which prioritises candidate drug targets in ten clinically important and diverse pathogens. This approach underscores the importance of structural data in guiding the drug-discovery process [140].

Summary and outlook
Large scale genomic studies have enabled identification of mutational associations with a resistance phenotype, useful for surveying the presence and spread of resistance to a wide range of antimicrobials. However, understanding the functional effects of putative mutations is crucial. Computational tools accounting for anti-symmetric properties of variation, i.e. ΔΔG(A→B) = −ΔΔG(B→A) [118,141,142], are able to achieve improved prediction performance, complementing experimental studies [85].
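The anti-symmetry property lends itself to a simple consistency check on paired forward and reverse predictions. The sketch below is one possible way to quantify and enforce it, offered as an assumption rather than the procedure used by the cited tools; the function names and values are hypothetical.

```python
def antisymmetry_bias(forward, reverse):
    """Half the mean of forward + reverse predictions over all mutation pairs.

    The result is zero for a perfectly anti-symmetric predictor.
    """
    pairs = list(zip(forward, reverse))
    return sum(f + r for f, r in pairs) / (2 * len(pairs))


def symmetrise(forward, reverse):
    """Enforce anti-symmetry by averaging each forward value with the negated reverse value."""
    return [(f - r) / 2 for f, r in zip(forward, reverse)]


# Hypothetical predictions for A->B and the matching B->A mutations.
forward = [-1.0, 0.5, -2.0]
reverse = [0.6, -0.5, 1.0]
bias = antisymmetry_bias(forward, reverse)
corrected = symmetrise(forward, reverse)
```

A non-zero bias indicates that the predictor systematically favours one direction, for example by over-predicting destabilisation. The corrected values satisfy the anti-symmetry relation by construction.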
Genomic and structural analysis of resistance can infer mutational effects with therapeutic consequences before they become fixed in a pathogen population. This has implications for both infection surveillance and in the development of next generation drugs. The latter is of particular relevance to fragment-based drug discovery (FBDD) [143,144]. For the past 20 years, this has been a powerful route to new therapeutics, for example, in the development of vemurafenib for late-stage melanoma [145], and is increasingly being applied in the search for new antimicrobial drugs [146][147][148]. FBDD uses a library of low molecular weight, low affinity binding molecules (fragments) to probe a target protein. This approach helps to identify areas that are receptive to binding. Biophysical and structural biology techniques are used to determine which fragment binds, and how. The target can then be used to guide an expansion of the fragment to a higher molecular weight and higher affinity binding molecule. An important step in this process is elaborating fragments that bind, to generate compounds that can be taken through to clinical testing. This is the stage at which crucial decisions are made about the regions of the drug target to exploit. However, pathogen tolerance is seldom considered, with direct consequences on drug effectiveness or efficacy. Current methods of analysing the effects of mutations either operate at the gene level (identifying known markers of resistance) or focus on a specific effect of the mutation (protein stability) without directly relating it to a resistance phenotype. Combining genomic results with structural analysis permits consideration of mutational impact on a potential drug binding region, providing informed decisions regarding drug efficacy. This has the potential to help the design of better antimicrobial drugs.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.