In silico identification of genetic mutations conferring resistance to acetohydroxyacid synthase inhibitors: A case study of Kochia scoparia

Mutations that confer herbicide resistance are a primary concern for herbicide-based chemical control of invasive plants and are often under-characterized structurally and functionally. As the outcome of selection pressure, resistance mutations usually result from repeated long-term applications of herbicides with the same mode of action and are discovered through extensive field trials. Here we used acetohydroxyacid synthase (AHAS) of Kochia scoparia (KsAHAS) as an example to demonstrate that, given the sequence of a target protein, the impact of genetic mutations on ligand binding could be evaluated and resistance mutations could be identified using a biophysics-based computational approach. Briefly, the 3D structures of wild-type (WT) and mutated KsAHAS-herbicide complexes were constructed by homology modeling, docking and molecular dynamics simulation. The resistance profile of two AHAS-inhibiting herbicides, tribenuron methyl and thifensulfuron methyl, was obtained by estimating their binding affinity with 29 KsAHAS (1 WT and 28 mutated) using 6 molecular mechanical (MM) and 18 hybrid quantum mechanical/molecular mechanical (QM/MM) methods in combination with three structure sampling strategies. By comparing predicted resistance with experimentally determined resistance in the 29 biotypes of K. scoparia field populations, we identified the best method (i.e., MM-PBSA with single structure) out of all tested methods for the herbicide-KsAHAS system, which exhibited the highest accuracy (up to 100%) in discerning mutations conferring resistance or susceptibility to the two AHAS inhibitors. Our results suggest that the in silico approach has the potential to be widely adopted for assessing mutation-endowed herbicide resistance on a case-by-case basis.

Introduction Acetohydroxyacid synthase (AHAS, also known as acetolactate synthase or ALS) is a group of biosynthetic enzymes found in all plants, fungi, and bacteria (but absent in animals and humans). AHAS is a key enzyme that catalyzes the formation of acetolactate and acetohydroxybutyrate from pyruvate and 2-ketobutyrate [1,2]. This is the first step in biosynthesis of the essential branched-chain amino acids (valine, leucine, and isoleucine), which are critical for all forms of life. AHAS has long been an attractive target in the development of herbicides, fungicides, and antimicrobials because its inhibitors have a low toxicity to mammals while still being highly selective and very potent [3]. AHAS-inhibiting herbicides are the largest site-ofaction group on the market, with more than 50 chemicals belonging to five classes (sulfonylaminocarbonyltriazolinones, triazolopyrimidines, pyrimidinyl(thio)benzoate, sulfonylureas, and imidazolinones) and sulfonylureas being the majority [4]. However, persistent use of herbicides has exerted intense selection pressure on a great variety of weed species and resulted in the evolution of resistance [5]. In the most common mechanism, resistance is conferred by alteration of amino acids in the target site that attenuates the sensitivity to target-specific herbicides [6,7]. The magnitude of herbicide resistance depends on weed species, structural change induced by mutation, and the type of herbicide. For a specific herbicide, a given mutation may endow moderate to high resistance [7,8] or, in rare instances, an increase in sensitivity to the herbicide in different species [5]. In the current practice of weed control, resistance mutations may be discovered only after repeated failure of herbicide application. Therefore, there is a strong and urgent demand for a reliable and systematic approach for determining resistance profiles of different herbicides that are in use or have been newly developed before commencing weed treatment. Compared to wet lab-based experiments and techniques, computational approaches provide a rapid and cost-effective solution to screen and detect resistance mutations.
Although computational endeavors in understanding herbicide resistance have been scarcely reported [8,9], considerable in silico efforts have been made to interpret and predict drug resistance associated with genetic mutations during the last decade [10][11][12][13][14]. Here we focus on computational studies in which the mutational effect is evaluated by measuring protein-ligand interactions. A handful of biophysics-based methods have been employed to estimate the affinity of inhibitors binding to wild-type (WT) or mutated proteins [15][16][17][18][19][20][21][22], and the results are in good agreement with experimental data. Moreover, viable mutations that confer resistance to an inhibitor of dihydrofolate reductase have been predicted by a protein design algorithm before being verified by crystallography and other experiments [23]. In addition to mutational effects on binding affinity, the influence of mutations on catalytic activity has been studied [24]. A successful resistance mutation should only impede the inhibitor binding to the enzyme, but not the catalytic efficacy. In the aforementioned reports, the noncovalent interaction between protein and ligand is typically described by a molecular mechanics (MM) potential function. Despite the success of MM force fields, it is always of immense interest to precisely treat noncovalent interactions for accurate calculation of binding affinity. In theory, noncovalent interactions can be handled more accurately with quantum mechanics (QM) than with MM [25,26] because important effects such as charge transfer and electronic polarization are considered in QM, but not in MM. Inevitably, these additional considerations cause a drastic increase in computational requirements, which limits the size of the systems that can be studied. As a tradeoff between efficiency and accuracy, semi-empirical quantum mechanical (SQM) methods have lately attracted attention again [27,28]. SQM-DH, an improved SQM approach with the addition of dispersion (D) and hydrogen bond (H) correction terms, yields results comparable to high-level QM calculations in terms of accuracy [28,29]. Even so, SQM is still too computationally demanding to be applicable for a large biomacromolecular system. A feasible solution is the hybrid QM/MM model, in which the interaction region is treated by QM whereas the remaining part is described by MM. Consequently, a number of QM/MM approaches have been developed to study a group of ligands binding to a receptor [25]. However, it is unknown how well these QM/MM methods could differentiate between mutations that confer resistance or susceptibility to an herbicide.
Here we investigate if the MM and QM/MM approaches can correctly identify AHAS mutations in Kochia scoparia (also called Bassia scoparia) that confer resistance to two sulfonylurea herbicides, tribenuron methyl and thifensulfuron methyl. K. scoparia is one of the most problematic annual broadleaf weeds in North America known for its rapid adaptability and widespread herbicide resistance [30]. So far, it has been reported that resistance to herbicides arises via multiple mechanisms of action [31]. The resistance mechanism of K. scoparia to AHAS-inhibiting herbicides, including the two most popular classes of AHAS inhibitors (sulfonylurea and imidazolinone), has been well studied. Principally, this resistance is acquired through mutations in the AHAS gene, which inhibit herbicide binding, but do not severely impair AHAS catalytic activity and plant growth [30,32,33].
Because the structure of K. scoparia AHAS (KsAHAS) has not been solved, we first modeled the structures of WT and mutated KsAHAS in complex with the two sulfonylurea herbicides using homology modeling and docking. Then a plethora of MM and QM/MM methods were employed to calculate the binding affinity, including MM-PBSA/GBSA/ALPB (i.e., MM combined with Poisson-Boltzmann/generalized Born/Analytical Linearized Poisson-Boltzmann and surface area continuum solvation) and QM/MM-GBSA combined with SQM and SQM-DH approaches. The estimation of binding affinity was based on single structure or an ensemble of structures sampled from classical and QM/MM molecular dynamics (MD) simulations.

Molecular docking and homology modeling
AtAHAS shared a sequence similarity of about 80% with KsAHAS, whereas the sequence similarity between ScAHAS and KsAHAS was about 40%, which allowed us to build homology models. First, the dimer structure of 1YI1 (AtAHAS) was modeled using the structure of 1T9A (ScAHAS) as the template. Then, tribenuron methyl and thifensulfuron methyl were docked to the dimer structure of 1YI1 using DOCK 6.7 [41]. Finally, using the docked structures as templates, KsAHAS structures bound with tribenuron methyl or thifensulfuron methyl were constructed using Modeller 9.17 [42]. For each herbicide, 29 KsAHAS (1 WT and 28 mutated) structures were generated. In docking simulations, the receptor box delimiting the binding pocket was calculated with SHOWBOX. Potential grids were generated by the GRID program using a 0.3 Å spacing. The herbicides were flexibly docked into the AtAHAS or KsAHAS structure, and the number of sampled ligand poses was set to 1,000.   In silico identification of resistance mutations

Molecular dynamics (MD) simulations
One of the two symmetry binding sites in KsAHAS was removed for reduction of computational complexity. The remaining protein with two chains and 484 residues was used for MD simulation and calculation of binding affinity. The WT and mutated KsAHASs were modeled using the Amber ff14SB force field [43]. The herbicides were modeled using the GAFF2 force field [44]. Each complex structure was explicitly solvated in a rectangular box of TIP3P water molecules with a minimal distance of 10 Å from the protein to the box edges. Counter ions (Na+) were added to neutralize uncompensated charges. The whole system was energy minimized with 2,000 steps of steepest descent followed by 10,000 steps of conjugate gradient without any harmonic restraint. The energy-minimized structures were extracted from the water box for calculation of the single-point binding free energy. Then, coupled to a Langevin thermostat, the system was heated from 10 K up to 300 K by increments of 100 K in 20 ps and continued to run for 40 ps at 300 K at constant volume. Finally, the system was equilibrated for 200 ps in NPT ensemble with the Langevin thermostat and isotropic position scaling at 300 K and 1 bar, respectively. The production run for each complex structure was carried out for 2 ns in NVT ensemble with the Langevin thermostat at 300 K using the parallel PMEMD. The MD simulations ran only for 2 ns because it was previously reported that the length of MD simulations had little effect on the calculation of binding affinity [45]. After that, the QM/MM [46,47] MD simulations were turned on. The QM region was composed of the herbicide and side chains of residues at the 7 mutation sites except Gly268 and Pro197. Gly268 was excluded because it was located outside the QM region. For Pro197, only backbone atoms were included. Thus, the QM region contained 89-124 atoms, depending on herbicides and mutations. The AM1 method with dispersion correction (AM1D) was employed to model the QM region. The QM/MM MD simulations ran for 200 ps in NVT ensemble with the Langevin thermostat at 300K using the parallel SANDER. The trajectories were sampled at a time interval of 10 ps. In our case, the QM/MM MD simulations were 45-120 times slower than the classical MD simulations, depending on the size of the QM region. For each classical MD trajectory, the last 50 frames were obtained for calculation of binding free energy. For any QM/MM MD trajectory, the last 10 frames were extracted to calculate the binding free energy.
All MD simulations were carried out using Amber 16 [40]. The equations of motion were solved with the leapfrog integration algorithm with a time step of 2 fs. The lengths of all bonds involving hydrogen atoms were kept constrained with the SHAKE algorithm. The particle mesh Ewald (PME) method was applied for treating long-range electrostatic interactions. Periodic boundary condition was used in all simulations that were performed on a Cray XE6 High Performance Computing system with 32 CPUs on each node.

Binding free energy calculation
In MM-PBSA [48], binding affinity (ΔG bind ) was estimated from free energies of reactants (receptor and ligand) and product (complex): ΔG bind = G complex −(G receptor +G ligand ). The free energy of a state (receptor, ligand, or complex) was decomposed into gas-phase molecular mechanics energies (E MM ), solvation energies (G solv ), and conformational entropy (TS). The standard molecular mechanics energy included internal (bond, angle, and dihedral), electrostatic, and van der Waals interactions. The solvation energy was determined by the polar (G pol ) and nonpolar (G np ) contributions. The polar solvation contribution was calculated by solving the Poisson-Boltzmann (PB) equation, and the nonpolar contribution was estimated by the solvent accessible surface area (SASA). The entropy contribution was obtained by normal mode analysis or quasi-harmonic approximation, which was dropped in our calculation. Thus, the binding affinity was estimated as the sum of the three energy terms (Eqs 1 and 2).
The polar solvation contribution (G pol ) was also estimated using the generalized Born (GB) method, a computationally efficient approximation to the PB equation. In this work, two GB models (the improved GB OBC model [49,50] and the GBn model [51]) were tested. ALPB (Analytical Linearized Poisson-Boltzmann) [52,53] was another approximation method we tested to handle electrostatic interactions within the implicit solvent model.
The calculations of binding free energy were performed with SANDER implemented in Amber 16. In MM-PBSA, the internal dielectric constant (ε) was set to 2 and 4, as previously described [45]. Other parameters were kept at their default settings. The binding free energy was calculated using single structure as well as ensembles of conformations obtained from classical and QM/MM MD simulations.

Prediction power assessment
For each combination of herbicide and binding free energy estimation method, the 29 KsA-HASs were sorted by the binding free energy in descending order (i.e., from weak binding to strong binding), and the top-ranked KsAHASs were predicted to be potentially resistant. Based on the known field resistance data, we made such prediction calls as true resistance (TR) and false resistance (FR) in the top n KsAHASs, and true susceptibility (TS) and false susceptibility (FS) in the remaining (29 -n) KsAHASs. Out of the 29 KsAHASs, 25 contained resistance mutations, which should ideally be ranked ahead of the other 4 KsAHASs (3 mutated plus 1 WT). If a resistant KsAHAS was ranked below 25, it was called a FS; if a susceptible KsAHAS was found in the top 25, it was called a FR. The following three metrics were introduced to compare the overall prediction power of the methods for binding affinity calculation: accuracy, enrichment factor (EF), and AUC-ROC (see below for definition and explanation). Accuracy (Ac) was defined as TRþTS N , where N was the total number of AHASs. EF was defined as a=n A=N in the top-ranked 10 AHASs, where N was the total number of AHASs; n was the number of AHASs selected (that is, 10); A was the total number of resistant AHASs; and a was the number of resistant AHASs in the selection.
The receiver operating characteristic (ROC) [59] curve was drawn by plotting the TR rate ( TR TRþFS ) against the FR rate ( FR FRþTS ) with an increasing n. Here, the area under the curve (AUC) of ROC was the integral of the ROC plot, and served as a measure of the probability that a binding free energy estimation method could correctly rank a resistant AHAS over a susceptible AHAS.
The paired t-test hypothesis testing method was employed to assess if two groups of data were statistically different from each other. The null hypothesis was that there was no significant difference between the two groups. The paired t-test is a parametric statistic with an assumption that the data follow a normal distribution. All hypothesis testing was carried out at the 5% significance level using Python 3.4 and SciPy 0.18.

Modeling of KsAHAS-herbicide structures
Crystallographic studies indicate that AHAS inhibitors bind in a channel leading to the active site on the interface of the AHAS dimer and consequently block substrate access to the catalytic site [37,38]. While the complex structures of tribenuron methyl with AtAHAS and ScA-HAS have been experimentally determined (Fig 2A and 2B), the structure of AHASthifensulfuron methyl was unavailable at the time of this study. Using the known structures as templates, the KsAHAS-herbicide complexes were built using homology modeling as described in the Materials and methods section (Fig 2C). Inspection of the binding mode of ScAHAS and tribenuron methyl (PDB entry 1T9A) showed that Trp574 formed the π-π interaction with the heterocyclic ring of tribenuron methyl while Pro197 and Asp376 were in close contact with the aromatic ring (Fig 2D). The three residues (Pro197, Asp376, and Trp574) are highly conserved in AHAS genes [6], and the aforementioned interactions can be found in other AHAS-herbicide complexes as well as in the modeled KsAHAS-herbicide structures. Mutations at these conserved residue sites may disrupt the important interactions between AHAS and herbicides (thereby disturbing herbicide binding), which may explain why resistance mutations mostly occurred at these three sites (see Fig 1 and Table 1).
Examination of AHAS-herbicide crystal structures suggests that the same herbicide adopts nearly identical orientation when bound to AHASs in different species. For example, the poses of tribenuron methyl in complex with AtAHAS and ScAHAS were very much alike with a ligand RMSD (root mean square deviation) of 0.79 Å (Fig 3A). In the modeled KsAHAS-tribenuron methyl complex, the ligand pose was almost the same as those experimentally determined with a ligand RMSD of 0.67 Å (AtAHAS) or 1.07 Å (ScAHAS). In the KsAHASthifensulfuron methyl complex, the ligand exhibited a binding mode highly similar to that of tribenuron methyl in AHAS. As shown in Fig 3A, the central sulfonylurea bridge and heterocyclic ring of the herbicides completely overlapped, while the thiophene ring of thifensulfuron methyl moved slightly away from the aromatic ring of tribenuron methyl. This was possibly because chemical structures of the two herbicides were very similar with the aromatic ring in tribenuron methyl substituted by the thiophene ring in thifensulfuron methyl (Fig 3B), which induced the rearrangement.

Prediction performance comparison
For each KsAHAS-herbicide complex, the binding affinity was estimated by 24 MM or QM/ MM methods in combination with three different sets of structures obtained from structure minimization (i.e., single structure), classical MD simulations, and QM/MM MD simulations. The predictive power of different approaches to distinguish resistant mutants from susceptible ones was evaluated by the following three metrics: EF, AUC, and accuracy (see S1-S4 Tables). Fig 4A and 4B, the best prediction performance was achieved by MM-PBSA combined with single structure. With this approach, AUC and accuracy were both above 0.9 while EF reached the highest value of 1.16 for both herbicides. The internal dielectric constant (ε) had little influence on the results. When ε changed from 2 to 4, accuracy was slightly improved from 0.93 to 1, and AUC rose from 0.94 to 1 for tribenuron methyl. However, there was no change in EF, AUC, and accuracy for thifensulfuron methyl. The top three ROCs for tribenuron methyl and thifensulfuron methyl (see S5 and S6 Tables) were plotted and shown in Fig 4C and 4D, respectively. For thifensulfuron methyl, the ROCs of MM-PBSA with ε being either 2 or 4 were the same, so only the ROC of MM-PBSA with ε of 2 was presented (Fig 4D). Among QM/MM-GBSA methods, PM6D-GBn (for tribenuron methyl, Fig 4C) and AM1D-GB OBC (for thifensulfuron methyl, Fig 4D) also demonstrated a high capability in differentiating between mutations causing resistance or susceptibility. Based on an ensemble of structures obtained from classical MD simulations, methods such as MM-PBSA, AM1D, and PM6D were better than most of the others. They were only inferior to the best performer (MM-PBSA combined with single structure). As AM1D performed well with both single structure and classical MD, it was chosen for depiction of the QM region in the QM/MM MD simulations. With an ensemble of structures from QM/MM MD, the QM/MM approaches showed similar discriminating ability for either herbicide.

Discussion
It is noteworthy that there was a large variation in the discerning ability of the tested methods on the basis of single structure (Fig 4A and 4B). MM-PBSA combined with single structure was the frontrunner among all approaches, but some QM/MM GBSA methods with the same sampling strategy led to the worst performance in this work. By contrast, the discriminating power remained stable across different approaches based on an ensemble of structures from either classical or QM/MM MD simulations. It was also observed that the ability of QM/ MM-GBSA to distinguish resistance mutations depended on the GB model and SQM correction. Here we further discuss how sampling techniques (i.e., single structure, classical MD, and In silico identification of resistance mutations QM/MM MD), GB models (GB OBC and GBn), and SQM corrections (D and DH) affected identification of resistance mutations using different methods for binding affinity calculation.
The influence of sampling technique is summarized in Fig 5A and Table 2. For MM methods, single structure outperformed both classical MD and QM/MM MD in terms of EF and AUC, whereas the difference between classical MD and QM/MM MD was subtle. Specifically, the EF of classical MD was slightly better than that of QM/MM MD (p-value < 0.001), but QM/MM MD had a higher accuracy than classical MD (p-value < 0.01). For QM/MM-GBSA methods, QM/MM MD was significantly superior to single structure in terms of AUC (pvalue < 0.01) and accuracy (p-value < 0.001) and was also better than classical MD in terms of EF (p-value < 0.05) and AUC (p-value < 0.001). The statistical difference between single structure and classical MD cannot be confirmed because the EF of single structure was statistically higher than that of classical MD (p-value < 0.05), but the accuracy of classical MD was better In silico identification of resistance mutations than that of single structure (p-value < 0.001). These results suggest that, when MM methods were employed to compute binding affinity, single structure was more suitable for distinguishing resistance mutations than the ensemble of structures from MD simulations. With QM/ MM-GBSA methods in use, the ensemble of structures sampled from QM/MM MD simulations was preferred.
The above results are in agreement with a previous study which reported that single structure performed as well as or better than MD simulations [60]. However, it is well known that, with single structure, the results depend on the starting structure, given that conformational changes are ignored. In this work, herbicides were first docked into the AtAHAS, and the  In silico identification of resistance mutations resulting structures were used as templates for building the KsAHAS-herbicide complexes. In a preliminary study, we tested an alternative modeling strategy, in which the KsAHAS structures (WT and mutated) were modeled first before the herbicides were docked into them. We observed that its discriminating power was unsatisfactory and much worse than what is reported here. Combined with single structure, most of the QM/MM-GBSA approaches were inferior to the MM approaches. This was possibly because the starting structures were energyminimized using the MM force field and only one structure was adopted in the calculation of binding affinity. MM-PBSA performed better than QM/MM-GBSA possibly because the PB model was more accurate and less computationally expensive than the GB model. The lower accuracy of QM/MM-GBSA was likely introduced by the GB model. Nevertheless, more indepth research is warranted to answer such questions as why certain methods performed better than others on a specific ligand-protein system. In binding affinity calculation, the solvation effect is usually estimated by implicit solvent models [60]. Between the two implicit solvent models used for calculation of binding affinity in this study, there was no statistical difference for MM methods in terms of EF, AUC, and accuracy (p-values > 0.05, see Fig 5B and Table 3). For QM/MM-GBSA methods, the GBn model achieved a significantly greater accuracy (p-value < 0.001) than the GB OBC model. Generally speaking, the GBn model was a better choice than the GB OBC model for identification of resistance mutations, which worked well for both MM and QM/MM-GBSA methods.
For the two SQM methods (AM1 and PM6), dispersion (D) and hydrogen bond (H) corrections were taken into account in this work. Compared to methods without corrections (i.e., AM1 and PM6), the SQMs with corrections (D and DH) notably enhanced prediction power measured by EF and AUC (p-values < 0.001, see Fig 5C and Table 4), which is consistent with previous studies [28,29]. However, the addition of hydrogen bond correction (AM1DH and PM6DH) resulted in lower AUC than the dispersion correction only (AM1D and PM6D, pvalue < 0.05), even though there was no statistical difference in EF and accuracy between them (p-values > 0.05). Therefore, SQM corrections were able to ameliorate the prediction performance of QM/MM-GBSA methods, and dispersion correction was more important than hydrogen bond correction.
The structural and functional characterization of the interaction between herbicides and their biomacromolecular targets is critical for better understanding and accurately assessing In silico identification of resistance mutations genetic mutation-conferred resistance. Therefore, modeling of KsAHAS-herbicide complexes is an important aspect of recognizing resistance mutations. It is impractical to acquire experimental data to exhaustively examine all variables such as plant species, mutation, and herbicide due to inhibitory costs. As a result, many herbicide-enzyme complexes are not well characterized. To safeguard the effectiveness and sustainability of herbicide-based invasive plant management, it is vital to accurately assess the impact of mutations on ligand binding. In this work, we modeled the complex structures of a WT and 28 mutant KsAHAS bound with two sulfonylurea herbicides (tribenuron methyl and thifensulfuron methyl), and investigated the ability of computational methods to distinguish between mutations causing resistance or susceptibility to either herbicide. Up to 100% accuracy was achieved when MM-PBSA combined with minimized single structure was employed to estimate the binding affinity of the two herbicides to the 29 KsAHAS structures. Although we observed that all susceptible mutants had mutations outside the binding pocket while the resistant mutants possessed mutations inside the binding pocket, it is premature to extrapolate this observation to other mutations due to both the small sample size in this study and the high number of possible single-point mutations and combinations of multi-point mutations.
The KsAHAS-herbicide sensitivity dataset used in this study was imbalanced with more resistant biotypes than susceptible ones. Imbalanced datasets may present a challenge to machine learning algorithms when the minority class is of interest, because these algorithms train a model by learning the features of the entire dataset. The more members a class has, the more the class is represented in the model. Although the imbalanced K. scoparia dataset was not an ideal one, it served the objective of this study, i.e., compare a wide variety of biophysicsbased in silico methods and identify the best one for discerning mutation-conferred herbicide resistance in field populations of invasive plant species. Here, we compared 24 different MM and QM/MM methods combined with three different structure sampling strategies (Fig 4 and  S1-S4 Tables). These methods showed differential performance, and MM-PBSA with single structure was identified as the best approach for the specific system of AHAS-herbicide complexes.
To further evaluate the impact of dataset imbalance on method performance, we conducted a sensitivity test by randomly selecting 4 to 24 resistant mutants together with all 3 sensitive mutants and the wild-type KsAHAS, and recalculating prediction accuracy for the new datasets. We plotted the mean and standard deviation of accuracy against the number of selected resistant mutants for seven scenarios of single structure sampling: MM-PBSA (ε = 2), MM-PBSA (ε = 4) and QM/MM-GBSA for both tribenuron methyl and thifensulfuron methyl, and MM-GBSA for thifensulfuron methyl (S1 Fig). The accuracy of 1000 unique random combinations (except for 25 combinations of 24 resistant mutants and 300 combinations of 23 resistant mutants, both of which were the maximum number of all possible non-redundant combinations) was influenced by the degree of dataset imbalance (i.e., ratio of resistant to sensitive mutants) in six of the seven scenarios. While the mean accuracy was little affected, variation in accuracy increased as the dataset became more balanced (i.e., fewer resistant mutants included). Obviously, MM-PBSA had smaller variations in accuracy and was more resistant to dataset imbalance than MM-GBSA and QM/MM-GBSA. Especially, the MM-PBSA method with ε = 4 for tribenuron methyl was not affected by imbalance because it determined that all resistant KsAHAS mutants had a lower binding affinity with the herbicide than the wild-type and sensitive mutants.
In summary, we present here how homology modeling, docking, and MD simulations were integrated to computationally predict the resistance of a mutated enzyme to an herbicide with no prior knowledge of the complex structure. The estimation of noncovalent interaction remains a big challenge in accurate identification of resistance mutations because binding affinity calculation is computationally expensive, and the calculation accuracy depends on sampling techniques and estimation methods. However, with increasing computational power, noncovalent interactions can be precisely modeled with more rigorous methods, and the complex structure can be sampled more thoroughly. This study demonstrates that excellent agreement between in silico prediction and experimental data of herbicide resistance can be achieved when appropriate computational approaches are chosen.

Disclaimer
The content is solely the responsibility of the authors and does not necessarily represent the official views of U.S. Army Corps of Engineers and U.S. Food and Drug Administration.
Supporting information S1 Fig. Influence of dataset imbalance on prediction accuracy of select methods. A sensitivity test was performed to examine the impact of dataset imbalance (i.e., the ratio between resistant (R) to sensitive (S) mutants) on method performance (evaluated using accuracy). See S5 and S6 Tables for the data input. The number of resistant mutants varied from 4 (R:S = 1:1) to 24 (R:S = 6:1). Shown are the mean ± standard deviation (n = 1000 unique random combinations, except 25 and 300 combinations of 24 and 23 resistant mutants included, respectively, because they were the maximum number of all possible non-redundant combinations). (DOCX) S1 Table. Enrichment