Combined Theoretical and Experimental Study to Unravel the Differences in Promiscuous Amidase Activity of Two Nonhomologous Enzymes

Convergent evolution has resulted in nonhomologous enzymes that contain similar active sites that catalyze the same primary and secondary reactions. Comparing how these enzymes achieve their reaction promiscuity can yield valuable insights to develop functions from the optimization of latent activities. In this work, we have focused on the promiscuous amidase activity in the esterase from Bacillus subtilis (Bs2) and compared with the same activity in the promiscuous lipase B from Candida antarctica (CALB). The study, combining multiscale quantum mechanics/molecular mechanics (QM/MM) simulations, deep machine learning approaches, and experimental characterization of Bs2 kinetics, confirms the amidase activity of Bs2 and CALB. The computational results indicate that both enzymes offer a slightly different reaction environment reflected by electrostatic effects within the active site, thus resulting in a different reaction mechanism during the acylation step. A convolutional neural network (CNN) has been used to understand the conserved amino acids among the evolved protein family and suggest that Bs2 provides a more robust protein scaffold to perform future mutagenesis studies. Results derived from this work will help reveal the origin of enzyme promiscuity, which will find applications in enzyme (re)design, particularly in creating a highly active amidase.

. Key inter-atomic distances of the different states appearing along the hydrolysis of N-(4-nitrophenyl)-butyramide catalyzed by wild-type Bs2. Structures optimized at M06-2X/MM level. All distances are given in Å. RC TS1 INT1 INT1 TS2 INT2 INT2 TS3 INT3 INT3 TS4 Table S3. Electrostatic potential (in kJ·mol -1 ·e -1 ) generated by the enzyme on the key atoms of the reaction along the states appearing in the acylation step in Bs2 and CALB.  7 The differences in sequence together with the regions of missing residues are presented in Figure S1. Additionally, all histidine residues present in the enzyme were found to be neutral, with pKa values varying between 4.12 and 6.19. After a detailed inspection of surrounding of each histidine residue, it was concluded that all should be protonated in Nd position. Thus, once the hydrogen atoms were added to the structure, the 20 counterions ions were placed in the most electrostatically favorable positions in order to neutralize the system. Subsequently, the system was solvated by placing it in a 100 ´ 80 ´ 80 Å 3 pre-equilibrated box of TIP3P 10 water molecules. Any water with an oxygen atom lying in a radius of 2.8 Å from a heavy atom of the S9 protein was deleted. In order to equilibrate the total system, classical MD simulations were done. For N-(4-nitrophenyl)-butyramide the same force field parameters were used as determined in our previous work. 11 Such prepared model was optimized, then the system was heated to 303 K with 0.1 K temperature increment and equilibrated during short (100 ps) NPT MD simulations, the proper 100 ns of non-accelerated classical MD simulations were done using the NVT ensemble with AMBER force field, 12 as implemented in NAMD software. 13 The temperature during the MD simulation was controlled using the Langevin thermostat. 14 In order to improve the time of simulations, cut-offs for nonbonding interactions were applied using a smooth switching function between 14.5 and 16 Å. During MD simulations all atoms were free to move. Periodic boundary conditions were used. Time-dependent evolution of the root mean square deviations (RMDS) together with B-factor are depicted in Figure S2. Thus, one structure from the MD simulation was chosen for further study applying QM/MM approach. The starting structures were selected by its proximity to the average values of RMSD.
QM/MM simulations. In present work, the standard additive hybrid QM/MM scheme was used to construct the total Hamiltonian, H " !"/"" , where the total energy $%/%% is obtained as the sum of specific contributions, as presented in equation 1: where %% is the energy of the MM subsystem term, $%-%% In order to explore all PESs, the harmonic constraint of 5000 kJ·mol -1 ·Å -2 was used to maintain the proper interatomic distances along the reaction coordinate, and a series of conjugate gradient optimizations and L-BFGS-B optimization algorithms 22 were applied to obtain the final potential energy of the minimized constrained geometry. The QM sub-set of atoms were described by the Austin Model 1 (AM1) semiempirical Hamiltonian. The distances evolution was controlled by applying small size change of 0.1 Å when the distance between two heavy atoms was explored, or 0.05 Å when the transfer of light hydrogen atom was involved.
A micro-macro iteration optimization algorithm 23,24 together with the Baker's algorithm 25,26 was used to localize, optimize, and characterize the transition states (TS) and structures using a Hessian matrix containing all the coordinates of the QM subsystem, whereas the gradient norm of the remaining movable atoms was maintained less than 0.25 kcal mol -1 Å -1 . The Intrinsic Reaction Coordinate 10 (IRC) was traced down from located TSs to the connecting valleys in mass-weighted Cartesian coordinates. And the same micro-macro iteration optimization algorithm was used to optimized reactant complex (RC), intermediates (Is) and product complex (PC). The existence of the saddle-points, as well as those located in minima, was confirmed by frequency calculation. Thus, for TS structures, only one imaginary value of S12 frequency was registered, while for structures from located in the minimum of the PESs no imaginary values were found.
Free Energy Surfaces. FESs were obtained, in terms of two-dimensional potential mean force (2D-PMF), 27 for every step of the reaction using the Umbrella Sampling approach 27,28 combined with the Weighted Histogram Analysis Method (WHAM). 29 The procedure for the PMF calculation is straightforward and requires a series of molecular dynamics simulations in which the distinguished reaction coordinate variable, x, is constrained around particular values.
The values of the variables sampled during the simulations are then pieced together to construct a distribution function from which the PMF is obtained as a function of the distinguished reaction coordinate (W(x)). The PMF is related to the normalized probability of finding the system at a particular value of the chosen coordinate by eq 2: The activation free energy can be then expressed as: where the superscripts indicate the value of the reaction coordinate at the reactants (R), and at the TS ( ‡), and 6 ( 5 ) is the free energy associated with setting the reaction coordinate to a specific value at the reactant state. Normally this last term makes a small contribution, and the activation free energy is directly estimated from the PMF change between the maximum of the profile and the reactant's minimum: The selection of the reaction coordinate is usually trivial when the mechanism can be driven by a single internal coordinate or a simple combination (as the antisymmetric combination of two interatomic distances). However, this is not the case for all possible steps of the reaction subject of study in this paper where many coordinates are participating. Instead, we were S13 compelled to obtain a much more computationally demanding 2D-PMF using two coordinates: 1 and 2. The 2D-PMF is related to the probability of finding the system at particular values of these two coordinates: To estimate the activation free energy from this quantity, we recovered one-dimensional PMF changes tracing a maximum probability reaction path on the 2D-PMF surface and integrating over the perpendicular coordinate.
Thus, a series of MD simulations were performed adding a constraint for the selected reaction  Table S5.
where S is the two-dimensional spline function and Δ 99 :9 is the difference between the energies obtained at low-level (LL) and high-level (HL) of the theory of the QM part. The AM1 semiempirical Hamiltonian was used as LL method, while the DFT method was selected for the HL energy calculation. In particular, HL energy calculations were performed by means of the hybrid M06-2X functional using the standard 6-31+G(d,p) basis set. These calculations were carried out using the Gaussian09 program.
Phylogeny. The phylogenetic tree was build using a random sample of 150 sequences of proteins belonging to EC 3.   In order to validate the CNN for our particular task, 1000 samples applying random rotations of Bs2 and CALB were predicted, and the ratio of correct classifications was used as measure of the performance of the CNN.
Finally, an alanine scan was done in order to figure out which are the structural determinants in both Bs2 and CALB. For that, all the residues that had an atom inside the cube were mutated to alanine consecutively, obtaining a library of mutants. The mutations were introduced using Modeller. 7 Then, 1000 predictions were done in the same way as explained before. With this strategy we were able to highlight which are the residues that the CNN has learned to be crucial for its correct classification. A classification score was computed by the difference between the classification ratio of the actual class and the ratio of the most probable of the remaining classes

S17
In the above equation s stands for the score, c is the number of correctly classified instances, t is the total number of instances, i1 stands for the actual class and i2 and i3 are the rest of the classes. In this sense a score of 1 corresponds to a perfectly classified mutant and a score of -1 is the one completely misclassified. Finally, after each iteration the mutation that dropped the score the most was accumulated for the next iteration. In this way we performed up to 5 iterations.  The data was collected in positive electrospray ionization mode and analyzed using Waters

S22
The plasmid containing the gene for wild-type Bs2 was transformed into Ca 2+ chemically competent BL21 (DE3) cells and grown on LB agar plates supplemented with kanamycin (50 μg/mL) at 37 °C overnight. One colony from the plate was picked to inoculate a 10 mL LB starter culture containing kanamycin (50 µg/mL) and grown at 37 °C and 180 rpm overnight.
The starter culture was diluted into 1 L of fresh LB media containing kanamycin (50 µg/mL). Kinetic assay analysis. In order to convert the absorption reading obtained from the kinetic assay into concentrations for the determination of kinetic data a calibration curve was generated using 10, 50, 100, 250 500, 1000, 2000, 3000 µM of 4-nitroaniline in 150 µL buffer (50 mM NaPi, pH 7.0, 10% DMSO). Raw absorption data was converted into product concentrations and data analysis with Origin 2020 was performed to obtained kinetic data assuming Michaelis-Menten kinetics. The maximum velocities vmax were obtained from the linear range of the product concentration vs. time plots and used to calculate kcat and KM for wild-type Bs2.