Incorporating Prior Knowledge in the Seeds of Adaptive Sampling Molecular Dynamics Simulations of Ligand Transport in Enzymes with Buried Active Sites

Because most proteins have buried active sites, protein tunnels or channels play a crucial role in the transport of small molecules into buried cavities for enzymatic catalysis. Tunnels can critically modulate the biological process of protein–ligand recognition. Various molecular dynamics methods have been developed for exploring and exploiting the protein–ligand conformational space to extract high-resolution details of the binding processes, a recent example being energetically unbiased high-throughput adaptive sampling simulations. The current study systematically contrasted the role of integrating prior knowledge while generating useful initial protein–ligand configurations, called seeds, for these simulations. Using a nontrivial system of a haloalkane dehalogenase mutant with multiple transport tunnels leading to a deeply buried active site, simulations were employed to derive kinetic models describing the process of association and dissociation of the substrate molecule. The most knowledge-based seed generation enabled high-throughput simulations that could more consistently capture the entire transport process, explore the complex network of transport tunnels, and predict equilibrium dissociation constants, koff/kon, on the same order of magnitude as experimental measurements. Overall, the infusion of more knowledge into the initial seeds of adaptive sampling simulations could render analyses of transport mechanisms in enzymes more consistent even for very complex biomolecular systems, thereby promoting drug development efforts and the rational design of enzymes with buried active sites.


Table S1. Properties of tunnel identified in
) Tunnels schemes.In the Tunnels scheme, not only positions of DBE molecules were seeded, but also the corresponding structure of LinB86 harboring appropriate tunnel fragments was used.Protein structure is shown as gray (static) or white (ensemble) cartoon.Positions of DBE are shown in red sticks, except for the Tunnels scheme in which sticks are colored according to the tunnel seeded: p1a (blue), p1b (cyan), p2 (green), and p3 (red).
Text S1 Protocol for MD simulation for tunnel calculations used during seeding with scheme Tunnels: For MD setup, AMBER18 1 package was used and the crystal structure of LinB86 was retrieved from the PDB database (ID 5LKA).The periodic boundary conditions were maintained by particle mesh Ewald method 2 with the SHAKE algorithm for 4 fs time-step and hydrogen mass repartitioning method 3 .For minimization, the system was subjected to five rounds of minimization consisting of 500 steps of steepest descent minimization.After minimization, the system was gradually heated at canonical NVT ensemble from 0 K to 200 K with restrained heavy protein atoms for 20 ps using the Langevin thermostat.Then the system was equilibrated for 2 ns in three rounds.First using harmonic restraints, the system was gradually heated to 310 K within 100 ps and keeping constant temperature for another 900 ps in canonical NVT ensemble.Secondly, the system was subjected to constant pressure and temperature (NPT) while restraining the backbone atoms for 1 ns, using a weak-coupling barostat.Finally, 100 ns of NPT simulation without restraints was performed.
The MD trajectory was analyzed to identify geometric tunnels by CAVER 3.0.1 4,5 with the starting points of tunnels defined by the following active site residues: Asn38, Asp108, Trp109, and His272, using a probe radius of 0.9 Å, shell radius 3 Å and shell depth 4 Å.The obtained tunnels were clustered by hierarchical clustering with a clustering threshold of 3.5 Å.For each known tunnel, i.e., p1a, p1b, p2, and p3 (Table S1), the 100 most opened tunnels were selected for positioning the DBE molecule along the entire tunnel length by the CaverDock program 6 .As a result, sets of DBE positions along the tunnel and the corresponding upper-bound interaction energies were generated for each tunnel ensemble.These energies were converted to approximate maximal migration barriers for tunnel segments of 1 Å length by finding the highest energy in the segment and subtracting the global minimal interaction energy found among the entire tunnel ensemble (Figure S2-S8).
Next, we have generated composite tunnels with putatively minimal energy costs for DBE migration through the respective tunnel conformations (Figure S9).For each tunnel segment, the tunnel conformation with the lowest migration barriers for DBE transport was selected unless the barrier of the previous optimal tunnel conformation was within 1 kcal/mol for this segment too.In such a case, the same tunnel conformation was retained to avoid frequent switching between the tunnel conformations.This procedure resulted in the generation of a subset of tunnel segments with the corresponding protein conformations with favorably bound DBE along the whole tunnel length (Figure S1D).

Figure S1 .
Figure S1.Initial seeds of the DBE molecule around the LinB86 enzyme used in four investigated schemes.A) Bulk, B) Cavity, C) Cavity&Bulk, and D) Tunnels schemes.In the Tunnels scheme, not only positions of DBE molecules were seeded, but also the corresponding structure of LinB86 harboring appropriate tunnel fragments was used.Protein structure is shown as gray (static) or white (ensemble) cartoon.Positions of DBE are shown in red sticks, except for the Tunnels scheme in which sticks are colored according to the tunnel seeded: p1a (blue), p1b (cyan), p2 (green), and p3 (red).

Figure S2 .
Figure S2.DBE migration energy barriers profile of p1a tunnel ensemble.The values correspond to the maximal interaction energy in each bin (1 Å length) predicted by CaverDock decreased by the global minimal interaction energy in the tunnel ensemble (-3.1 kcal/mol).

Figure S3 .
Figure S3.DBE migration energy barriers profile of p1b tunnel ensemble.The values correspond to the maximal interaction energy in each bin (1 Å length) predicted by CaverDock decreased by the global minimal interaction energy in the tunnel ensemble (-3.0 kcal/mol).

Figure S4 .
Figure S4.DBE migration energy barriers profile of p2a tunnel ensemble.The values correspond to the maximal interaction energy in each bin (1 Å length) predicted by CaverDock decreased by the global minimal interaction energy in the tunnel ensemble (-3.0 kcal/mol).

Figure S5 .
Figure S5.DBE migration energy barriers profile of p2b tunnel ensemble.The values correspond to the maximal interaction energy in each bin (1 Å length) predicted by CaverDock decreased by the global minimal interaction energy in the tunnel ensemble (-3.0 kcal/mol).

Figure S6 .
Figure S6.DBE migration energy barriers profile of p2c tunnel ensemble.The values correspond to the maximal interaction energy in each bin (1 Å length) predicted by CaverDock decreased by the global minimal interaction energy in the tunnel ensemble (-3.0 kcal/mol).

Figure S7 .
Figure S7.DBE migration energy barriers profile of p2d tunnel ensemble.The values correspond to the maximal interaction energy in each bin (1 Å length) predicted by CaverDock decreased by the global minimal interaction energy in the tunnel ensemble (-3.0 kcal/mol).

Figure S8 .
Figure S8.DBE migration energy barriers profile of p3 tunnel ensemble.The values correspond to the maximal interaction energy in each bin (1 Å length) predicted by CaverDock decreased by the global minimal interaction energy in the tunnel ensemble (-3.0 kcal/mol).

Figure S9 .
Figure S9.Creation of composite tunnel with corresponding bound poses of DBE molecule for seeding.Step 1: Compiling the ensemble of tunnels and maximal migration barriers for DBE molecule computed based on CaverDock energy profiles for each tunnel.Step 2: generation of composite tunnels (from several favorable segments of tunnels from the ensemble).Finally, in Step 3: the positions of ligands are selected for seeding based on the composite tunnel profiles by placing the DBE in favorable positions.

Figure S10 .
Figure S10.Implied time scales of MSM generated for three replicates of the Cavity scheme and corresponding spectral separation analysis.The arrow indicates the last point considered for deciding the number of metastable states.

Figure S14 .
Figure S14.Chapman-Kolmogorov tests for replicate 2 of the Cavity scheme.Plots show that MSMs correctly represent original MD simulation data.

Figure S15 .
Figure S15.Chapman-Kolmogorov tests for replicate 3 of the Cavity scheme.Plots show that MSMs correctly represent original MD simulation data.

Figure S16 .
Figure S16.Chapman-Kolmogorov tests for replicate 1 of the Cavity&Bulk scheme.Plots show that MSMs correctly represent original MD simulation data.

Figure S17 .
Figure S17.Chapman-Kolmogorov tests for replicate 2 of the Cavity&Bulk scheme.Plots show that MSMs correctly represent original MD simulation data.

Figure S18 .
Figure S18.Chapman-Kolmogorov tests for replicate 3 of the Cavity&Bulk scheme.Plots show that MSMs correctly represent original MD simulation data.

Figure S19 .
Figure S19.Chapman-Kolmogorov tests for replicate 1 of the Tunnels scheme.Plots show that MSMs correctly represent original MD simulation data.

Figure S20 .
Figure S20.Chapman-Kolmogorov tests for replicate 2 of the Tunnels scheme.Plots show that MSMs correctly represent original MD simulation data.

Figure S21 .
Figure S21.Chapman-Kolmogorov tests for replicate 3 of the Tunnels scheme.Plots show that MSMs correctly represent original MD simulation data.

Figure S22 .
Figure S22.The COM of residues used to characterize DBE locations within the LinB86 structure.A) three catalytic residues defining the bottom of the active site cavity, B) residues defining the bottleneck of the p1a tunnel, C) residues defining the bottleneck of the p1b tunnel, D) residues defining the bottleneck of the p2 tunnels, and E) residues defining the bottleneck of p3 tunnel.The COMs are shown as blue spheres and the positions of the residues are defined by their CA atoms, shown as red spheres.

Figure S23 RMSD calculated
Figure S23 RMSD calculated from ASMD simulations.The individual trajectories in each replicate were joined into a single aligned trajectory and deviation in position of Cα atoms of entire protein except the first 10 residues forming the unstructured N-terminus (Figure S24) was analyzed.

Figure S24 RMSF calculated
Figure S24 RMSF calculated from ASMD simulations.The individual trajectories in each replicate were joined into a single aligned trajectory and fluctuation of all atoms in each residue was analyzed.

Figure S25 .
Figure S25.Region preferentially occupied by DBE molecule at approximately 25 Å-distance from the catalytic residues in Bulk scheme replicate2.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 1 % structures with DBE molecule having distance to catalytic residues between 25 and 26 Å is shown as red surface.The distance is measured to the COM of catalytic residues (Figure S22A).

Figure S26 .
Figure S26.Overview metastable states derived from MSM of Cavity scheme replicate1.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S27 .
Figure S27.Overview metastable states derived from MSM of Cavity scheme replicate2.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S28 .
Figure S28.Overview metastable states derived from MSM of Cavity scheme replicate3.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S29 .
Figure S29.Overview metastable states derived from MSM of Cavity&Bulk scheme replicate1.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S30 .
Figure S30.Overview metastable states derived from MSM of Cavity&Bulk scheme replicate2.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S31 .
Figure S31.Overview metastable states derived from MSM of Cavity&Bulk scheme replicate3.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S32 .
Figure S32.Overview metastable states derived from MSM of Tunnels scheme replicate1.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S33 .
Figure S33.Overview metastable states derived from MSM of Tunnels scheme replicate2.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S34 .
Figure S34.Overview metastable states derived from MSM of Tunnels scheme replicate3.Protein structure is shown as a gray cartoon while the region occupied by DBE molecule in 20 % (1 % for bulk solvent state) of 1000 structures representing given metastable state is shown as red surface.

Figure S35 .
Figure S35.Characteristic distances of representative structures of metastable states from the Cavity scheme.

Figure S36 .
Figure S36.Characteristic distances of representative structures of metastable states from the Cavity&Bulk scheme.

Figure S37 .
Figure S37.Characteristic distances of representative structure of metastable states from the Tunnels scheme.

Figure S38 .
Figure S38.Clustering of metastable states from all MSM analyses into unified ligand states (ULS).The location of individual metastable states (FiguresS26-S34) according to their fingerprints of characteristic distances (FiguresS35-S37) in space formed by three principal components (PC) from the PC analysis.The disks are colored based on the cluster membership from the HDBSCAN method, with the gray disk corresponding to cluster composed of outliers.The metastable states are labeled according to the seeding scheme (C-Cavity, C&B-Cavity&Bulk, and T-Tunnels) followed by replicate (r1-3) and the metastable state number (m0-7) and colored by their assignment to the respective ULS.