Exploring Successful Parameter Region for Coarse-Grained Simulation of Biomolecules by Bayesian Optimization and Active Learning

Accompanied with an increase of revealed biomolecular structures owing to advancements in structural biology, the molecular dynamics (MD) approach, especially coarse-grained (CG) MD suitable for macromolecules, is becoming increasingly important for elucidating their dynamics and behavior. In fact, CG-MD simulation has succeeded in qualitatively reproducing numerous biological processes for various biomolecules such as conformational changes and protein folding with reasonable calculation costs. However, CG-MD simulations strongly depend on various parameters, and selecting an appropriate parameter set is necessary to reproduce a particular biological process. Because exhaustive examination of all candidate parameters is inefficient, it is important to identify successful parameters. Furthermore, the successful region, in which the desired process is reproducible, is essential for describing the detailed mechanics of functional processes and environmental sensitivity and robustness. We propose an efficient search method for identifying the successful region by using two machine learning techniques, Bayesian optimization and active learning. We evaluated its performance using F1-ATPase, a biological rotary motor, with CG-MD simulations. We successfully identified the successful region with lower computational costs (12.3% in the best case) without sacrificing accuracy compared to exhaustive search. This method can accelerate not only parameter search but also biological discussion of the detailed mechanics of functional processes and environmental sensitivity based on MD simulation studies.


Potential energy of F1-motor and simulation protocol
In this study, total potential energy for the F1-motor consists of four terms: V γ is the off-lattice Go model for the intra γ subunit, in which the reference structure of γ was determined in 1994 (1BMF) and all parameters were described in Koga's work. See the next subsection for the detailed information of the off-lattice Go model potential V γ and parameters.
V α3β3 (R|X i ) is the switching Go model for the α 3 β 3 ring, in which R corresponds to the temporal structure and X i corresponds to the reference structure based on the 1994 complex. To mimic ATP hydrolysis in our simulation, we switched the interaction potential V α3β3 (R|X i ) for the α 3 β 3 ring according to CG-simulation scheme as shown in Fig. S1.The entire simulation for 360-degree rotation of the γ subunit is divided into 4 phases. Starting from phase-1 , after the constant time steps are simulated in each phase (8 × 10 4 steps for Newtonian dynamics, 8 × 10 5 steps for Langevin dynamics), then interaction potential V α3β3 (R|X i ) is switched into the next phase:V α3β3 (R|X i+1 ). The transition from i to i+1 is assumed to be accompanied by ATP binding (E → T P ), ADP and Pi release (DP → E), and ATP hydrolysis (T P → E) in each αβ subunit. The parameters for V α3β3 (R|X i ) are same as those used by Koga's work. (See the next subsection for detailed information of parameters of V α3β3 ).
V α3β3−γ is the excluded volume interaction (EVI) between γ and α 3 β 3 subunit: which is defined as In this study C was 4.0Å, was 0.36 kcal/mol, EV I was set to various values. Concretely, EVI is power of 2: EVI =1,2,4 . . . 2 N ; N was 8 for Langevin dynamics and 11 for Newtonian dynamics (EVI= 128 129.75 corresponding to the case studied in Koga's work). V anchor is introduced to maintain the F1-atpase complex system on a plane. This potential is defined as follows: where k is 0.18 kcal/mol/Å 2 , L i is the distance of i-th residues from initial position, and L i , 0 is 1.5Å for β 1 9, β 2 9, and β 3 9 and 2.0Å for γ 272 which were described by Koga's work.

Energy function of off-lattice Go model and model parameters
In this study, particulary for V γ (interaction of the intra-γ subunit) and V α3β3 (interaction of α 3 β 3 ring) the following Clementi's type of the C α Go potential are applied: where R represents the Cartesian coordinates, and X signifies the native structure; in switching Go-model reference structure X is switched according to scheme as shown in Fig. S1. In the above equation, the first term maintains the length of the virtual covalent bond; b ibd is the length of the i-th virtual bond between the i-th and i − 1-th amino acids. The second and third terms indicate the local bond angle and dihedral angle interactions: θ i and φ i are virtual bond angle between i − 1-th and i-th bond and around i-bond respectively. The fourth term is the native contact pairs, which are defined as follows: when one of the non-hydrogen atoms in the i-th amino acid is within a distance of 6.5Å from any non-hydrogen atom in the j-th amino acid in the native structure, the i-th and j-th amino acids are regarded as forming a native contact pair. The last term reflects excluded volume interactions (EVIs) between non-native contact pairs. Parameters with the subscript X are constants, whose values are taken from the the native structure X. We use the following values for interaction parameters throughout the present study: where m i (= 10.0) is the mass of each amino acid, V is the total system potential energy function, Γ i is the friction coefficient, and ξ i is the random force in solution environment. The time steps of each phase i(= 1, 2, 3, 4) for switching Go (see Fig. S1) is 8 × 10 5 . The random force is defined by a white and Gaussian noise that satisfies the following fluctuation theorems: where the bracket indicates the ensemble average, k B is the Boltzmann constant and T is the temperature. For the underdamped Langevin dynamics system we investigated the 3-dimensional parameter space defined by temperature T = 10, 20, 30 · · · 300[K], excluded volume interaction strength EVI= 2 0 , 2 1 , 2 2 · · · 2 8 , and friction coefficient Γ = 0.05, 0.1, 0.15, 0.2, 0.25. For each combination of parameters (T , EVI, and Γ) we conducted 10-trial simulations from the same initial structure (1994 complex structure) with different random noise sets {ξ}.
Here we have to note that the friction coefficient Γ ≤ 0.25 is much smaller than Γ 45 which estimated by taken account of the viscosity of water 8.0 × 10 −4 kg/(m s) in the experimental environment of the motor. In this study we applied the relatively lower friction coefficients Γ because we want to reproduce qualitative rotational motion of gamma-subunit of F1 smoothly with realistic simulation time steps.

Definition of rotation angle θ(t) of γ subunit in F1 motor
In our simulation system, the long axis of γ-subunit is nearly perpendicular to the plane defined by the three β 9 C α positions anchored by the positional constrained potential V anchor and passes the center of the triangle made by the three C α atoms. The counterclockwise rotation of the γ-subunit from the initial structure is monitored by a vector Ω pro (t) which is defined as the projected vector of Ω(t) on the plane which is defined as three anchored points (β 9).
where r(t) is the position of i-th C α at time t. The rotation angle θ(t) is defined by the angle between Ω pro (t) and corresponding vector in the initial structure (1994 complex structure).  phase-4 Figure S1: CG-simulation scheme. The ellipsoids (blue and green) correspond to the α and β subunits, respectively. The "TP", "DP", and "E" are the ATP-bound, ADP-bound, and nucleotidefree states, respectively. The entire simulation cycle for 360-degree rotation is divided into 4 phases. In each phase i (= 1, 2, 3, 4), V α3β3 (R|X i ) indicate the potential of α 3 β 3 ring for which the reference structure X i is based on the 1994 structure. The transition i → i + 1 phases are accompanied by ATP binding (E → T P ), ADP and Pi release (DP → E), and ATP hydrolysis (T P → E) in each αβ subunit. Each phase is simulated for the constant time steps (8 × 10 5 steps for Langevin dynamics, 8 × 10 4 for Newtonian dynamics). Figure S2: Distributions of the successful parameter with different threshold τ for Langevin dynamics simulations. The yellow and blue grids indicate "success" and "failure" parameters, respectively. The thresholds are 0.9 (a), 0.7 (b), 0.6 (c), and 0.5 (d).