Investigating the Unbinding of Muscarinic Antagonists from the Muscarinic 3 Receptor

Patient symptom relief is often heavily influenced by the residence time of the inhibitor–target complex. For the human muscarinic receptor 3 (hMR3), tiotropium is a long-acting bronchodilator used in conditions such as asthma or chronic obstructive pulmonary disease (COPD). The mechanistic details of this inhibitor's action remain unclear; in particular, elucidating the main factors that determine the unbinding rates could help develop the next generation of antimuscarinic agents. Using our novel unbinding algorithm, we were able to investigate ligand dissociation from hMR3. The unbinding paths of tiotropium and two of its analogues, N-methylscopolamine and homatropine methylbromide, show a consistent qualitative mechanism and allow us to identify the structural bottleneck of the process. Furthermore, our machine learning-based analysis identified a key role of the ECL2/TM5 junction in the transition state. Additionally, our results point to relevant changes at the intracellular end of the TM6 helix leading to the ICL3 kinase domain, highlighting the closest residue, L482. This residue is located right between two main protein binding sites involved in signal transduction for hMR3's activation and regulation. We also highlight key pharmacophores of tiotropium that play determining roles in the unbinding kinetics and could aid drug design and lead optimization.


MD Simulations
Classical MD simulations

Figure S1 – Root mean square deviation (RMSD) of the protein backbone with respect to the crystal structure (PDB 4U14) during the three 10 ns-long production runs in A) the tiotropium-bound and B) the apo forms.

MLTSA Dataset Creation
All datasets containing interatomic distances were created using mdtraj, and the outcomes of the simulations were assigned using the original biasing CV value from the unbinding runs. If, after 5 ns, the distance was below X Å, the trajectory was assigned IN; if it was above Y Å, it was assigned OUT.
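The labeling rule above can be sketched as follows. This is a minimal illustration, not the authors' code: the `IN_CUTOFF`/`OUT_CUTOFF` values stand in for the unspecified X/Y Å thresholds, and the synthetic CV traces are invented for the example.

```python
import numpy as np

# Hypothetical thresholds standing in for the X/Y values in the text.
IN_CUTOFF = 4.0   # Å; CV below this at 5 ns -> "IN"
OUT_CUTOFF = 8.0  # Å; CV above this at 5 ns -> "OUT"

def label_trajectory(cv_values, times_ns, t_label=5.0,
                     in_cutoff=IN_CUTOFF, out_cutoff=OUT_CUTOFF):
    """Assign an IN/OUT outcome from the biasing CV value at t_label ns."""
    # Take the CV value at the frame closest to the labeling time.
    idx = int(np.argmin(np.abs(np.asarray(times_ns) - t_label)))
    cv = cv_values[idx]
    if cv < in_cutoff:
        return "IN"
    if cv > out_cutoff:
        return "OUT"
    return "UNDECIDED"  # trajectories in between are left unlabeled

# Example: a rebinding (IN) and a dissociating (OUT) trajectory.
times = np.linspace(0.0, 5.0, 101)
cv_in = np.linspace(6.0, 3.0, 101)    # CV shrinking back into the pocket
cv_out = np.linspace(6.0, 12.0, 101)  # CV growing as the ligand leaves
```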

XYZ-PCA Dataset
To assess the importance of protein–protein distances, an additional dataset was created using the XYZ coordinates of all protein atoms (except hydrogens). The number of protein atoms in the system is ~2247, which yields around 6741 coordinates. To reduce this number, a PCA projection was applied to the data. In Figure X, the explained variance ratio is shown for the PCA components resulting from the calculation. At around 100 components, the explained variance ratio drops close to 0; we therefore used these first 100 components as features for training.
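The projection step can be sketched with Scikit-Learn's PCA; the random array below merely stands in for the flattened per-frame heavy-atom coordinates (the actual frame count is not given in the text):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: n_frames x (3 * n_heavy_atoms) flattened XYZ coordinates.
# The real dataset had ~2247 heavy atoms, i.e. ~6741 coordinate features.
rng = np.random.default_rng(0)
n_frames, n_coords = 400, 6741
coords = rng.normal(size=(n_frames, n_coords))

# Project onto the first 100 principal components; these replace the raw
# coordinates as ML features.
pca = PCA(n_components=100)
features = pca.fit_transform(coords)

print(features.shape)  # (400, 100)
# pca.explained_variance_ratio_ gives the per-component curve shown in the figure.
```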

3Å and 6Å Datasets
The 3 Å dataset was created by selecting all protein atoms within 3 Å of any of the ligand atoms in the starting TS structure and defining all interatomic distances between ligand atoms and those protein heavy atoms; the 6 Å dataset was built in the same way using a 6 Å cutoff.
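The selection and pair enumeration can be sketched in plain NumPy (in practice a trajectory library such as mdtraj would supply the coordinates; the toy atom positions here are invented for illustration):

```python
import numpy as np

def pairwise_dists(a, b):
    """All distances between coordinate sets a (n, 3) and b (m, 3)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def select_pocket_pairs(lig_xyz, prot_xyz, cutoff=3.0):
    """Protein heavy atoms within `cutoff` Å of any ligand atom in the TS
    structure, plus all ligand-protein atom pairs whose distances become
    the ML features."""
    d = pairwise_dists(lig_xyz, prot_xyz)         # (n_lig, n_prot)
    sel = np.where((d <= cutoff).any(axis=0))[0]  # protein atoms near the ligand
    # One feature per (ligand atom, selected protein atom) pair.
    pairs = [(i, j) for i in range(len(lig_xyz)) for j in sel]
    return sel, pairs

# Toy example: 2 ligand atoms, 3 protein atoms, only the first within 3 Å.
lig = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
prot = np.array([[2.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 9.0, 0.0]])
sel, pairs = select_pocket_pairs(lig, prot, cutoff=3.0)
```

Switching the cutoff to 6.0 yields the larger 6 Å feature set in the same way.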

3Å+Loop Datasets
To assess the importance of particular residues outside the pocket, a dataset with all interatomic distances involving residues I222 to T231, covering the ECL2–TM5 junction region, was added to the previous 3 Å set (ECL2–TM5 dataset).
To validate the relevance of the ECL2–TM5 loop, an additional dataset adding residues I501 to C516 around ECL3 was also tested. However, no distances from this loop were identified as important by either the MLP or the GBDT models.

MLP
The MLP model was set up using the MLPClassifier from Scikit-Learn,2 with 3 layers (input, hidden, output): as many input nodes as features in the first layer, 100 hidden nodes with the ReLU3 activation function in the second layer, and an output layer with a single node using a logistic activation function (0 = IN, 1 = OUT). The model was optimized using the Adam solver4 with a learning rate of 0.001, iterating over the data until convergence or until reaching the maximum number of iterations (500 epochs). Convergence is determined by the tolerance and the number of epochs with no change in loss: after 10 consecutive epochs with less than 0.0001 improvement in the loss, training stops and the model is considered converged. All datasets were trained with the same parameters.
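The hyperparameters above map directly onto MLPClassifier arguments. The snippet below is a sketch on synthetic data (the feature matrix and labels are invented); the logistic output unit is what MLPClassifier uses automatically for binary classification:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic IN/OUT-style data standing in for the distance features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 0 = IN, 1 = OUT

mlp = MLPClassifier(hidden_layer_sizes=(100,),  # one hidden layer, 100 nodes
                    activation="relu",          # ReLU on the hidden layer
                    solver="adam",
                    learning_rate_init=0.001,
                    max_iter=500,               # at most 500 epochs
                    tol=1e-4,                   # convergence tolerance
                    n_iter_no_change=10,        # 10 epochs without improvement
                    random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))  # training accuracy
```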

GBDT
The GBDT model was set up using the GradientBoostingClassifier implementation from Scikit-Learn.2 We trained 500 decision trees as weak learners, minimizing a logistic loss function with a learning rate of 0.1. The Friedman mean squared error (MSE) was used to assess the splits of the internal nodes, with a minimum of 2 samples required to split and 1 sample required at the leaf nodes. The maximum depth of the individual estimators was 3, with no limit on the number of features considered for the best split or on the number of leaf nodes. Training used an internal validation fraction of 0.1.
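These settings translate into the following GradientBoostingClassifier call; again the data are synthetic stand-ins, and the default loss is the logistic (log-loss) objective described above:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

gbdt = GradientBoostingClassifier(n_estimators=500,       # 500 weak learners
                                  learning_rate=0.1,
                                  criterion="friedman_mse",
                                  min_samples_split=2,
                                  min_samples_leaf=1,
                                  max_depth=3,
                                  max_features=None,      # consider all features
                                  validation_fraction=0.1,
                                  random_state=0)
gbdt.fit(X, y)
# Per-feature importances like these underlie the RFI analysis.
print(gbdt.feature_importances_.shape)
```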

Training at different times
To find the optimal time range from the downhill trajectories for training the ML models, training at different times was performed. Using a time window of 0.05 ns at a time, we tested windows from 0.05 ns to 0.5 ns. We found that the 0.05–0.1 ns range was as early as possible without sacrificing reasonable accuracy.

Figure S15 – Top: RAD and RFI for the 3 Å dataset with added interatomic distances from an alternate loop, ECL3. Newly included residues range from I501 to C516. Top distances marked in red, top residues highlighted in color. Bottom: Average per-residue RAD and RFI for the 3 Å+ECL3 dataset. Top residues marked in red.
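The window scan can be sketched as a loop that retrains the classifier on successive 0.05 ns slices of the trajectories. Everything below is a hypothetical mock-up (trajectory counts, frame spacing, and the injected signal are invented), shown only to illustrate the procedure of evaluating accuracy per time window:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical setup: per-trajectory feature time series with IN/OUT labels;
# frame spacing chosen so 10 frames span one 0.05 ns window.
rng = np.random.default_rng(0)
n_traj, n_frames, n_feat = 60, 100, 10           # 100 frames == 0.5 ns
labels = rng.integers(0, 2, size=n_traj)         # 0 = IN, 1 = OUT
series = rng.normal(size=(n_traj, n_frames, n_feat))
# Fake discriminative signal that grows with simulation time.
series[:, :, 0] += labels[:, None] * np.linspace(0.0, 2.0, n_frames)

window = 10                                      # frames per 0.05 ns window
accs = []
for w in range(n_frames // window):              # 0.00-0.05, 0.05-0.10, ... ns
    chunk = series[:, w * window:(w + 1) * window, :]
    X = chunk.reshape(-1, n_feat)                # one sample per frame
    y = np.repeat(labels, window)
    model = MLPClassifier(hidden_layer_sizes=(20,), max_iter=300, random_state=0)
    accs.append(cross_val_score(model, X, y, cv=3).mean())
    print(f"{w * 0.05:.2f}-{(w + 1) * 0.05:.2f} ns: CV accuracy {accs[-1]:.2f}")
```

Picking the earliest window whose accuracy stays close to the plateau reproduces the 0.05–0.1 ns choice described above.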