Unveiling Switching Function of Amino Acids in Proteins Using a Machine Learning Approach

Dynamics of individual amino acids play key roles in the overall properties of proteins. However, the knowledge of protein structural features at the residue level is limited due to the current resolutions of experimental and computational techniques. To address this issue, we designed a novel machine-learning (ML) framework that uses Molecular Dynamics (MD) trajectories to identify the major conformational states of individual amino acids, classify amino acids switching between two distinct modes, and evaluate their degree of dynamic stability. The Random Forest model achieved 96.94% classification accuracy in identifying switch residues within proteins. Additionally, our framework distinguishes between the stable switch (SS) residues, which remain stable in one angular state and jump once to another state during protein dynamics, and unstable switch (US) residues, which constantly fluctuate between the two angular states. This study also illustrates the correlation between the dynamics of SS residues and the protein’s global properties.


Selected atoms in amino acids
To determine angles within individual amino acids, we considered them as rigid bodies and chose atoms positioned at their edges and nodes. These selected atoms effectively capture the overall structure and dynamics of the amino acids. Table S1 shows the list of chosen atoms within each amino acid.

Basic vs dihedral angles
In general, angles formed among three atoms (basic angles) describe the local geometry around individual atoms in more detail than angles formed among four atoms (dihedral angles), and they provide more information about the spatial arrangement of atoms around a central atom and about the dynamics of an amino acid. Importantly, as Table 2 in the main manuscript shows, seven amino acids in the Fs peptide protein have been identified as switch residues using the basic angles (between sets of three atoms); however, only the ALA9 amino acid exhibits switching function using dihedral angles. This confirms that some structural information is overlooked when relying only on dihedral angles (see GitHub repository for scripts of both basic and dihedral angle analyses). Fig. S1 displays plots of basic (a) and dihedral (b) angles within the ALA22 amino acid in the Fs peptide protein. Although the amino acid is classified as a switch residue using the basic angles (the switch angle is highlighted in red in Fig. S1a), it is identified as a non-switch residue based on dihedral angles (Fig. S1b). It is important to clarify that not every angular distribution with a single minimum satisfies the switch definition. A switch residue is defined as an amino acid that switches between two distinct angular states with no intermediate state, and only the highlighted angular state in Fig. S1a meets this definition.

Table S1. Selected atoms within amino acid structures.
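To make the distinction between basic and dihedral angles concrete, the following is a minimal sketch of how each could be computed from atomic coordinates with NumPy. The function names `basic_angle` and `dihedral_angle` are hypothetical and are not taken from the authors' repository; the sign convention of the dihedral is also an assumption and may differ from the one used in the original scripts.

```python
import numpy as np

def basic_angle(a, b, c):
    """Angle in degrees at the central atom b, formed by atoms a-b-c."""
    v1 = a - b
    v2 = c - b
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def dihedral_angle(a, b, c, d):
    """Signed angle in degrees between the planes a-b-c and b-c-d."""
    b0 = a - b
    axis = c - b
    b2 = d - c
    axis = axis / np.linalg.norm(axis)
    # Project the outer bonds onto the plane perpendicular to the b-c axis.
    v = b0 - np.dot(b0, axis) * axis
    w = b2 - np.dot(b2, axis) * axis
    x = np.dot(v, w)
    y = np.dot(np.cross(axis, v), w)
    return np.degrees(np.arctan2(y, x))

# Illustrative coordinates only (not taken from any trajectory).
p_a = np.array([1.0, 0.0, 0.0])
p_b = np.array([0.0, 0.0, 0.0])
p_c = np.array([0.0, 0.0, 1.0])
p_d = np.array([0.0, 1.0, 1.0])

right_angle = basic_angle(p_a, p_b, np.array([0.0, 1.0, 0.0]))
quarter_turn = dihedral_angle(p_a, p_b, p_c, p_d)
```

A basic angle obtained this way lies in the range 0° to 180°, whereas a dihedral is periodic over a full 360° range. Evaluating either function for every frame of an MD trajectory yields the angle time series analyzed in this work.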

Instability ratio
Our study uncovered that switch residues, despite having similar densities of angular states, can vary significantly in their structural dynamics and stability. Specifically, a stable switch (SS) residue remains stable in one angular state and undergoes a single transition to another state during protein dynamics. An unstable switch (US) residue, on the other hand, constantly fluctuates between the two angular states. To detect them using ML models, we defined the Instability ratio:

$$\text{Instability ratio} = \frac{\text{total number of transitions between the two angular states}}{\text{length of trajectories}}$$

To determine the total number of transitions between the two angular states, we employed the k-means algorithm, an unsupervised clustering technique, as implemented in the Scikit-learn library. By identifying the state of each point along the trajectory, we were able to quantify the number of transitions occurring between the two states and subsequently calculate the Instability ratio. The Instability ratio effectively distinguishes the SS and US residues when it is either below 1% (Fig. S2a) or above 6% (Fig. S2b). However, differentiating between the two becomes difficult when the Instability ratio falls within the intermediate range (1% < Instability ratio < 6%). This difficulty arises from the varied distributions of the transition timesteps between the two angular states throughout the trajectories. Some cases with very close Instability ratios in the intermediate range (e.g., 5.69% and 5.76%) can exhibit entirely different transition distributions over the trajectories, as depicted in Fig. S2c,d. To address this challenge, we used the Logistic Regression model to establish the Instability ratio for classifying the SS and US residues.

Table S2. List of residues in the β2AR receptor classified as bimodal switch residues using the RF model. The angle switch ratio (ANSR) and atom switch contribution (ATSC) are reported for the residues. Highlighted residues are the SS residues. The Ballesteros-Weinstein numbers are used to represent the amino acids.
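The Instability ratio calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `instability_ratio` is hypothetical, the trajectory length is taken to be the number of frames, and the two example angle series are synthetic and exist only to exercise the code. Clustering the raw angle values assumes basic angles in the range 0° to 180°, where no periodic wrap-around occurs.

```python
import numpy as np
from sklearn.cluster import KMeans

def instability_ratio(angles, random_state=0):
    """Return (ratio, transitions) for a one-dimensional angle time series.

    Each frame is assigned to one of two angular states with k-means;
    a transition is any pair of consecutive frames with different states.
    """
    x = np.asarray(angles, dtype=float).reshape(-1, 1)
    model = KMeans(n_clusters=2, n_init=10, random_state=random_state)
    labels = model.fit_predict(x)
    transitions = int(np.count_nonzero(labels[1:] != labels[:-1]))
    # Assumption: trajectory length is measured in frames.
    return transitions / len(labels), transitions

# Synthetic angle series in degrees, for illustration only.
rng = np.random.default_rng(0)
n_frames = 200
noise = rng.normal(0.0, 2.0, n_frames)

# Stable-switch-like series: one jump from ~60 to ~120 degrees halfway.
stable = np.where(np.arange(n_frames) < n_frames // 2, 60.0, 120.0) + noise

# Unstable-switch-like series: the state flips every 5 frames.
blocks = (np.arange(n_frames) // 5) % 2
unstable = np.where(blocks == 0, 60.0, 120.0) + noise

stable_ratio, stable_transitions = instability_ratio(stable)
unstable_ratio, unstable_transitions = instability_ratio(unstable)
```

With these synthetic inputs, the first series falls below the 1% bound and the second above the 6% bound quoted in the text. Series in the intermediate range would still need the additional classification step described in the paper.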
Fig. S1. Basic (a) vs dihedral (b) angles within the ALA22 amino acid in the Fs peptide protein. The identified switch angle formed by basic angles is highlighted in red. The plots indicate the residue atoms that form the angles.

Fig. S2. Instability ratio as a metric to distinguish the SS and US residues. (a) An Instability ratio <1% indicates SS residues; (b) an Instability ratio >6% indicates US residues. (c, d) For the intermediate range of Instability ratios (between 1% and 6%), the stability of the states is significantly influenced by the distribution of the transition timesteps. Consequently, the Logistic Regression model is used to classify the SS and US residues.

US and SS residues in β2AR receptor
Table S2 shows the list of residues classified as US and SS in the β2AR receptor using the RF model. The table also presents the angle switch ratio (ANSR) and atom switch contribution (ATSC) for each switch residue. The ANSR represents the ratio of switch angles to the total angles within a single residue. The table is ordered by ANSR value. As shown in the table, the Y209(5.48) residue has the highest angle switch ratio within the β2AR receptor. It is reasonable to assume that residues with higher ANSR values demonstrate stronger correlations with the global properties of a protein. The ATSC quantifies the extent