Computational analysis of dynamic allostery and control in the SARS-CoV-2 main protease

The COVID-19 pandemic caused by the novel coronavirus SARS-CoV-2 has no publicly available vaccine or antiviral drugs at the time of writing. An attractive coronavirus drug target is the main protease (Mpro, also known as 3CLpro) because of its vital role in the viral cycle. A significant body of work has been focused on finding inhibitors which bind and block the active site of the main protease, but little has been done to address potential non-competitive inhibition, targeting regions other than the active site, partly because the fundamental biophysics of such allosteric control is still poorly understood. In this work, we construct an elastic network model (ENM) of the SARS-CoV-2 Mpro homodimer protein and analyse its dynamics and thermodynamics. We found a rich and heterogeneous dynamical structure, including allosterically correlated motions between the homodimeric protease's active sites. Exhaustive 1-point and 2-point mutation scans of the ENM and their effect on fluctuation free energies confirm previously experimentally identified bioactive residues, but also suggest several new candidate regions that are distant from the active site, yet control the protease function. Our results suggest new dynamically driven control regions as possible candidates for non-competitive inhibiting binding sites in the protease, which may assist the development of current fragment-based binding screens. The results also provide new insights into the active biophysical research field of protein fluctuation allostery and its underpinning dynamical structure.

GENENMM -pdb 6lu7_holo1.pdb -c 8 -ca -het -mass -res -fcust ffile The crystallographic structure of a protein is assigned by -pdb flag. The cut-off distance (-c, by default 12 Å) was set to 8 Å and heteroatoms (HETATM in PDB notation) from the PDB file were included using -het, if required. Hookean spring strengths (by default 1 kcal Å −2 mol −1 ) around selected residues can be altered via the -fcust flag. The associated ffile specifies which residue springs' moduli are modified. By default, all ENM nodes' masses (atoms from PDB structure are converted into ENM nodes) are set to the same fixed value, 1 amu, but actual atomic masses (-mass) were used in our ENM. The -ca flag creates an ENM model where amino acids' Cα atoms form nodes. In combination with -ca, the -res flag assigns the Cα nodes the whole residue mass. GENENMM Generates the interaction matrix for the ENM, written into matrix.sdijf file. Next, the DIAGSTD function diagonalises the interaction matrix using small-block diagonalisation and iterative schemes (1).
where -i flag specifies input. The product of the diagonalisation is written into the matrix.eigenfacs file. We calculate the inter-atom distance and cross-correlation of motion maps for the Cα ENM of the protein using SPACING and CROSCOR functions, respectively. SPACING -pdb 6lu7_holo1.pdb -ca -het and CROSCOR -i matrix.eigenfacs -s 7 -e 31 -het where -s and -e flags indicate the first and the last normal modes to include in calculation. Note, the first six normal mode frequencies, which correspond trivially to translational and rotational motion, are equal to zero. Finally, fluctuation free energies for the set of the normal modes are calculated via the FREQEN function as follows FREQEN -i matrix.eigenfacs -s 7 -e 106 The temperature (T ) value used to calculate the partition function Z and Gibbs free energy G was 298 K, the default value for FREQEN function.
Normal mode frequencies of global modes in proteins are in the acoustic regime (less than 10 THz) (2,3). This fact is especially true for the slowest modes we investigated in this study. In this classical limit, ω kBT 1 at 298 K temperature. Therefore, FREQEN function in DDPT employs the following expression for fluctuation free energy calculation In the differences (∆G) and difference of a difference (∆∆G) in fluctuation free energy (Eq. 2) significance of 1 2 ω kBT term is negligible.
B.2. The N3 Inhibitor Coarse-graining. The N3 peptide-like inhibitor consists of four chemical constructs: 02J, AVL peptide, PJE and 010. DDPT reads each heavy atom from 02J, PJE and 010, and creates nodes with corresponding atomic coordinates and mass for each atom in elastic network. This procedure is default for the HETATM record type in PDB files with the chosen GENENMM flags. However, the AVL peptide is classified as ATOM record type, thus it is treated in the same way as the amino acids in the main chain: the tripeptide is represented as three nodes with each amino acid's alpha carbon coordinates and mass corresponding to the amino acids' mass.
In consequence, each ligand results in 32 elastic network nodes: 29 from 02J, PJE and 010 constructs, and 3 nodes from the AVL tripeptide -shown in figure S2.
C. Fluctuation Energy Convergence. The allosteric free energy change is calculated using the fluctuation free energy (Eq. 6) change for three forms of the protein: For the apo form, the fluctuation free energy curve is smooth and stably converging with the number of modes included (Fig.  S4). However, figure S5 shows a poorer convergence in the ratio of dissociation constants, due to the inherent noise-amplification in taking the difference of a difference in the (numerical) fluctuation free energies. However, the K2/K1 values fluctuate stably in the 1.02-1.06 range when the cut-off in mode number is between 23 and 33 modes. The higher-mode number structure that sets in beyond mode 50 is also apparent in the mutation scan calculations shown in S6. For this reason and three others stated in section 2 of the main paper text, we have chosen to stop summing at 25 non-trivial modes. The 1-point mutational scan for the apo form is stable in respect of the identification of biologically active sites for increasing values of mode-sum cut-off (Fig. S6A). The same mutational scan, but for allosteric energy change, is less stable (for the same reasons of numerical-difference as in the sensitivity of K2/K1 of figure S5 above) (Fig. S6B). Nevertheless, the identified bioactive sites persist stably with mode cut-off. As the sign of the stiffening modelling local mutation is changed, we observed a similar, and induced, sign change in the allosteric response but, again, the patterns of bioactive residue persist. Visual inspection of the hydrophobic environment around residue 214 and 284-286 (Fig. S7) informed the choice of magnitudes of kR/k in our ENM that corresponds to the experimental mutation.
The identified candidate regions that exhibit allosteric activity in our ENM study (Tab. 1) were visualised in 3D space (Fig.  1).

D. Definition of residues for dynamically allosteric control in SARS-CoV-2 M pro ENM.
We use results from the 1-point mutational apo scan (Fig. 3A) to suggest potential control residues on both homodimeric chains distant from the active sites. This map captures significant fluctuation free energy change in the ENM dynamics of residue 214 and 284-286 mutations. The active residue patterns converge with fluctuation mode summation. In order to be considered a candidate control residue, the residue must pass two criteria: 1. Absolute fluctuation free energy change must be greater or equal to the smallest absolute value of experimentally evidenced dynamically allosteric residues (214 and 284-286) at kR/k=0.25 or 4.00.
2. Cα node distance between the candidate residue and the catalytic active site residues (H41 and C145) in the ENM must be greater or equal to twice the ENM cut-off value (16 Å).
The first criterion filters out all residues which do not contribute to free energy change significantly; the second criterion ensures that the control effect of mutation or binding at the candidate residue is not caused by spatial proximity. We made only one exception (for E14) which is located on the homodimer interface; this showed desirable activity but was 15.1 Å away from C145. Residues that passed the two criteria above are documented in table 1 and coloured black in figure 1 of the main manuscript text.

E. 2-point spring constant mutational scan.
Double-mutation effects were explored from a continuous range of spring-stiffening and weakening in calculations of apo and allosteric free energies, restricted to the residue pair of C145 and H41, but with all possible pairwise combinations of spring constant change kR/k in range from 0.25 to 4.00 over the first real 25 fluctuation modes. Results are displayed in (Fig. S8). We note the following preliminary findings: • Region of wild-type values forms a curve that creates a larger parameter region for negative contributions to free energies in all apo scans.
• The allosteric scans indicate an approximately linear addition when the residue pairs are identical on the two chains, but when they differ generate a non-linearity in which stiffening at the C145 site has less effect than weakening.