SARS-CoV-2 Main Protease Inhibitors: Structure-Based Enhancement to Anti-Viral Pre-Clinical GC376 Encourages Further Development

SARS-CoV-2 Main protease (M pro) is pivotal in viral replication and transcription. M pro mediates proteolysis of translated products of replicase genes ORF1a and ORF1ab. Surveying pre-clinical trial M pro inhibitors suggests potential enhanced efficacy for some moieties. Concordant with promising in vitro and in silico data, the protease inhibitor GC376 was chosen as a lead. Modification of GC376 analogues yielded a series of promising M pro inhibitors. Design optimization identified compound G59i as lead candidate, displaying a binding energy of −10.54 kcal/mol for the complex. Robust interactivity was noted between G59i and M pro. With commendable ADMET characteristics and enhanced potency, further G59i analysis may be advantageous; moreover, identified key M pro residues could contribute to the design of neotenic inhibitors.


INTRODUCTION
Vaccination should be an effective strategy to manage SARS-CoV-2, but there may be other concurrent interventions. Because the virus is genomically volatile, variants such as omicron BA.5 are continually being identified with enhanced rates of transmission and infection. 1 Since a continuum of infection is likely with peaks and troughs 2 and high mutation rates are directly proportionate to greater resistance, combination therapy seems appropriate. Although the number of approved drugs is low, current resources are increasingly focused on improving the antiviral stock. 3 Moreover, efficacy of the approved oral anti-virals molnupiravir and paxlovid has limitations. 4 The mutagen molnupiravir reduces mortality in specific circumstances, but at lower rates than early trials suggested, and early administration is advised. 5 Paxlovid has statistically significant efficacy in high-risk patients (p = 0.0001). 6 Accordingly, the need for a range of antiviral drugs motivates continued research. Given that drug development ab initio is generally a protracted process, and the demand is urgent, drug repositioning is a prudent option.
Accordingly, in silico investigation of the coronavirus genome and life-cycle would be timely and potentially provide a focus for direct experimentation. Furthermore, possible interventions with other RNA viruses would be a valuable collateral benefit. In this context, GC376 is a promising inhibitory antiviral after the drug showed notable success in feline peritonitis virus 7,8 and was selected as a scaffold for this study to identify new compounds with improved efficacy. Attention in this study focused on important nonstructural proteins (NSPs) encoded by the SARS-CoV-2 genome, including two cysteine proteases: a papain-like protease (PL2 pro) and a 3-chymotrypsin-like protease (3CL pro), or main protease (M pro). Figure 1 locates these NSPs in the viral replicative cycle.
The SARS-CoV-2 RNA genome contains a 5′-methylated cap and 3′-polyadenylated tail, with two-thirds of the genome encoding the large replicase polyproteins 1a and 1ab (pp1a and pp1ab). These polyproteins are cleaved by PL2 pro and M pro at specific sites, producing 16 essential NSPs that are required for both viral replication and transcription. The functional leverage of M pro, combined with the absence of closely related human homologues, prioritizes the enzyme as an attractive target. The leverage potential of M pro inhibition is underlined by the validation of numerous inhibitor-enzyme crystal structures currently available in clinical trials. 9 M pro is cardinal in mediating replication and transcription pathways through the proteolysis of pp1a and pp1ab at 11 sites, yielding the salient NSPs 4-16.
The most efficacious currently approved M pro inhibitors were escalated in this study to docking protocols in silico and subsequent molecular dynamics (MD) analysis. Appropriate design objectives were high noncovalent binding affinities, optimum pharmacokinetic profiles and minimal side effects.
Reviewing the sequence of all peptides cleaved by M pro, a distinct motif is observed. The proteolysis cascade involves a four-step cycle (Fig. 2). Proteolysis always occurs between a Gln residue (P1) and a Ser, Ala, or Asn residue (P1′). Furthermore, a bulky nonpolar residue such as Val, Leu, or Phe always precedes the key Gln residue, at P2. 10 Hydrogen bonds are established between the natural substrate and residues Thr24, Thr26, Asn142, Asn143, His163, His164, Glu166, Gln189 and Thr190 of the active site. 11 This bonding aligns the peptide within the active site, promoting close contact between the target Gln residue and the M pro catalytic Cys145. The substrate P4-P1′ range is recognized as the most valuable area for this anchoring process, similar to that of SARS-CoV-2. 12 A recent study has revealed the proteolysis mechanisms of M pro by quantum mechanics/molecular mechanics methodology. 13 Understanding M pro structure and its proteolytic activity underpins the design of approved M pro inhibitors and should contribute to developing new and more efficacious drugs.

The active site residues of HIV and α-thrombin proteases show strong similarities with SARS-CoV-2 M pro. 14 GC376 is an antiviral with known inhibitory capacity against several viruses. 15 The possibility therefore arises that a structure developed in this study from GC376 may also be an inhibitor for these enzymes. This point is reinforced by identification of compound G59i interactions with SARS-CoV-2 M pro residues that have commonality with proteases of other RNA viruses. Compound G59i and its development from GC376 are shown in Sec. 3. The compound was named for the 59th analogue of GC376 (G59), then the 9th modification of G59, since "i" is the 9th letter of the alphabet.
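The cleavage motif described above (a bulky nonpolar residue at P2, Gln at P1, and Ser, Ala or Asn at P1′) can be expressed as a simple pattern search. The following is a minimal sketch only: the function name `find_candidate_sites` and the example peptide are illustrative assumptions, and a match indicates a sequence consistent with the motif, not a confirmed M pro cleavage site.

```python
import re

# Motif from the text: P2 is Val, Leu or Phe; P1 is Gln; P1' is Ser, Ala or Asn.
# A lookahead is used so that overlapping candidates are not skipped.
CLEAVAGE_MOTIF = re.compile(r"(?=([VLF])Q([SAN]))")


def find_candidate_sites(sequence):
    """Return (index_of_P1, "P2 Q|P1'") tuples for every motif match.

    index_of_P1 is the 0-based position of the Gln residue; the scissile
    bond would lie immediately after it.
    """
    sites = []
    for match in CLEAVAGE_MOTIF.finditer(sequence.upper()):
        p1_index = match.start() + 1
        p2, p1_prime = match.group(1), match.group(2)
        sites.append((p1_index, f"{p2}Q|{p1_prime}"))
    return sites


# Illustrative one-letter peptide string (an assumption, used only as a demo).
example = "TSAVLQSGFRKMAFQNGT"
print(find_candidate_sites(example))  # [(5, 'LQ|S'), (14, 'FQ|N')]
```

A scan of this kind only narrows the search; substrate recognition across the wider P4-P1′ range would still need structural evaluation.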
This study found the compound G59i to be a promising M pro inhibitor compared with GC376. Compound G59i displays enhanced binding affinity and has an encouraging ADMET prediction. MD analysis indicates the SARS-CoV-2 M pro-G59i model is stable into biologically relevant times.

Molecular docking
Molecular models were constructed in Discovery Studio Visualiser (Client) v16.1.0.15350 (DSV Client), BioVia Draw 4.1 (BVD) 16 and UCSF Chimera v1.15. 17 Docking protocols were rendered in AutoDock v4.2.6 18 downloaded from MGLTools v1.5.6 and visualized via PyMOL 2.4 19 and DSV Client. Drug relevance was estimated with OSIRIS Property Explorer. 20 M pro in complex with the plant flavonoid baicalein (PDB 6M2N) was selected as macromolecule because of suitable X-ray diffraction resolution (2.20 Å) and its homodimeric complex with a noncovalent inhibitor. Absence of a covalent inhibitor preserves the natural His41-Cys145 dyad state, allowing for an improved estimation of the binding mechanisms. Separation of M pro from the enzyme-substrate complex was completed in UCSF Chimera and the isolated protein file was uploaded to AutoDock Tools. Further preparation included the separation of the monomer (Chain-A) from the tetramer complex. Kollman charges were assigned and AutoDock atom types were applied.
2D ligands designed in BVD were converted into tertiary structures with DSV and submitted to the docking software. Flexibility was assigned to each ligand by defining the rotatable, unrotatable and nonrotatable bonds. AutoDock maps co-ordinates for docking to an active site in a 3D grid box, saving parameters as a grid parameter file (.gpf).

Fig. 2. Four-step proteolysis cycle of M pro. 10 First, a proton is transferred from Cys145 to His41, together with the nucleophilic attack on the carbonyl carbon atom of the peptide bond by the sulfur atom of Cys145, thus leading to a thiohemiketal intermediate, E:I1. Second, the cleavage of the peptide bond between Gln at P1 and the residue at P1′ is assisted by the proton transfer from the protonated His41 to the nitrogen atom of the P1′ residue, forming the acyl-enzyme complex intermediate, E:I2. Third, the P1′ polypeptide branch is released from the active site and an activated water molecule attacks the carbonyl carbon atom of the P1 Gln, concomitant with a proton transfer to His41, resulting in E:I3. Finally, the covalent bond between Cys145 and Gln in this protonated thiohemiketal intermediate is broken to release the second product species (P1 polypeptide branch) of the reaction, E:P.
Supplementary Table S1 itemizes the co-ordinates for the 6M2N grid box.
AutoDock implements four different search algorithms: simulated annealing, genetic algorithm (GA), local search and a hybrid global-local Lamarckian GA (LGA). LGA was employed in these AutoDock experiments as the primary conformational search algorithm. LGA generates a trial population of conformations of each ligand and refines it over successive generations, leading to a selection of conformations possessing the minimum binding energy. The search is computationally cheap, so docking procedures take less time to complete. Each conformation generated by LGA is evaluated with the AutoDock semi-empirical free-energy scoring function, which yields the estimated binding affinity of the docking simulation. The parameters specifying which algorithm to use are contained in a docking parameter file (DPF).
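The idea behind a Lamarckian GA can be illustrated with a toy one-dimensional problem: a global genetic search proposes candidates, a local search refines each one, and the refined value is written back into the individual before selection. The sketch below is an assumption-laden teaching example, not AutoDock's implementation; the functions `local_search`, `lamarckian_ga` and `toy_energy` are hypothetical, and a real docking run optimizes ligand position, orientation and torsions against a docking score.

```python
import random


def local_search(x, score, step=0.5, iterations=20):
    """Greedy descent: try x - step and x + step, halve the step on failure."""
    best, best_score = x, score(x)
    for _ in range(iterations):
        improved = False
        for candidate in (best - step, best + step):
            s = score(candidate)
            if s < best_score:
                best, best_score, improved = candidate, s, True
        if not improved:
            step *= 0.5
    return best


def lamarckian_ga(score, bounds, pop_size=20, generations=30, seed=0):
    """Minimize score(x) on an interval with a toy Lamarckian GA."""
    rng = random.Random(seed)
    low, high = bounds
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally optimized value replaces the genotype.
        population = [local_search(x, score) for x in population]
        population.sort(key=score)
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = 0.5 * (a + b) + rng.gauss(0.0, 0.1)  # crossover + mutation
            children.append(min(max(child, low), high))
        population = parents + children
    return min(population, key=score)


def toy_energy(x):
    """Hypothetical 'energy' with a single minimum of -10 at x = 2."""
    return (x - 2.0) ** 2 - 10.0


best = lamarckian_ga(toy_energy, bounds=(-10.0, 10.0))
print(round(best, 2), round(toy_energy(best), 2))
```

Writing the refined value back into the population is what distinguishes the Lamarckian variant from a plain GA, where local improvements would not be inherited.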
Fifty runs were completed for each docking. A range of binding energy between consensual efficacy (−5 kcal/mol) and potential toxicity contingent on irreversible noncompetitive inhibition (−15 kcal/mol) was selected. 21 Binding energy was graded within these boundaries; interactions and the residues involved were noted. The lowest binding energy conformation was selected from the pdbqt data for the isolated enzyme. For visual assessment, this was opened in DSV. AutoDock returns for the four G59i stereoisomers were identical, except that the binding affinity for isomer-1 (Supplementary Fig. S5(a)) was marginally lower.
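The selection rule above (take the lowest-energy conformation, graded within the −5 to −15 kcal/mol window) can be sketched as follows. The function name and the example energies are invented for illustration, and treating the −5 boundary as inclusive and the −15 boundary as exclusive is an assumption, since the text does not specify this.

```python
# Window quoted in the text (kcal/mol).
EFFICACY_LIMIT = -5.0    # energies at or below this are taken as efficacious
TOXICITY_LIMIT = -15.0   # energies at or below this are flagged as too strong


def best_pose_in_window(run_energies):
    """Return (run_number, energy) of the lowest-energy run inside the window.

    Runs are numbered from 1. Returns None if no run falls inside the window.
    """
    in_window = [
        (energy, run)
        for run, energy in enumerate(run_energies, start=1)
        if TOXICITY_LIMIT < energy <= EFFICACY_LIMIT
    ]
    if not in_window:
        return None
    energy, run = min(in_window)
    return run, energy


# Made-up energies standing in for the runs of one docking job.
example_runs = [-7.9, -10.54, -4.2, -9.8, -16.1]
print(best_pose_in_window(example_runs))  # (2, -10.54)
```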
Overlay method, chemical resemblance and binding energy were all used for the validation of the docking methodology (Supplementary Figures S1-S3). The metrics indicated good similarity of fit between the crystallized baicalein-M pro complex and the docked complex (Fig. 3).

Analogue design
The docking and in vitro profiles of pre-clinical trial M pro inhibitors were analyzed to determine a lead compound(s). Baicalein and GC376 showed the most promising results, followed by hydroxychloroquine (Table 1). However, because adverse side effects are associated with the treatment of COVID-19 using hydroxychloroquine, this was not continued as a lead compound. Other molecules reviewed were novel compounds with no current in vivo data. Therefore, derivative analogues would be created with unknown pharmacokinetic and pharmacodynamic properties (other than in silico). AutoDock allows a 3% tolerance on predicted values for affinity. 22

Molecular candidates were designed in BVD and Property Explorer OSIRIS (PEO), primarily using bioisosteric replacement of lead compounds. 20 Subsequent analogues were modified using structure-activity relationships and a structure-based drug design. Key residues, binding energies and interactions were tabulated. As an acceptable method of lead candidate validation, molecular designs were submitted to SwissADME for estimation of absorption, distribution, metabolism and excretion properties. 27 Based on comparison with archived substructure searches, the software determines substantial predictive models of drug likeness based on lipophilicity, solubility, medicinal chemistry, pharmacokinetic and physicochemical properties. ADMET characteristics were further reviewed with PEO. Drug relevance was predicated on toxicity risk, octanol-water partition coefficient, aqueous solubility calculation, molecular weight, fragment-based drug-likeness and overall drug score.

Toxicity risk assessment - This prediction is derived from an archive of pre-determined fragments (core and functional groups) taken from the Registry of Toxic Effects of Chemical Substances database. The drug is flagged if it contains fragments that display one or more of mutagenicity, tumorigenicity, irritating effects, or reproductive effects.
Octanol-water partition coefficient (log P) calculation - This calculation method is implemented by adding contributions of every atom based on its atom type. 368 atom types are distinguished by the atom properties (atomic number and ring membership) and neighboring atom properties (bond type, aromaticity state and atomic number). The accepted range for this follows Lipinski's rule of 5, i.e., log P value ≤ 5.
The aqueous solubility (log S) calculation - This calculation aims to avoid poorly soluble compounds. The same properties are taken into consideration as in the log P calculation. The favorable log S range is > −4.
Molecular weight - The molecular weight is accepted as < 450 Daltons.
Fragment-based druglikeness and overall drug score - Druglikeness is based on a list of ~5300 substructure fragments from Fluka research chemicals 28 with associated druglikeness scores. The score values of those fragments that are present in the molecule under investigation are summed and a value is given. The drug score combines this druglikeness with the four properties above in a single equation (Eq. (1)) that is used to deduce the compound's overall potential as a drug.

$$ ds = \prod_{i} \left( \tfrac{1}{2} + \tfrac{1}{2} s_i \right) \cdot \prod_{i} t_i \qquad (1) $$

where ds = drug score, s_i are the contributions calculated directly from cLogP, logS, molecular weight and druglikeness, and t_i are the contributions taken from the four toxicity risk types. A drug designed in PEO was redrawn with BVD and transferred to Microsoft Word for record.
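The numeric acceptance criteria and the drug score can be sketched together. The cut-offs (log P ≤ 5, log S > −4, molecular weight < 450 Da) come from the text; the product form ds = Π(½ + ½ s_i) · Π t_i is the form assumed here for Eq. (1), following the symbol definitions above, and should be checked against the OSIRIS documentation. The function names and all example numbers are hypothetical and are not G59i values.

```python
from math import prod


def passes_property_filters(clogp, logs, mol_weight):
    """Apply the three numeric cut-offs quoted in the text."""
    return clogp <= 5.0 and logs > -4.0 and mol_weight < 450.0


def drug_score(s_values, t_values):
    """Assumed form of Eq. (1): ds = prod(0.5 + 0.5 * s_i) * prod(t_i).

    s_values: contributions in [0, 1] derived from cLogP, logS, molecular
              weight and druglikeness.
    t_values: one factor per toxicity risk type (1.0 means no risk flagged).
    """
    return prod(0.5 + 0.5 * s for s in s_values) * prod(t_values)


# Invented inputs for illustration only.
print(passes_property_filters(clogp=3.1, logs=-3.5, mol_weight=420.0))
print(round(drug_score([0.9, 0.8, 0.7, 0.6], [1.0, 1.0, 1.0, 1.0]), 3))
```

Because the score is a product, a single flagged toxicity risk (t_i < 1) lowers the overall score regardless of how favorable the other properties are.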
A covalent bond with Cys145 characterizes most crystallized M pro inhibitors as potentially irreversible. Accordingly, analogues in this study were designed specifically as reversible noncovalent inhibitors. Candidates were submitted to Raccoon v1.0b for molecular screening and ADMET profile predictions were detailed in SwissADME. 27 AutoDock data of natural compounds and repurposed drugs, e.g., the anti-viral remdesivir, sampled in this study for docking with an M pro isolate are itemized in Table 1. Baicalein and GC376 were selected as scaffolds predicated on binding affinity and the number of key residues available for ligation.

Molecular dynamics
MD simulations were run in Nanoscale Molecular Dynamics (NAMD) 2.13 Linux-x86_64-multicore with the default CHARMM27 force field. NAMD applied a hydration cube with 15 Å sides, giving a total of 24,197 atoms, and the Particle Mesh Ewald summation was set to a cut-off at 8 Å. The default application Visual Molecular Dynamics (VMD) was accessed as graphical user interface.
The water model used was AMBER TIP3P 29 with an explicit solvent. Periodic boundary conditions (PBCs) were applied and the system neutralized by addition of an appropriate number of Na+ counter-ions via the VMD Autoionize plug-in v.1.5. A median salt concentration of 150 mM was implemented using the Tcl interface of NAMD to reflect physiological conditions, and pH was retained at the default setting of 7.0.
Systems were minimized by 50,000 × 0.01 Å steps of steepest descent and 5000 steps of equilibration. Initial and target temperatures were 0 K and 310 K, respectively. The default setting of zero was used for the pressure variable. A constant volume simulation for 100 ps was run at 310 K with the barostat off to thermalize the system. The Langevin algorithm was incorporated as barostat in the configuration file. To provide good equilibration, the models were then run in a constant pressure simulation for 100 ps at 310 K with the barostat on. Hydrogens were constrained using the SHAKE algorithm. 30 For production runs, the initial temperature was 0.0 K and the system was heated to the nominal physiological temperature of 310 K. Simulations were set at 300 ns and a time-step of 2 fs was used for plotting trajectories.
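The simulation lengths quoted above translate into integration step counts by simple division. The short sketch below shows the arithmetic; it assumes that the 2 fs time-step stated for the production run also applies to the two 100 ps equilibration stages, which the text does not state explicitly. The helper name `steps_for` is illustrative.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000


def steps_for(duration_fs, timestep_fs=2.0):
    """Number of integration steps needed to cover duration_fs."""
    return round(duration_fs / timestep_fs)


nvt_steps = steps_for(100 * FS_PER_PS)         # 100 ps constant-volume stage
npt_steps = steps_for(100 * FS_PER_PS)         # 100 ps constant-pressure stage
production_steps = steps_for(300 * FS_PER_NS)  # 300 ns production run

print(nvt_steps, npt_steps, production_steps)  # 50000 50000 150000000
```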
Metrics for analyzing MD results were root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (R G), H-bond frequency, time-dependent (TD) secondary structure formation, TD torsion angles (delta-φ and delta-ψ), TD ionic bond formation and TD solvent accessibility surface area.
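For readers less familiar with the first three metrics, the definitions can be written out directly. This is a minimal pure-Python sketch under stated assumptions: coordinates are lists of (x, y, z) tuples in Å, frames are taken to be already superimposed on the reference (no fitting is performed here), and the function names are illustrative rather than those of any analysis package used in the study.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equally sized coordinate lists."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords, reference))
    return math.sqrt(total / len(coords))


def rmsf(trajectory):
    """Per-atom fluctuation about the time-averaged position."""
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        msf = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass."""
    total_mass = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total_mass for k in range(3)]
    weighted = sum(m * math.dist(c, com) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / total_mass)


# Two-atom, two-frame toy trajectory (invented numbers).
frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(rmsd(frame1, frame0), rmsf([frame0, frame1]), radius_of_gyration(frame0, [1.0, 1.0]))
```

RMSD summarizes the whole structure per frame, whereas RMSF is resolved per atom and averaged over time, which is why the two can tell different stories about the same trajectory.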

Design of G59i
The series of modifications in the development of G59i are shown in Fig. 4. GC376 is a prodrug for GC373, which bonds covalently with the Cys145 residue of a protease and thereby critically impedes viral function. GC376 fits well into the M pro binding site through H-bonding with seven residues (Fig. 5). This network involves the dyad residue His41, Phe140, Ser144, Glu166, His164, His172 and Gln189. Further π-alkyl interactions are seen with residues His41, Met49 and Pro168. The 6-carbon cyclic of GC376 and Met165 have a π-sulfur interaction. Recent research also identified GC376 as an efficacious inhibitor of M pro 26 and in vivo analyses indicate a lack of obvious toxicity. 31 GC376 was selected as a scaffold predicated on aspects of validated research confirmed in this study. GC376 analogues were designed and docked with an M pro isolate, endeavoring to enhance binding affinity, pharmacokinetics and ADMET attributes. Attention focused on analogue G42 based on these metrics.
G42 was designed by bio-isosteric replacement of GC376 functional groups. O18 of the 5C-cyclic was removed to reduce the polar surface, which was predicted as being too high. Sulphonic acid was replaced by indoline to reduce polarity and quantity of rotatable bonds. Butyl was replaced by a cyclopentylamine to improve OSIRIS drug score calculations. An increase in binding affinity from −6.84 to −8.64 kcal/mol was noted. Interacting residues are shown in Fig. 6.
G42 binds to M pro His41, Met49, Leu141, Ser144, Cys145, Met165, Glu166, Asp187 and Arg188. G42 indoline settles in S1, where pyridine forms two weak H-bonds with Cys145 thiol. The thiol hydrogen acts as an HBD for both, establishing the π-donor 3.65 Å H-bond with the phenyl ring and the conventional 4.11 Å H-bond with the tertiary amine. Three carbon-H-bonds are created between the pyrrolidine of G42 and the hydroxide Glu166 and Ser144 hydroxyl groups, and Leu141 carboxyl. G42 displays VDW interactions in S1 and S3 with residues Phe140, Gly143, His163, His164 and His172. An unfavorable donor-donor interaction is also observed between the Glu166 amino group and the secondary amide sitting in S3. The cyclopentylamine settles in the S2 pocket with VDW interactions to Cys44, Leu50, Tyr54, Pro52 and Gln189. The primary amine forms H-bonds with Asp187 and Asp48 carboxyl groups, each hydrogen acting as an HBD. The nitrogen heteroatom is an HBA in a carbon-H-bond with Arg188. Met49 thioether and His41 imidazole form alkyl and π-alkyl interactions with the G42 cyclopentylamine fragment, respectively. The lone phenyl group extends deep into the S4 pocket, where a π-sulfur interaction is made with Met165 thioether. G42 and S4 residues Leu167, Pro168 and Gln192 have VDW interactivity.
G59 was derived from G42 by removing the phenyl group to reduce the number of rotatable bonds. The amide heteroatoms preceding this were also removed as interactions with M pro were absent. This fragment was replaced with a tetrahydroisoquinoline group, and double bonding with cyclopentylamine. Imine replaced amine to reduce the number of rotatable bonds. The secondary amide was replaced by carbonyl to remove an unfavorable donor-donor bond with Glu166 and to provide an HBA. Indoline was replaced with indane-2-one to extend the functional group into the S1 pocket. The weak H-bond formed with the tertiary amine was removed to improve drug score and allow for potential binding between the carbonyl and Cys145.
Interestingly, docked G59 (Fig. 7) shows indane-2-one extending deep into the S4 pocket rather than S1, which had seemed more plausible. [...] phenol via π-stacking and π-cation interaction. Piperidine amine and carbonyl form an intramolecular H-bond. The amine also H-bonds with His164 carboxyl. G59 carbonyl also forms H-bonds with Glu166 amino, replacing the unfavorable donor-donor bonding of G42. S1 VDW interactions are absent, but are present in S1′ with Thr25, Val42 and the key residue Cys145.
G59i (Fig. 8) was predicated on the G59 docking results. S1′ hydroxyl was removed because of inactivity. Tetralin phenyl was thiolated at a different location, to mimic the thiol-amine H-bond in the His41-Cys145 dyad. S3 carbonyl was replaced with hydroxyl with the intention of establishing additional H-bonds with residues Glu166, Met165 and His164. The cyclopentylimine of G59i, as seen with the previous inhibitors, inserts deeply into the S2 subsite, establishing VDW interactions with Asp48, Pro52, and Gln189. Three H-bonds are formed with Met49, Asp187 and Arg188. Asp187 and Met49 carbonyl oxygen and G59i nitrogen act as HBAs. G59i cyclopentyl established an alkyl interaction with Met49 thioether and Cys44 thiol. A π-alkyl interaction occurs between His41 imidazole and the S2 cyclopentyl fragment. G59i conformation reverts to the intended conformation of G59, with indane-2-one sitting deep within S1 and tetralin within the S3/S4 area. Indane-2-one phenyl group forms two π-donor H-bonds with Cys145 thiol and Asn142 primary amide. Indane-2-one carbonyl oxygen, as an HBA, bonds His172 imidazole. An extensive network of S1 VDW interactions involves Phe140, Leu141, Ser144, Gly143 and His163. Tetralin thiol H-bonds with the carboxyl of S4 Glu166. Met165 thioether establishes a π-alkyl interaction with tetralin phenyl. Additionally, Leu167 and Gln192 display VDW interactions with G59i in S4. The same G59 intramolecular H-bond is apparent in G59i with reduced distance (from 2.60 Å to 2.14 Å). Analogue G59i was predicted to have enhanced druglikeness and was submitted for more detailed analysis.

Molecular dynamic results of the G59i-M pro complex
MD analysis established that the M pro-G59i complex maintained a kinetic energy level of 15 kcal/mol ±2% for 300 ns. RMSD of the ligand showed a slight upward trend from ~1.6 Å at t 0 to ~2.9 Å at t 300 but no significant configurational changes (Supplementary Figure S7). RMSD for the M pro-G59i complex indicated some dynamic activity in most residues between Asp56 and Asp92 for a short period, and to a lesser extent later in the simulation. The overall system retained structural integrity. RMSD of 39.71 was calculated for the entire complex. RMSF expressed as a per residue timeline shows very little fluctuation, indicating that averaged over the simulation, individual particles have minimal Cartesian displacement compared with a reference point (Supplementary Figure S8). This implies the RMSD value mainly reflects displacement of molecules in the water box applied during MD simulation.
H-bond frequency showed the complex has dynamic but stable characteristics: the majority of H-bonds are nontransient, whilst a few are temporary but iterative (Supplementary Figures S9 and S10). The mean frequency of formation, formatted as a histogram, was determined to be 46 frame⁻¹. The radius of gyration (R G) trajectory for M pro-G59i versus water showed an expected upward trend after minimization, which continued with a reduced gradient (Supplementary Figure S11). The expansion derived almost exclusively from the water-box dynamics. Although PBCs were applied, some water molecules leave the system temporarily before reinstatement within the box at a different site.
A per residue timeline of M pro-G59i complex φ torsional angles indicates minimal fluctuation (Supplementary Figure S12a). Several consistent torsional angles are predicted by the ψ metric as a per residue timeline with only marginal background variation. Taken together with φ torsions, individual residues when complexed with G59i maintain a stable configuration for the duration of the docked MD simulation (Supplementary Figure S12b).
M pro-G59i complex ionic bond formation as a per residue timeline indicates the data resonate closely with RMSD, RMSF and H-bond frequency data. Only minor variations in molecular integrity of the complex are predicted (Supplementary Figure S12c). Changes in torsional angles for the M pro-G59i complex show very little variation from their typical values, φ = −140° and ψ = −130°. Exceptions with intermittent reduced φ angles are as follows: Cys44, Ser158 and Glu166. Ala191 has increased φ angles for the duration of the simulation. Increased ψ angles are noted in Arg188 and intermittent reduced φ angles in Asp56 and His172. Cys44, His172, Arg188 and Glu166 have noncovalent interactions with G59i.
Aspartate residues show numerous ionic bonds. All except some Asp56 and Arg60 links are intermittent or transient. Asp58 and Asp187 appear to be particularly active. The latter participates in M pro-G59i ligation, as does Asp48, which shows limited interaction with the ligand. Ionic bonding may reinforce other noncovalent interactions in the active site.
Ramachandran plotting predicts most secondary structures in the M pro-G59i complex generally have favorable torsion angles (Fig. 9). Most parallel and antiparallel β-sheets are evenly located in the core region of the −φ/−ψ quadrant and almost all in the allowable region. α-helices are slightly less favorable since ~50% are in the allowable region. Three left-hand α-helices are located in the allowable region, and several outliers in the +φ/−ψ quadrant are noted. These may be insignificant errors in the model or inconsequential features of the complex.
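The torsion angles plotted in a Ramachandran diagram are dihedral angles defined by four consecutive backbone atoms: φ uses C(i−1), N(i), Cα(i), C(i), and ψ uses N(i), Cα(i), C(i), N(i+1). A minimal sketch of the calculation is given below; the helper names and the quadrant labels are illustrative assumptions, and the sign of the result follows the convention of the formula used here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def quadrant(phi, psi):
    """Label the Ramachandran quadrant by the signs of phi and psi."""
    return f"{'+' if phi >= 0 else '-'}phi/{'+' if psi >= 0 else '-'}psi"


# Idealised test geometry (not protein coordinates): a 90 degree twist.
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))))
print(quadrant(-60.0, -45.0))
```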
Variation in secondary structure for the M pro-G59i complex was observed to be low. Extended-configuration β-sheets and β-sheet turns showed almost no reconfiguration. Between Cys44 and Met49, β-sheet turns and 3₁₀ helices were found to alternate with apparent periodicity. The region also showed very occasional transient α-helix formation. β-sheet turns appear to alternate with random coils between Asn95 and Thr98 and also between Val186 and Asp197. Similarly, α-helices in the regions between Ser10 and Glu14 and between Glu55 and Asn63 alternate transiently with β-sheet turns.
Solvent accessibility surface area analysis suggests little or none of G59i is accessible to a solvent. This tends to reinforce data from other analyses, since molecular properties may differ for a ligand superficial to M pro rather than internal. Leu30, Ile59, Arg76, Val104, Cys160 and Phe181 return values that show these residues are probably solvent accessible.

Key binding mechanisms
Several M pro amino acids have been identified here as essential for inhibitor binding and possible de novo drug design. In terms of inhibitor binding mechanisms, residues can be grouped into the subsites where they are most active.

Subsites 1 and 1′
Subsites 1 and 1′ (S1 and S1′), specifically the His41-Cys145 dyad, are the most important target areas. Six current inhibitors form a covalent bond with Cys145. However, these covalent inhibitors are not currently clinically tested. Potential problems associated with irreversible inhibitors discourage the design of such agents. The potent inhibitors that fail to bind covalently to Cys145 consistently interact with the residue by an alternative method. For example, G42 and G59i both form π-donor H-bonds with Cys145, and G59 binds via π-interactions. An established H-bond, π-interaction, or VDW interaction is present between all of the top-ranked docked GC376 analogues and Cys145. The most promising inhibitors all interacted with His41, but not all formed H-bonds. However, interactions with residue His41 are much less common among the GC376 analogues.
The three key residues of S1 are Phe140, Asn142 and His172. Inhibitors often interact with Phe140 by H-bonding with its carboxyl group. This interaction is typified inter alia by GC376. The phenyl group of remdesivir sits shallowly in the S1 pocket and displays interactions with the side chain of Phe140. GC376 analogues have almost exclusive VDW interactions with Phe140. In this study, 166 compounds were docked and analyzed: GC376 + 93 analogues, baicalein + 67 analogues, carmofur, chloroquine, hydroxychloroquine and remdesivir. From these samples, only GC376 and remdesivir interact with His172. Table 2 itemizes the compounds with most favorable binding affinity, including GC376 and G42.
The carboxyl group of GC376 forms a carbon-hydrogen bond with His172 imidazole, and the phenyl ring of remdesivir interacts with the same group via stacking. A hydrogen bond similar to that observed between M pro residue His172 and compound GC376 also arises in G59i. H-bonding in both cases is between a carbonyl functional group and His172.

Subsite 3
Glu166 extends from S1 to S4 in close proximity to the active site. Probably contingent on its position, Glu166 is important for binding of most M pro inhibitors and spans multiple subsites. The Glu166 backbone sits in subsite 3 (S3), where secondary amino and carboxyl groups regularly H-bond. The carboxyl side chain locates in S1, usually as an HBA. Further Glu166 interactions arise in S4, as seen with the thiol-carbonyl hydrogen bond formed with G59i. Met165 is another key amino acid proximal to Glu166, where the side chain permeates S3. Most of the prime inhibitors interact with Met165 thioether. At His164, the amino acid chain diverges from the active site. However, His164 carboxyl carbonyl angles towards S3, acting as an HBA. This H-bond arises in GC376, chloroquine and G59.

Suggested drug design for M pro inhibition
The His41-Cys145 dyad of M pro is best targeted at the Cys145 thiol group in S1. The thiol readily yields its proton to His41 imidazole, and the electron-rich sulfur atom will undertake a nucleophilic attack on a proximal carbocation. A primary or secondary carbocation is suitable for this covalent bonding. Cys145 covalency has been achieved by using a vinyl group, hydroxymethylketone and aldehyde group. 33 Replacing hydroxymethylketone with a reactive nitrile species underpins the development of Paxlovid. 34 The current in silico study postulated that a noncovalent bond with Cys145 may also be achieved via HBA or HBD interaction with an amine, amide, carboxylic acid or hydroxyl species. Potential toxicity associated with covalency may be avoided as a collateral benefit. These can be employed as substituents between S1 and S1′, potentially forming hydrogen bonds with the Cys145 thiol, His41 imidazole ring, or the Asn142 carboxamide side chain. Contingent on the orientation and depth of these substituents in S1 or S1′, there is also a potential for H-bonding with Leu141 and Ser144.
A heterocyclic with an HBD extended back into S1 and an HBA/HBD towards S3 is suggested at S1. This should interact with Phe140 phenyl and His172 imidazole through π-interactions. The HBD is predicted to H-bond with Phe140 carboxyl, whereas the HBA towards S3 could interact with Glu166 carboxyl. By utilizing the space within S1, exploitation of the S1 VDW system is also possible. Hydrophobic functional groups are suitable for S2. The available nonpolar 6C cyclic of baicalein establishes multiple π-interactions in S2 with Cys44, His41 and Met49. The docking simulations have shown that the presence of a cyclopentylamine/cyclopentylimine at S2 is propitious in the S2 VDW web, as π-interactions are replaced by potential carboxyl Met49, Asp187 and Asp48 H-bonds.
The main chain amide and carboxyl of Glu166 can be targeted in S3. Most inhibitors here take advantage of the Glu166 backbone. The addition of an HBD or HBA in S3, or peripheral to S1 and S4, will likely lead to H-bonding with Glu166. If an inhibitor sits deep enough in S3, an HBD on the chain between S1 and S2 could H-bond with the His164 backbone carboxyl.
A heterocyclic is optimal at S4. An HBD or HBA might H-bond with Gln189, Glu166, Gln192 and possibly Pro168. VDW and π-interactions arise with Met165, Pro168, Gln189 and Glu166. The heterocyclic indane of G59 exemplifies this in S4, H-bonding with Pro168 and Gln192, with a π-interaction with Met165. The GC376 analogue G59i derived from these observations shows improved binding energy and pharmacokinetics compared to current M pro inhibitors.

G59i
A large number of compounds related to GC376 were tested for our modeling study (Table 2). G59i achieved pre-eminent inhibitory characteristics; the predicted ADMET profile and pharmacokinetics satisfy all criteria for druglikeness (Supplementary Figure S6, Supplementary Table S3). The molecule is predicted not to cross the blood-brain barrier (BBB) and has a synthetic accessibility factor of 4.27, suggesting preparation of G59i should be straightforward. G59i was the most potent inhibitor (binding affinity −10.54 kcal/mol) of our designed GC376 analogues, interacting with multiple key residues, including the His41-Cys145 dyad (Fig. 8).
G59i forms seven H-bonds within the active site between the key residues Asn141, Cys145 and His172, in S1/S1 0 ; Met49, Asp187 and Arg199, in S2; and Glu166 in S4.The G59i orientation within S2 facilitates the important VDW network, and -interactions with key residues His41 and Met165 (Fig. 10(a)).GC376 analogues are generally poorly interactive with Phe140.This is likely due to the functional groups being too shallow in the S1 pocket, or alternatively bearing towards S3.This fragment orientation is present in G59i binding, probably derived from intramolecular Hbonds pulling indane-2-one towards S3 and Glu166 (Fig. 10(b)).
Compared with GC376, G59i has enhanced binding energy (À6.84-À10.54kcal/mol).It binds with an additional H-bond, and lacks the covalent Cys145 bond of GC376.G59i is predicted to have a commendable ADMET profile (Supplementary Fig. S6).When comparing G59i to other current M pro inhibitors, the binding mechanisms are unique; the G59i S2 interactions are unseen in other compounds.Current inhibitors mimic the natural peptide ligand.Whilst this was factored into the creation of G59i, accounting for subsite properties was foremost.This, and the larger number of predicted H-bonds between G59i and M pro , is encouraging.
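To give a sense of scale for the difference between the two docking energies, the standard thermodynamic relation ΔG = RT ln K_d can be used to convert an energy difference into a ratio of dissociation constants. The sketch below is illustrative only: it assumes a temperature of 298.15 K and treats the docking scores as if they were true binding free energies, which they only approximate; the function name `kd_ratio` is introduced here and is not part of the study's workflow.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_ratio(dg_ref, dg_new, temperature=298.15):
    """Return Kd(new)/Kd(ref) implied by two binding free energies (kcal/mol).

    Uses dG = RT ln(Kd). The temperature is an assumed value.
    """
    return math.exp((dg_new - dg_ref) / (R_KCAL * temperature))


dg_gc376 = -6.84   # docking energy reported for GC376 (kcal/mol)
dg_g59i = -10.54   # docking energy reported for G59i (kcal/mol)

ddg = dg_g59i - dg_gc376              # energy difference, about -3.70 kcal/mol
ratio = kd_ratio(dg_gc376, dg_g59i)   # Kd(G59i) / Kd(GC376)
fold = 1.0 / ratio                    # implied fold improvement in affinity

print(f"ddG = {ddg:.2f} kcal/mol, implied fold change = {fold:.0f}")
```

Under these assumptions, a difference of about 3.7 kcal/mol corresponds to a predicted affinity improvement of several hundred-fold. Because docking scores are approximate, this figure should be read as an order-of-magnitude indication rather than a measured value.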

CONCLUSION
SARS-CoV-2 M pro is a valid target for anti-viral drugs. This study has found that His41, Met49, Phe140, Asn142, Met165, Glu166, Asp187, Arg188 and Gln189 are potentially important in M pro inhibition. However, residues His172 and Tyr54 may be less significant than customarily suggested. Instead, previously unacknowledged residues Asn142, His163 and His164 may be preferential for de novo design. Although VDW interactions are nominally weak, two networks in the hydrophobic S1 and S2 pockets are utilized in all prime inhibitors. The data collected here implicate the S1, S1′ and S2 pockets as essential target regions for inhibitor design. Modified from GC376, structure G59i has the characteristics of a promising M pro inhibitor, showing greater binding affinity than current M pro inhibitors in silico. G59i has advantageous pharmacokinetics and no apparent toxicity concerns. Detailed MD analysis indicates that the structural integrity of M pro -G59i is stable in a biologically relevant context. Moreover, several important SARS-CoV-2 and M pro features clarified in this study could be relevant to inhibitor design. G59i may also inhibit proteases in other RNA viruses, which strongly recommends it for further development.

FUTURE PERSPECTIVES
The COVID-19 pandemic is still globally prevalent as of 2022, with no eradication predicted in the near future. Despite worldwide efforts, many people are still suffering from financial, mental and physical trauma. Whilst vaccines are currently the most promising solution in controlling this pandemic, there are still concerns surrounding their use. Vaccine production, distribution, storage and administration are financially challenging and intricate, which often prohibits availability to underdeveloped nations. The difficulty is exacerbated by the SARS-CoV-2 RNA genome, whose genetic instability induces more mutations, increasing the challenge of developing an effective vaccine.
Targeting viral proteases has shown promising results in past treatments, 35,36 thus identifying a promising avenue for antiviral drug development. With reference to M pro, this study has highlighted numerous potential drug compounds under consideration. The identification of noncovalent drugs, although time consuming, offers the safest opportunity to develop a potential inhibitor. The unpredictability of the required treatment length makes the minimization of associated side effects a priority. Several crystallographic structures of ligands bound to M pro have been established. These encouraging benchmarks are highly relevant to future work. Continued insight into SARS-CoV-2 molecular biology should consolidate the rational design of potential new inhibitors.
We anticipate active participation in this strategy and are optimistic that in vitro projects will endorse and develop our findings. We note that these may partially address concerns regarding ongoing SARS-CoV-2 variants in particular and RNA viruses in general.

EXECUTIVE SUMMARY
• Because SARS-CoV-2 is genomically volatile, variants are continually being identified with apparent increased transmissibility and infectivity, as exemplified by the omicron variant.
• The functional leverage of the SARS-CoV-2 main protease (M pro), combined with the absence of closely related human homologues, prioritizes the enzyme as an attractive target.
• GC376 is an anti-viral with known inhibitory capacity against several viral proteases, including M pro. G59i is an analogue of GC376 that has a promising M pro inhibition docking profile (−10.5 kcal/mol) in silico, binding with numerous key M pro residues. Based upon our in silico data, G59i would have a good ADMET profile for a potential orally bioavailable drug.
• Molecular dynamics simulations have predicted that the G59i-M pro complex is stable.
• The results here encourage further research into the potential of G59i as an M pro inhibitor.
• This study has highlighted key M pro residues, and drug substituents, that may provide support in the design of future M pro inhibitors.
• Our conclusions may also apply to other RNA viruses.

DOI: 10.1142/S273741652350014X

Fig. 1. (Color online) SARS-CoV-2 M pro function and inhibition consequence. Pre- and post-inhibition pathways are represented in blue and red, respectively. Inhibition of M pro prevents RNA synthesis and therefore viral replication and virion assembly, resulting in failure of the host cell to release a new intact virion.

Fig. 2. (Color online) Molecular mechanism of the proteolysis catalyzed by M pro as deduced from QM/MM FESs. 10 First, a proton is transferred from Cys145 to His41, together with the nucleophilic attack on the carbonyl carbon atom of the peptide bond by the sulfur atom of Cys145, thus leading to a thiohemiketal intermediate, E:I1. Second, the cleavage of the peptide bond between Gln at P1 and the residue at P1′ is assisted by the proton transfer from the protonated His41 to the nitrogen atom of the P1′ residue, forming the acyl-enzyme complex intermediate, E:I2. Third, the P1′ polypeptide branch is released from the active site and an activated water molecule attacks the carbonyl carbon atom of the P1 Gln, concomitant with a proton transfer to His41, resulting in E:I3. Finally, the covalent bond between Cys145 and Gln in this protonated thiohemiketal intermediate is broken to release the second product species (P1 polypeptide branch) of the reaction, E:P.

Fig. 3. (Color online) Overlay of the docked conformation of reference ligand baicalein over its bioactive conformation.

Fig. 5. (Color online) Crystallographic structure of M pro in complex with drug GC376 (PDB: 6WTT). (a) 3D molecular model of GC376-M pro monomer complex, with a solid ribbon protein style colored by secondary structure. (b) Binding interactions between GC376 and M pro. Interactions are colored as follows: conventional hydrogen bond, green; π-donor hydrogen bond, light green (the light green bond between His41 and baicalein is a carbon hydrogen bond); π-π T-shaped interaction, dark pink; π-alkyl, light pink; and π-sulfur, orange.

Fig. 4. (Color online) Chemical structures of enhanced inhibitors derived from compound GC376. Blue circles highlight the functional changes, and red circles highlight the fragment modifications from scaffold to G59i. The progression in silico to G42 involved three modifications of functional groups of GC376; four modifications from G42 to G59 and two modifications from G59 to G59i. A detailed description is given in the text.

Fig. 6. (Color online) Docking conformation and binding interactions of the M pro -G42 complex. (a) 3D structure of G42 in the active site of M pro. (b) G42 ligand interactions within the M pro active site: dashed turquoise lines represent hydrogen bonds, dashed orange lines represent π-interactions and red lines and residues represent VDW interactions.

Fig. 7. (Color online) Docking conformation and binding interactions of the M pro -G59 complex. (a) 3D structure of G59 in the active site of M pro. (b) G59 ligand interactions within the M pro active site: dashed turquoise lines represent hydrogen bonds, dashed orange lines represent π-interactions, dashed dark blue lines represent alkyl interactions, red lines and residues represent VDW interactions and the dashed black line represents intramolecular hydrogen bonds.

Fig. 8. (Color online) Docking conformation and binding interactions of the M pro -G59i complex. (a) 3D structure of G59i in the active site of M pro. (b) G59i ligand interactions within the M pro active site: dashed turquoise lines represent hydrogen bonds, dashed orange lines represent π-interactions, dashed dark blue lines represent alkyl interactions, red lines and residues represent VDW interactions and the dashed black line represents intramolecular hydrogen bonds.

Fig. 9. (Color online) Ramachandran plot for the M pro -G59i complex. Core regions in blue, allowable regions in green. In conjunction with Supplementary Figures S12(a) and S12(b), consistent torsional angles contribute to the overall structural integrity of the complex.

Fig. 10. (Color online) Orientation of G59i functional groups relative to their surrounding amino acids. (a) Orientation of the G59i cyclopentylimine group (lime green) relative to its surrounding amino acids in S2. (b) Orientation of the G59i indane-2-one group (lime green) relative to its surrounding amino acids in S1. (c) Orientation of the G59i tetrahydroisoquinoline-6-thiol group (lime green) relative to its surrounding amino acids in S4.

Table 1. In vitro and in silico results of natural compounds and repurposed drugs against M pro. Key M pro residues are bolded.

Table 2. Molecular docking results of best inhibitors against M pro in terms of binding energy. Key residues are bolded.