Virtual and biochemical screening to identify the inhibitors of binding between SARS-CoV-2 spike protein and human angiotensin-converting enzyme 2

Severe acute respiratory syndrome coronavirus (SARS-CoV-2) appeared as a new viral pathogen and caused the COVID-19 pandemic worldwide. Since the antiviral medicines effective for the treatment of COVID-19 are rare, it is necessary to identify the new candidate molecules for chemotherapy. The glycosylated Spike protein (S-protein) of SARS-CoV-2 plays a critical role in entering into the host cell through a direct interaction with human angiotensin-converting enzyme 2 (ACE2). For this reason, S-protein has served as one of the most effective therapeutic targets for discovering the antiviral medicines for COVID-19. In this work, we report the new small-molecule inhibitors of the interaction between the S-protein of SARS-CoV-2 and human ACE2, which were discovered through the structure-based virtual screening and in vitro biochemical binding assays. As a consequence of combining the computational and experimental validations, three novel inhibitors against the binding of S-protein and ACE2 were found with the associated IC50 values ranging from 50 to 100 μM. Although the biochemical potencies are moderate, the newly found inhibitors are worth being considered for further investigation by structure-activity relationship analysis to maximize the antiviral activity because of the low molecular weights and good physicochemical properties as a drug candidate. The interaction patterns of the new inhibitors in the ACE2-binding region of S-protein are addressed in detail.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-sense RNA virus and responsible for the outbreak of atypical pneumonia pandemic termed COVID-19. Human infection by SARS-CoV-2 may cause serious symptoms including pneumonia, shock, severe acute respiratory syndrome, and multiple organ failure [1]. This has prompted the development of new antiviral medicines as well as vaccines for immunization. The infection of SARS-CoV-2 begins with the intermolecular interaction of the viral Spike glycoprotein (S-protein) with the human angiotensin-converting enzyme 2 (ACE2), which serves as a receptor to initiate the viral attachment and the entry into the cell [2]. Therefore, the inhibition of the protein-protein interactions between the receptor binding domain (RBD) of S-protein and ACE2 with small molecules may be a promising therapeutic strategy for the specific treatment of COVID-19. The usefulness of S-protein as a target for the antiviral medicine was further confirmed by the neutralization activity of human monoclonal antibodies against SARS-CoV-2, the epitope of which comprised the amino-acid residues in the ACE2 binding region of S-protein RBD [3].
Recently, the X-ray crystal structure of the S-protein RBD was reported in complex with human ACE2 [4,5]. The hot-spot amino acid residues of S-protein at the interface of S-protein-ACE2 complex were identified through the extensive structural analysis. The presence of such structural information on the interaction between S-protein and ACE2 shed new light on designing the small-molecule inhibitors that may develop into an antiviral medicine for the treatment of COVID-19. Nonetheless, the discovery of S-protein inhibitors lags behind the biological and structural findings. The majority of the known inhibitors for S-protein-ACE2 binding were discovered from the experimental Abbreviations: SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; ACE2, angiotensin-converting enzyme 2; RBD, receptor binding domain; MW, molecular weight; 3D, three dimensional; cLogP, calculated LogP; RU, resonance units; SPR, surface plasmon resonance. screening of miniproteins [6], oligopeptides [7,8], and natural products [9][10][11] as well as from repurposing of drug molecules in the market [12,13]. Some novel classes of small-molecule inhibitors have also been reported in the literature including DRI-C compounds [14], clobenztropine analogues [15], and demethylzeylasteral [16]. However, the applicability of these small-molecule inhibitors has been limited due to the undesirable physicochemical properties as a drug candidate and the inclusion of reactive chemical moieties.
In this work, we aim to find the new classes of the inhibitors disrupting the interactions between S-protein RBD and ACE2 via the structure-based virtual screening with molecular docking simulations and the subsequent in vitro biochemical binding assays. Because the majority of the known inhibitors has a drawback of high molecular weight (MW), only the small organic molecules with MW lower than 360 amu were taken into account in virtual screening to find the proper molecular cores for chemical derivatizations.
Virtual screening with molecular docking simulations has often been unsuccessful due to the incompleteness of the binding free energy function to estimate the strength of binding between a small molecule and the target protein [17]. Among the highly scored molecules in docking simulations, therefore, those that are supposed to interact strongly with the hot-spot residues of target protein are considered only for experimental evaluations. In this regard, the hydrogen bonds involving the backbone groups of the interfacial residues turned out to contribute most significantly to protein-protein interactions [18]. Hence, the hydrogen-bond interaction with Gly496 of S-protein was adopted as a criterion for virtual screening on the grounds that the backbone carbonyl oxygen of Gly496 formed a stable hydrogen bond with Lys353 of ACE2 in the original X-ray crystal structure [4]. The two-step filtration involving the molecular docking simulations and the subsequent structural restraint is anticipated to make the virtual screening more rigorous by reducing the number of the false positives from the initial hits derived with the imperfect scoring function.

Materials and methods
The receptor model for S-protein was prepared from the crystal structure of S-protein in complex with human ACE2 (PDB entry: 6M0J) [4]. To build the all-atom model for S-protein, hydrogen atoms were added to individual heavy atoms according to the hybridization and protonation states. For example, the sidechains of Asp and Glu residues were maintained deprotonated unless at least one of the carboxylate oxygen atoms resided in proximity to the hydrogen-bond accepting group. The protonation state of Lys and His residues were also assigned based on the presence of intramolecular or intermolecular hydrogen bonds in the crystal structure of S-protein-ACE2 complex. The resulting all-atom model of S-protein served as a receptor in molecular docking simulations for virtual screening to find the new inhibitors of the interactions between S-protein and ACE2 from a large chemical library.
The chemical library for virtual screening was prepared by collecting the commercially available molecules from a chemical vendor Inter-BioScreen (http://www.ibscreen.com). Prior to the docking simulations for virtual screening, a total of 486,237 synthetic and 69,067 natural compounds were filtered according to Lipinski's "Rule of Five" to select only the molecules possessing the physicochemical properties that should be satisfied by potential drug candidates [19]. The MWs of the candidate inhibitors were limited to the range between 250 and 360 amu because the aim of this work was focused on the identification of a proper molecular core from which the highly potent S-protein inhibitors could be obtained by chemical derivatizations. As a consequence, a virtual chemical library comprising approximately 113,000 molecules was generated for the molecular docking simulations.
All the molecules in the virtual library were used as the input for the CORINA program to produce their three-dimensional (3D) atomic coordinates [20]. A stable 3D structure could be generated for each molecule with the conformational parameters optimized with the crystal structures of the small molecules in Cambridge Structural Database. Atomic charges were then assigned to both all the ligand molecules and S-protein with Gasteiger-Marsili method [21]. AutoDock program of version 4.2 [22] was selected as a computational tool for virtual screening of S-protein inhibitors. The potential parameters required in calculating the intermolecular van der Waals interaction energy and the internal energy of a ligand were extracted from the AMBER force field database [23]. Docking simulations of each candidate inhibitor were then carried out in the ACE2-binding region of S-protein to score all the molecules in the chemical library in the order of the calculated binding affinity.
In the actual simulations of protein-ligand docking, we used the binding free energy function (ΔG bind ) in the original AutoDock program, which can be expressed in the following mathematical form.
The weighting factors for van der Waals interaction, hydrogen bond, electrostatic interaction, torsional, and ligand desolvation energy terms in Eq. (1) were set equal to 0.1485, 0.0656, 0.1146, 0.3113, and 0.1711, respectively. The hydrogen-bond energy term has an additional weighting factor, E(t), to reflect the angle-dependent directionality. r ij is the interatomic distance, and A ij , B ij , C ij , and D ij are associated with the depth of the potential well and the equilibrium distances between the protein and ligand atoms. A sigmoidal function was used for the distance-dependent dielectric constant (ε(r ij )) to calculate the electrostatic interactions between S-protein and the candidate inhibitors [24]. N tor in the torsional term means the number of all rotatable bonds in a ligand molecule. In the ligand dehydration term, S i and V i parameters indicate the atomic hydration energy per unit volume and the fragmental volume of atom i [25], respectively.
Docking simulation of a molecule in the chemical library began with calculating the 3D grids of the protein-ligand interaction energy for all possible atom types in molecules. These potential grids constructed for S-protein were used in common for docking simulations of all the molecules in the chemical library. As the center of the common grids, we chose the atomic coordinates of the backbone carbonyl oxygen of Sprotein Gly496 in the original X-ray crystal structure. The grid maps comprised 61 × 61 × 61 points with the uniform spacing of 0.375 Å, and generated a receptor model to include the atoms of S-protein within 22.9 Å of the grid center. These grid maps were extensive enough to include the entire part of the ACE2 binding region of S-protein. Based on this 3D receptor model, docking simulations between S-protein and individual molecules in the chemical library were carried out to select 100 top-scored compounds. 10 docking runs were conducted with the initial population of 50 individuals for each ligand molecule. Among the highly scored molecules in docking simulations, those that were supposed to form a hydrogen bond with the backbone carbonyl oxygen of Gly496 of S-protein were selected only for experimental evaluations.
It should be pointed out that Gly446 of S-protein may also be a candidate hot-spot residue because its backbone moiety forms a hydrogen bond with Gln42 of ACE2 [4]. Nonetheless, Gly446 was not selected as a hot spot in virtual screening of the S-protein inhibitors due to the exposure to bulk solvent even in the S-protein-ACE2 complex. This would have the effect of rupturing a hydrogen bond involving Gly446 by the intrusive solvent molecules. Instead, Gly496 was selected only as the hot-spot residue in virtual screening because the hydrogen bond with Lys353 of ACE2 appeared to be protected well by the neighboring hydrophobic residues.
All assays for the binding between SARS-CoV-2 S-protein and human ACE2 were performed at Reaction Biology Corp. (Malvern, PA, USA) with surface plasmon resonance (SPR) technology. The influence of a test molecule on the binding of S-protein with ACE2 was measured by loading ACE2 onto biosensors, and subsequently dipping into a buffer involving S-protein and the test molecule. The signal for binding response was quantified in resonance units (RU). All the RU values associated with the inhibitory activity of each test molecule were measured in duplicate at varying concentrations of 0.0, 0.2, 0.4, 0.8, 1.6, 3.1, 6.3, 12.5, 25, 50, and 100 μM to produce the approximate sigmoidal dose-response curves. Finally, the IC 50 value of each test molecule was determined with regression analysis using the GraphPad PRISM program of version 4.

Results and discussion
Of approximately 113,000 molecules screened with molecular docking simulations in the ACE2-binding region of S-protein, only 108 molecules with the calculated binding free energy lower than − 20 kcal/ mol were selected for further analysis. As depicted in Fig. 1, only 16 molecules satisfied the configurational constraint to form a hydrogen bond with Gly496 of S-protein and were selected as virtual hits. All these putative inhibitors were commercially available from a compound supplier (InterBioScreen Ltd., BAR, Montenegro) and were tested for the inhibitory activity for binding of ACE2 to S-protein. The binding assays were performed by Reaction Biology Corp. (Malvern, PA, USA) with SPR technology to measure the direct binding of the potential inhibitors to Sprotein. This assay seemed to be suitable for determining whether a test molecule inhibits the S-protein-ACE2 interaction to block the entry of SARS-CoV-2 into the host cells [26].
As a consequence of the extensive computational and experimental screening, three molecules were identified as the new inhibitors against the binding of S-protein to ACE2 with the associated IC 50 values ranging from 50 to 100 μM. The chemical structures and the biochemical potencies of the newly found S-protein inhibitors are summarized in Fig. 2 and Table 1, respectively, along with MWs and the calculated LogP (cLogP) values. As a matter of fact, these three molecules have never been reported as the S-protein inhibitor in the literature. We note that 1 and 3 include 2-(4-acetamido-1H-indol-1-yl)acetamide moiety in common while 2 involves a single substitution on N-(1H-benzo[d]imidazole-2-yl)benzofuran-2-caroxamide scaffold. 1 and 3 are similar to DRI-C series of S-protein-ACE2 interaction inhibitors in that the two amide moieties flank an aromatic group [15]. A common structural feature of 1-3 is that several hydrogen bonding groups are attached to the aromatic heterocycles including indole, pyridine, benzimidazole, and benzofuran. This implies that all three inhibitors would be bound in the ACE2-binding region of S-protein through the multiple hydrogen bonds in combination with the hydrophobic contacts. To disprove the possibility of being a false positive in binding assays, 1-3 were checked in the publicly accessible ZINC15 database [27] to confirm the absence of any substructure in pan assay interference compounds (PAINS) [28].
As listed in Table 1, the IC 50 values of 1-3 associated with the inhibitory activity against S-protein range from 50 to 100 μM. This moderate biochemical potency can be attributed to the fact that only the small molecules with MW lower than 360 amu were considered in virtual screening. Nonetheless, 1 and 2 seem to deserve consideration for further development through the chemical derivatizations because they were also screened for having the physicochemical properties desirable  as a drug candidate [19]. Although the biochemical potency of 2 is slightly lower than that of 1, the former is likely to serve as a good molecular scaffold as the latter for the optimization of the inhibitory activity due to the lower MW.
To gain some structural insight into the inhibitory actions of 1-3, their calculated binding modes in the ACE-2 binding region of S-protein were examined in the comparative fashion. Fig. 3 shows the lowestenergy binding configurations between S-protein and the newly found inhibitors in the ACE2-binding region. Among the three major subbinding regions that were located at the S-protein-ACE2 interface [4], 1-3 appear to be accommodated in Region 1 comprising Gly496, Asn501, and Tyr505 as well as in Region 2 involving Tyr453 and Ser494, while Region 3 containing Ala475 and Asn487 remains unoccupied. 1 and 2 are arranged in the opposite directions in spite of sharing a small pocket for the hydrogen-bond interactions with Gly496. In all three cases, the polar bicyclic aromatic moieties stay in close proximity to Gly496 while the remaining hydrophobic groups point toward the amino-acid residues on the neighboring β-sheets (1 and 3) and loops (2). To address the possibility that 1-3 would be bound in the unexpected site of S-protein RBD other than the ACE2-binding region, the additional molecular docking simulations were conducted using the extensive grid maps to encompass the entire structure of S-protein RBD. However, no peripheral binding region in which 1-3 could bind with a negative binding free energy was detected during the entire course of simulations. These results support the probability that 1-3 would impair the interaction between S-protein RBD and ACE2 by binding in the ACE2-binding region of the former.
In spite of the worldwide vaccination, the COVID-19 pandemic continues due to the manifestations of numerous SARS-CoV-2 variants. Among them, the novel SARS-CoV-2 variant termed Omicron (B.1.1.529) has become predominant over 40 countries [29]. To estimate the inhibitory activity of 1-3 against the highly contagious variants, docking simulations were also carried out on RBD (residues 319-541) of the Omicron variant, which involved 15 mutations including G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, and Y505H. The 3D atomic coordinates of the Omicron variant were prepared with the homology modeling using the structure of the wild type as the template. As shown in Fig. 4, the calculated binding modes of 1-3 for the Omicron variant are similar to those for the wild-type S-protein in that they occupy Region 1 and Region 2 of RBD without significant interactions with Region 3. The binding free energies of 1-3 with respect to the Omicron variant were also calculated to address the strength of interactions in RBD, which are listed in Table 2 in comparison with those for the wild type. The calculated binding free energies of 1-3 remain very similar even though the receptor protein changes from the wild type to the Omicron variant. The difference in the binding free energies between the wild type and the Omicron variant fall within 5% in all three cases. Judging from the similarity both in the binding modes and in the binding affinities, it can be argued that 1-3 would also have the biochemical potency against the Omicron variant as comparable to that against the wild type.
It is worth noting that the glycosylation of RBD residues may alter the binding modes of 1-3 because S-protein contains a considerable amount of glycosylation hot spots. Among a total of 17 glycosylation sites identified with energy-optimized LC-MS/MS and ion mobility MS/ MS methods [30], only the two residues (Asn331 and Ala344) are located in RBD. Because both residues reside far from Region 1-3 at the binding interface, the glycosylation at RBD seems to have little effect on   the inhibitory activity of 1-3.
We now focus our interest on addressing the detailed intermolecular interactions relevant to the stabilization of the newly identified inhibitors in the ACE2-binding region of S-protein RBD. Fig. 5 shows the simulated binding mode of 1 with respect to S-protein RBD. We note that the terminal acetamide moiety of 1 donates a hydrogen bond to the backbone carbonyl oxygen of Gly496. This interaction seems to play the role of anchor for accommodating 1 in the ACE2-binding region of Sprotein because the importance of maintaining a hydrogen bond with the backbone group of Gly496 was well appreciated both in the X-ray crystal structure [4] and in the extensive computational investigations [31]. An N-H⋅⋅⋅O hydrogen bond is also observed between the central carbonyl oxygen of 1 and the sidechain guanidinium group of Arg403. Most probably, the two hydrogen bonds mentioned above would serve as the most significant binding force for 1 to be bound in the ACE2-binding region of S-protein RBD because any other stronger interaction involving the charged groups was not found in the calculated S-protein-1 complex. The molecule 1 appears to be stabilized further in the ACE2-binding region of S-protein RBD through the hydrophobic interactions with the sidechains of Lys417, Leu455, Tyr453, Tyr495, Phe497, and Tyr505. The involvement of Leu455 in the van der Waals contact with 1 is consistent with the recent computational and experimental finding that Leu455 is required for ACE2 binding [32,33]. Judging from the interaction patterns revealed in docking simulations, the micromolar-level biochemical potency of 1 may stem from the combined effects of the two hydrogen bonds and hydrophobic contacts in the ACE2-binding region of S-protein RBD. Due to the low MW (322.4 amu), 1 is likely to play the role of molecular core from which highly potent inhibitors are derived by substituting various chemical moieties to maximize the interactions with S-protein. Fig. 6 shows the calculated docking pose of 2 in the ACE2-binding region of S-protein RBD. The role of the hydrogen-bond donor with respect to the backbone carbonyl oxygen of Gly496 is played by the central amidic nitrogen of 2, which also makes a hydrogen bond with the sidechain carbonyl oxygen of Gln498 in the bifurcated form. This interaction may contribute significantly to the inhibitory activity of 2 on the grounds that Gln498 was found to be one of the key interacting residues in complexation with ACE2 [29,33]. The third hydrogen bond in the calculated S-protein-2 complex is established between the backbone of Tyr505 and the benzimidazole moiety of 2, which would also have the effect of facilitating the formation of a protein-ligand complex. Hydrophobic interactions in the S-protein-2 complex appear to be established in the weaker form than those in the S-protein-1 complex because that only the three residues (Tyr495, Phe497, and Tyr505) reside in proximity to the aromatic rings of 2. The weakening of hydrophobic interactions can be invoked to explain a little lower biochemical potency of 2 than 1. Nonetheless, 2 is also worth serving as a new inhibitor scaffold for chemical derivatizations due to the lower MW (307.3 amu) than 1.
With respect to the calculated binding modes in the ACE2-binding region, it is noteworthy that the hydrophobic interactions are situated in the vicinity of the intermolecular hydrogen bonds both in S-protein-1 and in S-protein-2 complex. This may be helpful for preventing the rupture of the potency-enhancing hydrogen bonds by limiting the approach of water molecules that are supposed to hydrolyze the hydrogen bonds. The protective role of neighboring hydrophobic groups becomes more prominent when the intermolecular hydrogen bonds involve the highly soluble moieties. In this regard, the maintenance of the hydrogen bonds by inducing the supportive vicinal hydrophobic interactions has been a facile technique in maximizing the biochemical potency of drug candidates [34,35].
Because the scoring function of AutoDock program exhibited a relatively good performance in finding the inhibitors against the binding of S-protein to ACE2, it would be also effective in designing the highly potent inhibitors using 1 and 2 as a molecular core. This seems to be made possible by performing the structure-guided de novo design in the stepwise fashion. The first step is to generate a variety of the derivatives of 1 and 2 based on the structural features of the two S-protein-inhibitor complexes derived in the precedent docking simulations. Second, a number of chemical derivatives generated at random should be scored and ranked in the order of the binding affinity for S-protein RBD using the AutoDock scoring function. Several top-ranked derivatives can be prepared by chemical synthesis or commercial purchase for the experimental measurement of the inhibitory activity. Related scientific endeavors can be followed to identify the highly potent inhibitors against the binding of S-protein to ACE2.

Conclusions
We have discovered three new inhibitors for the binding between Sprotein RBD and human ACE2 by applying a computer-aided drug design procedure involving the virtual screening with molecular docking simulations and in vitro binding assays. The biochemical potencies of these inhibitors were shown to be moderate in terms of rupturing the interaction between S-protein and ACE2 with the associated IC 50 values ranging from 50 to 100 μM. Nonetheless, 1 and 2 are worth being considered for the further development to optimize the inhibitory and antiviral activity because they are also screened in silico for having the physicochemical properties of a drug candidate with low molecular weights (322.4 and 307.3 amu). The results of molecular docking simulations showed that the newly found inhibitors could be accommodated in the ACE2-binding region of S-protein RBD due to the combined effects of the multiple hydrogen bonds and the hydrophobic contacts. Highly potent inhibitors are expected when the chemical derivatives of 1 and 2 will be prepared through the structure-based de novo design with the calculated binding modes.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.