Drug repurposing for identification of potential spike inhibitors for SARS-CoV-2 using molecular docking and molecular dynamics simulations

For the last two years, the COVID-19 pandemic has continued to bring consternation on most of the world. According to recent WHO estimates, there have been more than 5.6 million deaths worldwide. The virus continues to evolve all over the world, thus requiring both vigilance and the necessity to find and develop a variety of therapeutic treatments, including the identification of specific antiviral drugs. Multiple studies have confirmed that SARS‐CoV‐2 utilizes its membrane-bound spike protein to recognize human angiotensin-converting enzyme 2 (ACE2). Thus, preventing spike-ACE2 interactions is a potentially viable strategy for COVID-19 treatment as it would block the virus from binding and entering into a host cell. This work aims to identify potential drugs using an in silico approach. Molecular docking was carried out on both approved drugs and substances previously tested in vivo. This step was followed by a more detailed analysis of selected ligands by molecular dynamics simulations to identify the best molecules that thwart the ability of the virus to interact with the ACE2 receptor. Because the SARS-CoV-2 virus evolves rapidly due to a plethora of immunocompromised hosts, the compounds were tested against five different known lineages. As a result, we could identify substances that work well on individual lineages and those showing broader efficacy. The most promising candidates among the currently used drugs were zafirlukast and simeprevir with an average binding affinity of –22 kcal/mol for spike proteins originating from various lineages. The first compound is a leukotriene receptor antagonist that is used to treat asthma, while the latter is a protease inhibitor used for hepatitis C treatment. From among the in vivo tested substances that concurrently exhibit promising free energy of binding and ADME parameters (indicating a possible oral administration) we selected the compound BDBM50136234. In conclusion, these molecules are worth exploring further by in vitro and in vivo studies against SARS-CoV-2.


Introduction
In late December 2019, an outbreak of a novel coronavirus disease  occurred in the Wuhan Province of Hubei, China, with a clinical presentation closely resembling viral pneumonia [1]. Since then, [1]. The virus belongs to the Coronoviridae family that also includes other human infecting viruses like SARS-CoV, which was responsible for the 2002-2004 epidemic [2], MERS-CoV [3] and HCoV-NL63, HCoV-OC43, HCoV-229E and HCoV-HKU [4]. The latter four species are often associated with the "common cold", upper respiratory infections showing only mild symptoms. The SARS-CoV-2 genome is similar to that of other Coronaviridae with one large ORF encoding two overlapping polyproteins (orf1a and orf1ab) that are cleaved into at least 16 nonstructural proteins. The ORF1ab is followed by other ORFs encoding structural proteins, including the spike, envelope, membrane, and nucleocapsid proteins [5]. The spike glycoprotein, located on the virion surface, mediates receptor recognition and membrane fusion with the host cell. Multiple studies have confirmed that SARS-CoV-2, and similarly SARS-CoV, utilize the angiotensin-converting enzyme 2 (ACE2) for host cell entry [6,7]. ACE2 is a broadly expressed protein and is found in many different species [8,9]. Thus, the virus may transmit from animals to humans through multiple intermediate hosts [10]. Considerable effort has been put to understanding how the binding process occurs and there are now multiple structures depicting the spike-ACE2 complex [6,11,12]. Several differences were identified in the receptor-binding domain (RBD) of the SARS-CoV-2 spike protein compared to SARS-CoV, which allows for stronger binding affinity with the human ACE2 receptor [6]. This can explain the increased transmissibility of SARS-CoV-2 compared to SARS-CoV [13]. Another human protein important for host recognition is transmembrane protease serine 2 (TMPRSS2). This protein cleaves the C-terminal segment of ACE2 and spike sequence between the S1 and S2 subunits enhancing the spike protein-driven viral entry [5,14].
In addition to acute respiratory failure, hypertension is one of the major co-morbidities associated with COVID-19 [15]. SARS-CoV-2 may influence the kinin-kallikrein and renin-angiotensin systems (RSS) that modulate vasodilation, inflammation and blood pressure [16]. Bradykinin (BK) activity is increased by the angiotensin-(1-9) axis generated by ACE2 [17] and decreased due to inactivation by ACE [18]. In normal conditions, ACE and ACE2 should be in balance, however, cells in bronchoalveolar lavage fluid from COVID-19 patients have decreased expression of ACE and increased expression of ACE2. It leads to excessive levels of BK (the "Bradykinin Storm") and hyaluronic acid in the body [19]. This phenomenon can produce a highly viscous substance that prevents oxygen uptake in the lungs of COVID-19 patients. On the other hand, blood cells from COVID-19 patients express less ACE2 than healthy donors; thus, the SARS-CoV-2 infection also influences the expression of blood pressure regulators [20].
The SARS-CoV-2 virus continues to evolve and, according to the Pango resource, there are now more than 1,600 unique lineages [21]. Because the virus has spread to all parts of the world, it is now able to make use of immunocompromised individuals to reemerge as a new and potentially dangerous strain for which a vaccine is unable to provide adequate defense [22]. Only a few of these lineages have spread to global proportions and, as a consequence, are of major concern to public health. These lineages include B.1.1.7 (also referred to as alpha), B.1.351 (beta), P.1 (gamma) and B.1.617.2 (delta). The omicron variant appearing in late 2021 was also recently declared as a variant of concern due to its high number of mutations compared to previously mentioned lineages [23]. Mutations observed between variants are spread throughout the SARS-CoV-2 genome including the spike protein itself.
The outbreak of COVID-19 resulted in a massive effort to identify therapeutic strategies that would mitigate the severity of the infection. Several vaccines are now available, however, although their effectiveness against infection by the ancestral virus and the alpha variant was high, it was reduced by around 20% against the delta variant [24]. Furthermore, in a more recent analysis carried out in England during the dominance of the delta variant, it was shown that 39% of the infections in fully vaccinated households had arisen from a fully vaccinated index case [25]. This highlights the need to identify drugs that can alleviate COVID-19 symptoms and speed up the recovery process. Previous research efforts to develop antiviral agents against members of Coronaviridae suggested several ways to disrupt the viral life cycle [26,27]. This includes (I) inhibiting viral entry to the cell by blocking either the spike protein or its molecular target ACE2; (II) disrupting processing of orf1ab and orf1a polyproteins by inhibiting viral proteases: main protease (M pro , also called 3-chymotrypsin like protease) or papain-like protease (PL pro ); (III) preventing viral replication through inhibition of the viral RNA-dependent RNA polymerase (RdRp). The outbreak of SARS-CoV-2 spawned multiple analyses where the goal was to identify substances potentially useful in COVID-19 treatment. Most of the studies apply a drug repurposing strategy that identifies new therapeutics from already approved substances, since de novo identification of drugs is a costly and lengthy process [28]. In one of the research studies, it was found that emodin, omipalisib, and tipifarnib can serve as inhibitors of RdRp [26]. Enalkiren, ethylsulfonamide-D-Trp-Gln-paminobenzamidine and Z-LY-CMK were found among 22 potential inhibitors of M pro [29]. Another docking study revealed that the drugs cytarabin, raltitrexed, tenofovir, cidofovir, lamivudine and fludarabine are potent binders of the spike protein [30]. In other research, rather than finding specific protein targets, the authors analyzed geneinteraction networks and found several potential drugs including fluorouracil, cisplatin, sirolimus, cyclophosphamide, and methyldopa [28]. Several other drugs such as chloroquine, hydroxychloroquine, remdesivir, galidesivir and others were also considered for potential treatment of COVID-19 patients [31]. Other therapeutic strategies involved the use of monoclonal antibodies or patient-derived plasma [32]. However, to date no definite treatment has been established, although only very recently the combination of the antiviral drugs nirmatrelvir and ritonavir has been approved for treatment of mild and moderate cases of COVID-19 in the US (https://fda.gov). The latter compound is an inhibitor of the HIV protease, approved in 1996 [33]. This shows how drug repurposing can help overcome the global health problems.
In this study, we focused on drug repurposing to expand the current catalogue of compounds targeting the SARS-CoV-2 spike protein. Druglike substances were divided into two groups: (I) those already approved as drugs and (II) those only tested in vivo. Both groups were selected from the catalogue of molecules from the ZINC15 database [34]. Drugs were prioritize based on the binding free energy, predicted by Auto-Dock4, followed by Molecular Dynamics (MD) simulations and free energy calculations using the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) approach [35]. To account that SARS-CoV-2 evolves and substances effective against the Wuhan-1 strain may not be useful in infections caused by novel variants of concern, drugs were tested against the spike proteins of five lineages: B.1.1.7 (alpha), B.1.351 (beta), P.1 (gamma), B.1.617.2 (delta), and C.36 lineage. Among the approved drugs, we only considered substances that are administered orally because it is simple, safe and convenient for the patient. The two most promising candidates selected in this study are zafirlukast and simeprevir. According to the AutoDock4 results, both compounds are among the top 1% binders among the tested ligands. Moreover, throughout the 100 ns MD simulations, they remain bound to the spike near its interface with ACE2 and their average MM/PBSA binding energy was -22 kcal/mol. From among the in vivo tested molecules, drug candidates were prioritized based on the predicted binding energy and the most promising ADME (absorption, distribution, metabolism, and excretion) properties that indicate possible oral administration. After MM/PBSA calculations among the in vivo substances, which showed the most promising inhibition for all analyzed lineages, we selected BDBM50136234 (ZINC id: ZINC000049888620). This substance was previously analyzed as a potential agonist of the prostaglandin E2 receptor [36].

Ligand selection
Two datasets of ligands were selected for this study based on the data available in the ZINC15 database. The first contains drugs approved in major jurisdictions and is based on the "world" subset from ZINC15. It comprises 5,903 unique molecules; however; 3D structures of only 4,380 compounds are available in the database. The second dataset consist of substances tested in both animals and man. It is based on the "in vivo" subset from ZINC15. This dataset has nearly 130,000 compounds; however, as with the "world" subset, only 39,623 3D structures are available (not including the molecules already present in the "world" dataset). The 3D structures for ligands were download from ZINC15 and processed with the AutoDock Tools [37]. The non-polar hydrogen atoms were removed and the Gasteiger charges were added to the ligands.

Protein preparation
The genomic sequences of the specific SARS-CoV-2 lineages were downloaded from the GISAID database (https:/gisaid.org) and the spike protein sequences were obtained using the nextclade tool v 1.7.0 [38]. The following samples from GISAID were used as the representatives of the SARS-CoV-2 lineages: EPI_ISL_581117 (Pango lineage -B.1.1.7), EPI_ISL_660190 (B.1.351), EPI_ISL_792683 (P.1), EPI_ISL_8402453 (B.1.617.2), and EPI_ISL_6376910 (C.36). Homology models of the proteins were proposed using the following approach. The structure of the spike protein of SARS-CoV-2 in the active conformation was selected as a template based on analysis of the available structures in the RCSB database [39]. The pdb|7dwz structure was solved using electron microscopy to 3.3 Å resolution. The sequence-to-structure alignment was proposed using MAFFT v7.475 [40]. The 3D models of the proteins were built with MODELLER [41]. A model quality assessment was carried out using the MolProbity web server [42]. Prior to molecular docking, each protein underwent energy minimization using the procedure described in detail in Section 2.6 with the only difference being the lack of a ligand in the simulation box and a shorter simulation time of 10 ns. The simulations utilized the full trimeric structure of the protein. The receptor binding domain (RBD) from the last snapshot of the simulation was extracted and converted to pdbqt format using the AutoDock Tools. The docking grid was centered close to the Gly496 residue, a part of the spike-ACE2 interface. The grid spacing was set to 0.3 Å and the box size was 27 × 30 × 12 Å.

Molecular docking simulations
All docking simulations were carried out with the GPU-accelerated version of AutoDock4 [43]. The docking parameters for the Lamarckian Genetic Algorithm were as follows: the population size -150, number of runs -150, crossover rate -80%, and rate of gene mutation -2%. A Root Mean Square Deviation (RMSD) tolerance of 2 Å was used to cluster the results of the docking simulation. The results were visualized using PyMOL 2.4 and BIOVIA Discovery Studio Visualizer 2021.

Ligand selection
Two approaches were used to prioritize ligands for the MM/PBSA calculations. For molecules already approved as drugs, the raw docking score was the main criterion; however, we discarded substances that are not administered orally. For each lineage, we selected up to three bestfitting compounds for further analysis. Furthermore, we selected the five most promising inhibitors of all lineages based on the average rank obtained from molecular docking simulations. A ligand could be selected twice; i.e., it was among both the best compounds working on a specific lineage and had potential to inhibit other lineages. For substances that were only tested in vivo, we also prioritized molecules based on their docking score; however, for the first 500 best hits, we also analyzed their ADME parameters and toxicity using the ADMETlab 2.0 web server [44]. Substances were prioritized if they indicated possible oral administration. Candidate substances met the following criteria: (I) the Lipinski Rule with at most one violation (molecular weight ≤ 500 Da, logP ≤ 5, hydrogen acceptor count ≤ 10, and hydrogen donor count ≤ 5); (II) topological polar surface area ≤ 75 Å 2 ; (III) human intestinal absorption ≤ 0.7; (IV) human oral bioavailability 30% ≤ 0.7; (V) the drug's half-life ≤ 0.8; (VI) carcinogenicity ≤ 0.3; (VII) AMES toxicity ≤ 0.3.

Molecular dynamics (MD) simulations
Molecular Dynamics (MD) simulations were carried out for the selected ligand-protein complexes using GROMACS 2020.2 [45]. For ligands, the General Amber Force Field (GAFF2) was used in combination with partial charges obtained from the Austin Model 1 semiempirical molecular orbital technique combined with Bond Charge Correlations (AM1-BCC). The associated force field and charges were assigned to the ligand using the acpype program [46] in conjunction with AmberTools21 [47]. The simulation box was dodecahedron-shaped with a specified distance of 1.4 nm separating the point between the largest principal radius of the centered protein and the edge of the dodecahedron. Periodic boundary conditions were applied in all directions. For all simulations, the AMBER99sb force field was applied to the protein along with SPC water. Also, 100 mM NaCl was added to the simulation box, including neutralizing counterions. Short-range nonbonded interactions were cut off at 1.2 nm and long-range electrostatics were calculated using the Particle Mesh Ewald (PME) method. The system underwent the following simulations steps. First a steepest descent minimization was carried out until the maximum force of the system was below 1000 kJ/ mol/nm. Next, the system underwent a 100 ps restrained NVT simulation followed by a 100 ps of restrained NPT simulation. The Berendsen thermostat (set to 310 K) and Berendsen barostat (set to 1 bar) were used. Finally, an unconstrained 100 ns simulation was performed using the Berendsen thermostat and Parrinello-Rahman barostat.

Molecular Mechanics/Poisson-Boltzmann surface area (MM/PBSA) calculations
MM/PBSA is one of the most widely employed methods for estimating protein-ligand affinities due to its high accuracy, efficiency, and correlation with the experimental data [35]. The free energy of binding (ΔG MM/PBSA ) of a receptor-ligand complex is calculated from where, ΔG vacu PL represents the binding energy between the ligand and protein in vacuo and comprises van der Waals and electrostatic energies. These values were extracted from MD simulations using the GROMACS energy module. The solvation free energy (ΔG solv ) was calculated separately for ligand, protein and the protein-ligand complex based on the results of MD simulations using APBS v.1.4.1 [61]. Nonpolar solvation energy (ΔG apol ) was extracted with the g_mmpbsa program [48].
The ΔG MM/PBSA was calculated for each frame extracted every nanosecond from the simulation and averaged to obtain the final value.

Structural analysis of coronavirus spike protein
The membrane-bound spike protein of SARS-CoV-2 utilizes ACE2 as a cellular receptor. Furthermore, during internalization of the virus, this protein mediates fusion of the viral and host cell membranes, thus offering an attractive target for anti-COVID19 drugs. The key part of the protein responsible for ACE2 recognition is the receptor-binding domain (RBD); however, as SARS-CoV-2 evolves so does the sequence of the RBD. In our study, we selected five variants for analysis rather than focusing on the Wuhan-1 strain of SARS-CoV-2. As of January 2022, three of these are designated as the Variants of Concern -B.1.351 (beta), P.1 (gamma), B.1.617.2 (delta) -and two as de-escalating variants -B.1.1.7 (alpha), C.36 (https://ecdc.europa.eu/en/covid-19/variants-c oncern). The 3D models of these variants were proposed using a homology modelling approach, the procedure described in the Materials and Methods. The template for the modelling was the SARS-CoV-2 trimer (pdb|7dwz). Each variant of the spike protein was solved with one of its RBD in the active conformation.
There are several notable differences in the sequences of the selected proteins near the spike-ACE2 interface (Fig. 1), including Lys417, Leu452, Thr478, Glu484 and Asn501 (residue numbering according to the Wuhan-1 strain). These mutations alter the properties of the RBD, like the Asn501Tyr mutation where an amino acid with an aromatic side chain is introduced. This mutation is linked with enhanced transmissibility of SARS-CoV-2 [49]. These mutations also influence the conformation of neighboring residues that must accommodate for the altered environment within the RBD with respect to that from the Wuhan-1 strain. This is a crucial factor for our downstream analysis since, for virtual screening, a rigid docking procedure was used and only changes to the conformation of the ligand, but not the protein, were allowed. To remove any potential steric clashes that might have arisen due to the modelling procedure, each model underwent a short 10 ns unconstrained MD simulation using the procedure described in Section 2.6. The full trimer of the spike protein was simulated and we calculated the changes to the RMSD for the Cα atoms with respect to the initial conformation (see supplementary data). In all the cases, we observed a plateau within the first few nanoseconds of the simulation with no largescale changes to the protein conformations. The RBD was extracted from the final snapshot from each of simulations and used for virtual screening. The quality of this conformation was analyzed using the MolProbity webserver. The clash-score ranged from 0.65 for the alpha variant to 2.62 for the delta. That value is well within the 98th percentile, based on analysis of 1,784 crystal structures. The Molprobity score ranged from 1.34 for the alpha variant to 1.9 for C.36. The latter value is within 81st percentile (out of 27,675 structures). In conclusion the quality of these structures is sufficient for the virtual screening step.

Inhibitors of individual lineages
Drug repurposing is a strategy in which compounds already approved as drugs are used to identify medications for new diseases. This permits a quicker introduction of new therapeutics due to the already established list of their side effects and known ADME parameters. In our study, a preselected set of drugs was downloaded from the ZINC15 database. First, we used the "world" subset that contain 5,903 drugs approved in major jurisdictions. We analyzed 4,366 that have their 3D structure available in the database. The second dataset considered in this work contains 129,565 substances that were tested in either human or animal. In our work we analyzed 39,609 compounds after excluding those that already belong to the "world" subset and for which the 3D structures were not available in the database.
The molecular docking was carried out using the procedure described in Section 2.3. For docking, only the region of the spike protein near the ACE2 interface was considered. The protein remained rigid throughout the simulation with the RBD in the active state, while the ligand remained fully flexible to sample its best conformation. The results are based only on the ligand conformation with the highest docking score. Despite the differences in the RBDs for the analyzed lineages, the correlation between the predicted free energy of binding (ΔG binding ) for the analyzed ligands was remarkably high. For example, the Pearson correlation value was 0.9 for the docking experiment involving the beta and gamma lineages, regardless whether we analyzed drugs or the compounds tested in vivo. The lowest observed correlation was 0.77 between the results of docking to C.36 and the delta variant. Considering the generally poor reproducibility of docking [50], this is a positive, yet surprising output. It must be noted that there are two major differences between the docking experiments involving the two lineages. One is the mutations in the RBD and the other is changes in the conformations of the residues introduced by 10 ns MD simulation. All the analyzed ligands achieved negative values of ΔG binding that ranged between − 1.17 to − 13.14 kcal/mol for drugs and between − 0.66 to − 13.6 kcal/mol for the in vivo tested compounds. This can be attributed to the size of the RBD as the solvent accessible surface of the residues in the RBD is between 2,775 and 2,989 Å 2 . Additionally, there are no major cavities in the area, resulting in the possibility of a multitude of molecules binding the protein surface and creating multiple strong interactions. The lack of any major steric constraints on the ligand might result in selecting as top hits large compounds that show little specificity toward the spike protein. Thus, most of the molecules would show only transient binding.
Below, we discuss the results of the docking procedure for the ligands selected from among the already approved drugs. We focus our attention on two categories of ligands. Firstly, we present the top three binders for each individual SARS-CoV-2 lineage that are the most promising targets for anti-COVID-19 drugs. These compounds may not always show the highest binding score, as we only kept those administered orally, excluding metabolites or substances used in diagnostics. Secondly, we identified five compounds that showed a strong affinity toward all tested lineages simultaneously. The latter were chosen based on their average rank in individual docking simulations.
The best compounds for B.1.1.7 lineage are candesartan cilexetil, azelnidipine and forasartan with ΔG binding of − 11.16, − 9.91 and − 9.82 kcal/mol, respectively. The candesartan cilexetil was the best scored molecule while the latter two compounds were ranked 5th and 6th, respectively. The candesartan position is stabilized within the RBD by several contacts (Fig. 2A,C). First, the ligands' tetrazole ring, biphenyl and methoxyl groups form cation-pi contacts with Arg403 with the additional hydrophobic interactions involving both Tyr495 and Tyr501. The latter residue is also involved in stacking interaction with the biphenyl group. The cilexetil group of this ligand forms a hydrogen bond with Gln493 and the ligands' hexane group is placed near the hydrophobic Leu455 and Phe456. Forasartan, is structurally similar to candesartan with both compounds containing a tetrazole ring connected to a phenyl group. Both drugs also share a common molecular function as angiotensin II receptor antagonists. Interestingly, forasartan adopts a very similar conformation that is also stabilized by interactions with Arg403 (Fig. 1B, C). The less negative ΔG binding of this compound compared to candesartan cilexetil can be attributed to the lack of the cilexetil group that can form multiple strong polar interactions with the protein.
The main differences between the RBD of the C.36 and alpha lineages involve the substitutions of Leu452Arg and Tyr501Asn. As a consequence, most of the highest scoring ligands for the C.36 lineage drift toward position 452 that is now occupied by a polar residue. The most promising candidate targeting the C.36 lineage was montelukast with a predicted ΔG binding of − 10.25 kcal/mol. This compound is a leukotriene receptor antagonist used in asthma treatment. Arg452 provides the most important stabilizing effect by forming hydrogen bonds with the carboxyl and hydroxyl groups of the ligand (Fig. 3A,C) and Pi-Cation interactions with a phenyl group. The carboxyl group is additionally stabilized by a hydrogen bond with Asn450. The chlorine substituted quinoline is placed in a pocket formed by Arg403, Tyr453, Gln493, Tyr495 and is stabilized by a Pi-Alkyl interaction with Tyr505. The abovementioned candesartan cilexetil was also among the strongest binders for the C.36 strain (ranked 19th); however, changes observed to the RBD of this variant could cause the re-orientation of the ligand with respect to the conformation from the alpha variant (Fig. 3B,C). The tetrazole ring is now stabilized by Ser494, which however leads to a potential unfavorable interaction with Arg452. The cilexetil group is now occupying roughly the same area near Tyr453, Tyr495, Asn501 as the tetrazole ring did in the conformation observed for the alpha variant. The carbonyl groups are mainly stabilized via hydrogen bonds with Lys417, and the backbone of Gly496. These differences between candesartan's conformations results in an RMSD of 9 Å. The predicted ΔG binding is − 9.63 kcal/mol, only slightly higher than that of the alpha lineage. Manidipine, the third selected drug, occupies a similar area of the RBD as candesartan cilexetil and montelukast. Arg452 provides the stabilizing Pi-Cation interactions with one of the ligand's phenyl groups. Additional contacts are formed with residues Arg403, Lys417, Gln493. The ΔG binding for this compound is − 9.66 kcal/mol, also similar to that of candesartan cilexetil.
The beta variant of the SARS-CoV-2 emerged in South Africa in late 2020 [51] and still remains one of the Variants of Concern. The main differences in the spike protein with respect to the Wuhan-1 strain include Asp80Ala, Asp215Gly, Lys417Asn, Glu484Lys, Asn501Tyr, and Ala701Val. Three of these positions (417, 484 and 501) are part of the spike-ACE2 interface. This leads to an altered list of potential ligands compared to that reported for the alpha and C.36 lineages. The most promising candidates for this lineage are lutein, pranlukast and canthaxanthin. These compounds ranked as 3rd (ΔG binding = − 9.93 kcal/mol), 8th (− 9.36 kcal/mol), and 9th (− 9.34 kcal/mol) based on the AutoDock4 results, respectively. Lutein is a xanthophyll used both as a drug and dietary supplement. The ligand is predominantly hydrophobic with only two carbonyl groups capable of forming hydrogen bonds. One of them, based on the highest-scored conformation, is formed with the backbone of Asn417 while the second hydrogen bond is formed with Gly447 (Fig. 4A, C). Among the five tested lineages, Asn417 is found only in the beta and gamma variants, again explaining the difference between the best drug candidates for different SARS-CoV-2 lineages. The remaining contacts involve mostly Pi-Alkyl interactions formed with Tyr449, Tyr453, and Tyr495. Pranlukast is a leukotriene receptor-1 antagonist. Like candesartan cilexetil and forasartan, pranlukast contains the tetrazole ring that is stabilized by Asn448. This residue is located near the Arg452 that was important for the candesartan cilexetil interactions with the C.36 variant (Fig. 4B, C). The ligand's amide group forms a hydrogen bond with Tyr449 while the alkoxy group is interacting with the backbone of Ser494.
Manidipine, a calcium channel blocker, which was already identified as one the best binders for the C.36 lineage, is the best inhibitor for the delta variant. This can be attributed to the fact that both these lineages share the Leu452Arg mutation and they lack Tyr501, which is characteristic of all the other lineages analyzed in this work. Other changes to the delta variant RBD include Thr478Lys that is not observed in the other four analyzed lineages. Manidipine forms contacts with Arg452 for both the delta-and C.36-bound conformations (Fig. 5A, C). However, while for the delta variants, Arg452 is close to the ligands' nitrobenzene group, for the C.36 variant it interacts with the phenyl group located on the opposite side of the molecule (Fig. 5 B, D). For the C.36 lineage, the nitrobenzene group forms hydrogen bonds with Arg403 and Lys417. As a result of these changes, the RMSD between these conformations is 10.9 Å. To better understand this phenomenon, we looked for possible sources of differences between the analyzed variants. As mentioned above, the only difference in the sequences of both variants is Thr478Lys; however, Lys478 is located more than 15 Å away from the nearest atom of manidipine. Thus, we looked for changes in the conformations of the other residues in the RBD that arose after the MD simulations. In our opinion, a key player is the altered conformation of Lys417. For C.36, this residue extends towards Arg403 in a conformation that allows the nitro group to form a hydrogen bond. However, for the delta variant, Lys417 moves away from Arg403 and the ligand is unable to simultaneously bind to both Lys417 and Arg452. This exemplifies the possible pitfalls of rigid docking, where even small alteration in the protein conformation can have a huge impact on the results. However, despite the change in the ligand's conformation of the delta variant, the ΔG binding is − 11.2 kcal/mol and is comparable to that observed for the C.36 lineage that is − 9.66 kcal/mol. The two remaining candidates for the delta variant inhibitors are lercanidipine (ΔG binding = − 10.90 kcal/mol) that is also a calcium channel blocker that shares a substantial similarity to manidipine. The diphenyl group of both ligands share almost exactly the same position near Phe497 and Tyr505. Finally, zafirlukast, is the third candidate to inhibit the delta lineage. The compound has identical therapeutic indications to pranlukast that was identified as a potential hit for the beta variant. The ΔG binding for zafirlukast is − 10.40 kcal/mol.
Finally, for the gamma variant saquinavir was found to have the best ΔG binding of − 10.20 kcal/mol. It was followed by candesartan cilexetil (ΔG binding = − 9.88 kcal/mol) and zafirlukast (ΔG binding = − 9.63 kcal/ mol). The latter two drugs have already been identified among strong inhibitors of other SARS-Cov-2 lineages. Saquinavir is a HIV protease inhibitor and potentially can interfere with the viral reproductive cycle at multiple stages. The ligand interactions are predominantly through hydrogen bonds with the side chains of Gln493 and Ser494 and the backbone of Phe490. Additional contacts include alkyl-pi interactions with Leu452, Leu455, Phe456 and Tyr489 (Fig. 6 A, B). Gln493 and Ser494 are also stabilizing candesartan cilexetil conformation, highlighting the importance of these residues.

Inhibitors of multiple variants
The high sequence similarity of all analyzed RBDs and agreement of the docking results prompted us to identify molecules that could act as inhibitors of multiple lineages simultaneously. Such compounds would be of great benefit as their use would speed up the treatment process. To identify such molecules, an average rank of each compound was calculated based on the results of individual docking experiments. Like previously, only compounds administered orally that are not metabolites of other substances were selected. The top drug identified with this procedure was zafirlukast. It is a potentially strong inhibitor of the beta (rank of 12.5), gamma (7th) and delta variants (6th) , although its binding to the alpha (27th) and C.36 variants (rank of 28.5) was slightly attenuated compared to other tested drugs. As a result its average rank was 16. As mentioned previously, different conformations of a given ligand can achieve a very similar docking score. For example, for zafirlukast the average RMSD between its alpha bound conformation and the conformations bound to other variants was 8.8 Å. There are, however, notable exceptions to this observation. The conformation bound to the gamma and C.36 variants have an RMSD value of 3.4 Å. Despite this, the average ΔG binding for zafirlukast was predicted to be − 9.6 kcal/mol with a standard deviation of 0.5 kcal/mol.
The second most promising candidate was also mentioned: pranlukast, which was among the best candidates for inhibitor of the beta variant. The average rank of this drug is 30, and it is lower than that of zafirlukast due to weaker binding to the delta variant (ranked 52nd). Nevertheless, the average ΔG binding of pranlukast is − 9.3 kcal/mol, close to that of zafirlukast. The remaining three compounds were candesartan cilexetil (average rank 31.8, ΔG binding = − 9.5 kcal/mol), saquinavir (40.4, ΔG binding = − 9.2 kcal/mol) and finally, simeprevir (41.9, ΔG binding = − 9.2 kcal/mol). All these compounds, except simeprevir, were already discussed in Section 3.2.1. Simeprevir, is used as an inhibitor of HCV protease and, like saquinavir, can interfere with the SARS-CoV-2 replication cycle on multiple levels. The ranks and ΔG binding of all these ligands are summarized in supplementary data.

Inhibitors derived from in vivo tested compounds
To expand the catalogue of possible spike inhibitors, we carried out molecular docking using the in vivo tested compounds selected from the ZINC15 database. As before ligands were docked to spike proteins of the five previously mentioned lineages. Unlike the ligands from Section 3.2.1, the in vivo tested compounds were prioritized based on their docking score, and additionally, by their favorable ADME parameters and minimal toxicity. ADME parameters and toxicity were calculated using the ADMETlab 2.0 server for the 500 best compounds either showing strong inhibition of individual lineages or inhibiting multiple lineages simultaneously. The ligands had to meet specific criteria, as described in Section 2.4, which should prioritize molecules that can be administered orally. As it turns out, only eleven molecules with unique ZINC id passed our filters.
This might be a reflection of the same issue observed for molecular docking experiments involving approved drugs. Larger compounds capable of forming multiple contacts with the protein were selected as best inhibitors due to the shape of ligand binding site. However, since one of our filters included restriction on the compound's mass and number of hydrogen bond donors and acceptors (in accordance with the Lipinski rule of five), most such compounds were then removed. As a result, the list of compounds that inhibit individual lineages and all lineages simultaneously is almost identical. ZINC000049888620 (BindingDB id BDBM50136234), a possible prostaglandin receptor antagonist [52] , was selected as the most universal compound because it showed an average rank of 788 and an average ΔG binding of − 8.8 ± 0.2 kcal/mol. After removing substances with poor ADME parameters, this compound turned out to be the best inhibitors of the alpha lineage despite its initial rank of 505. For the gamma and C.36 lineages, the ZINC000027711299 (BindingDB id BDBM50087164) molecule was selected as the most promising inhibitor. This compound was previously tested as a acetylcholinesterase inhibitor [53]. For the gamma lineage, it was ranked as the 94th ligand with ΔG binding of − 9.9 kcal/mol. This compounds' lower average rank compared to the previous molecule is the result of a poor rank among inhibitors of the alpha and beta lineages (2351st and 1374th ligand, respectively). Despite this fact, the average ΔG binding is similar to that of BDBM50136234 and equals to − 8.9 ± 0.9 kcal/mol. A stereoisomer of this compound was selected as the best hit for the delta variant with ΔG binding of − 10.4 kcal/mol. However, since each stereoisomer has a different ZINC id, they are considered separate hits. Finally for the beta variant, Soyasapogenol B was selected to be the 198 th ligand, ΔG binding of − 9.0 kcal/mol. Surprisingly, this compound's binding to all the other lineages was poor, thus in our opinion it might be a false positive hit.

Molecular dynamics simulations of selected ligands
As discussed previously, even slight changes to the conformation of a protein might have a dramatic impact on the results of molecular docking. Furthermore, changes to the conformation of a protein-ligand complex will impact the predicted binding energy of a potential inhibitor. To account for these effects and issues, we carried out the following procedure. First, for the most promising candidates selected during virtual screening step, we performed an unconstrained 100 ns MD simulations. This was followed by free energy of binding calculations using the MM/PBSA method, according to the procedure described in Sections 2.4, 2.6 and 2.7. We analyzed both the top four inhibitors for each individual lineage (three drugs plus one in vivo tested compound) and six molecules showing activity on more than one lineage (five drugs and one from the in vivo set). The initial conformation of each ligand was identical to that proposed by AutoDock4. Similarly, to the molecular docking, only the RBD of the spike protein was included in the simulation.

Some ligands do not retain binding to the RBD after the MD simulations
The first aspect we analyzed was the changes to the conformation of the ligand within the RBD. For several complexes, the ligand does not retain its conformation within the RBD, but rather, dissociates from the protein or binds to a different part of protein's surface. For example, in Fig. 7A, changes to the distance between one of the atoms of pranlukast and the Cα atom of Tyr495 of the delta variant throughout the simulation are indicated. As can be seen, the ligand stays inside the RBD of the delta lineage for around 50 ns of the simulation; however, afterwards, it drifts away and binds to a neighboring cavity (Fig. 7B). This situation is not unique as several other protein-ligand complexes exhibit similar behavior; like simeprevir bound to the spike from the C.36 lineage or candesartan cilexetil in complex with the spike from the delta or C.36 lineages. Such profound changes to the ligand's conformation is not without biological impact, as binding outside the RBD most likely wouldn't obstruct the formation of the spike-ACE2 complex. We next noticed that the dissociation of the ligands rarely occurs when a simulation involves a spike protein in complex with one of the top binders. To further explore this idea, we divided the complexes into two groups. The first group ("in-RBD") comprised complexes where the ligand remains bound to the ligand binding site, albeit sometimes in a very different orientation than the initial one. The second group ("out-RBD") included complexes where the ligand moved away from the RBD. A boxplot showing the distribution of ligands' ranks based on the AutoDock4 score for the complexes from both groups is presented in Fig. 7C. It can be seen that the "in-RBD" group comprises mostly the ligands of low rank; hence they tend to be the best binders for a given lineage. On the other hand, the "out-RBD" group includes complexes where the ligand's rank was higher.
As discussed previously, the ligands assumed very different conformations depending on the lineage it was associated with. This could be partially attributed to the rigid docking approach where small changes in the protein RBD impose large changes to the ligand's 3D structure. This, however, is not an issue with molecular dynamics where the full motion of a protein might allow a ligand to converge to similar conformations regardless of the lineage its bound to. Fig. 8A shows conformations of zafirlukast from all five variants after 100 ns MD simulation. The ligand retains its binding to the RBD but the RMSD between the conformations is 20.6 Å with alpha bound conformation as a reference. Still, these conformations would retain their ability to block the spike-ACE2 interactions. The same analysis applied to all other compounds yielded similar conclusions. For pranlukast, the average RMSD was 7 Å (with the alpha bound conformation as a reference), while after MD simulations it increased to 14 Å even after excluding the delta bound variant which moved outside the RBD (Fig. 8B).
Although the orientation of all ligands changed dramatically throughout the simulations, the proteins' structure did not. For example, Fig. 8C depicts the RMSD for Cα atoms calculated with respect to the initial conformation for all the complexes involving pranlukast. Throughout the 100 ns simulations, the RMSD value rarely exceeded 3 Å. Changes to the RMSD value were not affected by the ligand involved in the simulations. Considering other cases, the simulation of saquinavir bound to the spike from the beta variant yielded the highest observed RMSD for the protein (4.2 Å). This suggests that 10 ns simulation of the apo-form of a protein prior to the molecular docking was sufficient to alleviate any steric clashes that might have arisen due to the homology modelling procedure.

MM/PBSA binding free energy calculations
Finally, to further refine the list of ligands identified with virtual screening, we calculated ΔG binding using the MM/PBSA method. The total binding energy comprises the following energy terms: (I) van der Waals energy; (II) electrostatic energy; (III) electrostatic contribution to the solvation free energy, and (IV) nonpolar solvation energy approximated using the solvent accessible area as discussed in Section 2.7. The ΔG binding was calculated for every nanosecond of the simulation and averaged to obtain the final value. The results of this procedure for selected ligands are presented in Table 1. The ΔG binding calculations included snapshots of the simulation where the ligand had drifted outside the RBD. This fact resulted in increased values of ΔG binding for some of the complexes, indicating that the ligand cannot carry out its proper function as an inhibitor of the spike protein. As it turns out, the ΔG binding for most molecules is comparable or slightly better than that predicted by AutoDock4. For example, the ΔG binding for zafirlukastalpha spike complex was − 9.2 kcal/mol according to AutoDock4. However, the ΔG binding calculated with the MM/PBSA method was closer to − 21.6 kcal/mol, although considering measurement uncertainty, it could be as high as − 14.2 kcal/mol. The average ΔG binding for this ligand, considering all the analyzed SARS-CoV-2 lineages, was around -22 kcal/mol, with the ligand occupying the RBD in all the analyzed cases. A similar average ΔG binding was obtained by simeprevir; however, for its complex with the spike from the C.36 lineage, the ligand binds near the loop between Ser438 to Lys444, outside the spike-ACE2 interface. Thus, despite the ΔG binding of − 20.8 kcal/mol for this complex, simeprevir might not block access of ACE2 to the C.36 RBD. The three remaining ligands (candesartan cilexetil, pranlukast and saquinavir) obtained an average ΔG binding between − 12 to − 17 kcal/mol. However, they usually failed to remain bound to the RBD of at least one tested lineage. This resulted in a substantial reduction of ΔG binding . For example, for candesartan cilexetil, the upper value of the predicted ΔG binding with the spike from the delta variant was positive, indicating no complex is formed spontaneously with this drug. Next, we observed that ligands selected based on their ability to inhibit specific lineages usually obtained lower ΔG binding than ligands that inhibited multiple lineages simultaneously. This is in agreement with the results from the docking experiments. Finally we analyzed the compounds from the in vivo group. ZINC000049888620 selected as a potential inhibitor of multiple lineages showed the average ΔG binding of − 11.5 kcal/mol. This indicates that this ligand is a weaker inhibitor than the ligands selected from among the known drugs. It must also be noted that during the simulation with the delta variant of spike, ZINC000049888620 separated from the RBD, hence, it exhibits a relatively poor binding to that lineage.

Concluding remarks
Researchers have made extensive use of computational tools ranging from chemoinformatic searches to advanced molecular dynamics simulations to determine potential drugs for COVID-19. Most of them utilize a drug repurposing strategy that allows identifying compounds that do not require time-consuming clinical trials and can be quickly introduced on the market. However, SARS-CoV-2 evolves rather rapidly and there are now more than 1,700 variants of the virus. As a result, new compounds effective against novel strains must be found, especially the Variants of Concern. Furthermore, it is important to reevaluate if compounds effective against the Wuhan-1 strain remain active.
The results of our study showed that the most favorable ligands among drugs administered orally are zafirlukast, pranlukast, candesartan cilexetil, saquinavir and simeprevir. Their predicted AutoDock4 ΔG binding was around − 10 kcal/mol for all analyzed lineages -alpha, beta, gamma, delta, and C. 36. This indicates that their inhibition constant is in nanomolar range. These compounds interact with a diverse set of residues from the RBD of the spike protein, some of which are lineagespecific. For candesartan cilexetil, there is a clear difference in the ligand's conformation when bound to either spikes from alpha or C.36 lineage. This can be linked to the Leu452Arg mutation located in the RBD of the spike near its interface with ACE2. Changes to the ligands' conformation did not, however, dramatically alter the predicted ΔG binding for this compound. Furthermore, we tested more than 30,000 compounds from substances previously tested in vivo to expand the list of possible anti COVID-19 drugs. The compounds selected from the in vivo dataset had to show ADME parameters indicating plausible oral administration and minimal toxicity; we selected BDBM50136234 as a molecule of interest. Finally, because molecular docking can only roughly approximate the ligand binding energy, we used MD simulations coupled with MM/PBSA to better estimate this parameter. As it turns out, some of the tested drugs did not remain within the RBD near its interface with ACE2 but rather dissociated or bound to other parts of spike. This is not the case for zafirlukast, which remained bound to the RBD, albeit with a different conformation than that predicted by Auto-Dock4. The average ΔG binding for this ligand, considering all the analyzed SARS-CoV-2 lineages, was around − 22 kcal/mol. Other compounds identified with molecular docking obtained the ΔG binding between − 11.5 kcal/mol to − 22 kcal/mol.
Many of the molecules identified with our study were already indicated as potential targets for COVID-19, albeit often targeting proteins Table 1 Result of the MM/PBSA calculations for the selected protein − ligand complexes. The mean ΔG binding is shown with standard deviation in units of kcal/mol. The gray fields indicate that the ligand migrated from the RBD during the simulation. From among the top three inhibitors selected for individual lineages, we show the results of the best compound ("Best among top 3"). The compound's name is provided in parentheses in the appropriate column. other than spike. Zafirlukast is a leukotriene receptor antagonists used in asthma treatment; however, in the work of Mehyar et al., the authors used molecular docking and FRET-based assay to show that it is also active against the SARS-CoV-2 helicase [54]. In another work, it was shown using machine learning that pranlukast could be repurposed as an effective drug for COVID-19 [55]. Montelukast is a third leukotriene receptor antagonist found in our analysis. According to multiple in silico and in vitro studies, it has the dual potential to inhibit both the SARS-CoV-2 M pro and viral entry into the host [56]. Saquinavir and simeprevir are inhibitors of proteases from HIV and HCV, respectively. Saquinavir was also indicated as a viable inhibitor of M pro after applying QSAR and HQSAR models [57], while the latter compound was identified as an M pro inhibitor using molecular docking [58]. Candesartan cilexetil, which according to our study can inhibit spike from various lineages, had an IC50 value of approximately 67 μM against M pro in a FRET-based activity assay [59]. This drug can have a dual effect as it is used in high blood pressure treatment, which is also one of the hallmarks of COVID-19 due to the SARS-CoV-2 influence on the RSS. Azelnidipine, a calicum channel blocker used in hypertension, was found among 15 compounds to exhibit anti-infective effects in Vero E6 cells [60]. Consequently, many of the compounds we identified may possibly target multiple steps in the viral life cycle, not only host cells recognition. However, to determine the true potential of compounds identified in this study and their role as spike inhibitors further in vitro and in vivo studies are required.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.