In silico studies on phytochemicals to combat the emerging COVID-19 infection

The current COVID-19 pandemic, caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and its variants, remains a serious health hazard globally. The SARS-CoV-2 Mpro and spike proteins, as well as the human ACE2 receptor, have previously been reported as good targets for the development of new drug leads to combat COVID-19. Various ligands, including synthetic and plant-derived small molecules, can interact with the aforementioned proteins. In this study, we investigated the interaction of eight phytochemicals, from selected medicinal plants (Aegle marmelos, Azadirachta indica, and Ocimum sanctum) commonly used in Indian traditional medicine, with SARS-CoV-2 Mpro (PDBID: 6LU7), SARS-CoV-2S spike protein (PDB ID: 6M0J) and the human ACE2 receptor (PDB ID: 6M18). All compounds were subjected to density functional theory (DFT) and frontier molecular orbitals (FMO) analysis to determine their geometry, and key electronic and energetic properties. Upon examining the interactions of the phytochemicals with the human ACE2 receptor and the SARS-CoV-2 Mpro, spike protein targets, two compounds (C-5 and C-8) were identified as the best binding ligands. These were further examined in MD simulation studies to determine the stability of the ligand–protein interactions. QSAR, pharmacokinetic and drug-likeness properties studies revealed that C-5 may be the best candidate to serve as a template for the design and development of new drugs to combat COVID-19.


Introduction
The on-going COVID-19 pandemic, caused by the SARS-CoV-2 virus, remains a serious global health issue due to the highly contagious nature of this virus [1,2]. Unfortunately, the pandemic continues to have detrimental effects on all sectors of society [3]. The symptoms and clinical presentations of COVID-19 are complex, due in part to the ability of the virus to present rapid mutations in its spike protein receptor binding domain (RBD). The common symptoms include a dry cough, fever, shortness of breath, fatigue and dyspnoea. SARS-CoV-2 mainly transmits through respiratory aerosols/droplets from infected persons' coughs, sneezes, talks, breaths and via airborne transmission [2,4]. Other means of transmission such as direct and indirect contacts (e.g. via the fecal-oral route) have also been reported [5,6]. The virus infects the upper and lower respiratory tract, heart, kidney, liver, gut, nervous system and ultimately can lead to multi-organ damage [7]. Severe complications are frequently observed in immunocompromised individuals with diabetes, obesity, cardiovascular disorders and hypertension [8][9][10]. Different SARS-CoV-2 variants have been reported worldwide, with double and triple mutants of this virus present in some countries. These variants are highly transmissible (i.e. high infectivity and transmission rate), can cause re-infections, and there is some concern as to whether current vaccines can control all of them [11,12].
The main research efforts towards the discovery of new drugs to combat COVID-19 using computer-aided methodologies to date have focused on targeting the active site of SARS-CoV-2 Mpro and spike protein with small molecule ligands [13][14][15]. This has included studies on synthetic derivatives [16], including one study which identified six benzimidazole and benzothiazole derivatives with high binding affinity for SARS-CoV-2 (Mpro) and the human ACE2 receptor [12]. This has also included investigating phytochemicals to find potential inhibitors of Mpro through virtual screening and structure-based drug discovery approaches [17][18][19]. For example, Lakshmi and co-workers [20] investigated 47 phytochemicals against SARS-CoV-2 (Mpro), SARS-CoV-2S and the human ACE2 receptor. Another study investigated 27 caffeic-acid derivatives against several proteins of SARS-CoV-2, reporting MD simulations, ADME properties and toxicity profiles of these derivatives [21].

Computational studies (DFT)
The selected phytochemicals were first optimized using the Gauss View 6.0. 16 program [22] and subjected to density functional theory (DFT) calculation using the GAUSSIAN 09 suite programs [23]. Computations were executed using the B3LYP method and exchange-correlation functional with 6-31 G (d, p) basic set for the calculation of carbon, nitrogen, oxygen, sulphur and hydrogen atoms.

Molecular docking study
The binding affinities of ligands can be predicted using molecular docking protocols [24]. Receptor-oriented flexible docking was carried out using the Autodock Vina package. The selected phytochemicals were docked against three essential targets, namely SARS-CoV-2 Mpro (PDB ID: 6LU7), ACE2 (PDB ID:6M18) and SARS-CoV-2S spike protein (PDB ID: 6M0J). The three-dimensional crystal structure of SARS-CoV-2 Mpro (PDB ID: 6LU7), SARS-CoV-2S (PDB ID: 6M0J) and the human ACE2 receptor (PDB ID: 6M18) were retrieved from the Protein Data Bank (http://www.rcsb.org/ pdb). Prior to the docking process, all ligands and the proteins were prepared and saved as pdbqt files. The Autodock Tools software was used to attribute polar hydrogens, solvation parameters, Kollman charges and fragmental volumes to each protein. It was also used to create a grid box around the binding site of each protein. The configuration file for the grid parameters was generated using the Auto grid engine. During the docking procedure, the ligands and proteins were considered as flexible. Results <1.0 Å in positional RMSD clustered together and represented the best binding free energy. The best pose with the lowest docking score (binding energy or binding affinity) was used for further analysis. Discovery Studio 3.5 was used to visualize the 2-D and 3-D interactions between ligand atoms and protein amino acid residues.

Molecular dynamics simulation
MD simulations were carried out to evaluate the stability of the protein-ligand complex. This was carried out only for the best complexes (C8-6LU7, C5-6M18, and C8-6M0J) using the Desmond MD simulation package of Schrodinger [25]. The OPLS_2005 force field was used for the protein-ligand complexes. These complexes were solvated in a cubical water box (TIP3P water model) in x, y and z dimensions using the system builder tool of Desmond keeping 12 Å buffer space. As and when required, each system was neutralized by the addition of suitable counter ions. The neutralization was maintained by adding an ionic concentration of 0.15 M NaCl. 10,000 steepest descent steps were used to minimize the systems followed by gradual heating from 0 to 300 K, under NVT ensemble. Before each run, the systems were allowed to relax thermally using the Nose-Hoover Chain thermostat method for 5 ns and pressure relaxation for 5 ns using the Martyna-Tobias-Klein barostat method. A 100 ns run was performed under the NPT ensemble using a cut-off distance of 12 Å . Trajectories of 5000 frames were generated and saved at every 10 ps. In silico studies on phytochemicals to combat the emerging COVID-19 infection

Pharmacokinetic and drug-likeness predictions
The SwissADME online software was used to predict the pharmacokinetic properties of the investigated compounds. This included their drug-likeness properties, solubility and other properties. The Lipinski's properties were determined using the PubChem database. A work flow diagram of the methodology used is illustrated in Scheme 1.

Molecular modelling studies
Density Functional Theory (DFT) calculations, using the commonly-employed GAUSSIAN inter phase, were used to theoretically outline the structural projections and atomic arrangements of the studied compounds. The use of exchange-correlation functional enabled to estimate molecular properties with precision comparable to conventional correlated ab-initio methods [26,27]. Molecular orbitals (MOs) geometry optimization studies were used to evaluate the geometry and electronic properties of compounds [28][29][30][31][32][33]. The key energetic properties such as single point energy and dipole moment (D) values for each compounds are presented in Table 1. It was observed that compound C-3 had a consider-ably higher single point energy than all other compounds. Compound C-8 showed the least energy among all compounds, which indicated a greater stability. All compounds possessed dipole-dipole interactions, but compound C-2 had a higher dipole moment value compared to others.

FMOs approach
FMOs studies are helpful to understand the reactivity of compounds and identify the reactive site in conjugated systems.
Scheme 1 Work flow diagram of the methodology used in the study.  [34]. The energy gap (band gap) corresponds to the chemical reactivity and chemical stability behaviour of active molecules [35]. C-2 was found to have the least energy difference [E HOMO -E LUMO ] among all compounds. The energy difference values obtained suggest that the reactivity of C-3 will be the greatest, and the stability of C-4, C-5, C-7 and C-8 will be greatest among all compounds. The E HOMO and E LUMO variation describes the charge transfer (CT) fundamental interaction. To get some conclusive evidence, a range of chemical reactivity parameters such as chemical potential (l), electronegativity (v), global softness (S), global hardness (g), and global electrophilicity index (x) [36] were calculated ( Table 2). The chemical softness (S) values obtained for C-1, C-3, C-4, C-5, C-6 were less compared to C-7, C-8 (moderate) and C-2 (maximum), which explains the compounds stability. A compound is more reactive if it has a high chemical softness value and in reverse, for smaller values, reactivity decreases and stability increases [30,32]. The electrophilicity (x) value is another important parameter, and positive values measures the flow of the system to gain electron from the surrounding [30,32]. We observed that C-4 was the most stable compound because it had a low electrophilicity value compared to other compounds ( Table 2).

QSAR studies
QSAR studies were used to anticipate the reactivity and properties of the selected compounds. The computational calcula-tion was carried out using the HyperChem Professional 8.0.3 program. Initially, all compounds were optimized using the (MM + ) force field, with semi-empirical PM3 methods, and energy minimization was achieved using a Fletcher-Reeves conjugate gradient algorithm method. We found that the partition coefficient (log P) value for C-5 was greater than the rest of the compounds. Log P values are important to evaluate the biological activity of compounds and estimate their permeability into cell membranes [37]. Other key parameters such as Refractivity, Polarizability, Mass, Volume, Hydration energy, Surface area, Total energy, Free energy and RMS Gradient were also estimated for all compounds (Table 3).

Molecular docking study
Molecular docking was used to predict the binding interactions between each protein and the selected phytochemicals (ligands). The three-dimensional crystal structure of SARS-CoV-2 Mpro (PDB ID: 6LU7), SARS-CoV-2S (PDB ID: 6M0J) and the human ACE2 receptor (PDB ID: 6M18) were used as the biological targets for the docking analysis. We investigated eight previously reported phytochemicals from different natural sources [38][39][40]. All phytochemicals interacted with the main protease (Mpro) by docking in the binding pocket cavity comprising of common amino acid residues including ARG131, LYS137, THR199, TYR239 and LEU287. The best docking (binding free energy) scores for all phytochemicals were found in the range À6.2 to À7.9. Compound C-2 exhibited a binding energy value of À6.2 while C-8 had a binding energy value of À7.9. Detailed interactions are depicted in Fig. 2 and Table-S1. These ligands were also  docked with the human ACE2 RBD. Compound C-3 showed a binding affinity for ACE2 with a docking score of À6.3, whereas compounds C-5, C-6 and C-8 were predicted to have higher binding energies ( Fig. 3 and Table-S2). The docking scores obtained for each compound interacting with the SARS-CoV-2 spike protein receptor are presented in Fig. 4 and Table-S3. Binding energy values for all compounds ranged from À6.4 to À7.6, with C-5, C-6, C-7 and C-8 exhibiting the highest binding energy values. Taking all targets into consideration, we found that C-5, C-6 and C-8 exhibited the highest binding affinity values. Based on their scoring values, these compounds were further inspected   [12,15,16,19].

Molecular dynamics simulation
As margolonone (C-5) from A. indica, and luteolin-7-Oglucuronide (C-8) from Ocimum spp. revealed significant interactions with the protein targets in the molecular docking study, these compounds were chosen as the best ligands for further MD simulation work. A molecular dynamics simula-tion of 100 ns was carried out for these compounds to get better insights into the stability of the protein-ligand complexes. The overall stability was further investigated through RMSD and RMSF analysis.

Protein RMSD
MD simulation was run as reported earlier to investigate the stability of each ligand-protein interactions [41]. MD simulations were performed for the best complex (C8-6LU7, C5-6M18, and C8-6M0J) [15,42]. The ligand-protein complexes were simulated for 100 ns to analyze RMSD and RMSF (Fig. 5). For the C8-6LU7 complex, the protein achieved a maximum RMSD (2.6 Å ) at 90 ns and then gradually came to an equilibrium reaching 1.3 Å . The ligand showed initial fluctuations, then reached 20.0 Å and became stable reaching 15.0 Å at 100 ns for the same complex (Fig. 5a). The RMSF In silico studies on phytochemicals to combat the emerging COVID-19 infection remained less than 2.5 Å throughout the simulation (Fig. 5a). The protein RMSD trajectory for the C5-6M18 complex achieved a maximum root means square deviation (12.0 Å ) at 10 ns, decreased further up to 100 ns and then became stable at 9.0 Å . The ligand showed fluctuations initially and then gradually reached equilibrium (Fig. 5b). The RMSF was less than 4.5 Å for the complex (Fig. 5b). For the C8-6M0J complex, the protein achieved a maximum RMSD (2.8 Å ) at   (Fig. 5c). The RMSF was less than 3.0 Å for the complex throughout the simulation (Fig. 5c). The RMSD average value of Mpro (6LU7), ACE2 (6M18) and the spike protein (6M0J) at equilibrium was found to be 2.1 Å , 9.0 Å and 1.6 Å , respectively. A deviation within 1-3 Å is acceptable for small globular proteins [15,41,42]. These complexes were further used for protein-ligand contact analysis.  (Figs. 6-8).

Protein-ligand contacts
The protein-ligand (P-L) contacts for these stable complexes were studied using contact histograms (Figs. 9-11) [43]. The P-L interactions were classified into four types, including hydrophobic, ionic, hydrogen bonds, and water bridges. Hydrogen bonds play a crucial role in ligand binding and, as such, are crucial to take into account in drug design.

Pharmacokinetics and drug-likeness predictions
The pharmacokinetic and drug-likeness properties of the investigated compounds were calculated ( Fig. 12 and Tables S4-S9).
In-silico pharmacokinetic studies are used as an effective approach towards the search and design of potential small drug-leads for a specific target [44]. All compounds (except for C-4, C-6, C-7, C-8) obeyed the Lipinski's Rule of Five Fig. 9 Protein-ligand contact plots and interaction residues for the (C8-6LU7) complex.
(Ro5), with a molecular weight < 500 g/mol, number of hydrogen bond donors and acceptors < 5, logP value < 5 and molar refractivity < 140 [45,46]. The TPSA of all compounds (except for C-7, C-8) was less than 110 Å 2 , indicating the potentiality of these compounds as favourable drug molecules [43]. The number of rotatable bonds for all compounds was 5, suggesting that these compounds are flexible in nature. In addition, all compounds (except for C-7, C-8) were sol-uble and highly absorbable in the gastrointestinal tract. The synthetic accessibility value of the compounds was 6, which indicated their feasibility of synthesis. Drug-likeness filters such as, Ghose, Egan, Veber, and Muegge filters were also used to enhance the predictions. As C-6, C-7, C-8 did not obey the rules for oral drug-likeness, these compounds may be administered through other routes. Our results indicate that C5, however, is the best candidate for oral administration.

Conclusion
Upon examining the interactions of eight phytochemicals with the human ACE2 receptor, and the SARS-CoV-2 Mpro, spike protein targets, two phytochemicals (C-5 and C-8) were identified as the best binding ligands. These were further examined in MD simulation studies to determine the stability of the ligand-protein interactions. QSAR, pharmacokinetic and drug-likeness properties studies revealed that C-5 may be the best candidate for the design and development of new drugs. Further studies are warranted to examine the potential of this phytochemical in the fight against COVID-19.

Author contributions
All data were generated in-house, and no paper mill was used. All authors agree to be accountable for all aspects of work ensuring integrity and accuracy.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.