Identification of potential bioactive compounds from Azadirachta indica (Neem) as inhibitors of SARS-CoV-2 main protease: Molecular docking and molecular dynamics simulation studies

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative virus of coronavirus disease 2019 (COVID-19), has caused serious health problems worldwide and placed tremendous socioeconomic burdens. Azadirachta indica (Neem) is known as a versatile medicinal plant with many pharmacology activities. This study explored the potency of bioactive compounds from A. indica as inhibitors for SARS-CoV-2 main protease (M pro ) through molecular docking and molecular dynamics simulation (MDS) studies. Molecular docking and MDS were performed on 76 compounds contained in A. indica after a geometry optimization stage. This study found that odoratone (ORN), salimuzzalin (SMZ), and nimbocidin2 (NC2) had the best docking scores of −11.57, −9.83, and −9.60 kcal/mol, respectively. These scores are even better than nirmatrelvir (NTV) as an active drug targeting SARS-CoV-2 M pro (−8.42 kcal/mol) and the reference ligand (FJC) (−7.93 kcal/mol). Although SMZ indicated the lowest average root mean square deviation value (1.90 Å) for the SARS-CoV-2 M pro backbone disruption and the lowest average root mean square fluctuation value (1.32 Å) when interacting with residues, ORN still had the best average Δ G oMMGBSA value (−31.27 kcal/ mol), which showed the strongest binding of the protein-ligand complexes. These results could be a starting point for further in vitro and in vivo evaluation of several compounds in A. indica that are potential SARS-CoV-2 M pro inhibitors.


Molecular docking
The molecular docking was conducted in AutoDock 4.2 (https://autodock.scripps.edu/)(Morris et al., 2009) on a high-performance computer with Debian GNU/Linux 8.11 (jessie) 64-bit operating system, Intel® Core™ i7-5820K CPU @ 3.30 GHz × 12 processors, 31.4GiB memory, and NVIDIA® GeForce GTX™ 980 Ti/PCIe/SSE2 graphics card.First, the redocking of the reference ligand, FJC, on the target protein was necessary for molecular docking method validation.Subsequently, all prepared secondary metabolites and NTV were docked to the target protein in the same parameters as molecular docking validation.The molecular docking parameters used were central grid point coordinates x = −12.045,y = 11.235,z = 68.856;grid box size 40 × 46 × 40; grid point spacing 0.375 Å; default docking parameters; and Lamarckian Genetic Algorithm (Morris et al., 1998).Nonbonded interactions between ligand and SARS-CoV-2 M pro for the molecular docking result structures with the lowest free energy of binding, NTV, and FJC were analyzed with BIOVIA Discovery Studio 2021 Visualizer (https:// www.3ds.com/).
Azadirachta indica A. Juss, commonly known as Neem, is a member of the Meliaceae family, which originated from South and Southeast Asia, and is now grown in other tropical and subtropical regions, including Africa, America, and Australia (Passos et al., 2019).It is one of the most versatile medicinal plants having a wide range of pharmacological activities, such as antioxidant, anti-inflammatory, antidiabetic, anticancer, antimalarial, antifungal, antibacterial, antiviral, hepatoprotective, neuroprotective, and wound healing effect activities (Alzohairy, 2016).Azadirachtin, nimbolide, and gedunine are some of the bioactive compounds of A. indica known to have a stupendous ability to arrange many biological pathways in vitro and in vivo (Sarkar et al., 2021).Several reports were showing the antiviral properties of A. indica against coxsackievirus B (Badam et al., 1999), herpes simplex virus type 1 (Tiwari et al., 2010), human immunodeficiency virus type 1 (Awah et al., 2011), dengue virus type 2 (Dwivedi et al., 2021), and poliovirus type 1 (Baildya et al., 2021;Faccin-Galhardi et al., 2012).
Computer-aided drug discovery approaches are a cost-effective and time-saving strategy to screen candidates with inhibitory potency targeting SARS-CoV-2.Furthermore, molecular docking and virtual screening techniques have been demonstrated to identify potential drugs for COVID-19 treatment by the protein-ligand free energy of binding estimation (Hosseini et al., 2021).Sharon (2020) has reported the potency of selected compounds from A. indica in inhibiting SARS-CoV-2 M pro (PDB code: 6Y2E, 6LU7, and 2GTB) using molecular docking.Among 13 bioactive molecules, azadiradione, epiazadiradione, nimbione, and vepnin seemed to have the highest docking score (Sharon, 2020).Another study also showed that azadirachtin, epoxyazadiradione, and gedunin are the compounds that give the highest docking score (Kumar, 2020).
The molecular docking technique has restrictions on protein flexibility, solvation models, and its simplification of the scoring functions (Souza et al., 2021).Thus, another computational method is vital to complement molecular docking to give more reliable results.Full atomistic molecular dynamics simulation (MDS) is one of the methods for analyzing the dynamic behavior of ligand, protein, and solvent at a fixed period (Santos et al., 2019).
In the current work, a more extensive data set of secondary metabolites than in former studies (Kumar, 2020;Sharon, 2020) was utilized to explore the potency of A. indica against SARS-CoV-2 M pro (PDB code: 6M0K) through molecular docking and MDSs.Rather than using an aspartate protease inhibitor as positive control like in those studies, we compared the testing ligands using a specific 3CL-like protease inhibitor, nirmatrelvir (NTV), for positive control in the docking and MDS.The 3CL pro is the M pro of SARS-COV-2.It has the characteristic of the catalytic dyad Cys-His, which differs from the catalytic triad Asp-Thr-Gly of aspartate protease.Thus, using NTV as positive control is more reasonable.
where each G x term on the right in the equation is calculated as follows in E MM is the molecular mechanical energy in the gas phase and G solvation is the solvation energy.A single trajectory protocol approach was utilized for MMGBSA calculation, where the receptor and the ligand trajectories were extracted from the complex.Consequently, the internal energy terms, including angles, bonds, and dihedrals, were excluded as these are similar in bound and unbound states (Valdés-Tresanco et al., 2021).

Drug-likeness and absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction
The drug-likeness of secondary metabolites in A. indica was evaluated using Lipinski's rule of five in ChemOffice 2020 (https://perkinelmerinformatics.com/).The rule applies five criteria to specify if a compound is drug-like or not.The compound must have a molecular weight lower than 500 Da, log P under 5, h-bond donors fewer than or equal to 5, h-bond acceptors fewer than or equal to 10, and the violation from mentioned four rules not more than one.Eventually, pharmacokinetic properties, including ADMET, were predicted using the pkCSM server (http://biosig.unimelb.edu.au/pkcsm/prediction) (Pires et al., 2015).
Molecular docking studies were performed on all secondary metabolites, NTV, and FJC.In advance, a validation step was conducted by redocking FJC as the reference ligand on SARS-CoV-2 M pro .This is intended to verify whether the docking algorithm results in an accurate pose and whether the scoring function recognizes it as a top pose (Prieto-Martínez et al., 2018).It was confirmed to be valid if the RMSD of crystal and redocked structures is less than 2 Å (Tallei et al., 2020).The best RMSD achieved from this step was 1.938 Å. Crystal and redocked structures of reference ligand in the overlay are depicted in Figure S2.
Virtual screening based on a molecular docking approach was employed to identify potential compounds RTX™ 8000 graphics card.The protein topology was written by pdb2gmx using CHARMM36-feb2021 (http://mackerell.umaryland.edu/)(Huang et al., 2017) force field and TIP3P original water model (Jorgensen et al., 1983).The topologies of the minimum energy configurations of the three best secondary metabolites in A. indica reference ligand, and NTV obtained from molecular docking studies were processed in CHARMM General Force Field server (https://cgenff.umaryland.edu/)(Vanommeslaeghe et al., 2010;Yu et al., 2012).Furthermore, the topology of the M pro -ligand complex was built by combining M pro and ligand topologies.
The unit cell was defined using a rhombic dodecahedral shape, and the M pro -ligand complex was placed at its center at least 10 Å from the unit cell shape edge.Besides, this unit cell was filled with water using spc216.gro,a generic equilibrated three-point solvent configuration for TIP3P water.Using genion module in GROMACS 2021.5, some water molecules were replaced with Na + and Cl − to make the system charge neutral.Hereafter, energy minimization was accomplished by the steepest descent algorithm until the maximum force was under 1,000 kJ mol −1 nm −1 .Two equilibration phases were carried out, isothermal-isochoric ensemble (NVT) and isothermal-isobaric ensemble (NPT).Both NVT and NPT equilibration phases were done for 100 ps at 300 K. Fast smooth particle-mesh Ewald (PME) (Darden et al., 1993) was applied for long-range electrostatic.In contrast, the cut-off of short-range electrostatic and van der Waals was determined at 12 Å.After the system was well equilibrated at 300 K and 1 bar, the 100-ns molecular dynamics production was run in the same parameters as NPT.However, the position restraints were released previously.The temperature and pressure coupling used for the production were modified Berendsen thermostat (v-rescale) (Bussi et al., 2007) and the Parrinello-Rahman barostat (Parrinello and Rahman, 1981).Unrestrained MDS runs were conducted in triplicate for each ligand.The MDS was supported with the graphical processing unit, where the PME, nonbonded force calculation, and bonded force calculation could be offloaded, which accelerated the simulation significantly.

MDS trajectory analysis and binding energy calculation
The MDS structural trajectories were recentered and rewrapped within the rhombic dodecahedral shape using the gmx trjconv module.gmx rms module was computed for the root mean square deviation (RMSD) to compare each structure from a trajectory to the energy minimization result structure.In addition, the root mean square fluctuation (RMSF) of the residue position in the trajectory after fitting to the energy minimization result frame was analyzed with the gmx rmsf module.
The binding free energy values of protein-ligand complexes were estimated by molecular mechanics generalized born surface area (MMGBSA) method using gmx_MMPBSA (https://valdes-tresanco-ms.github.io/gmx_MMPBSA/dev/)(Valdés-Tresanco et al., 2021).The MMGBSA binding free energy (ΔG o MMGBSA ) in solution is defined as follows in donor-donor bond with NTV.Besides, NTV and FJC have halogen (fluorine) bonds that ORN and SMZ do not have.
The halogen bonding has stabilization effects on inter-and intramolecular interactions and influence molecular folding (Suárez-Castro et al., 2018).These bonding types were found in Thr26 connected to NTV as well as Arg188 and Gln189 linked to FJC. contained in A. indica.In molecular docking simulation, the energy of ligand-protein complex structure must be the global minimum on the binding energy landscape, representing a thermodynamic state of the energy function used in the simulation (Verkhivker et al., 2000).The molecular docking studies showed that among 76 secondary metabolites, 18 of them have higher binding energy scores than NTV.NTV was used in this study as this antiviral component of PAXLOVID™ which has SARS-CoV-2 M pro enzyme inhibitory activity (Catlin et al., 2022).Odoratone (ORN), salimuzzalin (SMZ), and nimbocidin2 (NC2) had the highest binding energy scores of −11.57, −9.83, and −9.60 kcal/mol, respectively.In comparison, binding energy scores for NTV and FJC are −8.42 and −7.93 kcal/mol, respectively.The estimated free energy of binding for all compounds is shown in Table S2, while the top 15 rankings, reference ligand, and NTV are listed in Table 1.
The two best secondary metabolites from molecular docking studies, ORN, and SMZ, were selected for visualization.Figure 1a shows that ORN built conventional hydrogen bonds with several residues, i.e., His163, Gln189, Thr190, and Gln192.Hydrogen bonding is principal intermolecular interactions that provide plenty of stability to the protein-ligand complex (Kwofie et al., 2021).Other critical amino acids involved in hydrophobic interactions (alkyl bonds) are Met165 and Pro168.Hydrophobic interaction plays an essential role as a driving force that allows the spontaneous folding of the protein into 3D structures (Yusof et al., 2019).
NTV and FJC from molecular docking results were also visualized in Figure 1c and d, respectively.In contrast to ORN and SMZ, Gln192 was linked via an unfavorable  Atomic fluctuation of each residue was evaluated through RMSF to understand the residue behavior of SARS-CoV-2 M pro in complex with the ligands.The higher the peak, the higher the atomic fluctuation (Fig. 3).The atoms in N-and C-terminal have higher fluctuation, up to 8 Å, than the atoms in any other area of the protein.The atoms in the protein surface loops tend to fluctuate more than the atoms in the α-helix or β-strand, especially those in the buried area, which indicates a more flexible movement of the protein loops.Some residue types, such as arginine, phenylalanine, glutamine, and tyrosine, located in the surface loops have RMSF values between 2 and 4 Å compared to other residue types in the same location have lower RMSF values.This could be due to their longer side chain that interacts with polar water molecules.
The atomic fluctuation in the ligand binding site was also evaluated to know ligand binding stability inside the pocket.It was revealed that SMZ generated the lowest average RMSF value of 1.32 Å when interacting with residues in the ligand binding site, followed by NTV (1.42 Å), ORN (1.43 Å), NC2 (1.62), and reference ligand, FJC, (1.71 Å).The average RSMF value of residues in the ligand binding site for SMZ, NTV, and ORN is even lower than that of the Apolipoprotein-SARS-CoV-2 M pro (1.61 Å).Residues in the ligand binding site with high RSMF values are predicted to make weaker interactions with the ligand.It implies that SMZ produces the most optimal binding when interacting with SARS-CoV-2 M pro .Some residues in the ligand binding site whose atoms fluctuate more than 2 Å when interacting with SMZ are only Met49, Arg188, and Gln189, of which the last two residues happen to be in the surface loop.The other residues in the ligand binding site whose atoms have RMSF lower than 2 Å might provide stronger binding.Interestingly, ORN might provide stronger binding because all residues in its binding site have atomic fluctuation below 2 Å, although the average value is not better than SMZ.
The ΔG o MMGBSA calculation was carried out to 100 ns of MDS trajectory (0-10,001 frames).Figure 4 shows that ORN had the most significant average ΔG o MMGBSA value, −31.27 kcal/mol, compared to other ligands.Meanwhile, the reference ligand, FJC, had a more negative average ΔG o MMGBSA value than two other compounds from A. indica, SMZ and NC2, and the positive control, NTV.The calculated average ΔG o MMGBSA values of FJC, SMZ, NC2, and NTV were found to be − 26.46, −20.73, −16.30, and −17.32 kcal/mol, respectively.The calculated values are approximations of the binding free energy, where more negative values indicate stronger binding.

Drug-likeness and ADMET prediction
Similar to the efficacy of plants, bioavailabilities and ADMET profiles are also determined by the nature of each compound contained.It is essential to evaluate these properties so a single compound can be suggested as a drug candidate that is safe and optimal for use.However, ADME studies are not usually required for herbal remedy discovery and development.Most herbal medicines have little or no data on their ADME and pharmacokinetic properties in humans (He et al., 2011).
Lipinski's rule or rule-of-five states that poor absorption or permeability of a compound is more likely when it has more than 5 hydrogen-bond donors, the molecular mass

Molecular dynamics simulation
The stability of complexes in the SARS-CoV-2 M pro active site was studied using MDS within 100 ns.The ligands to be compared in this simulation are ORN, SMZ, NC2, reference ligand (FJC), and NTV as a positive control.Parameters, such as RMSD, RMSF, and binding energy calculation (MMGBSA), generated from MDS were analyzed.
The RMSD value measures the simulation system stability and the conformational perturbation of the protein backbone during a time scale of simulation (Sargsyan et al., 2017).Overall, the convergence of RMSD for the complex of SARS-CoV-2 M pro and the ligands were observed after 15 ns with the average RMSD value of 2.10 Å that denotes fewer changes in the overall structure during the simulation time (Fig. 2).The protein backbone of SARS-CoV-2 M pro was stable during the entire simulation, with the average RMSD value of 2.16 Å, which indicates lower fluctuation and stable behavior.The ligand SMZ displayed the most stable simulation system with the lowest average RMSD value of 1.90 Å among other ligands, slightly lower than positive control NTV (1.98 Å) for the SARS-CoV-2 M pro backbone perturbation.The RMSD profile of the complex M pro -SMZ resembles that of the complex M pro -NTV.The difference between the two is that no residues in the backbone of M pro -SMZ appeared to have RMSD > 2.50 Å, while some residues in the backbone of M pro -NTV were still perturbed higher than 2.50 Å.The docked complex of the other ligand, ORN, was stabilized without significant deviation since 50 ns with the average RMSD value of 2.02 Å.Unlike other complexes that have been quite stable since the simulation's beginning, the complex with ligand ORN resulted in an RMSD of more than 3.00 Å before reaching 50 ns.On the other hand, the complex M pro -FJC, the reference ligand, showed significant deviation with RMSD > 3.00 Å from 58 ns onward and average backbone RMSD 2.36 Å, putting this complex as the most unstable among other ligands.NC2, the third-best secondary metabolite in the docking study, still remained in third place in this MDS compared to SMZ and ORN with the average RMSD 2.26 Å, but still better than the reference ligand, FJC. is higher than 500, the calculated log P is greater than 5, and the hydrogen bond acceptor is more than 10.Lipinski's rule of five was initially conceived to aid the development of orally administrated drugs.However, it was not intended to generalize all small-molecule drugs (Lipinski et al., 1997;Neidle, 2012).From the evaluation of 15 metabolites with the highest rankings as potential inhibitors of SARS-CoV-2 M pro , only limocinin did not meet this rule.Lipinski's rule-of-five results for all secondary metabolites are listed in Table S3.
The predicted ADMET properties are tabulated in Tables S4-S7.The prediction data shows that each compound from A. indica displayed diverse ADMET.Still, it can be tolerated because there are synergy effects probabilities.In general, they tend to be absorbed by the gastrointestinal part sufficiently with low blood-brain-barrier (BBB) permeability values.Some compounds do not affect CYP2D6 substrate, CYP2C9 inhibitor CYP2D6 inhibitor cytochromes.The skin permeability for the compounds is between 2,314 and 3,599 units.Most compounds show negative AMES toxicity except nimbisonol, and none of them has shown hERGI inhibition activity.The LD 50 values of the studied compounds are between 1.8 and 4.0 mol/kg.administrated SARS-CoV-2 M pro inhibitor, and the reference ligand.Meanwhile, SMZ showed fascinating performance on MDS with the lowest average RMSD for the SARS-CoV-2 M pro backbone perturbation.The ligand SMZ also had the lowest average RMSF, indicating its strong interactions with the residues.Nevertheless, in MDS studies, ORN showed the best average ΔG o MMGBSA value, which indicated the strongest binding free energy values of the protein-ligand complexes.This study showed that A. indica has several prospective compounds as SARS-CoV-2 M pro inhibitors, and further wet experiments are needed to validate these potentials.

ETHICAL APPROVALS
This study does not involve experiments on animals or human subjects.

Figure 2 .
Figure 2. RMSD plots of SARS-CoV-2 M pro backbone in ligand-unbound (APO) and ligand-bound structures.The given chart is the average of the triplicate MDS results for each ligand.

Figure 3 .
Figure 3. RMSF plots of SARS-CoV-2 M pro in ligand-unbound (APO) and ligand-bound structures.The given charts are the average of the triplicate MDS results for each ligand.

Figure 4 .
Figure 4. ΔG o MMGBSA values' box plot of some ligands binding to SARS-CoV-2 M pro , which are the averages of the triplicate MDS results for each ligand.

Table 1 .
Estimated free energy of binding rankings from molecular docking results of secondary metabolites in A. indica (top 15 of 76 compounds), reference ligand, and NTV.