Molecular Docking Studies of Curcumin Analogues against SARS-CoV-2 Spike Protein

Severe acute respiratory syndrome-coronavirus 2 (SARS-CoV-2) is the etiologic agent of the current pandemic of coronavirus disease 2019 (COVID-19) that has inflicted the loss of thousands of lives worldwide. The coronavirus surface spike (S) glycoprotein is a class I fusion with a S1 domain which is attached to the human angiotensin converting enzyme 2 (ACE2) receptor, and a S2 domain which enables fusion with the host cell membrane and internalization of the virus. Curcumin has been suggested as a potential drug to control inflammation and as a potential inhibitor of S protein, but its therapeutic effects are hampered by poor bioavailability. We performed a molecular docking and dynamic study using 94 curcumin analogues designed to have improved metabolic stability against the SARS-CoV-2 spike protein and compared their affinity with curcumin and other potential inhibitors. The docking analysis suggested that the S2 domain is the main target of these compounds and compound 2606 displayed a higher binding affinity (-9.6 kcal mol) than curcumin (-6.8 kcal mol) and the Food and Drug Administration (FDA) approved drug hydroxychloroquine (-6.3 kcal mol). Further additional validation in vitro and in vivo of these compounds against SARS-CoV-2 may provide insights into the development of a drug that prevents virus entry into host cells.


Introduction
The SARS-associated coronavirus (SARS-CoV) is the causative agent of severe acute respiratory syndrome (SARS), which emerged as an epidemic from 2002 to 2003. More than 8000 cases were reported worldwide and before being brought under control, it took the lives of about 10% of those infected. 1 We are now facing a new SARS-associated coronavirus pandemic and the first cases were detected in Wuhan City, Hubei Province of China during December 2019. It is transmitted from human-to-human via sneezing, coughing and respiratory droplets and until February 2021, it has infected in excess of 107 million people around the world and caused over 2.3 million deaths. 2,3 The new virus was named by World Health Organization (WHO) as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the associated disease Coronavirus Disease-2019, also known as  Coronaviruses are part of the Coronaviridae family of the order Nidovirales. Within this subfamily there are four genera, Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus. [5][6][7][8] These viruses can be found in several animals, such as birds and mammals, which are the possible transmitters of the virus to humans. 9 Phylogenetic studies revealed that SARS-COV-2 is a β coronavirus of 2B group related to other bat SARS-related coronaviruses such as SARS-CoV and the Middle Eastern respiratory syndrome coronavirus (MERS-CoV) which frequently has been associated with fatal diseases. 4,10 SARS-CoV-2 is a positive-sense single-stranded ribonucleic acid (RNA) virus with a genome of 29.891 kb in size, enclosed by a 5'-cap and 3' poly-A tail, coding 10 open reading frames (ORF). It has unique genomic features that determines diversification into at least three different genetic clades, 5 but maintains structural proteins like those of other members of the Coronavirus family of which the major ones are the spike (S), membrane (M), envelope (E), and nucleocapsid (N) proteins. 6 One of the main differences between the virus found in humans and the ones found in animals, is in the spike (S) protein. 7 This protein has two subunits, S1 and S2, the first being responsible for the adherence of viruses to the host cells and the second for the fusion of the virus and cell membranes. 8 The adhesion process of the S1 subunits takes place through the receptor binding domains (RBD, receptor-binding domain), which in the case of SARS-CoV and SARS-CoV-2, interact with ACE2 human cell receptors (angiotensin, converting enzyme 2), 9 Figure 1. The structure of the ACE2 ectodomain, 11 presents a claw-shaped N-terminal peptidase, with the active site at the base of a deep groove and a C-terminal "collectrin" domain. 12 A fragment of the S1 region, residues 318 to 510, is sufficient for binding to the peptidase domain of ACE2. 13,14 The receptor-binding domain (RBD) determines the virus-receptor interaction as well as the viral host range. 15 Changes in some residues in the RBD can cause efficient transmission between species and is important for determining host specificity. 16 The huge negative health and economic impact of COVID-19 demands intensive research on treatment and though different vaccines have reached human trials and commercialization, 17 results in animals indicate that they protect against development of COVID-19 but not of infection by SARS-CoV-2. 18 Moreover, the genetic diversification of SARS-CoV-2 points out that, as for other corona viruses, it is unlikely that there will be vaccines for life-long protection. 19 Therefore, chemotherapeutics to stop virus replication and spread is still an important objective. The S protein is the major antigen responsible for inducing effective neutralizing antibodies, and therefore not only a target for vaccines but also for antiviral drugs.
Natural compounds such as flavonoids, alkaloids, lipids, benzenoids, lignans, neolignans and synthetic analogues have been investigated for their potential therapeutic effects against SARS-CoV-2. 20,21 Among their diverse bioactive activities they have antiviral properties and have demonstrated significant interaction with TMPRSS2 residues (essential for viral dissemination and pathogenicity) at the active site of S proteins. 21 Among 33 molecules studied, a natural compound (rutin) showed greater efficiency in binding to the active site of the SARS-CoV-2 protease, than a control antiviral drug (ritinovir), an antiprotozoal drug (emetin) and hesperidin which is a natural compound. 22 Another natural compound studied for COVID-19 treatment is curcumin [(1E,6E)-1,7-bis(4-hydroxy-3-methoxyphenyl)-1,6-heptadiene-3,5-dione], a dietary phytochemical constituent of turmeric powder derived from the rhizomes of Curcuma longa. 23 It has several benefits in controlling inflammation, 24 which is a common reaction caused by SARS-CoV-2. 25 In addition, curcumin has been reported to have a good binding affinity and efficient pharmacokinetic parameters, suggesting its use as a potential inhibitor of the S protein. 26,27 On the other hand the therapeutic effects of curcumin are hampered by its poor absorption after ingestion, quick degradation under physiological pH conditions, as well as rapid metabolism and elimination. 23 Molecular docking is a computational technique which is used to access information about the interaction between proteins and small ligand molecules. 28 Since this interaction occurs in several ways, molecular docking performs simulations that can predict the best conformations and orientations of the ligands in the binding sites of the target protein. 29 Hence, this technique has been used to design new drugs. Recently, molecular docking studies conducted by Patel et al. 27 have shown that curcumin and especially its derivative bisdemethoxycurcumin showed good binding affinity to the spike protein of both the SARS-CoV and SARS-CoV-2 viruses. Curcumin and this analog have a β-diketone moiety which is substrate for liver aldoketo reductases and may be one of the reasons for the rapid metabolism of curcumin in vivo. 30 Many research groups have circumvented this problem by generating analogous compounds with a mono-ketone moiety to improve the stability and metabolic profiles of curcumin. 31 In the present study, we conducted molecular docking studies to assess the interaction of a series of such compounds with the S1 and S2 subunit of the SARS-CoV-2 spike protein. We found that compound 2606 has potential as an inhibitor of the S2 domain and could be explored to inhibit the fusion of SARS-CoV-2 and host cell membranes, which are a potential therapeutic target for micromolecules.

Methodology
Molecular docking analysis of complexes of X-ray crystallographic structure of COVID-19 spike protein from 6YVB access code, 8 with inhibitors, were performed using AutoDock Vina version 1.1.2 and AutoDock 4. 32,33 The spike protein and ligand molecules were prepared through AutoDockTools, 34 for molecular docking simulations. In a first blind docking experiment, the binding affinity of the whole spike protein and of the 94 compounds (Supplementary Information (SI) section, Figure S1) used in this work were analyzed (Table 1). This analysis was also performed for 15 inhibitors described in the literature (structures available in PubChem). The three-dimensional structure of the curcumin analogues was designed with the Gaussian 09 program, 35 which was optimized using the semi-empirical method PM6. 36 Hydrogens from the protein and ligands attached to non-polar atoms were removed and the protonation status of the histidine residues of the spike protein was defined according to the PROPKA 3.0 program, 37 of the server PDB2PQR version 2.0.0, 38 considering pH 7.0. The rigid root of the ligands was identified automatically, defining all possible rotatable covalent connections as active to perform torsions in the conformational search at the connection site.
The search space in the blind docking simulations covering the whole spike protein was included in a box of 76 × 92 × 126 Å, 1 Å spacing, for the AutoDock and AutoDock Vina programs. In the AutoDock Vina program, 100 exhaustiveness was used. In the AutoDock program, 100 runs were carried out, a population of 250 was used, and the maximum number of evaluations were equal to 25,000,000. Remaining parameters were set to program standards. In this program, Lamarckian genetic algorithm (LGA), root-square-mean deviation tolerance (RMSD) was used for the 2.0 Å cluster analysis and the internal electrostatic energy was calculated. The coordinates of the center of the conformational search box at the protein binding site were defined according to the position of the protein. After the blind docking simulation, the ten ligands with the highest binding affinity were selected for a simulation of local molecular docking. In this simulation, the AutoDock Vina program was used, considering the domain found in the poses of the greatest binding affinities of the blind docking simulation. In this second experiment, local docking was carried out in the S1 domain of the C-terminal and in the S2 domain of the protein, since no ligand showed greater binding affinity in the S1 portion of the N-terminal of the protein. The local docking using the spike protein C-terminal domain was performed with a box of 58 × 54 × 50 Å. A box of 44 × 44 × 118 Å was used for the remaining compounds which displayed the highest binding affinity to the S2 domain. A spacing of 1 Å and 100 exhaustiveness was considered for both domains. The LigPlot+ program 39 was used to calculate the residues formed between the spike protein and selected ligands.
The bioavailability was also calculated for the ligands. The SwissADME program 40 was used for these calculations, which uses the bioavailability score described by Martin. 41 Afterwards molecular dynamics (DM) simulations were performed using the GROMACS version 5.0.1 program. 42 The DM simulations occurred with the spike protein S2 domain, since the ligands selected in the docking assays were part of this domain. The trajectories had 50 ns and used the CHARMM force field and the TIP3P water model. 43 The initial position of the compounds used in the DM simulations was obtained from the output of the AutoDock Vina program.
In the simulations, a 74 Å cubic box and solvation in a solution of 100 mmol L -1 of NaCl in water was considered. The protonation status of the residues was defined considering the results obtained in PROPKA 3.0, 37 using a pH of 7.0. Periodic boundary conditions and isothermalisobaric (NPT) ensemble were used in all simulations. For the system, a temperature of 298 K (25 ºC) and pressure of 1.0 bar was used, through the modified Berendsej thermostat (τ t = 0.1 ps) and the Parrinello-Rahman barostat (τ p = 2.0 ps and compressibility 4.5 × 10 -5 bar -1 ). A cutoff point of 14 Å for the Lennard-Jones and Coulomb potentials was used. To measure electrostatic interactions, the particle mesh Ewald (PME) algorithm was used. In DM simulations, a time interval of 2.0 fs was also used and all covalent bonds involving hydrogen atoms were restricted to their equilibrium distances. The conjugated gradient minimization algorithm was used to relax the atoms, in order to avoid the overlaps that occur at the beginning of the box construction process. To minimize the energy of the system, the conjugate gradient and steepest descent integrator algorithm was used, with maximum force criterion equal to 1000 kJ mol -1 nm -1 . After performing the tests, the trajectories obtained were analyzed according to the following parameters: (i) RMSD of non-hydrogen atoms, for the S2 domain of the spike protein and ligands; (ii) number of hydrogen bonds, using cutting distance = 3.5 Å and the maximum angle = 45°; and (iii) number of contacts for distances < 0.6 nm. All molecular dynamics simulations were performed in duplicate and the average RMSD values obtained for protein and ligand, hydrogen bonds and number of contacts were averaged. In the SI section, in Figures S2, S3 and S4, the independent results obtained in these simulations can be observed. In Figure 2, a flowchart displays the methodology and the time spent to carry out the simulations in this work.

Results and Discussion
The interactions of the compounds and the spike protein are listed in Table 1 according to their AutoDock Vina and AutoDock docking scores. This table also shows the bioavailability of the compounds evaluated. The highest binding affinity were observed for compounds 2606 (-9.6 kcal mol -1 ) and 2608 (-9.5 kcal mol -1 ), followed by the ligands rutin (-9.2 kcal mol -1 ), compound 2065 (-9.1 kcal mol -1 ) and hesperidin (-9.0 kcal mol -1 ) which displayed the highest binding affinities, considering the AutoDock Vina program. In the AutoDock program, the highest binding affinities were obtained for binders 2576 (-6.51 kcal mol -1 ), 2610 (-5.67 kcal mol -1 ), 2557 (-5.34 kcal mol -1 ), 2062 (-5.33 kcal mol -1 ) and 2606 (-5.33 kcal mol -1 ). It was found that, as many compounds used in simulation have a large number of torsions, the results obtained in the AutoDock Vina program may be more suitable for these simulations than those obtained in AutoDock. As an example, the rutin and hesperidin binders have the binding affinities 0.28 and -1.68 kcal mol -1 , respectively, in the AutoDock program. These results are inferior when compared to those  After the initial blind docking simulation, the ten ligands with the highest binding affinity were selected for a simulation of local molecular docking. In this simulation, the AutoDock Vina program was used considering the domain found in the poses of the greatest binding affinity of the initial simulation.
In this second experiment, local docking was carried out in the S1 domain of the C-terminal and in the S2 domain of the protein. The residues formed in this interaction were calculated (Table 2 and Figure 3) and no ligand showed greater binding affinity in the S1 portion of the N-terminal of the protein than the selected compounds. The compound 2578 showed the highest binding affinity when only the S1 domain of the spike protein C-terminal is used, while the other compounds showed better results with the S2 domain of the protein. The following residues were not included: (i) the Asp775 residue, identified only for ligand 2582; (ii) residues Glu868 and Ser813, found only for the compound hesperidin; (iii) the Gln774 residue, identified only for linker 2582; (iv) the Phe823 residue, verified only for the compounds hesperidin and rutin; (v) residue Pro862, identified only for compounds rutin and 2610; (vi) the Ser1055 residue, found for the genistein ligand; (vii) the Thr827 residue, found for compounds 2065 and 2582; (viii) residue Val729, identified for binders 2610 and 2582; (ix) the Val826 residue, identified for the compounds rutin and 2582.
The residues Asp867, His1058, Ile870, Pro863, Ser730, Thr732 are present in all the bonds formed between ligands and the S2 domain of the spike protein, Table 3. The Ala1056 and Phe782 residues are identified for eight evaluated ligands and Gly1059 residue for seven. The remaining residues (Thr778 and Val860, for 6 ligands; Pro1057, for 5 compounds; Leu861 and Lys733, for 4 ligands; Leu865, Met731 and Thr866 for 3 compounds) were found for a smaller number of bonds. These results show that part of the domain is coincident for all ligands that showed the highest binding affinity in the spike protein S2 domain. The portion of the spike protein which is common to the compounds is featured in Figure 4.  The ligand 2578 was not included in the Table 3 because it has the highest binding affinity in the S1 C-terminal of the spike protein. The binding microenvironment of five ligands which showed the highest binding affinities in the molecular docking simulations is also shown in Figure 4. The data in Table 4 reveals that four of the seven compounds with the highest binding affinities, considering AutoDock Vina program, are members of a series of cytotoxic 3,5-bis(arylidene)-1-dichloroacetyl-4-piperidones. A total of 14 compounds in this series were evaluated and the results are arranged in Table 4 with respect to the nature of the aryl substituents.
According to Table 4, the magnitude of the variation in the binding affinities with the nature of the aryl substituents permitted the following observations to be made.
(i) The compounds with halogens in the aryl rings namely 2605-2610 have greater binding affinities than the unsubstituted compound 2603.
(ii) The most potent compounds in this series of Asp 867 Gly 1059 His 1058 Ile 870 Leu 861 Leu 865 Lys 733 Met 731 Phe 782 Pro 863 Pro 1057 Ser 730 Thr 732 Thr 778 Thr 827 Thr 866 Val 860 (iv) Compounds containing a strongly electronattracting (2616) and a strongly electron-releasing (2615) groups in the aryl rings have similar binding affinities.
(v) Most of the compounds are more potent than curcumin and bisdemethoxycurcumin which have binding affinity figures of -6.8 and -7.0 kcal mol -1 , respectively.
The average binding affinities of the compounds with halogens in the aryl rings namely 2605-2610 is -8.7 kcal mol -1 . Hence in the future, a number of analogs with halogens in different locations in the aryl rings should be prepared and their binding affinities determined with the objective of finding novel compounds with high binding affinities.
Considering the affinities of ligation obtained in the docking programs, it appears that the curcumin analogues 2606, 2608 and 2065 showed the best affinities in the AutoDock Vina program and competitive results for the AutoDock program. For this reason, these ligands have been used in molecular dynamics simulations. In addition, it was found that the ligand genistein obtained the results of greatest interest among the compounds described in the literature and that were evaluated in this work, since this compound showed competitive affinities in the two docking programs used. Thus, this ligand was used as a reference in DM simulations. RMSD values of nonhydrogen atoms of S2 spike protein and ligands from the starting structure (docking pose) for the complex are reported as a function of the time at the bottom of Figure 5. In these simulations, the averages of values obtained in the duplicate simulations were calculated, from which the following RMSD values were obtained: (i) for ligand 2606 (0.30 ± 0.03) and S2 domain of the spike protein (0.40 ± 0.06), those results are shown in Figure 5a; (ii) for ligand 2608 (0.30 ± 0.05) and S2 domain of the spike protein (0.33 ± 0.03), the results of which are shown in Figure 5b; (iii) for ligand 2065 (0.37 ± 0.06) and S2 domain of the spike protein (0.37 ± 0.04), the results of which are shown in Figure 5c; (iv) for the genistein ligand (0.19 ± 0.04) and S2 domain of the spike protein (0.45 ± 0.04), those results are shown in Figure 5d. It is observed that all ligands remain stable during the DM simulations, especially the ligand 2065, which does not show considerable changes in its initial conformation along the 50 ns dynamic trajectory.
The results obtained for the number of contacts, for distances < 0.6 nm, are presented in Figure 6. In this simulation, the following numbers of contacts were obtained: (i) 2606, with 1100 ± 54 contacts ( Figure 6a); (ii) 2608, with 1122 ± 87 contacts (Figure 6b) (iii), 2065, with 1395 ± 80 contacts ( Figure 6c) and (iv) genistein, with 1107 ± 47 contacts (Figure 6d). In this simulation, it was found that the curcumin analog 2065 presented the largest number of contacts, among the evaluated ligands.
The average number of hydrogen bonds formed between the S2 domain of the spike protein and the selected ligands is verified in Figure 7. In this simulation, 1 ± 1 hydrogen bonds were obtained for ligands 2606 ( Figure 7a) and 2608 (Figure 7b). The reference ligand genistein, which had the highest number of hydrogen bonds in the simulations performed, a value of 4 ± 1 bonds was obtained. The average of hydrogen bonds obtained for ligand 2065 was 0 (Figure 7c). The highest percentage of trajectory occupation in relation to the hydrogen bridges formed between the S2 domain of the spike protein and the ligands 2606, 2608 and 2065 (in the first DM simulation) occurred between His1077, which donates nitrogen and whose acceptor is O 2 , available in these binders. In the second DM simulation for the ligand 2065, the longest hydrogen bridge occurred between Asp849, which donates N and the 2065 which receives O 3 . In the first dynamics simulation of genistein ligand, it was found that the hydrogen bridge that occurs the longest in the simulation occurs between genistein, which donates O 2 and receives O, available at Gly1078. For the second simulation, the longest bridge occurs between the nitrogen donated by Lys752 and which has genistein's O 1 as a receptor. Detailed information can be found in the SI section, in Table S1.

Conclusions
We have shown by molecular docking that a series of  curcumin analogues with a mono-keto moiety have the potential to inhibit SARS-CoV-2 spike protein interaction with the host receptor. Compound 2606 displayed higher binding affinity (-9.6 kcal mol -1 ) to spike protein than bisdemethoxycurcumin (-6.8 kcal mol -1 ) and curcumin (-7.0 kcal mol -1 ), using AutoDock Vina program. Regarding the AutoDock program, all binders with higher affinities are curcumin analogues. In the results of molecular dynamics, it was also observed that the compounds used showed competitive results, compared to the reference ligand. With our parameters it also displayed better results than the Food and Drug Administration (FDA) approved drug hydroxychloroquine. 45 Compound 2606 binds preferentially to the C-terminal region of the spike protein S2 domain which is responsible for the formation of an helical coiled structure that participates in the virus-host cell membrane fusion. 8 The fusion of SARS-CoV-2 with host cell membrane relies on activation by surface proteases, such as TMPRSS2, which induces S1 dissociation and structural changes of the S2 domain. 46 Therefore, future work will consider if the compounds described here can interfere with the binding entry of SARS-CoV-2 into the host cell by blocking the interaction of the spike protein and TMPRSS2. Although the work in progress is computational and requires additional validation in vitro and in vivo, our results demonstrate the potential of the curcumin analogues described here for the purpose of finding an anti-COVID-19 drug.

Supplementary Information
Supplementary data with chemical structure of compounds and molecular dynamics simulations are available free of charge at http://jbcs.sbq.org.br as PDF file.
conceptualization, data curation, investigation, software, writing original draft and writing-review and editing; Umashankar Das, Ana L. Fachin, Jonathan R. Dimmock and Mozart Marins for the project administration, resources, writing original draft and writing-review and editing.