In silico screening of chalcones and flavonoids as potential inhibitors against yellow head virus 3C-like protease

Yellow head virus (YHV) is one of the most important pathogens in prawn cultivation. The outbreak of YHV could potentially result in collapses in aquaculture industries. Although a flurry of development has been made in searching for preventive and therapeutic approaches against YHV, there is still no effective therapy available in the market. Previously, computational screening has suggested a few cancer drugs to be used as YHV protease (3CLpro) inhibitors. However, their toxic nature is still of concern. Here, we exploited various computational approaches, such as deep learning-based structural modeling, molecular docking, pharmacological prediction, and molecular dynamics simulation, to search for potential YHV 3CLpro inhibitors. A total of 272 chalcones and flavonoids were in silico screened using molecular docking. The bioavailability, toxicity, and specifically drug-likeness of hits were predicted. Among the hits, molecular dynamics simulation and trajectory analysis were performed to scrutinize the compounds with high binding affinity. Herein, the four selected compounds including chalcones cpd26, cpd31 and cpd50, and a flavonoid DN071_f could be novel potent compounds to prevent YHV and GAV propagation in shrimp. The molecular mechanism at the atomistic level is also enclosed that can be used to further antiviral development.

Several antiviral drugs targeting the viral replication process have been developed. These drugs often interrupt the orchestrated action of the viral nonstructural proteins (Dinesh et al., 2020). Viral protease is one of the most exciting targets for antiviral drug development due to its structural simplicity, small size, and function in producing other enzymes and structural proteins necessary to produce mature virions (Dinesh et al., 2020). Chymotrypsin-like proteinase (3CL pro ) is a protease encoded by most viruses in Nidovirales, including YHV. 3CL pro plays a pivotal role in the viral replication process by cleaving the viral polyproteins at multiple conserved sites to release several vital replicative proteins (Ziebuhr et al., 2003). Due to the unavailability of the 3D structure from X-ray crystallography, nuclear magnetic resonance (NMR), or cryogenic electron microscopy (cryo-EM) of YHV 3CL pro , homology-based modeling has been conducted to gain the structural basis of this viral protease (Unajak et al., 2014). The YHV 3CL pro model was used in virtual screening for potential compounds from the cancer therapeutic drugs database. NSC122819 was the best promising candidate as it showed relatively good results in an inhibition experiment against bacterially expressed YHV 3CL pro . However, this compound is a chemotherapeutic drug derivative to podophyllotoxin, causing a cytotoxic effect and inhibiting topoisomerase II activity (Dombernowsky, Nissen & Larsen, 1972). Although the potential compound exhibited a good result the in vitro experiments, it is still challenging further to develop NSC122819 for exploitation in the farm setting. Therefore, a search for an alternative compound with higher bioavailability suitable for the treatment of YHV infection would be of great interest.
Flavonoids, the common polyphenols produced by plants, are the most abundant polyphenol in animal diets (Dai & Mumper, 2010). As secondary plant metabolites, which are not essential for the living of the plant, flavonoids are responsible for various biological activities, such as plant coloration (Yoshida, Oyama & Kondo, 2012), UV filtration (Takahashi & Ohnishi, 2004), molecular signaling (Brunetti et al., 2018, detoxifying (Samanta, Das & Das, 2011), and function as anti-microbial agents (Al Aboody & Mickymaray, 2020). Moreover, flavonoids also showed several pharmaceutical properties (Agrawal, 2011). Due to their low toxicity and availability, flavonoids are a well-known component of herbal medicines and even dietary supplements (Kumar & Pandey, 2013).
According to previous studies about potentials of flavonoid and chalcone as viral protease inhibitors, as well as their low toxicity and bioavailable nature, a set of flavonoids and chalcones from our database was considered for identification of bioactive drug candidates for inhibition of YHV major protease, YHV 3CL pro , using computational approaches (Abudayah et al., 2022;Al-Sha'er, Basheer & Taha, 2023). The YHV 3CL pro model predicted by deep learning-based structural modeling was used to identify potential candidates of chalcones and flavonoids by molecular docking, drug-likeness prediction, and molecular dynamics (MD) simulations. The insights offered from this work can thus open a new avenue to exploit both chalcones and flavonoids in preventive and therapeutic approaches against YHV infection. Since YHV and GAV are closely related, the structure of GAV is also included in the screening process to obtain the candidate compounds that can be antiviral agents for both pathogens.

3CL pro structure modeling
Due to lack of the 3D structure available, the 3D model of 3CL pro of YHV and GAV were constructed based on their amino sequences. The amino acid sequence of YHV 3CL pro was obtained from the NCBI protein database (NCBI reference sequence: ACH99403.1, accessed in October 2021). The amino sequence of GAV 3CL pro was extracted from ORF1, a polyprotein pp1a, available on NCBI database (NCBI reference sequence: YP_001661453.1). The two proteins were modelled using ColabFold web software (Mirdita et al., 2022) with AlphaFold v.2.0 (Jumper et al., 2021. Subsequently, the protonation state of ionizable residues was assigned at pH 7.4 using the Open Babel (O'Boyle et al., 2011). The catalytic histidine, H63, was in neutral form with protonated δ nitrogen. The side chain of catalytic dyad H63 and C152 was adjusted to be in active conformation before energy minimization using Chimera software (Huang et al., 1996). The model quality score and local distance difference test (lDDT) were used to determine the local quality of the models, while the regions with lDDT value <50 were manually trimmed out from the model.

Molecular docking
According to previous works about potentials of flavonoid and chalcone as viral protease inhibitors , HIV-1 protease (Turkovic et al., 2020;Xu et al., 2000), dengue-2 protease (Hengphasatporn et al., 2020;Kiat et al., 2006;Srivarangkul et al., 2018), MERS-CoV 3CL pro (Jo et al., 2019), and SARs-CoV 3CL pro (Hengphasatporn et al., 2022;Nguyen et al., 2012;Park et al., 2016), as well as their low toxicity and bioavailable nature, the set of 223 chalcones and 49 flavonoids from in-house database (Sangpheak et al., 2019) was considered for identification of bioactive drug candidates for inhibition of YHV major protease, YHV 3CL pro , in this study. The structure of the reported anti-YHV 3CL pro agent, NSC122819 (Unajak et al., 2014), was downloaded from the ZINC database. The 2D structures of all studied compounds were depicted in supplemental Fig. S1. Each compound from in-house database and the known potent compound NSC122819 was docked into the 20x20x20 Å 3 box centered at the center of 3CL pro catalytic residues, H63 and C152, using AutoDock VinaXB (Koebel et al., 2016) with 100 runs. Among the different configurations, the ligand conformation with the highest binding affinity was selected for further analysis. To validate reliability of molecular docking result, the known inhibitor NSC122819 was used as a template to generate 50 decoy molecules with Directory of Useful Decoys, Enhanced (DUD-E) sever (Mysinger et al., 2012). The decoys and candidate molecules went through the molecular docking process again to generate receiver operating characteristic (ROC) curve (Empereur-Mot et al., 2015).

Molecular dynamics simulation
The docked ligands/YHV 3CL pro complexes were performed by 200-ns MD simulations in periodic boundary conditions using AMBER20 (Case et al., 2021). The ligand structures were optimized by the HF/6-31G(d) level of theory using Gaussian 09 software (Frisch et al., 2009), and the electrostatic potential (ESP) charges were generated at the same method (Sanachai et al., 2020). Subsequently, the restrained ESP (RESP) charges of ligand were obtained using the ANTECHAMBER module. The protein was treated by AMBER ff16SB forcefield, while the ligand parameters were retrieved from generalized AMBER force field version 2 (GAFF2). The TIP3P water molecules were added to solvate the complex in a cubic box with a dimension extended at least 13 Å from the system surface. Counter ions, Na+, were added to neutralize the system. To diminish the unfavorable contacts and steric hindrances, the added waters and ions were minimized using steepest descent (SD) and conjugated gradient (CG) methods for 1500 iterations with restrained protein-ligand complex using a force constant of 500 kcal/mol Å 2 , followed by the same minimization procedure with 1000 iterations for the whole system. The SHAKE algorithm was applied to constrain all hydrogen atoms connected with covalent bonds (Ryckaert, Ciccotti & Berendsen, 1977). The particle-mesh of Ewald's summation method (York, Darden & Pedersen, 1993) was used to treat the long-range electrostatic interactions with 10 Å cut-off distance. Temperature and pressure were controlled by the Langevin dynamics (Welling & Teh, 2011) and Berendsen barostat (Berendsen et al., 1984), respectively. The time step of 2 fs was used. Two steps of 500-ps MD simulations were carried out on each system with a restrained position of the ligand/protein complex by a force constant of 5.0 and 2.0 kcal/mol Å 2 , respectively. Then, MD simulations of ligand/protein complexes without any restraint were continued up to 200 ns. The trajectories extracted every 10 ps were analyzed in terms of root mean square displacement (RMSD), number of contact atoms and number of H-bonds between protein and ligand with CPPTRAJ module (Roe & Cheatham III, 2013).
To calculate the binding affinity of the interaction between candidate compounds and 3CL pro , the Molecular Mechanics/Poisson-Boltzmann Surface Area (MM/PBSA) and the Molecular Mechanics/Generalized Born Surface Area (MM/GBSA) method were performed using 100 snapshots from the last 50 ns of the simulations. The internal dielectric constant was set to be 1. The external dielectric constant was set to 80. The surface tension was set to 0.0072 kcal/mol Å 2 . And the solvent probe radius was set to 0.14 Å. Moreover, the solvated interaction energy (SIE) approach was also adopted to evaluate the binding affinity of ligand toward 3CL pro (Naïm et al., 2007). The compounds with the same level or higher binding affinity against YHV 3CL pro compared to the known inhibitor NSC122819 were selected for investigation on intermolecular interactions with protein target, i.e., MM/GBSA per-residue decomposition free energy (Miller III et al., 2012), with cutoff at G bind = −1 kcal/mol. The top molecules with highest binding affinity were simulated again along with the known inhibitor to confirm repeatability of the simulation.

Virtual screening
The 3D structures of YHV and GAV 3CL pro were modeled using the CollabFold web application (Mirdita et al., 2022) with AlphaFold v.2.0 and MMseqs2 (Jumper et al., 2021;Mirdita et al., 2021) according to the amino acid sequences obtained from the NCBI database (codes: ACH99403.1 and YP_001661453.1). The two models of YHV 3CL pro , YHV_M1 and YHV_M2, and a model of GAV 3CL pro with the best model quality score are chosen and given in supplemental Fig. S1. The catalytic dyad H63 and C152 are identified in both YHV and GAV 3CL pro , as described in the previous study of GAV (Ziebuhr et al., 2003). GAV and YHV are closely related both genetically and morphologically (Chantanachookin et al., 1993;Cowley et al., 2000;Ziebuhr et al., 2003). The binding pockets of the YHV and GAV models are considered and compared in Fig. 1A. It can be seen that the binding pockets of YHV_M1, YHV_M2, and GAV 3CL pro models were well-aligned, and the amino acid sequence at the binding site was highly conserved (92.6% identity, Fig. S2). In Fig. 1B, the model YHV_M2 exhibited a higher z-score (11.8) to the GAV binding pocket than YHV_M1 (11.2). The 3CL pro amino sequences of GAV and YHV are highly conserved with 94% similarity (Fig. S2). Thus, the model YHV_M2 was chosen to be a representative model for YHV 3CL pro in the following studies. The previous homology based model of YHV 3CL pro suggested H30, D70, and C152 as 3 catalytic residues that form the catalytic triad of the protease (Unajak et al., 2014). On the contrary, our predicted model from AlphaFold2 indicated that the YHV 3CL pro employs H-C catalytic dyad for proteolytic mechanism which is in good agreement with the previous experimental data obtained from GAV 3CL pro (Ziebuhr et al., 2003). Considering the genetic similarity and mactching residue position of 3CL pro H63 and C152 in the catalytic dyad of both GAV and YHV 3CL pro (Fig. 1A), we identified H63 as one of the catalytic residues in our model instead of the previously proposed H30 (Unajak et al., 2014). The designation of H63 as part of the YHV 3CL pro catalytic dyad is in great accordance with the previous experimental reports (Ziebuhr et al., 2003).
Molecular docking with 100 runs was employed to generate the possible ligand binding configurations in the binding pocket of YHV and GAV 3CL pro . For each compound, the best configuration with lowest binding free energy ( G) was chosen to be ranked and compared with the known YHV inhibitor, NSC122819 (Unajak et al., 2014). The G of NSC122819 was of −8.0 kcal/mol in YHV 3CL pro and −6.7 kcal/mol in GAV 3CL pro . Among the 272 studied compounds (Fig. 1C), there were 111 compounds with G lower than that of NSC122819. In this study, only the top 15% consensus were selected as candidate compounds. These were six chalcones (cpd26, cpd27, cpd28, cpd31, cpd41, and cpd50), and three flavonoids (DN071_f, IP004, and TP034) with G values in a range of −9.0 to −10.1 kcal/mol for YHV 3CL pro , and −8.0 to −9.3 kcal/mol for GAV 3CL pro . The 2D structures of these nine compounds were depicted in supplemental Fig. S3, while their 2D interactions were shown in Figs. S4-S5. All screened compounds were stabilized by hydrophobic interactions with YHV 3CL pro residues. The chalcones cpd28, cpd41 and cpd50, and flavonoids IP004 and TP034, can form hydrogen bonding with the imidazole ring of H63, which is one of catalytic residues.
The candidates and the known inhibitor NSC122819 served as primers to generate decoy molecules using the Directory of Useful Decoys, Enhanced (DUD-E) sever (Mysinger et al., 2012). The 50 decoys with similar physicochemical properties were created for each compound. All compounds were separately docked into the binding pocket of the YHV 3CL pro model again with all decoys. The receiver operating characteristic (ROC) curve was constructed to validate the docking results (Empereur-Mot et al., 2015). In Fig. 1D, the receiver operating characteristic (ROC) curve showed a high true-positive rate over a false-positive rate, with the area under the curve (AUC) of 86.19%. This finding suggested that the molecular docking results were highly predictive and suitable for the viral proteases.

Drug-likeness prediction
The bioavailability of molecules is one of the most important properties to be considered as a drug. The physiochemical properties of the top nine compounds and NSC122819 were summarized in Table 1. All nine compounds were accepted by Lipinski's of five indicating good physicochemical properties and bioavailability of the molecules. In contrast, the known inhibitor NSC122819 was rejected by Lipinski's criteria with three breaking rules, which indicate that the molecule has poor bioavailability. BOILED-Egg model result in Fig. 2 shows that most candidates were placed in the yellow and white area, indicating that the molecules were susceptible to gastrointestinal absorption. The flavonoid TP034 and the known inhibitor NSC122819 in the grey area were predicted to be not absorbed in the gastrointestinal tract. Noticeably, the chalcone cpd27 could pass through the blood-brain barrier (yellow area) and be pumped out of the brain by activating permeability protein, PGP (Daina, Michielin & Zoete, 2017), as suggested by the blue dot.

MD study of YHV 3CL pro Complexes
The six chalcones and three flavonoids complexed with YHV 3CL pro were investigated by 200-ns MD simulations. The stability of system in term of RMSD was shown in Fig. 3A. The protein structures of most systems including apo protein showed a high RMSD fluctuation with a deviation of 1-2 Å at the early simulation period. The low fluctuation of RMSD values was found after 100 ns until the end of the simulation (<1 Å). The stability of the systems in the last 100 ns is also supported by the RMSD comparison of the MD trajectories in the 2D-RMSD plot (Fig. S6). The ligand-binding at the 3CL pro active site could reduce the motion of the catalytic dyad, governed by a lower RMSD value in complex in comparison to apo-form protein (green line in Fig. 3A). The number of atom contacts (#Atom contact) within 3.5 Å of the compound at the 3CL pro binding pocket and the distance between a closet heavy atom of ligand and the center of mass of catalytic dyad (d Ligand ) along with the simulation time were plotted in Figs. S7B-S7C. #Atom contact of each compound was in the same magnitude from the beginning until the end of the simulation, except for a drastically decreased #Atom contact at 170 ns in flavonoid IP004 (Fig. 3B). The drastically change in the interaction of IP004 system was also spotted in distance measurement between the molecule and the catalytic residues, as flavonoid IP004 moved away from the catalytic dyad at 170 ns (d Ligand lengthened from 7.70 to 15.49 Å, Fig. 3C). Result of chalcone cpd41 and known inhibitor NSC122819 also show great distance at the beginning of simulation (d Ligand = 14.5 Å and 12.17 Å, respectively). These results of the three molecules indicate their weak interaction to the catalytic dyad. However, the three molecules are still clinging to other residues inside the binding pocket which can still be considered as obstacle for substrate to bind at the active site. Other candidates were occupied in the 3CL pro active site for the whole period of simulation and located closer to the catalytic dyad (d Ligand <7 Å) than the known inhibitor NSC122819 (d Ligand = 12.17 Å). From the results in Fig. 3, the trajectories between 150 to 200 ns were considered the equilibrated stages and used for analysis in terms of structural dynamics and energetic data.

Ligand-protein hydrogen bonding
The hydrogen bond is undeniably one crucial factor determining compound binding strength toward the protein. The hydrogen bond formation between compound and 3CL pro was counted when the distance between hydrogen bond donor (HD) and acceptor (HA) was less than 3.5 Å, and the angle of HD_H. . . HA was larger than 120 • . The number of hydrogen bonding (#Hbond) along with the simulation and the hydrogen bond occupation >10% of selected simulation length (50 ns) were given in Fig. S7. Among the screened compounds, most chalcones formed at least a strong hydrogen bond (>70%) with the backbone of the binding pocket residues, i.e., A172 (cpd26), A180 (cpd41), N194 (cpd26 and cpd31), Y197 (cpd28 and cpd50). Instead, a moderately strong hydrogen bond was detected in the flavonoid DN071_f with the imidazole ring of a catalytic residue, H63 (60.2%). Noticeably, the number of hydrogen bonding formed with IP004 dramatically declined after 170 ns, which is a result of the same event shown in Figs. 3B-3C as a relatively weak interaction with A180 (29.5%). The chalcone cpd27 and flavonoid TP034 formed the very weak hydrogen bonds with the surrounding residues (<10%).  energy calculations was illustrated in the 3D scatter plot (Fig. 4A), which the compounds with better binding affinity were located toward the bottom left corner of the plot. It can be seen that the compounds were found to cluster into three groups corresponding to their predicted G bind values: (I) the chalcone cpd31 with the best binding energy, lowest G bind values, predicted by all methods; (II) the chalcones cpd26 and cpd50, and the flavonoid DN071_f with G MM/PBSA lower than −20 kcal/mol and G MM/GBSA lower than −25 kcal/mol; and (III) the other screened compounds as well as and the known YHV 3CL pro inhibitor NSC122819 with relatively higher G bind values, and locate toward the top-right corner of the plot. The compound present in group I and II were considered as best candidates. The simulation systems of the best four compounds, cpd31, cpd26, cpd50, and DN071_f, were repeated in duplicate runs to confirm the reliability of result. The RMSD result of the duplicated simulations show the similar pattern to the original simulation (Fig. S8A). Most of duplications also show the same pattern of hydrogen bonding, except for chalcone cpd26 which the duplicated result show lower number of hydrogen bond at last 50 ns of the simulation (Fig. S8B). Figure 4B illustrates the residue contribution for ligand binding, where the residues which exhibit G residue bind lower than −1 kcal/mol were identified as hotspot residues. All candidates from this study interacted with YHV 3CL pro better than the known inhibitor NSC122819 in accordance with the binding affinity results. The hotspot residues were mainly found in the C-terminal domain of 3CL pro (right hand site of the plot), i.e., there were seven to nine stabilized residues for the three chalcones (cpd26, cpd31, and cpd50) and four residues for the flavonoid (DN071_f). Among them, the two nonpolar residues, I168 and V169, and the polar residue Y199 served as the common hotspot residues response in stabilizing all candidates. The chalcones cpd31 and cpd50 and the flavonoid DN071_f also showed interactions with the two hotspot residues on the N-terminal domain, i.e., the catalytic dyad residue H63 and R46 or L49. Only the catalytic residue H63 served as the shared hotspot residue among the 3 compounds. In comparison to cpd26, which not interact with N-terminal cluster, the compound exhibits the weakest binding affinity among the best four candidates. In Fig. 4C, the vdW energic term with a negative value in almost all key residues implied that vdW force was the primary interaction for all candidate compounds, similar to the result in the previous study (Unajak et al., 2014). The vdW as a major interaction of flavonoids and chalcones toward the binding site of viral 3CL pro has also been reported in other studies of nidoviruses, especially coronaviruses such as SARs-CoV (Nguyen et al., 2012), SARs-CoV-2 (Das et al., 2021), andMERS-CoV (Jo et al., 2019). This was in concordance with the previous suggestion in the linkage between 3CL pro of invertebrate nidoviruse, including YHV and GAV, and coronavirus (Cowley et al., 2000;Ziebuhr et al., 2003). As mentioned earlier, most interactions of candidate compounds in this study are toward nonpolar residue on the C-terminal domain, with only a few interactions toward the catalytic residue on the N-terminal domain. This interaction pattern is similar to the interaction of antiviral compounds against 3C protease (3C pro ) of coxsackievirus and enterovirus (Sripattaraphan et al., 2022). Although coxsackievirus and enterovirus 3C pro employ a catalytic triad, while YHV 3CL pro uses a catalytic dyad active site, both proteases share several common characteristics since the viruses are identified as members of picornavirus-like supercluster (Kim et al., 2012). Although this study primarily focused on finding anti-viral agents against YHV 3CLpro, due to the fact that YHV and GAV are closely related and GAV 3CLpro was also included in the screening process, all potent candidate compounds present here are potentially provide inhibiting properties for both YHV and GAV 3CLpro.

Ligand binding affinity
Searching for YHV anti-viral agent posed many challenges including unavailable of both YHV and GAV viral protein structure data, from either X-ray crystallography, NMR, or cryo-EM experiments. Since YHV and GAV are only two members of their family, Roniviridae, make it difficult to find good reference structures or templates for structural modelling. Moreover, beside the in-silico study, in vitro experiments, such as enzyme-based assay, and in vivo experiments in real animals should be performed to gain better validation of candidates' potential and toxicity. However, the in vitro and in vivo data are not included in this study due to the time and cost constraints in the experiments including newly synthesizing the required amount of potent bioorganic compounds. Further investigation in potential of all candidates presented in this study in both in vitro and in vivo experiments are suggested. The in vitro experiment with the enzyme inhibition assay using potent compounds from this study will be included in our future study.

CONCLUSIONS
Computational approaches were used to identify potential compounds from in-house database containing chalcones and flavonoids with high predicted bioavailability against YHV 3CL pro . As a result, the four potent candidates of three chalcones, cpd26, cpd31, cpd50, and a flavonoid, DN071_f, showed the high binding affinity towards the targeted protease. Most of the interactions were found toward hydrophobic residues on the C-terminal domain of the binding pocket, while some compounds, cpd31, cpd50, and DN071_f, also have interactions directly toward a catalytic residue, H63, on the N-terminal domain. All top candidates with higher binding affinity than reported inhibitor NSC122819 are suggested to be tested by enzyme-based assay for further development as anti-YHV 3CL pro inhibitor.