AI-driven drug repurposing and binding pose meta dynamics identifies novel targets for monkeypox virus

Monkeypox virus (MPXV) was confirmed in May 2022 and designated a global health emergency by WHO in July 2022. MPX virions are big, enclosed, brick-shaped, and contain a linear, double-stranded DNA genome as well as enzymes. MPXV particles bind to the host cell membrane via a variety of viral-host protein interactions. As a result, the wrapped structure is a potential therapeutic target. DeepRepurpose, an artificial intelligence-based compound-viral proteins interaction framework, was used via a transfer learning setting to prioritize a set of FDA approved and investigational drugs which can potentially inhibit MPXV viral proteins. To filter and narrow down the lead compounds from curated collections of pharmaceutical compounds, we used a rigorous computational framework that included homology modeling, molecular docking, dynamic simulations, binding free energy calculations, and binding pose metadynamics. We identified Elvitegravir as a potential inhibitor of MPXV virus using our comprehensive pipeline.


Introduction
Monkeypox virus (MPXV), an Orthopoxvirus genus member [1], is a zoonotic virus discovered in 1958 amid outbreaks of a pox-like illness in macaque monkeys held at a Danish research institution and termed as "monkeypox" [2].The first human case was found in 1970 under enhanced SPX monitoring in the Democratic Republic of the Congo (DRC) [3].Monkeypox, double-stranded DNA virus, is a member of the Orthopoxvirus genus in the Poxviridae family, and poxviruses have a strong inclination to spread outside of their normal ecological area by infecting a naïve group [4][5][6].Monkeypox is clinically almost identical to conventional Variola (SPX) and Vaccinia, smallpox viruses [7]; as a result, since the universal abolition of smallpox in 1977, much emphasis has been focused on monkeypox as a smallpox-like illness with universal implications [8].
Initially macular, the lesions progress to papules, vesicles, pustules, and, lastly, crusts.Up to 11% of unvaccinated affected people die [9,10].Transmission occurs by respiratory droplets or direct contact with lesion exudate; however, there is evidence that infection can occur via bite or scratch [11,12].There are presently no medications approved to treat monkeypox (MPX), and while the smallpox vaccination can give protection, its usage is limited due to safety concerns with a live viral vaccine.As a result, MPX prevention is dependent on limiting human contact with infected wild animals and avoiding viral transmission from person to person [13,14].Despite this, there are over a thousand confirmed monkeypox cases in the United States, the United Kingdom, and several other European nations as of June 4, 2022 [15,16].Monkeypox symptoms include flu-like symptoms, fever, lethargy, back pain, headache, and a unique rash [17,18].The globe is divided into two clades: Central African clade and West African clade.The Central African clade had a case fatality rate of 10.6%, whereas the West African clade had a rate of 3.6% [19].
Monkeypox is an enveloped double-stranded deoxyribonucleic acid (dsDNA) virus that replicates exclusively in the host cell cytoplasm and belongs to the Orthopoxvirus genus of the Poxviridae family [20].F13 encodes a VP37 envelope protein [21] that is essential for the formation of enveloped virions (EVs), which include internal EVs (IEVs), cell-associated EVs (CEVs), and external EVs (EEVs) [22].F13 has a phospholipase motif that is required for the envelopment of intracellular mature virions (IMVs) into virusmodified membranes to produce egress-competent IEVs [23].IEVs are carried to the cell surface by microtubules after acquiring extra membranes, where their outer membrane fuses with the plasma membrane, exposing CEVs.CEVs remain attached to the cell surface, push into nearby cells through actin tails that grow beneath them, or are released into the extracellular environment as EEVs [24].As a result, the prevailing opinion is that CEVs mediate virus propagation from cell to cell and that EEVs facilitate virus systemic dispersion inside a host [25].Therefore, we had chosen F13 envelope protein for the study.Even though MPXV is a DNA virus, it spends its entire life cycle in the cytoplasm of infected cells [26][27][28].All the proteins required for viral DNA replication, transcription, virion assembly, and nuclear egress are encoded by the MPXV genome [29,30].Until 2019, there are no licensed vaccinations or medicines to treat the human monkeypox virus.Dryvax, a smallpox vaccine, has been used to treat both smallpox and monkeypox [31,32].
Tecovirimat is a drug identified through molecular docking by [33,34] for the treatment of monkeypox and has widely been used for treatment of smallpox disease [33].Recently Dassanayake et al., 2022, [35], the authors screened for six antimicrobial peptides, two phytochemicals and two bacterial metabolites to test for antiviral properties against multiple viruses including monkeypox virus using molecular docking and identified Glycocin F as the only natural biomolecule with high binding affinity to viral surface proteins.In [36], the authors undertake a drug repurposing approach to identify potential inhibitors for vital monkeypox viral proteins using molecular docking and molecular dynamics simulations.The authors [36] identified Tipranavir and Dolutegravir as potential top inhibitors.Furthemore, the authors in [37] use a network pharmacology approach followed by molecular docking to identify Shengma-Gegen decoction (SMGGD) as a potential inhibitor of monkeypox virus through the MAPK signalling pathway.
In this work, we propose a multi-step in-silico pipeline to identify potential inhibitors targeting the Monkeypox envelope protein.We tested 1483 FDA approved and investigational drugs using the DeepRepurpose framework (including 1482 drugs available in DeepRepurpose dataset and Tecoviramt) to identify top compounds which potentially inhibit MPX viral proteins available in Uniprot.We focused on the monkeypox envelope protein F13 (Uniprot id: Q5IXY0) as it is point of entry for infected human cells.Since the structure of any of the viral proteins relevant to monkeypox were not available, we used Alphafold to obtain the predicted structure of the envelope protein.We then created a homology model and tested its docking, molecular dynamics (MD) and binding pose metadynamics [38].This in silico pipeline resulted in the identification of the most promising and efficient compounds as a starting point for the development of prospective drugs.The docking and MD results verified each other and revealed a strong connection with the experimental data.We also identified the primary interactions involved in molecular recognition and provided suggestions for the development of efficient and selective therapeutic agents [39].

Artificial Intelligence based DeepRepurpose framework
We used the DeepRepurpose framework proposed in [40] as the platform for transfer learning.The DeepRepurpose framework builds an ensemble of various machine learning models on top of vector representations of the compounds and the viral proteins for estimating compound-viral protein activities.It uses a vector representations such as molecular fingerprints (MFP) or a teacherforcing long short-term memory neural network (TF-LSTM) [41]based embedding vector for compounds concatenated with a convolutional neural network (CNN) [42]-based embedding vector representation for viral proteins [43][44][45][46].This concatenated vector is then passed to state-of-the-art machine learning models such as XGBoost [47] and Support Vector Machines [48,49] to build predictive models.Additionally, the DeepRepurpose framework uses sequence-based representation for compounds (canonical SMILES strings) and viral protein amino-acid sequence representation (primary structure representation) which is provided to end-to-end deep learning models based on CNNs and graph attention networks (GAT) [50].The DeepRepurpose framework aggregates the activity scores predicted by each of these models by taking an average of the activity values resulting in the final predicted activity for a given compound-viral protein pair.
DeepRepurpose was built using > 60k interactions between > 50k compounds and ∼100 viral organisms with the goal of prioritizing compounds (repurposing) for future pandemic-causing viruses.Moreover, in [40], the authors had collected approximately 12,000 clinical-stage or Food and Drug Administration (FDA)-approved small molecules screened for bioactivities against SARS-COV-2 virus in Vero E6 cells.Two thousand, four hundred eighty-three of these compounds were publicly available in a library named Re-FRAME (https://reframedb.org/).In the DeepRepurpose framework, these compounds were then filtered based on the length of their SMILES sequences resulting in a set (S) comprising of 1482 compounds including known antivirals, antibiotics, anticancer and other human investigational compounds [40].

Homology modeling
The lack of a resolved crystalized structure spurred the development of monkeypox viral models based on a comparison of primary sequences with the closest accessible known structure.UniprotKB [51] entry Q5IXY0 was used to obtain the primary sequence of the monkeypox envelope protein F13 (Fig. 2).Q5IXY0's FASTA sequence was uploaded to ModWeb: A Server for Protein Structure Modeling [52], Alphafold2 [53] and Robetta server [54].AlphaFold's inference pipeline and source code are open source and accessible at https://github.com/deepmind/alphafold.Then, multiple sequence alignment was performed using UniRef90, MGnify and small BFD databases.Based on the ModWeb analysis, the template with 7E0M, chain A, with 1.79 Å resolution was selected with psiblast with the best scoring model [55].The predicted model showed the similarity of 1.08 Å with the reference structure and the e-value of 2. Chimera and AlphaFold were used to determine the predicted local distance difference test (pLDDT) of predicted structures, accordingly.Chimera was employed to retrieve these modelled proteins from AlphaFold and assessed their stereochemistry for structural validation using the SAVES v6.0 of Procheck server [56].
The distribution of amino acid residues derived from Ramachandran plots revealed that 89.9% of the areas were in the most preferred region and 9.2% were in the additional permitted region (Figure S1).This modeled structure was minimized using molecular dynamics simulation for 100 ns for the stabilization.

Molecular docking
The homology modelled protein structure was considered as a receptor and was prepared using the protein preparation wizard of Maestro module of Schrodinger software [57].It included energy minimization through Optimized Potentials for Liquid Simulations (OPLS) 2005 forcefield [58].The 1483 AI-based ligands were retrieved from PubChem database (https://pubchem.ncbi.nlm.nih.gov/ ) and prepared for docking by adding charges and hydrogen.These structures were energy minimized and aligned using OPLS 2005 forcefield and greedy algorithm.The homology modeled structure was also prepared using protein preparation wizard of the Schrödinger Glide that includes the addition and removal of the hydrogens using preprocess, and optimization was done through refinement approach aiding the 2005 forcefield [59].The grid box was built using the pockets detected by sitemap [60] which indicate highest druggability.Therefore, we performed site directed docking.The grid box size was 10 Å × 10 Å × 10 Å.Molecular docking of 1483 AI-based ligands with homology modeled protein was accomplished with increased accuracy utilizing the Glide (grid-based ligand docking with extra precision) software included in the Schrödinger molecular modeling package [61].The optimal posture for each proteinligand interaction was determined using the Discovery Studio (DS) visualizer (Accelrys, San Diego, USA).The ligands with the highest binding energy in the interaction profile were chosen for molecular dynamics simulations [62].

Molecular dynamics simulations
Molecular dynamics (MD) simulations were used to assess the stability of the resultant complexes and the capacity of ligands to bind [63].In addition to molecular docking, this computational method enables for the investigation of substantial conformational alterations as well as the stabilization of protein-ligand complex [64,65].Furthermore, it provides a way to determine the binding affinities of the resulting complexes based on the physical motions of atoms and molecules.MD simulations were carried out using the Desmond (Schrödinger Release 2019-3) package [66,67].For the MD simulation investigation, the docked combination of protein and the top ranked compound identified from molecular docking and was employed for 1 μs time.All receptor-ligand complexes were structurally compliant until they were processed with the protein preparation wizard in the Schrodinger interface [68].Inserting hydrogens, assigning bonds, and completing incomplete amino acid side chains and loops using hydrogen bond allocation optimization and water orientation sampling (pH 7.0).The periodic simulation box was created using the Machine Builder module with TIP3P (transferable intermolecular potential with three. points) water model and dissolved using OPLS all-atom force field and 1000 iterations of the steepest descent approach for system reduction [69].
After acquiring the equilibrium, the NPT (atoms, pressure, and temperature were kept constant) assembly underwent an uncontrolled development procedure for 1 μs at 300 K and 1.01325 bar.The Nosé -Hoover thermostat and the isotropic Martyna-Tobias-Klein barostat were utilized [70,71].Short-range (cut-off = 9) and long-range Columb interactions were measured using the smooth particle mesh Ewald (PME) system (PME) [72] using a smooth particle network and the RESPA integrator [73].The system relaxation summary incorporates 1 minimization and 5 equilibration phases with NVT and NPT operations with 1 µs long time period and 5 ps frequency was used to save the configurations.After the simulation, the system's stability was evaluated using a histogram for root mean  square deviation (RMSD), root mean square fluctuations (RMSF), hydrogen bond analysis, radius of gyration (Rg), and solvent accessible surface area (SASA).Molecular mechanics generalized Born surface area (MM/GBSA) and the single trajectory technique were used to compute the binding free energy [74].

Binding pose metadynamics (BPMD)
BPMD, as implemented in Maestro v.2018.4, is a metadynamics simulation variant in which 10 separate metadynamics simulations of 10 ns are run, using CV as the measure of the ligand heavy atoms' root-mean-square deviation (rmsd) relative to their beginning location.BPMD simulation is a popular improved sampling approach for sampling free energy landscapes [75,76].For all receptor-ligand complexes binding simulation, biasing collective variables were specified as the distance between the ligand molecule's center of mass and the ligand-binding residues as collective variables (Table 3).According to the RMSD computation, protein residues within 3 Å of the ligands were chosen for alignment.Before computing the heavy atom RMSD to the ligand conformation in the initial frame, the Cαs of these binding site residues were linked with those in the initial frame of the binding pose metadynamics trajectory.The hill width and height were set to 0.05 kcal/mol and 0.02 kcal/mol, respectively.Before running the metadynamics, the system was solvated in a box of TIP3P molecules, which was followed by several minimization and restrained MD steps that allowed the system to gradually reach the desired temperature of 300 K while also generating any poor connections and/or strain in the actual starting structure.The essential notion of BPMD was that when subjected to the same biasing force, ligands that were not firmly attached to the receptor would have a greater variation in their RMSD than those that are [75].BPMD provides three scores that were connected to the ligand's stability during the metadynamics simulations: (1) PoseScore, which was the average RMSD from the initial pose.A rapid increase in the PoseScore indicated that the ligand was not at a well-defined energy minimum and so has not been correctly predicted [77].(2) PersistenceScore (PersScore) was a measure of hydrogen bond persistence calculated in the last 2 ns of the simulation that contain the same hydrogen bonds as the original structure, averaged across all 10 repeat simulations.Low PersScore was seen in structures whose interaction network is impaired by the BPMD bias.It varies from 0 to 1, with 0 indicating that the ligand had no interactions with the protein at the start and 1 indicating that the interactions between the initial ligand binding mode and the last two ns of the simulations were fully retained [78,79].

Ligand identification using DeepRepurpose framework
We screened 1483 drugs available in the DeepRepurpose framework for each of 11 viral proteins of MPXV to estimate compound-viral protein activities.DeepRepurpose used end-to-end deep learning models with compound SMILES representation and amino acid sequence of viral proteins as input and generates an activity score as input.Moreover, DeepRepurpose also used vector representations for compounds and viral proteins as input to generate an activity score.The final output of DeepRepurpose was an ensemble of 5 different machine learning models and was the average of the activity scores of these models.The top 20 compounds with the highest average activity across all the 11 viral proteins were highlighted in Supp Table 1.These comprised predominantly of antiviral drugs including Elvitegravir, MK-4965, Triciribine phosphate, Lopinavir and Filociclovir.These drugs were further tested through molecular docking to filter and prioritize the optimal set of drugs against MPXV envelope protein.

Homology modeling
UniprotKB entry Q5IXY0 with 372 amino acids length of monkeypox envelope protein F13 was passed to AlphaFold, Modweb and Robetta servers to get predicted 3D protein structure.We identified that AlphaFold had more accurate predicted structure than other platforms.Modweb and Robetta servers provided structures with low scores in the identification of the template and unreliable z-DOPE score (normalized Discrete Optimized Protein Energy).AlphaFold was employed to compute the predicted Local Distance Difference Test (pLDDT), which indicated the domain accuracy.A pLDDT score greater than 70 suggested high-quality structure.Fig. 3 depicts the mean pLDDT values of predicted protein structure and we can observe that the pLDDT scores were all above 80% for all residues.The pLDDT results showed that the predicted structures were highly accurate.Overall, our findings support AlphaFold's excellent accuracy and dependability in predicting the structure of the envelope protein of the monkeypox virus.

Molecular docking
The homology modelled protein structure was docked with 1483 pharmaceutical compounds.The top five compounds were identified as potential hits for inhibiting the monkeypox viral envelope protein F13 (Fig. 4).According to the binding energy ranking, Elvitegravir has the best binding energy of − 5.592 kcal/mol and had 11 H-bonds (with Phe 52, Leu 117, Asn 132, Ser 134, Asn 311, Lys 313, Asn 329 and His 334) (Figure S2).While MK-4965, Triciribine phosphate, Lopinavir and Filociclovir were described in Figure S3 to Figure S6 with the respective interactions.Table 1 shows the top-ranked compounds with their docking score, XP GScore, glide gscore, glide energy, glide emodel scores and glide energy with the homology modelled protein structure.The electrostatic interactions played a pivotal role in the binding of the monkeypox envelope protein.We have also considered Tecovirimat for docking studies which came in at a very low position with − 2.741 kcal/mol score.(Fig.5,6)

Molecular dynamics (MD) simulations of homology modelled receptor and Elvitegravir
The top-ranked five complexes from the molecular docking experiment were utilized to assess the conformational stability and time-dependent binding capabilities of AI-derived compounds in the homology modelled receptor's catalytic pocket employing molecular dynamic simulations.The simulation was run for a time-period of 1 µs using the Schrodinger Desmond program.To investigate the deviations of ligand binding from its original reference (docked) position, the RMSD measure was used for the conformations acquired from the simulated trajectories.The C-RMSD profile for all simulated complexes was shown in Figure S7.All drug candidates display the RMSD values for homology modelled receptor were in the 5 Å range except lopinavir drug.The RMSF assessment metric identifies atoms and amino acid residues that exhibit significant variations during the simulation duration.The RMSF evaluation    The radius of gyration (rGyr) of a ligand assessed its 'compactness' and was comparable to its crucial portrayal of stability throughout simulation time.Solvent accessible surface area (SASA) was a measure used to investigate the accessibility of solvent molecules (typically water) to the protein-ligand complex; lower SASA values represent fewer exposure to the soluble environment and the ability of the complex to maintain the hydrophobic core, thereby increasing the overall stability of the complex.Figure S9 and Figure S10 depicted rGyr and SASA properties of all simulated complexes.Among all Elvitegravir, triciribine phosphate and Filociclovir drugs in the complex with homology modelled receptor possess less variations while MK-4965 and Lopinavir depicts high variations.Figure S11 to Figure S15 shows the intermolecular interactions of all simulated complexes with the homology modelled receptor.In the simulated complexes, Filociclovir drug produced maximum number of 27 hydrogen bonds and 6 ionic bonds.Whereas Lopinavir made 24 hydrophobic interactions that determined high accountability.Using the MM/GBSA method, total free energy of binding of the top-five drugs was calculated.We have incorporated the interaction profiles of MD simulations at every 100 ns time trajectories to understand the binding mechanism.

Binding pose metadynamics (BPMD)
The trajectories of all simulated complexes were evaluated to investigate the binding kinetics of the receptor.The metadynamics simulations were carried out using the collective variable listed in Table 3.The PoseScore, which was the RMSD of the ligand with respect to the initial ligand heavy atoms coordinates, was used to determine posture stability.A PoseScore of 2 was seen as steady.Furthermore, the PersScore, which indicated the strength of the hydrogen bonds established between the ligand and the protein residues, was used to examine the data.It was deemed an indication of strong persistence if 60% of the total hydrogen bonds were preserved during the simulations.The homology modelled receptor and all simulated complexes were employed to BPMD protocol.The Po-seScore for Elvitegravir was 1.6 Å with lower RMSD range which suggested that it managed to generate strong and consistent hydrogen bond network during the simulations.However, MK-4965 and Lopinavir notified with 2.1 Å and 3.2 Å, respectively.Filociclovir listed with 4 Å and Triciribine phosphate occupied 6.4 Å PoseScore which were considered as a higher RMSD range.The PersScore was also identified with the value of 0.7 Å which was very significant for the ligand binding stability and lower structural variability.Overall, BPMD showed strong capacity to detect Elvitegravir and additional investigation of the findings gave some insights into conditions that might be hard for the approach.Rest of the drugs possessed the PersScore under the 0.2 Å (Table 4).As a result, the reference structure for BPMD was not the input structure, but the equilibrated one, which might differ dramatically in some circumstances.Given this discovery, the PoseScore was analyzed to the RMSD with reference equilibrated structure, which was referred to as the MD-RMSD.The goal was to see if unstable postures might be detected ahead of time by the MD equilibration technique.To discover the best approach for balancing accuracy and computational efficiency, a

Conclusion
The lack of a suitable medical therapy or vaccine has indeed worsened the situation of a MPXV.Targeting envelope receptor was a key strategy for designing effective inhibitors powered by computer-aided drug design strategies at the first stage of viral invasion as the disease progression is triggered by major contacts between human membrane and envelope proteins.Using a dynamic framework consisting of AI-driven screening, molecular dynamic modeling, free energy binding measurements and binding pose dynamics, this study optimized lead compounds against the homology modelled F13 envelope protein of monkeypox virus.Elvitegravir was found as potential drug candidate with a higher binding affinity score.Both docked complexes were subjected to 1 μs molecular dynamic simulations, demonstrating the structural stability of protein-ligand complexes.Various measuring techniques, such as RMSD, RMSF, gyration radius, and surface area measurements contributed to the complex's structural integrity.The binding free energy of the Elvitegravir was also measured using MM/GBSA tests, showing that associations with adjacent hydrophobic binding sites of the ligand-binding assist enhanced the binding free energy, while retaining key receptor interactions.
Our in silico pipeline provides a computer-aided drug screening approach by reducing the search space through DeepRepurpose followed by homology modelling, molecular docking, molecular dynamics simulations and binding pose metadynamics to identify the optimal set of drugs for MPXV.By using our pipeline, we narrowed down the compound search space to identify, Elvitegravir as a potential compound against MPXV whose efficacy should further be validated through in-vitro and in-vivo experiments.

Fig. 5 .
Fig.5.Combined docked posture of top-ranked drugs with Homology modelled structure from the Schrodinger Maestro module's Glide XP docking.

C
.N.P. and R.M. conceptualized and designed the project.C.N.P. and R.M. developed methodology.C.N.P. and R.M. acquired data.C.N.P. analyzed and interpreted data.C.N.P., R.M. and H.B wrote manuscript.H.B. provided executive support.C.N.P., R.M. and H.B. supervised the study.The whole manuscript was approved by all authors.

Table 1
Docking score and glide binding profiles of the top-21 hits obtained on performing molecular docking.

Table 2
demonstrations the binding free energy calculation observed for all simulated complexes.Elvitegravir secured best binding energy of − 45.32 kcal/mol while Lopinavir ranked at last position with the binding energy of − 18.99 kcal/mol.This finding demonstrates that Elvitegravir created improved crystal connections as well as new contacts, obtaining the top rank energy between all identified compounds.Elvitegravir aids to the development of a novel ligand binding theory.In 4-oxo-1,4-dihydropyridine pocket, π-stack with Trp 279 residues and π-alkyl or broadly hydrophobic interactions.It was discovered that hydrogen bonding with Ala 134, Ser 135, Ser 140, and Asn 312 was helpful.MK-4965 came in second with a binding energy of − 39.14 kcal/mol owing to novel contacts formed during docking and dynamics simulations.Filociclovir and Triciribine phosphate came in third and fourth place, respectively.According to MM/GBSA score, lopinavir has the maximum binding energy of − 18.99 kcal/mol.

Table 2
MM/GBSA profiles of top-ranked compounds while interacting with the homology modelled protein.

Table 3
Selected collective variables from ligand-binding residues for Binding pose metadynamics (BPMD) analysis.

Table 4
Binding Pose Metadynamics analysis of top ranked drugs.Patel, R. Mall and H. Bensmail Journal of Infection and Public Health 16 (2023) 799-807 detailed examination of MD versus BPMD in detecting dubious crystal structures is necessary.