Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target interaction deep learning model

Highlights
• The MT-DTI deep learning model was used to identify potent drugs for SARS-CoV-2.
• Atazanavir, remdesivir, and Kaletra were predicted to inhibit SARS-CoV-2.
• Rapamycin and tiotropium bromide may also be effective for SARS-CoV-2.


Introduction
Coronaviruses (CoVs), belonging to the family Coronaviridae, are enveloped, positive-sense RNA viruses that cause infections in birds, mammals, and humans [1][2][3]. The family includes four genera: Alphacoronavirus, Betacoronavirus, Deltacoronavirus, and Gammacoronavirus [4]. Two infamous coronaviruses in the genus Betacoronavirus are severe acute respiratory syndrome coronavirus (SARS-CoV) [5] and Middle East respiratory syndrome coronavirus (MERS-CoV) [6], which together have infected more than 10,000 people worldwide over the past two decades. These outbreaks were accompanied by high mortality rates (9.6% for SARS-CoV and 34.4% for MERS-CoV), indicating an urgent need for effective treatments at the beginning of an outbreak to prevent its spread [7,8]. However, this cannot be achieved with current drug development and approval systems, in which it takes several years for a newly developed drug to reach the market. Unexpectedly, the world is again facing the same situation because of a recent epidemic of atypical pneumonia (designated coronavirus disease 2019; COVID-19) caused by a novel coronavirus (severe acute respiratory syndrome coronavirus 2; SARS-CoV-2) that emerged in Wuhan, China [5,9].
SARS-CoV-2, which belongs to the genus Betacoronavirus, has a positive-sense single-stranded RNA [(+)ssRNA] genome (29,903 nt) containing genes that encode the 3C-like proteinase, RNA-dependent RNA polymerase (RdRp), 2′-O-ribose methyltransferase, spike protein, envelope protein, nucleocapsid phosphoprotein, and several proteins of unknown function, according to the SARS-CoV-2 genome sequencing data (https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/). Typical clinical symptoms of COVID-19 are fever, dry cough, and fatigue, which appear after an average latency of 3-7 days following infection. This progression is relatively slower than that of severe acute respiratory syndrome (SARS), which was caused by SARS-CoV [10]. During the coronavirus life cycle, after entering the host cell, the virus replicates through the following processes: 1) translation of the genomic RNA (gRNA), 2) proteolysis of the translated polyprotein by the viral 3C-like proteinase, 3) replication of the gRNA by the viral replication complex, which consists of RNA-dependent RNA polymerase (RdRp), helicase, 3′-to-5′ exonuclease, endoRNAse, and 2′-O-ribose methyltransferase, and 4) assembly of viral components [11]. These replication-associated proteins are the primary targets of post-entry drugs aimed at suppressing viral replication. Although intensive efforts are being made worldwide to develop drugs and vaccines for SARS-CoV-2, patients currently suffering from COVID-19 cannot expect to benefit from them because novel drugs and vaccines take a long time to develop. Thus, a rapid drug application strategy that can be applied to patients immediately is needed. Currently, the only way to address this is to repurpose commercially available drugs against the pathogen, a strategy known as "drug repurposing." In principle, artificial intelligence (AI)-based architectures should be considered for accurately predicting drug-target interactions (DTIs), because molecular interactions involve an enormous amount of complex information (e.g., hydrophobic interactions, ionic interactions, hydrogen bonding, and/or van der Waals forces). To this end, we previously developed a deep learning-based DTI prediction model called Molecule Transformer-Drug Target Interaction (MT-DTI) [12].
In this study, we applied our pre-trained MT-DTI model to identify commercially available antiviral drugs that could potentially disrupt SARS-CoV-2's viral components, such as the proteinase, RNA-dependent RNA polymerase, and/or helicase. Because the model takes simplified molecular-input line-entry system (SMILES) strings and amino acid (AA) sequences, both of which are 1D string inputs, it can be quickly applied to target proteins that lack experimentally determined 3D crystal structures, such as the viral proteins of SARS-CoV-2. We share a list of top-ranked commercially available antiviral drugs that could potentially hinder the replication cycle of SARS-CoV-2, in the hope that effective drugs against SARS-CoV-2 can be developed based on these AI-proposed candidates.

Prediction of drug-target interactions using binding affinity scores
Molecule Transformer-Drug Target Interaction (MT-DTI) was used to predict binding affinity values between commercially available antiviral drugs and target proteins. MT-DTI is based on the self-attention mechanism, which has shown remarkable success in the natural language processing (NLP) literature. MT-DTI is inspired by the idea that, for a chemist, understanding a molecular sequence is analogous to understanding a language. To apply the NLP model to drug-target interaction (DTI) tasks, MT-DTI is pre-trained on the 'chemical language' (represented as SMILES) of approximately 1,000,000,000 compounds. Just as NLP models extract complex patterns from word sequences, MT-DTI extracts useful information for DTI tasks. In a previous study, it showed the best performance and the most robust results across diverse DTI datasets [12].
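The self-attention idea can be illustrated with a minimal, hypothetical sketch: a SMILES string is split into tokens, each token is mapped to a vector, and one head of scaled dot-product self-attention mixes each token's vector with those of all other tokens, weighted by their similarity. The regular-expression tokenizer, the random embedding table, `D_MODEL`, and the identity query/key/value projections are simplifying assumptions for illustration only; they are not MT-DTI's actual tokenizer, weights, or architecture, which are described in [12].

```python
import math
import random
import re

# Hypothetical SMILES tokenizer (an assumption, not MT-DTI's own): bracket atoms,
# two-letter halogens, two-digit ring closures, single atoms, bonds, branches,
# and ring-closure digits each become one token.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFI]|[bcnosp]|[=#\-+()/\\.@:~*$]|\d)"
)
D_MODEL = 8  # hypothetical embedding size


def tokenize(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # The tokens must reassemble into the input; otherwise a character was skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def embed(tokens, seed=0):
    # Random, untrained embeddings; a real model learns these during pre-training.
    rng = random.Random(seed)
    table = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]
    return [table[tok] for tok in tokens]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """One head of scaled dot-product self-attention.

    Simplification: queries, keys, and values are the inputs themselves
    (identity projections) instead of learned linear maps.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, used only as an example input
contextual = self_attention(embed(tokens))
```

Because each output row is a softmax-weighted average of all input rows, every token's representation can draw on context from the whole molecule rather than only its neighbors.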
To train the model, the Drug Target Commons (DTC) database [13] and the BindingDB database [14] were manually curated and combined. Three types of bioactivity values (Ki, Kd, and IC50) were integrated using a consistence-score-based averaging algorithm [15] so that the Pearson correlation among Ki, Kd, and IC50 exceeded 0.9. Because BindingDB covers a wide variety of species and target proteins, the MT-DTI model has the potential to predict interactions between antiviral drugs and SARS-CoV-2 proteins.
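Ki, Kd, and IC50 values span several orders of magnitude, so affinity data are commonly compared on a negative log10 molar scale (for example, pKd). The sketch below converts nM values to that scale and averages all measurements for the same drug-target pair. This naive averaging is a hypothetical stand-in used only to illustrate the data layout; it is not the consistence-score-based algorithm of [15], and the record format is an assumption.

```python
import math
from collections import defaultdict


def to_p_scale(value_nm):
    """Convert an affinity in nM (Ki, Kd, or IC50) to -log10 of the molar value."""
    if value_nm <= 0:
        raise ValueError("affinity must be positive")
    # Equivalent to -log10(value_nm * 1e-9), without the extra floating-point multiply.
    return 9.0 - math.log10(value_nm)


def naive_merge(records):
    """Average p-scale values per (drug, target) pair.

    Hypothetical stand-in for illustration; NOT the consistence-score-based
    algorithm of [15]. Each record is (drug, target, measure, value_nm),
    where measure is "Ki", "Kd", or "IC50".
    """
    groups = defaultdict(list)
    for drug, target, _measure, value_nm in records:
        groups[(drug, target)].append(to_p_scale(value_nm))
    return {pair: sum(vals) / len(vals) for pair, vals in groups.items()}


records = [
    ("drug_a", "target_x", "Ki", 100.0),  # toy values, not data from this study
    ("drug_a", "target_x", "Kd", 10.0),
    ("drug_b", "target_x", "IC50", 1000.0),
]
merged = naive_merge(records)
```

On this scale, the Kd < 1000 nM screening cutoff used in this study corresponds to a value above 6, and each 10-fold change in nM shifts the value by exactly one unit.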
After the MT-DTI prediction, the raw prediction results were screened for antiviral drugs that are FDA approved, target viral proteins, and have a predicted Kd value < 1000 nM. SMILES containing salt forms were excluded from the final results because the prediction focuses on pairs of a single molecule and a target protein. In addition, remdesivir was incorporated into the analysis because its therapeutic potential against COVID-19 was recently suggested by Wang et al. [16] and by Gilead Sciences announcements (https://www.gilead.com/purpose/advancing-global-health/covid-19).
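The screening step described above amounts to a filter over the raw prediction table. The sketch assumes a hypothetical `Prediction` record with boolean annotations for FDA approval and viral-protein targeting, treats any SMILES containing a `.` (disconnected components such as counter-ions) as a salt form, and exempts remdesivir only from the FDA-approval requirement. These rules are illustrative assumptions, not the study's code.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    drug: str
    smiles: str
    target: str
    kd_nm: float                 # MT-DTI predicted Kd in nM
    fda_approved: bool           # hypothetical annotation field
    targets_viral_protein: bool  # hypothetical annotation field


def is_salt_form(smiles):
    # Assumption: '.' separates disconnected components (e.g., a counter-ion),
    # so any multi-component SMILES is treated as a salt form.
    return "." in smiles


def screen(preds, kd_cutoff_nm=1000.0, exempt_from_fda=("remdesivir",)):
    """Keep single-molecule, viral-protein-targeting drugs below the Kd cutoff.

    Drugs listed in exempt_from_fda skip only the FDA-approval check (assumption).
    """
    kept = []
    for p in preds:
        if is_salt_form(p.smiles):
            continue
        approved = p.fda_approved or p.drug.lower() in exempt_from_fda
        if approved and p.targets_viral_protein and p.kd_nm < kd_cutoff_nm:
            kept.append(p)
    return sorted(kept, key=lambda p: p.kd_nm)  # strongest predicted binders first
```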

Prediction of drug-target interactions using AutoDock Vina
AutoDock Vina (version 1.1.2), a molecular docking and virtual screening application [17], was used to predict binding affinities (kcal/mol) between the 3C-like proteinase of SARS-CoV-2 and 3,410 FDA-approved drugs. The SMILES of the 3,410 FDA-approved drugs were converted to PDBQT format using Open Babel (version 2.3.2) [18] with the options --gen3d and -p 7.4. Hydrogens were added to the 3C-like proteinase model using MGLTools (version 1.5.6) [19]. Binding affinities between the protein and the FDA-approved drugs were then calculated using AutoDock Vina, with the exhaustiveness parameter set to 10.
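The docking workflow can be scripted roughly as follows. The Open Babel options (`--gen3d`, `-p 7.4`) and the exhaustiveness of 10 come from the text; the search-box center and size, the file names, and the assumption that the receptor PDBQT file was already prepared with MGLTools are hypothetical, since the box used in this study is not stated. The parser assumes the results table printed by AutoDock Vina 1.1.2, in which the first row after the `-----+` separator is the best-scoring pose.

```python
import os
import subprocess


def obabel_cmd(smiles, out_pdbqt):
    # "-:" passes the SMILES string directly; --gen3d builds 3D coordinates and
    # -p 7.4 adds hydrogens appropriate for pH 7.4 (options stated in the text).
    return ["obabel", f"-:{smiles}", "-O", out_pdbqt, "--gen3d", "-p", "7.4"]


def vina_cmd(receptor, ligand, center, size, out_pdbqt, exhaustiveness=10):
    # center and size define the search box in angstroms; the box used in the
    # study is not reported, so callers must supply their own (assumption).
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        "vina", "--receptor", receptor, "--ligand", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--exhaustiveness", str(exhaustiveness), "--out", out_pdbqt,
    ]


def parse_best_affinity(vina_stdout):
    """Return the affinity (kcal/mol) of mode 1 from Vina's printed results table."""
    lines = vina_stdout.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("-----+") and i + 1 < len(lines):
            return float(lines[i + 1].split()[1])
    raise ValueError("no results table found in Vina output")


def dock(smiles, receptor_pdbqt, center, size, workdir="."):
    """Convert one ligand and dock it; requires obabel and vina on PATH."""
    ligand = os.path.join(workdir, "ligand.pdbqt")
    docked = os.path.join(workdir, "docked.pdbqt")
    subprocess.run(obabel_cmd(smiles, ligand), check=True)
    result = subprocess.run(
        vina_cmd(receptor_pdbqt, ligand, center, size, docked),
        check=True, capture_output=True, text=True,
    )
    return parse_best_affinity(result.stdout)
```

Building the command lists separately from running them keeps the options easy to inspect and test without the external tools installed.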

Results
To identify potent FDA-approved drugs that may inhibit the functions of SARS-CoV-2's core proteins, we used the deep learning-based MT-DTI model, which can accurately predict binding affinities from the chemical sequences (SMILES) of drugs and the amino acid sequences (FASTA) of target proteins without structural information [12]. This approach is particularly useful because it does not require protein structural information, the lack of which can be a bottleneck when traditional three-dimensional (3D) structure-based docking approaches are used to identify drugs targeting uncharacterized proteins [20]. Moreover, in a previous study [12], MT-DTI outperformed a deep learning-based approach (DeepDTA) [21] and two traditional machine learning-based algorithms, SimBoost [22] and KronRLS [23], on the KIBA [24] and DAVIS [25] datasets. Taking advantage of this sequence-based drug-target affinity prediction approach, we predicted the binding affinities of 3,410 FDA-approved drugs against the 3C-like proteinase, RdRp, helicase, 3′-to-5′ exonuclease, endoRNAse, and 2′-O-ribose methyltransferase of SARS-CoV-2. To evaluate the performance of MT-DTI, at least in silico, we compared the binding affinities of the 3,410 FDA-approved drugs predicted by MT-DTI with those estimated by AutoDock Vina, a widely used 3D structure-based docking algorithm. This comparison was possible because the 3D structure of the 3C-like proteinase was recently solved by X-ray crystallography (PDB ID 6LU7) [26]. Significant negative correlations were observed in both the antiviral drug dataset (R = −0.34, p = 0.0071) and the FDA-approved drug dataset (R = −0.32, p < 2.2e−16) (Fig. 1). Because higher scores are better for MT-DTI whereas lower scores are better for AutoDock Vina, these negative correlations indicate moderate agreement between the two algorithms.
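Because MT-DTI scores are higher-is-better and AutoDock Vina scores (kcal/mol) are lower-is-better, agreement between the two methods appears as a negative Pearson R. The sketch below, using hypothetical toy scores rather than data from this study, shows that negating the Vina scores flips only the sign of R, and computes the standard t statistic used to test R against zero (Student's t with n − 2 degrees of freedom). Converting t to a p-value requires the t distribution from a statistics package, which is omitted here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); compare with Student's t, n - 2 df."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical toy scores for five drugs, not data from this study.
mt_dti = [6.1, 7.4, 5.2, 8.0, 6.6]       # higher is better
vina = [-6.5, -7.9, -6.0, -7.2, -7.0]    # lower is better
r_raw = pearson_r(mt_dti, vina)                    # negative when the methods agree
r_aligned = pearson_r(mt_dti, [-v for v in vina])  # same magnitude, positive sign
```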
While it is not possible to determine which algorithm is more reliable without experimental evaluation, a previous study showed that MT-DTI is one of the best deep learning-based models for predicting the binding affinity between a given protein and compound [12]. Therefore, we further applied the MT-DTI model to identify FDA-approved drugs that could be repurposed to inhibit key proteins of SARS-CoV-2.
The SARS-CoV-2 3C-like proteinase was predicted to bind most strongly to atazanavir (Kd 94.94 nM), followed by remdesivir, efavirenz, ritonavir, and other antiviral drugs with predicted Kd values above 100 nM (Table 1). No other protease-inhibitor antiviral drugs were found in the Kd < 1000 nM range. Although there is not yet real-world evidence on whether these drugs will act against COVID-19 as predicted, some supporting studies have been reported. For example, a docking study of lopinavir and other HIV proteinase inhibitors against the CoV proteinase (PDB ID 1UK3) suggested that atazanavir and ritonavir, which appear in the present prediction results, may inhibit the CoV proteinase with a potency comparable to that of lopinavir [27]. The DTI model further predicted that viral proteinase-targeting drugs may act more favorably on proteins involved in viral replication than on the viral proteinase itself (Tables 2-6). These results also include antiviral drugs other than proteinase inhibitors, such as guanosine analogues (e.g., acyclovir, ganciclovir, and penciclovir), reverse transcriptase inhibitors, and integrase inhibitors.
Among the prediction results, atazanavir was predicted to bind to RNA-dependent RNA polymerase (Kd 21.83 nM), helicase (Kd 25.92 nM), 3′-to-5′ exonuclease (Kd 82.36 nM), 2′-O-ribose methyltransferase (Kd 390.67 nM), and endoRNAse (Kd 50.32 nM), suggesting that all subunits of the SARS-CoV-2 replication complex may be inhibited simultaneously by atazanavir (Tables 2-6). Ganciclovir was also predicted to bind to three subunits of the SARS-CoV-2 replication complex: RNA-dependent RNA polymerase (Kd 11.91 nM), 3′-to-5′ exonuclease (Kd 56.29 nM), and RNA helicase (Kd 108.21 nM). Lopinavir and ritonavir, the active ingredients of AbbVie's Kaletra, were both predicted to have potential affinity for the SARS-CoV-2 helicase (Table 3), and both have been suggested as potential MERS therapeutics [28]. Recently, Kaletra doses worth approximately $2 million were donated to China [29], a decision that may be supported by a previous clinical study of SARS by Chu et al. [30]. Another anti-HIV drug, Johnson & Johnson's Prezcobix, which consists of darunavir and cobicistat, was also to be sent to China [29], and darunavir is predicted to have a Kd of 90.38 nM against the SARS-CoV-2 helicase (Table 3). However, we found no literature supporting the use of darunavir as a CoV therapeutic. Although remdesivir is not a
Fig. 1. Comparison of MT-DTI and AutoDock Vina results. Sixty known FDA-approved antiviral drugs (left) and 3,410 FDA-approved drugs (right) were evaluated using the MT-DTI deep learning-based affinity score (higher is better) and the AutoDock Vina docking score (lower is better). Remdesivir, which is not an FDA-approved drug but is regarded as a promising antiviral drug for SARS-CoV-2, was included in this analysis.
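The multi-target reasoning for atazanavir and ganciclovir can be checked directly against the predicted Kd values quoted above. This sketch groups those values by drug and lists the replication-complex subunits that fall under a chosen cutoff, strongest first; the dictionary layout and the function name are illustrative assumptions, and the subunit labels follow the text.

```python
# Predicted Kd values (nM) quoted in the text (from Tables 2-6).
PREDICTED_KD_NM = {
    "atazanavir": {
        "RNA-dependent RNA polymerase": 21.83,
        "helicase": 25.92,
        "3'-to-5' exonuclease": 82.36,
        "2'-O-ribose methyltransferase": 390.67,
        "endoRNAse": 50.32,
    },
    "ganciclovir": {
        "RNA-dependent RNA polymerase": 11.91,
        "3'-to-5' exonuclease": 56.29,
        "helicase": 108.21,
    },
}


def targets_below(kd_by_target, cutoff_nm):
    """Subunits whose predicted Kd is below the cutoff, strongest (lowest Kd) first."""
    hits = [target for target, kd in kd_by_target.items() if kd < cutoff_nm]
    return sorted(hits, key=kd_by_target.get)


for drug, kds in PREDICTED_KD_NM.items():
    print(drug, "| < 1000 nM:", targets_below(kds, 1000.0))
    print(drug, "| < 100 nM:", targets_below(kds, 100.0))
```

With the 1000 nM screening cutoff, all five listed subunits pass for atazanavir; a stricter 100 nM cutoff would exclude only the 2′-O-ribose methyltransferase (390.67 nM).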

Discussion
In many cases, DTI prediction models serve as a tool for drug repurposing, that is, for developing novel uses of existing drugs. The application of DTI prediction in the present study may be useful for controlling unexpected and rapidly spreading infections such as SARS-CoV, Middle East respiratory syndrome coronavirus (MERS-CoV), and SARS-CoV-2 at the front line of disease control until better therapeutic measures are developed.
Several recent studies have identified promising drug candidates that may help reduce the symptoms of COVID-19 by inhibiting aspects of SARS-CoV-2. For example, remdesivir and chloroquine showed inhibitory effects against SARS-CoV-2 in vitro [31]. Another in vitro study found hydroxychloroquine to be more potent than chloroquine in inhibiting SARS-CoV-2 [32]. Remdesivir and lopinavir/ritonavir (Kaletra) also reduced pneumonia-associated symptoms in some COVID-19 patients [33,34]. However, these studies were based on prior knowledge that these drugs had shown some inhibitory effects on related coronaviruses such as SARS-CoV and/or MERS-CoV. In contrast, our approach relied on a pre-trained MT-DTI deep learning model that predicts drug-target interactions without such domain knowledge [12]. In fact, in a previous study, MT-DTI ranked epidermal growth factor receptor (EGFR)-targeted drugs used in clinics among its top-30 predicted candidates out of 1,794 chemical compounds registered in the DrugBank database [12], suggesting that 3D structural information of proteins and/or molecules is not necessarily required to predict drug-target interactions.
Table 4. Drug-target interaction (DTI) prediction results of commercially available antiviral drugs against the 3′-to-5′ exonuclease (accession YP_009725309.1) of a novel coronavirus (SARS-CoV-2, NCBI reference sequence NC_045512.2). * indicates isomeric SMILES.
Our results revealed the following intriguing findings, which need to be tested experimentally and clinically in the near future. First, MT-DTI results were broadly similar to those of the conventional 3D structure-based prediction model, AutoDock Vina, but some differences were observed. For example, atazanavir, remdesivir, and efavirenz were the top three drugs predicted by MT-DTI to bind to the 3C-like proteinase of SARS-CoV-2, whereas saquinavir, nelfinavir, and grazoprevir were the top three drugs identified by AutoDock Vina (Fig. 1). Second, when the search space was expanded to all FDA-approved drugs, MT-DTI identified some immunosuppressant drugs (rapamycin and everolimus) and a drug for asthma and chronic obstructive pulmonary disease (COPD), tiotropium bromide, as promising candidates. In contrast, AutoDock Vina predicted purmorphamine, lumacaftor, and verrucarin A as the top three drugs that could bind to the 3C-like proteinase of SARS-CoV-2; however, there is currently no supporting evidence that these drugs may be effective in inhibiting SARS-CoV-2. Lastly, atazanavir may be effective in the treatment of COVID-19, as it showed overall high predicted binding affinities, among the tested antivirals, for six proteins of SARS-CoV-2, including the 3C-like proteinase and the replication complex components (Tables 1-6 and S1-6). However, this prediction also needs to be validated in vitro, in vivo, and in a wide range of clinical trials for efficacy and safety.
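Comparing the top-ranked lists of the two methods requires ranking each score in its own direction: descending for MT-DTI scores and ascending for AutoDock Vina energies. The sketch below is a hypothetical illustration of that step plus a simple Jaccard overlap between top-k lists; applied to the top-three 3C-like proteinase lists named above, the overlap is zero.

```python
def top_k(scores, k, higher_is_better):
    """Names of the k best-ranked drugs; use higher_is_better=False for Vina energies."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)[:k]


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| of two name collections (1.0 if both are empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Top-three 3C-like proteinase candidates named in the text for each method.
MTDTI_TOP3 = ["atazanavir", "remdesivir", "efavirenz"]
VINA_TOP3 = ["saquinavir", "nelfinavir", "grazoprevir"]
overlap = jaccard(MTDTI_TOP3, VINA_TOP3)
```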
We hope that our prediction results can support experimental therapeutic options in China and other countries affected by the SARS-CoV-2 pandemic, in line with recent clinical trials [35].

Declaration of Competing Interest
Beck B.R., Choi Y., and Park S. are employed by Deargen Inc. Shin B. is employed by Deargen Inc. as a part-time advisor. Kang K. is a co-founder of, and a shareholder in, Deargen Inc.
Table 6. Drug-target interaction (DTI) prediction results of commercially available antiviral drugs against the 2′-O-ribose methyltransferase (accession YP_009725311.1) of a novel coronavirus (SARS-CoV-2, NCBI reference sequence NC_045512.2). * indicates isomeric SMILES.