Lead Identification for Severe Acute Respiratory Syndrome Coronavirus-2 Spike D614G Variant of COVID-19: A virtual Screening Process

COVID-19, a pandemic disease caused by single-stranded RNA virus Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). The structural spike (S) protein of SARSCoV-2 plays a vital role in host cell entry, where the Angiotensin-Converting Enzyme-2 (ACE2) receptor of the human cell binds to the Receptor Binding Domain (RBD) region of the S1 domain and makes cell entry. The binding affinity of SARS-CoV-2-ACE2 is tenfold higher than the SARS-CoV-1-ACE2. Recent studies expose that the SARS-CoV-2 S D614G variant is highly infectious than D614 protein, also the D614G variant is highly stable than D614. So far, there is no effective viral-specific regimen for COVID-19. To overcome such problems, in our study, we have utilized the ZINC database to screen potent leads against the highly transmitting SARS-CoV-2 spike D614G protein, through a virtual screening procedure. We have applied three computational tools iGEMDOCK server, AutoDock version 4.2.6 and admetSAR to get active leads. The ZINC000150588351 (Elbasvir), ZINC000064540179 (Sofosbuvir analogue) and ZINC000137700912 (Sofosbuvir analogue) molecules have a greater binding affinity with the high binding energies of -8.22 kcal/mol, -8.13 kcal/mol and -7.64 kcal/mol respectively. The molecules ZINC000064540179 and ZINC000137700912 have high binding energy than their core molecule Sofosbuvir (ZINC100074252) of -4.06 kcal/mol. The ADMET prediction of these molecules reveals satisfactory human intestinal absorption and non-mutagenic property. Our results deliver valuable contributions to the design of inhibitors against COVID-19.

COVID-19, a pandemic disease which was found in December 2019 in Wuhan, China was caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). 1,2 The World Health Organization (WHO) has named this novel CoV as "2019-nCoV" in January 2020. Later, it was found that the 2019-nCoV has a 78% sequence identity with SARS-CoV-1. Hence, the International Virus Classification Commission categorized 2019-nCoV as SARS-CoV-2 in February 2020. 2 CoVs fit under Coronaviridae family and Nidovirales order. SARS-CoV-2 grouped under beta CoV genera, 3 it is more infectious and has an increased mortality rate. 4 SARS-CoV-2 is a single-stranded RNA virus, which encodes structural (Spike (S), Envelope (E), Membrane (M) and Nucleocapsid (N)) and non-structural (replicase polyproteins) proteins. 5 Here, the structural proteins S, E and M are bound to the membrane, while the N protein is situated inside the virions in combination with RNA 6 . The glycoprotein S is responsible for the recognition, attachment and host cell entry of SARS-CoV-2. Also, it act as an antigenic determinant and plays a major role in antibody neutralization. 7 Hence, it is considered as the main target in CoV therapeutic research. 8,9 In humans, CoV infection is initiated by the interaction of S protein of SARS-CoV-2 with the Angiotensin-Converting Enzyme-2 (ACE-2) (play an essential role in angiotensin maturation), a metalloprotease enzyme found in human lung cells and then begin to release the viral RNA into the human cell. During the host cell entry, trimeric S protein is sliced into S1 and S2 domains. 2 This S1 domain has the receptor-binding domain (RBD), through which the SARS-CoV-2 is connected with the ACE-2 receptor. 1 The S2 domain initiates the fusion of viral membranes. 10 Few other human proteases namely, plasmin, trypsin, HAT, cathepsin L, TMPRSS2, TMPRSS4 and TMPRSS11a encourage S protein cleavage to promote attachment and fusion. 8,11 Recent studies reveal that SARS-CoV-2 isolates have mutations in many genomic parts. Such one is found in the S1 domain of the S protein of SARS-CoV-2. In the S1 domain, at 614 positions the amino acid Aspartic acid (D) is replaced with Glycine (G). 12 The literature evidence shows that the transmission of SARS-CoV-2 S proteins with D614G mutation is highly efficient than the D614 S protein. Such higher infectivity of D614G S protein is further confirmed by the in-vitro studies carried out with the epithelial cells of lung and colon, reveals the 4 fold increased infectivity of D614G S protein. 13,14 The D614G S protein has more stability than the D614 S protein. 12 Currently, many Food and Drug Administration (FDA) approved antiviral drugs have been proposed to treat COVID-19, remdesivir, remdesivir/chloroquine, lopinavir/ ritonavir, ribavirin/interferons. 5,15,16 These drugs are not viral-specific therapeutic medications for SARS-CoV-2. Such situations urge us to design a potential drug against SARS-CoV-2. Worldwide many clinical trials are in progress to identify potential drugs for COVID-19, aiming at different antiviral protein targets. In our study, we have screened currently existing antiviral drugs and their analogues (total of 500 compounds) against the highly transmitting SARS-CoV-2 S D614G protein, through virtual screening and molecular docking procedure. The resultant lead compounds were subjected to Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) property prediction to understand the oral bioavailability of the compounds. Our results will be useful for the researchers in the COVID-19 drug screening process.

Protein Preparation and active site prediction
The protein structure of the SARS-CoV-2 S D614G variant was recovered from the protein data bank (PDB), 17 PDB ID: 6XS6. 14 It is a homotrimer, which has 3 identical chains, for our virtual screening and molecular docking purpose only A chain was utilized. Since the SARS-CoV-2 S has S1 and S2 domain, the D614G mutant was present at the C-terminal end. Hence, the S1 domain alone was selected for the current study. The missing RBD region and other residues of the structure were constructed using MODELLER 9.16 18 and evaluated by the PROCHECK 19 and ProSA server 20 . Finally, the modelled structure was energy minimized by the PDB viewer and used for virtual screening. The active site residues for virtual screening and docking studies were collected from the literature.

Virtual screening
In the drug discovery process, virtual screening has become a potent in-silico method to screen millions of small molecules for the lead with preferred properties. The main purpose of the virtual screening process is to increase drug discovery and not to substitute in-vitro or in-vivo analysis. In our current study, we have implemented structure-based virtual screening and employed ZINC database for small molecules, iGEMDOCK server (Hsu et al., 2011) for virtual screening, AutoDock version 4.2.6 19 for molecular docking and admetSAR for ADMET property prediction ( Figure 1). 24

iGeMdoCK server
It is a user-friendly graphical environment, which provides space for ligand and target protein preparation to docked pose analysis. The pharmacological interactions reveal the proteinligand binding mechanism. The iGEMDOCK is useful for both post-screening analysis and pharmacological interactions. The target protein and ligands were submitted in PDB format. The iGEMDOCK employs an empirical scoring system and measures the total energy between the protein and ligand through van der Waals (vdW), H-bond and electrostatic (Elec) interactions (Fitness = vdW + H-bond + Elec, where, Fitness-total energy). 25 autodock version 4.2. 6 AutoDock software (version 4.2.6) was used for the current study. It functions based on the Lamarckian genetic algorithm (LGA), a fusion of Genetic Algorithm (GA), hybrid GA (local search hybridized with GA) and Monte Carlo Simulation. Through the graphical user interface of the AutoDock tool, the polar hydrogen (H) atoms were added to the SARS-CoV-2 S D614G structure (PDB format) and the non-polar H atoms were merged. The atoms were assigned with the Kollman partial charges. Then, the refined protein structure was saved in PDBQT format. Similarly, the ligand molecules were set with the torsion angles for flexible docking, later the ligand was saved in PDBQT format. The docking procedure was based on the specific binding site regions, hence the grid box was set to minimize the run time.
LGA is the most widely used method for docking studies. The final docked conformations were analyzed through the Auto-dock tool. 19

adMet property prediction
In the process of drug discovery and development, ADMET prediction plays an essential part. The drug discovery process includes particular disease and drug target, lead compound optimization, preclinical studies (in-vivo and invitro), filing for Food and Drug Administration (FDA), clinical studies Phase I, phase II and Phase III, New Drug Application (NDA) applied for FDA approval, a drug approved for marketing. The drug discovery process is highly a time-consuming one. Although, lots of active leads have been discovered, in recent years there is only less number of drugs have been got approved. ADMET property deficiencies of those drugs might be the reason for such stagnation. To overcome such problems, the ADMET properties of the lead compounds resulted from the virtual screening process were projected through the online server admetSAR, which predicts the drug-likeness, pharmacokinetic and pharmacodynamic properties. 24

results and disCussion
Protein structure validation MODELLER has created five protein models with different Discrete Optimized Protein Energy (DOPE). The model with a decreased DOPE score was selected as the best model ( Figure  2). The PROCHECK server has screened the stereochemical property of the constructed model with Ramachandran Plot. It indicates that amino acids are located in the favored regions. PROSA shows the Z-score value of -8.3. Both PROCHECK and ProSA further confirmed the reliability of the constructed model. The active site residues of the SARS-CoV-2 S protein which are playing an essential role in protein-ligand binding namely, GLN493, LEU452, LEU455, LEU492, LYS458, PHE486, PHE490, SER494 and TYR489 are collected from the literature.

Protein stability prediction
The DUET server has predicted the stability of the SARS-CoV-2 S D614G mutation by the free energy changes. DUET has shown the ÄÄG value of 0.475 kcal/mol (Stabilizing) ( Table  2). The complementary methods of the DUET,

Virtual screening
To identify the active lead against SARS-CoV-2 S D614G, a total of 500 small molecules were submitted to iGEMDOCK, this has provided the total energy (vdW + H-bond + Elec) of SARS-CoV-2 S D614G -ZINC compounds in kcal/mol. From the screened 500 compounds, the best 50 compounds with high total energy scores were selected and it is tabulated in Table 3. This reveals that the ZINC compounds ZINC000150588351 to ZINC000101802366, have the total energies in the range of -119.61 within the range of -89.15 to -80.25 kcal/ mol and compounds ZINC000100368814 to ZINC000096169912 scores ranges from -79.50 to -70.22 kcal/mol respectively. From these 50 compounds, the compounds ZINC000150588351, ZINC000064540179, ZINC000166442088, ZINC000137700912, ZINC001772647544 and ZINC000101802366 are considered to be the best binding energy compounds. These are subjected to docking procedures with AutoDock version 4.2.6. After molecular docking with AutoDock, the compounds ZINC000150588351 (Elbasvir), ZINC000064540179 and ZINC000137700912 (Sofosbuvir analogues) are selected as the best binding energy compounds (Table 4). They have the binding energies of -8.22 kcal/mol, -8.13 kcal/mol and -7.64 kcal/mol respectively. The amino acid residues ALA352 (2 bonds), ARG466, ASN354 and LEU492 of SARS-CoV-2 S D614G are involved in the H-bond interaction with ZINC000150588351.
A total of six H-bonds were found with bond distances of 3.4Å, 3.2Å, 2.5Å, 2.5Å and 2.4Å respectively (Figure 3 (A) & (B)). Compound ZINC000064540179 has three H-bonds with the residues ASN501, GLY496 and GLY502 of bond distances 2.2Å, 3.2Å and 2.2Å respectively (Figure 4 (A) & (B)). Similarly, compound ZINC000137700912 has formed two H-bonds with residues PHE490 (3.1Å) and SER494  (2.3Å) ( Figure 5 (A) & (B)). The molecules ZINC000064540179 and ZINC000137700912 have high binding energy than the core molecule Sofosbuvir (ZINC100074252), which has a binding energy of -4.06 kcal/mol and 3 H-bond formations. The elbasvir, an antiviral drug that inhibits NS5A and prevents the replication of RNA and assembly of virions in hepatitis C virus (HCV). The elbasvir/grazoprevir combination is used in the treatment of HCV genotypes 1 and 4. 26 Similarly, sofosbuvir is also inhibiting NS5B polymerase in HCV. 27 The compounds ZINC000150588351, ZINC000064540179 and ZINC000137700912 have high binding energy and high binding affinity with SARS-CoV-2 S D614G protein. ADMET property prediction results are shown in Table 5.
The compounds ZINC000150588351, ZINC000064540179 and ZINC000137700912 possess the human intestinal absorption capacity and lack the blood brain barrier and Caco2 (cell lines obtained from human colorectal carcinoma, used to find oral absorption) permeability. The Permeability glycoprotein (P-glycoprotein), (A) Binding orientation of ZINC000150588351 with SARS-CoV-2 S D614G protein (B) H-bond interaction of ZINC000150588351 with amino acids of SARS-CoV-2 S D614G protein. Fig. 3. Docking results for ZINC000150588351 and SARS-CoV-2 S D614G protein (A) Binding orientation of ZINC000064540179 with SARS-CoV-2 S D614G protein (B) H-bond interaction of ZINC000064540179 with amino acids of SARS-CoV-2 S D614G protein. Fig. 4. Docking results for ZINC000064540179 and SARS-CoV-2 S D614G protein a transmembrane protein involved in drug efflux from cells, shows non-substrate and noninhibitor property for ZINC000064540179, ZINC000137700912 compounds and substrate, inhibitor property for ZINC000150588351. These three compounds have shown non-inhibitor property for drug clearance (Renal Organic Cation Transporter), also non-substrate status for the cytochrome P450 (CYP) isoforms CYP450 2C9 & CYP450 2D6, Non-inhibitor status for CYP450 2C9 & CYP450 2D6. These compounds are nonmutagenic, which has reported negative results for AMES toxicity. These three compounds are considered as the potential leads against the SARS-CoV-2 S D614G variant.

ConClusion
The outburst of the COVID-19 has challenged the public health, medical and economic infrastructure. In our current study, to screen active leads against the highly infectious, more stable pathogenic SARS-CoV-2 S D614G variant, we have employed structure-based virtual screening. We have subjected a total of 500 small molecules to the virtual screening process and found compounds ZINC000150588351, ZINC000064540179 and ZINC000137700912, which have high binding energy with highly infectious SARS-CoV-2 S D614G viral protein. These compounds can be