Interaction of certain monoterpenoid hydrocarbons with the receptor binding domain of 2019 novel coronavirus (2019-nCoV), transmembrane serine protease 2 (TMPRSS2), cathepsin B, and cathepsin L (CatB/L) and their pharmacokinetic properties

As of June 2020, the coronavirus disease 19 (COVID-19) caused by the 2019 new type coronavirus (2019-nCoV) infected more than 7,000,000 people worldwide and caused the death of more than 400,000 people. The aim of this study was to investigate the molecular interactions between monoterpenoids and spike protein of 2019-nCoV together with the cellular proteases [transmembrane serine protease 2 (TMPRSS2), cathepsin B (CatB), and cathepsin L (CatL)]. As a result of the relative binding capacity index (RBCI) analysis, carvone was found to be the most effective molecule against all targets when binding energy and predicted (theoretical) IC50 data were evaluated together. It was found to exhibit drug-likeness property according to the Lipinski’s rule-of-five. Carvone has also been determined to be able to cross the blood-brain barrier (BBB) effectively, not a substrate for P-glycoprotein (P-gp), not to inhibit any of the cytochrome P molecules, and to have no toxic effects even on liver cells. In addition, the LD50 dose of carvone in rats was 1.707 mol/kg. Due to its interaction profile with target proteins and excellent pharmacokinetic properties, it has been concluded that carvone can be considered as an alternative agent in drug development studies against 2019-nCoV.


Introduction
Some Coronaviridae viruses are in circulation among people and are known to cause mild respiratory infections (Corman et al., 2019). However, it has been established that 2 important members of this family are transmitted from animals to humans and cause serious infections. These are severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). Both types have led to severe respiratory infections in humans and the death of some individuals, especially those with chronic conditions (Fehr et al., 2017). SARS was first seen in Guangdong, China in 2002 and then quickly spread to other countries. This variant caused 8,096 people to become infected and 774 of these individuals died 1 (De Wit et al., 2016). The main source of SARS-CoV has been found to be Chinese horseshoe bats 1 WHO (2003). Summary of probable SARS cases with onset of illness from 1 November 2002 to 31 July 2003 [online]. Website https://wwwwhoint/csr/sars/country/table2004_04_21/en/ [accessed 00 Month Year]. (Lau et al., 2005;Li et al., 2005). It has been reported to be transferred to humans through civet cats and raccoon dogs sold as food at Chinese wet markets (Guan et al., 2003). There is no approved antiviral agent or vaccine used in the treatment of SARS, either at the time of its first appearance or now. The spread of the pandemic that emerged in the period of 2002-2003 has been prevented by using traditional methods such as restricting people's travel and isolation of sick individuals, just like today (Hoffmann et al., 2020).
After the epidemics of SARS and MERS in the early millennium, a new and highly contagious respiratory disease was detected in Wuhan (China) towards the end of 2019 (Huang et al., 2020;Wang et al., 2020;Zhu et al., 2020). It was reported that the first infected were people who came into contact with animals in the Huanan seafood market. Later it became clear that the virus could spread among humans (Chan et al., 2020). The so-called coronavirus disease 19  spread rapidly to all parts of China. A new variant of SARS virus was found in the analysis of samples taken from sick individuals. The pathogen that caused the disease was called SARScoronavirus 2 (SARS-CoV-2) or 2019 new coronavirus (2019-nCoV) because it is from the same family as SARS-CoV (Zhu et al., 2020). In February 2020, nearly 45,000 cases were detected in China. It has been announced that about 8,000 of these cases are in critical condition and over 1,000 people have died 2 . The virus has spread to about 2 dozen countries, primarily through people traveling from China to other countries. As of June 2020, the number of COVID-19 cases worldwide has exceeded 7,000,000. More than 400,000 of these patients died as of early June 2020 3 . At present, it is not possible to make any inference about the sequence similarities between the SARS-CoV-2 and the SARS-CoV on the pandemic properties of these 2 variants (Munster et al., 2020).
In both coronavirus variants, spike protein plays an important role in the entry of the pathogen into the cell. Entry occurs as a result of the interaction between the S1 subunit of the spike protein and the receptor on the surface of the target cell. However, for entry, priming of the spike protein by cellular proteases is required. In this process, the spike protein is cut at the S1/S2 point and the S2'unit provides the junction between the virus and the cellular membrane. The virus needs angiotensinconverting enzyme 2 (ACE2) to bind to the receptor on the target cell surface (Li et al., 2003). Research has shown that transmembrane serine protease 2 (TMPRSS2) is involved in the priming of spike proteins of SARS viruses (Matsuyama et al., 2010;Glowacka et al., 2011;Shulla et al., 2011). Since TMPRSS2 is actively involved in the priming process of the 2019-nCoV spike protein, there is some evidence that camostat mesylate, which has an inhibitory effect on this protease, prevents infection in lung cells. Cathepsin B and cathepsin L (CatB/L), which are endosomal cysteine proteases, are also thought to be involved in priming of the spike protein of 2019-nCoV. Thus, it has been shown that inhibition of these proteases may also prevent the virus from entering the cell (Hoffmann et al., 2020).
Plant secondary metabolites are synthesized by many species such as vegetables, fruits, medicinal and aromatic plants (Prakash et al., 2007;Singh et al., 2010 plant secondary metabolites in the treatment of some viral infections. Various studies have been conducted on the potential of some of these phytochemicals to inhibit the receptor binding domain (RBD) of the spike protein of 2019-nCoV, cellular proteases or endoribonucleases (Meneguzzo et al., 2020;Sampangi-Ramaiah et al., 2020;Tallei et al., 2020;Thuy et al., 2020).
The aim of this study was to investigate the molecular interactions between monoterpenoid hydrocarbons ( Figure 1 and Table 1), which constitute an important group of plant essential oils, and i) RBD of the spike protein of 2019-nCoV and ii) cellular proteases (TMPRSS2, CatB and CatL). In addition, drug-likeness properties and ADMET profiles of monoterpenoids were presented. Using the binding free energy (kcal/mol) and predicted (theoretical) IC 50 (mM) values of the monoterpenoids, the relative binding capacity index (RBCI) values were also statistically calculated and 'hit' compounds were determined.

Computational capacity
Two Dell laptops (Windows 8.1 and Windows 10 operating system) were used to perform the molecular docking analyses in the current study. One of the computers had the Windows 10 operating system and was equipped with the Intel Core i5-7200U CPU 2.50 GHz and 2.70 GHz processors. The other computer with the Windows 8 operating system had Intel Core i5-5200U CPU 2.20 GHz and 2.20 GHz processor power.

Structural optimization of ligands
The protein data bank (.pdb) files of all the ligands have been downloaded from PubChem (https://pubchem.ncbi. nlm.nih.gov) via the download module of Vega ZZ 3.2.0.9 software. In the Vega ZZ, the atom types and electrical charges of the ligands were fixed with MMFF94 force field and Gasteiger-Marsili parameters (Pedretti et al., 2004). The ligands were energetically minimized by the conjugate gradient minimization method. For this purpose, the minimization steps and tolerance were set to 1000 and 0.01, respectively.

Energy minimization of 2019-nCoV ACE2-RBD, TMPRSS2, CatB/L using nanoscale molecular dynamics (NAMD)
Firstly, in the Vega ZZ environment, the structure of the spike glycoprotein was gained by removing the ACE2 subunit from the angiotensin-converting enzyme 2 -2019-nCoV receptor binding domain (RBD) complex which was downloaded from the url: https://swissmodel.expasy. org/interactive/HLkhkP/models/03 (PDB ID: model_03. pdb) (Camacho et al., 2009;Remmert et al., 2011). Since the structure of the spike glycoprotein in model_03 shows a sequence identity of 99.88% to the 2019-nCoV ACE2binding domain, this model was chosen as the appropriate 3D structure in the docking analyses. During the protein preparation step, the atom types and electrical charges of the spike glycoprotein were fixed using CHARMM22_ PROT force field and Gasteiger-Marsili charges. Next, for the energy minimization of the spike glycoprotein with NAMD, each parameter was loaded from a template file. The number of time steps (number of minimization steps) were set to 10,000 and CHARMM22_PROT was set as the force field. When the energy minimization was completed, the 3D structure corresponding to the last minimization step was saved as the lowest energy conformation. Also, to keep the spike glycoprotein structurally closer to the original crystallographic data, atom constraints were also applied to the protein backbone. In the energy minimization of TMPRSS2, CatB, and CatL, the same steps described for the 2019-nCoV spike glycoprotein were applied.

Homology modeling of TMPRSS2
Since the crystallographic data of TMPRSS2 enzyme structure has not been resolved until today, we generated  BLAST was used to search the TMPRSS2 catalytic domain target sequence against the primary amino acid sequence in the SMTL. As a result of the BLAST search, a total of 788 templates were found. An initial HHblits profile has been built using the procedure as described in (Remmert et al., 2011). This procedure was followed by 1 iteration of HHblits against NR20. The obtained profile was then searched against all profiles of the SMTL and, finally, a total of 1167 templates were found. ProMod3 was used to carry out model building for TMPRSS2 catalytic domain based on the target-template alignment. The coordinates preserved between the target structure and the template were copied from the template to the model. Insertion and deletions were remodeled based on the fragment library. Subsequently, the side chains were rebuilt. Finally, using the CHARMM27 force field, the geometry of the resulting TMPRSS2 model was optimized. In cases where ProMod3 failed in loop modeling, an alternative model was developed with PROMOD-II (Guex et al., 2009).
The model quality (global and per-residue) of TMPRSS2 obtained was evaluated with the QMEAN scoring function (Studer et al., 2020). The near-zero QMEAN score reflects a good agreement between the model structure and the experimental structure, although scores of -4.0 and below indicate that the model is of low quality. Therefore, among the top 5 TMPRSS2 templates obtained as a result of homology modeling, the 5ce1.1.A (model 06) template with the QMEAN score closest to zero (QMEAN = -1.43) was selected as the target in the docking analysis.
In addition, whether our model has an energetically favorable conformation was analyzed by generating a Ramachandran plot in the PROCHECK (Laskowski et al., 1993) web-based tool. ERRAT (Colovos and Yeates 1993) online web-based tool was also deployed to calculate the overall quality factor (OQF) for nonbonded atomic interactions.
In the molecular docking analyzes, polar hydrogen atoms in the receptor and the ligand molecules were retained while nonpolar hydrogens were merged and then, the Gasteiger charges of the ligands were calculated with AutoDockTools as previously described (Ricci and Netz 2009;Nasab et al., 2017). In addition, the Kollmann charges were added for the receptor. During the docking experiments, all the rotatable bonds of the ligands were allowed to rotate and then the optimized protein (rigid) and ligand (flexible) structures were saved in PDBQT format. Grid box coordinates were adjusted as: a) 80 × 90 × 40 Å points for the spike glycoprotein; b) 60 × 110 × 86 Å points for TMPRSS2; c) 86 × 84 × 44 Å points for CatB; and d) 54 × 52 × 60 Å points for CatL. Prior to docking analyses, these grid box sizes were determined to include the active amino acid residues of these enzymes.
In all docking analyses, 50 genetic algorithm (GA) runs using an initial population of 150 individuals, maximum number of 2,500,000 energy evaluations, and a maximum number of 27,000 generations were selected. The values of 0.02 and 0.8 were chosen as the default parameters for mutation and crossover rates, respectively. After 50 independent docking runs, all the possible binding modes (conformations) of the ligands were clustered by the program and were ranked based on the lowest RMSD (root mean square deviation) and the binding free energy (kcal/mol) of the ligand conformation. The best docking poses obtained using the AutoDock 4.2 between the ligand and receptor structures was analyzed with the BIOVIA Discovery Studio Visualizer 2016.

Success criteria set in docking analysis
In the current study, the RMSD (root mean square deviation) value of the docking results obtained for each phytochemical analyzed was considered successful when it was less than 2 angstroms (<2 Å). The criterion considered after the RMSD value was the binding energy (deltaG) of the ligand in the most efficient docked complex. Briefly, the closeness of all the phytochemicals tested in this study to the ACE2-RBD, TMPRSS2, CatB and CatL, and then the energy of that binding in these zones were determined (Morris and Lim-Wilby, 2008). The calculated inhibition constants (K i ) obtained with AutoDock 4.2 for each docked phytochemical were also given.

Drug-likeness prediction, ADMET profile and target prediction
The drug-likeness, ADMET and target profiles of potential hit compounds are very important in terms of reducing side effects in the pharmaceutical industry. In our study, web-based SwissADME, pkCSM and Swiss Target Prediction online tools were used to determine such effects of monoterpene hydrocarbons analyzed (Pires et al., 2015;Daina et al., 2017;Daina et al., 2019).

Calculation of RBCI values
A new analysis method called RBCI was applied to statistically rank the activity potentials of phytochemicals by using binding energy and IC 50 values obtained from the parameters given above. Through this analysis, it is possible to compare data, each of which has different scientific meanings, statistically with each other. If sorting based on the interaction of molecules with proteins is performed according to only 1 of these parameters (eg binding energy or IC 50 value only), the molecules can only be sorted in terms of their potential in that parameter. However, sorting using only 1 of these parameters cannot represent the activity potential of these molecules from all parameters.
The most commonly used method to determine the interaction between the receptor-ligand in multiple measurements is the central tendency, in which the components are ranked-based on the mean value for each component (Zar, 1996). However, since the units and scales of the data obtained from each parameter are different, it is not possible to obtain an average value for all components.If the values in each data set (binding energy and IC 50 ) are converted to standard scores, it is possible to compare them with each other.
In order to calculate the arithmetic mean values, first of all, binding energy and IC 50 data of each phytochemical were used regardless of their units and raw values were obtained. These raw values calculated for each component were subtracted from the arithmetic mean and divided by standard deviation, and standard scores were obtained (see equation given below) (Sharma, 1996). RBCI values of each phytochemical was calculated by averaging these standard scores obtained separately for each protein target.
where 'x' is the raw data, 'μ' is the mean, and 'σ' is the standard deviation.
Although RBCI is a relative index and does not represent the specific binding capacities of the components, it makes it possible to rank components reasonably based on their binding energy and IC 50 values. Therefore, it can be used as an integrated approach to evaluate the molecular interaction of the components, considering all parameters.

Homology modeling
Of the 5 models created by ProMod3, model 06, which had the QMEAN scoring function closest to zero (QMEAN = -1.43) was chosen. The sequence identity of TMPRSS2 (model 06) created by ProMod3 was 33.82% and sequence similarity was 38%. If the target and template sequences show an amino acid identity over 30% and above, homology modeling is accepted as being reliable and successful (Xiang, 2006). To further verify model used in this study, a Ramachandran plot was created through the PROCHECK web-based tool to evaluate the energetically allowed regions of TMPRSS2 ( Figure 2). According to plot statistics, 84.6% of residues were in most favored regions, 14.7% was in additional allowed regions, 0.3% was in generously allowed regions, and 0.3% of the residues were in disallowed regions. ERRAT web-based tool was also used to calculate the overall quality factor (OQF) of nonbonded atomic interactions of the model. According to this online web-based tool, the OQF of a high quality model should be above 91%. The OQF calculated by the ERRAT server of TMPRSS2, which was created as the homology model, was 92.92% ( Figure 3).

RMSD values
RMSD values of atomic positions obtained as a result of computer-based interactions of monoterpenoid hydrocarbons with spike, TMPRSS2, CatB and CatL are given in Tables 1, 2, 3, 4, and 5, respectively. RMSD values obtained as a result of the interaction of monoterpenoids with the proteins in question were in the range of 0.02-1.97, 0.01-20.103, 0.00-13.569, and 0.01-1.14 Å, respectively. The compound showing the lowest RMSD value in interaction with spike, TMPRSS2 and CatB was camphene (0.02, 0.01, and 0.00 Å, respectively). On the other hand, RMSD values of borneol, sabinene, β-pinene, camphor, and carvone were also found to be quite low. With some exceptions, the RMSD values of monoterpenoids on all proteins were below 2.0 Å. The RMSD values obtained as a result of the interactions of menthol with TMPRSS2 and p-cymene with Cat B were found to be quite high (20.103 and 13.569 Å, respectively).

Binding energies
The binding energies of monoterpenoids to target molecules ranged from -5.01 to -2.38 kcal/mol. The binding energies between monoterpenoids and spike, TMPRSS2, CatB and CatL were in the ranges of -5.0/-3.6, -4.45/-2.38, -5.53/-3.78, and -5.03/-3.8 kcal/mol, respectively. The binding energy of carvone against all proteins was found to be significantly lower than the others. In addition, the binding energies of menthol, camphor, and α-terpineol were also found to be promising. Linalool and myrcene were the molecules with the highest binding energy against all proteins.

Predicted IC 50 values
The predicted IC 50 values of monoterpenoids against spike, TMPRSS2, CatB, and CatL were in the range of 0.213-2.29, 0.547-17.89, 0.087-1.7 and 0.205-1.65 mM, respectively. As with the binding energies of the molecules, the IC 50 value of carvone was found to be significantly lower than other monoterpenoids. In addition, the IC 50 values of camphor, menthol, and α-terpineol against all protein targets were also low. IC 50 values of myrcene and linalool were quite high compared to other monoterpenoids. In particular, IC 50 value of linalool against TMPRSS2 was determined as 17.89 mM.

Bonding interactions
The bonds between the monoterpenoids and contact residues in the active regions of the target proteins are detailed in Tables 1, 2, 3, and 4. Van der Waals and hydrophobic interactions (both p-p and mixed p/alkyl) played an important role in the interaction between monoterpenoids and RBD of the spike protein of 2019-nCoV. Van der Waals forces and hydrophobic interactions also played a decisive role in the molecular interactions between monoterpenoids and TMPRSS2, CatB, and CatL. However, it was found that hydrophobic effects between monoterpenoids and target proteins were rather established through mixed p/alkyl interactions.
In general, no interaction has been formed between monoterpenoids and Leu455, Phe486, and Gln493, amino acids involved in binding ACE2 in the RBD of the spike protein of 2019-nCoV ( Figure 4). However, it was found that molecular interaction was formed between other active amino acids (Asn501 and Tyr505) and monoterpenoids. In addition, monoterpenoids were found to interact with other amino acids (Arg403, Tyr453, Tyr495, Gly496 and Phe497) of the spike protein.
As understood from heatmap, which shows the molecular interaction between monoterpenoids and TMPRSS2 ( Figure 5), carvone and carvacrol were the most intensely interacting molecules with the protein in question. On the other hand, the number of bonds between TMPRSS2 and camphene, α-terpineol, thymol, camphor, and linalool was less than others. While molecular interaction was formed between monoterpenoids and the active amino acids of TMPRSS2, His296 and Ser441, no interaction with the other active amino acid, Asp345, was achieved. Monoterpenoids also showed several interactions with some other amino acids of TMPRSS2 (His279, Val280, Cys281, Thr393, Gln438, and Gly439).

RBCI values of monoterpenoids
As known, in this study, the molecular interaction between 15 different monoterpenoids and 4 different protein targets was analyzed. Binding energy and IC 50 data were obtained in the interaction analysis for each protein-ligand system. In order to detect 'hit' monoterpenes, a new analysis method called RBCI, the details of which are given in the 'Material and methods' section, has been applied. In order to calculate the RBCI values of phytochemicals, the binding energy and IC 50 values of each monoterpene against all proteins were used together. It is not possible to use all data sets, each of which has different scientific meanings, in order to rank about the effectiveness of phytochemicals on target proteins. However, with RBCI analysis, the effectiveness of phytochemicals on all proteins can be determined by using different data sets at the same time. Thus, the most effective protein on all 4 proteins can be documented statistically (Figure 8).
RBCI analysis resulted in the superiority of carvone. Top ranked conformation of carvone in the RBD of the spike protein of 2019-nCoV, TMPRSS2, CatB, and CatL are presented in Figure 9. RBCI analysis also confirmed 1 1 1 Ser28 1  that p-cymene, myrcene, and linalool are the weakest components against protein targets in question.

Pharmacokinetic properties of monoterpenoids
Drug-likeness properties of docked monoterpene hydrocarbons against spike glycoprotein of 2019-nCoV, TMPRSS2, CatB and CatL and their ADMET profiles are given in Tables 6 and 7, respectively. Carvone, myrcene, α-terpineol, thymol, carvacrol, β-phellandrene, α-terpinene, limonene, linalool, and borneol were determined to be molecules in accordance with Lipinski's rule-of-five (Lipinski et al., 1997). However, the Moriguchi Log P (MLOGP) values of camphene, sabinene, β-pinene, α-thujene, and p-cymene have been found to be higher than 4.15. It has been determined that all molecules can cross the blood-brain barrier (BBB). None of the molecules were substrates of P-glycoprotein (P-gp). While camphene, β-pinene, thymol, carvacrol, p-cymene, and limonene inhibited CYP2C9, CYP1A2, and CYP2D6, other monoterpenoids had no inhibitory effect on cytochromes. None of the monoterpenoids exhibited toxic effect. The LD 50 doses of the molecules in rats were in the range of 1.549-2.074 mol/kg. While thymol and carvacrol were monoterpenoids with the highest LD 50 value, the LD 50 values of sabinene and camphene were found to be lower than others.

Discussion
RMSD is a value commonly used in molecular docking analysis to measure the reproduction quality of a known binding pose of the molecule couples. This value is functional when the ligand molecule shows different poses in the binding site of the protein. Researchers suggest that RMSD values less than 2.0 Å are an important indicator of the quality of binding poses (Ramírez and Caballero, 2018). Accordingly, almost all of the analyzed monoterpenoids were found to have RMSD values below 2.0 Å against protein targets. It was found that RMSD values obtained only from menthol-TMPRSS2 and p-cymene-CatB interactions were above this critical limit and therefore could not exhibit sufficient binding activity on these molecules. There was no correlation between the RMSD values of the molecules and binding energies and predicted IC 50 values. However, a high correlation was found between binding energies obtained against all protein targets and IC 50 values.
As a result of docking analysis, monoterpenoids interacted with more than half of the active amino acids of protein targets. Monoterpenoids mostly interacted with the active amino acids of CatB and TMPRSS2 (69% and 66% of active amino acids, respectively). Monoterpenoids bound with 55% of active amino acids of CatL, while this ratio has decreased to 50% in spike glycoprotein.
As a result of the RBCI analysis performed to detect the most effective phytochemicals against the all protein targets, carvone was clearly superior to others. It was found to exhibit drug-like property according to the Lipinski's rule-of-five. As it is understood from the ADMET profile, carvone can effectively pass through BBB. P-gp is an important cell membrane component responsible for the removal of toxic molecules from the cell. This protein,  which is found in many organisms, is thought to be one of the cell's defense mechanisms against harmful substances (Nguyen et al., 2020). As can be seen from Table 7, where ADMET profiles of monoterpenoids were given, carvone is not a substrate for P-gp. This finding shows that carvone is not perceived by the cell as a toxic molecule. In addition, the results of AMES toxicity and hepatotoxicity analysis support this finding. Carvone has also been found not to inhibit CYPs involved in the oxidation of xenobiotics and some endogenous harmful compounds (Rettie and Jones, 2005;Spector and Kim, 2015). This finding means that with carvone intake, metabolic activities based on oxidation can continue without disruption. LD 50 dose of carvone in rats was also found to be better than those of sabinene, camphene, α-thujene, myrcene, β-pinene, and linalool. In Figure 10, the intracellular targets of carvone are included. According to this pie chart, the vast majority of intracellular targets of carvone are oxidoreductases and nuclear receptors. Oxidoreductases are enzymes that transfer the electron from the donor molecule to the acceptor. Usually they use NADP or NAD+ as the cofactors. In many organisms, transmembrane oxidoreductases are known to form the electron transfer chain (Rac and Fulgosi, 2020).
Nuclear receptors are responsible for delivering the signal carried by some hormones (e.g., thyroid or some steroidal hormones) to the nucleus of the cell. Thus, these proteins regulate the expression of certain specific genes along with some other proteins (Evans, 1988). Although it is known that carvone targets these proteins, it is thought that detailed analyzes should be made to understand whether carvone has an inhibitory and/or activator effect on the activities of these proteins. As a result of the literature research, the molecular interaction between spike glycoprotein of 2019-nCoV and carvone, camphene, α-terpineol, thymol, carvacrol, camphor, β-phellandrene, menthol, linalool, and borneol was analyzed and the binding energies of these molecules were found to be in the range of -3.4/-3.7 kcal (Smith and Smith, 2020). The data in question were found to be weaker than the data presented in the current study. However, no reports of the interaction of other monoterpenoids with the spike protein have been found. Additionally, no study was found documenting the interaction between neither carvone nor other monoterpenoids and TMPRSS2, CatB, and CatL. Therefore, the data presented here were considered to be the first records for the literature.