Virtual screening of substances used in the treatment of SARS-CoV-2 infection and analysis of compounds with known action on structurally similar proteins from other viruses

Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) is the etiological agent of the disease that caused the COVID-19 pandemic, and for which there is currently no effective treatment. This pandemic has shown that, when a new and highly transmissible virus emerges, the rapid identification of therapeutic compounds is critical to prevent or reduce as far as possible the loss of human lives. To meet the urgent need for drugs, many strategies were applied to discover or identify potential therapies for SARS-CoV-2. Molecular docking and virtual screening are two of the in silico techniques that enabled the identification of a few SARS-CoV-2 inhibitors while filtering out ineffective or less effective drugs, thus saving time and additional costs. The main aim of this review is to provide a comprehensive overview of how in silico tools have been used in the crisis management of anti-SARS-CoV-2 drugs, especially in the virtual screening of substances used in the treatment of SARS-CoV-2 infection and in the analysis of compounds with known action on structurally similar proteins from other viruses; it also discusses how these methods came to meet the requirements of biomedical research in the field. Moreover, the importance and impact of the topic for researchers are highlighted through an extensive bibliometric analysis.


Introduction
Since the development of the modern digital era, numerous scientific fields have made a sudden transition to data-dependent research methods [1]. Bioinformatics is a field of science that developed as a result of this progress [2,3]. Molecular docking, virtual screening (VS), and molecular dynamics are important tools that can be used in bioinformatics. Experiments conducted in vivo or in vitro are expensive and time-consuming, but they provide accurate results. In silico experiments are less accurate, but they are faster and less expensive, which is why they are widely used in pharmaceutical research and in other scientific fields [2,4].
Molecular docking studies, as a part of in silico analysis, are carried out using software that attempts to predict as accurately as possible how a ligand (a small molecule or a protein) interacts with a receptor, which is a protein or a nucleic acid [5]. These studies are used to identify compounds with therapeutic potential. Furthermore, they help characterize the interaction between the ligand and the protein and establish a link between the chemical structure and the activity of the studied compound [6]. The Protein Data Bank (PDB) is a repository of three-dimensional structures of proteins that are used as receptors in molecular docking studies [7]. Selecting the receptor is a crucial step in the docking process; if several structures are available for a protein, the one with the higher resolution is generally selected [8].
VS is based on docking numerous molecules to a target protein and ranking these compounds by the score that an algorithm assigns to them. Molecular dynamics allows the introduction of variables such as protein flexibility, behavior in the presence of water and ions, and the evolution of the system over time. By combining VS or molecular docking with molecular dynamics, the results obtained are significantly closer to the in vivo ligand-protein interaction mode [9]. Although investment in drug research and development is on an upward trend, the number of new drugs introduced into therapy has declined significantly in recent years [10]. The regulatory agencies authorize only a small percentage of the drugs tested. Therefore, pharmaceutical companies tend to turn to drugs that are already used in therapy and look for new indications for them. A recent example is the repositioning of drugs to treat SARS-CoV-2 infection since the onset of the pandemic [11][12][13].
The present review aims to provide an overview of how in silico tools have been used in the crisis management of anti-SARS-CoV-2 drugs, especially in the virtual screening of substances used in the treatment of SARS-CoV-2 infection and in the analysis of compounds with known action on structurally similar proteins from other viruses, and of how these methods have served biomedical research. Moreover, the importance of the topic for researchers and the impact of the information in the field are underlined by an extensive bibliometric analysis. Using scientific mapping technology to visually assess knowledge structure and research trends helps future authors find the most prolific journals, while also revealing the most relevant and productive researchers in the field. Identifying the terms with a high occurrence, together with the average number of citations of the articles that contain them, highlights the latest developments in this field. The keywords representing antivirals and other chemical substances are useful for understanding the current trends in the molecules used for docking or included in the databases subjected to virtual screening.

Bibliometric analysis
In order to assess the importance of the topic, an extensive bibliometric analysis was performed, highlighting the most important parameters describing literature such as data source, journal, impact factor, number of citations, most prolific authors, etc.

Data source
A scientific literature search was performed on the Scopus database, the largest abstract and citation database to date. The search was restricted to articles written in English that contained the following terms in the title, abstract, or keywords: SARS-CoV-2, molecular docking, virtual screening. A total of 575 results were identified, of which 539 (93.74%) were original articles and 16 (2.78%) were reviews; the rest fell into the following categories: book chapter, conference paper, and letter. Most of the articles were classified in the following subject areas: "Biochemistry, Genetics and Molecular Biology" (304 articles), "Chemistry" (175), "Pharmacology, Toxicology and Pharmaceutics" (171), "Computer Science" (117), and "Medicine" (104); other subject areas had <100 articles.

Leading countries in publishing SARS-CoV-2 and molecular docking papers
India, the United States, China, and Saudi Arabia published most of the articles (75.88%) in this field. Despite ranking third in terms of the number of published articles, China has a far higher average number of citations per document than India and the United States. Although Italy has only 25 published articles, its average is 20 citations per document. Pakistan also stands out with an average of 26.41 citations per document for 22 articles. India had a high total link strength (TLS) score because it collaborated with a significant number of countries (48) in the publication process. Table 1 shows all the countries that had more than 20 published papers on the topic. VOSviewer was used to create the network map of country co-authorship for articles matching the presented search terms (Figure 1). Countries that frequently co-author articles are grouped into clusters: the red cluster includes Bangladesh, China, Germany, Iran, Italy, Pakistan, and Spain; the green cluster includes Egypt, Nigeria, South Africa, Turkey, and the United Kingdom; the blue cluster includes India, Saudi Arabia, and South Korea; the yellow cluster includes the United States and Brazil.
The bubble size reflects the TLS, which means that more collaborative countries have a larger bubble. The link strength (LS) between two countries is shown by the thickness of the line connecting them: the more often two countries publish together, the higher their LS score and the thicker the line. Researchers from India (TLS 104), Saudi Arabia (91), and the United States (70) collaborated closely with a significant proportion of foreign academics. Indian researchers had the highest level of collaboration with researchers from Saudi Arabia, followed by experts from the United States and South Korea.
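The relationship between LS and TLS described above can be illustrated with a minimal sketch. It assumes full counting, in which every co-authored paper adds 1 to the link between each pair of countries on that paper; the paper list is hypothetical and is not the Scopus data set analysed here.

```python
from collections import Counter
from itertools import combinations


def link_strengths(papers):
    """Count co-authored papers for every unordered pair of countries."""
    links = Counter()
    for countries in papers:
        # sorted() gives each pair a canonical order, so (A, B) == (B, A)
        for a, b in combinations(sorted(set(countries)), 2):
            links[(a, b)] += 1
    return links


def total_link_strength(links):
    """Sum the strengths of all links attached to each country."""
    tls = Counter()
    for (a, b), strength in links.items():
        tls[a] += strength
        tls[b] += strength
    return tls


# Hypothetical papers, each given as the set of its authors' countries.
papers = [
    {"India", "Saudi Arabia"},
    {"India", "Saudi Arabia", "United States"},
    {"India", "South Korea"},
    {"China"},  # single-country paper: contributes no links
]

links = link_strengths(papers)
tls = total_link_strength(links)
```

In this toy example the India-Saudi Arabia link would be drawn thickest, and India would have the largest bubble, because it is attached to the most co-authorship links.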

Journals
Over 200 journals have published at least one article matching the search terms. The journals that showed the most interest in in silico studies are presented in Table 2. All impact factor (IF) values were obtained from the Journal Citation Reports website. Ranked by IF, the highest-ranking journal was the International Journal of Molecular Sciences (5.924) and the lowest-ranking was the Journal of Molecular Graphics and Modelling (2.518). Furthermore, to assess whether molecular docking and virtual screening fall within the scope of these journals outside the context of SARS-CoV-2, the term SARS-CoV-2 was removed from the search criteria, and the number of papers published from 2012 to the present was analyzed (Figure 2). Several variables are considered when selecting a journal in which to publish a scientific paper, including the topic areas in which the journal shows interest, in order to increase the chances of successful publication.

Journal Pre-proof

Prolific authors
A total of 2868 authors were identified across the 539 articles analysed. Table 3 presents the top ten authors ordered by the number of published papers. Together, they have 82 publications (15.2%) with 1393 citations. Figure 3 shows the co-authorship network map created with VOSviewer; only authors with a minimum of 5 articles are represented. The bubble map shows the relationships between the authors, highlighting the groups of authors who published together. Six clusters were generated: red (11 authors), green (9), blue (5), yellow (4), purple (3), and cyan (1). The largest set of connected items comprised 29 interconnected authors.

Most cited articles
Table 4 shows the most cited papers, the authors' names, the journal in which they were published, and the journal's IF. Seven of the 10 articles were published in 2020, in journals with IF ranging from 3.39 (Journal of Biomolecular Structure and Dynamics) to 11.614 (Acta Pharmaceutica Sinica B). Considering that nearly all authors require access to high-quality information when writing a paper, the number of citations is a useful measure of an article's quality. The IF of the journal in which an article is published is also commonly taken as an indicator of the quality of the material it contains.

Table 4. Most cited articles (authors; title; journal; IF*; citations; reference)
Kumar Y.; In silico prediction of potential inhibitors for the main protease of SARS-CoV-2 using molecular docking and dynamics simulation-based drug-repurposing; Journal of Infection and Public Health; 3.718; 126
An in-silico evaluation of different Saikosaponins for their potency against SARS-CoV-2 using NSP15 and fusion spike glycoprotein as targets; Journal of Biomolecular Structure and Dynamics; -; 103; [24]
* Impact factors are given according to year 2020.

The origin of SARS-CoV-2
Coronaviridae, Arteriviridae, Mesoniviridae, and Roniviridae comprise the order Nidovirales, of which coronaviruses are the largest group. The Coronaviridae family is further classified into two subfamilies, Torovirinae and Coronavirinae. The Coronavirinae subfamily is subdivided into 4 genera [25], as presented in Figure 5, which is laid out to highlight the position of SARS-CoV-2 in the taxonomic hierarchy. HCoV-229E and HCoV-NL63 of the genus Alphacoronavirus usually cause common colds, but they can produce complications in immunocompromised patients or the elderly. Betacoronaviruses like HCoV-OC43 and HCoV-HKU1 are non-virulent, while SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV) may cause severe symptoms. Alphacoronavirus and Betacoronavirus also contain animal viruses, whereas Gammacoronavirus and Deltacoronavirus have so far been identified only in animals [26][27][28].
The origin of SARS-CoV-2 remains unknown, and the generally accepted hypothesis is that the disease was transmitted from animals to humans (zoonotic transmission). In silico studies of the interactions between human ACE2 receptor homologues, Toll-like receptors (TLRs), and the SARS-CoV-2 spike glycoprotein [31] showed that the glycoprotein is phylogenetically close to that of the coronavirus found in bats and binds strongly to the ACE2 receptor protein (both bat and human). The same study revealed that cell surface TLRs (i.e., TLR4) are probably the most involved in recognizing the molecular patterns induced by SARS-CoV-2, which produce the characteristic inflammatory responses. Thus, the data support the hypothesis of a zoonotic (bat) origin of SARS-CoV-2 [32].
The genome of the new coronavirus is 96.2% similar to that of RaTG13, a β-CoV found in bats, and 79.5% similar to that of SARS-CoV [33,34]. The majority of the SARS-CoV-2 genome is represented by two open reading frames (ORF1a and ORF1b), which are translated into two polyproteins, pp1a and pp1ab. The two polyproteins are cleaved by the viral proteases, papain-like protease (PLpro) and the main protease (3CLpro, Mpro), to create 16 non-structural proteins (nsp). The rest of the genome is translated into four structural proteins: spike (S), membrane (M), envelope (E), and nucleocapsid (N) [35].
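Similarity figures of this kind are commonly reported as percent identity over a sequence alignment. The sketch below is only an illustration of that arithmetic: it assumes two sequences that have already been aligned to equal length with '-' marking gaps, and it counts identity over ungapped columns, which is one of several conventions in use. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: five ungapped columns, four of which match.
identity = percent_identity("ATGC-A", "ATGGTA")
```

Whole-genome comparisons rely on dedicated alignment tools; only the final counting step is shown here.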

Drug repositioning
The process of developing new drugs is costly and time-consuming. The pharmaceutical industry is experiencing difficulties due to the low number of novel pharmaceuticals launched onto the market in recent years. To address this issue, pharmaceutical companies have resorted to the following methods: improved formulation (modified release), improved bioavailability by co-administration of two pharmaceuticals (e.g., protease inhibitors pharmacokinetically enhanced with ritonavir or cobicistat [36]), and drug repositioning (

Molecular docking
Molecular docking is the process of simulating how a receptor will interact with a ligand. The ligand is usually a small molecule that is thought to have a pharmacological effect, while the receptor is a protein or nucleic acid molecule [5]. Because molecular docking predicts the three-dimensional structure of the ligand-protein complex and the orientation of the ligand, this method can be used in the discovery of new drugs or in drug repositioning. The binding potential of the ligand to the receptor is calculated using different algorithms that consider the possible interactions (e.g., Coulombic, van der Waals) [40,41].
The foundations of molecular docking date back to the 1980s, and considerable progress has been made since then. The software is now far more accurate and accessible, so academic researchers conduct molecular docking studies more frequently (Figure 7) [42][43][44]. If the structure of the target protein has not been determined, homology modeling can be used, which entails building a model of the protein with unknown structure from a homologous protein whose structure is known [41]. The way a ligand binds to a certain protein (pose prediction) gives information about the interactions between them. Binding energy is often used to compare ligands with similar chemical structures (the hypothesis being that an analog or a substance with a higher affinity than the native ligand may cause competitive inhibition). Furthermore, the goal of VS is to discover new ligands with therapeutic potential [32,45].
After a docking study, a validation step is necessary to check that the approach and the software worked appropriately. The verification involves re-docking the native ligand and calculating the root mean square deviation (RMSD) between the re-docked pose and the experimental one [45]. The RMSD summarizes the distances between pairs of corresponding atoms in the two superimposed structures and is calculated over heavy atoms. The lower the RMSD value, the better the docking protocol reproduces the experimental binding pose [46].
The pandemic has brought to light a significant issue affecting humanity, namely the absence of therapies for new and quickly spreading illnesses. In such cases, one alternative is to find new therapeutic indications for authorized drugs, and molecular docking is a helpful tool to achieve this [47].
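As an illustration of the re-docking check, the following minimal sketch computes the RMSD between two poses of the same ligand. It assumes that both poses list the same heavy atoms in the same order and share the receptor's coordinate frame, so no additional superposition is performed; the coordinates are invented for the example. A cutoff of about 2 Å is a commonly used convention for an acceptable re-docked pose, but that threshold is an assumption here and is not taken from the cited studies.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical heavy-atom coordinates in angstroms.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
redocked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

value = rmsd(crystal_pose, redocked_pose)
ASSUMED_CUTOFF = 2.0  # angstroms; conventional threshold, assumed here
protocol_ok = value <= ASSUMED_CUTOFF
```

Here every atom is displaced by exactly 1 Å, so the RMSD is also 1 Å and the hypothetical protocol would pass the assumed cutoff.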

Virtual screening (VS)
Pharmaceutical companies employ high-throughput screening (HTS) to identify novel compounds. HTS is an automated process that uses robots paired with high-sensitivity sensors to examine a large number of molecules in a short time [48]. However, HTS is an expensive method that requires special equipment, which has led to the need for a more economical alternative. VS was developed to meet this need: numerous compounds, previously downloaded from a database (Table 5) that may be open access, are docked to a protein. A protein whose crystallographic structure is known is commonly used, but a protein model can also be generated by homology modeling [49][50][51].
The most significant benefit of VS is the reduction in the number of substances that must be subjected to expensive experiments. The most significant disadvantage is that a compound with pharmacologic potential can be omitted by the algorithm [52]. VS methods are usually classified into ligand-based VS (LBVS) and structure-based VS (SBVS). In LBVS, a ligand is used as a reference, and the database is searched for molecules with similar characteristics. SBVS approaches are substantially more complex since they take the protein macromolecule into account. Ligands are ranked by how well they fit in the protein active site (ligand-protein match) [53,54].
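The LBVS idea of searching a database for molecules similar to a reference ligand can be sketched with the Tanimoto coefficient, a widely used similarity measure for molecular fingerprints. This is an illustrative assumption rather than the method of any study cited here: the fingerprints below are hand-written sets of "on" bits and the compound names are hypothetical, whereas a real workflow would derive fingerprints from chemical structures with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared features divided by all features."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as zero
    return len(a & b) / len(union)


def rank_by_similarity(reference_fp, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = {name: tanimoto(reference_fp, fp) for name, fp in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints given as sets of "on" bit positions.
reference = {1, 4, 7, 9}
library = {
    "compound_A": {1, 4, 7, 9},
    "compound_B": {1, 4, 8},
    "compound_C": {2, 3},
}

ranking = rank_by_similarity(reference, library)
```

Compounds at the top of such a ranking would be carried forward, while SBVS would instead rank them by their docking score against the protein.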

Molecular dynamics
Molecular dynamics simulations are in-silico methods that predict the behavior of a system over time. The rapid evolution of computer hardware has also benefited the field of molecular dynamics. Simulations, which once needed the most advanced computers of the time, can be done today on a computer with mid-range specifications [55,56]. The increased availability of these simulations is mirrored in the number of published articles (Figure 8).
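How a simulation predicts the behavior of a system over time can be shown with a toy time-stepping loop. The sketch below uses the velocity Verlet scheme, a standard integrator in molecular dynamics, on a single particle attached to a harmonic spring. The one-dimensional system, the arbitrary units, and all parameter values are assumptions made for illustration; production simulations use full force fields and many thousands of atoms.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v with the velocity Verlet scheme."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt * dt  # update the position
        f_new = force(x)                             # force at the new position
        v = v + 0.5 * (f + f_new) / mass * dt        # update the velocity
        f = f_new
        trajectory.append((x, v))
    return trajectory


# Toy system: one particle on a harmonic spring (arbitrary units).
SPRING_CONSTANT = 1.0
MASS = 1.0


def spring_force(x):
    return -SPRING_CONSTANT * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * SPRING_CONSTANT * x * x


trajectory = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                             mass=MASS, dt=0.01, n_steps=1000)
initial_energy = total_energy(*trajectory[0])
max_drift = max(abs(total_energy(x, v) - initial_energy) for x, v in trajectory)
```

Tracking the total energy, as `max_drift` does, is a simple sanity check on the chosen time step: a drift that grows over the run usually means the step is too large.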

3-chymotrypsin-like protease (3CLpro, Mpro)
The SARS-CoV-2 genome contains at least six ORFs, of which ORF1ab encodes two polyproteins, pp1a and pp1ab. These are cleaved by 3CLpro and PLpro (Figure 9), which yields 16 non-structural proteins (nsp) that are essential to the life cycle of the virus [59,60]. 3CLpro is a dimer composed of two symmetrical protomers, each further subdivided into three domains. Domain I comprises residues 8 to 101. 3CLpro has no known structural analog in the human body, so a therapeutic agent that targets 3CLpro is not expected to cause adverse effects in the host cell [62][63][64]. Table 5 summarizes some of the SBVS studies that searched for compounds with inhibitory potential against 3CLpro. For each study, it gives the PDB ID of the target protein and the software used for molecular docking, including studies in which docking was coupled with molecular dynamics.

The crystal structure of SARS-CoV-2 3CLpro with the PDB ID 6LU7 was published on 2020-02-05 by Jin et al. 6LU7 has a resolution of 2.16 Å and is co-crystallized with a molecule that functions as a protease inhibitor. This molecule was obtained by computer-aided drug design, following the analysis of a large number of compounds (over 10,000) [85]. Many protease inhibitors used in the treatment of HIV/AIDS and hepatitis C (Table 6) have been identified as possible inhibitors of 3CLpro by docking studies and by docking studies combined with VS [86]. Protease inhibitors that are conventionally used in the treatment of HIV/AIDS have a relatively high affinity for 3CLpro, making them suitable candidates for inhibiting the SARS-CoV-2 protease. Darunavir and Saquinavir have affinities of -7.7 and -9.5 kcal/mol, respectively. Grazoprevir and Paritaprevir, two NS3/4A protease inhibitors used to treat HCV, have affinities for 3CLpro of -8.1 and -9.5 kcal/mol, respectively. Given the docking results on 3CLpro and the fact that they act on the protease of another RNA virus (HCV or HIV), these compounds may also interfere with SARS-CoV-2 3CLpro.
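To give a feeling for what differences of a few kcal/mol mean, the affinities quoted above can be converted into rough dissociation constants with the thermodynamic relation ΔG = RT ln(Kd). This is only a worked arithmetic sketch under a strong assumption: docking scores are not calibrated binding free energies, so the resulting values are useful for relative comparison at most. The temperature of 298.15 K and the function name are assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def estimated_kd(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) implied by dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Docking affinities (kcal/mol) quoted in the text for 3CLpro.
scores = {
    "Darunavir": -7.7,
    "Saquinavir": -9.5,
    "Grazoprevir": -8.1,
    "Paritaprevir": -9.5,
}

# Print the compounds from the most to the least negative score.
for name, delta_g in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {delta_g} kcal/mol -> Kd ~ {estimated_kd(delta_g):.1e} M")
```

Under these assumptions, a difference of about 1.4 kcal/mol corresponds to roughly a tenfold change in the estimated Kd, which is why small differences in docking score can translate into large differences in predicted affinity.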

A quick look at Nirmatrelvir
Nirmatrelvir is a 3CLpro inhibitor; as is typical of this class, it prevents 3CLpro from processing the pp1a and pp1ab polyproteins [106]. It is combined with ritonavir, a human immunodeficiency virus type 1 (HIV-1) protease inhibitor that plays a vital role in slowing the metabolism of Nirmatrelvir by inhibiting the CYP3A isoenzymes [107].
According to the WHO, the drug is the most effective therapy to date for high-risk patients (e.g., unvaccinated or immunosuppressed patients). It is administered orally in the early stages of the disease. Its design concept is not new: it is based on a compound developed for the treatment of SARS-CoV. Emergency use authorization was granted within a year of the start of Phase 1 clinical studies [108].
In terms of in silico docking, Nirmatrelvir was docked to the protein structure with the PDB ID 6Y2F, and the authors observed that it binds to the following residues through hydrogen bonds: Cys145, Glu166, and Gln189 [109]. Another docking study on the protein with the PDB ID 7SI9 reported that Nirmatrelvir interacts with the protein via hydrogen bonds with the following amino acids: Gly143, Phe140, and Glu166 [110].

Papain-like protease (PLpro)
Another SARS-CoV-2 protease that antivirals might target is PLpro, because it is involved in processing the viral polyproteins (Figure 4) of SARS-CoV-2, allowing viral spread [111]. It has been demonstrated that it also interferes with host cell proteins, modulating the host response to the virus in a way that facilitates viral spread [112]. In silico studies on PLpro have been conducted to discover drugs that inhibit this dual-effect protease. Table 7 presents a series of in silico studies aimed at discovering PLpro inhibitors.

RNA-dependent RNA polymerase (RdRp)
RdRp is a non-structural protein with a key role in genomic transcription and replication. Analysis of its crystallographic structure shows the distinctive shape of a right hand, consisting of a palm, fingers, and a thumb [129,130]. Because of its role in transcription and replication, RdRp is considered a valuable therapeutic target (Figure 11). RdRp has no structural homologs in the human cell, so drugs that act specifically on this protein should, in theory, not cause side effects related to their mechanism of action [131,132]. In the search for RdRp inhibitors, the most promising drug class is that of nucleoside analogs [133]. VS studies sought to find RdRp inhibitors, and the most promising compounds were already approved antivirals used as nucleoside analogs in RNA viral infections. The fact that the active sites of these enzymes retain their fundamental properties across most RNA viruses increases the interest in RdRp-targeted antivirals for the treatment of SARS-CoV-2 infection [134,135]. Until the crystallographic structure of RdRp was determined, molecular docking studies relied on homology models of the protein [136][137][138]. Since the publication of crystallographic structures of RdRp (e.g., PDB IDs 7BW4 [139] and 6M71 [140]), homology models have diminished in significance. Table 8 reviews SBVS studies focused on finding RdRp inhibitors.

Remdesivir
Remdesivir is a nucleotide analog designed to target RNA viruses (e.g., hepatitis C virus). It is a prodrug that is bioactivated in the body and functions as an adenosine triphosphate (ATP) analog. Figure 5 summarizes the mechanism of action. The suggested dosing regimen is one dose on the first day, followed by two more on subsequent days. According to research, it must be administered in the early stages of the infection [154,155].

Conclusions and future perspectives
In silico docking is continually changing, and approaches that were previously out of reach because of hardware limitations or a lack of target protein structures have become widely available. Researchers' interest in this topic has grown due to the large number of macromolecular structures deposited in the Protein Data Bank (190,404) and the fast advancement of hardware components. In order to discover new, innovative drug substances, pharmaceutical companies are increasingly using HTS, but its high costs create the need for a more economical alternative. The in silico counterpart of HTS is VS, which enables the testing of a large number of compounds in a short amount of time.
VS, and more specifically SBVS in combination with molecular dynamics, has the potential to play a vital role in the repurposing or even the development of medicinal compounds. During the pandemic, several studies were conducted to find pharmacological compounds that may act on SARS-CoV-2. Drugs that act on the target structures of other RNA viruses were among those considered. Drugs such as the protease inhibitors used in treating HIV/AIDS or hepatitis C have been subjected to docking studies. Compounds of natural origin have been rigorously tested against all SARS-CoV-2 targets in the hope of finding a potential inhibitor.
Currently, in silico studies are used to limit the number of candidates that proceed to more costly in vitro or in vivo testing. Introducing a standard approach is a first step that must be taken before in silico procedures, and molecular docking in particular, can become more useful. Because most programs employ different algorithms and scoring techniques, reproducing findings from one program to another is nearly impossible. Artificial intelligence (AI) and deep learning (DL) have become more prevalent in recent years, and public interest in these technologies has increased. One of the most significant advancements in in silico research is the incorporation of AI into the drug development process. These approaches are becoming more widely available and are highly accurate. Overall, in silico methods have proven extremely useful for researching the therapeutic potential of different molecules, analyzing the interactions between viral targets and their potential inhibitors, and designing new agents usable in prevention or therapy. These drug screening methods are of real use both for designing innovative drugs to treat various aspects of COVID-19 and for responding to a future pandemic. In summary, the development of the biomedical tools mentioned above opens new opportunities for testing and discovering innovative therapies, and contributes fundamental results that can be used immediately when adopting precise intervention strategies or protocols in cases of need or emergency.