Genetic conservation across SARS-CoV-2 non-structural proteins – Insights into possible targets for treatment of future viral outbreaks

The majority of SARS-CoV-2 therapeutic development work has focussed on targeting the spike protein, viral polymerase and proteases. As the pandemic progressed, many studies reported that these proteins are prone to high levels of mutation and can become drug resistant. Thus, it is necessary to not only target other viral proteins such as the non-structural proteins (NSPs) but to also target the most conserved residues of these proteins. In order to understand the level of conservation among these viruses, in this review, we have focussed on the conservation across RNA viruses, conservation across the coronaviruses and then narrowed our focus to conservation of NSPs across coronaviruses. We have also discussed the various treatment options for SARS-CoV-2 infection. A synergistic melding of bioinformatics, computer-aided drug-design and in vitro/vivo studies can feed into better understanding of the virus and therefore help in the development of small molecule inhibitors against the viral proteins.


Introduction
COVID-19 was first identified in December 2019 in Wuhan City, China, and has negatively impacted the lives of the global population through the resulting healthcare, economic, social and mental health crises. There had been approximately 607 million COVID-19 cases around the world as of 31st August 2022 (Worldometer, 2020; COVID - Coronavirus Statistics - Worldometer, worldometers.info).
The genome of SARS-CoV-2 (SC-2) is 29.9 kb in length and encodes 29 proteins (4 structural proteins, 16 non-structural proteins (NSPs) and 9 accessory proteins). The primary focus of this review will be on the NSPs, their degree of conservation across variants of SC-2 and across other Group IV viruses, ultimately leveraging this level of conservation to inform future therapeutic development strategies.
This review will provide an overview of SC-2 taxonomy, the genetic and structural organisation of coronaviruses, the conserved regions within different viruses and current treatment for SC-2 infection. The review will particularly focus on conservation among RNA viruses, conservation among coronaviruses and conserved regions present in the NSPs of coronaviruses. We propose that the conserved NSP regions of coronaviruses can potentially be targeted to develop pan inhibitors against future disease outbreaks caused by coronaviruses.

Taxonomy and classification of viruses
Viruses are mobile microscopic parasites containing RNA or DNA genomes which are surrounded by a protective protein coat called the capsid. The nomenclature of a virus depends on the type of genomic material; for example, single-stranded RNA viruses are classified as positive sense or negative sense (Gelderblom, 1996). Baltimore (1971) classified viruses on the basis of the type of mRNA they produce and divided them into seven groups: double-stranded DNA viruses (dsDNA, Group I); single-stranded DNA viruses (+ssDNA, Group II); double-stranded RNA viruses (dsRNA, Group III); positive single-stranded RNA viruses (+ssRNA, Group IV) (highlighted in Fig. 1); negative single-stranded RNA viruses (-ssRNA, Group V); positive single-stranded RNA viruses with DNA intermediates (+ssRNA-RT, Group VI), commonly known as retroviruses; and the double-stranded DNA retroviruses (dsDNA-RT, Group VII).
The Group IV viruses have a positive sense genome and comprise seven sub-classes of viruses that are either enveloped (Flaviviridae, Togaviridae and Coronaviridae) or non-enveloped (Caliciviridae, Picornaviridae, Hepeviridae and Astroviridae) (Berman, 2012).

Coronaviridae
The order Nidovirales consists of RNA viruses with linear genomes of 26-32 kb and includes coronaviruses, toroviruses and roniviruses (Gorbalenya et al., 2006). The coronaviruses (CoVs) are enveloped positive-strand RNA viruses that can cause life-threatening disease in humans. There are 39 species in 27 subgenera, five genera and two subfamilies that belong to the family Coronaviridae. The coronaviruses are spherical in shape, with a diameter of 80-160 nm. The Coronaviridae family is divided into the subfamilies Torovirinae and Coronavirinae. Genetic and antigenic characteristics are used to divide coronaviruses into four genera: α, β, γ, and δ (Fig. 2).
Mammals are mostly infected by α and β coronaviruses, whereas γ and δ coronaviruses tend to infect birds, although some can also infect mammals (Woo et al., 2012; Cui et al., 2019). Coronaviruses are known for causing common colds and have gathered a lot of attention due to their ability to cause zoonotic infections (Pyrc et al., 2007; Graham et al., 2013). The 2003 outbreak of SC-1 and the 2012 outbreak of MERS-CoV were caused by betacoronaviruses found in bats and other species prior to human transmission (Ge et al., 2013; Menachery et al., 2015). Within the subgenus lineage B, SC-1 and SC-2 phylogenetically share a most recent common ancestor and are relatively distant from MERS-CoV (belonging to the subgenus Merbecovirus) in the genus Betacoronavirus.

Genomic and structural organisation of Coronaviridae
The Group IV viruses have +ssRNA genomes that serve as mRNA. A negative strand is synthesised to generate a double-stranded replicative form (dsRF), which is used to make multiple copies of the genome. The viruses within this group are either enveloped or non-enveloped (Fig. 3).
The SC-2 genome encodes structural proteins, accessory proteins and NSPs. There are ~30,000 nucleotides within the SC-2 genome and it possesses 14 ORFs encoding at least 27 proteins. The translation of these ORFs by the host cell machinery gives rise to polyproteins that are processed by the host and viral proteases (Modrow et al., 2013). The viral RNA is translated from two open reading frames, ORF1a and ORF1ab, producing two large polyproteins, PP1a and PP1ab (Tok and Tatar, 2017). The polyproteins are cleaved by two viral proteases that are present within the polyprotein structure. These are papain-like protease (PLpro) and chymotrypsin-like protease (3CLpro, also called Mpro), which produce the mature NSPs (Ghosh et al., 2020) (Fig. 4). Cleavage of PP1a produces 10 fragments, constituting NSP1-NSP10, whereas PP1ab is cleaved to produce all the NSPs, i.e., NSP1-NSP16. PLpro, a component of NSP3, is involved in the formation of NSP1, NSP2 and NSP3. The remaining NSPs (NSP4-NSP16) are processed by 3CLpro, which is present within NSP5. These NSPs are necessary for genome replication, and some of their functions are listed in Table 1 along with the percentage sequence similarity between SC-1 and SC-2. All these NSPs are common across members of the Beta-CoV genus.
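The cleavage scheme described above can be written down as a small lookup. This is only an illustrative sketch of the text's description; the function and variable names (`releasing_protease`, `nsps_from`, `cleavage_map`) are hypothetical and not taken from any cited tool.

```python
# Sketch of the polyprotein processing scheme as described in the text:
# PLpro (part of NSP3) releases NSP1-NSP3, 3CLpro (part of NSP5) releases NSP4-NSP16.
# PP1a yields NSP1-NSP10 and PP1ab yields NSP1-NSP16.

def releasing_protease(nsp: int) -> str:
    """Return the viral protease that releases a given NSP (1-16)."""
    if not 1 <= nsp <= 16:
        raise ValueError("NSP number must be between 1 and 16")
    return "PLpro" if nsp <= 3 else "3CLpro"


def nsps_from(polyprotein: str) -> list[int]:
    """Return the NSP numbers produced from PP1a or PP1ab."""
    last = {"PP1a": 10, "PP1ab": 16}[polyprotein]
    return list(range(1, last + 1))


# Map every NSP released from PP1ab to its processing protease.
cleavage_map = {f"NSP{n}": releasing_protease(n) for n in nsps_from("PP1ab")}
```

Looking up `cleavage_map["NSP12"]`, for example, returns the protease responsible for releasing the RdRp from the polyprotein.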
The pp1ab polyprotein is encoded by the ribosomal frameshift mechanism of gene 1b. pp1ab has 7096 amino acid residues and is known to be well conserved across CoVs (Naqvi et al., 2020). Apart from the structural proteins and NSPs, the SC-2 genome also encodes for eleven accessory proteins: ORF3a, ORF3b, ORF3c, ORF3d, ORF6, ORF7a, ORF7b, ORF8, ORF9b, ORF9c and ORF10 (Shang et al., 2021). These accessory proteins play an important role in pathogenesis but further discussion is outside the scope of this article.

SARS-CoV-2 entry into the host cell
The SC-2 genome encodes the structural proteins spike (S), envelope (E), membrane (M) and nucleocapsid (N), which are located at the 3′ terminus. These proteins share high sequence similarity with the corresponding proteins of SC-1 and MERS-CoV (Naqvi et al., 2020).
The spike protein of CoVs facilitates binding to the host cell-surface receptor during host cell entry (Lau and Peiris, 2005). The receptor-binding domain (RBD) in the S1 subunit of the S protein is responsible for binding to the host receptor, thus resulting in fusion of the S2 subunit with the host cell membrane. The S proteins of SC-1 and SC-2 bind the human ACE2 receptor, followed by entry into the host cell (Fig. 5). In addition to ACE2, coronaviruses utilise different receptors such as aminopeptidase N (CD13) (Yeager et al., 1992), 9-O-acetylated sialic acid (Huang et al., 2015) and DPP4 (Wang et al., 2013) to facilitate entry into the host cell.

Conserved regions
Conserved sequences are either amino acid or nucleotide sequences that remain relatively unchanged far back up the phylogenetic tree or among diverse species. Sequences considered to be highly conserved suggest that the sequence has been maintained by natural selection. The conserved regions present within the sequences can be identified using multiple sequence alignment (MSA) studies and suggest structural or functional importance to the protein. To outline the importance of sequence and structural conservation across viruses, we have first focused on conservation between different RNA viruses followed by conservation within coronaviruses and then narrowed our focus to conservation of NSPs across coronaviruses. The highly conserved regions, if present in the binding pocket of the protein, can be interesting targets for developing small molecule inhibitors.
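As a rough illustration of how conserved positions can be read off an MSA, the sketch below scores each alignment column by the fraction of sequences that share the most common residue. The scoring rule, the function names and the toy alignment are assumptions made for illustration; published analyses typically rely on dedicated tools and more elaborate scores.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column, return the fraction of sequences sharing the most common residue.

    `alignment` is a list of equal-length aligned sequences; '-' marks a gap.
    Gaps count against conservation because the denominator is the number of sequences.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(alignment, threshold=1.0):
    """Return 1-based column numbers whose conservation meets the threshold."""
    return [i + 1 for i, s in enumerate(column_conservation(alignment)) if s >= threshold]


# Toy alignment (hypothetical sequences, not real viral proteins).
toy = ["MKHG", "MKHA", "MRHG"]
print(conserved_positions(toy))
```

Lowering `threshold` relaxes the definition from fully conserved columns to highly conserved ones.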

Similarities of constituent proteins within RNA viruses
In this section we focus on similarities across different RNA viruses. The RNA-dependent RNA polymerase (RdRp) (NSP12), which is essential for RNA replication, is known to be conserved within RNA viruses, so targeting this protein may be an effective therapeutic approach (Xu et al., 2003; Gerlach et al., 2015). A study by Mönttinen et al. (2021) used 42 high-resolution viral RdRp structures from the PDB of -ssRNA, +ssRNA and dsRNA viruses. Amino acid alignment of complete RdRp sequences was performed using the sequence alignment tool MAFFT L-INS-i. The catalytic site failed to align, but 231 residues of the common structural core of the polymerases covered over 30% of the influenza A virus, over 50% of the poliovirus and 25% of the SC-2 polymerase structures. The study pointed out possible structural conservation across the viral RdRps, thus indicating the potential to target the polymerase for discovering broad-spectrum antivirals. Considering NSP3, the first 160 residues of the N-terminal region, also known as the macro domain (X domain), were found to be conserved among alphavirus, rubivirus and coronavirus in a sequence similarity study (Gorbalenya et al., 1991). Torovirus (ToV) belongs to the order Nidovirales, and historically belonged to the family Coronaviridae. It now belongs to the separate family Tobaniviridae (ICTV, 2019). The six NSP domains shared by the ToVs and CoVs are 3C-like protease (NSP5), nidovirus RdRp-associated nucleotidyltransferase (NiRAN) of RdRp (NSP12), zinc-binding domain (ZBD) of helicase (HEL) (NSP13), and endoribonuclease (EndoU) (NSP15), which are known to be conserved in most nidoviruses (Van Boheemen et al., 2012; Saberi et al., 2018). The structural or sequence conservation among these viruses could support the hypothesis of a common ancestor. These studies could help discover new antivirals targeting these conserved regions among diverse ranges of viruses.
The hepatitis C virus non-structural protein (NS5b) RdRp and MERS-CoV RdRp share a low sequence similarity but have structural similarity as both have two conserved aspartate residues at the active site (Elfiky and Azzam, 2021). The RdRp of flaviviruses such as Zika, West Nile Virus and Dengue virus share 66-69% amino acid sequence identity with the RdRp of SC-1 (Xu et al., 2003).
Another approach to look for similarity between viruses is to analyse the structural similarity between them. In a study by Bafna et al. (2020), the PDB was searched using the DALI program (Holm and Sander, 1993, 1999) for 3D structural similarity, with domains I and II (excluding domain III) of the SC-2 Mpro as the query. The search returned several structurally similar proteases, among them the hepatitis C virus (HCV) NS3/4A serine protease. The proteases of HCV and SC-2 had a structural similarity Z score of +8.4, with an overall backbone root-mean-square deviation for structurally similar regions of approximately 3.1 Å. The superimposition of the two proteases resulted in overlap of their substrate-binding clefts as well as their active-site catalytic residues, His41/Cys145 of the SC-2 Mpro cysteine protease and His57/Ser139 of the HCV NS3/4A serine protease. In light of these structural similarities, HCV protease inhibitors were predicted to bind to the substrate-binding cleft of SC-2 Mpro and inhibit virus replication. The results of the study suggested that ten HCV protease inhibitors had the potential to bind to the Mpro binding cleft through hydrogen bonding and hydrophobic contacts. After biological testing, seven of the HCV drugs inhibited SC-2 replication in human 293T cells with IC50 values from 0.55 to 20.5 μM (Bafna et al., 2021).
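For readers unfamiliar with the root-mean-square deviation quoted above, the following minimal sketch computes it for two equal-length lists of atom coordinates. It assumes the structures have already been superimposed and that atoms are paired in order; it does not perform the structural alignment that a program such as DALI carries out, and the coordinates shown are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in the input units, e.g. angstroms)
    between two paired lists of (x, y, z) coordinates.

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical backbone atoms: the second set is shifted by 3 units along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(set_a, set_b))
```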

Sequence similarity/identity and protein conservation across coronaviruses
This section discusses overall genomic conservation across the coronaviruses (CoVs). There is more than 70% genetic similarity between SC-2 and SC-1, which was responsible for the 2003 SARS outbreak. Pairwise amino acid sequence alignment pointed to a sequence identity of 94.6% between SC-2 and SC-1 in seven conserved replicase domains of the ORF1ab polyprotein. There is higher sequence homology between SC-2 and SC-1 than between SC-2 and MERS-CoV (Xu et al., 2020). Studies have also suggested that SC-2 is closely related to bat SARS-like coronaviruses such as bat-SL-CoVZC45 and bat-SL-CoVZXC21 (Lu et al., 2020). A study by Ceraolo and Giorgi (2020) found a 96.2% sequence identity of the Bat-CoV RaTG13 genome (GISAID EPI_ISL_402131) with the SC-2 reference genome (NC_045512.2), thus confirming the possibility of a zoonotic transmission.
Table 1. Shows the known NSPs with their functions, amino acid length, accession number and Protein Data Bank (PDB) IDs (example PDB ID and the total number of PDB structures as of July 2022) along with percentage similarity between SC-1 and SC-2. The pairwise sequence alignment was performed using EMBOSS Needle at https://www.ebi.ac.uk (Madeira et al., 2022).
The non-coding sequences at the genomic termini of CoVs (the 5′ and 3′ untranslated regions) have gene regulatory functions and are known to encode conserved RNA secondary structures (Chen and Olsthoorn, 2010). One study used 109 representative CoV genomes collected from four genera (alpha, beta, gamma and delta) and the reference genome of SC-2 to find conserved features between them. MSA was performed using Clustal Omega (v1.2.4). The results suggested that there was nucleotide sequence identity between SC-2 and the betaCoVs at various genomic locations such as the 3′ UTR (97.4%), E gene (95.1%), ORF10 (93.8%), 5′ UTR (91.1%) and NSP10 (89.7%). The SC-2 3′ UTR (approximately 30 nucleotides) also shared high sequence identities with deltaCoVs (97% from pigs) and gammaCoVs (94% from chicken and fowls). According to the study, the 3′ UTR and 5′ UTR are conserved within the betaCoV lineage B.
In another study the genome and protein sequences of various CoV isolates from humans, bats, civets and pangolins were downloaded from the NCBI GenBank (Sayers et al., 2022) (www.ncbi.nlm.nih.gov/sars-cov-2) and GISAID (Khare et al., 2021) (www.gisaid.org) databases. The cDNA sequences from these databases were translated into protein sequences using the sequence analysis application MacVector. The protein sequences were aligned using CLUSTAL W v.10 and compared to the reference isolate (Wuhan-Hu-1/2019) to identify mutations. The results of the study indicated 86% sequence identity of the Orf1ab protein between SC-2 and SC-1 (Padhan et al., 2021).
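Percent identity figures such as those quoted in this section are derived from a pairwise alignment. The sketch below shows one common convention, identical aligned positions divided by the alignment length including gap columns. It takes two already-aligned strings as input and does not perform the alignment itself; the function name and example sequences are hypothetical, and other tools or studies may use a different denominator.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Convention assumed here: identical non-gap positions divided by the
    full alignment length (gap columns included in the denominator).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Toy aligned pair (hypothetical): three identical positions out of five columns.
print(percent_identity("MKT-A", "MRTGA"))
```

Because the choice of denominator changes the reported value, identity percentages from different studies are only directly comparable when the same convention and alignment settings were used.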
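Comparing aligned protein sequences to a reference isolate, as described above, amounts to walking the alignment and reporting positions where the residues differ. The following is a minimal sketch of that idea, assuming the two sequences are already aligned; the function name and toy sequences are hypothetical, and insertions or deletions are simply skipped rather than reported.

```python
def list_substitutions(reference, variant):
    """List amino acid substitutions (e.g. 'A5T') between two aligned sequences.

    Positions are numbered along the reference, so gap characters ('-') in the
    reference do not advance the position counter. Indels are ignored.
    """
    if len(reference) != len(variant):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    ref_pos = 0
    for ref_res, var_res in zip(reference, variant):
        if ref_res != "-":
            ref_pos += 1
        if ref_res != "-" and var_res != "-" and ref_res != var_res:
            substitutions.append(f"{ref_res}{ref_pos}{var_res}")
    return substitutions


# Toy example (hypothetical sequences): a single substitution at position 5.
print(list_substitutions("MESLA", "MESLT"))
```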

Variants of SARS-CoV-2
The RNA viruses such as SC-2 are also known to adapt according to their new host which results in mutations and causes genetic evolution over time. These mutations eventually will branch out into different variants of the virus that are initially very similar to the ancestral strain but will tend to mutate further over time to produce more variable mutants. Many SC-2 variants have been reported since the first outbreak of the virus. Some of the variants impact on global health and are recognised as variants of concern (VOC) by the World Health Organization (WHO). Considering the different variants of SC-2, a study by Almubaid and Al-Mubaid (2021) analysed 1200 genomes sampled across the first seven months of 2020. They analysed mutations and mutation frequency/trend using sequence alignment tool Clustal Omega and Jalview for viewing the alignment. They reported that the Orf1a region mostly had synonymous mutations. Some synonymous mutations were also found in NSP12 and NSP13. Fig. 6 shows MSAs of NSP1 of the SC-2 variants and the very high degree of conservation across this NSP.
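The distinction between synonymous and non-synonymous mutations mentioned above can be illustrated with the standard genetic code. The sketch below builds the standard codon table and classifies a single codon change; it assumes cDNA-style codons (T rather than U), and the names `CODON_TABLE` and `classify_codon_change` are hypothetical.

```python
# Standard genetic code, with codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, var_codon):
    """Classify a codon change as 'identical', 'synonymous' or 'nonsynonymous'."""
    ref_codon, var_codon = ref_codon.upper(), var_codon.upper()
    if ref_codon == var_codon:
        return "identical"
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[var_codon]
    return "synonymous" if same_residue else "nonsynonymous"


# GCT and GCC both encode alanine; ACT encodes threonine.
print(classify_codon_change("GCT", "GCC"))
print(classify_codon_change("GCT", "ACT"))
```

A synonymous change leaves the protein sequence unchanged, which is why a region can accumulate such mutations while its amino acid sequence remains conserved.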
Below are the major VOCs; however, more sub-variants have emerged from these over time.

NSP conservation across coronaviruses (CoVs)
The NSP1 of SC-2 has a sequence similarity of approximately 91% with NSP1 of SC-1. Considering structural similarity, the core domain of NSP1 is also similar between the two viruses (Clark et al., 2020; Semper et al., 2021). In one study, 47,427 NSP1 sequences of SC-2 were analysed against the Wuhan-Hu-1 reference strain. The sequence alignment was performed using the EMBOSS diffseq program, which resulted in >97.6% sequence identity between the sequences (Min et al., 2020). NSP1 can halt host translation by interacting with the human 40S ribosomal subunit with the help of a dipeptide motif, Lys164-His165 (Fig. 7) (Narayanan et al., 2008). These residues were found to be conserved in SC-1 and SC-2 in a sequence alignment of the NSP1 C-terminus (Schubert et al., 2020). The interaction between the C-terminus of NSP1 and the human ribosome should be further investigated to understand NSP1-mediated translation inhibition and to develop inhibitors of it.
Fig. 6. Shows the MSA of NSP1 proteins with SC-2 Wuhan as the reference sequence and its variants (one sequence of each variant). The MSA of NSP1 was performed using Clustal Omega (Sievers et al., 2011) and rendered using ESPript version 3.0 (Robert and Gouet, 2014). Red colour shows sequence identity among the variants with the reference sequence. White colour shows mutation and deviation of amino acid sequence (presence of Threonine in Epsilon-NSP1 instead of Alanine).
The NSP2 of SC-2 is less conserved than other NSPs; there is 68% identity with SC-1 NSP2 and just 20% identity with MERS-CoV NSP2. However, four cysteine residues coordinating a Zn2+ ion in a Zn ribbon-like motif are conserved across these viruses (Verba et al., 2021).
NSP3 is the largest multi-domain protein produced by the coronaviruses. There is 83% sequence identity between the NSP3 of SC-2 and SC-1. In contrast, there is only 33% identity between NSP3 of SC-2 and MERS-CoV (Freitas et al., 2020). In one study, the sequence alignment of the SC-1 and SC-2 SUD protein (one of the domains of NSP3) showed that the Lys565, Lys568 and Glu571 residues are conserved (Lavigne et al., 2021). The PDB structure of another NSP3 domain, the papain-like protease domain, shows a homo-trimeric fold which is conserved with its SC-1 and MERS-CoV counterparts. The key residues responsible for binding of ADP-ribose and other related adenosine derivatives with NSP3-macrodomain X are conserved across beta-CoVs (Fig. 8). The structure also has high structural homology to the SC-1 NSP3-macrodomain X.
NSP4, a transmembrane protein, is a component of the replication complex found in CoVs and it has a sequence identity of 80% between different SARS strains (Davies et al., 2020). A study by Gordon et al. (2020) reported 90.8% sequence similarity between NSP4 of SC-2 and SC-1. The C-terminus of NSP4 is found to be conserved among Nidovirales (Neuman, 2016).
The viral protease NSP5 is highly conserved, has a chymotrypsin-like fold (3CLpro) and is also known as the main protease (Mpro) (Anand et al., 2002, 2003; Gorbalenya et al., 1989). There is high tertiary and quaternary structural conservation, especially in domains 1 and 2 of Mpro, across coronaviruses. The N-terminal domains (1 and 2) form a chymotrypsin-like fold consisting of beta-barrels which is known to be highly conserved among coronaviruses (Stobart et al., 2013). Two highly conserved residues (His41-Cys145) are present at the catalytic site of the SC-2 and SC-1 protease (Shitrit et al., 2020). A review by Goyal and Goyal (2020) discussed the importance of targeting the dimerization of the main protease as a broad-spectrum therapeutic strategy. The residues involved in dimerization are highly conserved across beta coronaviruses (Fig. 9).
Highly conserved residues were found at the two oligomer interfaces of NSP7 and NSP8 (Biswal et al., 2021). The structural features of the NSP12-7-8 replication-transcription complex are highly conserved between SC-1 and SC-2 (Kirchdoerfer and Ward, 2019; Gao et al., 2020). A structure-based sequence analysis study suggested that the residues of NSP8 forming interactions with NSP7 are highly conserved among coronaviruses (Biswal et al., 2021). A study by Gordon et al. (2020) reported 99% sequence similarity between NSP8 of SC-2 and SC-1.
The NSP9 monomer of CoVs is also very important for the replication process. An MSA study of NSP9 found that there are two glycine residues (G100 and G104) which are conserved across CoVs (Miknis et al., 2009). A study by Littler et al. (2020) suggested that there is a high level of structural and sequence (97%) conservation of NSP9 between SC-2 and SC-1. The study also reported that NSP9 has a conserved GxxxG interaction motif (the same glycine residues highlighted above) within the α-helix which facilitates a mini coiled-coil dimerization interaction.
Fig. 9. (a) Sequence alignment using Clustal Omega (Sievers et al., 2011) and viewed with Jalview (Troshin et al., 2011, 2018) of NSP5 from SC-2 (YP_009742612.1), SC-1 (YP_009944370.1), and MERS (YP_009047233.1) showing percentage identity (blue colour fading to white as identity decreases). Red boxes highlight the conserved residues involved in Mpro dimerization. (b) SC-2 Mpro (PDB ID: 1UK4) showing the two monomers A and B of the dimer in pink and yellow colour respectively. The conserved residues of monomer A are highlighted in green colour (Ser10, Gly11, Glu14, Asn28, Ser139, Ser144, Ser147, Glu166, Glu290 and Arg298) that can be targeted to inhibit the dimerization of SC-2 Mpro. The picture was generated with MOE (2022).
A recent study found regions conserved within NSP6 of the Coronaviridae family which could be targeted for developing a new therapeutic approach (Gupta et al., 2020). These conserved regions are mostly coiled secondary structure with multiple charged and aromatic amino acids which are responsible for protein-protein interactions. The MSA of 10,664 SC-2 virus genomes across 73 countries confirmed the global stability of conserved regions within orf1ab, specifically in the NSP6 region (Saha et al., 2021).
The SC-2 viral polymerase RdRp (NSP12) helps in replicating the viral genome. Conserved replicase domains indicate common ancestry of the nidoviruses (Gorbalenya et al., 2006). NSP12 also shares conserved active residues with other human coronaviruses such as SC-1, MERS-CoV, HCoV-OC43 and HCoV-229E (Dhankhar et al., 2021) (Fig. 10). Seven conserved polymerase motifs (A to G) are present in the catalytic core of SC-2 NSP12, residing in the palm domain and resembling those of SC-1 NSP12 (Kirchdoerfer and Ward, 2019).
An interesting study by Yazdani et al. (2021) performed sequence alignments of 15 SC-2 proteins and compared the variations across 27 α- and β-coronaviruses. They also mapped druggable binding pockets on the protein structures. The results suggested that a druggable channel overlapping with the RNA binding site of the helicase protein (NSP13) of SC-2 was highly conserved, i.e., >90% of the residues lining the pocket were conserved across 27 α- and β-coronaviruses. Our in-house analysis also confirmed this high level of conservation across ~200,000 SC-2 variants (Fig. 11). The helicase protein is highly conserved across SC-2, SC-1 and MERS-CoV (Fig. 12). This channel has recently been chosen by the CACHE consortium as the target for its second international hit finding competition (https://cache-challenge.org/). Another study found that the residues in the zinc-binding domain of NSP13 making interactions with NSP8 of the replication complex are highly conserved in Betacoronaviruses.
NSP14 acts as an exoribonuclease and N7-MTase in CoVs. An MSA study (using CLUSTALW) of NSP14 protein sequences was performed among different genera: SC-1 (genus Betacoronavirus), HCoV-229E (genus Alphacoronavirus) and IBV (genus Gammacoronavirus), which indicated the presence of conserved residues (Ma et al., 2015). The conserved DEDDh motif in the active site of NSP14 is known to be distributed over three canonical motifs (I, II and III). The SC-2 residues D90/E92 (motif I), D243 (motif II), and D273 (motif III) were reported to be fully conserved (Fig. 13) in the other CoVs (SC-1 and MERS-CoV) (Saramago et al., 2021).
The NSP15 of coronaviruses is known to have endoribonuclease activity and to be conserved in coronaviruses. The residues H235, H250, K290 and T341 present in the catalytic site of the NendoU domain are known to be highly conserved in SARS-CoVs (Joseph et al., 2007). There is 94.8% sequence similarity between the NSP15 of SC-2 and SC-1 (Saramago et al., 2022). The six key residues His235, His250, Lys290, Thr341, Tyr343 and Ser294 (Fig. 14), which are present in the active site of NSP15, are conserved among SC-2, SC-1 and MERS-CoV.
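Claims that named residues such as H235 or K290 are conserved can be checked programmatically once sequences share a common numbering. The sketch below parses residue labels and tests whether every sequence carries the expected residue at that position. It assumes gap-free sequences that follow the reference numbering, which real homologues often do not without prior alignment; the function name and the toy sequences are hypothetical.

```python
import re


def check_residues(labels, sequences):
    """Report whether each labelled residue (e.g. 'H235') is present in all sequences.

    `sequences` maps a name to a protein sequence that is assumed to follow
    the same residue numbering as the labels (1-based).
    """
    report = {}
    for label in labels:
        match = re.fullmatch(r"([A-Z])(\d+)", label)
        if match is None:
            raise ValueError(f"unrecognised residue label: {label}")
        residue, position = match.group(1), int(match.group(2))
        report[label] = all(
            len(seq) >= position and seq[position - 1] == residue
            for seq in sequences.values()
        )
    return report


# Toy sequences (hypothetical, four residues long) to show the output shape.
toy_sequences = {"virus_a": "MHKT", "virus_b": "MHRT"}
print(check_residues(["H2", "K3", "T4"], toy_sequences))
```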

Current treatment of SARS-CoV-2 infection
The infection caused by SC-2 in humans can cause mild, moderate or severe symptoms. After the viral spike protein binds to epithelial cells in the respiratory tract, the virus enters the cells and starts replicating, migrating down the airways and entering alveolar epithelial cells in the lungs. The immune response is triggered by rapid replication of viral particles in the lungs. Respiratory distress and respiratory failure can follow from the resulting cytokine storm. This cytokine storm syndrome is considered to be a major cause of death in patients with COVID-19. In severe cases of COVID-19, life-threatening pneumonia develops. All age groups are susceptible to COVID-19; however, patients over 60 years of age with co-morbidities are more likely to develop severe respiratory disease (Guan et al., 2020).
There are a variety of therapeutic modalities available under FDA-issued Emergency Use Authorization (EUA) or being evaluated for the management of COVID-19; for example, antiviral drugs (e.g., molnupiravir, paxlovid or remdesivir), anti-SC-2 monoclonal antibodies (e.g., bamlanivimab/etesevimab or casirivimab/imdevimab), anti-inflammatory drugs (e.g., dexamethasone) and immunomodulatory agents (e.g., baricitinib or tocilizumab) (Coopersmith et al., 2021). To contain the global pandemic, vaccination is an important approach for preventing and slowing down the transmission of SC-2. Researchers all around the world have worked on developing novel vaccines against SC-2 and, according to ourworldindata.org, 11.96 billion vaccine doses had been administered as of 16th June 2022.
Conserved enzymes such as 3CLpro/Mpro (NSP5), papain-like protease (PLpro) (NSP3) and RdRp (NSP12) are considered promising drug targets as these enzymes have sequence similarities with viruses like SC-1 and MERS-CoV (Zumla et al., 2016). It can be hypothesised that, due to the sequence similarities between SC-1, MERS-CoV and SC-2, therapeutic molecules used for targeting SC-1 and MERS-CoV could also prove to be useful for SC-2 with a similar efficacy. The search for antiviral drugs has primarily concentrated on repurposing existing drugs, for example remdesivir, which is an antiviral drug acting on HIV reverse transcriptase and was approved for emergency treatment of COVID-19 patients (Parks and Smith, 2020).

Protease inhibitors
Viral proteases are productive targets for antiviral therapies. The virtual screening of HIV protease inhibitors lopinavir, ritonavir, darunavir and cobicistat (Table 2) against SC-2 3CLpro (NSP5) predicted inhibition of the protease. Lopinavir showed a significant inhibitory effect on SC-2 in Vero E6 cells with an EC50 of 26.63 μM (Choy et al., 2020). A group in the USA used a docking approach and found that the anticoagulant agent dipyridamole inhibited Mpro, as determined by an in vitro surface plasmon resonance assay. More than 50% of SC-2 replication was inhibited at a concentration of 100 nM in Vero E6 cells.
Fig. 13. Sequence alignment using Clustal Omega (Sievers et al., 2011) and viewed with Jalview (Troshin et al., 2011, 2018) of NSP14 from SC-2 (UniProt ID: P0DTD1), SC-1 (UniProt ID: P0C6X7) and MERS (YP_009047225) showing conserved residues (red box) within the different motifs, colour scheme (blue) = percentage identity.
An orally administered drug known as S-217622 (ensitrelvir), which is also an SC-2 Mpro inhibitor, decreases viral load and ameliorates disease severity in SC-2-infected hamsters (Sasaki et al., 2023). It also inhibited viral proliferation in Vero E6 cells with an EC50 value of 0.35 μM (Unoh et al., 2022).
In December 2021 the FDA authorized Paxlovid (nirmatrelvir and ritonavir), an Mpro inhibitor, under emergency use authorization against SC-2 infection for the treatment of mild-to-moderate COVID-19 in adults and paediatric patients (12 years of age and older weighing at least 40 kg). A recent study suggests that Paxlovid is highly effective at reducing the risk of severe COVID-19 or mortality. In the study 4737 patients were treated with Paxlovid who had adequate COVID-19 vaccination status. Both Paxlovid treatment and adequate vaccination status were associated with a significant decrease in the rate of severe COVID-19 or mortality (Najjar-Debbiny et al., 2022). Heilmann et al. (2023) identified several mutations such as Y54C, G138S, L167F, Q192R, A194S and F305L in the SARS-CoV-2 Mpro that confer resistance to the Mpro inhibitors nirmatrelvir and ensitrelvir.
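The EC50 values quoted in this section are the concentrations at which half of the maximal effect is observed. As a rough illustration, the sketch below evaluates a simple Hill-type dose-response curve. The Hill coefficient of 1 and the assumption that the effect runs from 0 to 100% are illustrative choices, not parameters reported by the cited studies.

```python
def fraction_inhibited(conc_um, ec50_um, hill=1.0):
    """Fraction of maximal inhibition at a given concentration (same units as EC50).

    Simple Hill-type model; hill=1.0 is an assumed coefficient, not a measured one.
    """
    if conc_um < 0 or ec50_um <= 0:
        raise ValueError("concentration must be >= 0 and EC50 must be > 0")
    if conc_um == 0:
        return 0.0
    return conc_um ** hill / (ec50_um ** hill + conc_um ** hill)


# At a concentration equal to the EC50, the model gives half-maximal inhibition.
print(fraction_inhibited(0.35, 0.35))
```

Under this model, a compound with a lower EC50 reaches the same fractional inhibition at a lower concentration, which is why EC50 is used to compare potency across compounds tested in the same assay.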

RdRp inhibitors
NSP12 is where the RdRp activity of the virus resides. This protein is necessary for the viral replication process, so it is a promising antiviral drug target. NSP12 is targeted by small molecule inhibitors like remdesivir (GS-5734), favilavir (T-705) and ribavirin (Tahir Ul Qamar et al., 2020). Among these, remdesivir has shown efficacy against SC-2 in vitro and in vivo (Williamson et al., 2020). In contrast, a large-scale study conducted on 5000 participants by the World Health Organization's Solidarity trial consortium found that remdesivir had little or no effect on hospitalized patients with COVID-19, as indicated by overall mortality, initiation of ventilation, and duration of hospital stay (WHO Solidarity Trial Consortium, 2021).
Favilavir, which is an antiviral drug used for treating influenza, has been approved in China, Russia and India for the treatment of COVID-19. The time to viral clearance was reported to be reduced by the use of favilavir in a clinical study conducted in China (Cai et al., 2020). In December 2021 the FDA issued an EUA for Merck's oral pill molnupiravir (NSP12 inhibitor) for treatment of SC-2 infection. A recent study conducted on 40,776 patients infected with SC-2 during the BA.2 wave found that initiation of novel oral antiviral treatments such as molnupiravir or nirmatrelvir-ritonavir in hospitalised patients not requiring oxygen therapy on admission showed substantial clinical benefit (Wong et al., 2022). Taking the example of HCV (a Group IV virus), the RdRp (NS5B polymerase) is targeted by sofosbuvir (Table 3), which is a nucleotide analogue inhibitor (Summers et al., 2014). A study by Sacramento et al. (2021) found that sofosbuvir alone and combined with daclatasvir inhibited replication of SC-2 in Calu-3 cells with EC50 values ranging from 0.5 to 0.7 μM.

Other inhibitors
The SC-2 NSP13 is the helicase protein, which is known for separating and rearranging the viral nucleic acid duplex prior to transcription and replication (Kwong et al., 2005). The helicase of SC-2 has 99.8% sequence identity to the SC-1 NSP13 helicase. Some reported inhibitors include bananins, 5-hydroxychromone derivatives, ADKs and SSYA10-001, which have potential to be used in the treatment of COVID-19 as they had an inhibitory effect on SC-1 (Tanner et al., 2005; Kim et al., 2011; Lee et al., 2009; Adedeji et al., 2012). Among these compounds, SSYA10-001 presented weaker antiviral activity against SC-2 in Vero E6 cells, with an EC50 value of 81 μM. In the same study another compound, FPA-124, was tested for antiviral activity against NSP13 of SC-2 in Vero E6 cells, giving an EC50 value of 14 μM (Zeng et al., 2021).
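To make the notion of percentage sequence identity concrete, the following is a minimal sketch of how it can be computed from two sequences that have already been aligned. The function name `percent_identity` and the toy sequences are illustrative assumptions, not real NSP13 data, and the convention of skipping gapped columns is only one of several definitions in use; alignment tools may normalise differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 7 ungapped columns, 6 of them identical.
toy_a = "ACDEFG-H"
toy_b = "ACDQFGKH"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Under this convention a very high identity between two helicases of several hundred residues corresponds to only a handful of differing positions, which is why inhibitors characterised against one homologue are reasonable starting points for the other.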

CADD studies
The drug discovery process has been accelerated with the help of computer-aided drug design (CADD). Used appropriately, it can shorten timelines and is a cost-effective approach for pharmaceutical companies to identify novel hit molecules and to improve a molecule's drug-like properties. In silico virtual screening of possible inhibitors against target proteins can give insight into the mechanism of action (MOA) of a molecule by analysing the binding interactions between the two. Considering the urgency of discovering drug treatments for COVID-19, repurposing of existing drugs is a productive approach for finding quick therapeutic alternatives. The popular protein targets of SC-2 have been the proteases and the polymerase. Using docking based virtual screening (DBVS), Pant et al. (2020) calculated the binding affinity of FDA approved protease inhibitors against Mpro of SC-2. After screening large databases such as ZINC/ChEMBL, drugs such as cobicistat, ritonavir, lopinavir and darunavir were found to be the top hit molecules, but these hits were not confirmed in vitro. Subsequent biological evaluation of these compounds by other groups highlighted variation in prediction success. An in vitro study by Shytaj et al. (2022) found that cobicistat does not inhibit the enzymatic activity of Mpro but has an effect on S-glycoprotein maturation or function. The results showed that cobicistat can inhibit S-glycoprotein fusion with an IC50 of 3.8 μM. Lopinavir and ritonavir had IC50 values of 12.01 μM and 19.88 μM, respectively, against SC-2 infection in Vero E6 cells. On the other hand, darunavir showed no antiviral activity against SC-2 at clinically relevant concentrations (EC50 > 100 μM) (De Meyer et al., 2020).
Considering protease inhibitors, one study reported that many drugs displaying diverse therapeutic actions had high predicted binding affinity to PLpro. These drugs were a series of anti-viral drugs (ribavirin, valganciclovir and thymidine), anti-bacterial drugs (chloramphenicol, cefamandole and tigecycline), a muscle relaxant drug (chlorphenesin carbamate) and an anti-tussive drug (levodropropizine). In silico studies have suggested that ribavirin has a broad-spectrum impact on SC-2, acting at different viral proteins. It has been reported to show antiviral activity in Vero E6 cells following SC-2 infection, as it decreases the expression of TMPRSS2 at both mRNA and protein levels 48 h after treatment (Unal et al., 2021). While promising, this does not necessarily mean that the MOA is via PLpro. The RdRp inhibitors remdesivir and favipiravir have been shown to have efficacy against SC-2. These were also found to be amongst the high-ranking nucleotide analogues that could bind tightly to RdRp.
The NSPs are now emerging as a new avenue of investigation for identifying possible inhibitors that could block protein activity, thus stopping viral replication. A study by Tazikeh-Lemeski et al. (2020) used a DBVS approach to screen FDA approved drugs against NSP16 to predict their inhibitory activity. The study found anti-viral drugs (maraviroc and raltegravir) and an anti-inflammatory drug (prednisolone) to be effective drug candidates against NSP16. These compounds are predictions and have not been tested against SC-2 NSP16. Our previous study (Kandwal and Fayne, 2022), which focussed on elucidating possible mechanisms of action of compounds that were active against whole virus SC-2, targeted the NSPs of SC-2 and resulted in target predictions for many inhibitors based on pharmacophore based virtual screening. For example, the pharmacophore features selected for NSP15 resulted in AT-9283, acadesine, olomoucine, sapropterin and tetrahydrofolic acid showing promising interactions in the NSP15 binding site. Computational drug discovery approaches can help in discovering small molecule inhibitors of SC-2 viral proteins, but there is also a need to validate these hits with in vitro and in vivo assays. The structural proteins, such as the spike protein, can also be good targets. Many drugs, such as an anti-hypertensive drug (rescinnamine), an anti-fungal drug (posaconazole), an anti-bacterial drug (sulfasalazine) and an anti-coagulant drug (dabigatran etexilate), showed high predicted binding affinity for the spike protein of SC-2. Among these drugs, posaconazole specifically blocks SC-2 entry, possibly by selectively targeting the "E-L-L" motif of the S protein. The IC50 of the drug was estimated to be 3.37 μM in HEK293T-ACE2 cells (Jana et al., 2022). An interesting study by Lau et al. (2021) identified top scoring compounds from docking calculations, such as imatinib, lapatinib, adapalene and candesartan cilexetil, that were predicted to inhibit the binding domain of spike 1 of SC-2, and followed this up with an in vitro cellular infection assay. Among them, imatinib and lapatinib had IC50 values of approximately 10 μM.

Discussion
The sequence similarity between SC-2 and SC-1 is quite striking. This can be the foundation for medicinal chemistry studies seeking new therapeutic alternatives for the infections caused by these viruses. The combined efforts of various research disciplines, such as bioinformatics, computational drug design, virology and clinical studies, can help speed up the task of finding therapeutics for COVID-19. One way of finding a drug alternative is through repurposing of known drug molecules that have well-defined clinical profiles. Most drug development studies have focussed on targeting the viral proteases, polymerases, and structural proteins. There is a need to explore the NSPs as potential targets for treating COVID-19. The group 4 viruses have +ssRNA genomes and are divided into various families that share some conserved sequences. For example, NSP12 of the Coronaviridae family and NSP9 of the Arteriviridae family have approximately 200-400 highly conserved residues in the NiRAN and RdRp domains (Xu et al., 2003; Snijder et al., 2016). Considering the order Nidovirales, the N-terminal of NSP13 is found to be the most conserved domain (Gorbalenya, 2001; Nga et al., 2011). Within the Coronaviridae family there is spike protein similarity shared between SC-2 and SC-1 (Wan et al., 2020).
All these studies suggest sequence conservation between various +ssRNA viruses. Taking into consideration the sequence conservation among other viral families within group 4, it is also important to explore the structural similarities of various viral proteins. For example, the protease (NS3/4A) of the hepatitis C virus shares structural similarities (substrate binding cleft, active site) with the protease (Mpro) of SC-2 (Bafna et al., 2020). Such studies can pave the way towards finding drug alternatives that can be repurposed for treating SC-2 infection. Bioinformatics tools, such as MSA studies of viral proteins of SC-2 with other related SARS-CoV viruses or other group 4 viruses, can be very useful for identifying conserved residues. These residues may, in turn, define binding sites or clefts that can be initially targeted by repurposed drugs or, subsequently, tailored small molecule inhibitors. For example, the crystal structure of RdRp bound to remdesivir shows that it forms interactions with the conserved residues K545 and R555 present in the RdRp of SC-2. Sinefungin, co-crystallised with NSP16, is known to interact with the conserved residues present inside the active site (Krafcikova et al., 2020).
There is a need to perform a comprehensive sequence alignment study on these viruses, covering the currently available SC-2 dataset of ~12 million sequences (GISAID; Khare et al., 2021), SARS-CoV related viruses and other group 4 viruses. The conserved regions that have a role in protein activity could be targeted for further binding site analysis with suitable inhibitors.
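The identification of conserved residues from a multiple sequence alignment, as discussed in this section, can be illustrated with a minimal sketch. It assumes the alignment has already been produced by an external MSA tool and is supplied as equal-length strings with '-' for gaps. The function names, the 100% threshold and the toy alignment are hypothetical choices for illustration only; a real analysis would likely use an entropy-based or substitution-aware score.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Per-column fraction of sequences carrying the most common residue.

    Gaps are not counted as residues but remain in the denominator, so a
    gapped column can never score as fully conserved.
    """
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    scores = []
    for column in zip(*aligned_seqs):
        residues = [res for res in column if res != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no information
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(aligned_seqs, threshold=1.0):
    """1-based alignment positions whose conservation meets the threshold."""
    scores = column_conservation(aligned_seqs)
    return [i + 1 for i, score in enumerate(scores) if score >= threshold]


# Hypothetical toy alignment of three short sequences (not real viral data).
toy_alignment = ["MKDRA", "MKDKA", "MRD-A"]
print(conserved_positions(toy_alignment))
```

Positions reported this way are alignment columns, so they would still need to be mapped back to residue numbering in a reference protein before being compared with structural data on binding sites.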

Conclusions and future directions
Multiple studies have highlighted that there are sequence similarities across human CoVs, which should enable the identification of conserved regions within the viral genome and the targeting of conserved residues present in protein binding sites. As a significant amount of research has focused on spike and protease viral targets, we suggest that the NSPs, which are essential for viral replication, require much more detailed study. We propose using bioinformatics tools to determine conserved regions between SC-2 variants, other SARS viruses and the broader group 4 virus family. These conserved regions may represent novel starting points for designing small molecule interventions. Their conservation points to their importance in the viral life cycle, within which they may form key protein-protein interactions (Fayne, 2013) or interact with co-factors or other essential biological agents. Moving forward, we recommend using computational drug design approaches to virtually screen existing and designed drug libraries to elucidate interactions between novel NSP binding sites and potential small molecule inhibitors. Due to the novelty of these sites, it is likely that no positive control compounds will be available, and there may be difficulties developing specific in vitro assays that can determine whether binding is occurring at the novel site. The most active compounds should be tested in on-target NSP binding assays followed by SC-2 replication assays to confirm antiviral activity. Due to the conservation of these sites across multiple viruses, it may be possible to discover pan-viral small molecule inhibitors with the potential to inhibit target NSPs across group 4 viruses, thus offering protection from possible future outbreaks of related viruses.

Declaration of competing interest
We declare that we have no conflict of interest.

Acknowledgements
The research conducted in this publication was funded by the Irish Research Council under grant number GOIPG/2021/954. The Trinity Biomedical Sciences Institute (TBSI) is supported by a capital infrastructure investment from Cycle 5 of the Irish Higher Education Authority's Programme for Research in Third Level Institutions (PRTLI). We thank the software vendors for their continuing support of academic research efforts, in particular the contributions of the Chemical Computing Group (CCG), Biovia and OpenEye Scientific. The support and provisions of Dell Ireland, the Trinity Centre for High Performance Computing (TCHPC) and the Irish Centre for High-End Computing (ICHEC) are also gratefully acknowledged. We also thank Drs Karsten Hokamp and Fiona Roche for their bioinformatics insight and assistance.

Abbreviations
PLpro: Papain-like protease
RdRp: RNA-dependent RNA polymerase
SARS-CoV: Severe acute respiratory syndrome-related coronavirus
SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus 2
SC-1: Severe acute respiratory syndrome-related coronavirus 1
SC-2: Severe acute respiratory syndrome-related coronavirus 2
-ssRNA: Negative sense, single stranded ribonucleic acid
VOC: Variants of concern
WHO: World Health Organization
ZBD: Zinc-binding domain