In-silico nucleotide and protein analyses of S-gene region in selected zoonotic coronaviruses reveal conserved domains and evolutionary emergence with trajectory course of viral entry from SARS-CoV-2 genomic data

Introduction: the recent outbreak of a novel zoonotic coronavirus disease (COVID-19) has made it necessary to understand the evolutionary pathway of zoonotic viruses that adversely affect human populations, so that therapeutic constructs can be developed to combat the pandemic now and in the future. Methods: we analyzed conserved domains of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) for possible targets of viral entry inhibition in host cells, the evolutionary relationship of human coronavirus (229E) and zoonotic coronaviruses with SARS-CoV-2, and the evolutionary relationship between selected SARS-CoV-2 genomic data. Results: conserved domains with antagonistic action on host innate antiviral cellular mechanisms in SARS-CoV-2 include nsp 11 and nsp 13, among others. Multiple sequence alignments of the spike (S) gene protein of selected candidate zoonotic coronaviruses alongside the S gene protein of SARS-CoV-2 revealed the closest evolutionary relationship (95.6%) with the pangolin coronavirus S gene. Clades formed between Wuhan SARS-CoV-2 phylogeny data and five others suggest a viral entry trajectory, while revealing genomic and protein SARS-CoV-2 data from the Philippines as early ancestors. Conclusion: phylogeny of SARS-CoV-2 genomic data suggests that profiling in diverse populations, with and without the outbreak, alongside migration history and racial background, is needed for mutation tracking and dating of viral subtype divergence, which is essential for effective management of present and future zoonotic coronavirus outbreaks.


Introduction
Coronaviruses (CoVs) are enveloped viruses with a positive-sense, single-stranded RNA genome belonging to the Coronaviridae family [1]. CoVs are divided into alpha, beta, gamma and delta groups, and the beta group is further composed of A, B, C and D subgroups [2]. The novel virus (2019-nCoV/SARS-CoV-2) belongs to the 2B group of the beta-coronavirus family, a family which also includes SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV) [3]. Entry of these viruses through respiratory and oesophageal routes accounts for mild to severe acute respiratory syndromes, which have led to global epidemics with high morbidity, mortality and immense economic losses in affected human populations [4,5]. Encoded within the 3' end of the viral genome are the four main structural proteins of coronavirus particles: spike (S), membrane (M), envelope (E) and nucleocapsid (N) [6], as shown in Figure 1.
Phylogenetic analyses of 15 human CoV whole genomes revealed that the 2019 novel CoV (2019-nCoV) genome shares the highest nucleotide sequence identity with SARS-CoV (79.7%), while its two evolutionarily conserved regions (envelope and nucleocapsid proteins) had sequence homology of 96% and 89.6% with SARS-CoV, respectively [3]. Hence the nomenclature adopted for the novel coronavirus behind the outbreak. Surface proteins which stick out like crown tips (spikes) on coronaviruses bind to host cell receptors, namely angiotensin-converting enzyme 2 (ACE2), in host epithelial cells. The S1 subunit (N-terminal) of the surface protein facilitates binding to the ACE2 receptor, while the S2 subunit (C-terminal) mediates host cell entry through the binding of the viral S protein to human dipeptidyl peptidase 4 (DPP4), marking the onset of infection [7,8].
Interestingly, conserved domains of CoVs have been indicated in the literature as vital entry targets in vaccine and drug development [9,10]. However, growing variability and mutational changes in viruses can cause a lack of specificity and reduce the efficiency of therapeutic measures. Recombination serves a central function in virus replication and evolution in viral infections such as HIV, Ebola and MERS [11,12], while molecular mechanisms (RNA fragmentation and trans-esterification reactions) are possible causes of the ligation of RNA fragments and the subsequent increase in novel recombination frequency observed among various RNA viruses [13]. Diverse host factors account for a great deal of genome variability in viral recombinants, which ranges from multi-resistance to evolutionary novelties [14]. The emergence of novel viral variants trafficked by humans and animals alike through global travel has remained a constant threat to public health and has increased the complexity of host-viral interactivity in viral adaptation and evolution [15].

Methods
Comparison and analyses of conserved domain of 2019-nCoV/SARS-CoV-2 protein: the reference SARS-CoV-2 sequence (initial entry with RefSeq number NC_045512.1) was retrieved from the National Center for Biotechnology Information (NCBI) database and a query for its conserved domains (CDS) was launched using affiliated resources. Proteins with similar conserved domains were included in the subsequent multiple sequence alignment of the spike gene of the zoonotic coronaviruses investigated in this study. Their nucleotide and S gene protein sequences were pooled using NCBI resource tools, while analysis was done using EMBOSS Needle, Clustal W2 and Clustal Omega, respectively.
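The pairwise comparisons described above rest on global (Needleman-Wunsch) alignment, the algorithm behind EMBOSS Needle. The following is a minimal sketch of that technique, not the EMBOSS implementation: the scoring values (match, mismatch, linear gap penalty) and the toy sequences are assumptions chosen for illustration, whereas EMBOSS Needle uses a substitution matrix with separate gap-open and gap-extend penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings.

    Scoring values are illustrative assumptions, not EMBOSS defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical columns divided by alignment length (gaps included)."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


# Toy nucleotide strings, not sequences from the study.
top, bottom = needleman_wunsch("ACGT", "AGT")
print(top, bottom, percent_identity(top, bottom))
```

Identity is computed here over the full alignment length, gap columns included, which is one common convention; other tools may use a different denominator.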
Homology and phylogeny analysis of the S-protein genes in candidate zoonotic viruses: the identified spike gene protein sequences of animal coronaviruses were retrieved from submitted protein entries in the NCBI database; homology of the sequences was analyzed using Clustal Omega and EMBOSS Needle, while phylogenetic trees were constructed using the neighbor-joining method in CLUSTAL X software.
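The neighbor-joining method mentioned above can be sketched in a few lines. This is a plain implementation of the standard Saitou-Nei procedure under stated assumptions, not the CLUSTAL X code: the function name, the internal node labels (`node0`, `node1`, ...) and the five-taxon distance matrix are hypothetical and are not data from this study.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix.

    Returns (joins, last_edge). Each join is
    (child_a, child_b, branch_len_a, branch_len_b, new_node_name);
    last_edge connects the two nodes remaining at the end.
    """
    names = list(names)
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    joins, next_id = [], 0
    while len(names) > 2:
        n = len(names)
        total = {a: sum(d[a][b] for b in names) for a in names}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = names[x], names[y]
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the two joined nodes to their new parent.
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"node{next_id}"
        next_id += 1
        joins.append((a, b, len_a, len_b, new))
        # Distances from the new parent to every remaining node.
        d[new] = {new: 0.0}
        for c in names:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        names = [c for c in names if c not in (a, b)] + [new]
    a, b = names
    return joins, (a, b, d[a][b])


# Hypothetical five-taxon distance matrix for illustration only.
taxa = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
joins, last_edge = neighbor_joining(taxa, matrix)
for step in joins:
    print(step)
print(last_edge)
```

Ties in Q are broken by taking the first minimal pair encountered, so the order of joins, though not the resulting tree for additive distances, can depend on input order.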

SARS-CoV-2 sequence and phylogenetic analyses: in total, we culled the respective genomic and protein data of eight (8) 2019-nCoV/SARS-CoV-2 clinical isolates from the beta coronaviruses database in NCBI, and these are: [...] [7] MT308703; QIV64975.1 (USA, April 2020) and [8] MT308704 (USA, April 2020). Whole-genome alignment and protein sequence identity calculation were performed using multiple sequence alignment in the EMBL-EBI database with default parameters in Clustal W2 and Clustal Omega, respectively.
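Once isolates have been aligned, sequence identity between every pair can be tabulated from the gapped alignment. The sketch below illustrates that calculation under stated assumptions: the function name, the isolate labels and the toy sequences are hypothetical, and the convention used (columns where both sequences have a gap are skipped, and the remaining columns form the denominator) may differ from the one applied by the Clustal tools.

```python
def pairwise_identity(aligned):
    """Percent identity for every pair in a multiple sequence alignment.

    aligned: dict mapping a label to its gapped sequence (all equal length).
    Returns a dict mapping (label_a, label_b) to percent identity.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("all aligned sequences must have the same length")
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Ignore columns that are gaps in both sequences of this pair.
            columns = [(x, y) for x, y in zip(aligned[a], aligned[b])
                       if not (x == "-" and y == "-")]
            same = sum(x == y for x, y in columns)
            result[(a, b)] = 100.0 * same / len(columns) if columns else 0.0
    return result


# Hypothetical labels and toy sequences, not isolates from the study.
toy_alignment = {
    "isolate_1": "ACGT-A",
    "isolate_2": "ACGTTA",
    "isolate_3": "AC-T-A",
}
for pair, identity in pairwise_identity(toy_alignment).items():
    print(pair, round(identity, 1))
```

Subtracting each percentage from 100 gives a simple distance that could feed a tree-building step, although dedicated tools apply evolutionary corrections not shown here.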

Evolutionary patterns from SARS-CoV-2 isolates: an increased level of evolutionary divergence was observed in submitted entries of the recent SARS-CoV-2 genomic data during the time of the study (entries from December 2019 till 4th April), as seen in Figure 3 and Figure 4, while evolutionary patterns observed between Wuhan SARS-CoV-2 data and those of five other geographical locations reveal the trajectory of infection from the reported source of the outbreak.

Discussion
The region of the 2019-nCoV genome which encodes nsp 11 spans from about 18046-19824 bp. It has been indicated in countering the host innate antiviral response via inhibition of type I interferon (IFN) production using NendoU activity-dependent mechanisms in porcine reproductive and respiratory syndrome viruses [16]. RNA microarray analysis has also associated nsp 11 with pathways such as programmed cell death evasion, mitogen-activated protein kinase signaling, histone-related processes, cell cycle and DNA replication, and the ubiquitin-proteasome pathway [17][18][19][20], and a few nsp 11 inhibitors reported include papain-like proteinase (plPRO) and 3C-like main protease (3CLpro) [21]. The coronavirus RNA-directed RNA polymerase (RdRp) N-terminus domain covers the N-terminal region of the coronavirus RdRp. It spans from about 13480-14538 bp in SARS-CoV-2 and its interaction with nsp3 has been indicated in viral replication, especially during early onset of infection [22]. The inhibitors of coronavirus RdRp include ATP inhibitors with mfScores lower than 110 [21]. The nsp 13 is regarded as a highly conserved and multifunctional helicase unit and it spans from about 20662-21537 bp in the SARS-CoV-2 isolate [23].
As a SARS-CoV-type helicase, nsp13 belongs to a group of enzymes chiefly concerned with RNA processing, DNA replication, recombination and repair, transcription and translation [24]. A few potential inhibitors of nsp13 have been identified [25,26] and they act by interfering with its unwinding and ATPase activities. The coronavirus S2 superfamily spans from 23546-25372 bp and forms the characteristic 'corona' after which the group is named. CoV diversity is reflected in the variable spike proteins (S proteins), which have evolved into forms differing in receptor interactions and in response to various environmental triggers of virus-cell membrane fusion [27]. The C-terminal (S2) domain directs ectodomain fusion of all CoV spike proteins following receptor binding [28,29]. The level of interaction between the S protein and the virus receptor controls the host cell range [30]. A study showed a switch of species specificity via a mutant mouse hepatitis virus (MHV) construct, which conferred horizontal gene transfer and the ability to infect feline cells, an ability initially absent in wild-type MHV [30]. This was achieved via the substitution of the spike glycoprotein ectodomain. Another study [31] also indicated a role of natural mutations in reactivity between the receptor binding domain of spike and cross-neutralization between palm civet coronavirus and SARS-CoVs.
Identification of the origin, natural host(s) and evolutionary pathway of viruses which cause pandemics is essential to understand the molecular mechanism of their cross-species interactivity and to implement proper control measures [32]. Protein sequence alignment analyses reveal the closest evolutionary conservation between 2019-nCoV/SARS-CoV-2 and pangolin S protein, with 95.6% similarity and 92.1% identity, while 46.8% similarity and 31.2% identity were observed between SARS-CoV-2 and bat S protein (supplementary data). This finding therefore agrees with reports indicating pangolin as a more recent ancestor of SARS-CoV-2 than bats [33,34], which could have arisen as a result of recombination (chimera) or interactions between a pangolin-CoV-like virus and a bat-CoV-RaTG13-like virus, going by the homology and subclade of SARS-CoV-2 and pangolin S genes relative to the bat S gene seen in this study (Figure 2). Although some computational analyses predicting the improbability of direct binding between receptor binding domains (RBDs) in SARS-CoV-2 and ACE2 in humans suggest otherwise [35,36], studies have demonstrated cross-species interactivity through structural (in-silico), in-vitro and in-vivo mechanisms [31,37-39].
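The distinction between the similarity and identity percentages quoted above can be made concrete with a short sketch. Identity counts aligned positions with the same residue, whereas similarity also counts positions occupied by residues considered biochemically alike. The residue groups below are a hypothetical simplification standing in for the positive substitution-matrix scores that alignment tools typically use, and the peptide strings are toy examples, not the S protein sequences analyzed here.

```python
# Hypothetical groups of amino acids treated as "similar"; a simplified
# stand-in for positive scores in a substitution matrix such as BLOSUM62.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                  set("DE"), set("ST"), set("NQ")]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two gapped sequences.

    Both percentages use the full alignment length as the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap columns count toward neither total
        if x == y:
            identical += 1
            similar += 1  # identical residues are also similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aligned_a)
    return 100.0 * identical / length, 100.0 * similar / length


# Toy aligned peptides for illustration only.
print(identity_and_similarity("MKVLD", "MRIL-"))
```

Because every identical position is also counted as similar, similarity can never fall below identity, which matches the ordering of the two percentages reported for both the pangolin and bat comparisons.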
A series of in-vivo and in-vitro RNA recombination events leading to vast genetic variability of positive-strand RNA viruses has also been reported [13]. Domestication, consumption and wildlife activities which result in natural selection on a human or human-like ACE2 receptor [33,36] raise the possibility of SARS-CoV-2 emergence from pangolin. The receptor-binding domain (RBD) in the spike protein and functional polybasic (furin) cleavage site at the S1-S2 boundary [33,40] [...]. This, amongst others, necessitates the strict travel bans, laws and confinement strategies adopted in different countries to curb its spread. Surprisingly, genomic and protein data from the Philippines suggest otherwise (Figure 3 and Figure 4). Despite the limited data used for SARS-CoV-2 genomic profiling in this study, we found viral subtype divergence (considering distance metrics of SARS-CoV-2 with entries) (Figure 3 and Figure 4), suggesting a population-specific post-translational modification which could have been influenced by genetic makeup. This is presumed based on subclades formed between protein sequence data from the Philippines (BCA37476.1 and BCA37477.1), and another between China (QHD43415.1) and the Philippines (BBZ90167.1), both countries being on the same continent. Also, empirical data point to genetic and epigenetic factors in SARS-CoV evolution, incidence and infection rates amongst diverse populations and across different racial backgrounds [44].

Conclusion
Viral cellular mechanisms are vital factors necessary for replication during infection. Hence, identification of domains of viral entry and of evasion of antiviral mechanisms in host cells is essential for the development of effective therapeutic measures. Conserved domains found in this study that are vital target sites for inhibition of SARS-CoV-2 viral entry and replication in host cells include nsp11, nsp 13, RdRp and the coronavirus S2 superfamily, while compounds such as RNA aptamers, ATP inhibitors, papain-like proteinase (plPRO) and 3C-like main protease (3CLpro) are viable indicated inhibitors of these domains. Understanding the evolutionary pathway of the novel coronavirus transmission will not only help combat the current pandemic but also assist in mutation tracking for identifying future zoonotic coronavirus threats. The phylogenetic analyses of the candidate zoonotic coronavirus S gene with SARS-CoV-2 revealed pangolin as the most recent ancestor, which formed a sub-clade with the bat S gene, suggesting interspecies recombination of CoV in bats and pangolins. The evolutionary pattern observed between SARS-CoV-2 genomic data from the source of the outbreak and the recent entries analyzed in this study showed the relative trajectory of infection from the source to other places, except for protein data from the Philippines, which suggest an earlier existence of SARS-CoV-2 that should be further investigated. Also, genomic and protein data revealed racial viral subtype divergence and a rapid rate of mutation despite the novelty of the outbreak.
Precise dating of viral subtype divergence will enable researchers to correlate divergence with epidemics and pandemics via viral sequence sampling, for proper time-scale measurements of zoonotic threats in human populations. Therefore, there is an urgent need for large-scale analysis and profiling of SARS-CoV-2 genetic data in affected populations, especially in Africa, where there is a paucity of genomic SARS-CoV data for effective therapeutic measures.

What is known about this topic
•