A nidovirus perspective on SARS-CoV-2

Two pandemics of respiratory distress diseases associated with zoonotic introductions of the species Severe acute respiratory syndrome-related coronavirus into the human population during the 21st century raised unprecedented interest in coronavirus research and gave it a previously unseen urgency. The two viruses responsible for the outbreaks, SARS-CoV and SARS-CoV-2, are in the spotlight, and SARS-CoV-2 is the focus of the current fast-paced research. The foundation of this research was laid by studies of many corona- and related viruses that collectively form the vast order Nidovirales. Comparative genomics of nidoviruses has played a key role in this advancement for more than 30 years. It facilitated the transfer of knowledge from characterized to newly identified viruses, including SARS-CoV and SARS-CoV-2, and contributed to the dissection of the nidovirus proteome and the identification of patterns of variation between different taxonomic groups, from species to families. This review revisits selected cases of protein conservation and variation that define nidoviruses, illustrates the remarkable plasticity of the proteome during nidovirus adaptation, and asks questions at the interface of the proteome and processes that are vital for nidovirus reproduction and could inform the ongoing research on SARS-CoV-2.


From SARS-CoV-2 to nidoviruses and back
The loss of many hundreds of thousands of human lives to COVID-19, a severe respiratory disease, and its complications raised unprecedented interest in its causative agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) of the species Severe acute respiratory syndrome-related coronavirus [1–4], as well as in other coronaviruses. The state of the art of coronavirus research is in the spotlight and is scrutinized by scientists, stakeholders and the public. It reveals a group of remarkable RNA viruses with extraordinary genome sizes and complex biology, whose characterization was largely advanced by a relatively small group of dedicated researchers until 2020 (e.g. see reviews in [5–7]). Also apparent are considerable gaps in our understanding of the fundamentals of coronavirus infection that complicate the development of safe and efficient vaccines and antivirals against COVID-19. Mobilization of resources and expertise across many fields of science is addressing both the immediate challenges of the COVID-19 pandemic (e.g. [8–10]) and the advancement of our knowledge of coronaviruses (e.g. [11–16]).
The very low variation of SARS-CoV-2 observed during virus transmission in the human population since December 2019 [17–19] is one of the silver linings that have emerged from the research so far. It lends support to the belief that prospective therapeutics directed against the currently circulating virus variants will remain effective in the future. However, we also know that RNA viruses can evolve fast. Could SARS-CoV-2 be an exception, even if its replication includes proofreading?
In this regard, it may be informative to look at the virus genomic variation beyond the current pandemic and consider relatives of SARS-CoV-2 that collectively form the order Nidovirales [20]. The natural variation of nidoviruses has accumulated over a considerable, although poorly defined, timeframe in vastly different vertebrate and invertebrate hosts; its scale far exceeds the mutation ranges observed for SARS-CoV-2 and tested in experiments with SARS-CoV-2 and a few other selected nidoviruses. Characterization of various patterns of natural protein variation reveals constraints on evolution and may also inform about the potential of the respective proteins to evolve under the directional selection pressure of antivirals. Comparative genomics of corona- and related viruses has successfully guided experimental research on the nidovirus proteome, which is critical for identifying targets of antivirals in SARS-CoV-2. This article provides a primer on the recognized commonalities of nidoviruses and reviews cases of outstanding genome conservation and of the variation that underlies the biology of nidoviruses and informs about their potential to change. It attests to numerous connections between research on diverse nidoviruses and the ongoing characterization of SARS-CoV-2, including efforts to contain the pandemic with antivirals.

Origin of Nidovirales name and nidovirus identity
The most distinctive characteristic of nidoviruses, as recognized early in the course of research on coronaviruses and arteriviruses, is the production of a set of subgenomic (sg) mRNAs that is 5′- and 3′-nested relative to genomic RNA. This shared characteristic provided a basis for the order's name: nidus means nest in Latin [21]. This name is retained although several later described nidoviruses appear to have only a 3′-nested set of sg RNAs [22,23]. Nidoviruses were also found to share a genome organization and expression mechanisms. Yet in practice, new viruses are assigned to nidoviruses using comparative sequence analysis and considering phylogenetic affinity in the most conserved protein domains, which form a unique synteny [24].

Nidovirus diversity and taxonomy
Nidoviruses possess positive-sense, non-segmented linear RNA genomes in the unprecedentedly large size range of 12–41 kb. They replicate in the cytoplasm and have genomes packaged into enveloped virions that may vary in shape, depending on the virus lineage [24,25]. SARS-CoV-2 is a variant of one of about a hundred known nidovirus species [4]. These viruses form the order Nidovirales that was established in 1996 by merging two families of viruses infecting vertebrates, Coronaviridae and Arteriviridae [26,27]; the former was recently split in two, elevating the original subfamily Torovirinae to the new family Tobaniviridae [28]. The first invertebrate nidoviruses, comprising the family Roniviridae [22], were identified only 20 years ago, yet the order already includes 8 families of vertebrate and 6 families of invertebrate nidoviruses, with the majority including only a single species [29–34].
With the exponential growth of the number of available nidovirus genome sequences, the number of known nidovirus species began to grow accordingly, although their formal classification within the taxonomy framework may lag behind. Likewise, the gap between the newly identified and the few experimentally characterized nidoviruses is also rapidly increasing. The latter group includes arteriviruses: equine arteritis virus (EAV) and porcine reproductive and respiratory syndrome viruses (PRRSV), and coronaviruses: human coronavirus 229E (HCoV-229E), transmissible gastroenteritis virus (TGEV), mouse hepatitis virus (MHV), Middle East respiratory syndrome coronavirus (MERS-CoV) and avian infectious bronchitis virus (IBV), in addition to SARS-CoV and SARS-CoV-2. Also, the limited characterization of several tobaniviruses, and of invertebrate mesoniviruses and roniviruses, often isolated from 'exotic' hosts, was important for understanding generalities and host- and lineage-dependent specifics of nidoviruses, and for the validation of many models of comparative genomics. Since viruses of the subfamily Orthocoronavirinae (formerly Coronavirinae) and the family Arteriviridae are most frequently sampled, they were predominantly used to characterize patterns of conservation and evolution at subfamily and family levels, respectively.

Canonical multi-ORF organization of nidovirus genome: three functional regions and their expression
All known coronaviruses, including SARS-CoV-2, as well as other nidoviruses whose genomes were sequenced from 1987 up to 2016, are characterized by a conserved multicistronic genome organization including multiple open reading frames (ORFs) (Fig. 1). The two largest and slightly overlapping ORFs, 1a and 1b, occupy the 5′-terminal two-thirds of the genome and encode nonstructural proteins (nsps) that are derived by autoproteolytic processing from polyproteins 1a and 1ab (pp1a and pp1ab), encoded by ORF1a and jointly by ORF1a/ORF1b, respectively [35]. The number of nsps varies from 12 in arteriviruses to 16 in coronaviruses; a comparable number may be produced in other, less well characterized nidoviruses. Together these two ORFs are often referred to as the replicase gene, although ORF1a and ORF1b and their products are chiefly responsible for the control of genome expression and replication, respectively. Consequently, considering them as two separate major regions facilitates functional and evolutionary analyses of nidoviruses [36]. The 3′-terminal region of the genome contains multiple smaller ORFs (3′ORFs), the number of which varies considerably among nidoviruses and which encode structural and, in some nidoviruses, accessory proteins. This third region is chiefly responsible for virus dissemination [36] and, in the Coronaviridae, may vary in ORF composition even between representatives of the same species, as we learned from the comparison of SARS-CoV and SARS-CoV-2 [37]. Untranslated regions (UTRs) are present at the 5′- and 3′-ends of the genome, and may also be found between ORFs in the 3′ORFs region. The genomic 5′-end is believed to be capped [23,38,39], and the 3′-end of the genome is polyadenylated [40,41].
Nsps assemble into a membrane-bound replication-transcription complex (RTC) that mediates genome replication and synthesis of sg mRNAs (transcription) [42–44] for expression of the 3′ORFs (Fig. 1). Transcription involves a discontinuous synthesis of minus-strand sg-size RNAs that employs short conserved sequences known as the body and leader transcription-regulating sequences (bTRS and lTRS), located upstream of a 3′ORF and ORF1a, respectively [21], although some nidoviruses may not use lTRS [23,45]. Most sg mRNA species are monocistronic and serve to translate only their 5′-most ORF, but some sg mRNA species are polycistronic [21,46,47]. Expression from separate sg mRNAs allows the regulation of the abundance of the respective structural and accessory proteins relative to each other and to nsps [48,49].
The assembly of a virus particle is a multistage process that includes the encapsidation of a viral genome by multiple copies of a nucleocapsid protein, and the wrapping of the nucleoprotein complex by a host membrane, carrying viral structural proteins. The wrapping is coupled with budding into the lumen of the endoplasmic reticulum (ER) or Golgi complex, and followed by transportation of the virus particles to the plasma membrane through the secretory pathway, culminating in their release from the cell [50,51].
The multi-ORF genome organization, called hereafter canonical, is coupled with conserved expression mechanisms of transcription and translation, which control the relative quantities of functionally different proteins in the infected cell. Specifically, pp1a proteins are synthesized in higher quantities than proteins unique to pp1ab, due to −1 programmed ribosomal frameshifting (PRF), which directs a fraction of the ribosomes from ORF1a to ORF1b during translation of the incoming and newly synthesized genomic RNAs [52]. In contrast, the 3′ORFs are expressed from a separate set of sg mRNAs in a differential mRNA-specific manner at a later point in time, although the actual complexity of this regulation is just emerging [21,53,54,16].
This apparent coupling of the genome organization with the differential expression seems to be functionally sensible. The least expressed ORF1b-encoded proteins include core enzymes of the RTC, such as RNA-dependent RNA polymerase (RdRp) and superfamily 1 helicase (HEL1), that are required in relatively minute molar quantities to catalyze synthesis of diverse virus RNAs and other RNA-dependent reactions [24]. They are assisted in the RTC by ORF1a-encoded subunits, which are produced in larger molar quantities than the ORF1b-encoded core enzymes for reasons that are yet to be fully understood but would be compatible with a structural role, a less efficient enzymatic activity, or the multitude of reactions in which they may be involved. Likewise, the 3′ORFs may be expressed most actively to provide subunits of virus particles or proteins that may modulate virus-host interaction, with some proteins being multifunctional.

Non-canonical ORF organizations of recently discovered nidoviruses
Canonical multi-ORF organization, including two large overlapping ORFs for the replicase, was long considered a defining characteristic of nidoviruses [20], since it was invariably found in diverse viruses that: 1) infect either vertebrate or invertebrate hosts, 2) have vastly different genome sizes, in the range from 12 to 34 kb, and 3) are separated by evolutionary distances comparable to the most distant ones in the Tree of Life [36]. Yet, we learned recently that this conclusion was premature and, presumably, due to poor sampling of viruses in two genome size ranges: between 16 and 20 kb, and larger than 34 kb. Comparative genomic analysis of four recently identified, highly divergent nidoviruses, Wuhan Japanese halfbeak arterivirus (WJHAV; 18.2 kb) [60], Beihai nido-like virus 1 (BNV1; 20.3 kb) [60], Aplysia abyssovirus 1 (AAbV; 35.9 kb) [30,31] and planarian secretory cell nidovirus (PSCNV; 41.1 kb) [32], revealed surprising plasticity of nidoviruses at the two critical junctions separating the three major genomic regions: between ORF1a and ORF1b, and between ORF1b and the 3′ORFs (Fig. 2). In the WJHAV genome, ORF1b is fused with a gene encoding a putative glycoprotein [60], presumably a structural protein. Both BNV1 and AAbV contain two ORFs, a 5′-terminal ORF combining ORF1a- and ORF1b-like regions, and a single 3′-terminal ORF encoding structural protein domains [30,31,61]. The PSCNV genome has a single large ORF, which is an equivalent of ORF1a, ORF1b and 3′ORFs fused together [32]. This PSCNV ORF is exceptionally large: it encodes a 13,556 aa polyprotein that is 58–67% larger than the largest single- or multi-ORF polyproteins of other viruses. These nidoviruses could be considered non-canonical. Collectively they account for a large share of nidoviruses at the family level: four families versus ten that have the canonical multi-ORF organization. In contrast, they are substantially underrepresented at the species level.
[Figure caption] Genome ORFs are depicted in their frame, with ORF1a frame set to zero. For each sg mRNA, only ORFs believed to be translated from it are shown, without indicating their frame relative to ORF1a. For genome and sg mRNAs, RNA signals are indicated by color (see inset). For polyproteins, the autoproteolytic processing scheme (see inset) and selected protein domains (see text for abbreviations) are specified. The NC_004718.3 record was used to prepare this figure. Note that sg mRNA 3.1 [55] is not shown; the most N-terminal ubiquitin (Ub) and Macro domains are separated by an acidic, structurally disordered region of ~70 aa [56,57]. SUD-N and SUD-M, N-terminal and Middle domains of SARS-CoV Unique Domain, respectively [58,59]; Y, Y domain [57].
The discovery of non-canonical nidoviruses raises the intriguing question of which genome expression mechanisms these viruses use and, specifically, whether they are able to maintain the region-specific stoichiometry of viral proteins described for canonical nidoviruses. It seems that, despite differences in ORF organization, the end result of genome expression may be quite similar in canonical and non-canonical nidoviruses.
Computational analysis of the genome sequences of BNV1, AAbV and PSCNV shows how these non-canonical nidoviruses may attenuate translation of the ORF1b-like region to achieve a nonequimolar ratio of the nonstructural proteins that are encoded in ORF1a and ORF1b of canonical nidoviruses. Both BNV1 and AAbV have ORF1a-like and ORF1b-like regions residing in the same reading frame and separated by a single stop codon (Fig. 2) [30,31,61]. If readthrough of this stop codon occurs in only a fraction of translation events, proteins encoded in the ORF1a-like region would be expressed in a higher quantity than proteins encoded in the ORF1b-like region. This type of regulation was documented for the attenuation of RdRp (nsP4) of alphaviruses [62]. The PSCNV single-ORF genome includes a predicted −1 PRF site immediately upstream of the ORF1b-like region with the potential to divert a fraction of the translating ribosomes to a tiny 39 nt ORF in an alternative reading frame (Fig. 2). As a result, the ORF1a-like region of the PSCNV genome would be expressed more frequently than the ORF1b-like region upon translation of the genomic RNA [32]. The main difference between the −1 PRF-directed mechanisms in canonical nidoviruses and in PSCNV is that −1 PRF directs translation into ORF1b in the former but diverts it from the ORF1b-like region in the latter. A similar mechanism was already demonstrated for diverting a fraction of ribosomes during translation of ORF1a in the nsp2 region of some arteriviruses [63].
Likewise, non-canonical nidoviruses may be similar to the canonical nidoviruses in the use of sg mRNAs for the expression of the 3′-terminal genome region. Such evidence was obtained for the single 3′ORF of AAbV [30] and the 3′ORFs-like region of the PSCNV genome [32]; based on similarity with canonical nidoviruses, this hypothesis may be extended to the single 3′ORF of BNV1 and the three small 3′ORFs of WJHAV, although these viruses are yet to be studied in this respect (Fig. 2). In both PSCNV and AAbV, potential leader and body TRSs were identified by comparative genomics as large repeats in the 5′UTR and upstream of the genome region predicted to encode structural proteins, respectively (Fig. 2). A sharp increase in coverage of the genome by RNA-seq reads was observed at the body TRS of both PSCNV and AAbV, consistent with the downstream region being subject to transcription. The existence of the PSCNV sg mRNA species expected to be produced when the identified TRSs are employed was confirmed in a 5′-RACE experiment. Importantly, if translation of the PSCNV sg mRNA species is initiated at its most 5′-terminal start-codon, it would result in production of a polyprotein identical to the C-terminus of the giant polyprotein expressed from the PSCNV genome [32]. Thus, predicted structural proteins of PSCNV may be expressed from both genome and sg mRNA, unlike their counterparts in canonical nidoviruses.
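The three attenuation mechanisms described above can be compared with a toy calculation. The sketch below assumes that every ribosome initiating on the genomic RNA translates the ORF1a(-like) region and that a single recoding event with a fixed per-ribosome probability decides whether the ORF1b(-like) region is translated. The function name, the mechanism labels and the example efficiencies are hypothetical and are not measurements for any nidovirus.

```python
def orf1b_fraction(mechanism: str, efficiency: float) -> float:
    """Fraction of ribosomes that go on to translate the ORF1b(-like) region.

    mechanism  -- one of three hypothetical labels:
                  "prf_into_orf1b"      : -1 PRF shifts ribosomes INTO ORF1b
                                          (canonical nidoviruses)
                  "stop_readthrough"    : readthrough of an in-frame stop codon
                                          (proposed for BNV1 and AAbV)
                  "prf_away_from_orf1b" : -1 PRF diverts ribosomes AWAY from the
                                          ORF1b-like region (proposed for PSCNV)
    efficiency -- assumed probability of the recoding event per ribosome
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if mechanism in ("prf_into_orf1b", "stop_readthrough"):
        # Only ribosomes that undergo the recoding event reach ORF1b.
        return efficiency
    if mechanism == "prf_away_from_orf1b":
        # Ribosomes that undergo the recoding event are lost from ORF1b.
        return 1.0 - efficiency
    raise ValueError(f"unknown mechanism: {mechanism}")


if __name__ == "__main__":
    # Illustrative efficiencies only; chosen so that all three give the same ratio.
    for mechanism, eff in [("prf_into_orf1b", 0.25),
                           ("stop_readthrough", 0.25),
                           ("prf_away_from_orf1b", 0.75)]:
        frac = orf1b_fraction(mechanism, eff)
        print(f"{mechanism}: ORF1b-like / ORF1a-like = {frac:.2f}")
```

Under these assumptions, a frequent diverting frameshift and an infrequent frameshift or readthrough into ORF1b produce the same excess of ORF1a-encoded over ORF1b-encoded proteins, which is the sense in which the end result of genome expression may be similar.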
Interestingly, PSCNV may not be the only non-canonical nidovirus whose structural proteins are synthesized from both genome and sg mRNA. WJHAV potentially encodes a glycoprotein in the unusually long 3′-terminus of its ORF1b, which is located downstream of the otherwise terminal 2′-O-methyltransferase (O-MT) locus [60,34]. While WJHAV was not analyzed in this respect, this genome organization is compatible with both genome and sg mRNA directing synthesis of the glycoprotein (Fig. 2). Production of certain structural proteins from both genome and sg mRNA can also be envisioned for those canonical nidoviruses in whose genomes the stop-codon of ORF1b and the start-codon of the downstream structural ORF are in-frame and separated by a few codons. If readthrough of the ORF1b stop-codon occurred, with the stop-codon being decoded by a suppressor tRNA [52], translation would continue, resulting in the production of pp1ab fused with a structural protein. For example, SARS-CoV ORF1b and ORF2, encoding the S protein, belong to the same reading frame and are separated by just two codons downstream of the ORF1b termination codon (Fig. 1). No evidence for such suppression has been reported for this or other viruses of the species Severe acute respiratory syndrome-related coronavirus.
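Whether readthrough of a stop-codon could fuse two ORFs depends only on their relative frame and spacing, which can be checked from coordinates. The helper below is a minimal sketch: the function name is hypothetical, and the coordinates in the usage example are made-up toy values rather than positions from a genome record.

```python
from typing import Optional


def codons_between(stop_codon_start: int, next_start_codon: int) -> Optional[int]:
    """Count whole codons between a stop codon and a downstream start codon.

    Both arguments are 1-based nucleotide positions of the FIRST base of the
    respective codon. Returns None when the downstream start codon is not in
    the same reading frame as the stop codon, or lies upstream of it.
    """
    gap = next_start_codon - (stop_codon_start + 3)  # nt after the stop codon
    if gap < 0 or gap % 3 != 0:
        return None
    return gap // 3


if __name__ == "__main__":
    # Toy coordinates (assumed): stop codon at 1000-1002, start codon at 1009.
    print(codons_between(1000, 1009))  # in frame, two intervening codons
    print(codons_between(1000, 1010))  # out of frame
```

An in-frame result with a small codon count corresponds to the arrangement described above, in which suppression of the stop-codon would extend pp1ab into the downstream structural protein.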
Unlike the expression of structural proteins from individual ORFs, observed in most known nidoviruses, expression of multiple structural proteins from a single ORF, predicted for non-canonical BNV1, AAbV and PSCNV (Fig. 2), would require processing of the structural polyprotein by host and/or viral proteases, unless their structural domains function in the context of a single polyprotein. Accordingly, a chymotrypsin-like serine protease domain was detected in the structural polyprotein sequences of BNV1 and AAbV [30,31], and potential cleavage sites of cellular proteases furin and signal peptidase were identified in the C-terminal region of the PSCNV polyprotein [32].

All nidoviruses share seven protein domains that are vital to control of genome replication and expression
Six domains, transmembrane domain 2 (TM2), 3C-like protease (3CLpro), transmembrane domain 3 (TM3), RdRp, Zn-binding domain (ZBD) and HEL1, were delineated in pp1a/pp1ab soon after the first sequence of a coronavirus genome was released [64–68]. Subsequently, they were found to be universally conserved in all nidoviruses [26,69–71] (Fig. 3), although, unlike the other universally conserved domains, TM2 and TM3 have no residues that are invariant in all nidoviruses.
Many more protein domains were delineated in further studies, but none were conserved across the entire order until the discovery of the nidovirus RdRp-associated nucleotidyltransferase (NiRAN) domain in ORF1b, which remained undetected for almost twenty years after the sequencing of the first ORF1b due to its pronounced divergence [72] (Fig. 3). It is the only enzymatic core domain that has no apparent ortholog in other known RNA viruses, although it uses a kinase fold [73] that is likely shared with protein kinases identified in a few nidoviruses [25,74]. With the identification of the NiRAN domain, one hallmark of nidoviruses, universally conserved replicative domains encoded in a certain order (synteny), expanded to include seven domains: (TM2)-3CLpro-(TM3)-NiRAN-RdRp-ZBD-HEL1. This domain constellation remains the most reliable characteristic that readily distinguishes the ever-growing diversity of nidoviruses from other viruses. Its presence allows the demarcation of a monophyletic nidovirus cluster in phylogenetic trees that also include other viruses.
In agreement with their nidovirus-wide conservation, all domains of the synteny, when tested in experiments, proved to be essential for nidovirus replication [72,75–78].
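Once domains have been annotated along a polyprotein, testing for the seven-domain synteny reduces to checking that the domain labels occur in the stated order, with other domains allowed in between. The sketch below assumes such annotations are already available as an ordered list of labels; the function name and the example annotations are illustrative only, and detecting the domains themselves requires sensitive sequence comparison that is not shown here.

```python
from typing import Iterable, Sequence

# Order of the conserved domains as given in the text (N- to C-terminus).
NIDOVIRUS_SYNTENY = ("TM2", "3CLpro", "TM3", "NiRAN", "RdRp", "ZBD", "HEL1")


def has_synteny(domains: Iterable[str],
                required: Sequence[str] = NIDOVIRUS_SYNTENY) -> bool:
    """Return True if `required` occurs as an ordered subsequence of `domains`.

    Additional domains may be interspersed; only the relative order matters.
    """
    remaining = iter(domains)
    # `label in remaining` advances the iterator until the label is found,
    # so each subsequent label must be located downstream of the previous one.
    return all(label in remaining for label in required)


if __name__ == "__main__":
    # Hypothetical annotation with extra, lineage-specific domains.
    example = ["PLP", "TM1", "TM2", "3CLpro", "TM3", "NiRAN", "RdRp",
               "ZBD", "HEL1", "ExoN", "N-MT", "O-MT"]
    print(has_synteny(example))
    # Same domains, but NiRAN placed after RdRp: the order is broken.
    shuffled = ["TM2", "3CLpro", "TM3", "RdRp", "NiRAN", "ZBD", "HEL1"]
    print(has_synteny(shuffled))
```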

Plasticity of the most conserved replicative proteins of nidoviruses
While remaining under strong purifying selection, the most conserved replicative domains have accepted rare or unique substitutions of key residues in nidoviruses that are among the most divergent phylogenetically and in respect to other characteristics. Namely, PSCNV, singularly representing the suborder Monidovirineae [33], contains a number of remarkable substitutions in three enzymatic domains of the synteny, 3CLpro, NiRAN and RdRp [32]. The substrate pocket of PSCNV 3CLpro contains a Val residue in place of the His residue absolutely conserved in other nidoviruses [35]; the substitution was predicted to confer an unusual substrate specificity that remains to be established for the enzyme. 3CLpro is also known for repeated toggling between Cys and Ser as the nucleophile residue of its catalytic center [35,79]. This toggling is observed between virus families, and its implications for the nidovirus life cycle remain unknown. PSCNV NiRAN has a substitution of one of the seven residues absolutely conserved in previously characterized nidoviruses. The RdRps of PSCNV and of wobbly possum disease virus (WPDV) [80] have a Gly-Asp-Asp (GDD) signature in their catalytic motif C, instead of the Ser-Asp-Asp (SDD) signature characteristic of nidoviruses [24]. These observations indicate plasticity of the active sites of three key enzymes, possibly related to their functional diversification and coupling. They reveal a considerable potential of nidoviruses to adapt at the molecular level even when changes concern vital functions and the most constrained enzymes, which are among the most immediate targets of prospective antivirals.

Do nidoviruses use cognate enzymes to catalyze RNA capping?
The most conserved domains of nidoviruses control either genome expression by autoproteolytic processing of pp1a/pp1ab (3CLpro), or genome replication (RdRp and HEL1, assisted by NiRAN and ZBD, respectively), or possibly both (TM2 and TM3). Intriguingly, no such ubiquitous link is evident between protein domain conservation and the control of translation of virus RNAs, including the incoming virion RNA, which launches virus-specific biosynthetic processes in the infected cell. Such control might include encoding a full complement of enzymes of an RNA capping pathway, as was documented for a highly diverse monophyletic group of RNA viruses, historically known as the alpha-like virus supergroup and currently comprising the class Alsuviricetes [81–83]. Indeed, coronavirus HEL1, residing in nsp13, possesses RNA 5′-triphosphatase (RTPase) activity that may catalyze the first reaction of the RNA capping pathway [84,85] (Fig. 4). In addition, two other ORF1b-encoded enzymes, guanine-N7-methyltransferase (N-MT) and O-MT, may catalyze the third and fourth reactions of the conventional mRNA capping pathway [86–89]. N-MT and O-MT reside in coronavirus nsp14 and nsp16 (Fig. 1), respectively, and they are colinear in the pp1ab polyproteins of mesoni- and roniviruses, whose nsps are yet to be described fully (Fig. 3) [71,88,58]. However, contrary to their assumed essential involvement in mRNA capping, these enzymes are not conserved in all nidoviruses (Fig. 4). Specifically, tobaniviruses encode O-MT but appear to lack N-MT, or at least its catalytic domain, while both N-MT and O-MT are missing in arteriviruses [71,90]. It was proposed that the N-MT function in tobaniviruses may be complemented by a putative MT encoded in ORF1a [91]. However, this enzyme was not found in some tobaniviruses and its specificity remains unknown [74].
Additionally, the enzyme catalyzing the second reaction of the capping pathway, guanylyltransferase (GTase), has not been identified in any nidovirus [89], although NiRAN was proposed as a possible candidate [72]. Since nidoviruses are unlikely to subvert the capping machinery of eukaryotic hosts, which functions in the nucleus, it remains unresolved how they synthesize the 5′-end cap [23,38,39], which controls translation initiation and protects the RNA molecule from degradation [97]. This uncertainty also leaves open the question of the natural targets of N-MT and O-MT, and methylation of substrates other than the 5′-terminal nucleotides remains a valid option [58].

Hotspots of nidovirus evolution: duplications in pre-TM2 region
Most large-scale genomic changes in nidoviruses, including variation in respect to the N-MT and O-MT domains discussed above, can be attributed to aberrant homologous and nonhomologous recombination, the mechanisms behind deletions, duplications and gene acquisitions in RNA viruses [98,99]. These evolutionary events are most frequently observed in the two regions of the nidovirus genome that control nidovirus-host interactions: the pre-TM2 region of ORF1a and the 3′ORFs, as was documented in the past (e.g. [100,101]).
Several notable examples of deletions, duplications and gene acquisitions mapping to these genome regions were described in recent years and are summarized below.
One of the most common mechanisms of genome and protein innovation is the generation of tandem repeats. Possibly due to fast evolution, adjacent and highly similar tandem repeats were rarely observed in the genomes of RNA viruses. They were reported in an nsp3 region between the Ubiquitin (Ub) and papain-like protease (PLP1) domains (Fig. 3) in several coronaviruses, including human coronavirus HKU1 (HCoV-HKU1; species Human coronavirus HKU1, genus Betacoronavirus) [102,103] and duck-dominant coronavirus (DdCoV; species Duck coronavirus 2714, genus Gammacoronavirus) [104]. The analyzed HCoV-HKU1 isolates encode from 2 to 17 perfect, and from 1 to 4 imperfect, copies of the acidic NDDEDVVTGD repeat. Four analyzed isolates of DdCoV all harbored five almost identical copies of a 23 aa charged residue-rich repeat.
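Counting perfect tandem copies of a known repeat unit, as reported above for the NDDEDVVTGD repeat, is straightforward once the unit is given. The sketch below uses a regular expression to find uninterrupted runs of the unit; the function name and the toy sequence are hypothetical, and imperfect copies, which require alignment-based methods, are deliberately not counted.

```python
import re


def longest_tandem_run(protein: str, unit: str) -> int:
    """Return the largest number of back-to-back perfect copies of `unit`.

    Only exact, uninterrupted copies are counted; diverged (imperfect)
    copies are ignored by this simple approach.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    # Non-capturing group so that findall returns the full matched runs.
    runs = re.findall(f"(?:{re.escape(unit)})+", protein)
    return max((len(run) // len(unit) for run in runs), default=0)


if __name__ == "__main__":
    unit = "NDDEDVVTGD"
    # Toy sequence (assumed): three tandem copies, a spacer, one isolated copy.
    toy = "MK" + unit * 3 + "AAA" + unit + "L"
    print(longest_tandem_run(toy, unit))
```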
Interestingly, arteriviruses also contain repeats positioned in close proximity to each other in a similar location of the pre-TM2 region: three copies of the PxPxPR motif, separated by ~10 aa, were identified within the HVR domain of EAV and WPDV [80]. At least one copy of this motif was also found within the Hinge or HVR domain of almost all other arteriviruses. PxPxPR motifs may be recognized by cellular Src homology 3 (SH3) protein domains, implicated in signal transduction [105]. The same function was previously suggested for the canonical SH3-binding motifs PxxP detected in the nsp2 sequence of PRRSV-1 [106]. Given the small size of PxPxPR motifs and their scattered position within the fast-evolving Hinge and HVR domains of arteriviruses, they might have emerged by either point mutation fixed by selection, or duplication followed by diversification.
[Figure caption; beginning not recovered] … [92] of the conserved core of RdRp, using IQ-Tree 1.5.5 [93] with the automatically selected rtREV+F+I+G4 evolutionary model. To estimate branch support, an SH-like approximate likelihood ratio test with 1000 replicates was conducted. Polyproteins pp1ab are shown as light grey bars; they are autoproteolytically processed to nsps that were identified only for a few nidoviruses and are omitted here (see also Fig. 1). TM domains are shown as dark grey bars; TM helices were predicted by TMHMM2.0c [94] and clustered if separated by less than 300 aa (less than 180 aa for arteri- and tobaniviruses). Other selected domains, whose coordinates were obtained from the Viralis database [92], are shown as colored bars; proteolytically inactive PLP domains are indicated by stripes on bars; indices of PLP domains are specified below the bars. Pkinase, protein kinase [25]; CPD, cyclic phosphodiesterase, known also as 2′,5′-phosphodiesterase, 2′PDE [58,95]; NADAR, domain involved in the utilization of NAD and ADP-ribose derivatives [96]; for other domains, see Fig. 1 and the text.
Remarkably, the pre-TM2 region is also the location of the only described tandem repeats in the newly discovered invertebrate nidovirus PSCNV [32]. Two tandem repeats of 67 and 66 aa are separated by 3 aa and share 41.1% identity. No homologs of these repeats, which could hint at their function, were identified in other viruses or elsewhere.
The described tandem repeats are most likely to have emerged by duplication. The PSCNV genome also encodes the potential leader and body TRSs (see above), which are exceptionally long (~60 nt), share 86% nucleotide sequence identity, and are separated by 28,327 nt. They might have emerged as a result of duplication, but incremental extension of ancestral elements by insertions under positive selection (convergence) during genome expansion remains a credible scenario as well. A shorter ancestral genome might have already had TRSs at the respective positions in the genome, as is typical for nidoviruses. Expansion of the genome would have necessitated TRS extension: short motifs identical to TRSs can be encountered in a long genome just by chance, compromising the quality of genome expression. Consequently, gradual expansion of the genome could have promoted coordinated extension of TRSs through the convergence mechanism. Expansion of virus sampling in the PSCNV clade could help in resolving the evolutionary history of this intriguing similarity between exceptionally long TRS(-like) elements.
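The argument that short TRS-like motifs can occur by chance in a long genome can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a random sequence with independent, equally frequent nucleotides and counts matches on one strand only; real genomes deviate from this model, so the numbers are indicative at best. The function name is hypothetical, the genome length approximates the 41.1 kb PSCNV genome, and the 6 nt motif length is an illustrative choice for a "short" TRS, contrasted with the ~60 nt elements mentioned above.

```python
def expected_chance_matches(genome_len: int, motif_len: int) -> float:
    """Expected number of exact chance occurrences of one fixed motif.

    Assumes a uniform, independent nucleotide model (probability 1/4 per
    base) and a single strand. Each of the (genome_len - motif_len + 1)
    possible start positions matches with probability (1/4) ** motif_len.
    """
    if motif_len < 1 or genome_len < motif_len:
        return 0.0
    positions = genome_len - motif_len + 1
    return positions / 4 ** motif_len


if __name__ == "__main__":
    genome_len = 41_100  # approximate PSCNV genome size in nt
    for motif_len in (6, 10, 60):  # 6 and 10 nt are illustrative choices
        n = expected_chance_matches(genome_len, motif_len)
        print(f"{motif_len:>2} nt motif: {n:.3g} expected chance matches")
```

Under this simple model a 6 nt motif is expected to recur about ten times in a genome of this size, whereas a ~60 nt element is effectively unique, which is compatible with the idea that longer TRSs reduce spurious matches.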

Could the host NF-κB pathway be universally targeted by all nidoviruses?
Another major source of genome and protein innovation is domain acquisition from other species. Many domains found in subsets of nidoviruses (Fig. 3) may have been acquired through this mechanism [107]. The study of PSCNV was particularly insightful in this respect since it expanded the previously known proteome repertoire of nidoviruses or even larger groups of viruses [32].
A domain acquisition by a nidovirus must be reconciled with many constraints acting on genome, RNome and proteome that remain mostly unknown [36]. It seems plausible that, if fixed, a newly acquired domain provides a fitness advantage by improving control over a process critical for either virus reproduction per se or modulation of host environment to facilitate virus reproduction, or both. For instance, it was proposed that the acquisition of the 3'-5' exoribonuclease (ExoN) domain [108] improved the RTC control over progeny quality through the proofreading of RNA synthesis and allowed expansion of the nidovirus genome over 20 kb during evolution [71]; it provides a plausible explanation for the universal domain presence in all known nidoviruses with genomes of 20 kb or larger (Fig. 3). It is more uncertain whether a similar broad generalization is feasible when a clear-cut correlation between the domain presence and a characteristic may not be immediately evident in many nidoviruses. In this case, one of the options is that nidoviruses might have recruited different domains to regulate a common process, and analysis of a domain acquisition in a selected virus may offer a specific window into the process.
For instance, counteracting the innate immune response may be vital for nidoviruses to proliferate in a wide range of hosts. It was reported that MERS-CoV targets the NF-kB pathway of innate immunity using accessory protein 4b, which is apparently conserved only in virus species of the subgenus Merbecovirus, genus Betacoronavirus [109]. This pathway is targeted by many RNA viruses, potentially including SARS-CoV-2 [110]. Indeed, the planarian nidovirus PSCNV, which belongs to a different suborder than MERS-CoV and SARS-CoV-2, was proposed to target the NF-kB system by using an ANK-dependent pathway [32] in a manner shared by large dsDNA viruses [111–113].
The ANK domain is ubiquitous in proteins of diverse cellular organisms and of dsDNA viruses with large genomes (>200 kb), but PSCNV is the first and the only currently known RNA virus that encodes an ANK domain [32,114]. The PSCNV ANK domain clusters phylogenetically with ANK domains of a pair of host proteins, SMU15016868 and SMU15003987, indicating that an ancestor of PSCNV might have acquired the domain from an ancestor of the host, the flatworm Schmidtea mediterranea. The two ANK-containing proteins of the PSCNV host have domain architectures suggestive of their interaction: that of SMU15016868 is characteristic of the NF-kB protein, N-RHD-ANK-C (RHD is a Rel homology domain), while that of SMU15003987 is characteristic of its inhibitor IkB, N-ANK-C [115]. Based on studies of several viruses [111,112], the NF-kB protein is expected to reside in the cytoplasm, bound by inhibitors (its own ANK domain and the protein IkB), in the absence of a viral infection (Fig. 5A). A viral infection would trigger degradation of the NF-kB inhibitors, allowing the NF-kB transcription factor to enter the nucleus and modulate gene expression to promote an antiviral immune response (Fig. 5B). Different viruses have evolved diverse countermeasures, and PSCNV may recruit its cognate ANK as an IkB-mimicking protein, retaining the NF-kB transcription factor in the cytoplasm after the degradation of its inhibitors and thus downregulating the immune response (Fig. 5C) [32]. This model is testable, and we could also expect to learn whether other nidoviruses target the NF-kB pathway as well, which seems plausible. Our understanding of the control of innate immunity by nidoviruses will improve if future studies could also explain why PSCNV appears to resemble large DNA viruses rather than a fellow nidovirus, MERS-CoV (and presumably others), to which it is much more similar phylogenetically, genetically and in genome size.

Concluding remarks
The identification of SARS-CoV-2 was prompted by an outbreak of infectious disease, unlike the discovery of many nidoviruses in recent years by phenotype-free metagenomics- and transcriptomics-based research. SARS-CoV-2 is a variant of a known species, which is another major difference compared to other newly described nidoviruses that prototype new species. This dominance of disease-free discovery of virus species is likely to continue and accelerate even further in the future; it is vital for advancement of the field. Beneficiaries of the massive nidovirus discovery are numerous and range from research on the ecology of nidoviruses to the in-depth characterization of selected nidoviruses of recognized societal importance, like SARS-CoV-2. Each newly described nidovirus decreases the bias in our knowledge, which is most critical to address in order to achieve a comprehensive understanding of nidovirus biology. Functional annotation of new nidoviruses by comparative genomics is based on the knowledge already accumulated in the field. Likewise, each novel nidovirus sheds new light on known nidoviruses, putting the state of the art of experimental and computational research to the test in the lab of nature. Comparative genomics expands knowledge about the proteome and RNome of nidoviruses and their possible functions. It reveals what can and cannot be changed over the life of a multitude of virus generations in many hosts and under many recurrent and fluctuating selection forces whose specifics remain to be described. It also highlights parallels between distantly related nidoviruses, even in genomic regions that are poorly conserved. This insight may become particularly relevant when we try to challenge SARS-CoV-2 with compounds to limit or even eliminate virus transmission in the human population. How this coronavirus may adapt to these challenges is a big unknown, but it would be foolish to underestimate its plasticity and resilience, as the comparative genomics of known nidoviruses attests in this review.

Figure legend. The conventional mRNA capping pathway is shown on the left, with the enzymes catalyzing the respective four reactions listed in bold. Further to the right, the presence of these enzymes in viruses of five nidovirus families, each designated by its prefix, is listed (see Fig. 3 for phylogeny and pp1ab domain organization). RTPase, 5'-triphosphatase; GTase, guanylyltransferase; N-MT, guanine-N7-methyltransferase; O-MT, 2'-O-methyltransferase. In the m7GpppN2'-Om notation, m7G stands for 7-methylguanosine, p stands for phosphate, and N2'-Om stands for the 5'-terminal nucleoside of the RNA molecule, methylated at the ribose-2'-O position. For details, see text.