A Statovirus‐like virus from respiratory tracts of patients, China

The emerging evidence of human infections with emerging viruses suggests their potential public health importance. A novel taxon of viruses named Statoviruses (for stool‐associated Tombus‐like viruses) was recently identified in the gastrointestinal tracts of multiple mammals. Here we report the discovery of respiratory Statovirus‐like viruses (provisionally named Restviruses) from the respiratory tracts of five patients experiencing acute respiratory disease with Human coronavirus OC43 infection through the retrospective analysis of meta‐transcriptomic data. Restviruses shared 53.1%–98.8% identities of genomic sequences with each other and 39.9%–44.3% identities with Statoviruses. The phylogenetic analysis revealed that Restviruses together with a Stato‐like virus from nasal‐throat swabs of Vietnamese patients with acute respiratory disease, formed a well‐supported clade distinct from the taxon of Statoviruses. However, the consistent genome characteristics of Restviruses and Statoviruses suggested that they might share similar evolutionary trajectories. These findings warrant further studies to elucidate the etiological and epidemiological significance of the emerging Restviruses.


| INTRODUCTION
Because of the differences in cell structure and innate immunity processes between plants and animals, 1,2 plant- and animal-infecting pathogens have evolved along distinct pathways that lead to high specialization and usually confine them to their own host range.
Nonetheless, human and animal infections with phytopathogens have continued to emerge, raising great concern about their public health and veterinary threat. 1 Tobacco mosaic virus, in the family Virgaviridae, was initially identified in the saliva of smokers, 3 and was subsequently shown to cause human infection by detection of specific antibodies against the virus. 4 [10][11][12][13] Phylogenetic analysis found that the clade of Statoviruses was related to the plant-infecting family Tombusviridae and the animal-infecting family Flaviviridae, and was sandwiched between the two virus families on the phylogenetic tree. 8 However, it remains unclear whether the stool-associated viruses from the gastrointestinal tracts are able to cause human infections or are simply "passengers" due to dietary exposure or food contamination. 5 More recently, a Statovirus-like virus was detected in three nasal-throat swabs from livestock farm workers experiencing acute respiratory disease. The partial RNA-dependent RNA polymerase (RdRp) and coat protein sequences, 249 and 260 amino acids (aa) in length, shared only 40.4% and 45% identity with known Statovirus sequences. Although no official demarcation of Statoviruses is available, the low identity of the RdRp sequence with other Statoviruses and its distinct position in the phylogenetic tree suggest that it is a variant viral member. 14 In this study, five near-full genomes of Statovirus-like viruses were assembled from respiratory samples of patients with Human coronavirus OC43 (HCoV-OC43) infection through retrospective analysis of the previous meta-transcriptomic data, 15

| Meta-transcriptome sequencing and viral genome assembly
Meta-transcriptome sequencing was conducted on a total of 321 samples positive for HCoV. The libraries were sequenced on the Illumina NovaSeq platform with 150 bp paired-end reads.
Raw reads were checked for quality and filtered using the AfterQC program, and human reads were removed using the HISAT2 program (version 2.2.1). Subsequently, de novo assembly was conducted using Trinity (version 2.12.0). After assembly of HCoV genome sequences, 15 we used the meta-transcriptomic sequencing output to retrospectively identify other unclassified viruses by comparing contigs against the National Center for Biotechnology Information (NCBI) non-redundant nucleotide (nt) database using the blastn program (version 2.11.0) 16 and against the non-redundant protein database using Diamond blastx (version 2.0.9), 17 with an e-value threshold of 1e-5 to retain high sensitivity at a low false-positive rate. To eliminate mis-assembly, quality-controlled reads were mapped to the assembled complete genomes using Bowtie2 (version 2.3.5.1). 18 The demographic and clinical information was extracted from the previous data set of patients infected with HCoVs. 15
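The e-value filtering step described above can be sketched as follows. This is only an illustration, assuming that the blastn and Diamond blastx hits were written in the standard 12-column tabular format, which the text does not state; the function name `filter_hits` and the example rows are hypothetical.

```python
# Minimal sketch: keep tabular BLAST/DIAMOND hits at or below an e-value cutoff.
# Assumes the standard 12-column tabular layout (qseqid, sseqid, pident, length,
# mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The output
# format actually used in the study is not stated, so this is an assumption.

EVALUE_CUTOFF = 1e-5  # threshold given in the text


def filter_hits(lines, cutoff=EVALUE_CUTOFF):
    """Return (query, subject, percent identity, e-value) for hits passing the cutoff."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete 12-column record
        evalue = float(fields[10])
        if evalue <= cutoff:
            kept.append((fields[0], fields[1], float(fields[2]), evalue))
    return kept


# Hypothetical example rows (contig and subject names are made up).
example = [
    "contig_1\tsubject_A\t33.5\t400\t250\t6\t1\t1200\t10\t409\t2e-40\t160",
    "contig_2\tsubject_B\t28.0\t60\t40\t2\t5\t184\t3\t62\t0.003\t35.0",
]

for hit in filter_hits(example):
    print(hit)
```

Only the first example row passes the cutoff; the second, with an e-value of 0.003, is discarded.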

| RT-PCR amplification and verification of viral genome
To verify the viral genome sequences detected in the meta-transcriptome sequencing output, we conducted RT-PCR assays to amplify the viral sequences using a series of overlapping primers designed according to the meta-transcriptome-assembled genome sequences of Restviruses (Supporting Information S1: Table 1). All available samples were amplified for the virus sequences and confirmed by Sanger sequencing. The Sanger sequencing results were subsequently used for mutual validation with the sequences assembled from the meta-transcriptome sequencing output. Meanwhile, to determine the Restvirus viral load, a quantitative real-time RT-PCR assay was carried out (Supporting Information S1:
19 We used trimAl (version 1.4.1) 20 to remove ambiguously aligned regions. Phylogenetic trees were constructed using the maximum likelihood (ML) method with 1000 bootstrap replicates in IQ-TREE (version 2.2.2.3), 21 employing the best-fit model TVMe + R4 for complete genomes, model LG + I + G4 for RdRps, and model JTT + F + G4 for capsid proteins according to the Bayesian Information Criterion (BIC). We also constructed the phylogenetic trees using MrBayes (version 3.2.7) 22 for comparison. The phylogenetic trees were rooted at the midpoint using the phangorn package and visualized using the ggtree package in R (version 4.2.1).

| Viral genomes annotation
Open reading frames (ORFs) of all viral genomes were defined and annotated through a BLAST-like algorithm by alignment with the full length of each annotation using Geneious Prime 2023.2.1 (https://www.geneious.com). The conserved domains were identified using RPS-BLAST (version 2.6.0) against the CDD database (version 3.20) and verified on the HMMER web server (https://www.ebi.ac.uk/Tools/hmmer/).
TABLE 1 The characteristics of five patients infected with Restviruses.

| Viral searching through Sequence Read Archive (SRA) database
To determine the prevalence of the viruses, we downloaded raw sequencing data from human respiratory or intestinal tracts from the SRA database (https://www.ncbi.nlm.nih.gov/sra/). A total of 2655 runs were downloaded, and the accession numbers are listed in Supporting Information S1: Text 1. Taking the sequences assembled in this study as references, we mapped the reads of the downloaded SRA data against the reference sequences using Bowtie2 (version 2.3.5.1).

| Discovery and assembly of previously unrecognized viral genomes
From the meta-transcriptomic sequencing data set, we initially identified reads of previously unrecognized viruses from sputum and bronchoalveolar lavage fluid samples of two patients with HCoV-OC43 infection who had experienced acute respiratory disease (Table 1). The relative abundance of viruses at the family level in each library was calculated and normalized as the number of mapped reads per million total reads (Supporting Information S1: Figure 1). Through de novo assembly of the reads from these samples, viral sequences of 4587 and 4481 nt (breadth of coverage > 95%) were assembled from the two samples, respectively; they were most closely related to a proposed taxon of Statoviruses but shared only 33.3%-33.7% aa identities by BLASTx. We then designed specific primers according to the assembled virus sequences (Supporting Information S1: Table 1) and performed RT-PCR on the residual samples, followed by Sanger sequencing of the amplicons, to verify the accuracy of the viral genomes identified by transcriptome assembly. The two near-full genome sequences were confirmed by specific RT-PCR and subsequent Sanger sequencing. In addition, three other nearly complete genomes of 4583, 4410, and 4389 nt were obtained, respectively, by mapping assembled contigs to sequences from oropharyngeal swabs of two children under 1 year of age and a sputum sample of an adult female. As a result, five near-full-length genomes of Statovirus-like viruses were available for further analysis (Figure 1). We determined the viral load of each available sample by real-time RT-PCR assay.
Viral load was measured for all samples except that of case 4, which was not available (Table 1). The viral loads of cases 1, 2, and 3 ranged from 10^5.02 to 10^6.40 copies/mL, while the viral load of case 5 was below the detection threshold. The five genome sequences were deposited in GenBank under accession numbers OR367718-OR367722 (Supporting Information S1: Table 3). Notably, the three patients positive for the Statovirus-like viruses were also positive for HCoV-OC43 infection.
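The normalization of relative abundance mentioned above (mapped reads per million total reads) amounts to a simple rescaling. The sketch below illustrates it under stated assumptions: the function name `reads_per_million` and all read counts are hypothetical and are not taken from the study.

```python
# Minimal sketch of reads-per-million (RPM) normalization for one library.
# All counts below are hypothetical and serve only to illustrate the arithmetic.


def reads_per_million(mapped_reads, total_reads):
    """Scale a mapped-read count to reads per million total reads in the library."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads * 1_000_000 / total_reads


# Hypothetical mapped-read counts per viral family in a single library.
library_total = 5_000_000
family_counts = {"Coronaviridae": 12_500, "unclassified": 250}

for family, count in family_counts.items():
    print(family, reads_per_million(count, library_total))
```

Expressing counts per million total reads makes libraries of different sequencing depth comparable on one scale.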

| Evolutionary characteristics of the newly recognized viruses
The five viruses identified in this study were divided into two groups.
The near-full genome sequences of the three strains in group 1 detected during 2016-2017 shared 97.0%-98.8% nt identities with each other, and those of the other two strains in group 2 detected in 2019 had 95.9% nt identity, while the nt identities of the genomes ranged from 53.1% to 98.8% among the five viral strains (Supporting Information S1: Figure 2). Their genomic sequences shared only 39.9%-44.3% nt identities with the most closely related genome, that of a porcine Statovirus (GenBank accession no. MW504581) detected from swine slurry in a North American swine farm operation. 13 The phylogenetic analysis based on the genome sequences revealed that the viruses recognized in this study were phylogenetically close to the cluster of porcine Statoviruses, but formed a well-supported clade distinct from the taxon of Statoviruses (Figure 2A).
FIGURE 1 Genome organization with open reading frames of Restviruses. The annotated protein for each virus is highlighted, with the RdRp domain in gray and the coat protein in green.
Subsequently, we constructed phylogenetic trees based on the aa sequences of the most conserved RdRp protein and the capsid protein from the five viruses of this study and all available Statoviruses or Stato-like viruses (Figure 2B,C). The phylogenetic trees based on the RdRp protein generated by either maximum likelihood (Figure 2B) or MrBayes analysis (Supporting Information S1: Figure 3) showed a topology consistent with that constructed using genomic nt sequences. [10][11][12][13] We further compared the RdRp aa identity of the viruses in this study with those in the closest clade of porcine Statoviruses, and found that the identities were 62.4%-100% with each other, but only 37.6%-41.6% with porcine Statoviruses (Supporting Information S1: Figure 4). At present, there is no species or even genus demarcation of the family Tombusviridae according to the current scheme of virus classification by the International Committee on Taxonomy of Viruses (ICTV). 24
We further compared the diversity in different regions of Restvirus genomes between group 1 and group 2 Restviruses (Supporting Information S1: Figure S1). The aa identities of the ORF1 region were 96.0%-98.6% among the three strains in group 1 and 94.7% between the two strains in group 2, whereas the aa identities of ORF1 between the two groups were lower than 40.6%.
The aa identities of ORF2 were 100% within group 1, 99.5% within group 2, and 52.6% between the two groups. The region of ORF1 that did not overlap with ORF2 shared 94.6%-98.0% aa identities within group 1, 92.7% within group 2, and 38.9%-43.5% between the groups. We also aligned the 3′ NTR nt sequences of four Restvirus genomes (two strains in group 1 and two strains in group 2) and used these four genomes to denote the terminal nt.
With respect to the stop codons preceding the 3′ NTR, that of ORF2 was conserved (TGA), whereas that of ORF1 varied, being TAA in group 1 and TAG in group 2.
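The pairwise identities reported in this section can be illustrated with a short sketch. The exact identity definition used for the reported values is not stated, so the choice here (matches divided by aligned columns in which neither sequence has a gap) is an assumption; the function name `percent_identity` and the toy sequences are hypothetical.

```python
# Minimal sketch of pairwise percent identity between two pre-aligned sequences.
# Assumption: columns where either sequence carries a gap are ignored. Other
# conventions (e.g., counting gaps as mismatches) would give different values.


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned nucleotide sequences (made up for illustration).
aligned_1 = "ATGCATGC-A"
aligned_2 = "ATGAATGCTA"
print(round(percent_identity(aligned_1, aligned_2), 1))
```

The same function applies to aligned aa sequences, since it only compares characters column by column.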

| Epidemiological and clinical characteristics of patients with Restviruses
As mentioned above, two of the five patients infected with Restviruses were detected from the meta-transcriptome sequencing output and subsequently confirmed by RT-PCR and Sanger sequencing. The other three were identified by mapping assembled contigs to sequences. Three Restvirus genomes were from two sputum samples and a bronchoalveolar lavage fluid sample of hospitalized elderly patients with severe respiratory disease, and two genomic sequences were obtained from oropharyngeal swabs of two ill children under 1 year of age (Table 1). The interval between illness onset and sample collection ranged from 1 to 3 days. Of the five cases, two occurred in 2016, one in 2017, and two in 2019. All five patients were coinfected with various HCoV-OC43. Three patients were male, and the other two were female.
By aligning the 2655 runs of human respiratory and digestive tract samples from the SRA database, we found that six runs contained reads mapping to Restviruses. However, due to the short length of the reads or insufficient sequencing depth, contigs could not be assembled. Detailed information about the six runs is listed in Supporting Information S1: Table 4. These samples were collected from tracheal aspirates of patients with acute respiratory tract infections.
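Whether mapped reads are sufficient to support assembly can be gauged from the breadth and mean depth of coverage over the reference. The sketch below is illustrative only: it assumes that per-base depths are already available as a list (for example, exported from a tool such as `samtools depth -a`, which the study does not mention), and the function name `coverage_summary` and the depth values are hypothetical.

```python
# Minimal sketch: breadth and mean depth of coverage from per-base depths.
# Assumption: `depths` holds one integer per reference position, including
# zero-depth positions. The values below are made up for illustration.


def coverage_summary(depths, min_depth=1):
    """Return (breadth, mean_depth) for a list of per-base depths.

    breadth is the fraction of positions covered by at least `min_depth` reads.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# Hypothetical depths over a 10-nt stretch of a reference genome.
toy_depths = [0, 0, 3, 5, 5, 2, 0, 1, 0, 0]
breadth, mean_depth = coverage_summary(toy_depths)
print(breadth, mean_depth)
```

A run with low breadth and low mean depth, as in this toy case, would leave large parts of the reference without read support, which is consistent with contigs failing to assemble.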

| DISCUSSION
Here we report the discovery and characterization of Restviruses, which are genetically related to but distinct from the novel taxon of Statoviruses, identified by retrospective analysis of meta-transcriptomic data and confirmed by RT-PCR amplification and subsequent Sanger sequencing from respiratory samples of five patients in Beijing, China. Currently, the accepted criterion for proving a true-positive meta-transcriptomic result is confirmatory testing of the original samples. 27 In this study, we designed specific primers for subsequent confirmatory RT-PCR tests according to the viral reads and contigs present in the meta-transcriptomic data.
Although a virus might be identified and subsequently characterized even on the basis of a single initial read from the meta-transcriptomic data as previously reported, 28,29
Restvirus, even though we were unable to proceed with further assembly and verification. Thus, its public health significance as well as its role as an etiological agent deserve further investigation. The potential infectivity and pathogenicity of a previously unknown virus are often inferred from its closest known relative. This may be rational for new viruses that share a high genome nt identity with the reference virus.
However, this is often problematic if the viral sequence is greatly divergent from the reference virus. Although they are Statovirus-like viruses, the genomic sequences of Restviruses in this study share only 39.9%-44.3% nt identities with Statoviruses. [10][11][12][13] Further investigations are needed to resolve these debates. An example is pepper mild mottle virus, the pathogenicity of which was unclear when it was found in human feces, and which was later shown to be a causative agent in patients with fever, abdominal pains, and pruritus. 6,7 With advances in metagenomic sequencing and analysis, whole-genome sequencing of either known or novel pathogens can now be done directly from clinical samples, helping to accurately elucidate the viral infection and characterization. 30 The discovery of the genomes of Restviruses indicates that metagenomic sequencing is a valuable approach to identifying viruses from clinical samples, which can facilitate ecological and differential diagnosis of infections with known or unknown pathogens. The advantage of the metagenomic approach over RT-PCR lies in its capacity to identify and assemble all known and previously unrecognized virus genomes simultaneously, because metagenomic approaches do not target particular pathogens. 31 This untargeted nature makes it promising for detecting expected pathogens as well as emerging pathogens, such as severe acute respiratory syndrome coronavirus 2, the pathogen of COVID-19. 32 Notably, Restviruses in this study were all detected from the respiratory samples of five patients with HCoV-OC43 infections. 15 The HCoV-OC43 usually causes mild illness and reinfection in humans due to short-lasting protective immunity. 33 The three old
be directly concluded due to a relatively small sample size, the role of Restviruses should not be ignored. As an example, for the recent outbreaks of acute severe hepatitis of unknown etiology in children, 33 agnostic metagenomic sequencing subsequently revealed that the severity of hepatitis is related to coinfections involving adeno-associated virus type 2 and one or more helper viruses such as human adenoviruses, herpesvirus 6B, and Epstein-Barr virus. 34 Therefore, the potential respiratory pathogenic role of Restviruses should not be disregarded.
In the future, it would be valuable to conduct surveillance and
and confirmed by specific reverse transcription-polymerase chain reaction (RT-PCR) and subsequent Sanger sequencing of the amplified products. The evolutionary position and genomic structure of the newly recognized viruses were then investigated, and the epidemiological and clinical characteristics of the patients were described.

2 | METHODS
2.1 | Study design and RNA extraction
A multicenter surveillance study of human seasonal coronavirus (HCoV) infections was performed based on the Respiratory Pathogen Surveillance System (RPSS) at the Beijing Center for Disease Prevention and Control from January 2016 to December 2019 in Beijing, China. Oropharyngeal swabs, bronchoalveolar lavage fluid, and other respiratory samples were collected according to hospital or general practice standard procedures. A total of 321 samples positive for HCoV were collected. Total RNA was extracted using the AllPrep RNA Mini Kit (Qiagen) according to the manufacturer's instructions. To eliminate potential nucleic acid background contamination, all buffers, reagents, and plasticware used for nucleic acid extraction and amplification were subjected to UV irradiation. Furthermore, a blank control of sterile enzyme-free water was included to assess the possibility of cross-contamination and potential reagent contamination.
Based on the classification information available in the NCBI species database, Statoviruses and Statovirus-related viruses are categorized as unclassified viruses. In the evolutionary analysis of the whole genome, the viruses in this study clustered into a superclade between the families Flaviviridae and Tombusviridae. According to the commonly accepted species demarcation threshold (80% identity of the genome-wide nt sequence, and 90% aa identity of the RdRp protein), 25,26 the viruses identified from the respiratory tracts of patients in this study and in Viet Nam should be sufficiently divergent to represent a new member, considering the low identity of both the genome and the RdRp protein sequences relative to the taxon of Statoviruses and their distance in the phylogenetic trees. We provisionally named them "Restviruses" for Respiratory Statovirus-like viruses, given their origin in respiratory tract samples and their initial identifiable sequence alignment to Statoviruses.
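The demarcation reasoning above can be written as a simple check. This is a sketch only: the two thresholds are those quoted in the text, whereas requiring both criteria to be met at once, and the function name `below_demarcation`, are assumptions made for illustration.

```python
# Minimal sketch: compare identities against the species demarcation thresholds
# quoted in the text (80% genome-wide nt identity, 90% RdRp aa identity).
# Assumption: a virus is treated as sufficiently divergent only when BOTH
# identities fall below their thresholds; the text does not spell this out.

GENOME_NT_THRESHOLD = 80.0  # percent, genome-wide nt identity
RDRP_AA_THRESHOLD = 90.0    # percent, RdRp aa identity


def below_demarcation(genome_nt_identity, rdrp_aa_identity):
    """Return True if both identities are below the quoted thresholds."""
    return (genome_nt_identity < GENOME_NT_THRESHOLD
            and rdrp_aa_identity < RDRP_AA_THRESHOLD)


# Highest identities to Statoviruses reported in this study:
# 44.3% genome nt identity and 41.6% RdRp aa identity.
print(below_demarcation(44.3, 41.6))
```

Using the highest reported identities is the conservative case: if these fall below the thresholds, so do all lower pairwise values.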

3.3 | Genomic characteristics of Restviruses
Genome sequence analyses showed that Restviruses had single, positive-strand RNA genomes. Restviruses had a putative genome organization with only two ORFs, flanked by relatively short 3′ nontranslated RNA (NTR) segments of 20-26 nt. The larger ORF1 spanned most of the genome, comprising 1521 aa and 1485 aa in group 1 and group 2, respectively, and encoded a conserved RdRp Superfamily I domain. The smaller ORF2 encoded predicted coat proteins of 404 aa and 401 aa in group 1 and group 2, respectively, and was fully overlapped by ORF1 (Figure 1, Supporting Information S1: Figure 5). Although Statoviruses sometimes had an out-of-phase overlap between ORF1 and ORF2, the genome characteristics of Restviruses and Statoviruses were basically consistent, suggesting a similar evolutionary trajectory.

FIGURE 2 Phylogenetic tree of Restviruses. (A) Phylogeny of Restviruses and Stato-like virus based on the complete genome. (B) Phylogeny of Restviruses and Stato-like virus based on the RdRp domain. (C) Phylogeny of Restviruses and Stato-like virus based on the capsid protein. Branch supports obtained from 1000 bootstrap replicates are shown. The Restviruses obtained in this study are marked with red points.
failure, one of whom was transferred to the intensive care unit. Two old patients were recorded to have at least one underlying chronic illness, such as coronary heart disease, hypertension, and cerebral vascular accident. Three hospitalized patients displayed abnormal results on chest computed tomography (CT) or X-ray radiography examination. One patient had an abnormally high white blood cell count, indicating possible coinfection with bacteria. The patient who underwent blood gas analysis had the highest arterial oxygen partial pressure (PaO2), possibly due to oxygen therapy. One patient had an abnormally low PaO2 and showed an arterial oxyhemoglobin saturation below 95%. The partial pressure of carbon dioxide (PaCO2) of two patients was outside the normal range of 35-45 mmHg: one (case 1) had a PaCO2 below 35 mmHg and the other (case 2) above 50 mmHg (Table
we obtained complete or nearly complete genome sequences of Restviruses in this study through either de novo assembly or specific RT-PCR and subsequent sequencing. The genome sequences of Restviruses share only 39.9%-44.3% nt identities with the most closely related taxon of Statoviruses, and form a well-supported clade separate from Statoviruses (Figure 2). In addition, Restviruses have a different genome organization in comparison with the closely related Statoviruses. Typically, Tombusviruses possess 3-5 kb sized genomes with 3-5 ORFs but lack a large ORF that spans the entire genome.
8 Although both Restviruses and Statoviruses have two overlapping ORFs, their overlapping patterns are different. The smaller ORF2 of Restviruses is fully overlapped by the larger ORF1, while the smaller ORF of Statoviruses has an out-of-phase overlap with ORF1 (Figure 1). Both the phylogenetic analysis and the genomic structure support the argument that Restviruses should be a new member distinct from Statoviruses and other known viruses. The Restviruses were detected in samples from both lower and upper respiratory tracts, including sputum, bronchoalveolar lavage fluid, and oropharyngeal swabs; thus, possible contamination from inhaled air or ingested food can largely be excluded, suggesting that Restviruses are likely to be infective, opportunistically infective, or even pathogenic to humans. The presence of similar viruses in nasal-throat swabs of Vietnamese patients experiencing acute respiratory disease 14 further supports the view that the emerging Restvirus might be globally distributed among humans. Furthermore, six runs in the SRA database from respiratory samples contained some reads mapping to
patients with coinfections of Restviruses and HCoV-OC43 were hospitalized due to severe illness with abnormal results on CT or X-ray radiography examination. Although it remains unclear if Restvirus is an opportunistic infection in the general population, or in HCoV-OC43-infected cases, or if it has contributed to causing more severe cannot
SONG ET AL. | 7 of 9
screening for individuals infected with Restviruses within the population to gain a more comprehensive understanding of the impact of Restviruses on humans. The main limitation of the study is that paired sera from the acute and convalescent phases are not available from the Restvirus-infected patients for antibody detection, because this is a retrospective study based on previously obtained meta-transcriptomic data. Secondly, the limited sample size prevented us from isolating the virus and conducting extensive laboratory investigations to determine the pathogenicity of Restviruses. Thirdly, the coinfection with HCoV-OC43 makes us unable to conclude whether Restviruses are the main etiological agent or just a helper pathogen in the patients with acute respiratory disease. Further surveillance should be conducted to investigate the presence of Restviruses in humans and other animals to comprehensively assess their pathogenicity. In summary, the discovery of a previously unrecognized virus, Restvirus, in patients with acute respiratory disease highlights the continuous emergence of respiratory pathogens in humans, and warrants further investigations to characterize its pathogenicity and clinical importance. Metagenomic analysis is a promising and applicable approach to the detection of respiratory viral pathogens in clinical samples independent of known viral genome sequences.