Polycipiviridae: a proposed new family of polycistronic picorna-like RNA viruses

Solenopsis invicta virus 2 is a single-stranded positive-sense picorna-like RNA virus with an unusual genome structure. The monopartite genome of approximately 11 kb contains four open reading frames in its 5′ third, three of which encode proteins with homology to picornavirus-like jelly-roll fold capsid proteins. These are followed by an intergenic region, and then a single long open reading frame that covers the 3′ two-thirds of the genome. The polypeptide translation of this 3′ open reading frame contains motifs characteristic of picornavirus-like helicase, protease and RNA-dependent RNA polymerase domains. An inspection of public transcriptome shotgun assembly sequences revealed five related, apparently nearly complete virus genomes isolated from ant species and one from a dipteran insect. By high-throughput sequencing and in silico assembly of RNA isolated from Solenopsis invicta and four other ant species, followed by targeted Sanger sequencing, we obtained nearly complete genomes for four further viruses in the group. Four further sequences were obtained from a recent large-scale invertebrate virus study. The 15 sequences are highly divergent (pairwise amino acid identities as low as 17 % in the non-structural polyprotein), but possess the same overall polycistronic genome structure, which is distinct from all other characterized picorna-like viruses. Consequently, we propose the formation of a new virus family, Polycipiviridae, to classify this clade of arthropod-infecting polycistronic picorna-like viruses. We further propose that this family be divided into three genera: Chipolycivirus (2 species), Hupolycivirus (2 species) and Sopolycivirus (11 species), with members of the latter infecting ants in at least 3 different subfamilies.


INTRODUCTION
The order Picornavirales currently comprises the families Dicistroviridae, Iflaviridae, Marnaviridae, Picornaviridae and Secoviridae, and the unassigned genera Bacillarnavirus and Labyrnavirus. Members of the order are characterized by (i) a positive-sense RNA genome, usually with a 5′ covalently linked VPg (virus protein, genome linked) and a 3′ poly(A) tail, (ii) a polyprotein gene expression strategy with cleavage mainly mediated by viral protease(s), (iii) a structural protein module containing three jelly-roll capsid protein domains which form small non-enveloped icosahedral virions with pseudo T=3 symmetry and (iv) a non-structural protein module containing a superfamily III helicase (or NTPase), a 3C-like chymotrypsin-like protease and a superfamily I RNA-dependent RNA polymerase (RdRp), encoded sequentially in that order [1]. In contrast, the families Caliciviridae and Potyviridae (for example), although termed picorna-like (and informally grouped into a 'picorna-like superfamily'), are excluded from the order Picornavirales for various reasons: caliciviruses encode a single jelly-roll capsid protein and their virions have true T=3 symmetry, while potyviruses have an unrelated capsid protein and filamentous virions, and encode a superfamily II instead of a superfamily III helicase.

Solenopsis invicta virus 2 (SINV-2) is a positive-sense single-stranded RNA virus that infects the red imported fire ant, Solenopsis invicta Buren, an invasive species in the southern USA [2]. Replicating virus was detected in the larval and adult stages of S. invicta [3]. While field infection rates are rather low for S. invicta [4,5], there are distinct fitness costs for founding queens infected with the virus [6]: infected queens produced less brood and had longer claustral periods. SINV-2 infection was also associated with significant upregulation of global gene expression in S. invicta queens, including immune response genes [6]. The monopartite genome was originally reported to contain three open reading frames (ORFs) in its 5′ third and a single long ORF in its 3′ two-thirds; we show here that there are actually four consecutive 5′ ORFs, one of which was previously overlooked because of a sequencing error. The polypeptide encoded by the 3′ ORF contains motifs characteristic of a superfamily III helicase, a 3C-like chymotrypsin-related protease and a superfamily I RdRp [2]. A phylogenetic analysis of RdRp sequences placed SINV-2 within the 'picorna-like superfamily', but outside established virus families [7]. Isometric particles ~33 nm in diameter were found only in ants that tested positive for SINV-2 by RT-PCR [2].
By searching the NCBI transcriptome shotgun assembly (TSA) database, we identified six related sequences, five of which were derived from ant samples. To further explore the diversity and prevalence of this group of viruses, we performed high-throughput sequencing of RNA derived from Solenopsis invicta and four UK ant species, which led to the identification of four additional viruses. During the course of this work, 1445 new viruses were identified via high-throughput sequencing [8], four of which exhibited genome organizations similar to that of SINV-2. We performed a phylogenetic and comparative genomic analysis of these SINV-2-like virus sequences. All have a characteristic genome organization, with four consecutive 5′-proximal ORFs and one long 3′ ORF. ORFs 1, 3 and 4 are predicted to encode jelly-roll fold capsid proteins, while ORF2 encodes a protein of unknown function. Ant-infecting members of the group apparently have an additional 5′ ORF (ORF2b) overlapping the 5′ end of ORF2 and encoding a small protein containing a predicted transmembrane domain. The 3′ ORF5 encodes helicase, protease and RdRp domains, is presumed to encode a VPg and potentially another protein between the helicase and protease, and is likely to encode one or more additional proteins upstream of the helicase. The unusual genome organization and phylogenetic distinctness of this group of viruses suggest that they should be classified into a new virus family, for which we propose the name Polycipiviridae (polycistronic picorna-like viruses). The characteristic complement of picornavirus-like protein domains suggests that the Polycipiviridae should be included within the order Picornavirales.

Identification of SINV-2-like viruses
The SINV-2 genome was first sequenced by Valles et al. [2]. Suspicious of a large, apparently non-coding gap following ORF1, we resequenced the SINV-2 genome. This resulted in the correction of a UGA stop codon to a UGG tryptophan codon, after which it became apparent that the region following ORF1 is actually occupied by an ORF. The reannotated SINV-2 ORFs are shown in Fig. 1. The new SINV-2 sequence has been deposited in GenBank under accession MF041813.1 and has 143 nucleotide differences relative to the previous SINV-2 sequence, EF428566.1.
Further high-throughput sequencing of Solenopsis invicta RNA samples revealed the presence of a new SINV-2-like virus, which we named Solenopsis invicta virus 4 (SINV-4). A complete genome was obtained by Sanger sequencing and has been deposited in GenBank with the accession number MF041808.1. SINV-4 has the same genome organization (Fig. 2) and 31 % amino acid identity to SINV-2 in the ORF5 polypeptide.
Next, we queried the NCBI TSA database using TBLASTN, with the SINV-2 ORF5 polypeptide as the query and eukaryotes as the target taxonomic group. Amongst the top-matching hits, long sequences (>10 kb) were identified, and their longest ORF was extracted, translated and queried against NCBI RefSeq virus genome sequences. Sequences whose reciprocal best match was SINV-2 were retained, while other sequences (typically with top matches to iflavirus or dicistrovirus sequences) were discarded. This resulted in five long sequences with reciprocal best matches to SINV-2. An additional long sequence was obtained by merging the partly overlapping fragments LH935078.1 and LH935077.1. Five of the six long sequences derive from ant RNA-Seq libraries (order Hymenoptera, family Formicidae; LA858223.1 and LA866448.1 from Monomorium pharaonis, LI526777.1 from Lasius neglectus, LI719284.1 from Linepithema humile and the combined LH935078.1+LH935077.1 from Formica exsecta). The sixth sequence, KA182589.1, derives from a Chironomus riparius RNA-Seq library (order Diptera, family Chironomidae). Sequence attributes are recorded in Table 1 and the genome organizations are depicted in Fig. 2. A number of shorter partial genome sequences were also identified, but are not discussed further here.
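The reciprocal-best-match filter described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used: it assumes the TBLASTN candidates and their reverse searches against RefSeq viruses have already been run, and that each reverse search is summarized as (subject, E-value) pairs. The function names, the "SINV-2" label and the toy contig IDs and E-values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a reverse-search hit: (RefSeq virus label, E-value).
Hit = Tuple[str, float]


def best_hit(hits: List[Hit]) -> str:
    """Return the subject with the lowest E-value (the best match)."""
    return min(hits, key=lambda h: h[1])[0]


def reciprocal_filter(
    lengths: Dict[str, int],
    reverse_hits: Dict[str, List[Hit]],
    reference: str = "SINV-2",
    min_length: int = 10_000,
) -> List[str]:
    """Keep long TSA candidates whose reciprocal best match is the reference virus."""
    kept = []
    for acc, length in lengths.items():
        hits = reverse_hits.get(acc, [])
        # Discard sequences of 10 kb or less and those without any reverse hit.
        if length <= min_length or not hits:
            continue
        # Candidates whose best match is another virus (e.g. an iflavirus) are dropped.
        if best_hit(hits) == reference:
            kept.append(acc)
    return kept


# Toy example with invented values, for illustration only.
lengths = {"contig_A": 11_200, "contig_B": 12_050, "contig_C": 4_300}
reverse_hits = {
    "contig_A": [("SINV-2", 1e-80), ("CrPV", 1e-20)],
    "contig_B": [("DWV", 1e-90), ("SINV-2", 1e-30)],
    "contig_C": [("SINV-2", 1e-50)],
}
print(reciprocal_filter(lengths, reverse_hits))  # ['contig_A']
```

Ranking hits by E-value is one reasonable choice here; ranking by bit score would usually give the same best hit.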
Given that SINV-2-like viruses appeared to be most prevalent in ants, we obtained samples from four different ant species (Lasius flavus, L. niger, L. neglectus and Myrmica scabrinodis) collected in Cambridge, UK, and performed two rounds of high-throughput RNA sequencing. Contigs were assembled using Trinity [9,10] and Velvet [11], and SINV-2-like sequences were identified with BLASTX.
In the first round, we sequenced L. niger and L. flavus RNA samples using both standard RNA-Seq and small-RNA sequencing, the latter to enrich for the 21-22 nt virus-derived RNAs expected to be produced by the insect RNA interference (RNAi) antiviral defence pathway. Good contig assemblies were not obtained from the small-RNA samples, so no further small-RNA sequencing was performed. From the long RNA-Seq data, we identified one SINV-2-like sequence for L. niger but none for L. flavus. In the second round, we performed long RNA-Seq on RNA obtained from M. scabrinodis, L. neglectus and a second L. flavus sample, and identified SINV-2-like sequences for M. scabrinodis and L. neglectus. Morphological ant species identifications […]

Using Bowtie 2 [12], we mapped RNA-Seq reads (>30 nt) back to the assembled virus genomes to assess coverage and variation. LniV-1 (Lasius niger virus 1), LneV-1 (Lasius neglectus virus 1) and MsaV-1 (Myrmica scabrinodis virus 1) had mean coverage values of 6.4, 457 and 1348, respectively. For the two viruses with high coverage (Fig. 3), we identified single nucleotide polymorphisms (SNPs) present in >10 % of reads.
In MsaV-1, we found seven SNPs in coding regions, all non-synonymous and with frequencies just over 10 %. In LneV-1 […]
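A minimal Python sketch of the coverage and SNP-frequency calculations is shown below. It assumes that per-position depths and per-position base counts have already been extracted from the Bowtie 2 alignments (for example with a pileup tool), and it ignores read strand, which Fig. 3 does separate. The function names and toy data are hypothetical; the 10 % cutoff and the 1000-nt window mirror the values given in the text and in the Fig. 3 legend.

```python
from typing import Dict, List, Tuple


def mean_coverage(depths: List[int]) -> float:
    """Mean read depth across all genome positions."""
    return sum(depths) / len(depths) if depths else 0.0


def sliding_mean(depths: List[int], window: int = 1000) -> List[float]:
    """Mean depth in every run of `window` consecutive positions (via prefix sums)."""
    if len(depths) < window:
        return []
    prefix = [0]
    for d in depths:
        prefix.append(prefix[-1] + d)
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(depths) - window + 1)]


def call_snps(ref: str, counts: List[Dict[str, int]],
              min_freq: float = 0.10) -> List[Tuple[int, str, str, float]]:
    """Report non-reference bases seen in more than `min_freq` of reads.

    Returns tuples of (1-based position, reference base, alternative base, frequency).
    """
    snps = []
    for pos, (ref_base, base_counts) in enumerate(zip(ref, counts), start=1):
        depth = sum(base_counts.values())
        if depth == 0:
            continue
        for base, n in sorted(base_counts.items()):
            freq = n / depth
            if base != ref_base and freq > min_freq:
                snps.append((pos, ref_base, base, round(freq, 3)))
    return snps


# Toy pileup: position 2 carries a T in 15 % of reads, position 3 an A in 5 %.
ref = "ACG"
counts = [{"A": 100}, {"C": 85, "T": 15}, {"G": 95, "A": 5}]
print(call_snps(ref, counts))                 # [(2, 'C', 'T', 0.15)]
print(sliding_mean([1, 2, 3, 4], window=2))   # [1.5, 2.5, 3.5]
```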
In summary, we identified a total of 15 SINV-2-like sequences (Table 1), comprising a corrected version of the original SINV-2 sequence, SINV-4, six sequences from the NCBI TSA database, three sequences from UK ant species and four additional recently published sequences. Fourteen of the sequences appear to represent complete or nearly complete virus genomes.

Genome organization of SINV-2-like viruses
Each full-length sequence had five main ORFs (Fig. 2). Consecutive ORFs in the 5′ region have closely spaced (often overlapping) stop and start codons (Table 1). We also identified a shorter sixth ORF (termed ORF2b) in some members of the group (Fig. 2). ORF2b overlaps the 5′ end of ORF2 in the +1 frame relative to ORF2. The Formica exsecta TSA contains an additional short ORF (ORF3a) between ORFs 2 and 3. Hubei picorna-like virus 81 (accession number KX884540 [8]) appears to be incomplete, lacking ORFs 1 and 2 and most of ORF3. Sequences with ORF2b cluster phylogenetically (Fig. 4; see below) and all are ant-associated, except for Shuangao insect virus 8, which was derived from an 'insect mix' that was not reported to contain members of the family Formicidae [8]. However, insects within this sample may include predators of ants; thus the sequence could potentially have originated from an ant (or other) host.
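To make the ORF terminology concrete, the sketch below scans the three forward frames of a positive-sense genome for AUG-initiated ORFs, records each ORF's frame, and measures the spacing between the stop codon of one ORF and the start codon of the next (a negative spacing indicates overlap). It is a simplified illustration under stated assumptions (forward strand only, standard stop codons, the first AUG after each stop taken as the start, a hypothetical minimum length), not the annotation procedure used for Fig. 2; all names and the toy sequence are hypothetical.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
Orf = Tuple[int, int, int]  # (0-based start, exclusive end including stop, frame)


def find_orfs(seq: str, min_aa: int = 100) -> List[Orf]:
    """Find forward-strand ORFs from the first ATG after a stop to the next stop."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3  # number of sense codons before the stop
                if n_aa >= min_aa:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


def stop_start_spacing(orfs: List[Orf]) -> List[int]:
    """Nucleotides between each ORF's end and the next ORF's start (<0 = overlap)."""
    return [nxt[0] - cur[1] for cur, nxt in zip(orfs, orfs[1:])]


def relative_frame(a: Orf, b: Orf) -> int:
    """Frame of ORF b relative to ORF a (0, +1 or +2)."""
    return (b[0] - a[0]) % 3


# Toy genome: the UGA stop of the first ORF overlaps the AUG of the second (AUGA).
toy = "AUGCCCAAAUGAAACCCUAG"
orfs = find_orfs(toy, min_aa=2)
print(orfs)                              # [(0, 12, 0), (8, 20, 2)]
print(stop_start_spacing(orfs))          # [-4]
print(relative_frame(orfs[0], orfs[1]))  # 2
```

In this convention, an ORF2b lying in the +1 frame relative to ORF2 would give `relative_frame(orf2, orf2b) == 1`.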
Most of the 15 sequences include some portion of the 5′ and 3′ untranslated regions (UTRs), though, given the variability in UTR lengths, we assume that several are incomplete. The 5′ UTR sequences were frequently >200 nt and ranged up to 366 nt in length. Eight sequences extend to the poly(A) tail and may therefore be assumed to be 3′-complete. For these sequences, the 3′ UTR lengths ranged from 385 to 479 nt, excluding the poly(A) sequence. The length of the intergenic region between ORFs 4 and 5 ranged from 336 to 768 nt. To assess 5′-completeness, we compared 5′ nucleotide sequences between different species (Fig. S1, available in the online Supplementary Material).
A number of polycipivirus sequences exhibit a predicted stable, simple stem-loop structure close to the 5′ end. In SINV-4, SINV-2 and the L. neglectus TSA, this is preceded by an AU-rich tract with a 5′-terminal UUU. This similarity of the 5′ ends of these highly divergent sequences suggests that it represents the true 5′ end of the genome. Relative to these, the MsaV-1 and LneV-1 sequences appear to be missing approximately 5 and 23 nt, respectively, from their 5′ ends.
For the LniV-1 sequence we were unable to obtain complete 5′ or 3′ UTR sequences.
For each ORF, functions were predicted with HHpred [13] using the Pfam [14] and PDB [15] databases. For most sequences, ORFs 1, 3 and 4 were predicted to encode picornavirus-like capsid proteins (Fig. 2). For the most divergent sequences (Hubei picorna-like viruses 81 and 82, and the C. riparius TSA), ORF1 was predicted to encode a picornavirus-like capsid protein, but for ORF4 this was predicted only for Hubei picorna-like virus 81, and for ORF3 it was not predicted for any of the three. However, for ORFs 3 and 4, HHpred [16,17] found significant alignments (E-values <0.0001; alignment lengths ranging from 112 to 245 amino acids) between an alignment of the 11 sequences from the SINV-4/SINV-2 clade and either a query alignment of the C. riparius TSA and Hubei picorna-like virus 82 or the single Hubei picorna-like virus 81 sequence, suggesting that these ORFs encode homologous proteins across all 14 sequences. For all sequences, ORF5 was predicted to encode helicase (Hel), protease (Pro) and RNA-dependent RNA polymerase (RdRp) domains. ORFs 2 and 2b had no HHpred matches; however, TMHMM [18] predicted a transmembrane domain in the middle of all ORF2b amino acid sequences. HHpred indicated homology between ORF2 of the SINV-4/SINV-2 clade and ORF2 of Hubei picorna-like virus 81 (E-value <10⁻⁸; alignment length 126 amino acids), but homology to ORF2 of the C. riparius TSA/Hubei picorna-like virus 82 clade was less certain (E-value=0.029; alignment length 165 amino acids). The Formica exsecta TSA ORF3a polypeptide also had no significant HHpred matches.
Following Koonin and Dolja [19], we identified characteristic Hel, Pro and RdRp motifs in the ORF5 polypeptide sequences. All sequences contained the three superfamily III helicase motifs (Fig. S2). The protease domain is less well conserved across picorna-like viruses, with only three very short characteristic motifs, corresponding to the catalytic triad H, D and C (the C lying within a fairly conserved GxCG motif) [19]. Positive-sense RNA viruses, and in particular picorna-like viruses, usually have chymotrypsin-like cysteine proteases. In the SINV-2-like sequences, however, the protease has a serine (S) at the corresponding active site, GxSG (Fig. S3). We were able to find all eight conserved RdRp motifs (Fig. S4), with the motifs most closely matching superfamily I RdRps (a group that also includes the RdRps of picornaviruses, potyviruses, sobemoviruses and nidoviruses [19]). In two sequences (the C. riparius TSA and Hubei picorna-like virus 82), the usually very well-conserved GDD in motif VI (also often called motif C) was replaced with ADD.
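The two unusual residues discussed here (the serine in the protease GxSG motif and the alanine in motif VI) can be flagged with simple regular expressions, as in the Python sketch below. This is only a screening aid under stated assumptions: motifs this short also occur by chance, so in practice any hit must be interpreted in the context of a multiple sequence alignment, as in the motif analysis described above. The patterns, function name and toy sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Protease motif GxCG / GxSG; the catalytic nucleophile (C or S) is captured.
PROTEASE_MOTIF = re.compile(r"G.([CS])G")
# Motif VI (motif C): canonical GDD plus the SDD and ADD variants discussed in the text.
MOTIF_VI = re.compile(r"[GSA]DD")


def scan_motifs(protein: str) -> Tuple[List[Tuple[int, str, str]], List[Tuple[int, str]]]:
    """Return protease-motif and motif VI hits with 1-based positions."""
    protease = [
        (m.start() + 1, m.group(0), "serine" if m.group(1) == "S" else "cysteine")
        for m in PROTEASE_MOTIF.finditer(protein)
    ]
    rdrp = [(m.start() + 1, m.group(0)) for m in MOTIF_VI.finditer(protein)]
    return protease, rdrp


# Toy polypeptide with a GxSG protease motif and an ADD motif VI.
protease_hits, rdrp_hits = scan_motifs("MKGASGLLWADDK")
print(protease_hits)  # [(3, 'GASG', 'serine')]
print(rdrp_hits)      # [(10, 'ADD')]
```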

Phylogeny of SINV-2-like viruses
Based on the genome structure and the identified protein domains, SINV-2-like sequences may be classified within the order Picornavirales. To test for monophyly, we obtained amino acid sequences for the conserved 'core' region of the RdRp from viruses in the order Picornavirales from the supplementary material of [7], appended the equivalent region from the 15 SINV-2-like sequences with RdRp coverage, realigned the sequences with MUSCLE [20] and generated a Bayesian Markov chain Monte Carlo-based phylogenetic tree using MrBayes [21]. The SINV-2-like sequences form a monophyletic (albeit highly divergent) group (Fig. 5).

DISCUSSION
We have identified and sequenced four new viruses from ant species. Together with SINV-2, six sequences recovered from TSA databases and four recent additions to GenBank, these 15 sequences form a distinct group of arthropod-infecting viruses with a characteristic genome organization.
Members have a polyadenylated positive-sense RNA genome that encodes three related picornavirus-like jelly-roll capsid domains and a non-structural polyprotein containing superfamily III helicase, 3C-like chymotrypsin-related protease and superfamily I RdRp domains, characteristic of members of the order Picornavirales. We therefore propose that they should be classified into a new family, Polycipiviridae (polycistronic picorna-like viruses), within the order Picornavirales. The virion morphology (small icosahedral particles) observed for SINV-2 [2] is also consistent with this placement. Like the dicistronic dicistroviruses (family Dicistroviridae) and the bipartite cheraviruses, sadwaviruses and comoviruses (family Secoviridae), polycipiviruses have their coding sequences split into separate non-structural and structural protein modules. In contrast to these other groups, however, the polycipivirus structural protein module is further split into separate 5′ ORFs rather than depending on polyprotein expression of multiple jelly-roll domains from a single ORF.
Although polycipiviruses form a distinct clade, there is considerable diversity among the 15 available sequences (some ORF5 pairwise amino acid identities are as low as 17 %), indicating that the family should be split into a number of genera. We propose the following groupings of the currently available sequences (see Fig. 4): Chipolycivirus (2 species), Hupolycivirus (2 species) and Sopolycivirus (11 species).

The polycipivirus 3C-like protease is unusual in that it contains a serine residue at its active site (serine protease), whereas the majority of Picornavirales 3C-like proteases have a cysteine residue at this location (cysteine protease).
The only other currently known exceptions are the sole member of the Picornavirales family Marnaviridae and some members of the genus Nepovirus in the family Secoviridae [1,22]. The 'picorna-like' astroviruses, sobemoviruses and poleroviruses also have chymotrypsin-like serine proteases, and cellular homologues likewise have serine at the active site [23,24].
The most conserved protein of positive-sense RNA viruses is the RdRp. In polycipiviruses, we were able to identify all eight signature motifs of the superfamily I RdRp. Unusually, however, in two of the sequences the highly conserved GDD of motif VI was replaced with ADD. Motif VI is responsible for magnesium ion coordination. The first aspartate (D) is mainly responsible for the coordination and cannot be substituted. There is, however, potential for flexibility in the third position, and even more flexibility in the first position (reviewed in [25]). Indeed, the glycine (G) has been experimentally substituted with six different amino acids, and with some substitutions the polymerase remains active. Substitution with alanine (ADD) in tobacco vein mottling virus (family Potyviridae), poliovirus (family Picornaviridae) or hepatitis C virus (family Flaviviridae) results in in vitro RdRp activity of 5 to 12 % of wild-type activity [26-28], although when the same mutation was introduced into encephalomyocarditis virus (family Picornaviridae) the RdRp was inactive in vitro [29]. Positive-sense and double-stranded RNA viruses have a strong preference for GDD, although nidoviruses (order Nidovirales) and some, but not all, hypoviruses (family Hypoviridae) have SDD at this site. On the other hand, non-segmented negative-sense RNA viruses generally have GDN, whereas segmented negative-sense RNA viruses have SDD, and the reverse transcriptases of retroviruses have MDD [30-33]. Permutotetraviruses and birnaviruses have a permuted polymerase domain, with motif VI occurring between motifs III and IV; the GDD sequence is still present in permutotetraviruses but is substituted with ADN in birnaviruses, and mutation of ADN to GDD results in an almost complete loss of activity [34,35]. In summary, therefore, even though the ADD in motif VI of two polycipivirus sequences is a deviation from the expected GDD, it is still a plausible variation. The fact that it was observed in two independent sequences (which also cluster phylogenetically and are relatively distant from the other polycipivirus sequences; Fig. 4) indicates that ADD is a real variant and not the result of a sequencing error.

Fig. 5. Phylogenetic tree for polycipiviruses and representative members of the order Picornavirales. Core RdRp amino acid sequences from representative Picornavirales viruses were obtained from Koonin et al. [7] and combined with the equivalent regions from the 15 sequences in Table 1 with RdRp coverage. Sequences were aligned with MUSCLE, and a Bayesian Markov chain Monte Carlo-based phylogenetic tree was produced. Posterior probabilities are indicated for family root nodes, and elsewhere if P<1.00. Abbreviations: ALSV, apple latent spherical virus; BBWV1, broad bean wilt virus 1; CPMV, cowpea mosaic virus; CRLV, cherry rasp leaf virus; CrPV, cricket paralysis virus; DCV, Drosophila C virus; DWV, deformed wing virus; EMCV, encephalomyocarditis virus; FMDV, foot-and-mouth disease virus; GFLV, grapevine fanleaf virus; HaRNAV, Heterosigma akashiwo RNA virus; HAV, hepatitis A virus; HplV-81, Hubei picorna-like virus 81; HplV-82, Hubei picorna-like virus 82; HRV1A, human rhinovirus 1A; IFV, infectious flacherie virus; LneV-1, Lasius neglectus virus 1; LniV-1, Lasius niger virus 1; MsaV-1, Myrmica scabrinodis virus 1; NIMV, navel orange infectious mottling virus; PnPV, Perina nuda picorna-like virus; PV, poliovirus; PYFV, parsnip yellow fleck virus; RTSV, rice tungro spherical virus; SDV, satsuma dwarf virus; ShiV-8, Shuangao insect virus 8; SINV-2, Solenopsis invicta virus 2; SINV-4, Solenopsis invicta virus 4; TRSV, tobacco ringspot virus; TrV, Triatoma virus; TSV, Taura syndrome virus.
Where investigated, Picornavirales species have been found to harbour a VPg protein covalently linked to the 5′ end of their genome that primes RNA synthesis during genome replication. The Picornavirales VPg is typically <5 kDa and is normally encoded between Hel and Pro [1]. Because of their small size and lack of structural domains, divergent VPg proteins are not easily recognizable by sequence homology, and we were not able to definitively identify a VPg domain in polycipivirus sequences. Nonetheless, there is a large region of unassigned function between Hel and Pro that we presume encodes a VPg and probably (given the size of the region) another protein of unknown function. One or two further proteins are likely encoded upstream of Hel in the ORF5 polyprotein. Owing to the high divergence between polycipiviruses and the picorna-like viruses whose cleavage sites have been characterized, we were unable to definitively predict the polyprotein cleavage sites.
As in other Picornavirales species, gene expression in polycipiviruses likely depends on internal ribosome entry site (IRES)-mediated initiation. Consistent with this, the lengthy 5′ UTRs typically contain a number of AUG codons that would be expected to inhibit 5′ end-dependent scanning. We therefore suppose that the 5′ UTR contains an IRES that directs ribosome initiation at the ORF1 start codon. Similarly, we suppose that the long intergenic region between ORFs 4 and 5 contains a second IRES that directs ribosome initiation at the ORF5 start codon. A similar situation occurs in dicistroviruses (family Dicistroviridae), where translation of the structural and non-structural polyproteins is directed by separate IRESes (although in that case the non-structural polyprotein ORF is 5′-proximal). Although we attempted to define the potential IRES RNA structures in silico by means of RNA-folding algorithms and inter-species comparisons, the results to date have been inconclusive. The translation mechanism of the additional 5′ ORFs (ORFs 2 to 4 and, where present, 2b) remains uncertain; however, the close spacing of the stop and start codons of consecutive ORFs suggests a ribosome reinitiation mechanism. Additional work will be required to confirm the composition and structure of virus particles, the presence and sequence of a 5′-linked VPg protein, the non-structural polyprotein cleavage sites and products, and the gene expression mechanisms of this unusual family of arthropod-infecting picorna-like viruses.
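The argument that upstream AUG codons would hinder 5′ end-dependent scanning can be checked with a short script that lists every AUG lying entirely within the 5′ UTR (that is, before the annotated ORF1 start) together with its frame relative to ORF1. The sketch below assumes a 0-based ORF1 start coordinate; the function name and toy sequence are hypothetical, and it makes no attempt to predict IRES structure.

```python
from typing import List, Tuple


def upstream_augs(genome: str, orf1_start: int) -> List[Tuple[int, int]]:
    """List (0-based position, frame relative to ORF1) for AUGs in the 5' UTR.

    A relative frame of 0 means the AUG is in frame with the ORF1 start codon.
    """
    utr = genome[:orf1_start].upper().replace("T", "U")
    hits = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] == "AUG":
            hits.append((i, (i - orf1_start) % 3))
    return hits


# Toy genome: an 11-nt 5' UTR containing two out-of-frame AUGs, then ORF1.
toy = "GAUGCCAUGCC" + "AUGAAAUAA"
print(upstream_augs(toy, orf1_start=11))  # [(1, 2), (6, 1)]
```

An upstream AUG in frame 0 would extend ORF1 at its N terminus unless a stop codon intervenes; the others would start short out-of-frame upstream ORFs.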
Ten of the 11 members of the proposed Polycipiviridae genus Sopolycivirus appear to infect ant species, while the eleventh member (Shuangao insect virus 8) is apparently an insect virus whose host has yet to be identified (Fig. 4). These viruses have been isolated from ants on several continents and from four of the five ant species targeted in this study. Moreover, they have been identified in three different ant subfamilies, and three individual ant species (Solenopsis invicta, Monomorium pharaonis and Lasius neglectus) were found to host more than one divergent virus in the group, suggesting a long evolutionary history between Sopolycivirus species and ants. Although we cannot rule out other hosts, and our own sampling has been biased towards discovering new ant-infecting members, it seems possible that the genus Sopolycivirus is an ant-specific clade.

METHODS
SINV-4 identification and sequencing
Adult worker Solenopsis invicta ants were collected from 46 nests in the eastern portion of the province of Formosa, Argentina, and returned to quarantine in Gainesville, Florida, USA. Total RNA was extracted from 10 to 15 live worker ants from each colony using Trizol (Invitrogen) and the PureLink purification kit (Ambion). Total RNA (10 µg per group) was submitted to GE Healthcare (Los Angeles, CA, USA) for mRNA purification, library preparation and Illumina RNA sequencing (MiSeq). Sequences were aligned to the S. invicta genome (GenBank accession AEAQ01000001.1), and non-matching sequences were compared to the UniProt annotated Swiss-Prot protein sequence database (download date 14 November 2014). The unmatched sequences were assembled (Vector NTI, Invitrogen), and a unique sequence with significant similarity to SINV-2 was identified by BLASTX analysis. RT-PCR with gene-specific oligonucleotide primers revealed that this sequence was present in some colonies of S. invicta in the USA. Total RNA from USA S. invicta colonies containing the sequence was extracted and used as template for cDNA synthesis, subsequent PCR and RACE (5′ and 3′) to obtain the entire genome sequence of this new virus, SINV-4, by Sanger sequencing. For 5′ RACE, the 5′ RACE System for Rapid Amplification of cDNA Ends, version 2.0 (Invitrogen, Carlsbad, CA, USA) was used. cDNA was first synthesized with oligonucleotide p1476 (5′-TGGAATTCCAGAATTTTCTAAGGTTCCCATATTAGT), followed by PCR with the gene-specific primer p1474 (5′-TGAATTCCAGGTAACGCTTGAACCATTGGT) and the abridged anchor primer (Invitrogen). For 3′ RACE, the GeneRacer kit (Invitrogen) was used. cDNA was synthesized with the GeneRacer Oligo dT primer. PCR was subsequently completed with the GeneRacer 3′ primer and the gene-specific primer p1536 (5′-ATGGCTGTTGCTGACATGTTATGCATTATGTT).

LniV-1, LneV-1 and MsaV-1 identification and sequencing
Adult L. niger, L. neglectus, M. scabrinodis and L. flavus ants were collected from individual nests in Cambridge, UK; all ants were workers except for the sequencing round 1 L. flavus sample, for which queens were used. Total RNA was extracted from 10 to 20 adult workers or 5 queens using Trizol (Invitrogen) following the manufacturer's instructions. RNA was treated with DNase (Promega, RQ1 RNase-free DNase). For standard RNA-Seq libraries, ribosomal RNA was depleted using the Ribo-Zero kit (Illumina), and the remaining RNA was subjected to alkaline hydrolysis, followed by acrylamide gel purification of RNA fragments within the size range 35-50 nt for L. niger and 70-85 nt for the other four samples. For small-RNA sequencing, RNA within the size range 18-30 nt was acrylamide gel-purified from total RNA after DNase treatment. All library amplicons were constructed using a small RNA cloning strategy [36,37] and sequenced (single-end, 75 nt) on the NextSeq500 platform (Illumina) at the DNA Sequencing Facility (Department of Biochemistry, University of Cambridge). High-throughput sequencing data were deposited in ArrayExpress (http://www.ebi.ac.uk/arrayexpress) under the accession number E-MTAB-5781. RNA-Seq reads were trimmed using the FASTX-Toolkit and assembled using the Trinity (v 2.3.2) and Velvet (v 1.2.10) de novo assemblers [10,11]. Using BLASTX, the assembled contigs were compared to a database of polypeptide sequences derived from SINV-2, SINV-4 and the SINV-2-like sequences identified in the NCBI TSA database. Contigs that matched SINV-2-like proteins with an E-value <10⁻⁶ over >300 nt of coding sequence were retained, manually joined where possible, and used to design primers for Sanger sequencing.
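The contig-retention rule (E-value <10⁻⁶ and more than 300 nt of matching coding sequence) can be expressed as a filter over BLAST+ tabular output. The sketch below assumes the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and measures the matched span in nucleotides from the query coordinates, which for BLASTX refer to the nucleotide contig. The function name and toy records are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_blastx(lines: Iterable[str], max_evalue: float = 1e-6,
                  min_nt: int = 300) -> Dict[str, List[str]]:
    """Map each retained contig to the SINV-2-like proteins it matched."""
    kept: Dict[str, List[str]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        span_nt = abs(qend - qstart) + 1  # minus-frame hits have qstart > qend
        if evalue < max_evalue and span_nt > min_nt:
            kept.setdefault(qseqid, []).append(sseqid)
    return kept


# Toy records with invented values: only contig1 passes both thresholds.
records = [
    "contig1\tSINV2_ORF5\t35.0\t150\t90\t2\t1\t450\t10\t159\t1e-20\t80.0",
    "contig2\tSINV2_ORF1\t30.0\t50\t30\t1\t600\t451\t5\t54\t1e-8\t40.0",
    "contig3\tSINV2_ORF5\t28.0\t200\t140\t3\t1\t600\t1\t200\t1e-3\t30.0",
]
print(filter_blastx(records))  # {'contig1': ['SINV2_ORF5']}
```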
Gaps were filled and the entire genomes were sequenced by Sanger sequencing, with terminal sequences obtained by 5′ and 3′ RACE, as follows. Two µg of total ant RNA was treated with proteinase K, recovered by acid phenol/chloroform extraction and used for 5′ and 3′ RACE with the SMARTer RACE 5′/3′ kit (Clontech) according to the manufacturer's instructions. Gene-specific primers for 5′ and 3′ RACE were located within 1000 nt of the expected 5′ and 3′ ends of the virus genome. The resulting clones in the pRACE vector (12 to 19 for each of MsaV-1 5′ and 3′, LneV-1 5′ and 3′ and LniV-1 5′, and 4 for LniV-1 3′) were sequenced with M13 universal primers. The rest of each viral genome was PCR-amplified as eight partially overlapping DNA fragments using specific primers and the same cDNA templates as were generated for the 5′/3′ RACE PCRs. These fragments were cloned into the pJET1.2 vector (Thermo Scientific) and sequenced with pJET1.2-specific primers.

Computational analysis
Sequences were processed using EMBOSS [38] and analysed using BLAST [39], HHpred [13] (with the PDB and Pfam databases) and Phyre2 [40] (analyses performed between November 2016 and May 2017). Amino acid sequences/alignments of different polycipivirus clades were compared using HHpred in the 'align two sequences/alignments' mode during 23-28 June 2017. Amino acid sequences were aligned using MUSCLE v 3.8.31 [20], and phylogenetic trees were estimated using the Bayesian Markov chain Monte Carlo method implemented in MrBayes v 3.2.6 [21], sampling across the default set of fixed amino acid rate matrices, with 10 million generations and discarding the first 25 % as burn-in. The trees were visualized using FigTree v 1.4.3. Pairwise amino acid identities were calculated from pairwise ORF5 MUSCLE alignments. To calculate coverage and polymorphism frequencies, Bowtie 2 (v 2.3.0 [12]) was used with default parameters to map raw NextSeq sequencing reads back to the LniV-1, LneV-1 and MsaV-1 genomes.
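Pairwise identity can be computed in several ways, and the text does not state which denominator was used. The Python sketch below assumes one common convention (identical residues divided by the number of alignment columns in which neither sequence has a gap), applied to the two rows of a pairwise alignment such as a MUSCLE output. The function name and toy alignment are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Toy alignment: 3 identical residues over 4 ungapped columns.
print(percent_identity("MK-LV", "MRQLV"))  # 75.0
```

Dividing instead by the full alignment length, or by the length of one of the sequences, would give different values for gapped alignments, so identities such as those quoted for ORF5 depend on this choice.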

Accession numbers, vouchers, and new family submission
We have submitted a proposal to the International Committee on Taxonomy of Viruses (ICTV) to name a new family Polycipiviridae containing the genera Sopolycivirus, Hupolycivirus and Chipolycivirus in the order Picornavirales. Sequences for SINV-4, SINV-2, LneV-1, MsaV-1 and LniV-1 have been submitted to NCBI with the accession numbers MF041808, MF041813, MF041809, MF041810 and MF041812, respectively. Voucher specimens of the S. invicta ants from which SINV-4 was originally sequenced are retained at USDA-ARS, Gainesville, Florida, USA.

Fig. 3. RNA sequencing coverage of the MsaV-1 and LneV-1 genomes. The total coverage at each nucleotide position is indicated in red (positive sense) or blue (negative sense; note the different axis scale). Black lines show the mean coverage in a 1000-nt sliding window.

Fig. 4. Phylogenetic tree of the proposed family Polycipiviridae. ORF5 amino acid sequences were aligned with MUSCLE, and a Bayesian Markov chain Monte Carlo-based phylogenetic tree was produced with MrBayes. All posterior probabilities were equal to 1.00. The tree was mid-point rooted and visualized with FigTree. TSA indicates sequences obtained from the NCBI Transcriptome Shotgun Assembly database. Proposed Polycipiviridae genera (Chipolycivirus, Hupolycivirus and Sopolycivirus) are delineated with boxes. For sopolyciviruses, the host ant subfamilies are indicated on the right-hand side.