Integrase-associated niche differentiation of endogenous large DNA viruses in crustaceans

ABSTRACT Crustacean genomes harbor sequences originating from nimaviruses, a family of large double-stranded DNA viruses infecting crustaceans. In this study, we recovered metagenome-assembled genomes of 27 endogenous nimaviruses from crustacean genome data. Phylogenetic analysis revealed four major lineages within Nimaviridae, and for three of these lineages, we propose novel genera of endogenous nimaviruses: “Majanivirus” and “Pemonivirus” identified from penaeid shrimp genomes, and “Clopovirus” identified from terrestrial isopods. Majanivirus genomes contain multiple eukaryotic-like genes such as baculoviral inhibitor of apoptosis repeat-containing genes, innexins, and heat shock protein 70-like genes, some of which contain introns. An alignment of long reads revealed that each endogenous nimavirus species specifically inserts into host microsatellites or within 28S rDNA. This insertion preference was associated with the type of virus-encoded DNA recombination enzymes, the integrases. Majaniviruses, pemoniviruses, some whispoviruses, and possibly clopoviruses specifically insert into the arthropod telomere repeat motif (TAACC/GGTTA)n and all possessed a specific tyrosine recombinase family. Pasiphaea japonica whispovirus and Portunus trituberculatus whispovirus, the closest relatives of white spot syndrome virus, integrate into the host 28S rDNA and are equipped with members of another family of tyrosine recombinases that are distantly related to telomere-specific tyrosine recombinases. Endogenous nimavirus genomes identified from sesarmid crabs, which lack tyrosine recombinases and are flanked by a 46-bp inverted terminal repeat, integrate into (AT/TA)n microsatellites through the acquisition of a Ginger2-like cut-and-paste DDE transposase. These results suggest that endogenous nimaviruses are giant transposable elements that occupy different sequence niches through the acquisition of different integrase families. 
IMPORTANCE Crustacean genomes harbor sequences originating from a family of large DNA viruses called nimaviruses, but it is unclear why they are present. We show that endogenous nimaviruses selectively insert into repetitive sequences within the host genome, and this insertion specificity was correlated with different types of integrases, which are DNA recombination enzymes encoded by the nimaviruses themselves. This suggests that endogenous nimaviruses have colonized various genomic niches through the acquisition of integrases with different insertion specificities. Our results point to a novel survival strategy of endogenous large DNA viruses colonizing the host genomes. These findings may clarify the evolution and spread of nimaviruses in crustaceans and lead to measures to control and prevent the spread of pathogenic nimaviruses in aquaculture settings.

Our analysis yielded a total of 27 endogenous nimaviral genomes, 24 of which were regarded as complete (Table 2). Of these, 23 genomes were deposited in the DDBJ/NCBI/ENA databases as metagenome-assembled genomes (MAGs) of uncultivated virus genomes. The genomes of Portunus trituberculatus whispovirus (PotrWSV), Litopenaeus stylirostris majanivirus (LsMJNV), Farfantepenaeus duorarum majanivirus (FdMJNV), and Trachelipus rathkii clopovirus (TrCLPV) are available as supplementary files of the manuscript (see "Data availability" for the link to the FigShare repository). These MAGs are consensus sequences of closely related clones infecting a single organism. Most of the coding sequences on the assembled genomes are intact, but the actual individual copies within the host genome may be disrupted by mutations. Despite these limitations, the MAGs of endogenous nimaviruses represent distinct lineages of nimaviruses and provide valuable information for analyzing the evolution of Nimaviridae.
Maximum likelihood phylogenetic analysis of nimaviral core proteins (10,21) revealed four major clusters within Nimaviridae (Fig. 1). As discussed below, we believe that these clades represent distinct genus-level taxa.

Majaniviruses colonize telomere repeats of penaeid shrimp genomes
We previously reported on a group of penaeid shrimp-specific endogenous nimaviruses, exemplified by MjeNMV (Fig. 2A) (10). We propose for these penaeid endogenous nimaviruses a genus-level cluster, "Majanivirus" (Marsupenaeus japonicus endogenous nimavirus), consisting of 12 members (Table 2). Complete majaniviral genomes range from 278 to 401 kb in size, with GC content ranging from 27% to 42%. Majanivirus genomes were recovered from all penaeid shrimp genome data sequenced in this study, except for Sicyonia sp. Fukuoka2019. We also identified partial majaniviral genomes from publicly available Illumina genome shotgun sequencing data of two penaeid shrimp genomes, Litopenaeus stylirostris and Farfantepenaeus duorarum.
Majaniviruses identified from Penaeus sensu lato (Penaeus s. l.: Marsupenaeus, Melicertus, Fenneropenaeus, Litopenaeus, Farfantepenaeus, and Penaeus sensu stricto) form a coherent and exclusive clade, indicative of close association and host selectivity (Fig. 1; Fig. S1). However, the phylogeny of Penaeus s. l.-associated majaniviruses does not simply reflect that of the host; instead, they are divided into two geographically defined clusters: the Indo-Western Pacific (IWP) and Atlantic-Eastern Pacific (AEP) (Fig. S1). This means that, in addition to the phylogeny of the host species, their geographic distribution has also influenced the diversification of majaniviruses.
We previously showed that the MjeNMV genome is chromosomally integrated into the kuruma shrimp genome (10), but its integration sites remained unknown. Bao et al. were the first to show that Nimav-1_LVa (LC738872.1), a majanivirus, specifically inserts into the (TAACC/GGTTA)n motifs in the genome of the Pacific white shrimp Litopenaeus vannamei (11). Our analysis of ONT read alignments indicates that MjeNMV and other complete majanivirus genomes are flanked by the same (TAACC/GGTTA)n pentanucleotide motifs (Fig. 2B), strongly suggesting that telomere insertion is a common feature of majaniviruses. However, some ONT reads were successfully mapped from one end of the majanivirus genome, spanning across the external (TAACC/GGTTA)n tract and reaching the other end of the genome. This suggests that some majaniviral copies could exist as concatemers, episomes, or possibly as a combination of both, and that they are, or were until recently, actively replicating within the host genome.
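As a rough illustration of how a (TAACC/GGTTA)n tract flanking a viral genome could be recognized in a read, the following Python sketch scans a sequence for tandem runs of either pentamer. It is not the procedure used in this study, which relied on ONT read alignments; the function name `find_telomere_tracts`, the `min_units` threshold, and the toy read are assumptions made for illustration only.

```python
import re

def find_telomere_tracts(seq, min_units=3):
    """Return sorted (start, end, motif) tuples for tandem runs of TAACC or GGTTA.

    min_units is a hypothetical threshold, not a value taken from the study.
    """
    tracts = []
    for motif in ("TAACC", "GGTTA"):
        # Match at least min_units consecutive copies of the pentamer.
        pattern = re.compile("(?:%s){%d,}" % (motif, min_units))
        for m in pattern.finditer(seq.upper()):
            tracts.append((m.start(), m.end(), motif))
    return sorted(tracts)

# Toy read: a TAACC tract, arbitrary sequence, then a GGTTA tract.
read = "TAACC" * 4 + "ACGTACGTAG" + "GGTTA" * 3
print(find_telomere_tracts(read))
```

On the toy read, the function reports one tract at each end, which is the pattern expected for a read spanning an insertion flanked by the repeat on both sides.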
A salient feature of majanivirus genomes is the expansion of eukaryotic-like genes (Fig. 3). The earliest reports on WSSV-like sequences in the penaeid shrimp genomes noted an expansion of a large DNA segment containing WSSV homologs as well as various eukaryotic genes, including baculoviral inhibitor of apoptosis repeat (BIR)-containing proteins (7) and an HSP70 homolog (20). Bao et al. also observed the presence of eukaryotic-like genes on the Nimav-1_LVa genome (11). The availability of complete majaniviral genomes confirms that the presence of eukaryotic-like genes is a shared trait of majaniviruses. Heat shock protein 70-like proteins (MjHSP70-2) (20) and innexins form their own clades on the phylogenetic trees, indicating that they have been vertically inherited from a common ancestor of the majaniviruses (Fig. 3B and C). BIR-containing proteins clustered with other decapod proteins, but we surmise that they are nimaviral sequences annotated as host genes (Fig. 3A). These findings demonstrate that majaniviruses harbor multiple eukaryotic-like genes, which were likely acquired from their decapod hosts.
Nimaviral core genes are a set of genes that are ubiquitously conserved among Nimaviridae and are likely to play essential functions in the viral replication cycle (10,11,21). The original nimaviral core gene set consisted of 28 genes. Bao et al. proposed the inclusion of four additional genes (wsv112, wsv206, wsv226, and wsv308) in this set, raising the total number to 32 (11). Two protein-coding genes lying downstream of the wsv306-like protein gene in the majaniviral genomes were suspected to be wsv308 and wsv310 orthologs, but their orthology could not be verified by sequence similarity due to substantial divergence. Nevertheless, structural prediction with ColabFold (22,23) yielded remarkably similar predicted structures, with DALI Z-scores of 19.7 for wsv308-like proteins and 10.5 for wsv310-like proteins (Fig. S2 and S3; Files S1 and S2) (24). We, therefore, concluded that these two genes are authentic WSSV orthologs and added wsv308 and wsv310 to the nimaviral core gene repertoire. Our analysis supported the inclusion of wsv226, wsv308, and wsv310 in the core gene set, but phylogenetic analysis of wsv112 and wsv206 suggested that they were acquired independently in the majaniviruses and whispoviruses (Fig. S4), although this does not necessarily mean that their functions are dispensable for viral replication. Consequently, our revised version of the nimaviral core gene set includes 31 genes (Table 3).
AvCLPV might be the same virus as the WSSV-like sequences reported by Thézé et al. (25). Porcellio scaber clopovirus (PsCLPV; AP027154.1) was identified from the shotgun sequencing data of Porcellio scaber. The genome sequence of TrCLPV (File S3) was identified from the shotgun sequencing data of Trachelipus rathkii, a species native to Europe but introduced into North America (16). TrCLPV has a genome size of 579 kb, the largest of all nimaviruses discovered so far.
All clopovirus genomes contained a stretch of (TAACC/GGTTA)n repeats, suggesting that clopoviruses specifically insert into this sequence motif (Fig. 5B). However, most ONT reads mapping to these regions span the (TAACC/GGTTA)n repeat to align to either end of the clopovirus genome, suggesting that many of the clopovirus copies exist as episomes or concatemers. For consistency with other nimaviral MAGs, we removed (TAACC/GGTTA)n from the clopoviral genome assemblies to produce linear contigs. Together, these results reveal the presence of a divergent nimavirus lineage in terrestrial isopods, which we call clopoviruses. Clopoviruses possessed 19 ancestral nimaviral genes, of which 10 are core genes (Table 3). Given the small number of genes shared with other nimaviruses, clopoviruses could be classified into a novel family.

28S rDNA-associated tyrosine recombinase in the closest WSSV relatives
PotrWSV (File S4) and Pasiphaea japonica whispovirus (PajaWSV; LC738885.1) are the closest relatives of WSSV analyzed in this study, forming a stem group leading to WSSV (Fig. 1 and 8). PotrWSV was discovered from the genome sequencing data of the swimming crab Portunus trituberculatus (14). PajaWSV, identified from the shotgun sequence data of the Japanese glass shrimp (Pasiphaea japonica), is the closest relative of WSSV characterized in this study. PajaWSV and WSSV share an average amino acid identity of 42.94%. PajaWSV and PotrWSV insert into the host 28S rDNA with an 11-mer target site duplication (5′-CCGTCGCGRGAC-3′), a conserved motif occurring within 28S rDNA (Fig. 8B and C).
PajaWSV and PotrWSV shared predicted multi-exon tyrosine recombinase genes that are phylogenetically related to telomere-associated YRs (Fig. 7). A BLASTP search revealed additional YR-like proteins from decapod crustaceans, although they were not associated with nimaviruses. Inclusion of the additional YRs into the phylogenetic tree led to the conclusion that PajaWSV and PotrWSV YRs were not immediate phylogenetic neighbors, raising the possibility that the YR genes in the two virus genomes were acquired independently. Collectively, these results suggest that two immediate WSSV relatives employ a distinct family of tyrosine recombinase to integrate into host 28S rDNA, although whether the tyrosine recombinase genes are orthologous remains an open question.

Limited expression of MjeNMV genes
To assess the transcriptional landscape of an endogenous nimavirus, we mapped multitissue RNA-seq data of M. japonicus (17) against the MjeNMV genome. The mapping rates were universally low (0.0003%-0.1114%), indicating that MjeNMV activity is limited to a low level (Fig. S6). Nevertheless, the mapping profiles differed strikingly between tissues and deserved attention. MjeNMV transcripts in somatic tissues were predominantly short and antisense, whereas gonads had more sense-stranded transcripts. The presence of antisense transcripts in somatic tissues is suggestive of transcriptional silencing mediated by small RNAs such as siRNA and piRNA, while sense transcripts in gonads are suggestive of weak activity. Overall, the transcriptional landscape of MjeNMV is evidently different between somatic and germline tissues, although the functional significance of this difference remains unclear.

Divergence time estimation of Nimaviridae using penaeid host phylogeography
Viruses do not leave fossil records, but we can trace their evolutionary history by linking it to that of their host (40-43). As described in a previous section, the evolutionary history of majaniviruses is tightly intertwined with the phylogeography of penaeid shrimps. This prompted us to use the shrimp phylogeography as a calibrator for estimating the evolutionary timelines of nimaviruses. Penaeus s. l. is considered to have originated in the present IWP and spread eastward to the AEP (44), where it diverged into two genera, Litopenaeus and Farfantepenaeus. Majaniviruses found from Litopenaeus and Farfantepenaeus were most likely introduced to the AEP along with the Litopenaeus-Farfantepenaeus common ancestor. Under this assumption, the divergence between the IWP and AEP clusters is inferred to be as old as the divergence between Litopenaeus and Farfantepenaeus, which was estimated to be at least 42 million years ago (MYA) (Fig. S1 and S5).
LsMJNV and FdMJNV are found in the Eastern Pacific and the Atlantic, respectively. Gene flow between the two oceans has been shut off since the formation of the Isthmus of Panama, which occurred approximately 2.8 MYA (45). This leads us to hypothesize that the divergence between LsMJNV and FdMJNV dates back to at least 2.8 MYA.
Using these two inferred calibration points, we estimated the divergence times of majaniviruses and the entire Nimaviridae family (Fig. 10). Overall, the divergence times of younger nodes appear more credible than those of older nodes. For example, the divergence between LsMJNV and FdMJNV was estimated to be 10.4 ± 1.2 MYA. The average nucleotide identity of the two viruses is 85% (15 substitutions/100 nucleotides), which translates to 1.44 × 10^−8 substitution/site/year, a value that falls within the range of known substitution rates of large double-stranded DNA viruses (46). The divergence time between SiNMV plus CiNMV and Metopaulias depressus WSSV-like virus was estimated to be 15.4 ± 1.8 MYA, which is considerably older than 4 MYA, the minimum estimated divergence time of M. depressus and other sesarmid crabs (47). The crown Pemonivirus clade, whose members are exclusively found in IWP Penaeus s. l., was estimated to be 25.4 ± 2.8 million years old. This is concordant with their absence from Litopenaeus and Farfantepenaeus, which are estimated to have been isolated from the IWP for at least 42 million years.
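The substitution rate quoted in the preceding paragraph for the two majaniviruses can be reproduced with a few lines of arithmetic. This is a minimal sketch of the calculation as it appears to follow from the stated values: the pairwise divergence (1 minus the average nucleotide identity) is divided by the estimated divergence time. The variable names are illustrative. Note the assumption that the pairwise distance is divided by the divergence time once; a per-lineage rate would divide by twice that time and would be half as large.

```python
ani = 0.85                      # average nucleotide identity stated in the text
divergence_time_years = 10.4e6  # estimated divergence of the two viruses (10.4 MYA)

pairwise_divergence = 1.0 - ani  # about 0.15 substitutions per site
rate = pairwise_divergence / divergence_time_years

print(f"{rate:.2e} substitutions/site/year")  # prints 1.44e-08 substitutions/site/year
```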
The divergence dates of deep branches in the Nimaviridae family tree are younger than the estimated divergence of major host lineages. The last common ancestor of majaniviruses was estimated to be present 90 MYA, which is substantially younger than the early diversification of penaeid shrimps, which had taken place by the late Jurassic (48-51). Genus Whispovirus was estimated to date back to 171.9 ± 17.3 MYA. The divergence between isopod and decapod nimaviruses was estimated to be 307.3 ± 31.4 MYA, which is much younger than the estimated divergence time of eucarids and peracarids, which dates back to the Cambrian (52). These values suggest that majaniviral diversification is more characterized by jumping between closely related hosts than strict host-viral co-evolution.
Overall, the availability of multiple endogenous nimavirus genomes closely associated with a particular host taxon allows us to time-calibrate the evolutionary history of nimaviruses. The estimates support the idea that nimaviruses have been associated with crustacean hosts for the last few hundred million years.

DISCUSSION
It has long been known that crustacean genomes harbor various WSSV-like sequences (7-9, 53), but the reasons why they are present have remained unknown. The present results demonstrate that endogenous nimaviruses selectively insert into specific genomic contexts, and this specificity is correlated with the types of integrases they encode. We propose that endogenous nimaviruses are selfish genetic elements that persist within the host genomes (54) and that the capture of integrase genes with different insertion specificities has allowed nimaviruses to persist as genomic parasites colonizing different repetitive motifs representing genomic niches (55,56). We note that these endogenous nimaviruses are distinct from fragmented viral insertions that may produce potentially immunogenic transcripts (57-59). The selfish nature of transposable elements could explain the persistence of endogenous nimaviruses even without a perceivable selective advantage to the host (60).
While it is possible that promiscuous integration followed by biased selective retention produced the appearance of selective integration, it is unlikely that only one of a wide variety of repetitive motifs present in the host genome would be tolerated.Therefore, the most likely explanation for the observed insertion selectivity is that it is mediated by site-specific integrases.
We now know that endogenous nimaviruses exist as multi-copy elements within host genomes. However, the process by which these populations formed remains unknown. One possibility is that a single ancestral infection event initiated a series of multiplications that gave rise to hundreds of copies within the host. Alternatively, closely related viruses, divergent at the strain or isolate level, might have repeatedly infected the same host, contributing to the increased copy numbers. The reality could be a combination of both scenarios, suggesting that endogenous nimaviruses may have experienced a complex evolutionary history. A detailed analysis of within-host sequence diversity could potentially allow us to infer the population dynamics of endogenous nimaviruses. However, we currently find this task to be challenging.
Overall, the distribution of integrases among nimaviruses does not strictly align with their phylogenetic relationships, indicating that nimaviruses have acquired integrases multiple times throughout evolutionary history. Interestingly, we observed that nimaviruses from phylogenetically distinct lineages, such as sesarmid nimaviruses and HtNMV, can possess mutually similar integrases, which raises the possibility that integrase genes may have been shared among different lineages of nimaviruses (61).
Repeat-specific integration is believed to be a survival strategy employed by transposable elements in order to minimize negative effects on host fitness (74-76). The prevalence of telomere-repeat-specific nimaviruses in penaeid shrimps and terrestrial isopods may be due to a combination of this viral survival strategy and the abundance of simple sequence repeats, including telomere-like repeats (77), in the genomes of these organisms (16-18, 78, 79).
MjeNMV gene expression was universally low and yet showed tissue-specific variations. A role for epigenetic factors in this process is highly probable and merits further exploration. The weak expression of MjeNMV genes in gonads suggests gonad-specific activation, a behavior that mirrors that of certain transposable elements. These elements, to ensure their survival, activate within the germline while maintaining dormancy in somatic tissues, thereby avoiding detrimental impacts on the host's fitness (80). It is plausible that endogenous nimaviruses may adopt a similar lifecycle. It would be ideal to analyze the differential expression of integrants in different parts of the chromosomes, but this is extremely challenging because individual copies are almost identical to one another and are difficult to resolve.
Endogenous nimaviruses tenaciously retain structural protein genes, suggesting that they maintain the capability to form viral particles and transmit between hosts. We speculate that endogenous nimaviruses in crustacean genomes are analogous to prophages in bacterial genomes, which can remain dormant until certain stressors trigger their reactivation. Conservation of PIFs among endogenous nimaviruses is particularly noteworthy (21), as they may facilitate the oral transmission of viral particles through cannibalism of dead hosts, which is a common transmission route of WSSV (81-83).
We speculate that the absence of the integrase gene may significantly contribute to the evolution of a free-living, highly pathogenic nimavirus. WSSV and CoBV, the only isolated free-living nimaviruses to date, are entirely devoid of integrases. The absence of an integrase implies a lack of ability for the virus to integrate itself into the host genome and propagate via vertical inheritance. Once the virus loses its vertical transmission capability, it is likely to become reliant on horizontal transmission for its survival. This shift in transmission strategy could foster the emergence of highly pathogenic variants.
Our analyses may be biased by incomplete taxon sampling of the host and the scarcity of exogenous nimavirus genomes. However, the lack of observed diversity could also reflect the actual rarity of exogenous nimaviruses circulating in the environment. To date, we have not been able to identify nimavirus-like sequences in environmental metagenomes, and despite the long history of modern shrimp aquaculture, WSSV remains the only pathogenic nimavirus of penaeid shrimps. We hypothesize that exogenous nimaviruses are rare and the emergence of a pathogenic nimavirus is an even more unusual event. To confirm this hypothesis, it would be valuable to conduct thorough metagenomic surveys, similar to one conducted in Drosophila melanogaster (84), to assess the prevalence of exogenous nimaviruses and other double-stranded DNA viruses in various crustacean species.
Estimating the ages of integration for individual nimaviruses in a given host genome is currently challenging due to the repetitive nature of viral copies and their integration sites. Nevertheless, we postulate that it is possible to infer the divergence times of nimaviruses at the species or genus level by associating them with the hosts' phylogeography. In this study, we aimed to estimate the divergence times of Nimaviridae using majaniviruses and their closely associated hosts, Penaeus s. l. By introducing two calibration points inferred from the phylogeography of Penaeus s. l., we obtained divergence time estimates for Nimaviridae spanning hundreds of millions of years.
However, our analysis has clear limitations. Our divergence time estimates rely on only two inferred calibration points for majaniviruses and lack any for deeper nodes. Indeed, the estimated divergence dates for deeper nodes, such as the emergence of the Whispovirus genus and the split between decapod and isopod nimaviruses, appear to be younger than the estimated divergence times of major host lineages. It is possible that we may have significantly underestimated the true depths of these divergence times.
Looking forward, we believe that host phylogeography could prove to be a powerful tool in inferring the evolutionary history of nimaviruses. As we continue to accumulate crustacean genome data, we anticipate discovering additional endogenous nimavirus genomes, some of which may be highly host-specific. Such nimaviral lineages will be invaluable in calibrating viral evolutionary timelines based on host divergence.
In conclusion, the availability of endogenous nimavirus genomes provides unique opportunities for studying the diversity and evolution of crustacean-infecting large DNA viruses.

Publicly available data sets
Publicly available whole-genome shotgun sequence data of Portunus trituberculatus, Litopenaeus stylirostris, Farfantepenaeus duorarum, and Trachelipus rathkii were downloaded from the NCBI database (Table S2). The raw reads were analyzed in a similar manner to other crustacean genome data.

De novo assembly of shotgun sequence data and virus discovery
The filtered Illumina reads were de novo assembled using SPAdes (90). Combinations of SPAdes versions and parameters varied depending on the time of the analysis, sequencing coverage, and data complexity. The SPAdes assemblies were used to salvage low-copy nimaviral sequences that could not be fully recovered from ONT assemblies, including Nimav-1_LVa and SiNMV.
The ONT reads were filtered by 5-, 10-, or 20-kb length cutoffs using SeqKit (91) and were de novo assembled by Flye v2.9 (92) in metagenome mode. The primary ONT assemblies were visualized by Bandage v0.8.1 (93) and screened for nimaviral sequences by TBLASTN searches querying WSSV proteins. The identified nimaviral contigs were used as the bait to map back the ONT reads by Minimap2 (94), and the mapped ONT reads were reassembled by Flye v2.9 (92) in normal mode or Canu v2.2 (95). This generated consensus nimaviral genome sequences that we believe are close representations of the original viral genomes.
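To make the effect of a read-length cutoff concrete, the following Python sketch keeps only FASTQ records at or above a minimum length. It is a stand-in written for illustration and is not SeqKit, which was the tool used for this step; the function name `filter_fastq_by_length` and the toy reads are assumptions, and the sketch assumes simple four-line FASTQ records.

```python
import io

def filter_fastq_by_length(handle, min_len):
    """Yield (header, seq, qual) for records with at least min_len bases.

    Assumes four-line FASTQ records (sequences are not line-wrapped).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        if len(seq) >= min_len:
            yield header, seq, qual

# Toy input with reads of 4 and 12 bases; a cutoff of 10 keeps only the second.
toy = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n")
kept = list(filter_fastq_by_length(toy, min_len=10))
print([header for header, _, _ in kept])
```

With real ONT data, the cutoffs described above would correspond to `min_len` values of 5,000, 10,000, or 20,000 bases.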
The BLASTP output was merged by the Automated Assignment of Human Readable Descriptions (AHRD) pipeline (102) into a table containing functional descriptions. The genomic coordinates corresponding to nimaviral-like proteins were masked by BEDtools (103), and the remaining coordinates were forwarded to ab initio eukaryotic-like gene prediction by Augustus v3.3.3 (104) using the Apis mellifera gene model, a choice inspired by Bao et al. (11). The predicted proteins [generated by gffread (105)] were BLASTP searched against the abovementioned nimaviral and arthropod proteomes, and the BLASTP output was passed to AHRD to generate final functional annotations. The GFF3 annotation files were converted into DDBJ flat files using GFF3toDDBJ (https://github.com/yamaton/gff3toddbj) and fFconv (https://www.ddbj.nig.ac.jp/ddbj/ume-e.html).
Protein sequences of Ginger2 and other DDE transposases were downloaded from the NCBI database, aligned with MAFFT, and trimmed with trimAl; phylogenetic analysis was then conducted with IQ-TREE 2.2.0.3.

Copy number estimation
Estimated copy numbers of endogenous nimavirus genomes were calculated as follows:

$$\text{Virus copy number} = \frac{\text{Virus sequencing depth}}{\text{Estimated genome coverage}}$$

$$\text{Estimated genome coverage} = \frac{\text{Total reads (bp)}}{\text{Estimated host genome size (bp)}}$$

Estimated host genome sizes were retrieved from the literature (14, 17-19) and the Animal Genome Size Database (https://www.genomesize.com/).
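The calculation above, in which the viral sequencing depth is divided by the estimated host genome coverage (total read bases divided by estimated host genome size), can be sketched in Python as follows. The function names and the input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from this study.

```python
def estimated_genome_coverage(total_read_bases, host_genome_size_bp):
    """Expected sequencing depth of a single-copy host locus."""
    return total_read_bases / host_genome_size_bp

def virus_copy_number(virus_depth, total_read_bases, host_genome_size_bp):
    """Viral sequencing depth relative to the expected single-copy depth."""
    coverage = estimated_genome_coverage(total_read_bases, host_genome_size_bp)
    return virus_depth / coverage

# Hypothetical example: 60 Gb of reads, a 2-Gb host genome, and 900x viral depth.
copies = virus_copy_number(900.0, 60e9, 2e9)
print(copies)  # prints 30.0
```

In this made-up case the host coverage is 30x, so a viral depth of 900x corresponds to roughly 30 copies per haploid genome.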

Penaeid mitochondrial genome assembly and annotation
Mitogenome sequences of Metapenaeopsis lamellata and Sicyonia sp. Kyushu2019 were characterized in this study. A contig representing the mitochondrial genome was extracted from a Flye assembly of >5 kb ONT reads. Trimmed Illumina reads were mapped onto the contig by minimap2 and iteratively polished by Pilon v1.24 (97). The mitogenome was annotated on the MITOS2 server (http://mitos2.bioinf.uni-leipzig.de/index.py) (117). The annotated mitochondrial genomes of the two species are available as Supplementary Files of the manuscript.

Phylogenetic analysis and divergence time estimation of penaeid shrimps
A total of 32 mitogenome sequences derived from the suborder Dendrobranchiata, which encompasses penaeoid and sergestoid shrimps, were downloaded from the NCBI database (accessed June 2023; Table S5). The mitogenomes of Sicyonia sp. Kyushu2019 and Metapenaeopsis lamellata were generated in this study as described in the previous section. The predicted amino acid sequences of 13 protein-coding genes were aligned by MAFFT v7.520. The alignments were used for Bayesian phylogenetic analysis and divergence time estimation using BEAST v2.7.4 (118). A strict molecular clock, the WAG substitution model, and the Yule speciation model were selected. A total of five fossil and geological calibration points were included as described in Table S5 (45,46,119,120). Ten million iterations were performed, which were sampled every 10,000 steps after a 10% burn-in. We used Tracer v. 1.7.1 (121) to monitor the progress of the run and to ensure that the effective sample sizes of all parameters were larger than 200. A maximum clade credibility tree was generated with TreeAnnotator (https://www.beast2.org/treeannotator/), which was visualized with FigTree v1.4 (http://tree.bio.ed.ac.uk/software/figtree/).

Divergence time estimation of Nimaviridae
The multiple sequence alignments of nine nimaviral core proteins used in the maximum likelihood phylogenetic analysis were used for the Bayesian phylogenetic analysis by BEAST v2.7.4. A strict molecular clock, the WAG substitution model, and the Yule speciation model were selected. Two calibration points were introduced as described in Table S5 (45). Ten million iterations were performed, which were sampled every 10,000 steps after a 10% burn-in. We used Tracer v. 1.7.1 (121) to monitor the progress of the run and to ensure that the effective sample sizes of all parameters were larger than 200. A maximum clade credibility tree was generated with TreeAnnotator and was visualized with FigTree v1.4.4.

MjeNMV transcriptome analysis
A total of 49 M. japonicus RNA-seq data sets were downloaded from the NCBI database (Table S4) (17). The raw Illumina reads were trimmed by Fastp v0.23.0, and the trimmed reads were mapped onto the MjeNMV genome by HISAT2 v2.2.1 (122). Mapped reads were separated according to the transcriptional orientation using SAMtools. The results were visualized using a custom script.
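Separating mapped reads by transcriptional orientation comes down to interpreting the SAM FLAG of each alignment. The Python sketch below shows one way this can be done for a paired-end stranded library. The library protocol is not stated in this section, so the default used here (read 1 antisense to the transcript, as in dUTP-style libraries) is an assumption, as are the function and parameter names. Only the FLAG bit values (0x10, 0x40, 0x80) come from the SAM specification.

```python
REVERSE = 0x10  # alignment is on the reverse strand of the reference
READ1 = 0x40    # first read in the pair
READ2 = 0x80    # second read in the pair

def transcript_strand(flag, first_read_antisense=True):
    """Infer the reference strand of the originating transcript from a SAM FLAG.

    Intended for paired-end reads. first_read_antisense=True assumes a
    dUTP-style library in which read 1 is antisense to the transcript.
    """
    read_on_plus = not (flag & REVERSE)
    is_read1 = bool(flag & READ1)
    # Does this read have the same orientation as the transcript?
    same_as_transcript = (not first_read_antisense) if is_read1 else first_read_antisense
    on_plus = read_on_plus if same_as_transcript else not read_on_plus
    return "+" if on_plus else "-"

# Typical properly paired FLAG values: 99/147 form one pair, 83/163 the other.
for flag in (99, 147, 83, 163):
    print(flag, transcript_strand(flag))
```

Both reads of a pair are assigned to the same transcript strand. Classifying a read as sense or antisense then requires comparing this strand with the annotated orientation of the viral gene it overlaps.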

Data availability
The raw reads generated in this study are deposited in the DDBJ/NCBI/ENA databases under the BioProject ID PRJDB13888. The accession numbers of the nimaviral MAG assemblies are provided in Table 2. ColabFold predictions of wsv308 and wsv310 orthologs are available as Supplementary Files 1 and 2, respectively. TrCLPV MAG, protein sequences, and genome annotation are available as Supplementary File 3. PotrWSV MAG, protein sequences, and genome annotation are available as Supplementary File 4. LsMJNV MAG, protein sequences, and genome annotation are available as Supplementary File 5. FdMJNV MAG, protein sequences, and genome annotation are available as Supplementary File 6. The mitochondrial genome sequence of Metapenaeopsis lamellata is available as Supplementary File 7. The mitochondrial genome sequence of Sicyonia sp. Fukuoka2019 is available as Supplementary File 8. Examples of code used in this study are available as Supplementary File 9. Supplementary Files 1 to 9 are available on FigShare (https://doi.org/10.6084/m9.figshare.22012370.v1).

FIG 1 Phylogenomic tree of Nimaviridae. Amino acid sequences of nine nimaviral core genes (wsv026, wsv282, wsv289, wsv303, wsv343, wsv360, wsv433, wsv447, and wsv514; 12,905 amino acids; substitution model: JTT + F + I + I + R5) were used for the analysis. Virus names are colored according to insertion motif specificity as indicated on the lower left. The bar in the middle of the figure denotes substitution per site. Ultrafast bootstrap value (1,000 trials) was 100% unless indicated beside the node. Proposed genus names are quoted and unitalicized. WSSV, white spot syndrome virus; CoBV, Chionoecetes opilio bacilliform virus; MdWSSV-like, Metopaulias depressus WSSV-like virus; see Table 2 for the abbreviations for the other viruses.

FIG 10 Divergence time estimation of Nimaviridae. A total of nine nimaviral core protein sequences (12,905 sites) were used in the analysis. Blue bars indicate 95% confidence intervals of estimated divergence dates. Numbers on nodes correspond to two calibration points described in Table S5. All nodes were supported by a posterior probability of 1. Proposed viral genus names are quoted and unitalicized. WSSV, white spot syndrome virus; CoBV, Chionoecetes opilio bacilliform virus; MdWSSV-like, Metopaulias depressus WSSV-like virus; see Table 2 for the abbreviations for the other viruses.

TABLE 1
Crustacean samples sequenced in this study

TABLE 2
Nimaviral genomes characterized in this study
a Not applicable to third-party annotation.

TABLE 3
Nimaviral core genes found in clopovirus genomes
a Naldaviral core genes.