Tracing RNA viruses associated with Nudibranchia gastropods

Background Nudibranchia is an under-studied taxonomic group of gastropods, including more than 3,000 species with colourful and extravagant body shapes and peculiar predatory and defensive strategies. Although symbiosis with bacteria has been reported, no data are available for the nudibranch microbiome nor regarding viruses possibly associated with these geographically widespread species. Methods Based on 47 available RNA sequencing datasets including more than two billion reads of 35 nudibranch species, a meta-transcriptome assembly was constructed. Taxonomic searches with DIAMOND, RNA-dependent-RNA-polymerase identification with palmscan and viral hallmark genes identification by VirSorter2 in combination with CheckV were applied to identify genuine viral genomes, which were then annotated using CAT. Results A total of 20 viral genomes were identified as bona fide viruses, among 552 putative viral contigs resembling both RNA viruses of the Negarnaviricota, Pisuviricota, Kitrinoviricota phyla and actively transcribing DNA viruses of the Cossaviricota and Nucleocytoviricota phyla. The 20 commonly identified viruses showed similarity with RNA viruses identified in other RNA-seq experiments and can be putatively associated with bacteria, plant and arthropod hosts by co-occurence analysis. The RNA samples having the highest viral abundances showed a heterogenous and mostly sample-specific distribution of the identified viruses, suggesting that nudibranchs possess diversified and mostly unknown viral communities.


INTRODUCTION
With an estimated 10 8 viral particles per millilitre of coastal water, oceans can be referred as hubs of viral diversity (Suttle, 2007;Mushegian, 2020). Here, viruses of unicellular organisms can govern the fate of host populations and influence global environmental cycles (Suttle, 2007;Brum et al., 2015;Sunagawa et al., 2020). As viruses are obligate intracellular parasites, their diversity reflects host genetic variety within the three domains of life (Krehenwinkel, Pomerantz & Prost, 2019). In this context, host-virus combinations frequently show co-adaptation processes, although viruses infecting multiple hosts and host-jumps have often been reported for invertebrate viruses (Shi et al., 2016). Completely leveraging the high-throughput power of modern sequencers, shotgun metagenomics is contributing to untangle the fascinating complexity of host and virus diversity and highly vulnerable in the current temperature increasing scenario (Armstrong, Tanner & Stillman, 2019). RNA sequencing (RNA-seq) data are rare for these species, and most of them have been used to study the nervous system (Senatore, Edirisinghe & Katz, 2015), to trace nudibranch-gastropod phylogeny (Goodheart et al., 2017) or for comparative morphology studies (Goodheart et al., 2018). Noticing the complete absence of information regarding nudibranch-associated viruses, I exploited the transcriptomic datasets of 35 nudibranch species and, by performing a meta-transcriptomic analysis, I applied different pipelines to identify viruses associated with these species.

Data retrieval
The SRA archive (https://www.ncbi.nlm.nih.gov/sra) was interrogated on the 1 st of December 2021, to retrieve 47 Nudibranchia transcriptomic datasets. RNA-seq samples characterized by paired-read layout, random or PCR selection (no size selection), based on Illumina technology were selected ( Table 1). The NCBI protein (nr) databases were downloaded in December 2021, together with the annotation databases used for the tools described below (VirSorter2, CheckV and CAT).

Transcriptome assembly and preliminary analysis
RNA-seq datasets were trimmed for quality and to remove sequencing adaptors using fastp with default parameters (Chen et al., 2018). Trimmed reads were de novo assembled per sample using the CLC assembler (CLC Genomic Workbench v.20; Qiagen, Germantown, MD, USA), setting bubble size and word size graph parameters to automatic and the minimum allowed contig length to 200 bp. The cd-hit tool (Fu et al., 2012) was used to reduce the redundancy of the contigs, applying a cut-off of 99% similarity.

Taxonomic annotation of contigs and identification of viruses
A global taxonomic annotation of the RNA contigs was performed using DIAMOND (Buchfink, Xie & Huson, 2015) blastx searches against the NCBI nr database. The DIAMOND output file was meganized and uploaded in MEGAN6 for visualization (Huson et al., 2007). To properly identify viral contigs, two different approaches were applied. Firstly, to identify the footprint of the viral RNA-dependent-RNA-polymerases, palmscan (Babaian & Edgar, 2021) was applied with the more conservative parameters on the whole collection of RNA contigs. Secondly, to identify viral contigs based on viral hallmark genes, VirSorter2 (Guo et al., 2021) in combination with CheckV v0.8.1 (Nayfach et al., 2021) were used. VirSorter2 was run with a cut-off of 0.5 to maximize sensitivity, applying 1.5 kb as the minimal required length and searching for 'RNA viruses'. The RNA sequences were used as input for coding sequence (CDS) prediction with prodigal v2.6.3 (Hyatt et al., 2010), which were then annotated against Pfam (release 32.0) and a custom viral HMM database to detect viral sequences. Putative viral contigs were then passed through CheckV to identify possible host genes and handle duplicate segments of circular contigs. Contigs with at least one identified viral gene or with a VirSorter2 score > 0.95 or a hallmark gene count > 2 (the last two parameters were applied in the absence of viral genes

Viral distribution analysis
To evaluate the distribution of the retrieved viruses among the collection of datasets in the NCB SRA database, the RNA-dependent-RNA-polymerase regions obtained from the core-virome were extracted and used as query for palmID v.0.04 in the Serratus open-science viral discovery platform (Edgar et al., 2022). Briefly, for each query the tool provided a quality evaluation based on the conservation of the three functional motifs of the RdRP (motif A, B and C) and plotted a confidence value on the distribution of values obtained from 15,010 canonical viral RdRP from GenBank. Moreover, a word cloud analysis based on the STAT k-mer analysis (Katz et al., 2021) performed on the samples including similar viruses was considered to identify possible viral hosts.

Taxonomic characterization of nudibranch meta-transcriptome
The search for the term 'Nudibranchia' in the nucleotide, protein, short-read (SRA) and bibliographic (PubMed) NCBI databases resulted in a limited number of hits upfront to 1,947 taxonomic entries at NCBI. Most of the 135 SRA entries referred to RNA-seq experiments based on Illumina technology, whereas no one genome draft is available for these species. Among these SRA samples, 47 datasets pertaining to 35 nudibranch species and accounting for 2.01 billion reads were selected (Table 1). De novo meta-transcriptome reconstruction, based on per-sample assemblies, resulted in 3,078,050 contigs, reduced to 2,929,504 contigs after the removal of highly similar sequences (>99%). Taxonomic classification based on blastx searches against the NCBI database assigned a match to 28.8% of these RNA sequences, with most of them referring to Mollusca (60%), and in particular to Aplysia spp. (100 k contigs), which represented the nearest genome-sequenced taxonomic entry within the Euthyneura clade. Notably, a considerable number of contigs (105 k) were assigned to Symbiodinium spp., a known symbiotic alga of marine metazoans including corals and molluscs. Together with a relatively small

Identification of the nudibranch-associated viruses
To identify bona fide viruses associated with these nudibranch RNA samples, DIAMOND blastx results were implemented with two different approaches: either used to identify the footprint of the viral RdRPs (palmscan) or to identify viral hallmark genes based on curated viral HMM databases. A total of 37 high-confidence RdRP footprints were identified by palmscan, whereas 128 putative viral contigs were identified by VirSorter2 (Table S1). Including the contigs identified by DIAMOND, a total of 552 contigs were classified as possible viruses, with only 19 contigs commonly identified by all the three methods (Fig. 1A, Data S1). The size distribution of the viral contigs increased from blastx to Virsorter2 results with the 19 common contigs placed in the upper part of the distribution (Figs. 1B and 1C). A taxonomic annotation of the 552 contigs based on the predicted CDS was performed with CAT, revealing that palmscan identified only contigs classified as 'RNA virues', whereas Virsorter2 identified also hits similar to metazoan genomes (Table S2). Like palmscan, Virsorter2 identified RNA viruses with the single exception represented by a Cossaviridae hit (ssDNA virus) identified due to an ORF encoding a parvovirus non-structural protein NS1 domain (SRR1950949_18608 in Data S1). Overall, several hits matching the phyla Kitrinoviricota, Negarnaviricota and Pisuviricota were identified, together with abundant unclassified viruses ( Fig. 2 and Table S2). Regarding the 19 shared contigs plus the one identified by palmscan and Virsorter2, four contigs showed similarity to Beihai picorna-like virus 75 (Table 2). These contigs resembled three partial and one complete viral genome and their similarity to Beihai picorna-like virus 75 (KX883381) ranged from 46% to 53%. Two other contigs showed similarity to Wenzhou picorna-like virus 46, whereas four contigs were similar to different viruses belonging to the Pisuviricota phylum (Table 2).

Nudibranch-associated RNA virus distribution among the 'planetary RNA virome'
A total of 15 high-confidence RdRP proteins were used to trace the distribution of the corresponding viruses within the collection of SRA datasets by using palmID implemented into the Serratus open science platform (Edgar et al., 2022). Association analysis based on the STAT kmer analysis performed on the RNA-seq datasets including similar viruses was used to retrieve possible host-virus associations (Fig. 3). The most abundant nudibranch-associated virus, Beihai picorna-like virus 75 (SRR3726693_23), was identified in metagenome-derived datasets poorly supporting possible hosts (Table 2). SRR1950953_226, which showed low similarity with a virus associated with crustacea (Beihai crab virus 1), was associated with arthropods by palmID. Differently, Wenzhou picorna-like virus 19 (SRR1950942_4074), initially associated with gastropods, is likely a plant virus. Viruses similar to Barns Ness hepe-like virus 1 and Barns Ness hepe-like virus 2, initially identified as possible porifera-associated viruses (Waldron, Stone & Obbard, 2018) but also found associated with bivalves (Rosani et al., 2019), revealed a possible association with arthropods. Beihai picorna-like virus 72 (SRR1950940_818) is possibly associated with Bacteria, as Biomphalaria virus 4 (SRR3726696_22111) and the unclassified SRR1950951_5889, SRR3726705_1238 and SRR1950949_2406 viruses. Notably, the two viruses similar to Wenzhou picorna-like virus 46 (SRR1950953_981 and SRR3726693_3931) were associated with a viral metagenome sample obtained from porifera (Rhopaloeides odorabile) as well as to several Pomacea canaliculata samples.
To further investigate possible host-virus associations in specific samples, a taxonomic classification of the assembled contigs of the eight samples including more than 500 viral reads was performed separately (Fig. S1). Except for nudibranchs, poor evidence of possible hosts of shared viruses was detected in independent samples (e.g. SRR1950953 and SRR3726693), nor for the highly abundant viruses found in sample SRR3726693, although bacteria are present in all these samples and might represent the genuine viral hosts (Fig. S1). An evident contamination of Dinophyceae (i.e. Symbiodinium spp.) is detectable in samples SRR1950951 and SRR3726699, but no viruses are shared between these  Gastropoda Plant (Fig. 3D) samples. Low-level contaminations of Anthozoa are detectable in all samples, although the low abundance rejects the possibility that these species can act as hosts of the most abundant viruses. Differently, sample SRR1950953 showed Hydrozoa contamination, possibly representing the host of SRR1950953_226 and SRR1950953_981 viruses.

DISCUSSION
Gastropod nudibranchs are intriguing models to investigate host-microbe associations and symbiosis from a functional point of view (Cheney et al., 2016;Mudianta et al., 2016;Kristiana et al., 2020). Notably, nudibranch-associated microbes are poorly known, and viruses have never being reported nor mentioned in the scientific literature. Besides the general interest of tracing viruses associated with these organisms, this will be preliminary to the understanding of possible roles mediated by viruses among nudibranch symbiosis as well as to untangle nudibranch antiviral defences. So far, no viral-or bacterial-mediated diseases are known in these species, possibly suggesting the presence of effective antiviral and antibacterial defences.
Here, the identification of viruses was approached by alternative and partially complimentary methods. Taxonomic classification of contigs (blastx) against the whole NCBI nr database identified both putative DNA and RNA viruses, although suffering from the poor representation of nudibranch species among databases. This aspect introduced For each contig, identified by the contig ID and size, the annotation obtained using blastn against the NCBI nt database (description) and the taxonomic classification obtained using CAT (phylum, order, family and species) were reported together with the suggested host of the best hit or by palmID. The viral contigs were ordered by total number of mapped reads. Underlined hits referred to genomes including low-confidence RdRPs (no palmID analysis was performed). Bolded taxonomic classifications referred to information not retrieved by CAT but added by palmID. SRR3726698_7138 represented the hit identified by palmscan and Virsorter2 only (annotations was retrieved using blastx).
abundant false positives in the identification of viruses, namely host contigs claimed as viral.
Differently, both palmscan and Virsorter2 genuinely identified RNA viruses. The former is designed to identify the footprint of the viral RdRPs (Edgar et al., 2022), an exclusive feature of RNA viruses. The latter identified viral hallmark genes and also provide a qualitative evaluation on the retrieved genomes (CheckV), reducing the identification of incomplete viruses, such as DNA viruses using RNA data. Overall, the sum of these methods identified 552 viral contigs, whereas the combination identified a core set of 19 viruses only. Sporadic DNA viruses identified in few samples and showing similarity with Herpesviruses, Mimiviruses or Poxviruses were discharged due to their genomic incompleteness.  Table 2.
Full-size  DOI: 10.7717/peerj.13410/ fig-3 Arguably, this result revealed the importance of using complimentary methods in order to detect high-confidence viruses associated with selected biological sample (e.g. nudibranchs), differentiating this analysis from a mere explorative survey focused on the detection of viruses. Likewise, although computationally very performant, the search of the RdRP only identified few viruses. Dedicated tools such as Virsroter2 limits false positives and increase the sensitivity by targeting multiple viral hallmark genes instead of the RdRP only, although requiring more computational effort.
By the search of the core nudibranch-virome among the 'planetary RNA virome' (Edgar et al., 2022), similar hits were retrieved as well as putative host-pathogen associations based on co-occurrence analysis.
Together, these results highlighted that nudibranchs possess different viromes, showing some similarities to viruses previously identified in gastropods, like Wenzhou picorna-like virus 46 and Wenzhou picorna-like virus 19 (Shi et al., 2016). Interestingly, the most represented virus showed similarity to Beihai picorna-like virus 75, initially found associated to echinoderms (Shi et al., 2016), although also present in several metagenomic datasets. Massive coverage in a single sample, together with the absence of evident signs of contamination of other possible host species, suggested that these viruses are hosted by nudibranchs (Learchis evelinae), and their role might be take into consideration in future research. Differently, all the other viruses are characterized by a lower coverage and co-occurrences analysis revealed that they possibly resembled bacterial viruses, which are still poorly studied (Wolf et al., 2020).
Indeed, nudibranch RNA samples are characterized by 'physiological contaminations', probably due to their feeding habits (corals) or mutualistic associations, here depicted by contigs of Symbiodinium spp. (Burghardt & Wägele, 2014;Mies et al., 2017). In this context, the host of the SRR1950940_818 contig similar to Beihai picorna-like virus 72 might be a coral, confirming its initial attribution to Anthozoa (Shi et al., 2016), although palmID associated also this virus with bacteria. Finally, the possible hosts of Barns Ness hepe-like virus 1 and 2 was considered, since these viruses were reported associated with Porifera and bivalves (Waldron, Stone & Obbard, 2018;Rosani et al., 2019), whereas they cooccur with Arthropoda (palmID) and nudibranch too.

CONCLUSIONS
A total of 20 RNA viruses were identified associated with nudibranch samples, confirming that RNA-seq datasets are useful data to untangle sample-associated viromes, particularly for under-studied taxa. While the analysis of RNA samples can reveal the active components of the biome as well as RNA viruses, DNA-based metagenomics will be required to trace the dormant DNA viruses. Starting from the evidence of RNA viruses associated with nudibranchs, future research can deepen the functional roles of viromes in these fascinating small and colourful marine species by untangling individual viromes using dedicated sample preparation approaches.