Unravelling the Single-Stranded DNA Virome of the New Zealand Blackfly

Over the last decade, arthropods have been shown to harbour a rich diversity of viruses. Through viral metagenomics, a large diversity of single-stranded (ss) DNA viruses has been identified. Here we examine the ssDNA virome of the hematophagous New Zealand blackfly using viral metagenomics. Our investigation reveals a plethora of novel ssDNA viral genomes, some of which cluster in the viral families Genomoviridae (n = 9), Circoviridae (n = 1), and Microviridae (n = 108), others in putative families that, at present, remain unclassified (n = 20), and one DNA molecule that only encodes a replication-associated protein. Among these novel viruses, two putative multi-component virus genomes were recovered, and these are most closely related to a Tongan flying fox faeces-associated multi-component virus. Given that the only other known multi-component circular replication-associated (Rep) protein encoding single-stranded (CRESS) DNA viruses are the plant-infecting members of the families Geminiviridae (genus Begomovirus) and Nanoviridae, these likely represent a new multi-component virus group that may be associated with animals. This study reiterates the diversity of ssDNA viruses in nature and, in particular, in New Zealand blackflies.


Introduction
Hematophagous insects are responsible for vectoring a wide range of pathogens. Vectors of important mammalian and avian pathogens, such as mosquitoes, are members of the order Diptera (suborder: Nematocera). One group of blood-sucking insects, commonly referred to as blackflies or sandflies, are members of the Simuliidae family [1]. Although blackflies are found globally, only some species consume a blood meal. For those species that feed on blood, it is the adult females that do so, whereas the males feed primarily on nectar. Aquatic environments are essential for the life cycle of blackflies, with the egg, larval and pupal stages all occurring in flowing water, followed by the emergence of a winged adult form.
Blackflies are known to transmit a handful of parasites, predominantly protozoa and parasitic worms. In humans, the most important blackfly-vectored pathogen is the parasitic nematode Onchocerca volvulus, which causes river blindness. Although rare globally, O. volvulus has a significant impact on human health in Africa, affecting over 18 million people [2,3]. Avian haematozoan parasites, such as those in the genus Leucocytozoon, are also commonly transmitted by blackflies, with different blackfly species showing preferences for the bird species they bite and harbouring a variety of parasite lineages [4,5]. The most studied arbovirus transmitted by blackflies is vesicular stomatitis virus (family Rhabdoviridae, genus Vesiculovirus), which typically infects livestock; however, zoonotic events have also been reported [6]. Invertebrate iridoviruses (family Iridoviridae), which are double-stranded DNA viruses, have been identified in blackflies across the globe, often causing a covert infection [7,8].
Several species of blackfly are endemic to New Zealand, two of which, the New Zealand blackfly (Austrosimulium australense) and the west coast blackfly (A. ungulatum), consume blood meals [9,10]. They are notorious for their persistence in pursuit of a blood meal and in various locations are present in overwhelming swarms. Despite having some knowledge on the ecology of New Zealand blackflies, we know very little about the viruses that are circulating in these insects. No human pathogens vectored by blackflies in New Zealand have been documented and, with the exception of protozoal transmission in some avian species, little is known about the microorganisms circulating in these insects.
In recent years, studies have used viral metagenomics as a non-biased approach for identifying viruses circulating in hematophagous and phytophagous insects [11-15]. This has resulted in the identification of a vast number of viruses: those circulating in the hosts these insects feed on, those from the surrounding environment, and those that infect the insects themselves. Arthropods have been broadly shown to harbour a wide range of viruses with circular replication-associated (Rep) protein encoding single-stranded (CRESS) DNA genomes [11,13,14,16,17]. At present, established CRESS DNA virus families include Bacilladnaviridae, Circoviridae, Geminiviridae, Genomoviridae, Nanoviridae, and Smacoviridae, all of whose members encode Reps that share conserved replication-associated motifs, have an origin of replication and are usually <6 kb in size [18]. In addition, viruses in the family Microviridae, which infect bacteria, typically encode a major capsid protein, a minor capsid protein and a replication initiation protein, and range in size from 4-7 kb [19]. Numerous novel CRESS DNA viruses that are yet to be taxonomically classified have also been identified in arthropods [11,20].
Viral metagenomic studies reveal a great deal about viruses circulating in arthropods. For example, a study on mosquitoes [14] identified viral sequences with similarities to animal-infecting ssDNA viruses (families: Anelloviridae, Circoviridae, Parvoviridae), double-stranded (ds) DNA viruses (families: Herpesviridae, Poxviridae and Papillomaviridae), plant-infecting ssDNA viruses (families: Geminiviridae and Nanoviridae), and bacteria-infecting dsDNA viruses (three families: Myoviridae, Podoviridae and Siphoviridae). Viral metagenomic studies thus provide a snapshot of the viruses circulating in the vector's hosts, the insect itself and the surrounding ecosystem. Given the lack of information on the widespread New Zealand hematophagous insect commonly known as the blackfly, we undertook a metagenomics approach to investigate its associated ssDNA virome.

Collection of Blackflies and Isolation of Viral Nucleic Acid
For this project, 40 individual blackflies were collected from a single site in North Canterbury, New Zealand in 2015. The sex of the individuals, and whether they had consumed a blood meal, was determined. These samples were pooled and homogenized using a pestle in 2 mL of SM buffer (0.1 M NaCl, 50 mM Tris/HCl-pH 7.4, 10 mM MgSO4). The homogenized sample was centrifuged for 10 min at 10,000 rpm and the resulting supernatant was filtered through a 0.2 µm filter. The viral particles in the filtrate were precipitated overnight at 4 °C with 15% PEG, after which the solution was centrifuged at 14,000 rpm for 10 min and the resulting pellet resuspended in 500 µL of SM buffer. Subsequently, 200 µL of the resuspended material was used to isolate viral DNA using the High Pure Viral Nucleic Acid Kit (Roche Diagnostics, USA) according to the manufacturer's specifications. The viral nucleic acid was then used in a rolling circle amplification reaction with the TempliPhi™ kit (GE Healthcare, USA) to preferentially amplify circular DNA molecules.

High-Throughput Sequencing and Viral Genome Verification
Rolling circle amplified DNA was used to prepare 2 × 150 bp paired-end libraries for sequencing on an Illumina 2500 platform at Macrogen Inc. (Korea). The paired-end reads were de novo assembled using metaSPAdes v 3.12.0 [21] and the resulting contigs (>750 nts) were filtered for viral-like sequences using BLASTx [22] against a viral protein database generated from the GenBank RefSeq depository. For contigs with similarities to viruses in the Microviridae family, full de novo assembled genomes were confirmed by mapping raw reads using BBMap [23], and were deemed credible at a coverage level greater than 10×. For viral contigs with similarities to other ssDNA viruses, abutting primers (Table S1) were designed to recover the complete genomes by PCR using Kapa HiFi HotStart DNA polymerase (KAPA Biosystems, USA). Amplicons were resolved on a 0.7% agarose gel, and amplicons of the correct size were excised, gel purified and cloned into the pJET1.2 cloning vector (ThermoFisher Scientific, USA). Recombinant plasmids containing the viral genomes were purified from transformed XL blue E. coli competent cells and Sanger sequenced at Macrogen Inc. (Korea) by primer walking. The Sanger sequences were assembled using Geneious software V11.1.5.
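The two numeric cut-offs described above (contigs >750 nts and coverage >10×) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names are hypothetical, and it assumes that "coverage" means mean read depth (total mapped bases divided by genome length), which the text does not specify.

```python
def filter_contigs(contigs, min_length=750):
    """Keep contigs strictly longer than min_length nucleotides.

    `contigs` maps a contig name to its sequence string.
    """
    return {name: seq for name, seq in contigs.items() if len(seq) > min_length}


def mean_coverage(mapped_read_lengths, genome_length):
    """Mean depth, assumed here to be total mapped bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(mapped_read_lengths) / genome_length


def is_credible(mapped_read_lengths, genome_length, min_depth=10.0):
    """Apply the >10x threshold to the assumed mean-depth metric."""
    return mean_coverage(mapped_read_lengths, genome_length) > min_depth
```

For example, 400 mapped reads of 150 nt on a 5,000 nt genome give a mean depth of 12× under this assumption and would pass, whereas 300 such reads (9×) would not.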

Network Construction, Phylogenetic and Similarity Comparison Analyses
The blackfly viral rep genes (CRESS DNA viruses) and major capsid protein (mcp) genes (microviruses), together with those available in GenBank, were extracted, translated and used to build Rep and MCP protein sequence datasets. The Reps of circoviruses, geminiviruses, smacoviruses, genomoviruses and nanoviruses and the MCPs of viruses in the subfamily Gokushovirinae (family Microviridae) were first clustered with CD-HIT [24] using a 0.9 sequence identity cut-off, and a representative from each cluster was included in the final dataset. The Rep and MCP protein sequences were used separately to build sequence similarity networks (E-value = 1 × 10^−5) using the EFI-Enzyme similarity tool server [25,26]. The MCP similarity network was constructed using a minimum similarity score of 10^−175 and the Rep network with 10^−60. The score is the similarity threshold for connecting nodes, i.e., proteins, with each other; pairs with scores below this value are not connected. Protein similarity networks were visualised using Cytoscape V3.7.0 [27].
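The effect of the score threshold on network structure can be sketched as follows. This is an illustration only, not the EFI server's implementation: it assumes each pairwise comparison has already been reduced to a single numeric score where larger means more similar (for example, the negative base-10 logarithm of an E-value), and the function name `network_clusters` is hypothetical. Pairs scoring below the threshold are left unconnected, and clusters are the connected components of what remains.

```python
from collections import defaultdict


def network_clusters(nodes, scored_pairs, min_score):
    """Group nodes into connected components using only edges >= min_score.

    `scored_pairs` is an iterable of (node_a, node_b, score) tuples.
    Returns a list of sorted clusters, largest first; singletons have length 1.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= min_score:  # pairs below the threshold are not connected
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

With four hypothetical Reps and scores of 80, 65 and 20 between successive pairs, a threshold of 60 yields one cluster of three and one singleton.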
Based on the network clusters, cluster-specific sequence datasets were built for each major cluster (comprised of ten or more members) that contained a blackfly-derived sequence. The datasets included genomoviruses, circoviruses and five additional clusters labelled cluster group 1-5 for the unclassified CRESS DNA viruses, and two microvirus groups labelled cluster MV group 1 and 2. The sequences in each of these cluster groups were aligned using MUSCLE [28] and maximum likelihood phylogenetic trees were inferred using PHYML [29] with the best-fit models determined using ProtTest [30]. The substitution models used were: genomoviruses, LG+I+G; circoviruses, rtREV+G+I; CRESS group 1, rtREV+G+I; CRESS group 2, WAG+G+I; CRESS group 3, rtREV+G; CRESS group 4, WAG+G+I; CRESS group 5, WAG+G+I; MV group 1, rtREV+G+I; MV group 2, rtREV+G+I+F. Branches with aLRT support of <0.8 were collapsed using TreeGraph2 [31]. The maximum likelihood phylogenetic trees were midpoint rooted, with the exception of the genomovirus tree, which was rooted with geminivirus Reps, and the cyclovirus tree, which was rooted with circovirus Reps.
BLASTx [22] comparisons were undertaken for any singletons to determine the most closely related sequences in GenBank. For clusters comprised of fewer than four sequences, an amino acid pairwise comparison using SDT V1.2 [32] was undertaken.
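As an illustration of what a pairwise amino acid identity value represents, the sketch below computes percent identity from two already-aligned sequences. It is not SDT itself: it assumes one common convention in which only columns where neither sequence has a gap are counted, and the function name is hypothetical.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

For the toy alignment `MKT-LV` versus `MRTALV`, five columns are compared and four match, giving 80% identity.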

Results and Discussion
In a pool of 40 individual blackflies, diverse CRESS DNA viruses were identified. Cluster analyses using the Rep protein, which is encoded by the most conserved gene among CRESS DNA viruses, provide a broad overview of the extensive range of viruses recovered in the blackfly samples (Table 1 and Figure 1). The network analysis results (Figure 1) reveal that the Reps from the blackfly-derived viruses or DNA molecules cluster with those in the Genomoviridae family (n = 9), the Circoviridae family (n = 1), unclassified CRESS DNA viruses (n = 15) or as singletons (n = 6).

Divergent CRESS DNA Viruses
A large cohort of Rep-encoding viruses exists that are divergent and do not, at present, fall into classified viral families. These vary in genome organisation but all encode a Rep and a putative CP. Several novel CRESS DNA viruses were also discovered in this study; all but one cluster outside major virus family groups and are distributed across five larger network clusters and three smaller clusters containing two to three sequences (Table 1, Figure 1). Nine do not cluster with any other Reps and are therefore represented as singletons (Table 1, Figure 1). These range in genome size from ~1.8 to 3 kb and are referred to as blackfly DNA virus (BfV) 1-19 (Table 1). Genome organisation is highly variable; see Table 1 for details on open-reading frame orientations. Interestingly, BfV-18 appears to have an unusual rep which contains two predicted intron regions. All Rep-encoding molecules harbour RCR and SF3 helicase motifs, with the exception of BfV-1 and -2, which are apparently missing motif C and RCR motif II, respectively.
Several viruses related to those in the family Circoviridae have also been identified in arthropods [11,14,15,45-48]. Here we identify a blackfly DNA virus (BfV-10) whose Rep clusters with members of the Circoviridae family and is phylogenetically basal to the cycloviruses (Figure 1). The Rep of BfV-10 shares 35-43% amino acid identity with the Reps of circoviruses.

Multi-Component Viruses and Circular Rep-Encoding DNA Molecule
Viruses in the Nanoviridae family infect plants and are comprised of up to eight separate circular DNA components, each encoding a different functional gene, including a Rep. Members of another plant-infecting virus family, Geminiviridae, are known to have satellite molecules which enhance replication and pathogenicity. More recently, a multi-component virus was isolated from faecal samples of Pacific flying foxes; like other multi-component DNA viruses, its components share sequence recognition regions in the intergenic regions. In this study, three circular DNA molecules that encode a rep gene were identified. For two of these, cognate molecules encoding a cp gene were identified and therefore these represent two multicomponent viruses (Table 1, Figure 3A-C, Supplementary Data 2). These components have common regions in the intergenic region (Figure 3B); such common regions in multicomponent viruses act as recognition sites for initiation of replication [52]. These viruses, referred to as blackfly multicomponent virus (BfMCV) 1 and 2, have Reps most similar to that of Pacific flying fox multicomponent virus (KT732816) [40], sharing 65% and 67% aa similarity, respectively (Figure 3C,D). The cognate BfMCV 1 and 2 CPs share 44% with leaf-footed bug circular genetic element (MH545544) [11] and 43% with Pacific flying fox multicomponent virus (KT732816) [40]. No cognate molecule was identified for the third DNA molecule, referred to as Blackfly DNA molecule 1; however, this molecule is most closely related to the two multicomponent viruses detected here, sharing 61-62% Rep identity, and to rodent stool associated circular genome virus (JF755415) [53], sharing 62% Rep identity. A cognate molecule encoding a CP may therefore exist but not have been recovered, although none was identified in the high-throughput sequencing data; alternatively, this genetic element may be a molecule that encodes only a Rep. All molecules have the same nonanucleotide sequence 'TAGTATTAC'.
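Locating a nonanucleotide such as 'TAGTATTAC' in a circular molecule requires allowing for matches that span the arbitrary point at which the sequence was linearised. The following is a minimal sketch of such a search, not the method used in this study: the function name is hypothetical, only the given strand is searched, and the example sequences in the note below are invented.

```python
def find_circular_motif(genome, motif="TAGTATTAC"):
    """Return 0-based start positions of `motif` in a circular sequence.

    The genome is extended by len(motif) - 1 bases from its start so that
    occurrences wrapping around the linearisation point are also found.
    """
    genome = genome.upper()
    motif = motif.upper()
    n = len(genome)
    if n == 0 or len(motif) == 0 or len(motif) > n:
        return []
    extended = genome + genome[: len(motif) - 1]
    return [i for i in range(n) if extended.startswith(motif, i)]
```

In the invented sequence `TTACGGGGTAGTA`, the motif begins at position 8 and wraps around to the start, so a purely linear search would miss it.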

Bacteria-Infecting CRESS DNA Viruses
Microviruses, which infect bacteria, are commonly found where their host is present, including the microbiome of arthropods [34,54], and harbour unidirectional genomes which encode a replication initiation protein, major and minor capsid proteins and other accessory proteins [19]. Divided into two subfamilies, Gokushovirinae and Bullavirinae (International committee for virus taxonomy: 2018 release; https://talk.ictvonline.org/taxonomy/), microviruses are largely host specific [19]. Here, we identify 108 novel microviruses from blackflies. Using the most well conserved protein, the MCP, we show the wide scope of microvirus diversity present in blackflies (Figure 4). The MCPs of a large proportion of blackfly microviruses (BfMVs) group with those of gokushoviruses (Figure 4). The MCPs of 88 BfMVs cluster in MV group 1, four in MV group 2, and the remainder are part of small network clusters or are singletons. Although a few BfMVs in MV group 1 are interspersed throughout the MCP phylogeny, the majority fall in two major clades (Figure 4). Many of the newly described microviruses identified in this study group with those isolated from other arthropods, such as honey bees [34]. Taking into consideration that only 40 blackflies were sampled, the sheer number and diversity of microviruses found here is remarkable. This may, in fact, reflect an equally diverse associated bacterial community in the blackfly and the animals from which they obtain a blood meal. Furthermore, this, taken with the recent recovery of many novel microviruses from arthropods [34,54], indicates the presence of distinct arthropod microvirus populations.

Conclusions
The ongoing expansion of our knowledge of the CRESS DNA viruses, facilitated by high-throughput sequencing approaches, emphasizes the breadth of diversity in various ecosystems and organisms. The diversity and number of CRESS DNA viruses associated with arthropods alone is staggering [11,14,16,20,48]. Here, we report the identification of CRESS DNA viruses associated with blackflies in New Zealand: nine genomoviruses, 19 unclassified CRESS DNA viruses, two multicomponent viruses, a circular rep-encoding DNA molecule and 108 microviruses. To date, the only multicomponent CRESS viruses in established families are the nanoviruses and geminiviruses, which both infect plants.
Other than these, our group has previously identified novel multicomponent viruses in faecal matter of Pacific flying foxes [40] and now in blackflies. These appear to be related (i.e., in both CP and Rep) and, because the BfMCVs and the Pacific flying fox multicomponent virus form a group distinct from the Geminiviridae and Nanoviridae families, this alludes to the existence of other Rep-encoding multicomponent viruses [40]. Due to the blood-feeding nature of blackflies, these viruses may have originated from the blackflies, the hosts they feed on or the surrounding environment. Although further investigation is needed to truly determine the hosts of the CRESS viruses, we know the microviruses infect bacteria and therefore prominent groups of related microviruses may reflect a specific blackfly bacterial profile. This study provides knowledge on viruses associated with the understudied hematophagous blackfly of New Zealand and further displays the sheer breadth of CRESS DNA viral diversity in arthropods globally.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/6/532/s1, Table S1: Primer details for each recovered blackfly virus genome and molecule; Table S2: Rolling circle replication and helicase motifs identified in the Reps of the eukaryotic CRESS DNA viruses; Data S1: Pairwise identity comparison of the full genome, Rep and CP sequences of the genomoviruses recovered in this study together with all those previously identified.