The genetics of host–virus coevolution in invertebrates

Highlights
• No genetic data are available on viruses or antiviral defence for most animal phyla.
• The research bias may bias understanding of host–virus coevolution.
• Antiviral RNAi genes of Drosophila display rapid adaptive evolution.
• There is no difference between plant, vertebrate, and insect viruses in median dN/dS ratio.
• High-frequency large-effect segregating polymorphisms provide evidence for coevolution.


Background
Viral infection and antiviral defence are universal phenomena [1] and viral infections are reported across the metazoa [e.g. 2-4]. However, research tends to focus more on the coevolution of vertebrates (and plants) and their viruses than on invertebrates and their viruses, and relevant genetic data on viruses and antiviral resistance are lacking for almost all invertebrate phyla. If major lineages differ systematically in their molecular or ecological interactions with viruses, as might be expected given the differences in immune mechanisms, then this research bias could skew our overall perspective of the host-virus (co)evolutionary process [e.g. 5].
In this review we present data from arthropods that broadly suggest viruses do indeed drive invertebrate evolution: selective sweeps, resistance polymorphisms, and elevated rates of protein evolution have all been attributed to virus-mediated selection. However, whether this is part of a strict coevolutionary process [6,7] is less clear: viruses certainly evolve in response to invertebrate hosts, but as yet there is relatively little evidence demonstrating that this occurs as part of a reciprocal selective process.

Virus-driven invertebrate evolution
Selection by viruses could drive frequent and rapid fixations in invertebrate populations, reducing genetic diversity at the selected loci and elevating divergence between species. Selection on amino-acid sequences, which may be common for antagonistic host-virus interaction, could additionally elevate the rate of non-synonymous substitution (dN). Comparison of such 'footprints of selection' between immune genes and genes with other functions argues in favour of pathogen-mediated selection in arthropods generally [e.g. 8-11], and identifies the antiviral RNAi pathway as a potential coevolutionary hotspot in Drosophila [9,12,13]. Genes mediating antiviral RNAi [Ago2 and Dcr2, reviewed in 14] are among the fastest evolving 3% of protein sequences across D. melanogaster and D. simulans, with adaptive amino-acid fixations in this pathway estimated to happen every 10-40 thousand years [15]. Moreover, there is evidence for positive selection and recent selective sweeps in antiviral RNAi genes from multiple Drosophila lineages, while homologous 'housekeeping' genes do not show this pattern [12,15,16].
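To make the distinction between synonymous and non-synonymous change concrete, the following minimal Python sketch classifies codon differences between two aligned coding sequences using the standard genetic code. It is illustrative only: the function name and the example sequences are hypothetical, and it reports raw counts of differing codons rather than dN or dS, which additionally require normalising by the numbers of non-synonymous and synonymous sites and correcting for multiple substitutions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_codon_differences(seq_a, seq_b):
    """Count codons that differ synonymously or non-synonymously.

    Hypothetical helper: both inputs are assumed to be in-frame, aligned
    coding sequences of equal length. Codons with gaps or ambiguity codes
    are skipped. Returns (synonymous, non_synonymous) raw counts.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip gaps and ambiguous bases
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Made-up sequences: CTT -> CTC keeps Leu, GAA -> GAT changes Glu to Asp.
print(classify_codon_differences("ATGCTTGAA", "ATGCTCGAT"))  # (1, 1)
```

An excess of non-synonymous over synonymous change, once properly normalised, is the kind of 'footprint' that the analyses cited above look for.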
The hypothesis that this is driven by a molecular 'arms race' with viruses is appealing [15], first because virus-encoded suppressors of RNAi (VSRs) are widespread among RNA viruses [reviewed in 17], second because some VSRs are known to interact directly with AGO2 and DCR2 [e.g. 18-20], and third because VSRs from Drosophila Nora viruses can be highly specific to the host species' AGO2 [21]. However, other invertebrate antiviral genes are not reported to display extensive positive selection, and it remains possible that selection on Drosophila RNAi genes has been mediated by other selective agents [22]. Testing whether such potential 'hot spots' of immune system evolution are a general phenomenon will require data from a wider range of invertebrate taxa, and based on sequence analysis alone it will remain hard to attribute selection to the action of viruses.
Virus-mediated selection may also be inferred using high-frequency large-effect host resistance polymorphisms, as these can result from negative frequency-dependent selection (i.e. when rare alleles have higher fitness) or incomplete/ongoing selective sweeps [reviewed in 7]. A large-effect polymorphism in the D. melanogaster autophagy-pathway gene ref(2)P confers resistance to the vertically-transmitted Drosophila melanogaster Sigma Virus (DMelSV), with the resistant allele reducing viral transmission by ~90% in females and ~60% in males [reviewed in 23]. The resistant allele occurs at 25-35% in European populations, and population-genetic analyses suggest it arose roughly 1-10 kya and has increased in frequency recently [24,25]. A second large-effect DMelSV resistance polymorphism comprises a natural Doc transposable element insertion into CHKov1 followed by a partial duplication and inversion involving CHKov1 and CHKov2. The Doc insertion exists at high frequency (80% in a North American population) and reduces infection rates by ~50%. The subsequent rearrangement gave rise to a virus-inducible CHKov2 transcript associated with an 80-140-fold decrease in viral titre [26]. Again, population-genetic analyses of this locus suggest resistance is derived and has recently increased in frequency [26,27]. Resistance to Drosophila C virus (DCV) is associated with segregating variants in pastrel (~50% increase in survival time) and Anaphase promoting complex 7 (>100% increase, but this currently lacks experimental verification [28]), although both resistant alleles are currently rare [15% and 3% of surveyed alleles in the wild, see 28]. Finally, experimental evolution under recurrent challenge with DCV also identified functional polymorphism in pastrel, and further identified virus-resistant alleles segregating in Ubc-E2H and CG8492.
The DCV-resistant alleles of pastrel and Ubc-E2H respectively displayed a 24% and 14% selective advantage under experimental conditions, and knock-downs of gene expression reduced survival after challenge [29].
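As a rough illustration of what selective advantages of this size could mean for allele frequency change, the sketch below iterates a textbook deterministic haploid selection model, p' = p(1 + s) / (1 + ps). This is an assumption made purely for illustration: the source does not state how the selective advantage was defined or estimated, real Drosophila populations are diploid and subject to drift, and the starting frequency of 5% is hypothetical.

```python
def allele_frequency_trajectory(p0, s, generations):
    """Deterministic haploid selection: p' = p(1 + s) / (1 + p*s).

    p0 is the starting frequency of the favoured allele, s its constant
    selective advantage. Returns frequencies for generations 0..generations.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + p * s)
        freqs.append(p)
    return freqs


def generations_to_reach(p0, s, target, max_generations=100_000):
    """Number of generations until the allele frequency first reaches target.

    Returns None if the target is not reached within max_generations.
    """
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        p = p * (1.0 + s) / (1.0 + p * s)
    return None


# Hypothetical scenario: a resistant allele starting at 5% frequency.
for label, s in [("s = 0.24", 0.24), ("s = 0.14", 0.14)]:
    print(label, generations_to_reach(0.05, s, 0.95), "generations to 95%")
```

Under this simplified model the odds p/(1 - p) grow by a factor of (1 + s) each generation, so advantages of this magnitude would carry an allele from rare to common within tens of generations, provided selection remained constant.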
High-frequency large-effect viral resistance polymorphisms have also been reported from other invertebrates. For example, segregating resistance to the Orsay Virus in the nematode Caenorhabditis elegans maps to a non-functional truncation of drh-1, one of three dicer-related helicases involved in RNAi [30]. Here the susceptible allele is derived, but is nevertheless found at a global frequency of 23% and appears to have spread recently, perhaps suggesting the action of selection at a linked locus [30]. Polymorphism in the antiviral RNAi pathway (Dicer-2) has also been proposed to underlie some of the genetic variance for resistance to Dengue virus in the mosquito Aedes aegypti [31]. In other cases the mechanism for resistance is unknown. For example, some populations of the pest moth Cydia pomonella have recently evolved resistance to its Granulosis virus, via a single dominant sex-linked allele that blocks viral replication [32,33]. Similarly, resistance to White Spot Syndrome Virus in the shrimp Penaeus monodon has been mapped to a single marker associated with a ~2000-fold reduction in viral titre [34], which occurs at a frequency of 40-60% [35].
Such polymorphisms are consistent with negative frequency-dependent selection or with incomplete/ongoing selective sweeps [e.g. 28], but because the resistant allele is often recently derived and increasing in frequency, it seems likely that many may be in the process of fixing. However, robustly attributing evolution to virus-mediated selection is challenging, and selection by other agents [e.g. Doc insertion in CHKov1; 27] and at linked loci [e.g. drh-1 deletion; 30] has been proposed in some cases. Nevertheless, experimental evolution shows that virus-mediated selection can lead to a rapid evolutionary response in Drosophila and can select for segregating variants such as pastrel [29] and ref(2)P.

Invertebrate-driven virus evolution
It seems certain that viral evolution occurs in response to invertebrates, if only because hosts always dominate the viral environment. For example, viral adaptation may underlie host-specificity seen in some insect viruses [e.g. 21,37,38], and adaptation to the invertebrate host has been attributed to specific amino-acid changes in several invertebrate-vectored viruses, including Chikungunya Virus, Venezuelan equine encephalitis virus, and West Nile Virus [39-41]. Such adaptation to the host may also be reflected by the tendency for Sigma Viruses to replicate more effectively in closer relatives of their natural hosts [42].
Given this, it is interesting to ask whether virus evolution occurs in response to specific host immune mechanisms. Genotype by genotype interactions (with host polymorphism for resistance and viral polymorphism for overcoming that resistance) may be indicative of negative frequency-dependent selection or incomplete/ongoing selective sweeps in the virus, driven by selection mediated by host resistance. For example, genotype by genotype interactions have been reported between Dengue Virus 1 and Aedes aegypti mosquitoes [e.g. 43,44]. The best-studied invertebrate case may be the interaction between ref(2)P and DMelSV [reviewed in 23,45], where a viral lineage capable of overcoming ref(2)P resistance arose a few hundred years ago and subsequently spread to become the most common form [46,47]. The rapid spread of this resistance-insensitive virus was documented as it occurred in two European populations [48,49], and experiments suggest that the ref(2)P-insensitive virus can replace the sensitive virus in a resistant ref(2)P host background, indicating that host resistance may indeed drive viral evolution [36]. The rapid spread of a viral lineage may often be indicative of a selective sweep, and such expansions have also been seen in the Sigma virus of D. obscura [50]. However, without additional evidence of pre-sweep genotypes or genomic regions such potential sweeps cannot be differentiated from expansions [e.g. an epidemic, 51], and cannot be attributed to host-mediated selection.
It is often argued that if host resistance drives the recurrent appearance of novel viral protein variants, then this may elevate the ratio of non-synonymous to synonymous variants (dN/dS) in the virus [e.g. 52, but see 53]. This is widely accepted for some viral genes interacting with the vertebrate immune system [e.g. 52,54], but although several multi-isolate invertebrate datasets are available [46,47,50,55-64], few present whole genomes or analyse patterns of protein evolution [c.f. 51]. However, some vertebrate and plant viruses interact with their invertebrate vectors, allowing the additional impact of invertebrate-mediated selection over and above that mediated by vertebrates or plants to be detected [65,66]. Previous analyses of viral surface proteins, which often interact directly with host proteins, suggest that dN/dS is lower [...] which positive selection was often detectable were non-vectored vertebrate viruses [detected in 12 of 27, vs 1 of 17 for vectored vertebrate viruses, and 2 of 24 and 1 of 10 for vector-borne and non-vector-borne plant viruses; 65,66]. Taken together, these data may suggest that constraint is higher in vector-borne viruses, but that neither plants nor invertebrates are as likely as vertebrates to drive viral dN detectably above dS.
Figure 1. [...] Note that the number of PSCs did not correlate with the total number of codons. Viruses were chosen to encompass a wide phylogenetic distribution, and were included if ≥20 complete genomes were available (≥16 complete genomes for invertebrates). If >100 genomes were available, the data were down-sampled at random to 100 sequences. Selection was inferred using FUBAR [71] from the HyPhy package [72] on a 20 × 20 grid with 10 independent MCMC chains each providing 1000 subsamples from the posterior (each 5 × 10^8 steps after 5 × 10^8 burn-in steps). Codons were only included if the effective sample size from the posterior was ≥100. Overlapping reading frames were excluded and recombination breakpoints were inferred using GARD [73] before FUBAR analysis. GLMMs were fitted using MCMCglmm [74], with host as a fixed effect and viral family as a random effect. A Gaussian distribution was assumed for median dN-dS values, while the number of PSCs was assumed to be Poisson distributed. Significance was assessed by examination of the credibility intervals.
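The inclusion and down-sampling rules given in the Figure 1 legend can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name, the host-group labels, and the use of a seeded random generator are all assumptions, and the thresholds are read from the legend as 'at least 20' (or 'at least 16' for invertebrate viruses) and 'more than 100'.

```python
import random


def select_genomes(genomes, host_group, max_genomes=100, seed=None):
    """Apply the inclusion threshold, then down-sample if necessary.

    Hypothetical helper. `genomes` is a list of complete-genome records
    (any objects), and `host_group` is a label such as "invertebrate",
    "vertebrate" or "plant". Returns None when too few genomes are
    available, otherwise a list of at most `max_genomes` records.
    """
    # Assumed reading of the legend: >=16 genomes for invertebrate
    # viruses, >=20 for all other host groups.
    min_genomes = 16 if host_group == "invertebrate" else 20
    if len(genomes) < min_genomes:
        return None
    if len(genomes) > max_genomes:
        rng = random.Random(seed)  # seeded only to make the sketch reproducible
        return rng.sample(genomes, max_genomes)
    return list(genomes)


# Made-up identifiers standing in for complete genome sequences.
example = [f"isolate_{i}" for i in range(250)]
subset = select_genomes(example, "vertebrate", seed=1)
print(len(subset))  # 100
```

Sampling without replacement keeps every retained genome unique, which matters because duplicated sequences would carry no additional information about substitutions.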

Conclusions
Despite the evidence for strong positive selection acting on some antiviral immunity genes, there are generally few sites in the viruses of vertebrates, arthropods, or plants which exhibit detectable positive selection using the dN > dS test, and the number does not differ significantly between these groups (Figure 1). There is generally little evidence for pervasive diversifying selection in either surface proteins [65,66] or VSRs [67]. However, even assuming that dN > dS is a good metric of positive selection, there are at least two reasons why it may be hard to detect an arms race using such data from RNA viruses. First, if hosts drive global selective sweeps to fixation in the virus, then standing dN/dS within a population will not strongly reflect the impact of positive selection [53]. Second, even if different viral lineages respond in parallel to selection (so that comparisons between the lineages might be expected to display elevated dN/dS), the disparity in evolutionary rates means that host fixations will be so infrequent, compared to viral mutations, as to have virtually no impact on viral dN/dS [e.g. 67]. Therefore it is perhaps unsurprising that the well-known examples of pervasive diversifying selection in viruses are not driven by coevolution with the host population, but by virus evolution in response to the rapidly changing 'adaptive' immune response of vertebrates [e.g. 54].
Given the difficulty associated with inferring invertebrate-virus coevolution from historic patterns of protein evolution, the best evidence instead comes from patterns of functional polymorphism. Although the most compelling case is arguably the ref(2)P [... 70], and suggest that ongoing and/or incomplete sweeps may be widespread. Indeed, if viral insensitivity to resistance often arises rapidly, before the resistant allele has fixed, then reciprocal invertebrate-virus coevolution may be much more widespread than is evident from reciprocal sweeps to fixation.