Introduction

Despite many advances in our understanding of virus evolution, how virulence evolves in a virus, particularly following a jump to a new host species, continues to be contentious. Will a virus become more or less virulent in a new host? What level of virulence is optimized by natural selection and why? Is there any consistent association between host-jumping and virulence such that predictions can be made about how virulence might evolve following emergence? Not only will the answers to these questions reveal fundamental aspects of virus biology, but they also may assist in infectious disease management and mitigation, particularly as humans, other animals and plants face a continual threat from emerging viruses.

The term ‘virulence’ has different meanings depending on context, can be assessed in a variety of ways and is often only an operational measure1. To be as general as possible, we assume a simple working definition of virulence: the harm caused by pathogen infection, particularly in terms of host morbidity and mortality. Virulence is also a complex trait determined by a combination of pathogen, host and environmental factors. Although it is obviously necessary to understand all three, we focus on the pathogen (virus) component as this is the most tractable, with the small genomes and rapid replication and evolution of viruses facilitating comparative and experimental studies, and because there is strong evidence for heritable virus genetic variation for virulence2. Although we purposely focus on the analysis of virus genomes, genome-wide association studies (GWAS) promise to open up new ways to explore the evolutionary impact of viral infections on host genomes3 and hence the intimate interaction between host and virus4.

Another complexity in studies of virulence evolution is that metagenomics is increasingly showing that mixed (that is, polymicrobial) infections are commonplace5, and even seemingly healthy hosts can carry multiple microorganisms of the kind often thought to be pathogenic6,7. Determining which microorganism is the cause of a particular disease syndrome can be troublesome, and it is possible that overt illness might result from synergistic interactions between multiple microorganisms that overwhelm the host. Hence, the model of one pathogen–one disease that has dominated studies of human infections, and implicitly models of virulence evolution, may be overly simplistic. It is also possible that measurements of relative virulence vary between humans and wildlife populations. For example, whereas dengue is considered an important infectious disease of humans, the virulence of dengue virus is low in terms of overall mortality in humans, and infections of equivalent severity may go unnoticed in wildlife, particularly as there is a strong sampling bias towards the most virulent presentations.

Virulence evolution in viruses has traditionally been studied from one of two separate research paths — the theoretical and the empirical — that have largely been pursued independently. Although theory and empiricism have each generated important and parallel insights, they have each been able to paint only a partial picture of virulence evolution. Few attempts have been made to bridge this divide8,9.

There is now a large body of long-standing evolutionary theory that considers what level of virulence maximizes pathogen fitness under variable conditions, such as differing modes of transmission, levels of co-infection and selection pressures both within and between hosts10. Although of great value, a drawback is that this work is unavoidably based on a small number of case studies, the most famous of which is the co-evolution of myxoma virus (MYXV) and European rabbits following the release of MYXV as a biological control11. However, the insights from examples such as MYXV may be insufficient to adequately inform theoretical models confronted with novel, real-world emergence events.

By contrast, empirical studies involve laboratory-based methods to identify the mutations that affect virulence (that is, virulence determinants), usually on the basis of a combination of reverse genetics and cell culture and/or animal models8,9,12,13,14,15,16. These studies are often very successful in pinpointing causal mutations (see Table 1 for illustrative examples) and are commonplace following the emergence of a new disease. However, because the mutations identified through experimental studies are not considered in an evolutionary context, their relevance for general theories of virulence evolution is usually ignored. In addition, in vitro methods may not reflect real-world selection pressures, there may be little consideration of how virulence mutations affect inter-host transmission, and animal models commonly differ from the species infected in the field. For example, virulence determinants in MYXV identified on the basis of in vitro studies and mouse models have often not been upheld in reverse genetic experiments using the natural rabbit host17. Similarly, despite the regularity of their use, there has been a long-standing debate over the validity of ferrets as accurate models for human influenza18.

Table 1 Examples of virulence determinants in viruses

Bridging the gap between the theoretical and empirical approaches would bring a new impetus to studies of virulence evolution. In this Review, we outline how this can be achieved within a phylogenomics framework. We show that virus phylogenies are being increasingly used to help identify virulence determinants and that the data obtained can be used to test general theories of virulence evolution. Such a phylogenomic approach to studying virulence evolution is timely because of the rapidity with which virus genome sequence data are now being generated, including during ongoing disease outbreaks of emerging viruses19,20,21, and because of the development of new phylogeny-based methods for studying and visualizing genomic data22,23,24. However, the success of this approach also requires that phylogenomic data are combined with relevant clinical, epidemiological and experimental metadata so that a direct link can be made between virulence, virus genotype and phenotype, and population fitness.

Disease emergence and virulence evolution

Arguably the most interesting context of virulence evolution is following a host jump as this sits at the heart of virus emergence, and the question of how virulence will evolve is commonly asked following the appearance of a new virus or the emergence of an existing virus with altered host range (Box 1). To make meaningful inferences, it is important to compare virulence in both the reservoir (that is, donor) and novel (that is, recipient) host species. Although this may sound straightforward and is tractable in some cases25, in reality it faces a number of difficulties. In many cases, including common infections such as hepatitis C virus26, as well as emerging infectious diseases such as Zika virus (ZIKV), the reservoir species is unknown or is at best uncertain. Even if a reservoir species is known, we generally know little, if anything, about virulence in that species, and there is likely to be an ascertainment bias towards the most virulent cases. For example, although species of fruit bats appear to be reservoirs for Ebola virus (EBOV)27, little is known about its virulence in these animals28, as is true of many wildlife infections. The identification of host species may also change with better sampling, which is invariably poor in wildlife. For example, it was long thought that the canine parvovirus (CPV) that emerged in dogs in the late 1970s had jumped from cats infected by a closely related virus29. However, more recent sampling of wild carnivore species has shown this to be incorrect, such that the true reservoir species for CPV is unclear30. Therefore, it is crucial to understand disease processes, including virulence, in reservoir hosts under natural conditions, which will require more detailed studies of animal ecology.

Although we currently know little about virulence in reservoir species, comparative data tell us that, on average, low-virulence infections have a greater chance of successfully establishing transmission cycles in humans than viruses with higher mortality31. This greater chance is presumably because high virulence requires a greater supply of susceptible hosts during the early stages of emergence.

Theories of virulence evolution

Evolutionary biologists have had a long fascination with virulence32,33,34,35,36,37. Because there is a very large literature base on this subject, we necessarily provide only a brief overview here. A straightforward interpretation of virulence evolution is that natural selection will optimize the level of virulence that maximizes pathogen fitness, expressed as the basic reproductive number (R0)1, although in reality fitness is shaped by a complex set of host–pathogen interactions38,39. Current evolutionary theory tells us that when a virus jumps to a new species, its initial virulence can vary from asymptomatic to highly pathogenic, and precisely where it lies on this virulence spectrum is difficult to predict. However, it is possible that the direction of virulence evolution can be anticipated, at least in part, if the key relationship between virulence and transmissibility, and hence fitness, is understood. Importantly, there is also evidence from insect viruses that host phylogeny is able to predict some aspects of virulence evolution following species jumps, with related host species tending to have similar levels of virulence25.

A commonly stated idea is that there is often an evolutionary trade-off between virulence and transmissibility because intra-host virus replication is necessary to facilitate inter-host transmission but may also lead to disease, and it is impossible for natural selection to optimize all traits simultaneously. In the case of MYXV, this trade-off is thought to lead to ‘intermediate’ virulence grades being selectively advantageous: higher virulence may mean that the rabbit host dies before inter-host transmission, whereas lower virulence is selected against because it does not increase virus transmission rates. A similar trade-off model has been proposed to explain the evolution of HIV virulence40. However, many doubts have been raised about the general applicability of the trade-off model35,41,42,43, virus fitness will be affected by traits other than virulence and transmissibility39,41,44, contrary results have been observed in experimental studies45 and relatively little is known about evolutionary trade-offs in nature. For example, in the case of the second virus released as a biocontrol against European rabbits in Australia — rabbit haemorrhagic disease virus (RHDV) — there is evidence that virulence has increased through time, probably because virus transmission often occurs through blow flies that feed on animal carcasses, making host death selectively favourable46. Similarly, experimental studies of plant RNA viruses have shown that high virulence does not necessarily impede host adaptation47 and, in the case of malaria, higher virulence was shown to provide the Plasmodium parasites with a competitive advantage within hosts48.
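The trade-off argument above can be made concrete with a minimal numerical sketch. It assumes, purely for illustration, that the transmission rate rises with virulence but saturates, beta(alpha) = b*alpha/(k + alpha), and that R0 is this rate divided by the rate at which infections end (background host death, recovery and virulence-induced death), with the density of susceptible hosts scaled to 1. The functional form and every parameter value are hypothetical and are not taken from the MYXV or HIV literature.

```python
import math

def r0(alpha, b=2.0, k=0.5, mu=0.02, gamma=0.1):
    """R0 under an assumed saturating virulence-transmission trade-off.

    alpha: virulence, taken here as the disease-induced host death rate
    b, k:  hypothetical parameters of beta(alpha) = b * alpha / (k + alpha)
    mu:    background host death rate; gamma: recovery rate
    """
    beta = b * alpha / (k + alpha)      # transmission rises with virulence, then saturates
    return beta / (mu + gamma + alpha)  # divided by the rate at which infections end

def optimal_virulence(k=0.5, mu=0.02, gamma=0.1):
    """Analytical optimum for this specific form: alpha* = sqrt(k * (mu + gamma))."""
    return math.sqrt(k * (mu + gamma))

# Grid search over virulence values to confirm the analytical optimum numerically.
GRID = [i / 1000 for i in range(1, 2001)]
BEST = max(GRID, key=r0)

if __name__ == "__main__":
    print(f"grid optimum: {BEST:.3f}; analytical optimum: {optimal_virulence():.3f}")
```

Under these assumptions R0 peaks at an intermediate virulence: very low virulence transmits poorly, whereas very high virulence removes the host before much transmission can occur. If transmission does not saturate, or continues after host death (as suggested for RHDV), the optimum shifts or disappears, which is one reason the generality of the trade-off model is questioned.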

Other factors in addition to evolutionary trade-offs can shape the level of virulence in an emerging virus. For example, ‘short-sighted’ virulence evolution within a single host may be detrimental for inter-host transmission49, and newly emerged ‘spillover’ infections that have experienced only a limited number of transmission events are likely to have virulence levels that have not yet been optimized for transmissibility by natural selection50. Accordingly, for spillover infections, ongoing transmission may be largely at the mercy of random drift effects, including the severe population bottlenecks that routinely accompany such events51. Finally, it is possible that virulence may sometimes simply be a coincidental by-product of selection for another trait or selection for transmission in another species.

Theory therefore tells us that natural selection can increase or decrease pathogen virulence, depending on the particular combination of host, virus and environment1,32,33,37,41,52,53. Although providing a useful framework, theory can offer only broad generalities because the relevant factors vary substantially and need to be assessed on a case-by-case basis. Virulence evolution could, however, be better understood if its genomic basis were known.

Phylogenomics for assessing virulence evolution

Phylogenetic studies of viruses, including those that consider whole-genome sequences, are commonplace and are often used to understand a variety of aspects of virus evolution. In particular, virus phylogenies are being increasingly used to understand the evolution of key phenotypic traits such as virulence9 (and see the examples described below). Phylogenomics provides an informative way to help understand virulence evolution and establishes a set of hypotheses that can be tested using appropriate experimental assays8,9. We also believe that phylogenomics provides valuable information on how natural selection acts on virulence and can be used to test general models of virulence evolution, thereby providing a key link between theoretical and empirical approaches to studying virulence evolution. The crux of this approach involves mapping mutations onto phylogenetic trees of viruses sampled within and/or between disease outbreaks and from reservoir and novel hosts. The phylogenetic location of these changes — whether they fall on shallow or deep nodes (branches) and/or singularly or in parallel — makes it possible to infer, at least in broad terms, the selection pressures acting on virulence mutations and from this infer important aspects of virulence evolution (Box 1; Fig. 1).

Fig. 1: Phylogenomics of virulence evolution.

a | A model phylogeny with virulence determinants mapped to a fairly deep node suggesting that higher virulence has increased virus fitness. b | A model phylogeny with virulence traits mapped to shallow nodes suggesting that higher virulence reduced pathogen fitness so that viruses with these mutations are purged from the population or require compensatory mutations. c | A model phylogeny with a high-virulence mutation arising multiple times independently owing to parallel or convergent evolution. The occurrence of parallel/convergent mutations that occur more frequently than by chance8 is likely to reflect adaptive evolution (Fig. 2). d | The relationship between virulence, fitness and host jumps. A virus is assumed to be at a fitness peak (high R0), in this case high virulence, in the reservoir host, so that the mutations determining both virulence and host range are expected to be subject to strong purifying selection (for example, a low value of dN/dS). As the virus emerges in the new recipient host, it will initially be maladapted (that is, reside in a fitness valley) and subject to genetic drift as the population is small. As it adapts to the new host, virulence will be selectively optimized (in this case declining), increasing R0 and resulting in positive selection (for example, dN/dS > 1, although other measures of selection pressure are available). Once the virus becomes adapted to the new host, the virulence determinants are again subject to purifying selection.

The greater the fitness of a virulence determinant, the more rapidly it will spread through the virus population and the deeper it will fall on a virus phylogeny (that is, closer to the root of the tree), including on the branch linking reservoir and novel hosts. Of particular importance are repeated occurrences of the same mutation falling on deep branches across multiple outbreaks, or multiple cross-species transmission events, as both parallel evolution and convergent evolution can be signatures of adaptive evolution8,54,55,56,57 (Fig. 1). For example, in the case of West Nile virus (WNV), a single mutation in the virus helicase protein repeatedly evolved in high-mortality outbreaks in birds, which is indicative of a selective advantage58. Similarly, the reversion to virulence in oral polio vaccine (OPV) strains of poliovirus has been associated with extensive parallel evolution8, and parallel evolution was also associated with host-specific adaptation in experimental studies of cross-species transmission involving Drosophila virus C59. Where adaptive evolution has been at play, phylogenies of the sequence in question may have a characteristic shape24, and the sequences associated with selected branches may also contain genomic signatures indicative of positive selection, such as the rapid fixation of amino acid changes or an increased ratio of nonsynonymous to synonymous substitutions per site (dN/dS)8,60. In the case of frequent parallel or convergent evolution for specific virulence mutations, it is also possible that the amino acid sites involved will have signatures of positive selection, such as an elevated dN/dS (as was the case in WNV; see below).

Following the same logic, mutations that fall on shallow branches in virus phylogenies (that is, closer to the tips) are present in a smaller proportion of the population and are therefore more likely to be of lower fitness such that they may be removed by purifying selection. Hence, virulence-determining mutations that repeatedly fall on tip branches alone are likely to inhibit some other aspect of pathogen fitness, thereby reducing R0 at the population scale.
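The deep-versus-shallow distinction can be illustrated with a minimal sketch that reports, for each mapped mutation, the proportion of sampled tips that inherit it. The tree, node names and mutation labels are all hypothetical, and the mutations are assumed to have already been assigned to branches (for example, by ancestral state reconstruction). Each branch is identified by the node it leads to.

```python
def tips_below(tree, node):
    """Return the set of tips descending from `node` (a tip returns itself)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    tips = set()
    for child in children:
        tips |= tips_below(tree, child)
    return tips

def branch_depth_summary(tree, root, mutations):
    """For each mutation, report how many sampled tips inherit it.

    tree:      dict mapping each internal node to a list of its children
    mutations: dict mapping a mutation label to the node its branch leads to
    """
    total = len(tips_below(tree, root))
    summary = {}
    for mutation, node in mutations.items():
        inherited = len(tips_below(tree, node))
        summary[mutation] = {
            "tips": inherited,
            "fraction": inherited / total,
            "terminal": not tree.get(node),  # True if the branch leads to a tip
        }
    return summary

# Hypothetical five-tip tree: mutA sits on an internal branch, mutB on a tip branch.
TREE = {"root": ["n1", "t5"], "n1": ["n2", "t4"], "n2": ["t1", "t2", "t3"]}
MUTATIONS = {"mutA": "n1", "mutB": "t5"}
SUMMARY = branch_depth_summary(TREE, "root", MUTATIONS)

if __name__ == "__main__":
    for name, info in SUMMARY.items():
        print(name, info)
```

In this toy example mutA is inherited by four of the five tips, whereas mutB is confined to a single terminal branch. The proportion of tips is only a crude proxy for fitness: it depends heavily on sampling, and it says nothing about how recently a mutation arose.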

Although this approach has a solid theoretical and empirical basis61,62, a complicating factor is that a virulence-determining mutation that has very recently emerged will necessarily fall towards the tips rather than on an internal branch even if it is selectively advantageous. Similarly, although popular, dN/dS measures are less robust over short timescales, such as during outbreaks, because mutations may not have reached fixation by positive selection or had time to be purged by purifying selection, and it can be difficult to detect selected mutations that occur only once63,64. Approaches to detect positive selection that do not rely on dN/dS, such as those based on tree shape24, or tracking mutations that are increasing in frequency compared with those thought to be evolving neutrally64,65, may therefore add analytical power.
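For readers less familiar with the statistic, a simplified definition of dN/dS is given here as a sketch. The symbols are introduced for illustration only, and the expression ignores the corrections for multiple substitutions at the same site that standard estimation methods apply.

```latex
\omega = \frac{d_N}{d_S}, \qquad
d_N = \frac{N_d}{N}, \qquad
d_S = \frac{S_d}{S}
```

Here N_d and S_d are the numbers of nonsynonymous and synonymous substitutions observed, and N and S are the numbers of nonsynonymous and synonymous sites in the sequences compared. Values of \omega well below 1 are usually read as purifying selection, values near 1 as neutral evolution and values above 1 as positive selection. During a short outbreak both N_d and S_d are small, so the ratio is noisy, which is consistent with the caution expressed above.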

The phylogenetic mapping of virulence mutations can proceed in two ways depending on the extent of a priori knowledge. In a ‘top-down’ approach, in which virulence determinants are unknown, a virus phylogeny is inferred, mutations are mapped onto this phylogeny and the mutations on key branches are then identified. Such ‘key branches’ include those directly associated with cross-species transmission events, invasions of new geographic areas, increases in rates of transmission, spikes in morbidity and/or mortality or clear instances of positive selection. The mutations identified in this way are candidates for virulence determinants that can be tested in an appropriate experimental framework8. An example of this approach is shown in Fig. 2. The second, ‘bottom-up’, approach utilizes existing knowledge of virulence determinants, such as that determined by an experimental study. The putative virulence determinant is then mapped onto the phylogeny, and its phylogenetic location (that is, deep or shallow branch, singular or parallel/convergent evolution) is used to infer how it affects virulence evolution, whether it is associated with reciprocal mutations that reflect evolutionary trade-offs and the selection pressures it faces.
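The 'top-down' procedure can be sketched in a few lines. The example assumes that mutations have already been reconstructed on each branch and that the key branches have been chosen by the investigator. The branch identifiers and mutation labels are hypothetical placeholders, not real virulence determinants.

```python
from collections import Counter

def candidate_mutations(branch_mutations, key_branches):
    """List mutations found on key branches, with their number of independent origins.

    branch_mutations: dict mapping a branch id to the mutations inferred on it
    key_branches:     branches tied to host jumps, mortality spikes and similar events
    Returns a dict mapping each candidate mutation to the number of branches
    on which it arose anywhere in the tree.
    """
    # Count each mutation at most once per branch across the whole tree.
    origins = Counter(m for muts in branch_mutations.values() for m in set(muts))
    candidates = {}
    for branch in key_branches:
        for mutation in branch_mutations.get(branch, []):
            candidates[mutation] = origins[mutation]
    return candidates

# Hypothetical input: four branches, two of which are flagged as key branches.
BRANCH_MUTATIONS = {
    "b1": ["g1:A10T", "g2:S55P"],
    "b2": ["g1:A10T"],
    "b3": ["g3:K7R"],
    "b4": ["g1:A10T", "g3:K7R"],
}
CANDIDATES = candidate_mutations(BRANCH_MUTATIONS, ["b1", "b4"])
# Mutations arising on two or more branches are flagged as parallel or convergent.
PARALLEL = sorted(m for m, n in CANDIDATES.items() if n >= 2)

if __name__ == "__main__":
    print(CANDIDATES)
    print("parallel candidates:", PARALLEL)
```

A raw count of independent origins does not show that a mutation recurs more often than expected by chance. That requires a null model of the mutation process, and any candidate flagged in this way would still need experimental testing.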

Fig. 2: Example of how phylogenomics can guide the experimental analysis of virulence determinants.

The evolution of virulence in strains of oral polio vaccine (OPV)8. OPV is an attenuated form of poliovirus that can occasionally revert to a virulent form and cause outbreaks of poliomyelitis. A | Phylogenetic analysis of OPV strains in nature reveals that some mutations associated with high virulence have experienced more frequent parallel evolution than expected by chance (and occupy well supported nodes) and hence are likely to be selectively favoured8. B | Computational evolutionary analysis then reveals that this parallel evolution for high virulence is associated with a hypothetical threonine-to-proline (T-to-P) amino acid change that is subject to significant adaptive evolution (which can be detected in a variety of ways)60,63. C | The virulence impact of these mutations is then confirmed in both in vitro (cell culture; part Ca) and in vivo (mouse; part Cb) experimental studies. In all cases, the red shading signifies increased virulence.

Although this phylogenomic approach is being increasingly used to identify virulence determinants, and we discuss a number of real data examples below, it can also be used to make general statements about the nature of virulence evolution. Specifically, a virulence mutation that falls deep in the phylogeny such that it is inherited in all subsequent branches, or one evolving in parallel or with evidence of positive selection, necessarily implies that virulence is selectively advantageous. Conversely, a virulence determinant that occurs sporadically on shallow branches and is subject to strong purifying (negative) selection suggests that virulence is not directly beneficial, probably because it inhibits some other component of overall fitness. In such cases, each instance of high virulence may represent an independent and transient evolutionary event. If only a single mutation is associated with a change in virus virulence, as in the case of WNV, then this change in virulence is likely to be selectively advantageous without an evolutionary trade-off with transmissibility, as a reduction in transmissibility would probably need to be compensated for by additional reciprocal mutations located elsewhere in the genome. Hence, if multiple mutations fall on a branch associated with a change in virulence, it is possible that some are virulence determinants and the others are associated with evolutionary trade-offs on other traits.

Although we have described it in terms of emerging viruses, this phylogenomic approach can, in theory, be applied to any system in which a phylogeny can be inferred and in which it is possible to experimentally assess the impact of individual mutations on virulence. Similarly, it can be used to study other virological traits associated with disease emergence, particularly host range. For example, the repeated evolution of the same amino acid changes following the cross-species transmission of avian influenza virus to humans strongly suggests that they directly affect host range66, and a similar approach has been used to elucidate the nature of the evolutionary arms race between viruses and their hosts67,68.

Critically, however, the approach described here should also be considered an idealized one that works best when a limited number of genomic mutations act independently to shape virulence. Virulence determinants may be harder to identify when there are more complex interactions between mutations9, which appears to be true of MYXV (Box 2). Although epistasis is likely to be commonplace in RNA viruses69, little is currently known about whether virulence mutations interact epistatically70. Similarly, this approach may work best for RNA viruses because their constrained genome sizes mean that there are probably a limited number of virulence determinants, increasing the likelihood that they are subject to parallel and/or convergent evolution, and rates of recombination (which complicate phylogenetic relationships) are often fairly low within species71.

Examples of virulence evolution in nature

To illustrate how a phylogenomic approach can shed light on the evolution of virus virulence, we now briefly outline a number of cases in which it can be or has been applied. We begin by considering cases in which virulence determinants have been successfully mapped (WNV and avian influenza A virus (AIV)), move on to those in which revealing the mutations that underpin changes in virulence has been more complex (MYXV, Marek’s disease virus (MDV) and HIV) and end by examining virulence evolution in two recent disease outbreaks (EBOV and ZIKV). When possible, we also outline what the phylogenomic analysis in each case has told us about the evolution of virulence in general.

West Nile virus

In 1999, a new lineage of WNV became the leading cause of arthropod-borne viral encephalitis in humans and horses in North America, spreading from east to west across the continent72 and causing severe mortality in many bird species, particularly the American crow73. Phylogenomic analysis revealed that a single Thr249Pro (T249P) amino acid substitution in the virus NS3 helicase protein was associated with high-virulence WNV outbreaks in corvids on multiple continents. Experimental analysis in captive crows then showed that this mutation was sufficient to explain the high fatality rates in American crows, perhaps because it increased the rate of virus replication58. WNV therefore provides an important example of where a single genetic switch controls virulence, which is obviously the easiest scenario to detect using a phylogenomic approach. Of more general importance was that T249P evolved in parallel and experienced an elevated rate of nonsynonymous change, suggesting that high virulence was selectively favoured in the absence of an evolutionary trade-off as no reciprocal mutations were observed elsewhere in the viral genome58. However, because WNV infects a variety of bird species, it is possible that the repeated appearance of T249P in fact reflects aspects of viral evolution in different hosts. In particular, American robins may have been responsible for a substantial proportion of the cross-continent spread of the virus74, in which case T249P may have been selected to increase replication (transmissibility) in that species, a coincidental by-product of which was heightened virulence in crows.

Avian influenza A virus

Ever since the emergence of the highly pathogenic H5N1 subtype of influenza virus, there has been concern over whether this influenza virus could establish sustained transmission in humans, in which it causes only sporadic spillover infections at present75,76. More recently, highly pathogenic H7N9 has sporadically infected humans77 and continues to spread through poultry populations in China78, evolving from a low-virulence ancestor79. Although the true number of human cases, and hence accurate mortality, is difficult to ascertain, it is clear that both H5N1 and H7N9 cause fairly high mortality in humans and could have serious consequences were they to trigger a large-scale human epidemic. This concern has led to attempts to use genomic data to help in pandemic risk assessment80.

At the virus subtype level, the presence of a run of polybasic amino acids in the hinge region between the HA1 and HA2 subunits that make up the haemagglutinin (HA) protein of influenza virus helps it establish a systemic, and subsequently more serious, infection and thereby acts as a useful marker of high-virulence strains of the H5 and H7 AIV subtypes79,81,82. This marker makes it relatively easy to distinguish between potentially low-virulence and high-virulence AIVs, although what triggers the evolution of the high-pathogenicity variants in these subtypes is unclear83. Other individual amino acid changes, affecting a variety of gene functions, have also been proposed as specific virulence determinants for H5N1 (refs84,85,86) as well as in those viruses that circulate in human populations such as seasonal H3N2 (ref.87), the H1N1 virus responsible for the global pandemic of 1918–1919 (ref.88) (in which host inflammatory and cell death responses to infection appear to play a key role)89, and influenza B virus90 (Table 1).

The key unresolved question is how natural selection will shape both virulence and transmissibility if an AIV such as H5N1 or H7N9 is eventually able to develop sustained transmission in humans. An added complexity is that phylogenomic analyses reveal a consistent set of mutations that distinguish human and avian influenza viruses, although whether these affect host range alone, or both host range and virulence, is unclear66,84.

Myxoma virus

The canonical study of virulence evolution following a species jump is MYXV in European rabbits, with a body of classic work undertaken by Fenner and colleagues91,92,93 (Box 2). In both Australia and Europe, highly virulent strains of MYXV were used as a biological control against the European rabbit population, with releases beginning in the early 1950s. In both continents, the same trajectory of virulence evolution was observed: virulence declined from the highly virulent (that is, grade I) release strains to encompass a far wider range of virulence grades, including the most attenuated grade V strains, with strains of ‘intermediate’ virulence the most commonly sampled in the field. This pattern, reflecting a combination of the virus evolving more attenuated strains and the host developing resistance, fuelled the idea of a trade-off between virulence and transmissibility.

Sixty years after the initial release of MYXV, the first large-scale genomic studies of its spread were performed (Box 2). Phylogenomic analysis revealed that the virulence phenotype has changed on a regular basis94. However, a major surprise was that each change in virulence was associated with a different set of mutations across multiple genes94,95. Although which mutations had the greatest impact on virulence is still unclear and requires further experimental analysis, such a phylogenomic pattern indicates that there are multiple routes to achieving the same levels of virulence, including attenuation, such that there has been convergent evolution for phenotype but not genotype. It is likely that this evolutionary flexibility in part reflects the fairly large genome size of MYXV (a double-stranded DNA virus of ~160,000 bp), which may mean that there is a large number of potential virulence determinants that can interact through epistasis96, in turn complicating any phylogenomic analysis.

Marek’s disease virus

Whether ‘imperfect’ (that is, ‘leaky’) vaccination against infectious disease, in which disease symptoms are reduced but there is less impact on virus replication and transmission, will change the selection pressures acting on the pathogen and affect virulence evolution has been the source of debate97,98. Although still contentious, particularly in the case of human disease, there is good evidence that imperfect vaccination has increased virulence in the case of MDV, a DNA herpesvirus that poses a major problem to the poultry industry99. In the 1960s, the appearance of virulent MDV strains forced the development of the first generation of Marek’s disease vaccines. However, because these vaccines were imperfect, ‘very virulent’ MDV began to appear within 10 years, necessitating a second-generation vaccine. This very virulent MDV was followed, more rapidly, by the appearance of ‘very virulent plus’ MDV, requiring a third-generation vaccine (Fig. 3). Imperfect MDV vaccines enhance virulence by elongating the infectious periods and hence transmission potential of virulent strains that would have been removed by natural selection before transmission in the absence of vaccination99. Although the genomic basis to MDV virulence evolution is currently uncertain, with some causative amino acid changes proposed100, initial phylogenomic studies suggest that, as in the case of MYXV, there are multiple genetic pathways to high virulence101 (and which again may reflect the fairly large size of the viral genome). Not only does virulence evolution in MDV have important implications for vaccination strategies against other diseases in which vaccine efficacy is fairly low102, but it also shows that in some circumstances increased virulence can be selectively advantageous.

Fig. 3: Evolution of virulence in the context of imperfect vaccination.

In the 1960s, a vaccine was developed for Marek’s disease virus (MDV) of chickens present on poultry farms. This imperfect vaccine reduced disease symptoms but did not prevent virus replication, thereby extending the infectious periods, and hence potential for transmission, of virulent strains that would have been removed by natural selection before transmission to a new host in the pre-vaccine era99. Because of this, ‘very virulent’ MDV began to appear within 10 years, necessitating the development of a second-generation vaccine that was also imperfect. This was followed, in an even shorter period, by the appearance of ‘very virulent plus’ MDV, requiring a third-generation vaccine. Although the genomic basis of MDV virulence is currently unknown, the phylogenies at the bottom of the figure hypothetically assign virulence to multiple causative mutations (as in the case of myxoma virus). The dashed arrows indicate the evolution of viruses to the next virulence grade.

HIV

Given the importance of HIV to human health and that it ignited much of the research on disease emergence, it is no surprise that there has been considerable discussion on the evolution of HIV virulence103,104,105,106,107. Indeed, it is striking that HIV in humans is markedly more virulent than the closely related viruses that naturally infect non-human primates in Africa (Box 1). Although there have been suggestions that HIV has begun to evolve reduced virulence108, discussions of the trajectory of virulence evolution are necessarily complicated by the fact that antiviral therapy has greatly extended life expectancy.

HIV virulence is often approximated as the degree of variation in the set point viral load (SPVL) that is established soon after initial infection104. The higher the SPVL, reflecting greater levels of virus replication, the more rapidly the patient will progress to AIDS in the absence of antiviral therapy. However, other studies have suggested that the replicative capacity of the virus itself is a more informative marker of virulence and is also a direct measure of virus fitness109. Indeed, some ‘controller’ individuals are able to control levels of HIV in the absence of antiviral therapy, and it has been shown that this is in part due to infection with viruses of reduced replicative capacity110.

Notably, viral genetic variation may play a greater role in shaping HIV virulence than host factors, with approximately one-third of the observed variability in SPVL assigned to virus factors111 and only ~13% seemingly due to the host112. This observation also implies that SPVL, and hence virulence, can be selectively optimized113. Supporting this is evidence that SPVL, and hence virulence, has declined in some African HIV subtypes, even accounting for the use of antiviral therapy, and that this reflects a trade-off between virulence and transmissibility114. However, despite many studies into the determinants of HIV virulence, the virus genomic mutations responsible for determining SPVL are still uncertain and multiple genes may be involved104. The difficulty in assigning the genetic determinants of SPVL may be in part due to genetic variation across viral populations111. For example, heritability in SPVL was highest (~60%) between individuals in the Swiss HIV cohort, which also represents the most homogeneous viral population113.
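To make the notion of SPVL heritability more concrete, the sketch below shows one common way such a quantity can be estimated: regressing the SPVL of recipients on the SPVL of the donors who infected them, with the slope taken as a rough heritability estimate. This is an assumption about method, not a description of how the studies cited above derived their figures, and the SPVL values are hypothetical numbers chosen for illustration.

```python
from statistics import mean


def regression_slope(donor, recipient):
    """Least-squares slope of recipient values regressed on donor values."""
    d_bar, r_bar = mean(donor), mean(recipient)
    cov = sum((d - d_bar) * (r - r_bar) for d, r in zip(donor, recipient))
    var = sum((d - d_bar) ** 2 for d in donor)
    return cov / var


# Hypothetical log10 SPVL values for five transmission pairs (illustrative only).
donor_spvl = [3.0, 4.0, 5.0, 4.0, 4.0]
recipient_spvl = [3.5, 4.0, 4.5, 4.5, 3.5]

# A slope near 1 would mean recipients closely resemble their donors;
# a slope near 0 would mean the virus genotype explains little of SPVL.
heritability_estimate = regression_slope(donor_spvl, recipient_spvl)
print(heritability_estimate)
```

Real analyses would also need to account for host factors, measurement error and the phylogenetic relatedness of the sampled viruses, none of which this sketch attempts.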

Ebola virus

The 2013–2016 outbreak of EBOV (Makona variant) in West Africa was the largest and longest described in humans since the first description of the disease in 1976, with approximately 29,000 cases and some 11,000 deaths. In addition to hindering attempts at disease control, this elongated period of transmission in humans may have resulted in different selection pressures from those faced in the animal reservoir. This outbreak also raised key questions about virulence evolution, particularly whether natural selection would have favoured EBOV variants causing higher or lower human case fatality rates had the virus not been stamped out by public health intervention115.

Phylogenetic analysis of EBOV during the 2013–2016 outbreak revealed an Ala82Val (A82V) substitution in the virus glycoprotein to be of particular importance70,116. A82V is notable as it falls on a deep internal branch of the EBOV phylogeny, compatible with adaptive evolution, whereas other amino acid changes are associated with individual sequences or only small clusters of sequences. Moreover, A82V improves binding to the human NPC1 receptor utilized by EBOV117, which would increase infectivity in humans, at the same time reducing infectivity in cells from the bat reservoir species70,116. Intriguingly, the appearance of A82V on the EBOV phylogeny is associated with two key epidemiological features: an increase in case numbers and an increase in mortality (Fig. 4). If there was indeed an increase in both EBOV transmissibility and virulence, then higher virulence is likely to have directly increased viral fitness without an evolutionary trade-off, as only a single substitution was identified. However, these apparent changes in phenotype also coincided with the movement of the virus from Guinea to Sierra Leone, such that any change in case numbers and mortality could in fact be due to a change in epidemiological factors (such as access to health care or differing human demographics and/or transmission networks), and recent studies using animal models suggest that A82V has no direct impact on virulence118.
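The distinction between a substitution on a deep internal branch and one confined to a single sequence or small cluster can be sketched as a count of descendant tips. The example below is a minimal illustration on a hypothetical seven-tip tree; the tree shape, tip names and substitution labels are all invented for the example and do not represent the EBOV phylogeny.

```python
def tips(node):
    """Return the tip names below a node (a tip is a string, a clade a tuple)."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in tips(child)]


def descendant_fraction(node, root):
    """Fraction of all sampled tips that descend from the given node."""
    return len(tips(node)) / len(tips(root))


# Hypothetical toy phylogeny: nested tuples are internal nodes.
small_cluster = ("s6", "s7")
large_clade = (("s3", "s4"), ("s5", small_cluster))
tree = (("s1", "s2"), large_clade)

# Hypothetical substitutions, each mapped to the node below the branch carrying it.
substitutions = {
    "sub_X": large_clade,    # deep internal branch, inherited by most tips
    "sub_Y": "s2",           # terminal branch, a single sequence
    "sub_Z": small_cluster,  # small cluster of sequences
}

fractions = {
    name: descendant_fraction(node, tree) for name, node in substitutions.items()
}
for name, fraction in fractions.items():
    print(name, round(fraction, 2))
```

A substitution inherited by a large fraction of tips is compatible with adaptive evolution but does not demonstrate it, as the same pattern can arise from a founder effect when a lineage moves to a new location.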

Fig. 4: The relationship between host adaptation and the evolution of virulence in Ebola virus.

A model Ebola virus (EBOV; Makona variant) phylogeny illustrates the evolution of a single amino acid substitution (glycoprotein, A82V) that is associated with viral adaptation to the human host during the West African EBOV outbreak of 2013–2016. A82V improves binding to the human NPC1 receptor utilized by EBOV, increasing infectivity in humans (red) while simultaneously reducing infectivity in cells from the bat reservoir species (blue)70,116. Maps above the phylogeny show the spread of EBOV over the timeline of the outbreak in the three affected countries in West Africa, where blue-shaded regions correspond to the wild-type virus variant (A82) and red-shaded regions correspond to the mutated virus variant (V82). It is possible that A82V was also associated with an increase in both EBOV case numbers and mortality (that is, virulence) as the outbreak progressed, such that increased virulence is directly selectively advantageous, although this is confounded by epidemiological factors.

Zika virus

ZIKV is the most recent emerging virus to lead to a major public health scare and is puzzling because a seemingly benign virus suddenly increased in virulence, causing severe neurological disease in humans. Before 2007, there were fewer than 20 human cases of ZIKV reported and all were mild infections restricted to Africa and Asia9,119. Consequently, neither the disease caused by ZIKV nor the molecular determinants of ZIKV virulence were well characterized, and it is likely that there was systematic under-reporting of infections, including those associated with severe disease. In 2007, the Pacific Islands reported the first major outbreaks of ZIKV before the virus spread to the Americas in 2014. Although the majority of human infections range from asymptomatic to mild, the virus was associated with the neurological Guillain–Barré syndrome in French Polynesia in 2013 (ref.120), and those cases from the Americas, particularly Brazil, were linked to more severe diseases, including congenital abnormalities such as microcephaly121. Phylogenetic analysis revealed that the most recent Zika epidemics are due to the Asian lineage of ZIKV, rather than the African lineage122, and that the virus spread cryptically in Brazil for at least a year before its detection20. Although there are multiple amino acid differences between the African and Asian lineages122,123, it has been claimed that those in the Asian lineage that spread through the Americas may be directly linked to both increased infectivity in Aedes aegypti mosquitoes124 and microcephaly, most notably a Ser139Asn (S139N) amino acid change in the PrM protein125. However, there is still considerable uncertainty in this area, with others arguing that viruses from both lineages can cause neurovirulence but that cases often go unreported9,126. Hence, the case of ZIKV highlights the difficulty in assessing virulence evolution within a background of sparse and biased sampling even with phylogenomic data and shows the importance of collecting reliable, real-time epidemiological data even in low-incidence situations.

Future directions and recommendations

Virulence evolution has been one of the longest-standing issues in evolutionary biology. Although a strong body of theory has been developed, there are few cases in which we understand the forces that have shaped particular instances of virulence evolution and even fewer in which we have successfully linked evolutionary theory with individual genomic changes. We believe that a synthesis of experimental studies of virulence determinants and long-standing theory of virulence evolution set within a phylogenomic framework will generate a more comprehensive understanding of virulence evolution. In particular, a phylogenomic approach, which is increasingly used in the case of emerging viruses, not only enables potential virulence determinants to be identified but also sheds light on the models of virulence evolution that have occupied theoreticians for decades.

Recent advances in real-time genomics during disease outbreaks127,128 and the increased demand for precision in public health interventions may help in the development of a new understanding of the evolution of pathogen virulence. We contend that this can be achieved within a phylogenomic framework as long as relevant data are available and strong links are made between genomics, phylogenetics, epidemiology, and experimental studies of virus virulence and fitness. Therefore, it is critically important to collect clinical (that is, disease symptoms and severity) and epidemiological (that is, time and place of sampling) metadata concurrently with the sequencing of virus genomes and to sample across a range of clinical syndromes, not just those associated with severe disease. We also stress the value of gathering concurrent and historical data from likely reservoir species, as these will provide a more complete insight into virulence evolution, and of determining the full range of microorganisms that infect a particular species, as well as their interactions, because assigning disease syndromes to individual pathogens may often be difficult. Thankfully, advances in metagenomics now make the latter task feasible6,7,129. Similarly, there is a marked lack of good virulence grading schemes among viral infections. Although such schemes can sometimes be simplistic, assuming discrete virulence categories that may not exist in nature and incorporating degrees of subjectivity, the case of MYXV shows that they are key to considering the relationship between genotype and phenotype that is essential to understanding virulence evolution.
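As one illustration of what collecting clinical and epidemiological metadata alongside each genome might look like in practice, the sketch below defines a minimal sample record. The class, field names and severity labels are hypothetical and are not drawn from any existing standard or database schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SampleRecord:
    """One sequenced virus genome plus the metadata collected alongside it."""

    sequence_id: str
    collection_date: date                # epidemiological: time of sampling
    location: str                        # epidemiological: place of sampling
    host_species: str                    # allows reservoir hosts to be included
    symptoms: Optional[List[str]] = None # None = not recorded, [] = asymptomatic
    severity: Optional[str] = None       # hypothetical labels, e.g. "mild", "severe"

    def missing_metadata(self) -> List[str]:
        """List the clinical fields that were not recorded for this sample."""
        missing = []
        if self.symptoms is None:
            missing.append("symptoms")
        if self.severity is None:
            missing.append("severity")
        return missing


# A hypothetical record with epidemiological but no clinical metadata.
record = SampleRecord("sample_001", date(2020, 1, 15), "hypothetical site", "Homo sapiens")
print(record.missing_metadata())
```

Distinguishing an unrecorded symptom list from an empty one matters here because asymptomatic and mild infections are exactly the samples that the text argues should not be overlooked.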

Finally, it is possible that an increased understanding of virulence evolution drawn from a phylogenomic approach may contribute to new strategies for pathogen control and eradication, and there is a clear potential for this framework to inform and improve the fields of disease management and the biological control of invasive pests. Although predicting where and when a new disease might emerge is clearly unfeasible because of the immense complexities involved54,130, predicting the overall trajectory of the virulence evolution of a virus in a novel host may be more achievable. Once again, biocontrol presents a compelling example. Although controversial131,132,133, the proposed release of cyprinid herpesvirus 3 (CyHV-3) as a biological control against invasive common carp (Cyprinus carpio L.) in Australia may present a unique opportunity to follow, in real time, the co-evolution between host and virus at both the genotypic and phenotypic scales. Both theory and virus natural history predict that CyHV-3 virulence will decline with time134, and it will be interesting and informative to see how any such virulence evolution is manifest in phylogenomic data.