Introduction

The Executive Committee of the International Committee on Taxonomy of Viruses (ICTV-EC) recently invited comments on a proposal to approve a standard binomial system of formal virus names [25]. How best to name and classify viruses has been a constantly aired topic of discussion since viruses were first discovered at the end of the 19th century, and particularly since their number and diversity were realized early in the 20th. It was clear that an orderly system of nomenclature was required. So why has it taken so long to resolve the issues? Can anything be learned from earlier discussions? I believe it can, as opinions expressed early in this debate are still valid; they indicate why it has taken so long and how the matter can be resolved.

The earliest attempts to name and classify viruses were mostly individual efforts [4, 20], and the viruses of animals, bacteria and plants were discussed separately. A watershed was reached when Holmes [14] published “The Filterable Viruses” as Supplement 2 to the 6th Edition of Bergey’s ‘Manual of Determinative Bacteriology’. It was an attempt to establish parallel systems of nomenclature for the viruses of bacteria, plants and animals using Latinized (Linnaean) binomials (LBs). One result was a meeting of the Society for General Microbiology (UK) to discuss the issues, and as Christopher Andrewes noted in his contribution, “The organizers of this discussion have very sensibly placed me, as representing sound common sense, between the two extremists, Drs Holmes and Bawden. Dr Holmes wants to start classifying and naming viruses on Linnaean lines right away. Dr Bawden is almost certain to advise you to have nothing to do with any such proceeding”. Holmes [15] pressed the need for nomenclatural continuity, which, he claimed, would be provided by LBs. Andrewes [2] mostly criticized the details of the Holmes/Zhdanov groupings of animal viruses, but Bawden [5] first explained how important it is for a classification to have a clear purpose, citing the different classifications of plants by botanists or farmers:

“The main concern of the farmer or gardener is not whether a plant is graminaceous or cruciferous, but whether it is a useful plant for him to grow or a pernicious weed. To the botanist, couch grass may be a near relative of wheat, and charlock of turnips, but to the farmer one of each pair means a profit and the other a loss, two categories that, to his mind, could not be more unrelated.”

Bawden then discussed the characters available at that time for distinguishing between different viruses and concluded that they were inadequate to support a system of LB names because they did not reveal the phylogenetic relationships of viruses:

“The fact that we cannot group our ‘collective species’ by inferred phylogeny is one of the reasons that makes me strongly oppose the use of Linnaean binomial names for plant viruses. These names not only demand identification at the species level, which I hope I have shown can be done, but the arrangement of species into genera, and the word genus to a modern taxonomist suggests a group of phylogenetically related species that is clearly separated from other genera.”

Thus Bawden linked the use of Linnaean binomial names firmly with biological (phylogenetic) relationships at both the species and the genus level.

The original attempts to classify viruses were based on details of their host ranges and symptoms, but advances in biochemistry provided increasing amounts of information on the composition of virions. Using this information, Cooper [7] classified the known viruses of animals according to whether their genome was RNA or DNA, whether or not there was lipid in the virions, and the size of their virions. Lwoff, Horne and Tournier [19] expanded this classification to apply to viruses from all types of host by including the structure of the virions, which they described as either “cubical” or “helical”. Lwoff presented the LHT system to a Cold Spring Harbor Symposium in 1962 and stated that their classification was “rational” and “an attempt at a coherent classification based on essential characters” but not “a natural phylogenetic classification”. The Symposium synopsis (http://library.cshl.edu/symposia/1962/index.html) records that Peter Wildy, who was present, “responded rather strongly to what he saw as an arbitrary scheme that was not meaningful”. When he subsequently reported on the Symposium to a meeting of microbiologists in the U.K., he summarized his views using a figure from Edward Lear’s ‘Nonsense Botany’ (1871):

[Figure a: illustration from Edward Lear’s ‘Nonsense Botany’ (1871)]

Bawden and Wildy were of the same opinion: both considered the adoption of an LB system to be premature unless the viruses could be grouped in a ‘biological’ or ‘natural’ classification, rather than in a system based on arbitrarily chosen characters.

The Cooper and LHT classifications were further developed by Baltimore [3], who separated all viruses into seven non-hierarchical categories based on the type and strandedness of their genomes, and the way in which their mRNAs were produced. There was again no evidence that this classification was biological (i.e. phylogenetic), but it is simple to understand and, perhaps as a result, continues to be widely taught.

In the latter half of the 20th century the amount of taxonomically informative data for viruses increased greatly as viruses were often chosen for research into new techniques during the early decades of development of molecular biology. More recently, methods for, first, protein sequencing and then nucleotide sequencing were developed, and these quickly confirmed that most of the virus groupings devised using phenotypic data [21] were congruent with the phylogenetic relationships calculated from their gene sequences [26], so they were, in essence, the basis for a biological classification.

Further advances in sequencing methods resulted in the discovery that virus-like gene sequences (metagenomes) could be obtained from a wide range of living materials; the extant virosphere was found to be very much larger than expected, and furthermore, most of the metagenomes were found to be from outside the phylogenetic boundaries of known virus groupings. The ICTV responded very promptly to this exciting discovery, and a workshop of “invited experts in the field of virus discovery and environmental surveillance, and members of the ICTV Executive Committee” was held in June 2016 to discuss how best to incorporate metagenomic sequences into the official taxonomy of the ICTV. Proposals were developed during the workshop, presented and approved at a meeting of the ICTV-EC just two months later, and reported to the world as a ‘Consensus Statement’ of the ICTV-EC [27]! The experts had decided to accept virus-like metagenomes as being those of viruses, and to incorporate them into the ICTV Taxonomy, even though nothing was known of their phenotypic properties; in doing so they degraded the precision of meaning of the word “virus”. As a consequence, two major changes were made to the content and appearance of the ICTV Taxonomy. First, the available ranks in its hierarchy were increased from five to 15 [24], now extending from Species to Realm, and secondly, a start was made on including metagenomes in the ICTV Taxonomy [29]. Why this was considered useful was not revealed, as those working with metagenomes only require the existing vernacular locality-based names for their work.

So, now the ICTV Taxonomy has realms at the base of its classification, and the realms are similar to the original LHT/Baltimore categories. The details of the first realm, the Riboviria, have been published on the ICTV website, and its name has already become widely used on the Internet. Walker et al. [29] stated that:

“Perhaps the most notable taxonomic change approved in this ratification is the establishment of the realm Riboviria, a likely monophyletic clade of viruses with positive strand, double-strand or negative-strand genomic RNA that use cognate RNA-directed RNA polymerases (RdRPs) for replication. The realm Riboviria is placed at the highest taxonomic rank permitted by the ICTV Code”.

Although the word “likely” is included, the announcement concludes by stating that:

“The evidence for monophyly of RNA viruses and for various clades within Riboviria has been accumulating over the years from phylogenetic analyses of the universal marker, RdRP, supplemented by comparison of additional molecular traits shared by subsets of RNA viruses”.

This may surprise those who have already read one or other of a large number of publications including, for example, Dolja and Koonin [8] who stated that:

“Comparison of the genome architectures of RNA viruses discovered by metagenomics and by traditional methods reveals an extent of gene module shuffling among diverse virus genomes that far exceeds the previous appreciation of this evolutionary phenomenon.”!

So are all the research papers stating that many, if not most, viruses with RNA genomes are polyphyletic in origin wrong? No. The authors of the ICTV-EC paper have made the fundamental mistake of mixing and confusing virus phylogenies and gene phylogenies. The twigs of the Riboviria tree mostly represent different species and genera of viruses, but at various points within the tree the branches become the taxonomy of their RdRp genes alone! The Riboviria is not a monophyly, but a chimaera (i.e. “something made up of parts of things that are very different from each other”; Cambridge English Dictionary). It is possible to argue, as one referee of this paper has, that the RdRp gene provides, for the Riboviria, the equivalent of “the set of core genes that are considered to reproduce the replicating lineages of bacteria through time, even though they might acquire and discard various accessory modules related more to shorter term environmental adaptation.” (Anon). This is true, but rather than degrading the precision of the words “monophyly” and “polyphyly”, it would be better to devise a new word to describe this newly discovered evolutionary stratagem – perhaps “hyperphyly” is the word it needs!

The RdRp phylogeny used for the Riboviria is that of Wolf et al. [30], and its use implies that the ICTV accepts as proven that all RNA-dependent RNA polymerase (RdRp) genes are monophyletic and related with a particular topology. This result was obtained using rounds of heroic “semi-manual curation” with “the boundaries of the RdRp domain expanded or trimmed to improve their compatibility with each other”, resulting in aligned sequences with, for example, up to 89% indels in branch 3 (Y.I. Wolf; personal communication)! It should be noted that an early and more direct phylogenetic analysis of the RdRps [31] found “no support for the common ancestry of RNA-dependent RNA polymerases and reverse transcriptases” then known. Even if the monophyly of the RdRps is confirmed, it may be difficult to distinguish whether it has resulted from divergent evolution from a single ancestor or convergent evolution from more than one ancestor [6, 16, 17]; there are probably few molecular structures able to fulfill the key roles of an RdRp, and convergence by selection is known to be potent (e.g. [13]).

So what should the ICTV-EC decide apropos Latinized binomials? First, as advised by Fred Bawden all those years ago, they should decide the purpose of the ICTV Taxonomy. It is my opinion that it has become so complex that only specialists understand how it was derived and hence what it can tell them, whereas its primary role, like that of the other international biological codes, is to help the broader community, not just specialist viral phylogeneticists. It would perhaps have been better to separate the two roles by maintaining data on the basic taxonomy of viruses, perhaps to the level of the ICTV Virus Taxonomy Profiles (https://www.microbiologyresearch.org/content/ictv-virus-taxonomy-profiles), separately, for the present, from attempts to best present the ‘black box’ of viruses past; there must be many better ways than 2D spreadsheets! The ICTV Taxonomy should be a simple biological classification of viruses, aligned, where possible, with the other biological codes, so that users can easily move between them. This could be accomplished by several important and concurrent changes.

  1.

    The ICTV Taxonomy should be solely and completely ‘biological’ and ‘phylogenetic’ at all levels, not just at the species level as at present. It would then qualify for the use of Latinized binomials for approved species. The classification of viral genes, including metagenomes, is a separate, intensely interesting topic, but at the moment it is very much in its early research phase.

  2.

    The ICTV Taxonomy should be a taxonomy of viruses, not virions, not gene sequences, but viruses. The ICTV should promote the view that most viruses are sub-cellular organisms with a two-part life cycle (i.e. virions and virus-infected host cells) as proposed by Forterre [10], not just virions or metagenomes.

  3.

    The definition of virus species must be clarified, and this is most easily accomplished by aligning it with the logical principles used by the other biological codes. The virus species is currently defined as “a monophyletic group of viruses whose properties can be distinguished from those of other species by multiple criteria”. This definition helps no one, even though it is accompanied in the Code by a long explanation of the characters that may be useful for distinguishing one species from others. Instead of trying (and failing) to define the characters that may be used to define a species, it is much simpler (and in line with the other biological codes) for the definition to state how, in practice, species have been defined and why. Thus, minimally, a virus species is “an isolate or group of virus isolates that is considered by experts using multiple criteria to be so distinct that it is/they are most conveniently known by a single name” [12]. There is no need to enumerate those “multiple criteria”; the reader only needs to know that it has been done by experts who have used the most appropriate criteria, which differ for different groups of viruses. However, the ICTV should, like the other biological nomenclatural codes, use a system of ‘types’ [11] where “a type is a particular specimen…of an organism…to which the scientific name of that organism is formally attached” (https://en.wikipedia.org/wiki/Type_(biology) – 15 June 2019). The codes of cellular organisms nominate a single type for each species, and each name is permanently attached to its type. Thus the species is the group of individual isolates that are so similar to the type that they are most conveniently given the same name (NB for pairwise comparisons, one of the pair is always the type). The nomenclatural codes of cellular organisms have traditionally used dried specimens in museums, viable cultures in collections, etc. as their types.

    Existential types are now no longer needed as genomic sequences provide ideal surrogates for types; each genomic sequence provides a ‘datum’ for each virus in evolutionary space and time to which a name can be attached. The ICTV now lists the Accession Codes of around 90% of the “Exemplar Isolates” in its full ICTV Reports (https://talk.ictvonline.org/ictv-reports/ictv_online_report/). To help all users, the Accession Codes of Exemplar Isolates should be formally adopted in the Virus Code to define the types of approved virus species, and only viruses with a genomic type should be recognised as species and given an ICTV-approved name. Thus, a renovated Virus Code should state, in full, that “a virus species is an isolate or a group of virus isolates that is considered by experts using multiple criteria to be so distinct that they are most conveniently known by a single name, and with one isolate, for which there is a complete genomic sequence, specified as its ‘genomic type’”. Each virus species, like the species of cellular organisms, would cease to be merely a ‘construct of the human mind’ and would instead be a group of isolates, related by common ancestry, that are so similar that the experts who study them know them by a single name. The genomic type would ensure that one of those individuals, once upon a time, had a genomic sequence that is now stored in the GenBank database – its parent virus population will move on, but its type is stored for ever, we trust, in GenBank.

  4.

    Users. The ICTV needs to be more active in its interaction with all users, even though all understand that the ICTV depends on much selfless pro bono work. Its website is a rather opaque collection of documents, and needs to offer the most useful ICTV products in more accessible formats. These should include the ICTV Taxonomy and the ICTV Profiles. Working virologists most often require access to the approved name of “their current virus” to embellish the Introduction to the next report/paper. Students will perhaps be more interested in the Virus Code, which should be revamped with clearly stated principles so that anyone with a grounding in science, especially biology, can understand it and thereby understand the ICTV Taxonomy. The ICTV should also use social media for announcements and discussions, so that the occasional ICTV ballots might have more validity.
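
The type-based definition proposed in item 3 above can be illustrated with a minimal sketch in Python. It is an illustration only, not a method taken from the ICTV Code or from any cited work: the class names, the placeholder accession strings, the toy aligned sequences and the identity cutoff are all assumptions made for the example. The experts' “multiple criteria” are represented by a pluggable function, because those criteria differ between groups of viruses; sequence identity stands in for them here only because it is easy to compute.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Isolate:
    name: str
    accession: str  # placeholder string, not a real GenBank accession
    genome: str     # toy, pre-aligned sequence (assumption for this sketch)


# Stand-in for the experts' "multiple criteria"; the real criteria
# differ between virus groups and are not enumerated in the text.
Criterion = Callable[[Isolate, Isolate], bool]


def identity(a: str, b: str) -> float:
    """Fraction of matching positions in two aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similar_genomes(cutoff: float) -> Criterion:
    """Build one hypothetical criterion: identity to the type >= cutoff."""
    def criterion(type_isolate: Isolate, candidate: Isolate) -> bool:
        return identity(type_isolate.genome, candidate.genome) >= cutoff
    return criterion


@dataclass(frozen=True)
class Species:
    name: str               # the name is attached to the genomic type
    genomic_type: Isolate   # the isolate with a complete genomic sequence
    criterion: Criterion

    def includes(self, candidate: Isolate) -> bool:
        # Every pairwise comparison has the type as one member of the pair.
        return self.criterion(self.genomic_type, candidate)
```

A species built this way is anchored to one stored sequence: deciding whether a new isolate belongs to it never compares two non-type isolates with each other, which is the point of the note that “one of the pair is always the type”.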

If such changes are made:

  1.

    The ICTV will be seen to support the long-term convention that LBs are only used for biological classifications;

  2.

    Non-specialists will benefit and, when using the ICTV website, will not be fooled by a ‘pseudophyly’ into believing that the evolution of viruses has followed the same ‘tree of life’ plan as cellular organisms, but will discover that the pattern of evolution of viruses and of simple cellular organisms is profoundly different from that of other cellular organisms, with completely different ratios of horizontal and vertical gene flow;

  3.

    The use of names of cellular organisms will benefit from exposure to the ICTV Code with its stricter separation of formal LBs and vernacular names;

  4.

    The ICTV website must help promote the exciting opportunities for research revealed by metagenomes, not only using the latest mega-data computational techniques [e.g. 1, 9, 18, 22, 23, 26, 28] to explore the ‘black hole’ of virus pre-history, but also as clues for finding and describing the myriad of undescribed viruses that have provided metagenomes.