A proposal for a common nomenclature for viral clades that form the species varicella-zoster virus: summary of VZV Nomenclature Meeting 2008, Barts and the London School of Medicine and Dentistry, 24–25 July 2008

Varicella-zoster virus (VZV), the cause of chickenpox and zoster, was the first human herpesvirus to be sequenced fully and the first for which vaccines have been licensed and widely used. Three groups have published genotyping schemes based on single nucleotide polymorphisms (SNPs) and, between them, have identified five distinct phylogenetic clades, with an additional two putative clades. Sequencing of over 23 whole VZV genomes from around the world further refined the phylogenetic distinctions between SNP genotypes. Widespread surveillance in countries in which the varicella vaccine is now in use and the difficulties posed by three unique genotyping approaches prompted an international meeting, at which a common nomenclature based on phylogenetic clades was agreed upon. In this paper, we review the original genotyping schemes and discuss the basis for a novel common nomenclature for VZV strains. We propose a minimum set of SNPs that we recommend should be used to genotype these viruses. Finally, we suggest criteria by which novel clades can be recognized.


Introduction
Varicella-zoster virus (VZV; Human herpesvirus 3) is a member of the genus Varicellovirus, subfamily Alphaherpesvirinae. VZV exclusively infects humans, causing two diseases: chickenpox (varicella) following primary infection and shingles (herpes zoster), which results from reactivation of latently persistent virus (Hope-Simpson, 1965). VZV is related closely to other members of the genus Varicellovirus, including suid herpesvirus 1 (which infects pigs) and equine herpesviruses 1, 3, 4, 8 and 9, which infect horses (McGeoch & Cook, 1994). The most closely related human herpesviruses are herpes simplex virus 1 and 2. Varicella, although a relatively benign childhood infection in temperate countries, is nonetheless a cause of medically and economically important morbidity and mortality (Meyer et al., 2000; Preblud, 1986). In most countries, between 2 and 4 % of children are hospitalized for complications of chickenpox, including secondary bacterial infection, pneumonia and encephalitis (Sengupta et al., 2008). Mortality in children is around 0-0.05 deaths per 100 000 cases of chickenpox, primarily among children who were previously healthy (Sengupta et al., 2008). Adult infection, which is more common in tropical countries, results in a 6-fold increase in hospitalizations and up to a 20-fold increase in deaths (Marin et al., 2008; Rawson et al., 2001). Herpes zoster is a major cause of morbidity in the elderly from persistent neuropathic pain (Dworkin et al., 2007) and mortality in the immunocompromised, particularly from dissemination of the acute rash. The medical and economic burden of VZV disease has led to the introduction of childhood vaccination programmes against varicella in Japan, North America, parts of Europe, other Asian countries and the Antipodes (Bonanni et al., 2009). More recently, a vaccine to prevent herpes zoster has been licensed and is being administered to older people in the USA (Oxman et al., 2005).
The VZV genome was one of the first human herpesvirus genomes to be sequenced fully; this was done at the MRC Virology Unit, Glasgow, UK (Davison & Scott, 1986). The genome of the Dumas strain, which was originally isolated from a patient with varicella in Holland, is approximately 125 kb in length. Seventy-two open reading frames (ORFs) numbered 0-71 have been identified, of which 69 are unique and three are duplicated (Fig. 1). Like other herpesviruses, VZV has unique long and short regions and two inverted-repeat regions, TR and IR, within which lie the duplicated genes (Fig. 1). VZV genomes differ in size by up to 500 bases and the differences are due to variation in the length of five tandem-repeat regions, R1-R5, and the viral origin of replication OriS (Tyler et al., 2007). R1, R2 and R3 lie within ORFs 11, 14 and 22, respectively, and are included in the coding regions of those genes (Davison & Scott, 1986). R4, R5 and the origin of replication OriS lie in non-coding regions. R4 and OriS are duplicated, occurring between ORFs 62 and 63 and ORFs 70 and 71. In recent years, 23 full-length sequences of 18 VZV genomes (Table 1) have been published, making the VZV genome the most-studied of the human herpesvirus genomes to date. The second virus to be sequenced fully was the parental vaccine strain pOka, as well as three preparations of the vaccine strain vOka (Gomi et al., 2002). Two groups, one at the CDC and the other a collaboration between scientists at the University of Iowa and the Canadian National Public Health Laboratory, have published analyses of two and 11 novel genomes, respectively (Table 1) (Grose et al., 2004; Norberg et al., 2006; Peters et al., 2006).
If the variable regions are excluded, there are between 30 and 200 single nucleotide polymorphism (SNP) differences between any two genomes, i.e. between 1 in 4000 and 1 in 600 substitutions (Tyler et al., 2007). Approximately 30 % of these result in amino acid changes, and substitutions are not distributed evenly. If mutations are standardized per 100 bases, ORFs 1 and 62 are the most variable, whereas ORFs 3, 25 and 49 had no substitutions in 11 genomes (Tyler et al., 2007). Considering the genome in its double-stranded, linear form, the most variation occurs in ORFs sited at either end of the genome, i.e. ORFs 1-3 and 57-68, as well as in ORFs 35-37 (Tyler et al., 2007). The numbers of substitutions observed in the unique short region appear to be higher than in the unique long region, suggesting a different evolutionary rate (McGeoch, 2009).
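As a rough check on the figures above, the following minimal Python sketch converts a count of SNP differences into an approximate "1 in N bases" frequency. It assumes a genome length of 125 000 bases, taken from the approximate Dumas genome size, and does not subtract the variable regions, so the results are indicative only. The function name bases_per_snp is introduced here for illustration.

```python
# Approximate genome length taken from the text ("approximately 125 kb").
# Assumption: the variable repeat regions are NOT subtracted, so the
# result is a slight overestimate of bases per SNP.
GENOME_LENGTH_BP = 125_000


def bases_per_snp(snp_count: int, genome_length: int = GENOME_LENGTH_BP) -> float:
    """Return the average number of bases per SNP difference."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return genome_length / snp_count


low = bases_per_snp(30)    # fewest differences reported between two genomes
high = bases_per_snp(200)  # most differences reported between two genomes
print(f"30 SNPs:  about 1 in {low:.0f} bases")
print(f"200 SNPs: about 1 in {high:.0f} bases")
```

Under these assumptions the two extremes come out at roughly 1 in 4200 and 1 in 625 bases, in line with the rounded values of 1 in 4000 and 1 in 600 quoted above.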
Although some VZV genotyping by restriction fragment length polymorphism (RFLP) analysis of the variable region was carried out in the 1980s and 1990s, it was the development of SNP-typing tools that led to a better understanding of VZV molecular epidemiology and evolution. From 2000 onwards, SNP-based methodologies for genotyping were published by two USA-based and one UK-based groups (Barrett-Muir et al., 2001; Faga et al., 2001; Loparev et al., 2004; Muir et al., 2002). SNPs are sites that contain single base-pair variations (Collins, 1997). These sites are generally considered biallelic, due to the limited number of transversions that have been found (Brookes, 1999; Kruglyak, 1997). Each allele must occur in >1 % of the population to be considered an SNP (Kruglyak, 1997; Wang et al., 1998). Individuals who share many of the same SNPs are likely to have arisen from a common ancestor and can therefore be grouped by inheritance (Collins, 1997; Faga et al., 2001).
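The frequency criterion described above can be illustrated with a short Python sketch. It takes the bases observed at one genome position across a set of strains and reports whether at least two alleles each exceed the 1 % threshold. The function names and the input format (one character per strain) are assumptions made for illustration, not part of any published genotyping scheme.

```python
from collections import Counter


def common_alleles(bases: str, min_freq: float = 0.01) -> dict[str, float]:
    """Return the alleles at one position whose frequency exceeds min_freq.

    `bases` holds one character per strain, e.g. "CCCTC" (hypothetical input).
    """
    counts = Counter(bases.upper())
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items() if n / total > min_freq}


def is_snp(bases: str, min_freq: float = 0.01) -> bool:
    """Treat a site as an SNP if at least two alleles each exceed min_freq."""
    return len(common_alleles(bases, min_freq)) >= 2


# Hypothetical columns: a 5 % minor allele qualifies, a 0.1 % one does not.
print(is_snp("C" * 95 + "T" * 5))
print(is_snp("C" * 999 + "T"))
```

With these made-up inputs the first site qualifies and the second does not, because the rare allele falls below the 1 % threshold.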
The three SNP-based genotyping methodologies differ in their approaches. The scattered SNP method, developed by the Breuer group in the UK, used heteroduplex-mobility assays to identify 92 polymorphisms in 37 ORFs spread evenly across the VZV genome (Barrett-Muir et al., 2001; Muir et al., 2002). This research group identified three distinct phylogenetic genotypes circulating in the UK and a fourth in Japan. The Iowa group, led by Grose, sequenced five glycoprotein genes and the ORF encoding the major immediate-early transactivating protein, IE62. Sixty-one polymorphisms, 21 in the glycoproteins and 40 in ORF 62, were identified (Faga et al., 2001). Phylogenetic analysis of these identified four clades (Peters et al., 2006). The third group, led by Schmid at the CDC, located a 447 bp fragment in ORF 22 upstream of the C-terminal coding region that was sufficiently polymorphic to allow identification of four viral clades (Table 2) (Loparev et al., 2004). Between them, the three groups identified five distinct clades, although no one group identified all five. The different nomenclatures used by each group are shown in Table 2 and the SNPs used for genotyping in Table 3.

Whole-genome sequencing
The phylogeny of VZV was further informed by whole-genome sequencing carried out by US-Swedish and US-Canadian collaborations (Norberg et al., 2006; Peters et al., 2006). Until this time, seven full-length sequences were available: the Dumas strain, the Biken, Merck and GSK strains of the Oka vaccine strain, parental Oka, and MSP and BC, in which the identical amino acid substitution D150N had been reported in glycoprotein E (Davison & Scott, 1986; Gomi et al., 2002; Grose et al., 2004). The US (Iowa)-Canadian collaboration reported an additional 11 strains, whilst the US-Swedish collaboration reported partial sequence from an additional two strains. In the latter, repeat and tandem-repeat regions, including the R1-R5 and OriS regions, were excluded (Norberg et al., 2006). In the first study, all 11 additional strains sequenced were collected from North America, whilst in the US-Swedish study, one strain originated from Morocco and the second from the USA (Norberg et al., 2006; Peters et al., 2006). At the same time, the Iowa-Canadian group carried out the definitive investigation of virus stability. Repeated passage of a virus in cell culture followed by whole-genome sequencing showed that, excluding unstable repeat regions, one substitution (8×10⁻⁶ substitutions per base) occurred following 20 passages and 28 substitutions (2.2×10⁻⁴ per base) after 72 passages. This suggests that, above 20 passages, the virus becomes unrepresentative of the original (Tyler et al., 2007). Table 1 shows the common clade nomenclature agreed upon at a meeting held in 2008, alongside each of the others; this will be used from this point forward. The power of full-length sequence to establish evolutionary history and shed light on pathogenesis was exemplified by work on the MSP and BC strains of VZV. In 2000, the Grose laboratory described the MSP virus, in which a mutation in a glycoprotein E (gE) B-cell epitope abrogated binding of an anti-gE antibody (Santos et al., 2000).
Two years later, the same gE amino acid mutation was described by the Tipples group in a Canadian VZV community isolate, strain BC (Tipples et al., 2002). As the VZV genome is extremely stable, it was initially unclear whether the mutations in BC and MSP had arisen independently. The Iowa-Canadian collaboration hypothesized that MSP and BC might indeed represent a distinct subclade that was now spreading independently in North America. Sequencing of full-length genomes established that the viruses, although both belonging to clade 1, were not related more closely to each other than they were to other viruses within the same clade (Grose et al., 2004). Thus, the substitution in gE was a homoplasty, i.e. an identical mutation that had arisen in an evolutionarily independent manner in two unrelated viruses, most probably to escape neutralizing antibody. Subsequently, in an analysis of 634 VZV isolates collected at the Karolinska University Hospital, Sweden, two additional similar gE mutant viruses were discovered (Wirgart et al., 2006).
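The per-base figures for the cell-culture passage experiment described above (one substitution after 20 passages and 28 after 72 passages) can be reproduced with a few lines of Python. This is a sketch only: it assumes a genome length of 125 000 bases, based on the approximate Dumas genome size, whereas the original calculation excluded the unstable repeat regions.

```python
# Assumption: approximate genome length from the text (about 125 kb);
# the repeat regions excluded in the original analysis are not subtracted.
GENOME_LENGTH_BP = 125_000


def substitutions_per_base(substitutions: int,
                           genome_length: int = GENOME_LENGTH_BP) -> float:
    """Return the number of substitutions per base of genome."""
    return substitutions / genome_length


rate_20 = substitutions_per_base(1)    # one substitution after 20 passages
rate_72 = substitutions_per_base(28)   # 28 substitutions after 72 passages
print(f"after 20 passages: {rate_20:.1e} substitutions per base")
print(f"after 72 passages: {rate_72:.1e} substitutions per base")
```

Under this assumption the values come out near 8×10⁻⁶ and 2.2×10⁻⁴ substitutions per base, respectively.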
Early on, opportunistic genotyping by many groups identified differences in the global distribution of the five clades (Loparev et al., 2004; Muir et al., 2002; Wagenaar et al., 2003). Two clades (1 and 3) were found in European countries and areas of the world predominantly settled by Europeans, whilst clade 2 was identified as being predominantly present in Japan and surrounding countries (Loparev et al., 2004; Muir et al., 2002; Wagenaar et al., 2003). Because the varicella vaccine was developed in Japan, the majority of sequenced clade 2 strains originated from Japan. The two remaining clades, 4 and 5, are most prevalent in Asian and African countries (Loparev et al., 2004, 2007b; Muir et al., 2002; Wagenaar et al., 2003). The results obtained from genotyping of opportunistically collected samples were confirmed by analysis of zoster samples collected prospectively from European subjects living in the UK, the oldest of whom was over 100 years of age (Sengupta et al., 2007). Making the assumption that the virus reactivating as zoster and the virus responsible for primary infection were the same strain, it was confirmed that the European clades 1 and 3 were present in 90 % of Caucasians living in the UK and that this prevalence had not changed over the past 100 years. Reassuringly, this validated the conclusions about geographical distribution of strains and supported the value of opportunistic genotyping. These findings have enabled a number of groups to observe an increased circulation of clade 5 viruses in recent years (Hawrami et al., 1997; Quinlivan et al., 2002; Sauerbrei & Wutzler, 2007).
Following the initial publications, the CDC group added SNPs from ORFs 21 and 50 to those used originally in the ORF 22 genotyping scheme (Table 2) (Loparev et al., 2007b; Sergeev et al., 2006). Using these, they have suggested that, in addition to the five existing phylogenetic clades, there may be two additional stable clades, which they designated M3 and M4 (Loparev et al., 2007b, 2009; Sauerbrei et al., 2008; Sergeev et al., 2006) (Table 2; Fig. 2). M4 is negative for a BglI site in ORF 54, which is a characteristic feature of the 'European origin' clades 1 and 3 (Quinlivan & Breuer, 2006; Sengupta et al., 2007) and has in fact been shown by the CDC group to resemble European clades 1 and 3 (Loparev et al., 2009). It is possible that they may be the same as the viruses described as clade 1 and 3 'recombinants' by the UK group. However, extensive SNP typing has shown M4 to have unique SNPs that are not shared with any other clade. The data suggest that, once full-length sequence has been obtained, the status of M4 as an independent clade is likely to be confirmed (Loparev et al., 2009). The status of M3 is less certain. The virus is positive for the ORF54 BglI site, in which respect it resembles the clade 2, 4 and 5 viruses that predominate in Japan, Asia and Africa, respectively. M3 has been described only a few times and more sequencing data are needed to confirm its existence as an independent clade (Loparev et al., 2007a; Sergeev et al., 2006).
Table 3 (fragment; column headings and earlier rows not recovered). ND, not determined.
ORF   Position   Alleles
50    87841      C  T  T  T  T  T  ND
54    95241      T  C  T  C  C  T  ND
55    98437      T  C  T  C  T  C  ND
56    98825      T  T  T  C  C  C  ND
60    101464     C  C  C  C  A  C  ND
66    113243     A  A  C  C  A  C  ND
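The way a single marker narrows, but does not settle, clade assignment can be sketched in Python using the ORF 54 BglI site discussed above. The groupings are taken from the text; the function name and the boolean input are hypothetical, and a real assignment would need the further SNPs listed in Table 3.

```python
# Groupings as described in the text: clades 1 and 3 and putative clade M4
# lack the ORF 54 BglI site; clades 2, 4 and 5 and putative clade M3 carry it.
BGLI_NEGATIVE = frozenset({"1", "3", "M4"})
BGLI_POSITIVE = frozenset({"2", "4", "5", "M3"})


def candidate_clades(bgli_site_present: bool) -> frozenset[str]:
    """Return the clades compatible with an ORF 54 BglI result.

    Hypothetical helper: one marker only narrows the candidates and
    cannot assign a strain to a single clade on its own.
    """
    return BGLI_POSITIVE if bgli_site_present else BGLI_NEGATIVE


print(sorted(candidate_clades(False)))
print(sorted(candidate_clades(True)))
```

A BglI-negative result therefore leaves three candidates (1, 3 and M4), which is why additional SNPs are needed to separate M4 from the European clades.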

The 2008 VZV nomenclature
To resolve the confusion associated with three different genotyping classifications, a meeting (see attendance list in Acknowledgements) to discuss a common nomenclature was held in London, UK, in 2008. Four principles were agreed upon: that the new nomenclature should be distinct from the three established nomenclatures, that it should reflect the phylogenetic structure of VZV strains, that it should reflect the order in which full-genome sequences were published or deposited in the databases, and that it should describe the provenance of the strain. To that end, the proposed nomenclature for VZV and the prototypic viruses for each clade are shown in Table 2 and Figs 2 and 3. A summary of the proposed nomenclature overall is shown in Fig. 2. We have used the word 'clade' to designate an evolutionarily distinct subspecies of VZV. The intention is that each clade should have at least two whole genomes. Clades 2 and 5 have only one genome and further work to generate a second prototype for each clade is ongoing. Several principles for naming strains have been agreed and are listed below.
1. Strains of VZV will now be identified by the disease (varicella, zoster or latent).
2. Novel viruses will be designated a member of an existing clade, based on phylogenetic clustering or SNP profile as outlined in Table 3.
3. Strains will be identified as cultured isolates (i) or sequence (s).
4. The geographical and temporal origin of the virus will be identified.
5. Putatively novel clades will be identified by Roman numerals until confirmed as a clade.
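The naming principles above can be pictured as a small record type. The Python sketch below is illustrative only: the class name, field names and validation rules are assumptions, and no particular written format for a strain name is implied. It treats confirmed clades as Arabic numerals, following the usage for clades 1-5 in the text, and putative clades as Roman numerals, following principle 5.

```python
import re
from dataclasses import dataclass

DISEASES = {"varicella", "zoster", "latent"}               # principle 1
MATERIAL = {"i": "cultured isolate", "s": "sequence"}      # principle 3
ARABIC = re.compile(r"[1-9][0-9]*")   # confirmed clades, e.g. "1" to "5"
ROMAN = re.compile(r"[IVXLCDM]+")     # putative clades (principle 5)


@dataclass(frozen=True)
class StrainRecord:
    """Hypothetical container for the descriptors named in the principles."""
    disease: str
    clade: str
    material: str            # "i" or "s"
    country: str             # geographical origin (principle 4)
    year: int                # temporal origin (principle 4)
    clade_confirmed: bool = True

    def __post_init__(self) -> None:
        if self.disease not in DISEASES:
            raise ValueError(f"unknown disease: {self.disease!r}")
        if self.material not in MATERIAL:
            raise ValueError(f"material must be 'i' or 's': {self.material!r}")
        pattern = ARABIC if self.clade_confirmed else ROMAN
        if not pattern.fullmatch(self.clade):
            raise ValueError(f"clade label {self.clade!r} does not match its status")


# Made-up example values, for illustration only.
record = StrainRecord("zoster", "VI", "s", "CountryX", 2008, clade_confirmed=False)
print(record)
```

Keeping the confirmation status as a separate field means a putative clade can later be promoted without changing the other descriptors.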
Some of the laboratories that can provide support for genome sequencing are listed in Table 4.

Future studies
Vaccination programmes to prevent varicella in children are now established in many resource-rich countries, and surveillance of viral genotypes has been set up by some to support these efforts (Table 4). Reference genotyping of varicella is provided by several countries and the establishment of a common nomenclature will facilitate interchange of information. In particular, the success of the vaccine programmes in preventing wild-type varicella and of genotyping the viruses causing outbreaks has been, and will continue to be, central to ongoing implementation and improvement. Most recently, the emergence of wild-type vaccine recombinant viruses provides a challenge for which the unified nomenclature and common genotyping will be important, not only for surveillance but also for pathogenesis (D. S. Schmid, unpublished data). Of the three such recombinants analysed so far, two are vaccine-clade 1 and one is vaccine-clade 3. This discovery was not totally unexpected, as recombination between two different live vaccine viruses has been documented previously with pseudorabies virus, an alphaherpesvirus that infects pigs (Dangler et al., 1993). A common nomenclature will also allow better understanding of VZV genome biology, including the emergence of new strains following biological and immune selection across functionally important regions. Finally, there is still work to be done on defining relatedness between strains of the same clade for investigation of outbreaks. The stability of the VZV genome and low numbers of substitutions between strains, particularly within the same clade, preclude the use of SNPs for transmission studies. However, signature sequences that are unique or uncommon may be useful for establishing chains of transmission, as was the case in an outbreak of varicella in a residential care home (Lopez et al., 2008). An alternative approach is to make use of repeat regions within the VZV genome, which evolve at a faster rate.
Two such regions that have been used to investigate hospital transmissions are R1 in ORF 11 (Molyneaux et al., 2006; Tang et al., 2005) and OriS (J. Breuer, unpublished data). However, the precise parameters that will inform their use for investigation of outbreaks remain to be established. Ultimately, genome sequencing may prove to be the most useful tool for investigation of VZV transmission events.