Evolution of Cocirculating Varicella-Zoster Virus Genotypes during a Chickenpox Outbreak in Guinea-Bissau

ABSTRACT Varicella-zoster virus (VZV), a double-stranded DNA alphaherpesvirus, is associated with seasonal outbreaks of varicella in nonimmunized populations. Little is known about whether these outbreaks are associated with a single or multiple viral genotypes and whether new mutations rapidly accumulate during transmission. Here, we take advantage of a well-characterized population cohort in Guinea-Bissau and produce a unique set of 23 full-length genome sequences, collected over 7 months from eight households. Comparative sequence analysis reveals that four distinct genotypes cocirculated among the population, three of which were present during the first week of the outbreak, although no patients were coinfected, which indicates that exposure to infectious virus from multiple sources is common during VZV outbreaks. Transmission of VZV was associated with length polymorphisms in the R1 repeat region and the origin of DNA replication. In two cases, these were associated with the formation of distinct lineages and point to the possible coevolution of these loci, despite the lack of any known functional link in VZV or related herpesviruses. We show that these and all other sequenced clade 5 viruses possess a distinct R1 repeat motif that increases the acidity of an ORF11p protein domain and postulate that this has either arisen or been lost following divergence of the major clades. Thus, sequencing of whole VZV genomes collected during an outbreak has provided novel insights into VZV biology, transmission patterns, and (recent) natural history.


IMPORTANCE
VZV is a highly infectious virus and the causative agent of chickenpox and shingles, the latter being particularly associated with the risk of painful complications. Seasonal outbreaks of chickenpox are very common among young children, yet little is known about the dynamics of the virus during person-to-person transmission or whether multiple distinct viruses seed and/or cocirculate during an outbreak. In this study, we have sequenced chickenpox viruses from an outbreak in Guinea-Bissau that are supported by detailed epidemiological data. Our data show that multiple different virus strains seeded and were maintained throughout the 6-month outbreak period and that viruses transmitted between individuals accumulated new mutations in specific genomic regions. Of particular interest is the potential coevolution of two distinct parts of the genomes and our calculations of the rate of viral mutation, both of which increase our understanding of how VZV evolves over short periods of time in human populations.
Varicella-zoster virus (VZV; subfamily alphaherpesvirus) causes chickenpox (varicella), an infection mainly of childhood, and shingles (zoster), a painful dermatomal rash that follows reactivation of latent endogenous virus in sensory ganglia. The virus is transmitted in aerosols, resulting mainly from the rupture of fluid-filled skin blisters, which are characteristic of both chickenpox and shingles, but also from virus shed from the respiratory tract. Virus inhaled by a susceptible contact replicates in the nasopharynx, spreading thereafter to cause the centripetal rash characteristic of chickenpox. Like other airborne virus infections, chickenpox is epidemic. Immunity is generally lifelong, with outbreaks mainly affecting susceptible birth cohorts. In temperate countries such as the United Kingdom and the United States, VZV is estimated to infect 60 to 90% of close and household contacts and, by age 10, more than 90% of the population are immune (1,2). In contrast, VZV household infectivity in Guinea-Bissau, a tropical African country close to the equator, is closer to 16% (3). Unusually for a tropical climate, the mean age of chickenpox is similar to that of temperate countries, and this has been attributed to a higher population density which compensates for the reduced viral transmissibility (3). Possible explanations for the reduced infectivity of VZV in tropical countries include increased temperature, humidity, and UV light exposure, all of which have been shown in vitro to inactivate virus (4). However, prevalent viral genotypes circulating in Africa, India, and Sri Lanka differ from endogenous European genotypes, and this could also provide an explanation for different patterns of transmissibility (5). While more than 47 full-length VZV genomes have been sequenced to date (6–10), none are from viruses circulating in countries with low transmission rates.
Here, we have sequenced and assembled whole VZV genomes from 23 individuals over the course of a seasonal varicella outbreak in Guinea-Bissau. These viruses were collected from a well-characterized population cohort in the Bandim peri-urban area of Bissau, the capital of Guinea-Bissau, which has been studied for over 30 years as part of the Bandim Health Project (Statens Serum Institut, www.bandim.org) (3,11,12). Epidemiological data were obtained for an outbreak of 1,419 cases occurring over a 7-month period during 2001, while samples of vesicle fluid were obtained from a subset of these (~500). The data collected included house location, household structure (i.e., numbers of families and family members cohabiting), severity of disease, and the relationship of each infected person to the putative index patient who transmitted to them. While subgenomic regions (e.g., the origin of DNA replication [OriS]) undergo rapid changes during an epidemic (3), the extent to which new mutations accumulate across the whole genome during transmission is not known, and thus VZV transmission chains remain poorly characterized at the whole-genome level.
This study uses recently developed enrichment methods (13,14) which enable deep sequencing of pathogens directly from clinical samples and is the first study of herpesviruses that investigates the origins and subsequent evolution of viral genotypes during an outbreak and provides unique insights into the biology of these large double-stranded DNA viruses. Finally, we address the longstanding question of whether varicella strains circulating in tropical countries have specific genetic adaptations that result in reduced transmission rates.

MATERIALS AND METHODS
Sample collection and ethics. Ethical clearance for the study was obtained from the Ministry of Public Health in Guinea-Bissau and the East London and City Health Authority. Participation was voluntary. We selected eight putative transmission chains comprising two to four members of the same household collected during the start, middle, or end of the outbreak in 2001 (Table 1 and Fig. 1). Putative transmissions were defined by the occurrence of a household member being diagnosed with varicella within 7 to 21 days of another member of the same household and where typing of the OriS region showed a similar number of TA and GA repeats (±1) (3). We also selected three samples (Bandim 6, 7, and 18) where the OriS repeat structure differed from other infected persons of the same household in the same time period. Vesicular virus was obtained from a total of 24 subjects with varicella during the outbreak, placed in viral transport medium, and stored at −80°C.
DNA extraction, library construction, targeted enrichment, and sequencing. Total DNA was extracted from each sample by using a QiaAMP DNA minikit (Qiagen) according to the manufacturer's instructions. DNA quantification was performed with a NanoDrop spectrophotometer, and samples with 260/280 ratios outside the range 1.7 to 2.1 and 260/230 ratios outside the range 1.8 to 2.2 were further purified using a Zymoclean Genomic DNA Clean & Concentrator (Zymo Research Corp.). Whole-genome amplification using GenomiPhi V2 (GE Healthcare) was performed using 10 ng of starting material. Libraries were constructed in accordance with the standard SureSelect XT v1.5 protocols (Agilent). Enrichment for VZV sequences was performed as described previously (13,14). Sequence libraries were multiplexed (Bandim 1 to 12 and Bandim 13 to 24) and sequenced using 300-bp and 500-bp paired-end kits (respectively) on an Illumina MiSeq.
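The definition of a putative household transmission given under sample collection can be written as a small predicate. The following is a minimal sketch, not the authors' pipeline: the `Case` record and its field names are hypothetical, and it assumes that the ±1 tolerance applies to the TA and GA repeat counts separately.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Case:
    """One varicella case; all field names are hypothetical."""
    sample: str
    household: str
    diagnosed: date
    oris_ta: int  # number of TA repeat units in OriS
    oris_ga: int  # number of GA repeat units in OriS


def is_putative_transmission(index: Case, contact: Case) -> bool:
    """Apply the three criteria from the text to an index/contact pair."""
    same_household = index.household == contact.household
    # Contact diagnosed 7 to 21 days after the index case.
    gap_days = (contact.diagnosed - index.diagnosed).days
    timing_ok = 7 <= gap_days <= 21
    # Assumption: TA and GA counts must each agree to within one unit.
    repeats_ok = (abs(index.oris_ta - contact.oris_ta) <= 1
                  and abs(index.oris_ga - contact.oris_ga) <= 1)
    return same_household and timing_ok and repeats_ok
```

Pairs that satisfy the household and timing criteria but fail the repeat criterion correspond to the kind of sample the study also selected deliberately (Bandim 6, 7, and 18).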
Genome assembly and variant calling. Sequence data sets were demultiplexed using BaseSpace, and individual data sets were subsequently parsed through QUASR (15) for duplicate removal and read trimming (-q 30, -l 50) and then aligned against the VZV reference strain Dumas (NC_001348) using BWA (16). Resulting alignments were processed using SAMTools (17) to generate pileup files for each sample. A consensus sequence for each data set was called with the QUASR module "pileupConsensus" and a 50% frequency threshold (i.e., no ambiguities were included). Variant profiling for each data set was performed using VarScan v2.2.11 (18) with the following parameters: basecall quality, ≥20; read depth, ≥50; and independent reads supporting minor alleles, ≥2 per strand. In addition, variant calls showing a directional strand bias of ≥0.85 were excluded from further analyses. Consensus sequences were generated for each rash sample, but iterative repeat regions R1, R2, R3, R4, and R5 (19,20), as well as the terminal repeat region, were trimmed prior to tree-building analyses.
Consensus sequence analyses. DNA sequences were aligned by using the program Mafft, v6 (21), with alignments checked manually; no insertions or deletions were inferred from the alignment.
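The variant-calling thresholds listed under genome assembly and variant calling can be expressed as a single filter. This is an illustrative sketch only: it does not call VarScan, the `VariantCall` record and its field names are hypothetical, and strand bias is assumed here to mean the fraction of minor-allele reads that fall on the dominant strand, which the text does not define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Summary of one candidate minor variant; field names are hypothetical."""
    position: int
    base_quality: float  # basecall quality of the minor allele
    depth: int           # total read depth at the position
    alt_forward: int     # independent minor-allele reads, forward strand
    alt_reverse: int     # independent minor-allele reads, reverse strand


def strand_bias(call: VariantCall) -> float:
    """Fraction of minor-allele reads on the dominant strand (assumed definition)."""
    total = call.alt_forward + call.alt_reverse
    if total == 0:
        return 1.0
    return max(call.alt_forward, call.alt_reverse) / total


def passes_filters(call: VariantCall) -> bool:
    """Thresholds from the text: quality >=20, depth >=50, >=2 reads per strand,
    and exclusion of calls with strand bias >=0.85."""
    return (call.base_quality >= 20
            and call.depth >= 50
            and call.alt_forward >= 2
            and call.alt_reverse >= 2
            and strand_bias(call) < 0.85)
```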
Substitution rates. Estimates of substitution rates were inferred using the program Beast v1.7.5 (22). The Beast analyses were performed under an HKY+I model of nucleotide substitution (selected by jModeltest 2.1 [23, 24]), strict and relaxed clock models (strict clock, relaxed lognormal, and relaxed exponential), and a variety of tree coalescent priors (constant, Bayesian skyline, and exponential). A Bayes factor analysis suggested that there was not sufficient evidence to reject a model of constant population size. The Monte Carlo Markov chain was run for 50,000,000 iterations, with a thinning of 50,000. We checked for convergence by ensuring that all parameters had an effective sample size (ESS) of at least 200. Finally, we assessed the molecular clock model by looking at the coefficient of variation histogram. The program Path-O-gen (http://tree.bio.ed.ac.uk/software/pathogen), which regresses the root-to-tip distance against the sampling date, was used to assess the "clock-likeness" (the extent to which the sampling date is correlated with the total branch length) of the Guinea-Bissau VZV sequence data using neighbor-joined trees inferred by Mega v5.2 (25). To assess the level of the temporal structure (which measures the effect that the background mutation rate is having on the inferred substitution rate), we adopted the approach of Duffy and Holmes (43), whereby rate analyses were repeated but with the sampling dates randomly shuffled among the tips.
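The logic of the root-to-tip regression and of the date-shuffling check can be illustrated in a few lines. The sketch below is a simplification under stated assumptions: it replaces the Bayesian rate estimate with an ordinary least-squares slope, and the function names and any input values are hypothetical rather than taken from the study.

```python
import random


def slope(dates, distances):
    """Least-squares slope of root-to-tip distance on sampling date.

    With dates in years and distances in substitutions per site, the slope
    is a crude rate estimate in substitutions per site per year.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    return sxy / sxx


def shuffled_slopes(dates, distances, replicates=1000, seed=0):
    """Null distribution: re-estimate the slope with dates shuffled among tips."""
    rng = random.Random(seed)
    shuffled = list(dates)
    results = []
    for _ in range(replicates):
        rng.shuffle(shuffled)
        results.append(slope(shuffled, distances))
    return results


def fraction_at_least(observed, null_values):
    """Share of shuffled replicates whose slope reaches the observed slope."""
    return sum(v >= observed for v in null_values) / len(null_values)
```

If the observed slope sits well inside the spread of shuffled slopes, the data carry little temporal signal, which is the situation the shuffled-date analyses are designed to detect.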
Amplification of VZV reiteration regions. The VZV reiteration regions R1, R2, R4, and R5 were amplified and sequenced from Guinea-Bissau sample DNA using primers designed against the Dumas reference genome (NC_001348) as a template. All PCRs were carried out using Herculase II fusion DNA polymerase (Agilent Technologies) according to the manufacturer's instructions. The cycling conditions were as follows: denaturation at 95°C for 2 min, followed by 35 cycles of amplification (denaturation at 94°C for 20 s, annealing at 55 to 65°C for 20 s, and extension at 72°C for 30 s), and then a final extension step at 72°C for 3 min. PCR products were purified using a DNA Clean & Concentrator-5 kit (Zymo Research) according to the manufacturer's instructions. Products were Sanger sequenced in the forward and reverse directions.
GenBank accession numbers. Consensus sequences for all samples sequenced in the present study are available in GenBank under the following accession numbers: KM355696 to KM355718.

RESULTS
VZV DNA was successfully enriched from 23 patient samples (Table 1), with over 99.9% genome coverage by Illumina MiSeq sequencing (Table 2). All 23 Guinea-Bissau viruses clustered within clade 5, and all Guinea-Bissau viruses were more closely related to each other than to other clade 5 viruses, although no single mutation differentiated them from other clade 5 viruses (Fig. 2). Virus sequences segregated into two main genogroups, 5A and 5B, and these cocirculated throughout the epidemic, although no coinfected samples were found (Table 1 and Fig. 1 and 2). Most viruses within each genogroup were genetically nearly identical. However, three viruses in genogroup 5B (Bandim 6, 13, and 14) were more divergent than others compared to the consensus sequence for the genogroup. Bandim 6 differed from the consensus 5B sequence by 16 single-nucleotide polymorphisms (SNPs), and Bandim 13 and 14, although they were identical to each other, differed from the consensus 5B sequence by 11 SNPs (Fig. 3). The paired Bandim 13/14 viruses were recovered at the end of the outbreak and may therefore have evolved during epidemic spread. In contrast, Bandim 6 cocirculated with 5A and 5B viruses in the first month of the outbreak. Based on the timing of the second infections (within 7 to 21 days of the putative index case), we identified 15 potential household transmissions (Table 1 and Fig. 1). In nine of these, the SNP genotype of the transmitted virus sequence was identical to that of the index case, whereas in three they were identical apart from one (Bandim 1/2) or two (Bandim 11/12) SNP differences. However, for four putative household transmissions the viruses that appeared temporally linked were  (Fig. 4).
[Figure legend, truncated at start] Fig. 1) with asterisks denoting bootstrap support >0.9 (A) and a UPGMA (unweighted pair-group method with arithmetic averages) phylogeny (constructed from a distance matrix calculated from the repeat data where the variable is the number of repeat units) for the same sample collection using just the R1-R5 repeat region sequences (B) are shown. Repeat region patterns are identified by color and correspond to data shown in Table S1 in the supplemental material. (C) Phylogenetic network reveals multiple genotypes. Two primary genogroups are present in the Guinea-Bissau data set, while Bandim 6 and Bandim 13/14 are also considered different lineages. Nodes are colored according to the lineage (gray nodes are median joins), labeled according to the sample, and "sized" according to the number of identical genomes (not including variation in repeat regions). The numbers of SNP differences between consensus sequences are labeled along the branches (i.e., branch lengths are not scaled to number of changes).
In total, 44 SNP positions were identified within our sample set, 15 of which are nonsynonymous mutations (Fig. 4). The majority of open reading frames (ORFs) contained either no SNP (n = 45) or a single SNP (n = 14). Four viruses (Bandim 6, 13, 14, and 17) had synonymous mutations at positions in ORF62. Three viruses (Bandim 15, 16, and 17) also shared a synonymous mutation in ORF64. Using the sampling dates for all 23 samples, we calculated the substitution rate (excluding the repeat regions) to be 1.82 × 10⁻⁵ substitutions per site per year (Tables 3 and 4). Estimates of the substitution rate were also obtained using strict clock models (where a single substitution rate applies to all parts of a phylogenetic tree) and relaxed clock models (where substitution rates are allowed to differ across the phylogenetic tree).
Although these rates varied only slightly between strict and relaxed clock models, there was only weak evidence for temporal signal (a measure of the degree of clock-like [i.e., constant] evolution in the data) (Tables 3 and 4). This can be attributed to the cocirculation of multiple genotypes during the outbreak, and thus we also calculated a substitution rate separately for genogroup 5B, for which there was better evidence for temporal signal (Tables 3 and 4). No substitution rate could be calculated for genogroup 5A, since the genetic diversity between the samples was too low. Overall, these substitution rate calculations are higher than have previously been estimated for VZV (e.g., 3.8 × 10⁻⁶ substitutions per site per year [26]). To validate our estimate, the calculations were repeated after random shuffling of sampling dates between samples (27). Ideally, the mean evolutionary rate calculated from the true data should not coincide with the confidence intervals attached to the mean evolutionary rates calculated for any of the shuffled tips analyses (Fig. 4). However, some overlap was observed in our analyses, which suggests that the background mutation rate may be slightly inflating our estimate of the evolutionary rate. This is consistent with the short time scale over which the viruses in the present study were sampled and suggests that some of the variation observed between samples may be from deleterious mutations which are yet to reach fixation. To estimate the time of divergence of genogroups 5A and 5B, we used three different calculations of the substitution rate: the one derived here from all 23 samples (1.82 × 10⁻⁵ substitutions per site per year), one derived just from genogroup 5B samples (5.91 × 10⁻⁵ substitutions per site per year), and one obtained from a previous study by Firth et al. (3.8 × 10⁻⁶ substitutions per site per year [26]). The dates of divergence, estimated from all three rates, for genogroups 5A and 5B, as well as the Bandim 6 and Bandim 13/14 lineages, are shown in Table 4 and Fig. 5.
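To see how strongly the choice of substitution rate affects a divergence date, a back-of-envelope strict-clock calculation is useful. The sketch below is not the dated-tree analysis used in the study: it assumes that two lineages accumulate substitutions independently at the same rate, so that time is approximately d / (2 × rate × sites), and both the alignment length and the pairwise SNP count are hypothetical values chosen for illustration. Only the three rates are taken from the text.

```python
def divergence_years(snp_differences, rate_per_site_per_year, sites):
    """Years since two lineages split under a strict clock.

    Both lineages accumulate changes, hence the factor of 2.
    """
    substitutions_per_year = 2 * rate_per_site_per_year * sites
    return snp_differences / substitutions_per_year


# Rates quoted in the text (substitutions per site per year).
RATES = {
    "all 23 samples": 1.82e-5,
    "genogroup 5B only": 5.91e-5,
    "previous estimate (26)": 3.8e-6,
}

ASSUMED_SITES = 120_000  # hypothetical alignment length after trimming repeats
ASSUMED_SNPS = 28        # hypothetical pairwise SNP count, for illustration only

if __name__ == "__main__":
    for label, rate in RATES.items():
        years = divergence_years(ASSUMED_SNPS, rate, ASSUMED_SITES)
        print(f"{label}: about {years:.1f} years")
```

Because the time scales inversely with the rate, the slowest of the three rates gives a divergence date roughly 15 times older than the fastest, whatever SNP count is assumed.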
In all but three cases, which differed by a single repeat unit (either TA or GA), the OriS sequences obtained by Illumina sequencing matched those previously obtained by PCR and Sanger sequencing (3) (see Table S1 in the supplemental material). The OriS and R4 repeat regions were present as two identical copies in all viruses (i.e., did not differ within a single sample but could still differ between samples). The OriS is the only region for which recombination could be postulated with a single genogroup 5A virus (Bandim 18) having a genogroup 5B-like OriS repeat structure. We were also able to amplify and sequence (by Sanger methodology) four of the five VZV tandem repeat regions (R1, R2, R4, and R5). Insufficient DNA remained to amplify the R3 repeat region. The repeat structures of the R2, R4, R5, and OriS sequences were similar to published sequences (see Table S2 in the supplemental material). However, the Guinea-Bissau viruses possess a hexamer repeat motif, termed ε, present at the 3′ end of the R1 repeat which has previously only been observed in four other clade 5 viruses (Table 5) and is currently considered to be unique to clade 5 viruses (9,28). The hexamer repeat motif encodes a single aspartic and glutamic acid that serves to increase protein acidity at the C-terminal coiled/helical (end) region of the clade 5 R1 repeat.
Two R5 alleles were observed that segregated completely by genotype and did not vary throughout the outbreak (Fig. 5). Although the other repeat regions appeared more variable, some evidence of conservation within each lineage was evident for R1, R4, and OriS, although less so for R2 (Fig. 3; see Table S2 in the supplemental material). OriS was the most variable, although two main alleles, segregating with 5A (8xTA/14xGA) and 5B (5xTA/9xGA), were evident. Of the 11 samples that differed from the consensus for their genogroup, five (Bandim 7, 11, 13, 23, and 24) differed by a single dinucleotide, of which four (Bandim 7, 11, 23, and 24) were otherwise identical. A single dinucleotide difference in OriS can occur even when resequencing the same sample and so may be considered an artifact in most cases. The six cases (Bandim 6 and Bandim 14 to 18) that differed by more than one dinucleotide repeat from the genogroup consensus coincided with six of the seven viruses that also had changes in R1 structure (Fig. 5). The remaining virus with changes in the R1 region had a single dinucleotide change in OriS (Bandim 13). Together, these data are consistent with a pattern of lineage coevolution involving the R1 repeat region and the OriS (Fig. 5; see Table S1 in the supplemental material). In contrast, the R4 repeat region, which is noncoding and segregated according to lineage, does not vary or associate with the evolution of new lineages. Variation in the R2 repeat region located in ORF14, which codes for glycoprotein C, was lineage independent.
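The way OriS alleles are compared with the genogroup consensus above can be made explicit. The following sketch uses the two consensus alleles stated in the text (8xTA/14xGA for 5A and 5xTA/9xGA for 5B); the function names and the three category labels are hypothetical, and the deviation is assumed to be the summed difference in TA and GA repeat units.

```python
# Consensus OriS alleles from the text, as (TA units, GA units).
ORIS_CONSENSUS = {"5A": (8, 14), "5B": (5, 9)}


def oris_deviation(genogroup, ta, ga):
    """Total number of dinucleotide units by which a sample departs from consensus."""
    consensus_ta, consensus_ga = ORIS_CONSENSUS[genogroup]
    return abs(ta - consensus_ta) + abs(ga - consensus_ga)


def classify_oris(genogroup, ta, ga):
    """Bin a sample by its departure from the genogroup consensus."""
    deviation = oris_deviation(genogroup, ta, ga)
    if deviation == 0:
        return "consensus"
    if deviation == 1:
        # A single-unit difference can arise on resequencing the same sample.
        return "single-unit difference (possible artifact)"
    return "multi-unit difference"
```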

DISCUSSION
We report here the first whole-genome sequencing of freely circulating uncultured VZV from an outbreak of chickenpox. The outbreak began in Bissau, the capital of Guinea-Bissau, at the beginning of January 2001 and ended 23 weeks later at the end of June 2001. All of the viruses sequenced belong to clade 5, which is thought to be endemic in Africa and parts of Southeast Asia. However, at least three distinct sublineages (5A, 5B, and Bandim 6) were circulating during the first month of the outbreak, suggesting multiple viral origins. This finding corroborates a previous study that identified multiple clades among viruses cocirculating in time and geographical location during a chickenpox outbreak in the United Kingdom (29). We conclude that the most parsimonious explanation for this finding is that exposure to infectious virus from many sources is common and thus not a rate-limiting step to epidemic spread. Rather, as we and others have previously shown, the limiting factor is probably the availability of sufficient susceptible hosts together with environmental conditions such as school gatherings (3). In the United Kingdom and United States, primary cases are typically school-aged children who become infected at school, and secondary infections are their cohabitants at home (1,2). In Guinea-Bissau, the ages of primary and secondary cases do not differ, probably due to extensive mixing of preschool-aged children with older children both inside and outside the home (3). It was shown in the present study that the infectivity rate falls coincidentally with the school holidays and that mixing at school is an important facilitator of transmission. Primary infections caused by Bandim 6 and 17 are from patients of school age (12 and 10 years old, respectively), but those caused by Bandim 7 and 18 are not (representing 2- and 4-year-old patients, respectively).
We therefore have insufficient data to conclusively state from our samples whether transmission facilitated by mixing at school is an important primary transmission route of infection into a household or not, and so the origin of the index viruses in this outbreak remains uncertain. The Bandim Health Project collects detailed epidemiological data about households, their inhabitants, their health, and recent travel. Questionnaires administered to households affected early in the outbreak failed to identify contact with herpes zoster or sporadic chickenpox, including cases imported from outside the area. Asymptomatic oral shedding of virus is well described (30–32) and, in the absence of evidence for contact with chickenpox or zoster, the possibility that orally shed live virus seeded this outbreak cannot be excluded.
[Legend fragment] Red and blue shading of the repeat regions indicates the relative conservation of the repeat region sequences in each of the genogroups (e.g., R5 differs between genogroups but is perfectly conserved within each genogroup) and correlates with data shown in Fig. 3. Putative transmission chains are grouped (based on geographic information) in the first column with the month during which the sample was isolated shown in the second column. Repeat regions that could not be amplified are indicated by an "X" (note that R3 repeat regions are not included). nc, noncoding region; *, stop codon.
Whole-genome sequencing confirmed VZV to be extremely stable during transmission (4,8,11); for example, 8 of the 14 genogroup 5A viruses (Bandim 4, 5, 7, 11, 19, 20, 21, and 22) were identical (barring one OriS dinucleotide repeat in Bandim 7 and 11) despite, in some cases, being recovered several months apart. A further three (Bandim 12, 15, and 16) differed by only a single SNP from the consensus sequence for their genogroup. There was more variation in the 5B genogroup, where at least two subsidiary divergent lineages appear to have arisen, at least one of which (Bandim 6) diverged prior to this outbreak. The substitution rates calculated from all of the sequences, excluding the repeat regions, and from just the 5B lineage (1.82 × 10⁻⁵ and 5.91 × 10⁻⁵ substitutions per site per year, respectively) are significantly higher than previous estimates based on codivergence of the host and the virus (3.9 × 10⁻⁹ substitutions per site per year) (9) but only slightly higher than a previous estimate using heterochronous data (3.8 × 10⁻⁶ substitutions per site per year) (26). Previous analyses using time-stamped data in a range of other viruses have all inferred rates higher than that estimated by codivergence (33–36). Although the short sampling time may have inflated the estimates of substitution rates by including mutations that may be deleterious and become fixed over longer sampling times, the data are consistent with theories of VZV evolution that place the clade diversification of VZV some 20,000 to 50,000 years ago (37) rather than with the migrations out of Africa (26). Based on the substitution rates derived here and those derived by Firth et al. (26), we estimate that genogroups 5A and 5B diverged between 2 and 31 years prior to the outbreak, whereas the Bandim 6 lineage arose between 1 and 14 years previously (Tables 3 and 4 and Fig. 6).
We originally hypothesized that the Bandim 13/14 viruses, which were sampled toward the end of June, might have arisen by accumulation of mutations in genogroup 5B during the outbreak. However, even the fastest estimates of VZV mutation rate placed the date of divergence of Bandim 13/14 from 5B as prior to the current outbreak (Tables 3 and 4 and Fig. 6). Sequencing of greater numbers of viruses from the outbreak is now required to corroborate (or refute) this finding. In addition, further sequencing should provide greater support for the mutation rate estimated here. The observation that length polymorphism in the R1, R4, and OriS is not random but rather lineage specific points to the possible coevolution of the R1 repeat region and the OriS (Fig. 6). However, no functional link between these two regions has been identified either in VZV or related herpesviruses. R1, which is located at the N terminus of the ORF11 protein (ORF11p), is upstream of a region of the ORF11p that has been predicted, in silico, to bind RNA (38–40). The R1 motif itself is predicted to form a hydrophobic alpha helix, which also typically binds to nucleic acid. It is unlikely that R1 directly binds to the OriS, since ORF11p is not known to be part of the DNA replication complex.
[Figure legend, truncated at start] Genogroup 5A is highlighted in red, and genogroup 5B is highlighted in blue. Dates were inferred with a strict clock (A), the rate fixed to that estimated from a strict clock analysis of just the genogroup 5B samples (B), and the rate fixed to that estimated from a previous study (26) (C).
The structure and sequences of the Guinea-Bissau clade 5 repeat regions R2 to R5 and OriS overlap the repeat sequences found in other clades (see Table S2 in the supplemental material). The exception is the R1 tandem repeat, the C-terminal end of which contains densely repeating aspartic acid and glutamic acid residues, making it more acidic and hydrophobic than the R1 repeats found in clades 1 to 4 (see Table S2 in the supplemental material). ORF11 is expressed as an immediate-early protein in keratinocytes (18) and has been shown to be essential for replication in the SCID-Hu mouse skin xenograft model. As with its herpes simplex virus 1 homologue (UL47), the loss of ORF11p is associated with diminished expression of immediate-early proteins (38,41,42). Since clade 5 viruses are endemic to Africa and Southeast Asia (5), it is possible that the differences seen in clade 5 R1 represent adaptation to the host populations in these geographical regions. Further work is therefore needed to determine whether the R1 structure observed here is also found in clade 5 viruses from these regions and to determine whether this structure has an evolutionary or functional significance.
In summary, whole-genome sequencing of VZV has enabled us, for the first time, to study the dynamics of VZV transmission and evolution during a localized outbreak in Guinea-Bissau in 2001. These data have allowed us to accurately measure VZV short-term substitution rates and observe that all clade 5 viruses sequenced to date have a unique R1 repeat structure that codes for a more hydrophobic and acidic N terminus in the ORF11p. Guinea-Bissau clade 5 sequences cluster separately from clade 5 sequences obtained from European and U.S. subjects, suggesting a different evolutionary history, although sequencing of more African strains is needed to confirm this observation. Although most VZV remains highly conserved during epidemic transmission, our data imply that changes occurring in the R1 and OriS are associated with the evolution of new viral lineages.