High-throughput sequence analysis of variants of human cytomegalovirus strains Towne and AD169

The genomes of commonly used variants of human cytomegalovirus (HCMV) strains Towne and AD169 each contain a substantial mutation in which a region (UL/b′) at the right end of the long unique region has been replaced by an inverted duplication of a region from the left end of the genome. Using high-throughput technology, we have sequenced HCMV strain Towne (ATCC VR-977) and confirmed the presence of two variants, one exhibiting the replacement in UL/b′ and the other intact in this region. Both variants are mutated in genes RL13, UL1, UL40, UL130, US1 and US9. We have also sequenced a novel AD169 variant (varUC) that is intact in UL/b′ except for a small deletion that affects genes UL144, UL142, UL141 and UL140. Like other AD169 variants, varUC is mutated in genes RL5A, RL13, UL36 and UL131A. A subpopulation of varUC contains an additional deletion affecting genes IRS1, US1 and US2.

Human cytomegalovirus (HCMV; species Human herpesvirus 5) was first isolated over 50 years ago (Craig et al., 1957; Rowe et al., 1956; Smith, 1956). The most widely used laboratory strains are Towne (Plotkin et al., 1975) and AD169 (Rowe et al., 1956). Both have been distributed widely and developed as vaccine candidates, and over the years their detailed histories have become obscure. Moreover, the fact that the biological properties of these strains are not conserved between stocks (Brown et al., 1995) demonstrates the existence of genetic variants, and this may affect the interpretation of experimental studies. This paper contributes to the characterization of variants present in the ATCC VR-977 stock of Towne and of commonly used variants of AD169 in comparison with a novel, genetically more intact variant.
The Towne genome sequence has been determined (Dunn et al., 2003; Murphy et al., 2003) from a bacterial artificial chromosome (BAC) constructed from plaque-purified ATCC VR-977 (Marchini et al., 2001). However, ATCC VR-977 is known to contain a mixture of two variants. One (Towne varRIT3 or Towne short; here called varS) is represented in the BAC and, as described above, lacks UL/b′. The other (Towne long; here called varL) is intact in UL/b′, and this region has been sequenced (Dolan et al., 2004).
We determined the sequence of ATCC VR-977 by aligning data obtained from an Illumina Genome Analyzer (http://www.illumina.com) against a constructed reference sequence. To obtain the reference, the more reliable of the two varS BAC sequences (AC146851; Murphy et al., 2003) was reorganized into a genome-equivalent arrangement after identifying the termini and removing insertions in gene UL32 and the origin of DNA replication. The varS reference was generated from this sequence by inserting the region containing genes IRS1-US12 from strain Merlin (AY446894; Dolan et al., 2004) in place of the plasmid vector, which had replaced these genes during generation of the BAC. The varL reference was then constructed by inserting the UL/b′ sequence (GenBank accession no. AY446869; Dolan et al., 2004) into the varS reference in place of the duplication that had originally replaced UL/b′. The Illumina data were derived from whole cell DNA extracted from human fetal fibroblasts infected with ATCC VR-977, and assembled and viewed using Maq and Maqview (Li et al., 2008; http://maq.sourceforge.net). A total of 47.1 % of the 5 079 235 sequences (50 nt each) aligned with the derived varL consensus and the coverage was 516 reads per nt. Test assemblies using appropriate references confirmed that ATCC VR-977 contains both varL and varS, and showed that the whole population has frameshifts in genes RL13, UL1, UL130, US1 and US9. In addition, a 346 bp deletion in gene UL40 was present in almost the whole population, though detection of very sparse data from the deleted region indicated that a small proportion (much less than 1 %) of genomes might be intact. This deletion implies that varL and varS do not encode the gene UL40 signal peptide sequence, which contributes to natural killer (NK) cell evasion by upregulating human lymphocyte antigen-E (Tomasec et al., 2000).
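As a rough plausibility check on the read statistics reported above, the following Python sketch relates the number of aligned reads to the mean depth of coverage. It assumes that "coverage" means mean depth, i.e. total aligned bases divided by reference length; the function names are illustrative and are not part of Maq or of the analysis described here. Because the length of the varL consensus is not stated in the text, the sketch derives the length implied by the reported figures rather than assuming one.

```python
def mean_depth(total_reads, aligned_fraction, read_length, reference_length):
    """Mean depth = aligned bases / reference length (assumed definition)."""
    aligned_bases = total_reads * aligned_fraction * read_length
    return aligned_bases / reference_length


def implied_reference_length(total_reads, aligned_fraction, read_length, depth):
    """Reference length implied by a reported mean depth (same assumption)."""
    return total_reads * aligned_fraction * read_length / depth


# Values reported for ATCC VR-977: 5 079 235 reads of 50 nt,
# 47.1 % aligned, coverage of 516 reads per nt.
implied_len = implied_reference_length(5_079_235, 0.471, 50, 516)
print(f"Implied reference length: {implied_len / 1000:.1f} kb")
```

Under this assumption the reported figures imply a reference of roughly 232 kb, which is the expected order of magnitude for an HCMV genome, so the read count, aligned fraction and coverage are mutually consistent.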
Eighteen clear single nucleotide polymorphisms (SNPs) were identified, but these could not be assigned to particular variants. In addition to the presence of UL/b′, and excluding the duplicate copies of the inverted repeats, ATCC VR-977 differed from the varS BAC at 40 loci consisting of 34 substitutions and six insertions. By referring to sequences available for other HCMV strains, it was possible to infer which sequence was mutated at 25 loci. The BAC was assessed as being mutated at 23 and ATCC VR-977 at one, with the remaining substitution corresponding to an SNP.
In contrast with ATCC VR-977, in which varL has retained UL/b′, all commonly used AD169 stocks lack this region. Therefore, it was of interest that one of us possessed an AD169 stock (varUC) that reputedly contained UL/b′ sequences. N. Lurain had received varUC at the University of Chicago from K. Thompson, who in turn had obtained it from M. Beem in 1981 at the same institution. No documentation was available on its history, but it was thought to have undergone at least 50 passages since its acquisition. Initial studies (N. Lurain, unpublished data) had demonstrated that varUC plaques were similar to those of strain Toledo (Kemble et al., 1996), which contains UL/b′ (Cha et al., 1996), appearing as clusters of refractile, rounded cells, rather than the well-separated, elongated cells characteristic of commonly used AD169 variants. Also, sequencing had revealed the presence of UL/b′ genes in varUC, specifically UL146 and a region containing the 3′ end of UL144 and the 5′ end of UL140 with a 3.2 kb deletion encompassing the intervening genes UL142 and UL141. Moreover, genotyping data from microarray experiments had indicated the presence of all UL/b′ genes except UL142 and UL141 (J. García-Ramírez, D. Foster, L. Buehler, N. Lurain & P. Ghazal, unpublished data). These findings implied that varUC is either an AD169 variant that has retained most of UL/b′ or another strain entirely.
In order to distinguish between these possibilities, several genes that are mutated in commonly used AD169 variants or that vary greatly between HCMV strains were amplified by PCR from DNA extracted from a stock of cell-released varUC and sequenced; UL/b′ was also sequenced in its entirety. These data were compared with the published genome sequences of two AD169 variants. One of these was varUK, for which the sequence (X17403) was derived by Chee et al. (1990) and updated (BK000394) by the correction of errors and the insertion of a 929 bp region that is absent from certain stocks (Dargan et al., 1997; Mocarski et al., 1997). The other was varATCC, for which the sequence (AC146999) was derived by Murphy et al. (2003) from a BAC generated from plaque-purified ATCC VR-538 (Yu et al., 2002). In the 16976 bp of the 28780 bp determined for varUC that were comparable with varUK and varATCC, all three genomes were closely similar. The presence of UL/b′ in varUC, except for the previously characterized 3.2 kb deletion, was confirmed. This deletion is predicted to result in lack of expression of the UL144, UL142 and UL141 proteins, and expression of the UL140 protein with the C-terminal eight residues replaced by 71 residues specified by a reading frame that overlaps UL144.
Fig. 1. Schema for the derivation of the major lesion in the genome of commonly used AD169 variants from a wild-type virus genome; lengths are shown to scale. In the wild-type genome, the long and short unique sequences (UL and US; shown as thinner structures) are flanked by inverted repeats (ba/b′a′ and ca/c′a′; shown as thicker structures). As indicated by the dashed lines, the AD169 genome was generated by replacing a region at the right end of UL (UL/b′; shaded grey) by an inverted duplication of a region from the left end of the genome (arrow). This resulted in UL becoming shorter by 15 kb (19 genes) and b/b′ becoming longer by 10 kb (six genes and part of another). A similar phenomenon occurred in the derivation of the Towne varS genome, with UL becoming shorter by 13 kb (15 genes) and b/b′ becoming longer by 11 kb (seven genes and part of another).
The partial information obtained was consistent with varUC being an AD169 variant, and formed the basis for deriving the complete genome sequence from Illumina data derived from DNA extracted from pelleted cell-free virions. A reference was constructed from the varUK sequence, utilizing the partial varUC data to amend differences, and inserting UL/b′ in place of the duplication that had originally replaced UL/b′. A total of 92.4 % of 6 264 332 sequences (50 nt each) aligned with the derived varUC consensus, and the coverage was 1267 reads per nt. Test assemblies and PCR experiments demonstrated the absence of the UL/b′ deleted form characteristic of varUK and varATCC and the 929 bp deleted form present in some varUK stocks. The 3.2 kb deletion in UL/b′ was confirmed as being a feature of the entire population, and a 3.7 kb deletion in c′/US, which affects genes IRS1, US1 and US2, was detected in the majority of the population. Test assemblies also showed that the entire varUC genome population contained the mutations in RL5A, RL13, UL36 and UL131A present in varUK and varATCC. Four clear SNPs were identified.
Table 1 (notes only). … (FJ527563; 231781 bp). This excludes the duplicate copies of the inverted repeats (i.e. ab and ca) and 12 substitutions (nt 222147-223855) in a region of the varATCC sequence that was replaced during construction of the BAC. This replacement appears to have originated from a strain other than AD169 (probably Toledo).
†For non-synonymous substitutions, an arrow indicates the change from the non-mutant to the mutant amino acid residue. Where it was not possible to distinguish the non-mutant and mutant residues, the alternatives are separated by a filled circle. ‡Termination codon indicated by z. §Frameshifted in all variants and therefore considered non-coding.
Given the apparent existence of a vast number of differentiable HCMV strains (e.g. Bradley et al., 2008; Rasmussen et al., 2003), the high degree of sequence similarity between the three variants and their sharing of several mutations confirmed that varUC is an AD169 variant. Differences due to insertions included the presence of UL/b′ in varUC, length variations in the tandem repeat in a/a′ and heterogeneity in several non-coding polynucleotide tracts. Substitutions were identified at 54 loci (Table 1), with over half (29) in bac/b′a′c′ and the adjacent sequence at the left end of UL, and 32 in protein-coding regions (five synonymous and 27 non-synonymous). By referring to sequences available for other HCMV strains, it was possible to infer which sequence was mutated at 42 loci. A total of 36 mutations were specific to a single variant: nine to varUC and 27 to varATCC. Each of the remaining six mutations was present in pairs of variants: one in varUC/varUK, two in varATCC/varUC and three in varATCC/varUK.
Fig. 2. Schema for the derivation of AD169 variants from the original clinical material, based on the assumption that varATCC was generated from a US source and not from importation of varUK. Mutations in RL5A, RL13, UL36 and UL131A (and perhaps other, undetected mutations resulting in amino acid substitutions or affecting non-coding sequences) arose during the 14 passages that led to NIH 76559 and were inherited by all three lineages. Later on during the 14 passages, the deletion of UL/b′ occurred, and this was inherited by varUK and varATCC from NIH 76559. The 3.2 kb deletion in UL/b′ that characterizes varUC occurred either at an intermediate stage as a precursor of the larger deletion or in the separated lineage. Genes that are detectably disabled, and other mutations whose effects are unknown (Table 1), are listed below each variant.
Given the lack of historical information, it is not possible to reconstruct fully the lineages that led to the three AD169 variants. AD169 was isolated at the National Institutes of Health (Bethesda, MD, USA) from the adenoids of a 7-year-old girl and passaged 14 times in human fibroblast cells to yield a stock named NIH 76559 (Rowe et al., 1956). The lineage that led to varUK was initiated by researchers at St George's Hospital Medical School (London, UK), who obtained NIH 76559 in 1960 and passaged it 40 times in human fibroblast cells. The resulting virus was used to make batches of a potential vaccine by 16-24 additional passages (Elek & Stern, 1974). The varUK sequence was determined from a set of plasmid clones generated from a plaque-purified derivative of one of these passages (Oram et al., 1982). The route by which varATCC was derived from NIH 76559 is less clear, but it seems likely that it originated from an exclusively US lineage, since a US researcher, W. A. Chappell, deposited the stock with the ATCC. The ATCC has declined to reveal to us when this occurred, but has distributed AD169 at least since 1973 (e.g. Smith & de Harven, 1973), and now markets it as ATCC VR-538. The relatively large number of mutations specific to varATCC is consistent with the impressive numbers of passages to which AD169 was subjected in early years in some US laboratories (e.g. 232 passages by Vonka & Benyesh-Melnick, 1966). If varATCC indeed originated from a purely US source, a schema of the type illustrated in Fig. 2 may be proposed. However, resolution of the details is confounded by the potential persistence of mutations (including those that may have arisen in the ancestor of any two or all three variants) to different extents in subsequent lineages, and by the fact that the varUK and varATCC sequences originated from molecular clones made from plaque-purified viruses and therefore do not necessarily represent whole populations.
Like many other passaged HCMV strains, the Towne and AD169 variants are mutated in gene RL13 and one of the three genes in the UL128 locus (UL128, UL130 and UL131A) (Akter et al., 2003; Dolan et al., 2004; Hahn et al., 2004). This implies strong selection during passage in human fibroblasts against the encoded functions, which are involved in cell tropism (reviewed by Sinzger et al., 2008a). Towne varS and the AD169 variants are also mutated in UL/b′, as is strain TB40/E, which is frameshifted in UL141, with a derivative additionally lacking UL145 and UL144 (Dolan et al., 2004; Sinzger et al., 2008b; Tomasec et al., 2005). The patterns of mutation suggest that more than one gene in this region (UL145, UL144, UL142, UL141 or UL140) may be selected against, though not as strongly as RL13 and the UL128 locus. It is not immediately apparent why expression of these genes might be deleterious. The proteins encoded by UL142 and UL141 are involved in evasion of NK cell function, the former by downregulating MICA, which is a ligand for the activating receptor NKG2D (Chalupny et al., 2006; Wills et al., 2005), and the latter by sequestering CD155 (PVR), which is a ligand for the activating receptors CD226 (DNAM-1) and CD96 (TACTILE). The UL144 protein activates NF-κB in a TRAF6-dependent manner, causing upregulation of the chemokine CCL22 (MDC) (Poole et al., 2008), and also inhibits T cell proliferation by binding CD272 (BTLA) (Cheung et al., 2005).
We have contributed towards the characterization of variants of HCMV strains Towne and AD169, so that biological data may be assessed with greater rigour. The sequence of Towne ATCC VR-977 confirmed the presence of two major variants (varL and varS) and extended knowledge of their shared mutations. A novel AD169 variant (varUC) was shown to be genetically more intact than varUK and varATCC and may be a new tool in the hands of HCMV researchers.