Ancient Evolution and Dispersion of Human Papillomavirus 58 Variants

ABSTRACT Human papillomavirus 58 (HPV58) is found in 10 to 18% of cervical cancers in East Asia but is rather uncommon elsewhere. The distribution and oncogenic potential of HPV58 variants appear to be heterogeneous, since the E7 T20I/G63S variant is more prevalent in East Asia and confers a 7- to 9-fold-higher risk of cervical precancer and cancer. However, the underlying genomic mechanisms that explain the geographic and carcinogenic diversity of HPV58 variants are still poorly understood. In this study, we used a combination of phylogenetic analyses and bioinformatics to investigate the deep evolutionary history of HPV58 complete genome variants. The initial splitting of HPV58 variants was estimated to occur 478,600 years ago (95% highest posterior density [HPD], 391,000 to 569,600 years ago). This divergence time is well within the era of speciation between Homo sapiens and Neanderthals/Denisovans and around three times longer than the modern Homo sapiens divergence times. The expansion of present-day variants in Eurasia could be the consequence of viral transmission from Neanderthals/Denisovans to non-African modern human populations through gene flow. A whole-genome sequence signature analysis identified 3 amino acid changes, 16 synonymous nucleotide changes, and a 12-bp insertion strongly associated with the E7 T20I/G63S variant that represents the A3 sublineage and carries higher carcinogenetic potential. Compared with the capsid proteins, the oncogenes E7 and E6 had increased substitution rates indicative of higher selection pressure. These data provide a comprehensive evolutionary history and genomic basis of HPV58 variants to assist further investigation of carcinogenic association and the development of diagnostic and therapeutic strategies. IMPORTANCE Papillomaviruses (PVs) are an ancient and heterogeneous group of double-stranded DNA viruses that preferentially infect the cutaneous and mucocutaneous epithelia of vertebrates. Persistent infection by specific oncogenic human papillomaviruses (HPVs), including HPV58, has been established as the primary cause of cervical cancer. In this work, we reveal the complex evolutionary history of HPV58 variants that explains the heterogeneity of oncogenic potential and geographic distribution. Our data suggest that HPV58 variants may have coevolved with archaic hominins and dispersed across the planet through host interbreeding and gene flow. Certain genes and codons of HPV58 variants representing higher carcinogenic potential and/or that are under positive selection may have important implications for viral host specificity, pathogenesis, and disease prevention.

are ancient DNA viruses that have evolved over millions of years, reaching a high level of diversity at the genomic and phenotypic levels (4,5). The Alphapapillomavirus genus is associated with infections of the anogenital and oral mucosa and contains all high-risk types, among which HPV16 and HPV18 account for ϳ70% of all cervical cancers across the world. Several high-risk HPV types have shown geographical/ethnic differences in disease attribution. Of particular interest, HPV58 ranks sixth or seventh as a cause of cervical cancer globally, but it is the third most predominant type in East Asia and was found in 10 to 18% of cervical cancers and precancers (6)(7)(8). However, the reason for this remarkable geographic distribution of HPV58-associated cervical disease burden is unknown.
HPV types are defined as those with L1 genomic sequences differing by Ͼ10%, whereas variant lineages and sublineages differ by 1 to 10% and 0.5 to 1% over the complete genomes, respectively (9,10). At both the type and variant levels, substantial differences in evolutionary relatedness and carcinogenicity have been observed. For example, HPV16 variants were shown to correlate with the continental distribution of humans and were associated with a different degree of cancer risk (11,12). In a multicontinental epidemiological study, the HPV58 E7 T20I/G63S variant was more frequently detected in East Asia and carried a 7-to 9-fold-higher risk for cervical cancer, indicating that the viral variant plays a key role in modulating its unique carcinogenic property (13). These data strongly indicate that sequence variation at certain sites of the HPV genome can critically determine phenotypic characteristics, including oncogenicity.
Evolutionary relationships between micropathogens and their hosts are often complex, with multiple time and space scales over which a phylodynamic interaction captures the relationships among pathogen genetic diversity, host immunity, and pathogen transmission (14)(15)(16). A prevailing model suggesting virus-host codivergence has been shown for feline papillomaviruses within the genus Lambdapapillomavirus isolated from oral lesions (17), but the evolution of papillomaviruses does not mirror the host phylogeny accurately (18). While natural selection and genomic drift drive important forces shaping the wide variety of papillomavirus (PV) phylogenies, potential crucial mechanisms for the well-adapted spectrum of these slowly evolving doublestranded DNA viruses (e.g., papillomaviruses or polyomaviruses) may include host switching, niche adaptation, lineage sorting and duplication, or rare recombination (18)(19)(20)(21). A common assumption of HPV evolution was that viruses have codiverged with modern humans as a host population and distributed across the planet through Homo sapiens migration (22,23), but this scenario is being challenged by the differential coevolution of HPV16 lineages with archaic hominins (24). Understanding the capacity for, and history of, viral transmission and adaptation to host immune selection and epidemic dynamics will facilitate the clarification of how virus genetic variation determines phylogeny and pathogenicity in individual hosts and the population.
In this study, we applied phylogenetic and bioinformatics analyses on a large data set of HPV58 variants to understand viral genetic variation and evolutionary dynamics. By estimating divergence times, we determined a complex evolutionary history of HPV58 variants that involves ancient codivergence with archaic hominins and recent viral transmission from Neanderthals/Denisovans to modern human populations. By examining viral gene substitutions and genomic signatures, we identified certain genes and mutations that process positive selection; some of them may represent higher carcinogenic potential. These findings will have clinical implications for screening, early intervention, and therapeutics in the future.

RESULTS
HPV58 variant genomic diversity and phylogeny. Complete genome analysis using multiple phylogenetic algorithms clustered 90 HPV58 isolates into four lineages that were further divided into seven sublineages ( Fig. 1; see also Table S1 in the supplemental material). The maximum inter(sub)lineage difference was observed between A2 and D2 variants (mean percentage Ϯ standard error, 1.45% Ϯ 0.13%), and the overall interlineage and intersublineage mean differences ranged between 0.90% and 1.38% and between 0.54% and 0.75%, respectively. When A and non-A (B, C, and D) variants were compared, the latter encompassed a higher level of genetic diversity (nucleotide diversity [] of 0.00748 versus 0.00439).
A summary of the sequence diversity of 90 complete genomes is shown in Table 1. In total, 500 sites (6.4%) have changed across the complete genome, among which 202 codon sites (8.4%) within 8 open reading frames (ORFs) were nonsynonymous. The noncoding regions of HPV58 (the long control region [LCR], noncoding region 1 [NCR1], and NCR2) and E4/E5 ORFs showed higher levels of genomic diversity, while the L1 protein (amino acid change) was relatively conserved compared to other ORFs.
No evidence of recombination was found across the HPV58 complete genome by a genetic algorithm for recombination detection (GARD). However, phylogenetic incongruence was observed among A variants when separated maximum likelihood (ML) trees inferred from the early genes (E1, E2, E6, and E7) and the late genes (L1 and L2) were compared (trees are not shown). The A1 sublineage was closer to A2 in both early gene (86% bootstrap value) and complete genome trees ( Fig. 1) but closer to A3 in the late gene tree (73% bootstrap value). A bootstrap scanning method with a window size of 1,000 bp using SimPlot identified potential breakpoints among the E1, E2, L2, and L1 genes within the A variants (data not shown). Either raw sequences or sequences filtered by removing mutations potentially due to the antiviral activity of human APOBEC3 (hA3) cytosine deaminases showed similar topology incongruences between early and later gene trees.
Although sampling of isolates increased the repertoire of genomic variability, our data may have covered the majority of variant lineages within the given populations since the rarefaction curves of parsimony-informative single nucleotide polymorphisms (SNPs) (site detected in Ն2 samples) leveled off with increasing numbers of genomes. However, additional variations, including potential false-positive SNPs or sequencing errors, remain unavoidable, as the number of singleton SNPs (variations present only once in the sampled genomes) increased almost linearly following the increase in sample size (data not shown).
Natural selection of the HPV58 genome. In order to determine whether positive selection is playing a role in shaping the genetic makeup of HPV58, six models employing the maximum likelihood regression of codon substitutions ( ϭ dN/dS) were applied, and the model with the highest log likelihood value was chosen as the "best" one ( Table 3). The measure is an average over all sites in an ORF. The E4 gene had the highest average (model 3 [M3]; ϭ 1.9), with 5.7% of codons (mean ϭ 20.5) under diversifying selection. Statistically significant codon sites in E4 identified by the likelihood ratio test (LRT) were found to be amino acids 1L, 39S, and 74V (P Ն 0.95 by CODEML, and P Ն 0.90 by a fast unconstrained Bayesian approximation [FUBAR]). Although the average dN/dS ratios of other ORFs were Ͻ1, the HPV58 oncogenes E7 demonstrated to be under positive selection. No amino acid within E1, E5, or L2 met criteria for positive selection using the LRT, although several sites (E1 amino acids 79I and 361D and L2 amino acid 433T) had dN/dS ratios of Ն1 by CODEML. Geographic distribution of HPV58 variants. Based on SNP patterns and phylogenetic tree topologies, a total of 747 HPV58 variants with known geographic origins from 16 countries/regions was assigned to lineages/sublineages (Table 4). Although these partial sequences spanned variable genes/regions, including E6, E7, L1, and the LCR, we can unambiguously assign each isolate to a phylogenetic branch with maximum likelihood in a complete genome tree using a placement algorithm. As shown in the summarized charts of HPV58 phylogeography, isolates from Asia were equally represented by the A1, A2, and A3 sublineages, with an accumulated prevalence of Ͼ97% (Fig. 3a). America and Europe were predominated by A2 variants (76% in America and 78% in Europe) but showed a larger proportion of non-A variants than did Asia (14.4 to 16.5% versus 2.7%). In Africa, about half of the variants were non-A variants, which were predominated by C variants, whereas the A variants were mainly assigned to the A2 sublineage. Overall, the A3 sublineage represented by a high-risk E7 T20I/G63S variant was most common (26.9%) in Asia and, to a lesser extent, was common in America (9.1%) and Europe (4.1%) but was absent in Africa.
Principal component analysis (PCA) using a weighted UniFrac algorithm led to variants that were well clustered into three distinct groups corresponding to the source of these samples (Africa, Asia, and America/Europe) (Fig. 3b). Globally, the A2 sublineage was the most widespread variant, whereas the A1 and A3 sublineages were rarely founded outside Asia (P Ͻ 0.001) (Fig. 3c). The majority of HPV58 non-A variants was detectable in Africa compared with other three continents; due to the limited sample size, however, we did not identify the B1 and D1 sublineages in Africa. Divergence time estimation and ancestral codon mutation. We used a Bayesian Markov chain Monte Carlo (MCMC) framework and the previously reported evolutionary rate of feline papillomaviruses to estimate the HPV58 variant divergence times (Table 5). Papillomaviruses process evolution with a low mutation rate due, in part, to the fact that this kind of double-strand DNA virus uses the host cell DNA replication machinery, characterized by high fidelity, proofreading capacity, and postreplication repair mechanisms. A combination of relaxed log-normal molecular clock and coalescent Bayesian skyline models provided the best performance, with a tree height estimated at 451,600 years ago (451.6 kya) (95% highest posterior density [HPD], 306.0 to 619.6 kya). This estimation is around three times longer than the modern Homo sapiens divergence time (ca. 120 to 180 kya) and implies a hominin host switch (HHS) scenario indicating ancestral viral transmission between archaic and modern human populations. We then used an archaic hominin divergence time (500 kya; 95% HPD, 400 to 600 kya) and a modern human out-of-Africa migration time (90 kya; 95% HPD, 60 to 120 kya) to calibrate the times for the most recent common ancestor (MRCA) of HPV58 variants (Fig. 4, arrows). When time points were introduced into the HPV58 variant tree, similar divergence times were estimated, with a mean substitution rate of HPV58 variants ranging between 1.72 ϫ 10 Ϫ8 and 1.91 ϫ 10 Ϫ8 substitutions/site/year. For the final plot, the HHS scenario with a combination of a relaxed log-normal molecular clock and coalescent Bayesian skyline models showed the strongest support (Akaike's information criterion for MCMC samples [AICM] of 32,648) for the time inference of HPV58 variants (Fig. 4). The initial divergence of HPV58 variants was estimated to be approximately 479 kya (95% HPD, 391 to 570 kya), largely predating the out-of-Africa migration of modern humans (60 to 120 kya).
The potential ancestral codon mutations were predicted by using a maximum likelihood regression model (see Table S3 in the supplemental material). Interestingly, some sites associated with the E7 T20I/G63S variant that carries higher carcinogenicity had mutated earlier than the time of the divergence of the A3 sublineage. For example, E7 G760A within B1 variants (branch 3 [Br3] in Fig. 4 and Table S3 in the supplemental material) and E1 A1965T within the MRCA of A variants (Br10) may represent ancient adaptation or fitness in archaic hominins.

DISCUSSION
HPV58 is one of the oncogenic types attributing to a high proportion of cervical cancers in East Asia (8). HPV variants have been reported to be different in persistence, viral load, and carcinogenicity (11,(25)(26)(27), indicating that each HPV lineage has a different evolutionary history in their host population. In this study, we applied multiple phylogenetic algorithms and used a large data set of complete genome and partial gene sequences to investigate the origin, dispersal, and diversity of HPV58 variants. We observed a predominant dispersal of HPV58 A variants worldwide and A1/A3 variants in Asia; most importantly, the estimated divergence times of HPV58 variants largely  Table  4) were assigned to a lineage/sublineage and are summarized by continent in the pie charts. (b) Principal-component analysis using a weighted UniFrac algorithm clustered different study cohorts into three distinct groups, mainly matching the geographic locations where the viruses were isolated. (c) Relative frequencies of HPV58 lineage/sublineage distributions in four continents. A higher frequency indicates a predominance of certain lineages/sublineage in the associated geographic area.

Evolution of HPV58 Variants
Journal of Virology predated the recent out-of-Africa migration of modern human populations. These findings confirm a hominin host switch scenario previously reported for HPV16 showing that the ancient hominin-virus codivergence and recent host switch events shaped the radiation that we observe in the phylogenetic tree of extant HPV variants (24). The majority of HPV variants currently predominant in Eurasian populations could be the descendants of viral transmission from Neanderthals/Denisovans to modern human populations through interbreeding and gene flow. The divergence time estimation and the phylogenetic separation between HPV58 A and non-A variants mirror the host split between archaic hominins and modern human ancestors, indicating that ancestral HPV58 lineages may have already existed before the emergence of modern humans, which rejects the initial assumption of codivergence between HPV and H. sapiens migration (22). Interestingly, the initial divergence of ancestral HPV58 lineages estimated here, similar to the split between HPV16 lineages A and BCD (461 kya; 95% HPD, 365 to 561 kya) (24), is compatible with the speciation between Neanderthals/Denisovans and modern H. sapiens ancestors (Fig. 5). When Neanderthal/Denisovan ancestors diverged and expanded out of Africa, they may have carried ancestral HPV variants (e.g., HPV16A or HPV58A). The morphological features typical of Neanderthals first appeared in the European fossil record about 400,000 to 600,000 years ago. Progressively more distinctive hominin forms subsequently evolved (e.g., splitting of Neanderthals and Denisovans) until their extinction around 30,000 years ago (28)(29)(30). During the late part of their history, Neanderthals lived in Europe and Western Asia and presumably came into contact with anatomically modern humans in the Middle East from at least 80,000 years ago. Interbreeding between Neanderthals/ Denisovans and modern humans is probably largely responsible for the sexual transmission of viruses through host gene flow. This inference is made based on the supposed 2 to 4% of nuclear DNA in Eurasians that can be traced to Neanderthals (29,31). This scenario can also be confirmed in part by the very low prevalences of certain HPV58 A variants (particular the A1 and A3 sublineages) in current African populations, strongly indicating that these variants were of Neanderthal origin. Viral transmission from modern humans to Neanderthals/Denisovans was possible based on evidence of ancient gene flow from early modern humans to Eastern Neanderthals (30). However, papillomavirus usually establishes infection at the basal layer of epithelium cells, making it impossible to detect viruses from fossil bones of archaic hominins.
The real divergence of HPV58 variants is likely more complex. In parallel with the ancient out-of-Africa expansion of Neanderthals/Denisovans, the viruses remaining in Africa codiversified with subsequent host speciation. Some viral variants (e.g., the D1 a Node numbers match the notation in Fig. 4. Boldface indicates the time estimation with "best" clock model and tree prior. b Two time points were introduced in the HPV58 variant tree to calibrate the time estimate. Boldface indicates the time estimate with the "best" models. sublineage) may have dispersed outside Africa following the recent out-of-Africa migration of modern humans, while some expanded in one or more isolated hominin populations in Africa or became extinct. The MRCA for HPV58 non-A variants (358 kya; 95% HPD, 245 to 473 kya) seems older than the one for A variants (198 kya; 95% HPD, 122 to 283 kya) ( Fig. 4 and Table 5), consistent with the low genomic diversity of A variants, probably because of a population bottleneck where only a proportion of viruses in Neanderthals/Denisovans was able to be horizontally transferred to modern humans. In contrast, HPV58 non-A variants are more diversified, matching the observation that African populations were the most diverse populations genetically (32). This may also support the notion that both modern humans and HPV58 non-A variants arose in Africa. Besides, interbreeding between archaic hominins was multiregional, occurring at different times and places (29). For example, a subset of modern human ancestors who carried some Neanderthal DNA may have headed east and interbred with Denisovans in Oceania (e.g., Australia, Melanesia, and the Philippines) (33). Although the contribution of Denisovans to modern humans was quantitatively small, gene flow from ancient Oceanians (after they mixed with Denisovans) to mainland Asian ancestors may have accumulated; as a result, certain viral variants (e.g., A1 and A3 sublineages) became more predominant in East Asia. While we were able to analyze the largest available worldwide collection of HPV58 isolates, the number of samples from certain areas is small, precluding a determination of the accurate geographic distribution of viral variants in different geographic areas. For example, the history of B1 variants is still elusive. The B1 sublineage is closer to B2, but they cannot be classified in the same monophyletic clade (Fig. 1); independent evolutionary histories encompassed by each of them may explain their differences in dispersal and frequency in present-day populations. Similar to HPV16 and HPV18 genomes (34,35), the majority of HPV58 variant genes are under purifying selection, with average dN/dS ratios of Ͻ1. The low substitution rate limits the actual number of evolutionary events and maintains the core functions of HPV-encoded proteins in a neutral fashion. In contrast, several codon sites under positive selection may affect viral phenotypes involved in facilitating host immune evasion, maintaining asymptomatic infection, or enhancing viral persistence and replication. For example, E7 aa 63 is under positive selection, with 3 different amino acid changes at the same codon: G63S is shared by the A3 and B1 sublineages, G63D is conserved in the A2 and D2 sublineage, and G63H is unique to the D1 sublineage. Such genetic changes possibly display differential viral loads and infection persistence that are directly involved in enhanced adaptive introgression in host alleles. Alternatively, the genetic variations that occur in HPV58 may be induced partly by the host innate immune system, such as the antiviral activity of hA3 cytidine deaminases (36). It has  6 122.2, 283.3 152.8 100.3, 219.8 358.1 244.8, 472.7 273.3 183.1, 367.0 206.9 138.7, 285.6 103.0 79.2, 125 been reported that APOBEC3-mediated cytidine deaminase activity could target HPV16 genes and induce viral mutations (37). These induced mutations, if not lethal, may also be responsible for the long-term accumulation of genomic changes that affect the success of niche adaptation or function fitness contributing to HPV-associated cancer (38). Nevertheless, functional investigations will be warranted to clarify whether these and other genotype changes affect any phenotypes in present-day HPV variants.  (Table 5), was used to calculate the divergence times. An HPV16 variant substitution rate and two human evolutionary time points of calibration (arrowed at nodes 0 and 6) were set. Branch lengths are proportional to the times scaled in thousands of years. Gray bars indicate the 95% HPD for the corresponding divergence age. The branches are coded (Br0 to Br14), and ancestral codon mutations are listed in Table S3 in the supplemental material.
To date, this is the most comprehensive study on the evolutionary history of HPV58 variants. One interpretation of the data in this work implies ancient hominin-virus codivergence between human papillomaviruses and hosts, while certain HPV variants, particularly the isolates predominant in present-day Eurasia, could be the descendants of sexual transmission from Neanderthals/Denisovans to modern human populations. The implementation of new technology such as next-generation sequencing makes sequencing of large amounts of complete HPV genomes more feasible and economical (11,39,40). A high-throughput, ultradeep-coverage method permits a more detailed examination of genotype-phenotype relationships between viral genomics and carcinogenesis. It will also help to establish local adaptation between virus and host genomes to explain the differential dispersals and cancer risks of HPV variants in different ethnic populations.

MATERIALS AND METHODS
Isolates and complete genome sequencing. In our previous study, 445 HPV58 isolates collected from 15 countries/cities across four continents had been sequenced for E6-E7-E2-E5-L1-LCR (7,13). In the present study, we identified 44 isolates that carry unique variation sites or patterns for complete genome sequencing. Type-specific primers were designed to amplify the complete genome by using nested overlapping PCR as previously reported (10). The collection of samples for sequencing analysis had been approved during the course of previous studies.
Phylogenetic analysis and tree construction. Ninety complete HPV58 genomes, including 44 new sequences from the present study, 26 sequences previously reported by our group (10), and 20 sequences available in the NCBI/GenBank database, were used for evolutionary analysis (see Table S1 in the supplemental material) (40)(41)(42)(43)(44)(45)(46). ML trees were constructed by using RAxML MPI v8.2.9 (47) and PhyML MPI v3.0 (48) with optimized parameters based on complete genome nucleotide sequences aligned by MAFFT v7.221 (49). Data were bootstrap resampled 1,000 times in RAxML and PhyML. MrBayes v3.2.6 (50) with 10,000,000 cycles for the MCMC algorithm was used to generate Bayesian trees. A 10% discarded burn-in was set to eliminate iterations at the beginning of the MCMC run. For Bayesian tree construction, a general time-reversible (GTR) model identified by ModelTest v3.7 (51) was set for among-site rate variation, allowing substitution rates of aligned sequences to be different. The CIPRES Science Gateway (52) was accessed to facilitate RAxML and MrBayes high-performance computing. To detect phylogenetic congruence among genes of HPV58 variants, separated ML trees inferred from concatenated nucleotide sequences of early genes (E6, E7, E1, and E2 ORFs) and late genes (L2 and L1 ORFs) were constructed using RAxML. SimPlot (53) and GARD within the HyPhy distribution (54) were used to detect potential recombination events across six genes. Mutations potentially due to the antiviral activity of hA3 cytosine deaminases (TCR¡TKR or YTA¡YMA; boldface indicates the mutation in the trinucleotide motif; M ϭ A or C; K ϭ G or T; R ϭ A or G; Y ϭ C or T) were excluded from tree comparisons and recombination tests (38).
Sequence diversity and genomic signatures. SNPs and amino acid changes were determined by using scripts developed in-house with R v3.3.2 (55). The rarefaction curves of SNPs were generated by EstimateS v9.1.0 (56). Inter-and intralineage nucleotide sequence differences were calculated by using the p-distance method in MEGA6 (57). A Wilcoxon-Mann-Whitney U test was used to determine the significance of pairwise differences between the defined groups. Genome-wide sequence signature analysis using MI computation was applied to examine the strength of the association between two variations (58). The pairwise MI indexes were calculated by using the mutinformation function in the R package "infotheo." A higher MI value indicates greater agreement between two compared sites.
Detection of positive selection. The maximum likelihood regression models of codon substitution ( ϭ dN/dS) were applied to identify whether an HPV58 gene(s) was under positive selection (59,60). These models view the codon as the fundamental unit of evolutionary change and take into account genealogic history when calculating scores. Log-likelihood scores evaluate the quality of the fit of the input data to the conditions of the model. Six models, including M0 (one ratio), M1 (neutral), M2 (selection), M3 (discrete), M7 (beta), and M8 (beta and ), used for the distribution of distinct ORFs (E6, E7, E1, E2, E4, E5, L2, or L1), were implemented in the CODEML program in the PAML v4.8a package (60), with a guided RAxML tree inferred from the global complete genome alignment. The codon sequences of each ORF were aligned based on the amino acid alignment by MUSCLE (61). The ratio of nonsynonymous/synonymous substitution rates is an indicator of natural selection, with a value of 1 representing neutral variation, a value of Ͻ1 representing purifying selection, and a value of Ͼ1 representing diversifying positive selection. Three LRTs were performed to assess the influence of positive selection on a particular coding region, which compared M1 with M2, M0 with M3, and M7 with M8. When alternative models (M2, M3, and M8) suggest the presence of sites with a value of Ͼ1, results of all three tests taken together are considered evidence of positive selection (34,35). Amino acid sites in a protein are expected to be under different selective pressures and have different underlying ratios.
A FUBAR algorithm within the HyPhy distribution (62) was used to verify the sites under positive selection observed by CODEML. This hierarchical Bayesian MCMC method ensures robustness against model misspecification by averaging over a large number of predefined site classes. A codon site with a value of Ͼ1 was considered to be under positive selection when the posterior probabilities determined by CODEML and FUBAR were Ն0.950 and 0.9000, respectively.
Geographic distribution of HPV58 variants. Information on the geographic sources of isolates was obtained from the corresponding publications (7,42,(63)(64)(65)(66). Tree topologies of partial sequences spanning variable genes/regions of 747 HPV58 isolates obtained worldwide (Table 4) were constructed using pplacer v1.1.alpha17 (67) by placing short sequences on a reference tree to maximize phylogenetic likelihood according to a complete genome alignment. The reference RAxML tree was based on 90 complete genomes that we used for evolutionary analysis. A cutoff value of a maximum likelihood of Ն0.8 was set as a confident assignment of HPV58 isolates into each (sub)lineage. The abundances of each lineage from the same country/city were combined and normalized as a percentage. We used a weighted UniFrac method in the R package "GUniFrac" (68) to calculate the pairwise distances between geographic locations, based upon which a PCA was performed to visualize the clustering of countries/ cities by using the betadisper function in the R package "vegan." Four geographic groups, Africa, America, Asia, and Europe, were summarized for the distribution of HPV58 variant lineages; for each lineage, its frequency in each geographic group was calculated based on the summary of individual percent abundances divided by the summary of the total percent abundance.
Divergence time estimation. We used a Bayesian MCMC method implemented in BEAST2 v.2.4.5 (69) to estimate the divergence times of HPV58 variants. Three tree priors were estimated by using the (i) coalescent constant population, (ii) Yule model, and (iii) coalescent Bayesian skyline demographic model, with assumptions that the papillomavirus genome has a strict mutation rate or that there is an uncorrelated log-normal distribution (UCLD) molecular clock model of rate variation among branches (Table 5). We chose the GTR sequence revolution model with gamma-distributed rate heterogeneity among sites and a proportion of invariant sites (GTRϩGϩI) determined by the best-fit model approach of Modeltest v3.7 (51). The complete genome alignment and a previously reported PV evolutionary rate of 1.95 ϫ 10 Ϫ8 substitutions per site per year (95% confidence interval [CI], 1.32 ϫ 10 Ϫ8 to 2.47 ϫ 10 Ϫ8 substitutions per site per year) were used (17).