HPV16 variants distribution in invasive cancers of the cervix, vulva, vagina, penis, and anus

Abstract Human papillomavirus (HPV)16 is the most oncogenic human papillomavirus, responsible for most papillomavirus‐induced anogenital cancers. We have explored by sequencing and phylogenetic analysis the viral variant lineages present in 692 HPV16‐monoinfected invasive anogenital cancers from Europe, Asia, and Central/South America. We have assessed the contribution of geography and anatomy to the differential prevalence of HPV16 variants and to the nonsynonymous E6 T350G polymorphism. Most (68%) of the variance in the distribution of HPV16 variants was accounted for by the differential abundance of the different viral lineages. The most prevalent variant (above 70% prevalence) in all regions and in all locations was HPV16_A1‐3, except in Asia, where HPV16_A4 predominated in anal cancers. The differential prevalence of variants as a function of geographical origin explained 9% of the variance, and the differential prevalence of variants as a function of anatomical location accounted for less than 3% of the variance. Despite containing similar repertoires of HPV16 variants, we confirm the worldwide trend of cervical cancers being diagnosed significantly earlier than other anogenital cancers (early fifties vs. early sixties). Frequencies for alleles in the HPV16 E6 T350G polymorphism were similar across anogenital cancers from the same geographical origin. Interestingly, anogenital cancers from Central/South America displayed higher 350G allele frequencies also within HPV16_A1‐3 lineage compared with Europe. Our results demonstrate ample variation in HPV16 variants prevalence in anogenital cancers, which is partly explained by the geographical origin of the sample and only marginally explained by the anatomical location of the lesion, suggesting that tissue specialization is not essential evolutionary forces shaping HPV16 diversity in anogenital cancers.

Oncogenic potential is not evenly distributed among oncogenic HPVs. Instead, HPV prevalence largely differs between types and between geographical regions, and the probability of progression from a clinically asymptomatic cervical infection to ICC is different for different HPVs [11,12]. HPV16 is the most frequently detected HPV in all cervical infections, from normal cytology to ICC, in all world regions [11,12]. HPV16 is also the most oncogenic HPV, responsible for 61% of all ICCs worldwide [4] and for even higher fractions of other HPV-associated anogenital carcinomas [6,9,10]. The biological reasons underlying the increased prevalence and oncogenicity of HPV16 compared with other closely related viruses, for example, the sister viruses HPV31 and HPV35, remain unclear [13].
Sequence diversity within HPV types is described in terms of viral variants [14]. The best classification for HPV16 variants has been proposed by Burk and coworkers, describing four lineages and a number of sublineages and applying an alphanumeric code, for example, HPV16_ A4 [15]. Further, a large body of experimental research on the differential biological activities of HPV16 variants has focused on the E6 gene, especially on the T350G polymorphism, corresponding to the L83V amino acid substitution in the E6 oncoprotein. The initial literature described 350T as the "prototype," found in the "European" HPV16 variant, and the 350G allele as the "nonprototype," found in "non-European" HPV16 variants. However, the T350G polymorphism is found in different HPV16 variant lineages and is not a specific marker of any of them [16].
Papillomavirus variants are genetically very close, with above 98% nucleotide identity [14], but nonetheless HPV16 variants are suggested to differ in their oncogenic potential [17]. Particularly, the E6 T350G polymorphism has been associated with differential persistence and risk of progression to precancerous cervical lesions [17,18].
The objective of this study was to characterize the viral component in a comprehensive set of invasive tumors of the cervix, vulva, vagina, penis, and anus, encompassing 35 countries within three continents. We aimed to analyze the differential prevalence of HPV16 variants as well as of the intensively studied T350G polymorphism as a function of the anatomical location of the lesion, the geographical origin of the samples, and the age at cancer diagnosis.
Specimens were received anonymously and allocated a unique identification number upon reception, and the respective local and ICO ethic committees approved all the study protocols.

Identification and selection of the most informative regions
The most variable regions in the HPV16 genome were identified to maximize sequence diversity and phylogenetic signal in the targeted DNA fragments. We retrieved 109 HPV16 complete genome sequences from GenBank. Coding regions were aligned at the amino acid level with Muscle 3.7 [19] (http://www.ebi.ac.uk/Tools/msa/muscle/), while the upstream regulatory region (URR) was aligned at nucleotide level. Phylogenetic inference was performed at nucleotide level for each alignment, as well as for the concatenated full-length aligned genome, under a maximum likelihood framework using RAxML v7.2.8 [20] (http://www.exelixis-lab.org/) and the GTR+Γ 4 as substitution model. Robustness of tree individual nodes was assessed by bootstrap resampling analysis, as determined with the -autoMRE command [21]. Using the full-length genome data and the -J MR_DROP command [22], three rogue taxa were identified to show inconsistent positions during bootstrapping and were excluded from further analyses. The final alignment for the full-length genome comprised 7925 nucleotides and 548 distinct alignment patterns (Fig.  S1). The well-resolved maximum likelihood phylogenetic trees obtained were employed to compute tree-based pairwise genetic distances (nucleotide substitutions per site) for each pair of taxa and for each genomic region analyzed, using the RAxML -f x command. Distances were then normalized with respect to the genetic distance between the corresponding taxa for the complete genome.

PCR and sequencing
DNA was extracted from four 5-μm paraffin slices by incubation overnight at 56°C with 250-μL proteinase K buffer (10 mg/mL proteinase K, 50 mmol/L Tris-HCl, pH 8.0) followed by incubation at 95°C for 8 min to inactivate proteinase K, and stored at −20°C until use. A 1:10 water dilution of this DNA solution was used for downstream processing. PCR primers were designed to target-specific HPV16 genome regions, so that well-described linagespecific polymorphisms were covered by the corresponding amplicons (Table S2). We also used primers previously designed by Larsson and coworkers [23] to span two positions in the E6 gene that have been thoroughly analyzed in several studies (i.e., nt 131 and 350, reference sequence GenBank: NC_001526). All PCR reaction mixtures contained: 0.125 U/μL AmpliTaq Gold ® DNA Polymerase (Life Technologies, Alcobendas, Spain), 2.0 mmol/L MgCl 2 , 0.2 μmol/L deoxynucleotides triphosphate (Life Technologies, Alcobendas, Spain), 0.2 μmol/L forward and reverse primer (Biolegio , Nijmegen, The Netherlands), and 5 μL DNA solution. PCR conditions were 95°C for 10 min; 40 cycles of 30 sec at 94°C, 30 sec at 58°C, 30 sec at 72°C; plus 7-min final extension at 72°C. PCR products were Sanger-sequenced at Genoscreen (Lille, France) in both strands using four pairs of primers. (Table S2).

Phylogenetic analyses
Phylogenetic relationships of the amplified E6, L2, and LCR short fragments were placed in the global context of HPV16 genetic variability using an Evolutionary Placement Algorithm on RAxML v7.2.8 with the GTR+Γ 4 model [19,20]. The algorithm provides likelihood weights for placing the partial sequences into the different nodes in the reference tree, in our case based on the pruned full-length genome alignment described above. Sequences obtained from our samples were incorporated into the reference alignment with MAFFT v7, and their phylogenetic placement was individually inferred with the -f v command in RAxML [21]. We integrated the results for all nodes and used 0.7 as a likelihood cutoff value to assign each sample into a specific variant lineage, namely A1-3, A4, B, C, and D (Table S3, Fig. S1). Using the 0.7 cutoff, 12 samples (1.7%) could only be classified as belonging to the A lineage and were subsequently classified as HPV16_A1-3 using a 0.6 cutoff value (Table S3).

Statistical analyses
A generalized linear model (GLM) with a Poisson distribution for count data and a log-link function was used to analyze the relationship between HPV16 variants prevalence with the two variables of interest: anatomical location and geographical origin. We explored as well the contribution of all double and triple interactions. Significance level was set at α value of 0.05. Analyses were performed using R in RStudio v0.98.939 (RStudio, Inc.). To corroborate the GLM results, differences in HPV16 variant distribution stratifying by anatomical location or by geographical origin were statistically assessed by means of Pearson's chi-square test and of Fisher's test, respectively. Prevalence ratios (PRs) of HPV16 variants among invasive anogenital cancers between Europe and Central/South America or Asia were estimated using Poisson multivariate regression model with robust variance. The different HPV16 variant lineages (i.e., A1-3, A4, and D) were used as dichotomous variables.
Distribution of the polymorphic site T350G within HPV16_A1 variants was assessed by Pearson's chi-square test when stratified by geographical origin and by Fisher's test when stratified by anatomical location. To assess the possible differential prevalence of the T|G alleles, we estimated the frequency of this polymorphism within HPV16_ A1-3 variants for all anatomical locations for samples from Europe and Central/South America (Table 3). By focusing on the HPV16_A1-3 variants, we aimed to avoid the possible different epistatic interactions of the T|G alleles with the genetic background of each HPV16 variant, because the E6 350 position is also polymorphic T|G in HPV16_B, monomorphic T in HPV16_C, and monomorphic G in HPV16_D [16]. Asian cases were excluded from this analysis due to the small number of samples.
Cancer registry data show that cervical cancers are diagnosed earlier than other anogenital cancers associated with HPVs [6,[8][9][10]. Also, cancers caused by HPV16 are diagnosed earlier than cancers in the same anatomical location caused by other HPVs [4]. To disentangle the effects of virus genetics and anatomical location of the lesion on the age at diagnosis, we have followed a top-down approach, analyzing first age at cancer diagnosis for all HPV-related anogenital cancers available from our full clinical data set [4,6,8,9], then for all cases exclusively linked to HPV16, and finally for all cases exclusively linked to HPV16_A1-3 ( Fig. 2). For ages at tumor diagnosis, central values were estimated with the median, dispersion was estimated with the median absolute deviation, and differences were assessed S. Nicolás-Párraga et al.

HPV16 Variants in Anogenital Cancers
by Wilcoxon-Mann-Whitney test. Bonferroni correction for multiple comparisons was used when applicable.

Choice of informative regions and sample set description
We identified, in decreasing order, E4, E5, LCR, L2, E2, and E6 as the most informative regions in the HPV16 genome to perform phylogenetic inference (Fig. S2). PCRs were designed for each of these six genomic regions, and the LCR, L2, and E6 targets rendered the best results in terms of amplicon quality and suitability for Sanger sequencing, as well as for the number of samples that tested positive. The final sample set comprised three continents (Europe, Central/South America, and Asia) and five anatomical sites (cervix, vulva, vagina, anus, and penis) ( Table 1; Table S1). From 711 initially suitable amplicons, we were able to confidently classify 692 (97.3%) as belonging to HPV16_A1-3, A4, B, C, or D following an evolutionary placement algorithm (Table S3). Only nine samples belonged within the B or C lineages. Given the low numbers for both B and C lineages in our sample set, these sequences were not included in further analyses.

Geographical origin and anatomical location of the HPV16 variant distribution
The association between HPV16_A1-3, A4, and D variants (n = 683) with anatomical location and geographical origin was assessed using a GLM analysis. The model that fitted best our observations for the complete data set included the predictors "Geographical origin," "Anatomical location," and "Variant" (AIC = 225.88; Table S4). All predictors and their two-by-two interactions contributed significantly to the model (P < 0.0001 in all cases), but the triple interaction did not provide additional explanatory power (P = 0.36). The GLM analysis fitted very well our experimental data, as only <1.4% of all variance in HPV16 variant distribution remained unexplained by the model (Table S4). In our data set, 14.1% of the global variability arose from differential coverage of the three geographical regions (n = 342 for Europe, n = 261 for Central/South America, and n = 80 for Asia), and only 1.7% arose from differential coverage of the five anatomical origins analyzed (n = 163 for cervix, n = 121 for vulva, n = 114 for vagina, n = 115 for penis, and n = 170 for anus). Thus, the GLM approach allowed us to estimate and account for possible biases associated with design asymmetries in our data. We confirmed further the GLM results by estimating prevalence ratios for the different HPV16 variants stratifying by geography (Table 2) and by using a chi-square test after stratifying for geography and a Fisher's test after stratifying by anatomical location of the samples (Table S5).
We estimated that 68.2% of all variation in HPV16 variants abundance corresponded to actual differences in variant prevalence alone (P < 2.2e −16 ; Table S4). Globally, HPV16_A1-3 was by far the most prevalent variant, with an overall prevalence of 95% in Europe, 86% in Central/ South America, and 61% in Asia (F, Table 2). We quantified further that 9.0% of all variance in variant distribution was explained by differential association of viral lineages with geography (P < 2.2e −16 ; Table S4). This variation corresponded to a significant 1.7-fold (95% CI: 1.4-2.1) increase of HPV16_D prevalence in Central/ South America and to a significant 6.6-fold (95% CI: 4.9-8.9) increase of HPV16_A4 prevalence in Asia, in both cases compared with Europe ( Table 2, see also Fig. 1). Finally, 2.8% of all variation in variant distribution corresponded to differential association of viral lineages with anatomical location (P < 2.1e −05 , Table S4, see also Fig. 1). This variation stemmed from the increased prevalence of HPV16_A4 in vagina and in anus in Asia, where this variant prevailed (Table S5, see also Fig. 1). Differences remained significant even after excluding data from Asia for vagina and penis, both locations with low number of cases (Table S5, see also Fig. 1).

HPV16 E6 gene T350G polymorphism
Prevalence for the 350G allele within HPV16_A1-3 ranged between 47% and 59% for Europe and between 59% and 90% for Central/South America. No differences between anatomical locations were observed within each geographical region (respectively, P = 0.617 and P = 0.102 for Europe and Central/South America). However, HPV16_A1-3 cases from Central/South America showed consistently higher 350G allele frequencies compared with Europe, especially for cervical (P = 0.015) and penile (P < 0.0005) cancers (Table 3).

Age at cancer diagnosis
Cervical cancers showed significantly younger ages at diagnosis compared with other anogenital cancers (early fifties vs. early sixties, P < 0.0005) regardless of the oncogenic HPV type or of the HPV16 variant driving the cancer (Fig. 2, see also Table S6). Notably, no significant differences were observed for age at cancer diagnosis among noncervical cancers (Fig. S3, see also Table S7).

Discussion
In this study, we have assessed the HPV16 variant diversity in a comprehensive set of invasive tumors of the cervix, Figure 1. Distribution of HPV16_A1-3, A4 and D variants depending on geographical regions and anatomical location. For each combination of geography and anatomy, the number of samples is given in parentheses. Values for the contribution of differential variant prevalence (68%), for the contribution of geography (9%), and for the contribution of anatomy (3%) have been generated with a generalized linear model. For each anatomical location, the result of a chi-square test assessing homogeneity for variant prevalence values between the three geographical origins is provided (e.g., for vaginal cancers, the H0 hypothesis of the variant prevalence values being similar in Europe, Central/South America, and Asia is rejected with P = 0.004). For each geographical origin, the result of a chi-square test assessing homogeneity for variant prevalence values between the five anatomical locations is provided (e.g., for cancers from Central/South America, the null hypothesis of the variant prevalence values being similar in cervix, vulva, vagina, anus, and penis is accepted with P = 0.074). HPV16, Human papillomavirus type 16. No difference observed between anatomical location within Europe (P = 0.350) and within Central/South America (P = 0.110) Differences observed between anatomical location within Asia (P = 0.002). We have quantified for the first time the relative contributions of variant differential abundance, geographical origin, and anatomical location of the anogenital cancers to the observation of differential prevalence distribution of HPV16 variants. Our results show that there are no large differences between HPV16 lineage prevalence values among the anogenital cancers. The most prevalent viral lineage was by far HPV16_A1-3, independently of geographical origin and anatomical location of the samples, with the only exception of anal cancers in Asia, dominated by HPV16_A4, albeit based on small numbers. We have further estimated the contribution of geography and anatomical location to the observed differential HPV16 variant prevalence. The geographical origin of the cancer sample explains roughly 9% of all diversity in viral lineage distribution, and this contribution arises essentially from the increased prevalence of HPV16_A4 in samples from Asia and of HPV16_D in samples from Central/South America. In contrast, the anatomical location of the anogenital cancer explains only <3% of the observed diversity in viral lineages distribution. Indeed, we have not observed significant prevalence differences for HPV16 variants between anatomical locations of the anogenital cancers within Europe or Central/South America. In Asia, however, the higher contribution of HPV16_A4 variant in anogenital cancers exhibited a significant prevalence peak for anal cancers.
Variant distribution and diversity in HPV16 have mainly focused on the uterus cervix [16,24,25], but a sound description for viral lineages in other anogenital cancers sites was still wanting. Here, we have characterized the HPV16 variant component in a total of 692 anogenital invasive squamous cancers, including more than 550 cases from the vulva, vagina, penis, or anus. Our results confirm previous results with cervical samples and show further that the repertoire of viral HPV16 variants in anogenital cancers is largely the same regardless of the anatomical location. Consistent with our observations, two previous small studies in Northern Europe (HPV positive total N = 40; HPV16 positive N = 31) and in North America (HPV positive total N = 14; HPV16 positive N = 9) also reported an increased prevalence of HPV16_A1-3 variants in vulvar cancer (N = 29/31; N = 5/9 respectively) [26,27]. Regarding vaginal cancer, the only previous study analyzing HPV16 variants (HPV positive total N = 37; HPV16 positive total N = 26) showed exclusively the presence of HPV16_A1-3 variants in European samples [26]. For anal cancer, a Canadian study (HPV positive total N = 96; HPV16 positive total N = 79) reported around 90% prevalence for HPV16_A variants [28]. Finally, and concerning viral diversity in HPV16-associated penile cancers, an Italian study (HPV positive total N = 19; HPV16 positive total N = 18) showed above 40% prevalence for both HPV16_A1-3 and D variants, along with above 10% minor nonnegligible contribution of HPV16_B variants [29]. However, a Mexican series of penile cancer samples (HPV positive total N = 67; HPV16 positive total N = 57) showed 92% prevalence of HPV16_A1-3 and 8% prevalence of HPV16_D [30]. In certain cases, the use in previous literature of imprecise naming schemes for HPV16 variants hampers a proper comparison. To avoid ambiguity, we have adhered here to the HPV16 variant terminology as standardized by Burk and coworkers [15] and strongly encourage further research on HPV variants to stick to it. Previous HPV16 variant nomenclatures included potentially misleading geographical references (e.g., "European") or ill-defined arbitrary classifications (e.g., "prototype" or "nonprototype"). The use of a geography-based nomenclature conveys a message of a close match between differential HPV16 variants prevalence and geography, which is not justified neither by the best previously available data [16] nor by our results presented here. Minor variations in the viral genome may be responsible for important changes regarding increased persistence or viral load [36,37]. In addition, the adaptive host-pathogen interaction may further condition the differential probability for clearance or for eventual malignization of HPV16 infections [35,36]. To tackle this connection between viral genotypic diversity and cancer risk, a long-studied candidate has been the T350G(L83V) single-nucleotide polymorphism in the HPV16 genome [38,39]. In vitro studies have suggested an increased transformation potential for the 83V allele, especially for the HPV16_D lineage [38,40], although these results may be linked to a specific host genetic background [39]. In European populations, prospective studies in cervical lesions as well as case-control studies have also communicated inconsistent results regarding the involvement of the T350G polymorphism in the persistence and progression to cancer [17,18,33,41]. To address the question of the differential HPV16_E6 350G allele frequencies as a function of the geographical origin and the anatomical location of the cases, we focused exclusively on HPV16_A1-3 samples from Europe and Central/South America. We found that the 350G allele frequency did not significantly differ between anatomical locations for samples from the same geographical origin. In addition, our analysis revealed an increase in 350G allele frequency in samples from Central/South America compared with samples from Europe, consistent for all anatomical locations except for the vulva. This trend is in agreement with previous studies reporting an increased frequency of the 350G allele in Central/South America compared with European populations [31,32,33,34], as well as with the minor contribution of this allele in vulvar and in vaginal lesions [23,26]. Finally, we estimated the possible influence of the HPV16 variant on the age at cancer diagnosis. The rationale behind is threefold. First, cancer registry data show that cervical cancers are diagnosed earlier than other anogenital cancers associated with HPVs (www.hpvcentre.net). Second the studies from our group also show that cervical cancers caused by more aggressive HPVs, such as HPV16, HPV18, or HPV45, are diagnosed earlier than cervical cancers caused by other HPVs [4]. Third, the relative contribution of the different HPVs varies depending on the anatomical location of the cancer. Thus, the observed differences in age at diagnosis in HPV-related cancers of different anatomical origin could be linked to specific characteristics of the target tissue and/or to the different prevalence of the underlying viral agents. Making a coherent picture out of all available facts remains, however, a conundrum, because the contribution of HPV16 in noncervical cancers is higher than in cervical cancer: 61% in cervix [4], 62.9% in penis [6], 72.5% in vulva [9], and 75.8% in anus [8].
The only exception to this trend is vaginal cancers,   S. Nicolás-Párraga et al.

HPV16 Variants in Anogenital Cancers
showing a 57% contribution of HPV16 [10]. One would thus expect that the increased contribution of HPV16 in noncervical cancers would result in earlier age at diagnosis when comparing HPV-related cancers among locations, but this is not the case. Our study design offered a unique opportunity to disentangle both alternatives and to test these hypotheses, as we have gathered a large sample set of HPV16-monoinfected cancers of five different anatomical origins. One explanation could have been that HPV16 variants show different prevalence in cancers from different anatomical locations, but the results communicated here suggest that such differences are minor. Remarkably, our results for the complete series on HPVs and anogenital cancers [4,6,[8][9][10] show that differences in age at diagnosis remain unchanged between cervical cancers (diagnosed in the early fifties) and noncervical cancers (diagnosed in the early sixties) when only HPV16 monoinfections are considered, and even after focusing on cancers associated to a single viral lineage, namely the most prevalent HPV16_A1-3. We propose that tissuespecific characteristics of the transitional epithelia at the uterus cervix may underlie the observed trend in age at diagnosis, reflecting an enhanced propensity to infection and/or a lower rate of infection clearing, resulting in an increased vulnerability to HPV16-driven cervical carcinogenesis [42]. The anatomy of anal mucosa encompasses also the transitional epithelia of a squamous-columnar junction, but nevertheless cervical and anal cancers largely differ in incidence and prevalence, age at diagnosis as well as in the repertoire of HPVs and in the contribution of HPV16 to malignization [4,8]. A particular cellular population in the squamous-columnar junction of the uterus cervix, the so-called cuboidal cells, which retain phenotypic features of embryonic stem cells, is enriched among transformed cells located in the uterus cervix, suggesting that they are implicated in cervical neoplasia after persistent HPVs infection [43]. However, the microanatomy of the anal transformation zone is not identical to that of cervical transformation zone: cells in the cervical squamous-columnar junction are monolayered, they are in direct contact with the basal membrane, and they display a immunophenotype different tumor [44]. Such histochemical differences between cervical and anal transformation zones most likely underlie the large epidemiological differences between cervical and anal HPV-induced cancers in terms of incidence, prevalence, and age at diagnosis.
Despite the large sample size and the sound molecular identification of viral variants, our study suffers from a number of limitations. First, we chose to restrict ourselves to a very homogeneous study subject, namely invasive squamous cell carcinomas containing exclusively HPV16 DNA. By doing so, we could only recover in our repository enough samples from Europe, Central/South America, and Asia, as we did not have access to good quality samples from the African continent. Indeed, a thorough study on the evolution of any human pathogen should aim to sample the host-pathogen interaction there where the genetic diversity of the host is largest, that is, Africa, for humans [45]. Second, we recognize that given the variant variability observed in anal samples, the study would have benefited from an increase in samples collected from Asia. Third, due to the nature of our study, we did not have access to any data regarding patient ethnicity and we were thus constrained to proxy the genetic background of the patient after the geographical origin of the sample. Although the geographical origin of the clinical sample is shown to rather reliably reflect the ethnic origin of the patient, even in admixed populations such as in Central/South America [46], it would have been desirable to generate the genetic data from both human and virus from the same infection. Finally, we have used the broad geographical units defined by the United Nations to stratify geography, but we are nevertheless aware that these units cover large differences in human genetic diversity that may thus result flattened.
In conclusion, our results provide for the first time a sound estimate of the differential contribution of HPV16 variants to anogenital cancers in Europe, Central/South America, and Asia. We identify geographical origin and anatomical location as minor factors for explaining the differential prevalence of HPV16 variants in anogenital cancers. Instead, our results suggest that the increased prevalence of HPV16_A1-3 may reflect a genuinely enhanced fitness for this variant to cause cancer. Only a large prospective series exploring HPV16 variants prevalence on clinically asymptomatic infections, monitoring the initial steps of viral colonization of anogenital mucosas and following differential viral persistence and accounting for the patient's genetic background, will ultimately provide answers about the extent of the differential fitness of these viral lineages and will help understand the host interplay of the most oncogenic HPV.

Supporting Information
Additional supporting information may be found in the online version of this article: Figure S1. Cumulative pair-wise distance frequencies for HPV16 genes and control region: Pair-wise distances (substitutions per site) are calculated for the full-genome, LCR, and all ORFs of reference sequences. Horizontal plain gray line represents the pair-wise distance 95th percentile. E2 -E4 (E2 minus E4) stands for the E2 gene nonoverlapping with the E4 gene. Figure S2. Mid-point rooted HPV16 best-known maximum likelihood phylogenetic tree, constructed using 109 unique full-length genome sequences. HPV16 lineages are classified into four variants: A, B, C, and D. Bootstrap values above 700 are displayed closed to the corresponding node. GenBank accession numbers are given for all entries. Figure S3. Age at tumor diagnosis for HPV positive, HPV16 single-infected band HPV16 A1-3 invasive squamous cell carcinomas stratified by anatomical location: Box plots represent the median, the 25% and 75% quantiles. Median and range (1.5 × interquantile) are represented in brackets below the box plots. Number of samples is represented in brackets below the anatomical location. HPV16-positive samples include all the types detectable through SPF10-LIPA25 protocol (version 1; Laboratory Biomedical Products, Rijswijk, the Netherlands). Table S1. Sample distribution per anatomical location, geographical region, and country. Table S2. Primer design. Table S3. Likelihood weights for the attribution of each individual sequence to each (sub) variant, number of samples, and percentage. Table S4. Generalized linear model (GLM) and analysis of deviance. Table S5. HPV16 A1-3, A4 and D variant distribution by anatomical location within each geographical area. Table S6. Age at tumor diagnosis for invasive squamous invasive carcinomas HPV positive, HPV negative, HPV16 single infected, and HPV16 A1-A2-A3 stratified by cervix, noncervix (women), and anogenital (men) samples. Table S7. Age at tumor diagnosis for invasive squamous invasive carcinomas HPV positive, HPV16 single infected, and HPV16 A1-A2-A3 stratified by anatomical location. Table S8. Collaborating centers at the RIS HPV TT and HPV VVAP study groups.