Genetic Variability in the E6/E7 Region of Human Papillomavirus 16 in Women from Ecuador

Human papillomavirus (HPV) infection is associated with intraepithelial neoplasia and cervical cancer (CC). Ecuador has a high prevalence of cervical cancer, with more than 1600 new cases diagnosed annually. This study aimed to analyze the E6 and E7 oncogenes of HPV16 in samples collected from women with cancerous and precancerous cervical lesions on the Ecuadorian coast. Twenty-nine women were analyzed: six with ASCUS, three with LSIL, thirteen with HSIL, and seven with CC. The most common E6 variants were 350G (L83V; 82.6%) and 145T/286A/289G/335T/350G (Q14H/H78Y/L83V; 17.4%), both of which have been associated with an increased risk of cervical cancer in studies worldwide. All E7 sequences were conserved at the amino acid level. Phylogenetic analysis showed the circulation of lineages A (73.9%) and D (26.1%). The frequency of lineage D was higher than that reported in comparable studies from Ecuador and Latin America, which may be related to the ethnic composition of the populations studied. This study contributes to the characterization of potential risk factors for cervical carcinogenesis in Ecuadorian women infected with HPV16.


Introduction
Human papillomavirus (HPV) causes one of the most common genital infections worldwide and is classified as a sexually transmitted infection (STI) [1]. More than 80% of sexually active people are estimated to become infected with the virus at some point in their lives [2]. HPV is transmitted by skin-to-skin contact, mainly during sexual intercourse [3]. A total of 448 HPV genotypes have been detected, 12 of which are classified as high-risk [4,5]. Most HPV infections are asymptomatic, so infected people are often unaware of them. Some people develop genital warts or cervical changes that can be detected by a Pap test (Pap smear); however, conventional screening methods have limitations because they may not identify the specific HPV genotype or genetic variant present in infected cells [1,6,7].
HPV infection is associated with cervical intraepithelial neoplasia (CIN) and cervical cancer (CC). The most frequent high-risk (oncogenic) genotypes are HPV16 and HPV18, which are linked to several cancers, including cervical, vulvar, and vaginal cancers in women and cancers of the penis, anus, mouth, and throat in men [6]. According to the World Health Organization (WHO), about 530,000 new cases of CC were reported in 2012, and CC accounted for 7.5% of cancer deaths among women. The incidence and mortality rates of CC are particularly high in sub-Saharan Africa, Latin America, and Southeast Asia [6]. HPV16 is classified as high-risk because of its capacity to cause CIN, and persistent infection with oncogenic genotypes can lead to CC. HPV16 alone is responsible for 50-60% of all CC cases [7]. The viral genome contains eight open reading frames (E1, E2, E4, E5, E6, E7, L1, and L2), a long non-coding control region (LCR), and a short non-coding region (NCR) located between E5 and L2 [8].
Mutations have been identified in the HPV16 gene encoding the E6 protein. E6 promotes genomic instability in host cells through its interaction with p53, and such mutations may alter its carcinogenic potential and increase pathogenicity. In contrast, amino acid conservation of the E7 protein is associated with cervical cancer development [9,10,11].
In Ecuador, more than 1600 new cases of CC are diagnosed each year (estimated data for 2018), making it the second leading cause of cancer-related death among women aged 20-69 years. According to GLOBOCAN, Ecuador ranks seventh in the region in CC prevalence, after Chile [18]. In 2014, deaths from CC in Ecuador reached their highest peak, making it the leading cause of cancer-related death and surpassing breast cancer by 4% and stomach cancer by 0.5%. In the same year, the Society for the Fight Against Cancer (SOLCA) reported that 20 of every 100,000 women had some type of neoplasm, and that CC ranked second in the cities of Quito and Loja, at 34.1% and 35.6%, respectively [19].
Therefore, this study aimed to analyze mutations in the HPV16 E6 and E7 regions in samples from women on the Ecuadorian coast with cancerous and precancerous cervical lesions.

Molecular Analysis
For amplification of the E6 gene, we used primers E6-F (5′-CGAAACCGGTTAGTATAA-3′) and E6-R (5′-GTATCTCCATGCATGATT-3′); for E7, we used E7-F (5′-ATAATATAAGGGGTCGGTGG-3′) and E7-R (5′-CATTTTCGTTCTCGTCATCTG-3′) [21,22]. The amplified regions spanned nucleotides 52-575 (E6) and 480-985 (E7), flanking the respective coding regions (nucleotides 104-559 and 562-858). PCR reactions were performed in a final volume of 25 µL containing 5 µL of DNA and 10 µM of each primer. Cycling conditions were as follows: 5 min at 94 °C, followed by 35 cycles of 60 s at 94 °C, 60 s at 55 °C, and 60 s at 72 °C, with a final extension at 72 °C for 7 min [23]. PCR amplicons were detected by electrophoresis on a 2% agarose gel in TAE buffer stained with SYBR® Safe (10,000×; Invitrogen) and purified using the PCR Purification Kit (Qiagen). The resulting 524-bp (E6) and 506-bp (E7) products were sequenced with the amplification primers and analyzed separately. Briefly, purified amplicons were sent to Genewiz (NJ, USA) for Sanger sequencing on an ABI 3730xl DNA analyzer. Chromatograms were manually curated, cleaned, and analyzed using CodonCode Aligner (CodonCode Corporation). Sequences were checked with NCBI BLAST to confirm their viral origin [24].
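The amplicon and coding-region sizes quoted above follow directly from the 1-based, inclusive coordinates on the HPV16 reference genome. The minimal Python sketch below re-derives them as a consistency check; the helper names are hypothetical, and it assumes the final codon of each coding region is the stop codon (so 456 nt encode 151 E6 residues and 297 nt encode the 98 E7 residues). It also shows that the reverse complement of the E6-R primer is the plus-strand (reference) sequence at its binding site.

```python
# Minimal sketch: re-derive amplicon and CDS sizes from the stated coordinates.
# Coordinates are 1-based and inclusive on the HPV16 reference genome.
AMPLICONS = {"E6": (52, 575), "E7": (480, 985)}
CDS = {"E6": (104, 559), "E7": (562, 858)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def covers(outer: tuple, inner: tuple) -> bool:
    """True if interval `outer` fully contains interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


for gene in ("E6", "E7"):
    amp, cds = AMPLICONS[gene], CDS[gene]
    cds_nt = span_length(*cds)
    # Assumes the final codon is the stop codon, so residues = codons - 1.
    print(f"{gene}: amplicon {span_length(*amp)} bp, CDS {cds_nt} nt, "
          f"{cds_nt // 3 - 1} residues, CDS flanked by primers: {covers(amp, cds)}")

# A reverse primer matches the minus strand; its reverse complement is the
# plus-strand (reference) sequence at the binding site.
print(reverse_complement("GTATCTCCATGCATGATT"))
```

Running it reports 524 bp for E6 and 506 bp for E7, matching the product sizes above, and confirms that both amplicons fully contain their coding regions.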

Genetic Characterization and Phylogenetic Analyses
The E6 and E7 sequences were aligned, and single nucleotide polymorphisms (SNPs) were identified using CodonCode Aligner (CodonCode Corporation). An isolate was classified as a variant if it had at least one nucleotide substitution (polymorphism) relative to the reference isolate [16].
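The variant rule above (at least one substitution relative to the reference) can be expressed as a short comparison over an existing pairwise alignment. This Python sketch is an illustration only, not the CodonCode Aligner workflow used in the study: it assumes the query has already been aligned to the reference without gaps, treats `N` as an uncalled base rather than a substitution, and uses hypothetical function names and a toy sequence.

```python
def find_snps(reference: str, query: str, ref_start: int = 1) -> list:
    """Return (position, ref_base, query_base) for every substitution.

    Assumes both sequences are already aligned, gap-free and of equal length,
    and that reference[0] lies at genome position `ref_start` (1-based).
    Uncalled bases ('N') in the query are ignored, not counted as SNPs.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for offset, (ref_base, q_base) in enumerate(zip(reference.upper(), query.upper())):
        if q_base != "N" and q_base != ref_base:
            snps.append((ref_start + offset, ref_base, q_base))
    return snps


def is_variant(reference: str, query: str, ref_start: int = 1) -> bool:
    """An isolate is a variant if it carries at least one substitution."""
    return bool(find_snps(reference, query, ref_start))


# Toy fragment (hypothetical sequence) whose first base sits at position 347.
print(find_snps("ACGTTGCA", "ACGGTGCA", ref_start=347))
```

The toy call reports a single T-to-G change at position 350, the same notation the text uses for the E6 350G polymorphism.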

GenBank Accession Numbers
The sequences described in this study were deposited in GenBank under the following accession numbers: E6, OQ730038-OQ730060; E7, OQ730061-OQ730081.

Phylogenetic Analysis
The phylogenetic classification of E6 sequences is shown in Figure 1. The tree topology recovers the four major lineages of HPV16: A (European), B and C (African), and D (Asian American). Samples from Guayas (n = 10), Esmeraldas (n = 2), Manabí (n = 1), Santa Elena (n = 1), and Los Ríos (n = 3) were assigned to lineage A, and samples from Guayas (n = 3), Esmeraldas (n = 2), and Manabí (n = 1) were assigned to lineage D. No samples were assigned to lineages B or C (African origin).
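As a consistency check, the per-province counts above sum to the 23 E6 sequences deposited in GenBank (OQ730038-OQ730060) and reproduce the lineage frequencies reported in the abstract. A minimal Python tally, with counts transcribed from the text:

```python
# Lineage assignments per province, transcribed from the E6 phylogeny counts above.
ASSIGNMENTS = {
    "A": {"Guayas": 10, "Esmeraldas": 2, "Manabí": 1, "Santa Elena": 1, "Los Ríos": 3},
    "D": {"Guayas": 3, "Esmeraldas": 2, "Manabí": 1},
}

# Samples per lineage and overall.
totals = {lineage: sum(by_province.values()) for lineage, by_province in ASSIGNMENTS.items()}
n_samples = sum(totals.values())

# Lineage frequencies as percentages, rounded to one decimal place.
percent = {lineage: round(100 * count / n_samples, 1) for lineage, count in totals.items()}

print(n_samples, totals, percent)
```

This prints `23 {'A': 17, 'D': 6} {'A': 73.9, 'D': 26.1}`.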

Discussion
Human papillomavirus (HPV) is responsible for 4.5% of all human cancers, with CC being the most common. In 2020, CC was the fourth most frequent cancer among women worldwide, with approximately 604,000 new cases and 342,000 deaths [27]. A total of 222 HPV genotypes have been detected in humans [28], of which 12 are classified as carcinogenic: 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, and 59 [5]. The development of CC and precancerous lesions is directly related to infection with high-risk HPV, particularly the oncogenic genotypes HPV16 and HPV18 [5,7]. HPV16 is the genotype most commonly found in cervical precursor lesions and CC, with an odds ratio of over 300 [5]. Additionally, several epidemiological studies have shown that non-European variants of HPV16 (lineages B/C/D) are more strongly associated with high-grade cervical neoplasia and cancer than the European lineage (A) [12,13,16]. Mutations in the E6 and E7 genes may influence malignant transformation [10]. The E6 gene is located between nucleotides 104 and 559, whereas E7 is located between nucleotides 562 and 858 [29]. Mutations in the E6 gene are associated with cell cycle arrest and the absence of apoptosis, whereas strict conservation of the E7 gene is associated with cervical cancer [10,11].
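The odds ratio (OR) cited above compares the odds of exposure (here, HPV16 infection) among cases with the odds among controls in a 2 × 2 table: OR = (a·d)/(b·c). The Python sketch below computes it together with a Woolf (log-based) 95% confidence interval. The counts in the example are invented for illustration and are not data from this study or from [5], and the function assumes no empty cells (no continuity correction is applied).

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple:
    """Odds ratio and Woolf 95% CI for a 2x2 case-control table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Assumes all four cells are non-zero (no continuity correction applied).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for this simple estimator")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts, for illustration only.
estimate, low, high = odds_ratio(a=30, b=10, c=20, d=40)
print(f"OR = {estimate:.1f} (95% CI {low:.2f}-{high:.2f})")
```

With these toy counts the OR is 6.0; an OR above 300, as reported for HPV16, means the odds of HPV16 infection are several hundred times higher among cases than among controls.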
In this study, the most common variant was E6 350G (L83V), found in 82.6% (19/23) of samples. From an evolutionary perspective, the E6 350G SNP has arisen independently in different HPV16 lineages [12,13]. Consistent with previous reports [13], we found 350G in both lineage A (56.5%) and lineage D samples (26.1%). Interestingly, in vitro studies have shown that keratinocytes expressing E6 350G have a higher capacity for cell transformation than those expressing E6 350T, regardless of evolutionary origin [13], suggesting that this SNP is a potential molecular marker of cancer progression [10]. Another SNP found in the E6 gene was A532G (six samples). This synonymous substitution does not change the amino acid sequence of the E6 oncoprotein and was previously identified in HPV16 samples collected in Korea [30]; however, its biological significance is unknown.
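The two notations used above, nucleotide (e.g., 350G) and amino acid (e.g., L83V), are linked by the coding-region start coordinate: with E6 counted from nucleotide 104, each codon spans three consecutive positions. The Python sketch below (hypothetical helper name) maps a genome position to its codon number and codon position, assuming a plus-strand coding region. Whether a substitution is synonymous still depends on the actual reference codon, which this sketch does not look up.

```python
def codon_of(position: int, cds_start: int) -> tuple:
    """Map a 1-based genome position to (codon number, position in codon).

    Assumes a plus-strand coding region beginning at `cds_start`
    (104 for HPV16 E6 and 562 for E7, as stated in the text).
    """
    if position < cds_start:
        raise ValueError("position lies upstream of the coding region")
    offset = position - cds_start
    return offset // 3 + 1, offset % 3 + 1


E6_START, E7_START = 104, 562
for pos in (145, 335, 350, 532):
    codon, within = codon_of(pos, E6_START)
    print(f"E6 nt {pos}: codon {codon}, codon position {within}")
codon, within = codon_of(678, E7_START)
print(f"E7 nt 678: codon {codon}, codon position {within}")
```

Position 350 falls at the first position of codon 83 (L83V), 145 at the third position of codon 14 (Q14H), and 532 at the third position of codon 143, in line with its description as a synonymous change.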
Regarding the E7 oncogene, all identified SNPs were synonymous substitutions. Strict conservation of the 98 amino acids of E7 (the protein that disrupts Rb function) is critical for HPV16 carcinogenesis and has been identified as a risk factor in large worldwide studies [11]. Specifically, we identified the C678T and T749C polymorphisms in one sample. C678T was previously reported by Antaño et al. in 0.53% of samples from Mexico [31]. In addition, the C732T, C789T, and G795T polymorphisms were identified in five samples. These SNPs are also synonymous, are usually found linked, and form a pattern indicative of lineage D [12]; they have also been associated with an increased risk of CC [12,13]. In a study by Antaño et al. on E6 and E7 variants of HPV16 in women from southern Mexico, the combination of these three polymorphisms (E7-C732/C789/G795) was associated with a 3.79-fold increased risk of developing CC compared with wild-type E7 [31], a result probably attributable to lineage D. Furthermore, HPV16 lineage D variants have been associated with invasive CC in other Latin American countries [32].
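Because C732T, C789T, and G795T usually occur together, a sample can be screened for the complete three-SNP pattern rather than for each SNP separately. The Python sketch below is illustrative only: the per-sample SNP calls and sample names are hypothetical, and carrying the full pattern is treated as suggestive of lineage D, not as a lineage assignment (which in this study came from the E6 phylogeny).

```python
# E7 SNPs reported above as usually linked in lineage D: position -> alternate base.
LINEAGE_D_E7_PATTERN = {732: "T", 789: "T", 795: "T"}


def carries_pattern(snp_calls: dict, pattern: dict = LINEAGE_D_E7_PATTERN) -> bool:
    """True only if every SNP in `pattern` is present with the expected base."""
    return all(snp_calls.get(pos) == base for pos, base in pattern.items())


# Hypothetical per-sample calls (position -> observed alternate base).
samples = {
    "sample_1": {732: "T", 789: "T", 795: "T"},
    "sample_2": {678: "T", 749: "C"},
    "sample_3": {732: "T", 789: "T"},  # incomplete pattern, not flagged
}
flagged = [name for name, calls in samples.items() if carries_pattern(calls)]
print(flagged)
```

Requiring all three SNPs, rather than any one of them, reflects the linkage described above; a partial match is left unflagged for manual review.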
Finally, phylogenetic analyses based on the E6 region showed that the Ecuadorian sequences belonged to lineages A and D. These lineages have different origins and are distributed differently worldwide: lineage A is frequently found in North America and Europe, whereas lineage D is more frequently found in Asia and the Americas [14,16]. In this study, 26.1% of the samples belonged to lineage D, in contrast to the 6.6% reported for Quito, Ecuador, by Mejía et al. [33]. The higher prevalence of HPV16 lineage D may be related to factors such as the ethnic composition of the population. The distribution of this lineage in the coastal (littoral) region may therefore reflect historical links and migrations within the country, which deserves future study including human genetic markers [34].
Other studies of comparable size have also reported lower frequencies of lineage D than the present study. For example, a study of 38 cervical lesions from Mexico found a lineage D frequency of 5.3% (2/38) [35]. Another study from Mexico involving 20 healthy women and 21 cervical lesion/cancer cases found a lineage D frequency of 9.5% (considering only the case group for comparison) [36]. A study from Brazil of 20 cervical lesions (LSIL and HSIL) in HIV-negative women found a lineage D prevalence of 15% [37]. A larger case-control study from Argentina by Totaro et al. (2021) identified a lineage D frequency of 6% among 83 LSIL/HSIL/cancer samples [38]. Nevertheless, our sample size was small, and future studies with larger samples will help elucidate the epidemiology of HPV16 lineages in this population. Another limitation of our study is that the E6 gene was not sufficiently informative to identify sublineages or to recover the reference tree topology described by whole-genome analysis, in which lineages C and D share a common ancestor [15]. However, the E6 sequence was sufficient to recover supported phylogenetic clusters for classifying lineages A and D. In addition, we identified co-infection with lineages A and D in four samples. Recent studies using next-generation sequencing (NGS) have shown that such co-infections are surprisingly common [39], highlighting the importance of addressing this issue with additional methods in future research. Finally, the failure to obtain sequences from approximately 20% of the samples may be attributable to low sample quality and/or low viral load in the lesions analyzed, among other factors.
This study contributes to the genetic characterization of the HPV16 E6 and E7 oncogenes in samples from Ecuadorian women. The identification of well-described molecular markers of cancer progression, such as E7 protein conservation, HPV16 lineage D, and the E6 350G SNP, may help identify women at increased risk of developing CC.