The possible impact of novel mutations in human papillomavirus 52 on the infection characteristics

Human papillomavirus 52 (HPV52) infection is prevalent in the Chinese population, and variations in HPV52 correlate with oncogenicity. However, no specific variation in HPV52 has been reported to be associated with infection characteristics. In this study, we retrieved 222 isolates of E6 and L1 full-length genes from 197 Chinese women with HPV52 infection. After sequence alignment and phylogenetic tree construction, we found that 98.39 % of the collected variants belonged to sublineage B2 and that two variants displayed incongruence between the phylogenetic trees of E6 and L1. Analysis of the infection pattern showed that the presence of the C6480A/T mutation in the L1 gene was associated with single infection (P=0.01) and persistent infection (P=0.047) of HPV52, while the A6516G nucleotide change was associated with transient infection (P=0.018). Our data also indicated that the variations T309C in the E6 gene and C6480T and C6600A in L1 were more common in patients with high-grade cytology (P<0.05). One HPV52 breakthrough infection after vaccination was identified, which hinted at immune escape post-vaccination. Young coitarche age and non-condom usage were correlated with multiple infections. This study provides insight into the polymorphism of HPV52 and reveals the impact of variations in HPV52 on its infection characteristics.


INTRODUCTION
Human papillomavirus (HPV) has attracted extensive attention for its oncogenic potential, and 15 high-risk genotypes correlated with cervical cancer have been identified. The distribution of HPV genotypes varies among regions, as typified by the significantly high prevalence of HPV52 and HPV58 in Asia [1]. According to the latest report published by the Catalan Institute of Oncology (ICO) and the International Agency for Research on Cancer (IARC), the prevalence of HPV52 in women without lesions and in women with precancerous cervical lesions, including low-grade squamous intraepithelial lesion (LSIL) and high-grade squamous intraepithelial lesion (HSIL), ranked second only to HPV16 worldwide, especially in less developed regions [2]. In China, the prevalence of HPV52 in cervical LSIL even exceeded that of HPV16, and the overall genotype distribution in the total population showed an indistinguishable prevalence of HPV52 and HPV16 in western China [3,4].

Although research on HPV16/18 remains prominent, growing epidemiological evidence is pushing HPV52 to the centre stage of HPV research.
Diverse intratype HPV variants have been identified, with sequences deposited in the GenBank database. Based on tree topology and nucleotide sequence differences, the variants of the same genotype are further classified into lineages (1-10 % difference) and sublineages (0.5-1 % difference). Four lineages have been identified in HPV52, and lineages A, B, and C each branch into two sublineages. The variation between lineages leads to disparities in infectivity and carcinogenic capacity. The L1 capsid protein plays a major role in mediating the entry of the virus, thus initiating the infection, while the E6 and E7 proteins modulate the oncogenic pathway by interfering with p53 and Rb, contributing to tumorigenesis. The Asia-prevailing lineage B of HPV52 exhibited a higher potential to induce severe cervical lesions than lineage A [5]. In addition, specific mutations, including K93R (A379G) in HPV52 E6 and C632T (T20I) and G760A (G63S) in HPV58 E7, have been reported to contribute to precursor lesions of cancer [6,7]. The genetic variability of HPV poses a great challenge to disease prevention and treatment.
Most HPV infections are cleared spontaneously, whereas 10-20 % of infections persist latently [8]. At present, there is no generally recognized definition of persistent HPV infection. The most common definition is ≥2 consecutive positive HPV DNA tests (regardless of HPV type), with a minimum duration of HPV persistence of 6-12 months [9,10]. Persistent infection is closely correlated with the incidence and progression of cervical precancerous lesions and cancer. HPV E6 and E7 oncoproteins have been found to interfere with sensors, adaptors, and signalling molecules, thus altering innate immune pathways and contributing to persistent infection [11]. The role of HPV genomic polymorphisms in shaping infection is well illustrated by the tendency of non-European HPV16 variants to persist [12,13]. The variation T350G (L83V) in HPV16 E6 was found to be prevalent in the population with persistent infection, although this remains controversial because conflicting results have also been reported [14-17]. Similarly, HPV52 lineage C variants displayed slower clearance than lineage A [18], and a non-prototype long control region (LCR) was found to be correlated with HPV52 persistence [19]. However, no specific variation in HPV52 has been reported to shape the infection pattern. Cervical cytology and HPV testing are the foundation of cervical cancer screening, and cytological alteration is the first step of HPV-related malignant transformation. However, reports disagree on whether polymorphisms in the HPV52 genome contribute to cytological progression [7,20,21]. HPV prophylactic vaccines protect susceptible populations, but the occurrence of breakthrough infections is noteworthy [22,23]. HPV persistence and vaccine escape are the main concerns for cervical cancer prevention, and more clinical data are required for further research.
Considering the high prevalence of HPV52 in China and the lack of data and research on variants related to infection characteristics, we analysed the nucleotide sequences of prevailing HPV52 variants to investigate the association between variations and HPV infection status, as well as the severity of cervical lesions. In addition, the clinical features of women with persistent HPV infection and vaccine breakthrough infection were analysed to provide information for future investigation.

Study population and sample collection
From March 2021 to March 2022, a total of 197 HPV52-positive cervical samples were collected from the colposcopy clinic at the Gynaecology and Obstetrics Hospital of Fudan University. The study was approved by the Ethical Committee of the Gynaecology and Obstetrics Hospital of Fudan University, and informed consent was obtained from each patient. The patients were referred for colposcopy because of abnormal results of liquid-based cytology tests (LCT), HPV testing, or a history of cervical treatment. The cervical exfoliated cells were collected with a sampling brush and stored in cell preservation solution. Colposcopy was performed by colposcopists who had over 5 years of experience in colposcopic diagnosis. Punch biopsies under colposcopy were performed in suspicious areas based on acetowhite and Schiller tests. The histopathological diagnoses were made by two senior pathologists. The clinical features, including HPV genotype, HPV infection duration, LCT result, cervical histopathological result, age, medical history, geography, Ob/Gyn history, HPV vaccination history, age of coitarche, number of sex partners and contraception, were recorded. The cytologic results were classified as negative for

Impact Statement
Human papillomavirus (HPV) is a pathogen related to carcinogenesis in the cervix, anus, and oropharynx, with high mortality. Research on the phylogeny and variation of HPV has revealed associations with pathogenicity and infectivity. However, there are insufficient data for HPV52 compared to HPV16/18, and prevailing variants associated with infection characteristics have rarely been examined. We expanded the sequence data of HPV52 from the Chinese mainland and identified variants that might be related to cervical lesion severity and infection characteristics. Our study provides basic data for developing therapeutic protocols and vaccines for HPV infection in China. It also provides a theoretical basis and support for HPV recombination.

DNA extraction, PCR amplification and sequencing
Genomic DNA was extracted from the cervical exfoliated cells using the TIANamp Genomic DNA Kit (TIANGEN), eluted in 50 µl nuclease-free water, and stored at −80 °C. Type-specific primers were designed to amplify the full-length gene sequences of HPV52 E6 and L1. The primer sequences were as follows: HPV52 E6 gene, sense 5′-AGACCGAAACCGGTGTATATATATAGA-3′, anti-sense 5′-CCACACCATCTGTATCCTCCTCA-3′; HPV52 L1 gene, sense 5′-TCCATTGAGTCAGGTCCTGACAT-3′, anti-sense 5′-ACATGCAAACAACACAGTACACACA-3′. PCR was performed in a 50 µl reaction volume containing one unit of TaKaRa Ex Taq (Takara, Japan), 1× Ex Taq Buffer (Mg2+ plus), 200 µM of dNTP Mixture and 20 pmol of each primer. The thermal cycling parameters were 98 °C for 10 s, 55 °C for 30 s and 72 °C for 40 s (E6)/90 s (L1), with a final extension at 72 °C for 7 min. PCR amplicons were separated on 1.2 % agarose gels and visualized by YeaRed Nucleic Acid Gel Stain under UV transillumination. A positive control of HPV52 plasmid and a negative control without template DNA were included in each set of PCR. After purification, DNA amplicons were sequenced on an ABI 3730xl DNA Analyzer using the same PCR primers. PCR was performed in duplicate for all samples, and amplicons were sequenced from both directions to exclude PCR artefacts; consensus sequences were obtained with a Phred Quality Score higher than 20.
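The Phred threshold mentioned above can be made concrete with the standard relationship between a Phred score Q and the base-calling error probability, P = 10^(−Q/10). The following is a minimal sketch; the function names are illustrative, and applying the threshold to every base of a read is an assumption, since the text does not state how the score was aggregated.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def passes_threshold(qualities, minimum=20):
    """Return True if every base has a Phred score strictly above `minimum`.

    Per-base filtering is an assumption made for this sketch.
    """
    return all(q > minimum for q in qualities)


if __name__ == "__main__":
    # A score of 20 corresponds to a 1 in 100 chance of an incorrect call.
    print(phred_to_error_probability(20))
    print(passes_threshold([38, 41, 25]))
```

Under this relationship, requiring scores above 20 limits the expected error rate to less than 1 % per base.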

Phylogenetic analysis and variant identification
The phylogenetic analysis was conducted in MEGA v11.0 following the guideline for phylogenetic reconstruction [24]. The HPV52 standard sequences of each lineage and sublineage (A1: X74481, A2: HQ537739, B1: HQ537740, B2: HQ537743, C1: HQ537744, C2: HQ537746, D: HQ537748) were downloaded from the GenBank database, NCBI. The reference genomes for HPV52 lineages and sublineages followed the publication of Burk et al. [25]. All obtained sequences (98 HPV52 E6 sequences and 124 L1 sequences) were aligned with the HPV52 prototypes using ClustalW and MUSCLE, and pairwise distances were calculated with the bootstrap method to consolidate items with the same sequence and to estimate the p-distance. Phylogenetic trees were constructed from the nucleotide sequences using the maximum likelihood method in MEGA 11.0 with 1000 bootstrap replicates, and the pairwise comparison was visualized in Microsoft Excel.
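As an illustration of the p-distance used above, the sketch below computes the proportion of differing sites between two aligned sequences and maps it onto the lineage (1-10 %) and sublineage (0.5-1 %) ranges quoted in the Introduction. It is not the MEGA implementation: the function names, the skipping of gap columns, and the assignment of the 1 % boundary to the lineage level are all assumptions, and applying the ranges to a single gene is for illustration only.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped,
    which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def classify_difference(distance):
    """Map a p-distance onto the ranges quoted in the Introduction."""
    percent = 100 * distance
    if percent > 10:
        return "beyond intratype range"
    if percent >= 1:  # boundary handling is an assumption
        return "lineage-level"
    if percent >= 0.5:
        return "sublineage-level"
    return "within sublineage"


if __name__ == "__main__":
    # Toy sequences, not HPV52 data: 1 difference over 8 sites.
    d = p_distance("ACGTACGT", "ACGTACGA")
    print(d, classify_difference(d))
```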

Statistical analysis
The independent-samples t-test was applied to compare continuous variables between two groups, and Levene's test was used to test the homogeneity of variance. The Mann-Whitney rank-sum test was used to evaluate the difference in histopathology between two groups. Pearson's χ² test and Fisher's exact test were employed to evaluate the distribution of HPV52 mutations by HPV infection status and disease severity. P<0.05 was considered statistically significant. Statistical analysis was conducted in SPSS v25.0 and visualized in Microsoft Excel.

Demographic characteristics of the population and risk association with HPV infection patterns
As shown in Table S2, 130 subjects were each divided into three group pairs based on the pattern of HPV infections to clarify the association with demographic characteristics. Younger coitarche age (P=0.021) and a lower proportion of condom usage (P=0.045) were observed in the multiple infection population. An older average age (P=0.023) in the persistent infection population and a higher proportion of multiple infections (P=0.002) in the genotype alteration group were also notable.
A total of ten women were vaccinated with the HPV prophylactic vaccine. Five women received the nonavalent vaccine, four received the quadrivalent vaccine, and one received the bivalent vaccine. All the vaccinated women received their vaccination after their coitarche. Compounding the situation, five women were HPV-positive pre-vaccination, and three women did not receive HPV testing before vaccination, so their pre-vaccination status was unclear. Two patients were negative for HPV before vaccination, one of whom received the quadrivalent vaccine and was infected with HPV52 9 months after vaccination completion. The other patient had a 2 month history of HPV39 and HPV52 co-infection with cytological NILM, and finished three-dose nonavalent vaccination at the age of 26. Interestingly, one woman had HPV58 infection before vaccination, which turned negative, and subsequently became HPV52 positive after the inoculation.

Variations of E6 and L1 genes
From the 130 patients with complete clinical records, 98 E6 and 124 L1 full-length sequences were retrieved. The gap between samples and isolates may be explained by low viral titre or unstable amplicons. We obtained 32 distinct L1 variants and 17 E6 variants of HPV52, all of which were submitted to the GenBank database and received accession numbers. Compared to the sublineage B2 prototype reference sequence (GenBank: HQ537743), 33.67 % (33/98) of the E6 isolates and 43.55 % (54/124) of the L1 isolates showed nucleotide mutations. Eight novel E6 variants and 11 novel L1 variants were also identified. A summary of nucleotide and amino acid sequence variation throughout the E6 and L1 fragments is shown in Figs 1 and 2, respectively.
In the E6 gene, 17 nucleotide substitutions were identified, including nine novel variations and nine non-synonymous substitutions resulting in amino acid changes. Two non-synonymous substitutions appeared more than once: G379A (R93K) and C467A (N122K). The synonymous nucleotide substitution G356A appeared in 17 isolates and was specific to lineage B. In the L1 gene, 45 nucleotide substitutions were identified, including 13 novel variations and 13 non-synonymous substitutions, of which T5606C (L5S), A6212C (N207T), and A6999C (E469D) appeared in more than one isolate. Synonymous substitutions C6480A/T were observed in 21 L1 isolates, among which 13 were C6480A and eight were C6480T. In addition, the substitution at nucleotide 6480 was only seen in the B2 sublineage. Another synonymous substitution, G5799A, appeared in 11 isolates and also occurred in the prototype sequences of lineage A and sublineage C2. Four isolates had more than two substitutions after being aligned with the reference sequence.
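The distinction between synonymous and non-synonymous substitutions drawn above can be sketched by translating a codon before and after a single-base change with the standard genetic code. The codons in the example are hypothetical and are not taken from the HPV52 reference sequence; the function name is likewise illustrative.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change within one codon.

    `position` is the 0-based index of the changed base inside the codon.
    Returns (amino acid before, amino acid after, classification).
    """
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    before = CODON_TABLE[codon]
    after = CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "non-synonymous"
    return before, after, kind


if __name__ == "__main__":
    # Hypothetical codons chosen only to show both outcomes.
    print(classify_substitution("AGA", 1, "A"))  # arginine codon to lysine codon
    print(classify_substitution("CTG", 2, "A"))  # leucine codon stays leucine
```

A change that leaves the encoded amino acid unchanged, as in the second call, is what the text refers to as a synonymous substitution.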

Phylogenetic tree
As shown in Fig. 3, maximum likelihood phylogenetic trees and pairwise comparisons based on L1 and E6 were inferred from the obtained HPV52 variants and seven reference sequences. Lineages A, B, C, and D were marked in red, blue, green and purple, respectively. In total, 98.39 % (122/124) of variants belonged to sublineage B2, with 56.45 % (70/124) of L1 isolates sharing the same sequence as the B2 prototype. The remaining two variants belonged to sublineage C2. No isolates clustered in the branches of lineages A and D, and no novel lineage or sublineage was found based on the phylogenetic tree and pairwise distance comparison. Of the 98 E6 isolates obtained, 66.33 % (65/98) shared the same sequence as the lineage B prototype, as the E6 genes of sublineages B1 and B2 are identical. Consistent with the L1 gene, 96.94 % (95/98) of E6 isolates belonged to lineage B. However, variants from two samples displayed incongruence between the phylogenetic trees of E6 and L1, as illustrated in bold. They belonged to sublineage B2 in the phylogenetic tree based on the L1 sequence but to sublineage A2 in the E6 tree.
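The proportions reported in this and the preceding section can be recomputed from the stated counts. The short sketch below does so; the labels and helper name are illustrative, while the counts and percentages are those given in the text.

```python
def percent(count, total):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * count / total, 2)


# (label, count, total, percentage reported in the text)
REPORTED = [
    ("E6 isolates with mutations", 33, 98, 33.67),
    ("L1 isolates with mutations", 54, 124, 43.55),
    ("L1 isolates in sublineage B2", 122, 124, 98.39),
    ("L1 isolates identical to B2 prototype", 70, 124, 56.45),
    ("E6 isolates identical to lineage B prototype", 65, 98, 66.33),
    ("E6 isolates in lineage B", 95, 98, 96.94),
]

if __name__ == "__main__":
    for label, count, total, reported in REPORTED:
        matches = abs(percent(count, total) - reported) < 1e-9
        print(f"{label}: {count}/{total} = {percent(count, total)} % (matches: {matches})")
    # Mutated and prototype-identical isolates should add up to the totals.
    print(54 + 70 == 124, 33 + 65 == 98)
```

The counts are internally consistent: isolates carrying mutations and isolates identical to the prototype sum to the total number of isolates for each gene.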

Distribution of E6 and L1 variation in different HPV infection patterns and cervical lesions
As seen in Table 1, the presence of T309C in E6 variants exhibited a significant difference in the distribution of cytology compared to other variations (P=0.043), but no correlation was shown, as the confidence interval of the odds ratio included 1 (OR=1.333, 95 % CI: 0.757-2.348). There were no differences in cytology distribution for other E6 variations (P>0.05). The variations in E6 showed no correlation with multiple or persistent HPV infections or with the histopathological result of cervical biopsy (P>0.05).
Table 2
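To illustrate how a P value and an odds ratio with its confidence interval are obtained from a 2×2 table of variant presence against outcome, a minimal standard-library sketch follows. The counts are hypothetical and are not the study data. The Woolf (log) interval is an assumption of this sketch; the text does not state which interval method SPSS applied.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(
        p
        for p in (probability(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) 95 % confidence interval.

    Requires all four cells to be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * standard_error)
    upper = exp(log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: rows = variant present/absent,
    # columns = high-grade/other cytology.
    table = (3, 1, 1, 3)
    print(fisher_exact_two_sided(*table))
    print(odds_ratio_with_ci(*table))
```

When the confidence interval spans 1, as in this toy table, the odds ratio does not support an association even if the point estimate is above 1.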

DISCUSSION
The gene sequence of HPV is the basis of its infectivity and pathogenicity. This study investigated the correlation between HPV52 variations and HPV infection characteristics, in addition to lesion severity. We identified 17 variations in the E6 gene and 45 variations in the L1 gene. The presence of C6480A/T in the L1 gene was found to be associated with single and persistent infection, while A6516G was associated with transient infection. The presence of C6480T in L1 showed a positive correlation with high-grade cytology. In terms of infection characteristics, we found that young coitarche age and non-condom usage were correlated with multiple infection, and multiple infection seemed to be correlated with genotype alteration.
We retrieved 124 L1 sequences and 98 E6 sequences from 130 cervical samples. The phylogenetic tree based on the L1 gene is the generally accepted genotyping method. Among the obtained isolates, 98.39 % of variants belonged to sublineage B2 and 1.61 % to sublineage C2 based on the L1 tree. The prevalence of HPV52 sublineage B2 is consistent with previous publications, notably a large-scale study involving specimens from 14 sites worldwide, which revealed its dominance in Asia, where it reached 89.0 % [5]. The prevalence of sublineage B2 was further verified by two studies performed in Asia, one in Korea (91 isolates) [7] and one in southwest China (53 isolates) [26]. Our study analysed the lineage attribution of 124 isolates with the full-length L1 gene and corroborated the prevalence of sublineage B2 in China. Incongruence between the E6 and L1 genes in the phylogenetic tree was observed in two variants. One isolate was retrieved from a single infection with HPV52; the other was from a multiple infection with HPV52 and HPV39. Phylogenetic incongruence between early and late genes of alpha-HPV was identified decades ago and was most pronounced between E6 and L2 [27,28]. The incongruence revealed different evolutionary pathways, which might result from breakage of the gene, intensive selection, and recombination [29]. The actual situation of the variants in our study cannot be confirmed because we did not obtain the whole genome, but the finding points to the possibility of HPV recombination [30].
In this study, the proportion of polymorphic nucleotides coincided with previous reports on Alpha-9 species. The proportion of variable nucleotide sites was estimated as 4.4 % across the 7993 nt HPV52 genome by Chen et al. [31], and ranged from 2.5-4.5 % for E6 and 2.20-3.8 % for L1 in various reports [6,7,26,32]. The percentages of variable nucleotide positions across HPV16 and HPV18 showed similar results [33,34]. We also identified nine novel variations in E6 and 13 in L1, which have not been reported. We did not detect any alteration of protein secondary structure associated with the non-synonymous nucleotide substitutions in E6 and L1. Most substitutions in the L1 gene were synonymous. The four variations in L1 with the most frequent occurrences were C6480A, G5799A, C6490T, and K6516G. The synonymous nucleotide substitution G356A in E6 was the most frequent variation compared to prototype B and appeared specifically in lineage B. Although synonymous substitutions do not alter the protein sequence, synonymous codon mutations have been shown to affect transcription modifications, translation, RNA folding and splicing, thus impacting cellular processes [35]. In addition, the synonymous substitutions T309C in E6 and C6600A in L1 were found in samples with high-grade cytology. Since the two variations each occurred only once, the evidence for their correlation with the severity of cytological results is insufficient. However, the substitution C6480T in L1 exhibited a significantly positive correlation with high-grade cytology, which has not been identified previously. Synonymous changes in L1 could affect a viral regulatory element, since the L1 sequence is not exclusively a coding sequence. Those synonymous mutations may alter pathogenicity by modulating transcription and RNA modification, which merits further study.
Fig. 3. Phylogenetic tree of the HPV52 variants. Maximum likelihood trees (constructed with MEGA 11.0) of E6 (a) and L1 (b) nucleotide sequences were inferred from the obtained HPV52 variants and seven reference sequences. The numbers below branches indicate bootstrap values. Lineages A, B, C, and D were marked in red, blue, green and purple, respectively. The bold variants were sequenced from two samples that shared the same E6 sequence and showed phylogenetic incongruence between the E6 and L1 trees.
The association analysis of clinical features showed that early coitarche and non-condom usage might be risk factors for multiple infections, while multiple infections were not related to persistent infection or the severity of the cervical lesion. This finding contrasts with reports that multiple HPV infections are associated with infection persistence and the incidence of precancerous lesions [36,37]. However, some publications demonstrated that multiple infections with non-16/18 hrHPVs did not increase the incidence of HSIL and cancer [38]. The number of subjects in our study was small, with only 15 women with HSIL and two with cancer, which limited the correlation analysis for lesion severity. During the analysis of variations, we incidentally found that the synonymous substitutions in E6, excluding A530G, all occurred in samples with HPV52 single infection, but no variation in E6 showed a correlation with HPV infection status. Nevertheless, the presence of C6480A in L1 variants was significantly correlated with HPV52 single infection, and L1 variants with a substitution at nucleotide 6480 were all from samples with single infections. Most previous studies focused only on variants from single HPV infections, and little research has examined the relationship between variations and multiple infections. The mechanism behind multiple infections is still controversial, and the possibility of synergistic and competitive interactions between different genotypes has been put forward [39,40]. Our study revealed the potential for autogenous regulation of HPV infection, which might contribute to its competitive advantage over other genotypes. HPV persistence contributes to the progression of cervical lesions and is a risk factor for cervical cancer. An association between persistent infection and older age was observed, which could be attributed to the decline in immune function. 
The link between variation and HPV persistence was reported by Aho et al., and a nonprototypic LCR variant was identified as the only independent predictor of HPV52 persistence, resulting in the loss of a binding site for a repressor of HPV expression [19]. Although a previous study suggested the irrelevance of L1 polymorphism to HPV52 persistence [32], our findings provided the possibility that variations in the L1 gene impacted HPV persistence. The variation C6480T served as a promoter for persistent infection, while A6516G acted as a protector, which prevented HPV persistence. The variations in HPV52 E6 showed no correlation with HPV persistence, which coincided with the previous publication [19].
Vaccine escape has become a hot topic, as SARS-CoV-2 infection after vaccination was common because of variation [41,42]. Ten vaccinated patients with HPV52 infection were found. All women received their first dose after coitarche, which has been shown to be associated with infection by vaccine types [22]. The patient whose HPV58 infection before vaccination was replaced by HPV52 infection after vaccination was an intriguing example of genotype alteration. We considered those who were negative for HPV before vaccination and became infected with vaccine-covered HPV genotypes after receiving vaccines as the vaccine escape population. Only one case could be identified as an HPV52 breakthrough infection in this study, and the woman developed multiple infection with the non-vaccine type HPV39 in addition to HPV52. Although a reduction in neutralizing titres was possible, her smoking habit might also confer a high risk of breakthrough infection.
Variations in HPV16/18 have been widely studied, and numerous sequences have been submitted to the GenBank database. By contrast, the number of HPV52 sequences in the collection is far smaller, although its prevalence in Asia is well recognized. In this study, we tripled the number of E6 and L1 sequences of HPV52 from mainland China in the GenBank database and provided valuable information on the genomic diversity of HPV52. However, there were some limitations. Firstly, most patients were from Shanghai and Jiangsu province, which made it difficult to analyse geographic correlation. Secondly, most variations occurred only once, so their clinical significance cannot be well interpreted. Further research will be required to investigate the concrete functions of those synonymous substitutions.

CONCLUSION
In the light of our findings, the synonymous C6480A/T variant in HPV52 L1 is correlated with single and persistent HPV52 infection, while A6516G is more frequently found in transient HPV52 infection. In addition, young coitarche age and non-condom usage are correlated with multiple infection. The study provides insights for future studies on epidemiology, phylogenetics, pathogenicity, HPV infection patterns, and HPV vaccine escape.

Funding information
This work was supported by the National Natural Science Foundation of China (Grant No. 82272970) and the Science and Technology Commission of Shanghai Municipality (Grant No. 21Y11906500; No. 22ZR1408800).