L1 and L2 gene polymorphisms in HPV-58 and HPV-33: implications for vaccine design and diagnosis

Background: Cervical cancer is associated with infection by certain subtypes of human papillomavirus (HPV). The L1 protein used in HPV vaccine formulations elicits high-titre neutralizing antibodies and confers protection against specific HPV subtypes. The HPV L2 protein is an attractive candidate for cross-protective vaccines. HPV-33 and HPV-58 are highly prevalent among Chinese women.

Methods: To study the intratypic variations and polymorphisms of HPV-33 and HPV-58 L1/L2 in Sichuan, China, the HPV-33 and HPV-58 L1 and L2 genes were sequenced and compared with other sequences submitted to GenBank. Phylogenetic trees were constructed by the maximum-likelihood method with the Kimura 2-parameter model (MEGA 6). The secondary structure was analyzed with PSIPred, and HPV-33 and HPV-58 L1 homology models were created with SWISS-MODEL. The selection pressures acting on the L1/L2 genes were estimated with PAML 4.8.

Results: Among the 124 HPV-33 L1 sequences, 20 single nucleotide mutations were observed, including 8/20 non-synonymous and 12/20 synonymous mutations. The 101 HPV-33 L2 sequences included 12 single nucleotide mutations, comprising 7/12 non-synonymous and 5/12 synonymous mutations. The 223 HPV-58 L1 sequences included 32 single nucleotide mutations, comprising 9/32 non-synonymous and 23/32 synonymous mutations. The 201 HPV-58 L2 sequences included 26 single nucleotide mutations, comprising 9/26 non-synonymous and 17/26 synonymous mutations. Selective pressure analysis showed that most of the common non-synonymous mutations were under positive selection. HPV-33 and HPV-58 L2 were more stable than HPV-33 and HPV-58 L1.

Conclusions: HPV-33 and HPV-58 L2 were better candidates as clinical diagnostic targets than HPV-33 and HPV-58 L1. Clinical diagnostic probes and second-generation polyvalent vaccines should be designed on the basis of the unique sequence variations of HPV-33 and HPV-58 L1/L2 in Sichuan, to improve the accuracy of clinical detection and the protective efficiency of vaccines.

Electronic supplementary material: The online version of this article (doi:10.1186/s12985-016-0629-9) contains supplementary material, which is available to authorized users.


Background
Human papillomavirus (HPV) infection plays a critical role in the development of cervical cancer [1]. The risk of developing cervical cancer is 50-fold higher in HPV-infected women than in uninfected women [1,2]. Approximately 500,000 new cases of cervical cancer are diagnosed every year, with 250,000 deaths; more than 85 % of all patients live in low-income countries [3,4].
The HPV genome is packaged within a capsid formed by the major late capsid protein L1 and the minor capsid protein L2 [12,13]. Five L1 proteins form a pentamer, and 72 pentamers constitute the virus capsid. The L1 and L2 proteins self-assemble into virus-like particles (VLPs) that induce high levels of neutralizing antibodies and are highly protective [14,15]. L1-VLPs are the components used in the design of specific prophylactic vaccines. Vaccines targeting L1 prevent infection only by specific HPV subtypes, because cross-protective epitopes are lacking across different HPV subtypes. The HPV L2 protein also induces neutralizing antibodies; the N-terminus of L2 contains cross-protective epitopes and is a target of neutralizing antibodies [14]. Therefore, targeting L2 may be an attractive approach for a candidate vaccine.
Data on HPV-58 L2 and HPV-33 L1/L2 in China are limited, and the molecular variants of HPV-33 L1/L2 worldwide are not widely reported. Ethno-geographical variations are observed in the distribution of HPV subtypes. Some HPV subtypes and variants can acquire biological advantages through mutations fixed in their genomes, and even small variations could yield small adaptive improvements that alter the composition of an HPV-infected population [17]. Altered amino acid composition affects the host immune response, and in such cases, intra-type protection may be less effective [18]. Ideally, diagnostic tools and vaccine constructs should therefore be developed locally.
This study investigated the gene polymorphisms and intratypic variations of HPV-33 and HPV-58 L1/L2 in Sichuan, China. It provides essential data for future research on viral prevention and therapeutics. Above all, it provides critical data facilitating the development of diagnostic probes and the design of vaccines based on HPV-33 and HPV-58 L1/L2.

Study population and specimen collection
Cervical specimens were collected from Sichuan Reproductive Health Research Center Affiliated Hospital, The Angel Women's and Children's Hospital, The Chengdu Western Hospital Maternity Unit, and The Peoples' Hospital of Pengzhou, and other institutions. Between January 1, 2009, and December 31, 2014, women presenting for cervical screening underwent histology and cytology evaluations for cervical disease. Women over 14 years of age and with visible cervical lesions and/or HPV-related diseases (e.g., cervicitis, cervical intraepithelial neoplasia) were eligible for inclusion. Cervical specimens were collected from participants and placed in a preservative buffer and stored at −20°C.

PCR amplification and sequencing
The entire L1 and L2 genes of HPV-33 and HPV-58 were amplified using specific primer pairs. The primers were designed according to the GenBank reference sequences for HPV-33 (GenBank: M12732.1) and HPV-58 (GenBank: D90400). The primer sequences are listed in Table 1. Each 50 μL PCR reaction contained 5 μL of extracted DNA (10-100 ng), 200 μmol MgCl2 and dNTPs, 2 U of Pfu DNA polymerase (Sangon Biotech, Shanghai, China), and 0.25 μmol of each primer. The PCR conditions were 95°C for 10 min; 35 cycles of 94°C for 50 s, 54°C (varied for each gene) for 60 s, and 72°C for 60 s; and a final step of 72°C for 7 min. The PCR amplification products were visualized on 2 % agarose gels stained with GeneGreen nucleic acid dye under ultraviolet light (WFH-202). Target products were sequenced by Sangon Biotech.

Variant identification and analysis
The sequences and variations were analyzed with NCBI BLAST and DNAMAN version 5.2.2. The nucleotide positions were numbered according to the GenBank reference sequences of HPV-33 (GenBank: M12732.1) and HPV-58 (GenBank: D90400). All data were confirmed by repeating the PCR amplification and sequence analysis at least twice.
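The comparison against a reference sequence can be illustrated with a minimal sketch that labels each single nucleotide change in an aligned coding sequence as synonymous or non-synonymous. This is not the NCBI BLAST/DNAMAN workflow used in the study; the function name `classify_snvs`, the toy sequences, and the numbering relative to the start of the coding sequence are assumptions made for the example, whereas the study numbers positions along the full reference genomes.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snvs(ref_cds, var_cds):
    """List single nucleotide changes between two aligned, gap-free coding sequences.

    Returns tuples (name, kind, aa_change), e.g. ("A5G", "non-synonymous", "K2R").
    Positions are 1-based and relative to the start of the coding sequence
    (a hypothetical convention chosen for this sketch).
    """
    if len(ref_cds) != len(var_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free and in frame")
    ref_cds, var_cds = ref_cds.upper(), var_cds.upper()
    changes = []
    for i, (r, v) in enumerate(zip(ref_cds, var_cds)):
        if r == v:
            continue
        start = i - i % 3  # first base of the affected codon
        ref_aa = CODON_TABLE[ref_cds[start:start + 3]]
        # The full variant codon is translated, so two changes in the same
        # codon are judged by their combined effect.
        var_aa = CODON_TABLE[var_cds[start:start + 3]]
        name = f"{r}{i + 1}{v}"
        if ref_aa == var_aa:
            changes.append((name, "synonymous", None))
        else:
            codon_number = start // 3 + 1
            changes.append((name, "non-synonymous", f"{ref_aa}{codon_number}{var_aa}"))
    return changes


# Toy example (not study data): a Met-Lys-Gly reference with one change of each kind.
reference = "ATGAAAGGT"
variant = "ATGAGAGGC"
for change in classify_snvs(reference, variant):
    print(change)
```

The sketch only accepts unambiguous bases (A, C, G, T); ambiguity codes such as N would need separate handling.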

Phylogenetic trees analysis
Phylogenetic trees of the HPV-33 L1/L2 and HPV-58 L1/L2 variation patterns were constructed by the maximum-likelihood method under Kimura's two-parameter model using MEGA (Molecular Evolutionary Genetics Analysis) version 6. The tree topologies were evaluated by bootstrap resampling with 1,000 replicates [19].
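For readers unfamiliar with Kimura's two-parameter model, the sketch below computes the corresponding pairwise distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It only illustrates the substitution model; it does not reproduce the maximum-likelihood tree search or bootstrap analysis performed in MEGA 6, and the function name and toy sequences are assumptions for the example.

```python
import math


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    purines = {"A", "G"}
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if (a in purines) == (b in purines):
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq_a)
    q = transversions / len(seq_a)
    # math.log raises ValueError when the sequences are too divergent
    # for the distance to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example (not study data): one transition and one transversion in 10 sites.
print(round(k2p_distance("AAAAACCCCC", "GAAAACCCCA"), 4))
```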

Analysis of the selection pressures and secondary structure
To identify positive selection at particular sites of the HPV-33 and HPV-58 L1/L2 gene sequences, the codeml program in the PAML (Phylogenetic Analysis by Maximum Likelihood) version 4.8 package was used to perform likelihood ratio tests (LRTs), and non-synonymous and synonymous nucleotide divergence in coding regions was inferred by the method of Nei and Gojobori [20-23]. The secondary structure of the reference sequences was analyzed with the PSIPred server (http://bioinf.cs.ucl.ac.uk/psipred).
PSIPred is a simple and accurate secondary structure prediction method that incorporates two feed-forward neural networks, which analyze the output obtained from PSI-BLAST (Position Specific Iterated-BLAST). A very stringent cross-validation of the method indicated that PSIPred 3.2 attained an average Q3 score of 81.6 % [24].
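The likelihood ratio tests mentioned above compare the log-likelihoods of two nested models. The minimal sketch below assumes that the models differ by two free parameters, as is the case, for example, when codeml site models M7 and M8 are compared; the text does not state which model pairs were used, so this choice, the function name, and the input values are assumptions. With two degrees of freedom, the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by two free parameters.

    Returns (statistic, p_value), where statistic = 2 * (lnl_alt - lnl_null).
    For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    """
    # Small negative differences can arise from numerical optimization;
    # they are treated as no improvement.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, not values from this study.
stat, p = lrt_two_df(lnl_null=-2510.40, lnl_alt=-2504.15)
print(f"2*delta_lnL = {stat:.2f}, p = {p:.4f}")
```

A small p-value favours the model that allows positive selection; individual sites are then examined through their posterior probabilities.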

Results
Of all the HPV-58 and HPV-33 samples, only 223 sequences of the HPV-58 L1 gene, 201 sequences of the HPV-58 L2 gene, 124 sequences of the HPV-33 L1 gene, and 101 sequences of the HPV-33 L2 gene were obtained, owing to the low HPV copy number in some women and the limited amplicons obtained for sequencing. There may also be a sampling bias against integrated HPV genomes that have lost capsid genes.

Gene polymorphism of HPV-33 L1
Compared with the HPV-33 reference sequence (GenBank: M12732.1), the nucleotide variation rate of HPV-33 L1 was 68.55 % (85/124) in the 124 HPV-33 L1 sequences studied. We identified 20 single nucleotide changes among the 124 sequences studied. Specifically, 12/20 (60.00 %) were synonymous and 8/20 (40.00 %) were non-synonymous mutations. Only one non-synonymous mutation was observed in a sequence encoding a helix. The detected mutations are summarized in Table 2. The maximum-likelihood phylogenetic tree is shown in Fig. 1a. The secondary structure prediction for HPV-33 L1 is shown in Additional file 1: Figure S1 (A).
The secondary structure prediction for HPV-33 L2 is shown in Additional file 1: Figure S1 (B).

HPV-58 L1 gene polymorphism
Compared with the HPV-58 reference sequence (GenBank: D90400), the nucleotide variation rate of HPV-58 L1 was 96.86 % (216/223) in the 223 HPV-58 L1 sequences studied. We identified 32 single nucleotide changes among the 223 sequences studied. Specifically, 23/32 (71.88 %) were synonymous mutations and 9/32 (28.12 %) were non-synonymous mutations. Three non-synonymous mutations were observed in sequences encoding helices, and one was observed in a sequence encoding a sheet. The detected mutations are summarized in Table 4. The maximum-likelihood phylogenetic tree is shown in Fig. 2c. The secondary structure prediction for HPV-58 L1 is shown in Additional file 2: Figure S2 (C).
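The variation rates and mutation proportions quoted for L1 are simple ratios of counts. As a quick check, the sketch below recomputes them from the counts given in the text; the helper name `percent` and the dictionary layout are illustrative choices only.

```python
def percent(count, total):
    """Proportion of count in total, expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Counts as reported in the text for HPV-33 L1 and HPV-58 L1.
reported = {
    "HPV-33 L1 variant sequences": (85, 124),
    "HPV-33 L1 synonymous changes": (12, 20),
    "HPV-33 L1 non-synonymous changes": (8, 20),
    "HPV-58 L1 variant sequences": (216, 223),
    "HPV-58 L1 synonymous changes": (23, 32),
    "HPV-58 L1 non-synonymous changes": (9, 32),
}

for label, (count, total) in reported.items():
    print(f"{label}: {count}/{total} = {percent(count, total):.2f} %")
```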

HPV-58 L2 gene polymorphism
Compared with the HPV-58 reference sequence (GenBank: D90400), the nucleotide variation rate of HPV-58 L2 was 68.55 % (168/201) in the 201 HPV-58 L2 sequences studied. We identified 26 single nucleotide changes among the 201 sequences studied, of which 17/26 were synonymous and 9/26 were non-synonymous mutations. The detected mutations are summarized in Table 5. The maximum-likelihood phylogenetic tree is displayed in Fig. 2d. The secondary structure prediction for HPV-58 L2 is shown in Additional file 2: Figure S2 (D).

Note: M12732 was used as reference. Nucleotides conserved with respect to the reference sequence are marked with a dash (-), whereas a variant position is indicated by a letter. Predicted amino acid changes are also shown. In the last row of the table, "S" means sheet and "H" means helix.

Note: D90400 was used as reference. Nucleotides conserved with respect to the reference sequence are marked with a dash (-), whereas a variant position is indicated by a letter. Predicted amino acid changes are also shown. In the last row of the table, "S" means sheet and "H" means helix.

Table 6 Site-specific tests for positive selection on HPV-33 L1

We demonstrated that the variation frequencies of HPV-58 L1 and L2 were higher than those of HPV-33 L1 and L2. Among these variations, C5807A, A5822G, G5984A, A6437G, T6470C, T6485C and A6695C (E377D, a positively selected variation) were novel HPV-58 L1 mutations that have so far been found only in Sichuan, China [15,17,28-32]. In HPV-58 L2, all mutations other than A4621C and A5206G were newly reported [15]. In HPV-33 L1, T5960C, A5997G (K135R), T6385G, G6396A (G268E), T6463C, G6520A, T6613C, A6694G, A6951C (K453T), C7044A (P484H) and G7063A were reported for the first time; among reports on HPV-33 L1, these newly reported mutations were found only in China [30,33,34]. We reported the HPV-33 L2 mutations for the first time.

Owing to the high diagnostic value of L1 and its variability [35], L1 is often selected as a clinical diagnostic target. We considered most of the common mutation sites for the design of clinical diagnostic probes targeting the HPV L1 and L2 genes. The intratypic variations observed in L1 and L2 enabled the analysis of known and novel HPV subtypes [36,37]. In our study, the sequence patterns and single nucleotide changes of HPV-33 L2 were less frequent than those of HPV-33 L1.
Likewise, the sequence patterns and single nucleotide changes of HPV-58 L2 were less frequent than those of HPV-58 L1. L2 was thus more conserved than L1 in both HPV-33 and HPV-58, suggesting that HPV-33 and HPV-58 L2 are better candidates as clinical diagnostic targets than HPV-33 and HPV-58 L1.
Nearly all conformational epitopes are located on one or more of the outwardly facing surface-exposed loops BC, DE, EF, FG, and HI [38]. Although we found no mutations at these Fab interaction sites, we did find several mutations (T56N, K135R, T266K, and G268E of HPV-33 L1; N82T of HPV-58 L1) next to these sites. We believe mutations occurred on the outwardly facing surface-exposed loops ...

Note to Tables 6, 7, 8 and 9: ln L, the log-likelihood of each model; 2Δl, twice the log-likelihood difference between the two models. Positively selected sites were identified with posterior probability ≥ 0.9 using the Bayes empirical Bayes (BEB) approach. One asterisk indicates posterior probability ≥ 0.95, and two asterisks indicate posterior probability ≥ 0.99. NA means not allowed. NS means sites under positive selection that did not reach the significance level of 0.9.

Amino acid residues 69-81 and 108-120 of the L2 protein are highly conserved and contain cross-reacting epitopes that play an important role in inducing neutralizing antibodies [14]. G4438A (D77N) of HPV-33 L2, and G4452A (G70E) and G4470A (S76N) of HPV-58 L2, were found in these conserved regions. Amino acid residues 33-52, 73-84, 89-100, and 121-140 of L2 contain non-neutralizing antibody epitopes [14]. G4438A (D77N) of HPV-33 L2 and G4470A (S76N) of HPV-58 L2 were found at residues 73-84. These mutations must be considered when designing vaccines targeting HPV-33 and HPV-58 L2.
This is the first study examining the L1/L2 proteins of HPV-58 variants in Sichuan and those of HPV-33 in China. Because of limitations related to sample size, sample copy number, and sequencing technology, the present study may have had a sampling bias against integrated HPV genomes. The data presented in this study have significant implications for understanding the intrinsic geographical and biological differences in HPV-33 and HPV-58 L1/L2, and contribute to the design of clinical diagnostic probes and second-generation polyvalent vaccines based on HPV-33 and HPV-58 L1/L2.

Conclusions
Mutations in HPV L1 and L2 may alter the virulence of variants, and also define altered epitopes in vaccine design. The reference sequences of HPV-33 and 58 only represent minor sequence patterns of HPV-33 and 58 L1/L2. Further, the distribution of HPV-33 and 58 L1/L2 variations in Sichuan has its own peculiarities. Therefore, clinical diagnostic probes and second-generation polyvalent vaccines should be designed on the basis of the unique sequence of HPV-33 and 58 L1/L2 variations in Sichuan, whereby the accuracy of clinical detection and the protective efficiency of vaccines can be improved.

Additional files
Additional file 1: Figure S1. Secondary structure of HPV-33 L1 and L2 proteins predicted by PSIPred. Note: A) Secondary structure within the reference sequence of HPV-33 L1 protein, B) Secondary structure within the reference sequence of HPV-33 L2 protein. A black arrow indicates that the corresponding mutation is a non-synonymous mutation. (TIF 948 kb)
Additional file 2: Figure S2. Secondary structure of HPV-58 L1 and L2 proteins predicted by PSIPred. Note: C) Secondary structure within the reference sequence of HPV-58 L1 protein, D) Secondary structure within the reference sequence of HPV-58 L2 protein. A black arrow indicates that the corresponding mutation is a non-synonymous mutation. (TIF 922 kb)
Additional file 3: Figure S3. Sequence alignment of L1 from the HPV types HPV16, HPV33, and HPV58. Note: The residues conserved across four HPV types are shown in capital letters, whereas the nonconserved residues are given in lowercase letters. The five loops displayed on the surface of the virus particle are marked and labeled. (TIF 1448 kb)