Polymorphism analysis and epitope prediction of Alphapapillomavirus 9 E6 in Sichuan, China

Alphapapillomavirus 9 (α-9 HPV) is a member of the genus Alphapapillomavirus in the family Papillomaviridae. Nearly all α-9 HPV types are carcinogenic: they are associated with 75% of invasive cervical cancers worldwide and are highly prevalent in Sichuan. Their carcinogenic function is mainly exerted through the E6 oncoprotein. Cervical cell samples were collected by cervical scraping for HPV detection and typing. Samples positive for five α-9 HPV types (HPV-16, HPV-31, HPV-33, HPV-52 and HPV-58) were selected, and their E6 genes were sequenced and analyzed. Positive selection sites in the HPV E6 genes were estimated with the PAML 4.8 server. The secondary and tertiary structures of the E6 protein were predicted with PSIPred and Swiss-model, and the T-cell antigen epitopes of the E6 protein were predicted with IEDB. From 2012 to 2017, 18,067 cervical cell samples were collected, and α-9 HPV infection was detected in 3135 of them, confirming the high prevalence of α-9 HPV in Sichuan, China. E6 was successfully amplified from 250 HPV-16, 96 HPV-31, 216 HPV-33, 288 HPV-52 and 405 HPV-58 positive samples, in which 17, 6, 6, 13 and 4 non-synonymous nucleotide mutations were detected, respectively. Seven positive selection sites were identified in α-9 HPV E6: D32E in HPV-16 E6; K35N, K93N and R145I in HPV-33 E6; K93R in HPV-52 E6; and K93N and R145K in HPV-58 E6. The structures and antigen epitopes of E6 proteins carrying amino acid substitutions differed from those of the wild-type E6 protein, especially for substitutions at positive selection sites. Non-synonymous mutations at the positive selection sites influence E6 protein structure and decrease the overall affinity of E6 antigen epitopes, making HPV-infected cells more difficult for the immune system to detect and enhancing the adaptability of HPV to its environment.
Mutations influence the validity of clinical HPV diagnostic probes. The polymorphism analysis of α-9 HPV E6 enriches the data on high-risk (HR) HPV in Sichuan, China, and detection probes designed with these polymorphism data in mind can improve the efficiency of clinical detection. Mutations also influence epitope affinity, so linking E6 polymorphism to epitope affinity can guide the design of therapeutic vaccines based on highly immunogenic, broadly applicable antigen epitopes. Together, these findings provide a theoretical basis for the prevention and treatment of HPV-related diseases.


Introduction
Cervical cancer is the second most common cancer among women aged …; 99.7% of cervical cancers have been found to be associated with persistent infection by high-risk (HR) human papillomavirus (HPV) [1]. The three main HPV genera are α, β and γ, of which the α genus is associated with anogenital and oral mucosal infection. Nearly all α-9 HPV types are carcinogenic, causing 75% of invasive cervical cancers worldwide, and their carcinogenicity is mainly exerted through the early proteins E6 and E7 encoded by the HPV E6 and E7 genes. The carcinogenic effect of the E6 protein is more evident than that of E7, both in altering the cell cycle and in the efficiency with which HPV-infected cells undergo permanent biochemical transformation [2,3]. Even without E7, E6 can bind p53 via E6AP and target it for ubiquitin-mediated degradation, interfere with the cell cycle, and activate telomerase reverse transcriptase, allowing mutations to accumulate and driving and maintaining the immortalization of infected cells; these activities are closely related to HPV-induced immortalization, cell transformation and carcinogenesis [4][5][6][7].
At present there is no specific drug for HPV infection, and clearance relies mainly on the body's immune system detecting and eliminating the virus. Human leukocyte antigen (HLA) molecules bind antigenic peptides and present them to CD8+ cytotoxic T lymphocytes (CTL) and CD4+ helper T lymphocytes (Th), stimulating these cells and thereby regulating the immune response to control and eliminate HPV infection [8][9][10][11]. Antigen epitopes are composed of specific amino acid sequences and are the targets of immune recognition [12][13][14]. The HPV E6 protein has been considered a potential target for T-cell activation in immune response strategies and may be an ideal target for therapeutic HPV vaccines [15,16].
HPV is a highly infectious and mutable virus whose epidemic trends and mutation types differ among regions and populations [17]. The HPV E6 oncogene is highly polymorphic, and its non-synonymous mutations change the amino acid composition of the E6 protein, which may relate to differences in immune response and pathogenicity [18]. Some mutations can even become fixed in viral strains, enhancing their adaptability to the environment and changing their infection rate [19]. For example, L83V (L90V) of HPV-16 E6 in Swedish and Italian populations and D25E (D32E) of HPV-16 E6 in Japanese populations have been shown to be associated with the progression of cervical cancer [20][21][22][23]. Current HPV vaccine designs targeting the E6 protein have almost never considered E6 variants. In the epitope-specific vaccine designed by Kelly L, a long-lasting T-cell response to HPV-18 was induced by targeting the reference sequences of HPV E6 and E7; however, because HPV mutates relatively rapidly, the host response, tumor prevention and therapeutic efficacy of such a vaccine can vary greatly [15].
HPV shows strong regional and population differences, the prevalence of α-9 HPV and the harmfulness of its E6 oncoprotein are extremely high, and E6 polymorphism is closely related to differences in immunogenicity, adaptability and pathogenicity. It is therefore urgent to study the genetic diversity, positive selection sites, antigen epitopes and protein structure of α-9 HPV E6, providing data for the effective prevention and control of HPV-related disease in this region.

Sample sources
The study was approved by the Education and Research Committee and the Ethics Committee of Sichuan University, Sichuan, China. A total of 18,067 specimens were randomly collected from January 2012 to December 2017 at Chengdu Women and Children's Center Hospital, Chengdu Jinjiang District Women and Children's Hospital, Angel Women's and Children's Hospital, Affiliated Hospital of Sichuan Reproductive Health Research Center, Sichuan Reproductive Health Research Center Affiliated Hospital, Shuangnan Hospital, Chengdu Song zi niao Sterility Hospital, Infertility Hospital Affiliated to Chengdu Medical College and Chengdu Jinsha Hospital. Written informed consent was obtained from all patients or their guardians before sample collection, and patient privacy was strictly protected. Cervical cell specimens were collected randomly by cervical scraping and placed in antiseptic buffer (9 g NaCl, 10 g C6H5CO2Na, 1 L H2O) at −20 °C.
Primers were designed with NCBI (National Center for Biotechnology Information) Primer-BLAST based on the reference sequences. The primers and reference sequences used for the molecular characterization of α-9 HPV E6 are shown in Additional file 1: Table S1; the primers were synthesized by TSINGKE (Chengdu, China). Each PCR reaction contained 5 µl HPV DNA, 13.1 µl ddH2O, 1 µl primers, 0.4 µl TransTaq DNA polymerase, 2.5 µl dNTPs and 3 µl buffer. The reaction conditions are shown in Additional file 1: Table S1. PCR products were visualized by electrophoresis on 2% agarose gels (Sangon Biotech Co., Ltd.). The target E6 products were purified and sequenced at least twice by TSINGKE (Chengdu, China).
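As a quick arithmetic check, the six components listed above add up to a 25 µl reaction. The sketch below totals them and scales every component except the template into a shared master mix. The 10% overage and the choice to dispense the template separately into each tube are assumptions for illustration, not details given in the protocol.

```python
# Per-reaction PCR volumes (µl) as listed in the text.
PCR_REACTION_UL = {
    "HPV DNA": 5.0,
    "ddH2O": 13.1,
    "primers": 1.0,
    "TransTaq DNA polymerase": 0.4,
    "dNTPs": 2.5,
    "buffer": 3.0,
}


def total_volume(components):
    """Total reaction volume in µl."""
    return sum(components.values())


def master_mix(components, n_reactions, overage=0.10, template="HPV DNA"):
    """Volumes (µl) of a shared master mix covering n_reactions.

    The template is assumed to be added per tube, so it is left out.
    `overage` is an assumed pipetting margin, not a value from the study.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in components.items() if name != template}
```

For 10 reactions with the assumed 10% overage, `master_mix(PCR_REACTION_UL, 10)` calls for 144.1 µl ddH2O and 220 µl of mix in total, i.e. 20 µl of mix plus 5 µl template per tube.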

Sequence analysis
Genetic polymorphism analysis of α-9 HPV E6 gene
The successfully amplified products were sequenced, and the sequences were analyzed with NCBI BLAST, Premier 5 and DNAMAN 5.2.2. Nucleotide mutations in the α-9 HPV E6 sequences were identified relative to the GenBank reference sequences (Additional file 1: Table S1). The chi-square test was used to assess the significance of differences, with P < 0.05 considered statistically significant.
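The text does not say how the contingency tables were built or whether a continuity correction was applied. As a minimal sketch, assuming an uncorrected Pearson chi-square test on a 2×2 table (1 degree of freedom), the statistic and P value can be computed with the standard library alone, using the identity P = erfc(√(χ²/2)) that holds for one degree of freedom. The counts in the usage note are invented for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square with 1 df: P(X >= stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

For a hypothetical table [[30, 70], [50, 50]], the statistic is 25/3 ≈ 8.33 and P < 0.01, so the difference would be called significant at the P < 0.05 level used in the study.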

Amino acid composition and protein structure analysis of α-9 HPV E6
Mega6.0 software was used to translate the E6 nucleotide sequences into E6 protein sequences. PSIPred (http://bioinf.cs.ucl.ac.uk/psipred/) and Swiss-model were used to analyze the secondary and tertiary structures of the E6 protein.
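The translation step can be mimicked in a few lines, which also shows where substitution labels such as K93N come from. This is a sketch, not MEGA's implementation: it assumes an in-frame coding sequence, the standard genetic code, and reference and variant proteins of equal length with no insertions or deletions. The toy sequences in the usage note are made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate an in-frame coding sequence up to (not including) the first stop codon.

    Codons containing ambiguous bases (e.g. N) become 'X'; a trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitutions(ref_protein, var_protein):
    """Label differences as <reference residue><1-based position><variant residue>, e.g. 'K93N'.

    Assumes the two proteins are already aligned with no insertions or deletions.
    """
    return [f"{ref}{pos}{var}"
            for pos, (ref, var) in enumerate(zip(ref_protein, var_protein), start=1)
            if ref != var]
```

For example, `translate("ATGAAACCCTAA")` gives "MKP"; changing AAA to AAT gives "MNP", and `substitutions("MKP", "MNP")` labels the change "K2N", the same notation used for the E6 substitutions in this study.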

T-cell antigen epitope prediction for α-9 HPV E6 protein
Based on the average HLA allele frequencies for the Chinese population in the major histocompatibility complex database (dbMHC), 13 HLA-I and 6 HLA-II alleles were selected (Additional file 1: Table S2). For these alleles, the T-lymphocyte epitopes of the α-9 HPV E6 protein were predicted with the IEDB resource (http://www.iedb.org/). Following the method recommended by IEDB, a lower percentile rank (PR) indicates higher binding affinity; peptides with PR < 1.0 for HLA-I and PR < 5.0 for HLA-II were considered meaningful and selected for further analysis.
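The selection rule above amounts to a class-specific threshold on percentile rank. A minimal sketch follows. The `Prediction` record layout is hypothetical (IEDB result files have their own column names, which vary by tool), so in practice the IEDB output would first be loaded into such records.

```python
from dataclasses import dataclass

# Cut-offs stated in the text: lower percentile rank (PR) means better predicted binding.
PR_CUTOFF = {"I": 1.0, "II": 5.0}


@dataclass
class Prediction:
    """One predicted peptide-HLA binding (hypothetical record layout, not IEDB's own format)."""
    allele: str
    hla_class: str          # "I" or "II"
    start: int              # 1-based start position in E6
    peptide: str
    percentile_rank: float


def select_epitopes(predictions):
    """Keep predictions strictly below their class cut-off, best (lowest PR) first."""
    kept = [p for p in predictions if p.percentile_rank < PR_CUTOFF[p.hla_class]]
    return sorted(kept, key=lambda p: p.percentile_rank)
```

Because the comparison is strict, an HLA-I peptide with PR exactly 1.0 is excluded, matching the "PR < 1.0" wording; an HLA-II peptide with PR 4.9 is kept.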

The protein structure analysis of α-9 HPV E6
Non-synonymous nucleotide mutations change the amino acid composition of a protein, which affects its structure, and protein function is mainly realized through structure. Using Mega6.0, PSIPred and Swiss-model, we compared the primary, secondary and tertiary structures of the α-9 HPV E6 reference and variant sequences. In HPV-16 E6, I34R, L35V, R62I, P66A and L90V are all located in β-sheets, whereas E120D, D32N and D32E are located at the periphery of the protein structure, close to the active region around the zinc ion. The number of amino acids in the α-helix and β-sheet regions differs between the reference and variant sequences (Figs. 2 and 3).
In HPV-33 E6, S74T and Q113R are located in α-helices and K93N lies on the outer edge of the protein near the zinc ion; all of these substitutions are in the active region of the protein. The substitutions changed the number of amino acids in the α-helix and β-sheet regions and increased the exposure of the E6 protein to its environment (Figs. 6, 7).
In HPV-52 E6, R77K and E89K are located in α-helices, N127I in the β-sheet region, and K93R on the outer edge of the protein close to the zinc ion; all of these substitutions are in the active region of the protein. The substitutions increased the number of amino acids in the α-helix and β-sheet regions and decreased the number of buried amino acids (Figs. 8, 9).
In HPV-58 E6, E32Q, D86E, K93N and R145K are located in coil regions; E32Q and K93N lie on the outer edge of the protein close to the zinc ion, within the active region of E6. The substitutions increased the number of amino acids in the α-helix region and decreased the number in the coil region (Figs. 10, 11).
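The helix, strand and coil counts compared above can be tallied directly from per-residue secondary-structure strings. The sketch assumes PSIPRED-style three-state labels (H helix, E strand, C coil) and reference and variant strings of equal length; the example strings in the usage note are toy data, not the E6 predictions.

```python
from collections import Counter

STATES = "HEC"  # PSIPRED three-state labels: helix, strand, coil


def ss_composition(ss):
    """Number of residues assigned to each state in a per-residue H/E/C string."""
    counts = Counter(ss)
    return {state: counts[state] for state in STATES}


def composition_change(ref_ss, var_ss):
    """Variant count minus reference count for each state."""
    ref, var = ss_composition(ref_ss), ss_composition(var_ss)
    return {state: var[state] - ref[state] for state in STATES}


def changed_positions(ref_ss, var_ss):
    """1-based residues whose predicted state differs between reference and variant."""
    if len(ref_ss) != len(var_ss):
        raise ValueError("reference and variant strings must have the same length")
    return [i for i, (a, b) in enumerate(zip(ref_ss, var_ss), start=1) if a != b]
```

For a toy reference "CCHHHHCCEEEC" and variant "CCHHHHHCEEEE", `composition_change` reports one more helix residue, one more strand residue and two fewer coil residues, and `changed_positions` returns [7, 12].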

Antigen epitope analysis of α-9 HPV E6 protein
HPV-16 E6
In the HPV-16 E6 reference sequence, 97 HLA-I and 25 HLA-II epitopes were selected, and the epitope predictions for the variants differed (Additional file 1: Tables S3 and S4). M1K increased epitope affinity; R17G, D32N, D32E, I34R, L35V, P66A, H85Y and L90V changed epitope number and affinity; … The effects of amino acid substitutions on HPV-16 E6 epitopes are summarized in Table 7.
Table 2 Nucleotide mutation and amino acid substitution in HPV-31 E6. Mutations relative to the HPV-31 E6 reference sequence (J04353) are marked with the corresponding bases and amino acids; positions without changes are shown as a dash (-). No., number of the nucleotide mutation; Location, site of the nucleotide mutation; Mutation, type of nucleotide mutation; Frequency (%), percentage frequency of the nucleotide mutation; Substitution, amino acid substitution caused by the nucleotide mutation.
No.       1   2   3    4    5    6    7    8    9    10   11   12   13
Location  27  69  141  178  190  194  205  219  228  297  321  368  413

HPV-31 E6
In the HPV-31 E6 reference sequence, 125 HLA-I and 43 HLA-II epitopes were selected, and the epitopes of the variants differed (Additional file 1: Tables S5, S6). H60Y and K65R changed epitope number and affinity, T64A decreased epitope number, and K123R and A138V created new epitopes. The effects of amino acid substitutions on HPV-31 E6 epitopes are summarized in Table 8.

HPV-33 E6
In the HPV-33 E6 reference sequence, 109 HLA-I and 41 HLA-II epitopes were selected, and the epitopes of the variants differed (Additional file 1: Tables S7, S8). K35N decreased epitope number and affinity; S74T, N86H, K93N and R145I changed epitope number and affinity; and Q113R increased epitope affinity. The effects of amino acid substitutions on HPV-33 E6 epitopes are summarized in Table 9.

HPV-52 E6
In the HPV-52 E6 reference sequence, 95 HLA-I and 50 HLA-II epitopes were selected, and the epitopes of the variants differed (Additional file 1: Tables S9, S10). E21K, L46V, E89K, K93R and N127I changed epitope number and affinity; …105M increased epitope affinity; N122K decreased epitope number; and E138K decreased epitope affinity. The effects of amino acid substitutions on HPV-52 E6 epitopes are summarized in Table 10.
Table 5 Nucleotide mutation and amino acid substitution in HPV-58 E6. Mutations relative to the HPV-58 E6 reference sequence (D90400) are marked with the corresponding bases and amino acids; positions without changes are shown as a dash (-). No., number of the nucleotide mutation; Location, site of the nucleotide mutation; Mutation, type of nucleotide mutation; Frequency (%), percentage frequency of the nucleotide mutation; Substitution, amino acid substitution caused by the nucleotide mutation.
No.       1   2   3    4    5    6    7    8
Location  78  94  150  198  258  279  286  434
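A single substitution can alter several predicted epitopes at once, because every peptide window overlapping that residue changes. The sketch below lists those windows, applies a substitution to a peptide and compares reference and variant epitope sets. It assumes 9-mer HLA-I peptides (consistent with the 9-residue epitopes 35-43 and 141-149 discussed later) and, as a further assumption, 15-mer HLA-II peptides; all helper names are hypothetical.

```python
import re


def windows_covering(position, length, protein_length):
    """1-based start positions of every `length`-residue window that contains `position`."""
    first = max(1, position - length + 1)
    last = min(position, protein_length - length + 1)
    return list(range(first, last + 1))


def apply_substitution(peptide, start, substitution):
    """Apply a substitution such as 'K35N' to a peptide whose first residue is at `start` (1-based)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution: {substitution}")
    ref_aa, pos, var_aa = match.group(1), int(match.group(2)), match.group(3)
    offset = pos - start
    # The substituted residue must fall inside the peptide and match the reference residue.
    if not 0 <= offset < len(peptide) or peptide[offset] != ref_aa:
        raise ValueError(f"{substitution} does not match {peptide} starting at {start}")
    return peptide[:offset] + var_aa + peptide[offset + 1:]


def epitope_changes(ref_epitopes, var_epitopes):
    """Epitopes lost, gained and shared between reference and variant predictions."""
    ref, var = set(ref_epitopes), set(var_epitopes)
    return {"lost": sorted(ref - var),
            "gained": sorted(var - ref),
            "shared": sorted(ref & var)}
```

For example, `apply_substitution("KPLQRSEVY", 35, "K35N")` returns "NPLQRSEVY". A residue away from the protein ends lies in nine overlapping 9-mers and fifteen overlapping 15-mers, which is why one amino acid change can shift both the number and the affinity of the selected epitopes.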

Discussion
Cervical cancer is the second most common malignant tumor in women of childbearing age and seriously threatens women's health. Persistent HR-HPV infection is closely related to the occurrence and development of cervical cancer and other malignant diseases. Nearly all α-9 HPV types are carcinogenic, and they are associated with 75% of cervical cancers. Sichuan is a multi-ethnic region with a high prevalence of α-9 HPV: from 2012 to 2017, α-9 HPV positive samples accounted for 53.68% of all HPV positive samples and 73.22% of high-risk HPV positive samples, showing an increasing trend.
Fig. 4 Secondary structure of HPV-31 E6 comparing the reference to the variant sequence. Note: a, secondary structure pattern diagram based on the HPV-31 E6 reference sequence; b, secondary structure pattern diagram based on the HPV-31 E6 mutation sequence.
Fig. 5 Tertiary structure of HPV-31 E6 comparing the reference to the variant sequence. Note: a, homology model of the HPV-31 E6 reference sequence; b, Ramachandran plot of the HPV-31 E6 reference sequence homology model; c, homology model of the HPV-31 E6 mutation sequence; d, Ramachandran plot of the HPV-31 E6 mutation sequence homology model.
Non-synonymous mutations change the amino acid composition and structure of a protein, and protein function is mainly realized through its structure. HPV E6 consists of three kinds of domains: an N-terminal region (residues 1-36), two zinc fingers (residues 37-73 and 110-146, CxxC-(29x)-CxxC) and a C-terminal region (residues 147-158). The two zinc fingers form a deep pocket that binds the "LXXLL" motif of E6AP and thereby mediates the ubiquitination and degradation of p53, the most important tumor suppressor protein [24,25]. Residues 145-149 form the PDZ domain-binding region, which is involved in E6-mediated cellular transformation, while the carboxy-terminal half is principally involved in p53 binding [26]. K93N of HPV-33 E6, K93R of HPV-52 E6 and K93N of HPV-58 E6 are located at the outer edge of the E6 protein near the zinc ion [27]. N86H and R145I of HPV-33 E6 occur at the same positions as D86E and R145K of HPV-58 E6, and K93N of HPV-33 E6, K93R of HPV-52 E6 and K93N of HPV-58 E6 all occur at residue 93 of the E6 protein. These substitutions are located in the active region of the protein and may disorder the structure of the E6 termini, particularly the carboxyl terminus. Such conformational changes may alter the ability of E6 to bind the host p53 protein and other potential partner proteins, thus affecting the pathogenicity of α-9 HPV [29].
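The domain boundaries listed above can be used to place each substitution. The sketch hard-codes those boundaries exactly as stated; they follow HPV-16 E6 numbering, so applying them unchanged to HPV-31, -33, -52 and -58 is an assumption. Residues 74-109, which the text does not assign to a named domain, are reported as lying between the zinc fingers.

```python
# Region boundaries (1-based, inclusive) as listed in the text; regions may overlap.
E6_REGIONS = [
    ("N-terminal region", 1, 36),
    ("zinc finger 1", 37, 73),
    ("zinc finger 2", 110, 146),
    ("PDZ-binding region", 145, 149),
    ("C-terminal region", 147, 158),
]


def regions_for(position):
    """Names of every listed region containing `position`."""
    hits = [name for name, start, end in E6_REGIONS if start <= position <= end]
    if hits:
        return hits
    if 74 <= position <= 109:
        return ["between the zinc fingers"]
    return ["outside the listed regions"]


def position_of(substitution):
    """Residue number from a label such as 'K93N'."""
    return int(substitution[1:-1])


if __name__ == "__main__":
    for label in ["D32E", "K35N", "K93N", "R145I", "K93R", "R145K"]:
        print(label, regions_for(position_of(label)))
```

For example, `regions_for(position_of("R145I"))` returns ["zinc finger 2", "PDZ-binding region"], D32E and K35N fall in the N-terminal region, and the residue-93 substitutions fall between the two zinc fingers.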
Fig. 6 Secondary structure of HPV-33 E6 comparing the reference to the variant sequence. Note: a, secondary structure pattern diagram based on the HPV-33 E6 reference sequence; b, secondary structure pattern diagram based on the HPV-33 E6 mutation sequence. The black boxes mark regions where the secondary structures of the reference and mutation sequences differ.
Fig. 7 Tertiary structure of HPV-33 E6 comparing the reference to the variant sequence. Note: a, homology model of the HPV-33 E6 reference sequence; b, Ramachandran plot of the HPV-33 E6 reference sequence homology model; c, homology model of the HPV-33 E6 mutation sequence; d, Ramachandran plot of the HPV-33 E6 mutation sequence homology model.
Fig. 8 Secondary structure of HPV-52 E6 comparing the reference to the variant sequence. Note: a, secondary structure pattern diagram based on the HPV-52 E6 reference sequence; b, secondary structure pattern diagram based on the HPV-52 E6 mutation sequence. The black boxes mark regions where the secondary structures of the reference and mutation sequences differ.
At positive selection sites, the frequency of the selected amino acid increases and stabilizes, enhancing the species' adaptability to its environment [28]. The estimated positive selection sites were D32E in HPV-16 E6 (128/250); K35N (42/216), K93N (42/216) and R145I (33/216) in HPV-33 E6; K93R (252/288) in HPV-52 E6; and K93N (111/405) and R145K (16/405) in HPV-58 E6. All of these positive selection sites correspond to high-frequency non-synonymous mutations, suggesting that these sites, which contribute to the adaptation of α-9 HPV E6, have spread widely.
The HPV E6 protein plays a key role in the development of cervical cancer. During HPV infection, E6 is presented as an antigen, allowing the immune system to eliminate HPV infection and reduce the risk of HPV-related diseases [30]. Some specific mutations in HPV E6 may lead to differences in the infectivity and pathogenicity of the virus. D32E and D32N, at the HPV-16 E6 positive selection site 32, are located on the outer edge of the protein next to the zinc ion, and six HLA-II epitopes disappeared as a result of D32E/D32N. In Japan, D32E has been confirmed to be associated with the development of cervical cancer [31]. D32E and D32N reduced T-cell antigen epitope affinity, which may lead to persistent viral infection and promote the development of cervical cancer. The HPV-33 E6 positive selection sites K35N and K93N are close to the zinc ion, while R145I is located in the E6 PDZ-binding domain. K35N and R145I abolished the epitopes 35-43 KPLQRSEVY (HLA-C*03:02) and 141-149 RSRRRETAL (HLA-C*01:02), respectively, and K93N changed epitope number and affinity. These three positive selection sites of HPV-33 E6 lie in the active region of the E6 protein, affect protein conformation and function, and reduce the immunogenicity of peptides containing them to some extent. The HPV-52 E6 positive selection site K93R changed epitope number and decreased the affinity of high-affinity epitopes. K93N and R145K of HPV-58 E6 reduced the affinity of high-affinity HLA-I antigen epitopes. Overall, these positive selection sites reduced the immunogenicity of E6, which may make HPV-infected cells more difficult for the immune system to detect and enhance HPV adaptability to the environment. No positive selection site was identified in HPV-31 E6, and its high-frequency non-synonymous mutations increased the affinity and number of E6 epitopes, which may relate to its extremely low prevalence.
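The carrier counts quoted for the positive selection sites convert directly into percentages. The sketch simply divides each count by the number of E6 sequences analyzed for that type (the denominators given in the text); it does not reproduce the PAML analysis that identified the sites.

```python
# (HPV type, substitution, sequences carrying it, E6 sequences analyzed), as given in the text.
POSITIVE_SELECTION_SITES = [
    ("HPV-16", "D32E", 128, 250),
    ("HPV-33", "K35N", 42, 216),
    ("HPV-33", "K93N", 42, 216),
    ("HPV-33", "R145I", 33, 216),
    ("HPV-52", "K93R", 252, 288),
    ("HPV-58", "K93N", 111, 405),
    ("HPV-58", "R145K", 16, 405),
]


def carrier_frequencies(sites):
    """Percentage of analyzed E6 sequences carrying each substitution, highest first."""
    rows = [(hpv, sub, 100 * carriers / total) for hpv, sub, carriers, total in sites]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for hpv, sub, pct in carrier_frequencies(POSITIVE_SELECTION_SITES):
        print(f"{hpv} {sub}: {pct:.2f}%")
```

This gives 87.50% for K93R (HPV-52), 51.20% for D32E (HPV-16), 27.41% for K93N (HPV-58), 19.44% each for K35N and K93N (HPV-33), 15.28% for R145I (HPV-33) and 3.95% for R145K (HPV-58).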
Fig. 9 Tertiary structure of HPV-52 E6 comparing the reference to the variant sequence. Note: a, homology model of the HPV-52 E6 reference sequence; b, Ramachandran plot of the HPV-52 E6 reference sequence homology model; c, homology model of the HPV-52 E6 mutation sequence; d, Ramachandran plot of the HPV-52 E6 mutation sequence homology model.
Fig. 10 Secondary structure of HPV-58 E6 comparing the reference to the variant sequence. Note: a, secondary structure pattern diagram based on the HPV-58 E6 reference sequence; b, secondary structure pattern diagram based on the HPV-58 E6 mutation sequence. The black boxes mark regions where the secondary structures of the reference and mutation sequences differ.
Studies have found that mutations affect the efficacy of HPV vaccines [32]; we therefore used bioinformatic prediction of protein structure and antigen epitopes to analyze the influence of HPV E6 mutations on protein conformation and immunogenicity. This is the first study to discuss the relationship between protein structure, positive selection sites, antigen epitopes and pathogenicity of the α-9 HPV E6 protein in Sichuan. Amino acid substitutions at positive selection sites may affect viral infection efficiency, immunogenicity and pathogenicity by altering T-cell epitope affinity, thereby improving the survival and evolutionary adaptation of α-9 HPV. These results help to explore the relationship between HPV E6 polymorphism and HPV infection capacity and its mechanism of action, and can inform improved therapeutic vaccines against α-9 HPV in the Sichuan region of China.
N86H and R145I (a positive selection site) of HPV-33 E6 and D86E and R145K (a positive selection site) of HPV-58 E6 occur at the same positions of E6. The α-9 HPV E6 positive selection sites D32E, K35N, K93N, R145I, K93R and R145K, which are adaptive to the environment, have spread widely. All are located in the active region of the E6 protein and alter its structure, and overall they reduce the immunogenicity of the E6 protein, making HPV-infected cells more difficult for the immune system to detect and enhancing the adaptability of α-9 HPV to the environment.
Fig. 11 Tertiary structure of HPV-58 E6 comparing the reference to the variant sequence. Note: a, homology model of the HPV-58 E6 reference sequence; b, Ramachandran plot of the HPV-58 E6 reference sequence homology model; c, homology model of the HPV-58 E6 mutation sequence; d, Ramachandran plot of the HPV-58 E6 mutation sequence homology model.