Data
The ACCME cohort profile and characteristics of the study participants have been described in detail elsewhere.31 The mean (SD) age of the participants was 38 (10) years. The PC plots of the genotypes showed clustering of the study participants by ancestry (Supplementary Figure 1).
Case-Control Genome-wide Association Study
A total of 10,067 people (224 cases and 9,843 controls) passed the filters and controls for the prevalent HPV case-control GWAS, while 10,525 people (679 cases and 9,846 controls) passed the filters and controls for the persistent HPV case-control GWAS.
The Miami plot, Figure 1, shows all the SNPs associated with both prevalent and persistent hrHPV infections. Table 1 shows the top SNPs associated with prevalent hrHPV infections. One variant, rs116471799, located on chromosome 4, was significantly associated with prevalent hrHPV (Odds Ratio [OR], p-value [p]: 3.63, p=1.76 x 10-8). This variant is located 77kb 3' of LIM Domain Binding 2 (LDB2). The regional plot for rs116471799 (Supplementary Figure 2) shows the surrounding markers, including ENSG00000248138 Gene, a novel transcript and protein coding gene and Transmembrane Anterior Posterior Transformation 1 (TAPT1). Over 200 SNPs reached borderline genome-wide significance for association with prevalent hrHPV. Table 1 also shows the top SNPs associated with persistent hrHPV infections, four of which reached genome-wide significance. The SNP with the strongest significant association, rs2342234 (OR: 0.35, p=1.50 x 10-8), is located 37kb 3' of Transmembrane Phosphoinositide 3-Phosphatase and Tensin Homolog 2 (TPTE2), a protein coding gene. The regional plot for rs2342234 (Supplementary Figure 2) shows the surrounding markers. The second significantly associated SNP, rs115537401 (OR: 2.50, p=3.26 x 10-8), is located 30kd of SMAD Family Member 2 (SMAD2). The third and fourth significantly associated SNPs, rs1879062 (OR: 1.79, p=3.81 x 10-8), and rs1028206 (OR: 1.78, p=4.45 x 10-8) are clustered on Chromosome 5 (D’ = 1, r2 = 0.97). Over 200 SNPs reached borderline genome-wide significance for persistent hrHPV.
Case-Case Genome-wide Association Study
Table 2 shows the top SNPs associated with hrHPV in the case-case GWAS. Only one variant remained at genome-wide significance level and three variants were borderline significant, after the p-values were corrected for multiple testing. The significantly associated variant, rs88064 (OR: 6.28, adjusted p=4.69 x 10-9), is located 1.1Mb 3' of DAOA Antisense RNA 1 (DAOA-AS1). The three borderline associated variants, rs11234546, rs10514013 and rs41309770, had the same OR: 14.5 and adjusted p: 5.33 x 10-8.
Replication Studies
To evaluate the replication of the top variants in the discovery study, we examined them in PACS, the only other available GWAS of hrHPV17, to our knowledge. We were able to replicate rs116054643 for the association with prevalent hrHPV (discovery cohort - OR: 1.88, p=4.52 x 10-7; replication cohort - OR: 1.68, p=0.01, Table 1) and rs12448674 for the association with persistent hrHPV (discovery cohort - OR: 0.75, p=4.18 x 10-7; replication cohort - OR: 0.71, p=0.04, Table 1). For the top variants associated with prevalent and persistent hrHPV in the discovery cohort, we observed that the MAF was similar in the replication cohort, but the direction and magnitude of the association differed for about half of the variants.
We investigated the transferability of previously reported SNPs associated with hrHPV and cervical cancer in the ACCME study sample. Of the 339 SNPs, 254 were present in our dataset. Six SNPs (or 2%, 6/254) showed exact replication, i.e., consistent direction of effect for the alleles and p < 0.05, for prevalent hrHPV (Supplementary Table 1) and eighteen SNPs (or 7%, 18/254) showed exact replication for association with persistent hrHPV (Supplementary Table 2).
Meta-analysis
We conducted a meta-analysis with the GWAS of ACCME and PACS, using METAL and METASOFT software. The meta-analysis yielded genome-wide significant loci, LDB2 (rs116471799, p=1.98 x 10-8), PPP3CA (rs116054643, p=2.50 x 10-8) and NCK2 (rs138289957, p=4.03 x 10-8; rs13407090, p=4.12 x 10-8; rs75543399, p=4.34 x 10-8) associated the prevalent hrHPV (Supplementary Table 3). The meta-analysis yielded genome-wide significant loci, TPTE2 (rs2342234, p=1.10 x 10-8 and rs2152687, p=3.80 x 10-8) associated the persistent hrHPV (Supplementary Table 4). Figure 2 shows a Miami plot for the prevalent and persistent hrHPV meta-analysis. The meta-analysis results from METAL and metasoft were similar.
Classical HLA Associations
DPB1*01.01 (total frequency = 0.42, frequency in hrHPV negative = 0.42, frequency in women with prevalent hrHPV infection = 0.44, frequency in women with persistent hrHPV infection = 0.44), DQA1*01:02 (total frequency = 0.38, frequency in hrHPV negative = 0.38, frequency in women with prevalent hrHPV infection = 0.38, frequency in women with persistent hrHPV infection = 0.37) and DQB1*06:02 (total frequency = 0.27, frequency in hrHPV negative = 0.27, frequency in women with prevalent hrHPV infection = 0.27, frequency in women with persistent hrHPV infection = 0.24) were the most common HLA alleles in the study population.
None of the HLA alleles, haplotypes or amino acids were significantly associated with prevalent hrHPV. Seven HLA alleles, DRB1*15:03, DRB1*03:01, DRB1*13:02, DQB1*05:02, DQB1*06:02, DQB1*02:01 and DQB1*02:02, were associated with persistent HPV in unadjusted regression models (Table 3). With further adjustment for the first principal components (model 2), all the alleles located on DRB1: DRB1*15:03, (OR: 0.77, 95% CI: 0.66 – 0.89, p=0.006), DRB1*03:01, (OR: 1.29, 95% CI: 1.07 – 1.56, p=0.04), DRB1*13:02 (OR: 1.45, 95% CI: 1.17 – 1.79, p=0.006), and DQB1*05:02 (OR: 1.48, 95% CI: 1.17 – 1.89, p=0.006) remained statistically significant, while DQB1*06:02 (OR: 0.84, 95% CI: 0.74 – 0.96, p=0.06) and DQB1*02:01 (OR: 1.21, 95% CI: 1.03 – 1.43, p=0.06) were borderline statistically significantly associated with persistent hrHPV. Most of these significant alleles are included in the haplotypes associated with persistent hrHPV, including C*07:01 - DQB1*06:02, B*58:02 - DRB1*15:03 and DQB1*05:02 - DRB1*13:02. A total of 37 haplotypes were significantly associated with persistent hrHPV and 46% (17/37) of them were located on DRB1. The top haplotypes are shown in Table 3. Eighty-five amino acids were significantly associated with persistent hrHPV. About 32% (27/85) of the significant amino acid positions were located on DRB1. DRB1_42_Serine (OR: 1.45, p=8.19 x 10-05) and DRB1_59_Tyrosine (OR: 1.54, p=2.27 x 10-04) showed the strongest associations (Supplementary Table 5) with persistent hrHPV.
Zygosity tests showed that heterozygous DRB1*15:03 (OR: 1.25, p=0.01) or DQB1*06:02 (OR: 1.25, p=0.008) allele was associated with higher risk of persistent hrHPV infections. While heterozygous DRB1*13:02 (OR: 0.68, p=0.001) or DRB1*03:01 (OR: 0.78, p=0.02) allele was associated with lower risk of persistent hrHPV infections (Supplementary Table 6). Zygosity tests for amino acid residues showed that only homozygous DRB1_59_H and C_48_A were significantly associated with prevalent hrHPV, while 62 heterozygous amino acids were significantly associated with persistent hrHPV (Supplementary Table 7).
HLA Peptide-binding Affinity Predictions
The analyses of peptide binding predictions showed that HLA-DRB1 alleles that were positively associated with persistent hrHPV showed weaker binding with peptides derived from hrHPV proteins and vice versa as shown by IC50 and EL score data (Figure 3, Table 4). In contrast to other HLA-DRB1 alleles, those alleles positively associated with persistent hrHPV had significantly higher IC50 (and lower EL scores), while negatively associated alleles had significantly lower IC50 (and higher EL scores). These findings remained consistent with consideration of the full set of peptide predictions and after excluding the set of non-binders (IC50 > 5000 nM, Supplementary Table 8, 9 and 10). Therefore, persistence alleles were associated with weaker peptide binding and vice-versa.
Functional Annotation
Functional annotation in the HaploReg database revealed that rs116471799 (77kb 3' of LDB2), which was associated with prevalent hrHPV at genome-wide significance level, is associated with altering 17 regulatory motifs. The top variants associated with persistent hrHPV at genome-wide significance level were associated with altering regulatory motifs and showed both promoter and enhancer activities of the histone modification signatures H3K4me1, H3K4me3, H3K27ac and H3K9ac in several tissues. rs2342234 (37kb 3' of TPTE2) is associated with H3K9ac promoter histone marks in H1 BMP4 Derived Trophoblast Cultured Cells. rs115537401 (30kb 3' of SMAD2) is associated with H3K9ac and H3K4me3 Promoter histone marks in primary mononuclear cells from peripheral blood and lungs respectively, and H3K4me1 Enhancer histone marks in H1 BMP4 Derived Trophoblast Cultured Cells, Mesenchymal Stem Cell Derived Adipocyte Cultured Cells, Breast Myoepithelial Primary Cells, and several other tissues.
Querying the Genotype-Tissue Expression (GTEx) database, we found that LDB2 is highly expressed in the uterus (median Transcript Per Million [TPM] = 84.6, endocervix (median TPM = 61.4) and subcutaneous adipose tissue (median TPM = 54.4) (Supplementary Figure 3). TPTE2 is mainly expressed in the testes (median TPM = 18) and in EBV-transformed lymphocytes (median TPM ~1). Its median TPM in other tissues was <1 (Supplementary Figure 4). SMAD2 is expressed in the uterus (median TPM = 6.3) and endocervix (median TPM 5.8) (Supplementary Figure 5).
Gene Enrichment Analysis
The MAGMA analysis identified over 100 significant gene-sets associated with prevalent and persistent hrHPV. The top gene-set associated with prevalent hrHPV was PID P53 downstream pathway (p=0.001), which includes 134 genes located in CP: canonical pathways (Supplementary Table 11). The p53 downstream genes are involved in cell-cycle arrest, DNA repair, senescence, apoptosis, and cancer. The antigen processing and presentation gene-set located in CP: KEGG (canonical pathways) of C2 was the top gene-set associated with persistent HPV (p=0.001) (Supplementary Table 12). Antigen presentation is mediated by the Major Histocompatibility Complex (MHC) class I.
Polygenic Risk Scores
The bar plots displaying the model fit of the PRS at different PT and the quantile plots which illustrate the effect of increasing PRS on predicted risk of prevalent and persistent hrHPV are shown in Figure 5. For prevalent hrHPV, the statistically significant model included 1,621 SNPs, the best-fit PT is 0.00035, and the p-value of the model fit was 0.002 (-log10(0.002) = 2.7). The statistically significant model for persistent hrHPV contained 3,925 SNPs, with best-fit PT of 0.00115, and p-value of 0.04 (-log10(0.046) = 1.34).