Genome-wide profiling of the human papillomavirus DNA integration in cervical intraepithelial neoplasia and normal cervical epithelium by HPV capture technology

HPV integration plays an important role in cervical carcinogenesis. HPV genotypes and the exact integration sites were investigated in 166 women using HPV capture technology combined with next-generation sequencing. Three, one and six integration sites were verified in 7 HPV-positive ‘normal cervical epithelium’, 6 HPV-positive CIN 2 and 15 HPV-positive CIN 3 samples, respectively. Of the 10 integrations, one involved HPV33 and nine involved HPV16. Our study evaluated the level of HPV integration in CINs and normal cervical tissues using a high-throughput viral integration detection method, providing basic evidence for HPV integration-driven cervical carcinogenesis.

Human papillomavirus (HPV) infection has been recognized as an important cause of cervical precancerous lesions and cancer; it is necessary but not sufficient for the cervical carcinogenesis process 1,2 . Therefore, in addition to HPV infection, HPV integration could contribute to the cervical carcinogenesis process.
HPV integration could upregulate the expression of the viral oncogenes E6 and E7 and eventually promote host genomic instability, which could be a crucial event in the cervical carcinogenesis process 3,4 . Additionally, the level of HPV integration was positively associated with cervical intraepithelial neoplasia (CIN) grade and was proposed as a marker for cervical disease progression 5,6 . Therefore, comprehensively and accurately identifying the sites and level of HPV integration from normal cervical epithelium to CIN and cervical cancer is necessary to assess the HPV-induced carcinogenesis process. However, cervical HPV integrations have only been comprehensively investigated in invasive cancer, and limited data are available for CINs and normal cervical epithelium 7,8 . Moreover, most previous assays could not identify HPV integrations sensitively, which could lead to underestimation of the HPV integration level 7,9,10 . Recently, Hu et al. reported HPV integration breakpoints detected with a high-throughput viral integration detection method (HIVID) 8 . However, Nigel Dyer et al. claimed that 87% of the integration breakpoints reported by Hu et al. were likely to be experimental and computational artifacts according to their own data analysis pipeline 11 . Meanwhile, we used the same HPV capture technology and next-generation sequencing as in Hu's study to detect cervical HPV integrations in 39 HPV-positive primary cervical tumor samples and 2 cell lines, and yet identified only 117 unique validated HPV integration breakpoints 12 . Moreover, we found that the Sanger sequencing validation rate for candidates supported by one, two, three, or more than three discordant paired-end reads was 3.7%, 47.8%, 44.4%, and 83.3%, respectively, indicating that HIVID could be a sensitive method for detecting integrated HPV but has a high false positive rate when few reads support a candidate. HPV integration rates in the study by Hu et al. could therefore be overestimated in cervical cancer and even in CINs.
Therefore, given that comprehensive and accurate data regarding HPV integrations in CINs and normal cervical tissues were limited, we detected HPV integrations in a series of CINs and normal cervical epithelium samples using base-resolution HPV capture technology and the next generation sequencing as previously reported 12 .
In this study we enrolled 166 participants with CIN or 'normal cervical epithelium' in order to investigate the level of cervical HPV integration in CINs and normal cervical epithelium.
Determination of potential HPV integration sites. As described in our previous study 12 , if a specific site had one or more discordant reads mapped on one end to the HPV reference genome and on the other to a human chromosome, it was considered a potential HPV integration locus. A total of 37, 21, 44 and 45 potential HPV integrations were identified in 7 HPV-positive 'normal cervical epithelium', 8 HPV-positive CIN 1, 6 HPV-positive CIN 2 and 15 HPV-positive CIN 3 samples, respectively (Table 1).
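For orientation, the candidate counts above can be set against the numbers of Sanger-verified sites given in the abstract (three, one and six in 'normal cervical epithelium', CIN 2 and CIN 3). The following is only a sketch of that arithmetic. The zero for CIN 1 is an inference from the stated total of 10 verified integrations (3 + 1 + 6), not a figure reported directly, and the dictionary and function names are hypothetical.

```python
# Potential (candidate) integration sites per group, as reported in the text.
potential = {"normal": 37, "CIN1": 21, "CIN2": 44, "CIN3": 45}

# Verified sites per group from the abstract.
# ASSUMPTION: CIN1 = 0 is inferred from the total of 10 (3 + 1 + 6).
verified = {"normal": 3, "CIN1": 0, "CIN2": 1, "CIN3": 6}


def verified_fraction(potential, verified):
    """Fraction of candidate sites confirmed by Sanger sequencing, per group."""
    return {group: verified[group] / potential[group] for group in potential}


fractions = verified_fraction(potential, verified)
total_potential = sum(potential.values())
total_verified = sum(verified.values())

for group, frac in fractions.items():
    print(f"{group}: {verified[group]}/{potential[group]} = {frac:.1%}")
print(f"overall: {total_verified}/{total_potential} = "
      f"{total_verified / total_potential:.1%}")
```

Under these assumptions, only a small fraction of candidate sites is confirmed, consistent with the low validation rate for weakly supported candidates noted in the introduction.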

Figure 1. Flow chart of study design, sample selection and HPV detection.
Scientific Reports | 6:35427 | DOI: 10.1038/srep35427
E2 (n = 2), E2/E4 (n = 1), L1 (n = 1), and L2 (n = 1). Due to the limited number of integration events, we did not find hot spots in the human genome (Table 1). All integration positions were examined for the presence of fragile sites in the human genome. Of the 10 integration positions, one was located in a fragile site and three were close to a fragile site (Supplementary Table S1). Meanwhile, the human genomic sequences within 50 kb of an integration locus were investigated. Seven integration sites were located in cellular genes with six in introns and one in an exon (Supplementary Table S1).
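The 50 kb neighbourhood check described above can be sketched as a simple interval query. This is an illustration under stated assumptions, not the annotation procedure actually used in the study: the `Gene` records, gene names and coordinates are hypothetical, and separating intronic from exonic sites would additionally require exon-level annotation, which is not modelled here.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


WINDOW = 50_000  # 50 kb flank on each side, as stated in the text


def genes_near(chrom: str, pos: int, genes: List[Gene],
               window: int = WINDOW) -> List[Tuple[str, str]]:
    """Return (gene name, relation) for genes overlapping pos +/- window.

    relation is "inside" if the integration position lies within the gene,
    otherwise "flanking".
    """
    lo, hi = max(1, pos - window), pos + window
    hits = []
    for gene in genes:
        if gene.chrom != chrom or gene.end < lo or gene.start > hi:
            continue  # different chromosome or no overlap with the window
        relation = "inside" if gene.start <= pos <= gene.end else "flanking"
        hits.append((gene.name, relation))
    return hits


# Hypothetical annotation, for illustration only.
genes = [
    Gene("GENE_A", "chr1", 100_000, 180_000),
    Gene("GENE_B", "chr1", 240_000, 300_000),
    Gene("GENE_C", "chr2", 100_000, 180_000),
]

intergenic_site = genes_near("chr1", 200_000, genes)
intragenic_site = genes_near("chr1", 150_000, genes)
print(intergenic_site)
print(intragenic_site)
```

A linear scan is adequate for a handful of integration sites; an interval tree or a sorted annotation would be preferable for larger candidate sets.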
It is worth noting that HPV integrations could occur in cervical tissue with normal epithelium and that the integration rate in CIN 3 was higher than those in CIN 1 or CIN 2. This indicated that HPV integration could play an important role in the early stage of cervical carcinogenesis, although our results lack a statistical analysis and the sample size may not be sufficient for one. In addition, we found similar characteristics of the HPV integration sites in cervical cancer and non-cancer specimens 12 . For example, HPV integration sites were mainly located in the E1 and E2 regions of the viral genome and in cellular genes of the human genome.
Two different types of mechanism are presumed to explain the cervical carcinogenesis process induced by HPV integration, i.e. altering viral gene expression or disrupting cellular transcripts. In order to determine the effect of these two mechanisms, it is necessary to comprehensively profile HPV integrations in the host and viral genomes. However, the detection methods used in most previous studies were low-throughput and had lower sensitivity. The approach used here can discern fusion breakpoints accurately at single-base resolution, which helps to elucidate the effect of HPV integration on viral transcripts and on their flanking cellular transcripts, and thus to better understand the cervical carcinogenesis induced by HPV integration. In addition, since HPV integration could lead to viral persistence, and persistent HPV infection plays a role in cervical carcinogenesis, this approach provides unbiased, genome-wide integration information for monitoring persistent or permanent infection.
However, there are some limitations in our study. Firstly, since HPV integration rates in 'normal cervical epithelium' and CIN were low and only 36 HPV-positive women were included in the HPV integration analysis, a comprehensive evaluation of the sites and level of HPV integration was limited to some extent. Secondly, the cross-sectional design did not allow us to investigate the temporal relationship between HPV integration and CINs. Thirdly, CIN-enriched tissue was not sampled by laser microdissection, so contamination from normal adjacent epithelium or the underlying stroma could not be ruled out. This would overestimate the HPV integration rate in CINs to some extent, although this effect was small because the HPV integration rate in 'normal cervical epithelium' was significantly lower than in CINs. Fourthly, since HPV DNA was detected using a highly sensitive PCR primer set (SPF1/GP6+) amplifying a 184-bp fragment of the L1 open-reading frame before HPV capture and sequencing were performed, breakpoints within L1 might have produced HPV false negatives among the 166 samples. However, since the proportion of breakpoints occurring in this targeted region of L1 was low 12 and, in most situations, the HPV genome may exist in both episomal and integrated forms, the probability of HPV false negatives due to L1 breakpoints was small.
In summary, the accurate identification of HPV integrations in CINs and normal cervical tissues could provide basic evidence for HPV integration-driven cervical carcinogenesis and could serve as an individualized marker in cervical cancer screening in the future.

Materials and Methods
Study population and specimen collection. A total of 166 cervical biopsy specimens were collected at Beijing Cancer Hospital, Beijing, China, between 2014 and 2015 and diagnosed as normal cervical epithelium or acute/chronic cervicitis without atypical hyperplasia (n = 64), CIN 1 (n = 62), CIN 2 (n = 19) or CIN 3 (n = 21). All biopsy specimens were reviewed by two experienced pathologists who confirmed the diagnosis of CIN. Cervical biopsies were histologically diagnosed using criteria defined by the World Health Organization 15 . No case in this study had histological evidence of epithelial malignancy of the cervix. Normal cervical epithelium or acute/chronic cervicitis without atypical hyperplasia is defined as 'normal cervical epithelium' in this study. Punch biopsy samples were divided into two parts; one was kept for histopathological analysis, and the other was used for HPV typing and integration analysis. Individual informed consent was obtained from all participants. This study received ethical approval from the Institutional Review Board of the Peking University School of Oncology, China. All experiments were performed in accordance with relevant guidelines and regulations.
The specimens were stored at −80 °C and genomic DNA was extracted from the frozen tissues using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. The β-globin gene was evaluated in all specimens by PCR.
HPV typing and integration detection. HPV DNA in valid (β-globin positive) specimens was detected using a highly sensitive PCR primer set (SPF1/GP6+) amplifying a 184-bp fragment of the L1 open-reading frame 16 . Specimens showing the PCR amplification product were used to identify HPV genotypes and integrations. HPV probes were designed according to the full-length genomes of 17 HPV types (6, 11, 16, 18, 31, 33, 35, 39, 45, 52, 56, 58, 59, 66, 68, 69, and 82) by MyGenostics (MyGenostics, Baltimore, MD, USA). Details of HPV typing and the detection of HPV integrations, as well as Sanger sequencing validation of potential HPV integration sites, were described previously 12 . In brief, the whole-genome libraries were hybridized with biotinylated HPV probes (MyGenostics GenCap Technology), the hybridized fragments were bound to streptavidin magnetic beads, and the uncaptured DNA fragments were removed by washing. The eluted fragments containing the targeted sequences were then enriched by PCR to generate libraries for sequencing. Libraries were quantified and sequenced as 125-bp paired-end reads on the Illumina HiSeq 2500 sequencer (Illumina Inc., San Diego, CA, USA). Clean Illumina reads were mapped to the human genome (GRCh37/hg19) and the genomes of the 17 HPV types using the BWA program. A paired-end read uniquely mapped with one end to a human chromosome and the other to the HPV reference genome was identified as a discordant read pair. If a specific position had one or more discordant read pairs, it was considered a potential HPV integration site. PCR and Sanger sequencing were used to verify all the potential HPV integration breakpoints. All fusion sequences were characterized using the NCBI human MegaBLAST alignment tool and the UCSC BLAT tool.
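The discordant read pair criterion described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Mate` record, the rule that HPV references are named with an "HPV" prefix, and the grouping of pairs into fixed 500-bp bins on the human side are all assumptions made for illustration. The default `min_support=1` mirrors the "one or more discordant read pairs" rule stated in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Mate(NamedTuple):
    ref: str      # reference name, e.g. "chr8" or "HPV16" (assumed naming)
    pos: int      # leftmost mapped position
    unique: bool  # True if the mate is uniquely mapped


def is_hpv(ref: str) -> bool:
    # ASSUMPTION: viral references carry an "HPV" prefix.
    return ref.upper().startswith("HPV")


def is_discordant(m1: Mate, m2: Mate) -> bool:
    """Both mates uniquely mapped, one to human and the other to HPV."""
    return m1.unique and m2.unique and is_hpv(m1.ref) != is_hpv(m2.ref)


def candidate_sites(pairs: Iterable[Tuple[Mate, Mate]],
                    bin_size: int = 500,
                    min_support: int = 1) -> Dict[Tuple[str, int, str], int]:
    """Count discordant pairs per (human chromosome, bin, HPV type)."""
    support = defaultdict(int)
    for m1, m2 in pairs:
        if not is_discordant(m1, m2):
            continue
        human, viral = (m2, m1) if is_hpv(m1.ref) else (m1, m2)
        # ASSUMPTION: nearby pairs are grouped by a fixed-size bin.
        support[(human.ref, human.pos // bin_size, viral.ref)] += 1
    return {key: n for key, n in support.items() if n >= min_support}


# Hypothetical read pairs, for illustration only.
pairs = [
    (Mate("chr8", 128_240_100, True), Mate("HPV16", 3_000, True)),
    (Mate("HPV16", 3_050, True), Mate("chr8", 128_240_300, True)),
    (Mate("chr8", 128_240_200, True), Mate("chr8", 128_240_400, True)),
    (Mate("chr2", 5_000, False), Mate("HPV33", 1_200, True)),
]

sites = candidate_sites(pairs)
print(sites)
```

Raising `min_support` trades sensitivity for specificity, which is relevant given that candidates supported by a single discordant pair were rarely confirmed by Sanger sequencing in our previous study 12 .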