Targeted DNA Methylation Analysis by High-Throughput Sequencing in Hepatic Carcinoma Tissue and Adjacent Normal Tissues


 Background: The relationship between epigenetic abnormalities and tumorigenesi has been investigated in the past decade and made major advances, particularly the abnormal expression of small RNAs, DNA methylation, and histone modification in cancer. In many tumor-related studies, the regulatory changes in DNA methylation during cancer development and the development of resistance to anticancer drugs have show that DNA methylation can be used as a biomarker for cancer diagnosis and concomitant diagnosis, but there is a lack of clinically useful biomarkers associated with hepatic carcinoma. Using high-throughput sequencing technology, appropriate testing and validation can be carried out in large samples. The relationship between DNA methylation and tumor development can be explored, contributing to clinical diagnosis and personalized treatment of hepatic carcinoma. Methods: In this study, we implemented and evaluated the effectiveness of high-throughput sequencing for DNA methylation analysis in hepatic carcinoma. For the relationship between DNA methylation and gene expression, Pearson correlation analysis was used to evaluate the correlation. Twenty-five isolated genomic regions were amplified by PCR using bisulfite-transformed liver cancer tissue (Ca) and paracancer tissue (T) as template DNA. PCR final product sequence information was obtained by sequence analysis using Illumina Hiseq/Miseq platform. Results: The average depth of coverage across all amplicons was 30,548 for T and 29,346 for Ca, with a maximum of 3,675 at the ARID1A amplicon and a minimum of 65 at the PTEN amplicon. Methylation spectra were obtained for each genomic locus of the two groups of samples, and the results showed that methylation was significantly different at the X target loci and slightly different at the Y target loci. Cluster analysis showed that all T tissues were clustered in one group (except tissues T2 and T3), while Ca tissues were clustered on the other side. The results showed that DNA methylation at the loci may be closely related to liver cancer, providing references for the research and development of biomarkers in clinical diagnosis. Conclusions: The study demonstrates that high-throughput sequencing technology is a powerful and cost-effective method for methylation analysis of target DNA in cancer tissues.

investigated in the laboratory, it can be applied for epigenome-wide association studies and used as epigenetic biomarkers for the prediction of complex diseases. Age-related changes in DNA methylation have been observed in several types of human tissues such as the brain, skeletal muscle, adipose tissue, and pancreatic islets [2,3] . Epigenetic alterations in blood mirror age-associated DNA methylation and gene expression changes in human liver [4] .
DNA methylation is usually associated with diseases such as cancer and autoimmune disorders, occurring with environmental exposure and other biological phenomena. Strong correlations between DNA methylation patterns and clinical phenotypes can serve as biomarkers for disease diagnosis and treatment guidance. For example, DNA methylation biomarkers have been shown to support clinical decisions in various cancers and are also used in non-invasive prenatal testing, quality control of cultured cells, and forensic applications [5] .
To investigate the tumorigenesi and the corresponding treatment options, the study of genetic and epigenetic background information related to liver cancer will promote the understanding of the pathogenesis of liver cancer. As the one of the most widely used sequencing platforms, the application of MiSeq in this study makes it possible to analyze DNA methylation of single base pairs and contributes to the understanding of more detailed regulatory mechanisms. As an important epigenetic marker, DNA methylation is involved in a series of functions, such as suppression or activation of gene expression, nucleosome positioning transcription factor binding, and splicing [6] . Methylation polymorphism is an important reason for individual phenotypic difference, and methylation variation may also often lead to individual phenotypic difference. A large number of studies have shown that abnormal methylation is closely related to cancer, diabetes, Parkinson's disease and other complex diseases, methylation detection has great signi cance foro biology and translational medicine.In this study, we used multiple PCR technology combined with Illumina Hiseq/Miseq second-generation sequencing platform to capture and sequence multiple speci c CpG islands at the same time, and accurately calculate the methylation level of each CpG site with high-depth sequencing data. By analyzing the the expression and methylation status of 25 cancer-related genes in liver cancer and its adjacent tissues to explore the correlation between DNA methylation and the development of liver cancer Methods Preparation of DNA samples Twenty-six pairs of fresh liver cancer tissue and adjacent normal liver tissue samples were obtained from the hospital, as approved by Xinxiang Medical University (XYLL-2016025). All samples were immediately frozen in liquid nitrogen after surgery and stored at -80℃. Genomic DNA was isolated from these samples via the phenol-chloroform method, after incubation with proteinase K and DNA lysis buffer (50 mM Tris-HCl at pH 8.0, 100 mM ethylenediaminetetraacetic acid at pH 8.0, 100 mM NaCl, 1% sodium dodecyl sulfate). The quality of genomic DNA of each sample was good. The cleaning bands were detected by agarose gel electrophoresis, and there was no obvious degradation or RNA contamination.
The quality and quantity of the DNA were determined by NanoDrop 2000 spectrophotometer (NanoDrop Technologies, DE, USA). Total DNA concentration was ≥ 20 ng/ml and the total content was ≥ 1 mg. 3) At room temperature, the mixture will change from green to blue with the addition of the DNA Protect Buffer. 4) Sul te reaction was performed using a PCR instrument.The reaction procedure is as follows: 95℃ 5min→60℃ 25min→95℃ 5min→60℃ 85min→95℃ 5min→60℃ 175min→20℃ Forever.

Target area enrichment
According the following list of Cancer-related genes (Table S1), we designed forward and reverse primers for methylated and unmethylated statuses with MethPrimer (Li Lab, China, Table S2). The volume of each PCR reaction mix is 20 µl, which includes 10.0 µl of KAPA2G Robust Hot Start ReadyMix, 1.0 µl of forward primer, 1.0 µl reverse primer, > 5 ng of bisul te-treated genomic DNA, and topped with doubledistilled water to 20 µl. The PCR reaction mix was then subjected to thermal cycling for target ampli cation. PCR was carried out with the following conditions: 3 min of initial enzyme activation; 45 cycles of 94 °C at 30 s, 55 °C at 30 s, and 72 °C at 40 s; and nal extension at 72 °C for 10 min. PCR products of the same template were mixed by the same volume and puri ed by backman agtampure XP magnetic beads. The products were subjected to subsequent tests after product concentration detection.

DNA methylation analysis
Illumina plate-compatible speci c tag sequences were introduced to the end of the library by PCR ampli cation using a primer with Index sequence. The Index PCR ampli cation products of all samples were equally mixed and the nal MethylTarget sequencing library was obtained after glue cutting recovery. After accurate quantization of library molality, the Illumina Hiseq/Mis EQ platform was nally used for high-throughput sequencing in a 2 × 150 bp/2 × 250 bp double-end sequencing mode to obtain FastQ data.

Statistical analysis
Statistical analysis was performed using the statistical software package SPSS21 (SPSS, Chicago, USA). We used a chi-square test to compare the abundances of the three gene methylation statuses (methylated, unmethylated, and partially methylated). Two-tailed chi-square (χ 2 ) test or Fisher's exact test was performed to analyze the correlation between gene expression/methylation and clinicopathological parameters. The Chi2 test was conducted to compare the genomic distribution of signi cant CpG sites.

DNA methylation analysis
The main principles of DNA methylation analysis be classi ed into various types based on pretreatment (sodium bisul te, concentrated enzyme digestion, and a nity) and analytical method (next-generation sequencing-based, locus-speci c, gel-based, and array-based analyses) [7] . Among them, sodium bisul temodi ed genomic DNA is one of the most commonly used techniques to evaluate CpG methylation status by modifying genomic unmethylated cytosine to uracil [8,9] . As the gold standard of DNA methylation analysis, bisul te-based DNA methylation offers the advantage of quantitative assessment, detection sensitivity, and ability to analyze a wide variety of samples [10] . However, it is limited by the amount of DNA because the transformation of sodium bisul te requires DNA denaturation before treatment, and the removal of sodium bisul te after puri cation results in a large amount of degraded DNA. Most published studies on DNA methylation have used low-throughput, conventional bisul te (Sanger) sequencing for the analysis of mammalian samples. With advances in nucleic acid sequencing technologies, next-generation sequencing (also known as deep or high-throughput sequencing) makes it possible to generate millions of base pairs of sequence information in a matter of hours [11] . At present, most reports on DNA methylation based on high-throughput sequencing are focused on genome-wide epigenomic analysis [12,13] and is very expensive. However, because of the large size of the mammalian genome, statistics on the ratio of methylated to non-methylated cytosine in the genome are not reliable, and it is di cult to ensure su cient coverage depth for each CpG site. Absolute DNA methylation assays provide a quantitative measurement of DNA methylation levels at single-CpG resolution. In this study, 25 discrete genomic regions were ampli ed by PCR using bisul te-converted genomic DNA derived from hepatic carcinoma and adjacent normal tissues as template DNA. The resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx platform.
Although the accumulation of genetic and epigenetic abnormalities has been shown to play an important role in carcinogenesis, the molecular mechanism of hepatic carcinoma remains to be further studied [14] . Epigenetic effects such as microRNAs, DNA methylation, histone modi cation, non-coding Rnas, and nucleosome localization have similar functions, whilch can alter gene expression without altering DNA sequence [15][16][17][18][19] . In the past few decades, it has become increasingly clear that altered epigenetic regulation plays a key role in many diseases, particularly cancers [18] . As the most widely studied epigenetic mechanism, abnormal expression of DNA methylation has been found to be related to tumor development in many studies. For example, in the study of gastric carcinogenesis, it was found that the tumor suppressor genes RUNX3 and cancer-related genes CDH1 and MLH1 were regulated by promoter hypermethylation and silenced in the early stage of carcinogenesis [16] . In this study, we selected 25 tumor-related genes based on comprehensive literature review. Next-generation sequencing has the ability to detect small amounts of methylated DNA. To explore the distribution of CpGs in the selected genes, DNA methylation can serve as a molecular biomarker in GC in a variety of samples, such as serum, plasma, and tissue. In total, we applied 33 samples, among which 26 were used for high-throughput sequencing (13 pairs of hepatic carcinoma and adjacent normal tissues).
Sequence information and chromosomal coordinates for these 25 tumor-related genes are shown in Table 1. Genomic DNA was collected from homologous hepatic carcinoma tissue (Ca) and adjacent normal tissues (T) and subjected to bisul te conversion, providing a mark to differentiate unmethylated from methylated cytosine bases within the nucleotide sequences evaluated. A single PCR amplicon from each of the 25 distinct tumor-related genes was generated using bisul te-converted DNA from hepatic carcinoma and adjacent normal tissues. The average amplicon length was 279.4 base pairs (bp; range 120-650 bp), covering an average of 36.8 CpG dinucleotides per amplicon (range 2-145 CpG sites). The optimized multiple PCR primer panel was used to perform multiple PCR ampli cation using the transformed sample genome as the template. Gene coverage depth analysis A total of 11,368,632 sequencing readings were aligned and overall, the number of readings of T sample is signi cantly higher than that of Ca sample. Table 1 summarizes the statistics of the number of reads aligned and used for downstream methylation analyses for each CpG island queried. Although more readings were obtained in the T sample, the differences between the readings for each gene in the T sample were small and followed the same trajectory (except for T14 sample). Similarly, the distribution of each gene in the Ca sample was uniform. Figure 1A and Fig. 1B show the coverage depth of each gene in four Ca samples and four T samples, respectively. And Fig. 1C and Fig. 1D show the coverage depth of four tumor suppressant genes and cancer-related genes in different Ca samples and T samples, respectively. Because read distribution bias was extreme for one sample (T14; other genes are not shown) and large gaps in coverage were observed, this genomic locus was not evaluated further. Although each gene has different coverage depth in the same sample, the locus of read distribution of each gene is consistent in all samples (Fig. 1A, Fig. 1B

Analysis of differences in methylation ratio
Using a sophisticated single nucleotide polymorphism identi cation algorithm, we can accurately determine the position of each methylation site, the sequencing depth, and the methylation ratio in the 25 amplicons of a single sample. The difference in methylation ratio of 25 amplicons (and their subtypes) between the two groups of samples was obtained through data analysis. Three methylated types (CG, CGH, and CHH) were analyzed, and their methylation ratios of these three methylated types in the amplicons of Ca and T were calculated. Sequencing results showed that CG-type methylation was present in all amplicons, including 45 of the 46 subtypes. CHG and CHH were only present in 4 amplicons (nine subtypes). The comparison of methylation data is shown in Fig. 2. In Fig. 2A, we observed that among the 45 amplicon subtypes, the methylation ratio of 3 amplicons (DAB2IP, 15.44% difference; Prdm14-1, 16.34% difference; Rab31-1, 14.36% difference) showed a signi cant difference between Ca and T (greater than 10%), whereas 19 amplicons showed no signi cant difference (the methylation patterns were greater than 1% and less than 10%) and the remaining 23 amplicons only showed slight differences (less than 1%). The methylation ratios of 20 amplicons were up-regulated while those of 25 were down-regulated in Ca. It is worth noting that the methylation ratio of all 25 down-regulated amplicons was less than 4% between Ca and T, and most were less than 1%. In Fig. 2B, 3 amplicons showed insigni cant differences in methylation ratio between Ca and T, and 6 showed extremely slight differences. Among the 9 amplicons, the methylation ratio of 3 amplicons was up-regulated in Ca, while that of 6 amplicons was down-regulated. Analysis of CHH methylation mode (Fig. 2C) revealed that 2 amplicons showed insigni cant methylation ratio difference between Ca and T, and 7 showed extremely slight differences. Only one amplicon showed an up-regulated methylation ratio in Ca, while the others were down-regulated.
Since there is no clear threshold for DNA methylation, it is not clear whether these subtle differences have physiological signi cance. However, there is evidence that very small changes in DNA methylation patterns may also affect the physiology of cells and organisms [23][24][25] . Therefore, any change in methylation pattern need to be taken seriously. Physiological changes can be caused by changes in methylation of more than 10%, which can be regarded as the key analysis direction. For example, Waterland et al. found that a change in methylation of only 10% was enough to cause phenotypic variation in live yellow aguti mice [26] . In view of the obvious changes in methylation ratio of DAB2IP, prdm14-1, and rab31-1, it is speculated that the methylation abnormalities of these 3 amplicons may be related to the growth and development of hepatic carcinoma cells. The speci c relationship still needs further exploration.

Haplotype type analysis
According to analysis by genome-wide Association Studies (GWASs), the statistical data indicate that single nucleotide polymorphisms (SNPS) are the predominant form of DNA sequence variation.The occurrence of SNPS may be due to individual differences or disease susceptibility. Haplotype-dependent allele-speci c methylation (hap-ASM) is also thought to impact disease susceptibility, but maps of this phenomenon using stringent criteria in disease-relevant tissues remain sparse. According to the research resultsIn of Mirabello et al., the HPV16 methyl haplotype determined by a new-generation sequencing method is considered to be associated with pre-cervical cancer [20] . Jo et al. found that haplotypes can affect DNA methylation levels and promoter activity in breast carcinomas [21] . In the study on the mechanism of DNA methylation and disease association, methyl-SeQ was used to sequence cells and tissues such as brain, T lymphocytes, glial cells, puri ed neurons and placenta, and a large number of unreported quantitative trait loci (mQTLs) and differential methylated regions (DMRs) were nally analyzed. The results showed that hap-ASM is highly tissue-speci c; an important trans-acting regulator of genomic imprinting is regulated by this phenomenon; variations in CTCF and TF binding sites are an underlying mechanism, and maps of hap-ASM and mQTLs reveal regulatory sequences underlying supraand sub-threshold GWAS peaks in immunological and neurological disorders [22] .
Allele-speci c methylation in uences allele-speci c expression. Therefore, the methylation state of haplotypes within genetically associated regions can determine epigenetic differences with potential functional effects. DNA methylation data and association-determined risk and non-risk haplotypes can be compared by haplotype-speci c methylation analysis. With the help of high-throughput sequencing, we can carry out multi-dimensional methylation analysis. In this study, the amplicon was used as the unit of analysis to analyze the haplotype of CpG loci, and the results are listed in the site haplotype (Table 2).

Analysis of the difference in haplotype abundance
T-test, u-test, and logistic regression models were used to analyze the difference in haplotype abundance between the two groups of samples. Table 3 demonstrates that only a few haplotypes showed signi cant differences in abundance between the two groups of samples. For example, among the 8 haplotypes of the ARID1A_1_ sequence, only TTCTTTTT showed statistical signi cance after t-test analysis (p ≤ 0.5). Distinct from genomic imprinting, where the methylation of an allele is determined by its parent-of-origin, for loci with hap-ASM, the local sequence context (haplotype) acts in cis to dictate the methylation status of local CpGs [22] . Abundance differences in haplotypes such as TTCTTT between Ca and T samples may affect tumor development by indicating the methylation status of local CpGs. and distinct from T samples (Fig. 3). This indicates that all Ca samples have the same trend of methylation status change, which may lead to changes in the physiological properties of normal samples.
To evaluate the classi cation power of the identi ed differentially methylated DNA regions, we performed cluster analysis according to the methylation level of CpG loci in the amplicon. The results showed that all samples were divided into two clusters (Fig. 4). T tissues were clustered in one group (except T2 and T3), while Ca tissues were clustered on the other side. Heatmap intuitively shows the relative methylation level of CpG sites in each sample. Figure 4 shows that almost all T samples showed low methylation levels in blue, while Ca samples on both sides showed higher methylation levels, indicated by orange or red. On the whole, this reveals that the methylation level of Ca samples is improved compared with that in homologous T samples.

Conclusion
High-throughput sequencing technology offers a powerful method for methylation analysis of target DNA in hepatic carcinoma tissue and adjacent normal tissues. Importantly, the cost of high-throughput sequencing is no higher than that of conventional bisul te sequencing.     Cluster analysis. Each cell of the heatmap represents the relative methylation level of CpG sites in the corresponding row of samples and re ects the relative methylation level of the sites by color gradient.
Blue and red represent low and high levels of methylation, respectively. The similarity of methylation level is shown by row arrangement order. Adjacent rows indicate higher overall similarity of sample methylation level. The tree on the left of the gure systematically describes the degree of similarity.