Evaluation of clinical formalin‐fixed paraffin‐embedded tissue quality for targeted‐bisulfite sequencing

Formalin‐fixed paraffin‐embedded (FFPE) tissues are promising biological resources for genetic research. Recent improvements in DNA extraction from FFPE samples have allowed the use of these tissues for multiple sequencing methods. However, fundamental research addressing the application of FFPE‐derived DNA for targeted‐bisulfite sequencing (TB‐seq) is lacking. Here, we evaluated the suitability of FFPE‐derived DNA for TB‐seq. We conducted TB‐seq using FFPE‐derived DNA and corresponding fresh frozen (FF) tissues of patients with kidney cancer and compared the quality of DNA, libraries, and TB‐seq statistics between the two preservation methods. The average fragment size of the FFPE‐derived DNA, approximately 600 bp, was significantly shorter than that of the FF‐derived DNA. The yield of sequencing libraries constructed using FFPE‐derived DNA and the mapping ratio were approximately 10 times and 10% lower, respectively, than those obtained using FF‐derived DNA. In the mapped data of FFPE‐derived DNA, duplicated reads accounted for > 60% of the obtained sequence reads, and the mean on‐target coverage was lower. Therefore, the standard TB‐seq protocol is inadequate for obtaining high‐quality data for epigenetic analysis from FFPE‐derived DNA, and technical improvements are necessary to enable the use of archived FFPE resources.


INTRODUCTION
Formalin-fixed paraffin-embedded (FFPE) tissue has long been used as a permanent specimen preservation method in pathological and histological studies, including immunohistochemical 1 and in situ hybridization 2 studies. Formalin fixation results in cross-linking of primary amines in nucleotides and amino acids, leading to DNA/RNA fragmentation and enzymatic activity inhibition. 3 Thus, FFPE-derived DNA is generally of low quality and considered unsuitable for omics analyses, including genome, transcriptome, and DNA methylation (DNAm) analyses. Recently, kits with improved efficiency for FFPE-derived DNA extraction have become commercially available, allowing the preparation of sequencing libraries from FFPE samples. These technical improvements led to the use of FFPE-derived DNA for whole-genome sequencing, 4 DNA capture sequencing, 5,6 RNA sequencing, 7 and whole-exome sequencing. 8,9
DNAm is an epigenetic mechanism that modulates gene expression. DNAm profiles possess developmental stage specificity and cell/tissue type specificity. 10 Hence, DNAm patterns are considered novel biomarkers for clinical diagnosis before the onset and in the early stages of diseases. Array-based DNAm analysis is widely used owing to its relatively low cost. 11 High-quality DNAm data can be obtained from 50 ng of FFPE-derived genomic DNA (gDNA). 12 However, because array-based analysis covers only a limited subset of CpG sites, most potential biomarkers remain unsurveyed.
Recently, we identified a novel DNAm biomarker for severe aortic valve stenosis using TB-seq and demonstrated that TB-seq is useful in searching for novel DNAm biomarkers. 13 FFPE-derived DNA represents a vast pathological resource that could be utilized for DNAm analysis. However, there is no established approach for the use of FFPE-derived DNA for DNAm analysis. Here, we focused on TB-seq and compared the quality of gDNA, libraries, and TB-seq statistics obtained from fresh frozen (FF) or FFPE samples.

MATERIALS AND METHODS
Samples and ethics
The FFPE and FF samples were prepared from the same cancerous and non-cancerous tissue samples of the same individual. FFPE and FF samples were microscopically and macroscopically observed by a pathologist and collected from non-necrotic areas. FFPE samples were fixed using 10% neutral-buffered formalin for 1-3 days in accordance with the Japanese Society of Pathology Guidelines for the Handling of Pathological Tissue Samples for Genomic Research. 14 Written informed consent was obtained before treatment. This study was approved by the Ethics Committee of the National Cancer Center, Tokyo, Japan, and Keio University.

DNA extraction, library preparation, and sequencing
The workflow summary is shown in Figure 1. FFPE-derived DNA was prepared as previously reported. 12 gDNA was extracted from FF tissue using the QIAamp DNA Mini kit (Qiagen, Hilden, Germany). A GeneRead DNA FFPE kit (Qiagen) was used to extract gDNA from FFPE tissue. DNA yield and quality were assessed using the Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA). Absorbance was measured using a Nanodrop 2000/2000c Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).
To prepare DNA for library construction, 1 μg of gDNA was sheared to a size range of 150-175 bp using a focused-ultrasonicator (Covaris Inc., Woburn, MA, USA). Sequencing libraries were prepared with the Agilent SureSelect XT Human Methyl-Seq Capture Library and Reagent kit (Agilent) according to the manufacturer's instructions. Finally, all libraries were treated with sodium bisulfite using the EZ DNA Methylation-Gold Kit (Zymo Research, Irvine, CA, USA).
Library fragment size was measured using a D1000 ScreenTape and Reagents Kit (Agilent), and the yields were quantified by real-time PCR using Kapa Library Quantification Kit (NIPPON Genetics Co., Ltd., Tokyo, Japan). The libraries were sequenced (2 × 125 bp) using a HiSeq2500 (Illumina Inc., San Diego, CA, USA).

HumanMethylation 450 K array analysis
Fifty nanograms of FFPE-derived gDNA was treated with bisulfite conversion reagent and examined using the HumanMethylation 450 K array (Illumina). The number of detected CpGs was calculated using the GenomeStudio software.

RESULTS
To ascertain whether FFPE-derived DNA is suitable for TB-seq analysis, we compared the quality of gDNA, libraries, and sequence statistics of FF and FFPE samples collected from four kidney cancer patients.

Comparison of sample quality between FFPE and FF tissues
The absorbance ratio at 260 and 280 nm (A260/280) of the isolated DNA solutions from FF and FFPE samples ranged between 1.86 and 1.92 (Table 1), indicating high quality. The A260/230 ratio of FF-derived DNA ranged between 1.73 and 2.10, indicating relative purity. However, the A260/230 ratio of FFPE-derived DNA ranged between 1.05 and 1.23 (Table 1). The mean FF-derived gDNA peak size was > 50,000 bp (Figure 2A and B), whereas it was approximately 550 bp for the FFPE-derived gDNA (Figure 2C and D). The DNA integrity number of the FFPE-derived DNA was 1.6, indicating extremely low quality. Sequencing libraries generated from FFPE-derived DNA yielded approximately 4.4-5.0 nM, approximately one-tenth of the yield obtained from FF-derived DNA (Table 1).
Table 1 note: Because DNA from FFPE-CT and FFPE-NT was concentrated after the DNA extraction, the concentration of gDNA from FFPE was 10 times higher than in FF. FF-CT, fresh-frozen cancer tissue; FF-NT, fresh-frozen non-cancerous tissue; FFPE-CT, formalin-fixed paraffin-embedded cancer tissue; FFPE-NT, formalin-fixed paraffin-embedded non-cancerous tissue; Conc., concentration; A260/280, absorbance ratio between 260 nm and 280 nm; A260/230, absorbance ratio between 260 nm and 230 nm; DIN, DNA integrity number.

Comparison of library quality and sequencing statistics between FFPE and FF tissues
The amounts of raw data obtained from FFPE libraries were approximately 10% lower than those from FF libraries (paired t-test P = 0.0042) (Table 2), although the concentration of libraries used for sequencing was the same (Table 2). The ratios of mapped reads to the reference genome were approximately 95% in FF libraries and 85% in FFPE libraries; the FFPE ratio was similar to that of high-quality, optimized, reduced representation bisulfite sequencing using 50 ng of FFPE-derived DNA. 17 Nevertheless, the ratio of PCR duplicate reads was higher in FFPE libraries than in FF libraries (> 60% vs. 10-20%) (Table 2). The average ratio of on-target reads for FF samples was approximately 80%, and the mean on-target coverage was over 30×. However, for FFPE samples, we observed on-target read ratios of < 70% and a mean on-target coverage of 7.2-7.3× (Table 2). Of the 3.7 million CpGs targeted by the Agilent ready-made probes, more than 3.6 million were detected in FF samples (> 97%) and only 3 million in FFPE samples (approximately 81%) (Table 2).
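The detection percentages quoted above follow directly from the CpG counts. A minimal Python sketch of that arithmetic is given here, assuming the rounded counts stated in the text (3.7 million targeted CpGs; 3.6 million and 3.0 million detected) rather than the exact values in Table 2. The function name is illustrative and not part of the study's pipeline.

```python
def detected_percent(detected_cpgs: float, designed_cpgs: float) -> float:
    """Return the percentage of targeted CpGs that were detected."""
    if designed_cpgs <= 0:
        raise ValueError("designed_cpgs must be positive")
    return 100.0 * detected_cpgs / designed_cpgs


# Rounded counts taken from the text (assumption: exact counts are in Table 2).
DESIGNED = 3.7e6  # CpGs targeted by the capture probes

ff_percent = detected_percent(3.6e6, DESIGNED)    # FF: "more than 3.6 million"
ffpe_percent = detected_percent(3.0e6, DESIGNED)  # FFPE: "only 3 million"

print(f"FF: {ff_percent:.1f}%  FFPE: {ffpe_percent:.1f}%")
```

Because the FF count is reported as a lower bound, the FF percentage computed this way is also a lower bound.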

Number of detected CpGs of FFPE samples in HM450 microarray analysis
To compare the results of the quality assessment of TB-seq in this study with those of microarray analysis, HM450 microarray analysis was performed on FFPE samples. The results showed that more than 97% (P < 0.05) or more than 95% (P < 0.01) of the CpGs were detectable in the FFPE samples (Table 3).

DISCUSSION
FFPE-derived DNA is a rich source of genetic material for molecular diagnostic and pathological studies. Although FFPE-derived DNA has recently become usable in multiple sequencing methods, 4-9 sequencing-based DNA methylation analysis of such DNA is still largely unreported. In contrast, DNA methylation analysis using microarrays is widely used, and microarray analysis of as little as 50 ng of FFPE-derived DNA has been established, as reported by Ohara et al. 12 Our microarray analysis results are consistent with that report 12 and show that high-quality data can be obtained from FFPE-derived DNA.
Our TB-seq results indicate that the quality of FFPE-derived DNA is lower than that of FF-derived DNA, reflecting organic contaminants in the isolated DNA solution and fragmentation of gDNA. These results are consistent with those of previous reports. 18,19 Furthermore, the organic contamination and fragmentation of gDNA may have caused PCR bias and increased the fraction of duplicate reads in the FFPE samples. The high percentage of duplicate reads in the PCR-amplified FFPE libraries indicates that the amount of input gDNA used in the existing TB-seq protocol was not enough to ensure the complexity of the libraries. In addition, to obtain the same level of on-target coverage from FFPE samples as from FF samples, approximately five times as much sequencing data would be necessary, which is not cost-effective.
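The "approximately five times" estimate can be checked against the coverage figures reported in the Results. The sketch below is a rough illustration only: it assumes that on-target coverage scales linearly with the amount of sequencing data, which ignores the additional loss from duplicate reads, and it uses 30× as the FF coverage even though the text reports "over 30×". The function name is hypothetical.

```python
def fold_more_data(target_coverage: float, observed_coverage: float) -> float:
    """Fold increase in sequencing data needed to reach target_coverage,
    assuming coverage grows linearly with data (a simplifying assumption)."""
    if observed_coverage <= 0:
        raise ValueError("observed_coverage must be positive")
    return target_coverage / observed_coverage


FF_COVERAGE = 30.0  # lower bound; the text reports "over 30x" for FF samples

# FFPE mean on-target coverage range reported in the Results.
for ffpe_coverage in (7.2, 7.3):
    fold = fold_more_data(FF_COVERAGE, ffpe_coverage)
    print(f"FFPE at {ffpe_coverage}x needs at least {fold:.1f}-fold more data")
```

Since 30× is a lower bound for the FF coverage, the resulting ratio of just over fourfold is a lower bound as well, which is in line with the roughly fivefold figure stated above.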
In conclusion, the existing protocol for TB-seq is insufficient for preparing sequencing libraries from FFPE-derived DNA for general DNAm analysis. Technical improvements limiting the DNA fragmentation are necessary before reliably using the archived FFPE resources for this analysis. This approach will help facilitate epigenetic research using FFPE tissue archives and identify novel DNAm biomarkers for various diseases and environmental exposures.
Table 2 note: Near-target bases are defined as the detected bases mapped within 1 kb upstream or downstream of Agilent ready-made probes. FF-CT, fresh-frozen cancer tissue; FF-NT, fresh-frozen non-cancerous tissue; FFPE-CT, formalin-fixed paraffin-embedded cancer tissue; FFPE-NT, formalin-fixed paraffin-embedded non-cancerous tissue.