A streamlined method for analysing genome-wide DNA methylation patterns from low amounts of FFPE DNA

Formalin-fixed, paraffin-embedded (FFPE) tumor samples are a major source of DNA from patients in cancer research. However, FFPE is a challenging material to work with due to macromolecular fragmentation and nucleic acid crosslinking. FFPE tissue poses particular challenges for methylation analysis and for preparing sequencing-based libraries that rely on bisulfite conversion. Successful bisulfite conversion is a key requirement for sequencing-based methylation analysis. Here we describe a complete and streamlined workflow for preparing next generation sequencing libraries for methylation analysis from FFPE tissues. This includes counting cells from FFPE blocks and extracting DNA from FFPE slides, testing bisulfite conversion efficiency with a polymerase chain reaction (PCR) based test, preparing reduced representation bisulfite sequencing libraries and massively parallel sequencing. The main features and advantages of this protocol are: (1) an optimized method for extracting good quality DNA from FFPE tissues; (2) an efficient bisulfite conversion and next generation sequencing library preparation protocol that uses 50 ng DNA from FFPE tissue; and (3) incorporation of a PCR-based test to assess bisulfite conversion efficiency prior to sequencing. We provide a complete workflow and an integrated protocol for performing DNA methylation analysis at the genome-scale and we believe this will facilitate clinical epigenetic research that involves the use of FFPE tissue.


Background
The development of next generation sequencing technologies has facilitated large-scale quantification of DNA methylation. The progressive improvement in profiling global DNA methylation provides a great opportunity for analyzing large numbers of clinical samples and detecting aberrant epigenetic marks [1]. Formalin-fixed, paraffin-embedded (FFPE) tissues represent a major source of samples in clinical research, especially in cancer research. In many cases, particularly in retrospective studies, FFPE tissue is the only available material. Therefore, it is essential to use FFPE tissues efficiently to obtain high resolution genomic and epigenomic data from clinical specimens.
However, FFPE is a challenging material for generating epigenomic data. The formalin fixation process leads to DNA damage due to fragmentation [2]. Indeed, nucleic acids from FFPE samples generally contain smaller fragments (less than 300 bp) [3,4]. Further, the nature of tissue preparation leads to cross-linking of DNA and proteins [5]. The cross-linking process increases the mechanical stress on DNA and contributes to DNA degradation. In addition, non-buffered formalin that was used historically oxidizes to generate formic acid which results in DNA cleavage [6]. Further, several additional factors influence the quality of nucleic acids derived from FFPE; for example, duration of fixation, composition of fixative (concentration of formalin, pH and salt concentration), temperature and tissue type [3,7,8,9]. Furthermore, processing of FFPE tissues for DNA extraction could affect the quality and downstream application. For good yield and quality, the lysis protocol that is used needs to effectively remove the DNA-protein cross links [10]. In addition, deparaffinization of the FFPE tissues is a crucial step. Deparaffinization procedures are considered to have great impact on the quality and quantity of nucleic acids extracted from FFPE blocks [11]. For DNA methylation analysis, bisulfite conversion is the most commonly used method that allows analysis of methylated and unmethylated CpG sites after the bisulfite treatment [12]. Bisulfite conversion will further degrade DNA [13,14] and therefore methylation analysis on FFPE samples presents an additional challenge. Some previous studies have performed DNA methylation analysis of FFPE tissues [15][16][17][18][19]. These studies have provided valuable insights regarding the factors affecting methylation analysis of FFPE samples, and provided promising results for the use of FFPE material for DNA methylation profiling.
However, in the previous sequencing-based methylation studies on FFPE samples, the mapping rates of sequenced reads (to the reference genome) were lower than those from fresh tissue.
Here we describe a complete and optimized workflow for preparing next generation sequencing libraries for methylation analysis from FFPE tissues (Fig. 1). This includes counting cells from FFPE blocks and extracting DNA from FFPE slides, testing bisulfite conversion efficiency with a polymerase chain reaction (PCR) based test and sequencing. We have optimized a method for extracting good quality DNA from FFPE tissues for methylation analysis. We have prepared next generation sequencing libraries (using reduced representation bisulfite sequencing [RRBS]) with 50 ng DNA from FFPE tissues and we demonstrate the utility of a PCR-based test to assess bisulfite conversion efficiency prior to sequencing. Following the described protocol we obtained high quality methylation data and a higher mapping efficiency than previous studies.

Extraction of DNA from FFPE tissues
We have adopted the standard Qiagen kit protocol with some modifications. Additional incubation in proteinase K (steps 5 and 6) resulted in improved bisulfite conversion in some samples. The FFPE samples were obtained from primary, non-invasive (in-situ) melanomas. The FFPE blocks were collected under the regulation of the Health and Disability Ethics Committee (Ethics protocol number: LRS1102002).

Materials
QIAamp DNA FFPE Tissue Kit (Qiagen #56404), with MinElute columns
Heating Block
Xylene (LabServ Analytical Grade #BSPXL864)
Proteinase K (Life Technologies #25530-015)

Procedure
1. FFPE tissue slices (10 μm thickness) were placed in 1.5 mL microcentrifuge tubes, 1 mL xylene was added and the tube was vortexed for 10 s.

During DNA extraction from FFPE tissue, an important step is the efficient removal of paraffin, as incomplete paraffin removal can inhibit downstream applications such as PCR. We used a common technique for paraffin removal, which uses washes with xylene and ethanol [7]. We found that this technique worked very well with our FFPE samples. In our hands, using FFPE slices derived from the same blocks with a non-solvent based kit (Macherey-Nagel Nucleospin FFPE DNA) did not give optimal DNA that could be amplified by PCR.

40). For each of these sections the area (in μm²) was determined using the Pen Tool.
4. Next, from each of these sections, a further 3 to 5 subsections were defined. The number of subsections measured from a section was determined by the total area of the section. Each subsection was recorded using the Snapshot tool at 40X magnification.
5. Using ImageJ, each subsection was then converted into an 8-bit image with the background subtracted.
6. Next, we applied threshold adjustment parameters to these subsections in ImageJ; this operation allows cells to be differentiated from each other.
7. The image was then converted to "Mask", followed by selection of the "Fill Holes" option to fill any empty areas within each cell that were lost during threshold adjustment.
8. "Watershed" was then selected to provide a division point between joined cells.
9. Next, we completed a particle analysis with a threshold pixel size of 120 and exclusion of cells from the edge of the sample.
10. The average cell count for a section was determined from the area and the number of cells in the subsections, and then scaled to the total area of the section.
11. Because these calculations were done in a 5 μm section, each cell count was doubled to provide an approximation of the total cell count for a 10 μm section.
12. The DNA yield reported here was measured using a Nanodrop (NanoDrop, USA). Nanodrop is widely available in standard molecular biology laboratories. For next generation sequencing applications, we recommend using the Qubit Fluorometer (Invitrogen), which provides sensitive assays for low amounts of DNA.
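The extrapolation in steps 10 and 11 can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the script used in this study: the function name `estimate_cell_count` and its arguments are hypothetical, and it assumes that the cell density is pooled over all measured subsections, scaled to the total section area, and then doubled to go from a 5 μm to a 10 μm section.

```python
def estimate_cell_count(section_area_um2, subsections, thickness_factor=2.0):
    """Estimate the number of cells in a 10 um FFPE section.

    section_area_um2 : total area of the section (um^2), e.g. from the Pen Tool.
    subsections      : list of (area_um2, cell_count) pairs, one per subsection
                       counted by particle analysis (hypothetical input layout).
    thickness_factor : 2.0 doubles a count made on a 5 um section (step 11).
    """
    if not subsections:
        raise ValueError("at least one subsection is required")
    measured_area = sum(area for area, _ in subsections)
    measured_cells = sum(count for _, count in subsections)
    if measured_area <= 0:
        raise ValueError("subsection areas must be positive")
    # Cells per um^2, pooled over every measured subsection.
    density = measured_cells / measured_area
    # Scale the density to the whole section, then to 10 um thickness.
    return density * section_area_um2 * thickness_factor
```

For example, two subsections of 10,000 μm² containing 50 and 70 cells give a density of 0.006 cells per μm²; for a section of 1,000,000 μm² this extrapolates to about 12,000 cells in a 10 μm slice. Pooling the subsections weights each one by its area; averaging per-subsection densities would be an equally reasonable choice.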

Preparation of libraries for reduced representation bisulfite sequencing (RRBS) or other genome-wide methylation applications (from FFPE samples)
Following the extraction of DNA, the next step is to prepare libraries for sequencing to profile genome-wide methylation patterns. The genome-wide techniques employ a common principle for analysis: a local treatment of the genome to distinguish between methylated and unmethylated sites, followed by global investigation of these modified sites to derive methylation patterns. The global investigation approaches are generally next generation sequencing or array platforms. For local treatment, there are three main approaches. These are: 1) restriction endonucleases that cleave DNA at specific recognition nucleotide sequences; 2) bisulfite conversion: treating DNA fragments with sodium bisulfite before PCR analysis. Sodium bisulfite treatment of DNA converts cytosine (C) residues to uracil (U), but leaves 5-methylcytosine residues unchanged [20]. RRBS falls under this category. One note is that sodium bisulfite treatment does not distinguish between 5-methylcytosine and 5-hydroxymethylcytosine [21]; 3) an affinity enrichment method involving the application of an antibody (specific for methylated cytosines) to enrich for methylated regions in the genome by immunoprecipitating genomic DNA [22]. Sodium bisulfite treatment is a convenient and commonly used treatment prior to genome-wide sequencing. We used RRBS on FFPE samples to assess whether successful libraries could be prepared following the method described here. RRBS utilises bisulfite conversion combined with next-generation sequencing to provide single-nucleotide resolution methylation information at a genome-scale. Although we have tested the described protocol for RRBS, it is highly likely that these protocols could be used for other genome-wide methylation applications, as the principles are similar.
The method for RRBS library preparation and its application has been extensively described previously [23][24][25][26][27][28][29]; however, it is also possible to select any other size ranges for reduced representation libraries.

Assessment of bisulfite conversion efficiency with PCR before next generation sequencing
Next-generation sequencing of the prepared libraries is an expensive step. The most critical aspect for the success of methylation-sequencing libraries is efficient bisulfite conversion. Therefore, using commercially available kits, we have incorporated a PCR-based test for assessing the success of bisulfite conversion for methylation-sequencing libraries. Here we describe this method and also demonstrate the utility of this test with an example from generating FFPE RRBS libraries.
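The principle of bisulfite conversion described above can be illustrated in silico. The following Python sketch is an illustration only, not part of the wet-lab protocol: the function names are hypothetical, it models a single strand, and it assumes complete conversion, with unmethylated cytosines read as thymine after PCR (uracil is amplified as thymine) and methylated cytosines read as cytosine.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C -> U, amplified and read as T; methylated C stays C.
    methylated_positions holds 0-based indices of protected cytosines
    (5-methylcytosine or 5-hydroxymethylcytosine, which bisulfite
    treatment cannot tell apart).
    """
    protected = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_cytosines(reference, read):
    """Compare a converted read with the reference at every reference C."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = "methylated"      # protected from conversion
        elif read_base == "T":
            calls[i] = "unmethylated"    # converted by bisulfite
    return calls
```

For the reference `ACGTCCGA` with a methylated cytosine at index 1, `bisulfite_convert` returns `ACGTTTGA`, and `call_cytosines` recovers index 1 as methylated and indices 4 and 5 as unmethylated. Real aligners such as Bismark handle both strands and mismatches; this sketch deliberately omits them.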

Assessment of DNA yields from FFPE
We performed cell counting of eight FFPE sections and compared the DNA yield (Fig. 2 and Figure S1 in Additional file 1). We counted the cell number in 5 μm sections and doubled the cell count to provide an approximation of the total cell count for a 10 μm section (results are shown in Table 1). We confirmed that the number of cells in a 10 μm section was correlated with the total DNA yield from that section (Pearson r = 0.67, correlation after log2 transformation = 0.74, Figure S2 in Additional file 1). However, the observed and expected recovery rates substantially differed between samples. Here we provide the method that we used for counting the number of cells in a 10 μm FFPE slide and the corresponding DNA yield for these samples (Table 1). However, this represents one of many possible methods.
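The comparison between cell counts and DNA yield can be sketched in Python as follows. This is an illustrative sketch rather than the analysis code used here. It assumes roughly 6.6 pg of DNA per diploid human cell, a commonly quoted approximation that is not taken from this study and that may not hold for aneuploid tumor cells; all function names are hypothetical.

```python
import math

# Assumption (not from this study): ~6.6 pg DNA per diploid human cell.
PG_DNA_PER_DIPLOID_CELL = 6.6


def expected_yield_ng(cell_count, pg_per_cell=PG_DNA_PER_DIPLOID_CELL):
    """Expected DNA mass (ng) if every counted cell were fully recovered."""
    return cell_count * pg_per_cell / 1000.0


def recovery_rate(observed_ng, cell_count, pg_per_cell=PG_DNA_PER_DIPLOID_CELL):
    """Observed yield as a fraction of the expected yield."""
    expected = expected_yield_ng(cell_count, pg_per_cell)
    if expected <= 0:
        raise ValueError("cell_count must be positive")
    return observed_ng / expected


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def log2_pearson_r(xs, ys):
    """Pearson correlation after log2-transforming both variables."""
    return pearson_r([math.log2(x) for x in xs], [math.log2(y) for y in ys])
```

Under the stated assumption, 100,000 counted cells correspond to an expected yield of about 660 ng, so an observed yield of 330 ng would be a recovery rate of about 0.5. The log2 variant reduces the influence of a few very large sections on the correlation.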

Preparation of libraries for reduced representation bisulfite sequencing (RRBS) or other genome-wide methylation applications (from FFPE samples)
The quality of the RRBS libraries prepared from melanoma FFPE slices was assessed on a 2100 Bioanalyzer (Agilent Technologies) using the high sensitivity DNA chip. Bioanalyzer traces of two representative FFPE-derived RRBS libraries are shown in Fig. 3.

Assessment of bisulfite conversion efficiency with PCR before next generation sequencing
Fig. 4a) In addition, as expected, the untreated genomic DNA and PCR blank (dH2O) showed no amplification (lanes 5 and 6 respectively). We have shown the results for this PCR test for RRBS libraries in Fig. 4b. RRBS libraries were prepared with different input DNA and then bisulfite converted and amplified with Illumina primers to recover enough DNA for next generation sequencing (lanes 2-4 contain libraries prepared with 50, 100 and 500 ng DNA input respectively, lane 5 contains a methylated positive control and lane 6 contains a water negative control, Fig. 4b). Similar to previous observations, a 182 bp band was visible in the gel as expected, suggesting that all these libraries were successfully converted. We also noticed an additional band at ~120 bp (Fig. 4b). We have previously described the presence of this additional band, which is likely to arise from adaptor-adaptor dimerization [30]. For all three libraries (50, 100 and 500 ng input), the same adaptor concentration was used, and the gel image indicates a higher intensity of potential adaptor dimers in libraries with lower DNA input (i.e., a stronger band from the 50 ng library compared to the 500 ng library, Fig. 4b).

Sequencing, alignment and analysis of RRBS libraries from the FFPE DNA samples
To test whether good quality RRBS sequences could be obtained from the FFPE libraries, we performed massively parallel sequencing on two FFPE RRBS libraries (FFPE1 and FFPE2 as shown in Figs. 3 and 4). For this test purpose, Illumina MiSeq was used to generate single-ended 151 bp sequences. We obtained 109,352 and 78,364 sequenced reads for FFPE1 and FFPE2 samples respectively. We assessed the quality of the sequenced reads by plotting Phred quality scores along the read position using the FastQC program (from Babraham Institute, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The higher the Phred score, the better the base call (calculated using the formula Qphred = −10 log10(e), where e is the estimated probability of a base being incorrectly identified). For both the FFPE libraries, very high quality sequenced reads were obtained (mean quality score of 32 and 35 for FFPE1 and FFPE2 respectively, Figure S3 in Additional file 1). Consistent with this observation, we also found no traces of N bases in either of these samples (if the base-caller cannot determine a base, it replaces it with N; such bases are not usable and cause misalignment if present in the dataset) (Figure S4 in Additional file 1). The Illumina platform uses sequencing-by-synthesis chemistry to sequence the DNA molecules and, as a result of the accumulation of errors, the base calling is less accurate at the end of the reads (3′ end). Our sequenced reads were 151 bp long and we observed relatively decreased sequence quality towards the 3′ end of the sequence, consistent with previous quality reports on RRBS sequenced reads (Figure S3 in Additional file 1).
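The Phred formula above can be made concrete with a short Python sketch. This is an illustration only and not part of the FastQC program: the function names are hypothetical, and the decoding helper assumes the Phred+33 ASCII encoding used in current Illumina FASTQ files.

```python
import math


def phred_to_error_prob(q):
    """Invert Qphred = -10 * log10(e): return e for a given quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(e):
    """Qphred = -10 * log10(e), where e is the base-call error probability."""
    if not 0 < e <= 1:
        raise ValueError("e must be in the interval (0, 1]")
    return -10 * math.log10(e)


def decode_fastq_quality(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    offset=33 assumes Phred+33 encoding (an assumption about the input).
    """
    return [ord(ch) - offset for ch in quality_string]


def mean_quality(quality_string, offset=33):
    """Arithmetic mean of the Phred scores of one read."""
    scores = decode_fastq_quality(quality_string, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)
```

A score of 30 corresponds to an error probability of 1 in 1000. By the same formula, the mean scores of 32 and 35 reported here correspond to error probabilities of roughly 6 in 10,000 and 3 in 10,000 per base.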
Finally, we aligned these sequenced reads with the reference human genome (GRCh37 build) using the bisulfite aligner Bismark (version: v0.14.3). Alignment was performed after processing the reads and removal of adaptors as described previously [20,31]. After processing, 63,870 and 69,988 sequenced reads were accepted for analysis and alignment for FFPE1 and FFPE2 respectively. This was an acceptance rate of 58.4% and 89.3% of sequenced reads for further analysis. In the human genome, especially in differentiated somatic cells, the proportion of non-CpG methylation is low [32,33]. If an RRBS or other whole genome-scale bisulfite treated library shows a high level of non-CpG methylation, it is likely that this result stems from failure of bisulfite conversion. The Bismark analysis of FFPE1 and FFPE2 samples indicated a very low level of non-CpG methylation. For FFPE1, both the CHG and CHH context methylation was 1.2%, while for the FFPE2 library, CHG and CHH context methylation was 1.2% and 1.1% respectively (Data S1 and S2 in Additional file 1). This percentage is the summation of the actual non-CpG methylation in the genome and incomplete bisulfite conversion. Taken together, these results indicate a high efficiency of bisulfite conversion in these FFPE libraries, consistent with the PCR test described in Fig. 4 for the several FFPE samples. Recently, unmethylated Lambda DNA spike-in was used to test bisulfite conversion efficiency in RRBS libraries. After sequencing and alignment, the non-conversion rate was calculated as the number of sequenced cytosines in non-CG contexts (i.e., unconverted cytosines) divided by all the covered cytosines in non-CG contexts in the lambda DNA genome [34].
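The read acceptance rates and the non-conversion rate described above are simple ratios, sketched here in Python. This is a minimal illustration with hypothetical function names, not output from or an interface to Bismark. The read counts are those reported in the text; any cytosine counts passed to `non_conversion_rate` would come from the user's own alignment results.

```python
def percent(part, whole):
    """Return part / whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def non_conversion_rate(unconverted_non_cg, covered_non_cg):
    """Fraction of covered non-CG cytosines still read as cytosine.

    For an unmethylated spike-in such as lambda DNA, every cytosine that is
    still read as C reflects incomplete bisulfite conversion.
    """
    if covered_non_cg <= 0:
        raise ValueError("covered_non_cg must be positive")
    return unconverted_non_cg / covered_non_cg


def conversion_efficiency(unconverted_non_cg, covered_non_cg):
    """Complement of the non-conversion rate."""
    return 1.0 - non_conversion_rate(unconverted_non_cg, covered_non_cg)


# Acceptance rates from the read counts reported in the text.
ffpe1_acceptance = percent(63870, 109352)   # about 58.4
ffpe2_acceptance = percent(69988, 78364)    # about 89.3
```

For genomic (non-spike-in) reads, the same ratio in CHG and CHH contexts gives an upper bound on the non-conversion rate, because it also includes any genuine non-CpG methylation.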

Discussion
For retrospective clinical studies and archival biological material, FFPE represents the most common tissue resource. The ability to perform epigenetic analysis will not only help in understanding the molecular basis of diseases but also has implications for other ongoing work involving the development of new epigenetic biomarkers or diagnostic assays. Previously, a few studies have assessed the quality of DNA methylation profiles obtained from FFPE tissues [35][36][37]. These studies have reported a good correlation of methylation calls between fresh tissues and FFPE samples. Also, it was shown that results obtained from FFPE tissues were reproducible with independent techniques for methylation profiling. However, these analyses were based on few CpG sites and mainly captured the promoter methylation status of selected genes. Here we provide a complete workflow and protocol for performing genome-wide methylation analysis and highlight critical factors for successful analysis of FFPE samples (see Table S1 in Additional file 1). We also implemented and demonstrated a PCR-based method to assess bisulfite conversion efficiency prior to sequencing. This could be used to screen samples prior to sequencing so that only successfully converted samples would go to the next step, leading to significant savings in cost and time.
Due to DNA degradation and fragmentation, the mapping rates for FFPE samples are lower than those that would be expected from fresh tissue or cell line material. For example, a previous genome-wide evaluation of FFPE material reported unique mapping rates of 7.0% to 19.9% [16]. Following the protocol described here, we obtained a unique mapping efficiency of 35-40% (Data S1 and S2 in Additional file 1). This mapping efficiency was obtained with a stringent mapping criterion of only one mismatch in the seed of the sequenced read (i.e., in the first 28 bp of the reads), while the default number of mismatches allowed in Bismark alignment is two. If the default parameters are used, the mapping efficiency is likely to improve further. Furthermore, a previous study which sequenced 18 FFPE samples using RRBS reported unique alignment rates of the sequenced reads ranging from 16.7% to 53.1% (median = 27.7%) [15]. Following the described protocol, 58.4% and 89.3% of our reads passed quality control and were used for alignment to the reference genome.

Conclusions
As a method for genome-wide methylation profiling, RRBS has been shown to be reproducible and has been widely used by many groups world-wide [15,38,39,40,41,42,43]. In a recent analysis of melanoma cell lines, we further demonstrated reproducibility of RRBS results for several target genes, using Sequenom EpiTyper methylation analysis and traditional bisulfite sequencing [44,45]. Here we have combined several methods to provide an integrated protocol. In the current study, we have not directly compared our FFPE RRBS pipeline with a complete existing pipeline. Future comparison of the genome-wide methylation profiles of FFPE samples using the described RRBS workflow with other global methylation analysis will be beneficial to further demonstrate the utility of this method. However, we have demonstrated a cell counting method that optimises tissue usage when analysing small samples (for example, primary melanoma). We have shown successful bisulfite conversion of genomic DNA extracted from an extended proteinase K treatment of FFPE tissue, whereas standard DNA extraction protocols resulted in bisulfite conversion failure in some libraries. In addition, we have successfully implemented a PCR-based method to assess bisulfite conversion before and after RRBS library preparation that avoids sequencing of poor quality libraries. Using this integrated protocol, we have demonstrated better mapping efficiency than previously published genome-wide methylation studies. In conclusion, we provide a streamlined workflow and protocol for performing DNA methylation analysis at the genome-scale and we believe this will facilitate clinical epigenetic research that involves the use of FFPE tissue.

Availability of data and materials
Availability of data is not applicable for the wet lab experiments described here, as results for optimized methods are presented. The sequencing data for the FFPE samples that are presented here for demonstration purposes are available on request from the corresponding author.

Authors' contributions
JLL optimized the FFPE RRBS method and prepared libraries. JW designed and performed the cell counting on FFPE blocks. PAS wrote the computational analysis pipeline. IMM and MRE contributed to the conceptual design, interpretation of data and manuscript writing. AC designed and led the study, analyzed the data and wrote the first draft of the manuscript. All authors read and approved the final manuscript.

Ethics approval and consent to participate
Collection of tissue samples and consent of patients were performed according to the protocol and approval from the Health and Disability Ethics Committee (Ethics protocol number: LRS1102002).

Consent for publication
Not applicable.