Strategies for discovery and validation of methylated and hydroxymethylated DNA biomarkers

DNA methylation, consisting of the addition of a methyl group at the fifth-position of cytosine in a CpG dinucleotide, is one of the most well-studied epigenetic mechanisms in mammals with important functions in normal and disease biology. Disease-specific aberrant DNA methylation is a well-recognized hallmark of many complex diseases. Accordingly, various studies have focused on characterizing unique DNA methylation marks associated with distinct stages of disease development as they may serve as useful biomarkers for diagnosis, prognosis, prediction of response to therapy, or disease monitoring. Recently, novel CpG dinucleotide modifications with potential regulatory roles such as 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine have been described. These potential epigenetic marks cannot be distinguished from 5-methylcytosine by many current strategies and may potentially compromise assessment and interpretation of methylation data. A large number of strategies have been described for the discovery and validation of DNA methylation-based biomarkers, each with its own advantages and limitations. These strategies can be classified into three main categories: restriction enzyme digestion, affinity-based analysis, and bisulfite modification. In general, candidate biomarkers are discovered using large-scale, genome-wide, methylation sequencing, and/or microarray-based profiling strategies. Following discovery, biomarker performance is validated in large independent cohorts using highly targeted locus-specific assays. There are still many challenges to the effective implementation of DNA methylation-based biomarkers. Emerging innovative methylation and hydroxymethylation detection strategies are focused on addressing these gaps in the field of epigenetics. 
The development of DNA methylation- and hydroxymethylation-based biomarkers is an exciting and rapidly evolving area of research that holds promise for potential applications in diverse clinical settings.


Introduction
Epigenetics is the study of reversible, heritable mechanisms that regulate gene expression without altering the DNA sequence [1,2]. DNA methylation is one of the most well-studied epigenetic mechanisms in mammals. It refers to the addition of a methyl group to the fifth carbon of a cytosine (5-mC) that precedes a guanine (CpG). Frequently, but not exclusively, CpG dinucleotides occur in CG-rich DNA stretches known as CpG islands (CGIs) [3]. CGIs are often clustered within control regions of a gene, such as the promoter regions, but are also found, less commonly, in other parts of the gene, including introns and exons [4]. Recently, methylation has also been shown to occur at "CGI shores," regions of lower CpG density that lie in close proximity to, but not within, CGIs [5,6]. DNA methylation has many diverse functions in normal cells including silencing of transposable elements, inactivation of viral sequences, maintenance of chromosomal integrity, X-chromosome inactivation, and transcriptional suppression of a large number of genes [7,8]. In normal cells, methylation patterns are replicated with high fidelity during mitosis. However, it has been shown that these patterns can become altered during the course of aging and disease. Aberrant DNA methylation is a well-recognized hallmark of many complex diseases such as heart disease, diabetes, and neurological disorders, but has been most extensively studied in cancer. Accordingly, various investigative teams have focused on characterizing unique DNA methylation "signatures" associated with pathogenesis as they may serve as useful biomarkers for diagnosis, prognosis, disease monitoring, or prediction of response to therapy [9].
DNA methylation biomarkers offer several significant advantages over expression-based markers. For instance, they are readily amplifiable and easily detectable using polymerase chain reaction (PCR)-based approaches even if alterations are present only in a limited number of cells [10]. DNA methylation is a highly stable marker that can be readily detected in a great variety of samples collected in a minimally invasive manner such as saliva, plasma, serum, urine, semen, and stool [11]. Furthermore, disease-specific DNA hypermethylation is a positively detectable signal. Despite these advantages, shortcomings in DNA methylation detection technologies, including issues with assay sensitivity, specificity, accuracy, and data interpretation, are confounding the discovery and development of effective clinical biomarkers. One limitation of DNA methylation analysis techniques is the inability to resolve heterogeneous methylation patterns among the different cells present within a sample [12]. Therefore, advances in technology that allow for analysis of a single DNA strand from a single cell will help point toward better biomarkers.
Another limitation of many current methodologies is the inability to distinguish between 5-mC and other structurally similar DNA modifications that have been recently discovered in mammalian DNA, including 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC) [13]. 5-hmC has been recently discovered to be generated by hydroxylation of 5-mC by the ten-eleven translocation (TET) family of enzymes and is now considered to be "the sixth base" of the genome of higher organisms [14][15][16]. This raises the possibility that 5-hmC may act as an intermediate epigenetic state associated with changes in DNA methylation and transcriptional regulation during development and in normal and disease states [14,16,17]. Studies have shown a correlation between 5-hmC and gene expression, suggesting a regulatory role for 5-hmC [18][19][20]. Furthermore, it was recently shown that 5-hmC is significantly decreased in multiple human cancers and cancer mouse models, opening exciting opportunities to explore new types of epigenetic biomarkers [17,21]. To address the inability to distinguish these modifications, innovative 5-hmC detection methods are being developed to allow for specific and/or simultaneous detection of 5-mC and 5-hmC. However, further research is necessary in the area of 5-fC and 5-caC detection strategies. Improvements in technology may lead to the development of novel epigenetic biomarkers that will enhance our understanding of the molecular biology of diseases.
This review is divided into two parts that cover existing and emerging strategies applied to (A) discovery and (B) validation of DNA methylation-based biomarkers and describes their major advantages and limitations (Fig. 1). In particular, more recent strategies that have not been previously reviewed in the literature are described in more detail. Part A gives an overview of the large-scale, genome-wide, epigenetic profiling platforms used for candidate biomarker discovery. These platforms can be used to compare methylation profiles among cell lines, healthy samples, and disease samples to find disease-related alterations. Tables 1 and 2 provide an overview of these genome-wide methylation analysis strategies, their applications to sequencing (Table 1) and microarray (Table 2) platforms, and their significant advantages and limitations. Part B gives an overview of highly targeted locus-specific assays used for validation of biomarker performance in large independent cohorts. Table 3 provides an overview of locus-specific assays developed for analysis of a few loci across numerous samples and their advantages and limitations, whereas Table 4 presents information on the sensitivity and DNA quality requirements of each strategy. Additionally, studies examining the effect of hydroxymethylation on the outcome of methylation marker analyses and novel detection strategies specific to 5-hmC are described.

Discovery of Novel DNA Methylation Biomarkers
Over the past few decades, there have been an increasing number of approaches devoted to generating genome-wide methylation profiles and aberrant methylation signatures, each with its own advantages, disadvantages, and areas of applicability. As DNA methylation information is lost during PCR amplification, the majority of techniques rely on methylation-dependent treatment of DNA prior to amplification. These assays can be classified into three main categories: restriction enzyme (RE) digestion, affinity-based analysis, and bisulfite modification. The combination of these three approaches with sequencing and microarray-based platforms has given rise to a wide range of techniques for global DNA methylation analysis.
Global approaches to DNA methylation analysis are being widely used to generate genome-wide methylation profiles because they offer a number of advantages. In general, these approaches are high-throughput strategies with regard to the number of loci that can be analyzed at one time. In particular, sequencing platforms provide quantitative information about the methylation status of every CpG and allow for the analysis of methylation in repeat sequences and rare methylation variants, which is difficult to do using microarrays. Another advantage of sequencing approaches is that they can be used to analyze DNA methylation of regions with no prior knowledge of the sequence. The main weaknesses of sequencing strategies are library bias, cost, availability, and difficulties in data management and analysis, although the cost of massive sequencing technologies is rapidly decreasing. DNA methylation profiling using high-density microarrays is another commonly used method to identify broad differences between groups of samples. They are less time consuming, less labor intensive, and less costly than sequencing. In addition, microarrays allow for simultaneous analysis of a larger number of samples with a wider CGI coverage. Nevertheless, microarray analyses lack reliable quantitation and are limited by probe design, hybridization efficiency, and hybridization artifacts.

Restriction enzyme digestion
Figure caption: Main strategies for DNA methylation analysis classified into three categories: restriction enzyme-based, affinity-based, and bisulfite-based strategies. The COBRA approach has been placed between bisulfite-based and restriction enzyme-based strategies, while the COMPARE-MS approach has been placed between restriction enzyme-based and affinity-based strategies, because these combine two approaches. COMPARE-MS, combination of methylated-DNA precipitation and methylation-sensitive restriction enzymes.
Restriction enzyme-based methods exploit the property of methylation-sensitive enzymes which only digest unmethylated DNA and methylation-dependent enzymes which only cut methylated DNA. These enzymes are used to enrich for methylated or unmethylated sequences and provide a read-out of DNA methylation. Restriction landmark genome scanning (RLGS) was the first reliable RE-based technique for global DNA methylation profiling and has been previously reviewed in detail by Smiraglia et al. among others (Fig. 2) [22][23][24]. However, the use of RLGS is decreasing as it involves the use of radioactive materials and gel electrophoresis. Many techniques currently in use couple enzymatic methods to array-based analysis. One such technique is HpaII tiny fragment enrichment by ligation-mediated PCR (HELP) which is based on digestion of high-molecular-weight genomic DNA with methylation-sensitive HpaII (Fig. 2) [25]. In parallel, a second aliquot of DNA is digested with the methylation-insensitive isoschizomer, MspI, which digests the same cleavage site irrespective of methylation status.
Therefore, sequences present in MspI but not in HpaII libraries are derived from methylated regions. The MspI library also serves as an internal control that allows for identification of spurious variables that can affect HpaII digestion. These include absence of CpG sites in restriction site, mutations, copy number variations, and technical failure. Furthermore, the use of an internal reference allows for detection of spurious differential effects specific to the HpaII enzyme. The HELP assay has been combined with massively parallel sequencing (HELP-Seq) and/or array-based platforms [25,26]. Other examples of approaches based on HpaII and MspI digestion are methyl-Seq and luminometric methylation assay (LUMA) (Fig. 2) [27][28][29]. In methyl-Seq, following digestion with MspI and HpaII, genomic DNA fragments are subjected to size selection to enrich for CpG-containing regions and the selected fragments are sequenced on a next-generation sequencing platform. In LUMA, genomic DNA is cleaved by HpaII or MspI followed by a bioluminometric polymerase extension and pyrosequencing to quantify the extent of RE cleavage and thus methylation levels. To enable normalization between runs and for DNA input, EcoRI is included in all reactions. The above approaches rely on MspI digestion to create a control library. Alternatively, in methods such as methylation-sensitive cut counting (MSCC), genomic DNA is only digested with HpaII followed by deep sequencing (Fig. 2) [30]. The number of times a given site is observed during sequencing then serves as indication of methylation level. Sites represented many times during sequencing are inferred to have low methylation while sites with no reads have high methylation levels. Besides HpaII/MspI, another enzyme pair commonly used in methylation analyses is SmaI (methylation sensitive) and XmaI (methylation insensitive). One method utilizing these enzymes is methylated CGI amplification (MCA) [31]. 
This method employs SmaI to generate blunt-end fragments and to eliminate unmethylated sites (Fig. 3). Next, DNA is further digested with XmaI to create sticky ends and leave overhangs in methylated sites. Methylated fragments are then adaptor ligated and PCR enriched. The resulting amplicons are either sequenced (MCA-Seq) or differentially labeled and cohybridized to a microarray (MCAM) [31,32]. Another strategy that utilizes the SmaI and XmaI enzymes is methylation amplification DNA chip (MAD) (Fig. 3) [33]. More recently, MAD was modified to develop the promoter-associated methylated DNA amplification DNA chip (PMAD) assay, which incorporates the HpaII and MspI enzymes [34]. Both techniques have been previously reviewed by Huang et al. [35]. An alternative to using methylation-sensitive enzymes is to use methylation-dependent enzymes such as McrBC. This enzyme recognizes closely spaced methylated cytosines and so has the capacity to digest densely methylated regions of DNA [36]. One technique that utilizes this enzyme is comprehensive high-throughput arrays for relative methylation (CHARM) [37]. The initial step in this method is digestion with an RE such as MseI to shear DNA (Fig. 3). The recognition site of this enzyme rarely occurs in GC-rich regions; thus, most CGIs remain intact. This is followed by the division of DNA into two fractions: one treated with McrBC and the other untreated. The McrBC-digested and untreated DNA is size-fractionated, differentially labeled, and cohybridized to a microarray. The ratio of hybridization intensities between treated and untreated DNA provides a measure of DNA methylation. Other techniques that utilize the McrBC enzyme are microarray-based methylation assessment of single samples (MMASS, Fig. 3), which has been reviewed by Huang et al., and MethylScope [35][38][39][40]. With the MethylScope strategy, DNA is sheared and divided into two fractions, one of which is digested with McrBC (Fig. 3). The fragments are then fractionated by electrophoresis, and fragments larger than 1 kb are purified, labeled with different dyes for the McrBC-digested and undigested fractions, and cohybridized to genomic-tiling microarrays.
Abbreviations: HELP, HpaII tiny fragment enrichment by ligation-mediated PCR; MAD, methylation amplification DNA chip; PMAD, promoter-associated methylated DNA amplification DNA chip; CHARM, comprehensive high-throughput arrays for relative methylation; MMASS, microarray-based methylation assessment of single samples; DMH, differential methylation hybridization; MSNP, methylation single-nucleotide polymorphism; MeDIP, methylated DNA immunoprecipitation; MIRA, methylated CpG island recovery assay; BiMP, bisulfite methylation profiling.
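The paired-digest logic described in this section (a methylation-sensitive HpaII library compared against a methylation-insensitive MspI control) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the pseudocount, the minimum MspI signal, and the log-ratio cutoff are hypothetical choices made here and are not taken from the HELP publications.

```python
import math


def hpaii_mspi_log_ratio(hpaii_signal, mspi_signal, pseudocount=1.0):
    """Return log2(HpaII / MspI) for one locus.

    The pseudocount (an assumption) avoids division by zero and log of zero.
    """
    return math.log2((hpaii_signal + pseudocount) / (mspi_signal + pseudocount))


def call_locus(hpaii_signal, mspi_signal, min_mspi=10.0, cutoff=-1.0):
    """Classify one locus from paired HpaII and MspI signals.

    min_mspi and cutoff are hypothetical thresholds chosen for illustration.
    """
    # Without MspI signal the locus cannot be judged: the site may be absent,
    # mutated, deleted, or the assay may have failed at this position.
    if mspi_signal < min_mspi:
        return "uninformative"
    ratio = hpaii_mspi_log_ratio(hpaii_signal, mspi_signal)
    # Present in the MspI library but depleted in the HpaII library:
    # HpaII was blocked, so the site is inferred to be methylated.
    return "likely methylated" if ratio < cutoff else "likely unmethylated"


if __name__ == "__main__":
    for hpaii, mspi in [(0, 100), (100, 100), (5, 2)]:
        print(hpaii, mspi, call_locus(hpaii, mspi))
```

The same ratio idea applies to the McrBC-treated versus untreated comparison, with the interpretation reversed, because McrBC removes methylated rather than unmethylated DNA.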
The advantage of using McrBC is its high sensitivity to densely methylated regions. Also, as it does not require a highly specific sequence motif, it cuts more frequently. One other advantage of this assay is that it does not require prior methylation information from a reference genome to serve as a control. Other variations of RE-based DNA methylation profiling methods include those that employ a combination of methylation-sensitive enzymes. One such technique is differential methylation hybridization (DMH) [41]. In this approach, DNA is digested using a combination of methylation-sensitive enzymes such as BstUI, HhaI, and HpaII (Fig. 4). DNA fragments then undergo linker ligation, PCR enrichment, and cohybridization to a microarray. We and others have successfully implemented this strategy. For example, in our laboratory, hypermethylation of transforming growth factor β2 (TGFβ2) and homeobox D3 (HOXD3) was identified as a potential biomarker of prostate cancer progression through a genome-wide DMH screen [42,43].
An additional microarray platform that enables the measurement of single-nucleotide polymorphisms (SNPs), copy number, loss of heterozygosity (LOH), and DNA methylation simultaneously is methylation SNP (MSNP) [44,45]. In this approach, DNA is first sheared with XbaI, a frequent cutting enzyme, for genomic library construction. Next, the DNA is digested with HpaII to enrich for methylated fragments. This way one can check for (1) copy number variations in XbaI fragments, (2) SNPs in HpaII cutting sites in XbaI fragments, and (3) methylation in HpaII cutting sites. This approach has the obvious advantage of providing information about numerous features from one array.
RE-based genome-wide DNA methylation analysis is a potentially robust approach for genome-wide screening to identify frequently methylated CpGs. The methodology is relatively straightforward, rapid, and inexpensive and can be used to analyze thousands of CpGs in a single experiment. Some of the earliest studies to find disease-specific gene methylation events that have been proposed as biomarkers relied on RE digestion. For example, methylation of O(6)-methylguanine DNA methyltransferase (MGMT) in gliomas, π-class glutathione S-transferase (GSTP1) in prostate cancer, and mutL homolog 1 (MLH1) in colon cancer were discovered using this strategy [46][47][48]. However, as these approaches are based on RE, they are confined to recognition elements and can only interrogate a subset of methylation sites. Another limitation of enzymatic approaches is the inability to distinguish 5-mC and 5-hmC [49]. Methylation-dependent enzymes cleave both CpG modifications (methylation and hydroxymethylation), whereas methylation-sensitive enzymes are completely blocked by both modifications. Consequently, a proportion of genomic loci identified as "methylated" in these studies may actually be hydroxymethylated. To address this issue, new enzymatic approaches have been developed for specific detection of hydroxymethylated cytosines. These include, but are not limited to, enzymatic digestion of DNA followed by radioactive labeling of the 5-hmC and enzymatic glucosylation strategies which utilize β-glucosyltransferase to attach a glucose moiety to 5-hmC, protecting it from subsequent digestion with glucosyl-sensitive REs [14,50]. Alternatively, other strategies employ 5-hmC-dependent enzymes such as PvuRts1I which selectively cleave 5-hmC-containing sequences [51]. The enriched 5-hmC fractions can then be analyzed by DNA microarrays, sequencing, or chromatography.
Table footnote: Sensitivity is dependent on the specific assay and parameters such as the concentration and quality of input DNA and PCR conditions. For this reason, we have not defined absolute values for this parameter.

Affinity-based methylation analysis
To circumvent the limitations of RE digest analysis, techniques that use affinity purification to enrich for methylated DNA can be utilized. Techniques used to capture methylated DNA sequences, such as methylated DNA immunoprecipitation (MeDIP), start with shearing DNA through sonication to produce random fragments [52]. The fragments are then denatured to produce single-stranded DNA and immunoprecipitated with one or more monoclonal anti-5-methylcytosine antibodies (Fig. 4). The collected DNA is enriched for methylated sequences and is then amplified and analyzed using sequencing (MeDIP-Seq) or microarray platforms [52,53]. Recently, coupling of MeDIP with microarray platforms has proven to be a successful strategy to map genome-wide DNA methylation patterns in Arabidopsis thaliana as well as human normal and transformed cells [52,54,55]. One major limitation of the method is that MeDIP requires DNA to be single-stranded, which may be difficult to achieve in regions of high CpG content. MeDIP-based methods are also limited by the quality and specificity of the antibody. Moreover, enrichment efficiency is significantly lower in regions with low CpG content.
To avoid these problems, methods based on methyl binding domain proteins (MBDs) can be used. Such methods include methylated CGI recovery assay (MIRA), which utilizes MBD2 and MBD3, and MBD column chromatography which utilizes MBD2 or MeCP2 [56][57][58]. In MIRA, DNA is sheared with MseI, linker ligated, and incubated with MBD2 and MBD3 bound to a sepharose matrix that binds to methylated DNA with high specificity (Fig. 4). The MIRA captured DNA is PCR amplified, labeled, and cohybridized to CGI microarrays. Affinity-based methods allow for rapid and specific assessment of the mean methylation levels of large DNA regions. The reagents involved are commercially available and easy to use. However, the methods require high DNA input and do not yield information on distinct CpG dinucleotides. Moreover, MBD or antibody interaction with DNA is affected by surrounding sequences and methylation density. Therefore, repeat sequences are sometimes overrepresented in affinity-based analysis. Furthermore, it has been shown that affinity-based methylation strategies that utilize MBDs or anti-5-mC antibodies are specific and do not bind 5-hmC [59,60]. Therefore, anti-5-hmC antibodies were developed for hydroxymethylation-specific analyses and can be used in the above-mentioned strategies in place of anti-5-mC antibodies [19]. Anti-5-hmC-specific antibodies can also be used in combination with dot blots or immunohistochemical platforms to detect 5-hmC in cells and tissues [17,61,62]. Alternatively, numerous strategies that involve chemical labeling of 5-hmC (e.g., with biotin or sulfonate) followed by affinity-based purification with specific antibodies have been developed [63,64]. One such approach makes use of enzymatic glucosylation of 5-hmC followed by selective pull down using J-binding protein 1 coupled to magnetic beads [65].

Bisulfite modification
The principle of sodium bisulfite modification is based on the differential reaction of methylated and unmethylated cytosines with the reagent, such that following bisulfite treatment, only unmethylated cytosines are converted into uracils [66]. The conversion can then be detected using a variety of methods combined with sequencing and/or microarray platforms. Bisulfite treatment-based strategies of methylation analysis surpass almost every other methodology, thereby becoming the most widely accepted and most widely used approaches. The advantages of this methodology include quantitative DNA methylation analysis almost anywhere in the genome, single CpG resolution, and detection of strand-specific methylation. However, the conversion process results in significant DNA degradation and reduced sequence complexity. This poses certain challenges for sequencing and array platforms. Moreover, methods relying on bisulfite conversion and sequencing also require extensive bioinformatics for base calling, sequence alignment, and statistical analysis. Additionally, as bisulfite analysis depends on the complete conversion of unmethylated cytosines to uracil, incomplete or inappropriate conversion will lead to erroneous interpretation. Studies have also shown that sodium bisulfite reacts with 5-hmC to yield a distinct adduct, cytosine 5-methylenesulfonate, which does not undergo conversion to a deaminated cytosine [49,60,67]. Some have suggested that 5-methylenesulfonate may stall or block Taq polymerase in subsequent amplification reactions [67]. However, it has been shown that bisulfite-treated DNA templates containing 5-hmC can be efficiently amplified [49]. As a result, following bisulfite conversion, 5-hmC is indistinguishable from 5-mC, implying that a proportion of genomic loci previously identified as methylated may actually be hydroxymethylated. Therefore, an "oxidative bisulfite" sequencing (oxBS-Seq) approach has been recently developed [68].
In this approach, 5-hmC undergoes specific oxidation to 5-fC using potassium perruthenate. Next, during bisulfite conversion, 5-fC is converted to uracil allowing for specific mapping of 5-mC sites. Furthermore, 5-hmC mapping can be achieved by subtraction of oxBS-Seq from a BS-Seq readout.
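The subtraction step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the published oxBS-Seq analysis: the function name, the use of per-CpG fractions between 0 and 1 as input, and the clipping of negative differences to zero are assumptions introduced here.

```python
def estimate_5hmc(bs_level, oxbs_level):
    """Estimate (5-mC, 5-hmC) fractions at one CpG.

    bs_level:   fraction of reads called as C after standard bisulfite
                treatment (reports 5-mC plus 5-hmC).
    oxbs_level: fraction of reads called as C after oxidative bisulfite
                treatment (reports 5-mC only).
    """
    for name, value in (("bs_level", bs_level), ("oxbs_level", oxbs_level)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    five_mc = oxbs_level
    # Sampling noise can make the difference slightly negative; clipping to
    # zero is a simplifying assumption, not part of the original method.
    five_hmc = max(0.0, bs_level - oxbs_level)
    return five_mc, five_hmc


if __name__ == "__main__":
    print(estimate_5hmc(0.8, 0.5))
```

Because the estimate is a difference of two measured fractions, its uncertainty combines the errors of both experiments, so sites with low coverage in either library would be expected to give unreliable 5-hmC values.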
Alternatively, bisulfite-independent strategies involving alternative chemical pretreatments of DNA have been recently developed for specific 5-hmC detection. One such approach is called glucosylation, periodate oxidation, biotinylation (GLIB) [69]. This strategy is based on initial glucosylation of 5-hmC followed by periodate oxidation and biotinylation. The hydroxymethylated DNA is then pulled down using the biotin-streptavidin system. Other related strategies have also been recently published using a custom-synthesized UDP-glucose analog (UDP-6-N3-glucose) or radioactively labeled UDP-[3H]glucose [63,70]. Alternative chemical labeling strategies can be carried out by the addition of sulfur-containing moieties, cysteamine, or selenocysteamine followed by direct detection or selective biotinylation [71]. The enriched 5-hydroxymethylated DNA can then be analyzed by microarrays or sequencing.

Sequencing-based methylation profiling
Whole genome shotgun bisulfite sequencing (WGSGS) provides a genome-wide methylation profile at single base-pair resolution and is therefore the most comprehensive methodology [72,73]. It has recently been applied to generate a whole genome methylation profile of the A. thaliana genome [74,75]. However, the human genome is much larger, and sequencing it at this depth is currently very expensive.
An alternative method, called reduced representation bisulfite sequencing (RRBS), enriches for CpG-rich regions using RE such as BglII or MspI to reduce genome complexity and sequence redundancy [76,77]. Next, DNA undergoes adaptor ligation, bisulfite modification, PCR enrichment, and finally sequencing. The data generated includes regions of the genome that are in close proximity to the RE's recognition site. That is simultaneously an advantage for bioinformatics analysis and a limitation for genome-wide methylation analysis.
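The complexity reduction step can be illustrated by an in silico MspI digest. The sketch below assumes the well-known MspI recognition site CCGG with cleavage after the first C; the function names and the size window passed to `select_fragments` are hypothetical and chosen only for illustration, since the text above does not specify a fragment size range for RRBS.

```python
def mspi_fragments(sequence, site="CCGG", cut_offset=1):
    """Cut a sequence at every occurrence of the recognition site.

    cut_offset=1 places the cut after the first base of the site (C^CGG).
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


def select_fragments(fragments, min_len, max_len):
    """Keep fragments within a size window (window limits are assumptions)."""
    return [frag for frag in fragments if min_len <= len(frag) <= max_len]


if __name__ == "__main__":
    toy = "AACCGGTTTTCCGGAA"
    frags = mspi_fragments(toy)
    print(frags)
    print(select_fragments(frags, 5, 10))
```

Every internal fragment begins with CGG and ends with C, so each carries a CpG at its end; this is why sequenced reads cluster near the enzyme's recognition sites.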
An alternative approach to enrich for CpG-rich DNA is denaturing HPLC (DHPLC) [78]. This technique is based on the idea that following bisulfite treatment, amplicons that differ in methylation patterns have different G/C content resulting in different melting temperature, which in turn translates into different retention times in HPLC under partially denaturing conditions. The different DNA fractions are then sequenced to identify methylation profiles. The advantages of this technique are that it is simple, cost-effective, and rapid. However, it requires relatively high DNA quantities and has limited sensitivity, especially when analyzing tissue samples.
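The link between methylation pattern and G/C content after bisulfite treatment can be made concrete with a toy Python example. The sketch models only the top strand after PCR (uracil read as T), and the function names and the example amplicon are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of one strand after PCR.

    Unmethylated C is read as T; C at a 0-based index listed in
    methylated_positions is protected and stays C.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def gc_content(sequence):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    amplicon = "ACGTCGACCG"      # CpG cytosines at indices 1, 4 and 8
    methylated = bisulfite_convert(amplicon, {1, 4, 8})
    unmethylated = bisulfite_convert(amplicon)
    print(methylated, gc_content(methylated))
    print(unmethylated, gc_content(unmethylated))
```

In this toy amplicon the fully methylated template keeps its three CpG cytosines and therefore retains a higher G/C fraction than the unmethylated template, which is the property that DHPLC separates on.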

Massively parallel clonal DNA sequencing platforms
Sequencing-based methylation analyses initially relied on Sanger sequencing [79]. However, it is too costly, inefficient, and time consuming to sequence the entire human genome. Therefore, a variety of sequencing platforms have been developed and applied to DNA methylation analysis. These include next-generation sequencing (NGS) and single-molecule sequencing. The development of NGS platforms enables sequencing and mapping of millions of DNA fragments in parallel, thus significantly increasing throughput and decreasing cost per base, thereby providing new opportunities for comprehensive, highly sensitive, genome-wide mapping of methylation sites at a more affordable price [80]. These methodologies are gradually replacing conventional sequencing. The three main NGS platforms currently used are Roche 454 sequencing (Branford, Connecticut), Applied Biosystems SOLiD™ (Carlsbad, California), and Illumina Solexa genome analyzer (San Diego, California) [81][82][83]. Other NGS platforms also available are Polonator (Salem, New Hampshire) and Helicos Heliscope™ (Cambridge, Massachusetts) [84,85]. Roche 454 sequencing was the first commercially available NGS platform. In this approach, clonal amplification of library fragments bound on beads is achieved by single-molecule emulsion PCR with amplicons captured onto the surface of beads. Individual beads are then sequenced by pyrosequencing. Roche 454 can generate up to one million reads per run at read lengths of up to 1 kbp (http://www.454.com/). It provides the fastest time per run and longest read length compared with other NGS platforms, offering several advantages for methylation analysis. Longer reads can be more easily and accurately aligned to the reference sequence and have a higher chance to cover SNPs and other genotyping information in the vicinity of CpGs. However, this strategy generates fewer reads per run, resulting in a higher cost of sequencing.
Additionally, it has a higher error rate in calling homopolymeric stretches which may be a problem in bisulfite-modified DNA because it contains long stretches of A or T following conversion.
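The effect of conversion on homopolymer length can be seen with a toy sequence. The following Python sketch uses a hypothetical helper, `longest_homopolymer`, and models full conversion of an unmethylated strand simply by replacing every C with T; both are assumptions made for illustration.

```python
from itertools import groupby


def longest_homopolymer(sequence):
    """Length of the longest run of identical bases (0 for an empty string)."""
    return max((len(list(run)) for _, run in groupby(sequence.upper())), default=0)


if __name__ == "__main__":
    original = "ACTCTTCA"
    # Fully unmethylated strand: every C is read as T after conversion.
    converted = original.replace("C", "T")
    print(original, longest_homopolymer(original))
    print(converted, longest_homopolymer(converted))
```

In this example the Cs that interrupt the T runs disappear on conversion, so short runs merge into one long T stretch, which is the situation in which homopolymer miscalls become more likely.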
Similarly, Applied Biosystems SOLiD™ is also based on emulsion PCR to generate clonally amplified sequencing fragments with smaller beads attached to a solid surface and sequencing is achieved using sequencing by synthesis driven by a ligase. The Applied Biosystems SOLiD™ platform can generate up to 700 million reads per run at read lengths of up to 75 bp (http://www.appliedbiosystems.com). One advantageous feature of this platform is two-base encoding in which each base position is examined twice; thus, miscalls can be more readily identified. Additionally, a new strategy, termed MethylSeq™, has been recently developed. In MethylSeq™, bisulfite-modified DNA is also amplified by microdroplet emulsion PCR using a primer library targeting a large number of genes. The resulting PCR library is sheared, ligated, and subjected to massively parallel clonal sequencing [86,87]. However, like Roche 454, SOLiD™ and MethylSeq™ are based on emulsion PCR which can be troublesome and technically challenging.
The Illumina Solexa genome analyzer is the most widely used NGS strategy for DNA methylation analysis. It is based on in situ bridge template clonal amplification on a solid surface with amplicons remaining immobilized and clustered in a single physical location. Up to eight independent amplicon libraries are then sequenced in parallel using sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. The Illumina Solexa genome analyzer can generate over 300 million reads per run at read lengths of up to 2 × 150 bp (http://www.illumina.com/systems/genome_analyzer_iix.ilmn). Both Applied Biosystems SOLiD™ and Illumina Solexa genome analyzer offer higher throughput and lower cost compared to Roche 454 but are more limited in alignment of bisulfite-converted sequences.
Other emerging single-molecule sequencing strategies bypass methylation-dependent treatments such as bisulfite modification prior to analysis. For example, two such new sequencing approaches are nano-sequencing and single-molecule, real-time (SMRT) sequencing [88,89]. Nano-sequencing identifies methylation-based fluctuation in ionic current as DNA passes through a nanopore while SMRT-sequencing relies on emission spectra and polymerase kinetics during sequencing-by-synthesis for methylation analysis. These strategies offer the ability to perform highly sensitive methylation analyses of minute DNA quantities that is free of methylation-dependent treatment and amplification artifacts. Moreover, nano- and SMRT-sequencing have been shown to distinguish 5-mC from 5-hmC without any DNA pretreatments [90,91].

Microarray-based DNA methylation profiling
In a technique, known as bisulfite methylation profiling (BiMP), bisulfite-treated DNA is subjected to whole genome amplification (WGA) using random tetranucleotide primers, enzymatic fragmentation, and microarray hybridization [92]. The microarray is designed using differentially labeled oligonucleotide pairs complementary to the unchanged, methylated sequence. Therefore, methylation is detected as a signal and mismatches caused by the conversion of unmethylated cytosines do not result in signal. This approach results in overall low hybridization signal and may not be applicable to regions of sparse methylation. The Infinium approach entails similar sample preparation that involves bisulfite modification of genomic DNA followed by WGA [93,94]. The DNA is then hybridized to BeadChip microarrays, which are designed with oligonucleotide pairs targeting CpG sites of interest, with one complementary to the unchanged, methylated sequence and the other to the converted unmethylated sequence. Next, a PCR reaction is performed with fluorescently labeled universal PCR primers and the methylation levels can be determined by comparing the proportion of fluorescence emitted by each dye. Most microarray platforms contain a standard array of probes covering a library of CGIs. However, some companies also offer custom microarrays to allow for flexibility in experimental design and methylation analysis of CGI and/or organisms not available on standard microarrays. Furthermore, in the future era of personalized medicine, custom microarrays will be valuable for specific, individual methylation signatures.
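The comparison of the two fluorescence signals in bead-array assays is commonly summarized as a ratio between 0 and 1. The sketch below shows one such calculation; the function name, the clamping of negative background-corrected intensities to zero, and the stabilizing offset of 100 in the denominator are assumptions made for illustration and should be checked against the analysis software actually used.

```python
def beta_value(meth_signal, unmeth_signal, offset=100.0):
    """Fraction of signal from the methylated probe (0 = unmethylated, 1 = methylated).

    The offset is an assumed constant that stabilizes the ratio when both
    intensities are low; negative intensities are clamped to zero.
    """
    m = max(float(meth_signal), 0.0)
    u = max(float(unmeth_signal), 0.0)
    return m / (m + u + offset)


# Hypothetical intensities for three CpG sites.
for meth, unmeth in [(900, 0), (4500, 4500), (50, 8000)]:
    print(round(beta_value(meth, unmeth), 3))
```

With an offset greater than zero, sites with very weak signal in both channels are pulled toward zero rather than producing unstable ratios.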

Microarray expression profiling
Genome-wide methylation profiling of samples representing diseased and normal state in search for biomarkers can be costly and time consuming. Therefore, some investigators prefer to narrow down the search using an expression-array following treatment with demethylating agents such as 5-aza-2′-deoxycytidine [95,96]. This approach facilitates identification of genes that display evidence of methylation-dependent gene regulation in a disease state and understanding of disease pathobiology and progression. This approach identifies potential biomarkers, that is, those genes that are reactivated after the treatment. However, this strategy is prone to false results and is not considered to be a reliable measure of DNA methylation. This is because treatment with demethylating drugs alters the expression of many genes that (a) may not be related to disease state and (b) could stimulate expression of other, secondary targets. Therefore, methylation profiles of candidate biomarkers identified using this approach are further validated by other strategies.

Validation of DNA Methylation-Based Biomarkers
Global DNA methylation screening approaches have their limitations and are prone to biases. Therefore, it is important to validate genome-wide assays with a quantitative, locus-specific assay, to assess quality and accuracy of the data and to determine whether specific methylation differences observed between samples are genuine. The majority of current gene-specific assays are PCR-based, are easily adapted to commercial platforms, and can be used in clinical laboratories with high sensitivity and specificity [97]. Likewise, global 5-hmC detection strategies need to be validated using RE-, affinity-, or bisulfite-based approaches combined with site-specific, PCR-based platforms. Therefore, a number of methods have been developed to enrich for CpG-harboring segments and survey a more limited region of the genome for methylation. One such method is bisulfite sequencing (BS) [79]. In this technique, genomic DNA is bisulfite modified and regions of interest are PCR amplified (Fig. 4). The PCR products are then cloned in Escherichia coli and numerous individual clones, each representing one PCR amplicon, are sequenced. The cloning step in this assay is necessary in order to isolate individual alleles, which differ in the pattern of methylated CpGs. However, it is costly, laborious, and time consuming. Therefore, digital PCR has recently been applied to BS [98]. Digital PCR is an alternative method for isolation of individual alleles that differ in methylation patterns. In digital PCR, the DNA sample is distributed over a 96-well PCR reaction plate so that individual DNA molecules are localized and amplified independently. The digital PCR products are purified and subjected to sequencing. BS is considered the gold-standard technique for DNA methylation analysis as it provides high-accuracy, single-nucleotide resolution information about the methylation status of almost any desired DNA segment.
Therefore, BS has been extensively used to generate high-resolution maps of 5-mC in the CGI associated with a variety of promising biomarkers including MGMT, CDKN2A, and MLH1, to name a few [99-101]. More recently, strategies based on padlock probes have been developed as an alternative to enrichment for CpG-rich DNA fragments [102,103]. Padlock probes consist of end segments, complementary to a target sequence, connected by a linker sequence. The end segments hybridize to bisulfite-converted target DNA in such a way that during ligation the probe becomes circularized around it. The linker sequence is then used for universal PCR, allowing for the amplification of thousands of probes within a single reaction. The amplified targeted CpGs in padlock loops are then subjected to sequencing. In a technique called bisulfite padlock probes (BSPP), a library of padlock probes is hybridized to bisulfite-converted DNA, circularized, and PCR amplified. The resulting amplicons are then sequenced (Fig. 5). The main limitations of this method are sequence-dependent bias of DNA polymerase and ligase, probe design, and hybridization efficiency.
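For the clone-based bisulfite sequencing described above, the methylation level at each CpG is typically summarized as the fraction of sequenced clones that retain a cytosine at that position. The following sketch assumes that clone reads are already aligned to the reference, are of equal length, and come from the same strand; the function names and example sequences are hypothetical.

```python
def cpg_positions(reference):
    """Indices of the cytosine in every CpG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def per_cpg_methylation(reference, clones):
    """Fraction of clones that retain C (methylated) at each reference CpG.

    A T at the CpG cytosine is counted as unmethylated; any other base is
    ignored. Sites with no informative clones are reported as None.
    """
    levels = {}
    for pos in cpg_positions(reference):
        calls = [clone[pos] for clone in clones if clone[pos] in "CT"]
        levels[pos] = sum(c == "C" for c in calls) / len(calls) if calls else None
    return levels


# Hypothetical reference (CpGs at indices 1, 5, 9) and four sequenced clones.
reference = "ACGTCCGATCGA"
clones = [
    "ACGTTCGATCGA",  # methylated at all three CpGs
    "ATGTTTGATTGA",  # unmethylated at all three CpGs
    "ACGTTTGATCGA",  # methylated at the first and third CpG
    "ACGTTTGATTGA",  # methylated at the first CpG only
]
print(per_cpg_methylation(reference, clones))
```

Keeping the per-clone calls, rather than only the averages, preserves the allele-level methylation patterns that the cloning step is intended to resolve.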
Another strategy used to detect methylation in targeted DNA regions is pyrosequencing. Pyrosequencing is a "sequencing by synthesis" technique in which bisulfite-modified DNA is amplified using a biotinylated primer (Fig. 4). The resulting biotin-labeled amplicons are denatured and utilized as a template for sequencing primers. During pyrosequencing, only one of the four nucleotides is present at a time, and if it is incorporated into the growing strand by complementary base pairing, a pyrophosphate molecule is released as a reaction by-product. The release of pyrophosphate molecules is then quantitatively converted into a bioluminometric signal. Pyrosequencing has been widely used for methylation analysis in clinical specimens because it allows for direct quantitative sequencing of CpGs within a defined region of interest and offers accuracy, reproducibility, speed, and ease of use. Furthermore, the pyrosequencing technology has been incorporated into massively parallel sequencing on the 454 sequencing system to allow for genome-wide methylation analysis [104,105].
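Because the light signal is proportional to the number of nucleotides incorporated, the methylation level at a CpG can be estimated from the relative signals of the C and T dispensations at that position. The sketch below assumes that the assay reads the strand on which methylated cytosines appear as C and unmethylated cytosines as T; the function name and peak heights are hypothetical.

```python
def percent_methylation(c_peak, t_peak):
    """Percent methylation at one CpG from C and T peak heights."""
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("no signal at this position")
    return 100.0 * c_peak / total


# Hypothetical peak heights (arbitrary light units) at three CpG positions.
peaks = [(30, 70), (12, 48), (55, 5)]
levels = [percent_methylation(c, t) for c, t in peaks]
print(levels)
print(sum(levels) / len(levels))  # mean methylation across the region
```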
An alternative sequencing platform for analysis of preselected CGIs is the GoldenGate assay, which has been previously reviewed by Chang et al. [106]. In this strategy, bisulfite-modified DNA undergoes allele-specific extension and ligation of specific CpG loci followed by PCR with universal primers and hybridization to bead microarrays.
Figure 5. Schematic diagram of the bisulfite padlock probes approach to DNA methylation analysis. Bisulfite-modified DNA is combined with thousands of padlock probes that contain a common linker sequence represented in green. The library of padlock probes is hybridized to the bisulfite-converted DNA, circularized, and PCR amplified. The probes contain an enzyme digestion site such as MmeI-recognition site for uniform size selection. Next, the PCR-amplified DNA is digested and processed for next-generation bisulfite sequencing analysis.
A more recent platform adapted for methylation analysis is matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS). The MassARRAY EpiTYPER assay uses this platform for quantitative base-specific methylation analysis of genomic regions of interest (Fig. 6) [107]. EpiTYPER can be used for biomarker discovery; however, the technology is especially well suited for precise sequencing using short DNA fragments and is more commonly used in candidate gene methylation analyses. In this assay, bisulfite-modified DNA amplicons with a T7-promoter tag are transcribed in vitro and digested with RNase A. Subsequently, the products are analyzed by MALDI-TOF-MS. Each C-to-T switch in the DNA following bisulfite conversion is identified on the MS as a mass difference of 16 Da. The main advantages of EpiTYPER are that it is fast, accurate, reproducible, and quantitative. However, some CpGs are missed by this technique when two fragments generated are of the exact same size, or when fragments are too small or too large to be analyzed. This technique has been previously used in our laboratory to provide accurate and quantitative methylation profiles of multiple CpGs in the bone morphogenetic protein 7 (BMP7) and HOXD3 genes in prostate cancer samples [42]. Given that EpiTYPER analysis is based on bisulfite modification, it cannot differentiate 5-mC from 5-hmC. Alternative MS-based platforms can be used for specific 5-hmC quantification including HPLC-MS and liquid chromatography-MS [61,108-110].
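The 16 Da spacing used in EpiTYPER analysis lends itself to a simple calculation: the number of methylated CpGs in a cleavage fragment follows from its mass shift relative to the fully unmethylated fragment, and the methylation level follows from the relative peak intensities. The sketch below is an illustration under these assumptions only; the function names, the matching tolerance, and all masses and intensities are hypothetical.

```python
MASS_SHIFT_DA = 16.0  # mass difference per methylated CpG, as quoted in the text


def methylated_cpg_count(unmethylated_mass, observed_mass, tolerance=0.5):
    """Number of methylated CpGs implied by the shift from the unmethylated mass."""
    shift = observed_mass - unmethylated_mass
    count = round(shift / MASS_SHIFT_DA)
    if count < 0 or abs(shift - count * MASS_SHIFT_DA) > tolerance:
        raise ValueError("mass shift is not a multiple of 16 Da")
    return count


def fragment_methylation(intensity_by_count, n_cpgs):
    """Average methylation of a fragment carrying n_cpgs CpG sites.

    intensity_by_count maps the number of methylated CpGs (0..n_cpgs) to the
    intensity of the corresponding peak.
    """
    total = sum(intensity_by_count.values())
    weighted = sum(k * v for k, v in intensity_by_count.items())
    return weighted / (n_cpgs * total)


# Hypothetical fragment with two CpGs and three peaks 16 Da apart.
print(methylated_cpg_count(2000.0, 2032.0))
print(fragment_methylation({0: 25.0, 1: 50.0, 2: 25.0}, n_cpgs=2))
```

For fragments with more than one CpG, this calculation yields an average over the sites in the fragment rather than a value for each individual CpG.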

Methylation-specific PCR (MSP) and quantitative variations of MSP
MSP is the most widely used locus-specific, bisulfite-based DNA methylation analysis strategy; it has been reliably applied to large numbers of clinical samples and has been previously reviewed in the literature [10,111]. Briefly, bisulfite-modified DNA serves as a template for PCR amplification using primer sets specific for methylated (MSP) and unmethylated (methylation-independent PCR) sequences. This is designed for proportional amplification of methylated and unmethylated DNA, respectively. MSP can also be coupled with in situ hybridization to visualize the methylation status of specific CpGs in individual cells [112]. It is a very popular technique because it is rapid, cost-effective, and easy to perform, and it requires only small quantities of DNA. However, it is prone to false positives and PCR contamination and can only be used for qualitative analysis. Quantitative variations of this technique based on real-time PCR include MethyLight, methylation-sensitive melting curve analysis (MS-MCA), methylation-sensitive high-resolution melting (MS-HRM), sensitive melting analysis after real-time (SMART)-MSP, HeavyMethyl, and methylation-specific fluorescent amplicon generation (MS-FLAG) (Fig. 7) [113-118]. All these quantitative variations of MSP are highly sensitive real-time assays and are suitable for DNA methylation analysis of fresh, frozen, or formalin-fixed paraffin-embedded tissues and body fluid samples, such as serum, plasma, and urine.
Figure 6. The basic principle of EpiTYPER analysis. Bisulfite-modified DNA is PCR amplified with T7 promoter-tagged reverse primer. Next, in vitro RNA transcription is performed, followed by digestion with RNase A. The digestion products are analyzed by MALDI-TOF MS. Methylated cytosines are transcribed to guanine, whereas unmethylated cytosines are converted to uracils and transcribed to adenines. This is represented in the mass spectrum by signal pairs separated by 16 m/z (or multiples thereof).
Figure 7 (partial caption). In MS-MCA, bisulfite-treated DNA is PCR amplified with methylation-independent primers and double-stranded intercalating dye such as SYBR green (represented by green circles). Following PCR, the reaction temperature is increased and DNA melting properties are examined. Methylated DNA is C and G rich and consequently more resistant to melting. Therefore, more fluorescent signal is recorded at higher melting temperatures. (c) In SMART-MSP, bisulfite-modified DNA undergoes methylation-specific amplification in the presence of double-stranded intercalating dye such as SYBR green (represented by green circles) and the amount of signal detected is proportional to the amount of methylated DNA. Following PCR, the reaction temperature is increased and DNA melting properties are examined. (d) HeavyMethyl utilizes blocker oligonucleotides that specifically bind to unmethylated DNA and prevent its amplification. Alternatively, methylated DNA is amplified using methylation-independent primers and a methylation-specific probe that contains a fluorophore (F) and a quencher (Q). During the PCR reaction, the probe is cleaved by the exonuclease activity of DNA polymerase, causing the fluorophore to be released from the quencher and light to be emitted. The emitted light signal is proportional to the amount of methylated DNA present in the sample. (e) In MS-FLAG, bisulfite-treated DNA is amplified with methylation-specific primers that contain a cleavage site for PspGI. Additionally, the primers contain a fluorophore (F) and a quencher (Q). The cleavage of the primers by PspGI enables the release of the quencher from the fluorophore and light to be emitted, which is proportional to the amount of methylated DNA.
MethyLight utilizes methylation-specific primers and a TaqMan methylation-specific fluorescent reporter probe that anneals to the amplified region of interest [113]. Annealing between the probe and methylated DNA results in fluorescent signal detection that is proportional to the amount of amplicon. Methylation levels are then determined by normalizing the signal to an Alu-based control reaction. MethyLight is a high-throughput, specific, sensitive, and quantitative assay that requires very small amounts of DNA; thus, it is suitable for use in clinical laboratories. The utility of MethyLight for DNA methylation-based biomarker analysis has been demonstrated by numerous studies, including the methylation of GSTP1, APC, TGFβ2, HOXD3, MLH1, dickkopf homolog 1 (DKK1), and secreted frizzled-related protein 1 (SFRP1), which has been detected in prostate and colon cancers [43,119-122]. More recently, MethyLight has been improved with the implementation of digital PCR [98]. However, this assay only allows quantitative methylation assessment of a few selected CpGs and is based on the assumption that all CpGs within the region probed share the same methylation status. Therefore, the selection of informative CpGs is crucial. MS-MCA is a method that employs an intercalating double-stranded DNA fluorescent dye such as SYBR green to monitor the melting properties of PCR products during MSP as temperatures rise [114]. DNA melting curves are acquired by measuring the fluorescence during a linear temperature transition. Methylated DNA following bisulfite modification contains higher GC content, thus making it more resistant to melting. As a result, more fluorescent signal is recorded at higher melting temperatures. Methylation status of an unknown sample is then determined by comparing its melting profile with the melting profiles of controls obtained from the amplification of fully methylated and unmethylated molecules. MS-HRM is an improvement of MS-MCA that acquires more data points, thus allowing for subtle differences within the amplicons to be detected. More information about this strategy is available in a review by Kristensen et al. [111,115]. Another method called SMART-MSP involves WGA, bisulfite modification, and probe-free real-time MSP with a fluorescent dye followed by HRM [116]. In this approach, methylation levels are determined based on fluorescent signal detection during MSP and melting profiles during HRM. Methylation levels are determined by normalization to a control assay such as collagen, type II, alpha 1 (COL2A1) as well as to fully methylated and unmethylated standards. The limitations of all methods based on MCA are the use of dyes and the necessity of special equipment. Additionally, when heterogeneously methylated molecules are analyzed by MCA, the melting pattern becomes complex and difficult to interpret.
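The basis of melting-based discrimination, namely the higher GC content of amplicons derived from methylated templates, can be checked with a few lines of code. The sketch below compares two hypothetical bisulfite-converted amplicon sequences derived from the same locus; the sequences and the function name are illustrative, and no attempt is made to predict actual melting temperatures.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical converted amplicons from the same locus with three CpGs.
from_methylated = "ACGTTCGATCGA"    # CpG cytosines protected from conversion
from_unmethylated = "ATGTTTGATTGA"  # all cytosines converted to thymine

print(gc_fraction(from_methylated))
print(gc_fraction(from_unmethylated))
```

The difference in GC content grows with the number of CpGs in the amplicon, which is consistent with the observation that heterogeneously methylated molecules produce intermediate and more complex melting profiles.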
An alternative approach called HeavyMethyl uses methylation-independent primers and oligonucleotide blockers that hybridize only to unmethylated DNA [117]. Thus, only methylated DNA is amplified. The use of blockers to prevent unmethylated DNA amplification increases analytical sensitivity and reduces the false-positive rate. This strategy also employs a fluorescent probe, and fluorescent signal detection is used to quantify DNA methylation. Methylation status is quantified by normalization to a reference housekeeping gene such as β-actin (ACTB) in a duplex PCR reaction. This approach is more complicated than other approaches and requires more careful optimization. MS-FLAG is another quantitative MSP approach that relies on fluorescence [118]. In MS-FLAG, the real-time fluorescence signal is detected during PCR by cleavage of the MSP primers containing a fluorophore by a thermostable endonuclease. Methyl-BEAMing is a recently developed system based on methylation-independent PCR amplification of individual bisulfite-converted DNA molecules attached to magnetic beads within aqueous nano-compartments suspended in an oil phase [123]. Following PCR, the beads are collected, incubated with fluorescent probes that specifically hybridize to methylated sequences, and analyzed using flow cytometry. Methylation levels are then determined by normalization to long interspersed nuclear elements (LINE1)-based control reactions. This approach has been successfully applied to the analysis of vimentin methylation as a potential diagnostic biomarker for colorectal cancer [123].

Methylation-sensitive single-nucleotide primer extension (MS-SNuPE)
MS-SNuPE is another bisulfite modification-based strategy that has been previously reviewed [124,125]. The assay involves amplification of bisulfite-modified DNA with primers that terminate prior to the cytosine residue to be assayed. Next, on primer annealing, the primers are extended with radioactive nucleotides and the methylation is identified based on the sequence visualized by autoradiography. To avoid radioactive labeling, SNaPshot, HPLC, and MIRA platforms have been combined with MS-SNuPE [126-128].

Combined bisulfite restriction analysis (COBRA)
COBRA is a well-established bisulfite-based method that relies on methylation-independent DNA amplification and digestion with BstUI, an enzyme that cuts only at recognition sites in which the cytosines were methylated and therefore remained unmodified by bisulfite [129]. Methylation levels are established by the relative amounts of digested and undigested PCR products. COBRA is a low-throughput, nonquantitative technique that can only analyze CpGs present in enzymatic restriction sites. Furthermore, the method is relatively labor-intensive yet cost-effective. An improved protocol for COBRA, called Bio-COBRA, has been developed with a microfluidic platform for higher-throughput, more accurate, and quantitative DNA methylation analysis [130].

Methylation-sensitive arbitrarily primed PCR (MS-AP-PCR) and amplification of intermethylated sites (AIMS)
The most well-known locus-specific DNA methylation analysis techniques based on methylation-sensitive RE are MS-AP-PCR and AIMS [131-133]. In MS-AP-PCR, DNA is digested with MspI or HpaII, whereas in AIMS, it is digested with SmaI and XmaI. However, both techniques suffer from low resolution and low throughput, require high DNA quality and quantity, and utilize radioactive materials. Consequently, MS-AP-PCR and AIMS are rarely used for methylation analysis nowadays.
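Because these assays interrogate only CpGs that lie within enzyme recognition sites, it can be useful to check in advance which sites a region of interest contains. The sketch below locates the recognition sequences of the enzyme pairs named above (CCGG for MspI and HpaII, CCCGGG for SmaI and XmaI) in a hypothetical sequence; the function name and example sequence are illustrative, and the code does not model methylation sensitivity or cleavage position.

```python
import re

# Recognition sequences of the isoschizomer pairs mentioned in the text.
RECOGNITION_SITES = {
    "MspI": "CCGG",
    "HpaII": "CCGG",
    "SmaI": "CCCGGG",
    "XmaI": "CCCGGG",
}


def find_sites(seq, enzyme):
    """Start indices (0-based) of every recognition site, including overlaps."""
    site = RECOGNITION_SITES[enzyme]
    # A lookahead is used so that overlapping occurrences are all reported.
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


# Hypothetical region of interest.
region = "ATCCGGTTCCCGGGAA"
print(find_sites(region, "HpaII"))
print(find_sites(region, "SmaI"))
```

Note that every CCCGGG site also contains a CCGG site, so a region assayed with SmaI and XmaI is also visible to MspI and HpaII, but not the reverse.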

MeDIP-PCR
An alternative approach to enrich for methylated DNA is affinity-based enrichment. One class of affinity-based strategies, called MeDIP-PCR, utilizes bead-immobilized anti-5-methylcytosine antibodies [134]. Gene-specific DNA methylation is subsequently analyzed by PCR. Affinity-based methods allow for rapid and specific assessment of methylation changes in a gene-specific manner. They are easy to use and commercially available. However, the methods require high DNA input, have a potential for false-positive results due to nonspecific binding to unmethylated DNA, and do not yield quantitative information.

Combination of methylated-DNA precipitation and methylation-sensitive restriction enzymes (COMPARE-MS)
This method combines RE digestion with AluI and HpaII and MIRA to enrich for methylated DNA followed by quantitative real-time PCR (qPCR) for a more sensitive and specific methylation analysis than either approach alone [135]. However, the assay is complex, labor-intensive, and time consuming.

Conclusions and Future Perspectives
The development of DNA methylation-based biomarkers is an emerging and exciting area of research that holds promise for potential applications in diverse clinical settings. This review focuses on a large number of techniques that have been developed for methylation analysis at global and gene-specific levels for DNA methylation-based biomarker discovery and validation. It is important to note that a key intermediate step between discovery and validation is the analysis of the heterogeneity of methylation in gene promoters and identification of contextually meaningful CpG sites that mediate gene transcription. DNA methylation changes may cause quantitative transcriptional changes and/or may lead to qualitative transcriptional silencing. Upstream regulatory regions of many genes are known to harbor more than one promoter. These promoters may serve to regulate expression of specific transcripts, thereby leading to generation and/or expression of alternate transcripts. Differential methylation of such promoters may be context dependent; that is, certain promoters are preferentially regulated via methylation in certain tissues or cell types. Alternatively, methylation signals of promoters may change in response to the surrounding environmental milieu.
Better understanding of these aspects will provide important clues underlying association of specific biomarkers with disease biology. However, there are still many challenges to the effective implementation of DNA methylation-based biomarkers. For example, many methylation studies published to date have not accounted for the presence of hydroxymethylated DNA. The role(s) of 5-hydroxymethylation is distinct from 5-mC and is being elucidated. Recent studies suggest that 5-hmC may serve as an intermediate in direct DNA demethylation [136,137]. It is present in mammalian DNA at physiologically relevant levels and aberrant hydroxymethylation may lead to disease. For example, 5-hmC is already implicated in carcinogenesis as it is significantly decreased in prostate, colon, and breast cancer compared with normal tissue [138]. Systematic investigation of the distribution and function of 5-hmC marks in various cellular contexts is necessary. No single 5-mC or 5-hmC detection strategy to date is superior to others, and there is much to be done in the field of epigenetic biomarker analysis strategies that will close the gap between biomarker discovery and clinical adaptation.
Earlier methylation analyses relied exclusively on BS, but this approach has many challenges. Subsequently, array-based profiling approaches were leading the field of DNA methylation-based biomarker discovery, but NGS-based approaches have quickly caught up and are likely to become the platform of choice in the near future. If the $1000 personal genome becomes a reality in the future, the personal epigenome will soon follow. One can envision that, in the era of personalized medicine, individual methylation "signatures" will be tested in a variety of minimally invasive samples. Although the identification of methylation "signatures" is currently focused mostly on cancer, future work will extend to other diseases.
With respect to future frontiers in array-based platforms, the development of a triple microarray that will allow highly sensitive analysis of disease-related changes in DNA methylation, histone modifications, and microRNA expression simultaneously will provide new insights for more comprehensive epigenetic biomarker development.
New advancements in epigenetic technologies in the future will most likely drive the development of easy, noninvasive, cost-effective, high-throughput, highly sensitive, and specific epigenetic tests in the clinic.
Fellowship, Laboratory Medicine and Pathobiology Fellowship, and Paul Starita Fellowship, University of Toronto.