Epigenome-wide association studies for cancer biomarker discovery in circulating cell-free DNA: technical advances and challenges

Since the concept of epigenome-wide association studies (EWAS) was introduced in 2011, there has been a vast increase in the number of published EWAS in common diseases, including cancer. These studies have increased our understanding of the epigenetic events underlying carcinogenesis and have enabled the discovery of cancer-specific methylation biomarkers. In this mini-review, we focus on the state of the art in EWAS applied to circulating cell-free DNA for epigenetic biomarker discovery in cancer, and discuss the associated technical advances and challenges and our expectations for the future of the field.


Introduction
The epigenome represents the compendium of (mitotically) heritable molecular changes, independent of alterations in the DNA sequence, that holistically regulate (in concert with other factors) the mode of expression of the information encoded in the DNA sequence, hence determining the cellular phenotype [1]. Epigenetic marks include DNA methylation, histone modifications and variants, as well as gene expression regulation by ncRNAs. DNA methylation is one of the most important epigenetic modifications in eukaryotes and is necessary for cellular differentiation, with each cell type having a unique methylation profile. Environmental exposures, age-related changes, changes induced by injury and inflammation, and mutations in epigenome-regulating genes all leave their mark on the methylome, creating a distinctive footprint.
Analogous to genome-wide association studies (GWAS), epigenome-wide association studies (EWAS) are designed to identify associations of epigenetic marks with a specific phenotype (a trait, condition or disease) using a variety of array- or sequencing-based profiling technologies [2]. The most commonly studied epigenetic mark is DNA methylation in the form of 5-methylcytosine in the CpG context. DNA methylation plays a crucial role in the regulation of many oncogenes and tumour suppressor genes, and aberrant DNA methylation, both at individual gene promoters and on a genome-wide scale, has been heavily implicated in cancer initiation and progression [3,4].
Hypermethylation of promoter CpG islands may induce epigenetic silencing of individual tumour suppressor genes while global hypomethylation contributes to tumourigenesis through the promotion of genomic instability and activation of oncogenes. It has been well documented that specific epigenetic changes, such as MGMT promoter hypermethylation in glioblastoma, or global hypermethylation in bladder cancer, have been associated with sensitivity/resistance to chemotherapeutic drugs [5][6][7]. Alterations in DNA methylation patterns were shown to be an early feature in cancer development, and unlike mutations, specific epimutations are highly prevalent within tumour types [8][9][10][11]. Pronounced intratumour heterogeneity of promoter DNA methylation has been documented in a variety of tumours, and recent studies have demonstrated a positive correlation between the extent of genome-wide DNA methylation heterogeneity and adverse patient outcome [12][13][14]. Altogether, these findings have spurred research into the potential use of specific DNA methylation alterations as cancer biomarkers for diagnosis, prognosis and prediction of therapy response, and early detection [15][16][17].

Cancer EWAS and biomarkers
The scope for the discovery of novel DNA methylation biomarkers has greatly expanded due to the evolution of new technologies enabling a transition from candidate-gene approaches to genome-wide studies based on microarray and sequencing methods interrogating hundreds of thousands of CpG loci, and ultimately to bisulfite sequencing of the whole genome or a selected fraction of the genome. EWAS have provided a systematic insight into both environmental factors (such as diet and smoking) and intrinsic factors that result in altered DNA methylation profiles. Although a causal relationship cannot be inferred with certainty from EWAS, DNA methylation marks strongly associated with the phenotype of interest can nevertheless be useful as cancer biomarkers [15,18].
In the past five years, as a result of the widespread use of the Infinium Bead Array ('450 K array') and methylation pulldown sequencing assays, the number of EWAS published each year that aim to identify and validate specific DNA methylation changes associated with cancer initiation, specific subtypes, prognosis or drug response has tripled [17]. Many DNA methylation biomarkers with diagnostic, prognostic and predictive power are already in clinical trials or in the clinical setting for cancer [19]. One such success story for DNA methylation biomarker development, with rapid translation from bench to bedside, is the methylation of the SEPT9 promoter as implemented in a blood-based test for colorectal cancer (CRC) screening [20]. Following extensive validation in prospective clinical trials, the SEPT9 test has been commercially marketed as Epi proColon (Epigenomics AG) and has been made available in several European countries, China and the USA, where it was recently approved by the US Food and Drug Administration (FDA).

DNA sources for EWAS
Given the robustness of DNA methylation during cell isolation and processing, DNA derived from almost any tissue type or bodily fluid can be used for DNA methylation analysis, provided there is enough input material for the chosen assay. Figure 1 illustrates the most common cell-based and cell-free DNA sources used for EWAS and why. Although it is desirable to measure disease-associated DNA methylation biomarkers in a disease-relevant tissue or primary cell type, surrogate tissues such as whole blood may be used if the biomarker is tightly associated with the phenotype of interest either directly or indirectly. Whole blood has been the surrogate tissue of choice for many EWAS performed to date, despite limitations in detecting tissue-specific alterations and the requirement to correct for cell composition heterogeneity (reviewed in [21,22]). A notable early example of a blood-based EWAS applying cell composition correction is a study by Liu et al., who identified four CpG loci in the major histocompatibility complex (MHC) cluster associated with rheumatoid arthritis by analysing 354 cases and 337 healthy controls using the 450 K array platform [23]. Other types of tissues, including solid tumour biopsies, have also been successfully used to identify biomarkers as well as mechanistically relevant differentially methylated positions (DMPs) and regions (DMRs), in lung cancer [24] to name but a few recent studies. However, for successful discovery of biomarkers based on comparing (preferably matched) tumour and normal tissue samples, and subsequent translation to a blood-based test suitable for a clinical environment, it is important to include whole blood as a control tissue and/or screen the biomarker against an appropriate database such as MARMAL-AID [31,32]. This helps to ensure that the selected cancer-specific biomarkers are not, or are only minimally, confounded by cell composition effects.
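The need for cell composition correction in whole blood can be illustrated with a small simulation. The sketch below is illustrative only and is not the method used in the studies cited above: the cell-type methylation levels, the neutrophil fractions and all function names are hypothetical, and a single-covariate least-squares fit stands in for the multi-cell-type correction methods used in practice.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def mean(values):
    return sum(values) / len(values)

random.seed(0)
# Hypothetical CpG: methylated in lymphocytes, unmethylated in neutrophils.
BETA_LYMPH, BETA_NEUT = 0.9, 0.1

def simulate(n, neut_mean):
    """Simulate n blood samples; return neutrophil fractions and bulk beta values."""
    fracs, betas = [], []
    for _ in range(n):
        f = min(max(random.gauss(neut_mean, 0.05), 0.0), 1.0)
        bulk = f * BETA_NEUT + (1 - f) * BETA_LYMPH + random.gauss(0, 0.01)
        fracs.append(f)
        betas.append(bulk)
    return fracs, betas

# Cases have more neutrophils but NO cell-intrinsic methylation change (assumed).
ctrl_f, ctrl_b = simulate(200, 0.55)
case_f, case_b = simulate(200, 0.65)

raw_diff = mean(case_b) - mean(ctrl_b)

# Adjust: regress beta on neutrophil fraction across all samples, compare residuals.
intercept, slope = ols(ctrl_f + case_f, ctrl_b + case_b)

def residuals(fracs, betas):
    return [b - (intercept + slope * f) for f, b in zip(fracs, betas)]

adj_diff = mean(residuals(case_f, case_b)) - mean(residuals(ctrl_f, ctrl_b))

print(f"unadjusted case-control difference: {raw_diff:+.3f}")
print(f"adjusted case-control difference:   {adj_diff:+.3f}")
```

Under these assumptions the unadjusted comparison reports an apparent hypomethylation in cases that arises purely from the shift in cell proportions, while the adjusted difference is close to zero.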

Liquid biopsies for biomarker discovery
Because of their invasive nature, tumour biopsies cannot always be performed in the clinical setting and, additionally, they are liable to sampling bias owing to the heterogeneous nature of solid tumours. To overcome these limitations, liquid biopsies are increasingly being used as a minimally invasive alternative that captures tumour heterogeneity more comprehensively [33]. To this end, circulating tumour DNA (ctDNA) or cell-free DNA (cfDNA) isolated from different biofluids, such as plasma, serum, urine or saliva, can be used for biomarker discovery in the context of tissue [34,35] and tumour [36,37] dynamics, including tissue-specific cell death as well as tumour load, progression and evolution (Figure 1).
Cell-free DNA is highly fragmented to a mean length of only around 180 base pairs. Components of cfDNA include DNA shed by normal cells undergoing apoptosis in healthy individuals, but both necrosis and apoptosis of tumour cells, disseminated tumour cells (DTC), and active secretion of DNA by living cells contribute to ctDNA in cancer patients. Although ctDNA makes up less than 1% of total cfDNA content, it can be distin-

Figure 1. Cell-based and cell-free DNA sources for epigenome-wide association studies (EWAS). DNA can be derived from whole blood (wb), affected tissue (at), purified primary cells (pc) and biofluids, including blood plasma or serum, with (cf) referring to all circulating cell-free DNA and (ct) to circulating cell-free tumour DNA. Panel labels: low resolution disease-specific associations; tissue dynamics, including tissue-specific cell death; tumour dynamics, including load, progression and evolution; high resolution disease-specific associations. Source: Current Opinion in Genetics & Development.

demonstrated the utility of ctDNA for monitoring tumour dynamics during treatment in patients with advanced disease. Exemplar proof-of-principle efforts include a study on metastatic breast cancer comparing the utility of targeted sequencing of ctDNA to serum cancer antigen 15-3 (CA15-3) and circulating tumour cells (CTCs) to measure treatment response [41], and a study by Murtaza and colleagues applying exome sequencing to track tumour evolution of breast, ovarian and lung cancers in response to therapy [42]. In contrast, most of the studies looking at cfDNA methylation associated with tumour stage, prognosis and response to therapy performed to date were based on candidate-gene approaches [43][44][45][46][47]. Studies based on genome-wide methylation profiling of cfDNA remain scarce [48,49,50,51] due to technical challenges including minute amounts of starting material and procedural losses during sample processing, DNA extraction, bisulfite conversion and library preparation (Figure 2).

Technical challenges
Although there is a plethora of available methods for genome-wide DNA methylation profiling, including the use of methylation-specific restriction enzymes, affinity enrichment or bisulfite conversion in combination with microarray or sequencing, these methods differ not only in their coverage and resolution (ranging from a few hundred bp to 1 bp), but also in the amount of starting material they require [52,53]. The gold standard method for genome-wide interrogation of DNA methylation at single base pair resolution with a digital readout is bisulfite sequencing (BS-seq) [54,55]. Here, we discuss the critical determinants for applying high-throughput genome-wide BS-seq to EWAS for cancer biomarker discovery in liquid biopsies (Figure 2).
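The digital, single-base readout of BS-seq follows from the chemistry: bisulfite converts unmethylated cytosines to uracil (read as T after amplification), whereas methylated cytosines remain C, so the methylation level at a CpG is the fraction of reads showing C. The following minimal sketch illustrates this principle only; it assumes complete conversion, error-free full-length reads already aligned to the forward strand, and a made-up reference sequence, and the function names are hypothetical rather than taken from any published pipeline.

```python
def bisulfite_convert(seq, methylated_positions):
    """In-silico conversion of one strand: unmethylated C -> T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def cpg_positions(ref):
    """Indices of the cytosine in every CpG dinucleotide of the reference."""
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]

def call_methylation(ref, reads):
    """Per-CpG methylation level = C reads / (C + T reads) at each reference CpG."""
    levels = {}
    for pos in cpg_positions(ref):
        c = sum(1 for r in reads if r[pos] == "C")
        t = sum(1 for r in reads if r[pos] == "T")
        levels[pos] = c / (c + t) if (c + t) else None
    return levels

# Toy reference with CpGs at positions 3, 7 and 11 and a non-CpG C at position 1.
ref = "ACTCGGACGTTCGA"

# Four hypothetical DNA molecules, each with its own set of methylated cytosines.
molecules = [{3, 7}, {3}, {3, 7}, {3, 11}]
reads = [bisulfite_convert(ref, m) for m in molecules]

levels = call_methylation(ref, reads)
print(reads[0])  # ATTCGGACGTTTGA
print(levels)    # {3: 1.0, 7: 0.5, 11: 0.25}
```

Real pipelines additionally have to handle both strands, alignment against converted references, sequencing errors and incomplete conversion, none of which is modelled here.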
Firstly, the choice of biofluid is of significant importance; it should be selected not only in relation to the biology of the disease (for instance, urine for bladder cancer or saliva for oral cancer), but also taking into consideration the composition of the sample. For example, even though blood serum contains higher concentrations of cfDNA per ml than plasma, it was demonstrated that a large fraction of it originates from lysed lymphocytes, so that serum contains a reduced representation of ctDNA [49]. To achieve an optimal analytical sensitivity of the downstream methylation assay, stringent standard operating procedures (SOPs) for biofluid collection (choice of anticoagulant, time interval between collection and processing, storage temperature), processing (centrifugation force and temperature, extraction method), quantification and long-term storage need to be implemented to maximise cfDNA recovery during extraction and to minimise the levels of background noise that may come from contaminating DNA [56].
Figure 2. EWAS pipeline for biomarker discovery based on genome-wide bisulfite sequencing in liquid biopsies. The biological or clinical question determines the appropriate EWAS study design. The choice of sample type determines the source of potential gDNA contamination of the liquid biopsy, given the differences in cell composition between biofluids (blood plasma, serum, urine, cerebrospinal fluid, saliva, etc.). Sample processing, including the methods for sample acquisition and storage, carries the risk of cfDNA degradation and/or contamination with gDNA. Significant procedural loss of material may occur during subsequent cfDNA extraction. Treatment with sodium bisulfite induces further fragmentation of cfDNA or adaptor-tagged libraries, in addition to procedural DNA loss during purification steps. The choice of read length and sequencing depth, and whether or not molecular barcodes are used, influence the coverage and the lower detection limit for DNA methylation variants.
Secondly, treatment of cfDNA with sodium bisulfite results in further fragmentation and significant loss of starting material during desulfonation and purification procedures [57][58][59]. Even though the levels of cfDNA are somewhat increased in cancer patients, the combination of minute amounts of tumour-specific methylated DNA circulating in an excess of unmethylated 'normal' cfDNA and the losses caused by bisulfite treatment may lead to reduced library complexity and stochastic sampling issues. In addition, the limiting factor for variant detection in cfDNA methylation analysis by BS-seq is the background noise from incomplete bisulfite conversion, which might be further reduced by optimisation of this step. Thus, efficient protocols for cfDNA isolation from plasma and for bisulfite treatment are critical for the subsequent generation of libraries of sufficient complexity and for the successful analysis of sparse amounts of methylated DNA in plasma.
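Both effects described above can be made concrete with simple arithmetic. The sketch below is a rough illustration under stated assumptions rather than a validated model: it takes one haploid human genome as approximately 3.3 pg, assumes a tumour fraction of 0.1% (the text states only that ctDNA is below 1%), uses hypothetical recovery values for procedural losses, ignores fragmentation, and treats conversion failures at neighbouring CpGs as independent. All function names are illustrative.

```python
PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome (pg)

def genome_equivalents(input_ng, recovery=1.0):
    """Haploid genome copies left after losses; recovery is the surviving fraction."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME * recovery

def p_at_least_one_tumour_copy(input_ng, tumour_fraction, recovery=1.0):
    """Binomial probability that at least one tumour-derived copy of a locus is present."""
    n = int(genome_equivalents(input_ng, recovery))
    return 1.0 - (1.0 - tumour_fraction) ** n

def p_false_methylated_fragment(conversion_rate, n_cpgs):
    """Chance that all n_cpgs unmethylated CpGs on one fragment escape conversion."""
    return (1.0 - conversion_rate) ** n_cpgs

# Stochastic sampling: hypothetical inputs and recoveries, assumed 0.1% tumour fraction.
for ng, rec in [(10, 1.0), (10, 0.1), (1, 0.1)]:
    p = p_at_least_one_tumour_copy(ng, tumour_fraction=0.001, recovery=rec)
    print(f"{ng} ng input, {rec:.0%} recovery: P(>=1 tumour copy) = {p:.2f}")

# Conversion noise: requiring several concordant CpGs per fragment suppresses it.
for k in (1, 3, 5):
    p = p_false_methylated_fragment(0.99, k)
    print(f"{k} CpG(s) at 99% conversion: P(false fully methylated fragment) = {p:.0e}")
```

Under these assumptions, the chance of capturing even a single tumour-derived copy of a given locus falls sharply as input and recovery decrease, which is why minimising procedural loss matters as much as sequencing depth.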
Finally, standard BS-seq protocols, in which adaptor tagging precedes the bisulfite conversion step and is followed by several gel purification steps that result in a significant loss of starting material, require micrograms of input DNA and are not amenable for use on cfDNA. Other important considerations for BS-seq EWAS study design that are not the focus of this manuscript are sample size, sequencing depth and the choice of analysis pipelines, which are described in greater detail elsewhere [32][60][61][62][63][64][65]. Ultra-low input bisulfite sequencing became feasible by incorporating bead purification and single-tube library preparation, and by engineering different methods for library construction based on: random priming of tagged adapters to bisulfite-converted ssDNA in post-bisulfite adaptor tagging (PBAT) [69][70][71]; transposase-based library construction in tagmentation BS-seq (T-WGBS or Tn5mCseq) [72][73][74]; ssDNA adaptor ligation, originally applied to ancient DNA samples [75,76]; and the template-switching activity of the MMLV-RT enzyme to introduce both adapters in a single step in Capture and Amplification by Tailing and Switching (CATS) [77]. These methods were successfully applied for whole-genome methylation profiling of single cells [78], maternal plasma for prenatal diagnostics [68], and plasma from cancer patients [50,51,79].

Methods for ultra-low input WGBS

Targeted BS-seq of circulating cell-free DNA
Preferably, EWAS would be performed using complete methylome data. For the time being, however, the costs associated with WGBS are still prohibitively high to allow for high-throughput studies despite recent progress in extracting more information from low depth-of-coverage WGBS data [80].
Other approaches for medium- to high-coverage targeted bisulfite sequencing rely either on non-specific enrichment of CpG-rich regions by restriction enzyme digestion, as in reduced representation bisulfite sequencing (RRBS), or on target-specific enrichment. Modified RRBS methods were applied to study methylation in small cell populations [81,82] and in single cells [83].
Although RRBS was successfully applied to laser-captured FFPE samples [84], it is yet to be determined if it is amenable for use on highly fragmented cfDNA. In a recent study, Wen et al. implemented an innovative, highly sensitive method for detection of thousands of hypermethylated CpG islands in cfDNA, methylated CpG tandems amplification and sequencing (MCTA-Seq), to analyse a large cohort of tissue (n = 57) and plasma samples (n = 94) from hepatocellular carcinoma (HCC) patients (n = 36) and healthy controls (n = 55). A panel of four genes specific for cancer detection (RGS10, ST8SIA6, RUNX2 and VIM) was identified, and comparison between matched plasma and tissue samples indicated that both the cancer and noncancerous tissues contribute to elevation of the methylation markers in plasma [85].
Conversely, target-specific enrichment might be carried out using microdroplet PCR amplification (Raindance) [86,87], ligation capture [88,89], bisulfite padlock probe (BSPP) capture [90,91] or in-solution hybridization [92][93][94]. To the best of our knowledge, these enrichment methods have not yet been applied to the genome-wide analysis of plasma cfDNA, owing to their comparably high input requirements (a few hundred nanograms to a few micrograms). Nonetheless, Miura et al. have recently combined the PBAT protocol with in-solution hybridization using Agilent SureSelect probes to perform targeted bisulfite sequencing starting from only 30 ng of DNA [95]. With continuing research in the field, we believe that target enrichment in combination with the aforementioned ultra-low input methods for library preparation may prove highly suitable for future EWAS aimed at the discovery of novel cfDNA methylation biomarkers.

Outlook
EWAS performed on liquid biopsies represents a highly promising and minimally invasive platform for discovery of novel epigenetic biomarkers for early detection, prognosis and treatment monitoring in cancer. Based on the compelling advantages and progress discussed here, it is not surprising to see both commercial and academic initiatives being set up leveraging different aspects of cell-free DNA analysis, including GrailBio (http://www.grailbio.com/), leveraging ultra-deep sequencing, CancerID (http://www.cancer-id.eu/), leveraging major pan-European resources, C2c [96], leveraging integration with whole-body imaging and UroMark, leveraging microdroplet technology [97], to name but a few. Following the recent benchmarking of DNA methylation assays for validating epigenetic biomarkers [98] and the development of more sensitive methods overcoming the discussed technical limitations for genome-wide methylation analysis of cfDNA and ctDNA, we predict a surge in EWAS-based studies utilizing liquid biopsies in the coming years. Together with complementary approaches, these studies will advance precision medicine for cancer by facilitating the delivery of much-needed biomarkers for transforming cancer health care.

Conflict of interest
The authors have no competing interests to disclose.

In this study the authors demonstrated that the detection of genome-wide hypomethylation and copy number aberrations in plasma using WGBS is a promising approach for hepatocellular carcinoma detection.
51. Legendre C, Gooden GC, Johnson K, Martinez RA, Liang WS, Salhia B: Whole-genome bisulfite sequencing of cell-free DNA identifies signature associated with metastatic breast cancer. Clin Epigenet 2015, 7:100. In this study the authors performed EWAS based on WGBS of cfDNA to identify DNA hypermethylated loci for prediction of metastatic breast cancer.