Introduction

While an individual’s genome is static across cell type and time except for rare somatic mutations, the mechanisms that regulate genomic transcription vary across cell types and are dynamic in different physiologic states. Collectively, these regulatory elements make up one’s epigenome. DNA methylation, one of these epigenetic modifications, principally acts as a silencer of gene expression through hypermethylation at promoter regions.1,2,3 Unique patterns of DNA methylation occur at various genetic loci to regulate gene expression in a cell- or tissue-specific manner.4,5,6,7,8,9 These cell type-specific methylation patterns serve as markers of cell identity and are sometimes referred to as the “methylome.”

Cell-free DNA (cfDNA) is composed of small DNA fragments averaging 150 bp in size found in plasma and urine.10 These cfDNA fragments are principally released due to the processes of necrosis, apoptosis, and NETosis (a process referring to the release of neutrophil extracellular traps (NETs) as part of the immune response).10,11 Due to its origin from different forms of cell death, cfDNA is largely considered to be a marker of inflammation and tissue injury.11,12,13,14,15,16 While cfDNA has the potential to be a powerful clinical biomarker, it lacks cell-type specificity and thus cannot be used to localize the source of tissue damage. However, if cfDNA is used in conjunction with methylome analysis, this lack of specificity of cfDNA is overcome.

This review will discuss methylation and its role in gene silencing and cell-type specificity, cfDNA as a component of the inflammatory process and its clinical application, and the establishment of the methylome and how its study can be coupled with the study of cfDNA in the diagnosis and management of pediatric disease.

Methylation

As the DNA sequence is stable across cell types, cell differentiation and cell-specific functions rely on regulatory elements. These regulatory elements include histone modification and DNA methylation. Collectively, these regulatory elements are referred to as the epigenetic code. DNA methylation occurs on cytosine residues, principally in the context of prolonged cytosine-guanine dinucleotide (CpG) repeats referred to as CpG islands, with 50–90% of CpG sites being methylated genome-wide, though the degree of methylation in a given region varies by cell type and function throughout the genome.17,18,19,20 Because methylation takes place on CpG repeats, CpG content is equivalent between DNA strands, and methylation can be propagated across cell divisions.4,21
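
The strand equivalency noted above follows from the fact that the CpG dinucleotide is its own reverse complement, so every CpG on one strand is mirrored by a CpG on the other. The following is a minimal Python sketch of that point; the sequence and the helper names (reverse_complement, cpg_positions) are made up for illustration and do not come from any real locus or tool.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def cpg_positions(seq):
    """Return 0-based positions of the C in every CpG dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


# Illustrative sequence only (assumption); not taken from a real genome.
top = "TTCGACGGACGTCA"
bottom = reverse_complement(top)

top_sites = cpg_positions(top)
bottom_sites = cpg_positions(bottom)

# A bottom-strand CpG whose C sits at index j lies opposite the top-strand
# CpG whose C sits at index len(top) - 2 - j.
mirrored = sorted(len(top) - 2 - j for j in bottom_sites)

print(top_sites, mirrored)
```

Both lists contain the same positions, which is why a methylation mark on one strand has a matching cytosine on the other strand that can carry the same mark after replication.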

Methylation principally occurs in the context of promoter hypermethylation and functions to prevent gene transcription, with approximately 70% of gene promoters residing within CpG islands.22 However, methylation outside of promoter regions occurs and has a variable impact on transcription regulation based on location within the genome. Methylation spikes have been described in exonic regions at exon-intron boundaries, suggesting the role of methylation in directing alternative pre-mRNA splicing.18,23,24,25 Interestingly and in contrast to promoter hypermethylation, multiple lines of evidence have demonstrated that methylation of the gene body is positively correlated with euchromatin configuration and gene expression.18,26,27,28,29

Gene silencing

Typically, promoter methylation prevents gene transcription. This association between methylation and gene silencing is supported by positive correlations between gene transcription and promoter hypomethylation across all cell types.18 Active transcription start sites typically have unmethylated CpG islands located near gene promoter regions.30 Conversely, experimentally increasing methylation in promoter regions leads to decreased gene expression.30 DNA methylation of regulatory regions mediates gene silencing by recruiting methyl-group recognition proteins that in turn recruit additional factors to induce a closed chromatin configuration.2,31,32

Interestingly, while DNA methylation regulates chromatin configuration, chromatin configuration appears to dictate methylation. Namely, histone H3 methylated at lysine 4 (H3K4me) seems to prevent methylation. Evidence shows that unmethylated CpG islands are enriched with H3K4me histones and associated with active promoters, while <0.1% of methylated CpG islands are packaged with H3K4me.4,19,33,34,35 Data suggest that the presence of H3K4me histones is protective against the actions of DNA methyltransferase 3-like protein (Dnmt3L), whereas the presence of its counterpart, histone H3 unmethylated at lysine 4, allows recognition by Dnmt3L that subsequently activates factors to promote DNA methylation.35

Methylation in embryogenesis

The regulatory nature of DNA methylation plays a prominent role in embryogenesis and development, with global demethylation being among the first processes to occur. Within hours of fertilization, the paternal genome undergoes active and expansive demethylation, with the maternal genome undergoing demethylation within the first several cleavage cycles.36,37,38 It is presumed that this early global demethylation permits the reestablishment of totipotency of early embryonic cells, while the differential timing of paternal versus maternal DNA demethylation allows for parental-specific developmental cues.1,36 These parental-specific patterns of methylation drive the pathogenesis of genomic imprinting diseases, such as Prader–Willi syndrome (PWS) and Angelman syndrome (AS). Both diseases occur due to deletions in the region of chromosome 15q11.2-q13; however, paternal versus maternal methylation patterns lead to differential gene expression within this region. As a result, a deletion on the paternal chromosome will lead to a PWS phenotype, whereas a maternal deletion will result in AS.

Following this early and active demethylation, widespread de novo methylation occurs during implantation, creating a bimodal methylation pattern with nearly the whole genome becoming methylated except for a small number of protected CpG island-like windows at specifically recognized promoter sequences. These methylation patterns are then propagated with subsequent cell divisions. In the process of reestablishing the methylation pattern, progressive discrete changes in methylation occur in coordination with cell differentiation with increasing levels of methylation as development advances.4,18,38

The methylome and cell-type specificity

As has been discussed, promoter methylation at CpG sites leads to gene silencing and occurs progressively in relation to the degree of cell differentiation. It has been extensively demonstrated that differentially methylated regions (DMRs) occur in a cell type-specific manner. Cell types and tissues have varying amounts of total genomic methylation. For example, there is a 4% difference in the overall level of methylation between skeletal muscle and liver.39 Distinct cell types have unique sets of hypermethylated and hypomethylated promoter regions, and the hypomethylated regions correlate with the expression of genes related to a cell’s specific function.4,5,6,7 Tissue-specific DMRs relating to cell-type expression include the selective methylation of germline-specific genes in somatic cells,9 hypomethylation of the neuronal differentiation factors NEUROD1 and MEF2A in ectodermal tissues, and hypomethylation of RUNX1 in hematopoietic organs.5 In this way, DMRs serve as an identifying marker of cell type, and the DMRs of an individual, taken together, can be referred to as the methylome. To further use the epigenome in the study and treatment of human physiology and disease, large-scale efforts are taking place to create reference methylomes. For example, the Encyclopedia of DNA Elements (ENCODE) project has characterized methylation signatures in 82 cell types and tissues throughout the body.40
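
The idea of a DMR as a cell-type marker can be illustrated with a short Python sketch that flags regions in which one tissue is clearly separated from all others. The methylation fractions below are invented placeholders that only mirror the direction of the examples in the text (NEUROD1 hypomethylated in an ectodermal tissue, RUNX1 hypomethylated in a hematopoietic tissue); the function name, the tissue labels, and the 0.5 separation threshold are likewise assumptions rather than published criteria.

```python
def tissue_specific_regions(methylation, min_gap=0.5):
    """Return (region, tissue, direction) for every tissue whose methylation
    fraction differs from all other tissues in that region by at least min_gap."""
    markers = []
    for region, by_tissue in methylation.items():
        for tissue, level in by_tissue.items():
            others = [v for t, v in by_tissue.items() if t != tissue]
            if level <= min(others) - min_gap:
                markers.append((region, tissue, "hypomethylated"))
            elif level >= max(others) + min_gap:
                markers.append((region, tissue, "hypermethylated"))
    return markers


# Invented methylation fractions (0 = unmethylated, 1 = fully methylated).
toy_methylation = {
    "NEUROD1": {"brain": 0.10, "blood": 0.85, "liver": 0.80},
    "RUNX1": {"brain": 0.75, "blood": 0.15, "liver": 0.80},
    "shared_region": {"brain": 0.05, "blood": 0.08, "liver": 0.06},
}

markers = tissue_specific_regions(toy_methylation)
for region, tissue, direction in markers:
    print(region, tissue, direction)
```

Regions with similar methylation in every tissue, such as the hypothetical shared_region, carry no information about cell identity and are not reported.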

Methods of methylation analysis

Numerous methodologies exist for analyzing methylation patterns (Fig. 1). Bisulfite conversion-based methods are considered the gold standard for DNA methylation analysis. In bisulfite conversion-based techniques, input DNA is treated with sodium bisulfite. In response, unmethylated cytosine residues are deaminated to uracil and subsequently to thymine through DNA amplification; meanwhile, methylated cytosine residues remain unchanged in the presence of sodium bisulfite.41 The most thorough and informative bisulfite conversion method is whole genome bisulfite sequencing (WGBS), but this method is costly. To overcome this problem, variations of bisulfite sequencing have been developed, including reduced-representation bisulfite sequencing (RRBS)42 and targeted methylation sequencing.43 In RRBS, DNA is digested using a restriction enzyme. DNA fragments falling within a desired size range are then selected for bisulfite conversion. RRBS has the benefits of lower cost relative to WGBS, the ability to examine DNA fragments as small as 40 bp in length, and a very low requirement for input DNA (10–300 ng) while still maintaining sensitivity with single-nucleotide-level resolution. However, only CpG-rich regions are analyzed in this method, resulting in relatively low coverage of intergenic regions.42,44,45 Targeted methylation sequencing applies bisulfite sequencing to regions of interest (e.g., CpG sites with differential methylation patterns known to be implicated in disease) that are amplified using polymerase chain reaction with site-specific primers and probe hybridization to pull down DNA fragments containing CpG sites.43,45 Shortcomings of bisulfite conversion methods include the fact that sodium bisulfite treatment results in degradation of 84–96% of input DNA,46 and that, at present, bisulfite conversion-based methods cannot be performed with the rapidity needed for clinical use.
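
The read-out logic of bisulfite conversion described above can be sketched in a few lines of Python. This is a simplified illustration only: it assumes complete conversion, considers a single strand, and uses a made-up reference sequence and set of methylated positions; the function names are hypothetical and do not correspond to any established pipeline.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment followed by amplification on one strand.

    Unmethylated C is read as T; methylated C (positions in `methylated`)
    is read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted):
    """Map each reference cytosine position to True (methylated) or False."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted)):
        if ref_base == "C":
            calls[i] = read_base == "C"
    return calls


# Toy input (assumption): cytosines at positions 1 and 9 are methylated.
reference = "ACGTCCGATCGA"
methylated = {1, 9}

converted = bisulfite_convert(reference, methylated)
calls = call_methylation(reference, converted)

print(converted)
print(calls)
```

Comparing the converted read with the untreated reference recovers methylation status at single-nucleotide resolution, which is the property that makes bisulfite-based methods the reference standard.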

Fig. 1: Methodologies of methylation analysis.

Techniques, advantages, and disadvantages of common methylation analysis methods.

Additional methodologies include restriction enzyme-based methods and enrichment-based methods. Restriction enzyme-based methods use methylation restriction enzymes (MRE) to cleave DNA at specific sites, with methylation-sensitive enzymes cleaving only at unmethylated sites and methylation-insensitive enzymes cleaving regardless of methylation status. These DNA fragments are then sequenced using “MRE-seq.” This method allows for the evaluation of methylated versus unmethylated DNA, with current restriction enzyme-based analyses allowing coverage of up to 98.5% of CpG islands.45,47 Enrichment-based methods use either anti-methylcytosine or methyl-CpG binding proteins to preferentially pull down methylated DNA regions while excluding unmethylated regions. Relative to bisulfite conversion methods, these enrichment-based processes have lower cost, similar sensitivity, and better specificity, as well as lower input DNA requirements. However, in comparison to WGBS, these processes have lower resolution and bias in favor of hypermethylated regions.45,48,49 Restriction enzyme-based methods and enrichment-based methods obviate the need for sodium bisulfite treatment and avoid the associated DNA degradation.

While methylation analysis can help identify cell-type specificity, and methylation signatures can suggest potential patterns of gene silencing and transcription, neither directly assesses chromatin accessibility. Some studies have coupled methylation analysis with more direct methods of assessing chromatin configuration, such as chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) to identify binding sites of DNA-associated proteins.50,51 This combination of data has informed the identification of differential gene expression and mechanisms of disease in various pathologies, as has been widely demonstrated in The Cancer Genome Atlas (TCGA).

Cell-free DNA

Cell-free DNA as an inflammatory marker

cfDNA is composed of small DNA fragments averaging 150 bp in size10 found in plasma and urine. cfDNA principally arises with cellular death, including the processes of necrosis, apoptosis, and NETosis.10,11 These NETs are extracellular structures consisting of cytosolic and granule proteins assembled on decondensed chromatin and released from neutrophils in response to oxidative stress, presence of immune complexes, or invasive pathogens.52,53 NETs have significant implications in immune and inflammatory responses as they modulate inflammatory cells, regulate cytokines, and kill pathogens.14,52,53

Beyond its contribution as the DNA component of NETs, cfDNA is independently pro-inflammatory. It is speculated that this may be partly due to the high levels of 8-oxo-2’-deoxyguanosine (8-oxo-dG) DNA, a product of DNA oxidation present within cfDNA13,54 which is attributable to the relatively higher prevalence of guanosine-rich sequences compared to nuclear DNA.12 8-oxo-dG DNA stimulates increased cytokine production, which enhances the inflammatory response.12,13 In this way, cfDNA putatively participates in a positive feedback loop in the inflammatory pathway.

cfDNA molecules arising from cells involved in the inflammatory pathway and from tissues highly susceptible to injury and turnover are among the largest constituents of the cfDNA pool. cfDNA from white blood cells (predominantly neutrophils) comprises the majority of the total cfDNA pool, with the predominant solid tissue contributions coming from vascular endothelium, hepatocytes, and, in pregnant individuals, placental cells, as determined by methylation analysis.55,56 As a result, elevated cfDNA levels are seen in adult and pediatric patients with diseases characterized by increased inflammation and tissue damage (e.g., cancer, autoimmune conditions, sepsis).57,58,59,60 The short half-life of cfDNA (2.6–5.6 h)61,62 gives it potential as a real-time biomarker.
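
To make the real-time claim concrete, the reported half-life range can be turned into a rough clearance estimate. The Python sketch below assumes simple first-order (exponential) clearance of a single pulse of cfDNA with no ongoing release; that kinetic model is an assumption made for illustration and is not stated in the cited studies.

```python
def fraction_remaining(hours, half_life_hours):
    """Fraction of a cfDNA pulse left after `hours`, assuming first-order clearance."""
    return 0.5 ** (hours / half_life_hours)


# Half-life bounds quoted in the text (2.6-5.6 h).
for half_life in (2.6, 5.6):
    remaining = fraction_remaining(24, half_life)
    print(f"half-life {half_life} h: {remaining:.2%} of the pulse left after 24 h")
```

Under this assumption, less than 1% of a pulse would remain after 24 h at the shorter half-life and roughly 5% at the longer one, so a persistently elevated level would point to ongoing release rather than a past event.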

Cell-free DNA as a clinical biomarker

As discussed, cfDNA is a marker of cell death, tissue injury, and inflammation. Several lines of evidence correlate cfDNA levels with various disease states. For example, in animal and human models of infection, increased cfDNA levels are associated with mortality in intensive care unit patients with sepsis.15,63,64 In the recent and ongoing severe acute respiratory syndrome coronavirus 2 pandemic, patients with higher circulating mitochondrial cfDNA levels have been shown to be at higher risk of mortality, intensive care unit admission, intubation, need for vasopressor support, and need for renal replacement therapy.65 Though correlations between cfDNA and sepsis have been more broadly examined in adult populations, there has been increasing interest in examining cfDNA levels as a marker of infection in pediatric patients, including neonates, with both preterm animal models and human preterm neonates demonstrating elevations in cfDNA in the setting of late-onset sepsis.66 Similarly, an association has been noted between transient inflammatory processes and elevations in cfDNA.15 Given that cfDNA arises secondary to the inflammatory response and subsequently potentiates the activity of inflammatory cytokines,54 persistent elevations in cfDNA may serve as a biomarker for the identification of hyperinflammatory syndromes and may be used in autoimmune conditions such as systemic lupus erythematosus and rheumatoid arthritis to monitor disease activity and response to treatment.57,59,60

In patients with cancer, cfDNA from the tumor itself as well as total cfDNA may serve as biomarkers. Studies show that in pediatric patients with cancers such as lymphoma and neuroblastoma, total cfDNA is increased, likely due to apoptosis or necrosis of healthy tissue in addition to cellular turnover in the tumor. In addition, many new technologies exploit the DNA mutational load of tumors to identify tumor-specific cfDNA (ctDNA).67,68 Thus, ctDNA can be used to stage cancer, evaluate response to therapy, and identify the presence of relapse. Researchers recently showed that they could readily identify copy number variation and translocations in childhood sarcomas and that the levels of ctDNA correlated with metastasis, clinical response to treatment, and early relapse.68,69 In this way, cfDNA may serve as a “liquid biopsy” in the field of oncology.63,70,71

Similar to ctDNA, cfDNA can be used for surveillance in solid organ transplant recipients by determining the amount of donor-derived cfDNA (ddcfDNA) (i.e., cfDNA carrying the genotype of the allograft donor rather than that of the recipient) relative to the total cfDNA pool. The use of ddcfDNA has been explored most prominently in renal transplant, which has multiple lines of evidence supporting elevated ddcfDNA (>0.5–1% of the total circulating cfDNA) in the detection of clinical and subclinical acute rejection.72,73,74,75 In pediatric solid organ transplant including heart, liver, and kidney transplant, ddcfDNA may also be a sensitive marker for rejection. With time, using this marker will likely result in a decreased need for surveillance allograft biopsies and lead to increased selectivity in deciding which patients require for-cause allograft biopsies.76,77,78 Clinically available testing currently exists and modifications to current methods allow for exploiting single-nucleotide polymorphisms to identify ddcfDNA without knowing donor genotype.79 Despite the ability to recognize the source of cfDNA in the case of cancer or transplant rejection, the clinical applications of cfDNA are still limited. In cancer, heterogeneity in the tumor cells contributing to the cfDNA may lead to difficulties with interpretation and significant variability among patients. In transplant, ddcfDNA may be increased in other inflammatory states such as infection or acute tissue injury.75,80
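
The donor-derived fraction discussed above reduces to simple arithmetic in the most straightforward case. The Python sketch below assumes that donor and recipient genotypes are both known and that only single-nucleotide polymorphisms at which the two are homozygous for different alleles are counted; the read counts are invented, the function name is hypothetical, and the genotype-free approaches mentioned in the text are not implemented here.

```python
def donor_fraction(snp_counts):
    """Estimate the donor-derived cfDNA fraction.

    snp_counts: list of (donor_allele_reads, recipient_allele_reads) pairs at
    SNPs where donor and recipient are homozygous for different alleles.
    """
    donor_reads = sum(d for d, _ in snp_counts)
    total_reads = sum(d + r for d, r in snp_counts)
    if total_reads == 0:
        raise ValueError("no reads at informative SNPs")
    return donor_reads / total_reads


# Invented read counts at three informative SNPs.
counts = [(12, 988), (9, 1191), (15, 1285)]
fraction = donor_fraction(counts)

# The text cites thresholds in the range of 0.5-1%; 1% is used here as an example.
threshold = 0.01
print(f"ddcfDNA fraction: {fraction:.2%}, above threshold: {fraction > threshold}")
```

Because the reported thresholds span a range, the cutoff is left as a parameter rather than fixed in the function.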

Perhaps the main limitation to clinical application of cfDNA is its lack of specificity. Outside of solid organ transplant with the use of ddcfDNA, there are relatively few disease states in which genotyping can be exploited to identify the source of cfDNA. However, if cell-specific methylation signatures are coupled with cfDNA analyses, these limitations are overcome, opening new areas of clinical application.

Cell-free DNA, the methylome, and their combined utility in pediatric medicine

As mentioned previously, using cfDNA as a clinical biomarker for organ or cellular injury is limited due to an overall lack of specificity, i.e., elevated total cfDNA may generally suggest tissue damage but does not indicate the tissue type experiencing injury. Using a process termed DNA methylation deconvolution, the cellular source of circulating cfDNA can be determined, opening possibilities for several potential future uses in clinical medicine (Fig. 2). Through use of resources such as TCGA and ENCODE, datasets obtained from the Gene Expression Omnibus, and through efforts of individual groups, cell type-specific methylation signatures have been cataloged into reference methylomes.55,56,81,82 Furthermore, this has allowed the characterization of proportional contributions of various cell types to the total cfDNA pool in healthy individuals, with the main contributors to the cfDNA pool being hematopoietic cells (32% granulocytes, 30% erythrocyte precursors, 10% monocytes, 12% lymphocytes), and the predominant solid tissue sources being vascular endothelial cells and hepatocytes.56 These cell-specific reference methylomes can aid in disease diagnosis, and deviations from the reference proportional contributions of cell types to the total cfDNA pool can suggest a specific tissue implicated in a disease process. For example, Hao et al. were able to identify cancer types in tissues of colon, breast, and lung cancer patients with ≥95% accuracy through examination of DMRs and methylation deconvolution.83 Similarly, Moss et al. used methylome analysis of cfDNA to demonstrate that in various solid cancers, the tumor tissue of origin made up the largest portion of the total cfDNA pool, which was increased in colon, breast, and liver cancer patients.56 In pediatric populations, Van Paemel et al. demonstrated correct identification of 81% of samples from pediatric patients with extracranial tumors through the use of cfDNA methylation deconvolution, with diagnostic accuracy limited by the level of sample contamination with high molecular weight DNA (i.e., non-cfDNA).84 Though this method cannot yet serve as a standalone diagnostic test, these results suggest that cfDNA methylation patterns coupled with biopsy information can more accurately establish diagnoses and allow more tailored treatments in pediatric cancers. Other applications of methylation analysis used in conjunction with cfDNA include assessing response to treatment in the form of decreasing tumor cfDNA in response to therapy,56 identifying primary tumor type in metastatic disease, and prognosticating and stratifying risk (e.g., high-risk versus low-risk) by differential methylation patterns of tumor-specific cfDNA.83 Finally, examining methylation patterns of cfDNA allows the identification of tumor-specific cfDNA that indicates the presence of residual cancer cells following surgical resection and allows monitoring for early cancer recurrence.85,86 Beyond identifying cancer presence and type by methylation signature, research is actively underway examining methylation patterns that contribute to intratumor heterogeneity, that is, genomic and epigenomic heterogeneity within a tumor that allows cancer to evolve and become more aggressive.87
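
Methylation deconvolution can be posed as a constrained fitting problem: the methylation level observed at each marker in a cfDNA sample is modeled as a mixture of the reference levels of the contributing cell types, and the mixing proportions are estimated subject to being non-negative. The Python sketch below shows one plausible formulation, a non-negative least-squares fit solved by projected gradient descent. The three-cell-type reference matrix and the simulated sample are invented, and the choice of algorithm is an assumption; it is not presented as the method used in the cited studies.

```python
def deconvolve(reference, observed, iterations=500):
    """Estimate non-negative cell-type proportions by projected gradient descent.

    reference[m][c]: methylation fraction of marker m in cell type c.
    observed[m]:     methylation fraction of marker m in the cfDNA sample.
    """
    n_markers, n_types = len(reference), len(reference[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe step size for the least-squares objective.
    step = 1.0 / sum(v * v for row in reference for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [
            sum(reference[m][c] * weights[c] for c in range(n_types)) - observed[m]
            for m in range(n_markers)
        ]
        gradient = [
            sum(reference[m][c] * residual[m] for m in range(n_markers))
            for c in range(n_types)
        ]
        # Gradient step, then clip at zero to keep proportions non-negative.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all estimated proportions are zero")
    return [w / total for w in weights]


# Invented reference methylome: 4 marker regions x 3 cell types.
cell_types = ["granulocyte", "hepatocyte", "endothelium"]
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.5, 0.5, 0.1],
]

# Simulate a cfDNA sample from a known mixture, then try to recover it.
true_mix = [0.7, 0.2, 0.1]
observed = [sum(r * t for r, t in zip(row, true_mix)) for row in reference]

estimate = deconvolve(reference, observed)
for name, share in zip(cell_types, estimate):
    print(f"{name}: {share:.3f}")
```

In this noise-free toy case the estimate recovers the simulated mixture. In practice many more markers would be used, measurement noise would make the fit approximate, and an established solver (for example, a library non-negative least-squares routine) would normally replace the hand-written loop.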

Fig. 2: Clinical applications of cfDNA and the methylome.

Current and potential uses of cfDNA and the methylome as a real-time biomarker.

One of the most established uses of methylation patterns of cfDNA is in maternal-fetal medicine. In 2005, Chim et al. discovered the differential methylation of the maspin gene promoter between placental tissues (hypomethylated) and maternal blood cells (hypermethylated).88 This exploitation of DMRs drove the development of cell-free fetal DNA (cffDNA) as a fetal genetic diagnostic test. cffDNA was initially utilized as a putative diagnostic tool for the identification of trisomy 21 through evaluation of an elevated placental “chromosome-dosage” of the chromosome 21 marker, HLCS, which is preferentially hypermethylated in placental cells and hypomethylated in maternal blood cells.89 This approach can similarly be applied to placental disorders, as has been shown with elevated cffDNA (identified through hypermethylation of RASSF1A) as a predictor for the development of pre-eclampsia.90 Methylation patterns for cffDNA likely will lead to further uses in the prediction of other pregnancy outcomes such as preterm birth and small for gestational age. Changes in these patterns may also provide insight into how environmental exposures such as smoking during pregnancy affect the placenta.

Possibilities for use of differential methylation in cfDNA in other areas of pediatric medicine are just starting to be explored. Cardiac-specific cfDNA has been shown to correlate with troponin levels in children with congenital heart disease pre- and post-transplant and may serve as a biomarker in other cardiac pathologies.91 Ye et al. used cfDNA methylation patterns from cerebrospinal fluid (CSF) to establish that somatic mutations detected in the CSF originate in the brain and may explain focal epilepsy.92 These studies highlight the potential for methylation patterns of cfDNA to be used in a broad range of pediatric diseases.

Conclusion

As the fields of genomics and methylomics progress, greater insights into disease-specific methylation patterns will emerge. With this increased knowledge, the cfDNA methylome will allow the identification of tissue-specific cfDNA, leading to the broad use of this technique as a noninvasive diagnostic and monitoring tool throughout the medical field. In pediatric medicine, methylation patterns of cfDNA are being used to diagnose pediatric cancers more accurately. With this advancement will undoubtedly come the ability to target therapies more precisely. As this science continues to advance, the use of cfDNA in monitoring cancer recurrence will likely promote earlier interventions and improve outcomes. Its utility as a “liquid biopsy” will decrease the need for invasive testing and high-risk sedated diagnostic procedures in the fields of pediatric oncology and in surveillance of allograft health in solid organ transplant. Finally, given its short half-life, cfDNA has the potential to be used as a real-time marker of end-organ damage rather than relying on less sensitive and delayed biomarkers, such as serum creatinine in the setting of kidney injury. When used in conjunction with methylome analysis, cfDNA has exciting potential to become a powerful tool in medical diagnostics and disease monitoring.