Prognostic DNA methylation markers for sporadic colorectal cancer: a systematic review

Biomarkers that can predict the prognosis of colorectal cancer (CRC) patients and that can stratify high-risk early stage patients from low-risk early stage patients are urgently needed for better management of CRC. During the last decades, a large variety of prognostic DNA methylation markers has been published in the literature. However, to date, none of these markers are used in clinical practice. To obtain an overview of the number of published prognostic methylation markers for CRC, the number of markers that were validated independently, and the current level of evidence (LoE), we conducted a systematic review of PubMed, EMBASE, and MEDLINE. In addition, we scored studies based on the REMARK guidelines that were established in order to attain more transparency and complete reporting of prognostic biomarker studies. Eighty-three studies reporting on 123 methylation markers fulfilled the study entry criteria and were scored according to REMARK. Sixty-three studies investigated single methylation markers, whereas 20 studies reported combinations of methylation markers. We observed substantial variation regarding the reporting of sample sizes and patient characteristics, statistical analyses, and methodology. The median (range) REMARK score for the studies was 10.7 points (4.5 to 17.5) out of a maximum of 20 possible points. The median REMARK score was lower in studies that reported a p value below 0.05 than in those that did not (p = 0.005). A borderline statistically significant association was observed between the reported p value of the survival analysis and the size of the study population (p = 0.051). Only 23 out of 123 markers (17%) were investigated in two or more study series. For 12 markers and two multimarker panels, consistent results were reported in two or more study series. For four markers, the current LoE is level II; for all other markers, the LoE is lower.
This systematic review reflects that adequate reporting according to REMARK and validation of prognostic methylation markers is absent in the majority of CRC methylation marker studies. However, this systematic review provides a comprehensive overview of published prognostic methylation markers for CRC and highlights the most promising markers that have been published in the last two decades.


Background
Colorectal cancer (CRC) is the third most common form of cancer and accounts for more than 500,000 deaths worldwide each year [1]. Overall, the prognosis of CRC patients is poor, with about half of all diagnosed patients dying as a result of recurrence, metastasized disease, or co-morbidities [2]. CRC often develops without symptoms until it has reached an advanced stage. Prognostic markers that can predict the prognosis of CRC patients and that can stratify high-risk early stage patients from low-risk early stage patients are urgently needed for better management of CRC. Evidence-based results regarding prognostic markers are therefore essential for better patient management.
CRC patient survival is highly dependent on the tumor stage at the time of diagnosis. Therefore, the tumor-node-metastasis (TNM) staging system is the gold standard to determine the prognosis of a CRC patient [3,4]. In addition, clinical markers, such as poor tumor differentiation, vascular, and/or perineural invasion, as well as molecular markers, such as microsatellite instability (MSI) status and KRAS or BRAF mutation status, can be used [5,6]. Over the years, a vast number of epigenetic biomarkers have been identified and described as promising cancer biomarkers in the scientific literature [7][8][9][10]. However, to date, only a few biomarkers have been validated for clinical use [11,12]. CRC, in particular, has often been the topic of epigenetic biomarker research, leading to the identification of methylation markers for early detection of CRC, prediction of prognosis, and/or treatment response [8,13-16]. At the moment, some methylation markers for early detection of CRC (such as SEPT9, NDRG4, and BMP3) have been incorporated in the FDA-approved commercial tests, Epi proColon® and Cologuard, respectively [17,18]. However, for prognostic or predictive purposes, no methylation marker for colon and/or rectal cancer has made the translation to a clinically applicable biomarker. The reasons for the lack of translation of biomarkers, prognostic or otherwise, into clinical practice have been recognized previously, with many different research groups providing possible explanations and/or solutions for these problems, such as poorly selected biospecimens, sample series that are not clinically relevant, and underpowered sample series, as well as lack of validation and reproducibility of the biomarker assay [19][20][21][22][23][24].
In 2005, the REporting recommendations for tumor MARKer prognostic studies (REMARK) guidelines were published in an effort to improve the reporting of biomarker studies and subsequently increase the number of prognostic biomarkers that can be used in clinical practice [25]. Nevertheless, there is evidence that adherence to REMARK is still suboptimal [26,27].
A comprehensive overview of potentially promising prognostic epigenetic biomarkers for CRC is lacking. Furthermore, the current amount of available information leads to more confusion instead of helping to answer the question of which biomarker should be further developed for translation. Here, we provide a comprehensive overview of the currently available evidence on prognostic DNA methylation markers for CRC and review the quality of these studies using the REMARK guidelines as a tool.

Search strategy and study eligibility
A literature review was performed covering English language articles in PubMed, EMBASE, and MEDLINE until May 2017 using the following search terms: DNA methylation, biomarker, cancer, colon, colorectum, colorectal, survival, patients outcome, prognosis (Additional file 1). Published studies were eligible to be included in our analysis if colon, rectal, or colorectal cancer patient prognosis was analyzed stratifying patients by methylation status of the marker. Only original articles (no reviews, editorials, conference abstracts, etc.) were considered (Fig. 1). Studies were included if overall survival (OS), disease-specific survival (DSS), disease-free survival (DFS), recurrence-free survival (RFS), or any other endpoint were reported and if results were presented as Kaplan-Meier plots, relative risks, or hazard ratios (HRs) with corresponding 95% confidence intervals (95% CI). We did not restrict our search to specific patient characteristics (such as age group, sex, ethnicity, and tumor type). Studies were excluded if the tumor was hereditary; if prognosis was not analyzed by one of the abovementioned methods; if they focused on the prognostic influence of the CpG island methylator phenotype (CIMP) and microsatellite instability (including MINT loci), as this has been the topic of multiple systematic reviews by our and other research groups [28][29][30][31][32]; or if they addressed methylated miRNAs and LINE-1, as our focus in this review was directed to CpG islands of protein-coding genes.

Data extraction
Data extraction was performed by two independent researchers (MD and DG) using a standardized data registration form in which the following items were recorded: marker of study, sample size, cancer type (colon, rectum, or CRC), sample type (primary tissue, serum, mucosa, blood, lymph node tissue, peritoneal lavage, or stool), stage (tumor-node-metastasis (TNM) staging, according to editions mentioned in original paper, or Dukes' staging), study design, year of collection of samples, number of patients in survival analyses, endpoints, subgroup analysis, p value, and hazard ratio (HR) with corresponding confidence interval. This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, where applicable [33].

Quality assessment
Eligible studies were scored (MD and DG) based on the REMARK criteria [34], which summarizes 20 items for good reporting of a prognostic biomarker study (Additional file 2). In case of complete reporting according to the guidelines, a study was given 1 point, in case of incomplete reporting, a study was given 0.5 points, and in case of lack of reporting any aspect of the guideline item, a study was given 0 points. The maximum score was 20 points (all items adequately reported). Interobserver variation of scores was solved by mutual consensus. The risk of potential bias and confounders was analyzed per study using the information obtained with the REMARK scores. If a study obtained ≥ 1.5 points for REMARK criterion #2 ("patient characteristics") and #6 ("sample selection and follow-up"), the risk of selection bias was low. In case of less than 1.5 points, the risk of potential selection bias is increased. Bias regarding the assay method (measurement bias) was assessed similarly using REMARK criteria #5 ("assay method") and #11 ("handling of marker values"). The risk of bias regarding outcome assessment (measurement bias) was scored based on REMARK criterion #7 ("clinical endpoint definition"). In case of complete reporting (score = 1), the risk of bias is low, as compared to partial or lack of reporting, which increases a potential risk of bias. The presence of potential confounding factors was assessed using REMARK criterion #16 ("multivariable analysis"). In case of 1 point, the risk of confounding factors is lower as compared to studies that did not perform or report a multivariable analysis. In order to investigate whether the REMARK score or the total number of patients included in the survival analysis correlates with the reported significance of the marker (p value), we performed a regression analysis and determined the Pearson's correlation coefficient (r). 
We compared the REMARK scores of studies reporting a significant finding versus studies reporting a non-significant finding using a Mann-Whitney test. We used the statistical programming language R (version 3.3.1) to perform all analyses.
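The scoring and risk-of-bias rules described above can be illustrated with a minimal Python sketch. It is not the authors' code (the analyses in this review were run in R), and the function names and the example study are hypothetical. The text states the 1.5-point threshold for selection bias only; applying the same threshold to the assay-method criteria (#5 and #11) is an assumption based on the statement that this bias was "assessed similarly".

```python
from typing import Dict

ALLOWED_SCORES = {0.0, 0.5, 1.0}  # not reported, partially reported, fully reported


def remark_total(item_scores: Dict[int, float]) -> float:
    """Sum the per-item scores of the 20 REMARK items (maximum 20 points)."""
    if set(item_scores) != set(range(1, 21)):
        raise ValueError("expected one score for each of the items 1..20")
    if any(score not in ALLOWED_SCORES for score in item_scores.values()):
        raise ValueError("each item must be scored 0, 0.5, or 1")
    return sum(item_scores.values())


def risk_flags(item_scores: Dict[int, float]) -> Dict[str, str]:
    """Derive risk-of-bias labels from selected REMARK items."""
    selection = item_scores[2] + item_scores[6]   # patient characteristics, sample selection
    assay = item_scores[5] + item_scores[11]      # assay method, handling of marker values
    return {
        "selection_bias": "low" if selection >= 1.5 else "increased",
        # Assumption: same 1.5-point threshold as for selection bias.
        "assay_measurement_bias": "low" if assay >= 1.5 else "increased",
        "outcome_measurement_bias": "low" if item_scores[7] == 1.0 else "increased",
        "confounding": "lower" if item_scores[16] == 1.0 else "higher",
    }


# Hypothetical study: partial reporting everywhere, with a few exceptions.
example = {item: 0.5 for item in range(1, 21)}
example.update({1: 1.0, 2: 1.0, 7: 1.0, 11: 0.0})

total = remark_total(example)
flags = risk_flags(example)
print(total, flags)
```

Keeping the per-item scores in a dictionary keyed by REMARK item number makes the bias rules read the same way as the criteria are numbered in the text.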

Forest plots
We prepared forest plots for methylation markers that were investigated in two or more study series. If available, we reported HRs for overall and subgroup analysis, such as single TNM stages or mutation status. Univariate HRs were used, unless multivariate HRs were available. If multiple HRs were available, the most adjusted HR was depicted in the plot. In order to give a complete overview, p values are depicted in the forest plot, if only Kaplan-Meier results were available. We used the statistical programming language R (version 3.3.1) to perform all analyses and generate the figures.
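As a hedged illustration of how a reported HR and its 95% CI translate into a forest plot entry, the Python sketch below recovers the log hazard ratio and its standard error from the interval bounds and chooses a line style according to whether the interval excludes 1 (solid for a statistically significant association and dotted otherwise, following the convention of Fig. 4). The function names are hypothetical, and the sketch assumes a symmetric interval on the log scale with z = 1.96; it is not the R code used to generate the figures. The two input estimates are HRs quoted in the results of this review.

```python
import math


def log_hr_and_se(hr: float, lower: float, upper: float, z: float = 1.96):
    """Return (ln HR, standard error of ln HR) recovered from a reported CI."""
    if not (0 < lower <= hr <= upper):
        raise ValueError("expected 0 < lower <= hr <= upper")
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.log(hr), se


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True if the confidence interval does not contain HR = 1."""
    return lower > 1.0 or upper < 1.0


def line_style(lower: float, upper: float) -> str:
    """Solid line for significant associations, dotted otherwise."""
    return "solid" if ci_excludes_one(lower, upper) else "dotted"


# HR 2.58 (95% CI 1.37-4.87) and HR 0.94 (95% CI 0.58-1.55), as quoted in the results.
log_hr, se = log_hr_and_se(2.58, 1.37, 4.87)
style_significant = line_style(1.37, 4.87)
style_not_significant = line_style(0.58, 1.55)
print(round(log_hr, 3), round(se, 3), style_significant, style_not_significant)
```

Working on the log scale is the usual choice for ratio measures because the interval is approximately symmetric around ln HR there, which is also why forest plots of HRs are normally drawn with a logarithmic axis.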

Level of evidence
The level of evidence (LoE) can be determined using an evidence-ranking scheme such as GRADE [35] or the OCEBM levels [36], in which level I represents definitive evidence, level IV represents (very) weak evidence, and the remaining levels a degree in between. Even though these rankings do not provide a definitive judgment on the quality of the provided evidence, they do offer a valuable indication. To give an overview of the current evidence on prognostic epigenetic biomarkers in CRC, we assigned a LoE to each marker, or marker panel, investigated in two or more independent study series, using a ranking scheme adapted for biomarkers [37] and the OCEBM schemes [36].

Fig. 1 Flowchart of the study identification process. A total of 83 studies were selected for qualitative assessment

Eligible studies
We initially identified 2063 studies for potential inclusion using our search strategy (Additional file 1). We excluded 1869 studies, mainly because they were either not original studies or not relevant to prognosis or colorectal cancer, colon cancer, or rectal cancer. We checked full-text articles of the remaining 194 studies, of which 66 were excluded because prognosis was not pertinent to methylation, prognosis was associated with CIMP, methylation of non-protein-coding genes was investigated (miRNAs, LINE-1, MINT loci), or only in silico data were described (Fig. 1). A total of 83 studies were included in this systematic review.

Study characteristics
Study characteristics are summarized in Additional file 3. Sixty-three (76%) studies reported results from single methylation markers, and 20 (24%) studies included multiple markers. In total, 123 different methylation markers were investigated (Additional file 3). Studies were published between 1999 and 2017. Median (range) sample size was 127 patients (30 to 1105 patients). Seventy-four (89%) studies investigated CRC, six (7%) studies investigated colon cancer only, and three (4%) studies investigated rectal cancer only. Sixty-eight (82%) studies used formalin-fixed or fresh-frozen primary tissue for biomarker analyses, 10 (12%) studies used blood (serum or plasma), two (3%) studies used normal mucosa, one study (1%) used peritoneal lavage fluid, and in one study (1%), the tissue used in the analyses was not specified. Thirteen (16%) studies included patients with the same single TNM (or Dukes) stage, and 68 (82%) studies included patients with two or more disease stages. For two (3%) studies, the TNM (or Dukes) stage of the included patients was not specified. Eighty-two studies (99%) used Cox proportional hazard analyses, Kaplan-Meier plots, or both to assess the relation with overall, recurrence-free, or disease-specific survival. For one study (1%), the statistical method was not described.

Quality assessment
We evaluated studies according to the REMARK checklist and assigned a score between 0 and 20 to each study (Additional file 4). The scores ranged from 4.5 points to 17.5 points with a median score of 10.7 (Fig. 2a). Among the 83 studies that were scored according to the REMARK criteria, we observed a large variation in the amount of information given for the specific criteria. For only two criteria (#1 "state marker, objectives, and hypotheses" and #5 "specify assay details"), complete or partial information was given for all studies. For the other criteria, these numbers ranged from 10% (#11 "specify marker values in analyses and discuss cutoff points") to 99% (#19 "interpretation of results and study limitations") (Fig. 2b). Full quality scores could only be given for one REMARK criterion (#1 "state marker, objectives, and hypotheses") for the majority of the studies (98%). For all other criteria, the percentage of studies obtaining full quality scores ranged from 1% (#11 "specify marker values in analyses and discuss cutoff points") to 70% (#4 "describe biological material") (Fig. 2b). Almost none of the studies sufficiently addressed how marker values were handled in the analyses or presented cutoffs (10%). Fewer than half of the studies provided complete or partial information on candidate markers initially considered for the study (28%), reported a rationale for sample size (28%), reported estimated effects with corresponding confidence intervals of the marker and other prognostic variables in the analyses (47%), or reported further investigations such as checking assumptions of proportional hazards (31%). A complete overview of the REMARK scores for the different studies per REMARK criterion is presented in Additional file 4. The risk of bias of each included study is summarized in Additional file 5.
Since it is more likely that studies reporting statistically significant results get published [38,39], we were interested in whether there is an association between REMARK score and reported p value and whether inadequately reported studies tend to report significant results more frequently. Although REMARK scores varied between the individual studies, p values < 0.05 for the association between the methylation marker and prognosis were more often reported in studies with lower REMARK scores, as compared to studies with average to high REMARK scores (Fig. 3a, p = 0.005), although this was not seen in the Pearson's correlation coefficient (Fig. 3b; r = 0.0543, p = 0.499). Whereas almost half (46%) of the 83 selected studies exclusively reported significant results, 24% of all studies reported nonsignificant methylation marker results and 30% described statistically significant as well as nonsignificant results. Often, methylation markers were reported in small study populations (median n = 127.5), increasing the possibility that reported prognostic effects cannot be validated in other study populations. Therefore, we were interested in whether there is an association between the reported p value and the number of patients included in the survival analysis. A borderline statistically significant correlation was observed between the reported p value of the survival analysis and the size of the study population (n) (Fig. 3c; p = 0.051).
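The two comparisons reported here rest on a rank-based two-group test (Mann-Whitney) and on Pearson's correlation coefficient. The Python sketch below computes only the underlying statistics, the U statistic and r, without p values, using the standard library. The function names and the input values are hypothetical and serve purely as illustration; they are not data from the included studies, and the review itself used R for these analyses.

```python
import math
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x: number of pairs with x_i > y_j, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical REMARK scores (illustrative values only).
significant = [8.0, 9.5, 10.0, 11.0]
non_significant = [10.5, 12.0, 13.5]

u_sig = mann_whitney_u(significant, non_significant)
u_non = mann_whitney_u(non_significant, significant)
r_check = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(u_sig, u_non, r_check)
```

A small U for one group means that its values rank mostly below those of the other group. The two U statistics always sum to the product of the group sizes, which is a convenient sanity check.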

Prognostic marker findings
Additional file 3 shows the impact of methylation markers on prognosis in the included studies. The majority of markers were investigated in a single study without any internal or external validation. As unvalidated results are at higher risk of representing chance findings, validation in at least one independent population is needed to draw any conclusion, even preliminary, for these markers. Therefore, we only prepared forest plots for methylation markers that were investigated by two or more studies and/or where internal validation in an independent series was performed (i.e., IGFBP3, CDKN2A (p16), WNT5a, HPP1, RET, TFAP2E, HLTF, EVL, CD109, NRCAM, FLNC, BNIP3, MLH1, MGMT, RASSF1A, CDKN2A (p14), APC, CHFR, SEPT9, and one multimarker panel; Fig. 4).
Overall, studies assessing IGFBP3 methylation showed similar correlations with a poor prognosis in CRC patients. Yi et al. first investigated IGFBP3 hypermethylation as a prognostic marker in three different study populations [48]. Although a significant association was found with poorer OS in the two smaller populations (n = 147, n = 72; HR 2.58 95% CI 1.37-4.87, HR 2.06 95% CI 1.04-4.09, respectively), no association was found in a cohort of 558 patients (data not shown in original article). In a subgroup analysis, IGFBP3 hypermethylation was found to be a prognostic factor in three independent studies, all focusing on TNM stages II-III, even though every study used a different endpoint (RFS, DFS, and OS) [53]. The studies of Liang et al. [50], Wettergren et al. [55], Mitomi et al. [58], and Maeda et al. [52], focusing on TNM stage II or Dukes' B patients alone or in a larger subgroup of TNM stages II-IV or Dukes' B-C patients, also show statistically significant associations with a poor prognosis (p = 0.0001; HR 4.70 95% CI 1.10-19.50; HR 3.38 95% CI 1.67-6.84; p = 0.022, respectively). This could not be confirmed by the study of Cleven et al., which was conducted in TNM stage II, microsatellite stable (MSS), and BRAFwt patients [73]. A significant association with a poor prognosis was also reported for Dukes' C patients with mutated KRAS (HR 2.60 95% CI 1.20-3.50) [62], but not in another study focusing on Dukes' C patients or TNM stage III, MSS, and BRAFwt patients [73].

Fig. 2 Quality assessment of methylation marker studies. a Histogram depicting the REMARK score distribution for all studies included in the analysis (mean REMARK score = 10.711, standard deviation = 2.820). b Histogram showing the completeness of reported REMARK items
Four studies reported on the association between HPP1 hypermethylation and prognosis with conflicting results [78][79][80][81]. In TNM stages I-IV, a statistically significant association was shown with OS (HR 5.10 95% CI 2.20-11.60 and Kaplan-Meier p value < 0.0001) [78,81]. Subgroup analyses of the studies by Philipp et al. showed a statistically significant association between OS and HPP1 hypermethylation in TNM stage IV only (Kaplan-Meier p value 0.0003 and < 0.0001, respectively) [80,81]. The study by Herbst et al. only showed a … [82]. After one administration with combination chemotherapy, a multivariate analysis still predicted a poorer outcome for HPP1 methylated patients (HR 2.08 95% CI 1.54-2.80).

Fig. 3 Association between REMARK score and significance level of studies. a Box plot comparing the REMARK score between studies that reported statistically significant findings versus studies that did not report statistically significant findings (Mann-Whitney test, p = 0.005). b Dot plot showing that there was, however, no significant correlation between the REMARK score and the reported p values (Pearson's correlation coefficient = 0.0543, p value = 0.499). c Dot plot depicting that for the reported survival analyses, we found a stronger, but still not statistically significant, correlation between the number of patients used and the reported p values (Pearson's correlation coefficient = 0.1814, p value = 0.051)
Methylation of RET was studied in three independent patient series reported in one publication [107]. While there was no association with disease-specific survival in the total population of TNM stage I-IV patients, a significant association with poorer OS was found in two TNM stage II patient series (HR 2.51 95% CI 1.42-4.43; HR 1.91 95% CI 1.04-3.53) and one TNM stage III patient series (HR 2.04 95% CI 1.23-3.37).
Hyper- as well as hypomethylation of TFAP2E was investigated in three studies, of which two studies strikingly reported similar associations between hyper- and hypomethylation and prognosis. The study by Zhang …

Conflicting results were also found for BNIP3 methylation, which was studied in three different studies [47,65,73]. In overall univariate analyses, BNIP3 hypermethylation appeared to be statistically significantly associated with a poorer survival in the study by Shimizu et al. (Kaplan-Meier p value 0.012), whereas the other two studies did not report a statistically significant association with poor prognosis (HR 0.94 95% CI 0.58-1.55 and HR 2.23 95% CI 0.94-5.28). Subgroup analyses of BNIP3 methylation showed a statistically significant association with poorer OS in one of two studies (HR 3.74 95% CI 1.04-13.43) [47] but not in the other (TNM stages II and III, MSS, BRAFwt patients HR 0.86 95% CI 0.39-1.88, and HR 1.08 95% CI 0.38-3.12, respectively) [73].
Out of six studies assessing the association between MLH1 methylation and prognosis in TNM stages I-IV, three studies showed statistically significant results; however, two showed a better prognosis (HR 0.12 95% CI 0.03-0.56 …) [69], while the other reported a better OS when MLH1 is methylated (p = 0.046) [49].
Fig. 4 Forest plots of reported methylation markers in colorectal cancer studies. Forest plots were prepared for methylation markers that were reported in two or more publications or study populations. The hazard ratios (HR) are sorted according to the REMARK score. HRs with a statistically significant association are depicted with a solid line; HRs of reported markers with no significant association are depicted with a dotted line; HRs of subgroup analyses are depicted in blue. Univariate HRs and confidence intervals (CI) are reported unless multivariate HRs were available (a). As for IGFBP3 and TFAP2E, the HRs of the study of Perez-Carbonell et al. [90] and Zhang et al. [117], respectively, were both associated with worse survival. For this figure, the HR was reversed for visualization purposes (b). A multivariate HR for BNIP3 methylation was available in the study of Shimizu et al.; however, it was not statistically significant (c)

For MGMT promoter hypermethylation, four studies showed no association with prognosis in TNM stages I-IV [43,73,94,95], while the study of Nilsson et al. suggested an association between MGMT methylation and a better prognosis (HR 0.36 95% CI 0.15-0.87) [105]. Subgroup analyses in TNM stages II and III showed no association with MGMT in the study of Cleven et al. [73]. Strikingly, a study by Kuan et al. focused on recurrence and showed a strong association between recurrence and MGMT methylation in TNM stages III-IV (HR 11.83 95% CI 3.45-40.12) [69].
RASSF1A methylation was studied in five different studies; however, a statistically significant association was only found in TNM stage III patients (HR 3.89 95% CI 1.23-12.30) [73]. Four studies in TNM stages I-IV and another subgroup analysis on TNM stage II did not show any association between RASSF1A and prognosis [73,103-105]. Matthaios … [73,99]. Subgroup analyses also do not show a statistically significant influence of CDKN2A (p14) methylation on prognosis [73].
Hypermethylation of APC was reported to be positively associated with survival by Chen et al. (HR 0.426 95% CI 0.190-0.957) [43]; however, two independent studies did not confirm these results [73,105]. … [72]. In the study of Cleven et al., subgroup analyses of TNM stage II, MSS, and BRAFwt patients showed a significant association between CHFR methylation and CSS (HR 3.89 95% CI 1.58-9.60), but results were not validated in an independent patient series. Subgroup analysis of TNM stage III, MSS, and BRAFwt patients did not show a significant association with prognosis [73].
SEPT9 was assessed as a biomarker in two studies. Liu et al. did not find a statistically significant association between methylation and prognosis in TNM stage I-IV or TNM stage I-III patients [109]. The study of Tham et al. reports an association between SEPT9 methylation and worse OS in TNM stage I-III patients (HR 3.50 95% CI 1.67-7.32) [108]. Although they appeared to have assessed SEPT9 methylation as a biomarker, the study of Perez-Carbonell did not give any specific information on the outcome of this analysis [90].
For three markers (ID4, MYOD1, and SFRP2) and two marker panels (AXIN2 & DKK1 and CDKN2A & hMLH1), analyses were performed in two or more independent studies or patient populations, but reported results were too limited to construct a forest plot (Additional file 6). Methylation of ID4 was assessed by two studies. Umetani et al. reported a significant association between ID4 methylation in stage I-IV CRC patients and poorer OS (HR 1.82 95% CI 1.09-3.43) [88], but the study of Tanaka et al. did not confirm this (p = 0.118) [72]. MYOD1 was suggested as a prognostic biomarker in the study of Hiranuma et al. (HR 3.16 95% CI 1.25-8.02) [97], but this was not seen in the study of Shannon et al. (p = 0.14) [96]. Also, for SFRP2 methylation, survival data were only reported in one out of two studies [70,111]. Tang et al. observed a statistically significant association with OS in stage I-IV CRC patients (HR 3.06 95% CI 1.12-8.40) [111].
Of 83 included studies, 20 studies assessed the prognostic influence of multimarker panels. Although the same markers were included in several multimarker panels, only three panels were assessed in two or more independent studies or patient populations. For one panel consisting of MLH1 and CDKN2A, a better prognosis in TNM stage I-IV patients was reported in the study of Veganzones et al. (p = 0.04) [66]. However, in the study by Aoyagi et al., methylation of both genes was associated with worse survival in TNM stage IV patients (p = 0.03; Additional file 6) [85]. The study of Gaedcke et al. described a panel of markers (ADAP1, BARHL2, CABLES2, DOT1L, ERAS, ESRG, RNF220, ST6GALNAC5, TAF4, SLC20A2) that was associated with poorer DFS in two independent study series (HR 3.57 95% CI 1.01-12.55; HR 3.78 95% CI 1.26-11.37; Fig. 2p) [40]. Kandimalla et al. studied another panel, combining the markers AXIN2 and DKK1 (Additional file 6), in two independent TNM stage II populations (n = 65 and n = 79, respectively). In both populations, an association with poorer RFS was found (HR 3.84 95% CI 1.14-12.43; p < 0.0004, respectively) [45].

Clinical translation
For a definite conclusion on validity of a (prognostic) biomarker, a sufficient level of evidence (LoE) is needed. Methylation marker results (i.e., similar conclusions drawn in two or more independent study series) were ranked according to two established ranking schemes to obtain a comprehensive summary of the current evidence on prognostic epigenetic biomarkers in CRC [36,37]. For 12 single markers, and two multimarker panels, consistent results were reported in two or more publications or populations (Table 1). For four markers, the current LoE is level II, and for the other markers, LoE is lower. For 11 other markers and one multimarker panel, reported results are still too inconclusive to draw any conclusion on a possible prognostic biomarker effect (Table 1).

Discussion
In this review, we summarized published studies on prognostic DNA methylation markers for CRC. Although a large number of studies were identified and included in this review, the results from individual studies are difficult to compare due to the variation in study design, methodology, and survival endpoints. The number of prognostic biomarkers that were considered in multiple independent studies or patient populations is low, and promising results observed in one study are often not validated in another.
In 2005, with the publication of the REMARK guidelines, an attempt was made to improve the reporting quality of biomarker studies [25]. However, the observed variation in reporting sample series characteristics, statistical analyses, and sample sizes indicates that the REMARK guidelines are still not completely adopted and that accurate reporting of prognostic DNA methylation markers needs improvement (median REMARK score 10.7 out of 20). As we observed that studies that reported significant findings had lower REMARK scores (Fig. 3a, p = 0.005) than studies that did not, a stricter adherence to the REMARK guidelines might be helpful if we ever want to draw definitive conclusions on the role of a possible biomarker [27]. A more rigorous peer review by the scientific journals might be justified in order to achieve this [123]. However, the REMARK guidelines are open to subjective interpretation, just as the scoring of the REMARK criteria is. An inadequately reported methylation marker study does not imply that the methylation marker itself is not valuable, but it might hinder reproducibility.
The observed inconsistencies in individual study results might have various reasons, such as differences in sample collection, sample preparation, methods of DNA methylation analysis, and the genomic location of the assay [22,124-126]. The lack of standardization of different methods is a major issue in DNA methylation research. Differences in one or more technical aspects of the detection method used, including primer design, reagents, equipment, and protocols, can result in different DNA methylation measurements, even for the exact same genomic location, and can therefore have a substantial impact on the prognostic value of a test [127,128]. Therefore, even consistently reported methylation marker results from this review should be treated carefully and validated in adequately powered prospective studies. Comparing the results of methylation markers obtained with different methodology was not within the scope of this review but should be addressed in a meta-analysis of the markers with most evidence.
The same lack of standardization holds true for the statistical analysis and the choice of study endpoints [129,130]. Whereas many studies focus on overall survival, other studies use cause-specific, disease-specific, or recurrence-free survival or do not specify the endpoints that were considered. As there are no uniform definitions of these endpoints, it is difficult to compare individual study results [34].
It is generally accepted that CRC is a heterogeneous disease with diverse subgroups, both on histological and molecular level [131][132][133][134][135][136]. Analyzing all CRC patients as one group will therefore obscure the true potential of some biomarkers. The majority of studies in this review (68 studies; 82%) performed an analysis in TNM stages I-IV. To date, TNM stage is one of the most important prognostic factors in cancer, and the choice to include all TNM stages in one analysis will most likely influence the final conclusion. To overcome this, most studies also included one or more subgroup analyses, e.g., subgroups based on MSI, KRAS, or TNM stage. However, the definition of these subgroups is often very detailed or specific, thereby hampering the comparability of individual study results [38]. For example, 13 different subgroup analyses were identified for CDKN2A (p16) in this systematic review that could not be combined in a meta-analysis. In addition, different subgroups may have different baseline risks of death, thereby even further hindering comparison between different studies. Thus, on the one hand, analyzing CRC as a homogeneous group could hinder the discovery of subgroup-specific biomarkers; on the other hand, analyzing subgroups that are too specific hinders the possibility of validation or meta-research.
Although the risk of introducing selection bias when selecting patients solely based on the availability of tissue is recognized [137], most studies in this review were retrospective (67 studies; 81%) and conducted in a small number of patients (64 studies included < 200 patients; 77%). These study populations are often patient series that have been collected in research laboratories and university hospitals based on the availability of samples. This, in addition to the availability of follow-up data, often determines the size of the study population. This approach, however, does not sufficiently contribute to answering the question of whether a biomarker has prognostic value and whether it should be implemented to improve patient care. It has already been shown that prognostic effects of DNA methylation markers observed in small sample series are often chance findings that cannot be reproduced in independent series [138,139]. In this review, too, we observed that statistically significant p values tend to be reported more often in studies with a small population size (Fig. 3c). In order to increase the LoE of prognostic methylation markers, large-scale prospectively collected study populations are required. Therefore, we need a more structured approach, with collaborations between research groups, to obtain sufficient numbers of patient samples and validation populations to draw final conclusions on the prognostic relevance of a biomarker [140].
Twenty-three single markers and two multimarker panels have been investigated in two or more independent studies or patient populations. For the single markers IGFBP3, CDKN2A (p16), WNT5a, HPP1, RET, TFAP2E, HLTF, CD109, NRCAM, FLNC, and EVL and for the marker panels proposed by Gaedcke et al. (ADAP1, BARHL2, CABLES2, DOT1L, ERAS, ESRG, RNF220, ST6GALNAC5, TAF4, SLC20A2) and Kandimalla et al. (AXIN2, DKK1), the results were consistent and a statistically significant association with prognosis was observed despite differences in study design and methodology. For five biomarkers (IGFBP3, CDKN2A (p16), WNT5a, HPP1, and RET), the current LoE was II-III, indicating that a definitive conclusion on the prognostic influence of these markers is within reach. However, before such a conclusion can be drawn, either a large prospective study of clinical validity or a meta-analysis of studies with LoE II is needed; the latter will be difficult given the differences in study design and methodology. Only for WNT5a might additional studies assessing the prognostic influence be unnecessary, as the results show no prognostic influence at a current LoE of II. For the other markers (TFAP2E, HLTF, HPP1, and the multimarker panel proposed by Gaedcke et al.), the current LoE is level III or lower, indicating that even more validation is needed. To increase the LoE, these validation studies should preferably be prospectively designed and aimed at studying the biomarker effect, rather than retrospective case series, which contribute little to increasing the LoE.
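To give a rough sense of the scale that adequately powered validation studies require, the sketch below applies Schoenfeld's approximation for the number of events needed to detect a given hazard ratio for a binary marker with a two-sided log-rank test. This is a standard textbook approximation and not a calculation performed in this review; the function names, the hazard ratios, the marker prevalences, the significance level, and the power are all illustrative assumptions.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, prevalence, alpha=0.05, power=0.80):
    """Approximate number of events needed to detect `hazard_ratio` for a
    binary marker (Schoenfeld's approximation, two-sided log-rank test).

    `prevalence` is the assumed fraction of marker-positive patients.
    All inputs are hypothetical planning values, not results from the review.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    denominator = prevalence * (1 - prevalence) * log(hazard_ratio) ** 2
    return ceil((z_alpha + z_beta) ** 2 / denominator)


def required_patients(events, event_probability):
    """Convert a required number of events into a number of patients, given
    the assumed probability that a patient has an event during follow-up."""
    if not 0 < event_probability <= 1:
        raise ValueError("event_probability must lie in (0, 1]")
    return ceil(events / event_probability)


# Illustrative scenarios (assumed values only).
events_strong = required_events(hazard_ratio=2.0, prevalence=0.5)
events_modest = required_events(hazard_ratio=1.5, prevalence=0.2)
```

Under these assumed inputs, a marker present in half of the patients with a hazard ratio of 2.0 needs 66 events, whereas a marker present in 20% of patients with a hazard ratio of 1.5 needs 299 events. Because only a fraction of patients experience an event during follow-up, the number of patients needed is larger still, which illustrates why series of fewer than 200 patients are often underpowered for modest effects.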
Reporting according to guidelines such as REMARK is important but not sufficient for successful translation of prognostic DNA methylation markers to clinical practice. Promising prognostic DNA methylation markers should be evaluated in multivariable prediction models to study their added prognostic value compared to the current reference standard (TNM stage) and other novel strategies to predict CRC prognosis [141][142][143]. Few of the prognostic methylation marker studies included in this review have assessed the incremental value of their marker in addition to the gold standard TNM staging system or other suggested prognostic markers such as grade of differentiation or microsatellite instability (MSI) [144]. In addition, other potential prognostic tools, such as histologic and molecular markers [145,146], the immunoscore [147], circulating tumor DNA (ctDNA) [148], or the consensus molecular subtype (CMS) classification [149], should be taken into account in prediction models, as it is likely that a combination of several different types of markers will eventually yield the best predictive power.
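One common way to express the incremental value discussed above is to compare the discrimination of a risk score based on TNM stage alone with that of a score that also includes the methylation marker. The sketch below computes Harrell's concordance index (C-index) for both scores on a small invented cohort. The patient data, the variable names, and the unit weights of the combined score are hypothetical and chosen purely for illustration; in practice, the weights would come from a fitted multivariable model (e.g., Cox regression) and the comparison would be made in an independent validation population.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: among comparable patient pairs, the fraction in
    which the patient with the shorter survival time has the higher risk
    score (ties in risk score count as 0.5).

    A pair is comparable when the patient with the shorter follow-up time
    had an event; pairs with identical times are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy cohort (invented values, not data from any included study).
times = [6, 10, 14, 20, 28, 36]    # follow-up in months
events = [1, 1, 1, 0, 1, 0]        # 1 = death observed, 0 = censored
tnm_stage = [3, 2, 3, 2, 2, 1]     # TNM stage as an ordinal risk score
methylated = [1, 1, 0, 0, 0, 0]    # 1 = marker methylated

# Assumed combined score with unit weights; real weights require model fitting.
combined = [s + m for s, m in zip(tnm_stage, methylated)]

c_stage = concordance_index(times, events, tnm_stage)
c_combined = concordance_index(times, events, combined)
```

In this invented example, the C-index rises from about 0.81 for TNM stage alone to about 0.96 for the combined score. Such an apparent gain on the same data used to define the score is optimistic, which is one reason why validation in independent, adequately sized populations remains necessary.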

Conclusion
Despite the widespread acceptance of epigenetic alterations as potentially important biomarkers for CRC prognosis, very few biomarkers reach the point of usability in daily patient care, and comprehensive overviews of the abundantly available biomarker results are lacking. In this review, we identified several promising markers that all require different amounts of further validation before definitive conclusions on their clinical applicability can be drawn. We also identified multiple problems hampering the comparison of individual study results, including issues with population selection, study design, technical methodology, and validation. Adhering to the REMARK guidelines might partly overcome these problems, and a more rigorous peer-review process specifically focusing on these reporting issues might be an essential step towards reducing the number of chance findings. In addition, biomarker research would benefit from a more structured approach in multidisciplinary collaborations, including clinicians, epidemiologists, statisticians, technicians, and molecular biologists, aiming to perform large, well-designed, and validated biomarker studies [140]. Only then will we be able to ultimately assess the clinical value of a biomarker.

Additional files
Additional file 1: Table S5. Risk of potential bias and confounders of the included studies. Studies indicated by an "X" potentially have an increased risk of bias, whereas studies indicated by a "√" potentially have a decreased risk of bias. (DOCX 108 kb)
Additional file 6: Table S6. Single markers and their characteristics that have been investigated in more than one study series, but for which Cox regression survival analysis was not available for all markers. (DOCX 63 kb)

Acknowledgements
Not applicable.

Funding
Not applicable.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.

Authors' contributions
MD, DG, HG, MW, MvE, VM, and KS contributed to the conception and design of the manuscript. MD and DG performed the systematic search, data extraction, and scoring of all articles. MD, AK, and KS analyzed and interpreted the data. MD and KS drafted the manuscript, and all authors critically revised the manuscript. All authors read and approved the final manuscript for publication.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.