Comprehensive Biostatistical Analysis of CpG Island Methylator Phenotype in Colorectal Cancer Using a Large Population-Based Sample

Background
The CpG island methylator phenotype (CIMP) is a distinct phenotype associated with microsatellite instability (MSI) and BRAF mutation in colon cancer. Recent investigations have selected 5 promoters (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1) as surrogate markers for CIMP-high. However, no study has comprehensively evaluated an expanded set of methylation markers (including these 5 markers) using a large number of tumors, or deciphered the complex clinical and molecular associations with CIMP-high determined by the validated marker panel.

Methodology/Principal Findings
DNA methylation at 16 CpG islands [the above 5 plus CDKN2A (p16), CHFR, CRABP1, HIC1, IGFBP3, MGMT, MINT1, MINT31, MLH1, p14 (CDKN2A/ARF) and WRN] was quantified in 904 colorectal cancers by real-time PCR (MethyLight). In unsupervised hierarchical clustering analysis, the 5 markers (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1), CDKN2A, CRABP1, MINT31, MLH1, p14 and WRN were generally clustered with each other and with MSI and BRAF mutation. KRAS mutation was not clustered with any methylation marker, suggesting its association with a random methylation pattern in CIMP-low tumors. Utilizing the validated CIMP marker panel (including the 5 markers), multivariate logistic regression demonstrated that CIMP-high was independently associated with older age, proximal location, poor differentiation, MSI-high, BRAF mutation, and inversely with LINE-1 hypomethylation and β-catenin (CTNNB1) activation. Mucinous feature, signet ring cells, and p53-negativity were associated with CIMP-high only in univariate analysis. In stratified analyses, the relations of CIMP-high with poor differentiation, KRAS mutation and LINE-1 hypomethylation significantly differed according to MSI status.

Conclusions
Our study provides valuable data for standardization of the use of CIMP-high-specific methylation markers. CIMP-high is independently associated with clinical and key molecular features in colorectal cancer. Our data also suggest that KRAS mutation is related to a random CpG island methylation pattern which may lead to CIMP-low tumors.

There is considerable heterogeneity of tumors with regard to CpG island methylation, and not all CpG islands are methylated in a similar manner in colorectal cancer [15]. Thus, choice of CpG islands can substantially influence the features of CIMP. In fact, different CIMP panels used in various studies have caused considerable confusion [7]. Weisenberger et al. [15] have screened 195 CpG islands, and selected 5 loci (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1), which can serve as surrogate markers for CIMP-high. We have further validated the use of 8 markers (the above 5 plus CDKN2A (p16), CRABP1 and MLH1) as a CIMP-high diagnostic panel [24]. However, no study has comprehensively compared these CIMP-high-specific CpG islands and other CpG islands using a large number of tumors.
In this study, we have assessed 16 CpG islands including the new 5 CIMP markers as well as MINT (methylated in tumor) markers and other CpG islands, utilizing hierarchical clustering analysis on a large number of colorectal cancers. We have also assessed the characteristics of CIMP-high tumors determined by a validated marker panel, and interactions of various clinical and tumoral factors by multivariate logistic regression analysis. This study provides the rationale for standardization of CIMP-high-specific methylation markers.

Study Group
We utilized the databases of two large prospective cohort studies: the Nurses' Health Study (NHS, N = 121,700 women followed since 1976) [25,26], and the Health Professionals Follow-up Study (HPFS, N = 51,500 men followed since 1986) [26]. A subset of cohort participants developed colorectal cancer during prospective follow-up. Thus, these colorectal cancers represented a population-based, relatively unbiased sample (compared to a single or few-hospital-based sample). Previous studies on the cohorts have described baseline characteristics of cohort participants and incident colorectal cancer cases, and confirmed that our colorectal cancers were representative as a population-based sample [25,26]. Clinical information was obtained through chart review by physicians. We collected paraffin-embedded tissue blocks from hospitals where participants had undergone resections of primary colorectal cancers. Based on availability of adequate tissue specimens, a total of 904 colorectal cancer cases (406 from the men's cohort and 498 from the women's cohort) were included. Clinical characteristics of the cases are described in Table 1 (on the left, under the column heading "All cases"). Among our cohort studies, there was no significant difference in demographic features between cases with tissue available and those without available tissue [26]. Most tumors have previously been characterized for statuses of MSI, CIMP, KRAS, BRAF, p53, β-catenin, LINE-1 methylation and 14 of the 16 methylation markers [19,21,24,27]. However, none of our previous studies have comprehensively analyzed the 16 methylation markers in relation to each other, independent associations of CIMP with various clinical, pathological or tumoral molecular characteristics, or interactions of various factors on the associations with CIMP-high by comprehensive biostatistical methods.
This study is novel in terms of 1) a large sample size; 2) the validated set of CIMP-specific methylation markers; 3) the number of other molecular events analyzed, including 8 CpG islands other than the CIMP-specific markers, MSI, KRAS, BRAF, p53, LINE-1 methylation and β-catenin; and 4) comprehensive statistical analyses including unsupervised hierarchical clustering, smoothing splines to assess nonlinearity, multivariate logistic regression, and stratified logistic regression. Thus, this study obtained novel data from the existing materials and database, analogous to novel studies using well-described cell lines or mouse models. Informed consent was obtained from all study subjects. Tissue collection and analyses were approved by the Harvard School of Public Health and Brigham and Women's Hospital Institutional Review Boards.

Pathologic Examination, DNA Extraction and Sequencing of KRAS and BRAF
For all cases, pathologic features including tumor differentiation, mucinous features and signet ring cells were examined by a pathologist (S.O.). Poor differentiation was defined as the presence of <50% glandular area. Genomic DNA was extracted from paraffin tissue, and PCR and Pyrosequencing targeting KRAS codons 12 and 13, and BRAF codon 600 were performed as previously described [28,29].

Real-time PCR (MethyLight) for Quantitative DNA Methylation Analysis
Sodium bisulfite treatment on DNA and subsequent real-time PCR (MethyLight [30]) were validated and performed as previously described [31]. We quantified DNA methylation in 5 CIMP-specific promoters (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1) and 11 other CpG islands [CDKN2A (p16), CHFR, CRABP1, HIC1, IGFBP3, MGMT, MINT1, MINT31, MLH1, p14 (CDKN2A/ARF), and WRN]. COL2A1 (the collagen 2A1 gene) was used to normalize for the amount of template bisulfite-converted DNA [31]. Primers and probes were previously described [15,27], except for IGFBP3, p14 and WRN: IGFBP3- The PCR condition was initial denaturation at 95°C for 10 min followed by 45 cycles of 95°C for 15 sec and 60°C for 1 min. A standard curve was made for each PCR plate by duplicated PCR amplifications for COL2A1 on bisulfite-converted human genomic DNA at 4 different concentrations (in a 5-fold dilution series). The percentage of methylated reference (PMR, i.e., degree of methylation) at a specific locus was calculated by dividing the GENE:COL2A1 ratio of template amounts in a sample by the GENE:COL2A1 ratio of template amounts in SssI-treated human genomic DNA (presumably fully methylated) and multiplying this value by 100. Methylation positivity was set as PMR ≥ 4 as previously validated [31].
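The PMR calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function and argument names are hypothetical, and the template amounts are assumed to have already been read off the standard curve.

```python
def pmr(gene_sample, col2a1_sample, gene_sssi, col2a1_sssi):
    """Percentage of methylated reference (PMR) for one locus in one sample.

    All four arguments are template amounts (hypothetical inputs, assumed
    to come from the COL2A1 standard curve on the same plate).
    """
    sample_ratio = gene_sample / col2a1_sample    # GENE:COL2A1 in the tumor
    reference_ratio = gene_sssi / col2a1_sssi     # GENE:COL2A1 in SssI-treated DNA
    return 100.0 * sample_ratio / reference_ratio


def is_methylated(pmr_value, cutoff=4.0):
    """Call a locus methylation-positive when PMR >= cutoff (4 in the text)."""
    return pmr_value >= cutoff


# Made-up template amounts, for illustration only.
example_pmr = pmr(gene_sample=2.0, col2a1_sample=10.0,
                  gene_sssi=5.0, col2a1_sssi=10.0)
print(example_pmr, is_methylated(example_pmr))
```

Normalizing to COL2A1 in both the sample and the SssI-treated reference means that differences in the amount of input DNA cancel out of the final ratio.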

Pyrosequencing to Measure LINE-1 Methylation
In order to accurately quantify relatively high LINE-1 methylation levels, we utilized Pyrosequencing as previously described [21]. LINE-1 methylation level measured by Pyrosequencing has been shown to correlate well with overall 5-methylcytosine level (i.e., genome-wide DNA methylation level) in tumor cells [32,33].

Table 1 notes. (%) indicates the proportion of cases with a specific clinical feature within each MSI/CIMP subtype. Proximal colon includes cecum to transverse colon, and distal colon includes splenic flexure to sigmoid colon. p53 and β-catenin status was determined by immunohistochemistry. Active β-catenin was defined as β-catenin score ≥3, where the β-catenin score was the sum of nuclear (0, 1+ or 2+), cytoplasmic (0, 1+ or 2+) and membrane (0 or 1+ if expression is lost) scores as originally described by Jass et al. [35]. The t-test assumed unequal variances. CIMP, CpG island methylator phenotype; LINE-1, long interspersed nucleotide element-1; MSI, microsatellite instability; SD, standard deviation. doi:10.1371/journal.pone.0003698.t001

Immunohistochemistry for p53 and β-catenin
Tissue microarrays (TMAs) were constructed and immunohistochemistry for p53 and β-catenin was performed as previously described [19,34]. Appropriate positive and negative controls were included in each run of immunohistochemistry. Cytoplasmic and nuclear β-catenin expression was recorded separately as either no expression (0), weak expression (1+), or moderate/strong expression (2+). The β-catenin activation score was calculated as the sum of nuclear score (0-2), cytoplasmic score (0-2) and membrane score (0 if membrane staining was positive, +1 if membrane expression was lost), as originally described by Jass et al. [35]. All immunohistochemically stained slides were examined by one of the investigators (β-catenin by K.N.; p53 by S.O.) unaware of other data. Random samples of 402 and 118 tumors were re-examined for β-catenin and p53, respectively, by a second observer (β-catenin by S.O., p53 by K.N.) unaware of other data, and the concordances between the two observers were 0.83 for β-catenin (κ = 0.65, p < 0.0001), and 0.87 for p53 (κ = 0.75, p < 0.0001), indicating substantial agreement.
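Interobserver concordance and the κ coefficient reported above follow the standard definition of Cohen's κ, which can be sketched as follows. The function name and the toy calls are hypothetical; the study's own values were computed in SAS.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two lists of calls."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two observers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each observer's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined in this case
    return observed, (observed - expected) / (1.0 - expected)


# Toy positive/negative calls by two observers (made-up data).
first = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
second = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
print(cohen_kappa(first, second))
```

Because κ subtracts chance agreement, it is always lower than the raw concordance when the observers agree more often than chance, which is why a concordance of 0.83 can correspond to κ = 0.65.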

Statistical Analysis
For cluster analysis of biomarkers including the 16 methylation markers, MSI, KRAS and BRAF, we utilized average linkage hierarchical clustering with a Euclidean distance metric as implemented in MeV (http://www.tm4.org) [36]. The chi-square test was used to examine an association between CIMP and other categorical variables of interest. The t-test assuming unequal variances was performed to compare mean age and mean LINE-1 methylation level. The κ coefficient was calculated to assess agreement between each of the 16 markers (positive vs. negative) and the 16-marker CIMP panel (CIMP-high positive vs. negative).
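Average linkage clustering with a Euclidean distance can be illustrated with a small pure-Python sketch. This is not the MeV implementation used in the study; the function names and the toy marker profiles are hypothetical, and the naive search over all cluster pairs is only suitable for a handful of markers.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length marker profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles):
    """Agglomerative clustering of markers; returns the list of merges.

    profiles maps a marker name to its 0/1 status across tumors.
    Each merge is (cluster_1, cluster_2, average pairwise distance).
    """
    names = list(profiles)
    pairwise = {
        frozenset((a, b)): euclidean(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster marker pairs.
        total = sum(pairwise[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Made-up status of four markers across six tumors.
toy = {
    "RUNX3": [1, 1, 1, 0, 0, 0],
    "SOCS1": [1, 1, 1, 0, 0, 0],
    "BRAF": [1, 1, 0, 0, 0, 0],
    "KRAS": [0, 0, 0, 1, 0, 1],
}
for left, right, distance in average_linkage(toy):
    print(left, right, round(distance, 3))
```

In this toy input the two concordant methylation markers merge first, the BRAF profile joins them next, and the KRAS profile joins last, which mirrors the kind of pattern the clustering was used to look for.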
To examine the relation between a given variable and CIMP-high, we utilized unconditional logistic regression models to calculate odds ratios (ORs) for CIMP-high, according to the status of the given variable, unadjusted and adjusted for age, sex, tumor location, stage, differentiation, LINE-1 methylation level, and status of MSI, KRAS, BRAF, p53 and β-catenin. To adjust for potential confounding, age and LINE-1 methylation level were used as continuous variables, and all of the other variables were used as categorical variables.
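For a single binary variable, the unadjusted OR from logistic regression equals the cross-product ratio of the 2×2 table. The sketch below shows that special case with a Wald 95% confidence interval. It does not reproduce the adjusted models fitted in SAS; the function name and the counts are hypothetical.

```python
import math


def unadjusted_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: feature present, CIMP-high      b: feature present, not CIMP-high
    c: feature absent, CIMP-high       d: feature absent, not CIMP-high
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, for illustration only.
print(unadjusted_odds_ratio(20, 10, 10, 20))
```

Comparing such an unadjusted OR with the adjusted OR from the full model is how a change attributable to confounders becomes visible.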
For age and LINE-1, we assessed non-linearity by the likelihood ratio test that compared a regression model including a quadratic (or cubic) term with a model excluding it. The likelihood ratio test showed that including the quadratic term did not significantly alter model fit (p = 0.86 for age, p = 0.078 for LINE-1), and that including the cubic term did not significantly alter model fit (p = 0.87 for age, p = 0.084 for LINE-1). We also examined the possibility of a non-linear relation between age (or LINE-1 methylation) and CIMP-high, non-parametrically with restricted cubic splines [37].
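The likelihood ratio test for one added term (for example, the quadratic term) compares the maximized log-likelihoods of the two nested models against a chi-square distribution with 1 degree of freedom. A minimal sketch, assuming the two log-likelihoods have already been obtained from fitted models (the function name and the numbers are hypothetical):

```python
import math


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and p value for one extra parameter.

    loglik_reduced: log-likelihood of the model without the extra term.
    loglik_full:    log-likelihood of the model with the extra term.
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Upper tail of a chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up log-likelihoods, for illustration only.
print(likelihood_ratio_test_1df(-250.40, -250.38))
```

A large p value means that the extra term does not improve fit appreciably, which is the basis for treating age and LINE-1 methylation as linear terms.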
We dichotomized tumor location (proximal vs. distal), tumor differentiation (poor vs. well/moderate), signet ring cells (present vs. absent), MSI (high vs. non-MSI-high), p53 (positive vs. negative), KRAS (mutated vs. wild-type), BRAF (mutated vs. wild-type) and β-catenin (active vs. inactive). There were 3 categories for mucinous feature (0%, 1-49%, and ≥50%) in the initial main analysis (Table 2). We dichotomized mucinous feature (present vs. absent) in secondary stratified analyses and analyses of interactions, because multivariate ORs for CIMP-high were similar across the 1-49% mucinous and ≥50% mucinous categories (in reference to the non-mucinous category). There were 4 categories for stage (I, II, III and IV) in the initial main analysis (Table 2). We dichotomized tumor stage (I vs. II-IV) in secondary stratified analyses and analyses of interactions, because multivariate ORs for CIMP-high were similar across stage II-IV (in reference to stage I).
When there was missing information on tumor stage (12% missing), LINE-1 (3.9% missing), MSI (3.2% missing), p53 (1.3% missing), KRAS (2.3% missing) or BRAF (4.7% missing), we assigned a separate ("missing") indicator variable and included those cases in the multivariate analysis models. We confirmed that excluding cases with a missing variable did not significantly alter results (data not shown). There was no missing information in other variables.
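The missing-indicator approach for a categorical covariate can be sketched as dummy coding with one extra column. This is an illustrative sketch: the function name is hypothetical, `None` is assumed to encode a missing value, and how a continuous covariate such as LINE-1 was handled alongside its indicator is not specified in the text and is not shown here.

```python
def encode_with_missing_indicator(values, levels, reference):
    """Dummy-code a categorical variable, adding a separate 'missing' column.

    values:    observed category per case, or None when missing.
    levels:    all valid categories.
    reference: the category left out as the reference level.
    """
    columns = [level for level in levels if level != reference] + ["missing"]
    rows = []
    for value in values:
        if value is not None and value not in levels:
            raise ValueError(f"unknown category: {value!r}")
        key = "missing" if value is None else value
        rows.append([1 if key == column else 0 for column in columns])
    return columns, rows


# Made-up stage data; stage I is the reference category.
columns, rows = encode_with_missing_indicator(
    ["I", "III", None, "II"], ["I", "II", "III", "IV"], reference="I")
print(columns)
print(rows)
```

Cases with a missing value therefore stay in the model and contribute information on all of their other covariates.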
An association of each variable with CIMP-high was also assessed in strata of important clinical or molecular features, including age (<65 vs. ≥65 years old), sex, tumor location (proximal vs. distal), MSI status, and BRAF status. For stratified analysis, each multivariate logistic regression model included a variable of interest that was stratified by a given stratifying variable (e.g., age) and adjusted for all of the remaining variables (SAS codes available upon request).
An interaction was assessed by including the cross product term of a given variable (e.g., MSI) and another variable of interest in a regression model, and the likelihood ratio test compared a model including the cross product term with that excluding it. In addition to interactions of any given variable with MSI, location, age, sex and BRAF, we examined all possible remaining two-way interactions, and found no significant interactions (data not shown).
All statistical analyses except for clustering analysis used SAS version 9.1 (SAS Institute, Cary, NC). All p values were two-sided, and statistical significance was defined as p ≤ 0.05. Nonetheless, multiple hypothesis testing was considered when interpreting the data, especially in examining multiple two-way interactions.
To evaluate 16 methylation markers in an unbiased fashion, we conducted an unsupervised hierarchical clustering analysis of the 16 methylation markers and status of MSI (microsatellite instability), and KRAS and BRAF oncogenes, using 860 tumors with all of these results available ( Figure 1). The 8 CIMP-high markers (CACNA1G, CDKN2A (p16), CRABP1, IGF2, MLH1, NEUROG1, RUNX3 and SOCS1) were generally clustered together, indicating good concordance of methylation patterns in these markers and supporting these 8 markers as good CIMP-high markers. In addition, p14, MINT31 and WRN were also clustered with the 8 markers. The other 5 methylation markers (MGMT, HIC1, CHFR, MINT1 and IGFBP3) were not clustered closely with each other or the 8 markers. The BRAF and MSI variables, which have been known to be associated with CIMP-high [15,18,24], were also clustered together with these 8 markers, indicating tight associations with CIMP-high. Notably, KRAS mutation was not clustered with any of the methylation markers, suggesting its association with a random methylation pattern (particularly in CIMP-low tumors which have been associated with KRAS mutation [29]; see also Supplemental Figure S1). We used clustering analysis only for the examination of marker clustering, but not for tumor classification. That was because clustering of markers was very stable with the large number of tumors (i.e., excluding a few tumors did not substantially influence results) while tumor classification by clustering analysis based on the 16 markers was not stable.
To describe performance of each of the 16 markers in an unbiased way, we calculated the κ coefficient (for agreement statistics), sensitivity and specificity of each marker for CIMP-high diagnosis determined by the 16 markers (Supplemental Table S1). The cutoff for CIMP-high was set as ≥11/16 or ≥10/16 methylated markers based on the distribution of KRAS and BRAF mutations (Supplemental Figure S1), and on the previous findings that CIMP-high is associated with BRAF mutation and CIMP-low is associated with KRAS mutation [24,29]. Sensitivity and specificity of each marker reflected overall concordance of a methylation pattern with the remaining 15 markers. It was evident that performance of the 8 CIMP-panel markers (CACNA1G, CDKN2A, CRABP1, IGF2, MLH1, NEUROG1, RUNX3 and SOCS1) was generally good. The κ coefficient was greater than 0.5 for all of these 8 markers. RUNX3 was the single best marker for CIMP-high diagnosis. Among the other 8 markers (CHFR, HIC1, IGFBP3, MGMT, MINT1, MINT31, p14 and WRN), only MINT31 and p14 consistently showed a κ coefficient greater than 0.5, and good sensitivity/specificity. This was in agreement with clustering analysis, which showed that MINT31 and p14 clustered with the 8 CIMP-panel markers.
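Sensitivity and specificity of a single marker against the panel-based CIMP-high call follow the usual definitions, sketched below. The function name and toy calls are hypothetical.

```python
def sensitivity_specificity(marker_calls, cimp_high_calls):
    """Sensitivity and specificity of one marker for the panel CIMP-high call.

    Both arguments are equal-length sequences of truthy/falsy values:
    marker methylated or not, and tumor CIMP-high by the panel or not.
    """
    tp = fp = tn = fn = 0
    for marker, cimp_high in zip(marker_calls, cimp_high_calls):
        if cimp_high:
            if marker:
                tp += 1   # methylated marker in a CIMP-high tumor
            else:
                fn += 1   # unmethylated marker in a CIMP-high tumor
        else:
            if marker:
                fp += 1   # methylated marker in a non-CIMP-high tumor
            else:
                tn += 1   # unmethylated marker in a non-CIMP-high tumor
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one CIMP-high and one non-CIMP-high tumor")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up calls for eight tumors.
print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 1, 0],
                              [1, 1, 1, 1, 0, 0, 0, 0]))
```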
We also compared the all-16-marker panel with the 8-marker CIMP panel. Using the 8-marker panel or the 16-marker panel, CIMP-high was defined as ≥6/8 or ≥11/16 methylated markers, respectively. Among the 904 cases, 879 cases (97.2%) showed concordant diagnosis of CIMP status between the 16-marker panel and the 8-marker panel (κ = 0.89, p < 0.0001). When the 16-marker CIMP panel was used, the associations of CIMP-high with clinical and molecular features were very similar to the CIMP-high associations by the 8-marker CIMP panel (data not shown). We also confirmed a high degree of agreement (98.6% concordant; κ = 0.94) between the 8-marker panel and the 5-marker panel described by Weisenberger et al. [15]. Thus, in subsequent analyses, we used the 8-marker CIMP panel which we had extensively validated [24].
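The panel definitions above (≥6/8 and ≥11/16 methylated markers) and the concordance between two panels can be sketched as follows. The marker names and thresholds come from the text; the function names and the representation of a tumor as a set of methylated markers are assumptions made for illustration.

```python
CIMP8_PANEL = ("CACNA1G", "CDKN2A", "CRABP1", "IGF2",
               "MLH1", "NEUROG1", "RUNX3", "SOCS1")
OTHER8 = ("CHFR", "HIC1", "IGFBP3", "MGMT", "MINT1", "MINT31", "p14", "WRN")
CIMP16_PANEL = CIMP8_PANEL + OTHER8


def is_cimp_high(methylated, panel, min_methylated):
    """True when at least min_methylated markers of the panel are methylated.

    methylated: set of marker names called methylation-positive in one tumor.
    """
    return sum(1 for marker in panel if marker in methylated) >= min_methylated


def concordance(calls_a, calls_b):
    """Fraction of tumors given the same call by two panels."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("both panels must classify the same non-empty set of tumors")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)


# Made-up tumors, for illustration only.
tumors = [
    set(CIMP8_PANEL) | {"MINT31", "p14", "WRN"},   # 8/8 and 11/16
    {"MGMT", "HIC1"},                              # 0/8 and 2/16
    set(CIMP8_PANEL[:6]),                          # 6/8 but only 6/16
]
by_8 = [is_cimp_high(t, CIMP8_PANEL, 6) for t in tumors]
by_16 = [is_cimp_high(t, CIMP16_PANEL, 11) for t in tumors]
print(by_8, by_16, concordance(by_8, by_16))
```

The third toy tumor shows how the two panels can disagree: a tumor methylated at exactly 6 of the 8 panel markers and none of the others is CIMP-high by the 8-marker panel but not by the 16-marker panel.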

CIMP-high in colorectal cancer
We assessed clinical, pathologic and molecular features of CIMP-high colorectal cancer (Table 1). By univariate analysis, CIMP-high was associated with female sex, older age, proximal location, poor differentiation, mucin, signet ring cells, MSI-high and BRAF mutation, and inversely with stage I, KRAS mutation, LINE-1 hypomethylation, positive p53, and active β-catenin (all p < 0.004).
Age was linearly associated with CIMP-high in logistic regression analysis (p for trend < 0.0001). We found no significant non-linearity by the likelihood ratio test that compared a model including a quadratic (or cubic) term with a model excluding it (p > 0.85). Likewise, LINE-1 hypomethylation was inversely linearly associated with CIMP-high (p for trend < 0.0001), and there was no significant non-linearity by the likelihood ratio test, using a quadratic (or cubic) term (p ≥ 0.078). Non-parametric restricted cubic splines also supported a linear relation between age and CIMP-high (Figure 2) and an inverse linear relation between LINE-1 hypomethylation and CIMP-high (Figure 3).
Table 2 notes. β-catenin activation is defined as β-catenin score ≥3. The β-catenin score is the sum of nuclear (0, 1+ or 2+), cytoplasmic (0, 1+ or 2+) and membrane (0 or 1+) scores as originally described by Jass et al. [35]. Proximal colon includes cecum to transverse colon, and distal colon includes splenic flexure to rectum. CI, confidence interval; CIMP, CpG island methylator phenotype; LINE-1, long interspersed nucleotide element-1; MSI, microsatellite instability; OR, odds ratio. doi:10.1371/journal.pone.0003698.t002
In multivariate logistic regression analysis, CIMP-high was significantly associated with older age, proximal location, poor differentiation, MSI-high and BRAF mutation, and inversely with active β-catenin and LINE-1 hypomethylation (Table 2). However, all of the other features (female sex, stage, mucin, signet ring cells, KRAS and p53) were no longer significantly associated with CIMP-high in multivariate analysis. We further examined potential confounders in the association of each variable with CIMP-high. Except for sex, all of the other variables showed substantial changes in odds ratio (OR) for the association with CIMP-high after adjusting for MSI, BRAF and/or tumor location (or other variables) (Table 2). These results indicated the existence of complex relations between clinical and molecular features (including CIMP) in colorectal cancer, which are summarized in Figure 4.

Associations with CIMP-high in strata of MSI status
Molecular classification by MSI status is increasingly important in colorectal cancer [38][39][40]. Thus, we examined the relations of clinical and tumoral variables with CIMP-high in MSI-high tumors and non-MSI-high tumors (Table 3). Older age, proximal location and BRAF mutation were significantly associated with CIMP-high in both MSI-high and non-MSI-high tumors. In contrast, the relations of CIMP-high with poor differentiation, KRAS mutation and LINE-1 hypomethylation appeared to be different according to MSI status (p for interaction < 0.005). CIMP-high was associated with poor differentiation and inversely with KRAS mutation in MSI-high tumors, but not in non-MSI-high tumors. LINE-1 hypomethylation was inversely associated with CIMP-high in non-MSI-high tumors, but not in MSI-high tumors.

Associations with CIMP-high in strata of tumor location
There is accumulating evidence for a molecular difference between proximal and distal colorectal cancers [38,41]. Therefore, we examined the relations of clinical and tumoral variables with CIMP-high in proximal tumors and distal tumors (Table 4). The relations of CIMP-high with the variables did not appear to differ according to tumor location (all p for interaction > 0.23).

Associations with CIMP-high in other stratified analyses
We examined the relations of clinical and tumoral variables with CIMP-high in strata of sex, age (<65 vs. ≥65 years old) and BRAF status. Considering multiple hypothesis testing (12 hypotheses tested in each analysis), the effect of the variables did not appear to significantly differ according to age (all p for interaction > 0.03) and sex (all p for interaction > 0.02). Notably, the effect of LINE-1 hypomethylation did appear to differ according to BRAF status (p for interaction = 0.001) (Table 5). A significant inverse association of LINE-1 hypomethylation with CIMP-high was present in BRAF-mutated tumors [adjusted OR = 0.022; 95% confidence interval (CI), 0.003-0.17], but not in BRAF-wild-type tumors (adjusted OR = 0.87; 95% CI, 0.25-3.06).
We also examined all of the remaining potential two-way interactions by the available clinical and tumoral variables, and found no significant interactions with regard to the associations with CIMP-high (data not shown).
Figure 1 (caption fragment). ... CRABP1 and NEUROG1) are clustered closely, supporting that these markers are good CIMP-high markers. Also note the close relationship between MSI, BRAF and the 8 CIMP-high markers. KRAS mutation is not clustered with any of the methylation markers analyzed, suggesting that KRAS mutation (which is associated with CIMP-low [24,29]; see Supplemental Figure S1) is probably associated with a random methylation pattern. doi:10.1371/journal.pone.0003698.g001

Discussion
In this study utilizing a large sample size, we evaluated 16 methylation markers in an unbiased fashion. The 16 methylation markers included the 5 markers (CACNA1G, IGF2, NEUROG1, RUNX3 and SOCS1) that were selected by screening of 195 CpG islands [15] and further validated to be included in the CIMP-high diagnostic panel (the above 5 plus CDKN2A, CRABP1 and MLH1) [24]. By unsupervised hierarchical clustering analysis, the 5 methylation markers were clustered with each other as well as with MSI (microsatellite instability) and BRAF mutation. Analysis of the κ coefficient, sensitivity and specificity also demonstrated good performance of the 5 methylation markers with a generally concordant methylation pattern. Utilizing the validated CIMP panel, we have deciphered the complex relations of CIMP-high with various clinical, pathologic and molecular features in colorectal cancer. Our data provide a rationale for the use of the validated CIMP-specific methylation marker panel.
This study is the first extensive investigation to compare the 5 new CIMP-high markers [15] with MINT1, MINT31 and other CpG islands, using a large sample size. Performance of the 5 new markers (CACNA1G, IGF2, NEUROG1, RUNX3, and SOCS1), CRABP1 and MLH1 was consistently superior to that of WRN, MINT1, CHFR, IGFBP3, HIC1 and MGMT. MINT31, CDKN2A, and p14 showed intermediate performance characteristics, and in hierarchical clustering analysis, were generally clustered with the new 5 CIMP markers, MSI and BRAF mutation. We have provided valuable data for standardization of methylation markers for the detection of CIMP-high in colorectal cancer.
Studying epigenetic and genetic aberrations is important in cancer research [42][43][44][45][46]. We used quantitative PCR assays (MethyLight [30]) to determine the degree of DNA methylation, which is robust enough to reproducibly differentiate low-level methylation from high-level methylation [31]. Our resource of a large colorectal cancer sample obtained from two large prospective cohorts (representing a relatively unbiased sample compared to a single-hospital-based sample) has provided a sufficient power to evaluate the 16 methylation markers, and to simultaneously assess independent relations of CIMP-high with multiple clinical and tumoral molecular variables.
Interestingly, unsupervised clustering analysis using a large number of tumors revealed that KRAS mutation was not clustered with any of the 16 methylation markers. However, as shown in our previous studies [24,29], KRAS mutation was more common in CIMP-low tumors compared to CIMP-high and CIMP-0 tumors. Although these findings appeared to be discrepant, we believe that KRAS mutation is perhaps associated with a random pattern of CpG island methylation, indicated by the non-clustering phenomenon in clustering analysis. In contrast, our clustering analysis has clearly shown that BRAF mutation is clustered with CIMP-high-specific markers, indicating that BRAF mutation is perhaps associated with a non-random pattern of CpG island methylation.
Previous studies identified various factors associated with CIMP-high, including old age, female sex, proximal location, poor differentiation, mucin, signet ring cells, MSI-high, BRAF mutation, wild-type KRAS, inactive β-catenin, wild-type APC, high LINE-1 methylation level, and wild-type TP53 [9][10][11][12][13][14][15][16][17][18][19]21,47,48]. However, many of these factors are interrelated. Thus, in order to properly decipher the relations with CIMP-high, it is necessary to use a large number of tumors, determine a number of molecular features, and perform comprehensive biostatistical analysis. We were able to utilize a large colorectal cancer sample that has been examined for multiple molecular events, and appropriate biostatistical methods. Figure 4 summarizes our current knowledge on the associations of clinical, pathologic and molecular features including CIMP in colorectal cancer. It is important to keep these relations in mind when analyzing the association between any of these factors and an outcome (e.g., molecular changes in colorectal cancer, patient mortality, etc.). These factors may confound the relationship of interest. Indeed, we have demonstrated confounding effects of MSI, BRAF and tumor location in a number of the associations in Table 2. In particular, signet ring cells, KRAS and p53 were no longer associated with CIMP-high after adjusting for the confounders.
We have shown that the relations of CIMP-high with tumor differentiation, KRAS mutation and LINE-1 hypomethylation appear to differ according to MSI status. MSI is a major molecular classifier in colorectal cancer [38][39][40]. MSI-high tumors have been shown to exhibit widespread mutations in nucleotide repeat sequences such as those in TGFBR2 and BAX [49,50]. Thus, it is likely that overall genomic changes in MSI-high tumors are dissimilar to those of non-MSI-high tumors. That may explain  why there are some pathologic and molecular features that are differentially associated with CIMP-high according to MSI status.
In summary, using the 16 methylation markers and a large population-based sample, we have evaluated performance of each of the 16 methylation markers in an unbiased fashion. Our current study provides valuable data for standardization of the use of CIMP-high-specific markers. Using the validated CIMP-specific methylation marker panel, we have analyzed the clinical, pathologic and molecular features of CIMP-high colorectal cancer by comprehensive biostatistical methods. We have provided the rationale to use the validated CIMP-high-specific methylation marker panel in clinical and research settings. Further studies are necessary to elucidate fundamental molecular defects that lead to CIMP-high colorectal cancer.

Table 5 notes. Each multivariate logistic regression model assesses a variable of interest (stratified by BRAF status in a given model), adjusting for all of the above remaining variables. An interaction was assessed by the likelihood ratio test that compares a model including a cross product term (of the BRAF variable and another variable of interest) with a model excluding it. β-catenin activation is defined as β-catenin score ≥3, where the β-catenin score is the sum of nuclear (0, 1+ or 2+), cytoplasmic (0, 1+ or 2+) and membrane (0 or 1+) scores as originally described by Jass et al. [35]. Proximal colon includes cecum to transverse colon, and distal colon includes splenic flexure to rectum. CI, confidence interval; CIMP, CpG island methylator phenotype; LINE-1, long interspersed nucleotide element-1; MSI, microsatellite instability; OR, odds ratio. doi:10.1371/journal.pone.0003698.t005

Figure S1. Distribution of colorectal cancers according to the number of methylated markers and KRAS/BRAF mutational status. Note that KRAS mutation is associated with CIMP-low (rather than CIMP-high and CIMP-negative), in agreement with studies using more limited CIMP-specific methylation markers [24,29]. Found at: doi:10.1371/journal.pone.0003698.s001 (0.12 MB TIF)