MTHFR Glu429Ala and ERCC5 His46His Polymorphisms Are Associated with Prognosis in Colorectal Cancer Patients: Analysis of Two Independent Cohorts from Newfoundland

Introduction: In this study, 27 genetic polymorphisms previously reported to be associated with clinical outcomes in colorectal cancer patients were investigated in relation to overall survival (OS) and disease-free survival (DFS) in colorectal cancer patients from Newfoundland.

Methods: The discovery and validation cohorts comprised 532 and 252 patients, respectively. Genotypes of the 27 polymorphisms were first obtained in the discovery cohort, and survival analyses were performed assuming the co-dominant genetic model. Polymorphisms associated with disease outcomes in the discovery cohort were then investigated in the validation cohort.

Results: When adjusted for sex, age, tumor stage and microsatellite instability (MSI) status, four polymorphisms were independent predictors of OS in the discovery cohort: MTHFR Glu429Ala (HR: 1.72, 95%CI: 1.04–2.84, p = 0.036), ERCC5 His46His (HR: 1.78, 95%CI: 1.15–2.76, p = 0.01), SERPINE1 −675indelG (HR: 0.52, 95%CI: 0.32–0.84, p = 0.008), and the homozygous deletion of the GSTM1 gene (HR: 1.4, 95%CI: 1.03–1.92, p = 0.033). In the validation cohort, the MTHFR Glu429Ala polymorphism was associated with shorter OS (HR: 1.71, 95%CI: 1.18–2.49, p = 0.005), although with a different genotype than in the discovery cohort (CC genotype in the discovery cohort and AC genotype in the validation cohort). When patients were stratified by treatment with 5-fluorouracil (5-FU)-based regimens, this polymorphism was associated with reduced OS only in patients not treated with 5-FU. In the DFS analysis, when adjusted for other variables, the TT genotype of the ERCC5 His46His polymorphism was associated with shorter DFS in both cohorts (discovery cohort: HR: 1.54, 95%CI: 1.04–2.29, p = 0.032; validation cohort: HR: 1.81, 95%CI: 1.11–2.94, p = 0.018).

Conclusions: In this study, associations of the MTHFR Glu429Ala polymorphism with OS and of the ERCC5 His46His polymorphism with DFS were identified in two colorectal cancer patient cohorts. Our results also suggest that the MTHFR Glu429Ala polymorphism may be an adverse prognostic marker in patients not treated with 5-FU.


Introduction
Colorectal cancer has a high incidence in developed countries [1]. In 2004, it was the 4th leading cause of cancer death, with over 600,000 deaths worldwide [2]. In Canada, it is a major health concern, with an estimated 22,200 new cases and 8,900 deaths expected in 2011 [3]. There are significant inter-provincial variations in incidence and mortality rates, and the province of Newfoundland and Labrador (NL) has the highest age-standardized incidence and mortality rates for colorectal cancer among the Canadian provinces [3]. Both genetic and environmental factors play a role in susceptibility to colorectal cancer. While the majority of colorectal cancers are sporadic, nearly 5% are caused by inherited high-penetrance mutations [4]. In addition, 35% of the risk of developing sporadic colorectal cancer is attributed to inherited factors [5].
Important colorectal cancer outcomes include recurrence, metastasis and death. Currently, the most valuable prognostic criterion in colorectal cancer patients is the TNM (tumor-node-metastasis) staging system defined by the American Joint Committee on Cancer [6]. Generally, patient prognosis worsens with increasing stage.
A number of clinical and molecular parameters have also been investigated for their prognostic utility in colorectal cancer. For instance, Popat et al [7] reported in their meta-analysis that patients with microsatellite instability-high (MSI-H) tumors have a more favorable prognosis than patients with microsatellite instability-low (MSI-L) or microsatellite stable (MSS) tumors. Several other clinicopathological and molecular features have also been reported to be associated with prognosis, such as high tumor grade [8], mucinous histology [9], lymphovascular invasion [10], chromosomal instability [11], and the presence of the BRAF1 Val600Glu somatic mutation in tumors [12], though contradictory reports have also been published [13][14][15]. Inconsistent results on the association between familial risk status and survival of colorectal cancer patients have also been reported [16,17]. Additionally, demographic factors such as sex and ethnicity may modify prognosis [6]. These factors only partly account for the variation in patient outcomes, and it is possible that genetic factors (such as single nucleotide polymorphisms (SNPs), insertion/deletion (indel) polymorphisms, and somatic mutations) also influence prognosis. Investigating these factors may therefore help explain inter-patient variability in outcomes and the underlying biological mechanisms.
Several studies have previously reported significant associations between genetic variations and outcomes in colorectal cancer patients. In the present study, we investigated 27 such polymorphisms (Table 1) as potential prognostic factors in a colorectal cancer patient cohort (discovery cohort, n = 532) and subsequently tested the validity of the positive associations in an additional colorectal cancer patient cohort (validation cohort, n = 252).

Ethics Statement
This study includes two patient cohorts. For both cohorts, the collection of patient clinical data and biospecimens for research purposes was approved by the Regional Health Boards and the Human Investigation Committee (HIC) of Memorial University of Newfoundland. In the discovery cohort, written informed consent was obtained from the recruited patients or their proxies. The HIC waived the need for written informed consent from the participants in the validation cohort. Ethics approval for this particular project was also obtained from the HIC of Memorial University of Newfoundland.

Patient Cohorts
a) The discovery cohort. This cohort consisted of 532 colorectal cancer patients from the Newfoundland Colorectal Cancer Registry (NFCCR). The NFCCR was established in 1999 and recruited 736 patients with stage I-IV colorectal cancer between 1999 and 2003 [18]. All patients were ≤75 years old, and their diagnosis was confirmed by pathological examination. The molecular and genetic characteristics and other details of this cohort have been reported previously by others [19,20]. Clinical data were compiled for all patients, although some variables had missing values (Table 2). In this study, we investigated the 532 of the 736 NFCCR patients for whom genomic DNA (extracted from blood) was also available. Patient data on clinicopathological features, recurrence and metastasis, and the date of death were retrieved from clinical reports (medical, pathology, radiology, autopsy, and surgical reports, lab investigations, physicians' assessment and progress notes, inpatient discharge summaries), the Newfoundland Cancer Treatment and Research Foundation database, or patient follow-up questionnaires. In this cohort, 62% of the patients were treated with 5-FU-based chemotherapy in the neoadjuvant or adjuvant setting or upon diagnosis of local or distant recurrence, whereas the remaining patients were either not treated with chemotherapy or were treated with cisplatin/etoposide (n = 1). Patients in this cohort were followed until April 2010. The median follow-up times for overall survival and disease-free survival were 6.4 and 6 years, respectively (Table 2).
b) The validation cohort. This is a retrospective cohort comprising 280 previously collected colorectal cancer patients from the Avalon Peninsula of Newfoundland. Clinical data were collected for all 280 patients; however, genomic DNA (extracted from non-tumor tissues) was available for only 252 patients, who were therefore included in the present study. Patients in this cohort were diagnosed with primary colorectal cancer during a two-year period (1997-1998). The patient selection criteria were as follows: a) patients with carcinoma in a polyp were included only if the tumor invaded the stalk; b) patients whose colorectal cancer was a recurrence of an earlier colorectal cancer or a metastasis from a distant organ, and those with carcinoid tumors, familial adenomatous polyposis, carcinoma in situ or mucosal carcinoma, were excluded; and c) patients were selected regardless of their age at diagnosis. Prognostic data for these patients were collected from medical and hospital records and the Newfoundland and Labrador Centre for Health Information. In the validation cohort, 34.9% of the patients were treated with 5-FU-based chemotherapy in the neoadjuvant or adjuvant setting or upon diagnosis of local or distant recurrence. The remaining patients were either not treated with chemotherapy or were treated with other agents such as irinotecan, tomudex or oxaliplatin. Patients in this cohort were followed until July 2009. The median follow-up times for overall survival and disease-free survival were 5.4 and 3.3 years, respectively (Table 2).

Selection of Polymorphisms
The dbCPCO database [21] (http://www.med.mun.ca/cpco/) summarizes the literature on genetic markers studied for their prognostic associations in colorectal cancer patients. In August 2010, we searched the entries in this database for survival measures (e.g. overall survival), which identified 31 polymorphisms. Of these, one polymorphism (the EGFR (CA)n repeat) was not included in this study because our lab lacked the equipment required to genotype it. In addition, three polymorphisms (EGF A61G, TP53 Arg72Pro and PTGS2 −765 G/C) could not be genotyped using the MassARRAY® technology. As a result, 27 polymorphisms from 24 different genes were investigated in the current study (Table 1). These polymorphisms were a) reported to be associated with overall survival in at least one study (in either univariate or multivariate analyses) (Table S1 in File S1), b) suitable for the genotyping techniques used in this project (e.g. single nucleotide polymorphisms, indels, microsatellite repeats, gene deletions), and c) successfully genotyped (i.e. did not fail genotyping by the MassARRAY® method).

Genotyping Methods
The genotypes of all 27 polymorphisms were obtained in the discovery cohort, and the genotypes of the four polymorphisms that were associated with OS in the discovery cohort (MTHFR Glu429Ala, ERCC5 His46His, SERPINE1 −675indelG, and GSTM1 gene deletion) were obtained in the validation cohort. Genotypes were obtained using the Sequenom MassARRAY® platform, TaqMan® SNP genotyping assays, and gel electrophoresis of PCR-amplified fragments. Further details of the genotyping experiments can be found in Methods S1 and Table S2 in File S1. Each genotyping reaction included no-template amplifications as negative controls. At least 5.9% of the genotypes were successfully duplicated, with a minimum concordance rate of 99.7%. Samples with discordant genotypes were either re-genotyped (for genotypes obtained using TaqMan® SNP genotyping assays and gel electrophoresis of PCR-amplified fragments) or excluded from analysis (for genotypes obtained using the Sequenom MassARRAY® technique). The minimum successful genotyping rates were 97.4% for the discovery cohort and 94.44% for the validation cohort. For the in-house genotyping experiments (i.e. TaqMan® SNP genotyping assays and gel electrophoresis of PCR-amplified fragments), genotyping of failed DNA samples was attempted at least two additional times, depending on the availability of DNA.
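The call-rate and duplicate-concordance checks described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study; the data layout (a dictionary mapping a sample ID to a genotype string such as "AC", with None for a failed call) and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: sample ID -> genotype string (e.g. "AC"), or None for a failed call.
Calls = Dict[str, Optional[str]]


def normalize(genotype: str) -> str:
    """Order alleles so that 'CA' and 'AC' compare as the same genotype."""
    return "".join(sorted(genotype.upper()))


def call_rate(calls: Calls) -> float:
    """Fraction of samples with a successful (non-missing) genotype call."""
    if not calls:
        raise ValueError("no samples")
    return sum(g is not None for g in calls.values()) / len(calls)


def _comparable_pairs(first: Calls, duplicate: Calls) -> Dict[str, Tuple[str, str]]:
    """Duplicated samples for which both runs produced a call."""
    return {
        s: (normalize(first[s]), normalize(duplicate[s]))
        for s in duplicate
        if first.get(s) is not None and duplicate[s] is not None
    }


def concordance(first: Calls, duplicate: Calls) -> float:
    """Fraction of duplicated samples whose two genotype calls agree."""
    pairs = _comparable_pairs(first, duplicate)
    if not pairs:
        raise ValueError("no comparable duplicate pairs")
    return sum(a == b for a, b in pairs.values()) / len(pairs)


def discordant_samples(first: Calls, duplicate: Calls) -> List[str]:
    """Samples to re-genotype or exclude because the duplicate calls disagree."""
    return sorted(s for s, (a, b) in _comparable_pairs(first, duplicate).items() if a != b)


if __name__ == "__main__":
    run1 = {"s1": "AA", "s2": "AC", "s3": None, "s4": "CC"}
    run2 = {"s1": "AA", "s2": "CA", "s4": "AC"}
    print(call_rate(run1))                 # 3 of 4 samples called
    print(concordance(run1, run2))         # 2 of 3 duplicate pairs agree
    print(discordant_samples(run1, run2))  # sample s4 disagrees
```

Normalizing allele order first avoids counting "AC" and "CA" as a discordance, which would otherwise inflate the apparent error rate.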

Statistical Analyses
a) Hardy-Weinberg Equilibrium (HWE) test. The HWE test was performed manually for each polymorphism using Pearson's chi-square test (Table S3 in File S1). HWE was not tested for the GSTM1 and GSTT1 gene deletions, as heterozygous genotypes cannot be detected by the genotyping method applied in this study.
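For a biallelic polymorphism, the HWE check amounts to a Pearson chi-square goodness-of-fit test with one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency). The Python sketch below is illustrative only; the function name and the use of `math.erfc` for the 1-df tail probability are our own choices, and the manual procedure used in the study may differ in details such as handling very small expected counts.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the two homozygotes and the heterozygote.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)) because X is a squared standard normal.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, not data from this study.
    print(hwe_chi_square(300, 200, 32))
```

A p-value below the chosen threshold would flag a deviation from HWE, which can indicate genotyping error as well as population structure.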
b) Survival endpoints. OS time was defined as the time from diagnosis of colorectal cancer until death from any cause. DFS time was defined as the time from diagnosis of colorectal cancer until the occurrence of metastasis, recurrence or death from any cause, whichever came first. Patients who did not experience the outcome of interest were censored at the time of last follow-up.

c) Variables. The categorical variables analyzed were sex (males vs females), tumor histology (mucinous vs non-mucinous), tumor location (rectal vs colon), stage (stages II, III and IV vs stage I), tumor grade (poorly differentiated/undifferentiated vs well/moderately differentiated), vascular and lymphatic invasion (present vs absent), familial risk (high/intermediate risk vs low risk), microsatellite instability status (MSI-H vs MSI-L/MSS) and BRAF1 Val600Glu mutation status (present vs wild type). For the discovery cohort, familial risk status had previously been determined by the NFCCR investigators using the Amsterdam II and revised Bethesda criteria [18]. Tumor MSI status and BRAF1 Val600Glu status had also been determined previously by the NFCCR [19,20,22]. Vascular and lymphatic invasion were highly correlated in the discovery cohort (>95% of tumors with vascular invasion also had lymphatic invasion). Note that in some models the confidence intervals for stage IV patients were wide, reflecting the small number of patients in this group; these results should therefore be interpreted cautiously.
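The OS and DFS definitions in (b), including censoring at last follow-up, translate directly into (time, event) pairs. The sketch below assumes a hypothetical per-patient record with calendar dates and converts days to years using an assumed 365.25 days per year; the record and field names are illustrative and are not taken from the NFCCR database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25  # assumption: average year length used to express times in years


@dataclass
class PatientRecord:
    """Hypothetical per-patient record; field names are illustrative only."""
    diagnosis: date
    last_follow_up: date
    death: Optional[date] = None
    recurrence: Optional[date] = None
    metastasis: Optional[date] = None


def _years(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_YEAR


def overall_survival(p: PatientRecord) -> Tuple[float, bool]:
    """(time, event): the event is death from any cause; otherwise censored at last follow-up."""
    if p.death is not None:
        return _years(p.diagnosis, p.death), True
    return _years(p.diagnosis, p.last_follow_up), False


def disease_free_survival(p: PatientRecord) -> Tuple[float, bool]:
    """(time, event): the first of recurrence, metastasis or death; otherwise censored."""
    events = [d for d in (p.recurrence, p.metastasis, p.death) if d is not None]
    if events:
        return _years(p.diagnosis, min(events)), True
    return _years(p.diagnosis, p.last_follow_up), False


if __name__ == "__main__":
    pt = PatientRecord(date(2000, 1, 1), date(2010, 1, 1),
                       death=date(2005, 1, 1), recurrence=date(2002, 1, 1))
    print(overall_survival(pt))       # death after about 5 years
    print(disease_free_survival(pt))  # recurrence after about 2 years
```

For the same patient, DFS time can never exceed OS time, since death is itself a DFS event.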
We categorized the genotypes for each polymorphism assuming the co-dominant genetic model (i.e. minor allele homozygotes and heterozygotes were each compared with the major allele homozygotes). In the case of the MTHFR Glu429Ala polymorphism, we also performed multivariable analyses under the recessive and dominant genetic models.

Table 1. Cont. … 10-22%; Chr 6, 43752536; T allele associated with lower plasma VEGF levels [74].
Notes: na: not available; MAF: minor allele frequency; VNTR: variable number of tandem repeats; 2R: 2 VNTR repeats; 3R: 3 VNTR repeats. The EGFR rs2227983 polymorphism is also known as rs11543848. The PTGS2 c.3618A/G polymorphism was excluded from analysis due to its low minor allele frequency. *Genotyped by MassARRAY® technology; **genotyped by gel electrophoresis of PCR-amplified fragments; ***genotyped by TaqMan® SNP genotyping assays; ****MAF information was retrieved from the dbSNP database [75]; *****MAFs are as reported in a published report [76]. The chromosomal locations of polymorphisms were extracted from the dbSNP database [75] (Genome Reference Consortium Human Build 37 patch release 5). doi:10.1371/journal.pone.0061469.t001

d) Univariate analyses. Time-to-event survival plots were constructed using the Kaplan-Meier method and compared using the log-rank test.
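A minimal, dependency-free Python sketch of the Kaplan-Meier product-limit estimator and the two-group log-rank test is given below for illustration; the study itself used PASW Statistics. Censored observations tied with an event time are kept in the risk set at that time (the usual convention), and the log-rank p-value uses the chi-square distribution with one degree of freedom. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations (events and censorings) sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(bool(data[i][1]))
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored subjects leave the risk set after time t
    return curve


def log_rank(times1, events1, times2, events2) -> Tuple[float, float]:
    """Two-group log-rank test: returns (chi-square statistic, p-value with 1 df)."""
    pooled = [(t, bool(e), 0) for t, e in zip(times1, events1)]
    pooled += [(t, bool(e), 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    var = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n1 = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        d1 = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical follow-up times in years (1 = event, 0 = censored).
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
    print(log_rank([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```

The estimator only steps down at event times; censored patients contribute to the denominator until they leave follow-up, which is why censoring must be coded rather than dropped.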
e) Multivariable analysis. The variables for the final multivariable models were selected separately for OS and DFS by backward elimination using Cox regression. The selected variables were then re-entered into the final models. The proportional hazards assumption was verified by examining log-minus-log (log(−log(S(t)))) plots. We also tested the interaction between the MTHFR Glu429Ala genotypes (co-dominant genetic model) and 5-FU treatment using Cox regression.
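To make the Cox regression step concrete, the sketch below fits a Cox proportional hazards model with a single covariate (for example, a 0/1 genotype indicator) by Newton-Raphson maximization of the partial likelihood, using the Breslow approximation for tied event times. It is a simplified, hypothetical illustration: the study's models were multivariable, were built by backward elimination, included an MTHFR × 5-FU interaction test and were fitted in PASW Statistics. In a multivariable model, such an interaction would enter as an additional covariate equal to the product of the genotype and 5-FU indicators. The hazard ratio corresponds to exp(beta).

```python
import math
from typing import Sequence, Tuple


def cox_single_covariate(
    times: Sequence[float],
    events: Sequence[int],
    x: Sequence[float],
    max_iter: int = 50,
    tol: float = 1e-10,
) -> Tuple[float, float]:
    """Return (beta, standard error) for h(t | x) = h0(t) * exp(beta * x).

    Uses the Breslow approximation for ties. Does not handle complete
    separation (beta diverging to infinity); real analyses should use a
    validated statistical package.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t in event_times:
            # Risk set: everyone still under observation just before time t.
            risk = [xj for tj, xj in zip(times, x) if tj >= t]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
            # Covariate values of the subjects who have the event at time t.
            dead = [xj for tj, ej, xj in zip(times, events, x) if tj == t and ej]
            d = len(dead)
            score += sum(dead) - d * s1 / s0        # derivative of the log partial likelihood
            info += d * (s2 / s0 - (s1 / s0) ** 2)  # minus its second derivative
        if info <= 0:
            raise ValueError("covariate does not vary within risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    # Hypothetical data: x = 1 for carriers of a putative risk genotype.
    b, se = cox_single_covariate([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 0, 1, 0, 0])
    print("HR =", math.exp(b), "95% CI =", math.exp(b - 1.96 * se), math.exp(b + 1.96 * se))
```

The log partial likelihood is concave in beta, so Newton-Raphson from beta = 0 normally converges in a few iterations when the maximum is finite.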
f) Stratified analyses. Because the MTHFR enzyme plays a biological role in 5-FU metabolism and efficacy (see Discussion), and the Glu429Ala polymorphism modifies MTHFR enzymatic activity, stratification by 5-FU treatment was performed only for this polymorphism. Since this polymorphism was not associated with DFS, no 5-FU-stratified analysis of DFS was performed.
g) Comparisons of cohorts. To test whether differences in baseline characteristics between the discovery and validation cohorts were significant, we used the chi-square test for categorical variables. Since age was not normally distributed in either cohort, the non-parametric Mann-Whitney U test was used to compare the age distributions of the two cohorts. Similar analyses were performed to compare the entire NFCCR cohort (n = 736) with the discovery cohort (n = 532), and the entire second cohort (n = 280) with the validation cohort included in our analysis (n = 252).
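For illustration, the two-sided Mann-Whitney U test used to compare age distributions can be sketched in Python with average ranks for ties and a normal approximation using the tie-corrected variance, without continuity correction. Statistical packages may use exact p-values for small samples or apply a continuity correction, so their p-values can differ slightly; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample a, two-sided p-value) via the normal approximation."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = n1 + n2
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every value tied: no evidence of a difference
        return u, 1.0
    z = (u - mean) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical ages at diagnosis (years), not study data.
    print(mann_whitney_u([55, 61, 48, 70, 66], [72, 68, 80, 59, 77]))
```

Because the test uses only ranks, it compares the two age distributions without assuming normality, which is the reason given in the text for choosing it.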
PASW Statistics 18 (release 18.0.2; IBM, NY, USA) was used for the statistical analyses. All tests were two-sided, and the significance threshold was set at p = 0.05. To avoid false-negative results, no correction for multiple testing was applied in the discovery cohort analysis. Although this also increases the potential number of false-positive associations, testing the associations detected in the discovery cohort in an additional patient cohort (i.e. the validation cohort) helped eliminate false-positive findings.
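The trade-off described above can be quantified with simple arithmetic. Assuming, as a simplification, 26 independent tests at α = 0.05 (one per analyzable polymorphism, although the co-dominant model actually involves two comparisons per polymorphism), about 1.3 false-positive associations would be expected by chance if no true associations existed, and a Bonferroni-corrected threshold would be 0.05/26 ≈ 0.0019. The Python sketch below, with hypothetical function names, applies both thresholds to the four discovery-cohort OS p-values reported in the Abstract; it illustrates why replication was needed and is not an analysis performed in the study.

```python
from typing import Dict, List


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of false positives if every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def passing(p_values: Dict[str, float], threshold: float) -> List[str]:
    """Names of tests whose p-value falls below the threshold."""
    return sorted(name for name, p in p_values.items() if p < threshold)


# Discovery-cohort OS p-values as reported in the Abstract.
discovery_os = {
    "MTHFR Glu429Ala": 0.036,
    "ERCC5 His46His": 0.01,
    "SERPINE1 -675indelG": 0.008,
    "GSTM1 deletion": 0.033,
}

if __name__ == "__main__":
    n = 26
    print(expected_false_positives(n))
    print(bonferroni_threshold(n))
    print(passing(discovery_os, 0.05))
    print(passing(discovery_os, bonferroni_threshold(n)))
```

All four associations pass the uncorrected threshold but none passes the Bonferroni threshold, so independent confirmation in a second cohort is the main safeguard against chance findings in this design.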

The Discovery Cohort Characteristics
Baseline characteristics of the discovery cohort are listed in Table 2. The median age at diagnosis was 61.4 years. By the time of last follow-up, one-third (33.3%) of the patients had died and 39% had experienced recurrence, metastasis or death. The discovery cohort had a significantly lower proportion of stage IV patients (9.8%) than the entire NFCCR cohort (20.9%) (p<0.001). The discovery cohort also differed significantly from the NFCCR cohort in the proportions of tumors with vascular (p = 0.007) and lymphatic invasion (p = 0.021), and in the proportion of deceased patients (p<0.001).

Overall Survival Analysis in the Discovery Cohort
Of the 26 polymorphisms investigated, four were significantly associated with OS when adjusted for sex, age, stage and MSI status (Table 3). Briefly, for the MTHFR Glu429Ala polymorphism, patients homozygous for the C allele had shorter OS than patients homozygous for the A allele (HR: 1.72, 95% CI: [1.04-2.84], p = 0.036) […] (Table S3 in File S1). In addition, patients with at least one copy of the GSTM1 gene had a greater risk of death than patients with no copy of the gene (HR: 1.40, 95% CI: [1.03-1.92], p = 0.033) (Table 3).

Disease-free Survival Analysis in the Discovery Cohort
Of the 26 polymorphisms, the ERCC5 His46His and OGG1 Ser326Cys polymorphisms were associated with shorter DFS in the discovery cohort when adjusted for other variables (Table 4).

The Validation Cohort Characteristics
Baseline characteristics of the validation cohort are listed in Table 2. The median age at diagnosis was 68.7 years. By the time of last follow-up, 61.5% of patients had died and 66.3% had experienced recurrence, metastasis or death. There were no statistically significant differences in clinical and molecular features between the initial 280 patients in this cohort and the 252 patients included in this study (data not shown).
However, the discovery and validation cohorts differed significantly in clinicopathological characteristics. First, the validation cohort had a higher proportion of stage IV patients (16.3%) than the discovery cohort (9.8%) (p = 0.034). Second, the median age at diagnosis was significantly higher in the validation cohort (68.7 years) than in the discovery cohort (61.4 years) (p<0.001). The two cohorts also differed in the proportions of patients by sex, tumor location, grade, OS and DFS status, vascular and lymphatic invasion, and treatment with 5-FU-based regimens (Table 2).

Overall Survival Analysis in the Validation Cohort
The genotype distributions of the four polymorphisms tested in the validation cohort did not deviate from HWE. Of these four polymorphisms, only MTHFR Glu429Ala was associated with shorter survival when adjusted for other variables (Table 3). However, in contrast to the discovery cohort, in the validation cohort the heterozygotes (Glu/Ala) had shorter survival than the homozygotes for glutamate (Glu/Glu) (HR: 1.71, 95% CI: [1.18-2.49], p = 0.005) (Figure 1). Thus, the genotype associated with worse survival in the validation cohort (AC) differed from the genotype associated with worse survival in the discovery cohort (CC). We therefore also performed multivariable analyses assuming the recessive and dominant genetic models in both cohorts (Table 3, Tables S4-S7 in File S1). In the discovery cohort, the MTHFR Glu429Ala polymorphism was associated with OS under the recessive genetic model (CC vs AC+AA; HR: 1.80, 95% CI: [1.13-2.86], p = 0.014), but not under the dominant genetic model. In contrast, in the validation cohort, this polymorphism was associated with OS under the dominant genetic model (CC+AC vs AA; HR: 1.56, 95% CI: [1.12-2.17], p = 0.009), but not under the recessive genetic model. Analysis of this polymorphism under the additive genetic model did not yield a significant association with OS in either cohort (data not shown). Thus, the association of the MTHFR Glu429Ala polymorphism with OS in these two cohorts was detected under different genetic models.
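The co-dominant, dominant, recessive and additive codings compared above can be expressed as indicator variables for a regression model. The Python sketch below follows the labeling used in the text (A = Glu429, the major allele; C = Ala429, the minor allele), so the recessive model contrasts CC with AC+AA and the dominant model contrasts CC+AC with AA. The function name and dictionary keys are hypothetical.

```python
from typing import Dict


def encode_genotype(genotype: str, model: str, minor_allele: str = "C") -> Dict[str, int]:
    """Indicator coding of a biallelic genotype (e.g. 'AA', 'AC', 'CC') under a genetic model."""
    g = genotype.strip().upper()
    if len(g) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    k = g.count(minor_allele.upper())  # number of minor alleles: 0, 1 or 2
    if model == "codominant":  # AC vs AA and CC vs AA (two indicators)
        return {"het": int(k == 1), "hom_minor": int(k == 2)}
    if model == "dominant":  # CC+AC vs AA
        return {"carrier": int(k >= 1)}
    if model == "recessive":  # CC vs AC+AA
        return {"hom_minor": int(k == 2)}
    if model == "additive":  # per-allele trend: 0, 1, 2
        return {"dose": k}
    raise ValueError(f"unknown genetic model: {model!r}")


if __name__ == "__main__":
    for gt in ("AA", "AC", "CC"):
        print(gt, {m: encode_genotype(gt, m) for m in ("codominant", "dominant", "recessive", "additive")})
```

Under this coding, a heterozygote-specific effect (as seen in the validation cohort) is captured by the dominant indicator but diluted in the recessive one, while a homozygote-specific effect (as in the discovery cohort) behaves the other way around, consistent with the pattern described above.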

Exploratory Analyses for Overall Survival and the MTHFR Glu429Ala Polymorphism
While this unusual pattern of association of the MTHFR Glu429Ala polymorphism with OS in the two cohorts may be explained by reduced statistical power to detect the effect of each genotype group, we also performed additional exploratory analyses to investigate other possibilities. First, to test whether the association of different genotypes could be due to the higher median age of the validation cohort compared with the discovery cohort (see Discussion), we performed a multivariable analysis in patients from the validation cohort who were under 75 years of age at diagnosis (n = 149). The pattern of association observed in this analysis (AC vs AA, HR: 2.02, 95% CI: [1.20-3.41], p = 0.008) was similar to that in the full validation cohort, suggesting that age at diagnosis was not the reason for this disparity. Second, since 5-FU efficacy might be modified by MTHFR enzyme activity (see Discussion), we also tested the association of the MTHFR Glu429Ala genotypes with OS in patient groups stratified by treatment (patients treated with 5-FU-based regimens versus patients not treated with 5-FU). Interestingly, when adjusted for age, stage and MSI status, this polymorphism was significantly associated with OS in patients not treated with 5-FU (both CC vs AA and AC vs AA in the discovery cohort, and AC vs AA in the validation cohort), but not in patients treated with 5-FU-based regimens (Tables 5 and 6). However, the interaction between the MTHFR Glu429Ala polymorphism and 5-FU treatment status was not statistically significant in either the discovery or the validation cohort.

Disease Free Survival Analysis in the Validation Cohort
In the multivariable analysis of DFS in the validation cohort, the association of the ERCC5 His46His polymorphism with DFS was also detected: compared with patients with the CC genotype, patients with the TT and CT genotypes had shorter DFS (HR: 1.81, 95% CI: [1.11-2.94], p = 0.018 and HR: 1.48, 95% CI: [1.02-2.17], p = 0.041, respectively) (Table 4).

Discussion
The main result of our study is that the associations of the MTHFR Glu429Ala polymorphism with the overall survival and the ERCC5 His46His polymorphism with the disease-free survival were detected in two separate cohorts of colorectal cancer patients.
Two other studies have also reported the MTHFR Glu429Ala polymorphism to be associated with OS in colorectal cancer patients. In one study conducted in a Spanish cohort [23], patients carrying the C allele (AC+CC genotype) had worse survival than those with AA genotype, when adjusted for clinicopathological variables. Similarly, in another study [24] performed in metastatic colon cancer patients, female patients with AC+CC genotypes for the MTHFR Glu429Ala had worse OS than female patients with AA genotype in univariate analysis. However, in six other studies, no association was observed between the MTHFR Glu429Ala polymorphism and OS in colorectal cancer in univariate or multivariable analyses [25][26][27][28][29][30].
The role of the MTHFR enzyme and its Glu429Ala polymorphism in colorectal cancer prognosis is not well understood. However, the biology of MTHFR has been investigated in great detail (e.g. its role in folate metabolism and in the mechanism of action of 5-FU). In addition, the Glu429Ala polymorphism has been shown to cause a moderate reduction in MTHFR enzyme activity: Ala/Ala homozygotes have close to 60% of normal MTHFR activity, and heterozygotes have <80% of normal activity [31,32].
In folate metabolism, one of the biological activities of the MTHFR enzyme is to convert 5,10-methylenetetrahydrofolate (5,10-MTHF) to 5-methyltetrahydrofolate (5-MTHF) [33]. 5,10-MTHF is predominantly used in the synthesis of purines and thymidine, the nucleotides used by dividing cells for DNA synthesis. In addition, 5-MTHF is used in the synthesis of S-adenosylmethionine (SAM), a key mediator of a number of methylation reactions, including DNA methylation [33]. Thus, reduced MTHFR enzymatic activity due to the Glu429Ala polymorphism may result in some accumulation of 5,10-MTHF and a concurrent reduction of 5-MTHF (Figure 2). Accumulation of the 5,10-MTHF form of folate may supply rapidly proliferating tumor cells with increased amounts of nucleotides for DNA synthesis. This theory is supported by recent reports suggesting that once a colorectal adenoma has developed, folate supplementation can aid its growth and progression [34][35][36][37], presumably by providing large amounts of nucleotide precursors for tumor growth [33,35,36]. In another study, folate supplementation was associated with progression of already developed colorectal cancer in rats [36]. Also, in the Aspirin/Folate Polyp Prevention Study [34,36], folate supplementation was associated with a higher risk of advanced adenomas as well as an increased number of adenomas in patients with previously developed colorectal adenomas. These findings suggest a negative effect of high folate levels on colorectal cancer prognosis. Therefore, in our cohorts, the association of the MTHFR Glu429Ala polymorphism with worse prognosis may be due to reduced MTHFR activity and the resulting accumulation of 5,10-MTHF, which can facilitate tumor growth.
In addition, inefficient MTHFR enzyme function (for example, due to the Glu429Ala polymorphism) may reduce the conversion of 5,10-MTHF to 5-MTHF, lowering 5-MTHF levels. This may ultimately decrease SAM synthesis (Figure 2). SAM is an important methyl donor for a large number of reactions, and its deficiency can induce DNA hypomethylation. In MTHFR gene knockout mice, SAM levels and the extent of DNA methylation were significantly reduced [38]. By reducing enzymatic activity, the MTHFR Glu429Ala polymorphism may lead to a similar, although less severe, consequence. For example, in a study of tumor cells from colon cancer patients, DNA hypomethylation was associated with unfavorable cancer-specific survival and OS [39]. Accordingly, DNA hypomethylation resulting from a Glu429Ala-mediated reduction in SAM levels may also have contributed to the unfavorable prognosis in our cohorts.
Another finding of our study was the association of different genotypes of the MTHFR Glu429Ala polymorphism with OS in two separate colorectal cancer patient cohorts. To explore potential causes of this disparity, we focused on differences between the discovery and validation cohorts in factors known to play a biological role in MTHFR activity or folate metabolism, namely age and treatment with 5-FU. Older individuals are known to have an impaired ability to absorb dietary folate [40], and the validation cohort in our study had a significantly higher median age than the discovery cohort (p<0.001). We therefore hypothesized that, combined with low folate absorption in older patients, the mildly reduced MTHFR activity of Glu429Ala heterozygotes might have been sufficient to contribute to the unfavorable prognosis observed in the validation cohort (in this case, Ala/Ala homozygotes would also be expected to have shorter OS; however, this association might have been missed due to insufficient power for the CC vs AA comparison). We therefore conducted a multivariable analysis in validation cohort patients who were <75 years of age at diagnosis (similar to the patients in the discovery cohort). These results also showed an association of the AC genotype of the MTHFR Glu429Ala polymorphism with OS compared with the AA genotype, similar to the association detected in the entire validation cohort. Therefore, increased age combined with the MTHFR Ala429 variant and their effect on folate metabolism is unlikely to explain the association of two different genotypes in our discovery and validation cohorts.
We then focused on 5-FU and its effect on folate metabolism. 5-FU is routinely used in colorectal cancer chemotherapy, and one of its anti-neoplastic mechanisms is the inhibition of thymidine synthesis. In this process, 5,10-MTHF stabilizes the complex required for inhibition of the thymidylate synthase (TYMS) enzyme [41]. At increased 5,10-MTHF concentrations, inhibition of thymidine synthesis, and thus the efficacy of 5-FU, is expected to increase; this has been demonstrated in vitro in human colon cancer cells [42]. However, multiple prognostic studies did not detect a statistical association of the MTHFR Glu429Ala polymorphism with response to 5-FU-based chemotherapy in colorectal cancer patients [25,26,43-49], suggesting that this polymorphism may not affect the efficacy of 5-FU-based treatments. In the present study too, MTHFR Glu429Ala was not significantly associated with OS in patients treated with 5-FU-based chemotherapy (although we cannot fully rule out insufficient study power to detect an effect). However, in our study, the MTHFR Glu429Ala polymorphism was associated with shorter OS in patients who were not treated with 5-FU, in both the discovery and the validation cohorts (Tables 5 and 6). On further analysis, we found that the majority of patients not treated with 5-FU had stage I or II disease (92%) and colon tumors (81.4%); such patients generally receive surgical treatment without 5-FU-based chemotherapy. These results therefore suggest that reduced MTHFR activity due to the Glu429Ala polymorphism may be associated with shorter OS, and thus that this polymorphism may be a promising adverse prognostic marker, in patients with early-stage colon cancer or in those not treated with 5-FU. Alternatively, other polymorphisms in strong linkage disequilibrium with MTHFR Glu429Ala may underlie this association (Methods S2 and Figure S1 in File S1).
While these results need to be confirmed in further studies, to our knowledge this is the first study to identify a potential prognostic significance of the MTHFR Glu429Ala polymorphism in colorectal cancer patients not treated with 5-FU.
In the present study, we also show that the TT genotype of the ERCC5 His46His polymorphism is associated with shorter DFS in the two colorectal cancer patient cohorts investigated (Table 4). To our knowledge, this is the first study to report an association of the ERCC5 His46His polymorphism with DFS in colorectal cancer patients. ERCC5 is one of the endonucleases functioning in nucleotide excision repair. ERCC5 His46His is a synonymous polymorphism that does not lie in a splice site, and its impact on ERCC5 protein function is uncertain. Previously, the TT genotype of this polymorphism was reported to be associated with short progression-free survival (PFS) in advanced colorectal cancer patients receiving oxaliplatin [49] and with short PFS and OS in stage I and II head and neck cancer patients receiving radiotherapy [50]. Radiotherapy-resistant lung cancer cells have also shown upregulation of ERCC5 [51]. Additionally, in a study of ovarian cancer patients treated with platinum-based chemotherapy, loss of heterozygosity (LOH) of ERCC5 and downregulation of this gene were associated with favorable PFS, presumably due to increased efficacy of these drugs [52]. Whether the His46His polymorphism causes upregulation or downregulation of ERCC5 is presently unknown, and functional characterization of the polymorphism is required to understand its potential prognostic role in colorectal cancer. Alternatively, since this synonymous polymorphism does not alter the amino acid sequence of the protein, it is also possible that another highly correlated polymorphism in the same LD block, rather than the polymorphism itself, has a biological impact on disease progression and survival (Methods S2 and Figure S2 in File S1).
Twenty-four of the 26 selected polymorphisms (excluding one polymorphism that was not included in the statistical analysis because of its low minor allele frequency) did not show an association with survival in our discovery cohort. Previously reported associations were thus not replicated in colorectal cancer patients from Newfoundland. Such a lack of concordance is a common observation in genetic prognostic studies, owing to substantial heterogeneity in cohort characteristics and study design. For example, differences in patient ethnicity, treatment, follow-up time and clinical characteristics are regarded as critical reasons for discordant results across study cohorts [53,54]. The discovery and validation cohorts used in this study are predominantly composed of Caucasian patients, with follow-up times of up to 10 years and the other clinicopathological features described in Table 2. These features may not be shared by the other published cohorts (Table S1 in File S1), which may have contributed to differences in the results. Finally, the discovery and validation cohorts differ in several demographic and clinicopathological features (sex, age, stage, grade, invasion status and 5-FU treatment status) and in OS and DFS follow-up times (Table 2). These differences, as well as the small sample size of the validation cohort, may also explain why no associations of the ERCC5 His46His, SERPINE1 −675indelG and GSTM1 gene deletion polymorphisms with OS were detected in the validation cohort. It is therefore possible that associations of these three polymorphisms with OS would be detected in other cohorts with characteristics similar to those of the discovery cohort. Alternatively, the associations detected in the discovery cohort could be false positives.
The limitations of this study are the dissimilarities between the discovery and validation cohorts, the bias of the discovery cohort towards early-stage patients, the small size of the validation cohort, the short follow-up time in the validation cohort (especially for DFS), the limited number of genes and polymorphisms investigated, and the limited gene coverage (i.e. other polymorphisms in these genes were not studied). The main strength of this study is the relatively large sample size of the discovery cohort. This is also one of the few studies in colorectal cancer that attempted to replicate results in an additional patient cohort.

Supporting Information
File S1. Supporting information.
Figure S1. The circled SNP is rs1801131 (MTHFR Glu429Ala), which lies in a 12 kb LD block. The black squares indicate other highly correlated SNPs (r² > 0.80).
Figure S2. The circled SNP is rs1046678 (ERCC5 His46His), which lies in a 23 kb LD block. The black squares indicate other highly correlated SNPs (r² > 0.80).
Table S1. n: number.
Table S2. * Assay ID by Applied Biosystems (CA, USA). Underlined are the probe sequences complementary to the alleles they recognize. Assays for rs1799750 in the MMP1 gene and rs1799889 in the SERPINE1 gene were custom designed. Assays for rs1801131 in the MTHFR gene and rs1047768 in the ERCC5 gene were predesigned by Applied Biosystems. Primer and probe sequences for these assays are not available because they are proprietary to Applied Biosystems. Seq: sequence.
Table S3. n/a: not applicable. Polymorphisms with a χ² value greater than 3.84 were considered to deviate from HWE (p < 0.05). *For these gene deletions, HWE was not calculated because the heterozygous genotype cannot be determined by the genotyping method applied. All polymorphisms were investigated in this study regardless of their deviation from HWE, as such deviations may also be attributed to the fact that the Newfoundland population is considered genetically isolated [49]. Nevertheless, it is worth noting that although the OGG1 Ser326Cys polymorphism, which deviated from HWE, was included in the DFS multivariable model of the discovery cohort, its genotype data were not available for the validation cohort patients. Thus, including this polymorphism in the DFS analysis of the discovery cohort does not affect the main conclusion of the DFS analysis, namely that the ERCC5 His46His polymorphism was associated with DFS in both the discovery and validation cohorts.
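The HWE criterion in Table S3 (χ² > 3.84, i.e. p < 0.05 with one degree of freedom) can be illustrated with the standard goodness-of-fit test for a biallelic SNP. The minimal sketch below assumes that all three genotype classes are observable, which is why the test cannot be applied to the GSTM1-type gene deletions noted above. The function names and genotype counts are hypothetical and are not the study's data.

```python
HWE_CRITICAL_CHI2 = 3.84  # chi-square critical value for 1 df at alpha = 0.05


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Chi-square goodness-of-fit statistic for Hardy-Weinberg equilibrium.

    Expected genotype counts are n*p^2, 2*n*p*q and n*q^2, where p is the
    reference-allele frequency estimated from the observed genotypes.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("monomorphic SNP: HWE test is undefined")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def deviates_from_hwe(n_hom_ref: int, n_het: int, n_hom_alt: int) -> bool:
    """Apply the chi-square > 3.84 rule used in Table S3."""
    return hwe_chi_square(n_hom_ref, n_het, n_hom_alt) > HWE_CRITICAL_CHI2


if __name__ == "__main__":
    # Hypothetical counts: p = 0.7, expected 49/42/9, chi-square about 0.23
    print(round(hwe_chi_square(50, 40, 10), 2), deviates_from_hwe(50, 40, 10))
    # Heterozygote deficit (same p = 0.7): chi-square about 27.4
    print(round(hwe_chi_square(60, 20, 20), 1), deviates_from_hwe(60, 20, 20))
```

The test has one degree of freedom because three genotype classes are constrained by the total count and one estimated allele frequency. A deficit of heterozygotes, as in the second example, is the pattern that population substructure or isolation can produce.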