Pre-Diagnostic Circulating Metabolites and Colorectal Cancer Risk in the Cancer Prevention Study-II Nutrition Cohort

Untargeted metabolomic studies have identified potential biomarkers of colorectal cancer risk, but evidence is still limited and broadly inconsistent. Among 39,239 Cancer Prevention Study-II Nutrition Cohort participants who provided a blood sample between 1998 and 2001, 517 newly diagnosed colorectal cancers were identified through 30 June 2015. In this nested case–control study, controls were matched 1:1 to cases on age, sex, race and date of blood draw. Mass spectrometry-based metabolomic analyses of pre-diagnostic plasma identified 886 named metabolites after quality control exclusions. Conditional logistic regression models estimated multivariable-adjusted odds ratios (OR) and 95% confidence intervals (CI) for the association of a 1 standard deviation (SD) increase in each metabolite with risk of colorectal cancer. Six metabolites were associated with colorectal cancer risk at a false discovery rate < 0.20. These metabolites were of several classes, including cofactors and vitamins, nucleotides, xenobiotics, lipids and amino acids. Five metabolites (guanidinoacetate, 2'-O-methylcytidine, vanillylmandelate, bilirubin (E,E) and N-palmitoylglycine) were positively associated (OR per 1 SD = 1.29 to 1.32), and one (3-methylxanthine) was inversely associated with CRC risk (OR = 0.79, 95% CI, 0.69–0.89). We did not replicate findings from two earlier prospective studies of 250 cases each after adjusting for multiple comparisons. Large pooled prospective analyses are warranted to confirm or refute these findings and to discover and replicate metabolites associated with colorectal cancer risk.


Introduction
Colorectal cancer (CRC) is a multi-factorial disease with many established lifestyle and behavioral risk factors, including smoking, diet, pharmacologic agents, and several variables related to energy balance and metabolism (e.g., diabetes, excess body weight, physical inactivity) [1][2][3]. Metabolomic profiling measures metabolic end products and exogenous exposures such as xenobiotics and it integrates influences of genetic variability. As such, metabolomic profiling is an ideal method to identify novel biomarkers of colorectal cancer risk. Studies have begun to identify circulating metabolomic features differentiating CRC patients from controls [4][5][6], but most collected blood after diagnosis.
The use of metabolomic methods to identify candidate biomarkers of CRC risk is still a new area of research, [6] with few prospective studies published to date [7,8]. Identification of metabolic dysregulation biomarkers prior to CRC diagnosis may eventually lead to objective risk assessment for potential targeted prevention measures and closer screening and follow-up. Metabolites associated with CRC risk when restricting the analysis to the first 5 years of follow-up may provide clues for high risk persons and early detection of adenomas, and lead to a better understanding of the mechanisms through which risk factors play a role in CRC. The purpose of this study was to conduct a comprehensive, exploratory analysis of putative metabolomic markers of CRC risk using mass spectrometry in a nested case-control study of 517 CRC patients and 517 matched controls with prediagnostic blood samples from the American Cancer Society's (ACS) Cancer Prevention Study (CPS)-II Nutrition Cohort.

Results
Participant characteristics are provided in Table 1. Due to matching, there were no differences comparing cases to controls by age, sex and race/ethnicity. Colorectal cancer cases had a significantly higher BMI and were less likely to have undergone colorectal cancer screening compared to controls. Cases also consumed more red meat and had a lower ACS dietary pattern score compared to controls. Associations of all 886 metabolites with CRC risk are provided in Supplementary Table S1. In Supplementary Figure S1, a heatmap illustrates interrelationships of the top 20 associated metabolites using raw p-values. Six metabolites from the multivariable-adjusted model met the criteria of false discovery rate (FDR) < 0.20 (Table 2). These included vanillylmandelate (VMA), a metabolite of epinephrine and norepinephrine and involved in tyrosine metabolism; 3-methylxanthine, a xenobiotic involved in xanthine metabolism; bilirubin (E,E), a heme breakdown product; N-palmitoylglycine, an acyl glycine; guanidinoacetate, an amino acid involved in creatine metabolism; and 2'-O-methylcytidine, a nucleotide involved in pyrimidine metabolism. The ORs for each metabolite with CRC risk were similar between minimally adjusted (including hours since last meal) and multivariable-adjusted models; therefore, only multivariable-adjusted models are presented. 3-Methylxanthine was inversely associated with risk; all other metabolites were associated with an increased CRC risk. Table 2 provides ORs and 95% CIs for each metabolite, for continuous (per SD) metabolites with and without mutual adjustment. In models simultaneously controlling for the other five metabolites, risk estimates for guanidinoacetate, vanillylmandelate, 3-methylxanthine, and 2'-O-methylcytidine changed least (raw p value < 0.05). Results using categorical variables (based on quartile distribution) are also presented.
Only one of these six metabolites had a statistically significant interaction with sex, with a stronger association observed in men than women for 2'-O-methylcytidine (p interaction = 0.04) (Table 3). When stratifying the analysis by follow-up time between blood draw and diagnosis (≤5 years, >5 years), most associations remained similar to those from the overall models, but a significant interaction was noted for N-palmitoylglycine, with stronger associations when diagnosis occurred within the first five years of follow-up (p interaction = 0.04) (Table 3). None of these six metabolites reached statistical significance in models stratified by tumor subsite. In analyses conducted separately by SEER stage, 2'-O-methylcytidine, 3-methylxanthine and guanidinoacetate were significantly associated with localized CRC tumors, and no metabolites were significantly associated with regional or distant-metastatic staged disease. However, associations were generally in the same direction regardless of tumor stage and tests of heterogeneity were all nonsignificant (p heterogeneity ≥ 0.05, not shown).

Discussion
In this study of 517 colorectal cancer cases and 517 matched controls, six metabolites among 886 identified were moderately associated with colorectal cancer risk at the FDR < 0.20. These metabolites covered a range of metabolite classes, including amino acids, fatty acids, cofactors, nucleotides and xenobiotics. Only one metabolite association differed by sex, and sex-specific associations were generally of the same magnitude, albeit with varying precision.
Few studies have used metabolomic approaches to identify pre-diagnostic metabolic biomarkers of CRC carcinogenesis. Of 9 studies included in a recent systematic literature review, [6] all but one analyzed biomarkers after CRC diagnosis, which can be biased by surgery, treatment, cancer progression, and changes in lifestyle after diagnosis. In these studies, [6] the number of cases ranged from 28 to 282, with 6 studies having fewer than 100 cases. Pathways that varied by case status included protein biosynthesis, urea cycle, alanine metabolism and glutathione metabolism; markers of energy and lipid metabolism also differed by case-control status. [6] Two studies that had access to pre-diagnosis blood samples (including one from the systematic review [8]) conducted untargeted metabolomic analyses using mass-spectrometry to identify biomarkers of CRC "exposotype" [7,8]. In an analysis of 254 cases and 254 matched controls in the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial cohort (PLCO), none of the 278 named metabolites measured by Metabolon's platforms (and present in 80% of the study participants) were associated with CRC risk after adjusting for multiple comparisons, although serum glycochenodeoxycholate, a bile acid metabolite, was associated with a five-fold increased risk among women only. [8] In the current analysis, glycochenodeoxycholate was positively associated with CRC risk, but did not meet the FDR threshold (OR = 1.18, 95% CI 1.03, 1.34, p = 0.015, FDR = 0.39; Supplementary Table S1). The association was positive in both sexes, but slightly stronger in women (OR = 1.21, 95% CI 1.0, 1.46, p = 0.050, FDR = 0.61; not shown). In the Shanghai Men's and Women's Health Study cohorts (SMHS and SWHS) including 250 cases and 250 controls, 35 of the 618 annotated metabolites were statistically significantly associated with CRC risk using FDR < 0.05 to define significance, of which nine metabolites were independently associated with risk [7]. 
The strongest independent metabolite in that analysis, picolinic acid, was not replicated in the current study (picolinate, OR = 1.01, 95% CI 0.88, 1.15, Supplementary Table S1). In SMHS and SWHS, the metabolites most strongly associated with risk were those that play a role in glycerophospholipid dysregulation, which may impact lipid profiles, energy balance, and insulin signaling. [7] We detected two lipids that were significantly inversely associated with risk in SMHS and SWHS: one (1-palmitoyl-2-docosahexaenoyl-GPC (16:0/22:6)) was not associated with risk (OR = 0.93, 95% CI 0.81, 1.06), and the other, 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPE (P-16:0/20:4), a plasmalogen, was inversely associated with risk (OR = 0.86, 95% CI 0.75, 0.98, p = 0.024) in our study but did not meet the FDR cutpoint (FDR = 0.44). The other six metabolites were not identified in this study. Thus, two previously identified metabolites, glycochenodeoxycholate [8], a secondary bile acid formed in the large intestine by microbial flora, and 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPE (P-16:0/20:4) [7], a plasmalogen, were significant at p < 0.05 without adjusting for multiple comparisons; these metabolites are candidates for further study.
There are several potential reasons why the studies to date have not robustly identified the same risk metabolites. These include different study populations, blood collection conditions, [9] the statistical methods employed, measurement error, limited power, and platform used. Other reasons may explain differences in statistical significance across studies. For example, in the PLCO cohort, Cross et al. [8] utilized a minimally adjusted model to identify metabolic signatures associated with CRC risk, whereas Shu et al. [7] controlled for many lifestyle variables in their models. In theory, these approaches answer a somewhat different question. Without covariates, the metabolomic signature could reflect biomarkers of multiple influences on CRC risk (genetic, lifestyle and other exposures); in the latter, controlling for known risk factors would more likely emphasize mechanisms distinct from known lifestyle risk factors, such as other environmental influences, pharmaceuticals and genetics. We present results using multivariable models to identify metabolites that are not obscured by differences in common behavioral characteristics. Nevertheless, although P-values differed, the ORs did not change in the present analysis from minimally adjusted to multivariable-adjusted models, suggesting minimal confounding from the covariables included in the models.
The metabolites identified in this study have biologically plausible roles in CRC carcinogenesis. A hallmark of cancer is impaired lipid metabolism, as was noted in the traditional case-control studies (where blood was collected after cancer diagnosis), and in the two prior nested case-control studies [7,8]. Changes in fatty acid profiles may indicate increased membrane synthesis and cellular turnover. In the current study, N-palmitoylglycine, a fatty acid (acyl glycine), was significantly positively associated with CRC risk, and was the only metabolite with a significantly stronger association within the first 5 years of follow-up. Whether N-palmitoylglycine is a risk factor involved in carcinogenesis or a biomarker of the cancer itself is not known.
Vanillylmandelate (VMA), an organic compound used in vanilla flavor synthesis and a byproduct of epinephrine and norepinephrine metabolism, and also involved in tyrosine metabolism, was positively associated with CRC risk in this study. Mandelate, a fecal metabolite related to VMA, was associated with a three-fold increase in colorectal cancer risk in a study designed to quantify technical variability of fecal metabolomics data from 48 cases and 102 controls. [10] As altered protein synthesis has been identified as present in cancer metabolism [11], the increased risk associated with both metabolites may be relevant.
Bilirubin, a degradation product of heme which is conjugated in the liver and excreted in the bile, is a metabolic marker of liver disease, and also known to have antioxidant and potentially cytotoxic effects [12,13]. In non-metabolomic-based studies, a case-control [12] and cross-sectional study [14] reported inverse associations between blood bilirubin levels and CRC risk, whereas bilirubin was not associated with colorectal cancer risk in a German prospective cohort study (RR = 1.40, 95% CI 0.93, 2.09) [15] or a prospective analysis from NHANES [16]. In the current analysis, a bilirubin metabolite (E,E) was associated with a 29% increased risk of CRC. Tobacco use is associated with lower bilirubin levels [14]; therefore, potential confounding by tobacco should be carefully ruled out in epidemiologic analyses of bilirubin and cancer risk. Our participants were mostly non-smokers and we controlled for smoking status.
A metabolite of caffeine and theophylline, 3-methylxanthine, was inversely associated with CRC risk in our study. 3-Methylxanthine is a biomarker of coffee consumption [17], and coffee consumption is inversely related to CRC risk in epidemiologic analyses [18,19]. Guertin et al. observed that caffeine and theophylline both mediated the inverse association of coffee with colorectal cancer risk in the PLCO cohort [20]. Whether this metabolite plays a mechanistic role in CRC prevention, or whether it reflects other lifestyle risk factors associated with coffee consumption, deserves further investigation.
Strengths of this analysis include its prospective design, large number of cases and controls, and ability to control for important CRC risk factors. Potential limitations include the one-time measurement of metabolites and processing delays in sample preparation. These factors may contribute to measurement error, which could attenuate the estimates of risk associations. However, in a previous study [21], we found good reproducibility for up to 48 h of processing delay for the five metabolites that were identified in both analyses. Some associations may also have been underestimated due to technical variation, and residual confounding due to between-person differences in fasting status, although we controlled for time since last meal to minimize the influence. Finally, with a relaxed false discovery rate p value cut-point, some results may be due to chance.
In summary, we identified six metabolites that were moderately associated with CRC risk in multivariable-adjusted models. These metabolites may reflect altered lipid and amino acid metabolism in carcinogenesis, and potentially other pathways including xenobiotic metabolism. Whether bilirubin and 3-methylxanthine reflect biologically meaningful mechanisms, or serve as biomarkers of exposure to CRC risk factors (e.g., red meat and coffee consumption), remains to be elucidated. To date, the limited number of prospective studies did not identify the same metabolite-CRC risk associations. Large pooled analyses of studies using similar laboratory and analytic methodology are warranted to identify and confirm candidate metabolites associated with CRC risk with greater statistical power.

Study Population and Design
Men and women in this study were participants in the CPS-II Nutrition Cohort, a subset of the 1.2 million participants in the CPS-II Cohort (1982) who resided in 21 U.S. states when they were invited to enroll in the CPS-II Nutrition Cohort for incident cancer follow-up, beginning in 1992-1993 [22]. Between 1998 and 2001, approximately 39,200 of these men and women provided a non-fasting blood sample. The Emory University School of Medicine Institutional Review Board approved all aspects of the CPS-II (Ethical Approval Code: IRB00045780).
Among those who provided blood samples, 617 CRC cases were identified through 30 June 2015 via self-report which was verified with medical records, state cancer registry linkage, or linkage with the National Death Index (defined by ICD-10 codes 18.0, 18.2-18.9, 19.9 and 20.9, excluding non-adenocarcinomas). After excluding 97 cases with prevalent cancer (except for nonmelanoma skin cancer) at or before blood draw, one case with an incorrect diagnosis date, and two cases with insufficient plasma, 517 cases remained [229 in men, 288 in women; 436 colon (204 proximal, 95 distal, 2 overlapping and 135 unknown), 74 rectum and 7 unknown subsite]. Controls were incidence-density matched 1:1 to cases on sex, race/ethnicity, age at blood draw (± 6 months) and date of blood draw (± 30 days).

Metabolomics Assessment
Metabolomic profiling was conducted by Metabolon, Inc. (Durham, NC, USA) using untargeted, ultrahigh performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) [17,23]. Briefly, methanol was added to precipitate protein, followed by centrifugation. Four sample fractions were dried and reconstituted in different solvents for measurement under four different platforms, including two separate reversed phase (RP)/UPLC-MS/MS methods with positive ion mode electrospray ionization (ESI), one RP/UPLC-MS/MS method with negative ion mode ESI and one hydrophilic interaction chromatography (HILIC)/UPLC-MS/MS with negative ion mode ESI. Individual metabolites were identified by comparison with a chemical library maintained by Metabolon that comprises more than 3300 commercially available purified standard compounds and recurrent unknown entities, based on retention index, mass to charge ratio, and fragmentation. Peaks were quantified using area-under-the-curve and day-to-day variation corrected by setting median values for each compound to 1 for each run-day and normalizing each data point proportionately. Missing values were assumed to reflect amounts below the level of detection and were imputed to the observed minimum of the non-missing values.
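The run-day correction and minimum-value imputation described above can be illustrated with a minimal sketch. It assumes that normalization precedes imputation and that the minimum is taken over the normalized values of each compound; neither detail is stated in the text. The record keys (`compound`, `run_day`, `sample_id`, `area`) and the toy values are hypothetical and do not represent Metabolon's actual data format.

```python
from collections import defaultdict
from statistics import median


def normalize_by_run_day(records):
    """Scale each compound so that its median equals 1 within every run-day.

    `records` is a list of dicts with hypothetical keys 'compound',
    'run_day', 'sample_id' and 'area' (None when the peak was not detected).
    """
    groups = defaultdict(list)
    for r in records:
        if r["area"] is not None:
            groups[(r["compound"], r["run_day"])].append(r["area"])
    medians = {key: median(vals) for key, vals in groups.items()}

    out = []
    for r in records:
        scaled = None
        if r["area"] is not None:
            scaled = r["area"] / medians[(r["compound"], r["run_day"])]
        out.append({**r, "scaled": scaled})
    return out


def impute_minimum(records):
    """Replace missing scaled values with the compound's observed minimum."""
    minimum = {}
    for r in records:
        if r["scaled"] is not None:
            c = r["compound"]
            minimum[c] = min(minimum.get(c, r["scaled"]), r["scaled"])
    # A compound with no detected values at all keeps None (nothing to impute from).
    return [
        {**r, "scaled": minimum.get(r["compound"]) if r["scaled"] is None else r["scaled"]}
        for r in records
    ]


# Toy data: one compound measured on two run-days, one undetected peak.
records = [
    {"compound": "X", "run_day": 1, "sample_id": "a", "area": 100.0},
    {"compound": "X", "run_day": 1, "sample_id": "b", "area": 200.0},
    {"compound": "X", "run_day": 1, "sample_id": "c", "area": 400.0},
    {"compound": "X", "run_day": 2, "sample_id": "d", "area": 10.0},
    {"compound": "X", "run_day": 2, "sample_id": "e", "area": None},
    {"compound": "X", "run_day": 2, "sample_id": "f", "area": 30.0},
]
normalized = normalize_by_run_day(records)
completed = impute_minimum(normalized)
```

Dividing by the run-day median puts measurements from different days on a common relative scale, which is why the two days in the toy data become comparable even though their raw peak areas differ by an order of magnitude.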
Colorectal cancer cases and controls were analyzed in the same batch in a blinded fashion. Replicate quality control samples from 29 study participants were included with the study samples and used to assess intra- and inter-batch variation in the metabolite measurements. A total of 1063 named metabolites were measured. Metabolites which were undetectable in >90% of the samples (n = 27) and those with a technical intraclass correlation coefficient (ICC) that was missing (n = 74) or <0.50 (n = 76) were excluded from the analyses, leaving 886 named metabolites with an average CV% 0.29 (interquartile range 0.18-0.35) and ICC 0.82 (interquartile range 0.74-0.92).
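The exclusion rules above amount to a simple per-metabolite filter. The sketch below is illustrative only: the dictionary keys `frac_undetected` and `icc` are hypothetical names, and the final lines merely check that the reported exclusion counts add up to the 886 metabolites retained.

```python
def passes_qc(metabolite, max_undetected=0.90, min_icc=0.50):
    """Return True if a metabolite survives the three exclusion rules.

    `metabolite` is a dict with hypothetical keys 'frac_undetected'
    (fraction of samples in which it was undetectable) and 'icc'
    (technical ICC, or None when it could not be computed).
    """
    if metabolite["frac_undetected"] > max_undetected:
        return False
    if metabolite["icc"] is None:
        return False
    return metabolite["icc"] >= min_icc


# Bookkeeping check using the counts reported in the text.
measured = 1063
excluded = {"undetectable_gt_90pct": 27, "icc_missing": 74, "icc_lt_0_50": 76}
retained = measured - sum(excluded.values())
```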

Statistical Analysis
Metabolites were log-transformed and auto-scaled to account for non-normal distribution, consistent with our previous studies. [17] Covariates were assessed either at blood draw (1998–2001) or on the 1999 follow-up survey.
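As a minimal illustration of the transformation, the sketch below applies a log transform followed by auto-scaling (centering to mean 0 and scaling to SD 1), so that a one-unit change corresponds to one SD. The natural logarithm and the sample (n - 1) standard deviation are assumptions; the text does not specify either.

```python
import math
from statistics import mean, stdev


def log_autoscale(values):
    """Log-transform positive values, then center and scale to unit SD."""
    logged = [math.log(v) for v in values]  # natural log is an assumption
    m, s = mean(logged), stdev(logged)      # sample SD (n - 1) is an assumption
    return [(x - m) / s for x in logged]


# Toy relative abundances for a single metabolite.
z = log_autoscale([0.5, 1.0, 2.0, 4.0, 8.0])
```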
We used conditional logistic regression to estimate the odds ratios (OR) and 95% confidence intervals (CI) per one standard deviation (SD) increase of each named metabolite with CRC risk. The statistical models were conditioned on the matching factors and adjusted for: hours since last meal (to account for length of fasting), body mass index (BMI, kg/m²), smoking, recreational physical activity, alcohol drinking, non-steroidal anti-inflammatory drug use, American Cancer Society diet guidelines score (higher scores represent greater consumption of vegetables, whole fruit and whole grains, and lower consumption of red and processed meat), [24] and total energy intake (see footnote to Table 2 for details).
Associations were considered statistically significant if the false discovery rate (FDR) [25] adjusted p value was <0.20; this relaxed p value has been used in similarly sized studies [26,27] to allow for generating hypotheses. Conditional logistic regression models were used to examine associations stratified by sex and by years between blood draw and CRC diagnosis (≤5 years of follow-up, >5 years of follow-up). The likelihood ratio test was used to calculate p for interaction by comparing the full model with interaction terms to a reduced model without interaction terms. We also examined risk according to CRC tumor subsite and Surveillance, Epidemiology and End Results (SEER) stage at diagnosis: localized [invasive tumors confined to the colorectum (n = 195 cases)]; regional [tumors that extend through the bowel wall to adjacent tissue or regional lymph nodes (n = 225 cases)] and distant metastases (n = 53 cases). The Wald p value for heterogeneity by tumor site and stage was estimated from an unconditional nominal polytomous logistic regression model using the model-based variance-covariance matrix estimate [28]. Analyses were conducted using R version 4.0.2 (The R Foundation for Statistical Computing, Vienna, Austria) [29], and SAS version 9.4 (SAS Institute, Cary, NC, USA).
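The FDR adjustment can be sketched as follows, assuming the Benjamini-Hochberg step-up procedure (the text cites reference [25] without naming the procedure, so this is an assumption). The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for four metabolites.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
flagged = [q < 0.20 for q in fdr]  # threshold used in this study
```

With 886 tests, a raw p-value must be small for its adjusted value to fall below 0.20, which is consistent with only six metabolites meeting the threshold.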

Supplementary Materials:
The following are available online at https://www.mdpi.com/2218-1989/11/3/156/s1, Table S1: Individual metabolite (n = 886) associations with colorectal cancer risk in the CPS-II Nutrition Cohort (n = 517 matched cases and controls), Figure S1: Interrelationships of top 20 CRC-associated metabolites based on raw p-values.
Funding: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study-II cohort. The study received no external funding.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of Emory University (Atlanta, GA, USA) Ethical Approval Code: IRB00045780, and those of participating registries as required.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data described in the manuscript and analytic code are not available to protect participant confidentiality and in adherence with institutional policies.