Identifying metabolic features of colorectal cancer liability using Mendelian randomization

Background: Recognizing the early signs of cancer risk is vital for informing prevention, early detection, and survival. Methods: To investigate whether changes in circulating metabolites characterize the early stages of colorectal cancer (CRC) development, we examined the associations between a genetic risk score (GRS) associated with CRC liability (72 single-nucleotide polymorphisms) and 231 circulating metabolites measured by nuclear magnetic resonance spectroscopy in the Avon Longitudinal Study of Parents and Children (N = 6221). Linear regression models were applied to examine the associations between genetic liability to CRC and circulating metabolites measured in the same individuals at age 8 y, 16 y, 18 y, and 25 y. Results: The GRS for CRC was associated with up to 28% of the circulating metabolites at FDR-P < 0.05 across all time points, particularly with higher fatty acids and very-low- and low-density lipoprotein subclass lipids. Two-sample reverse Mendelian randomization (MR) analyses investigating CRC liability (52,775 cases, 45,940 controls) and metabolites measured in a random subset of UK Biobank participants (N = 118,466, median age 58 y) revealed broadly consistent effect estimates with the GRS analysis. In conventional (forward) MR analyses, genetically predicted polyunsaturated fatty acid concentrations were most strongly associated with higher CRC risk. Conclusions: These analyses suggest that higher genetic liability to CRC can cause early alterations in systemic metabolism and suggest that fatty acids may play an important role in CRC development. Funding: This work was supported by the Elizabeth Blackwell Institute for Health Research, University of Bristol, the Wellcome Trust, the Medical Research Council, Diabetes UK, the University of Bristol NIHR Biomedical Research Centre, and Cancer Research UK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This work used the computational facilities of the Advanced Computing Research Centre, University of Bristol - http://www.bristol.ac.uk/acrc/.


Introduction
Colorectal cancer (CRC) is the third most frequently diagnosed cancer worldwide and the fourth most common cause of death from cancer (Ferlay et al., 2015;Clinton et al., 2018).There is a genetic component to risk of the disease, which is thought to explain up to 35% of variability in CRC risk (Huyghe et al., 2019;Czene et al., 2002;Lichtenstein et al., 2000).In addition, modifiable lifestyle factors, including obesity, consumption of processed meat, and alcohol, are thought to increase CRC risk (Clinton et al., 2018;Händel et al., 2020;McNabb et al., 2020;Lauby-Secretan et al., 2016;Gui et al., 2023).However, the underlying biological pathways remain unclear, which limits targeted prevention strategies.While CRC has higher mortality rates when diagnosed at later stages, earlystage CRC or precancerous lesions are largely treatable, meaning CRC screening programmes have the potential to be highly effective (Meester et al., 2020;Cardoso et al., 2021).Due to the lack of known predictive biomarkers for CRC, wide-scale screening (if implemented at all) is expensive and often targeted crudely by age range.Identifying biomarkers predictive of CRC, or with causal roles in disease development, is therefore vital.
One potential source of biomarkers for CRC risk is the circulating metabolome, which offers a dynamic insight into cellular processes and disease states.It is increasingly clear from mechanistic studies that both systemic and intracellular tumour metabolism play an important role in CRC development and progression (Qiu et al., 2009;Ward and Thompson, 2012).Interestingly, several major risk factors for CRC are known to have profound effects on metabolism (Rattray et al., 2017).For instance, obesity has been shown via conventional observational and Mendelian randomization (MR) analyses to strongly alter circulating metabolite levels (Gui et al., 2023;Singla et al., 2010;Papandreou et al., 2021;Ahmad et al., 2022).This suggests that the circulating metabolome may play a mediating role in the relationship between at least some common risk factors, such as obesity, and CRC -or at least might be a useful biomarker for disease or intermediates thereof.In particular, previous work has highlighted polyunsaturated fatty acids (PUFAs) as potentially having a role in CRC development.The term PUFA includes omega-3 and -6 fatty acids.Recent MR work has highlighted a possible link between PUFAs, in particular omega-6 PUFAs, and CRC risk (Haycock et al., 2023).Further investigating the relationship between CRC and circulating metabolites may therefore provide powerful insights into the causal pathways underlying disease risk or alternatively may be valuable in prediction and early diagnosis.
MR is a genetic epidemiological approach used to evaluate causal relationships between traits (Yarmolinsky et al., 2018;Smith and Ebrahim, 2003).This method uses genetic variation as a proxy measure for traits in an instrumental variable framework to assess the causal relevance of the traits in disease development.As germline genetic variants are theoretically randomized between generations and fixed at conception, this approach should be less prone to bias and confounding than conventional analyses undertaken in an observational context.Conventionally, MR is used to investigate the effect of an exposure on a disease outcome.In reverse MR, genetic instruments proxy the association between liability to a disease and other traits (Holmes and Davey Smith, 2019).This approach can identify the biomarkers which cause the disease, are predictive for the disease, or have diagnostic potential (Holmes and Davey Smith, 2019).Given the suspected importance of the circulating metabolome in CRC development, employing both reverse MR and conventional forward MR eLife digest Colorectal cancer, or bowel cancer, is the fourth most common cause of death from cancer worldwide.Understanding how the cancer develops and recognizing early signs is essential, as people who receive treatment early on have higher survival rates.
One way to boost early detection and disease survival rates is through identifying early colorectal cancer biomarkers.For example, metabolites produced when cells process nutrients have been shown to play a role in the development of colon cancer.Certain metabolites could therefore serve as biomarkers, which can be detected in routine blood tests.But first, scientists need to identify the exact metabolic processes involved in cancer development.Bull, Hazelwood et al. show that fat metabolites during early adulthood may help predict colorectal cancer risk.In the experiments, the team assessed the link between an individual's genetic risk for developing colorectal cancer and metabolites in their blood.By looking at data from over 6,000 individuals living in the UK, followed from early life into adulthood, they found higher fatty acid and low-density lipoprotein levels in young adults at risk of colorectal cancer.However, the results could not be replicated in a separate cohort study of middle-aged adults.The study of Bull, Hazelwood et al. shows that colorectal cancer risk indicators may be present from adolescence to around 40 years, before most individuals are diagnosed.The results suggest this may be a window for early detection and preventive interventions.It also highlights that differences in fat metabolism, possibly linked to genetic differences, may underlie colorectal cancer risk.More studies are needed to better understand how and whether interventions targeting fat levels may help prevent colorectal cancer development.
for metabolites in the same study may be an efficient approach for revealing causal and predictive biomarkers for CRC.Although previous observational studies have investigated associations between the circulating metabolome and CRC risk, these studies may have been influenced by confounding bias, which should be less relevant to MR analyses (Gao et al., 2023;Rattner et al., 2022;Leichtle et al., 2012;Ritchie et al., 2010;Nishiumi et al., 2012;Ma et al., 2012;Bertini et al., 2012;Zhang et al., 2014;Farshidfar et al., 2016;Farshidfar et al., 2012).Additionally, these studies focussed on adults, who commonly take medications which may confound metabolite associations, further complicating interpretations.
Here, we applied a reverse MR framework to identify circulating metabolites which are associated with CRC liability across different stages of the early life course (spanning childhood to young adulthood, when use of medications and CRC are both rare) using data from a birth cohort study.We then attempted to replicate these results using reverse two-sample MR in an independent cohort of middle-aged adults (UK Biobank).We then performed conventional 'forward' MR of metabolites onto CRC risk using large-scale cancer consortia data to identify the metabolites which may have a causal role in CRC development.

Study populations
This study uses data from two cohort studies: the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring (generation 1) cohort (individual-level data) and the UK Biobank cohort (summarylevel data); plus summary-level data from a genome-wide association study (GWAS) meta-analysis of CRC comprising the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), Colorectal Transdisciplinary Study (CORECT), and Colon Cancer Family Registry (CCFR).
ALSPAC is a population-based birth cohort study in which 14,541 pregnant women with an expected delivery date between 1 April 1991 and 31 December 1992 were recruited from the former Avon County of southwest England (Boyd et al., 2013).Since then, 13,988 offspring alive at 1 y have been followed repeatedly with questionnaire-and clinic-based assessments (Fraser et al., 2013;Northstone et al., 2019).Sufficient information was available on 6221 of these individuals to be included in our analysis, as metabolomics was not performed for all individuals in the ALSPAC study.Study data were collected and managed using REDCap (Research Electronic Data Capture) electronic data capture tools hosted at the University of Bristol (Harris et al., 2009) REDCap is a secure, webbased software platform designed to support data capture for research studies.Offspring genotype was assessed using the Illumina HumanHap550 quad chip platform.Quality control measures included exclusion of participants with sex mismatch, minimal or excessive heterozygosity, disproportionately missing data, insufficient sample replication, cryptic relatedness, and non-European ancestry.Imputation was performed using the Haplotype Reference Consortium (HRC) panel.Offspring were considered for the current analyses if they had no older siblings in ALSPAC (203 excluded) and were of white ethnicity (based on reports by parents, 604 excluded) to reduce the potential for confounding by genotype.The study website contains details of all available data through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/).UK Biobank is a population-based cohort study based in 22 centres across the UK (Sudlow et al., 2015).The cohort is made up of around 500,000 adults aged 40-80 years old, who were enrolled between 2006 and 2010.Genotyping data is available for 488,377 participants (Bycroft et al., 2018).Participants were genotyped using one of two arrays -either the Applied Biosystems UK BiLEVE Axiom Array by Affymetrix (now part of Thermo Fisher Scientific) or the closely related Applied Biosystems UK Biobank Axiom Array.Approaches based on principal component analysis (PCA) were used to account for population structure.Individuals were excluded if reported sex differed from inferred sex based on genotyping data; if they had sex chromosome karyotypes which were not XX or XY; if they were outliers in terms of heterozygosity and missing rates; or if they had high relatedness to another participant.Multiallelic SNPs or those with a minor allele frequency of below 1% were removed.Imputation was performed using the UK10K haplotype and HRC reference panels.
The GWAS meta-analysis for CRC included up to 52,775 cases and 45,940 controls (Huyghe et al., 2019;Huyghe et al., 2021).This sample excluded cases and controls from UK Biobank to avoid potential bias due to sample overlap which may be problematic in MR analyses (Burgess et al., 2016).
Cases were diagnosed by a physician and recorded overall and by site (colon, 28,736 cases; proximal colon, 14,416 cases; distal colon, 12,879 cases; and rectal, 14,150 cases).Colon cancer included proximal colon (any primary tumour arising in the caecum, ascending colon, hepatic flexure, or transverse colon), distal colon (any primary tumour arising in the pleenic flexure, descending colon, or sigmoid colon), and colon cases with unspecified site.Rectal cancer included any primary tumour arising in the rectum or rectosigmoid junction (Huyghe et al., 2019).Approximately 92% of participants in the overall CRC GWAS were white European (~8% were East Asian).All participants included in sitespecific CRC analyses were of European ancestry.Imputation was performed using the Michigan imputation server and HRC r1.0 reference panel.Regression models were further adjusted for age, sex, genotyping platform, and genomic principal components as described previously (Huyghe et al., 2019).

Assessment of CRC genetic liability
Genetic liability to CRC was based on single-nucleotide polymorphisms (SNPs) associated with CRC case status at genome-wide significance (p<5 × 10 −8 ).A total of 108 independent SNPs reported by two major GWAS meta-analyses were eligible for inclusion in a CRC genetic risk score (GRS) (Huyghe et al., 2019;Law et al., 2019).The set of SNPs was filtered, excluding 36 SNPs that were in linkage disequilibrium based on R 2 > 0.001 using the TwoSampleMR package (SNPs with the lowest p-values were retained) (Hemani et al., 2018).This left 72 SNPs independently associated with CRC (Supplementary file 1a), 65 of which were available in imputed ALSPAC genotype data post quality control.As GWAS of site-specific CRC have identified marked heterogeneity (Huyghe et al., 2021), GRS describing site-specific CRCs were constructed for sensitivity analyses using the same process outlined above.The GRS for colon cancer, rectal cancer, proximal colon cancer, and distal colon cancer were comprised of 38, 25, 20, and 24 variants, respectively (Supplementary file 1a).For overall CRC and site-specific CRC analyses, sensitivity analyses excluding any SNPs in the FADS cluster (i.e.within the gene regions of FADS1, FADS2, or FADS3) (Supplementary file 1a) were performed given a likely role for these SNPs in influencing circulating metabolite levels directly, in particular via lipid metabolism (i.e.not primarily due to CRC) (Lu et al., 2019;Zaytseva et al., 2018;ClinicalTrials. gov, 2022;Fhu and Ali, 2020;Chen et al., 2020;Kathiresan et al., 2009;Tanaka et al., 2009).

Assessment of circulating metabolites
Circulating metabolite measures were drawn from ALSPAC and UK Biobank using the same targeted metabolomics platform.In ALSPAC, participants provided non-fasting blood samples during a clinic visit while aged approximately 8 y, and fasting blood samples from clinic visits while aged approximately 16 y, 18 y, and 25 y.Proton nuclear magnetic resonance ( 1 H-NMR) spectroscopy was performed on ethylenediaminetetraacetic acid (EDTA) plasma (stored at or below -70°C pre-processing) to quantify a maximum of 231 metabolites (Würtz et al., 2017).Quantified metabolites included the cholesterol and triglyceride content of lipoprotein particles; the concentrations and diameter/size of these particles; apolipoprotein B and apolipoprotein A-1 concentrations; as well as fatty acids and their ratios to total fatty acid concentration, branched chain and aromatic amino acids, glucose and pre-glycaemic factors including lactate and citrate, fluid balance factors including albumin and creatinine, and the inflammatory marker glycoprotein acetyls (GlycA).This metabolomics platform has limited coverage of fatty acids.In UK Biobank, EDTA plasma samples from 117,121 participants, a random subset of the original ∼500,000 who provided samples at assessment centres between 2006 and 2013, were analysed between 2019 and 2020 for levels of 249 metabolic traits (168 concentrations plus 81 ratios) using the same high-throughput 1 H-NMR platform.Data pre-processing and QC steps are described previously (Würtz et al., 2017;Julkunen et al., 2021;Bycroft et al., 2018).To allow comparability between MR and GRS estimates, all metabolite measures were standardized and normalized using rank-based inverse normal transformation.For descriptive purposes in ALSPAC, body mass index (BMI) was calculated at each time point as weight (kg) divided by squared height (m 2 ) based on clinic measures of weight to the nearest 0.1 kg using a Tanita scale and height measured in light clothing without shoes to the nearest 0.1 cm using a Harpenden stadiometer.
CRC liability variants were combined into a GRS using PLINK 1.9, specifying the effect (risk raising) allele and coefficient (logOR) with estimates from the CRC GWAS used as external weights (Huyghe et al., 2019;Law et al., 2019).GRSs were calculated as the number of effect alleles (or dosages if imputed) at each SNP (0, 1, or 2) multiplied by its weighting, summing these, and dividing by the total number of SNPs used.Z-scores of GRS variables were calculated to standardize scoring.

Statistical approach
An overview of the study design is presented in Figure 1.To estimate the effect of increased genetic liability to CRC on circulating metabolites, we conducted a GRS analysis in ALSPAC and reverse twosample MR analyses in UK Biobank.Estimates were interpreted within a 'reverse MR' framework (Holmes and Davey Smith, 2019), wherein results are taken to reflect 'metabolic features' of CRC liability which could capture causal or predictive metabolite-disease associations.To clarify the direction of metabolite-CRC associations, we additionally performed conventional 'forward' two-sample MR analyses to estimate the effect of circulating metabolites on CRC risk using large-scale GWAS data on metabolites and CRC.

Associations of CRC liability with circulating metabolites in early life
Separate linear regression models with robust standard errors were used to estimate coefficients and 95% confidence intervals (95% CIs) for associations of GRSs with each metabolite as a dependent variable measured on the same individuals at age 8 y, 16 y, 18 y, and 25 y, adjusted for sex and age at the time of metabolite assessment.To aid interpretations, estimates were multiplied by 0.693 (log e 2) to reflect SD-unit differences in metabolites per doubling of genetic liability to CRC (Burgess and Labrecque, 2018).The Benjamini-Hochberg method was used to adjust p-values for multiple testing and an adjusted p-value of <0.05 was used as a heuristic for evidence for association given current sample sizes (Benjamini and Hochberg, 1995).Next, we performed a reverse Mendelian randomization analysis to identify metabolites influenced by CRC susceptibility in an independent population of adults.Finally, we performed a conventional (forward) Mendelian randomization analysis of circulating metabolites on CRC to identify metabolites causally associated with CRC risk.Consistent evidence across all three methodological approaches was interpreted to indicate a causal role for a given metabolite in CRC aetiology.
Reverse MR of the effects of CRC liability on circulating metabolites in middle adulthood 'Reverse' MR analyses (Holmes and Davey Smith, 2019) were conducted using UK Biobank for outcome datasets in two-sample MR to examine the effect of CRC liability on circulating metabolites.SNP-outcome (metabolite) estimates were obtained from a GWAS of metabolites in UK Biobank (Clayton et al., 2022;Borges et al., 2022a).Prior to GWAS, all metabolite measures were standardized and normalized using rank-based inverse normal transformation.Genetic association data for metabolites were retrieved using the MRC IEU UK Biobank GWAS pipeline ( Data. bris, 2022).Full summary statistics are available via the IEU Open GWAS project (Holmes and Davey Smith, 2019;Elsworth et al., 2020).Up to three statistical methods were used to generate reverse MR estimates of the effect of CRC liability on circulating metabolites using the TwoSampleMR package (Hemani et al., 2016): random-effects inverse variance weighted (IVW), weighted-median, and weightedmode, which each make differing assumptions about directional pleiotropy and SNP heterogeneity (Bowden et al., 2016;Hartwig et al., 2017).The IVW MR model will produce biased effect estimates in the presence of horizontal pleiotropy, that is, where one or more genetic variant(s) included in the instrument affect the outcome by a pathway other than through the exposure.In the weighted median model, each genetic variant is weighted according to its distance from the median effect of all genetic variants.Thus, the weighted median model will provide an unbiased estimate when at least 50% of the information in an instrument comes from genetic variants that are not horizontally pleiotropic.The weighted mode model uses a similar approach but weights genetic instruments according to the mean effect.In this model, over 50% of the weight of the genetic instrument can be contributed to by genetic variants which are horizontally pleiotropic, but the most common amount of pleiotropy must be zero (known as the Zero Modal Pleiotropy Assumption [ZEMPA]) (Hartwig et al., 2017).As above, estimates were multiplied by 0.693 (log e 2) to reflect SD-unit differences in metabolites per doubling of genetic liability to CRC (Burgess and Labrecque, 2018).

Forward MR of the effects of metabolites on CRC
Forward MR analyses were conducted using summary statistics from UK Biobank for the same NMRmeasured metabolites (SNP-exposure) and from GECCO/CORECT/CCFR as outlined above (SNPoutcome).We identified SNPs that were independently associated (R 2 < 0.001 and p<5 × 10 -8 ) with metabolites from a GWAS of 249 metabolites in UK Biobank described above.As before, we used up to three statistical methods to generate MR estimates of the effect of circulating metabolites on CRC risk (overall and site-specific): random-effects IVW, weighted median, and weighted mode.The Benjamini-Hochberg method was used to adjust p-values for multiple testing and an adjusted p-value of <0.05 was used as a heuristic for nominal evidence for a causal effect (Benjamini and Hochberg, 1995).MR outputs are beta coefficients representing the logOR for CRC per SD higher metabolite, exponentiated to reflect the OR for CRC per SD metabolite.
MR analyses were performed in R version 4.0.3(R Development Core Team, 2021) and GRS analyses in Stata 16.1 (StataCorp, College Station, TX).The ggforestplot R package was used to generate results visualizations (Scheinin et al., 2022).

Associations of CRC liability with circulating metabolites in early life
At the time the ALSPAC blood samples were taken, the mean age of participants was 7.5 y (N = 4767), 15.5 y (N = 2930), 17.8 y (N = 2613), and 24.5 y (N = 2559) for the childhood, early adolescence, late adolescence, and young adulthood time points, respectively.The proportion of participants which were male were 50.5,47.4,44.5,and 39.1% and mean BMI was 16.2,21.4,22.7,and 24.8 kg/m 2 for each time point, respectively.The socio-demographic profile of ALSPAC offspring participants has been reported previously (Boyd et al., 2013).The mean and standard deviation (SD) values for metabolites on each measurement occasion in ALSPAC are shown in Supplementary file 1b.
In the GRS analysis, there was no strong evidence of association of CRC liability with metabolites at age 8 y (Supplementary file 1c).At age 16 y, there was evidence for association with several lipid traits including higher cholesteryl esters to total lipids ratio in large low-density lipoprotein (LDL) (SD change per doubling CRC liability = 0.06, 95% CI = 0.02-0.10)and higher cholesterol in very small very low-density lipoprotein (VLDL) (SD change per doubling CRC liability = 0.06, 95% CI = 0.03-0.10).There was strong evidence for association with several traits at age 18 y including higher non-high-density lipoprotein (non-HDL) lipids, for example, a 1 doubling CRC liability was associated with higher levels of total cholesterol (SD change = 0.05 95% CI = 0.01-0.09),VLDL-cholesterol (SD change = 0.05, 95% CI = 0.01-0.09),LDL-cholesterol (SD change = 0.06, 95% CI = 0.02-0.09),apolipoproteins (apolipoprotein B [SD change = 0.06, 95% CI = 0.02-0.09]),and fatty acids (omega-3 [SD change = 0.08, 95% CI = 0.04-0.11],docosahexaenoic acid [DHA] [SD change = 0.05, 95% CI = 0.02-0.09])(Supplementary file 1c). Figure 2 (Figure 2-figure supplements 1-6) shows results for all clinically validated metabolites.At age 25 y, there was no strong evidence of association of CRC liability with metabolites.In anatomical site-specific analyses, there was strong evidence for association of liability to colon cancer with omega-3 (SD change = 0.07, 95% CI = 0.03-0.11)and DHA (SD change = 0.07, 95% CI = 0.03-0.10)at age 18 y.There was little evidence for any associations at any other CRC site or age (Supplementary file 1c).When SNPs in the FADS cluster gene regions were excluded due to possible horizontal pleiotropy given the role of FADS in lipid metabolism, there was a reduction in strength of evidence for an association of liability to CRC with any metabolite measured, although estimates were in a largely consistent direction with the prior analysis (Supplementary file 1d).

Reverse MR of the effects of CRC liability on circulating metabolites in middle adulthood
All instrument sets from the reverse MR analysis had an F-statistic greater than 10 (minimum F-statistic = 36, median = 40), suggesting that our analyses did not suffer from weak instrument bias (Supplementary file 1e).There was little evidence of an association of CRC liability (overall or by anatomical site) on any of the circulating metabolites investigated, including when the SNP in the FADS gene region was excluded, based on our pre-determined cut-off of FDR-P < 0.05; however, the direction of effect estimates was largely consistent with those seen in ALSPAC GRS analyses, with higher CRC liability weakly associated with higher non-HDLs, lipoproteins, and fatty acid levels (Supplementary file 1f and g). Figure 3 (Figure 3-figure supplements 1-3) shows the results for clinically validated metabolites.In subsite stratified analyses, there was strong evidence for a causal effect of genetic liability to proximal colon cancer on several traits, including total fatty acids (SD change per doubling of liability = 0.02, 95% CI = 0.01-0.04)and omega-6 fatty acids (SD change per doubling of liability = 0.03, 95% CI = 0.01-0.05).

Forward MR for the effects of metabolites on CRC risk
All instrument sets from the forward MR analysis had an F-statistic greater than 10 (minimum F-statistic = 54, median = 141), suggesting that our analyses were unlikely to suffer from weak instrument bias (Supplementary file 1h and i).There was strong evidence for an effect of several fatty acid traits on overall CRC risk, including of omega-3 fatty acids (CRC OR = 1.13, 95% CI = 1.06-1.21),DHA (OR CRC = 1.76, 95% CI = 1.08-1.28),ratio of omega-3 fatty acids to total fatty acids (OR CRC = 1.18, 95% CI = 1.11-1.25),ratio of DHA to total fatty acids (CRC OR = 1.20, 95% CI = 1.10-1.31),and ratio of omega-6 fatty acids to omega-3 fatty acids (CRC OR = 0.86, 95% CI = 0.80-9.13)(Supplementary file 1j, Figure 4, Figure 4-figure supplements 1-3).These estimates were overlapping with variable precision in MR sensitivity models.When SNPs in the FADS gene region were excluded, there was little evidence for a causal effect of any metabolite investigated on CRC risk based on the predetermined FDR-P cut-of off <0.05, although the directions of effect estimates were consistent with previous analyses (Supplementary file 1k).
In anatomical subtype-stratified analyses, evidence was the strongest for an effect of fatty acid traits on higher CRC risk, and this appeared specific to the distal colon, for example, omega-3 (distal CRC OR = 1.20, 95% CI = 1.09-1.32),and ratio of DHA to total fatty acids (distal colon OR = 1.29, 95% CI = 1.16-1.43).There was also evidence of a negative effect of ratio of omega-6 to omega-3 fatty acids (distal CRC OR = 0.80, 95% CI = 0.74-0.88)and a positive effect of ratio of omega-3 fatty acids to total fatty acids (distal CRC = 1.24, 95% CI = 1.15-1.35;seen also for proximal CRC OR = 1.15, 95% CI = 1.07-1.23)(Supplementary file 1j).These estimates were also directionally consistent in MR sensitivity models.

Discussion
Here, we used a reverse MR framework to identify circulating metabolites which are associated with genetic CRC liability across different stages of the early life course and attempted to replicate results in an independent cohort of middle-aged adults.We then performed forward MR to characterize the causal direction of the relationship between metabolites and CRC.Our GRS analysis provided evidence for an association of genetic liability to CRC with higher circulating levels of lipoprotein lipids (including total cholesterol, VLDL-cholesterol, and LDL-cholesterol), apolipoproteins (including apolipoprotein B), and fatty acids (including omega-3 and DHA) in young adults.These results were largely consistent in direction (though smaller in magnitude and weaker in strength of evidence) in a twosample MR analysis in an independent cohort of middle-aged adults.Results were attenuated, but consistent in direction, when potentially pleiotropic SNPs in the FADS gene regions were excluded.However, it should be noted that use of a narrow window for exclusion based on being within one of the three FADS genes may mean that some pleiotropic SNPs remain.Our subsequent forward MR analysis highlighted PUFAs as potentially having a causal role in the development of CRC.
Our analyses highlight a potentially important role of PUFAs in CRC liability.However, these analyses may be biased by substantial genetic pleiotropy among fatty acid traits.SNPs which are associated with levels of one fatty acid are generally associated with levels of many more fatty acid (and non-fatty acid) traits (Borges et al., 2022b).For instance, genetic instruments within the FADS cluster of genes will likely affect both omega-3 and omega-6 fatty acids, given FADS1 and FADS2 encode enzymes which catalyze the conversion of both from shorter chain into longer chain fatty acids (Borges et al., 2022b).In addition, the NMR metabolomics platform utilized in the analyses outlined here has limited coverage of fatty acids, meaning that many putative causal metabolites for CRC, for example, arachidonic acid, could not be investigated.Therefore, although our results indicate that PUFAs may be important in CRC risk, given the pleiotropic nature of the fatty acid genetic instruments and the limited coverage of the NMR platform, we are unable to determine with any certainty which specific classes of fatty acids may be driving these associations.
Our analyses featured evaluating the effect of genetic liability to CRC on circulating metabolites across repeated measures in the ALSPAC cohort.The mean ages at the time of the repeated measures were 8 y, 16 y, 18 y, and 25 y, representing childhood, early adolescence, late adolescence, and young adulthood, respectively, and therefore individuals in this cohort are unlikely to be taking metabolitealtering medication such as statins, and unlikely to have CRC.The strongest evidence for an effect of liability to CRC on metabolite levels was seen in late adolescence.The reason for this remains unclear.It is possible that this represents a true biological phenomenon if late adolescence is a critical window in CRC development or metabolite variability, which may be likely given the limited variance in metabolite levels at the later age of 25 y (Supplementary file 1b).The lack of an effect at the younger ages could be explained by the fact that the CRC GRS may capture many key life events or experiences which could impact the metabolome (e.g.initiation of smoking, higher category of BMI trait per doubling of genetic liability to colorectal cancer (purple, 8 y; turquoise, 16 y; red, 18 y; black, 25 y).Filled point estimates are those that pass a Benjamini-Hochberg FDR multiple-testing correction (FDR < 0.05).
The online version of this article includes the following figure supplement(s) for figure 2:       reached, educational attainment level set, etc.) but may not have yet happened at younger ages, thus obscuring an effect of genetic liability to CRC on the metabolome.Our results suggest that puberty could be important, with an effect seen seemingly particularly at the end of puberty.Repeating our analysis with sex-stratified data may aid in determining whether this is likely to be the case; sexstratified GWAS for metabolites are not currently available to replicate such analyses.An alternative explanation is selection bias due to loss of follow-up, leading to a change in sample characteristics over time.
Another key finding in the reverse MR analysis was that genetic liability to CRC was associated with increased levels of total cholesterol, VLDL-cholesterol, LDL-cholesterol, and apolipoprotein B, though we find little evidence for a causal effect of these traits on risk of CRC in the forward MR, replicating previous forward MR analyses for total and LDL-cholesterol (Gui et al., 2023;Rodriguez-Broadbent et al., 2017;Cornish et al., 2020;Luo et al., 2021).This suggests that these traits may either be only predictive of (i.e.non-causal for) later CRC development or may be influenced by the development of CRC and could have diagnostic or predictive potential.Given that the participants in the ALSPAC cohort are many decades younger than the average age of diagnosis for CRC (mean age 25 y in the latest repeated measure analysed in ALSPAC; whereas the median age at diagnosis of CRC is 64 y) (Granados-Romero et al., 2017), the former seems the most likely scenario.Previous conventional observational studies have presented conflicting results when investigating the association between measures of cholesterol and CRC risk, with some finding an inverse association and others a positive association, possibly reflecting residual confounding in conventional observational analyses (Park et al., 2000;Yang et al., 2022;Tian et al., 2015;Yao and Tian, 2015;Mayengbam et al., 2021;Mannes et al., 1986;Mamtani et al., 2016).Previous MR studies have had similar findings to our forward MR analysis, in that there seems to be little evidence for a causal effect of cholesterol on CRC development (Rodriguez-Broadbent et al., 2017;Cornish et al., 2020;Luo et al., 2021).One possible explanation for how circulating levels of total cholesterol, VLDL-cholesterol, LDL-cholesterol, and apolipoprotein B could predict (without necessarily causing) future CRC development could be linked to diet.A previous MR analysis suggested an effect of increased BMI on several measures of circulating cholesterol (Gui et al., 2023).Consuming a diet which is high in fat may increase CRC risk both through and possibly independently of adiposity, alongside increasing levels of circulating cholesterol (Ocvirk et al., 2019;Tang et al., 2012;Bell and Culp, 2022;Chiu et al., 2017;Han et al., 2018;Vincent et al., 2019).The potential for lipoprotein or apolipoprotein lipid measures in future CRC risk prediction should be further investigated.
Our analyses stratified by anatomical subsite highlighted fatty acids as being affected by genetic liability to colon and proximal colon cancer, with the forward MR confirming that fatty acid traits may be particularly important in the development of these subsites of CRC as well as distal colon cancer.
In our forward MR analyses, we were unable to replicate the findings of three previous MR studies which found evidence for a causal effect of circulating linoleic acid levels on CRC development in terms of strength of evidence, though the direction of the effect estimate was similar to previous studies (May Wilson et al., 2017;Khankari et al., 2020;Liyanage et al., 2019).This is surprising as all three previous analyses had a much smaller sample size than that included in our analysis (the largest had sample size of 24,748 for exposure vs 118,466 presently; and 11,016 cases and 13,732 controls for outcome vs 52,775 cases and 45,940 controls presently).Our analysis using updated metabolic trait per doubling of liability to colorectal cancer.Filled point estimates are those that pass a Benjamini-Hochberg FDR multiple-testing correction (FDR < 0.05).
The online version of this article includes the following figure supplement(s) for figure 3:     genetic instruments to proxy fatty acids may be more successful in accurately instrumenting heterogenous phenotypes such as metabolite levels compared with previous analyses.All other findings in our forward MR analysis are consistent with previous MR studies where they exist (Rodriguez-Broadbent et al., 2017;Cornish et al., 2020;Luo et al., 2021).

Limitations
The limitations of this study include firstly the relatively small sample size included in the ALSPAC analysis, which may have implications for power and precision.Secondly, mostly due to the longitudinal nature of the ALSPAC study, our sample at each time point is composed of slightly different individuals.This could be influencing our results and should be taken into account when comparing across time points.Thirdly, our analyses involving genetic instruments for CRC liability may have suffered from horizontal pleiotropy, even after excluding genetic variants in or near the FADS gene.Fourthly, our analyses were mostly restricted to white Europeans, which limits the generalizability of our findings to other populations.Fifthly, our analysis would benefit from being repeated with sex-stratified data.Although such GWAS results for metabolites are not currently available, the data to perform such GWAS are available in UK Biobank for future analyses.Sixthly, for our forward MR analysis, we used the UK Biobank for our exposure data.The UK Biobank has a median age of 58 at the time these measurements were taken, meaning statin use may be widespread in this population, which could be attenuating our effect estimates.Future work could attempt to replicate our analysis in a population with lower prevalence of statins intake.Finally, we included only metabolites measured using NMR.Confirming whether our results replicate using metabolite data measured with an alternative method would strengthen our findings.

Conclusions
Our analysis provides evidence that genetic liability to CRC is associated with altered levels of metabolites at certain ages, some of which may have a causal role in CRC development.Further investigating the role of PUFAs in CRC risk and circulating cholesterol in CRC prediction may be promising avenues for future research.and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses.EPICOLON: We are sincerely grateful to all patients participating in this study who were recruited as part of the EPICOLON project.We acknowledge the Spanish National DNA Bank, Biobank of Hospital Clínic-IDIBAPS, and Biobanco Vasco for the availability of the samples.

Figure 1 .
Figure1.Study design.First, linear regression models were used to examine the relationship between genetic susceptibility to adult colorectal cancer (CRC) and circulating metabolites measured in the Avon Longitudinal Study of Parents and Children (ALSPAC) participants at age 8 y, 16 y, 18 y, and 25 y.Next, we performed a reverse Mendelian randomization analysis to identify metabolites influenced by CRC susceptibility in an independent population of adults.Finally, we performed a conventional (forward) Mendelian randomization analysis of circulating metabolites on CRC to identify metabolites causally associated with CRC risk.Consistent evidence across all three methodological approaches was interpreted to indicate a causal role for a given metabolite in CRC aetiology.

Figure 2 .
Figure 2. Associations of genetic liability to adult colorectal cancer (based on a 72 single-nucleotide polymorphism [SNP] genetic risk score) with clinically validated metabolic traits at different early life stages among the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring (age 8 y [N = 4767], 16 y [N = 2930], 18 y [N = 2613], and 25 y [N = 2559]).Estimates shown are beta coefficients representing the SD difference in metabolic Figure 2 continued on next page

Figure supplement 1 .
Figure supplement 1. Associations of genetic liability to adult colon cancer with clinically validated metabolic traits at different early life stages among the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring (age 8 y, 16 y, 18 y, and 25 y).

Figure supplement 2 .
Figure supplement 2. Associations of genetic liability to proximal colon cancer with clinically validated metabolic traits at different early life stages among the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring (age 8 y, 16 y, 18 y, and 25 y).

Figure supplement 3 .
Figure supplement 3. Associations of genetic liability to distal colon cancer with clinically validated metabolic traits at different early life stages among the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring (age 8 y, 16 y, 18 y, and 25 y).

Figure supplement 4 .
Figure supplement 4. Associations of genetic liability to rectal cancer with clinically validated metabolic traits at different early life stages among the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring (age 8 y, 16 y, 18 y, and 25 y).

Figure supplement 5 .
Figure supplement 5. Associations of genetic liability to adult colorectal cancer (excluding rs174533) with clinically validated metabolic traits at different early life stages among the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring (age 8 y, 16 y, 18 y, and 25 y).

Figure supplement 6 .
Figure supplement 6. Associations of genetic liability to adult colon cancer (excluding rs174535) with clinically validated metabolic traits at different early life stages among the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring (age 8 y, 16 y, 18 y, and 25 y).

Figure supplement 1 .
Figure supplement 1. Associations of genetic liability to colorectal cancer with clinically validated metabolic traits in an independent sample of adults based on reverse two-sample Mendelian randomization analyses.

Figure supplement 2 .
Figure supplement 2. Associations of genetic liability to colorectal cancer (excluding genetic variants in the FADS gene region) with clinically validated metabolic traits in an independent sample of adults based on reverse twosample Mendelian randomization analyses.

Figure supplement 3 .
Figure supplement 3. Associations of genetic liability to colorectal and colon cancer with clinically validated metabolic traits in an independent sample of adults based on reverse two-sample Mendelian randomization analyses with FADS variants excluded from colorectal cancer instruments.

Figure 4
Figure 4 continued on next page Figure supplement 1. Associations of clinically validated metabolites with colorectal cancer by site (colorectal, colon, distal colon, proximal colon, and rectal cancer) based on conventional (forward) two-sample Mendelian randomization analyses.

Figure supplement 2 .
Figure supplement 2. Associations of clinically validated metabolites with colorectal cancer based on conventional (forward) two sample Mendelian randomization analyses with FADS variants excluded from metabolite instruments.

Figure supplement 3 .
Figure supplement 3. Associations of clinically validated metabolites with colorectal cancer by site (colorectal, colon) based on conventional (forward) two sample Mendelian randomization analyses with FADS variants excluded from metabolite instruments.

Figure 4 continued
Figure 4 continued Bull, Hazelwood et al. noted that many individuals in this older age group use fat-targeting drugs called statins, which may have obscured this connection.