Large-scale metabolomic profiling and incident non-alcoholic fatty liver disease

Summary Non-alcoholic fatty liver disease (NAFLD) is a highly prevalent disease with no specific drug therapy. High-throughput metabolomics present an unprecedented opportunity to identify biomarkers and potentially causal risk factors for NAFLD. Here, we determined the impact of 21 circulating metabolites, 17 lipids, and 132 lipoprotein particle characteristics on NAFLD combining prospective observational and two-sample Mendelian randomization (MR) analyses in 121,032 UK Biobank participants. We identified several metabolic factors associated with NAFLD risk in observational and MR analyses including triglyceride-rich and high-density lipoprotein particles composition, as well as the ratio of polyunsaturated fatty acids to total fatty acids. This study, is one of the largest to investigate incident NAFLD, provides concordant observational and genetic evidence that therapies aimed at reducing circulating triglycerides and increasing large HDL particles, as well as interventions aimed at increasing polyunsaturated fatty acid content may warrant further investigation into NAFLD prevention and treatment.


INTRODUCTION
Worldwide, up to 25% of the population could be affected by non-alcoholic fatty liver disease (NAFLD). NAFLD is one of the leading causes of serious liver complications, such as steatohepatitis (NASH), fibrosis, cirrhosis, liver failure, and hepatocellular carcinoma. 1,2 The prevalence of NAFLD and its associated sequelae has been increasing for the past decades, posing a serious threat to global population health. 3 NAFLD is characterized by an accumulation of lipids within hepatocytes (> 5% of hepatocyte mass). It is associated with abdominal adiposity and disturbances in hepatocyte lipid handling. 4 Therapeutic options are often limited to lifestyle changes, and effective and specific drug treatment for NAFLD is still an important unmet medical need.
In recent years, the development of high-throughput metabolomic platforms has provided unique opportunities to test the association of hundreds of metabolic measures with complex diseases such as NAFLD. Previous metabolomic and lipidomic clinical studies have identified biomarkers associated with NAFLD and NASH. 5 Metabolomic observational studies have found that the blood level of several amino acids is elevated in patients with NAFLD. 6 Lipidomic observational studies reported plasma lipid abnormalities in NAFLD such as higher levels of VLDL (very low-density lipoproteins) and LDL (low-density lipoproteins), as well as lower levels of HDL (high-density lipoproteins) cholesterol. 7,8 However, these studies had small sample sizes and had cross-sectional designs so reverse causality and confounding cannot be excluded. Whether these metabolic biomarkers are simple correlates or causal drivers of NAFLD remains unclear. Consequently, it is unknown whether therapies designed to alter their blood levels could prevent the onset of NAFLD.
Finding effective therapies for NAFLD has been hampered by the difficulty of finding modifiable causal targets. NAFLD is diagnosed histologically through a biopsy, which is invasive and carries some risk to patients, or with imaging, which is costly, limiting the ability of large-scale randomized control trial to assess drug target efficacy. Novel and innovative approaches for drug target prioritization such as Mendelian randomization (MR) analyses have been used for cardiometabolic diseases such as coronary artery disease (CAD), 9 but work using MR specifically for NAFLD remain sparse. MR leverages the power of large human The study included a prospective investigation in the UK Biobank, as well as a two-sample Mendelian randomization investigation into the impact of circulating metabolite with incident and prevalent non-alcoholic fatty liver disease, respectively. Related to STAR Methods. ll OPEN ACCESS 2 iScience 26, 107127, July 21,2023 iScience Article RESULTS

Observational analyses
The clinical characteristics of the study total sample and subsample with and without incident NAFLD are presented in Table 1. For observational analyses, we included all participants with nuclear magnetic resonance (NMR) data available (N = 121,731) and excluded all participants with an NAFLD diagnosis prior to enrollment (N = 699). Participants of the NMR subsample did not differ on key covariates from the total UKB sample ( Table 1). We identified NAFLD time of diagnosis using first occurrence data (hepatic fibrosis, NASH, NAFLD, and other specified diseases of the liver without ICD10 exclusion code). During the median 12.6-year follow-up, 2259 participants had incident NAFLD.
We calculated Cox proportional hazards to determine the association of 170 circulating metabolic factors with incident NAFLD. We included age at recruitment, sex, ethnicity, smoking status (ever smoked, never smoked), alcohol consumption, waist circumference as well as medication for blood pressure, diabetes, or cholesterol as covariates. Adjusted hazard ratios (HR) for NAFLD per one standard deviation (SD) increase of circulating metabolites ranged from 0.71 to 1.33. Given the high correlation between each of the exposures evaluated, we adjusted for multiple comparison using principal components. We applied a Bonferroni correction with a number of tests equal to the number of principal components that accounted for a cumulative 90% of the exposure variance (p < 0.05/8 = 0.0063). 112 metabolic factors had evidence for predicting incident NAFLD: 38 metabolites were negatively associated with NAFLD incidence and 74 metabolites were positively associated with NAFLD incidence (Table S1). Figures 2, 3 and 4 present the effect of metabolites, lipids, and lipoprotein concentrations on NAFLD using observational and univariable MR analyses.

Univariable Mendelian Randomization
We used univariable MR to assess the causal impact of genetically predicted NMR metabolic measures on NAFLD. Table S2 presents the GWAS study details. We selected all robust (p < 5e-8) and independent (linkage disequilibrium [LD] r 2 < 0.01 in a 1 Mb window) SNPs as genetic instruments for MR analyses. As a sensitivity analysis, we excluded SNPs 500 kb upstream and downstream of the GCKR region (2:27219709-28246554), which is known to exhibit pleiotropy. We used the Steiger's test to flag and remove genetic instruments that were more strongly linked with the outcome than the exposure. Table S3 presents association statistics for all genetic instruments. All F statistics for the genetic instruments were over 54, indicating excellent strength of genetic instruments (Table S4). We performed IVW-MR and robust MR analyses (MR-PRESSO, weighted median and contamination mixture) for each of the metabolic measures. We used the p value of the intercept test of MR-Egger regression to assess horizontal pleiotropy.  Figure 2 presents the association of 21 circulating metabolites with NAFLD risk in observational and univariable MR analyses. Most metabolites associated with incident NAFLD in observational analyses were not causally associated with NAFLD in MR analyses. Using 14 SNPs (r 2 = 0.65%; F-statistic = 54), acetate levels were negatively associated with NAFLD (OR 0.46 95% CI = 0.26-0.83, p = 9.3e-03). This association-which was in the opposite direction of the observational point estimate-remained consistent across robust MR analyses (Table S5) and the MR-Egger intercept did not differ significantly from zero (Table S6). However, this association did not pass multiple testing significance. Circulating levels of alanine and albumin were associated with NAFLD using IVW-MR, but results using MVMR, and robust MR were inconclusive. Figure 3 presents the association of 17 circulating lipids with NAFLD risk in observational and univariable MR analyses. In observational analyses, saturated and monounsaturated fatty acids were positively associated with incident NAFLD, whereas polyunsaturated fatty acids were not associated with incident NAFLD. The ratio of polyunsaturated fatty acids to total fatty acids strongly predicted incident NAFLD (HR 0.71 95% CI = 0.68-0.74, p = 6.1e-54). MR paralleled many of these results, for instance by reporting a strong positive effect of monounsaturated and saturated lipids on NAFLD, but no evidence for an association of polyunsaturated fat and NAFLD. Using a genetic instrument of 56 SNPs (r 2 = 4.86%; F-statistic = 105), the ratio of polyunsaturated fatty acids to total fatty acids was negatively associated with NAFLD risk (OR = 0.70 95% CI = 0.60-0.81, p = 1.8e-06). The omega-6 to omega-3 ratio, often used as a biomarker for cardiometabolic health, was not associated with incident NAFLD in observational and univariable MR analyses.  iScience Article lipoprotein subfractions. Investigating the association of lipoprotein concentrations and particle diameters, we observed that VLDL particle concentration and diameter were positively associated with NAFLD in observational and univariable MR analyses, LDL particle concentration and diameter were not associated with NAFLD, and large HDL particle concentration and diameter were negatively associated with NAFLD in observational and univariable MR analyses. These results were directionally consistent in robust MR analyses and Egger intercept indicated no significant horizontal pleiotropy.
As several triglyceride-rich lipoprotein characteristics were causally associated with NAFLD, we hypothesized that triglyceride levels may represent the main driver for these associations. Similarly, as several HDL particle content and cholesterol concentrations were causally associated with NAFLD, we hypothesized that HDL-cholesterol may represent the driver for these association. We therefore investigated the association of plasma triglyceride levels and HDL-cholesterol with among participants with these measures available and without NAFLD diagnosis prior to enrollment (458,048 study participants including 8571 incident NAFLD cases). Plasma triglycerides and HDL-cholesterol were associated with NAFLD incidence ( We also investigated the combined impact of high triglyceride/low HDL-cholesterol metabolic dyslipidemia in the full UK Biobank dataset. We categorized participants based on their lipid levels using clinical cut-off values, as suggested by the National Cholesterol Education Program-Adult Treatment Panel III (NCEP-ATPIII). For TG levels, the cut-off value was 1.7 mmol/L for both men and women and for HDL-C the cut-off value was 1.0 mmol/L for men and 1.3 mmol/L for women. Figure 5 presents the respective time to NAFLD diagnosis Kaplan-Meier curves of these groups. Individuals with metabolic dyslipidemia (high triglycerides and low HDL-cholesterol) were at higher risk of incident NAFLD. In multivariate adjusted Cox regression, participants with metabolic dyslipidemia had an HR of 1.83 (95% CI = 1.72-1.96, p = 7.7e-75) for incident NAFLD compared to participants with low triglycerides and high HDL-cholesterol levels (Table S9). Taken together, results from observational and MR analyses are consistent with a strong and iScience Article independent causal effect of triglycerides and HDL-cholesterol on NAFLD and suggest that the high triglyceride/low HDL-cholesterol metabolic dyslipidemia may be key risk factors for NAFLD onset.

Multivariable Mendelian randomization
In a previous MR study, our group identified waist circumference, a crude marker of abdominal adiposity, as a key driver for NAFLD risk. 12 Waist circumference has been linked, using MR, with circulating lipids and lipoproteins 11 in the same UK Biobank dataset suggesting that waist circumference could act as a confounding factor in the association between circulating metabolic factors and NAFLD. To remove the effect of waist circumference as a potential confounding factor, we performed multivariable MR. We used as genetic instruments SNPs robustly (pval < 5e-8) and independently (LD R 2 < 0.01) associated with each metabolic measure. We included GWAS summary statistics of waist circumference measured in the UK Biobank. These lipoproteins were selected based on the k-mean clustering analysis which identified two lipoprotein clusters (HDL particle structures and triglyceride levels). We selected relevant traits to enforce this finding. For example, we present triglycerides and triglyceride rich particle concentration as well as LDL and HDL particle concentration  -S3 and Table S10).
Altogether, we identified 29 metabolic factors that influence NAFLD risk. These associations pass multiple testing correction, are consistent across a range of robust MR analyses, are robust to the inclusion of waist circumference in MVMR analyses and do not display evidence of reverse causality with liver fat accumulation (Methods) (Figures S4-S6). Moreover, none of the genetic variants were flagged by Steiger filtering and the effect sizes remained similar when excluding SNPs in the GCKR gene region (Table S11). Using k-mean clustering on the phenotypic correlation, several of these metabolites are highly correlated and form three clusters ( Figure S7). The first cluster encompasses 11 lipoprotein factors related to HDL-cholesterol particles (minimum Pearson r = 0.77), the second encompasses 17 lipoprotein factors related to triglyceride levels (minimum Pearson r = 0.79) and the third is the ratio of PUFA on total FA. All associations are presented in a volcano plot in Figure 6.

DISCUSSION
This study, the largest to date to investigate incident NAFLD in the general population, examined the association of 21 blood metabolites, 17 lipids, and 132 lipoprotein characteristics with incident and prevalent NAFLD. Using a combination of prospective observational analyses in the UK Biobank and two-sample MR, we identified 29 metabolic factors associated with NAFLD. These associations pass multiple testing correction, are consistent across a range of robust MR analyses, are robust to the inclusion of waist circumference in multivariable MR analyses and do not display evidence of reverse causality with liver fat. These results prioritize triglycerides and triglyceride-rich lipoprotein concentrations, HDL particle structure and PUFA iScience Article to total FA ratio as key factors that may causally be involved in the onset of NAFLD, which may represent novel therapeutic and dietary strategies to prevent NAFLD. From a clinical standpoint, these results also suggest that the high triglyceride/low HDL-cholesterol metabolic dyslipidemia may be an important risk factor for incident NAFLD.
Previous studies revealed that individuals with NAFLD have higher levels of circulating tyrosine 15 and branched-chain amino acids (BCAAs: isoleucine, leucine, and valine) 16 compared to individuals without NAFLD. Similarly, our observational analyses in the UK Biobank support that individuals with high levels of these amino acids are at increased risk of developing NAFLD. However, our MR analyses did not support a causal role for BCAAs or tyrosine on NAFLD. The observational link between BCAAs and NAFLD could arguably be confounded by factors, such as insulin resistance which increase circulating amino acid levels by increasing muscle protein catabolism. 17 It is worth noting that some preclinical models support a causal role of BCAA in NAFLD development. For instance, overexpression of the phosphatase (PPM1K), which lowers circulating BCAA reduces hepatic steatosis in mice. 18 Our results support that circulating fatty acids, especially monounsaturated and saturated fatty acids may be positively associated with incident NAFLD and provide some evidence that this association may be causal. The PUFA/total fatty acids ratio was inversely associated with NAFLD risk in observational and MR analyses. Three overfeeding studies reported SFA to increase liver fat more than monounsaturated fat and PUFA. In a study investigating the relative effect of SFA and PUFA, 39 normal-weight individuals were overfed muffins high in saturated fatty acids (palm oil) or n-6 PUFAs (sunflower oil) for seven weeks.
Although weight gain was similar between groups, SFA increased liver fat by 40% whereas PUFA did not increase liver fat. 19 Later in a similar experiment performed in 60 individuals with an elevated body weight, SFA increased liver fat by 50% whereas PUFA did not increase liver fat. 20 Similarly, in 38 participants considered overweight, overfeeding 1,000 extra kcal/day for three weeks a diet rich in SFA increased liver fat by 55% whereas overfeeding monounsaturated fatty acids only increased liver fat by 15%. 21 The main limitation from these studies is short duration, leaving the possibility that diet does not have a long-term impact on NAFLD risk. Dietary intervention studies lasting several years are difficult to perform. Our MR results investigating genetically predicted circulating lipids enabled the assessment of lifelong exposure to various blood lipid concentrations. Results of these analyses support those of dietary interventions studies and confirm that diets rich in PUFA and low in SFA may contribute to NAFLD prevention/treatment.
Our results support that high levels of triglycerides may be linked with NAFLD incidence (with UK Biobank participants in the top triglyceride quintile being at more than 2-fold higher NAFLD risk compared to those of the bottom quintile) but also that this relationship may be causal. Triglycerides lowering may offer promise for NAFLD prevention. For example, the thiazolidinedione pioglitazone decreases steatosis by binding and activating peroxisome proliferator-activated receptor gamma in adipocytes to promote adipogenesis and fatty acid uptake in peripheral but not visceral fat. 22 In 55 randomly assigned patients with NASH, pioglitazone plus a hypocaloric diet, decreased triglyceride levels and hepatic fat content by 54% compared to hypocaloric diet plus placebo. 23 Similarly, in 101 patients with NASH hypocaloric diet plus pioglitazone, as compared with hypocaloric diet plus placebo resolved NASH in 51% of pioglitazone-treated patients vs. 19% of placebotreated patients. Hepatic triglyceride content and circulating triglyceride levels also both decreased. 24 Another triglyceride-lowering agent, fenofibrate, reduces circulating triglycerides primarily by increasing B-oxidation and triglyceride-rich lipoprotein catabolism 25 (whereas pioglitazone increase adipose tissue storage capacity). In several small clinical studies, fenofibrate significantly lowered circulating very low-density lipoprotein-TG (VLDL-TG) concentrations by as much as 48.9%, but the hepatic lipid accumulation remained unchanged. [26][27][28][29] Recently, the PROMINENT trial demonstrated that pemafibrate treatment reduced triglyceride levels by 26% and reduced investigator-reported 3.4 years incident NAFLD by 22% when compared to placebo in 10,497 patients with type 2 diabetes. 30 These studies, combined with the results of our study suggest that the mechanism of triglyceride-reducing drugs may be important for the prevention and/or treatment of NAFLD.
Although CAD and NAFLD are both associated with dyslipidemia, CAD lipoprotein/lipids causal factors may differ from NAFLD's. The levels of triglycerides and cholesterol are intertwined because of their shared metabolism. Alterations in both LDL and triglycerides are associated with CAD incidence; although drugs that reduce LDL have been successful in preventing CAD, drugs that reduce triglyceride levels have not shown consistent effects. 31 Multivariable MR analyses reached similar conclusions, prioritizing apolipoprotein B, a key determinant of atherogenic lipoprotein particles, as the main causal factor for CAD. 9,32 Our MR analysis revealed no causal effect of LDL cholesterol on NAFLD risk, but a large effect of triglycerides on NAFLD risk. Hence, among lipids, LDL-C/APOB is the main causal factor for CAD, but triglycerides and HDL particle composition appear to be important causal factors for NAFLD.
The main strength of this paper is the combination of observational and MR analyses and the use of the largest study sample to date to investigate the relationship between blood metabolites, lipids, and lipoproteins with incident (rather than prevalent) NAFLD. At the present time, MR methods cannot analyze time-to-event data (although this is an area of intensive research). Hence, the two methods are complementary. MR enables the study of lifelong genetic effect of metabolic factors on NAFLD in a large European sample, whereas prospective observational analyses evaluate predictive value of metabolic factors on incident NAFLD.
In conclusion, we have screened 170 metabolic factors for their association with incident NAFLD using a combination of observational and MR analyses. This analysis revealed that many of these metabolites ll OPEN ACCESS iScience 26, 107127, July 21, 2023 9 iScience Article such as BCAAs are associated with incident NAFLD but that they may not be causal risk factors for NAFLD. On the other hand, several plasma lipids and lipoproteins provided concordant effects on NAFLD in observational and MR analyses. Our results suggest that therapies aimed at reducing circulating triglyceride-rich lipoprotein concentrations or increasing HDL particles as well as interventions targeting dietary fat quality, for instance by increasing polyunsaturated over other fats, could prevent the onset of NAFLD.

Limitations of the study
Our findings should be considered within the study limits. First, he NAFLD GWAS included $8000 cases and $700,000 controls and there were 11,308 cases and 491,103 controls in the entire UK Biobank sample, but the population prevalence of NAFLD in the UK has been estimated to be around 25%. Furthermore, we used liver fat percentage measured by liver magnetic resonance imaging in the UK Biobank 33,34 to derive a more accurate NAFLD prevalence estimate in the UK Biobank. The proportion of individuals with a liver fat percentage > 5% was 10967/40533 or 27% of the sample. Hence, it is highly probable that some controls may have been misclassified. While it is important to acknowledge this limitation, we believe that such misclassification could bias our results toward the null and underestimate the strength of the reported associations. Second, our MR analysis was restricted to participants of European ancestry only, because of the scarce availability of genome-wide summary statistics on non-European individuals. While this might restrict the generalizability of our results to this group, our observational analyses, performed without excluding non-white individuals, yielded similar conclusions. Finally, the omega-6 to omega-3 ratio was not associated with incident NAFLD. We did not have access to EPA/DHA ratio and hepatic measures, which may explain the null association.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

ACKNOWLEDGMENTS
We would like to thank all study participants as well as all investigators of the UK Biobank, as well as the studies that were used throughout the course of this investigation.

Lead contact
Further information and request for resources should be directed to and will be fulfilled by the lead contact, Benoit J. Arsenault (benoit.arsenault@criucpq.ulaval.ca).

Materials availability
This study did not generate new material or reagent.
Data and code availability d All genome-wide summary statistics used in this study are in the public domain.
d All original code has been deposited on GitHub and is publicly available as of the date of publication at https://github.com/LaboArsenault/NMR_NAFLD.git d Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Study participants
The UK Biobank is a population-based cohort of approximately 500,000 participants aged 40 to 69 years recruited during 2006-2010 from several centers across the UK. Data access permission for this study was granted under UKB application 25205. NMR metabolic biomarkers have been quantified in approximately one third of randomly selected participants in the UK Biobank. 14 Metabolites from non-fasted plasma samples were measured using high-throughput nuclear magnetic resonance (NMR) (Nightingale Health Ltd; biomarker quantification version 2020). A total of 121,074 participants (46% men, 94% White) were retained for analyses after removing duplicates and observations not passing quality control (QC) (i.e., sample QC flag ''low protein,'' biomarker QC flag ''technical error'' or samples with insufficient material). NMR platform quantifies 168 metabolic measures and 81 derived ratios. We included all metabolic measures and two ratios: the ratio of polyunsaturated fatty acids over all fatty acids and the ratio of Omega 6 fatty acids over Omega 3 fatty acids. The NMR platform utilization has been previously described. 14 We used the ukbnmr package for quality control and removal of technical variation of the metabolites. 35

Observational analyses
We used first occurrences dates for liver disease ("K74," "K75," "K760," ICD-10) provided by UKB until October 1, 2021. These data were retrieved from death registries, primary care information (available for $45% of UKB participants), hospital admission medical reports and self-report data. We then used Hospital Episode Statistics data to refine our variable to ICD10 code at three levels by excluding participants with hospital inpatient exclusion code (all K.74 code except K74.0 and K74.2 (hepatic fibrosis), all K75 code except K75.8 (NASH), all K76 code except K76.0 (NAFLD) and ICD10: K76.9 (other specified diseases of the liver)). This procedure allows the exclusion of participants with liver disease that were not related to NAFLD (alcoholic liver disease, viral hepatitis, alpha-1 anti trypsin deficiency, etc.). NAFLD Cox regression results were obtained from models adjusted for variables at recruitment including age, sex, smoking status (never/ever smoked), alcohol consumption, Townsend's deprivation index, ethnicity, waist circumference, and metabolic syndrome medication dichotomized (do you presently take any of the following medications: cholesterol-lowering medication, blood pressure medication or insulin). Results are presented as estimated hazard ratios (HR) for incident NAFLD diagnosis per SD increase of circulating metabolites, with 95% confidence intervals. We also plotted Kaplan-Meier survival curves for triglyceride and HDLcholesterol quintiles as well as for the combination of the two risk factors. iScience Article ancestries-based principal components. We then meta-analyzed four cohorts (The eMERGE network, the UK Biobank, the Estonian Biobank and FinnGen) using METAL. 37 Liver Fat: GWAS summary statistics for liver fat measured using magnetic resonance imaging were obtained from a GWAS of 32,860 white British participants from the UK Biobank. 38 Magnetic resonance scans were annotated by trained radiologists following a standard procedure. Deep learning algorithms were then applied to estimate liver fat. Liver fat was inverse-normal transformed and adjusted for genetic sex, age, age squared, the first 10 principal components of genetic ancestry, scaled scan date, scaled scan time, and study center as fixed effects and genetic relatedness as a random effects term. Using BOLT-LMM, the residuals were then regressed on gene carrier status. Waist circumference: GWAS summary statistics for waist circumference were obtained from the UK Biobank from 462,166 Europeans. GWAS was conducted using linear mixed model (LMM) association method as implemented in BOLT-LMM (v2.3) correcting genotype array, sex and the ten first genetic principal components. The resulting residuals were inverse-normal transformed prior to GWAS. Circulating metabolic measures: GWAS summary statistics for 170 circulating metabolites were obtained from 114,999 UK Biobank participants of European ancestry. 39 GWAS was conducted using LMM association method as implemented in BOLT-LMM (v2.3) correcting genotype array, sex and the ten first genetic principal components. All measures were standardized and normalized prior to analyses using rank-based inverse-normal transformation. These summary statistics were accessed through the IEU Open GWAS Project.

Univariable MR analyses
For univariable MR analyses, we included as genetic instruments all genome-wide significant SNPs (p value <5e-8). We then ensured their independence by clumping using 1 Mb window with a LD r 2 <0.01 using the European 1000-genome LD reference panel. We excluded genetic regions that are known to be widely pleiotropic with a primary effect on the liver to take MR assumptions into consideration. For this reason, we removed the GCKR region as a sensitivity analysis. GCKR is primarily expressed in the liver, where it regulates lipogenesis, glycogenesis and B-oxidation. A non-synonymous genetic variant in the GCKR region is known to influence the levels of a wide range of metabolites. 40 Table S2 contains relevant association statistics for genetics instruments of each exposure. We harmonized the exposure and outcome datasets by aligning the effect sizes of different studies on the same effect allele. All GWAS summary statistics were reported on the forward strand. When a particular SNP was not present in the outcome datasets, we used a proxy SNP (r 2 > 0.8) obtained using LD matrix of European samples from the 1000 Genomes Project.
We performed the inverse variance weighted (IVW) method with multiplicative random effects with a standard error correction for under dispersion as primary MR analysis. MR must respect three core assumptions (relevance, exchangeability, and exclusion restriction) to assert correct causal inference. We evaluated the relevance assumption by assessing instrument strength using the F-statistic, 41 and the variance explained using the r 242 (Table S6). The exchangeability and the exclusion restriction assumption are unverifiable but can be questioned when heterogeneity in the estimate is present. We quantified heterogeneity in the exposure outcome association using Cochran's Q. We also perform four different robust MR analyses (the MR Egger, 10 the contamination mixture, 43 the weighted median, and the MR-PRESSO 44 ) to explore whether the results differed from the primary MR results. These robust methods, their assumptions and their statistical properties have been extensively reviewed elsewhere. 45 Given the high correlation between each of the exposures evaluated, we adjusted for multiple comparison using principal components. We applied a Bonferroni correction with a number of tests equal to the number of principal components that accounted for a cumulative 90% of the exposure variances. In our dataset, the first eight principal component accounted for 90% of the exposure's variance. Therefore, we applied a Bonferroni correction for 8 tests throughout the manuscript. We considered as potential causal association the MR results with significant primary MR analysis after adjustment for multiple testing, with a majority of robust MR analyses significant at p < 0.05, with all estimates directionally consistent across methods, with Egger intercept p < 0.05, with robustness to the inclusion of waist circumference in MVMR, and with no evidence for reverse causality with liver fat.
We evaluated reverse causality by performing Steiger filtering and reverse MR. The Steiger test provides a p value under the null hypothesis that the difference in variance explained is null. 46 We systematically removed the variants tagged by the Steiger test. As another test of reverse causality, we performed reverse MR. It is not recommended to perform MR with a binary factor as exposure. 47 The binary factor NAFLD, ll OPEN ACCESS iScience 26, 107127, July 21, 2023 15 iScience Article which is diagnosed when liver fat percentage is above 5%, is akin to a dichotomization of the underlying continuous factor ''liver fat''. We therefore estimated the causal effect of the continuous variable ''liver fat'', and not NAFLD, in reverse MR.

Multivariable MR analyses
For multivariable MR, we included as genetic instruments the SNPs that were previously selected in univariable MR for the exposure (i.e., metabolites, lipid concentrations and lipoprotein characteristics). We used the same set of SNPs for the covariate (i.e., waist circumference) and harmonized the effect allele with the exposure SNPs. When an SNP was not present, either in the exposure or in the outcome dataset, we used a proxy SNP (r 2 > 0.8) instead. As primary multivariable MR analysis, we performed the IVW method. Like univariable MR, Multivariable MR must include valid instruments that respect the three core assumptions. In contrast to univariable MR, multivariable MR allows genetic variants to be associated with multiple risk factors (pleiotropy) given that these risk factors are included in the analysis. We evaluated the relevance assumption by quantifying instrument strength using the conditional F-statistics 48 in the MVMR V.0.2.0 package. 48 We quantified heterogeneity in the exposure outcome association correcting for covariates using Cochran's Q. 13 We also performed two different robust MR analyses: the multivariable median method and the multivariable MR-Lasso method. 49 Like robust univariable MR analyses, consistent estimates across different methods give confidence in the robustness of the causal findings. We considered as potential causal association the MR results that were supported by primary MVMR analysis and robust MVMR analyses (MVMR-Weighted median method p < 0.05, MVMR-Egger intercept p > 0.05, MVMR-Lasso p < 0.05) and all estimates directionally consistent.

Institutional review board approval
The UK Biobank received ethical approval from the Research Ethics Committee (REC reference for the UK Biobank is 11/NW/0382). All GWAS summary statistics were publicly available and accessible through URL. For all included genetic association studies, all participants provided informed consent and study protocols were approved by their respective local ethical committee.

ll
OPEN ACCESS