Systematic review and meta-analyses: What has the application of Mendelian randomization told us about the causal effect of adiposity on health outcomes?

Mendelian randomization (MR) is increasingly used for generating estimates of the causal impact of exposures on outcomes. Evidence suggests a causal role of excess adipose tissue (adiposity) on many health outcomes. However, this body of work has not been systematically appraised. We systematically reviewed and meta-analysed results from MR studies investigating the association between adiposity and health outcomes prior to the SARS-CoV-2/COVID-19 pandemic (PROSPERO: CRD42018096684). We searched Medline, EMBASE, and bioRxiv up to February 2019 and obtained data on 2,214 MR analyses from 173 included articles. 29 meta-analyses were conducted using data from 34 articles (including 66 MR analyses) and results not able to be meta-analysed were narratively synthesised. Body mass index (BMI) was the predominant exposure used and was primarily associated with an increase in investigated outcomes; the largest effect in the meta-analyses was observed for the association between BMI and polycystic ovary syndrome (estimates reflect odds ratios (OR) per standard deviation change in each adiposity measure): OR = 2.55; 95% confidence interval (CI) = 1.22–5.33. Only colorectal cancer was investigated with two exposures in the meta-analysis: BMI (OR = 1.18; 95% CI = 1.01–1.37) and waist-hip ratio (WHR; OR = 1.48; 95% CI = 1.08–2.03). Broadly, results were consistent across the meta-analyses and narrative synthesis. Consistent with many observational studies, this work highlights the impact of adiposity across a broad spectrum of health outcomes, enabling targeted follow-up analyses. However, missing and incomplete data mean results should be interpreted with caution.

Body mass index (BMI) was the predominant exposure used and was primarily associated with an increase in investigated outcomes; the largest effect in the meta-analyses was observed for the association between BMI and polycystic ovary syndrome (estimates reflect odds ratios (OR) per standard deviation change in each adiposity measure): OR = 2.55; 95% confidence interval (CI) = 1.22-5.33.Only colorectal cancer was investigated with two exposures in the meta-analysis: BMI (OR = 1.18; 95% CI = 1.01-1.37)and waist-hip ratio (WHR; OR = 1.48;

Introduction
Observational epidemiological studies have indicated that adiposity is strongly associated with all-cause and cause-specific mortality 1,2 as well as numerous health outcomes 3 .This includes many common diseases, such as cardiovascular disease (CVD) 4 and many cancers 5 , as well as commonly accepted risk factors for diseases such as high blood pressure 6 .Mendelian randomization (MR) studies can be used alongside conventional observational studies to strengthen evidence for causality within an association (or indeed provide evidence against an association) 7 .Briefly, MR is a statistical method which, exploiting the random allocation of alleles during gametogenesis, utilizes germline genetic variation (usually, single nucleotide polymorphisms) as instrumental variables for traits of interest [8][9][10] .There has been a steady increase in the publication of MR studies since being widely reported on in 2003 8 .There is now a large body of evidence from MR studies for a causal effect of adiposity on many outcomes, including many cancers 11,12 .
Systematic reviews enable a global overview of the literature and provide avenues for hypothesis generation.In combination with meta-analyses, systematic reviews can be used as a method for improved causal inference as pooled estimates can be more precise than estimates from individual studies 13 .As the MR literature has not been systematically appraised with respect to the association between adiposity and health outcomes, we set out to systematically review MR studies investigating adiposity as an exposure and provide pooled estimates where appropriate.During the recent severe acute respiratory syndrome coronavirus 2/coronavirus disease 2019 (SARS-CoV-2/COVID-19) pandemic, there was an explosion of work focused on body mass index (BMI) related traits and outcomes/ intermediates or infection impact.This is extremely important, but is complicated by both the parameterisation of infection as a target and the nature of exhaustive genetic instruments for adiposity.This work has been brought together to recount the body of work undertaken immediately before this event and hence presents a pre-pandemic overview of the literature.Further work is now, of course, needed to distil the postpandemic literature; however, that is not within the remit of this review.
Here, a hypothesis-free systematic review and meta-analyses are presented alongside a narrative synthesis of 173 articles reporting 2,214 MR analyses.This work was pre-published on PROSPERO (Extended data File 1) 14 , is accompanied by Extended data 14 , and a GitHub repository (Underlying data 15 ) and data browser where all data, scripts, results, and figures are available.A narrative synthesis of non-meta-analysed studies is given in Extended Data 6.

Methods
Data sources and search strategy EMBASE and MEDLINE were searched from inception (EMBASE = 1974; MEDLINE = 1946) until 18 February, 2019 using detailed search strategies including free text and controlled vocabulary terms, and used synonyms for both adiposity and MR terms (Extended data File 2 14 and on GitHub).The pre-print service, bioRxiv, was also searched from inception (November 2013) until 18 February, 2019.Due to the limited search functionality and inability to include Boolean operators ('AND', 'OR', 'NOT') in bioRxiv searches, four free-text terms in four independent searches were used: 'Mendelian randomization', 'Mendelian randomisation', 'causal inference', and 'causal analysis'.

Study selection
Articles returned through the searches of EMBASE and MEDLINE were imported into EndNote (version X8.2; Clarivate Analytics), and de-duplication was performed using pagination identifiers 16 .Articles returned from bioRxiv were imported into Mendeley and de-duplication performed using the Mendeley de-duplication function.Titles and abstracts of all remaining articles were screened by two independent reviewers (MAL and LJM) using Rayyan 17 , with discrepancies resolved through discussion.Articles that met the pre-defined inclusion criteria (see below) were combined and, in instances where the bioRxiv study had been published and this was identified in either the EMBASE or MEDLINE search, the bioRxiv version of the study was excluded.The full texts of all studies that met inclusion criteria were then screened by the two reviewers.
For title and abstract screening and for full-text screening, articles must have met the following pre-defined inclusion criteria: be written in English; be available in full text (or in the case of conference abstracts, the authors must be contactable to obtain the relevant data); be published in a peer-reviewed journal or bioRxiv; use MR methodology to investigate the causal effect of adiposity on any outcome.All MR approaches (e.g., one-sample, two-sample, and multivariable MR), were considered for inclusion.Adiposity was considered to be any measure which aimed to assess the amount of adipose tissue an individual possessed.If a study focused on adiposity alongside other exposures, the effect of each adiposity measure was reported separately if available.If it was not available, the joint effect was reported.Articles in which an MR approach was used but not explicitly called 'Mendelian randomization' were included.More specifically, any study in which genetic variants were used as instrumental variables (IVs) or the direct association between a genetic variant and outcome was employed was eligible (as described previously 8 ), provided it met the other inclusion criteria.

Data extraction
In the first instance, data extraction was performed by eight reviewers (MAL, CH, LJM, NM, TB, WW, SF, and KHW), with articles split evenly between them, using a data extraction form (Extended data 3) 14 designed using a pre-publication version of the STROBE-MR guidelines 18 in order to obtain all relevant data from each study.Once all articles had been reviewed, two reviewers (MAL and CH) extracted data from all articles they did not review in the first instance.The same two reviewers then checked all extracted data for discrepancies, which were resolved through a third review of individual articles and subsequent discussion.In some cases, articles included in the data extraction contained more than one relevant MR analysis.As such, the words "study" and "studies" refer to the MR analyses within an article.The following data were extracted from each of the studies from all contributing articles: exposure(s), outcome(s), study design and sample characteristics, genetic variant IV selection, MR methodology, sensitivity analysis, and causal estimates.Where relevant data was not reported by the article, "Not discussed" was entered into the data extraction form.
Once data extraction was completed, three columns were added to summarise the type of outcome being studied: column 1 ("outcome") was used as a general categorisation of all outcomes across articles (e.g., the outcome "oestrogen receptor negative (ER-) breast cancer" would have the value "breast cancer"); column 2 ("outcome info") reported the outcome-specific information that distinguished outcomes within categories defined in column 1 (e.g., column 2 would contain the value "ERbreast cancer" for the same breast cancer example); and column 3 ("outcome group") categorised outcomes more generally than values defined in column 1 (e.g., the breast cancer example would be categorised as "cancer").Outcome categories were assigned based on prior biological knowledge and aimed to collapse the large number of outcomes.Where there were too few outcomes to make a category, they were grouped into an "other" category.

Quality assessment
There is currently no risk of bias tool to assess the quality of MR analyses.Here, the tool used by Mamluk et al. (2020) 19 was adapted and used for quality assessment of studies included in the meta-analyses.The quality of each study (MR analysis) within an article was assessed on a three-point scale (low = 3, medium = 2, high = 1; Extended Data 5) 14 across 12 questions, including the five used by Mamluk et al., (2020)  19 .Additional questions which aimed to assess instrument selection, sample overlap, sensitivity analyses, descriptive data, data availability (data missingness), and statistical parameters were included based on a pre-publication version of the STROBE-MR guidelines.Quality assessment was not used as a prerequisite for inclusion or exclusion in the meta-analyses.Rather, it was used to supplement the meta-analyses and aid interpretation, with studies grouped into three rankings based on their quality assessment score: low (total score 12-19), medium (total score 20-27) or high quality (total score 28-35).Mendelian randomization (MR) analyses were included in metaanalyses if they met the conditions set out in the flowchart with regards to sample overlap.* = MR analyses had to use the same exposure and the same outcome to be compatible, e.g., for the exposure, body mass index (BMI) could not be meta-analysed with any other exposure that was not BMI.This also applies to outcomes, e.g., the outcome oestrogen receptor negative (ER-) breast cancer could not be meta-analysed with breast cancer, it could only be meta-analysed with ER-breast cancer.

Meta-analysis
Studies were included for meta-analysis if they met a series of rules that ensured the exposure and outcome were consistent across studies.To be meta-analysed, study methods had to be compatible, for example, the same MR method(s) and units of measurement.As sample overlap can induce bias in MR studies 20 , no population overlap between the different studies that provided data for an outcome being meta-analysed across multiple MR studies or between the different studies that provided the exposure and outcome data were permitted within a meta-analysis (Figure 1).Where there was sample overlap between studies, the study with the larger sample size was retained.Studies using the same population samples for the exposure data were included as the risk of bias is low 20 .
In a fixed-effects meta-analysis, the assumption is that all effect estimates estimate the same effect.In MR analyses, we assume that studies using the same exposure and outcome will be estimating the same effect, but that the exposure and outcome is subtly different among different populations given instrumentation and measurement error.We therefore consider these to be related effects 21,22 .In an inverse variance weighted fixed-effects model, a weighted average is calculated as:

= ∑ ∑
Where, y i is the causal effect estimates in the i th MR study, SE i is the standard error of that estimate, and the summation (Σ) is across all studies.In a random-effects model, SE i is adjusted to incorporate heterogeneity among study effects (τ 2 ).In this, a random-effects model will weight smaller studies more than a fixed-effects model would, as they provide more information on the distribution of effects as opposed to more information on the overall effect.This does not mean that random-effects models account for heterogeneity; random-and fixed-effects models will give identical results when there is no heterogeneity.
Following this and considerations in the Cochrane handbook, an inverse variance weighted random-effects model using estimates and standard errors was performed using the meta 23 package in R and the function metagen.Where standard errors and effect estimates were not available for a study (e.g., confidence intervals (CIs) and odds ratios were available), these were back-calculated manually.For both binary and continuous outcomes, the Hartung and Knapp method to adjust CIs to reflect uncertainty in the estimation of between-study heterogeneity 24,25 , which is recommended for random-effects models 26,27 , was used where ≥ 5 studies were included in the meta-analysis 26 .Between-study variance was estimated for all meta-analyses using the Paule-Mandel estimator 28 , for which simulation studies have shown good performance compared to other estimators 29 .
Forest plots were used to visualise results.For binary outcomes, the relevant summary method was used for odds ratios, risk ratios, hazard ratios, among others.For continuous outcomes, the mean difference was used for the underlying summary method.When presenting results, "increase" and "positive" refer to, for example, a higher BMI or an increase in the risk of type 2 diabetes; "decrease" and "negative" refer to, for example, a lower BMI or a decreased risk of type 2 diabetes.

Narrative synthesis
A narrative synthesis of all studies not included in the metaanalyses was performed in order to gain a global picture of reported causal effects.The narrative synthesis summarised the reported directions of effect estimates across outcome categories, including a summary of the evidence for selected exposures and outcomes.The outcome categories were used to guide the synthesis.Given the non-independence of studies and the focus on summarising directions of effect estimates, the synthesis should be interpreted as an overview and not as definitive evidence for a causal effect.For a complete picture, or to look at specific exposure-outcome pairs, data extracted from all included studies are available from Extended data 3 14 and can be browsed online.

Literature search and data extraction
A total of 173 articles met the pre-defined inclusion criteria after full text screening (Figure 2; PDFs for each article available on GitHub) -articles from bioRxiv included in data extraction were replaced with their published version if available.Of the 23 included bioRxiv articles, 18 were published once data extraction began and these published versions were included instead of the bioRxiv article.One bioRxiv article was excluded as the published version did not include the MR analysis.The remaining four bioRxiv articles were included.Most of the 173 articles were published in the past five years (Figure 3).Data were extracted for 2,214 studies performed across the 173 articles (i.e., many articles conducted multiple MR analyses) and one-sample MR was the predominant analysis performed (Figure 4).This included 30 exposures and 659 outcomes.The majority of studies (68%) used BMI as the exposure (Table 1).The largest proportion of outcomes were grouped into the metabolic (18%) and cancer categories (16%) (Table 2).The "other" category included 118 methylation outcomes, 68 mortality outcomes, and a handful of the following outcomes: age related macular degeneration, cataract, disease count, hernia, sleep, and physical activity.

Meta-analysis and quality assessment
In total, 66 studies from 34 articles were included in 29 meta-analyses -studies investigating the effect of adjusted variables (i.e., WHRadjBMI) in two-sample settings were excluded given recent evidence of biased estimates when using adjusted traits in MR studies 30 .Most of the 2,214 studies were excluded due to a lack of meta-analysable data (e.g., only one MR analysis looked at a given exposure-outcome pair).
The average quality assessment score across the 66 studies was 24 (standard deviation (SD) = 2.8; Figure 5).Only the study of the association between BMI and haemorrhagic stroke by Dale et al., (2017) 31 was ranked as high quality.All low scoring studies showed consistent directions of effect with the other studies with which they were meta-analysed.Quality assessment scores for each study are presented alongside the meta-analysis results (Figure 6 and Figure 7).The majority of studies included in the meta-analyses used sex-combined data for the exposure and outcome.As such, we consider meta-analysis results to be the sex-combined effect of the exposure on the outcome.The exception is for the sex-specific outcomes endometrial, ovarian, and prostate cancer and polycystic ovary syndrome which used sex-specific outcome data.For these four outcomes only, we consider the effect on the outcome to be sex-specific.
All results are given per SD unit increase.For all binary outcomes, results are given as an odds ratio (OR) and reflect the OR of the outcome per SD unit increase in the exposure.For continuous outcomes, results are given as the mean difference (MD) and reflect an average unit change in the outcome per SD unit increase in the exposure.The term "effect estimate" is used throughout.
Of the 20 binary (Figure 6) and 9 continuous (Figure 7) outcomes, 5 meta-analyses had negative effect estimates: birthweight on ER-breast cancer and colon cancer, and BMI on high-density lipoprotein cholesterol (HDL-C; analysed with SD and mmol/L units) and low-density lipoprotein cholesterol (LDL-C; mmol/L).14 of the remaining tests had positive effect estimates with CIs that did not span the null.The remaining 10 tests had positive effect estimates with CIs that spanned the null.There was little difference between effect estimates from studies contributing to individual meta-analyses that had a low-quality assessment score and studies with a medium or high-quality assessment score.One outcome was investigated using more than one exposure, colorectal cancer with BMI and WHR.There was evidence for an increasing effect of both measures on colorectal cancer: WHR (OR = 1.48; 95% CI = 1.08-2.03);BMI (OR = 1.18; 95% CI = 1.01-1.37. BMI was the predominant exposure and was found to be associated with an increase in the risk of all cancers tested (colorectal, endometrial, lung, ovarian, and prostate), CIs crossed the null only for prostate cancer (OR = 1.08; 95% CI = 0.91-1.28).There was weak evidence for an association between BMI and ischemic and haemorrhagic stroke, hypertension, arthritis, and Alzheimer's disease, with effect estimates close to the null and CIs spanning the null.
There was evidence of heterogeneity within the included studies, 8 of 20 binary outcomes and 5 of 9 continuous outcomes had heterogeneity statistics with p-values ≤ 0.05.However, given no meta-analysis met the requirements for heterogeneity statistics (≥ 5 studies) 32 these results should be interpreted with caution.
Narrative synthesis A total of 2,144 studies were not included in the meta-analyses.A complete summary for each outcome category is available as Extended data 6 14 .All extracted data are available from Extended data 3 14 and can be browsed online.Briefly, of the 2,144 studies, 1,343 reported a positive direction of effect and 597 reported a negative direction of effect.The remaining 204 studies either did not report an effect estimate or the effect estimate was null.The largest number of studies and articles investigated the association between adiposity and metabolic or cancer outcomes which are summarised here.In this synthesis we discuss directions of effect across all studies and do not account for sex in this regard.
For the metabolic category, 380 studies were reported across 51 articles.89 studies reported a positive effect estimate and 266 studies reported a negative effect estimate, the remaining studies did not report an effect estimate.For example, there was weak evidence for an increasing effect of BMI on cholesterol, but strong evidence for an increasing effect of WHRadjBMI on cholesterol.Evidence was strongest for outcomes analysed by multiple studies and articles.For example, there was strong evidence for an increasing effect of BMI, birth weight, childhood BMI, WHR, WHRadjBMI, and WC on diabetes (type 1, type 2, and all).Where a study performed a bi-directional MR analysis and adiposity was the secondary analysis (i.e., to check for reverse causation), this was recorded as a bi-directional MR analysis.One-sample and two-sample MR meta-analysis indicates that the meta-analysis included MR analyses that were both one-and two-sample designs.Generalized summary data-based MR allows for, and models, correlated SNPs within the instrument.Factorial MR is analogous to a factorial randomized controlled trial, whereby individuals are grouped using genetic scores (generally in a 2 x 2 approach).An MR-PheWAS is the investigation of a single trait on many, potentially hundreds, of outcomes.Direct G-O refers to an MR analysis which used instruments from a single locus, e.g., the FTO locus.

Discussion
Here, 173 articles and 2,214 MR analyses were reviewed.Meta-analyses and a narrative synthesis of these studies provide an overview of the causal landscape of adiposity.Broadly, evidence points to an increasing effect of adiposity on a wide array of outcomes, including many cancers as well as cardiovascular traits, and type-2 diabetes.It was not possible to summarise the effect of adiposity on each outcome in the narrative synthesis.Instead, extracted data from all 2,214 studies are available as Extended Data 5 14 and via a data browser.Broadly, results from the meta-analyses were consistent with the narrative synthesis.However, there was variability within outcomes.
There were some inconsistencies between evidence from the meta-analyses and narrative synthesis.For example, there was evidence for an increasing effect of adiposity on endometrial and colorectal cancer in the meta-analysis, but within the narrative synthesis, there were studies that reported evidence of an increasing, protective, and null effect of adiposity on both cancers.This is expected to some degree since in meta-analyses the sample size is considered, and studies are weighted by this.In contrast, in the narrative synthesis, only the direction of effect was used to summarise the effect of adiposity.
Additionally, studies included in the meta-analyses were non-overlapping, whereas the narrative synthesis will have included numerous studies of the same exposure-outcome pair with overlapping samples.As a result, effects from the same population are likely repeated in the narrative synthesis, which may have biased the summation of the overall effect of adiposity.
Many of the consistent effects observed across the meta-analyses and narrative synthesis are supported by observational studies, including increased risk of CVD 4 and hypertension 6 .However, there are some inconsistencies with the observational literature, notably for the effect of adiposity on haemorrhagic stroke, where evidence for an effect of adiposity was weak in meta-analysis but is strong in observational analyses 33 .There was also evidence in the narrative synthesis for an effect of adiposity on a broad number of metabolites which is also found in the observational literature 6,34 .However, there was weak evidence in the meta-analyses for a decreasing effect of BMI on HDL-C and an increasing effect on LDL-C (e.g., the estimate with SD units was positive and had less heterogeneity across the studies meta-analysed), which is repeatedly found in observational studies 6,34 .
Whilst the direction of effect estimates were largely consistent across both meta-analyses, where we do see inconsistent results, they typically manifest as differences in the magnitude of effect estimates.For instance, our study found a larger effect of BMI on ovarian cancer (OR: 1.39; 95% CI: 1.20-1.61)compared to the review by Larsson and Burgess (OR: 1.07; 95% CI: 0.97-1.18).Such differences in magnitude may reflect more reliable estimates obtained from studies with larger case numbers as used in their review (cases for ovarian cancer = 27,420) compared to our meta-analysis (cases for ovarian cancer = 5,400), due to the predominance of studies using UK Biobank included in their meta-analysis.There are also important differences in the scope of these papers.Firstly, whilst the review published by Larsson and Burgess specifically  focused on BMI and a handful of chronic disease groups, many of which overlapped with the outcomes included in our work, our study was broader, encompassing all adiposity measures, with no restriction on outcome.Similarly, whilst the date by which the searches were completed in our systematic review was earlier, we included a larger number of articles (173 vs. 47), extracted data on more MR studies (2,214 vs. 101) and did not exclude studies that had investigated the same outcome using the same population.Lastly, we also provided a narrative synthesis of data extracted from all 2,144 MR studies not included in the meta-analyses.
A particular consideration from this work is the shallowness of the identified exposure-outcome pairs.That is, many outcomes have been assessed, but these have predominantly been assessed with BMI as the exposure.Although there is some replication of the results of the association between BMI and various outcomes, they are concentrated on more heavily studied diseases such as cancer and CVD.An additional component of this observation is the use of meta-analyses and biobanks, whereby the same exposure-outcome association has been assessed using ever larger samples, which include the same populations.This poses a potential problem for future work, whereby large studies using meta-GWAS and biobanks, due to their size, are able to capture population structure 36 .If not controlled within GWASs and MR analyses, this population structure may bias MR analyses and meta-analyses of MR results due to the introduction of genetic confounding and violation of the second MR assumption.In addition, the potential for selection bias, particularly in studies of later life diseases such as Alzheimer's disease, may lead to attenuated or reversed estimates due to selection bias 37 .This may arise when genetic variants associated with the exposure (in this case BMI) influence study participation and will be most prominent for exposures that may cause diseases that prevent the study of later life diseases because of mortality or loss to follow-up 37 .Whilst this may be present within the current evidence synthesis, particularly for those disease and health outcomes occurring in middle-to-late life, it is difficult to ascertain and quantify the impact that this may have on the current estimates.We advocate that authors undertaking MR analyses on such outcomes consider this when conducting and interpreting their estimates.It is also important to note that a key assumption of most MR analyses is that there is a linear association between the exposure on the outcome and between the genetic instrument and exposure.Few studies included within our research acknowledged this assumption nor the potential bias non-linear effects may have on estimated effects.Investigations of non-linear effects were primarily performed in methodological studies in which an adiposity measure was used as an exemplar 38,39 .Whilst there may be causal non-linear effects between adiposity and some diseases or health outcomes, this is currently difficult to estimate with MR methodologies [40][41][42] and, in an observational context, non-linear relationships may instead reflect bias (e.g., due to reverse causation or confounding).For example, in the context of BMI and mortality, the presence of those who lose weight due to co-morbidities that increase the risk of mortality in the study sample may indicate that a lower BMI increases the risk of mortality).Until more concrete non-linear methodologies are developed, this will remain a key limitation of providing unbiased non-linear effect estimates of the causal role of adiposity on health outcomes.
Data extraction was based on the STROBE-MR guidelines, which includes information on interpretability and reproducibility.It was not possible to extract all data from many of the 2,214 studies included in the review.Although some of this data related to reproducibility guidelines (e.g., software used) a large proportion was related to interpretability (e.g., SNPs used).This also included data on sex, which was routinely missing or difficult to extract from both the MR studies and original GWAS publications from which the MR studies obtained exposure and/or outcome data.This limited the scope of the narrative synthesis to an overall summary of the direction of effect estimates and did not allow for sex-specific summaries.As the STROBE-MR guidelines have now been published 18 , it is expected that the reporting quality of studies will improve.The omission of methodological detail is unlikely to affect the results of an analysis but does impact on reproducibility and the reuse of results in meta-analyses such as those presented here.
Most studies employed similar instrumentation approaches, using a p-value threshold of 5 × 10 -8 and a linkage disequilibrium R2 threshold of 0.0001 (the default for the TwoSampleMR R package) to identify independent instruments.This has the advantage that many studies will likely have used the same SNPs for the same exposure.Similarly, most studies used the same methodologies; however, there was little investigation of non-linear effects.

Strehngths and limitations
The majority of the 29 meta-analyses included just two MR analyses; this was primarily a result of overlapping outcome samples across studies which would ultimately bias results towards the confounded observational estimate 20 .This overlap suggests replication within the literature but also the use of meta-GWAS to obtain ever larger populations for MR analyses.The limited number of analyses included in each meta-analysis (i.e., < 5 studies) prevents meaningful interpretation of heterogeneity statistics 32 and prevented the assessment of publication bias.
Given the incomplete and often poor reporting of MR analyses, results here should be interpreted cautiously.Studies were excluded from meta-analysis if there was overlap between the outcome data between studies or between the exposure data and outcome data between studies.However, it is possible that this was not completely accurate given that not all studies reported the cohorts used in their analyses.Additional limitations of MR analyses, including homogeneity and monotonicity, may be especially important in meta-analysis results given effects among different populations may not be homogeneous (i.e., the effect of the IV or exposure is not the same for all populations) or monotonic (i.e., the effect of the IV on the exposure is differential among populations).

Conclusions
Adiposity is shown to exert a predominantly increasing effect on numerous outcomes including many cancers, cardiovascular outcomes, and metabolic traits.Results here are broadly consistent with the observational literature and provide corroborative evidence for associations with several traits.However, these results are not definitive and should instead be used as a guide for future investigations aiming to triangulate evidence of association 7 .There is a need to update this work, especially considering the large body of work conducted during the SARS-CoV-2/COVID-19 pandemic, and it is hoped this will become easier as the quality of studies improves with the adoption of the STROBE-MR guidelines.

Analysis code
Analysis code available from: https://github.com/mattlee821/systematic_review_MR_adiposity/tree/1.0.0 Archived analysis code at time of publication: https://doi.org/10.5281/zenodo.7377406 15cense: MIT I request just a few minor clarifications about the methodology.Specifically, Why was the specific date, 2019-02-18 chosen as the cut-off for publication?The justification of pre-COVID-19 pandemic is understood, but why specifically that date? 1.
In the data extraction section, you mention, "Where there were too few outcomes to make a category, they were grouped into an 'other' category."What was threshold for this classification? 2.
For binary outcomes results are presented as odds-ratios, for papers that reported other outcome measures (such as risk ratios or hazards-ratios) how were they transformed? 3.
The rest of the paper seem well thought out and well referenced for the applicable methods used.

Are sufficient details of the methods and analysis provided to allow replication by others? Partly
Is the statistical analysis and its interpretation appropriate?Yes

Are the conclusions drawn adequately supported by the results presented in the review? Yes
If this is a Living Systematic Review, is the 'living' method appropriate and is the search schedule clearly defined and justified?('Living Systematic Review' or a variation of this term should be included in the title.)

Not applicable
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Statistics, Biostatistics, Childhood growth I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Are the rationale for, and objectives of, the Systematic Review clearly stated?

Not applicable
Are sufficient details of the methods and analysis provided to allow replication by others?Not applicable

Is the statistical analysis and its interpretation appropriate? Not applicable
Are the conclusions drawn adequately supported by the results presented in the review?

Not applicable
If this is a Living Systematic Review, is the 'living' method appropriate and is the search schedule clearly defined and justified?('Living Systematic Review' or a variation of this term should be included in the title.)Not applicable 1.
It is necessary to update the search since it has been almost four years since it was performed and several relevant studies for this review have been published.

2.
Since the study is focused on health outcomes, I suggest including mediRxiv among the consulted databases. 3.
Indicate whether only two-sample MR analyses were included in this review or also multivariable MR studies were considered.

4.
Are the rationale for, and objectives of, the Systematic Review clearly stated?Yes

Are sufficient details of the methods and analysis provided to allow replication by others? Yes
Is the statistical analysis and its interpretation appropriate?Yes

Are the conclusions drawn adequately supported by the results presented in the review? Yes
If this is a Living Systematic Review, is the 'living' method appropriate and is the search schedule clearly defined and justified?('Living Systematic Review' or a variation of this term should be included in the title.)Yes Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genetics; Genomics I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 06 Sep 2023

Matthew Lee
We thank the reviewer for their time in reviewing this work!In the Introduction, please consider including a brief definition of MR studies.This will be helpful for readers unfamiliar with this method.

○
We have added a sentence in the first paragraph with additional references: "Briefly, MR is an epidemiological method which, exploiting the random allocation of alleles during gametogenesis, utilizes germline genetic variation (usually, single nucleotide polymorphisms, SNPs) as instrumental variables for modifiable exposure traits of interest conducted to date on adiposity and disease outcomes and risk factors, for example more like a scoping review.Such a scoping review would however need to be up-to-date.We thank the reviewer for this fair comment.By way of comparison, we found similar results in our meta-analyses compared to the review by Larsson and Burgess.For example, our meta-analysis results indicated that body mass index (BMI) increased the risk of colorectal cancer (OR: 1.18; 95% CI: 1.01-1.37),as did the meta-analysis conducted by Larsson and Burgess (OR: 1.13; 95% CI: 1.06-1.20).Where we do see inconsistent results, they typically manifest as differences in the magnitude of effect estimates and not in the direction of effect estimates.For instance, our study found a larger effect of BMI on ovarian cancer (OR: 1.39; 95% CI: 1.20-1.61)compared to the meta-analysis conducted by Larsson and Burgess (OR: 1.07; 95% CI: 0.97-1.18).Whilst both systematic reviews are valuable, there are important differences in the scope of these papers.Firstly, while Larsson and Burgess include studies published after those included in our systematic review, they focused solely on BMI and chronic diseases, whilst our study included a more comprehensive investigation of all adiposity measures with no restriction with respect to outcome.This is reflected in the larger number of articles in our review (173 vs. 47) and resultant overall number of analyses (2,214 vs. 101).Whilst Larsson and Burgess were able to perform more meta-analyses (63 in total) compared to our study (31), this was a result of the greater number of studies conducted using UK Biobank and the inclusion of de novo MR analyses authors conducted themselves using FinnGen, both of which featured prominently in the other review.These features highlight differences in scope and also the relative niches these two reviews occupy.Nonetheless, we wish to acknowledge previous work and the similarities and differences compared to our study and have therefore added a new paragraph to the discussion: "Results from the meta-analyses conducted here are corroborated by a recent review by Larsson and Burgess (2021) 4 , where meta-analysis results are largely consistent.For example, our meta-analysis results indicated an effect of BMI on CRC (OR: 1.18; 95% CI: 1.01-1.37)and venous thromboembolism (OR: 1.58; 95% CI: 1.33-1.87),estimates of which were similar to those provided by Larsson and Burgess for the same phenotypes (i.e., OR: 1.13; 95% CI: 1.06-1.20 for CRC; and OR: 1.49; 95% CI: 1.33-1.66for venous thromboembolism).Whilst the direction of effect estimates were largely consistent across both meta-analyses, where we do see inconsistent results, they typically manifest as differences in the magnitude of effect estimates.For instance, our study found a larger effect of BMI on ovarian cancer (OR: 1.39; 95% CI: 1.20-1.61)compared to the review by Larsson and Burgess (OR: 1.07; 95% CI: 0.97-1.18).Such differences in magnitude may reflect more reliable estimates obtained from studies with larger case numbers as used in their review (cases for ovarian cancer = 27,420) compared to our metaanalysis (cases for ovarian cancer = 5,400), due to the predominance of studies using UK Biobank included in their meta-analysis.There are also important differences in the scope of these papers.Firstly, whilst the review published by Larsson and Burgess specifically focused on BMI and a handful of chronic disease groups, many of which overlapped with the outcomes included in our work, our study was broader, encompassing all adiposity measures, with no restriction on outcome.Similarly, whilst the date by which the searches were completed in our systematic review was earlier, we included a larger number of articles (173 vs. 47), extracted data on more MR studies (2,214 vs. 101) and did not exclude studies that had investigated the same outcome using the same population.Lastly, we also provided a narrative synthesis of data extracted from all 2,144 MR studies not included in the meta-analyses."Regarding the reviewer's second point about updating the systematic review, we agree, and acknowledge in the final paragraph of our introduction, that an update to this work would be valuable.However, this update is not within the scope of this current review given our aim to target the pandemic window of publication before there was a systemic shift in publication characteristics -especially for BMI.In addition to this, we are relatively confident that the inclusion of an update would likely lead to the similar conclusions -not justifying a lack of update (given the focus of our review), but noting that the publication our specific review would not substantively alter the current evidence base, but work with it.As an indication, we performed a preliminary search of EMBASE and MEDLINE from February 2019 -July 2023 and identified 5,106 articles (after de-duplication).If we assumed a similar inclusion rate as we published here (i.e., ~2.6%), we would expect an additional ~105 articles to be included for further data extraction in efforts to update the current review.Updating the review would require resources that are beyond the remit of the existing review project.
Second, MR studies are open to selection bias because of the misalignment of the time of the start of exposure and of recruitment2.Specifically, genetic endowment (used to instrument MR studies) starts at birth, whilst almost all the studies used were recruited in middle age or later.People who died before recruitment because of their genetically determined adiposity or died before recruitment from a disease of interest, or a competing risk of that disease, are missing.Missing people on exposure and outcome creates selection bias, which attenuates or reverses estimates3.As such, some of the estimates may be misleading, particularly for late onset diseases.Ideally, the authors should consider the possibility of selection bias from selecting on survival for all outcomes considered and explain how the estimates might or might not be biased.

○
This is an important point and we have now added a statement to this end in the discussion: "In addition, the potential for selection bias, particularly in studies of later life diseases such as Alzheimer's disease, may lead to attenuated or reversed estimates due to selection bias 5 .This may arise when genetic variants associated with the exposure (in this case BMI) influence study participation and will be most prominent for exposures that may cause diseases that prevent the study of later life diseases because of mortality or loss to followup 5 .Whilst this may be present within the current evidence synthesis, particularly for those disease and health outcomes occurring in middle-to-late life, it is difficult to ascertain and quantify the impact that this may have on the current estimates.We advocate that authors undertaking MR analyses on such outcomes consider this when conducting and interpreting their estimates."Third, the key question about the role of adiposity in disease is the shape of the association.Specifically, observationally, the association of adiposity with disease may appear to be non-linear, particularly in the lower ranges4.From a public health point of view, we need to know if the relation of adiposity with disease is the lower the better or if there is some optimal value.The paper does not address this key question, but at least it could be acknowledged as a limitation.
○ This is a good point and we agree that we should acknowledge this important issue.As such, we have added the following text to the discussion: "It is also important to note that a key assumption of most MR analyses is that there is a linear association between the exposure on the outcome and between the genetic instrument and exposure.Few studies included within our research acknowledged this assumption nor the potential bias nonlinear effects may have on estimated effects.Investigations of non-linear effects were primarily performed in methodological studies in which an adiposity measure was used as an exemplar 6,7 .Whilst there may be causal non-linear effects between adiposity and some diseases or health outcomes, this is currently difficult to estimate with MR methodologies [8][9][10] and, in an observational context, non-linear relationships may instead reflect bias (e.g., due to reverse causation or confounding).For example, in the context of BMI and mortality, the presence of those who lose weight due to co-morbidities that increase the risk of mortality in the study sample may indicate that a lower BMI increases the risk of mortality).Until more concrete non-linear methodologies are developed, this will remain a key limitation of providing unbiased non-linear effect estimates of the causal role of adiposity on health outcomes."

Figure 1 .
Figure 1.Inclusion criteria for meta-analysis: flowchart.Mendelian randomization (MR) analyses were included in metaanalyses if they met the conditions set out in the flowchart with regards to sample overlap.* = MR analyses had to use the same exposure and the same outcome to be compatible, e.g., for the exposure, body mass index (BMI) could not be meta-analysed with any other exposure that was not BMI.This also applies to outcomes, e.g., the outcome oestrogen receptor negative (ER-) breast cancer could not be meta-analysed with breast cancer, it could only be meta-analysed with ER-breast cancer.

Figure 2 .
Figure 2. PRISMA flowchart.N gives the number of articles at each stage.MR = Mendelian randomization.

Figure 3 .
Figure 3. Distribution of publication year and average exposure and outcome sample sizes across included studies up to the search date of February 2019.The number of articles included per year is given on the left Y axis; the right Y axis gives the average sample size for exposure (grey) and outcome (red) for each year.Outcome cases and controls were summed within analyses for binary outcomes.

Figure 4 .
Figure 4. Distribution of study design across 173 included articles.The Y axis gives the MR study design and the X axis gives the number of studies for that study design.The majority of the 173 included articles reported more than one Mendelian randomization (MR) analysis.Where a study performed a bi-directional MR analysis and adiposity was the secondary analysis (i.e., to check for reverse causation), this was recorded as a bi-directional MR analysis.One-sample and two-sample MR meta-analysis indicates that the meta-analysis included MR analyses that were both one-and two-sample designs.Generalized summary data-based MR allows for, and models, correlated SNPs within the instrument.Factorial MR is analogous to a factorial randomized controlled trial, whereby individuals are grouped using genetic scores (generally in a 2 x 2 approach).An MR-PheWAS is the investigation of a single trait on many, potentially hundreds, of outcomes.Direct G-O refers to an MR analysis which used instruments from a single locus, e.g., the FTO locus.

Figure 5 .
Figure 5. Quality assessment: distribution of quality assessment scores for studies included in the meta-analyses."High" indicates a study scored highly; "low" indicates a study scored poorly.QA = quality assessment score.

Figure 6 .
Figure 6.Meta-analysis: effect estimates and 95% confidence intervals for binary outcomes.Forest plot shows effect estimates and 95% confidence intervals (CIs) from a meta-analysis of 22 different exposure-outcome pairs.Mendelian randomization analyses included based on criteria in Figure 1.P-values are given for the heterogeneity statistics.QA = quality assessment score; OR = odds ratio.onGitHub.Forest plots of individual meta-analyses are also available on GitHub.

Figure 7 .
Figure 7. Meta-analysis: effect estimates and 95% confidence intervals for continuous outcomes.Forest plot shows effect estimates and 95% confidence intervals (CIs) from a meta-analysis of 9 different exposure-outcome pairs.Mendelian randomization analyses included based on criteria in Figure 1.P-values are given for the heterogeneity statistics.QA = quality assessment score; OR = odds ratio.Available on GitHub.Forest plots of individual meta-analyses are also available on GitHub.

Table 1 . Number and frequency of exposures used across all 2,214 Mendelian randomization analyses.
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Are the rationale for, and objectives of, the Systematic Review clearly stated? Yes Are sufficient details of the methods and analysis provided to allow replication by others? Yes Is the statistical analysis and its interpretation appropriate? Partly Are the conclusions drawn adequately supported by the results presented in the review? Partly If this is a Living Systematic Review, is the 'living' method appropriate and is the search schedule clearly defined and justified? ('Living Systematic Review' or a variation of this term should be included in the title.) Partly Competing Interests:
This is a valid point, as medRxiv is an important resource.However, medRxiv was not launched until June 2019.Given the scope of this systematic review and date by which we finished the searches prior to the COVID-19 pandemic (i.e., up-to February 2019), medRxiv inclusion was not within the scope of this work.Indicate whether only two-sample MR analyses were included in this review or also multivariable MR studies were considered.No competing interests were disclosed.