A Bayesian approach to Mendelian randomization using summary statistics in the univariable and multivariable settings with correlated pleiotropy

Mendelian randomization uses genetic variants as instrumental variables to make causal inferences on the effect of an exposure on an outcome. Due to the recent abundance of high-powered genome-wide association studies, many putative causal exposures of interest have large numbers of independent genetic variants with which they associate, each representing a potential instrument for use in a Mendelian randomization analysis. Such polygenic analyses increase the power of the study design to detect causal effects; however, they also increase the potential for bias due to instrument invalidity. Recent attention has been given to dealing with bias caused by correlated pleiotropy, which results from violation of the Instrument Strength Independent of Direct Effects assumption. Although methods have been proposed which can account for this bias, a number of restrictive conditions remain in many commonly used techniques. In this paper, we propose a novel Bayesian framework for Mendelian randomization which provides valid causal inference under very general settings. We propose the methods MR-Horse and MVMR-Horse, which can be performed without access to individual-level data, using only summary statistics of the type commonly published by genome-wide association studies, and which can account for both correlated and uncorrelated pleiotropy. In simulation studies, we show that the approach retains type I error rates below nominal levels even in high-pleiotropy scenarios. We consider an applied example looking at the causal relationships between combinations of four exposures (LDL-cholesterol, triglycerides, fasting glucose and birth weight) and three outcomes (coronary artery disease, type 2 diabetes and asthma).


Introduction
Mendelian randomization is a technique which exploits genetic variation to make causal inferences on the effect of one or more exposures on an outcome of interest (Lawlor et al., 2008). The underlying idea is that, because of the Mendelian laws of inheritance, a genetic variant which is associated with an exposure of interest can serve as a proxy for the exposure. Such a proxy should be less susceptible to the confounding and reverse causation which can inhibit causal inference from observational data. Methods for Mendelian randomization have been developed which can combine information from multiple genetic variants associated with the same exposure. Furthermore, many of these methods can be performed using only summary statistics taken from genome-wide association studies (GWAS), removing the need to access sensitive individual-level data. Under the two-sample framework, summary statistics for the exposure and outcome can be taken from separate, ideally non-overlapping, samples (Pierce and Burgess, 2013). Thus, any combination of exposure and outcome for which there exist publicly available GWAS summary statistics can potentially be considered in a Mendelian randomization study. However, in order to provide robust evidence for causal effects, strong assumptions must be made pertaining to the relationships between the genetic variants and the exposures, outcome and any confounder variables.
Mendelian randomization can be considered as a form of instrumental variables analysis, where each genetic variant must satisfy the three instrumental variables assumptions. In the single exposure case, these assumptions are that each genetic variant:

1. is associated with the exposure;

2. is not associated with the outcome via any confounding pathway;

3. does not affect the outcome other than via the exposure.

The relationships between a valid genetic variant and other relevant traits are illustrated graphically in the directed acyclic graph (DAG) in Figure 1A (Greenland, 2000).
A common source of instrument invalidity is pleiotropy, which is where genetic variants associate with multiple traits. If a genetic instrument associates with a predictor of the outcome via a causal pathway which bypasses the exposure, then this will open up a pleiotropic pathway and invalidate the third instrumental variables assumption. Another source of invalidity is population structure, in which there are differences in the genetic distributions among subgroups of the population being studied. This results in confounding of the genetic variant-outcome relationship, and invalidates the second instrumental variables assumption. Such confounding may occur if there is population stratification, dynastic effects or assortative mating (Brumpton et al., 2020).
Much attention has been given in the literature to developing methods for Mendelian randomization which are robust to instrument invalidity. Many of these methods were motivated by the aim of providing robustness against pleiotropy. Often, the InSIDE (Instrument Strength Independent of Direct Effects) assumption is required for these methods to give valid results, meaning that any pleiotropic effects are independent of the genetic variant-exposure associations (Bowden et al., 2015). Methods which require InSIDE will not provide robustness to population structure, since the confounding pathway may create a correlation between the genetic effects on the exposure and the direct effects on the outcome. Recently, more attention has been given to methods which can also handle violations of the InSIDE assumption, a scenario referred to as correlated pleiotropy (Morrison et al., 2020). However, methods which provide such robustness typically suffer from a combination of inflated type I error rates, low power or further restrictive assumptions.
In this paper we propose a novel Bayesian approach to Mendelian randomization which is robust to both uncorrelated and correlated pleiotropy. We show that this approach is able to provide unbiased and efficient causal effect estimation without inflated type I error rates in very general settings, with favorable performance in comparison with existing methods. A further advantage of the proposed method is that it is easily generalised to the multivariable Mendelian randomization paradigm, for which there are far fewer robust methods than in the single exposure case. We demonstrate the approach in an applied example which looks at the relationships between combinations of four exposures (LDL-cholesterol, triglycerides, fasting glucose and birth weight) and three outcomes (coronary artery disease, type 2 diabetes and asthma).

Models for instrument invalidity
We consider a potentially causal exposure X for an outcome Y with unmeasured confounders U. We denote by θ the causal effect of X on Y, and assume that this effect is linear and homogeneous in the population (that is, that there is no effect modification). This latter assumption allows for causal effect estimation in the instrumental variables framework.
The relationships between a valid genetic instrument, G_j, and the exposure, outcome, and unmeasured confounders are shown in Figure 1A. If we have a single such instrument, we can estimate the causal effect using the Wald ratio, β̂_Yj/β̂_Xj, where β̂_Xj and β̂_Yj are estimates of the associations between G_j and X and Y, respectively. These association estimates are typically obtained using linear regression for continuous traits and logistic regression for binary traits. If we have J valid instruments, the inverse-variance weighted (IVW) method fits the model

β̂_Yj = θ β̂_Xj + ε_j,  ε_j ~ N(0, σ_Yj²),  (1)

where σ_Yj is the standard error of the jth genetic variant-outcome association estimate (Burgess et al., 2013). Note that the standard error of the jth genetic variant-exposure association estimate, σ_Xj, does not appear in the IVW model, which reflects the fact that this approach implicitly assumes that the genetic associations with the exposure are measured without error.
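As an illustration, both the Wald ratio and the IVW estimate can be computed directly from summary statistics. The following sketch uses simulated summary data; all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative summary statistics for J = 50 valid instruments, simulated
# under a true causal effect of theta = 0.1 (all values are hypothetical).
J, theta = 50, 0.1
beta_x_hat = rng.uniform(0.05, 0.15, J)            # variant-exposure estimates
se_y = np.full(J, 0.01)                            # SEs of variant-outcome estimates
beta_y_hat = theta * beta_x_hat + rng.normal(0, se_y)

# Single-instrument causal estimates: the Wald ratio for each variant.
wald = beta_y_hat / beta_x_hat

# IVW estimate: weighted regression of beta_y_hat on beta_x_hat through the
# origin with weights 1 / se_y^2 (beta_x_hat treated as error-free).
w = 1 / se_y**2
theta_ivw = np.sum(w * beta_x_hat * beta_y_hat) / np.sum(w * beta_x_hat**2)
se_ivw = np.sqrt(1 / np.sum(w * beta_x_hat**2))
```

With valid instruments and small outcome standard errors, `theta_ivw` lands close to the true effect of 0.1; with invalid instruments this estimator is biased, which motivates the robust methods discussed below.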
Most pleiotropy-robust methods for Mendelian randomization make the assumption that some of the instruments are valid, and allow for varying levels of invalidity. MR-PRESSO assumes that a small number of instruments are invalid, which will result in outliers among the Wald ratio estimates (Verbanck et al., 2018). Under the InSIDE assumption, MR-PRESSO tests for outliers and estimates the causal effect using IVW with the identified outliers removed. If at least half of the instruments are valid, a weighted median of the Wald ratios gives a consistent estimate of the causal effect, whether or not InSIDE is satisfied (Bowden et al., 2016).
Figure 1B illustrates the case where there is a pleiotropic pathway to the outcome which does not pass via X or U (although it may go via other unmeasured traits which do not associate with the exposure). In this case, the direct effect α_j is independent of the instrument strength β_Xj, hence InSIDE is met. The causal effect can be estimated using the MR-Egger method, which fits the model

β̂_Yj = α_0 + θ β̂_Xj + ε_j,  ε_j ~ N(0, σ_Yj²).  (2)

The intercept term α_0 represents the average pleiotropic effect across all variants. If this average effect is equal to zero, the pleiotropy is said to be balanced. If the average effect is non-zero, the pleiotropy is said to be directional. Under the MR-Egger model, potentially all variants may be invalid, but only in such a way that the InSIDE assumption is satisfied.
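The MR-Egger fit is a weighted least-squares regression with an intercept; a minimal sketch on hypothetical summary data (numbers invented for illustration, with a constant direct effect so that InSIDE holds by construction):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative summary data with directional pleiotropy satisfying InSIDE:
# every variant carries the same direct effect alpha_0, independent of beta_x.
J, theta, alpha_0 = 50, 0.1, 0.05
beta_x_hat = rng.uniform(0.05, 0.15, J)
se_y = np.full(J, 0.005)
beta_y_hat = alpha_0 + theta * beta_x_hat + rng.normal(0, se_y)

# MR-Egger: weighted least squares of beta_y_hat on beta_x_hat with an
# intercept; the intercept estimates the average pleiotropic effect alpha_0.
W = np.diag(1 / se_y**2)
X = np.column_stack([np.ones(J), beta_x_hat])
alpha_0_hat, theta_egger = np.linalg.solve(X.T @ W @ X, X.T @ W @ beta_y_hat)
```

Here the intercept recovers the directional pleiotropy and the slope recovers the causal effect, whereas a through-the-origin IVW fit on the same data would be biased.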
Figure 1C shows the case where the genetic variant has a pathway to the outcome via one or both of a direct path and a path that passes through a confounder of the exposure-outcome relationship. This latter pathway creates correlated pleiotropy, or, in other words, violates InSIDE. One commonly used model for this scenario is

β̂_Yj = θ β̂_Xj + α̃_j + ε_j,  ε_j ~ N(0, σ_Yj²),  (3)

where α̃_j represents the effect α_j + γ_j δ_Y and β̂_Xj is an estimate of β̃_Xj = β_Xj + γ_j δ_X. Under the assumption that some of the instruments are valid and some are invalid, Kang et al. (2016) and Rees et al. (2019) proposed estimating θ using penalised regression with a lasso-style penalty on the α̃_j terms. In this way, the estimates of α̃_j relating to the valid instruments are shrunk toward zero. A post-lasso approach fits the IVW model using only the instruments whose α̃_j estimates were shrunk to zero in the initial lasso regression. Although these lasso-based approaches provide unbiased effect estimates, they can suffer from substantially inflated type I error rates due to the post-selection inference problem. Kang et al. (2016) showed that the causal effect is identifiable in the model given in equation (3) when a plurality of instruments are valid. That is, if there are groups of variants for which the single-instrument causal effect estimate is the same, the largest of these groups is the set of valid instruments. Alternative methods that make the plurality valid assumption include the Contamination Mixture method (Burgess et al., 2020) and MR-Mix (Qi and Chatterjee, 2019), which are based on fitting mixture distributions to the Wald ratio estimates, and weighted mode approaches (Hartwig et al., 2017; Guo et al., 2018). Xue et al. (2021) proposed a method, referred to as MR-cML, which fits the model

β̂_Xj ~ N(β_Xj, σ_Xj²),  β̂_Yj ~ N(θ β_Xj + α_j, σ_Yj²),  (4)

using constrained maximum likelihood.
Europe PMC Funders Author Manuscripts
This method provides consistent causal effect estimation under the plurality valid assumption. Similar to the lasso regression approaches, MR-cML maximises the likelihood in such a way that the α_j's pertaining to valid instruments will tend toward zero. However, through the use of model averaging, it suffers less from inflated type I errors. Furthermore, the model accounts for the uncertainty in the β_X estimates. The MR-RAPS method is also based on the model given in equation (4), where pleiotropy is accounted for using a robust loss function. However, MR-RAPS makes the InSIDE assumption, requires balanced pleiotropy, and was shown to have lower power than MR-cML by Xue et al. (2021).
Bayesian methods have been far less utilised in the Mendelian randomization literature, despite a number of potential advantages over frequentist approaches in fitting a model such as that given in equation (4). Berzuini et al. (2020) developed a Bayesian approach, BayesMR, in which parameters representing pleiotropic effects are given a shrinkage prior. This has the effect of shrinking the terms pertaining to the valid instruments toward zero, similar to the lasso-based approaches. The use of credible intervals obtained from the posterior distribution was shown to provide better coverage than the weighted median method, suggesting that the Bayesian approach is better able to estimate the uncertainty in the effect estimates. The method requires individual-level data and an assumption akin to InSIDE. Recently developed approaches aimed at the correlated pleiotropy case include CAUSE (Morrison et al., 2020), MR-Corr2 (Cheng et al., 2021) and MR-CUE (Cheng et al., 2022), which each utilise Bayesian methods. MR-Corr2 uses a spike-and-slab prior for pleiotropic terms, similar to BayesMR, and also includes a parameter to account for correlated pleiotropy. However, it does not allow for both uncorrelated and correlated pleiotropy among the variants. CAUSE and MR-CUE make the assumption, illustrated in Figure 1E, that genetic instruments are either directly associated with the exposure or only associated with the exposure via a confounder. Xue et al. (2021) showed that CAUSE is particularly sensitive to violations of this assumption. Another feature of these approaches is that they utilise genome-wide summary statistics, as opposed to just summary statistics relating to a curated set of instruments. The genome-wide summary statistics are used to estimate nuisance parameters which are inputs to the Bayesian models.
Figure 1D shows the case where correlated pleiotropy is caused by a confounder of the genetic variant-outcome relationship, for example due to population structure. Although the cause of instrument invalidity differs from that shown in Figure 1C, we can model it in the same way. We thus consider (4) to be a very general parametric model for Mendelian randomization with pleiotropy. It can represent invalidity due to both pleiotropy and population structure, and does not make the assumption of CAUSE and MR-CUE that instruments cannot be both directly associated with the exposure and subject to correlated pleiotropy.

Univariable Mendelian randomization
Suppose we have, for J genetic variants, summary statistics of their association with the exposure and outcome. We denote the estimates of the association between the jth variant and the exposure and outcome by β̂_Xj and β̂_Yj, respectively, and the standard errors of the estimates by σ_Xj and σ_Yj. We model the association estimates by

β̂_Xj ~ N(β_Xj, σ_Xj²),  β̂_Yj ~ N(θ β_Xj + α_j, σ_Yj²).

We place a jointly normal prior on (β_Xj, α_j), and we use weakly informative priors for the remaining parameters, where N+(·, ·) denotes the half-normal distribution, that is, the truncated normal distribution bounded below at 0.
In this model, the β_Xj term represents the effect of the jth genetic variant on the exposure, and the α_j term represents the effect of the variant on the outcome which bypasses the exposure. If the jth variant is a valid instrument, then the true value of α_j is zero. Otherwise, the variant is pleiotropic, and the pleiotropic effect is accounted for by a non-zero α_j. If the jth variant is either a cause of, or caused by, U, then β_Xj and α_j will be correlated. This correlation is modelled by the ρ_j parameter. Thus, the model can account for valid instruments (α_j = 0), uncorrelated pleiotropy (α_j ≠ 0, ρ_j = 0) and correlated pleiotropy (α_j ≠ 0, ρ_j ≠ 0). Any prior distribution for ρ_j which restricts its values to the interval (−1, 1) may be used, and should be informed by domain knowledge of the likely biological mechanisms underlying the genetic variant associations. For our simulation study and applied examples, we set ρ_j = 2r_j − 1, where r_j ~ Beta(10, 10). This places the majority of the mass in the prior distribution around zero, reflecting the fact that many of the genetic variants will either be valid instruments or have pleiotropic effects which are close to uncorrelated with their exposure associations. However, it still allows for substantial correlation when this is present.
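The shape of this correlation prior can be verified by Monte Carlo; a short sketch drawing from ρ_j = 2r_j − 1 with r_j ~ Beta(10, 10):

```python
import numpy as np

rng = np.random.default_rng(2)

# Draws from the prior rho_j = 2 r_j - 1, r_j ~ Beta(10, 10), used for the
# correlation between the variant-exposure effect and the pleiotropic effect.
rho = 2 * rng.beta(10, 10, size=100_000) - 1

# The prior is supported on (-1, 1) and centred at zero with sd ~ 0.22, so
# most mass sits near zero while substantial correlations remain plausible.
prior_sd = rho.std()
```

Stronger or weaker Beta hyperparameters would widen or narrow this prior, which is how domain knowledge about likely correlated pleiotropy could be encoded.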
For the final two parameters in the model, the local and global shrinkage scales λ_j and τ, we use the priors

λ_j ~ Cauchy+(0, 1),  τ ~ Cauchy+(0, 1),

where Cauchy+(·, ·) is the half-Cauchy distribution. Under this parametrisation, the marginal distribution of α_j is the horseshoe prior (Carvalho et al., 2010). The horseshoe is a shrinkage prior which will shrink some of the α_j's toward zero, while non-zero α_j's will escape shrinkage better than when using a Laplace prior (which is equivalent to lasso regression). Thus, it is less likely to underestimate the value of non-zero α_j's. The τ parameter controls the global level of shrinkage, such that a lower value will shrink more of the α_j's close to zero. In our case, we assign τ a half-Cauchy prior and allow its posterior distribution to be estimated from the data. However, if there were prior knowledge on the level of sparsity, this could be incorporated by adjusting this prior or setting a fixed value (Piironen and Vehtari, 2017). The horseshoe prior has previously been proposed for pleiotropic effects in Mendelian randomization by Berzuini et al. (2020) for the individual-level data setting.
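The horseshoe's combination of a spike at zero and heavy tails can be seen from prior-predictive draws; a sketch with the global scale fixed at τ = 1 purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Prior-predictive draws from the horseshoe with global scale fixed at tau = 1:
# alpha_j | lambda_j ~ N(0, lambda_j^2), with lambda_j ~ Cauchy+(0, 1).
n = 100_000
lam = np.abs(rng.standard_cauchy(n))   # half-Cauchy local scales
alpha = rng.normal(0.0, 1.0, n) * lam

# Relative to a standard normal, the horseshoe places more mass both very
# close to zero (strong shrinkage of valid instruments' effects) and far in
# the tails (so large pleiotropic effects escape shrinkage).
frac_near_zero = np.mean(np.abs(alpha) < 0.1)
frac_far_tail = np.mean(np.abs(alpha) > 10)
```

For comparison, a standard normal puts about 8% of its mass in (−0.1, 0.1) and essentially none beyond ±10; the horseshoe draws exceed the former and retain a non-negligible far tail.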
In practice, we have implemented the method in the BUGS language via the rjags package in R (Plummer, 2022). We take 10 000 samples from the posterior distribution (after a burn-in of 10 000 iterations) from 3 chains, which, in all cases reported in the simulation study and applied examples, resulted in R-hat values for the parameters of interest close to 1, indicating good convergence. The causal effect estimate is the mean of the posterior samples, and the 95% credible interval is given by the 2.5th and 97.5th percentiles. We call our method MR-Horse (after the horseshoe prior).

Multivariable Mendelian randomization
Multivariable Mendelian randomization considers the effect of multiple exposures on the outcome (Burgess and Thompson, 2015; Burgess et al., 2015). Originally motivated as a method for accounting for measured pleiotropy, it can also be used to simultaneously estimate causal effects of multiple correlated exposures (Sanderson et al., 2019), to perform risk factor selection (Zuber et al., 2020) and in mediation analysis (Carter et al., 2021). A genetic variant is a valid instrumental variable for multivariable Mendelian randomization if: it is associated with at least one exposure; it is not associated with the outcome via any confounding pathway; and it does not affect the outcome other than via one or more of the measured exposures.
The multivariable IVW approach fits the model given in equation (1), but where the β_Xj's are vectors of the association estimates between the jth genetic variant and each exposure.
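The multivariable IVW fit is again a weighted least-squares regression through the origin, now on the matrix of exposure associations; a sketch with two hypothetical exposures (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative summary data: J variants, K = 2 exposures, direct causal
# effects theta = (0.1, 0); all values are hypothetical.
J, theta = 100, np.array([0.1, 0.0])
B_x_hat = rng.uniform(0.05, 0.15, size=(J, 2))   # variant-exposure estimates
se_y = np.full(J, 0.01)
beta_y_hat = B_x_hat @ theta + rng.normal(0, se_y)

# Multivariable IVW: weighted least squares of beta_y_hat on the matrix of
# exposure associations, with no intercept and weights 1 / se_y^2.
W = np.diag(1 / se_y**2)
theta_hat = np.linalg.solve(B_x_hat.T @ W @ B_x_hat, B_x_hat.T @ W @ beta_y_hat)
```

Each element of `theta_hat` estimates a direct effect, holding the other exposure fixed, which is what distinguishes the multivariable fit from running two univariable IVW analyses.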
There are far fewer pleiotropy-robust methods for multivariable Mendelian randomization compared with the univariable setting. Commonly used approaches include multivariable versions of MR-Egger (Rees et al., 2017) and MR-PRESSO, a multivariable median-based approach (Grant and Burgess, 2021), and GRAPPLE (Wang et al., 2021), which is akin to a multivariable version of MR-RAPS. Recently, a multivariable version of MR-cML has been developed (Lin et al., 2023).

Our proposed model for univariable Mendelian randomization can be easily generalised to the multivariable case as follows. We denote by β_Xj· the vector of associations between the jth genetic variant and the K exposures, and by β̂_Xj· their estimates, which have covariance matrix Σ_Xj. As with the standard errors in the univariable case, the Σ_Xj's are assumed known.
In practice, they can be estimated either using individual-level data or from GWAS summary statistics (see, for example, Sanderson et al., 2021, and Grant et al., 2022). We fit the model

β̂_Xj· ~ N(β_Xj·, Σ_Xj),  β̂_Yj ~ N(θᵀβ_Xj· + α_j, σ_Yj²),

where the kth element of the vector θ represents the direct causal effect of the kth exposure on the outcome. We place a joint prior on (β_Xj·, α_j), and priors on the remaining parameters, analogous to those used in the univariable case. As in the univariable case, this parametrisation puts a horseshoe shrinkage prior on the α_j's, with global shrinkage controlled by the τ parameter. The correlation between the genetic variant-exposure effects and the pleiotropic effects is accounted for via the ρ_j parameters, where, again, we set ρ_j = 2r_j − 1 with r_j ~ Beta(10, 10).
As before, we have implemented the method using BUGS and the rjags package in R, with 10 000 samples taken from the posterior distribution (after a burn-in of 10 000 iterations) from 3 chains. In all cases reported in the simulation study, R-hat values for the parameters of interest were close to 1, indicating good convergence. We call the multivariable method MVMR-Horse.

Simulation study
Primary simulation parametrisation-In order to examine the performance of our method in comparison with alternatives, we generated data based on the simulation parametrisation of Xue et al. (2021), with some parameters adapted to ensure levels of instrument strength and variation of the exposure explained that are typical of realistic settings. The data generating mechanism is

U_i = Σ_j γ_j G_ij + ε_Ui,
X_i = Σ_j β_Xj G_ij + δ_X U_i + ε_Xi,
Y_i = θ X_i + Σ_j α_j G_ij + δ_Y U_i + ε_Yi,

where G_j ~ Binomial(2, f_j), f_j ~ Uniform(0.1, 0.3), ε_U, ε_Y ~ N(0, 1), and ε_X ~ N(0, var(ε_X)), independently. For each j, β_Xj was generated uniformly on the interval (−0.15, −0.05) ∪ (0.05, 0.15), γ_j ~ Uniform(−0.1, 0.1) and δ_X = δ_Y = 1. We considered scenarios where either 20%, 40% or 60% of variants were invalid instruments. Unlike Xue et al. (2021), we considered both balanced and directional pleiotropy scenarios. In the balanced pleiotropy case, α_j ~ Uniform(−0.2, 0.2), and in the directional pleiotropy case, α_j ~ Uniform(−0.1, 0.3), when the jth variant was invalid; in both cases α_j = 0 when the jth variant was valid. The causal effect was either θ = 0.1 or 0. Note that under this parametrisation, InSIDE was violated in all scenarios. In each of 1 000 replications, two independent samples of sizes n_X and n_Y were generated. The genetic association estimates with the exposure were taken in the sample of size n_X, and with the outcome in that of size n_Y. The association estimates and their standard errors were obtained using simple linear regression of the exposure and outcome on each genetic variant in turn.
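A concrete sketch of one replication under our reading of this parametrisation follows; the construction of the confounder from the invalid variants (via γ_j, with δ_X = δ_Y = 1) is an assumption consistent with the description, and the sample size is reduced here to keep the sketch quick.

```python
import numpy as np

rng = np.random.default_rng(6)

# One replication: J = 100 variants, the first 20 invalid, balanced
# pleiotropy, theta = 0.1; all details are a sketch from the text.
J, n, theta = 100, 20_000, 0.1
f = rng.uniform(0.1, 0.3, J)
beta_x = rng.uniform(0.05, 0.15, J) * rng.choice([-1, 1], J)
invalid = np.arange(J) < 20
alpha = np.where(invalid, rng.uniform(-0.2, 0.2, J), 0.0)
gamma = np.where(invalid, rng.uniform(-0.1, 0.1, J), 0.0)

def sample(n, rng):
    """Generate one sample of genotypes, exposure and outcome."""
    G = rng.binomial(2, f, size=(n, J))
    U = G @ gamma + rng.normal(0, 1, n)                  # confounder
    X = G @ beta_x + U + rng.normal(0, 1, n)             # delta_X = 1
    Y = theta * X + G @ alpha + U + rng.normal(0, 1, n)  # delta_Y = 1
    return G, X, Y

def per_variant(G, trait):
    """Slope and standard error from simple linear regression of the
    trait on each genetic variant in turn."""
    Gc = G - G.mean(axis=0)
    tc = trait - trait.mean()
    ssg = (Gc**2).sum(axis=0)
    slope = Gc.T @ tc / ssg
    resid_var = np.array([np.sum((tc - b * g)**2) / (len(tc) - 2)
                          for b, g in zip(slope, Gc.T)])
    se = np.sqrt(resid_var / ssg)
    return slope, se

# Two-sample design: exposure associations from one sample, outcome
# associations from an independent second sample.
G1, X1, _ = sample(n, rng)
bx_hat, sx = per_variant(G1, X1)
G2, _, Y2 = sample(n, rng)
by_hat, sy = per_variant(G2, Y2)
```

The resulting `(bx_hat, sx, by_hat, sy)` are exactly the inputs that summary-data methods such as IVW or MR-Horse would take; for the invalid variants, the γ_j pathway makes the direct effects correlated with the observed instrument strengths, which is what violates InSIDE.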
We considered three primary settings. In the first, we set J = 100, var(ε_X) = 1, and n_X = n_Y = 50 000. This generated strong genetic instruments (with mean F-statistic across all scenarios between 77.4 and 85.0) which explained a high proportion of the variation in the exposure (with mean R² value across all scenarios between 15.4% and 16.9%). In the second setting, we set J = 100, var(ε_X) = 4, n_X = 5 000 and n_Y = 50 000. This generated weak genetic instruments (mean F-statistic between 4.4 and 4.7) which nevertheless still explained a reasonably high proportion of the variance in the exposure (mean R² between 8.6% and 9.3%). In the third setting, we set J = 20, var(ε_X) = 1 and n_X = n_Y = 50 000. This generated strong genetic instruments (mean F-statistic between 87.7 and 98.2) which explained a reasonably low proportion of the variance in the exposure (mean R² between 3.5% and 3.9%).
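The instrument-strength summaries quoted above can be computed from per-variant summary statistics; a short sketch with hypothetical numbers, using the standard approximations F_j = (β̂_Xj/σ_Xj)² and R²_j ≈ F_j/(F_j + n − 2):

```python
import numpy as np

# Hypothetical per-variant exposure associations and standard errors.
beta_x_hat = np.array([0.10, 0.08, 0.12])
se_x = np.array([0.012, 0.011, 0.013])
n = 50_000

# Per-variant F-statistic and the implied R^2; for independent variants,
# per-variant R^2 values can be summed to approximate the total variance
# in the exposure explained by the instrument set.
F = (beta_x_hat / se_x) ** 2
r2 = F / (F + n - 2)
total_r2 = r2.sum()
```

The conventional rule of thumb treats mean F above 10 as indicating strong instruments, which is the sense in which the first and third settings are "strong" and the second is "weak".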
Comparison methods-We compared our approach with standard IVW estimation and also with an oracle estimator which performed IVW using only the truly valid instruments. We used a weighted median method, MR-Median, which requires half of the variants to be valid but makes neither the InSIDE assumption nor the assumption of balanced pleiotropy. We used BWMR to compare with another Bayesian approach, noting that this method assumes InSIDE. Finally, we compared with MR-cML as well as an extension to this approach, MR-cML-DP. These latter two methods both make the plurality valid assumption and aim to shrink estimates of the pleiotropic effects pertaining to valid instruments toward zero. The extension uses data perturbation (DP) to produce multiple perturbed samples (the default, which we use, being 200 samples), and then takes averages across the samples for estimation of the causal effect and standard error. This significantly increases the computational time, but has been shown to reduce bias and type I error rates.
Other common methods not considered include MR-Egger, MR-RAPS and MR-PRESSO (which assume InSIDE and balanced pleiotropy), and also ConMix, MR-Mix, MR-Mode and MR-Lasso (which assume plurality validity and can handle some directional pleiotropy). In the simulation study of Xue et al. (2021), MR-cML generally outperformed each of these methods in terms of bias, power and type I error rates. Because our simulation study is largely based on that of Xue et al. (2021), it is expected that MR-cML would be the best performing of these, and thus for brevity we do not compare with all of these methods in our study. Note that MR-cML also performed favorably compared with MR-Median in the study of Xue et al. (2021); however, we retain MR-Median here so that we have another comparison method which does not require balanced pleiotropy or InSIDE.
Genetic variants acting on the exposure via a single mechanism-Under the primary simulation parametrisation, each genetic variant acts directly on the exposure, and each invalid genetic variant also acts on the exposure indirectly via the unmeasured exposure-outcome confounders. To explore a more parsimonious setting where genetic variants act on the exposure through only one of these mechanisms, we repeated the simulation study in the first setting (that is, with J = 100, var(ε_X) = 1 and n_X = n_Y = 50 000), but where either β_Xj = 0 or γ_j = 0, with equal probability, for all j relating to invalid instruments.
Genome-wide simulation study-We performed a simulation study which simulated a set of summary statistics representing genetic associations from independent variants across the genome, in order to compare our approach with CAUSE. We simulated J = 100 000 genetic variant associations such that either p = 100 or 20 variants were expected to have non-zero association with the exposure, setting the heritabilities of X and Y to h_X = h_Y = 0.3, respectively. We considered scenarios where either 20%, 40% or 60% of these variants were expected to be pleiotropic. To do this, we first simulated β_Xj, α_j and γ_j from a joint mixture distribution, where q is the proportion of variants associated with the exposure which are invalid.

Grant and Burgess. Am J Hum Genet. Author manuscript; available in PMC 2024 February 22.
For each variant, we computed a p-value using β̂_Xj and its standard error s_j. We applied the same methods used in the primary simulation, taking as inputs the variants with p-value < 5 × 10⁻⁸ (that is, those which pass the conventional genome-wide significance threshold). We also applied CAUSE using all simulated genetic variants, with its default threshold of p-value < 0.001 for classifying a variant as associated with the exposure. Due to the extra computational time required to run the simulations, we restricted this simulation study to 200 replications per scenario.
Multivariable simulation study-To compare methods in the multivariable setting, we simulated from the same model as in the primary simulation with J = 100 and n_X = n_Y, but with two exposures X_k, k = 1, 2, generated analogously, where θ_1 = 0.1 or 0, θ_2 = 0.1, and (ε_X1, ε_X2) were simulated from the bivariate normal distribution with marginal variances of 1 and covariance of 0.3.
As well as the multivariable IVW and multivariable IVW-oracle methods, we also compared with the multivariable median approach (MVMR-Median), GRAPPLE, and the constrained maximum likelihood approach with data perturbation (with 100 perturbed samples), which we denote MVMR-cML-DP. Note that for the methods which require the full covariance matrices of the genetic variant-exposure association estimates (that is, GRAPPLE, MVMR-cML-DP and MVMR-Horse), we input these assuming that the traits are independent, even though they are not. This is to reflect the typical scenario where only GWAS summary statistics are available without trait correlation estimates. Due to the extra computational time required to run the simulation, we restricted this simulation study to 200 replications per scenario.

Applied example
We considered combinations of four exposures: LDL-cholesterol (LDL), triglycerides (TG), fasting glucose (FG) and birth weight (BW), on three outcomes: coronary artery disease (CAD), type 2 diabetes (T2D) and asthma. Each of these exposure-outcome relationships was considered by both Morrison et al. (2020) and Xue et al. (2021), some with conflicting results. Following Morrison et al. (2020), we classify the exposure-outcome pairs as either: having a causal effect supported by the literature (LDL-CAD, TG-CAD, FG-T2D); having unknown or conflicting evidence of a causal effect (BW-CAD, TG-T2D, FG-CAD, BW-
We analysed each of the exposure-outcome pairs using two-sample Mendelian randomization from publicly available GWAS results, as shown in Table 7. All genetic associations were estimated in samples of individuals of European, or predominantly European, ancestry. Note that some of these GWAS datasets differ from those used in Morrison et al. (2020) and Xue et al. (2021), and so there are some differences in the results that we report here.
For each exposure, we chose a set of instruments by first selecting all genetic variants which associated with the trait with a p-value < 5 × 10⁻⁸, then pruned these to have pairwise r² < 0.001.
The set of variants was then harmonised with each outcome dataset, and only variants which appeared in both exposure and outcome datasets were retained. We performed the analyses using the IVW, CAUSE, MR-cML, MR-cML-DP and MR-Horse methods. We used the TwoSampleMR R package (Hemani et al., 2018) for the pruning and harmonisation procedures, and the MendelianRandomization R package (Yavorska and Burgess, 2017; Broadbent et al., 2020) for implementing the IVW, MR-cML and MR-cML-DP methods.

Simulation study results
The mean estimates, standard deviation of estimates, coverage and power / type I error rates for each scenario are shown in Table 1 (J = 100, strong instruments), Table 2 (J = 100, weak instruments), Table 3 (J = 20, strong instruments), Table 4 (single genetic mechanism), Table 5 (genome-wide study) and Table 6 (multivariable setting, showing results relating to estimates of θ_1 only). For MR-Horse and MVMR-Horse, we computed coverage as the proportion of replications for which the true causal effect was within the 95% credible interval, and the rejection rate as the proportion of replications for which the 95% credible interval did not contain zero. For the other methods, except for CAUSE, we used the confidence intervals (or credible intervals) produced by their respective software packages to evaluate coverage and rejection rates in the same way (where 95% confidence intervals for BWMR were produced using the effect estimates and standard errors given by the BWMR package). For CAUSE, we evaluated coverage using the 95% credible interval, and the rejection rate as the proportion of replications for which the p-value (comparing the causal model to the non-causal sharing model) was less than 0.05.
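These metrics are computed mechanically from per-replication interval estimates; a sketch with hypothetical normally distributed estimates:

```python
import numpy as np

rng = np.random.default_rng(7)

# Sketch of the two performance metrics from hypothetical replications:
# coverage = fraction of 95% intervals containing the true effect;
# rejection rate = fraction of 95% intervals excluding zero.
theta_true, se, R = 0.1, 0.03, 1000
est = rng.normal(theta_true, se, R)            # hypothetical point estimates
lower, upper = est - 1.96 * se, est + 1.96 * se

coverage = np.mean((lower <= theta_true) & (theta_true <= upper))
rejection = np.mean((lower > 0) | (upper < 0))
```

When the true effect is non-zero, the rejection rate is the empirical power; when it is zero, the same quantity is the empirical type I error rate.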
When genetic instruments were strong, and explained a large proportion of the variance in the exposure, the mean estimates from MR-cML, MR-cML-DP and MR-Horse were close to those of IVW-Oracle at all levels of pleiotropy. The IVW and BWMR methods showed considerable bias from 40% pleiotropy, as did MR-Median at 60% pleiotropy. Only MR-cML-DP and MR-Horse retained nominal coverage and type I error rates in all scenarios. In general, MR-cML-DP showed slightly higher power, whereas MR-Horse showed slightly higher coverage. These results were similar in the case where this setting was repeated but with genetic variants acting on the exposure only via a single mechanism.

Europe PMC Funders Author Manuscripts
When the genetic instruments were weak, the estimates from IVW and IVW-Oracle attenuated towards the null, which is the expected behaviour of these methods with weak instruments. MR-Median and BWMR showed bias and inflated type I error rates across all medium- to high-pleiotropy settings. MR-Median does not account for the uncertainty in the genetic variant-exposure association estimates, and so is not robust to weak instruments. BWMR does account for this uncertainty, but does not allow for InSIDE violations. MR-cML also showed considerable bias and inflated type I error rates in the medium- to high-pleiotropy settings, as did MR-cML-DP in the high-pleiotropy settings. By contrast, MR-Horse had low levels of bias and type I error rates below the nominal level in all scenarios.
When the genetic instruments were strong but explained a low proportion of the variation in the exposure, the performance of each method was broadly similar to the first setting, but with lower power across the board. At the high pleiotropy levels, MR-cML-DP had type I error rates above the nominal level, whereas MR-Horse did not. In the genome-wide simulation study, MR-Horse had comparable power and bias to CAUSE, with substantially lower type I error rates. It again had slightly higher coverage and lower type I error rates compared with MR-cML-DP, and outperformed all other methods across each metric.
In the multivariable simulation study, MVMR-Horse outperformed IVW, MVMR-Median and GRAPPLE in terms of bias, precision and type I error rates in all scenarios. As in the univariable settings, MVMR-Horse retained type I error rates below the nominal level in all scenarios, with the trade-off of lower power compared with MVMR-cML-DP. MVMR-cML-DP had inflated type I error rates in the higher pleiotropy scenarios.
Overall, the results from the simulation study show that MR-Horse and MVMR-Horse have similar performance to MR-cML-DP and MVMR-cML-DP when genetic instruments are strong and explain a large amount of the variation in the exposure. However, particularly at high levels of pleiotropy, MR-Horse outperforms MR-cML-DP when instruments are weak and/or explain only a small amount of variation in the exposure. Although conservative, MR-Horse retained low bias and type I error rates below the nominal level in all scenarios considered, and was the only method to do so.

Applied example results
Figure 2 shows the estimated log odds ratio of each outcome per standard deviation increase in the genetically-predicted levels of each exposure, together with 95% confidence intervals (IVW, MR-cML, MR-cML-DP) or 95% credible intervals (CAUSE, MR-Horse). For the exposure-outcome combinations with established causality, the results were broadly in line across the different methods. For LDL-T2D, CAUSE did not show evidence of a causal effect. MR-Horse agreed with MR-cML(-DP) that increased LDL may reduce T2D risk, although with a 95% credible interval that came closer to the null. The other conflicting result was that CAUSE and MR-cML showed evidence of a causal effect of TG on T2D risk, whereas MR-cML-DP and MR-Horse produced a null result. Because MR-cML-DP and MR-Horse showed lower type I error rates in our simulation studies, one possible explanation for this difference is that CAUSE and MR-cML may be producing a false positive in this case. All methods showed a causal effect of BW on the risk of CAD and T2D, although the estimates from CAUSE were much closer to the null than the others. Finally, all methods gave a null result for the causal effect of LDL, TG and BW on the risk of asthma. However, MR-Horse showed a possible causal relationship between fasting glucose and risk of asthma, agreeing with MR-cML(-DP).

Discussion
In this paper, we have presented a Bayesian framework for performing Mendelian randomization with summary-level data which is robust to general forms of instrument invalidity. The approach does not require the InSIDE assumption and so can handle correlated pleiotropy. The method is easily generalised to the multivariable Mendelian randomization setting, providing another robust method for multivariable studies. Our simulation studies showed that the approach has favourable performance in comparison with commonly used methods across a variety of settings.
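The form of invalidity the framework targets can be illustrated by simulating summary statistics from the standard pleiotropy model, in which the variant-outcome association is β_Yj = θ β_Xj + α_j, with the pleiotropic effects α_j of invalid instruments correlated with the variant-exposure effects (an InSIDE violation). The IVW estimate is then biased away from a true null. All parameter values below are illustrative, not those of the paper's simulation study:

```python
import numpy as np

rng = np.random.default_rng(1)
J, theta = 100, 0.0                          # true causal effect is null
bx = rng.normal(0.1, 0.02, J)                # variant-exposure associations
invalid = rng.random(J) < 0.4                # ~40% invalid instruments
alpha = 0.5 * bx + rng.normal(0, 0.005, J)   # pleiotropy correlated with bx
by = theta * bx + np.where(invalid, alpha, 0.0) + rng.normal(0, 0.005, J)

# IVW estimate: inverse-variance-weighted regression of by on bx through the origin
se_by = np.full(J, 0.005)
w = 1.0 / se_by**2
theta_ivw = np.sum(w * bx * by) / np.sum(w * bx**2)
print(round(theta_ivw, 3))  # biased upward from the true null
```

Because the correlated pleiotropy acts like a shared causal pathway, the weighted regression attributes part of it to θ; methods that model the α_j terms directly aim to remove this component.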
Our approach has similarities to previously proposed Bayesian approaches, but has been designed to avoid many of the restrictions that these methods pose. Berzuini et al. (2020) first proposed a Bayesian framework for Mendelian randomization with pleiotropy and suggested the use of the horseshoe prior to account for the presence of some valid and some invalid instruments. However, their method requires individual-level data and makes the InSIDE assumption. Zhao et al. (2019) (BWMR) and Bucur et al. (2020) (BAYESMR) developed versions of the Berzuini et al. approach using summary-level data; these similarly place a shrinkage prior on the α_j terms and again require InSIDE to be satisfied. Cheng et al. (2021) introduced a correlation parameter into a Bayesian model to account for correlated pleiotropy; however, their model requires that pleiotropic effects are either all correlated or all uncorrelated with the variant-exposure effects. The methods of Morrison et al. (2020) and Cheng et al. (2022) are able to account for both correlated and uncorrelated pleiotropy, but make a further assumption that the genetic instruments are associated with the exposure either entirely through the confounders, or entirely through a direct effect.
Our approach utilises the horseshoe prior as suggested by Berzuini et al., can be applied using summary-level data, and can account for both correlated and uncorrelated pleiotropy without any assumption on the pathways between the genetic instruments and exposure. Furthermore, we have shown how to extend the framework to be used for multivariable Mendelian randomization.
In comparison to frequentist approaches for Mendelian randomization, our method is based on the same causal model as that used by Xue et al. (2021), which we believe is a very general model for instrument invalidity. With strong genetic instruments, the performance of our method in simulation studies was comparable to Xue et al.'s method when the latter was applied with the data perturbation extension in scenarios with low- to mid-levels of pleiotropy. However, when the genetic instruments were weak and/or explained a small amount of variation in the exposure, MR-Horse outperformed all other methods considered. In practice, Mendelian randomization studies with weak instruments, both in the sense of low F statistics and a low proportion of variation in the exposure explained, are common, particularly with the sort of complex exposures which are increasingly studied with this design. It is also likely, particularly when many genetic variants are selected as instruments from high-powered GWAS, that the proportion of invalid instruments is similar to the high-pleiotropy scenarios considered in our simulation study. Thus, the simulation scenarios whose results are shown in Tables 2 (weak instruments) and 3 (low proportion of variation explained) are likely to closely reflect many real-world settings. MR-Horse was the only method considered which retained low levels of bias, and coverage and type I error rates at their nominal levels, in these cases.
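Instrument strength in the summary-data setting is conventionally gauged by the F statistic. A common approximation, used here purely as an illustration of the quantity referred to above, squares the per-variant z-score and averages across instruments:

```python
def mean_f_statistic(beta_x, se_x):
    """Approximate mean F statistic across instruments from summary data,
    taking each variant's F as the square of its z-score, (beta/se)^2.
    Values below ~10 are often taken to indicate weak instruments."""
    return sum((b / s) ** 2 for b, s in zip(beta_x, se_x)) / len(beta_x)
```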
The trade-off for the Bayesian approach retaining low type I error rates is that it sometimes had lower power than the frequentist approaches. The increased conservatism of the Bayesian approach supports the assertion that the horseshoe prior is able to avoid shrinkage of the pleiotropic effects pertaining to invalid instruments. Besides improved type I error control, another advantage of our framework is that it allows the model specification to be adapted to specific settings. For example, the prior distribution of the global shrinkage parameter could be adjusted, or set at a fixed value, based on prior knowledge of the level and nature of pleiotropy. As long as these prior assumptions are reasonable, such incorporation of prior knowledge would be expected to reduce the uncertainty in the estimates and increase power. Although the implementation of the method in this paper has used some weakly-informative priors for the model parameters, an interesting area for future development would be to explore the use of domain knowledge to further inform the model inputs.
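The selective-shrinkage behaviour attributed to the horseshoe prior here can be seen from its shrinkage coefficient. Under the standard horseshoe formulation (with global scale τ and half-Cauchy local scales λj; not necessarily the exact parameterisation used in MR-Horse), the fraction of an effect retained after shrinkage is τ²λj²/(1 + τ²λj²), so a large local scale leaves a genuinely pleiotropic effect essentially unshrunk while small local scales pull the remaining terms to zero:

```python
def retained_fraction(lam, tau):
    """Fraction of a coefficient retained under horseshoe shrinkage:
    tau^2 * lam^2 / (1 + tau^2 * lam^2).
    Near 1: the effect escapes shrinkage (invalid instrument's pleiotropy
    is preserved); near 0: the effect is pulled to zero."""
    t2l2 = (tau * lam) ** 2
    return t2l2 / (1.0 + t2l2)
```

The heavy Cauchy tail of λj is what allows a minority of large pleiotropic effects to escape the strong global shrinkage set by a small τ.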
Not taken into account in our model are sample overlap and linkage disequilibrium. Most of the methods which are commonly used for Mendelian randomization assume independent genetic variants, and achieve this by pruning candidate instruments according to an r² threshold. In general, the use of correlated variants is unlikely to add substantially more information to an analysis, and methods which are able to incorporate linkage disequilibrium tend to be very sensitive to mis-specification of genetic correlations (Burgess et al., 2017). Nonetheless, there are situations where correlated variants may be desirable, such as in a cis-Mendelian randomization study exploring drug targets (Schmidt et al., 2020). In this case, correlation estimates could be incorporated into our model by specifying joint distributions for the genetic variant-exposure and genetic variant-outcome association estimates which include genetic correlations, as is done by Cheng et al. (2022) to account for linkage disequilibrium.
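Pruning to an r² threshold is typically a greedy procedure: rank candidate instruments by exposure p-value and discard any variant whose squared correlation with an already-retained variant exceeds the threshold. A minimal sketch with hypothetical inputs and an illustrative threshold, not the TwoSampleMR clumping routine:

```python
def prune(snps, pvals, r2, threshold=0.001):
    """Greedy LD pruning.

    snps: iterable of SNP ids; pvals: dict id -> exposure p-value;
    r2: dict frozenset({id1, id2}) -> squared correlation (absent pairs
    are treated as uncorrelated). Variants are considered in order of
    significance; a variant is kept only if its r2 with every variant
    already retained is at or below the threshold.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: pvals[s]):
        if all(r2.get(frozenset({snp, k}), 0.0) <= threshold for k in kept):
            kept.append(snp)
    return kept
```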
Sample overlap is more likely to be an issue in practice, since many published GWAS results are from meta-analyses which include common data sources such as the UK Biobank (Sudlow et al., 2015). The models of Morrison et al. (2020) and Cheng et al. (2022) account for sample overlap by including correlation between β_Yj and β_Xj in a joint distribution. Our model could be adapted in a similar way to account for sample overlap if it was believed to be substantial.
Assumptions that we have made are the linearity of the effect of the exposures on the outcome, and homogeneity or no effect modification. Furthermore, an implicit assumption made in developing the method and simulation study is that all traits are continuous. In practice, summary-level Mendelian randomization methods can be performed with categorical traits, for example by taking genetic association estimates from logistic regression. This may bias causal effect estimates because of the non-collapsibility of odds ratios; however, the bias will tend to be toward the null, at least in the univariable case (Bowden et al., 2017).
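Non-collapsibility means the marginal odds ratio can differ from a stratum-constant conditional odds ratio even in the absence of confounding. A small worked example with a balanced binary covariate (all risks chosen purely for illustration) shows the marginal estimate attenuated toward the null:

```python
def odds(p):
    return p / (1 - p)

# Outcome risks in two equally sized strata of a covariate U, with U
# balanced across exposure groups (so U is not a confounder). The risks
# are chosen so the conditional odds ratio is 2.25 in both strata.
risk_exposed   = {"u0": 0.2, "u1": 0.8}
risk_unexposed = {"u0": 0.1, "u1": 0.64}

or_u0 = odds(0.2) / odds(0.1)    # conditional OR in stratum u0: 2.25
or_u1 = odds(0.8) / odds(0.64)   # conditional OR in stratum u1: 2.25

# Collapse over U: marginal risks are the averages across the two strata.
p_marg_exposed = (0.2 + 0.8) / 2     # 0.5
p_marg_unexposed = (0.1 + 0.64) / 2  # 0.37
or_marginal = odds(p_marg_exposed) / odds(p_marg_unexposed)
# or_marginal ~ 1.70 < 2.25: attenuated toward the null, with no confounding
```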
Overall, our Bayesian framework is an effective and flexible method for both univariable and multivariable Mendelian randomization. It provides an important sensitivity analysis to these studies which is robust to violations of the InSIDE assumption and directional pleiotropy. Furthermore, it can be easily adapted to handle sample overlap or linkage disequilibrium.
Table 1. Results of simulations for the scenarios with 100 strong genetic instruments. Reported are the mean estimate (Mean), standard deviation of estimates (SD), proportion of replications for which the 95% confidence interval (or credible interval, in the case of MR-Horse) contained the true causal effect (Cov.) and proportion of replications for which the 95% confidence interval (or credible interval) did not contain zero (Rej.).

Table 3. Results of simulations for the scenarios with 20 strong genetic instruments. Reported are the mean estimate (Mean), standard deviation of estimates (SD), proportion of replications for which the 95% confidence interval (or credible interval, in the case of MR-Horse) contained the true causal effect (Cov.) and proportion of replications for which the 95% confidence interval (or credible interval) did not contain zero (Rej.).

Table 5. Results of simulations for the genome-wide scenarios with either J = 100 or J = 20 genetic variants associated with the exposure. Reported are the mean estimate (Mean), standard deviation of estimates (SD), proportion of replications for which the 95% confidence interval (or credible interval, in the cases of CAUSE and MR-Horse) contained the true causal effect (Cov.) and proportion of replications for which the 95% confidence interval (or credible interval) did not contain zero or, in the case of CAUSE, for which the causal null hypothesis was rejected (Rej.).

Figure 1.
Figure 1. Directed acyclic graphs showing the relationship between the jth genetic variant (Gj), exposure (X), outcome (Y) and unmeasured confounders (U) when: (A) Gj is a valid instrument; (B) Gj is invalid due to a direct effect on Y; (C) Gj is invalid due to both a direct effect on Y and an indirect effect via U; (D) Gj is invalid due to both a direct effect on Y and being caused by U; (E) Gj is invalid due to either (i) a direct effect on Y or (ii) both a direct effect on Y and an indirect effect via U, but with no direct effect on X.

Figure 2.
Figure 2. Mendelian randomization estimates and 95% confidence intervals (IVW, MR-cML, MR-cML-DP) or 95% credible intervals (CAUSE, MR-Horse) for the different exposure-outcome combinations. Plots coloured in red indicate exposure-outcome relationships considered causal. Plots coloured in yellow indicate exposure-outcome pairs with unknown causal relationships. Plots coloured in blue indicate exposure-outcome relationships considered not causal. In each plot, the vertical dashed line indicates a log odds ratio of zero.