A novel framework with automated horizontal pleiotropy adjustment in Mendelian randomization

Summary
The presence of horizontal pleiotropy in Mendelian randomization (MR) analysis has long been a concern due to its potential to induce substantial bias. In recent years, many robust MR methods have been proposed to address this by relaxing the “no horizontal pleiotropy” assumption. Here, we propose a novel two-stage framework called CMR, which integrates a conditional analysis of multiple genetic variants to remove pleiotropy induced by linkage disequilibrium, followed by the application of robust MR methods to model the conditional genetic effect estimates. We demonstrate how the conditional analysis can reduce horizontal pleiotropy and improve the performance of existing MR methods. Extensive simulation studies covering a wide range of horizontal pleiotropy scenarios showed superior performance of the proposed CMR framework over the standard MR framework, in which marginal genetic effects are modeled. Moreover, the application of CMR to a negative control outcome analysis and to an investigation of the causal role of body mass index across various diseases highlighted its potential to deliver more reliable results in real-world applications.
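The two-stage idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes standardized genotypes and phenotypes (so that marginal effects equal the LD matrix times the conditional effects), uses a plain inverse-variance weighted (IVW) estimator as a stand-in for the robust MR methods, and all function names and toy numbers are hypothetical.

```python
import numpy as np

def conditional_effects(b_marginal, ld, n):
    """Stage 1 (sketch): turn marginal effects into conditional effects.

    Assumes standardized genotypes and phenotype, so that
    b_marginal is approximately ld @ b_conditional.
    """
    ld_inv = np.linalg.inv(ld)
    b_cond = ld_inv @ b_marginal
    # Approximate standard errors, ignoring variance explained by the variants.
    se_cond = np.sqrt(np.diag(ld_inv) / n)
    return b_cond, se_cond

def ivw(bx, by, se_y):
    """Stage 2 stand-in: fixed-effect inverse-variance weighted estimate."""
    w = 1.0 / se_y**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)

# Hypothetical toy inputs (not taken from the paper).
n = 50_000
ld = np.array([[1.0, 0.6, 0.0],
               [0.6, 1.0, 0.3],
               [0.0, 0.3, 1.0]])
theta = 0.1
bx_cond_true = np.array([0.10, 0.00, 0.08])  # variant 2 has no effect on X
direct_on_y = np.array([0.00, 0.05, 0.00])   # variant 2 acts on Y directly
by_cond_true = theta * bx_cond_true + direct_on_y

# Noise-free marginal effects implied by LD.
bx_marg = ld @ bx_cond_true
by_marg = ld @ by_cond_true
se_marg = np.full(3, 1.0 / np.sqrt(n))

# Standard framework: model the marginal effects directly.
theta_marginal = ivw(bx_marg, by_marg, se_marg)

# Conditional framework: condition first, then model conditional effects.
bx_cond, _ = conditional_effects(bx_marg, ld, n)
by_cond, se_y_cond = conditional_effects(by_marg, ld, n)
theta_conditional = ivw(bx_cond, by_cond, se_y_cond)

print(f"marginal IVW:    {theta_marginal:.3f}")
print(f"conditional IVW: {theta_conditional:.3f}")
```

In this toy setting, variants 1 and 3 pick up the direct outcome effect of variant 2 through LD, so the marginal estimate is pulled away from the true value of 0.1, whereas the conditional effects of variants 1 and 3 no longer carry that contribution.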


Supplemental Figures
Forest plots of causal effect estimates (x-axis) of BMI on multiple diseases (y-axis) obtained from different methods in the standard MR framework (with and without Steiger filtering) and in the CMR framework. Solid circles represent statistically significant results after Bonferroni correction (p ≤ 0.05/14). 'MR' applies the standard MR method alone, 'Steiger' applies the standard MR method after Steiger filtering, and 'CMR' implements the proposed CMR framework.
Next, we consider the scenario where $G_j$ has an additional pleiotropic pathway to $Y$ (Fig. S7B), as specified in Eq. (4). In CMR, the conditional genetic effect of $Z_j$ on the outcome, $\beta^*_{Yj} := \operatorname{plim} \hat{\beta}^*_{Yj}$, is given in Eq. (5), which suggests that $Z_j$ is still valid in the CMR framework, even when the effect of $G_j$ on $X$ is nonlinear. However, when the pleiotropic effect of $G_j$ is not linear in Eq. (4), i.e., $Y = \theta X + h(G_j) + \epsilon_Y$, where $h(G_j)$ is some nonlinear function of $G_j$, then the second term in Eq. (5) is generally non-zero, making $Z_j$ still invalid in CMR. We will consider

Figure S1. Additional simulation results for Scenarios 2 (left) and 3 (right). Here we directly used the 20 generated IVs $Z_j$ in both the standard MR and CMR implementations, without selecting IVs via LD clumping. From top to bottom: type-I errors, mean squared error (MSE), and boxplots of estimates across 500 replicates.

Figure S2. Q-Q plots for the negative control analysis of 96 pairs. Within each plot, 'MR' applies the standard MR method alone, 'Steiger' applies the standard MR method after Steiger filtering, and 'CMR' implements the proposed CMR framework.

Figure S7. Directed acyclic graph among genetic variants $Z_j$ and $G_j$, exposure $X$, outcome $Y$, and unmeasured confounder $U$.

Table S1.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 168 and 143.
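The captions report average F-statistics before and after conditioning without stating how they are computed. As a rough sketch, assuming the common per-variant approximation F_j = (β̂_Xj / se_j)² and a simple mean across variants (both are assumptions, and the function name and toy numbers are hypothetical), the calculation could look like this:

```python
import numpy as np

def mean_f_statistic(beta_x, se_x):
    """Average of per-variant F-statistics, approximated as (beta / se)^2.

    This approximation is an assumption; the source does not state
    which formula was used.
    """
    beta_x = np.asarray(beta_x, dtype=float)
    se_x = np.asarray(se_x, dtype=float)
    return float(np.mean((beta_x / se_x) ** 2))

# Hypothetical variant-exposure estimates for one simulated data set.
f_before = mean_f_statistic([0.12, 0.10, 0.15], [0.010, 0.010, 0.010])
f_after = mean_f_statistic([0.11, 0.09, 0.14], [0.011, 0.011, 0.011])
print(f"mean F before conditioning: {f_before:.1f}")
print(f"mean F after conditioning:  {f_after:.1f}")
```

Averaging this quantity over simulation replicates would give numbers comparable in form to those quoted in the table notes.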

Table S2.
Simulation results for Scenario 1 with 20% of IVs having biological horizontal pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05, 0.1. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.
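The summary metrics named in these table captions (MSE, T1E, SD, SE) can be computed from per-replicate output along the following lines. This is a minimal sketch with hypothetical names and toy numbers; the 0.05 significance level used for the rejection rate is an assumption, since the captions do not state it.

```python
import numpy as np

def summarize_replicates(estimates, std_errors, p_values, theta_true, alpha=0.05):
    """Summarize simulation replicates for one method and one value of theta."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    p = np.asarray(p_values, dtype=float)
    return {
        "MSE": float(np.mean((est - theta_true) ** 2)),  # mean squared error
        "SD": float(np.std(est, ddof=1)),                # empirical SD of estimates
        "SE": float(np.mean(se)),                        # mean estimated standard error
        # Equals the type-1 error (T1E) when theta_true is 0, power otherwise.
        "rejection_rate": float(np.mean(p < alpha)),
    }

# Four hypothetical replicates under the null (theta = 0).
summary = summarize_replicates(
    estimates=[0.02, -0.01, 0.03, 0.00],
    std_errors=[0.02, 0.02, 0.01, 0.03],
    p_values=[0.20, 0.60, 0.01, 0.90],
    theta_true=0.0,
)
print(summary)
```

Comparing SD with SE indicates whether the estimated standard errors are well calibrated: a mean SE well below the empirical SD usually goes together with an inflated type-1 error.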

Table S4.
Simulation results for Scenario 1 with 60% of IVs having biological horizontal pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05, 0.1. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 175 and 150.

Table S5.
Simulation results for Scenario 2 across 500 replicates. From left to right: causal effect θ = 0, 0.05, 0.1. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 148 and 144.

Table S6.
Simulation results for Scenario 3 across 500 replicates. From left to right: causal effect θ = 0, 0.05, 0.1. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 168 and 168.

Table S7.
Simulation results with all valid IVs across 500 replicates. From left to right: causal effect θ = 0, 0.05. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 168 and 163.

Table S8.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 168 and 159.

Table S9.
Simulation results for Scenario 1 with 40% of IVs having LD-induced horizontal pleiotropy and no IV with biological pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.

Table S10.
Simulation results for Scenario 1 with 60% of IVs having LD-induced horizontal pleiotropy and no IV with biological pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 167 and 148.

Table S11.
Simulation results for Scenario 1 with 80% of IVs having LD-induced horizontal pleiotropy and no IV with biological pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.

Table S12.
Simulation results for Scenario 2 with 20% of IVs having LD-induced horizontal pleiotropy and no IV with biological pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 172 and 161.

Table S13.
Simulation results for Scenario 2 with 40% of IVs having LD-induced horizontal pleiotropy and no IV with biological pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.

Table S14.
Simulation results for Scenario 2 with 60% of IVs having LD-induced horizontal pleiotropy and no IV with biological pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 174 and 154.

Table S15.
Simulation results for Scenario 2 with 80% of IVs having LD-induced horizontal pleiotropy and no IV with biological pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.05. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.

Table S16.
Simulation results for Scenario 4 with 20% of IVs having LD-induced horizontal pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.1. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.
Average F-statistics when θ = 0 across 500 replicates before and after conditioning are 48 and 44.

Table S17.
Simulation results for Scenario 4 with 60% of IVs having LD-induced horizontal pleiotropy across 500 replicates. From left to right: causal effect θ = 0, 0.1. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.

Table S19.
Simulation results for Scenario SB across 500 replicates. From left to right: causal effect θ = 0, 0.2. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.

Table S20.
Simulation results for Scenario SC across 500 replicates. From left to right: causal effect θ = 0, 0.2. The top part corresponds to results from standard MR methods and the bottom part corresponds to results from the CMR implementation. MSE: mean squared error. T1E: type-1 error. SD: standard deviation of estimates. SE: mean of estimated standard errors.