A General Approach to Sensitivity Analysis for Mendelian Randomization

Statistics in Biosciences

Abstract

Mendelian Randomization (MR) is a class of instrumental variable methods that use genetic variants as instruments. It has become popular in epidemiological studies as a way to account for unmeasured confounders when estimating the effect of an exposure on an outcome. The validity of MR rests on three critical assumptions that are difficult to verify, so sensitivity analysis methods are needed to evaluate results and draw plausible conclusions. We propose a general, easy-to-apply approach to sensitivity analysis for MR studies. Bound et al. (J Am Stat Assoc 90:443–450. 10.2307/2291055, 1995) derived a formula for the asymptotic bias of the instrumental variable estimator; building on their work, we derive a new sensitivity analysis formula. Its parameters include sensitivity parameters such as the correlation between the instruments and an unmeasured confounder, the direct effect of the instruments on the outcome, and the strength of the instruments. In simulation studies, we examine our approach in various scenarios using either individual SNPs or an unweighted allele score as instruments. Using a previously published dataset from a bone mineral density study, we demonstrate that the proposed method is a useful tool for MR studies and that investigators can combine their domain knowledge with it to obtain bias-corrected results and draw informed conclusions about the scientific plausibility of their findings.
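
As an illustration of how a bias formula of this kind can be combined with domain knowledge, the Python sketch below tabulates bias-corrected estimates over a grid of sensitivity parameters for a single instrument (an individual SNP or an allele score). It assumes the single-instrument bias expression from Appendix 1, \((\beta_{u2}\gamma_z+\beta_z)/(\alpha_z+\alpha_{u2}\gamma_z)\), and parameterizes the confounder path by regression coefficients rather than by a correlation. The function names and all numeric inputs are hypothetical, and this is not the authors' software.

```python
import numpy as np

def single_instrument_bias(first_stage, beta_u2, gamma_z, beta_z):
    """Single-instrument bias (beta_u2*gamma_z + beta_z) / (alpha_z + alpha_u2*gamma_z).

    first_stage is alpha_z + alpha_u2*gamma_z, the first-stage coefficient of the
    exposure on the instrument; the other arguments are sensitivity parameters.
    """
    return (beta_u2 * gamma_z + beta_z) / first_stage

def sensitivity_table(beta_hat, first_stage, beta_u2, gamma_grid, beta_z_grid):
    """Bias-corrected estimates; rows index gamma_z values, columns beta_z values."""
    g, b = np.meshgrid(gamma_grid, beta_z_grid, indexing="ij")
    return beta_hat - single_instrument_bias(first_stage, beta_u2, g, b)

# Hypothetical inputs for illustration only (not from the paper's data).
table = sensitivity_table(beta_hat=0.30, first_stage=0.50, beta_u2=0.2,
                          gamma_grid=np.array([0.0, 0.1, 0.2]),
                          beta_z_grid=np.array([0.0, 0.05]))
print(np.round(table, 3))
```

Under the model used in Appendix 1, the denominator \(\alpha_z+\alpha_{u2}\gamma_z\) is the population first-stage coefficient of the exposure on the instrument, so it can be estimated from the data; only the confounder path \(\beta_{u2}\gamma_z\) and the direct effect \(\beta_z\) would need to be supplied from prior knowledge.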

Figs. 1–6

References

  1. Auerbach J et al (2018) Causal modeling in a multi-omic setting: insights from GAW20. BMC Genet 19:74. https://doi.org/10.1186/s12863-018-0645-4

  2. Basmann RL (1957) A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica 25:77–83

  3. Bauchet M et al (2007) Measuring European population stratification with microarray genotype data. Am J Hum Genet 80:948–956. https://doi.org/10.1086/513477

  4. Bound J, Jaeger D, Baker R (1995) Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J Am Stat Assoc 90:443–450. https://doi.org/10.2307/2291055

  5. Burgess S, Bowden J, Fall T, Ingelsson E, Thompson SG (2017) Sensitivity analyses for robust causal inference from mendelian randomization analyses with multiple genetic variants. Epidemiology 28:30–42. https://doi.org/10.1097/EDE.0000000000000559

  6. Burgess S, Thompson SG (2014) Mendelian randomization: methods for using genetic variants in causal estimation. Chapman & Hall/CRC interdisciplinary statistics series. Chapman & Hall/CRC, Boca Raton

  7. Burgess S, Thompson SG (2015) Mendelian randomization: methods for using genetic variants in causal estimation. Chapman & Hall/CRC interdisciplinary statistics series. CRC Press, Taylor & Francis Group, Boca Raton

  8. Chao J, Swanson NR (2007) Alternative approximations of the bias and MSE of the IV estimator under weak identification with an application to bias correction. J Econom 137:515–555. https://doi.org/10.1016/j.jeconom.2005.09.002

  9. Conley TG, Hansen CB, Rossi PE (2012) Plausibly exogenous. Rev Econ Stat 94:260–272. https://doi.org/10.1162/REST_a_00139

  10. Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL (1959) Smoking and lung cancer: recent evidence and a discussion of some questions. J Natl Cancer Inst 22:173–203

  11. Davey Smith G, Ebrahim S (2005) What can mendelian randomisation tell us about modifiable behavioural and environmental exposures? BMJ 330:1076–1079. https://doi.org/10.1136/bmj.330.7499.1076

  12. Harding DJ (2003) Counterfactual models of neighborhood effects: the effect of neighborhood poverty on dropping out and teenage pregnancy. Am J Sociol 109:676–719. https://doi.org/10.1086/379217

  13. Dimitri P (2018) Fat and bone in children—where are we now? Ann Pediatr Endocrinol Metab 23:62–69. https://doi.org/10.6065/apem.2018.23.2.62

  14. Gastwirth JL, Krieger AM, Rosenbaum PR (1998) Dual and simultaneous sensitivity analysis for matched pairs. Biometrika 85:907–920. https://doi.org/10.1093/biomet/85.4.907

  15. Goh WWB, Wang W, Wong L (2017) Why batch effects matter in omics data, and how to avoid them. Trends Biotechnol 35:498–507. https://doi.org/10.1016/j.tibtech.2017.02.012

  16. Golding J (1990) Children of the nineties. A longitudinal study of pregnancy and childhood based on the population of Avon (ALSPAC). West Engl Med J 105:80–82

  17. Greenland S (1996) Basic methods for sensitivity analysis of biases. Int J Epidemiol 25:1107–1116

  18. Haavelmo T (1944) The probability approach in econometrics. Econometrica 12:1–15

  19. Hackinger S, Zeggini E (2017) Statistical methods to detect pleiotropy in human complex traits. Open Biol. https://doi.org/10.1098/rsob.170125

  20. Katan MB (2004) Apolipoprotein E isoforms, serum cholesterol, and cancer. 1986. Int J Epidemiol 33:9. https://doi.org/10.1093/ije/dyh312

  21. Kolesár M, Chetty R, Friedman J, Glaeser E, Imbens GW (2015) Identification and inference with many invalid instruments. J Bus Econ Stat 33:474–484. https://doi.org/10.1080/07350015.2014.978175

  22. Leek JT et al (2010) Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 11:733–739. https://doi.org/10.1038/nrg2825

  23. Lin DY, Psaty BM, Kronmal RA (1998) Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54:948–963

  24. Listgarten J, Kadie C, Schadt EE, Heckerman D (2010) Correction for hidden confounders in the genetic analysis of gene expression. Proc Natl Acad Sci USA 107:16465–16470. https://doi.org/10.1073/pnas.1002425107

  25. Harding M, Hausman J, Palmer CJ (2016) Finite sample bias corrected IV estimation for weak and many instruments. Adv Econom 36:245–273

  26. Michaelson JJ, Loguercio S, Beyer A (2009) Detection and interpretation of expression quantitative trait loci (eQTL). Methods 48:265–276. https://doi.org/10.1016/j.ymeth.2009.03.004

  27. Neuman JA, Isakov O, Shomron N (2013) Analysis of insertion-deletion from deep-sequencing data: software evaluation for optimal detection. Brief Bioinform 14:46–55. https://doi.org/10.1093/bib/bbs013

  28. Novembre J et al (2008) Genes mirror geography within Europe. Nature 456:98–101. https://doi.org/10.1038/nature07331

  29. Palmer TM et al (2012) Using multiple genetic variants as instrumental variables for modifiable risk factors. Stat Methods Med Res 21:223–242. https://doi.org/10.1177/0962280210394459

  30. Rosenbaum PR (1987) Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika 74:13–26. https://doi.org/10.2307/2336017

  31. Rosenbaum PR, Rubin DB (1983) Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Ser B (Methodol) 45:212–218

  32. Seldin MF, Price AL (2008) Application of ancestry informative markers to association studies in European Americans. PLoS Genet 4:e5. https://doi.org/10.1371/journal.pgen.0040005

  33. Sivakumaran S et al (2011) Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet 89:607–618. https://doi.org/10.1016/j.ajhg.2011.10.004

  34. Small DS (2007) Sensitivity analysis for instrumental variables regression with overidentifying restrictions. J Am Stat Assoc 102:1049–1058. https://doi.org/10.1198/016214507000000608

  35. Davey Smith G, Ebrahim S (2003) ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 32:1–22

  36. Davey Smith G, Ebrahim S (2004) Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol 33:30–42. https://doi.org/10.1093/ije/dyh132

  37. Theil H (1953a) Estimation and simultaneous correlation in complete equation systems. Central Planning Bureau. Mimeo, The Hague

  38. Theil H (1953b) Repeated least squares applied to complete equation systems. Central Planning Bureau. Mimeo, The Hague

  39. Theil H (1958) Economic forecasts and policy. Central Planning Bureau. Mimeo, The Hague

  40. Timpson NJ, Sayers A, Davey Smith G, Tobias JH (2009) How does body fat influence bone mass in childhood? A Mendelian randomization approach. J Bone Miner Res 24:522–533. https://doi.org/10.1359/jbmr.081109

  41. Vanderweele TJ, Arah OA (2011) Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology 22:42–52. https://doi.org/10.1097/EDE.0b013e3181f74493

  42. Wang X, Jiang Y, Zhang NR, Small DS (2018) Sensitivity analysis and power for instrumental variable studies. Biometrics. https://doi.org/10.1111/biom.12873

  43. Wosje KS, Khoury PR, Claytor RP, Copeland KA, Kalkwarf HJ, Daniels SR (2009) Adiposity and TV viewing are related to less bone accrual in young children. J Pediatr 154:79–85.e72. https://doi.org/10.1016/j.jpeds.2008.06.031

  44. Wright PG (1928) The tariff on animal and vegetable oils. The Institute of Economics Investigations in international commercial policies, vol 26. MacMillan, New York

  45. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X (2011) Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89:82–93. https://doi.org/10.1016/j.ajhg.2011.05.029

  46. Wu Y et al (2018) Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits. Nat Commun 9:918. https://doi.org/10.1038/s41467-018-03371-0

  47. Zhang W, Ghosh D (2017) On the use of kernel machines for Mendelian randomization. Quant Biol 5:368–379. https://doi.org/10.1007/s40484-017-0124-3

  48. Zhu Z et al (2016) Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet 48:481–487. https://doi.org/10.1038/ng.3538

Acknowledgements

This research was supported by the National Science Foundation under Grant No. NSF ABI 1457935 and the National Institutes of Health under Grant R01 GM117946.

Author information

Correspondence to Debashis Ghosh.

Ethics declarations

Conflict of interest

Weiming Zhang and Debashis Ghosh declare that they have no conflict of interest.

Appendix 1

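We assume that the \(p\) genetic variants are mutually independent. We first expand \(\sigma_{\widehat{x},\varepsilon}\), the covariance between the fitted exposure \(\widehat{x}\) and the error term \(\varepsilon\) in Eq. (5). Because \(U_1\) and \(e_y\) are independent of \(Z\), and \(U_2=\gamma_0+Z\gamma_z+e_{u2}\) with \(e_{u2}\) independent of \(Z\), only the \(U_2\) and \(Z\beta_z\) terms of \(\varepsilon\) contribute: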

$$\begin{aligned}\sigma_{\widehat{x},\varepsilon} &= cov\left(\widehat{x},\varepsilon\right) \\ &= cov\left(\alpha_{0}+\alpha_{u2}\gamma_{0}+Z(\alpha_{z}+\alpha_{u2}\gamma_{z}),\ \beta_{u1}U_{1}+\beta_{u2}U_{2}+Z\beta_{z}+e_{y}\right) \\ &= cov\left(Z(\alpha_{z}+\alpha_{u2}\gamma_{z}),\ \beta_{u2}U_{2}+Z\beta_{z}\right) \\ &= cov\left(Z(\alpha_{z}+\alpha_{u2}\gamma_{z}),\ \beta_{u2}U_{2}\right) + cov\left(Z(\alpha_{z}+\alpha_{u2}\gamma_{z}),\ Z\beta_{z}\right) \\ &= cov\left(Z(\alpha_{z}+\alpha_{u2}\gamma_{z}),\ \beta_{u2}(\gamma_{0}+Z\gamma_{z}+e_{u2})\right) + (\alpha_{z}+\alpha_{u2}\gamma_{z})^{\prime}var(Z)\beta_{z} \\ &= (\alpha_{z}+\alpha_{u2}\gamma_{z})^{\prime}var(Z)\beta_{u2}\gamma_{z} + (\alpha_{z}+\alpha_{u2}\gamma_{z})^{\prime}var(Z)\beta_{z} \\ &= (\alpha_{z}+\alpha_{u2}\gamma_{z})^{\prime}var(Z)(\beta_{u2}\gamma_{z}+\beta_{z}) = \sum_{i=1}^{p}(\alpha_{zi}+\alpha_{u2}\gamma_{zi})(\beta_{u2}\gamma_{zi}+\beta_{zi})var(Z_{i})\end{aligned}$$

Similarly, the variance of the fitted exposure is

$$\sigma_{\widehat{x}}^{2}=var\left(\alpha_{0}+\alpha_{u2}\gamma_{0}+Z(\alpha_{z}+\alpha_{u2}\gamma_{z})\right)=(\alpha_{z}+\alpha_{u2}\gamma_{z})^{\prime}var(Z)(\alpha_{z}+\alpha_{u2}\gamma_{z})=\sum_{i=1}^{p}(\alpha_{zi}+\alpha_{u2}\gamma_{zi})^{2}var(Z_{i}).$$

Hence, the asymptotic bias of the IV estimator is

$$\frac{\sigma_{\widehat{x},\varepsilon}}{\sigma_{\widehat{x}}^{2}}=\frac{\sum_{i=1}^{p}(\alpha_{zi}+\alpha_{u2}\gamma_{zi})(\beta_{u2}\gamma_{zi}+\beta_{zi})var(Z_{i})}{\sum_{i=1}^{p}(\alpha_{zi}+\alpha_{u2}\gamma_{zi})^{2}var(Z_{i})}.$$
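
The ratio above can be evaluated directly once values for the sensitivity parameters are posited. The following is a minimal Python sketch, assuming independent SNPs coded additively (0/1/2) so that \(var(Z_i)=2f_i(1-f_i)\) under Hardy-Weinberg equilibrium with allele frequency \(f_i\). The function name `iv_bias`, the allele frequencies, the parameter values and the naive estimate are all hypothetical.

```python
import numpy as np

def iv_bias(alpha_z, gamma_z, beta_z, var_z, alpha_u2, beta_u2):
    """Asymptotic bias sigma_{xhat,eps} / sigma^2_{xhat} for p independent SNPs.

    alpha_z, gamma_z, beta_z, var_z are length-p sequences (one entry per SNP);
    alpha_u2 and beta_u2 are scalars (effects of confounder U2 on X and Y).
    """
    alpha_z, gamma_z, beta_z, var_z = (np.asarray(a, dtype=float)
                                       for a in (alpha_z, gamma_z, beta_z, var_z))
    first_stage = alpha_z + alpha_u2 * gamma_z   # alpha_zi + alpha_u2 * gamma_zi
    violation = beta_u2 * gamma_z + beta_z       # beta_u2 * gamma_zi + beta_zi
    numerator = np.sum(first_stage * violation * var_z)
    denominator = np.sum(first_stage ** 2 * var_z)
    return numerator / denominator

# Hypothetical example: two SNPs with allele frequencies 0.3 and 0.5.
freq = np.array([0.3, 0.5])
var_z = 2 * freq * (1 - freq)                    # additive coding under HWE
bias = iv_bias(alpha_z=[0.2, 0.1], gamma_z=[0.1, 0.0], beta_z=[0.0, 0.05],
               var_z=var_z, alpha_u2=0.5, beta_u2=0.4)
naive_estimate = 0.35                            # hypothetical 2SLS estimate
print(f"bias = {bias:.4f}, bias-corrected estimate = {naive_estimate - bias:.4f}")
```

With a single SNP the \(var(Z_i)\) terms cancel, so the same function reproduces the simplified single-SNP expression.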

When there is a single SNP (p = 1), the bias simplifies to

$$\sigma_{\widehat{x},\varepsilon}/\sigma_{\widehat{x}}^{2}=\beta_{u2}\gamma_{z}/(\alpha_{z}+\alpha_{u2}\gamma_{z})+\beta_{z}/(\alpha_{z}+\alpha_{u2}\gamma_{z}).$$
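
A quick Monte Carlo check of the single-SNP expression can be written as a Python sketch. The data-generating process below is an assumption chosen to match the terms that appear in the derivation (\(U_1\) independent of \(Z\), \(U_2=\gamma_z Z+e_{u2}\), a direct effect \(\beta_z\)), with intercepts set to zero because they do not affect covariances; the true causal effect `beta_x`, the coefficient `alpha_u1`, and all parameter values are hypothetical. With one instrument the 2SLS estimate equals the Wald ratio \(cov(Z,Y)/cov(Z,X)\), which should approach `beta_x` plus the bias above.

```python
import numpy as np

def simulate_single_snp(n=400_000, seed=0, freq=0.3,
                        alpha_z=0.3, alpha_u1=0.8, alpha_u2=0.5, gamma_z=0.2,
                        beta_x=0.5, beta_u1=0.7, beta_u2=0.6, beta_z=0.1):
    """Return (simulated Wald ratio, beta_x + predicted single-SNP bias)."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(2, freq, size=n).astype(float)  # additive coding 0/1/2
    u1 = rng.normal(size=n)                          # confounder independent of Z
    u2 = gamma_z * z + rng.normal(size=n)            # confounder associated with Z
    x = alpha_z * z + alpha_u1 * u1 + alpha_u2 * u2 + rng.normal(size=n)
    y = beta_x * x + beta_u1 * u1 + beta_u2 * u2 + beta_z * z + rng.normal(size=n)

    wald = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # single-instrument IV estimate
    bias = (beta_u2 * gamma_z + beta_z) / (alpha_z + alpha_u2 * gamma_z)
    return wald, beta_x + bias

wald, predicted = simulate_single_snp()
print(f"simulated IV estimate {wald:.3f}, beta_x + bias {predicted:.3f}")
```

With these hypothetical values the predicted limit is 0.5 + 0.22/0.4 = 1.05, and the simulated ratio should land close to it; the gap shrinks as `n` grows.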

When multiple SNPs are used in MR, it has been suggested that a summary score be used in place of the individual SNPs to reduce finite-sample bias. In that situation, the simplified equation can be applied by treating the summary score as the single instrument.
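
For an unweighted allele score \(S=\sum_i Z_i\) built from independent SNPs, a short derivation of our own under the model of this appendix gives \(cov(S,\varepsilon)/cov(S,x)=\sum_i(\beta_{u2}\gamma_{zi}+\beta_{zi})var(Z_i)/\sum_i(\alpha_{zi}+\alpha_{u2}\gamma_{zi})var(Z_i)\). This is the single-instrument expression with each coefficient replaced by its regression on \(S\); unlike the p-SNP ratio, each SNP enters with weight \(var(Z_i)\) rather than \((\alpha_{zi}+\alpha_{u2}\gamma_{zi})var(Z_i)\). The Python sketch below checks this by simulation; the data-generating process, the function name and every parameter value are hypothetical.

```python
import numpy as np

def allele_score_check(n=400_000, seed=1):
    """Compare the IV estimate from an unweighted allele score with beta_x + bias."""
    rng = np.random.default_rng(seed)
    freq = np.array([0.2, 0.35, 0.5])         # hypothetical allele frequencies
    alpha_z = np.array([0.15, 0.10, 0.20])    # SNP -> exposure
    gamma_z = np.array([0.00, 0.10, 0.05])    # SNP -> confounder U2
    beta_z = np.array([0.00, 0.00, 0.08])     # SNP -> outcome (direct effect)
    alpha_u1, alpha_u2 = 0.8, 0.5             # confounders -> exposure
    beta_x, beta_u1, beta_u2 = 0.5, 0.7, 0.6  # exposure/confounders -> outcome

    z = rng.binomial(2, freq, size=(n, freq.size)).astype(float)  # independent SNPs
    u1 = rng.normal(size=n)
    u2 = z @ gamma_z + rng.normal(size=n)
    x = z @ alpha_z + alpha_u1 * u1 + alpha_u2 * u2 + rng.normal(size=n)
    y = beta_x * x + beta_u1 * u1 + beta_u2 * u2 + z @ beta_z + rng.normal(size=n)

    score = z.sum(axis=1)                     # unweighted allele score S
    wald = np.cov(score, y)[0, 1] / np.cov(score, x)[0, 1]

    var_z = 2 * freq * (1 - freq)             # binomial(2, f) variance
    bias = (np.sum((beta_u2 * gamma_z + beta_z) * var_z)
            / np.sum((alpha_z + alpha_u2 * gamma_z) * var_z))
    return wald, beta_x + bias

wald_score, predicted_score = allele_score_check()
print(f"allele-score IV estimate {wald_score:.3f}, beta_x + bias {predicted_score:.3f}")
```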

About this article

Cite this article

Zhang, W., Ghosh, D. A General Approach to Sensitivity Analysis for Mendelian Randomization. Stat Biosci 13, 34–55 (2021). https://doi.org/10.1007/s12561-020-09280-5

  • DOI: https://doi.org/10.1007/s12561-020-09280-5
