Abstract
Mendelian randomization (MR) is an epidemiological approach that utilizes genetic variants as instrumental variables to estimate the causal effect of a modifiable but likely confounded exposure on a health outcome. This paper investigates an MR scenario in which different subsets of genetic variants identify different causal effects. These variants may aggregate into clusters, and such variant clusters are likely to emerge if they affect the exposure and outcome via distinct biological pathways. In the framework of multi-outcome MR, where a common risk factor causally impacts several disease outcomes simultaneously, these variant clusters can reflect the heterogeneous effects this shared risk factor concurrently exerts on all the diseases under examination. This, in turn, can provide insights into the disease-causing mechanisms underpinning the co-occurrence of multiple long-term conditions, a phenomenon known as multimorbidity. To identify such variant clusters, we adapt the general method of Agglomerative Hierarchical Clustering (AHC) to the summary data MR setting, enabling cluster detection based on the variant-specific causal estimates, using only genome-wide summary statistics. In particular, we tailor the method for multi-outcome MR to aid the elucidation of the potentially multifaceted causal pathways underlying multimorbidity stemming from a shared risk factor. We show in various Monte Carlo simulations that our ‘MR-AHC’ method detects variant clusters with high accuracy, outperforming the existing multi-dimensional clustering methods. In an application example, we use the method to analyze the causal effects of high body fat percentage on a pair of well-known multimorbid conditions, type 2 diabetes (T2D) and osteoarthritis (OA), discovering distinct variant clusters reflecting heterogeneous causal effects. 
Pathway analyses of these variant clusters indicate interconnected cellular processes underlying the co-occurrence of T2D and OA, while the protective effect of higher adiposity on T2D may be linked to enhanced activity of ion channels related to insulin secretion.
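To make the clustering idea in the abstract concrete, the following is a minimal sketch in Python, not the authors' MR-AHC implementation (their R code is in the repository listed under Data Availability). It assumes variant-specific causal estimates are formed as ratios of the variant-outcome to the variant-exposure association, and it groups variants with an unweighted Ward-type agglomerative procedure that stops at a user-chosen number of clusters. The function names, the fixed cluster count and the toy summary statistics are all illustrative assumptions; the sketch ignores the precision of each estimate and does not reproduce any data-driven cluster selection the paper may use.

```python
from itertools import combinations


def wald_ratios(beta_exposure, beta_outcomes):
    """Variant-specific ratio estimates, one value per outcome for each variant."""
    return [[b_out / b_exp for b_out in outs]
            for b_exp, outs in zip(beta_exposure, beta_outcomes)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    n_a, n_b = len(a["members"]), len(b["members"])
    dist2 = sum((x - y) ** 2 for x, y in zip(a["mean"], b["mean"]))
    return n_a * n_b / (n_a + n_b) * dist2


def merge(a, b):
    """Combine two clusters and update the (unweighted) cluster mean."""
    n_a, n_b = len(a["members"]), len(b["members"])
    mean = [(n_a * x + n_b * y) / (n_a + n_b)
            for x, y in zip(a["mean"], b["mean"])]
    return {"members": a["members"] + b["members"], "mean": mean}


def ahc(points, n_clusters):
    """Agglomerative clustering: start with singletons, merge the cheapest pair."""
    clusters = [{"members": [j], "mean": list(p)} for j, p in enumerate(points)]
    while len(clusters) > n_clusters:
        i, k = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]))
        merged = merge(clusters[i], clusters[k])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, k)]
        clusters.append(merged)
    return sorted(sorted(c["members"]) for c in clusters)


# Toy summary statistics (invented for illustration): six variants, two outcomes.
# Variants 0-2 imply effects near (0.5, 0.3); variants 3-5 imply effects near (-0.2, 0.3).
beta_exposure = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]
beta_outcomes = [
    [0.051, 0.029], [0.059, 0.037], [0.041, 0.024],
    [-0.021, 0.032], [-0.019, 0.028], [-0.020, 0.030],
]

ratios = wald_ratios(beta_exposure, beta_outcomes)
clusters = ahc(ratios, n_clusters=2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

In this toy example the two recovered clusters differ in their estimated effect on the first outcome, including its sign, while agreeing on the second. This mirrors the kind of heterogeneity the abstract describes for a shared risk factor acting on two conditions.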
Competing Interest Statement
Tim Frayling has received funding from GSK and consulted for Sanofi and Boehringer Ingelheim. Jack Bowden is a part-time employee of Novo Nordisk, engaged in work unrelated to this project.
Funding Statement
This work was funded by the Strategic Priority Fund "Tackling multimorbidity at scale" programme (grant number MC/MR/WO14548/1), delivered by the Medical Research Council and the National Institute for Health and Care Research in partnership with the Economic and Social Research Council and in collaboration with the Engineering and Physical Sciences Research Council. Nicolas Apfel is supported by the ESRC grant EST013567/1.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All data used in this paper are publicly available. GWAS summary statistics for body fat percentage are obtained from the original study, Supplementary material 1d; the type 2 diabetes data are downloaded from http://diagram-consortium.org/downloads.html; osteoarthritis data are available from https://r8.risteys.finngen.fi/phenocode/M13_ARTHROSIS_INCLAVO; data for GST, CAT, SOD, GPX and CRP are downloaded from https://gwas.mrcieu.ac.uk/; data for IL-6, IL-8, IL-12, IL-1B, TNF-A from https://data.bris.ac.uk/data/dataset/c4e3b263f392bb23cd62997d1b14da05; data for GDF-15 from https://www.ebi.ac.uk/gwas/efotraits/EFO_0009181; data for bipolar disorder and major depressive disorder from https://pgc.unc.edu/for-researchers/download-results/; data for waist-to-hip ratio from https://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files#GWAS_2010_WHRadjBMI_Summary_Statistics; data for HDL-C and total cholesterol from https://csg.sph.umich.edu/willer/public/lipids2013/; data for coronary artery disease from https://www.cardiogramplusc4d.org/data-downloads/. The R code that implements MR-AHC and generates the simulation datasets is available on GitHub: https://github.com/xiaoran-liang/MRAHC.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Application results updated; Supplemental files updated.
Data Availability
All data used in the applied examples in the present study are available upon reasonable request to the authors. The R code that generates the simulation datasets is available online at https://github.com/xiaoran-liang/MRAHC.