Detecting potential causal relationship between multiple risk factors and Alzheimer’s disease using multivariable Mendelian randomization

Background: Alzheimer’s disease (AD) is a progressive brain disorder characterized by deterioration of cognitive skills that affects many elderly individuals. The genetic loci identified for AD do not explain the large variability in AD, and very few causal factors have been identified so far. Results: mvMR showed that more years of schooling (OR=0.674, 95% CI: 0.571-0.796, P=3.337E-06) and genetically elevated HDL cholesterol (OR=0.761, 95% CI: 0.697-0.830, P=6.940E-10) were inversely associated with AD risk, whereas genetically predicted total cholesterol (OR=1.300, 95% CI: 1.196-1.412, P=6.223E-10) and LDL cholesterol (OR=1.193, 95% CI: 1.097-1.296, P=3.564E-05) were associated with increased AD risk. Genetically predicted FG was suggestively associated with increased AD risk. Furthermore, MR-BMA analysis also confirmed FG and years of schooling as two of the top five causal risk factors for AD. Conclusions: Our findings might provide novel insights into the causal risk factors for AD and AD-related complex diseases, and into their treatment and intervention. Methods: Using two extensions of Mendelian randomization (MR), multivariable MR (mvMR) and MR based on Bayesian model averaging (MR-BMA), we aimed to estimate the potential causal relationships between nine risk factors and AD and to prioritize the most likely causal risk factors for AD.

[...] factor on the outcome by using genetic instrumental variables (IVs). Previous two-sample MR studies have demonstrated several potential causal risk factors for AD, such as lower education level and smoking [10]. Multivariable MR (mvMR) [11] is an extension of the two-sample MR approach that incorporates a set of pleiotropic SNPs [12] associated with several risk factors to simultaneously assess the causal effect of each risk factor on the outcome. In comparison with two-sample MR, mvMR assumes that each genetic IV is associated with at least one risk factor, although not necessarily with all of them. Additionally, in the presence of horizontal pleiotropy, causal effects can be assessed even if none of the variants show specific associations with any individual risk factor [11]. Burgess et al. [13] successfully applied mvMR to estimate the causal effects of lipid fractions on coronary artery disease. However, current applications of mvMR are not capable of ranking and selecting risk factors.
Recently, Zuber et al. developed a novel approach [14] that combines multivariable MR with Bayesian model averaging (MR-BMA), scales to high-dimensional settings, and can select biomarkers as causal risk factors for the disease of interest. Their study demonstrated that the method can detect and prioritize true risk factors even when the risk factors are highly correlated [14]. The MR-BMA approach has been successfully applied to prioritize the most likely causal metabolites for age-related macular degeneration [14].
Of all the previously reported risk factors for AD, it remains unclear which are causal and which may play the most pivotal role in disease susceptibility. In the current study we integrate the mvMR and MR-BMA approaches to identify and prioritize the most likely causal risk factors for AD. The risk factors included are body mass index (BMI), type 2 diabetes (T2D), high-density lipoprotein cholesterol (HDL cholesterol), low-density lipoprotein cholesterol (LDL cholesterol), total cholesterol, fasting glucose (FG), fasting insulin (FI), current tobacco smoking, and years of schooling.

Genetic IVs selection and validation
Overall, we obtained 1235 LD-independent SNPs that achieved genome-wide significance across all the risk factors after implementing the pruning strategy previously described. These SNPs were then extracted from the AD dataset. After harmonizing the exposure and outcome datasets, 1159 SNPs remained for the MR analysis. The number of IVs included for each risk factor is shown in Table 1, and detailed characteristics of the SNPs used for each risk factor are given in Supplementary Table 1.

mvMR estimates results
Our standard MR approach showed that genetically increased years of schooling (OR = 0.674, 95% confidence interval (CI): 0.571-0.796, P = 3.337E-06) and elevated HDL cholesterol (OR = 0.761, 95% CI: 0.697-0.830, P = 6.940E-10) were significantly associated with decreased risk of AD. We found a significant association between total cholesterol and AD: the odds ratio per genetically predicted 1 SD higher total cholesterol level was 1.300 (95% CI: 1.196-1.412, P = 6.223E-10) (Table 2 and Figure 1). Elevated LDL cholesterol level was associated with increased AD susceptibility (OR = 1.193, 95% CI: 1.097-1.296 per SD increment in genetically determined LDL cholesterol level, P = 3.564E-05). In addition, there was a suggestive association between FG and AD: a one SD increase in FG was associated with a 29.7% increase in the odds of AD (OR = 1.297, 95% CI: 1.013-1.661, P = 0.039). No association was observed between the other risk factors and AD; detailed results are shown in Figure 1 and Table 2.
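As a reading aid, the reported odds ratios, confidence intervals, and P values can be related to one another on the log-odds scale. The sketch below is not part of the original analysis: it assumes a normal approximation and a symmetric 95% CI on the log scale, and the helper name `or_summary` is hypothetical.

```python
import math

def or_summary(or_point, ci_low, ci_high):
    """Recover log-OR, standard error, z and two-sided P from an OR and its 95% CI.

    Assumes the CI is symmetric on the log scale (normal approximation).
    """
    beta = math.log(or_point)                                     # causal estimate on the log-odds scale
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)  # CI width = 2 * 1.96 * SE
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))                          # two-sided normal P value
    return beta, se, z, p

# Fasting glucose estimate reported in the text: OR = 1.297, 95% CI 1.013-1.661
beta, se, z, p = or_summary(1.297, 1.013, 1.661)
print(f"log-OR={beta:.3f}  SE={se:.3f}  z={z:.2f}  P={p:.3f}")
```

Under these assumptions the recovered P value for FG is close to the reported 0.039, which lies between the Bonferroni threshold and 0.05 and is therefore classed as suggestive.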

Sensitivity analysis
Consistent with the standard IVW results, the MLM and weighted median results also showed a significant association between years of schooling and AD (Table 2). In the weighted median analysis, genetically determined higher HDL cholesterol level was associated with decreased AD risk (Table 2). A similar significant association between total cholesterol and AD was observed with the MLM approach. As in the main MR results, the MLM approach also found a suggestive association between FG and AD. We still did not detect any association between the remaining risk factors and AD. Furthermore, the MR-Egger test suggested that there was no pleiotropic effect among the selected IVs for each risk factor (Table 3).
For the bi-directional MR analysis, 16 LD-independent SNPs that reached genome-wide significance were selected as IVs for AD, and these SNPs were then extracted from each of the nine outcomes individually. The number of valid IVs left for each outcome after data harmonization is shown in Table 4. The results showed a significant association between AD and BMI (OR ranging from 0.980 to 0.995, P = 0.002) and a borderline association between AD and T2D (OR = 1.047, 95% CI: 1.000-1.096, P = 0.049, Table 4). The MR-Egger intercept suggested no pleiotropy among the selected IVs. In addition, as a complementary approach, the Steiger test results showed that the variances explained in the exposures were larger than that in the outcome (AD), and the causal direction turned out to be TRUE (Table 5).

MR-BMA estimates results
[...] (Table 6). The MR-BMA results were partially consistent with the mvMR results: MR-BMA prioritized two of the risk factors identified by mvMR (FG and years of schooling).

DISCUSSION
In the present study, by performing mvMR and MR-BMA analyses together using summary statistics for AD and multiple risk factors, we identified five causal risk factors (years of schooling, total cholesterol, HDL cholesterol, LDL cholesterol and FG) for AD, and we prioritized and ranked two of these five (FG and years of schooling), which might provide novel insights into determining the causal risk factors for complex traits and diseases.
[Figure panel B: The best ten individual models according to their posterior probability (PP).]

Our results are consistent with both previous traditional observational studies and two-sample MR results, which provide established evidence that educational attainment is associated with a reduced risk of AD [21][22][23].
Concentrations of genetically determined total cholesterol and LDL cholesterol showed positive associations with AD risk, which is consistent with their known causal effect on AD risk from a previous two-sample MR study [24]. As in our results, elevated HDL cholesterol level in that study also showed a causal association with decreased AD risk [24]. The suggestive association between FG and higher AD risk is also consistent with the previous study, in which FG showed a suggestive association with AD risk in sensitivity analysis [24]. The absence of causal associations between fasting insulin, BMI and AD risk is likewise consistent with the previous two-sample MR result [24].
Although our mvMR did not find any causal association between current tobacco smoking and AD risk, current tobacco smoking was the top-ranked risk factor in the MR-BMA model, which is partially in accordance with the previous MR study [24].
However, the observed associations differ from a previous study that found no evidence of a causal association between lipid profiles and AD risk after excluding potentially pleiotropic SNPs. In the current study, the independent SNPs (r² < 0.001) we included showed no evidence of pleiotropic effects, and our sensitivity analyses support the causal associations detected by the main MR analysis. One of the most interesting findings of the present study is the absence of a causal association between BMI and AD risk, whereas the bi-directional MR found a causal association between AD and decreased BMI. This finding supports the hypothesis that reverse causation (the negative confounding effect of AD-related weight loss) might explain the obesity paradox in AD risk [4].
The current study has several important strengths. Our MR results may provide evidence of the causal role of five risk factors in the development of AD, since the influence of the confounding factors that affect traditional observational studies is minimized. Because alleles are randomly assorted when gametes form at meiosis, the causal effect of genotype on disease in MR studies is not distorted by confounding factors, a major limitation of traditional observational studies. By leveraging summary statistics from large available GWASs for multiple risk factors and AD, we were able to increase our discovery power. Furthermore, previous studies have shown that MR analyses based on summary statistics and on individual-level data have similar efficiency [19]. Finally, the application of MR-BMA ranked the potential risk factors for AD, which could provide genetic evidence to inform disease prevention and treatment.
However, there are also some limitations. First, we included mixed-population data for lipid traits instead of European-only data, because the lipid dataset we included was the largest to date and individuals from other ethnicities make up only 4% of it (Table 1). Additionally, our MR results do not prove that the potentially causal risk factors identified play truly causal roles in AD susceptibility; rather, we aimed to provide novel insights into the underlying mechanisms of AD and genetic evidence to inform disease prevention.

CONCLUSIONS
In conclusion, by combining mvMR and MR-BMA, we identified five potential causal risk factors for AD and ranked and prioritized two of them, which might provide novel insights into the causal mechanisms of AD. Our results suggest that increasing years of schooling and HDL cholesterol level, and decreasing total cholesterol, LDL cholesterol and FG levels, could decrease the risk of developing AD.

Genetic IVs selection and validation
Summary statistics for risk factor-associated SNPs were extracted from the largest publicly available GWAS datasets to date, performed by the corresponding consortia in European populations (Table 1). For the implementation of mvMR, we selected as IVs the SNPs that achieved genome-wide significance (p < 5 × 10⁻⁸) in the GWAS dataset for each risk factor. Effect estimates of these risk factor-associated SNPs on the risk of AD were assessed using the summary statistics of 74,046 European individuals for AD from the International Genomics of Alzheimer's Project (IGAP) Consortium [15]. The European samples from the 1000 Genomes Project reference panel were used to estimate linkage disequilibrium (LD) between chosen SNPs. When target SNPs were not available in the outcome study, we used proxy SNPs that were in high LD (r² > 0.8) with the SNPs of interest.
To ensure that the SNPs used as IVs for the risk factors are not in LD with each other, a vital assumption of MR, we calculated pairwise LD between all selected SNPs in the 1000 Genomes European reference sample using PLINK 1.90 [16]. For every pair of SNPs that violated the independence assumption (r² > 0.001), we retained only the SNP with the smaller association p-value. To ensure that the effect of a SNP on the exposure and its effect on the outcome correspond to the same allele, we harmonized the effects of these instrumental SNPs using a function that ensures all corresponding risk factor and outcome (AD) alleles are on the same strand where possible. If they are not, the function flips the alleles and uses allele frequency to infer the strand of palindromic SNPs.
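The harmonization logic described above can be illustrated with a minimal sketch. This is not the function used in the study, which the text does not name; the helper `harmonise`, its arguments, and the 0.42 minor allele frequency cutoff for palindromic SNPs are assumptions made for illustration, and only single-nucleotide alleles are handled.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def harmonise(exp_ea, exp_oa, out_ea, out_oa, beta_out,
              exp_eaf=None, out_eaf=None, maf_cut=0.42):
    """Align the outcome effect to the exposure effect allele.

    Returns the (possibly sign-flipped) outcome beta, or None when the SNP
    cannot be aligned and should be dropped. Hypothetical helper.
    """
    if COMPLEMENT[exp_ea] == exp_oa:
        # Palindromic SNP (A/T or C/G): alleles look identical on both strands,
        # so the strand is inferred from effect allele frequencies instead.
        if exp_eaf is None or out_eaf is None:
            return None
        if min(exp_eaf, 1 - exp_eaf) > maf_cut or min(out_eaf, 1 - out_eaf) > maf_cut:
            return None  # frequency too close to 0.5 to infer the strand
        same_allele = (exp_eaf < 0.5) == (out_eaf < 0.5)
        return beta_out if same_allele else -beta_out

    # Try the outcome alleles as given, then on the complementary strand.
    for ea, oa in ((out_ea, out_oa), (COMPLEMENT[out_ea], COMPLEMENT[out_oa])):
        if (ea, oa) == (exp_ea, exp_oa):
            return beta_out        # same effect allele
        if (ea, oa) == (exp_oa, exp_ea):
            return -beta_out       # effect and other allele swapped: flip the sign
    return None                    # alleles are incompatible

print(harmonise("A", "G", "G", "A", 0.10))            # swapped alleles
print(harmonise("A", "G", "T", "C", 0.10))            # opposite strand
print(harmonise("A", "T", "A", "T", 0.10, 0.2, 0.8))  # palindromic, strand inferred
```

SNPs for which the function returns None would be excluded before the MR analysis, which is one reason the number of IVs can shrink after harmonization.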

mvMR estimates
In the current study, standard inverse-variance-weighted (IVW) fixed-effects [17] analysis was used to estimate the causal effects of the multiple related risk factors on AD. After obtaining the selected instruments for each exposure, the SNP associations with all exposures were regressed together against the SNP associations with the outcome, weighted by the inverse variance of the outcome associations so that genetic instruments with more precise associations receive more weight in the analysis.
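The weighted regression described above can be sketched as follows. This is an illustrative implementation on simulated summary statistics, not the authors' code; the function name `mv_ivw`, the simulated effect sizes, and the choice to fix the residual variance at 1 (one common fixed-effects convention) are assumptions.

```python
import numpy as np

def mv_ivw(beta_x, beta_y, se_y):
    """Multivariable IVW: regress SNP-outcome effects on SNP-exposure effects.

    beta_x: (n_snps, n_exposures) SNP-exposure associations
    beta_y: (n_snps,) SNP-outcome associations (log-odds scale)
    se_y:   (n_snps,) standard errors of beta_y
    No intercept is fitted; weights are 1 / se_y**2.
    """
    w = 1.0 / se_y**2
    xtw = beta_x.T * w                     # X'W, shape (n_exposures, n_snps)
    cov = np.linalg.inv(xtw @ beta_x)      # (X'WX)^-1
    theta = cov @ (xtw @ beta_y)           # weighted least-squares estimate
    se = np.sqrt(np.diag(cov))             # fixed-effects SE (residual variance set to 1)
    return theta, se

# Simulated example with three exposures (values are arbitrary)
rng = np.random.default_rng(0)
n_snps = 200
true_theta = np.array([0.3, -0.4, 0.0])
beta_x = rng.normal(0.0, 0.05, size=(n_snps, 3))
se_y = rng.uniform(0.01, 0.02, size=n_snps)
beta_y = beta_x @ true_theta + rng.normal(0.0, se_y)

theta_hat, theta_se = mv_ivw(beta_x, beta_y, se_y)
odds_ratios = np.exp(theta_hat)            # OR per unit increase in each exposure
print(theta_hat, theta_se, odds_ratios)
```

Because all exposures enter the same regression, each coefficient is the effect of one risk factor on the outcome conditional on the others, which is what distinguishes mvMR from running a separate univariable MR per risk factor.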

Sensitivity analysis
As sensitivity analyses, the weighted median estimator and the maximum likelihood method (MLM) [18] were also applied to provide more robust MR estimates. The MR-Egger approach [19] was used to assess potential pleiotropic effects among the selected IVs. A Bonferroni-corrected threshold of P < 0.006 (0.05/9) was considered evidence of a significant causal association, and 0.006 < P < 0.05 was considered suggestive evidence of a causal association. In addition, to orient the causal relationship between the risk factors and AD, we performed a bi-directional MR analysis (P < 0.05) and the MR Steiger directionality test [20].
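The significance rule above can be written out explicitly. The sketch below is only an illustration of the thresholds stated in the text; the function name `classify` is hypothetical, and it uses the unrounded threshold 0.05/9 (about 0.0056) rather than the rounded 0.006.

```python
def classify(p_value, n_tests=9, alpha=0.05):
    """Label a P value using a Bonferroni threshold for n_tests risk factors."""
    threshold = alpha / n_tests            # 0.05 / 9, about 0.0056
    if p_value < threshold:
        return "significant"
    if p_value < alpha:
        return "suggestive"
    return "not significant"

# P values reported for the mvMR analysis in the text
reported = {
    "years of schooling": 3.337e-06,
    "HDL cholesterol": 6.940e-10,
    "total cholesterol": 6.223e-10,
    "LDL cholesterol": 3.564e-05,
    "fasting glucose": 0.039,
}
labels = {name: classify(p) for name, p in reported.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```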

MR-BMA estimates
Following the mvMR analysis, MR-BMA was applied to prioritize the most likely causal risk factors for AD. MR-BMA assumes that only a few of the candidate risk factors are truly causal and treats risk factor selection as a variable selection problem in a linear regression model. The approach considers all possible combinations of the risk factors and generates a posterior probability (PP) for each specific model, where PP is the probability of that particular combination of risk factors given the data. Using BMA, MR-BMA then computes a marginal inclusion probability (MIP) for each risk factor, defined as the sum of the PPs over all models in which the risk factor is present, together with a model-averaged causal estimate (MACE) for each risk factor. Risk factors are ranked according to their MIP, and the best individual models are prioritized according to their PP. All analyses were implemented in the R software environment.
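The relationship between PP, MIP and MACE can be made concrete with a simplified sketch. It is not the published MR-BMA implementation: model evidence is approximated here with BIC instead of the method's own marginal likelihood, the empty model is skipped, and the function name `mr_bma_sketch`, the prior inclusion probability of 0.1, and the simulated data are all assumptions.

```python
import itertools
import numpy as np

def mr_bma_sketch(beta_x, beta_y, se_y, prior_prob=0.1):
    """Enumerate risk factor subsets and return models, PP, MIP and MACE.

    Model evidence uses a BIC approximation (a stand-in, not the published method).
    """
    n, k = beta_x.shape
    X = beta_x / se_y[:, None]             # scaling by se_y makes OLS equal to IVW
    y = beta_y / se_y
    models, log_scores, estimates = [], [], []
    for size in range(1, k + 1):           # empty model omitted for simplicity
        for subset in itertools.combinations(range(k), size):
            Xs = X[:, subset]
            theta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ theta) ** 2))
            bic = n * np.log(rss / n) + size * np.log(n)
            log_prior = size * np.log(prior_prob) + (k - size) * np.log(1 - prior_prob)
            full = np.zeros(k)
            full[list(subset)] = theta      # excluded risk factors have effect 0
            models.append(subset)
            log_scores.append(-0.5 * bic + log_prior)
            estimates.append(full)
    log_scores = np.array(log_scores)
    pp = np.exp(log_scores - log_scores.max())
    pp /= pp.sum()                          # posterior probability of each model
    mip = np.array([sum(p for m, p in zip(models, pp) if j in m) for j in range(k)])
    mace = pp @ np.array(estimates)         # PP-weighted average of the estimates
    return models, pp, mip, mace

# Simulated data: risk factors 0 and 3 are causal, 1 and 2 are not
rng = np.random.default_rng(1)
n_snps, k = 150, 4
true_theta = np.array([0.5, 0.0, 0.0, -0.4])
beta_x = rng.normal(0.0, 0.05, size=(n_snps, k))
se_y = np.full(n_snps, 0.01)
beta_y = beta_x @ true_theta + rng.normal(0.0, se_y)

models, pp, mip, mace = mr_bma_sketch(beta_x, beta_y, se_y)
best_model = models[int(np.argmax(pp))]
print("ranking by MIP:", np.argsort(-mip))
print("best model:", best_model, "MACE:", np.round(mace, 2))
```

With nine risk factors, as in the present study, the enumeration covers 2^9 - 1 = 511 non-empty models, which is small enough to evaluate exhaustively.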

Availability of data and materials
The datasets generated and/or analyzed during the current study are included in this published article and provided in Supplementary

AUTHOR CONTRIBUTIONS
QZ as the first author performed data analysis and wrote the manuscript. FX, LW and WDZ contributed suggestions for manuscript revision and revised the manuscript. CQS and HWD conceived and initiated this project, provided advice on experimental design, oversaw the implementation of the statistical method, and revised/finalized the manuscript.

ACKNOWLEDGMENTS
We thank all the consortia (listed in Supplementary Table 1) for making their GWAS datasets publicly available.