Genetic overlap between Alzheimer's disease and blood lipid levels

Late-onset Alzheimer's disease (AD) has a significant genetic component, but the molecular mechanisms through which genetic risk factors contribute to AD pathogenesis are unclear. We screened for genetic sharing between AD and the blood levels of 615 metabolites to elucidate how the polygenic architecture of AD affects metabolomic profiles. We retrieved summary statistics from genome-wide association studies of AD and the metabolite blood levels and assessed for shared genetic etiology, using a polygenic risk score-based approach. For the blood levels of 31 metabolites, all of which were lipids, we identified and replicated genetic sharing with AD. We also found a positive genetic concordance - implying that genetic risk factors for AD are associated with higher blood levels - for 16 of the 31 replicated metabolites. In the brain, lipids and their intermediate metabolites have essential structural and functional roles, such as forming and dynamically regulating synaptic membranes. Our results imply that genetic risk factors for AD affect lipid levels, which may be leveraged to develop novel treatment strategies for AD.


Abbreviations: Aβ, amyloid-beta; NFT, neurofibrillary tangle; PRS-based, polygenic risk score-based; SECA, SNP effect concordance analysis.
E-mail address: geert.poelmans@radboudumc.nl (G. Poelmans).

Introduction
Alzheimer's disease (AD) is the most common neurodegenerative disease and causes over two-thirds of dementia cases (Patterson, 2018). Histopathologically, AD is defined by the accumulation of two protein aggregates: extracellular amyloid-beta (Aβ) in the form of senile plaques and intracellular neurofibrillary tangles containing hyperphosphorylated tau protein (Jack et al., 2018). AD patients have progressive memory loss and reduced cognitive capabilities until they require full-time healthcare. The drugs used for the treatment of AD temporarily relieve symptoms at best, and disease-modifying drugs that halt or reverse disease progression are currently unavailable. The reason for this is that the molecular basis of AD pathogenesis is not fully understood, and elucidating these pathways may provide us with clues that aid in the search for novel therapeutic strategies for AD.
AD is a multifactorial disease caused by a complex interplay of genetic, environmental, and lifestyle factors (Livingston et al., 2020). Genetic factors play a major role in AD, as supported by heritability estimates of around 60%-80% (Bekris et al., 2010). The APOE-ε4 allele is the strongest common genetic risk factor for AD, but in contrast to rare autosomal dominant mutations that have been found in AD, it does not invariably cause disease (Bekris et al., 2010). Increasing age is the strongest risk factor, and based on the age of disease onset, a distinction can be made between two types of AD: early-onset AD (<65 years) and late-onset AD (LOAD; ≥65 years). Multiple genome-wide association studies (GWASs) have been performed in order to identify common single nucleotide polymorphisms (SNPs) that each contribute a small genetic risk to the development of LOAD. The most recent AD GWAS (a meta-analysis) included the genetic data of more than 600,000 individuals and identified 29 significant AD risk loci (Jansen et al., 2019).
The cumulative effect of these genetic risk variants, reflecting the polygenicity of AD, can be captured using a polygenic risk score (PRS)-based approach (Purcell et al., 2009). PRS of AD also predict case status in approximately 80% of AD patients in an independent sample (Jansen et al., 2019). However, the precise molecular pathways affected by the polygenic architecture of AD remain unclear.
Several perturbed metabolic processes have been reported in AD, for example glucose and insulin signaling, lipid metabolism and oxidative stress ( Cai et al., 2012 ). Metabolites, the products of metabolism, can be studied and quantified with metabolomics ( Wilkins and Trushina, 2017 ). Metabolite changes that have been observed in AD are not limited to the brain itself but are observed in the cerebrospinal fluid and, more distally, in the blood as well, highlighting the systemic nature of the disease ( de Leeuw et al., 2017 ;Wilkins and Trushina, 2017 ). What makes metabolomics especially alluring is that it captures the effect of both genetic and environmental factors ( Holmes et al., 2008 ). Therefore, combining high-throughput metabolic profiling with genetic (i.e., GWAS) data provides valuable information about which, and to what extent, common genetic variation is associated with blood metabolite levels ( Shin et al., 2014 ).
In this study, we aimed to gain more insights into the risk factors contributing to LOAD by investigating the existence and extent of genetic sharing between AD and blood metabolite levels, using GWAS summary statistics. With a PRS-based approach ( Euesden et al., 2015 ), we performed an unbiased screen by determining the (extent of) shared genetic etiology between AD and the blood levels of 615 metabolites that belong to one of the following subgroups: lipids, amino acids, peptides, nucleotides, carbohydrates, cofactors and vitamins, energy-related molecules, and xenobiotics. Furthermore, we used SNP effect concordance analysis (SECA) ( Nyholt, 2014 ) to determine whether the identified genetic sharing between AD and specific metabolite levels reflects a positive or negative concordance, that is whether AD risk SNPs are associated with higher or lower metabolite levels.

GWAS summary statistics for PRS-based analyses
For PRS-based analyses (see below) in the discovery phase, we used the summary statistics from the most recent GWAS of AD (71,880 cases, 383,378 controls) as 'base sample' (Jansen et al., 2019). In this GWAS, AD cases were defined as patients clinically diagnosed with AD-type dementia and individuals with a parental history of AD ("AD-by-proxy"). AD-by-proxy showed strong genetic correlation with AD (r_g = 0.81). We assessed the extent of genetic overlap between AD and the blood levels of metabolites, using the summary statistics from GWASs of blood metabolite levels that were performed in general population samples as 'target samples'. These metabolite level GWAS summary statistics were retrieved from four studies: Rhee et al. (2013) (up to 1,802 participants), Shin et al. (2014) (up to 7,373 participants), Draisma et al. (2015) (up to 7,476 participants), and Kettunen et al. (2016) (up to 24,925 participants). Metabolites with unknown function (n = 200) and duplicates (n = 138) were removed (for the latter, we retained the data from the study with the largest sample size), leaving 615 unique metabolites for PRS-based analyses. These metabolites belong to eight biochemical subgroups: lipids (n = 409), amino acids (n = 104), peptides (n = 26), nucleotides (n = 22), carbohydrates (n = 19), cofactors and vitamins (n = 13), energy-related molecules (n = 16), and xenobiotics (n = 6). For replication purposes, we used GWAS summary statistics from an independent AD cohort (1,798 cases diagnosed with ICD-10 code G301; 72,206 healthy controls) obtained through FinnGen (finngen_r3_AD_LO_EXMORE). No sample overlap exists between the base and target samples.

PRS-based analyses
To determine the level of shared genetic etiology between AD and blood levels of the 615 unique metabolites, we performed PRS-based analyses in PRSice v1, using the abovementioned GWAS summary statistics for the discovery and replication 'base' and 'target' samples (Euesden et al., 2015; Johnson, 2013). First, we performed clumping based on the p-values of SNPs in the 'base sample' to select the most significant SNP among correlated SNPs that are in linkage disequilibrium (LD, r² > 0.25) within a window of 500 kb (Bralten et al., 2018; Xicoy et al., 2021). PRSice then calculated summary-level PRS by regressing the weights of selected AD risk SNPs (based on their p-value in the AD GWAS) onto the calculated weighted multi-SNP risk scores of metabolite blood levels, using the gtx package implemented in PRSice (Johnson, 2013). The PRS-based analyses were performed for the SNPs with a base-sample p-value below each of seven default p-value thresholds (PTs): 0.001, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5. Subsequently, a Bonferroni correction for multiple testing was applied (i.e., p < 0.05/4305 tests (7 PTs × 615 phenotypes) = 1.16 × 10⁻⁵). The replication PRS-based analyses in the FinnGen cohort were performed only for those metabolites that showed significant genetic sharing with AD in the screen that used the discovery GWAS of AD.
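The two numerical steps described above can be sketched as follows. This is a minimal illustration, not the PRSice or gtx code: the function names `bonferroni_threshold` and `summary_level_score` are hypothetical, and it is an assumption that the summary-level score takes the form of an inverse-variance-weighted regression of target-sample SNP effects on base-sample SNP weights.

```python
import math


def bonferroni_threshold(alpha, n_thresholds, n_phenotypes):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / (n_thresholds * n_phenotypes)


def summary_level_score(weights, betas, ses):
    """Inverse-variance-weighted regression of target effects on base weights.

    weights: per-SNP effect sizes from the base GWAS (here, AD)
    betas:   per-SNP effect sizes from the target GWAS (a metabolite level)
    ses:     standard errors of the target effect sizes
    Returns (estimate, standard error, two-sided p-value).
    Assumes all three lists refer to the same SNPs and the same effect allele.
    """
    num = sum(w * b / s ** 2 for w, b, s in zip(weights, betas, ses))
    den = sum(w ** 2 / s ** 2 for w, s in zip(weights, ses))
    a_hat = num / den
    a_se = math.sqrt(1.0 / den)
    z = a_hat / a_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return a_hat, a_se, p_value


# Threshold used in the discovery screen: 7 PTs x 615 metabolites = 4305 tests.
threshold = bonferroni_threshold(0.05, 7, 615)
print(f"{threshold:.2e}")  # 1.16e-05
```

The same `bonferroni_threshold` call with a different number of phenotypes gives the threshold for any follow-up analysis that tests fewer metabolites.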
In addition, to determine whether the significant genetic overlap between AD and the blood metabolite levels that we identified in our screening of 615 metabolites is related to the major genetic risk factor for AD, APOE-ε4, we repeated the PRS-based analyses for the significantly replicated findings while excluding SNPs that define APOE-ε4 carrier status from the AD GWAS base sample (Jansen et al., 2019). Specifically, we excluded the SNPs that define the ε2, ε3, and ε4 alleles of APOE (rs429358, rs7412) and all SNPs in LD (r² > 0.25) in a window of 500 kb (Bralten et al., 2018; Xicoy et al., 2021) with rs429358 and rs7412, resulting in an exclusion of 34 SNPs in total (Table S5) (Babenko et al., 2018). Again, a Bonferroni correction for multiple testing was performed.
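The exclusion rule above can be sketched as a simple filter. The function name `snps_to_exclude`, the input layout, and the toy coordinates and LD values are hypothetical. It is also an assumption that "a window of 500 kb" means a distance of at most 500 kb from either index SNP.

```python
def snps_to_exclude(positions, ld_r2, index_snps,
                    r2_threshold=0.25, window_bp=500_000):
    """Return the index SNPs plus every SNP in LD with one of them.

    positions:  dict mapping SNP id -> (chromosome, base-pair position)
    ld_r2:      dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2
    index_snps: SNP ids that define the allele of interest
    A SNP is excluded when it lies on the same chromosome as an index SNP,
    within window_bp of it, and has r^2 above r2_threshold with it.
    """
    excluded = set(index_snps)
    for index in index_snps:
        chrom, pos = positions[index]
        for snp, (other_chrom, other_pos) in positions.items():
            if snp in excluded or other_chrom != chrom:
                continue
            if abs(other_pos - pos) > window_bp:
                continue
            # Pairs without an LD estimate are treated as uncorrelated.
            if ld_r2.get(frozenset((index, snp)), 0.0) > r2_threshold:
                excluded.add(snp)
    return excluded


# Toy example: coordinates and r^2 values are made up for illustration.
toy_positions = {
    "rs429358": ("19", 1_000),
    "rs7412": ("19", 1_138),
    "snp_a": ("19", 200_000),   # close and in LD: excluded
    "snp_b": ("19", 900_000),   # in LD but outside the window: kept
    "snp_c": ("19", 300_000),   # inside the window but weak LD: kept
}
toy_ld = {
    frozenset(("rs429358", "snp_a")): 0.60,
    frozenset(("rs429358", "snp_b")): 0.90,
    frozenset(("rs7412", "snp_c")): 0.10,
}
print(sorted(snps_to_exclude(toy_positions, toy_ld, ["rs429358", "rs7412"])))
```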

SNP effect concordance analyses
Following the PRS-based screening and follow-up analyses, SNP effect concordance analysis (SECA) was performed to determine the direction of genetic overlap between AD and the blood levels of those metabolites that were replicated and survived correction for multiple testing (Nyholt, 2014). Specifically, we used SECA to calculate empirical p-values for the concordance (that is, the agreement in the SNP effect direction across two phenotypes) between AD and the blood metabolite levels that emerged from the PRS-based analyses as having a significant shared genetic etiology. SECA p-values lower than the Bonferroni-corrected threshold accounting for the number of tests that we performed were considered significant.
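To make the notion of concordance concrete, the sketch below counts how often two sets of SNP effects point in the same direction and tests that count against chance with an exact one-sided binomial test. This is a deliberately simplified stand-in and not the SECA procedure itself, which derives empirical p-values differently. The function name `sign_concordance` and the toy effect sizes are hypothetical, and the effects are assumed to be aligned to the same allele and taken from independent SNPs.

```python
import math


def sign_concordance(effects_a, effects_b):
    """Count sign agreement between two aligned lists of SNP effects.

    Returns (n_concordant, n_tested, p_upper), where p_upper is the exact
    binomial probability of observing at least n_concordant agreements
    out of n_tested if each SNP agreed with probability 0.5.
    SNPs with a zero effect in either trait are skipped.
    """
    pairs = [(a, b) for a, b in zip(effects_a, effects_b) if a != 0 and b != 0]
    n = len(pairs)
    k = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    p_upper = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p_upper


# Toy effects for eight SNPs; seven of them agree in direction.
ad_effects = [0.1, 0.2, -0.1, 0.3, 0.05, -0.2, 0.1, 0.4]
metabolite_effects = [1.0, 2.0, -1.0, 3.0, 1.0, -2.0, 1.0, -1.0]
k, n, p_upper = sign_concordance(ad_effects, metabolite_effects)
print(k, n, p_upper)  # 7 8 0.03515625
```

A small upper-tail probability here corresponds to a positive concordance, i.e., alleles that raise AD risk tend to raise the metabolite level as well.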

Colocalization analysis
We then performed Bayesian colocalization analyses to examine whether the observed overlap between AD and metabolite blood levels was caused by shared (identical) or by distinct genetic variants located in the same risk locus. For this, we used the COLOC package (Giambartolomei et al., 2014) and the Sum of Single Effects (SuSiE) regression framework (Wallace, 2021), allowing for multiple (causal) genetic variants to be evaluated simultaneously. We selected the 31 metabolites of which the blood levels showed genetic sharing with AD in both the discovery (Jansen et al., 2019) and replication (FinnGen cohort) analyses. Genetic risk regions were identified as genome-wide genetic risk loci for AD (Jansen et al., 2019), using a 50 kb window surrounding the lead SNP. LD structure of the genetic loci was determined using PLINK (Purcell et al., 2007). COLOC uses a Bayesian framework to generate the posterior probability (PP) for five mutually exclusive hypotheses regarding the sharing of causal genetic variants between two traits (i.e., AD and metabolite blood levels): 0. no association with either AD or metabolite blood level (PP0); 1. association with AD only (PP1); 2. association with metabolite blood level only (PP2); 3. association with AD and metabolite blood level from distinct causal variants (PP3); and 4. a shared causal variant between AD and metabolite blood level (PP4). We considered the colocalization evidence to support a shared causal variant or distinct causal variants between AD and metabolite blood levels when PP4 or PP3, respectively, was higher than all other posterior probabilities.
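The decision rule at the end of this paragraph can be written down directly. The sketch below takes five posterior probabilities as given and does not compute them; the function name `classify_colocalization`, the returned labels, and the handling of ties are hypothetical choices made for illustration.

```python
def classify_colocalization(pp):
    """Label a locus from its five posterior probabilities (PP0..PP4).

    pp: sequence [PP0, PP1, PP2, PP3, PP4].
    PP4 strictly highest -> shared causal variant.
    PP3 strictly highest -> distinct causal variants in the same locus.
    Anything else, including ties for the maximum, is left unclassified.
    """
    pp = list(pp)
    if len(pp) != 5:
        raise ValueError("expected five posterior probabilities (PP0..PP4)")
    top = max(pp)
    if pp.count(top) > 1:
        return "neither"
    best = pp.index(top)
    if best == 4:
        return "shared causal variant (PP4)"
    if best == 3:
        return "distinct causal variants (PP3)"
    return "neither"


# Toy posterior probabilities, made up for illustration.
print(classify_colocalization([0.00, 0.00, 0.00, 0.01, 0.99]))
print(classify_colocalization([0.00, 0.10, 0.10, 0.70, 0.10]))
print(classify_colocalization([0.60, 0.10, 0.10, 0.10, 0.10]))
```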

PRS-based analyses
We identified and replicated genetic sharing between AD and the blood levels of 31 out of the 615 screened metabolites (5.0%) for at least one of the seven p-value thresholds (PTs) after correcting for multiple testing (Table 1, Fig. 1). A complete overview of the PRS-based analyses for all blood metabolite levels tested in the discovery phase is provided in Table S1. All 31 metabolites for which we identified and replicated a significant shared genetic etiology with AD were lipid-related, that is, lipoproteins (n = 26), fatty acyls (n = 2), and sterols (n = 3). For these metabolites, genetic variants associated with AD explained 0.1%-0.4% and 0.2%-0.9% of the variation in metabolite blood levels for the discovery and replication AD GWAS, respectively.
After excluding SNPs that define APOE-ε4 status, genetic sharing between AD and 24 out of the 31 metabolites remained significant after Bonferroni correction (i.e., p < 0.05/280 tests (7 PTs × 40 phenotypes) = 1.79 × 10⁻⁴), with omega-6 fatty acids, the concentration of very small VLDL particles, total lipids in very small VLDL, free cholesterol in small VLDL, esterified cholesterol, free cholesterol, and triglycerides in IDL falling below the significance threshold (Table 1, Table S1). For all PRS-based analyses involving the replicated metabolites, the most significant PT was the lowest that we tested, that is, PT = 0.001. To further explore the underlying genetic architecture, we repeated the PRS-based analyses for the 31 replicated metabolites in the discovery GWAS with a range of additional and (much) lower PTs, ranging from PT = 5.00 × 10⁻⁴ to 5.00 × 10⁻¹⁰⁰. Twenty-eight of the 31 metabolites showed the strongest genetic overlap with AD at a PT of 5.00 × 10⁻⁹ (Table S2, Table S3), suggesting that genome-wide significant SNPs drive most of the genetic sharing reported in this study.

SECA analyses
SECA analyses showed a significant genetic concordance with AD for 16 of the 31 metabolites that emerged from our screening (51.6%) ( Table S4). All of these 16 metabolites showed a positive concordance with AD, indicating that genetic variants associated with AD also associate with increased blood levels of these metabolites. The metabolites with a positive concordance included lipoproteins (n = 12), sterols (n = 3), and fatty acyls (n = 1).

Colocalization analyses
We performed colocalization analyses to assess whether identical or different causal variants underlie the genetic overlap between AD and the blood levels of the 31 metabolites. We found that variants from six AD risk loci were also associated at a genome-wide significant level (p < 5.00 × 10⁻⁸) with the blood levels of at least one metabolite, that is, the risk loci containing HLA-DRB1 (chromosome 6), ADAM10 (chromosome 15), KAT8 (chromosome 16), SCIMP (chromosome 17), APOE (chromosome 19), and AC074212.3 (chromosome 19). The colocalization analyses supported model PP3 in most cases. In this model, distinct genetic variants located in the same locus are associated with AD and metabolite blood levels. For two genetic variants, we observed colocalization between AD and metabolite levels (PP4 range: 0.99-1.00): rs1464110675, an intronic SNP in KAT8, associated with AD and blood levels of cholesterol esters in large LDL, and rs157580, an intronic SNP in TOMM40, associated with AD as well as blood levels of free cholesterol in small VLDL, total lipids in very small VLDL, very small VLDL particles, triglycerides in IDL, and blood levels of linoleate and omega-6 fatty acids (Table S6).

Discussion
In this paper, we used a PRS-based screening approach to examine to what extent common genetic risk factors for AD associate with blood metabolite levels. We identified and replicated significant genetic sharing between AD and the blood levels of 31 metabolites. After excluding APOE -ε4 defining SNPs, the blood levels of 24 metabolites still showed genetic overlap with AD. Strikingly, all the metabolites that we identified and replicated are involved in lipid metabolism. In addition, for all 16 replicated metabolites for which we found a significant genetic concordance, there was a positive concordance, that is genetic variants associated with AD also associate with increased blood lipid metabolite levels.
Lipids take up 50 % of the brain's dry weight, and behind adipose tissue, the brain is the second most lipid-rich tissue of the body ( Bruce et al., 2017 ). Lipids can either be synthesized and metabolized in the brain locally, or they are transported into the brain from the periphery, that is the systemic circulation ( Bruce et al., 2017 ). Observational studies have identified peripheral disturbances in lipid levels (including type 2 diabetes-associated dyslipidemia) as a potentially modifiable target for AD development ( Kivipelto et al., 2018 ;Reitz, 2013 ;Tang et al., 2019 ). Targeted polygenic studies based on these observations have reported genetic sharing between AD and blood levels of specific lipids such as triglycerides and low-density lipoproteins ( Desikan et al., 2015 ). Risk variants in multiple AD candidate genes have also been linked to lipid homeostasis and metabolism ( El Gaamouch et al., 2016 ;Kao et al., 2020 ).
As for our specific findings, we mainly identified genetic sharing between AD and the blood levels of several lipoproteins. In the blood, cholesterol and triglycerides are mostly transported within these lipoproteins, that is, lipid-protein complexes that also contain phospholipids and that vary in size and specific lipid-binding protein composition to ensure solubility (Feingold, 2000). Ranging from large to small in size, lipoproteins are classified as chylomicrons, very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL), and high-density lipoproteins (HDL) (Feingold, 2000). Based on size, these lipoprotein classes can be further subdivided into very large, large, medium, small, and very small VLDL, IDL, LDL or HDL. In the brain, lipoproteins redistribute cholesterol and other lipids to neurons and other brain cells for repairing and remodeling membranes, organelle biogenesis, and synaptogenesis (and hence maintaining neuronal plasticity) (Mahley, 2016). Interestingly, APOE, a major protein component of HDL particles, is primarily accountable for these processes in the brain (Mahley, 2016), and higher blood levels of HDL have been associated with a lower dementia risk (Button et al., 2019). In contrast, increased cholesterol in VLDL/IDL and especially LDL particles in the blood is an AD risk factor, even after correcting for APOE-ε4 status (Lee et al., 2019; Shepardson et al., 2011; Wingo et al., 2019). Our results show that free, esterified and total blood cholesterol are in positive concordance with AD, that is, genetic risk factors associated with AD are also associated with higher cholesterol levels in the blood. Blood levels of cholesterol in VLDL, IDL, and LDL are in positive concordance as well.

Table 1.
Metabolites that show significant overlap in both the discovery (Jansen 2019) and the replication (FinnGen) samples are indicated in bold (n = 31). p-values and variance explained (R²) for the optimal SNP threshold (PT) in the discovery GWAS (Jansen 2019) are presented. Analyses excluding ApoE-ε4 defining SNPs (Excl. ApoE-ε4) and concordance analyses were only performed for replicated metabolites. For metabolites with a significant concordance (i.e., P-Bonferroni < 0.05 from SECA analysis) the direction of the association has been depicted by '+' (i.e., positive association), '-' (i.e., negative association) or '=' (i.e., direction of association unknown). Key: AD, Alzheimer's disease; ApoE, Apolipoprotein E; IDL, intermediate-density lipoproteins; LDL, low-density lipoproteins; PC ae, phosphatidylcholine acyl-alkyl; PC aa, phosphatidylcholine diacyl; NS, not significant; NT, not tested; SNP, single nucleotide polymorphism; VLDL, very low-density lipoproteins.

Fig. 1.
Heatmap representing the shared genetic etiology between AD and blood levels of metabolites. Metabolites with at least one significant association were selected in the heatmap. Lipid results are classified by lipid species. Significant associations (i.e., p < 0.05/4305 tests (7 PTs × 615 blood metabolite levels) = 1.16 × 10⁻⁵) are depicted with an asterisk (*). For metabolites with a significant concordance (i.e., p-Bonferroni < 0.05 from SECA analysis) the direction of the association has been depicted by '+' (i.e., positive association), '-' (i.e., negative association) or '=' (i.e., direction of association unknown). Abbreviations: AD, Alzheimer's disease; SNP, single nucleotide polymorphism; PC aa, phosphatidylcholine diacyl; PC ae, phosphatidylcholine acyl-alkyl; IDL, intermediate-density lipoproteins; LDL, low-density lipoproteins; VLDL, very low-density lipoproteins. The list of abbreviations for the metabolites is included in Table 1.

Furthermore, we identified genetic sharing between AD and blood levels of APOB, the major protein component of LDL particles that is encoded by a gene in which rare coding variants have been identified in early-onset AD (Wingo et al., 2019). Taken together, our findings are in line with the literature on AD and lipoprotein levels, adding a shared genetic etiology to these epidemiological observations. Increased blood levels of lipoproteins, especially LDL, are associated with cardiovascular disease. In this respect, although there may be a limited direct role of peripheral lipoproteins on CNS metabolism, we could speculate that the identified genetic overlap between AD and increased blood lipoprotein levels would initially result in cardiovascular disease that over time could also affect the vasculature and blood flow towards the brain and subsequently AD development and progression, as supported by the vascular AD hypothesis (Helzner et al., 2009; Stampfer, 2006).
We also found genetic sharing between AD and the blood levels of two fatty acyls, that is, omega-6 fatty acids (omega-6-FAs, for which we identified a positive concordance) and linoleic acid (LA) (no concordance). In AD, unsaturated fatty acid metabolism is significantly dysregulated, and LA was found to be specifically decreased in AD brains (Snowden et al., 2017). Furthermore, lower levels of LA in the brain correlate with Braak (neurofibrillary tau tangles) and CERAD (neuritic Aβ plaques) stages in the inferior frontal and middle frontal gyri, but not in the cerebellum of AD patients (Snowden et al., 2017). LA is a precursor of omega-6-FAs. The omega-6-FA arachidonic acid (AA) is a mediator of inflammatory pathways and induces Aβ and tau polymerization in vivo and in vitro, which can be alleviated by blocking the conversion of LA to AA (Amtul et al., 2012).
Lastly, our colocalization analyses revealed that a genetic variant in KAT8 is associated at a genome-wide significant level with both AD and the blood levels of cholesterol esters in large LDL, which is interesting as the encoded protein KAT8 was found to be involved in regulating fatty acid synthesis (Lin et al., 2016). Furthermore, we found that a variant in TOMM40, a gene in the APOE locus, is associated with AD and the blood levels of multiple lipid metabolites, which is in keeping with TOMM40 variants having been linked to dyslipidemia (Abe et al., 2015; Miao et al., 2018).
This study has some limitations. First, a shared genetic etiology does not necessarily mean higher or lower metabolite blood levels are causative of AD. Similarly, as already indicated, blood metabolite levels may not reflect what is happening in the brain directly, which warrants further investigation using cerebrospinal fluid and post-mortem brain samples. While multiple lines of evidence support a role for (peripheral) disturbances in lipid levels in AD pathogenesis, it is unknown how this dyslipidemia contributes to AD onset and progression.
In summary, we report novel genetic sharing between AD and blood lipid levels, in addition to confirming previous findings. In the brain, lipids and their intermediate metabolites have essential structural and functional roles, forming and dynamically regulating synaptic membranes, in addition to their roles in energy storage. As we report a genetic basis for the overlap between AD and disturbed lipid levels, therapeutic approaches aimed at adjusting these lipid levels may be beneficial in AD treatment and prevention. However, future studies using combined lipid and genomic profiling will be needed to identify those patients who may benefit most from these approaches.

Verification
All authors on this paper have reviewed and approved the contents of this manuscript and meet the requirement for authorship. This manuscript is currently not under review at any other publication. Funding sources had no role in design and conduct of the study, data collection, data analysis, data interpretation, or in writing or approval of this report.

Disclosure statement
GP is director of DrugTarget ID., Ltd. (the Netherlands). The other authors have no competing interests to declare.