Multiomics integrative analysis identifies APOE allele-specific blood biomarkers associated to Alzheimer’s disease etiopathogenesis

Alzheimer’s disease (AD) is the most common form of dementia, currently affecting 35 million people worldwide. The apolipoprotein E (APOE) ε4 allele is the major risk factor for sporadic, late-onset AD (LOAD), which comprises over 95% of AD cases, increasing the risk of AD 4- to 12-fold. Despite this, the role of APOE in AD pathogenesis remains unclear. Aiming for a better understanding of APOE-specific effects, the ADAPTED consortium analysed and integrated publicly available data from multiple OMICS technologies from both plasma and brain, stratified by APOE haplotype (APOE2, APOE3 and APOE4). Combining genome-wide association studies (GWAS) with differential mRNA and protein expression analyses and single-nuclei transcriptomics, we identified genes and pathways contributing to AD in both an APOE-dependent and an APOE-independent fashion. Interestingly, we characterised a set of biomarkers showing consistent protein profiles in plasma and brain and opposite trends in APOE2 and APOE4 AD cases, which could constitute screening tools for a disease that lacks specific blood biomarkers. Besides the identification of APOE-specific signatures, our findings suggest that this novel approach, based on the concordance across OMIC layers and tissues, is an effective strategy for overcoming the limitations of often underpowered single-OMICS studies.


INTRODUCTION
Non-Mendelian Alzheimer's disease (AD) has become the paradigm of a complex disease for which a major genetic determinant is known, the APOE locus. Three linkage studies published in 1993 pointed to the APOE region at 19q13 as a risk locus for late onset familial AD [1,2], and even common sporadic late-onset AD (LOAD) [3]. Shortly after, researchers around the world confirmed the association of APOE gene with diverse forms of the disease and its association with other dementias.
The APOE gene encodes a lipoprotein first identified in the 1970s among patients with familial hypercholesterolemia type III [4,5]. The protein has three major isoforms depending on the combination of two polymorphisms located at positions 112 (rs429358 (C > T)) and 158 (rs7412 (C > T)). The most common isoform, APOE3, has a cysteine at position 112 and an arginine at position 158, whereas APOE2, the least common isoform, has a cysteine at both positions, and the AD risk allele APOE4 has an arginine at both positions [6][7][8]. These amino acid substitutions result in a conformational change that brings together the N-terminal and C-terminal domains in APOE4, which are normally separated in the APOE2 and APOE3 isoforms. The consequences of this conformational shift for downstream signaling in the APOE4 isoform are still unknown. In fact, it is not even clear whether APOE4 is a gain- or loss-of-function mutation despite extensive research in the field [9]. What is already known is that having a single APOE4 allele increases risk 2- to 4-fold and having two APOE4 alleles increases risk about 8- to 12-fold, although risk varies according to genetic background and sex [10].
One aim of the ADAPTED consortium is to identify specific signatures associated with the different APOE isoforms. Here we describe, for the first time, a comprehensive integration of genomic, transcriptomic and proteomic data stratified by the three major APOE haplotypes.

GWAS data: SNP-level analysis
The combined analysis of the three stages (stage I+II+III) (Figure 1, Supplementary Tables 1-3 and Supplementary Figures 1-3) identified genome-wide significant signals (p<5×10⁻⁸) for APOE, BIN1, CLU, CNTNAP2 and PICALM in the APOE4 stratum; suggestive signals (p<10⁻⁵) in this analysis include a 1.4Mb intergenic region on 4p15.1 (from 33.3Mb to 34.7Mb, hg19) with the lowest p value for the SNP rs12641122 (p=6.28×10⁻⁷), a 4.5Kb intergenic region on 4q35.2 and the KCNQ3 gene, among others. In the APOE3 stratum, ABCA7, BIN1 and PICALM passed the genome-wide significance threshold, with suggestive signals for the HLA-DQ/HLA-DR loci and the CTNND2, FBN1, WLS and CSTF1 genes, among others. By contrast, no genome-wide significant SNPs were found in the APOE2 stratum, nor any known AD gene among the suggestive signals.
An additional validation of stage I+II+III findings was performed using the EADI population (stage IV), where only the APOE locus in the APOE4 stratum reached the genome-wide significance threshold (Supplementary …).
Sex-stratified meta-analysis (Supplementary Tables 7, 8 and Supplementary Figure 6) identified genome-wide significant signals for BIN1 and APOE as well as suggestive signals for PICALM, MYLK, SOX5 and SCEL in the female population. By contrast, in males only suggestive signals for BIN1, APOE, ZCCHC2, the ABI3BP/IMPG2 locus, ESRRB and the 19q13.4 leukocyte receptor cluster were identified. Analysis stratified by both sex and APOE (Supplementary Tables 9-14 and Supplementary Figure 7) yielded genome-wide significant signals for APOE in the APOE4 stratum in both sexes and, for APOE3 males, for a 400kb intergenic region on 13q31.3 containing the Ubiquitin Specific Peptidase 7 (Herpes Virus-Associated) (USP7) pseudogene (RP11-464I4.1). Among APOE4 males, we found association with AD for a large region of 1.9Mb on 3q12.1 comprising the genes CMSS1, COL8A1, FILIP1L, MIR548G and RPL24 and, in females, for a 1.5Mb region on 2q33.2 comprising the ABI3 homologue ABI2 and the CARF, CYP20A1, FAM117B, FZD7, ICA1L, NBEAL1, RAPH1 and WDR12 genes.

GWAS data: gene-level analysis
Genetic marker-level results were summarized into a single measure of association for each gene. Association results from the combined stage I, II and III meta-analysis were used to estimate gene-wide statistics for all genes in each of the three APOE strata (Supplementary Tables 15-17). Within each stratum, genes were ranked from lowest to highest p value, derived from the mean χ² statistic implemented in MAGMA (Table 1). Among previously reported AD genes, APOE was the highest ranked in both APOE2 and APOE4 carriers (ranks 26 and 3, respectively), whereas BIN1 was ranked first in the APOE3 stratum (Supplementary Table 18). Known AD genes were ranked worse in the APOE2 stratum than in the others, with the complement receptor 1 gene (CR1), ranked in position 1292, being the second most relevant of these genes among APOE2 carriers after APOE.

Differential expression analysis
Blood APOE-stratified DE meta-analysis between AD cases and controls (Supplementary Tables 19-21) included the ADNI and ADDN datasets. In the APOE2 stratum we identified only two upregulated (ISY1 and SRF) and two downregulated (CPT1A and PLCD1) genes at FDR<0.05, clearly differing from the expression profiles in APOE3 and APOE4 carriers (Figure 2, top 100 genes from each stratum). By contrast, APOE3 and APOE4 stratified analyses identified 1,692 and 3,293 DE genes, respectively. Among genes differentially expressed in APOE4 cases versus controls we observed an over-representation of mitochondrial genes, most of them involved in the oxidative phosphorylation pathway. However, several genes from this pathway were differentially expressed in all strata but with opposite expression profiles, such as the electron transport chain genes ATP5F1, UQCRB and NDUFB3, which were upregulated in APOE2 cases but downregulated in APOE4 cases when compared to controls of the same haplotype. APOE3 genes were mainly cytoplasmic genes involved in RNA metabolism.
Cortex APOE-stratified DE included the MAYO, ROSMAP, MSBB, GSE15222 and GSE48350 studies. Meta-analysis of cortex datasets resulted in 518, 7,714 and 1,717 statistically significant genes (FDR<0.05) for the APOE2, APOE3 and APOE4 strata, respectively (Supplementary Tables 22-24). As opposed to the blood analyses, the overall picture is of enhanced gene expression in AD in all strata, more pronounced in APOE2, except for XIST, which was strongly downregulated in AD APOE2 subjects (Figure 3, top 100 genes from each stratum). The genes encoding the heparan sulfate proteoglycan CD44 and the heparan sulfate lysosomal degradation enzyme IDS were differentially expressed in all strata, with CD44 strongly upregulated in APOE2 cases and IDS downregulated in APOE4 cases. APOE2-specific genes were mostly nuclear genes involved in primary metabolic processes, as well as some apoptosis-related genes (CFLAR, ATM, MCL1, AKT3 and CTSZ), all of them downregulated in AD cases except CTSZ, which showed higher expression in AD cases than in controls. APOE3 and APOE4 candidate genes were mainly expressed in the cytoplasm. In all strata, we identified genes involved in neuronal development (such as GFAP, BDNF or CDC42), especially in the APOE3 stratum. For both the APOE2 and APOE4 strata, genes involved in vesicle-mediated transport were identified, with key genes such as PCSK1, SYTL2 or SVOP downregulated in APOE4 cases.
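The FDR<0.05 thresholds used in the differential expression analyses can be illustrated with a short Python sketch. It assumes the Benjamini-Hochberg procedure, which this section does not name explicitly, and the function name and toy p values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p values (not taken from the study).
pvals = [0.001, 0.04, 0.03, 0.20]
significant = [q < 0.05 for q in bh_adjust(pvals)]
```

In this toy example only the first gene remains below the 0.05 threshold after adjustment, although three of the four raw p values are below 0.05.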

Robust rank aggregation analysis
Integrative analysis was performed independently to include either blood or cortex APOE-stratified DE rankings. Thus, we combined meta-GWAS stage I-III gene-level results with blood meta-GWES results (Supplementary Tables 25-27) or with cortex meta-GWES results (Supplementary Tables 28-30).
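The rank-based integration described above can be illustrated with a minimal Python sketch of a robust rank aggregation score in the spirit of the published RRA method: each gene's ranks are normalized to (0, 1], compared against the order statistics expected under uniform random ranking, and the smallest resulting p value is corrected by the number of lists. The function names, the toy gene labels and the convention of assigning a normalized rank of 1.0 to genes missing from a list are assumptions made for illustration; this is not the consortium's implementation.

```python
from math import comb

def order_stat_pvalue(x, k, n):
    """P(the k-th smallest of n independent uniform(0,1) values is <= x)."""
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(k, n + 1))

def rho_score(norm_ranks):
    """Smallest order-statistic p value, corrected by the number of lists."""
    ranks = sorted(norm_ranks)
    n = len(ranks)
    p_min = min(order_stat_pvalue(x, k, n) for k, x in enumerate(ranks, start=1))
    return min(1.0, p_min * n)

def aggregate(rankings):
    """rankings: dict of list name -> genes ordered from best to worst."""
    positions = {name: {g: i + 1 for i, g in enumerate(genes)}
                 for name, genes in rankings.items()}
    all_genes = set().union(*rankings.values())
    scores = {}
    for gene in all_genes:
        # Genes absent from a list get the worst normalized rank (assumption).
        norm = [positions[name].get(gene, len(genes)) / len(genes)
                for name, genes in rankings.items()]
        scores[gene] = rho_score(norm)
    return sorted(scores.items(), key=lambda item: item[1])

# Toy rankings with placeholder gene labels (not study results).
toy = {"gwas": ["A", "B", "C", "D"], "expression": ["A", "C", "B", "D"]}
top_gene, top_score = aggregate(toy)[0]
```

A gene ranked near the top of both the GWAS and the expression list receives a small score, whereas a gene ranked highly in only one list is not rewarded, which is the property that makes this kind of aggregation robust to noisy individual rankings.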
In cortex, we found 376, 399 and 366 significant genes (FDR<0.05) for the APOE2, APOE3 and APOE4 strata, respectively. Seven common AD candidate genes (APOC1, APOC2, CD44, CDC42, CLPTM1, DST, PGM2L1) were significant in all three strata; of them, APOC1 and CLPTM1 were also associated in the three strata in blood. In this analysis, the shortest list of APOE-specific genes was found in the APOE4 stratum, which showed the largest overlap with the APOE3 stratum; among the 45 significant genes shared by these strata, we identified several AD genes, including BIN1, MS4A4A, MS4A6A, PICALM and RIN3, in accordance with blood results. The CR1 gene is included among the 19 top genes shared by the APOE2 and APOE3 strata (ATPIF1, CACNB2, CDC27, CFLAR, COX15, CR1, DCLK1, GOSR2, KANSL1, KLF12, MAPT, MCL1, NSF, POGK, RUFY3, SCD5, SORBS1, SPEN, TTN), whereas APOE, TOMM40, SLC24A4 and WWOX were included among the common genes for the APOE2 and APOE4 strata (21 genes: AHNAK, APOE, ARNT, CRTAP, FBXL16, GART, KALRN, KAT6A, MTMR11, OPA1, PDLIM5, PPFIA1, PURA, RBMS2, SLC24A4, SRGAP1, TOMM40, TSPAN14, UBE2F, WWOX, ZNF264). Enrichment analysis also identified both common and exclusive pathways. Common pathways for AD irrespective of the APOE haplotype were related to adhesion, neuronal development, differentiation, and lipoprotein metabolism; diverse signals related to neuronal death are also present in all three strata. Again, we observed a larger overlap between APOE3 and APOE4 pathways (glial cell differentiation and activation, immunological, lipid metabolism, cardiovascular system development and heart function) than between APOE2 and APOE3 (which includes axonogenesis) or APOE2 and APOE4 (mainly phospholipid and lipoprotein metabolism due to the APOE, APOC1 and APOC2 genes) (Supplementary Tables 34-36 and Supplementary Figure 9). We observed that APOE2-exclusive pathways include chromatin regulation and telomere maintenance related processes.
The APOE3 stratum showed the largest number of significant enrichments, but most of them showed a similar annotation in the APOE2 or, more frequently, in the APOE4 stratum, with the exception of antigen processing and presentation, IFNG signalling, astrocyte development and activation, and myelin sheath. In APOE4, macrophage activation, fructose metabolism, vitamin D mediated inflammation, inositol phosphate metabolism and cholesterol efflux were the most relevant pathways. Clathrin vesicles, amyloid biology, inflammatory and immune response, and glial cell development and differentiation appear as the most relevant categories shared by the APOE3 and APOE4 strata.
To identify relevant candidate blood biomarkers tracking brain changes in AD pathology we compared blood and cortex analyses (Figure 4). We identified 68 genes in common for the APOE4 stratum, including CLU, CD2AP, IL6, MS4A2, SLC25A1 or INNPP4A (Figure 4C). In APOE3, 76 common genes were found, including APP, AQP9, ATPAF, CD209, LILRA5, NDUFB3 or PTK2B (Figure 4D). Finally, in the APOE2 stratum, we identified 84 common genes including several ABC receptors (ABCA9, ABCB1, ABCB4, ABCD4), solute carrier molecules (SLC25A3, SLC25A4, SLC35E1, SLC9A9), TLR9 or IL4I1 (Figure 4B). Overlap between APOE strata-specific pathways from blood and cortex showed 6 common pathways for APOE2 (Figure 4F), half of them related to chromatin regulation, 24 common pathways for APOE3 (secretion, regulation of supramolecular fiber organization, site of polarized growth and leukocyte activation involved in inflammatory response, Figure 4G) and 18 shared pathways for APOE4, including clathrin coated vesicles, amyloid-beta processing, mitochondrial transmembrane transport, macrophage activation and monosaccharides and fructose metabolism (Figure 4H). We followed up genes with concordant profiles in both blood and cortex (upregulated or downregulated in AD cases vs controls) and showing opposite profiles in APOE2 and APOE4, which included 34 genes with overrepresentation of the gluconeogenesis and fructose metabolic pathways (FBP1, FBP2, SLC25A1) (Figure 5A). When compared with average expression in normal brains, FBP1, FBP2, RHOH, JPH2, ERAP2 and SCLT1 were upregulated in APOE4 cases although they are usually expressed at low levels, whereas SNX3 and SUB1 were downregulated in APOE4 cases although they are expressed at very high levels in the normal brain according to GTEx (Figure 5B).
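The selection rule described above (same direction of change in blood and cortex within a stratum, but opposite directions in APOE2 and APOE4) can be sketched in Python as follows. The data layout, the function name and the placeholder gene labels and values are assumptions for illustration only, not study data.

```python
def concordant_opposite(logfc):
    """Select genes concordant across tissues and opposite between strata.

    logfc maps gene -> {(stratum, tissue): log fold change, AD cases vs controls}.
    """
    selected = []
    for gene, fc in logfc.items():
        try:
            e2_blood, e2_cortex = fc[("APOE2", "blood")], fc[("APOE2", "cortex")]
            e4_blood, e4_cortex = fc[("APOE4", "blood")], fc[("APOE4", "cortex")]
        except KeyError:
            continue  # gene not measured in every required analysis
        same_in_e2 = e2_blood * e2_cortex > 0   # same sign in both tissues
        same_in_e4 = e4_blood * e4_cortex > 0
        opposite = e2_blood * e4_blood < 0      # APOE2 and APOE4 disagree
        if same_in_e2 and same_in_e4 and opposite:
            selected.append(gene)
    return selected

# Placeholder values, not results from the study.
toy = {
    "GENE_A": {("APOE2", "blood"): -0.4, ("APOE2", "cortex"): -0.2,
               ("APOE4", "blood"): 0.5, ("APOE4", "cortex"): 0.3},
    "GENE_B": {("APOE2", "blood"): 0.4, ("APOE2", "cortex"): -0.2,
               ("APOE4", "blood"): -0.5, ("APOE4", "cortex"): -0.3},
}
hits = concordant_opposite(toy)
```

Here GENE_A is kept, while GENE_B is dropped because its blood and cortex profiles disagree within the APOE2 stratum.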

Validation on proteomic datasets
We aimed at investigating if any of our candidate genes were detected and differentially expressed at the proteomic level using blood proteomics data from the ADDN study (931 proteins) and cortex proteomics from four independent datasets (BANNER, BLSA, MAYO and MSBB, 2,658 proteins).
Out of 737 RRA blood candidates, only 38 were present in the ADDN blood proteomic data (Supplementary Table 37 and Supplementary Figure 10). Among them, DE analyses between cases and controls, either overall or stratified by APOE haplotype, identified 8 differentially expressed genes in the unstratified analysis, 8 genes in the APOE4 stratum and 9 in the APOE3 stratum. We could confirm APOE allele-specific effects identified in the RRA analysis for the immune-related proteins AIF1, METAP2, NCK1, PRDX1, PRKCZ and RPS27A in the APOE3 stratum, and for FCGR2B and SEZ6L2 (involved in CNS development) in the APOE4 stratum. Overall, among these 38 RRA candidates, we identified a cluster of 11 proteins overexpressed in AD cases when compared to controls in the APOE3 stratum, but downregulated in APOE4 AD cases, including AIF1, APP, GDI2, HSP90AA1, METAP2, NACA, NCK1, PRDX1, RPS27A, SFTPD and UFC1 (Supplementary Figure 11); immunological functions associated with these proteins include leukocyte activation (APP, PRDX1, GDI2), the Toll-like receptor (TLR) cascade (APP, RPS27A, SFTPD) and phagocytosis (NCK1, HSP90AA1, SFTPD, AIF1), in line with our RRA findings.
In cortex, 234 out of 1,039 RRA candidates were present in the proteomics DE meta-analysis, 100 of them showing evidence of association (p<0.05) in at least one stratum or in the unstratified analysis (Supplementary Table 38 and Supplementary Figure 12). Of note, the largest differences between cases and controls were observed among APOE4 carriers, confirming at the proteomic level the role of APOE4 RRA candidates involved in neurogenesis (DPYSL4, EHD1, GABRB3, MAPK8, UNC13A) or, more specifically, in glial cell differentiation (CLU, GAP43, GFAP, GSN). Among APOE3 candidates, we confirmed candidates involved in neurotransmission such as RPH3A, PTK2B, ALDH5A1, GABRA2 and APP (the latter upregulated in all strata) and genes from the electron transport chain (ALDH5A1, NDUFA7, NDUFB3). Confirmed APOE2 candidates included the choline transporter SLC44A1, involved in myelin production, and the myelin basic protein MBP; MAPT was upregulated in all strata but particularly in the APOE2 stratum. We also confirmed the role of CDC42 and DST in all the strata, but we did not observe association of CD44 and PGM2L1 with AD in this analysis.

Cell-type-specific expression profiles: cortex snRNAseq
Since the enrichment analysis showed an overrepresentation of neuronal development related pathways in all strata, and of cells from the glial lineage in the APOE3 and APOE4 strata, we investigated in which cerebral cell types our cortex RRA candidates were mainly expressed, and which cell types showed the largest differences between cases and controls, using snRNAseq data from the ROSMAP study (Figure 6). We dropped pericytes and endothelial cells from the differential expression analysis because of the low number of cells (≈100 cells, <0.3%).
The APOE gene was mainly expressed in astrocytes and microglia (Figure 6). In accordance with previous results, APOE is upregulated in microglia from AD subjects when compared with controls (overall and stratified by APOE genotype). By contrast, in astrocytes we found higher APOE expression levels in cases than in controls in the APOE3 stratum (logFC=0.34, p=1.56×10⁻⁴), but significantly lower expression in APOE4 cases than in controls (logFC=-0.14, p=1.83×10⁻², p-interaction<10⁻⁵).
As reported in the original article [19], most neuronal genes were strongly differentially expressed in AD cases versus controls. Furthermore, our analysis found this result to be consistent irrespective of the APOE haplotype. Given that glial-specific signals arose from the APOE3 and APOE4 strata, we primarily focused on RRA cortex candidates showing evidence of association with AD in any glial cell type (astrocytes, microglia, oligodendrocytes and oligodendrocyte precursors) within the same APOE stratum (Figure 6, Supplementary Table 39 and Supplementary Figures 13-16). In fact, RRA candidates were mainly expressed in the glial lineage, showing a linear decrease in expression from APOE2 to APOE4 in the astrocyte and microglia populations, and an increase in expression in the oligodendrocyte subpopulation (Figure 6). The seven genes in common in all the RRA analyses were downregulated in all cell types except for APOE and APOC1 in microglia, and CD44 in astrocytes (Figure 6). In the stratified analysis, these 7 genes were predominantly downregulated in AD APOE3 carriers and upregulated among APOE4 AD cases when compared to controls, particularly APOC1, DST and CD44 (Supplementary Figure 9). By stratum, APOE2 RRA cortex candidates were mostly upregulated in all cell types (Supplementary Figure 10), in particular FXR1 and DNAJB1, the latter downregulated only in microglia. APOE3 RRA candidates showed the largest differences between cases and controls in microglia cells, where APOC1, ALDOA, RPLP0 and DYNLRB1 were strongly upregulated whereas ARL17B was downregulated in AD cases (Supplementary Figure 11).
Almost all APOE4 candidate genes were downregulated in both excitatory and inhibitory neurons and upregulated in the glial lineage, particularly TMEM163 and CPM in microglia and GFAP, PLCE1, CLU, CALN1, DLG2 and PDE5A in astrocytes (Supplementary Figure 12); we also observed a strong downregulation of the Serine/Threonine Kinase 17b (STK17B), involved in apoptosis and autophagy, in microglial cells of APOE4 cases.

DISCUSSION
The ADAPTED consortium has taken a holistic approach, analyzing and integrating diverse datasets from different OMICS technologies, including genomics, transcriptomics (bulk tissue and single cell) and proteomics, collected from public repositories and other consortia, resulting in nearly sixty thousand samples analyzed. The novelty of our strategy lies in the use of a stratified approach for the three major APOE haplotypes, and the integration of these signals with a rank-based algorithm which accommodates different kinds of data, resulting in replicated signals at different levels. These signals have been further explored at the single-cell level, pointing to key cellular types for AD. Previous attempts at integrating different OMICS in AD were mainly focused on the identification of quantitative trait loci (QTLs) for mRNA levels, protein levels or epigenomic signatures by means of association analyses [20][21][22][23], in some cases stratified by APOE allele [24]. Other approaches involved the independent analysis of the different OMICS and selection of concordant genes [25] or the combination of human GWAS data with mouse transcriptomics [26]. Potential limitations of our study include reduced sample size in some of the datasets, especially for the APOE2 stratum, and the use of unsigned methods (i.e. irrespective of the directionality of the expression profiles) for selecting candidate genes in expression datasets.
At the genome level, we were able to detect genome-wide significant signals for ABCA7, BIN1 and PICALM in the APOE3 stratum and for APOE, BIN1, CLU and PICALM in the APOE4 stratum. We identified a novel candidate region for APOE4 carriers on 4p15.1 (33.6Mb-34.3Mb), which, according to the GWAS catalogue (https://www.ebi.ac.uk/gwas/), has not been previously associated with AD, but with schizophrenia, total cholesterol change in response to fenofibrate in statin-treated type 2 diabetes, and levels of PCSK9, a protease that binds to lipoprotein receptors promoting their degradation; a homozygous deletion overlapping this region has been described for the offspring of a consanguineous marriage between first cousins, with cognitive impairment and autistic-like behavior [27]. Sex-stratified analysis identified genome-wide significant signals for APOE and BIN1 only in females; this result is in agreement with the recent report from Fan et al., who described a genome-wide significant association for BIN1 only in females [28]. Further stratification of male and female populations by APOE haplotype identified a genome-wide significant intergenic region on 13q31.3 among APOE3 males. This region has been associated with TREM2 levels, circulating Interleukin-1-receptor antagonist levels and triglyceride change in response to fenofibrate in statin-treated type 2 diabetes. This region harbors a USP7 pseudogene (RP11-464I4.1) associated with herpesvirus. Interestingly, a potential role of herpes simplex virus infection in AD has recently been the object of intense debate [29]. Despite the number of GWAS datasets collected, our study is still underpowered for detecting genuine APOE strata-specific signals with low effect sizes, but the resulting gene-level statistics were instrumental in selecting those DE signals that better correlate with the disease at the genetic level. This helps prioritize highly probable loci involved in the fundamental pathways of disease pathogenesis.
The genome-wide expression analysis was performed at two levels: blood and brain cortex. In blood, mitochondrial ribosomal genes as well as those encoding proteins of the respiratory chain appeared downregulated in cases irrespective of the APOE haplotype, but more markedly among APOE4 carriers. Mitochondria are crucial players in energy metabolism but are also the main source of Reactive Oxygen Species (ROS). Mitochondrial dysfunction has been proposed as the primary process triggering the cascade of events that leads to sporadic late-onset AD. Although this hypothesis has not been confirmed, diverse mitochondrial functions have been observed to be altered in AD and even MCI subjects, who show a significant increase in oxidative stress markers such as lipid peroxidation and protein oxidation products [30][31][32]. We did not observe mitochondrial signatures at the whole cortex level, which was mostly enriched in activated genes from neuronal, apoptosis, vesicle-mediated transport and adhesion related pathways, maybe because mitochondrial dysfunction has been reported to be limited to certain hippocampal and temporal cortex neurons [33,34].
Integration of genome data with expression data at blood and cortex levels through the RRA algorithm showed a larger overlap of genes and functions between APOE3 and APOE4 carriers than with APOE2 carriers, which appear as a more distinct entity. In fact, we identified signatures for chromatin remodeling and regulation in this stratum at both brain and plasma levels, not observed in the other two strata. Features of the disease common to all three strata are related to lipid metabolism due to APOE (except for the APOE3 carriers), APOC1 and APOC2. A recent report has suggested that the APOC1 gene, located in the APOE locus, is an independent risk factor for AD, and that genetic variability in the region is associated with chromatin regulation [35].
Although we have identified signatures of nervous system development in all strata, they represent a larger proportion of relevant pathways in the APOE3 stratum. In this stratum, enrichment analysis of RRA cortex candidates showed an over-representation of genes involved in cardiac development and function (DLG1, JPH2 or MEF2C among others), supporting a cardiovascular etiology of dementia in this stratum. In line with this finding, we have recently reported a link between cardiac function and AD that is mediated, at least in part, by CFLAR and caspase-dependent mechanisms [40]. In fact, CFLAR and CASP8 are both RRA cortex candidates in this stratum. Another example of the nervous-cardiac connection is GFAP, which participates in the control of heart rate and vascular resistance through the sympathetic nervous system (SNS). We have observed an upregulation of GFAP protein in cortex of all AD cases irrespective of the APOE carrier status. Macrophage activation and Fc gamma receptor mediated phagocytosis appeared as the most exclusive pathways in the APOE4 stratum. Phagocytosis (i.e. the engulfment and digestion of cellular debris) is critical for the degradation of infectious agents and senescent cells, playing a key role in tissue remodeling, immune response, and inflammation. Several Fc receptors (FcRs; FCGR2B, FCRLA and FCRLB) and downstream effectors such as CDC42, RHOH, RHOQ and RHOT2, GTPases that regulate the actin cytoskeleton, have been identified as APOE4 RRA candidates. While FcRs are constitutively active for phagocytosis, complement receptor (CR)-mediated phagocytosis is activated in the presence of additional stimuli. An additional difference between FcR- and CR-mediated phagocytosis is that the former has a higher capacity for triggering the release of inflammatory mediators [41].
In fact, an enhanced release of inflammatory molecules such as IL-6 (an APOE4 RRA candidate), IL1β or TNFα has been observed in blood among APOE4 carriers [42,43] and in blood and brain of humanized APOE4 mouse models [44][45][46]. In this study, we found that CR-related mechanisms were more relevant in APOE2 and APOE3 carriers, with CR1 and ATP5F1 as RRA candidates in both strata.
Macrophages are also involved in the development of atherosclerotic plaques through the intracellular accumulation of lipids and the formation of foam cells, a process counterbalanced by cholesterol efflux, a mechanism identified as an APOE4-specific feature in our study. A key protein in this process seems to be AIF1, a pro-inflammatory molecule expressed primarily in the monocyte/macrophage lineage, which was shown to be downregulated in plasma samples of APOE4 cases and upregulated in those of APOE3 cases in this study. AIF1 was originally cloned from a rat heart allograft under chronic rejection, and it is involved in several inflammatory conditions including atherosclerosis. Crossbreeding experiments with AIF1 and APOE transgenic mice have shown an interaction between these genes leading to atherosclerotic vasculopathy through modulation of the incorporation of degenerated LDL by macrophages [47,48].
In brain, the resident macrophages, microglia cells, are the specialized phagocytic cells acting through a complement-dependent mechanism coupled to ATP production. The analysis of single-cell cortex data points to a pivotal role of the glial lineage in the development of AD, in accordance with RRA results and current knowledge. Beyond astrocytes and microglia, the main cell types in which APOE is expressed, oligodendrocytes and oligodendrocyte precursors (OPCs) also play a role; interestingly, it has been suggested that astrocytes and oligodendrocytes could also participate in phagocytosis in the brain [49]. However, the main role of oligodendrocytes is the production of myelin in the central nervous system, a cholesterol-dependent mechanism; oligodendrocytes are continuously generated in the healthy adult brain, and the formation of new myelinating oligodendrocytes during adult life is an important mechanism for neuroplasticity [50]. Astrocytes were shown to influence all steps of myelination, promoting OPC proliferation through PDGF and FGF2, or inhibiting the differentiation of OPCs into myelin-forming cells through the CD44 receptor. Furthermore, CD44 is a top candidate from the cortex RRA analysis, upregulated in astrocytic cells of AD cases of all APOE strata, particularly in APOE4, while downregulated in the other cell types including OPCs, illustrating the complexity of AD-related mechanisms at the cellular level. The myelin basic protein encoding gene (MBP) is one of the top RRA candidates from the APOE2 stratum, also reinforcing the relevance of myelination in AD, in agreement with recent research in the field [51,52]. In fact, evidence from multiple sclerosis lesions suggests that Fc receptors and complement have relevant roles in myelin phagocytosis, while in vitro blockade of Fc receptors or CRs reduced myelin phagocytosis [53].
In summary, through the integration of multi-OMICS datasets we have identified both common and APOE-specific signatures of AD. The ADAPTED consortium has generated isogenic hiPSC-derived macrophages, neurons, astrocytes, and microglia carrying the different APOE haplotypes to further explore the findings presented here in human samples, in a cell-type-specific manner. This will support the further elucidation of APOE-dependent pathways that drive AD risk and potentially support the development of a therapy for AD patients.
Table 2 summarizes the datasets and number of individuals by APOE stratum included at each analysis stage (total number of processed samples: 50,737). A flow chart of the analyses performed in this report is shown in Figure 7. Additional information about study datasets is provided as Supplementary Note. Whenever possible, clinical information was reviewed to exclude: i) cases not classified as confirmed or probable AD; ii) controls with amyloid pathology or a history of altered cognition tests.

Quality control (QC) and imputation
A standard QC was applied to all datasets, including removal of individuals with more than 3% missing genotypes, with excess autosomal heterozygosity (>0.35 or more than 3 standard deviations (SD) from the population mean), those showing a discrepancy between genotypic and reported sex, as well as individuals of non-European ancestry based on SMARTPCA principal component (PC) analyses (exclusion of subjects more than 6 SDs away from the population mean) [54]. Duplicated and related individuals were identified and removed by means of IBS estimates (IBS>0.1875) both within and across studies. At the genotype level, we removed SNPs with missing genotype rate >5%, SNPs not in Hardy-Weinberg equilibrium (HWE) (p<10⁻⁶ in controls) and SNPs with minor allele frequency (MAF) <1%. When necessary, datasets were updated to genome build GRCh37/hg19. Genotype imputation was performed at the University of Michigan server using the minimac3 algorithm and the SHAPEIT tool for haplotype phasing, with the Haplotype Reference Consortium (HRC) cohort as reference panel [55]. After imputation, only SNPs with an R² quality estimate higher than 0.3 and MAF >1% were kept for association analysis.
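The SNP-level thresholds listed above can be written as simple predicates. The following Python sketch is illustrative only: the `SnpStats` container and the function names are hypothetical, and in practice such filters would be applied with dedicated genotype QC software rather than hand-written code.

```python
from dataclasses import dataclass

@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical container for illustration)."""
    missing_rate: float     # fraction of missing genotypes
    hwe_p_controls: float   # Hardy-Weinberg p value computed in controls
    maf: float              # minor allele frequency

def passes_genotype_qc(snp):
    """Keep SNPs with missing rate <= 5%, HWE p >= 1e-6 and MAF >= 1%."""
    return (snp.missing_rate <= 0.05
            and snp.hwe_p_controls >= 1e-6
            and snp.maf >= 0.01)

def passes_post_imputation_qc(r2, maf):
    """Keep imputed SNPs with R2 quality > 0.3 and MAF > 1%."""
    return r2 > 0.3 and maf > 0.01

kept = passes_genotype_qc(SnpStats(missing_rate=0.01, hwe_p_controls=0.3, maf=0.12))
dropped = passes_genotype_qc(SnpStats(missing_rate=0.01, hwe_p_controls=1e-8, maf=0.12))
```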

APOE stratified association analysis
Association analysis was performed within each dataset in three independent groups: ε2 stratum (including subjects with APOE genotypes ε2/ε2 and ε2/ε3), ε3 stratum (ε3/ε3 individuals) and ε4 stratum (ε3/ε4 and ε4/ε4 carriers). The ε2/ε4 genotype was excluded because of the combination of both the protective and deleterious alleles. Association of genotype dosages with the AD case-control status was explored through regression models adjusted by age, sex and the first four PC vectors as covariates using PLINK software [56].
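The stratum assignment described above can be sketched as a small function. This is an illustrative example only: the function name and the integer allele encoding (2, 3, 4 for ε2, ε3, ε4) are assumptions, not part of the original analysis code.

```python
from typing import Optional


def apoe_stratum(allele1: int, allele2: int) -> Optional[str]:
    """Map an APOE genotype to its analysis stratum.

    Alleles are encoded as 2, 3 or 4 (an assumed encoding).
    Returns "e2", "e3" or "e4", or None for the excluded e2/e4 genotype.
    """
    alleles = {allele1, allele2}
    if not alleles <= {2, 3, 4}:
        raise ValueError("APOE alleles must be 2, 3 or 4")
    if alleles == {2, 4}:
        return None  # protective and risk allele combined: excluded
    if 2 in alleles:
        return "e2"  # e2/e2 or e2/e3
    if 4 in alleles:
        return "e4"  # e3/e4 or e4/e4
    return "e3"      # e3/e3
```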

Sex and APOE stratified association analysis
We also explored the effect of both APOE and sex on susceptibility to AD using two approaches. We first performed a sex-stratified analysis using logistic regression models adjusted by age, the first four PC vectors and APOE genotype as a quantitative trait, assigning each E2 allele a value of -1, each E3 allele a value of 0 and each E4 allele a value of +1 (full range: from -2 to 2). Additionally, we performed an association analysis stratified by both APOE and sex. For these analyses, eight datasets from Stage I (ADDN, ADGC, ADNI, GNADA, MAYO, NIA, NXC, ROSMAP, N=12,158 individuals) and Stage II (GR@ACE, 5,741 subjects) were used.
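The quantitative APOE coding used as a covariate can be illustrated as follows. This is a minimal sketch; the function name and the integer allele encoding (2, 3, 4) are assumptions made for the example.

```python
# Per-allele scores as described in the text: E2 = -1, E3 = 0, E4 = +1.
APOE_ALLELE_SCORE = {2: -1, 3: 0, 4: 1}


def apoe_quantitative_score(allele1: int, allele2: int) -> int:
    """Sum the per-allele scores, giving a value between -2 and 2."""
    return APOE_ALLELE_SCORE[allele1] + APOE_ALLELE_SCORE[allele2]
```

Under this coding an ε2/ε4 carrier scores 0, the same as an ε3/ε3 individual.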

Meta-analysis (meta-GWAS)
Within each stage and stratum, association results were combined by meta-analysis using the inverse variance method implemented in METAL [57] or PLINK software programs. SNPs with MAF >1% that were available in at least 60% of the datasets at each stage were included in the meta-analysis. Genomic inflation lambda (λ) was calculated using the GenABEL package [58]. Manhattan and QQ plots were generated with the qqman R package [59].
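For a single SNP, the inverse variance method weights each study's effect estimate by the reciprocal of its squared standard error. The sketch below shows the standard fixed-effect formula in plain Python; it is not the METAL or PLINK implementation, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance meta-analysis for one SNP.

    Returns the pooled effect, its standard error and a two-sided p-value
    from the normal approximation.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled_beta, pooled_se, p_value
```

Studies with smaller standard errors (typically the larger cohorts) therefore dominate the pooled estimate.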

Gene-level analysis
Gene-level analysis was performed using MAGMA software, which computes gene-wise statistics taking into account physical distance and linkage disequilibrium (LD) between markers [60]. All SNPs with MAF above 5% were used in these analyses, setting a distance threshold of 50kb. At each stratum, genes were ranked according to the global p mean value.

Study cohorts
Whole blood expression profiles for meta-analysis were obtained from the ADNI and AddNeuroMed studies.

Differential expression analysis
As for GWAS data, differential expression (DE) analysis between cases and controls was performed independently in the three APOE subgroups using the R package limma [65], by dataset and brain region when available. Limma results were adjusted for multiple testing using the Benjamini and Hochberg (BH) method. Volcano plots and heatmaps were produced to assess these results. Probes were annotated to gene symbols using appropriate specific libraries, keeping the most differentially expressed mRNA isoform for those genes showing alternative splicing.
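The BH adjustment applied to the limma p-values can be written in a few lines. This is a generic sketch of the standard step-up procedure, not the limma code itself; the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```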

Differential expression meta-analysis (meta-GWES)
Independent APOE-stratified meta-analyses were performed to combine DE results from the different datasets into single ranked gene lists for both blood and cortex. For cortex, only genes present in at least 70% of the datasets were considered for meta-analysis. Individual logFCs were combined using the Random Effect Model (REM). Given that the analysis included data from different brain regions, genes were ranked according to the Fisher statistic to avoid making assumptions about the directionality of the effect. The aim was to identify candidate markers differentially expressed in the "majority" of studies, a setting in which Fisher's method has been described to outperform other methods in terms of detection power, biological association, stability and robustness [66]. All the analyses were performed with the MetaDE R tool. Heatmap graphs were generated with the pheatmap R package.
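Fisher's method combines k independent p-values through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a plain-Python illustration rather than the MetaDE implementation; the function name is hypothetical, and it uses the closed-form chi-square survival function available for even degrees of freedom.

```python
import math
from typing import Sequence


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Combine independent p-values for one gene with Fisher's method."""
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one p-value, each in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    # For 2k degrees of freedom: P(chi2 > X) = exp(-X/2) * sum_{j<k} (X/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total
```

Because only p-values enter the statistic, a gene up-regulated in one brain region and down-regulated in another can still rank highly, which is the direction-agnostic behaviour described above.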

Integrative analysis
In order to obtain single per-gene estimates, GWAS and GWES data were combined using the Robust Rank Aggregation (RRA) method [67]. The algorithm, implemented in the RobustRankAggreg R package, uses a probabilistic model for aggregation that is robust to noise and also facilitates the calculation of significance probabilities for all the elements in the final ranking. Two independent runs of the RRA algorithm were performed. In both of them we combined the stage I+II+III GWAS meta-analysis with either the blood or the cortex GWES meta-analysis (Figure 7). Final gene ranks for blood and cortex were generated according to ascending order of the exact p values generated by the RRA algorithm.
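The core of RRA is a per-gene score built from order statistics of the gene's normalized ranks (rank divided by list length) across the input lists. The sketch below illustrates that score in plain Python, assuming the standard formulation in which each ordered rank is compared with its expectation under uniformly distributed ranks. It applies the simple Bonferroni-style correction rather than the exact p-value computation reported in the text, and the function name is hypothetical.

```python
import math
from typing import Sequence


def rra_rho_score(normalized_ranks: Sequence[float]) -> float:
    """Corrected rho score for one gene from its normalized ranks in [0, 1]."""
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    if n == 0:
        raise ValueError("at least one rank is required")

    def prob_at_least(k: int, x: float) -> float:
        # P(at least k of n independent uniform ranks are <= x)
        return sum(math.comb(n, l) * x ** l * (1.0 - x) ** (n - l)
                   for l in range(k, n + 1))

    rho = min(prob_at_least(k, x) for k, x in enumerate(ranks, start=1))
    return min(rho * n, 1.0)  # Bonferroni-style correction over the n order statistics
```

With two input lists, as in the GWAS plus GWES setting, a gene ranked near the top of both lists receives a much smaller score than a gene ranked highly in only one.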

Proteomic data analysis
Proteomic data from blood (ADDN study) and brain (BANNER, BLSA, MAYO and MSBB studies) were collected. Histograms and boxplots were generated to assess the distribution of normalized intensity protein expression values distributed by data providers. Differential protein expression analyses by study and APOE stratum were performed using limma, with PMI, age, sex and, when available, lipid lowering medication as covariates. Meta-analysis of the diverse brain datasets was performed as described for GWES datasets.

Single nuclei RNAseq (snRNAseq) data analysis
Additionally, we explored snRNAseq cortex data from the ROSMAP study [19]. The count matrix provided by the ROSMAP study was processed using the Seurat package [68]. After QC (filtering out cells that have unique feature counts over 2,500 or less than 200 and cells with >5% mitochondrial counts), data were normalized and scaled. Prior to clustering the cells, we applied the Uniform Manifold Approximation and Projection (UMAP) dimensional reduction technique. Finally, a differential expression analysis between AD cases and controls was performed for each cell type using the edgeR package [69].
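The cell-level QC rule can be stated as a single predicate. This is an illustrative sketch only: the study used Seurat in R, and the function and argument names here are hypothetical.

```python
from typing import Dict, List


def keep_cell(n_features: int, pct_mito: float) -> bool:
    """Keep cells with 200 to 2,500 detected features and at most 5% mitochondrial counts."""
    return 200 <= n_features <= 2500 and pct_mito <= 5.0


def filter_cells(cells: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Filter a list of per-cell summaries with assumed keys 'n_features' and 'pct_mito'."""
    return [c for c in cells if keep_cell(int(c["n_features"]), c["pct_mito"])]
```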

Enrichment analysis
Enrichment analysis of RRA results was performed using four different tools: WebGestaltR [70,71], FUMA [72] and gPROFILER [73], for genes passing the multiple testing correction threshold (p=0.05), and GSEA [74] for full gene ranked lists. The databases interrogated include GO, KEGG, WikiPathways, and Reactome. Only pathways and GO categories selected by at least two enrichment tools with adjusted p<0.05 and a minimum of three overlapping genes were retained for further exploration.
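The consensus rule across enrichment tools can be sketched as follows. The input structure and the function name are hypothetical, and the sketch assumes that the adjusted p-value and overlap criteria are applied within each tool before counting how many tools support a term.

```python
from typing import Dict, List, Tuple

# Assumed layout: tool name -> {term: (adjusted p-value, number of overlapping genes)}
ToolResults = Dict[str, Dict[str, Tuple[float, int]]]


def consensus_terms(results: ToolResults, alpha: float = 0.05,
                    min_tools: int = 2, min_overlap: int = 3) -> List[str]:
    """Return terms meeting the thresholds in at least `min_tools` tools."""
    support: Dict[str, int] = {}
    for hits in results.values():
        for term, (adj_p, n_overlap) in hits.items():
            if adj_p < alpha and n_overlap >= min_overlap:
                support[term] = support.get(term, 0) + 1
    return sorted(term for term, count in support.items() if count >= min_tools)
```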

Data availability statement
Summary statistics are included as Supplementary Tables and will be made available through Synapse repository (https://www.synapse.org/) upon publication.
Most data used in this article are publicly available (see acknowledgement section).

Code availability statement
Code used for this article will be made publicly available through a public Jupyter server ( ADAPTED and MOPEAD projects (grant numbers 115975 and 115985, respectively) and by national grants PI19/01301, PI16/01861, PI17/01474 and PI19/01240. Acción Estratégica en Salud is integrated into the Spanish National R + D + I Plan and funded by ISCIII (Instituto de Salud Carlos III)-Subdirección General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER-'Una manera de hacer Europa'). Some control samples and data from patients included in this study were provided in part by the National DNA Bank Carlos III (http://www.bancoadn.org/, University of Salamanca, Spain) and Hospital Universitario Virgen de Valme (Sevilla, Spain); they were processed following standard operating procedures with the appropriate approval of the Ethical and Scientific Committee.

Extended datasets description

The Alzheimer's disease genetics consortium (ADGC)
The National Institute on Aging (NIA) Alzheimer's Disease Centres (ADCs) cohort includes subjects ascertained and evaluated by the clinical and neuropathology cores of the 29 NIA-funded ADCs [1]. Data collection was coordinated by the National Alzheimer's Coordinating Center (NACC). The ADC cohort consists of autopsy-confirmed and clinically confirmed AD cases, and cognitively normal elders (CNEs) with complete neuropathology data who were older than 60 years at age of death, and living CNEs evaluated using the Uniform Data Set (UDS) protocol who were documented to not have mild cognitive impairment (MCI) and were between 60 and 100 years of age at assessment.

The AddNeuroMed study
AddNeuroMed was a public-private partnership for biomarker discovery and replication in Alzheimer's disease [2,3]. It was designed as a multi-center study in Europe with the first patient enrolled in January 2006 and the last in February 2008. The study protocol was planned for a baseline assessment visit with follow-ups every 3 months for the first year, followed by annual visits that continued through 2013. The study enrolled a total of 258 AD, 257 MCI and 266 controls, not all with complete data at each assessment.

The Alzheimer's disease neuroimaging initiative (ADNI)
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). The ADNI study has three phases: ADNI1, ADNI GO and ADNI2. For up-to-date information, see http://www.adni-info.org.

The atherosclerosis risk in communities (ARIC)
The ARIC study is a population-based cohort study of atherosclerosis and clinical atherosclerotic diseases (ARIC Investigators 1989) [4]. At its inception (1987-1989), 15,792 men and women, including 11,478 white and 4,266 black participants, were recruited from four U.S. communities: Suburban Minneapolis, Minnesota; Washington County, Maryland; Forsyth County, North Carolina; and Jackson, Mississippi. In the first 3 communities, the sample reflects the demographic composition of the community. In Jackson, only black residents were enrolled. Participants were between age 45 and 64 years at their baseline examination in 1987-1989 when blood was drawn for DNA extraction and participants consented to genetic testing. Vascular risk factors and outcomes, including transient ischemic attack, stroke and dementia, were determined in a standard fashion. During the first 2 years (1993-1994) of the third ARIC examination (V3), participants aged 55 and older from the Forsyth County and Jackson sites were invited to undergo cranial MRI. This subgroup of individuals with MRI scanning represents a random sample of the full cohort because examination dates were allocated at baseline through randomly selected induction cycles.

The Banner Sun Health Research Institute (Banner) study
This study is based on 201 post-mortem brain tissue samples obtained from the Banner Sun Health Research Institute's Brain and Body Donation Program. The tissue set came from 101 cognitively normal (controls) and 100 Alzheimer's disease (AD) cases. Label free proteome analysis was done on the dorsolateral prefrontal cortex from all individuals. Post-mortem neuropathological evaluation was performed at Banner Sun Health Research Institute. This included amyloid plaque distribution according to CERAD criteria and neurofibrillary tangle pathology assessed with Braak staging. Control cases were defined as cognitively normal within on average 9 months of death with low CERAD (0.13 ±0.35) and Braak (2.26 ±0.94) measures for amyloid and tau neuropathology, respectively. In contrast, AD cases were demented at the last clinical research assessment, and the brains showed high CERAD (2.9 ±0.31) and Braak (5.4 ±0.82) scores consistent with moderate to severe neuropathological burden. There was no significant difference in age or post mortem interval (PMI) between control and AD.

The Baltimore longitudinal study on aging (BLSA) study
The BLSA study included 97 post-mortem brain tissue samples from the National Institute on Aging's Baltimore Longitudinal Study of Aging (BLSA, https://www.blsa.nih.gov/). The tissue set came from 50 individuals representing 15 controls, 15 AsymAD and 20 AD cases. For 47 cases, we analyzed tissue from both the dorsolateral prefrontal cortex (FC, Brodmann Area 9) and precuneus (PC, Brodmann Area 7). Both regions are affected in AD, and PC is a site of early amyloid deposition and glucose hypometabolism. Post-mortem neuropathological evaluation was performed at the Johns Hopkins Alzheimer's Disease Research Center with the Uniform Data Set including amyloid plaque distribution according to CERAD criteria and neurofibrillary tangle pathology assessed with Braak staging. Control cases were defined as cognitively normal within on average 9 months of death with low CERAD (0.13 ±0.35) and Braak (2.26 ±0.94) measures for amyloid and tau neuropathology, respectively [5]. In contrast, AD cases were demented at the last clinical research assessment, and the brains showed high CERAD (2.9 ±0.31) and Braak (5.4 ±0.82) scores consistent with moderate to severe neuropathological burden. AsymAD cases were cognitively normal proximate to death and had high CERAD (2.1 ±0.52) and moderate Braak (3.6 ±0.99).

The cohort for heart and ageing research in genomic epidemiology (CHARGE) consortium
The CHARGE consortium currently includes six large, prospective, community-based cohort studies that have genome-wide variation data coupled with extensive data on multiple phenotypes [5]. A neurology working-group arrived at a consensus on phenotype harmonization, covariate selection and analytic plans for within-study analyses and meta-analysis of results [6]. Consent procedures, examination and surveillance components, data security, genotyping protocols and study design at each study were approved by a local Institutional Review Board; details are provided below. Of the six studies, we included in this study the Atherosclerosis Risk in Communities (ARIC) study, the Cardiovascular Health Study (CHS), the Framingham Heart Study (FHS) and the Rotterdam Study (RS).

The cardiovascular health study (CHS)
The CHS is a population-based cohort study of risk factors for coronary heart disease and stroke in adults ≥ 65 years conducted across four field centers [7]. The original predominantly European ancestry cohort of 5,201 persons was recruited in 1989-1990 from random samples of the Medicare eligibility lists; subsequently, an additional predominantly African-American cohort of 687 persons was enrolled for a total sample of 5,888. Blood samples were drawn from all participants at their baseline examination and DNA was subsequently extracted from available samples. Genotyping was performed at the General Clinical Research Center's Phenotyping/Genotyping Laboratory at Cedars-Sinai among CHS participants who consented to genetic testing and had DNA. European ancestry participants were excluded from the GWAS study sample due to the presence at study baseline of coronary heart disease, congestive heart failure, peripheral vascular disease, valvular heart disease, stroke or transient ischemic attack or lack of available DNA. Among those with successful GWAS, 567 European ancestry participants had available FreeSurfer measures for this analysis. CHS was approved by institutional review committees at each field center and individuals in the present analysis had available DNA and gave informed consent including consent to use of genetic information for the study of cardiovascular disease.

The European Alzheimer's disease initiative (EADI) consortium
All the 2,240 Alzheimer's disease cases were ascertained by neurologists from Bordeaux, Dijon, Lille, Montpellier, Paris, Rouen, and were identified as French NHW ancestry. Clinical diagnosis of probable Alzheimer's disease was established according to the DSM-III-R and NINCDS-ADRDA criteria. Controls were selected from the 3C Study [8]. This cohort is a population-based, prospective (10-years follow-up) study of the relationship between vascular factors and dementia. It has been carried out in three French cities: Bordeaux (southwest France), Montpellier (southeast France) and Dijon (central eastern France). A sample of non-institutionalized, over-65 subjects was randomly selected from the electoral rolls of each city. Between January 1999 and March 2001, 9,686 subjects meeting the inclusion criteria agreed to participate. Following recruitment, 392 subjects withdrew from the study. Thus, 9,294 subjects were finally included in the study (2,104 in Bordeaux, 4,931 in Dijon and 2,259 in Montpellier). Genomic DNA samples of 7,200 individuals were transferred to the French Centre National de Génotypage (CNG). First stage samples that passed DNA quality control were genotyped with Illumina Human 610-Quad BeadChips. At the end we removed 308 samples because they were found to be first-or second-degree relatives of other study participants or were assessed non-European descent based on genetic analysis using methods described in 89. In this final sample, at 10 years of follow-up, 564 individuals suffered from Alzheimer's disease with 95 prevalent and 469 incident cases.

The Framingham heart study (FHS)
The FHS is a three-generation, single-site, community-based, ongoing cohort study that was initiated in 1948 to investigate the risk factors for cardiovascular disease. It now comprises 3 generations of participants: the Original cohort followed since 1948; their Offspring and spouses of the Offspring (Gen 2), followed since 1971 [9]; and children from the largest Offspring families enrolled in 2000 (Gen 3) [10]. The Original cohort enrolled 5,209 men and women who comprised two-thirds of the adult population then residing in Framingham, MA. Survivors continue to receive biennial examinations. The Offspring cohort comprises 5,124 persons (including 3,514 biological offspring) who have been examined approximately once every 4 years. The Third-generation includes 4,095 participants with at least one parent in the Offspring Cohort. The first two generations were invited to undergo an initial brain MRI in 1999-2005, and for Gen 3, brain MRI began in 2009. The population of Framingham was virtually entirely white (Europeans of English, Scots, Irish and Italian descent) in 1948 when the Original cohort was recruited. Self-reports of ethnicity across all three generations were 99.7% whites, reflecting the ethnicity of the population of Framingham in 1948. FHS participants had DNA extracted and provided consent for genotyping, and eligible participants underwent genome-wide genotyping.

Multi-site collaborative study for genotype-phenotype associations in Alzheimer's disease and longitudinal follow-up of genotype-phenotype associations in Alzheimer's disease and neuroimaging component of genotype-phenotype associations in Alzheimer's disease (GenADA)
GenADA was a multi-site collaborative study, involving GlaxoSmithKline Inc and nine medical centers in Canada, including 1000 AD patients and 1000 ethnically-matched controls in order to associate DNA sequence (allelic) variations in candidate genes with AD phenotypes [11,12]. The study consists of both retrospective and prospective data. Where possible, biological relatives with Alzheimer's (up to third degree relationship) and unaffected siblings of AD cases were also recruited.

The genetic and environmental risk for Alzheimer's disease (GERAD1) consortium
The GERAD1 sample comprised up to 3941 AD cases and 7848 controls. A subset of this sample has been used in this study and was genotyped at the Sanger Institute on the Illumina 610-quad chip. London and the South East Region AD project (LASER-AD), University College London; Competence Network of Dementia (CND) and Department of Psychiatry, University of Bonn, Germany and the National Institute of Mental Health (NIMH) AD Genetics Initiative. All AD cases met criteria for either probable (NINCDS-ADRDA, DSM-IV) or definite (CERAD) AD. All elderly controls were screened for dementia using the MMSE or ADAS-cog, were determined to be free from dementia at neuropathological examination or had a Braak score of 2.5 or lower.

The genome research @ fundació ACE project (GR@ACE) study
The GR@ACE study comprises 4,120 AD cases and 3,289 control individuals. Cases were recruited from Fundació ACE, Institut Català de Neurociències Aplicades (Catalonia, Spain). Diagnoses were established by a multidisciplinary working-group, including neurologists, neuropsychologists, and social workers, according to the DSM-IV criteria for dementia and to the National Institute on Aging and Alzheimer's Association's (NIA-AA) 2011 guidelines for defining AD [13]. Individuals with dementia diagnosed with probable or possible AD at any moment of their clinical course were considered AD cases.
Briefly, participants were genotyped using the Axiom 815K Spanish Biobank Array (Thermo Fisher) at the Spanish National Center for Genotyping (CeGEN, Santiago de Compostela, Spain). Individuals were excluded for low-quality samples (call rate <97%), excess heterozygosity, sample duplicates, or relation to another sample (PIHAT > 0.1875). Individuals were also excluded if sex discrepancy was detected. Population outliers of European ancestry were also removed. Variants were excluded if they departed from the Hardy-Weinberg equilibrium (P-value ≤ 1 × 10⁻⁶), presented a different missing rate between cases and controls (P-value < 5 × 10⁻⁴ for the difference), or had a low frequency (MAF < 0.01) or low call rate (< 95%). High-quality variants were imputed in the Michigan Imputation Server using the Haplotype Reference Consortium (HRC) panel (https://imputationserver.sph.umich.edu). Only high imputation quality markers (MAF > 0.05 and R² > 0.03) were used for downstream analysis. Further information about phenotyping and GWAS quality controls has been previously provided [14].

The mayo clinic LOAD genome-wide association study (MAYO)
Subjects from the Mayo LOAD GWAS were selected from two clinical AD Case-Control series, Mayo Clinic Jacksonville (MCJ) and Mayo Clinic Rochester (MCR), and a neuropathological series of autopsy-confirmed subjects from the Mayo Clinic Brain Bank [15]. All subjects from the clinical series (MCJ and MCR) were diagnosed by a Mayo Clinic neurologist; all control subjects had a Clinical Dementia Rating score of zero at the most recent time of testing; all LOAD patients had a diagnosis of probable or possible AD according to the NINCDS-ADRDA criteria [16]. All ADs had definite diagnosis according to the NINCDS-ADRDA criteria and had Braak scores of ≥4.0. All non-AD Controls had Braak scores of ≤2.5; many had brain pathology unrelated to AD.

The Mount Sinai Brain Bank (MSBB) study
Brain specimens were obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB) which holds over 1,700 samples. This cohort was assembled after applying stringent inclusion/exclusion criteria and represents the full spectrum of disease severity. Neuropathological assessments are performed according to the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) protocol and include assessment by hematoxylin and eosin, modified Bielschowski, modified thioflavin S, and anti-β amyloid (4G8), anti-tau (AD2) and anti-ubiquitin (Daka Corp.). Each case is assigned a Braak AD-staging score for progression of neurofibrillary neuropathology. Quantitative data regarding the density of neuritic plaques in the middle frontal gyrus, orbital frontal cortex, superior temporal gyrus, inferior parietal cortex and calcarine cortex are also collected as described. Clinical dementia rating scale (CDR) and mini-mental state examination (MMSE) severity tests are conducted for assessment of dementia and cognitive status. Final diagnoses and CDR scores are conferred by consensus. Based on CDR classification, subjects are grouped as no cognitive deficits (CDR = 0), questionable dementia (CDR = 0.5), mild dementia (CDR = 1.0), moderate dementia (CDR = 2.0), and severe to terminal dementia (CDR = 3.0-5.0). Covariates including demographic and neuropathological data were collected on the samples used for this project including postmortem interval, race, age of death, clinical dementia rating, clinical neuropathology diagnosis, CERAD, Braak, sex, and a series of neuropathological variables.

The Neocodex-Murcia study (NXC)
The study includes 327 sporadic AD patients and 801 controls with unknown cognitive status from the Spanish general population collected by Neocodex [17,18]. AD patients were diagnosed as possible or probable AD in accordance with the criteria of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) [16].

The national institute on aging - late onset Alzheimer's disease family study (NIA)
The goal of this study is to identify and recruit families with two or more siblings with the late-onset form of Alzheimer's disease and a cohort of unrelated, nondemented controls similar in age and ethnic background, and to make the samples, the clinical and genotyping data and preliminary analyses available to qualified investigators world-wide [19]. Genotyping by the Center for Inherited Disease Research (CIDR) was performed using the Illumina Infinium II assay protocol with hybridization to Illumina Human 610Quadv1_B Beadchips.

The religious orders study and memory and aging project (ROS/MAP) study
The Religious Orders Study (ROS) is a longitudinal clinical-pathologic cohort study of aging and Alzheimer's disease (AD) from the Rush University that enrolled individuals from religious communities for longitudinal clinical analysis and brain donation [20]. Participants were enrolled from more than 40 groups of religious orders (nuns, priests, brothers) across the United States. Medical conditions are documented starting in 1994 by clinical evaluation or self-report. Alzheimer's Disease status was determined by a computer algorithm based on cognitive test performance with a series of discrete clinical judgments made in series by a neuropsychologist and a clinician.
The Memory and Aging Project (MAP) is a longitudinal, epidemiologic clinical-pathologic cohort study of common chronic conditions of aging with an emphasis on decline in cognitive and motor function and risk of Alzheimer's disease that began in 1997 and is run from Rush University [20]. This study was designed to complement the ROS study by enrolling individuals with a wider range of life experiences and socioeconomic status into a study of similar structure and design as ROS. The study enrolled older individuals without any signs of dementia, primarily recruiting from continuous care retirement communities throughout north-eastern Illinois, USA. Diagnoses of dementia and AD are performed in an identical manner to the ROS study.

The Rotterdam study
The Rotterdam Study is a prospective, population-based cohort study among individuals living in the well-defined Ommoord district in the city of Rotterdam in The Netherlands [21,22]. The aim of the study is to determine the occurrence of cardiovascular, neurological, ophthalmic, endocrine, hepatic, respiratory, and psychiatric diseases in elderly people. The cohort was initially defined in 1990 among approximately 7,900 persons, aged 55 years and older, who underwent a home interview and extensive physical examination at the baseline and during follow-up rounds every 3-4 years (RS-I). The cohort was extended in 2000/2001 (RS-II, 3,011 individuals aged 55 years and older) and 2006/2008 (RS-III, 3,932 subjects, aged 45 and older). Written informed consent was obtained from all participants and the Medical Ethics Committee of the Erasmus Medical Center, Rotterdam, approved the study.

The Translational Genomics Research Institute (TGEN) study
The TGEN GWAS study included 643 late onset AD cases and 404 controls from a neuropathological cohort, and 197 late onset AD cases and 114 controls from a clinical cohort [23].