Quantitative endophenotypes as an alternative approach to understanding genetic risk in neurodegenerative diseases

Endophenotypes, as measurable intermediate features of human diseases, reflect their underlying molecular mechanisms. The use of quantitative endophenotypes in genetic studies has improved our understanding of pathophysiological changes associated with diseases. The main advantage of the quantitative endophenotype approach over a classic case-control study design is the inferred biological context, which can enable the development of effective disease-modifying treatments. Here, we summarize recent progress on biomarkers for neurodegenerative diseases, including cerebrospinal fluid and blood-based, neuroimaging, neuropathological, and clinical studies. This review focuses on how endophenotype studies have successfully linked genetic modifiers to disease risk, disease onset, or progression rate and provided biological context to genes identified in genome-wide association studies. Finally, we review critical methodological considerations for implementing this approach and future directions.


Introduction
Alzheimer's disease (AD) is the most common neurodegenerative disease and the leading cause of dementia. It is characterized by progressive memory decline, cognitive impairment, and behavioral changes (Alzheimer's, 2019). An estimated 5.8 million Americans currently have AD, and that number could grow to 13.8 million by mid-century (Alzheimer's, 2019). To date, there is no effective treatment for AD; the medications currently in use only temporarily improve symptoms (Alzheimer's, 2019). The development of effective treatments is hindered because the precise molecular changes and processes in the brain that cause AD are still unknown. Moreover, AD is a heterogeneous disease, presenting a broad spectrum of clinical signs and pathology (Ferreira et al., 2018). Like AD, other neurodegenerative disorders are also clinically and pathologically heterogeneous, including Parkinson's disease (PD), frontotemporal dementia (FTD), and dementia with Lewy bodies (DLB) (Matej et al., 2019).
AD neuropathological features include extracellular deposits of amyloid-beta (Aβ) peptides in plaques and intracellular neurofibrillary tangles (NFTs) composed of hyperphosphorylated tau protein (Hyman et al., 2012). The accumulation of Aβ plaques starts decades before clinical symptoms appear (Bateman et al., 2012). According to the amyloid cascade hypothesis, NFT formation is driven by Aβ pathology or soluble oligomeric Aβ (Bakota and Brandt, 2016). Other comorbid pathologies are also present in a portion of AD patients. In approximately 30% of AD patients, a burden of cerebrovascular pathology is observed as a function of age (Toledo et al., 2013). Similar to PD and DLB, about 50% of AD patients show aggregation of misfolded α-synuclein in Lewy bodies and Lewy neurites (Hamilton, 2000). Additionally, TAR DNA binding protein 43 (TDP-43) immunoreactive lesions have been observed in over 20% of AD cases (Amador-Ortiz et al., 2007); they are mostly seen in the 'oldest-old' and in severe cases (Huang et al., 2020).
Genome-wide association studies (GWAS) combined with whole-genome and exome-sequencing projects have identified about 40 loci associated with AD, in addition to APOE ε4 (Sims et al., 2017; Kunkle et al., 2019; Jansen et al., 2019). For most of those loci, no culprit gene has been identified; therefore, a functional link to AD is not evident. Sporadic AD has an estimated heritability between 70 and 80%. The current genetic findings explain about half of that heritability (Ridge et al., 2016). The missing heritability may be due to the genetic complexity and the clinical heterogeneity of AD. Alternative approaches are necessary to decode the genetic complexity of AD and other neurodegenerative diseases.
Biomarkers are measurable, endogenous characteristics that denote an abnormal process representing either risk or manifestation of a disease. Endophenotypes are a subtype of biomarkers that are disease-specific and heritable. Attributes of a disease can be broken down into endophenotypes, which might help diagnose and understand it. Dissecting AD by endophenotypes may help find the missing heritability. The use of endophenotypes may reveal novel genes and pathways involved in neurological disorders. Moreover, genetic drivers of endophenotypes may be associated with different aspects of the disease, such as risk, age of onset, and progression rate. There are significant advantages to using quantitative endophenotypes for discovery, such as reduced heterogeneity, increased statistical power, and specific biological context for associated genes and variants.
Here we describe studies that have used quantitative endophenotypes and how their discoveries helped understand the underlying disease mechanism. We summarize recent progress on biomarkers for neurodegenerative diseases, including cerebrospinal fluid and blood-based, neuroimaging, neuropathological, and clinical studies. We focus on studies that successfully link genetic drivers of endophenotypes to disease risk.

Selection of the studies included in this review
We followed the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines.

Search strategy and inclusion criteria
We conducted a systematic literature search from database inception until September 2020. We used keywords including Alzheimer's disease, neuroimaging, endophenotypes, GWAS, quantitative trait, and genetics, and added keywords such as biomarker, serum, plasma, blood, and CSF. Finally, we ran the following query on PubMed: "(Endophenotype* OR biomarker) AND (dementia OR Alzheimer*) AND (plasma OR imaging OR CSF) AND GWAS". The authors independently evaluated the titles and abstracts of all retrieved papers for potential eligibility. We reviewed full-text articles related to potentially eligible biomarkers and then compiled a final list of included studies. Between-reviewer discrepancies were resolved through discussion. After a thorough review of full-text articles, twenty-two studies were ultimately included in our meta-analysis.

Eligibility criteria
To be included, studies had to be (a) an observational study, either with genetic data from cohorts or a cross-sectional design comparing biomarker levels in patients with AD and controls, or (b) a human trial. We excluded (a) non-human studies, (b) reviews and commentaries, and (c) wrong outcomes (biomarkers without genetic analyses).

Study prioritization/final selection
For studies focusing on the same phenotype, we prioritized those with a larger sample size or with a replication or validation stage.

Data extraction
The authors independently extracted data of interest following the PRISMA guidelines. The primary outcome was genetic variants associated with differential biomarker levels between patients with AD and controls, including statistical data such as effect size, direction of effect, and standardized mean difference (SMD), with the corresponding 95% confidence interval (CI) and p-value. We defined the control group as cognitively healthy participants. If no relevant data regarding biomarker levels were reported in the article, we attempted to use other compatible statistical parameters (e.g., p-value, sample size, or odds ratio) to estimate the effect size (ES). These estimated ESs were then converted to SMDs and pooled. We extracted the variables of interest, including biomarker level, mean age, gender distribution (proportion of female participants), and biomarker source.
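The conversion and pooling step described above can be illustrated with a minimal sketch. It assumes the widely used logit conversion from an odds ratio to an SMD (SMD = ln(OR) × √3 / π) and a fixed-effect, inverse-variance pooling; the text does not state which conversion or pooling model was actually applied, and the function names and input values are hypothetical.

```python
import math

def log_odds_ratio_to_smd(odds_ratio):
    """Convert an odds ratio to a standardized mean difference (logit method)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

def pool_fixed_effect(smds, standard_errors):
    """Inverse-variance (fixed-effect) pooled SMD with an approximate 95% CI."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * d for w, d in zip(weights, smds)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Hypothetical inputs for illustration only, not taken from any reviewed study.
smd_from_or = log_odds_ratio_to_smd(1.0)  # an odds ratio of 1 maps to an SMD of 0
pooled, ci = pool_fixed_effect([0.2, 0.4], [0.1, 0.1])
```

With equal standard errors the pooled estimate is simply the mean of the two SMDs; a random-effects model would be a reasonable alternative when studies are heterogeneous.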

Fluid biomarker
The primary sources of body fluid biomarkers for neurological diseases are cerebrospinal fluid (CSF) and blood. Because of the direct interaction with the interstitial space, CSF is the most representative of the brain's biochemical and metabolic changes.
Aβ is a physiological product of cleavage of the transmembrane protein amyloid precursor protein (APP). Cleavage of APP by α-secretase results in the production of soluble α APP, which is neuroprotective (Turner et al., 2004). However, if APP is sequentially cleaved by β-site amyloid precursor protein cleaving enzyme 1 (BACE-1) and γ-secretase, Aβ peptides are produced (Portelius et al., 2011; Yan, 2017). Aβ peptides of variable length result from imprecise cleavage by γ-secretase in the transmembrane domain of APP. The Aβ peptide that consists of 42 amino acids (Aβ42) is the longest and most hydrophobic, and it is the major component of the plaques in AD brains (Glenner and Wong, 1984; Masters et al., 1985). As Aβ42 accumulates in the brain, both CSF and blood Aβ42 levels decrease (Portelius et al., 2011).
Tau is encoded by the microtubule-associated protein tau (MAPT) gene; alternative splicing results in six isoforms (Goedert et al., 1989). Tau is a microtubule-associated protein located in the axon of neurons, and it has been linked to several neurodegenerative disorders (tauopathies). Tau phosphorylation is essential for regulating microtubule stability and axonal transport (Johnson and Stoothoff, 2004). The NFTs present in AD brains are composed mainly of hyperphosphorylated tau (Bakota and Brandt, 2016). The hyperphosphorylation of tau reduces its affinity for microtubules, contributing to axonal pathology (Krstic and Knuesel, 2013). Tau hyperphosphorylation and aggregation play a significant role in causing neuronal loss (Khan and Bloom, 2016). CSF Aβ42, total tau (tTau), and phosphorylated tau (pTau) are considered core biomarkers for AD and support AD diagnosis (Dubois et al., 2014). Decreased CSF Aβ42 is observed in AD patients (Motter et al., 1995), with an inverse correlation with Aβ plaques in the brain (Grimmer et al., 2009; Jagust et al., 2009). CSF tTau levels represent neuronal and axonal degeneration and damage in the brain, but they are not specific to a particular neurological disorder (Hesse et al., 2001; Öst et al., 2006; Blom et al., 2009). CSF pTau (tau phosphorylated at threonine 181) levels are increased in AD patients, but not in most other forms of dementia (Sjögren et al., 2001; Koopman et al., 2009), and are therefore useful for differential diagnosis. Abnormal levels of CSF Aβ42, tTau, and pTau are observed in individuals more than a decade before clinical symptoms appear. These well-established AD endophenotypes have been used in GWA studies to identify potential functional regulatory mechanisms.
In 2017, Deming et al. published a large GWA study of CSF Aβ42, tTau, and pTau181 levels in 3146 samples that revealed novel associated loci. Two novel loci were associated with CSF levels of pTau181: 13q21.1 near PCDH8 (rs9527039, p = 5.95 × 10⁻⁹) and 18q23 near CTDP1 (rs12961169, p = 5.12 × 10⁻¹⁰). For CSF Aβ42, two novel loci were identified: one near GLIS1 on 1p32.3 (rs185031519, p = 2.08 × 10⁻⁸) and another within SERPINB1 on 6p25 (rs316341, p = 1.72 × 10⁻⁸). Deming et al. reported that these loci are associated with AD risk (rs185031519, p = 3.43 × 10⁻²), age at onset (rs316341, p = 4.62 × 10⁻³), and rate of cognitive decline (rs185031519, p = 1.92 × 10⁻²). The direction of effect is consistent: the alleles associated with lower CSF Aβ42 are the same ones associated with increased risk, earlier onset, and accelerated AD progression. Previously reported loci were replicated, such as the proxy SNP for APOE ε4 (Kim et al., 2011), which is the most significant variant associated with CSF levels of Aβ42, tTau, and pTau181. It has been hypothesized that APOE's main effect on AD is through Aβ deposition, as carriers of APOE ε4 alleles show large amounts of brain Aβ deposition and low CSF Aβ42. However, APOE also affects tau pathogenesis independently of the presence of Aβ pathology. In patients without Aβ pathology, tau neuropathology was more severe in APOE ε4 carriers than in noncarriers (Shi et al., 2017). Deming et al. replicated two loci previously identified for pTau181, on 3q28 near the GMNC gene and on 9p24.2 in the GLIS3 gene. These loci represent only a small proportion of the genetic modifiers of CSF Aβ42, tTau, and pTau levels. Larger cohorts are expected to unveil additional loci.
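As a rough illustration of how a quantitative endophenotype enters a single-variant association test, the sketch below regresses a trait on allele dosage under an additive model. It is a simplification under stated assumptions: real analyses typically adjust for covariates such as age, sex, and ancestry components, which are omitted here, and the p-value uses a normal approximation that is reasonable only for large samples. The function name and the toy data are hypothetical.

```python
import math

def additive_association(dosages, trait):
    """Regress a quantitative trait on allele dosage (0, 1, or 2 copies).

    Returns the per-allele effect (beta), its standard error, and a
    two-sided p-value from a normal approximation (an exact test would
    use the t distribution with n - 2 degrees of freedom).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value

# Synthetic toy data: each extra copy of the allele lowers the trait by about 2 units.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_trait = [10.1, 9.9, 8.2, 7.8, 6.1, 5.9]
beta, se, p_value = additive_association(toy_dosages, toy_trait)
```

In a genome-wide scan the same test is repeated for every variant, and the resulting p-values are compared against a genome-wide significance threshold.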
In 2018, Deming et al. (2018) performed sex-stratified and sex-interaction analyses of CSF Aβ42 and tTau levels to identify sex-specific associations (1527 males and 1509 females). The most significant association for both males and females for CSF Aβ42 and tTau was at the APOE locus. In the CSF tTau analysis, the APOE association was stronger in females than in males, whereas in the CSF Aβ42 analysis it was stronger in males. Females showed genome-wide significant associations of CSF Aβ42 with loci on 6p25 within SERPINB1 (rs316341, p = 4.25 × 10⁻⁸) and on 4q34.3 near LINC00290 (rs13115400, p = 3.97 × 10⁻⁸), which were not found in males (p = 0.009 and p = 0.20, respectively). The levels of expression of SERPINB1 were associated with higher levels of amyloidosis in females (corrected p-values < 0.02) but not in males (p > 0.38), suggesting a sex-specific effect of SERPINB1 on amyloidosis. For CSF tTau, the previously identified locus 3q28, near GMNC, reached genome-wide significance in females (rs1393060, p = 8.27 × 10⁻¹⁰) but not in males (p = 0.03). Analysis of expression of the genes in the 3q28 locus revealed an association of high expression levels of OSTN (p = 0.006) and CLDN16 (p = 0.002) with lower tangle density in females, but not in males, unveiling potential female-specific regulators of tau pathology.
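One simple way to compare sex-stratified effect estimates is a two-sample z-test on the per-sex regression coefficients. The sketch below shows that test; it is an approximation that assumes the male and female strata are independent, it is not necessarily the interaction model used in the study above, and the input values are hypothetical.

```python
import math

def sex_difference_test(beta_female, se_female, beta_male, se_male):
    """Two-sided z-test for a difference between sex-specific effect sizes."""
    z = (beta_female - beta_male) / math.sqrt(se_female ** 2 + se_male ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical per-allele effects and standard errors in each stratum.
z, p_value = sex_difference_test(0.30, 0.05, 0.05, 0.05)
```

A variant that is significant in one sex and not the other does not by itself demonstrate a sex difference; a direct test of heterogeneity such as this one addresses that question.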

CSF TREM2
Variants in the TREM2 (triggering receptor expressed on myeloid cells 2) gene have been associated with increased risk of AD in different populations (Guerreiro et al., 2013;Jonsson et al., 2013;Benitez et al., 2013;Jin et al., 2014;Jin et al., 2015). TREM2 protein's primary functions include regulation of myeloid cell number, inflammation, and phagocytosis (Jay et al., 2017). In AD brains, in the early stages of the disease, TREM2 expression is upregulated and plays a role in the clearance of amyloid-beta through phagocytosis, suggesting a protective effect. In later stages, TREM2 activates the inflammatory responses, which can be detrimental (Jay et al., 2017).
To identify genetic modifiers of CSF TREM2 and better understand the role of TREM2 in AD, Deming and colleagues performed a GWA study using CSF TREM2 levels as a quantitative trait. Two independent genome-wide significant signals were found on 11q12.2 within the membrane-spanning 4-domains subfamily A (MS4A) gene cluster (rs1582763, p = 1.15 × 10⁻¹⁵, and rs6591561, p = 1.47 × 10⁻⁹). These two variants have opposite effects: rs1582763 is associated with higher CSF TREM2 levels, lower risk for AD, and older age of onset, while rs6591561 is associated with lower CSF TREM2 levels, higher risk for AD, and earlier age of onset. The variant rs1582763 regulates the expression of the MS4A4A and MS4A6A genes, whose expression is highly correlated with TREM2 expression levels. In primary human macrophages, TREM2 protein increased with MS4A4A overexpression, and TREM2 production was reduced when MS4A4A was silenced. Although both genes (TREM2 and MS4A4A) have been implicated in AD, Deming et al. (2019) demonstrated for the first time that it is possible to modulate TREM2 levels by targeting MS4A4A levels. This study also provides a biological context for the previous association of MS4A4A with AD risk: MS4A4A modifies AD risk by regulating TREM2 levels. Mendelian randomization studies also demonstrated that higher TREM2 levels are protective, and there are currently clinical trials targeting TREM2 with activating antibodies, or targeting MS4A4A, as a potential therapy for Alzheimer's disease.
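The logic of a Mendelian randomization analysis can be sketched with the single-instrument Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. This is a minimal illustration with a first-order standard error; the text does not specify which Mendelian randomization method the cited studies used, and the numbers below are hypothetical rather than reported estimates.

```python
def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant causal estimate of exposure on outcome.

    beta_exposure: per-allele effect on the exposure (e.g. biomarker level).
    beta_outcome:  per-allele effect on the outcome (e.g. log odds of disease).
    The standard error uses the first-order approximation, which ignores
    uncertainty in beta_exposure.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se

# Hypothetical values: the allele raises the biomarker by 0.5 SD and
# lowers the log odds of disease by 0.1.
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=-0.1, se_outcome=0.02)
```

A negative estimate here would be read as higher biomarker levels lowering disease risk, provided the usual instrumental-variable assumptions hold.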
Deming and colleagues (Deming et al., 2016a, 2016b, 2016c) conducted a study to identify genetic regulators of CSF YKL-40 and to determine whether these genetic regulators are associated with AD phenotypes. Using a small cohort of 379 individuals, they identified a genome-wide significant cis-signal in the CHI3L1 gene (rs10399931, p = 1.76 × 10⁻¹⁴). This locus explained 12% of the CSF YKL-40 variance. However, rs10399931 was not associated with AD risk, age at onset, or progression, suggesting that this variant is associated with the normal regulation of YKL-40 levels but not with the disease. This result also indicated that YKL-40 is a biomarker of disease but not an endophenotype (YKL-40 is not implicated in AD pathogenesis). Levels of CSF YKL-40 have been reported to be highly correlated with CSF pTau level (Antonell et al., 2014a). When rs10399931 is included in the additive model, a significant increase in correlation with CSF pTau level is observed, enhancing its biomarker specificity (Deming et al., 2016a, 2016b, 2016c). This is an example of how genetics can be combined with proteomics to create a better prediction model at an individual level.
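The idea of sharpening a biomarker by accounting for a variant that shifts its baseline level can be sketched as follows. The example removes the genotype-driven component of a biomarker by simple linear residualization and compares its correlation with a second measure before and after adjustment. This is one possible way to include genotype in a model, not necessarily the procedure used in the cited study, and all data are synthetic and constructed so that genotype is unrelated to the second measure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residualize(values, dosages):
    """Remove the linear (additive) effect of allele dosage from a biomarker."""
    n = len(values)
    mv, md = sum(values) / n, sum(dosages) / n
    slope = (sum((d - md) * (v - mv) for d, v in zip(dosages, values))
             / sum((d - md) ** 2 for d in dosages))
    return [v - mv - slope * (d - md) for v, d in zip(values, dosages)]

# Synthetic data: the biomarker tracks the second measure plus a genotype offset.
second_measure = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dosage = [0, 1, 2, 2, 1, 0]
biomarker = [m + 2.0 * d for m, d in zip(second_measure, dosage)]

raw_r = pearson(biomarker, second_measure)
adjusted_r = pearson(residualize(biomarker, dosage), second_measure)
```

In this toy setting the adjusted correlation is higher than the raw one because the genotype contributes variance to the biomarker that is unrelated to the second measure.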

Plasma progranulin
Progranulin is a secreted glycoprotein encoded by the GRN gene; it is widely expressed and performs various functions (Bateman and Bennett, 1998; Nguyen et al., 2013). Progranulin acts as a trophic factor for different cell types, including neurons (Van Damme et al., 2008; Beel et al., 2017). Heterozygous deletions in the GRN gene cause FTD (Baker et al., 2006; Cruts et al., 2006), while homozygous deletions cause neuronal ceroid lipofuscinosis (NCL) (Smith et al., 2012), a lysosomal storage disease. NCLs are rare neurological diseases, while FTD is one of the most common forms of early-onset dementia. Common variants in the GRN gene are associated with AD risk (Viswanathan et al., 2009) and with reduced survival of amyotrophic lateral sclerosis (ALS) patients.
Carrasquillo and colleagues (2010) performed a GWA study with 533 control plasma samples and identified a genome-wide significant association of the 1p13.3 locus (rs646776, p = 1.7 × 10⁻³⁰) with lower levels of GRN protein. The SNP is located near the SORT1 gene and had been previously reported to increase mRNA levels of SORT1 (Musunuru et al., 2010). Overexpression of SORT1 in HeLa cells resulted in a drastic reduction of secreted GRN in the media, while SORT1 knockdown increased GRN levels. The rs646776 SNP showed no correlation with SORT1 mRNA levels in cerebellum or frontal cortex brain samples, indicating a different regulation pathway for SORT1 and GRN in the brain.

Other fluid biomarkers
Many other fluid biomarkers are being investigated for links to clinical presentation and pathology in neurodegenerative diseases. For example, the clusterin protein is increased in the hippocampus of AD patients (Lidström et al., 1998), and CSF clusterin levels are increased in AD patients (Nilselid et al., 2006; Sihlbom et al., 2008). Deming et al. used CSF clusterin as a quantitative endophenotype in 673 individuals and found suggestive loci at 16q24.1 in an intron of LINC00917 (rs2581305, p = 3.98 × 10⁻⁷) and at 7p15.3 near interleukin 6 (rs1800795, p = 9.94 × 10⁻⁶) (Deming et al., 2016b). These results need to be confirmed in a larger cohort.
CSF apolipoprotein E (APOE) level is another interesting candidate endophenotype. The association between CSF APOE levels and AD is still ambiguous. The CSF APOE levels in APOE ε4 carriers are associated with the clinical and pathophysiological manifestation of AD (van Harten et al., 2017). However, CSF APOE levels were not consistently different between AD patients and controls (Talwar et al., 2016). In 2012, an additive model relating APOE genotype to CSF APOE levels in 570 individuals found a significant association (p = 6.9 × 10⁻¹³). The risk conferred by APOE genotype was inversely correlated with CSF APOE levels (Cruchaga et al., 2012). GWAS analysis of CSF APOE levels did not identify genome-wide significant signals, likely due to the small sample size.
The CSF level of Aβ42 is one of the core biomarkers for AD and is used to support the diagnosis of AD (Dubois et al., 2014). Studies using plasma levels of Aβ42 as a surrogate for CSF levels have produced inconsistent results (Olsson et al., 2016). A recent study found that the ratio of plasma Aβ42 to Aβ40 predicts brain amyloidosis (Schindler et al., 2019). No genome-wide significant loci were found in a meta-analysis of 3528 healthy individuals using plasma levels of Aβ40 and Aβ42 (Chouraki et al., 2014). More reliable tests for plasma Aβ levels have been developed since 2014, when the meta-analysis was performed.
In 2014, Kauwe et al. performed a meta-analysis of 574 individuals with CSF levels of 59 AD-related analytes. CSF levels of five proteins presented genetic associations with study-wide significant p-values (p < 1.46 × 10⁻¹⁰): angiotensin-converting enzyme (ACE), chemokine (C-C motif) ligand 2 (CCL2), chemokine (C-C motif) ligand 4 (CCL4), interleukin 6 receptor (IL6R), and matrix metalloproteinase-3 (MMP3). The ACE protein has been implicated in AD pathogenesis (Oba et al., 2005), and CSF levels of this protein were associated with SNPs within the ACE locus. The SNPs associated with higher CSF ACE levels were previously associated with reduced risk for AD (Lambert et al., 2013). CSF MMP3 levels have been reported to be increased in individuals with the pTau181/Aβ42 ratio seen in AD patients. SNPs associated with CSF MMP3 levels are located within the MMP3 gene, are associated with increased levels of CSF MMP3, and were previously associated with reduced risk of AD (Lambert et al., 2013). In a similar study, Deming et al. (2016b) performed protein quantitative trait locus (pQTL) analyses in more than 800 individuals for 300 protein levels. Fifty-six genome-wide significant associations (28 novel) with 47 analytes were found, including novel GWAS hits for ACE or APOE levels. These findings suggest that these proteins, and possibly their pathways, could be therapeutic targets for AD.
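When many analytes are tested genome-wide in the same study, the significance threshold is usually tightened beyond the conventional genome-wide level. The sketch below shows a plain Bonferroni correction as one way to do this. The figure of one million independent tests per genome-wide scan is the usual convention behind the 5 × 10⁻⁸ threshold and is an assumption here; the exact number of tests behind the study-wide threshold quoted above is not stated in this section, so the code does not attempt to reproduce it.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests

# Conventional genome-wide threshold: 0.05 over about one million independent tests.
genome_wide = bonferroni_threshold(0.05, 1_000_000)

# Testing 59 traits genome-wide multiplies the number of tests accordingly.
multi_trait = bonferroni_threshold(0.05, 1_000_000 * 59)
```

Bonferroni is conservative when traits or variants are correlated, which is common for panels of related CSF proteins.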

Neuroimaging biomarkers
Recent developments in neuroimaging tools allow for the real-time detection of changes in structure, function, and molecular composition of the brain in living subjects with neurodegenerative diseases. Neuroimaging resources, in combination with clinical assessments, are commonly used in the diagnosis of neurological disorders. Neuroimaging can be roughly divided into three categories: structural imaging, functional neuroimaging, and molecular imaging. Magnetic resonance imaging (MRI) measures brain volume and tissue characteristics. Positron emission tomography (PET) uses a "tracer" compound tagged with a radioactive isotope. PET tracers are designed to bind to abnormal proteins, neurotransmitter receptors, enzymes, or active neurons. In summary, neuroimaging techniques make it possible to assess brain structure and to detect abnormal proteins in living subjects, information that was previously obtainable only by post-mortem examination.

Amyloid-β PET imaging
Brain accumulation of amyloid plaques can be identified by using PET amyloid-β (Aβ) specific radiotracers, such as ¹¹C-labeled Pittsburgh Compound B (PiB). PiB was the first specific radioligand to bind aggregates of Aβ (Klunk et al., 2004); since then, other specific ligands have been developed (Herholz and Ebmeier, 2011). There is increased retention of PiB in the brains of AD patients compared to cognitively healthy controls (Klunk et al., 2004; Jack Jr. et al., 2013), and Aβ accumulation detected by PiB is significantly associated with the APOE ε4 allele. Yan and colleagues (2018) performed a GWA study and meta-analysis using PiB-PET imaging data from 1000 subjects and revealed that the most significant locus associated with PiB data was the APOE locus. APOE ε4 was associated with higher PiB retention, while APOE ε2 was associated with lower PiB retention, consistent with previous reports of Aβ brain accumulation and APOE alleles (Morris et al., 2010). A study with a larger number of subjects may unveil novel genetic loci associated with Aβ brain burden.
Another specific PET Aβ ligand commonly used in diagnosis and research studies is the ¹⁸F-labeled tracer florbetapir. Ramanan et al. (2014) performed a GWA study in 555 participants with florbetapir PET imaging. They identified the previously associated APOE locus (rs429358, p = 5.5 × 10⁻¹⁴) and a novel locus upstream of the BCHE gene on 3q26.1 (rs509208, p = 2.7 × 10⁻⁸). The BCHE gene encodes the protein butyrylcholinesterase, which is known to be enriched within Aβ plaques (Guillozet et al., 1997). However, this novel locus was not replicated in the Yan et al. study, which used a different ligand but a larger sample size. Ramanan et al. (2015) also performed a GWA study of longitudinal changes in brain amyloid burden measured by florbetapir PET in 495 participants. They identified a novel association with the 3q28 locus (rs12053868, p = 1.38 × 10⁻⁹) within the IL1RAP gene. Carriers of rs12053868-G were more likely to progress from mild cognitive impairment (MCI) to AD. This locus has been previously associated with CSF levels of pTau, as discussed in this review (Deming et al., 2017).
Recently, a study conducted in the preclinical phase of AD included PET imaging data from 6 cohorts in a meta-analysis of 4314 individuals (Raghavan et al., 2020). They replicated the previously associated APOE locus (rs6857, p = 5.79 × 10⁻¹³²). They also identified a novel locus, 16p13.3 (rs56081887, p = 3 × 10⁻⁹), that includes the gene RBFOX1, which encodes an RNA-binding protein named ataxin-2 binding protein. They observed that low expression of RBFOX1 was correlated with a higher Aβ burden (p = 0.002) and worse cognition (p = 0.006). Furthermore, the RBFOX1 protein localized around Aβ plaques and with neurofibrillary tangles. These results imply that the protein may play a general role in AD-related proteinopathy. However, these findings need to be replicated in more extensive and independent studies.
PET detection of reduced glucose metabolism in the posterior cingulate cortex (PCC) predicts the conversion from healthy to MCI and from MCI to AD (de Leon et al., 2001; Chételat et al., 2003). A GWA study of the decline in PCC ¹⁸F-fluorodeoxyglucose PET signal in 606 participants identified a genome-wide significant signal at locus 14q32.12 within the PPP4R3A gene (rs2273647, p = 4.44 × 10⁻⁸) (Christopher et al., 2017). Using an independent cohort of 870 individuals, the authors demonstrated that the variant was protective against conversion to MCI or AD (p = 0.038) and against cognitive decline in individuals with dementia (p = 3.41 × 10⁻¹⁵). They also observed that the variant altered the expression of PPP4R3A in peripheral blood and temporal cortex. This study suggests that PPP4R3A is a potential candidate target for AD therapies.

Magnetic resonance imaging
Structural imaging is an invaluable tool for both diagnosis and the understanding of the development of neurodegenerative diseases. MRI detects patterns of atrophy, assesses vascular burden, and ascertains brain lesions. Early detection of signatures of neurodegenerative diseases and tracking of disease progression are among the advantages of using MRI. Early-onset AD and genetically mediated FTD show specific brain atrophy signatures that can be distinguished by MRI (Frisoni et al., 2007; Möller et al., 2013; Rohrer et al., 2015). Several MRI markers are associated with aging and are predictors of dementia, including total brain volume, hippocampal volume, and white matter hyperintensities (Debette and Markus, 2010; Sperling et al., 2011; Vermeer et al., 2007). Hippocampal atrophy is one of the core biomarkers for AD (McKhann et al., 2011) and is used to evaluate disease progression, as it correlates with Braak staging and neuronal counts (Gosche et al., 2002; Jack Jr. et al., 2002). Furney et al. (2011) conducted a GWA study using MRI measures of the hippocampus and entorhinal cortex in 939 individuals. They identified significant associations of entorhinal cortical volume with locus 6q14.3 within the ZNF292 gene (rs1925690, p = 2.56 × 10⁻⁸) and of entorhinal cortical thickness with locus 3p22.3 (rs11129640, p = 5.57 × 10⁻⁸). Additionally, they performed a gene-based analysis of AD-associated genes (Harold et al., 2009; Lambert et al., 2009) that identified an association between entorhinal cortical thickness and the PICALM gene (p = 6.66 × 10⁻⁶).
A 2019 GWA study of hippocampal atrophy rate in 602 cognitively normal individuals identified a genome-wide significant locus near the TOMM40-APOC1 region (rs4420638, p = 9.32 × 10⁻⁹) (Guo et al., 2019). The rs4420638 minor allele (G) was also associated with lower Mini-Mental State Examination scores, higher Alzheimer's Disease Assessment Scale-Cognitive Subscale 11 ratings, and accelerated cognitive decline. Although Guo et al. focused on TOMM40, which is adjacent to APOE, as the main gene driving the association at this locus, a previous study linked APOE to hippocampal volume. Chauhan et al. conducted a large cross-sectional study with 8175 cognitively normal individuals and found a nominal association (p = 0.0054) of the AD risk allele of APOE (rs2075650) with smaller hippocampal volume. More studies are necessary to understand the interactions between these genes and how they play a role in hippocampal atrophy.
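A longitudinal phenotype such as an atrophy rate has to be reduced to one number per person before it can be used as a quantitative trait. One simple option, sketched below, is the least-squares slope of volume over time for each individual. The text does not describe how the atrophy rate was derived in the study above, so this is only an illustrative assumption, and the visit times and volumes are synthetic.

```python
def annual_change(years, volumes):
    """Least-squares slope of volume over time, in volume units per year."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(volumes) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, volumes))
    sxx = sum((t - mean_t) ** 2 for t in years)
    return sxy / sxx

# Synthetic example: four yearly scans of one person (volumes in cubic millimeters).
visit_years = [0.0, 1.0, 2.0, 3.0]
hippocampal_volumes = [3500.0, 3450.0, 3400.0, 3350.0]

slope = annual_change(visit_years, hippocampal_volumes)
percent_per_year = 100.0 * slope / hippocampal_volumes[0]
```

Expressing the slope as a percentage of the baseline volume makes rates more comparable across people with different head sizes; mixed-effects models are a common alternative when visit schedules are irregular.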
A large study comprising 38851 individuals with MRI data identified genetic associations with the volumes of the nucleus accumbens, amygdala, brainstem, caudate nucleus, globus pallidus, putamen, and thalamus (Satizabal et al., 2019). The authors identified 199 genes associated with these subcortical brain structures. The genes are implicated in neurodevelopment, axonal transport, synaptic signaling, inflammation/infection, apoptosis, and risk of neurological diseases. One of the genes identified was ALPL, associated with globus pallidus volume, which encodes an alkaline phosphatase protein present in neuronal membranes and other body fluids (Moss, 1997). Alkaline phosphatase has been implicated in promoting tau's neurotoxicity (Diaz-Hernandez et al., 2010), has increased activity in AD, and inversely correlates with cognitive function (Vardy et al., 2012; Kellett et al., 2011). Moreover, the DLG2 gene, associated with putamen volume, has been previously associated with schizophrenia, cognitive impairment, and Parkinson's disease (Ingason et al., 2015; Nithianantharajah et al., 2013; Nalls et al., 2014). The gene FOXO3 has been associated with brainstem volume, stress, sleep, and Huntington's disease (Scarpa et al., 2016). This study showed that the genetic architecture of subcortical volumes overlaps with that of neuropsychiatric disorders.
A new approach was taken by Horgusluoglu-Moloch et al. (2020), who used an MRI technique, diffusion tensor imaging (DTI), to detect abnormal changes in neuronal fibers at the microstructural level. DTI can detect abnormal diffusion patterns in specific white matter regions in MCI and AD subjects (Rathore et al., 2017). They analyzed DTI scans from 269 individuals and integrated them with genetic and transcriptomic data to identify genetic risk factors underlying white matter abnormalities in AD. They found that the hippocampus and sagittal stratum were the white matter regions most highly correlated with memory and AD pathology endophenotypes. They also identified a significant association between the CELF1 gene (p = 8 × 10⁻⁴) and the hippocampus when performing a gene-based association analysis with AD risk loci identified by the International Genomics of Alzheimer's Project (Lambert et al., 2013).
In 2018, Scelsi et al. (2018) performed a GWA study with a disease progression score (DPS) for AD that is derived by combining amyloid burden (PET imaging) and bilateral hippocampal volume (MRI). The DPS was calculated for 944 participants and used as the phenotype for the GWA analysis. They identified a significant association with locus 4p15.31 (rs6850306, p = 1.03 × 10⁻⁸); this variant is an expression quantitative trait locus (eQTL) for the LCORL gene in brain tissue. Using an independent cohort, they showed that rs6850306 was protective against conversion from MCI to AD (n = 911, p = 0.032). This new approach proved useful in identifying new genetic drivers of complex diseases.

Neuropathological biomarkers
Neuropathological examination of the post-mortem brains of patients with neurodegenerative diseases has identified pathological changes that provide insight into these diseases. The standardization and harmonization of neuropathological criteria have enabled accurate diagnoses of neurodegenerative disorders. Neuropathologic changes are evaluated in brain sections by histochemical stains or by immunohistochemistry directed against specific proteins. The neuropathological hallmarks of AD include Aβ plaques and NFTs, as well as neuronal and synaptic loss (Serrano-Pozo et al., 2011).
In 2019, a sex-stratified GWA study was performed on autopsy measures of neurofibrillary tangles and neuritic plaques. The study included 2701 males and 3275 females, 70% of whom were diagnosed with AD. The APOE locus was genome-wide significant for neuritic plaques and neurofibrillary tangles in both males and females (p < 4.96 × 10⁻²⁰). In males, another genome-wide significant signal was identified at locus 7p21.1 (rs34331204, p = 2.48 × 10⁻⁸); this signal was not significant in females (p = 0.85). Using publicly available data resources, the authors found an association of rs34331204 with hippocampal volume (p = 0.014) and executive function (p = 0.001) only in males. These results suggest that this locus may confer male-specific protection from tau pathology.

Neuronal proportion
AD patients present a reduction of neuronal density in the hippocampus compared with age-matched healthy individuals (Padurariu et al., 2012). Neuronal loss precedes the formation of Aβ, implicating it in the early stages of disease (Wright et al., 2013). Thus, understanding changes in brain cell proportions may help to reveal drivers of the early stages of AD. Quantification of brain cell populations requires expertise and may still be biased by technical artifacts (Golub et al., 2015). Li et al. developed and validated a digital deconvolution algorithm to determine cell proportions from bulk RNA-sequencing data. Using this method, Li et al. confirmed that AD cases had a lower neuronal proportion than cognitively healthy controls.
Subsequently, this method was used to identify genes associated with neuronal proportion in the human cortex. Neuronal proportion was nominally associated with a variant located in the TMEM106B gene (rs1990621, p = 6.40 × 10⁻⁷). The association was replicated in an independent dataset (p = 7.41 × 10⁻⁴), and a meta-analysis showed a genome-wide significant association (p = 9.42 × 10⁻⁹) (Li et al., 2020). The variant rs1990621 is in strong linkage disequilibrium (LD) with the coding missense TMEM106B variant p.T185S (rs3173615, r² = 0.98), which has previously been associated with a lower risk for FTD (Nicholson et al., 2013). The association is independent of disease status, indicating that TMEM106B is a general protective factor. It also helps to explain the mechanism by which TMEM106B is associated with disease: TMEM106B protects neurons from dying. This finding opens the door to novel therapeutic approaches targeting TMEM106B.

Considerations and future directions
Genetic studies using endophenotypes are an alternative to the classic case-control design. They provide enough power to identify novel associations with smaller sample sizes and uncover genetic effects on relevant disease-associated biomarkers. Clinical diagnosis is not a reliable variable for complex diseases; therefore, characterizing the genetic profiles of endophenotypes is a powerful tool to improve our understanding of disease development. Several neurodegenerative disorders share many endophenotypes, so this approach can unveil the biological mechanisms that these conditions have in common. Here, we described several examples in which quantitative endophenotypes identified a potential functional mechanism for genes and pathways involved in neurodegenerative diseases. One of the major challenges of working with biomarkers is addressing the heterogeneity of data collection. Biomarker data might be collected in different centers that use different platforms, creating the need for harmonization of the data. There are several approaches to combining data from different studies. One of the most basic is to analyze each dataset independently and perform meta-analyses. This approach is relatively straightforward but provides lower statistical power than joint analyses. In addition, other downstream analyses, such as conditional or sex-specific analyses, need to be performed in each dataset independently. Analyzing all of the data in joint analyses is a more powerful approach and facilitates downstream analyses. However, to perform joint analyses, it is necessary to harmonize the phenotype data.
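As an illustration of the per-dataset strategy described above, the following minimal sketch combines per-cohort effect estimates with a fixed-effect, inverse-variance-weighted meta-analysis. This is one common choice and not necessarily the method used in the studies cited in this review; the function name `fixed_effect_meta` and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas, ses: per-cohort effect estimates and their standard errors
    (hypothetical inputs; each cohort is analyzed separately first).
    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Made-up summary statistics for two cohorts (not taken from any study).
beta, se, p = fixed_effect_meta([0.10, 0.08], [0.02, 0.03])
print(f"pooled beta = {beta:.4f}, SE = {se:.4f}, p = {p:.2e}")
```

Only summary statistics are shared between cohorts in this design, which is why it is simple to run but requires every downstream analysis to be repeated per dataset.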
One option for harmonizing the data is to first log10-transform the values to approximate a normal distribution and then set the mean of each dataset to zero, creating a standard score. Alternatives are to calculate Z-scores, or to use an expectation-maximization algorithm to determine the positivity/negativity threshold for each dataset and use that threshold to normalize the values. All of these approaches, along with meta-analyses, were evaluated and compared in Deming et al., confirming that joint analyses provide more statistical power and that the three harmonization methods lead to the same results. The same methods can be applied to other biomarkers, such as plasma measures and PET imaging with the same tracer.
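The first harmonization option can be sketched as follows. This is a minimal illustration that assumes strictly positive raw values and standardizes each dataset separately before pooling; the function name `harmonize`, the cohort names, and the values are hypothetical and are not taken from the cited studies.

```python
import math
import statistics


def harmonize(values):
    """Log10-transform raw biomarker values from ONE dataset, then z-score them.

    values: strictly positive raw measurements from a single center/platform.
    Returns standardized values with mean 0 and sample standard deviation 1.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log10 requires strictly positive values")
    logged = [math.log10(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)  # sample SD; needs at least two values
    return [(x - mean) / sd for x in logged]


# Hypothetical cohorts measured on platforms with very different scales.
cohorts = {
    "center_A": [100.0, 200.0, 400.0, 800.0],
    "center_B": [1.5, 3.0, 6.0, 12.0],
}

# Standardize within each cohort, then pool the values for a joint analysis.
joint = [z for values in cohorts.values() for z in harmonize(values)]
print(joint)
```

Because the transformation is applied within each dataset, platform-specific differences in scale are removed while the relative ranking of individuals inside each dataset is preserved.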
A critical consideration for quantitative endophenotype studies is the use of early-stage biomarkers. Biomarkers of the early stages of disease represent the beginning of the pathophysiological cascades and, therefore, are more likely to lead to the discovery of successful therapeutic targets. Mendelian forms of neurodegenerative diseases have facilitated the study of early disease stages. PET imaging studies in familial AD cases suggest that brain Aβ deposition begins 20 years before clinical symptoms appear (Bateman et al., 2012). MRI studies in familial FTD cases show that pathological changes start several years before symptoms (Jiskoot et al., 2019). These studies have identified some of the initial pathological changes in AD and FTD patients, which are good candidate endophenotypes for genetic studies.
Another challenge, and a frequently discussed question, for genetic studies using biomarker levels is whether the association test should use biomarker levels alone or be corrected for diagnosis or case-control status. The rationale for correcting is that not accounting for case-control status could lead to a higher rate of false positives. However, increasing evidence suggests that clinical diagnosis may not capture pathological status in complex traits such as AD and other neurodegenerative diseases, whose pathological changes start ~15-20 years before clinical manifestations (Bateman et al., 2012). These early pathological brain changes are currently detected by neuroimaging or by CSF or blood biomarkers. They define a new subgroup, the presymptomatic: clinically non-demented individuals with a positive biomarker profile. Thus, genetic studies of biomarker levels that stratify by case-control status, or include it as a covariate, could have a flawed design, because clinical diagnosis does not capture individuals in the presymptomatic stage. It is also becoming apparent that a large proportion (~30%) of individuals clinically classified as "controls" are in fact presymptomatic (biomarker-positive) cases. In addition, some patients may have been misdiagnosed and have another neurodegenerative disease; in such cases, a negative biomarker profile could have distinguished them from true AD cases. Finally, stratifying individuals into cases and controls leads to much lower statistical power because the sample size of each analysis is substantially reduced.
The Cruchaga lab has published several studies with in-depth analyses of whether including case-control status as a covariate, or stratifying by case-control status, leads to false positives or false negatives. In the Deming et al., 2017 study, additional analyses were performed by adding clinical status or CDR as covariates. No additional genome-wide or suggestive signals were found when including these covariates, and adding case-control status or CDR did not significantly change the results (Supplementary Table 7). For example, the association of APOE with CSF Aβ levels was significant in the 'default' model (p = 4.78 × 10⁻⁹⁴; β = −0.10), but also in the models with case-control status (p = 2.67 × 10⁻⁷²; β = −0.09) and CDR (p = 9.43 × 10⁻⁶⁹; β = −0.09) as covariates. Although the p-values differ, it is important to note that the effect size (β) is not significantly different between models. The same results were found for the other genome-wide significant loci for CSF Aβ and tau levels in this study (Supplementary Table 7; Deming et al., 2017).
Stratifying by case-control status leads to lower statistical power, but it does not change the strength (effect size, or β) of the association. In an earlier study, Cruchaga et al. (Cruchaga et al., 2013) found several loci associated with CSF ptau levels. Stratified analyses showed that the effect size was similar in cases (association of rs769449 with ptau: p = 3.38 × 10⁻⁶; β = 0.067) and controls (association of rs769449 with ptau: p = 1.54 × 10⁻⁶; β = 0.075; Supplementary Table 2). This effect is not exclusive to CSF Aβ and tau; it has also been found for other AD-related proteins, including CSF TREM2. In a recent study, Deming et al. (Deming et al., 2019) found that variants in the MS4A locus were associated with CSF sTREM2 levels (n = 807, β = 735.1, p = 1.15 × 10⁻¹⁵). Results from case-control stratified analyses indicated that both clinically diagnosed cognitively impaired individuals (n = 606, β = 675.8, p = 8.19 × 10⁻¹⁰) and cognitively normal controls (n = 207, β = 912.4, p = 5.20 × 10⁻⁸) contributed to the association between the MS4A locus and CSF sTREM2 concentrations.
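The comparison of stratum-specific effect sizes in the preceding paragraph can be made explicit with a simple z-test for the difference between two independent estimates. The sketch below is illustrative only: it assumes that the reported p-values come from two-sided normal (Wald) tests, so that standard errors can be approximated from β and p. It is not the analysis performed in the cited studies, and the helper names `se_from_beta_p` and `difference_test` are hypothetical.

```python
import math
from statistics import NormalDist


def se_from_beta_p(beta, p):
    """Approximate a standard error from an effect size and a two-sided
    p-value, ASSUMING the p-value came from a normal (Wald) test."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    z = -NormalDist().inv_cdf(p / 2.0)  # |z| implied by the p-value
    return abs(beta) / z


def difference_test(beta1, se1, beta2, se2):
    """Z-test for a difference between two independent effect estimates."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return z, p


# rs769449 and CSF ptau, using the values quoted above for cases and controls.
se_cases = se_from_beta_p(0.067, 3.38e-6)
se_controls = se_from_beta_p(0.075, 1.54e-6)
z_diff, p_diff = difference_test(0.067, se_cases, 0.075, se_controls)
print(f"z = {z_diff:.2f}, p = {p_diff:.2f}")
```

Under these assumptions, the test gives no evidence that the effect differs between cases and controls, in line with the statement that stratification reduces power without changing the effect size.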
These results indicate that associations of genetic variants with biomarker levels are most likely true biological effects. The genetic variants lead to changes in biomarker levels that ultimately change the risk for disease. APOE affects CSF Aβ, tau, and ptau levels, whereas MS4A4A affects CSF sTREM2 levels, even before clinical onset. This biological effect does not depend on age or disease status. The associated genetic variants affect multiple biological processes involving those proteins, ultimately leading to the development of the disease and its clinical symptoms.
Many non-specific biomarkers correlate with dementia and neurodegeneration; they may not be useful for differential diagnosis but might be informative from a research perspective. Abnormal aggregation of tau in the brain is found in several neurodegenerative diseases (Williams, 2006). In AD, tau is deposited in the brain as NFTs, while other neurodegenerative diseases present with different tau deposits (Sergeant et al., 2005). Tau deposits are mainly assessed by immunohistochemistry in post-mortem tissue, but considerable effort is under way to develop selective tau tracers for PET imaging. The binding properties of these tracers still need to be tested before they can be considered reliable biomarkers. Tau PET imaging would be an essential endophenotype for understanding the pathophysiology underlying tau aggregation and may give insights into pathways associated with tauopathies.
Identification of novel endophenotypes would substantially benefit clinical diagnosis and further our understanding of currently unknown aspects of disease. Several methods have recently been proposed to identify novel biomarkers. Mass spectrometry (MS)-based proteomics in CSF and blood is one promising approach. MS is an unbiased method whose specificity comes from amino acid sequence information at the peptide level. A recent study described a reproducible workflow for biomarker discovery using MS in CSF samples from AD patients (Bader et al., 2020). Metabolomics is another promising approach for the identification of potential new biomarkers. It measures the biochemical products of cellular processes and may therefore capture complex biochemical pathways that are altered in disease. Varma and colleagues found specific metabolites of the sphingolipid and glycerophospholipid classes that are associated with the severity of AD brain pathology. In addition, metabolite concentration in blood is associated with disease progression (Varma et al., 2018). These new approaches can uncover new endophenotypes and metabolic pathways that could be targeted for therapeutic intervention.
The application of whole-genome sequencing (WGS) to identify rare variants associated with disease risk has been limited because of the cost of sequencing large cohorts. Whole-exome sequencing studies in familial forms of neurodegenerative disorders have identified several novel disease-associated genes. However, recent efforts to generate WGS in samples from neurodegenerative disease cohorts will explore the full spectrum of genetic variation, including structural variants and repeat expansions that have not been fully explored. Genetic studies of endophenotypes will also greatly benefit from WGS data availability.
In conclusion, there are many advantages of genetic studies of endophenotypes, as shown in the examples discussed in this review. This approach's most valuable benefit is that the genetic variants identified can be put into a disease context. The exact mechanism by which these variants affect the associated endophenotype and, consequently, the disease remains to be elucidated. To pinpoint the specific biological processes involved, integration of gene expression and -omics data will be crucial. As multi-ethnic datasets become available, their incorporation will reveal valuable and novel insights into the genetic etiology of neurodegenerative diseases.