Targeted sequencing on neurodegenerative genes identified novel causal and risk variants of familial Alzheimer's disease

Mutations in APP, PSEN1 and PSEN2 account for only a small proportion of familial Alzheimer's disease (AD), leaving the genetic factors in the remaining AD families unexplained. Neurodegenerative diseases share some neuropathological, clinical and genetic features. The contribution of neurodegenerative disease genes to familial AD remains unknown. We hypothesized that rare variants in neurodegenerative disease genes may lead to family aggregation of AD, and this study aimed to investigate their effect on familial AD. Targeted sequencing of 277 neurodegenerative disease genes was performed on probands from 75 AD families of Chinese origin. Rare coding variants that segregated in families were tested for association in an independent cohort of 506 patients with sporadic AD and 498 cognitively normal controls. East Asian data from the Exome Aggregation Consortium (ExAC) were used as a reference control.


Introduction
Alzheimer's disease (AD) is a neurodegenerative disease characterized by progressive episodic memory loss and decline in multiple other cognitive domains, including learning, attention, orientation and calculation. AD is the most common form of dementia, followed by vascular dementia (VD), frontotemporal dementia (FTD) and dementia with Lewy bodies (DLB) (1). According to the World Alzheimer Report 2018 (2), 50 million people were living with dementia worldwide in 2018, and the number will more than triple to 152 million by 2050. The situation is more severe in low- and middle-income countries, where 66% of people with dementia live; that share is set to rise to 71 or 72% by 2050. AD, together with other forms of dementia, has become a major social problem that seriously affects human health, especially in low- and middle-income countries. AD cannot yet be cured; however, knowledge of the factors that cause AD or modify its risk could help to delay its onset, and even a five-to-ten-year delay would have a massive global impact and help to manage this severe social problem.
AD can be divided into familial AD (FAD) and sporadic AD (SAD) depending on whether there is family aggregation. FAD is thought to be caused by a causal gene inherited within the family in a Mendelian fashion. Presenilin 1 (PSEN1), presenilin 2 (PSEN2) and amyloid precursor protein (APP), which are involved in amyloidogenic processing, have been identified as the causal genes of FAD (3,4). To date, only a small proportion of AD families (about 500 families) have been identified as carrying mutations in these three causal genes (http://www.molgen.ua.ac.be/ADMutations/), leaving the genetic factors in the rest unknown. The etiology of SAD is complicated and multifactorial, involving aging, sex, education and genetic risk factors. APOE, together with more than 20 loci identified by genome-wide association studies (GWAS), is recognized as a risk gene for SAD (5,6). Those loci are common variants (MAF > 5%), and their effect on AD is assumed to be low to moderate, as each locus accounts for only a small proportion of the variance in AD susceptibility; thus, they do not lead to family aggregation.
To elucidate the causative or high-risk genetic factors of AD, studies of AD families have been performed using multiple sequencing techniques, such as whole-exome sequencing (WES), whole-genome sequencing (WGS) and targeted sequencing. To date, no novel causal gene has been identified; however, a series of rare coding variants in risk genes that segregated in large AD families have been reported as causal for FAD, including R47H in TREM2 (7) and rare variants in SORL1 (8,9) and ABCA7 (10). As these rare variants are mostly located in functional regions and lead to amino acid changes, researchers tend to consider them high-risk variants sufficient to cause family aggregation of AD. Thus, we proposed that the remaining heritability of FAD may partly come from rare variants in these risk genes.
Neurodegenerative diseases are a group of diseases with insidious onset and slow progression, characterized by aggregation of abnormal proteins in neurons and other cells; they include AD, Parkinson's disease (PD), amyotrophic lateral sclerosis (ALS) and prion disease. Neurodegenerative diseases share some neuropathological, clinical and genetic features (11,12). For example, Parkinson's disease dementia (PDD) shows both α-synuclein and Aβ deposits in the brain, which may jointly contribute to disease onset (13,14). ALS and FTD share common causal genes such as C9orf72 (15,16). In a previous study, we detected a novel missense mutation, p.S17G, in the PRNP gene in a patient with sporadic late-onset AD (LOAD), suggesting that PRNP mutations are present in Chinese AD patients (17). A recent study from our research group found that expanded GGC repeats within the human-specific NOTCH2NLC gene can cause neuronal intranuclear inclusion disease (NIID), a neurodegenerative disease, and that some of the affected families had previously been diagnosed with AD or PD (18). Whether mutations in genes related to other neurodegenerative diseases, such as other types of dementia, PD and ALS, also contribute to AD remains to be investigated.
Previous studies of the effect of neurodegenerative disease genes on AD were mostly conducted in early-onset or sporadic AD cases. To date, no study has focused on the effect of these genes in familial AD. To further elucidate the underlying genetic factors in AD families that do not carry known causative mutations, we performed targeted next-generation sequencing covering 277 candidate genes involved in neurodegenerative diseases (Additional file 1) in a discovery set of 75 AD families. The panel included causative genes, previously reported risk loci and genes functionally associated with AD, FTD, DLB, PD, ALS and prion disease. Candidate causal and risk variants were further genotyped in an independent sporadic case-control dataset. Our study helps to deepen insight into the genetic basis of AD and describes the genetic characteristics of Chinese FAD patients.

Sample selection
All families selected for targeted sequencing and all sporadic AD patients for follow-up genotyping were referred to the outpatient neurology clinics of Xiangya Hospital. For each patient, a diagnosis of probable AD was established by two experienced neurologists using the National Institute on Aging-Alzheimer's Association (NIA-AA) criteria and the International Working Group (IWG-2) criteria. Each selected family had at least two patients with AD in two generations. Families with known mutations in APP, PSEN1, PSEN2, GRN, MAPT or C9orf72 were excluded from the study. Cognitively normal controls were related individuals of matched geographical ancestry, age and sex. The study was approved by the Ethics Committee of Xiangya Hospital, Central South University in China (equivalent to an Institutional Review Board) and performed in accordance with the approved guidelines and regulations. Written informed consent was obtained from each subject.
Targeted sequencing of neurodegenerative disease related genes in AD families
Targeted sequencing was performed in probands from 75 AD families, targeting all exonic sequences of the selected genes, including 119 dementia-related genes, 133 PD-related genes and 25 ALS-related genes (the gene list can be found in Additional file 1). Genomic DNA was extracted from peripheral blood leukocytes using a QIAGEN kit following the supplier's instructions. We used NanoDrop (Thermo Fisher Scientific, Waltham, MA) and Qubit 3.0 to measure the concentration and purity of DNA. Qualified genomic DNA was then used to prepare a sequencing library through the following steps: fragmentation of genomic DNA with a Bioruptor Pico, end filling, A-tailing, adapter ligation and pre-PCR reactions. The length of the DNA library fragments, measured with a QSEP 100, was 220-320 bp. The targeted DNA regions were captured using Dynabeads MyOne Streptavidin T1 magnetic beads and then enriched through a post-PCR reaction. All DNA samples were sequenced on an Illumina HiSeq 2500. The sequencing data were then processed using the Illumina Sequence Control Software to assess quality.
Variants that passed quality control were analyzed and selected based on the following criteria: 1) rare variants with minor allele frequency <5% in the Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium (ExAC); 2) heterozygous variants located in exons and splice sites; 3) functional variants (missense, nonsense and frameshift); and 4) variants with multiple lines of computational evidence (including SIFT, PolyPhen-2 and MutationTaster) supporting a deleterious effect.
Rare, damaging variants shared by at least two families were then genotyped in sporadic AD cases and cognitively normal controls to determine their population frequencies and risk. We also used the East Asian data from ExAC as a reference control to compare the frequencies of the variants found in FAD.
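The selection criteria above can be sketched as a simple filter followed by a count of families sharing each variant. This is only an illustrative sketch, not the pipeline used in the study: the record fields, the use of the highest MAF across the three databases, and the requirement of at least two damaging predictions (our reading of "multiple lines of computational evidence") are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass

MAF_CUTOFF = 0.05  # rare-variant threshold stated in the text (<5%)
FUNCTIONAL = {"missense", "nonsense", "frameshift"}
CODING_REGIONS = {"exon", "splice_site"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record (field names are assumptions)."""
    family: str          # family identifier
    gene: str
    change: str          # e.g. protein-level change
    max_maf: float       # highest MAF across ESP, 1000 Genomes and ExAC (assumption)
    zygosity: str        # "het" or "hom"
    region: str          # "exon", "splice_site", "intron", ...
    consequence: str     # "missense", "nonsense", "frameshift", ...
    damaging_calls: int  # tools (SIFT, PolyPhen-2, MutationTaster) predicting damage


def passes_filters(v: Variant, min_damaging_calls: int = 2) -> bool:
    """Apply criteria 1-4; min_damaging_calls=2 is an assumed cutoff."""
    return (
        v.max_maf < MAF_CUTOFF
        and v.zygosity == "het"
        and v.region in CODING_REGIONS
        and v.consequence in FUNCTIONAL
        and v.damaging_calls >= min_damaging_calls
    )


def recurrent_variants(variants, min_families: int = 2):
    """Return (gene, change) pairs that pass the filters in >= min_families families."""
    # Deduplicate so each family is counted at most once per variant.
    kept = {(v.family, v.gene, v.change) for v in variants if passes_filters(v)}
    counts = Counter((gene, change) for _, gene, change in kept)
    return sorted(key for key, n in counts.items() if n >= min_families)
```

Variants returned by `recurrent_variants` would correspond to the candidates carried forward to follow-up genotyping.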
Follow-up genotyping in an independent case-control dataset
A total of 6 rare, damaging variants that were overrepresented in AD families were genotyped in sporadic AD cases (N=506) and controls (N=498) using a multiplex SNaPshot SNP assay. Samples were first purified and then amplified using extension primers designed adjacent to the 6 variant sites, with a different fluorescently labeled dideoxyribonucleoside triphosphate (ddNTP) added at each polymorphic site. The products were then analyzed on an ABI 3730XL capillary electrophoresis instrument. GeneMapper 4.0 software (Applied Biosystems Co., Ltd., USA) was used for data analysis.

Statistics
Frequency comparisons between cases and controls were made using Pearson's χ2 test. When the variant count in a group was fewer than 5, Fisher's exact test was used. For multiple comparisons, P<0.05/n (n = total number of SNPs compared) was taken as statistically significant after Bonferroni correction. All data are presented as mean ± standard deviation (SD). Statistical analysis was conducted using GraphPad Prism version 5.0.
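For illustration, the rule for choosing between the two tests and the Pearson χ2 statistic for a 2x2 table can be written as follows. This is a minimal standard-library sketch and not the GraphPad Prism procedure used in the study; the function names are hypothetical, no continuity correction is applied, and the counts in the usage note are invented.

```python
import math


def choose_test(a: int, b: int, c: int, d: int, min_count: int = 5) -> str:
    """Pick the test for the 2x2 table [[a, b], [c, d]].

    Follows the rule in the text: fall back to Fisher's exact test
    when any observed count is below min_count.
    """
    return "fisher" if min(a, b, c, d) < min_count else "chi2"


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson's chi-square test (1 degree of freedom, no continuity correction).

    Rows are groups (e.g. cases, controls); columns are carriers and non-carriers.
    Assumes every row and column total is non-zero.
    Returns (chi2 statistic, two-sided p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 d.f., P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With hypothetical counts of 20 carriers among 100 cases and 10 carriers among 100 controls, `choose_test(20, 80, 10, 90)` selects the χ2 test and `pearson_chi2_2x2(20, 80, 10, 90)` gives a statistic of about 3.92.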

Results
Among the 75 AD families, 40 had early-onset AD (EOAD), with an age at onset younger than 65 years, and 35 had late-onset AD (LOAD), with an age at onset older than 65 years. The average age at onset (AAO) and female ratio were 51.62±8.34 and 0.47 for EOAD, and 71.87±5.72 and 0.61 for LOAD, respectively. A total of 309 rare, damaging coding variants were detected in 67 families. All these variants had high sequencing quality scores, with 20x coverage of 96.86%. The average sequencing depth was 256.86. The major genetic causes remained unclear for all the AD patients in this study.
1. Identification of a novel mutation p.P410S of PLD3 gene in an early-onset AD family
A novel rare variant, c.1228C>T (p.P410S), in the PLD3 gene was found in an early-onset AD family (Figure 1). PLD3 is an AD risk gene first identified by Cruchaga et al. in 2014: the variant V232M was found to segregate in two large late-onset families and showed a strong association with disease status in a large case-control dataset. P410S, identified in this study, is a novel variant that is absent from the Exome Sequencing Project, 1000 Genomes Project and ExAC databases and from our case-control cohort. The variant was present in two affected individuals (II:3 and II:5) but absent in two unaffected siblings (II:7 and II:8). P410S is located in a conserved region of the PLD3 protein and was predicted to be damaging by SIFT, PolyPhen-2 and MutationTaster. Together, these findings indicate that p.P410S in PLD3 might be a causative mutation in this AD family.
The proband (II:3) was a 56-year-old woman who had complained of memory impairment and personality change since the age of 53. Neuropsychological assessment showed short- and long-term memory impairment, as well as deficits in attention and executive function. In cerebrospinal fluid, the Aβ level was 434.2 pg/mL (control values >651 pg/mL), the Aβ 1-42/Aβ 1-41 ratio was 0.07 (control ratio >0.1), and the total tau value was 491.43 pg/mL (control values <290 pg/mL). Brain MRI showed asymmetrical atrophy of the bilateral hippocampus and mild atrophy of parietal areas. She was diagnosed with typical AD. Her sister (II:5) complained of episodic memory loss but had normal findings on neuropsychological assessment and cranial MRI, suggesting prodromal AD.
2. Identification of a known causal mutation p.I2012T of the LRRK2 gene in an early-onset AD family
We identified a missense substitution, p.I2012T, in the LRRK2 gene in an early-onset AD family (Figure 2). The index p.I2012T carrier was a 59-year-old man who had complained of episodic memory loss, progressive apathy and reduced speech since the age of 53. Neuropsychological assessment revealed scores of 2/30 on the Mini-Mental State Examination (MMSE) and 1/30 on the Montreal Cognitive Assessment (MoCA). Physical examination revealed neither static tremor nor myotonia. Brain MRI showed mild to moderate atrophy in the bilateral hippocampus and the temporal and parietal areas. He was clinically diagnosed with probable AD. His mother died of a heart attack at the age of 78; the caregiver recalled that she had shown cognitive decline during her last 5 years. None of the proband's siblings complained of dementia or parkinsonism. We further examined the p.I2012T status of the proband's three children (III:1, III:2 and III:3) and found that all three were carriers. They were aged 38, 36 and 32 years, respectively, and their neuropsychological assessments and physical examinations were all normal. We will continue to follow up these three p.I2012T carriers.

3. Variants of AD risk genes and follow-up genotyping
Fifty-one rare variants in known AD risk genes were detected in our screening (Additional file 2), including 17 missense variants in ABCA7, 11 missense variants in SORL1, 1 frameshift and 4 missense variants in ABCA1, 3 missense variants in CD33, 1 frameshift and 3 missense variants in CR1, 2 missense variants in BIN1 and PTK2B, and 1 missense variant in FERMT2, MS4A6A, TREM2 and CLU, respectively. None of the rare risk variants reported in other cohorts was found in our cases.
A total of 6 rare variants in 4 genes were present in 2 or more unrelated families: 1 missense variant each in TREM2 and ABCA1, and 2 missense variants each in CR1 and ABCA7. We further tested their association with AD risk in an independent sporadic case-control dataset and compared the frequency of each variant with the East Asian allele frequencies (n=8,628) in the ExAC database (Table 1). p.P143S and rs192694824 in ABCA7 and rs374551420 in CR1 were significantly associated with familial AD when compared with the East Asian controls in ExAC after correction for multiple testing (p = 0.005437, 0.001383 and 0.000549, respectively; Bonferroni-corrected significance threshold = 0.0083). rs200820365 in TREM2 was significantly associated with AD when all AD carriers were compared with the East Asian controls in ExAC after correction for multiple testing (p = 0.000396; Bonferroni-corrected significance threshold = 0.0083).
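The Bonferroni threshold quoted above follows from dividing the nominal α of 0.05 by the 6 variants tested. The short sketch below reproduces that arithmetic and checks the reported p-values against it; pairing each p-value with a variant by the order in which they are listed is an assumption.

```python
ALPHA = 0.05
N_TESTS = 6  # number of rare variants carried forward to association testing

# Reported p-values versus ExAC East Asian controls.
# The variant-to-value pairing follows the listing order in the text (assumption).
reported_p = {
    "p.P143S": 0.005437,
    "rs192694824": 0.001383,
    "rs374551420": 0.000549,
    "rs200820365": 0.000396,
}

threshold = ALPHA / N_TESTS  # Bonferroni-corrected significance threshold
significant = {variant: p < threshold for variant, p in reported_p.items()}
```

Here `threshold` rounds to 0.0083, and all four reported p-values fall below it.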

4. Variants of other neurodegenerative disease-related genes and functionally related genes
A total of 257 rare variants were found in other neurodegenerative disease-related genes and functionally related genes (Additional file 3). All the nominally significant variants were extremely rare in the general population (max MAF=5%). The majority were predicted to be deleterious. The variant p.G223del in FUS was shared by 3 FAD families (2 LOAD and 1 EOAD) and was absent from the Exome Sequencing Project, 1000 Genomes Project and ExAC databases. We further tested its association with AD risk in our case-control dataset and found that p.G223del was more frequent in AD (0.04 in familial cases and 0.006 in sporadic AD cases) than in controls (0.002). By Fisher's exact test, the variant was significantly associated with FAD (p=0.008). We also detected a missense variant, p.A104T, in PARK7/DJ-1 in 2 EOAD families; p.A104T was previously reported in sporadic PD patients as a risk factor for early-onset PD (EOPD).
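As a rough check on the FUS result, a two-sided Fisher's exact test can be computed directly from the hypergeometric distribution. The sketch below is a standard-library illustration, not the software used in the study. The count of 3 carrier families among 75 FAD probands comes from the text, whereas the single control carrier among 498 controls is back-calculated from the reported frequency of 0.002 and is therefore an assumption.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers falling in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    tol = 1 + 1e-9  # guards against floating-point ties
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * tol)
    return min(1.0, total)


# p.G223del carriers: 3 of 75 FAD families (from the text);
# 1 of 498 controls (assumption, inferred from the reported frequency 0.002).
fad_carriers, fad_total = 3, 75
ctrl_carriers, ctrl_total = 1, 498

p_fus = fisher_exact_two_sided(
    fad_carriers, fad_total - fad_carriers,
    ctrl_carriers, ctrl_total - ctrl_carriers,
)
```

Under these assumed counts, `p_fus` comes out close to the reported value of 0.008.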

Discussion
Neurodegenerative diseases, such as AD, PD, FTD, ALS and prion disease, are a group of disorders characterized by aggregation of abnormal proteins in neurons and progressive degeneration of the affected neurons. Neurodegenerative diseases share some neuropathological, clinical and genetic features. To determine whether mutations in neurodegenerative disease genes could have a role in FAD etiology, we performed targeted sequencing of 277 candidate genes related to neurodegenerative disease in 75 FAD probands. Causal and risk variants were found to segregate within AD families and showed strong association when compared against ExAC East Asian control data and our case-control cohort. This result indicates a potential role for rare variants in AD risk genes and in genes linked to other neurodegenerative diseases in FAD etiology.
A novel missense mutation, p.P410S, in PLD3 was identified in one of our AD families. PLD3 was identified as a new risk gene by a next-generation sequencing study of a large late-onset familial AD (LOFAD) cohort of European descent: p.V232M was found to segregate with disease status in two large late-onset families and showed strong association with disease status in a large LOAD cohort (4,998 cases and 6,356 controls) (19). Since then, multiple replication studies of rare PLD3 variants have been performed in different populations (Belgian, German, French, Chinese, etc.), but their results on association with AD risk have been inconsistent (20)(21)(22). Notably, in a northern Han Chinese study of 960 LOAD cases and 2,290 controls, the authors found that the rare variants p.I163M and c.1020-8G>A conferred considerable risk of LOAD; they detected a p.V232M carrier but found no association of p.V232M with LOAD risk in their cohort (23). This indicates that the role of PLD3 in AD risk is still unclear and may differ markedly between ethnic groups. The p.P410S variant identified in our study segregated in an EOAD family, was predicted to be damaging by SIFT, PolyPhen-2 and MutationTaster, and is absent from the Exome Sequencing Project, 1000 Genomes Project and ExAC databases and from our case-control cohort, indicating that it might be a causative mutation in this EOAD family. Our result suggests that rare variants in PLD3 may play an important role in EOAD families as well as in LOAD, as previously described. Studies of PLD3 in large family samples should be conducted in the future.
Mutations in LRRK2, first identified by Zimprich et al. in 2004 (24), are a main genetic cause of both familial and sporadic PD. Most of the previously reported LRRK2 mutations are G2019S and R1441C/G/H (25,26). The I2012T mutation, which occurs mostly in Asian PD patients (27,28), lies in the activation loop of the kinase domain and decreased LRRK2 kinase activity in several studies (26,29). The clinical spectrum of I2012T carriers varies widely, ranging from typical late-onset levodopa-responsive PD to FTD with parkinsonism. We found the missense substitution p.I2012T in LRRK2 in an EOAD family. The proband showed a typical AD clinical course with episodic memory loss, progressive apathy and reduced speech, but no typical FTD symptoms or parkinsonism were found in the proband or his relatives. Brain MRI showed mild to moderate atrophy in the hippocampus and the temporal and parietal areas, supporting a diagnosis of probable AD. In previous studies of the relationship between cognitive impairment and LRRK2 mutations, the R1396G mutation appeared to have a protective effect on cognitive impairment in LRRK2-related PD. G2019S carriers had lower executive performance than non-carriers in an Ashkenazi Jewish group, whereas the result was negative in an Italian AD cohort (30). To date, neither AD nor dementia without parkinsonism has been reported in I2012T carriers. Our study reports a new clinical presentation of the LRRK2 I2012T mutation and suggests that LRRK2 mutation screening is needed in FAD patients with an unknown genetic cause.
In the follow-up genotyping, p.P143S and rs192694824 in ABCA7 and rs374551420 in CR1 were significantly associated with FAD risk, and rs200820365 in TREM2 was significantly associated with AD risk, when compared with the East Asian allele frequencies in ExAC. No previously reported risk variants were found in this study, suggesting that the rare variants contributing to AD risk are region and population specific. ABCA7 encodes a protein that is involved in phospholipid transmembrane transport, phagocytosis of apoptotic cells, and microglia-mediated clearance of Aβ (31,32). In previous studies, the ABCA7 risk variants contributing to AD differed between people of different ethnic origins. In a European population, targeted resequencing identified an intronic variant in ABCA7 (10), whereas among African-Americans the most strongly associated SNP is rs115550680 (33). In this study, 2 rare exonic variants showing strong association were identified in the Chinese FAD population. CR1, which encodes a large type 1 transmembrane glycoprotein involved in the immune complement cascade, was among the first susceptibility genes for AD to be identified (34). To date, no rare variants in CR1 have been reported to be associated with AD. A genome-wide CNV association study confirmed an association between an intragenic CNV and AD (35). A rare coding variant, rs4844609, located outside the CNV region, was reported to explain the GWAS association (36); however, that finding has yet to be replicated. In this study, rs374551420 in CR1 was significantly associated with FAD; it was present in 2 LOFAD families and absent in controls and unaffected family members. The rare TREM2 variant R47H was first found to be strongly associated with LOAD, with an effect size similar to that of APOE ε4, in two NGS-based studies (7), and this association has been widely replicated. R47H was absent from our sequencing data; instead, rs200820365 in TREM2 showed a strong association with AD in this study. rs200820365 was overrepresented in EOAD families and in SAD, supporting its causative role in both EOAD and LOAD. Taken together, our results confirm that AD risk genes may carry a high burden of deleterious variants in both EOAD and LOAD, but the differences in the observed mutations could be due to differences in ethnicity, capture, coverage and sample size between studies. The biological impact of these variants remains to be investigated.
A total of 257 rare damaging mutations in FTD-, VD-, PD- and ALS-related and functionally related genes were identified in this study. These variants were located in conserved domains and were predicted to be damaging by at least one computational tool, supporting their potential causative role. Their association with AD needs to be elucidated in a large cohort study. Notably, the variant p.G223del in FUS was frequent in AD cases: through follow-up genotyping in our case-control cohort, it was found in 3 FAD families and 3 sporadic AD cases. By Fisher's exact test, p.G223del in FUS was significantly associated with FAD. Variants in FUS have been identified as causal or risk variants for ALS, essential tremor and FTD (37)(38)(39)(40). However, p.G223del is located in the G-rich region of FUS, which is not strongly conserved in closely related organisms, suggesting that it may be a benign variant; its association with FAD needs further validation. We also detected 2 families carrying p.A104T in PARK7/DJ-1. This variant was first reported in a sporadic PD patient in 2003 (41). Previous studies of PARK7/DJ-1 and PD proposed that heterozygous mutations in the parkin gene may represent risk factors (42). As the role of p.A104T in PD remains unclear, its association with FAD needs confirmation in large cohorts and functional studies.

Conclusions
In summary, among the 75 FAD families without mutations in causal dementia genes, we identified 2 EOAD families carrying causative mutations in PLD3 and LRRK2, and 13 AD families carrying risk variants in TREM2, ABCA7, CR1 and FUS. These results imply that multiple rare coding mutations in causal and risk genes for neurodegenerative diseases are present in familial AD. Further confirmation and characterization of these rare variants will be important for understanding the biology of AD and developing therapeutic strategies. Our data support the view that the genetic architectures of AD, PD and FTD/ALS can overlap. Thus, we propose that targeted sequencing of neurodegenerative disease genes is needed in AD families with an unclear genetic cause.

Availability of data and materials
The materials used in the current study are available from the corresponding author on reasonable request.

Figure 1