The etiological effect of a new low-frequency ESR1 variant on Mild Cognitive Impairment and Alzheimer’s Disease: a population-based study

Latent genetic variations in cholesterol metabolism-related genes remain understudied in late-onset Alzheimer’s disease and, especially, in the pathogenesis of mild cognitive impairment. We therefore performed targeted sequencing of 12 nuclear receptor genes involved in modulating cholesterol content, plus APOE, to screen for susceptibility variants, and focused on a new risk variant, ESR1 rs9340803 at 6q25.1, for both late-onset Alzheimer’s disease (OR=3.30 [1.84~4.22], p<0.001) and mild cognitive impairment (OR=3.08 [1.75~3.89], p<0.001). This low-frequency variant was validated in three independent cohorts totaling 854 late-onset Alzheimer’s disease cases, 1059 mild cognitive impairment cases and 1254 controls from nine provinces of mainland China. A preliminary functional study revealed decreased ESR1 expression in vitro. In addition, we detected a higher serum Aβ1-40 concentration in participants carrying this variant (p=0.038) and a lower plasma total cholesterol level in variant carriers with late-onset Alzheimer’s disease (p=0.009). In summary, we identified a susceptibility variant that might contribute to the development of mild cognitive impairment at an earlier stage and Alzheimer’s disease later. Our study provides new insight into the causation of late-onset Alzheimer’s disease and could be exploited therapeutically.


INTRODUCTION
Late-onset Alzheimer's disease (LOAD, OMIM: 104310) is a complex neurodegenerative disease with a polygenic background. It is characterized by extracellular senile plaques (SPs), whose core protein is the β-amyloid peptide (Aβ), mostly 40- or 42-amino-acid peptides, and by intracellular neurofibrillary tangles (NFTs) in the brain [1]. A variety of genetic and environmental factors have long been believed to be associated with LOAD, with APOE as the strongest genetic factor and aging as the most influential risk factor. Although a number of distinct loci and risk genes have been discovered by several genome-wide association studies (GWAS), the genetic etiology of LOAD remains largely elusive [2][3][4].
Multiple studies have demonstrated a correlation between altered cholesterol levels and increased Aβ formation in cellular and animal models of LOAD [5][6][7]. Specifically, optimal cholesterol content in neurons is reported to be critical to the stability of the brain microenvironment [8]. In this respect, several cholesterol metabolism-related nuclear receptor (NR) molecules in the LOAD brain have been reported by independent studies [9-11]. However, whether NR gene variations that functionally disturb cholesterol content and impair normal cholesterol activity would promote Aβ formation and thus induce LOAD still needs to be studied extensively. Moreover, the effect of AD-related NR gene variations in patients with mild cognitive impairment (MCI) has hardly been explored. In this context, we speculate that latent variations of NR genes that exacerbate Aβ-induced memory impairment by acting on cholesterol modulation are likely to participate in the development of the AD continuum, including MCI due to AD as the prodromal phase.
Therefore, the present study was designed to identify risk-linked genetic variations of NR genes and to explore the possible pathogenic mechanisms, based on relatively large case-control sample groups and with the use of bioinformatic tools. Through these efforts, we intend to provide new knowledge about the effect of potential genetic variations on the fine balance of cholesterol content and on consequent Aβ production, which might have an impact on MCI and AD incidence.

Identification and replication of AD-related variants
Targeted sequencing of the 12 NR genes involved in cholesterol metabolism modulation, plus APOE, was first performed on 73 LOAD cases; 9 of the 1690 rare or low-frequency SNVs were found to be enriched in AD samples and were selected for further study (Table 1; Supplementary Tables S2, S3). Subsequent genotyping of these candidate variants in 200 LOAD cases and 200 controls revealed that rs9340803 A>G in intron 4 of ESR1 (ENSG00000091831) at 6q25.1 (MAF<1%) was associated with AD risk. Replication of this ESR1 variant in three independent sample groups totaling 854 AD cases, 1059 MCI cases and 1254 controls confirmed that it was risk-associated for both AD and MCI (Fig. 1, Fig. 2). Additionally, genetic drift was ruled out because this SNP was of low frequency (MAF<2%) in all subpopulations of the 1000 Genomes Project. Baseline characteristics of the participants are given in Supplementary Table S1.

AGING

Population-based etiology analysis of AD-related ESR1 rs9340803

ESR1 variant distribution among AD/MCI/CN
As shown in Fig. 2, comparison of AD cases with controls from our large sample collection demonstrated that this ESR1 variant was associated with AD, conferring a 3.30-fold risk of developing the disease (Fig. 2A; Table S3). The ESR1 rs9340803 minor allele was more frequent in both AD and MCI cases than in CNs (4.53% and 4.23% vs. 1.37%; p<0.001), whereas its frequency did not differ significantly between AD and MCI cases (p=0.059). Further analyses stratified by sex and age group showed the same trend, with greater minor allele carriage in AD and MCI cases (Table S4). We also stratified by APOE status, which showed that, among AD cases, the variant was carried more often by APOEε4− individuals (4.19%) and, among MCI cases, more often by APOEε4+ individuals (3.37%).
When comparing MCI cases to CNs solely, the disease risk was elevated by 2.08-fold with this ESR1 variant ( Fig. 2A

ESR1 variant distribution among CI/CN
AD and MCI patients were then combined as cognitively impaired (CI) cases, all of whom had amnestic memory problems. These cases had a higher frequency of the ESR1 rs9340803 heterozygous genotype than controls (4.36% vs. 1.37%, p<0.001), indicating that the minor allele of rs9340803, like APOEε4, is a risk factor; it conferred a 3.18-fold risk of memory problems in CI cases (Fig. 2B; Table S5). The independent effect of the G allele of this SNP was somewhat greater than that of APOEε4 (OR=3.23 (1.80~4.05) vs. 1.58 (1.32~1.90); Table S5), and logistic regression adjusted for age, sex and APOE confirmed this (Table S6). Moreover, the joint action of the G allele and APOEε4 greatly increased the risk of cognitive impairment relative to individuals who were homozygous for the rs9340803 major allele and did not carry APOEε4 (OR=4.69 (1.39~5.89), p=0.006). The frequency of the minor allele also differed between CI and CN subgroups categorized by age and by sex, with more older and male cases carrying the minor allele (Table S7). Specifically, among CI cases without APOEε4, this ESR1 variant affected disease risk in both females and males (OR=2.74, p=0.048; OR=3.52, p=0.002; Table S8). Gene-gene and gene-environment interaction analyses further supported the joint effect of this ESR1 variant, APOEε4 and aging: the high-risk haplotype G-T-C increased disease risk 2.46 (1.18~3.29)-fold among all elderly participants and 6.54 (0.88~10.52)-fold among those aged 70 and older (Supplementary Table S9).
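As a rough illustration of how the odds ratios quoted above relate to carrier frequencies, the following sketch computes a crude (unadjusted) odds ratio from the reported proportions and a Woolf-type 95% confidence interval from a 2x2 table. The function names and the example counts are hypothetical and are not taken from the study; the crude value (about 3.3) need not match the reported estimates exactly, which were presumably derived from the actual genotype counts.

```python
import math


def odds_ratio_from_proportions(p_case: float, p_control: float) -> float:
    """Crude odds ratio from carrier proportions in cases and controls."""
    return (p_case / (1.0 - p_case)) / (p_control / (1.0 - p_control))


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a = carrier cases,    b = non-carrier cases,
    c = carrier controls, d = non-carrier controls.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Carrier proportions quoted in the text (CI cases vs. controls).
crude_or = odds_ratio_from_proportions(0.0436, 0.0137)

# Hypothetical counts, for illustration only (not the study data).
or_, lower, upper = odds_ratio_with_ci(a=40, b=960, c=14, d=986)
print(f"crude OR = {crude_or:.2f}; example OR = {or_:.2f} ({lower:.2f}~{upper:.2f})")
```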

ESR1 rs9340803 G variant potentially damages ESR1 transcription
This intronic SNP, located 45 bp downstream of exon 4, was predicted in silico to break the binding motif for the splicing auxiliary protein hnRNP H1 and to create a binding motif for hnRNP A1, which would promote exon 4 skipping. This would impair the normal regulation of intronic splicing of the ESR1 pre-mRNA and might thereby down-regulate the transcription of ESR1 (Fig. S1).

ESR1 rs9340803 variation affects the expression of the gene
In accordance with the preceding prediction, the dual-luciferase reporter assay showed statistically significant

In silico pathway analysis of estrogen-ERα-cholesterol in brain
Estrogen deficiency and an altered lipid profile are considered significant risk factors for AD. Cholesterol is the substrate for estrogen synthesis, and the possible interactions between cholesterol and estrogens in the etiology of AD may be influenced by cholesterol metabolism [12]. We therefore performed a bioinformatic pathway analysis of the estrogen-ERα-cholesterol cycle on the basis of ESR1 gene function, previous studies and our preliminary results (Fig. 3). Estradiol (E2, the predominant form of estrogen), which is derived from cholesterol, binds to the estrogen receptor (mostly ERα in the hippocampus) and ultimately promotes complex biological outcomes in brain regions, playing a protective role in neurodegenerative diseases. Cholesterol content in the cytoplasm and on the membrane is modulated by key enzymes, which can be regulated either directly through the interaction of factors such as estrogen response elements (EREs) with ERα, or by estrogen after it binds to and activates ERα (Fig. 3B). Notably, serum Aβ concentrations in AD and MCI cases were related to the ESR1 variant. This AD risk-associated variant may act by increasing Aβ-oligomer concentrations: a higher Aβ1-40 concentration was observed in participants carrying this ESR1 variant in our study (Table 2). We also observed a characteristic distribution of the neurotoxic Aβ isoforms: Aβ1-40 was highest in AD cases, lower in MCI cases and lowest in controls, whereas Aβ1-42 was highest in AD cases and lower in both MCI cases and controls. This may be mainly because Aβ42 oligomers, which are more hydrophobic and more prone to aggregate, have been reported to emerge before Aβ40 during the Aβ cascade, whereas Aβ40 oligomers accumulate overwhelmingly as the most abundant Aβ species [16,17]. Our results underline the important role of ESR1 genetic variation in Aβ pathology and the order in which Aβ isoforms appear along the continuum of the degenerative process.

DISCUSSION
This ESR1 variant was also found to correlate with altered blood lipid fractions in the AD and MCI cases of our study. However, a significantly lower plasma TC level was seen only in ESR1 variant carriers with an AD diagnosis (Table 3). This may be partially explained by the possibility that low plasma TC levels are a result of AD pathology and are linked to increased insulin resistance, especially in AD patients lacking estrogen protection [18,19]. Lipid metabolism dysregulation is found both systemically and in the brain among patients with LOAD [7]. Although disturbed cholesterol content, which would impair cognitive function, has been explored substantially, the relation between ESR1 variants and cholesterol levels in AD and MCI patients is still poorly understood [20,21]. Considering that altered cholesterol levels accompanied by increasing Aβ accumulation lead to AD development [22][23][24], we paid particular attention to the relationship among ESR1 variation, Aβ concentration and cholesterol levels in AD and MCI cases. In our study, statistically significant correlations exist between TC levels and each of Aβ1-40, Aβ1-42 and Aβ1-42/1-40 in ESR1 variant carriers, implying a link between abnormal cholesterol content and Aβ production in the brain in the context of ESR1 variation, similar to studies on other cholesterol metabolism-related genes such as CLU and SORL1 [13,14]. Nevertheless, knowledge about the impact of ESR1 variation on cholesterol content and Aβ production in AD patients, and particularly in MCI cases, is still lacking.
Table 3. Analyses of blood lipid levels (mmol/L).
To explain how ESR1 variation might affect cholesterol levels and Aβ in the development of AD, and even of MCI, we suppose that, with aging, the brain microenvironment of individuals carrying this ESR1 variant becomes more vulnerable to specific environmental factors such as altered lipid levels. Mutant ERα, which partakes in cholesterol metabolism, would then exacerbate the cholesterol disturbance and trigger a series of reactions, including continual Aβ production, that lead to neuronal apoptosis in brain tissue, inducing MCI early and AD eventually. Our study thus offers some new ideas on MCI as an early phase of AD development. Correspondingly, bioinformatic pathway analysis further suggests that ERα might modulate, in some way, the cholesterol content of the brain by inhibiting several key cholesterol-related enzymes, all of which are expressed in the hippocampus and regulate estrogen synthesis at different steps (Fig. 3). Therefore, this ESR1 variation, which might have a functional impact by modulating the cholesterol content of the brain and thus promoting Aβ production, could be a causal factor within the complex genetic basis of AD.
As a member of the NR family, estrogen receptor α (ERα, P03372), encoded by the ESR1 gene, is highly expressed in brain regions, especially the hippocampus and hypothalamus, that are associated with memory and cognitive performance [25][26][27]. ERα is activated by binding estrogen, and the estrogen-ER complex has been proposed to partake in cholesterol metabolism and Aβ accumulation in the brains of LOAD patients [28]. The in vitro gene expression assay in our study revealed decreased ESR1 expression with homozygous risk alleles. It is therefore reasonable to infer that this ESR1 variant might interfere with the amount and activity of ERα, leading to loss of the neuroprotective effects of estrogen and, among other consequences, to promoted Aβ production.
Bioinformatic pathway analysis further suggests that ERα might take part in modulating, in some way, the cholesterol content of the brain by inhibiting several key cholesterol-related enzymes, all of which are expressed in the hippocampus and regulate estrogen synthesis at different steps (Fig. 3), given that estrogen is synthesized locally in the hippocampus of the adult brain with cholesterol as its precursor [23,28].
In addition, variants of several genes related to the modulation of cholesterol metabolism, such as nuclear receptor-encoding genes, have been reported by a series of studies to alter the cholesterol level in the brain and thus increase Aβ production, exacerbating cognitive decline as a result [29-31,10]. Based on the overall findings, we therefore propose that the ESR1 variant identified in our study might act by perturbing the fine balance of cholesterol content in the brain, promoting Aβ production and increasing Aβ toxicity, and consequently inducing cognitive decline, which manifests as slightly worsening cognitive symptoms in elderly people with MCI and severely worsening symptoms in those with AD, thereby participating in the pathological process of AD.
It should be emphasized that we conducted multiple tests on a relatively large sample. Although Bonferroni correction was adopted for pairwise comparisons, there remains a high possibility of type I error. The results of this study are therefore mostly exploratory, and the corresponding conclusions need to be verified by subsequent studies. Intensive study of the functional effect of this variant is clearly warranted in our future research, and we hope that the findings of the present study will aid in understanding the pathological underpinnings of AD.

Subjects
A total of 5635 Han Chinese participants were enrolled in our study over a 5-year period (from November 2012 through December 2017), and 3167 eligible participants (57.4% female) were finally investigated, consisting of 854 cases with LOAD (median age (Q): 77.50 [14] years), 1059 with mild cognitive impairment (MCI; median age: 73.0 [12] years) and 1254 age- and region-matched cognitively normal controls (CNs). Cognitively normal control subjects were recruited from community-based and hospital-based elderly populations according to the following criteria: I) age 50 years or greater; II) no history suggestive of brain disease or cognitive decline; and III) no obvious hemorrhage or infarction on brain imaging.
All procedures, including blood sample collection, were approved by the ethical review boards at the centers involved in this study, and written informed consent was obtained from each subject or proxy. The study was performed according to the principles of the Declaration of Helsinki.

Candidate variants selection
To discover new, rare or low-frequency variants associated with LOAD, we applied several rigorous analysis steps and selected candidate SNVs enriched in LOAD samples for subsequent association tests based on the given criteria. The detailed selection process is described in the Supplementary Methods.
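The read-level heuristics listed in the supplementary methods (coverage of at least 10× in all samples, mean base quality of at least 15 in all samples, support from at least 10% of reads, and support from at least five reads) can be sketched as a simple filter. The `SiteEvidence` structure, its field names and the example values are hypothetical and serve only to make the four rules concrete; this is not the pipeline actually used.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    """Read-level evidence for one candidate variant (hypothetical structure)."""
    min_depth: int                 # lowest coverage at this position across samples
    min_mean_base_quality: float   # lowest per-sample mean base quality at this position
    alt_reads: int                 # reads supporting the variant allele
    total_reads: int               # all reads covering the position


def passes_filters(site: SiteEvidence,
                   min_cov: int = 10,
                   min_qual: float = 15.0,
                   min_frac: float = 0.10,
                   min_alt: int = 5) -> bool:
    """Apply the four heuristic rules (i)-(iv) to one candidate site."""
    if site.total_reads == 0:
        return False
    return (
        site.min_depth >= min_cov                               # rule (i)
        and site.min_mean_base_quality >= min_qual              # rule (ii)
        and site.alt_reads / site.total_reads >= min_frac       # rule (iii)
        and site.alt_reads >= min_alt                           # rule (iv)
    )


# Hypothetical candidate sites, for illustration only.
well_supported = SiteEvidence(min_depth=30, min_mean_base_quality=25.0,
                              alt_reads=8, total_reads=40)
weakly_supported = SiteEvidence(min_depth=30, min_mean_base_quality=25.0,
                                alt_reads=4, total_reads=30)
print(passes_filters(well_supported), passes_filters(weakly_supported))
```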

Genotyping of 9 variants using iPLEX Gold chemistry
Candidate rare or low-frequency SNVs were genotyped on 200 LOAD cases and 200 controls using the MassARRAY Compact system (Sequenom, San Diego, CA). Quality control of genotyping was carried out afterwards.

Large scale population screening on ESR1 rs9340803 and APOE
Across multiple populations from nine provinces in northern and southern mainland China, an additional 2694 individuals, comprising 581 LOAD cases, 1059 MCI cases and 1054 controls, were screened for APOEε4 and ESR1 rs9340803 genotypes.
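The stage-wise sample sizes can be reconciled with the overall cohort by simple addition. The short check below assumes, as the totals suggest but the text does not state explicitly, that the sequencing, genotyping and screening stages used non-overlapping participants.

```python
# Sample sizes stated in the text for each stage of the study.
sequenced_load = 73                                      # targeted sequencing
genotyped = {"LOAD": 200, "CN": 200}                     # iPLEX genotyping
screened = {"LOAD": 581, "MCI": 1059, "CN": 1054}        # large-scale screening

# Assumption: the three stages involve distinct individuals.
totals = {
    "LOAD": sequenced_load + genotyped["LOAD"] + screened["LOAD"],
    "MCI": screened["MCI"],
    "CN": genotyped["CN"] + screened["CN"],
}
overall = sum(totals.values())
print(totals, overall)
```

Under this assumption the per-group totals agree with the 854 LOAD cases, 1059 MCI cases and 1254 controls reported for the full cohort.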

Detection of Aβ-oligomer concentrations in the serum
Aβ is an important biomarker of AD that can accumulate silently in the brain for years, disrupting the nerve connections essential for thinking and memory, and can enter the systemic circulation across the blood-brain barrier. We therefore measured the concentration of serum Aβ oligomers using the Human/Rat Amyloid (40/42) ELISA Kit (WAKO; Osaka, Japan).

In vitro expression assay of mutated ESR1
Firefly and Renilla luciferase reporter gene expression assays and kymographs were performed to analyze the expression of ESR1 in the 293T cell line transfected with wild-type or variant RTN3 constructs, as a preliminary functional exploration of ESR1 variation.

Statistical analysis
Genotypes were evaluated for departure from Hardy-Weinberg equilibrium (HWE) in the controls using chi-squared tests. Variants with p<0.05 were considered to deviate from HWE. The minor allele frequencies (MAFs) of the variants were used as the risk allele frequencies, and the prevalence of AD was set at 4% [1].
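For readers unfamiliar with the HWE check, a minimal sketch of the one-degree-of-freedom chi-squared test is given below. It is not the SPSS procedure used in the study, and the genotype counts are hypothetical. For low-frequency variants the expected number of minor-allele homozygotes is small, so the chi-squared approximation should be read with caution.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-squared test (1 df) for departure from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes.
    Returns the chi-squared statistic and its p value.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
chi2_eq, p_eq = hwe_chi_square(100, 200, 100)      # exactly in equilibrium
chi2_dev, p_dev = hwe_chi_square(150, 100, 150)    # heterozygote deficit
print(chi2_eq, p_eq, chi2_dev, p_dev < 0.05)
```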
Categorical variables are presented as numbers and percentages. Given the high inter-individual variability, most of the continuous data in this paper were non-normally distributed (assessed via Kolmogorov-Smirnov and Shapiro-Wilk tests and visual inspection of Q-Q plots) and are therefore summarized as median (interquartile range [Q] or [25%, 75%]). Parametric tests were used where appropriate, but in most cases the non-parametric alternative was adopted: the Mann-Whitney U test for comparisons between two groups and the Kruskal-Wallis test for comparisons among more than two groups of skewed variables. The frequencies of categorical variables were compared using the Pearson χ2 test or Fisher's exact test, as appropriate. Bonferroni correction was used to correct for multiple testing. Logistic regression and correlation analyses were also performed. A p value less than 0.05 was considered statistically significant. Odds ratios (ORs) and 95% confidence intervals (CIs) were also calculated using SPSS version 19.0 (SPSS Inc., Chicago, IL, USA).
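The Bonferroni correction mentioned above can be illustrated in a few lines: each raw p value is multiplied by the number of comparisons (capped at 1) before being compared with the significance threshold. The p values in this sketch are hypothetical and do not correspond to any result of the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p values and significance flags."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical raw p values from three pairwise group comparisons.
raw = [0.001, 0.02, 0.04]
adjusted, significant = bonferroni(raw)
print(adjusted, significant)
```

In this example only the first comparison remains significant after correction, which shows how nominally significant pairwise results can lose significance once multiplicity is taken into account.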

CONCLUSIONS
We present a new low-frequency risk variant, ESR1 rs9340803, in both LOAD and MCI cases, which might be etiologically related to AD along the whole disease continuum. This ESR1 variation, independently or synergistically with APOE, elevates the risk of cognitive impairment for the cases in our study. We propose, for the first time, that this ESR1 variation, which might have a functional impact by modulating the cholesterol content of the brain and thus promoting Aβ production, could be a causal factor within the complex genetic basis of AD.

ACKNOWLEDGMENTS
We thank all participants who offered their genomic DNA and clinical information for this study and appreciate the work of all clinicians who helped evaluate samples and data.

Discovery of NR gene variants using targeted NGS
The sequencing data yielded, on average, 125.5 Mb of 100 bp paired-end sequence reads per individual, representing an average coverage depth of approximately 126X. Approximately 85.3% of the sequence reads were mapped to unique regions of the human genome (Build 37.5, hg19; BWA software). SAMtools called, on average, 134 single nucleotide variations (SNVs) per individual relative to the reference genome.
In all, we identified 1690 SNVs. Among those, 329 SNVs were consistent with those in the public SNP database and 1361 SNVs were previously unknown. 100 SNVs resided within putative promoter regions of the 13 genes and 1564 SNPs were located in the introns, all known exons, untranslated regions, or splice sites. A total of 26 SNPs were within non-coding RNA intronic and exonic regions (Table S2).

Gene-gene & gene-environment interaction
Compared with CNs, the ESR1 rs9340803 G allele and APOE4 synergistically elevated the effect size to 4.69-fold (1.39-5.89) among AD or MCI patients. Given these preliminary results and the fact that aging is the most prominent risk factor, we were prompted to ask whether the newly identified low-frequency ESR1 mutation, APOE4 and aging may collectively contribute to the development of AD. Therefore, gene-gene and gene-environment (aging) interactions were explored using the GMDR software (https://sourceforge.net/projects/gmdr/). One three-locus-aging model, ESR1 (rs9340803)-APOE (rs429358, rs7412)-aging, had a maximum testing accuracy of 71.22% and a maximum cross-validation consistency (100/100) that was significant at the p<0.0001 level. In the three-locus (rs1387923-rs2769605-rs6265) model, the

Functional prediction for the LOAD-associated variant
We preliminarily explored the role of the ESR1 rs9340803 G allele at the cytological level. rs9340803 A/G is located in intron 4 of the ESR1 gene, close to the splice site at the 3' end of exon 4. MutationTaster, Human Splicing Finder and SFmap were used to assess the potential impact of the rs9340803 G variant on ESR1 alternative splicing, and the variant was predicted to damage the regulation of intronic splicing of the ESR1 pre-mRNA. In addition, SFmap predicted that the G allele would destroy the binding site for hnRNP H1, and Human Splicing Finder predicted that it would generate a binding site for hnRNP A1, which is known to promote exon exclusion and induce abnormal exon skipping.
The procedures of the present study were approved by the ethical review boards at all involved study centers, and written informed consent was obtained from each subject or proxy.
Patients with other types of dementia, such as frontotemporal dementia (FTD), dementia with Lewy bodies (DLB), Parkinson's dementia (PD) and vascular dementia (VD), or with dementia caused by other factors such as multiple sclerosis, were excluded after careful differential diagnosis. People with a history of alcoholism or drug abuse, or for whom structural neuroimaging did not support or ruled out a diagnosis of AD, were also excluded. Criteria for undiagnosed AD patients included 1) an MMSE score lower than 27; 2) exclusion of other likely types of dementia; and 3) lack of an imaging examination at the time of recruitment.
Among all enrollees, 2468 were excluded (43 AD cases, 101 MCI cases and 2324 CNs) for the following reasons: Ⅰ) failure of APOE or ESR1 genotyping; Ⅱ) incomplete demographic information; Ⅲ) age <50 years. Thus, 3167 individuals were finally investigated.

Selection of 12 candidate NR genes and APOE
Twelve candidate nuclear receptor (NR) genes related to cholesterol regulation (VDR, THRA, ESR1, ESR2, LXRB, PPARA, PPARB, PPARG, AR, GR, RXRA and RXRB) were selected for next-generation sequencing based on previous meta-analyses and bioinformatic pathway analysis: I) all are involved in cholesterol metabolism; II) two genes, LXRB and ESR1, are associated with up-regulation of APOE expression; and III) four genes, RXRA, VDR, ESR1 and AR, contain common LOAD-associated polymorphisms. Additionally, we incorporated APOE into this study.

Targeted sequencing
An approximately 150 kb genomic region spanning the 12 NR genes and APOE was sequenced using pools of PCR products from 73 Chinese LOAD patients. In brief, we used PCR to amplify the putative promoter regions (3 kb upstream of the transcriptional start sites), all known exons, untranslated regions and the 200 bp of intronic sequence flanking each exon, the latter to detect variation affecting alternative splicing. Purified amplicons were subsequently used to construct fragment libraries with the TruSeq DNA Sample Preparation Kit (Illumina, San Diego, California, USA). Bar codes were added to the sequencing library fragments using a paired-end DNA sample preparation kit (Illumina, California, USA) and Illumina multiplexing adaptors (Illumina) according to the manufacturer's instructions. Library quality was checked by real-time PCR on the LightCycler 480 II system (Roche, Madison, WI, USA). The 73 pooled libraries were then sequenced in parallel on a HiSeq 2000 sequencer (Illumina, San Diego, California, USA). Sequencing data were aligned to the genomic reference (GRCh37.5) using BWA software. Single nucleotide variants (SNVs) and small insertion and deletion (Indel) variants were called using SAMtools 1.31 and GATK 2.6, respectively, and annotated by comparison with the dbSNP, HapMap and 1000 Genomes databases.
To discover new, low-frequency or rare variants associated with LOAD, we applied several rigorous analysis steps. We re-aligned the BWA-aligned reads using Sequence Alignment/Map tools (SAMtools) 1.31 and the Genome Analysis Toolkit (GATK) 2.6. Potential SNVs and Indels were called using SAMtools. In this process, several heuristic rules were applied: (i) all samples should be sufficiently covered (≥10×) at the genomic position being compared; (ii) the average base quality at a given genomic position should be at least 15 in all 73 pooled samples; (iii) each variant should be supported by at least 10% of the total reads; (iv) each variant should be supported by at least five reads. To further reduce false-positive calls, SNVs and Indels called using SAMtools were re-called with the GATK software package in the 73 pooled samples. We