Computational analysis of functional SNPs in Alzheimer’s disease-associated endocytosis genes

Background. Genome-wide association studies of Alzheimer’s disease (AD) have shown that many single nucleotide polymorphisms (SNPs) in genes of different pathways affect disease risk. One of these pathways is endocytosis, and variants in endocytosis genes may affect their functions in amyloid precursor protein (APP) trafficking and in amyloid-beta (Aβ) production and clearance in the brain. This study uses computational methods to predict the effects of novel SNPs, including untranslated region (UTR) variants, splice site variants, synonymous SNPs (sSNPs) and non-synonymous SNPs (nsSNPs), in three endocytosis genes associated with AD, namely PICALM, SYNJ1 and SH3KBP1.

Materials and Methods. Information on all variants was retrieved from the Ensembl genome database, and different variation prediction analyses were then performed. UTRScan was used to predict the effects of UTR variants, and MaxEntScan was used to predict the effects of splice site variants. Meta-analysis by PredictSNP2 was used to predict the effects of sSNPs. Parallel prediction analyses by five different software packages, namely SIFT, PolyPhen-2, Mutation Assessor, I-Mutant2.0 and SNPs&GO, were used to predict the effects of nsSNPs. The level of evolutionary conservation of deleterious nsSNPs was further analyzed using the ConSurf server. Mutant protein structures of deleterious nsSNPs were modelled and refined using SPARKS-X and ModRefiner for structural comparison.

Results. A total of 56 deleterious variants were identified in this study, including 12 UTR variants, 18 splice site variants, eight sSNPs and 18 nsSNPs. Among these 56 deleterious variants, seven were also identified in the Alzheimer’s Disease Sequencing Project (ADSP), Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Mount Sinai Brain Bank (MSBB) studies.

Discussion. The 56 deleterious variants were predicted to affect the regulation of gene expression, or to have functional impacts on these three endocytosis genes and their gene products. The deleterious variants in these genes are expected to affect their cellular function in endocytosis and may also be implicated in the pathogenesis of AD. The biological consequences of these deleterious variants and their potential impact on disease risk could be further validated experimentally and may be useful for gene-disease association studies.

Analysis of sSNPs

PredictSNP2 (http://loschmidt.chemi.muni.cz/predictsnp2/) is a web server that predicts [...]

Table 1 lists the variants that change the number of motifs matched in UTRSite, as compared with their wild type UTR sequences (see Table S3 for the complete prediction analysis results). Both 3' and 5' UTRs are enriched with cis-acting regulatory elements, and both UTRs are important in the regulation of protein expression. In this study, a total of 12 UTR variants were predicted to cause an addition or a deletion of regulatory elements in the UTR sequences. Further analysis of the impact of these regulatory elements, including the effect of miRNA binding to the UTR sequence, is outside the scope of this study but should be characterized in the future.
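The comparison behind Table 1, counting motif matches in the wild type and mutant UTR and keeping the variants where the count changes, can be sketched roughly as follows. This is only an illustration of the idea, not the UTRScan implementation: the two motif patterns, the function names `count_motifs` and `motif_changes`, and the example sequences are all assumptions made for this sketch and do not come from UTRSite or from the study data.

```python
import re

# Illustrative motif patterns only (assumption): these are NOT the UTRSite
# pattern definitions, which are considerably more elaborate.
MOTIFS = {
    "polyA_signal": "AATAAA",
    "AU_rich_element": "ATTTA",
}

def count_motifs(seq):
    """Count matches of each motif in a DNA sequence, allowing overlaps."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping occurrences be counted.
    return {name: len(re.findall(f"(?={pat})", seq)) for name, pat in MOTIFS.items()}

def motif_changes(wild_type, mutant):
    """Return {motif: mutant_count - wild_type_count} for motifs whose count differs."""
    wt = count_motifs(wild_type)
    mt = count_motifs(mutant)
    return {name: mt[name] - wt[name] for name in MOTIFS if mt[name] != wt[name]}

# Hypothetical UTR fragment and a single-base substitution (A>G) within it.
wild = "GGCAATAAAGCTTTGCA"
mut = "GGCAATGAAGCTTTGCA"
print(motif_changes(wild, mut))  # {'polyA_signal': -1}
```

A negative value indicates a predicted loss of a motif match and a positive value a gain; a variant with an empty result would not appear in a table of this kind.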
Prediction analysis of splice site variants

The prediction analysis of splice site variants was done using MaxEntScan, which is integrated in VEP. Submission to MaxEntScan requires only the SNP IDs. The consensus score and the score difference between wild type and mutant sequences were obtained after submission. The consensus score for each variant was calculated separately for each protein-coding transcript, so the same variant may have different consensus scores on different transcripts. This allows the impact of a splicing variant to be studied on different transcripts. Table 2 shows the variants with a score difference exceeding the defined threshold. Columns [...]

Prediction analysis of nsSNPs

All 759 nsSNPs in the three genes were analyzed by five prediction packages. The Ensembl genome database contains prediction results from SIFT and PolyPhen-2, in which a total of 106 nsSNPs were predicted as "damaging" in SIFT and "probably damaging" in [...] Table 4 while the complete prediction results of nsSNPs using five prediction packages are shown in Table S8. All the deleterious nsSNPs were predicted with high SIFT score and most of them have a PSIC score larger than 0.95 in PolyPhen-2. Prediction results from SIFT and PolyPhen-2 showed that all deleterious nsSNPs were highly conserved in the proteins ( [...] the formation of clathrin-coated pits, which is one of the key functions of PICALM in CME (Ishikawa et al., 2015). These deleterious nsSNPs were predicted to cause conformational change and affect PICALM protein function. Figure 1 shows a stick representation of the protein structural changes caused by the deleterious nsSNPs in the PICALM gene. In Fig. 1, the variant residues are colored yellow while red dashed lines indicate the hydrogen bonds between the residues. Variants rs780443419 (F109S) and rs765338634 (L179P) resulted in an addition or a deletion of hydrogen bonds between the mutant and neighboring amino acids. Therefore, the substitution of these protein residues could significantly affect the ANTH domain function as well as the overall PICALM protein structure.

For the SYNJ1 gene, 13 nsSNPs were predicted as deleterious. Six of them, including rs781675993, rs398122403, rs762909719, rs771755243, rs768897710 and rs779479360, are [...]

The structural and functional importance of the 18 deleterious nsSNPs was further analyzed using ConSurf analysis tools. Evolutionary conservation analysis determines the level of conservation of each protein residue and predicts the potential structural and functional importance of these deleterious variants to the protein. Figure 2 shows that 17 out of 18 (94%) deleterious nsSNPs were found to be "conserved", with 12 of them (70%) "highly conserved" (score "9") through homologous sequence alignment. Only one deleterious nsSNP, rs745418083 (L776S) in the SYNJ1 gene, was estimated to be "intermediate" in terms of evolutionary conservation. In addition, 11 of the 18 (61%) deleterious variants were predicted as structural residues and the rest (39%) were functional residues. Figures S2-4 show the conservation scores of the full-length proteins of PICALM, SYNJ1 and SH3KBP1, respectively.
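The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the helper name `percent` and the variable names are chosen here for illustration.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

total = 18              # deleterious nsSNPs analyzed with ConSurf
conserved = 17          # classified as "conserved"
highly_conserved = 12   # conservation score 9
structural = 11         # predicted structural residues
functional = total - structural

print(f"conserved:        {percent(conserved, total):.1f}% of {total}")
print(f"highly conserved: {percent(highly_conserved, conserved):.1f}% of {conserved}")
print(f"structural:       {percent(structural, total):.1f}% of {total}")
print(f"functional:       {percent(functional, total):.1f}% of {total}")
```

Note that the quoted 70% appears to be taken relative to the 17 conserved variants (12/17 ≈ 70.6%) rather than to all 18 (12/18 ≈ 66.7%).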

In addition to predicting the functional and structural importance of the deleterious nsSNPs and their level of conservation in the proteins, the changes in physical and chemical properties between wild type and mutant amino acids were studied. Table S9 shows the hydropathy, polarity and charge differences between the wild type and mutant amino acids of the deleterious nsSNPs. [...] Table S9 shows that the hydropathy in eight deleterious nsSNPs has changed from hydrophobic to hydrophilic, and the polarity of four nsSNPs has changed from non-polar to polar. The substitution of amino acids may affect both covalent and non-covalent interactions among amino acids, subsequently influencing the stability and conformation of the protein structure.
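A property comparison of this kind can be sketched as below. The classification scheme is an assumption made for illustration: hydropathy is taken from the sign of the Kyte-Doolittle index, and polarity and charge follow one common textbook grouping. The scheme used to build Table S9 may differ, so the output is not expected to reproduce that table. The function names `describe` and `property_changes` are likewise hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# One common grouping (assumption); textbooks differ on borderline residues.
NONPOLAR = set("AVLIMFWPG")
CHARGE = {"D": "negative", "E": "negative", "K": "positive", "R": "positive"}

def describe(residue):
    """Classify a residue by hydropathy, polarity and charge."""
    return {
        "hydropathy": "hydrophobic" if KD[residue] > 0 else "hydrophilic",
        "polarity": "non-polar" if residue in NONPOLAR else "polar",
        "charge": CHARGE.get(residue, "neutral"),
    }

def property_changes(substitution):
    """For a substitution such as 'F109S', return the properties that differ
    between the wild type (first letter) and mutant (last letter) residue."""
    before = describe(substitution[0])
    after = describe(substitution[-1])
    return {key: (before[key], after[key]) for key in before if before[key] != after[key]}

# Substitutions named in the text.
for sub in ("F109S", "L179P", "L776S"):
    print(sub, property_changes(sub))
```

Under these assumptions F109S changes both hydropathy and polarity, whereas L179P changes hydropathy only, because proline is grouped with the non-polar residues here.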

To study the role of nsSNPs in affecting the total free energy and the stability of protein [...]

To demonstrate the reliability of nsSNP prediction, we predicted the functional consequences of other nsSNPs that have previously been studied in bench experiments. Ten PSEN1 pathogenic nsSNPs that have been validated experimentally to affect amyloid-beta (Aβ) level were retrieved from ALZFORUM (https://www.alzforum.org/). In addition, another five PSEN1 nsSNPs, including three non-pathogenic variants and two variants that have never been reported to be deleterious or disease-associated, were selected as negative controls. The prediction results for all 15 nsSNPs in the PSEN1 gene are shown in Table S10. The prediction results show that nine out of these ten pathogenic nsSNPs were predicted as deleterious by all five prediction packages used in this paper. The only pathogenic nsSNP that was not predicted as a deleterious variant, rs63750231 (E280A), has I-Mutant DDG and SNPs&GO scores that are lower than the cutoff points. All five PSEN1 negative-control nsSNPs were predicted to be non-deleterious to the protein.

[...] experiments to determine the functional consequences of all the SNPs, even for a single gene. For that reason, computational methods have become an important alternative way to prioritize the SNPs that are possibly structurally or functionally significant for the genes of interest. Computational methods such as prediction and modelling tools allow researchers to distinguish functionally significant SNPs from neutral SNPs. The prediction accuracy is expected to improve when results from multiple algorithms are combined to perform meta-prediction.
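The way several predictors are combined, and how the PSEN1 control set summarizes the result, can be sketched as follows. This assumes the strict rule reported for the PSEN1 check, where a variant counts as deleterious only when all five packages agree. The function names are hypothetical, and the per-tool calls shown for E280A are illustrative: the text states only that its I-Mutant and SNPs&GO scores fell below the cutoffs.

```python
def consensus_deleterious(calls):
    """Return True only if every tool calls the variant deleterious.

    `calls` maps a tool name to a boolean call (True = deleterious).
    """
    return all(calls.values())

def validation_metrics(tp, fn, tn, fp):
    """Summarize a control set from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Illustrative calls for E280A (assumption for the first three tools).
e280a = {
    "SIFT": True,
    "PolyPhen-2": True,
    "Mutation Assessor": True,
    "I-Mutant2.0": False,
    "SNPs&GO": False,
}
print(consensus_deleterious(e280a))  # False

# Counts from the PSEN1 check: 9 of 10 pathogenic variants called deleterious,
# and all 5 negative controls called non-deleterious.
print(validation_metrics(tp=9, fn=1, tn=5, fp=0))
```

With these counts the sensitivity is 0.9, the specificity is 1.0 and the overall accuracy is 14/15. A strict all-tools rule of this kind favors specificity, at the cost of missing variants such as E280A that only some tools flag.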
Besides that, computational methods are able to provide high-throughput prediction results at [...]

In our study, a total of 56 rare variants in the PICALM, SYNJ1 and SH3KBP1 genes were predicted as deleterious variants. These deleterious variants were predicted to affect the regulation of gene expression and protein functions. All three genes have cellular functions involved in clathrin-mediated endocytosis (CME), and deleterious variants in these genes were expected to affect the functions of these proteins in endocytosis. Moreover, these genes were previously reported as AD-associated and they are implicated in the pathogenesis of AD. The