Does conserved domain SOD1 mutation has any role in ALS severity and therapeutic outcome?

Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative fatal disease that can affect the neurons of brain and spinal cord. ALS genetics has identified various genes to be associated with disease pathology. Oxidative stress induced bunina and lewy bodies formation can be regulated through the action of SOD1 protein. Hence, in the present study we aim to analyse the structural and functional annotation of various reported SOD1 variants throughout and their putative correlation with the location of mutation and degree of ALS severity by inferring the structural and functional alterations in different SOD1 variants. We have retrieved around 69 SNPs of SOD1 gene from Genecards. Structural annotation of SOD1 variants were performed using SWISS Model, I-Mutant 2.0, Dynamut, ConSurf. Similarly, the functional annotation of same variants were done using SIFT, PHP-SNP, PolyPhen2, PROVEAN and RegulomeDB. Ramachandran plot was also obtained for six synonymous SNPs to compare the amino acid distribution of wild-type SOD1 (WT SOD1) protein. Frequency analysis, Chi square analysis, ANOVA and multiple regression analysis were performed to compare the structural and functional components among various groups. Results showed the mutations in conserved domain of SOD1 protein are more deleterious and significantly distort the tertiary structure of protein by altering Gibb’s free energy and entropy. Moreover, significant changes in SIFT, PHP-SNP, PolyPhen2, PROVEAN and RegulomeDB scores were also observed in mutations located in conserved domain of SOD1 protein. Multiple regression results were also suggesting the significant alterations in free energy and entropy for conserved domain mutations which were concordant with structural changes of SOD1 protein. Results of the study are suggesting the biological importance of location of mutation(s) which may derive the different disease phenotypes and must be dealt accordingly to provide precise therapy for ALS patients.


Introduction
Amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig's disease), a term coined by Jean Martin Charcot in 1874, is considered to be one of the most fatal neurodegenerative diseases. It can be characterized by progressive degeneration of both upper and lower motor neurons (amyotrophic) leading to hardening of lateral columns and creeping paralytic condition (lateral sclerosis). For ALS patients, maximum duration of survival is around 3-5 years [1,2]. As the disease progresses, phosphorylation of neurofilaments stimulates the accumulation of bunina and lewy bodies in pathophysiology of ALS patients consequently induce inflammation in perikarya and proximal axons. Spheroids and ubiquitinylated strands can also stimulate the inflammation in these patients later which can be mediated through triggered response of microglia and astrocytes [3]. Terminal stage in ALS patients can feature failure of cardio-respiratory system after 2-5 years of onset of disease, resulting into death [2].
Three phenotypic variants of ALS have been described which include: pacific (associated with dementia), familial (mostly autosomal dominant, fALS), and sALS (sporadic ALS) [1]. Approximate, 10% of familial ALS cases demonstrate the Mendelian inheritance. Interestingly, 20% of fALS cases have exhibited the SOD1 mutation (superoxide dismutase 1 contains Cu and Zn in catalytic site). Additionally, mutation in TARDBP has also accounted for 5-10% of fALS cases along with the mutations in FUS (5%) and ANG (1%) [4]. However, sALS has also shown 5% cases with SOD1 mutation. Most of SOD1 mutations showed dominant clinical phenotype in ALS disease [5] except D90A mutation which could exhibit either recessive [6] or dominant [7,8] phenotype. Mutations in 13 genes and their loci have been identified as the causal genetic factor for typical ALS clinical phenotype which primarily involves SOD1, TARDBP, ANG, FUS, and OPTN etc. [4]. Recent genome wide association studies (GWAS) have also suggested and associated seven novel genes with ALS pathology including C21orf2, TUBA4A, CHCHD10, NEK1, TBK1, MATR3, and CCNF [9]. Intriguingly, hexa-nucleotide expansion (G4C2) of C9ORF72 gene (by mean of loss or gain of function mutations leading to develop somatic mosaicism) has also been associated with fALS (with 3.2% prevalence rate in Indian ALS) which may stimulate pathology by TDP-43 accumulation, impaired RNA metabolism and defective proteosomal degradation mechanism [10][11][12]. C9ORF72 association with fALS is signifying the biological importance of non-coding mutation, though it's not theme of current study.
The present study aims to examine the changes in functional and structural domains of SOD1 protein for the reported SNPs in various studies. Additionally, we also attempted to investigate significance of mutation location in maintaining the free energy and entropy of protein's tertiary structure. The impact of synonymous SNPs on protein structure as they do not exert any changes in the tertiary structure has also been explored to demonstrate their biological significance in SOD1 protein structure. Moreover, we also tried to decipher the biological significance of individual scores obtained from various bioinformatics tools to show their impact on structural and functional aspects of SOD1 protein and their implication in ALS severity which can be used to differentiate transmittant ALS phenotypes and can provide the substrate for development of personalized medicine.

Methodology
We have retrieved 69 clinically associated exonic SNPs of SOD1 with ALS pathology in various populations from Genecards [13] (Accession number for protein from NCBI NP_000445.1, Uniprot ID: P00441) (Additional file 1: Table S2). Structural and functional parameters of SOD1 genetic variations were collected from different web servers (Additional file 1: Table S1). Consurf [14] was utilized to identify the position of evolutionary conserved amino acid based on their respective SNPs (Additional file 1: Figure S1). FASTA format has been taken into consideration to annotate SOD1 protein structure and its comparison with different variants along with the severity of ALS pathology. Protein Data Bank (PDB) structure models were generated using Swiss Model [15]. All the structural components of protein including the Cβ, all atoms, salvation and torsion angles were retrieved from Swiss Model web server. Collectively these values define the Q mean (Z-score) of protein and describe the degree of naiveness of the protein model based on genetic polymorphism and ultimately signifying the protein stability. The corresponding free energy (Gibb's free energy) and the entropy with changes in protein sequence were also obtained from Dynamut [16] and I-Mutant 2.0 [17] (Additional file 1: Table 1). Functional association of free energy and entropy changes can be correlated with protein stability and molecular flexibility of mutated protein.
Functional annotation of studied mutations has also been analysed through algorithms derived from SIFT [18], PROVEAN [19] and PolyPhen-2 [20] to express the effect of non-synonymous SNPs (nsSNP) to define deleterious and/or neutral nature of mutation. Moreover, reliability index (RI) were also retrieved through PhD-SNP [21] to demonstrate the deleterious effect of nucleotide variation (SNPs) to define the nature of the mutation as neutral or diseased (Tables 2, 3 and 4). Predictive functional annotation of SNPs was also determined by using RegulomeDB [22]. Scoring of SNPs through RegulomeDB can be represented as transcription factor (TF) binding site or regulatory regions (promoter, operator and enhancer sequences) or DNase hypersensitive region etc. Synonymous mutations were scored using RegulomeDB where structural and functional annotations cannot be predicted for such variations (Tables 2, 3 and 4). String [23] was used to see the biological interaction of SOD1 with other biomolecules which may collectively regulate the oxidative stress and the processes mediated by them. Moreover, to identify the evolutionary conserved pattern of SOD1, cladogram among various species was analyzed using MEGAX [24].

Statistical analysis
Frequencies and their association were calculated using Fisher's test analysis. One-way ANOVA was employed to examine the changes in the structural and functional parameters including QMEAN, Gibb's free energy (ΔΔG), entropy (ΔS), SIFT and PROVEAN scores. Independent t test was also carried out to calculate the mean difference of the above mentioned parameters between the groups classified based on the location of the amino acid (e.g. conserved and variable region). Logistic regression analysis was also conducted to identify the changes in various parameters by considering one of them as dependent factor and to demonstrate their diagnostic efficacy in identifying ALS cases more precisely. We calculated fold changes in both structural and functional parameters by deducing the values of mutant SNPs from wild type (WT).

Frequency and association of SOD1 variants
Studied exonic mutations are mostly falling in conserved domain (49 variants) (by ConSurf) of SOD1 protein which have been found to affect the protein structure (52 variants) (by SIFT) and show deleterious effect (62 variants) (by PROVEAN) on SOD1 protein in ALS pathology (Table 5). Gibb's free energy (EcoDDG) of these variants has also suggested that 60 variants are destabilizing the protein structure, though entropy changes have shown both decreasing (n = 38) and increasing (n = 30) trends in almost equal proportions. Results of RegulomeDB
Reg. has also indicated that the most of these variants are falling under the score 4 (n = 46) which represents the binding site for TF and DNAse peak, on the contrary to lesser frequencies of 2b (TF binding + matched TF motif + matched DNase Footprint + DNase peak), 3a (TF binding + any motif + DNase peak) (n = 11) and 5 (TF binding or DNase peak) (n = 9).

E101G
Interestingly, Fisher's test analysis has revealed that PROVEAN (p = <0.0001), SIFT (p = <0.0001) and PHD-SNP (p = 0.002) significantly associated with the conserved domain of SOD1 protein which may define the severity of ALS phenotype. Additionally, HumDiv phenotype to predict the PolyPhen-2 model has demonstrated that around 41 SOD1 variants are falling in conserved domain of protein and signifying most of SOD1 variants are probably damaging in nature (Table 6).
Phylogenetic relationship through Cladogram has revealed that human SOD1 is closely related to Pongo abelli, Similarly SOD1 of Rattus norvegicus has also showed phyologenetic proximity to Mus musculus, however, both showed some divergence from the human SOD1 (Additional file 1: Figure S2).

Alterations of structural and functional parameters Mutation in conserved domain influences PROVEAN score
Comparative analysis of structural and functional values derived from various annotating software like Dynamut, PolyPhen-2, Reliability index (RI) of PHD-SNP, PROVEAN and SIFT have indicated significant alterations based on the location and nature of mutations (both in nucleotide and amino acid) in SOD1 protein. PROVEAN score has been found to be significantly varied in HumDiv phenotypes i.e. benign, possibly damaging and probably damaging (Fig. 1a) and conserved domains (Fig. 1b). Significant alteration in PROVEAN score has been found (between probably damaging and benign) among HumDiv phenotypes derived from PolyPhen-2 algorithm in SOD1 variants. Similarly, PROVEAN score has also significantly changed in both average and conservative domains as compared to benign domain. Results suggest that biological significance of mutations located in conserved domain which could lead to drastic changes in structural and functional components of SOD1 protein can be correlated with the phenotypic severity of ALS patients.

SOD1 Mutation in conserved domain enhances ALS severity by disrupting structural and functional parameters of protein
To compare the effect of SOD1 mutant variants in conserved domain versus variable-average region of protein, ANOVA results have significantly changed in PROVEAN and SIFT scores. Functional annotation of SOD1 mutations through PROVEAN and SIFT algorithms are suggesting that changes in structural and functional parameters of SOD1 protein can be correlated with the ALS severity and distorted version of protein which can be corresponding to the location of mutation (conserved versus variable-average) (Fig. 2). Marginally significant alteration reliability index (RI) of PolyPhen-2 has also been observed between conserved versus variable-average domain. Significant alterations in HumDiv sensitivity and specificity derived from PolyPhen-2 algorithm has also suggested highly distorted structure of SOD1 protein when mutation was located in conserved domain as compared to variable-average (Fig. 2).

SOD1 variants alter structural component Cβ
Results of Pearson's chi-square have shown that most of the SOD1 variants are located in conserved domain of protein. ANOVA results have demonstrated that the structural component of SOD1 protein, namely Cβ, derived from SWISS model has significantly altered in RegulomeDB 4 score as compare to score 2b +3a (Fig. 3a). Alterations in ΔΔG ENCoM (kcal/mol) and ΔΔS Vib ENCoM (kcal.mol −1 .K −1 ) have also been observed among RegulomeDB groups, though not of statistical significance (Fig. 3b, c). Results are indicating that the conserved domain mutation(s) can alter the structural component Cβ of SOD1 protein and may influence the binding of TF and DNase hypersensitivity. The impact of various algorithms to predict the functional and structural components was further correlated with SOD1 variants located in different domains including variable and conserved etc. Multiple regression analysis has shown that SWISS model component QMEAN has been found to be associated with PROVEAN score and can alter the same by 3.299 (p = 0.003). Moreover, ΔΔG ENCoM (kcal/mol) has also exhibited the association with QMEAN (p = 0.016; B = −0.156). Importantly, standardized coefficient beta (β) has also demonstrated the difference of 0.358. Entropy changes (ΔΔS Vib ENCoM (kcal.mol −1 K −1 ) of SOD1 variants showed the significant association with DDG value derived from iMutant (p = 0.02) and can alter the value by −0.582 unit. Results are suggesting the alterations in corresponding stability and flexibility of SOD1 protein with respect to nature and location of mutation. Importantly, predicted structural and functional changes of SOD1 protein can be further correlated with expression levels in ALS pathology which may be useful for predicting precise ALS phenotype (Table 7).
Structural and functional annotations of synonymous SNPs (sSNPs) cannot be done with existing tools to predict their structural and functional alterations. In current data set, six synonymous SOD1 variants were retrieved which were located in TF binding site and DNase hypersensitivity (Regulome 4 score). Interestingly, five out of six SOD1 sSNPs were falling in conserved domain and one was located in variable region. Results are suggesting the regulatory function of sSNPs by modulating the SOD1 expression and associated cellular mechanism (Table 8).
Moreover, it was found that SNP at 94 position of SOD1 protein has six different variants as reported by various studied. (Table 9), suggesting a potential hot spot in ALS pathology. Comparative analysis has indicated that structural and functional annotations of these variants have drastically changed as compared to WT protein structure of SOD1. Results have shown increased scores of QMEAN, Cβ, salvation and torsion angel parameters as compared to WT SOD1 protein. All structural factors derived from SWISS modeling in WT protein structure are required to maintain amino acids geometry, for necessary hydrogen and hydrophobic bonding to guide the secondary and tertiary structures of SOD1 protein. Predictive ΔΔG ENCoM (kcal/mol) and ΔΔSVib ENCoM (kcal.mol − 1.K − 1) by Dynamut also got altered which was concordant with structural changes in protein derived from these SOD1 variants. Results can indicate the distorted molecular flexibility and decreased stability of SOD1 protein derived from these six SNPs. Importantly, these variants were falling in the conserved region of SOD1 protein and primarily affect the TF binding and DNase hypersensitivity. Comparative Ramachandran plot analysis has also suggested the changes in distribution of amino acids in favored, allowed and outlier regions as shown in Fig. 4. Results of Table 9 demonstrating the increased number of amino acids fall in favored region in these six SNPs as compared to WT (286 amino acids). Similarly, drastic changes in number of amino acids can be seen in SOD1 variants in both allowed and outlier regions of these six SNPs in comparison to WT (14 and 2, respectively). Results signify changes in structural parameters in all six variants as compared to WT which can affect the bonding pattern of secondary structure SOD1 proteins.

Discussion
Present study provides the comparative analysis of various bioinformatics tools used to predict structural and functional aspects of exonic variants of SOD1 in ALS pathology which may be used to decipher the clinical severity of disease and translational implications. Not much success has been achieved in the advancement of prognostic and diagnostic fields to predict ALS in early stages of pathology. ALS is one of the most devastating degenerative diseases which demands faster conversion of genetic data into its clinical and translational development. ALS patients have been prescribed Riluzole to offer symptomatic relief and retard the degenerative process by inhibiting release of glutamic acid and noncompetitive action with N-methyl-D-aspartate (NMDA) receptors [25,26]. We have demonstrated that mutation(s) found in conserved domain of SOD1 protein are more deleterious and disease causing which can significantly distort the structure of SOD1 protein by altering Gibb's free energy and entropy of naïve protein. PROVEAN, PHD-SNP and SIFT scores also got altered significantly in mutations located in the conserved domain of SOD1 protein. Interestingly, results of multiple regression analysis to see individual impact of different entities including SIFT, PROVEAN, Polyphen2, QMEAN etc. on structure and function of SOD1 protein have revealed significant changes in free energy and entropy (Delta G and Delta S) which were concordant with structural changes in SOD1 protein. Results are suggesting that conserved domain mutation may have pivotal role in balancing the free energy and entropy of SOD1 protein by maintaining homeostatic interactions. Multiple bioinformatics tools to predict the structural and functional analysis can enhance the possibility of identifying variant's nature that can be missed by employing one tool to be specific. The resulted SNPs can induce the unfavorable conformational changes in SOD1 protein and may refuse to interact with other associated molecules which may hamper the mediated functions of downstream molecules (Fig. 5). Mutations in conserved domain of SOD1 protein have been found to stimulate the sedimentable aggregates [27], impair the activity of Na + /K + ATPase-α3 [28], reduce the affinity for Zn ion [29] and increase the Palmitoylation [30]. Therefore, it can also be argued that location and degree of mutation of SOD1 gene may have diverse impact on structural and functional aspects of SOD1 protein. This suggests that ALS pathology derived through various mutation may be dealt accordingly to the nature of SOD1 mutation and therapeutic regimen must be designed accordingly. It is evident from our results that mutation in phylogenetically conserved region is pronounced to be highly detrimental in nature because these alterations are located in functional domain of the effective protein and suggest maximum structural distortion of protein. Based on these results, it may be suggestive to provide the therapies or molecules which may assist in maximum structural restoration of the mutated SOD1 protein to provide the interactive interface for downstream molecules, may be beneficial for ALS patients.
Study has also provided comparative analysis of synonymous SNPs (six), though they do not exert any changes in tertiary structure of protein. Our results are suggesting the changes in QMEAN, Cβ, salvation and torsion angle of these six variants as compared to WT SOD1 protein and indicating to consider the same while making a clinical impression of ALS phenotype.
Moreover, Ramachandran plot analysis has also showed the differential distribution of amino acids among favorable, allowed and outlier region require to maintain secondary structure of SOD1 protein suggesting the distorted molecular flexibility and stability of protein of variants at 94 position Such multiple variations at single location may lead to differential clinical phenotypes of ALS based on distribution of amino acid in Ramachandran plot due to varied degree of interactions with downstream molecules.

Limitation
Predictive genetic interactions between these SOD1 variants and molecular interaction with other genes have not been deciphered. Corresponding protein levels of SOD1 variants can precisely define consequence in ALS severity and can derive better representation of ALS phenotype which could be demonstrated by adopting cell culture or animal model based analysis.

Additional file 1: Tables and Figures.
Abbreviation ALS: Amyotrophic lateral sclerosis-Motor neuron disease; fALS: Familial ALS-Inherited form of disease; sALS: Sporadic ALS-Non-inherited form of disease, de novo mutation is main cause; SNP: Single nucleotide polymorphism-most common type of genetic variation and being explored in population studies to identified biomarkers; sSNPs: Synonymous SNPs-No change in amino acid after mutation; nsSNP: Non-synonymous SNPs-lead to change in amino acid's chemical and physical nature after mutation; WT: Wild type-Phenotype occurs in population with maximum frequency; SOD1: Super oxide dismutase 1-Antioxidant enzyme protecting the cell from reactive oxygen Fig. 5 Schematic illustration of various biomolecules to reveal the interaction with SOD1 protein to perform the various cellular and molecular function to maintain oxidative homeostatis species toxicity; TARDBP: TAR DNA binding protein-It acts as neuronal activity response factor in the dendrites of hippocampal neurons; ANG: Angiogenin-Regulate the angiogenesis process; OPTN: Optineurin-Function in cellular morphogenesis and membrane trafficking, vesicle trafficking; C21orf2: Chromosome 21 open reading frame 2-Regulation of cell morphology and cytoskeletal organization; TUBA4A: Tubulin α4a -Major protein of microtubules; CHCHD10: Coiled-coil-helix-coiled-coil-helix domain-containing protein 10-Imperative role in oxidative phosphorylation; NEK1: NIMA Related Kinase 1 (NIMA:never in mitosis gene A)-Regulate neuronal morphology and axonal polarity; TBK1: TANK-binding kinase 1-Serine -threonine protein kinase and regulate cell-proliferation, autophagys and apoptosis. It also involve in antiviral innate immunity response; MATR3: Matrin-3-It interacts with TDP-43 and provides the structural support to nuclear membrane and shows binding with nucleic acid; CCNF: Cyclin F-Regulate cell cycle transitions through their ability to bind and activate cyclin-dependent protein kinases; GWAS: Genome wide association studies-To explore the genetic variant(s) across the population and their association with the trait in different individuals; ANOVA: Analysis of variance-To analyse the significant differences between the means of two or more independent groups; RI: Reliability index-Predictor of human Deleterious effect of Single Nucleotide Polymorphisms; RC: Ramachandran plot-Analysis of secondary structure of protein by understanding the distribution of phi-psi angles; SIFT: The scale-invariant feature transform-To scale the effect of amino acid substitution on protein function; PROVEAN: Protein Variation Effect Analyzer-Amino acid substitution impact on the biological function of a protein; Reg: RegulomeDB Score-Interpretation of regulatory variants in the human genome; ΔΔG: Gibb's free energy-Equilibrium constant for a reversible reaction; ΔS: Entropy-Degree of randomness; Con.: Conserved-Location of amino acid in conserved domain of SOD1 and consensus present throughout the species; Avg.: Average-Location of amino acid in Average domain of SOD1; Var.: Variable-Location of amino acid in variable domain of SOD1 and can vary throughout the species; P: Polar-Amino acid whose side chain show bonding with water (hydrophilic nature); NP: Non Polar-Side chain of amino acid buried inside the protein structure and formed hydrophobic bonding; PB: Polar Basic-Polar Amino acid with -NH2 group in the side chain; PA: Polar Acidic-Polar Amino acid with -COOH group in the side chain; Tor: Torsion angle-Dihedral angles in the back bone of protein (φ, ψ, ω); Solv: Solvation-Bonding between solute and solvent; Mol flex: Molecule flexibility-Molecular flexibility of prot signi the conforma changes to copeup with external sources to distort the protein structure; Eff.: Effect-Effect and indicating the outcome of the mutation.