Computational Analysis of Damaging Single-Nucleotide Polymorphisms and Their Structural and Functional Impact on the Insulin Receptor

Single-nucleotide polymorphisms (SNPs) associated with complex disorders can create, destroy, or modify protein coding sites. Single amino acid substitutions in the insulin receptor (INSR) are the most common form of genetic variation underlying diseases such as Donohue syndrome (Leprechaunism), Rabson-Mendenhall syndrome, and type A insulin resistance. We analyzed the deleterious nonsynonymous SNPs (nsSNPs) in the INSR gene using several computational methods. The analysis began with PROVEAN, followed by the PolyPhen and I-Mutant servers, to investigate the effects of 57 nsSNPs retrieved from the database of SNP (dbSNP). A total of 18 mutations found to exert damaging effects on INSR protein structure and function were chosen for further analysis. Among these mutations, our computational analysis suggested that 13 nsSNPs decrease protein stability and might result in loss of function, which increases the probability of their involvement in disease predisposition. Given the lack of adequate prior reports on the possible deleterious effects of these nsSNPs, we systematically analyzed and characterized the functional variants in the coding region that can alter the expression and function of the INSR gene. In silico characterization of nsSNPs affecting INSR gene function can aid in better understanding of genetic differences in disease susceptibility.


Introduction
The insulin receptor (INSR) is a transmembrane receptor tyrosine kinase that is activated by insulin, insulin-like growth factor I, and insulin-like growth factor II [1]. Metabolically, the INSR plays a crucial role in the regulation of glucose homeostasis, and its dysregulation may result in a range of clinical conditions including diabetes and cancer [2,3]. The main activity of INSR is to induce glucose uptake; a decrease in insulin receptor signaling therefore leads to type 2 diabetes mellitus. The cells' inability to take up glucose results in hyperglycemia and all the sequelae of diabetes. Insulin-resistant patients may also display acanthosis nigricans. Mutant receptors in the cell have been shown to have detrimental effects on the activity of the normal receptor. A previous study in which kinase-deficient INSRs were transfected into cultured cells showed that such receptors suppressed the function of endogenous INSRs and acted as dominant-negative mutations [4]. However, in most cases of insulin resistance, the mutation is expressed in a recessive form. Yamamoto-Honda et al. [5] studied the function and consequences of recessive mutations in the INSR. For example, Donohue syndrome, also known as Leprechaunism, is a rare and severe autosomal recessive genetic disorder caused by defects in the INSR gene.
Single-nucleotide polymorphisms (SNPs) are the most common form of human genetic variation, and nearly half a million SNPs reside in the exons of the human genome. Among these SNPs, nonsynonymous SNPs (nsSNPs) alter amino acid residues and contribute to the functional diversity of the encoded proteins in the human population. The genomic distribution of SNPs is not homogeneous: in general, SNPs occur more frequently in noncoding regions than in coding regions [6]. Genetic recombination and mutation rate are other factors that can determine SNP density [7]. SNPs are usually biallelic, and a single SNP may cause a Mendelian disease [8]. In complex diseases, SNPs do not usually function independently; rather, they act together with other SNPs to produce a disease condition, as has been seen in osteoporosis [9]. A wide range of human diseases, such as sickle-cell anemia, thalassemia, and cystic fibrosis, result from SNPs [10][11][12]. For drug discovery, diseases associated with different SNPs may become crucial pharmacogenomic targets; some SNPs are also associated with the metabolism of different drugs [13][14][15]. For genome-wide association studies, SNPs can serve as useful genetic markers [16]. The deleterious effects of SNPs are generally attributed to their impact on protein structure and function. However, very few studies have attempted to predict damaging SNPs and their impact on INSR.
In this study, we identified in silico the deleterious nsSNPs that may affect the structural integrity of the human INSR protein and that may be involved in several genetic diseases. In silico analysis of SNPs can play a major role in understanding the genetic basis of several complex human genetic diseases. Furthermore, establishing the functions of these SNPs could also shed light on the genetics of human phenotypic diversity. Identifying the functional SNPs in a disease-related gene with laboratory techniques remains a major obstacle. However, with recent advances in in silico techniques and procedures, it is now possible to carry out such investigations without extensive laboratory work. The main focus of this work is to investigate the SNP variation in the human INSR gene and its possible effects on the structure and function of INSR using bioinformatics and computational algorithms.

Materials and Methods
In PROVEAN, protein sequences of BLAST hits with more than 75% global sequence identity were clustered together, and the top clusters formed a supporting sequence set. A delta alignment scoring system was used, in which the scores for each supporting sequence were averaged within and across clusters to generate the final PROVEAN score. A protein variant is predicted to be "deleterious" if the final score is below a certain threshold (default −2.5) and "neutral" if the score is above the threshold [17].
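The PROVEAN decision rule described above (deleterious below the default threshold of −2.5, neutral above it) can be illustrated with a minimal sketch. The function name is hypothetical, and treating a score exactly equal to the threshold as deleterious is an assumption, since the text only specifies the cases below and above the cutoff. The two example scores are the ones reported for W1220L and C219R in this study.

```python
# Minimal sketch of the PROVEAN cutoff rule; not the PROVEAN implementation itself.
PROVEAN_DEFAULT_THRESHOLD = -2.5


def classify_provean(score, threshold=PROVEAN_DEFAULT_THRESHOLD):
    """Label a PROVEAN score as 'deleterious' or 'neutral'.

    Assumption: a score exactly at the threshold is counted as deleterious.
    """
    return "deleterious" if score <= threshold else "neutral"


if __name__ == "__main__":
    # Scores reported in this study for the two most deleterious variants.
    reported = {"W1220L": -11.648, "C219R": -9.831}
    for variant, score in reported.items():
        print(variant, score, classify_provean(score))
```

The threshold is kept as a parameter because the default of −2.5 is a configurable cutoff rather than a fixed property of the score.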
PolyPhen version 2 predicts the influence of an amino acid substitution on the structure and function of a protein using specific empirical rules. The input options for PolyPhen are the protein sequence, database ID/accession number, amino acid position, and amino acid variant details [18]. The tool estimates the position-specific independent count (PSIC) score for every variant and calculates the score difference between variants.
I-Mutant 2.0 and I-Mutant 3.0 use a Support Vector Machine (SVM) algorithm to predict changes in protein stability caused by single amino acid variations. They can predict protein stability changes from either the protein sequence or the structure, with an overall accuracy of 77% when the prediction is based on protein sequence. Both versions predict the DDG value as a regression estimator together with the sign of the stability change. I-Mutant 3.0 additionally classifies mutations into three categories: neutral mutation (−0.5 ≤ DDG ≤ 0.5), large decrease (DDG < −0.5), and large increase (DDG > 0.5) [19,20].
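The three I-Mutant 3.0 categories listed above amount to a simple banding of the DDG value. The following is a minimal sketch of that banding; the function name is hypothetical, and the unit (kcal/mol) is an assumption not stated in the text.

```python
# Sketch of the three-class DDG banding described for I-Mutant 3.0.
# Assumption: DDG values are in kcal/mol.

def classify_ddg_ternary(ddg):
    """Return the stability class for a predicted DDG value."""
    if ddg < -0.5:
        return "large decrease"
    if ddg > 0.5:
        return "large increase"
    # Everything in the closed interval [-0.5, 0.5] is treated as neutral.
    return "neutral"


if __name__ == "__main__":
    for value in (-1.8, -0.5, 0.0, 0.5, 0.9):
        print(value, classify_ddg_ternary(value))
```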

3D Modeling and Analysis of Protein Structure.
The EMBL-EBI web-based tool PDBsum (http://www.ebi.ac.uk/pdbsum/) was used to find the proteins related to the INSR. PDBsum provides an at-a-glance overview of every macromolecular structure deposited in the Protein Data Bank (PDB). It performs a FASTA search against all sequences in the PDB to obtain a list of the closest matches [21]. LS-SNP/PDB [22] annotates all human SNPs that produce an amino acid change in a protein structure in the PDB [23], using features of their local structural environment, putative binding interactions, and evolutionary conservation. The presence of an nsSNP in a highly conserved surface patch or a charged surface patch suggests possible biological importance. These annotations allow users to quickly scan a large number of nsSNPs of interest and prioritize those with a higher likelihood of affecting normal protein activities. The LS-SNP server is also useful for mapping human nsSNPs onto protein homology models [24].
PYMOL was used to generate the mutant models of each of the selected PDB entries for the corresponding amino acid substitutions. PYMOL allows browsing through a rotamer library to change amino acids. Its "Mutagenesis Wizard" was used to replace the native amino acid with the new one. The mutation tool facilitates the replacement of the native amino acid by the "best" rotamer of the new amino acid. The ".pdb" files were saved for all the models.

Structure Validation and Energy Minimization.
The Structural Analysis and Verification Server (SAVES) was used to evaluate and validate the quality of the refined 3D structural models. SAVES integrates the PROCHECK, PROVE, and ERRAT programs to check the overall quality of the 3D models obtained from the PYMOL mutagenesis tool. Structure refinement was carried out using KoBaMIN, which is based on a knowledge-based potential refinement protocol for proteins [25].

Protein Stability Validation for Mutant Structure.
The approach called Mutation Cutoff Scanning Matrix (mCSM) uses the concept of graph-based structural signatures to study and predict the impact of single-point mutations on protein stability and protein-protein and protein-nucleic acid affinity. The mCSM encodes distance patterns between atoms to represent protein residue environments [26].

Structural Analysis.
The predicted structures were viewed in University of California San Francisco (UCSF) Chimera, a computationally intensive program for visualizing molecular models that provides an interactive interface for analyzing the models and model-related data. It provides a platform for analyzing sequence alignments, generating homology models, performing molecular docking, viewing various density models, and comparing different models by superimposition [27]. The mutant and wild-type structures were superimposed, and the effect of each nonsynonymous variation was assessed in terms of steric hindrance due to changes in the side chains and charge of the amino acid. Then, the degree of change in the hydrophobicity or hydrophilicity of the substituted amino acid and its effect on the interacting intrachain and interchain molecules were analyzed. A summary of the in silico approaches used in this study is shown in Figure 1.

SNP Dataset from dbSNP.
The dbSNP contains both validated and nonvalidated polymorphisms. In spite of this drawback, we opted to use dbSNP because it records the allelic frequency of most nsSNPs of INSR and is the most extensive SNP database. In our data search, some SNPs previously reported in dbSNP were identified as invalid because of sequencing and alignment errors. These erroneous SNPs have expired or have been merged with other SNPs. Some INSR genes have been renamed. We carefully cross-examined the databases and removed those old and invalid SNPs. In dbSNP, the INSR gene contains data for 4967 SNPs, of which only 57 were nsSNPs in the coding region (Table 1). Our investigation considered the nsSNPs in the coding region only.

Effects of nsSNPs on INSR Predicted by Different Tools.
The PROVEAN algorithm works mainly with the primary sequence for prediction, while other tools perform a similar task with the structure. Because PROVEAN can handle a large number of substitutions and does not require structures, it has an advantage over other tools. PROVEAN predicts the effect of a variant on the biological function of the protein based on sequence homology. PROVEAN scores are classified as "deleterious" below a certain threshold (here −2.5) and "neutral" above it. A .txt file containing the dbSNP rsIDs of all 57 nsSNPs was submitted to the "dbSNP rsIDs" page to calculate the PROVEAN scores. Out of 57 nsSNPs, PROVEAN predicted 24 as deleterious and 33 as neutral (Table 1). Among the 24 deleterious nsSNPs, W1220L and C219R were predicted as highly deleterious, with PROVEAN scores of −11.648 and −9.831, respectively. PolyPhen identifies homologues of the input sequence via BLAST, calculates PSIC scores for every variant, and estimates the difference between the variant scores; the difference of 0.339 is detrimental. Certain empirical rules are applied to the sequences, and the accuracy is approximately 82%, with an 8% chance of false-positive prediction. The protein accession number of INSR (P06213) and the amino acid substitutions corresponding to each of the 57 nsSNPs were submitted separately. Table 2 summarizes the results obtained from the PolyPhen server. A PSIC score difference was used to categorize SNPs as benign or damaging. PolyPhen-2 scores range from 0.000 (most probably benign) to 0.999 (most probably damaging). Twenty-one of the 57 nsSNPs were predicted as "damaging," with PSIC scores in the range of 1.51 to 3.41. Eighteen nsSNPs predicted to be deleterious by the SIFT (Sorting Intolerant from Tolerant) program were also predicted to be damaging by the PolyPhen server.
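The substitutions submitted alongside the accession P06213 use the compact notation seen above (for example, W1220L: wild-type residue, position, variant residue). A minimal sketch of how such codes might be split into their parts before submission is shown below. The function names are hypothetical, and the whitespace-separated query line is an illustrative layout chosen for this sketch, not a documented server format.

```python
import re

# One-letter codes of the 20 standard amino acids.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_SUBSTITUTION = re.compile(rf"([{_AA}])(\d+)([{_AA}])")


def parse_substitution(code):
    """Split a code such as 'W1220L' into (wild_type, position, variant)."""
    match = _SUBSTITUTION.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {code!r}")
    wild_type, position, variant = match.groups()
    return wild_type, int(position), variant


def query_line(accession, code):
    """Build an illustrative 'accession position wild_type variant' line."""
    wild_type, position, variant = parse_substitution(code)
    return f"{accession} {position} {wild_type} {variant}"


if __name__ == "__main__":
    for code in ("W1220L", "C219R"):
        print(query_line("P06213", code))
```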
I-Mutant is an SVM-based tool routinely used to analyze changes in protein stability caused by single-site mutations. I-Mutant also provides scores for free energy changes, calculated with the FOLD-X energy-based web server. By assimilating the FOLD-X estimations   (Table 3). Finally, we selected 18 significant nsSNPs because they were predicted to be deleterious by the PROVEAN, PolyPhen, and SIFT programs and showed decreased structural stability in the I-Mutant analysis (Table 4).
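The selection step described above keeps only the variants flagged by every tool. A minimal sketch of that consensus filter as a set intersection follows; the function name and the variant labels V1 to V5 are hypothetical placeholders and do not correspond to entries in Tables 1 to 4.

```python
# Sketch of a consensus filter: keep variants flagged by every predictor.
# All variant labels below are hypothetical placeholders, not study results.

def consensus_variants(*flagged_sets):
    """Return the variants present in every one of the given collections."""
    if not flagged_sets:
        return set()
    result = set(flagged_sets[0])
    for flagged in flagged_sets[1:]:
        result &= set(flagged)
    return result


if __name__ == "__main__":
    sequence_based = {"V1", "V2", "V3", "V4"}   # e.g. deleterious by a sequence tool
    rule_based = {"V2", "V3", "V5"}             # e.g. damaging by a second tool
    stability_based = {"V2", "V3", "V4", "V5"}  # e.g. decreased stability
    print(sorted(consensus_variants(sequence_based, rule_based, stability_based)))
```

An intersection is the strictest way to combine predictors; a majority vote would retain more candidates at the cost of more false positives.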

Effects of nsSNPs on Protein Structure.
The INSR protein structures were searched using the EMBL-EBI web-based tool PDBsum. Two related protein structures, namely, 2HR7 and 4IBM, were found to share 100% amino acid sequence similarity. The single amino acid polymorphism (SAAP) database server (http://www.bioinf.org.uk/saap/db/) was offline due to essential maintenance, so we were unable to map the deleterious nsSNPs onto the protein structure through SAAP. Instead, the deleterious nsSNPs were mapped onto protein structure information through the LS-SNP/PDB server. According to this resource, 2HR7 accounted for 9 nsSNPs and 4IBM for 4 nsSNPs. Apart from SNP scanning, the LS-SNP/PDB server also predicts the solvent accessibility and conservation ratio of given protein structures. An overview of the mapping of mutant structures and their solvent accessibility and conservation ratios is given in Table 5.
Out of 18 nsSNPs predicted to be deleterious by PROVEAN or PolyPhen, a total of 13 were mapped to the native structures with PDB IDs 2HR7 and 4IBM. All the functional nsSNPs predicted using the PROVEAN and PolyPhen tools were subjected to the PYMOL mutation tool. A model for each functional nsSNP was built with the PYMOL mutagenesis tool and visualized using UCSF Chimera for comparison with the native structures (Figure 2; only mutants rs1051691 (I421T) and rs121913156 (R1174Q) are shown).
Energy minimization was performed for the native structures (2HR7 and 4IBM) and the modeled mutant structures. The KoBaMIN web server uses a force field for energy minimization. The total energy of all the mutant and native models after minimization is listed in Table 6. The total energies of the native structures of 2HR7 and 4IBM are −22087.6969 kJ/mol and −13041.4646 kJ/mol, respectively. The change in total energy due to mutation is noticeable in both the 2HR7 and 4IBM mutant models. RMSD is a measure of the deviation of the mutant structures from their native configurations: the higher the RMSD value, the greater the deviation between the two structures. Structural changes, in turn, affect functional activity. RMSDs for all the mutant structures are listed in Table 6. The mutants rs79312957 and rs121913156 have higher RMSD values, 6.025 and 0.436, than the native structures, whose RMSD values are 6.019 and 0.404, respectively. These two nsSNPs may therefore affect the structure of the proteins. They were also predicted to be deleterious by the PROVEAN and PolyPhen servers. The 3D crystal structures of the native INSR protein (2HR7 and 4IBM) and the predicted mutant structures were superimposed over chain A. The superimposed structures revealed that the mutations might have considerably affected the protein structure and thus its function (Figure 3; only rs79312957 is shown). Substituted amino acid residues in the mutants might have altered the conformation of the INSR, the networking among neighboring amino acids, or the interaction between the substrate and receptor [28,29].
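For reference, the RMSD between two structures with N paired atoms is the square root of the mean squared distance between corresponding atoms. The sketch below computes it for two coordinate lists; it assumes the structures have already been superimposed and that atoms are listed in matching order, and it is not the procedure used by KoBaMIN or UCSF Chimera. The function name and the toy coordinates are hypothetical.

```python
import math

# Sketch of an RMSD calculation over paired atoms.
# Assumptions: both structures are already superimposed, and atom i in one
# list corresponds to atom i in the other. Coordinates are in angstroms.

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    mutant = [(0.0, 0.0, 0.1), (1.5, 0.2, 0.0), (1.4, 1.5, 0.0)]
    print(round(rmsd(native, mutant), 3))
```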

Effects of nsSNP on Protein Stability.
The effects of the nsSNPs on protein stability were computed with FOLD-X and the mCSM server, using an empirical energy equation to calculate the change in Gibbs free energy (DDG). The empirical energy terms consider the location and type of the substituted residue. The mCSM is a structure-based prediction tool. Two analysis protocols were used to obtain maximum information on the effect of the single amino acid substitutions: (1) all the nsSNPs were considered individually and their effect on protein stability and interaction potential was determined; (2) the nsSNPs were considered according to the allelic sequences. Initially, all the structures were minimized to obtain a stable protein stability value. Then the structures for each single amino acid variation were generated using the Build Model feature of FOLD-X 3.0. Finally, the effect of each single amino acid variation on the protein stability of INSR was determined using the analyzed complex features. A mutation was considered destabilizing when the DDG was >0 and stabilizing when it was <0. All the mutant structures ultimately derived from the PROVEAN, PolyPhen, and I-Mutant analyses were then submitted to the mCSM server to predict the change in protein stability upon mutation. The mCSM predicted all structures as "Destabilizing," including two as "Highly Destabilizing" (Table 7).
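The sign convention stated above (DDG > 0 destabilizing, DDG < 0 stabilizing) is consistent with the following definition, given here as a sketch under the assumption that each ΔG is a folding free energy, so that a more negative value means a more stable fold:

```latex
\Delta\Delta G \;=\; \Delta G_{\text{mutant}} - \Delta G_{\text{wild type}},
\qquad
\begin{cases}
\Delta\Delta G > 0 & \text{destabilizing} \\
\Delta\Delta G < 0 & \text{stabilizing}
\end{cases}
```

Sign conventions differ between tools: the I-Mutant 3.0 categories quoted in Materials and Methods assign a large decrease in stability to negative DDG values, so signs should be checked before values from different servers are compared.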

Discussion
SNPs in INSR can manifest as several insulin-resistance syndromes such as Leprechaunism, Rabson-Mendenhall syndrome, and type A insulin resistance [30,31]. Diagnostic measures have already been established based on clinical examination and laboratory tests, with elevated insulin levels as a constant feature. Functional and DNA analysis can be used for definitive confirmation, but certain mutations do not contribute to insulin binding, and DNA analysis is still unable to identify all the putative mutations. Although there is no direct genotype-phenotype correlation, mutations in the alpha subunit of the insulin receptor are associated with a more severe phenotype than mutations affecting the beta subunit [32]. Numerous studies have used in silico approaches to predict the functional effects of nsSNPs on genes such as G6PD, BARF, and PTEN [33,34]. To address this issue, we therefore selected an in silico strategy to analyze and predict the functional effects of SNPs on INSR. We used different in silico methods that combine two distinct approaches, sequence-based and structure-based. Sequence-based prediction methods have an advantage over structure-based methods because they can be applied to any protein with known relatives, whereas structure-based approaches cannot be applied to proteins with unknown 3D structures. Software programs and servers that integrate both sequence and structure resources have the advantage of being able to assess the reliability of the predicted results by cross-referencing the results from both methods. Most computational methods utilize this information for the prediction and analysis of deleterious nsSNPs, among which the PROVEAN and PolyPhen algorithms are the main representatives.
Taking a normalized probability score below −2.5 in PROVEAN and a PSIC score of at least 1.5 in PolyPhen as deleterious, 24 and 21 amino acid substitutions, respectively, were predicted to have a functional impact on the INSR gene. The variation between the PROVEAN and PolyPhen prediction scores arises mainly from differences in sequence alignment and in the values used to classify the variants. Significant similarity was observed between the results obtained by PROVEAN and PolyPhen. Using PROVEAN and PolyPhen together to predict the effect of nsSNPs on protein function might therefore be a suitable in silico approach [35].
To predict the impact of nsSNPs on protein structure, I-Mutant 3.0 was used to evaluate the stability change upon single-site mutation. I-Mutant 3.0 was ranked as one of the most reliable predictors in the work of Khan and Vihinen [36]. Based on the difference  Each mutation was considered individually to study the inherent effect of the SNP. In addition, the allelic sequences were analyzed to investigate whether polymorphisms occurring simultaneously neutralized each other, as a natural means of preserving function. The mCSM was used to analyze the effects of single amino acid variations on the structure and stability of the protein. Our results indicated that all 13 mutant structures of 2HR7 and 4IBM were predicted as "Destabilizing," which supports the results obtained with PROVEAN and PolyPhen. Among the destabilized mutant structures, two mutants, rs1051691 (I448T) and rs52800171 (W1220L), were labelled "Highly Destabilizing," which suggests that these polymorphisms should be considered potential targets for future experiments. If a single amino acid variation changes protein stability or protein-protein interaction, the reverse mutation should give a comparable value with the opposite sign. This would indicate that the predicted effect of the single amino acid variation on the protein structure or protein-protein interaction might be substantial.
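The reverse-mutation argument above can be expressed as a simple consistency check: the predicted DDG for a substitution and for its reverse should roughly cancel. A minimal sketch follows; the function name, the example values, and the tolerance of 0.5 are hypothetical choices for illustration and are not taken from this study.

```python
# Sketch of a forward/reverse consistency check on predicted DDG values.
# The tolerance is a hypothetical choice, not a value used in this study.

def is_antisymmetric(ddg_forward, ddg_reverse, tolerance=0.5):
    """True if the forward and reverse DDG values cancel within the tolerance."""
    return abs(ddg_forward + ddg_reverse) <= tolerance


if __name__ == "__main__":
    # Hypothetical predictions for a substitution and its reverse.
    print(is_antisymmetric(1.2, -1.0))  # comparable magnitude, opposite sign
    print(is_antisymmetric(1.2, 1.0))   # same sign, so not consistent
```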

Conclusions
This study shows a correlation between SNPs in the INSR gene and several insulin-resistance syndromes, such as Leprechaunism, Rabson-Mendenhall syndrome, and type A insulin resistance. The present study concludes that 13 nsSNPs, especially rs1051691 and rs52800171, decrease protein stability and are not tolerated or may result in loss of function. Their presence in the INSR increases the possibility of altered transcriptional and cell cycle regulation and of INSR-mediated diseases. Therefore, the probability of their involvement in disease predisposition increases. These mutations should thus be given priority in further analysis to obtain detailed information on their effects. To confirm the structures modeled in this study, the actual structures should be determined by X-ray crystallography or nuclear magnetic resonance spectroscopy. We anticipate that the results of our analysis will provide useful information to researchers and can play an important role in bridging the gap between biologists and bioinformaticians.