Forces and Disease: Electrostatic force differences caused by mutations in kinesin motor domains can distinguish between disease-causing and non-disease-causing mutations

The ability to predict if a given mutation is disease-causing or not has enormous potential to impact human health. Typically, these predictions are made by assessing the effects of mutation on macromolecular stability and amino acid conservation. Here we report a novel feature: the electrostatic component of the force acting between a kinesin motor domain and tubulin. We demonstrate that changes in the electrostatic component of the binding force are able to discriminate between disease-causing and non-disease-causing mutations found in human kinesin motor domains using the receiver operating characteristic (ROC). Because diseases may originate from multiple effects not related to kinesin-microtubule binding, the prediction rate of 0.843 area under the ROC plot due to the change in magnitude of the electrostatic force alone is remarkable. These results reflect the dependence of kinesin’s function on motility along the microtubule, which suggests a precise balance of microtubule binding forces is required.


Materials and Methods
Selection of kinesin nsSNPs. The kinesin nsSNPs were downloaded from dbNSFP 40 and missense mutations located in the coding regions of the kinesin motors domains with structures available in the PDB 44 were selected. The Human Gene Mutation Database (HGMD) 41 and ClinVar 42 were used to identify disease-causing mutations. This resulted in the selection of 50 mutations in various kinesins.
A total of 11 nsSNPs with the allele frequency greater than 1% in the 1000 Genomes Project 45 were identified and used as common, non-disease causing polymorphisms in the healthy individuals.
Note that significantly more disease-causing mutations were identified than non-disease-causing mutations, but the inclusion of mutations with allele frequency smaller than 1% may result in mutations with unknown physiological importance. The list of all the mutations for this study is provided in Supplementary Materials Table S1.
Preparation of kinesin-tubulin structures. The 61 selected mutants come from 10 kinesin proteins representing 8 kinesin families (Table 1). High resolution structures, those with better than 5 Å resolution and having no mutation, were downloaded from the Protein Data Bank (PDB) 46 for 5 of the 10 kinesin proteins. If multiple structures were available for the same kinesin in PDB, the structure with the highest resolution was selected for this work.
High resolution structures were not available in the PDB for the other 5 kinesin proteins. Note that for some proteins, including KIF1A and KIF5A, structures were available, however, either the resolution was too low or the structure had mutations introduced into it. In the cases without structure, SWISS-MODEL 47 was used to build protein homology models from templates with high sequence similarities. The top model from SWISS-MODEL was selected to model the corresponding kinesin motor structures. Some of the structures had missing heavy atoms. Profix 48 was used to fix these structures. NAMD 49 was used to perform a 10,000-step energy minimization for each structure. In NAMD minimizations, the CHARMM 50 force field and the Generalized Born (GB) implicit solvent model were used.
There were no structures of the human kinesin-human tubulin complex available in PDB. However, there were many other kinesin-tubulin complex structures available, and kinesins share the same microtubule binding site 22,51 . Therefore, kinesin-tubulin complex structures were made using Chimera 52 to align each kinesin (Table 1) to the human α1A/β3 tubulin dimer structure (PDB ID 5JCO) 53 , using a model of human kinesin-5 and a mammalian tubulin dimer docked into a 9.5-Å cryo-EM map (PDB ID 4AQW) 54 as a template. The C-termini (E-hooks) were not modeled since their structures are not available in the corresponding PDB files.
Building complex structure via structural alignment of the backbone atoms resulted in atomic clashes at the binding interface. To remove these structural clashes introduced during the modeling process, the kinesin-tubulin complex structures underwent 2000 steps of energy minimization using the CHARMM36 55 force field in CHARMM 56 software in which only amino acid side chains were free to move because a 10 kcal·mol −1 ·Å −1 harmonic constraint was placed on all backbone atoms.
The nsSNP structures were generated based on the wild type structure for each kinesin using PDB2PQR 57 . The protonation states of titratable group were assumed to be standard, roughly corresponding to pH = 7.0. Since the kinesins considered in this work are cytoplasmic kinesins, the physiological pH is 7.0. Only the mutated residue was energy optimized; all other atoms were kept in the same position as in the wild type structure to isolate the direct effects of electrostatic forces.
Force calculations. Electrostatic forces were calculated for each kinesin-tubulin complex using DelPhiForce 58 . The force reported is the net electrostatic force exerted on a kinesin by its tubulin dimer binding partner. The electrostatic force on each individual atom and residue, which is used to analyze the detailed force distribution on each kinesin, was also calculated with DelPhiForce.
The forces on each kinesin were calculated in two states: the bound state and the unbound state. The bound state was considered to be the equilibrium complex position, which was determined as described in "Preparation of kinesin-tubulin structures". The unbound state was obtained by displacing the kinesin 5 Å from the tubulin in the direction along the line between the mass center of the kinesin and tubulin dimer. The dielectric constant for water and protein were set as 80 and 2, respectively; the resolution of the grid was set at 2 grids/Å; the perfil was set at 70; the ionic strength of the solvent was set at 0 (zero salt concentration was used to be consistent with our previous studies and to avoid the ambiguity associated with explicit ion binding). However, to check the sensitivity of results, parallel calculations were done at physiological salt concentration corresponding to ionic strength I = 0.15 M. The dipolar boundary condition was used in all cases. Information on these parameters is available in the DelPhi 59, 60 manual (http://compbio.clemson.edu/downloadDir/delphi/delphi_manual.pdf).
The electrostatic force difference, ∆F , was defined as the difference between the electrostatic forces exerted on wild type and the corresponding mutant kinesins.
where ∆F and are vector quantities with components ΔF lat (lateral direction), ΔF long (longitude direction), and ΔF bind (binding direction). The relative force difference ΔF rel is defined as: The relative force difference in the binding direction ΔF bind,rel is defined as bind,rel bind,mut bind,wt b ind,wt where F bind,mut and F bind,wt are the components of the electrostatic force between the microtubule and the mutant and wild type kinesin in the binding direction, respectively.

Results
Electrostatic forces act between kinesin and tubulin. In our previous work, we demonstrated that the electrostatic forces on kinesin-5 form a binding funnel around the tubulin dimer 58 . A similar binding funnel was also found for dynein around the tubulin binding pocket 61 . In this work, we found that the binding funnel is common to kinesins, as shown for kinesin-13 as an example (Fig. 1), and that the electrostatic force guides the kinesin to the binding pocket of the tubulin. We obtained similar results for the other kinesins (Supplementary  Material Table S1). Because the electrostatic force is a vector, F , we examined its components in the longitudinal (F long ), lateral (F lat ), and binding (F bind ) directions ( Fig. 2) separately to further assess the role of electrostatic forces. We found that the magnitude of the mean electrostatic force, | | F avg , for the 10 wild type kinesins used in this study in the bound state was 1,450 ± 170 pN (results for each kinesin are shown in Supplemental Material Table S1). Preforming the same calculations for unbound kinesins (at a displacement of 5 Å from the tubulin) resulted in an 87% decrease in | | F avg to 192 ± 56 pN. However, despite the large drop in magnitude, the direction of that mean force, and therefore the contribution of individual components, in the bound state was statistically indistinguishable from the unbound state ( Table 2). The component of the mean electrostatic force in the binding direction, F bind,avg , contributed the most to the force magnitude (Table 2), and the components in the lateral, F lat,avg , and longitudinal, F long,avg , directions were not statistically different from zero ( Table 2).
Electrostatic forces and diseases. We calculated the electrostatic force differences, ∆F (Equation 1), of bound and unbound structures (Supplementary Material Table S1), where the force difference quantifies the difference in electrostatic force between the mutant and the corresponding wild type structure. Like the electrostatic force, force differences have three components in the longitudinal (ΔF long ), lateral (ΔF lat ), and binding (ΔF bind ) directions (Fig. 2). Besides the force differences, we also calculated the relative force difference ΔF rel (Equation 2). We found that mutations with larger values of relative force differences, ΔF rel , are more likely to cause disease (Supplementary Material Table S1).
We quantified the result that large ΔF rel tends to cause disease using Receiver Operating Characteristic (ROC) plots (Fig. 3). The area under an ROC plot indicates how well a descriptor, in this case ΔF rel , discriminates between two states, in this case whether a mutation is disease-causing or non-disease-causing. The area under an ROC plot of 1 indicates the descriptor can always discriminate between the states, and the area under a ROC plot of 0.5 (corresponding to the red dotted line in Fig. 3) indicates the descriptor is no better than random chance. We found that ΔF rel of unbound structures provided a better prediction of disease than ΔF rel of bound structures because the areas under the unbound state ROC plots were 0.84 and 0.84 for ΔF rel and ΔF bind,rel , respectively (Fig. 3) and the area under the bound state ROC plots were 0.79 and 0.77 for ΔF rel and ΔF bind,rel , respectively (Fig. 3). We also noted that ΔF rel performed slightly better than ΔF bind,rel for structures in bound states (Fig. 3). We obtained similar results from ROC plots (Supplementary Material Figure S1) of electrostatic force calculations at an ionic strength of 0.15 M, indicating that ionic strength does not play a role in discriminating disease-causing from non-disease-causing mutations. Thus, in the rest of the manuscript, we focus on results obtained with I = 0 M. Since disease can be caused by either decreasing or increasing the wild type force, we did ROC using the absolute values of ∆F and |ΔF bind |, for both unbound and bound states, which resulted in similar as above performance; areas under ROC curve ranged from 0.72 to 0.75 (Supplementary Material Figure S2).

Statistical analysis of electrostatic force components and disease-causing mutations.
We further investigated the unbound state's ∆F and its ∆F 's components as predictors of whether a mutation is disease-causing or non-disease-causing using histograms (Fig. 4). We found that all mutations in our study with ∆ > F 16 pN led to disease and that only 9% of the non-disease causing mutations had ∆ > F 4 pN (Fig. 4A). We also found that kinesins had a higher tolerance to ΔF bind and ΔF long than to ΔF lat . We found that only 9% of the non-disease-causing mutants had ΔF lat > 1 pN while 36% had ΔF bind > 1 pN and 27% had ΔF long > 1 pN (Fig. 4B,C,D). We also noted that mutations causing ΔF lat between 1 pN and 4 pN did a much better job distinguishing disease state because this range in ΔF lat contains 35% of all disease-causing but only 9% of non-disease-causing mutants, which is statistically significantly different (p-value = 0.02), but this same range in ΔF bind and ΔF long had percentages of disease-causing and non-disease-causing that were statistically indistinguishable (Fig. 4B,C,D).
Analysis of additional features that may be used to discriminate disease-causing and non-disease-causing mutations. We performed a statistical analysis of 23 features potentially affecting the pathogenicity of kinesin mutations using standard techniques (see Supplementary Material). By comparing the p-values of an F-regression analysis, we found that electrostatic force was the best predictor ( Table 3). The other good predictors were the secondary structure of mutation position, change in binding free energy, and the location of mutation site ( Table 3). The buried surface area, residue polarity, residue charge, etc., were not identified as significant features in predicting pathogenicity (Table 3).   Table 2. Mean electrostatic force magnitude and direction. Note: Values are reported as mean ± standard error of the mean; n = 10 wild type kinesin proteins.
We found that 88% of the disease-causing mutations occur in α-helices, coils, and turns (Fig. 5A). Only 31% of mutations located on strands caused disease, which is significantly fewer than the 61% and 53% disease-causing rates for mutations on coils and turns, respectively (Fig. 5A). Our data had few instances of mutations on 3-10 helices or salt bridges, therefore these mutations are not taken into further analysis.
Additionally, we noted that the disease-causing nature of a mutation was correlated to the function of the structure domain upon which it resides. 76% of mutations at the tubulin binding site were disease-causing (Fig. 5B), while mutations at other locations were disease-causing in only 46% of instances (Fig. 5B). Note that  since this study focused on the kinesin-tubulin interaction, mutations on ATP binding site were not taken into further discussion.

Discussion
We demonstrated that the changes of the electrostatic component of the force between kinesin and microtubule caused by amino acid mutations in the kinesin motor domain serve as a good discriminator between disease-causing and non-disease-causing mutations. 23 other features typically used by the computational community were also investigated, but we found them to be not as good predictors of disease state as the change of the electrostatic force. These results are remarkable because kinesin-related diseases may originate from nsSNPs causing effects within the motor domain not related to kinesin-microtubule binding. These effects may include disruption of nucleotide hydrolysis site because motility requires ATP hydrolysis 62 , proximity of the mutation to the location neck-linker-motor domain interaction site because motility requires neck linker docking 63,64 , and motor domain structural stability because structure and function are closely correlated in structured proteins. Additionally, the kinesin family to which the mutated protein belongs could also be an important factor because certain families may have more critical cell or developmental biological function than others, and certain families may have fewer functional redundancies with other motors within the family than others 65 . Moreover, that our results show electrostatic force is a good discriminator between disease-causing and non-disease causing mutations suggests that there is steep electrostatic potential energy well about the kinesin docking location on the microtubule. Because the force is proportional to the spatial gradient of potential energy, small changes in the electrostatic energy potential result in large change in the force. Therefore, it is likely that electrostatic force is an even better discriminator than electrostatic energy potential.
We checked if our results were biased by the location of the mutations sites relative to the kinesin-microtubule binding interface by generating a representative kinesin motor domain-tubulin structure (Fig. 6). We note that the kinesin motor domains studied in this work have similar structures (Fig. 6A), and thus we use one (kinesin-3) to visualize the location of mutations sites (Fig. 6B). We found that there is no preference for disease-causing mutations to be at the binding interface while non-disease-causing mutations are away.
Recent studies indicate that positively charged residues on the kinesin motor domain strengthens its interaction with the microtubule, while negatively charged residues have an opposite effect 22,23,25,66 . Consistent with these previous studies, we have found that most of the disease-causing mutations we studied involve charged residues. Our findings provide additional evidence for the importance of charged residues and electrostatics to kinesin motor domain microtubule binding. Furthermore, we found that Y274 and L248, which were previously identified as the top two most important uncharged residues for kinesin-microtubule binding 22,66 , were also associated with disease-causing mutations in kinesin-3 family member KIF1A (L249Q) and in kinesin-1 family  Our key result is that if a mutation causes a ∆ > F 4 pN in the unbound state, then it is very likely to cause disease. Such a threshold roughly corresponds to 1 kcal/mol binding energy, an energy threshold that is widely used to discriminate disease-causing from non-disease-causing mutations 67 . Below we investigate a few particular mutants more closely, as illustrative examples, to understand our result a bit better.
First, we noted that kinesin-3 family member KIF1A E253K is charge reversal, from a negatively charged glutamic acid residue to a positively charged lysine residue, and it resulted in the largest ∆F (Supplementary Material Table S1) in both the bound and unbound states. We looked carefully at the magnitude and direction of the force on each amino acid in this kinesin-3 (Fig. 7). We found that the mutated amino acid lies close to the tubulin interface: the distance between the CA atom of E253 and the closest CA atom on tubulin is 9.6 Å. Because the mutation flips the charge of the residue and it is so close to the highly-charged tubulin interface, the large change in force we calculated was likely do the negative-to-positive charge reversal. The negatively charged E253 in wild type kinesin-3 opposes binding (Fig. 7 red arrow), and the positively charged K253 in the mutant kinesin-3 favors binding (Fig. 7 blue arrow), to the net negatively charged tubulin dimer. It is therefore not surprising that the enhanced binding due to this mutation causes spastic paraparesis and sensory and autonomic neuropathy type-2 39 given that kinesin-3 drives long-distance transport in neuronal cells 9 .
Second, we noted that kinesin-8 family member KIF18A T273A mutant is the only non-disease causing mutant with a ∆ > F 4 pN; it had ∆ = . F 4 92 pN. We looked carefully at the location of this residue in the structure and found it to reside on an unstructured region (or at least one that is not in the PDB ID 3LRE structure) 68 on the microtubule binding surface 22 , thus leading to a relatively large calculated ∆F . However, the T273 is not highly conserved and the T273A does not change the motility of the kinesin in in vitro motility assays 22 . This could explain how this mutation is non-disease-causing despite relatively large ∆F .
Third, we noted that the kinesin-1 family member KIF5A S203C mutant has a low ∆ = . F 1 27 pN in the unbound state (Supplementary Material Table S1), well below the discrimination threshold of 4 pN, but is disease-causing. We looked carefully at the location of this mutation, and found it is located in close proximity (5.5 Å) to the Mg 2+ ion in the nucleotide binding site 69 . Specifically, S203 resides within a highly conserved sequence (NXXSSR, residues 199-204 of KIF5A) in switch I 30 , and it is thought to be important in recognizing the hydrolysis state of bound the nucleotide 70 . This could explain how this mutation causes a disease despite low ∆F , highlighting our discrimination method's limitation in finding all the true positive cases, particularly when mutations are unrelated to the kinesin-tubulin interaction.
While we did find that ∆ > F 4 pN in the unbound state is an excellent discriminator, we also found that the three components of the relative force difference, ΔF long,rel , ΔF lat,rel , and ΔF bind,rel , are also successful predictors, in their own right. These results suggest that the individual components of the binding force, particularly the lateral and longitudinal components, may be of critical importance for kinesin motility. It should be additionally noted that the magnitude of the electrostatic force is significantly (least 5-fold) larger in the binding direction  Kinesin-3 family member KIF1A (light blue) with the E253K mutation is shown bound to a tubulin dimer with α-tubulin (red) on the left side and β-tubulin (orange) on the right side. Most electrostatic forces (yellow arrows) on each residue of the kinesin-3 remain unchanged, but the force on residue 253 changes with the mutation, with both the force on wild type (red arrow) and the force on the mutant (blue arrow) shown. than the other two directions (Table 1). Thus, it is likely to be less sensitive to the changes in magnitude than the other directions. If a mutation changes the force in the binding direction a given amount, kinesin may still bind to tubulin properly, however if the force in the lateral or longitudinal direction were changed by that same amount, it may be significantly more sensitive to the difference. It should be noted that the absolute value of the electrostatic force change was found to be the best discriminator. Thus, mutations strengthening the binding are equally likely to be disease-causing as mutations weakening it. This is consistent with previous studies on other systems, indicating that these systems are optimized and any deviation away from the wild type properties could be disease-causing 4,67 .
Finally, it should be noted that this study considers the electrostatic component of the force acting between the kinesin and tubulin, not the total force. A kinesin motor domain that is not subjected to other external force, e.g. a cargo load, is at equilibrium on the microtubule. Therefore, at equilibrium, non-electrostatic forces must be acting at the tubulin-kinesin interface to balance out the large magnitude electrostatic forces we have calculated.