Structural and calorimetric studies reveal specific determinants for the binding of a high-affinity NLS to mammalian importin-alpha

The classical nuclear import pathway is mediated by importins (Impα and Impβ): Impα recognizes the cargo protein through its nuclear localization sequence (NLS). NLSs have been studied extensively, resulting in several proposed consensus sequences; however, recent studies have shown that exceptions may occur. This mechanism may also depend on specific characteristics of different Impα variants. To better understand the importance of specific residues within the consensus and adjacent regions of NLSs, we studied different mutations of a high-affinity NLS in complex with Impα by crystallography and calorimetry. We showed that, although the consensus sequence allows Lys or Arg at the second residue of a monopartite sequence, the presence of Arg is very important for binding at both the major and minor sites of Impα. Mutations at the N- or C-terminus of the NLS (position P1 or P6) drastically reduce its affinity for the receptor, which is corroborated by the loss of hydrogen bonds and hydrophobic interactions. Surprisingly, a mutation at the far N-terminus of the NLS increased the affinity for both binding sites, consistent with an additional hydrogen bond observed in the structure. Docking of the NLSs to the human variant Impα1 revealed binding modes similar to those found in the structures presented here, whereas for the human variant Impα3 binding appears relevant only at the major site. This study improves the understanding of specific issues sparsely addressed in previous studies that are important for predicting NLSs, which will be relevant for the eventual design of synthetic NLSs.


INTRODUCTION
The nuclear envelope of eukaryotic cells is an essential barrier for controlling and regulating cellular processes such as gene expression and cell cycle progression. Transport of the best-characterized cargoes across the nuclear envelope begins with the recognition of classical nuclear localization sequences (NLSs) in the proteins to be imported [1][2][3]. In the cytoplasm, these proteins are bound by a heterodimeric receptor composed of importin α and importin β (Impα and Impβ, respectively): Impα recognizes and binds the NLS, and Impβ carries the complex through the nuclear pore [4][5][6].
Impα has two important regions: the N-terminal importin-β-binding (IBB) domain and the region containing the NLS binding sites, which interacts with the cargo macromolecules awaiting transport [3,7-9]. Classical NLSs (cNLSs) are classified as monopartite when a single cluster of basic amino acids interacts with the major site of Impα, and as bipartite when two basic clusters interact simultaneously with the major and minor sites of Impα [1,10]. The major site is the main binding region for monopartite cNLSs; the occasional binding of monopartite cNLSs to the minor site has been attributed to the high peptide concentrations used in crystallization assays [8].
To better understand the specificity of monopartite NLS interactions with the receptor, Hodel and colleagues [14] performed an alanine scan of the SV40 T-Ag and Myc NLSs to determine the energetic contribution of each NLS residue, measuring the dissociation constant (Kd) of NLS-GFP fusion proteins for Impα by fluorescence assays. All tested mutations (P0 to P6) decreased the affinity; for the SV40 TAg fusion protein, the largest effect was at P2 (100-fold), followed by P3 and P5 (10-fold). To better categorize experimentally observed NLSs and to describe atypical NLSs, Kosugi and colleagues [15] screened random peptide libraries and proposed six classes of NLSs, including two noncanonical (atypical) classes [15].
For bipartite NLSs, Marfori et al. [16] used solid-phase binding assays with two consensus NLSs (Bimax1 and Bimax2) to gain insight into how bipartite peptides interact with Impα.
Several structural studies have emphasized the importance of particular amino acids at the N and C termini of NLSs, the role of flanking sequences in the NLS N-terminal region [17][18][19], and, using structural and functional assays, the role of the P4 position [20]. Additionally, Marfori and colleagues [5] published a review complemented with database data and a predictive tool for NLSs. Furthermore, a review by Christie and collaborators [21] used structural biology data to describe the different nuclear import pathways and the proteins involved.
Beyond the six NLS classes derived from the interactions of random-library peptides with Impα variants [15], structural studies of Impα from the α1 family of plants (rice) and fungi (Neurospora crassa) identified monopartite NLSs that bind only to the minor site or that have higher affinity for this site than for the major site [22][23][24][25]. In contrast, a recent study of a fungus-specific NLS demonstrated its preference for the major binding site [26,27]. Together, these studies demonstrate the high variability of NLS binding across Impα from different families.
Downloaded from http://portlandpress.com/biochemj/article-pdf/doi/10.1042/BCJ20210401/915590/bcj-2021-0401.pdf by guest on 30 June 2021
Impα variants are thought to have arisen through gene duplication events. They can be divided into three main families (α1, α2 and α3) and may prefer specific NLSs, which can be associated with specific roles [28][29][30]. The mouse genome encodes six variants and the human genome seven. Individual variants can play fundamental roles during early embryonic development [31] and in preferential interactions with specific proteins [32][33][34]. The causes of NLS-binding specificity among Impα variants have not been thoroughly studied, but residues near the NLS binding sites have been proposed as one of the factors [23,25]. Of the seven human Impα variants [30], HsImpα1 is the most general importer, carrying many different proteins that contain a classical NLS [28,35]. HsImpα1 is also the most similar to the best-studied Impα (MmImpα isoform α2), sharing approximately 99% of its amino acid sequence [35]. In contrast, HsImpα3 is more specific in cargo import, carrying for example transcription factors, nuclear factors and regulators of gene expression [36][37][38]; this is probably related to the greater flexibility of armadillo (ARM) repeats 7 and 8, which are located close to the NLS binding site [20,35,38].
Several DNA repair proteins are transported into the cell nucleus by the classical NLS import pathway [39], including proteins belonging to the five main DNA repair processes [40]. Several Impα/NLS complexes of DNA repair proteins have been studied by crystallography and affinity techniques [18,41,42]. The MLH1/PMS2 heterodimer acts in mismatch repair (MMR), the pathway that corrects errors made during DNA replication [39]. Both MLH1 and PMS2 have been shown to be imported via the classical nuclear import pathway, either alone or as a heterodimeric complex [42,43]. We previously studied the MLH1 NLS (466SSNPRKRHRED476) and PMS2 NLS (571LATPNTKRFKKEE583) in complex with Mus musculus Impα (MmImpα) and demonstrated that the Impα/MLH1 NLS complex has the highest affinity yet observed for an Impα/NLS complex [42]. For this reason, MLH1 NLS was selected in the present study to further the understanding of the classical NLS consensus.
In the present work, we selected different mutations in the NLS of MLH1, a DNA repair protein whose NLS has very high affinity for MmImpα [42], to study the key amino acids responsible for this interaction. Four mutated NLS peptides were co-crystallized with Impα and their structures were solved by X-ray crystallography; isothermal titration calorimetry assays were then performed to quantify these interactions and to determine their stoichiometry and other thermodynamic parameters. These assays provide valuable information on the structural determinants of little-studied residues within and adjacent to the consensus sequences. The data obtained here are useful not only for understanding MLH1 NLS binding but also for NLS binding to Impα in general, since most of the residues at these positions are found in other NLSs. Additionally, to broaden the study, we performed bioinformatic (molecular docking) analyses to characterize the binding of these four mutated MLH1 NLSs to two human importin α variants (variants 1 and 3).
[…] GenOne company at purity levels that exceeded 95%.

Protein expression and purification.
The gene encoding 6xHis-tagged truncated Mus musculus importin-α2 (MmImpα) was expressed in a heterologous E. coli (DH5α) system, and the protein was isolated on a His-trap nickel affinity column (5 mL, GE Healthcare), as previously described by Teh and colleagues [44]. The protein was eluted with a gradient (0 to 100%) of 0.15 M imidazole in 0.5 M NaCl and 0.02 M Hepes pH 7.0, followed by overnight dialysis against a buffer of 0.02 M Tris-HCl pH 8.0 and 0.1 M NaCl. The protein was concentrated using a Vivaspin 20 concentrator (30 kDa cutoff, GE Healthcare) and stored at -20 °C. Sample purity was evaluated by 12% SDS-PAGE [45].

Calorimetric assays.
Isothermal titration calorimetry (ITC) assays were used to quantify the binding of […] The peak areas of the thermograms were integrated automatically with the Origin v.7.0 add-on provided by the manufacturer, and curves were fitted using binding polynomials as previously described [46].
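The binding-polynomial formalism cited above can be illustrated with a short sketch. Assuming two independent, non-identical sites with association constants K1 = 1/Kd1 and K2 = 1/Kd2 and a known free ligand concentration [L], the polynomial is P = 1 + (K1 + K2)[L] + K1K2[L]^2, and the mean number of NLS peptides bound per Impα is ν = [L] d ln P/d[L]. This is only an illustration of the formalism, not the fitting procedure of ref. [46]: it omits the mass-balance and injection-heat terms a real ITC fit requires, and the constants in the example are hypothetical.

```python
def binding_polynomial(L, K1, K2):
    """Binding polynomial for two independent, non-identical sites.

    L: free ligand (NLS peptide) concentration in M; K1, K2: association constants in 1/M.
    """
    return 1.0 + K1 * L + K2 * L + K1 * K2 * L**2


def mean_occupancy(L, K1, K2):
    """Average ligands bound per receptor, nu = L * dlnP/dL, written out term by term."""
    P = binding_polynomial(L, K1, K2)
    return (K1 * L + K2 * L + 2.0 * K1 * K2 * L**2) / P


def site_occupancies(L, K1, K2):
    """Fractional saturation of each site; for independent sites P = (1 + K1 L)(1 + K2 L)."""
    return K1 * L / (1.0 + K1 * L), K2 * L / (1.0 + K2 * L)


# Hypothetical constants: Kd1 = 10 nM (K1 = 1e8 1/M), Kd2 = 1 uM (K2 = 1e6 1/M).
K1, K2, L = 1e8, 1e6, 1e-6
theta1, theta2 = site_occupancies(L, K1, K2)
nu = mean_occupancy(L, K1, K2)
print(f"site 1: {theta1:.3f}, site 2: {theta2:.3f}, total bound: {nu:.3f}")  # total approx. 1.49
```

Because the sites are independent, the polynomial factorizes and ν equals the sum of the two site occupancies, which is a convenient consistency check.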

Crystallization and X-ray data collection.
Single crystals of all four MmImpα/MLH1 NLS mutant complexes were obtained using the conventional hanging-drop vapor-diffusion method [47]. Crystals were […]

Structure determination and refinement.
Data were processed with XDS (version 20180126) [48], and the crystal structures of the complexes were solved by Fourier synthesis using the MmImpα/MLH1 NLS coordinates (PDB code 5U5P) as the initial model [42]. Models were built by alternating cycles of manual rebuilding in Coot v.0.8.9 [49] and automated refinement in PHENIX v.1.12 [50]. The overall quality of the final models was also checked with PHENIX v.1.12. The coordinates were deposited in the Protein Data Bank (RCSB PDB, www.rcsb.org). Data collection and refinement statistics are summarized in Table 1.
Comparative analysis.

Protein-peptide docking.
Molecular docking of the MLH1 and mMLH1 peptides to the crystal structures of HsImpα1 and HsImpα3 was performed with AutoDock Vina as implemented in PyRx v.0.9.5 [54]. The best complexes were chosen using two criteria: i) an antiparallel orientation of the NLS relative to Impα, and ii) the predicted Gibbs free energy of binding (kcal/mol). The first criterion was adopted because all Impα/NLS complexes solved to date present this antiparallel conformation [55].
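The two-criterion pose selection described above could be scripted roughly as follows. This is a hypothetical sketch, not a PyRx or AutoDock Vina feature: the `Pose` record, the use of the peptide's terminal Cα atoms to define its N-to-C direction, and the `receptor_axis` vector (assumed to point along the ARM-repeat stack from the N-terminal to the C-terminal repeats of Impα) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pose:
    score_kcal_mol: float  # predicted binding free energy; more negative is more favorable
    n_term_ca: Vec3        # C-alpha of the peptide's first residue (Å)
    c_term_ca: Vec3        # C-alpha of the peptide's last residue (Å)


def is_antiparallel(pose: Pose, receptor_axis: Vec3) -> bool:
    """Criterion i: the peptide's N->C vector points against the assumed receptor axis."""
    n_to_c = [c - n for c, n in zip(pose.c_term_ca, pose.n_term_ca)]
    return sum(a * b for a, b in zip(n_to_c, receptor_axis)) < 0.0


def select_best_pose(poses: List[Pose], receptor_axis: Vec3) -> Optional[Pose]:
    """Keep antiparallel poses (criterion i), then take the lowest energy (criterion ii)."""
    antiparallel = [p for p in poses if is_antiparallel(p, receptor_axis)]
    return min(antiparallel, key=lambda p: p.score_kcal_mol) if antiparallel else None


# Hypothetical poses; the receptor axis is taken to point along +x.
poses = [
    Pose(-7.0, (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),  # parallel: rejected despite best score
    Pose(-6.5, (10.0, 0.0, 0.0), (0.0, 0.0, 0.0)),  # antiparallel
    Pose(-6.0, (8.0, 1.0, 0.0), (0.0, 0.0, 0.0)),   # antiparallel
]
best = select_best_pose(poses, receptor_axis=(1.0, 0.0, 0.0))
print(best.score_kcal_mol)  # -6.5
```

Applying the orientation filter before ranking by energy means a better-scoring but parallel pose is never selected, which mirrors the order of the two criteria in the text.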

Crystal structures of importin-α bound to mMLH1 NLSs.
All MmImpα crystal structures were solved with a truncated construct lacking residues 1 to 69, which comprise the IBB domain responsible for Impα autoinhibition [56]. Additionally, all the crystal structures present the conserved ten […]
The crystal structure of the MmImpα/MLH1-R472K NLS complex showed the NLS bound only at the major binding site (Fig. 1 A), a monopartite binding mode also observed in the crystal structures of the MmImpα/Ku70 NLS and MmImpα/Ku80 NLS complexes [18]. Ten NLS residues (467SNPRKKHRED476, with P1-P5 residues highlighted) could be modeled into the electron density (1.2σ); they interact with Impα via 14 hydrogen bonds (Fig. 2 C).
The crystal structures of the other three complexes show mMLH1 NLSs bound at both the major and minor binding sites (Fig. 1 B, C and D), as previously observed in the crystal structure of MmImpα/MLH1 NLS. In the MmImpα/MLH1-E475A NLS complex, 10 NLS residues (467SNPRKRHRAD476) interact with Impα at the major site via 14 hydrogen bonds (Fig. 2 D). At the minor site, 8 NLS residues, including those at the P1' to P4' positions (471KRHR474), interact with Impα via 11 hydrogen bonds (Fig. 2 E). Similarly, in the MmImpα/MLH1-R470A NLS complex the NLS binds at both the major and minor sites, with similar numbers of hydrogen bonds at the major (14) and minor (11) sites (Fig. 2 H and I). Finally, in the MmImpα/MLH1-S467A NLS complex, the P1 to P5 NLS residues interact with the major site through 17 hydrogen bonds, and the NLS binds the minor site through 12 hydrogen bonds (Fig. 2 F and G).
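The hydrogen-bond counts above depend on the geometric criteria used, which are not specified in this section. A common simplification, assumed here purely for illustration, is to count nitrogen/oxygen pairs between receptor and peptide that lie within 3.5 Å. The sketch below applies that distance cutoff to hypothetical coordinates; it ignores hydrogens, donor/acceptor roles and angles, all of which dedicated structure-analysis tools take into account.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    residue: str                     # e.g. "Asp228"
    name: str                        # atom name, e.g. "OD1"
    element: str                     # element symbol, e.g. "O"
    xyz: Tuple[float, float, float]  # coordinates in Å


POLAR = {"N", "O"}


def polar_contacts(receptor: List[Atom], peptide: List[Atom], cutoff: float = 3.5):
    """List N/O atom pairs between receptor and peptide within `cutoff` Å (distance only)."""
    contacts = []
    for a in receptor:
        if a.element not in POLAR:
            continue
        for b in peptide:
            if b.element in POLAR:
                d = math.dist(a.xyz, b.xyz)
                if d <= cutoff:
                    contacts.append((a.residue, a.name, b.residue, b.name, round(d, 2)))
    return contacts


# Hypothetical coordinates, not taken from the deposited structures.
receptor = [
    Atom("Asp228", "OD1", "O", (0.0, 0.0, 0.0)),
    Atom("Trp184", "CZ2", "C", (0.0, 0.0, 1.0)),  # carbon: never counted
]
peptide = [
    Atom("Arg472", "NH1", "N", (0.0, 0.0, 2.9)),  # within 3.5 Å of Asp228 OD1
    Atom("Arg472", "NH2", "N", (0.0, 0.0, 4.2)),  # too far
]
print(polar_contacts(receptor, peptide))
```

A distance-only rule tends to overcount (it also accepts acceptor-acceptor pairs), so counts obtained this way are indicative rather than directly comparable to those reported in the figures.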
Lower B-factors were observed for the P1-P5 region at the major site and the P1'-P4' region at the minor site than for the entire peptides in all MmImpα/mMLH1 complexes (Table 1). The ITC isotherms were fitted with a model of two non-identical, independent binding sites. The dissociation constants (Kd) and enthalpies (ΔH) were obtained from the thermograms shown in Figure 3 and are listed in Table 2.
[…] NLS complexes display a stoichiometry of 2, indicating that the peptides interact at both the major and minor binding sites. The first two complexes show lower affinity than MmImpα/MLH1 NLS at both sites (Table 2), whereas the third complex (MmImpα/MLH1-S467A) shows higher affinity than MmImpα/MLH1 NLS at both sites. The MmImpα/MLH1-R472K NLS complex displays a stoichiometry of 1, suggesting that the peptide interacts only with the major binding site. In addition, Table 2 shows that, among the four complexes, the MLH1-S467A NLS has the lowest Kd (i.e., the highest affinity) at the MmImpα major site.
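The Kd and ΔH values reported in Table 2 can be converted into the remaining thermodynamic parameters with the standard relations ΔG = RT ln Kd (Kd in mol/L, 1 M standard state) and ΔG = ΔH - TΔS. The sketch below assumes a temperature of 25 °C, because the assay temperature is not given in this section; the inputs in the example are illustrative and are not values from Table 2.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant, kcal mol^-1 K^-1


def thermodynamics_from_itc(kd_molar, dh_kcal, temperature_k=298.15):
    """Return (dG, TdS) in kcal/mol from a fitted Kd (M) and dH (kcal/mol).

    dG = RT ln Kd (1 M standard state) and TdS = dH - dG.
    The 298.15 K default is an assumption, not the reported assay temperature.
    """
    dg = R_KCAL * temperature_k * math.log(kd_molar)
    tds = dh_kcal - dg
    return dg, tds


# Illustrative inputs only (hypothetical, not taken from Table 2).
dg, tds = thermodynamics_from_itc(kd_molar=0.1e-6, dh_kcal=-10.0)
print(f"dG = {dg:.2f} kcal/mol, TdS = {tds:.2f} kcal/mol")  # roughly -9.55 and -0.45
```

Because ΔG scales with ln Kd, a tenfold change in Kd corresponds to only about 1.4 kcal/mol at 25 °C, which helps when comparing the major- and minor-site values across complexes.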

Molecular docking of the MLH1 and mMLH1 NLSs with the human variants HsImpα1 and HsImpα3 was performed to study the binding behavior of NLS peptides at the minor and major binding sites. As shown in the previous sections, the crystal structures of the MmImpα/MLH1 NLS [42] and MmImpα/MLH1 mutant NLS complexes revealed monopartite peptides at similar binding sites, except for the MmImpα/MLH1-R472K complex, whose peptide was absent from the minor site. […] (Table 3), as also found in the MmImpα/MLH1-R472K crystal structure.
In contrast, docking of HsImpα3 with MLH1 and the mMLH1 NLSs MLH1-E475A, MLH1-R470A and MLH1-R472K placed the peptides at the major site with no significant differences from HsImpα1, but no NLS was found at the minor site. The MLH1-S467A NLS was found at both the major and minor sites (Table 3).

MLH1-R472K NLS - Mutation at the P3 position and abolition of minor-site binding.
According to the consensus sequence for monopartite NLSs [K(K/R)X(K/R)], the P3 position can be occupied by either Lys or Arg. We therefore tested the Arg→Lys mutation (R472K) in a high-affinity monopartite NLS (MLH1 NLS) at the residue that binds the P3 position of the major site. Because this residue also binds the P2' position of the minor site, the mutation can additionally affect minor-site binding.
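All four mutant peptides still satisfy the monopartite consensus, so the consensus alone cannot predict the affinity differences reported here. The sketch below encodes K(K/R)X(K/R) as a regular expression and applies each point mutation to the wild-type MLH1 NLS (466SSNPRKRHRED476, the sequence given in the Introduction). The helper names are hypothetical, and the peptides actually used in the assays may differ in length from this 11-residue sequence.

```python
import re

WT_START = 466          # residue number of the first NLS residue
WT_NLS = "SSNPRKRHRED"  # MLH1 NLS, residues 466-476

# Monopartite consensus K(K/R)X(K/R), spanning P2-P5; the lookahead allows overlapping hits.
MONOPARTITE = re.compile(r"(?=(K[KR].[KR]))")


def mutate(seq, start, mutation):
    """Apply a point mutation written as e.g. 'R472K' (wild-type residue, number, new residue)."""
    wt_aa, number, new_aa = mutation[0], int(mutation[1:-1]), mutation[-1]
    i = number - start
    if seq[i] != wt_aa:
        raise ValueError(f"residue {number} is {seq[i]}, not {wt_aa}")
    return seq[:i] + new_aa + seq[i + 1:]


def consensus_hits(seq, start):
    """Return (residue number at P2, matched motif) for every consensus hit."""
    return [(start + m.start(), m.group(1)) for m in MONOPARTITE.finditer(seq)]


print("WT   ", WT_NLS, consensus_hits(WT_NLS, WT_START))
for mutation in ("R472K", "R470A", "E475A", "S467A"):
    mutant = mutate(WT_NLS, WT_START, mutation)
    print(mutation, mutant, consensus_hits(mutant, WT_START))
```

Every mutant still matches with Lys471 at P2: R472K only swaps the P3 residue from Arg to Lys, which the consensus accepts, while R470A, E475A and S467A alter positions outside the P2-P5 motif.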
Calorimetric data for the MmImpα/MLH1-R472K NLS complex showed a one-order-of-magnitude decrease in affinity compared with the non-mutated peptide (Table 2) and a stoichiometry of n = 1, indicating a single binding site. The crystal structure of this complex showed that the mutated residue (Lys472) still binds at the P3 position; however, the hydrogen bond to Asp228 (Impα) made by residue 472 in the MmImpα/MLH1 structure is absent in the mutated complex (Fig. 2 C). Indeed, according to a previous study [14], an Ala mutation of the residue binding at the P3 position had the second-highest impact for the SV40 TAg and Myc NLSs; only the mutation of the invariant P2 Lys had a larger effect.
Ku70 and Ku80 form a heterodimeric complex involved in DNA repair, but they may also have separate roles. The NLSs of both proteins have the same residues in the consensus region (P2-P5, Table 4), except for the presence of a Lys instead of an Arg at the P3 position (as in the R472K NLS).
Consistent with the present results, Ku80 NLS displayed a lower affinity (four-fold higher Kd) than Ku70 NLS [18]. Another DNA repair protein, XPG (xeroderma pigmentosum group G), has two NLSs (XPG1 and XPG2) with Kd values of the same order of magnitude as MLH1 NLS, and these also have an Arg residue at the P3 position [41]. The optimized bipartite NLS sequences Bimax1 and Bimax2 also have an Arg at the P3 position [5]. Finally, the SV40 TAg NLS has a Lys at the P3 position and a Kd of 1.8 μM, eighteen times higher than that of MLH1 NLS [42]. Taken together, these data suggest that an Arg residue is more favorable than a Lys at the P3 position, as it frequently occurs in high-affinity NLSs such as those of MLH1 and XPG1. Importantly, the R472K mutation studied here prevents binding of the peptide to the minor site, highlighting the importance of an Arg residue at the P2' position, as observed in several monopartite NLS sequences (Table 4). […] SV40TAg NLS (110GPGSDDEAAADAQHAAPPKKKRKVG132) [17] presents an Arg bound to the P2' position (Table 4). Thus, although an Arg residue is not mandatory at this position, the large majority of NLS peptides have KR residues at the P1' and P2' positions (Table 4) [41].
In summary, although the consensus sequence allows Lys or Arg at the second amino acid of a monopartite basic cluster, the presence of an Arg in monopartite NLSs seems to be very important for binding at both the major (P3 position) and minor (P2' position) sites of Impα, particularly for the MmImpα isoform α2 variant studied here [21].

MLH1-R470A and E475A NLSs - Mutations at the P1 and P6 positions.
The importance of the P1 and P6 positions is not well established, and their contribution to importin binding has not been studied in depth. According to Hodel and colleagues [14], an Ala mutation of the residue binding at the P6 position reduced the affinity of the SV40 TAg NLS (~5-fold higher Kd) and the Myc NLS (~14-fold higher Kd), while an Ala mutation at the P1 position reduced the affinity of the SV40 TAg NLS (~2-fold higher Kd). Few additional studies have considered the relevance of these positions [5,8,16,42]. We therefore tested the Arg→Ala and Glu→Ala mutations in a high-affinity monopartite NLS (MLH1 NLS), at the residues that bind the P1 and P6 positions of the major site.
Calorimetric data for the MmImpα/MLH1-R470A and MmImpα/MLH1-E475A NLS complexes show that both mutations markedly decrease the affinity of these peptides for MmImpα at both binding sites (major and minor). In particular, the N-terminal mutation (R470A) reduced the affinity 30-fold and 5-fold at the major and minor binding sites, respectively. The C-terminal mutation (E475A) also reduced the affinity of the mutated peptide for Impα, but to a lesser extent: 7-fold and 2.5-fold at the major and minor sites, respectively (Table 2).
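To put these fold changes on an energy scale, a Kd ratio can be converted into a binding free-energy penalty with ΔΔG = RT ln(Kd,mutant/Kd,wild type). The sketch below assumes 25 °C (the assay temperature is not stated in this section) and uses the fold changes quoted in the text, entering the "one order of magnitude" reported for R472K as an approximate factor of 10. The resulting numbers are back-of-the-envelope estimates, not values reported by the study.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant, kcal mol^-1 K^-1
T_ASSUMED = 298.15            # assumed 25 °C; not stated in this section


def ddg_from_fold(fold_increase_in_kd, temperature_k=T_ASSUMED):
    """Binding free-energy penalty (kcal/mol) for a given fold increase in Kd."""
    return R_KCAL * temperature_k * math.log(fold_increase_in_kd)


# Fold increases in Kd quoted in the text; R472K is an approximate "one order of magnitude".
reported = {
    ("R470A", "major"): 30,
    ("R470A", "minor"): 5,
    ("E475A", "major"): 7,
    ("E475A", "minor"): 2.5,
    ("R472K", "major"): 10,
}

for (mutation, site), fold in reported.items():
    print(f"{mutation} {site:5s} {fold:5.1f}-fold  ddG = {ddg_from_fold(fold):+.2f} kcal/mol")
```

On this scale, the 30-fold loss for R470A at the major site corresponds to roughly 2 kcal/mol, while the 2.5-fold loss for E475A at the minor site is only about 0.5 kcal/mol.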
At the minor binding site, both the R470A and E475A NLS structures retain eleven hydrogen bonds, as in the non-mutated NLS; however, the Asn361 (Impα)-Lys471 (NLS) interaction is replaced by an Asn361 (Impα)-Ala470 (NLS) interaction (Fig. 2 I). Thus, these NLS mutations have less impact on the minor site than on the major binding site, consistent with the calorimetric data.

MLH1-S467A NLS - Mutation in the N-terminal flanking sequence.
The importance of flanking sequences in the N-terminal region of monopartite NLSs has been emphasized in several articles. The first structural study using the N-terminally extended SV40TAg NLS (CN-SV40TAg NLS) [17] provided important information about the binding profile of this region, which was also observed in the Ku70 NLS structure [18]. That study [17] aimed to analyze the role of phosphorylation of specific residues in the N-terminal region, which was further investigated by other authors [19,58]. Kosugi and colleagues [15] proposed NLS classes defined in part by residues in the N-terminal region (P1 position). More recently, a possible role for a Ser residue at the N-terminus of the MLH1 NLS was suggested. However, to date no study has related structural and affinity data for residues in the N-terminal region. We therefore tested the Ser→Ala mutation in MLH1 NLS at the residue that binds the P-2 position of the major site.
Surprisingly, in contrast to the other three complexes presented here, calorimetric studies of MmImpα/MLH1-S467A NLS showed higher affinity at both binding sites, although the values are of the same order of magnitude as those of the non-mutated complex (Table 2). The crystal structure of the MmImpα/MLH1-S467A NLS complex reveals an additional hydrogen bond, Tyr277 (Impα)-Asn468 (NLS), at the major site, which may be related to the increased NLS accessibility caused by the S467A mutation (Fig. 2 F). At the minor site, the hydrogen bonds between Impα and the NLS remain similar in both structures (MmImpα/MLH1-S467A NLS and MmImpα/MLH1).
Indeed, bipartite NLSs with higher affinities have prolines and acidic amino acids in the linker region and in the regions adjacent to the minor and major binding sites [15].
Another study showed that the monopartite conformation of the XPG1 peptide (1057KRGITNTLEESSSLKRKRL1074) is more favorable than its bipartite version owing, among other factors, to the presence of uncharged polar residues (for example, Ser) in the linker region [41]. Finally, serine residues in the flanking regions of NLSs have also been linked to phosphorylation, which can further modulate nucleocytoplasmic trafficking [17,19,59,60]. Although this regulatory process has not yet been observed for this specific MLH1 residue, our study may inform future work on this topic [61].
In contrast, docking solutions for HsImpα3 with the MLH1, MLH1-R472K, MLH1-E475A and MLH1-R470A NLSs were found only at the major site. A structural comparison of the HsImpα3 and HsImpα1 minor sites reveals three natural substitutions in key residues near the binding site (N350K, K392G and S406I) that could prevent the interaction of the NLSs (Fig. 4). Another structural feature is the natural substitution K435Q (HsImpα3 relative to HsImpα1), which replaces a positive charge with a neutral residue and could also interfere with the interaction of NLSs at the minor site (Fig. 5).
Additionally, an extensive search of the PDB found no crystal structures of HsImpα3 with monopartite peptides bound at the minor site, suggesting that this site does not participate in the transport of proteins with monopartite NLSs. However, docking of the MLH1-S467A NLS to HsImpα3 produced a favorable solution with a predicted Gibbs free energy of -6.4 kcal/mol (Table 3; Fig. 7A and B). This may be related to the very high affinity of MLH1-S467A NLS for MmImpα and to the previously cited substitutions in the HsImpα3 sequence, which can change the protein surface charge (Fig. 6A). Interestingly, the S467A mutation changes the peptide accessibility, allowing the next residue of the peptide (Asn468) to interact with Ser407 of HsImpα3 (Fig. 6 B); this in turn shifts the NLS peptide by one amino acid position relative to the MmImpα crystal structure, changing Lys→Arg at P1', Arg→Lys at P2' and His→Arg at P3' (Fig. 7 C, D and E).
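The register shift described above can be checked with simple bookkeeping. Assuming, as in the MmImpα crystal structures, that Lys471 occupies P1' of the minor site (471KRHR474 at P1'-P4'), moving the peptide by one residue toward its N-terminus, so that Arg470 occupies P1', reproduces exactly the three changes listed. The helper below is a hypothetical illustration, not part of the docking workflow.

```python
START = 466
S467A_NLS = "SANPRKRHRED"  # MLH1 NLS residues 466-476 carrying the S467A mutation


def minor_site_register(seq, start, p1_residue, n_positions=4):
    """Map minor-site positions P1'..Pn' to residues (one-letter code + number),
    given the residue number that occupies P1'."""
    i = p1_residue - start
    return {f"P{k + 1}'": f"{seq[i + k]}{p1_residue + k}" for k in range(n_positions)}


crystal_like = minor_site_register(S467A_NLS, START, p1_residue=471)  # K471 at P1'
shifted = minor_site_register(S467A_NLS, START, p1_residue=470)       # one residue toward the N-terminus

for pos in crystal_like:
    print(pos, crystal_like[pos], "->", shifted[pos])
```

The first three rows give K→R at P1', R→K at P2' and H→R at P3', matching the substitutions described for the HsImpα3 docking solution.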
This study corroborates previously published results [23, 25-27, 32-34, 65] showing high specificity for either the major or the minor site of Impα from particular organisms, or of variants from the same organism, while other Impα [27] display affinity for both sites depending on the NLS. The diversity of Impα variants and NLSs underlies the selectivity of import of different cargo proteins, which also depends on other regulatory mechanisms, including phosphorylation of particular residues near the NLSs [17,19,57,59].

CONCLUSION
The transport of important components across the nuclear membrane is essential for cellular processes including gene expression, transcription and DNA repair. Importin α (Impα), the most general importer, recognizes cargo proteins through the nuclear localization sequence (NLS) present in their sequence. This work presents crystallographic, calorimetric and bioinformatic studies of Mus musculus Impα with four MLH1 NLSs carrying mutations in key residues responsible for the interaction with Impα. Based on the data presented here, we were able to make progress on specific issues that had not been addressed in previous studies.
Finally, bioinformatics studies with two different variants of human Impα and their comparison with the well-studied Mus musculus Impα also revealed how specific the nuclear transport mediated by Impα can be.

COMPETING INTERESTS
The authors declare that there are no competing interests associated with the manuscript.