MD simulation of high-resolution X-ray structures reveals post-translational modification dependent conformational changes in HSF-DNA interaction

Heat shock factors (HSFs) constitute a transcription factor family playing regulatory roles in maintaining cellular protein homeostasis or mediating cell differentiation and development (Akerfelt et al., 2010, Bjork and Sistonen 2010, Westerheide et al., 2012). Some human diseases such as cancer and neurodegeneration are often linked with malfunction of HSFs (Dai et al., 2007, Neef et al., 2011, Scherz-Shouval et al., 2014). The human HSF family consists four members: HSF1-4, which exhibit tissue-specific expression profiles and possess unique but overlapping functions. HSF1 is the major regulator of the heat shock response, while HSF2 is more associated with development and cell differentiation. These two HSFs display similar module organization in their amino acid sequences. From the N-to C-terminus, there are a DNA-binding domain (DBD), an oligomerization domain, a regulatory domain and a trans-activation domain. DBD, which is responsible for recognizing and binding target genes, is the most conserved and the only structurally characterized domain to date (Littlefield and Nelson 1999, Jaeger et al., 2016, Neudegger et al., 2016). Both HSF1 and HSF2 undergo extensive post-transla-tional modifications (PTMs) such as phosphorylation, acetylation and SUMOylation, in an activation-attenuation cycle. Two PTMs occurring in the DBD of HSFs, i.e. acety-lation of K80 in HSF1 and SUMOylation of K82 in HSF2, have been demonstrated to play crucial roles in the regulatory mechanism of HSFs. The acetylation is believed to promote the release of HSF1 from the target gene in the attenuation phase (Westerheide et al., 2009), while the biochemical consequence of the SUMO modification is still a matter of debate as either strengthening or weakening the HSF2-DNA interaction was observed in different studies (Xing et al., 2005, Anckar et al., 2006). Despite the significance of these PTMs, the precise dynamic process and detailed structural effects inducted by the modifications remain unclear. In this study, we used combined techniques of X-ray crystallography and molecular dynamics (MD) simulations to address these questions. We firstly purified and crystallized the the DBDs from both human HSF1 and HSF2 as previously described (Feng et al., 2016), and also co-crystallized HSF1-DBD in complex with a 12-bp ds-DNA with the tail-to-tail orientation. These crystal structures were determined and refined at atomic resolutions (1.32–1.70 Å). The wing motif (residues 83-98 in HSF1 or 75– 90 in HSF2) that is highly unstructured in most DBDs crystallized so far was fortunately well resolved in one monomer of either HSF1 or HSF2 in our structures. Comparison …


Protein expression and purification
The nucleotide sequences encoding the DNA-binding domain of human HSF1-DBD (residues 15-120) and HSF2 (residues 7-112) were amplified by PCR using the full-length hsf1 and hsf2 genes as the templates, respectively.
To improve the solubility of the expressed products, both sequences were inserted into plasmid pMAL-c4E (New England Biolabs) for production of maltose-binding protein (MBP) fusion proteins in Escherichia coli. A Tobacco Etch Virus (TEV) protease cleavage site and a His 6 -tag were introduced to the N-termini of the destination proteins by PCR amplification. The recombinant plasmids were firstly transformed into E. coli strain Rosetta2, and pilot experiments showed that both DBDs were expressed in soluble form even after the removal of the MBP-tag by in vitro cleavage. To simplify the purification, an in vivo cleavage strategy was used by transforming plasmid pRK603, which contained a TEV protease encoding gene (Kapust and Waugh 2000), into the same strain prior to preparation of competent cells using the classical calcium chloride method. After transformation of the expression plasmids pMAL-HSF1DBD and pMAL-HSF2DBD into this strain, the bacteria were grown in LB medium containing 100 g/ml ampicillin, 50 g/ml kanamycin and 34 g/ml chloramphenicol at 37°C. Expression of the for HSF1-DBD were evaluated using HKL2000 (Otwinowski and Minor 1997), while the data for HSF1-DNA and HSF2-DBD were processed using XDS (Kabsch 2010). Further details of HSF2-DBD purification, crystallization and data collection were given in another report (Feng et al., 2016).

Structure determination and refinement
The crystal structure of HSF1-DBD was determined by means of molecular replacement using the Kluyveromyces lactis HSF-DBD structure (chain B in the PDB entry of 3HTS) (Littlefield and Nelson 1999) as a search model.
Automatic structure determination using Phaser (Bunkoczi et al., 2013) failed to give a clear solution until the search model was modified with side chain truncation from the Cβ atoms using Chainsaw (Stein 2008) in the CCP4 program suite (Winn et al., 2011). After automatic model building using Phenix.autobuild (Terwilliger et al., 2008), the structure was refined using phenix.refine (Afonine et al., 2012) with several rounds of manual remodeling between refinement cycles using the modeling toolkit Coot (Emsley et al., 2010). The structure of HSF2-DBD was solved by molecular replacement as well using the refined HSF1-DBD structure as a search model, and refined in the same way. The same protein model and a standard B-form DNA duplex generated by using Coot were used as search models for molecular replacement of the HSF1-DNA structure. All structures were validated using MolProbity (Chen et al., 2010). Statistics from the data collection and structure refinement are summarized in Table 1. All figures representing the refined structures were prepared using the molecular visualization program Pymol (Schrodinger 2010).

Modeling and MD simulations of acetylated HSF1-DBD interacting with ds-DNA
A 25-bp B-form ds-DNA fragment comprising a single GAA repeat in the middle part was generated using Nucgen from the Amber14 package (Case et al., 2015). A model mimicking the protein-DNA complex was obtained by aligning the refined HSF1-DBD structure to chain B in the PDB entry of 3HTS (Littlefield and Nelson 1999), and replacing chain A (the crystallized DNA in complex with Kluyveromyces lactis HSF-DBD in 3HTS) with the modeled 25-bp ds-DNA by superimposing the GAA repeats in both. Acetylated K80 in HSF1-DBD was modeled by replacing one hydrogen atom of ε-NH2 with CH 3 CO-group. The force field of this non-standard lysine residue (hereafter named as LMC) was derived from calculation of the restrained electrostatic potential (RESP) partial atomic charges using the GAMESS program package (Schmit et al., 1993) with HF/6-31G(d) as the basis set. The output values obtained from the GAMESS were used as an input for the antechamber to derive RESP partial atomic charges for the LMC residue. These suitable parameters of LMC were used for later simulation including all GAFF atom types and RESP partial atomic charges of LMC residue.
Molecular dynamics (MD) simulations were performed in parallel using the Amber14 package (Case et al., 2015) for the ensembles containing either wild type or acetylated HSF1-DBD using a five-step protocol including ensemble construction, minimization, heating, equilibration and production. The program suite Amber 14 was used to perform the simulations, with force field parameters from ff14SB and ff99bsc0 for protein and DNA, respectively. The two ensembles were put into truncated octahedrons with explicit solvent fully filled. The TIP3 water model was used for explicit simulation, with 24498 and 21264 water molecules filled in the boxes containing acetylated and unmodified HSF1 respectively. To neutralize the charges of two ensembles, 48 and 47 Na + ions were added into these water boxes, respectively.
After the ensemble building, three runs of simulation including a single run of the unmodified system and two independent runs from the same starting condition but different random seeds were carried out. All simulation calculations were parallel performed on NIVIDA Graphics card including GTX 690 and GTX 780. At first, two 10 ps minimization simulations were conducted with whole solutes and DNA strands fixed to clean the clashes between solutes. Then the ensembles were experienced by 200 ps heating simulation processes until the temperature up to 300 K. This time the whole solutes were fixed with stronger force constants to avoid explosion of systems. After the heating processes, ensembles were equilibrated during a 1 ns simulation.
Finally, 40 or 50 million steps of NPT simulation were performed in production with step size of 1 fs. Subsequently, another run of simulation for the acetylated system from the same starting model but with a different seed and longer simulation time (50 ns) was performed in order to test the reproducibility of the trajectories. Trajectory analysis was done by using the imbedded tools in Amber 14. The final conformations were extracted from long production simulations and aligned with each other to analyze the interaction differences and conformation changes.

Modeling and MD simulations of SUMOylated HSF2-DBD interacting with ds-DNA
A similar modeling protocol was used in building a model where HSF2-DBD bound to a 25-bp DNA comprising a single GAA repeat in the middle part. A SUMO2 model was obtained by removing the 2 C-terminal residues (V94 and Y95) from a reported NMR structure (PDB entry 2AWT). To mimic SUMOylation of K82 in HSF2, the SUMO2 model was then docked to the HSF2-DNA model with constraints of positioning its C-terminus located within covalent bonding distance to the ε-NH2 of K82 in HSF2 using ZDOCK (Chen et al., 2003). The resultant model thus contained an isopeptide bond connecting K82 in HSF2 and G93 in SUMO2, which were hereafter referred to as K82* and G93*, respectively. The force field parameters for these two isopeptide-bridging amino acids were extracted from the docking model with methylation at α-NH2 and α-COOH of this "bipeptide" to mimic the electric environment in protein context. The GAMESS package (Schmit et al., 1993) was used to calculate the electrostatic potential (ESP), with HF/6-31G(d) as the basis set, and the output values served as an input for the antechamber to derive RESP partial atomic charges for K82*.
Similar to the HSF1 simulations, the simulations of these systems were performed in parallel using the Amber14 package (Case et al., 2015) for the ensembles containing either wild type or SUMOylated HSF2-DBD using a five-step protocol including ensemble construction, minimization, heating, equilibration and production. The two ensembles were put into truncated octahedrons with explicit solvent fully filled. The TIP3 water model was used for explicit simulation, with 31648 and 21648 waters in the solvating boxes containing SUMOylated and unmodified HSF2 respectively. To neutralize the charges of two ensembles, 50 and 47 Na + ions were added into these boxes, respectively.
Similar to the HSF1 systems, three runs of simulation including a single run of the unmodified system and two independent runs from the same starting condition but different random seeds were carried out after the ensemble building. The simulation calculations were performed by following a same protocol as that for acetylated HSF1 bound to ds-DNA, except for 30 or 50 million steps in the production.

Overall crystal structures
The structures presented here belonged to different space groups and were refined with good model quality, as indicated by the crystallographic and stereochemical parameters given in Table 1. The asymmetric unit of the HSF1-DBD crystal contained only one protein monomer (Fig. S1a), while those in the cocrystal with DNA and the HSF2-DBD crystal both comprised four protein monomers (Fig. S1b, c). In the structures except HSF1-DBD, dimers of dimer were formed with the two monomers in each dimer related by two-fold non-crystallographic symmetry, but displaying different dimeric interfaces. In tetramer arrangement, however, there was no symmetry relating the two dimers in both structures. Obviously, crystal packing rather than molecular properties and shapes were most likely responsible for such diversity of oligomer organization of HSF-DBDs observed in these crystals.
In all these structures, most amino acids even including some histidine residues within the N-terminal His-tag are present in the refined models. The highly flexible wing loop (residues 83-98 in HSF1 or 75-90 in HSF2) invisible in almost all reported structures of HSF-DBD (Harrison et al., 1994, Littlefield and Nelson 1999, Jaeger et al., 2016, Neudegger et al., 2016 was also poorly defined in the electron density of most protein monomers and had to be omitted from the models. Even so, an intact wing loop in monomer C was observed in the complex structure of HSF1-DBD bound to DNA (Fig. S1b), and similarly in monomer D in the HSF2-DBD structure (Fig. S1c). It seems likely that the conformation of these loops is well fixed by contacts from neighboring protein monomers. Hereafter, we use these two monomers to represent the DBD structures of human HSF1 and HSF2, respectively.
The helix-turn-helix motif centred at the structural core, with two additional helices (α1 and α4) serving as N-and C-terminal extensions (Fig. S1d, e).
It is noteworthy that the structure of HSF2-DBD was determined at 1.32 Å, the resolution of which made alternative rotamer conformations became visible in electron density. The side chains of a total of 27 amino acids were defined in more than one rotamer, which otherwise led to significant residual F obs -F calc density and unreasonable R free values. The hydroxyl oxygen of S35 in monomer A, for example, could be undoubtedly interpreted by dual conformations (Fig. S1f).

Comparison with the DBD of K. lactis HSF
Among the functional modules, the DNA binding domain is the most conserved domain in HSF. The DBDs of human HSF1 and HSF2 share 49% and 50% sequence identity with K. lactis HSF, respectively (Fig. S2a), though with much lower overall homology for the full-length sequence. Structure comparison also revealed very similar architecture among these transcription factors.
Superimposition of HSF1-DBD or HSF2-DBD onto a previously reported structure of K. lactis HSF (PDB ID 3HTS) (Littlefield and Nelson 1999) resulted in good structure overlay with RMSDs of 0.933 or 0.881 Å, respectively (Fig.   S2b, c). In K. lactis HSF, a proline residue (P237) is unusually present at an internal position in helix α2, leading to the formation of a helical kink. This proline seems insignificant for protein function or stability but critically required for folding kinetics (Hardy and Nelson 2000). Notably, the proline residue is conserved in human HSFs (P58 in HSF1 and P50 in HSF2) (Fig. S2a), in which a similar kinked helix is formed (Fig. S2b, c), suggesting a common role of this amino acid.
Compared to the partial DBD structure in PDB entry 3HTS, a complete DBD architecture including the wing motif and the intact C-terminal conformation was observed in our and the recently published structures (Jaeger et al., 2016, Neudegger et al., 2016. Strikingly, a two-turn 3 10 helix (α4) occurs in the DBD of both human HSFs (residues 109-114 in HSF1 and 101-106 in HSF2). The presence of such a helix is quite unusual as a typical 3 10 helix contains only a single turn in protein structures, which would otherwise be unstable. In DBD structures, however, the C-terminal helix is well stabilized by the interactions from other structural elements, in particular the N-terminal helix (α1). The hydrophobic residues present in these two helices, e.g. L19 and W23 in α1 and L112 in α4 in HSF1, show close contacts (Fig. S2d). Furthermore, two phenylalanine residues preceding the C-terminal helix (e.g. F99 and F104 in HSF1) are involved in a hydrophobic cluster with the other two phenylalanines located in the helix-turn-helix motif (F44 and F78) (Fig. S2e). All these hydrophobic interactions surrounding helix α4 renders the C-terminal conformation rather rigid, which may constrain the conformational freedom of DBDs in an HSF trimer.

The wing motif
Unlike most DNA-binding proteins containing a winged helix-turn-helix motif, the wing in K. lactis HSF did not interact with the bound DNA duplex (Littlefield and Nelson 1999), which was very recent observed in recently reported structures as well (Jaeger et al., 2016, Neudegger et al., 2016. This motif, however, was proposed as an important structural element for full activity of K. lactis HSF (Cicero et al., 2001), the HSE-binding specificity of human HSF1 and the responses to heat stress (Ahn et al., 2001). The wing is highly unstructured in most DBDs crystallized so far, but fortunately it was well resolved in one monomer of either HSF1 or HSF2 in our structures (Fig. S3a,   b).
Intriguingly, the wing topology presented here was not identical to that observed in the recent published structures (Fig. S3c, d). In those structures (PDB ID 5D5V and 5D8K) where the wing puckered upward with a topology partially constrained by a hydrogen bond formed between the side chains of two conserved amino acids (H83 and E98 in HSF1 or H75 and E90 in HSF2).
By contrast, the wing revealed in our structures adopted a more open conformation probably because of the lack of this hydrogen bond. Although the location of the glutamic acid was conserved among these structures, the histidine residue was positioned 6 Å away, far from hydrogen bonding distance.
Notably, two positively charged residues closely upstream the conserved histidine (R79 and K80 in HSF1 or R71 and K72 in HSF2) contacted the DNA phosphate backbone in the published structures. We hence speculate that DNA-binding may be the major driving force for the formation of the hydrogen bond at the stem part of the wing, which renders this loop less flexible.
SUMOylation of K82 in the wing of HSF2 has been proven as an important PTM in regulating the activity of this transcription factor (Goodson et al., 2001, Xing et al., 2005. Although a lysine residue (K91) is present at the corresponding position in HSF1, it seems impossibly to be modified by SUMO (Anckar et al., 2006, Tateishi et al., 2009, Jaeger et al., 2016. A previous study has suggested that the residues downstream K82, in particular G87 and P88, were required for SUMOylation in HSF2-DBD (Anckar et al., 2006). These two amino acids are replaced by an aspartic acid (D96) in the HSF1 sequence (Fig.   S2a), which takes the same spatial position as P88 in HSF2 (Fig. S3e). The presence of this negatively charged residue seems to well constrain the flexibility of K91 by forming a salt bridge between them, and as a result, K91 in HSF1 adopts a partially buried conformation. In contrast, K82 in HSF2 is significantly more solvent exposed, and hence more accessible to a SUMO ligase in the cell. (Fig. S3e). Thus, the structural deviations revealed in these two DBDs could well explain the difference of SUMO modification between the two transcription factors.

Unspecific HSF1-DNA interactions
In this study, we cocrystallized the DBD of HSF1 with two tail-to-tail orientated HSE repeats, which was identical to that used in the structure of PDB 3HTS (Littlefield and Nelson 1999) or 5D5U (Neudegger et al., 2016). Unfortunately, we failed to observe a sequence-specific binding architecture in this high-resolution structure. Despite the protein-DNA stoichiometry remaining 1:1 in that crystal, only two DBD copies appeared bound to DNA (Fig. S1b).
Instead of being inserted into the major groove of DNA, the recognition helix of either monomer A or C lay along the double helix albeit over the major groove ( Fig. S4a). With such an orientation, conserved residues in the recognition helix and the preceding short loop, such as H63, R71 and Y76, failed to contact any nucleobases through direct or water-bridging hydrogen bonds, but instead interact with the phosphate backbone only.
A similar case has been observed in the cocrystal of K. lactis HSF-DBD bound to a 12-mer DNA, in which however the recognition helix also failed to contact DNA . Apparently, site-specific DNA binding was most likely blocked by crystal packing interaction in both cases. Different from those structures, however, major protein-DNA contacts were mediated by the recognition helix in our structure, which allows us to speculate that this binding manner might not be completely physiologically irrelevant. For most DNA-binding proteins, the first step in DNA recognition and binding would be the formation of unspecific contacts dominantly driven by electrostatic interactions (Ohlendorf and Matthew 1985). In this sense, a possible representation of such a binding architecture might be an initial phase in the dynamic process of HSF-HSE recognition and binding, and starting from this given state, the relative protein-DNA orientation is subsequently fine-tuned until final specific binding architecture is reached.  (Fig. S4b, c). In these cases, one DBD in a single trimer of HSF is supposed to bind DNA in an unspecific manner, although the other two DBDs must form sequence-specific interactions with the GAA triplets. As another scenario, the DNA binding architecture revealed in our structure might represent such a non-site-specific binding mode between one DBD in an HSF trimer and a non-consensus HSE repeat (Fig. S4b, c).      (c), Potential unspecific binding of one DBD in an HSF trimer to HSEs containing non-consensus pentanulcleotide repeats such as the gap-(b) or step-type HSE (c) Takemori 2007, Sakurai andEnoki 2010). Site-specific and unspecific bound DBDs are denoted by green and red circles, respectively.