Raw nuclear magnetic resonance data of human linker histone H1x, lacking the C-terminal domain (NGH1x), and trajectory data of nanosecond molecular dynamics simulations of GH1x- and NGH1x-chromatosomes

Linker histone H1 plays a vital role in the packaging of DNA. H1 has a tripartite structure: a conserved central globular domain that adopts a winged-helix fold, flanked by highly variable and intrinsically unstructured N- and C-terminal domains. The datasets presented in this article include raw 2D and 3D BEST-TROSY NMR data [1H-15N HSQC; 15N and 13C HNCO, HN(CO)CACB, HNCACB, HN(CA)CO] recorded for NGH1x, a truncated version of H1x containing the N-terminal and globular domains, but lacking the C-terminal domain. Experiments were conducted on double-labelled (15N and 13C) NGH1x in 'low' and 'high salt' to investigate the secondary structure content of the N-terminal domain of H1x under these conditions. We provide modelled structures of NGH1x (in low and high salt) based on the assigned chemical shifts in PDB format. The high salt structure of NGH1x (globular domain of H1x [GH1x; PDB: 2LSO] with the H1x NTD) was docked to the nucleosome to generate NGH1x- and GH1x-chromatosomes. The GH1x-chromatosome was generated for comparative purposes to elucidate the role of the N-terminal domain. We present raw data trajectories of molecular dynamics simulations of these chromatosomes in this article. The MD dataset provides nanosecond resolution data on the dynamics of GH1x- vs NGH1x-chromatosomes, which is useful to elucidate the DNA binding properties of the N-terminal domain of H1x in chromatin, as well as the dynamic behaviour of linker DNA in these chromatosomes.


How data were acquired
NMR data were acquired on Bruker Avance III HD spectrometers, operating at 700 or 950 MHz 1H frequency, and equipped with cryogenically cooled triple-resonance (HCN) probes and pulsed z-field gradients at 5 °C (278 K). GH1x (PDB: 2LSO) and NGH1x [1] were docked onto a nucleosomal template containing 20 bp of linker DNA and complete core histone tail domains [12] and subjected to energy minimization in YASARA [2,3]. Molecular dynamics simulations were done using GROMACS [4,5] employing the AMBER03 force field [6]. Final energy minimization was done using the steepest descent algorithm in YASARA [7].

Value of the Data
The NMR data provide information on the structure and dynamics of the N-terminal domain (NTD) and globular domain of H1x, whereas the MD trajectory data provide atomic resolution information on the dynamics of these domains in chromatosomes.
Structural biologists, molecular biologists, epigeneticists and computational biologists interested in chromatin structure or intrinsically disordered proteins will find the data useful. The NMR datasets are useful for the development of NMR pulse sequences for experiments on intrinsically unstructured proteins and/or experiments conducted in high ionic strength conditions. The MD data can serve as a benchmark in the development of molecular dynamics simulations of chromatosomes and for the development of experiments to verify chromatosome models. Together, the data can be used to evaluate the effect of the N-terminal domain of H1x on the position and orientation of the globular domain of H1x in the nucleosome, chromatosomal protein-DNA interactions, and linker DNA conformation.

Raw NMR data
Raw NMR data recorded on Bruker Avance III HD spectrometers, operating at 700 or 950 MHz 1H frequency, and equipped with cryogenically cooled triple-resonance (HCN) probes and pulsed z-field gradients at 5 °C (278 K) are provided. A description of the data files is provided in the accompanying data repository entry. The raw data provided form the basis of the findings published in [12]. We also offer models of NGH1x (in low and high ionic strength conditions) in PDB format.

Raw MD trajectory data
The data comprise nanosecond molecular dynamics simulation trajectories generated in GROMACS. GROMACS molecular dynamics run files are provided in MDP format and the trajectory files in XTC format. Water molecules and ions were not included in the trajectory files. Files containing molecular structures in Gromos87 format (.gro) are also included. Trajectory files span 600 ns, and coordinates and velocities were recorded at 20 ps intervals. Trajectories can be viewed in VMD [11]. We also provide MD quality control data and trajectory energy parameters (Table 1). See also Fig. 1 and Table 2.
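Because the structure files use the fixed-column Gromos87 (.gro) layout, they can be read without specialised software. The following is a minimal sketch of such a reader, not part of the deposited dataset: the function names `format_gro_atom` and `parse_gro`, and the two-atom sample record, are hypothetical and serve only to illustrate the column layout (residue number, residue name, atom name, atom number, then x, y, z in nm, followed by a final line with the box vectors).

```python
from typing import NamedTuple


class GroAtom(NamedTuple):
    resid: int
    resname: str
    name: str
    index: int
    xyz_nm: tuple


def format_gro_atom(resid, resname, name, index, x, y, z):
    """Write one atom line in the fixed-column .gro layout (positions only)."""
    return f"{resid:5d}{resname:<5s}{name:>5s}{index:5d}{x:8.3f}{y:8.3f}{z:8.3f}"


def parse_gro(text):
    """Parse title, atoms and box vectors from the text of a .gro file."""
    lines = text.splitlines()
    title = lines[0].strip()
    n_atoms = int(lines[1])
    atoms = []
    for line in lines[2:2 + n_atoms]:
        atoms.append(GroAtom(
            resid=int(line[0:5]),
            resname=line[5:10].strip(),
            name=line[10:15].strip(),
            index=int(line[15:20]),
            # Coordinates occupy three 8-character columns, in nm.
            xyz_nm=(float(line[20:28]), float(line[28:36]), float(line[36:44])),
        ))
    # The line after the last atom holds the box vectors in nm.
    box_nm = tuple(float(v) for v in lines[2 + n_atoms].split())
    return title, atoms, box_nm


# Hypothetical two-atom record, used only to exercise the parser.
SAMPLE_GRO = "\n".join([
    "illustrative sample, not taken from the dataset",
    f"{2:5d}",
    format_gro_atom(1, "MET", "N", 1, 1.0, 2.0, 3.0),
    format_gro_atom(1, "MET", "CA", 2, 1.1, 2.1, 3.1),
    f"{18.73:10.5f}{18.73:10.5f}{18.73:10.5f}",
])

title, atoms, box_nm = parse_gro(SAMPLE_GRO)
print(len(atoms), atoms[0].name, box_nm)
```

For the XTC trajectories themselves, a dedicated reader such as VMD [11] remains the practical choice, since XTC is a compressed binary format.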
The MD simulation data presented in this manuscript differ from those previously published [1]. The simulations in [1] formed part of the docking procedure to generate the starting structures (Fig. 2).

[Figure caption, beginning lost in extraction] ... 13x diluted sample). Flow-through from the spin concentrator was loaded in the lane labelled 'FT1'. The gel was stained with Coomassie Brilliant Blue R-250. NGH1x migrated slower than expected due to its highly basic nature.

[Figure caption] GH1x-chromatosome frame 1 (A) and NGH1x-chromatosome frame 1 (B) are illustrated. DNA is shown in light purple, core histones and core histone terminal tails in red, GH1x in light blue, and NGH1x in green cyan. Cartoon structures were rendered with PyMOL.

NMR spectroscopy
The generation, expression, and purification of NGH1x [human H1x residues 1-120] were performed as reported in [12]. Briefly, a pET22b(+)-based expression construct was employed, into which a stop codon was inserted 5′ to the vector's 6x His-tag to achieve expression of the untagged protein. Expression of NGH1x was achieved in E. coli BL21DE2 cells, and NGH1x was purified using a three-step FPLC procedure that incorporated hydrophobic interaction chromatography (HIC), ammonium sulphate precipitation and ion-exchange chromatography (IEX) [13]. NMR experiments were performed as described [12]. Two-dimensional (2D) 1H-15N BEST-TROSY spectra, as well as three-dimensional (3D) HNCO, HN(CO)CACB, HNCACB, HN(CA)CO and HADAMAC, were recorded on Bruker Avance III HD spectrometers (operating at 700 or 950 MHz 1H frequency).
Assigned spectra of NGH1x in low- and high ionic strength conditions based on the raw data provided here have been published [12].
The structure of NGH1x under low- and high salt conditions was modelled using TALOS+ and CS-ROSETTA.

Molecular dynamics simulations
GH1x (PDB: 2LSO) and the high salt structure of NGH1x were docked to a nucleosomal template containing complete core histone tails and 20 bp of linker DNA [19] using the docking algorithm described in [1]. The GH1x-chromatosome was generated for comparative purposes. The resulting GH1x- and NGH1x-chromatosomes served as starting structures for the MD simulations reported here. Energy minimizations (EM) of starting structures were done in YASARA using the AMBER03 force field and long-range electrostatics. Each starting structure was placed within a rectangular simulation cell (187.30 Å × 187.30 Å × 187.30 Å), and the system was solvated with 203 464 explicit TIP3P solvent molecules.
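The cell dimensions and solvent count can be checked against each other with a short calculation. The sketch below assumes a bulk water number density of roughly 33.4 molecules per nm³ (the value for water at about 1 g/cm³; the TIP3P model density differs slightly), so the result is an estimate of how much of the cell is occupied by solvent rather than an exact figure.

```python
# Values quoted in the text.
BOX_EDGE_ANGSTROM = 187.30
N_WATERS = 203_464

# Assumption: approximate number density of bulk water at ~1 g/cm^3.
BULK_WATER_PER_NM3 = 33.4

box_edge_nm = BOX_EDGE_ANGSTROM / 10.0           # 1 nm = 10 Angstrom
box_volume_nm3 = box_edge_nm ** 3                # cubic cell volume
water_volume_nm3 = N_WATERS / BULK_WATER_PER_NM3 # volume the waters would fill in bulk
water_fraction = water_volume_nm3 / box_volume_nm3

print(f"cell volume        : {box_volume_nm3:.1f} nm^3")
print(f"bulk-water volume  : {water_volume_nm3:.1f} nm^3")
print(f"solvent fraction   : {water_fraction:.3f}")
```

Under this assumption the solvent accounts for a little over nine-tenths of the cell volume, leaving the remainder for the chromatosome and counter-ions, which is consistent with a fully solvated system.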
MD simulations were performed on the University of the Free State (UFS) High-Performance Computing (HPC) Cluster using GROMACS v4.6.7. The AMBER03 all-atom force field and TIP3P water model [14] were used. Periodic boundary conditions were applied, and long-range electrostatics were treated with the PME method (grid spacing 0.16 nm; cut-off 0.8 nm). P-LINCS [15] was implemented to constrain bonded hydrogen motions, and the SETTLE algorithm [16] was used to limit solvent motions.
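As an illustration of how the settings above map onto GROMACS run-parameter (MDP) options, the sketch below renders them as an MDP-style fragment. It is not a copy of the deposited MDP files, which remain the authoritative record. The following are assumptions: that the 0.8 nm cut-off refers to the real-space Coulomb cut-off (`rcoulomb`), and that constraining hydrogen motions corresponds to `constraints = h-bonds`. The helper name `render_mdp` is hypothetical.

```python
# Settings described in the text, expressed with GROMACS .mdp option names.
MDP_SETTINGS = {
    "pbc": "xyz",                     # periodic boundary conditions in all directions
    "coulombtype": "PME",             # particle mesh Ewald electrostatics
    "fourierspacing": 0.16,           # PME grid spacing in nm
    "rcoulomb": 0.8,                  # assumption: cut-off is the real-space Coulomb cut-off (nm)
    "constraints": "h-bonds",         # assumption: bonds involving hydrogen are constrained
    "constraint_algorithm": "lincs",  # P-LINCS is the parallel form of LINCS
}


def render_mdp(settings):
    """Render a dict of options as aligned 'key = value' lines."""
    width = max(len(key) for key in settings)
    lines = [f"{key:<{width}} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


MDP_TEXT = render_mdp(MDP_SETTINGS)
print(MDP_TEXT)
```

SETTLE does not appear as an option here because GROMACS applies it to rigid water molecules according to the water topology rather than through a run parameter.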
An initial EM run was performed to remove steric hindrance and irregular bond lengths in the starting structure. Two simulation runs were performed with nucleosome positions constrained for further equilibration: a 2 ns NVT simulation to equilibrate the temperature of the system to 300 K using the velocity rescaling thermostat [17], and a 20 ns NPT simulation to equilibrate the pressure of the system to 1 bar using the Parrinello-Rahman barostat [18].
GH1x- and NGH1x-chromatosome production runs were individually performed over 600 ns with a time step for integration of 2 fs at 300 K and 1 bar on the unconstrained nucleosome. Trajectory frames were saved every 20 ps to produce a total of 30 000 simulation frames. The production runs were each performed on a total of 192 cores and ran for 86 days on the HPC Cluster at the UFS. EM simulations were conducted with the steepest descent algorithm using GROMACS. Preliminary MD quality control analysis was performed in GROMACS. RMSD and drifts were calculated with full precision.
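The run length, time step and frame count reported for the production runs can be cross-checked with integer arithmetic. The sketch below uses the 20 ps save interval given in the description of the trajectory data, and assumes that the 86-day wall time applies to each production run; the derived core-hour and throughput figures are estimates under that assumption, not reported values.

```python
FS_PER_PS = 1_000
FS_PER_NS = 1_000_000

# Values quoted in the text, converted to femtoseconds to keep the maths exact.
production_length_fs = 600 * FS_PER_NS
time_step_fs = 2
save_interval_fs = 20 * FS_PER_PS

n_steps = production_length_fs // time_step_fs      # integration steps per run
steps_per_frame = save_interval_fs // time_step_fs  # steps between saved frames
n_frames = n_steps // steps_per_frame               # saved frames (excluding t = 0)

# Assumption: 86 days of wall time on 192 cores for each production run.
CORES = 192
WALL_DAYS = 86
core_hours = CORES * WALL_DAYS * 24
ns_per_day = 600 / WALL_DAYS

print(f"integration steps : {n_steps:,}")
print(f"steps per frame   : {steps_per_frame:,}")
print(f"saved frames      : {n_frames:,}")
print(f"core-hours per run: {core_hours:,}")
print(f"throughput        : {ns_per_day:.2f} ns/day")
```

A 20 ps interval over 600 ns gives 30 000 frames, which matches the reported total; a trajectory that also stores the initial frame at t = 0 would contain one more.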
Following MD simulations, energy minimizations were again performed using a steepest descent algorithm in GROMACS. The procedure terminated after energy convergence.
We provide MD quality control data and energy parameters in the associated Mendeley data entry.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.