Hydrogens and hydrogen-bond networks in macromolecular MicroED data

Introduction
Microcrystal electron diffraction (MicroED) has been successful in structure determination of crystalline biological specimens using electron cryo-microscopy (cryo-EM) (Nannenga et al., 2014a, 2014b; Shi et al., 2013; Yonekura et al., 2015), including novel structures (Clabbers et al., 2021; Rodriguez et al., 2015; Sawaya et al., 2016; Xu et al., 2019), as well as difficult-to-crystallize membrane proteins in detergent and lipid mixtures (Liu and Gonen, 2018; Martynowycz et al., 2020; Martynowycz et al., 2021b). As electrons interact more strongly with matter than X-rays (Henderson, 1995), the crystal volume required for useful diffraction is typically about a million times smaller. Electrons are scattered by the electrostatic potential, and the strength of scattering depends on the charged state of atoms (Cowley, 1995). The effects of charge distribution are already apparent at moderate to low resolution (Yonekura et al., 2015, 2018), and the charged state of residues in macromolecules has previously been investigated using electron crystallography (Kimura et al., 1997; Mitsuoka et al., 1999; Yonekura et al., 2015).
Electrostatic potential maps obtained from electron scattering can provide strong contrast for identifying hydrogen atoms, which has enabled localizing hydrogens in electron diffraction structures of small molecule organics and peptide fragments (Dorset, 1995; Gallagher-Jones et al., 2018; Gruene et al., 2018; Jones et al., 2018; Palatinus et al., 2017; Rodriguez et al., 2015; Sawaya et al., 2016). Identifying the positions of hydrogen atoms and visualizing their resulting hydrogen bonding networks are crucial for understanding protein structure and function, such as resolving precise drug or ligand binding interactions (Clabbers et al., 2020; Martynowycz et al., 2021b; Purdy et al., 2018) or elucidating mechanisms for substrate transfer in membrane protein transporters and channels (Gonen et al., 2005; Liu and Gonen, 2018). In single-particle cryo-EM imaging, individual hydrogen atom positions were localized from reconstructions of apoferritin at 1.2 Å resolution (Maki-Yonekura et al., 2021; Nakane et al., 2020) and for the GABAA receptor at 1.7 Å resolution (Nakane et al., 2020). In these studies, hydrogen atoms were identified by omitting them from the model and inspecting the peaks in a calculated mFo-DFc difference map following refinement in Servalcat, which is based on crystallographic refinement routines implemented in REFMAC5 (Yamashita et al., 2021).
Visualizing hydrogen atoms in macromolecular X-ray crystallography generally requires (sub-)atomic resolution data. The accuracy of localizing hydrogens varies with local structural flexibility, which is reflected by the temperature factors. Typically, crystals of macromolecules are more disordered than peptides or small molecules and have a much higher solvent content. Therefore, in the absence of high-quality, atomic resolution data, identification of hydrogen atoms in macromolecular MicroED structures has thus far remained elusive.
Recently, we reported the structure of triclinic hen egg-white lysozyme at 0.87 Å resolution from electron-counted MicroED data (Martynowycz et al., 2022). MicroED data were collected from 16 crystal lamellae and the structure was phased ab initio as described previously (Martynowycz et al., 2022). Following density modification, individual atoms could be resolved at sub-Ångström resolution, enabling automated model building of the entire structure without reference to a previously determined homologous model (Martynowycz et al., 2022). The improvement in data accuracy and resolution in this study compared to previous efforts was realized by combining focused ion-beam milling to produce approximately 300 nm thin crystalline lamellae ideal for cryo-EM at 300 kV (Martynowycz et al., 2021a), and collecting data in electron-counting mode at a significantly reduced total exposure of only 0.64 e⁻ Å⁻² per crystal (Martynowycz et al., 2022). A low exposure rate is required for electron counting as it ensures that the rate of scattered electrons remains within the linear range of the camera. Lowering the total exposure also reduces the effects of radiation damage that can affect the structural integrity of the protein and the ability to localize hydrogen atoms (Leapman and Sun, 1995).
Here, we set out to further refine the ab initio model resulting from automated building against the subatomic resolution MicroED data to closely examine the individual hydrogen atom positions. We demonstrate that over a third of all hydrogen atoms can be identified from strong difference peaks, the most complete view of a macromolecular hydrogen network visualized by electron diffraction to date. We describe the hydrogen bonding interactions that are observed, as well as the charged states of residues and hydrogen bond networks. Furthermore, analysis of the hydrogen bond lengths from the MicroED data reveals that these are more accurately described by the inter-nuclei distances. The results illustrate that MicroED can provide accurate structural information on hydrogen atoms and hydrogen bonding interactions.

Crystallization and sample preparation
Crystalline lamellae of triclinic lysozyme were prepared as described previously (Martynowycz et al., 2022). Briefly, crystals of hen egg-white lysozyme (Gallus gallus) were grown by dissolving 10 mg/ml protein in a solution of 0.2 M sodium nitrate and 50 mM sodium acetate at pH 4.5. After incubation overnight at 4 °C an opaque suspension was observed. After further incubation for one week at room temperature a crystalline slurry containing microcrystals appeared. Samples were prepared by depositing 3 μl of the crystalline slurry onto a glow-discharged EM grid (Quantifoil, Cu 200 mesh, R2/2 holey carbon). Excess liquid was blotted away and the sample was vitrified using a Leica GP2 vitrification robot. Grids were transferred to an Aquilos dual-beam FIB/SEM (Thermo Fisher) and crystals were milled to lamellae with an optimal thickness of approximately 300 nm as described previously (Martynowycz et al., 2021a; Martynowycz et al., 2022).

Data collection and processing
Electron-counted MicroED data were collected on a Titan Krios 3Gi TEM (Thermo Fisher) operated at 300 kV as described previously (Martynowycz et al., 2022). Briefly, the TEM was set up for low exposure data collection using a 50 μm C2 aperture, spot size 11, and a beam diameter of 25 μm. A 100 μm SA aperture was used, corresponding to an area of 2 μm diameter on the specimen. Crystalline lamellae were continuously rotated over a range of 84° at a rotation speed of 0.2°/s over 420 s with a total exposure of approximately 0.64 e⁻ Å⁻² per dataset. Data were recorded on a Falcon 4 direct electron detector (Thermo Fisher) in electron-counting mode operating at an internal frame rate of 250 Hz. Data from 16 crystal lamellae were integrated using XDS (Kabsch, 2010) and scaled and merged in AIMLESS (Evans and Murshudov, 2013). The structure was phased ab initio by placing a three-residue idealized α-helical fragment using PHASER (McCoy et al., 2007) followed by density modification in ACORN (Foadi et al., 2000). The entire structure was built automatically using BUCCANEER (Cowtan, 2006) and refined in REFMAC5 using electron scattering factors.
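The collection parameters quoted above are mutually consistent, as a short arithmetic check illustrates. The sketch below uses only the values stated in this section; the derived per-second and per-frame quantities are illustrative calculations rather than values reported in the text, and all variable names are hypothetical.

```python
# Values quoted in the text
rotation_range_deg = 84.0     # rotation wedge per lamella (degrees)
rotation_speed_deg_s = 0.2    # stage rotation speed (degrees per second)
total_exposure = 0.64         # total exposure per dataset (e-/A^2)
frame_rate_hz = 250.0         # internal detector frame rate (Hz)

# Derived quantities (illustrative only, not reported in the text)
duration_s = rotation_range_deg / rotation_speed_deg_s   # collection time
exposure_rate = total_exposure / duration_s               # e-/A^2 per second
internal_frames = duration_s * frame_rate_hz              # internal frames
exposure_per_frame = total_exposure / internal_frames     # e-/A^2 per frame
deg_per_frame = rotation_speed_deg_s / frame_rate_hz      # wedge per frame

print(f"collection time:    {duration_s:.0f} s")
print(f"exposure rate:      {exposure_rate:.2e} e-/A^2/s")
print(f"internal frames:    {internal_frames:.0f}")
print(f"exposure per frame: {exposure_per_frame:.2e} e-/A^2")
print(f"rotation per frame: {deg_per_frame:.1e} degrees")
```

The collection time reproduces the 420 s stated above, and the very small exposure per internal frame is in line with the low exposure rate that electron counting requires.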

Identification of hydrogen atoms
The structure was manually inspected and remodeled using Coot (Emsley et al., 2010), and re-refined with REFMAC5 using electron scattering factors. Hydrogen atoms were added in idealized riding positions. A hydrogen-only omit map was calculated from the final structural model by REFMAC5. Peaks in the mFo-DFc difference map at a threshold ≥ 2.0σ above the mean were identified and listed using PEAKMAX in the CCP4 software package (Winn et al., 2011). Difference peaks that fell within 0.5 Å of an idealized riding position were assigned as hydrogen atoms.
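The assignment step described above can be sketched as a nearest-peak search. This is a minimal illustration, not the procedure used with PEAKMAX: the function name, argument names, and toy coordinates are hypothetical, coordinates are assumed to be Cartesian (in Å) and already placed in the same unit cell, lattice translations are ignored, and "within 0.5 Å" is interpreted as a distance less than or equal to the cutoff.

```python
import numpy as np

def assign_hydrogen_peaks(peak_xyz, peak_sigma, ideal_h_xyz,
                          sigma_cutoff=2.0, dist_cutoff=0.5):
    """Match idealized riding-hydrogen positions to difference-map peaks.

    Returns {hydrogen_index: (peak_index, distance)} for every hydrogen
    whose nearest peak at or above sigma_cutoff lies within dist_cutoff.
    """
    peak_xyz = np.asarray(peak_xyz, dtype=float).reshape(-1, 3)
    peak_sigma = np.asarray(peak_sigma, dtype=float)
    ideal_h_xyz = np.asarray(ideal_h_xyz, dtype=float).reshape(-1, 3)

    # Keep only peaks at or above the sigma threshold
    keep = np.flatnonzero(peak_sigma >= sigma_cutoff)
    assigned = {}
    if keep.size == 0 or ideal_h_xyz.shape[0] == 0:
        return assigned

    # Pairwise distances, shape (n_hydrogens, n_kept_peaks)
    diff = ideal_h_xyz[:, None, :] - peak_xyz[keep][None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # Nearest surviving peak for each idealized hydrogen position
    nearest = dist.argmin(axis=1)
    for h_idx, p_local in enumerate(nearest):
        d = dist[h_idx, p_local]
        if d <= dist_cutoff:
            assigned[h_idx] = (int(keep[p_local]), float(d))
    return assigned

# Toy data (made up for illustration)
peaks = [[0.0, 0.0, 1.0], [5.0, 5.0, 5.0], [0.1, 0.1, 2.0]]
sigmas = [3.1, 2.5, 1.5]
riding = [[0.0, 0.0, 0.86], [0.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
print(assign_hydrogen_peaks(peaks, sigmas, riding))
print(assign_hydrogen_peaks(peaks, sigmas, riding, sigma_cutoff=1.0))
```

In the toy data, the second riding position only gains a match when the threshold is lowered to 1.0σ, which mirrors the trade-off between completeness and false positives at lower contour levels.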

Figure and table preparation
Figures were prepared using ChimeraX, and the matplotlib package in Python 3.6. Figures were arranged in PowerPoint. Tables were arranged in Excel.

Identifying hydrogen atoms in macromolecular MicroED data
First, the structural model of triclinic lysozyme resulting from automated building was refined using electron scattering factors, isotropic atomic displacement parameters, and the default riding hydrogen model in REFMAC5. Twelve alternate side-chain conformations were modeled upon visual inspection using Coot (Emsley et al., 2010), and their occupancies were refined. The model was then refined using anisotropic B-factors until convergence (Supplementary Table 1). A crystallographic mFo-DFc difference map was calculated using a model without hydrogen atoms (Yamashita et al., 2021). Peaks in the hydrogen omit difference map at greater than or equal to 2.0σ above the mean were then identified using PEAKMAX, and those within 0.5 Å distance from any idealized riding position were identified as potential hydrogen atoms. In this manner, we located 376 out of 1067 possible hydrogen atoms, corresponding to about 35 % of the entire structure. Within 10 Å of the model, including solvent regions, 1369 peaks are identified at a threshold ≥ 2.0σ. Lowering the threshold to 1.0σ revealed a total of 562 hydrogen atom positions, approximately 53 %. At contour levels below 2.0σ, the difference map is noisier, increasing the chance of false positives and making it more challenging to unambiguously identify peaks as hydrogen atoms. Nevertheless, these results constitute the most complete hydrogen bonding network visualized to date by macromolecular MicroED (Table 1).
Overall, the protein main chain is expected to be more rigid than the side chains; we consequently expect more hydrogen atoms to be found in the backbone than in the protein side chains. At the 2.0σ threshold, we identified 61 out of 141 possible Cα-H hydrogens and 76 out of 127 peptide N-H hydrogens, corresponding to approximately 43 and 60 % of the entire backbone structure, respectively (Table 1, Supplementary Tables 2, 3 and 8). The backbone hydrogen atoms are structurally important and can be involved in forming and stabilizing secondary structural elements via hydrogen-bonding interactions. For example, the structure of lysozyme has two short antiparallel β-strands and we could identify three strong difference peaks at > 3.0σ indicating the positions of those hydrogen atoms involved in hydrogen bonding interactions (Fig. 1a, Supplementary Video 1). The average N-H distance in the β-strands is 1.14(26) Å, and the distance between the amide group hydrogen donor and carbonyl acceptor is 2.76(9) Å (Table 2). Interestingly, whereas the Asp52 and Gly54 N-H distances are close to the idealized positions, the difference peak for the Asn44 N-H is located at an almost equal distance between the donor and the Asp52 carbonyl acceptor (Fig. 1a, Table 2). The structure of lysozyme is further composed of several short helices, and we could identify a total of 15 hydrogen bonding interactions in the three major α-helices (Table 2). For example, in the longest 12-residue α-helix we identified 6 out of 10 possible hydrogen bonds based on strong difference peaks at > 2.7σ (Fig. 1b, Supplementary Video 2). The average peptide N-H distance for the α-helices is 0.97(14) Å with an average distance between donor and acceptor of 2.84(13) Å (Table 2).
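The percentages quoted in this section follow directly from the reported counts. The short check below recomputes them; the counts are taken from the text, while the dictionary labels are hypothetical names chosen for illustration.

```python
# (identified, possible) counts as quoted in the text
counts = {
    "all hydrogens, >= 2.0 sigma": (376, 1067),
    "all hydrogens, >= 1.0 sigma": (562, 1067),
    "backbone CA-H, >= 2.0 sigma": (61, 141),
    "peptide N-H, >= 2.0 sigma": (76, 127),
}

# Fraction of possible hydrogen atoms located, as a percentage
percent = {label: 100.0 * found / total
           for label, (found, total) in counts.items()}

for label, value in percent.items():
    print(f"{label}: {value:.1f} %")
```

Rounded to whole numbers, these values agree with the approximately 35, 53, 43, and 60 % stated above.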
Higher flexibility and alternate conformations can affect localizing hydrogen atoms in the side chains. Nevertheless, we could successfully localize side-chain hydrogen atoms in the data and identify several hydrogen-bonding interactions between side-chain atoms (Fig. 2, Supplementary Table 2). For example, a difference peak at 2.4σ can be resolved between His15-NE2 and Thr89-OG1 indicating a possible shared hydrogen bond between both side chains (Fig. 2a). As expected at pH 4.5, the data show the solvent-exposed histidine to be protonated at ND1, although the hydrogen distance and angle are different from idealized geometry (Fig. 2a). Another example of hydrogen bonding interactions is illustrated for Tyr53-OH acting as a hydrogen donor to Asp66-OD1 with a strong difference peak at 3.4σ (Fig. 2b, Supplementary Table 2).
In single-particle cryo-EM, it was previously observed that acidic side chains were poorly resolved at moderate to low resolution owing to radiation damage and due to the rapid falloff of the electron scattering factors for negatively charged atoms at lower scattering angles (Maki-Yonekura et al., 2021; Yonekura et al., 2015, 2018). In the MicroED data, the acidic aspartate and glutamate residues and their negatively charged side-chain carboxyl groups are generally well resolved (Fig. 2c). Additionally, clear difference peaks at > 2.3σ were identified in the data for the amide side-chain nitrogen for asparagine and glutamine residues, making it possible to clearly distinguish between the nitrogen and oxygen atoms of the side-chain amide group (Fig. 2c).
Difference peaks were also identified for several water molecules that are involved in hydrogen bonding interactions with the protein backbone and side chains (Fig. 2d, Supplementary Video 3). Such hydrogen bonding networks can act as long-range proton transfer wires. For example, a water molecule is coordinated with the adjacent Ser91, Leu56, and Tyr53 residues and shows two strong difference peaks at ≥ 2.7σ (Fig. 2d). Two additional water molecules show hydrogen atom peaks at ≥ 2.2σ and are involved in hydrogen bonding interactions with each other and residues of the neighboring protein backbone (Fig. 2d).
The O-H hydrogen bond lengths and angles of the water molecules are reasonably close to ideal values, except for one of the difference peaks for w1001, which is significantly shorter at 0.64 Å. The distance between the w1001-O proton donor and the Tyr53-O proton acceptor is, however, close to ideal values at 2.75 Å.

Hydrogen bond distances
The sheer number of hydrogen atoms visualized in this study allows us to measure and report hydrogen bond distances in a way previously not possible in cryo-EM (Supplementary Tables 2-10; Fig. 3). Electrons are scattered by the potential field generated from electron clouds and the nuclei. The peaks in an electrostatic potential map are therefore expected to reflect the inter-nuclei distances more than distances between centers of mass of electron clouds as observed in X-ray diffraction. Whereas for non-hydrogen atoms the centroids and nuclei coincide, for hydrogen atoms the centroid of the electron cloud does not match the nucleus. We refined the structure using the default riding hydrogen model based on hydrogen distances between the electron cloud centroids using restraints derived from X-ray scattering. We analyzed the identified hydrogen atom difference peaks in the data at ≥ 2.0σ and calculated the average distance for each of the hydrogen bond types (Table 1, Fig. 3, Supplementary Tables 2-10). The number of observations for some bond types is insufficient for a rigorous statistical analysis. We do however find an average Cα-H distance for the main chain of 1.11(13) Å for 61 hydrogen bonds, compared to idealized values of 0.98 and 1.10 Å for X-ray and neutron diffraction, respectively (Table 1, Fig. 3, Supplementary Table 3). The average distance for all N-H bonds is 1.03(16) Å for 83 observations, compared to idealized values of 0.86 and 1.01 Å for X-ray and neutron diffraction, respectively (Supplementary Tables 2 and 8). Interestingly, the distances for the amide N-H bonds that are involved in hydrogen bonding interactions with neighboring residues are slightly longer compared to those that are not involved in such electrostatic interactions (Table 1, Fig. 3).
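The comparison between the observed averages and the two sets of idealized values can be made explicit with a small calculation. The sketch below uses only the means, standard deviations, and idealized lengths quoted above; the function and dictionary names are hypothetical, and expressing the offset in units of the reported standard deviation is an illustrative choice rather than an analysis taken from the text.

```python
# Observed (mean, standard deviation) in angstrom, as quoted in the text
observed = {"CA-H": (1.11, 0.13), "N-H": (1.03, 0.16)}

# Idealized bond lengths in angstrom, as quoted in the text
ideal = {
    "CA-H": {"xray": 0.98, "neutron": 1.10},
    "N-H": {"xray": 0.86, "neutron": 1.01},
}

def closest_reference(bond):
    """Return the reference closest to the observed mean and all offsets."""
    mean, _sd = observed[bond]
    offsets = {ref: mean - value for ref, value in ideal[bond].items()}
    best = min(offsets, key=lambda ref: abs(offsets[ref]))
    return best, offsets

for bond, (mean, sd) in observed.items():
    best, offsets = closest_reference(bond)
    for ref, delta in offsets.items():
        print(f"{bond} vs {ref}: offset {delta:+.2f} A "
              f"({abs(delta) / sd:.2f} sd)")
    print(f"{bond}: closest to the {best} value")
```

For both bond types the observed mean lies closer to the inter-nuclei (neutron) value, although the offsets from the X-ray values are comparable in size to one standard deviation, consistent with the caution expressed about the limited statistics.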
These results suggest an elongation of the hydrogen bond lengths compared to the electron cloud centroid distances assumed in the riding hydrogen model, although the number of observations for each type is rather limited and the standard deviations from the mean value are quite large (Table 1, Fig. 3). Furthermore, elongation of the hydrogen bond distances may be reflected by higher B-factors (Nakane et al., 2020; Yamashita et al., 2021), or may represent some excited state (Gallagher-Jones et al., 2018). Nevertheless, we find an overall trend that the Cα-H and N-H bond lengths are closer to inter-nuclei distances (Gruene et al., 2014; Williams et al., 2018). This observation agrees with previous electron diffraction and imaging experiments that show an apparent elongation of the hydrogen bond lengths compared to X-ray diffraction (Maki-Yonekura et al., 2021; Nakane et al., 2020). Refinement of structural models derived from electron scattering would therefore benefit from more appropriate restraints specific for electrons, including a more accurate riding hydrogen model, as well as taking the electrostatic potential of the crystal into account (Gruene et al., 2014; Murshudov et al., 2011; Williams et al., 2018; Yamashita et al., 2021; Yonekura et al., 2015, 2018).
Table footnote (fragment): c Idealized inter-nuclei hydrogen bond lengths used in neutron diffraction (Gruene et al., 2014).

Conclusions
The results demonstrate that hydrogen atom positions can be accurately identified in macromolecular MicroED data. As with X-ray crystallography, this will typically require atomic resolution data or better (Dauter et al., 1997; Howard et al., 2004; Kosinska Eriksson et al., 2013; Ogata et al., 2015; Walsh et al., 1998; Wang et al., 2007). In comparison, the structure of triclinic lysozyme was determined previously using X-ray diffraction at 120 K and room temperature to 0.93 and 0.95 Å resolution, respectively (Walsh et al., 1998). The single-crystal low-temperature X-ray structure is of high quality and generally has more clearly visible hydrogen atoms than the room temperature model merged from three crystal datasets. Difference maps contoured at 1.9σ show hydrogen atoms in residues within the better-defined regions of the structure, and at 1.8σ contour level, 77 out of 127 peptide N-H atoms (61 %) are identified (Walsh et al., 1998). The number of hydrogen atoms localized in the low-temperature structure is similar to the MicroED structure at comparable resolution, even though the MicroED intensity and model statistics are worse (Supplementary Table 1, Supplementary Fig. 1) (Martynowycz et al., 2022). Out of a total of 112 waters in the MicroED structure, 65 are within 1 Å distance from waters located in the X-ray map. Within 2 Å, a total of 97 waters match with those located using the X-ray data. The lower accuracy of the MicroED data can in part be attributed to non-isomorphism from merging of 16 crystal datasets and lower completeness in the highest resolution shells (Supplementary Fig. 1). Additional factors that contribute to the errors are multiple scattering interactions and absorption that can affect the accuracy of the intensities and increase the background noise. Furthermore, inaccurate modeling of the electron form factors and the electrostatic potential in structure refinement can contribute to higher model R-factors.
Figure caption (beginning not recovered): … Table 2. Electrostatic potential 2mFo-DFc maps are contoured at 4.0σ (blue) and mFo-DFc difference maps are shown at 2.5σ (green and red for positive and negative, respectively). Carbon atoms are shown in brown, nitrogen in blue, and oxygen in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 2
Hydrogen bond distances and angles for secondary structure.
Columns recovered: Donor-H⋯Acceptor, Diff. peak (table body not recovered).

Compared to X-ray diffraction, electrons are expected to provide better contrast for identifying hydrogen atoms at a similar resolution as the scattering factors fall off less steeply with decreasing atomic number. The lighter hydrogen atoms are therefore expected to be better resolved next to the heavier atoms, which might explain why we can identify many hydrogen atoms even though the MicroED data appear noisier. This is further supported by a comparison between apoferritin models from X-ray crystallography and single-particle cryo-EM showing that hydrogen atoms are more clearly visible in the latter (Yamashita et al., 2021). More recently, a significantly higher resolution structure of triclinic lysozyme was solved ab initio at 0.65 Å by X-ray diffraction (Wang et al., 2007). At this resolution, approximately 31 % of all hydrogen atoms in main and side chains could be identified at 3.0σ or higher. We would anticipate major improvements in hydrogen atom localization in MicroED data upon further improving data quality and increasing the resolution. Previously, hydrogen atoms were successfully identified in protein complexes by single-particle cryo-EM. In comparison to the results presented here, these studies reported that about 70 % of the expected number of hydrogen atoms could be identified above a threshold level of 2.0σ using hydrogen-only omit maps from atomic resolution reconstructions of apoferritin (Maki-Yonekura et al., 2021; Yamashita et al., 2021). Remarkably, about 17 % of possible hydrogen atoms could be identified from data as low as 1.84 Å resolution (Yamashita et al., 2021). In imaging, the phase information is retained during reconstruction, and images are filtered to remove noise and to select a specific conformational state. The resolution is therefore a local feature of the map whereas the B-factor is a global parameter applied in map sharpening or blurring. This is unlike a crystallographic map, where the structural flexibility or disorder is modeled locally using alternate conformations and per-atom refined B-factors. In crystallography, resolving detailed features such as hydrogen atoms is affected by local disorder. In the MicroED structure, twelve residues are modeled with alternate side-chain conformations at low occupancy, making it more challenging to identify hydrogen atoms at these positions. The mean B-factor over all atoms in the model is 11.98 Å², and most outliers are on the exterior of the protein facing the solvent. Low completeness in the higher resolution shells and non-isomorphism while merging data of multiple crystals could both contribute to increased B-factors.
The last two C-terminal residues in particular have high temperature factors; these were also poorly resolved in the high-resolution X-ray diffraction structure (Walsh et al., 1998). Indeed, most hydrogens can be identified within the more stable core of the protein, whereas the residues facing the solvent have higher flexibility and B-factors.
In all, 377/1067 (35 %) hydrogen atoms could be located at ≥ 2.0σ and we illustrate several examples of well-resolved hydrogen atom positions and hydrogen bonding interactions between protein residues and solvent molecules. This is the most complete hydrogen network map for macromolecular MicroED data to date, and these results provide a glimpse of the information that can be obtained by electron scattering, opening new avenues for further experiments investigating hydrogen bonding networks in protein structures. At the current stage, the difference map becomes increasingly noisy at contour levels below 2.0σ, making it more challenging to unambiguously identify hydrogen peaks. Future efforts that can enhance the localization of hydrogen atoms should be focused on improving data accuracy and increasing resolution even further. Energy filtration can improve data quality by discarding inelastically scattered electrons, improving the detection of weak peaks at high resolution and at the lower scattering angles that are shaded by the direct beam (Yonekura et al., 2015, 2019). It would also mean that the exposure could be lowered even further without losing the weak signal from high-resolution reflections to the noise of the background. Energy filtering does not exclude multiple elastic scattering, which may affect the measured kinematic intensities (Cowley, 1995; Fujiwara, 1959). For any typical hydrated protein crystal, these effects are suggested to be far less detrimental to data quality compared to inelastic scattering (Latychevskaia and Abrahams, 2019; Martynowycz et al., 2021a). Dynamical structure refinement can enhance the localization of hydrogen atoms in small molecule structures (Palatinus et al., 2017), but its implementation is computationally expensive and has yet to be extended to macromolecules that include bulk solvent.
In recent experiments, recording MicroED data using a direct electron detector in electron-counting mode significantly improved data quality, and we expect further benefits from faster readout and better electron-counting algorithms using electron-event representation (Guo et al., 2020; Nakane et al., 2020).

Figure caption (beginning not recovered): … residues and for several water molecules. Hydrogen atoms were assigned from a hydrogen-only omit map for peaks at ≥ 2.0σ and within 0.5 Å from their idealized positions. (a) Strong difference peaks indicate hydrogen atom positions for His15, as well as a possible hydrogen bond interaction between His15-NE2 and a neighboring Thr89-OG1. The histidine residue appears to be protonated at ND1, which is consistent with pH 4.5 of the crystallization condition. (b) Hydrogen atoms are indicated by difference peaks for two residues, as well as a potential hydrogen bonding interaction between Tyr53-OH and Asp66-OD1. (c) Acidic side chains showing well-resolved atoms. Strong difference peaks for side-chain hydrogen atoms can be observed in asparagine and glutamine residues. (d) Illustration of a hydrogen bonding network involving water molecules and several protein residues. The inset shows the hydrogen bond distances for the water molecules. Electrostatic potential 2mFo-DFc maps are contoured at (a) 2.5σ (blue) and (b-d) at 3.0σ (blue); mFo-DFc difference maps are shown at 2.3σ (green and red for positive and negative, respectively). Carbon atoms are shown in brown, nitrogen in blue, and oxygen in red.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Coordinates and structure factors have been deposited to the PDB under accession code 7ULY. Maps have been deposited to the EMDB under accession code EMD-26596.

Fig. 3. Hydrogen bond distances for macromolecular MicroED data. Hydrogen bond distances in Å are shown as histogram plots with a normal distribution fitted to the data. Idealized hydrogen bond lengths between electron cloud centroids used in X-ray diffraction are indicated by a teal dotted line; idealized inter-nuclei hydrogen bond lengths used in neutron diffraction are indicated by the orange dotted line (see also Table 1, Supplementary Tables 2, 3, 6-8) (Gruene et al., 2014).