Protonation of histidine rings using quantum-mechanical methods

Histidine protonation is fraught, but a new quantum-mechanical method can provide insight into choices.


Introduction
Protonation of histidine can take on three different forms: one H atom on either or both of the N atoms in the heterogeneous ring of the imidazole moiety.As noted by Malinska et al. (2015) This variability is of particular importance in protein structures, although it is also particularly recalcitrant to X-ray crystallographic characterization because of the limited resolution and the inability to detect H atoms that blight the method in routine applications.
This obstinate behaviour led to a procedure to determine the protonation of the histidine by matching the refined ring geometry with those in the Cambridge Structural Database (CSD; Groom et al., 2016).The protonation-determination procedure involved relaxing the restraints of the histidine side chain and using discrimination functions based on the CSD data to predict the protonation.The example used was PDB entry 4rj2: a 0.99 A ˚resolution model of Escherichia coli purine nucleoside phosphorylase that comprises a hexamer of subunits (A-F ), each containing six histidine residues.An upper B-factor limit of 20 A ˚2 on the post-refinement N atom was used to filter for accuracy.
The procedure was used to predict the presence or absence of a particular proton, resulting in some ambiguity in the final protonation.Unfortunately, histidine remained unruly as the method was only able to predict 61% of the 144 (completely ordered) proton positions in the model.It does, however, provide a valuable curated test set of histidine protonation.
Recently, the power of quantum-mechanical (QM) calculations was applied to the generation of restraints using the in situ geometries of the target molecule (Liebschner et al., 2023).Known as QM Restraints (QMR), this program is part of the Quantum Interface (QI) in the Phenix suite of programs (Liebschner et al., 2019).QI is a central location for interacting with quantum-mechanical modules in Phenix.The power of QM is that it does not require atomic properties, such as van der Waals radii, to be known in order to calculate the protonation state.QM methods rely on the specification of element and electron positions alone, minimizing the electron wavefunction to determine properties of the molecule.This makes them orthogonal to and independent of the methods used for many of the related macromolecular computational methods.
A key feature of the QI module is the ability to create a selection of the overall model representing the molecular environment of the entity of interest.The default mechanism is to include all entities such as amino acids, metals, waters and ligands within a buffer distance (with a default of 3.5 A ˚) of the user-defined selection.For a drug candidate, this selection will likely be a small molecule in a protein binding pocket.QI also prepares this small model (or cluster) for QM calculations by adding H atoms to solvent and appropriately protonating the protein main-chain terminations created by the selection process.The surrounding amino acids (and other entities) are frozen while the geometry of the ligand and all H atoms are minimized.By default, all of the atoms in the user-defined selection and the H atoms in the remainder of the cluster are energy-minimized using the QM method of choice.The minimized geometry is used to create geometric restraints that reflect the molecular environment of the binding pocket, which are then used in standard crystallographic refinement algorithms (Afonine et al., 2012).Two QM packages are currently supported: MOPAC (Stewart, 1990;Moussa & Stewart, 2024), which is distributed with the Python3 version of Phenix, and Orca (Neese et al., 2020), which is a third-party QM package supported by the QI module.This integration of MOPAC and Phenix means that nuances in the energy calculations used in this work can be handled automatically by the code (see the discussion below).Orca is also well suited to the QMR module, particularly for ligands with novel moieties.The default QM method when using MOPAC is PM6-D3H4, while the default method for Orca is PBEh-3c; however, this is easily changed.
For QMR, the entity of interest is usually a drug molecule or other ligand.However, any reasonable selection can be used to create a QI model, including, for example, a nonstandard amino acid, should the restraints for the side chain be needed.For the standard amino acid histidine, the protonation of the imidazole moiety is affected by the molecular environment which is extant in the selected QM model.

Histidine configurations
As stated earlier, there are three protonation states: an H atom on either or both of the N atoms in the heterogeneous ring.Each configuration can be identified by the name of the proton: HD1 and HE2 for the singly protonated states and, using the logical 'and', HD1^HE2 for the double protonation.It should be noted that the HE2 tautomer is slightly more stable than the HD1 tautomer.In addition, histidine sidechain rings can rotate around the � 2 torsion, generating the possibility of two conformations 180 � apart.This is part of the NQH flipping algorithm implemented in the MolProbity suite (Williams et al., 2018) as part of the hydrogen-adding program reduce.In general, the orientations of the N atoms in the ring are correct in a refined model.However, to cover all possibilities, including the rarer rotamers, possible flips of the side chain add three more possible conformations.
A new program, QM Flipping (QMF ), was developed to determine which of these six starting protonations/flips is most likely in an atomic model.Each of the six configurations are geometry-minimized to ascertain their feasibility.A geometry minimization can change the atomic positions greatly.As in the QMR calculations, the protein-environment atomic positions are frozen except for the H atoms.Because the histidine is the entity of interest, its atoms have more freedom.Allowing complete freedom produces spurious geometries, particularly in cases where the proposed protonation causes a clash.However, because the position of the histidine is reasonably well known, one can leverage the experimental information to determine the plane of the ring.Thus, the default has the plane of the ring restricted via the torsion angle towards the main chain.This results in better geometries for 'bad' protonations, as can be seen by comparing the left side of Fig. 1 (unrestricted) with the right (restricted).In this particular high-resolution example, because the water molecules have been placed the changes in geometries are minimal, but are still sufficient to extend outside the density.However, in poorly defined environments the final geometries can be very distorted.

Heats-of-formation comparisons
One additional attribute of QM-minimized geometries is the calculated minimum energy.These can be compared directly if the same number of atoms and electrons are in each configuration.For the trivial cases of asparagine and glutamine ('NQ' of 'NQH flips') this is the case, but for histidine two of the six configurations have an additional proton.Parenthetically, the planar restriction also results in higher energies for 'bad' protonations, providing more discrimination between states.An additional note for comparisons of the NQH flips is that each cluster needs to contain the same number of entities.This nuance required that the cluster needed to be chosen via a selection syntax rather than the buffer parameter, a subtlety automatically handled via the QI interface for QMF.
The semi-empirical QM models implemented in MOPAC are primarily intended for thermochemistry applications defining a heat of formation for each molecular configuration.These heats are relative to the standard state of the constituent atoms and a reservoir of electrons in vacuum.The relative stability of two configurations can be directly predicted by the difference between their heats of formation when they contain the same number of atoms and electrons.Otherwise, the difference implies a reaction transferring atoms and electrons to or from their standard reference states.In the case of an excess proton, the reference state is derived from molecular hydrogen gas, which includes neutralizing electrons taken from vacuum.This is an unphysiological reaction for biological applications, where the reference state of a proton is assumed to be solvation in water with a pH of 7.4.To account for this change of reference, the free energies reported by MOPAC need a correction term for each excess or deficiency of a proton to compare them with a neutral molecular configuration for biological applications.In some cases, heats of formation are also modified by an implicit solvent model approximating the dielectric response of the chemical environment that is not explicitly contained in the calculation.
To assign a biologically relevant proton heat of formation, we follow an established approach of fitting QM data to a simple semiempirical model for pK a (Matsui et al., 2012), where a and b are model parameters, G(HA) is the calculated heat of formation of solute molecule A and G(A À ) is the calculated heat of formation of the deprotonated solute.The model parameters will depend on the specific semiempirical model, and here we fit the correction for the PM6-D3H4 model (R ˇeza ´c ˇ& Hobza, 2012).In Table 1, we fit this model to minimize the root-mean-square (r.m.s.) error for 20 experimental reference values of pK a that include both N and O atoms as protonation sites.In these data, the amino acids arginine, histidine, lysine, aspartic acid, glutamic acid and tyrosine are in the zwitterionic form and are protonated on their side chains.These PM6-D3H4 calculations use COSMO implicit solvent with a relative static permittivity of 78.4 to approximate the aqueous environment (Klamt & Schu ¨u ¨rmann, 1993).The minimum r.m.s.error of 1.77 is achieved with a = 42.18 and b = 0.368 mol kcal À 1 .The effective free-energy correction for the proton balances the heats of formation of protonated and unprotonated solutes when the pK a matches the pH, More accurate, but less transferable, pK a and G(H + ) values can be obtained by fitting models for specific protonation sites.For example, a fit to only nitrogen-bound protons results in G(H + ) = À 94.72 kcal mol À 1 , while a fit to only oxygen-bound protons results in G(H + ) = À 93.88 kcal mol À 1 .We use the general-purpose value of À 94.51 kcal mol À 1 since it is close to the special-purpose value for nitrogen-bound protons such as in histidine.
The accuracy of the QM fitting described here provides an r.m.s.error for comparing heats of formation when the additional proton is involved.The r.m.s.error of the pK a is 1.77, which translates to an r.m.s.error of 4.81 kcal mol À 1 when translated to an energy using our semiempirical linear model.The model cannot reliably order states with an energy difference less than this value.The errors in this pK a model come from errors in the semiempirical QM model, errors in approximating the chemical environment by an implicit solvent model and errors from using a single, minimum-energy geometry for solute molecules in implicit solvent.The largest error in Table 1 is for arginine, which is likely to come from the PM6-D3H4 model because the guanidino group in its minimum-energy geometry of arginine has poor planarity, and geometric errors typically imply energetic errors.
While the pK a model used here can be paired with other QM-based model chemistries, it may need to be refitted for different levels of theory and adjusted for different energy references and applications.For example, previous fits of this model (Matsui et al., 2012) used density-functional theory (DFT) calculations, a reference proton free energy of zero and fit different model parameters for protons bound to six different side groups.Adjusting for the change in reference used in MOPAC, this fitting produced a range of proton heats of formation between À 101.49 and À 69.31 kcal mol À 1 , with an average heat of À 85.54 kcal mol À 1 .

Other metrics
Additional metrics to assess each histidine configuration can be helpful.Recently, a program for determining the model quality based on hydrogen bonding (Afonine et al., 2023) was added to Phenix.As part of the validation of histidine conformations, hydrogen bonds are counted.When applied to each of the final six histidine configurations, an increase in hydrogen bonds will generally lead to a better result.Another metric is based on the movement of the minimized atoms.A large root-mean-square deviation (r.m.s.d.) of the starting and final geometries could be due to a poor protonation clashing with the surrounding amino acids.One final metric is the rotameric state (Hintze et al., 2016;Williams et al., 2018).This rarely changes except between the two flip states, but is reported for information.If the state is labelled 'OUTLIER' then it is a case for further investigation.

Single-histidine example
The output of a single QMF run is summarized in Table 2.The original configuration is displayed on the first line with its rotamer.Each subsequent line lists the configuration, energy, �E, number of hydrogen bonds, r.m.s.d.from the starting geometry and rotamer.The first three calculated geometries differ only in the number and position of protons (see Fig. 1).This is reflected in the rotamer (all of which are the same) and the r.m.s.d.values.The larger r.m.s.d.values for the flipped configurations are due to the swapping of the Cartesian coordinates to flip the ring by 180 � .
The number of hydrogen bonds is greatest for the doubly protonated and HD1 protonation states.At 14, it is one greater than that for HE1, which is reflected in the �E of 19.3 kcal mol À 1 between the HD1 and HE2 configurations.The energy difference between doubly protonated and HD1 protonation is 2.6 kcal mol À 1 , which is well below the significance level (4.81 kcal mol À 1 ).Closer inspection of the two geometries (Fig. 2) shows that the nearby water molecule is involved with the HE2 proton in the doubly protonation configuration and with the NE2 N atom in the HD1 state, thus maintaining the same number of hydrogen bonds.

Comprehensive protein environment example
Extending the use of QMF to all histidine amino acids in the high-resolution example PDB entry 4rj2 is displayed in Table 3.The first row of the table displays the results discussed in the example in Table 2 and Fig. 2. The results from Malinska Table 3 Summary of predicted histidine protonation states for PDB entry 4rj2.
For the results from Malinska and coworkers, each N-atom proton is labelled '+' for present, 'À ' for absent and '?' for unknown.In the comparison columns, 'Y' and 'N' are yes and no, respectively, while 'M' means maybe.The results from MolProbity and QMF are included as 'HD1^HE2' for doubly protonated, with a single atom name for the singly protonated results.
Minimized geometry of two histidine configurations of histidine in chain A and residue 4 of PDB entry 4rj2.
and coworkers are shown in the second and third columns.A '+' means protonated, a 'À ' means unprotonated and a '?' means unknown.For the example of the histidine labelled as residue 4 in chain A (compactly 'His 4 A'), it shows that ND1 is protonated, but it is unclear about the protonation state of NE2.Both MolProbity and QMF agree with the addition of HD1, but differ on the presence of HE2.As already mentioned, QMF calculates an insignificant �E, indicating that either double protonation (HD1^HE2) or HD1 protonation is acceptable.The next histidine of interest is His 97 A. Malinska and coworkers do not have a prediction due to high B-factor values that are a result of the amino acid being on the surface of the protein.Consequently, there are few hydrogen bonds involved, thus lowering the discriminating power of QMF.A closer look at the density around this histidine shows the possibility of one or two water molecules.If these solvent molecules could be justified, they may provide more discrimination.The same situation is true for His 209 A, a surface side chain with possible water interactions.The relevance of the protonation of a surface histidine is low, so the poor discrimination of the method is less important.Removing these special cases from all chains results in perfect agreement between the methods; in particular, MolProbity and QMF agree in all cases.

Metal-coordination example
Histidine coordinates with many metal ions, with the zinc finger being a common motif.The correct protonation of a metal ion is particularly important, as an incorrectly placed proton can distort the final refined model.This would appear to be the case for Zn 4 B in PDB entry 3rzu, as shown in Fig. 3.The doubly protonated histidine at the top of the image is 2.63 A ˚from the zinc, much farther than the more accurate 2.1-2.3A ˚range.The other histidines are also distorted.
MolProbity protonates the model as shown in the left panel; however, QMF predicts the correct protonation depicted on the right.Furthermore, the smallest energy difference between the lowest energy state and the next lowest state was 21 kcal mol À 1 , with the majority being far larger.In cases where the proton is pointing towards the metal, it was often bent out of the plane of the imidazole ring.Naturally, this spurious geometry was reflected by a high energy.
It is noteworthy that the 'wrong' states of the other histidines coordinated with the zinc had little impact on the final energy differences, but an approach that corrects each histidine before proceeding to the next hisitidine and revisiting each to ensure a consistent result may be advisable.

Ligand-environment example
An example of a histidine interacting with a ligand can be found in PDB entry 2ixc, a carbohydrate epimerase resolved to 1.8 A ˚.The ligand (2 0 -deoxythymidine-�-l-rhamnose, code TRH) has a carbohydrate-like structure at one end that interacts with two histidine side chains.All four copies of TRH pass the polder OMIT map (Liebschner et al., 2017) test, so the chain B copy was chosen randomly.His 119 B has a similar motif to the single-histidine example in that one proton in a doubly protonated configuration is a hydrogen-bond donor, but when the proton is removed the N atom becomes an acceptor (Fig. 4).In the previous example, the interaction is with a water molecule, while in this example it is the carbohydrate moiety.
In contrast, the energy difference favours HE2 protonation by 20.4 kcal mol À 1 , which is far more significant than the value of 2.6 kcal mol À 1 in the first example.Similarly, MolProbity predicts a doubly protonated state in this example.The resolution precludes using the discrimination functions method.The difference in vapour energies (David et al., 2017) favours the N-atom acceptor (O-H� � �:N) by 5 kcal mol À 1 over the O-atom acceptor (NÀ H� � �:O).Other factors affect the final heat of formation, but in this case the increased strength of the hydrogen bond appears to be adding to the improvement in energy for the singly protonated configuration.It should be noted that the protonation state of His 62 B agrees with MolProbity and is not changed for this comparison.
Other amino-acid side chains in the vicinity can affect the local pH.In this particular case, protonation of Asp 83 B could

Conclusions
Accurate prediction of histidine protonation is not a trivial task and X-ray crystallographic data do not provide direct information about protonation, except with very high-resolution data.Acceptable resolution neutron data can be a path to more information.The MolProbity suite provides an empirical approach that optimizes the hydrogen-bonding network.This is one of its strengths and should be relied on to perform this function for large networks in standard proteins.Malinska and coworkers proposed geometry-based discrimination functions that are limited to high-resolution structures with low temperature factors.Here, we show that Quantum Mechanical Flipping (QMF ) using QM energies and hydrogen-bonding metrics can predict histidine protonation in agreement with both MolProbity and Malinska and coworkers in the highresolution example.QMF will also work at low resolution due its reliance solely on the atomic positions, meaning that any well characterized region of the model is viable.
Overall, the QMF method has the advantage of working at typical resolutions in the macromolecular structure determination field as well as being capable of determining histidine protonations in environments including nonstandard amino acids, metal ions and ligands, including drug-like entities, where MolProbity may fail due to its limited parameterization.QMF is also crystal symmetry-aware, so interactions between symmetry copies are addressed by the method.
QMF does not use the structure factors.Nor does it validate the position of any atoms in the experimental data: this is the user's responsibility.It can therefore be used with either X-ray or cryoEM models that are reasonable in the region of interest.As shown in the ligand example, various protonation states in the region of interest can be investigated.
One case is not covered explicitly in this article: a nonstandard amino acid with an imidazole or similar moiety (for example d-histidine, DHI) or a ligand with a titratable group.A user can use these tools to calculate the energy and other  metrics with a little care.Neither of the previous methods can help; however, QI has the tools available for the user to manually minimize the geometry of each important state and calculate the energies.Each state needs to be generated by the user and passed to QMR.A simple energy comparison (including the protonation energy adjustment as appropriate) as well as looking at each geometry can provide the necessary answers.

Availability
The Quantum Mechanical Restraints (QMR) and Quantum Mechanical Flipping (QMF ) modules of the Quantum Interface (QI ) are available in version 1.21.1-5286 of the Phenix package.Downloading the Python3 version will include MOPAC, which can be installed separately for the official Python2 version.Downloading a nightly build will provide more features for both QMF and the QI modules in general.

Figure 1
Figure 1Comparison of geometry minimizations of the six starting configurations of histidine residue 4 in chain A of PDB entry 4rj2.The left column allows the planar torsion to relax, whilst the right column maintains the starting position.The top views are perpendicular to the ring, with the bottom views showing the ring on edge.Note that the atoms from the main-chain CA atom to the CG atom are frozen in both cases.

Figure 3
Figure 3Metal-coordinated protonation states of the three histidine side chains around Zn 4 B in PDB entry 3rzu as predicted by MolProbity (left) and QMF (right).

Figure 5
Figure 5Two lowest energy configurations of His 119 B from PDB entry 2ixc.The right panel (HE2 protonated) is 22.8 kcal mol À 1 lower in energy than the left panel (HD1 protonated).

Figure 4
Figure 4Two lowest energy configurations of His 119 B from PDB entry 2ixc.The right panel (HE2 protonated) is 20.4 kcal mol À 1 lower in energy than the left panel (doubly protonated).

Table 2
The QMF results for histidine residue 4 in chain A of PDB entry 4rj2.

Table 1
Experimental and model pK a values.Experimental data were sourced from Wikipedia, either the entry for the molecule, the entry for 'amino acid' or the entry for 'acid dissociation constant'.