Polarized Protein-Specific Charges from Atoms-in-Molecule Electron Density Partitioning

Atomic partial charges for use in traditional force fields for biomolecular simulation are often fit to the electrostatic potentials of small molecules and, hence, neglect large-scale electronic polarization. On the other hand, recent advances in atoms-in-molecule charge derivation schemes show promise for use in flexible force fields but are limited in size by the underlying quantum mechanical calculation of the electron density. Here, we implement the density derived electrostatic and chemical charges method in the linear-scaling density functional theory code ONETEP. Our implementation allows the straightforward derivation of partial atomic charges for systems comprising thousands of atoms, including entire proteins. We demonstrate that the derived charges are chemically intuitive, reproduce ab initio electrostatic potentials of proteins and are transferable between closely related systems. Simulated NMR data derived from molecular dynamics of three proteins using force fields based on the ONETEP charges are in good agreement with experiment.


Simulations of bulk water
Bulk water supercells were generated using the tleap module in AMBER11 by solvating a single water molecule with the TIP3P water model in a cubic box. Each supercell structure was minimized then heated to 300K in 300 ps followed by equilibration for 5 ns in the NPT ensemble. The last snapshot from each supercell structure was used for single point DFT calculations in ONETEP under periodic boundary conditions, using a psinc spacing of 0.45 Bohr corresponding to a kinetic energy cutoff of ∼ 1000 eV, NGWF localization radii r NGW F of 10 Bohr initialized as in Section 2.4, and the PBE exchange-correlation functional. A separate post-processing analysis was then performed on the converged electronic density to compute DDEC/ONETEP charges.

Relaxing simulation parameters
As DDEC charges are relatively insensitive to the quality of the underlying ab initio calculation such as basis set size, it is possible to relax convergence parameters without significantly impacting the accuracy of computed charges. The aforementioned calculations were performed using 12 density kernel optimization (LNV) steps per NGWF iteration. Using relaxed convergence parameters of 800 eV psinc energy cutoff, 8.0 bohr NGWF localization radii, and variable 4-8 LNV steps per NGWF optimization cycle reduced the calculation time of the 2502-atom bulk water system to three hours on the same system and number of cores. The mean absolute deviation of the resulting DDEC charges is less than 0.01 e from the calculation shown in Figure 1

Simulations of phenol
Phenol structures for DDEC analysis were generated using AMBER11. The phenol molecule was modeled using the GAFF force field, and was solvated in a periodic box by TIP4P water molecules.
The system was equilibrated in the NPT ensemble and five snapshots were extracted at 300 K comprising the phenol molecule and 20 nearest water neighbors. The lysozyme L99A/M102E SI-2 X-ray crystal structure (3GUO) was used as the starting point for simulations of phenol in the protein binding pocket. The protein was modeled using the ff99SB biomolecular force field and the system was equilibrated at 300 K in a periodic water box. Five snapshots were extracted for DDEC analysis, comprising phenol, five nearby residues (I78, L84, A99, E102, V103) and one water molecule. RESP charges for phenol were calculated based on a HF/6-31G * electronic structure calculation of the optimized structure in GAUSSIAN09 and used the antechamber module of AMBER11.

Simulations of proteins
ONETEP calculations for proteins were performed in the same way as the 25 small molecules, as described in the main text, except that implicit solvent was used to describe the aqueous environment. Using a smeared-ion formalism, the molecular Hartree energy is obtained not in reciprocal space, but rather by solving the Poisson equation (homogeneous in vacuum, inhomogeneous in solution) in real space, under open boundary conditions. This is achieved via a multigrid approach.
A 10 Bohr gap was left between the electron density and the edge of the multigrid and the ion smearing width set to 0.8 Bohr. The converged electronic density obtained in vacuum was used to generate the density-dependent dielectric cavity in solution. The values of the solvation parameter β and electronic density threshold ρ 0 were 1.3 and 7.8 × 10 −4 a.u. respectively. The relative dielectric permittivity of the solvent was set to 80.0 for all implicit solvent calculations. DDEC charges were computed in a post-processing manner using the electronic density of the molecule converged in solvent. That is, the density from which the DDEC charges are derived have been polarized by the solvent.
MM models were prepared from the initial crystal structures by solvation in TIP3P water and neutralization by counterions followed by energy minimization in a cubic box with minimum distance from protein atoms of 10 Å. The system was heated from 0 to 300 K over a period of 100 ps followed by equilibration for 1.2 ns. NPT production runs were performed for 10 ns with a 1 fs timestep at 1.0 atm, 300 K, using Langevin dynamics with a collision frequency of 1.0 ps −1 . Tra-SI-3 jectories were saved every 50 fs for analysis. The ff99SB force-field was used throughout. For simulations with DDEC charges, the ff99SB charges were replaced prior to minimization.  Figure S1: DDEC/ONETEP charge convergence for nitroethane (CH 3 CH 2 NO 2 ), measured as the RMS difference from charges calculated using the finest tested parameter value (in brackets), for: (a) ONETEP psinc energy cutoff (1600 eV), (b) r NGW F (14.0 Bohr), (c) r max (11.0 Bohr), and (d) N Rad (160 shells corresponding to a radial shell spacing of 0.0625 Bohr). Charges are converged to much less than 0.01 e for the parameters chosen.  Figure S5: (a) Convergence profile for DDEC and ISA charges for the 2502 atom bulk water system in ONETEP. Charges are considered converged when their maximum absolute change for any atom between three consecutive iterations is < 2 × 10 −5 e. The slow convergence of ISA charges is observed to be a general feature of the method, which can be attributed to the shallow optimization landscape presented by the density distribution around embedded atoms. The addition of IH reference densities into the optimization alleviates this issue. (b) Atomic and molecular charge distributions for the DDEC and ISA method in ONETEP for various bulk water supercells. The ISA charge distribution has a much greater spread, indicating its environmental sensitivity to atoms in embedded regions. (c) Comparison between experimental and calculated backbone N-H order parameter S 2 for 1UBQ using the ISA and IH methods in ONETEP for a 10 ns trajectory. IH charges reproduce experimental data more closely, with a root-mean-squared difference of 0.052, compared with 0.078 for ISA. SI-7