Pdb2pqr: Expanding and Upgrading Automated Preparation of Biomolecular Structures for Molecular Simulations

Real-world observable physical and chemical characteristics are increasingly being calculated from the 3D structures of biomolecules. Methods for calculating pK a values, binding constants of ligands, and changes in protein stability are readily available, but often the limiting step in computational biology is the conversion of PDB structures into formats ready for use with biomolecular simulation software. The continued sophistication and integration of biomolecular simulation methods for systems-and genome-wide studies requires a fast, robust, physically realistic and standardized protocol for preparing macromolecular structures for biophysical algorithms. As described previously, the PDB2PQR web server addresses this need for electrostatic field calculations (Dolinsky et al., Here we report the significantly expanded PDB2PQR that includes the following features: robust standalone command line support, improved pK a estimation via the PROPKA framework, ligand parameterization via PEOE_PB charge methodology , expanded set of force fields and easily incorporated user-defined parameters via XML input files, and improvement of atom addition and optimization code. These features are available through a new web interface (http:// pdb2pqr.sourceforge.net/), which offers users a wide range of options for PDB file conversion, modification and parameterization.


INTRODUCTION
Due to the importance of electrostatic interactions in biomolecular systems, a variety of computational methods have been developed for evaluating electrostatic forces and energies [see (1)(2)(3)(4)(5)(6) and references therein]. Typical computational electrostatics methods for biomolecular systems can be loosely grouped into two categories: 'explicit solvent' methods, which treat solvent molecules in full molecular detail, and 'implicit solvent' methods, which include solvent-solute interactions in averaged or continuum fashion. Implicit solvent methods are, by definition, limited in detail and therefore lack the atomic-scale accuracy of their explicit solvent counterparts. However, implicit solvent methods have gained increasing popularity, in part due to their elimination of the extensive sampling of solvent configurations required with explicit models (1,(3)(4)(5)(6)(7).
The basic ingredients of an implicit solvent electrostatics calculation are environmental parameters such as temperature, solvent dielectric and ionic strength; biomolecular atomic coordinates; and parameters for atomic charges and radii. While the environmental parameters are relatively straightforward to specify, the remaining two ingredients can often be difficult to supply. In particular, most biomolecular structures in the Protein Data Bank (PDB) (8) do not contain hydrogen atoms, and many are also missing a fraction of the heavy atom coordinates. The addition of hydrogens and the reconstruction of these missing coordinates is not a trivial process; electrostatic properties obtained from the 'repaired' structures can often be very sensitive to the manner in which missing atoms are added and protonation states are assigned (9,10). force field idiosyncrasies can often make the assignment of atomic charges and radii a cumbersome task. An additional obstacle to the use of PDB structures in electrostatics calculations and other biomolecular computational tasks is the accurate assignment of parameters to 'non-standard' residues and ligands.
Previously (9), we introduced the freely available PDB2PQR service (http://pdb2pqr.sf.net/), which was designed to facilitate the setup and execution of continuum electrostatics calculations from PDB data, particularly by non-experts. The original PDB2PQR server automated many of the common tasks of preparing structures for continuum electrostatics calculations, including adding a limited number of missing heavy atoms to biomolecular structures, estimating titration states and protonating biomolecules in a manner consistent with favorable hydrogen bonding, assigning charge and radius parameters from a variety of force fields, and finally generating 'PQR' output (a PDB-like format with the occupancy and temperature factor columns replaced with charge 'Q' and radius 'R', respectively) compatible with several popular computational biology electrostatics [APBS (10) and MEAD (11)], docking [AutoDock (12)], simulation [AMBER (13)] and visualization [VMD (14), PyMOL (15) and PMV (16)] packages. Since its inception, we have continued to expand the capabilities of the PDB2PQR server to address the challenges associated with ligand parameterization in PDB files and to include several new features.

METHODS
The PDB2PQR web service is driven by a modular, Python-based collection of routines, which provides considerable flexibility to the software and permits noninteractive, high-throughput usage. The service is available via a number of web mirrors listed at http:// pdb2pqr.sf.net/. The source code is also available for download from this link, and due to the portability of Python, PDB2PQR can be executed on a wide range of platforms. Figure 1 outlines the typical workflow of a PDB2PQR job and summarizes the features described in more detail below. The procedures for reconstruction of missing atoms, hydrogen optimization and APBS input generation were described previously (9) and are essentially unchanged in the current version of the software. Since their initial development, these atom reconstruction options have been greatly improved through a number of bug fixes and code optimization, robust support for separate biomolecular chains, and improved chain termini optimization. The following sections describe modified and new elements of the PDB2PQR pipeline.

Titration state assignment by PROPKA
Protonation states for titratable protein groups are assigned by PROPKA 1.0 (http://propka.ki.ku.dk) (17). PROPKA utilizes a very fast empirical method to predict pK a values and is successful at predicting unusual pK a values. Recently, a comparative study of several protein pK a prediction methods showed that PROPKA was the most accurate method overall (18). PROPKA uses a heuristic method to compute the pK a perturbations due to desolvation, hydrogen bonding and charge-charge interactions. In the current version of PROPKA, contributions from nucleic acids as well as heteroatoms such as bound ions or ligands to the pK a values are not included. Note that, during the course of titration state assignment, PROPKA generates statistics on residue hydrogen bonding, location and solvent accessibility and Coulombic interactions. This information is available to users as a downloadable text file provided at the end of the PDB2PQR/PROPKA calculation.  Standard residue parameter assignment PDB2PQR currently allows users to assign protein and (where available) nucleic acid parameters based on explicit solvent AMBER99 (19) and CHARMM27 (20) force fields, the PARSE continuum electrostatics force field (21), a Poisson-Boltzmann-optimized force field by Tan et al. (22), or user-defined force fields. User-defined parameters can be uploaded to the PDB2PQR server in a simple flat-file format described in the PDB2PQR user guide. Additionally, PDB2PQR output can be customized to include a variety of atom naming schemes, including AMBER99 (19), CHARMM22 (20), PARSE (21) and an internal naming scheme based on the IUPAC naming recommendations (23). This flexibility in nomenclature was included to facilitate import of PDB2PQR output into other modeling packages. Additionally, the web server provides a 'map' which is output at the end of every PDB2PQR calculation and presents a table of atoms' name/number, residue name, chain name, AMBER atom type and CHARMM atom type to aid in the interpretation of parameter assignment and the development of user-defined charges and radii.

Ligand parameter assignment
The calculation of ligand charges necessitates detailed information on molecular structure and protonation states due to the large variation in the covalent structures of small-molecule protein ligands. The current version of PDB2PQR therefore requires the ligand structure, protonation state and formal charge to be specified by the user in the popular MOL2 (24) format. Ligand structures in MOL2 format are readily available from popular molecular modeling software and free web services such as PRODRG (25). Future versions of PDB2PQR will include a pdb2mol2 parser and automatic assignment of default ligand protonation states from a small-molecule pK a database. The calculation of ligand charges in PDB2PQR is based on the partial equalization of orbital electronegativities (PEOE) procedure developed by Gasteiger and Marsili (26). In the PEOE procedure, orbital electronegativities are linked to partial atomic charges q by a polynomial expansion ( ¼ a þbÁq þ cÁq 2 þ dÁq 3 ). The coefficients a, b, c and d were optimized by Gasteiger and Marsili using gas phase data on ionization potentials and electron affinities. We utilize a PEOE algorithm, which has been optimized by Czodrowski et al. to obtain better agreement between theoretical and experimental solvation energies for a set of small molecules including the polar amino acids (27). The resulting PEOE_PB charges have been tested for small-molecule complexes with trypsin, thrombin (28) and HIV protease (29), and have been found to give results that are in agreement with experimental values.

Post-processing
The current version of PDB2PQR supports an 'extension' directory for user-defined processing of PDB2PQR output. Such extensions might include alternative naming schemes, identification and parameterization of other molecule types, additional hydrogen bond processing, etc. The web servers listed at (http://pdb2pqr.sf.net/) provide only the default PDB2PQR functionality. However, it is straightforward for users to download the PDB2PQR software and setup their own web servers with additional functionality based on custom extensions.

CONCLUSIONS
We have described a number of new features for the free PDB2PQR web server, a service which helps users prepare molecular structures for further computational work by modeling missing atoms, assigning charges and titration states, and providing a mechanism for assignment of ligand parameters. Readers interested in these tasks might also be interested in other servers, which provide complementary services for biomolecular structure processing (30)(31)(32). Planned future developments for PDB2PQR include the construction of a pdb2mol2 parser to allow for the automatic parameterization of non-protein atoms, the correct treatment of protein posttranslational modification, and the integration of a Poisson-Boltzmann continuum electrostatics-based pK a calculation algorithm into PDB2PQR. We anticipate that the PDB2PQR service will continue to be a helpful addition to the portfolio of tools available to the structural and computational biology communities.