Molecular Fragment Replacement Approach to Protein Structure Determination by Chemical Shift and Dipolar Homology Database Mining

https://doi.org/10.1016/S0076-6879(05)94003-2

Abstract

A novel approach is described for determining backbone structures of proteins that is based on finding fragments in the protein data bank (PDB). For each fragment in the target protein, usually chosen to be 7–10 residues in length, PDB fragments are selected that best fit to experimentally determined one-bond heteronuclear dipolar couplings and that show agreement between chemical shifts predicted for the PDB fragment and experimental values for the target fragment. These fragments are subsequently refined by simulated annealing to improve agreement with the experimental data. If the lowest-energy refined fragments form a unique structural cluster, this structure is accepted and side chains are added on the basis of a conformational database potential. The sequential backbone assembly process extends the chain by translating an accepted fragment onto it. For several small proteins, with extensive sets of dipolar couplings measured in two alignment media, a unique final structure is obtained that agrees well with structures previously solved by conventional methods. With less dipolar input data, large, oriented fragments of each protein are obtained, but their relative positioning requires either a small set of translationally restraining nuclear Overhauser enhancements (NOEs) or a protocol that optimizes burial of hydrophobic groups and pairing of β-strands.

Introduction

With the completion of the sequencing of the human and many other genomes and the availability of an abundance of protein sequence data, there is a strong demand for rapid determination of tertiary protein structures. There are two main experimental avenues toward obtaining atomic-resolution protein structures: X-ray crystallography and solution-state nuclear magnetic resonance (NMR) spectroscopy. The process of structure determination by X-ray crystallography is already quite streamlined, owing to the availability of robotics for optimizing crystallization conditions, high-intensity synchrotron radiation sources, and standardized, semiautomated analysis software. Structure determination by NMR spectroscopy, on the other hand, is still a time-consuming and labor-intensive process with a turnaround time typically on the order of several months, and it additionally requires 15N, 13C, and, for larger proteins, 2H isotopic enrichment. Usually, an NMR structure determination project proceeds in several stages: assignment of backbone resonances using pairs of now-standard triple-resonance experiments; assignment of side chain resonances using 13C-, 1H-, or 15N-mediated TOCSY- and COSY-type experiments; assignment of NOE cross-peaks; and structure calculation.

The introduction of facile methods for weakly aligning proteins relative to the magnetic field now also allows measurement of residual dipolar couplings (RDCs). In favorable cases, the alignment necessary for a nonvanishing dipolar interaction can be imposed on the solute macromolecules directly by the magnetic field (Bothner-By 1985, Kung 1995, Tjandra 1996, Tolman 1995), but more commonly an anisotropic aqueous medium is used. Many such media are now available, including lyotropic liquid crystalline solutions of phospholipid bicelles (Tjandra and Bax, 1997), Pf1, fd, or TM phage particles (Clore 1998b, Hansen 1998), cellulose crystallites (Fleming et al., 2000), and polyethylene glycol (Ruckert and Otting, 2000) or cetylpyridinium halide-based bilayers (Barrientos 2000, Prosser 1998). Anisotropically compressed, low-density polyacrylamide gels (Chou 2001a, Ishii 2001, Meier 2002, Sass 2000, Tycko 2000, Ulmer 2003) and suspensions of magnetically oriented purple membrane fragments (Koenig 1999, Sass 1999) have also proven useful for this purpose.

RDCs are global parameters in the sense that they restrain the orientations of the corresponding dipolar interaction vectors all relative to a single reference frame, often referred to as the principal axis frame of the alignment tensor. In this respect, they differ in nature from NOEs and dihedral restraints derived from J couplings, which report on atomic positions relative to one another. Besides improving local geometry (Chou 2001b, Tjandra 1997), RDCs have been shown to be highly useful for determining the relative orientation of individual domains in multisubunit proteins, nucleic acids, and their complexes (Braddock 2001, Clore 2000, Lukavsky 2003).

In the principal frame of the alignment tensor, the dipolar coupling is given by

$$D_{ij}(\theta,\phi) = D_a\left[(3\cos^2\theta_{ij} - 1) + \tfrac{3}{2}R\sin^2\theta_{ij}\cos 2\phi_{ij}\right] \qquad (1)$$

where θ and φ are the polar angles of the dipolar interaction vector, rij, in the alignment frame; Da is the magnitude of the alignment tensor, which includes constants related to the magnetogyric ratios of nuclei i and j and their internuclear distance; and R is the rhombicity of the alignment tensor (Bax et al., 2001). Clearly, with a single experimental Dij(θ,φ) value and two unknown parameters, an infinite number of (θ,φ) solutions generally exists. This degeneracy may be partly lifted if RDCs are available in a second alignment medium with a different, independent alignment tensor (Ramirez and Bax, 1998). However, even in this case, a vector orientation can never be distinguished from its inverse, because both orientations yield the same dipolar coupling. Only once the “handedness” of a local structural element, involving several dipolar interactions measured in at least two alignment frames, is known can the absolute orientation of such a fragment, and thereby of its vectors, be determined (Al-Hashimi et al., 2000). As a consequence, when attempting to build a full protein structure that simultaneously satisfies Eq. (1) for all dipolar interactions, the number of false minima scales exponentially with the number of couplings. Solving this problem by means of a “brute force” simulated annealing or Monte Carlo program on a full protein has proven very difficult. However, when local substructures are assembled first, the problem is no longer intractable, and a number of recently proposed approaches rely on this principle (Andrec 2001, Delaglio 2000, Hus 2000, Rohl 2002).
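To make Eq. (1) concrete, the short Python sketch below evaluates Dij for a vector with polar angles (θ, φ) in the principal axis frame and checks the two degeneracies discussed above: the inverted vector (θ → π − θ, φ → φ + π) gives the same coupling, and different (θ, φ) pairs can share one value. The function names and the tensor values Da = 10 Hz and R = 0.3 are illustrative assumptions, not parameters taken from this work.

```python
import math

def rdc(theta: float, phi: float, Da: float, R: float) -> float:
    """Eq. (1): dipolar coupling for polar angles (theta, phi), in radians,
    measured in the principal axis frame of the alignment tensor."""
    return Da * ((3.0 * math.cos(theta) ** 2 - 1.0)
                 + 1.5 * R * math.sin(theta) ** 2 * math.cos(2.0 * phi))

def inverted(theta: float, phi: float) -> tuple[float, float]:
    """Polar angles of the antiparallel vector -r."""
    return math.pi - theta, phi + math.pi

# Illustrative (hypothetical) alignment tensor: Da in Hz, rhombicity R.
Da, R = 10.0, 0.3
print(rdc(0.0, 0.0, Da, R))          # vector along z: 2 * Da = 20.0 Hz
print(rdc(math.pi / 2, 0.0, Da, R))  # along x: Da * (-1 + 1.5 * R), about -5.5 Hz

# A vector and its inverse cannot be told apart from the coupling alone.
t, p = 0.8, 1.1
print(math.isclose(rdc(t, p, Da, R), rdc(*inverted(t, p), Da, R)))  # True
```

For an axially symmetric tensor (R = 0) the coupling no longer depends on φ at all, so every vector on a cone of constant θ gives the same value, which is the continuous degeneracy that a second alignment medium helps to reduce.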

Our present approach represents a much improved and more stable version of the molecular fragment replacement (MFR) method described earlier, which derived backbone torsion angles by searching the PDB for seven-residue peptide fragments that fit the experimental dipolar couplings of a fragment of the target protein (Delaglio et al., 2000). In the original procedure, a starting model was first built using these backbone torsion angles and subsequently refined by optimizing agreement between this model and the full set of dipolar couplings. Although the method yields reasonable structures, provided that a nearly complete set of dipolar couplings is available, convergence to a satisfactory final structure and the accuracy of its local details remain limited by the quality of the fragments found in the original search. However, in favorable cases, substantial regions of a protein can be assembled from such data, even when all dipolar couplings and assignments are derived from a single experiment (Zweckstetter and Bax, 2001).
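Deciding whether a PDB fragment “fits” a set of target couplings requires, for each candidate, the alignment tensor that best reproduces those couplings given the candidate's geometry. A standard way to obtain it is a linear least-squares fit of the order matrix solved by singular value decomposition, in the spirit of Losonczi et al. (1999) in the reference list. The NumPy sketch below is an assumption about how such a score could be computed, not the MFR code itself; with data from two media, each medium would be fitted with its own tensor. The function names and the scaling of couplings to a common D_max are hypothetical.

```python
import numpy as np

def fit_alignment_tensor(vectors: np.ndarray, couplings: np.ndarray):
    """Least-squares (SVD-based) fit of the five independent order-matrix elements.

    vectors:   (n, 3) internuclear vectors of one candidate PDB fragment.
    couplings: (n,) experimental couplings of the target fragment, assumed to be
               pre-scaled to a common D_max (hypothetical preprocessing step).
    Returns the 3x3 traceless symmetric order matrix S and the RMS deviation
    between back-calculated and experimental couplings.
    """
    u = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    x, y, z = u[:, 0], u[:, 1], u[:, 2]
    # D = sum_ij S_ij u_i u_j, with S_xx eliminated via S_xx = -S_yy - S_zz.
    A = np.column_stack([y**2 - x**2, z**2 - x**2, 2 * x * y, 2 * x * z, 2 * y * z])
    s, *_ = np.linalg.lstsq(A, couplings, rcond=None)
    syy, szz, sxy, sxz, syz = s
    S = np.array([[-syy - szz, sxy, sxz],
                  [sxy, syy, syz],
                  [sxz, syz, szz]])
    rms = float(np.sqrt(np.mean((A @ s - couplings) ** 2)))
    return S, rms

def rank_fragments(candidates, couplings, top=10):
    """Return (rms, index) pairs for the best-fitting candidates, lowest RMS first."""
    scored = [(fit_alignment_tensor(v, couplings)[1], i) for i, v in enumerate(candidates)]
    return sorted(scored)[:top]

# Synthetic check: couplings generated from a known tensor are fitted exactly.
rng = np.random.default_rng(0)
true_vectors = rng.normal(size=(12, 3))
S_true = np.array([[-0.3, 0.1, 0.05], [0.1, 0.5, -0.2], [0.05, -0.2, -0.2]])
u = true_vectors / np.linalg.norm(true_vectors, axis=1, keepdims=True)
d_exp = np.einsum('ni,ij,nj->n', u, S_true, u)
decoy = rng.normal(size=(12, 3))
print(rank_fragments([decoy, true_vectors], d_exp))  # candidate 1 should rank first
```

Because five tensor elements are fitted per medium, a fragment needs clearly more than five couplings per medium for the RMS to discriminate between candidates.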

Our MFR approach is related to work by Annila et al. (1999), who proposed using dipolar couplings to find structurally homologous proteins in the PDB. Instead of searching for complete proteins, the MFR program searches the PDB for structural homology for only 7–10 residues at a time. The idea of using small substructures from a database of representative protein structures as “templates” for building a structure was pioneered by Jones and co-workers and has been very successful in X-ray crystallography (Jones and Thirup, 1986). The approach has also been applied to solving structures on the basis of NOEs, by searching a database for substructures compatible with the experimental NOEs (Kraulis and Jones, 1987). However, because only short- and medium-range NOEs can be used in the search process, obtaining the correct tertiary fold remains very difficult with such a method. Other approaches relying on database substructures have also been described in recent years. Work by the Baker group (Rohl and Baker, 2002) relies on selecting a large number of database fragments that are roughly compatible with the experimental parameters measured for the corresponding target fragment and then using efficient Monte Carlo methods to assemble these fragments into a common structure with reasonable packing properties, where the fragments retain the orientation needed to satisfy the dipolar coupling restraints. A method proposed by Andrec et al. (2001) is similar in spirit to our own MFR method but uses “postprocessing” to distinguish correct from incorrect fragments by comparing each with the overlapping region of an adjacent fragment. Using a so-called bounded-tree search, self-consistent sets of overlapping fragments can be identified relatively rapidly, resulting in a backbone structure.
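The general idea of a bounded-tree search over overlapping fragments can be illustrated with a depth-first search: pick one candidate per fragment window so that neighboring choices agree on their overlapping residues, try better-scoring candidates first, and abandon a branch as soon as it fails the overlap test or its cumulative score exceeds a bound. This is a hypothetical sketch of the generic technique, not the algorithm of Andrec et al. (2001); representing each candidate by a fit score plus per-residue torsion angles, and the `tol` and `max_score` parameters, are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical candidate representation: (fit score, torsion angles in degrees,
# one per residue of the window).
Candidate = Tuple[float, Sequence[float]]

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def consistent(prev: Sequence[float], cur: Sequence[float], overlap: int, tol: float) -> bool:
    """True if the last `overlap` angles of prev agree with the first `overlap` of cur."""
    return all(angle_diff(a, b) <= tol for a, b in zip(prev[-overlap:], cur[:overlap]))

def bounded_tree_search(windows: List[List[Candidate]], overlap: int,
                        tol: float, max_score: float) -> Optional[List[int]]:
    """Choose one candidate index per window so that neighbors agree on their overlap.

    Returns the first self-consistent path found (not necessarily the global
    optimum), or None if no path stays within the score bound.
    """
    order = [sorted(range(len(w)), key=lambda i: w[i][0]) for w in windows]
    path: List[int] = []

    def dfs(k: int, total: float) -> bool:
        if k == len(windows):
            return True
        for i in order[k]:
            score, angles = windows[k][i]
            if total + score > max_score:
                break  # candidates are sorted by score, so later ones exceed the bound too
            if k > 0 and not consistent(windows[k - 1][path[-1]][1], angles, overlap, tol):
                continue
            path.append(i)
            if dfs(k + 1, total + score):
                return True
            path.pop()  # backtrack
        return False

    return path if dfs(0, 0.0) else None

windows = [
    [(0.5, [10.0, 20.0]), (1.0, [10.0, 100.0])],
    [(0.2, [50.0, 30.0]), (0.4, [100.0, 40.0])],
]
print(bounded_tree_search(windows, overlap=1, tol=15.0, max_score=5.0))  # [1, 1]
print(bounded_tree_search(windows, overlap=1, tol=15.0, max_score=1.2))  # None
```

In the example, the best-scoring first candidate is rejected because no second-window candidate agrees with it on the shared residue, so the search backtracks to the second candidate.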

A different approach to building complete protein backbone structures from dipolar couplings, which does not rely on a database for finding suitable substructures, has been proposed (Hus 2000, Giesen 2003). It is conceptually somewhat similar to approaches pursued in determining polypeptide structure on the basis of 15N–1H dipolar couplings derived from solid-state NMR measurements (Brenneman 1990, Marassi 1998, Nishimura 2002, Wu 1995) and conducts a systematic search in Cartesian space when adding a peptide plane to the chain. The approach requires a very complete set of dipolar couplings when applied de novo, but it has other applications as well. For example, it was shown to be particularly powerful for pinpointing the precise structural differences in the backbone at the active site of a 27-kDa enzyme [methionine sulfoxide reductase (MsrA) from Erwinia chrysanthemi] and its Escherichia coli homologue, for which an X-ray structure was available (Beraud et al., 2002). Conceptually, Hus' method shares features with a method devised by Mueller et al. (2000), which determines the (usually 4-fold degenerate) peptide plane orientations compatible with experimental dipolar couplings prior to finding a chain compatible with these orientations. Finally, Fowler et al. (2000) demonstrated that it is feasible to obtain information on the fold from 15N–1HN, 1HN–1HN, and 1HN–1Hα couplings without the need for 13C enrichment.
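One source of a four-fold orientational degeneracy can be checked numerically: in the principal frame of the alignment tensor, where the order matrix is diagonal, a 180° rotation about any principal axis leaves every coupling unchanged, so a rigid fragment and its three rotated copies cannot be distinguished using couplings from one medium. The NumPy sketch below demonstrates this symmetry under those assumptions (rigid fragment, one medium, normalized couplings); it is not the procedure of Mueller et al. (2000), and the function names and vectors are hypothetical.

```python
import numpy as np

# Identity plus 180-degree rotations about the x, y and z principal axes.
# All have determinant +1, so they are proper rotations of a rigid fragment
# (unlike full inversion, which would produce the mirror image).
PRINCIPAL_FLIPS = [np.diag(d).astype(float) for d in
                   ([1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1])]

def couplings(S: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    """Normalized couplings u^T S u for each unit bond vector (row) in `vectors`."""
    return np.einsum('ni,ij,nj->n', vectors, S, vectors)

def equivalent_orientations(vectors: np.ndarray) -> list:
    """The four rotated copies of a rigid fragment that a diagonal S cannot distinguish."""
    return [vectors @ R.T for R in PRINCIPAL_FLIPS]

# Hypothetical principal-frame tensor and three unit bond vectors of a rigid fragment.
S = np.diag([-0.3, -0.2, 0.5])
fragment = np.array([[0.6, 0.48, 0.64], [-0.3, 0.8, 0.52], [0.1, -0.5, 0.86]])
fragment /= np.linalg.norm(fragment, axis=1, keepdims=True)
reference = couplings(S, fragment)
for copy in equivalent_orientations(fragment):
    print(np.allclose(couplings(S, copy), reference))  # True for all four
```

The invariance follows from RᵀSR = S for any diagonal R with entries ±1; couplings from a second medium with a differently oriented tensor generally break this symmetry.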

Our improved MFR approach, which we refer to as MFR+, represents a much more versatile and stable version of the original MFR method. The method differs from all previous database substructure methods by introducing an intermediate step in which the fragments are refined with respect to the experimental observables (shifts, couplings, and possibly short- and medium-range NOEs or torsion angle restraints) prior to their final selection and incorporation into a structure. It also exploits the unique advantage of dipolar tensor parameters to maintain reasonable orientations for each fragment at all stages of the substructure assembly. The user has complete freedom to specify the weighting factors and to define the minimal criteria for deeming a selection “reliable.” With the default settings, and with dipolar couplings available from two different media, the program can rapidly generate backbone structures for the proteins ubiquitin and GB3 that deviate by considerably less than 1 Å from their true structures. With less experimental data, accurate partial structures can be obtained, which subsequently can be used to assemble the structure either manually or by using docking algorithms (Clore 2000, Clore 2003).
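Two steps mentioned here and in the abstract, deciding whether the lowest-energy refined fragments form a unique cluster and extending the chain by translating an accepted fragment onto it, become simple once all fragments are expressed in the common alignment-tensor frame: orientations are then already comparable, so fragments only need to be compared and attached up to a translation. In the NumPy sketch below, the acceptance test (all pairwise translation-removed RMSDs below a cutoff), the 1.0 Å default cutoff, and the function names are assumptions for illustration, not the MFR+ defaults.

```python
import numpy as np

def centered_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (n, 3) coordinate sets after removing translation only.
    No rotation is fitted: both sets are assumed to be in the same alignment frame."""
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum((a0 - b0) ** 2, axis=1))))

def is_unique_cluster(fragments, cutoff: float = 1.0) -> bool:
    """Accept a fragment position only if all lowest-energy refined fragments
    agree pairwise within `cutoff` (in Å)."""
    n = len(fragments)
    return all(centered_rmsd(fragments[i], fragments[j]) <= cutoff
               for i in range(n) for j in range(i + 1, n))

def attach_by_translation(chain: np.ndarray, fragment: np.ndarray, overlap: int) -> np.ndarray:
    """Append `fragment`, whose first `overlap` atoms correspond to the last `overlap`
    atoms of `chain`. The least-squares translation is the difference between the
    centroids of the overlapping atoms; the chain keeps its own coordinates there."""
    shift = chain[-overlap:].mean(axis=0) - fragment[:overlap].mean(axis=0)
    return np.vstack([chain, (fragment + shift)[overlap:]])

# Synthetic example: a fragment displaced by an arbitrary offset is reattached exactly.
rng = np.random.default_rng(1)
backbone = rng.normal(size=(8, 3))
chain = backbone[:5]
fragment = backbone[3:] + np.array([10.0, -2.0, 3.0])
extended = attach_by_translation(chain, fragment, overlap=2)
print(np.allclose(extended, backbone))                # True
print(is_unique_cluster([backbone, backbone + 5.0]))  # True: differ only by translation
```

Fitting only a translation, rather than a full superposition, keeps each fragment's orientation as determined by the dipolar couplings; a rotated copy of the same fragment is therefore correctly treated as a different structure.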

Section snippets

Description of the MFR+ Method

The routines to conduct the homology search, visualize the results, and assemble a structure were written in the Tcl/Tk language and use “NMRWish,” an in-house version of the Tcl/Tk interpreter “wish,” which has been customized by the addition of routines to handle and manipulate tables, databases, and PDB format files. It can perform chemical shift (CS) and dipolar coupling (DC) simulations, carry out coordinate alignments, handle restraints (NOE, dihedral, CS, DC, J), and also includes

Application to Model Proteins

Application of the MFR+ method is demonstrated for three proteins for which extensive sets of experimental backbone dipolar couplings were available, ubiquitin, GB3, and DinI. Crystallographically determined structures are available for ubiquitin and GB3, and NMR structures are available for all three. The method has also been applied to several slightly larger proteins for which dipolar couplings were simulated, including thioredoxin, profilin, and interleukin-1β. In all these applications,

Concluding Remarks

The MFR+ approach provides a remarkably direct way to determine solution NMR structures from protein backbone RDC data, with or without a small set of local backbone NOE data. The approach utilizes only protein backbone data and thereby bypasses the side chain and NOE assignment steps. However, the resulting structures have limitations that are distinct from those encountered in conventional, NOE-based structural studies. The most significant limitation of the MFR+ method in its

References (80)

  • J.A. Losonczi et al.

    Order matrix analysis of residual dipolar couplings using singular value decomposition

    J. Magn. Reson.

    (1999)
  • F.M. Marassi et al.

    NMR structural studies of membrane proteins

    Curr. Opin. Struct. Biol.

    (1998)
  • C.D. Schwieters et al.

    Internal coordinates for molecular dynamics and minimization in structure determination and refinement

    J. Magn. Reson.

    (2001)
  • C.D. Schwieters et al.

    The Xplor-NIH NMR molecular structure determination package

    J. Magn. Reson.

    (2003)
  • S. Vijay-Kumar et al.

    Structure of ubiquitin refined at 1.8 Å resolution

    J. Mol. Biol.

    (1987)
  • A. Weichsel et al.

    Crystal structures of reduced, oxidized, and mutated human thioredoxins: Evidence for a regulatory homodimer

    Structure

    (1996)
  • M. Zweckstetter et al.

    Prediction of charge-induced molecular alignment of biomolecules dissolved in dilute liquid-crystalline phases

    Biophys. J.

    (2004)
  • M. Andrec et al.

    Protein backbone structure determination using only residual dipolar couplings from one ordering medium

    J. Biomol. NMR

    (2001)
  • A. Annila et al.

    Recognition of protein folds via dipolar couplings

    J. Biomol. NMR

    (1999)
  • L.G. Barrientos et al.

    Characterization of surfactant liquid crystal phases suitable for molecular alignment and measurement of dipolar couplings

    J. Biomol. NMR

    (2000)
  • S. Beraud et al.

    Direct structure determination using residual dipolar couplings: Reaction-site conformation of methionine sulfoxide reductase in solution

    J. Am. Chem. Soc.

    (2002)
  • A.A. Bothner-By et al.

    High-field orientation effects in the high-resolution proton NMR-spectra of diverse porphyrins

    Magn. Reson. Chem.

    (1985)
  • D.T. Braddock et al.

    Rapid identification of medium- to large-scale interdomain motion in modular proteins using dipolar couplings

    J. Am. Chem. Soc.

    (2001)
  • M.T. Brenneman et al.

    A method for the analytic determination of polypeptide structure using solid-state nuclear magnetic-resonance—the metric method

    J. Chem. Phys.

    (1990)
  • A.T. Brunger

    XPLOR: A System for X-ray Crystallography and NMR, 3.1 Ed.

    (1993)
  • D.L. Bryce et al.

    Application of correlated residual dipolar couplings to the determination of the molecular alignment tensor magnitude of oriented proteins and nucleic acids

    J. Biomol. NMR

    (2004)
  • J.J. Chou et al.

    A simple apparatus for generating stretched polyacrylamide gels, yielding uniform alignment of proteins and detergent micelles

    J. Biomol. NMR

    (2001)
  • J.J. Chou et al.

    Solution structure of Ca2+-calmodulin reveals flexible hand-like properties of its domains

    Nat. Struct. Biol.

    (2001)
  • G.M. Clore

    Accurate and rapid docking of protein-protein complexes on the basis of intermolecular nuclear Overhauser enhancement data and dipolar couplings by rigid body minimization

    Proc. Natl. Acad. Sci. USA

    (2000)
  • G.M. Clore et al.

    R-factor, free R, and complete cross-validation for dipolar coupling refinement of NMR structures

    J. Am. Chem. Soc.

    (1999)
  • G.M. Clore et al.

    Docking of protein-protein complexes on the basis of highly ambiguous intermolecular distance restraints derived from H-1(N)/N-15 chemical shift mapping and backbone N-15-H-1 residual dipolar couplings using conjoined rigid body/torsion angle dynamics

    J. Am. Chem. Soc.

    (2003)
  • G.M. Clore et al.

    Assignment of the side-chain H-1 and C-13 resonances of interleukin-1-beta using double-resonance and triple-resonance heteronuclear 3-dimensional NMR-spectroscopy

    Biochemistry

    (1990)
  • G.M. Clore et al.

    Measurement of residual dipolar couplings of macromolecules aligned in the nematic phase of a colloidal suspension of rod-shaped viruses

    J. Am. Chem. Soc.

    (1998)
  • G. Cornilescu et al.

    Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase

    J. Am. Chem. Soc.

    (1998)
  • G. Cornilescu et al.

    Protein backbone angle restraints from searching a database for chemical shift and sequence homology

    J. Biomol. NMR

    (1999)
  • F. Delaglio et al.

    Protein structure determination using molecular fragment replacement and NMR dipolar couplings

    J. Am. Chem. Soc.

    (2000)
  • P.C. Du et al.

    Have we seen all structures corresponding to short protein fragments in the Protein Data Bank? An update

    Protein Eng.

    (2003)
  • R.L. Dunbrack et al.

    Conformational-analysis of the backbone-dependent rotamer preferences of protein side-chains

    Nat. Struct. Biol.

    (1994)
  • A.A. Fedorov et al.

    X-ray structures of isoforms of the actin-binding protein profilin that differ in their affinity for phosphatidylinositol phosphates

    Proc. Natl. Acad. Sci. USA

    (1994)
  • Cited by (56)

    • Molecular modeling of biomolecules by paramagnetic NMR and computational hybrid methods

      2017, Biochimica et Biophysica Acta - Proteins and Proteomics
      Citation Excerpt :

      Software packages like MECCANO (Molecular Engineering Calculations using Coherent Association of Non-averaged Orientations) can directly determine the protein structure [114]. In a similar method, molecular fragment replacement (MFR), the measured RDCs are directly compared with predicted RDCs from 3D fragments in a database generated from known structures in the protein databank [115,116]. This approach was shown to identify partial or complete homologous folds of the target protein [117–119].

    • Dynamic pictures of proteins by NMR

      2014, Annual Reports on NMR Spectroscopy
    • Analysis of non-uniformly sampled spectra with multi-dimensional decomposition

      2011, Progress in Nuclear Magnetic Resonance Spectroscopy
      Citation Excerpt :

      High resolution and the possibility to work with a one-dimensional representation of the multidimensional spectra dramatically simplifies identification of the signals and, as the consequence, allows reliable automated spectral analysis. The NUS-MDD approach for efficient handling of raw experimental data fits well into the context of recent advances in computational techniques for automated signal assignments [88,89], rapid protein structure determination, and macromolecular complex characterization [93–98]. Together, these methods should enable one to determine spatial structures and characterize protein dynamics and interactions with higher accuracy and much more rapidly than is possible at present, thereby enhancing the value of NMR spectroscopy in high-throughput applications such as structural genomics [99,100] and in conventional “hypothesis-driven” structural biology projects.

    • Solution conformation and dynamics of the HIV-1 integrase core domain

      2010, Journal of Biological Chemistry
      Citation Excerpt :

      To better sample low energy conformations of the catalytic loop that are in agreement with NMR chemical shifts, a second set of CS-ROSETTA calculations was performed. For this round, fragment selection (70) was once again carried out by using the backbone and 13Cβ chemical shifts; however, during refinement, all residues were fixed to the x-ray coordinates (PDB entry 1QS4, chain C) except for the catalytic loop (residues 139–153) and N-terminal residues 50–57. Specifically, the loop-relax protocol in ROSETTA 3.0 was used to generate 3,000 all-atom models, where conformational sampling was confined to the catalytic loop.

    • Chemical shift-based methods in NMR structure determination

      2018, Progress in Nuclear Magnetic Resonance Spectroscopy
      Citation Excerpt :

      The utility of MFR is highlighted by a measured backbone RMSD (Root Mean Square Deviation) of 1.2 Å (angstrom) between modelled and X-ray structures of ubiquitin [62], suggesting that folds for small proteins can be captured using solely the chemical shifts and dipolar couplings, thereby alleviating the need to acquire and analyze NOESY data. Further improvements have been made to this algorithm at various stages including fragment search, assembly, sidechain placement, and structure refinement by employing other NMR parameters, such as J-couplings and NOEs [64,65]. While the early MFR method could accurately model backbone structures of small proteins, a significant limitation remained with respect to sidechain placement [62], which has been addressed in more recent methods [64,65].
