Elsevier

Tetrahedron

Volume 56, Issue 48, 24 November 2000, Pages 9461-9470
Synthesis of Multi-Domain Proteins Using Expressed Protein Ligation: Strategies for Segmental Isotopic Labeling of Internal Regions

https://doi.org/10.1016/S0040-4020(00)00830-9

Abstract

Here we describe how a sequential version of the protein semi-synthesis technique, Expressed Protein Ligation (EPL), can be used to assemble multiple (i.e. 3 or more) recombinantly derived polypeptide segments into a target protein. Sequential EPL was successfully used to assemble the 304 amino acid eukaryotic adaptor protein, Crk-II, from three recombinant polypeptide segments in good yield. Moreover, the resulting multi-component ligation product was found to possess the expected biological activity in a series of ligand binding studies. By allowing the controlled assembly of 3 or more recombinant polypeptide segments, sequential EPL opens the door to the segmental isotopic labeling of internal regions of large proteins with NMR probe-nuclei.

Introduction

Expressed Protein Ligation (EPL) is a protein semi-synthesis technique which allows synthetic peptides and recombinant proteins to be chemoselectively and regioselectively joined together via a normal peptide bond.1., 2., 3. The technique combines the structural flexibility associated with chemical peptide synthesis with the extended size range of recombinant DNA expression, i.e. semi-synthetic proteins can be prepared from short synthetic peptide cassettes (containing any number of chemical probes) and much larger recombinant polypeptide building blocks. As with other approaches designed to allow the introduction of unnatural amino acids into proteins,4., 5., 6., 7., 8., 9. EPL offers the possibility of using synthetic chemistry to study protein structure and function in a manner entirely analogous to that routinely used to elucidate the structure-activity relationships (SAR) of small bioactive peptides.10 This type of ‘chemical protein mutagenesis’ provides expanded opportunities in protein engineering and should have applications in areas ranging from structural biology and biochemistry to basic cell biology.

EPL is an extension of the well established native chemical ligation approach, originally developed for the total chemical synthesis of proteins from fully unprotected synthetic peptides.7., 8., 9., 10., 11., 12. Native chemical ligation is based on the chemoselective reaction of a polypeptide containing a C-terminal thioester (α-thioester) with a second polypeptide containing an N-terminal cysteine (α-Cys) residue. At neutral pH, such unprotected peptides chemoselectively react with one another to form a native peptide bond at the ligation site.11 This is an extremely robust chemical process which can be performed in the presence of chaotropes, detergents, organic solvents and, importantly, all of the chemical functionalities found commonly in proteins. Thus, native chemical ligation is widely used in the synthesis of small proteins and protein domains.7., 12. Significantly, recent advances in protein engineering allow both the necessary reactive groups for native chemical ligation to be introduced into recombinant polypeptides (Fig. 1). This, of course, means that all possible combinations of synthetic and recombinant building blocks can be assembled in a semi-synthetic ligation process, commonly referred to as Expressed Protein Ligation.

As illustrated in Fig. 1, recombinant polypeptides containing α-Cys moieties or α-thioester groups can be generated directly from the appropriate fusion-protein precursors. Reactive recombinant protein α-thioesters can be prepared by exploiting the natural process, protein splicing, a post-translational event known to involve thioester intermediates.13., 14., 15., 16. It is possible to chemically intercept the splicing process with a suitable thiol by appending the recombinant polypeptide of interest to a mutated protein splicing domain (termed an intein,17), thereby generating the corresponding recombinant protein α-thioester (Fig. 1A). Two general strategies have been developed for the production of α-Cys containing recombinant polypeptides, both of which involve proteolytic removal of an N-terminal leader sequence from a precursor fusion protein (Fig. 1B). The first approach makes use of specific proteases which cleave C-terminal to their recognition site—factor Xa, enterokinase or Ubiquitin C-terminal hydrolase. By appending the appropriate recognition sequence immediately in front of a cryptic N-terminal cysteine in the protein of interest, it is possible to generate the requisite α-Cys moiety by in vitro treatment of the fusion protein with the protease.18 An alternative strategy has recently been described which does not require a separate proteolysis step but instead makes use of an auto-processing fusion protein system.19., 20., 21. This approach was again developed from studies on protein splicing and utilizes yet another engineered intein splicing domain which spontaneously cleaves itself off the precursor fusion protein, again to give the requisite N-terminal cysteine protein.

Expressed protein ligation has been used in a variety of protein engineering studies, allowing the incorporation of post-translational modifications,1 unnatural amino acids,22 and biochemical/biophysical probes2., 18., 23. into proteins. The technology has also been used to generate backbone cyclized and polymeric proteins,24., 25., 26., 27. as well as proteins which act as biosensors for signal transduction processes.28., 29. Of particular relevance to the present work, EPL and related technologies (see below) have been used in so-called ‘Segmental Isotopic Labeling’ strategies30., 31., 32., 33. designed to address the practical size limit for protein structure determination using nuclear magnetic resonance (NMR) spectroscopy. This limit is attributable to the loss of spectral resolution arising both from increased line widths at longer rotational correlation times and from the increased number of signals of similar chemical shifts; both effects are proportional to the number of amino acids in the protein. The former of these problems has to some extent been addressed with the development of new NMR experiments such as Transverse Relaxation Optimized Spectroscopy (TROSY)34 and approaches for measuring residual dipolar coupling constants.35 However, standard isotopic labeling strategies involving uniform incorporation of 13C, 15N and perdeuteration of amino acid side-chains do not address the signal overlap problem for larger systems. Segmental isotopic labeling seeks to resolve this issue by allowing selected segments of a protein to be isotopically labeled with 13C, 15N and 2H. In principle, segmental isotopic labeling of a large protein will lead to simplified NMR spectra; unlabeled regions of the protein can be filtered out using suitable heteronuclear correlation experiments, leaving only signals from the labeled part of the protein. Consequently, segmental labeling should allow NMR structure analysis of discrete regions of very large proteins (Fig. 2).

The feasibility of segmental isotopic labeling has recently been demonstrated using two different peptide ligation strategies, trans-splicing30., 32., 33. and EPL.31 The trans-splicing approach is based on the observation that protein splicing can be triggered by reconstituting inactive N- and C-terminal fragments of an intein.36., 37., 38., 39. This provides a means of joining any two recombinant proteins together in vitro: each protein is expressed as a fusion with complementary parts of the split-intein, and simply refolding the fusion proteins together activates protein splicing (through non-covalent association of the intein fragments) and so generates the desired chimera. In a pioneering study, Yamazaki and co-workers30 exploited the trans-splicing phenomenon for segmental isotopic labeling of proteins. Using a trans-splicing system based on the PI-PfuI intein, these researchers were able to selectively 15N label the C-terminal domain of the E. coli RNA polymerase α subunit. Each half of the protein was selectively labeled by simply expressing the corresponding split intein fusion in 15N enriched medium. Heteronuclear NMR experiments on these samples revealed clearly the improvement in spectral resolution that segmental labeling provides. In a complementary study,31 EPL was used to 15N label a single domain within a Src-homology domain pair derived from the Abl protein tyrosine kinase. An ethyl thioester derivative of recombinant Abl-SH3 was generated from the corresponding intein fusion and then chemically ligated, under physiological conditions, to 15N-labeled recombinant Abl-SH2 possessing a factor Xa generated α-Cysteine. Comparison between the 1H{15N} NMR spectra of fully labeled and segmentally labeled Abl-SH(32) again illustrated the power of segmental isotopic labeling for studying large proteins by NMR spectroscopy.

The biosynthetic strategies described above allow the amino or carboxyl terminal half of a protein to be selectively isotopically labeled. A logical extension of this work would be to develop strategies which allow segmental isotopic labeling of internal regions of a large protein (Fig. 2). This important technical advance has recently been achieved by Yamazaki and co-workers33 who selectively labeled an internal region of the model protein, Maltose Binding Protein (MBP), using a clever tandem trans-splicing method. Key to this approach was the use of two orthogonal split inteins, PI-PfuI and PI-PfuII, both from the organism Pyrococcus furiosus. The target MBP protein was expressed as three split-intein fusions; the central MBP segment contained PI-PfuII and PI-PfuI intein fragments at its N- and C-termini, respectively, while the N- and C-terminal MBP segments contained the complementary PI-PfuII and PI-PfuI fragments at their C- and N-termini, respectively. Synthesis of the desired internally labeled MBP was achieved by reconstitution of the three purified polypeptides (Fig. 2A). Importantly, the authors were able to prevent undesirable inter-intein trans-splicing by splitting the two inteins at different points in the primary sequence, thereby allowing the two desired trans-splicing reactions to occur simultaneously with high fidelity and efficiency. Indeed, an elegant feature of this approach is the self-assembly of the four intein fragments, which not only allows the one-pot synthesis of the product but also significantly reduces the concentration dependence of this third-order reaction. Potential drawbacks of the strategy are that it results in the insertion of several additional amino acids at the two splice junctions and that it requires relatively harsh reaction conditions (both chemical denaturants and elevated temperatures are required at various points).

Although EPL has yet to be used for the selective isotopic labeling of an internal region of a protein, the biosynthetic tools required for this purpose have been reported.28., 29. These make use of a so-called sequential EPL reaction which allows a synthetic peptide or a recombinant polypeptide to be inserted into the interior of a recombinant protein framework (Fig. 2B). In this strategy, the polypeptide insert contains both an α-thioester and a cryptic α-Cys masked by a factor Xa cleavable pro-sequence. This reversible cysteine protection is necessary to prevent the insert reacting with itself in either an inter- or intramolecular fashion.24., 27. In the first step, the insert is reacted through its α-thioester group with the α-Cys moiety of a C-terminal recombinant polypeptide. The pro-sequence is then removed from this intermediate ligation product by treatment with factor Xa, to reveal the requisite α-Cys for the second ligation reaction, this time with the N-terminal polypeptide α-thioester segment. In this way, regioselective insertion is achieved. As with all EPL reactions, the chemistry can be performed in the presence or absence of chaotropic reagents, and procedures are now available which allow all of the steps to be performed either in solution28 or on the solid-phase.29
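The order of operations in sequential EPL can be sketched as a small bookkeeping model. The following is only an illustrative sketch, not a description of the experimental procedure: the `Segment` class, the function names and the segment labels are all hypothetical, and the model tracks nothing beyond the presence of an α-thioester and the state of the N-terminal cysteine.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """Hypothetical record of the reactive ends of a polypeptide segment."""
    name: str
    n_term: str        # "cys", "masked_cys" (pro-sequence attached) or "other"
    c_thioester: bool  # True if the C-terminus carries an alpha-thioester


def can_ligate(n_side: Segment, c_side: Segment) -> bool:
    # Ligation needs an alpha-thioester on one partner and a free alpha-Cys on the other.
    return n_side.c_thioester and c_side.n_term == "cys"


def ligate(n_side: Segment, c_side: Segment) -> Segment:
    if not can_ligate(n_side, c_side):
        raise ValueError(f"{n_side.name} and {c_side.name} cannot be ligated")
    # The product keeps the N-terminus of n_side and the C-terminus of c_side.
    return Segment(f"{n_side.name}-{c_side.name}", n_side.n_term, c_side.c_thioester)


def factor_xa_cleave(seg: Segment) -> Segment:
    # Removing the pro-sequence exposes the cryptic alpha-Cys.
    if seg.n_term != "masked_cys":
        raise ValueError(f"{seg.name} has no pro-sequence to remove")
    return replace(seg, n_term="cys")


def sequential_epl(n_seg: Segment, insert: Segment, c_seg: Segment) -> Segment:
    intermediate = ligate(insert, c_seg)         # step 1: insert + C-terminal segment
    unmasked = factor_xa_cleave(intermediate)    # step 2: reveal alpha-Cys
    return ligate(n_seg, unmasked)               # step 3: add N-terminal segment


n_seg = Segment("N", "other", True)
insert = Segment("insert", "masked_cys", True)
c_seg = Segment("C", "cys", False)
product = sequential_epl(n_seg, insert, c_seg)
print(product.name)
```

In this model the masked insert cannot react with itself (`can_ligate(insert, insert)` is false), which mirrors the reason given above for protecting the cysteine until the first ligation is complete.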

In principle, sequential EPL permits the segmental isotopic labeling of internal regions of proteins. In the present work, the feasibility of this is demonstrated through the synthesis of the eukaryotic adaptor protein, c-Crk-II, from three recombinant polypeptide fragments. Moreover, we show that the resulting multi-component ligation product is fully biologically active in a series of ligand binding studies. Thus, sequential EPL permits both the endo and exo segmental isotopic labeling of this protein.

Synthetic design

c-Crk is a member of the so-called ‘adaptor protein’ family of intracellular signaling proteins. The major role of adaptor proteins such as Grb2, Nck, and Crk, is to recruit proline-rich effector molecules to tyrosine phosphorylated kinases or their substrates, a function which they mediate through their Src homology 3 (SH3) and SH2 domains.40., 41. Thus, adaptor proteins play a critical role in regulating signaling pathways, and c-Crk has been implicated in several such processes including

Summary

By allowing the controlled assembly of 3 or more polypeptide segments, sequential EPL permits the segmental isotopic labeling of internal regions of large proteins. In the present work we have demonstrated that it is possible to assemble the 304 amino acid protein, c-Crk-II, from three recombinant polypeptide segments and that, to the first approximation, the purified reassembled protein retains the ligand binding properties of the wild-type protein. Optimization of the protein expression and

General materials and methods

Analytical gradient HPLC was performed on a Hewlett-Packard 1100 series instrument with 214 and 280 nm detection using a Vydac C18 column (5 micron, 4.6×150 mm) at a flow rate of 1 mL/min. Preparative HPLC was performed on a Waters DeltaPrep 4000 system fitted with a Waters 486 tunable absorbance detector using a Vydac C18 column (15–20 micron, 50×250 mm) at a flow rate of 30 mL/min. All runs used linear gradients of 0.1% aqueous TFA (solvent A) vs 90% acetonitrile plus 0.1% TFA (solvent B).
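As a worked illustration of what a linear A/B gradient implies for mobile-phase composition, the short sketch below computes the percentage of solvent B and the resulting acetonitrile content at a given time. The gradient endpoints and duration used in the example (0 to 60% B over 30 min) are hypothetical values chosen for illustration and are not taken from this work; only the composition of solvent B (90% acetonitrile) comes from the text.

```python
ACETONITRILE_FRACTION_IN_B = 0.90  # solvent B is 90% acetonitrile (from the text)


def percent_b(t_min: float, start_b: float, end_b: float, duration_min: float) -> float:
    """Percentage of solvent B at time t_min for a linear gradient."""
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    # Clamp so times outside the gradient window return the endpoint composition.
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


def percent_acetonitrile(pct_b: float) -> float:
    """Acetonitrile content (v/v %) of the mixed mobile phase."""
    return ACETONITRILE_FRACTION_IN_B * pct_b


# Hypothetical gradient: 0-60% B over 30 min, sampled at the midpoint.
b_mid = percent_b(15.0, start_b=0.0, end_b=60.0, duration_min=30.0)
print(b_mid, percent_acetonitrile(b_mid))
```

Because solvent B is itself only 90% acetonitrile, the organic content of the eluent is always slightly lower than the nominal percentage of B.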

Acknowledgements

We thank Drs D. Cowburn and R. Birge (Rockefeller University) for many useful discussions. This research was supported by the National Institutes of Health (GM55843, T. W. M.), the Burroughs-Wellcome fund (G. J. C.) and a Merck postdoctoral fellowship (U. K. B.).

References (50)

  • K. Severinov et al., J. Biol. Chem. (1998)
  • C.J.A. Wallace, Curr. Opin. Biotechnol. (1995)
  • J. Wilken et al., Curr. Opin. Biotechnol. (1998)
  • G.J. Cotton et al., Chem. Biol. (1999)
  • T.W. Muir et al., Methods Enzymol. (1997)
  • F.B. Perler, Cell (1998)
  • D.A. Erlanson et al., Chem. Biol. (1996)
  • T.C. Evans et al., J. Biol. Chem. (1999)
  • S. Mathys et al., Gene (1999)
  • R.S. Roy et al., Chem. Biol. (1999)
  • H. Iwai et al., FEBS Lett. (1999)
  • T.C. Evans et al., J. Biol. Chem. (1999)
  • G.J. Cotton et al., Chem. Biol. (2000)
  • H. Wu et al., Biochim. Biophys. Acta (1998)
  • L. Buday, Biochim. Biophys. Acta (1999)
  • Y. Hashimoto et al., J. Biol. Chem. (1998)
  • G. Wider et al., Curr. Opin. Struct. Biol. (1999)
  • S. Chong et al., J. Biol. Chem. (1998)
  • U.K. Blaschke et al., Methods Enzymol. (2000)
  • X. Wu et al., Structure (1995)
  • Y.Q. Gosser et al., Structure (1995)
  • T.W. Muir et al., Proc. Natl. Acad. Sci. USA (1998)
  • T.C. Evans et al., Protein Sci. (1998)
  • V.W. Cornish et al., Angew. Chem., Int. Ed. Engl. (1995)
  • Y. Chen et al., Science (1994)