Optimization of sortase A ligation for flexible engineering of complex protein systems

Engineering and bioconjugation of proteins is a critically valuable tool that can facilitate a wide range of biophysical and structural studies. The ability to orthogonally tag or label a domain within a multidomain protein may be complicated by undesirable side reactions to noninvolved domains. Furthermore, the advantages of segmental (or domain-specific) isotopic labeling for NMR, or deuteration for neutron scattering or diffraction, can be realized by an efficient ligation procedure. Common methods—expressed protein ligation, protein trans-splicing, and native chemical ligation—each have specific limitations. Here, we evaluated the use of different variants of Staphylococcus aureus sortase A for a range of ligation reactions and demonstrate that conditions can readily be optimized to yield high efficiency (i.e. completeness of ligation), ease of purification, and functionality in detergents. These properties may enable joining of single domains into multidomain proteins, lipidation to mimic posttranslational modifications, and formation of cyclic proteins to aid in the development of nanodisc membrane mimetics. We anticipate that the method for ligating separate domains into a single functional multidomain protein reported here may enable many applications in structural biology.

The development of tools to engineer and provide orthogonal labeling/tagging of proteins has expanded significantly in recent years (1,2). These tools range from insertion of single-site nonnatural amino acids with reactive side chains (3,4) to tagging or extending a protein with "extra" groups, such as peptides and modified peptides, fluorophores, lipids, or carbohydrates (5-9). The ability to conduct such orthogonal labeling enables unique experimental tools to investigate structural and biophysical properties of proteins and complexes. An important subset of this methodology is the approach to constructing full-length multidomain proteins from separate segments or domains of the full protein. This methodology has become popular in NMR structural biology as a means to introduce stable isotopic labeling patterns on different components/segments of the full-length protein (10-12) or to enable adventitious orthogonal tagging of a domain and then to construct the full-length protein. This engineering, generally referred to as segmental labeling, allows detailed investigations of a domain within the context of the full system. All of these concepts share the goal of post-translationally ligating separate components to yield the desired intact system. Each component can be individually prepared with a desired isotopic labeling, or even individually tagged with a desired probe, prior to combination into the full molecular system. For the case of protein ligations, recent studies have shown the power of using the transpeptidase sortase A to enzymatically link two protein domains together (10,11,13). Recently, another transpeptidase enzyme (Oldenlandia affinis asparaginyl endopeptidase) has been proposed to enable transpeptidase ligation (14-16). Sortase-mediated ligation (SML) has been found to be advantageous or superior to expressed protein ligation (17), protein trans-splicing (18,19), and native chemical ligation (20).
There have been numerous investigations into the mechanisms and applications of Staphylococcus aureus sortase A (SrtA) ligations (also known as "sorttagging") (21,22). As part of a protein design/evolution study, SrtA was investigated with the intent to improve its efficiency in bond forming (23), completely separate from the ligation application arena of SrtA. A number of mutations were discovered through yeast display that showed significant/dramatic improvements in catalytic function. One of these mutants has become popular as the penta-SrtA (5-point mutations, P94R/D160N/D165A/K190E/K196T; also known as eSrtA, named Srt5M in this study), and a subsequently developed, calcium-independent hepta-SrtA (7-point mutations, P94R/E105K/E108A/D160N/D165A/K190E/K196T) (24,25) variant has been used for sorttagging on cell surfaces (26,27). The mechanistic role of these mutations has only been partially examined, in the context of a peptide-linked structure of SrtA (28). It is clear that the recognition peptide for the N-domain is critical to the reaction; however, only one or two of the mutations (P94R and D165A) in penta-SrtA cluster sufficiently close around the recognition site of this peptide to suggest a plausible interaction. It is more reasonable to predict that the collective actions of these mutations alter the binding pocket for the recognition sequence such that the on- and off-rates for this recognition sequence may be impacted favorably, leading to the increased activity. Detailed explanations would require structural investigation of the variants in complex with the recognition peptide. The C-domain recognition sequence is simply a GG-, and it appears that there is little specific structural recognition of this sequence, suggesting less involvement and less opportunity to optimize performance. Current applications in the literature have focused on the use of penta-SrtA or hepta-SrtA.
The basic ligation reaction is outlined in Fig. 1. Freiburger et al. (11) reported a generalized procedure that involves complementary tags and proteolytic cleavage sites on the N-terminal domain (NTD) and C-terminal domain (CTD) components. The optimized protocol suggested an ~6-h reaction time. There have also been reports of alternative methods to limit the back reaction (29). The motivation of the present work was to develop a system that was fast relative to previous protocols (10,11), addressed complications due to the reverse reaction limiting the extent of ligation, and facilitated separation of the cleaved by-products. Our particular interest is to find efficient, practical protocols for difficult human proteins that are the subject of structural biology investigations. The developments are based on the work of Chen et al. (23), who examined variants in a truncated version of SrtA (residues 60-204) for their catalytic efficiencies.
We have explored three of the SrtA variants (Srt3M, Srt4M, and Srt5M, corresponding to 3-, 4-, and 5-point mutations, respectively; Fig. S1) (23) with respect to optimization of conditions and relative advantages that each may present for different applications of engineering. We find comparative advantages for different SrtA variants, depending on the applications and the conditions. By tailoring conditions to the specific application and SrtA variant, it is possible to achieve high levels of ligation, minimize back reactions, and simplify some of the previously reported procedures (11). The Srt3M and Srt4M variants are shown to be especially adaptive to different conditions and provide the best balance for reaction rate/efficiency and robustness of solution conditions to enable a range of ligation situations to create (i) segmentally labeled or joined proteins, including proteins with intrinsically disordered regions (IDRs), (ii) circularization of proteins (30,31), and (iii) lipidation of proteins (32) for anchoring in nanodiscs.

Comparison of sortase A variants
All experimental protocols were developed using the truncated, catalytic domain of S. aureus SrtA(Δ59), comprising residues 60-204 of the full-length SrtA (residues 1-204) (see Fig. S1). This 17.6-kDa version of SrtA is equivalent to the variants described by Chen et al. (23). The hepta-SrtA (27) provides calcium-independent reactions; however, we find that the ability to quench the reaction by calcium extraction with EDTA can be advantageous. Four variants (Fig. S1, WT Srt, Srt3M, Srt4M, and Srt5M) were expressed and purified in-house. These variants were described previously, and the kinetic parameters associated with the enzymatic reaction were reported (23).
Figure 1 (caption fragment). The circled residues (LPXT and G) represent the SrtA recognition sites within the NTD and CTD, respectively. SrtA variants and both the NTDs and CTDs are routinely constructed with affinity tags. The most common is His6, whereas others, such as StrepII, have been used.
Optimization of the concentration of SrtA and the stoichiometry (ratio of NTD/CTD) was performed with Srt3M using two model peptides (NTD peptide and CTD peptide, 30 residues each). Sequence and reaction data are in Figs. S2 and S3. The findings were similar or equivalent for Srt4M and Srt5M (Fig. S3D). The optimal Srt concentration was found to be 5 μM. Equivalent results were found for all Srt variants (data not shown). Varying the stoichiometry of NTD/CTD can impact the yield of ligated product. It was observed that the extent of ligation can be driven further by increasing either the N-domain or the C-domain concentration. We have tested up to 10:1 (and 1:10) ratios, although 2:1 (or 1:2) suffices for most cases. Determining the appropriate ratio of N-domain and C-domain for a specific system is important when one or the other of these domains is either isotopically labeled or difficult to produce. Increasing the ratio of the unlabeled (or easy-to-produce) domain can be more efficient and cost less. We prefer to use more NTD in large-scale SML reactions, because unreacted NTD and Srt are both His-tagged and easily removed by Ni-NTA resin.
A significant difference between the Srt variants was observed in the optimal reaction time. Fig. 2 shows the comparison of Srt3M, Srt4M, and Srt5M in a time-course study, using the NTD and CTD for ligation of the E3 ligase gp78. We observed an optimal reaction time for Srt3M to be between 2 and 3 h, and significant back reaction (degradation) began to appear after 4 h or longer. Srt4M is the most efficient enzyme, with maximal ligation achieved in as little as 15 min. Srt5M is the second fastest enzyme, with optimal reaction in the 1-2-h range. The extent of ligation is a function of the catalytic efficiency of the Srt variant, the reverse reaction, and reaction time.
In Table 1, we indicate the relative efficiencies of the Srt variants based on our peptide ligation and other NTD and CTD protein segment ligations. It is clear that the use of Srt4M allows very rapid reactions and a high yield of ligated product. The rapid rate of the forward reaction offers opportunities for further optimizations under challenging conditions (see below). The rapid reaction of Srt4M can eliminate the need for either the dialysis method (10,33) or the centrifugal concentrator (11) to remove the cleavage product from the CTD. The ability to effectively trap the Srt variants with Ni-magnetic beads provides minimization of the back reaction. Appropriate combination of the Srt variant for the speed of reaction with the use of either rapid capture Ni-magnetic beads or Ni-NTA resin chromatography can yield a high net ligation efficiency.

Ligation of proteins with IDRs
The ubiquitin ligase gp78 is a multidomain E3 ligase that, in conjunction with the E2 conjugating enzyme Ube2g2, constructs Lys-48-linked ubiquitin chains on substrates (34). Our previous work on this system (35-37) followed the classical divide-and-conquer strategy of structural biology; however, it is important to examine the complete, intact system. The cytoplasmic domain of gp78, referred to as gp78C, consists of 287 amino acids (30.3 kDa) and contains an ordered RING domain, an ordered CUE domain, and an E2-binding domain (G2BR).
There are flexible IDRs between the RING and CUE domains and between the CUE and G2BR domains. We designed an NTD construct containing RING and the first IDR linker (137 residues, 13.5 kDa) and a CTD containing CUE-IDR linker-G2BR (157 residues, 17.7 kDa) (Fig. 2A). These domains were ligated using Srt3M, Srt4M, and Srt5M as shown in Fig. 2 (B-D). Whereas Srt4M and Srt5M produce their maximum ligation at 15 and 60 min, respectively, the net yield is greater for Srt3M, and there are net benefits for use of Srt3M (see below). With Srt3M, the reaction goes to ≥50% at 2 h, and the full-length protein is readily purified with Ni-magnetic beads (to remove Srt3M and unreacted RING-containing NTD), followed by SEC to separate ligated gp78C from the CTD CUE-G2BR. By designing the NTD and CTD constructs to place the Srt recognition and ligation sites within the exposed IDRs, the reaction proceeds rapidly for all Srt variants, similar to what was observed for peptides. Srt3M is optimal for this application based on its efficient yet moderate reaction rate. The moderate ligation rate allows sufficient time to rapidly remove the enzyme (using batch application of Ni-NTA resin or Ni-NTA magnetic beads), while simultaneously minimizing the back reaction. The ligated gp78C contains the Zn-binding RING domain, which places two restrictions on the ligation reaction: (i) use of EDTA would strip out the Zn and unfold the domain, hence it must be avoided, and (ii) Srt inhibitors that react with the active-site Cys (30,38,39) will also react with the Zn-binding Cys and must be avoided. Large-scale reactions are performed in volumes up to 50 ml, enabling the protocol to produce milligram samples of ligated full-length gp78C in a few hours.
The selection of labeling pattern and choice of either NTD or CTD as the labeled construct can yield all desired labeling combinations, thus enabling examination of either backbone resonances through 15N- and uniform 13C-labeling or examination of methyl group reporters of binding interaction and dynamics (40). This segmental labeling enables future studies of the complex formed between gp78C and both Ube2g2 and Ube2g2~Ub (Ube2g2 ligated with ubiquitin) (35,36,41). Fig. 2E illustrates the superposition of the 15N HSQC spectrum of a uniformly 15N-enriched gp78C with the 15N HSQC spectrum of a segmentally labeled gp78C, where the NTD RING is 15N-labeled and the CTD CUE-G2BR is unlabeled. The uniformly 15N-enriched gp78C protein spectrum exhibits unresolved clusters of resonances that correspond to the IDR linkers, which complicates monitoring of RING reporter resonances for further structural and dynamics investigations. The simplification of the spectrum and the near equivalence of the spectrum with that of the RING domain alone indicate the power of the method, wherein interactions of the RING can be monitored in situ with the full-length gp78C in complex with a ubiquitin-conjugating enzyme (Ube2g2). Fig. 2F illustrates labeling of the CTD CUE-G2BR with 13CH3 groups in the isoleucine, leucine, and valine residues (the so-called ILV labeling) (42). This spectrum reveals the ability to focus only on the CUE and G2BR domains of the intact protein. These spectra illustrate that, by selecting the labeling format, either the RING or the CUE-G2BR can be monitored using either 15N or 13C NMR probes to suit a desired investigation.

Ligation of two-domain proteins
The GTPase-activating protein (GAP) ASAP1 is a multidomain protein that regulates Arf1 signaling. The biochemical activity of ASAP1 is contained in a three-domain stretch (PZA) of the protein composed of the PH domain (P), the ArfGAP domain (Z, for zinc binding), and an ankyrin-repeat domain (A) (43). Different constructs of these domains (the PH alone, the tandem ZA domain, and the three domains PZA) are readily expressed and isotopically labeled. Ongoing structural studies (44) enable detailed investigation of the structures and interactions of these domains with Arf1. To facilitate studies of the multicomponent complexes at the surface of nanodisc membrane mimetics (45), we explored the ability to ligate the P and ZA domains to make a functional PZA domain, thus opening the potential of segmental isotopic labeling. The WT linker between P and ZA is 10 amino acid residues. This linker was engineered to provide the appropriate recognition sequences in the PH domain as NTD and the ZA as the CTD, such that the linker in the ligated protein remains 10 amino acid residues (Fig. S4C). It is possible to ligate these domains using either Srt4M or Srt5M as shown in Fig. 3. However, the stability of these NTD and CTD constructs required further optimization using lower temperature combined with the addition of 10 mM L-arginine. These conditions benefited from the increased reaction rate of Srt4M. The modified protocol for the PH and ZA ligation (same Srt-ligation buffer with the addition of 10 mM L-Arg) takes advantage of the high reactivity of Srt4M compared with Srt5M (Fig. 3, C and D). Approximately 40-50% of ZA is ligated to yield full-length srtPZA in 1 h at 4°C. The production of milligrams of ligated full-length PZA can be achieved in 1 day with this method. The ligated srtPZA retains biochemical activity when compared with WT PZA, as shown in Fig. 3E. Segmentally isotope-labeled PZA can be prepared using 15N-labeled PH and unlabeled ZA, and NMR spectra demonstrating the advantages are shown in Fig. 3B. The spectrum of uniformly 15N-labeled PZA is very crowded (see Fig. 3B, inset) due, in part, to the nature of the ZA secondary structure. Similar advantages can be obtained using 13C-methyl labeling. The simplification of the spectra illustrates the power of the methodology and the ability to have specific and unique reporting of only the PH domain (or the ZA domain if labeling is reversed) in the context of multicomponent complexes of ASAP1 and Arf1 (45).

Ligation to create circular proteins
It has been demonstrated that SrtA can be used to create cyclic peptides (31). Also, Srt5M ligation was employed to make cyclic versions of membrane scaffolding proteins (MSPs) to assemble into nanodiscs (30), which exhibit increased stability and other desired properties of the nanodisc particles. Whereas an alternative method of forming cyclic MSPs using inteins has been reported (46), it is valuable to have optimized procedures using SrtA to enable other orthogonal labeling/tagging that may not be feasible in the intein-based methods. In these applications, particularly with the membrane scaffolding proteins, the aspects of ligation completeness and elimination of back reaction are very important. The ligation efficiency to circularize NW9 (a variant of membrane scaffolding protein MSPΔH5 (47)) into cNW9 was found to be reasonable with Srt5M, as reported (30). We have observed that, with Srt5M, a minor component of head-to-tail linear dimers can be formed during the reaction, which, with time, degrade back to monomers via the back reaction and eventually lead to circular monomers. We tested the ability of all of the Srt variants to perform the ligation (Fig. 4A) of the NW9 to cNW9 (30). Furthermore, the amphipathic nature of MSP suggests that its behavior in aqueous solution may change substantially during the cyclizing ligation reaction and result in aggregation and precipitation. Hence, we explored the role of detergent additives to promote the reaction and stabilize the product. Table 2 indicates the tolerance of the reaction conditions to the presence of cholate and n-dodecyl β-D-maltoside (DDM). It was observed that WT Srt and Srt3M are very inefficient at producing cNW9, whereas both Srt4M and Srt5M can produce reasonable quantities of cNW9. Furthermore, Srt4M can tolerate DDM readily up to 1 mM in the ligation reaction, and, whereas reaction in the presence of DDM is slower, it minimizes oligomerization of cNW9 to an undetectable level (Fig. 4B).
Optimization of the reaction yields 100% completion overnight (with cholate) or in 40 h (with DDM) at 4°C (Fig. 4). The resultant cNW9 is readily assembled into stable nanodiscs (ND), which exhibit the stable properties reported previously (30,46). Based on these observations, we focused on the use and application of Srt4M in the production of cyclic MSP for the formation of nanodiscs.

Ligation to form lipidated proteins
The production of naturally occurring lipidated proteins, such as N-terminal myristoylated proteins, can be quite challenging. Lipid acyl chains attached to human proteins play significant roles in localization in membranes, and it is critical to have these post-translational modifications present in structural biology studies of such systems. We have recently shown that the efficiency of myristoylation of proteins can be significantly elevated and combined with isotopic labeling to yield fully functional systems (45). However, for both sophisticated NMR and neutron-scattering and reflectometry experiments, it is highly desirable to produce labeling patterns that include deuteration of the nonexchangeable hydrogens (42,48). We have observed that the combined metabolic pressure of myristoylation and deuteration can sometimes be overwhelming for overexpression in Escherichia coli, whereas the nonmyristoylated proteins can be efficiently deuterated. Hence, we explored the use of SrtA to lipidate the N terminus of human Arf1 protein, a small GTPase important in cancer (49,50). The N-terminal peptide of Arf1 interacts with the membrane, and the myristoyl moiety, which is naturally added to the N terminus, embeds into the membrane and facilitates localization to the membrane (51,52). This is a common mechanism for other GTPase proteins, such as Ras (53,54). Lipidation of proteins was previously illustrated using SrtA (32), wherein this was performed as a C-terminal addition to GFP, largely as a proof of concept. We elected to examine the use of SrtA to create a system close to the naturally occurring Arf1, using a synthesized myristoylated peptide containing the SrtA recognition sequence LPATG (Fig. 5A). Three peptides of various lengths were synthesized and tested for ligation efficiency. The most efficient myr-peptide (myr-GWKKQSLPATGQEHHHHHH) was used for subsequent lipidation studies. An additional 10 amino acids (GWKKQSLPAT) are introduced through ligation employing Srt4M in the presence of 20 mM cholate. Ligation to >90% can be achieved in 1 h at room temperature (RT) with a peptide/CTD = 2:1 ratio. In Fig. 5B, we compared the incorporation of both in vivo produced myr-Arf1 (generated in E. coli by co-expressing with N-terminal myristoyl transferase, as described previously (45)) and in vitro Srt4M-ligated myr-srtArf1 for the ability to reconstitute into nanodiscs. In Fig. 5C, we demonstrate that this mimetic is functional in GDP loading and GTP hydrolysis assays. It is possible to isotopically label Arf1 with 13CH3 ILV methyl groups in the background of full deuteration. Utilizing the ligation approach, it is now possible to lipidate and create a nearly natural mimetic of myr-Arf1. Finally, Fig. 5D illustrates the high quality of NMR spectra that can be obtained for 13CH3-ILV-2H-myr-Arf1 associated with nanodiscs (45). These data illustrate the flexibility to obtain highly deuterated, functional 13CH3-ILV-2H-myr-Arf1 that will support a range of structural, dynamics, and biophysical studies.
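The sequence bookkeeping of the ligation described above can be sketched in a few lines of Python. This is only an illustration of the transpeptidation outcome at the sequence level (the acyl donor is cut between Thr and Gly of LPXTG and joined to the N-terminal Gly of the acceptor); the function name `sortase_ligate` and the acceptor sequence `GGAAAA` are hypothetical placeholders, not the Arf1 construct, and the myristoyl group is not represented.

```python
import re

# LPXTG sorting motif; SrtA cleaves between the Thr and the Gly.
SORTASE_MOTIF = re.compile(r"LP.TG")


def sortase_ligate(ntd: str, ctd: str) -> str:
    """Return the one-letter sequence of the ligated NTD-CTD product.

    ntd must contain an LPXTG motif and ctd must start with Gly.
    Everything from the motif's Gly onward in ntd (for example a His tag)
    is released as the cleavage by-product and replaced by ctd.
    """
    matches = list(SORTASE_MOTIF.finditer(ntd))
    if not matches:
        raise ValueError("NTD lacks an LPXTG motif")
    if not ctd.startswith("G"):
        raise ValueError("CTD must begin with an N-terminal Gly")
    last = matches[-1]                 # use the most C-terminal motif
    acyl_donor = ntd[: last.end() - 1]  # keep ...LPXT, drop Gly and tag
    return acyl_donor + ctd


# Peptide sequence from the text; the acceptor is a made-up placeholder.
myr_peptide = "GWKKQSLPATGQEHHHHHH"
product = sortase_ligate(myr_peptide, "GGAAAA")
added = product[: len(product) - len("GGAAAA")]
print(product, len(added))
```

With the peptide from the text, the residues contributed to the product are GWKKQSLPAT, matching the 10 additional amino acids noted above.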
Similarly, we explored the ability to localize gp78C to the membrane surface using in vitro myristoylation. gp78 is a natural membrane protein, with a predicted five-transmembrane region (34,55); however, expression and reconstitution of the natural system has proven extremely difficult. Hence, most of the structural and biochemical studies have been performed on the soluble, cytosolic domain of gp78 (34,35). We designed an approach to the myristoylation of gp78C (Fig. 6A) using Srt4M in 20 mM cholate and the same synthetic myristoyl peptide used for Arf1. In vitro myristoylated gp78C can be readily incorporated into nanodiscs, similar to the procedures used for myr-Arf1 (see above). Free (or excess) myr-gp78C is readily separated from ND-bound gp78C in the SEC purification of ND. Fig. 6B illustrates the incorporation of approximately two myr-gp78C per nanodisc, suggesting one myr-gp78C per side of the symmetric ND. This stoichiometry is similar to incorporation of myr-Arf1 into ND, which was confirmed by both SDS-PAGE analysis and translational diffusion (45). Fig. 6C illustrates the 13C HMQC NMR spectra obtained for soluble gp78C and for myr-gp78C bound to nanodiscs. The equivalence of the spectra indicates that the domains are all intact and functional. These data indicate that studies of the membrane-bound gp78C will become feasible, both for structural and biochemical examination of ubiquitination.

Summary of ligations
Five different SML reaction systems have been investigated. The final conditions for these systems are provided in Table 3.

Discussion
The analysis of three variants of sortase A (Srt3M, Srt4M, and Srt5M) illustrates the range of possibilities in selecting an enzyme for ligation of two protein domains into a single, complex protein. We have demonstrated that a range of ligations can be efficiently performed, ranging from the ligation of IDRs, globular domains joined by a moderate linker, to formation of cyclic MSPs for use in nanodiscs and the flexible lipidation of proteins. We have illustrated that there is not a single "optimal" variant, and the ability to match a variant to the desired application is very powerful. The competing factors in selecting a sortase variant are the rate of reaction, extent of back reaction, the ability to purify the desired product from the reactants and Srt enzyme, and the ability to use additives (e.g. stabilizers or detergents). We also demonstrate that adaptation using temperature is available to tailor the selection of a variant. In terms of reaction rate, the variant order is WT Srt << Srt3M < Srt5M < Srt4M. The discovery of the variants of Srt (23) has clearly enabled flexible use of SML in numerous applications.
Srt3M provides a substantial improvement in reaction rate compared with WT Srt, with optimal reaction times of ~2-4 h at RT. Furthermore, the reaction rate is slow enough that the product can be rapidly and efficiently purified from the reactants and Srt enzyme to minimize the back reaction. These properties enable a simplified reaction condition in conical tubes that does not require a dialysis or centrifugal concentrator (10,33), although these could be used if desired. This variant is recommended for multiple-domain systems linked with IDRs, which suggests that this method would be beneficial to the growing field of intrinsically disordered proteins (56,57).
Srt4M has been shown to be the fastest enzyme in practical tests involving ligation of peptides, IDRs, and folded domains. It demonstrates very efficient ligation and moderate back reaction, which is consistent with the effective kcat/Km(GGG-COOH) reported for this variant (23). The enzyme functions well at both RT and reduced temperature (4°C), and Srt4M exhibits the greatest tolerance for detergents and other additives. The ability to tune the reaction rate with temperature is advantageous for matching the stability of the individual domains. This was illustrated with the ASAP1 PZA protein, where the stability of the NTD (PH domain) and CTD (ZA domain) benefited from reaction at lower temperature, where the speed of the enzyme enabled a short overall reaction time of 1-2 h at 4°C. This variant is the best choice for proteins with short linkers between structured domains, protein circularization, and lipidation. In particular, it is the most tolerant of the variants to the addition of detergents such as DDM and cholate. The production of cyclic MSPs using Srt4M is readily tuned by the addition of DDM and variation of temperature. Interestingly, we observe the back reaction only for intermediate head-to-tail dimers, which then cleave and proceed to circularize. There is no back reaction for cyclic MSPs, indicating that the circular protein cannot readily bind in the active site. Consequently, Srt4M efficiently provides 100% reaction to produce cNW9. It is anticipated that similar observations will hold for moderately small-diameter cyclic MSPs/nanodiscs, as used in solution and solid-state NMR studies, whereas larger-diameter MSPs, such as those used for EM (30), may behave differently.
Srt5M has become quite popular for SML and is readily available from Addgene. This variant is intermediate in ligation rate (between Srt3M and Srt4M) and has a manageable back reaction rate. Centrifugal concentrators and dialysis during the reaction have been recommended with success in managing the back reaction with Srt5M (10,11,33). It was found that the reaction properties of Srt3M (moderate) and Srt4M (fast) enable different approaches to (quick) purification and can avoid back reaction without these steps. Although we have not included centrifugal concentrators or dialysis in our SML protocol, it may be possible to employ these techniques in combination with Srt4M to reach even higher levels of ligated product. An application related to those presented in this study involves the assembly of cytoplasmic domains with transmembrane domains to support NMR studies in nanodiscs (58). Those studies utilized Srt5M, and it is possible that some advantages may be found using Srt3M or Srt4M.
Our novel in vitro approach to lipidate a protein by SML, utilizing a short myr-peptide, can have broad implications for membrane targeting. We have illustrated lipidation using myristate; however, the application would be open to any synthetically available lipidated peptide and is readily generalized to N- or C-terminal lipidation. The approach provides an alternative for proteins that are difficult to myristoylate in vivo using a bacterial expression system. Additionally, adding a myr-peptide at the N terminus of a membrane protein such as gp78C and tethering it to the surface of a lipid bilayer (such as a nanodisc) provides the possibility to study a membrane protein in a lipid bilayer environment without the complexities of expression and reconstitution of complex trans- or integral membrane domains. Our extensive tests indicate that high-level in vitro SML of a myr-peptide can be achieved with virtually any protein engineered with one or two glycines at the N terminus. Because the SML is conducted in the presence of detergent (to prevent aggregation of the myr-protein), careful monitoring of the fold and function of the protein is necessary. Subsequent reconstitution in a nanodisc or large unilamellar vesicles yields unique opportunities for structural and biophysical investigations of such systems.
Overall, we have adapted the SML reaction for five different applications using different variants of SrtA. The recommendations based on the optimizations for our systems are described in Table 4. It should be noted that each system should be surveyed for the best Srt variant and conditions, and these optimizations provide examples and general recommendations.
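As a compact restatement of the recommendations made in this Discussion, the following Python sketch encodes the variant suggested for each application and the reported order of reaction rates. It is not a reproduction of Table 4; the dictionary keys and the function names `suggest_variant` and `is_faster` are hypothetical labels chosen for illustration, and any real system should still be surveyed empirically.

```python
# Starting-point recommendations taken from the Discussion text
# (keys are illustrative labels, not terms defined by the authors).
RECOMMENDED_VARIANT = {
    "IDR-linked domains": "Srt3M",
    "short linker between folded domains": "Srt4M",
    "circularization": "Srt4M",
    "lipidation": "Srt4M",
}

# Reported order of reaction rate, slowest to fastest.
RATE_ORDER = ["WT Srt", "Srt3M", "Srt5M", "Srt4M"]


def suggest_variant(application: str) -> str:
    """Return the variant suggested in the Discussion for an application."""
    try:
        return RECOMMENDED_VARIANT[application]
    except KeyError:
        raise ValueError(f"no recommendation recorded for {application!r}")


def is_faster(variant_a: str, variant_b: str) -> bool:
    """True if variant_a ranks faster than variant_b in RATE_ORDER."""
    return RATE_ORDER.index(variant_a) > RATE_ORDER.index(variant_b)


print(suggest_variant("circularization"))
print(is_faster("Srt4M", "Srt5M"))
```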
Finally, we have focused on the illustration of the SML reactions using isotopic labeling for NMR spectroscopy as the motivation or readout of the utility of such ligations; however, it should be noted that any combinatorial engineering becomes feasible using optimized SML. In cases where the goal is to tag a system with a fluorophore, a paramagnetic tag, or other moiety, the tagging process may be developed for a single domain and then combined via SML to make the desired full-length protein.
One of the most common tagging tools involves creating single-Cys proteins by replacement of existing Cys residues and insertion of a new unique Cys residue at a desired site. The unique Cys may then be modified with a Cys-reacting tag. When dealing with mammalian proteins, one often encounters cases where it is impossible to mutate out cysteine residues and retain proper folding or function. However, for multidomain proteins, if one domain has a number of cysteine residues while a second domain has none, or if the second domain can be engineered to have only one Cys, then the single-Cys domain may be tagged and then ligated using SML to form the intact, functional protein. This example is an underlying driving force to our example of ligation of the PH domain to the ZA domain of ASAP1. Engineering a reactive single Cys into the PH domain can be feasible, but it is not possible to react the intact PZA with a tagging reagent without complications of reactions to other Cys residues in the ZA domain, followed by loss of function. However, as we have shown, it is possible to ligate the PH domain to the ZA domain, which, when using a pretagged PH domain, can yield a complete, tagged PZA protein. It would also be feasible to incorporate reactive side chains of nonnatural amino acids into one domain and then ligate this into a complex multidomain, active protein for use in biophysical and structural investigations. These approaches open the path to incorporate a range of tags for fluorescence-based binding measurements, spin-labeled distance measurements, and other novel reporter techniques.

Conclusions
The ligation of separate domains into a single intact, multidomain, functional protein is a significant enabling technology for many applications in structural biology. Our studies have concentrated on the combination of only two domains; however, it is clearly possible to do multiple combinations. The ability to selectively report on a specific component of a complex system enables the full power of NMR spectroscopy and other biophysical methods to be applied in very complex systems. The sortase A-mediated ligation has been shown to be applicable to a variety of systems, and we have demonstrated that the range and efficiency can be expanded and simplified by utilizing one of several variants of SrtA. The variants can be selected and adapted for a wide range of conditions. The principles presented here can be applied to any transpeptidase and lead to further expansion of the toolkit for engineering complex proteins with selective reporting functions.

Protein constructs, expression, and purification
All specific protocols are described in detail in the supporting information. These protocols include all Srt enzymes, tobacco etch virus protease, all NTD and CTD segment proteins, and the membrane scaffold protein (MSPΔH5).

Peptides
Peptides ranging in length from 19 to 30 amino acid residues were synthesized and obtained from LifeTein. These peptides were used as received. They are designated as NTD peptide, CTD peptide, and the myr-peptide (Fig. S2).

Ligation reaction conditions
Each test reaction was conducted in a 100-µl volume in an Eppendorf tube at room temperature with shaking. Large-scale reactions can be conducted in volumes ranging from 2 to 50 ml. The reaction buffer (SML buffer) was 50 mM Tris, pH 7.8, 5 mM CaCl2, 100 mM NaCl, 0.2 mM TCEP. The stoichiometry for standard reaction conditions is NTD/CTD = 1:1; however, in specific ligation applications, the stoichiometry has been modified to either 3:1 or 1:3, depending on the abundance or scarcity of reactants. The 3:1 ratio was found to be the most favorable, as the excess NTD (with C-terminal His6) and SrtA variant can be removed easily by Ni-NTA magnetic beads. The concentration of Srt was found to be optimal at 5 µM. The concentration-limiting reactant (NTD or CTD, depending on the ratio) was kept at 20 µM in linear ligations and reduced to 12.5 µM in circularization ligations. Reaction times were screened for each Srt variant to determine the time for optimal ligation. Screening was accomplished by sampling 10 µl of the test reaction mix at each time point, quenching with 10 µl of SDS-PAGE sample buffer/dye, and visualization with SDS-PAGE. Reactions can be quenched by (i) adding EDTA to 20 mM, or (ii) in cases when ligation products are sensitive to EDTA (such as gp78C and PZA), Srt and excess of NTD can be captured and rapidly removed using Ni-NTA magnetic beads (GoldBio). We have generally not utilized SrtA inhibitors (30,38,39), because these classes of compounds are reactive with free Cys residues that are present in many of our protein domains. Table 4 summarizes the conditions for each application. Modifications were made to optimize the yield of ligated proteins as follows.
PZA ligation-For the ligation of PZA (containing a short linker and moderate to low stability of the domains), 10 mM L-arginine was added to the SML buffer, and the reaction was conducted at 4°C, to stabilize the components and ligated PZA. L-Arginine can be removed by buffer exchange following purification.
NW9 circularization-For NW9 circularization, 1 mM DDM was added to the SML buffer, and the reaction was performed at 4°C to prevent oligomerization of cNW9. Comparisons were conducted using 20 mM cholate and 1 mM DDM (Fig. 5B).
Lipidation of gp78C and Arf1-For lipidation of gp78C and Arf1, 20 mM cholate was included in the SML buffer to protect newly myristoylated protein from aggregation.
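As a rough illustration of the stoichiometry described above, the following Python sketch computes final reactant concentrations for a chosen NTD/CTD ratio, holding the limiting reactant at 20 µM (12.5 µM for circularization) and Srt at 5 µM. The function names and the stock concentration used in the example are hypothetical and are not part of the reported protocol.

```python
def reaction_setup(ntd_to_ctd=(1, 1), limiting_uM=20.0, srt_uM=5.0):
    """Final concentrations (µM) with the limiting reactant held at limiting_uM."""
    ntd_parts, ctd_parts = ntd_to_ctd
    if ntd_parts <= 0 or ctd_parts <= 0:
        raise ValueError("ratio parts must be positive")
    # Scale the ratio so that the scarcer partner sits at the limiting concentration.
    scale = limiting_uM / min(ntd_parts, ctd_parts)
    return {
        "NTD_uM": ntd_parts * scale,
        "CTD_uM": ctd_parts * scale,
        "Srt_uM": srt_uM,
    }


def stock_volume_ul(final_uM, stock_uM, total_ul=100.0):
    """Volume of stock (µl) needed to reach final_uM in total_ul (C1*V1 = C2*V2)."""
    if stock_uM < final_uM:
        raise ValueError("stock is more dilute than the target concentration")
    return final_uM * total_ul / stock_uM


if __name__ == "__main__":
    # NTD/CTD = 3:1 in a 100-µl test reaction.
    mix = reaction_setup(ntd_to_ctd=(3, 1))
    print(mix)
    # Assumption for illustration only: a 600 µM NTD stock.
    print(stock_volume_ul(mix["NTD_uM"], stock_uM=600.0))
```

Passing `limiting_uM=12.5` gives the circularization case; the Srt concentration is left as an independent parameter because it was optimized separately from the reactant ratio.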

Purification of Srt-ligated proteins
At the conclusion of a large-scale SML, typically 10-25 ml, the reaction mixture was adjusted to 500 mM NaCl and 20 mM imidazole and then mixed with Ni-NTA magnetic beads (10 µl of beads/ml of reaction) to trap and remove Srt and excess NTD, according to the manufacturer's recommendations. The supernatant, containing ligated protein and excess CTD, was purified further based on the properties of each system. The following procedures were followed for each ligation described in this study.
Ligated PZA-The supernatant from Ni-magnetic beads was diluted to reduce the concentration of NaCl to 125 mM and applied to a 5-ml prepacked SP ion-exchange column (GE Healthcare). Unreacted ZA can be found in the flowthrough; ligated PZA was eluted at around 26% Buffer B (1 M NaCl, 50 mM Tris, pH 7.4, 0.5 mM TCEP) (Fig. S4C, chromatogram and SDS-PAGE).
Ligated myr-gp78C and myr-Arf1-The SML reaction mixture was subjected to Ni-NTA magnetic beads as described above. Because the reaction has been optimized to >90% completion (>90% of CTD is ligated to become myr-gp78C or myr-Arf1), no further purification is needed, and myr-protein can be concentrated and used for nanodisc membrane targeting. 20 mM cholate was maintained in the reaction and the final sample, to prevent myr-protein from aggregation. Any residual nonmyristoylated protein will not be incorporated into nanodiscs and will separate in the SEC stage of nanodisc preparation (45).
Ligated cNW9-The SML reaction in 1 mM DDM goes to 100% completion; hence, after removing Srt4M with Ni-NTA magnetic beads, the material can be concentrated and immediately used to form nanodiscs.
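The bead and dilution arithmetic in the purification steps above can be sketched in Python as follows. This is a minimal illustration that assumes the diluent contains no NaCl unless stated and that the supernatant volume is close to the reaction volume; neither assumption is specified in the protocol, and the function names are hypothetical.

```python
def bead_volume_ul(reaction_ml, ul_beads_per_ml=10.0):
    """Ni-NTA magnetic bead volume (µl) at 10 µl of beads per ml of reaction."""
    return reaction_ml * ul_beads_per_ml


def dilution_factor(start_mM, target_mM, diluent_mM=0.0):
    """Fold dilution f that brings NaCl from start_mM down to target_mM.

    Mixing 1 volume of sample with (f - 1) volumes of diluent gives
    (start + (f - 1) * diluent) / f = target, so
    f = (start - diluent) / (target - diluent).
    """
    if not diluent_mM < target_mM < start_mM:
        raise ValueError("require diluent_mM < target_mM < start_mM")
    return (start_mM - diluent_mM) / (target_mM - diluent_mM)


def diluted_volume_ml(sample_ml, start_mM, target_mM, diluent_mM=0.0):
    """Total volume (ml) loaded onto the column after dilution."""
    return sample_ml * dilution_factor(start_mM, target_mM, diluent_mM)


if __name__ == "__main__":
    # 25-ml reaction adjusted to 500 mM NaCl, then diluted to 125 mM for SP ion exchange.
    print(bead_volume_ul(25.0))
    print(dilution_factor(500.0, 125.0))
    print(diluted_volume_ml(25.0, 500.0, 125.0))
```

With a salt-free diluent, going from 500 mM to 125 mM NaCl is a fourfold dilution, so the load volume for the 5-ml SP column scales accordingly with the reaction size.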

Polar Lipids) at a molar ratio of 9:1. The lipids were combined with membrane scaffold protein MSPΔH5 in a ratio of 50:1, whereas myr-srtArf1:MSPΔH5 was kept at 1:1 (enabling the incorporation of one myr-srtArf1 on either side of the membrane nanodisc). GTPγS (2 mM) is maintained in the assembly mixture to aid the conversion of Arf1 from the GDP to the GTP form. After a 2-h incubation at ambient temperature, 1 g of washed Bio-Beads (Bio-Rad) is added per ml of mixture to absorb detergents. The suspension is rotated at ambient temperature overnight. The supernatant is pooled and purified on an SEC column (Superdex 200 Increase 10/300, GE Healthcare) in 50 mM Tris, pH 7.4, 150 mM NaCl, 0.5 mM TCEP buffer. Properly assembled nanodisc with cargo (myr-srtArf1) elutes around 14 ml of the 24-ml column volume. Fractions are visualized on SDS-PAGE, and the correct peak should contain an equal amount of MSPΔH5 and myr-srtArf1 (1:1). These fractions are pooled and concentrated for NMR and the ArfGAP activity assay. LC-MS spectrometry and dynamic light scattering measurements are routinely performed to ensure the quality of assembled nanodisc (45).
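The molar ratios in the assembly mixture above can be turned into target concentrations with a short Python sketch. The MSPΔH5 concentration is a free input chosen by the user (the value in the example is hypothetical), and the two lipids are labeled generically as `lipid_a` (9 parts) and `lipid_b` (1 part) because their identities are not restated in this passage.

```python
def nanodisc_mix(msp_uM, lipid_per_msp=50.0, lipid_ratio=(9, 1),
                 cargo_per_msp=1.0, gtp_gamma_s_mM=2.0):
    """Target concentrations for nanodisc assembly, scaled from the MSP concentration."""
    if msp_uM <= 0:
        raise ValueError("msp_uM must be positive")
    part_a, part_b = lipid_ratio
    total_parts = part_a + part_b
    total_lipid_uM = msp_uM * lipid_per_msp  # lipid:MSP = 50:1
    return {
        "MSP_uM": msp_uM,
        "total_lipid_uM": total_lipid_uM,
        "lipid_a_uM": total_lipid_uM * part_a / total_parts,  # 9 parts
        "lipid_b_uM": total_lipid_uM * part_b / total_parts,  # 1 part
        "myr_srtArf1_uM": msp_uM * cargo_per_msp,             # cargo:MSP = 1:1
        "GTPgammaS_mM": gtp_gamma_s_mM,
    }


def biobeads_g(mixture_ml, g_per_ml=1.0):
    """Mass of washed Bio-Beads (g) at 1 g per ml of assembly mixture."""
    return mixture_ml * g_per_ml


if __name__ == "__main__":
    # Assumption for illustration only: 100 µM MSP in a 2-ml assembly mixture.
    print(nanodisc_mix(100.0))
    print(biobeads_g(2.0))
```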

ArfGAP activity assay for myr-Arf1 and PZA
GTP hydrolysis of ND-anchored myr-Arf1·GTP and myr-srtArf1·GTP by ASAP1-PZA and srtPZA was measured as reported previously (45). The percentage of GTP bound to myr-Arf1 hydrolyzed in 3 min was plotted against the ASAP1-PZA concentration. The total concentration of exposed lipids was 500 µM, and the Arf1 concentration was 5 µM. All fluorescence measurements were performed with a Horiba Fluoromax-4 spectrofluorometer in a 120-µl quartz cell. The sample was thermostated at 22°C. The time constant of the fluorometer was set to 500 ms for the kinetic measurements. The excitation wavelength (λex) and emission wavelength (λem) were 297 and 337 nm, respectively. The excitation and emission bandwidths were set to 4 nm. Induction of hydrolysis of myr-srtArf1·GTP to myr-srtArf1·GDP was determined by following the change in tryptophan fluorescence, as described previously. ASAP1-PZA was titrated into the reaction containing ND-bound myr-srtArf1·GTP as a substrate. Data points are represented as the mean value ± S.D. from three separate experiments on the same sample.
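The mean ± S.D. summary of replicate measurements can be sketched in Python as follows. This sketch assumes the sample standard deviation (n − 1 denominator); the text does not state which form of S.D. was used, and the function names and input layout are hypothetical.

```python
from statistics import mean, stdev


def summarize_replicates(percent_hydrolyzed):
    """Return (mean, sample S.D.) for replicate percent-GTP-hydrolyzed values."""
    if len(percent_hydrolyzed) < 2:
        raise ValueError("at least two replicates are needed for an S.D.")
    # Assumption: sample standard deviation (n - 1), not population S.D.
    return mean(percent_hydrolyzed), stdev(percent_hydrolyzed)


def summarize_titration(by_concentration):
    """Map each ASAP1-PZA concentration to (mean, S.D.), ordered by concentration."""
    return {
        conc: summarize_replicates(values)
        for conc, values in sorted(by_concentration.items())
    }
```

Here `by_concentration` would be a dictionary keyed by ASAP1-PZA concentration, with the three replicate percentages at 3 min as each value; the resulting pairs give the points and error bars for the titration plot.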