SORTCERY: A High-Throughput Method to Affinity Rank Peptide Ligands

https://doi.org/10.1016/j.jmb.2014.09.025

Highlights

  • Relating sequence to binding function is a fundamental goal in proteomics.

  • SORTCERY determines binding semi-quantitatively for large peptide libraries.

  • The method combines cell sorting, deep sequencing and computational post-analysis.

  • We ranked 1000 peptide ligands of Bcl-xL with high accuracy using SORTCERY.

  • This approach has high potential to provide binding data for many protein families.

Abstract

Uncovering the relationships between peptide and protein sequences and binding properties is critical for successfully predicting, re-designing and inhibiting protein–protein interactions. Systematically collected data that link protein sequence to binding are valuable for elucidating determinants of protein interaction but are rare in the literature because such data are experimentally difficult to generate. Here we describe SORTCERY, a high-throughput method that we have used to rank hundreds of yeast-displayed peptides according to their affinities for a target interaction partner. The procedure involves fluorescence-activated cell sorting of a library, deep sequencing of sorted pools and downstream computational analysis. We have developed theoretical models and statistical tools that assist in planning these stages. We demonstrate SORTCERY's utility by ranking 1026 BH3 (Bcl-2 homology 3) peptides with respect to their affinities for the anti-apoptotic protein Bcl-xL. Our results are in striking agreement with measured affinities for 19 individual peptides with dissociation constants ranging from 0.1 to 60 nM. High-resolution ranking can be used to improve our understanding of sequence–function relationships and to support the development of computational models for predicting and designing novel interactions.

Introduction

Understanding the relationships between protein sequences and their functions is a fundamental objective of protein science. Our ability to map these relationships has improved with advances in technology. Until recently, the ability to decode information from experiments that characterize protein function was limited by the need to clone and/or individually sequence every gene of interest at relatively low throughput. Next-generation sequencing has changed this, and a number of important publications describe techniques that combine phenotypic screening and deep sequencing to investigate how protein sequence influences structure, folding, binding or organism growth/fitness [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Araya and Fowler have written a good review of recent advances [11]. Generally, the experimental approach involves constructing a library of many different mutant variants of a protein of interest. The library is then screened/selected for some property or function. The retained library pool is sequenced, and features of sequences that are observed with high frequency are implicated as important for the relevant property. In this introduction, we discuss applications of this approach to the problem of determining protein interactions with a target.

Interaction systems that have been subjected to a screening-plus-sequencing approach include PDZ domain peptide ligands [4], [5], PinWW domain peptide ligands [6], influenza hemagglutinin inhibitors [7], LYN kinase interaction partners [8], computationally designed digoxigenin binders [9] and Bcl-2-type receptor/BH3 (Bcl-2 homology 3) complexes [10]. Experiments varied in library size (~ 1000 to ~ 600,000 members) and in the type of screening used to detect binding (phage display, yeast display, ribosome display, bacterial two-hybrid). These studies are exciting milestones that dramatically expand the amount of data available to describe protein interactions. However, it is important to consider what information the data from various interaction screens contain and how it can be used. A standard approach has been to quantify the enrichment of each sequence or point mutation among library members classified as binders, relative to the unselected library, and to use this as a proxy for affinity. This may be problematic, as it relies on adequate deep sequencing of the starting library and bias-free amplification of sequences throughout screening and sample preparation. In fact, Derda et al. found that the relative abundance of phage-displayed peptides could be significantly skewed if phages were amplified after a selection step [12]. McLaughlin et al. have reported data that support an impressive correlation of enrichment scores with binding affinities [5], but the appropriateness and resolution of new methods for affinity determination are not well established.
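The enrichment proxy discussed above can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the function name, the pseudocount and the toy read counts are assumptions chosen for illustration only.

```python
import math

def log2_enrichment(selected_counts, input_counts, pseudocount=0.5):
    """Log2 ratio of each sequence's frequency in the selected pool
    to its frequency in the unselected (input) library.

    The pseudocount is an illustrative choice that avoids division by zero
    for sequences missing from one of the two pools.
    """
    seqs = set(selected_counts) | set(input_counts)
    n_sel = sum(selected_counts.values()) + pseudocount * len(seqs)
    n_in = sum(input_counts.values()) + pseudocount * len(seqs)
    scores = {}
    for s in seqs:
        f_sel = (selected_counts.get(s, 0) + pseudocount) / n_sel
        f_in = (input_counts.get(s, 0) + pseudocount) / n_in
        scores[s] = math.log2(f_sel / f_in)
    return scores

# Toy read counts (hypothetical peptide names, not real data).
input_counts = {"pepA": 100, "pepB": 100, "pepC": 100}
selected_counts = {"pepA": 400, "pepB": 100, "pepC": 10}

scores = log2_enrichment(selected_counts, input_counts)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the score divides by the input frequency, any undersampling of the starting library or amplification bias between the two pools propagates directly into the score, which is the concern raised in the paragraph above.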

Recently, Kinney et al. pioneered a detailed approach to the screen-and-sequence scheme and applied it to measure protein–DNA interactions [13]. Adopting the expression level of GFP as an indicator of transcription factor binding strength, they employed fluorescence-activated cell sorting (FACS) to sort a bacterial library of ~ 20,000 mutant lacZ promoters with different activities into pools and decoded these by deep sequencing. A maximum-likelihood computational routine transformed the sequencing data into a position-specific scoring matrix that described the DNA-binding affinity of the transcription factor. In a similar approach, Sharon et al. monitored the affinity of transcription factors for hundreds of mutant yeast promoters that were coupled to YFP and derived a ranking of transcription factor activities [14].

Sharon et al. and Kinney et al. employed multi-bin sorts that increased the resolution of their experiments (i.e., the ability to distinguish between two different dissociation constants or equivalent measures of affinity) and permitted the analysis of frequency distributions rather than enrichment values, which are more difficult to interpret. However, issues remain to be addressed. First, only the expression of fluorescent protein was monitored in the protein–DNA binding studies, without accounting for variations in transcription factor levels that impact reporter gene expression. Prior work supports the importance of a correction. Liang et al. developed a two-color FACS screen for RNA gene regulatory devices [15]. One fluorescence signal reported the device activity, and the other one was a measure of basic transcription levels. This setup dramatically increased the resolution of the sorting scheme in comparison to a one-color strategy. Similarly, Dutta et al. gauged the stability of protein mutants by fragment reconstitution and yeast display [16]. They observed the expression and display of a mutant fragment with one fluorescence signal and the binding of a complementary fragment with another signal. Their findings suggested a correlation between the stabilities of the protein mutants and the ratio of the two fluorescence signals. Chao et al. showed qualitatively that a mixture of two yeast-displayed antibodies with very similar affinities for a target can be enriched for the stronger binder by FACS when expression levels are taken into account. Second, Kinney et al. and Sharon et al. considered averages of their detailed experimental information during computational analyses [13], [14]. They calculated position-specific scoring matrices and mean expression values, respectively. Cooperative effects and signal variance may limit the accuracy of models derived with such assumptions.
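To make the idea of per-clone frequency distributions from a multi-bin sort concrete, here is a minimal sketch. It is not the procedure of any cited study: the function names, the scaling of read shares by the number of cells sorted into each bin, and the toy counts are all assumptions for illustration.

```python
def bin_distributions(reads_per_bin, cells_per_bin):
    """Estimate, for each clone, the fraction of its cells that fell in each bin.

    reads_per_bin[b][clone] is the read count of a clone in bin b;
    cells_per_bin[b] is the number of cells sorted into bin b.
    Read shares are scaled by cells per bin so that bins sequenced to
    different depths stay comparable (an illustrative assumption).
    """
    clones = set()
    for counts in reads_per_bin:
        clones.update(counts)
    estimated = {clone: [] for clone in clones}
    for counts, n_cells in zip(reads_per_bin, cells_per_bin):
        total_reads = sum(counts.values())
        for clone in clones:
            share = counts.get(clone, 0) / total_reads if total_reads else 0.0
            estimated[clone].append(share * n_cells)
    distributions = {}
    for clone, cells in estimated.items():
        total = sum(cells)
        distributions[clone] = [c / total for c in cells] if total else [0.0] * len(cells)
    return distributions

def mean_bin_index(distribution):
    """Crude one-number summary of a distribution over bins."""
    return sum(i * p for i, p in enumerate(distribution))

# Toy two-bin sort with hypothetical clones (not real data).
reads_per_bin = [
    {"pepA": 90, "pepB": 10},
    {"pepA": 10, "pepB": 90},
]
cells_per_bin = [1000, 1000]

dists = bin_distributions(reads_per_bin, cells_per_bin)
order = sorted(dists, key=lambda c: mean_bin_index(dists[c]))
print(order)
```

The mean bin index is used here only because it is easy to read. As the paragraph above notes, collapsing a distribution to an average discards information such as signal variance.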

High-throughput characterization of protein interactions will be most useful if it can deliver accurate estimates of affinity or affinity rankings. For example, such estimates could enable the construction of more accurate predictive models or could guide the refinement of protein designs [7]. We present a protocol that uses a rigorous sorting strategy in combination with downstream computational processing that returns a precise affinity ranking of individual sequences. Taking advantage of yeast-surface display, in which a signal resulting from a peptide binding to a protein can be normalized by the expression level of that peptide, we developed a theoretical framework to derive the expected signals for binders of different affinities. Experimental sorting using FACS, plus library sequencing, yielded coarse-grained signal distributions for ~ 1000 peptide-displaying clones in a single experiment. Computational processing generated a global ranking of peptide affinities, and our theoretical model allowed a detailed statistical analysis of sources of error in the final results. Because existing methods are already capable of discerning strong from weak and non-binders, we have focused on discriminating tight binders within a 500-fold range of affinities (0.1–60 nM). Accurate data in this regime may aid in the design of very strong binders that can be important therapeutic and diagnostic agents [17], [18], [19]. We conducted our study using a small library of about 1000 yeast-displayed BH3 peptides that bind to Bcl-xL, a key regulator of apoptosis. High-affinity binders of Bcl-xL are of great interest due to their potential for diagnosing or surmounting apoptotic blockades in numerous cancers [20], [21], [22].
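The value of normalizing a binding signal by an expression signal can be shown with a minimal sketch. It assumes simple 1:1 equilibrium binding with the target in excess and a binding signal proportional to expression times fraction bound. These assumptions, the function names and the numbers are illustrative and are not the authors' theoretical model.

```python
def fraction_bound(kd_nM, target_nM):
    """Fraction of displayed peptide occupied at equilibrium (1:1 binding,
    target in excess; a textbook simplification)."""
    return target_nM / (target_nM + kd_nM)

def binding_signal(expression, kd_nM, target_nM):
    """Binding fluorescence assumed proportional to expression x occupancy."""
    return expression * fraction_bound(kd_nM, target_nM)

target_nM = 1.0  # hypothetical target concentration

# A tight binder that is poorly displayed vs. a weaker binder that is well displayed.
tight = {"kd_nM": 0.1, "expression": 100.0}
weak = {"kd_nM": 1.0, "expression": 1000.0}

raw_tight = binding_signal(tight["expression"], tight["kd_nM"], target_nM)
raw_weak = binding_signal(weak["expression"], weak["kd_nM"], target_nM)

# Dividing by the expression signal removes the dependence on display level.
ratio_tight = raw_tight / tight["expression"]
ratio_weak = raw_weak / weak["expression"]

print(raw_tight < raw_weak)      # raw binding signal misorders the two clones
print(ratio_tight > ratio_weak)  # normalized signal recovers the affinity order
```

Under these assumptions the normalized signal depends only on the dissociation constant and the target concentration, so it decreases monotonically as the dissociation constant increases.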

Results

Our high-throughput method called SORTCERY analyzes the binding of yeast-displayed peptide ligands to a target molecule and returns a ranking for the affinities of all considered ligands. The multi-step procedure involves sorting a yeast-displayed library into several bins, deep sequencing all bins, and analyzing the resulting data (see Fig. 1). Our optimized sorting strategy is based on a theoretical model relating two fluorescence signals to the peptide-target dissociation constant. The model

Discussion

Biophysical characterization of the binding of proteins and their mutational variants is typically conducted using low-throughput one-at-a-time analyses. Higher-throughput studies can provide qualitative information, and methods for obtaining higher resolution are being explored [11], [13], [14]. In this study, we ranked 1026 BH3 sequences based on affinity for Bcl-xL over a dynamic range of dissociation constants from ~ 0.1 nM to ~ 60 nM in a single experiment. We gauged the effect of combinations

Yeast display setup

The yeast-surface display experiment was similar to that described by Dutta et al. and used many of the same reagents [23]. Briefly, we displayed BH3 peptides fused to the C-terminus of the Aga2 yeast cell-surface protein. The construct included HA and FLAG tags N- and C-terminal to the BH3 peptide, respectively. All BH3 peptides were variants of either the Bim or the Puma human BH3 sequences. The Bim wild-type sequence consisted of the 31 residues RPEIWIAQELRRIGDEFNAYYARRVFLNNYQ and the Puma

Acknowledgements

The authors thank Christos Kougentakis for help with the experiments. The authors express their gratitude to the Swanson Biotechnology Center Flow Cytometry Facility and the Massachusetts Institute of Technology BioMicro Center for technical support. This study was funded by National Institutes of Health award GM096466 to A.K. and German Merit Foundation grant no. RE 3111/1-1 to L.R.

References (38)

  • B.J. DeKosky et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat Biotechnol (2013)

  • A. Ernst et al. Coevolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing. Mol Biosyst (2010)

  • R.N. McLaughlin et al. The spatial architecture of protein function and adaptation. Nature (2012)

  • D.M. Fowler et al. High-resolution mapping of protein sequence–function relationships. Nat Methods (2010)

  • T.A. Whitehead et al. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat Biotechnol (2012)

  • J. Zhu et al. Protein interaction discovery using parallel analysis of translated ORFs (PLATO). Nat Biotechnol (2013)

  • C.E. Tinberg et al. Computational design of ligand-binding proteins with high affinity and selectivity. Nature (2013)

  • R. Derda et al. Diversity of phage-displayed libraries of peptides during panning and amplification. Molecules (2011)

  • J.B. Kinney et al. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci USA (2010)

Present address: S. Dutta, Janssen Research and Development, Welsh and McKean Roads, Spring House, PA 19477, USA.
