SORTCERY—A High-Throughput Method to Affinity Rank Peptide Ligands

doi:10.1016/j.jmb.2014.09.025

Journal of Molecular Biology

Volume 427, Issue 11, 5 June 2015, Pages 2135-2150

https://doi.org/10.1016/j.jmb.2014.09.025

Highlights

• Relating sequence to binding function is a fundamental goal in proteomics.
• SORTCERY determines binding semi-quantitatively for large peptide libraries.
• The method combines cell sorting, deep sequencing and computational post-analysis.
• We ranked 1000 peptide ligands of Bcl-x_L with high accuracy using SORTCERY.
• This approach has high potential to provide binding data for many protein families.

Abstract

Uncovering the relationships between peptide and protein sequences and binding properties is critical for successfully predicting, re-designing and inhibiting protein–protein interactions. Systematically collected data that link protein sequence to binding are valuable for elucidating determinants of protein interaction but are rare in the literature because such data are experimentally difficult to generate. Here we describe SORTCERY, a high-throughput method that we have used to rank hundreds of yeast-displayed peptides according to their affinities for a target interaction partner. The procedure involves fluorescence-activated cell sorting of a library, deep sequencing of sorted pools and downstream computational analysis. We have developed theoretical models and statistical tools that assist in planning these stages. We demonstrate SORTCERY's utility by ranking 1026 BH3 (Bcl-2 homology 3) peptides with respect to their affinities for the anti-apoptotic protein Bcl-x_L. Our results are in striking agreement with measured affinities for 19 individual peptides with dissociation constants ranging from 0.1 to 60 nM. High-resolution ranking can be used to improve our understanding of sequence–function relationships and to support the development of computational models for predicting and designing novel interactions.

Introduction

Understanding the relationships between protein sequences and their functions is a fundamental objective of protein science. Our ability to map these relationships has improved with advances in technology. Until recently, the ability to decode information from experiments that characterize protein function was limited by the need to clone and/or individually sequence every gene of interest at relatively low throughput. Next-generation sequencing has changed this, and a number of important publications describe techniques that combine phenotypic screening and deep sequencing to investigate how protein sequence influences structure, folding, binding or organism growth/fitness [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Araya and Fowler have written a good review of recent advances [11]. Generally, the experimental approach involves constructing a library of many different mutant variants of a protein of interest. The library is then screened/selected for some property or function. The retained library pool is sequenced, and features of sequences that are observed with high frequency are implicated as important for the relevant property. In this introduction, we discuss applications of this approach to the problem of determining protein interactions with a target.

Interaction systems that have been subjected to a screening-plus-sequencing approach include PDZ domain peptide ligands [4], [5], PinWW domain peptide ligands [6], influenza hemagglutinin inhibitors [7], LYN kinase interaction partners [8], computationally designed digoxigenin binders [9] and Bcl-2-type receptor/BH3 (Bcl-2 homology 3) complexes [10]. Experiments varied in library size (~1000 to ~600,000 members) and in the type of screening used to detect binding (phage display, yeast display, ribosome display, bacterial two-hybrid). These studies are exciting milestones that dramatically expand the amount of data available to describe protein interactions. However, it is important to consider what information the data from various interaction screens contain and how they can be used. A standard approach has been to quantify the enrichment of each sequence or point mutation among library members classified as binders, relative to the unselected library, and to use this as a proxy for affinity. This may be problematic, as it relies on adequate deep sequencing of the starting library and bias-free amplification of sequences throughout screening and sample preparation. In fact, Derda et al. found that the relative abundance of phage-displayed peptides could be significantly skewed if phages were amplified after a selection step [12]. McLaughlin et al. have reported data that support an impressive correlation of enrichment scores with binding affinities [5], but the appropriateness and resolution of new methods for affinity determination are not well established.
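To make the enrichment proxy concrete, the following minimal sketch computes a log2 enrichment score per sequence from read lists of an unselected and a selected pool. It is an illustration of the general idea only, not the scoring used in any of the cited studies; the function name, the pseudocount and the toy reads are assumptions introduced here.

```python
import math
from collections import Counter

def enrichment_scores(unselected_reads, selected_reads, pseudocount=0.5):
    """Return log2(frequency in selected pool / frequency in unselected pool).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for sequences that are missing from one of the two pools.
    """
    pre = Counter(unselected_reads)
    post = Counter(selected_reads)
    sequences = set(pre) | set(post)
    n = len(sequences)
    pre_total = sum(pre.values()) + pseudocount * n
    post_total = sum(post.values()) + pseudocount * n
    scores = {}
    for seq in sequences:
        f_pre = (pre[seq] + pseudocount) / pre_total
        f_post = (post[seq] + pseudocount) / post_total
        scores[seq] = math.log2(f_post / f_pre)
    return scores

# Toy data: two hypothetical sequences, equally abundant before selection.
unselected = ["AAA"] * 50 + ["BBB"] * 50
selected = ["AAA"] * 90 + ["BBB"] * 10
scores = enrichment_scores(unselected, selected)
for seq in sorted(scores, key=scores.get, reverse=True):
    print(seq, round(scores[seq], 2))
```

Because the score is a ratio of two measured frequencies, any undersampling of the starting library or amplification bias in either pool propagates directly into the score, which is the concern raised above.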

Recently, Kinney et al. pioneered a detailed approach to the screen-and-sequence scheme and applied it to measure protein–DNA interactions [13]. Adopting the expression level of GFP as an indicator of transcription factor binding strength, they employed fluorescence-activated cell sorting (FACS) to sort a bacterial library of ~ 20,000 mutant lacZ promoters with different activities into pools and decoded these by deep sequencing. A maximum-likelihood computational routine transformed the sequencing data into a position-specific scoring matrix that described the DNA-binding affinity of the transcription factor. In a similar approach, Sharon et al. monitored the affinity of transcription factors for hundreds of mutant yeast promoters that were coupled to YFP and derived a ranking of transcription factor activities [14].

Sharon et al. and Kinney et al. employed multi-bin sorts that increased the resolution of their experiments (i.e., the ability to distinguish between two different dissociation constants or equivalent measures of affinity) and permitted the analysis of frequency distributions rather than enrichment values, which are more difficult to interpret. However, issues remain to be addressed. First, only the expression of fluorescent protein was monitored in the protein–DNA binding studies, without accounting for variations in transcription factor levels that impact reporter gene expression. Prior work supports the importance of a correction. Liang et al. developed a two-color FACS screen for RNA gene regulatory devices [15]. One fluorescence signal reported the device activity, and the other one was a measure of basic transcription levels. This setup dramatically increased the resolution of the sorting scheme in comparison to a one-color strategy. Similarly, Dutta et al. gauged the stability of protein mutants by fragment reconstitution and yeast display [16]. They observed the expression and display of a mutant fragment with one fluorescence signal and the binding of a complementary fragment with another signal. Their findings suggested a correlation between the stabilities of the protein mutants and the ratio of the two fluorescence signals. Chao et al. showed qualitatively that a mixture of two yeast-displayed antibodies with very similar affinities for a target can be enriched for the stronger binder by FACS when expression levels are taken into account. Second, Kinney et al. and Sharon et al. considered averages of their detailed experimental information during computational analyses [13], [14]. They calculated position-specific scoring matrices and mean expression values, respectively. Cooperative effects and signal variance may limit the accuracy of models derived with such assumptions.
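The benefit of a second, expression-reporting signal can be illustrated with a toy calculation. The sketch below assumes a textbook single-site binding equilibrium and a binding signal that scales linearly with the number of displayed molecules; both are simplifying assumptions made here, and the function names, concentrations and copy numbers are hypothetical rather than taken from the cited studies.

```python
def fraction_bound(kd_nM, target_nM):
    """Equilibrium fraction of displayed molecules occupied by target
    (single-site binding, target in excess; an assumption of this sketch)."""
    return target_nM / (target_nM + kd_nM)

def signals(kd_nM, copies_displayed, target_nM=1.0):
    """Return idealized (expression signal, binding signal) for one cell."""
    expression = float(copies_displayed)
    binding = expression * fraction_bound(kd_nM, target_nM)
    return expression, binding

# Three hypothetical cells: two display the same tight binder at different
# levels, the third displays a weaker binder at a high level.
cells = {
    "tight, low display": signals(1.0, 1_000),
    "tight, high display": signals(1.0, 10_000),
    "weak, high display": signals(20.0, 10_000),
}

# Two-color readout: binding signal divided by expression signal.
ratios = {name: binding / expression
          for name, (expression, binding) in cells.items()}

for name, (expression, binding) in cells.items():
    print(f"{name}: binding={binding:.0f}, ratio={ratios[name]:.3f}")
```

In this toy case the raw binding signal of the weak, highly displayed clone is close to that of the tight, poorly displayed clone, whereas the ratio separates the two binders and is identical for both cells carrying the tight binder.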

High-throughput characterization of protein interactions will be most useful if it can deliver accurate estimates of affinity or affinity rankings. For example, such estimates could enable the construction of more accurate predictive models or could guide the refinement of protein designs [7]. We present a protocol that uses a rigorous sorting strategy in combination with downstream computational processing that returns a precise affinity ranking of individual sequences. Taking advantage of yeast-surface display, in which a signal resulting from a peptide binding to a protein can be normalized by the expression level of that peptide, we developed a theoretical framework to derive the expected signals for binders of different affinities. Experimental sorting using FACS, plus library sequencing, yielded coarse-grained signal distributions for ~ 1000 peptide-displaying clones in a single experiment. Computational processing generated a global ranking of peptide affinities, and our theoretical model allowed a detailed statistical analysis of sources of error in the final results. Because existing methods are already capable of discerning strong from weak and non-binders, we have focused on discriminating tight binders within a 500-fold range of affinities (0.1–60 nM). Accurate data in this regime may aid in the design of very strong binders that can be important therapeutic and diagnostic agents [17], [18], [19]. We conducted our study using a small library of about 1000 yeast-displayed BH3 peptides that bind to Bcl-x_L, a key regulator of apoptosis. High-affinity binders of Bcl-x_L are of great interest due to their potential for diagnosing or surmounting apoptotic blockades in numerous cancers [20], [21], [22].
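As a rough illustration of how coarse-grained signal distributions from a multi-bin sort could be turned into a ranking, the sketch below converts per-bin read counts into estimated cell numbers and orders clones by their mean bin position. This is a simplified stand-in and should not be read as the analysis used in this study, whose details are not given in this excerpt. The function name, the weighting of reads by the number of cells collected per bin, the convention that a higher bin index means a stronger expression-normalized binding signal, and all numbers are assumptions introduced here.

```python
def rank_clones(read_counts, cells_per_bin):
    """Rank clones from strongest to weakest apparent binder.

    read_counts:   {clone: [reads in bin 0, reads in bin 1, ...]}
    cells_per_bin: number of cells collected in each bin during sorting
    Bins are assumed to be ordered from weakest (0) to strongest signal.
    """
    n_bins = len(cells_per_bin)
    # Total reads obtained from each bin, used to convert reads to cells.
    reads_per_bin = [sum(counts[b] for counts in read_counts.values())
                     for b in range(n_bins)]
    mean_bin = {}
    for clone, counts in read_counts.items():
        # Estimated number of cells of this clone in each bin.
        cells = [counts[b] / reads_per_bin[b] * cells_per_bin[b]
                 if reads_per_bin[b] else 0.0
                 for b in range(n_bins)]
        total = sum(cells)
        if total == 0:
            continue  # clone not observed in any bin
        mean_bin[clone] = sum(b * c for b, c in enumerate(cells)) / total
    return sorted(mean_bin, key=mean_bin.get, reverse=True)

# Toy example with three hypothetical peptides sorted into four bins.
read_counts = {
    "pepA": [0, 5, 40, 55],
    "pepB": [10, 60, 30, 0],
    "pepC": [70, 25, 5, 0],
}
cells_per_bin = [1000, 1000, 1000, 1000]
ranking = rank_clones(read_counts, cells_per_bin)
print(ranking)
```

A mean bin position discards the shape of each clone's distribution; it is used here only to keep the example short.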

Section snippets

Results

Our high-throughput method called SORTCERY analyzes the binding of yeast-displayed peptide ligands to a target molecule and returns a ranking for the affinities of all considered ligands. The multi-step procedure involves sorting a yeast-displayed library into several bins, deep sequencing all bins, and analyzing the resulting data (see Fig. 1). Our optimized sorting strategy is based on a theoretical model relating two fluorescence signals to the peptide-target dissociation constant. The model

Discussion

Biophysical characterization of the binding of proteins and their mutational variants is typically conducted using low-throughput one-at-a-time analyses. Higher-throughput studies can provide qualitative information, and methods for obtaining higher resolution are being explored [11], [13], [14]. In this study, we ranked 1026 BH3 sequences based on affinity for Bcl-x_L over a dynamic range of dissociation constants from ~ 0.1 nM to ~ 60 nM in a single experiment. We gauged the effect of combinations

Yeast display setup

The yeast-surface display experiment was similar to that described by Dutta et al. and used many of the same reagents [23]. Briefly, we displayed BH3 peptides fused to the C-terminus of the Aga2 yeast cell-surface protein. The construct included HA and FLAG tags N- and C-terminal to the BH3 peptide, respectively. All BH3 peptides were variants of either the Bim or the Puma human BH3 sequences. The Bim wild-type sequence consisted of the 31 residues RPEIWIAQELRRIGDEFNAYYARRVFLNNYQ and the Puma

Acknowledgements

The authors thank Christos Kougentakis for help with the experiments. The authors express their gratitude to the Swanson Biotechnology Center Flow Cytometry Facility and the Massachusetts Institute of Technology BioMicro Center for technical support. This study was funded by National Institutes of Health award GM096466 to A.K. and German Merit Foundation grant no. RE 3111/1-1 to L.R.

References (38)

K.A. Reynolds et al. Hot spots for allosteric regulation on protein surfaces. Cell (2011)
J. DeBartolo et al. Predictive Bcl-2 family binding models rooted in experiment or structure. J Mol Biol (2012)
C.L. Araya et al. Deep mutational scanning: assessing protein function on a massive scale. Trends Biotechnol (2011)
S. Dutta et al. High-throughput analysis of the protein sequence stability landscape using a quantitative yeast surface two-hybrid system and fragment reconstitution. J Mol Biol (2008)
L. Zhu et al. High-affinity peptide against MT1-MMP for in vivo tumor imaging. J Control Release (2011)
H. Zhang et al. Characterization of high-affinity peptides and their feasibility for use in nanotherapeutics targeting leukemia stem cells. Nanomed Nanotechnol (2012)
S. Dutta et al. Determinants of BH3 binding specificity for Mcl-1 versus Bcl-xL. J Mol Biol (2010)
G. Pal et al. Comprehensive and quantitative mapping of energy landscapes for protein–protein interactions by rapid combinatorial scanning. J Biol Chem (2006)
G. Grigoryan et al. Structure-based prediction of bZIP partnering specificity. J Mol Biol (2006)
R.T. Hietpas et al. Experimental illumination of a fitness landscape. Proc Natl Acad Sci USA (2011)
B.J. DeKosky et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat Biotechnol (2013)
A. Ernst et al. Coevolution of PDZ domain-ligand interactions analyzed by high-throughput phage display and deep sequencing. Mol Biosyst (2010)
R.N. McLaughlin et al. The spatial architecture of protein function and adaptation. Nature (2012)
D.M. Fowler et al. High-resolution mapping of protein sequence–function relationships. Nat Methods (2010)
T.A. Whitehead et al. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat Biotechnol (2012)
J. Zhu et al. Protein interaction discovery using parallel analysis of translated ORFs (PLATO). Nat Biotechnol (2013)
C.E. Tinberg et al. Computational design of ligand-binding proteins with high affinity and selectivity. Nature (2013)
R. Derda et al. Diversity of phage-displayed libraries of peptides during panning and amplification. Molecules (2011)
J.B. Kinney et al. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci USA (2010)

^†: Present address: S. Dutta, Janssen Research and Development, Welsh and McKean Roads, Spring House, PA 19477, USA.
