Engineering of biomolecules by bacteriophage directed evolution.

Conventional in vivo directed evolution methods have primarily linked the biomolecule's activity to bacterial cell growth. Recent developments instead rely on the conditional growth of bacteriophages (phages), viruses that infect and replicate within bacteria. Here we review recent phage-based selection systems for in vivo directed evolution. These approaches have been applied to evolve a wide range of proteins including transcription factors, polymerases, proteases, DNA-binding proteins, and protein-protein interactions. Advances in this field expand the possible applications of protein and RNA engineering. This will ultimately result in new biomolecules with tailor-made properties, as well as giving us a better understanding of basic evolutionary processes.


Introduction
Protein engineering enables the development of valuable biomolecules for pharmaceutical and biotechnological purposes. There are generally two strategies to guide protein engineering: rational design or directed evolution (Figure 1). Rational design usually uses computational tools and structural considerations to identify beneficial mutations in the protein of interest [1]. Recent advances in this strategy even allow the design of proteins completely de novo [2-6]. In comparison, directed evolution mimics natural evolution: it starts with a population of one or more genotypes and then proceeds through iterative rounds of genotype diversification and selection based on the linked phenotype. It is applied when too little structural or biochemical information is available to guide engineering. In many cases, these two strategies can be combined in a semi-rational approach to improve the activity of biomolecules [7,8]. The method must therefore be chosen to fit the particular problem.
A variety of directed evolution techniques have been developed that employ customized gene circuits [9-12]. One commonly used approach is to link the target protein's activity to cell growth, which is particularly suitable when the evolving gene directly improves cellular fitness [13-15]. Alternatively, the use of phage particles offers a convenient way to uncouple the target protein's activity from the fitness function of a cell. Instead, an artificial genetic circuit couples the evolving protein's function to increasingly efficient production of phage packaging the gene of interest [16].
Directed evolution requires genotypic diversity in the gene of interest, which can be generated either in vivo or in vitro. In vivo mutagenesis relies on intracellular modification of the target gene [17-19], whereas in vitro mutagenesis can be achieved extracellularly by chemical modification [20], ultraviolet irradiation [21], or polymerase chain reaction (PCR) [22]. PCR-based methods generally employ an error-prone polymerase or oligonucleotides that contain randomized bases at the desired positions. Chemical mutagenesis and irradiation are less commonly used because they lack uniform mutational spectra [20,23]. By making randomized libraries or using a progressive series of mutations, it is possible to explore the 'design space' of a target gene, ultimately enabling the engineering of new proteins.
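As a rough illustration of how quickly this design space grows, the sketch below counts the DNA-level and protein-level variants obtained when a number of codons are fully randomized with degenerate oligonucleotides. The NNK scheme (N = A/C/G/T, K = G/T; 32 codons covering all 20 amino acids plus one stop codon) is a common choice but is an assumption here, since the text does not name a scheme. The function and variable names are hypothetical.

```python
# Illustrative arithmetic only: size of a saturation-mutagenesis library.
# The codon schemes and all names below are assumptions for this sketch.

CODONS_PER_POSITION = {
    "NNN": 64,  # all four bases at each of the three codon positions
    "NNK": 32,  # N = A/C/G/T, K = G/T
}
AMINO_ACIDS = 20


def library_size(n_positions, scheme="NNK"):
    """Return (DNA-level variants, protein-level variants) for fully
    randomized codons at n_positions sites."""
    if n_positions < 0:
        raise ValueError("n_positions must be non-negative")
    dna_variants = CODONS_PER_POSITION[scheme] ** n_positions
    protein_variants = AMINO_ACIDS ** n_positions
    return dna_variants, protein_variants


for n in (1, 3, 5, 8):
    dna, protein = library_size(n)
    print(f"{n} NNK positions: {dna:.2e} DNA variants, "
          f"{protein:.2e} protein variants")
```

Under these assumptions, even a handful of randomized positions yields more variants than can be tested one by one, which is one motivation for selection-based approaches.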
In this review, we first discuss the requirements for using phages to evolve biomolecules. We then focus on new directed evolution methods based on conditional phage replication that have been developed thanks to advances in molecular and synthetic biology.

Re-engineering phage-host genetic interactions to select functional biomolecules
In vivo evolution systems allow selection for more complex functions than in vitro methods (e.g. phage display), which are only suitable for binary protein-molecule interactions [24] (Figure 2a). By contrast, intracellular evolution potentially allows selection for multi-step processes, as long as they can be linked to genotype survival [25]. For example, intracellular processes can facilitate the simultaneous mutation and selection of the gene of interest. Furthermore, intracellular evolution enables the use of counterselections against an undesired biomolecule function [26]. Another advantage is the subsequent compatibility of evolved genes or complex gene networks with the entire host cell machinery, as these have to function in a host cell context. To exploit these advantages, alternative phage-assisted directed evolution platforms have been developed.
To allow enrichment of functional genes, phage selection systems require a link between the desired phenotype and conditional phage replication. This can be achieved by removing a gene that is essential for phage replication from the phage genome and linking its expression to the function of the evolving biomolecule. Alternatively, this gene (or genes) could encode a host co-factor that is required for phage replication but dispensable to the cell (to allow survival of the uninfected cells that are required as a host reservoir). However, the only approaches developed so far rely on moving essential genes from the phage to the host cell or its associated plasmids [16,27] (Figure 2b,c). These systems may be classified according to the degree of phage engineering involved, ranging from moving a single gene to moving practically all of them.
The evolving biomolecule has to be encoded in the phage, and a genetic system has to be designed so that a functional molecule activates the expression of the essential gene (positive selection). When the evolving biomolecule is able to induce the expression of the missing gene, infectious virions will package the DNA encoding the biomolecule, promoting its survival. The conditional expression of the essential gene can be implemented at the transcriptional or post-transcriptional level, depending on the biomolecule to be evolved (e.g. a transcription factor or a riboregulator).
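The enrichment logic of positive selection can be sketched with a deliberately simple toy model that is not taken from the cited work. It assumes that each phage-encoded variant yields progeny in proportion to a basal 'leak' plus its activity, and that the population is renormalized after every round of infection. All names and parameter values are hypothetical.

```python
# Toy model of positive selection by conditional phage replication.
# Assumption: progeny per infection is proportional to (leak + activity).


def select_rounds(fractions, activities, rounds, leak=0.01):
    """Return variant frequencies after the given number of rounds."""
    if len(fractions) != len(activities):
        raise ValueError("fractions and activities must have equal length")
    current = list(fractions)
    for _ in range(rounds):
        # Each variant is amplified according to its assumed progeny yield.
        progeny = [f * (leak + a) for f, a in zip(current, activities)]
        total = sum(progeny)
        current = [p / total for p in progeny]
    return current


# One active variant (activity 1.0) starting at 0.1% of an inactive library.
start = [0.999, 0.001]
activity = [0.0, 1.0]
for r in range(4):
    inactive, active = select_rounds(start, activity, r)
    print(f"round {r}: active fraction = {active:.4f}")
```

In this sketch the leak term stands in for basal expression of the essential gene; a larger leak slows enrichment because inactive variants are also propagated.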
Alternatively, selection may consist of a circuit that interferes with phage replication whenever the biomolecule displays a given function (negative selection). This is used to penalize any unwanted activity, such as the original parental function of the biomolecule. The selection can also be complex or variable, where the stringency of positive and negative selection can be modulated exogenously [26].
Many alternative phage-host systems can in principle be chosen for the evolution of biomolecules depending on the application. For instance, if one wanted to evolve a photosynthetic protein, one might choose a cyanobacterium and one of its known phages. The disadvantage of such approaches is that the phage biology is not well characterized. Consequently, in this article we will focus on Escherichia coli due to the lack of reported works with other organisms. The E. coli phages M13 [28], T4 [29,30], T7 [31] or λ [32] have been used to optimize protein function and stability with phage display, although M13 has been the only phage vector used to evolve biomolecules in vivo thus far.

Evolving biomolecules through positive selection
Recently, a new method to evolve biomolecules using M13 was developed that redesigns the host to implement a positive selection: Phage-Assisted Continuous Evolution (PACE), a general approach for the directed evolution of proteins in vivo [16]. Using PACE, new T7 RNA polymerase (RNAP) variants have been evolved that recognize a T3 promoter, which is not bound by the wild-type T7 RNAP. For this, the minor coat protein pIII is replaced by the evolving gene

Engineering of biomolecules by phage directed evolution. Brödel, Isalan and Jaramillo

Current Opinion in Biotechnology
Figure 1
Protein engineering by rational design or directed evolution. (a) Rational design uses computational tools as well as structural or other biochemical knowledge to identify beneficial mutations in the protein of interest. These mutations are inserted into the gene of interest (targeted mutagenesis) which is then expressed in host cells. Functional analysis for each protein variant is performed to confirm improved activity. (b) Directed evolution is applied when too little structural or biochemical information is available to guide engineering. Mutations in the gene of interest are inserted randomly or by targeting specific positions in the gene sequence leading to a library of gene variants. Functional library members are then selected via a suitable selection system (e.g. phage-assisted evolution) against a target function. The activity of the selected protein is finally confirmed by functional analysis. Rational design and directed evolution are often combined to obtain the best results (semi-rational approach).

Figure 2 [image not reproduced; surviving panel label: phage particle]

Evolving biomolecules through negative selection
In many cases, the requirements for evolved proteins not only include target activity but also the avoidance of potential off-target effects. This can be achieved by engineering a negative selection to remove variants with unwanted properties, which can be implemented by down-regulating a gene required for phage replication [27,33]. Alternatively, one may exploit any of the known mechanisms by which a bacterium can counteract a phage infection [34]. PACE has been adapted for negative selection pressures by choosing an abortive infection mechanism, where the undesired activity (activation of the original promoter) was linked to the inhibition of phage propagation using a non-functional pIII variant [26].
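How positive and negative selection might combine can be illustrated with a hypothetical scoring function; it is a sketch under stated assumptions and not the model used in the cited studies. Here the on-target activity drives propagation, and off-target activity discounts it through a tunable penalty weight. The functional form, the names and the numbers are all assumptions.

```python
# Toy score combining positive and negative selection.
# Assumption: propagation = on_target / (1 + neg_weight * off_target).


def relative_propagation(on_target, off_target, neg_weight=1.0):
    """Relative phage propagation for a variant with the given activities."""
    if on_target < 0 or off_target < 0 or neg_weight < 0:
        raise ValueError("activities and neg_weight must be non-negative")
    return on_target / (1.0 + neg_weight * off_target)


# Hypothetical variants: (on-target activity, off-target activity).
variants = {
    "specific": (1.0, 0.0),
    "promiscuous": (1.0, 1.0),
    "parental": (0.0, 1.0),
}
for name, (on, off) in variants.items():
    score = relative_propagation(on, off, neg_weight=10.0)
    print(f"{name}: relative propagation = {score:.3f}")
```

Raising `neg_weight` in this sketch corresponds loosely to increasing the stringency of the negative selection: promiscuous variants are penalized more strongly while fully specific variants are unaffected.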

Modulating selection stringency for new functions
An important challenge is the ability to maintain phage replication when there is a lack of initial function for the biomolecule to be evolved. In the original PACE approach, an intermediate selection system was used where the T7 RNAP was initially evolved to transcribe a hybrid T3-T7 promoter, which had some activity, before the selection was switched to the full-target T3 promoter [16]. This is difficult to achieve because it requires engineering a hybrid promoter that is still active with the original polymerase, so the strategy cannot easily be generalised to other cases. Fortunately, an alternative method was proposed that does not require re-engineering the target promoter [26] and instead relies on adding a second complementary copy of the gene used for selection (here gIII). This is similar to the hypothesis for the natural evolution of new functions de novo by gene duplication, where one gene duplicate maintains the original function, while the second copy is allowed to drift [35]. In the directed evolution case, the first gIII copy is under the control of a T3 promoter. The second copy is under the control of a T7 promoter, but the expression of this gIII is regulated ('stringency modulation') to ensure this additional copy will cease to complement the original as the evolution progresses and the T7 RNAP acquires activity for the T3 promoter. Thus, the selection pressure is gradually increased over time to select the new function.
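The two-copy scheme can be sketched as follows, as a toy model under stated assumptions rather than the published implementation. Total gIII supply is taken to be the sum of the T3-driven copy and the T7-driven copy, with the latter scaled down as a stringency parameter rises from 0 to 1. The additive form, the linear ramp and all names are hypothetical.

```python
# Toy model of stringency modulation with two gIII copies.
# Assumption: supply = T3-driven gIII + (1 - stringency) * T7-driven gIII.


def giii_supply(t3_activity, t7_activity, stringency):
    """Total gIII expression available to a phage encoding one RNAP variant."""
    if not 0.0 <= stringency <= 1.0:
        raise ValueError("stringency must lie between 0 and 1")
    return t3_activity + (1.0 - stringency) * t7_activity


def linear_ramp(n_steps):
    """Evenly spaced stringency values from 0 (permissive) to 1 (strict)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    return [i / (n_steps - 1) for i in range(n_steps)]


# Hypothetical variants: (activity on T3 promoter, activity on T7 promoter).
parental = (0.0, 1.0)
evolved = (0.8, 1.0)
for s in linear_ramp(5):
    print(f"stringency {s:.2f}: parental = {giii_supply(*parental, s):.2f}, "
          f"evolved = {giii_supply(*evolved, s):.2f}")
```

At low stringency both variants receive gIII and can propagate; at full stringency only the variant with T3 promoter activity does, which mirrors the gradual increase in selection pressure described above.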

Tackling complex evolution pressures
Since the initial development of PACE, the platform has been adapted for the directed evolution of many different classes of proteins. For example, protease-PACE links the proteolysis of a target peptide to phage replication using a protease-activated RNA polymerase [36]. The system was used in the presence of two hepatitis C virus (HCV) protease inhibitor drug candidates (danoprevir and asunaprevir) to evolve HCV protease variants that possess up to 30-fold drug resistance. Strikingly, the predominant mutations obtained in the HCV protease were consistent with the mutations observed in human patients treated with danoprevir or asunaprevir. Alternatively, DNA-binding PACE is a general method for the directed evolution of DNA-binding activity and specificity [25]. The platform was used to engineer transcription activator-like effector nucleases (TALENs) with improved DNA cleavage specificity [25]. In addition, protein-binding PACE enables the directed evolution of protein-protein interactions [37]. The authors evolved variants of the Bt toxin Cry1Ac against a cell receptor from the insect pest Trichoplusia ni with novel binding affinity that can ultimately overcome insect toxin resistance. PACE was also employed to continuously evolve T7 split RNA polymerases for downstream biosensor applications [38]. PACE has even been combined with high-throughput sequencing methods to improve downstream analysis, which allows the characterization of whole protein populations as they adapt to selection pressures over time [39].

Evolution using phagemids
Phagemids can provide an alternative to classic full-phage selection systems. They have specific advantages, such as large library sizes and avoiding the mutation of phage genes. Consequently, we developed a phagemid selection system [27,33] where only the phagemid (PM) containing a library member and one essential phage gene (gIII) is packaged, while all the other phage components (except gVI) are provided on a modified helper phage (HP). To complete the system, an accessory plasmid (AP) contains a conditional gene VI circuit (Figure 2c). After infection, a protein with desired activity upregulates gene VI expression and therefore increases phage production. In this way, a protein with desired activity can be selected after several rounds of reinfection. Notably, our recently described system [27,33] uses conditional production of the minor coat protein pVI instead of pIII used in PACE. This is particularly useful for the directed evolution of transcription factors against basally active promoters, as gIII expressed in the starter culture would otherwise cause infection resistance, resulting in a significantly decreased selection efficiency [40,41].

(Figure 2 legend, continued) way, a protein with desired characteristics can be evolved after dozens of rounds of reinfection. (c) Phagemid-based evolution from combinatorial libraries in batch mode. The library members are located on a packaged phagemid (PM) which also contains one essential phage gene (gIII). All the other phage genes are located on a modified helper phage (HP; contains all phage genes except genes III and VI) and an accessory plasmid (AP; contains a conditional gene VI expression circuit). After infection, a protein with desired activity upregulates gene VI expression and therefore increases phage production. In this way, a protein with desired activity can be selected after several rounds of reinfection.
Phagemid selection has been applied for the directed evolution of a set of orthogonal transcription factors based on λ cI against synthetic promoters [27]. Negative selection against wild-type (WT) activity via repression has been achieved by putting the WT DNA sequence between the -35 and -10 regions of each synthetic promoter. The resulting toolkit contains 12 transcription factors, operating as activators, repressors, dual activator-repressors or dual repressor-repressors for use in gene network engineering. Moreover, this evolution strategy functions in batch mode and therefore requires no special equipment for reactor assembly, although it does rely on daily researcher interventions during selection [33].

Conclusion and perspectives
Recently developed directed evolution methods based on conditional phage replication further emphasize the strengths of phage-assisted protein engineering. These systems are particularly useful as they bypass key limitations of the widely used phage display technology, such as the inability to simultaneously mutate and select for complex biological functions. When choosing the most suitable method, various aspects including desired protein activity, available structural information, selection pressure and required selection efficiency need to be considered. Intracellular phage-assisted systems can, in principle, be used for all types of proteins, as long as their activity can be linked to conditional phage production (Figure 3). Notably, this is easier to achieve for cytosolic proteins than it is for complex proteins (e.g. membrane proteins). Furthermore, general limitations of bacterial expression over mammalian expression such as protein solubility, posttranslational modifications and disulfide bond formation have to be taken into account when using any phage-assisted technology.

Figure 3
Directed evolution of different classes of proteins based on conditional M13 phage replication. (a) An evolving T7 RNA polymerase upregulates gene III expression in an activity-dependent manner [16]. (b) An evolving N-terminal T7 RNA polymerase fused to a leucine zipper ZA assembles with a C-terminal T7 RNA polymerase variant fused to leucine zipper ZB, leading to gene III expression in an activity-dependent manner [38]. (c) An evolving transcription activator (e.g. λ cI) upregulates gene VI expression downstream of a specific promoter (e.g. λ PRM) [27]. (d) DNA-binding PACE enables the evolution of transcription activator-like effector nucleases (TALENs) [25]. The evolving DNA-binding protein is linked to the ω subunit of bacterial RNA polymerase, and binding to a target DNA sequence upstream of a minimal lac promoter enables gene III expression in an activity-dependent manner. (e) Protease-PACE enables the evolution of proteases against desired cleavage sites [36]. The T7 RNA polymerase is inhibited when bound to T7 lysozyme, which blocks transcription initiation and the transition from initiation to elongation [44]. Proteolysis of the target cleavage site by an evolving protease activates the T7 RNA polymerase, leading to gene III expression in an activity-dependent manner. (f) Protein-binding PACE allows the evolution of protein-protein interactions [37]. The target protein is bound to the DNA upstream of the promoter PlacZ-opt via a fused DNA-binding domain (orange) and the RNA polymerase omega subunit (RpoZ; yellow) is fused to the evolving protein. The binding of the evolving protein to the target protein enables the transcription of gene III from the PlacZ-opt promoter. The evolving protein is highlighted in blue and the target sequence is depicted in red for each individual example.
Phages may also be used to evolve non-coding RNAs provided that their function can be linked to gene expression. This is particularly useful to complement computational designs of riboregulators [42], where a cognate regulatory sequence has to be added in the 5′ UTR of the gene used for selection (for instance gene VI in [27,33]). Protein or RNA-based sensors (activating gene expression in the presence of a target chemical inducer) may also be encoded in the phage, provided one designs cycles of selection composed of two steps. The first step consists of a positive selection where the sensor may activate the infectious virion packaging in the presence of the chemical inducer. The second step occurs in the absence of the chemical inducer, where only sensors that do not activate the negative selection gene would be able to produce infectious virions. Similarly, negative selections may also be used to evolve the targeted function in the case of a negative regulator of gene expression (e.g. repressor). A negative selection would here act as an inverter such that constitutive phage replication could be used for evolving a repressor.
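The two-step selection cycle for sensors described above can be sketched as a simple filter. This is an illustration under stated assumptions only: sensor outputs are reduced to two numbers (with and without inducer), and the pass thresholds are arbitrary. The class, function names and values are hypothetical.

```python
# Toy filter for a two-step (positive then negative) sensor selection cycle.
# Assumption: each sensor is summarized by its output with and without
# inducer, and survival in each step is decided by a fixed threshold.
from dataclasses import dataclass


@dataclass
class Sensor:
    name: str
    output_with_inducer: float
    output_without_inducer: float


def survives_cycle(sensor, on_threshold=0.5, off_threshold=0.1):
    """Return True if the sensor passes both selection steps."""
    # Step 1 (inducer present): the sensor must activate virion production.
    passes_positive = sensor.output_with_inducer >= on_threshold
    # Step 2 (inducer absent): the sensor must not trigger the negative
    # selection gene.
    passes_negative = sensor.output_without_inducer <= off_threshold
    return passes_positive and passes_negative


library = [
    Sensor("constitutive", 1.0, 1.0),
    Sensor("inactive", 0.0, 0.0),
    Sensor("inducible", 0.9, 0.05),
]
survivors = [s.name for s in library if survives_cycle(s)]
print("survivors:", survivors)
```

Only the variant that is active with inducer and silent without it passes both steps in this sketch, which is the behaviour the alternating cycle is meant to enrich.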
Advances in the fields of DNA sequencing, gene synthesis and genome engineering will likely reduce costs and improve the efficiency of current phage-assisted systems as well as drive the development of new technologies based on bacteriophages other than M13 [43]. These advances will also impact new mutagenesis strategies, in particular ones that enable targeted mutagenesis with improved mutation rates in vivo. The mutation of only the target gene(s) while not affecting any other genetic information is desirable in order to reduce the probability of selecting false positive variants in any directed evolution approach. As a consequence, phage-assisted evolution technologies will continue to play a key role in protein engineering efforts for basic as well as applied research.