De novo active sites for resurrected Precambrian enzymes

Protein engineering studies often suggest the emergence of completely new enzyme functionalities to be highly improbable. However, enzymes likely catalysed many different reactions already in the last universal common ancestor. Mechanisms for the emergence of completely new active sites must therefore either plausibly exist or at least have existed at the primordial protein stage. Here, we use resurrected Precambrian proteins as scaffolds for protein engineering and demonstrate that a new active site can be generated through a single hydrophobic-to-ionizable amino acid replacement that generates a partially buried group with perturbed physico-chemical properties. We provide experimental and computational evidence that conformational flexibility can assist the emergence and subsequent evolution of new active sites by improving substrate and transition-state binding, through the sampling of many potentially productive conformations. Our results suggest a mechanism for the emergence of primordial enzymes and highlight the potential of ancestral reconstruction as a tool for protein engineering.


Supplementary Figure 7. Esterase Activity of the GNCA4-W229D/F290W β-Lactamase
Shown here is a plot of the rate (v/[E]0) vs. substrate (p-nitrophenyl acetate) concentration for the esterase activity of the W229D/F290W variant of the ancestral GNCA4 scaffold. Experiments were carried out according to the following protocol: for each substrate concentration, hydrolysis was followed spectrophotometrically for a time sufficient to allow an accurate determination of the rate. Subsequently, 1 mM benzylpenicillin was added and the rate was measured again. This antibiotic concentration is much larger than the KM value and, therefore, the natural active site is expected to be saturated by the antibiotic despite its hydrolysis. The rates in the absence and presence of benzylpenicillin are shown with closed and open symbols, respectively. The results shown here are in agreement with those obtained by blocking the natural active site by mutation and by inactivation with clavulanic acid (see Figure 7 in the main text).
Stereo view of a relevant portion of the GNCA4 W229D/F290W model with the |2Fo-Fc| electron density map at 1.4 Å resolution contoured at 1.0 σ (grey mesh). Some relevant amino acids are labelled.
Figure 16. GNCA4 W229D/F290W electron density map with TSA. Stereo view of a relevant portion of the GNCA4 W229D/F290W 3D model with the |2Fo-Fc| electron density map at 1.77 Å resolution contoured at 1.0 σ (grey mesh). Some relevant amino acids and the bound transition-state analogue (6NT) are labelled.
Figure 17. GNCA MP electron density map. Stereo view of a relevant portion of the GNCA MP 3D model with the |2Fo-Fc| electron density map at 1.5 Å resolution contoured at 1.0 σ (grey mesh). Some relevant amino acids are labelled.
Figure 18. GNCA MP W229D electron density map. Stereo view of a relevant portion of the GNCA MP W229D 3D model with the |2Fo-Fc| electron density map at 1.3 Å resolution contoured at 1.0 σ (grey mesh). Some relevant amino acids are labelled.

β-Lactamases
Shown here is a superposition of the 3D structures for modern and ancestral β-lactamases (see Supplementary Note for details). A) Superposition of the 3D structures for modern TEM-1 β-lactamase (green), GNCA4 β-lactamase (blue) and our best Kemp eliminase, GNCA4-W229D/F290W β-lactamase (orange) with the transition-state analog 5(6)-nitrobenzotriazole (red) bound at the de novo active site. B) Superposition of the 3D structures for modern Bacillus licheniformis β-lactamase (green), GNCA4 β-lactamase (blue) and our best Kemp eliminase, GNCA4-W229D/F290W β-lactamase (orange) with the transition-state analog 5(6)-nitrobenzotriazole (red) bound at the de novo active site. In both panels, only the local environment of position 229 is shown.
a Shown here is the percentage of sequence identity between the most probabilistic sequences at Precambrian nodes in the evolution of β-lactamases. See Figure 1 in the main text for node definitions.
a Shown here is the number of amino acid differences between different reconstructed sequences at the GNCA node in the evolution of β-lactamases. See Figure 1 for node definitions. Additional information on these proteins can be found in Supplementary Table 3.

Supplementary corresponding to the GNCA node (GNCA1 to GNCA7) and the most probabilistic sequence at that node (GNCA MP β-lactamase). The sequence of the GNCA MP β-lactamase is given in Risso et al. (2013) 5 and has 122 mutational differences with that of the modern TEM-1 β-lactamase. The alternative representations of the sequence at the GNCA node were derived from Monte Carlo sampling of the posterior probability distribution, as described by Risso et al. (2013) 5. The distances between each mutated residue and D229 in the structure of the GNCA MP-W229D lactamase are given.

Supplementary a Percentage of sequence identity between the sequences of the 10 modern β-lactamases studied in this work. TEM-1 and BL stand for TEM-1 and B. licheniformis β-lactamases. All other modern lactamases (see Figure 1 of the main text) are identified by their PDB codes.

That is, in the modern background, the loop shifts away from position 229 and, therefore, it is unlikely to cause steric interference. Still, we deemed it convenient to mutationally probe the loop position that is closest to W229: position 254, at which the ancestral residue is proline and the modern residue is aspartate. However, the D254P mutation did not confer significant de novo Kemp eliminase activity on the TEM1-W229D/M182T lactamase.
More remarkably, we noted that, although the residue at position 290 is tryptophan in both the modern TEM-1 lactamase and the ancestral/engineered GNCA2-W229D/F290W, its spatial orientation is quite different. This is likely due to a cation-π interaction with R259 in the modern protein. Note that such a cation-π interaction is not possible in the ancestral background because the residue at the corresponding position 259 in the GNCA lactamase is an isoleucine. We speculated, therefore, that the orientation of the tryptophan at position 290 could hamper substrate and transition state binding to the cavity created by the W229D mutation in the modern background, thus precluding the generation of Kemp eliminase activity. To mutationally probe this possibility, we carried out the following studies:
1) We performed an R259I mutation in the TEM1-W229D/M182T background. This mutation should eliminate the R-W cation-π interaction in the modern background, thus allowing the re-orientation of the W290 residue and perhaps leading to the generation of Kemp eliminase activity. However, we found that the R259I mutation did not generate any significant de novo activity in the TEM1-W229D/M182T lactamase.
2) We mutated W290 in TEM1-W229D/M182T to the ancestral F residue in that position, but observed no significant Kemp eliminase activity.
3) Finally, we mutated W290 in TEM1-W229D/M182T to alanine and glycine in order to remove any steric hindrance to catalysis linked to residue W290 in the modern scaffold. However, despite being stable (with a denaturation temperature of about 55 °C from DSC), both the W290G and W290A variants of TEM1-W229D/M182T lactamase were as inactive for Kemp elimination as the parent TEM1-W229D/M182T background.
Experiments performed to explore whether some static structural features of the modern Bacillus licheniformis β-lactamase can explain its inefficiency as a scaffold for the de novo generation of Kemp eliminase activity on the basis of a single W229D mutation.
The experiments performed targeted the role of the residue at position 291. Structural superposition (panel B of Supplementary Fig. 20) indicates that residue 291 in the Bacillus licheniformis β-lactamase corresponds to residue 290 in the TEM-1 and GNCA β-lactamases. N291 was mutated to A to test the possibility that steric interference prevents catalysis. Mutations to F and W were performed because these are the residues present at the corresponding position in active variants of GNCA lactamases. Mutation to L was performed because this residue is present at the corresponding position in the active variant of the FCA lactamase. Although a small activity increase was observed upon these mutations, all the activity levels were barely distinguishable from the blanks. A small steric interference between the transition state and the residue at position 287 (a methionine) is also suggested by the superposition in panel B of Supplementary Fig. 20. However, mutation to alanine (to eliminate the potential interference) does not lead to a significant level of Kemp elimination activity. On the other hand, we did find that a back-to-the-ancestor mutation (M287V, where valine is the residue present at position 287 in FCA lactamase) does lead to a significant activity level.

NMR sequential assignment.
All NMR experiments were performed at 31.5 °C on a Bruker AV 800 spectrometer equipped with a cryoprobe on a 0.6 mM uniformly 13C,15N-labeled sample. Sequence-specific assignments were made using standard procedures with the following experiments: 2D 1H-15N HSQC and 3D HNCO, HN(CA)CO, HN(CO)CA, HNCAi, CBCA(CO)NH and HNCACB 7. Data obtained with these experiments were complemented with those of specific amino acid type discrimination 8,9. NMRPipe 10 and Sparky 11 were used to process raw NMR data and for interactive spectrum analysis, respectively. Chemical shifts were referenced to the water signal as an internal reference for 1H using pH and temperature corrections 12,13. 15N and 13C chemical shifts were referenced indirectly 14.

15N Relaxation Measurements and Analysis.
15N relaxation parameters T1, T1ρ, T2 and {1H}-15N NOE 15 were acquired on a Bruker AV 600 spectrometer equipped with a cryoprobe, at 31.5 °C on a 0.6 mM, buffered pH 6.7, uniformly 15N-labeled sample. Twelve delays (20, 60, 100, 160, 240, 460, 640, 860, 1260, 1600, 2200 and 2750 ms) were used for T1 measurements, ten delays (8, 16, 36, 56, 76, 100, 128, 156, 180 and 200 ms) were used for T1ρ determinations, and a different set of twelve delays (0, 16, 31, 47, 63, 80, 96, 111, 127, 142, 174 and 190 ms) was used to measure the T2 values. The recycle delay was 3.0 s in all experiments. {1H}-15N NOE experiments were carried out with an overall recycling delay of 10 s to ensure the maximal development of NOEs before acquisition and to allow solvent relaxation, thus avoiding transfer of saturation to the most exposed amide protons of the protein between scans 16. Relaxation times were calculated via least-squares fitting of peak intensities to a two-parameter exponential function, using the rate analysis routine of the Java version of NMRView 17. Heteronuclear NOEs were calculated from the ratio of cross-peak intensities in spectra collected with and without amide proton saturation during the recycle delay. Uncertainties in peak heights were determined from the standard deviation of the distribution of intensities in the region of the HSQC spectra where no signal was present and only noise was observed.
Internal Dynamics.

The principal components of the GNCA MP inertia tensor were calculated with the Pdbinertia program 18 using the X-ray structure of the GNCA MP lactamase 5. We estimated the overall correlation time from the ratio of the mean T1 and T2 values. These mean values of T1, T1ρ and T2 were calculated from a subset of residues with little internal motion and no significant exchange broadening. This subset excluded residues with NOE values lower than 0.65 and also residues with T2 values lower than the average minus one standard deviation, unless their corresponding T1 values were larger than the average plus one standard deviation 19. The diffusion tensor, which describes rotational diffusion anisotropy, was determined by two approaches 20,21, with the r2r1_diffusion and the quadric_diffusion programs 22. The calculations were unsuccessful when using the errors in T1 and T2 estimated by Monte Carlo simulations, which were unrealistically low. Therefore, the errors were scaled up by the minimum factor allowing an interpretation of the data in terms of a rotational diffusion tensor. This procedure resulted in 5% average errors. The 15N relaxation was analyzed assuming dipolar coupling with the directly attached proton (with a bond length of 1.02 Å), and a contribution from the 15N chemical shift anisotropy evaluated as -172 ppm. Relaxation data were fitted to the Lipari and Szabo model using FAST-Modelfree 17, which interfaces with MODELFREE version 4.2 23. Five models of internal motion were evaluated for each amide 1H-15N pair, described by the following parameters: (i) S²; (ii) S² and τe; (iii) S² and Rex; (iv) S², τe and Rex; and (v) Sf², Ss² and τe. Here S² is the generalized order parameter of the internal motion, τe is the effective internal correlation time, Rex is the exchange contribution to transverse relaxation, and Sf² and Ss² are related to the amplitudes of the fast and slow internal motions.
Model v takes into account a situation with two distinct internal motions (with at least 2 or 3 orders of magnitude between their time constants), both faster than τm (the overall correlation time). The order parameters Sf² (fast, ps) and Ss² (slow, ns) reflect the amplitudes of the two internal motions, with τe being the time constant of the slower one.
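As an illustration of how the model parameters enter the analysis, the following is a minimal sketch of the Lipari and Szabo spectral density and of the extended two-time-scale form used for model v. It assumes isotropic overall tumbling for simplicity, whereas the analysis described here used an anisotropic diffusion tensor; the function names are hypothetical and this is not the MODELFREE implementation.

```python
import math


def j_model_free(omega, tau_m, s2, tau_e=0.0):
    """Lipari-Szabo spectral density J(omega), isotropic tumbling assumed.

    omega : angular frequency in rad/s
    tau_m : overall correlation time in s
    s2    : generalized order parameter S^2 (0..1)
    tau_e : effective internal correlation time in s (0 = infinitely fast)
    """
    # Combined time constant: 1/tau = 1/tau_m + 1/tau_e
    tau = tau_m * tau_e / (tau_m + tau_e)
    j = s2 * tau_m / (1.0 + (omega * tau_m) ** 2)
    j += (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * j


def j_extended(omega, tau_m, s2_fast, s2_slow, tau_e):
    """Extended form for two internal motions (model v).

    The fast motion is assumed too fast to contribute its own Lorentzian,
    so only the slower internal motion (time constant tau_e) appears.
    """
    s2 = s2_fast * s2_slow  # overall order parameter S^2 = Sf^2 * Ss^2
    tau = tau_m * tau_e / (tau_m + tau_e)
    j = s2 * tau_m / (1.0 + (omega * tau_m) ** 2)
    j += (s2_fast - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * j


# Illustrative values only: 15N Larmor frequency at a 600 MHz spectrometer,
# with an arbitrary tau_e chosen for the example.
omega_n = 2.0 * math.pi * 60.8e6
print(j_model_free(omega_n, 11.68e-9, 0.91, 50e-12))
```

With Sf² = 1 the extended form reduces to the simple two-parameter form, which is a convenient sanity check when comparing models.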
To allow a direct comparison with a related protein, the same approach was applied to the NMR relaxation parameters published for TEM-1 6. We measured a large set of individual 15N relaxation parameters for the GNCA MP lactamase (Supplementary Figure 22). The heteronuclear {1H}-15N NOE and the longitudinal (T1) and transverse (T2) relaxation times were measured for 71% of the 262 residues. The exceptions are the N-terminal residue, the twelve prolines, and some others affected by severe signal overlap in the crowded NMR spectra.
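The relaxation times and NOEs reported here were obtained by fitting peak intensities to a two-parameter exponential and by taking intensity ratios, as described in the relaxation measurements section. The sketch below shows one way such a fit could be carried out; it is not the NMRView routine, the function names are hypothetical, and the intensities in the usage example are synthetic. It uses a log-linear seed followed by undamped Gauss-Newton refinement, which assumes positive intensities and reasonably clean data.

```python
import math


def fit_decay(delays, intensities, n_iter=20):
    """Least-squares fit of I(t) = I0 * exp(-t / T); returns (I0, T)."""
    # Seed the parameters from a straight-line fit of ln(I) against t.
    n = len(delays)
    logs = [math.log(y) for y in intensities]
    t_mean = sum(delays) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in delays)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(delays, logs))
    rate = -sxy / sxx
    i0 = math.exp(l_mean + rate * t_mean)

    # Gauss-Newton refinement on the untransformed intensities.
    for _ in range(n_iter):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for t, y in zip(delays, intensities):
            e = math.exp(-rate * t)
            d_i0 = e               # derivative of the model w.r.t. I0
            d_rate = -t * i0 * e   # derivative of the model w.r.t. the rate
            resid = y - i0 * e
            a11 += d_i0 * d_i0
            a12 += d_i0 * d_rate
            a22 += d_rate * d_rate
            b1 += d_i0 * resid
            b2 += d_rate * resid
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        # Solve the 2x2 normal equations for the parameter update.
        i0 += (a22 * b1 - a12 * b2) / det
        rate += (a11 * b2 - a12 * b1) / det
    return i0, 1.0 / rate


def heteronuclear_noe(i_saturated, i_reference):
    """NOE as the intensity ratio with and without proton saturation."""
    return i_saturated / i_reference


# T1 delays from the text, converted to seconds; intensities are synthetic.
t1_delays = [0.020, 0.060, 0.100, 0.160, 0.240, 0.460,
             0.640, 0.860, 1.260, 1.600, 2.200, 2.750]
synthetic = [1000.0 * math.exp(-t / 0.8) for t in t1_delays]
i0_fit, t1_fit = fit_decay(t1_delays, synthetic)
print(i0_fit, t1_fit)
```

For noisy data a damped or bounded optimiser would be a safer choice than plain Gauss-Newton.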

NMR Assignments of GNCA
The average values of the 15N relaxation parameters are summarized in Supplementary Table 9. There were several residues that deviated from the average. These are mostly located at the C-terminus and in some loops. In these cases, low NOE values (Supplementary Fig. 21) indicate flexibility on the fast time scale (picoseconds to nanoseconds). The average value of the order parameter (S²) is 0.91 ± 0.05, showing that, globally, the GNCA lactamase has a high degree of order on the pico-to-nanosecond time scale. In general, residues in loop regions have lower values.
Similar results were found in the related protein TEM-1 6. The calculated global rotational diffusion correlation time (τm) for the GNCA MP lactamase was 11.68 ± 0.02 ns. This value is in good agreement with the value obtained by hydrodynamic calculations (12.27 ns) (see Supplementary Table 10).
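The initial estimate of the overall correlation time from the mean T1/T2 ratio of a rigid subset of residues can be sketched as follows. The selection rule follows the criteria given in the Internal Dynamics section. The closed-form expression for τm is a commonly used approximation for slowly tumbling proteins and is an assumption here, since the exact expression used is not stated. It is also assumed that the averages and standard deviations are taken over all measured residues and that the sample standard deviation is used. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def rigid_subset(residues, noe_cutoff=0.65):
    """Select residues with little internal motion and no exchange broadening.

    residues: list of dicts with keys 't1', 't2' (seconds) and 'noe'.
    """
    t1_values = [r["t1"] for r in residues]
    t2_values = [r["t2"] for r in residues]
    t1_mean, t1_sd = mean(t1_values), stdev(t1_values)
    t2_mean, t2_sd = mean(t2_values), stdev(t2_values)

    kept = []
    for r in residues:
        if r["noe"] < noe_cutoff:
            continue  # fast internal motion
        short_t2 = r["t2"] < t2_mean - t2_sd
        long_t1 = r["t1"] > t1_mean + t1_sd
        if short_t2 and not long_t1:
            continue  # likely exchange broadening
        kept.append(r)
    return kept


def tau_m_from_t1_t2(t1, t2, nu_n_hz):
    """Approximate tau_m = sqrt(6*T1/T2 - 7) / (4*pi*nu_N).

    nu_n_hz is the 15N Larmor frequency in Hz (about 60.8e6 at 600 MHz).
    """
    return math.sqrt(6.0 * t1 / t2 - 7.0) / (4.0 * math.pi * nu_n_hz)


def estimate_tau_m(residues, nu_n_hz):
    """Estimate tau_m from the mean T1 and T2 of the rigid subset."""
    subset = rigid_subset(residues)
    t1_mean = mean(r["t1"] for r in subset)
    t2_mean = mean(r["t2"] for r in subset)
    return tau_m_from_t1_t2(t1_mean, t2_mean, nu_n_hz)
```

Such an estimate only serves as a starting value; the reported τm comes from the full diffusion tensor and model-free analysis.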

The principal components of the inertia tensor, calculated for the X-ray structure 5, have relative values of 1.00, 0.90, and 0.60. These values indicate that the shape deviates from that of a sphere and approaches a prolate ellipsoid. In agreement with these findings, the diffusion tensor that best explained the NMR relaxation data was anisotropic, with different values for the parallel and orthogonal components of the tensor, giving a D∥/D⊥ value of 1.27 ± 0.15 (Supplementary Table 10). On the basis of all these results, the relaxation data for the amide 1H-15N pair of each residue were analyzed using the model-free formalism to calculate the corresponding dynamical parameters. Most data (125+16 spins) could be satisfactorily described by one of the two simpler dynamical models (see Methods in the main text) (Supplementary Table 11), which describe the internal dynamics of the 1H-15N pair in terms of a generalized order parameter S² and an effective internal correlation time τe of fast motions, which is always shorter than the global correlation time. In a significant number of cases (twenty-six residues, 25+1), it was necessary to include a contribution of slow motions, on the microsecond to millisecond time scale, to the transverse relaxation. In these cases, the internal dynamics are characterized by the contribution of conformational exchange, Rex. Even though the exchange contributions were estimated from measurements at a single field, it is important to emphasize that the R2/R1 ratio is very homogeneous (Supplementary Figure 22), and deviations from the expected correlation were not detected. Moreover, the rotational diffusion tensor was analysed using the full relaxation dataset in combination with the crystallographic structure of the mutant to rule out that an increased R2 value would be induced by the molecular anisotropy.
In a few cases (five residues), the inclusion of the amplitudes of two internal motions (Sf² and Ss²) was also necessary to obtain a good fit. Finally, only four residues could not be fitted to any model.