Increasing protein stability by inferring substitution effects from high-throughput experiments

Summary We apply a computational model, global multi-mutant analysis (GMMA), to inform on effects of most amino acid substitutions from a randomly mutated gene library. Using a high mutation frequency, the method can determine mutations that increase the stability of even very stable proteins for which conventional selection systems have reached their limit. As a demonstration of this, we screened a mutant library of a highly stable and computationally redesigned model protein using an in vivo genetic sensor for folding and assigned a stability effect to 374 of 912 possible single amino acid substitutions. Combining the top 9 substitutions increased the unfolding energy 47 to 69 kJ/mol in a single engineering step. Crystal structures of stabilized variants showed small perturbations in helices 1 and 2, which rendered them closer in structure to the redesign template. This case study illustrates the capability of the method, which is applicable to any screen for protein function.


In brief
Thermodynamic stability is a key goal in protein engineering. Norrild et al. demonstrate how profiling of highly mutated protein variants combined with computational approaches to infer effects of single amino acid substitutions presents a richly informative approach to further engineer an already stable model protein, now melting at >150 C.

INTRODUCTION
Protein engineering requires a complex concurrent optimization of function, stability, and other desired traits, which can be cumbersome even with efficient screening automation. One reason for this is that many substitutions may be required to reach the desired phenotype, and the combinatorial space, when introducing multiple substitutions in protein sequences, quickly rises to, and above, experimentally accessible numbers. Efficient ways to navigate this space are therefore highly desirable. 1 Even in the cases where enhanced stability is not the primary goal, it may still be a useful starting point when engineering enzymes. 2 As a general rule, proteins are only as stable as is required for the adequate fitness of their host. 3 It therefore stands to reason that it should be possible to stabilize most mes-ophilic proteins and enzymes for biotechnology purposes. This can provide the necessary stability headroom to alter a protein's function, for example, where modification of a substrate cavity is suboptimal for stability 4 or when multiple destabilizing substitutions are required for directed evolution toward an altered function. 5 Thus, increasing the stability of a protein can increase tolerance to substitutions, 6 which might be needed for changes in the active site or other desired traits in protein engineering.
Directed evolution in combination with genetic selection and screening can address some of the challenges associated with the large sequence space. One of the tools that can be used to optimize stability is tripartite folding sensors. 7 These are fusion proteins where the protein of interest (POI) is genetically inserted in a loop of a conditionally essential reporter enzyme. 8 Given a stable POI, the reporter enzyme is catalytically active, and the organism survives, while an unstable POI renders the fusion MOTIVATION Protein stability is an important parameter in almost all protein-engineering efforts. Evaluating the effects of the many possible amino acid changes to guide such projects is a significant task, even with recent advances in experimental and computational approaches. Thus, a generally applicable method to determine the effect on folding and stability of individual substitutions and in particular pinpointing multiple enhancing ones is missing. The work described here aims to devise such a method that should also be broadly applicable.
protein misfolded, which abolishes the catalytic activity of the reporter, thus impeding growth. Mutants can be selected for increased stability of the POI by increasing temperature, 9 antibiotic concentration, 10 or by following fluorescent readout. 11 Selection for aggregation resistance 12 and identification of stable protein scaffolds 13 can also achieved using such systems. They will, however, eventually reach the limit of their dynamic range, requiring more complicated approaches to increase stability further. 14 Genetic screening systems, when combined with massively parallel sequencing (MPS), can be exploited for protein science in a range of powerful techniques broadly termed deep mutational scanning (DMS). 15 The potential for applying the method for engineering has been shown in the optimization of a denovo-designed influenza inhibitor 16 and the identification of stabilizing substitutions by studying epistatic effects between the binding capabilities of single and double mutants. 17 More quantitative analyses have been enabled by a thermodynamic model that considers doubly substituted protein variants to infer the effect of single amino acid substitutions on both protein-protein interaction and folding free energies, 18,19 which was later shown to match chemical unfolding stabilities well. 20 We have recently used a folding sensor based on a bacterial heat-shock response 21 to select for variants with improved thermodynamic stability of an already stable protein. 22 In line with this, we have shown that single substitution effects may be obtained by analyzing the results of a DMS experiment with many diverse multiple-substituted protein variants and suggested a global multi-mutant analysis (GMMA) for this task. 23 GMMA rests on the observation that enhancing amino acid substitutions, although they have no individual phenotype in a given assay, can be identified by their ability to compensate deleterious substitutions, which have a phenotype. Combining the information of phenotype (e.g., growth/no growth) and genotype (mutations in the gene) in many multiply mutated variants allows for assignment of effects of individual substitutions, even if they do not display a phenotype on their own. Specifically, the current implementation is aimed at identifying generally enhancing substitutions characterized by additivity when combined.
We have previously developed an in vivo tripartite folding sensor based on the enzyme orotate phosphoribosyl transferase (OPRTase), encoded by the pyrE gene in Eschericia coli OPRTase, which is essential for pyrimidine biosynthesis. 9 Cells defective in this enzyme can only survive on minimal medium if a pyrimidine source, e.g., uracil, is added. We engineered a circularly permutated variant of OPRTase as a folding sensor, termed CPOP, where POIs are inserted between the former N and C termini in the circular permutated enzyme. While the circular permutation is fairly unstable, it still complements a pyrE deletion; however, it becomes highly sensitive to the folding competence of the inserted POI. As a proof of concept, we enhanced the stability of a marginally stable designed protein (called dF106 24 ) through conventional directed evolution. dF106 was created in an effort to computationally redesign the ubiquitous Rossmann fold of thioredoxins using Rosetta Design. 25 This initial version of the protein was fragile but yielded a crystal structure close to the design target. 24 Using the CPOP system, dF106 variants were selected, genetically resulting in a protein variant dF106-L11P-D83V, henceforth termed enhanced dF106 (edF106). This had a high stability with a folding energy of À48 kJ/mol. However, being extremely stable, this variant could not be improved further in the CPOP system because it had already reached the upper limit of the dynamic range in the CPOP selection.
In the present work, we have nevertheless further increased the stability of edF106 using CPOP in a DMS experiment on a library of more than 14,000 edF106 variants carrying, on average, 9 amino acid substitutions. GMMA estimates the additive effect of single amino acid substitutions on stability and function formulated as a fitness potential that relates to the assayed function. 23 Because CPOP reports the folding competence of a variant, we will in this work refer to the estimated fitness potential as stability. Thus, by linking a sequence to the growth/no-growth phenotype, a stability effect was assigned to each of 374 substitutions. By introducing the nine top-ranking substitutions from this single experiment, the stability was enhanced to almost 70 kJ/mol with minimal structural changes. These results demonstrate how GMMA is capable of accurately identifying stabilizing substitutions for optimization beyond the dynamic range of the assay.

RESULTS AND DISCUSSION
Resilience toward mutations reflects thermodynamic stability We previously optimized the stability of the designed protein edF106 to the limit of the selective screen using the CPOP folding sensor. 9 To map the effect of multiple mutations and push stability of edF106 beyond À48 kJ/mol, we generated variant libraries using semi-randomized oligonucleotides in the CPOP system ( Figure 1A). If misfolded, this imposes uracil requirement on the cells and allows for genetic selection 9 (Figure 1B). To generate a suitable dataset for application of the GMMA method, previous analysis had identified two key requirements for the variant library: 23 (1) the library should encode a large number multiply substituted protein variants, each carrying different combinations of amino acid substitutions, all of which must be found in different contexts so that all are connected. In practice, this is achieved by having many-fold more variants than unique substitutions. (2) Similar to chemical unfolding experiments, the inactivation transition should be well probed. This is achieved by, on average, having a number of substitutions per variant that is close to the number of substitutions required to inactivate the protein ( Figure S1).
Because the starting protein was already very stable, we opted for a high mutation frequency obtained by using randomly mutated (''doped'') oligonucleotides as degenerate primers 26,27 with $10% error at each position of 77 and 75 base lengths (Figure 1A). These were designed to cover the C-terminal half of edF106, amino acid residues 48-97, and to act as long primers for PCR amplification and were assembled using USER cloning. 28 This approach was also chosen because previous work suggested that error-prone PCR may result in an overrepresentation of mutations that occur in early PCR rounds, which is not ideal for GMMA. 23 Despite the huge diversity possible, we applied measures to obtain a library of limited size (about 10,000-20,000 variants) such that essentially all variants could be covered by MPS with a reasonable depth (see STAR Methods). This was important because those variants present in the non-selected, but not in the selected, library were inferred to be non-functional in GMMA and should therefore be identified with high accuracy ( Figure 1B). MPS of the library revealed 15,018 unique DNA variants with, on average, 13.4 mutations per gene ($9 amino acid substitutions) after processing the data and applying a quality cutoff on the reads that eliminated sequencing errors ( Figure S1A; Tables S1 and S2).
To identify the effect on stability of combinations of mutations using the CPOP system, the input library was plated on minimal medium to screen for growth (i.e., ura + cells) and the resulting sublibrary sequenced by MPS. From the input library, 19% of the sequences were recovered after selection at 30 C (Figure S1E). We emphasize that the edF106 protein is exceedingly stable and that the stringency of the screen was calibrated accordingly. This was also reflected in the survival of the variants, which depended on the number of substitutions in a slightly sigmoidal fashion instead of an exponential decay ( Figure S1A). This is consistent with previous observations and indicated that the genetic data could be modeled by the thermodynamic stability effects of the protein. 6 GMMA analysis discovers stability effects The principle behind the GMMA analysis is that the effect of a given amino acid substitution is determined in very many different variant contexts and thus that the inferred effect is mostly independent of a specific context. Such generally stabilizing substitutions may be recognized by their ability to compensate other destabilizing substitutions, as earlier shown to be the case when considering double mutants. 17 To illustrate this idea, consider two mildly destabilizing substitutions, A and D, a stabilizing substitution, B, and a deleterious destabilizing substitution, C ( Figures 1C and 1D). Singly substituted variants of A and D both show wild-type-like growth (within noise), whereas they may both be inferred to be destabilizing from the observation that the double-substituted variant A + D does not complement growth. On the other hand, a variant, A + B + D, with an additional substitution, B, is observed to complement growth, from which we may infer that B is stabilizing because it rescues the inactivation by A and D. Using this concept, the effect of each individual substitution is determined from combinations with many others in a global fit to all variants. GMMA was essentially computed as described previously. 23 We used binary classification of folding and misfolding and relied on the robustness of GMMA to infer a quantitative stability effect. We validated this approach by comparing our previous GMMA analysis with one in which we had artificially made the data binary and find that the two are strongly correlated (Pearson correlation of 0.97; Figure S1F). By using a binary phenotype, GMMA simplifies to a logistic regression model with the particular fitting scheme described previously. Briefly, initial estimates were obtained using the mean-field approach followed by a global optimization using Levenberg-Marquardt damped least squares. Reliable stability effects could be assigned to 374 out of the 838 unique substitutions in the library based on the criteria that the substitution should be present in more than 40 sequences and have a standard uncertainty of less than 6.3 kJ/mol ( Figure 1E).
The reference stability was estimated to À27.9 kJ/mol. This is substantially less stable than the value of À48 kJ/mol obtained from chemical unfolding, 9 which indicates that the absolute scales of stabilities are not directly comparable. The global model may be inaccurate in this respect, but it is also likely that the thioredoxin domain is less stable in the fusion. Furthermore, with a binary phenotype, we may not expect the absolute stability scale to be accurate, and the present study only relies on the ranking of the substitutions ( Figure S1F), although the absolute scale does seem relevant (Figure 2A).

Stability measurements validate GMMA
To gauge the accuracy of our GMMA of the data from the folding sensor, we examined how well it could pinpoint the presumably rare stabilizing substitutions in edF106. As substitutions in pro-teins are on average mostly destabilizing, [29][30][31] this was a stringent test of our analysis. We therefore introduced the 15 substitutions predicted to be most stabilizing by the GMMA into the isolated edF106 protein to measure their individual as well as combined effects on thermodynamic stability (Figure 2A). We used a ''two-dimensional'' unfolding approach, where denaturant unfolding is combined with a temperature scan, 32 to measure the stability. For very stable proteins, this analysis arguably gives a more accurate measure of stability because the unfolding transition is probed at different temperatures, increasing the confidence in the extrapolation to a solution with no denaturant (Figures S2 and S3). The top 15 substitutions are found at nine distinct positions, with some positions having two or three substitutions that GMMA suggests are stabilizing. Thus, only nine could be combined to form the multi-mutant 9 (MM9) variant. To evaluate additivity, we constructed two additional multimutants with three and six of the nine top substitutions (MM3 and MM6). Interestingly, the multi-mutants show progressively increasing stability, whereas the individual stability effects vary more ( Figure 2A). We compared the measured stability with the stability expected if all individual variant effects were additive and found that the total stability gains were 88%, 74%, and 69% in MM3, MM6, and MM9, respectively, of the expected. All the same, the MM9 variant had a stability of À68.6 ± 1.1 kJ/mol (DDG = À21.8 ± 1.1 kJ/mol, n = 5) and an extrapolated melting temperature of 152 C ± 10 C (n = 5) ( Figures 2B and 2C). While these values are somewhat uncertain due to the very long extrapolations to denaturant-free conditions, it is remarkable that this increase in stability was achieved in a single experiment of screening with a protein that was already very stable and that had hit the ceiling of the selection system used.
While 12 of the top 15 variants identified by GMMA were stabilizing in thermodynamic measurements of the isolated edF106 protein, three substitutions did not increase stability. Here, we must take into consideration that mutations were scored in the CPOP system, where insertion of target protein into a fusion with the sensor protein is not completely equivalent to that of the target, in this case a thioredoxin domain, on its own. Notably, the poorly predicted positions, E54Y, K79H, and M51T, are close to the N or C termini in the crystal structure of dF106 and are therefore in close physical proximity to the CPOP fusion site. Thus, mutations that could be stabilizing in the fusion might behave differently outside of the CPOP context. As two substitutions (E54Y and K79H) appeared to destabilize edF106 (Figures 2A and S2; Table S3) and the optimal substitution appeared not to have been chosen at all positions, we prepared an enhanced version of MM9, termed eMM9, in which the best possible substitutions at each position, according to the measurements of single variants, were chosen: M51K, E54V, L55V, T57I, V83L, L87F, I88T, and P92S. Within experimental error, however, we were not able to differentiate the stability of eMM9 from MM9 ( Figures 2B and 2C) (DG = À69.5 ± 1.1, n = 4, p = 0.30 [two-sided independent t test]), again possibly due to the very long extrapolations to denaturant-free conditions. All things considered, effects on stability in the CPOP fusion are surprisingly well reflected in the isolated edF106 protein, and we note that including slightly destabilizing substitutions, as measured in single substitutions in edF106, did not abrogate the stabilization in the multi-mutant context.

Crystal structures show increased similarity to the 1FB0 design template
To gain more insight into the effects of the mutations, we crystalized and determined the crystal structure of MM9 using diffraction data that extended to a resolution at 1.9 Å (Figure 3). Super imposition with the originally redesigned protein dF106 (PDB: 5J7D) showed a good fit with a root-mean-square deviation (RMSD) of 1.20 Å using the C a positions. However, there was no visible electron density to fit the N-proximal residues (1-17).
Instead, a strong crystal contact was present on the exposed surface and had likely moved the 17-residue helix (a1) out of the way ( Figure S4). The crystals grew very slowly, typically within a month, with a low success rate out of 48 examined crystallization conditions. Only one or two would yield crystals indicative of a conformational change taking place in order to stabilize the crystal lattice. The similar construct of eMM9, however, readily formed crystals in several conditions, with the best dataset collected at 2.25 Å resolution. The complete model could readily be built into the electron density despite minor crystal twinning present in the dataset that challenged the data analysis (STAR Methods). This revealed an equivalent structure to MM9 with all C a RMSDs at 0.70 Å , whereas when the structure was compared with dF106, the RMSD was measured to 1.83 Å .
With the N-proximal helix visible (a1), the main difference between eMM9 and the original designed protein, dF106, is indeed found in this region where eMM9 showed much better agreement with its original design template spinach thioredoxin (PDB: 1FB0). Interestingly, the other main region of change is the start of helix 2 (a2)-the active site of the spinach thioredoxin-where MM9/eMM9 is again more akin to the natural protein. Thus, an all-C a comparison of eMM9 showed an RMSD of 0.97 Å to spinach thioredoxin instead of 1.83 Å when compared with dF106 ( Figure 3A). The structure also revealed that the destabilizing substitution E54Y presented in MM9, but not in eMM9, would probably be incompatible with the native conformation of Val2, which may have facilitated its increased dynamics following MM9 crystal formation ( Figure 3C). That this clash happens close to the N-terminal further strengthens the idea that the discrepancy is likely an artifact from the fusion protein and not the GMMA method.

Structure and sequence-based methods do not predict most stabilizing variants
In general, we found it difficult to rationalize why many of the substitutions stabilized the protein. We therefore asked whether it would have been possible to point them out as likely candidates beforehand. We calculated variant effects using two methods, structure-based stability calculations using Rosetta and a direct coupling analysis (lbsDCA) of a multiple sequence alignment (MSA) of natural thioredoxins, as such analyses have previously been shown to predict stability effects 22,33,34 (Figure 4). While both methods gave rise to reasonable overall Pearson correlations of 0.54 and 0.59, respectively, they failed to identify many of the substitutions that were inferred to be stabilizing in our GMMA. Some of the best substitutions found here score poorly in Rosetta, e.g., P92T and P92S rank >50 and V83L, L87F, and L55V rank >150, and all five scored to be destabilizing. The MSA-based method, lbsDCA, in general performs better but still has V83L, P92S, and P92T with ranks >30 and E54V and L87F with ranks >100. With 12 out of 15 substitutions confirmed to be stabilizing (Figure 2) and the zero-effect accurately reproduced ( Figure S3), GMMA seems to identify stabilizing substitutions accurately. On the other hand, only 8 and 11 of the 15 top-ranking substitutions from Rosetta and lbsDCA are estimated to be stabilizing, respectively, and these effects are significantly smaller in magnitude as evaluated by GMMA. In particular, they identify only 3 and 4 of the GMMA top 15 ranking substitutions, respectively, and Rosetta rank 6 (K66I) and 13 (T74I) are estimated to be highly Computational evaluation of multi-mutant datasets as a general approach in protein engineering In screening for protein/enzyme optimization the choice is often between (1) individually analyzing the full complement of amino acid substitutions throughout the open reading frame or (2) gener-ation and screening of libraries of multiple random mutations using, e.g., error-prone PCR (epPCR). Option 1 has the advantage of being exhaustive because amino acid substitutions that require changing more than one base in a codon are more easily gauged. If there are indeed single substitutions, which offer clear enhancements, they are readily identified and implemented. Option 2, on the other hand, is less costly in library generation but will only access a smaller fraction of the possible amino acid substitutions. Also, because more than one substitution is typically introduced in each member of the library, significant time must be invested in identification of critical substitutions. In both cases, there needs to be screening headroom to actually measure enhanced stability/ activity of variants beyond the current state of the protein. With GMMA, both this and the issue of variant identification are dealt with. Furthermore, substitutions identified by GMMA have a build-in additivity, and they may be particularly appropriate in the context of several other substitutions. Another virtue of GMMA is the fact that libraries do not need to (and, indeed, should not) be very large in order to cover all relevant substitutions because of the relatively high mutation rate applied. While we have here generated mutant libraries using doped oligo nucleotides, one could also have used epPCR instead. This choice does, however, depend on the screening stringency applied, which, as in the case of edF106, may require a higher mutation rate than is easily attainable by epPCR, to adequately sample the pass/no pass for the screen. Suitable deep sequencing can be done with entry-level MPS as library sizes need not be very large. Finally, while we have emphasized stability-enhancing substitutions here, the output of GMMA may also be useful for identifying subtle destabilizing substitutions, e.g., for targeted heat inactivation. As GMMA for binary readout screens simplifies to logistic regression, computational packages for such procedures can be adapted. For even easier access, enrichment or depletion of a given substitution in the multi-mutants after screening correlates with stability enhancement and decrease, respectively. Together, we anticipate that GMMA may be the method of choice for research laboratories or innovative small-to-medium-sized enterprises where access to resources may preclude exhaustive, but more costly, approaches.

Conclusions
The GMMA approach shows remarkable efficiency in identifying the rare stabilizing substitutions of the model protein edF106, allowing us to improve the stability by close to 50% in a single iteration. The starting protein, edF106, can sustain growth even at the highest screening temperature 42 C in the CPOP selection system used, making it impossible to select for stability-enhancing substitutions by conventional directed evolution. 9 Nevertheless, the selection for folding was here done under ''mild'' conditions (30 C), where the lower temperature imposes a less stringent conditions for protein folding. Thus, the GMMA approach allows us to identify stability-enhancing substitutions outside the dynamic range of the selection. While other measures can be taken to address this issue, 14 we believe that the GMMA approach may be more robust. As GMMA only finds substitutions that are stabilizing in many variant backgrounds, stabilizing substitutions can be combined, suggesting that GMMA can efficiently guide optimization in the vast combinatorial space of the protein sequence. For the same reason, we expect that interrogating parts of target proteins and combining information from fragments provides useful and experimentally less demanding access to protein engineering. While the CPOP system has proven very successful here as a proxy for stability of a protein with no intrinsic function, GMMA can be applied to any system in which functional selection can be carried out in high throughput. Here, we have stabilized edF106 as an example, but we emphasize that the workflow should generalize to any other protein with a suitable screen for function.

Limitations of the study
This study focused on the C-terminal half of edF106 in the effort of stabilizing the protein thermodynamically. Due to the length restrictions of the mutagenic oligonucleotides used for library creation, we were not able to cover the full protein, and one would need to develop individual sets of primers to cover the full sequence with this procedure. Nevertheless, as substitutions are likely additive throughout the full length, we would expect also to find stabilizing substitutions in the N-terminal half. The GMMA approach has, however, also been shown to work on libraries generated by epPCR mutagenesis, 23 which does not have the same length limitations.
We have here not investigated the interplay of optimization of stability and function, e.g., catalytic activity of an enzyme. However, the method might in principle be able to improve both of these protein traits in parallel, which will also have valuable implications.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

ACKNOWLEDGMENTS
We would like to acknowledge the help from Joseph Nesme and Shashank Gupta in obtaining the MPS data. We also thank the MAX-IV synchrotron staff and Ana Gonzales for help during the remote X-ray diffraction data collection. We thank Matteo Cagiada and Anders Frederiksen for implementing computational pipelines for conservation and stability scores, respectively. We are thankful for discussions with Associate Professor Martin Willemoë s, and we are especially grateful for the help and funding provided by Professor Alexander K. Buell (Novo Nordisk Foundation: NNFSA170028392) to finalize the project. This work was supported by Independent Research Fund Denmark and the PRISM (Protein Interactions and Stability in Medicine and Genomics) center (NNF18OC0033950).

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Dr. Jakob R. Winther (jrwinther@bio.ku.dk).

Materials availability
All unique/stable reagents generated in this study are available from the lead contact with a completed materials transfer agreement.
Data and code availability d Crystallography structures have been deposited in PDB with accession numbers 7Q3J and 7Q3K and are publicly available as of the date of publication. d All original code has been deposited at GitHub: https://github.com/KULL-Centre/_2022_Norrild_GMMA_TRX (https://doi.org/ 10.5281/zenodo.7213166) and is publicly available as of the date of publication. d Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

METHOD DETAILS
Library construction Libraries were constructed by using long ''doped'' oligonucleotides as primers for inverse PCR on a plasmid 9 containing the fusion between the gene encoding CPOP sensor and a gene encoding edF106. Primers, obtained from LGC Biosearch technologies, contained a deoxyuracil at position 6 from the 5 0 end so as to allow for annealing using a USER cloning approach. 44 Primers were named based on the amino acid positions mutated; oligo [48:72] and oligo [74:97], and all positions not involved in the USER-cloning site, 71 and 69 bases respectively, were ''doped'' with 10% of the three non-wild-type nucleobases for random mutagenesis. A version of the plasmid pMMA010 9 with edF106 inserted in the CPOP system was amplified in a PCR reaction using PfuX7 polymerase, which is compatible with the USER-cloning. 37 50 mL of PCR reaction mix [1x HF buffer (Thermo Fischer), 0.2 mM of each primer, 50 mM dNTPs, 0.15 ng/mL template plasmid] were prepared with 1 mL of in-house purified PfuX7. The reaction was run with initial denaturation at 98 C for 30 seconds and 30 cycles of 10 seconds at 98 C, 30 seconds at 62 C and 5 minutes at 72 C. A final 10 minutes at 72 C was employed to complete any unfinished product. Twenty 50 mL PCR reactions were pooled and the intended product was purified by gel band excision from a 1% (w/v) agarose gel. DNA was extracted from the gel with a gel band purification kit (GE healthcare) and eluted from the spin columns of the kit with 50 mL MilliQ water. 100 mL reactions were prepared for USER excision and ligation consisting of 85 mL purified PCR product, 10 mL 10X T4 DNA Ligase buffer (Thermo) and 5 mL USER enzyme mix (New England biolabs). 45 The temperatures used for the reaction were 1 hour at 37 C for catalysis, 30 minutes at 25 C for dissociation of excised fragment and annealing consisting of 20 minutes at each of the following temperatures: 12 C, 11 C, 10 C, 9 C and 8 C. The solution was kept on ice before immediately adding 5 mL of T4 DNA ligase (Thermo) and 10 mL 5 mM ATP (VWR Life science). The solution was then incubated for 30 minutes at room temperature before heat inactivation of the ligase at 70 C for 10 min as recommended by the supplier. 2.5 mL Dpn1 (Thermo) were added before incubating the solution at 37 C overnight. Next day, the DNA was purified using the Illustra GFX PCR DNA and Gel Band Purification Kit (GE healthcare) and eluted in 20 mL sterilised MilliQ water.
Initial library transformation 10 mL of purified cloned DNA was used to transform electrocompetent MC1061 cells 36 and the transformants were plated on four large 140 mm petri dishes to maximize colony separation, yielding an estimated 99,000 colonies. Colonies were scraped off the plates and collected by adding 5 mL sterile PBS buffer to each plate (20 mM phosphate and 150 mM NaCl, pH = 7.4). The density of the recovered cells where normalised to OD 600 = 3 before isolating plasmids using a mini prep kit (Omega) for each plate and eluting in 50 mL TE buffer (10 mM Tris-HCL and 1 mM EDTA, pH = 7.3). To normalise the number of variants obtained from each purification, the purified plasmids were mixed in equimolar volume based on their absorbance at 260 nm.

Retransformation
The plasmid libraries were transformed into the selection strain by using 5 ng of the purified and mixed plasmid library to transform electrocompetent MRH205 cells. 9 Ten-fold dilution of the transformed cells were plated on a 140 mm LB agar plate with 100 mg/mL e2 Cell Reports Methods 2, 100333, November 21, 2022 Article ll OPEN ACCESS ampicillin and 50 mg/mL kanamycin for a limited library size of estimated 53,600 colonies. Cells were collected using the same protocol as for the initial library transformation. A freeze stock was prepared for each library with 800 mL cell suspension and 200 mL of 87% (v/v) glycerol. 50 mL aliquots were made from the freeze stock for screening of the libraries. Plasmids from single colonies of the library were purified and Sanger sequenced to confirm correct assembly of the library. Sequences appeared clear and uniform, suggesting that none of the transformants tested carried significant levels of more than one variant.

Screening of library
The cell library was diluted 100,000-fold in PBS buffer and plated on three 140 mm selective medium plates at 30 C ensuring no more than 80 CFU/cm 2 but a five-fold sampling depth of the library. After 22.5 hours incubation, plasmids from the plates were purified similarly to the collection of the initially transformed libraries. The concentration of DNA in the mini preps were normalised by dilution in TE buffer before mixing equal volumes of the solutions. In parallel, the library was also grown over night in 5 mL LB with 100 mg/mL ampicillin and 50 mg/mL kanamycin before purifying the plasmids of the full library using the same mini prep kit.
Massively parallel sequencing Sequencing of the purified plasmids were done with paired end amplicon sequencing protocol 46 on one third of a Illumina MiSeq run using the version 3 kit with 600 cycles. The mutated part of the protein and 70 base pairs flanking region in each direction was sequenced. Amplicons with Illumina Nextera primers (361 base pairs) were produced by a PCR reaction with two HPLC purified primers with one part complementary to the plasmid and the Nextera sequence as a 5 0 overhang. 25 ng template were used for the 25 mL PCR mix using HiFi Pfu polymerase (PCR biosystems). The reaction (Initial denaturation: 1 min at 95 C. Cycle: 15 s at 95 C, 15 s at 55 C and 1 min at 72 C) was run for a minimum amount of cycles (12 cycles) as to reduce PCR chimeras. 47 The final elongation step was omitted to reduce the amount of chimeras. 48 Amplicons from the first PCR were purified from 20 mL using AMPure XP magnetic beads (Beckman Coulter) and eluting in 40 mL elution buffer (Zymo kit). 2 mL of purified amplicons were used as templates for the second PCR to attach indexing primers to the Nextera adapters (total of 429 base pairs), including one reaction without template as a negative control. The PCR reaction was run for 15 cycles and was checked for uniform bands on a gel. 30 mL were purified using the magnetic beads (AMPure XP, Beckman Coulter) and eluted in 40 mL elution buffer (Zymo kit). The concentrations of the samples were then normalised with SequalPrep Normalization Plate (Invitrogen) and eluted in 20 mL elution buffer (Zymo kit). The amplicons were pooled and then cleaned and concentrated with the DNA Clean & Concentrator kit (Zymo). The concentration of the DNA in the eluate was quantified with Qubit and the sample was then diluted to 4.5 nM. The sample was denatured and diluted as described by the Illumina protocol for the MiSeq system. 200 mL sample were mixed with 400 mL of other samples for the run before removing 30 mL of the pooled samples and adding 30 mL PhiX (5% spike) as internal control. 1.35-1.5 million reads were collected totalling $3 million paired end reads before and after selection (Table S1).
Processing the paired end reads Data from the sequencing were demultiplexed and trimmed to the start of the plasmid coding sequence. Filtering was done with a custom python script (Filtering.py) that checked for full complementation of the mutated area plus 10 base pairs in each direction. Also, the length of the amplicon was required to match the expected size. The area was then compared to the template sequence using a custom script (GenerateMutfile.py) to compress the sequences to a list of mutations. Identical sets of mutations were then counted (SeqCount.py), resulting in a file with the mutations of the sequences and how many times they were read. For each sequencing pool, a cut-off was chosen to eliminate noise from the dataset based on the mutation rate in the 2 3 10 base pair region immediately down and upstream of the mutated area.

GMMA
The genotype counts were aggregated per amino acid variant and 226 complementing variants that were not observed in the input library were discarded. No pseudo-counts were used. Variants with any counts above the cut-off values were considered complementing in the binary readout. This resulted in 838 unique amino acid substitutions combined in 14.887 protein variants holding on average of 9.0 substitutions and 18.5% variants that complement growth in CPOP. The following GMMA was conducted according to the previously published protocol. 23 Using the mean-field approach, 23 initial stability estimates could be obtained for all 838 unique amino acid substitutions based on the 10,235 variants that did not contain non-sense mutations (4622) or substitutions that like-wise appeared irreversible fatal (30). Of these, relatively few (194) substitutions were observed in only active or only inactive variants.
For the global analysis, a network analysis found that all substitutions were connected, i.e. that no subset of substitutions only occurred together and never with the rest of the substitutions. This test was important because only a connected network of substitutions can inform the global analysis. Only 54 of the 838 unique substitutions only occurred in a single variant, i.e. are hanging substitutions that do not inform the global analysis.
The GMMA error analysis was slightly different from previously described. 23 We did not obtain errors on the estimated stability effects since uncertainties were not determined in the binary readout from the CPOP screen. For filtering of inaccurately estimated effects, we simply required that a substitution should be observed in at least 40 different variants and have a fitting standard uncertainty of 6.3 kJ/mol or better. This resulted in 293 accurately estimated effects. Requiring that a substitution has been observed in at Cell Reports Methods 2, 100333, November 21, 2022 e3 Article ll OPEN ACCESS