Structurally distributed surface sites tune allosteric regulation

Our ability to rationally optimize allosteric regulation is limited by incomplete knowledge of the mutations that tune allostery. Are these mutations few or abundant, structurally localized or distributed? To examine this, we conducted saturation mutagenesis of a synthetic allosteric switch in which Dihydrofolate reductase (DHFR) is regulated by a blue-light sensitive LOV2 domain. Using a high-throughput assay wherein DHFR catalytic activity is coupled to E. coli growth, we assessed the impact of 1548 viable DHFR single mutations on allostery. Despite most mutations being deleterious to activity, fewer than 5% of mutations had a statistically significant influence on allostery. Most allostery disrupting mutations were proximal to the LOV2 insertion site. In contrast, allostery enhancing mutations were structurally distributed and enriched on the protein surface. Combining several allostery enhancing mutations yielded near-additive improvements to dynamic range. Our results indicate a path toward optimizing allosteric function through variation at surface sites.


Introduction
In allosteric regulation, protein activity is modulated by an input effector signal spatially removed from the active site. Allostery is a desirable engineering target because it can yield sensitive, reversible, and rapid control of protein activity in response to diverse inputs (Dagliyan et al., 2019;Pincus et al., 2017;Raman et al., 2014). One common approach for achieving allosteric regulation in both engineered and evolved systems is through domain insertion: the transposition, recombination, or otherwise fusion of an 'input' domain into an 'output' domain of interest (Aroul-Selvam et al., 2004;Dagliyan et al., 2016;Ostermeier and Benkovic, 2000;Nadler et al., 2016). In natural proteins, domain insertions and rearrangements play a key role in generating regulatory diversity, with kinases serving as a prototypical example (Fan et al., 2018;Huse and Kuriyan, 2002;Peisajovich et al., 2010;Shah et al., 2018). In engineered proteins, domain insertions have been used to generate fluorescent metabolite biosensors (Nadler et al., 2016), sugar-regulated TEM-1 blactamase variants (Guntas et al., 2005), and a myriad of light-controlled proteins including kinases, ion channels, guanosine triphosphatases, guanine exchange factors, and Cas9 variants (Dagliyan et al., 2016;Wang et al., 2016;Karginov et al., 2011;Toettcher et al., 2013;Shaaya et al., 2020;Coyote-Maestas et al., 2019;Richter et al., 2016). In all cases, domain insertion provides a powerful means to confer new regulation in a modular fashion.
However, naively created domain insertion chimeras sometimes exhibit relatively modest allosteric dynamic range, with small observed differences in activity between the constitutive and activated states (Lee et al., 2008). These fusions then require further optimization by either evolution or empirical mutagenesis, but general principles to guide this process are largely absent. Which mutations tune or improve an allosteric system? Because we lack comprehensive studies of allosteric mutational effects in either engineered or natural systems, it remains unclear whether such mutations are common or rare, and what magnitude of allosteric effect we might typically expect for single mutations. Additionally, it is not obvious if such mutations are structurally distributed or localized (for example, to the insertion site). Answers to these questions would inform practical strategies for optimizing engineered systems and provide insight into the evolution of natural multi-domain regulation in proteins.
To address these questions, we performed a deep mutational scan of a synthetic allosteric switch: a fusion between the E. coli metabolic enzyme Dihydrofolate Reductase (DHFR) and the blue-light sensing LOV2 domain from A. sativa (Lee et al., 2008;Reynolds et al., 2011). This modestly allosteric chimera shows a 30% increase in DHFR velocity in response to light. Focusing on mutations to the DHFR residues, we found that only a small fraction (4.4%) of the mutations that retained DHFR activity had a statistically significant impact on allostery. Individual mutations exhibited generally modest effect sizes; the most allosteric single mutant characterized (H124Q) yielded a twofold increase in velocity in response to light relative to the starting construct. Structurally, allostery eLife digest Many proteins exhibit a property called 'allostery'. In allostery, an input signal at a specific site of a protein -such as a molecule binding, or the protein absorbing a photon of lightleads to a change in output at another site far away. For example, the protein might catalyze a chemical reaction faster or bind to another molecule more tightly in the presence of the input signal. This protein 'remote control' allows cells to sense and respond to changes in their environment. An ability to rapidly engineer new allosteric mechanisms into proteins is much sought after because this would provide an approach for building biosensors and other useful tools. One common approach to engineering new allosteric regulation is to combine a 'sensor' or input region from one protein with an 'output' region or domain from another.
When researchers engineer allostery using this approach of combining input and output domains from different proteins, the difference in the output when the input is 'on' versus 'off' is often small, a situation called 'modest allostery '. McCormick et al. wanted to know how to optimize this domain combination approach to increase the difference in output between the 'on' and 'off' states.
More specifically, McCormick et al. wanted to find out whether swapping out or mutating specific amino acids (each of the individual building blocks that make up a protein) enhances or disrupts allostery. They also wanted to know if there are many possible mutations that change the effectiveness of allostery, or if this property is controlled by just a few amino acids. Finally, McCormick et al. questioned where in a protein most of these allostery-tuning mutations were located.
To answer these questions, McCormick et al. engineered a new allosteric protein by inserting a light-sensing domain (input) into a protein involved in metabolism (a metabolic enzyme that produces a biomolecule called a tetrahydrofolate) to yield a light-controlled enzyme. Next, they introduced mutations into both the 'input' and 'output' domains to see where they had a greater effect on allostery.
After filtering out mutations that destroyed the function of the output domain, McCormick et al. found that only about 5% of mutations to the 'output' domain altered the allosteric response of their engineered enzyme. In fact, most mutations that disrupted allostery were found near the site where the 'input' domain was inserted, while mutations that enhanced allostery were sprinkled throughout the enzyme, often on its protein surface. This was surprising in light of the commonly-held assumption that mutations on protein surfaces have little impact on the activity of the 'output' domain. Overall, the effect of individual mutations on allostery was small, but McCormick et al. found that these mutations can sometimes be combined to yield larger effects.
McCormick et al.'s results suggest a new approach for optimizing engineered allosteric proteins: by introducing mutations on the protein surface. It also opens up new questions: mechanically, how do surface sites affect allostery? In the future, it will be important to characterize how combinations of mutations can optimize allosteric regulation, and to determine what evolutionary trajectories to high performance allosteric 'switches' look like.
disrupting mutations tended to cluster near the LOV2 insertion site and were modestly enriched at both conserved and co-evolving amino acid positions. In contrast, allostery enhancing mutations were distributed across the protein, and strongly associated with the protein surface. We observed that combining a few of these mutations yielded near-additive enhancements to allosteric dynamic range. Collectively, our data elucidates practical strategies for optimizing engineered systems, and shows that weakly conserved, structurally distributed surface sites can contribute to allosteric tuning.

Characterization of an unoptimized allosteric fusion of DHFR-LOV2
To begin our study of allostery tuning mutations, we selected a previously characterized synthetic allosteric fusion between DHFR and LOV2 generated in earlier work (Lee et al., 2008;Reynolds et al., 2011). In this fusion, the LOV2 domain of A. sativa is inserted between residues 120 and 121 of the E. coli DHFR bF-bG loop; we refer to this construct as DL121 ( Figure 1A,B). The choice of LOV2 insertion site was guided by Statistical Coupling Analysis (SCA), an approach for analyzing coevolution between pairs of amino acids across a homologous protein family (Rivoire et al., 2016;Lockless and Ranganathan, 1999;Halabi et al., 2009). A central finding of SCA is that coevolving groups of amino acids, termed sectors, often form physically contiguous networks in the tertiary structure that link allosteric sites to active sites (Halabi et al., 2009;Süel et al., 2003;Pincus et al., 2018). To create the DL121 fusion, Lee et al. followed the guiding principle that sector connected surface sites in DHFR might serve as preferred sites (or 'hot spots') for the introduction of allosteric regulation (Lee et al., 2008). The resulting DL121 fusion covalently attaches the N-and C-termini of LOV2 into a sector connected surface on DHFR, and displays a twofold increase in DHFR hydride transfer rate (k hyd ) in response to blue light (Lee et al., 2008). Under steady-state conditions, we measured a 28% increase in the turnover number (k cat ) in response to light and a statistically insignificant change in the Michaelis constant (K m ) ( Figure 1C). Thus, the DL121 fusion is modestly allosteric in vitro. As DHFR has no known natural allosteric regulation, the LOV2 insertion confers a new, evolutionarily unoptimized regulatory input.
But can this relatively small allosteric effect generate measurable physiological differences that could provide the basis for evolutionary selection? DHFR catalyzes the reduction of 7,8-dihydrofolate (DHF) to 5,6,7,8-tetrahydrofolate (THF) using NADPH as a co-factor. THF then serves as a one-carbon donor and acceptor in the synthesis of thymidine, purine nucleotides, serine, glycine, and methionine. Because of these critical metabolic functions, DHFR activity is strongly linked to growth rate, and under appropriate conditions, E. coli growth rate can be used as a proxy for DHFR activity (Reynolds et al., 2011;Thompson et al., 2020). Prior work found that the modest in vitro allosteric effect of DL121 conferred a selectable growth rate advantage in vivo: when an E. coli DHFR deletion strain (ER2566 DfolADthyA) was complemented with DL121, the resulting strain grew 17% faster in the light than in the dark (Reynolds et al., 2011). Thus, DL121 is a system where: (1) allosteric control is rapidly and reversibly applied, (2) the allosteric effects on activity can be readily quantified both in vitro and in vivo, and (3) there remains potential for large improvements in regulatory dynamic range through mutation.
A high-throughput assay to resolve small changes in DHFR catalytic activity Our goal was to measure the effect of every single amino acid mutation in DHFR on the allosteric regulation of DL121. To do this, we aimed to follow a strategy loosely akin to a double mutant cycle ( Figure 1D). The starting DL121 construct shows so-called V-type allostery, in which the effector (light) regulates the catalytic turnover number (k cat ) (Carlson and Fenton, 2016). Thus, allostery can be quantified as the ratio of k cat between lit and dark states. More generally, allostery might be considered as a ratio of velocities (v = k cat [S]/(K m + [S])) between the lit and dark states, as the allosteric effector could regulate turnover, substrate affinity, or both. In either case, we defined the allosteric effect of mutation as the fold change in allosteric regulation upon mutation ( Figure 1D, blue box). We sought to infer this quantity for every mutation in a saturation mutagenesis library of DHFR by using growth rate as a proxy for catalytic activity.
As in prior work, we measured the growth rate of many E. coli strains in parallel by using next generation sequencing (NGS) to monitor the frequency of individual DHFR mutants over time in a mixed culture (Figure 2; Reynolds et al., 2011;Thompson et al., 2020). Allele frequencies f a ð Þ at each time point (t) were normalized as follows: where N a and N WT are the number of mutant and wildtype (WT) counts at a given time point. By performing a linear fit of the log normalized allele frequencies vs. time we calculated a slope corresponding to relative growth rate: this value is the difference in growth rate for the mutant relative to a reference ('WT') construct.
As individual mutations tend to exhibit modest effects on allosteric regulation, we optimized the linear regime and resolution of the growth rate assay in two ways (Reynolds et al., 2011). First, we grew the E. coli populations in a turbidostat outfitted with blue LEDs to activate LOV2 ( Figure 2A).    (Sawaya and Kraut, 1997;Halavaty and Moffat, 2007). DHFR is in gray cartoon, NADP co-factor in green sticks, and folate substrate in yellow sticks. In LOV2 signaling, blue light triggers the formation of a covalent adduct between a cysteine residue (C450) and a flavin mononucleotide (FMN, yellow sticks) (Salomon et al., 2001;Crosson and Moffat, 2002;Swartz et al., 2002) and associated unfolding of the C-terminal Ja-helix (red cartoon); this order-to-disorder transition is used for regulation in several synthetic and natural systems (Pudasaini et al., 2015;Glantz et al., 2016). (B) DHFR loop conformational changes near the LOV2 insertion site. While the mechanism of DHFR regulation by LOV2 is currently unknown, inspecting the native DHFR structure provides some insight. The substrate-bound Michaelis complex of native DHFR is in the 'closed' conformation (gray cartoon), while the product ternary complex is in the 'occluded' state (purple cartoon). The bF-bG loop, where LOV2 is inserted, is highlighted in cyan. In native DHFR, hydrogen bonds between this loop (Asp122) and the Met20 loop (Gly15, Glu17) are thought to stabilize the closed conformation (Sawaya and Kraut, 1997;Schnell et al., 2004). Mutations to positions 121 and 122 reduce activity and cause the enzyme to prefer the occluded conformation (Cameron and Benkovic, 1997;Mhashal et al., 2018;Miller and Benkovic, 1998 Figure 2. A high-throughput, high-resolution assay for DHFR activity. (A) The turbidostat. The instrument has 15 individual growth chambers (vials), positioned on a stir plate inside an incubator. Illumination was provided by blue LEDs in each vial holder. (B) Log-normalized relative allele frequency over time for 11 DHFR point mutations of known catalytic activity and the DL121 fusion. Allele frequency (colored circles) was determined by next-generation sequencing of mixed-population culture samples at each time point. All frequencies were normalized to t = 0 and WT DHFR (no LOV2 insertion). Error bars reflect standard error across four measurements, they are sometimes Figure 2 continued on next page The turbidostat maintains each culture in exponential growth by dynamically sensing optical density and adjusting media dilution rate accordingly Toprak et al., 2013; this ensures near-constant media conditions and eliminates the need for manual serial dilutions. Second, we selected media conditions -M9 minimal media with 0.4% glucose and 1 mg/ml thymidine supplementation -in which growth rate can resolve subtle differences in catalytic activity near the DL121 fusion. We evaluated the resolution of our assay using a 'standard curve' of 11 point mutations of known catalytic activity in nonchimeric DHFR ( Figure 2B). Under these conditions, we observed a log-linear relationship between relative growth rate and DHFR velocity over nearly four orders of magnitude; this relationship saturates (plateaus) for the most active mutants (WT and M42F, Figure 2C). Importantly, the relative growth rate and velocity of DL121 were near the center of the linear regime of our assay. In using velocity to describe our data, we have incorporated two assumptions: (1) we presume minimal variation in protein abundance between mutants (enzyme concentration is equal to one) and (2) we fix the substrate concentration at 25 mM, which was previously reported as the endogenous concentration for WT E. coli (Kwon et al., 2008). Individual mutations may cause variation in protein abundance, but because allostery concerns a relative change in activity, light-independent differences in abundance can be removed by appropriate normalization (as discussed further below).
As previously observed, the exponential divergence of mutants with different growth rates in a population makes it possible to detect even small biochemical effects (Breslow et al., 2008). More specifically, we can discriminate a change of ±0.02 mM À1 s -1 in catalytic power (k cat /K m ) under these conditions. This level of precision is on par with -and in some cases better than -literature-reported errors for in vitro steady state kinetics measurements of DHFR (Reynolds et al., 2011;Wagner et al., 1992;Huang et al., 1994). Consequently, we can resolve small catalytic and allosteric effects of mutations on DL121 through this high-throughput growth-based assay.

Deleterious mutations are enriched at conserved, coevolving positions in DHFR
In order to map the coupling of individual DHFR positions to light, we constructed a deep mutational scanning library over all DHFR positions in the DL121 fusion (Figure 3-figure supplements 1-2). Then, we measured the growth rate effect of each mutation in triplicate under both lit and dark conditions using the above-described assay ( Figure 3A-C, Figure 3-figure supplements 3-4, Figure 3-source data 1). In this experiment, all growth rates were calculated relative to the unmutated DL121 fusion, which itself exhibits reduced activity (and growth rate) compared to WT DHFR. Mutations fell into four broad categories in terms of growth rate effects: neutral, uniformly deleterious ( Figure 3A), uniformly beneficial ( Figure 3B), or light dependent (and thus allosteric, Figure 3C). We were unable to measure growth rate for 891 of the 3021 possible missense mutations (19 substitutions over 159 positions): 226 (7.5%) were missing at the start of the experiment (t = 0) for one or more replicates (referred to as 'no data'), and an additional 665 (22%) were depleted from the library before reaching the minimum of three time points required for growth rate estimation (we refer to these as null mutants, see also Materials and methods, Figure 3-figure supplement 4). We interpreted these 665 rapidly depleting null mutants as highly deleterious to growth rate and thus DHFR activity. The relative growth rates for the remaining 2130 mutations (70.5%) were highly reproducible, with a correlation coefficient between replicate pairs above 0.9 ( Figure 3-figure supplement 3).
Before examining the allosteric effects of mutations, we first considered the effects of mutations on growth rate (and thus DHFR activity) in a single growth condition (dark). Prior work has found that deleterious mutations are enriched at evolutionarily conserved positions and within the protein sector (McLaughlin et al., 2012). The DHFR sector was defined by analyzing coevolution in a multiple sequence alignment of native DHFR domains, so we wished to examine if sector positions were indeed critical to function in the chimeric DL121 fusion. Good correspondence between the DHFR D27N was deleterious in both lit and dark conditions. (B) DL121 D122W was advantageous under both lit and dark conditions. (C) DL121 E154R was deleterious in the dark, and near neutral in the light. Solid lines were obtained by linear regression; the slope of these provides the difference in growth rate relative to the unmutated DL121 construct. Relative growth rates were measured in triplicate for each mutant under lit (blue) and dark (gray) conditions. (D) Distribution of relative growth rates under dark conditions. The distribution for all mutations with measurable growth rate effects is in gray ('null data' and 'no data' excluded); the distribution for sector mutations is in navy. The relative growth rate of DL121 D27N, a mutation that severely disrupts catalytic activity, is indicated with a cyan dashed line. (E) The fraction of DL121 mutations with measurable growth rates that can be categorized as: DHFR surface, core, sector, and evolutionarily conserved (see Materials and methods for definitions). The fraction is shown for both the complete library (gray bars), and the library after removing mutations with low growth (growth rate <= DL121 D27N). The absolute number of mutations is shown above each bar. A contingency table summarizes the overlap between mutations in the sector (at a p-value cutoff of 0.010), and the mutations that yield low growth (growth rate <= DL121 D27N). (F) Structural distribution of positions enriched for mutations with growth rates as low as or lower Figure 3 continued on next page sector, evolutionary conservation, and deleterious mutations in DL121 would provide confidence that the core functional elements of native DHFR remain intact in the chimera. The vast majority of mutations were at least modestly deleterious to growth, with a median relative growth rate of À0.084 in the dark and À0.083 in the light ( Figure 3D). A cluster of beneficial mutations was observed just before the LOV2 insertion site at position 121 in both conditions, suggesting some potential to compensate for the inserted LOV2 ( Figure 3-figure supplement 4). The overall distribution of fitness effects shows some differences relative to prior DMS studies of natural proteins including native E. coli DHFR (Thompson et al., 2020;Garst et al., 2017). First, the distribution of fitness effects for mutations in natural proteins is often centered around neutral, implying a certain degree of mutational robustness (McLaughlin et al., 2012;Stiffler et al., 2015). Secondly, DMS of native DHFR -under experimental conditions designed to resolve mutational effects near WTrevealed many beneficial (activating) mutations (Thompson et al., 2020). There are two explanations for the relative paucity of beneficial and neutral mutations in the present dataset. First, the DL121 fusion is comparably less robust because the unoptimized LOV2 insertion introduces an initial compromise to DHFR function. Secondly, the conditions of our assay (both expression and media) differ from prior work (Thompson et al., 2020) and were selected to resolve mutational effects near DL121; consequently, mutations with native-like (or better) activity are in the saturating, non-linear regime of our assay.
To identify the slowest growing -and presumably near, or entirely, inactivating -mutations, we applied an empirical growth rate cutoff of À0.13 to the lit and dark growth rates. This corresponds to the growth rate for DL121 D27N; D27N is an active site mutation that strongly reduces the activity of WT DHFR ( Figure 2B,C). The DL121 D27N mutant grows very slowly in the conditions of our assay and is inviable in the absence of thymidine supplementation (Figure 3-figure supplement 5). We found that mutations with growth rates at or below this cutoff (including the null mutants) were significantly enriched in both the sector (p=7.9Â10 À8 , Figure 3E, Supplementary file 1b) and at evolutionarily conserved positions (p=8.7Â10 À20 , Figure 3-figure supplement 6, Supplementary file 1c). When mapped to the WT DHFR structure, positions enriched for deleterious mutations surround the active site and co-factor binding pocket ( Figure 3F), structurally overlap with the sector ( Figure 3G), and include a number of positions known to play a critical role in WT DHFR catalysis (e.g. W22, D27, M42, and L54) (Howell et al., 1986;Fierke et al., 1987). These data are consistent with the view that sector positions continue to play a key role in conferring DHFR catalytic activity in the DL121 fusion.
Following the thinking that (near) inactive DHFR variants are both inherently non-allosteric and associated with the least reproducible growth rate measurements ( Figure 3-figure supplement 3), we removed the set of 1247 slow-growing (growth rate <À0.13) and null mutations prior to the analysis of allostery. The retained 1548 mutations -representing 51% of the growth assay data -remain well-distributed between the DL121 surface, core, sector, and evolutionarily conserved positions ( Figure 3E). These present a high-confidence and representative subset of the data for evaluating mutational effects on DL121 allosteric regulation.

Allostery tuning mutations are sparse
To compute the allosteric effect of mutation, we considered the triplicate measurements of lit and dark relative growth rate for each mutant ( Figure 3A-C). Given the log-linear relationship between Figure 3 continued than DL121 D27N (red spheres). The DHFR backbone is in gray cartoon, the folate substrate in yellow sticks, and the NADP co-factor in green sticks. (G) Relationship of the sector (navy blue surface) to positions enriched for growth-rate disrupting mutations (red spheres, same as in F). The online version of this article includes the following source data and figure supplement(s) for figure 3: Source data 1. Relative growth rates under lit and dark conditions for DL121 point mutations as determined by next-generation sequencing.      growth rate and DHFR velocity ( Figure 2C), subtracting growth rates approximates log-ratios of velocities. Thus, we estimated the allosteric effect of mutation by taking the difference in the average relative growth rates between lit and dark conditions: In the above equations, rgr is relative growth rate (which is directly measured in our sequencingbased assay) and gr refers to absolute growth rate. Accordingly, positive values indicate allostery enhancing mutations and negative values indicate allostery disrupting mutations ( Figures 1D and  4A). Of the 1548 mutations evaluated, the allosteric effect is normally distributed with a mean near zero (m = 0.0017, Figure 4-figure supplement 1). To assess the statistical significance of allosteric effects, we computed a p-value for each mutation by unequal variance t-test under the null hypothesis that the lit and dark replicate measurements have equal means. These p-values were compared to a multiple-hypothesis testing adjusted p-value of p=0.016 determined by Sequential Goodness of Fit (SGoF, Figure 4B; Carvajal-Rodriguez and de Uña-Alvarez, 2011). Under these criteria, only 69 mutations (4.5% of all viable mutants) significantly influenced allostery: 56 mutations enhanced allostery while 13 disrupted allostery. We did not observe a strong association between the magnitude of growth rate effect and the allosteric effect size. Allostery-influencing mutations spanned a wide range of growth rates and exhibited comparatively modest effects on light regulation ( Figure 4C).
To further examine the ability of the growth-based sequencing assay to quantitatively resolve mutation-associated changes in allosteric regulation, we selected 10 mutations spanning a range of allosteric and growth rate effects for in vitro characterization ( Figure 4B red dots, Figure 4-figure supplements 2-4). As a control, we included the light insensitive variant DL121-C450S: the C450S mutation of LOV2 abrogates light-based signaling by blocking formation of a light-induced covalent bond between position 450 and the FMN chromophore (Christie et al., 2002). We expressed and purified the selected DL121 mutants to near homogeneity; S148C and E154R did not yield sufficient quantities of active protein for in vitro studies. We find it noteworthy that E154R-one of the strongest allostery-enhancing mutations in vivo-was unstable in multiple purification strategies. For the remaining eight mutations we measured the k cat and K m of DHFR under lit and dark conditions (Figure 4-figure supplement 2). To confirm function of the fused LOV2 domain, we also measured relaxation of the FMN chromophore following light stimulation and collected absorbance spectra before and after the application of light (Figure 4-figure supplements 3-4). As expected, all the characterized DL121 mutations (with the exception of DL121-C450S) retained LOV2 domains with light-responsive absorbance spectra and chromophore relaxation constants similar to the unmutated DL121 construct. Evaluating the light dependence of DHFR activity, the change in K m value between lit and dark conditions was neither significant for any point mutation nor correlated to allosteric effect size (R 2 = 0.003) (Supplementary file 1a, Figure 4-figure supplements 5-6). The K m values for all characterized mutants (0.15-1.9 mM) were similar to that of unmutated DL121 (~1 mM). Instead, we observed that light predominantly modulated catalytic turnover (k cat ).The ratio of k cat in the light relative to the dark ranged from 1.1 (for the non-allosteric DL121-C450S construct) to 2.0 (for the most allosteric point mutation, H124Q) (Supplementary file 1a, Figure 4-figure supplements 5-6). For reference, the starting DL121 construct has a lit:dark k cat ratio of 1.3. So why might the characterized allosteric mutations predominantly effect k cat ? One plausible explanation is that the conditions of our in vivo experiments fall within a pseudo-zero-order kinetics regime ([DHF] >>K m ). In this scenario, light-associated changes in K m would have little impact on enzyme velocity (and accordingly growth rate) and go undetected in our assay. Consistent with this, the in vivo concentration of DHF for wildtype E. coli (25 mM) is well above the K m for all the characterized DL121 mutations. Alternatively, it could be that the biophysical mechanism of the DL121 fusion somehow makes it more energetically feasible for light to modulate k cat than K m . In any case, the 1.3-to 2-fold changes in k cat translate to similar fold changes in enzyme velocity. A comparison of the in vitro allosteric effect on velocity to the in vivo growth rate effect yields a near-linear relationship with a correlation coefficient of 0.83 ( Figure 4D). Taken together, these data show that our growth-based assay is quantitatively reporting on changes in allostery, and that the allosteric mutations identified here modulate DHFR activity through changes in catalytic turnover number.
The structural pattern of allostery tuning mutations Next, we examined the distribution of allostery-tuning mutations on the WT DHFR tertiary structure. The 13 allostery disrupting mutations localized to six DHFR positions concentrated near the LOV2 insertion site ( Figure 5A). More specifically, 90% of the allostery disrupting mutations occurred A7W 50  1  20  10  40  80  100  70  60  90  110  140  130  120 150  Figure 4. The effect of DL121 DHFR mutations on allostery. (A) Heatmap of mutational effects on allostery. Blue indicates allostery disrupting mutations, and red indicates allostery enhancing mutations. White squares with black outlines mark the WT residue at each position. Mutations missing from the library ('no data') are colored gray, and mutations that did not have sufficient sequencing counts for at least three time points ('null data') are colored navy. The LOV2 domain insertion site is indicated with a red star. (B) Volcano plot indicating the statistical significance of the light-dark growth rate difference (y-axis) as a function of relative growth rate difference (x-axis). p-Values were computed using a t-test across triplicate light and dark measurements. Individual points correspond to mutations; mutations on the left (yellow) side of the graph are allostery disrupting, while mutations on the right (blue) are allostery enhancing. Two cutoffs for statistical significance are indicated with dashed gray lines -both a standard value of p=0.05, Figure 4 continued on next page within 10 Å of the DHFR 121 ca atom ( Figure 5B). These mutations were modestly enriched in the protein sector (Supplementary file 1d). Overall, the observed spatial distribution suggests these mutations may disrupt allostery by altering local structural contacts needed to ensure communication between DHFR and LOV2. In contrast to this localized pattern, the 56 allostery enhancing mutations were observed at 25 positions distributed across the DHFR structure ( Figure 5C) and enriched on the protein surface ( Figure 5D, Supplementary file 1e). These enhancing mutations were never found in the protein sector and were thus statistically significantly depleted from the protein sector ( Figure 5E,F). This relationship -wherein allostery disrupting mutations were modestly enriched and allostery enhancing mutations were strongly depleted from the sector -also holds when defining the set of allosteric mutations at a relaxed cutoff of p=0.05 (Supplementary file 1d). Given the prior finding that sector connected surface sites were hotspots for introducing allostery in DHFR (Reynolds et al., 2011), we also examined the association between allostery-influencing mutations and two other groups of DHFR positions: (1) surface sites that are either within or contacting the sector and (2) surface sites that are only contacting the sector (but not within-sector). As for the analysis of sector positions only, we observed a statistically significant depletion of allostery enhancing mutations and enrichment of allostery disrupting mutations when considering the set of surface sites within or contacting the sector. This finding holds true over a range of significance thresholds for defining sector and allosteric mutations (Supplementary file 1f). When considering the set of positions that contact (but are not within) the sector, we did not observe a statistically significant association at nearly all cutoffs (Supplementary file 1g). Indeed, a number of allostery enhancing mutations do not contact the sector at all and occur in surface exposed loops (e.g. from residues 84 to 89, and from 116 to 119). So, counter to our expectations, the optimization of allostery did not occur at sector connected sites or even proximal to the LOV2 insertion site. Instead, structurally distributed and weakly conserved surface sites provided a basis for tuning and enhancing allosteric regulation regardless of sector connectivity.
Taken together, our data show that many distributed surface sites can make modest contributions to allosteric regulation. Can these mutants be combined to further improve allosteric dynamic range? To test this, we created two mutant constructs by combining the most potent allostery enhancing mutations as characterized in vitro: the double mutant DL121-M16A,H124Q, and the triple mutant DL121-M16A,D87A,H124Q ( Figure 6A). For both constructs, we measured steady-state catalytic parameters (Supplementary file 1a) and verified LOV2 function through absorbance spectra and chromophore relaxation kinetics experiments (Figure 6-figure supplement 1). Interestingly, all three mutations exhibited near-log-additive improvements in allostery ( Figure 6B). The DL121-M16A,H124Q fusion exhibits a 2.74 fold increase in velocity upon light activation while the and an adjusted p-value of 0.016, obtained by using Sequential Goodness of Fit (SGoF) to account for multiple hypothesis testing. Mutations selected for further in vitro experimental characterization are colored red and labeled. S148C and E154R did not yield sufficient quantities of active protein for further in vitro characterization. (C) Triplicate relative growth rate measurements under lit (blue) and dark (gray) conditions for all mutations with statistically significant allostery at the adjusted p-value (p<=0.016). The mutations are sorted by dark growth rate; mutations selected for in vitro characterization are marked with red asterisks. (D) Relationship between the allosteric effect as measured in vivo and in vitro. As we expect a log-linear relationship, we compare the ratio of velocity at 25 mM DHF (along x) to the exponent of the relative growth rate difference (along y). The relative growth rate difference under lit and dark conditions is the mean of triplicate measurements, error bars indicate SEM. All mutant effects on growth rate were measured in the same experiment (corresponding to a subset of the data in panel B) with the exception of DL121 C450S. The relative growth rate for this light-insensitive LOV2 mutant was measured in the 'calibration curve' experiment shown in Figure 2 (see also Materials and methods). The ratio between velocity in the light and velocity in the dark reflects the mean of triplicate measurements; error bars indicate SEM. The green line was fit by linear regression. The online version of this article includes the following figure supplement(s) for figure 4:      . The serial addition of allostery enhancing mutations also reduced the overall catalytic activity of DHFR, suggesting that further improvement could be obtained by combining these mutations with a non-allosteric but activity-enhancing mutation. Overall, these data suggest that a naïve sector connected fusion can be gradually evolved toward increased allosteric dynamic range through the stepwise accumulation of single mutations at structurally distributed surface sites ( Figure 6C).

Discussion
We used deep mutational scanning to study the frequency and structural pattern of allostery tuning mutations in a synthetic allosteric system, with the goal of understanding how regulation between domains can be optimized. Overall, allostery-influencing mutations were rare -just under 5% of viable mutations had statistically distinguishable effects on the lit and dark states of the DL121 fusion. We found that mutations at conserved and co-evolving (sector) positions were often deleterious to DHFR function and infrequently influenced allosteric regulation. In a few cases, sector mutations served to disrupt allostery; nearly all allostery disrupting mutations were localized to the LOV2 insertion site on DHFR. Counter to our expectations, allostery enhancing mutations were distributed across the DHFR structure, depleted from the sector, and enriched on the protein surface. When considered individually, the allostery-enhancing mutations had modest effects (up to twofold) on regulation, but (at least in some cases) they can be combined to yield near-additive improvements in  dynamic range. A triple mutant (DL121-M16A,D87A,H124Q) rationally designed using our point mutant data produces a 3.87-fold increase in velocity upon light stimulation, up from the 1.3-fold allosteric effect of our starting construct. These results should be considered in the context of our experiment: the DL121 fusion begins with sharply reduced DHFR activity, and our experiment intentionally used relatively stringent DHFR selection conditions to better resolve small differences in kinetic parameters. Thus, it is unsurprising that a large fraction of DHFR mutations in our library were deleterious, with an appreciable fraction near-inactive. This result echoes prior studies showing that the fraction of deleterious mutations (and mutational robustness) is strongly modulated by a variety of factors, including purifying selection strength and expression level (Stiffler et al., 2015;Jiang et al., 2013;Lundin et al., 2018). Given the finding that stabilizing mutations can often improve protein evolvability (Lundin et al., 2018;Bloom et al., 2006;Zheng et al., 2020), it would be interesting to examine how the distribution of mutational effects on both DL121 function and allostery would change in the background of a stability (and/or activity) enhancing mutation to DL121. While we observed that the number of allosteric mutations is few and the effect sizes are generally small in our model system, a previous study of allostery tuning mutations in pyruvate kinase indicated that up to 30% of mutations can tune allostery, with the maximum observed effect size approaching 22-fold (Tang and Fenton, 2017). Nevertheless, our data serve to illuminate the pattern of mutational effects on a newly established (and unoptimized) domain fusion -the presumptive first step toward regulation in a number of both natural and synthetic systems.
Interestingly, we observe a seeming disparity between the sites where we were able to introduce new allosteric regulation by domain fusion (in our earlier work), and the sites where allosteric tuning takes place (in this work). Previously, Reynolds et al. found that sector connected surface sites served as hotspots for the introduction of new light-based regulation in DHFR (Reynolds et al., 2011). Indeed, allosteric regulation was never obtained when the LOV2 domain was inserted at a non-sector connected site. In contrast, in this work, we observed that allostery enhancing mutations were depleted both within the sector and at sector connected sites. For example, we observed a number of allostery enhancing mutations at positions 83-89 of the DHFR aD-bE loop, while LOV2 insertions in this region location did not initiate allostery as quantified either in vitro or in vivo (Lee et al., 2008;Reynolds et al., 2011). This suggests different structural requirements for establishing and tuning allostery in this system (and possibly others): here allostery seems to be more easily introduced at evolutionarily conserved and co-evolving sites, but once established, can be optimized through less conserved sector-peripheral residues.
Although our work focuses on a synthetic allosteric fusion, our results are broadly consistent with an emerging body of work characterizing allostery-influencing mutations in natural proteins. Together, these data point to a model in which mutations at evolutionarily conserved positions exert large (and often disruptive) effects on function while allostery is tuned at less conserved surface sites. For example, Leander et al. recently used deep mutational scanning to map the pattern of compensatory mutations that rescued allosteric function for non-allosteric tetracycline repressor (TetR) variants (Leander et al., 2020). In that study a 'disrupt-and-restore' strategy was used: an alreadyallosteric system was inactivated and deep mutational scanning was then used to identify compensatory mutations. While there are significant differences between rescuing a deficient variant and the optimization of a novel allosteric construct, they likewise found that the mutations at highly conserved sites were often disruptive to stability and function, while allostery-rescuing mutations occurred at weakly conserved and structurally distributed sites (Leander et al., 2020). Similarly, mutations at 'rheostat' sites -weakly conserved positions distal to the site of regulation -were found to modulate allosteric control in human liver pyruvate kinase and the lactose repressor protein (lacI) (Campitelli et al., 2020;Wu et al., 2019). Intriguingly, the association of allostery enhancing mutations with the protein surface hints at a possible role for solvent -and more specifically the protein hydration layer -in tuning regulation.
The finding that the allostery initiated upon naïve fusion of the DHFR and LOV2 domains can be further enhanced by single mutations implies a path to improved allosteric dynamic range by stepwise mutagenesis and selection. Three of the most allostery enhancing mutations could be combined to yield a near-additive improvement in regulatory dynamic range. This has interesting implications for both evolved and engineered allosteric systems. In evolved systems, standing mutational variation is more likely at weakly conserved surface sites (particularly under less stringent selection conditions), and this could provide a means for generating variation in allosteric regulation upon a domain fusion event. Moreover, while engineering studies sometimes use mutations near the domain insertion site to optimize regulation, our results suggest that diffuse surface site mutations could present an effective alternative. Whether by engineering or evolution, it seems that mutations at weakly conserved and structurally distributed residues can provide a path to the optimization of regulation.  Reynolds et al., 2011and Thompson et al., 2020(Reynolds et al., 2011Thompson et al., 2020
PCR reactions were set up using 2x Q5 mastermix (NEB, cat#M0492), 10 ng of plasmid template, and 500 nM forward and reverse primers. PCR was performed in the following steps: (1) 98˚C for 30 s, (2) 98˚C for 10 s, (3) 55˚C for 30 s, (4) 72˚C for 2 min, (5) return to step 2 for 22 cycles, (6) 72˚C for 5 min. 25 mL of PCR reaction was mixed with 1 mL of DpnI (NEB, cat#R0176) at 37˚C for 4 hr. The samples were then purified by gel extraction and a DNA Clean and Concentrator À5 kit (Zymo Research,cat#D4014). PCR product solution were then phosphorylated with a second round of T4 PNK: 100 mL of gel-extracted PCR product,12 mL of 10x T4 ligase buffer, 5 mL of T4 PNK, 5 mL of sterile water and were incubated at 37˚C for 1 hr with 90˚C for 30 s. The reactions were ligated with 100 mL PNK phosphorylated PCR product, 15 mL T4 ligase (NEB, cat#M0202S), 30 mL T4 ligase buffer and, 155 mL sterile water. The reaction was incubated at room temperature for 24 hr.
Growth rate measurements in the turbidostat for DHFR DL121 mutant library DHFR DL121 sublibraries were transformed into ER2566 DfolA DthyA E. coli by electroporation using a MicroPulser Electroporator (Bio Rad) and gene pulser cuvettes (Bio Rad, cat#165-2089). Cultures were grown overnight at 37˚C in GM9 minimal media (93.0 mM Sodium (Na + ), 22.1 mM Potassium (K + ), 18.7 mM Ammonium (NH 4 ), 1.0 mM Calcium (Ca 2+ ), 0.1 mM Magnesium (Mg 2+ ), 29.2 mM Chloride (Cl -), 0.1 mM Sulfate (SO 4 2-), and 42.2 mM Phosphate (PO 4 3-), 0.4% glucose) pH 6.50, containing 50 mg/mL thymidine and 30 mg/mL chloramphenicol (Sigma, cat#C0378-5G) as well as folA mix which contains 38 mg/mL glycine (Sigma, cat#50046), 75.5 mg/mL L-methionine (Sigma, cat#M9625) 1 mg/mL calcium pantothenate (Sigma, cat#C8731), and 20 mg/mL adenosine (Sigma, cat#A9251). Four hours before the start of the experiment, the overnight culture was diluted to an optical density of 0.1 at 600 nm in GM9 minimal media containing 50 mg/mL thymidine and 30 mg/ mL chloramphenicol and incubated for four hours at 30˚C. The cultures were centrifuged at 2000 RCF for 10 min and resuspended in the experimental conditions of GM9 minimal media containing 1 mg/mL thymidine and 30 mg/mL chloramphenicol. This was repeated two more times. The cultures were then back-diluted to an OD600 of 0.1 in 16 mL/vial of media. The turbidostat described in Toprak et al., 2013 was used in continuous culture (turbidostat) mode with a clamp OD600 of 0.15 and a temperature of 30˚C. Each vial had a stir bar. Vials designated as 'lit' had one 5V blue LED active. The optical density was continuously monitored throughout the experiment. 1 mL samples were taken at the beginning of selection (0 hr) and at 4,8,12,16,20, and 24 hr into selection and were centrifuged at 21,130 RCF for 5 min at room temperature with the pellet being stored at À20˚C for sequencing sample preparation.

Next-generation sequencing Amplicon sample preparation
Cell pellets were lysed by the addition of 10 mL sterile water, mixed by pipetting, and incubated at 98˚C for 5 min. One mL of this was then combined with 5 mL Q5 buffer (NEB, cat#M0491S), 0.5 mL 10 mM DNTP (Thermo Scientific, cat#R0192), 2.5 mL of 10 mM forward and reverse primers specific to the sublibrary and containing the TruSeq adapter sequence (Appendix 1: SL1V2, SL2V2, SL3V2, SL4V2, DL121CLV3F, and DL_WTTS_R3), 0.25 mL of Q5 enzyme (NEB, cat#M0491S) and 13.25 mL of sterile water. These samples were then heated at 98˚C for 90 s and then cycled through 98˚C for 10 s 63-65˚C (sublibrary 1: 66˚C, sublibrary 2: 63˚C, sublibrary 3: 64˚C, and sublibrary 4: 65˚C) for 15 s and then 72˚C for 15 s, repeating 20 times with a final 72˚C heating for 120 s in a Veriti 96-well thermocycler (Applied Biosystems). These samples were then amplified using TruSeq PCR reactions with a unique combination of i5/i7 indexing primers for each timepoint. 1 mL of this PCR reaction was then combined with 5 mL Q5 buffer (NEB, cat#M0491S), 0.5 mL 10 mM DNTP (Thermo Scientific, cat#R0192), 2.5 mL of 10 mM forward and reverse primers, 0.25 mL of Q5 enzyme (NEB, cat#M0491S) and 13.25 mL of sterile water. These samples were then heated at 98˚C for 30 s and then cycled through 98˚C for 10 s 55˚C for 10 s and then 72˚C for 15 s, repeating 20 times with a final 72˚C heating for 60 s in a Veriti 96 well thermocycler (Applied Biosystems). Amplified DNA from i5/i7 PCR reaction was quantified using the picogreen assay (Thermo Scientific, cat#P7589) on a Victor X3 multimode plate reader (Perkin Elmer) and the samples were mixed in an equimolar ratio. The DNA was then purified by gel extraction and a DNA Clean and Concentrator À5 kit (Zymo Research, cat#D4014). DNA quality was determined by 260 nm/230 nm and 260 nm/280 nm ratios on a DS-11 +spectrophotometer (DeNovix) and concentration was determined using the Qubit 3 (Thermo Scientific). Pooled samples were sent to GeneWiz where they were analyzed by TapeStation

DHFR chimeric expression constructs
The E. coli DHFR LOV2 fusion was cloned as an NcoI/XhoI fragment into the expression vector pHIS8-3 (Lee et al., 2008;Reynolds et al., 2011). Point mutants were engineered into the DHFR gene using QuikChange II site-directed mutagenesis kits (Agilent cat#200523) using primers specified in Appendix 1. All DHFR/LOV2 fusions for purification were expressed under control of a T7 promoter, with an N-terminal 8X His-tag for nickel affinity purification. The existing thrombin cleavage site (LVPRGS) following the His-tag in pHIS8-3 was changed to a TEV cleavage site using restriction-free PCR to improve the specificity of tag removal (Bond and Naus, 2012). All constructs were verified by Sanger DNA sequencing.

Protein expression and purification
DHFR-LOV2 chimeric proteins were expressed in BL21(DE3) E. coli grown at 30˚C in Terrific Broth (12 g/L Tryptone, 24 g/L yeast extract, 4 mL/L glycerol, 17 mM KH 2 PO 4 , and 72 mM K 2 HPO 4 ). Protein expression was induced when the cells reached an absorbance at 600 nm of 0.7 with 0.25 mM IPTG, and cells were grown at 18˚C overnight. Cell pellets were lysed by sonication in binding buffer (500 mM NaCl, 10 mM imidazole, 50 mM Tris-HCL, pH 8.0) added at a volume of 5 ml/g cell pellet.

Steady state Michaelis Menten measurements
The protein was spun down at 21,130 RCF at 4˚C for 10 min and the supernatant was moved to a new tube with any pellet being discarded. The concentration of the protein was quantitated by A280 using a DS-11 +spectrophotometer (DeNovix) with an extinction coefficient of 44920 mM À1 cm À1 . The parameters k cat and K m under Michaelis-Menten conditions were determined by measuring the initial velocity for the depletion of NADPH as measured in absorbance at 340 nm, with an extinction coefficient of 13.2 mM À1 cm À1 . This is done in a range of substrate concentrations with a minimum of 8 data points around 4 K m , 2 K m , 1.5 K m , K m , 0.8 K m , 0.5 K m , 0.25 K m and 0. The initial velocities (slope of the first 15 s) were plotted vs. the concentration of Dihydrofolate and fit to a b-mercaptoethanol and quantitated by A282 with an extinction coefficient of 28 mM À1 cm À1 . Depletion of NADPH is observed in 1 mL cuvettes with a path length of 1 cm in a Lambda 650 UV/VIS spectrometer (Perkin Elmer) with attached water Peltier system set to 17˚C. Lit samples are illuminated for at least 2 min by full spectrum 125 watt 6400K compact fluorescent bulb (Hydrofarm Inc cat#FLC125D). Dark samples were also exposed to the light in the same way as the lit samples but were in opaque tubs. Velocity, V ¼ k cat P ½ S ½ KM þ S ½ , was calculated using the concentration of DHF found in wild-type E. coli (~25 mM Kwon et al., 2008).

Spectrophotometry of the LOV2 chromophore
The spectra of the LOV2 chromophore is determined with a Lambda 650 UV/VIS spectrometer (Perkin Elmer) at 350-550 nm using paired 100 mL Hellma ultra micro cuvettes (Sigma cat#Z600350-1EA) with a path length of 1 cm. Purified protein in was diluted (when possible) to 20 mM in MTEN buffer pH 7.00 with 0.35% b-mercaptoethanol The lit samples are illuminated for at least 2 min by full spectrum 125 watt 6400K compact fluorescent bulb (hydrofarm Inc). Relaxation of the lit state chromophore is observed in the Lambda 650 UV/VIS spectrometer (Perkin Elmer) at 447 nm (dark peak) using paired 100 mL Hellma ultra micro cuvettes (Sigma cat#Z600350-1EA) with a path length of 1 cm.

Quantification and statistical analysis Next-generation sequencing
The sequencing data analysis can be divided into two portions: (1) Read Joining, Filtering and Counting, followed by (2) Calculating Relative Fitness and Final Filtering. We describe each step below; all code was implemented in Bash shell scripting or Python 3.6.4. All analysis codes have been made available as a series of python 3 Jupyter Notebooks on github (https://github.com/reynoldsk/allostery-in-dhfr; McCormick et al., 2021; copy archived at swh:1:rev: dd8ee13f775f8b08548d64868f15e46583cbf543).

Read joining, filtering, and counting
The data analysis began with unjoined illumina fastq.gz files separated by index (generated by Gene-Wiz). The forward and reverse reads were combined using usearch v11.0.667 using the i86linux32 package. The commands given to usearch are contained in the script UCOMBINER.bsh.
Reads of each paired fastq file are identified and quality checked using the script DL121_fastq_analysis.py. Mutant nucleotide counts and number of wild-type reads are stored in a dictionary where the read count is separated by file name (vial and timepoint eg: T2V3) and sublibrary. If any nucleotide in the coding region is below a qscore cutoff of 30, that read is discarded. Counts of every nucleotide are saved in a text file by timepoint and vial.
Converting nucleotide variation to amino acid count as well as probabilistic sequencer error correction is done by the Hamming_analysis.ipynb script. Given the probabilistic nature of base calling on the Illumina platform, one can expect a number of reads that were errantly called. For each codon, the expected number of reads due to sequencing noise was calculated with the formula: The number of errant mutants (NErrant Mut t ) can be calculated from the number of observed wild type (N WT t ), the average Q score of the sequencing run Q, and the hamming distance (HDÞ or number of mutations away from. The number of errant mutants then subtracted from the actual mutant count. In addition to the number of observed wild type, this is calculated for every possible mutation observed, up to the 31 other nucleotide codons, (NNK codons are discarded due to the nature of library construction). Once the total number of errant reads are calculated and subtracted from the mutant and wild-type counts, they are then converted into the amino acid sequence and are saved into text files. These files are then used to load information for calculation of growth rate and allostery.
Calculating relative fitness and final filtering Growth_Rate_and_Allostery.ipynb was the python script used for this analysis. Relative frequency was calculated as follows: Variant frequencies (N Mut t ) were determined relative to WT (N Wt t ) and normalized to the initial frequency distribution at t=0. The relative growth rate then calculated by linear regression of these normalized frequencies. Light dependence was calculated as the difference between lit vs. dark growth rates. Variant frequency was only calculated if there were more than 50 mutant reads at time zero. Definitions for sector identity, conservation values, and surface identity used in SectorSurfaceDefinitions.ipynb are the same as those from Reynolds et al., 2011. Accessible surface area was calculated using MSMS, using a probe size of 1.4Å and excluding water as well as heteroatoms (Sanner et al., 1996). Values for total surface areas were taken from Chothia, 1976. Together these were used to calculate relative solvent accessible surface area, and 25% was used as a cutoff for 'surface'. A surface site is considered to contact the sector if the atoms comprising the peptide bond contact *any* sector atoms. Contact is defined as the sum of the atom's Pauling radii + 20%.
To determine significant allosteric mutations, a p-value for each mutation was computed by unequal variance t-test under the null hypothesis that the lit and dark replicate measurements have equal means. Two cutoffs were used, a standard cutoff of p=0.05, and a more stringent cutoff that is adjusted to consider multiple hypothesis testing. A multiple-hypothesis testing adjusted p-value of p=0.016 was determined by Sequential Goodness of Fit (Carvajal-Rodriguez and de Uña-Alvarez, 2011). General analysis and figures made from this data are performed in allostery_analysis.ipynb.  Figure 4 of the main text. (b) Fisher Exact Test p-values for the null hypothesis that the sector and inactivating mutants are independent properties. Inactivating mutations are defined as those that yield relative growth rates at or below the growth rate for DL121-D27N. Over a range of sector definitions, the null hypothesis is rejected at a confidence level of 0.05 or better, shown in red. Sector definitions were taken from Reynolds et al., 2011(Rivoire et al., 2016. (c) Fisher Exact Test p-values for the null hypothesis that conserved positions and inactivating mutants are independent. Calculations were made over a range of conservation definitions chosen to result in an equal number positions as the sector positions in Supplementary file 1b 23, 36, 40, and 49 positions respectively. In all cases, the null hypothesis is rejected at a confidence level of 0.05 or better (red), and inactivating mutations are enriched at conserved positions beyond expectation due to random chance. Conservation values are calculated as in Reynolds et al., 2011(Rivoire et al., 2016, and reflect the Kullback-Leibler relative entropy of amino acid frequencies at each DHFR position. (d) Fisher Exact Test p-values for the null hypothesis that the sector and allosteric mutations are independent. We compared over four sector cutoffs (as defined in Rivoire et al., 2016) and at two cutoffs for allostery significance (a standard p-value of 0.05, and an adjusted p-value of 0.016). The multiple hypothesis testing adjusted p-value was obtained by Sequential Goodness of Fit (SGoF, Carvajal-Rodriguez and de Uña-Alvarez, 2011). The top table shows the association between sector positions and allostery enhancing mutations; the bottom table computes the associate between sector positions and allostery disrupting mutations. In nearly all cases, the null hypothesis is rejected at a confidence level of 0.05 or better, shown in red. (e) Fisher Exact Test p-values for the null hypothesis that the solvent accessible DHFR surface and allosteric mutations are independent. At two cutoffs for allostery (a standard p-value of 0.05, and an adjusted p-value of 0.016), the null hypothesis is rejected at a confidence level of 0.05 or better, shown in red. (f) Statistical association of allosteric mutations and surface positions that are either within or contacting the sector. Contacting was defined as two atoms within the sum of their Pauling radii plus 20%. A surface site contacts the sector if the peptide bond atoms of the surface site contact any atoms in the sector position. P-values were computed by Fisher exact test with the null hypothesis that the sector and allosteric mutations are independent. Cutoffs for sector definition as defined in Rivoire et al., 2016 are shown as well as mutants determined to effect allostery either at a 95% confidence interval (p<0.05) or at the multiple hypothesis testing adjusted p-value (p<0.016). The null hypothesis that there is no relationship between allosteric mutations and sector or sector contacting positions on the surface of DHFR of the DL121 chimera is rejected at a confidence level of 0.05 or better over a range of cutoffs, shown in red. Allostery enhancing mutations are depleted from sector connected surface sites, while allostery disrupting mutations are enriched (in comparison with random expectation). (g) Statistical association of allosteric mutations and surface positions that are contacting the sector. In contrast to Supplementary file 1f, surface positions within the sector are excluded. Fisher Exact Test p-values were calculated for the null hypothesis that the sector and allosteric mutations are independent. Cutoffs for sector definition as defined in Rivoire et al., 2016 are shown as well as mutants determined to effect allostery either at a 95% confidence interval (p<0.05) or at the multiple hypothesis testing adjusted p-value (p<0.016). At most cutoff combinations, there is not a statistically significant association between sector connected surface sites and either mutations that enhance (top panel) or disrupt allostery (bottom panel).
. Transparent reporting form
The following dataset was generated: