Mutations that improve the efficiency of a weak-link enzyme are rare compared to adaptive mutations elsewhere in the genome

New enzymes often evolve by amplification and divergence of genes encoding enzymes with a weak ability to provide a new function. Experimental studies to date have followed the evolutionary trajectory of an amplified gene, but have not addressed other mutations in the genome when fitness is limited by an evolving gene. We have adapted Escherichia coli in which an enzyme’s weak secondary activity has been recruited to serve an essential function. While the gene encoding the “weak-link” enzyme amplified in all eight populations, mutations improving the new activity occurred in only one. This beneficial allele quickly swept the amplified array, displacing the parental allele. Most adaptive mutations, however, occurred elsewhere in the genome. We have identified the mechanisms by which three of the classes of mutations increase fitness. These mutations may be detrimental once a new enzyme has evolved, and require reversion or compensation, leading to permanent changes in the genome.

with two inefficient activities required for synthesis of histidine and tryptophan amplified and 54 diverged to alleles encoding two specialists within 2000 generations (Näsvall et al., 2012). 55 However, this study followed only mutations in the diverging gene. In general, when an 56 organism is exposed to a novel selection pressure that requires evolution of a new enzyme, any 57 mutation -either in the gene encoding the weak-link enzyme or elsewhere in the genome -that 58 improves fitness will provide a selective advantage. 59 We have generated a model system in E. coli for exploring the relative importance of 60 mutations in a gene encoding a weak-link enzyme and elsewhere in the genome. ProA (g-61 glutamyl phosphate reductase, Figure 2) is essential for proline synthesis in E. coli. ArgC (N-62 acetylglutamyl phosphate reductase) catalyzes a similar reaction in the arginine synthesis 63 pathway, although the two enzymes are not homologous (Goto et al., 2003;Ludovice et al., 64 1992; Page et al., 2003). ProA can reduce N-acetylglutamyl phosphate, but its activity is too 65 inefficient to support growth of a ∆argC strain of E. coli in glucose. However, a point mutation 66 that changes Glu383 to Ala allows slow growth of the ∆argC strain in glucose. Enzymatic assays 67 show that E383A ProA (ProA*) has severely reduced activity with g-glutamyl semialdehyde 68   Over the course of the experiment, growth rate increased 2.5-3.5-fold for all eight 119 populations ( Figure 3). This improvement corresponds to a change in doubling time from ~3.3 120 hours to ~1 hour. As expected, the rate at which growth rate increased slowed over the course of 121 adaptation. Occasional dips in growth rate occurred during the adaptation (Figure 3). These dips 122 are artifacts arising from temporary aberrations in selective conditions due to turbidostat 123 malfunctions that prevented introduction of fresh medium, causing the cultures to enter 124 stationary phase. Occasionally cultures were saved as frozen stocks until the turbidostat was 125 fixed (see Materials and Methods). Restarting cultures from frozen stocks may have caused a 126 temporary drop in growth rate. 127 We monitored proA* copy number during the adaptation experiment using qPCR of 130 population genomic DNA ( Figure 4A). proA* was present in at least 6 copies by generation 300 131 in all eight populations. Six of the populations maintained 6-9 copies for the remainder of the 132 adaptation. In contrast, proA* copy number in population 2 increased to as many as 20 copies. In 133 population 3, proA* copy number dropped to three by generation 400. proA* copy number to these differences in the size of the amplified region on the genome. The 143 population with the smallest amplified region (4.9 kb, population 2) carries fewer multicopy 144 genes and thus should incur a lower fitness cost, allowing proA* to reach a higher copy number 145 The decrease in proA* copy number in population 3 was particularly noteworthy since it 149 might have been an indication that a mutation had improved the neo-ArgC activity of ProA*, 150 resulting in a decreased need for multiple copies. In fact, a mutation in proA* that changes 151 Phe372 to Leu ( Figure 5A) was observed in population 3. E383A F372L ProA will be designated 152 ProA** hereafter. Introduction of this mutation into the parental strain (which carried proA*) 153 increased growth rate by 69% ( Figure 5B), confirming that the mutation is adaptive and not a 154 random hitchhiker. In contrast, no mutations in proA* were identified in any of the other 155 populations. 156 The neo-ArgC and native ProA activities of wild-type, ProA*, and ProA** were assayed 157 (in the reverse direction) with NAGSA and GSA, respectively ( Table 2). The kcat/KM,NAGSA for 158 ProA** is 3.6-fold higher than that of ProA* and nearly 80-fold higher than that for ProA. In 159 contrast, there is no difference between kcat/KM,GSA for ProA* and ProA**. 160 To determine when the mutation that changes Phe372 to Leu in ProA* occurred, we 161 performed sequencing of population genomic DNA at generations 270, 440, 630 and at the end 162 of the adaptation ( Figure 5C, 130x, 122x, 70x and 81x sequencing depth, respectively). proA** 163 was present in 9% of the sequencing reads of the population by generation 270. By the time 164 deamplification of proA* had occurred at generation 440, the frequency of proA** had risen to 165 21% of sequencing reads. By the end of the adaptation, proA** was fixed in the population, yet 3 166 copies remained in the genome, suggesting that ProA** does not have sufficient neo-ArgC 167 activity to be present at a single copy in the genome. 168 The fact that a mutation that improved the neo-ArgC activity of ProA* occurred in only 169 one population was surprising considering that ProA* is the weak-link enzyme limiting growth 170 rate. Because the growth rates of all 8 populations improved substantially (Figure 3)

Mutations outside of proA* improved fitness 176
Population genome sequencing at the end of the experiment and at several intermediate 177 timepoints from populations with longer adaptations ( Figure S1) revealed that the final 178 populations contained 13-178 mutations present at frequencies ≥5%, 3-5 mutations at 179 frequencies ≥30%, and 1-4 mutations (not including amplification of proA*) that were fixed 180 (100%). We found several mutations in the same genes in different populations, suggesting that 181 these mutations confer a fitness advantage (Supplementary File 1). 182 The first mutation to appear in all populations was either an 82 bp deletion in the rph 183 pseudogene directly upstream of pyrE or a C➝T mutation in the intergenic region between rph 184 and pyrE.  (Howard et al., 2011). We introduced this mutation into the genome of the parent AM187 and 193 compared the growth rates of the mutant and AM187 ( Figure S2). Surprisingly, we saw no 194 significant change in growth rate. Since this mutation appeared about the same time as the 195 mutations upstream of pyrE, we wondered whether the ygcB mutation might only improve 196 growth rate in the context of restored pyrE expression. Thus, we also tested the growth rate of a 197 strain with the Cas3 mutation and the 82 bp deletion upstream of pyrE. Again, we saw no 198 significant change in relative growth rate ( Figure S2). Thus, the ygcB mutation is most likely a 199 neutral hitchhiker. The most likely explanation for its prevalence is that it was present in a clade 200 of the parental population that later rose to a high frequency when an additional beneficial 201 mutation was acquired by one of its members. We reintroduced six of the mutations upstream of argB into the parental strain AM187. 208 The mutations increased growth rate by 36-61% ( Figure 6B  We also considered the possibility that mutations upstream of argB might increase 237 expression by increasing ribosome drafting. Figure S4 shows the predicted folding times of RNA 238 sequences encompassing 30 nucleotides downstream to 30 nucleotides upstream of the argB start 239 codon (totaling 63 nucleotides) for each mutant except the -94 A®G mutant. Folding is 240 predicted to be significantly slower for three of the seven mutant RNAs, which should increase 241 the efficiency of translation. For four of the mutants, however, folding rate is similar to or even 242 faster than that of the parental mRNA. For three of these mutants (the 58 bp deletion, -22 C®T, 243 and the 38 bp duplication), the secondary structure prediction shown in Figure S3 suggests that 244 the translation initiation region is less likely to be sequestered in a hairpin. Thus, the effects of 6 245 of the 8 mutations upstream of argB can be explained by a decrease in secondary structure 246 stability around the ribosome binding site or a decrease in the folding rate of the mRNA in this 247 region, or both. The effects of the -94 A➝G and -22 C➝A mutations, however, cannot be 248 explained by either of these mechanisms. 249 Figure 6. Several adaptive mutations occurred upstream of argB. (A) Locations of adaptive mutations (red) found upstream of argB (yellow) and argH (purple). argC was replaced with kan r in the parental strain AM187, giving this operon two promoters, one native to the operon (PargCBH), and the other introduced with the kan r gene (Pkanr). The table shows the percentages of each adapted population that contained a given argB mutation at the final time point. Six of the argB mutations were introduced into the genome of the ancestral strain and changes in growth rate (B), gene expression (C), and protein abundance (D) were determined.

Mutations in carB either increase activity or impact allosteric regulation 251
We found eight different mutations in carB in six of the evolved populations: four 252 missense mutations, three deletions (≥12 bp), and one 21 bp duplication ( Figure 7A). CarB, the 253 large subunit of carbamoyl phosphate synthetase (CPS), forms a complex with CarA to catalyze 254 production of carbamoyl phosphate from glutamine, bicarbonate, and two molecules of ATP (Eq. 255 [1]) (Rubino et al., 1986(Rubino et al., , 1987 Carbamoyl phosphate feeds into both the pyrimidine and arginine synthesis pathways and its 264 production is regulated in response to intermediates or products of both pathways ( Figure 7B), as 265 well as by IMP (a purine) (Pierrat & Raushel, 2002

288
We also measured the effect of mutations on UMP inhibition and ornithine activation of 289 CPS (Table 3, Figure 7D survival when an organism is exposed to a novel toxin or source of carbon or energy, or when 304 synthesis of a natural product can enable manipulation of competing organisms. This process 305 also contributes to non-orthologous gene replacement, which can occur when a gene is lost 306 during a time in which it is not required, but its function later becomes important again and is 307 replaced by recruitment of a non-orthologous promiscuous enzyme ( We have modeled a situation in which a new enzyme is required by deleting argC, which 310 is essential for synthesis of arginine in E. coli. Previous work showed that the most readily 311 available source of neo-ArgC activity that enables ∆argC E. coli to grow on glucose as a sole 312 carbon source is a promiscuous activity of ProA. However, a point mutation that changes Glu383 313 to Ala is required to elevate the promiscuous activity to a physiologically useful level. This 314  (Thoden et al., 1999). Beige, CarA; blue, CarB; yellow, allosteric domain of CarB; red, residues that are deleted or duplicated in the adapted strains; magenta, point mutations that occur in the adapted strains. IMP and ornithine bound to the allosteric domain are shown as spheres. One of the two bound ATP molecules can be seen as spheres in the center of CarB. (D-E) Influence of UMP or L-ornithine concentration on the glutamine-dependent ATPase activity of CarAB mutants. Reaction rates are shown relative to v0, the reaction rate in the absence of ligand. mutation substantially damages the native function of the enzyme, creating an inefficient 315 bifunctional enzyme whose poor catalytic abilities limit growth rate on glucose. It is important to 316 note that the decrease in the efficiency of the native reaction may be a critical factor in the 317 recruitment of ProA to serve as a neo-ArgC because it will diminish inhibition of the newly 318 important reaction by the native substrate (Khanal et al., 2015;McLoughlin & Copley, 2008). 319 When we previously carried out short-term adaptation of ∆argC proA* E. coli in glucose, The other four types of mutations are specific adaptations to the bottleneck in arginine synthesis 341 caused by substitution of the weak-link enzyme ProA* for ArgC. Interestingly, only two of these 342 -gene amplification and the mutation in proA* -directly involved the weak-link enzyme ProA*. 343 Surprisingly, we saw evolution of proA* towards a more efficient neo-ArgC in only one 344 population (Table 2). In this population, proA copy number dropped from ~7 to ~3 within 100 345 generations ( Figure 5C). This pattern is consistent with the IAD model; copy number is expected 346 to decrease as mutations increase the efficiency of the weak-link activity. However, the fact that 347 copy number did not return to one implies that the neo-ArgC activity of ProA** is not sufficient 348 to justify a single copy of the gene. The adaptive mutations in carB increase catalytic turnover, decrease inhibition by UMP, 407 or increase activation by ornithine of CPS. All of these effects should increase the level of CPS 408 activity in the cell and consequently the level of carbamoyl phosphate. Why would this be 409 advantageous? Carbamoyl phosphate reacts with ornithine, which will be in short supply due to 410 the upstream ProA* bottleneck, to generate citrulline (Legrain & Stalon, 1976) (Figure 9). If 411 ornithine transcarbamoylase is not saturated with respect to carbamoyl phosphate, then 412 increasing carbamoyl phosphate levels should increase citrulline production and thereby increase 413 flux into the lower part of the arginine synthesis pathway. Although we do not know whether 414 ornithine transcarbamoylase is saturated with respect to carbamoyl phosphate in vivo (the KM for 415 carbamoyl phosphate is 360 µM (Baur et al., 1990), but the intracellular concentration of 416 carbamoyl phosphate is unknown), the occurrence of so many mutations that increase carbamoyl 417 phosphate synthase activity supports the notion that they lead to an increase in carbamoyl 418 phosphate that potentiates flux through the arginine synthesis pathway. 419 The majority of adaptive mutations we observed in carB cause loss of the exquisite 420 allosteric regulation that controls flux through this important step in pyrimidine and arginine 421 synthesis. This tight regulation likely evolved due to the energetically costly reaction catalyzed 422 by CPS, which consumes two ATP molecules. While a constitutively active CPS is beneficial in 423 the short term to improve arginine synthesis, it will likely be detrimental once arginine 424 production no longer limits growth. We term mutations such as those observed in carB 425 "expedient" mutations because they provide a quick fix when cells are under strong selective 426 pressure, but at a cost to a previously well-evolved function. The damage caused by expedient 427 mutations may be repairable later by reversion, compensatory mutations or horizontal gene 428 transfer. Interestingly, the latter two repair processes may contribute to sequence divergence 429 between organisms that has typically been attributed to neutral drift, but rather may be due to 430 scars left from previous selective processes. 431 A particularly striking conclusion from this work is that most of the mutations that 432 improved fitness under these strong selective conditions did not impact the gene encoding the 433 weak-link enzyme, but rather adjusted fluxes in the metabolic network to compensate for the 434 bottleneck in metabolism by other mechanisms. Not surprisingly, the process of evolution of a 435 new enzyme by gene duplication and divergence does not take place in isolation, but is 436 inextricably intertwined with mutations in the rest of the genome. The ultimate winner in a 437 microbial population exposed to a novel selective pressure that requires evolution of a new 438 enzyme may be the clone that succeeds in evolving an efficient enzyme that no longer limits 439 fitness while accumulating the least damaging, or at least the most easily repaired, expedient 440 mutations. 441 442 Figure 9. Adaptive mutations are predicted to increase flux through the arginine synthesis pathway. The pathway bottleneck and weak-link enzyme ProA* is shown in red. Steps in the arginine synthesis pathway that are affected by adaptive mutations are highlighted in bold type.

Supplementary Figures
443 Figure S1. Growth rate (right axis, dotted lines) and proA* copy number (left axis, solid lines) for each adapted population. Vertical dotted lines indicate when population genomic DNA was sequenced.    The turbidostat occasionally malfunctioned, requiring the adaptation to be paused. When 507 this occurred, the populations were subjected to centrifugation at 4,000 x g for 10 min at room 508 temperature and the pelleted cells were resuspended in 1.6 mL of Adaptation Medium. Half of 509 the resuspension was used to make a -70 °C 10% glycerol stock, and the other half to purify 510 genomic DNA. When the turbidostat was fixed, the frozen stock was thawed and the cells were 511 collected by centrifugation at 16,000 x g for 1 min at room temperature. The pelleted cells were 512 resuspended in 1 mL of PBS, washed, and resuspended in 500 µL of Adaptation Medium. The 513 entire resuspension was used to inoculate the appropriate chamber of the turbidostat. Sometimes 514 the adaptation had to be restarted from a frozen stock of a normal sample (as opposed to the 515 entire population as just described), resulting in a more significant population bottleneck. In this 516 case, the entire frozen stock was thawed and only 700 µL washed as described above to be used 517 for the inoculation. The remaining 300 µL of the glycerol stock were re-stored at -70 °C in case 518 the frozen stock was needed for downstream analysis. 519 520

Calculation of growth rate and generations during adaptation 521
The turbidostat takes an OD650 reading every ~3 sec and dilutes the cultures every 60 522 seconds. Thus, readings between dilutions can be used to calculate an average growth rate each 523 day based on the following equation: 524 where is the average growth rate in hr -1 , n is the number of independent growth rate 525 calculations within a given 24 hr period, Nt 2 is the OD650 reading right before the dilution, Nt 1 is 526 the OD650 reading right after the dilution, and t1 and t0 are the times at which the OD650 was 527 measured. The number of generations per day (g) was then calculated from using Eq. Growth rates for each strain were calculated from growth curves measured in 545 quadruplicate. Overnight cultures were grown in LB/kanamycin at 37 °C from glycerol stocks. 546 Ten µL of each overnight culture was used to inoculate 4 mL of LB/kanamycin and the cultures 547 were allowed to grow to mid-exponential phase (OD600 0.3-0.6) at 37 °C with shaking. The 548 cultures were subjected to centrifugation at 4,000 x g for 10 min at room temperature and the 549 pellets resuspended in an equal volume of PBS. The pellets were washed once more in PBS. The 550 cells were diluted to an OD of 0.001 in Adaptation Medium and a 100 µL aliquot was loaded 551 into each well of a 96-well plate. The plates were incubated in a Varioskan (Thermo Scientific) 552 plate reader at 37 ˚C with shaking every 5 minutes for 1 minute. The absorbance at 600 nm was 553 measured every 20 minutes for up to 500 hours. The baseline absorbance for each well (the 554 average over several smoothed data points before growth) was subtracted from each point of the 555 growth curve. Growth parameters (maximum specific growth, μmax; lag time, λ; maximum 556 growth, Amax) were estimated by non-linear regression using the modified Gompertz equation 557 (Zwietering et al., 1990). Non-linear least-squares regression was performed in Excel using the 558 Solver feature. 559 560

Measurement of argB and argH gene expression by RT-qPCR 561
Overnight cultures were grown in LB/kanamycin at 37 °C from glycerol stocks. Ten µL 562 of each overnight culture was used to inoculate 4 mL of LB/kanamycin and the cultures were 563 grown to mid-exponential phase (OD600 0.3-0.6) at 37 °C with shaking. Exponential phase 564 cultures were centrifuged at 4,000 x g for 10 min and pellets resuspended in equal volume PBS. 565 Pellets were washed once more in PBS. The cells were diluted to an OD of 0.001 in 4 mL of 566 Adaptation Medium and grown to an OD600 of 0.2-0.3. Four 2-mL aliquots of culture were 567 thoroughly mixed with 4 mL of RNAprotect Bacteria Reagent (Qiagen) and incubated at room 568 temperature for 5 min before centrifugation at 4,000 x g for 12 min at room temperature. Pellets 569 were frozen in liquid N2 and stored at -70 °C. Procedures for purification of RNA, reverse 570 transcription, qPCR, and data analysis are described in the supplementary material. 571 572

Measurement of ArgB and ArgH levels by mass spectrometry 573
Freezer stocks were used to streak each strain on LB/kanamycin. Four parallel 2-mL 574 aliquots of LB/kanamycin were inoculated with individual colonies and the cultures were grown 575 to mid-exponential phase at 37 °C with shaking. One mL of each culture was subjected to 576 centrifugation at 16,000 x g for 1 min at room temperature. The cell pellets were resuspended in 577 1 mL PBS and washed twice more in PBS before resuspension and dilution to an OD of 0.001 in 578 5 mL of Adaptation Medium. Cultures were grown to an OD600 of 0.2-0.3 at 37 °C with shaking 579 and then chilled on ice for 10 min before pelleting by centrifugation at 4,000 x g at 4 °C. Cell 580 pellets were frozen in liquid N2 and stored at -70 °C. 581 to 80% (v/v) caused the proteins to bind to the beads. The beads were washed twice with 70% 590 (v/v) ethanol and once with 100% acetonitrile. Protein was digested and eluted from the beads 591 with 15 µL of 50 mM Tris buffer, pH 8.5 with 1 µg endoproteinase Lys-C (Wako) for 2 hours 592 with shaking at 600 rpm at 37 °C in a thermomixer (Eppendorf). One µg of trypsin (Pierce) was 593 then added to the solution and incubated at 37 °C overnight with shaking at 600 rpm. Beads were 594 collected by centrifugation and then placed on a magnet to more reliably remove the elution 595 buffer containing the digested peptides. The peptides were then desalted using an Oasis HLB 596 cartridge (Waters) according to the manufacturer's instructions and dried in a speedvac. 597 Samples were suspended in 12 µL of 3% (v/v) acetonitrile/0.1% (v/v) trifluoroacetic acid 598 and 500 ng of peptides were directly injected onto a C18 1.7 µm, 130 Å, 75 µm X 250 mm M-599 class column (Waters), using a Waters M-class UPLC. Peptides were eluted at 300 nL/minute 600 using a gradient from 3% to 20% acetonitrile over 100 minutes into an Orbitrap Fusion mass 601 spectrometer (Thermo Scientific). Precursor mass spectra (MS1) were acquired at a resolution of 602 120,000 from 380-1500 m/z with an AGC target of 2.0 x 10 5 and a maximum injection time of 603 50 ms. Dynamic exclusion was set for 20 seconds with a mass tolerance of +/-10 ppm. 604 Precursor peptide ion isolation width for MS2 fragment scans was 1.6 Da using the quadrupole, 605 and the most intense ions were sequenced using Top Speed with a 3-second cycle time. or ArgC with 2 mM "GSA" (including the hydrate and P5C) or 2 mM "NAGSA" (including the 680 hydrate), respectively, in a solution containing 100 mM potassium phosphate, pH 7.6, and 1 mM 681 NADP + and measured the burst in NADPH production (Khanal et al., 2015). The concentrations 682 of GSA+P5C+hydrate or NAGSA+hydrate were determined using the o-aminobenzaldehyde 683 assay (Albrecht et al., 1962;Mezl & Knox, 1976). The absorbance at 340 nm due to formation of 684 NADPH exhibited a burst followed by a linear phase when measured for 60 seconds. We assume 685 that the burst corresponds to reduction of the free aldehyde form of GSA or NAGSA and the rate 686 was measured at 37 °C by coupling production of ADP to oxidation of NADH using pyruvate 708 kinase, which converts ADP and PEP to ATP and pyruvate, and lactate dehydrogenase, which 709 were included in the structure prediction. The first 33 nucleotides were included because an 734 mRNA-bound ribosome prevents another ribosome from binding to the mRNA until it has 735 moved past the first 33 nucleotides (Steitz, 1969   Red recombinase genes (exo, gam, and bet) under the control of a heat-inducible promoter, and 774 a temperature-sensitive origin of replication (Datta et al., 2006). These cells were grown to an 775 OD600 of 0.2-0.4 at 30 °C and then incubated at 42 °C with shaking for 15 min to induce 776 expression of the Red recombinase genes. The cells were immediately subjected to 777 electroporation with 100 ng of either pAM068 or pAM100, which encode guide RNAs targeting 778 20-nucleotide sequences upstream of argB or in rph, respectively (Table S3), and 450 ng of a 779 linear homology repair template that will introduce the desired deletion into the genome (Table  780 S4). (Linear homology repair templates were amplified from genomic DNA of clones isolated 781 during the adaptation experiment that contained the desired deletions and and the PCR fragments 782 were gel-purified. Primers used to generate the linear DNA mutation fragments are listed in 783 Table .) The cells were allowed to recover at 30 °C for 2-3 hours before being spread onto 784 LB/ampicillin plates. Surviving colonies contained the guide RNA plasmid and the desired 785 deletion, as confirmed by Sanger sequencing. pAM053 and the guide RNA plasmids, both of 786 which have temperature-sensitive origins of replication, were cured by growth of individual 787 colonies at 37 °C. 788 Strain AM209 was constructed from E. coli BL21(DE3) for expression of wild-type and 789 mutant ProAs. We deleted argC and proA to ensure that any activity measured during in vitro 790 assays was not due to trace amounts of ArgC or wild-type ProA. To accomplish these deletions, 791 we amplified and gel-purified DNA fragments containing antibiotic resistance genes The primer sets used for each gene are listed in Table S2. PowerSYBR Green PCR 814 master mix (Thermo Scientific) was used according to the manufacturer's protocol. A standard 815 curve using variable amounts of AM187 genomic DNA was run on every plate to calculate 816 efficiencies for each primer set. Primer efficiencies were calculated with the following equation: 817 [10] M = 10 L, / P 4 where E is the efficiency of primer set x, and m is the slope of the plot of Ct vs. starting quantity 818 for the standard curve. proA* copy number was then calculated with the following equation 819 (Hellemans et al., 2007): 820 where n is the proA* copy number, and ∆Ct,x is the difference in Ct's measured during 821 amplification of AM187 and sample genomic DNA with primer set x.