Replacing the Calvin cycle with the reductive glycine pathway in Cupriavidus necator

Formate can be directly produced from CO2 and renewable electricity, making it a promising microbial feedstock for sustainable bioproduction. Cupriavidus necator is one of the few biotechnologically-relevant hosts that can grow on formate, but it uses the inefficient Calvin cycle. Here, we redesign C. necator metabolism for formate assimilation via the highly efficient synthetic reductive glycine pathway. First, we demonstrate that the upper pathway segment supports glycine biosynthesis from formate. Next, we explore the endogenous route for glycine assimilation and discover a wasteful oxidation-dependent pathway. By integrating glycine biosynthesis and assimilation we are able to replace C. necator’s Calvin cycle with the synthetic pathway and achieve formatotrophic growth. We then engineer more efficient glycine metabolism and use short-term evolution to optimize pathway activity, doubling the growth yield on formate and quadrupling the growth rate. This study thus paves the way towards an ideal microbial platform for realizing the formate bioeconomy.


Here, we replace the Calvin cycle of C. necator with the rGlyP. First, we show that overexpression of the upper segment of the pathway enables an otherwise glycine-auxotrophic C. necator to utilize formate for glycine biosynthesis. We then evolve C. necator to utilize glycine as a carbon source, revealing that, rather than being converted to serine and pyruvate, this amino acid is first oxidized to glyoxylate, which is subsequently assimilated via the well-known glycerate route 18,19 (Fig. 1). Next, we construct a strain in which the Calvin cycle is disrupted and which is thus unable to grow on formate. Integration of the two segments of the rGlyP restores the formatotrophic growth of this strain, albeit at a low growth rate. We further optimize pathway activity by shifting overexpression from a plasmid to the genome, forcing glycine assimilation via serine, and conducting a short-term adaptive evolution. Our final strain displays a biomass yield on formate equivalent to that of the WT strain using the Calvin cycle, hence confirming the recovery of the growth phenotype after the fundamental rewiring of cellular metabolism towards the use of the synthetic route. Our study therefore paves the way towards a highly efficient platform strain for the production of value-added chemicals from CO2.

[...] as main carbon source and formate and CO2 as glycine source (at 10% CO2 and 100 mM bicarbonate; high CO2 concentration is required to thermodynamically and kinetically push the glycine cleavage system in the reductive direction). Yet, the strain harboring the weak promoter p14 in pC1 showed the best growth. The growth of the strain carrying pC1-p3 and pC2-p14 on fructose and formate but in the absence of glycine (green line in Fig. 2b) was almost identical to the positive control, in which glycine was added to the medium (brown line in Fig. 2b).
Growth was also possible in the absence of fructose, in which case formate served both as a carbon source for glycine biosynthesis and as a source of reducing power to support growth via the Calvin cycle (orange line in Fig. 2b). Interestingly, transformation of the glycine auxotroph strain with pC1-p3 alone sufficed to support glycine biosynthesis from formate, albeit at a low rate (blue line in Fig. 2b), indicating that the native expression of the GCS supports at least some reductive activity.

To confirm that glycine as well as the cellular C1 moieties are indeed generated from formate assimilation, we conducted a 13C-labeling experiment. We cultivated the strain harboring pC1-p3 and pC2-p14 on unlabeled fructose and CO2 as well as 13C-formate. We measured the labeling pattern of glycine and histidine (the latter contains a carbon derived from 10-formyl-THF) as well as serine, which was expected not to be labeled. We found glycine and histidine to be almost completely once labeled and serine to be unlabeled, thus confirming that formate is assimilated to the THF pool and into glycine (Fig. 2c).

[...] activity of aminotransferase enzymes; we did not delete isocitrate lyase, the source of cellular glyoxylate, as we found this deletion to hamper cell viability. Labeling experiments are averages from duplicates.
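The labeling readout described above (glycine and histidine once labeled, serine unlabeled) can be summarized as a fractional 13C enrichment per amino acid. The following is a minimal sketch, assuming a mass isotopomer distribution (fractions of M+0, M+1, ...) that has already been corrected for natural isotope abundance. The function name and all numbers are hypothetical illustrations, not data or code from this study.

```python
def mean_enrichment(mid):
    """Average fraction of labeled carbons for one metabolite.

    mid[i] is the fraction of molecules carrying i labeled carbons (M+i),
    so len(mid) - 1 is the number of carbon atoms considered.
    """
    total = sum(mid)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("isotopomer fractions must sum to 1")
    n_carbons = len(mid) - 1
    return sum(i * frac for i, frac in enumerate(mid)) / n_carbons


# Hypothetical distributions (NOT measured values from the paper).
glycine = [0.02, 0.98, 0.00]      # 2 carbons, almost completely once labeled
serine = [1.00, 0.00, 0.00, 0.00]  # 3 carbons, unlabeled

for name, mid in [("glycine", glycine), ("serine", serine)]:
    print(f"{name}: mean 13C enrichment = {mean_enrichment(mid):.2f}")
```

In this sketch, a glycine pool that is almost entirely M+1 gives an enrichment close to 0.5, which is what one labeled carbon out of two would look like when only formate carries the label.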

Exploring growth on glycine
Next, we turned our attention to the downstream segment of the rGlyP, that is, assimilation of glycine into biomass. We explored the native capacity of C. necator to grow on glycine. We inoculated twelve parallel cultures of C. necator on a minimal medium with glycine as the sole carbon source. Initially, no growth was observed. However, after a week, three of the inoculated cultures started growing. When reinoculated into a fresh medium with glycine, these cultures started growing immediately, possibly due to genetic adaptation. We performed whole-genome sequencing of these three strains and found that each had a sense mutation in either gltR1 or gltS1 (Supplementary Data 1). These two genes reside in the same operon and encode a dual-component signal-regulator system that activates the expression of gltP1, which encodes a putative dicarboxylate/amino acid transporter (Fig. 3a). While we detected several other mutations in specific strains (Supplementary Data 1), they did not appear in all three strains and hence probably contribute less to growth on glycine.

We analyzed the transcriptome of one of the evolved strains. In comparison to the WT strain growing on pyruvate, we observed a >1,000-fold increase of gltP1 transcription in the evolved strain growing on glycine. It therefore seems that GltP1 acts as a glycine transporter and that glycine, once within the cell, activates a native route for its metabolism. To uncover this endogenous glycine assimilation pathway, we checked which other genes were significantly overexpressed in the glycine-assimilating strain (Supplementary Data 2). dadA6, encoding a putative FAD-dependent D-amino acid dehydrogenase, was amongst the 10 most upregulated genes (~200-fold increase in transcript abundance). Given that a similar enzyme from Bacillus subtilis was demonstrated to oxidize both D-amino acids and glycine 22, we speculated that DadA6 can also oxidize glycine to glyoxylate. In addition, the genes encoding glyoxylate carboligase (gcl), tartronate semialdehyde reductase (tsr), and glycerate kinase (ttuD1) were amongst the 10 most upregulated genes (>1,600-fold, ~400-fold, and ~300-fold increase in transcript abundance, respectively). This led us to speculate that glycine is assimilated via oxidation to glyoxylate, followed by the activity of the well-known glycerate pathway 18,19 (Fig. 3a).

To further confirm that glycine assimilation occurs via its oxidation to glyoxylate and the operation of the glycerate pathway, we deleted either dadA6 or gcl-tsr in the glycine-evolved strain. Either deletion completely abolished growth on glycine (red and orange lines in Fig. 3b). On the other hand, deletion of the genes [...]

[...] as the accumulation of glycine would induce the glycine-assimilating segment, as shown above. First, we deleted the genes encoding both Rubisco isozymes (cbbSLc2 on chromosome 2 and cbbSLp on a megaplasmid) in a non-evolved C. necator 23, thus abolishing growth on formate via the Calvin cycle. We transformed this strain with pC1 and pC2 carrying different combinations of promoters (Methods and Fig. S1, S2). After two weeks of incubation in a minimal medium with formate (10% CO2 and 100 mM bicarbonate), we observed growth of three cultures harboring pC1-p14 and pC2-p3. Upon reinoculation into a fresh medium with formate, these strains, which we termed CRG1 (C. necator rGlyP 1), were able to grow immediately on formate, albeit at a low growth rate (the purple line in Fig. 5c represents one of these strains, with a doubling time of 56 h). To test whether glycine assimilation proceeds via the "glyoxylate route" (Fig. 3a), as was the case when glycine was provided in the medium, we deleted dadA6 in a CRG1 strain. This CRG1 ΔdadA6 strain was effectively unable to grow on formate (light red line in Fig. 5c), confirming that the growth of the CRG1 strain takes place via glycine oxidation (Fig. 5a).

We sequenced the CRG1 strains and found mutations both in the genome (Supplementary Data 3) and on the plasmid (Fig. S1). Several of these mutations occurred in all three CRG1 strains. One such shared mutation occurred inside the promoter p3 on the pC2 plasmid, resulting in the mutated promoter p3mut1 with an order of magnitude lower strength (Methods and Fig. S2).
Another shared mutation was the deletion of cbbRc2, which encodes the key activator of all Calvin cycle genes, activating both CO2 fixation operons on chromosome 2 and the megaplasmid 24. The contribution of this deletion to growth might be attributed to the downregulation of phosphoribulokinase, which, upon the deletion of Rubisco, generates the dead-end metabolite ribulose 1,5-bisphosphate.

As the initial high expression of the GCS seems to be deleterious (as suggested by the mutated promoter), we decided to replace its overexpression from a plasmid with genomic overexpression. We therefore cured the CRG1 strain of the pC2 plasmid and replaced the native, genomic promoter of the GCS operon with six constitutive promoters of different strength (Methods and Fig. S1, S2). We inoculated this strain in a minimal medium with formate (at 10% CO2 and 100 mM bicarbonate). After four weeks, we observed growth of several cultures harboring different GCS promoters. Reinoculation of these strains in fresh media with formate enabled immediate growth. The strain in which the genomic GCS was placed under the control of p3 was termed CRG2. The CRG2 strain showed the best growth and was further analyzed.

[...] labeling confirms the activity of the pathway and indicates low cyclic flux via the TCA cycle (Fig. S4). Labeling experiments are averages from duplicates.

The CRG2 strain showed faster growth on formate than the CRG1 strains, with a doubling time of 45 hours (blue line in Fig. 5c). We sequenced the genome of the CRG2 strain and identified a few mutations (Supplementary Data 4). Among these, one was directly downstream of promoter p3. We conducted quantitative PCR and found that the expression of gcvT (the first gene in the GCS operon) was an order of magnitude higher with the p3mut2 promoter than with the original p3 promoter (Fig. S3). Interestingly, the transcript level of gcvT in the CRG2 strain (genomic expression via p3mut2) was similar to that observed in the CRG1 strain when the GCS was expressed on a plasmid under the regulation of p3mut1 (Fig. S3).
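Doubling times and growth rates are used interchangeably in the text, so a short conversion may help when comparing strains. The sketch below assumes simple exponential growth, where the specific growth rate is ln(2) divided by the doubling time. The doubling times are those reported in this paper (the CRG3 and CRG4 values appear in the following section); the function and variable names are illustrative.

```python
import math


def specific_growth_rate(doubling_time_h):
    """Specific growth rate (1/h) for exponential growth: mu = ln(2) / t_d."""
    if doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_h


# Doubling times in hours as reported in the text.
doubling_times_h = {"CRG1": 56, "CRG2": 45, "CRG3": 18, "CRG4": 12}

rates = {strain: specific_growth_rate(t) for strain, t in doubling_times_h.items()}

for strain, mu in rates.items():
    print(f"{strain}: mu = {mu:.4f} 1/h")

# Fold changes in growth rate equal the inverse ratio of doubling times.
print(f"CRG4 vs CRG3: {rates['CRG4'] / rates['CRG3']:.2f}-fold")
print(f"CRG4 vs CRG2: {rates['CRG4'] / rates['CRG2']:.2f}-fold")
```

Under this assumption, going from an 18 h to a 12 h doubling time corresponds to a 1.5-fold (50%) higher growth rate, and going from 45 h to 12 h corresponds to a 3.75-fold increase.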

Improved growth on formate via the 'serine' variant of the reductive glycine pathway
The "glyoxylate route" for glycine assimilation is less efficient than the "serine route", as the former wastes reducing power during glycine oxidation; specifically, the expected pyruvate yield from formate using the "serine route" is >30% higher than with the "glyoxylate route" (Fig. 1b). We therefore aimed to force glycine assimilation via the "serine route", bypassing its oxidation (Fig. 5b). We cloned the native genes encoding serine hydroxymethyltransferase (glyA) and serine deaminase (sdaA), the two components of the "serine route", and assembled them into a synthetic operon on a plasmid, which we termed pC3, under the control of four different constitutive promoters of varying strength (Methods and Fig. S1, S2). We transformed the CRG2 strain with the pC3 plasmid and tested whether its growth rate was improved. We found that expression of glyA and sdaA from the medium-strength promoter pphaC1 improved growth the most, decreasing the doubling time to 18 hours and increasing biomass yield (i.e., final OD600) by more than 2-fold (red line in Fig. 5c). To check whether this strain, which we termed CRG3, is indeed independent of the "glyoxylate route", we deleted either dadA6 or gcl-tsr. Neither of these deletions substantially altered the growth phenotype (pink and orange lines in Fig. 5c), confirming that the "serine route" replaced the "glyoxylate route". The CRG3 strain thus assimilates formate via the rGlyP using its original, more efficient design.

To further improve growth on formate, we conducted a short-term adaptive evolution, in which, upon reaching stationary phase, the culture was reinoculated in fresh medium at an OD600 of 0.05. After several cycles of cultivation, the growth rate of the culture increased.
We isolated a strain, termed CRG4, in which the growth rate increased by 50% (doubling time of 12 hours, green line in Fig. 5c). We sequenced the genome of strain CRG4 and found several mutations, none of which were in genes or regulatory elements directly related to the rGlyP (Supplementary Data 5). We measured the exact biomass yield of the CRG4 strain and found it to be 2.6 gCDW/mol formate, similar to the biomass yield of the WT strain growing on formate via the Calvin cycle, 2.9 gCDW/mol formate.

To confirm that growth of the CRG4 strain takes place via the rGlyP, we performed 13C-labeling experiments. We cultivated the strain with 13C-formate/12CO2, 12C-formate/13CO2, or 13C-formate/13CO2, and measured the labeling pattern of proteinogenic glycine, serine, alanine, valine, proline, threonine, and histidine; these amino

Discussion
In this study, we demonstrated the successful engineering and optimization of the synthetic rGlyP in C. necator, replacing the Calvin cycle for supporting growth on formate. To facilitate this, we divided the pathway into two segments, (i) formate conversion to glycine and (ii) glycine assimilation to biomass, and explored the activity of each separately before combining them into a full pathway. We discovered that C. necator can effectively assimilate intracellular glycine into biomass via its oxidation to glyoxylate and the activity of the glycerate pathway. However, since this route is rather inefficient due to a wasteful dissipation of reducing power, we replaced it with glycine conversion to serine and pyruvate. We further demonstrated the strength of integrating both rational design and short-term evolution to optimize pathway activity. This approach enabled us to more than double the growth yield on formate and increase the growth rate almost 4-fold (Fig. 5c).

During the short-term adaptive evolution, we identified several mutations that might have contributed to the improved growth. However, as manipulating C. necator's genome is difficult, a systematic exploration of the contribution of each mutation to the phenotype could not be easily performed. Moreover, while shifting overexpression of the GCS genes from a plasmid to the genome improved growth, we were not able to replace all plasmids with genomic expression, as the introduction of multi-gene operons (e.g., ftl-fch-mtdA) into C. necator's genome is still a challenging task. Once more effective tools for engineering the genome of this bacterium become available, it will be possible to further optimize the activity of the rGlyP and explore in detail the cellular adaptation towards efficient assimilation of formate.

Replacing C. necator's Calvin cycle with the rGlyP has the potential to substantially increase biomass yield on formate. However, in this study we were only able to match the yield of the natural route. This should not come as a surprise, as the bacterium is still not fully adapted to the use of the synthetic pathway. Further optimization of pathway activity, using both rational engineering and long-term evolution, is expected to boost the growth rate and yield of our strain.

The C. necator strain utilizing the rGlyP compares favorably with a recently evolved E. coli strain that grew on formate via the Calvin cycle with a doubling time of 18 hours 25. Nevertheless, a recently engineered E. coli [...] sufficient in vivo activity. At the current stage, the identity of the FDH variant might not be important, as growth is likely limited by metabolic factors other than the supply of reducing power and energy. However, as we keep improving formatotrophic growth via the rGlyP, the supply of reducing power will become more and more limiting, and the bacterium that harbors the more efficient FDH could have a clear advantage 16.

The successful implementation of the rGlyP in both E. coli and C. necator (for which fewer genetic tools are available) suggests that this pathway is robust enough to be introduced into various relevant hosts. This robustness can be attributed to several factors, including the use of mostly ubiquitous enzymes, a linear structure that avoids the need for balancing fluxes within a cyclic route, and operation at the periphery of metabolism, thus negating deleterious clashes with central metabolism. The implementation of the rGlyP in multiple biotechnologically-relevant microorganisms therefore seems a viable strategy, providing flexible platforms for valorizing CO2-derived formate into a myriad of value-added chemicals.

Methods

Bacterial strains and conjugation
A C. necator H16 strain knocked out for polyhydroxybutyrate biosynthesis (ΔphaC1) was used as a platform strain for engineering in this study (kindly donated by O. Lenz) 30. E. coli DH5α was used for routine cloning, while E. coli NEB10-beta was used for cloning of larger vectors. E. coli S17-1 was used for conjugation of mobilizable plasmids to C. necator by biparental overnight spot mating. C. necator transconjugants were selected on LB agar plates with the appropriate selection marker and 10 µg/mL gentamicin for counter-selection of E. coli. A complete overview of strain genotypes used in this study can be found in Table S1.

C. necator genomic gene deletions
Genomic knockouts of target genes and operons were generated using the pLO3 suicide vector (kindly donated by O. Lenz), similar to the previously described methods 31,32. In short, homology arms of ~1 kb upstream and downstream of the knockout site were PCR-amplified with Phusion HF polymerase (Thermo Scientific). Homology arms were assembled into digested (SacI, XbaI) or PCR-amplified pLO3 backbone via Gibson Assembly (HiFi, NEB or In-Fusion, Takara). C. necator was conjugated with the pLO3 vectors and single crossovers were selected on tetracycline. Transconjugants were grown overnight without tetracycline to allow for [...]

[...] pSEVA551, together with a synthetic RBS and GFP (cargo #7 from SEVA system) 34. Relative promoter strength was measured based on GFP fluorescence as explained below (Fig. S2). All four promoters expressed GFP at different strengths in C. necator, but we found their relative strength ranking in C. necator (weak to strong: p14→p3→p4→p2) to be scrambled compared to the previously identified order in E. coli 33 (weak to strong: p4→p14→p2→p3). The promoters from this initial library appeared rather strong in C. necator. Furthermore, p14 was the weakest promoter in a vector with an RK2 origin of replication, but in vectors with higher-copy-number origins (RSF1010 and pBBR1 35) it appeared to be a medium-strength promoter, stronger than p3; the other three promoters kept their respective ranking p3→p4→p2. (Note: C. necator could not be conjugated with the strongest promoter, p2, on the highest-copy-number origin tested, pBBR1, likely due to an excessive expression burden.) To allow better benchmarking of our library and to expand the strength range, especially with weaker promoters, we included in our expanded library constitutive promoters previously tested in C. necator: ptac 35-37, pcat 38, plac 35-37, pphaC1 35-38, and pj5 35. We also included the native C. necator ppgi promoter (phosphoglucoisomerase), as we expected that this central glycolytic promoter could be an additional interesting constitutive promoter, as seen before in E. coli 39. A broader range of promoter strengths was found, including several weaker promoters, and the previously described strongest promoter, pj5, was also the strongest of the 10 promoters tested in the full library (Fig. S2).
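The statement that the promoter ranking is "scrambled" between hosts can be quantified with a rank correlation. The sketch below computes Spearman's rho for the two weak-to-strong orderings quoted above. This is an illustration added here, not an analysis performed in the study, and with only four promoters the value is descriptive rather than statistically meaningful. The function name is hypothetical.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two orderings of the same items (no ties).

    Each argument lists the same item names from weakest to strongest.
    """
    if set(order_a) != set(order_b) or len(set(order_a)) != len(order_a):
        raise ValueError("orderings must contain the same unique items")
    n = len(order_a)
    rank_b = {name: rank for rank, name in enumerate(order_b)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum((rank - rank_b[name]) ** 2 for rank, name in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Weak-to-strong rankings as stated in the text.
c_necator = ["p14", "p3", "p4", "p2"]
e_coli = ["p4", "p14", "p2", "p3"]

print(spearman_rho(c_necator, e_coli))     # 0.0: no rank agreement between hosts
print(spearman_rho(c_necator, c_necator))  # 1.0: identical ordering
```

A rho of 0 for these two orderings is consistent with the observation that promoter strength in E. coli did not predict the ranking in C. necator.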

Pathway enzyme expression plasmids
Genes ftl, fch and mtdA were PCR-amplified from M. extorquens AM1 genomic DNA (similar GC content to C. necator, so no codon optimization was needed). A synthetic RBS was included for each gene by PCR; it was designed with the RBS Calculator 40 to have a medium strength of ~30,000 arbitrary units, taking into account the context of the 5' UTR and the start of the gene. The genes were assembled using Gibson assembly in pSEVA221 vectors with four different promoters (p2, p3, p4 and p14), generating a library of pC1 vectors. Genes gcvT and gcvHP were PCR-amplified from the C. necator genome, and synthetic RBSs, designed as described above, were included for gcvT and gcvH. The synthetic operon was assembled via Gibson assembly in pSEVA551 using three different promoters (p2, p3, p4), generating the pC2 vectors. pC3 was constructed by PCR amplification of C. necator's sdaA and glyA with their native RBSs and assembly in pSEVA331 with the promoters p3, p4, pcat and pphaC1; the latter two, weaker promoters were included as our previous experience with pC1 and pC2 showed that the weaker promoters of the library gave better growth phenotypes. pDadA6 was [...]

gcvTHP operon promoter exchange
To optimize overexpression of the native gcvTHP operon from the genome, the native promoter was exchanged for six different promoters ranging from weak to strong (pcat, pphaC1, p3, p4, p2, pj5), as we had no good indication of what level of genomic expression was desired. To allow for promoter exchange, we constructed a pLO3 suicide vector with ~1000 bp homology arms flanking the native pgcvTHP promoter. The different PCR-amplified promoters were inserted between the homology arms by restriction digestion (using AscI/XbaI). The knock-in protocol was the same as for the knockouts in C. necator described above.

E. coli glyoxylate biosensor strain construction
The E. coli SIJ488 strain, based upon K-12 MG1655 41, was used for the generation of the glyoxylate biosensor strain. SIJ488 is engineered to carry the gene deletion machinery in its genome (inducible recombinase and flippase). All gene deletions were carried out by successive rounds of λ-Red recombineering using kanamycin cassettes (FRT-PGK-gb2-neo-FRT (KAN), Gene Bridges, Germany) or chloramphenicol cassettes (pKD3 42) as described before 43. Homologous extensions (50 bp) for the deletion cassettes were generated by PCR. The sensor strain required overexpression of malate synthase (glcB); hence, the endogenous gene was amplified from E. coli genomic DNA using a two-step PCR (to remove restriction sites relevant to the cloning system 39; in this case a single site). The glcB gene was subsequently cloned into the cloning vector pNivB 44 using restriction and ligation (Mph1103I/XhoI), generating pNivB-glcB. The glcB gene was then cloned from pNivB-glcB into a BioBrick-adapted pKI 39 suicide vector, using the enzymes EcoRI and PstI, for integration at Safe Site 9 45 in the E. coli genome under the control of a strong constitutive promoter, resulting in pKI-SS9-B-glcB. In brief, the knock-in system relies on conjugation of the suicide vector via an ST18 E. coli strain, which requires 5-aminolevulinic acid for growth 46, and a sucrose (sacB) counterselection system (described in full in 39).

Growth medium and conditions
For routine cultivation and genetic modifications, C. necator and E. coli were grown in Lysogeny Broth (LB) (1% NaCl, 0.5% yeast extract and 1% tryptone). When appropriate, the antibiotics kanamycin (100 µg/mL for C. necator or 50 µg/mL for E. coli), tetracycline (10 µg/mL), chloramphenicol (30 µg/mL), ampicillin (100 µg/mL for E. coli) or gentamicin (10 µg/mL for C. necator) were added. Routine cultivation was performed in 3 mL medium in 12 mL glass tubes in a shaker incubator at 240 rpm. C. necator was cultivated at 30°C and E. coli at 37°C.

Growth characterization experiments of C. necator were performed in J Minimal Medium (JMM), a medium previously optimized for formatotrophic growth 12. For formatotrophic growth, 80 mM sodium formate was [...]

[...] and 30 s at 60°C, and finally 1 min at 95°C. A dilution series of cDNA was made to generate a standard curve to correct for PCR efficiency. Data were analyzed as described in literature 49.
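The standard-curve step mentioned above is commonly handled by regressing quantification cycle (Cq) values against the log10 of the template dilution and converting the slope into an amplification efficiency. The sketch below uses the widely applied relation E = 10^(-1/slope) - 1; whether the cited analysis method uses exactly this form is an assumption. The dilution series and Cq values are hypothetical and are not data from this study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(relative_conc, cq_values):
    """Efficiency from a standard curve: E = 10**(-1/slope) - 1 (1.0 = 100%)."""
    log_conc = [math.log10(c) for c in relative_conc]
    slope, _intercept = fit_line(log_conc, cq_values)
    return 10 ** (-1 / slope) - 1


# Hypothetical 10-fold cDNA dilution series and Cq values (NOT measured data).
relative_conc = [1.0, 0.1, 0.01, 0.001]
cq_values = [18.0, 21.4, 24.8, 28.2]

efficiency = amplification_efficiency(relative_conc, cq_values)
print(f"estimated PCR efficiency: {efficiency:.1%}")
```

A slope near -3.32 cycles per 10-fold dilution corresponds to perfect doubling per cycle; the hypothetical slope of -3.4 used here gives an efficiency somewhat below 100%.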

Labeling of proteinogenic amino acids
For labeling analysis of the glycine auxotroph strains and CRG4, precultures were grown in JMM with 80 mM formate (and 20 mM fructose for the glycine auxotroph) and 10% CO2 in the headspace. Cells were then washed twice and re-inoculated at an OD600 of 0.01 into the same media as the precultures, but, when applicable, sodium formate or CO2 was replaced by 13C sodium formate (Sigma-Aldrich) and/or 13CO2 (Cambridge Isotope Laboratories). Cells were incubated in 3 mL medium in tubes, and for cultures with 13CO2 the tubes were placed in an airtight 6 L desiccator (Lab Companion) filled with 10% 13CO2 and 90% air on a shaker platform (180 rpm). Upon reaching stationary phase, 1 mL of culture was harvested, washed twice with dH2O and resuspended in 1 mL 6 M HCl. Cells were hydrolyzed overnight at 95°C, and the hydrolysate was then evaporated under an airstream for 2-4 hours, after which it was resuspended in 1 mL dH2O. The hydrolysate was analyzed by ultra-performance liquid chromatography (UPLC) (Acquity, Waters) using an HSS T3 C18 reversed-phase column (Waters). The mobile phases were 0.1% formic acid in H2O (A) and 0.1% formic acid in acetonitrile (B). The flow rate was 400 µL/min and the following gradient was used: 0-1 min 99% A; 1-5 min gradient from 99% A to 82% A; 5-6 min gradient from 82% A to 1% A; 6-8 min 1% A; 8-8.5 min gradient to 99%