Genome-scale metabolic modeling of P. thermoglucosidasius NCIMB 11955 reveals metabolic bottlenecks in anaerobic metabolism

39 Parageobacillus thermoglucosidasius represents a thermophilic, facultative anaerobic 40 bacterial chassis, with several desirable traits for metabolic engineering and industrial 41 production. To further optimize strain productivity, a systems level understanding of its 42 metabolism is needed, which can be facilitated by a genome-scale metabolic model. Here, 43 we present p-thermo, the most complete, curated and validated genome-scale model (to 44 date) of Parageobacillus thermoglucosidasius NCIMB 11955. It spans a total of 890 45 metabolites, 1175 reactions and 917 metabolic genes, forming an extensive knowledge base 46 for P. thermoglucosidasius NCIMB 11955 metabolism. The model accurately predicts 47 aerobic utilization of 22 carbon sources, and the predictive quality of internal fluxes was 48 validated with previously published C-fluxomics data. In an application case, p-thermo was 49 used to facilitate more in-depth analysis of reported metabolic engineering efforts, giving 50 additional insight into fermentative metabolism. Finally, p-thermo was used to resolve a 51 previously uncharacterised bottleneck in anaerobic metabolism, by identifying the minimal 52 required supplemented nutrients (thiamin, biotin and iron(III)) needed to sustain anaerobic 53 growth. This highlights the usefulness of p-thermo for guiding the generation of experimental 54 hypotheses and for facilitating data-driven metabolic engineering, expanding the use of P. 55 thermoglucosidasius as a high yield production platform. 56


Biomass composition and growth energetics 194
To capture biological growth in stoichiometric models, a demand reaction referred to as a 195 biomass pseudo-reaction, was added. An overview of how the biomass pseudo-reaction was 196 defined is explained in Materials & Methods, with the final reaction components and 197 associated stoichiometry given in Supplementary Table 1. Energetic parameters were fitted 198 from aerobically grown chemostat experiments. The energy required to maintain cellular 199 homeostasis is reflected in the non-growth associated maintenance (NGAM) and was found 200 to be 3.141 mmol ATP /g DW h -1 in p-thermo. The growth associated maintenance, (GAM), was 201 estimated as 152.3 mmol ATP /g DW and reflects the energy needed for cell replication, including 202 macromolecule synthesis. The contribution of polymerization energy, required for 203 macromolecule synthesis, to the obtained GAM was estimated to be approximately 20% 204 (Supplementary table 2); relatively low compared to previously reported mesophiles (30-205 40%) [65][66][67][68] . It was previously observed that thermophilic organisms tend to require higher 206 levels of energy for growth and homeostasis at elevated temperatures and thus have a 207 reduced growth efficiency, shown in the high maintenance estimated 18,69 . This trait of 208 thermophiles makes them valuable hosts for bioproduction as it leads to higher production 209 rates of catabolic products compared to other organisms. 210

Overview of metabolism 211
To provide a comprehensive overview, two pathway maps of the model were drawn using 212 Escher 70 corresponding to central carbon and amino acid metabolism (Supplementary Files 213 3 and 4) and deposited in the GitHub repository at p-thermo/maps. Traits specific to 214 Geobacillus spp. and P. thermoglucosidasius NCIMB 11955, presented in the literature, 215 were used to validate the model's metabolism. Detailed step-by-step decisions that were 216 made, can be followed in the GitHub repository at "p-thermo/notebooks". As an example, in 217 central carbon metabolism research has shown that Geobacillius spp., unlike many 218 mesophilic Bacillus species, lack genes for a 6-phosphogluconolactonase (6PGL), 219 responsible for part of the oxidative pentose phosphate pathway (PPP) 14 . Instead, the 220 reaction can occur spontaneously and, at thermophilic temperatures, has been hypothesized 221 to be sufficiently rapid to maintain the requisite PPP flux 71 . The absence of 6PGL was 222 captured in the model, but to reflect the active PPP pathway, a pseudo-reaction was added 223 to allow the complete oxidative PPP to function. 224 (Para)geobacillus species are known to be capable of growth on a wide range of 225 carbohydrates, and have been shown to secrete various polysaccharide degrading enzymes 226 such as xylanases and other hemicellulose degrading enzymes 14,24-26 . To assess the 227 metabolic capacity of the model, growth on various carbon sources was simulated ( Figure  228 2). The choice of carbon sources was made based on previous qualitative growth 229 experiments which demonstrate a range of sole carbon sources which allow aerobic growth 230 J o u r n a l P r e -p r o o f of P. thermoglucosidasius NCIMB 11955 51 . Additionally, anaerobic growth on these 231 substrates was computationally predicted. In both cases, carbon supply was normalized to 232 30 Cmol s /g DW h, to accommodate different polymeric substrate forms being present in the 233 data set. Initially, the model showed no aerobic growth on arbutin, salicin and rhamnose, due 234 to dead-end metabolites being formed as side products in the first steps of their break down. 235 Available literature was used to fill the gaps in the catabolic pathways, which enabled 236 aerobic growth on all three carbon sources. Anaerobically, in silico growth on arbutin and 237 salicin was unfeasible, as current knowledge suggests that their catabolism is oxygen 238 dependent. Both arbutin and salicin are non-conventional carbon sources and are 239 glycosides, composed of either a hydroquinone or salicyl alcohol functional group attached 240 to glucose, respectively. It is known that metabolism of these glycosides occurs through 241 splitting of the glycosidic bond, with the two functional groups being catabolized individually.  To make this comparison, 13 C-flux data from P. thermoglucosidasius M10EXG subject to 262 varying oxygen conditions was used to qualitatively assess the predictive quality of p-263 thermo 7 . Whole proteome analysis (on a sequence basis) of the P. thermoglucosidaius 264 M10EXG and NCIMB 11955 strains shows that the ORFs between the two strains are highly

292
J o u r n a l P r e -p r o o f

Recapitulating & interpreting knockout physiology 293
To evaluate the utility of p-thermo for metabolic engineering applications, we recreated 294 previously reported homoethanologenic mutants of P. thermoglucosidasius NCIMB 11955 in 295 silico. Cripps et al 33 engineered lactate dehydrogenase (ldh) and pyruvate formate lyase (pfl) 296 knockouts in vivo, and supplemented their ethanol yields with an upregulation of pyruvate 297 dehydrogenase expression (PDH up ). Using p-thermo, the wild type (WT), Δldh and 298 ΔldhΔpfl(PDH up ) strains were recreated, as stoichiometric modeling cannot distinguish 299 between upregulated expression levels (i.e. between ΔldhΔpfl and ΔldhΔpfl PDH up ). 300 Exchange rates of the main fermentation metabolites were predicted using p-thermo and 301 their accuracy evaluated based on measured data ( Figure 4). In performing the analysis, two 302 distinct thresholds for Flux Variability Analysis (FVA) were selected, 95% and 99% of 303 optimum biomass production 72 , to assess the flexibility of exchange rates to the simulated 304

conditions. 305
The performed simulations show a substantial discrepancy between predicted and 306 measured yields in the WT and Δldh strains, whereas simulations tightly match the 307 measured yields in the ΔldhΔpfl PDH up strain. Still, the mismatch between the experimental 308 and in silico data, p-thermo can be used to understand metabolic branch points. The main 309 discrepancy observed lies in the lactate and formate yields for the WT and Δldh strains, redox imbalance in the cell which is alleviated through production of lactate as the 331 production of formate by PFL is restricted. This could explain the discrepancy between the 332 experimentally measured low formate and high lactate production in the WT strain and the 333 prediction by p-thermo. In the LDH knockout at low dissolved oxygen conditions, which 334 prevents PFL activity, PDH instead predominantly carries flux to acetyl-CoA. In this instance, 335 in order to maintain cellular redox balance, the Δldh cells increase the produced 336 ethanol/acetate ratio. This picture highlights the complexity of cellular and enzymatic 337 regulation that is poorly captured in stoichiometric models, as well as the difficulty in 338 simulating uncontrolled environments accurately. 339 However, the performed simulations can still be used to visualize and understand the burden 340 that lactate production can have on cellular growth. The inability to induce PFL at moderate 341 levels of oxygen limitation, puts a larger reliance on fermentative metabolism to lactate, 342 providing less energy. We used p-thermo to investigate the possible impact this has. First, all 343 measured exchange rates for the three strains were fitted to the model and used in 344 subsequent determination of predicted biomass yields. This showed that the model is 345 physiologically capable of capturing the measured data, albeit with a lower predicted 346 J o u r n a l P r e -p r o o f biomass yield than was experimentally measured, suggesting that stoichiometrically sub-347 optimal fermentation pathways were active in vivo (Supplementary Figure 4A). Finally, the 348 effect of increasing lactate production on biomass yield was computed, showing the 349 energetic loss that occurs from lactate production (Supplementary Figure 4B)

356
Each panel highlights a different strain: wild type (WT), Δldh and ΔldhΔpfl (PDHup). Two varying 357 thresholds for FVA were run: 95% and 99% of the optimum biomass production. The in silico 358 predicted biomass yield (Y xs ) for 99% of the optimum biomass production is shown for each condition.

Genome-scale metabolic modeling allows the elucidation of metabolic bottlenecks 360
The availability of a comprehensive GEM can also facilitate the elucidation of metabolic 361 bottlenecks and identification and optimization of chemically defined growth media 77 . Thus, 362 p-thermo was used to help resolve known issues of the anaerobic metabolic physiology of P. 363 thermoglucosidasius. Although P. thermoglucosidasius is clearly capable of classical mixed 364 acid fermentation and shows elements of a regulated aerobic-anaerobic switch as revealed 365 by transcriptomic analysis 78 (although the aerobic respiratory electron transport chain 366 remained active in an oxygen-scavenging state under fermentative conditions), it has long 367 been known that growth under anaerobic conditions on existing minimal defined growth 368 medias requires additional supplements in comparison to growth under aerobic conditions. 369 Typically, this was resolved by supplementation with a small amount of oxygen or yeast 370 extract 14,33,51 . Therefore, here we used simulations of p-thermo to find a minimal set of 371 defined nutrients that can achieve anaerobic growth of P. thermoglucosidasius. As a first 372 observation, when fed true minimal, anaerobic medium, the model predicted no growth, in 373 accordance with experimental observations. However, fermentative energy generation was 374 observed, which highlights that oxygen requirement comes from critical secondary 375 metabolites or cofactors that cannot be synthesized anaerobically, which is corroborated by 376 previous observations 14 . By minimizing the oxygen uptake in the model, a critical reaction set 377 requiring oxygen was generated (Table 1). This analysis highlighted a complex combination 378 of components that cannot be synthesized anaerobically: thiamine, biotin, folate, vitamin 379 B12, spermine, spermidine and hemin. Additionally, iron(III) must be available in the medium 380 to allow porphyrin biosynthesis. Wolfe's vitamins plus the unique components of the supplementation mix (spermine, 397 spermidine and heme) ( Figure 5, Supplementary figure 5). 398 Experimental observations suggested that a combination of thiamin, biotin and iron(III) were 401 the minimal required supplementation set needed to sustain anaerobic growth, as no 402 difference was observed when additional defined supplementation was added ( Figure 5). 403 Yeast extract also contains significant amounts of amino acids and other components and so 404 provides an additional growth advantage, as expected. However, this highlights a 405 discrepancy with the model predictions, as a larger minimal supplementation set was 406 originally predicted (Table 1)

Biomass composition and growth energetics 541
To model growth, a biomass pseudo-reaction was added to the model. The reaction pools 542 metabolites needed for growth into a biomass metabolite. Base biomass composition was 543 previously determined experimentally according to reported practices 9,51 . Lipid composition 544 was obtained from previous reports 7 , and was incorporated into the model according to a 545 restrictive approach 62 , in which a determined acyl chain length is assumed for all lipid 546 species. Further fine-tuning of biomass composition was performed based on available 547 enzymatic and metabolic requirements of the strain, with case-by-case justification given in 548 the GitHub repository. Critical metabolites known to be required for catabolic functions of 549 essential enzymes, such as heme, were added at trace stoichiometries based on knowledge 550 from related organisms and scaled to ensure that all biomass components added up to 1 551 J o u r n a l P r e -p r o o f g/g DW 99 . Growth energetics (ATP cost of growth-associated maintenance and ATP 552 requirement for non-growth associated maintenance) were estimated by minimizing the 553 prediction error of the specific substrate consumption rate and the specific growth rate of 554 glucose fed, aerobic chemostats 51,61 . The P/O ratios were obtained from the given data for B. 555 subtilis 100 . The contribution of polymerization of each metabolite type to the total growth 556 associated maintenance was estimated based on previously reported polymerization 557 energies 65 . 558

Transport reactions 559
The model has two compartments: extracellular and intracellular. Transport reactions were 560 inferred from genome annotations and homology to known transporters. Additionally, 561 knowledge about growth on various substrates was used to validate the presence of the 562 corresponding transporters. When running FVA, a threshold below the optimum is used to represent the metabolic 578 freedom that is given to a model. In this study, a sensitivity analysis was run with FVA 579 thresholds from 90 to 99%, to evaluate at what sensitivity level the model better matches the 580 experimental data. Additionally, parsimonious flux balance analysis (pFBA) was run, by 581 conducting a bilevel linear programming optimization that computes the optimum (growth) 582 solution of the network, whilst minimizing the sum of all fluxes 73 . In doing so, this 583 optimization predicts the most stoichiometrically efficient pathway set, and captures the 584 maximum biomass per unit flux objective that has previously been described to be well 585 supported by proteomic and transcriptomic data 73,101 . 586

Genome comparison 587
For an unbiased genome comparison, the P. thermoglucosidasius NCIMB 11955 genome 588 was obtained from NCBI (Accession: CP016622), and the P. thermoglucosidasius M10EXG 589 genome was obtained from the Integrated Microbial Genome database (ID 2501416905). 590 Genome annotation was performed using RASTk 98 , after which the two proteomes were 591 compared through blast bi-directional best hits to create a homology matrix between the 592 strains, based on a published pipeline 102 . To filter for metabolic genes, any ORF associated 593 to a predicted EC code was considered metabolic. The exact workflow can be followed in the 594 GitHub repository. 595

Experimental procedures 596
The P. thermoglucosidasius NCIMB 11955 (DSM2542) strain was obtained from DSMZ 79 . 597 The strain was grown in either 2SPY medium or base thermophile minimal medium (TMM), 598 modified from Fong et al. 80 . 2SPY was used for a first preculture, and contains per liter, 16g 599 soy peptone, 10 g yeast extract and 5 g NaCl, adjusted to pH 6.8. Base TMM contains, per The aerobic cultures were inoculated to a starting OD 600 of around 0.05, after an overnight 616 culture on the base TMM medium. Growth was monitored through OD600 measurements, 617 during growth at 60°, 200 rpm in baffled shake flasks. Anaerobic medium was prepared 618 similarly to aerobic medium, but 1 µg/L resazurin was added to ensure complete anaerobic 619 conditions. The medium was flushed with nitrogen gas prior to use. All anaerobic work was 620