Genome-scale metabolic modelling of P. thermoglucosidasius NCIMB 11955 reveals metabolic bottlenecks in anaerobic metabolism

Parageobacillus thermoglucosidasius represents a thermophilic, facultative anaerobic bacterial chassis, with several desirable traits for metabolic engineering and industrial production. To further optimize strain productivity, a systems level understanding of its metabolism is needed, which can be facilitated by a genome-scale metabolic model. Here, we present p-thermo, the most complete, curated and validated genome-scale model (to date) of Parageobacillus thermoglucosidasius NCIMB 11955. It spans a total of 890 metabolites, 1175 reactions and 917 metabolic genes, forming an extensive knowledge base for P. thermoglucosidasius NCIMB 11955 metabolism. The model accurately predicts aerobic utilization of 22 carbon sources, and the predictive quality of internal fluxes was validated with previously published 13C-fluxomics data. In an application case, p-thermo was used to facilitate more in-depth analysis of reported metabolic engineering efforts, giving additional insight into fermentative metabolism. Finally, p-thermo was used to resolve a previously uncharacterised bottleneck in anaerobic metabolism, by identifying the minimal required supplemented nutrients (thiamin, biotin and iron(III)) needed to sustain anaerobic growth. This highlights the usefulness of p-thermo for guiding the generation of experimental hypotheses and for facilitating data-driven metabolic engineering, expanding the use of P. thermoglucosidasius as a high yield production platform.


60
As the global transition away from petroleum-derived feedstocks continues, the need to 61 produce commodity and fine chemicals using sustainable feedstocks has accelerated the 62 interest in establishing microbial bioprocesses with lower environmental footprints [1][2][3]  Without an accurate picture of how cellular metabolism operates as a whole, metabolic 68 engineering strategies can produce flux imbalances, resulting in the accumulation of carbon 69 intermediates, metabolic bottlenecks and/or imbalances in the overall cellular redox ratio 5,6 . 70 As a result, there can be large upfront costs in microbial strain engineering to ensure 71 economically viable biochemical product yields 3,7 . To bolster traditional metabolic engineering 72 efforts and help elucidate genotype-phenotype relationships, systems metabolic engineering 73 aims to describe a more holistic representation of cellular metabolism though the integration 74 of stoichiometric modelling and -omics data analyses 8 . 75 In particular, the advent of cheaper DNA sequencing has given rise to genome-scale metabolic 76 models (GEMs), in silico reconstructions of the metabolic reaction networks of a given 77 organism, derived from its annotated genome sequence 9 . In addition to operating as a 78 knowledge base of metabolic information for a particular organism, GEMs can be used via

Model reconstruction 159
The presented genome-scale metabolic reconstruction of P. thermoglucosidasius NCIMB 160 11955 is based on genome sequencing by ERGO TM Integrated Genomics 57 and Sheng et al. 58 . 161 Genome annotation was performed through the ERGO TM Integrated Genomics suite 57 and the 162 RAST annotation server 59 , followed by gap filling with Pathway Booster 60 . The reconstruction 163 was extensively manually curated using available literature and databases (KEGG, BRENDA, 164 MetaCyc, MetaNetX and EC2PDB), according to benchmark approaches 61 . Detailed manual 165 curation and refinement can be followed in Lisowska 14

Biomass composition and growth energetics 185
To capture biological growth in stoichiometric models, a demand reaction referred to as a 186 biomass pseudo-reaction, was added. An overview of how the biomass pseudo-reaction was 187

Overview of metabolism 202
To provide a comprehensive overview, two pathway maps of the model were drawn using 203 Escher 69 corresponding to central carbon and amino acid metabolism (Supplementary Files 3 204 and 4) and deposited in the GitHub repository at p-thermo/maps. Traits specific to Geobacillus 205 spp. and P. thermoglucosidasius NCIMB 11955, presented in the literature, were used to 206 validate the model's metabolism. Detailed step-by-step decisions that were made, can be 207 followed in the GitHub repository at "p-thermo/notebooks". As an example, in central carbon 208 metabolism research has shown that Geobacillius spp., unlike many mesophilic Bacillus 209 species, lack genes for a 6-phosphogluconolactonase (6PGL), responsible for part of the 210 oxidative pentose phosphate pathway (PPP) 14 . Instead, the reaction can occur spontaneously 211 and, at thermophilic temperatures, may be sufficiently rapid to maintain the requisite PPP 212 flux 70 . The absence of 6PGL was captured in the model, but to reflect the active PPP pathway, 213 a pseudo-reaction was added to allow the complete oxidative PPP to function. 214 (Para)geobacillus species are known to be capable of growth on a wide range of 215 carbohydrates, and have been shown to secrete various polysaccharide degrading enzymes 216 such as xylanases and other hemicellulose degrading enzymes 14,24-26 . To assess the 217 metabolic capacity of the model, growth on various carbon sources was simulated (Figure 2). 218 The choice of carbon sources was made based on what has previously been shown to allow 219 aerobic growth of P. thermoglucosidasius NCIMB 11955 51 . Additionally, anaerobic growth on 220 these substrates was computationally predicted. In both cases, carbon supply was normalized 221 to 30 Cmols/gDWh, to accommodate different polymeric substrate forms being present in the 222 data set. Initially, the model showed no aerobic growth on arbutin, salicin and rhamnose, due 223 to dead-end metabolites being formed as side products in the first steps of their break down. 224 Available literature was used to fill the gaps in the catabolic pathways, which enabled aerobic growth on all three carbon sources. Anaerobically, in silico growth on arbutin and salicin was 226 unfeasible, as current knowledge suggests that their catabolism is oxygen dependent. Both 227 arbutin and salicin are non-conventional carbon sources and are glycosides, composed of 228 either a hydroquinone or salicyl alcohol functional group attached to glucose, respectively. It 229 is known that metabolism of these glycosides occurs through splitting of the glycosidic bond,

Assessment of predictive power through 13 C-flux fitting 241
Prior to using a genome-scale model for metabolic analyses or ab initio predictions, it is critical 242 to validate its predictive power based on previously attained experimental data. This was done 243 by analysis of how well simulated fluxes match known flux distributions. 13 C-isotopic labelling 244 is a standard tool used to elucidate intracellular fluxes in central carbon metabolism, through 245 extensive experimental work and data analysis. Flux variability analysis (FVA) is an in silico 246 approach that can allow ab initio analysis of metabolism without the need for laborious 247 experimental data 71 . Comparing the two data types can give insights into metabolism and 248 allow the generation of hypotheses for metabolic engineering purposes. 249 To do so, 13 C-flux data from P. thermoglucosidasius M10EXG subject to varying oxygen 250 conditions was used to qualitatively assess the predictive quality of p-thermo 7 . Whole between the two strains, we assume that the 13 C-flux data from P. thermoglucosidasius 258 M10EXG can be utilized for a qualitative assessment. 259 In order to test if the model can predict intracellular fluxes close to the 13 C-flux data, the 260 measured production and consumption rates were fixed in the model as exchange rates, and

Recapitulating & interpreting knockout physiology 280
To evaluate the utility of p-thermo for metabolic engineering applications, we recreated 281 compared to converting pyruvate into lactate 74 . Therefore, from a stoichiometric perspective, 304 p-thermo predicts this to be the most optimal pathway for growth, explaining the high 305 concentrations of formate, ethanol and acetate predicted in the simulation.
However, this was not observed in the experimental yields, presumably because of subtle 307 differences in dissolved oxygen availability in the experimental setup that influence multiple 308 levels of regulation in vivo, intrinsically not accurately captured by stoichiometric models. In restricted. This could explain the discrepancy between the experimentally measured low 317 formate and high lactate production in the WT strain and the prediction by p-thermo. In the 318 LDH knockout at low dissolved oxygen conditions, which prevents PFL activity, PDH instead 319 predominantly carries flux to acetyl-CoA. In this instance, in order to maintain cellular redox 320 balance, the Δldh cells increase the produced ethanol/acetate ratio. This picture highlights the 321 complexity of cellular and enzymatic regulation that is poorly captured in stoichiometric 322 models, as well as the difficulty in simulating uncontrolled environments accurately. 323 However, the performed simulations can still be used to visualize and understand the burden 324 that lactate production can have on cellular growth. The inability to induce PFL at moderate 325 levels of oxygen limitation, puts a larger reliance on fermentative metabolism to lactate, 326 providing less energy. We used p-thermo to investigate the possible impact this has. First, all 327 measured exchange rates for the three strains were fitted to the model and used in subsequent 328 determination of predicted biomass yields. This showed that the model is physiologically 329 capable of capturing the measured data, albeit with a lower predicted biomass yield than was 330 experimentally measured, suggesting that stoichiometrically sub-optimal fermentation 331 pathways were active in vivo (Supplementary Figure 4A). Finally, the effect of increasing 332 lactate production on biomass yield was computed, showing the energetic loss that occurs from lactate production (Supplementary Figure 4B). Overall, this highlights the importance of 334 complex regulation dictating metabolism, over pure stoichiometric optima per se.   conditions. Typically, this was resolved by supplementation with a small amount of oxygen or 352 yeast extract 14,33,51 . Therefore, here we used simulations of p-thermo to find a minimal set of 353 defined nutrients that can achieve anaerobic growth of P. thermoglucosidasius. 354 As a first observation, when fed true minimal, anaerobic medium, the model predicted no 355 growth, in accordance with experimental observations. However, fermentative energy 356 generation was observed, which highlights that oxygen requirement comes from critical 357 secondary metabolites or cofactors that cannot be synthesized anaerobically, which is 358 corroborated by previous observations 14 . By minimizing the oxygen uptake in the model, a 359 critical reaction set requiring oxygen was generated (Table 1). This analysis highlighted a 360 complex combination of components that cannot be synthesized anaerobically: thiamine, 361 biotin, folate, vitamin B12, spermine, spermidine and hemin. Additionally, iron(III) must be 362 available in the medium to allow porphyrin biosynthesis.  Experimental observations suggested that a combination of thiamin, biotin and iron(III) were 379 the minimal required supplementation set needed to sustain anaerobic growth, as no 380 difference was observed when additional defined supplementation was added ( Figure 5). 381 Yeast extract also contains significant amounts of amino acids and other components and so 382 provides an additional growth advantage, as expected. However, this highlights a discrepancy 383 with the model predictions, as a larger minimal supplementation set was originally predicted 384 (Table 1) In the first place, the in silico dependence on vitamin B12 highlights the inaccuracy of 399 annotation pipelines. Vitamin B12 synthesis is classically divided into two routes: canonical 400 (aerobic) and non-canonical (anaerobic) 85 . Although the genome annotation of P.

Biomass composition and growth energetics 517
To model growth, a biomass pseudo-reaction was added to the model. The reaction pools 518 metabolites needed for growth into a biomass metabolite. Base biomass composition was 519 previously determined experimentally according to reported practices 9,51 . Lipid composition 520 was obtained from previous reports 7 , and was incorporated into the model according to a 521 restrictive approach 97 , in which a determined acyl chain length is assumed for all lipid species. 522 Further fine-tuning of biomass composition was performed based on available enzymatic and 523 metabolic requirements of the strain, with case-by-case justification given in the GitHub 524 repository. Critical metabolites known to be required for catabolic functions of essential 525 enzymes, such as heme, were added at trace stoichiometries based on knowledge from 526 related organisms and scaled to ensure that all biomass components added up to 1 g/gDW 98 .
Growth energetics (ATP cost of growth-associated maintenance and ATP requirement for 528 non-growth associated maintenance) were estimated by minimizing the prediction error of the 529 specific substrate consumption rate and the specific growth rate of glucose fed, aerobic 530 chemostats 51,61 . The P/O ratios were obtained from the given data for B. subtilis 99 . The 531 contribution of polymerization of each metabolite type to the total growth associated 532 maintenance was estimated based on previously reported polymerization energies 64 . 533

Transport reactions 534
The model has two compartments: extracellular and intracellular. Transport reactions were 535 inferred from genome annotations and homology to known transporters. Additionally, 536 knowledge about growth on various substrates was used to validate the presence of the 537 corresponding transporters. 538

Stoichiometric modeling & applied constraints 539
In traditional flux balance analysis 10 , reaction stoichiometries are converted into a 540 running FVA, a threshold below the optimum is used to represent the metabolic freedom that 553 is given to a model. In this study, a sensitivity analysis was run with FVA thresholds from 90 554 to 99%, to evaluate at what sensitivity level the model better matches the experimental data. 555 Additionally, parsimonious flux balance analysis (pFBA) was run, by conducting a bilevel linear 556 programming optimization that computes the optimum (growth) solution of the network, whilst 557 minimizing the sum of all fluxes 72 . In doing so, this optimization predicts the most 558 stoichiometrically efficient pathway set, and captures the maximum biomass per unit flux 559 objective that has previously been described to be well supported by proteomic and 560 transcriptomic data 72,100 . 561

Genome comparison 562
For an unbiased genome comparison, the P. thermoglucosidasius NCIMB 11955 genome was 563 obtained from NCBI (Accession: CP016622), and the P. thermoglucosidasius M10EXG 564 genome was obtained from the Integrated Microbial Genome database (ID 2501416905). 565 Genome annotation was performed using RASTk 96 , after which the two proteomes were 566 compared through blast bi-directional best hits to create a homology matrix between the 567 strains, based on a published pipeline 101 . To filter for metabolic genes, any ORF associated 568 to a predicted EC code was considered metabolic. The exact workflow can be followed in the 569 GitHub repository. 570

Experimental procedures 571
The P. thermoglucosidasius NCIMB 11955 (DSM2542) strain was obtained from DSMZ 102 . 572 The strain was grown in either SPY medium or base thermophile minimal medium (TMM), 573 modified from Fong et al. 78 . SPY was used for a first preculture, and contains per liter, 16g soy 574 peptone, 10 g yeast extract and 5 g NaCl, adjusted to pH 6.8. Base TMM contains, per liter: The aerobic cultures were inoculated to a starting OD600 of around 0.05, after an overnight 591 culture on the base TMM medium. Growth was monitored through OD600 measurements, 592 during growth at 60°, 200 rpm in baffled shake flasks. Anaerobic medium was prepared 593 similarly to aerobic medium, but 1 µg/L resazurin was added to ensure complete anaerobic 594 conditions. The medium was flushed with nitrogen gas prior to use. All anaerobic work was 595 performed in an anaerobic chamber. Overnight cultures were run in anaerobic serum flasks at 596