Systems biology solutions for biochemical production challenges

There is an urgent need to significantly accelerate the development of microbial cell factories to produce fuels and chemicals from renewable feedstocks in order to facilitate the transition to a biobased society. Methods commonly used within the field of systems biology, including omics characterization, genome-scale metabolic modeling, and adaptive laboratory evolution, can be readily deployed in metabolic engineering projects. However, high performance strains usually carry tens of genetic modifications and need to operate in challenging environmental conditions. This additional complexity compared to basic science research requires pushing systems biology strategies to their limits and often spurs innovative developments that benefit fields outside metabolic engineering. Here we survey recent advanced applications of systems biology methods in engineering microbial production strains for biofuels and -chemicals.


Anne Sofie Laerke Hansen, Rebecca M Lennen, Nikolaus Sonnenschein and Markus J Herrgård

Introduction
One of the key challenges in the 21st century is to identify technical solutions for the transition away from a petrochemical-based economy. The production of chemicals and fuels from renewable feedstocks in a commercially and ecologically sustainable fashion is a central component of these solutions [1]. A handful of bio-based chemicals and fuels have already been commercialized for industrial scale production including 1,3-propanediol [2], succinic acid [3], and 1,4-butanediol [4]. However, despite microbial fermentation-based production offering multiple advantages over current petrochemical processes, its full implementation has been hampered by difficulties in reaching cost-effective yields from low cost feedstocks [5].
Metabolic engineering offers a systematic workflow for rational cell factory development by overexpression of pathway genes, elimination of byproducts, balancing of cofactors and increasing precursor supply, among other approaches [6,7]. Recent advances in the field of synthetic biology, such as the development of the CRISPR/Cas9 system [8,9] and other genome editing tools, have increased the pace and ease with which microbial cell factories can be built [10-12]. However, the number of obvious gene targets for optimization is limited, and genetic manipulations often lead to unintended effects due to complex genotype-phenotype relationships [6]. Facilitated by the emergence of high-throughput technologies like next-generation sequencing and quantitative proteomics, systems biology offers several methods to unravel the complexity of microbial metabolism and physiology.
The scope of systems biology is to investigate biological systems in a holistic manner to elucidate the mechanisms underlying cellular behavior, in contrast to the classic reductionist approaches where single elements of the system are studied in detail. Similarly, metabolic engineering requires, in addition to manipulation of single enzymes and pathways, engineering of the interactions between the target pathway and endogenous metabolism [6]. In the field of systems biology, quantitative workflows have been developed in recent years to study responses of microorganisms to relatively simple environmental and genetic changes [13,14], together with data- and model-driven approaches for predicting phenotypes [15,16]. These workflows can now be extended to engineered cell factories to understand effects of complex manipulations and to design more robust and efficient production organisms.
Here we review some of the most recent applications of systems biology tools for metabolic engineering of microorganisms for sustainable production of chemicals, with special focus on non-native biofuels and bulk chemicals. We focus on three particular technology platforms that have demonstrated impact in metabolic engineering: omics data collection and analysis, genome-scale models (GEMs) of cellular processes, and adaptive laboratory evolution (ALE). Indeed, the integration of omics and computational techniques, together with the recent possibility to screen, select and fine-tune cellular responses [17,18], holds promise to speed up the systems metabolic engineering approach. In this context, GEMs offer a useful framework for interpretation of collected data as well as formulation and assessment of potential engineering strategies. The application of ALE for systems-level optimization of host robustness and biochemical production, and the subsequent investigation of causal mutations by omics and computational analysis, allows for simultaneous strain improvement and identification of potential targets for further engineering.

Trends in omics characterization for metabolic engineering
The use of the four major omics technologies (transcriptomics, proteomics, metabolomics, and fluxomics) applied to characterize cell systems behavior has increased rapidly in metabolic engineering-related publications since 2010 (Figure 1). This increase has been brought about both by growth of the metabolic engineering and biofuel fields and by improvements in omics technologies. Among the different omics methods, the development of quantitative RNA sequencing has made transcriptomics by far the most commonly used methodology, followed by proteomics and in particular targeted pathway-oriented proteomics (Figure 1b). The use of metabolomics and fluxomics (typically 13C-based) is still relatively rare in metabolic engineering studies, most likely due to both incomplete coverage of metabolites/fluxes, and challenges and/or costs in experimental implementation. The majority of the metabolic engineering studies using omics data focus on common biofuels that are produced natively (e.g., ethanol, n-butanol or fatty acids) or on platform strains without the aim of producing a particular product (Figure 1c). Studies using omics technologies to characterize strains making non-native fine or bulk chemicals are surprisingly rare despite the well-documented ability of omics methods to discover potential bottlenecks in engineered strains [19].
Recent years have seen the emergence of multi-omic characterization studies that often also incorporate a modeling component to study either platform or production strains. Examples of such studies targeting production strains include identification of bottlenecks in terpenoid production in Escherichia coli [20], 3-hydroxypropionic acid production in baker's yeast [21], and L-lysine production in Corynebacterium glutamicum [22]. For platform strains, such studies have included comparisons of multiple possible wild-type host strains [23], in-depth characterization of less-well-studied production hosts [24], and determination of effects of major flux re-routing in central metabolism [25]. Multi-omic characterization has also become one of the key tools in identification of mechanisms of adaptation in ALE studies targeting either general stress or product tolerance [26,27]. In recent years, standard omics data types are increasingly complemented by genome-wide screening of knock-out or knock-down libraries using, for example, transposon insertion sequencing [28,29] or CRISPR/Cas9-based [30] methods. These methods allow identification of targets for further genetic manipulation more directly than other types of omics methods.

Genome-scale models for cell factory design and omics data interpretation
In the context of metabolic engineering, GEMs represent an invaluable tool for estimating theoretical maximum yields (Figure 2), enumerating heterologous production pathways with high yields [31], and predicting physiological changes in redox balance or energy metabolism upon perturbation [32]. Furthermore, numerous methods for computing strain engineering strategies by overexpression, down-regulation, and/or deletion of genes have been developed [33]. Of particular interest in the context of ALE experiments are gene deletions that couple the production of a desired product to growth. These so-called growth-coupled designs [34] can be implemented experimentally using ALE, as the selection for higher growth phenotypes drives the organism into the desired corner of the growth-production phenotype space (Figure 2). Multiple algorithms for the computation of growth-coupled designs have been published [35-37], and in particular the use of metabolic cut sets [38] has been demonstrated to be scalable to larger numbers of knock-outs and has recently been used effectively for the engineering of E. coli for the production of itaconic acid [39].
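The idea of a growth-coupled design can be illustrated with a deliberately tiny calculation. The sketch below is not a GEM and does not reproduce any of the cited algorithms: it works in a two-dimensional (growth, product) flux space, and all numbers, as well as the function names vertices and guaranteed_product_at_max_growth, are hypothetical. The wild type shares a fixed substrate budget between growth and product; the knock-out mutant carries one extra assumed constraint (product flux at least half the growth flux) that stands in for a balance that can only be closed by secreting product. The quantity of interest is the minimum product flux at maximum growth, which is what growth selection in ALE would push the strain towards.

```python
from itertools import combinations

# Each constraint (a, b, c) means: a * growth + b * product <= c.
# All coefficients are assumed toy values, not derived from a real model.
WILD_TYPE = [
    (-1.0, 0.0, 0.0),   # growth >= 0
    (0.0, -1.0, 0.0),   # product >= 0
    (1.0, 1.0, 10.0),   # shared substrate budget: growth + product <= 10
]
# Hypothetical knock-out: product must be at least 0.5 * growth.
KNOCK_OUT = WILD_TYPE + [(0.5, -1.0, 0.0)]


def vertices(constraints, tol=1e-9):
    """Corner points of the feasible polygon defined by the constraints."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel constraint lines never intersect
        growth = (c1 * b2 - c2 * b1) / det   # Cramer's rule
        product = (a1 * c2 - a2 * c1) / det
        if all(a * growth + b * product <= c + tol for a, b, c in constraints):
            points.append((growth, product))
    return points


def guaranteed_product_at_max_growth(constraints):
    """Maximum growth and the minimum product flux compatible with it."""
    pts = vertices(constraints)
    g_max = max(g for g, _ in pts)
    p_min = min(p for g, p in pts if abs(g - g_max) < 1e-6)
    return g_max, p_min


wt_growth, wt_product = guaranteed_product_at_max_growth(WILD_TYPE)
ko_growth, ko_product = guaranteed_product_at_max_growth(KNOCK_OUT)
print(f"wild type: max growth {wt_growth:.2f}, guaranteed product {wt_product:.2f}")
print(f"knock-out: max growth {ko_growth:.2f}, guaranteed product {ko_product:.2f}")
```

In this toy setting the wild type can reach its maximum growth without making any product, whereas the knock-out mutant grows more slowly but cannot reach its own maximum growth without secreting product. In a real project the same envelope would be computed from a GEM with a linear programming solver rather than by enumerating polygon corners.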
Ideally, suitable experimental data can be used to shrink the feasible solution space of GEMs to obtain more reliable predictions (Figure 2). Quantitative analysis of omics data with GEMs, however, has proven to be difficult. While numerous methods have been published for the integration of transcriptomics and proteomics data with GEMs, a comprehensive benchmark of published methods [40] has revealed that none of these methods surpass methods that do not take omics data into account [41] in the quantitative prediction of metabolic fluxes. While this does not preclude the use of GEMs in more qualitative types of omics-data analyses [20,23,42], the lack of ability to accurately predict rates of by-product formation makes the model-guided analysis of transcriptomics and proteomics data in strain engineering projects challenging. New modeling approaches that extend GEMs beyond metabolism provide a platform for direct integration of proteomics and transcriptomics data [43] and can result in improved flux predictions. Furthermore, integrating multi-omic data with both mechanistic and machine learning models that encompass additional cellular systems, for example, transcriptional regulatory networks, could be a further avenue [44].
Significant new modeling method development will be needed to allow interpretation of metabolomics data in the context of GEMs, as these models do not use metabolite concentrations as state variables. Recently, Zelezniak et al. [45] proposed a network-based framework for reconciling transcriptome changes with metabolome changes, highlighting the importance of network context and kinetics. Finally, fluxomics data, that is, the measurement of intracellular fluxes with 13C labeling experiments [46,47] in addition to uptake and secretion rates, likely holds the largest potential in informing metabolic engineering projects, as such data can be incorporated unambiguously as flux constraints into GEMs. The main challenge with fluxomics data is the small number of fluxes that can be directly estimated, requiring the use of methods such as sampling to estimate the remaining fluxes with GEMs.
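How measured fluxes act as constraints, and how sampling fills in the rest, can be sketched on a toy network. Everything below is an assumption made for illustration: the six-reaction network, its bounds, the two "measured" values and the helper names sample_fluxes and mass_balance_residual are hypothetical and not taken from the cited studies. Measured fluxes are fixed, two free fluxes are drawn at random, the dependent fluxes follow from steady-state mass balance, and candidates that violate a bound are rejected. Sampling of real GEMs relies on dedicated algorithms (for example hit-and-run type samplers) rather than this naive rejection scheme.

```python
import random

# Hypothetical toy network: uptake -> A; route_a and route_b: A -> B;
# product, byproduct and biomass all drain B.
REACTIONS = ["uptake", "route_a", "route_b", "product", "byproduct", "biomass"]
S = [
    [1, -1, -1, 0, 0, 0],    # mass balance of metabolite A
    [0, 1, 1, -1, -1, -1],   # mass balance of metabolite B
]
BOUNDS = {name: (0.0, 10.0) for name in REACTIONS}
BOUNDS["route_b"] = (0.0, 4.0)                 # assumed capacity limit
MEASURED = {"uptake": 10.0, "product": 3.0}    # assumed measurements


def sample_fluxes(n, seed=0):
    """Rejection-sample steady-state flux vectors consistent with MEASURED."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        v = dict(MEASURED)
        # Two free fluxes are drawn uniformly within their bounds.
        v["route_a"] = rng.uniform(*BOUNDS["route_a"])
        v["byproduct"] = rng.uniform(*BOUNDS["byproduct"])
        # The remaining fluxes follow from the two mass balances.
        v["route_b"] = v["uptake"] - v["route_a"]
        v["biomass"] = v["route_a"] + v["route_b"] - v["product"] - v["byproduct"]
        if all(BOUNDS[r][0] <= v[r] <= BOUNDS[r][1] for r in REACTIONS):
            samples.append(v)
    return samples


def mass_balance_residual(v):
    """Largest absolute violation of S * v = 0 for one flux vector."""
    return max(abs(sum(c * v[r] for c, r in zip(row, REACTIONS))) for row in S)


samples = sample_fluxes(1000)
ranges = {r: (min(s[r] for s in samples), max(s[r] for s in samples))
          for r in REACTIONS}
for name in REACTIONS:
    low, high = ranges[name]
    print(f"{name:10s} {low:6.2f} .. {high:6.2f}")
```

The printed ranges show which unmeasured fluxes remain poorly determined after the measurements are applied; adding a further measured flux to MEASURED (and removing it from the free fluxes) would narrow them.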
Figure 2. Experimental data integration and growth-coupled design with GEMs. The production envelopes of the initial strain (theoretical: blue, constrained with experimental data: green), and a knock-out mutant strain predicted with genome-scale metabolic modelling (orange). The black dot indicates the maximum theoretical product yield, the green dot indicates the typical initial growth and product yield, whereas the movement of the yellow dot indicates how the growth and product yield of the growth-coupled knock-out production mutant strain is improved by ALE.

Traditional and emerging uses of adaptive laboratory evolution for metabolic engineering
Sparsity of biological knowledge necessitates both the use of omics technologies as a characterization tool, and ALE to determine non-intuitive routes to improve strain robustness and production metrics. In a typical ALE experiment (Figure 3a), a laboratory selection pressure is maintained (in either batch cultures or a chemostat) to select for cells with better growth, which is typically acquired through spontaneous mutations. Ultimately, individual isolates or populations with improved growth under the selection condition are whole-genome resequenced to determine the acquired mutations. Evolved isolates can then be used directly as production strains, or selected mutations can be reintroduced into production hosts to generate the desired phenotype.
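The selection step of a serial-passaging ALE experiment can be sketched with a minimal deterministic model. The sketch assumes two subpopulations growing exponentially for a fixed time per passage, followed by a dilution that preserves their proportions; it ignores genetic drift, newly arising mutations and competition between several beneficial clones. The function name passages_to_takeover and all parameter values are hypothetical.

```python
import math


def passages_to_takeover(mu_wt, mu_mut, f0, hours_per_passage,
                         threshold=0.99, max_passages=1000):
    """Number of passages until the mutant fraction reaches the threshold.

    mu_wt, mu_mut: specific growth rates (1/h) of parent and mutant.
    f0: initial mutant fraction. Returns None if the threshold is not
    reached within max_passages.
    """
    f = f0
    for passage in range(1, max_passages + 1):
        # Exponential growth of both subpopulations during one passage.
        wt = (1.0 - f) * math.exp(mu_wt * hours_per_passage)
        mut = f * math.exp(mu_mut * hours_per_passage)
        # Dilution into fresh medium keeps the proportions unchanged.
        f = mut / (wt + mut)
        if f >= threshold:
            return passage
    return None


# Assumed example: a mutant growing 10% faster, starting at one cell in a million.
n_passages = passages_to_takeover(mu_wt=0.5, mu_mut=0.55, f0=1e-6,
                                  hours_per_passage=10.0)
print(f"passages until the mutant makes up 99% of the culture: {n_passages}")

# Without a growth advantage the mutant never takes over in this model.
no_advantage = passages_to_takeover(mu_wt=0.5, mu_mut=0.5, f0=1e-6,
                                    hours_per_passage=10.0)
print(f"same calculation without a growth advantage: {no_advantage}")
```

In this model the log-odds of the mutant fraction rise by (mu_mut - mu_wt) times the passage duration at every transfer, which is why even a modest growth advantage leads to takeover within a few dozen passages, whereas a mutation with no growth benefit is not enriched at all.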
Because of the requirement of a growth selection, more traditional ALE studies (Figure 3b) relevant for biofuel and chemical production applications have focused on wild-type strains challenged with more direct effectors of growth. ALE of E. coli on minimal glucose [48] and minimal glycerol media [49] has resulted in the identification of numerous key regulatory and metabolic mutations. Two of these regulatory mutations in RNA polymerase were further studied by structural modeling, transcriptomic, and metabolomic analyses to determine a novel trade-off mechanism for growth in constant versus fluctuating environments [50]. Both E. coli and Saccharomyces cerevisiae have been evolved for thermotolerance [26,51,52] and osmotolerance [53,54], which are beneficial traits for economical production of biofuels and bulk chemicals. Product tolerance is also of key importance for reaching economically relevant product titers. Recent examples employing ALE include detailed functional investigation of an evolved ethanol-tolerant E. coli [55], generation of an octanoic acid tolerant mutant with reduced cell lysis and improved free fatty acid production [56], and isolation of 3-hydroxypropionate tolerant S. cerevisiae mutants with causal mutations related to detoxification of a toxic aldehyde byproduct [27].
Figure 3. Adaptive laboratory evolution (ALE). (a) A typical ALE experiment consists of maintaining a selective pressure through either serial passaging of batch cultures or using a chemostat, and whole-genome resequencing of isolates or populations. (b) Traditional ALE applications include simple growth selections on feedstocks containing inhibitory substrates or components, substrates other than those typically utilized by the strain, exogenously added toxic products, or general stress conditions present in industrial fermentation. More pioneering ALE technologies as applied to microbial production of bulk chemicals and fuels include engineered strains in which product formation is growth-coupled either directly (e.g., by requiring the product for biomass production) or indirectly (e.g., by providing non-optimized pathways to balance redox potential) (left), or that utilize product-responsive biosensors (right) that are employed to produce components of biomass, to negatively select against non-producers, or to produce fluorescent reporter proteins that enable iterative rounds of cell sorting.
The use of carbon feedstocks such as CO2, or sugars found in lignocellulosic hydrolysates (containing significant proportions of C5 sugars such as xylose and arabinose), is preferred in biofuel and bulk chemical applications because of tight economic constraints and the need to reduce net CO2 emissions. While many organisms exist that can naturally utilize these substrates, it is often desired to apply well-studied, easy-to-engineer model organisms such as E. coli and S. cerevisiae. The development of pentose-fermenting S. cerevisiae strains has been an intense area of study for the past two decades [57]. ALE has also been employed to improve two remaining troublesome aspects, C6 and C5 sugar co-utilization [58] and C5 transport [59], in strains expressing heterologous xylose utilization pathways. CO2 fixation by E. coli into biomass has been achieved by heterologously expressing RuBisCo and phosphoribulokinase, eliminating carbon flow from glycolysis into the TCA cycle, and performing ALE to improve growth of the resulting strain supplied with exogenous pyruvate while reducing xylose feeding, which was originally required [60]. Another example is the evolution of E. coli and S. cerevisiae strains harboring heterologous pathways for synthetic nitrogen and phosphorus sources, with ALE performed to further improve utilization [61]. The use of synthetic nutrient sources could offer economic advantages due to reduced reactor sterilization costs or reduced antibiotic supplementation.
Enabling product formation to be a selectable phenotype through either growth-coupled designs (see above), or the use of biosensors within synthetic gene circuits or coupled with fluorescence-activated cell sorting (FACS), is a clear direction of much future work (Figure 3b). One pioneering study enhanced L-valine production in C. glutamicum using cells expressing an L-valine responsive biosensor driving a fluorescent reporter, where cells were sorted over subsequent rounds of growth [62]. While ALE has not yet been employed, directed evolution has been performed at the pathway and protein level coupled with the use of synthetic suicide riboswitches [63] or toggled selection schemes where both negative and positive selection can be iteratively applied to a sensor selector to isolate cells with improved production [64].

Conclusions
Technologies that have been introduced in the field of systems biology (omics characterization, GEMs, ALE) have been used extensively in engineering microbial cell factories for production of chemicals and fuels. Recent years have seen an increase in studies that use a broader range of these technologies at once in order to study wild-type platform or engineered strains. Much still remains to be done in order to allow rapid iterative development of cell factories based on systems strategies. The cost of omics data collection and analysis needs to be further reduced. GEMs need to be expanded and modeling methods need to be developed to use quantitative omics data and make more accurate predictions of genetic manipulation targets. In general, improved phenotypic predictions from genotypes, especially for large numbers of simultaneous genetic perturbations, will require development of methods that integrate mechanistic modeling with machine learning. Novel selection strategies need to be devised to allow routine use of ALE for optimizing metabolite production. All of these developments, together with improved genome editing and other synthetic biology methods, have the potential to significantly increase the speed at which new cell factories are developed.