Yeast synthetic biology advances biofuel production

Increasing concerns about environmental impacts and global warming call for an urgent switch from fossil fuels to renewable technologies. Biofuels are attractive alternatives to fossil fuels and have attracted continuous attention. Synthetic biology has made it possible to engineer microbial cell factories for biofuel production in a more precise and efficient manner. Here, we review advances in yeast-based biofuel production. Following an overview of the impact of synthetic biology on biofuel production, we review recent advances in the design, build, test and learn steps of yeast-based biofuel production, and end with a discussion of the challenges associated with using synthetic biology to develop novel processes for biofuel production.


Introduction
The rapid increase in greenhouse gas (GHG) emissions due to the extensive use of fossil resources has necessitated the production of renewable energy to sustain current economic activities while reducing net carbon dioxide emissions. Compared with other renewable energies, including solar, wind, tidal, thermal and hydro energy, biofuels produced in biorefineries release their energy through combustion; they are therefore storable and compatible with the current fossil fuel infrastructure. Many researchers, as well as the Intergovernmental Panel on Climate Change (IPCC), have envisaged a major role for biofuels in both replacing fossil fuels and mitigating climate change [1]. However, owing to fluctuating prices of fossil resources and the low throughput of bioengineering, the commercialization of biofuels other than bioethanol remains challenging.
Synthetic biology aims to integrate biology, mathematics, chemistry, biophysics and automation to construct synthetic enzymes, circuits, pathways, chromosomes and organisms in a systematic, modular and standardized fashion [2]. It has particularly advanced biofuel production by accelerating strain engineering, yielding prototype strains that can be evaluated for industrial production. The repertoire of synthetic biology and automation is revolutionizing the current biofuel production pipeline and ushering in a new era of biorefineries. For example, using synthetic biology biofoundries, Casini et al. constructed 1.2 Mb of synthetic DNA, built 215 strains spanning Saccharomyces cerevisiae, Escherichia coli and three Streptomyces species, and established two cell-free systems, all within three months [3]. The yeast S. cerevisiae is a widely used chassis with many available synthetic biology tools and a long history in biofuel production, in particular ethanol production, and has therefore been evaluated for the bioproduction of a range of chemicals [4]. Herein, we discuss recent achievements in the design-build-test-learn (DBTL) cycle for biofuel production by S. cerevisiae, and outline challenges and future research directions for advancing biofuel production.

Design for biofuel production
The design stage of synthetic biology involves model construction [5], data mining [6], the sequence design of synthetic promoters [7], terminators [8] and enzymes [9], the metabolic design of pathways and metabolic networks [10], and the process design of cell production and fermentation [11] (Figure 1).
With the large amounts of omics and biofoundry data available, model construction tools have been developed, including COBRA for constructing constraint-based biochemical models [12] and FluxML for constructing 13C metabolic flux analysis models [13]. Moreover, Parts-Genie is an open-source online tool for optimizing synthetic biology parts that bridges design, optimization, application, storage algorithms and databases [14]. MAPPs can be used to map reference networks into a graph and search for the shortest pathways between two metabolites [15]. novoPathFinder can be used to design pathways based on stoichiometric networks under specific constraints [16]. The robot programming language PR-PR can be used to standardize and share procedures among biofoundries and to ease communication between protocols and equipment [17].
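MAPPs is described above as mapping reference networks into a graph and searching for the shortest pathways between two metabolites. The sketch below illustrates only that general idea with plain breadth-first search over a small, hand-written network; the network, the metabolite names and the function `shortest_pathway` are hypothetical and do not reflect MAPPs' actual data model or interface. Several glycolytic and fatty acid synthesis steps are lumped into single edges for brevity.

```python
from collections import deque

# Hypothetical toy network: each metabolite maps to the metabolites reachable
# from it in one (possibly lumped) enzymatic step.
NETWORK = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": ["pyruvate"],  # lumped lower glycolysis
    "pyruvate": ["acetaldehyde", "acetyl-CoA"],
    "acetaldehyde": ["ethanol"],
    "acetyl-CoA": ["malonyl-CoA"],
    "malonyl-CoA": ["fatty acyl-CoA"],  # lumped fatty acid synthesis
}

def shortest_pathway(network, source, target):
    """Breadth-first search: return the metabolites on a route with the fewest
    reaction steps from source to target, or None if target is unreachable."""
    queue = deque([source])
    parent = {source: None}  # also serves as the visited set
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:  # walk back through the parents
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in network.get(current, []):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None

print(shortest_pathway(NETWORK, "glucose", "ethanol"))
# ['glucose', 'glucose-6-phosphate', 'fructose-6-phosphate', 'pyruvate', 'acetaldehyde', 'ethanol']
```

Unweighted search like this treats every step as equal; practical pathfinding tools typically also weight edges and exclude highly connected currency metabolites such as ATP or NADH, which would otherwise create biologically meaningless shortcuts.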
The conversion of feedstocks into biofuels is a nonlinear, multiscale process in which mass conservation, the supply of building blocks, energy and cofactors, thermodynamic feasibility, enzyme kinetics, cell growth, biofuel production, biofuel transport and stress responses all need to be balanced and optimized. In the context of automation-aided synthetic biology, model-based analysis has been used extensively for pathway prediction, resource allocation, metabolism characterization and optimization to improve biofuel production [18]. Various models have been constructed, such as genome-scale metabolic models (GEMs), kinetic models and coarse-grained cell models for designing resource allocation [19]. Recently, a whole-cell model, WM_S288C, was constructed by expanding the yeast GEM to cover 15 cellular states, such as RNA, protein, metabolites and cell geometry, as well as 26 cellular processes, such as replication, formation, consumption, interaction and transport [20]. This model has been shown to simulate the real-time cellular landscape on a 1 s time-scale [20]. Moreover, Yang et al. established a complex metabolic reaction set by integrating the natural reaction database MetaCyc and the non-natural reaction database ATLAS, and used a combined calculation algorithm to mine and design one-carbon compound utilization pathways [21]. Through the evaluation of kinetic traps, mining of new enzymes and optimization of thermodynamics, a pathway with a carbon utilization rate of 88% was constructed in vitro [21].
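Constraint-based models such as GEMs encode mass conservation as the steady-state constraint S·v = 0, where S is the stoichiometric matrix (metabolites × reactions) and v is the flux vector. The sketch below checks that constraint for a hypothetical, heavily lumped glucose-to-ethanol network; the metabolite and reaction names and the flux values are illustrative assumptions. A full flux balance analysis would additionally impose flux bounds and use a linear-programming solver (as in the COBRA tools) to choose, among all balanced flux vectors, one that maximizes an objective such as growth or biofuel secretion; that optimization step is omitted here.

```python
# Hypothetical toy network; CO2 is treated as external and not balanced.
METABOLITES = ["glucose_int", "pyruvate", "ethanol_int"]
REACTIONS = ["uptake", "glycolysis", "fermentation", "export"]
S = [
    # uptake, glycolysis, fermentation, export
    [1, -1, 0, 0],   # glucose_int: made by uptake, consumed by glycolysis
    [0, 2, -1, 0],   # pyruvate: 2 per glucose, consumed by fermentation
    [0, 0, 1, -1],   # ethanol_int: made by fermentation, removed by export
]

def net_production(S, v):
    """Return S . v, the net production rate of each internal metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """True if no internal metabolite accumulates or is depleted."""
    return all(abs(x) <= tol for x in net_production(S, v))

balanced = [10, 10, 20, 20]    # illustrative fluxes, e.g. mmol/gDW/h
unbalanced = [10, 10, 10, 10]  # fermentation too slow: pyruvate piles up
print(net_production(S, balanced), is_steady_state(S, balanced))      # [0, 0, 0] True
print(net_production(S, unbalanced), is_steady_state(S, unbalanced))  # [0, 10, 0] False
```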

Build strains for biofuel production
The build stage of synthetic biology involves DNA assembly, genome editing, genome regulation and automation (Table 1). Recently developed automation platforms have substantially accelerated strain construction, but automation requires technologies that are simple, modular, multiplexable and efficient.
[Figure: Yeast synthetic biology and biofuel production (Current Opinion in Microbiology).]
Automation-friendly DNA assembly tools include methyltransferase-assisted BioBrick, which uses a site-specific DNA methyltransferase together with endonucleases and allows consecutive constructions without gel purification [22]; Golden Gate, which uses Type IIS restriction enzymes and can assemble 24 fragments in a single reaction [23]; Twin-Primer Assembly (TPA), an enzyme-free in vitro DNA assembly method that can assemble 10 fragments and is insensitive to junction errors and GC content [24]; Gibson and NEBuilder assembly, homology-based in vitro methods that can clone large DNA parts with high GC content [25]; Ligase Cycling Reaction (LCR), which employs bridging oligonucleotides to provide overlaps and allows automated assembly in consecutive steps [26]; and yeast in vivo assembly, which relies on the high homologous recombination efficiency of S. cerevisiae [27]. Many of these DNA assembly tools have already been used in automation. For example, Q-metric has been developed to standardize automated DNA assembly methods and to compute suitable robotic assembly practices, metrics and protocols based on output, cost and time [28]. Amyris Inc. used transformation-associated recombination (TAR)-based biofoundries to assemble 1,500 DNA constructs per week with fidelities above 90% [29].
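To make the Golden Gate principle concrete: a Type IIS enzyme cuts outside its recognition site and leaves short overhangs (4 nt for commonly used enzymes such as BsaI), and parts ligate in the order dictated by matching overhangs. The sketch below uses hypothetical part names, overhangs and helper functions (`assembly_order`, `overhang_problems`) to derive the assembly order and to flag overhangs that are palindromic (risk of self-ligation) or that pair with another overhang's reverse complement (risk of misligation). It ignores enzyme choice, internal restriction sites and ligation fidelity data, which real design tools also consider.

```python
# Hypothetical parts as (name, 5' overhang, 3' overhang), written on the top strand.
PARTS = [
    ("terminator", "GCTT", "CGCT"),
    ("promoter", "GGAG", "AATG"),
    ("cds", "AATG", "GCTT"),
]
VECTOR = ("GGAG", "CGCT")  # hypothetical overhangs of the digested destination vector

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def overhang_problems(parts):
    """Flag palindromic overhangs and overhangs whose reverse complement is also used."""
    overhangs = {oh for _, left, right in parts for oh in (left, right)}
    problems = []
    for oh in sorted(overhangs):
        if oh == revcomp(oh):
            problems.append(f"{oh} is palindromic")
        elif revcomp(oh) in overhangs:
            problems.append(f"{oh} pairs with {revcomp(oh)}")
    return problems

def assembly_order(parts, vector):
    """Chain parts by matching each 3' overhang to the next part's 5' overhang."""
    by_left = {}
    for name, left, right in parts:
        if left in by_left:
            raise ValueError(f"two parts share the 5' overhang {left}")
        by_left[left] = (name, right)
    current, end = vector
    order = []
    while current != end:
        if current not in by_left:
            raise ValueError(f"no part starts with overhang {current}")
        name, current = by_left.pop(current)  # each part can be used only once
        order.append(name)
    if by_left:
        raise ValueError(f"unused parts: {[n for n, _ in by_left.values()]}")
    return order

print(assembly_order(PARTS, VECTOR))  # ['promoter', 'cds', 'terminator']
print(overhang_problems(PARTS))       # []
```

Because the order is fully encoded in the overhangs, parts can be pipetted into one reaction in any order, which is one reason the method suits liquid-handling robots.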
Efficient and multiplexable genome engineering tools include multiplexed genome disruption [30], integration [31], base editing [32], SCRaMbLE [33] and automated genome engineering [34]; details can be found in recent reviews [35]. For example, Zhang et al. reported the efficient GTR-CRISPR system, which simultaneously disrupted six genes in three days and improved yeast production of free fatty acids 30-fold in 10 days [30]. Moreover, exploiting the many loxP sites distributed across the synthetic yeast genome, SCRaMbLE substantially improved yeast tolerance towards ethanol and acetate [33]. Si et al. reported automation-aided genome-scale regulation using overexpression and knockdown cDNA libraries, and successfully screened mutants for cellulase expression and isobutanol production [36].
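SCRaMbLE relies on Cre recombinase acting on the symmetric loxP sites (loxPsym) engineered into the synthetic yeast genome, producing rearrangements such as deletions, inversions and duplications. The sketch below is a deliberately simplified model that applies only random deletions and inversions to a short region of hypothetical genes flanked by recombination sites, to show how repeated inductions yield a diverse pool of genotypes for subsequent screening, for example for ethanol tolerance. It does not model recombination frequencies, duplications, translocations or cell viability, and all names are hypothetical.

```python
import random

# Hypothetical region: genes as (name, orientation), orientation +1 or -1.
# Site k (k = 0 .. len(segments)) is a recombination site just before segment k.
REGION = [("geneA", 1), ("geneB", 1), ("geneC", 1), ("geneD", 1)]

def delete(segments, i, j):
    """Recombination between sites i < j excises segments i .. j-1."""
    return segments[:i] + segments[j:]

def invert(segments, i, j):
    """Recombination between sites i < j inverts segments i .. j-1."""
    flipped = [(name, -orient) for name, orient in reversed(segments[i:j])]
    return segments[:i] + flipped + segments[j:]

def scramble(segments, n_events, rng):
    """Apply n random deletion or inversion events (simplified SCRaMbLE)."""
    for _ in range(n_events):
        if not segments:
            break  # everything deleted; in a cell, losing essential genes is lethal
        i, j = sorted(rng.sample(range(len(segments) + 1), 2))
        event = rng.choice([delete, invert])
        segments = event(segments, i, j)
    return segments

# One simulated induction per "cell" gives a pool of different genotypes to screen.
pool = [scramble(REGION, n_events=3, rng=random.Random(seed)) for seed in range(5)]
for genotype in pool:
    print(genotype)
```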

Test strains for biofuel production
The test stage of synthetic biology involves cell culture, cell sorting and cell analysis, and automation imposes specific requirements on the test workflow.
The design, build, test, learn steps of yeast-based biofuel production Liu, Wang and Nielsen 35

Table 1
Tool or strategy | Description | Product | Reference
GTR-CRISPR | A method that simultaneously disrupts six genes in three days and improves yeast production of free fatty acid by 30-fold in 10 days | Free fatty acid | [30]
Laboratory evolution | The strategy involves a repeated liquid nitrogen freeze-thaw process coupled with multi-stress shock selection | Bioethanol | [56]
Test stage
Quorum sensing-based biosensor | The sensor can turn on the expression of specific genes when the cell biomass accumulates | Ethanol | [57]
Metabolite-based biosensor | The medium-chain fatty acid (MCFA)-responsive promoters can be used in dynamic regulation of fatty acids and fatty acid-derived products in Saccharomyces cerevisiae | |
Systems biology (multi-omics) analysis | Through this multi-omics study, effects of fatty alcohol production on the host metabolism have been discovered; this knowledge can guide further strain improvement towards the production of fatty alcohols | Fatty alcohol | [61]
Machine learning | A tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system | Limonene, bisabolene and dodecanol | [62]
Constraint-based model | By implementing SLIMEr (a formalism for correctly representing lipid requirements in genome-scale metabolic models (GEMs) using commonly available experimental data) on the consensus GEM of S. cerevisiae, accurate amounts of lipid species can be represented, the flexibility of the resulting distribution can be analyzed, and the energy costs of moving from one metabolic state to another can be computed | Lipid | [63]

For cell cultivation, deep-well 96-well plates allow high-throughput cultivation with global control of temperature and oxygen availability. Moreover, the BioLector system integrated with robotics allows real-time control of cell growth, pH and dissolved oxygen [37]. High-throughput cell sorting often requires phenotypes that can be correlated with cell growth, such as substrate utilization and host robustness, or phenotypes that can be read out through fluorescence emission. Equipment such as plate readers, microfluidics and fluorescence-activated cell sorting can be used for high-throughput cell screening. Biosensors are often developed and used to convert screening targets into easily detected fluorescence phenotypes (Table 1). Currently used biosensors can be categorized as riboswitch-based, reporter protein-based and transcription factor-based biosensors [38,39]. For example, Dabirian et al. developed a FadR-based biosensor and demonstrated that overexpression of RTC3, GGA2 and LPP1 could enhance fatty acyl-CoA production by 80% [40]. Baumann et al. developed a biosensor based on the octanoic acid-responsive PDR12 promoter, and demonstrated that overexpression of KCS1 and FSH2 could enhance production of the medium-chain fatty acid octanoic acid by 55% [41].
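A common simplification for a transcription factor-based biosensor is a Hill-type dose-response curve linking intracellular metabolite concentration to fluorescence, with a fluorescence gate then selecting high producers as in cell sorting. The sketch below assumes exactly that; all parameter values, variant IDs and concentrations are hypothetical and are not taken from the FadR or PDR12 sensors cited above.

```python
# Hypothetical biosensor parameters (illustrative only).
F_MIN, F_MAX = 100.0, 2000.0  # basal and saturated fluorescence (a.u.)
K_HALF = 0.5                  # concentration giving half-maximal response (mM)
HILL_N = 2.0                  # Hill coefficient (steepness of the response)

def sensor_output(conc_mM):
    """Fluorescence predicted by a Hill-type dose-response curve."""
    activation = conc_mM**HILL_N / (K_HALF**HILL_N + conc_mM**HILL_N)
    return F_MIN + (F_MAX - F_MIN) * activation

def sort_cells(cells, gate):
    """Mimic a sorting gate: keep variant IDs whose fluorescence exceeds `gate`."""
    return [cell_id for cell_id, conc in cells if sensor_output(conc) > gate]

# Hypothetical variants with their intracellular product concentrations (mM).
library = [("v1", 0.1), ("v2", 0.5), ("v3", 2.0)]
print(f"dynamic range: {F_MAX / F_MIN:.0f}-fold")  # dynamic range: 20-fold
print(sort_cells(library, gate=1000.0))            # ['v2', 'v3']
```

In this model, producers are only distinguishable within roughly the concentration window where the curve rises; a wider dynamic range (larger F_MAX/F_MIN) makes the gate easier to place.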
If the phenotype cannot be correlated with cell growth or a fluorescence readout, conventional chromatographic, spectroscopic or mass spectrometric measurements must be used; these generally take more than 20 min per sample and are therefore poorly compatible with high-throughput screening and automation. Researchers have thus focused on developing advanced analytical technologies and platforms. For example, Fialkov et al. reported an LTM-LPGC-MS technology that measures fatty acid methyl esters in less than 1 min per sample [42]. Similarly, Xue et al. reported a colony-based screening method using MALDI-ToF-MS that allows rapid profiling of medium-chain fatty acids at 2 s per sample [43].
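The per-sample times quoted above translate directly into daily capacity. The arithmetic below assumes a single instrument running 24 h a day without downtime, which is an idealization; because the text gives "more than 20 min" and "less than 1 min", the first figure is an upper bound and the second a lower bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # idealized continuous operation, no downtime

def samples_per_day(seconds_per_sample):
    """Whole samples one instrument can process per day at a fixed time per sample."""
    return SECONDS_PER_DAY // seconds_per_sample

METHODS = [
    ("conventional chromatography (20 min)", 20 * 60),
    ("LTM-LPGC-MS (1 min)", 60),
    ("MALDI-ToF-MS colony screening (2 s)", 2),
]
for name, seconds in METHODS:
    print(f"{name}: {samples_per_day(seconds):,} samples/day")
# conventional chromatography (20 min): 72 samples/day
# LTM-LPGC-MS (1 min): 1,440 samples/day
# MALDI-ToF-MS colony screening (2 s): 43,200 samples/day
```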

Learnings on engineered strains
The learn stage of synthetic biology involves systems biology analysis [44] and machine learning [45] (Table 1). Automation platforms can generate massive amounts of data that need to be analyzed and fed back into the design stage through standardized procedures, to refine the models and guide subsequent iterative DBTL cycles (Figure 2). Jayakody et al. performed laboratory evolution and systems analysis and suggested that macromolecule protection and detoxification mechanisms are required to alleviate aldehyde toxicity [46]. Yu et al. reprogrammed yeast metabolism from alcoholic fermentation to lipogenesis through systems and synthetic biology engineering, and improved the production of free fatty acids to 30 g/L [47]. Hohenschuh et al. developed a dynamic flux balance model integrated with mRNA abundance data and suggested that the anaplerotic glyoxylate pathway is key to improving ethanol production from xylose [48]. To facilitate learning, the Global Biofoundries Alliance was established in 2019 to share knowledge and resources among laboratories and non-commercial biofoundries [49].
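Dynamic flux balance analysis couples a flux balance model to the time course of a batch culture: at each time step, substrate uptake is bounded by a kinetic expression, fluxes are computed, and biomass and extracellular concentrations are integrated forward. The sketch below follows that static-optimization scheme with explicit Euler steps but, as a stated simplification, replaces the per-step linear program with fixed biomass and ethanol yields on glucose. All parameter values are hypothetical, and this is not the model of Hohenschuh et al., which additionally integrated mRNA abundance data.

```python
# Hypothetical kinetic and yield parameters (illustrative only).
V_MAX = 10.0       # maximal glucose uptake rate, mmol/gDW/h
K_M = 0.5          # Michaelis constant for glucose uptake, mmol/L
Y_BIOMASS = 0.015  # gDW biomass formed per mmol glucose taken up
Y_ETHANOL = 1.6    # mmol ethanol per mmol glucose (theoretical maximum is 2)

def simulate(biomass, glucose, ethanol=0.0, dt=0.01, hours=12.0):
    """Explicit-Euler batch simulation; returns (t, biomass gDW/L, glucose mM, ethanol mM) tuples."""
    if biomass <= 0:
        raise ValueError("initial biomass must be positive")
    t = 0.0
    trajectory = [(t, biomass, glucose, ethanol)]
    for _ in range(round(hours / dt)):
        # 1. Kinetic bound on uptake, capped so one step cannot consume more glucose than remains.
        uptake = V_MAX * glucose / (K_M + glucose)
        uptake = min(uptake, glucose / (biomass * dt))
        # 2. "Flux solution": real dFBA solves an LP here; fixed yields stand in for it.
        growth_rate = Y_BIOMASS * uptake   # 1/h
        ethanol_flux = Y_ETHANOL * uptake  # mmol/gDW/h
        # 3. Integrate concentrations and biomass over one step (all using the old biomass).
        glucose = max(glucose - uptake * biomass * dt, 0.0)
        ethanol += ethanol_flux * biomass * dt
        biomass += growth_rate * biomass * dt
        t += dt
        trajectory.append((t, biomass, glucose, ethanol))
    return trajectory

traj = simulate(biomass=0.1, glucose=111.0)  # about 20 g/L glucose
t_end, x_end, glc_end, etoh_end = traj[-1]
print(f"t={t_end:.1f} h  biomass={x_end:.2f} gDW/L  glucose={glc_end:.1f} mM  ethanol={etoh_end:.1f} mM")
```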
Machine learning and quantitative biology based on constraint-based models have also made considerable progress in identifying correlations between genotype and phenotype [50]. Various machine learning techniques have been developed to analyze the massive amounts of data, including unsupervised learning and dimensionality reduction [51]. Radivojević et al. developed an automated recommendation tool based on machine learning and probabilistic modeling techniques, and used it to improve the production of limonene, bisabolene and dodecanol [45]. Moreover, automated learning technologies are particularly important for realizing the iterative engineering of microbial cell factories in automated workflows. To address this need, HamediRad et al. developed BioAutomata, a fully automated platform that integrates machine learning algorithms with the iBioFAB robotic system [52]. As a compelling proof of concept, this system can automatically guide iterative DBTL cycles that accumulate beneficial modifications for bioproduction.
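BioAutomata is reported to close the DBTL loop with Bayesian optimization, using a probabilistic model to decide which designs to build and test next. The sketch below swaps in a much simpler technique, the UCB1 multi-armed bandit algorithm, to show the same explore-versus-exploit logic: each cycle "designs" by selecting the variant with the highest upper confidence bound, "builds and tests" it through a simulated noisy titre measurement, and "learns" by updating that variant's running mean. The design names, true titres and noise level are hypothetical.

```python
import math
import random

# Hypothetical design space with hidden "true" mean titres (g/L);
# only the simulated build-and-test step can sample them.
TRUE_TITRE = {"design_A": 0.2, "design_B": 2.0, "design_C": 1.0, "design_D": 0.5}
NOISE_SD = 0.1  # hypothetical measurement noise (g/L)

def build_and_test(design, rng):
    """Stand-in for the build and test steps: one noisy titre measurement."""
    return TRUE_TITRE[design] + rng.gauss(0.0, NOISE_SD)

def ucb_dbtl(designs, n_cycles, rng, c=1.0):
    """UCB1-style DBTL loop; returns test counts and mean titre per design."""
    counts = {d: 0 for d in designs}
    means = {d: 0.0 for d in designs}
    for _ in range(n_cycles):
        untested = [d for d in designs if counts[d] == 0]
        if untested:
            choice = untested[0]  # Design: try every variant once first
        else:
            total = sum(counts.values())
            # Design: highest upper confidence bound = mean + exploration bonus
            choice = max(
                designs,
                key=lambda d: means[d] + c * math.sqrt(2 * math.log(total) / counts[d]),
            )
        titre = build_and_test(choice, rng)  # Build + Test
        counts[choice] += 1                  # Learn: update the running mean
        means[choice] += (titre - means[choice]) / counts[choice]
    return counts, means

counts, means = ucb_dbtl(list(TRUE_TITRE), n_cycles=200, rng=random.Random(1))
best = max(counts, key=counts.get)
print(best, counts[best], round(means[best], 2))
```

Most cycles end up spent on the best variant while weaker ones are still revisited occasionally; Bayesian optimization plays the same role but can also generalize across related designs instead of treating each one independently.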

Outlook
The rapid development of sequencing and bioinformatics techniques allows mix-and-match pathway design using genes from different organisms, as well as whole-cell analysis and optimization of carbon and nitrogen flux distribution, building block and energy balance, cellular resource allocation, and transcriptional and kinetic cell responses.
Continued development of automation-based DBTL cycles is, however, still necessary, because the cost of developing strains suitable for industrial biofuel production remains high and must be reduced to support the low-cost bio-based production of fuels and chemicals. Automation allows large-scale prototyping and combinatorial analysis of related genetic and process variables with much reduced operational bias, time and human investment [29]. However, many designed pathways and calculated yields have not yet been realized. Future research directions in automated synthetic biology and biofuel production include refining current model predictions through the integration of high-throughput data and machine learning. In addition, constructing constraint-based and kinetic models that incorporate more and more cellular processes, that is, moving towards a whole-cell description, will improve the predictive strength of models and can lead to better design tools. Improving standardization and interoperability among methods and platforms to encourage inter-lab collaboration will also advance build and test tools, thereby enabling faster evaluation of different design strategies. Here, the development of biosensors with broad dynamic ranges that are robust under various conditions is important, and real-time access to omics data will also help guide designs. Finally, better communication and the establishment of common databases and software will make published data more widely available to the research community; here, trends towards making raw data findable, accessible, interoperable and reusable (FAIR) are extremely valuable [29,53].
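Combinatorial analysis quickly outgrows manual workflows because the number of variants is the product of the number of levels of each factor. The sketch below enumerates a hypothetical full factorial design of three promoter strengths for each of three pathway genes plus three fermentation temperatures, giving 3^4 = 81 combinations; the factor names and levels are illustrative assumptions.

```python
from itertools import product

# Hypothetical genetic and process factors with their levels.
FACTORS = {
    "gene1_promoter": ["weak", "medium", "strong"],
    "gene2_promoter": ["weak", "medium", "strong"],
    "gene3_promoter": ["weak", "medium", "strong"],
    "temperature_C": [25, 30, 35],
}

def full_factorial(factors):
    """Enumerate every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

designs = full_factorial(FACTORS)
print(len(designs))  # 81
print(designs[0])    # {'gene1_promoter': 'weak', 'gene2_promoter': 'weak', 'gene3_promoter': 'weak', 'temperature_C': 25}
```

Adding one more three-level factor would raise the count to 243, which is why fractional designs or model-guided sampling are often combined with automated prototyping.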
With these developments, we are confident that synthetic biology will enable the development of more efficient cell factories for biofuel production, leading to more sustainable production of transportation fuels for our society.