An optimized methanol assimilation pathway relying on promiscuous formaldehyde-condensing aldolases in E. coli

Engineering biotechnological microorganisms to use methanol as a feedstock for bioproduction is a major goal for the synthetic metabolism community. Here, we aim to redesign the natural serine cycle for implementation in E. coli. We propose the homoserine cycle, relying on two promiscuous formaldehyde aldolase reactions, as a superior pathway design. The homoserine cycle is expected to outperform the serine cycle and its variants with respect to biomass yield, thermodynamic favorability, and integration with host endogenous metabolism. Even as compared to the RuMP cycle, the most efficient naturally occurring methanol assimilation route, the homoserine cycle is expected to support higher yields of a wide array of products. We test the in vivo feasibility of the homoserine cycle by constructing several E. coli gene deletion strains whose growth is coupled to the activity of different pathway segments. Using this approach, we demonstrate that all required promiscuous enzymes are active enough to enable growth of the auxotrophic strains. Our findings thus identify a novel metabolic solution that opens the way to an optimized methylotrophic platform.


Introduction
Microbial production of commodity chemicals is limited by feedstock availability and cost. Sugars and starches, despite being commonly used, are not ideal microbial feedstocks as their biotechnological utilization directly competes with human consumption, thus eroding food security (Walker, 2009). Furthermore, the expansion of agricultural cultivation comes at the expense of shrinking natural habitats, hence threatening biodiversity (Fitzherbert et al., 2008). The use of lignocellulosic biomass, while avoiding some of these problems, presents other challenges, including heterogeneous composition, difficult processing, and deleterious waste products (Sanderson, 2011). One-carbon compounds provide a favorable alternative as they can be produced at high levels without burdening agricultural production and they represent homogeneous, easy-to-handle microbial feedstocks (Schrader et al., 2009; Takors et al., 2018; Yishai et al., 2016). Methanol is especially interesting as it is completely water miscible, avoiding the mass transfer barriers that constrain the use of gaseous one-carbon compounds (e.g., carbon monoxide and methane). Methanol can be produced at low cost from fossil or renewable methane (Zakaria and Kamarudin, 2016), or it can be produced sustainably and efficiently from CO2 and electrochemically derived hydrogen (Szima and Cormos, 2018).
There has been much recent progress in the metabolic engineering of microorganisms that naturally grow on methanol, e.g., Methylobacterium extorquens (Marx and Lidstrom, 2004; Schada von Borzyskowski et al., 2015). Still, when compared to biotechnological microorganisms, such as Escherichia coli, natural methylotrophs are more difficult to manipulate and engineer (Pfeifenschneider et al., 2017). To alleviate this problem, multiple recent efforts have sought to engineer biotechnological hosts for growth on methanol via one of the naturally occurring methanol assimilation pathways (Wang et al., 2020; Whitaker et al., 2015; Zhang et al., 2017): the ribulose monophosphate (RuMP) cycle (Chen et al., 2018; He et al., 2018; Meyer et al., 2018), the dihydroxyacetone (DHA) cycle (Dai et al., 2017), and the serine cycle (Yu and Liao, 2018). However, these natural pathways might not represent optimal solutions. Better pathways, more efficient in the use of cellular resources and/or more metabolically compatible with the host microorganism, could be designed and implemented. For example, a recent study designed and partially implemented a modified serine cycle in E. coli, where some of the natural reactions were replaced with others that better fit the endogenous metabolism of the host (Yu and Liao, 2018). However, the serine cycle and its modified variants are ATP-inefficient, which results in low biomass and product yields (Claassens et al., 2019). In other studies, new-to-nature reactions were demonstrated, including the formolase reaction that condenses formaldehyde, the direct product of methanol oxidation, to glycolaldehyde or DHA (Lu et al., 2019; Wang et al., 2017; Yang et al., 2019), or hydroxyacyl-CoA lyase that condenses formyl-CoA with formaldehyde to generate glycolyl-CoA (Chou et al., 2019). However, the condensation rate and/or affinity for formaldehyde of these reactions are too low to be physiologically relevant.
https://doi.org/10.1016/j.ymben.2020.03.002 Received 15 January 2020; Received in revised form 26 February 2020; Accepted 6 March 2020
In this study, we aim to upgrade the serine cycle for implementation in E. coli. By applying several design principles (including maximization of biomass yield and thermodynamic favorability as well as replacement of CO2 fixation with formaldehyde assimilation), we devise the homoserine cycle, which relies on two promiscuous formaldehyde-condensing aldolase reactions. We show that this synthetic route is expected to outperform the serine cycle and its modified variants in multiple key metrics. Moreover, even when compared to the RuMP cycle, the most efficient natural methanol assimilation pathway, the homoserine cycle is expected to support higher yields of products that are derived from acetyl-CoA, including ethanol, acetone, butyrate, butanol, citrate, itaconate, 2-ketoglutarate, and levulinic acid. As such, this cycle can outcompete all the natural aerobic methanol assimilation pathways for a wide array of products.
While establishing growth via the complete homoserine cycle is a challenging task that would require extensive further research, here we demonstrate that E. coli native enzymes can promiscuously catalyze all non-natural reactions of the homoserine cycle in a physiologically relevant manner, producing essential cellular building blocks. Overall, our work confirms the feasibility of a novel methanol assimilation pathway that could pave the way for future implementation of highly efficient conversion of this one carbon feedstock into commodity chemicals.

Design of the homoserine cycle
Inspecting the structure of the serine cycle (Fig. 1a and Supplementary Table S1), we identified three key shortcomings: (i) Three of the pathway reactions participate in the central metabolism of the host (the pentose phosphate pathway, glycolysis, and the TCA cycle). This could lead to competition for flux between the pathway reactions and those of central metabolism, thus making regulation highly challenging (Bar-Even, 2016). (ii) The serine cycle is ATP-inefficient, consuming 3 ATP molecules per acetyl-CoA generated. This lowers the biomass and product yields as, rather than being assimilated into biomass, a substantial fraction of the substrate methanol needs to be oxidized to generate the required ATP (Claassens et al., 2019). (iii) For each formaldehyde molecule assimilated by the serine cycle, one CO2 molecule is fixed. While CO2 fixation has the advantage of serving as an electron sink for excess reducing power, it leads to two key problems. First, as carboxylation is thermodynamically unfavorable, it needs to be coupled directly or indirectly to ATP hydrolysis, thus reducing the ATP efficiency of the pathway (Bar-Even et al., 2012). In the case of the serine cycle, glycerate kinase consumes an ATP that is subsequently used to energize carboxylation by phosphoenolpyruvate carboxylase. Second, CO2 fixation must be accompanied by two reduction steps, in order to bring the carbon to the average oxidation state of carbon in biomass. These extra reduction steps require more of the substrate to be completely oxidized to generate NAD(P)H, thus necessitating a delicate regulation between substrate assimilation and substrate oxidation.
We aimed to design a more applicable version of the serine cycle, which overcomes the shortcomings of the natural pathway. Rather than restricting the design to primary enzymatic reactions, we considered also promiscuous enzyme activities, i.e., side reactions. This dramatically expands the metabolic solution space and thus assists in identifying optimal pathway structures (Trudeau et al., 2018). To facilitate pathway implementation, we further aimed to employ only E. coli enzymes, reduce their overall number, and avoid overlap with central metabolism.
Importantly, we aimed for the new pathway to be independent of carboxylation so as to lower the demand for NAD(P)H generation by complete substrate oxidation. A formaldehyde assimilation reaction should ideally replace the existing carboxylation reaction, as, unlike CO2, formaldehyde is already at the average oxidation state of cellular carbon. Similarly, consumption of ATP should be minimized in order to reduce substrate oxidation and increase biomass (and product) yield. Despite the reduction of ATP consumption, the pathway should be at least as thermodynamically favorable as the serine cycle (Noor et al., 2014).
Following a comprehensive literature search of formaldehyde-condensing reactions catalyzed by E. coli native enzymes, we identified the homoserine cycle (Fig. 1b and Supplementary Table S1) as a superior design. In the homoserine cycle, glycine is directly condensed with formaldehyde to generate serine. This serine aldolase (SAL) reaction (Fig. 1b) was previously found to be promiscuously catalyzed (in vitro) by threonine aldolase (LtaE) (Contestabile et al., 2001). The SAL reaction bypasses the very long, multi-cofactor-dependent, and ATP-inefficient route for formaldehyde assimilation to 5,10-methylene-tetrahydrofolate (CH2-THF) (Crowther et al., 2008) (Fig. 1b). As in the previously proposed modified serine cycles (Bar-Even, 2016; Yu and Liao, 2018), serine is then deaminated to pyruvate by serine deaminase (SDA in Fig. 1b), bypassing a longer route via glycerate, which further involves the highly toxic intermediate hydroxypyruvate (Kim and Copley, 2012) (Fig. 1b). Despite being rather oxygen-sensitive, serine deaminase was shown to support high flux under aerobic conditions (Yu and Liao, 2018).
Pyruvate is then condensed with formaldehyde to generate the non-native metabolite 4-hydroxy-2-oxobutanoate (HOB) (Bouzon et al., 2017), which is subsequently aminated to homoserine. The first of these reactions, HOB aldolase (HAL in Fig. 1b), was found to be promiscuously catalyzed by E. coli 2-keto-3-deoxy-L-rhamnonate aldolase (RhmA) (Hernandez et al., 2017) as well as by similar aldolases. The latter reaction, HOB amination (HAT in Fig. 1b), is supported by numerous aminotransferases (Hernandez et al., 2017; Walther et al., 2018; Zhong et al., 2019) as well as amino acid dehydrogenases such as (engineered) glutamate dehydrogenase (Chen et al., 2015). This route effectively replaces a carboxylation reaction (by phosphoenolpyruvate carboxylase) with a formaldehyde assimilation reaction that provides an alternative way to generate a C4 intermediate. Homoserine is then metabolized by homoserine kinase (ThrB, HSK in Fig. 1b) and threonine synthase (ThrC, TS in Fig. 1b) to produce threonine. Finally, threonine is cleaved by threonine aldolase (LTA in Fig. 1b, catalyzed by the same LtaE that catalyzes the SAL reaction) to regenerate glycine and produce acetaldehyde, which can be further oxidized to acetyl-CoA and assimilated into central metabolism.
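The closure of the cycle described above can be checked with simple stoichiometric bookkeeping. The following is a minimal sketch, not the authors' model: metabolite labels are illustrative, protons are omitted, and the amino donor of the HAT step is represented by a generic "amino_donor"/"keto_acid" pair, which is an assumption made here because the text lists several possible aminating enzymes.

```python
from collections import defaultdict

# Stoichiometry of each homoserine-cycle step (negative = consumed, positive = produced).
# Labels are illustrative; protons are omitted. "amino_donor"/"keto_acid" are
# placeholders for whichever amino donor pair serves the HAT step (assumption).
REACTIONS = {
    "SAL": {"glycine": -1, "formaldehyde": -1, "serine": 1},
    "SDA": {"serine": -1, "pyruvate": 1, "NH3": 1},
    "HAL": {"pyruvate": -1, "formaldehyde": -1, "HOB": 1},
    "HAT": {"HOB": -1, "amino_donor": -1, "homoserine": 1, "keto_acid": 1},
    "HSK": {"homoserine": -1, "ATP": -1, "O-phosphohomoserine": 1, "ADP": 1},
    "TS": {"O-phosphohomoserine": -1, "H2O": -1, "threonine": 1, "Pi": 1},
    "LTA": {"threonine": -1, "glycine": 1, "acetaldehyde": 1},
}


def net_stoichiometry(reactions):
    """Sum every reaction once and drop metabolites whose coefficients cancel."""
    totals = defaultdict(int)
    for stoich in reactions.values():
        for metabolite, coeff in stoich.items():
            totals[metabolite] += coeff
    return {m: c for m, c in totals.items() if c != 0}


NET = net_stoichiometry(REACTIONS)

if __name__ == "__main__":
    # Print consumed metabolites first, then produced ones.
    for metabolite, coeff in sorted(NET.items(), key=lambda kv: kv[1]):
        print(f"{metabolite:>12s}: {coeff:+d}")
```

Under these assumptions, all cycle intermediates (glycine, serine, pyruvate, HOB, homoserine, threonine) cancel, leaving two formaldehyde molecules and one ATP consumed per acetaldehyde produced, with the amino group released by SDA and re-supplied at HAT.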

The homoserine cycle could outperform natural methanol assimilation pathways
We compared the homoserine cycle to the serine cycle and the modified serine cycle according to multiple criteria. Assuming formaldehyde as the substrate of the pathways, we find that: (i) the homoserine cycle requires only eight enzymes, which is half the number of enzymes needed for the serine cycle (16) and ~40% fewer enzymes than the modified serine cycle (13, Fig. 1c); also, unlike the other pathways, the homoserine cycle is not dependent on foreign enzymes. (ii) The homoserine cycle consumes a single ATP molecule for the production of acetyl-CoA, while the serine cycle requires 3 ATP molecules and the modified serine cycle needs 4 ATP equivalents (Fig. 1c). (iii) The homoserine cycle does not lead to the net consumption of NAD(P)H, while the other two pathways consume 2 NAD(P)H per acetyl-CoA (Fig. 1c); as mentioned above, this difference can be attributed to the fact that the homoserine cycle replaces a carboxylation reaction with another formaldehyde assimilation reaction. (iv) Despite its low resource consumption, the homoserine cycle is more energetically favorable than the other two pathways, as its Max-min Driving Force (MDF), which represents the minimal thermodynamic driving force via the pathway reactions after optimizing metabolite concentrations within a physiological range (Noor et al., 2014), is substantially higher (Fig. 1c, Supplementary Fig. S1). The higher thermodynamic driving force could assist in pulling methanol oxidation forward, a reaction which is thermodynamically unfavorable and represents a major constraint for establishing synthetic methylotrophy. We further used Flux Balance Analysis (FBA) to calculate the maximal biomass yield with each of the three pathways using methanol as a substrate (Methods). We found the expected biomass yield of the homoserine cycle to be 13% higher than with the serine cycle, while the modified serine cycle supported an even lower yield (Fig. 1d). Moreover, while the operation of the two other pathways requires >50% of the methanol to be completely oxidized to provide NAD(P)H and ATP, only ~6% of methanol is completely oxidized during growth via the homoserine cycle (Fig. 1d). This indicates that the establishment of highly efficient formaldehyde and formate oxidation systems is strictly required for the operation of the serine cycle and its modified variants, while the homoserine cycle could operate without such systems, in which case the biomass yield drops by less than 1%. The ability to bypass formaldehyde and formate oxidation is key for the establishment of this synthetic route, as it circumvents the need for a delicate fine-tuning of the formaldehyde oxidation flux relative to the formaldehyde assimilation flux.
H. He, et al. Metabolic Engineering 60 (2020) 1-13
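The per-acetyl-CoA figures quoted in points (i) to (iii) can be tabulated and compared directly. The sketch below only restates numbers given in the text; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Values quoted in the text (formaldehyde as substrate, per acetyl-CoA produced).
PATHWAYS = {
    "homoserine cycle": {"enzymes": 8, "atp": 1, "nadph": 0},
    "serine cycle": {"enzymes": 16, "atp": 3, "nadph": 2},
    "modified serine cycle": {"enzymes": 13, "atp": 4, "nadph": 2},
}


def fractional_saving(baseline, candidate):
    """Fraction by which `candidate` is smaller than `baseline`."""
    return (baseline - candidate) / baseline


def extra_cost_vs_homoserine(metric):
    """Additional units of `metric` each other pathway needs versus the homoserine cycle."""
    ours = PATHWAYS["homoserine cycle"][metric]
    return {
        name: values[metric] - ours
        for name, values in PATHWAYS.items()
        if name != "homoserine cycle"
    }


if __name__ == "__main__":
    for name in ("serine cycle", "modified serine cycle"):
        saving = fractional_saving(PATHWAYS[name]["enzymes"], 8)
        print(f"enzymes saved vs {name}: {saving:.0%}")
    print("extra ATP per acetyl-CoA:", extra_cost_vs_homoserine("atp"))
    print("extra NAD(P)H per acetyl-CoA:", extra_cost_vs_homoserine("nadph"))
```

The enzyme saving relative to the modified serine cycle evaluates to 5/13, which is the "~40%" stated above.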
It is commonly argued that the RuMP cycle is the most efficient naturally occurring route for methanol assimilation. Indeed, among all aerobic methanol assimilation pathways, the RuMP cycle supports the highest biomass yield (Claassens et al., 2019). We used FBA to compare the maximal yields of various central metabolites, precursors, and chemicals of interest using methanol as a substrate and either the RuMP cycle or the homoserine cycle as assimilation routes (Methods). While the RuMP cycle was able to sustain higher yields of biomass and phosphosugars, the homoserine cycle supported substantially higher yields of compounds derived from acetyl-CoA, including ethanol, acetone, butyrate, 1-butanol, citrate, itaconate (Zhao et al., 2018), 2-ketoglutarate, levulinic acid (Cheong et al., 2016), and adipic acid (Yu et al., 2014) (Fig. 2a). Both pathways supported similar yields of pyruvate and its derivatives, including lactate, isobutanol, and 2,3-butanediol (Fig. 2a).
Fig. 1. [...] Supplementary Table S1. (b) The synthetic homoserine cycle (formaldehyde as feedstock). The homoserine cycle harbors three promiscuous enzyme activities: serine aldolase (SAL), 4-hydroxy-2-oxobutanoate (HOB) aldolase (HAL), and the HOB aminotransferase (HAT). The other pathway reactions include threonine biosynthesis (homoserine kinase, HSK; threonine synthase, TS) and cleavage (threonine aldolase, LTA) as well as serine deaminase (SDA). Note that both SAL and LTA are catalyzed by the same LtaE enzyme. Acetylating acetaldehyde dehydrogenase (ACDH) can convert the direct pathway product acetaldehyde to acetyl-CoA. (c) The homoserine cycle outperforms the natural serine cycle and the previously suggested modified serine cycle (Yu and Liao, 2018) in terms of simplicity (small number of reactions), resource consumption efficiency, and thermodynamic driving force as indicated by the Max-min Driving Force (MDF) criterion (Noor et al., 2014). These analyses were based on formaldehyde as pathway feedstock (Methods). (d) The homoserine cycle is predicted by Flux Balance Analysis to support higher biomass yield on methanol than the natural serine cycle and modified serine cycle. Furthermore, unlike the other pathways, the homoserine cycle is almost independent of the complete oxidation of methanol to provide the cell with reducing power and energy.
Why does the homoserine cycle outcompete the RuMP cycle for so many products? Methanol is a highly reduced carbon source, more reduced than most products. Excess electrons generated during methanol bioconversion are channeled to the respiratory chain and dissipated wastefully. The more electrons are dissipated in this way, the greater the loss in potential yield. The RuMP cycle especially suffers from electron overflow as it does not use an electron sink such as CO2 and further releases CO2 during the oxidative biosynthesis of acetyl-CoA (Bogorad et al., 2014). The homoserine cycle, on the other hand, generates acetyl-CoA directly without the release of CO2. Moreover, while the homoserine cycle is not strictly dependent on carboxylation, it can channel a fraction of its flux via a carboxylation route that serves as a sink for excess reducing power: some of the pyruvate, instead of reacting with formaldehyde, can be converted to homoserine via the carboxylation-dependent anaplerotic route, which serves as an electron sink (grey arrows in Fig. 2b). The flexibility of the homoserine cycle in terms of CO2 use (being generally independent of carboxylation while able to moderately fix CO2 when necessary) is the main reason for the high yields it can support. Furthermore, conversion of acetyl-CoA, the product of the homoserine cycle, into pyruvate and other C≥3 compounds does not involve wasteful decarboxylation: succinate, the product of the glyoxylate shunt, can be reintegrated directly into the homoserine cycle to produce pyruvate (blue arrow in Fig. 2b).
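The statement that methanol is more reduced than most products can be illustrated with the degree of reduction per carbon, computed for uncharged C/H/O compounds as (4C + H - 2O)/C. The sketch below is an illustration added here, not an analysis from the study; the selection of compounds is arbitrary and uses the neutral (protonated) formulas of the acids.

```python
# Elemental composition (C, H, O) of the uncharged compounds; illustrative selection.
FORMULAS = {
    "methanol": (1, 4, 1),
    "formaldehyde": (1, 2, 1),
    "CO2": (1, 0, 2),
    "ethanol": (2, 6, 1),
    "acetone": (3, 6, 1),
    "butyric acid": (4, 8, 2),
    "1-butanol": (4, 10, 1),
    "citric acid": (6, 8, 7),
    "pyruvic acid": (3, 4, 3),
    "lactic acid": (3, 6, 3),
}


def degree_of_reduction_per_carbon(name):
    """Available electrons per carbon atom: (4*C + H - 2*O) / C."""
    c, h, o = FORMULAS[name]
    return (4 * c + h - 2 * o) / c


if __name__ == "__main__":
    # List compounds from most to least reduced.
    ranked = sorted(FORMULAS, key=degree_of_reduction_per_carbon, reverse=True)
    for name in ranked:
        print(f"{name:>14s}: {degree_of_reduction_per_carbon(name):.2f}")
```

By this measure, any product that scores lower than methanol leaves electrons that must be disposed of elsewhere, which corresponds to the surplus discussed in the paragraph above.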
Overall, the homoserine cycle seems to be preferable to its counterparts, with the potential to outperform both the serine cycle and the RuMP cycle for the production of a wide array of value-added chemicals. However, as the synthetic pathway relies on promiscuous enzyme activities, the question remains whether these activities are high enough to be of physiological relevance. In the rest of the paper, we explore the in vivo activities of these reactions.

Demonstration of the in vivo activity of serine aldolase
Most of the reactions of the homoserine cycle correspond to the primary activities of their catalyzing enzymes and thus are less likely to constrain pathway flux. For example, overexpression of serine deaminase, either on a plasmid or from the genome, enabled E. coli to use serine as a sole carbon and energy source (Supplementary Fig. S2). However, three pathway reactions (SAL, HAL, and HAT) correspond to promiscuous activities that, while characterized in vitro, might not be able to support physiologically relevant fluxes. Hence, we aimed to test each of these promiscuous activities in vivo within dedicated gene deletion strains, the growth of which is dependent on the activity of these reactions.
We started by testing the ability of LtaE to catalyze the SAL reaction in vivo (Fig. 3a). Towards this aim, we constructed two strains auxotrophic for glycine and serine. In both strains, the gene encoding 3-phosphoglycerate dehydrogenase was deleted (ΔserA). In one strain, the gene encoding serine hydroxymethyltransferase was also deleted (ΔglyA), while in the other strain the genes of the glycine cleavage system were deleted (ΔgcvTHP). The growth of these strains required the addition of glycine and serine, as the cellular interconversion of these compounds is blocked (Fig. 3b,c).
We reasoned that if the SAL reaction indeed supports physiologically relevant flux, both strains should be able to grow when methanol dehydrogenase (MDH) and LtaE are overexpressed and serine is replaced with methanol in the medium. In the ΔserA ΔglyA strain, the SAL reaction would be responsible for the production of serine (Fig. 3b), which accounts for ~3% of the carbon in biomass (Neidhardt et al., 1990). In the ΔserA ΔgcvTHP strain, the SAL reaction would be responsible for the production of both serine and the cellular C1 moieties (Fig. 3c), together accounting for ~6% of the carbon in biomass (Neidhardt et al., 1990). To avoid formaldehyde oxidation to formate, which might deplete its intracellular pool and constrain its assimilation, the genes encoding the glutathione-dependent formaldehyde oxidation system were also deleted (ΔfrmRAB). Upon overexpression of MDH and LtaE, we observed growth of both selection strains with glucose as the main carbon source and glycine and methanol as precursors of serine (Fig. 3d,e). This indicates that the SAL reaction can operate in vivo at a physiologically significant rate. Expression of only MDH or only LtaE failed to sustain growth, indicating that the native expression of genomic ltaE is too low to support the SAL reaction. The observed growth rate and yield were dependent on the concentration of methanol, where the ΔserA ΔglyA ΔfrmRAB strain had higher rates and yields than the ΔserA ΔgcvTHP ΔfrmRAB strain at low methanol concentrations. This corresponds to our prediction that the latter strain depends on the SAL reaction to provide a higher fraction of the cellular carbons. Methanol concentrations in the range of 200-1000 mM seem to be optimal, supporting growth rates similar to that of the positive control (in which serine was added to the medium). In all experiments, we added 50 μM MnCl2, as Mn2+ is a known cofactor of LtaE (Fesko, 2016).
Without the additional supplementation of MnCl2, we observed lower growth rates and yields (Supplementary Fig. S3).
Fig. 3. [...] is its primary function, while serine aldolase is a promiscuous activity (Contestabile et al., 2001). (b), (c) Two selection schemes for the in vivo activity of the SAL reaction. Carbon sources are shown in purple (glucose not shown) while the formaldehyde moiety is shown in green. (b) ΔfrmRAB ΔserA ΔglyA strain, in which methanol assimilation is required for the biosynthesis of serine. (c) ΔfrmRAB ΔserA ΔgcvTHP strain, in which methanol assimilation is required for the biosynthesis of serine and the cellular C1 moieties. (d), (e) Growth with different concentrations of methanol confirms the activity of the SAL reaction in the strains shown in (b) and (c), respectively. In all cases, 10 mM glucose and 10 mM glycine were added to the medium. Each growth curve represents the average of three replicates, which differ from each other by less than 5%. "PC" corresponds to positive control. "DT" corresponds to doubling time. (f), (g) Labeling pattern of proteinogenic glycine (GLY), serine (SER), threonine (THR), methionine (MET) and histidine (HIS), within the strains shown in (b) and (c), respectively, upon feeding with 13C-methanol as well as unlabeled glucose and glycine.
To confirm the activity of the SAL reaction, we conducted 13C-labeling experiments. We cultivated both strains with 13C-methanol as well as unlabeled glucose and glycine. In the ΔserA ΔglyA ΔfrmRAB strain we found serine to be entirely singly labeled as expected, while the other amino acids were unlabeled (beyond the natural abundance of 13C, Fig. 3f). As threonine and methionine are derived from oxaloacetate, and hence carry a carbon that originates from CO2 (i.e., anaplerotic reactions), their lack of labeling indicates that formaldehyde oxidation to CO2 is negligible, as expected from the deletion of frmRAB. In the ΔserA ΔgcvTHP ΔfrmRAB strain we found serine, methionine, and histidine to be entirely singly labeled (Fig. 3g). Unlike threonine, both methionine and histidine harbor a carbon derived from THF carrying a C1 unit: methyl-THF in the case of methionine and formyl-THF in the case of histidine. The labeling of these amino acids thus indicates that all cellular C1 moieties are derived from methanol. Overall, the labeling results confirm that the SAL reaction provides the sole source of serine in both strains and the sole source of C1 moieties in the ΔserA ΔgcvTHP ΔfrmRAB strain.
Next, we tested whether it is possible to omit glycine from the medium, such that it will be produced endogenously via LtaE-dependent threonine cleavage (Fig. 4a). Towards this aim, we deleted, in both selection strains, the genes encoding threonine dehydrogenase and 2-amino-3-ketobutyrate CoA-ligase (Δkbl-tdh), thus blocking the LtaE-independent route of threonine degradation (Fig. 4a) (Yishai et al., 2017). We found that replacing glycine with threonine did not alter the growth of either selection strain (Fig. 4b, note that growth was still strictly dependent on methanol). To enable this growth, the overexpressed LtaE catalyzes two subsequent reactions, first cleaving threonine to glycine and acetaldehyde and then reacting glycine with formaldehyde to produce serine. Hence, LtaE can be regarded as a glycyltransferase, transferring a glycine moiety from one small aldehyde (acetaldehyde) to another (formaldehyde).
We wondered whether it was also possible to omit threonine from the medium and rely on native threonine biosynthesis to provide this amino acid as a precursor for glycine and serine (Fig. 4a). Indeed, despite showing reduced growth rates, both selection strains were able to grow with only glucose (as main carbon source) and methanol without the addition of glycine or threonine (green lines in Fig. 4b). This indicates that half of the homoserine cycle is active: homoserine (generated natively from aspartate) is metabolized to threonine via HSK and TS, and LtaE then cleaves threonine to glycine (and acetaldehyde) and condenses glycine with formaldehyde to produce serine (Fig. 4a).
To confirm that, also in the absence of externally provided glycine or threonine, all cellular serine is produced from glycine condensation with formaldehyde, we conducted 13C-labeling experiments. The strains were cultured in the presence of labeled or unlabeled methanol as well as glucose labeled at different carbons (glucose-1-13C, glucose-2-13C, and glucose-3-13C). While the labeling pattern of glycine changed according to the labeled carbon of glucose, cultivation with 13C-methanol always resulted in exactly one more labeled carbon in serine than in glycine (Fig. 4c). This unequivocally confirms the methanol-dependent production of serine from glycine when the latter compound is produced internally from homoserine metabolism.
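The "exactly one more labeled carbon" criterion can be expressed as a shift of the mass isotopomer distribution. The following sketch is illustrative only: the glycine distribution is a made-up example rather than measured data, natural 13C abundance is ignored, and the function names are hypothetical.

```python
def add_carbon(mid, labeled_fraction=1.0):
    """Mass isotopomer distribution after attaching one extra carbon.

    `mid[n]` is the fraction of molecules carrying n labeled carbons;
    `labeled_fraction` is the probability that the added carbon is 13C.
    """
    out = [0.0] * (len(mid) + 1)
    for n, fraction in enumerate(mid):
        out[n] += fraction * (1.0 - labeled_fraction)
        out[n + 1] += fraction * labeled_fraction
    return out


def mean_labeled_carbons(mid):
    """Average number of labeled carbons per molecule."""
    return sum(n * fraction for n, fraction in enumerate(mid))


if __name__ == "__main__":
    glycine = [0.6, 0.4, 0.0]  # hypothetical M+0, M+1, M+2 fractions
    serine_13c = add_carbon(glycine, labeled_fraction=1.0)  # 13C-methanol
    serine_12c = add_carbon(glycine, labeled_fraction=0.0)  # unlabeled methanol
    print("serine with 13C-methanol:", serine_13c)
    print("serine with unlabeled methanol:", serine_12c)
    print("shift:", mean_labeled_carbons(serine_13c) - mean_labeled_carbons(glycine))
```

If serine were partly made by another route, the observed serine distribution would deviate from this shifted glycine distribution, which is the logic behind the comparison in Fig. 4c.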
Our results demonstrate the capability of LtaE to convert threonine to serine in vivo by releasing acetaldehyde and assimilating formaldehyde. These findings further confirm the physiologically relevant activity of half of the homoserine cycle, where homoserine metabolism to glycine and serine provided all the biomass requirement of these amino acids as well as cellular C1 moieties.

Demonstration of the in vivo activity of HOB aldolase and transaminase
After demonstrating the methanol-dependent conversion of homoserine to serine, we aimed to demonstrate methanol-dependent conversion of pyruvate to homoserine. To select for the in vivo conversion of pyruvate to homoserine and threonine via HOB production and amination, we constructed a homoserine auxotroph strain: a deletion of the gene encoding for aspartate-semialdehyde dehydrogenase (Δasd) resulted in a strain capable of growing only when homoserine and diaminopimelate (DAP) were added to the medium (Cardineau and Curtiss, 1987). In this strain, homoserine is metabolized to methionine, threonine, and isoleucine, while DAP is metabolized to lysine and peptidoglycans. (We note that despite being formally reversible, homoserine dehydrogenase was not able to oxidize homoserine to aspartate-semialdehyde, the precursor of DAP, and hence the addition of the latter intermediate to the medium was required).
We reasoned that, in the presence of methanol and methanol dehydrogenase, the combined activities of HAL and HAT should enable the Δasd ΔfrmRAB strain to grow without the addition of homoserine to the medium (Fig. 5a). While only a few enzymes were previously shown to catalyze the HAL reaction (Hernandez et al., 2017; Wang et al., 2019), we speculated that similar aldolases could also support this activity, maybe even outperforming those identified before. Hence, we searched for all E. coli enzymes (strain MG1655, using EcoCyc (Keseler et al., 2005)) that are known to catalyze an aldolase reaction with pyruvate as a donor (and which might be able to use formaldehyde as an acceptor). Besides RhmA itself (Hernandez et al., 2017), we found six candidate aldolases: GarL, YagE, YjhH, Eda, DgoA, and MhpE (Fig. 5b). RhmA and GarL belong to the structural family of HpcH while MhpE belongs to the DmpG family. Both of these families are Type II pyruvate aldolases, which use a divalent metal cation for donor binding and enolization (Fang et al., 2019). YagE and YjhH belong to the structural family of DHDPS while Eda and DgoA belong to the KDPG family. These families are Type I pyruvate aldolases, using a lysine residue to form a Schiff base with the donor substrate (Fang et al., 2019). We decided to test all seven aldolases for their ability to catalyze the HAL reaction in vivo. As the HAT reaction is known to be supported by the native aspartate aminotransferase (AspC) (Zhong et al., 2019), a highly expressed protein (Li et al., 2014), and might be further catalyzed by other highly expressed, promiscuous aminotransferases, we hypothesized that no dedicated enzyme overexpression would be required to enable this key reaction.
We found that overexpression of mdh together with rhmA, garL, yagE, or yjhH enabled growth of the Δasd ΔfrmRAB strain when homoserine was replaced with methanol (Fig. 5c). These four aldolase enzymes supported roughly the same growth rates. No growth was observed without methanol, or when methanol dehydrogenase or the aldolase enzymes were overexpressed alone. Growth was not observed when mdh was expressed together with eda, dgoA, or mhpE. The relative sequence similarity between RhmA, GarL, YagE, and YjhH (Fig. 5b and Supplementary Fig. S4) might explain why these enzymes, and not the others, were able to support the HAL reaction. Indeed, the structures of RhmA and GarL are almost identical (Supplementary Fig. S5). As anticipated, growth was possible without dedicated overexpression of an aminotransferase enzyme, presumably since the native high expression level of these enzymes is sufficient to support the promiscuous HAT reaction. Genomic overexpression of aspC or a mutated version of alanine aminotransferase (alaC*, the protein product of which was previously shown to catalyze the HAT reaction (Bouzon et al., 2017)) did not alter growth substantially (Supplementary Fig. S6). Similarly, genomic overexpression of thrBC did not consistently assist growth (Supplementary Fig. S6). This indicates that the HAT, HSK, and TS reactions do not constrain the flux from pyruvate to threonine.
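The screening outcome can be cross-tabulated against the family and mechanistic type assignments given in the text. This is a bookkeeping sketch added for illustration; the tuple layout and function name are arbitrary, and the growth column simply restates the results reported above.

```python
from collections import defaultdict

# (enzyme, structural family, pyruvate aldolase type, supported growth with methanol)
CANDIDATES = [
    ("RhmA", "HpcH", "II", True),
    ("GarL", "HpcH", "II", True),
    ("MhpE", "DmpG", "II", False),
    ("YagE", "DHDPS", "I", True),
    ("YjhH", "DHDPS", "I", True),
    ("Eda", "KDPG", "I", False),
    ("DgoA", "KDPG", "I", False),
]


def outcomes_by(field):
    """Collect the set of growth outcomes seen for each 'family' or 'type'."""
    index = {"family": 1, "type": 2}[field]
    groups = defaultdict(set)
    for row in CANDIDATES:
        groups[row[index]].add(row[3])
    return dict(groups)


if __name__ == "__main__":
    for field in ("family", "type"):
        print(field)
        for key, outcomes in sorted(outcomes_by(field).items()):
            print(f"  {key}: {sorted(outcomes)}")
```

Within this small set, the outcome is uniform inside each structural family but mixed inside each mechanistic type, which is in line with the sequence-similarity explanation offered above; seven enzymes are too few to draw a general rule.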
To confirm that homoserine and its downstream products threonine and methionine are produced from pyruvate and methanol via the HAL and HAT reactions, we performed 13C-labeling experiments. Upon cultivation with unlabeled glucose and 13C-methanol, we found threonine and methionine to be completely singly labeled, whereas lysine and aspartate (serving as controls) were fully unlabeled (beyond the natural abundance of 13C, Fig. 5d). This confirms that homoserine and threonine are completely derived from pyruvate and methanol.
H. He, et al. Metabolic Engineering 60 (2020) 1-13

Our experiments thus demonstrate the capacity of several E. coli endogenous aldolases to generate HOB in vivo and the capacity of the native aminotransferase network to convert HOB into homoserine. These findings further confirm the physiologically relevant activity of the "other half" of the homoserine cycle, where pyruvate metabolism to homoserine and threonine provided the biomass requirement of these amino acids.

Discussion
Different metabolic engineering strategies can be used to establish novel modes of growth and production. In the simplest approach, a metabolic pathway is "copied" from one organism and "pasted" in another, where pathway enzymes are expressed in the new host (Erb et al., 2017). Most studies that try to establish synthetic methylotrophy in heterotrophic hosts follow this strategy by expressing the enzymes of the known methanol assimilation pathways in E. coli (Chen et al., 2018; He et al., 2018; Meyer et al., 2018) and other biotechnological organisms (Lessmeier et al., 2015; Witthoff et al., 2015).
However, this approach is limited to a small number of possible routes, which might not represent an optimal solution (e.g., in terms of resource consumption and biomass yield) and might not interact favorably with the endogenous metabolism of the host (e.g., disrupt key cellular fluxes, necessitate complex regulation, or introduce reactive metabolic intermediates). Alternatively, a mix-and-match of existing enzymes could expand the space of possible pathways: while all the components are known, the combinatorics of their integration can be quite extensive (Erb et al., 2017). The design of the modified serine cycle exemplifies this approach, where key reactions of the natural pathway were replaced with other known reactions in order to optimize pathway activity in the host (Yu and Liao, 2018). While this approach is considerably more flexible than "copy-paste", it is still limited by the repertoire of known enzymes and reactions. In order to fully harness the power of synthetic biology and identify the most promising pathways, it is beneficial to also consider reactions which, while chemically plausible, are not known to be enzymatically catalyzed (Bouzon et al., 2017; Erb et al., 2017). A recent study has demonstrated this approach for the design and in vitro realization of synthetic photorespiration pathways which do not release CO2 and could thus boost photosynthetic carbon fixation (Trudeau et al., 2018). Following such guidelines, previous studies designed novel methanol assimilation pathways, based on direct condensation of formaldehyde molecules (Chou et al., 2019; Lu et al., 2019; Wang et al., 2017; Yang et al., 2019); however, the rates of most of the key reactions remain too low to be of physiological significance.

[Figure legend, fragment] The two selection strains, deleted in the LtaE-independent threonine cleavage system (Δkbl-tdh), can grow with methanol as serine precursor. Glycine is either provided externally (10 mM, blue lines) or produced internally, either from externally added threonine (10 mM, red lines) or from the internal pool of threonine (green lines). The latter growth confirms that LtaE can catalyze the LTA and SAL reactions simultaneously. In all cases, 10 mM glucose and 500 mM methanol were added. Each growth curve represents the average of three replicates, which differ from each other by less than 5%. "PC" corresponds to positive control. "DT" corresponds to doubling time. (c) Labeling pattern of proteinogenic glycine and serine within the strains shown in (b) upon feeding with glucose (glc) labeled at different carbons as well as labeled or unlabeled methanol. This labeling confirms that all cellular serine is produced from glycine and methanol even when glycine is produced internally from threonine biosynthesis and degradation.

[Figure legend, fragment] [...] coli enzymes are known to catalyze an aldolase reaction with pyruvate as a donor, and might be able to react with formaldehyde as an acceptor. The sequence similarity of these enzymes is indicated by the schematic tree to the left. YagE and YjhH can catalyze each other's reactions. Similarly, Eda and DgoA can catalyze each other's reactions. (c) Four aldolases, once overexpressed together with methanol dehydrogenase, support growth of the selection strain. Glucose was added at 10 mM, methanol at 500 mM, diaminopimelate at 0.25 mM, and isoleucine at 1 mM. Each growth curve represents the average of three replicates, which differ from each other by less than 5%. "PC" corresponds to positive control. "DT" corresponds to doubling time. (d) Labeling pattern of proteinogenic methionine (MET), threonine (THR), lysine (LYS), and aspartate (ASP) within the strains shown in (c) upon feeding with unlabeled glucose, diaminopimelate and isoleucine as well as 13C-methanol. The results confirm that all cellular threonine and methionine are derived from HOB aldolase and transaminase reactions.
In this study, we use a similar strategy to identify an optimal methanol assimilation design based on the general structure of the serine cycle. Instead of relying on completely novel reactions, the establishment of which could be highly challenging, we decided to follow a middle-ground approach, where besides known enzymatic reactions we considered characterized promiscuous activities for which at least basic supporting evidence exists.
We identified the homoserine cycle as a superior pathway design. It is short and relies solely on E. coli native enzymes. Still, its reactions are "simple" enough to be catalyzed by native enzymes of many different hosts. The pathway does not overlap with central metabolism, thus bypassing possible regulation difficulties. As its consumption of ATP and NAD(P)H is low, it is expected to support high biomass yield. Importantly, by replacing a carboxylation reaction with another formaldehyde condensation reaction, the homoserine cycle should be able to operate effectively without oxidizing formaldehyde, thus avoiding the difficulty of regulating a flux bifurcation point in which formaldehyde (and formate) can either be fully oxidized or assimilated to biomass. We further showed that the homoserine cycle has the potential to outperform even the RuMP cycle for the production of a wide array of chemicals which are derived from acetyl-CoA. Moreover, whereas the synthetic pathway is highly thermodynamically favorable (MDF > 12 kJ/mol), the RuMP cycle is only moderately favorable (MDF~4 kJ/mol). Overall, the homoserine cycle seems to display clear advantages when compared to all of its natural counterparts.
By using dedicated auxotrophic strains we were able to demonstrate that the promiscuous enzyme activities required for the homoserine cycle are sufficiently high to enable the production of biomass building blocks. Quite surprisingly, we identified four native aldolase enzymes that can catalyze the HAL reaction at a sufficient rate to enable growth of the auxotrophic strain. While dedicated overexpression of the promiscuous aldolase enzymes was necessary to establish the required activity, the needed aminotransferase reaction was supported by endogenously expressed aminotransferases.
The findings of this study provide a firm basis for future experiments towards the establishment of methylotrophic growth via the full homoserine cycle. Protein engineering and evolution can be applied to improve the activity of the promiscuous enzymes and thus enhance pathway kinetics. Specifically, while LtaE activity, which generates 10% of cellular carbon, supports a relatively high growth rate, all aldolases catalyzing the HAL reaction support a relatively low growth rate despite providing only 4% of cellular carbon. Hence, engineering these aldolases for better kinetics with formaldehyde would be highly useful.
Establishing the activity of the full homoserine cycle would probably require long-term adaptive evolution under selective conditions (Gresham and Hong, 2015). The auxotrophic strains used in this study would be useful for such evolution, as formaldehyde assimilation via the aldolase reactions must avoid the biosynthesis of serine and homoserine via other routes, which would thermodynamically and kinetically "push" the aldolases in the reverse direction (i.e., formaldehyde-producing). For example, a strain auxotrophic for both serine and homoserine, overexpressing methanol dehydrogenase and both aldolases, could first be cultivated on a limiting amount of glycine and saturating amounts of pyruvate and methanol. This would select for the emergence of a strain that could grow without glycine, thus combining the activities of both aldolases. The resulting strain could then be cultivated on a limiting amount of pyruvate and a saturating amount of methanol, until a strain capable of growing on methanol as a sole carbon source emerges. For such growth on methanol to arise, a delicate balance between the metabolic fluxes within the homoserine cycle and those that converge to and diverge from the pathway would have to evolve. For example, the fluxes that generate pyruvate (from serine deamination and acetyl-CoA assimilation) would have to balance the fluxes that consume pyruvate, both within the homoserine cycle and towards other cellular routes (for example, pyruvate oxidation to acetyl-CoA). Such an evolutionary approach was previously shown to be successful for the implementation of the Calvin cycle in E. coli, where the fluxes that diverge from the cycle were downregulated to match the fluxes within the cycle (Antonovsky et al., 2016; Gleizer et al., 2019). While achieving E. coli growth via the homoserine cycle is undoubtedly a challenging task, it holds the promise of vast new production opportunities for the bioindustry.

Max-min driving force (MDF) analysis
MDF analysis (Noor et al., 2014) was applied to evaluate the thermodynamic feasibility of the homoserine cycle for acetyl-CoA production from formaldehyde. The natural serine cycle and the modified serine cycle (Yu and Liao, 2018) were also analyzed for comparison. eQuilibrator-API was used for these analyses. Metabolite concentrations were constrained as described before (Noor et al., 2014) with two changes: (i) the formaldehyde upper bound was set to 0.5 mM, according to the highest concentration E. coli can tolerate (He et al., 2018); (ii) glutamate and 2-oxoglutarate were set as cofactors with concentrations of 100 mM and 0.5 mM, respectively (Bennett et al., 2009). pH was assumed to be 7.5 (as in the E. coli cytoplasm) and ionic strength was assumed to be 0.25 M (as recommended (Alberty et al., 2011)). Since energetic calculations using C1-tetrahydromethanopterin intermediates are not supported by eQuilibrator-API, we used formate as the substrate for the serine cycle and the modified serine cycle. The scripts and details can be found at https://gitlab.com/hi-he/wenzhangfujian within the "2020_Promiscuous aldolases" directory.
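For reference, the optimization problem behind an MDF calculation can be written compactly. The formulation below follows the general definition of the method (Noor et al., 2014); the symbols B, S_ij and x_i are notation chosen here for illustration and are not taken from the scripts used in this study.

```latex
\begin{aligned}
\max_{B,\,\mathbf{x}} \quad & B \\
\text{subject to} \quad
  & -\Delta_r G'_j \;\ge\; B
    && \text{for every pathway reaction } j, \\
  & \Delta_r G'_j \;=\; \Delta_r G'^{\circ}_j + RT \sum_i S_{ij}\, x_i , \\
  & \ln c_i^{\min} \;\le\; x_i \;\le\; \ln c_i^{\max}
    && \text{for every metabolite } i .
\end{aligned}
```

Here x_i = ln c_i is the log-concentration of metabolite i, S_ij is its stoichiometric coefficient in reaction j, and the optimal B is the MDF. In this notation, the two changes described above would correspond to setting the upper bound of formaldehyde to 0.5 mM and fixing glutamate and 2-oxoglutarate at 100 mM and 0.5 mM (equal lower and upper bounds).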

Flux balance analysis
Flux balance analysis was conducted in Python with COBRApy (Ebrahim et al., 2013). New reactions were added to the most recent E. coli genome-scale metabolic network, iML1515 (Monk et al., 2017), with several curations and changes: (i) transhydrogenase (THD2pp) translocates one proton instead of two (Bizouarn et al., 2005); (ii) homoserine dehydrogenase (HSDy) produces homoserine from aspartate-semialdehyde irreversibly (as we found experimentally, see main text); (iii) GLYCK (glycerate-3P producing glycerate kinase) and POR5 (pyruvate synthase) were removed from the model as their existence in E. coli is highly disputable; (iv) since we introduced NAD-dependent formate dehydrogenase, the two quinone-dependent formate dehydrogenases, FDH4pp and FDH5pp, were removed from the model; (v) pyruvate formate lyase (PFL) and 2-oxobutanoate formate lyase (OBTFL) were removed from the model as they operate only under anaerobic conditions. We further removed the ATP maintenance reaction (ATPM) as, rather than estimating growth rate, we used FBA to estimate the maximal yield. Biomass yield was calculated as the predicted maximal biomass reaction flux divided by the methanol uptake flux. While modeling acetyl-CoA production, two unrealistic pathways were blocked by deleting reactions DRPA and PAI2T, respectively. The full code, including changes to the model, reactions specific to the methanol assimilation pathways, and the reactions of each production route, can be found at https://gitlab.com/hi-he/wenzhangfujian within the "2020_Promiscuous aldolases" directory.
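The yield calculation described above is a simple flux ratio. The sketch below illustrates it in plain Python, without COBRApy, and is not taken from the published scripts. It assumes the usual constraint-based modeling convention that the biomass flux is expressed in 1/h (gDW per gDW per hour) and uptake fluxes in mmol per gDW per hour; the function names and the example flux values are hypothetical.

```python
METHANOL_MW = 32.04  # g/mol, molar mass of methanol (CH3OH)


def biomass_yield_per_mmol(biomass_flux, methanol_uptake_flux):
    """Biomass yield as maximal biomass flux divided by methanol uptake flux.

    Assumed units: biomass_flux in 1/h, methanol_uptake_flux in mmol/gDW/h,
    so the result is in gDW per mmol methanol.
    """
    if methanol_uptake_flux <= 0:
        raise ValueError("methanol uptake flux must be positive")
    return biomass_flux / methanol_uptake_flux


def yield_per_gram_methanol(yield_per_mmol):
    """Convert gDW per mmol methanol to gDW per gram methanol."""
    return yield_per_mmol * 1000.0 / METHANOL_MW


# Hypothetical fluxes, for illustration only (not results from this study).
example_yield = biomass_yield_per_mmol(biomass_flux=0.2, methanol_uptake_flux=10.0)
example_yield_g = yield_per_gram_methanol(example_yield)
```

Expressing the yield per gram of substrate makes values comparable between pathways that are modeled with different uptake bounds.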

Strains and genomic manipulation
Strains used in this study are listed in Table 1. The E. coli MG1655-derived strain SIJ488 (Jensen et al., 2015) was used as the parental strain for genomic modifications. Iterative rounds of λ-Red recombineering (Jensen et al., 2015) or P1 phage transduction (Thomason et al., 2007) were used for gene deletions. For the recombineering, selectable resistance cassettes were generated via PCR (primers with 50 bp homologous arms, as in Baba et al., 2006), using the FRT-PGK-gb2-neo-FRT (Km) cassette (Gene Bridges, Germany) for kanamycin resistance (Km) and the pKD3 plasmid (GenBank: AY048742) (Datsenko and Wanner, 2000) as a template for chloramphenicol resistance cassettes (CAP). The procedures for deletion, verification and antibiotic cassette removal are detailed in (Wenk et al., 2018).
A similar strategy was applied to exchange the genomic promoter of target genes. A strong constitutive promoter, pgi-20 (Ppgi-20) (Braatsch et al., 2008), and a ribosome binding site "C" (AAGTTAAGAGGCAAGA) (Zelcbuch et al., 2013) were constructed downstream of the CAP cassette using the primers listed in Supplementary Table S2. The synthetic promoter was first introduced into the SIJ488 strain by the recombineering method; P1 transduction was then used to transfer the synthetic promoter into the selection strains. thrB (encoding homoserine kinase, HSK) and thrC (encoding threonine synthase, TS) are in the same operon as thrL and thrA. Since thrL encodes a regulatory peptide and thrA is redundant in the Δasd selection strains, thrLA was deleted during the promoter exchange of thrBC. The point mutations A142P Y275D (Bouzon et al., 2017) were introduced along with the promoter exchange of alaC (in this case, the recombineering cassette carries the mutated gene downstream of the CAP cassette and synthetic promoter). Promoter exchanges were confirmed by sequencing the promoter regions.

Growth experiments
Strains were precultured in 4 mL M9 medium with proper carbon sources and streptomycin. The precultures were harvested and washed three times in M9 medium, then inoculated into M9 media with suitable carbon sources, at a starting OD600 of 0.02. 150 μL of culture was added to each well of 96-well microplates (Nunclon Delta Surface, Thermo Scientific). A further 50 μL of mineral oil (Sigma-Aldrich) was added to each well to avoid evaporation (while enabling gas diffusion). The 96-well microplates were incubated at 37°C in a microplate reader (BioTek EPOCH 2). The shaking program cycle (controlled by Gen5 v3) had 4 shaking phases, lasting 60 s each: linear shaking followed by orbital shaking, both at an amplitude of 3 mm, then linear shaking followed by orbital shaking, both at an amplitude of 2 mm. The absorbance (OD600) in each well was monitored and recorded after every three shaking cycles (~16.5 min). Raw data from the plate reader were calibrated to normal cuvette-measured OD600 values according to OD_cuvette = OD_plate / 0.23. Growth parameters were calculated using MATLAB (MathWorks) based on technical triplicates; the average values were used to generate the growth curves. In all cases, the variability between triplicate measurements, as checked in MATLAB, was less than 5%.
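The calibration and averaging steps above can be illustrated with a short sketch. It is written in plain Python rather than MATLAB and is not the authors' script. The text does not state how growth parameters were derived, so the doubling-time estimate here (a least-squares fit of ln(OD) against time over points assumed to lie in the exponential phase) is an assumption, and all function names and sample values are hypothetical.

```python
import math

PLATE_TO_CUVETTE = 0.23  # calibration factor from the text: OD_cuvette = OD_plate / 0.23


def to_cuvette_od(od_plate):
    """Convert a plate-reader OD600 reading to its cuvette-equivalent value."""
    return od_plate / PLATE_TO_CUVETTE


def average_replicates(replicates):
    """Point-wise mean of replicate growth curves of equal length."""
    return [sum(values) / len(values) for values in zip(*replicates)]


def doubling_time(times_h, ods):
    """Estimate doubling time (h) from a least-squares fit of ln(OD) vs. time.

    Assumes all supplied points belong to the exponential growth phase.
    """
    logs = [math.log(od) for od in ods]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    growth_rate = sxy / sxx  # specific growth rate, 1/h
    return math.log(2) / growth_rate


# Hypothetical plate readings for three replicates (illustration only).
times = [0.0, 1.0, 2.0, 3.0]
plate_curves = [
    [0.010, 0.014, 0.020, 0.028],
    [0.010, 0.015, 0.020, 0.029],
    [0.011, 0.014, 0.021, 0.028],
]
mean_curve = [to_cuvette_od(od) for od in average_replicates(plate_curves)]
estimated_dt = doubling_time(times, mean_curve)
```

Because the calibration is a constant factor, it shifts ln(OD) by a constant and does not affect the fitted growth rate; it only matters for reporting absolute OD values.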

Stable isotopic labeling
13C-methanol, glucose-1-13C, glucose-2-13C, and glucose-3-13C were purchased from Sigma-Aldrich. Strains were cultivated aerobically in glass tubes on M9 minimal media with the appropriate carbon sources at the concentrations mentioned above (section 4.5: Growth media). Experiments were performed in duplicate, which in all cases showed identical results (±5%). Cells were harvested at the late exponential phase. The equivalent volume of 1 mL of culture at OD600 of 1 was harvested and washed by centrifugation. Protein biomass was hydrolyzed with 6 M HCl at 95°C for 24 h (You et al., 2012). The samples were completely dried under a stream of air at 95°C. Hydrolyzed amino acids were analyzed with UPLC-ESI-MS as previously described (Giavalisco et al., 2011). Chromatography was performed with a Waters Acquity UPLC system (Waters), using an HSS T3 C18 reversed phase column (100 mm × 2.1 mm, 1.8 μm; Waters). 0.1% formic acid in H2O (A) and 0.1% formic acid in acetonitrile (B) were the mobile phases. The flow rate was 0.4 mL/min and the gradient was: 0 to 1 min, 99% A; 1 to 5 min, linear gradient from 99% A to 82% A; 5 to 6 min, linear gradient from 82% A to 1% A; 6 to 8 min, kept at 1% A; 8 to 8.5 min, linear gradient to 99% A; 8.5 to 11 min, re-equilibration. Mass spectra were acquired using an Exactive mass spectrometer (Thermo Scientific) in positive ionization mode, with a scan range of 50.0 to 300.0 m/z. The spectra were recorded during the first 5 min of the LC gradients. Data analysis was performed using Xcalibur (Thermo Scientific). The identification of amino acids was based on retention times and m/z, which were determined by analyzing amino acid standards (Sigma-Aldrich) under the same conditions.
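As an illustration of how a labeling pattern can be summarized once peak intensities have been extracted, the sketch below normalizes the intensities of the isotopologues M+0, M+1, ... of one amino acid to fractions and computes the mean number of labeled carbons. This is a minimal sketch and not the workflow used in this study (the analysis was performed in Xcalibur); it applies no correction for natural 13C abundance, and the function names and intensity values are hypothetical.

```python
def isotopologue_fractions(intensities):
    """Normalize raw peak intensities [M+0, M+1, ...] to fractions summing to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [value / total for value in intensities]


def mean_labeled_carbons(fractions):
    """Average number of labeled carbons per molecule from isotopologue fractions."""
    return sum(index * fraction for index, fraction in enumerate(fractions))


# Hypothetical intensities for threonine (4 carbons, so M+0 to M+4); illustration only.
threonine_intensities = [2.0, 97.0, 1.0, 0.0, 0.0]
threonine_fractions = isotopologue_fractions(threonine_intensities)
threonine_mean_label = mean_labeled_carbons(threonine_fractions)
```

For an amino acid that is singly labeled throughout the population, the M+1 fraction would approach 1 and the mean number of labeled carbons would approach 1.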

Molecular phylogenetic analysis
The protein sequences of the aldolases predicted to catalyze the HAL reaction were obtained from UniProt: RhmA/YfaU P76469, GarL P23522, YagE P75682, YjhH P39359, Eda P0A955, DgoA Q6BF16 and MhpE P51020. MAFFT v7 (Katoh and Standley, 2013) was used for multiple sequence alignment with default parameters. The aligned sequences were used in MEGA X (Kumar et al., 2018) to construct a phylogenetic tree with the Maximum Likelihood method. The bootstrap consensus tree was generated with the number of bootstrap replications set to 1000.

Declaration of competing interest
A.B.E. is a co-founder of b.fab, which aims to commercialize microbial growth on C1 compounds. b.fab was not involved in this study and did not fund it.