Environmental conditions drive self-organisation of reaction pathways in a prebiotic reaction network

The evolution of life from the prebiotic environment required a gradual process of chemical evolution towards greater molecular complexity. Elaborate prebiotically-relevant synthetic routes to the building blocks of life have been established. However, it is still unclear how functional chemical systems evolved with direction using only the interaction between inherent molecular chemical reactivity and the abiotic environment. Here, we demonstrate how complex systems of chemical reactions exhibit well-defined self-organisation in response to varying environmental conditions. This self-organisation allows the compositional complexity of the reaction products to be controlled as a function of factors such as feedstock and catalyst availability. We observe how Breslow’s cycle contributes to the reaction composition by feeding C2 building blocks into the network, alongside reaction pathways dominated by formaldehyde-driven chain growth. The emergence of organised systems of chemical reactions in response to changes in the environment offers a potential mechanism for a chemical evolution process that bridges the gap between prebiotic chemical building blocks and the origin of life.


Introduction
The origin of life from the prebiotic environment required extensive development of increasingly sophisticated chemical systems, with only environmental factors and inherent chemical activity as the driving forces. 1 Conditions on early Earth allowed for a wide range of chemical reactions which could have given rise to a diverse range of structurally complex organic molecules. [2][3][4][5][6][7] Recent work has shown how these simple starting materials may produce many of the components found in extant metabolic networks, hinting at the possibility of a prebiotic origin of a core metabolism. 4,8,9 Yet, for a functional genetic or metabolic system to emerge from a mixture of feedstock molecules, increasingly organised and interconnected reaction pathways must be forged between them. [10][11][12][13] Chemical reactivity alone is not sufficient to dictate the formation of one pathway over another, 14 and it is therefore difficult to conceive how, on prebiotic Earth, chemical systems became organised to avoid the formation of intractable mixtures. Environmental conditions could provide a directing force for the emergence of (pre)biotic systems. 1,15 However, there is little understanding, and there are few experimental studies, of how reactive and environmental information translates into the self-organisation of chemical reaction networks. Therefore, creating a conceptual framework in which chemical reactivity and reaction conditions combine to organise dynamic, out-of-equilibrium reaction networks is key to elucidating how inanimate matter evolved towards life.
Here, we demonstrate how simple sets of reactions between chemical feedstocks present in a model prebiotic reaction system form well-defined compositional patterns via the interaction between environmental conditions and chemical reactivity rules. Employing the formose reaction as a model system in a series of flow reactions, we studied the responsiveness of the system to a broad range of environmental factors. By characterising the propagation of periodic input modulations through chemical reaction networks, 16-19 we were able to infer the underlying reaction connectivity between formose products and unravel the self-organisational response of the network to environmental conditions. Our results demonstrate how patterns of chemical reactivity and environmental conditions may give rise to organised systems of chemical reactions, offering the first glimpse of a possible mechanism for chemical evolution.

Results And Discussion
The formose reaction is a prebiotically-plausible model system of sugar-forming reactions using formaldehyde as a C1 building block (Fig. 1a). [20][21][22] It broadly consists of five reactions: enolate formation/protonation, aldol addition, retro-aldol and Cannizzaro reactions (Scheme S1). Much of its core reactivity is catalysed by hydroxide and divalent metal ions such as Ca2+. 23 Conceptually, any given monosaccharide or enol(ate) compound in the formose reaction may be converted into another via application of the aforementioned reaction types. Therefore, a range of feedstock monosaccharides may be used to initiate the reaction.
Unconstrained, recursive application of the limited set of reaction classes operative in the formose reaction produces a so-called combinatorial explosion of compounds (Fig. 1a). A number of studies have explored means to contain the potential generation of intractable mixtures of compounds formed in the formose reaction using thermodynamic constraints. [24][25][26] However, relatively little data has been collected for comprehensively rationalising the formose reaction under out-of-equilibrium conditions. 22,26-30 Such conditions are a key characteristic of living systems and of high relevance on prebiotic Earth, 12 upon which the conditions were dynamic and modulated on a variety of timescales. In out-of-equilibrium chemical reactions, kinetic, rather than thermodynamic, properties govern the reaction behavior and product distribution. 11 Therefore, molecular reactivity is a prime controller of such systems.
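The growth in compound count under recursive rule application can be illustrated with a deliberately abstract sketch. It assumes, hypothetically and not as the rule set used in this work, that a compound is reduced to its bare carbon skeleton (written as a rooted tree) and that the only rule is attachment of one C1 unit to any carbon. Valence limits, functional groups and stereochemistry are ignored, so the counts indicate the trend only.

```python
def canonical(children):
    """Return a sorted tuple so that isomorphic skeletons compare equal."""
    return tuple(sorted(children))


def add_one_carbon(skeleton):
    """Apply the toy 'C1 addition' rule at every carbon of a skeleton.

    A skeleton is a nested tuple: each carbon is the tuple of its child carbons.
    """
    products = {canonical(skeleton + ((),))}  # attach to the root carbon
    for i, child in enumerate(skeleton):
        for grown in add_one_carbon(child):  # attach somewhere inside a branch
            products.add(canonical(skeleton[:i] + (grown,) + skeleton[i + 1:]))
    return products


def expand(generations):
    """Apply the rule recursively; return the number of distinct skeletons per step."""
    current = {()}  # a single carbon
    counts = [len(current)]
    for _ in range(generations):
        current = {p for s in current for p in add_one_carbon(s)}
        counts.append(len(current))
    return counts


skeleton_counts = expand(5)  # C1 through C6 skeletons
```

Here `skeleton_counts` evaluates to `[1, 1, 2, 4, 9, 20]`: even a single rule acting on bare skeletons multiplies the number of distinct products at each step, before any isomerism or additional reaction classes are considered.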
Here, out-of-equilibrium conditions were induced in the formose reaction using flow conditions in a continuous stirred-tank reactor (CSTR, Fig. 1b). To probe the compositional and reaction connectivity responses of the formose reaction, experiments were performed to measure its steady-state equilibrium compositions. In addition, investigations were performed in which the input concentrations of initiating sugars were modulated sinusoidally. Measuring the transfer of input modulation to product compounds (Extended Data 4) provided a basis for estimating the underlying reaction pathways of the formose reaction. [16][17][18] The composition of the CSTR was continually sampled from its outflow. Following appropriate derivatisation, 31-33 samples were analysed by GC-MS and HPLC (Fig. 1c; see Materials and Methods for full details). Analysis of the chromatographic peaks and mass spectra provided a compositional pattern for each sample (Extended Data 3, five examples are shown in Fig. 2a), comprising varying amounts of the 52 compounds detected within the data set.
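How a sinusoidal input modulation is transferred to downstream compounds in a CSTR can be sketched numerically. The example below is a minimal illustration rather than a model of the formose reaction: it assumes a hypothetical linear cascade A -> B -> C with first-order rate constants, arbitrary units, and invented parameter values, and integrates the mass balances with a simple Euler scheme.

```python
import math


def simulate_cstr_chain(k_flow=0.5, k1=0.3, k2=0.2, a_in_mean=1.0,
                        rel_mod=0.5, period=10.0, t_end=100.0, dt=0.01):
    """Euler integration of A -> B -> C in a CSTR fed with sinusoidally modulated A.

    All parameter values are hypothetical and in arbitrary units.
    """
    omega = 2.0 * math.pi / period
    a = b = c = 0.0
    times, trace = [], []
    for n in range(int(t_end / dt)):
        t = n * dt
        a_in = a_in_mean * (1.0 + rel_mod * math.sin(omega * t))
        da = k_flow * (a_in - a) - k1 * a    # inflow, outflow and consumption of A
        db = k1 * a - (k_flow + k2) * b      # formation, outflow and consumption of B
        dc = k2 * b - k_flow * c             # formation and outflow of C
        a, b, c = a + da * dt, b + db * dt, c + dc * dt
        times.append(t + dt)
        trace.append((a, b, c))
    return times, trace


def amplitudes(times, trace, period, n_periods=2):
    """Half peak-to-peak amplitude of each species over the final periods."""
    t_start = times[-1] - n_periods * period
    tail = [row for t, row in zip(times, trace) if t >= t_start]
    return tuple((max(col) - min(col)) / 2.0 for col in zip(*tail))


times, trace = simulate_cstr_chain()
amp_a, amp_b, amp_c = amplitudes(times, trace, period=10.0)
```

In this toy cascade each step acts as a low-pass filter, so the measured amplitude falls from A to B to C. Real formose steps are bimolecular and branched, so the sketch conveys only the qualitative principle.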
To visualise the relationships between the average compositions and kinetic signatures generated for each condition, hierarchical clustering was performed using a correlation-based pairwise dissimilarity metric (Materials and Methods). The resulting dendrogram (depicted qualitatively in Fig. 2b, see also Figure S2) represents the relative relationships between reaction outcomes. Pie plots placed on the 'leaf' positions represent normalised average product distributions. Longer paths between leaves represent lower similarity. The inset panels indicate how the values of key environmental parameters vary across the branches of the dendrogram.
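The clustering step can be sketched as follows. This is an illustration under stated assumptions, not the procedure given in Materials and Methods: it takes the dissimilarity to be one minus the Pearson correlation, uses average linkage, and operates on three invented composition vectors whose names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Assumed dissimilarity: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    return 1.0 - pearson(x, y)


def average_linkage(profiles):
    """Naive agglomerative clustering; returns merges as (cluster, cluster, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(correlation_distance(profiles[a], profiles[b])
                        for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical normalised compositions (fractions of three compound groups).
toy_compositions = {
    "low_formaldehyde_a": [0.05, 0.10, 0.85],
    "low_formaldehyde_b": [0.10, 0.15, 0.75],
    "high_formaldehyde": [0.30, 0.55, 0.15],
}
merges = average_linkage(toy_compositions)
```

The two similar profiles merge first at a small distance, and the dissimilar profile joins last at a larger one, which corresponds to a longer path between leaves in a dendrogram.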
Each branch of the dendrogram arises as a result of a dominant environmental factor. Fine-tuning of compositions within branches results from the mixing of additional condition variables. For instance, branch I is characterised by relatively low concentrations of formaldehyde (1, ≤ 50 mM) and inputs combining glycolaldehyde (2), dihydroxyacetone (3) and erythrulose (9). Its constituent compositions are distinguished by relatively high amounts of C6 compounds (green sector hues). Following branch I from its tip towards the center of the tree, more diverse sets of compounds are produced, including both branched and linear C4 and C5 compounds (denoted by hues of red). Interestingly, varying the concentration of 1 (with fixed inputs 3, CaCl2 and NaOH of 50, 15 and 30 mM, respectively) results in a series of compositional transitions (Fig. 2c), manifesting as a series of 'jumps' of varying magnitude across the dendrogram (Fig. 2d). Beginning in branch I ([1] ≤ 50 mM), the compositions consist mostly of an α-hydroxymethyl-aldohexose (32) and an α, β-(hydroxymethyl)-aldotetrose (37). Within the series of experiments with [1] ≤ 50 mM, the composition remains in branch I but the molecular diversity increases as [1] increases. The contributions of 32 and 37 in the reaction mixture decrease in favor of the generation of α-hydroxymethyl-glyceraldehyde (7), 9 and ribulose (20). Increasing [1] to 50 mM results in a jump towards the center of the tree, suggesting the beginning of a more significant compositional transition. Compounds 7, 20 and an α-hydroxymethyl-aldotetrose (14) become prominent. Further increasing [1] to above 50 mM results in a significant compositional transition from branch I to branch VII, highlighting a transition in the molecular complexity of the system. Threose (10), lyxose (18), two 3-ketohexoses (29, 30) and a new α-hydroxymethyl-aldopentose (34) are added to the composition. Compounds 32 and 37 are almost completely depleted.
Thus, the concentration of feedstock molecule 1 controls a thresholded compositional transition whose dynamic range is in the region [1] = 0-100 mM.
Ca2+ and hydroxide are involved in several reaction types of the formose reaction. Ca(OH)2 has been noted to have greater activity in catalysing the formose reaction than Sr2+ or Ba2+ hydroxides. 34 Fe(OH)3 has also been reported to be active in catalysing formose reactions. 35 Though definitive characterisation is not yet available, it is generally understood that Ca2+ binds divalently to enolates, 34,36 and may participate in organising intermediates in aldol addition reactions. The principal role of hydroxide is in α-proton abstraction and enolate formation from monosaccharides, which has been noted to be a key rate-limiting step in the reactions of this class of compounds. [37][38][39] Hydroxide is also involved as a reactant in Cannizzaro reactions.
A range of [Ca2+]:[NaOH] input ratios (remaining below the solubility limit of Ca(OH)2) were explored in the data set, maintaining fixed concentrations of 1 and 3. A demonstrative subset of the data (Fig. 2e) crosses three branches of the dendrogram (II, III and VII, Fig. 2f). Beginning at low [Ca2+]:[NaOH] (in branch VII), compositions similar to those previously described in the high [1] regime are found. Other environmental conditions, such as varying the initiator sugar identity, lead to distinctive compositions. Branches IV, V and VI result from using 9, ribose (19) and the dimer of 2, respectively, as initiators. When the temperature in the reactor was varied from 10 to 40 °C (branch VII), the reaction composition remained remarkably constant. At 10 °C there is a relatively lower concentration of 10. There is also a slight divergence of the concentrations of 18 and 20 with increasing temperature. Therefore, the influence of temperature on the steady-state composition of the formose reaction is modest in the range investigated.
The observed compositional variations are a direct result of the translation of the input conditions through the underlying formose reactions. To elucidate the structure of the self-organised reaction networks, we exploited the principles of the transfer of input modulation to different compounds in the network, 16-19 rule-based pathway reaction network generation [40][41][42] and graph searching. 43 The framework provides a direct translation of the experimental data into a descriptive set of reactions responsible for the compositions observed (see Materials and Methods for details). Briefly, a reaction network was generated by recursively applying a set of reaction rules to an initial set of compounds (1, the dimer of 2 and NaOH). This network was used as a basis for a set of pathway searches. It has been shown that periodic perturbations applied to reactants in reaction systems are transferred to products and co-reactants. [16][17][18][19] The amplitudes (here, modulations in compound concentration) measured in response to the periodic input decrease as a function of connectivity with respect to the input. Figure 3a shows a series of representative timecourse measurements that follow compound concentration changes in response to a modulated input of 3. The search procedure was used to estimate the self-organisational response of the formose reaction pathways in response to increasing formaldehyde concentration (initiated by 3, Fig. 3b). When [1] is 5 mM, the operative reaction pathways are mainly accounted for by a small set of reactions between C3 species to form C6 compounds (green pathways). Increasing [1] to 10 mM triggers expansion of the repertoire of reactions.
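The graph-searching idea can be illustrated with a breadth-first search for the shortest pathway between an input and a product. The sketch is hypothetical throughout: breadth-first search is used here as a simple stand-in for the search procedure described in Materials and Methods, and the toy edges merely follow two compound sequences named elsewhere in this text (3, 9, 12, 20, 29 and 3, 9, 14) rather than the generated network itself.

```python
from collections import deque

# Hypothetical toy network: directed edges between compound numbers.
toy_network = {
    "3": ["9"],
    "9": ["12", "14"],
    "12": ["20"],
    "20": ["29"],
    "14": [],
    "29": [],
}


def shortest_pathway(network, source, target):
    """Breadth-first search; returns the shortest list of compounds or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in network.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


path_to_29 = shortest_pathway(toy_network, "3", "29")
path_to_14 = shortest_pathway(toy_network, "3", "14")
```

The number of steps in a returned pathway gives a simple measure of connectivity with respect to the input, which can then be compared against the ordering of measured amplitudes.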
The number of possible pathways for formaldehyde addition (red) increases, with a corresponding increase in the number of proton transfer pathways (black). However, 1-based chain growth pathways do not appear to completely account for the observed behavior. Although compound 14 (a branched aldopentose) has a lower amplitude than 9, consistent with chain growth via formaldehyde addition, 20 (ribulose) and 18 (lyxose) have stronger couplings to the input modulation than expected (Fig. 2a). There is a very low concentration of 12 and 13 (3-ketopentoses), which would be required in the formaldehyde growth pathway towards 2-ketopentoses and aldopentoses. For these reasons, we attribute the production of 20 to the reaction between 2 and the enolate of 3 and 4 (blue pathways). Further increasing [1] (> 50 mM) builds C1 growth pathways from the previously described set of pathways, producing 29 and 30 as a product of the formaldehyde addition to enolates derived from 18 and 20.
The pathway searching protocol was applied to experimental data for which input modulation was applied, affording lists of reactions across the set of conditions. These reactions describe how each product distribution was created from the carbon-containing inputs. Each reaction was assigned a class. Re-casting the lists of reactions as counts of each reaction class provides a condensed view of how formose reactivity adapts to environmental conditions (Fig. 3c).
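Re-casting reaction lists as counts per reaction class amounts to a simple tally. The sketch below uses invented reaction identifiers and two hypothetical conditions; only the class names echo those discussed in the text, and the numbers carry no experimental meaning.

```python
from collections import Counter

# Hypothetical reaction lists: each entry is (reaction identifier, reaction class).
reactions_by_condition = {
    "condition_low_formaldehyde": [
        ("r1", "deprotonation"),
        ("r2", "monosaccharide-enolate addition"),
        ("r3", "monosaccharide-enolate addition"),
        ("r4", "protonation"),
    ],
    "condition_high_formaldehyde": [
        ("r1", "deprotonation"),
        ("r5", "formaldehyde aldol addition"),
        ("r6", "formaldehyde aldol addition"),
        ("r7", "formaldehyde aldol addition"),
        ("r8", "Cannizzaro"),
    ],
}


def class_counts(reactions):
    """Count how many reactions of each class appear in one condition."""
    return Counter(reaction_class for _, reaction_class in reactions)


counts_by_condition = {
    condition: class_counts(reactions)
    for condition, reactions in reactions_by_condition.items()
}
```

Arranging `counts_by_condition` with conditions as rows and reaction classes as columns gives the kind of condensed table that can be compared across environmental conditions.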
Following the branches of the dendrogram traversed in Fig. 2d (variation in [1]) reveals key reaction characteristics which govern the various reaction outcomes. A significant feature of branch I is the relatively low proportion of formaldehyde aldol addition reactions. The majority of the reactivity is accounted for by monosaccharide-enolate reactions between C3 compounds, which are responsible for creating products 32 and 37 (Fig. 4a, branch I). 38,44 Moving to branch VII (higher [1]), the repertoire of reactions is expanded, and aldol addition reactions involving 1 are added to the network. In particular, reactions in which the α-carbon is bound to a hydrogen or glycol group are promoted. A range of protonation/deprotonation reactions are promoted in branch VII. Deprotonation is favored at less sterically hindered positions (where the α-carbon is bound to a hydrogen or methoxy group, e.g. follow the sequence 3, 9, 12, 20, 29 in Fig. 4a, branch VII). Protonation is favored at α-carbons bound to methoxy groups. Interestingly, the number of monosaccharide-enolate reactions also increases, suggesting that some monosaccharide products interact with other members of the network as reactants.
The formose reaction reorganises in a different manner when [Ca2+] and [NaOH] are varied. At high [Ca2+] (52 mM) and low [NaOH] (2.5 mM) a limited set of pathways is formed, the majority of which may be accounted for via formaldehyde addition and proton transfer reactions terminating at 14 via 9 (the pathway connecting 3, 9 and 14 in Fig. 4b). Interestingly, the linear C5 compounds 12 and 13 are not formed in significant quantities, so the reaction hits the 'dead end' branched 14. However, the system unexpectedly avoids the formation of the branched C4 compound 7 under these conditions. Decreasing the [Ca2+]:[NaOH] ratio (35 mM:10 mM) leads to a more significant contribution of formaldehyde-controlled pathways. The branched C4 compound 7 is created, whilst the branched C5 compound 14 is demoted and at the same time the population of C5 species and instances of Cannizzaro reactions are increased. As the ratio is further decreased, the conditions and reaction pathways begin to resemble those found for the high [1] region, as described above.
In contrast to the prevailing views of the formose reaction, our data indicate that formaldehyde-based chain growth pathways do not completely account for the observed behavior. Rather, reactions between C2 and C3 compounds are key chain-building reactions. 20,46,47 Surprisingly, we observe the emergence of a self-organised cyclic set of reactions that explain how the C2 monosaccharide 2 must be created from retro-aldol reactions, as described in Breslow's proposed mechanism for autocatalysis in the formose reaction (Fig. 4c). 45 Though usually seen as an autocatalytic mechanism, our results show how it can contribute to formose reactivity as a generator of 2, 3 and 4 (and their enolates) as building blocks embedded in a set of pathways in which chain growth occurs via formaldehyde addition. As such, the Breslow cycle can be envisaged as a source of new reaction pathways through which monosaccharides may be built. These reactions between the formose reaction products are an excellent example of how underlying patterns in chemical reactivity define reaction outcomes. Thus, we propose that reinforcement of molecular diversity in the formose reaction does not necessarily occur via promotion of an initiating species (glycolaldehyde). Rather, diversity may be promoted by the activation of a class of reactions in which longer carbon chains are synthesised from building blocks with units of greater than one carbon.

Conclusion
Here, we have demonstrated how an environment directs the self-organisation of reaction pathways in the model prebiotic formose reaction network to create well-defined product compositions. Our analysis is based on detailed HPLC and GC-MS characterisation of the formose reaction under out-of-equilibrium conditions, combined with graph pathway analysis informed by chemical knowledge.
In the absence of a directing force, the recursive application of the small number of chemical reactions operative in the formose reaction potentially leads to a wide range of possible reaction pathways and compositions. However, we have found that in the diverse environments studied, the formose reaction is induced into using only a subset of these pathways, depending on the local conditions. This refinement of reaction pathways results in well-defined product compositions, in contrast to the intractable mixtures expected of a combinatorially explosive reaction route. We were able to estimate the organisation of the underlying reaction pathways via analysis of the time-resolved propagation of periodically changing inputs. This analysis revealed that sets of reactions can respond collectively to environmental conditions. The self-organisation we observe is reminiscent of theoretically predicted fine-tuning of reaction network behaviour to environments. 48,49 Our work represents an important breakthrough in understanding how molecular systems adapt abiotically to the environment. Environmental conditions control reaction types to varying degrees in complex abiotic reaction networks, leading to well-defined reaction pathways and product compositions. Such translation of information from the environment into its embedded chemical reaction networks hints at how reaction networks of biological importance may be the result of the abiotic self-organisation of systems of reactions in the absence of specific catalysis or genetic inheritance. Applying our methodology to other prebiotically relevant reaction networks that include cyanosulfidic 50 or α-ketoacid 8,9 reactivity could provide new insight into the environment-driven formation of primitive core metabolic networks furnishing the building blocks of life.

Analysis Programs
All programs used to analyse and plot the data are available on GitHub (https://github.com/huckgroup/formose-2021.git).

Figure 1
Background to this work. a: Schematic overview of the formose reaction. Key reaction pathways are coloured according to the reaction types given below the scheme; wavy lines indicate that both stereoisomers are possible. Details of reaction types are given in Scheme S1. b: Schematic drawing of the experimental setup. Syringe pumps containing formaldehyde, an initiator sugar, CaCl2, NaOH and water are connected to the inlets of a continuous stirred-tank reactor (CSTR). The outlet of the reactor was sampled continually. c: An example GC-MS chromatogram indicating reactor composition. Regions of the chromatogram are coloured according to carbon chain length. Peaks are labelled according to the compound number (panel a, Figure S1).

Figure 2
Description of the reaction data set collected in this work. a: Example product distributions demonstrating the compositional diversity covered by the data sets. Bars are coloured according to compound, with C4 alcohols in grey, C4 sugars in blue, C5 compounds in red and C6 compounds in green. b: The relationships between product compositions and compound response profiles in the data set when

Figure 4
Key reaction pathways in the discussion. a: The shift in reaction pathways from branch I to branch VII controlled by formaldehyde concentration. b: A subset of reaction pathways from panel a which account for the majority of reaction behavior in branches II and III. c: Relation of the apparent reaction pathways to the Breslow cycle, 45 indicating how the cycle acts as a source of C2 building blocks for the formose reaction, whilst its constituents may be involved in off-cycle formaldehyde chain-growth reaction pathways. The reaction arrows are coloured according to the legend below the figure.