Autocatalytic sets and chemical organizations: modeling self-sustaining reaction networks at the origin of life

Two related but somewhat different approaches have been proposed to formalize the notion of a self-sustaining chemical reaction network. One is the notion of collectively autocatalytic sets, formalized as RAF theory, and the other is chemical organization theory. Both formalisms have been argued to be relevant to the origin of life. RAF sets and chemical organizations are defined differently, but previously some relationships between the two have been shown. Here, we refine and explore these connections in more detail. In particular, we show that so-called closed RAFs are chemical organizations, but that the converse is not necessarily true. We then introduce and apply a procedure to show how chemical organizations can be used to find all closed RAFs within any chemical reaction system. We end with a discussion of why and how closed RAFs could be important in the context of the origin and early evolution of life.


Introduction
Currently the dominant paradigm in origin of life research is the RNA world (Gilbert 1986, Joyce 2002. However, despite experimental progress towards the spontaneous formation of RNA (Powner et al 2009, Hud et al 2013, Patel et al 2015, the RNA world hypothesis still has problems (Benner et al 2012, Szostak 2012, and so far no one has been able to show that RNA can catalyze its own template-directed replication.
An alternative view is that the origin of life most likely involved a network of cooperating chemical elements, all mutually 'helping' (i.e. catalyzing) each other's formation from a basic food source (Nghe et al 2015, Hordijk andSteel 2017). Such a cooperative network is more formally described as an autocatalytic set (see below). Autocatalytic sets have been studied extensively recently, and have been shown to exist in computational models as well as in real chemical and biological reaction networks (Hordijk 2013, Sousa et al 2015, Hordijk and Steel 2017. In previous work we have shown that autocatalytic sets often consist of a hierarchical structure of smaller and smaller autocatalytic subsets (Hordijk et al 2012), which satisfies one of the main requirements for them to be evolvable (Vasas et al 2012, Hordijk and. In fact, in theory there can be an exponentially large number of autocatalytic subsets within a given reaction network (Hordijk et al 2012), and we showed that in practice one can indeed expect a large number of them .
However, in a dynamical sense it turns out that many of these autocatalytic subsets are only 'transient', in the sense that they would almost immediately become part of a larger subset where all reactions that can happen indeed will happen. This has previously been defined as a closed autocatalytic set (Smith et al 2014). Even though there can, in principle, also be an exponentially large number of such closed autocatalytic subsets within a given reaction network, an unresolved question so far has been how many of them can be expected in practice in arbitrary reaction networks.
A concept related to autocatalytic sets is that of chemical organizations. A chemical organization is a similar but more general formalism describing closed and self-maintaining chemical reaction networks (Dittrich and Speroni di Fenizio 2007). Although this formalism does not explicitly include catalysis or a food source, these can be included implicitly (explained in detail below). In earlier work, some connections between autocatalytic sets and chemical organizations were already derived (Contreras et al 2011, Smith et al 2014. Here, we refine and explore these connections in more detail. In particular, we show that the chemical organizations within an autocatalytic set include all its closed autocatalytic subsets. This corrects a minor error in Contreras et al (2011), by showing that only closed autocatalytic subsets are chemical organizations. However, we also show that not all chemical organizations are necessarily closed autocatalytic sets, thus answering a question that was left open in Contreras et al (2011). We then introduce a procedure for finding all closed autocatalytic subsets within a given autocatalytic network by calculating its chemical organizations. This procedure is illustrated by applying it to a simple computational model of polymer-based reaction networks, which provides insight into the expected number of closed autocatalytic subsets in practice. We end by discussing possible consequences for origin of life studies, in which closed autocatalytic subsets are likely to be of primary interest from a dynamical and evolutionary point of view.

Methods
The methods used in this paper are all theoretical and computational. In particular, they are based on the formalisms of autocatalytic sets and chemical organizations, which are then applied to a class of computational models known as the binary polymer model. These formalisms and models are reviewed briefly in the current section.

RAF theory
The concept of autocatalytic sets, as used here, was originally introduced by Kauffman (1971) (Kauffman 1986(Kauffman , 1993, and subsequently formalized and further developed as RAF theory (Steel 2000, Hordijk and Steel 2004, Mossel and Steel 2005, Hordijk and Steel 2017. Here, we briefly review the basic definitions and results of RAF theory. First, we define a chemical reaction system (CRS) as a tuple X C , ,   = ( ) consisting of a set X of molecule types, a set  of reactions, and a catalysis set Ç =AE, representing the reactants A and products B of a chemical reaction, respectively.
Note that in the RAF framework (and its graph theoretical realization), the stoichiometric coefficients are not required, so it is sufficient to represent a reaction by a pair of sets. A catalyzation x r C , Î ( ) indicates that molecule type x catalyzes reaction r.
We also explicitly consider the notion of a food set F X Ì , which is a subset of molecule types that are assumed to be directly available from the environment. Often, we will include F in the CRS description by writing ) . The notion of catalysis plays a central role here. A catalyst is a molecule that significantly speeds up the rate at which a chemical reaction happens, without being 'used up' in that reaction. Catalysis is ubiquitous in life (van Santen and Neurock 2006). The majority of organic reactions are catalyzed, and catalysts are essential in determining and regulating the functionality of the chemical reaction networks that support life.
Given a set of molecules A X Í and a set of reactions   ¢ Í , we define the closure A X cl , ¢ Í ( ) as the set of molecules obtained by adding to A all molecules that can be produced (in arbitrarily many steps) from A using reactions from R¢.
An autocatalytic set (or RAF set, for short) is now defined as a subset   ¢ Í of reactions that is: (i) reflexively autocatalytic: each reaction r  Î ¢ is catalyzed by at least one molecule type that is either a product of ¢ or is present in the food set F; and (ii) F-generated (F): all reactants in ¢ can be created from the food set F by using a series of reactions only from ¢ itself; formally for all A B A cl F , , , A formal definition of RAF sets is also provided in Hordijk and Steel (2004) and Hordijk et al (2011), including an efficient (polynomial-time) algorithm for finding RAF sets in a general CRS, or determining there is none. This RAF algorithm returns the unique maximal RAF set (maxRAF), which is the union of all RAF (sub)sets within a given CRS. If no RAF set exists within a CRS, the RAF algorithm returns the empty set. A maxRAF can often be decomposed into several smaller subsets which themselves are RAF sets, i.e., subRAFs (Hordijk et al 2012). If such a subRAF cannot be reduced any further without losing the RAF property, it is referred to as an irreducible RAF, or irrRAF (Hordijk and Steel 2004). A simple example is given in figure 1.
RAF theory has been applied extensively to simple polymer-based models of chemical reaction networks (see below), showing that autocatalytic sets are highly likely to exist at modest (and realistic) catalysis rates, and under a wide variety of model assumptions (Hordijk and Steel 2004, Mossel and Steel 2005, Hordijk et al 2011, 2014a, 2014b, Smith et al 2014. Moreover, these results show that generally many hierarchical levels of subRAFs and irrRAFs exist in such networks (Hordijk et al 2012. The formal RAF framework has also been applied successfully to analyze real chemical and biological networks in terms of autocatalytic sets Steel 2013, Sousa et al 2015). Autocatalytic networks have even attracted attention in the physics community (Farmer et al 1986, Stadler et al 1993, Jain and Krishna 1998, Hordijk et al 2010.

Closed RAFs and CAFs
As we defined the closure of a set of molecules (relative to a set of reactions), we can similarly define the closure of a set of reactions (relative to a set of molecules), leading to the notion of a closed RAF. Given a CRS with a food set, ) , a (non-empty) subset ¢ of  is said to be a closed RAF if ¢ is a RAF that satisfies the additional closure property (CP) (Smith et al 2014): CP: ¢ contains each reaction r  Î for which each reactant and at least one catalyst are either generated by another reaction from ¢ or are part of the food set F.
In other words, a closed RAF is a RAF in which each reaction that can happen indeed will happen. This notion of the closure of a RAF ¢, denoted ¢ was already defined in Smith et al (2014), and ¢ is a closed RAF if and only if   ¢ = ¢. Briefly, the closure of a RAF ¢ is obtained by starting with ¢ and sequentially adding new reactions from  if their reactants and at least one catalyst are either an element of the food set or a product of a reaction in the set so-far constructed. The notion of closure is well-defined, i.e. it does not depend on the order in which reactions are added (Smith et al 2014). The irrRAF r r , figure 1 is also a closed RAF, as is the maxRAF r r r , , Note that a maxRAF in any CRS is always closed. Thus, any CRS that has a RAF has a unique maximal closed RAF (namely the maxRAF). However, as pointed out in Smith et al (2014), although the union of RAFs is always a RAF, the union of closed RAFs need not be a closed RAF. Note also that, given a CRS  and food set F, a closed RAF is completely determined by the subset of molecules that participate in its reactions (Smith et al 2014).
Finally, a stronger notion of a RAF is a constructively autocatalytic and F-generated (CAF) set, which requires that the reactions can be ordered so that for every reaction in the sequence, each of its reactants and at least one catalyst are either produced by an earlier reaction in the sequence or are in the food set F (Mossel and Steel 2005). One can also define a closed CAF in an analogous way, namely as a CAF ¢ that satisfies the additional CP above.

Chemical organization theory (COT)
COT is an alternative formal framework for defining and studying closed and self-maintaining reaction networks. Following ideas by Fontana andBuss (Fontana 1992, Fontana andBuss 1994), COT has been developed to study the hierarchical structure (Speroni di Fenizio et al 2000) and dynamics (Speroni Di Fenizio and Dittrich 2002) of artificial chemistries that employ catalytic reactions under a general dilution flow. Later, the theory was generalized to arbitrary reaction networks (Dittrich and Speroni di Fenizio 2007). Note that chemical organizations do not require an explicit notion of catalysis or a food set. However, COT shares with RAF theory a precise axiomatic approach, where the objects of interest are sets of molecule types as opposed to sets of reactions. Nevertheless, as we show below, there is a direct correspondence between the two theories, and the mathematical representation of one theory can be readily converted into one of the other, and vice versa.
In COT, a reaction network is a pair X,  ( ) where X is a set of molecule types and  is a set of reactions. A reaction r  Î is a pair of multisets over X, representing the reactant and product side, respectively. Note that catalysis can be formally included by adding a catalyst x to both the reactants and the products of r, e.g.
. Note that this is excluded in RAF theory, where A and B are sets and A B Ç = AE. The reactions  can be represented by two matrices L i r , ( )and R i r }is the set of product types of reaction r. Given a reaction network X,  ( ) a chemical organization is a set of molecule types O X Í that is: (i) closed: none of the reactions that can be applied using only molecules from O generates any molecules that are not already in O, i.e. for all r r O : Note that here the closure is always defined relative to the full reaction network . The self-maintaining condition formalizes the notion that molecules that can react will eventually react and that molecules can be regenerated as fast as they are used up (at least for some setting of the reaction rates as specified by v). A reaction network in COT corresponds to a CRS in RAF theory, and is a flow system, since for each molecule x F Î in a CRS there is an inflow reaction x AE  in the corresponding reaction network in COT, and for each molecule x X Î in a CRS there is a decay reaction x  AE in the corresponding reaction network in COT (this will be derived in detail in section 3.1). This implies the following properties (Dittrich and Speroni di Fenizio 2007): for any set A X Í , we can generate uniquely a chemical organization With G O we can map any state x of a dynamical flow system uniquely to a particular chemical organization , with x f ( ) the molecules with strictly positive concentrations in x. Thus we can track qualitative transitions in the space of chemical organizations. Given a 'chemical' differential equation of the form t t x Sv x = ( ) ( ( )) (with kinetic laws v x ( )), basically all long term behaviors tend towards chemical organizations, i.e., given a limit set and any state of this set, the molecular types with positive concentrations form a chemical organization (Peter and Dittrich 2011). COT can be used to unravel the hierarchical structure of models (Centler and Dittrich 2007), follow the chemical evolution of constructive artificial chemistries, predict growth phenotypes ( Kauffman (1993) introduced an abstract model of chemical reaction networks, now known as the binary polymer model, which has been used extensively to study autocatalytic sets, e.g., (Bagley and Farmer 1992, Filisetti et al 2012. In this model, molecules are represented by bit strings that can be concatenated into longer ones or cut into smaller ones. Catalysis is then assigned at random.

The binary polymer model
More specifically, the binary polymer model generates a CRS as follows: • The molecule set X consists of all bit strings up to (and including) a maximum length n: X 0, 1 n • The food set F consists of all bit strings up to (and including) length t, where t n  . Here, t=2 is used: F 0, 1, 00, 01, 10, 11 = { } .
• The reaction set  consists of two types of reactions: (i) Ligation: concatenating two bit strings together into a longer one, e.g. 00 111 00111 +  .
(ii) Cleavage: cutting a bit string into two smaller ones, e.g. 010101 01 0101  + . All possible ligation and cleavage reactions between bit strings are included in , as long as none of the reactants or products violates the maximum molecule length n.
• The catalysis set C is constructed at random for each instance of the model, as follows. For each bit string x X Î and reaction r  Î , the pair (x, r) is independently included in C with probability p.
This model has two main parameters: the maximum molecule length n and the probability of catalysis p. We keep the maximum length of food molecules fixed at t=2 throughout.
Note that in this model each ligation (forward) reaction has a corresponding cleavage (backward) reaction. In the model, if a molecule catalyzes a reaction, it catalyzes it in both the forward and the backward direction. Because of the random catalysis assignments, each instance of the model gives rise to a different reaction network, which may or may not contain one or more RAF (sub)sets and/or chemical organizations. Figure 2 shows a maxRAF as found by the RAF algorithm in an instance of the binary polymer model with n=5 and p = 0.0045. It also shows some of the RAF subsets that exist within the maxRAF as differently colored areas. Note that the top right subset, shaded green, is not in itself a RAF, but it can be added to the blue subRAF to create a larger RAF; such a set is called a 'co-RAF'.

Results
We now show how the concepts of closed RAFs and chemical organizations are formally related by deriving some new mathematical results. We then introduce a computational procedure involving the calculation of the chemical organizations within a (max)RAF to find all its closed subRAFs. Finally, we show the results of applying this procedure to instances of the binary polymer model.

Theoretical results
Previously it was shown that an F-generated set is always a chemical organization, but the reverse may not be true (Contreras et al 2011. We now show that there is an even closer connection between RAFs and COT, in particular in the context of closed RAFs. First note that, in general, a CRS  can have many closed RAFs but it has at most one closed CAF. To see this, recall that there is always a unique maximal closed RAF for  (namely the maxRAF). However,  may also have many smaller closed (sub)RAFs. In fact, it is straightforward to construct an example of a CRS  for which the number of closed RAFs is exponential in the size of . For example, if we let n n 1, , )has 2 1 nclosed RAFs, corresponding to the non-empty subsets of .
By contrast, a CRS has at most one closed CAF by virtue of the following result.
)has at most one closed CAF.
Proof. Consider the following sequence of subsets of . Let 1  be the set of reactions in  that have all their reactants and at least one catalyst present in F. For i 1 > , let i  be the reactions in i 1  -along with those reactions r in  for which every reactant of r and at least one catalyst of r are either present in F or are produced by a reaction in i 1  -. Formally, for i 1 > : and with , .
Here r r ( ) denotes the set of reactants of r, and We now derive a direct relationship between closed RAFs and chemical organizations. Let ) be a CRS (with food set F) and let  be any subset of 0  . We first state the following definitions: where the left-and righthand sides should be viewed as multisets (i.e. x i may already appear as a reactant or product, so another copy of it is added on each side). In other words, each molecule that catalyzes a reaction is explicitly included as both a reactant and a product for that reaction.
• Let , for all and : , for all .
In other words, F  + is a set of 'inflow' reactions that introduce food molecules, and X is a set of 'outflow' reactions that remove each molecule type.
• Let   ( )denote the set of chemical organizations that exist within the algebraic chemistry X,   (˜).
• For o X Í , let o ( ) denote the set of reactions in  that have all their reactants and at least one catalyst in o. In other words, o ( ) is the subset of reactions in  that can happen in a catalyzed manner when the molecule set o is present.
is the set of molecules in X appearing as a product of some reaction in ¢. In other words, O ¢ ( )is the union of the food set and all molecules produced by all reactions in ¢.
The next lemma will be helpful in our second theoretical result stated below.
(i) If a maxRAF  in a CRS  contains no CAF, then there will be a chemical organization o   Î ( ) consisting of only F which does not correspond to any closed RAF. ⟶ indicates a reaction that is catalyzed by the molecule 00 (and similar for 11). These reactions only involve food molecules, and since  is the maxRAF, o=F is a chemical organization for   . So, there is a chemical organization o   Î ( ) consisting of only F, but there is a CAF and a corresponding closed RAF o ( ) (both equal to the maxRAF). , We can now fully state our second theoretical result.
Theorem 2. Given a CRS X C F , , ,   = ( )and its corresponding reaction network RN X,  =  (˜), the following hold: (i) if ¢ is a RAF for  then the corresponding set of molecules O ¢ ( )is self-maintaining for RN; (ii) if, additionally, the RAF ¢ is closed then the corresponding set of molecules O ¢ ( ) is a chemical organization for RN; (iii) the reverse of (ii) is not true in general, i.e. in RN there can be a chemical organization that does not correspond to any closed RAF in .
Proof. Part (i): the argument that O ¢ ( )is self-maintaining involves a similar approach to that of lemma 4.1(i) of Steel et al (2013), which, in turn, is related to lemma 2 of Contreras et al (2011).
Since ¢ is F-generated, lemma 3.1 of Steel et al (2013) shows that ¢ has a linear ordering r r r , , , k 1 2 ¼ so that each reactant of r i is either an element of F or is a product of some reaction r j where j i 1  < . Now we can order the reactions in   so that the reactions F  + come first, followed by the expansions of r 1 , then the expansions of r 2 and so on, up to the expansions of r k , finally followed by the reactions R X -. Next, consider the corresponding stoichiometric matrix S. The first non-zero element in each row of S is +1, by the property of the linear ordering described for r r , , k 1 ¼ , and noting that any catalyst x contributes 1 1 0 + + -= ( ) to any entry of S. Finally, for any real matrix with this last property, there is a strictly positive column vector v for which Sv 0 > , since if S has c columns and if the largest absolute value of any negative entry of S is b, then we can take v to be the strictly positive vector that has its ith coordinate given by v b 1 Part (ii): since the RAF ¢ is closed, O ¢ ( )also satisfies the CP required for a chemical organization. Part (ii) now follows from part (i).
Part (iii): we show this by example. Consider the following set of reactions , which could be a subset of an instance of the binary polymer model with n=6 and t=2: If we assume that these are the only catalyzed reactions, then  is the maxRAF of this model instance. Figure 3 shows this maxRAF in a graphical way. It turns out that X,   (˜) contains three chemical organizations: }is not F-generated (each of the three reactions has at least one non-food molecule as its reactants), and is therefore not a closed RAF of . This illustrates a second case in which a chemical organization can fail to correspond to a closed RAF.
The third chemical organization listed above, o F 110, 1100, 11011, 110110 gives rise to an o ( ) that is equal to the maxRAF, and is thus a chemical organization that does correspond to a closed RAF.
, Part (ii) of theorem 2 implies that the closed RAFs in a given (max)RAF  correspond to chemical organizations. However, as part (iii) shows, the converse need not always hold, even in the binary polymer model. Essentially, this is because the self-maintaining property of a chemical organization is an algebraic condition based on certain linear inequalities holding for a freely varying positive (flux) vector, and thus has a reasonable degree of freedom. On the other hand, a RAF  has to be F-generated, which translates into a more strict combinatorial condition regarding the ability to order the reactions in  in a certain way, i.e., for each reaction in the sequence each of its reactants (but not necessarily any catalyst) is either a product of an earlier reaction or is present in the food set.
The reason we need to include the inflow and outflow reactions in   to make the correspondence between closed RAFs and chemical organizations work is because food molecules are implicitly assumed to be available from the environment in the RAF framework, without necessarily needing to be produced by the reaction network itself. The inflow reactions make this assumption explicit in the chemical organization context. Furthermore, if the outflow reactions were not included, there may exist chemical organizations containing molecules that are neither produced nor used up in any reaction. In a RAF this is not allowed, as all molecules need to be produced, through a series of reactions from the RAF set itself, from the food molecules. Therefore, inclusion of the outflow reactions forces all molecules to be produced at a strictly positive rate, rather than merely a non-negative rate (as is normally assumed for a chemical organization). However, as part (iii) of theorem 2 shows, there may be chemical organizations for   that do not correspond to closed RAFs in . First, any RAF  that does not contain a CAF will have a chemical organization o=F consisting of only the food molecules, which would correspond to an 'empty' closed RAF o ( ). However, as lemma 1(ii) shows, we cannot simply discard any chemical organization that consists of F only. But since there can be at most one such chemical organization in any reaction network, it is easy to check if such an chemical organization actually corresponds to a closed RAF or not.
Second, note that in the counter-example given in the proof of theorem 2, the three reactions (r r r , , 3 4 5 ) form what is known as an autocatalytic cycle. Reaction r 3 starts with one 110 molecule plus a food molecule (11) to create 11011. Reaction r 4 then takes this newly produced molecule plus another food molecule (0) to create 110110. Finally, reaction r 5 splits this longer molecule into two copies of 110. In other words, this sequence of three reactions starts with one copy of 110, adds two food molecules, and then produces two copies of 110, which can be written in short-hand as: In other words, the cycle of three reactions 'reduces' to a single autocatalytic reaction. Hence the name autocatalytic cycle. However, this is different from an autocatalytic set, even though each of the three reactions in the autocatalytic cycle is catalyzed by a molecule from the set itself (in this case, 110 for each reaction). In fact, the reaction set r r r , , { } is not F-generated and is thus not an autocatalytic set, but it does correspond to a chemical organization, as shown in the proof above. In conclusion, an autocatalytic set may contain a subset that forms an autocatalytic cycle that corresponds to a chemical organization, but not to a (closed) RAF.
There may even be other specific situations that give rise to a chemical organization but not a closed RAF. However, we argue that these cases are likely to be rare. Indeed, as the computational results below show, they do not seem to occur often (or at all) in random instances of the binary polymer model.
Assuming these exceptions are indeed rare, theorem 2 suggests that calculating the chemical organizations in   provides a reliable method for finding the closed RAFs in a (max)RAF . We therefore propose the following computational procedure: Procedure findClosedRAFs: ) with food set F X Ì , let  be a (max)RAF for . Let  p ( ) be the set of products of reactions in .
(i) Convert  into the expanded reaction set  by replacing each r  Î and x F ) by a reaction in which x is added to the left and right side. In other words, if a reaction r has more than one catalyst in F  È p ( ) , it will be represented multiple times in  , once for each of its catalysts.
(ii) Convert  into   by adding the reactions in F  + and X  -.
This procedure provides an exact algorithm for finding all closed RAFs, by way of calculating all chemical organizations within a (max)RAF . Given that there was no known algorithm yet to enumerate all closed RAFs, or even to determine whether a maxRAF contains another (smaller) closed subRAF, the above procedure can be very helpful in providing more insight into the (so far) unresolved question of how many closed RAFs can be expected to exist within arbitrary reaction networks. As already argued earlier, it is exactly these closed RAFs (corresponding to chemical organizations) that are of interest from a dynamical point of view, and they play an important role in the potential evolvability of autocatalytic sets (see the discussion below).
Note, however, that calculating all chemical organizations ( Step 3) is, in general, an intractable problem, as the number of chemical organizations (and also closed RAFs) can grow exponentially with the size of the reaction network. This algorithm is therefore limited in practice to relatively small reaction networks (a few hundred reactions, roughly), but it is guaranteed to find all closed RAFs.
The last step, checking whether each chemical organization corresponds to a RAF set (and thus a closed RAF set), is necessary due to theorem 2(iii). However, the standard RAF algorithm can be used for this and is therefore an efficient operation, especially compared to the computationally costly step of calculating all chemical organizations.
An illustrative example of applying this computational procedure to instances of the binary polymer model is provided next.

Computational results
Recall figure 2, which shows an example of a maxRAF  as found by the RAF algorithm in an instance of the binary polymer model with n=5 and p = 0.0045. This maxRAF consists of eight reactions. Some of the subRAFs within the maxRAF are indicated by colored shapes. When only food molecules (bit strings of lengths one and two) are present, only the purple and yellow subsets can immediately come into existence, as they use only food molecules as their reactants and catalysts. In other words, the union of the purple and yellow subsets forms the CAF for this network. However, the red and blue subsets each need at least one spontaneous (uncatalyzed) reaction to happen before they can come into existence. Finally, once the blue subset exists, it can be extended with the green subset, which, by itself, is not a proper RAF but can form a larger RAF together with the blue subset. The green extension also requires a spontaneous reaction to come into existence dynamically.
A short movie showing these various subsets coming into existence at different times as a result of (stochastic) spontaneous reaction events is available online at Hordijk (2016).
The eight reactions making up this particular maxRAF  are as follows: Note that for the calculation of the chemical organizations below, we only consider the ligation (forward) reactions here, to keep the example tractable. This maxRAF actually contains a total of 29 subRAFs, which form a partially ordered set (poset), shown as a Hasse diagram in figure 4 (right). The question is now which ones of these 29 subsets are also closed RAFs. To answer this, we calculate the chemical organizations within the maxRAF.
First, the maxRAF  is converted to the corresponding reaction set   as stated in the findClosedRAFs procedure: organizations, also forming a poset as shown in figure 4 (left). The molecules contained in each chemical organization are shown next to the corresponding nodes in the Hasse diagram, with only the 'new' molecules listed (i.e. those molecules that are not already in any of a chemical organization's subsets). For example, the chemical organization labeled '3' contains molecule 01111 in addition to the molecules already listed for the chemical organizations labeled '0' and '2' (its subsets). Finally, we check whether the reconstructed reaction sets o ( ) of these six chemical organizations are indeed RAF sets, which they all are. Therefore, all six chemical organizations o correspond to closed RAFs o ( ). Figure 4 (right) shows how these six closed RAFs are situated within the poset of subRAFs. These closed RAFs correspond to the following reaction subsets and combinations of colored shapes in figure 2: r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r r where P = purple, Y = yellow, R = red, B = blue, and G = green. This example explicitly shows how the procedure works. To get more insight into the expected number of closed RAFs within arbitrary maxRAFs, figure 5 shows the number of closed RAFs found within 120 different maxRAFs of sizes up to around 200 reactions in the binary polymer model, using n=6 and p = 0.0050, and only forward (ligation) reactions. As this graph shows, these maxRAFs contain up to around 30 closed RAFs.
For maxRAFs of around 200 reactions, calculating the poset of subRAFs is already highly intractable. Even maxRAFs of around 20 reactions can have more than 1000 subRAFs. But calculating the chemical organizations (of which there are fewer) is still doable, with the largest instances from the current sample requiring up to one hour of computing time on a 1.50 GHz CPU. Unfortunately, though, for n=7 in the binary polymer model (with maxRAF sizes averaging close to 400 reactions), calculating the chemical organizations becomes too timeconsuming as well.
Finally, to confirm that the counter-examples as shown in the proof of theorem 2 indeed seem to be rare, there are only 27 instances, out of the 120, where the number of closed RAFs is one less than the number of chemical organizations. These are actually all instances where there is no CAF in the maxRAF, which means that there is one chemical organization consisting of just the food set F which does not correspond to a closed RAF (which is easy to check). In other words, cases like the autocatalytic cycle as illustrated in the proof of theorem 2 have not occurred in the current sample. Thus, the proposed computational procedure indeed seems to provide a reliable method for finding all closed RAFs within a (max)RAF.

Discussion
We have shown a direct and formal relationship between closed RAF sets and chemical organizations, refining and extending earlier established relationships between COT and RAFs (Contreras et al 2011, Smith et al 2014. This relationship is considered to be relevant in the context of the origin of life in the following way. As was shown elsewhere, the existence of subRAFs is one of the main requirements for autocatalytic sets to be evolvable (Vasas et al 2012, Hordijk and). However, not every subRAF will necessarily be realized or persist in a dynamical sense. In particular, many subRAFs will only exist as a 'transient' state, quickly becoming part of a larger subset where all reactions that currently can happen indeed will happen. Figure 2 showed an example where there are only six closed subsets out of a larger set of 29 subRAFs. Only these six closed subRAFs are dynamically 'stable', while the other subRAFs are 'transient'. Thus, these closed RAFs are the equivalent of the viable cores (plus their periphery) of Vasas et al (2012), possibly serving as units of variation and selection in early evolution via the possibility of having different combinations of closed RAFs existing inside different compartments (or 'protocells').
However, an important remaining question was how many closed subRAFs can be expected to exist within arbitrary chemical reaction networks. In other words, how much potential variation is there for such early evolution to work with? Here we have introduced and applied an exact computational procedure, based on calculating the chemical organizations within a maxRAF, to answer this question, thus confirming a suggestion made by Contreras et al (2011). Even in a simple polymer model where molecules are represented by bit strings up to length six, we can already expect to see up to around 30 closed subRAFs. We expect this number to grow with increasing network sizes, although unfortunately the introduced computational procedure quickly becomes intractable for such larger sizes.
In conclusion, it seems that we can expect a large enough diversity of closed subRAFs to exist within arbitrary reaction networks (or, more precisely, within arbitrary maxRAFs) to satisfy a basic but important requirement for autocatalytic sets to be evolvable. In earlier work, we had already shown that autocatalytic sets (or maxRAFs) have a high probability of existing, also for moderate (and chemically realistic) levels of catalysis Steel 2004, 2017). Thus, together these results could have significant implications for a possible origin and further evolution of life as a collection of mutually catalytic molecules, i.e., a cooperative reaction network, rather than one or a few individually self-replicating RNA sequences.
Note that one of the main reasons the introduced procedure works is that it explicitly looks for chemical organizations within the maxRAF  (or, more precisely, its expanded version   ), rather than in the full reaction network 0  in which the maxRAF  exists. When asking whether a set of molecules X is a chemical organization or not, it is important to specify the reaction set  that is implied. For example, in Contreras et al (2011) it is stated that 'all RAF sets are chemical organizations'. However, this is only true relative to the reaction set ¢ that consists of only the reactions in the given RAF (sub)set. It is (in general) not true relative to the maxRAF  that a RAF subset ¢ might be part of, or relative to the entire reaction network 0  in which the RAF set exists.
Here we have made the relationship between chemical organizations and RAF sets more precise, by showing that all closed RAF (sub)sets are chemical organizations (relative to the maxRAF). Furthermore, we have answered an open question from Contreras et al (2011): 'whether all chemical organizations are F-generated is an ambiguous matter'. We have shown here, by counter-example, that not all chemical organizations are necessarily closed RAF sets, as they may not be F-generated. However, the exceptions can either be easily detected (such as chemical organizations that consist of only food molecules, when there is no CAF) or they seem to be rare (such as autocatalytic cycles, which may not be F-generated).
This precise correspondence between closed RAFs and chemical organizations also allows a direct application of COT to RAF sets and their dynamics. Thus we can track qualitative transitions in the space of subRAFs, unravel the hierarchical structure and follow the chemical evolution of closed RAFs, predict growth phenotypes, and coarse-grain the overall RAF dynamics.
Some computational questions still remain. For example, is it possible to obtain an estimate of the number of closed subRAFs for larger reaction networks or, alternatively, obtain a random but representative sample of closed subRAFs? Another interesting question is whether or not there is a polynomial-time algorithm for the following basic decision problem: given a CRS X C F , , , ) having a maxRAF , does  contain a closed RAF as a strict subset? We hope to address some of these questions in future work.