Theoretical Computer Science

Efforts in programming DNA and other biological molecules have recently focused on general schemes to physically implement arbitrary Chemical Reaction Networks. Errors in some of the proposed schemes have driven a desire for formal veriﬁcation methods. By interpreting each implementation species as a multiset of formal species, the concept of weak bisimulation can be adapted to CRNs in a way that agrees with an intuitive notion of a correct implementation. The theory of CRN bisimulation can be used to prove the correctness of a general implementation scheme or to detect subtle problems. Given a speciﬁc formal CRN and a speciﬁc implementation CRN, the complexity of ﬁnding a valid interpretation between the two CRNs if one exists, and that of checking whether an interpretation is valid are both PSPACE-complete in the general case, but are NP-complete and polynomial-time respectively under an assumption that holds in many cases of interest. We present effective algorithms for both of those problems. We further discuss features of CRN bisimulation including a transitivity property and a modularity condition, the precise connection to the general theory of bisimulation, and an extension that takes into account spurious catalysts. © 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
In molecular programming, many real and abstract systems can be expressed in the language of Chemical Reaction Networks (CRNs).A CRN specifies a set of chemical species and the set of reactions those species can do, and the CRN model allows us to deduce the global behavior of the system from that local specification.CRNs are a useful way to separately analyze the computational and the physical aspects of a system.We can use the CRN model to help analyze real systems [1,2] or design engineered systems [3,4].
The CRN model is particularly useful as a programming language for molecular computation.Small CRNs have been designed to exhibit simple behaviors and to compute simple problems, such as the "rock-paper-scissors oscillator" which oscillates between high concentrations of three species in consistent order [5,6], and the approximate majority network which converts all of two initial populations of species to whichever one was initially greater [7].Other examples of CRNs designed to compute include a CRN that produces an output with count equal to the larger of two input counts [8] and a CRN that simulates a given Turing machine with arbitrarily small probability of error [9].To show that using CRNs as a programming language can apply to real molecular systems, Chen et al. experimentally demonstrated an implementation of the approximate majority CRN [3], and Srinivas et al. demonstrated an implementation of the rock-paper-scissors oscillator [4], both using DNA strand displacement cascades [10].
In order to discuss how these CRNs compute, we first define a semantics that specifies the behavior of a CRN, then define computation in terms of, for example, from a given input state, how many of or with what probability a given output molecule is produced.The two most common semantics are deterministic or ordinary differential equation (ODE) semantics, and stochastic or continuous-time Markov chain (CTMC) semantics.Deterministic semantics assumes real-valued concentrations of species which evolve according to a system of ODEs determined by the reactions.Stochastic semantics assumes integervalued counts of species which transition discretely at random times according to the reactions, with rates based on the counts and the reaction rate constants.Any CRN can be interpreted in either semantics, but its behavior may be slightly or entirely different.In deterministic semantics, for example, the rock-paper-scissors oscillator will oscillate correctly indefinitely, and the approximate majority CRN will correctly convert all molecules to whichever species was initially greater quickly, even as the difference between initial counts approaches zero, except for an equilibrium with both species present when the difference is exactly 0. In stochastic semantics, on the other hand, the rock-paper-scissors oscillator will oscillate for a while but eventually undergo an extinction event and stop oscillating [6], while the approximate majority CRN will still compute correctly with high probability, but may convert all molecules to the wrong species with error probability increasing with smaller initial differences [7].The CRN from [8] that computes the maximum of its input counts functions identically in deterministic and stochastic semantics, which we suspect is related to the rate-independent property discussed below, a connection which has been partially explored by Chen, Doty and Soloveichik [11].Finally, the CRN to simulate a Turing machine has only been explored in the stochastic model, since it depends on exact integer counts of certain species; its dynamics in deterministic semantics are probably uninteresting.
There is a certain subclass of CRNs that, while they are interpreted in the stochastic model, do not depend on randomness or relative probabilities of certain trajectories to compute an interesting function.This type of computation has been called "deterministic computation" by some [8], but to avoid confusion with deterministic semantics we refer to it in this paper as rate-independent computation.Rate-independent computation requires that the correct answer is produced no matter what reactions fire in what order (subject to a fairness condition), a property which implies that for any choice of positive rate constants, the correct answer is produced with probability 1 in stochastic semantics.Rate-independent computation can compute exactly the semilinear functions [12,8,13], an example of which is computing the maximum of two input counts [8].A concept of rate-independent computation in the deterministic model has been explored by Chen, Doty and Soloveichik [11], where it can compute exactly positive continuous and piecewise rational linear functions, but the exact relation between the two concepts of rate-independent computation is unknown.
Despite the current progress in CRN computation, there remains a gap between abstract and real CRNs.To illustrate this gap, consider the approximate majority CRN [7,3]: This abstract CRN quickly and with high probability converts all of the initial X and Y molecules into the same amount of whichever one was initially greater [7].However, no three molecules with exactly this behavior are known to exist.(In a strict sense, no three molecules with exactly this behavior can exist, because for all three reactions to be driven forward would require X + Y to be both lower-energy and higher-energy than 2B.)For contrast, consider the DNA strand displacement system built by Chen et al. [3] meant to implement this abstract CRN.The DNA system uses additional molecules which are consumed as "fuel" to drive these three reactions, ending up with over 25 each of species and reactions.Without knowing that it is meant to be an implementation of the approximate majority CRN, it might be difficult to tell what the DNA system was meant to do.Even knowing the correspondence, it is not obvious that there is no mistake in that complex implementation.
The issue of verifying correctness is exacerbated by the recent profusion of experimental and theoretical implementations in synthetic biology and molecular programming.Of particular interest to us, Soloveichik et al. [14] designed a systematic way to construct a DNA system to simulate an arbitrary CRN.Since then there have been a number of methods to translate an arbitrary CRN into a DNA strand displacement circuit [14][15][16].While each one gave arguments for why it was a correct implementation, they did not come with a general theory of what it means to correctly implement a CRN.In some cases this led to subtle problems, of which we will give examples later.To be certain that such implementations are correct, CRN verification methods were invented.Such methods include Shin's pathway decomposition [17], Lakin et al.'s serializability analysis [18], and Cardelli's morphisms between CRNs [1].
The concept of "bisimulation" was developed as a way of comparing the observable behavior of concurrent systems [19].Roughly speaking, bisimulation involves coming up with a relation on states such that any action in one of a pair of related states can be matched in the other, leading to another pair of related states.Weak bisimulation, in particular, abstracts away from the details of the system by focusing on "observable" actions and allowing matches between sequences with arbitrary numbers of "silent" actions and one observable action.From this local concept of equivalent states, one can prove global properties of equivalence of the behavior of the systems.Bisimulation has been studied in finite-state systems, Petri nets, and hybrid systems, among others [20][21][22].The complexity of bisimulation in finite systems and in Petri nets (which are equivalent to CRNs) has also been studied; particularly relevant to us, whether an (arbitrary) bisimulation relation exists that relates the initial states of two Petri nets is undecidable [21].However, the direct application of bisimulation to Petri nets used in that result ignores the structure of a Petri net (or CRN) in the following sense: where bisimulation allows matching arbitrary states with each other, CRNs (and equivalently, Petri nets) have a structure on the state space that allows addition of states, and we might require that the bisimulation relation preserves that addition.For example, if A ∼ = B and C ∼ = D, where ∼ = is the bisimulation relation, we might require that A + C ∼ = B + D. If such constraints better capture the notion of "equivalent CRNs", they could also be exploited to simplify the tasks of finding a satisfactory bisimulation or proving that none exists.
Motivated by the expectation that there is a natural class of restricted bisimulations that respect the structure of CRNs in a way well-suited for molecular implementations-and that makes analysis tractable-we present a method for comparing an implementation CRN with an abstract CRN based on the concept of bisimulation from concurrency theory [19].Our method associates each implementation species with a multiset of formal species, then asserts correctness if the reactions reachable from any implementation state are the same as those in the corresponding state in the abstract CRN.Like pathway decomposition [17] and serializability [18] but unlike Cardelli's morphisms [1], our bisimulation method works with the stochastic model for low-copy-number CRNs and doesn't take into account rates or kinetics.Therefore, like pathway decomposition and serializability, bisimulation cannot prove that a property true with some probability in the formal CRN will be true with the same or even approximately the same probability in an implementation CRN, but it can guarantee that any rate-independent behaviors or computations of the formal CRN will carry over to the implementation.The use of interpretations instead of pathways means that some implementations considered correct by pathway decomposition are considered incorrect by bisimulation and vice versa.Interpretations also make bisimulation more local than pathway decomposition or serializability, in that many properties can be checked on individual reactions rather than pathways; we hope this makes bisimulation more understandable and tractable.We show how bisimulation can be used to prove a CRN implementation correct or identify subtle problems.We present an algorithm to check whether a particular interpretation between two CRNs is a bisimulation relation, and an algorithm to find such an interpretation if one exists.We analyze the computational complexity of both problems.We prove that both are PSPACE-complete in the general case but become polynomial time and NP-complete, respectively, when formal reactions are limited to a constant number of reactants.We hope this method can be used in both verifying that engineered systems match their specification and in comparing natural systems to a system simple enough to analyze.

Related works
Our research into verification of CRN implementations is inspired by a number of works on implementing arbitrary CRNs with DNA strand displacement, such as [14][15][16].Soloveichik et al. in [14] present a general construction for DSD implementations of arbitrary CRNs, give an argument that the ODE semantics of their construction should approximate the original CRN in a certain limit, and demonstrate similar (but not identical) behavior on some example CRNs.However, this argument does not address the stochastic model, or rate-independent computation within the stochastic model.(We will show in Section 3.3 that the construction in [14] is in fact correct for rate-independent computation according to our definition of bisimulation, but this is not proven in [14].)Qian et al. in [15] present a "history-free" general CRN implementation suited to the stack machines they then describe, and give an argument for its qualitative correctness in the stochastic model; however the argument is non-rigorous.In fact, the argument in [15] misses an error in the construction as published when applied to certain combinations of reactions, allowing some probability of incorrect behavior when run with low counts of species, which we will discuss further in Section 3.3.Cardelli in [16] presents a simplified, history-free general CRN implementation using only nicked double-stranded gates and single-stranded signals, and defines an algebra which can be used to prove statements about the behavior of such systems.This algebra can prove some desirable properties, but Cardelli in [16] acknowledges that properties such as lack of crosstalk require exploring large state spaces and are thus difficult to prove with that technique.
Visual DSD [23] and Peppercorn [24] provide formal semantics for the behavior of a DNA strand displacement such as the ones mentioned in [14][15][16].Both [23] and [24] also provide an algorithm to, given a DSD system, produce a CRN which models its behavior.This allows us to reduce the problem of verifying DSD implementations of CRNs to the problem of checking whether one CRN is a correct implementation of another.
In addition to the bisimulation method we propose, other methods have been proposed for verifying CRN equivalence for the use case of DNA strand displacement implementations of CRNs, usually focusing on rate-independent computation in the stochastic model.Lakin et al. define a method of verification of an implementation CRN that is already divided into one module for each formal reaction [18].This verification method gives properties of each module individually and of non-interaction between modules that, if satisfied, imply that every trajectory of the implementation CRN is equivalent to a serial execution of trajectories corresponding to some sequence of formal reactions, which is at least sufficient for cor-rectness of rate-independent computation.Shin et al. define a method of computing the "formal basis" of a CRN based on pathways in the implementation CRN that begin and end in formal species [17].If every pathway can be decomposed into the pathways that make up the formal basis, and certain niceness conditions are satisfied, then this "pathway decomposition" method can guarantee that the implementation CRN will be equivalent, at least in rate-independent computation, to its formal basis.Both of these methods depend on a division of the implementation CRN into "formal" species, "fuel" species, "waste" species, and "intermediate" species.This division is a typical feature of engineered DNA strand displacement implementations of CRNs such as the ones mentioned above, but may not generalize to other types of implementations or to comparing natural systems.Lakin et al. [18] give no algorithm for computationally checking the conditions of their serializability method when they are not already known by design; Shin et al. [17] give an algorithm for finding the formal basis of a CRN, but its complexity is not known and it seems to require more than polynomial space in the general case.Both of these methods have been applied to DNA strand displacement implementations such as those in [14][15][16].Where Lakin et al. [18] discuss a method of verification that they apply to a specific class of implementations (two-domain strand displacement), and Shin et al. [17] discuss a method with an algorithm that can be applied to any individual implementation CRN, we hope to present a method with an algorithm that can verify any individual implementation CRN, but is also tractable enough to permit proofs that, for example, any implementation created by a given translation scheme will be correct.
Another area of related work has explored more abstract forms of model reduction and CRN equivalence not shaped by DNA strand displacement CRN implementations.Gay et al. discuss a method meant to be useful for systems biology models, where the exact structure of the network has a number of uncertainties and unknowns [25].This method is based on two operations, merge and delete, which can be applied to graphs of the reactions of a CRN, such that there is an epimorphism from the detailed graph to the more simple graph if and only if the simple graph is equivalent to the result of some sequence of these rules applied to the detailed graph.Some of these concepts correspond loosely to concepts in bisimulation, for example the merging of two species in [25] is intuitively similar to two implementation species being interpreted as the same formal species, but [25] does not have any equivalent of the requirement in bisimulation that any reaction involving one of two equivalent species has an equivalent reaction that involves the other instead.Possibly for this reason, Gay et al. do not prove any properties of the behavior of their networks that a graph epimorphism implies carry over from one network to another.
In a line more directly related, Cardelli, Tribastone, Tschaikowski, Vandin, and Tognazzi have developed multiple concepts of CRN equivalence based on strong bisimulation.Forward and backward bisimulation respectively imply two different forms of equivalence of the ODE semantics behavior of the equivalent CRNs [26,27], while syntactic Markovian bisimulation implies equivalence of the CTMC semantics [28].Each of these types of bisimulation is an equivalence relation between species of a single CRN, with properties that imply that a simpler CRN, whose species are the equivalence classes of that relation, has dynamics that are well-defined and equivalent in the appropriate sense to the original CRN.In all three cases it is easily checkable whether a given relation is a (forward, backward, or syntactic Markovian) bisimulation, and the same authors give algorithms and software tools to find such relations [29][30][31]28].All three of these methods correspond to strong bisimulation in the sense of [19]: one reaction in one system must be matched by exactly one reaction in the other, and vice versa, as opposed to weak bisimulation, where one reaction in one system can be matched by zero or more reactions in the other.This allows these methods to imply equivalence of kinetics in the deterministic and stochastic models, respectively, which would be much more difficult with weak bisimulation.These methods also use a relation on species to induce the relation on states required by bisimulation in the sense of [19], which disallows phenomena such as a single strand of a DSD circuit representing that one copy of the formal species C and two copies of formal D are all present.Since DSD CRN implementations tend to use both one DSD species representing multiple formal species and one formal reaction taking multiple DSD reactions to implement, none of the DSD schemes presented in [14][15][16] are equivalent to their formal specifications according to forward, backward, or syntactic Markovian bisimulation.
A preliminary version of this work by the same authors was presented at the DNA22 conference in 2016 [32], which in turn incorporated many elements of Qing Dong's 2012 Masters Thesis [33] on this topic.The present paper has been expanded to include additional examples and improved proofs; an algorithm to check the modularity condition and complexity analysis of that algorithm; an expansion of the theory to handle "spurious catalysts", algorithms for the expanded theory, and complexity analysis of those algorithms; additional examples to illustrate various features of CRN bisimulation; and additional discussion of this theory's place in the field of bisimulation and related verification theories.

The chemical reaction network model
We work within the Chemical Reaction Network (CRN) model.A CRN is a pair (S, R), where S is a finite set of species and R a finite set of reactions.A reaction is itself a triple (R, P , k), where the reactants R and products P are both multisets of species, and k > 0 is a real number.We require that in any reaction R = P , and that no two reactions (R, P , k 1 ) and (R, P , k 2 ) with the same reactants and products exist in the same CRN.Once we define semantics, we will see that these requirements can be met without loss of generality; in both deterministic and stochastic semantics, a reaction (R, R, k) has no effect on the CRN's behavior, and a CRN with two reactions (R, P , k 1 ) and (R, P , k 2 ) behaves exactly the same as one where those two are replaced by one reaction (R, P , k 1 + k 2 ).
2C is a shorthand for the two reactions ({| A, B|} , {|2C|} , k 1 ) and ({|2C|} , {| A, B|} , k 2 ).Where S ∈ N S is a multiset and X ∈ S, S( X) refers to the count of X in S; this matches the formal definition of N S as the set of functions S → N. Multisets can be added and multiplied by scalars componentwise, and can be compared componentwise: S ≤ T ⇐⇒ ∀ X S( X) ≤ T (X), and S < T if S ≤ T and S = T .If S ≤ T then subtraction T − S is defined componentwise.Set operations involving multisets implicitly treat each multiset as the set of all objects which appear at least once; e.g.{|1, 1, 2|} ⊂ {1, 2, 3} but {|1, 1, 2|} ⊂ {1}.S ∧ T is the componentwise minimum, (S ∧ T )(X) = min(S( X), T (X)).S\T = S − (S ∧ T ) is the removal of T from S: (S\T )(X) = max(S( X) − T (X), 0).The size of a multiset S ∈ N S is the number of objects in it, |S| = X∈S S( X).
The CRN model comes with two common interpretations or semantics: deterministic semantics and stochastic semantics.
In deterministic semantics, the state of the CRN at any given time is a vector s ∈ R S ≥0 of the concentrations of each species, which evolves according to a set of ordinary differential equations.These differential equations come from the reactions, and for each X ∈ S, where Here ρ(r i , s) refers to the "rate" of reaction r i in state s, which according to mass-action kinetics is given by ρ (Y ) .Since this paper mostly does not deal with deterministic semantics, we only briefly mention it.
In stochastic semantics, the state of the CRN at any given time is a vector S ∈ N S of the counts of each species, which transitions probabilistically to other states, forming a continuous-time Markov chain.In any given state S, each reaction , and the reaction takes the CRN from state S to state S − R i + P i .Note that this expression is defined if and only if S ≥ R i ; when S ≥ R i we set ρ(r i , S) = 0.These propensities play the role of state transition rates in the continuous-time Markov chain.
In the stochastic model, each possible behavior of a CRN is specified by a timed trajectory: an initial state S 0 ∈ N S together with a (finite or infinite) sequence of reactions r i = (R i , P i , k i ) ∈ R and times t i at which they occur, with t i > t i−1 > 0. When we care only which reactions happened in what order but not at what exact time, we can define a trajectory as an initial state followed by a sequence of reactions, without the times; each trajectory can be identified with the set of all timed trajectories with that initial state and sequence of reactions.A trajectory implicitly specifies a sequence of states S i = S 0 + j≤i (P j − R j ), but a sequence of states is not enough to specify a trajectory.For example, if both reactions, then the sequence of two states (S 0 , S 1 ) = ({| X, A|} , {| X, B|}) does not specify which of those two reactions happened, which is sometimes important.The CTMC implicitly specifies a probability distribution over timed trajectories, and since a trajectory interpreted as a set of timed trajectories will be a measurable set in this probability space, we also get a probability distribution over trajectories.For rate-independent computation, we care only about which trajectories are possible, ignoring the times of the reactions and the relative probabilities.We say that a finite trajectory is valid if and only if it has nonzero probability, and an infinite trajectory is valid if every finite prefix is valid.Note that a trajectory is valid if and only if every reaction is possible in the state it occurs, i.e.R i ≤ S i−1 for all i.Since whether a trajectory is valid in a CRN does not depend on the rate constants of any reaction (as long as they are all positive), and from here on we are generally working with rate-independent computation, we write reactions as a pair (R, P ) or R → P instead of a triple (R, P , k) or R k − → P .In general when we speak of "the trajectories of a CRN" we mean the valid trajectories.
A state T is reachable from a state S if T is the result of a valid finite trajectory that starts in S. We say a state T is coverable from a state S if there is some T ≥ T such that T is reachable from S. While the set of reachable states (from any given initial state) is an important aspect of the behavior of a CRN, it does not contain all the information about that CRN.
For example, the two CRNs ({A, B, C }, {A → B, B → C , C → A}) and ({A, B, C }, {A → C , C → B, B → A}) have exactly the same set of reachable states T from any given initial state S, but in an external context that distinguishes A, B, and C from each other these two sets of reactions are clearly different in a meaningful way.If however the set of (valid) trajectories of two CRNs are the same, then the two CRNs must be identical: since in particular the length-zero trajectories (i.e.states) are the same, so the sets of species are the same, and the length-one trajectories (single reactions) are the same.We say that two CRNs are isomorphic if there is a bijection between the sets of species such that the set of reactions of one, after applying this bijection, equals the set of reactions of the other.

Interpretations
Schemes for translating an arbitrary abstract CRN into a DNA Strand Displacement (DSD) implementation [14][15][16] provide designs for the necessary DNA molecules, but how these molecules interact is best described by a model of the relevant biophysics.Reaction enumerators such as Visual DSD [23] and Peppercorn [24] produce, given a set of DNA molecules, a description of their predicted interactions as a CRN, allowing us to compare it to the original CRN using the same language.Since most molecular systems can be described as CRNs, defining correctness as a comparison of CRNs will also cover much more general cases, not limited to DNA strand displacement.We refer to the original abstract CRN as the formal CRN (S, R) and the model's enumerated CRN as the implementation CRN (S , R ), which is usually larger than the formal CRN.As a convention, we assume that the formal CRN and the implementation CRN make use of disjoint sets of species.When using Fig. 1.Implementation of A + B → C + D using the scheme described in [14].Top: DNA complexes and reactions, given as a diagram of the DNA strand displacement circuit.Each complex shown in the diagram is one species in the enumerated CRN, and arrows are reactions that would be enumerated by a reaction enumerator.Designated "signal" species are enclosed in dashed boxes, and designated "fuel" species in gray boxes.Bottom left: Direct translation of reactions in the implementation CRN.Bottom right: Implementation CRN after removing fuels.
verification to compare a detailed model of a natural system with unknown function to a simpler abstract CRN with known function, the natural system is the implementation and the abstract system is the formal CRN.
Although the definition of correctness we will propose is general, some of its parts are inspired by engineered implementations such as DNA strand displacement.There are three important features typical of engineered implementation CRNs that a concept of correctness must deal with.First, there is typically for each formal species A an implementation species x A intended to correspond specifically to it, sometimes called a "signal species".Second, certain implementation species must always be present for the system to work, and are designated "fuel species".Fuel species are typically assumed to be held at a constant concentration, for example by setting their initial concentration high enough that it does not vary significantly over the running time of the CRN.In this situation, we can approximate the implementation CRN by a simplified CRN with all fuel species removed; e.g. if g 1 is a fuel, the reaction x A + g 1 → i A can be replaced by x A → i A with no loss of meaning.This approximation holds not just for rate-independent computation, but for both stochastic and deterministic semantics in the following sense: if a species g 1 is (approximately) at a constant concentration c over the time interval considered, the equations introduced in Section 2 for the dynamics of all other species will be (approximately) the same [14].Third, certain species are inert side products whose further presence or absence has no effect on the correctness of the system behavior, and are thus designated "waste species".Such species can also be removed with no effect on the system dynamics.However, one advantage of our theory is that it does not need to distinguish signal species or waste species from each other or from other species: while knowing that a given species is a signal or a waste can be a helpful hint for finding an interpretation in our theory, it is not necessary, and there are no special rules for signal or waste species.Our theory still requires that fuel species be removed before applying the theory.
Fig. 1 gives an example of this process for the formal reaction A + B → C + D, yielding an implementation CRN with four reactions.(Names such as x A and t C D are based on the intent of the designers of the CRN, but the subscripts have no theoretical meaning.)The signal species x A can freely convert to and from i A , and the strand t C D can produce the signals x C and x D (and waste w 2 ).Intuitively, i A is an A and t C D is a C and a D; in this sense the first and third reactions are silent, and the second is A + B → C + D. We use this intuition as a basis for our definition of correctness.Formally, we define an interpretation of the implementation species (Fig. 2): We also define an interpretation of reactions: , in which case m(R → P ) = τ and we say the reaction is trivial.
The interpretation of an implementation reaction is always a pair (R, P ) of multisets of formal species, or τ , but (R, P ) may not be in R. Any such pair is a reaction in the language of the formal CRN, but is a formal reaction only if (R, P ) ∈ R.
Similarly, (R , P ) is an implementation reaction only if it is in R .[19], a connection that will be further discussed in Section 5.1.)

Three notions of correctness
Our notion of correctness is motivated by the earlier observation that the set of valid trajectories defines equivalence between formal CRNs, and allowing renaming of species defines isomorphism.Applying this notion to an implementation CRN with an interpretation introduces two difficulties.First, due to trivial reactions, the implementation trajectory may involve more steps.This is easily solved by defining the interpretation of an implementation trajectory to remove trivial reactions.Second, and more seriously, the full set of interpreted implementation trajectories may cover the formal trajectories, yet particular implementation trajectories may experience restricted options for alternative paths.Two examples of this are shown in Figs. 3 and 4.
In the first example (Fig. 3), there are two implementation species, x B and y B , that are both interpreted as B. Because x B can do anything in the implementation CRN that B can do in the formal CRN, and because x A can react to become x B , the implementation CRN can match any trajectory of the formal CRN using x B and ignoring y B .However, an implementation trajectory that reaches-or starts from-y B cannot proceed, whereas, the formal CRN cannot get stuck.To see that the key issue concerns limiting options, of which getting stuck is just a special case, the reader is encouraged to construct an example where the formal CRN is the one in Fig. 3 augmented by the three reverse reactions, but the implementation CRN Fig. 3. Example CRNs with the same set of (interpreted) trajectories but different behavior.Circles represent species and arrows represent reactions; implementation species are given with their name above the line and interpretation below the line, so for example x A is an implementation species with m(x A ) = {| A|}.Both CRNs have the same set of trajectories (after interpretation): from any initial count of As, Bs, and C s, each species can independently change from A to B, B to C , and C to A any finite or infinite number of times.However, the implementation CRN (right) can convert all species to y B , from which no further reactions are possible, while the corresponding formal state of all Bs can react further.

Fig. 4.
Modified version of the rock-paper-scissors oscillator [5,6] and an incorrect implementation.Adding the reactions 2 A 0.01 −−→ 2B etc. ensures that the formal CRN oscillates forever under stochastic semantics (left CRN, left image); without these reactions, eventually the count of one will hit zero and can never be recovered [6].An implementation CRN with two variants (x A , y A , etc.) of each formal species oscillates correctly over short periods of time, and the sets of trajectories of the two CRNs are the same; however, the implementation CRN can and eventually will reach a state where all species are in y form, in which case no further reactions can happen (right CRN, right image).can become trapped in a subspace where only the "clockwise" trajectories are possible, although it can never get stuck.To appreciate the subtlety of the problem, in our second example (Fig. 4), there are two forms of each formal species, and while the x forms can copy the formal CRN exactly, if all species are converted to y forms in the implementation CRN then no further reaction can happen.As mentioned earlier, this paper is not concerned about differences in kinetics or the probability distribution over trajectories; however, we would like to be able to ensure that properties about what states can be visited in the future, and in what order, are preserved in the implementation.Effectively, the naive definition of trajectory equivalence requires that for every formal state there exists an implementation state with the same interpretation and behavior, while we need a finer-grained notion of trajectory equivalence that requires for every formal state, all implementation states with the same interpretation have the same behavior.As defined formally below, the finer-grained notion becomes a satisfactory definition of correctness.
Although trajectory equivalence as defined below has the desired meaning, since the sets of trajectories are generally infinite, we would like a more local definition that facilitates efficient computational analysis.We define three local conditions on the interpretation which we show are equivalent to trajectory equivalence.As further evidence that our notion of correctness is sound, we show that these three conditions are equivalent to a special case of weak bisimulation from con-currency theory [19].(We discuss this connection further in Section 5.1.)This gives us three notions of correctness, given a formal CRN, an implementation CRN, and an interpretation: Definition 3.2 (Three notions of correctness).An implementation CRN (S , R ) is a correct implementation of a formal CRN (S, R) if a correct interpretation exists.An interpretation m : S → N S is correct if any of the following three sets of conditions are true: I Equivalence of trajectories (i) The set of formal trajectories and interpretations of implementation trajectories are equal.
(ii) For every implementation state S , the set of formal trajectories starting from m(S ) and interpretations of implementation trajectories starting from S are equal.II Three conditions on the interpretation (i) Atomic condition: For every formal species A, there exists an implementation species x A such that m(x A ) = {| A|}.A few comments may help explain these definitions.It may seem that the second condition for trajectory equivalence supersedes the first, but it does not: for example, the second condition may be satisfied even if there is no implementation state S that is interpreted as formal state S, whereas the first condition will not be satisfied in that case.In our definition of weak bisimulation, the use of T and T is in some sense redundant due to the structure of CRNs: the resulting state of a reaction is determined by the initial state and the reaction, so for example if S r − → T and S r = ⇒ T then it already must be the case that m(T ) = T .The definitions of the delimiting and permissive conditions are thus more suited to the structure of CRNs; we gave the definition of weak bisimulation as we did to match the definitions used in concurrency theory [19], and Theorem 3.1 proves that this definition is still equivalent.Because of the connection to bisimulation theory, when m : S → N S is a correct interpretation we often say that m is a bisimulation from (S , R ) to (S, R).When the distinction is important, we refer to a bisimulation that satisfies our additional restrictions (i.e., is an interpretation that satisfies the atomic condition) as a "CRN bisimulation", but for most of this paper when not explicitly comparing the different theories of bisimulation we use "bisimulation" to mean "CRN bisimulation".Theorem 3.1.The three definitions of correctness, namely trajectory equivalence, the three conditions on the interpretation, and weak bisimulation, are equivalent.
Proof.We show that trajectory equivalence implies the three conditions formulation; the three conditions imply weak bisimulation; and weak bisimulation implies trajectory equivalence.
Given trajectory equivalence, we prove the three conditions on m.First, for the atomic condition, consider applying condition I.(i) of trajectory equivalence to formal trajectories of length 0, which are just formal states, and in particular formal states S A = {| A|} for each formal species A. That the set of trajectories are equal implies that there is an implementation trajectory whose interpretation is the (zero-length trajectory) state S A , i.e. an implementation state S A with m(S A ) = {| A|}.Since implementation species cannot be interpreted as fractional or negative formal species, there is some species x A ∈ S A with m(x A ) = {| A|}, satisfying the atomic condition.Second, for the delimiting condition, consider implementation trajec- tories of length 1, specifically for each implementation reaction r = (R , P ) the trajectory R r − → P .If r is trivial, that is m(r ) = τ , its interpreted trajectory is a zero-length trajectory; if not, its interpreted trajectory is m(R ) m(r ) − −− → m(P ), which by trajectory equivalence must be a formal trajectory.For that to be so, m(r ) must be a reaction in R, thus satisfying the delimiting condition.And third, for the permissive condition, for every formal reaction r = (R, P ) and implementation state S with m(S ) ≥ R, the trajectory m(S ) r − → T , where T = m(S ) − R + P , is a formal trajectory.By condition I.(ii) of trajectory equivalence, there is an implementation trajectory starting in S whose interpreted trajectory is m(S ) r − → T .(Note that condition I.(i) implies this for some S with m(S ) = S, but not necessarily for every S .)To have that interpretation, that implementation trajectory must have some reaction r with m(r ) = r and all other reactions trivial; this is the definition of Given weak bisimulation, we prove trajectory equivalence.We first prove condition I.(ii).Given any S 0 with S 0 = m(S 0 ) and any implementation trajectory (S 0 , r 1 , . . ., r k , . . .= ⇒ implicitly defines an implementation trajectory (S 0 , r 1,1 , . . ., r 1,l 1 , r 1 , r 2,1 , . . . ) where each m(r k, j ) = τ and each m(r k ) = r k ; the interpretation of this trajectory is the formal trajectory (S 0 , r 1 , . . ., r k , . . . ) as desired, proving condition I.(ii).Condition I.(i) follows from condition I.(ii) of trajectory equivalence and condition III.(ii) of weak bisimulation: every implementation trajectory starts from some S and by condition I.(ii) its interpretation must be a formal trajectory starting from m(S ).Conversely, every formal trajectory starts from some S, by condition III.(ii) of weak bisimulation there is some S with m(S ) = S, and by condition I.(ii) of trajectory equivalence there is an implementation trajectory starting from S whose interpretation is that formal trajectory. 2

Applying bisimulation
We now consider how to use bisimulation to analyze our example implementation of A + B → C + D, as shown in Fig. 2. We use the three conditions formulation.The atomic condition is satisfied by the "signal species" x A , x B , x C , and x D .
For the delimiting condition, we check each implementation reaction individually: Now consider a different case.Fig. 5 shows an implementation of A + B → C + D as described by Qian et al. [15] as a means to implement stack machines, along with a natural interpretation.The species i A B:C D is interpreted as C + D, while i A:BC D is interpreted as A and x B as B. This makes the reaction i A B:C D → i A:BC D + x B interpreted as C + D → A + B, which is not a valid formal reaction.Thus the delimiting condition is unsatisfied, and the implementation is not correct according to bisimulation.Although we only show one candidate interpretation, any interpretation will have the same problem provided , and m(x D ) = {|D|}.The core of the problem in this implementation is that the irreversible step in the pathway happens only after x C and x D are released, so there exist trajectories in which one or both is released and then the pathway reverses, producing x A and x B again.When analyzed with bisimulation, such a system will (in the closest possible interpretation) have some step that is interpreted as A + B → C + D but is also reversible, in which case the reverse reaction will be interpreted as C + D → A + B, which is the problem described above.While this does not lead to a problem for the specific construction of deterministic stack machines used in Qian et al. [15], it does identify an error with this as a translation scheme for arbitrary CRNs: if the reaction A + B → C + D were put together with a reaction C → C + E, then it would be possible to go from {| A, B|} to {| A, B, E|} in the implementation CRN when it is not possible to do so in the formal CRN.As an aside, the inevitability of a reverse reaction interpreted as C + D → A + B brings up the question of whether this construction would be a correct implementation of a formal CRN with the reversible reaction A + B C + D. In fact the construction in Fig. 5 would not, since although the delimiting condition would be satisfied, the reaction C + D → A + B cannot take place starting from x C + x D , failing the permissive condition: the reaction needs x C + x D + i A BC D: to reverse itself, but i A BC D: is not a fuel.However, Qian et al. designed a construction intended to implement reversible reactions, also presented in [15], which when enumerated produces a similar CRN with i A BC D: replaced by a fuel f A BC D: and no i A BC D: + f i → w A BC D reaction, and this construction is correct according to bisimulation.A given CRN can be a correct implementation of more than one formal CRN.Given an interpretation, there is only one possible formal CRN for which that interpretation might be a bisimulation, but a given implementation CRN may have multiple interpretations to multiple formal CRNs, more than one of which may be correct bisimulations with an appropriate formal CRN.At the extremes, every CRN is an implementation of itself where m(x) = {|x|} for all species x is a bisimulation, and every CRN is an implementation of the CRN with 0 species and 0 reactions where m(x) = ∅ for all x is a bisimulation.As a more interesting example, consider the implementation CRN shown in Fig. 6(A).The CRN has 16 species x i, j for 1 ≤ i, j ≤ 4, with reactions x i, j x i+1, j and x i, j x i, j+1 for all appropriate i, j.In many implementations, species with m(z) = ∅ play a role.We call those species "null species".One type of null species is what theories such as pathway decomposition [17] would call "waste species": implementation species that never appear as a reactant unless all other species involved in the reaction are also waste species.This allows the waste species to be ignored once produced.The species w 1 and w 2 in the example above from Soloveichik et al. [14] (Fig. 1) are examples of this type of null species, which are usually easy to handle with bisimulation.However, not all null species have to be waste species.Consider the following formal and implementation CRNs: , which requires the null species z.So starting from either x A or y A , to implement A → B the system loops x A → y A and y A → x A + z as many times as needed, then does x A + 3z → x B .Since any state whose interpretation has an A must have either x A or y A , this proves the permissive condition is satisfied.The atomic condition is satisfied by m(x A ) = A and m(x B ) = B, and the delimiting condition is satisfied since the first two implementation reactions are trivial and the third is A → B. Thus this is an example of a correct interpretation with a non-waste null species.
As a final example of the effects of null species, consider a pair of CRNs similar to the previous example: If we consider the same interpretation as the previous CRN pair, m then the permissive condition is not satisfied.In the implementation state {| y A |}, since m({| y A |}) = {| A|} the reaction A → B should be possible, but in the implementation CRN no reactions are possible.(In fact, no correct interpretation exists.)Intuitively, this CRN tried to implement A → B where the combination y A + z represents one copy of A; in fact, the theory of pathway decomposition [17] would take this view, with x A and x B identified as formal species A and B and with y A and z considered intermediate species, because pathway decomposition only considers initial states consisting exclusively of formal species when evaluating correct behavior.However, our use of interpretations requires that every implementation species must have a meaning on its own, and counts implementations that rely on combinations having meaning as invalid.
One example of CRN comparison, as discussed in [1], is when a larger CRN contains multiple copies of a smaller CRN, and we want to consider the larger CRN as an implementation of the smaller CRN.As a naive example, consider just two copies of a correct implementation as the implementation CRN: However, when we try to do the same thing with bimolecular reactions, problems arise.Consider: then the permissive condition is not satisfied: the state {|x A , y B |} is interpreted as {| A, B|}, and thus should be able to implement A + B → C , but in fact no reactions are possible and it cannot.This sort of problem is the motivation for the modularity condition which we describe soon, and the reason why checking the permissive condition is difficult, as discussed in Section 4. One solution, in terms of implementation CRN design, is to allow every possible combination of reactants to any reaction to interact: This, with the same interpretation as above, is now a correct implementation.However, it can scale poorly: if a formal reaction has k reactants and each one has n possible implementation species, then there are roughly n k combinations which each need their own reaction.A better-scaling way is to allow different implementation species to interconvert: Again using the same interpretation, this system is a correct implementation.This approach is the one expected by the modularity condition, which helps the scaling behavior.

Properties of CRN bisimulation
We describe two properties of CRN bisimulation that are likely to be useful when analyzing larger systems.While bisimulation in the classic sense is an equivalence relation between systems [19], our definition of interpretation-dependent CRN bisimulation is a partial order on the set of CRNs.In particular, CRN bisimulation is transitive, which allows us to do complex proofs of correctness in stages.We also show a modularity condition, where the combination of interpretations can be verified using only properties of each individual interpretation.This is particularly useful for general translation schemes where the translation of a whole CRN is the combination of one "module" for each reaction.As an example, we use modularity to prove that the translation scheme in [14] is correct for any CRN.
We first show that CRN bisimulation is transitive.Consider three CRNs: an abstract CRN (S, R), an implementation CRN (S , R ), and an intermediate CRN (S , R ).For example, (S, R) is an abstract CRN, (S , R ) is a low-level reaction enumeration of a prospective DNA implementation of (S, R), and (S , R ) is a more high-level reaction enumeration of the same DNA implementation which abstracts away from certain details.Say we have proven that (S , R ) is a valid implementation of (S, R) by finding an interpretation m 1 : S → N S which is a bisimulation, and similarly have found an interpretation m 2 : S → N S which is a bisimulation from (S , R ) to (S , R ).We want to prove that (S , R ), the system we actually have, is a valid implementation of (S, R), the system we want.The natural interpretation m : ), treating m 2 as a function of species and m 1 as extended to a function of states.It turns out that this interpretation is in fact a bisimulation.
Proof.We use the three conditions formulation of correctness.We refer to (S, R) as the "formal" CRN, (S , R ) as the "implementation" CRN, and (S , R ) as the "intermediate" CRN.We show that each condition for m follows from the corresponding conditions for m 1 and m 2 .
For any formal species A, by the atomic conditions for m 1 and m 2 there is an intermediate species For any implementation reaction r = R → P , by the delimiting condition for m 2 its interpretation m 2 (r ) is either Since n is a bijection, any reaction that would be trivial after interpretation (by either m 1 or m 2 ) must be trivial before interpretation, and thus cannot exist.By the delimiting condition for m 1 , every reaction in R 1 must have its image under n in R 2 ; by the permissive condition for m 1 , every reaction in R 2 must have its preimage under n in R 1 ; thus the two CRNs are equal up to the isomorphism n. 2 This result stands in contrast to the definition of bisimulation in transition systems, which is an equivalence relation on states that can be extended to an equivalence relation on systems [19].We discuss this difference further in Section 5.1.
In Section 3.3 we showed that the translation scheme from [14] is a correct implementation of the single reaction Intuitively, given a CRN of multiple reactions we should be able to combine the implementations of each such reaction to form a correct implementation of the CRN.In particular, we would like to show that the combined implementation CRN is correct using a condition which we can check on each individual reaction's implementation without having to check any property of the combined CRN.Since, as we will see in Section 4, the time required to check an interpretation scales much worse than linearly in the size of the implementation CRN, such a modularity condition would be a significant saving in the time required.While it is not in general true that combining two correct implementation CRNs gives a correct implementation of the combined formal CRN, there is a modularity condition which guarantees that the combined CRN is correct.
We consider an implementation CRN (S 1 , R 1 ) and formal CRN (S 1 , R 1 ) with interpretation m 1 : S 1 → N S 1 , and another implementation CRN (S 2 , R 2 ) and formal CRN (S 2 , R 2 ) with interpretation m 2 : S 2 → N S 2 , where both m 1 and m 2 are bisimulations.We assume the interpretations are compatible: for each We also assume that the reactions in R 1 and R 2 are the only reactions that occur when you combine the implementation species in S 1 and S 2 ; that is, we assume no crosstalk reactions.Whether there is crosstalk can be checked by a reaction enumerator [23,24], but is beyond the scope of this theory.Aside from crosstalk, the main reason for the combined implementation to be incorrect according to bisimulation is a failure of the permissive condition.If some implementation species y in e.g. S 1 but not in S 2 has an interpretation that contains a formal species A ∈ S 1 ∩ S 2 , there may be some formal reaction in R 2 with A as a reactant that cannot be implemented from an implementation state where y is the representation of A, in which case the permissive condition is false.(For example, if in Fig. 7 the reaction i 1:1 → x A was not present, then the interpretation m 1 would still be a correct bisimulation, but the combined interpretation m would fail the permissive condition for where formal reaction C + A → B + D cannot be implemented from implementation state {|x C , i 1:1 |}.)If any such species y can, via trivial reactions, "release" any formal species in S 1 ∩ S 2 in its interpretation to implementation species in S 1 ∩ S 2 , then we would think this problem cannot arise.This condition can be checked individually on each module without checking the combined CRN, and we show that this condition guarantees that the combined implementation is correct according to bisimulation.An example of a modular implementation CRN is shown in Fig. 7.

Definition 3.3 (Modularity condition).
Let m be a bisimulation from (S , R ) to (S, R).Let S 0 ⊂ S and S 0 ⊂ S be subsets of implementation and formal species, respectively, where y ∈ S 0 ⇒ m( y) ⊂ S 0 .We say that m is a modular interpretation with respect to the common (implementation and formal) species S 0 and S 0 if for any x ∈ S there is a sequence of trivial reactions {|x|} τ = ⇒ Y + Z where Y ⊂ S 0 and m(Z ) ∩ S 0 = ∅, i.e. all common formal species in the interpretation of x are extracted as common implementation species.

Theorem 3.2 (Modularity). Let m 1 be a bisimulation from
where m 1 and m 2 agree on and m : S → N S equal m 1 on S 1 and m 2 on S 2 .If m 1 and m 2 are both respectively modular bisimulations with respect to the common implementation species S 1 ∩ S 2 and common formal species S 1 ∩ S 2 , then m is a bisimulation from (S , R ) to (S, R), and m is also modular with respect to S 1 ∩ S 2 and S 1 ∩ S 2 .
Proof.We use the three conditions formulation.The atomic condition for m for each formal species A is satisfied by the species x A that satisfy it for m 1 or m 2 , as appropriate, or possibly both; e.g. if A ∈ S 1 then there is some species Similarly the delimiting condition for m follows from that for m 1 and m 2 : for any implementation reaction R → P in R , that reaction is in either R 1 or R 2 (the proof still holds if in both), and its interpretation in m agrees with its interpretation in either m 1 or m 2 as appropriate, which is either a trivial reaction or a formal reaction in R 1 or R 2 , which is thus in R.
For the permissive condition, consider a formal reaction r = R → P and implementation state S where R ≤ m(S ).Either r ∈ R 1 or r ∈ R 2 ; without loss of generality say r ∈ R 1 (where again, the proof still holds if also r ∈ R 2 ).Divide S into species in the first CRN and species not: let S = S 1 + S 2 , where S 1 ⊂ S 1 and S 2 ∩ S 1 = ∅.If m(S 1 ) ≥ R, then the permissive condition for m 1 applied to reaction r and state S 1 mean S 1 r = ⇒, thus S r = ⇒ by the same sequence of reactions ignoring species in S 2 .In the general case, this means the proof is nontrivial only for formal species in R whose implementations in S are in S 2 , and we need to show that those formal species can be "extracted" into an implementation species in S 1 .This is exactly the modularity condition: for each species x i ∈ S 2 there is a sequence of trivial reactions by which That m is modular with respect to S 1 ∩ S 2 and S 1 ∩ S 2 also follows simply from the same properties for m 1 and m 2 , and the fact that if m 1 (r ) = τ or m 2 (r ) = τ then m(r ) = τ .For any x ∈ S , there is a sequence of trivial (under m) reactions by which then such a sequence follows from the modularity of m 1 , and if x ∈ S 2 , from the modularity of m 2 . 2 DNA implementation schemes for arbitrary CRNs such as [14], [15], and [16] typically have a set of common species and for each formal reaction a "module" with additional species and implementation reactions that implement the formal reaction.If the modules have no crosstalk and each one correctly implements its reaction and satisfies the modularity condition, then repeated applications of Theorem 3.2 prove that the entire CRN is a correct implementation.Corollary 3.3.Consider a formal CRN (S, R) with n reactions R = {r i } n i=1 , and n implementation "module" CRNs (S 0 ∪ S i , R i ) with species S 0 in common, where any S i is disjoint from any S j for 0 ≤ i < j ≤ n.If there are interpretations m i : S i → S for 0 ≤ i ≤ n such that the interpretation (m 0 ∪ m i ) is a bisimulation from (S 0 ∪ S i , R i ) to (S, {r i }) and is modular with respect to the common implementation species S 0 and common formal species S, In particular, the translation scheme from [14] discussed earlier satisfies the condition in Corollary 3.3 for S 0 = {x A | A ∈ S}, i.e. the signal species.Note that formally, each module contains all signal species-even ones that do not appear in reactions in that module.For example, if the formal CRN has reactions A + B → C and A + D → B, then the signal species x D ∈ S 0 appears as an implementation species in the module corresponding to A + B → C but does not appear in any reaction in that module.Although counterintuitive, our theory works fine when some species do not appear in any reactions.Thus Corollary 3.3 proves that for any number of formal reactions, the scheme in [14] produces a correct implementation CRN, as long as the DSD reaction enumerator produces exactly the described reactions and no additional crosstalk reactions.

Checking bisimulation
We now have a definition of "correct implementation", and can sometimes prove that a particular implementation is or is not correct.We would like to find a general way to check whether any implementation is correct.
We divide "checking bisimulation" into three questions.First, given a formal and implementation CRN and an interpretation, is the interpretation a bisimulation?Second, if (as in most engineered CRN implementations) we have a formal CRN, implementation CRN, and for each formal species A a designated signal species x A , is there an interpretation which is a bisimulation and has m(x A ) = {| A|}? Finally, given a formal CRN, implementation CRN, and no additional information, is there an interpretation which is a bisimulation?
For complexity purposes, we define the size of a CRN (S, R) as This corresponds roughly (up to polynomial factors) to writing the number of species in unary (to cover edge cases involving species not appearing in any reaction), then writing each reaction in the usual chemical notation (e.g. 5 X + 3Y → Z + 2 X ).Similarly, for an interpretation m : S → N S we define |m| = x∈S X∈S log(m(x)(X) + 1) , which corresponds roughly to writing out each interpretation e.g.m(x 3 ) = 3 A + B. We find that the complexity of our algorithms is best expressed in terms of three parameters: the size n = |S| + |R| + |S | + |R |, the total number of species and reactions in the two CRNs; the arity k = max R→P ∈R X∈S R( X), the maximum number of reactants in any formal reaction; and the max stoichiometry s and k ≤ s|S|, so any algorithm whose time or space complexity is (for example) polynomial in some combination of n, log s, log k, and |m| is (for example) polynomial in |(S, R)| + |(S , R )| + |m|.We find that some algorithms are less complex when k is bounded by a constant, such as k = 2 limiting the formal CRN to bimolecular reactions, and that the possibility that s is not bounded by a constant (in particular, when k is ω(n)) affects the technical details of the proofs but not the result.

Checking an interpretation
First we consider the problem of, given an interpretation, checking whether it is a bisimulation.We use the three conditions on an interpretation, having proved in Theorem 3.1 that they are equivalent to bisimulation and trajectory equivalence.Given two CRNs and an interpretation between them, the atomic and delimiting conditions are trivial to check.This leaves only the permissive condition.
Checking the permissive condition means, for each formal reaction r = (R, P ) and implementation state S with m(S ) ≥ R, S can reach via trivial reactions some state from which a reaction that is interpreted as r can happen.Although there are infinitely many such implementation states, we can find a finite set that is sufficient for checking the permissive condition.Fig. 8 shows an example of such a set for the reaction A + B → C in a hypothetical implementation CRN.Definition 4.1.Given a formal reaction r = (R, P ) ∈ R, we say that an implementation state S is a minimal implementation state for r if m(S ) ≥ R and there is no S 0 with S 0 < S and m(S 0 ) ≥ R. (Equivalently, S is a minimal element-in the usual sense for partially ordered sets-of the set {S | m(S ) ≥ R}, so we often say S is "minimal for m(S ) ≥ R".)We use the notation M(r) for the set of minimal implementation states for a formal reaction r.The number of minimal implementation states for r is at most n k , and all such states can be enumerated in time poly(n k ) and space poly(n, k).When k ≥ n, in particular the number of minimal implementation states for r is at most 2 n log k , and they can be enumerated in time 2 poly(n,log k) and space poly(n, log k).
Proof.We describe an algorithm to enumerate all implementation states S minimal for m(S ) ≥ R given R, then show that it has the desired complexity and correctly enumerates all minimal states.
If R = ∅, then the only minimal implementation state for m(S ) ≥ R is S = ∅.If not, choose an arbitrary formal species A ∈ R. For each implementation species x with A ∈ m(x): Construct a multiset of formal species Q x = R\m(x) by removing m(x) from R, ignoring any species in m(x) but not in R. Apply this algorithm recursively to enumerate all implementation states S x minimal for m(S x ) ≥ Q x .For each such S x , the state S = S x + {|x|} has m(S ) ≥ R and may be minimal.Check if it is minimal by for each y ∈ S checking if m(S − {| y|}) ≥ R. If none of them are, then S is minimal; print it and continue enumerating.
We now prove that the algorithm enumerates exactly all S minimal for m(S ) ≥ R. Since we check whether S is minimal before printing it, the algorithm clearly does not enumerate any S with m(S ) ≥ R that is not minimal.(Without this check, the algorithm could generate a non-minimal implementation state.For example, let R = {| A, B|} where there are implementation species x 1 with m(x 1 ) = {| A|} and m(x 2 ) = {| A, B|}.The algorithm could first choose A from R and x 1 with A ∈ m(x 1 ), generating Q x 1 = {|B|}, then when called recursively choose B from Q x 1 and x 2 with B ∈ m(x 2 ), thus generating the state S = {|x 1 , x 2 |} which is not minimal, since S > {|x 2 |} and m({|x 2 |}) = {| A, B|} ≥ R.) That every enumerated S has m(S ) ≥ R can be proven by induction on |R|.If |R| = 0, i.e.R = ∅, then m(∅) = ∅ is trivially true.If not, then for each x with A ∈ m(x) iterated through, |Q x | < |R| so by assumption each generated S x has m(S x ) ≥ Q x .Then the S generated from that Finally, to prove that every minimal state is enumerated, we again use induction on |R|, with the case |R| = 0 having only one minimal state, S = ∅, which is generated.When |R| > 0, consider an arbitrary state S minimal for m(S ) ≥ R.Where A ∈ R is the first formal species chosen by the algorithm, there is at least one x ∈ S with A ∈ m(x), and the algorithm at some point iterates through that x.Consider Q x = R\m(x) as generated in the algorithm, and S x = S − {|x|}.If we can show that S x is minimal for m(S x ) ≥ Q x , then by assumption the recursive call to the algorithm generates S x , thus the algorithm generates S = S x + {|x|}.If S x is not minimal for Q x , then there is some S was not minimal for R, creating a contradiction.Thus the algorithm generates every minimal S , completing the proof of correctness.
Finally, we prove the algorithm takes time poly(n k ) and space poly(n, k).Since the algorithm adds one implementation species at each recursion depth and subtracts at least one species from R at each depth, the depth is at most |R| = k.Iterating through at most |S | = n implementation species at each depth proves a bound of n k on the number of minimal implementation states and the poly(n k ) time bound.At any time the algorithm stores one implementation species plus poly(n, k) information for the Q x and S x 's for each recursion depth, proving the space bound.
If k ≥ n then instead of removing one copy of one implementation species x at each recursive step, we choose one implementation species x and a number α, set Q αx = R\αm(x), and mark that x cannot be chosen again in the recursive call.To keep only minimal states, we bound α such that (α − 1)m(x) ∧ R < αm(x) ∧ R (i.e. the αth copy of x is not redundant); since we only choose x with |m(x)| ≥ 1 and |R| = k this implies α ≤ k.This algorithm has a depth of at most n, making a choice out of k possibilities at each step and keeping track of at most n numbers each bounded by k, which proves the bounds of time 2 poly(n,log k) and space poly(n, log k) given when k ≥ n. (Note that k ≥ n ≥ 2 ⇒ n log k ≤ k log n, so the bounds given for k ≥ n are tighter than the bounds given for the general case.) 2 Consider applying the algorithm described in Lemma 4.2 to the example implementation CRN in Fig. 8.Here R = {| A + B|}, and assume A is chosen first.The algorithm then splits into different branches based on the choice of implementation species that contains A, where each branch may enumerate one or more minimal states; we discuss the branches separately.One branch of the algorithm will choose j A B to contain A, and stop, since m( j A B ) ≥ R, enumerating one minimal state.Other branches will choose each of i A , x A , and i AC .Each of those branches will recursively call the algorithm with Within each of those three branches, the recursive call will again split into branches, with one branch that considers each of i B , x B , j B , and j A B to contain B. The branches that consider j A B will conclude that the resulting minimal states e.g.{|x A + j A B |} are not minimal, but the other branches will conclude that the resulting states are minimal, enumerating the other 9 minimal states shown in Fig. 8. Now we have reduced an infinite number of possible initial implementation states to a finite number of minimal implementation states for each formal reaction.We have to check, for each S 0 ∈ M(r), whether S 0 r = ⇒; equivalently, whether S 0 τ = ⇒ T where T ≥ R for some implementation reaction with m(R → P ) = r.Checking this for one S 0 is the "superset reachability" or "covering" problem, which was proven by Rackoff [34] and Lipton [35] to be EXPSPACE-complete.Surprisingly, we found that the requirement that every S 0 r = ⇒ makes the permissive condition checkable in PSPACE, by ruling out complex constructions such as Lipton's proof of hardness in [35].
Intuitively, we will show that if some S 0 r = ⇒ but requires the full complexity of exponential space to determine that, then that complexity will force some other S 1 with m(S 1 ) ≥ R to have S 1 r = ⇒.Fig. 9 provides a simplified example of the principle we use.Here we have two minimal states for the formal reaction r = A + B → C .One of them, S 0 = {|x A , x B |} can implement r via the reactions x B → y B + z and x A + y B + z → x C .In doing so, it passes through a non-minimal state {|x A , y B , z|}, and requires the extra species z to finish implementing r.However, that the extra z is required to implement r means that the other minimal state S 1 = {|x A , y B |} has no way to implement r; so the permissive condition is false, and we don't need to check whether S 0 r = ⇒.This idea turns out to be generalizable, and allows us to mostly ignore "null species" with m(x) = ∅, which among other things prevents the complexity necessary for Lipton's proof in [35].In particular, we will show that when checking for a pathway by which S r = ⇒ we need only consider the minimal states M(r) plus a small amount of additional information, and that this can be done in poly(n k   We draw an arrow from a state S 1 to a state S 2 if there is a trivial reaction that can occur in S 1 (plus some null species) and the resulting state is ≥ S 2 .Arrows "out" (with no target) represent implementation reactions interpreted as the formal reaction in question and that can occur in the minimal state in question plus some null species.A. An example graph for the reaction A + B → C in an implementation CRN without null species, producing the same set of minimal states shown in Fig. 8.Here the permissive condition is true for A + B → C if and only if every minimal state has a path to some arrow out, which we can see is true for this graph.Note that the reaction possible in any minimal state and the feature our algorithms exploit is that we do not need to remember information beyond the minimal state and the null species, as far as we are concerned the reverse reaction is impossible.B. An example graph for the reaction A + B → C in an implementation CRN with one null species z.Here arrows for reactions that consume and/or produce null species are marked with the number of null species they consume and/or the number they produce.Arrows for reactions that consume null species are grayed out, since they cannot occur in the minimal state in question, but may be relevant if the required null species are produced.Checking the permissive condition for A + B → C in this CRN may require more complex techniques.
For a simple case, consider an implementation CRN with no species x where m(x) = ∅, such as the one shown in Fig. 10A, and consider its graph of minimal states for a formal reaction r = R → P .If, for every minimal state, there is a path through the graph of minimal states to a reaction that implements r, then the permissive condition is true.In fact, the permissive condition is true if and only if such a path exists for every minimal state.If S 0 is minimal and S 0 τ = ⇒ S which is not minimal, then there is some minimal S 1 with S > S 1 , which without null species must have m(S 1 ) < m(S ) = m(S 0 ).Here either = ⇒ is valid.Effectively, for the purpose of checking the permissive condition, we can pretend S is S 1 , thus reducing our search for trajectories to a search for paths through the set of minimal states.Where k is the number of reactants in r and n the number of implementation species, we know from Lemma 4.2 that the number of minimal states is at most n k when k ≤ n and at most 2 n log k when k ≥ n, both of which are exponential in the size of the CRN as defined at the beginning of this section.Because searching for paths through a graph can be done in space logarithmic in the size of the graph [36], we can check the permissive condition in polynomial space when there are no null species.To generalize this, we show that null species and loops do not make this bound worse.
Now consider an implementation CRN with null species, such as the one shown in Fig. 10B, and its graph of minimal states for a formal reaction r = R → P .We can try to apply the same logic as in the case without null species: if a minimal state S 0 τ = ⇒ S non-minimal with a minimal state S 1 < S , either S 1 r = ⇒ and we can pretend S is S 1 , or S 1 r = ⇒ and the permissive condition is false anyway.Without null species this was valid because S > S 1 implies m(S ) > m(S 1 ), and thus S 1 cannot reach S 0 via trivial reactions.With null species, on the other hand, it may be that S − S 1 contains only null species and it may be that S 1 τ = ⇒ S 0 ; in particular it may be that both S 0 r = ⇒ and S 1 r = ⇒, but the only path by which S 1 r = ⇒ goes through S 0 and the only path by which S 0 r = ⇒ goes through S , creating a loop that will not be found when searching through the graph of minimal states.In fact, this is exactly the case in the CRN shown in Fig. 10B In the CRN in Fig. 10B, the path by which {|x minimal state can reach a state equal to itself plus some null species.In such a case, that loop can be repeated any number of times to produce any number of z, and when searching for a path, there is no need to keep track of the exact number of z produced: either there are no z, or there are "enough" z produced by a previous loop.Recall the argument we tried to use that failed: if S 0 τ = ⇒ S 1 + Y , with S 0 and S 1 minimal, then either S 1 r = ⇒ and so does S 0 , or S 1 r = ⇒ and the permissive condition is false.Recall that this argument only failed because it may be that S 1 r = ⇒ but only by passing through S 0 , in a situation similar to the loop in Fig. 10B.This suggests, which we will show is true, that this example is general: if S 0 τ − → S 1 + Y for S 0 , S 1 ∈ M(r), then for each y ∈ Y , either y can be completely ignored when checking the permissive condition, or else the following all hold: m( y) = ∅, there is some S j ∈ M(r) such that S j τ = ⇒ S j + y + . . ., and y is only "relevant" after it has been produced in some such loop.
The above discussion allows us to define a graph, which can be both enumerated and searched through in polynomial space, such that paths through the graph correspond to paths by which a given minimal state implements a formal reaction r.The states of this loopsearch graph are triples of the form (S , S 0 , ζ ) where ζ maps each null species in Z to 0, 1, or ∞: in each state we are at or covering one minimal implementation state S ∈ M(r), in the middle of a loop beginning at some other state S 0 ∈ M(r), and each null species in Z is either absent, produced previously in the current incomplete loop, or present in infinite copies from a previous loop.An example of such a graph is given in Fig. 11.
We use the following notation in defining and discussing the loopsearch graph, in the context of a given formal and implementation CRN with interpretation and a specific formal reaction r.
is the set of null species that have been produced in previous loops, and are thus "available" for use later.We write S 0 ζ − → S 1 + Z if there is a trivial reaction that, for some n ∈ N, can occur in S 0 + nζ −1 (∞), where the resulting state is some S containing S 1 and at least one copy of each null species in Z .Z may be empty, in which case S 0 ζ − → S 1 is the same as S 0 ζ − → S 1 + ∅.Following the terminology of [34], to "cover" a state S in a CRN is to be in a state containing S plus possibly some other species.Definition 4.2.Given (S, R) and (S , R ) a formal and implementation CRN, m : S → N S an interpretation, and r a formal reaction, we define the loopsearch graph for r.The loopsearch graph is a directed graph with vertex set M(r) × M(r) × 3 Z , where Z = {z ∈ S | m(z) = ∅}, with some vertices designated as "terminal".Here a vertex (S , S 0 , ζ ) is interpreted as, "we are covering state S , in the middle of a loop starting and ending at S 0 , with null species present or absent as determined by ζ ", except that S = S 0 is interpreted as "not in the middle of a loop".ζ ∈ 3 Z maps each null species z to {0, 1, ∞}, a coarse-grained representation of the number of copies of z: we only need to remember whether z is not present (ζ(z) = 0), produced during the current loop (1), or produced as many times as necessary in a previous loop (∞).This interpretation suggests the definition of the edges of the loopsearch graph, which is as follows: • Reactions outside a loop: • Finishing a loop: − →, that is, if there is some implementation reaction r with m(r ) = r that can occur in S plus sufficiently many copies of null species z with ζ(z) = ∞.
Some comments on the definition may help give an intuitive understanding of the loopsearch graph.First, note that ζ is monotonic in this graph: for any given z ∈ Z, ζ(z) can change from 0 to 1, from 1 to ∞, or from 0 to ∞, but never decrease.
A null species z can be produced inside a loop, but the paths we are searching for cannot use z inside the loop where it was first produced; and once that loop ends, z is present in "infinite" copies and will always be so.Second, the loopsearch graph has a repeating substructure that mirrors the structure of the graph of minimal states; compare Fig. 11  edges from "Reactions inside a loop" and "Finishing a loop" in Definition 4.2, have a structure very close to the graph of minimal states, differing occasionally when the edge changes ζ .Finally, many of the vertices in the loopsearch graph are unreachable from any "initial vertex" (i.e., vertex of the form (S , S , 0)); usually such unreachable vertices, according to the meaning we give them, would contain some sort of contradiction.For example, every vertex of the form (S , S , ζ ) with ζ −1 (1) = ∅ will be unreachable in any loopsearch graph; in such a vertex, the form (S , S , ζ ) means we are at state S and not in the middle of a loop, but ζ(z) = 1 means z has been produced in the current, nonexistent loop.Other vertices are unreachable due to less obvious contradictions; in the example in Fig. 11, where i ∈ {1, 2, 3} and j ∈ {4, 5}, vertices of the form (S i , S j , ζ ) are unreachable, because we would be at S i in a loop starting at S j , but such states S i are unreachable from states S j ; similarly, vertices of the form (S j , S i , ζ ) are unreachable for ζ(z) = ∞.Unreachable vertices and edges are shown in gray in Fig. 11.
Because the edges in the loopsearch graph come from trivial reactions possible at the corresponding states, any path through the loopsearch graph implies the existence of a class of trajectories in the implementation CRN.A segment from (S 0 , S 0 , ζ ) to (S 1 , S 1 , ζ ) traveling only through vertices of the form (S , S , ζ ) with no change in ζ implies the corresponding sequence of trivial reactions can occur starting from S 0 plus some null species, including "sufficiently many" copies of ζ −1 (∞), and ending in a state that covers S 1 .A segment from (S 0 , S 0 , ζ ) to (S 0 , S 0 , ζ ) traveling only through vertices of the form (S , S 0 , ζ ) implies the existence of a "loop" in the implementation CRN of the form S 0 + Z 0 τ = ⇒ S 0 + Z 1 , where Z 0 includes "sufficiently many" copies of ζ −1 (∞), and Z 1 includes at least all null species in (ζ Given any number of times for this loop to happen, it can happen that many times starting with sufficiently many copies of ζ −1 (∞), producing as many copies of (ζ ) −1 (∞) as desired.This logic lets us compose paths made of these segments into trajectories possible in the implementation CRN, by taking each loop as many times as necessary to produce all species necessary for all future segments.In particular, a path from (S 0 , S 0 , 0), where 0 ∈ 3 Z is the function that maps all null species to 0, to a terminal state of the form (S , S , ζ ) implies the existence of a trajectory in the implementation CRN by which S 0 r = ⇒.Thus, if such paths can be found for every minimal S 0 , we know that the permissive condition is satisfied.
What we will show is that, if the permissive condition is satisfied, then it is satisfied by trajectories corresponding to paths through the loopsearch graph.Lemma 4.3.Let (S, R) and (S , R ) be a formal and implementation CRN, with interpretation m.Let r = (R, P ) ∈ R be a formal reaction and S 0 an implementation state minimal for m(S 0 ) ≥ R. If the permissive condition is satisfied, then there exists a path through the loopsearch graph described in Definition 4.2 from (S 0 , S 0 , 0) to some terminal state, where 0(z) = 0 for all null species z.
Conversely, if such paths exist for every formal reaction and minimal implementation state, then the permissive condition is satisfied.
Proof.Given r = R → P ∈ R and S 0 ∈ M(r), assuming the permissive condition is true, we find a path through the loopsearch graph for r from (S 0 , S 0 , 0) to a terminal state.In particular, we show that if the permissive condition is true, then from any (S , from which this process can be repeated.Since ζ −1 (∞) ⊂ Z which is finite, this process must find a terminal state in finitely many steps, namely at most |Z|.
and note that for each S ∈ M(r) and μ ≥ 0, we know that m(S + μZ) ≥ R. By the permissive condition, there is a trajectory in the implementation CRN by which S + μZ r = ⇒; for each S , choose a shortest such path.Construct a new trajectory by starting at S 1 + μZ , where μ is high enough for this trajectory to be valid, and at each step where we are ≥ some S + μ Z , take the first reaction on the chosen shortest path for S .Continue until the trajectory either takes an implementation reaction r with m(r ) = r, or covers the same minimal state S 2 twice.
If the trajectory takes an implementation reaction r with m(r ) = r, then by the construction of the loopsearch graph, for each reaction in the trajectory from a state ≥ S 1 to a state ≥ S 2 , there is an edge from (S 1 , S 1 , ζ ) to (S 2 , S 2 , ζ ).Where S 2 is the minimal state such that r was taken in a state ≥ S 2 , since r was possible that means (S 2 , S 2 , ζ ) is a terminal state.Thus these edges give a path from (S 1 , S 1 , ζ ) to (S 2 , S 2 , ζ ) which is terminal, which is the desired path.
If on the other hand the trajectory covers the same minimal state S 2 twice, then there must be at least one null species z / ∈ Z produced by a reaction between the first and second times S 2 is covered; otherwise such a path would be a futile loop, implying that for at least one S covered in that time there is a μ which gives a shorter path by which S + μZ r = ⇒.Then again by the construction of the loopsearch graph, for each reaction in the trajectory from a state ≥ S 1 to a state ≥ S 2 before the first state ≥ S 2 , there is an edge from (S If such a path through the loopsearch graph from (S 0 , S 0 , 0) exists for a given formal reaction r and minimal implementation state S 0 , then we show S 0 r = ⇒.We gave this argument informally above.Where states are of the form (S i , S j , ζ ), observe from Definition 4.2 that the only edges that changes ζ leave S j unchanged (i.e. are in a loop), and the only edges that change S j are edges from (S 1 , S 1 , ζ ) to (S 2 , S 2 , ζ ) leaving ζ unchanged with ζ −1 (1) = ∅ (i.e. are outside a loop).From that, given a path from (S 0 , S 0 , 0) to some terminal state (S f , S f , ζ f ), we can divide the path into segments as follows: in states of the form (S i , S j , ζ ), segments will alternate between segments where all states have S i = S j and ζ is unchanged ("paths"), followed by segments where S j is unchanged ("loops").Where (S i , S i , ζ i ) is the state at the end of the ith loop and Z i = ζ −1 i (∞), we show by induction on i that for any μ ≥ 0, S 0 τ = ⇒ S i + μZ i .The base case i = 0 has 0 −1 (∞) = ∅, so S 0 τ = ⇒ S 0 + μ∅ is trivially true.Assuming that S 0 τ = ⇒ S i + μZ i for any μ, consider the sequence of trivial reactions corresponding to the edges on the path from (S i , S i , ζ i ) to (S i+1 , S i+1 , ζ i ) and the loop from there to (S i+1 , S i+1 , ζ i+1 ).Those trivial reactions are, by Definition 4.2, possible using only null species in Z i ; let μ be the largest number of a single null species in Z i used after adding up all reactants of all reactions along the path and loop (ignoring any products of the reactions).Given arbitrary μ ≥ 0, let μ = (μ + 1)(μ + 1), and by assumption, S 0 τ = ⇒ S i + μ Z i .Then by following the trivial reactions along the path from S i to S i+1 , we use up at most μ Z i , so Then by following the trivial reactions along the loop μ times, each loop uses at most μ Z i and produces at least Z i+1 \Z i , so With this preparation, we can now describe algorithms to check the permissive condition.Having shown that the permissive condition is true if and only if certain paths through the loopsearch graph exist, our algorithms will be based on searching for those paths.In general, if a formal reaction r = R → P has k = |R| reactants and the implementation CRN has n = |S | species, there may be order n k minimal implementation states for r and the trajectories by which any one implements r may have to pass through most or all of them.As that suggests, we will later prove that checking the permissive condition (and thus checking an interpretation in general) is PSPACE-complete.So the first algorithm we present is the loopsearch algorithm, which runs in poly(n, k) space and poly(n kn ) time, which is Algorithm 1.
The size (number of vertices) of the loopsearch graph is |M(r)| 2 3 |Z| , at worst exponential in the size of the CRNs, and we have reduced the permissive condition to a question of whether paths between certain pairs of vertices exist in that graph.Savitch's theorem states that we can decide whether such paths exist through a graph of size N in log 2 N space [36], which given the results so far completes the proof that the permissive condition can be decided in polynomial space; the loopsearch algorithm is just a concrete application of Savitch's result to the loopsearch graph.Specifically, the loopsearch algorithm breaks a path from (S 0 , S 0 , 0) into alternating segments of two types: one type from Proof.We show that the loopsearch algorithm is correct and runs in polynomial space.We proved in Lemma 4.3 that the permissive condition is true if and only if a loop-segmented path exists for every formal reaction r and every minimal implementation state S 0 for r, so we need only to show that the loopsearch algorithm finds a loop-segmented path if and only if one exists.Each loop-segmented path implicitly specifies a sequence of l minimal states S i , and a sequence of sets of null species Y i .By removing from Y i all null species in Y j for j < i and defining Y 0 = Z − l i=1 Y i , we get the partition that the loopsearch algorithm searches for while preserving the loop-segmented path.Since the loops and paths in the desired loop-segmented path never repeat a minimal state, they must each have length at most N the number of minimal states, and a path of length 2 j from S a to S b exists if and only if for some S c a path of length 2 j−1 from S a to S c and a path of from S c to S b both exist.The desired loop-segmented path has each loop and path as a path between minimal states with certain null species ignored and the algorithm matches this restriction, so the loopsearch algorithm is correct.
At any point in the loopsearch algorithm, it is storing the following information: a formal reaction r, a minimal state S 0 , a partition of l +1 sets of null species Y i (thus implying that l ≤ z ≤ n), a sequence of l minimal states S i , and at most log N triples of minimal states S a , S b , S c in the recursive search algorithm.The at most n k minimal states can be enumerated in polynomial space (i.e.without storing any states other than the current and next one) as shown in Lemma 4.2, and similarly partitions of z ≤ n elements can be enumerated in poly(z) space.Also according to Lemma 4.2, the number of minimal states is N ≤ n k , so the depth of the search log N is poly(n, k).(When k ≥ n, N ≤ 2 n log k so the depth is at most poly(n, log k).)To complete the proof, note that as discussed earlier, checking the atomic and delimiting conditions are both trivial given an interpretation, thus whether an interpretation is a bisimulation can be checked in polynomial space. 2 We have repeatedly said that the difficulty of checking the permissive condition scales with the number of minimal states for any given formal reaction r = R → P , which typically scales like (and scales no worse than) n k where n = |S | and k = |R|.We stated earlier, and will show later, that when k is unbounded, checking an interpretation is PSPACE-complete.
However, many CRNs in practice have large numbers of species but small numbers of reactants per (formal) reaction; in particular, almost any interesting behavior-if not any interesting behavior-that can be done with a CRN can be done with a CRN with a bound of k ≤ 2. For those CRNs, we present a graphsearch algorithm which takes poly(n k ) space and time, making it much faster than the loopsearch algorithm when k is small but taking much more space when k is large, as Algorithm 2.
For each formal reaction r, the graphsearch algorithm enumerates and creates a table of all implementation states S i minimal for r.The algorithm uses this table to store "known information" about which states are reachable from S i and iteratively updates this information, continuing until either every S i r = ⇒ is known or until no further information can be known, in which case some S i r = ⇒.For each S i , the algorithm stores whether or not it is known (yet) that S i r = ⇒.If it is not yet known that S i r = ⇒, then the algorithm stores, for each minimal state S i , whether it is known that S i τ = ⇒ S j , and for each y ∈ S with m( y) = ∅, whether it is known that S i τ = ⇒ S i + y.Initially, the only thing known is that S i τ = ⇒ S i for each S i .
Given this table, the algorithm goes through repeated "cycles" of updating the known reachabilities until one cycle passes with no changes.In each cycle, the algorithm iterates through each minimal S i .For each S i , if S i r = ⇒ is known, the algorithm Algorithm 2: The graphsearch algorithm to check the permissive condition in time and space polynomial in the number of minimal states.

def graphsearch(CRN formal, CRN impl, interpretation m):
for each reaction r in formal: Min = minimal_states(impl, m, r) # states or r reachable from S' table reach(S') = <empty> for each state S' in Min # null species producible in a loop at S' table prod(S') = <empty> for each state S' in Min repeat until reach and prod are both unchanged: for each S'1 where reach(S'1) is not r: if S'1 + inf prod(S'1) can do r' with m(r') = r: set reach(S'1) to r, continue to next S'1 for each trivial reaction r' by which S'1 + inf prod(S'1) -> S'2: if reach(S'2) is r: set reach(S'1) to r, continue to next S'1 reach(S'1) += reach(S'2) if S'1 in reach(S'2): prod(S'1) += prod(S'2) prod(S'1) += {y in products(r') where m(y) = empty} if not all reach(S') is r: # after table no longer changes return False return True # if all reactions pass without returning False skips S i .If not, where Y i is the set of all y such that S i τ = ⇒ S i + y is known, the algorithm checks for each implementation reaction r with m(r ) = r whether it can occur in S i given arbitrarily many copies of Y i .If S i + ∞Y i r − → for one of those r , then the algorithm records that S i r = ⇒ and will skip S i in the future.Otherwise, the algorithm iterates through all trivial reactions in the implementation CRN and check whether they can occur in S i + ∞Y i .For each reaction that can occur, the algorithm finds the (possibly multiple) minimal state(s) S j that are covered by the state after that reaction, and updates the table based on what is known about S j : The algorithm terminates when a full cycle passes with no change to the table.At that time, if S i r = ⇒ is known for every minimal S i , the algorithm states that the permissive condition is true; otherwise, the algorithm states that the permissive condition is false.

Theorem 4.2. When the number of reactants in a formal reaction k is constant, whether an interpretation is a bisimulation can be checked in polynomial time.
Proof.We prove that i) if the graphsearch algorithm returns true, then the permissive condition is true; ii) if the permissive condition is true, then the graphsearch algorithm will return true; and iii) the graphsearch algorithm always terminates in poly(n k ) time.
To prove the first, the graphsearch algorithm is based on deductions from "known" information.The initially known information is that each S i τ = ⇒ S i , which is trivially true (since τ = ⇒ includes the sequence of 0 reactions).There are five deduction rules: the rule that if S i τ = ⇒ S i + Y i and S i + ∞Y i r − → with m(r ) = r then S i r = ⇒, and the four listed rules for when S i + ∞Y i τ − → S j .Each of the rules is a valid deduction, so any information deduced by the algorithm will be true.In particular, the algorithm says the permissive condition is true only when every minimal S i r = ⇒, which by Lemma 4.1 implies that the permissive condition is in fact true.
To prove the second, we show that if the permissive condition is true but at a given cycle evaluating the formal reaction r the graphsearch algorithm does not yet know that every minimal S i r = ⇒, then there is at least one additional fact that it can learn this cycle.The proof is similar, but not identical, to the proof of Lemma 4.3.At any given cycle, for each minimal state S i let Y i be the set of all null species such that S i τ = ⇒ S i + Y i is known.If the permissive condition is true, then the permissive condition is true in a modified CRN which for each S i adds the (trivial under the given interpretation) reaction For each minimal S i , there is a path in the modified CRN by which S i r = ⇒ which is "shortest" in the sense of having the fewest reactions that are not any of the added reactions S i → S i + Y i ; consider for each of those shortest paths the first reaction that is not one of the added reactions.Note that each of those reactions is a reaction, either trivial or implementing r, that the graphsearch algorithm will detect as possible and consider.
For each S i , construct a trajectory which starts at S i , takes the reaction S i → S i + Y i as many times as necessary to take the first non-added reaction on the shortest path by which S i r = ⇒, and takes that reaction to reach some S j ; takes S j → S j + Y j as many times as possible to take the first non-added reaction on the shortest path by which S j r = ⇒, and takes that reaction; then continues this process.Each such path must eventually, in a number of reactions less than the number of minimal states N ≤ n k , either implement r or repeat a minimal state.If any path eventually implements r, and for any minimal state S k on that path it is not known that S k r = ⇒, then the last such S k has the reaction S k τ − → S l available and S l r = ⇒ known, so on this cycle the algorithm will deduce that S k r = ⇒.
The last possibility is that at least one path much eventually repeat a minimal state; if all paths implement r and all states on such paths are known to implement r, then all minimal states are known to implement r.For any such loop, that loop will be the entire trajectory for all states on that loop.If some state in that loop is not known to reach some other state in that loop, then at least one such fact will be deduced this cycle: there will be some S i where a reaction S i τ − → S j is possible and S j τ = ⇒ S k is known but S i τ = ⇒ S k is not known; that fact will be deduced this cycle.If not, then there must be some y with m( y) = ∅ that is produced along that loop and some S i in that loop for which S i τ = ⇒ S i + y is not known; otherwise one of the reactions in the loop is not the first reaction along the shortest path by which the appropriate S j r = ⇒.
If that S i is one such that S i τ − → S j + y, then S j τ = ⇒ S i is known (by assumption that all such facts in this loop are known), and S i τ = ⇒ S i + y will be deduced this cycle.Otherwise, there will be an S i such that S i τ − → S j is possible, S j τ = ⇒ S j + y is known but S i τ = ⇒ S i + y is not known, and that fact will be deduced this cycle.This covers all the cases, and proves that if the permissive condition is true but not yet proven, then at least one fact will be deduced each cycle until the permissive condition is proven.
To complete the proof, we note that the number of facts is bounded above by (z + 1)n k + n 2k , where z ≤ n is the number of null species, and thus is poly(n k ).Thus, if the permissive condition is true, one of a finite number of facts will be learned each cycle until the permissive condition is proven.Furthermore, the algorithm will terminate one way or another in at most poly(n k ) cycles, thus poly(n k ) time. 2 Although polynomial space is inefficient, in the general case we cannot do better.Two results in particular suggest a connection between CRNs and space-bounded Turing machines, the acceptance problem of which is known to be PSPACEcomplete [37]; we use this connection to prove that verifying CRN bisimulation is PSPACE-complete.Jones et al. gave a construction to, given a space-bounded Turing machine with m states and tape size n, construct a Petri net (equivalently, a CRN) that directly simulates it, with poly(n, m) species and reactions [38].Thachuk and Condon extended this connection to reversible CRNs, constructing a CRN that solves the known PSPACE-complete problem QSAT, proving a number of questions about CRNs and DNA strand displacement systems to be PSPACE-complete [39].In the case of CRN bisimulation, if we have (on the order of) n k minimal states, it is possible to embed a PSPACE-complete computation in the trivial reactions between those n k states.Given any space-bounded Turing machine and input, we construct a formal and implementation CRN with interpretation, where the implementation CRN contains the construction of Jones et al. in the trivial reactions, plus some additional reactions specific to our case.Our construction is illustrated in Fig. 12.There is one formal reaction that can only occur in an implementation state corresponding to the accept state of the Turing machine; thus, the state corresponding to the start state can implement that reaction if and only if the Turing machine does in fact accept.The additional reactions ensure that every minimal implementation state can implement the formal reaction if the "start state" can, making the interpretation a CRN bisimulation if and only if the space-bounded Turing machine accepts.

Theorem 4.3. Verifying CRN bisimulation in the general case is PSPACE-complete.
Proof.We are given a Turing machine with m states with tape alphabet {0, 1}, and an input x of length n.We assume the states are numbered such that q 0 is the start state and q m−1 is the halt state.We assume the Turing machine always halts while never using more space than the length of its input, and when it halts, it does so in state q m−1 reading the first square of its tape, with all squares but the first reading 0, and the first square reading 1 to indicate an accepting state and 0 to indicate rejecting.Given that, our formal CRN has n + 2 species and 1 reaction, We construct an implementation CRN with species 0 i and 1 i for each spot on the tape 1 ≤ i ≤ n, q j i for each tape spot 1 ≤ i ≤ n and state of the Turing machine 0 ≤ j ≤ m − 1, additional species for a "reset" state q m i for 1 ≤ i ≤ n, and a halting species h.The implementation CRN contains reactions to simulate the Turing machine, reactions to reset the Turing machine to the start state q 0 on input x, and reactions to check whether a halting state is accepting or rejecting that can implement the formal reaction if and only if it is an accepting state.12.An implementation CRN with a correct interpretation if and only if the corresponding space-bounded Turing machine accepts.The formal CRN has one species Q corresponding to the Turing machine head and one species A i for each ith tape square; if all are present, they can react.The implementation CRN simulates the space-bounded Turing machine, with Turing machine head state species q j i all interpreted as Q and tape square species 0 i and 1 i both interpreted as A i .All reactions involved in the simulation are thus trivial.If the simulation accepts, the formal reaction can be implemented.At any time the implementation CRN can use q 6  i to "reset" to the start state on the given input, thus being able to "correctly" simulate the computation from an arbitrary initial implementation state.Thus this interpretation satisfies the permissive condition if and only if the Turing machine accepts.
To simulate the Turing machine, for each transition of the form, in state j reading symbol σ ∈ {0, 1}, write symbol σ ∈ {0, 1}, transition to state j , and move (right, left) we have n reactions of the form where the product is q j i+1 if the move is right and q j i−1 if the move is left.(If the move is right, the reaction for q j n is instead q j n + σ n → q j n + σ n , and similarly if the move is left the reaction for q j 1 is instead To reset, we have a reaction q j i → q m n for every q j i including j = m and q m−1 1 but not including any q m−1 i for i ≥ 2. We then have reactions where x i represents the species 0 i if the ith character of the string x is 0 and 1 i if the ith character is 1.
To check whether a halting state is an accepting state, we have a reaction q m−1 for 2 ≤ i ≤ n − 1, and a reaction q m−1 n + 0 n → h.Note that by assumption q m−1 is the halting state of the given Turing machine and thus has no transitions; so other than these reactions the only implementation reaction with any q m−1 i as a reactant is the reaction q m−1 1 → q m n .We want to check the validity of the interpretation m where m(q and m(h) = H .However, we will show in Theorem 4.5 that for any CRN constructed this way based on a Turing machine, the only possible valid interpretations are this one up to a permutation of formal species, and either this interpretation is valid or no valid interpretation exists.
We use the three conditions formulation of validity.The atomic condition is always satisfied by m(q 0 1 ) = Q , m(0 i ) = A i , and m(h) = H , as well as many other ways.The delimiting condition is satisfied since under this interpretation, every reaction mentioned above is trivial except for the reaction q m−1 n + 0 n → h, which is interpreted as the one formal reaction It only remains to check the permissive condition, and we will show that the permissive condition is true if and only if the given Turing machine accepts the given input x.

The set of minimal states for the one formal reaction r =
exactly the set of states containing either one copy of one q j i for j = m − 1 and one copy of either 0 i or 1 i for each 1 ≤ i ≤ n, or one copy of some q m−1 i and one copy of either 0 k or 1 k for each i ≤ k ≤ n.Appealing to Lemma 4.1, the permissive condition is true if and only if each of those minimal states S has S r = ⇒.We are particularly interested in the state S 0 containing q 0 1 and the species x i for each 1 ≤ i ≤ n, where x i represents either 0 i or 1 i depending on the ith character of x, which is minimal.Any minimal state of the first type with a q j i for j = m − 1 can reach S 0 by the reaction q j i → q m n followed by the reset reactions q m i + σ i → q m i−1 + x i and q m 1 + σ 1 → q 0 1 + x 1 in the appropriate order.Any minimal state of the second type with a q m−1 i can reach S 0 by the reverse reactions q m−1 i+1 → q m−1 i + 0 i and q m−1 2 → q m−1 1 + 1 1 , as appropriate, of the reactions to check the halting state, followed by q m−1 1 → q m n , followed by the rest of the reset reactions as appropriate.Since every minimal state can reach S 0 via trivial reactions, the permissive condition is true if and only if S 0 r = ⇒.
Note that any state of the implementation CRN with exactly one copy of one q j i for j < m − 1 and for each 1 ≤ i ≤ n exactly one of either 0 i or 1 i corresponds to the Turing machine state where the tape contents of square i is whichever of 0 i or 1 i is present, and the Turing machine is in state j reading square i where q j i is the q species present.S 0 is one such state.Note also that in any such state, the only possible reactions are to either faithfully simulate the next transition of the Turing machine, leading to either another such state or a halting state (i.e. a state which would be a simulating state except that the q species present is q m−1 1), or to start a reset.In a "resetting" state, which is a state where any q m i and for each 1 ≤ i ≤ n exactly one of either 0 i or 1 i is present, the only possible reaction is to continue the reset, leading to either another resetting state or eventually to a state that represents a Turing machine state.
If the Turing machine accepts x, then S 0 can faithfully simulate the Turing machine until it halts, which since the input was x, will be the accepting state q m−1 From this state, the reactions to check whether a halting state is an accepting state are possible in order, eventually leading to the reaction q m−1 n + 0 n → h, which is interpreted as r; thus S 0 r = ⇒ and the permissive condition is true.Conversely, if the Turing machine rejects x, then from S 0 the only possible trajectories are those that faithfully simulate the Turing machine with occasional resets.Some of those trajectories may reach a halting state, but since the Turing machine rejects x, that state will be q m−1

and the reaction
will not be possible.None of these trajectories ever reach the reaction q m−1 n + 0 n → h, thus S 0 r = ⇒ and the permissive condition is false. 2

Finding an interpretation
We now consider the problem of, given a formal and implementation CRN, can we find an interpretation that is a bisimulation or correctly assert that none exists?It is natural to consider performing an exhaustive depth-first search through the space of possible interpretations, testing each one to see if it satisfies the atomic, delimiting, and permissive conditions using the algorithms described above-thus either finding an interpretation or asserting that none exists.There are two major stumbling blocks to this approach.First, the space of possible interpretations is infinite, and thus we need some way to guarantee that if a valid interpretation exists, there must be one among a defined finite subset of interpretations that we can search.Second, to be useful in practice, the depth-first search must prune aggressively to eliminate fruitless branches.
The reactionsearch algorithm, presented below, addresses both of these challenges.Rather than directly exploring the space of interpretations, the reactionsearch algorithm organizes the depth-first search according to properties that the interpretation must have, effectively proceeding in five stages.First, as a precondition for the permissive condition, the algorithm ensures that every formal reaction has an implementation reaction that interprets to it; second, to satisfy the delimiting condition, the algorithm ensures that every remaining implementation reaction is interpreted as some formal reaction or is trivial; third, to satisfy the atomic condition, the algorithm ensures that every formal species has some implementation species that interprets to it; fourth, any unassigned implementation species are provided an interpretation that respects the assignment of implementation reactions as formal reactions or trivial reactions; and fifth, the permissive condition is tested on any such completed interpretation that is thus found.We will first describe the algorithm itself, and then discuss the lemmas that guarantee that a valid interpretation will be found if one exists.
Often, implementation CRNs are designed with specific interpretations in mind for some species, so it is reasonable to provide such information as an additional constraint on the search.Further, such a formulation enables a natural recursive definition for the algorithm (Algorithm 3).Thus, the algorithm takes as input a formal CRN (S, R), implementation CRN (S , R ), and a partial interpretation m which, for some (possibly empty) subset of S , specifies each m(x) ∈ N S .The algorithm first constructs a table of, for each formal reaction r ∈ R or τ and for each implementation reaction r ∈ R , whether r can be interpreted as r (in some completion of the partial interpretation m, regardless of what that completion will do to the other reactions).The algorithm then enumerates interpretations by iterating, in an order described below, through all possible assignments of each r to be interpreted as some r or τ , and enumerating completed interpretations which match that.
After constructing the table, if there is some r ∈ R with no r that is interpreted as r (i.e., all species x involved in r have m(x) specified and m(r ) = r), the algorithm chooses such an r with the smallest number of r that can be interpreted as that r. (If there is an r with no r that can be interpreted as r, then there is no completion of the partial interpretation that can satisfy the atomic and permissive conditions, so the algorithm returns that fact.)For each r that can be interpreted as that r, the algorithm enumerates all possible interpretations of each species not yet interpreted by m and involved in r that make m(r ) = r.For each enumerated set of interpretations, the algorithm calls itself recursively to enumerate completions of the partial interpretation m with those new interpretations added.If a valid completion is found, the algorithm returns it; otherwise, the algorithm continues with the next partial interpretation or next r .
If after constructing the table every r ∈ R has some r that is interpreted as r, the algorithm then finds the r that has the fewest r ∈ R such that r can be interpreted as r, and which hasn't yet been used for branching.(Again, if there is an r with no r ∈ R or τ that r can be interpreted as, then no completion of the partial interpretation can satisfy the delimiting condition, and the algorithm returns that.)For each such r ∈ R, the algorithm as above enumerates all possible interpretations of each uninterpreted species involved in r that make m(r ) = r and calls this algorithm recursively for each such partial interpretation.If r can be interpreted as τ , then an additional recursive branch is explored wherein m(r ) = τ is enforced.If all r ∈ R which involve uninterpreted species can be interpreted as τ , then on one branch the algorithm will consider the possibility that all of them are interpreted as τ , which as described above does not involve specifying any interpretations for the remaining uninterpreted species.Since the trivial reaction solver-which as described below will complete the interpretation for the uninterpreted species, if possible-works more efficiently with a partial interpretation which satisfies the atomic condition, the algorithm first ensures that that is the case.For each formal species A for which there is no im-Algorithm 3: The reactionsearch algorithm to complete a partial interpretation or assert that no completion exists, in polynomial space.
def reactionsearch(CRN formal, CRN impl, partial interpretation m): return complete(formal, impl, m, { }) def complete(CRN formal, CRN impl, partial interpretation m, assigned reactions k): table maybe(r, r') = True if m does not rule out m(r') = r ... for reaction r in formal or trivial, reaction r' in impl if any r has no r' or any r' has no r where maybe(r,r'): return False let r in formal where no r' in impl where m(r') = r ... and which minimizes |{r' in impl where maybe(r,r')}| if such an r exists: for each r' where maybe(r,r'): for each assignment of m'(x) where x in r' ... and m(x) undefined such that (m U m')(r') = r: out = complete(formal, impl, m U m', k U {r'}) if out is an interpretation: return out return False # if no r' is found if no such r exists: if (m(r') is known or maybe(trivial, r')) for all r': for each assignment of, for each formal species A with no implementation species x with m(x) = A, one unassigned x to have m'(x) = A: out = solve_diophantine_equations(impl, m U m') if out is an interpretation and permissive_check(out): return out if k includes all implementation reactions: return False let r' in impl where r' is not in k ... and which minimizes |{r in formal where maybe(r,r')}| for each r in formal or trivial where maybe(r,r'): for each assignment of m'(x) where x in r' ... and m(x) undefined such that (m U m')(r') = r: out = complete(formal, impl, m U m', k U {r'}) if out is an interpretation: return out return False # if no r is found plementation species x A where the partial interpretation specifies m(x A ) = A, the algorithm lists all possible x A for which, if m(x A ) = A was added to the partial interpretation, all remaining reactions would still be able to be trivial.The algorithm then iterates over all combinations of choices of such x A for each unimplemented formal species A, and runs the trivial reaction solver for each combination.
To find a completed interpretation in which all remaining r are interpreted as τ , the trivial reaction solver sets up and solves a system of linear equations in variables m(x; A) for each uninterpreted implementation species x and formal species A, where m(x; A) is the count of A in m(x).For each pair of an implementation reaction r and a formal species A the algorithm derives one equation regarding various m(x; A) by setting the sum of counts m(x; A) in the (interpreted or uninterpreted) reactants of r minus that of the products of r to be 0.For example, if the implementation reaction should be interpreted as τ , m(x 1 ) = A + C and m(x 3 ) = B + C are specified while m(x 2 ) and m(x 4 ) are unspecified, then the algorithm will derive the equations 1 and m(x 2 ; D) − m(x 4 ; D) = 0 for each other formal species D. Combining such equations for all remaining implementation reactions that are to be interpreted as trivial, we obtain a system of linear Diophantine equations where the variables specify the interpretations of all remaining uninterpreted species.The trivial reaction solver then runs an algorithm described by Contejean and Devie [40] that will find a minimal solution to a system of linear Diophantine equations, if any solution exists; this solution is then used as the interpretation of each remaining implementation species.(A minimal solution for a system of linear Diophantine equations is one such that no other solution has every variable being less than or equal to the given solution; thus there may be many minimal solutions.)We will prove in Lemma 4.4 that if there is any solution to these equations that satisfies the permissive condition, then the minimal solution returned by this algorithm does.If no solution to the equations exists, then no completed interpretation where all remaining m(r ) = τ is possible, and the algorithm returns that.
Once the reactionsearch algorithm has a completed interpretation, which may be passed in initially, passed in by a recursive call, or found by the trivial reaction solver, it then runs either the loopsearch algorithm or the graphsearch algorithm as described previously, or any other algorithm yet to be discovered, in order to check the permissive condition.If the permissive condition is satisfied, then the given interpretation is valid and the reactionsearch algorithm returns that.If not, the algorithm passes that information to previous recursive calls in order to check further possible interpretations.If any level of recursion in the algorithm has checked all possible completed interpretations without finding a valid one, that level returns that no completion exists.An example of the depth-first search tree explored by the reactionsearch algorithm for a simple pair of CRNs is shown in Fig. 13.This completes the description of the reactionsearch algorithm.Now we turn to its correctness and its complexity.For correctness, the exhaustive depth-first search aspect of the algorithm is self-evident; the outstanding issue is whether the trivial reaction solver is guaranteed to find a solution if one exists.Lemma 4.4.Given a formal CRN (S, R) and implementation CRN (S , R ), let m 0 : S → N S be a partial interpretation on some S S which satisfies the atomic condition.If there exists any completion m 1 : S → N S which agrees with m 0 on S , is a bisimulation, and is such that every implementation reaction r involving at least one species not in S has m 1 (r ) = τ , then any minimal solution of the system of equations set up by the trivial reaction solver produces a completed interpretation m which is a bisimulation.
Proof.It is clear that, given m 0 as described, any such m 1 will correspond to a solution of the equations set up by the trivial reaction solver on m 0 .It is also clear that any m 1 produced by a solution to the trivial reaction solver equations will satisfy the atomic and delimiting conditions if and only if m 0 does (since m 0 is assumed to satisfy the atomic condition), so we assume that m 0 satisfies the delimiting condition and are concerned only with the permissive condition.We first prove that if some solution to the equations produces an m 1 that satisfies the permissive condition (thus implying that a solution exists), then there is a minimal solution to the equations that produces an interpretation m that also satisfies the permissive condition.For any formal reaction r = R → P and implementation state S with m(S ) ≥ R it is also true that m 1 (S ) ≥ m(S ) ≥ R, because either m 1 is minimal or there is a minimal solution m in which each value is the same or smaller than in m 1 .Since m 1 satisfies the permissive condition, there is some implementation trajectory which, under m 1 , is interpreted as S r = ⇒.Since m 1 and m agree in their interpretation of every implementation reaction, that trajectory under m is also interpreted as S r = ⇒; since r and S were arbitrary, m satisfies the permissive condition.Since every solution to the trivial reaction equations is ≥ some minimal solution, this proves that if there is any solution that produces m 1 that satisfies the permissive condition, some minimal solution (in particular, the one ≤ it) produces m that also satisfies the permissive condition.
Having proven that at least one minimal solution produces a valid interpretation m, we show that every minimal solution does.In fact, the statement of the lemma is somewhat misleading: we show that under the above assumptions, for each formal species A that appears as a reactant in at least one formal reaction, the minimal solution to the equations for the counts of A is unique.For a formal species that never appears as a reactant, its counts in the interpretation of an implementation species cannot influence the permissive condition (given a fixed interpretation of every implementation reaction r , which is true by assumption), thus if some solution satisfies the permissive condition then every solution does.Now given m which satisfies the permissive condition, consider a complete interpretation m 2 generated by a distinct minimal solution which differs in at least one formal species that appears as a reactant.Then there is some x ∈ S where m(x)( A) > m 2 (x(A)) and A ∈ R for some formal reaction r = R → P (where m(x)( A) is the count of A in m(x)), and in particular choose x to minimize m(x)( A) for the given A. Let R 1 = R\m(x) be the formal species in R not in m(x), and let R 1 be the implementation state obtained by, for each formal species B in R 1 , taking (the appropriate count of) the species x B with m(x B ) = m 2 (x B ) = B, which exists since the partial interpretation satisfies the atomic condition.Then for R = R 1 + x, we have m(R ) ≥ R, so by the permissive condition there is a sequence of reactions which under m are interpreted as R r = ⇒.Let r be the last reaction in that sequence, which means m(r ) = r, which since all unspecified reactions must be trivial means that r involves only species in S and m 2 (r ) = r also.By removing r , we get a sequence of reactions Then first of all, Y contains all the reactants of r , and second, m and m 2 agree on every species in Y .Since that trajectory consists of only trivial reactions under both interpretations, we must have , which is also a contradiction since we assumed x was chosen to minimize m(x)( A) of species for which m(x)( A) > m 2 (x)(A).So either way, given a partial interpretation which satisfies the atomic condition, if there is a complete interpretation which satisfies the three conditions in which all remaining reactions are trivial, then the trivial reaction solver will find one by searching for the first minimal solution.2 Theorem 4.4.Given a formal CRN (S, R), implementation CRN (S , R ), and a partial interpretation which specifies m(x) for some (possibly empty) set of various x ∈ S , whether a complete interpretation m : S → N S exists that respects the given partial interpretation and is a bisimulation can be decided in polynomial space.In particular, if such an interpretation exists, then one exists that is polynomial size in that of the two CRNs and the partial interpretation.
Proof.We prove that the reactionsearch algorithm described above outputs a correct completion of the partial interpretation if one exists and returns false if none exists; that it does so using only polynomial space; and that in particular if a correct interpretation is output then the interpretation is polynomial size in the input.
Most of the algorithm consists of, given a partial interpretation, trying out some number of ways to specify the interpretation of some species not yet specified, then calling the algorithm recursively on each of those more-specified partial interpretations.We will show that, if a correct completed interpretation exists, then at least one of those more-specified partial interpretations can be completed to that interpretation.Since the number of unspecified species and reactions decreases at each recursive call, the algorithm will eventually reach that completed interpretation.In any correct interpretation, for each formal reaction r, the atomic condition implies that there is an implementation state that interprets to the reactants of that reaction, and the permissive condition that that state must be able to eventually reach some r with m(r ) = r; in particular, such an r must exist.So when the algorithm says, if no r is known to be interpreted as r, for each possible r enumerate all possible ways that r can be interpreted as r, if a correct completed interpretation exists then one of those possible ways must be part of it, provided that "enumerate all possible ways" can be done in finite time and in particular in polynomial space.Similarly, by the delimiting condition every r must have either m(r ) = r for some r or m(r ) = τ , so when the algorithm considers all those possibilities, one of them must be part of the correct completed interpretation if one exists.On the branch where the algorithm has found all r with m(r ) = τ in the correct interpretation, it will run the trivial reaction solver for all remaining m(r ) = τ .At any given time the algorithm only needs to store a single partial interpretation for each of at most |S | + |R | layers of recursive calls (since each one will specify the interpretation of at least one implementation species or reaction), plus whatever information is needed to "enumerate all possible interpretations" or run the trivial reaction solver or a permissive condition test, all of which we will prove take polynomial space (with the permissive condition tests already proven).So all we have left to prove is that enumerating "all possible interpretations" of each uninterpreted species in some r such that m(r ) = r for a specific r, or enumerating "all possible interpretations" for achieving the atomic condition, can be done in finite time and polynomial space, and that the trivial reaction solver works and takes polynomial space.We first address enumerating all possible interpretations of each species in a given r such that m(r ) = r for a given r.If r = R → P and r = R → P , then m(r ) = r if and only if m(R ) = R and m(P ) = P .In particular, if in a partial interpretation r can be interpreted as r, then the interpreted part of m(R ) must be ≤ R and similarly the interpreted part of m(P ) must be ≤ P .By taking the difference of R minus the partial interpretation of R and taking all assignments of each formal species in that difference to some uninterpreted species in R , doing the same for P and P , and removing any assignments that self-contradict, we can enumerate all possible partial interpretations where m(r ) = r.To elaborate, multiple copies of the same formal species in R or P can be assigned to different implementation species, but different copies of the same implementation species in R and/or P must be assigned the exact same multiset of formal species, otherwise the interpretation self-contradicts.Since all assignments of at most k copies each of i objects to j boxes can be enumerated in poly(i, j, log k) space, this process takes polynomial space.Since the interpretation of any given species involved in r where m(r ) = r = R → P must be ≤ R or ≤ P , the partial interpretation throughout this entire part of the algorithm is bounded by the size of the formal CRN (except for larger interpretations provided in the initial partial interpretation), and is thus polynomial size.Lemma 4.4 proves that if at the point when the trivial reaction solver is called a valid completion of the interpretation exists (where all remaining reactions are trivial), then the trivial reaction solver will find one and it will be a minimal solution of the given equations.It remains to show that the trivial reaction solver runs in polynomial space and produces a polynomial-size interpretation.The trivial reaction solver runs, for each formal species, the stack-based algorithm given by Contejean and Devie [40] to solve a system of linear Diophantine equations.Where q is the number of variables in the system, i.e. the number of implementation species whose interpretation is not yet specified, Contejean and Devie prove that their algorithm stores at most q states at one time on the stack, each of which is a tuple of q integers.Pottier has proven a bound on the size of those integers, namely, that their sum is at most, in that notation, (1 There r is bounded above by the number of unknown implementation reactions and ||A|| 1,∞ is the maximum of any individual equation (i.e., unknown implementation reaction) of the sum of coefficients in that equation (coefficients of unknown implementation species in the reaction, or of formal species in the known interpretations involved).The "size" of the given interpretation is bounded by q times the logarithm of the bound on an individual count, specifically, |m| ≤ qr(1 + log ||A|| 1,∞ ), where log ||A|| 1,∞ itself is the "size" of some combination of the implementation CRN and the current partial interpretation.The implementation CRN is of course part of the input to the algorithm, and we have proven that the partial interpretation up to this point is polynomial in the size of the input; thus the entire algorithm runs in space polynomial in its input, and if a correct interpretation exists, then one exists which is polynomial size. 2 In the general case, finding an interpretation turns out to be just as hard as checking an interpretation; in fact, the same space-bounded Turing machine reduction from Theorem 4.3 applies.

Theorem 4.5. Whether a bisimulation interpretation exists from a given implementation CRN to a given formal CRN is PSPACEcomplete.
Proof.Theorem 4.4, with Theorem 4.1 for checking the permissive condition, prove that a bisimulation can be found, or shown that none exists, in polynomial space.To prove completeness, we use the same formal and implementation CRN used in Theorem 4.3 and shown in Fig. 12.Consider an arbitrary Turing machine with m states and tape alphabet {0, 1}, with start state q 0 and halt state q m−1 which on any input, halts without using more space than the length of the input, with the tape reading 10 n−1 if it accepts and 0 n if it rejects; also consider an input x with length n.Given that, the formal CRN has n + 2 species and 1 reaction, The implementation CRN has species 0 i and 1 i for each tape spot i, q j i for each tape spot i and each Turing machine state j, additional q m i for a "reset" state m and each tape spot, and a halt species h.As described in Theorem 4.3, the implementation CRN can simulate the Turing machine with reactions q j i + σ i → q j i±1 + σ i ; can "reset" the computation to the start state reading string x with reactions q j i → q m n (for all q j i except q m−1 i for i > 1), q m i + σ i → q m i−1 + x i , and q m 1 + σ 1 → q 1 0 + x 1 ; and can check whether the computation has accepted with reactions q m−1 and q only if the given Turing machine accepts the string x, thus proving that checking an interpretation is PSPACE-complete.To prove that finding an interpretation is PSPACE-complete, we show that aside from permutations of the formal species in that interpretation (which are also correct if and only if the Turing machine accepts x), no other interpretation can be correct; therefore, a correct interpretation exists if and only if the Turing machine accepts x.
To prove that only one correct interpretation (up to permutations) is possible, we use the three conditions to eliminate possibilities until only that one remains.First, by applying the delimiting condition to the reactions q j i → q m n , either all q j i (except q m−1 i for i > 1) must have the same interpretation, or some subset of them (including q m n ) must interpret to H and the rest to We can quickly eliminate the cases m(q m n ) = ∅ and m(q m n ) = H , and prove that m(0 i ) = m(1 i ) for every i.
If m(q m n ) = ∅, then by the atomic condition there must be some state which interprets to exactly by, for each of those formal species, selecting one implementation species that interprets to exactly one of that formal species and nothing else.For each q j i that appears, apply the appropriate sequence of reactions from q m−1 i , and q j i → q m n until the only q j i that appears is some number of copies of q m n .All these reactions must be trivial (or the delimiting condition is violated): each of the q m−1 i -involving reactions are reversible while the formal reaction is not, while if m(q m n ) = ∅ then any q j i → q m n reaction has product ∅ while no formal reaction does; thus the resulting implementation state has the same interpretation as the original, namely , by removing all copies of it we get another implementation state with interpretation Q + A 1 + • • • + A n but no copies of q j i for any i, j.Since every implementation reaction has some q j i as a reactant, no reactions can fire in this state and the permissive condition is violated.
If m(q m n ) = H then m(q m i ) = m(q 0 1 ) = H because the "reset" reactions q m i + x i → q m i−1 + x i and q m 1 + x 1 → q 0 1 + x 1 (using the notation x i for 0 i if the ith symbol of the string x is 0 and 1 i if it is 1, and x −i for 1 i if the ith symbol of x is 0 and 0 i if it is 1; these are the reactions used in the event that the tape already has the correct symbol) must be trivial (since no formal reaction has an H as a reactant), and the only difference between the two sides is the two q's.
for all i: the reaction must be trivial since m(q m n ) = ∅ and no formal reaction is catalytic.By the atomic condition, there are n + 1 species Q , A 1 , . . ., A n which must each have some implementation species that interprets as one copy of that species, and since m(0 i ) = m(1 i ) the 0 i 's and 1 i 's can account for at most n of them.Call the remaining species X , and note that by assumption either m(q m−1 a sequence which takes a state whose interpretation contains either an H or all of Q , A 1 , . . ., A n to a state which has no H and only one of Q , A 1 , . . ., A n ; this is impossible in the formal CRN, thus at least one implementation reaction on that pathway must be nontrivial and not a formal reaction.The only remaining implementation species are the q j i 's, and we have already shown that if m(q m n ) = H then m(q j i ) for every j = m is either H or Q + A 1 + • • • + A n , and cannot be just X .That leaves only some q m−1 i for i ≥ 2, but since the reversible reactions must be trivial, m(q m−1 i ) = m(q m−1 1) + k<i m(0 k ), which must contain either an H or all of Q + A 1 + • • • + A n , and cannot be just X .Thus it is impossible to have m(q m n ) = H , and we must have m( So far we know that m(0 i ) = m(1 i ) for all i, m(q m n ) is neither ∅ nor H , m(q j i ) = m(q m n ) for j = m − 1, and since the reversible reactions must be trivial m(q m−1 i ) = m(q m n ) + k<i m(0 i ).In order to satisfy the atomic condition for n + 2 formal species, we need n + 2 implementation species with distinct interpretations.Regardless of the interpretations of any other species, m(q m−1 i ) ≥ m(q m n ) so no q m−1 i can be interpret as a single formal species other than the one (if any) that q m n is interpreted as, so all the q j i 's can satisfy the atomic condition for at most one formal species.Each pair of 0 i and 1 i must have the same interpretation, so all the 0 i 's and 1 i 's can satisfy at most n species, and h can satisfy an additional one, but no other implementation species remain.So since we have n + 2 "categories" of implementation species that can possibly be interpreted as a single formal species and n + 2 formal species, each such group must be interpreted as a distinct formal species.Given that, the reaction q m−1 n + 0 n → h cannot be trivial, so it must be interpreted as Q + A 1 + • • • + A n → H ; in particular, we must have m(h) = H .The remaining constraints say that each 0 i and q m n is interpreted as a distinct one of Q or some A i , that m(1 i ) = m(0 i ), that m(q j i ) = m(q m n ) for j = m − 1, and that m(q m−1 i ) = m(q m n ) + k<i m(0 i ); this is exactly the interpretation given in Theorem 4.3, up to a permutation of the formal species Q and the A i 's.Any such interpretation will satisfy the atomic and delimiting conditions, and will satisfy the permissive condition if and only if the given Turing machine accepts the string x, thus finding a correct interpretation is as hard as deciding whether a linear bounded Turing machine accepts a given string, which is PSPACE-complete. 2 When the number of formal reactants is bounded by a constant, we showed that whether an interpretation is valid can be checked in polynomial time.Then finding an interpretation is a natural NP problem; we show that it is in fact NP-complete, with a reduction from 3-SAT.An example of this reduction is shown in Fig. 14.
Theorem 4.6.When the number of reactants in a formal reaction k is bounded by a constant k ≥ 1, whether a bisimulation interpretation exists is NP-complete.
Proof.If a valid interpretation exists, Theorem 4.4 guarantees that we can find a valid polynomial-size interpretation which can be checked in polynomial time by Theorem 4.2.
To prove NP-completeness, given an arbitrary 3-SAT formula we construct a formal and implementation CRN such that a valid interpretation exists if and only if the formula is satisfiable.Our formal CRN has two species C and T and three reactions C → T , C → 2T , and C → 3T .Our implementation CRN has two species s C and s T plus for each variable x i in the 3-SAT formula two species x t i and x f i .We encode each clause of our 3-SAT formula e.g.(x 1 ∨ ¬x 2 ∨ x 3 ) as an implementation reaction e.g.
Formally, we say that a clause is (l 1 ∨ l 2 ∨ l 3 ) where each l j is some x i or ¬x i , and it is encoded in the implementation reaction s C → v 1 + v 2 + v 3 , where v j is x t i if l j is x i , or x f i if l j is ¬x i .We also add the implementation reactions s C → s T , s C → 2s T , and s C → 3s T , and for each x i the reaction s T x t i + x f i which we will show restrict interpretations that satisfy the three conditions to correspond to satisfying assignments of the 3-SAT formula.
We again observe that none of the formal reactions are reversible, so in order to satisfy the delimiting condition the reactions s T x t i + x f i must be trivial.Note that all other implementation reactions have exactly s C as their reactants.The leaving not enough species to satisfy the atomic condition.Therefore at least one of those two reactions must be formal; the only way to satisfy that is m(s T ) = T , making those two reactions and s C → 3s T interpreted as the three formal reactions respectively, and also satisfying the atomic condition.
Given that m(s C ) = C and m(s T ) = T , since the reversible reactions are all trivial we have for each i, m( In other words, exactly one of m(x t i ) or m(x f i ) is T , and the other is ∅.Such an interpretation satisfies the atomic condition with m(s C ) = C and m(s T ) = T , and satisfies the permissive condition since any state whose interpretation has a C has an s C , and can do the reactions s C → s T , s C → 2s T , or s C → 3s T for whichever formal reaction is desired; further, those three reactions and the reversible s T x t i + x f i satisfy the delimiting conditions, leaving only the reactions for each clause of the 3-SAT formula.These interpretations have an obvious one-to-one correspondence with assignments to the variables in the 3-SAT formula: corresponding to a clause (l 1 ∨ l 2 ∨ l 3 ), will be interpreted as either C → ∅, C → T , C → 2T , or C → 3T , depending on how many of l 1 , l 2 , and l 3 are true in the corresponding assignment.Specifically, the interpretation will satisfy the delimiting condition (none of the clause reactions are interpreted as C → ∅) if and only if the assignment satisfies the formula (none of the clauses have no true variables).Thus a valid interpretation exists if and only if the formula has a satisfying assignment, which completes the proof of NP-completeness. 2

Using the modularity condition
Because finding and checking interpretations are in some cases computationally intractable, any way of reducing the size of the problem would be helpful.Often, a larger implementation CRN can be broken up into a number of smaller modules.The modularity condition of Definition 3.3 shows that each module can be checked individually and combined into a correct implementation, as described in Theorem 3.2 and Corollary 3.3.We show that the reactionsearch algorithm from Section 4.2 can be modified to iterate through a number of correct interpretations such that if any correct interpretation is modular with respect to the given sets of common formal and implementation species, then one of the enumerated interpretations is.We also show that whether an interpretation is modular with respect to given sets of common formal and implementation species can be checked in polynomial time, and present an algorithm to do so.We begin by proving Lemma 4.5, which provides the mathematical foundation for the modified reactionsearch algorithm.Then, Lemma 4.6 presents an algorithm for testing modularity and establishes its running time.Finally, Theorem 4.7 presents the modified reactionsearch algorithm for finding a modular interpretation and establishes its running time.Note that the user will be responsible for identifying the modules and the common species; the modified reactionsearch algorithm will be responsible for finding a valid modular interpretation with respect to those modules, if one exists.Thus, below, the formal CRN and implementation CRN that we discuss will be only those species and reactions relevant to verification of a single module at a time.
Recall that the reactionsearch algorithm for finding an interpretation was based on Lemma 4.4 which proved that if a valid completion of an interpretation exists when the trivial reaction solver is called, the trivial reaction solver will find a valid completed interpretation.Our first step is to show that if at the same point a valid modular completion exists, then the trivial reaction solver will find one.Lemma 4.5.Given a formal CRN (S, R), implementation CRN (S , R ), and sets of common species S 0 and S 0 , let m 0 : S → N S be a partial interpretation defined on some set of implementation species S with S 0 ⊂ S S , where m 0 satisfies the atomic condition.If there exists any completion m 1 : S → N S that agrees with m 0 on S , is a modular bisimulation with respect to S 0 and S 0 , and such that every reaction r involving at least one species not in S has m 1 (r ) = τ , then any minimal solution of the system of equations set up by the trivial reaction solver produces a completed interpretation m which is a modular bisimulation with respect to S 0 and S 0 .If every formal species in S 0 appears as a reactant in R, then the previous statement holds even if S 0 ⊂ S .
As a remark, we give two conditions of which only one is required for the lemma to hold: S 0 ⊂ S (the interpretation of the common implementation species is already known) or every formal species in S 0 appears in a reaction.In the typical use case of this algorithm, namely a systematic DNA strand displacement implementation to which we want to apply Corollary 3.3, the first condition holds but the second does not.In such a system, S 0 is typically the set of all signal species {x A , x B , . . .}, for which we "know" that m(x A ) = A etc., at least in the sense that any interpretation where that is not true is not interesting.However, in such a system S 0 = S is the set of all formal species, in that each module officially contains all formal species of the larger CRN even if only some of them appear in a reaction in any given module.
Proof.First, we show that if some solution to the trivial reaction solver equations produces a modular bisimulation, then some minimal solution does.Let m 1 be the modular bisimulation produced by the solution that exists by assumption, and let m be the interpretation produced by an arbitrary minimal solution ≤ that solution; so for all x, m(x) ≤ m 1 (x).Lemma 4.4 shows that m is a bisimulation, so we only need to prove it is modular.Given any x, there is a sequence of trivial reactions (since any two solutions agree on the interpretation of any reaction, the sequence is trivial under both m 1 and m) by which x τ = ⇒ Y + Z , where Y ⊂ S 0 and m 1 (Z ) ∩ S 0 = ∅.Since m ≤ m 1 , in particular m(Z ) ≤ m 1 (Z ) and m(Z ) ∩ S 0 = ∅, so the same sequence of trivial reactions satisfies the modularity condition for m.Since this applies for all x, m is also modular.Now that there exists at least one minimal solution which produces a modular bisimulation, let m be that modular bisimulation and m 2 a bisimulation produced by another minimal solution.Given that the interpretation of every reaction is the same, we know we can treat the solution for each formal species separately.For species B / ∈ S 0 , if x τ = ⇒ Y + Z for Y ⊂ S 0 and m(Z ) ∩ S 0 = ∅ then m 2 disagreeing with m on the count of B in any species will not affect whether m 2 (Z ) ∩ S 0 = ∅; if m and m 2 only disagree on species not in S 0 then m 2 is modular.(Since Lemma 4.4 proves that if any minimal solution produces a bisimulation then the minimal solution is unique on all formal species that appear as a reactant, if every species in S 0 appears in a reaction then the above completes the proof without requiring S 0 ⊂ S .)If m and m 2 disagree on any m(x)( A) for A ∈ S 0 , then take any such x look at the reactions by which x τ = ⇒ Y + Z with Y ⊂ S 0 and m(Z ) ∩ S 0 = ∅.Now using the assumption S 0 ⊂ S so m and m 2 agree on any species in S 0 , in particular m(x Since each formal species can be taken independently, if m 2 (x)(A) > m(x)( A) whenever they disagree then an interpretation m 3 defined as m 3 (x)(A) = m(x)( A) and m 3 (x)(B) = m 2 (x)(B) for B = A is also a solution, but m 3 ≤ m 2 so m 2 is not minimal, a contradiction.This proves that the minimal solution to the trivial reaction solver is unique and correct for any formal species A that appears as a reactant in a formal reaction (Lemma 4.4) or is in S 0 (assuming S 0 ⊂ S ), and any other formal species can't affect whether the interpretation is a modular bisimulation; so if a modular bisimulation exists, then given the conditions, any minimal solution to the trivial reaction equations produces one. 2 Given a formal and implementation CRN (S, R) and (S , R ), an interpretation m which we already know is a bisimulation, and sets of common species S 0 and S 0 , we can check whether m is modular with respect to S 0 and S 0 using an algorithm similar to the graphsearch algorithm for checking the permissive condition.First, construct a table which for each implementation species x ∈ S stores either that x is "finished", or if it is not finished, stores a list of all x ∈ S such that m(x ) = x and x τ = ⇒ x + Z is known (x "can reach" x ) and a list of all z ∈ S such that x τ = ⇒ x + z is known (x "can produce" z; this implies m(z) = ∅).Here "x is finished" means either it is known that x τ = ⇒ Y + Z for Y ⊂ S 0 and m(Z ) ∩ S 0 = ∅, or it is known that x τ = ⇒ x 1 + X for ∅ < m(x 1 ) < m(x).We will show later that if all x meet one of those two conditions, then m is modular with respect to S 0 and S 0 .Initialize the table to say that x is finished if x ∈ S 0 or m(x) ∩ S 0 = ∅, otherwise x is known to reach itself and not known to produce anything.Then in cycles, for each x not yet finished where Z x is the set of all z such that x τ = ⇒ x + z is known, for each trivial reaction that can happen in a state with one x and arbitrarily many copies of Z x (written x + ∞Z x τ − → . . .), update the table according to the following rules: Continue until one cycle (checking every trivial reaction for every unfinished x) passes with no changes to the table.At that time, if every x is finished then m is modular, otherwise m is not modular (with respect to S 0 and S 0 ).Lemma 4.6.Given a formal and implementation CRN (S, R) and (S , R ), a bisimulation m : S → N S , and common species sets S 0 ⊂ S and S 0 ⊂ S , whether m is modular with respect to S 0 and S 0 can be checked in polynomial time.
Proof.We prove that the above algorithm is correct and runs in polynomial time.For correctness, first, the table is initialized with true facts and updated with true deductions: for initialization, if x ∈ S 0 then x τ = ⇒ x + ∅, and if m(x) ∩ S 0 = ∅ then then every species x ∈ X has m(x ) ≤ m(x) − m(x 1 ) < m(x), so by the induction hypothesis every x ∈ X ∪ {x 1 } is finished and therefore has x τ = ⇒ Y x + Z x as desired, so by combining all those x τ = ⇒ x Y x + x Z x , satisfying the modularity condition.This proves that if the algorithm says m is modular, then it is.
To complete the proof that the algorithm is correct, we show that if m is in fact modular, the algorithm will not say it is not.The algorithm terminates when a cycle passes with no change to the table, so we show that if m is modular but at the beginning of a cycle at least one species is not yet finished, then there is some fact not yet in the table that will be learned this cycle.Consider a modified implementation CRN where the reaction x τ − → x + Z x for the current Z x known to be producible at x is added for each x; if m is modular in the original CRN then it is modular in the new CRN, since reactions were only added.Then consider, for each x ∈ S , the first reaction on the path with the fewest non-x τ − → x + Z x reactions by which x τ = ⇒ Y + Z to satisfy the modularity condition; the algorithm will consider these reactions (among others) as possible and update based on them this cycle.Now consider the path obtained by starting from an x not yet finished, at any given state x (ignoring null species), taking that first reaction, until either a Y + Z satisfying the modularity condition is reached, or some x 1 + X with ∅ < m(x 1 ) < m(x) is reached, or a non-null species is repeated; for this purpose, the reactions x τ − → x + Z x do not count as repeating a species.(Given finitely many x with m(x ) = m(x), one of those three must eventually happen.)If the path ends in Y + Z or x 1 + X , then the last x with m(x ) = m(x) along the path which is not yet finished (which may be x) will be marked as finished this cycle.If the path ends by repeating a species, not known to reach some other species x 2 in the loop, for each x 2 the last such species will become known to reach x 2 this cycle.If every species in the loop is known to reach every other species in the loop, and x 0 is known to produce every species in Z 0 , then replacing the x 0 τ = ⇒ x 0 + Z 0 loop with the (in our measure of path length, 0-length) reaction x 0 τ − → x 0 + Z x 0 would create a shorter path by which x τ = ⇒ Y + Z ; so there is some z ∈ Z 0 not known to be produced from x 0 .Somewhere in that loop is a reaction x 1 τ − → x 2 + z + Z 2 , and since every species in the loop is known to reach every other species in the loop (including itself), the last species in the loop before x 1 which is not yet known to produce z (which may be x 0 ) will be known to produce z this cycle.This covers all cases, and completes this part of the proof: if m is modular but the algorithm does not yet known that every species is finished, it will learn at least one new fact each cycle, thus never saying no.
To complete the proof, we show that the algorithm always terminates in polynomial time.Each cycle consists of for each implementation species, for each trivial reaction, checking whether that reaction is possible from that species plus null species, checking properties in the table for each of the produced species, and updating the table, a polynomial number of polynomial-time operations.Since at least one fact must be learned each cycle, the number of cycles is bounded by the number of facts: n(n + z + 1), where n is the number of implementation species and z the number of null species, so the algorithm is guaranteed to terminate in polynomial time.Since the algorithm is guaranteed to terminate, that the algorithm never returns no when m is modular implies that it returns yes, which also completes the proof of correctness.2 To find modular bisimulation interpretations, we modify the reactionsearch algorithm such that after checking the permissive condition it also uses the above algorithm to check the modularity condition.For this to be correct we would need to prove that if a modular bisimulation exists then the algorithm will find it; thankfully this is true.Theorem 4.7.Given a formal and implementation CRN (S, R) and (S , R ), sets of common species S 0 ⊂ S and S 0 ⊂ S , and a partial interpretation which specifies m(x) for some set S ⊂ S of various x ∈ S , provided that either S 0 ⊂ S or every formal species in S 0 appears as a reactant in R, whether a complete interpretation m : S → N S exists that respects the given interpretation and is a modular bisimulation with respect to S 0 and S 0 can be decided in polynomial space.In particular, if such an interpretation exists, then the modified reactionsearch algorithm will find one that is polynomial size in that of the two CRNs and the partial interpretation.
Proof.That the modified reactionsearch algorithm outputs only polynomial-size bisimulations and runs in polynomial space is proven in Theorem 4.4, so when combined with the modularity checker it will output only polynomial-size modular bisimulations.If a complete modular bisimulation exists, then whatever partial interpretation of it specifies the interpretation of all nontrivial reactions will be considered by the modified reactionsearch algorithm, at which point it will call the trivial reaction solver; so far, the proof is the same as Theorem 4.4.Given that partial completion, by Lemma 4.5, if a completion of it exists (which it does) then one exists which is a minimal solution of the trivial reaction solver equations.That such an interpretation must be polynomial-size is again proven in Theorem 4.4, so it will be found and verified by the permissive and modularity checkers, and returned.2 Given a large formal CRN and implementation CRN along with a user-provided breakdown into modules with defined common species and a partial interpretation on the common implementation species, the modified reactionsearch algorithm may be applied sequentially (or in parallel) to each module; if all modules admit a valid modular interpretation, then a valid interpretation for the full system exists-specifically, the union of all the modules' interpretations, which will be consistent since they share the given interpretation on the common species.Furthermore, if the algorithm fails to find an interpretation for some module, then no valid modular interpretation exists (although it remains possible that a non-modular valid interpretation exists).Note that the choice of modules must be consistent with the requirements of Theorem 3.2, for example, every module contains the same set of common species and their intersection contains no other species.
We have shown that finding a modular bisimulation is not significantly harder than finding a bisimulation at all, and in fact finding a modular bisimulation for a large CRN broken into many modules is much easier than trying to find a bisimulation for the whole CRN with no information about its modularity.On a trivial level, any interpretation is modular with respect to common implementation species S 0 = S regardless of the common formal species or common formal species S 0 = ∅ regardless of the common implementation species.Thus, Theorems 4.3, 4.5, and 4.6 apply, and checking a modular bisimulation is PSPACE-complete in general (but polynomial time in n k if the largest number of reactants in a formal reaction is k), and finding one is PSPACE-complete in general and NP-complete when the number of reactants in a formal reaction is bounded by a constant.Whether this stays true for more "meaningful" cases of modularity is a more interesting question.For example, the property S 0 = S, |S 0 | = |S 0 | and satisfies the atomic condition-every formal species is common, and the common implementation species consist of exactly one species x A with m(x A ) = A for each formal species A-describes the typical non-history-domain systematic DSD implementation of CRNs, and neither of the CRNs in Theorems 4.3, 4.5, or 4.6 are modular with respect to any sets with that property.Whether there is another hardness proof for that sort of set of common species, or whether checking or finding a modular interpretation is in fact easier in (the worst case of) that subcase, is currently an open question.

Bisimulation in transition systems
We call this theory "CRN bisimulation" because it is a special case of the theory of weak bisimulation in concurrent systems, adapted to CRNs.In [19], this theory is defined in terms of a labeled transition system where is a set of states, T a set of labels, and for each t ∈ T there is a relation t − →⊂ × , specifying which states can transition to which other states by an action of type t.For example, a CRN (S, R) can be expressed as a labeled transition system; there are multiple ways to do this, but to give one particularly natural way, let = N S be the set of all states in the usual sense of the CRN, T = R so that the labels for transitions are the reactions, and for r = R → P ∈ R the transition relation is r − →= {(S, S − R + P ) | S ∈ N S , S ≥ R}.This construction matches the semantics of CRNs as defined in Section 2, and is the basis of the connection between our theory of CRN bisimulation and weak bisimulation as defined in [19].
Aside from labeled transition systems, however, the paradigm used by Milner in [19] diverges from the paradigm we use when discussing CRNs.Milner discusses concurrent processes in terms of agents and agent expressions in a certain language, and defines a single labeled transition system where is the set of all, infinitely many (in fact uncountably many) agent expressions, with T similarly infinite.Strong and weak bisimulation are used to define which agent expressions are in fact "the same agent" in terms of either what actions they can do (strong) or what observable (non-τ ) actions they can do (weak).The two types of bisimulation are eventually used to define a notion of equality of agent expressions which roughly matches "same sequence of observable actions" while being preserved by each of the combinators in the language used to define an agent expression.For this purpose, the concept of one labeled transition system is particularly useful, and this one labeled transition system has single relations ∼, ≈, and = defined as "the" strong bisimulation, weak bisimulation, and equality, respectively.In order to get there, however, Milner defines what it means for a relation to be "a" strong or weak bisimulation, then defines "the" strong or weak bisimulation as the largest such relation.For example, Definition 5.1 (Definition 5.5 in [19]).In contrast to Milner in [19] comparing two agents (states) of the same labeled transition system, we want to compare two CRNs which we think of as separate (labeled transition) systems.This itself is not a significant difference: given two systems ( 1 , T , t − → 1 : t ∈ T ) and ( 2 , T , t − → 2 : t ∈ T ) and a relation ↔⊂ 1 × 2 we can consider the system fitting Milner's paradigm with no significant changes.More importantly, our concept of CRN equivalence wants to consider an asymmetric pair of CRNs: one, (S, R), is the "formal" CRN where R is the set of "meaningful actions", and another, (S , R ), is meant to be an implementation of the formal CRN.This means that some natural conditions we want our definition of correctness to have are that every state of the implementation CRN corresponds to one and only one state of the formal CRN; that every state of the formal CRN has at least one state of the implementation CRN that implements it; and since we're working with CRNs which are fundamentally linear, that the sum of any number of implementation states corresponds to the sum of their corresponding formal states.(This linearity condition is also why the undecidability result from [21] doesn't apply to our CRN bisimulation.)It turns out that those conditions on a relation ↔⊂ N S × N S are true if and only if ↔ corresponds to some interpretation m : S → N S as defined in Definition 3.1: Lemma 5.1.Let ↔⊂ N S × N S be a relation between formal states and implementation states.If for every implementation state S there is exactly one formal state S such that S ↔ S (function) and for every pair of pairs S 1 ↔ S 1 and S 2 ↔ S 2 we have S 1 + S 2 ↔ S 1 + S 2 (linearity), then there is some interpretation m : S → N S which, when extended to implementation states m : N S → N S , induces that relation: S ↔ S ⇐⇒ S = m(S ).Furthermore, for every S there is some S such that S ↔ S (surjectivity) iff m satisfies the atomic condition.
Proof.Given that the relation ↔ is a linear function from N S to N S , we define the interpretation to be m(x) = S x where S x is the unique formal state such that S x ↔ {|x|}.Now, any implementation state S is some sum of implementation species, S = x∈S α x x, and because we define the interpretation of a state as the sum of interpretations of species, m(S ) = x∈S α x m(x).Then by the linearity assumption on ↔, m(S ) ↔ S .Thus, if S = m(S ), then S ↔ S .Conversely, if S ↔ S , then S = m(S ) because ↔ is a function.
If we further assume that ↔ is surjective, then in particular for each formal species A, there must be some S such that {| A|} ↔ S , i.e. m(S ) = {| A|}.Since m(S ) is the sum of interpretations of species in S and an implementation species cannot interpret to fractional or negative formal species, there must be some species x A ∈ S with m(x A ) = {| A|} (and any other species in S interpret to ∅).Thus the atomic condition is satisfied.Conversely, if the atomic condition is satisfied, then consider an arbitrary formal state S = A∈S α A A. Using linearity, let S = A∈S α A x A , so m(S ) = S, and thus ↔ must be surjective.2 Since we said that R should be the set of "meaningful actions", that means our transition systems ( i , T , t − → i ) should have the same set of labels, and at first glance that set of labels should be T = R ∪ {τ }.In the formal CRN, this is easy: the interpretation of a CRN as a transition system that we previously described has T = R, which simply means no τ transitions appear.In the implementation CRN, we need to find a correspondence between implementation reactions and formal reactions (or τ ), but this is already what the interpretation does: as described in Definition 3.1, any interpretation m : S → N S induces a map m : R → (N S × N S ) ∪ {τ }. (We previously referred to members of N S × N S not necessarily in R as "reactions in the language of the formal CRN".)Since an implementation reaction might be interpreted as a reaction in the language of the formal CRN which is "invalid" i.e. not in R, we take T = (N S × N S ) ∪ {τ }, and when converting the implementation CRN to a transition system we say for r ∈ T that S r − → T if S r − → T and m(r ) = r for some r ∈ R .
(This means that, formally, which transition systems we are comparing depends on the relation we find between them, a bit of apparent circularity which leads to various differences between CRN bisimulation and the classic definition.)Given this, an interpretation is a CRN bisimulation (satisfies Definition 3.2(III)) if and only if it satisfies the Atomic Condition (Definition 3.2(II.i))and the relation on states it induces is a weak bisimulation (Definition 5.1).Therefore, a valid interpretation m can be equivalently described as a surjective linear weak bisimulation.
We would like to compare some features of our concept of CRN bisimulation to bisimulation in transition systems as in [19].To avoid ambiguity, for this discussion we use the phrase "bisimulation relation" to mean a relation between states ↔⊂ S × S which satisfies Definition 5.1, and "CRN bisimulation" or "bisimulation interpretation" to mean an interpretation m : S → N S which satisfies Definition 3.2 and therefore induces a bisimulation relation.
The most important difference between a bisimulation relation and a CRN bisimulation is that, as we said earlier, the transition system induced by the implementation CRN is not fully defined until an interpretation is given.For example, if R , m(x 1 ) = A and m(x 2 ) = B then the transition {|2x 1 |} → {|x 1 , x 2 |} has label A → B, but if m(x 1 ) = m(x 2 ) = A then the same transition has label τ .So for example, the fact (Proposition 5.1(4) in [19]) that i∈I ↔ i is a bisimulation relation if each ↔ i is a bisimulation relation has no obvious analog for CRN bisimulations, since there is no obvious way to even define the union of two relations defined on different and "contradictory" transition systems.We don't try.
Milner discusses the relation ≈= {↔|↔ is a bisimulation relation} [19].In any given transition system, such a relation exists, is the largest bisimulation relation (which follows from the previous statement about unions), and is an equivalence relation.In the context of comparing agent expressions, this is a useful way of saying, for example, that the result of applying a sequence of transitions to a complex agent expression is another complex agent expression.In the context of a formal CRN (S, R) and its induced transition system (N S , R ∪ {τ }, r − →), the relation ≈ exists but is not very useful.In fact the relation is the trivial S ≈ T ⇐⇒ S = T relation except in some particularly degenerate CRNs; for example, if S = {A} and R = {∅ → A} then S ≈ T for all S, T .In the context of CRN bisimulation, a formal CRN and implementation CRN where the actions are reactions in the language of the formal CRN or τ , as previously discussed we haven't finished defining the transition system, so ≈ is not yet defined without an interpretation.Given an interpretation m, we have a transition system so ≈ exists, but it is mostly restricted by m.If m is a CRN bisimulation, then adopting the convention that m(S) = S for formal states S ∈ N S , S ≈ T ⇐⇒ m(S) ≈ m(T ) for any (formal or implementation) states S, T , where the right side can use the definition of ≈ on the formal CRN.That is, "the bisimulation" is just m together with any degeneracy in the formal CRN.

Handling spurious catalysts
The definition of CRN bisimulation as stated previously has difficulty handling overly detailed enumerations of DNA strand displacement circuits, but a simple extension of bisimulation fixes that problem.As an example, consider the implementation of the reaction A + B → C + D according to the variant of the scheme by Soloveichik et al. [14] discussed in Section 3. Fig. 15 shows a spurious reaction possible in that scheme: a toehold on the "trigger strand" (what would be t C D if released) in the complex i A binds to the exposed complementary toehold in another copy of i A .This spurious binding has no meaningful effect on the DSD system's function: no strand displacement reactions are possible given that binding that should not be possible, and since the binding is reversible it can fall off before any reaction that it would otherwise interfere with.In particular, an analog of the i A + x B → t C D + w 1 reaction can still happen in this complex, producing a t C D strand with a spurious binding to an i A complex (and a normal w 1 waste complex).However, when analyzing the system with bisimulation, we need to interpret each implementation species, including this complex and the result of the reaction.In order for the binding and unbinding reactions to be trivial, we must interpret the i A : i A complex as 2 A and the t C D : i A complex as C + D + A, and they must be trivial in order to satisfy the delimiting condition.Then the reaction i A : i A + x B → t C D : i A + w 1 is interpreted as 2 A + B → A + C + D. This is neither trivial nor is it a formal reaction, so by bisimulation as so far defined, the delimiting condition is violated.However, it is "clearly" the reaction A + B → C + D with a "spurious catalyst" A on the side, and we would like a definition of bisimulation that can confirm this.
Recall that in Definition 3.1 we defined three related but distinct interpretations: m : S → N S an interpretation of implementation species; m : N S → N S an interpretation of implementation states; and m : N S × N S → (N S × N S ) ∪ {τ } an interpretation of implementation reactions.At the time, we said that the interpretation of species m : S → N S was arbitrary, but the other two were defined unambiguously in terms of that m.While we still want to keep the interpretation of states as the sum of the interpretations of species, we can loosen the definition of the interpretation of a reaction to allow reactions like i A : i A + x B → t C D : i A + w 1 to be interpreted as A + B → C + D as intended.Theorem 5.1.Given a formal and implementation CRN (S, R) and (S , R ) with interpretation (m, m ρ ) where m ρ is consistent with m, the problem of checking whether (m, m ρ ) is a CRN bisimulation is PSPACE-complete in general, and can be checked in polynomial space by the loopsearch algorithm as previously described.The graphsearch algorithm as previously described also correctly checks whether (m, m ρ ) is a CRN bisimulation, and when the number of reactants in any formal reaction in R is bounded by some k, the graphsearch algorithm runs in poly(n k ) time and space.
Proof.Both algorithms do not depend on the fact that m(R → P ) = m(R ) → m(P ) or τ , but only depend on being able to find m(r ) given r , which is still easy given m ρ .Similarly, assumptions made by the algorithm such as if S τ − → T then m(S ) = m(T ) still hold.Thus, the previous proof of correctness and complexity of the algorithms holds.Similarly, in the completeness proof in Theorem 4.3, the given interpretation with m ρ (R → P ) = m(R ) → m(P ) when m(R ) = m(P ) is of a different formal species.Then m(0 n ) is either a single formal species not equal to m(h), or is and m(h) = H .This rules out the possibilities that any of the x i 's or q m n are interpreted as H , leaving the only possibility that they are interpreted as some assignment of Q and the A i 's, with m(0 i ) = m(1 i ), m(q j i ) = m(q m n ) for j = m − 1, and m(q m−1 i ) = m(q m n ) + k<i m(0 k ).This interpretation satisfies the atomic and delimiting conditions, and satisfies the permissive condition if and only if the given Turing machine accepts x, thus deciding whether a correct interpretation exists is still PSPACE-complete.The proof of Theorem 4.6 applies to the new definition of interpretation without modification, proving that whether a correct interpretation exists is NP-complete when the number of reactants in a formal reaction is bounded by a constant k ≥ 1. 2

Discussion
Comparing Chemical Reaction Networks on different levels of abstraction is an important tool for systematic programming with CRNs.We showed how to adapt the concept of bisimulation to check whether one CRN is a correct implementation of another.We showed that bisimulation can be used to prove the correctness of some existing CRN implementations, and to identify subtle but real problems with others.We discussed transitivity and modularity, which can be used to simplify a bisimulation proof.We presented different algorithms to check bisimulation which are adapted to different cases.We showed that the condition can be checked in polynomial time with favorable assumptions, is NP-complete with less favorable assumptions, and is PSPACE-complete in the general case.
In the beginning, we mentioned a DNA implementation of the approximate majority CRN [7] that was experimentally demonstrated by Chen et al. [3].We might consider applying our bisimulation checker to this implementation.The implementation as presented in [3] would be incorrect according to bisimulation, for the same reason the example in Fig. 5 from Qian et al. [15] fails: outputs of an irreversible reaction are released before an irreversible step is taken, leading to a small probability of such a reaction reversing itself after the products have reacted downstream.Despite this, Chen et al.'s in vitro experimental demonstration showed no such problems.While there are a number of explanations for this observation, including that the formal approximate majority CRN is particularly resistant to error [7], it nonetheless raises the question of how serious are the potential errors that may occur in CRN implementations that are not correct according to bisimulation?The answer will depend on the specific formal CRN of interest, as well as the conditions under which it is run.For example, behavior that may be problematic with non-negligible probability in low molecular counts, may have negligible effect in high molecular counts typical of in vitro experiments.
Another observation we have is that for typical engineered CRN implementations, at least for DNA strand displacement implementations, either there is a problem in the implementation of one formal reaction; or there is a problem with crosstalk between formal reactions; or there is no problem, and correctness can be proven by the modularity condition.In the case of crosstalk, as we mentioned in Section 3.4, that problem needs to be detected by the reaction enumerator, and is beyond the scope of our bisimulation theory.In the implementation by Chen et al. [3], for example, there are three formal reactions, but the (technically) incorrect behavior can be detected by considering only one of them.In the implementation of the rock-paper-scissors oscillator by Srinivas et al. [4], they use a systematic translation method slightly modified from Soloveichik et al. [14].After confirming that their method applied to one reaction is correct, using Corollary 3.3 we can prove that such a method applied to any combination of reactions will be correct according to bisimulation.
The theory and algorithms discussed in this paper have been incorporated by Badelt et al. into the Nuskell compiler, a software package that automatically translates a CRN into a DNA strand displacement circuit and verifies that the result is correct [42].Nuskell currently contains the loopsearch and graphsearch algorithms for checking the permissive condition as well as an exhaustive search algorithm for the same, the reactionsearch algorithm for finding an interpretation, and the algorithms to check the modularity condition and find a modular interpretation when given a decomposition into modules of an implementation CRN.Badelt et al. use bisimulation to verify a number of translation schemes applied to the rock-paper-scissors oscillator [5,6,4], showing that bisimulation algorithms can be used to verify CRN implementations used in practice.
Algorithms such as the graphsearch algorithm and loopsearch algorithm scale better with the number of meaningful species than the number of null species, while engineered CRN implementations generally do not use loops that produce null species.Thus those algorithms will be faster than their worst-case limits in practical cases.For example, the graphsearch algorithm takes at most (2zn k + 1)n k = O (n 2k+1 ) cycles in theory, where n is the number of implementation species, k the largest number of reactants in a formal reaction, and z the number of implementation species with empty interpretation.When there are no null species (or when none can be produced in a loop, as in schemes such as [14]), this becomes at most n k cycles.
In CRN bisimulation, we require that every implementation species has an interpretation as a (possibly empty) multiset of formal species.In contrast, verification methods such as pathway decomposition [17] or serializability [18] both assume that each formal species is represented by one implementation species, while other implementation species are classified into fuels, wastes, and intermediates.Because of this, pathway decomposition and serializability compare formal reactions to implementation pathways which begin and end with (representations of) formal species, while in bisimulation an individual implementation reaction can be interpreted and compared to the formal CRN.An additional consequence, for pathway decomposition, is that correctness guarantees do not apply to implementation states that cannot be reached from initial states representing formal species, whereas bisimulation is more robust in that correctness is asserted in those cases as well.Furthermore, even in the permissive condition, bisimulation requires that there exist an implementation pathway which implements a given formal reaction, while pathway decomposition and serializability both require that all implementation pathways have properties which may be nontrivial to check.This locality is what allows us to prove the complexity results given, which we suspect are significantly lower complexity than methods that depend on implementation pathways.
However, the use of interpretations instead of pathways means that in some cases CRN bisimulation and pathway decomposition differ on which implementations they consider correct.Bisimulation can easily be adapted to situations where there is no clear single "canonical representation" of a given formal species, while pathway decomposition has difficulty.For example, the implementation in [15] of the reversible formal reaction A + B C + D by reversible implementation reactions {x A i A , i A + x B i C D , i C D x C + i D , i D x D }. Bisimulation considers this correct with the obvious interpretation, while pathway decomposition considers the ability to release x C then reverse without releasing x D to be an error.On the other hand, bisimulation has trouble handling implementation species with no well-defined interpretation.Shin et al. describe a "delayed choice" phenomenon where an implementation CRN commits to implementing one of two formal reactions before deciding which one, producing an intermediate that cannot be correctly interpreted as either of the reaction's products or their reactants; such implementations are generally considered incorrect according to bisimulation but pathway decomposition often considers them correct [17].They then propose a hybrid notion of correctness where an implementation CRN is considered correct if it is a correct implementation according to pathway decomposition of some intermediate CRN, and the intermediate CRN is a correct implementation of the formal CRN according to bisimulation [17].This notion considers correct any implementation that is correct according to either pathway decomposition or bisimulation, plus some others.One area this theory overlooks is the rates of reactions and the probabilities of reaching certain states.For example, in [14] Soloveichik et al. argue that the concentration of each intermediate is proportional to the product of that of the formal species which we would call its interpretation, and thus the reaction rates are approximately correct.Whether this can be generalized, and whether bisimulation can help this generalization, is an important open question.

Fig. 2 .Definition 3 . 1 .
Fig. 2. Interpretation of the implementation CRN in Fig. 1. m(t C D ) = A + B would also be a valid interpretation for this CRN.Definition 3.1.An interpretation is a function m : S → N S from implementation species to multisets of formal species.We extend this linearly from species to states: m( In the following notation, S , T , S , and T refer to implementation states; S and T to formal states; r to an implementation reaction; and r to a reaction in the language of the formal CRN or τ .When a formal reaction r takes state S to state T , we write S r − → T ; S r − → T is similar.Note that if S r − → T , then r = (R, P ) ∈ R as well as S − R + P = T , and analogously for the implementation.Further, we write S r − → T when S r − → T for some r with m(r ) = r, which does not require r ∈ R (but does require r ∈ R ).Note that if S τ − → T then m(S ) = m(T ).One of the core concepts of weak bisimulation is the behavior of the system when we abstract away from trivial reactions.To discuss this behavior, we introduce the relation S r = ⇒ T : we say S τ = ⇒ T when S can reach T via 0 or more trivial reactions, and S r = ⇒ T when S τ and S τ = ⇒ S are always true. S r = ⇒ T for r = τ is again defined as S r = ⇒ T for some r with m(r ) = r.S r = ⇒ T for r = τ is defined but trivial: S r = ⇒ T ⇐⇒ S r − → T .When the final state is irrelevant, we sometimes write S r = ⇒, etc., as appropriate.We say an implementation state S can reach an implementation reaction r when S r = ⇒, and we say S can implement a formal reaction r when S r = ⇒.(These definitions are based on notation used by Milner in

(
ii) Delimiting condition: The interpretation of any implementation reaction is either trivial or a valid formal reaction.(iii) Permissive condition: If S r − → and m(S ) = S, there exists an implementation reaction r such that m(r ) = r and S r = ⇒.III Weak bisimulation (i) For all implementation states S , if S r − → T , then S r = ⇒ T where S = m(S ) and T = m(T ).(ii) For all formal states S, there exists S with m(S ) = S, and for all such S , if S r − → T , then for some T , S r = ⇒ T and m(T ) = T .
satisfying the permissive condition.Given the three conditions, we prove weak bisimulation.Given any S with m(S ) = S and S r − → T where r = (R , P ), by the delimiting condition either m(r ) = τ is trivial or m(r ) = r = (R, P ) ∈ R. If trivial, then m(T ) = m(S ) = S and S τ = ⇒ S is true by convention.If nontrivial, then r ∈ R; since S r − → we must have S ≥ R , thus m(S ) ≥ m(R ) = R, and S r − → T (therefore S r = ⇒ T ) where T = S − R + P .Since T = S − R + P , m(T ) = m(S ) − m(R ) + m(P ) = T , satisfying condition III.(i) of weak bisimulation.Given any S, we find an S with m(S ) = S by taking S = A α A x A , where S = A α A A and m(x A ) = {| A|} must exist by the atomic condition.Given any such S with S r − → T where r = (R, P ), by the permissive condition there is some r with m(r ) = r and S r = ⇒, which is an abbreviation for ∃ T S r = ⇒ T , which is further an abbreviation for ∃ S S τ = ⇒ S r − → T .(Strictly speaking S r = ⇒ T means there is some S τ , but since we are choosing an arbitrary T we can take T = T .)Then m(S ) = m(S ) = S since they are connected by trivial reactions, and where r = (R , P ) with m(R ) = R and m(P ) = P we have T = S − R + P so m(T ) = S − R + P = T , satisfying condition III.(ii) of weak bisimulation.
by linearity of m.From bisimulation, since each S k−1 r k − → S k we have either r k = τ and S k−1 = S k , or r = τ and S k−1 r k − → S k , since for r = τ in the formal CRN S r = ⇒ T ⇐⇒ S r − → T .The interpretation of that implementation trajectory is exactly S 0 followed by those reactions S k−1 r k − → S k for which r k = τ , and thus the interpretation is a formal trajectory.Conversely, given S 0 with S 0 = m(S 0 ) and any formal trajectory (S 0 , r 1 , . . ., r k , . . . we construct an implementation trajectory whose interpretation is that formal trajectory.Given S 0 , define inductively S k and r k to be an implementation state and reaction such that S k−1r k = ⇒ S k with m(r k ) =r k and m(S k ) = S k , which exists by condition III.(ii) of weak bisimulation.Expanding each r k

Fig. 5 .
Fig. 5.The translation scheme from [15], when used as a general CRN implementation, violates the delimiting condition.Species named f are fuels.Top: DNA Strand Displacement diagram of the reversible reaction x B + i A:BC D i AB:C D + f + B with interpretation of involved species given.With the closest possible interpretation (shown), the reverse reaction (highlighted) is interpreted as C + D → A + B, which violates the delimiting condition when the only formal reaction is A + B → C + D. Bottom: List of enumerated reactions of the full DSD system for A + B → C + D, before removing fuels, with the violating reaction x B + i A:BC D i AB:C D + f + B highlighted.Although only one candidate interpretation is shown to be invalid in this figure, any other interpretation either has the same problem or is invalid for other reasons.

Fig. 6 .
Fig. 6.An example implementation CRN with multiple possible correct interpretations as different formal CRNs.Here each circle represents one implementation species, with the name of the species above the bar and its interpretation below the bar; a "?" means the interpretation is not yet specified.A. The implementation CRN with constraints on the interpretation of 4 of its species.B., C. Two correct interpretations as two distinct formal CRNs.D. An interpretation not correct for any formal CRN.For B., C., and D., species in the same colored circle have the same interpretation.

= ⇒. 2 Lemma 3 . 2 (
which by the delimiting condition for m 1 is either a valid formal reaction or trivial.For any formal state S and reaction r with S r − → and any implementation state S with m(S ) = S, that means S = m 2 (S ) is an intermediate state with m 1 (S ) = S.By the permissive condition on m 1 , there is some r with m 1 (r ) = r and S r = ⇒.Using the permissive condition on m 2 and the argument used in Theorem 3.1 to show that the permissive condition implies trajectory equivalence, there is a sequence of implementation reactions starting from S which implements the intermediate trajectory by which S r = ⇒.This means that one of those reactions r has m 2 (r ) = r , some of them interpret via m 2 to various intermediate reactions in that pathway which are trivial under m 1 , and the rest of which are trivial under m 2 .An implementation reaction trivial under m 2 is trivial under m, as is a reaction which interprets under m 2 to an intermediate reaction trivial under m 1 , thus all reactions in this pathway except r are trivial under m, so when viewed under m, S r Partial order).The following relation is a partial order: (S , R ) (S, R) if there exists an m : S → N S which satisfies the atomic, delimiting, and permissive conditions with equality defined as (S , R ) ≡ (S, R) if there exists a bijection n : S → S such that (n(S ), n(R )) = (S, R) where n is extended naturally to sets and reactions.Proof.A partial order must be transitive, reflexive, anti-symmetric.Transitivity (if a ≤ b and b ≤ c then a ≤ c) follows immediately from Lemma 3.1.Reflexivity (a ≤ a) is obvious by letting m be the identity function.It remains to show antisymmetry (if a ≤ b and b ≤ a, then a = b), i.e. that given (S 1 , R 1 ) and (S 2 , R 2 ) with m 1 : S 1 → N S 2 and m 2 : S 2 → N S 1 that each satisfy the atomic, delimiting, and permissive conditions, (S 1 , R 1 ) and (S 2 , R 2 ) are identical up to a change of species names.The atomic condition implies that |S 1 | ≤ |S 2 | and |S 2 | ≤ |S 1 |, thus the numbers of species are equal and in particular m 1 is a bijection from species in S 1 to sets of exactly one species in S 2 (and the same is true for m 2 ).To simplify notation, we let n(x) = y if m 1 (x) = {| y|}; n must be a bijection from S 1 to S 2 .(If the CRN has sufficient symmetry, it is not necessarily true that m 2 (n(x)) = {|x|}, for example if both CRNs are {A → C , B → C } we could have m 2 (n(A)) = {|B|}.)

Fig. 7 .
Fig. 7.An implementation CRN that satisfies the modularity condition.Circles represent implementation species with interpretation given below the line, and boxes with arrows represent reactions, with reactants on one side of the box and products on the other; boxes where arrows point both ways are reversible reactions.Here the top CRN (S 1 , R 1 ) is a correct implementation of the formal reaction A + B → C + D, and the bottom CRN (S 2 , R 2 ) a correct implementation of C + A → B + D. Green arrows (which may appear as light gray in a black-and-white print) indicate reactions used to satisfy the modularity condition in Definition 3.3; for example, i 1:2

Fig. 8 .
Fig. 8. Example set of minimal implementation states for the reaction A + B → C .Large circles represent minimal states; circles within minimal states represent implementation species, with their interpretation in formal species given below the line.Assuming the 7 implementation species shown are the only species whose interpretation contains either an A or a B, any implementation state whose interpretation contains A + B must contain one of the 10 minimal states shown.The set of minimal states does not depend on the implementation reactions, so no reactions are shown.

Lemma 4 . 1 . 2 Lemma 4 . 2 .
Let (S, R) and (S , R ) be a formal and implementation CRN with interpretation m.The interpretation m satisfies the permissive condition if and only if, for every formal reaction r and every S ∈ M(r), S r = ⇒.Proof.Clearly, if the permissive condition is satisfied, then for any formal reaction r and implementation state S ∈ M(r), m(S ) r − → so by the permissive condition, S r = ⇒.For the converse, assume for every r ∈ R and S ∈ M(r) that S r = ⇒, and consider an arbitrary formal reaction r = (R, P ) and implementation state S with m(S ) r − →, i.e. m(S ) ≥ R.There is at least one minimal S ∈ M(r) with S ≥ S , and by assumption S r = ⇒.Since S ≥ S , the same sequence of reactions can occur in S , thus S r = ⇒.Since this is true for every r and S , the permissive condition is satisfied.Let (S, R) and (S , R ) be a formal and implementation CRN with interpretation m, and r = (R, P ) ∈ R a formal reaction.Let n = |S | be the number of implementation species and k = |R| the number of reactants of r.

Fig. 9 .
Fig. 9. Minimal state S 0 = {|x A , x B |} can implement A + B → C by producing and then consuming a null species z.Because z must be produced from x B , minimal state S 1 = {|x A , y B |} cannot implement A + B → C .

Fig. 10 .
Fig.10.Example graphs of minimal states.We draw an arrow from a state S 1 to a state S 2 if there is a trivial reaction that can occur in S 1 (plus some null

S 1 r
= ⇒ and therefore S 0 r = ⇒ by reaching S and then following the same path by which S 1 r = ⇒, or S 1 r = ⇒ and the permissive condition is false regardless of whether S 0 r = ⇒.Since m(S 1 ) < m(S 0 ), the path by which S 1 r = ⇒ cannot pass through S 0 , so reducing the question of S 0 r = ⇒ to S 1 r

Fig. 11 .
Fig. 11.Example of the loopsearch graph for the formal reaction A + B → C with implementation CRN from Fig. 10B.The graph of minimal states from Fig.10Bis reproduced at bottom right, with each minimal state given a name S i .In the loopsearch graph, initial vertices (S i , S i , 0) are filled in green (which may appear as light gray in a black-and-white print), and terminal vertices represented by doubled circles.Vertices and edges not reachable from any initial vertex are grayed out, as they are not relevant to the theory or algorithms that follow.The permissive condition is true for A + B → C if and only if for each initial vertex, there is a path in the loopsearch graph to some terminal vertex.One such path is given by the numbered vertices 0 through 4 from initial vertex ({| y A + x B |} , {| y A + x B |} , 0) to terminal vertex ({| y A + y B |} , {| y A + y B |} , ∞z); observe that each of the other four initial vertices can also reach a terminal vertex, so the permissive condition is true for this interpretation.
bottom right to the remainder of Fig. 11.Vertices of the form (S , S , ζ ) for fixed ζ with ζ −1 (1) = ∅, with edges from "Reactions outside a loop" in Definition 4.2, have exactly the structure of the graph of minimal states, with "greyed" edges (representing reactions that have null species as reactants, see Fig. 10B, Fig. 11 bottom right) present or absent depending on whether the null species z that are reactants of the corresponding reaction have ζ(z) = ∞.Vertices of the form (S , S 0 , ζ ) for fixed S 0 and ζ , with ζ 1 = ζ , each new ζ 2 equaling ζ 1 except that any null species z produced in the corresponding reaction with ζ 1 (z) = 0 has ζ 2 (z) = 1 ("Reaction inside a loop" in Definition 4.2), and the last ζ 2 = ζ has ζ (z) = ∞ if any ζ (z) = ∞ or 1 or if z was produced by the last reaction (ζ (z) = 0 otherwise), ending in the state (S 2 , S 2 , ζ ) ("Finishing a loop").Since at least one such z a special case of this proof, and since by Definition 4.2 that (S f , S f , ζ f ) is a terminal state means S f + μZ f r − → for some μ ≥ 0, this proves that S 0 r = ⇒.If such paths exist for every formal reaction r and minimal state S 0 for r, then every minimal state S 0 r = ⇒.Thus as discussed in Lemma 4.1 every state with m(S ) r − → has S r = ⇒, satisfying the permissive condition.2 (i) If S i + ∞Y i τ − → S j and S j r = ⇒, then S i r = ⇒.Otherwise, (ii) If S i + ∞Y i τ − → S j and S j τ = ⇒ S k , then S i τ = ⇒ S k .(iii) If S i + ∞Y i τ − → S j + y and S j τ = ⇒ S i , then S i τ = ⇒ S i + y. (iv) If S i + ∞Y i τ − → S j and S j τ = ⇒ S j + y and S j τ = ⇒ S i , then S i τ = ⇒ S i + y.

Fig.
Fig.12.An implementation CRN with a correct interpretation if and only if the corresponding space-bounded Turing machine accepts.The formal CRN has one species Q corresponding to the Turing machine head and one species A i for each ith tape square; if all are present, they can react.The implementation

Fig. 13 .
Fig. 13.A pictorial illustration of the search tree explored by the reactionsearch algorithm for the given pair of formal and implementation CRNs.Doublelined boxes indicate the new constraints on the partial interpretation at each node of the tree, where x ≡ A is shorthand for m(x) = A. Rounded boxes indicate the new constraints on the interpretation of reactions, where r ≡ r is shorthand for requiring that m(r ) = r.The dashed box indicates the Diophantine equation set up by the trivial reaction solver upon the first execution where it can successfully find a solution.The blue dotted boxes illustrate the table of which implementation reactions may be interpreted as which formal reactions, for the given node in the tree.

Fig. 14 .
Fig. 14.Example formal and implementation CRN corresponding to an instance of the 3-SAT problem.Top left: example 3-SAT instance.Top middle: the formal CRN.Eventual interpretations of the corresponding implementation reactions are given in parentheses.Top right: the implementation CRN corresponding to this 3-SAT instance.Each 3-SAT clause has a corresponding implementation reaction, with auxiliary implementation reactions added.Bottom left: a satisfying assignment for the 3-SAT formula.Bottom right: a valid CRN bisimulation interpretation for the two CRNs, which the reactionsearch algorithm would find.Note the correspondence between satisfying assignment and interpretation: if x i is true then m(x t i ) = T and m(x f i ) = ∅, otherwise (i) If x + ∞Z x τ − → Y where every species in Y is finished, then x is finished.(ii) If x + ∞Z x τ − → x 1 + X where ∅ < m(x 1 ) < m(x), then x is finished.(iii) If x + ∞Z x τ − → x + Z where m(x ) = m(x) (implying m(Z ) = ∅),then x can reach x and x can reach any species that x can reach.(iv) If x + ∞Z x τ − → x + Z where m(x ) = m(x) and x can reach x, then x can produce any species z ∈ (Z ∪ Z x ).
x τ = ⇒ ∅ + x, as sequences of 0 reactions fulfill x τ = ⇒ Y + Z for Y ⊂ S 0 and m(Z ) ∩ S 0 = ∅.The deductions in general follow from that, if x τ = ⇒ x + Z x and x + ∞Z x τ − → S , then x τ = ⇒ S .The only non-obvious one is that if x τ = ⇒ Y where every species in Y is finished, then x is finished; this follows from induction on the order in which the algorithm marks species as finished: if every species x marked as finished before x satisfies one of the two x τ = ⇒ . . .conditions in the definition of finished, and x τ = ⇒ Y made up of only those species, then x satisfies one of the two conditions.Then we need to prove that if every x ∈ S is finished then every x τ = ⇒ Y + Z as desired: the proof is by induction on |m(x)|, proving that if x τ = ⇒ Y + Z for every x with m(x ) < m(x) and x is finished then x τ = ⇒ Y + Z .Recall that "x is finished" is defined as, either x τ = ⇒ Y + Z for Y ⊂ S 0 and m(Z ) ∩ S 0 = ∅, or x τ

Q 1 and 3 . 2 (
for some P , P α = ⇒ P and P ↔ Q where, as in Section 3.1, for α = τ .(Our notation is slightly different from Milner's: we use α = ⇒ to mean the same thing as Milner's α = ⇒.)Note the similarity between Definitions 5.III).
which is formal, while x A i A and t C D → x C + x D + w 2 are trivial.The permissive condition says that for every formal reaction and for every implementation state in which that reaction should be able to happen, it can.There is one formal reaction, A + B → C + D, and any state in which it should be able to happen must contain an x B and either an x A or i A , since those are the only species whose interpretations contain either B and/or A. If the state contains x B and i A , then the reaction i A + x B → t C D + w 1 can happen and satisfies the permissive condition.If the state contains x B and x A , then the trivial reaction x A → i A followed by i A + x B → t C D + w 1 satisfies the permissive condition.
, for example with S 0 = {|x A B |} and S 1 = {|x A + x B |}.All minimal states in that CRN can in fact implement A + B → C , but doing so for e.g.{|x A + x B |} requires a "loop" through {| y A + x B |} and {|x A B |} to {|x A + x B + 2z|}, eventually producing enough z for the reactions x B + 3z → y B and x A + y B + 2z → x C , which is interpreted as A + B → C .Since any state with z is a non-minimal state, that path cannot be found by searching through only the graph of minimal states.
1 , S 1 , ζ ) where ζ −1 (1) = ∅, there is a path either to a terminal state (S 2 , S 2 , ζ ) for the same ζ , or to another such state (S 2 , S 2 , ζ ) where 1 , S 1 , ζ ) to (S 2 , S 2 , ζ ).For each reaction after the first state ≥ S 2 , there is an edge from (S 1 , S 2 , ζ 1 ) to (S 2 , S 2 , ζ 2 ), with the first To search for a path of length 2 0 = 1, we check each possible edge (trivial reaction) to see if the start and target state are connected; to search for a path of length 2 i+1 , we check for each possible intermediate state, whether a path of length 2 i exists from the start to the intermediate, and whether a path of length 2 i exists from the intermediate to the target.For a segment of the first type, the possible intermediate states are This condition, which reduces the number of ζ to check, relies on a monotonicity of ζ(z) that follows from the types of edges defined in Definition 4.2.
[32] , S i , ζ i ) to (S i+1 , S i+1 , ζ i ) through only states of the form (S , S , ζ i ), and the other type from (S i , S i , ζ i−1 ) to (S i , S i , ζ i ) through only states of the form (S , S i , ζ ), the same decomposition discussed in the proof of Lemma 4.3.(In the preliminary version of this paper we called such decomposition a loop-segmented path[32].)Bothtypes of segments can have length no longer than |M(r)|; for the first type, this is obvious, while for the second, we rely on the proof of Lemma 4.3 to say that a decomposition exists where no segment covers the same minimal state twice.just(S , S , ζ i ) for S ∈ M(r), while for a segment of the second type, the possible intermediate states are (S , S i , ζ ) where S ∈ M(r) and for all z ∈ Z,ζ i−1 (z) ≤ ζ(z) ≤ ζ i (z).

Algorithm 1 :
The loopsearch algorithm to check the permissive condition in polynomial space.