Existence of a unique quasi-stationary distribution in stochastic reaction networks

In the setting of stochastic dynamical systems that eventually go extinct, the quasi-stationary distributions are useful to understand the long-term behavior of a system before evanescence. For a broad class of applicable continuous-time Markov processes on countably inﬁnite state spaces, known as reaction networks, we introduce the inferred notion of absorbing and endorsed sets, and obtain sufﬁcient conditions for the existence and uniqueness of a quasi-stationary distribution within each such endorsed set. In particular, we obtain sufﬁcient conditions for the existence of a globally attracting quasi-stationary distribution in the space of probability measures on the set of endorsed states. Furthermore, under these conditions, the convergence from any initial distribution to the quasi-stationary distribution is exponential in the total variation norm.


Introduction
We may think of reaction networks in generality as a natural framework for representing systems of transformational interactions of entities [48].The set of entities (species) may in principle be of any nature, and specifying not just which ones interact (stoichiometry and reactions) but also quantifying how frequent they interact (kinetics), we obtain the dynamical system of a reaction network.Examples abound in biochemistry, where the language originated, however the true power of this approach is the ability to model diverse processes such as found in biological [4,6], medical [3], social [14], computational [12], economical [49], ecological [43] or epidemiological [36] contexts.
Whether the universe is inherently deterministic or stochastic in nature, the lack of complete information in complex systems inevitably introduces some degree of stochasticity.Thus, a stochastic description is not an alternative to the deterministic approach, but a more complete one [41].Indeed, the deterministic model solution is an approximation of the solution for the stochastic model, improving with the system size, and in general only remaining valid on finite time intervals [28].Thus, the long-term behavior of a given reaction network may depend crucially on whether it is modeled deterministically or stochastically [22].In particular, the possibility of extinction, which is a widely occurring phenomenon in nature, may sometimes only be captured by the latter [25].As 1 arXiv:1808.06833v1[math.PR] 21 Aug 2018 a consequence, the counterpart to a stable stationary solution in the deterministically modeled system is not generally a stationary distribution of the corresponding stochastic model.Instead, a so-called quasi-stationary distribution, which is a stationary measure when conditioned on the process not going extinct, has shown to be the natural object of study.A concise overview of the history and current state of this field can be found in [46], while [40] contains a comprehensive bibliography on quasi-stationary distributions and related work.
From a modeling standpoint, when the copy-numbers of interacting entities are low and reaction rates are slow, it is important to recognize that the individual reaction steps occur discretely and are separated by time intervals of random length [4].This is for example the case at the cellular level [16], where stochastic effects resulting from these small numbers may be physiologically significant [12].Furthermore, stochastic variations inherent to the system may in general be beneficial for identifying system parameters [35].The quasi-stationary distribution possesses several desirable properties in this domain.Most importantly, if the system under study has been running for a long time, and if the only available knowledge about the system is that it has not reached extinction, then we can conclude that the quasi-stationary distribution, if it exists and is unique, is the likely distribution of the state variable [36].Consider a right-continuous time-homogenous Markov process (X t : t ≥ 0) [42], that evolves in a domain D ⊆ R d , wherein there is a set of absorbing states, a "trap", A ⊂ D. The process is absorbed, also referred to as being killed, when it hits the set of absorbing states, implying X t ∈ A for all t ≥ τ A , where τ A = inf{t ≥ 0 : X t ∈ A} is the hitting time of A. As we are interested in the process before reaching A, there is no loss of generality in assuming X t = X t∧τ A .We refer to the complement, E := D\A, as the set of endorsed states.For any probability distribution, µ, on E, we let P µ and E µ be the probability and expectation respectively, associated with the process (X t : t ≥ 0), initially distributed with respect to µ.For any x ∈ E, we let P x = P δx and E x = E δx .Under suitable conditions, the process hits the absorbing set almost surely (a.s.), that is P x (τ A < ∞) = 1 for all x ∈ E, and we investigate the behavior of the process before being absorbed [11].
Definition 1.1.A probability measure ν on E is called a quasi-stationary distribution (QSD) for the process (X t : t ≥ 0) absorbed at A, if for every measurable set B ⊆ E or equivalently, if there exists a probability measure µ on E such that in which case we also say that ν is a quasi-limiting distribution.
We refer to [30] for a proof of the equivalence of quasi-limiting and quasi-stationary distributions.Existence and uniqueness of a QSD on a finite state space is well known [11,Chapter 3], and it is given by the normalized left Perron-Frobenius eigenvector of the transition rates matrix restricted to E. For the infinite dimensional case, most work has been carried out for birth-death processes in one dimension [46], where classification results yielding information about the set of QSDs exist [44].
In the present paper, we will focus on a special case of multidimensional processes on countable infinite state spaces which can be viewed as reaction networks.We will prove as the main result in Theorem 5.1 and Corollary 5.2 sufficient conditions for the existence of a unique globally attracting QSD in the space of probability distributions on E, equipped with the total variation norm, • T V .Recall that this norm may be defined as [39] Thus, informally, the metric associated to this norm is the largest possible difference between the probabilities that two probability distributions can assign to the same event.
Our result is based on the following recent result [7, Theorem 2.1].

Theorem 1.2. The following are equivalent
• There exists a probability measure ν on E and two constants C, γ > 0 such that, for all initial distributions µ on E, • There exists a probability measure ν on E such that (A1) there exists t 0 , c 1 > 0 such that for all x ∈ E, (A2) there exists c 2 > 0 such that for all x ∈ E and t ≥ 0, Now, using Foster-Lyapunov theory [31,32], a series of assumptions on the process (X t : t ≥ 0) has been shown to be sufficient for (A1) and (A2) to hold [8].This approach has been applied to a particular case of multidimensional birth-death processes, giving sufficient conditions, in terms of the parameters of the process, for the existence and uniqueness of a QSD.Here, we extend this result, not just to a larger set of parameter values in the birth-death process case, but to the much broader class of stochastic processes known as stochastic reaction networks.
The outset of the paper is as follows.In section 2, we introduce the setup and notation of reaction network theory, and define the central inferred notions of endorsed and absorbing states for this class of processes.Section 3 contains the terminology and main assumptions that we shall use throughout the paper.We then move on in section 4, to prove that the processes associated with stochastic reaction networks do indeed satisfy all the required assumptions made by [8,Corollary 2.8].Section 5 contains the main result, Theorem 5.1.Finally, we give some examples in section 6, illustrating the applicability of the results.A reaction network is a triple N = (S, C, R), where S is a finite ordered set of species1 , C is a finite set of complexes, consisting of linear combinations over N 0 of the species, and R ⊂ C × C is an irreflexive relation on C, referred to as the set of reactions [2,18,21].Furthermore, R is assumed to be ordered.

Reaction Network Setup
We define the dimension of the reaction network, d = |S|.Any species S i ∈ S can be identified with the unit vector e i ∈ N d 0 , thus any complex y ∈ C can be identified with a vector in N d 0 .It is customary to denote an element (y k , y k ) ∈ R by y k → y k ∈ R in which case we refer to y k as the source complex and to y k as the product complex of reaction k.We may thus write R = {y k → y k : k = 1, . . ., r}.Employing a standard, although slight abuse of, notation, we identify S = {S 1 , . . ., S d } with the set {1, . . ., d} and R with {1, . . ., r}.We write the k'th reaction with the notation where y ki = (y k ) i and y ki = (y k ) i are the stoichiometric coefficients associated with the source and product complexes of reaction k, respectively.Define the reaction vectors ξ k = y k − y k and the stoichiometric matrix The order of reaction k is the sum of the stoichiometric coefficients of the source complex, i∈S y ki .Finally, we define the maximum of a vector over the set R, x = max k∈R y k , as the entry-wise maximum, x i = max k∈R y ki .
A set of reactions R induces a set of complexes and a set of species, namely the complexes and species that appear in the reactions.We will assume that a reaction network is always given in this way by R, and one may then completely describe a reaction network in terms of its reaction graph, whose nodes are the complexes and whose directed edges are the reactions.This concise description will be employed in the rest of the paper.To avoid trivialities, we assume R = ∅.

For each reaction we specify an intensity function λ
, which satisfies the stoichiometric admissibility condition: where we use the usual vector inequality notation; x ≥ y if x i ≥ y i for all i ∈ S. Thus, reactions are only allowed to take place whenever the copy-numbers of each species in the current state is at least as great as those of the corresponding source complex.A widely used example is stochastic mass action kinetics given by for some reaction rate constants α k > 0 [2].The idea is that the rate is proportional to the number of distinct subsets of the molecules present that can form the input of the reaction.It reflects the assumption that the system is well-stirred [2].Other examples include power law kinetics or generalized mass action kinetics [1,24,34].A particular choice of such rate functions constitute a stochastic kinetics λ = (λ 1 , . . ., λ r ) for the reaction network N , and the pair (N , λ) is referred to as a stochastic reaction system, or simply a reaction network with kinetics λ.
We may then specify the stochastic process (X t : t ≥ 0) on the state space D := N d 0 related to the reaction system (N , λ).Let X t be the vector in N d 0 whose entries are the species counts at time t.If reaction y k → y k occurs at time t, then the new state is X t = X t− + y k − y k = X t− + ξ k , where X t− denotes the previous state.The stochastic process then follows, where Y k are independent and identically distributed unit-rate Poisson processes [2,17,37].This stochastic equation is referred to as a random time change representation.We assume throughout the paper that the process is non-explosive, so that the process is well defined.Assumption 2, though, will imply non-explosiveness.

The State Space
To define the set of endorsed states and absorbing states in the setting of stochastic reaction networks, we recall some terminology from stochastic processes.We say that there is a path from x to y, denoted x → y, if there exists t ≥ 0 such that P x (X t = y) > 0.
We extend this notion to sets as follows; B 1 → B 2 if there exists x ∈ B 1 and y ∈ B 2 such that x → y.Finally, we introduce the region of large copy numbers, where all reactions may take place, defined as Any network satisfies R = ∅.Indeed, by the stoichiometric compatibility condition, x → R}, we may decompose the state space into a disjoint union A state space D is irreducible if for all x, y ∈ D we have P x (X t 1 = y) > 0 and P y (X t 2 = x) > 0 for some t 1 , t 2 > 0 [22].Thus, D is irreducible if for all x, y ∈ D there exists a path x → y.Irreducibility induces a class structure on the state space [37], and we denote the classes by I 1 , I 2 , . . .(potentially infinitely many).Let I denote the set of irreducible classes.Obviously, either Lemma 2.1.The pair (I , ), where is given by is a well defined poset.The irreflexive kernel (I , ≺) gives a well defined strict poset.
Proof.Since all elements I ∈ I are irreducible, there exists a path between any two points in I hence I I yielding the relation reflexive.Suppose I i I j and I j I i for some i, j ≥ 1.Let x ∈ I i and y ∈ I j be given.By assumption, we may find a path from x to some z 2 ∈ I j , and by irreducibility of I j there is a path from z 2 to y.Similarly, we may by assumption find a path from y to some z 1 ∈ I i and by irreducibility of I i a path from z 1 to x.As x, y were arbitrary, we conclude that there exists a path between any two points in I i ∪ I j hence I i = I j , yielding the relation antisymmetric.
Finally, suppose I k I j and I j I i for some i, j, k ≥ 1.Then there exists a path from some x ∈ I i to some z 1 ∈ I j and a path from some z 2 ∈ I j to some y ∈ I k .By irreducibility of I j there is a path from z 1 to z 2 , and concatenation of the three paths yield one from x to y.We conclude that I k I i , hence the relation is transitive.
A similar ordering has been considered in [45].However, their further analysis rests on the setting of discrete time, rendering the approach insufficient for stochastic reaction networks.To exploit the graphical structure induced by , define the marked directed acyclic graph D = (I , E ) as follows.The set of directed edges is while the marking I = I A I E is given by The corresponding state space is defined by As D E is non-empty, the existence of at least one endorsed set is guaranteed.By construction, the endorsed sets are disjoint, their union is D E and their number, N E , may in general be countable infinite.Furthermore, any absorbing set is confined to a subset of {x ∈ N d 0 | x M }, lying "close" to the boundary of N d 0 .If extinction is possible from an endorsed class E n then this absorption will take place in A n .Note, however, that the set I A may in general be empty, in which case no absorbing set exist.To further illuminate the structure of the endorsed sets, we provide the following classification result.Proof.Suppose first that rank Ξ < d.Let x ∈ D E .Then x ∈ E 1 , say and there exists a y ∈ D E such that y / ∈ (x + span R Ξ).In particular, y ∈ E 2 where E 1 = E 2 .This procedure can be repeated indefinitely, yielding infinitely many endorsed sets.Now, suppose rank Ξ = d.Then one may choose a linear combination of the reaction vectors yielding a strictly positive point, Let a = k∈R |a k | and a 0 = 0. Define the sequence (w ) =1,...,a by As the reaction vectors are finite, the partial sums P j = j =1 w are finite for each j ≤ a.Let m = min i∈S,j≤a Choosing each coordinate M i > |m| + max k∈R y k for each i ∈ S, it follows that any point x ∈ D E with x ≥ M satisfies x + P j ∈ R for any j ≤ a.We say that a sequence of states (x 1 , . . ., x n ) is an undirected walk from . As all reactions may occur in R, we conclude that x has an undirected walk to the point Thus, by definition, x and x belong to the same endorsed set, say E 1 .To determine the number of endorsed sets, let B = (b i ) denote the basis matrix of the free Z-module generated by the stoichiometric matrix Ξ, and define the lattice generated by Ξ to be By the construction in the previous paragraph, it follows that all points in the region {y ∈ E 1 : y ≥ M } belong to the same translated lattice x + L(Ξ).By assumption, span R Ξ = R d hence the lattice L(Ξ) has rank d.The number of ways one may translate a rank d lattice in R d to an integer lattice point without any points intersecting is given by considering the number of integer lattice points inside the fundamental parallelotope, Indeed, as P(Ξ) tiles R d , that is for any point z ∈ R d there exists a unique z ∈ L(Ξ) such that z ∈ z + P(Ξ), the problem is reduced to a single fundamental parallelotope, which by definition contains exactly one point from each translated lattice [13].Further, the number of integer lattice points inside P(Ξ) is exactly equal to the volume of the parallelotope [10, p. 97], hence, by finiteness of the reaction vectors, Finally, if span Z Ξ = Z d then L(Ξ) = Z d and the unit vectors e 1 , . . ., e d ∈ L(Ξ).As P(I d ) ∩ L(Ξ) = {0} we conclude from [13] that e 1 , . . ., e d is a basis for L(Ξ) hence N E (M ) = 1 as desired.
Note that, in particular, a reaction network whose associated stochastic process is a birth-death process, that is, a process where for each i = 1, . . ., d either e i ∈ R or −e i ∈ R, have a single endorsed set for x sufficiently large.In practice, one may find the endorsed sets by picking x ∈ R and adding states by a backtracking algorithm [38].Verification of span Z Ξ = Z d can be done by calculation of the Hermite normal form [38].
One may suspect that Proposition 2.3 could be strengthened to hold on the entire set D E .This is only partially true.Consider as an example the three-dimensional reaction network given by the reaction graph , and each singleton {(1, m, 1)}, m ≥ 1, constitutes its own endorsed set.Thus, in the generic picture, close to the absorbing set, there may be infinitely many endorsed sets.We do, however, have the following corollary.Proof.We only need to prove that if rank Ξ = d then there are finitely many endorsed sets.For this, it suffices to prove that at most finitely many x ∈ D E do not have an undirected walk (as introduced in the proof of Proposition 2.3) to a point z ≥ M .Indeed, by the proof of Proposition 2.3, if such a path exists, then by definition x belongs to one of finitely many endorsed sets.With the remaining set being finite, the total number of endorsed sets is therefore finite.Now, let x ∈ D E be given.By the definition of endorsed sets, there exists a path x → y with λ k (y) > 0 for all k ∈ R. As rank Ξ = d, for each 1 ≤ j ≤ d, there exists k(j) ∈ R such that e j , ξ k(j) = 0. Keeping the jth coordinate fixed and increasing the possible other if necessary, thus arriving at a point y with y j = y j and y i ≤ y i for i = j, by the stoichiometric compatibility condition we may by repeated use of reaction k(j) find a path y → z or z → y with z ≥ M .Repeating the argument for the possible remaining coordinate, we conclude that there exists an M ∈ N d such that if x ≮ M then x has an undirected walk to a point z ≥ M .As the set {x ∈ D E | x < M } is finite, this concludes the proof.
One may easily verify, that the second part of Proposition 2.3 can also be extended for d = 1.Indeed, for any point x ∈ D E there exists a reaction k ∈ R such that λ k (x) > 0 and either x + ξ k > x or x + ξ k < x.Otherwise x ∈ D A .Consequently, there is a point z ≥ M for any M ∈ D E such that either x → z or z → x.However, note that the network in Figure 1 shows that this result does not hold in the case d = 2.
If there is more than one irreducible class in E n , then we need that there is a smallest one to ensure uniqueness of a QSD.
Assumption 1.For a given endorsed class E n , n ≥ 1, we assume: We shall see that Assumption 1(i) is equivalent to a more technical property of the state space, which is necessary for our results to hold.Thus no generality is lost in having Assumption 1(i).
Networks without any minimal class exists, for example ∅ → S 1 , which does not have an absorbing set either.Furthermore, networks with more than one minimal class also exist, for example, S 1 + S 2 → ∅, S 2 → ∅.Thus Assumption 1(i) is indeed not superfluous.We believe Assumption 1(ii) is always met if Assumption 1(i) is.It ensures that one may always reach the absorbing set, if it is non-empty.Definition 2.2 accommodates the general case where uniqueness of a QSD does not necessarily hold, in which case the support of the QSD may stretch the entire endorsed set rather than, as we shall see, the unique minimal irreducible class.We remark that rather than investigating an entire endorsed set, one may be interested in a particular irreducible component, say I. Letting the state space be D = E A where the theory to be developed in this paper applies to this case as well.
As an illuminating example consider the generalized death process mS 1 → ∅ with m ∈ N, where each point in the state space D = N 0 constitutes its own irreducible class.Here, the endorsed and absorbing sets are E n = {n + pm − 1 | p ∈ N} and A n = {n − 1} respectively, for n = 1, . . ., m, and Assumption 1 is satisfied for all n.Thus, D E = {m, m + 1, . . .} and D A = {0, . . ., m − 1}.It is known that in the simple death case, m = 1, uniqueness does not hold on D E = N.Indeed, there is a continuum of QSDs with support larger than {1}, the unique minimal class [19].In the setting of birth-death processes in one dimension, which has an infinite state space, it is known that there will be either none, a unique or a continuum of QSDs [44].Consider the two reaction networks endowed with mass action kinetics.In both cases we conclude, according to Definition 2.2, that the set of absorbing states and the set of endorsed states are respectively.For the network on the left in (2), assuming α 1 > α 2 , there is a continuum of QSDs on E 1 , while for the network on the right, there is a unique QSD on E 1 , for all parameter values [30].This fits well with our result -the necessity of having reactions of order higher than one to ensure uniqueness permeates to higher dimensions.

Extension of Arguments
In this and the following sections, we shall simply use the notation E to refer to a single endorsed set, with corresponding non-empty absorbing set A, when there is no ambiguity.Further, as existence and uniqueness is known on finite state spaces, we shall assume without loss of generality that E is countably infinite.We make the following definitions inspired by [8] and [31].
Definition 3.1.For any vector v ∈ N d , we define a corresponding function v, • : given by the standard inner product This function may in general take negative values, however, when restricting v, • to E, one obtains a norm-like function, [31].Choosing v = (1, . . ., 1) we recover the function used in [8].In general, we shall choose v ∈ N d based on the particular reaction network at hand, and will in the following consider it fixed.For n ∈ N, define the sets respectively.Note that all of these are stopping times and might be infinite.As we will be concerned with the application of unbounded functions serving the purpose of a Lyapunov function, we introduce the weakened generator, L, for the Markov process [31,7].
Definition 3.2.A measurable function W : D → R belongs to the domain D(L) of the weakened generator L of (X t : t ≥ 0) if there exists a measurable function U : E → R such that, for all n ∈ N, t ≥ 0 and x ∈ E and and we define LW = U on E and LW ≡ 0 on A.
As the state space of interest is always countable, all functions f : D → R are measurable.Moreover, as the state space is discrete and O n is finite for all n ∈ N, all functions f : D → R are in the domain of the weakened generator, D(L) [31].In particular, E x W (X t∧Tn ) is well-defined and finite.Generally, the weakened and the infinitesimal generator need not agree, and the infinitesimal generator may not exist [31].
However, if f is bounded, then they do agree.In particular, E x |f (X t )| < ∞ and it follows that as t → 0, Hence E x f (X t ) is differentiable and from the fundamental theorem of calculus we conclude that the weakened generator coincides with the (weak) infinitesimal generator [31], Moreover, setting W (x) = v, x as in Definition 3.1, it follows from the Poisson characterization of the process (1), that Note that (3) is fulfilled as O n is finite.
All networks, for which extinction is possible, have the property that there exists a v ∈ N d such that v, ξ k ≤ 0 for some reaction k ∈ R. Indeed, suppose that v, ξ k > 0 for all v ∈ N d and k ∈ R. Then ξ k ∈ N d 0 for all k ∈ R, hence x + ξ k ≥ x for any x ∈ D. In particular, if x ∈ E then x + ξ k ∈ E and we conclude, from the observation that any absorbing set is confined to a subset of {x ∈ N d 0 | x M } for some M sufficiently large, that the process is not absorbed.By contraposition the desired claim holds.Note that for fixed n ∈ N, the set {x ∈ E, v, x = n} might be empty, thus we define and make the following central assumption.
Assumption 2. There exists v ∈ N d and η > 0, N ∈ N, such that, for n ≥ N , and, with the limit being taken over N ⊆ N, We note that, as d v (n) is always non-negative, this assumption assures that d v (n) is non-negative for n sufficiently large.We shall see that this assumption further ensures the ability to "come down from infinity" in finite time.In the case where there is no absorbing set, the empty sum in Definition 3.3 yields d v (n) = 0, and we may reformulate a result of [22] (see below).In this paper, we will extend the result to the case where E is not necessarily irreducible, but satisfies Assumption 1(i), and where there may exist a non-empty absorbing set of states (see Theorem 5.3).Theorem 3.4.For a reaction network satisfying Assumption 2, with A = ∅ and E irreducible, the associated stochastic process (X t : t ≥ 0) is exponentially ergodic and thus admits a unique stationary distribution π.Further, there exist constants C, γ > 0 such that, for all probability measures µ on E, Proof.For x ∈ D, let n = v, x .Thus, by Assumption 2(ii), for any constant c > 0, for n larger than some N ∈ N. The set of x ∈ D such that v, x = n ≤ N is compact, hence there is c 1 > 0 such that We conclude that for all x ∈ D, hence from [22,Proposition 4] it follows, due to irreducibility of E, that there exist constants C, γ > 0 such that for all x 0 ∈ E, for all t ≥ 0. Finally, if we consider the random starting point X 0 ∼ µ, we find as required.
It is sufficient to have η = 0 for Theorem 3.4 to hold.However, as we shall see, if A = ∅ then η > 0 is required.The intuitive meaning is that the quasi-stationary distribution exists on the long-time, but not infinite time horizon, where the process will be absorbed.Thus if the process does not "come down from infinity in finite time", that is if η = 0, starting close to A will almost surely result in absorption while starting at "infinity" will not, contradicting uniqueness of the QSD.When no absorbing set exist, however, the quasi-stationary distribution reduces to the stationary distribution which exists on the infinite time horizon.

Verifying Assumptions
We start by introducing some notation and definitions from [8] for ease of reference.Definition 4.1.A couple (V, ϕ) of measurable functions V and ϕ from D = E ∪ A to R is an admissible couple of functions if (i) V and ϕ are bounded and nonnegative on D, positive on E, satisfy V (x) = ϕ(x) = 0 for all x ∈ A, and further (ii) For all sequences (x p ) p≥1 in E such that {p ∈ N : and lim n→∞ V (X Tn ) = 0 P x -a.s. for all x ∈ E.
(iii) LV is bounded from above and Lϕ is bounded from below.
The definition of a couple of admissible functions in [8] further requires that V and ϕ belong to the domain of the weakened infinitesimal generator of (X t : t ≥ 0).However, since any function f : N d 0 → R is in this domain for discrete state spaces [31], the requirement is automatically satisfied.Furthermore, as V, ϕ are bounded, the infinitesimal generator L is defined hereon and agrees with the weakened generator L.
The question of extinction has recently attracted much attention on its own [26].Therefore, we provide the following proposition which renders an explicit criterion for when the stochastic process associated to a stochastic reaction network goes extinct almost surely.The assumption in the proposition is weaker than Assumption 2. Proposition 4.2.Under Assumption 1, with A = ∅, the process (X t : t ≥ 0) is absorbed for n sufficiently large, where Proof.Define the norm-like function W (x) = v, x on N d 0 as in Definition 3.1, and let L be the weakened infinitesimal generator of (X t : t ≥ 0).It follows from ( 4) and the assumption that n for n sufficiently large that for each x ∈ E with v, x = n, for n sufficiently large.In particular, there exists an N ∈ N such that for n = v, x ≥ N , we have LW (x) < 0, hence, setting M = max x∈E : 1≤ v,x ≤N {0, LW (x)}, yields x ∈ E.
Since O N is compact, we may apply [31, Theorem 3.1] to conclude that the process (X t : t ≥ 0) is non-evanescent, that is, Define the discrete time jump chain (Y n : n ∈ N 0 ) by Y n = X Jn , where J 0 , J 1 , . . .denote the jump times of (X t : t ≥ 0) given by By [15, Theorem 2.3] we get with y = Y 0 = X 0 = x, for any m ∈ N. Now, the complement of the event As B m is an increasing sequence of events in m, we obtain by monotone convergence and ( 7) that Thus, (Y n : n ∈ N 0 ) either tends to infinity or is eventually absorbed in A. The same holds for the full process (X t : t ≥ 0), and by ( 6) we conclude that P x (G) = 0 hence P x (τ A < ∞) = P x (X t ∈ A for some t) = 1.In particular, we also have that thus the process is regularly absorbed, by definition.Note that ζ in Proposition 4.2 may be negative thus Assumption 2(i) is stronger and immediately provides the same conclusion of almost sure absorption of the process.Further, as we shall see in the next proposition, Assumption 2(ii) assures that the expected magnitude of X t , in the form of v, X t given X 0 = x, is uniformly bounded in x ∈ E for any t > 0. This, in turn, implies that the time of "coming down from infinity" is finite, which is closely related to the uniqueness of QSDs.This is where η > 0 is required.Proposition 4.3.Under Assumptions 1-2 with A = ∅, the process (X t , t ≥ 0) satisfies in particular, the process is absorbed P x -a.s.Further, sup x∈E E x v, X t < ∞ for any t > 0.
Proof.By Assumption 2(i), it follows, applying the same notation as in Proposition 4.2, that for n sufficiently large and ζ as in the proposition.Thus by Proposition 4.2, the process (X t : t ≥ 0) satisfies P x -a.s. for all x ∈ E. Hence the process is regularly absorbed.
The second claim is apparently a 'classical result' [7] but we are not aware of a proof in the literature, hence we provide one here.Let W (x) = v, x on N d 0 as in Definition 3.1.It follows from (5) of Proposition 4.2 and Assumption 2(i) that with v, x = n, LW It follows, under Assumption 2(ii), that Since we have k∈R λ k (x) < ∞ for each x ∈ D, it follows from [2, p. 12] and the equivalence of the weakened and infinitesimal generators on W that is a martingale.Thus, by the martingale property we find which is a form of Dynkin's formula [27].Using the bound (8) combined with Jensen's inequality we obtain, upon differentiation of (9), Consider the associated differential equations By Petrovitsch' theorem [33, p. 316] applied twice and the fact that solutions to the ordinary differential equations are continuous in the initial value, we conclude that where g x (t) = lim →0 g x, (t).The solution to the initial value problem (10) cannot be given in explicit form, however, the associated simpler differential equation (11) does have an explicit solution for η > 0, given by In order to use this to bound f x (t), define the function Then, using ( 12), we have Since g x (0) = W (x) = k x (0) it follows that g x (t) ≤ k x (t) and we infer that for all fixed t > 0 as desired.(c) There exists constants , C > 0 and C ≥ 0, such that 1 j p with p > 1 which is a convergent hyperharmonic series.This proves (a).Thinking in terms of Riemann sums, using that j → j −β is decreasing in j for β > 1, we obtain the bound Exploiting the linearity of v, • , we consider, for x ∈ E with n = v, x , and L the weakened generator which coincides with the infinitesimal generator since ϕ is bounded, Using Assumption 2(i), we can choose β > 1 large enough such that Lϕ(x) ≥ 0 for all n = v, x sufficiently large.In particular, condition (b) is satisfied.
Similarly, as V is bounded, we find for x ∈ E with x, v = n, Note that by treating V (x) as a lower Riemann sum, for x ∈ E, and similarly, treating ϕ(x) as an upper Riemann sum, with the last inequality holding for v, x sufficiently large, using that for β > 1, We infer that 1) , ] .Note that by definition ϕ(x) > 0 for x ∈ E hence, choosing α = 1 + η/2 and = η/[2(β − 1)], we get Using Assumption 2(ii), the first term becomes negative for v, x sufficiently large.In other words LV (x) + V 1+ (x) ϕ (x) ≤ 0 for x / ∈ O n with n sufficiently large.Since by definition ϕ(x) ≥ 0, condition (c) holds.Lemma 4.6.Under Assumpstions 1-2, the pair (V, ϕ) is an admissible couple of functions.
Proof.Choose α, β ∈ R such that the conclusions of Lemma 4.5 hold.In particular, α, β > 1 hence the functions V and ϕ are bounded, non-negative on E ∪ A and positive on E, and by definition and Definition 4.1(1) is fulfilled.Let (x p ) p≥1 be any sequence in E such that the set {p ∈ N : Furthermore, since the process is regularly absorbed by Proposition 4.3, we have hence Definition 4.1(2) is fulfilled.Finally, from Lemma 4.5 and the fact that V and ϕ are both bounded functions, it follows that LV is bounded from above and Lϕ is bounded from below.This concludes the proof.

Lemmas
Lemma 4.7.Assumption 1(i) is equivalent to the following: There exists n 0 ∈ N, θ 0 , θ 1 , a 1 > 0 and a probability measure ν on E such that, for all x ∈ O n 0 and all s ∈ [θ 0 , θ 0 + θ 1 ], and in addition, for all n ≥ n 0 , there exists s n ≥ 0 such that Proof.We first prove that Assumption 1 implies the existence of such constants and probability measure.Let I min be the unique minimal irreducible class contained in E. Set Pick z ∈ I min ∩ O n 0 arbitrarily and let ν = δ z .Pick arbitrary θ 0 , θ 1 > 0 and let Note that for all n ∈ N the set O n is finite and any x ∈ O n ⊂ E has a path to I min and thus, by irreducibility, to z.By continuity of P x (X • = z), we conclude that a 1 > 0 and choosing s n = θ 0 > 0 for all n, the desired holds.
For the reverse direction, we first prove uniqueness of the minimal irreducible class in the endorsed set E. Suppose for contradiction that I i = I j are irreducible minimal classes in E. Let x 1 ∈ I i and x 2 ∈ I j .Then there is a path x 1 → y 1 and a path x 2 → y 2 with y 1 , y 2 ∈ O n 0 .Indeed, if I ∩ O n 0 = ∅ for l = i, j we may simply choose y = x , = 1, 2. Otherwise, x 1 , x 2 are in some O n 1 and O n 2 respectively with n 1 , n 2 > n 0 , hence by ( 14), there exist paths as described.By (13), there exist θ 0 , θ 1 , a 1 > 0 and a probability measure ν on E such that, for all y ∈ O n 0 and all s ∈ [θ 1 , θ 1 + θ 0 ], As ν is a probability measure on a countable space, there exists some z ∈ E such that ν({z}) > 0. But then P y (X s = z) > 0 for all y ∈ O n 0 .We conclude that there exist paths y 1 → z and y 2 → z.By minimality of I i and I j , we conclude that z ∈ I i ∩ I j which is a contradiction.This proves the uniqueness in Assumption 1(i).
We now prove existence.Suppose for contradiction that no minimal class exists.Then, for all I ⊆ E there exists J ⊆ E such that I → J .Repeating the argument results in an infinite path In the case where there are only finitely many irreducible classes in E, there must exist an i ∈ N such that J 1 → J i = J 1 which contradicts the lack of cycles in D.
Suppose therefore that there are infinitely many classes.Given n 0 , the set O n 0 is finite.Thus it intersects at most finitely many irreducible classes.We conclude that in the infinite path (15), there are infinitely many i ∈ N such that for some n i > n 0 .However, by ( 14) each of these classes have a path to O n 0 .This implies the existence of at least one irreducible class intersecting O n 0 which appears more than once in the infinite path (15).This creates a cycle in D which is a contradiction.
Lemma 4.8.Under Assumpstions 1-2, for all λ > 0, there exists n ≥ 1 such that Proof.It follows from Proposition 4.3 that sup x∈E E x v, X t < M for all t > 0 and some constant M > 0. Let τ n,A = τ n ∧ τ A .Then, for > 0 Suppose for contradiction that P x (τ n,A > ) does not converge uniformly in x to 0 as n → ∞.Then for any δ > 0, sup x∈E P x (τ n,A > ) > δ for infinitely many n ∈ N. Thus, choosing δ = , there exists a sequence (x i , n i ) i≥1 with v, x i > n i such that P x i (τ n i ,A > ) > and n i > n i−1 .But then, noting that P x (τ n−1,A > ) > P x (τ n,A > ) for all n ∈ N, we obtain Now, given t > 0 we can choose 0 < < 1 such that t = p for some p ∈ N. Applying the Markov property we get, for n ≥ N ( ), hence we conclude, as t < p, that Since e λ(τn∧τ A ) is non-negative, by choosing < e −λ < 1 we get holds as desired.
Lemma 4.9.For all n ≥ 0, there exists a constant C n such that, for all t ≥ 0, Further, with V from Definition 4.4, there exist constants r 0 , p 0 > 0 such that for n sufficiently large, Proof.Let n ≥ 0 be given.If O n is empty, then the statement is vacuously true for any C n .If O n is non-empty, then as sup x∈On P x (t < τ A ) ≤ 1 and inf x∈On P x (t < τ A ) > 0 since O n is finite, we may simply choose To see the second claim, note that letting we have for any r 0 > 0, for all x ∈ E, and the desired holds.

The Main Result
We are now ready to state and prove the main result of the paper.In the case where only a single endorsed set, E, is considered, we find that the unique QSD hereon is in fact globally attracting in the space of probability measures on E.
Theorem 5.1.A reaction network (N , λ) with associated stochastic process (X t : t ≥ 0) on D = E A, with A = ∅ and satisfying Assumption 1-2, admits a unique quasistationary distribution ν.Further, there exist constants C, γ > 0 such that, for all probability measures µ on E, Proof.Suppose for contradiction that supp ν ⊆ I min .Then there exists a point y ∈ I = I min for which ν({y}) > 0, where I is an irreducible class of E. As ν is globally attracting in P(E), the space of probability distributions on E, it follows by definition that for any µ ∈ P(E) and any measurable set B ⊆ E. In particular, letting µ = δ z with z ∈ I min yields, by minimality which is a contradiction.We conclude that supp ν ⊆ I min .In particular, there exists x ∈ I min such that ν({x }) > 0. Further, since I min is irreducible, P x (X t = y) > 0 for all x, y ∈ I min .As ν is a QSD, it follows from [11, p. 48] that for some θ > 0 and all t ≥ 0, y ∈ I min .Consequently, supp ν ⊇ I min which in turn implies supp ν = I min as desired.
We now examine the general holistic setting with state space D = D E ∪ D A , where D E consists of possibly several endorsed sets, each with or without a corresponding nonempty absorbing set.As one might in practice not have complete information about the starting-point of the process, one may in general not know exactly which endorsed set the process evolves in.However, one may have a qualitative guess in the form of an initial distribution, µ on D E .
The following theorem shows that we may consider the problem of finding a unique QSD on each endorsed set independently and then piecing these together to form a unique limiting measure up to a choice of the initial distribution, µ, on D E .

Examples
In this section, the main theorems and their applicability are illustrated through a series of examples.In particular, we show explicitly how the results of [8] are extended.Example 6.1.
As discussed in Section 2, the endorsed sets and corresponding absorbing sets are respectively.These endorsed sets are evidently not irreducible.However, {i + m − 1} is the unique minimal irreducible class in E i from which one may jump directly to A i , thus Assumption 1 is satisfied.Further, assuming mass-action kinetics, for n sufficiently large, hence we conclude by Theorem 5.3 that there exists a unique QSD on each E i if m ≥ 2. Further, in this case, for any initial distribution, µ, on the full set of endorsed states, D E = {m, m + 1, . . .}, the measure P µ (X t ∈ • | t < τ A ) tends to ν µ exponentially fast for t → ∞.
The Lotka-Volterra system describing competitive and predator-prey interactions has been of interest for approximately a century [29,51].In the stochastic description of the model, we find R = N 2 hence it follows that the state space can be divided into the endorsed and absorbing sets given by respectively.Using mass-action kinetics, it follows that for . Thus, choosing sufficiently large, d v (n) = O(n) and d v (n) > 0 for n sufficiently large, provided that α 2 > α 1 .Note also that max k∈R A v, ξ k < 0 hence by Proposition 4.2 the process is P x -a.s.absorbed for all x ∈ E if α 2 > α 1 .However, d v (n) = O(n 2 ), hence Assumption 2(i) is not satisfied, and one can not apply Theorem 5.1.
Using generalized mass-action [34] for the same standard Lotka-Volterra network, we may obtain a different result.Suppose for example that Choosing v = (2, 1), say, we find and likewise Thus Assumption 2 is satisfied.As E is irreducible, Assumption 1 is also satisfied and we conclude by Theorem 5.1 that there is a unique QSD on E.
Let us now consider the slightly altered version of the original Lotka-Volterra system using mass-action kinetics, obtained by addition of the reactions 2S 1 → S 1 and 2S 2 → S 1 + S 2 .In this case, there is still one endorsed set with corresponding absorbing set given by We need v 2 > v 1 for the coefficient of the 4th reaction to be positive.Further, by the second derivative test, d v (n) = O(n 2 ) exactly when The set of possible v-vectors, V, is therefore , which is non-empty for any positive reaction rates.A particular choice would be v = ( , + 1), for sufficiently large.As E is irreducible, Assumption 1 is satisfied and we conclude by Theorem 5.1 that the modified Lotka-Volterra system has a unique QSD on E for any reaction rates.Example 6.3.
S 1 There are two endorsed sets given by The corresponding set of absorbing sets are given by We conclude that there exists an 0 < η < 1 such that Assumption 2 holds.Since both E 1 and E 2 are irreducible, Assumption 1 is satisfied hence Theorem 5.3 applies regardless of the rate constants -there exists a unique QSD, ν n , on each E n , and given an initial distribution, µ, on D E , the measure P exponentially fast in t.Note that if we had modeled the network using deterministic mass action [21], we would have found the attracting 2 fixed point which seems to lie near the peak of the quasi-stationary distribution for the parameter values used in Figure 4. Indeed, we find the fixed point (13.39,44.81) which is a stable spiral.
The stoichiometric matrix is in this example given by It follows that rank Ξ = 2 = d hence by Proposition 2.3, there is a finite number endorsed sets for x sufficiently large.Indeed, upon inspection, we find two endorsed sets given by Thus, Assumption 2 is not satisfied and we can not apply Theorem 5.3.Example 6.6.
As rank Ξ = 1 < d, there are infinitely many endorsed sets.These are each irreducible.Indeed, upon inspection, we find the endorsed sets and the full set of absorbing states D A = {x ∈ N 2 0 : x 1 = 0}.Furthermore, some sets (E i with i ≡ 1 mod 2) have no corresponding absorbing set while all others do.For all classes however, we find hence, by Theorem 5.3, we conclude that for any initial distribution µ with support contained in a finite subset of the endorsed sets, the measure P µ (X t ∈ • | t < τ A ) converges exponentially fast in t to ν µ .Had we instead looked at the slightly altered network then Assumption 1 would no longer be satisfied as no minimal irreducible class exists.

Conclusions
In this paper, we have provided sufficient conditions for the existence and uniqueness of a quasi-stationary distribution, within each endorsed set, for general stochastic reaction networks.In particular, we have provided sufficient conditions for the existence and uniqueness of a globally attracting quasi-stationary distribution in the space of probability distributions.
The requirement that for mass-action reaction systems there exists a stoichiometric coefficient strictly greater than one for each species strong, however, it seems to be intrinsic to the problem of guaranteeing uniqueness of the QSD.Indeed, this can be seen already in 1-dimensional systems, and one can imagine that getting stuck near one of the axes would approximately reduce the multi-dimensional system to such a 1-dimensional case.It is an important question to determine sufficient conditions for just the existence of a QSD, in which case one would expect much weaker conditions to be satisfied.This is still an open problem.
The application of our results depends on the existence of a vector v ∈ N d , such that Assumption 2 holds.One would like to have easy graphical ways of guaranteeing this existence or an explicit algorithmic way of constructing such a vector.This may not be possible in general, since the problem is equivalent to determining the sign of a multivariate polynomial in a certain region of the positive orthant.However, the specific network at hand is often prone to analytical ad hoc methods, and even in lack of this, one may rely on numerical methods to find a suitable candidate for v. Another caveat is the exact calculation of the endorsed sets.Methods for finding these numerically, in the case where the intensity functions are positive in the positive orthant, exist [23,38].However, the problem becomes more complicated in the general case [22].
Knowing the existence of a unique QSD, one would of course like to know the explicit analytical expression for this distribution on each endorsed set.However, this seems to be a very hard problem to solve in general, and even for 1-dimensional systems it has not yet been fully resolved.Thus, so far, one is forced to apply numerical methods or rely on analytical approximations, see [20,50].
Another problem which is important for applications, is the question of observability.Indeed, the relative sizes of the time to extinction, τ A , versus the time to reach the mean of the QSD, τ ν , determines whether we are likely to observe the QSD or not.Only if τ A τ ν would we expect the process to behave according to the QSD [47].Few recent results on this matter exist for limited scenarios [22], while methods exploiting the WKB approximation [5] are more general although not yet fully rigorous [9].Finally, numerical evidence seems to suggest a strong connection between the deterministic and stochastic models of the same underlying reaction network.Indeed, for systems close to thermodynamic equilibrium, also referred to as the fluid limit [28], the modes of the QSD appear to be located near the deterministic steady states.However, far from equilibrium, the picture may be radically different.Our result can be seen as a stochastic analogue to the deterministic case of having an equilibrium point within each stoichiometric compatibility class.In this light, the QSD bridges the gap between the knowledge of extinction in the stochastic description and the existence of a stationary steady state in the deterministic setting.Future work lies in analyzing what can be inferred about the QSD from the corresponding deterministic dynamical system.

Figure 1 :
Figure 1: Left: The reaction graph of a stochastic reaction network.Right: The state space with the region R in shaded grey.Points in D A are marked red.

Proposition 2 . 3 .
For M ∈ N d sufficiently large, the set {x ∈ D E | x ≥ M } intersects (i) finitely many endorsed sets if and only if rank Ξ = d.(ii) a single endorsed set if and only if span Z Ξ = Z d .

Corollary 2 . 4 .
If d ≤ 2, there are finitely many endorsed sets if and only if rank Ξ = d.

S 1 Figure 2 :
Figure 2: State space of the reaction network 2S 1 → ∅.There are two endorsed sets.
which, irrespectively of v, are compact subsets of E, satisfying O n ⊆ O n+1 and E = n∈N O n .We denote the first hitting time of A, the first hitting time of O n and the first exit time of O n by
denote the vertex set of the respective connected components of the induced subgraph D[I E ], the graph with vertex set I E and edges from E with start and end nodes in I E .
be the probability of this shortest path, and define β m = min x∈Om b x .As O m is compact, β m > 0. It follows that for each n ∈ N 0 the conditioned process fulfils for some n}, where the last equality follows from A being an absorbing set.By Assumption 1, all states in O m ⊆ E have a shortest path to A (there is a path to the minimal irreducible class, and then to A, for any x ∈ O m ), which has some positive probability.For each state x ∈ O m , let b x γt , t ≥ 0. Proof.By Proposition 4.3, the process is regularly absorbed, and by Lemma 4.6 the pair (V, ϕ) given in Definition 4.4 is admissible, satisfying conditions (a) and (b) from Lemma 4.5.The result now follows from Lemma 4.7-4.9together with [8, Cor.2.8].Corollary 5.2.Let ν be the unique quasi-stationary distribution on E and I min the unique minimal class of E. Then supp ν = I min .