Complete hierarchies of SIR models on arbitrary networks with exact and approximate moment closure

We first generalise ideas discussed by Kiss et al. (2015) to prove a theorem for generating exact closures (here expressing joint probabilities in terms of their constituent marginal probabilities) for susceptible-infectious-removed (SIR) dynamics on arbitrary graphs (networks). For Poisson transmission and removal processes, this enables us to obtain a systematic reduction in the number of differential equations needed for an exact `moment closure' representation of the underlying stochastic model. We define `transmission blocks' as a possible extension of the block concept in graph theory and show that the order at which the exact moment closure representation is curtailed is the size of the largest transmission block. More generally, approximate closures of the hierarchy of moment equations for these dynamics are typically defined for the first and second order yielding mean-field and pairwise models respectively. It is frequently implied that, in principle, closed models can be written down at arbitrary order if only we had the time and patience to do this. However, for epidemic dynamics on networks, these higher-order models have not been defined explicitly. Here we unambiguously define hierarchies of approximate closed models that can utilise subsystem states of any order, and show how well-known models are special cases of these hierarchies.


Introduction
A primary method for incorporating spatial structure and other contact structures into epidemic models is to use a network of contacts [1]. While simulation of stochastic models is straightforward on these networks, obtaining differential equation descriptions of the relevant time series is more complex. Here we 2 Background concepts Apart from Theorem 3.1 which applies more generally, we shall consider a Markovian class of SIR models on contact networks. In particular, we consider a directed graph D = (V, A) consisting of N = |V | individuals/nodes and a set A of arcs. We also label each individual according to some arbitrary ordering such that if i ∈ V then i ∈ {1, 2, . . . , N }. Each individual can be in only one of three states S, I, or R at any given time. Node j ∈ V , when infectious, makes 'infectious contacts' to node i ∈ V via a Poisson process of rate T ij ≥ 0, where T ij > 0 ⇔ (j, i) ∈ A and where we assume that T ii = 0 for all i ∈ V . If node i is susceptible when it receives an infectious contact then it immediately becomes infectious. It will then remain infectious for an exponentially distributed period, with parameter γ i , after which it becomes recoveredwhich is an absorbing state for the individual. We thus have a continuous-time Markov chain with a state space of size 3 N . Except where otherwise stated, we also assume initial conditions such that the states of all nodes are initially statistically independent. This assumption encompasses all pure-state initial conditions, such as a specific individual being infectious with all others susceptible, and it also incorporates binomially distributed initial conditions. Uniform initial conditions can also be exactly represented with additional computation [13].
Definition 2.1. S i , I i and R i denote the indicator random variables for the events that node i ∈ V is susceptible, infectious and removed respectively. Depending on the context, it will also be convenient to refer to S i , I i and R i as the corresponding events themselves.
The hierarchy comprises of a sequence of equations containing the first moments and mixed moments of the random variables S i and I i . Using angle brackets to denote expectation values, it can be shown [9] that the master equation (or Kolmogorov forward equations) implies the following rate equations: T ij S i I j , where S i I j is a product of the indicator random variables which also specifies a state of the subsystem of order two comprising of the pair of nodes i and j. For this pair state we have: More generally, for these models, the master equation allows us to write down a rate equation for the probability of any subsystem state of size n in terms of subsystem states of size n and subsystem states of size n + 1. We now state this as a theorem.
Following prior work [11], but with a notational simplification brought about by using the same index for all system and subsystem states, we define an alternative notation to Definition 2.1 that is useful for keeping track of the hierarchy of moment equations in this context. Definition 2.2. We use the following notation to denote subsystem states.
• ψ W is a subsystem comprising of the set of nodes W ⊂ V .
• Let A be a mapping from the elements of W to {S, I, R}, and let A i be the image of node i ∈ W under A. Thus, A can also be interpreted as a pure state for subsystem ψ W , i.e. the state where, for all i ∈ W , individual i is in state A i .
• ψ A W denotes the indicator random variable for the event that ψ W is in state A. Thus the probability of the event that subsystem ψ W is in state A is P (ψ W = A) = ψ A W . As in definition 2.1, it is also convenient to refer to ψ A W as the event that ψ W is in state A.
Remark. For the event where node i is in a susceptible state, we can draw the following correspondence between the notations: ψ S i = S i , and similarly for the infectious and removed states. Definition 2.3. Let k ∈ W ⊆ V and X ∈ {S, I, R} and let A be a state of subsystem ψ W . Then, h X k (ψ A W ) denotes the indicator random variable or event ψ A W , but where the state of node k is changed to state X leaving the states of all other nodes unchanged. Note that if A k = X then h X k (ψ A W ) = ψ A W .
Theorem 2.1. For any subsystem ψ W , the probability that it is in state A is governed by the rate equation: where here, and throughout this paper, the indicator ½(.) is equal to 1 if its argument is true and is equal to zero otherwise.
This theorem is proved in [11]. Starting with subsystem states that are only composed of susceptible or infectious individuals, repeated application of equation 3 to each of these states as well as to any subsystem states that arise on its right-hand side can never result in subsystem states with a removed individual. This is due to the absence of h R k in equation 3. Hence, for these subsystem states, ½(A k = R) = 0 for all k ∈ W so equation 3 becomes: Equations 1 and 2 can now be seen to be special cases of this theorem. By applying equation 4 to every individual in the network for states S and I and then reapplying to every new subsystem state which emerges, we obtain a closed set of differential equations for a set M of subsystem states. However, |M | will generally be very large for most systems, preventing numerical solution.
To reduce the number of equations, we need to introduce a mechanism to curtail the generation of new subsystem states. In the next section, we discuss scenarios in which this can be done where the emerging system is still an exact representation of the underlying stochastic process. Following this, we shall consider hierarchies of approximate closed models.

Exact closed models
Here we prove a theorem pertaining to arbitrary SIR dynamics on arbitrary networks. We then use this to derive a class of exact models for Markovian SIR dynamics on arbitrary networks. We illustrate this with some examples, and finally state a theorem specifying the maximum size of subsystem needed to exactly represent the dynamics on any given network.

Exact closure theorem
For a given directed graph D = (V, A) with set V of nodes/individuals and set A of arcs, we make the following definitions: and D − Z is the vertex-set deleted subgraph consisting of nodes V \ Z. Here, E is chosen to represent 'exact'; this is appropriate since we shall now see that f E (X, Y, Z) = 1 implies the existence of an exact closure relation.
Remark. If the network is undirected then f E (X, Y, Z) = 1 if and only if there is no path between X and Y in D − Z.
Theorem 3.1. We consider stochastic SIR dynamics defined on a time-invariant network where the initial conditions are such that the states of individual nodes are initially statistically independent. Let ψ A X , ψ B Y and ψ C Z be indicator random variables or events where X, Y, Z ⊆ V are disjoint and nonempty. If Z is dynamically partitioning with respect to X and Y , and all nodes in subsystem state C are susceptible (C i = S ∀i ∈ Z), then provided that ψ C Z = 0, Proof. If the infection has not passed through Z (which is guaranteed by all nodes in state C being susceptible), the states of the individuals in X are statistically independent of the states of the individuals in Y . This is true since f E (X, Y, Z) = 1 implies that there are no individuals from which both a member X and a member of Y can be reached without traversing a member of Z.
We have: Given that P (ψ C Z ) = 0, we have: from which the result follows.
Remark. For the case of zero denominator, note that P (ψ C Z ) = 0 implies that P (ψ A X , ψ B Y , ψ C Z ) = 0. Notice that we made no assumptions about the SIR dynamics in proving this theorem and that it is therefore not restricted to Markovian systems, although it is the Markovian case that we shall be applying it to in the remainder of this paper.
The theorem is a generalisation of the main result in [12] which is stated in terms of single dynamically partitioning individuals on undirected networks. In that context they are referred to simply as partitioning individuals due to their correspondence to graph partitioning. Some examples of applying the exact closure theorem are shown in Figure 1. In this Figure and throughout the remainder of the paper, network links without arrowheads denote undirected links whereas those with arrowheads denote directed links.
For our purposes, we are interested in a special case of the exact closure theorem which is captured by the following corollary.  a) b) c) I 1 Figure 1: Three examples of applying the exact closure theorem. Here directed links have arrowheads and undirected links do not. a) This motif state is typical of the dynamical partitioning we shall consider in this paper. Applying Theorem 3.1, we see that there is dynamical partitioning about node 2, so we have I 1 S 2 S 3 I 4 S 5 = I 1 S 2 S 2 S 3 I 4 S 5 / S 2 . b) We can dynamically partition on this graph about a cluster of susceptible nodes.
In fact there are two exact closures we can write down: I 1 I 2 S 3 S 4 S 5 S 6 I 7 I 8 S 9 = I 1 I 2 S 3 S 4 S 6 S 3 S 4 S 5 S 6 I 7 I 8 S 9 / S 3 S 4 S 6 = I 1 I 2 S 3 S 4 S 5 S 6 S 3 S 4 S 5 I 7 I 8 S 9 / S 3 S 4 S 5 . c) Here we can apply the exact closure theorem to obtain S 1 I 2 S 4 S 5 I 6 I 7 = S 1 I 2 S 4 S 4 S 5 I 6 I 7 / S 4 . Note that I 3 is not included in this closure.
This corollary is illustrated by the example in Figure 1a. By applying this to equation 4 we obtain: For an arbitrary network, by applying equation 10 to the indicator random variables S i and I i for all i ∈ {1, 2, ..., N }, and then reapplying it to every new subsystem state that emerges, a closed set of differential equations for the exact time-evolution of the probability of an individual being in a particular state is obtained for all individuals. The number of equations that will be needed is limited by the closures that are made possible by the exact closure theorem. Remark. It follows that S i and I i (∀i ∈ V ) and S i I j (∀i, j ∈ V : T ij > 0) represent members of M E for any network.

Examples
Before determining the network structures under which dynamical partitioning occurs more generally, we consider some examples. For further examples in the context of undirected networks the reader is directed to [12].
Proof. For such tree networks, every individual is dynamically partitioning relative to any two of its neighbours on the underlying graph. Hence, the above system follows directly from repeated application of equation 10, starting with Remark. This is the pairwise model that was shown to be exact for tree networks in [11].

Example 2
Consider the graph in Figure 2a. Let us suppose that all nodes have the same removal rate g and that the transmission rate across all network links is unity.
For simplicity we shall also make this assumption through the remainder of the explicit examples in this paper. We can apply Corollary 3.1 which is embedded in equation 10 to build up the induced subsystem states M E . Let us just consider the infectious probability of node 1 to see how this works. We have: Here and throughout the paper we order nodes according to the numerical order of their labels; the relevant motif structures need to be understood with reference to the associated graph. Now, node 2 is dynamically partitioning with respect to nodes 1 and 3, and it is also dynamically partitioning with respect to nodes 1 and 4. Hence: Rather than a complete analysis of all induced subsystem states that arise, we take the single pair state S 2 I 3 from this equation as an example. Here, node 3 is not dynamically partitioning with respect to nodes 2 and 4 but node 2 is dynamically partitioning with respect to 1 and 3 so: Then for S 2 S 3 I 4 , node 2 is dynamically partitioning with respect to node 1 and nodes 3 and 4 so: We see that here, M E represents a significant dimensional reduction in the number of induced subsystem states compared to the full set of induced subsystem states M .

Example 3
For the undirected graph in Figure 2b there is dynamical partitioning about node 1. Starting with (for example) the infectious probability for node 1, we have:˙ where again we are assuming transmission rates of unity and a removal rate g for each node. Now, choosing the first of these pairs to develop one part of the induced set M E gives: and then for the first of these triples: For the first of these quads we have: Here, the maximum size of a subsystem state is four. We note that this is equal to the size of the largest simple cycle and that this was also true for example 2. However, this is not always the case as shown by the next example.

Example 4
Figure 2c shows a network where the maximum simple cycle size is 4 but the maximum size of a subsystem state in M E is 5. Starting with the infectious probability of node 1 we have: Then, taking just the subsystem state in the first term: and again taking just the first term: Finally, taking the first term again gives: In this case we see that the maximum size of a subsystem state is at the size of the system (5 nodes) and is not constrained by the largest simple cycle (4 nodes). This leads to the question: What aspect of a network specifies the largest subsystem size that appears in M E ? We answer this question in the following subsection.

System size
Here we define the type of network structures that are amenable to dynamical partitioning. We start from single node subsystems and expand out, via equation 10, until the largest subsystem is reached incorporating that individual before dynamical partitioning prevents larger subsystems emerging. For the undirected case, the situation simplifies considerably [12] since all dynamically partitioning individuals are also cut-vertices (individuals which, when removed, increase the number of connected components). It is then helpful to represent the network as a collection of blocks (maximal biconnected subgraphs) where the between-block structure is tree-like (see Figure 3a). This makes it straightforward to assess the feasibility of constructing a solvable exact system by making use dynamical partitioning. Notice that it is possible for a node to belong to more than one block as in the top right of Figure 3a although the overlap between any two blocks can only be a single node. It is interesting that this representation of the network resembles the household model structure where analytic progress can also be made [15]. For directed networks, the situation is more complicated. Here we define 'transmission blocks' to play a similar role to blocks. Indeed, blocks and transmission blocks will have equivalent definitions in the undirected case. We use the term transmission block rather than block since there are likely to be other useful extensions of the block concept for directed networks. Remark. According to this definition, any block in an undirected network is also a directed sub-block. Hence, the blocks illustrated in Figure 3a are all directed sub-blocks. The shaded boxes in Figure 3 are examples of transmission blocks. Figure 3b gives an example of these on a directed graph. Notice that now it is possible for transmission blocks to overlap my more than one node (the darker shaded triangle belongs to two transmission blocks). This happens when a region of the network has paths to two or more other regions that do not have paths between each other. Figure 4 shows some more examples of these definitions for directed networks. Figure 4a and b have underlying graphs that are biconnected. However, b) has a node (node 1) which is reachable from all others whereas a) does not, and so b) is a directed sub-block while a) is not. Figure 4b is also a transmission block since it is maximal. Additionally, neither have sub-graphs of the underlying graphs that are biconnected and so neither contain directed sub-blocks as subgraphs. Figure 4c is a transmission block (the underlying graph is biconnected and node 2 is reachable from all others). It also contains several directed sub-blocks (for example nodes 1,2 and 3).  transmission block as a subgraph (nodes 1,2,3,4) and contains several directed sub-blocks.
We can now state the main result of this subsection:

Hierarchies of approximate models
The systems of equations in the previous section are exact, but limited in applicability because of the limited scope for dynamical partitioning in most networks. To suitably curtail the large number of equations, the networks need to have a structure which is roughly tree-like. More typically, we want to trade off some exactness for models which are numerically tractable and provide a good, rather than exact description of the underlying dynamics. The pair-level SIR model (equation 11) is exact for tree networks but is also a reasonably good approximation for SIR dynamics on a wide range of networks. Higher-order models will typically be more accurate, but will have considerably greater computational cost. Here we formally define hierarchies of approximate models that can be applied to Markovian SIR dynamics on any network.
We define 'pseudo-partitioning' according to different criteria. We define two hierarchies of models via what we term 'cycle-partitioning' and 'size-partitioning'. We then also consider a 'hybrid-partitioning' hierarchy utilising both methods. Although these pseudo-partitionings can be defined more generally, as in the case of dynamical partitioning itself, we shall restrict our attention here to dynamical partitioning with respect to single susceptible nodes.
Generalising from the case of dynamical partitioning, we define an operator which acts on subsets of the network and which enables a systematic curtailing of the number of subsystem states necessary for a solvable model. We use a function f p (X, Y, i) to determine some pseudo-partitioning of subsets X and Y with respect to node i. By analogy with equation 10, we have: So, when f p (X, Y, i) = 1, we treat i as if it is dynamically partitioning with respect to X and Y and so the right-hand-side of the rate equation does not generate larger subsystem states. The specific type of approximate model depends on how f p (X, Y, i) is defined and is formed by assuming equality between the left and right hand sides. Note that equation 15 defines a solvable model that is based on the closure in equation 9. However, other closures such as the Kirkwood-closure fall outside of this scheme. It is, however, straightforward to define a solvable hierarchy of approximate models that incorporates the standard Kirkwood closure as a special case.
Let us denote the adjacency matrix for the underlying graph by U (U ij = sgn(T ij + T ji ) for all i, j ∈ V ). Then, for the probability of subsystem ψ W being in state A, we can approximate: where m i = j∈W U ij is the number of neighbours of node i in W in the underlying graph and is also the number of times that the state of node i appears on the numerator. For a subsystem of three nodes on an undirected graph, this is seen to reproduce the standard Kirkwood closure. Using this form, we can write an alternative to equation 9. For a subsystem state A, if A i = S, i ∈ W and j ∈ V \ W , then we can approximate: where we use N n to denote the set of neighbours of node n in the underlying graph.
We use this closure to motivate the following hierarchy: It is worth noting that both of these closures are based around a single IS arc being added each time. Other schemes with more complex closures should also be possible. For example, Theorem 3.1 allows closures where we do not necessarily need to have only singlet states in the denominator (see Figure 1b). i j W Figure 5: An example of a node i ∈ W which is not dynamically partitioning with respect to node j and W \ i, but it is cycle-partitioning up to x = 2.

Cycle-partitioning
With reference to Figure 5, although node i is not dynamically partitioning with respect to W \ i and node j, we might observe that it is in some sense 'approximately' dynamically partitioning because the path length between j and W is reasonably long when i is deleted. It seems sensible to define a type of pseudo-partitioning according to this path length.
where a, b ∈ AE.
We make the following observations: i) If the network is undirected then f C(x) (X, Y, i) = 0 if and only if there is at least one path of length x or less between some member of X and some member of Y when i is deleted. ii) An individual who is dynamically partitioning with respect to two subsets is also cycle-partitioning at all orders with respect to those subsets. iii) In Figure 5, node i is cycle-partitioning with respect to W \ i and j for x = 0, x = 1, and x = 2, but not x > 2. iv) Any individual i ∈ V is always cycle-partitioning at order x = 0 with respect to any other two subsets.
Remark. By applying this rate equation to every individual in the network for states S and I and then reapplying to every new subsystem state which emerges, we obtain a closed set of differential equations which form the xth model in a hierarchy of approximating models (note that the model corresponding to x = 0 is the pair-level model given by equation 11). The associated set of induced subsystem states will be denoted by M C(x) .

Examples
We can consider cycle-partitioning for the network in Figure 2b. If we cyclepartition at x = 1, then the first two terms of equation 12 are closed at the level of pairs. Specifically, for the first term, node 2 is cycle-partitioning with respect to nodes 1 and 3. For the second term, node 1 is cycle-partitioning with respect to nodes 2 and 4. This gives: Thus, triples within the square are no longer 'kept intact', and so, within the square, the model closes at the level of pairs. However, triples made up of the members of the triangle are kept intact. For example, we have: If we apply cycle-partitioning to Figure 2c instead, then the x = 0 model is the pair-level model as always. The x = 1 model is also the pair-level model and the x = 2 model is equivalent to the exact closure model. Hence, cyclepartitioning does not necessarily lead to improved models as x increases and it does not always lead to a reduction in system size with respect to the exact closure model. The results from the x = 0 pair-level model and the exact model applied to Figure 2c can be seen in the section on size-partitioning below ( Figure 8) and so are not reproduced here.
An extreme example of the failure of cycle-partitioning to produce improved models is given by the triangular lattice shown in Figure 7. Here, the x = 0 model is the pair-level model. For x = 1, consider the triple A 3 S 1 C 4 (∀A, C ∈ {S, I}). Here we do not have cycle-partitioning since by deleting node 1, there is a path of length 1 between nodes 3 and 4. As we move to order 4 motifs, (e.g. adding a node to the above triple either by the edge (1,2) or the edge (3,2)), it is readily seen that there will always exist motifs which do not cycle-partition for x = 1 at all orders. Hence for the triangle lattice, even for x = 1 cyclepartitioning, we obtain a model with motif states at the size of the full network. Some cycle-partitioning does occur however, so the resulting model is not exact.  For example, for the triple A 2 S 1 C 4 , deleting node 1 means that the shortest path from 2 to 4 is via node 3 and is of length 2. So we have cycle-partitioning here. We also have it for states A 7 S 1 C 4 . This state is also cycle-partitioning at x = 2 (the path length from node 7 to node 4 after deletion of node 1 is 3) but we no longer cycle-partition A 2 S 1 C 4 . Finally, at x = 3, no cycle-partitioning occurs anywhere and we have an exact model containing subsystem states at the size of the system (M C(3) = M ).
In general, if the largest transmission block in a network has n individuals, then any cycle-partitioning model of order x ≥ n − 2 is exact (see Theorem B.1 in Appendix B). This is illustrated by the network in Figure 2b where the largest transmission block is of size n = 4 and the x = 2 cycle-partitioning model is exact (Figure 6). This is also the case for the graph in Figure 2c where n = 5 and the x = 3 model is exact (the x = 2 model is also happens to be exact here as well). Another general result is that if the smallest directed sub-block consists of n individuals, then the cycle-partitioning models of order x < n − 2 are all equivalent to the pair-level (x=0) models (see Theorem B.2 in Appendix B). This is illustrated by the graph in Figure 2c where the smallest directed sub-block is n = 4, and we found that the x = 1 cycle-partitioning model is the same as the pair-level model.

Size-partitioning
The issues arising in some networks such as Figure 2c, where even cycle-partitioning at x = 2 requires subsystem states containing all individuals, and the extreme example of the triangular lattice, motivate an alternative pseudo-partitioning approach whereby the sizes of subsystem states are more directly constrained.
where X ⊆ V and x ∈ AE.
Here we make the substitution f p (n, W \k, k) = f S(x) (W \k) into equation 15.
Remark. As with previous pseudo-partitioning, a complete approximate model arises from the equations for the individual-level states and then repeatedly applying this equation to each subsystem state that emerges. As with cyclepartitioning, the x = 0 size-partitioning model corresponds to the pair-level model.

Examples
As an example, consider the x = 1 size-partitioning model for Figure 2c, where the cycle-partitioning hierarchy was redundant. Equation 13 now becomes: For x = 2 size-partitioning, equation 13 is left untouched since the exact rate equation for a subsystem state of size 3 does not involve subsystem states larger than 4. However, equation 14 becomes: In this way, we obtain three different approximate models: x = 0, x = 1 and x = 2. For x > 2, the model is exact. Figure 8 shows results from the application of each of these three approximate models and the exact x = 3 model to SIR dynamics on the network depicted in Figure 2c. An interesting feature that should be noted for the x = 2 model is that it very slightly underestimates the rate of spread of the epidemic. Typically, experience shows that the closure of these equations leads to an over-estimate the rate of spread, but this provides a counter example. While size-partitioning will resolve issues such as the triangular lattice, it has problems of its own. Specifically, we see from Figure 8 that since the smallest  Figure 2c with x = 0 which corresponds to the pair-level model through x = 1, x = 2, and finally x = 3 which is exact for this scenario. An individual is assumed to be initially susceptible with probability 4/5 and infected otherwise (the states of individuals are initially statistically independent). We have assumed a transmission rate of unity across each link and a removal rate of unity. cycle size in Figure 2c is 4, the x = 1 size-partitioning model is almost identical to the x = 0 pair-level model. The x = 3 and x = 4 models are also almost identical. Hence, the extra computation in evaluating at x = 2 and x = 4 is wasteful. In this sense, cycle-partitioning has an advantage by only picking up complete cycles in the network.
An additional problem with size-partitioning is that it ignores genuine dynamical partitioning. For example, for Figure 2b, we would require motif sizes of 6 (x = 5) to describe this exactly within the size-partitioning scheme. However, if we permit genuine dynamical partitioning, we only need motif sizes of less than or equal to 4. This issue is readily resolved by considering the modified scheme: which incorporates genuine dynamical partitioning into size-partitioning. With this rule, in Figure 2b, the genuine dynamical partitioning around node 1 is utilised wherever possible.

Hybrid-partitioning
Both cycle-partitioning and size-partitioning have their merits. Size-partitioning avoids unnecessarily large motif states where cycle-partitioning cannot be effectively implemented beyond an early stage, such as in the triangle lattice. On the other hand, cycle-partitioning picks out cycles in the network and closes at the pair level unless complete cycles can be incorporated, avoiding wasteful computation with minimal gain in accuracy. We can construct a hybrid-partitioning scheme that captures the benefits of both cycle-partitioning and size-partitioning while avoiding the problems of both. We define this hybrid-partitioning as: This leads to a hierarchy of models defined by substituting f p (n, W \ k, k) = f C(x)S(x) (n, W \ k, k) into equation 15. This also has the pair-level model for x = 0. We also note that alternative hierarchies could be designed with different values of x for the size-partitioning and the cycle-partitioning parts.
This closure benefits from the advantages of both cycle-partitioning and size-partitioning. Firstly, if there are only large cycles, the hierarchy is closed at a low order by cycle-partitioning. This is desirable since, as illustrated in Figure 8, continuing on generates little benefit unless we are able to continue to the size of the smallest cycle. However, if the system is not amenable to cycle-partitioning, as in the triangular lattice, then size-partitioning is required. A network illustrating the benefits of this is shown in Figure 9. For hybrid-  Figure 9: A graph that illustrates the benefits of hybrid-partitioning. Expanding from node 1 using x = 1, we utilise both cycle-partitioning and size-partitioning capturing the advantages of both.
partitioning with x = 1, let us start with the probability that node 1 is infectious: For the first of these terms on the right-hand-side, the corresponding approximate differential equation is: where we have employed x = 1 cycle-partitioning. For the term S 1 I 5 in equation 23 we obtain: where, again, x = 1 cycle-partitioning has been implemented where possible. For the second term in this expression, we have: Here, the closures on the first line are via x = 1 size-partitioning, whereas the closures on the second line are via meeting the criteria for both x = 1 sizepartitioning and x = 1 cycle-partitioning. So, this hybrid-partitioning obtains the best of both methodologies. Cyclepartitioning avoids unnecessarily including extra terms in the large cycle 1-2-3-4-5-1 which we have seen (Figure 8) generates minimal extra accuracy. Sizepartitioning forces partitioning where the motif sizes get beyond a specified level, here constraining the maximum motif size to be 3.

Alternative closure
Before leaving this section on pseudo-partitioning, we include a brief aside on pseudo-partitioning using the alternative closure defined in equation 16. In this case, we can still apply the cycle, size and hybrid methods, but we use equation 18 in place of equation 15. Two examples of applying this are illustrated in Figure 10. Here the shaded regions represent the existing subsystem states and the solid lines coming out of these regions represents the new infectious node being added on. The dashed lines represent other links between the new infectious nodes and the original subsystems. Supposing that the criteria for pseudo-partitioning is met at this stage (i.e. the relevant f p (.) = 1), for Figure 10a we obtain and for Figure 10b, we obtain We note that for cycle-partitioning with x > 0, both closure methods become equivalent (equation 18 reduces to equation 15) since the types of additional links drawn in Figure 10 could not be present. Notice that when the closure of triples always occurs (e.g. x = 0 cyclepartitioning or x = 0 size-partitioning), the variant of the pair-level models introduced in [8] and [9] is obtained under this closure. This variant is expected to be able to handle clustered networks more accurately than the variant considered in [11] and [12] that follows from equation 15.

Discussion
Recently it has been possible to establish exact and practicable representations of stochastic epidemic dynamics on finite tree networks [11] using closure methodologies evaluated at the level of individuals [8,9]. Message-passing also gives exact representations on trees [16] and this can be shown, under some circumstances with Poisson transmission processes, to be equivalent to moment closure models [13]. Under suitable and very restrictive homogeneity assumptions, population-level versions of these closed models (e.g. [5]) can also be exactly derived on idealised graphs with homogeneous initial conditions [8].
Within the individual-level closure construction, it is possible to go beyond trees and obtain exact representations of epidemic dynamics on some networks with cycles using the idea of dynamical partitioning on the graph [12]. Here we defined dynamical partitioning on arbitrary networks and also observed that it applies to both Markovian and non-Markovian SIR dynamics. In the Markovian case with Poisson transmission and removal processes, we can use dynamical partitioning to define exact SIR moment closure models. The extent to which these models are computationally viable depends primarily on the underlying structure of the network.
More specifically, starting from the probabilities of the states of individual nodes in a given network, we uniquely defined the full set of exact induced moment equations by automatically implementing dynamical partitioning where applicable. We also defined transmission blocks as a natural decomposition of a network for the closure of SIR models. Transmission blocks represent a possible extension of the block concept in graph theory into directed networks. Using this concept, we proved a theorem stating that the size of the largest subsystem state appearing in the set of moment equations is equal to the size of the largest transmission block.
We also investigated hierarchies of approximate moment closure models. In the epidemic literature, it is normally the case that moment closure models are constructed at the level of pairs, or occasionally at triples or quads [2,6,7]. This is often accompanied with an assertion that higher order models exist. However, to our knowledge, these higher order epidemic models have never been defined explicitly. This is understandable since these models rapidly become too complex to be of real practical relevance, but it does leave open the theoretical question of how these models can be defined [9]. To address this, we introduced 'pseudo-partitioning' to construct complete hierarchies of approximate closed models that are well-defined at all orders. In fact, we defined several hierarchies of closed models; one in terms of motif size, one in terms of the size of cycles in the network, and a hybrid method taking the best of both of the previous methods. Undoubtedly other hierarchies can be defined as well. In addition, we investigated two mechanisms of closure -one based on exact dynamical partitioning and the other which is more related to the Kirkwood closure.
The closure based directly around dynamical partitioning has the variant of the closure model considered by [11] as its zeroth order variant (for all of the size, cycle and hybrid approaches). The hierarchies based around the alternative closure all have the model introduced by Sharkey ( [8] and [9]) as their zeroth order variant (this is designed to handle networks with clustering in a more effective way). We also observed that the conditions for cycle-partitioning at orders greater than zero mean that both methods of closure become equivalent.
The hierarchies of models generated some interesting observations concerning the convergence to exactness with order. For example, for size-partitioning, the models converge to the exact solution with increasing order, but this convergence is not always monotonic (see Figure 8). It is typical for moment closure models of SIR epidemics to over-exaggerate the spread of an epidemic, but here we observed a counter example (see also [9]) where this is discussed as a possibility). An unanswered question is whether the approximate models always increase in accuracy as the order of the hierarchy increases. Intuitively we would expect that they do, and this is validated by the examples so far investigated. Figure 11: Demonstration for Lemma A.2: 'ways' in which a set W 3 = {i, j, k} can be generated from the pair {i, j}, where j is connected towards i. Note that W 3 is always a subset of some directed sub-block, and i is reachable from all others in both D[W 3 ] and the directed sub-block. The dashed arrows represent paths which may consist of any number of vertices. W 3 = {i, j, k} can be generated from the pair W 2 = {i, j}, with j connected towards i, then there is a link from k to either i or j. Further, from the definition of dynamical partitioning and the generating rule, there are two possibilities: 1) there exists two vertex disjoint paths P 1 , P 2 from some individual (which could be k) to both members of W 2 , and where k is the penultimate individual in one of these paths (see Figure 11a&c), or 2) there exists a path P 3 from one member of W 2 to the other, and k is the penultimate individual in this path (see Figure 11b&d). Note that in all cases depicted in Figure 11, W 3 is a subset of some directed sub-block in which i is reachable from all others (and i is reachable from all others in D[W 3 ]). Lemma A.3. If the statement made in Lemma A.1 is true for the case where |W | = n, then it is also true when |W | = n + 1.
Proof. Firstly, note that W n+1 , where |W n+1 | = n + 1, can be generated from some connected pair if and only if it can be generated from some set W n , where |W n | = n, which can itself be generated from some connected pair. Now suppose that Theorem A.1 is true for the case where |W | = n, and let W n be a set of size n that can be generated from some connected pair. Then we have a Figure 12: Demonstration for Lemma A.3. Here, the single node in X \ W is illustrative of the nodes in this set which must be connected by at least one path leading to node i, and where the underlying graph G[X] is biconnected. We have placed node k outside of X, but k ∈ X \ W is also permitted. a) shows k belonging to one of two vertex disjoint paths from some node to W and b) shows k as the penultimate individual in a path from a node in W to a different node in W . In either case, W n ∪{k} is seen to always be a subset of some Y ⊇ X where D[Y ] is a directed sub-block in which i is reachable from all others (and i is reachable from all others in W n ∪ {k}).
set X ⊇ W n such that D[X] is a directed sub-block where, without loss of generality, i ∈ W n ⊆ X is reachable from all others in both D[W n ] and D[X]. With reference to Figure 12, and again focusing only on directed links, if a set W n+1 = W n ∪ {k} (k / ∈ W n ) can be generated from W n , then either there exist two vertex disjoint paths P 1 , P 2 from some individual to two different members of W n and k is the penultimate individual in one of these paths (Figure 12a), or there exists a path P 3 from one member of W n to a different member of W n and k is the penultimate individual in this path (Figure 12b). This follows from the generating rule and the definition of dynamical partitioning. Note that if P 1 , P 2 exist then D[X ∪ P 1 ∪ P 2 ] is a directed sub-block in which i is reachable from all others (and i is reachable from all others in D[W n+1 ]). Similarly, if P 3 exists then D[X ∪ P 3 ] is a directed sub-block in which i is reachable from all others (and i is reachable from all others in D[W n+1 ]).
Lemma A.4. If there exists X ⊂ V such that D[X] is a directed sub-block, then there exists A X ∈ M E for some A.
Proof. If D[X] is a directed sub-block in which i ∈ X is reachable from all others, then there exists at least one arc (j, i) in D[X]. The lemma then follows from lemma A.5 below which proves that X can be generated from {i, j}. Proof. From Figure 12 but with k ∈ X we note that some set W ∪ {k}, where k ∈ X \W , can be generated from W if and only if there exist two vertex disjoint i k W Figure 13: Demonstration for Lemma A.5: shows the ways in which k ∈ X \ W can be added to W . We have cases a) the underlying graph of W ∪ {k} is not biconnected, b) Existence of path P 3, c) Existence of paths P 1 and P 2 and d) Existence of a node that is not connected to W . Cases a) and d) are not directed sub-blocks so the existence of paths P 1 and P 2 or path P 3 is established.
paths P 1 , P 2 from some individual to two different members of W and where k is the penultimate individual in one of these paths, or there exists a path P 3 from one member of W to a different member of W and k is the penultimate individual in this path. Our proof is by contradiction. We shall assume that neither of these scenarios hold and show that this contradicts the assumption that D[X] is a directed sub-block. The existence of paths P 1, P 2 and P 3 do not depend on the arcs which connect the members of W to one another. Therefore, let us remove all arcs between members of W and denote the resulting graph by D * . Every individual in X \ W is at the start of a path to some member of W in D * [X]. This follows because there exists i ∈ W that can be reached from all members of X in D[X]. Figure 13 shows the different types of method by which k can be connected to the nodes in W in the underlying graph. Firstly, the underlying graph in Figure 13a is not biconnected so here W ∪ {k} is not a directed sub-block. Secondly, Figures 13b and c correspond to the existence of path P 3 and the existence of paths P 1, P 2 respectively and hence W ∪ {k} is generated. Finally, Figure 13d has a node that is not connected to W and so D[X] cannot be a directed sub-block. Other more complicated variants of this path will also contain such nodes that are not connected to W . Hence, if paths P 1 or P 2 do not exist and path P 3 does not exist then W ∪ {k} is not a directed sub-block. Proof. For any W ⊂ V where at least one individual is reachable from all others in D[W ], if any i ∈ W is cycle-partitioning at order x ≥ n − 2 with respect to some j / ∈ W and W \ i, where (j, i) is an arc, then i is also dynamically partitioning with respect to j and W \ i. This follows because if i is not dynamically partitioning, then this implies the existence of a directed sub-block containing j, i and at least one other member of W , and which consists of more than n individuals due to the definition of cycle-partitioning. Therefore, by Lemma B.1, M C(x) only utilises genuine dynamical partitioning and we have M C(x) = M E (x ≥ n − 2).
Theorem B.2. If the smallest directed sub-block consists of n individuals, then all cycle-partitioning models of order x < n − 2 are equivalent to the pair-level models.
Proof. For any connected pair W ⊂ V (|W | = 2), if i ∈ W is not cyclepartitioning at order x < n − 2 with respect to j / ∈ W and W \ i, where (j, i) is an arc, then there exists a directed sub-block containing W ∪ j, and which consists of less than n individuals. Therefore, no such j can exist. From the way in which M C(x) is constructed, this means that no subsystem states larger than connected pairs emerge due to cycle-partitioning and we have the pair-level model, i.e. M C(x) = M C(0) (x < n − 2).
Remark. Together, Theorems B.1 and B.2 imply that the difference in size between the largest directed sub-block (or largest transmission block) and smallest directed sub-block gives an upper bound on the number of distinct models that the cycle-partitioning approach can provide. If all directed sub-blocks are the same size then no models that are distinct from the pair-level model and the exact dynamical partitioning model emerge. However, even when this difference is large the number of distinct models may sometimes be small, as was shown to be the case for the triangle lattice (where the difference is N − 3).