The Transmission Process: A Combinatorial Stochastic Process for the Evolution of Transmission Trees over Networks

We derive a combinatorial stochastic process for the evolution of the transmission tree over the infected vertices of a host contact network in a susceptible-infected (SI) model of an epidemic. Models of transmission trees are crucial to understanding the evolution of pathogen populations. We provide an explicit description of the transmission process on the product state space of (rooted planar ranked labelled) binary transmission trees and labelled host contact networks with SI-tags as a discrete-state continuous-time Markov chain. We give the exact probability of any transmission tree when the host contact network is a complete, star or path network – three illustrative examples. We then develop a biparametric Beta-splitting model that directly generates transmission trees with exact probabilities as a function of the model parameters, but without explicitly modeling the underlying contact network, and show that for specific values of the parameters we can recover the exact probabilities for our three example networks through the Markov chain construction that explictly models the underlying contact network. We use the maximum likelihood estimator (MLE) to consistently infer the two parameters driving the transmission process based on observations of the transmission trees and use the exact MLE to characterize equivalence classes over the space ∗Corresponding author Email addresses: raazesh.sainudiin@gmail.com (Raazesh Sainudiin), david.welch@auckland.ac.nz (David Welch) Preprint submitted to Journal of Theoretical Biology August 6, 2016 of contact networks with a single initial infection. An exploratory simulation study of the MLEs from transmission trees sampled from three other deterministic and four random families of classical contact networks is conducted to shed light on the relation between the MLEs of these families with some implications for statistical inference along with pointers to further extensions of our models. The insights developed here are also applicable to the simplest models of “meme” evolution in online social media networks through transmission events that can be distilled from observable actions such as ‘likes’, ‘mentions’, ‘retweets’ and ‘+1s’ along with any concomitant comments.


Introduction
The detailed picture of the path an epidemic takes through a population over its course is encapsulated in the transmission tree. The transmission tree represents the physical continuum of contacting hosts and thus frames the host-level structure within which pathogens are transmitted in a communicable disease. Therefore, models of transmission trees are crucial to understanding the evolution of pathogen populations. Constructing models of transmission trees is the main focus of this paper. Although we limit ourselves here to the epidemiological context of transmissions of a communicable disease over a contact network of hosts for concreteness of language and notions from a field with a longer research history, most of our basic results and insights are naturally applicable, as briefly discussed in Section 5.2, to the cultural context of transmissions of "memes" (Dawkins, 1976 p. 192) over a social network of individuals, such as Twitter (Solon, 2013). More generally, they can be used to model transmission events in Finite Markov Information Exchange processes (Aldous, 2013, Section 2.2) as described below.
To understand the process by which a transmission tree grows, we need to consider (i) the structure of the population in which the epidemic spreads and (ii) the state of the individuals in the population as the epidemic spreads. Network models are a natural candidate for describing population structure where the population is identified with a network in which each vertex represents an individual and an arc (a directed weighted edge) from vertex ı i to ı j , given by a non-negative , represents the propensity with which the infection can be transmitted from ı i to ı j . This propensity can be given meaning in terms of frequency of contacts by taking each > w 0 i j , to specify independent rate-w i j , Poisson process for the contact times between ı i and ı j , for instance (this is the "meeting process" of Aldous, 2013). We call these networks contact networks and assume that they are fixed or static through time. Thus, the contact network of a population summarizes "who can contact whom and how frequently" and is depicted in Fig. 1 (a) for a small population with vertices labelled by individuals ı ı ı … , , , 1 2 9 (the edges are undirected). Note that we sometimes label the vertices starting from ı 0 to stay true to the indexing convention in sageMath/python (but this should be clear from the context).
The epidemic state of each individual at a given time can be in one of the several possible states, depending on the particularities of the epidemic model. The simplest case, known as the SI model, involves only two states that indicate whether an individual at a given time is susceptible (S) to or infected (I) by a pathogen. Under this model, the only possible state transition is from S to I as specified by the contact network. In other words, a susceptible individual can be infected by any individual in its in-neighborhood who is already infected. The contact network with its individual vertices further "tagged" by their epidemic states (S or I) is called the tagged contact network. The epidemic states of the individuals in the population after some time are shown by tagging (coloring) the infected or susceptible individuals with I or S tags (red or white colors) in Fig. 1(b).
The transmission digraph is a directed edge-labelled subgraph of the contact network containing all infected vertices and directed edges labelled by the time of transmission. It is a basic object of interest and is depicted in Fig. 1(b). The transmission digraph can also be represented by the more convenient transmission tree shown in Fig. 1(c). The internal vertices of the transmission tree correspond to times of transmission events, the below (or left) and above (or right) planar sub-trees encode who infected whom, and the leaf vertices correspond to the set of infected individuals. Since the tagged contact network co-evolves with the transmission tree, the transmission process is naturally seen as a Markov chain on the product space of tagged contact networks and transmission trees. We consider a stochastic model, as opposed to a deterministic one, to be natural because the spread of an epidemic is inherently probabilistic (Andersson and Britton, 2000).
The transmission tree captures several details about how an infection spreads through the population, including combinatorial structural information such as who infected whom, order and timing of infection events, the time it takes for a specified set of individuals to be infected, tree shape statistics such as indices of Sackin (1975) and Colless (1982), number of cherries or sub-terminal vertices (McKenzie and Steel, 2000), etc., various isomorphism classes, such as (un)ranked/(non)planar unlabelled trees and so on, but also classical epidemiological univariate statistics, such as prevalence and incidence through time, reproduction numbers and total time of epidemic.
Furthermore, by a natural extension of the pure-birth process underpinning the SI model to a birth-and-death process that is combinatorially more involved with an additional epidemic state indicating whether the individual is "removed" (R) from the population, one can extend the transmission process developed here for the SI epidemic model to the more realistic susceptible-infected-recovered (SIR) epidemic model. With such an extension, which we will not pursue in this elementary study of the simplest SI epidemic model (for reasons explained below), the leaves of the SIR transmission trees will not only be tagged by I but also by R and they will naturally capture various univariate statistics of interest to applied epidemiologists including the final-size or total number of infections (Ludwig, 1975;Pellis et al., 2008;House et al., 2012). We outline a set of combinatorial steps needed towards such a future direction of work in Section 5.1.
While various analytical results (e.g. Andersson and Britton, 2000) and computationally intensive methods (e.g. House et al., 2012) are available for various univariate epidemiological statistics and can often be obtained without explicitly modelling the tree, most insights about the structural information in the tree (even for the simplest SI epidemic model) are difficult to derive analytically and so are based on simulation studies over parametric families of specific models.
Empirical efforts to understand the transmission process have historically focused on time series and individual event times (such as infection or recovery times) as the main data source. These relatively sparse forms of data have been difficult to collect and not particularly informative, providing limited information about the transmission tree (but see Haydon et al., 2003;Wallinga and Teunis, 2004) or the underlying contact network.
Recently, there has been an increasing attention paid to using the large amounts of viral and bacterial genomic data now available to study outbreaks. The key observation suggesting this data will be informative about the transmission tree is that, if there is little withinhost viral genetic diversity, the phylogenetic tree of pathogenic genomes will match the transmission tree (though, in many cases, this assumption does not hold, Romero-Severson et al., 2014;Ypma et al.,  . This insight has seen the rise of a new area of research, known as phylodynamics (Grenfell et al., 2004), that specifically treats genomic data in the context of infectious diseases.
The ultimate goal of phylodynamic methods would be to reconstruct the transmission tree (or some sampled subtree) and therefore any interesting properties of the epidemic process. To approach this goal, we need to have good models of how transmission trees grow which, in turn, requires a thorough understanding of how the structure of the network influences the topology of the transmission tree (Frost et al., 2015).
Previous work on how network structure influences tree topology used computer simulations to vary some property of the network while attempting to hold others constant and observing their influence on simulated transmission trees. For example, Leventhal et al. (2012) investigated a number of standard random network models (Erdős-Rényi, Barabási-Albert preferential attachment and Watts-Strogatz small-worldwe also analyze these networks via simulations in Section 4.2 in addition to seven other families of contact networks in Section 4) with a range of parameter values to show that gross changes in the network structure can cause significant and detectable changes in the resulting transmission tree, as measured using the Sackin index of tree imbalance. Frost and Volz (2012) suggest that while this effect is real, it may be swamped by other effects such as sampling strategy. O'Dea and Wilke (2011) concentrate on varying degree heterogeneity in the contact network while holding mean degree constant and also find that heterogeneity is detectable in the transmission tree using standard phylogenetic methods. Welch (2011) employs a simulation approach to study the effect of clustering on transmission trees using exponential family random graph models (ERGMs). Clustering is the most basic of pure network properties, reflecting transitivity (or anti-transitivity) in relationships: if edges (i, j) and (i, k) are present, then high (low) clustering in the network implies that ( ) j k , is more (less) likely present than when (i, j) and (i, k) are not present. While some changes in various measures of the transmission tree are observed as clustering changes over a wide range of values with degree distribution held constant, a strong effect is not observed suggesting that inference of the clustering property would be difficult. More recent work (Colijn and Gardy, 2014) describes a method that roughly classifies epidemics into host population structures such as homogeneous, super-spreading (Lloyd-Smith et al., 2005) or having a path-like contact network using machinelearning classifiers trained on simulated data.
There is no work that we know of that explicitly estimates a contact network as we have described it here based on transmission trees or genetic data, though some early, ad hoc attempts exist (Leigh Brown et al., 2011). There is a series of papers (Britton and O'Neill, 2002;Groendyke et al., 2011Groendyke et al., , 2012 that uses time-series data from epidemics to infer the parameters of an ERGM but the transmission tree here is incidental and poorly inferred. It is suggested in Groendyke et al. (2012) that inference within this framework would be greatly improved by having more informative data.
Thus, insights in the literature about the structural or topological information in the tree are primarily based on simulationintensive programs over parametric families of specific models of the epidemic and the contact network. Formalizing a large class of such simulation programs as a discrete-time Markov chain with transition probabilities in Eq. (2.1) that is embedded in the continuous-time Markov chain with generator in Eq. (2.2) is our first contribution. Such a formalization along with the SageMath/Python code in Appendix A.1 concretizes the meaning of the transmission process, which currently does not seem to be defined unambiguously in the literature.
Models for population structure have increased in complexity over the years; from simple homogeneous models over a static complete network in which each individual has an equal propensity to infect any other individual, to ones which incorporate varying degrees of heterogeneity across the population (who can infect whom) and through time (time-varying contact networks). Recent reviews in Pastor-Satorras et al. (2015) and Holme (2015) summarize this literature.
Basic population genetic models such as the coalescent (Kingman, 1982) that are used for phylodynamic inference assume a fully mixing population of genomes, an assumption that is typically violated in host populations when observed on the epidemic time-scale. Moving to a more complex model such as the structured coalescent (Hudson, 1990;Notohara, 1990) or multi-type branching process  allows incorporation of a few large population features such as country of sampling, but struggles to deal with more than four or five homogeneously mixing population groups at a time (Vaughan et al., 2014;Stadler and Bonhoeffer, 2012;Rasmussen et al., 2014) and is therefore far from the fine scale heterogeneity of a given static contact networkour main focus in this paper.
Although static networks are epidemiologically reasonable approximations when the speed of epidemic spread is much faster than the speed of change in the population's structure or vice versa in the case of annealed networks (Pastor-Satorras et al., 2015, III.E), our restriction to static networks in this paper is motivated by finding the simplest and yet interesting mathematical setting to formulate the transmission process. We restrict our attention to the simplest epidemic model on a given static contact network in order to focus on explicitly modelling the random transmission tree itself, as the epidemic spreads through the population. To the best of our knowledge, Markov models of transmission trees, over a fixed contact network, and their probabilities are not available explicitly as a function of both branch-lengths and tree topologies even for well-known networks. A straightforward derivation of the probabilities of transmission trees in Section 2.1 for some simple static contact networks from the general Markov chains of Eqs. (2.1) and (2.2) is the second contribution of this paper. These examples are meant to illustrate that the general formulae hold for some special cases of contact networks.
We also restrict our attention in this paper to the most basic transmission process given by an SI epidemic model in which hosts are either susceptible (S) to or infected (I) by a pathogen. Our restriction to the simplest model is due to the following reasons. First, this model can be seen as the two-state Finite Markov Information Exchange (FMIE) process (Aldous, 2013, Section 2.2) called the Pandemic Process (Aldous, 2013, Section 7) that is shown to be a fundamental building-block (Aldous, 2013, Section 3.2,7) for a large class of FMIE processes which includes various classical epidemic models (see Sections 8, 9 and references therein Aldous, 2013). For instance, the SI model exhibits the fastest possible spread of information in any FMIE model (Aldous, 2013, Section 3.2) and it approximates the initial time evolution of the SIS (where infectious hosts return to susceptibility) and SIR (where infectious hosts are removed from the population) models (Pastor-Satorras et al., 2015, II.A). Second, we are mainly interested in allowing the underlying contact network to be essentially "arbitrary", but fixed. Specifically, we develop a biparametric Beta-splitting family of models for the growth of transmission trees in Section 3 that has the following properties: gives the exact probability of any transmission tree as a function of α and β (Theorem 1), avoids having to explicitly model the underlying contact network that is typically unobservable, can be interpreted in terms of a Beta-splitting construction for the "infection potential" of the infector and the infectee in a transmission event, contains the models generated by the complete, star and path , 1 and approaches ( − ∞) 1, , respectively, has explicit expressions for its maximum likelihood estimators from independent observations of transmission trees and their sufficient statistics (Theorem 2), specifies an equivalence class of contact networks that have the same α β ( ) , -specified distribution of transmission trees (Theorem 3) and is amenable to exact probability calculations for various equivalence classes of transmission trees as rooted (leaf-unlabelled) trees that are (un)ranked/(non)planar with or without continuous branch-lengths based on the results for the same Beta-splitting model, but studied in the context of species diversification .
The Beta-splitting model is the third and perhaps the most important contribution of this paper and is to be contrasted with what is typically done in the literature since 2000 according to Aldous (2013, Section 2.4), whereby various quantitative statements (not of the structural properties of the transmission tree itself but of its univariate summary statistics such as the time for a random individual to be infected) are made on more complex models with increasingly elaborate update rules while considering only a standard number of fixed network "geometries' (or structures) as specific contact networks or as specific random contact networks.
Finally in Section 4 we explore, mostly by simulations, the nature of distributions on transmission trees that are induced by classical families of three other deterministic and four random contact networks through their most likely Beta-splitting model parameters. Furthermore, using a sequential family of contact networks that interpolates in the space of contact networks from the star network to the circular path network via the complete network by means of edge addition and deletion operations, we show that the maximum likelihood estimate of the Beta-splitting model that are obtained from the induced transmission trees over each contact network in the family also smoothly interpolates, in the parameter space [ − ∞] 1, 2 , between that for the complete, star and path networks. These insights lead to some implications for statistical inference as described in Section 4.4. This is the fourth and final contribution of our paper. In summary, the rest of the paper is organized as follows. In Section 2 we introduce the model for the random growth of a transmission tree over an arbitrary contact network as a discretestate continuous-time Markov chain and give examples of transmission trees on three specific deterministic networks. In Section 3 we introduce a parametric Beta-splitting model for the transmission tree, derive the likelihood for a given tree, explore the relationship between this Beta-splitting model and the coupled transmission tree-contact network Markov chain model described in Section 2, obtain sufficient statistics, derive numerically robust expressions for the maximum likelihood estimator, and characterize the equivalence class of contact networks with the same Beta-splitting model of transmission trees. In Section 4 we gather insights through the most likely Beta-splitting models fitted to independent samples drawn from the distributions over transmission trees that are induced by various deterministic and random contact networks carefully chosen from several classical families of networks with implications for statistical inference. In Section 5 we discuss future directions that this work may take.

Model
Consider a population of n individuals with labels in be a map from the set of nonnegative integers Z ≔{ …} + 0, 1, 2, to the set of natural numbers no greater than n, [ ]≔{ … } n n 1, 2, , , so that, I ı ∈ ( ) i z n denotes the z-th infected individual as the epidemic evolves in the population. Thus, ı ( ) i 0 is the initially infected individual in the population. In the example of Fig. 2 Augment each vertex ı j in I n with a binary status tag: Thus the status of each vertex I ı ∈ j n is: Let k n be the complete weighted directed graph or network over the vertex set I n with weighted directed edge set Let 2 w n be the power set of w n , i.e., the set of all subsets of w n . For the given set of labelled individuals in the population I n , let the susceptible-infected contact network or SICN be the double that is composed of a weighted directed edge set ∈ w 2 w n and status tags of the individuals I ∈ { } s 0, 1 n . Now, for each Z ∈ + z , let Z ( ) → + c z : n give the SICN at discrete time z standing for the z-th infection event.
We can view the discrete-time discrete-space Markov chain with state space × n n , the product space of n , rooted planar ranked leaf-labelled binary transmission trees, and n , the set of SICNs on I n . A sample path of this Markov chain for a population of size 3 is shown in Fig. 2. We give the one-step transition probabilities for this Markov chain next. Let denote the label of the left or right vertex, respectively, subtending from the internal vertex labelled by m in τ( ) z , the transmission tree at time z. Let L τ ( ( )) z denote the set of leaf vertices, i.e., the set of potential infectors, of τ( ) z and let ı ı ( ( ) ) w cz , ; i j denote the weight of the edge between vertices labelled by ı i and ı j in c(z), the SICN at time z. Then, the one-step transition probabilities for the discrete-time discrete-space transmission Markov chain is: By the immediate precedence relation: 1 , we mean that τ ( ( + ) ( + )) z cz 1 , 1 can be obtained from τ ( ( ) ( )) z c z , by a single transmission event. Note that τ ( + ( + )) L z z 1; 1 and τ ( + ( + )) R z z 1; 1 are the latest or ( + ) z 1 -th infector and infectee labels in I n .
Thus, in other words, the transition probability of reaching state τ , the weight of the edge from the ( + ) z 1 -th infector to the ( + ) z 1 -th infectee, that is normalized by the sum of the edge-weights ı ı ( ( ) ) ℓ w cz , ; j from every potential infector, i.e., L ı τ ∀ ∈ ( ( )) ℓ z , to every potential infectee within its susceptible outneighborhood of the SICN at time z, i.e., I ı ∀ ∈ j n such that ( ) = s z 1 j . Independent samples of transmission trees from the Markov chain with transition probabilities in Eq. (2.1) over a given SICN C and an initial infected individual initialI can be generated using transmissionProcessTC(C,initialI), an algorithmic implementation using SageMath/Python (http://www.sagemath. org) in Appendix A.1.
By allowing the time for each infection event to be exponentially distributed with rate λ > 0, we obtain a continuoustime discrete-space Markov chain from the jump chain in Eq. (2.1) with the following generator: Note that the parameter λ is usually called β in the epidemiology literature; we use λ to avoid confusion with notation introduced later in the article.
Remark 1. This continuous-time transmission Markov chain and its embedded jump chain are nonparametric since the underlying state space allows for transmission trees to encode an SI epidemic evolving on arbitrary contact networks, i.e., any element of 2 w n . We mainly formulate the model to be concrete about what is typically simulated by computational epidemiologists. We will often, as done in epidemiology, assume that the edges are bi-directional or "undirected". We also focus on connected contact graphs under the assumption that the ideas can be applied to each connected component of a disconnected contact network (but see Section 5 for generalization to generic digraphs that may contain a strongly connected giant component).
To gain concrete insights, let us consider the generator of Eq. (2.2) for three specific cases of the contact network.

Examples
Let us look at Eq. (2.2) for specific initial SICN and initial distributions for the 0-th infected individual. We focus on three of the simplest contact networks to concretely study the effect on the transmission tree distributions they induce.

Transmission on complete network
If the contact network is initially the complete network, i.e., complete weighted directed graph, k n on I n with weights ı ı ( ) = w , 1 i j for each ı ı ≠ i j , then since there are z infected individuals and − n z individuals in each of their susceptible out-neighborhoods after the z-th infection event, the one-step transition probability in Eq. (2.1) simplifies to the following: Due to independent exponential waiting times at rate λ, the probability of a transmission tree with branch-lengths 1: , after m infection events, is:  Note that when = − z n 1 and the entire population is infected, then each of the discrete transmission trees (ignoring the branchlengths) with n leaves labelled by I n is equally likely: Thus, the number of discrete transmission trees over the complete SI contact network, initialized uniformly at random from any individual in I n , for different values of ∈ { …} n 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, is given respectively by: 2,12,144,2880,86400,3628800,203212800,14631321600,1316818944000, .

Transmission on star network
If the only initially infected individual is I ı ı = ∈ ( ) ⋆ i n 0 and the initial SI contact network is the star network, ⋆ n , centered at ı ⋆ with directed edge weights , then since there are − n z individuals in the non-empty susceptible outneighborhood of the only possible infector ı ⋆ after the z-th infection event, the one-step transition probability in Eq. (2.1) simplifies to the following:  Note that if = − z n 1 and the entire population is infected then each of the discrete transmission trees, with the "left-branching comb" topology (ignoring the branch-lengths) such that the leftmost leaf is labelled by the first infected individual ı ı =  Thus, the number of discrete transmission trees over a star contact network on I n with the initially infected individual having degree − n 1 is ( − )! n 1 (Fig. 3).

Transmission on path network
If the contact network is the path network on I n with directed edge weights equalling 1 along a linear path, and the initial infected individual ı ( ) i 0 is at the beginning or source vertex of the path, then since there is exactly 1 individual in the non-empty susceptible out-neighborhood of the only possible infector after the z-th infection event, the one-step transition probability in Eq.
(2.1) simplifies to the following: with the corresponding star network ⋆ 3 with vertices colored by their susceptible (lightly shaded) or infected status (darkly shaded) over a population of 3 individuals labelled by I ı ı ı = { } , , 3 1 2 3 . After the first transmission event from ı 3 to ı 1 with probability 1/2, the transmission tree splits with the internal vertex labelling the first infection event by 1 and the first infector ı 3 labelling its left leaf vertex and the first infectee ı ı = ( ) i 1 1 labelling its right leaf vertex (middle panel). In the final absorbing state (right panel), with probability 1, the transmission tree has a new internal vertex labelled by 2 for the second infection event with its left leaf vertex labelled by the second infector ı 3 and its right leaf vertex labelled by the second infectee ı ı = ( ) i 2 2 . (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.) beginning of the path and 0 otherwise. Then the probability of a discrete transmission tree τ( ) Due to independent exponential waiting times at rate λ, the probability of a transmission tree with branch-lengths Thus when = − z n 1 and the entire population is infected, the discrete transmission tree with the "right-branching comb" topology (ignoring the branch-lengths) with the right-most leaf labelled by the latest infectee is the only possible one (Fig. 4).

Branch-lengths
We can obtain the expected branch-length of the transmission tree between the ( − ) z 1 -th and z-th infection event or equivalently when there are z infected individuals by simply taking the mean of the exponentially distributed holding-time random variable in the generators given by Eqs. (2.4), (2.8) and (2.12) as shown in Fig. 5.
Here we take the 0-th infection event as the initial infection.
Thus, if the underlying SI contact network is k n then initially at the start of the transmission, the transition rate is λ × ( − ) where T 1 is the duration of the epoch when there is only one infected individual. In general, T z is the duration of time when there are z infected individuals and is the length of the transmission tree when there are z branches, where ∈ [ − ] z n 1 . The transition rate λ × ( − ) z n z increases and the expected branch-length λ ( ) = ( × ( − )) E T z n z 1/ z decreases at the z-th infection event as z increases to n/2. The expected branch-length is smallest at λ ( ) n 4/ 2 when = z n/2 and then starts increasing to λ ( ( − )) n 1/ 1 as → − z n 1 when all n individuals are infected. This is shown as a "bath-tub" curve in Fig. 5. This means that the branch length of the tree at the z-th transmission step, which gives the duration of continuous time taken for the z-th infection event, will have mean length λ ( × ( − )) z n z 1/ , such that any one of the z infected leaf vertices can branch with uniform probability z 1/ at equal rate λ( − ) n z to infect one of the ( − ) n z susceptible (and yet uninfected) individuals with uniform probability ( − ) n z 1/ . The sampling distribution of branch-lengths between consecutive infection events from 500 independent simulations of the transmission tree is shown in Fig. 6 and two typical transmission trees with branch-lengths and topologies over the complete SI contact network for a population of size n¼ 50 is shown in Fig. 7.
Furthermore, by rescaling time in units of population size with λ = ( − ) n 1/ 1 , the time of the z-th infection event, T z , is independent exponential random variable with rate ( − ) ( − ) z n z n / 1 Fig. 5. Expected branch-lengths when there are z infection events or + z 1 infected individuals, ( ) E T z , for the three cases. Here n¼ 50 and λ = 1. and vertices colored by their susceptible (lightly shaded) or infected status (darkly shaded) over a population of 3 individuals labelled by I ı ı ı = { } , , 3 1 2 3 . After the first transmission event from ı 3 to ı 1 with probability 1, the transmission tree splits with the internal vertex labelling the first infection event by 1 and the first infector ı 3 labelling its left leaf vertex and the first infectee ı ı = ( ) i 1 1 labelling its right leaf vertex (middle panel). In the final absorbing state (right panel), with probability 1, the transmission tree has a new internal vertex labelled by 2 for the second infection event with its left leaf vertex labelled by the second infector ı 1 and its right leaf vertex labelled by the second infectee ı ı = ( ) i 2 2 . (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.) and satisfies the following randomly shifted-logistic-limit (see for e.g. Aldous, 2013, Eq. 7.13): where F is the logistic function: 1 , when the SI contact network is the star network (⋆ n ) or the path network (p n ), is inversely proportional to ( − ) n z or independent of z and n with ( ) E T z equalling λ ( ( − )) n z 1/ or λ 1/ , respectively, as depicted in Fig. 5.

A biparametric Beta-splitting transmission process
We gave a non-parametric description of the transmission process for arbitrary contact networks in the previous section. This Markov construction over the state space of SI contact networks and transmission trees can be too detailed. Often, one does not have knowledge of the state space at this detailed resolution so it is useful to construct transmission processes without explicitly tracking the underlying SI contact network. Here, we give a parametric construction for such a process, by integrating over a Beta-splitting family of transmission trees with interval-labelled leaves, that captures the three Examples in Sections 2.1.1-2.1.3 as special cases.
The biparametric Beta-splitting model is described in Sainudiin and Véber (2016) for evolutionary trees. We adapt that construction here for transmission trees. To match the standard definition of the Beta distribution, for any α β > , 1 as the Beta-splitting density (for α β > − , 1 ), with density proportional to . This parametric choice corresponds to that used by Aldous (2001) for the symmetric case with α β = .
be a sequence of independent and identically distributed (i.i.d.) random variables, with the . Thus, each of these variables takes its values in [ ] 0, 1 . We call the generating sequence for the Beta-splitting trees. It will be the basis of an incremental construction of transmission tree as a labelled ranked planar binary tree with m leaves and − m 1 internal vertices. Our core idea relies on decomposing the transmission tree construction into two stages: (1) constructing a random transmission tree without infector-infectee leaf labels such that it biparametrically captures an essential aspect of the underlying SI contact network's structure and (2) labelling the leaf vertices with infected individuals from I n for each transmission or splitting event from stage (1). Stage (2) is optional and the construction of transmission trees without leaf labels from I n can be obtained just from stage (1)such leaf-unlabelled transmission trees can still provide useful prior distributions for integration during inference with partial observations.
Stage (1) of the transmission tree construction involves a deterministic mapping followed by an integration. We first describe the deterministic mapping that takes a realization of the generating sequence N ( ) ∈ G z z and turns it into a Beta-splitting tree, i.e. a planar binary tree in which the internal vertices are ranked with integer labels and the leaves are labelled by subintervals that partition [ ] 0, 1 . We then describe an integration over α β ( ) , -specific random partitions by such sub-intervals.
As we shall see below, the integer labels of the internal vertices will give the order in which these vertices have been split during the construction, i.e., the order of infections or successful transmissions. The interval labels of the leaves will form a partition of the interval [ ] 0, 1 and will be used to decide which leaf is split and becomes an internal vertex in the next step. The left and right leaf vertices resulting from a split stand for the infector and infectee in the underlying (unobserved) SI contact network after the infection event. Let be a realization of the generating sequence. The organizing map O(g) proceeds incrementally as follows, until the tree created has m internal vertices and + m 1 leaves. We start with a single root vertex, labelled by the interval [ ] 0, 1 .
Step 1: Split the root into a left leaf with interval label [ . Change the label of the root to the integer 1. Step

then split the left child vertex of the root into a left leaf and a right leaf labelled by
, then instead split the right child vertex of the root into left and right leaves with respective labels [ . Label the former leaf that is split during this step by 2.
Step z: Find the leaf whose label [ ] a b , contains u z . Change its label to the integer z and split it into a left leaf with label [ Stop at the end of Step m.
In other words, at each step z, the interval labels of the leaves form a partition of the interval [ ] 0, 1 . We find the next leaf vertex to be split by checking which leaf interval contains the corresponding u z and then b z is used to split the interval of that former leaf, say with z . Thus, the width of the left interval of a current leaf vertex that is about to be split should be constructed by the Beta-splitting density such that it is proportional to all infection events that will subtend from the current infector and its future infectees after this infection event.
Similarly, the width of the right leaf label of this current leaf vertex should be such that it is proportional to all infection events that will subtend from the current infectee and its future infectees. Intuitively, one can think of the width of the interval label of a leaf vertex as the infection potential of the individual associated with that leaf and the widths of the left and right interval labels upon a split or an infection event as the infection potentials of the infector and the infectee, respectively, after the event. Thus, the Beta-splitting trees capture the essence of transmission trees that are co-evolving with underlying SI contact networks, without explicitly requiring complete knowledge of the networks during their construction. The internal vertex just created is then labelled by z to record the order of the splits. At the end of step z, the tree has + z 1 leaves, and so we stop the procedure at step m to ensure + m 1 leaves, where ≤ ≤ − m n 1 1. Fig. 8 shows an example of such a Beta-splitting tree construction for m¼3.
After the Beta-splitting construction, we first integrate over N ( ) ∈ G z z to "erase" the interval-valued leaf labels and then assign infected individuals in I n as leaf labels to obtain transmission trees from integrated Beta-splitting trees. These trees have m integerlabelled internal vertices or splits and + m 1 unlabelled leaves. The process of assigning leaf labels from I n via a pre-order traversal on the m internal vertices, in increasing order, i.e., Stage (2) of the construction, is described next.
We start with the internal vertex labelled 1 and assign the initial infected individual ı ( ) i 0 to its left child vertex. Then we assign the first infectee to the right child vertex of 1. In general, as we descend down the internal vertices of the integrated Beta-splitting tree in increasing order of its integer labels we slide the individual label ı ℓ to the left of the split and assign a new label to the right vertex as the infectee ı j chosen according to the infectee distribution for ı ℓ : the probability that ı ℓ infects ı j at discrete time-step z. This distribution is defined to be generic on purpose, without necessarily making explicit reference to c(z), the underlying SI contact network at time z, that is typically unknown or partially known. We can always obtain specific form for Eq. (3.2) by making explicit assumptions on c(z) via the infector-specific infectee distribution within Eq. (2.1):

Probability of a given Beta-splitting transmission tree
For a given (leaf-unlabelled) ranked planar tree, and an internal vertex labelled by i, let us write s i L (resp., s i R ) for the number of internal vertices in the left (resp., right) subtree below vertex i. In particular, if vertex i subtends two leaves, then Theorem 1. The probability of any discrete transmission tree τ( ) m with m splits and + m 1 leaves under the integrated Beta-splitting model is: was defined in Eq. (3.1) and Pr i 0 . We now focus on the first term in Eq. (3.4) which results from integrating over the ( , , the probability that it is split during the z-th step is − b a, the probability that the uniform , 0 ,1. If it is chosen to split, it is given label z and the left and right leaves created are labelled by intervals of respective lengths Then these intervals may split later, but into intervals of lengths that are always proportional to B z or − B 1 z (respectively). Now the probability of the tree τ is the product of the m probabilities of choosing a given leaf to split at each step, each of which is equal to the length of the interval labelling that leaf. As a consequence, each split occurring in the left subtree below vertex z brings in another B z in the product, or another − B 1 z if the split occurs in the right subtree below vertex z. Averaging over the possible values of the B z 's, which are independent α β ( + + ) 1, 1 random variables, yields the result.
Finally, to prove the first term in Eq. (3.5) we exploit the fact that s L and s R are non-negative integers and repeatedly apply the following elementary properties of the beta function: Remark 2. Note that the expression for the probability of a transmission tree with m infection events given by Eq. (3.5) as a function of the parameters α and β, i.e., the likelihood function, only involves additions, multiplications and divisions. It is therefore numerically more robust during local optimization for maximum likelihood estimation in Section 3.3 than the expression in Eq. (3.4), which further requires numerical evaluations of the beta function.

Examples
Now we reconsider the three specific SI contact networks and show that they arise for specific values of α and β.
Recall that α β ( ) B , is related to the Gamma function Γ by the and that Γ β 3.2.1. Complete network underlies Beta-splitting transmission trees with α β = = 0 Let us assume that the initial infection is uniformly distributed in I n and that the SICN is the complete contact network k n with unit weights as in Section 2.1.1 and show that the probability of the discrete transmission tree after m infections has the same probability as Section 2.5.
The first term in Eq. (3.4) with α β = = 0 simplifies as follows: where the second equality is obtained by observing that is the number of internal vertices of the tree rooted at vertex z, which is the left or the right subtree below the internal vertex z. Hence, each term ! s z L in the numerator of the product cancels with the term in the denominator that corresponds to the left child vertex of z, except if = s 0 z L and the left child vertex of z is a leaf. But in this case, != 0 1 by convention. The same holds true for each of the ! s z R . Likewise, the terms in the denominator which are not compensated by some term in the numerator are those corresponding to internal vertices having no ancestral vertices. But the only such vertex is the root (z¼ 1) with . This gives us the result.
From Eq. (3.3), the infectee probability is uniformly distributed over − n z infectees for each infector at time-step z and thus the second term in Eq. (3.4) simplifies to: Finally, putting Eqs. (3.9) and (3.10) into Eq. (3.4), we get the desired identity with Eq. (2.5). Since the probabilities of the discrete transmission trees are identical between the integrated Betasplitting trees with α β = = 0 and the construction of Section 2.1.1 with an explicit complete SI contact network, the continuous-time process will also be identical to Eq. (2.4) due to independent Exponential rates for the infection events.
Remark 3. The transmission tree thus constructed with α β = = 0 corresponds to Yule (1924) model for evolutionary trees (ignoring planarity and leaf labels). This Beta-splitting construction is very different from the standard evolutionary construction of the Yule tree, in which the next leaf to split is chosen uniformly at random from among the current set of leaves. Here the choice of the next split is dictated by the lengths of the intervals labelling the current leaves, which will all be distinct with probability one. However, by averaging over the law of the generating sequence (when α β = = 0) yields the same uniform distribution on rooted ranked planar binary trees with m splits and + m 1 unlabelled leaves. These ! m many trees are in bijective correspondence with permutations of { … } m 1, , through the increasing binary tree-lifting operation (Flajolet and Sedgewick, 2009, Ex 17, p. 132).

Star network underlies Beta-splitting transmission trees with
To obtain a left-branching comb we let α β ( ) , approach the limiting bottom-right corner (∞ − ) , 1 of the parameter space. As α → ∞ from the left and β → − 1 from the above, the α β ( + + ) 1, 1 distribution concentrates on the boundary of the support at 1. In the limit, each random variable B z in the generating sequence takes the value 1, with probability 1. Thus, the root is first split into a left leaf with label [ ] 0, 1 and a right leaf with label { } 1 (i.e., an interval reduced to a single point 1). Next, the uniform random variable U 2 belongs to the interval [ ] 0, 1 with probability one, so that the left leaf labelled by [ ] 0, 1 is necessarily that chosen to split next. Again, it is split into two leaves with left leaf label [ ] 0, 1 and right leaf label { } 1 , implying that the next leaf to split is again the left one which inherited the full interval [ ] 0, 1 with probability one. This recursive reasoning can be carried on until step m with + m 1 leaves. Hence, morally the tree corresponding to α → ∞ and β → − 1 is a fully unbalanced tree, with a left-branching comb with + m 1 leaves. See Fig. 9 for an example with m¼3. Recall that this is exactly the transmission tree obtained when the underlying SICN is the star network of Section 2.1.2.
For Stage (2) of the construction where we assign leaf labels to the integrated Beta-splitting tree we assume that the underlying SICN is the star network initialized at the source vertex. Since there is only one discrete transmission tree topology, i.e., the leftbranching comb, we can label the leaves of the integrated Betasplitting tree in n z 1 1 many ways to obtain the same probability in Eq. (2.9) for the discrete transmission tree with individual leaf labels from I n .
3.2.3. Path network underlies Beta-splitting transmission trees with α β → − → ∞ 1, By an analogous argument to that in Section 3.2.2 with β → ∞ and α → − 1, the α β ( + + ) 1, 1 distribution concentrates on the boundary of the support at 0 and each random variable B z in the generating sequence takes the value 0, with probability 1. Thus, the only discrete transmission tree topology for the Beta-splitting tree with α β ( ) → ( − ∞) , 1 , , the limiting top-left corner of the parameter space, is the right-branching comb shown in Fig. 9 (b), the same one obtained by assuming that the underlying SICN is the path network in Section 2.1.3. By further assuming that the underlying SICN is the path network for the leaf-labelling Stage (2) with the initial infection spreading from the individual ı ↪ at the beginning of the path as in Section 2.1.3, we obtain exactly one possible labelling and obtain the same probability in Eq. (2.13).

Maximum likelihood estimation and sufficient statistics
In order to find the maximum likelihood estimates of α and β for the Beta-splitting model that give the most likely explanation for the transmission trees sampled from an arbitrary SICN (under the likelihood principle), we use the following inferential procedure: Step 1: Generate a sample of r independent transmission trees τ τ τ ( … ) , , , r 1 2 from: the given SICN C and initial infected individual initialI by calling transmissionProcessTC(C,initialI) in Appendix A.1 r times.
Step 2: Compute α β ( ) , , the maximum likelihood estimate (MLE) of the parameters by maximizing the likelihood function as follows: a r gm a x P r ; , .
The probability of the tree τ i for a given α β , is obtained from a post-order traversal of τ i to compute the first term in Eq. (3.4). To focus on the jump chain's discrete structural information in the transmission trees, our likelihood of the transmission tree ignores leaf labels and the waiting times between events as implemented in Appendix A.2. Note that such additional information can be easily incorporated into more elaborate likelihood functions derived from Eq. (2.2) as outlined in Remarks 6 and 7.
Theorem 2. The likelihood of all r transmission tree topologies only depends on the sufficient statistic of split-pair frequencies: Therefore, the maximum likelihood point estimate for the Beta-splitting model based on r independent transmission trees, each with n leaves, is obtained by maximizing the per-vertex loglikelihood: Using the fact that s L and s R are non-negative integers we can exploit the properties of the beta function given by Eq. (3.7) to further simplify the likelihood function from Eq. (3.14), as follows:  To focus on the jump chain's discrete structural information about the combinatorial skeletons of the contact network buried within the distribution over discrete transmission trees, as a necessary prelude to Theorem 3, our likelihood expressions in Eqs.
(3.12)-(3.15) are skeletal and ignore leaf labels and the waiting times between events (see code in Appendix A.2). However, the likelihood function can be extended as discussed in Section 4.4.3.
The demonstration at the end of Appendix A.2 shows two independent MLE computations based on r ¼10 independent transmission trees (without branch-lengths and leaf labels) that were sampled from the complete SICN on n ¼50 vertices. The MLE α β ( ) , takes the following realizations: ( − − ) 0.0664, 0.0502 and ( − ) 0.0047, 0.0430 . As expected, these are close to α β ( ) = ( ) , 0 ,0, the parameters of the Beta-splitting model corresponding to the transmission tree distribution generated from the complete SICN. The variability in MLE is expected due to natural sampling variability. The MLE is ( ) 0.0279, 0.0325 from another trial based on r ¼1000 independent transmission trees drawn from the same complete SICN on 50 vertices. Fig. 10 shows the sufficient statistics for the parameters α and β in these three trials.

Equivalence class of contact networks with the same Betasplitting model
The maximum likelihood estimate and the sufficient statistics for the parameters from Section 3.3 finally lead to a partitioning of all contact networks by an equivalence relation of having the same effective Beta-splitting model for transmission trees. Consider n 0 , the set of all initial SICNs, i.e., the set of all SI-tagged contact networks on n vertices with a single initial infector labelled without loss of generality by ı 0 and from whom the infection can spread to the remaining − n 1 individuals: ı ı ı … − , , , n 0 1 1 . Thus, n 0 is the set of all initial distributions starting from a single individual for our Markov chain in (2.11), i.e. with initial condition τ ( ( ) ( )) c 0 , 0 , where τ( ) 0 is the tree with ı 0 as its only vertex and ( ) c 0 is the initial SICN. Note that ( ) c 0 , being an SICN, encodes that ı 0 is infected and all other vertices are uninfected initially. Let τ ( ( )) c Pr ; 0 be the probability distribution on the space of transmission trees with n leaves (without branch-lengths) at the end of the transmission process (when all n individuals are infected) after starting from ( ) c 0 according to (2.11). Finally let { (( ) ( )) ( ) ∈ } s s c s s Pr , ; 0 : , L R L R n be the probability distribution on the split-pairs in n that is further induced by τ ( ( )) c Pr ; 0 . Note that as the number of independent transmission trees r approaches infinity, the empirical relative in the per-vertex loglikelihood of Eq. (3.12) will produce an asymptotic or exact and possibly set-valued MLE in a deterministic manner without any standard error that is caused by finite r. We use this fact to prove Theorem 3.  (3.19). In other words, if two initial SICNs in n 0 have the same distribution of split-pairs on n then they are indistinguishable by the exact MLE of the Beta-splitting model. We refer to the following map as the Beta-projection of the initial SICNs into the quarter-plane: We simply define the map and its inverse image in Eq. (3.18). □ Note that this equivalence relation is based on the rooted ranked planar binary topology of the tree and ignores other informative statistics of the transmission tree, including waiting times and individual labels.
Remark 4. In practice, we will only be able to obtain the estimated effective Beta-splitting model associated with an initial SICN by finding α β ( ) , , the MLE from finitely many transmission trees drawn from the transmission process (i.e., with < ∞ r and positive standard error). As shown in Table 1 and Fig. 11, the MLEs can distinguish different underlying contact networks even when r is finite.

Elements from the equivalence class of the source-initialized path network
To show that this equivalence relation is non-trivial, we give a concrete example of an equivalence class that not only contains the path network initialized at the source vertex (as in Section 3.2.3) but also n other initial SICNs. Recall that the only discrete transmission tree topology for the Beta-splitting tree with α β ( ) → ( − ∞) , 1 , is the right-branching comb, the same one obtained by assuming that the underlying SICN is the path network with the initial infection spreading from the individual, say ı 0 , at the beginning of the path or the source vertex as in Section 2.1.3. To obtain other initial SICNs in the same equivalence class as the path network that is initially infected at the source vertex, let us consider the unidirectional circular network 1 0 . We can imagine the network being laid out on the plane along a circle. It is clear that we can have the infection initialized from any ı i and it will spread sequentially along the circular path until all remaining individuals are infected in tandem, say anti-clockwise. Thus the transmission tree (ignoring leaf labels) generated on the unidirectional circular network by starting from any one of the n individuals is identical to the right-branching comb under the path network initialized at the source vertex. This simple construction gives us + n 1 initial SICNs that belong to this equivalence class from two different underlying networks, namely circular and linear (unidirectional) path networks, but with the same relative split-pair frequencies given by the following uniform distribution on a side boundary of n , provided ≥ n 2:

L R L R n
This is due to each of the r independent transmission trees being identically equal to the right-branching comb with split-pairs: 2 , 0, 3 , , 0, 1 , 0, 0 . Furthermore, in this simple example the probability of a split-pair is identical to its relative frequency for any ≥ r 1 due to all probability being concentrated on one tree, i.e.,

Classical families of contact networks and some inferential implications
We have already seen the values of α and β that prescribe the exact distribution over transmission trees when the SI-tagged contact network or SICN is the complete, path or star network.
Here we explore other families of deterministic and random contact networks, primarily via simulations (using the generic code in Appendices A.1 and A.3), in order to obtain the sampling distributions they induce on the space of transmission trees. We further use samples from this distribution to obtain the maximum likelihood estimates (MLEs) for their corresponding effective Betasplitting models (with code in Appendix A.2). These explorations are meant to strengthen one's intuition about the influence of various classical families of SICNs on the MLEs and their standard errors over ⎡ ⎣ ) − ∞ 1, 2 , the shared parameter space of their effective Beta-splitting models. As discussed in Section 4.4, there are some natural inferential implications from these insights that can go well beyond the classical frequentist point estimation under the likelihood principle that is primarily pursued here. More concretely, we simulate r transmission trees from various classical families of host contact networks using the Markov chain in Eq. (2.1) and tabulate the maximum likelihood estimates for α and β corresponding to their effective Beta-splitting models in the practical sense of Theorem 3 (with < ∞ r ) as per Remark 4. We study transmissions on three more deterministic and four random contact networks or parametric families of them. The mean and standard error (s.e.) of the MLEs α and β based on transmission trees simulated from various contact networks over a few trials are tabulated in Table 1 with their IDs, and these IDs are depicted pictorially in Fig. 11. These simulation results, which we will try to make sense of below, do indicate that the MLEs can indeed distinguish different underlying contact networks to an extent even when r is finite. A more exhaustive study of other contact networks is possible by extending the generic code in Section Appendices A.1 and A.2 beyond the ten models studied here (as coded in Appendix A.3). We warn that the local optimization routines used here are non-rigorous although we have been careful by using multiple initial conditions and choosing the numerically most stable expressions for the likelihood functions for each case. Ideally, the MLEs should be rigorously enclosed using interval arithmetic even through the natural interval extension (Hofschuster and Krämer, 2003) of the per-vertex likelihood function (which we have not done here).

Deterministic contact networks
We have already seen the complete, path and star networks as our guiding examples in Section 2.1 and their corresponding exact Betasplitting models in Section 3.2. We also saw that the unidirectional circular network is in the same effective Beta-splitting equivalence class as the path network in Section 3.4.1. In this section we explore a few key families of deterministic contact networks to further extend our insights by interpolations of the ones already seen, when possible.

Bidirectional circular path network
Let us extend the unidirectional circular network of Section 3.4.1 to a bidirectional circular network by making each edge bidirected (or undirected). Thus is the edge set for this bidirectional circular network. With bidirectionality, we can have randomness in the transmission trees unlike the deterministic right-branching comb for the unidirectional case. This is because the next infection event can be either in the left or the right subtree of the root vertex encoding the first infection event. The initial infector could be any one of the vertices due to circular symmetry of the network and we take this to be ı 0 without loss of generality. Thus the probability that the next infection is in the right or the left Table 1 The mean and standard error (s.e.) of the MLEs α and β based on transmission trees simulated from various contact networks in replicated trials. Here, s.e. is the sample standard deviation over the trials.

ID
Contact network n r Trials α (s.e.) β (s.e.) subtree of the root is equally likely, provided the current number of leaves (number of individuals infected) is less than n and each edgeweight in this contact network is 1. Finally each of these subtrees will be deterministically right-branching combs due to the fact that there is only one infected individual in each subtree that is capable of infecting its only uninfected out-neighbor, if any. We call such trees with only right-branching subtrees of the root vertex as orbsor trees. Two such orbsor trees generated from the bidirectional contact network with n¼6 and initialized from ı 0 are shown in Fig. 12. Putting all these facts together we can directly see that the probability distribution on transmission trees is uniformly distributed over the − 2 n 1 orbsor trees that form as subset of the !( − )! n n 1 many possible transmission trees, as follows: if is an orbsor tree with leaves, For comparison with our three core examples, recall that the distribution on transmission trees induced by the complete, star and path networks are also uniformly distributed on the following subsets of the set of transmission trees: the entire set, the set of ( − )! n 1 left-branching comb trees and the singleton set of the right-branching comb tree, respectively. Thus, our first four deterministic contact networks induce uniform distributions on various subsets of the set of all transmission trees.
As shown in Table 1, α β ( ) , , the mean MLE of the effective Betasplitting model based on r¼1 or r¼ 100 transmission trees drawn from such a network over n¼50 individuals is close to ( − ) 0.98, 1.5 .
They are depicted by IDs 4 and 5 on the top left corner of the zoomedout image at the bottom of Fig. 11. Although some combinatorial bookkeeping may allow one to analytically pursue the transformation of the distribution in Eq. (4.1) to n (see last paragraph of Sections 4.4.1 and 5.3), we begin to content ourselves with mere simulation results in preparation for more complicated networks that are not easily amenable to extractions of exact expressions. The mean MLE of ( − ) 0.98, 1.5 for this bidirectional path network, that is binomially composing the right-branching comb of the path network on either side of its root vertex, makes intuitive sense because it is not as extreme as ( − ∞) 1, , the MLE of the path network at the boundary of the parameter space in its limit (with ID 3 that can be imagined in Fig. 11).

Balanced tree network
is the perfectly balanced tree of height ≥ h 1 and whose root has degree ≥ d 2. The number of vertices in this network is and the number of edges is − n 1. Balanced tree networks can be thought of as a biparametric extension of the star network which is equivalent to ( − ) n BalancedTree 1, 1 . The transmission tree generated on such perfectly balanced tree contact networks is unique if we ignore vertex labels and branch-lengths. We refer to them as left-branching d-sharks in the visual spirit of left-branching combs for star networks. Instead of giving a recursive formula for these trees we illustrate them by examples for ( ) BalancedTree 3, 2 and ( ) BalancedTree 2, 3 in Fig. 13.
contact network produces a distribution on transmission trees that is concentrated on a single   left-branching d-shark tree (ignoring all vertex labels and branchlengths) and when = − d n 1 and h¼1 the left-branching ( − ) n 1 -shark tree is the left-branching comb tree for the star contact network. The mean MLE of the effective Beta-splitting models are shown in Table 1 for various values of d and h. As d and h approach − n 1 and 1, respectively, while keeping the population size n as close to 1000 as possible, the corresponding mean MLEs are approaching (∞ − ) , 1 , the limiting MLE of the star network with ID 2 and the ( ) BalancedTree 999, 1 with ID 12 as expected. This tendency toward (∞ − ) , 1 is depicted by the sequence of IDs 6-11 in Fig. 11. The standard errors are zero due to the uniqueness of the transmission tree (at the sufficient but not minimally sufficient resolution of rooted, unranked, planar and leaf-unlabelled tree) that is realized under each ( ) d h BalancedTree , contact network.

Toroidal regular grid network
We identify the vertices along the two pairs of opposite edges and the three pairs of opposite faces in the regular finite 2-dimensional (2D) square grid with × = n n n individual vertices and the 3-dimensional (3D) cube grid with × × = n n n n 3 3 3 individual vertices, respectively. A 2D toroidal grid with n ¼9 is shown in Fig. 14. The figure also shows the sampling variation in three independent transmission trees grown on the network with initial infection at ı 0 . Due to the toroidal structure, the transmission tree distributions are invariant to the initial infection (ignoring leaf labels). The mean MLEs of the effective Beta-splitting model corresponding to 2D and 3D toroidal grids with n around 10 3 and 10 4 are depicted in Table 1. The mean MLEs based on one transmission tree seem to be fairly concentrated about ( − − ) 0.9, 0.66 and ( − − ) 0.76, 0.5 for contact networks on toroidal 2D and 3D grids with about = n 10 4 individuals, respectively. There is also an effect towards ( − − ) 1, 1 for the 2D and 3D cases as n increases from about 10 3 to 10 4 as depicted by the ID pairs (13, 14) and (15,16) in Fig. 11, respectively.

Random contact networks
Although random graph models of contact networks add another level of randomness, we can informally think of a static contact network as a typical realization of a random network model (Aldous, 2013, Section 2.5). Thus the transmission process on any given static contact network can be used to provide insights into the sampling distribution of transmission trees for a large class of random network models already available in SageMath's graph libraries. For example, the following code: can produce 1000 independent samples of transmission trees from 1000 independent realizations of the random k-regular graph over n vertices. We explore some basic random graph models to gain insights into the distribution of transmission trees and the MLEs of their effective Beta-splitting models.
Remark 5. Let us note that one may also study the distribution of transmission trees for a specific realization of a given random contact network or its partially observed sub-network. Such subtle distinctions, which in turn will depend on the exact decision problem and the available data at hand, can be pursued by modifying our basic code in Appendix Appendix A. But here we limit ourselves to the random sense involving multiple independent trials such that each trial is a realization of a full transmission tree with n leaves on each realization of a given random contact network with an initially infected node (i.e., a static initial SICN).

Erdős-Rényi random network
The Erdős-Rényi random network denoted by ( ) n p ER , on n vertices is obtained by inserting each of the ( − ) n n 1 /2 undirected edges independently with probability p (Erdős and Rényi, 1959;Gilbert, 1959). We interpret the undirected edges as being bidirected to obtain our random contact network ( ) n p ER , to study transmission trees evolving on the connected component of ( ) n p ER , containing the initially infected individual ı 0 . Thus ( ) n p ER , becomes more dense and approaches the complete network as the edge probability → p 1 or equivalently as the average vertex degree λ≔ → np n. We observed more than 90 vertices on average in the connected component containing ı 0 if > p 0.03 when n ¼100. This is sensible because ( ) n p ER , is known theoretically to have a unique giant component containing a positive fraction of the vertices almost surely if λ > 1. We are primarily interested in the regime where > ( ) = ( ) ≊ p n n log / log 100 /100 0.0461 (in Table 1) or λ > ( )≊ log 100 4.61 (in Table 15) when ( ) n p ER , is known theoretically to be connected almost surely so that all n individuals can be eventually infected from the initial infection at ı 0 .
Maximum likelihood estimates α and β of the effective Betasplitting model based on r¼30 independent transmission trees that were grown on independent realizations of λ ( ) ER 100, /100 and replicated in five independent trials are shown in Fig. 15 as a function of λ. The mean and standard errors of the MLEs are also given in Table 1.
Note that the standard errors are higher when compared to those of the deterministic contact networks if r¼1 (results not shown). This is naturally due to the additional randomness introduced by distinct realizations of the ( ) n p ER , random contact network across the trials. The standard errors in Table 1 have been reduced by increasing r to 30. As expected, the MLEs plotted as points in Fig. 15 and the mean MLEs in Table 1 and their corresponding IDs 17-24 in Fig. 11 approach the origin as λ approaches 100 and the contact network ( ) ER 100, 1 becomes the complete network.
Interestingly ID 6 of the deterministic ( ) BalancedTree 2, 9 has its MLE fairly close to the mean MLE of ( ) ER 100, 0.050 with ID 19. When h is increased by 1 from 6 (with n¼127) to 9 (with n¼1023), the mean MLE of ( ) BalancedTree 2, 9 starts from ( − − ) 0.3259, 0.05752 and reaches ( − − ) 0.4052, 0.1477 piece-wise linearly with different slopes slightly over 1. These results are in sync with a standard probability trick of approximating a sparse connected ER using trees.

Random regular network
Each vertex in a d-regular graph has the same degree d or number of neighbors. A d-regular directed graph or network must also satisfy the stronger condition that the indegree and outdegree of each vertex are equal to each other. A random d-regular graph is a graph drawn uniformly at random from the set of all d-regular graphs on n vertices, where ≤ < d n 3 and nd is even (Bollobás, 2001). We merely interpret the undirected edges as being bidirected to obtain ( ) n d RandReg , , the random d-regular network on n vertices. Note that ( ) n d RandReg , approaches the complete network on n vertices as d approaches − n 1. This is evident in the behavior of the mean MLEs obtained from r ¼1 transmission tree grown on the ( ) n d RandReg , contact network on n ¼1000 vertices for values of d in { } 3, 4, 6, 10, 100, 999 as shown in Table 1 and by their IDs 25-30 approaching ID 0 of the complete network with the same n and r at the origin in Fig. 11. Note that Model ID 28 with d ¼10 and n¼ 1000 nearly satisfies the condition that = ( ) −ϵ d O n 1/3 for the underlying randomized algorithm (Steger and Wormald, 1999) to generate an asymptotically uniform random d-regular graph on n vertices (Kim and Vu, 2003).

Connected small-world random network
The small-world random graph model of Watts and Strogatz (1998) on n vertices is constructed by first creating a ring over n vertices (undirected circular path graph). Then each vertex in the ring is connected with its m nearest neighbors if m is even (and its − m 1 nearest neighbors if m is odd). Finally edges are rewired as follows: for each edge (u,v) in the underlying n-ring with m nearest neighbors graph, with probability p rewire (u,v) as the new edge (u, w) with randomly chosen existing vertex w. The undirected edges are interpreted as bidirected and we repeatedly sample until we obtain a connected small world random network ( ) n m q SWRN , , that is specified by the three parameters: n for the number of vertices, m for the number of nearest neighbors each vertex is connected to and q for the probability of rewiring each edge. Thus, ( ) n m q SWRN , , accounts for clustering and at least partially explains the "smallworld" phenomena that is observed in a variety of real-world networks while retaining the short average path lengths of the Erdős-Rényi random graph ( ) n p ER , . We consider two possible initializations, i.e. two different initial SICNs for a given realization of ( ) n m q SWRN , , . In the first case, denoted by ( ) ⁎ n m q SWRN , , , we grow the transmission tree from the vertex with the largest out-degree. If more than one vertex has the maximal out-degree then we choose the vertex with the smallest label. In the second case, denoted by°( ) n m q SWRN , , , we grow the transmission tree from a randomly chosen vertex. The maximum likelihood estimates for their effective Beta-splitting models are obtained from r independent transmission trees grown on these random networks with the two initializations having different parameters as shown in Table 1 with their IDs. The two initializations basically coincide when q ¼0.
We mainly explore the case with m equalling 2 and 5 with 1 and 2 neighbors, respectively, on either side of each vertex initially, for a few values of the rewiring probability ∈ { } q 0.2, 0.5, 0.99 with ∈ { } n 50, 100 and ∈ { } r 1, 30 as shown in Table 1 and depicted with their IDs 31-47 in Fig. 11. More interpretable IDs, in the sense of having variation in only one parameter in the family, are shown by the same color with lines connecting them.
( ) n m q SWRN , , interpolates between the n-ring with m nearest neighbors network and ( ) n p ER , , such that, as → q 1, ( ) ( )→ ( ( − ) ) n m q n nm n n SWRN , , ER , / 1 . When q is close to 1, we do find that the MLEs of°( ) SWRN 100, 5, 0.99 with ID 46, with initial infection chosen uniformly at random just as in the ( ) n p ER , contact network, are roughly closer to those for ( ) ER 100, 0.050 with ID 19 since ( ( − ))≊ nm n n / 1 0 . 0 5 when compared with°( ) q SWRN 100, 5, with smaller values of q (results not shown). An interesting observation is the proximity of the mean MLEs for certain 2D and 3D toroidal grids to certain SWRN s: (i) ID 15 of the 3D toroidal grid with n¼ 1000 vertices (each connected to its six nearest neighbors) and ID 38 of°( ) SWRN 50, 5, 0.2 (each of its 50 vertices initially connected to its four nearest neighbors before being rewired with probability 0.2), (ii) ID 16 of the 3D toroidal grid with n¼10 648 and IDs 36 and 37 corresponding to ( ) ⁎ SWRN 50, 5, 0.1 and°( ) SWRN 50, 5, 0.1 , respectively, and (iii) ID 14 of the 2D toroidal grid with = n 10 4 and°( ) SWRN 100, 2, 0.5 with ID 45. More extensive simulations are necessary to systematically understand these proximities (using rigorous global interval optimization techniques, say in Hofschuster and Krämer (2003), for the MLEs as opposed to the local optimizations used here). Insights from such rigorous simulations may lead to further analysis towards understanding the nature of such proximities between these distinct model families under the Beta-projections of Eq. (3.18), especially for different values of n. Other insights from Fig. 11 include the effect of changing n and the initialization strategy.

Preferential attachment random network
Next we explore the random network created using the preferential attachment model of Barabási and Albert (1999). Realworld networks are best described by a scale-free power-law distribution for their degrees. The preferential attachment model, unlike the other random graph models here, produces such a power-law degree distribution through two generic mechanisms: (i) networks are grown by the addition of new vertices and (ii) new vertices attach preferentially to existing vertices that are already well connected (i.e. with a high degree). The randomized algorithm for the construction of the preferential attachment network ( ) n m PrefAttach , is as follows. First, a graph with m vertices and no edges is initialized, and a graph of n vertices is grown by attaching new vertices, each with m edges, to existing vertices independently according to probability proportional to their degrees. This results in a preferential attachment behavior whereby new vertices are attached to existing vertices that already have a high degree. We interpret the undirected edges as bidirected to obtain a network.
We also use two different initializations to grow a random transmission tree on a realization of this random contact network. In the°( ) n m PrefAttach , random SICN we initialize the infection uniformly at random from one of the initial m individuals with no edges at the beginning of its construction. For the ( ) ⁎ n m PrefAttach , contact network we find the smallest vertex label with the largest out-degree, i.e. the most preferentially attached vertex (which may not necessarily be the first vertex to be attached), as the initial infected individual. We make the distinction because transmission trees can only be grown on ( ) ⁎ n m PrefAttach , after the preferential attachment model has completed its construction of the network with all n vertices. In contrast, for°( ) n m PrefAttach , we can grow the transmission tree even while the preferential attachment model is constructing the underlying contact network. The mean MLEs of the effective Beta-splitting model corresponding to r¼30 independent transmission trees grown over independent realizations of these two variants of the preferential attachment contact networks are shown in Table 1. The bursty or starry nature of the hubs or popular vertices is evident in the left-branching tendency of the transmission trees grown on such preferential attachment networks as shown in Fig. 16. The IDs 50-54 correspond to ( ) m PrefAttach 100, where m takes the values 1, 2, 3, 5 and 10, respectively, show the mean MLEs approach toward the origin where the mean MLE of the complete SICN occurs in Fig. 11 it cannot reach the origin since ≪ m n for ( ) n m PrefAttach , . This is sensible since more of the m vertices will get attached to in the initial steps of the algorithm that is constructing the preferential attachment network and thereby reduce the preferential attachment tendency for larger values of m.

A family of contact networks interpolating the star, complete and path networks
In Section 3.2 we saw that the distribution on discrete transmission trees generated by the Beta-splitting model with α β ( ) , taking (limiting) values (∞ − ) , 1 , ( ) 0, 0 , and ( − ∞) 1, corresponds exactly to that under ⋆ n (the star SICN), k n (the complete SICN) and p n (the path SICN), respectively. Since these three specific SICNs seem to be isolated instances of all possible SICNs, we next show that other SICNs that sequentially interpolate between ⋆ n , k n and p n can be constructed such that their transmission tree distributions correspond to that under the Beta-splitting model with α β ( ) , values that also sequentially interpolate between (∞ − ) , 1 , ( ) 0, 0 and ( − ∞) 1, . Using the inferential procedure of Section 3.3 we can consistently estimate the α β ( ) , parameters of the best-fitting (most likely) Beta-splitting transmission process from a set of transmission trees generated from the transmission process on any given SICN. Next we present a family of SICNs that interpolate our three SICNs: one at the origin and two at extremes of our parameter space This sequence is shown for n ¼5 in the bottom row of Fig. 17 Fig. 17 for n¼ 5. Putting this sequence of networks between ⋆ n and k n and the other between k n and p n we get a total of − n 2 2 networks (including ⋆ n , k n and p n ). This sequence can be generated for any n using the function star2Complete2Path(n) in Appendix A.4.
We can finally see in Fig. 18 how the probability density function (PDF) of the MLEs, α β ( ) , , change as we sequentially vary the SICN in the family that interpolates from the star network (red hue) to the circular path network (pink hue) via the complete network (blue hue). The MLEs is based on 10 independent transmission trees simulated from each SICN in the sequence of 98 SICNs over a population of size n¼50. In Fig. 18, the hue of the PDFs sequentially change from red which is concentrated entirely on the boundary at 1 (star network), to orange and yellow which are decreasing their concentration at 1 due to disappearance of the star's signal from the larger neighborhoods of the circulant graphs A ( ) n C , i . As i approaches − n 1 the green and azure hues of the PDFs become increasingly uniform around blue when the SICN is the complete network. The hue of the PDFs becomes purple and start concentrating at 0 as the SICN approaches the path network that is fully concentrated at 0 (pink hue). The pattern of the PDFs is stochastic since it is based on MLEs from just 10 samples. However, it clearly demonstrates that the interpolating sequence in the space of SICNs does convey continuity in the parameter space of α β ( ) , . In other words, this suggests that there is an α β ( ) , under the Beta-splitting model (recall that the Beta-splitting model need not explicitly refer to the contact network), that best fits the distribution of transmission trees generated from any specific contact network.
The sufficient statistics of split-pair frequencies for α and β from various stages of the interpolating family of SICNs spanning the star, complete and path network are shown in Fig. 19. Note how these sufficient statistics are also changing gradually as expected.

Implications for statistical inference
We have deliberately confined the inferential aspects of this study to the classical maximum likelihood estimator in the simplest sampling setting involving fully grown transmission trees with n leaves over r independent trials. This merely amounts to inferring the underlying contact network via multiple random breadth-first expansions that are encoded through rooted, ranked, planar, labelled, and binary (spanning) trees, i.e. our transmission trees. We now give some basic insights into two other natural approaches to inference for applied network scientists and also provide some natural extensions of the likelihood function.

An L 1 perspective
We could have also defined the effective Beta-splitting model by minimizing the total variation distance between , the probabilities generated by the Beta-splitting model as follows: One elegant aspect of this L 1 -minimizing rule, as opposed to the likelihood maximizing rule behind MLE, is its desirable non-parametric decision-theoretic properties, such as being a metric on all probability distributions over n , having easily interpretable distances in the universal scale of [ ] 0, 1 , and having well-known relations to other statistical notions of divergence. For these reasons we believe that the L 1 -minimizing rule, if efficiently implementable, would lead to better statistical properties for the task of characterizing the equivalence class based on empiricalP of relative  analogous to the loglikelihood expressions. Such efficient expressions require further transformation of the probabilities at the rooted unranked planar and leaf-unlabelled resolution of transmission trees (Sainudiin and Véber, 2016, Lemma 4.1) onto n , which is nothing but the planar unordered cousin of Aldous' shape statistics sequence (Sainudiin et al., 2015, Eq. (4.1)). Note that Aldous' shape statistics sequence, even in its original non-planar form (Aldous, 2001), can itself be further projected to various tree shape statistics (Sainudiin et al., , p. 1231) used routinely in simulation-based studies of transmission trees sketched earlier in Section 1.

A Bayesian perspective
We saw that the L 1 perspective has some merits, provided appropriate expressions can be obtained. In defense of the likelihood principle on the other hand, the exact likelihood expressions developed here are most useful to Bayesian methods of inference that further incorporate prior information with the likelihood function based on finite samples. For example, if we know that the underlying contact network is complete then we can invoke appropriate parametric families of prior distributions centered at (0,0), the known MLE for the complete network, with variance in priors reflecting our extent of this prior knowledge. Using the simulation-based approach of Section 4 we can obtain the effective Beta-splitting model parameters with standard errors that can inform the formulation of appropriate prior distributions. Such a prior formulation can not only reflect the sample sizes, number of leaves in possibly partially observed transmission trees, etc. at our disposal, but also various aspects of the contact networks from other sources of information such as (i) transportation networks and cell-signal triangulated or GPS-based location networks, etc. that underpin coarse structural information of the host contact process in the context of a communicable disease or (ii) Twitter follower networks in the context of "cultural" transmission events that are defined merely to be retweets or mentions, for example (see Section 5.2).
Thus, inducing priors over the parameter space ⎡ ⎣ ) − ∞ 1, 2 for more complex analytically intractable models through their effective Beta-splitting models will now be possible, due to our efficient likelihood expressions, using current semi-parametric and variational Bayes methods, including the composition of different effective Beta-splitting models as finite mixtures with possibly unknown number of components over different sub-network neighborhoods covering the entire contact network.

Natural extensions of the likelihood function
Next we provide some natural extensions of the likelihood expressions developed here.
Remark 6. The likelihood function can be extended to include branch-lengthsespecially under the independent but possibly non-identical exponential waiting-times assumption in the generator of the encompassing continuous time Markov chain given by Eq. (2.2), as is the case for the complete, star and path networks studied here. Leaf labels can also be added according to a tractable labelling process of the population as per Eq. (3.6).
Remark 7. More crucially, the likelihood expressions naturally generalize to transmission trees that are only partially grown or in the process of growing at the time of observation so that they have fewer than n leaves encoding all currently infected individuals. For such partial likelihood expressions one merely needs to specify τ ( ) i , the set of internal vertices in the i-th partially grown transmission tree τ i , in Eq. (3.13) and let this specification imply to its consequent expressions. Also note that the second product over internal vertices within tree τ i in Eq. (3.13) allows each τ ( ) i to be distinct with its own size, i.e., τ | ( )| ∈ { … − } n 1, 2, , 1 i . Such natural extensions of the likelihood expressions may be necessary in Fig. 19. The sufficient statistics of split-pair frequencies for α and β shown as the empirical mass function from 1000 independent transmission trees over a population of size n¼ 100 at various stages of the interpolating family spanning the star, complete (in the center of the panel with 100 vertices and 9900 edges) and path networks (the subplots are labelled by the number of vertices and edges in the host contact network). applied contexts and do produce maximum likelihood estimates with larger standard errors at least in some cases (results not shown here). However, their asymptotic consistency is expected to depend on the details of the SICN itself and on how often one samples trees with nearly n leaves and/or on the initialization mechanisms for the first infector across the r independently drawn partial trees. These partial likelihoods are implementable by modifying the lists being comprehended over vertices in Appendix A.1 and/or Appendix A.2, for instance.
In this work, for concreteness and clarity, we keep the transmission trees full with n leaves representing the final state when the infection has invaded the whole population. Due to this assumption, the unordered split-fair frequencies in Eq. (3.11) become sufficient for α and β. When allowing for multiple but independent partial trees of possibly different sizes less than n as per Remark 7, the unordered split-fair frequencies will still remain sufficient, provided the sampling strategy is asymptotically consistent for the underlying initial SICN, albeit one may only be observing split-pair frequencies over a subset of n .
The sufficiency of unordered split-fair frequencies over n may no longer hold for (i) more complex extensions of the likelihood which may involve non-static or temporally varying networks or for (ii) multiple simultaneous partial transmission trees grown on the same static network up to possibly random discrete stopping time M from a set of initially infected vertices (especially when the set of their infected vertices become mutually incident before time M). Simplest extensions of the Beta-splitting model via finite mixtures for instance could be constructed to approximate such more realistic models by limiting oneself to the sufficient statistics of ordered split-pair frequencies: where z is the discrete time index, and over products of such ordered split-pair frequencies. We could allow z to belong to Z 0 and even allow → ∞ n in order to study contact networks in the large population limit with or without appropriate rescaling of discrete time into the continuum, for the purposes of ignoring combinatorial complications of the modelled individuals in such limits, where possible and sensible.

Discussion
We give a probabilistic description of the transmission process in Section 2 as a Markov chain on the product space of SI-tagged contact networks (SICNs) and transmission trees in discrete and continuous time. The Markov chain is also constructed as a randomized algorithm in the SageMath/Python code in Appendix A.1. This formalizes a large class of simulation programs in the computational epidemiology literature as a transmission process. The probabilities of transmission trees as an explicit function of both branch-lengths and tree topologies are derived in Section 2.1 from the general Markov chains of Eqs. (2.1) and (2.2) for some simple static contact networks.
Although the Markov chain model is general and only needs a directed weighted graph, our examples were limited to simple connected networks with each weight set to 1 in order to focus on the combinatorial skeletons of their associated topological Markov chains. It is relatively straightforward to consider the dynamics on more general networks using the richer language for digraphs (Pastor-Satorras et al., 2015, Fig. 4). For example, the epidemic will spread to the strongly connected giant component (if it exists) and the giant out-component, provided the infection starts from one of the vertices in either the strongly connected giant component or in a giant in-component.
We then develop a biparametric Beta-splitting family of models for the growth of transmission trees in Section 3 that gives the exact probability of any transmission tree as a function of α > − 1 and β > − 1. The model can be interpreted in terms of a Beta-splitting construction for the "infection potential" of the infector and the infectee. Thus, the model captures aspects of the underlying contact network up to how its contact structure affects the infection potential of the infector and infectee after the infection event. The approach avoids the explicit modelling of the underlying contact network (that is typically unobserved or only partially observed) in order to grow transmission trees, unlike the general Markov chain models of Eqs.
(2.1) and (2.2). The Beta-splitting family of models is shown analytically to contain the models generated by the complete network (k n ) when α β ( ) , equals ( ) 0, 0 , star network (⋆ n ) when α β ( ) → (∞ − ) , , 1 and path network (p n ) when α β ( ) → ( − ∞) , 1 , . We also derive explicit expressions for the maximum likelihood estimator and sufficient statistics of split-pair frequencies for the Beta-splitting model from independent observations of the transmission trees. Using the distributions on split-pairs we specify equivalence classes of initial SICNs that are indistinguishable by their Beta-splitting transmission trees with the same effective Beta-splitting model through their Beta-projections into the quarter-plane ( − ∞) 1, 2 as conjectured in Sainudiin and Welch (2015). We have also shown by simulations coupled with an inferential maximum likelihood procedure that the best-fitting parameters of the effective Beta-splitting models based on samples of trees grown over (i) six deterministic contact networks and four random contact networks seem to be well-separated under their Betaprojections into ( − ∞) 1, 2 and (ii) the Beta-projections of a sequential family of SI-tagged contact networks from ⋆ n to k n to p n do indeed change gradually in ( − ∞) 1, 2 from (∞ − ) , 1 to ( ) 0, 0 to ( − ∞) 1, . Various natural implications for statistical inference were outlined for future work. In the following sections we discuss some obstacles that need to be overcome for a few other important extensions of this work.

Towards births and deaths
Recall from Section 1 that our construction needs a tagged contact network. We only focus on the simplest binary tags (S and I) here and thus limited ourselves to SI-tagged contact networks (SICNs). This simple setting allowed us to obtain our main results here because of the underlying core process being one of the pure birth events, whereby individuals "are born" from the set of vertices tagged by S into the set tagged by I at some rate. Thus, there is no "death" here, in the sense of vertices with tag I being retagged for "removal" with tag R (SIR model). The tag R denotes recovery from the infection (after having learnt how to fight the infection at some other rate). In the SIR model the R-tagged individual does not get retagged by S or I. One could represent such a sequence of birth and death events by a binary sequence that could be encoded by Dyck paths that satisfy natural conditions depending on the encoding for the SIR model. By further incorporating such Dyck paths and the three tags one could extend the Markov chains defined here and try to study the distributions they induce on trees with leaves recruited from the vertices with the tag S into vertices with tag I and those with tag I replaced by tag R (and frozen, in the sense of not having any further descendants) at different rates. Extensions to SIR model which allows for the "removal" of infected individuals from the population at a given rate is also conceivable via mapping to percolation on semi-directed networks (see Pastor-Satorras et al., 2015, V.B.4 and the references therein).
By considering birth and death processes, as opposed to a pure birth process, one can also make progress on developing transmission processes with only two tags for the more complex SIS epidemic model that not only allows susceptible individuals to become infected by any infected individual at a given "birth" rate but also allows infected individuals to become susceptible again according to a given "death" rate. Such as model is interesting from a discrete dynamics viewpoint since one could allow the discrete time z to continue to infinity and study time-asymptotic distributions over SIS transmission treesthe same set of transmission trees for the Markov chains developed here, but with the number of leaf nodes being allowed to be any appropriate number in { … } n 0, 1, , at time Z ∈ z 0 , and with all leaf vertices tagged by I (as implicitly done here) with an absorbing state when the tree has 0 leaves with no infected individual or by conditioning on specific set of Dyck paths (Addario-Berry and Reed, 2008) that remove such absorption events. The Markov chains for these problems will need more complicated state spaces and immediate precedence rules for state transitions that take the possibly conditioned Dyck paths into account. It is not clear how one could extend the Beta-splitting construction in an interpretable manner for these settings.

Towards cultural transmissions
The insights developed here through the SICNs and their transmission trees are also applicable to the simplest models of "meme" (Dawkins, 1976, p. 192) evolution in online social media networks (Solon, 2013) through transmission events that can be distilled (admittedly naively) from observable actions such as "likes", "mentions", "retweets" and "þ1s" along with any concomitant comments. See a dataset spotlight (Risdal, 2016) in the official blog of Kaggle.com for a specific extremist cultural context. We would like to point out that the 55 models and their parameter choices were partly informed by empirical insights from extracting, transforming, loading and exploring the raw tweets available to Twitter developers via DataFrame and GraphFrame APIs in Apache Spark (Zaharia et al., 2010).
Although the SI model studied here is the simplest two-state Finite Markov Information Exchange (FMIE) process (Aldous, 2013, Section 2.2) called the Pandemic Process (Aldous, 2013, Section 7), it is shown to be a fundamental building-block (Aldous, 2013, Section 3.2,7) for a large class of FMIE processes which includes various classical epidemic models (see Aldous, 2013, Sections 8, 9) and has some remarkable properties: (i) SI model exhibits the fastest possible spread of information in any FMIE model (Aldous, 2013, Section 3.2) and (ii) it approximates the initial time evolution of the SIS (where infectious hosts return to susceptibility) and SIR (where infectious hosts are removed from the population) models (Pastor-Satorras et al., 2015, II.A). These properties of the model make the Beta-splitting tree distributions we provide here particularly useful for extensions into applied operations research along Markov control processes that are aimed at influencing the growth of certain undesirable aspects of transmission trees, over carefully filtered (Risdal, 2016) extremist cultural networks, through interventions orchestrated from appropriate control spaces, including artificially intelligent chatbots, for instance.

Towards other tree resolutions
We only looked at the resolution of leaf-labelled and leaf-unlabelled transmission trees with and without branch-lengths in this work. Transmission trees are rooted, binary, ranked, and planar. Fortunately, it is straightforward to carry over these probabilities to planar unranked trees, nonplanar ranked trees and nonplanar unranked trees using the explicit formulae and code in Sainudiin and Véber (2016). These formulae can be used to conduct simulation intensive inference based on projections of the transmission trees onto coarser tree shape statistics or used as prior distributions to constrain the micro-structure of the continuum of contacting hosts in space-time within which the pathogens can evolve through transmission events.

Towards dynamic contact networks
The jump Markov chain of the transmission process on static SItagged contact networks (SICNs) is a prerequisite for contemplating appropriate partial orders on the set of all SICNs in order to define natural transitions in the state space that can allow for contact networks to vary in time by possibly depending on the current state of the tagged contact network as well as the transmission treea natural state space for formalizing epidemics over adaptive or coevolving contact networks. Such adaptive contact networks are known in simulation studies to be highly sensitive to the structure of the initial contact network (see Pastor-Satorras et al., 2015, VII.B.7 and the references therein) in complete agreement with Theorem 3 on equivalence classes over initial SICNs.
Let n be the poset under subset ordering of the connected elements of 2 w n , the power set of the edge set w n of k n , the complete network with unit edge-weights. By fixing an initial infector, say ı 0 , we can use the Beta-projection: ( ) → ( − ∞) ↓ w : 1 , n 2 , to map each (connected) contact network w in n to the exact maximum likelihood estimate α β ( ) ∈ ( − ∞) , 1 , 2 while maintaining the partial ordering between contact networks in n . Such a planar geometric embedding of the contact networks, given by α β ( ) ↓ , into the quarter-plane can help one gain a more systematic understanding of the connection between the transmission tree distributions specified by the Betasplitting model at α β ( ) , and that specified directly by the initial contact networks in α β ( ) ↑ , . Future research on Markov chains with transitions over partially ordered contact networks as well as transmission trees could build upon insights from our simpler setting here. .