A Unified Framework for Complex Networks with Degree Trichotomy Based on Markov Chains

This paper establishes a Markov chain model as a unified framework for describing the evolution processes of complex networks. The unique feature of the proposed model is its ability to capture the formation mechanism behind the "trichotomy" observed in degree distributions, based on which closed-form solutions can be derived. Classical models, including Poisson, exponential, and power-law distributed networks, arise as important special cases of the proposed unified framework. Both simulation and experimental results demonstrate a good match of the proposed model with real datasets, showing its superiority over the classical models. Implications of the model for various applications, including citation analysis, online social networks, and vehicular network design, are also discussed in the paper.

dynamics. In continuum theory, the degree grows deterministically over time and nodes arrive with a fixed interarrival time. There are clear deviations of continuum theory from the realistic environment, which is random, as the attachment process (i.e., new edge formation) is a random choice in every complex network. One would expect different assumptions on the random environment to lead to different network characteristics, while continuum theory fails to characterize those assumptions explicitly due to its deterministic approximation. In the MC model, the choice of the random environment is explicitly specified as Markovian; specifically, the network size has independent increments, and a new node's arrival rate depends only on the current number of nodes in the network. Third, closed-form formulas for the general case can be given under the MC model, which are not present in existing works. There are also other effects, including the effect of various distributions of starting degrees and the burst structure, which can only be accounted for in a stochastic model; they are thus absent in existing models, but are provided explicitly in our proposed MC model. For ease of reference, a list of existing models accounting for the individual effects of degree trichotomy, in our terminology, is given as follows. Power-law behavior: Preferential attachment is proposed in the BA model. In our terminology, the attachment kernel, denoted k̂, is proportional to the degree k:

k̂ = ck,   (1)

where c is a proportionality constant. Continuum theory showed that a one-phase power-law distribution results under this model.
High-degree cutoff: Sublinear preferential attachment 2,4 is used, i.e., the attachment kernel is given by

k̂ = ck^ν,   (2)

where ν < 1 is a constant used to model the sublinear effect at high degrees and c is the proportionality constant. Continuum theory 2 results in an (approximate) degree distribution with two power-law regimes (with the second regime acting as a cutoff).
Small-degree flattening: An initial attractiveness factor is used (in generic networks 5 , in finite-size networks 6 , and with link addition, rewiring, and node addition considered 7 ), i.e., the attachment kernel is given by

k̂ = c(k + A),   (3)

where A > 0 is an initial attractiveness constant and c is the proportionality constant. Continuum theory 5 results in a degree distribution with small-degree flattening (SDF) (i.e., a degree distribution lower than the power-law prediction). From the existing literature, i.e., equations (1), (2) and (3), it is not easy to find a direct generalization that can produce degree trichotomy while remaining analytically tractable.
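The three kernels just described can be put side by side in a short sketch; the parameter values below (c, ν, A) are illustrative choices of ours, not values from the paper:

```python
def kernel_ba(k, c=1.0):
    # Linear preferential attachment (BA): k_hat proportional to the degree k.
    return c * k

def kernel_sublinear(k, c=1.0, nu=0.8):
    # Sublinear attachment with nu < 1: suppresses the attractiveness of
    # high-degree nodes, producing the high-degree cutoff.
    return c * k ** nu

def kernel_attractiveness(k, c=1.0, A=2.0):
    # Initial attractiveness with A > 0: lifts the attractiveness of
    # small-degree nodes, producing the small-degree flattening.
    return c * (k + A)

for k in (1, 10, 100):
    print(k, kernel_ba(k), kernel_sublinear(k), kernel_attractiveness(k))
```

Comparing the printed values shows where the three kernels diverge: at k = 1 the attractiveness kernel dominates, while at k = 100 the sublinear kernel falls furthest below the linear one.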
The proposed MC model, in contrast, to be elaborated in this SI, provides a closed-form formula for the network degree distribution under a general attachment kernel k̂ in theorem 3; in particular, it reduces to theorem 9 for the case of interest, namely an attachment kernel exhibiting trichotomy. Several auxiliary effects are also discussed in existing models to capture other realistic features of network generation, and they could also be incorporated in the MC model as follows. (1) To account for heterogeneity of node degree dynamics, the fitness model [8][9][10][11] is proposed, where node i has an individual fitness value η i which changes its attachment probability from proportional to the degree k i to k̂ i = c(η i k i ). A closed-form degree distribution may only be achieved under specific scenarios 8 , or in an asymptotic regime 11 . Its major disadvantage is that it has many model parameters, easily leading to over-fitting.
(2) To account for the effect of node deletion 12 and internal links 13 , a presumed form with phase transition (i.e., a power law with exponential cutoff) is assumed, as node deletion can also produce a cutoff 14 in the degree distribution. But these effects are not applicable to citation networks, as citation counts always increase. Since degree trichotomy is also observed empirically without those auxiliary effects (i.e., heterogeneity of degree dynamics and node deletions) 15 , the MC model is proposed first as a mechanism that gives rise to the trichotomy feature. One could add those effects to our trichotomy result using continuum theory, by changing the external arrival rate λ to further account for internal link formation (by increasing λ) and node deletion (by reducing λ), and by using our general result for the MC model for the case with fitness. Including the auxiliary effects in the proposed MC model without any approximation like continuum theory (so as to account for the stochastic effects appropriately), or including other effects like similarity 16 and causality 17 , still requires substantial analysis, which is left as future work.
Stochastic evolution modeling: To the best of our knowledge, most existing works rely on continuum (deterministic) theory; the state-of-the-art stochastic model for generating complex networks is mainly the Yule process 18,19 , a birth process with a rate proportional to the state variable. It can be viewed as a BA model after proper scaling of the interarrival time. Fellow researchers choose continuum theory, instead of generalizing the Yule process, likely because a deterministic model is easier to analyze than its stochastic counterpart, despite the deficiency mentioned above. The proposed general MC model can be considered a generalization of the Yule process to a more general Markov process whose rate accounts for both the state variable and bounds on the connection probabilities due to physical constraints, together with bursty arrivals (specifically, it differs from the Yule process 18,19 by having a lower bound and an upper bound on the birth rates, modified with a bursty arrival. For example, as shown in Figure 1 in the main text, at the burst location k b = 0, p 0→1 , p 0→2 , . . . , p 0→L+1 are the probabilities of a burst of size 1, 2, . . . , L + 1 connections due to the arrival of new nodes respectively, a structure absent in the Yule process). The Yule process is a special case of the MC model, as it has only one of the MC model's phases - the fast-evolving phase, corresponding to the preferential attachment mechanism based only on the degrees of the existing nodes, unrelated to the size or other physical constraints of the current network structure. This generalization is important: as shown in the Experiment Section below, there is statistical evidence that most real-world networks have nodes that are well modeled by the initializing and maturing phases of the present model.
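As a concrete baseline, the Yule process mentioned above can be simulated directly as a pure birth process with rate λk in state k; a known consequence is that the expected time to grow from 1 to n is the harmonic sum of the mean exponential holding times, ∑ k=1,...,n−1 1/(λk). A minimal sketch (function names are ours):

```python
import random

def yule_first_passage_mean(lam, n):
    # Expected time for a Yule process (birth rate lam * k in state k)
    # to grow from 1 to n: sum of mean holding times 1/(lam * k).
    return sum(1.0 / (lam * k) for k in range(1, n))

def simulate_yule(lam, n, rng):
    # One sample path: exponential holding time in each state k, rate lam * k.
    t, k = 0.0, 1
    while k < n:
        t += rng.expovariate(lam * k)
        k += 1
    return t

rng = random.Random(2024)
runs = [simulate_yule(1.0, 5, rng) for _ in range(2000)]
print(sum(runs) / len(runs), yule_first_passage_mean(1.0, 5))
```

Averaged over many sample paths, the simulated first-passage time agrees with the harmonic-sum formula; the MC model generalizes this process by bounding the birth rates and adding the bursty arrival structure.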
In short, related works modeling realistic effects such as the high-degree cutoff or small-degree saturation are difficult to put in a unified framework admitting closed-form analysis. Furthermore, some practical issues, like varieties in starting degree and the burst structure, are not discussed in the existing literature, since they are more naturally incorporated in a stochastic model, whereas most existing works rely on continuum (deterministic) theory. Therefore, the proposed MC model, as a more general modeling framework, could be a more realistic choice than existing models for representing the generation processes of general complex networks.

S2 Table of Notations
For ease of reference, a list of important system parameters used in this Supplementary Information document (and the main text) is summarized in Table 1.
As time t increases from time 0 to the observation time T , the system parameters are fixed, but the system variables (defined in Table 2) can change. In particular, the main text defined an important system variable - the modified degree k̂ i in equation (4) - based on the above system parameters. Recall from the main text that this k̂ i plays an essential role in the network formation process, as it is the proportionality constant of the attachment probability of a new node to a particular existing node i. This Supplementary Information document will further illustrate its important role in changing the system dynamics and eventually the node-degree distribution in the following sections. The remaining parameters in Table 1 are:
k b : Burst attractive degree, i.e., a special degree where a burst arrival to a node could happen (optional)
p k b →k : A parameter to control the probability of the burst size being k − k b at the burst attractive degree (optional)
p i (0) : The probability that the starting degree of a node is i
T : Current observation time of the system
N T : The number of nodes in the system at time T
It is remarked that L, L′, U′ is the minimum set of parameters needed to model degree trichotomy; specifically, even using the default MC model with U′ = U, it is still possible to model degree trichotomy.
For convenience, a list of important system variables used in this paper is also summarized in Table 2.
In later discussions, the subscript * used in the notations of the above variables (presented in Table 2) will be omitted whenever the specified node * is explicitly defined in the text, e.g., k will be used as an abbreviation of k * , and T will be used as an abbreviation of T * . Since the values of the system variables depend on time t, this time variable t will be explicitly specified in the text whenever clarification is needed, e.g., p k,n (t) denotes the probability that "the degree of the specified node * = k, and the network size = n, at time t". The variables in Table 2 are:
k * : Degree of a specific node *
n : Network size, i.e., the number of nodes in the network
k̂ * : Modified degree of a specific node *, defined in equation (4)
T * : Residential-time of a specific node *, i.e., T − t * , where t * is the arrival time of node *
t : Running time index of the Markov chains (M.C.)

S3.1 Relationship between different dynamics equations in the MC model
In this section, we explain how one can arrive at the degree dynamics of an individual node in the complex network (shown in Figure 1 of the main text) from the dynamics of the full state variables in the MC model, as mentioned in the methodology section of the main text. To illustrate the derivation steps, it suffices to consider the MC model without the bursty arrival structure. Recall the two key dynamic equations in the main text under this case: • Full System State Dynamics. The dynamics of the probability mass function of the state variables - the degree k of a specific node * (for k ∈ {0, · · · , N T }) and the network size n (for n ∈ {1, · · · , N T }) - satisfy the following equation; for n = N T : with the boundary conditions p k,n * +k−2 (t) = 0 for all k ≥ 0 (assuming that when node * arrives, the network contains n * nodes) and p k,N T +d (t) = 0 for all k ≥ 0 and d ≥ 1.
This dynamic equation means that as a new node arrives (i.e., n increases by 1), either the node under consideration attracts the new node (i.e., k increases by 1) or it does not (i.e., k remains the same).
A remark for the case with the burst structure: there could be some transition probability from multiple new nodes arriving in a dependent manner (instead of one by one), in which case the degree k can also increase by more than 1.
• Individual Degree Dynamics. The dynamics of the degree k (for k ∈ {0, · · · , N T }) of a specified node * satisfy and, after removing the small boundary terms λ (k−1)ˆ p k−1,N T (t) and λk̂p k,N T (t) (which vanish as T → ∞), the degree dynamics satisfy (as shown in the main text) which can capture the degree dynamics of many existing and other interesting models.

Example: Under the degree-trichotomy model, equation (8) becomes: where k s is the starting degree of the node. Note that equation (9) can be further simplified if L′ = L or U′ = U.
In fact, these two dynamics are related by a simple summation as follows: equation (7) is obtained by summing equation (5) over n from 2 to N T , which gives Rewriting equation (10) with the definition of the density function of the degree k given by p k (t) = ∑ n=2,...,N T p k,n (t), equation (7) is thus obtained. A similar aggregation of states (i.e., summation of probabilities) can be done for the MC model with bursty arrivals, resulting in the general degree dynamics depicted in Figure 1 of the main text, as follows: It is further noted that when deriving equation (8), the two boundary terms λ (k−1)ˆ p k−1,N T (t) and λk̂p k,N T (t) in (7) (i.e., b k,N T (t)) are omitted, as they are very small compared to the other terms in equation (7). Indeed, in p k (t) = ∑ n=2,...,N T p k,n (t), the term p k,N T (t) is the smallest term in the sum, and it vanishes as T → ∞. Hence, neglecting these boundary terms does not cause a significant difference in equation (8), but greatly simplifies the subsequent analysis.
Using the above dynamic equations, with particular settings of the system parameters L, L′, U′, U, some results of the classical models, including the Poisson network model, the exponential network model, and the power-law network model, can be reformulated within the proposed MC model, as stated in the theorems of the main text. Furthermore, the proposed MC model yields new closed-form results on degree distributions which fit the empirical data much better than the classical models. In particular, the MC model is the first model to explain the observed trichotomy phenomenon. All the aforementioned results are proved in the remaining part of this section. Proofs of the statements of the Poisson and exponential network models are presented in subsection S3.4. Proofs of the statements of the power-law network model are presented in subsection S3.5. Furthermore, a detailed proof of the statements of the general case (as a supplement to the proof in the main text) is presented in subsection S3.6.

S3.2 A simple lemma on difference equation
Lemma 1 Consider a k-parameterized difference equation in P k (s) of the following form (throughout this paper, P k (s) is the Laplace transform (as a function in s) of the degree distribution p k (t) of a node over time t):

P k (s) = Q k−1 (s)P k−1 (s) + R k (s),   (13)

where k and k s are nonnegative integers, and Q k (s) and R k (s) are some given functions in s for all k ≥ k s , where k s is some arbitrary starting degree.

Suppose the initial condition is P k s (s) = R k s (s). Then the equation has the following solution:

P k (s) = ∑ i=k s ,...,k R i (s) Q i→k−1 (s),   (14)

where Q i→k−1 (s) := Q i (s)Q i+1 (s) · · · Q k−1 (s), with the convention Q k→k−1 (s) = 1. Proof It can be easily proved by mathematical induction as follows. Let equation (14) be the statement S(k) for the induction. By equation (13), S(k s + 1) is true, as Q k s →k s (s) = Q k s (s). Suppose S(k) is true for some k ≥ k s + 1; by equation (13), plugging equation (14) into P k (s), P k+1 (s) can be expressed in the same form as equation (14); thus S(k + 1) is true.
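Lemma 1 can be sanity-checked numerically. The sketch below assumes the recurrence has the first-order form P k = Q k−1 P k−1 + R k (consistent with the induction step above) and compares direct iteration with the sum-of-products closed form, for arbitrary illustrative choices of Q k and R k evaluated at a fixed s:

```python
def solve_recurrence(Q, R, ks, kmax):
    # Direct iteration of P_k = Q_{k-1} P_{k-1} + R_k with P_{ks} = R_{ks}.
    P = {ks: R(ks)}
    for k in range(ks + 1, kmax + 1):
        P[k] = Q(k - 1) * P[k - 1] + R(k)
    return P

def closed_form(Q, R, ks, k):
    # Lemma 1's solution: P_k = sum_{i=ks}^{k} R_i * prod_{j=i}^{k-1} Q_j,
    # with the empty product (i = k) taken as 1.
    total = 0.0
    for i in range(ks, k + 1):
        prod = 1.0
        for j in range(i, k):
            prod *= Q(j)
        total += R(i) * prod
    return total

Q = lambda k: 1.0 / (k + 2)   # illustrative functions of k at a fixed s
R = lambda k: 1.0 / (k + 1)
P = solve_recurrence(Q, R, 1, 10)
print(all(abs(P[k] - closed_form(Q, R, 1, k)) < 1e-12 for k in range(1, 11)))
```

The two computations agree to machine precision, mirroring the induction argument in the proof.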

S3.3 General solution methodology in obtaining degree distribution under the MC model
Step 1: Laplace Transform of degree distribution Define the T-truncated Laplace transform of any function f(t) of t as F(s) = ∫ 0 to T e −st f(t)dt. In particular, the T-truncated Laplace transform of p k (t) is denoted P k (s) = ∫ 0 to T e −st p k (t)dt. There is a useful lemma on P k (s), written as follows:

Lemma 2
The Laplace transform of the degree distribution p k (t) under the MC model is Proof Using the rule of integration-by-parts, one has Therefore, by taking the T -truncated Laplace transform on degree dynamics of the MC model (i.e. the differential-difference equation (12)), one has, for k ≥ 0

where B k,N T (s) := λ[ (k−1)ˆ P k−1,N T (s) 1(k ≥ 1) + k̂ P k,N T (s) ], and P k,N T (s) is defined as the T-truncated Laplace transform of p k,N T (t), i.e., P k,N T (s) := ∫ 0 to T e −st p k,N T (t)dt, which is small for all k since p k,N T (t) is small for the reason mentioned above.
Thus, for k ≥ 1, with the initial condition By treating equation (22) with initial condition (23) as a special case of equation (13) and then applying lemma 1, one obtains the following: the Laplace transform of the degree distribution p k (t) under the MC model is with initial condition where After simplifying the above expression, the lemma as stated is obtained.
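Lemma 2 is obtained by applying the T-truncated Laplace transform; as a quick numerical sanity check of that transform definition, it can be evaluated by simple quadrature, and for f(t) = 1 it should equal (1 − e^{−sT})/s. A sketch (not part of the paper's derivation):

```python
import math

def truncated_laplace(f, s, T, n=100000):
    # T-truncated Laplace transform: integral_0^T e^{-s t} f(t) dt,
    # evaluated with the composite trapezoidal rule on n panels.
    h = T / n
    total = 0.5 * (f(0.0) + math.exp(-s * T) * f(T))
    for i in range(1, n):
        t = i * h
        total += math.exp(-s * t) * f(t)
    return h * total

# Check against the exact transform of f(t) = 1: (1 - e^{-sT}) / s.
print(truncated_laplace(lambda t: 1.0, 1.0, 5.0), (1.0 - math.exp(-5.0)) / 1.0)
```

The same quadrature can be applied to any p k (t) obtained from the model, which is convenient for spot-checking the closed-form P k (s) expressions.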
Step 2: Residential-time Distribution Note that taking the inverse Laplace transform of the above P k (s) gives p k (t), which represents the degree distribution of one particular node * at time t only if it arrives at time 0. More generally, if a node i arrives at time a i , its degree distribution at the observation time T is given by p k (T − a i ). Denote the residential-time of each node i by T i = T − a i . The degree distribution of the network is thus given by p k = E i∼U{1,··· ,N T } [ ∫ p k (t) f T i (t)dt ], where N T denotes the total number of nodes in the network at the observation time T , and f T i (t) denotes the probability density function of the residential-time T i of node i. The symbol i ∼ U{1, · · · , N T } means that node i is uniformly selected among the nodes ranging from 1 to N T .
In other words, the degree distribution of the network is given by p k = E T [p k (t)], where T is the random variable denoting the residential-time of a randomly picked node, and p k (t) is the degree distribution of this randomly picked node given residential-time T = t. Therefore, a crucial step in computing p k is to find the distribution of the residential-time T .
The details of the calculation of the distribution of the residential-time T are presented in Section S4.

Step 3: Degree Distribution Method 1: Using inverse Laplace Transform
When T is large, from the T-truncated Laplace transform P k (s), one can obtain p k (t) by taking the inverse Laplace transform of P k (s), and then compute the expectation over the residential time to get the degree distribution p k = E T [p k (t)]. This method is

particularly useful when the expectation is easy to calculate. In particular, this method is used to obtain the degree distribution of a specific set of nodes in this paper when the residential time is a constant.

Method 2: Using the MC model's properties
This method is useful for obtaining the degree distribution of all nodes in the network, where the node dynamics are driven by the MC model.
From Section S4, at the observation time T , the residential-time distribution of the nodes under the dynamics driven by the MC model is exponential with parameter λγ, abbreviated exp(λγ) (γ will be called the residential-time parameter, and is determined by the system parameters of the MC model), i.e., the probability density function of the residential-time T is f T (t) = λγ e −λγt for t ≥ 0. Thus, the degree distribution of the network is given by p k = ∫ 0 to ∞ p k (t) f T (t)dt. After simplifying corollary 2, this yields the following useful theorem, used throughout this Supplementary Information: Theorem 3 The degree distribution of the network, p k , under the MC model (with residential-time parameter γ) is p k = ∑ i=0,...,k q i p s ki d ki , ∀k ≥ 0, where q i is a weighting factor related to the probability mass p i (0) that i is the starting degree, the transition probability p k b →i from the burst-occurring degree k b to i, and the probability mass p k b at the burst-occurring degree k b ; p s ki is the degree distribution when the starting degree is i; and d ki is a discount factor of the degree distribution when the starting degree is i, due to a certain proportion of probability 1 − p k b →k b +1 being left for bursty arrivals (i.e., not single arrivals).
From theorem 3, the following useful relationships between special cases of the MC model and their general cases are remarked.
Remark 1 For sufficiently large T , the term B i,N T (s) − e −sT p i (T ) can be viewed as an approximation error, which is ≈ 0 for most values of i. This is because B i,N T (s) is small (and asymptotically converges to 0 as T → ∞), as it comes from two parts, λ (i−1)ˆ P i−1,N T (s) and λîP i,N T (s), which vanish as T → ∞ (since they are Laplace transforms of the two boundary terms in (7), which are very small compared to the other terms in equation (7)). Besides that, the other part of the error, e −sT p i (T ), is also small (and asymptotically converges to 0 as T → ∞).
Remark 2 The degree distribution p k under the MC model with (non-singleton) starting degree(s), but no burst, is a (non-degenerate) mixture of the degree distributions p ki of the default MC models with various starting degrees i, specifically, p k = ∑ i p i (0) p ki , where p i (0) is the mixing probability.
Remark 3 The degree distribution p k under the general MC model is a mixture of the degree distributions of the default MC models with discounting d ki . The probability mass at k needs discounting when the burst location is inside the interval between the starting degree i and k − 1, specifically, p k = ∑ i c i q i p ki d ki , where c i q i is the mixing probability, with c i being a normalization constant such that the total probability is 1. The discount factor d ki of the degree distribution p ki at different starting degrees i can be interpreted as follows. For a fixed starting degree i, when i ≥ k b + 1, no discounting of probability mass happens, i.e., the degree distribution under the general MC model is the same as that under the MC model with no burst in this regime. When the starting degree i ≤ k b , before the burst occurs (for k ≤ k b ), there is no discounting of the probability mass at those k, i.e., the degree distribution under the general MC model is the same as that under the MC model with no burst for those k in this regime of i; yet for k ≥ k b + 1 in this regime of i, discounting of the probability mass at k happens; specifically, there is more weight in the mixing probability in the general MC model for those with starting degree ≥ k b + 2, compared to the MC model with no burst.

S3.4 MC generalizes Poisson and Exponential network
Theorem 4 When L′ = L = U′ = U in the default MC model, as T → ∞, it reduces to one of two classical models: • Case 1 - accounting for a fixed set of nodes (starting at the same time with the same degree): it reduces to the Poisson network model; • Case 2 - accounting for all nodes: it reduces to the exponential network model.
Proof In this proof, the degree distribution of the network p k is computed under the setting L′ = L = U′ = U.
In this setting, the modified degree is given by k̂ = L for all k. Denote the starting degree by k s . When T is large enough, by lemma 2, the Laplace transform of the degree distribution of a node is obtained as P k (s) = (λL) k−k s /(s + λL) k−k s +1 , ∀k ≥ k s , and P k (s) = 0, ∀k < k s .
• In Case 1, the specified set of nodes for statistics collection have the same starting time a 0 , and thus have been in the network for the same residential time T − a 0 . In other words, the probability density function of the residential-time is a Dirac delta function. Since the residential-time distribution is very simple in this case, method 1 (i.e., performing the inverse Laplace transform and then taking the expectation over the residential time) is used to find the degree distribution. The inverse Laplace transform of P k (s) yields p k (t) = e −λLt (λLt) k−k s /(k − k s )!, ∀k ≥ k s , and p k (t) = 0, ∀k < k s , which is a Poisson distribution (with starting degree k s ) with parameter λLt. Then, the degree distribution of the network for the specified set of nodes is a Poisson distribution (with starting degree k s ) with parameter λL(T − a 0 ). A network with Poisson node-degree distribution is known as the Poisson network introduced by Erdős and Rényi 20 , wherein all nodes start with degree 0. Specifically, we can generate the Poisson network of 20 by collecting statistics of the initial nodes in our model with the aforementioned parameters L′ = L = U′ = U when generating the networks, with the initial nodes' degree k s = 0, and with p k (0) = 0 for all k except p 0 (0) = 1.
Lemma 5 When L′ = L = U′ = U = 1, the residential-time T of a node has an exponential distribution with rate λ.
Proof The network-size dynamics satisfy the equation obtained by summing equation (5) over k. This is a Yule process, for which it is shown in 18 that the residential-time of a node has an exponential distribution with rate λ.
The detailed derivation for the general case, wherein L′ = L = U′ = U, is presented in Case 1 of lemma 10 of Section S4. The residential-time T is shown to be exponentially distributed with parameter λL, abbreviated exp(λL). Thus, method 2 for calculating the degree distribution of the network is suitable in this case (i.e., averaging the Poisson density p k (t) in equation (36) with the residential-time distribution exp(λL)). The result is a special case of applying theorem 3 to the default MC model with this specific parameter setting L′ = L = U′ = U: p k = (1/2) k−k s +1 , ∀k ≥ k s , which is a geometric distribution with parameter 1/2 (with starting degree k s ), known as the node-degree distribution of an exponential network 21 , since the geometric distribution is the discrete analog of the exponential distribution. A network with exponential node-degree distribution is known as the exponential network, wherein all nodes start with degree k s = 1 in most of the literature 22 . Specifically, we can generate the exponential network 22 by collecting statistics of all nodes in our model with the aforementioned parameters L′ = L = U′ = U when generating the networks, with the starting nodes' degree k s = 1, and with p k (0) = 0 for all k except p k s (0) = 1.
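The Case 2 computation - averaging the Poisson density of equation (36) over an exponentially distributed residential time with the same rate - can be reproduced numerically. The sketch below uses illustrative values λL = 1 and k s = 0 and recovers the geometric distribution with parameter 1/2:

```python
import math

def mixed_degree_prob(k, ks=0, rate=1.0, n=200000, tmax=60.0):
    # p_k = integral_0^inf [rate * e^{-rate t}]        (exp residential-time pdf)
    #       * [e^{-rate t} (rate t)^(k-ks) / (k-ks)!]  (Poisson density)
    # dt, via the trapezoidal rule on [0, tmax]; the tail beyond tmax is negligible.
    m = k - ks
    def g(t):
        return rate * math.exp(-2.0 * rate * t) * (rate * t) ** m / math.factorial(m)
    h = tmax / n
    total = 0.5 * (g(0.0) + g(tmax))
    for i in range(1, n):
        total += g(i * h)
    return h * total

# Should match the geometric law p_k = (1/2)^(k - ks + 1).
for k in range(4):
    print(k, mixed_degree_prob(k), 0.5 ** (k + 1))
```

The numerical integral matches (1/2)^(k+1) to high accuracy, illustrating why the geometric (discrete exponential) law emerges when all nodes are pooled.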
From the MC model, one can also generate a more general class of exponential distributions for the degree distribution of the network by allowing non-singleton probability mass on the starting degree k s of nodes, i.e., different nodes may have different starting degrees (mathematically, there is no single k s such that p k s (0) = 1 while p k (0) = 0 for other k; in other words, p k (0) > 0 for more than one k), as shown in the following corollary.
Corollary 6 When L′ = L = U′ = U in the general MC model with the starting degree having (non-singleton) probability mass specified by p i (0), ∀i ≥ 0, as T → ∞, the degree distribution reduces to a (non-degenerate) mixture of discounted geometric distributions with different starting degrees and mixing probabilities q i , i.e., p k = ∑ i≤k q i (1/2) k−i+1 d ki , where d ki is the discount factor specified in theorem 3. When the MC model has no burst structure, q i = p i (0) and no discounting happens, i.e., d ki = 1 for all k and i.
Proof It is a corollary of theorem 3, with p k in equation (37) as p s ki in theorem 3 for each starting degree i playing the role of k s in theorem 4, as discussed in Remark 2 and Remark 3.

Remark 4
As a remark to Corollary 6, equation (38) shows that this specific setting of the MC model (with no burst structure, but possibly different starting degrees) gives a degree distribution in the form of a mixture of consecutively truncated geometric distributions with parameter 1/2, with the mixing probability (or the i-th truncation probability) being p i (0). It looks like the discrete analog of the hyperexponential distribution. For the hyperexponential distribution, as a mixture of exponential distributions, empirical data generated from the distribution can be well fitted using the well-known Prony method (widely used in the engineering literature, e.g., in power systems 23 ). However, there are still some differences between equation (38) and the hyperexponential distribution. Precisely, equation (38) is a mixture of truncated geometric distributions. A simple generalization of the discrete analog of the Prony method for parameter estimation on fitting the empirical data may not be good enough, so a further extension of the method is needed to handle the effect of truncation. Since the detailed design of this possibly optimal fitting procedure needs further investigation, it is left for future research. In this paper, only a simple heuristic procedure is proposed for fitting data from the mixture of truncated geometric distributions (see the Fitting Methodology Section in S9 for details). Proof In this setting, the modified degree is given by k̂ = max{1, k}, ∀k ≥ 0, i.e., k̂ = k, ∀k ≥ 1 and 0̂ = 1. Denote the starting degree by k s . Recall that the degree distribution of the network is given by p k = E T [p k (t)], where T is the random variable denoting the residential-time of a randomly picked node. The detailed derivation of the residential-time of any specific node *, when L′ = L = 1 and U′ = U = ∞, is presented in Case 2 of lemma 10 of Section S4, and the residential-time is shown to be exponentially distributed with rate 2λ as T → ∞.
So method 2 of finding the degree distribution p k is applicable in this case. Applying theorem 3 to the default MC model with the above parameter setting, one can get

S3.5 MC generalizes the power-law model
p k ∼ c k s k −3 for some constant c k s (for each given k s ), for large enough k, i.e., a power law with exponent −3, with the given starting degree k s .
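This exponent can be recovered numerically. The sketch below assumes (a standard Yule-process fact, stated here as our assumption rather than taken from the paper) that a node with residential time t and starting degree 1 has the geometric law p k (t) = e^{−λt}(1 − e^{−λt})^{k−1}; averaging over an exp(2λ) residential time then gives p k = 4/(k(k+1)(k+2)) ∼ 4k^{−3}:

```python
import math

def powerlaw_degree_prob(k, lam=1.0, n=200000, tmax=60.0):
    # p_k = integral_0^inf [2 lam e^{-2 lam t}]          (exp(2*lam) residential time)
    #       * [e^{-lam t} (1 - e^{-lam t})^(k-1)]        (assumed Yule degree law)
    # dt, via the trapezoidal rule on [0, tmax].
    def g(t):
        u = math.exp(-lam * t)
        return 2.0 * lam * u ** 3 * (1.0 - u) ** (k - 1)
    h = tmax / n
    total = 0.5 * (g(0.0) + g(tmax))
    for i in range(1, n):
        total += g(i * h)
    return h * total

# Exact value 4 / (k (k+1) (k+2)), which decays like 4 k^{-3}.
for k in (1, 2, 10):
    print(k, powerlaw_degree_prob(k), 4.0 / (k * (k + 1) * (k + 2)))
```

The substitution u = e^{−λt} turns the integral into 2∫ u²(1 − u)^{k−1}du = 2B(3, k) = 4/(k(k+1)(k+2)), which is the classical exponent −3 result.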
Similar to the extension of the exponential network to a more general class of exponentially distributed networks, one can also generate from the MC model a more general class of power-law distributions for the degree distribution of the network, by allowing non-singleton probability mass on the starting degree k s of nodes, as shown in the following corollary.
Corollary 8 When L′ = L = 1, U′ = U = ∞ in the general MC model with the starting degree having (non-singleton) probability mass specified by p i (0), ∀i ≥ 0, as T → ∞, the degree distribution reduces to a (non-degenerate) mixture of discounted power laws with different starting degrees and mixing probabilities q i , i.e., p k = ∑ i≤k q i p ki d ki , where d ki is the discount factor specified in theorem 3. When the MC model has no burst structure, q i = p i (0) and no discounting happens, i.e., d ki = 1 for all k and i.

Proof It is a corollary of theorem 3, with p k in equation (39) as p s ki in theorem 3 for each starting degree i playing the role of k s in theorem 7, as discussed in Remark 2 and Remark 3.
Remark 5 As a remark to Corollary 8, equation (40) shows that this specific setting of the MC model (with no burst structure, but possibly different starting degrees) gives a degree distribution in the form of a mixture of consecutively truncated power-law distributions with slope parameter −3, with the mixing probability being p i (0). The resultant distribution is still a power law, with a slightly greater exponent −3 + ε, where ε > 0 is a very small value.

S3.6 MC model can explain Degree Trichotomy
Besides providing a unified framework for studying general complex networks, the proposed MC model offers an analytical closed-form expression for the degree distribution and explains the three phases observed in empirical degree distributions of realistic networks, which is addressed in this paper for the first time in the literature.
The results are summarized as follows.

Theorem 9
The degree distribution of the default MC model (with starting degree i) in the general parameter setting is given by and the degree distribution of the general MC model is a mixture of the above distributions over various i with mixing probability q i and burst-related discount factor d ki , i.e., p k = ∑ i=0,...,k q i p s ki d ki , where L ≤ γ ≤ L + 1. For U′ ≪ N, one has γ ≈ L, while for U′ ∼ N, one has γ ≈ L + 1; c, c 2i , c 3i are normalization constants making the total probability 1, i.e., ∑ k p k = 1; c 2i and c 3i , together with the corresponding starting degrees, form the (normalization constant, starting degree) pairs for degrees in phase 2 and phase 3 (it is noted that when the starting degree is outside the boundary of a phase, the probability mass function is defined as 0 in that phase); N T is the network size at the current time T . With no burst structure, q i = p i (0), i.e., the starting degree distribution, and d ki = 1.
With burst structure, q_i exceeds p_i(0) by a factor proportional to the probability of transfer from k_b to i; i.e., d_ki = p_{k_b → k} when k_b lies in the interval between i and k − 1, and d_ki = 1 otherwise.
Proof In this setting, the modified degree k̂ is as specified above. Denote the starting degree by i. Recall that the degree distribution of the network is given by equation (45), where T is the random variable denoting the residential-time of a randomly picked node. As shown in lemma 10 of section S4, the residential-time of the node * has an exponential distribution with parameter λγ and normalization constant c = 1/(1 − e^{−λγT}). Hence method 2 of finding the degree distribution p_k is applicable in this case. Applying theorem 3 to the general MC model with the above parameter setting, one obtains equation (29), i.e., p_k = ∑_{i=0}^{k} q_i p^s_ki d_ki for all k ≥ 0, with q_i given by equation (30), and p^s_ki and d_ki given as follows.
It is noted that p^s_ki is the degree distribution under the default MC model with starting degree i, whenever i ≤ k. One may extend it to a matrix with row index k and column index i by setting it to 0 for i > k. The specific form of each entry depends on which of the three regions the indices i and k fall in. Specifically, the matrix p^s_ki is lower triangular, with entries of the following form: where c is the normalization constant making the probabilities sum to 1.
Based on the asymptotic property of the Gamma function, i.e., lim_{k→∞} Γ(k) k^{γ+1} / Γ(k + γ + 1) = 1, the (k, i) entries for phase-2 degrees k (i.e., L′ < k ≤ U′) behave as a power law in k with exponent −(γ + 1). In summary, one obtains p_ki, the degree distribution for starting degree i, as stated in the theorem, with the (normalization constant, starting degree) pairs for phase-2 and phase-3 degrees as specified. For the MC model with bursty structure k_b, one further needs the discount matrix d_ki given by equation (32) in theorem 3; applied to the above "degree-trichotomy" setting, its representation divides into the following cases: when 0 ≤ k_b ≤ L′, the matrix d_ki is defined as above, and otherwise as stated. As a result, the degree distribution p_k can be viewed as a matrix-vector multiplication: the matrix is the entrywise product of p_ki with one of the above d_ki matrices (the choice depending on k_b), and the vector is q_i given in equation (30).
Hence the result stated in the theorem is obtained, with p_k a mixture of discounted p_ki over the different starting degrees i, with discounting given by d_ki and mixing probability given by p_i(0) adjusted by the probability contributed by the burst.
By inspecting the matrix-vector multiplication of p_ki d_ki with q_i, and noting from Remark 3 that the mixing probabilities q_i in the general MC model place more weight on starting degrees i ≥ k_b + 2 than q_i = p_i(0) does in the default MC model, the effect of burstiness can be interpreted as follows.
• Case 1: 0 ≤ k_b ≤ L′ The resultant degree distribution mixes in more geometric mass in region 1 (0 ≤ k ≤ L′) for k > k_b, and gives a stronger power law (i.e., a smaller power-law exponent, or a change from geometric to the heavier power-law tail) in the whole second region (L′ < k ≤ U′) and the whole third region (k > U′).
• Case 2: L′ < k_b ≤ U′ The resultant degree distribution is a stronger power law (i.e., a smaller power-law exponent, or a change from geometric to the heavier power-law tail) in the second region (L′ < k ≤ U′) for k > k_b, and in the third region (k > U′).
• Case 3: k_b > U′ The resultant degree distribution is more geometric in the third region (k > U′) for k > k_b.
It is noted that, in accounting for the degree distribution p_k in theorem 9, there is an approximation error term in the mixing probability q_i for very large starting degrees i (corresponding to the approximation error term in P_i(s) for very large i). It causes small discrepancies between the observed probability density and the theoretical probability mass function of the default MC model in equation (41) for very large k, even when the starting degree is a singleton k_s (the error terms in q_i are concentrated at very large i, which only take effect at large degree k, as shown in equation (42)). This component of the approximation error explains part of the errors attributed to the so-called "node dynamics" in the existing literature for the power-law case. More details on these node dynamics are discussed in Section S8 on the simulation.
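The matrix-vector structure of the mixture in theorem 9 (p_k = ∑_i q_i p^s_{ki} d_{ki}) can be sketched with toy numbers. The matrices below are placeholders (geometric columns instead of the paper's closed forms), only the bookkeeping is illustrated:

```python
import numpy as np

kmax = 6
# p_s[k, i]: degree distribution with starting degree i (lower triangular);
# toy geometric columns stand in for the closed forms of theorem 9.
p_s = np.zeros((kmax + 1, kmax + 1))
for i in range(kmax + 1):
    col = np.array([0.5 ** (k - i) if k >= i else 0.0 for k in range(kmax + 1)])
    p_s[:, i] = col / col.sum()

d = np.ones((kmax + 1, kmax + 1))   # no burst structure: d_ki = 1 everywhere
q = np.zeros(kmax + 1)
q[1] = 1.0                          # all nodes start with degree 1

p = (p_s * d) @ q                   # p_k = sum_i q_i * p_s[k, i] * d[k, i]
assert abs(p.sum() - 1.0) < 1e-9    # the mixture remains a distribution
```

With a burst, one would replace `d` by the case-dependent discount matrix and `q` by the burst-adjusted mixing vector of equation (30).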

S4 A Technical lemma to derive the main theorem and its proof: Residential-time distribution of a node in the proposed MC model
In the analysis of the node-degree distribution p_k, it is important to know how long (the residential-time T) the specified node * has been present in the network. Denote the probability density function of the residential-time by f_T(t). At the observation time T, the node-degree distribution p_k can be written as follows: An illustration of the residential-time of a node is shown in Fig. 1. Figure 1. The arrival time of node i is denoted by t_i, and the residential-time of node i is denoted by T_i, which equals T − t_i, where T is the elapsed time since the system started with the first two nodes.
In this section, we prove the following technical lemma on the residential-time T of a node. Lemma 10 The residential-time probability density function f_T(t) of each node in the network, under typical settings of the system parameters L, L′, U, and U′, is approximately given by:

• Case 1 - When L = 1 and U = ∞, f_T(t) = 2λe^{−2λt}.
Proof The residential-time distribution f_T(t) of each node is in turn determined by how the network size N(t) evolves over time t. This evolution of N(t) (specifically, the differential-difference equation for p_n(t), the marginal distribution of N(t)) is obtained by summing equation (5) over k, as follows: Clearly, equation (46) is an inhomogeneous birth process. Note the difference between S_n and S_{n−1}: it can be verified that this difference accounts for the new link created, meaning that when a new node joins the network, it creates a new link and increases the birth rate of the process by λ(L + ε_i), where ε_i depends on the existing degree k_i of the node i it attaches to. It is remarked that the following ε̂_i gives the same effect as the above ε_i (even when L = L′ or U = U′) in analyzing the dynamics of p_n(t): specifically, the sum of ε̂_i over the nodes i incident to a new node gives an upper, but sufficiently accurate, estimate of the corresponding sum of ε_i.
Since this ε_i depends on the current network size n, it can be viewed as a realization of a random variable ε_n, with ε_0 = 0 and ε_1 = 0, because no other node i is present in those cases. Hence, expressing S_n and S_{n−1} through ε_i, and replacing the sum of ε_i by the sum of ε̂_i, the node dynamics (46) satisfy the equations below. Let the final network size at time T be N_T and consider several important cases of the network-size dynamics over time, as follows. Case 1. When L = 1 and U = ∞, the node dynamics satisfy dp_n(t)/dt = 2λ(n − 2)p_{n−1}(t) − 2λ(n − 1)p_n(t).

In this case, the density function f_T(t) of the residential-time of the specified node * is a (scaled) exponential distribution with parameter 2λ, specifically f_T(t) = 2λe^{−2λt}/(1 − e^{−2λT}). Case 2. When U is large (comparable to N), the network-size dynamics satisfy the equation above. According to the preferential attachment mechanism, a new node is more likely to connect to an existing node with a larger degree than to one with degree at most L. Hence c is small compared to n, i.e., around L + 1 rather than around n − 1.
In this case, the density function f_T(t) of the residential-time of the specified node * is a (scaled) exponential distribution with parameter close to λ(L + 1), specifically f_T(t) = λ(L + 1)e^{−λ(L+1)t}/(1 − e^{−λ(L+1)T}). Case 3. This is the most practical scenario. When U ≪ N, the network-size dynamics satisfy the equation above, where c is small compared to n and satisfies 0 ≤ c ≤ n − 1 − (L + 1). In this case, f_T(t) is a (scaled) exponential distribution with parameter close to λL, specifically f_T(t) = λLe^{−λLt}/(1 − e^{−λLT}). It is noted that in all the above cases, the denominator of f_T(t) only contributes a difference in the normalization constant in the calculation of p_k in equation (45).
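Case 1 of lemma 10 can be checked by simulation. The sketch below (parameter values are illustrative) runs the pure birth process with rate 2λ(n − 1) and verifies that the fraction of nodes with residential-time exceeding s is close to e^{−2λs}:

```python
import math
import random

random.seed(0)
lam = 1.0
N = 20000

# Simulate the Case 1 birth process: with n nodes present, the next node
# arrives after an Exp(2*lam*(n-1)) interarrival time.
t = 0.0
arrivals = [0.0, 0.0]                 # the first two nodes at time 0
for n in range(2, N):
    t += random.expovariate(2 * lam * (n - 1))
    arrivals.append(t)

T = t                                  # observation time
s = math.log(2) / (2 * lam)            # median of an Exp(2*lam) distribution
frac = sum(1 for a in arrivals if T - a > s) / len(arrivals)
# frac should be close to exp(-2*lam*s) = 0.5, matching f_T(t) = 2*lam*e^{-2*lam*t}
```

The network size grows exponentially at rate 2λ, so a randomly picked node's residential-time is (truncated) exponential with the same rate, as the lemma states.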

S5 Z-Transform of Node-Degree Distribution
In this and the next section, we derive some performance metrics based on our closed-form degree distribution.
In this section, our goal is to establish a differential equation for the generating function, or z-transform, of the node-degree distribution p_k. The z-transform of the degree distribution is widely used in deriving many network metrics 24, e.g., the average path length, and some closeness measures, such as the clustering coefficient.
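As a minimal sketch of why the z-transform is useful: for any pmf p_k, P(1) = 1 and P′(1) equals the mean degree, the basic ingredient of path-length and related metrics. The degree distribution below is a toy example, not the model's closed form:

```python
def pgf(p, z):
    """Generating function P(z) = sum_k p_k z^k of a pmf given as a list."""
    return sum(pk * z ** k for k, pk in enumerate(p))

def pgf_derivative(p, z):
    """P'(z) = sum_k k p_k z^(k-1); evaluated at z = 1 it gives the mean degree."""
    return sum(k * pk * z ** (k - 1) for k, pk in enumerate(p) if k > 0)

# Toy degree distribution (hypothetical values for illustration only).
p = [0.0, 0.4, 0.3, 0.15, 0.1, 0.05]
assert abs(pgf(p, 1.0) - 1.0) < 1e-12   # total probability is 1
mean_degree = pgf_derivative(p, 1.0)     # first moment of the distribution
```

Higher factorial moments follow from higher derivatives at z = 1 in the same way.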
Define the generating function (also called the z-transform) of the degree distribution p_ki (with starting degree i) as above. By equation (44), it turns out that P(z), depending on the starting degree i, can be split into three possible terms as follows. With some simplification, we note the expressions below, where P^{(k+1,L′)}(1) is given in equation (54), and the corresponding expression holds for L′ < i ≤ U′.

In summary, the following theorem is obtained: The ccdf of the network degree under the default MC model, with starting degree i, is given by the expression above, where F^c_{geom(γ/(γ+L′))} and F^c_{geom(γ/(γ+U′))} are ccdfs of geometric distributions with parameters γ/(γ + L′) and γ/(γ + U′), respectively, and the parameter definitions are the same as in theorem 9. (It is noted that when the starting degree is outside the boundary of a phase, the ccdf is defined as 1 in that phase, and for k less than the starting degree, the ccdf value at k is 1.) Similarly, the network degree distribution, i.e., cumulative distribution function (cdf), with starting degree i, is given by F_ki = 1 − F^c_ki. The cdf of the general MC model is a mixture of the above cdfs F_ki over the various i with mixing probability q_i and burst-related discount factor p_{k_b → k_b+1}, as follows. Similarly, the ccdf is given by 1 − F_k. Proof

S7 Diameter of the Network
In this section, we compare the diameter of the network (i.e., the maximum distance between a pair of nodes in the network) exhibiting trichotomy in its degree distribution under our Markov model with that of a scale-free network (e.g., the BA model). We start with the first node of the network and call it the root. For the simplest illustration of the impact of the setup parameter L and the saturation parameter U, assume in our model L = L′, U = U′, and p_1(0) = 1, p_k(0) = 0 for all other k, i.e., only one

edge is formed when a new node joins the network. In this case, the resultant network would be a tree (both under our Markov model and the BA model).
Traversing along each edge attached to the root, some nodes are farthest from the root (i.e., within each branch of the tree, there is a node that takes the maximum number of hops to reach from the root). Collect those nodes from all branches of the root and call them the foremost nodes. The problem of computing the diameter of the network then reduces to finding the maximum distance between a pair of foremost nodes (it is remarked that this pair of foremost nodes need not be unique). We call the foremost nodes achieving this maximum distance the candidate foremost nodes. It is noted that only the attachment of a new node to one of the candidate foremost nodes increases the diameter of the network. Hence, when a new node arrives while the network size is n, the probability of increasing the diameter equals the probability of attaching it to one of the candidate foremost nodes. Summing these diameter-increasing probabilities from n = 1 to the final system size N_T gives the expected network diameter at observation time T.
According to the attachment rule of our Markov model, the probability of attaching the new node to one of the candidate foremost nodes is proportional to the modified degree of that node. Specifically, this probability is given by the ratio of the modified degree of a candidate foremost node to the total modified degree, times the number of candidate foremost nodes. Since each candidate foremost node has degree one, • for our MC model, its modified degree is L; • for the BA model, its modified degree is still 1.
When the network size is n, the sum of modified degrees of all nodes in the network also differs between the models: • for our MC model, it is N_{L,n} L + ∑_{i: k̂_i = k_i} k_{i,n} + N_{U,n} U, where N_{L,n} is the number of nodes in the set-up phase and N_{U,n} is the number of nodes in the saturation phase (the number of nodes within the power-law regime is as given above); • for the BA model, it is the total degree, which is 2n.
When the network size is n, denote the number of candidate foremost nodes by c_n: • c_{Markov,n} for our Markov model in the general case; and • c_{BA,n} for the BA model, i.e., L = 1, U = ∞ under our model.
It is noted that as the network size n increases, c_n in both cases stays the same whenever a new node attaches to an internal node of the tree (one not among the candidate foremost nodes), which happens most of the time when the network is large. Occasionally, the list of candidate foremost nodes can expand, or potentially increase in the next step (when a new node attaches to a leaf that is not a candidate foremost node, or to a sibling of a candidate foremost node), and c_n can decrease whenever a new node attaches to one of the candidate foremost nodes. Since most nodes are internal when the network is large, the attachment events that change c_n are rare compared with those that leave it unchanged. Hence c_n stays the same most of the time, fluctuating between 2 and a small maximum value as n increases (as expected and observed in simulation). For simplicity of analysis, we treat c_n as a Θ(1) constant, roughly the same in both cases. As a result, the probability of increasing the diameter when the network size is n is given by the following: • for our Markov model, it is c_{Markov,n} L divided by the total modified degree given above; • for the BA model, it is c_{BA,n}/(2n).
Hence, an estimate of the diameter when the network size is N_T is given by the sums above for the two models. In both cases, the estimates of the network diameter are harmonic-type series, so the diameter scales as Θ(log N_T). However, one can make simple observations on how L and U affect the scaling constant of this asymptotic.
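The Θ(log N_T) scaling can be checked numerically: the partial harmonic sum ∑_{n=1}^{N} 1/(2n) that drives both estimates grows like (1/2) ln N, with the model-dependent prefactor only shifting the constant.

```python
import math

def diameter_estimate(N, weight=1.0):
    """Partial harmonic-type sum weight * sum_{n=1}^{N} 1/(2n); both the
    Markov-model and BA estimates take this form up to the constant factor."""
    return weight * sum(1.0 / (2 * n) for n in range(1, N + 1))

# The ratio to log(N) settles toward 1/2, i.e., the estimate is Theta(log N_T).
ratios = [diameter_estimate(N) / math.log(N) for N in (10**3, 10**5)]
```

A larger `weight` (reflecting, e.g., a larger L relative to the total modified degree) changes only the constant in front of log N_T, not the scaling, consistent with the discussion above.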
When comparing the diameters of two models (with different L and U; the special case L = 1 and U = ∞ is the BA model), whether the "new" model has a longer diameter than the "old" one depends on which of the quantities (I) and (II), defined above, is larger. In particular, when (I) ≤ (II), the diameter of the new model is larger; otherwise it is smaller. As c_new ≈ c_old, one observes that when U increases (for fixed L), (I) ≤ (II), so the diameter increases; and when L increases (for fixed U), it is still more likely that (I) ≤ (II), so the diameter also increases.
In short, a longer setup phase (L > 1) and an earlier saturation phase (U < ∞) generally increase the diameter of the network. This analysis shows that a trichotomy network may not be as good as a scale-free network for information dissemination when traversing each hop incurs a cost. Hence, a network designer aiming for faster information dissemination may consider remedies that reduce L and increase U.

S8.1 More illustrations of the degree distribution resultant from the proposed MC model
In the main text, by setting L = L′ = 2 and U = U′ = 8, the MC model generates the trichotomy distribution shown in Fig. 2(a), i.e., a power-law distribution with an exponential head and tail. Several characteristics of the node-degree distribution plot match those predicted by the theory. As a second illustration, the same predictions also hold for L = L′ = 3 and U = U′ = 10, as shown in Fig. 2(b). Referring to the middle range of the closed-form expression (41), with γ ≈ L (for U ≪ N), the theory predicts a slope of −(L + 1), which matches the observed exponent −4 in Fig. 2(b). It is also observed that the head part is geometrically distributed with parameter 0.53 and the tail part is geometrically distributed with parameter 0.25, as predicted by the theory.

S8.2 Discussions on the accuracy of approximation
In the main text, it is remarked that the simulation curve is an average of the empirical distributions over N = 100 simulation runs; the goal is to reduce the variation in the empirical distribution plot, particularly in the tail, so that the main characteristics of the result can be observed more easily. The reason averaging over N runs reduces the variance is the following asymptotic convergence: the sample-averaged empirical degree distribution converges to the true degree distribution (as mentioned before in section S3.3); i.e., the average of the indicators 1{k^j_i = k} over nodes i and simulation runs j (where k^j_i is the degree of the i-th node in the j-th simulation at observation time T) converges almost surely (a.s.) to the true degree distribution for each k as N → ∞. The setting N = 100 is already large enough to ensure small variation for most parts of the curve (i.e., most values of k). However, significant variation remains in the tail (very large k) of the empirical distribution observed from the simulation, especially when U is large. For example, U = ∞ yields the BA model, and under the proposed MC model there is significant variation in the tail, as shown in Fig. 2(c), even over 100 simulation runs. These tail variations in the BA model are usually called the "node dynamics" in the existing literature and have not been theoretically addressed. In the proposed MC model, the difference between the deterministic part of the node dynamics and the proposed closed-form distribution stems from the neglected boundary terms λ(k̂ − 1)p_{k−1,N_T}(t) and λk̂ p_{k,N_T}(t) in equation (7), used to obtain the degree dynamics in equation (8), which causes an approximation error B_{k,N_T}(λγ) − e^{−λγT} p_k(T), as shown in the proof of Theorem 9.
It is noted that this error term is very small for small k; hence the error almost only occurs in the tail (i.e., at large k). In particular, the larger the term B_{k,N_T}(λγ) is, the more discrepancy it causes between the approximation formula and the empirical curves at the tail. As a result, from the proposed MC model's viewpoint, there is a larger tail error for larger U, as k̂ can increase further under the looser upper bound, causing more discrepancy (or dynamics), while the approximation error for a model with smaller U is smaller, as k̂ cannot increase far beyond the tighter upper bound. This prediction is verified in the simulations shown in Fig. 2(b) (where L = 3, U = 10) and Fig. 2(c) (where L = 3, U = N).
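The variance-reduction argument behind averaging over runs can be sketched with synthetic heavy-tailed samples (an integerized Pareto stand-in for one simulation run, not the MC simulator itself): the variance of the averaged empirical pmf at a tail degree drops roughly as 1/(number of runs).

```python
import random

random.seed(1)

def draw_run(n=400):
    # Synthetic heavy-tailed degree sample standing in for one simulation run.
    return [min(int(random.paretovariate(2.0)), 100) for _ in range(n)]

def pmf_at(k, samples):
    return sum(1 for s in samples if s == k) / len(samples)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

k_tail = 4                                     # a tail degree with small mass
single = [pmf_at(k_tail, draw_run()) for _ in range(40)]
averaged = []
for _ in range(40):
    estimates = [pmf_at(k_tail, draw_run()) for _ in range(10)]
    averaged.append(sum(estimates) / 10)       # average over 10 runs

# Averaging over runs tames the tail fluctuation, as argued above.
tamed = variance(single) > variance(averaged)
```

The same mechanism is what makes the N = 100 averaging suppress most, though not all, of the tail variation in the simulated curves.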

S9 Fitting Methodology
As the degree distribution is typically provided as a list of positive integers x_min, ..., x_max, we aim to fit the data with a discrete set of data points. Let N_T denote the number of data points. The steps of the fitting process are as follows. Step 1. Partial logarithmic binning: Because there are many more data points in the tail than in the head of the degree distribution, which may lead the fitting to over-focus on the tail, we first compute the probability density function (PDF) using bins of exponentially increasing width {B_i | i = 0, 1, 2, ...} once the degree x exceeds a threshold 25.
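A minimal sketch of Step 1 (the threshold and doubling rule are illustrative choices, not the paper's exact binning): unit bins below the threshold, bins of doubling width above it, and counts divided by bin width to form densities.

```python
from collections import Counter

def partial_log_binned_pdf(degrees, threshold=8):
    """Unit bins up to `threshold`, then bins of doubling width; returns
    (bin_center, density) pairs."""
    counts = Counter(degrees)
    n = len(degrees)
    out = []
    for k in range(1, threshold + 1):                # unit bins in the head
        out.append((k, counts.get(k, 0) / n))
    lo, width = threshold + 1, 2
    while lo <= max(degrees):
        hi = lo + width - 1                           # bin covers [lo, hi]
        c = sum(counts.get(k, 0) for k in range(lo, hi + 1))
        out.append(((lo + hi) / 2, c / (n * width)))  # density = count / width
        lo, width = hi + 1, width * 2
    return out

pdf = partial_log_binned_pdf([1, 1, 2, 3, 5, 9, 10, 20, 40], threshold=4)
```

Dividing by the bin width keeps wide tail bins comparable to the unit head bins on a log-log plot.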
Step 2. Find L′: Choose a value of X_min between x_min and x_max and estimate the value of the degree exponent corresponding to this X_min using (69).

With the obtained parameter pair (γ, X_min), assume that the degree distribution has the form given above, where ζ(γ, X_min) = ∑_{n=0}^{∞} 1/(X_min + n)^γ is the Hurwitz zeta function. Then, with the current X_min, use the Kolmogorov-Smirnov test to determine the maximum distance D(X_min) between the CDF of the data S(x) and the CDF of the estimated degree distribution S(p_x). We scan the whole range of X_min from x_min to x_max, calculate the corresponding D, and identify the X_min value for which D is minimal. Step 3. Find U′: Similarly to Step 2, scan the whole X_max range from X_min to x_max, calculate the corresponding D, and identify the X_max value for which D is minimal. Step 4. Fit the power-law segment: We apply least-squares fitting to fit the segment [L′, U′] with a power law p_phase2 = a × k^{−γ}, where a is a coefficient and −γ is the exponent.
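Steps 2-3 can be sketched as follows. The exponent estimate below is the standard continuous-approximation MLE, standing in for the paper's equation (69) (an assumption), and the Hurwitz zeta is evaluated by a truncated sum with an integral tail correction:

```python
import math

def hurwitz_zeta(gamma, q, terms=100000):
    """Truncated sum for zeta(gamma, q), gamma > 1, with a tail correction."""
    s = sum((q + n) ** -gamma for n in range(terms))
    return s + (q + terms) ** (1 - gamma) / (gamma - 1)

def ks_distance(data, x_min):
    """Return (gamma estimate, KS distance) for candidate x_min."""
    xs = sorted(x for x in data if x >= x_min)
    n = len(xs)
    # Continuous-approximation MLE for the exponent (assumed stand-in for (69)).
    gamma = 1.0 + n / sum(math.log(x / (x_min - 0.5)) for x in xs)
    z = hurwitz_zeta(gamma, x_min)
    # Model cdf on the integers from x_min up to the largest observation.
    F, acc = {}, 0.0
    for k in range(x_min, xs[-1] + 1):
        acc += (k ** -gamma) / z
        F[k] = acc
    D = max(abs((i + 1) / n - F[x]) for i, x in enumerate(xs))
    return gamma, D

data = [1, 1, 2, 2, 2, 3, 4, 5, 8, 13, 21, 40]   # toy degree list
gamma, D = ks_distance(data, x_min=2)
```

Scanning `x_min` over its range and keeping the value with minimal `D` implements the selection rule of Steps 2-3; the same scan over an upper cutoff gives Step 3.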
Step 5. Fit the small-degree flattening segment (i.e., the phase exhibiting the geom(γ/(L + γ)) distribution): With the exponent found in Step 4, we use the derived closed-form formula to fit the initializing phase. Since it is usually unclear how many truncated geometric distributions are combined in this phase, we use only two truncated geometric distributions (i.e., the maximum number of head parameters is set to 1). According to the equation, p_phase1 = p_0 (1 − p_a)^{k−2} 1{k ≥ 2}, where p_a = γ/(L + γ). We then apply least-squares fitting to find the best p_0 to fit the initializing-phase segment [1, L′]. For the general case of fitting the initializing phase (with a larger maximum number of head parameters), refer to Fig. 3.
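Step 5's least-squares fit is linear in p_0 once p_a is fixed, so the optimum has a closed form; a sketch on a synthetic head segment:

```python
def fit_p0(head_pmf, pa):
    """Least-squares p0 for the model p0 * (1 - pa)^(k-2) on the head segment.
    head_pmf: {degree k: observed probability}. With pa fixed, the optimal
    p0 is the ratio sum(y_k g_k) / sum(g_k^2), where g_k is the model shape."""
    num = den = 0.0
    for k, y in head_pmf.items():
        g = (1 - pa) ** (k - 2)     # model shape at degree k
        num += y * g
        den += g * g
    return num / den

# Synthetic head generated with p0 = 0.3 and pa = 0.4 -> exact recovery.
pa = 0.4
head = {k: 0.3 * (1 - pa) ** (k - 2) for k in range(2, 6)}
p0 = fit_p0(head, pa)
```

On real data the recovery is of course only approximate, but the same ratio gives the least-squares optimum.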
Step 6. Fit the high-degree cutoff segment (i.e., the phase exhibiting the geom(γ/(U + γ)) distribution): We fit the high-degree cutoff segment by the derived closed-form formula. Once γ and U are found in the previous steps, p_phase3 is determined on the maturing-phase segment [U′, N_T].
Step 7. Normalization: Finally, we normalize the probabilities of the three segments so that they sum to 1. In other words, find c such that c (∑_{k=1}^{L′} p_phase1(k) + ∑_{k=L′}^{U′} p_phase2(k) + ∑_{k=U′}^{N_T} p_phase3(k)) = 1.
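Step 7, sketched under the reading that a single constant c rescales all three fitted segments jointly (the segment values below are toy numbers):

```python
def normalize_segments(phase1, phase2, phase3):
    """Each phase is a dict {degree k: unnormalized fitted value}; rescale
    all three by one constant c so the stitched pmf has total mass 1."""
    total = sum(phase1.values()) + sum(phase2.values()) + sum(phase3.values())
    c = 1.0 / total
    combined = {}
    for seg in (phase1, phase2, phase3):
        for k, v in seg.items():
            combined[k] = c * v
    return combined

pmf = normalize_segments({1: 0.2, 2: 0.3}, {3: 0.5, 4: 0.25}, {5: 0.1})
```

If the boundary degrees L′ and U′ appear in two segments, their values should be merged before rescaling; the disjoint toy segments above sidestep that detail.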

S10 Further Discussion on the Experimental Data
The main text provides three datasets from citation networks. In this section of the Supplementary Information, another six datasets from social networks and vehicular networks are provided to verify our model.

Figure 3. The method of fitting the head part of the distribution, i.e., the initializing phase.

S10.1 Social networks
Table 3 shows the three social network datasets 26,28,29 used in our verification and analysis.

MC model of social networks: In social networks, people are considered as nodes. When a person p1 coauthors a paper with another person p2, adds p2 as a friend on Facebook, or follows p2 on Twitter, a link is built between p1 and p2 and the node degrees of both persons increase by 1. To further explain the lower bound L and the upper bound U in the new MC model, take the coauthorship network as an example. When there is no famous scientist (i.e., no node with high degree), each scientist likely chooses collaborators with equal probability. As more scientists join the network, a new scientist has a higher probability of working with those having better reputations (i.e., nodes with degrees > L). For those who are already very famous (i.e., nodes with degrees > U), a new scientist may choose to work with any of them with equal probability, because they all have good enough reputations. The process is similar for friendships on Facebook and followings on Twitter.

Empirical data and fitting results: Fig. 4 shows the distributions of coauthors in DBLP Computer Science publications, friends in the Facebook dataset, and followers in the Twitter dataset. Fig. 4(b) further shows the coauthor distribution in a subset of publications about Computer Networks in the DBLP dataset. The fitting parameters and errors are shown in Table 4. One can see that the new MC model reduces the fitting errors by up to 95%.

Table 4. Fitting parameters and errors in social datasets. MC is the fitting error of our model, PL is the fitting error of using a power law, PL−C is the fitting error of using a power law with exponential cutoff, and SPL−C is the error of a shifted power law with geometric cutoff.

Implications to social network analysis: In social networks, the distribution of the number of friends or coauthors each person has is an important indicator of the health of the network. Typically, the more evenly distributed the network is (i.e., the more friends each person has), the healthier (i.e., more desirable) it is. In the MC model, this corresponds to a smaller L: the exponent γ in the power-law region is proportional to L, hence a smaller L implies a flatter slope, which means a flatter distribution. However, as proven in the MC model, a smaller L also means a smaller growth rate of the network. Hence, there is a tradeoff between a healthier distribution and a faster growth of the network when changing L. In practice, an administrator of a social network can control L in various ways. For example, the administrator can introduce subsidies or promotions to lure the recruitment of new members, hence increasing L, or introduce fees for joining the network, hence reducing L, so as to obtain a desired tradeoff level.

Figure 4. Probability P(x) versus node degree x on a double logarithmic scale in the social network datasets. The node degree represents the number of coauthors, friends, or followers in the three datasets, respectively. The inset figures show the values of the Kolmogorov-Smirnov test versus X_min and X_max. L′ and U′ are selected from X_min and X_max to minimize the KS-test distance.

S10.2 Vehicular networks
Table 5 shows the two vehicular network datasets [30][31][32] used in our verification and analysis.

MC model of vehicular networks: In vehicular networks, vehicles are considered as nodes.
When a vehicle v1 is within communication range r of another vehicle v2, a link is built between v1 and v2, and the contact counts (i.e., node degrees) of both vehicles increase by 1. A vehicle's degree is related to the area in which it is located. For vehicles located in a quiet zone, there are only a few vehicles around, so their node degrees are low (i.e., degrees < L). These vehicles move freely and have an equal probability of meeting each other. On the other hand, a vehicle has a higher degree if it is closer to crowded areas (e.g., downtowns or popular scenic spots). As vehicles get closer to such areas, they meet more vehicles with higher probability. Vehicles located in such areas may be able to communicate with most vehicles there and can therefore link to those vehicles with equal probability.

Table 6. Fitting parameters and errors in vehicular network datasets. MC is the fitting error of our model, PL is the fitting error of using a power law, PL−C is the fitting error of using a power law with exponential cutoff, and SPL−C is the error of a shifted power law with geometric cutoff.

Empirical data and fitting results: Fig. 5 shows the distributions of contact counts for the Rome and Beijing taxi datasets. The fitting parameters and errors are shown in Table 6. One can see that the proposed MC model reduces the fitting errors by 23%-90%. One can also observe a similar trend that all the data exhibit three phases. However, the fitting is not as good as that for the citation and social networks. This is because, in vehicular networks, vehicles not only build links with other vehicles (when entering communication range) but also break some links (when leaving communication range).

Implications to vehicular network design: The MC model can be used to design better routing schemes for Delay-Tolerant Networking (DTN), such as vehicular networks.
With the intermittent connections in DTN, one of the main challenges is how to select nodes to forward data so as to improve reachability and throughput. Existing routing schemes rely on exchanging information when two nodes meet and predicting whether a node will get closer to the destination node in the future. These schemes, however, have two main disadvantages. First, the prediction is usually based on an exponential or power-law model, which has been shown above to have larger prediction errors. Second, when two nodes meet, they need to exchange a complete or summarized history, which can consume a lot of bandwidth, especially when the connection time is short. With the new MC model, one can calculate the probability directly from the derived closed-form formula, so vehicles only need to exchange the simple information of their current node degrees.