ANALYTICAL SOLUTION FOR THE SPREAD OF EPIDEMIC DISEASES IN COMMUNITY CLUSTERED NETWORK

Abstract: We present a bond percolation model for community clustered networks with an arbitrarily specified joint degree distribution. Our model is based on the probability generating function (PGF) method for multitype networks, but incorporates the free-excess degree distribution, which makes it applicable to clustered networks. In the context of contact network epidemiology, our model serves as a special case of community clustered networks, which are more appropriate for modelling disease transmission in community networks with clustering effects. Beyond the percolation threshold, we obtain the probability that a randomly chosen community-i node leads to the giant component. This is the probability that an individual in a community will be affected by the infectious disease. We also establish a method to calculate the size of the giant component and the average small-component size (excluding the giant component). When the clustering effect is taken into account through the free-excess degree distribution, the model shows that clustering decreases the size of the giant component. In short, our model enables one to carry out numerical calculations to simulate disease transmission in community networks with different community structure effects and clustering effects.

Introduction
It has long been recognized that two of the key features of social networks are community structure and clustering. The former means that links are dense within a community but sparse between communities, while the latter refers to the relative number of triangles in a network. Regarding community structure, the links between communities make the network less heterogeneous and result in larger epidemic prevalence in networks with an exponential degree distribution [1]. For other types of degree distribution, such as the power-law degree distribution, the authors of [2] developed an algorithm to obtain a social network model with a multiple-community structure, adjustable clustering coefficients and an adjustable degree of community. They showed that a heterogeneous network is less efficient than a homogeneous network at spreading an epidemic. Unlike [2], we study the bond percolation model of community clustered networks with an arbitrary joint degree distribution by using the PGF formalism.
In a series of papers [3,4,5], M.E.J. Newman used PGFs for random graphs with arbitrary distributions of vertex degree. With the mathematics of generating functions, the author calculated exactly some statistical properties of such graphs in the limit of large numbers of vertices, including the mean component size and the giant component. By combining a mapping to percolation models with the generating function method, Newman established analytic expressions for the size of epidemic outbreaks and the mean degree of individuals affected in an epidemic. Following Newman's work, many researchers have considered further features to improve percolation models based on the generating function method. For example, in [6], the author developed a model that represents heterogeneous populations in order to study mixing patterns. There are many other models, including percolation models for random directed networks [7] and models for two competing diseases spreading over the same network at the same time [8]. Despite all these efforts, the impact of community structure on epidemic spreading has not been well considered for models of the PGF formalism type. In this paper, we intend to fill this gap by proposing a PGF model with community structure and investigating epidemic spreading on such networks.
Some other models use different mathematical tools or simulation to predict the transmission of infectious diseases in social networks, among them [9,10,11]. In [9], the authors proposed a growth model to create networks with overlapping community structure and investigated epidemic spreading on these networks by numerical simulation, while in [10], the authors proposed a model in which the links within and between communities are dynamical in order to study epidemic spreading in communities. In [11], a model consisting of two types of nodes with different behaviour patterns, i.e. active nodes and passive nodes, was proposed to study the effects of social impact on epidemic spreading in complex networks based on the SIS epidemic model; simulations were done for the model on the ER random network, the BA scale-free network, the structured scale-free network, and the WS small-world network. Unlike these models, our model has a specific joint degree distribution within and between the communities. For this purpose, we use the PGF formalism to generate the community structure model with the specific degree distribution we desire for each node. The biggest advantage is that the PGF formalism allows us to obtain an analytical solution.
Clustered networks are another area that has received considerable attention from researchers. In [7], the author introduced a class of random clustered networks and showed that the clustered networks had smaller component sizes and a higher epidemic threshold than unclustered networks with the same preferential mixing. We intend to study the clustering effect in our community structure model.
In this paper, we present a bond percolation model of community clustered networks with an arbitrary joint degree distribution (i.e. the degree distribution can be specified arbitrarily). Our model is based on the PGF formalism for multitype networks introduced by Antoine Allard et al. [12]. Their multitype network model is an extension of the PGF formalism investigated by M.E.J. Newman. In addition, we incorporate the free-excess degree distribution, which was introduced in [13], to make the model applicable to clustered networks. We focus on complex networks with an arbitrary joint degree distribution. In the context of contact network epidemiology, our model serves as a special case of community clustered networks, which are more suitable for modelling disease transmission in community networks with the clustering effect. For a given clustering coefficient and beyond the percolation threshold, we obtain the probability that a randomly chosen community-i node leads to the giant component. In the context of contact network epidemiology, this is the probability that an individual in a community is affected by the infectious disease. In addition, we derive formulae to calculate the size (i.e. fraction) of the giant component and the average small-component size (excluding the giant component) in the community structure network. If the disease transmission rate between each pair of communities is the same in both directions, the size of the giant component in each community-i is equivalent to the probability that a randomly chosen community-i node leads to the giant component. When the clustering effect is taken into account through the free-excess degree distribution, our model shows that clustering reduces the size of the giant component.
The rest of the paper is organized as follows. In Section 2, we discuss the assumptions used in our community clustered networks. In Section 3, we present the PGF formalism for the proposed community clustered networks. In Sections 4 and 5, we focus on the calculation of the outbreak size distribution and the percolation threshold, followed by numerical simulations in Section 6. Some conclusions are given in Section 7.

Community Clustered Networks
We discuss a model with 2 communities, which can be generalized to multiple communities. Throughout the discussion, we assume that (a) there exist realizable degree sequences which lead to simple graphs (i.e. networks) with no self-loops, (b) the inter-community edges are redistributed according to the three ways explained in Section 3.1, and (c) some isolated nodes are assigned to each community. Assumption (b) implies that a power-law distribution is not applied here. In other words, nodes with higher degree do not necessarily have more inter-community edges. Under assumption (c), in a finite time, the isolated nodes remain isolated when we randomly connect the inter-community edges to the nodes with different degrees in each community. This can serve as a checkpoint for our computer program when we use it to find the distribution of the small-component size. An isolated node has no contacts, so it cannot transmit the disease. Hence, the number of isolated nodes in each community remains the same. Apart from that, we set the highest degree at 8. In other words, the network is not highly right-skewed and does not have super-infection nodes.
In some of our analysis, we further assume that one of the communities has a low vaccination rate and thus a higher rate of disease transmissibility among its members. Besides that, the rate of disease transmissibility along the inter-community edges is low, as sick individuals travel less. For our models, we assume that the exact number of nodes having degree k, denoted by $n_k$, is known. Hence, we can write the exact generating function for the probability distribution in the form of a finite polynomial.
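As a minimal sketch of the finite-polynomial form mentioned above, the snippet below builds the coefficients of $G(x) = \sum_k (n_k/N)\,x^k$ from a list of node counts and checks the normalisation $G(1) = 1$ and the mean degree $G'(1)$. The counts and the function names are hypothetical and chosen only for illustration; they are not the data of the models in this paper.

```python
from fractions import Fraction

def pgf_coefficients(counts):
    """Return [P(0), P(1), ...] from node counts n_k (list index = degree k)."""
    total = sum(counts)
    return [Fraction(n, total) for n in counts]

def pgf_value(coeffs, x):
    """Evaluate G(x) = sum_k P(k) x^k with Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def pgf_mean(coeffs):
    """Mean degree G'(1) = sum_k k P(k)."""
    return sum(k * c for k, c in enumerate(coeffs))

# Hypothetical counts n_0..n_4 (illustrative only, not the paper's data).
counts = [10, 20, 40, 20, 10]
coeffs = pgf_coefficients(counts)
print(pgf_value(coeffs, 1))  # normalisation check, G(1)
print(pgf_mean(coeffs))      # mean degree, G'(1)
```

Exact rational arithmetic is used so that the normalisation check is not affected by rounding.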

Formalism
We now present a formalism that describes the bond percolation model of community networks. It is based on the PGF formalism for multitype networks introduced by Antoine Allard et al. [12].

Degree Distribution
First, we assume that the arbitrarily specified degree distribution produces a realizable degree sequence. Let $P_{i=1}(k_1, k_2, \dots, k_M)$ and $P_{i=2}(k_1, k_2, \dots, k_M)$ be the degree distributions, i.e. the probabilities that a randomly chosen community-i node is connected to $k_1$ nodes, $k_2$ nodes, $k_3$ nodes and so on. Since we deal with community structure networks, some of the edges of a node may connect to nodes in another community.
For the sake of simplicity, we discuss the PGF formalism for the model with two communities. In this case, there are two ways to represent the degree distribution, namely by the number of nodes in each community or by the number of nodes connecting community-i to community-j.

Degree Distribution for Number of Nodes in Each Community
In the first way, we need the degree distribution for the number of nodes in each community. As an example, we consider a degree distribution where $\mathbf{k} = \{k_l\}_{l=0}^{6} = \{l\}_{l=0}^{6} = (0, 1, 2, 3, 4, 5, 6)$ and the value of $p_i(k_l)$ is the number of nodes with degree l in community-i. In this example, community 1 has 10 nodes with degree 0 (i.e. isolated nodes), 10 nodes with degree 1, 10 nodes with degree 2, 45 nodes with degree 3, and so on. We can also obtain the number of edges in each community: in our example there are 2170 and 2450 edges in communities 1 and 2 respectively, for a total of 4620 edges. Note that here one link is counted as two edges. If we divide $P_i(\mathbf{k})$ by $N_i$, where $N_i$ denotes the number of nodes in community-i, we obtain a probability degree distribution. Without considering the rate of transmissibility of diseases, the PGF is then defined for i = 1, 2, where M denotes the number of communities (2 in this example) and $x_l^{k_l}$ denotes a node in community-l with degree $k_l$.
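For orientation, a PGF of this kind is usually written in the multitype formalism of [12] as shown below. This is a sketch of the standard form under the assumption that $P_i(\mathbf{k})$ has already been normalised by $N_i$; it is not a transcription of the equation in this paper.

```latex
G_i(\mathbf{x}) = \sum_{k_1=0}^{\infty} \cdots \sum_{k_M=0}^{\infty}
    P_i(k_1, \dots, k_M) \prod_{l=1}^{M} x_l^{k_l},
\qquad G_i(\mathbf{1}) = 1, \qquad i = 1, \dots, M .
```

When the node counts are known exactly, the sums are finite and $G_i$ is a polynomial in $x_1, \dots, x_M$.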

Degree Distribution for Number of Nodes Connecting Community-i to Community-j
The second way of representing the degree distribution is by considering the number of nodes connecting community-i to community-j. In the corresponding degree distribution, $p_i(k_i)$ is as defined before, while $p_i(k_{jl})$ denotes the number of nodes in community-i with $k_{jl}$ edges linking to community j, in which $k_{jl} = l$ for $l = 0, 1, 2, \dots$. In our example, $p_1(k_1)$ shows that community 1 has 10 nodes with degree 0 (i.e. isolated nodes), 10 nodes with degree 1, 20 nodes with degree 2, and so on. The data of $p_1(k_2)$ show that there are 420 nodes in community 1 with no edge linking to community 2, 50 nodes in community 1 with one edge linking to community 2, and so on. The same applies to the nodes in community 2. We can also obtain the number of inter-community edges and the number of intra-community edges in each community. The number of intra-community edges is 2170 in community-1 and 2450 in community-2. For the two-community model, $k = k_1 + k_2$, hence the total number of edges must be the same as that obtained by the first way in Section 3.1.1 (where it is 4620). The number of inter-community edges must be the same for communities 1 and 2; in our model, there are 120 inter-community edges.
Without considering the rate of transmissibility of diseases, let $G_{ij}$ be the generating function, where the subscript ij represents a chosen node in community-i connected to a node in community-j, and let $P_i(k_{uv})$ be the probability that a randomly chosen community-i node with u edges connects to the nodes in community v. The first derivative of the corresponding PGF, evaluated at 1, denotes the average number of edges connecting nodes in community-i to nodes in community-j, and we represent it as $z_{ij}$.
Definition 1. Let $z_{ij}$ be the average number of edges connecting nodes in community-i to nodes in community-j. In matrix form, for two communities, we have $z = \begin{pmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{pmatrix}$. Let the community-i nodes occupy a fraction $w_i$ of the network, and define $w = \begin{pmatrix} w_1 & 0 \\ 0 & w_2 \end{pmatrix}$.
We have $wz = (wz)^T$, $\mathrm{tr}(w) = 1$, $w_1 = \frac{z_{21}}{z_{12}+z_{21}}$ and $w_2 = \frac{z_{12}}{z_{12}+z_{21}}$. In our model, $w_1 = \frac{z_{21}}{z_{12}+z_{21}} = \frac{2/9}{6/25+2/9} = \frac{25}{52}$ and $w_2 = \frac{z_{12}}{z_{12}+z_{21}} = \frac{6/25}{6/25+2/9} = \frac{27}{52}$. This means that a fraction $\frac{25}{52}$ of the nodes is in community 1 and a fraction $\frac{27}{52}$ is in community 2. The following two ways can be used to determine how strong the community structure is: (a) Let Z be the number of edges connecting nodes in community-i to nodes in community-j; then $Z = \frac{n}{2m} z$, where n is the number of nodes and m is the number of edges. We can use this information to obtain the modularity Q. (b) In [14], the degree of community σ is given by σ = p/q, where p is the probability that links exist within the community and q is the probability that links exist between the communities. In this work, we redefine $\sigma = \mathrm{tr}(z) / \sum_{i \neq j} z_{ij}$, where $\sigma \gg 1$ implies a strong community structure. It is easy to show that our model has Q = 0.446 or σ = 18.02.
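The arithmetic for the community fractions can be checked with exact rationals. The sketch below uses the values $z_{12} = 6/25$ and $z_{21} = 2/9$ quoted above; the function name is ours and is introduced only for illustration.

```python
from fractions import Fraction

def community_fractions(z12, z21):
    """Return (w1, w2) with w1 + w2 == 1 and w1*z12 == w2*z21,
    which is the symmetry condition wz = (wz)^T for two communities."""
    total = z12 + z21
    return z21 / total, z12 / total

z12, z21 = Fraction(6, 25), Fraction(2, 9)
w1, w2 = community_fractions(z12, z21)
print(w1, w2)  # 25/52 27/52

# Each inter-community edge has one end in each community,
# so the edge-end densities seen from both sides must agree.
assert w1 * z12 == w2 * z21
```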

Algorithm for Redistributing the Intercommunity Links
There are a few possible ways to distribute the intercommunity links, including: (a) the links are randomly attached to the nodes in each community, (b) the links are equally attached to the nodes in each community, and (c) the links are preferentially attached to the nodes with higher degrees in each community.
Consider the two-community model in which the links are equally attached to the nodes in each community. Let n be the number of intercommunity links, and let M be the highest degree of the nodes in the respective community.
If $n \bmod M = R$, then $YM + R = n$. Let P and Q be the numbers of nodes with degree 1 and degree M, respectively. Then the number of nodes with degree M + 1 is Y + R, the number of nodes with degree M is P − R, and the number of nodes with degree 1 is Q − Y.

The Occupied Degree Distribution
To discuss the PGF formalism with the rate of disease transmission included, we consider the occupied degree distribution. For the two-community model, we define the bond occupation probability matrix as $\mathbf{T} = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix}$. Definition 2 specifies the probability that a randomly chosen degree-$\mathbf{k}$ node has $\tilde{\mathbf{k}}$ occupied edges, the resulting occupied degree distribution $\tilde{P}_i(\tilde{\mathbf{k}})$, and the associated PGF $G_i(\mathbf{x};\mathbf{T})$. From Definition 2, the average occupied degree connecting nodes in community-i to nodes in community-j is $\tilde{z}_{ij} = \partial G_i(\mathbf{1};\mathbf{T}) / \partial x_j$. For two communities (we assume each node has degree at most 6, as in model 2), the generating functions that include the rate of transmissibility are written in terms of $L_1$ and $L_2$, the total numbers of edges in communities 1 and 2 respectively, and $L_{ij}$, the total number of edges that link nodes in community-i to nodes in community-j. Note that $G_1(\mathbf{1};\mathbf{T}) = 1$ and $G_2(\mathbf{1};\mathbf{T}) = 1$.
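In the multitype formalism of [12], on which this section is based, each edge from a community-i node to a community-j node is occupied independently with probability $T_{ij}$. Under that assumption, the quantities of Definition 2 take the usual binomial form sketched below. These are the standard expressions written in our own notation, not a transcription of the equations of this paper.

```latex
\begin{aligned}
P_i(\tilde{\mathbf{k}} \mid \mathbf{k})
  &= \prod_{j=1}^{M} \binom{k_j}{\tilde{k}_j}\,
     T_{ij}^{\tilde{k}_j}\,(1 - T_{ij})^{k_j - \tilde{k}_j}, \\
\tilde{P}_i(\tilde{\mathbf{k}})
  &= \sum_{\mathbf{k} \ge \tilde{\mathbf{k}}} P_i(\mathbf{k})\,
     P_i(\tilde{\mathbf{k}} \mid \mathbf{k}), \\
G_i(\mathbf{x};\mathbf{T})
  &= \sum_{\tilde{\mathbf{k}}} \tilde{P}_i(\tilde{\mathbf{k}})
     \prod_{l=1}^{M} x_l^{\tilde{k}_l}
   = \sum_{\mathbf{k}} P_i(\mathbf{k})
     \prod_{l=1}^{M} \bigl[\,1 + (x_l - 1)\,T_{il}\,\bigr]^{k_l}, \\
\tilde{z}_{ij}
  &= \left.\frac{\partial G_i(\mathbf{x};\mathbf{T})}{\partial x_j}\right|_{\mathbf{x}=\mathbf{1}}
   = T_{ij}\, z_{ij}.
\end{aligned}
```

The last line follows because differentiating the product with respect to $x_j$ and setting $\mathbf{x} = \mathbf{1}$ leaves $k_j T_{ij}$, whose average over $P_i(\mathbf{k})$ is $T_{ij} z_{ij}$.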

The Occupied Excess Degree Distribution
Definition 3 gives the occupied excess degree distribution and its PGF, $F_{ij}(\mathbf{x};\mathbf{T})$, where the subscript ij represents the chosen edge connecting community-i and community-j.
For our 2-community model, $F_{ij}(\mathbf{x};\mathbf{T})$ can be obtained explicitly. For two-community networks with nodes of degree at most 6, using (2) and some algebra, we obtain an expression for $F_{11}(x_1, x_2; \mathbf{T})$ in which the parameter $p_{uv}$ denotes the probability that a node with excess degree v is reached by following a randomly chosen edge with excess degree u. Similar formulae can be established for $F_{12}(x_1, x_2; \mathbf{T})$, $F_{21}(x_1, x_2; \mathbf{T})$ and $F_{22}(x_1, x_2; \mathbf{T})$, where the parameter p is replaced by q, r and s respectively.

The Occupied Free-Excess Degree Distribution
To treat clustered networks (C > 0), we apply the free-excess degree distribution concept introduced in [12]. Analogously to the excess degree, we follow one of the edges of node $v_0$ to reach a neighbour $v_1$ with degree $d(v_1) = i + 1$, where i is the excess degree. We are interested in the probability that $v_1$ has k neighbours that are not connected back to $v_0$ (via a triangle). Definition 4 gives the free-excess degree distribution in terms of this probability. By Definition 4, we use the generating function associated with the occupied excess degree distribution given in Definition 3. Thus, we obtain a relationship for the generating function of the occupied free-excess degree distribution.

Outbreak Size Distribution
In this section, we discuss an iteration method to obtain the probability that a randomly chosen community-i node leads to the giant component. The discussion is written for the model without the clustering coefficient C; for the model with clustering coefficient C, $F^c_{ij}$ replaces $F_{ij}$ as in (3). Let $H_{ij}(\mathbf{x};\mathbf{T})$ be the generating function for the size distribution of the component reached by following an i → j edge.
The solution of equation (4) can be found by seeking the stable fixed point of the corresponding iterative mapping for $i = 1, \dots, M$ as $n \to \infty$, with initial conditions $H^{0}_{ij}(\mathbf{x};\mathbf{T}) = x_j$. The equation says that when we follow an i → j edge, we find at least one node at the other end (the factor of $x_1$, $x_2$ on the right-hand side), plus some other clusters of nodes (each represented by $H_{ij}(\mathbf{x};\mathbf{T})$) that are reachable by following the other edges attached to that node. The number of such clusters is distributed according to the coefficients of the excess-degree generating function, hence the appearance of $F_{ij}(\mathbf{x};\mathbf{T})$. For our model, (4) and (5) give the iteration process for $H_{ij}(x_1, x_2; \mathbf{T})$.
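The fixed-point idea can be illustrated in the simplest possible setting: a single community, a uniform transmissibility T, and no clustering. These simplifications are assumptions made here for brevity; this is not the two-community system of this paper. In that setting, u is the probability that an edge does not lead to the giant component and solves $u = F(u;T)$, and the giant-component fraction is $S = 1 - G(u;T)$. The function names and the degree counts below are hypothetical.

```python
def occupied_pgfs(counts, T):
    """Return (G, F): PGFs of the occupied degree and occupied excess degree
    for node counts n_k (list index = degree k) and edge occupation T."""
    total = sum(counts)
    p = [n / total for n in counts]
    z = sum(k * pk for k, pk in enumerate(p))  # mean degree

    def G(x):
        y = 1 + (x - 1) * T  # an edge contributes x only if it is occupied
        return sum(pk * y**k for k, pk in enumerate(p))

    def F(x):
        y = 1 + (x - 1) * T
        return sum(k * pk * y**(k - 1) for k, pk in enumerate(p) if k > 0) / z

    return G, F

def giant_component_fraction(counts, T, tol=1e-12, max_iter=100000):
    """Iterate u <- F(u) from u = 0, which converges monotonically to the
    smallest fixed point in [0, 1], then return S = 1 - G(u)."""
    G, F = occupied_pgfs(counts, T)
    u = 0.0
    for _ in range(max_iter):
        u_next = F(u)
        if abs(u_next - u) < tol:
            break
        u = u_next
    return 1 - G(u)

# Hypothetical degree counts n_0..n_4 (illustrative only).
print(giant_component_fraction([10, 20, 40, 20, 10], T=0.8))
```

Starting from u = 0 rather than u = 1 avoids the trivial fixed point u = 1, which always exists and corresponds to no giant component.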

Percolation Threshold
Using the moment property of the PGF, the average number of community-i nodes in the small component reached from a randomly chosen node is obtained by differentiating $H^{(n)}_{ij}(\mathbf{x};\mathbf{T})$ with respect to $x_i$. The resulting expression for the average number of community-i nodes in the small component, $s_i$, involves coefficients $\alpha_{lj}$, which are well approximated by the solution of $\alpha_{jn}$, and the average excess degree, which is given by $B_{\mu v}\,\delta_{iv}$. If there are only two communities, the matrix A and the average of s can be written out explicitly. Hence, the phase transition occurs when $\det(I - A) = 0$.
After redistributing the intercommunity links, we obtain the model studied below. For this model, σ = 19.634, or Q = 0.451 in terms of modularity. First, we study the effect of the transmissibility rate. In this case, all links have the same transmissibility rate, that is, both the inter-community and the intra-community edges have the same transmissibility rate. For example, T = 0.1 means that $\mathbf{T} = \begin{pmatrix} 0.1 & 0.1 \\ 0.1 & 0.1 \end{pmatrix}$. Figure 1 shows the result obtained.
Secondly, we study the effect of clustering in this model. Using (10), the epidemic threshold is determined to be 0.599. Above the epidemic threshold, clustering reduces the probability that a randomly chosen community-i node leads to the giant component. In this case we use T = 0.64. Figure 2 shows the result obtained.
Thirdly, by using (8), we determine the average number of community-i nodes in the small component, $s_i$, reached from a randomly chosen node for different clustering coefficients, C, in Figure 3. For this analysis, we distribute the inter-community edges randomly across the nodes in each community. In this case there exists a giant component (i.e. an outbreak of disease). We present the results in Figure 4 and Figure 5.
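The threshold condition $\det(I - A) = 0$ can be sketched numerically for two communities. The snippet below assumes, purely for illustration, that all edges share one transmissibility T and that $A = T\,B$ for some fixed 2×2 matrix B of mean excess degrees. Both the factorisation and the entries of B are hypothetical and are not taken from this paper. Under that assumption the threshold is the reciprocal of the largest eigenvalue of B.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def critical_T(B):
    """Smallest T > 0 with det(I - T*B) = 0, i.e. 1 / (largest eigenvalue of B).
    Assumes B has non-negative entries, so its eigenvalues are real."""
    trace = B[0][0] + B[1][1]
    disc = trace**2 - 4 * det2(B)
    lam_max = (trace + math.sqrt(disc)) / 2
    return 1 / lam_max

def det_I_minus_TB(B, T):
    """Evaluate det(I - T*B) for a 2x2 matrix B."""
    return det2([[1 - T * B[0][0], -T * B[0][1]],
                 [-T * B[1][0], 1 - T * B[1][1]]])

# Hypothetical mean-excess-degree matrix (illustrative only).
B = [[1.5, 0.1],
     [0.2, 1.6]]
Tc = critical_T(B)
print(Tc, det_I_minus_TB(B, Tc))
```

For transmissibility matrices whose entries differ between communities, the same determinant condition applies but must be solved along the chosen parametrisation of $\mathbf{T}$.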

Concluding Remarks
In this paper, we focus on complex networks with an arbitrary joint degree distribution. We study community clustered networks, which are more suitable for modelling disease transmission in community networks with clustering effects. For a given clustering coefficient and beyond the percolation threshold, we obtain the probability that a randomly chosen community-i node leads to the giant component. In the context of contact network epidemiology, this is the probability that an individual in a community is affected by the infectious disease. We have also derived formulae to calculate the size of the giant component and the average small-component size (excluding the giant component). Taking into account the clustering effect through the free-excess degree distribution, the model shows that clustering decreases the size of the giant component.

Figure 1: Diagram showing the probability, P, that a randomly chosen community-i node leads to the giant component versus transmissibility rate, T, for model 1. The diagram also shows the fraction of the giant component in communities 1 and 2 (S1 and S2) versus transmissibility rate, T. The epidemic threshold in this example is 0.599.

Figure 2: The probability, P, that a randomly chosen community-i node leads to the giant component versus clustering coefficient, C, for model 1 when T = 0.64. S1 and S2 are the fraction of the giant component in communities 1 and 2 respectively.

Figure 3: A diagram showing the average number of community-i nodes in the small component, $s_i$, reached from a randomly chosen node for different clustering coefficients, C, for model 1 when T = 0.64.

Figure 4: The probability, P, that a randomly chosen community-i node leads to the giant component versus the clustering coefficient, C, for model 2. S1 and S2 are the fraction of the giant component in communities 1 and 2 respectively.

Figure 5: A diagram showing the average number of community-i nodes in the small component, $s_i$, reached from a randomly chosen node for different clustering coefficients, C, for model 2.

Table 1: Number of edges in each community in model 1.