On the relationship between Gaussian stochastic blockmodels and label propagation algorithms

The problem of community detection receives great attention in recent years. Many methods have been proposed to discover communities in networks. In this paper, we propose a Gaussian stochastic blockmodel that uses Gaussian distributions to fit weight of edges in networks for non-overlapping community detection. The maximum likelihood estimation of this model has the same objective function as general label propagation with node preference. The node preference of a specific vertex turns out to be a value proportional to the intra-community eigenvector centrality (the corresponding entry in principal eigenvector of the adjacency matrix of the subgraph inside that vertex's community) under maximum likelihood estimation. Additionally, the maximum likelihood estimation of a constrained version of our model is highly related to another extension of label propagation algorithm, namely, the label propagation algorithm under constraint. Experiments show that the proposed Gaussian stochastic blockmodel performs well on various benchmark networks.


Introduction
In complex networks, communities [1] are considered as groups of vertices whose intraconnections are dense and inter-connections are sparse. Many real-world networks contain community structures. Detection of communities in networks has been developed a lot in recent years.
Since the proposal of modularity [2] as a quality function of a community partition, many algorithms were proposed to maximize modularity. Meanwhile, label propagation algorithms [3] are much simpler and more scalable for very large networks. of the adjacency submatrix computed inside the community. Additionally, the norm of this vector turns out to be the square root of the principal eigenvalue of this adjacency submatrix. This node preference as a local metric (i.e., a value proportional to intracommunity eigenvector centrality [19]) has never been proposed before. We show that the objective function of our model is the sum of the squares of the principal eigenvalues of the adjacency matrices of all communities. Also, we show that label propagation under constraint [5] is a special case of the maximum likelihood estimation of a constrained version of our model.
We experimented on two real-world networks and also synthetic networks, and illustrated that our model (GSBM-P) performs better than label propagation with intra-community PageRank [10] as node preference [7] and the variational Bayes method for weighted stochastic blockmodel [15] in community detection. Our method is also compared with label propagation algorithm (LPA) [3]. In synthetic networks, our model achieves a performance superior or comparable to that of LPA.
The rest of the paper is organized as follows: Section 2 reviews related work; Section 3 introduces our model and the maximum likelihood estimation of our model; Section 4 presents our experiments and the last section concludes this paper.

Label Propagation Algorithm
Label propagation algorithm [3] seeks to discover communities by propagating vertices' labels. It iteratively updates a vertex's label by choosing the label possessed most by its neighbors. If we denote the label of vertex i as z i , then the update step of z i can be formulated as where W is the adjacency matrix, and δ ab = 1 iff a = b is the Kronecker delta function. When the iteration process terminates, all vertices with the same labels are considered as a community. Tibély et al. [18], and also Barber and Clark [5] showed that the objective function of label propagation algorithm can be formulated as This objective function is the total weight of the intra-community edges. It is shown that label propagation algorithm aims to maximize Q LPA through a zero-temperature kinetics. It can be seen that this objective function encourages large communities. Thus, label propagation algorithm sometimes gets a trivial solution where all vertices are in one community and the objective function reaches its maximum.

Label Propagation with Node Preference
Leung et al. [6] introduced hop attenuation and node preference to label propagation algorithm. We leave out the introduction of hop attenuation regularization which prevents the formation of very large communities. They proposed a label update strategy where f j is the propagation strength (a.k.a. node preference) of node j, m is a parameter which controls the influence of f j . Subelj and Bajec [7] proposed the defensive diffusion and attenuation label propagation algorithm to accurately detect community cores. They define the propagation strength f m i of node i in community z i as the probability of a random walker inside the community labeled with z i visits vertex i, i.e., where N(i) denotes the set of vertices which are the neighbors of vertex i, and k z i j denotes the total weight of edges which connect vertex j and vertices from community z i . Note that this definition of node preference is similar to the L 1 -normalized PageRank [10] inside the community z i and can be easily extended to weighted networks. In the following part of this paper, we shall call this value as intra-community PageRank or local PageRank. Then, defensive diffusion applies preference to the core of each community, However, the label propagation with intra-community PageRank as node preference prefers small communities, and normally discovers much more communities than what is expected.
We propose the Gaussian stochastic blockmodel with node preference (GSBM-P), whose maximum likelihood estimation is equivalent to label propagation with node preference, where the node preference of a vertex is proportional to local eigenvector centrality computed inside its community rather than the local PageRank score. An alternative measure for eigenvector centrality [19] in directed network is the authority score and hub score in HITS [20]. In the field of ontology learning, He et al. [21] used an HITS based community detection algorithm to produce a concept hierarchy.
In this paper, we focus on undirected networks, and the proportion of intracommunity eigenvector centrality is derived from maximum likelihood estimation.

Weighted Stochastic Blockmodel
Aicher et al. [17] and Aicher et al. [15] applied stochastic blockmodel to weighted networks. For dense networks, they claimed that Gaussian distributions can be used to generate the weights of the edges between each pair of vertices. That is, For sparse networks, they claimed that absent edges are different from edges with zero weight. Therefore, they modelled the edge existence and edge weights separately. Exponential family distribution such as Gaussian distribution is adopted only for weighted edges. They further showed that degree-corrected stochastic blockmodel [13] can be adopted to model the existence of edges in the following manner: where P denotes the distribution for modeling edge existence, A ij is 1 when vertex i and j is connected and 0 otherwise, (i, j) ∈ E denotes a pair of linked vertices, p i is the parameter associated with vertex i and p i p j θ z i ,z j is the expectation of A ij , and α is between 0 and 1. Note that the authors did not propose a degree-corrected version of weighted stochastic blockmodel for modeling edge weights. However, the maximum likelihood estimation of σ 2 i,j will be zero if the weights of all the edges between group i and group j are the same, which creates a degeneracy in the likelihood calculation. Therefore, a Bayesian approach is adopted. Prior distributions are introduced to all parameters and variable z as well.
We propose a Gaussian stochastic blockmodel that can be viewed as a constrained version of degree-corrected version of weighted stochastic blockmodel. It assumes blocks are assortative in the way that it explicitly assumes sparse inter-block connections, so that it is more suitable for community detection.

Model
In our model, namely Gaussian stochastic blockmodel with node preference(GSBM-P), it is assumed that the graph is generated by • Assigning block indicators to each vertex; • Then drawing edge weights from a Gaussian distribution for each edge (including non-existing edges with weight equal to zero) between each pair of vertices.
The Gaussian distribution for the weight of the edge between each pair of vertices is defined as follows: where p i is the node preference of vertex i, z i is the block indicator for vertex i, and σ 2 is the variance of Gaussian distributions. This model assumes that the expected value of the weights of inter-community edges is 0; the expected value of the weights of the intra-community edges is a product of the node preference of the two vertices associated with the edges.
Our model can be viewed as a constrained version of the degree corrected version of weighted stochastic blockmodel(WSBM) proposed by Aicher et al. [17]. We extend the WSBM [17] to be degree corrected when explaining edge weights, rather than incorporate the degree corrected SBM [13] to explaining edge existence, which is proposed in [15]. The degree correction should make our model fit better than WSBM in networks with substantial heterogeneous degrees (or sum of linked edge weights). Meanwhile, our model can be applied in assortative networks. By explicitly assuming the connection between blocks should be sparse, our model can discover community structures accurately.

The Maximum Likelihood Estimation of GSBM-P
In this section, we adopt the maximum likelihood estimation to fit our model, and show its equivalence to label propagation with node preference. Since our model does not introduce priors over block indicators z, we treat the block indicators as parameters, and use coordinate ascent method to estimate both the block indicators and p. The adopted coordinate ascent method estimates a parameter to maximize the likelihood when fixing all other parameters.
The likelihood function of our model is where p i is node preference of vertex i. Maximizing the likelihood function is equivalent to minimizing σ for Gaussian models. Maximum likelihood estimation of σ 2 yields where n is the total number of vertices. Minimizing σ 2 is equivalent to maximizing the following objective function: The maximum-likelihood estimated value of p i can be expressed in the following iterative updating manner: or in vector and matrix notation: where p z is a vector whose entries consist of the node preferences of the vertices inside community z; W z is the adjacency matrix of community z. Vector p z converges to the principal eigenvector of W z . It can be shown that the norm of p z is the square root of the principal eigenvalue of W z , denoted as √ λ z . Thus, these node preferences are the product of √ λ z and the L 2 -normalized local eigenvector centrality computed inside the community z.
Noticing that for a specific community z, i,j the aforementioned objective function (Eq. (11)) can be expressed as follows: Therefore, the maximum likelihood estimation of GSBM-P using coordinate ascent method is equivalent to maximizing the sum of the squares of the principal eigenvalues of the adjacency matrices of all communities. Intuitively, a community z with higher λ z is better intra-connected. Thus, the maximum likelihood estimation of GSBM-P aims to find communities that are densely intra-connected.
3.1.1. Relationship with Label Propagation Algorithm with node preference Rewriting the objective function according to Eq. (14) and Eq. (15) yields Given all other vertices' labels, the update rule of vertex i should be which is exactly the update rule of label propagation with node preference without considering the hop attenuation regularization. The difference is that the node preference in Eq. (12) is derived in a statistically-grounded way rather than heuristically defined. The update rule has a clear explanation. Each vertex has a similarity with each community, which is the weighted sum of linked edge weights connected to this community, with more representative (higher local eigenvector centrality) vertex in the community owning higher weight. Then, each vertex joins in the most similar community. In this sense, the node preferences of vertices inside two communities should be updated immediately when a vertex joins in a new community, though it is not required by the coordinate ascent method. Thus, the maximum likelihood estimation of GSBM-P is equivalent to label propagation with a value proportional to intra-community eigenvector centrality as node preference.

a Constrained Version of GSBM-P
In this section, we examine a constrained version of our model and show its relation to label propagation under constraint. The constrained version assumes that each vertex i inside the community z owns the same node preference √ µ z , and hence is not degree corrected. That is, where the expected edge weight between vertex i and j is µ µ z i is a parameter associated with each community. This simplified model assumes that the expected value of the weights of the inter-community edges is 0; the expected value of the weights of the edges inside community z is a uniform value µ z . The likelihood function of a graph under this constrained model is Maximizing the likelihood function is equivalent to minimizing σ for Gaussian models. The maximum likelihood estimation of σ 2 can be expressed as Hence, minimizing σ 2 is equivalent to maximizing the following objective function: where n z denotes the number of vertices inside community z.
In this constrained version of our model, if we specify µ z = µ for all communities, then this simplified quality function can be viewed as the objective function of label propagation algorithm under constraint, since where z n 2 z is the penalty term, which is maximized when all vertices are in a community, and prevents the large communities from growing. µ can be viewed as a resolution parameter, which controls the community size.
In comparison with label propagation algorithm under constraint [5], leaving µ tunable makes two objective functions equivalent. Moreover, label propagation algorithm under constraint can be viewed as a coordinate ascent method for this objective function. Therefore, the maximum likelihood estimation of the block indicators in the constrained version of our model while leaving µ z tunable is a generalization of the label propagation algorithm proposed by Barber and Clark [5] with a penalty term The constrained version of our model has its disadvantages, especially that the expected weight of edges inside each community is equal. It may not fit well in real networks. By incorporating node preference, the expected weight of edges may vary according the the node preference assigned for each vertex. This renders our model more expressive and robust against modelling complex real-world networks. We show in next section that our proposed Gaussian stochastic blockmodel with node preference performs well.

Experiments
In this section, the coordinate ascent method for GSBM-P is tested on both real-world networks and synthetic networks. We also compare it with label propagation algorithm (LPA) and label propagation algorithm with intra-community PageRank (LPA-P) as the node preference on various synthetic networks, and the variational Bayes method for weight stochastic blockmodel (WSBM) on weighted synthetic networks.
Two real-world networks, namely the karate club [22] and political blogs [23] are chosen as test data. For synthetic networks, unweighted benchmark networks proposed by Lancichinetti et al. [24] and weighted benchmark networks proposed by Lancichinetti and Fortunato [25] are chosen. Erdös-Rényi random graphs [26] are chosen for checking overfitting. Gaussian distribution is not suitable to fit binary data (unweighted networks). However, in the literature, label propagation algorithms are often applied in unweighted networks. Due to the equivalence to label propagation with node preference, we also test our coordinate ascent method for GSBM-P in unweighted networks and show its good performance.
Normalized mutual information (NMI, a.k.a. symmetric uncertainty, introduced by Witten and Frank [27]) is adopted to reflect the similarity between the obtained partition and the true partition. Higher normalized mutual information implies greater similarity between two partitions. The maximal value of NMI is 1, which implies identical partitions.
In all of the following experiments, the coordinate ascent method for GSBM-P starts from the initialization where each vertex has its unique label. The iterative order of the vertices for the coordinate ascent method is random, and the node preferences inside two communities are updated immediately when a vertex moves from one community to another. The time complexity of this method should be O((d+td 1 C)n) in each iteration, where d denotes the average degree, t denotes the average iterative times for computing intra-community eigenvector centrality, d 1 denotes the average intra-community degree, and C denotes the average size of communities. In very large networks, the node preferences can be updated everytime all vertices are traversed for saving time. Then, the time complexity should be O(dn + td 1 n) in each iteration, which is scalable for very large networks.

Empirical networks
Karate club data set [22] is a social network of friendships between 34 members of a karate club at a U.S. university in the 1970s. The network has 34 nodes and 78 edges with weight 1. In reality, the karate club finally splits into 2 groups. The true partition of this network is known.
The coordinate ascent method for GSBM-P is run several times and the partition with the highest quality function value is chosen, which is impossible for traditional label propagation with node preference. Fig. 1 demonstrates the result of our method in the karate club network. The result of our method is a more fine-grained partition of the true partition. In fact, this result has a slightly higher quality function value than the true partition. Note that modularity maximization method in [28] and the non-parametric Bayesian mixed membership stochastic blockmodel [29] favor the partition with four communities. The true partition is a local optimum of our method which is frequently reached.
We notice that the vertex 10 is misidentified by degree corrected stochastic blockmodel in [13] and modularity maximization method in [28]. We verify that the objective function value (i.e., Eq. (16) or Eq. (17)) of the true partition is higher than that of the partition with misidentification of vertex 10. Thus GSBM-P performs better in karate club network.
For GSBM-P, we observed that vertices with higher degree have generally higher node preference inside one community, and in karate club network, the pair of vertices with higher degree inside one community are more likely to be connected. It is the same as expected by the model, which expects the weight of edge linked to pair of vertices with higher node preference should be larger.
We then show how our method performs in a larger network. political blogs data set [23] is a network of direct hyperlinks between political blogs whose largest connected component contains 1222 nodes. In this paper, we use the undirected form. Fig. 2 shows the result of our method in the larget connected component of political blog network.
GSBM-P discovers 9 communities with 7 tiny communities and 2 major communities that are roughly two political tendencies. The normalized mutual information of obtained partition is 0.678. If we merge small communities to one of the two major communities, normalized mutual information increases to 0.724. The performance is very close to that of the degree corrected stochastic blockmodel [13] with given cluster number.

Synthetic networks
4.2.1. Unweighted synthetic networks Since our model treats the block indicators as parameters, the maximum likelihood estimation of the block indicators of vertices may be bias especially when these vertices have very small degree. To check overfitting, the coordinate ascent method for GSBM-P is tested on Erdös-Rényi random graphs [26] where no community structures exist. Analogous to [30], the network sizes are fixed to 128, and the average degrees in all random graph range from 4 to 32. Our method is run 100 times on each random graph and the results with highest objective function value are chosen. In Fig. 3(a), the blue dashed line represents the number of connected components in each random graph. The number of discovered communities in each random graph is illustrated in Fig. 3(a). It shows that the coordinate ascent method is not overfitting when the average degree is not too small.
Our method is also tested on resolution limit test benchmark networks [31]. The networks are composed by cliques with 4 vertices and each clique is linked to the next clique with an edge to form a ring. Both of our method and label propagation algorithm [3] are run 10 times on each network and the average number of discovered communities is shown in Fig. 3(b). Each time, our method discovers the correct communities. Hence, in Fig. 3(b), the red line covers the green dashed line which represents the number of planted communities. Thus, our method tends not to suffer from resolution limit. However, the blue error bars in Fig. 3(b) imply that LPA algorithm is not stable. The average number of discovered communities of LPA is less than the planted, because LPA sometimes identifies several cliques as one community. We then test the algorithms in the unweighted benchmark networks proposed by Lancichinetti et al. [24]. Experiments show that the coordinate ascent method for GSBM-P is superior to LPA in those networks.
There are several parameters to generate the benchmark networks, including the number of vertices n, average degree, maximum degree, the degree distribution, the community size distribution, the average community size, the range of community size C, and the ratio that a vertex links to vertices outside its community (i.e., topological mixing parameter µ t ). When the mixing parameter is smaller than 0.5, the communities in generated networks are strong communities [32] where each vertex has more connections with vertices inside its community than vertices outside the community.
We set two kind of parameters with different range of community size according to [33] while leaving the mixing parameter µ t as the independent variable. We compare GSBM-P with LPA-P and LPA. The normalized mutual information between the obtained partition and true partition is calculated. Fig. 4 shows how the performance of these algorithms vary according to the mixing parameter in two kind of networks.
From Fig. 4(a) and Fig. 4(b), it can be seen that the true partition is an optimum of our method when the mixing parameter µ t is small. Our method outperforms LPA. The label propagation approach applied in LPA is very suitable for finding strong communities. However, as the mixing parameter is larger than 0.5 (i.e. weak communities), LPA fails. The performance of LPA-P is not satisfying.
When mixing parameter µ t is small, coordinate ascent method for GSBM-P always converges to the optimum in every run.
In our experiments, we find the number of iterations of those label propagation algorithms is few. The coordinate ascent method for GSBM-P may oscillate near the  . Normalized mutual information with respect to the mixing parameter µ t for different algorithms on two unweighted test benchmarks [24]. Each data point is an average over 30 networks generated by the same parameters.
local optimum when the mixing parameter is large. But 10 iterations is already enough for it to reach a local optimum. We then show the running time of algorithms in Fig. 5. In Fig. 5, GSBM-P(alter) represents the alternative coordinate ascent method for GSBM-P that only updates node preferences everytime all vertices are traversed. Mixing parameter µ t is set to 0.1, while parameters other than the number of vertices and community size are setting according to [33]. In Fig. 5(a), the community sizes are fixed to range from 20 to 100, and the number of vertices varies among the test networks. In Fig. 5(b), network sizes are fixed to 3000, the community size varies. The community size in x axis represents the average value, for instance, 390 in x axis represents the community sizes ranges from 370 to 410, and so forth.
It shows that the running time of our method is linear to network size and community size. For very large networks with large communities, the alternative coordinate ascent method can be applied.

Weighted synthetic networks
We then test the algorithms in the weighted benchmark networks proposed by Lancichinetti and Fortunato [25]. The weights of edges are non-negative in these benchmark networks.
There are two more parameters in weighted benchmark networks than unweighted benchmark networks, namely β which controls the power-law relation between the sum of weights a vertex links to its neighbors and the vertex's degree, and the ratio of weights a vertex links to vertices outside its community (i.e., mixing parameter µ w ).
We compare our method with LPA and the variational Bayes method for WSBM. We set the parameter α in WSBM (see Eq. 7) to 0.0 and 0.5, corresponding to pure WSBM and WSBM with auxiliary degree correction respectively. Community number . Running time with respect to the network size and community size for different algorithms on unweighted test benchmarks [24]. Each data point is an average over 100 runs.
is given for WSBM in all networks. We set two kind of parameters with different topological mixing parameter while leaving the mixing parameter µ w as the independent variable. The network sizes are fixed to 150, the average degree is 15, and the maximum degree is 30, and all other parameters are identical to that in [33]. The results are illustrated in Fig. 6. Each point in Fig. 6 shows the average NMI over the results with highest objective function values in 10 runs on each of 30 networks.
(a) (b) Figure 6. Normalized mutual information with respect to the mixing parameter µ w for different algorithms on two small weighted test benchmarks [25]. Each data point is an average over 30 networks generated by same parameters.
The variational Bayes algorithm for WSBM is not efficient. For example, it often reaches a poor local optimum. Only sometimes, it reaches a better result with higher marginal likelihood, which may not presents in 10 runs. Sometimes, the variational Bayes algorithm even not converge in 200 iterations. From Fig. 6, WSBM fails to give a good partition even when mixing parameter µ w is small, which may due to the lack of degree correction for modeling edge weights. Moreover, the time complexity of each iteration of naive variational Bayes algorithm for WSBM is O(mK 2 ), where m denotes the total number of edges in networks and K denotes the number of communities, which limits its application in very large networks. The coordinate ascent method for GSBM-P performs much better, and is comparable to LPA. In Fig. 6(a), when mixing parameter µ w is small, it recovers the planted partition.
We show the performance of our method in weighted networks with 1000 vertices. We set four kind of parameters with different range of community size and different topological mixing parameter according to [33] while leaving the mixing parameter µ w as the independent variable. The results are illustrated in Fig. 7. Here we omit the presentation of WSBM, because it takes too much time to run on 3060 networks. (c) (d) Figure 7. Normalized mutual information with respect to the mixing parameter µ w for different algorithms on four weighted test benchmarks [25]. Each data point is an average over 30 networks generated by same parameters.
In all of the four groups of weighted benchmark networks, the coordinate ascent method for GSBM-P recovers the true partition(see Fig. 7(a)) or achieves the results whose NMI are only slightly lower than that of LPA (see Fig. 7(b), Fig. 7(c) and Fig.  7(d)). When mixing parameter µ w is slightly larger, our method outperforms LPA. In Fig. 7(c) and Fig. 7(d), LPA discovers much more communities than the true partition even when the mixing parameter µ w is small. It implies that the communities are not well connected inside themselves. The performance of our method is again comparable to LPA on those benchmark networks, while LPA-P is not satisfying.

Conclusions and Future Work
In this paper, we proposed the Gaussian stochastic blockmodel with node preference (GSBM-P). The maximum likelihood estimation of GSBM-P is proved to be equivalent to maximizing the objective function below: where λ z is the principal eigenvalue of the adjacency submatrix in community z. We then proved that the maximum likelihood estimation of our model(i.e., our coordinate ascent method) can be seen as label propagation with node preference. We demonstrated that the vector composed by node preferences of vertices inside community z is the product of √ λ z and the intra-community eigenvector centrality, i.e. the L 2normalized dominant eigenvector of the adjacency matrix inside the community.
Experiments showed that the coordinate ascent method for Gaussian stochastic blockmodel with node preference worked well in most cases. It outperforms the variational Bayes method for weighted stochastic blockmodel and label propagation with intra-community PageRank as node preference in the aspect of community detection. In unweighted networks, the coordinate ascent method for GSBM-P is superior to LPA. In weighted networks, the performance of our method is comparable to LPA.
Subelj and Bajec [4] have proposed that the iterative order implicitly contains the propagation strength. In the future, we may explore the suitable iterative order for our coordinate ascent method. We make two strong assumptions for our model, such that it is related to label propagation algorithms. However, these assumptions may be strict. The first one is that we fixed the expected edge weights between distinct communities to be zero. In the future, we may try to leave this small expected inter-community edge weight as a tunable parameter. The second one is that we treat the block indicators as parameters. Possible future work also includes placing a prior on block indicators and adopts more advanced approximation inference methods such as variational Expectation Maximization.