On the relationship between Gaussian stochastic blockmodels and label propagation algorithms

Junhao Zhang; Tongfei Chen; Junfeng Hu

doi:10.1088/1742-5468/2015/03/P03009

1. Introduction

In complex networks, communities [1] are considered as groups of vertices whose intra-connections are dense and inter-connections are sparse. Many real-world networks contain community structures. Community detection in networks has advanced considerably in recent years.

Since the proposal of modularity [2] as a quality function of a community partition, many algorithms have been proposed to maximize modularity. Meanwhile, label propagation algorithms [3] are much simpler and more scalable for very large networks. The label propagation algorithm has many extensions, such as the balanced label propagation algorithm [4], label propagation under constraint [5] and label propagation with hop attenuation and node preference [6, 7]. Barber and Clark [5] showed that modularity can be viewed as a special case of label propagation under constraint. Liu and Murata [8] introduced multi-step greedy agglomeration [9] to the label propagation approach used by Barber and Clark [5], which helps to escape from poor local optima. Leung et al [6] first introduced hop attenuation and node preference to the label propagation algorithm. Šubelj and Bajec [7] used a formulation similar to PageRank [10] inside each community as the node preference and used the defensive diffusion and attenuation label propagation algorithm to extract the cores of communities.

Recent years have also seen the popularity of stochastic blockmodels [11–16] in both non-overlapping and overlapping community detection. Stochastic blockmodels assume that vertices in a network are partitioned into blocks, and that the edge between two vertices is sampled from a distribution parameterized by the block indicators of the two vertices. Stochastic blockmodels are flexible and can not only handle community detection, but also group vertices with similar behaviors, that is, vertices from one block have similar connectivity patterns with vertices from another block. For the community detection problem, it implies dense intra-community connections and sparse inter-community connections.

Karrer and Newman [13] proposed a degree-corrected stochastic blockmodel which incorporates vertex degree into stochastic blockmodels and found that degree-corrected stochastic blockmodels perform better than standard stochastic blockmodels in networks with substantial heterogeneous degrees. Aicher et al [15, 17] proposed the weighted stochastic blockmodel to model weighted networks.

However, in community detection literature, there is little work on the nature of the label propagation algorithm and label propagation with node preference. Tibély et al [18] proposed that the objective function of the label propagation algorithm [3] is equivalent to a ferromagnetic Potts model. Barber and Clark [5] extended the objective function to a constrained one. For label propagation with node preference [6, 7], the definition of node preference is still heuristic. In this paper, we establish a connection between stochastic blockmodels and label propagation and some of its extensions, so that new node preference parameters are derived in a mathematically rigorous framework, namely, through maximum likelihood estimation.

We propose the Gaussian stochastic blockmodel with node preference (GSBM-P) for non-overlapping community detection. Our model can be viewed as a constrained version of a degree corrected version of the weighted stochastic blockmodel [17]. GSBM-P uses Gaussian distributions to model the weight of edges with the same global variance shared across all communities. It is assumed that the expected weight of edges between two different communities should be zero. GSBM-P assumes that the weight of an edge inside a community is determined by the node preference of its two vertices. The maximum likelihood estimation of GSBM-P turns out to have the same objective function as general label propagation with node preference. The vector whose entries are node preferences of vertices inside a community is proportional to the L₂-normalized principal eigenvector of the adjacency submatrix computed inside the community. Additionally, the norm of this vector turns out to be the square root of the principal eigenvalue of this adjacency submatrix. This node preference as a local metric (i.e. a value proportional to intra-community eigenvector centrality [19]) has never been proposed before. We show that the objective function of our model is equivalent to the sum of the squares of the principal eigenvalues of the adjacency matrices of all communities. Also, we show that the objective function of label propagation under constraint [5] is a special case of the objective function of the maximum likelihood estimation of a constrained version of our model.

A coordinate ascent algorithm is developed to optimize the objective function of our model. We experimented on two real-world networks and also synthetic networks, and illustrated that our method performs better than label propagation with leading eigenvector of the intra-community random walk matrix as node preference [7] and the variational Bayes method for weighted stochastic blockmodel [15] in community detection. Our method is also compared with the label propagation algorithm (LPA) [3] and order statistics local optimization method (OSLOM) [20]. In synthetic networks, our method achieves a performance superior or comparable to that of LPA. In unweighted synthetic networks, the performance of our method is close to that of OSLOM.

The rest of the paper is organized as follows: section 2 reviews related work; section 3 introduces our model and the maximum likelihood estimation of our model; section 4 presents our experiments and the last section concludes this paper.

2. Related work

2.1. Label propagation algorithm

The label propagation algorithm [3] seeks to discover communities by propagating vertices' labels. It iteratively updates a vertex's label by choosing the label possessed most by its neighbors. If we denote the label of vertex i as z_i, then the update step of z_i can be formulated as

$\begin{equation} z_i = \arg\max_z \sum_{j \neq i} W_{ij} \delta_{z_j z}, \end{equation} \tag{ 1 }$

where W is the adjacency matrix, and δ_ab is the Kronecker delta function, which equals 1 if a = b and 0 otherwise. When the iteration process terminates, all vertices with the same label are considered as a community.
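As a minimal sketch of the update in equation (1), the following Python function picks the label that carries the largest total edge weight among the neighbors of vertex i. The name `lpa_update`, the toy graph and the deterministic tie-breaking are illustrative assumptions; the original algorithm [3] breaks ties at random.

```python
from collections import defaultdict

def lpa_update(i, W, z):
    """Return the label with the largest total edge weight among i's neighbors."""
    score = defaultdict(float)
    for j, w in enumerate(W[i]):
        if j != i and w != 0:
            score[z[j]] += w
    if not score:                      # isolated vertex: keep the current label
        return z[i]
    # Assumption: ties go to the smallest label so that the example is deterministic.
    return min(score, key=lambda lab: (-score[lab], lab))

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
z = [0, 0, 2, 3, 3, 3]
new_label = lpa_update(2, W, z)   # vertex 2 has two neighbors labeled 0 and one labeled 3
```

Here vertex 2 adopts label 0, because two of its three neighbors carry that label.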

Tibély et al [18], and also Barber and Clark [5] showed that the objective function of label propagation algorithm can be formulated as

$\begin{equation} Q_{\rm{LPA}} = \sum_{i,j} W_{ij} \delta_{z_i z_j} \ = \sum_z \sum_{i,j} W_{ij} \delta_{z_i z} \delta_{z_j z} \ . \end{equation} \tag{ 2 }$

This objective function is the total weight of the intra-community edges. It has been shown that the label propagation algorithm aims to maximize Q_LPA through zero-temperature kinetics. It can be seen that this objective function encourages large communities. Thus, the label propagation algorithm sometimes reaches a trivial solution where all vertices are in one community and the objective function attains its maximum.
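The preference for large communities can be checked on a small example. The sketch below evaluates equation (2) for two partitions of a toy graph; the function name `q_lpa` and the graph are illustrative assumptions.

```python
def q_lpa(W, z):
    """Total weight of intra-community edges, summed over ordered pairs (i, j)."""
    n = len(W)
    return sum(W[i][j] for i in range(n) for j in range(n) if z[i] == z[j])

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_two = q_lpa(W, [0, 0, 0, 1, 1, 1])   # the two triangles as separate communities
q_one = q_lpa(W, [0, 0, 0, 0, 0, 0])   # everything in one community
```

On this graph `q_two` is 12 and `q_one` is 14: merging the two triangles gains the bridging edge and loses nothing, so the trivial partition scores highest.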

2.2. Label propagation with node preference

Leung et al [6] introduced hop attenuation and node preference to the label propagation algorithm. We leave out the introduction of hop attenuation regularization which prevents the formation of very large communities. They proposed a label update strategy

$\begin{equation} z_i = \arg\max_z \sum_{j \neq i} f_j^m W_{ij} \delta_{z_j z} \ , \end{equation} \tag{ 3 }$

where f_j is the propagation strength (a.k.a. node preference) of node j, and m is a parameter which controls the influence of f_j.

Šubelj and Bajec [7] proposed the defensive diffusion and attenuation label propagation algorithm to accurately detect community cores. They define the propagation strength $f_i^m$ of node i in community z_i as the probability that a random walker inside the community labeled z_i visits vertex i, i.e.

$\begin{equation} p_i^{z_i} = \sum_{j \in N(i)} \frac{p_j^{z_j}}{k_j^{z_i}} \delta_{z_i z_j} \ , \end{equation} \tag{ 4 }$

where N(i) denotes the set of vertices which are the neighbors of vertex i, and $k_j^{z_i}$ denotes the total weight of edges which connect vertex j and vertices from community z_i. Note that this definition of node preference is the leading eigenvector of the random walk matrix inside the community z_i and can be easily extended to weighted networks. In the remainder of this paper, we call this value the intra-community RandomWalk or local RandomWalk.
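A minimal sketch of equation (4) is to iterate it inside one community until the values settle. The function name `local_random_walk`, the weighting by W_ij (one possible weighted extension) and the toy community are assumptions made here for illustration. For an undirected, connected and non-bipartite community, a standard property of random walks is that the result is proportional to the intra-community degree.

```python
def local_random_walk(W, members, iters=200):
    """Iterate equation (4) inside one community; returns {vertex: visiting probability}."""
    # k[j]: total weight of the edges between j and vertices of the same community
    k = {j: sum(W[j][l] for l in members) for j in members}
    p = {i: 1.0 / len(members) for i in members}     # uniform start, sums to 1
    for _ in range(iters):
        p = {i: sum(W[i][j] * p[j] / k[j] for j in members if k[j] > 0)
             for i in members}
    return p

# Toy community (hypothetical): triangle 0-1-2 with a pendant vertex 3 attached to 2.
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
p = local_random_walk(W, members=[0, 1, 2, 3])
```

The intra-community degrees here are 2, 2, 3 and 1, so the values approach 0.25, 0.25, 0.375 and 0.125.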

Then, defensive diffusion applies preference to the core of each community,

$\begin{equation} z_i = \arg\max_z \sum_{j \neq i} p_j^{z_j} W_{ij} \delta_{z_j z} \ . \end{equation} \tag{ 5 }$

However, the label propagation with intra-community RandomWalk as node preference prefers small communities, and normally discovers many more communities than what is expected.

Generally, the objective function that label propagation with node preference tries to maximize can be formulated as

$\begin{equation} Q_{\rm LPA-P} = \sum_z \sum_{i,j} p_i p_j W_{ij} \delta_{z_i z} \delta_{z_j z} \ , \end{equation} \tag{ 6 }$

where p_i is the node preference associated with vertex i, and it can be node degree or other centrality not depending on the vertex's label.

We propose the Gaussian stochastic blockmodel with node preference (GSBM-P), whose maximum likelihood estimation results in the same objective function as general label propagation with node preference, where the node preference of a vertex is proportional to local eigenvector centrality computed inside its community rather than the local RandomWalk score. By contrast, label propagation with intra-community RandomWalk as node preference does not have this objective function, since the objective function may decrease once node preference is re-estimated. An alternative measure for eigenvector centrality [19] in directed networks is the authority score and hub score in HITS [21]. In the field of ontology learning, He et al [22] used a HITS-based community detection algorithm to produce a concept hierarchy.

In this paper, we focus on undirected networks, and the proportion of intra-community eigenvector centrality is derived from maximum likelihood estimation.

2.3. Weighted stochastic blockmodel

Aicher et al [15, 17] applied the stochastic blockmodel to weighted networks. For dense networks, they claimed that Gaussian distributions can be used to generate the weights of the edges between each pair of vertices. That is,

$\begin{equation} W_{ij} \sim \mathcal{N}(\mu_{z_i z_j}, \sigma_{z_i z_j}^2) \;. \end{equation} \tag{ 7 }$

For sparse networks, they claimed that absent edges are different from edges with zero weight. Therefore, they modelled the edge existence and edge weights separately. Exponential family distribution such as Gaussian distribution is adopted only for weighted edges. They further showed that the degree-corrected stochastic blockmodel [13] can be adopted to model the existence of edges in the following manner:

$\begin{eqnarray} &&\fl \log P(W|{\bf{z}},{\bf{p}},{\boldsymbol{\theta}},{\boldsymbol{\mu}},{\boldsymbol{\sigma}}) = \alpha \displaystyle\sum_{ij} \log \mathcal{P}({A_{ij}}|{p_i}{p_j}{\theta _{{z_i},{z_j}}})+ (1 - \alpha )\displaystyle\sum_{(i,j) \in E} \log \mathcal{N}({W_{ij}}|{\mu _{{z_i}{z_j}}},\sigma _{{z_i}{z_j}}^2),\nonumber\\ \end{eqnarray} \tag{ 8 }$

where $\mathcal{P}$ denotes the distribution for modeling edge existence, A_ij is 1 when vertices i and j are connected and 0 otherwise, (i, j) ∈ E denotes a pair of linked vertices, p_i is the parameter associated with vertex i and p_ip_jθ_{z_i, z_j} is the expectation of A_ij, and α is between 0 and 1. Note that the authors did not propose a degree-corrected version of weighted stochastic blockmodel for modeling edge weights.

However, the maximum likelihood estimation of $\sigma_{i,j}^2$ will be zero if the weights of all the edges between group i and group j are the same, which creates a degeneracy in the likelihood calculation. Therefore, a Bayesian approach is adopted. Prior distributions are introduced to all parameters and variable z as well.

We propose a Gaussian stochastic blockmodel that can be viewed as a constrained version of a degree-corrected version of the weighted stochastic blockmodel. It assumes blocks are assortative in the way that it explicitly assumes sparse inter-block connections, so that it is more suitable for community detection.

3. Model

In our model, namely Gaussian stochastic blockmodel with node preference (GSBM-P), it is assumed that the graph is generated by

Assigning block indicators to each vertex;
Then drawing edge weights from a Gaussian distribution for each edge (including non-existing edges with weight equal to zero) between each pair of vertices.

The Gaussian distribution for the weight of the edge between each pair of vertices is defined as follows:

$\begin{equation} W_{ij} \sim \mathcal{N}(p_i p_j \delta_{z_i z_j}, \sigma^2) \end{equation} \tag{ 9 }$

where p_i is the node preference of vertex i, z_i is the block indicator for vertex i, and σ² is the variance of Gaussian distributions.

This model assumes that the expected value of the weights of inter-community edges is 0; the expected value of the weights of the intra-community edges is a product of the node preference of the two vertices associated with the edges.
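The generative process can be sketched as follows. The function name `sample_gsbm_p` and the example parameters are hypothetical, and the sketch additionally assumes an undirected graph without self-loops, so each weight is drawn once per unordered pair and mirrored.

```python
import random

def sample_gsbm_p(z, p, sigma, seed=0):
    """Draw a symmetric weight matrix with W_ij ~ N(p_i * p_j * delta(z_i, z_j), sigma^2)."""
    rng = random.Random(seed)
    n = len(z)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):                 # assumption: no self-loops
            mean = p[i] * p[j] if z[i] == z[j] else 0.0
            W[i][j] = W[j][i] = rng.gauss(mean, sigma)
    return W

z = [0, 0, 0, 1, 1]                # block indicators (illustrative)
p = [1.0, 2.0, 1.5, 1.0, 3.0]      # node preferences (illustrative)
W = sample_gsbm_p(z, p, sigma=0.1)
```

With a small σ, intra-community weights cluster around p_i p_j while inter-community weights cluster around zero.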

Our model can be viewed as a constrained version of the degree corrected version of weighted stochastic blockmodel (WSBM) proposed by Aicher et al [17]. We extend the WSBM [17] to be degree corrected when explaining edge weights, rather than incorporate the degree corrected SBM [13] to explain edge existence, which is proposed in [15]. The degree correction should make our model fit better than WSBM in networks with substantial heterogeneous degrees (or sum of linked edge weights). Meanwhile, our model can be applied in assortative networks. By explicitly assuming the connection between blocks should be sparse, our model can discover community structures accurately.

3.1. Maximum likelihood estimation of GSBM-P

In this section, we adopt the maximum likelihood estimation to fit our model, and show its objective function is the same as general label propagation with node preference. Since our model does not introduce priors over block indicators z, we treat the block indicators as parameters, and use coordinate ascent method to estimate both the block indicators and node preference p. The adopted coordinate ascent method estimates a parameter to maximize the likelihood while fixing all other parameters.

The likelihood function of our model is

$\begin{equation} P(W|{\bf z},{\bf p},\sigma)=\prod_{i,j}\frac{1}{\sigma\sqrt{2\pi}} \exp{\frac{-\left( W_{ij}-p_i p_j \delta_{z_i z_j} \right)^2}{2\sigma^2}} \ , \end{equation} \tag{ 10 }$

where p_i is node preference of vertex i.

Maximizing the likelihood function is equivalent to minimizing σ for Gaussian models. Maximum likelihood estimation of σ² yields

$\begin{equation} \sigma^2=\frac{\sum_{i,j}\left(W_{ij}-p_i p_j \delta_{z_i z_j}\right)^2}{n^2} \ , \end{equation} \tag{ 11 }$

where n is the total number of vertices.

Minimizing σ² is equivalent to maximizing the following objective function:

$\begin{equation} \begin{array}{rcl} Q_{\rm GSBM-P} &=& \displaystyle 2\sum_{i,j} p_i p_j W_{ij} \delta_{z_i z_j} - \sum_{i,j} (p_i p_j \delta_{z_i z_j})^2 \\ &=& \displaystyle 2\sum_z \sum_{i,j} p_i p_j W_{ij} \delta_{z_i z} \delta_{z_j z} - \sum_z \sum_{i,j} (p_i p_j \delta_{z_i z} \delta_{z_j z})^2 \;. \end{array} \end{equation} \tag{ 12 }$
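The step from equation (11) to equation (12) rests on expanding the square, which gives n²σ² = Σ W_ij² − Q_GSBM-P, where the first term does not depend on z or p. The sketch below checks this identity numerically; the function names and the small example are illustrative assumptions.

```python
def sigma2_mle(W, z, p):
    """Equation (11): mean squared residual over all ordered pairs (i, j)."""
    n = len(W)
    return sum((W[i][j] - (p[i] * p[j] if z[i] == z[j] else 0.0)) ** 2
               for i in range(n) for j in range(n)) / n ** 2

def q_gsbm_p(W, z, p):
    """Equation (12): twice the cross term minus the squared-mean term."""
    n = len(W)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if z[i] == z[j]:
                q += 2 * p[i] * p[j] * W[i][j] - (p[i] * p[j]) ** 2
    return q

# Arbitrary illustrative values, not fitted parameters.
W = [[0, 2, 1], [2, 0, 0], [1, 0, 0]]
z = [0, 0, 1]
p = [1.0, 1.5, 0.5]
n = len(W)
sum_w2 = sum(w * w for row in W for w in row)
residual = n * n * sigma2_mle(W, z, p) - (sum_w2 - q_gsbm_p(W, z, p))
```

The value of `residual` is zero up to rounding, so for fixed W a smaller σ² corresponds exactly to a larger Q_GSBM-P.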

The maximum-likelihood estimated value of p_i can be expressed in the following iterative updating manner:

$\begin{equation} p_i^{(t+1)} =\frac{\sum_j p_j^{(t)} W_{ij}\delta_{z_j z_i}}{\sum_j \left( p_j^{(t)} \right)^2 \delta_{z_j z_i}} \;, \end{equation} \tag{ 13 }$

or in vector and matrix notation:

$\begin{equation} {\bf p}_z^{(t+1)} = \frac{{\bf W}_z {\bf p}_z^{(t)}}{\left\| {\bf p}_z^{(t)} \right\| ^2} \;, \end{equation} \tag{ 14 }$

where p_z is a vector whose entries consist of the node preferences of the vertices inside community z; W_z is the adjacency matrix of community z, that is, W_z describes how vertices inside community z are connected to each other while it ignores the links outside community z.

Vector p_z converges to the principal eigenvector of W_z. It can be shown that the norm of p_z is the square root of the principal eigenvalue of W_z, denoted as $\sqrt{\lambda_z}$ . Thus, these node preferences are the product of $\sqrt{\lambda_z}$ and the L₂-normalized local eigenvector centrality computed inside the community z.
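The fixed point described above can be sketched numerically. The code uses ordinary normalized power iteration (a substitution made here for numerical convenience, not necessarily the authors' procedure) to obtain λ_z and the unit eigenvector, rescales the eigenvector to norm $\sqrt{\lambda_z}$, and then checks that the result satisfies equation (14). The function names and the toy community are hypothetical, and the sketch assumes non-negative weights and a connected, non-bipartite community so that power iteration converges.

```python
import math

def principal_pair(Wz, iters=500):
    """Normalized power iteration: returns (lambda_z, unit principal eigenvector)."""
    n = len(Wz)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(Wz[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))    # ||W v|| approaches lambda_z
        if lam == 0.0:                            # e.g. a singleton community
            return 0.0, v
        v = [x / lam for x in w]
    return lam, v

def node_preference(Wz):
    """Node preferences: sqrt(lambda_z) times the L2-normalized eigenvector."""
    lam, v = principal_pair(Wz)
    return [math.sqrt(lam) * x for x in v], lam

# Toy community (hypothetical): triangle 0-1-2 with a pendant vertex 3 attached to 2.
Wz = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
p, lam = node_preference(Wz)
norm2 = sum(x * x for x in p)
# One application of equation (14) should leave p unchanged.
next_p = [sum(Wz[i][j] * p[j] for j in range(len(p))) / norm2 for i in range(len(p))]
```

In this example `norm2` agrees with `lam` and `next_p` agrees with `p` up to rounding, which is the fixed-point condition W_z p_z = ‖p_z‖² p_z.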

Noticing that for a specific community z,

$\begin{equation} \sum_{i,j} p_i p_j W_{ij}\delta_{z_i z}\delta_{z_j z}={\bf p}_z^{\rm T} {\bf W}_z {\bf p}_z = \lambda_z^2 \;, \end{equation} \tag{ 15 }$

$\begin{equation} \sum_{i,j} (p_i p_j \delta_{z_i z}\delta_{z_j z})^2 = \lambda_z^2 \;, \end{equation} \tag{ 16 }$

the aforementioned objective function (equation (12)) can be expressed as follows:

$\begin{equation} Q_{\rm GSBM-P} = \sum_z \lambda_z^2 \;. \end{equation} \tag{ 17 }$

Therefore, the maximum likelihood estimation of GSBM-P is equivalent to maximizing the sum of the squares of the principal eigenvalues of the adjacency matrices of all communities. Intuitively, a community z with higher λ_z is better intra-connected. Thus, the maximum likelihood estimation of GSBM-P aims to find communities that are densely intra-connected.
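Equation (17) gives a direct way to score a partition. The sketch below computes the principal eigenvalue of each community's adjacency submatrix by power iteration and sums the squares; the function names and the toy graph are assumptions, and the iteration presumes non-negative weights and a dominant principal eigenvalue in each community.

```python
import math

def principal_eigenvalue(Wz, iters=500):
    """Power iteration estimate of the principal eigenvalue of a symmetric matrix."""
    n = len(Wz)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(Wz[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:                            # community without internal edges
            return 0.0
        v = [x / lam for x in w]
    return lam

def q_spectral(W, z):
    """Equation (17): sum over communities of the squared principal eigenvalue."""
    total = 0.0
    for label in set(z):
        members = [i for i, zi in enumerate(z) if zi == label]
        Wz = [[W[i][j] for j in members] for i in members]
        total += principal_eigenvalue(Wz) ** 2
    return total

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_two = q_spectral(W, [0, 0, 0, 1, 1, 1])
q_one = q_spectral(W, [0, 0, 0, 0, 0, 0])
```

Each triangle has principal eigenvalue 2, so `q_two` is 8, while the whole graph has principal eigenvalue 1 + √2 and `q_one` is about 5.83. On this example the two-triangle partition scores higher than the single-community partition.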

3.1.1. Relationship with label propagation algorithm with node preference.

Rewriting the objective function according to equations (15) and (16) yields

$\begin{equation} Q_{\rm GSBM-P} = \sum_z \sum_{i,j} p_i p_j W_{ij} \delta_{z_i z} \delta_{z_j z} \ , \end{equation} \tag{ 18 }$

which is the same as the objective function of general label propagation with node preference (i.e. equation (6)).

We describe the adopted coordinate ascent method here in detail. This method optimizes the objective function in a similar way to the label propagation algorithm. The coordinate ascent method aims to maximize an objective function f(X), where X is the collection of all parameters. The coordinate ascent method adjusts the value of one parameter to maximize objective function while fixing all other parameters. Different parameters are adjusted cyclically. It is guaranteed that the objective function is non-decreasing during this process and the method converges to a local optimum where the objective function no longer increases.

To optimize the objective function of our model, our coordinate ascent method treats both the block indicators (i.e. labels of vertices) and node preference p as parameters. To adjust node preference, we prefer to adjust the node preference p_z of all vertices inside one community z, by approximating the product of $\sqrt{\lambda_z}$ and the L₂-normalized local eigenvector centrality computed inside the community z. To adjust label of one vertex given all other vertices' labels and p, the coordinate ascent method updates the label of vertex i as follows:

$\begin{equation} z_i = \arg\max_z \sum_{j \neq i} p_j W_{ij} \delta_{z_j z} \ , \end{equation} \tag{ 19 }$

which is exactly the update rule of label propagation with node preference without considering the hop attenuation regularization. Note that assigning a new label for vertex i will not decrease the objective function given all other vertices' labels and p. The difference between our method and general label propagation with node preference is that the node preference in equation (13) is derived in a statistically grounded way rather than heuristically defined. The update rule has a clear interpretation. Each vertex has a similarity with each community, which is the weighted sum of the weights of its edges to this community, with more representative vertices (those with higher local eigenvector centrality) in the community having greater weight. Then, each vertex joins the most similar community. In this sense, the node preferences of vertices inside the two affected communities should be updated immediately when a vertex joins a new community, though this is not required by the coordinate ascent method.
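The label update of equation (19) can be sketched in a few lines. The function name `lpa_p_update`, the toy graph and the preference values are illustrative assumptions (the values of p are chosen by hand, not fitted with equation (13)), and ties are broken deterministically for reproducibility.

```python
from collections import defaultdict

def lpa_p_update(i, W, z, p):
    """Return the label maximizing the preference-weighted edge weight to vertex i."""
    score = defaultdict(float)
    for j, w in enumerate(W[i]):
        if j != i and w != 0:
            score[z[j]] += p[j] * w
    if not score:                      # isolated vertex: keep the current label
        return z[i]
    # Assumption: ties go to the smallest label.
    return min(score, key=lambda lab: (-score[lab], lab))

# Vertex 0 is linked to vertices 1 and 2 (community "a") and to vertex 3 (community "b").
W = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
z = ["a", "a", "a", "b"]
p = [0.5, 0.3, 0.3, 0.9]                               # hand-picked preferences
with_preference = lpa_p_update(0, W, z, p)
without_preference = lpa_p_update(0, W, z, [1.0] * 4)  # reduces to equation (1)
```

With these values vertex 0 joins community "b", whose single member has a high preference, whereas with uniform preferences it stays with the majority of its neighbors in "a".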

The objective function in equation (18) is non-decreasing during the process of estimating labels and node preference of vertices. The coordinate ascent method is terminated when no vertex changes label or the objective function no longer increases after several iterations. The number of communities is determined as the number of different labels in the converged state.

3.2. A constrained version of GSBM-P

In this section, we examine a constrained version of our model and show its relation to label propagation under constraint. The constrained version assumes that each vertex i inside the community z has the same node preference $\sqrt {\mu_z}$ , and hence is not degree corrected. That is,

$\begin{equation} W_{ij} \sim \mathcal{N}(\mu_{z_i} \delta_{z_i z_j}, \sigma^2) \;, \end{equation} \tag{ 20 }$

where the expected edge weight between vertex i and j is $\mu_{z_i}\delta_{z_i z_j}$ or equivalently $\sqrt {\mu_{z_i}}\sqrt {\mu_{z_j}}\delta_{z_i z_j}$ . $\mu_{z_i}$ is a parameter associated with each community. This simplified model assumes that the expected value of the weights of the inter-community edges is 0; the expected value of the weights of the edges inside community z is a uniform value μ_z.

The likelihood function of a graph under this constrained model is

$\begin{equation} P(W|{\bf z}, {\boldsymbol{\mu}} ,\sigma ) = \prod\limits_{i,j} {\frac{1}{{\sigma \sqrt {2\pi}}}} \exp \frac{{- {{\left( {{W_{ij}} - {\mu _{{z_i}}}\delta_{z_i z_j}} \right)}^2}}}{{2{\sigma ^2}}}\;. \end{equation} \tag{ 21 }$

Maximizing the likelihood function is equivalent to minimizing σ for Gaussian models. The maximum likelihood estimation of σ² can be expressed as

$\begin{eqnarray} \begin{array}{rcl} \sigma^2 &=& \displaystyle \frac{\sum_{i,j}(W_{ij} - \mu_{z_i}\delta_{z_i z_j})^2}{n^2} \\ &=& \displaystyle \frac{-2 \sum_{i,j} \mu_{z_i} W_{ij} \delta_{z_i z_j} + \sum_{i,j} (\mu_{z_i}\delta_{z_i z_j})^2}{n^2} + {\rm const} \;. \end{array} \end{eqnarray} \tag{ 22 }$

Hence, minimizing σ² is equivalent to maximizing the following objective function:

$\begin{eqnarray} \begin{array}{rcl} Q &=& \displaystyle 2\sum_{i,j} \mu_{z_i} W_{ij}\delta_{z_i z_j} - \sum_{i,j} (\mu_{z_i}\delta_{z_i z_j})^2 \\ &=& \displaystyle 2 \sum_z \sum_{i,j} \mu_z W_{ij} \delta_{z_i z} \delta_{z_j z} -\sum_z \mu_z^2 n_z^2 \ , \end{array} \end{eqnarray} \tag{ 23 }$

where n_z denotes the number of vertices inside community z.

In this constrained version of our model, if we specify μ_z = μ for all communities, then this simplified quality function can be viewed as the objective function of the label propagation algorithm under constraint, since

$\begin{eqnarray} \begin{array}{rcl} Q &=& \displaystyle \mu \left( 2 \sum_z \sum_{i,j} W_{ij} \delta_{z_i z} \delta_{z_j z} - \mu \sum_z n_z^2 \right) \\ &=& \displaystyle \mu \left( 2 Q_{\rm{LPA}} - \mu \sum_z n_z^2 \right) \ , \end{array} \end{eqnarray} \tag{ 24 }$

where $\sum_z n_z^2$ is the penalty term, which is maximized when all vertices are in a single community, and thus discourages large communities from growing. μ can be viewed as a resolution parameter, which controls the community size.
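The role of μ as a resolution parameter can be illustrated by evaluating equation (24) for two partitions at two values of μ. The function name `q_constrained`, the toy graph and the chosen values of μ are assumptions made for this sketch.

```python
from collections import Counter

def q_constrained(W, z, mu):
    """Equation (24): mu * (2 * Q_LPA - mu * sum_z n_z^2)."""
    n = len(W)
    q_lpa = sum(W[i][j] for i in range(n) for j in range(n) if z[i] == z[j])
    penalty = sum(size ** 2 for size in Counter(z).values())
    return mu * (2 * q_lpa - mu * penalty)

# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
two = [0, 0, 0, 1, 1, 1]
one = [0, 0, 0, 0, 0, 0]
scores = {mu: (q_constrained(W, two, mu), q_constrained(W, one, mu))
          for mu in (0.5, 0.05)}
```

At μ = 0.5 the two-community partition scores 7.5 against 5.0 for the single community, while at μ = 0.05 the order reverses (1.155 against 1.31), so a larger μ favors smaller communities on this graph.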

In comparison with the first model of the label propagation algorithm under constraint [5] (a.k.a. constant Potts model [23]), leaving μ tunable makes the two objective functions equivalent. Therefore, the maximum likelihood estimation of the block indicators in the constrained version of our model while leaving μ_z tunable results in an objective function which is a generalization of the objective function of the label propagation algorithm proposed by Barber and Clark [5] with a penalty term $\frac{1}{2} \sum_z n_z^2$ .

The constrained version of our model has its disadvantages, especially that the expected weights of edges inside each community are equal. It may not fit real networks well. By incorporating node preference, the expected weight of edges may vary according to the node preference assigned to each vertex. This renders our model more expressive and more robust in modelling complex real-world networks. We show in the next section that our proposed Gaussian stochastic blockmodel with node preference performs well.

4. Experiments

In this section, the coordinate ascent method for GSBM-P is tested on both real-world networks and synthetic networks. We also compare it with the label propagation algorithm (LPA), the label propagation algorithm with intra-community RandomWalk as the node preference (LPA-P) and OSLOM [20] on various synthetic networks, and with the variational Bayes method for the weighted stochastic blockmodel (WSBM) on weighted synthetic networks.

Two real-world networks, namely the karate club [24] and political blogs [25], are chosen as test data. For synthetic networks, unweighted benchmark networks proposed by Lancichinetti et al [26] and weighted benchmark networks proposed by Lancichinetti and Fortunato [27] are chosen. Erdös–Rényi random graphs [28] are chosen for checking overfitting. A Gaussian distribution is not well suited to fitting binary data (unweighted networks). However, in the literature, label propagation algorithms are often applied to unweighted networks. Because our method shares its objective function with label propagation with node preference and is optimized in a similar way, we also test our coordinate ascent method for GSBM-P on unweighted networks and show its good performance.

Normalized mutual information (NMI, a.k.a. symmetric uncertainty, introduced by Witten and Frank [29]) is often adopted to reflect the similarity between the obtained partition and the planted partition. However, as is shown in [30], NMI sometimes has systematic bias in finite-size networks. This phenomenon is also observed in our experiments on both synthetic unweighted networks with no intra-community edge in planted partitions and synthetic weighted networks with total intra-community edge weights close to zero in planted partitions. Algorithms that are designed only to find densely intra-connected communities cannot find similar partitions to the planted partitions on these networks. However, for LPA-P and the coordinate ascent method for GSBM-P, the values of NMI are much larger than 0 between obtained partitions and the planted partitions on these networks (the presentation of results is omitted).

Zhang [30] proposed relative normalized mutual information (rNMI) to fix this systematic bias. rNMI is defined as follows:

$\begin{equation} {\rm rNMI}\left(A,B\right)={\rm NMI}\left(A,B\right)-\langle {\rm NMI}\left(A,C\right) \rangle \ , \end{equation} \tag{ 25 }$

where A is the planted partition, B is the partition obtained by the algorithm, C is a random partition with the same group-size distribution as partition B, and 〈NMI(A, C)〉 is the expected NMI(A, C) over different realizations of random partition C. In our experiments, we observe that the value of rNMI is slightly smaller than zero for LPA-P and the coordinate ascent method for GSBM-P on these benchmark networks with no intra-community edge or with total intra-community edge weights close to zero in planted partitions.

For synthetic networks, to reflect the similarity between the obtained partition B and the planted partition A, we will use the ratio of relative normalized mutual information (rrNMI) as defined below:

$\begin{equation} {\rm rrNMI}\left(A,B\right)=\frac{{\rm rNMI}\left(A,B\right)}{{\rm rNMI}\left(A,A\right)} , \end{equation} \tag{ 26 }$

such that it is bounded above by 1 and equals 1 when partition B is identical to partition A. Note that rNMI(A, A) is not zero for all the planted partitions in our experiment, so this equation does not suffer from a divide-by-zero problem. The expectation in rNMI is estimated over 100 realizations of random partition C.
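Equations (25) and (26) can be sketched as follows. The function names are hypothetical, NMI is taken as the symmetric uncertainty 2I(A;B)/(H(A)+H(B)), and the random partition C is produced by shuffling the labels of B, which is one simple way to preserve its group sizes. A fixed seed is an assumption added here so that the estimate is reproducible.

```python
import math
import random
from collections import Counter

def nmi(a, b):
    """Symmetric uncertainty 2 I(A;B) / (H(A) + H(B)) between two label lists."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    entropy = lambda c: -sum(v / n * math.log(v / n) for v in c.values())
    mi = sum(v / n * math.log(v * n / (ca[x] * cb[y])) for (x, y), v in cab.items())
    denom = entropy(ca) + entropy(cb)
    return 2.0 * mi / denom if denom > 0 else 1.0   # assumption: two trivial partitions agree

def rnmi(a, b, realizations=100, seed=0):
    """Equation (25): NMI(A, B) minus the mean NMI(A, C) over shuffled copies C of B."""
    rng = random.Random(seed)
    c = list(b)
    total = 0.0
    for _ in range(realizations):
        rng.shuffle(c)                              # same group sizes as B
        total += nmi(a, c)
    return nmi(a, b) - total / realizations

def rrnmi(a, b, realizations=100, seed=0):
    """Equation (26): rNMI(A, B) normalized by rNMI(A, A)."""
    return rnmi(a, b, realizations, seed) / rnmi(a, a, realizations, seed)

planted = [0] * 10 + [1] * 10
unrelated = [0, 1] * 10            # statistically independent of the planted partition
score_same = rrnmi(planted, planted)
score_unrelated = rrnmi(planted, unrelated)
```

In this example `score_same` equals 1, and `score_unrelated` is not positive because the unrelated partition carries no information about the planted one.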

OSLOM [20] may produce overlapping communities. Both NMI and rrNMI cannot be directly applied. An adjusted NMI [31] is used to measure the similarity between the obtained covering by OSLOM and the planted partition. The adjusted NMI equals 1 when obtained covering and planted partition are identical.

In all of the following experiments, LPA-P is terminated after 50 iterations. The coordinate ascent method for GSBM-P starts from the initialization where each vertex has its unique label. The iterative order of the vertices for the coordinate ascent method is random, and the node preferences inside two communities are updated immediately when a vertex moves from one community to another. Our method is terminated when no vertex changes label or the objective function no longer increases. We observe that the actual number of iterations of our method is generally less than 10 in tested networks. The time complexity of our method should be $\mathcal{O}((d+td_1C)n)$ in each iteration, where d denotes the average degree, t denotes the average number of iterations for computing intra-community eigenvector centrality, d₁ denotes the average intra-community degree, and C denotes the average size of communities. In very large networks, the node preferences can instead be updated once per full traversal of the vertices to save time. Then, the time complexity should be $\mathcal{O}(dn+td_1n)$ in each iteration, which is scalable for very large networks.

4.1. Empirical networks

The karate club data set [24] is a social network of friendships between 34 members of a karate club at a US university in the 1970s. The network has 34 nodes and 78 edges with weight 1. In reality, the karate club eventually split into two groups. The true partition of this network is known.

The coordinate ascent method for GSBM-P is run several times and the partition with the highest quality function value is chosen.

Figure 1 demonstrates the result of our method in the karate club network. The result of our method is a more fine-grained partition of the true partition. In fact, this result has a slightly higher quality function value than the true partition. Note that the modularity maximization method in [32] and the non-parametric Bayesian mixed membership stochastic blockmodel [33] favor the partition with four communities. The true partition is a local optimum which our method frequently reached.

We notice that the vertex 10 is misidentified by the degree corrected stochastic blockmodel in [13] and modularity maximization method in [32]. We verify that the objective function value (i.e. equation (17) or equation (18)) of the true partition is higher than that of the partition with misidentification of vertex 10. Thus GSBM-P performs better in the karate club network.

For GSBM-P, we observed that vertices with higher degree generally have higher node preference inside one community, and in the karate club network, pairs of vertices with higher degree inside one community are more likely to be connected. This agrees with the model, which expects an edge joining a pair of vertices with higher node preference to carry a larger weight.

We then show how our method performs in a larger network. The political blogs data set [25] is a network of directed hyperlinks between political blogs whose largest connected component contains 1222 nodes. In this paper, we use the undirected form. Figure 2 shows the result of our method in the largest connected component of the political blog network.

**Figure 2.** Partition of *political blogs* network by GSBM-P. The size of each vertex is proportional to its degree and the color reflects the group membership.

GSBM-P discovers nine communities: seven tiny communities and two major communities that roughly correspond to the two political tendencies. The normalized mutual information of the obtained partition is 0.678. If we merge the small communities into one of the two major communities, the normalized mutual information increases to 0.724. The performance is very close to that of the degree corrected stochastic blockmodel [13] with a given number of clusters.

4.2. Synthetic networks

4.2.1. Unweighted synthetic networks.

Since our model treats the block indicators as parameters, the maximum likelihood estimates of the block indicators may be biased, especially for vertices with very small degree. To check for overfitting, the coordinate ascent method for GSBM-P is tested on Erdős–Rényi random graphs [28], in which no community structure exists. Analogous to [34], the network size is fixed at 1000 nodes, and the average degree of the random graphs ranges from 10 to 100. Our method is run 10 times on each random graph and the result with the highest objective function value is chosen. The number of discovered communities in each random graph is shown in figure 3(a). It shows that the coordinate ascent method does not overfit when the average degree is not too small. In Erdős–Rényi random graphs with 1000 nodes, our method begins to identify the whole network as one community when the average degree is 40, though it still fails to do so in some of the generated graphs with average degree 40.
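For readers who want to reproduce the null test, a random graph with a prescribed average degree can be sketched as follows. This assumes the G(n, p) variant with p = average degree / (n - 1); the text does not say which Erdős–Rényi variant was used, and the function name is hypothetical.

```python
import numpy as np


def erdos_renyi_adjacency(n, avg_degree, seed=None):
    """Dense 0/1 adjacency matrix of an undirected G(n, p) random graph."""
    rng = np.random.default_rng(seed)
    p = avg_degree / (n - 1)  # each vertex has n - 1 potential neighbours
    # Sample each unordered pair once (strict upper triangle), then mirror.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(int)


# One graph per average degree, in the range used in the test above.
graphs = {k: erdos_renyi_adjacency(1000, k, seed=k) for k in range(10, 101, 10)}
```

Any community detection routine can then be run on each matrix; a method that does not overfit should return a single community.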

We also show in figure 3(b) how the number of communities found by our method varies with the size of Erdős–Rényi random graphs. As the graph size increases, our method requires a larger average degree to find only one community. This implies that our method may overfit in networks with large communities when the intra-community degree of vertices remains the same, in that it may split a planted community into several communities. To deal with this situation, one can evaluate the significance of the obtained communities and merge the less significant communities into other communities, which decreases the number of communities. One property shared by our method and label propagation algorithms is that the number of communities does not increase over the iterations. Thus, it is reasonable to apply our method again after the merging procedure.

Our method is also tested on resolution limit test benchmark networks [35]. Analogous to [36], the networks are composed of cliques with 4 vertices, and each clique is linked to the next clique by one edge so that the cliques form a ring. Both our method and the label propagation algorithm [3] are run 10 times on each network, and the average number of discovered communities is shown in figure 4. Our method discovers the correct communities in every run. Hence, in figure 4, the red line covers the green dashed line, which represents the number of planted communities. Thus, our method tends not to suffer from the resolution limit. However, the blue error bars in figure 4 imply that LPA is not stable. The average number of communities discovered by LPA is less than the planted number, because LPA sometimes identifies several cliques as one community.
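The ring-of-cliques construction described above can be sketched as follows. Which vertex of each clique carries the connecting edge is not specified in the text; linking the last vertex of one clique to the first vertex of the next is an assumption, and the function name is hypothetical.

```python
import numpy as np


def ring_of_cliques(num_cliques, clique_size=4):
    """Return (adjacency matrix, planted labels) for a ring of cliques."""
    n = num_cliques * clique_size
    adj = np.zeros((n, n), dtype=int)
    labels = np.repeat(np.arange(num_cliques), clique_size)

    # Fully connect the vertices inside each clique.
    for c in range(num_cliques):
        block = slice(c * clique_size, (c + 1) * clique_size)
        adj[block, block] = 1
    np.fill_diagonal(adj, 0)  # no self-loops

    # Link each clique to the next one with a single edge, closing the ring.
    for c in range(num_cliques):
        u = c * clique_size + clique_size - 1       # last vertex of clique c
        v = ((c + 1) % num_cliques) * clique_size   # first vertex of the next
        adj[u, v] = adj[v, u] = 1
    return adj, labels


# Smallest network used in figure 4: 8 cliques of 4 vertices.
adj, planted = ring_of_cliques(8)
```

With 8 cliques of 4 vertices this gives 8 × 6 intra-clique edges plus 8 ring edges.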

**Figure 4.** Resolution limit test on benchmark networks [35]. The networks are composed of cliques with 4 vertices, and each clique is linked to the next clique by one edge so that the cliques form a ring. We show the number of discovered communities in networks composed of different numbers of cliques, varying from 8 to 24. Each data point is an average over 10 runs.

We then test the algorithms in the unweighted benchmark networks proposed by Lancichinetti et al [26]. Experiments show that the coordinate ascent method for GSBM-P is superior to LPA in those networks, and achieves a similar performance to OSLOM.

Several parameters are used to generate the benchmark networks, including the number of vertices n, the average degree, the maximum degree, the degree distribution, the community size distribution, the range of community size C, and the fraction of a vertex's edges that link to vertices outside its community (i.e. the topological mixing parameter μ_t). When the mixing parameter is smaller than 0.5, the communities in the generated networks are strong communities [37], in which each vertex has more connections with vertices inside its community than with vertices outside it. When the topological mixing parameter is μ_t = 1, the communities of the planted partition have no intra-community edges.
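The relation between the mixing parameter and strong communities can be made concrete with a small sketch. It computes, for each vertex, the fraction of its edges (or of its edge weight, if the matrix is weighted) that leave its community, and checks whether every vertex stays below 0.5. The function names are hypothetical, and isolated vertices are given a mixing value of 0 by assumption.

```python
import numpy as np


def vertex_mixing(adj, labels):
    """Fraction of each vertex's edges that leave its community."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    degree = adj.sum(axis=1)
    external = (adj * ~same).sum(axis=1)
    # Isolated vertices get 0 by convention (an assumption of this sketch).
    return np.divide(external, degree, out=np.zeros_like(degree),
                     where=degree > 0)


def is_strong_partition(adj, labels):
    """True if every vertex has more internal than external connections."""
    return bool(np.all(vertex_mixing(adj, labels) < 0.5))


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

mu = vertex_mixing(adj, [0, 0, 0, 1, 1, 1])
```

In this example only vertices 2 and 3 have an external edge, each with a mixing value of 1/3, so the two triangles form strong communities.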

We use two parameter settings with different ranges of community size, following [34], and treat the mixing parameter μ_t as the independent variable. We compare GSBM-P with LPA-P and LPA. The ratio of relative normalized mutual information (equation (26)) between the obtained partition and the planted partition is calculated. For OSLOM, adjusted NMI [31] is calculated. Figure 5 shows how the performance of these algorithms varies with the mixing parameter in the two kinds of networks.

**Figure 5.** Performance with respect to the mixing parameter μ_t for different algorithms on two unweighted test benchmarks [26]. We use the ratio of relative normalized mutual information (equation (26)) for LPA, LPA-P and GSBM-P. We use adjusted NMI [31] for OSLOM. Each data point is an average over 30 networks generated with the same parameters. In both subfigures, the size of the networks is 1000, the average degree is 20, and the maximum degree is 50. In (a), the range of community size is C = [10, 50]. In (b), the range of community size is C = [20, 100].

From figures 5(a) and (b), it can be seen that the planted partition is an optimum of our method when the mixing parameter μ_t is small. In that regime, the coordinate ascent method for GSBM-P converges to this optimum in every run. Our method outperforms LPA. The label propagation approach applied in LPA is very suitable for finding strong communities. However, when the mixing parameter is larger than 0.5 (i.e. weak communities), LPA fails. The performance of LPA-P is unsatisfactory.

We cannot directly compare the performance of our coordinate ascent method for GSBM-P with that of OSLOM, since their performances are evaluated by different measures. However, both measures equal one when the obtained partition or covering is identical to the planted partition. Figure 5(a) shows that our coordinate ascent method recovers the planted partition over a similar range of topological mixing parameters to OSLOM in networks with small communities. Figure 5(b) shows that the performance of our coordinate ascent method is close to that of OSLOM in networks with large communities.

We then show the corresponding average number of communities found by the different algorithms in figure 6. The average number of planted communities is also shown. LPA discovers one community when the topological mixing parameter μ_t is at least 0.6 in benchmark networks with small communities, and at least 0.5 in benchmark networks with large communities. In benchmark networks with small communities, our coordinate ascent method for GSBM-P discovers a reasonable number of communities over a similar range of topological mixing parameters to OSLOM. However, in benchmark networks with large communities, our method discovers more communities when μ_t is at least 0.55. As discussed above, introducing the merging procedure may improve the performance of our method.

We then show the running time of the algorithms in figure 7. In figure 7, GSBM-P(alter) represents the alternative coordinate ascent method for GSBM-P, which updates node preferences only once every time all vertices have been traversed. The mixing parameter μ_t is set to 0.1, while parameters other than the number of vertices and community size are set according to [34]. In figure 7(a), the community sizes are fixed to range from 20 to 100, and the number of vertices varies among the test networks. In figure 7(b), the network size is fixed at 3000 and the community size varies. The community size on the x axis represents the average value; for instance, 390 on the x axis means that the community sizes range from 370 to 410, and so forth.

It shows that the running time of our method is linear in network size and community size. For very large networks with large communities, the alternative coordinate ascent method can be applied.

4.2.2. Weighted synthetic networks.

We then test the algorithms in the weighted benchmark networks proposed by Lancichinetti and Fortunato [27]. The weights of edges are non-negative in these benchmark networks.

Weighted benchmark networks have two more parameters than unweighted benchmark networks: β, which controls the power-law relation between the sum of the weights of a vertex's edges and the vertex's degree, and the fraction of a vertex's total edge weight that links to vertices outside its community (i.e. the mixing parameter μ_w).

We compare our method with LPA and the variational Bayes method for WSBM. We set the parameter α in WSBM (see equation (8)) to 0.0 and 0.5, corresponding to pure WSBM and WSBM with auxiliary degree correction respectively. The number of planted communities is given to WSBM in all networks. We use two parameter settings with different topological mixing parameters and treat the mixing parameter μ_w as the independent variable. The network size is fixed at 150, the average degree is 15, the maximum degree is 30, and all other parameters are identical to those in [34]. The results are illustrated in figure 8. For algorithms that have an objective function, each point in figure 8 shows the average rrNMI, over 30 networks, of the result with the highest objective function value among 10 runs on each network.

**Figure 8.** Performance with respect to the mixing parameter μ_w for different algorithms on two small weighted test benchmarks [27]. We use the ratio of relative normalized mutual information (equation (26)) for WSBM, LPA and GSBM-P. We use adjusted NMI [31] for OSLOM. Each data point is an average over 30 networks generated by the same parameters. In (a) and (b), the size of networks is 150, the average degree is 15, the maximum degree is 30, and the range of community size is C = [10, 20]. In (a), the topological mixing parameter μ_t is 0.5. In (b), the topological mixing parameter μ_t is 0.8.

When μ_w is 1.0, communities in the planted partition have total intra-community edge weights close to zero. Both LPA and the coordinate ascent method for GSBM-P fail to obtain partitions similar to the planted partition. WSBM can group vertices with similar connectivity patterns, and does discover some communities with total intra-community edge weights close to zero. Thus, rrNMI is much larger than 0 for WSBM when μ_w is 1.0.

The variational Bayes algorithm for WSBM is not efficient in practice. For example, it often reaches a poor local optimum. Only sometimes does it reach a better result with higher marginal likelihood, which may not happen within 10 runs. Sometimes the variational Bayes algorithm does not even converge within 200 iterations. In figure 8, WSBM fails to give a good partition even when the mixing parameter μ_w is small, which may be due to the lack of degree correction for modeling edge weights. Moreover, the time complexity of each iteration of the naive variational Bayes algorithm for WSBM is $\mathcal{O}(mK^2)$, where m denotes the total number of edges in the network and K denotes the number of communities, which limits its application to very large networks. The coordinate ascent method for GSBM-P performs much better, and is comparable to LPA. In figure 8(a), when the mixing parameter μ_w is small, it recovers the planted partition. OSLOM is sensitive to the topological mixing parameter μ_t. In figure 8(b), the performance of OSLOM is unsatisfactory, and OSLOM identifies each vertex as a community when μ_w is larger than 0.35. In figure 8(a), OSLOM returns the planted communities in some of the tested benchmark networks with small μ_w, but different communities in others.

We next show the performance of our method on weighted networks with 1000 vertices. We use four parameter settings with different ranges of community size and different topological mixing parameters, following [34], and treat the mixing parameter μ_w as the independent variable. The results are illustrated in figure 9. Here we omit WSBM, because it takes too much time to run on these networks.

In all four groups of weighted benchmark networks, the coordinate ascent method for GSBM-P either recovers the planted partition (see figure 9(a)) or achieves results whose rrNMI is only slightly lower than that of LPA (see figures 9(b)–(d)). When the mixing parameter μ_w is slightly larger, our method outperforms LPA. In figures 9(c) and (d), LPA discovers many more communities than the planted partition even when the mixing parameter μ_w is small. This implies that the communities are not well connected internally. The performance of our method is again comparable to that of LPA on those benchmark networks, while that of LPA-P is unsatisfactory. OSLOM performs very well in benchmark networks when μ_t = 0.5, but again fails when μ_t = 0.8.

5. Conclusions and future work

In this paper, we proposed the Gaussian stochastic blockmodel with node preference (GSBM-P). The maximum likelihood estimation of GSBM-P is proved to be equivalent to maximizing the objective function below:

$\begin{equation*}Q_{\rm GSBM-P} = \sum_z \lambda_z ^2 \;, \end{equation*}$

where λ_z is the principal eigenvalue of the adjacency submatrix in community z.

We then proved that our coordinate ascent method optimizes our objective function in a similar way to label propagation with node preference. We demonstrated that the vector composed of the node preferences of vertices inside community z is the product of $\sqrt{\lambda_z}$ and the intra-community eigenvector centrality, i.e. the L₂-normalized dominant eigenvector of the adjacency matrix inside the community.
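The objective and the node preferences stated above can be evaluated numerically for a given partition. The following is a minimal sketch, assuming a symmetric, non-negative, dense adjacency matrix; it only evaluates the objective for fixed labels and is not the coordinate ascent method itself. The function name is hypothetical.

```python
import numpy as np


def gsbm_p_objective(adj, labels):
    """Return (Q, preferences) with Q = sum over communities of lambda_z^2."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    q = 0.0
    preferences = np.zeros(adj.shape[0])
    for z in np.unique(labels):
        idx = np.flatnonzero(labels == z)
        sub = adj[np.ix_(idx, idx)]  # adjacency submatrix of community z
        # eigh returns eigenvalues of a symmetric matrix in ascending order,
        # with unit-norm eigenvectors as columns.
        eigvals, eigvecs = np.linalg.eigh(sub)
        lam = eigvals[-1]
        vec = eigvecs[:, -1]
        if vec.sum() < 0:
            vec = -vec  # fix the sign so that centralities are non-negative
        q += lam ** 2
        # Node preference = sqrt(lambda_z) * L2-normalized dominant eigenvector.
        preferences[idx] = np.sqrt(max(lam, 0.0)) * vec
    return q, preferences


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

q, prefs = gsbm_p_objective(adj, [0, 0, 0, 1, 1, 1])
```

For this example each triangle has principal eigenvalue 2, so the objective is 8 and every vertex has node preference $\sqrt{2/3}$; the product of two preferences inside a triangle is then 2/3.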

Experiments showed that the coordinate ascent method for the Gaussian stochastic blockmodel with node preference works well in most cases. For community detection, it outperforms the variational Bayes method for the weighted stochastic blockmodel and label propagation with intra-community RandomWalk as node preference. In unweighted networks, the coordinate ascent method for GSBM-P is superior to LPA, and achieves similar performance to OSLOM. In weighted networks, the performance of our method is comparable to that of LPA.

Šubelj and Bajec [4] have proposed that the iterative order implicitly contains the propagation strength. In the future, we may explore a suitable iterative order for our coordinate ascent method. Moreover, the coordinate ascent method is not the only way to optimize our objective function, and we may explore different optimization algorithms. We make two strong assumptions in our model so that it is related to label propagation algorithms. However, these assumptions may be too strict. The first is that we fix the expected edge weights between distinct communities to be zero. In the future, we may try to leave this small expected inter-community edge weight as a tunable parameter. The second is that we treat the block indicators as parameters. Possible future work also includes placing a prior on the block indicators and adopting more advanced approximate inference methods such as variational Expectation Maximization.

Acknowledgments

We thank C Aicher for providing the implementation of the weighted stochastic blockmodel. We thank A Lancichinetti, F Radicchi, J Javier Ramasco and S Fortunato for providing the implementation of the order statistics local optimization method. We thank P Zhang for providing the implementation of relative normalized mutual information. This work is supported by the National Natural Science Foundation of China (grant No. M1321005, 61472017).
