Exact Recovery of Stochastic Block Model by Ising Model

Zhao, Feng; Ye, Min; Huang, Shao-Lun

doi:10.3390/e23010065


by Feng Zhao ^1,†, Min Ye ^2,† and Shao-Lun Huang ^2,*

¹ Department of Electronics, Tsinghua University, Beijing 100084, China

² Tsinghua Berkeley Shenzhen Institute, Berkeley, CA 94704, USA

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Entropy 2021, 23(1), 65; https://doi.org/10.3390/e23010065

Submission received: 29 November 2020 / Revised: 20 December 2020 / Accepted: 30 December 2020 / Published: 2 January 2021

(This article belongs to the Special Issue Phase Transitions and Emergent Phenomena: How Change Emerges through Basic Probability Models)


Abstract:

In this paper, we study the phase transition property of an Ising model defined on a special random graph, the stochastic block model (SBM). Based on the Ising model, we propose a stochastic estimator to achieve the exact recovery for the SBM. The stochastic algorithm can be transformed into an optimization problem, which includes the special cases of maximum likelihood and maximum modularity. Additionally, we give an unbiased convergent estimator for the model parameters of the SBM, which can be computed in constant time. Finally, we use Metropolis sampling to realize the stochastic estimator and verify the phase transition phenomenon through experiments.

Keywords:

stochastic block model; exact recovery; Ising model; maximum likelihood; metropolis sampling

1. Introduction

In network analysis, community detection consists of inferring groups of vertices that are more densely connected in a graph [1]. It has been used in many domains, such as recommendation systems [2], task allocation in distributed computing [3], gene expressions [4], and so on. The stochastic block model (SBM) is one of the most commonly used statistical models for community detection problems [5,6]. It provides a benchmark artificial dataset to evaluate different community detection algorithms and inspires the design of many algorithms for community detection tasks. These algorithms, such as semi-definite relaxation, spectral clustering, and label propagation, not only have theoretical guarantees when applied to the SBM, but also perform well on datasets without the SBM assumption. The study of the theoretical guarantees of the SBM can be divided between the problem of exact recovery and that of partial recovery. Exact recovery requires that the estimated community structure be exactly the same as the underlying community structure of the SBM, whereas partial recovery expects the ratio of misclassified nodes to be as small as possible. For both cases, the asymptotic behavior of the detection error is analyzed when the scale of the graph tends to infinity. There are already some well-known results for the exact recovery problem on the SBM. To name but a few, Abbe and Mossel established the exact recovery region for a special sparse SBM with two communities [7,8]. Later on, the result was extended to a general SBM with multiple communities [9].

Parameter inference in the SBM is often considered alongside the exact recovery problem. Previous inference methods require the joint estimation of node labels and model parameters [10], which have high complexity since the recovery and inference tasks are done simultaneously. In this article, we will decouple the inference and recovery problems, and propose an unbiased convergent estimator for SBM parameters when the number of communities is known. Once the estimator is obtained, the recovery condition can be checked to determine whether it is possible to recover the labels exactly. Additionally, the estimated parameter will guide the choice of parameters for our proposed stochastic algorithm.

In this article, the exact recovery of the SBM is analyzed by considering the Ising model, which is a probability distribution of node states [11]. We use the terms node states and node labels interchangeably throughout this paper, both of which refer to the membership of the underlying community. The Ising model was originally proposed in statistical mechanics to model the ferromagnetism phenomenon but has wide applications in neuroscience, information theory, and social networks. The phase transition property is shared among the different variants of Ising models. A phase transition can generally be formulated as a sharp change of some information quantity in a small neighborhood of parameters. Based on the random graph generated by an SBM with two underlying communities, the connection between the SBM and the Ising model was first studied by [12]. Our work will extend the existing result to the multiple-community case, establish the phase transition property, and give the recovery error an upper bound. The error bounds decay at a polynomial rate in different phases. Then we will propose an alternative approach to estimate the labels by finding the Ising state with maximal probability. Compared with sampling from the Ising model directly, we will show that the optimization approach has a sharper error upper bound. Solving the optimization problem is a generalization of maximum likelihood and also has a connection with maximum modularity. Additionally, searching for the state with maximal probability could also be done within all balanced partitions. We will show that this constrained search is equivalent to the graph minimum cut problem, and the detection error upper bound for the constrained maximization will also be given.

Exactly maximizing the probability function, or exactly sampling from the Ising model, is NP-hard. Many polynomial-time algorithms have been proposed for approximation purposes. Among these algorithms, simulated annealing performs well and produces a solution that is very close to the true maximal value [13]. On the other hand, in the original Ising model, Metropolis sequential sampling is used to generate samples [14]. Simulated annealing can be regarded as Metropolis sampling with decreasing temperature. In this article, we will use the Metropolis sampling technique to sample from the Ising model defined on the SBM. This approximation enables us to verify the phase transition property of our Ising model numerically.

This paper is organized as follows. First, in Section 3, we introduce the SBM and give an estimator for the parameters of the SBM. Then, in Section 4, our specific Ising model is given and its phase transition property is obtained. In Section 5, the energy minimization method is derived from the Ising model, and we establish its connection with the maximum likelihood and modularity maximization algorithms. Furthermore, in Section 6, we realize the Ising model using the Metropolis algorithm to generate samples. Numerical experiments and conclusions are given last.

Throughout this paper, the community number is denoted by k; the random undirected graph G is written as $G (V, E)$ with vertex set V and edge set E; $V = \{1, \dots, n\} =: [n]$; the label of each node is $X_{i}$, which is chosen from $W = \{1, ω, \dots, ω^{k - 1}\}$, and we further require W to be a cyclic group with order k; $W^{n}$ is the n-ary Cartesian power of W; f is a permutation function on W and is extended to $W^{n}$ in an element-wise manner; $U^{c}$ is the complement set of U and $| U |$ is the cardinality of U; the set $S_{k}$ is used to represent all permutation functions on W and $S_{k} (σ) := \{f (σ) | f \in S_{k}\}$ for $σ \in W^{n}$; the indicator function $δ (x, y)$ is defined as $δ (x, y) = 1$ when $x = y$, and $δ (x, y) = 0$ when $x \neq y$; $f (n) = O (g (n))$ if there exists a constant $c > 0$ such that $f (n) \leq c g (n)$ for large n; $f (n) = o (g (n))$ holds if $\lim_{n \to \infty} \frac{f (n)}{g (n)} = 0$; we define the distance of two vectors as:

dist (σ, σ^{'}) = | \{i \in [n] : σ_{i} \neq σ_{i}^{'}\} | for σ, σ^{'} \in W^{n}

and the distance of a vector to a space $S \subseteq W^{n}$ as $dist (σ, S) := \min \{dist (σ, σ^{'}) | σ^{'} \in S\}$. For example, when $n = 2$ and $k = 2$, $σ = (1, ω) \in W^{2}$; $ω^{0} = 1$; $ω \cdot ω = ω^{2} = 1$; let f be a mapping such that $f (1) = ω$ and $f (ω) = 1$, then $f \in S_{2}$ and $f (σ) = (ω, 1)$; $dist (σ, f (σ)) = 2$; $S_{k} (σ) = \{σ, f (σ)\}$; and $S_{k}^{c} (σ) = \{(1, 1), (ω, ω)\}$.
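The distance to the orbit $S_{k} (σ)$ can be computed by brute force when k is small. The following is a minimal sketch added for illustration, not part of the original analysis: it assumes the label ω^j is encoded as the integer j, and the helper names `orbit` and `dist_to_orbit` are hypothetical.

```python
from itertools import permutations

def dist(sigma, sigma_prime):
    """Hamming distance: number of positions where the two label vectors differ."""
    return sum(1 for x, y in zip(sigma, sigma_prime) if x != y)

def orbit(sigma, k):
    """S_k(sigma): all relabelings f(sigma) for permutations f of {0, ..., k-1}."""
    return {tuple(f[x] for x in sigma) for f in permutations(range(k))}

def dist_to_orbit(sigma, target, k):
    """dist(sigma, S_k(target)): minimum distance over all relabelings of target."""
    return min(dist(sigma, t) for t in orbit(target, k))

sigma = (0, 1)      # encodes (1, ω) for n = 2, k = 2
f_sigma = (1, 0)    # encodes f(σ) = (ω, 1)
print(dist(sigma, f_sigma))              # 2
print(sorted(orbit(sigma, 2)))           # [(0, 1), (1, 0)]
print(dist_to_orbit((0, 0), sigma, 2))   # 1
```

Enumerating all k! permutations is only practical for small k; it is used here because it mirrors the definition directly.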

2. Related Works

The classical Ising model is defined on a lattice and confined to two states

{\pm 1}

. This definition can be extended to a general graph and multiple-state case [15]. In [16], Liu considered the Ising model as defined on a graph generated by sparse SBM and his focus was to compute the log partition function, which was averaged over all random graphs. In [17], an Ising model with a repelling interaction was considered on a fixed graph structure, and the phase transition condition was established, which involves both the attracting and repelling parameters. Our Ising model derives from the work of [12], but we extend their results by considering the error upper bound and multiple-community case.

The exact recovery condition for the SBM can be derived as a special case of many generalized models, such as pairwise measurements [18], minimax rates [19], and side information [20]. The Ising model in this paper provides another way to extend the SBM model and derives the recovery condition. Additionally, the error upper bound for exact recovery of the two-community SBM by constrained maximum likelihood has been obtained in [7]. Compared with previous results, we establish a sharper upper bound for the multiple-community case in this paper.

The connection between maximum modularity and maximal likelihood was investigated in [21]. To get an optimal value of maximum modularity approximately, simulated annealing was exploited [22], which proceeds by using the partition approach, while the Metropolis sampling used in this paper is applied to estimate the node membership directly.

3. Stochastic Block Model and Parameter Estimation

In this paper, we consider a special symmetric stochastic block model (SSBM), which is defined as follows:

Definition 1 (SSBM with k communities).

Let

0 \leq q < p \leq 1

,

V = [n]

and

X = (X_{1}, \dots, X_{n}) \in W^{n}

. X satisfies the constraint that

| {v \in [n] : X_{v} = u} | = \frac{n}{k}

for

u \in W

. The random graph G is generated under

SSBM (n, k, p, q)

if the following two conditions are satisfied.

There is an edge of G between the vertices i and j with probability p if $X_{i} = X_{j}$ and with probability q if $X_{i} \neq X_{j}$ .
The existences of each edge are mutually independent.

To explain SSBM in more detail, we define the random variable

Z_{i j} : = 1 [{i, j} \in E (G)]

, which is the indicator function of the existence of an edge between nodes i and j. Given the node labels X,

Z_{i j}

follows Bernoulli distribution, whose expectation is given by:

E [Z_{i j}] = \begin{cases} p & \text{if } X_{i} = X_{j} \\ q & \text{if } X_{i} \neq X_{j} \end{cases}

(1)

Then the random graph G with n nodes is completely specified by

Z : = {Z_{i j}, 1 \leq i < j \leq n}

in which all

Z_{i j}

are jointly independent. The probability distribution for SSBM can be written as:

\begin{matrix} P_{G} (G) : = P_{G} (Z = z | X) = p^{\sum_{X_{i} = X_{j}} z_{i j}} q^{\sum_{X_{i} \neq X_{j}} z_{i j}} \\ \cdot {(1 - p)}^{\sum_{X_{i} = X_{j}} (1 - z_{i j})} {(1 - q)}^{\sum_{X_{i} \neq X_{j}} (1 - z_{i j})} \end{matrix}

(2)

We will use the notation

G_{n}

to represent the set containing all graphs with n nodes. By the normalization property,

P_{G} (G_{n}) = \sum_{G \in G_{n}} P_{G} (G) = 1

.
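Definition 1 translates directly into a sampling procedure. The sketch below is one possible illustration under stated assumptions: labels ω^j are stored as integers j, the balanced labeling assigns consecutive blocks of nodes to each community, and the function name `sample_ssbm` and the example values of n, k, a, b are hypothetical choices.

```python
import math
import random

def sample_ssbm(n, k, p, q, rng):
    """Draw one graph from SSBM(n, k, p, q).

    Returns (labels, edges): labels[i] in {0, ..., k-1} encodes X_i
    (ω^j is stored as j) and edges is a set of pairs (i, j) with i < j.
    """
    if n % k != 0:
        raise ValueError("n must be divisible by k for balanced communities")
    labels = [i // (n // k) for i in range(n)]   # fixed balanced labeling X
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:              # independent Bernoulli draw Z_ij
                edges.add((i, j))
    return labels, edges

rng = random.Random(0)
n, k, a, b = 300, 3, 16.0, 4.0
p, q = a * math.log(n) / n, b * math.log(n) / n  # sparse regime used in this paper
X, E = sample_ssbm(n, k, p, q, rng)
print(len(E), "edges")
```

The double loop costs O(n^2) time, which is acceptable for a sketch; faster samplers for sparse graphs exist but are not needed to illustrate the definition.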

In Definition 1, we have supposed that the node label X is fixed instead of a uniformly distributed random variable. Since the maximum posterior estimator is equivalent to the maximum likelihood when the prior is uniform, these two definitions are equivalent. Although the random variable definition is more commonly used in previous literature [6], fixing X makes our formal analysis more concise.

Given the SBM, the exact recovery problem can be formally defined as follows:

Definition 2 (Exact recovery in SBM).

Given X, the random graph G is drawn under

SSBM (n, k, p, q)

. We say that the exact recovery is solvable for

SSBM (n, k, p, q)

if there exists an algorithm that takes G as input and outputs

\hat{X}

such that:

P_{a} (\hat{X}) : = P (\hat{X} \in S_{k} (X)) \to 1 as n \to \infty

In the above definition, the notation

P_{a} (\hat{X})

is called the probability of accuracy for estimator

\hat{X}

. Let

P_{e} (\hat{X}) = 1 - P_{a} (\hat{X})

represent the probability of error. Definition 2 can also be formulated as

P_{e} (\hat{X}) \to 0

as

n \to \infty

. The notation

\hat{X} \in S_{k} (X)

means that we can only expect a recovery up to a global permutation of the ground truth label vector X. This is common in unsupervised learning as no anchor exists to assign labels to different communities. Additionally, given a graph G, the algorithm can be either deterministic or stochastic. Generally speaking, the probability of

\hat{X} \in S_{k} (X)

should be understood as

\sum_{G \in G_{n}} P_{G} (G) P_{\hat{X} | G} (\hat{X} \in S_{k} (X))

, which reduces to

P_{G} (\hat{X} \in S_{k} (X))

for the deterministic algorithm.

For constants

p, q

, which do not depend on the graph size n, we can always find algorithms to recover X such that the detection error decreases exponentially fast as n increases; that is to say, the task with a dense graph is relatively easy to handle. Within this paper, we consider a sparse case when

p = \frac{a log n}{n}, q = \frac{b log n}{n}

. This case corresponds to the sparsest graph for which exact recovery of the SBM is possible. Under this condition, a well-known result [9] states that exact recovery is possible if and only if:

\sqrt{a} - \sqrt{b} > \sqrt{k}

(3)

Before diving into the exact recovery problem, we first consider the inference problem for SBM. Suppose k is known, and we want to estimate

a, b

from the graph G. We offer a simple method by counting the number of edges

T_{1}

and the number of triangles

T_{2}

of G, and the estimators

\hat{a}, \hat{b}

are obtained by solving the following system of equations:

\begin{matrix} \frac{x + (k - 1) y}{2 k} & = \frac{T_{1}}{n log n} \end{matrix}

(4)

\begin{matrix} \frac{1}{k^{2}} (\frac{x^{3}}{6} + \frac{k - 1}{2} x y^{2} + (k - 1) (k - 2) \frac{y^{3}}{6}) & = \frac{T_{2}}{{log}^{3} n} \end{matrix}

(5)

The theoretical guarantee for the solution is given by the following theorem:

Theorem 1.

When n is large enough, the equation system of Equations (4) and (5) has the unique solution

(\hat{a}, \hat{b})

, which are unbiased consistent estimators of

(a, b)

. That is,

E [\hat{a}] = a, E [\hat{b}] = b

, and

\hat{a}

and

\hat{b}

converge to

a, b

in probability, respectively.

Given a graph generated by the SBM, we can use Theorem 1 to obtain the estimated

a, b

and determine whether exact recovery of label X is possible by Equation (3). Additionally, Theorem 1 provides good estimation of

a, b

to initialize the parameters of recovery algorithms such as maximum likelihood or our proposed Metropolis sampling in Section 6.
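The estimation procedure of Equations (4) and (5) can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: Equation (4) is used to eliminate x, and the remaining one-dimensional equation in y is solved by bisection on the range where x ≥ y ≥ 0 (the left-hand side of Equation (5) is decreasing in y there, since its derivative along the line x + (k - 1) y = const is proportional to -(x - y)^2). The names `count_triangles` and `solve_ab` are hypothetical.

```python
import math

def count_triangles(n, edges):
    """Count triangles T_2 given edges as pairs (i, j) with i < j."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    # A triangle i < j < l is counted once: via edge (i, j) and common neighbor l > j.
    return sum(1 for i, j in edges for l in nbrs[i] & nbrs[j] if l > j)

def solve_ab(t1, t2, k, tol=1e-12):
    """Solve Equations (4)-(5) for (x, y) with x >= y >= 0 by bisection.

    t1 = T_1 / (n log n) and t2 = T_2 / log^3 n are the normalized counts.
    """
    s = 2 * k * t1                       # Equation (4): x + (k - 1) y = s

    def residual(y):
        x = s - (k - 1) * y
        lhs = (x**3 / 6 + (k - 1) / 2 * x * y**2
               + (k - 1) * (k - 2) * y**3 / 6) / k**2
        return lhs - t2                  # decreasing in y on [0, s / k]

    lo, hi = 0.0, s / k                  # y = s / k corresponds to x = y
    if residual(lo) < 0 or residual(hi) > 0:
        raise ValueError("no solution with x >= y >= 0 for these counts")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    y = (lo + hi) / 2
    return s - (k - 1) * y, y

# Sanity check with the exact limiting values of the normalized counts.
a, b, k = 16.0, 4.0, 2
t1 = (a + (k - 1) * b) / (2 * k)
t2 = (a**3 / 6 + (k - 1) / 2 * a * b**2 + (k - 1) * (k - 2) * b**3 / 6) / k**2
print(solve_ab(t1, t2, k))               # approximately (16.0, 4.0)
```

For a sampled graph with edge set E, one would pass `len(E) / (n * math.log(n))` and `count_triangles(n, E) / math.log(n)**3`; for finite n the counts may fall outside the solvable range, in which case the sketch raises an error.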

4. Ising Model for Community Detection

In the previous section, we have defined SBM and its exact recovery problem. While SBM is regarded as obtaining the graph observation G from node label X, the Ising model provides a way to generate estimators of X from G by a stochastic procedure. The definition of such an Ising model is given as follows:

Definition 3 (Ising model with k states).

Given a graph G sampled from

SSBM (n, k, \frac{a log n}{n}, \frac{b log n}{n})

, the Ising model with parameters

γ, β > 0

is a probability distribution of the state vector

σ \in W^{n}

whose probability mass function is

\begin{matrix} P_{σ | G} (σ = \bar{σ}) = \frac{exp (- β H (\bar{σ}))}{Z_{G} (γ, β)} \end{matrix}

(6)

where

H (\bar{σ}) = γ \frac{log n}{n} \sum_{{i, j} \notin E (G)} δ ({\bar{σ}}_{i}, {\bar{σ}}_{j}) - \sum_{{i, j} \in E (G)} δ ({\bar{σ}}_{i}, {\bar{σ}}_{j})

(7)

The subscript in

P_{σ | G}

indicates that the distribution depends on G, and

Z_{G} (γ, β)

is the normalizing constant for this distribution.

In physics,

β

refers to the inverse temperature and

Z_{G} (γ, β)

is called the partition function. The Hamiltonian energy

H (\bar{σ})

consists of two terms: the repelling interaction between nodes without edge connection and the attracting interaction between nodes with edge connection. The parameter

γ

indicates the ratio of the strength of these two interactions. The term

\frac{log n}{n}

is added to balance the two interactions because there are only

O (\frac{log n}{n})

connecting edges for each node. The probability of each state is proportional to

exp (- β H (\bar{σ}))

, and the state with the largest probability corresponds to that with the lowest energy.

The classical definition of the Ising model is specified by

H (σ) = - \sum_{(i, j) \in E (G)} σ_{i} \cdot σ_{j}

for

σ_{i} = \pm 1

. There are two main differences between Definition 3 and the classical one. Firstly, we add a repelling term between nodes without an edge connection. This makes these nodes have a larger probability to take different labels. Secondly, we allow the state at each node to take k values from W instead of the two values

\pm 1

. When

γ = 0

and

k = 2

, Definition 3 is reduced to the classical definition of the Ising model up to a scaling factor.

Definition 3 gives a stochastic estimator

{\hat{X}}^{*}

for X:

{\hat{X}}^{*}

is one sample generated from the Ising model, which is denoted as

{\hat{X}}^{*} \sim Ising (γ, β)

. The exact recovery error probability for

{\hat{X}}^{*}

can be written as

P_{e} ({\hat{X}}^{*}) : = \sum_{G \in G_{n}} P_{G} (G) P_{σ | G} (S_{k}^{c} (X))

. From this expression we can see that the error probability is determined by two parameters

(γ, β)

. When these parameters take proper values,

P_{e} ({\hat{X}}^{*}) \to 0

, and the exact recovery of the SBM is achievable. On the contrary,

P_{e} ({\hat{X}}^{*}) \to 1

if

(γ, β)

takes other values. These two cases are summarized in the following theorem:

Theorem 2.

Define the function

g (β), \tilde{g} (β)

as follows:

g (β) = \frac{b e^{β} + a e^{- β}}{k} - \frac{a + b}{k} + 1

(8)

and:

\tilde{g} (β) = \begin{cases} g (β) & β \leq \bar{β} = \frac{1}{2} log \frac{a}{b} \\ g (\bar{β}) = 1 - \frac{{(\sqrt{a} - \sqrt{b})}^{2}}{k} & β > \bar{β} \end{cases}

(9)

where

\bar{β} = arg {min}_{β > 0} g (β)

. Let

β^{*}

be defined as:

β^{*} = log (\frac{a + b - k - \sqrt{{(a + b - k)}^{2} - 4 a b}}{2 b})

(10)

which is the solution to the equation

g (β) = 0

and

β^{*} < \bar{β}

. Then depending on how

(γ, β)

take values, for any given

ϵ > 0

and

{\hat{X}}^{*} \sim Ising (γ, β)

, when n is sufficiently large, we have:

If $γ > b$ and $β > β^{*}$ , $P_{e} ({\hat{X}}^{*}) \leq n^{\tilde{g} (β) / 2 + ϵ}$ ;
If $γ > b$ and $β < β^{*}$ , $P_{a} ({\hat{X}}^{*}) \leq (1 + o (1)) max {n^{g (\bar{β})}, n^{- g (β) + ϵ}}$ ;
If $γ < b$ , $P_{a} ({\hat{X}}^{*}) \leq exp (- C n)$ for any $C > 0$ .

By simple calculus,

\tilde{g} (β) < 0

for

β > β^{*}

and

g (β) > 0

for

β < β^{*}

.

g (\bar{β}) < 0

follows from Equation (3). The illustration of

g (β), \tilde{g} (β)

is shown in Figure 1a. Therefore, for sufficiently small

ϵ

and as

n \to \infty

, the upper bounds in Theorem 2 all converge to 0 at least in polynomial speed. Therefore, Theorem 2 establishes the sharp phase transition property of the Ising model, which is illustrated in Figure 1b.
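The quantities in Theorem 2 are easy to evaluate numerically. The sketch below implements Equations (8) and (10) as written; the parameter values a = 16, b = 4, k = 2 are an illustrative assumption chosen to satisfy Equation (3), which also guarantees that the square root in Equation (10) is real.

```python
import math

def g(beta, a, b, k):
    """Equation (8)."""
    return (b * math.exp(beta) + a * math.exp(-beta)) / k - (a + b) / k + 1

def beta_bar(a, b):
    """Minimizer of g over beta > 0: (1/2) log(a / b)."""
    return 0.5 * math.log(a / b)

def beta_star(a, b, k):
    """Equation (10): the smaller root of g(beta) = 0."""
    c = a + b - k
    return math.log((c - math.sqrt(c * c - 4 * a * b)) / (2 * b))

a, b, k = 16.0, 4.0, 2
print(round(beta_star(a, b, k), 3))   # 0.198
print(round(beta_bar(a, b), 3))       # 0.693
print(g(beta_bar(a, b), a, b, k))     # close to -1 = 1 - (sqrt(a) - sqrt(b))**2 / k
```

For these values, β^{*} < \bar{β} and g (\bar{β}) < 0, in line with the statements above.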

Theorem 2 can also be understood from the marginal distribution for

σ : P_{σ} (σ = \bar{σ}) = \sum_{G \in G_{n}} P_{G} (G) P_{σ | G} (σ = \bar{σ})

. Let

D (σ, σ^{'})

be the event when

σ

is closest to

σ^{'}

among all its permutations. That is,

D (σ, σ^{'}) : = {σ = arg min_{f \in S_{k}} dist (f (σ), σ^{'})}

(11)

Then Theorem 2 can be stated with respect to the marginal distribution

P_{σ}

:

Corollary 1.

Suppose

γ > b

, depending on how β takes values:

When $β > β^{*}$ , $P_{σ} (σ = X | D (σ, X)) = 1 - o (1)$ ;
When $β < β^{*}$ , $P_{σ} (σ = X | D (σ, X)) = o (1)$ .

Below we outline the proof ideas of Theorem 2. The insight is obtained from the analysis of the one-flip energy difference. This useful result is summarized in the following lemma:

Lemma 1.

Suppose

{\bar{σ}}^{'}

differs from

\bar{σ}

only at position r by

{\bar{σ}}_{r}^{'} = ω^{s} \cdot {\bar{σ}}_{r}

. Then the change of energy is:

\begin{matrix} H ({\bar{σ}}^{'}) - H (\bar{σ}) & = (1 + γ \frac{log n}{n}) \sum_{i \in N_{r} (G)} J_{s} ({\bar{σ}}_{r}, {\bar{σ}}_{i}) \\ + γ \frac{log n}{n} (m (ω^{s} \cdot {\bar{σ}}_{r}) - m ({\bar{σ}}_{r}) + 1) \end{matrix}

(12)

where

m (ω^{j}) : = | {i \in [n] | {\bar{σ}}_{i} = ω^{j}} |

,

N_{r} (G) : = {j | (r, j) \in E (G)}

and

J_{s} (x, y) = δ (x, y) - δ (ω^{s} \cdot x, y)

.

Lemma 1 gives an explicit way to compare the probability of two neighboring states by the following equality:

\frac{P_{σ | G} (σ = {\bar{σ}}^{'})}{P_{σ | G} (σ = \bar{σ})} = exp (- β (H ({\bar{σ}}^{'}) - H (\bar{σ})))

(13)

Additionally, since the graph is sparse and every node has

O (log n)

neighbors, from Equation (12) the computational cost (time complexity) for the energy difference is also

O (log n)

.
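The identity in Lemma 1 can be checked numerically on a small graph. The following sketch is an illustration under assumptions: labels ω^j are stored as integers j, so that multiplying by ω^s becomes addition of s modulo k; the toy graph, the value of γ, and the function names `energy` and `delta_energy` are hypothetical. The brute-force `energy` follows Equation (7), while `delta_energy` evaluates the right-hand side of Equation (12) using only the neighbors of node r.

```python
import math

def energy(sigma, edges, gamma):
    """H(sigma) from Equation (7); edges holds pairs (i, j) with i < j."""
    n = len(sigma)
    scale = gamma * math.log(n) / n
    h = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if sigma[i] == sigma[j]:
                h += -1.0 if (i, j) in edges else scale
    return h

def delta_energy(sigma, nbrs, counts, r, s, gamma, k):
    """Right-hand side of Equation (12) for the flip sigma_r -> ω^s · sigma_r.

    nbrs[r] is the neighbor set N_r(G); counts[j] = m(ω^j) for the current state.
    """
    n = len(sigma)
    scale = gamma * math.log(n) / n
    old, new = sigma[r], (sigma[r] + s) % k
    # Sum of J_s over neighbors: +1 for a neighbor with the old label, -1 for the new one.
    j_sum = sum((sigma[i] == old) - (sigma[i] == new) for i in nbrs[r])
    return (1 + scale) * j_sum + scale * (counts[new] - counts[old] + 1)

edges = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)}
n, k, gamma = 6, 3, 5.0
sigma = [0, 0, 1, 1, 2, 2]
nbrs = [set() for _ in range(n)]
for i, j in edges:
    nbrs[i].add(j)
    nbrs[j].add(i)
counts = [sigma.count(j) for j in range(k)]
r, s = 2, 2                                   # flip node 2 from label 1 to label 0
flipped = sigma[:r] + [(sigma[r] + s) % k] + sigma[r + 1:]
lhs = energy(flipped, edges, gamma) - energy(sigma, edges, gamma)
rhs = delta_energy(sigma, nbrs, counts, r, s, gamma, k)
print(abs(lhs - rhs) < 1e-12)                 # True
```

The brute-force energy costs O(n^2) per evaluation, whereas `delta_energy` touches only the neighbors of r, which is the source of the O(log n) cost in the sparse regime.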

When

H ({\bar{σ}}^{'}) > H (\bar{σ})

, we can expect

P_{σ | G} (σ = {\bar{σ}}^{'})

is far less than

P_{σ | G} (σ = \bar{σ})

. Roughly speaking, if

\sum_{dist ({\bar{σ}}^{'}, X) = 1} exp (- β (H ({\bar{σ}}^{'}) - H (X)))

converges to zero, we can expect the total probability of all states outside

S_{k} (X)

to converge to zero. On the contrary, if

\sum_{dist ({\bar{σ}}^{'}, X) = 1} exp (- β (H ({\bar{σ}}^{'}) - H (X)))

tends to infinity, then

P_{σ} (S_{k} (X))

converges to zero. This illustrates the idea behind the proof of Theorem 2. The rigorous proof can be found in Section 8.

5. Community Detection via Energy Minimization

Since

β^{*}

does not depend on n, when

γ > b

, we can choose a sufficiently large

β

such that

β > β^{*}

, then by Theorem 2,

σ \in S_{k} (X)

almost surely, which implies that

P_{σ | G} (σ = X)

has the largest probability for almost all graphs G sampled from the SBM. Therefore, instead of sampling from the Ising model, we can directly maximize the conditional probability to find the state with the largest probability. Equivalently, we can proceed by minimizing the energy term in Equation (7):

{\hat{X}}^{'} : = arg min_{\bar{σ} \in W^{n}} H (\bar{σ})

(14)

In (14), we allow

\bar{σ}

to take values from

W^{n}

. Since we know X has equal size

| {v \in [n] : X_{v} = u} | = \frac{n}{k}

for each label u, another formulation is to restrict the search space to

W^{*} : = {σ \in W^{n} | | {v \in [n] : σ_{v} = ω^{s}} | = \frac{n}{k}, s = 0, \dots, k - 1}

. When

σ \in W^{*}

, minimizing

H (σ)

is equivalent to:

{\hat{X}}^{″} : = arg min_{σ \in W^{*}} \sum_{{i, j} \notin E (G)} δ (σ_{i}, σ_{j})

(15)

where the minimal value is the minimum cut between different detected communities.
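The equivalence between minimizing H over W^{*} and Equation (15) follows from a short calculation. The worked step below is a sketch added for illustration; it uses only the balance constraint that defines W^{*} and the energy in Equation (7).

```latex
% For \sigma \in W^{*}, every community has n/k nodes, so the number of
% same-label pairs is a constant:
\sum_{i < j} \delta(\sigma_i, \sigma_j) = k \binom{n/k}{2} = \frac{n (n - k)}{2 k}.
% Splitting this sum into non-edges and edges:
\sum_{\{i,j\} \notin E(G)} \delta(\sigma_i, \sigma_j)
  = \frac{n (n - k)}{2 k} - \sum_{\{i,j\} \in E(G)} \delta(\sigma_i, \sigma_j).
% Substituting into Equation (7):
H(\sigma) = \gamma \frac{\log n}{n} \cdot \frac{n (n - k)}{2 k}
  - \Bigl(1 + \gamma \frac{\log n}{n}\Bigr)
    \sum_{\{i,j\} \in E(G)} \delta(\sigma_i, \sigma_j).
```

On W^{*}, minimizing H is therefore the same as maximizing the number of edges inside communities, which is the same as minimizing the non-edge sum in Equation (15) and the same as minimizing the number of edges between communities. The parameter γ only affects a constant and a positive scaling factor, so it drops out of the constrained problem.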

When

{\hat{X}}^{″} \neq X

, we must have

dist ({\hat{X}}^{″}, X) \geq 2

to satisfy the constraint

{\hat{X}}^{″} \in W^{*}

. Additionally, the estimator of

{\hat{X}}^{″}

is parameter-free whereas

{\hat{X}}^{'}

depends on

γ

. The extra parameter

γ

in the expression of

{\hat{X}}^{'}

can be regarded as a kind of Lagrange multiplier for this integer programming problem. Thus, the optimization problem for

{\hat{X}}^{'}

is the relaxation of that for

{\hat{X}}^{″}

by introducing a penalized term and enlarging the searched space from

W^{*}

to

W^{n}

.

When

β > \bar{β}

,

\tilde{g} (β)

becomes a constant value. Therefore, we can get

n^{g (\bar{β}) / 2}

as the tightest error upper bound for the Ising estimator

{\hat{X}}^{*}

from Theorem 2. For the estimator

{\hat{X}}^{'}

and

{\hat{X}}^{″}

, we can obtain a sharper error upper bound, which is summarized in the following theorem:

Theorem 3.

When

\sqrt{a} - \sqrt{b} > \sqrt{k}

, for sufficiently large n,

If $γ > b$ , $P_{G} ({\hat{X}}^{'} \notin S_{k} (X)) \leq (k - 1 + o (1)) n^{g (\bar{β})}$ ;
$P_{G} ({\hat{X}}^{''} \notin S_{k} (X)) \leq ({(k - 1)}^{2} + o (1)) n^{2 g (\bar{β})}$ .

As

g (\bar{β}) < 0

,

n^{2 g (\bar{β})} < n^{g (\bar{β})} < n^{g (\bar{β}) / 2}

, Theorem 3 implies that

P_{e} ({\hat{X}}^{″})

has the sharpest upper bound among the three estimators. This can be intuitively understood as the result of smaller search space. The proof technique of Theorem 3 is to consider the probability of events

H (X) > H (\bar{σ})

for

dist (σ, X) \geq 1

. Then by union bound, these error probabilities can be summed up. We note that a loose bound

n^{g (\bar{β}) / 4}

was obtained in [7] for the estimator

{\hat{X}}^{″}

when

k = 2

. For a general case, since

\tilde{g} (β) = 1 - \frac{{(\sqrt{a} - \sqrt{b})}^{2}}{k}

, Theorem 3 implies that exact recovery is possible using

{\hat{X}}^{'}

as long as

\sqrt{a} - \sqrt{b} > \sqrt{k}

is satisfied.

Estimator

{\hat{X}}^{'}

has one parameter,

γ

. When

γ

takes different values,

{\hat{X}}^{'}

is equivalent to maximum likelihood or maximum modularity in the asymptotic case. The following analysis shows their relationship intuitively.

The maximum likelihood estimator is obtained by maximizing the log-likelihood function. From (2), this function can be written as:

log P_{G} (Z | X = σ) = - log \frac{a}{b} \cdot H (σ) + C

where the parameter

γ

in

H (σ)

satisfies

γ \frac{log n}{n} = \frac{1}{log (a / b)} (log (1 - \frac{b log n}{n}) - log (1 - \frac{a log n}{n}))

and C is a constant independent of

σ

. When n is sufficiently large, we have

γ \to γ_{M L} : = \frac{a - b}{log (a / b)}

. That is, the maximum likelihood estimator is equivalent to

{\hat{X}}^{'}

when

γ = γ_{M L}

asymptotically.

The maximum modularity estimator is obtained by maximizing the modularity of a graph [23], which is defined by:

\begin{matrix} Q & = \frac{1}{2 | E |} \sum_{i j} (A_{i j} - \frac{d_{i} d_{j}}{2 | E |}) δ (C_{i}, C_{j}) \end{matrix}

(16)

For the i-th node,

d_{i}

is its degree and

C_{i}

is its community belonging. A is the adjacency matrix. Up to a scaling factor, the modularity Q can be re-written using the label vector

σ

as:

\begin{matrix} Q (σ) = & - \sum_{{i, j} \notin E (G)} \frac{d_{i} d_{j}}{2 | E |} δ (σ_{i}, σ_{j}) \\ + \sum_{{i, j} \in E (G)} (1 - \frac{d_{i} d_{j}}{2 | E |}) δ (σ_{i}, σ_{j}) \end{matrix}

(17)

From (17), we can see that

Q (σ) \to - H (σ)

with

γ = γ_{M Q} = \frac{a + b}{2}

as

n \to \infty

. Indeed, we have

d_{i} \sim \frac{(a + b) log n}{2}, | E | \sim \frac{1}{2} n d_{i}

. Therefore, we have

\frac{d_{i} d_{j}}{2 | E |} \to γ_{M Q} \frac{log n}{n}

. That is, the maximum modularity estimator is equivalent to

{\hat{X}}^{'}

when

γ = γ_{M Q}

asymptotically.

Using

a > b

and the inequality

x - 1 > log x > 2 \frac{x - 1}{x + 1}

for

x > 1

we can verify that

γ_{M Q} > γ_{M L} > b

. That is, both the maximum likelihood and the maximum modularity estimator satisfy the exact recovery conditions

γ > b

in Theorem 3.
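The ordering of the three quantities can be written out explicitly. The following worked step is added for illustration, with the substitution x = a / b > 1; the numerical values a = 16, b = 4 are an assumed example.

```latex
% Write \gamma_{ML} in terms of x = a / b > 1:
\gamma_{ML} = \frac{a - b}{\log (a / b)} = \frac{b (x - 1)}{\log x}.
% Upper inequality \log x < x - 1 gives the lower bound on \gamma_{ML}:
\gamma_{ML} > \frac{b (x - 1)}{x - 1} = b.
% Lower inequality \log x > 2 (x - 1) / (x + 1) gives the upper bound:
\gamma_{ML} < \frac{b (x - 1)(x + 1)}{2 (x - 1)} = \frac{b (x + 1)}{2}
            = \frac{a + b}{2} = \gamma_{MQ}.
% Example with a = 16, b = 4:
% \gamma_{ML} = 12 / \log 4 \approx 8.66, \quad \gamma_{MQ} = 10, \quad b = 4.
```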

6. Community Detection Based on Metropolis Sampling

From Theorem 2, if we could sample from the Ising model, then with large probability, the sample is aligned with X. However, exact sampling is difficult when n is very large since the cardinality of the state space grows at the rate of

k^{n}

. Therefore, some approximation is necessary, and the most common way to generate an Ising sample is Metropolis sampling [14]. Empirically speaking, starting from a random state, the Metropolis algorithm updates the state by randomly selecting one position and flipping its state at each iteration step. After an initial burn-in period, the generated samples can be regarded as samples from the Ising model.

The theoretical guarantee of Metropolis sampling is based on the Markov chain. Under some general conditions, Metropolis samples converge to the steady state of the Markov chain, and the steady state follows the probability distribution to be approximated. For the Ising model, there are many previous works which have shown the convergence of Metropolis sampling [24].

For our specific Ising model and energy term in Equation (7), the pseudo code of our algorithm is summarized in Algorithm 1. This algorithm requires that the number of the communities k is known and the strength ratio parameter

γ

is given. We should choose

γ > b

where b is estimated by

\hat{b}

in Theorem 1. The iteration time N should also be specified in advance.

Algorithm 1 Metropolis sampling algorithm for SBM.

Inputs: the graph G, inverse temperature

β

, the strength ratio parameter

γ

Output:

\hat{X} = \bar{σ}

randomly initialize $\bar{σ} \in W^{n}$
for $i = 1, 2, \dots, N$ do
    propose a new state ${\bar{σ}}^{'}$ according to Lemma 1, where $s, r$ are randomly chosen
    compute $Δ H (r, s) = H ({\bar{σ}}^{'}) - H (\bar{σ})$ using (12)
    if $Δ H (r, s) < 0$ then
        ${\bar{σ}}_{r} \leftarrow ω^{s} \cdot {\bar{σ}}_{r}$
    else
        with probability $exp (- β Δ H (r, s))$, set ${\bar{σ}}_{r} \leftarrow ω^{s} \cdot {\bar{σ}}_{r}$
    end if
end for

The computation of

Δ H (r, s)

needs

O (log n)

time from Lemma 1. For some special Ising models,

N = O (n log n)

iterations are needed to generate a sample that is a good approximation [25]. For our model, it is unknown whether

O (n log n)

is sufficient, and we empirically chose

N = O (n^{2})

in numerical experiments. Then the time complexity of Algorithm 1 is

O (n^{2} log n)

.
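Algorithm 1 can be sketched in a few lines. The version below is a minimal illustration under assumptions, not the authors' implementation: labels ω^j are stored as integers j, the shift s is drawn uniformly from {1, ..., k - 1} and the position r uniformly from the n nodes, and the optional `init` argument and the function name `metropolis_sbm` are hypothetical additions. The energy difference is computed with Equation (12), so each step costs time proportional to the degree of the chosen node.

```python
import math
import random

def metropolis_sbm(n, edges, k, beta, gamma, num_iters, rng, init=None):
    """Run num_iters single-site Metropolis updates for the energy in Equation (7).

    Labels are integers 0..k-1 (ω^j stored as j); edges holds pairs (i, j).
    Returns the final state as a list of length n.
    """
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    sigma = list(init) if init is not None else [rng.randrange(k) for _ in range(n)]
    counts = [0] * k                       # counts[j] = m(ω^j)
    for label in sigma:
        counts[label] += 1
    scale = gamma * math.log(n) / n
    for _ in range(num_iters):
        r = rng.randrange(n)               # position to flip
        s = rng.randrange(1, k)            # nonzero shift, so the label changes
        old, new = sigma[r], (sigma[r] + s) % k
        j_sum = sum((sigma[i] == old) - (sigma[i] == new) for i in nbrs[r])
        delta = (1 + scale) * j_sum + scale * (counts[new] - counts[old] + 1)
        # Accept downhill moves always, uphill moves with probability exp(-beta * delta).
        if delta < 0 or rng.random() < math.exp(-beta * delta):
            sigma[r] = new
            counts[old] -= 1
            counts[new] += 1
    return sigma

# Toy check: two disjoint 5-cliques, started at the planted labels with a large beta.
edges = {(i, j) for c in (0, 5) for i in range(c, c + 5) for j in range(i + 1, c + 5)}
truth = [0] * 5 + [1] * 5
out = metropolis_sbm(10, edges, k=2, beta=50.0, gamma=5.0,
                     num_iters=1000, rng=random.Random(1), init=truth)
print(out == truth)
```

In the toy check every single flip away from the planted labels raises the energy by a large amount, so with a large β the chain stays at the planted state. The number of iterations needed for a random start to mix is a separate question that this sketch does not address.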

In the remaining part of this section, we present experiments conducted to verify our theoretical results. Firstly, we considered several combinations of

(a, b, k)

and obtained the estimator

(\hat{a}, \hat{b})

by Theorem 1. Using the empirical mean squared error (MSE)

\frac{1}{m} \sum_{i = 1}^{m} {(\hat{a} - a)}^{2} + {(\hat{b} - b)}^{2}

as the criterion and choosing

m = 1000

, the result is shown in Figure 2a. As we can see, as n increases, the MSE decreases polynomially fast. Therefore, the convergence of

\hat{a} \to a

and

\hat{b} \to b

was verified.

Secondly, using Metropolis sampling, we conducted a moderate simulation to verify Theorem 2 for the case

γ > b

. We chose

n = 9000, k = 2

, and the empirical accuracy was computed by

P_{a} = \frac{1}{m_{1} m_{2}} \sum_{i = 1}^{m_{1}} \sum_{j = 1}^{m_{2}} 1 [{\hat{X}}^{*} = \pm X]

. In this formula,

m_{1}

is the number of times the random graph was generated by the SBM, whereas

m_{2}

is the number of times consecutive samples were generated by Algorithm 1 for a given graph. We chose

m_{1} = 2100, m_{2} = 6000

, which are fairly large and can achieve a good approximation of

P_{a} ({\hat{X}}^{*})

by the law of large numbers. The result is shown in Figure 2b.

The vertical red line (

β = β^{*} = 0.198

), computed from (10), represents the phase transition threshold. The point

(0.199, \frac{1}{2})

in the figure can be regarded as the empirical phase transition threshold, whose first coordinate is close to

β^{*}

. The green line

(β, n^{g (β) / 2})

is the theoretical lower bound of accuracy for

β > β^{*}

, and the purple line

(β, n^{- g (β)})

is the theoretical upper bound of accuracy for

β < β^{*}

. It can be expected that as n becomes larger, the empirical accuracy curve (blue line in the figure) will approach the step function, which jumps from 0 to 1 at

β = β^{*}

.

7. Conclusions

In this paper, we presented one convergent estimator (in Theorem 1) to infer the parameters of the SBM and analyzed three label estimators to detect communities of the SBM. We gave the exact recovery error upper bound for all label estimators (in Theorems 2 and 3) and studied their relationships. By introducing the Ising model, our work opens a new path to study the exact recovery problem for the SBM. More theoretical and empirical work will be done in the future, such as convergence analyses on modularity (in Equation (17)), the necessary iteration time (in Algorithm 1) for Metropolis sampling, and so on.

8. Proof of Main Theorems

8.1. Proof of Theorem 1

Lemma 2.

Consider an Erdős–Rényi random graph G with n nodes, in which edges are placed independently with probability p [26]. Suppose

p = \frac{a log n}{n}

, and denote the number of edges by

| E |

and the number of triangles by T. Then

\frac{| E |}{n log n} \to \frac{a}{2}

and

\frac{T}{{log}^{3} n} \to \frac{a^{3}}{6}

in probability.

Proof.

Let

X_{i j}

be the indicator of the edge between nodes i and j, a Bernoulli random variable with parameter p. Then

| E | = \sum_{i < j} X_{i j}

, where the

X_{i j}

are i.i.d. Hence

E [| E |] = \frac{n (n - 1)}{2} p = \frac{(n - 1) log n}{2} a

and

Var [| E |] = \frac{n (n - 1)}{2} p (1 - p) < a \frac{(n - 1) log n}{2}

. Then by Chebyshev’s inequality,

\begin{matrix} P (| \frac{| E |}{n log n} - \frac{a}{2} \frac{n - 1}{n} | > ϵ) & \leq \frac{Var [| E | / (n log n)]}{ϵ^{2}} \\ < \frac{a (n - 1)}{2 n^{2} ϵ^{2} log n} \end{matrix}

For a given

ϵ

, when n is sufficiently large,

\begin{matrix} P (| \frac{| E |}{n log n} - \frac{a}{2} | > ϵ) & \leq P (| \frac{| E |}{n log n} - \frac{a}{2} \frac{n - 1}{n} | > \frac{ϵ}{2}) \\ & < \frac{2 a (n - 1)}{n^{2} ϵ^{2} log n} \end{matrix}

Therefore, by the definition of convergence in probability, we have

\frac{| E |}{n log n} \to \frac{a}{2}

as

n \to \infty

.

Let

X_{i j k}

be the indicator that nodes i, j, and k form a triangle, a Bernoulli random variable with parameter

p^{3}

. Then

T = \sum_{i, j, k} X_{i j k}

. It is easy to compute that

E [T] = (\binom{n}{3}) p^{3}

. Since

X_{i j k}

are not independent, the variance of T needs careful calculation. From [27] we know that:

\begin{matrix} Var [T] & = (\binom{n}{3}) p^{3} + 12 (\binom{n}{4}) p^{5} + 30 (\binom{n}{5}) p^{6} + 20 (\binom{n}{6}) p^{6} \\ - {(\binom{n}{3})}^{2} p^{6} = O ({log}^{3} n) \end{matrix}

Therefore by Chebyshev’s inequality,

\begin{matrix} P (| \frac{T}{{log}^{3} n} - \frac{a^{3}}{6} \frac{(n - 1) (n - 2)}{n^{2}} | > ϵ) & \leq \frac{Var [T / {log}^{3} n]}{ϵ^{2}} \\ = \frac{1}{ϵ^{2}} O (\frac{1}{{log}^{3} n}) \end{matrix}

Hence,

\frac{T}{{log}^{3} n} \to \frac{a^{3}}{6}

. □
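
Lemma 2 can be explored numerically with a small simulation. The sketch below uses only the Python standard library; the helper names `er_graph`, `count_edges`, and `count_triangles` are hypothetical. The edge ratio concentrates quickly, whereas the triangle ratio has variance of order 1 / log^3 n and therefore converges slowly, so only the edge ratio should be expected to sit close to its limit at moderate n.

```python
import math
import random


def er_graph(n, p, rng):
    """Adjacency sets of an Erdős–Rényi graph G(n, p)."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def count_edges(adj):
    """Each edge appears in two adjacency sets."""
    return sum(len(neighbors) for neighbors in adj) // 2


def count_triangles(adj):
    """Count each triangle i < j < t exactly once."""
    total = 0
    for i, neighbors in enumerate(adj):
        for j in neighbors:
            if j > i:
                total += sum(1 for t in neighbors & adj[j] if t > j)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    n, a = 1000, 4.0
    adj = er_graph(n, a * math.log(n) / n, rng)
    print("edge ratio    ", count_edges(adj) / (n * math.log(n)), "limit", a / 2)
    print("triangle ratio", count_triangles(adj) / math.log(n) ** 3, "limit", a ** 3 / 6)
```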

The convergence of

| E |

in the Erdős–Rényi graph extends directly to the SBM, since the edges are placed independently. The argument for T is more delicate, because triangles that share an edge are dependent. The following two lemmas give the variance of the number of inter-community triangles in the SBM.

Lemma 3.

Consider a two-community SBM

(2 n, p, q)

and let T be the number of triangles that have one node in

S_{1}

and an edge in

S_{2}

. Then the variance of T is:

\begin{matrix} Var [T] & = \frac{n^{2} (n - 1)}{2} q^{2} p + n^{2} (n - 1) (n - 2) p^{2} q^{3} \\ + \frac{n^{2} {(n - 1)}^{2}}{2} q^{4} p - \frac{n^{2} (n - 1) (3 n - 4)}{2} q^{4} p^{2} \end{matrix}

(18)

Lemma 4.

Consider a three-community SBM

(3 n, p, q)

and let T be the number of triangles that have one node in

S_{1}

, one node in

S_{2}

, and one node in

S_{3}

. Then the variance of T is:

Var [T] = n^{3} q^{3} + 3 n^{3} (n - 1) q^{4} + 3 n^{3} {(n - 1)}^{2} q^{5} - n^{3} (3 n^{2} - 3 n + 1) q^{6}

The proofs of the above two lemmas use counting techniques similar to those in [27], and we omit them here.

Lemma 5.

Consider an SBM

(n, k, p, q)

with

p = \frac{a log n}{n}, q = \frac{b log n}{n}

, and let T be the number of triangles. Then

\frac{T}{{(log n)}^{3}}

converges to

\frac{1}{k^{2}} (\frac{a^{3}}{6} + \frac{k - 1}{2} a b^{2} + (k - 1) (k - 2) \frac{b^{3}}{6})

in probability as

n \to \infty

.

Proof.

We split T into three parts: the first is the number of triangles within community i,

T_{i}

. There are k terms of

T_{i}

. The second is the number of triangles that have one node in community i and one edge in community j,

T_{i j}

. There are

k (k - 1)

terms of

T_{i j}

. The third is the number of triangles that have one node in each of three distinct communities i, j, and k,

T_{i j k}

.

We only need to show that:

\begin{matrix} \frac{T_{i}}{{log}^{3} n} & \to \frac{{(a / k)}^{3}}{6} \end{matrix}

(19)

\begin{matrix} \frac{T_{i j}}{{log}^{3} n} & \to \frac{1}{2} (a / k) {(b / k)}^{2} \end{matrix}

(20)

\begin{matrix} \frac{T_{i j k}}{{log}^{3} n} & \to {(b / k)}^{3} \end{matrix}

(21)

The convergence of

\frac{T_{i}}{{log}^{3} n}

comes from Lemma 2. For

T_{i j}

we use the conclusion from Lemma 3. We replace n with

n / k

,

p = a \frac{log n}{n}

, and

q = b \frac{log n}{n}

in Equation (18), which gives

Var [T_{i j}] \sim \frac{a b^{2}}{2 k^{3}} {log}^{3} n

. Since the expectation of

\frac{T_{i j}}{{log}^{3} n}

is

(n / k) (\binom{n / k}{2}) p q^{2} / ({log}^{3} n) = \frac{n - k}{2 n} \frac{a b^{2}}{k^{3}}

, by Chebyshev’s inequality we can show that:

\begin{matrix} P (| \frac{T_{i j}}{{log}^{3} n} - \frac{n - k}{2 n} \frac{a b^{2}}{k^{3}} | > ϵ) & \leq \frac{Var [T_{i j} / {log}^{3} n]}{ϵ^{2}} \\ & = \frac{1}{ϵ^{2}} O (\frac{1}{{log}^{3} n}) \end{matrix}

Therefore,

\frac{T_{i j}}{{log}^{3} n}

converges to

\frac{1}{2} (a / k) {(b / k)}^{2}

.

To prove

\frac{T_{i j k}}{{log}^{3} n} \to {(b / k)}^{3}

, note that Lemma 4 gives

Var [T_{i j k}] = O ({log}^{5} n)

. Since the expectation of the ratio is exactly

{(b / k)}^{3}

, Chebyshev’s inequality yields:

P (| \frac{T_{i j k}}{{log}^{3} n} - \frac{b^{3}}{k^{3}} | > ϵ) \leq \frac{Var [T_{i j k} / {log}^{3} n]}{ϵ^{2}} = \frac{1}{ϵ^{2}} O (\frac{1}{log n})

□
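
As a quick arithmetic check of how the three triangle types in the proof combine into the limit stated in Lemma 5, the sketch below sums the per-type limits (19) to (21) with their multiplicities and compares the result with the closed form. The function names are hypothetical, and the multiplicity of the third type is taken to be the number of unordered triples of distinct communities, which is an assumption consistent with the closed form.

```python
from math import comb


def triangle_limit_by_parts(a, b, k):
    """Sum of the three per-type limits, weighted by the number of terms."""
    within = k * (a / k) ** 3 / 6                           # k terms of T_i
    node_edge = k * (k - 1) * 0.5 * (a / k) * (b / k) ** 2  # k(k-1) terms of T_ij
    three_way = comb(k, 3) * (b / k) ** 3                   # unordered triples of communities
    return within + node_edge + three_way


def triangle_limit_closed_form(a, b, k):
    """Limit of T / log^3 n as stated in Lemma 5."""
    return (a ** 3 / 6 + (k - 1) / 2 * a * b ** 2
            + (k - 1) * (k - 2) * b ** 3 / 6) / k ** 2
```

For example, with a = 16, b = 4, and k = 2 both functions evaluate to (4096 / 6 + 128) / 4.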

Proof of Theorem 1.

Let

e_{1}^{*} = \frac{a + (k - 1) b}{2 k}

,

k^{2} e_{2}^{*} = \frac{a^{3}}{6} + \frac{k - 1}{2} a b^{2} + (k - 1) (k - 2) \frac{b^{3}}{6}

and

e_{1} = \frac{T_{1}}{n log n}, e_{2} = \frac{T_{2}}{{log}^{3} n}

. From Lemma 2,

e_{1} \to e_{1}^{*}

. From Lemma 5,

e_{2} \to e_{2}^{*}

as

n \to \infty

. Using

x = 2 k e_{1} - (k - 1) y

, we can get:

g (y) : = (k - 1) (y^{3} - 6 e_{1} y^{2} + 12 e_{1}^{2} y) + 6 e_{2} - 8 k e_{1}^{3} = 0

(22)

This equation has a unique real root since

g (y)

is increasing on

R

:

g^{'} (y) = 3 (k - 1) {(y - 2 e_{1})}^{2} \geq 0

. Next we show that the root lies within

(0, 2 e_{1})

.

\begin{matrix} lim_{n \to \infty} g (0) & = 6 e_{2}^{*} - 8 k {(e_{1}^{*})}^{3} \\ & = - \frac{3}{k^{2}} (k - 1) (k - 2) a b^{2} - \frac{3 (k - 1)}{k^{2}} a^{2} b - \frac{k - 1}{k^{2}} ({(k - 1)}^{2} - (k - 2)) b^{3} < 0 \\ lim_{n \to \infty} g (2 e_{1}) & = 6 e_{2}^{*} - 8 {(e_{1}^{*})}^{3} = \frac{(k - 1) {(a - b)}^{3}}{k^{3}} > 0 \end{matrix}

Therefore, we can get a unique solution y within

(0, 2 e_{1})

. Since

(a, b)

is a solution for the equation array, the conclusion follows.

By taking expectations on both sides of Equations (4) and (5) we can show

E [\hat{a}] = a, E [\hat{b}] = b

. By the continuity of

g (y)

,

\hat{b} \to b

and

\hat{a} \to a

follows similarly. □
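
The proof suggests a direct numerical recipe: since g in Equation (22) is increasing with a sign change on (0, 2 e_1), its root can be located by bisection, and the other parameter follows from x = 2 k e_1 - (k - 1) y. The sketch below is one possible implementation under these assumptions; the function name `estimate_a_b` is hypothetical, and bisection is our choice of root finder rather than a method stated in the paper.

```python
def estimate_a_b(e1, e2, k, iterations=200):
    """Solve Equation (22) for y (the estimate of b) by bisection on (0, 2 e1),
    then recover x (the estimate of a). Requires k >= 2."""

    def g(y):
        return ((k - 1) * (y ** 3 - 6 * e1 * y ** 2 + 12 * e1 ** 2 * y)
                + 6 * e2 - 8 * k * e1 ** 3)

    lo, hi = 0.0, 2.0 * e1
    if not (g(lo) < 0.0 < g(hi)):
        # With noisy statistics the sign change is not guaranteed.
        raise ValueError("no sign change of g on (0, 2 e1)")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    b_hat = (lo + hi) / 2.0
    a_hat = 2 * k * e1 - (k - 1) * b_hat
    return a_hat, b_hat
```

Feeding in the limiting values e_1^* and e_2^* for a known pair (a, b) should return that pair up to floating-point error, which is a convenient sanity check before applying the function to empirical edge and triangle counts.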

8.2. Proof of Theorem 2

Proof of Lemma 1.

First we rewrite the energy term in (7) as:

H (\bar{σ}) = γ \frac{log n}{n} \sum_{i < j} δ ({\bar{σ}}_{i}, {\bar{σ}}_{j}) - (1 + γ \frac{log n}{n}) \sum_{{i, j} \in E (G)} δ ({\bar{σ}}_{i}, {\bar{σ}}_{j})

Then the energy difference is:

\begin{matrix} H ({\bar{σ}}^{'}) - H (\bar{σ}) & = (1 + γ \frac{log n}{n}) \sum_{i \in N_{r} (G)} (δ ({\bar{σ}}_{r}, {\bar{σ}}_{i}) - δ (ω^{s} \cdot {\bar{σ}}_{r}, {\bar{σ}}_{i})) \\ & + γ \frac{log n}{n} \sum_{i \neq r} (δ (ω^{s} \cdot {\bar{σ}}_{r}, {\bar{σ}}_{i}) - δ ({\bar{σ}}_{r}, {\bar{σ}}_{i})) \\ & = (1 + γ \frac{log n}{n}) \sum_{i \in N_{r} (G)} J_{s} ({\bar{σ}}_{r}, {\bar{σ}}_{i}) \\ & + γ \frac{log n}{n} (\sum_{i = 1}^{n} (δ (ω^{s} \cdot {\bar{σ}}_{r}, {\bar{σ}}_{i}) - δ ({\bar{σ}}_{r}, {\bar{σ}}_{i})) + 1) \\ & = (1 + γ \frac{log n}{n}) \sum_{i \in N_{r} (G)} J_{s} ({\bar{σ}}_{r}, {\bar{σ}}_{i}) \\ & + γ \frac{log n}{n} (m (ω^{s} \cdot {\bar{σ}}_{r}) - m ({\bar{σ}}_{r}) + 1) \end{matrix}

□
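
The identity of Lemma 1 can be checked mechanically by comparing it with a direct evaluation of the energy before and after a single-label change. The sketch below assumes integer labels 0, ..., k - 1, models ω^s as a shift by s modulo k, and takes m(c) to be the number of nodes carrying label c before the change; the function names are hypothetical.

```python
import math


def energy(sigma, edges, gamma):
    """H(sigma) in the rewritten form used in the proof of Lemma 1."""
    n = len(sigma)
    c = gamma * math.log(n) / n
    same_pairs = sum(1 for i in range(n) for j in range(i + 1, n)
                     if sigma[i] == sigma[j])
    same_edges = sum(1 for i, j in edges if sigma[i] == sigma[j])
    return c * same_pairs - (1 + c) * same_edges


def energy_difference(sigma, edges, gamma, r, s, k):
    """H(sigma') - H(sigma) when node r changes label by a shift of s."""
    n = len(sigma)
    c = gamma * math.log(n) / n
    old = sigma[r]
    new = (old + s) % k
    neighbors = [j for i, j in edges if i == r] + [i for i, j in edges if j == r]
    # J_s term: neighbors sharing the old label minus neighbors sharing the new one.
    j_sum = sum((sigma[i] == old) - (sigma[i] == new) for i in neighbors)
    return (1 + c) * j_sum + c * (sigma.count(new) - sigma.count(old) + 1)
```

Because only the neighbors of r and two label counts enter the formula, a sampler can evaluate a proposed move without recomputing the full energy.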

Before diving into the technical proof of Theorem 2, we need to introduce some extra notation. When

{\bar{σ}}^{'}

differs from X only at position r, taking

\bar{σ} = X

in Lemma 1, we have:

H ({\bar{σ}}^{'}) - H (\bar{σ}) = (1 + γ \frac{log n}{n}) (A_{r}^{0} - A_{r}^{s}) + γ \frac{log n}{n}

(23)

where

A_{r}^{s}

is defined as

A_{r}^{s} = | {j \in [n] \ {r} : {j, r} \in E (G), X_{j} = ω^{s} \cdot X_{r}} |

. Since the existence of each edge in G is independent,

A_{r}^{s} \sim Binom (\frac{n}{k}, \frac{b log n}{n})

for

s \neq 0

and

A_{r}^{0} \sim Binom (\frac{n}{k} - 1, \frac{a log n}{n})

.

For the general case, we can write:

H (\bar{σ}) - H (X) = (1 + γ \frac{log n}{n}) [A_{\bar{σ}} - B_{\bar{σ}}] + γ \frac{log n}{n} N_{\bar{σ}}

(24)

in which we use

A_{\bar{σ}}

or

B_{\bar{σ}}

to represent the binomial random variable with parameter

\frac{a log n}{n}

or

\frac{b log n}{n}

, respectively, and

N_{\bar{σ}}

is a deterministic positive number depending on

\bar{σ}

but independent of the graph structure. The following lemma gives expressions for

A_{\bar{σ}}, B_{\bar{σ}}

and

N_{\bar{σ}}

:

Lemma 6.

For SSBM

(n, k, p, q)

, we assume

\bar{σ}

differs from the ground truth label vector X in

| I | : = dist (\bar{σ}, X)

coordinates. Let

I_{i j} = | {r \in [n] | X_{r} = w^{i}, {\bar{σ}}_{r} = w^{j}} |

for

i \neq j

and

I_{i i} = 0

. We further denote the row sum as

I_{i} = \sum_{j = 0}^{k - 1} I_{i j}

and the column sum as

I_{i}^{'} = \sum_{j = 0}^{k - 1} I_{j i}

. Then:

\begin{matrix} N_{\bar{σ}} & = \frac{1}{2} \sum_{i = 0}^{k - 1} {(I_{i} - I_{i}^{'})}^{2} \end{matrix}

(25)

\begin{matrix} B_{\bar{σ}} & \sim Binom (\frac{n}{k} | I | + \frac{1}{2} \sum_{i = 0}^{k - 1} (- 2 I_{i}^{'} I_{i} + I_{i}^{' 2} - \sum_{j = 0}^{k - 1} I_{j i}^{2}), q) \end{matrix}

(26)

\begin{matrix} A_{\bar{σ}} & \sim Binom (\frac{n}{k} | I | - \frac{1}{2} \sum_{i = 0}^{k - 1} (I_{i}^{2} + \sum_{j = 0}^{k - 1} I_{i j}^{2}), p) \end{matrix}

(27)
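
The counts in Lemma 6 can be checked by brute force on small instances. The sketch below rests on an interpretation that is not spelled out in this section: the number of trials of A is taken to be the number of node pairs in the same community under X but with different labels under the candidate vector, the number of trials of B is the reverse, and N is their difference. The function names are hypothetical, and X is assumed to be balanced with integer labels.

```python
from itertools import combinations


def lemma6_formulas(X, sigma, k):
    """Evaluate (25), and the trial counts in (27) and (26), from the matrix I_ij."""
    size = len(X) // k
    I = [[0] * k for _ in range(k)]
    for x, s in zip(X, sigma):
        if x != s:
            I[x][s] += 1
    row = [sum(I[i]) for i in range(k)]                       # I_i
    col = [sum(I[j][i] for j in range(k)) for i in range(k)]  # I_i'
    dist = sum(row)                                           # |I|
    n_sigma = sum((row[i] - col[i]) ** 2 for i in range(k)) // 2
    a_trials = size * dist - sum(
        row[i] ** 2 + sum(v * v for v in I[i]) for i in range(k)) // 2
    b_trials = size * dist + sum(
        -2 * col[i] * row[i] + col[i] ** 2 - sum(I[j][i] ** 2 for j in range(k))
        for i in range(k)) // 2
    return n_sigma, a_trials, b_trials


def brute_force_counts(X, sigma):
    """Count the same quantities by enumerating all node pairs."""
    a_trials = b_trials = 0
    for u, v in combinations(range(len(X)), 2):
        same_x = X[u] == X[v]
        same_s = sigma[u] == sigma[v]
        if same_x and not same_s:
            a_trials += 1
        elif same_s and not same_x:
            b_trials += 1
    return b_trials - a_trials, a_trials, b_trials
```

When exactly one coordinate differs, the formulas return n/k - 1 and n/k trials, matching the single-flip distributions stated before Equation (24).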

The proof of Lemma 6 consists mainly of careful counting, and we omit it here. When

| I |

is small compared to n, we have the following lemma, which is an extension of Proposition 6 in [12].

Lemma 7.

For

t \in [\frac{1}{k} (b - a), 0]

and

| I | \leq n / \sqrt{log n}

\begin{matrix} P_{G} (B_{\bar{σ}} - A_{\bar{σ}} \geq t | I | log n) \\ \leq & exp (| I | log n (f_{β} (t) - β t - 1 + O (\frac{1}{\sqrt{log n}}))) \end{matrix}

(28)

where

f_{β} (t) = {min}_{s \geq 0} (g (s) - s t) + β t \leq \tilde{g} (β)

.

Corresponding to the three cases of Theorem 2, we use three non-trivial lemmas to establish the properties of the Ising model.

Lemma 8.

Let

γ > b

. When

dist (\bar{σ}, X) \geq \frac{n}{\sqrt{log n}}

and

D (\bar{σ}, X)

holds, the event

P_{σ | G} (σ = \bar{σ}) > exp (- C n) P_{σ | G} (σ = X)

happens with a probability (with respect to SSBM) less than

exp (- τ (α, β) n \sqrt{log n})

, where C is an arbitrary constant and

τ (α, β)

is a positive number.

Proof.

We denote the event

P_{σ | G} (σ = \bar{σ}) > exp (- C n) P_{σ | G} (σ = X)

as

\tilde{D} (\bar{σ}, C)

. By Equation (24),

\tilde{D} (\bar{σ}, C)

is equivalent to:

(1 + \frac{γ log n}{n}) [B_{\bar{σ}} - A_{\bar{σ}}] > \frac{γ log n}{n} N_{\bar{σ}} - \frac{C}{β} n

(29)

We claim that

\bar{σ}

must satisfy at least one of the following two conditions:

(1) $\exists i \neq j$ s.t. $\frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}} \leq I_{i j} \leq \frac{n}{k} - \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}$
(2) $\exists i \neq j$ s.t. $I_{i j} > \frac{n}{k} - \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}$ and $I_{j i} < \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}$

If neither of the above two conditions holds, then since condition 1 fails we have

I_{i j} < \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}

or

I_{i j} > \frac{n}{k} - \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}

for any

0 \leq i, j \leq k - 1

. Since

\sum_{i, j} I_{i j} = | I | \geq \frac{n}{\sqrt{log n}}

, there exists

i, j

such that

I_{i j} > \frac{n}{k} - \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}

. Suppose, for contradiction, that we also have

I_{j i} > \frac{n}{k} - \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}

. Let

X^{'}

be the vector that exchanges the value of

w^{i}

with

w^{j}

in X. We consider:

\begin{matrix} dist (\bar{σ}, X^{'}) & - dist (\bar{σ}, X) = | {r \in [n] | X_{r} = w^{i}, {\bar{σ}}_{r} \neq w^{j}} | \\ + | {r \in [n] | X_{r} = w^{j}, {\bar{σ}}_{r} \neq w^{i}} | \\ - | {r \in [n] | X_{r} = w^{i}, {\bar{σ}}_{r} \neq w^{i}} | \\ - | {r \in [n] | X_{r} = w^{j}, {\bar{σ}}_{r} \neq w^{j}} | \\ = \frac{n}{k} - I_{i j} + \frac{n}{k} - I_{j i} - I_{i} - I_{j} \\ < \frac{2}{k (k - 1)} \frac{n}{\sqrt{log n}} - I_{i} - I_{j} < 0 \end{matrix}

(30)

which contradicts the fact that

\bar{σ}

is nearest to X. Therefore, we should have

I_{j i} < \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}

. Now the

(i, j)

pair satisfies condition 2, which contradicts the assumption that

\bar{σ}

satisfies neither of the two conditions.

Under condition 1, we can get a lower bound on

| A_{\bar{σ}} |

from Equation (27). Let

I_{i j}^{'} = I_{i j}

for

i \neq j

and

I_{i i}^{'} = \frac{n}{k} - I_{i}

. Then we can simplify

| A_{\bar{σ}} |

as:

\begin{matrix} | A_{\bar{σ}} | & = \frac{n}{k} | I | - \frac{1}{2} \sum_{i = 0}^{k - 1} (I_{i}^{2} + \sum_{j = 0}^{k - 1} I_{i j}^{2}) \\ = \frac{n^{2}}{2 k} - \frac{1}{2} \sum_{i = 0}^{k - 1} \sum_{j = 0}^{k - 1} I_{i j}^{' 2} \end{matrix}

We further have

\sum_{i = 0}^{k - 1} \sum_{j = 0}^{k - 1} {I^{'}}_{i j}^{2} \leq (k - 1) \frac{n^{2}}{k^{2}} + {(\frac{n}{k} - I_{i j})}^{2} + I_{i j}^{2}

where

I_{i j}

satisfies condition 1. Therefore,

\sum_{i = 0}^{k - 1} \sum_{j = 0}^{k - 1} {I^{'}}_{i j}^{2} \leq (k - 1) \frac{n^{2}}{k^{2}} + {(\frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}})}^{2} + {(\frac{n}{k} - \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}})}^{2} = \frac{n^{2}}{k} - \frac{2 n^{2}}{k^{2} (k - 1) \sqrt{log n}} (1 + o (1))

. As a result,

A_{\bar{σ}} \geq \frac{n^{2}}{k^{2} (k - 1) \sqrt{log n}} (1 + o (1))

(31)

Under condition 2, we can get a lower bound on

N_{\bar{σ}}

. Since

dist (\bar{σ}, X^{'}) - dist (\bar{σ}, X) \geq 0

, from (30) we have

I_{i j} + I_{j i} + I_{i} + I_{j} \leq \frac{2 n}{k}

. Since

I_{i} \geq I_{i j} > \frac{n}{k} - \frac{1}{k (k - 1)} \frac{n}{\sqrt{log n}}

, we have

I_{j} \leq \frac{2}{k (k - 1)} \frac{n}{\sqrt{log n}}

. Now consider

I_{j}^{'} - I_{j} \geq \frac{n}{k} - \frac{3}{k (k - 1)} \frac{n}{\sqrt{log n}}

. From (25):

N_{\bar{σ}} \geq \frac{1}{2} {(\frac{n}{k} - \frac{3}{k (k - 1)} \frac{n}{\sqrt{log n}})}^{2} = \frac{n^{2}}{2 k^{2}} (1 + o (1))

.

Now we use the Chernoff inequality to bound Equation (29); we can omit

\frac{γ log n}{n}

on the left-hand side since it is far smaller than 1. Let

Z \sim Bernoulli (\frac{a log n}{n}), Z^{'} \sim Bernoulli (\frac{b log n}{n})

, then:

\begin{matrix} P_{G} (\tilde{D} (\bar{σ}, C)) & \leq {(E [exp (s Z^{'})])}^{| B_{\bar{σ}} |} {(E [exp (- s Z)])}^{| A_{\bar{σ}} |} \cdot exp (- s (\frac{γ log n}{n} N_{\bar{σ}} - \frac{C}{β} n)) \\ & \leq exp (| B_{\bar{σ}} | \frac{b log n}{n} (e^{s} - 1) + | A_{\bar{σ}} | \frac{a log n}{n} (e^{- s} - 1) - s (\frac{γ log n}{n} N_{\bar{σ}} - \frac{C}{β} n)) \end{matrix}

Using

| B_{\bar{σ}} | = N_{\bar{σ}} + | A_{\bar{σ}} |

we can further simplify the exponential term as:

\frac{log n}{n} [| A_{\bar{σ}} | (b (e^{s} - 1) + a (e^{- s} - 1)) + N_{\bar{σ}} (b (e^{s} - 1) - γ s)] + s \frac{C}{β} n

Now we investigate the function

g_{1} (s) = b (e^{s} - 1) + a (e^{- s} - 1)

and

g_{2} (s) = b (e^{s} - 1) - γ s

. Both functions take zero values at

s = 0

and

g_{1}^{'} (s) = (b e^{s} - a e^{- s}), g_{2}^{'} (s) = b e^{s} - γ

. Therefore,

g_{1}^{'} (0) = b - a < 0, g_{2}^{'} (0) = b - γ < 0

and we can choose

s^{*} > 0

such that

g_{1} (s^{*}) < 0, g_{2} (s^{*}) < 0

. To compensate for the influence of the term

s C n / β

we only need to make sure that the order of

\frac{log n}{n} min {| A_{\bar{σ}} |, N_{\bar{σ}}}

is larger than n. This requirement is satisfied since either

| A_{\bar{σ}} | \geq \frac{n^{2}}{k^{2} (k - 1) \sqrt{log n}} (1 + o (1))

or

N_{\bar{σ}} \geq \frac{n^{2}}{2 k^{2}} (1 + o (1))

. □
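
The choice of s^* in the proof can be made explicit. Both g_1 and g_2 are convex, vanish at 0, and have negative derivative at 0, so any positive s not exceeding the smaller of the two stationary points, (1/2) log (a / b) for g_1 and log (γ / b) for g_2, makes both values negative. The sketch below picks that minimum; this particular choice and the function name `chernoff_tilt` are ours, not the paper's.

```python
import math


def chernoff_tilt(a, b, gamma):
    """Return (s_star, g1(s_star), g2(s_star)) with both g-values negative,
    assuming a > b > 0 and gamma > b."""
    if not (a > b > 0 and gamma > b):
        raise ValueError("requires a > b > 0 and gamma > b")
    # g1 decreases on (0, 0.5 * log(a / b)); g2 decreases on (0, log(gamma / b)).
    s_star = min(0.5 * math.log(a / b), math.log(gamma / b))
    g1 = b * (math.exp(s_star) - 1) + a * (math.exp(-s_star) - 1)
    g2 = b * (math.exp(s_star) - 1) - gamma * s_star
    return s_star, g1, g2
```

When the minimum is attained by the first stationary point, g_1 (s^*) equals minus the square of the difference of the square roots of a and b.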

Lemma 9.

Suppose

γ > b

and

β > β^{*}

. For

1 \leq r \leq \frac{n}{\sqrt{log n}}

and

\forall ϵ > 0

, there is a set

G_{n}^{(r)}

such that:

P_{G} (G_{n}^{(r)}) \geq 1 - n^{r (\tilde{g} (β) / 2 + ϵ)}

(32)

and for every

G \in G_{n}^{(r)}

,

\frac{P_{σ | G} (dist (σ, X) = r | D (σ, X))}{P_{σ | G} (σ = X | D (σ, X))} < n^{r \tilde{g} (β) / 2}

(33)

For

r > \frac{n}{\sqrt{log n}}

, there is a set

G_{n}^{(r)}

such that:

P (G \in G_{n}^{(r)}) \geq 1 - e^{- n}

(34)

and for every

G \in G_{n}^{(r)}

,

\frac{P_{σ | G} (dist (σ, X) = r | D (σ, X))}{P_{σ | G} (σ = X | D (σ, X))} < e^{- n}

(35)

Proof.

We distinguish two cases:

r \leq \frac{n}{\sqrt{log n}}

and

r > \frac{n}{\sqrt{log n}}

.

When

r \leq \frac{n}{\sqrt{log n}}

, we can show that

dist (σ, X) = r

implies

D (σ, X)

by using the triangle inequality of dist. For

f \in S_{k} \ {id}

, where

id

is the identity mapping, we have:

\frac{2 n}{k} \leq dist (f (X), X) \leq dist (σ, f (X)) + dist (σ, X)

Therefore,

dist (σ, f (X)) \geq \frac{2 n}{k} - \frac{n}{\sqrt{log n}} \geq dist (σ, X)

and Equation (33) is equivalent to:

\frac{P_{σ | G} (dist (σ, X) = r)}{P_{σ | G} (σ = X)} < n^{r \tilde{g} (β) / 2}

(36)

The left-hand side can be written as:

\begin{matrix} \frac{P_{σ | G} (dist (σ, X) = r)}{P_{σ | G} (σ = X)} & = \sum_{dist (\bar{σ}, X) = r} exp (- β (H (\bar{σ}) - H (X))) \\ by (24) & \leq \sum_{dist (\bar{σ}, X) = r} exp (β_{n} (B_{\bar{σ}} - A_{\bar{σ}})) \end{matrix}

where

β_{n} = β (1 + γ \frac{log n}{n})

.

Define

Ξ_{n} (r) : = \sum_{dist (\bar{σ}, X) = r} exp (β_{n} (B_{\bar{σ}} - A_{\bar{σ}}))

and we only need to show that:

P_{G} (Ξ_{n} (r) \geq n^{r \tilde{g} (β) / 2}) \leq n^{r (\tilde{g} (β) / 2 + ϵ)}

(37)

Define the event

Λ_{n} (G, r) : = {B_{\bar{σ}} - A_{\bar{σ}} < 0, \forall \bar{σ} s . t . dist (\bar{σ}, X) = r}

, and we proceed as follows:

\begin{matrix} P_{G} (Ξ_{n} (r) \geq n^{r \tilde{g} (β) / 2}) & \leq P_{G} (Λ_{n} {(G, r)}^{c}) \\ + P_{G} (Ξ_{n} (r) \geq n^{r \tilde{g} (β) / 2} | Λ_{n} (G, r)) \end{matrix}

For the first term, since

| {\bar{σ} | dist (\bar{σ}, X) = r} | = (\binom{n}{r}) {(k - 1)}^{r} \leq {(k - 1)}^{r} n^{r}

, by Lemma 7,

P_{G} (Λ_{n} {(G, r)}^{c}) \leq {(k - 1)}^{r} n^{r g (\bar{β})} \leq n^{r (\tilde{g} (β) / 2 + ϵ / 2)}

. For the second term, we use Markov’s inequality:

\begin{matrix} P_{G} (Ξ_{n} (r) \geq n^{r \tilde{g} (β) / 2} | Λ_{n} (G, r)) \leq E [Ξ_{n} (r) | Λ_{n} (G, r)] n^{- r \tilde{g} (β) / 2} \end{matrix}

The conditional expectation can be estimated as follows:

\begin{matrix} E [Ξ_{n} (r) | Λ_{n} (G, r)] = \sum_{dist (\bar{σ}, X) = r} \sum_{t r log n = - \infty}^{- 1} \\ P_{G} (B_{\bar{σ}} - A_{\bar{σ}} = t r log n) exp (β_{n} t r log n) \\ \leq {(k - 1)}^{r} n^{r + r β_{n} (b - a) / k} + \sum_{dist (\bar{σ}, X) = r} \sum_{t r log n = r \frac{b - a}{k} log n}^{- 1} \\ P_{G} (B_{\bar{σ}} - A_{\bar{σ}} = t r log n) exp (β_{n} t r log n) \end{matrix}

r + r β_{n} (b - a) / k = f_{β_{n}} (\frac{b - a}{k}) < \tilde{g} (β_{n})

, therefore,

{(k - 1)}^{r} n^{r + r β_{n} (b - a) / k} n^{- r \tilde{g} (β) / 2} \leq

n^{r (\tilde{g} (β) / 2 + ϵ / 2)}

. Using Lemma 7, we have:

\begin{matrix} P_{G} (B_{\bar{σ}} - A_{\bar{σ}} = t r log n) exp (β_{n} r t log n) \leq n^{r (f_{β_{n}} (t) - 1 + O (\frac{1}{\sqrt{log n}}))} \end{matrix}

Since

β_{n} \to β

,

\forall ϵ

, when n is sufficiently large we have

\tilde{g} (β_{n}) \leq \tilde{g} (β) + ϵ / 2

. Therefore,

\begin{matrix} \sum_{dist (\bar{σ}, X) = r} \sum_{\begin{matrix} t r log n = \\ r (b - a) / k log n \end{matrix}}^{- 1} P_{G} (B_{\bar{σ}} - A_{\bar{σ}} = t log n) exp (β_{n} r t log n) \\ \leq n^{r (\tilde{g} (β_{n}) - \tilde{g} (β) / 2)} \\ \leq n^{r (\tilde{g} (β) / 2 + ϵ / 2)} O (log n) {(k - 1)}^{r} \end{matrix}

Combining the above equations, we have:

\begin{matrix} P_{G} (Ξ_{n} (r) \geq n^{r \tilde{g} (β) / 2}) & \leq n^{r (\tilde{g} (β) / 2 + ϵ / 2)} O (log n) {(k - 1)}^{r} \\ \leq n^{r (\tilde{g} (β) / 2 + ϵ)} \end{matrix}

When

r > \frac{n}{\sqrt{log n}}

, using Lemma 8, we can choose a sufficiently large constant

C > 1

such that

k^{n} exp (- C n) < e^{- n}

:

\begin{matrix} \frac{P_{σ | G} (dist (σ, X) = r | D (σ, X))}{P_{σ | G} (σ = X | D (σ, X))} & = \sum_{\begin{matrix} D (\bar{σ}, X) \\ dist (\bar{σ}, X) = r \end{matrix}} \frac{P_{σ | G} (σ = \bar{σ})}{P_{σ | G} (σ = X)} \\ & > exp (- n) \end{matrix}

happens with probability less than

e^{- n}

. Therefore, Equation (35) holds. □

If

γ > b

and

β < β^{*}

, we have the following lemma:

Lemma 10.

If

γ > b

and

β < β^{*}

, there is a set

G_{n}^{(1)}

such that

P_{G} (G_{n}^{(1)}) \geq 1 - n^{g (\bar{β})}

and:

\begin{matrix} E [\sum_{r = 1}^{n} exp (β_{n} (A_{r}^{s} - A_{r}^{0})) | G \in G_{n}^{(1)}] & = (1 + o (1)) n^{g (β_{n})} \end{matrix}

(38)

\begin{matrix} Var [\sum_{r = 1}^{n} exp (β_{n} (A_{r}^{s} - A_{r}^{0})) | G \in G_{n}^{(1)}] & \leq (1 + o (1)) n^{g (2 β_{n})} \end{matrix}

(39)

Lemma 10 is an extension of Proposition 10 in [12] and can be proved using almost the same analysis. Thus we omit the proof of Lemma 10 here.

Lemma 11.

If

γ > b

and

β < β^{*}

, there is a set

G_{n}^{'}

such that:

P_{G} (G_{n}^{'}) \geq 1 - (1 + o (1)) max {n^{g (\bar{β})}, n^{\tilde{g} (2 β_{n}) - 2 g (β_{n}) + ϵ}}

(40)

and for every

G \in G_{n}^{'}

,

\frac{P_{σ | G} (dist (σ, X) = 1 | D (σ, X))}{P_{σ | G} (σ = X | D (σ, X))} \geq (1 + o (1)) n^{g (β_{n})}

(41)

Proof.

The left-hand side of Equation (41) can be rewritten as:

\frac{P_{σ | G} (dist (σ, X) = 1)}{P_{σ | G} (σ = X)} = (1 + o (1)) \sum_{s = 1}^{k - 1} \sum_{r = 1}^{n} exp (β_{n} (A_{r}^{s} - A_{r}^{0}))

(42)

Let

G_{n}^{(1)}

be as defined in Lemma 10 and

G_{n}^{(2)} : = {| \sum_{r = 1}^{n} exp (β_{n} (A_{r}^{s} - A_{r}^{0})) - (1 + o (1)) n^{g (β_{n})} | \leq n^{g (β_{n}) - ϵ / 2}}

.

Using Chebyshev’s inequality, we have:

P_{G} (G \notin G_{n}^{(2)} | G \in G_{n}^{(1)}) \leq n^{\tilde{g} (2 β_{n}) - 2 g (β_{n}) + ϵ}

Let

G_{n}^{'} = G_{n}^{(1)} \cap G_{n}^{(2)}

:

\begin{matrix} P_{G} (G \in G_{n}^{'}) & = P_{G} (G_{n}^{(1)}) P_{G} (G \in G_{n}^{(2)} | G \in G_{n}^{(1)}) \\ \geq (1 - n^{\tilde{g} (2 β_{n}) - 2 g (β_{n}) + ϵ}) (1 - n^{g (\bar{β})}) \\ = 1 - (1 + o (1)) max {n^{g (\bar{β})}, n^{\tilde{g} (2 β_{n}) - 2 g (β_{n}) + ϵ}} \end{matrix}

and for every

G \in G_{n}^{'}

,

\sum_{r = 1}^{n} exp (β_{n} (A_{r}^{s} - A_{r}^{0})) = (1 + o (1)) n^{g (β_{n})}

Therefore, from Equation (42) we have:

\frac{P_{σ | G} (dist (σ, X) = 1)}{P_{σ | G} (σ = X)} \geq (1 + o (1)) n^{g (β_{n})}

□

Let

Λ : = {ω^{j} \cdot 1_{n} | j = 0, \dots, k - 1}

where

1_{n}

is the all-ones vector with dimension n, and we have the following lemma:

Lemma 12.

Suppose

γ < b

and

\bar{σ}

satisfies

dist (\bar{σ}, 1_{n}) \geq \frac{n}{\sqrt{log n}}

and

D (\bar{σ}, 1_{n})

holds. Then the event

P_{σ | G} (σ = \bar{σ}) > exp (- C n) P_{σ | G} (σ = 1_{n})

happens with a probability (with respect to SSBM) less than

exp (- τ (α, β) n \sqrt{log n})

where C is an arbitrary constant and

τ (α, β)

is a positive number.

Proof.

Let

n_{r} = | {i \in [n] | {\bar{σ}}_{i} = w^{r}} |

. Then

n_{0} \geq n_{r}

for

r = 1, \dots, k - 1

since

arg {min}_{σ^{'} \in Λ} dist (\bar{σ}, σ^{'}) = 1_{n}

. Without loss of generality, we suppose

n_{0} \geq n_{1} \dots \geq n_{k - 1}

. Define

N_{\bar{σ}} = \frac{1}{2} (n (n - 1) - \sum_{r = 0}^{k - 1} n_{r} (n_{r} - 1)) = \frac{1}{2} (n^{2} - \sum_{r = 0}^{k - 1} n_{r}^{2})

. Denote the event

P_{σ | G} (σ = \bar{σ}) > exp (- C n) P_{σ | G} (σ = 1_{n})

as

D^{'} (\bar{σ}, C)

, which can be transformed as:

\begin{matrix} (1 + \frac{γ log n}{n}) (\sum_{{\bar{σ}}_{i} \neq {\bar{σ}}_{j}, X_{i} = X_{j}} Z_{i j} + \sum_{{\bar{σ}}_{i} \neq {\bar{σ}}_{j}, X_{i} \neq X_{j}} Z_{i j}) \\ \leq \frac{γ log n}{n} N_{\bar{σ}} + \frac{C}{β} n \end{matrix}

(43)

Firstly we estimate the order of

N_{\bar{σ}}

, and obviously

N_{\bar{σ}} \leq \frac{1}{2} n^{2}

. Using the conclusion in Appendix A of [18], we have:

\sum_{r = 0}^{k - 1} n_{r}^{2} \leq \{\begin{matrix} n n_{0} & n_{0} \leq \frac{n}{2} \\ n^{2} - 2 n_{0} (n - n_{0}) & n_{0} > \frac{n}{2} \end{matrix}

(44)

By assumption of

dist (\bar{σ}, 1_{n}) \geq \frac{n}{\sqrt{log n}}

, we have

n_{0} \leq n - \frac{n}{\sqrt{log n}}

and

n_{0} \geq \frac{n}{k}

follows from

n_{0} \geq n_{r}

. When

n_{0} > \frac{n}{2}

, we have

N_{\bar{σ}} \geq n_{0} (n - n_{0}) \geq \frac{n^{2}}{\sqrt{log n}} (1 + o (1))

. The second inequality is achieved if

n_{0} = n - \frac{n}{\sqrt{log n}}

. When

n_{0} < \frac{n}{2}

,

N_{\bar{σ}} \geq \frac{n^{2} - n n_{0}}{2} \geq \frac{n^{2}}{4}

and the second inequality is achieved when

n_{0} = \frac{n}{2}

. Thus generally we have

\frac{n^{2}}{\sqrt{log n}} (1 + o (1)) \leq N_{\bar{σ}} \leq \frac{n^{2}}{2}

.

Since

\frac{C}{β} n = o (\frac{log n}{n} N_{\bar{σ}})

we can rewrite Equation (43) as:

(\sum_{\begin{matrix} {\bar{σ}}_{i} \neq {\bar{σ}}_{j} \\ X_{i} = X_{j} \end{matrix}} - Z_{i j} + \sum_{\begin{matrix} {\bar{σ}}_{i} \neq {\bar{σ}}_{j} \\ X_{i} \neq X_{j} \end{matrix}} - Z_{i j}) \geq - γ \frac{log n N_{\bar{σ}}}{n} (1 + o (1))

(45)

Let

N_{1} = \sum_{{\bar{σ}}_{i} \neq {\bar{σ}}_{j}, X_{i} = X_{j}} 1

and

N_{2} = \sum_{{\bar{σ}}_{i} \neq {\bar{σ}}_{j}, X_{i} \neq X_{j}} 1 = N_{\bar{σ}} - N_{1}

.

Using the Chernoff inequality we have:

\begin{matrix} P_{G} (D^{'} (\bar{σ}, C)) & \leq {(E [exp (- s Z)])}^{N_{1}} {(E [exp (- s Z^{'})])}^{N_{2}} \\ \cdot exp (γ \frac{log n N_{\bar{σ}} s}{n} (1 + o (1))) \\ = exp (\frac{log n}{n} (1 + o (1)) (e^{- s} - 1) (a N_{1} + b N_{2}) \\ + γ \frac{log n N_{\bar{σ}} s}{n} (1 + o (1))) \end{matrix}

Since

s > 0

and

a > b

, we further have:

\begin{matrix} P_{G} (D^{'} (\bar{σ}, C)) & \leq exp (\frac{N_{\bar{σ}} log n}{n} (b (e^{- s} - 1) + γ s + o (1))) \end{matrix}

Let

h_{b} (x) = x - b - x log \frac{x}{b}

, which satisfies

h_{b} (x) < 0

for

0 < x < b

, and take

s = - log \frac{γ}{b} > 0

, using

N_{\bar{σ}} \geq \frac{n^{2}}{\sqrt{log n}}

we have:

\begin{matrix} P_{G} (D^{'} (\bar{σ}, C)) & \leq exp (N_{\bar{σ}} \frac{log n}{n} h_{b} (γ) (1 + o (1))) \\ \leq exp (h_{b} (γ) n \sqrt{log n} (1 + o (1))) \end{matrix}

□
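
The substitution at the end of the proof can be verified numerically: with s = log (b / γ), the exponent b (e^{-s} - 1) + γ s collapses to h_b (γ), which is negative whenever 0 < γ < b. The sketch below is a plain check of that identity; the function names are hypothetical.

```python
import math


def h(b, x):
    """h_b(x) = x - b - x log(x / b), as defined in the proof of Lemma 12."""
    return x - b - x * math.log(x / b)


def exponent_at_optimal_s(b, gamma):
    """b (e^{-s} - 1) + gamma s evaluated at s = log(b / gamma) > 0."""
    s = math.log(b / gamma)
    return b * (math.exp(-s) - 1) + gamma * s
```

For instance, with b = 4 and γ = 2 both functions give 2 - 4 + 2 log 2, which is about -0.61.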

Proof of Theorem 2.

(1) Since

P_{σ} (σ \notin S_{k} (X)) = \sum_{f \in S_{k}} P_{σ} (σ \neq f (X) | D (σ, f (X))) P_{σ} (D (σ, f (X)))

, we only need to establish

P_{σ} (σ \neq X | D (σ, X)) \leq n^{\tilde{g} (β) / 2 + ϵ}

. From Lemma 9, we can find

G_{n}^{(r)}

for

r = 1, \dots, n

. Let

G_{n}^{'} = \cap_{r = 1}^{n} G_{n}^{(r)}

and apply Lemma 9 with ϵ replaced by

\frac{ϵ}{2}

; from Equations (32) and (34), we have:

\begin{matrix} P_{G} (G_{n}^{' c}) & = P_{G} (\cup_{r = 1}^{n} {(G_{n}^{(r)})}^{c}) \\ \leq \sum_{r = 1}^{n / \sqrt{log n}} n^{r (\tilde{g} (β) / 2 + ϵ / 2)} + n e^{- n} \\ \leq \frac{1}{2} n^{\tilde{g} (β) / 2 + ϵ} \end{matrix}

where the last inequality follows from bounding the sum of the geometric series. On the other hand, for every

G \in G_{n}^{'}

, from Equations (33) and (35), we have:

\begin{matrix} \frac{P_{σ | G} (σ \neq X | D (σ, X))}{1 - P_{σ | G} (σ \neq X | D (σ, X))} & = \frac{P_{σ | G} (σ \neq X | D (σ, X))}{P_{σ | G} (σ = X | D (σ, X))} \\ < \sum_{r = 1}^{n / \sqrt{log n}} n^{r \tilde{g} (β) / 2} + n e^{- n} \end{matrix}

from which we can get the estimation

P_{σ | G} (σ \neq X | D (σ, X)) \leq \frac{1}{2} n^{\tilde{g} (β) / 2 + ϵ}

. Finally,

\begin{matrix} P_{σ} (σ \neq X | D (σ, X)) & = \sum_{G \in G_{n}^{'}} P_{G} (G) P_{σ | G} (σ \neq X | D (σ, X)) \\ + P_{G} (G_{n}^{' c}) & \leq n^{\tilde{g} (β) / 2 + ϵ} . \end{matrix}

(2) When

β < β^{*}

, using Lemma 11, for every

G \in G_{n}^{'}

we can obtain:

\frac{1 - P_{σ | G} (σ = X | D (σ, X))}{P_{σ | G} (σ = X | D (σ, X))} \geq (1 + o (1)) n^{g (β_{n})}

We then have:

P_{σ | G} (σ = X | D (σ, X)) \leq (1 + o (1)) n^{- g (β_{n})}

Then:

\begin{matrix} P_{σ} (σ = X | D (σ, X)) \leq P (G_{n}^{' c}) \\ + \sum_{G \in G_{n}^{'}} P_{G} (G) P_{σ | G} (σ = X | D (σ, X)) \\ \leq (1 + o (1)) n^{- g (β_{n})} + (1 + o (1)) max {n^{g (\bar{β})}, n^{\tilde{g} (2 β_{n}) - 2 g (β_{n}) + ϵ}} \\ \leq (1 + o (1)) max {n^{g (\bar{β})}, n^{- g (β) + ϵ}} \end{matrix}

(3) When

γ < b

, for any

f \in S_{k}

, we have

dist (f (X), Λ) = \frac{(k - 1) n}{k} > \frac{n}{\sqrt{log n}}

. Therefore, using Lemma 12, we can find a set of graphs

G_{n}^{'}

such that

P_{G} (G_{n}^{'}) \leq exp (- n C)

and for any

G \notin G_{n}^{'}

,

P_{σ | G} (σ = f (X)) \leq exp (- C n)

. Therefore,

\begin{matrix} P_{a} ({\hat{X}}^{*}) \leq P_{G} (G_{n}^{'}) + k! exp (- C n) \leq (1 + k!) exp (- C n) \end{matrix}

The conclusion of

P_{a} ({\hat{X}}^{*}) \leq exp (- C n)

follows since C can take any positive value. □

8.3. Proof of Theorem 3

Lemma 13

(Lemma 8 of [7]). Let m be a positive number larger than n. When

Z_{1}, \dots, Z_{m}

are i.i.d. Bernoulli(

\frac{b log n}{n}

) and

W_{1}, \dots, W_{m}

are i.i.d. Bernoulli(

\frac{a log n}{n}

), independent of

Z_{1}, \dots, Z_{m}

, then:

P (\sum_{i = 1}^{m} (Z_{i} - W_{i}) \geq 0) \leq exp (- \frac{m log n}{n} {(\sqrt{a} - \sqrt{b})}^{2})

(46)
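
For moderate m, the left-hand side of (46) can be computed exactly from the two binomial distributions and compared with the bound. The sketch below does this with the standard library only; the function names are hypothetical, and the parameter values in the usage example are arbitrary illustrations rather than settings from the paper.

```python
import math


def binom_pmf(m, p):
    """Probability mass function of Binomial(m, p) as a list of length m + 1."""
    return [math.comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(m + 1)]


def prob_sum_nonnegative(m, n, a, b):
    """Exact P(sum of (Z_i - W_i) >= 0) with Z_i ~ Bernoulli(b log n / n)
    and W_i ~ Bernoulli(a log n / n), all independent."""
    pmf_z = binom_pmf(m, b * math.log(n) / n)
    pmf_w = binom_pmf(m, a * math.log(n) / n)
    cdf_w, running = [], 0.0
    for value in pmf_w:
        running += value
        cdf_w.append(running)
    # P(Z_total >= W_total) = sum over z of P(Z_total = z) * P(W_total <= z).
    return sum(pz * cw for pz, cw in zip(pmf_z, cdf_w))


def lemma13_bound(m, n, a, b):
    """Right-hand side of (46)."""
    return math.exp(-m * math.log(n) / n * (math.sqrt(a) - math.sqrt(b)) ** 2)


if __name__ == "__main__":
    m, n, a, b = 120, 100, 4.0, 1.0
    print(prob_sum_nonnegative(m, n, a, b), "<=", lemma13_bound(m, n, a, b))
```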

Proof of Theorem 3.

Let

P_{F}^{(r)}

denote the probability that there exists some

σ

satisfying

dist (σ, X) = r

and

H (σ) < H (X)

.

When

σ

differs from X in only one coordinate, Equation (23) and Lemma 13 imply that the probability of

H (σ) < H (X)

is bounded by

P_{G} (A_{r}^{s} - A_{r}^{0} > 0) \leq n^{- \frac{{(\sqrt{a} - \sqrt{b})}^{2}}{k}}

. Therefore,

P_{F}^{(1)} \leq (k - 1) n^{g (\bar{β})}

. Using Lemma 7, we can get

P_{F}^{(r)} \leq {(k - 1)}^{r} n^{r g (\bar{β})}

for

r \leq \frac{n}{\sqrt{log n}}

. For

r \geq \frac{n}{\sqrt{log n}}

, choosing

C = 0

in Lemma 8 we can get

\sum_{r \geq \frac{n}{\sqrt{log n}}} P_{F}^{(r)} \leq e^{- n}

. That is, the dominant term is

\sum_{r \leq \frac{n}{\sqrt{log n}}} P_{F}^{(r)}

since the other part decreases exponentially fast. Therefore, the upper bound for error rate of

{\hat{X}}^{'}

is:

\begin{matrix} P_{F} = & \sum_{r = 1}^{n} P_{F}^{(r)} \leq (1 + o (1)) \sum_{r = 1}^{\infty} {(k - 1)}^{r} n^{r g (\bar{β})} \\ \leq & (1 + o (1)) \frac{(k - 1) n^{g (\bar{β})}}{1 - (k - 1) n^{g (\bar{β})}} = (k - 1 + o (1)) n^{g (\bar{β})} \end{matrix}
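
The geometric-series step can be made concrete with a few lines of code. The exponent used below, 1 - (√a - √b)^2 / k, is an assumption inferred from the single-coordinate bound stated just above (n positions, each with probability at most n^{-(√a - √b)^2 / k}); the definition of g is given earlier in the paper and is not reproduced in this section. The function names are hypothetical.

```python
import math


def exponent_g_bar(a, b, k):
    """Assumed exponent: 1 - (sqrt(a) - sqrt(b))^2 / k."""
    return 1 - (math.sqrt(a) - math.sqrt(b)) ** 2 / k


def geometric_error_bound(n, k, g_bar):
    """Closed form of sum_{r >= 1} ((k - 1) n^{g_bar})^r, valid when the ratio is below 1."""
    x = (k - 1) * n ** g_bar
    if not 0 <= x < 1:
        raise ValueError("the series diverges unless (k - 1) n^g_bar < 1")
    return x / (1 - x)
```

When the exponent is negative, the ratio (k - 1) n^{g} tends to 0, so the closed form is (k - 1 + o (1)) n^{g}, as in the last equality of the display.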

When

σ \in W^{*}

, since

| {r \in [n] | σ_{r} = ω^{i}} | = I_{i}^{'} + \frac{n}{k} - I_{i}

, we have

I_{i}^{'} = I_{i}

. From Lemma 6, we can see

| B_{\bar{σ}} | = | A_{\bar{σ}} |

and

N_{\bar{σ}} = 0

. Then from Equation (24),

H (\bar{σ}) < H (X)

is equivalent to

B_{\bar{σ}} > A_{\bar{σ}}

.

Therefore, when

dist (\bar{σ}, X) \geq \frac{n}{\sqrt{log n}}

and

D (σ, X)

holds, from Equation (31) we have

A_{\bar{σ}} \geq \frac{n^{2}}{k^{2} (k - 1) \sqrt{log n}} (1 + o (1))

. We use Lemma 13 by letting

m = | A_{\bar{σ}} |

when

m \geq \frac{n}{\sqrt{log n}}

; the error term is bounded by

\sum_{r \geq \frac{n}{\sqrt{log n}}} P_{F}^{(r)} \leq \sum_{r \geq \frac{n}{\sqrt{log n}}} {(k - 1)}^{r} exp (- \frac{{(\sqrt{a} - \sqrt{b})}^{2}}{k^{2} (k - 1)} n \sqrt{log n}) \leq exp (- n)

, which decreases exponentially fast. For

m < \frac{n}{\sqrt{log n}}

, we can use Lemma 7 directly by considering

\sum_{r = 2}^{\frac{n}{\sqrt{log n}}} P_{F}^{(r)}

. The summation starts from

r = 2

since

σ \in W^{*}

. Therefore,

\begin{matrix} P_{F} = & \sum_{r = 2}^{n} P_{F}^{(r)} \leq (1 + o (1)) \sum_{r = 2}^{\infty} {(k - 1)}^{r} n^{r g (\bar{β})} \\ \leq & ({(k - 1)}^{2} + o (1)) n^{2 g (\bar{β})} . \end{matrix}

□

Author Contributions

The work of this paper was conceptualized by M.Y., who also developed the mathematical analysis methodology for the proof of the main theorem of this paper. All simulation code was written by F.Z., who also extended the idea of M.Y., made the proof concrete, and wrote this paper. The work was supervised and funded by S.-L.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Natural Science Foundation of China under Grant 61807021, in part by the Shenzhen Science and Technology Program under Grant KQTD20170810150821146, and in part by the Innovation and Entrepreneurship Project for Overseas High-Level Talents of Shenzhen under Grant KQJSCX20180327144037831.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SBM	Stochastic Block Model
SSBM	Symmetric Stochastic Block Model
MSE	Mean Squared Error
ML	Maximum Likelihood
MQ	Maximum Modularity

References

1. Fortunato, S. Community detection in graphs. Phys. Rep. 2010, 486, 75–174.
2. Feng, H.; Tian, J.; Wang, H.J.; Li, M. Personalized recommendations based on time-weighted overlapping community detection. Inf. Manag. 2015, 52, 789–800.
3. Hendrickson, B.; Kolda, T.G. Graph partitioning models for parallel computing. Parallel Comput. 2000, 26, 1519–1534.
4. Cline, M.S.; Smoot, M.; Cerami, E.; Kuchinsky, A.; Landys, N.; Workman, C.; Christmas, R.; Avila-Campilo, I.; Creech, M.; Gross, B.; et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2007, 2, 2366.
5. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137.
6. Abbe, E. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 2017, 18, 6446–6531.
7. Abbe, E.; Bandeira, A.S.; Hall, G. Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory 2015, 62, 471–487.
8. Mossel, E.; Neeman, J.; Sly, A. Consistency thresholds for the planted bisection model. Electron. J. Probab. 2016, 21, 24.
9. Abbe, E.; Sandon, C. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In Proceedings of the 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, 17–20 October 2015; pp. 670–688.
10. Nowicki, K.; Snijders, T.A.B. Estimation and prediction for stochastic blockstructures. J. Am. Stat. Assoc. 2001, 96, 1077–1087.
11. Ising, E. Beitrag zur theorie des ferromagnetismus. Z. Für Phys. 1925, 31, 253–258.
12. Ye, M. Exact recovery and sharp thresholds of Stochastic Ising Block Model. arXiv 2020, arXiv:2004.05944.
13. Liu, J.; Liu, T. Detecting community structure in complex networks using simulated annealing with k-means algorithms. Phys. A Stat. Mech. Appl. 2010, 389, 2300–2309.
14. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092.
15. Potts, R.B. Some generalized order-disorder transformations. In Mathematical Proceedings of the Cambridge Philosophical Society; Cambridge University Press: Cambridge, UK, 1952; Volume 48, pp. 106–109.
16. Liu, L. On the Log Partition Function of Ising Model on Stochastic Block Model. arXiv 2017, arXiv:1710.05287.
17. Berthet, Q.; Rigollet, P.; Srivastava, P. Exact recovery in the Ising blockmodel. Ann. Stat. 2019, 47, 1805–1834.
18. Chen, Y.; Suh, C.; Goldsmith, A.J. Information recovery from pairwise measurements. IEEE Trans. Inf. Theory 2016, 62, 5881–5905.
19. Hajek, B.; Wu, Y.; Xu, J. Achieving exact cluster recovery threshold via semidefinite programming. IEEE Trans. Inf. Theory 2016, 62, 2788–2797.
20. Saad, H.; Nosratinia, A. Community detection with side information: Exact recovery under the stochastic block model. IEEE J. Sel. Top. Signal Process. 2018, 12, 944–958.
21. Newman, M.E. Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys. Rev. E 2016, 94, 052315.
22. He, J.; Chen, D.; Sun, C. A fast simulated annealing strategy for community detection in complex networks. In Proceedings of the 2016 2nd IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 14–17 October 2016; pp. 2380–2384.
23. Clauset, A.; Newman, M.E.; Moore, C. Finding community structure in very large networks. Phys. Rev. E 2004, 70, 066111.
24. Diaconis, P.; Saloff-Coste, L. What do we know about the Metropolis algorithm? J. Comput. Syst. Sci. 1998, 57, 20–36.
25. Frigessi, A.; Martinelli, F.; Stander, J. Computational Complexity of Markov Chain Monte Carlo Methods for Finite Markov Random Fields. Biometrika 1997, 84, 1–18.
26. Erdős, P.; Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 1960, 5, 17–60.
27. Holland, P.W.; Leinhardt, S. A Method for Detecting Structure in Sociometric Data. Am. J. Sociol. 1970, 76, 492–513.

Figure 1. Illustration of Theorem 2.

Figure 2. Experimental results.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

