Inference for a generalised stochastic block model with unknown number of blocks and non-conjugate edge models

The stochastic block model (SBM) is a popular model for capturing community structure and interaction within a network. Network data with non-Boolean edge weights is becoming commonplace; however, existing analysis methods convert such data to a binary representation to apply the SBM, leading to a loss of information. A generalisation of the SBM is considered, which allows edge weights to be modelled in their recorded state. An effective reversible jump Markov chain Monte Carlo sampler is proposed for estimating the parameters and the number of blocks for this generalised SBM. The methodology permits non-conjugate distributions for edge weights, which enable more flexible modelling than current methods as illustrated on synthetic data, a network of brain activity and an email communication network.

This framework may be extended to an edge-weight distribution G with multiple parameters. For example, if G represents the normal distribution, then θ_k = (μ_k, σ_k) represents the mean and standard deviation of the edge weights in block k. In this case, an additional subscript is required on θ_k such that θ_kp is the p-th parameter for block k. In the normal example, line 3 of Equation (2) yields W_ij | θ, Z ∼ Normal(μ_{Z_i Z_j}, σ_{Z_i Z_j}), independently over edges.
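As a concrete illustration, edge weights under this normal GSBM can be simulated as follows. This is a minimal sketch, assuming (as elsewhere in the paper) that within-block edges in block k are governed by θ_k and between-block edges by θ_0; the function name and data layout are hypothetical:

```python
import numpy as np

def simulate_gsbm_normal(Z, theta, theta0, rng):
    """Simulate a symmetric edge-weight matrix W under the normal GSBM:
    W_ij ~ Normal(mu, sigma) with (mu, sigma) = theta[k] when Z_i = Z_j = k,
    and (mu, sigma) = theta0 when the endpoints lie in different blocks."""
    n = len(Z)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mu, sigma = theta[Z[i]] if Z[i] == Z[j] else theta0
            W[i, j] = W[j, i] = rng.normal(mu, sigma)
    return W

rng = np.random.default_rng(1)
Z = [0, 0, 0, 1, 1, 1]
W = simulate_gsbm_normal(Z, {0: (5.0, 1.0), 1: (-5.0, 1.0)}, (0.0, 1.0), rng)
```

With well-separated block means, the block structure is visible directly in the simulated matrix.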

The choice of distributions for G and G_0 is driven by the type of edge weight considered. In the mixture of finite mixtures (MFM) approach, the number of blocks K has an explicit prior distribution. Let F_0 be a distribution on {1, 2, 3, ...} with parameter δ; then the prior for (K, Z) considered in the remainder of the paper is given in Equation (3):

K ∼ F_0(δ),
ρ | K ∼ Dirichlet(γ, K),                                      (3)
Z_i | ρ, K ∼ Categorical(ρ), independently for i = 1, ..., N,

where Dirichlet(γ, K) is the symmetric Dirichlet distribution on the (K − 1)-simplex. The size of block k is the number of nodes whose block membership is k and is given by N_k = Σ_{i=1}^N 1{Z_i = k}. Notice that the MFM gives comparatively less probability mass to small blocks than the CRP. Also, the distribution for the CRP is independent of γ. Thus, the MFM approach gives more control over the prior block structure.

The parameter ρ can be marginalised out of Equation (3) to obtain a prior density for block memberships depending only on K and γ:

π(Z | K, γ) = [Γ(Kγ) / Γ(N + Kγ)] ∏_{k=1}^K [Γ(N_k + γ) / Γ(γ)],

since Σ_{k=1}^K N_k = N, where Γ(a) = ∫_0^∞ x^{a−1} e^{−x} dx is the gamma function; this is referred to as the Dirichlet-Multinomial distribution. Similarly, the conditional distribution for the block membership of node i, given K and the other block memberships Z_{−i}, is:

π(Z_i = k | Z_{−i}, K, γ) ∝ N_k^{(−i)} + γ,

where N_k^{(−i)} is the number of nodes other than i in block k. In the remainder of this article, the generalised SBM (GSBM) used is:

(K, Z) ∼ DMA(γ, δ),
θ_k ∼ G_0(α), independently for k = 0, 1, ..., K,
W_ij | θ, Z ∼ G(θ_{Z_i Z_j}), independently over edges,
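The marginal Dirichlet-Multinomial density above can be checked numerically; a minimal stdlib-only sketch (the function name is illustrative):

```python
from itertools import product
from math import exp, lgamma

def log_dma_prior(Z, K, gamma):
    """Log Dirichlet-Multinomial prior for block memberships Z given K blocks:
    log Gamma(K*gamma) - log Gamma(N + K*gamma)
      + sum_k [log Gamma(N_k + gamma) - log Gamma(gamma)]."""
    N = len(Z)
    Nk = [sum(1 for z in Z if z == k) for k in range(K)]
    return (lgamma(K * gamma) - lgamma(N + K * gamma)
            + sum(lgamma(n + gamma) - lgamma(gamma) for n in Nk))

# sanity check: the density sums to one over all K^N membership vectors
total = sum(exp(log_dma_prior(Z, 2, 1.0)) for Z in product(range(2), repeat=4))
```

For N = 2, K = 2 and γ = 1, the density of the assignment (0, 0) is Γ(2)Γ(3)Γ(1)/Γ(4) = 1/3, matching the Dirichlet-Multinomial formula directly.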

where G 0 and G are specified by the modeller. The prior on (K, Z) will be referred to as the 126 DMA(γ, δ) (Dirichlet-Multinomial Allocation) prior. When a model G is defined, we refer to the 127 specific form of the model as G-SBM. This section discusses the benefit of split-merge steps over Gibbs samplers for mixture models, 130 describes the difficulty that arises when designing split-merge moves for block membership in the 131 GSBM, and presents a split-merge RJMCMC sampler for the GSBM. This algorithm draws samples 132 from the posterior distribution of (K, Z, θ).

For models containing a mixture component (such as the block structure in Mørup and Schmidt, 2012; McDaid et al., 2013) a Gibbs sampler can get stuck in local modes of the posterior. Consider two "true" blocks k and l with sizes N_k ≥ N_l and a state s of a Gibbs sampler with a block k_s consisting of all nodes in true blocks k and l. For the Gibbs sampler to separate the nodes in k_s into blocks k and l, it will require at least N_l steps, each of which takes a node assigned to k_s and assigns it to a new block l_s. Each of these moves is quite unlikely, especially if the parameters θ_k, θ_l are close to θ_0. On the other hand, if all nodes could be moved at once, then the proposal would be more likely to be accepted. This is a common problem with Gibbs sampling algorithms: changing the block membership of a node i affects all nodes with an edge to i.

The merge proposal takes a state (K_s, Z_s, θ_s) and proposes a new state (K′, Z′, θ′) with K′ = K_s − 1. Firstly, the pair of blocks to merge, labelled k and l, is chosen; possible mechanisms include sampling proportional to block size, at random, etc. In this paper, for simplicity, the pair k, l is chosen with probability 1/(K_s(K_s − 1)). Secondly, the block membership Z′ is updated. This is deterministic: any node that is a member of block k or l in Z_s is assigned to block k in Z′; all other block memberships are unchanged. Finally, the merged parameter is proposed. A natural choice is to take the mean value, θ′_k = θ_k/2 + θ_l/2; however, to allow more flexibility in the sampler, an uneven merge is considered using a weighted mean with tuning parameter λ ∈ (0, 1). Since the split move will invert the merge move, a matching function m is required to ensure that parameters lie in the correct space. For example, a rate parameter must be positive, whereby a suitable choice for m is the exponential function. Possible matching functions for some common parameter spaces are shown in Table 1. The full parameter proposal during a merge move is shown in Equation (5).

Algorithm 1 Reversible jump Markov chain Monte Carlo sampler for the GSBM with unknown K: split-merge algorithm.
Inputs: edge-weight data w, prior parameters α, γ, δ, sampler parameters λ, ν, σ.
  with probability 1/2 propose a split or a merge
  end if
  if there are no empty blocks then
    propose adding an empty block
  else
    with probability N_∅/(N_∅ + ν) attempt deleting an empty block,
    or with probability ν/(N_∅ + ν) attempt adding an empty block
  end if
  for i = 1, ..., N do
    for k = 1, ..., K_s do

Finally, the acceptance probability A_merge is computed (see Appendix A) and the next state of the sampler (K_{s+1}, Z_{s+1}, θ_{s+1}) is taken as (K′, Z′, θ′) with probability A_merge, and as (K_s, Z_s, θ_s) otherwise.
Table 1: Possible matching functions to ensure parameters lie in the correct space.
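The weighted-mean merge for a positive parameter can be sketched as follows, taking m = exp as the matching function (so the weighted mean is formed on the log scale). The exact form of Equation (5) is assumed here; this is an illustrative sketch, not the paper's definitive proposal:

```python
from math import exp, log

def merge_params(theta_k, theta_l, lam):
    """Weighted-mean merge of two positive block parameters on the
    transformed scale, using m = exp as the matching function for a
    positive parameter (so m^{-1} = log)."""
    return exp(lam * log(theta_k) + (1.0 - lam) * log(theta_l))

# an even merge (lam = 0.5) of rates 1.0 and 4.0 gives their geometric mean
merged = merge_params(1.0, 4.0, 0.5)
```

With λ = 1/2 this reduces to the geometric mean, the positive-parameter analogue of the simple average θ_k/2 + θ_l/2.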
The split proposal takes a state (K_s, Z_s, θ_s) and proposes a new state (K′, Z′, θ′) with K′ = K_s + 1. Firstly, the block to split is chosen at random. Possible mechanisms include sampling at random among the K_s blocks, proportional to block size, etc. In this paper the block is chosen uniformly amongst the K_s blocks. To mirror the notation of the merge move, the block to split is labelled k′, and the proposed new blocks k and l.

The first step in a split move determines the new block parameters. This requires the inverse of Equation (5). On top of this, an auxiliary variable u is needed to match the dimension of the parameter space. In this work, u ∼ Normal(0, σ²) and represents the weighted difference of the mapped parameters m(θ_k) and m(θ_l). The parameter split is thus the inverse of the merge in Equation (5). Note that the dimension-matching criterion of RJMCMC (Green, 1995) is achieved since the vectors (θ_{k′}, u, λ) and (θ_k, θ_l, λ) have the same cardinality.
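Under the same assumed log matching function for a positive parameter, the split can be sketched as the inverse map; the exact form is an assumption consistent with u being the difference of the parameters on the transformed scale:

```python
from math import exp, log

def split_params(theta_merged, u, lam):
    """Invert the weighted-mean merge for a positive parameter: place the
    two new parameters either side of theta_merged on the log scale so that
    u = log(theta_k) - log(theta_l) and the lam-weighted mean of the logs
    recovers log(theta_merged)."""
    c = log(theta_merged)
    return exp(c + (1.0 - lam) * u), exp(c - lam * u)

theta_k, theta_l = split_params(2.0, 1.5, 0.3)
```

A round trip confirms the pair (split, merge) is a bijection: merging the two outputs with the same λ recovers the original parameter, as RJMCMC requires.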

To determine Z′, the nodes assigned to block k′ in Z_s are reassigned to blocks k and l sequentially. A random ordering of these nodes is drawn; this is the order in which nodes will be reassigned to block k or l. Let I denote the set of nodes still awaiting assignment and J the set of nodes already assigned.

When assigning node i, the probability q(Z_i = k) that node i joins block k, given the assignments made so far, can be calculated. Node i is then assigned to block k with probability q(Z_i = k) and to block l otherwise. Once assigned, i is moved from I to J for the next assignment. The total proposal probability of the new block assignment is the product of these per-node probabilities.

Finally, the proposed split is accepted as the next state of the sampler with probability A_split.

To allow the sampler to explore the parameter space, two additional moves are included: a Gibbs-like move (which allocates each node to a block proportional to the posterior density) and a move that allows the addition and deletion of empty blocks.
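The sequential reassignment of the split move, and its total proposal probability, can be sketched as follows; here q is a user-supplied placeholder for the posterior-based assignment probability q(Z_i = k) in the text:

```python
import random
from math import log

def sequential_allocation(nodes, q, rng):
    """Assign each node in turn to block 'k' (with probability q(i, done))
    or block 'l', accumulating the log proposal probability as the sum of
    the log per-node assignment probabilities."""
    done, logprob = {}, 0.0
    for i in nodes:
        p = q(i, done)
        if rng.random() < p:
            done[i] = 'k'
            logprob += log(p)
        else:
            done[i] = 'l'
            logprob += log(1.0 - p)
    return done, logprob

rng = random.Random(0)
assign, lp = sequential_allocation([1, 2, 3], lambda i, done: 0.5, rng)
```

The returned log probability is needed in the acceptance ratio A_split, since the reverse (merge) move must account for how the split assignment was proposed.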

The Gibbs-like allocation move for node i computes the conditional posterior value for i being a member of each of the K blocks in the current state of the sampler. Since K is finite, this set of posterior values can trivially be normalised to a probability vector, such that p_ik is the probability that node i is reassigned to block k. Thanks to the structure of the GSBM, p_ik can be written as the product of two densities: the posterior density of edge weights to nodes in block k, and the posterior density of edge weights to nodes in other blocks.

Notice it is possible to reassign i to its current block. This move, as well as the split move, can leave a block empty; waiting for the sampler to merge an empty block with another block can leave empty blocks in the sampler state for some time, adding to the uncertainty around the number of blocks K. A proposal that addresses these concerns is considered in the next section.
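In practice the posterior values are held on the log scale, and the normalisation is a direct log-sum-exp; a minimal sketch:

```python
from math import exp

def normalise_log_probs(log_vals):
    """Normalise unnormalised log posterior values for the K blocks into
    the probability vector (p_i1, ..., p_iK), subtracting the maximum
    first so the exponentials cannot all underflow to zero."""
    m = max(log_vals)
    w = [exp(v - m) for v in log_vals]
    s = sum(w)
    return [x / s for x in w]

p = normalise_log_probs([-1000.0, -1001.0, -1002.0])
```

Without the max-subtraction, exp(-1000) underflows to zero in double precision and the probability vector would be undefined; with it, the ratios between blocks are preserved exactly.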
The sampler is implemented in the R package "SBMSplitMerge" (Ludkin, 2020). This package is used to perform the inference in the following sections.

The generalised negative binomial distribution is parameterised by the real-valued "number of failures" r > 0 and success probability p ∈ [0, 1]. If X ∼ NegBin(r, p) then:

P(X = x) = [Γ(x + r) / (Γ(r) x!)] p^r (1 − p)^x,  x = 0, 1, 2, ...,

with mean E[X] = r(1 − p)/p.
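This pmf, P(X = x) = Γ(x + r)/(Γ(r) x!) p^r (1 − p)^x, can be evaluated directly on the log scale (since Γ(x + r) overflows quickly); a sketch under this assumed parameterisation, which matches the mean r(1 − p)/p quoted later:

```python
from math import exp, lgamma, log

def negbin_pmf(x, r, p):
    """Generalised negative binomial pmf with real-valued r > 0, computed
    via log-gamma for numerical stability. Under this parameterisation
    E[X] = r * (1 - p) / p."""
    return exp(lgamma(x + r) - lgamma(r) - lgamma(x + 1)
               + r * log(p) + x * log(1.0 - p))

# numerical check: the pmf sums to one and matches the stated mean
total = sum(negbin_pmf(x, 2.5, 0.4) for x in range(400))
mean = sum(x * negbin_pmf(x, 2.5, 0.4) for x in range(400))
```

For r = 2.5 and p = 0.4 the truncated sums give total ≈ 1 and mean ≈ r(1 − p)/p = 3.75, confirming the parameterisation.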

Notice that the Bernoulli distribution admits a conjugate prior; therefore, existing samplers, such as those introduced by Mørup and Schmidt (2012) and McDaid et al. (2013), could be applied.

However, for the negative binomial with both r and p unknown, no conjugate prior exists.

To apply the GSBM, the prior on K and Z was set to a DMA distribution with hyperparameters set to (γ, δ) = (1, 10). The parameter values used for each of the edge-weight models are given in Table 2.

Table 2: True parameter values for each edge-weight model.

Parameter                θ_0       θ_1       θ_2       θ_3       θ_4
Bernoulli(p)             0.05      0.4       0.5       0.6       0.7
Negative binomial(p, r)  (0.5, 1)  (0.5, 1)  (0.5, 4)  (0.5, 5)  (0.5, 6)

In both cases, a random walk Metropolis-Hastings step was applied to θ on a transformed scale with standard deviation 0.1. A draw from the prior was taken as the initial state, then the split-merge sampler of Section 3 was run for 10,000 iterations with 5,000 iterations discarded as burn-in.

To evaluate the performance of the algorithm, the ability to detect the true number of blocks, block structure and parameter values is considered. To measure the ability to detect block structure, the posterior joint probabilities that two nodes belong to the same block are calculated after burn-in, via:

P(Z_i = Z_j | w) ≈ (1/|S|) Σ_{s ∈ S} 1{Z_i^(s) = Z_j^(s)},

where S contains the indices of samples remaining after burn-in.
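This posterior similarity matrix can be computed directly from the retained samples; a minimal sketch:

```python
import numpy as np

def posterior_similarity(samples):
    """Estimate P(Z_i = Z_j | w) as the fraction of post-burn-in samples
    in which nodes i and j share a block label. samples has shape
    (number of retained samples, N)."""
    samples = np.asarray(samples)
    S = len(samples)
    N = samples.shape[1]
    P = np.zeros((N, N))
    for Z in samples:
        P += (Z[:, None] == Z[None, :])
    return P / S

P = posterior_similarity([[0, 0, 1], [0, 1, 1], [0, 0, 0]])
```

A useful property of this estimate is that it is invariant to relabelling of the blocks across samples, which makes it well suited to trans-dimensional samplers where K varies.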

The parameter estimates can be compared to the true values in Table 2. Note the parameter values in Table 2 for the Bernoulli network.

For the negative binomial network, Figure 2b shows that blocks 2, 3 and 4 are well identified by the sampler. As for block 1, recall θ_0 = θ_1 in the true parameters; this gives no structure to block 1. Indeed, one could reassign the nodes in block 1 arbitrarily between two blocks 1a and 1b without changing the model fit. The estimates in Table 3 lead to similar conclusions: the estimates for parameters θ_0, θ_2, θ_3 and θ_4 are good, but the poor specification of block 1 leads to poor estimates of θ_1.

To assess convergence, chains were initialised from two extreme configurations: one with all nodes assigned to one block and the other with each node assigned to a unique block.

In the first case, the mean and variance of the parameter values are used as summary statistics of the sampler performance, which are recorded at every iteration of the sampler. The Gelman-Rubin statistics for the sampler for each model are shown in Table 4, based on 30 independent chains.
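For a single scalar summary, the statistic can be reproduced as follows. This is a standard sketch of the Gelman-Rubin computation, not the package's own implementation:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for a scalar summary recorded by
    m chains of length n (chains has shape (m, n)); values near 1 are
    consistent with convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return float(np.sqrt(var_hat / W))

# chains drawn from a common distribution should give a value near 1
rng = np.random.default_rng(0)
rhat = gelman_rubin(rng.normal(size=(4, 2000)))
```

Chains stuck in different posterior modes inflate the between-chain term B, pushing the statistic well above 1.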

These values are close to 1, indicating that convergence appears to have occurred during the first 10,000 iterations.

The second technique for assessing convergence is inspired by perfect simulation: starting two chains from these extreme configurations and checking whether they converge to the same block structure.

Real data
The split-merge sampler is demonstrated on real networks: a network of brain connectivity with binary edge weights in Section 5.1 and a network of emails with count data for edge weights in Section 5.2.

Macaque sensory data
The first data set analysed concerns the brain of a macaque monkey (Négyessy et al., 2006).

Regions of the cortex were deemed connected, or not, during a sensory task. In total, 45 regions of the brain were analysed as a network. Parameter estimates are given in Table 5 together with 5% and 95% quantiles and the effective sample size. The parameters for smaller blocks have wider confidence intervals; this is expected since there are fewer edge weights governed by those parameters. Note that parameter θ_5 is more uncertain; this is due to the block consisting of two nodes, meaning that θ_5 only governs one edge weight. The effective sample size cannot be computed for this parameter since it is absent in many iterations when the block has been merged with another block.

Two edge-weight models were fitted to the email network: (i) a Poisson model with a Gamma(1,1) prior and (ii) a negative binomial with a Gamma(1,1) prior for r and a Beta(1,1) prior for p. In both cases a DMA(1,10) joint prior is placed on (K, Z). On a first analysis, the mean number of emails sent by any one employee is 3.7, whilst the variance is 4753, so a Poisson model seems a bad fit a priori. The split-merge algorithm of Section 3 was applied with 10,000 iterations and 1,500 discarded as burn-in.
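The overdispersion noted above (variance far exceeding the mean) can be quantified with a method-of-moments negative binomial fit; a sketch using the moment formulas implied by the parameterisation of Section 4, with the function name hypothetical:

```python
def negbin_moments(mean, var):
    """Method-of-moments negative binomial fit: with E[X] = r(1 - p)/p and
    Var[X] = r(1 - p)/p^2, solving gives p = mean/var and
    r = mean^2 / (var - mean). Requires var > mean (overdispersion);
    a Poisson model forces var == mean."""
    p = mean / var
    r = mean * mean / (var - mean)
    return r, p

# the email counts reported in the text: mean 3.7, variance 4753
r, p = negbin_moments(3.7, 4753.0)
```

The fitted p is tiny, reflecting the heavy overdispersion; a Poisson model, with its single rate parameter, cannot match both moments simultaneously.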
To visualise the block assignments, the nodes are ordered by block label. This ordering is applied to the log edge-weight matrix W and to P in Figure 5a and Figure 5b respectively. The negative binomial model is more flexible and is thus able to more easily detect structure in the network compared to the Poisson model. This is exemplified in the ordered plots of the log edge weights in Figures 5a and 6a. Furthermore, the fit using the Poisson distribution for edge weights finds one large group (fourth from the left in Figure 5b) with a low incidence of sent emails. This group corresponds to parameter λ_4, which has a posterior mode of 0.19. Under the negative binomial distribution, the low-incidence group is much smaller, with modal parameters r_9 = 0.004 and p_9 = 0.012 giving an expected number of emails sent by a node in block nine as r(1 − p)/p ≈ 0.33. The modal parameter values for each model are given in Table 6 together with the 5% and 95% quantiles.

For simplicity, the models presented in Section 2 assume all edges are present in the network and that each edge has a recorded edge weight. This assumption can be relaxed in (at least) two ways.

Since a merge move is the inverse of a split move, the merge acceptance ratio is the reciprocal of the split acceptance ratio; hence only A_split is derived.