A Bayesian Inference Method Using Monte Carlo Sampling for Estimating the Number of Communities in Bipartite Networks

Community detection is an important analysis task for complex networks, including bipartite networks, which consist of nodes of two types and edges connecting only nodes of different types. Many community detection methods take the number of communities in the networks as a fixed known quantity; however, it is impossible to give such information in advance in realworld networks. In our paper, we propose a projection-free Bayesian inference method to determine the number of pure-type communities in bipartite networks. .is paper makes the following contributions: (1) we present the first principle derivation of a practical method, using the degree-corrected bipartite stochastic block model that is able to deal with networks with broad degree distributions, for estimating the number of pure-type communities of bipartite networks; (2) a prior probability distribution is proposed over the partition of a bipartite network; (3) we design aMonte Carlo algorithm incorporated with our proposedmethod and prior probability distribution. We give a demonstration of our algorithm on synthetic bipartite networks including an easy case with a homogeneous degree distribution and a difficult case with a heterogeneous degree distribution. .e results show that the algorithm gives the correct number of communities of synthetic networks in most cases and outperforms the projection method especially in the networks with heterogeneous degree distributions.


Introduction
A bipartite network is a network with nodes of two types and edges connecting only nodes of different types. e decomposition of bipartite networks into communities (clusters, modules, or groups), i.e., community detection, plays an important role in revealing the structure of large networked systems, providing new insights into how the network is organized [1][2][3][4].
Many methods [5][6][7][8][9] have been developed for community detection in bipartite networks in recent years. A fundamental shortcoming of most community detection methods is that they partition networks into a fixed number of groups. However, this number is usually unknown in realworld networks, and we need to mine such information from the network data. A lot of research [10][11][12][13] for making such efforts to determine the number of communities in bipartite networks has been proposed recently. ere are three main problems in these methods. One is that they performed estimation through maximizing the modularity proposed in [10,14] that is proved to be NP-hard [15,16]; the second is that they gave the number of communities of mixed-type, which is nearly always substantially less efficient [6]; the third is that the projection method [15] performed poorly due to information loss. e heuristic methods proposed in [17,18] for community detection in bipartite networks does not need the number of communities to be given a priori.
In this paper, we propose a projection-free Bayesian inference method for determining the number of pure-type communities in a bipartite network. Our method builds mainly on the work as below: (i) the degree-corrected bipartite stochastic block model, proposed by Larremore et al. [6], is used to find the community structure of empirical networks with broad degree distributions; and (ii) the prior probability distribution over divisions of a network into groups and a new prior probability distribution based on a queueing-type process, both proposed by Riolo et al. [4], are used for calculating the number of communities in a unipartite network.
In Section 2, first, we present the first-principle derivation of a practical method, using the degree-corrected bipartite stochastic block model that is able to deal with networks with broad degree distributions, for estimating the number of pure-type communities of bipartite networks. Second, we propose a prior probability distribution over the partition of a bipartite network, with the community-type parameter ensuring that each community is pure type. In Section 3, we design a Monte Carlo algorithm incorporated with our proposed method and prior probability distribution. In the following section, we demonstrate our method on synthetic bipartite networks including an easy case with homogeneous degree distributions and a difficult case with a heterogeneous degree distribution. e results show that the proposed algorithm can determine the correct number of communities and perform better than our projection method in every case.

Degree-Corrected Bipartite Stochastic Block Model.
e stochastic block model is a generative model used to produce networks containing blocks, groups, or communities. is model is very important in network science and is used for recovering the community structure in network data [2,3]. e classic stochastic block model can be described as follows: divide the number of vertices N into K disjoint communities; any two vertices i and j are connected by an edge with probability ω g i g j , which is an entry of a symmetric K × K matrix, and g i is the community of vertex i. However, the block model described above finds the community structure merely due to the degree sequence and fails to detect the known communities in a real-world network that has heterogeneous degree distributions [19]. Karrer and Newman [20] extended the classic stochastic block model including heterogeneity in the degrees of vertices and proposed the degree-corrected stochastic block model, which is proved to overcome the problems of the classic block model. Most stochastic block model community detection methods can be naturally applied to bipartite networks [20,21]. Unfortunately, the stochastic block model often overfits bipartite data by mixing nodes of different types within communities and it is nearly always substantially less efficient [6]. Built on the work of Karrer and Newman [20], Larremore et al. [6] proposed the degree-corrected bipartite stochastic block model, which is employed in our calculations. In the degree-corrected bipartite stochastic block model, a bipartite network G is given with an N a × N b bipartite asymmetric adjacency matrix B, where N a is the number of nodes of type-a and N b is the number of nodes of type-b. Let e type-a nodes are divided into some number k a of communities, labeled 1, . . . , k a , and the N b nodes of type-b are divided into k b communities, labeled k a + 1, . . . , k a + k b . We express the matrix of community interrelationships as a k × k matrix, where k � k a + k b . Let g i again encode the community node i belongs to. Let t i be the type of vertex i and T r be the type of community r, imposing the constraint which indicates that node types and community types must match and ensure that communities will be pure type. We write where δ is the Kronecker delta. Let θ i control the expected degree of node i and ω rs be the k × k symmetric matrix of parameters to control the number of edges between communities r and s. Following [4], the normalization of θ i can be fixed by imposing the constraint where n r � i δ g i ,r is the number of nodes in community r.
Following [22], we let the numbers of edges between nodes i and j follow a Poisson distribution with mean θ i θ j ω g i g j . Enforcing the bipartite constraint of equation (1) produces a restriction on ω: Given parameters g, k, θ, ω, and T for the specification of the mode, the probability of observing a bipartite network G with adjacency matrix A can be written as Allowing for the constraint of equation (4), the probability P(A | g, k, θ, ω, T) can be simplified to the more convenient form of where d i is the observed degree of vertexes i and m rs � i,j t i ≠ t j A ij δ g i ,r δ g j ,s is the number of edges between communities r and s. We have neglected an overall multiplicative constant in (7) since it cancels out in later calculations. Note that a similar probability given by Larremore et al. [6] has been modified in equation (7) as follows: (i) e number k of communities, the objective we will estimate, is incorporated as an unknown quantity (ii) e exponential expression is − n r n s ω rs rather than − ω rs , with the normalization of θ i under a different constraint condition en, we integrate out the irrelevant parameters θ and ω. We assume maximum-entropy (i.e., least informative) prior probability distributions on the parameters θ and ω. For θ, this means a uniform prior probability distribution over the regular simplex of values specified by equation (4). en, we let the expected value of the edge probability ω rs be equal to the observed average edge probability in the network as a whole: p � 2m/N 2 , where m is the total number of edges in the bipartite network. en, the maximum-entropy prior probability distribution is an exponential distribution P(ω) � (1/p)e − ω/p . We assume the priors to be independent (conditioned on g, k, and T) so that P(ω, θ | k, g, T) � P(ω | k, T)P(θ | k, g, T) and With these choices of priors, integration is performed on equation (8). en, we have where κ r � i d i δ g i ,r and an overall multiplying constant has been discarded.

Prior on Community Partitions.
Our goal is to estimate the correct values of k a and k b for a given bipartite network using this model as the basis for a Bayesian model selection procedure. We have where P(A | g, k, T) is given by equation (6), and the probability P(A), which in the denominator of equation (10) has no effect on our results, is unknown but cancels out in later calculations. In this paper, our primary focus is to get the posterior distribution on k through summing over g; then, we choose a value for k and calculate k a and k b using equations (2) and (3). Now, we start to choose the prior P(g, k, T), which is often the most important and difficult task of the calculation in the case of Bayesian methods.

Prior P(g | k) on Community Partitions.
If we know the number of communities k in advanced, let us choose the prior P(g | k) on community partitions of one type of node. We first employ the most commonly used approach, which is described as follows. e prior on the community partition probabilities c is uniform under the constraint r c r � 1, where c r ∈ [0, 1], with which nodes are assigned to communities independently at random. We can get a particular community partition with the probability e values c r fall on a regular (k − 1)-dimensional simplex with volume (k − 1)!, so its probability density is P(c | k) � (k − 1)!. We integrate equation (11) over the simplex and get the following equation [4,23,24]: Since the process above generates a uniform distribution over possible community sizes, we then introduce an alternative and simpler way (used by Riolo et al. [4]) to derive the prior P(g | k). We have N + k − 1 k − 1 possible ways to choose k communities with k r�1 n r � N and N!/ r n r ! possible ways to place the nodes in the k communities; thus, any partition g of nodes to communities is given with the probability the same to equation (12) without the need for parameters c r . However, these two methods may generate partitions g with empty communities. As in [4], we have the binomial coefficient N − 1 k − 1 possible choices of k communities with nonempty ones and en, we allow for two different types of partitions and get Scientific Programming

Choice of the Number of Communities.
Some previous work has been done for the choices of prior P(k) over the number of communities itself, such as letting P(k) equal 1/N [23,24] or 1/k! [21]. We again follow [4] and take a different approach, in which community partitions g and the number of communities k can be generated synchronously. We use a queueing-type mechanism for processing community partitions g. For one type, such as type-a, we order the N a nodes uniformly at random and the first node is placed in community 1. en, we place each following node either (a) with probability 1 − q in the same community as the previous node or (b) with probability q in the next community; for another type, we repeat the process above. is process ensures that all communities generated are not empty.
For type-a, there are N a ! (N b ! for type-b) possible ways to order the nodes, so the probability of each one occurring is the same as 1/N a ! (1/N b ! for type-b). If k � k a + k b communities are generated finally, we must create k − 2 new communities (k a − 1 new type-a ones and k b − 1 new type-b ones). Because for one type each node except the first starts a new community with equal probability q, k communities with sizes n 1 , . . . , n k are generated with the probability: For each community of the same partition g, the nodes can be rearranged in r n r ! ways. us, any given partition is generated in the process with the probability: Given that P(g, k, T) � P(k, T)P(g | k, T), We let q � μ/(N − 1), where the expected number of new communities created is µ. en, In equation (19), ((1 − q) N )/(q 2 N a !N b !) has no effect on our result and cancels out in later calculations.
As in [4], we let µ � 1 and neglecting constants Now, equation (10) can be written as and here we allow for equations (9) and (20). Unfortunately, it is hard to sum over g since the sum has k N terms [4]. Instead, we approximate the distribution over k (k a and k b according to different types) by Markov chain Monte Carlo sampling.

Our Algorithm.
We design a Monte Carlo algorithm incorporated with the bipartite block model and prior probability distribution discussed above to apply the bipartite networks. We call our algorithm the bipartite network Monte Carlo algorithm (BMCA), and it is built on the unipartite network analysis of Riolo et al. [4]. Our algorithm fulfills the requirements of ergodicity and detailed balance [25]. ere are two types of steps used by BMCA: Type 1: moving one node from its current community to a different existing community. ere are again two types of processes in this type of rearrangement. In the processes of the first type, BMCA decreases the number of communities (k a or k b ) of the same type as the community whose last node moved, thereby decreasing the value of k by one. In the processes of the second type, the community the node moved from contains more than one node and the move here does not change the number of communities. Type 2: moving one node to a community newly created. e number of communities (k a or k b ) of the same type as the community the node moved from and the value of k increase by one.
e two types of steps described above make BMCA meet the requirement of ergodicity.
Detailed balance requires that the rate R(g, k, T ⟶ g ′ , k ′ , T ′ ) goes from a state (g, k, T) to another state (g ′ , k ′ , T ′ ) and the opposite meet where we allow for equation (10). From (20), we have

Scientific Programming
We consider R(g, k, T ⟶ g ′ , k ′ , T ′ ) as π(g, k, T ⟶ g ′ , k ′ , T ′ )α(g, k, T ⟶ g ′ , k ′ , T ′ ), where the previous part of the product represents the probability of proposing a move and the latter represents the probability chosen to satisfy the detailed balance condition for accepting the move. en, BMCA is described as follows: Input: the bipartite adjacency matrix B and the nodetype vector t i . Initial communities partition: using the process described in Section 2.2.2. Monte Carlo Sampling: (1) (a) In each step of BMCA, we carry out a rearrangement of type 1 with probability 1 − 1/(N − 1). If k � 2 (i.e., k a � 1 and k b � 1), we do nothing. Otherwise, when k > 2, first, we randomly select a community label r in the range 1, . . . , k. If the number of communities of type T r is more than one, then we randomly select a community of type T r labels s; otherwise, we turn to communities of another type, from which we randomly reselect a pair of communities, respectively, labels r and s. en, we randomly select one node from community r and move it to community s. e number k � k a + k b of total communities remains constant. (b) In the process, if community r becomes empty as a result that its last node is removed, the number of communities k decreases by one. In practice, for the type-a communities labels 1, . . . , k a , we can efficiently change the community k a to have label r and then change the community k to have label k a ; specially, if r � k a , we only perform the latter relabeling. In such a process, the number k a of communities of type-a decreases by 1 and the number k b of communities of type-b remains constant.
For type-b community labels k a + 1, . . . , k, we can efficiently change the community k to have label r; specifically, if r � k, no relabeling is necessary. In such a process, the number k b of communities of type-b decreases by 1 and the number k a of communities of type-a remains constant. e number k a � k a + k b of total communities decreases by 1.
(2) Otherwise, we carry out a rearrangement of type 2 with probability 1/(N − 1). We randomly select a community label r in the range 1, . . . , k. If there is only one node in community r, we do nothing.
Otherwise, if T r � type − a, we change community label k a + 1 to k + 1 and create a new empty community k a + 1. en, we randomly select a node from community r and move it to the newly created community k a + 1. During this process, the number k a of communities of type-a increases by 1 and the number k b of communities of type-b remains constant. If T r � type − b, we simply create a new empty community k + 1 and no relabeling is necessary. en, we randomly select a node from community r and move it to the newly created community k + 1; during this process, the number k b of communities of type-b increases by 1 and the number k a of communities of type-a remains constant. e number k � k a + k b of total communities increases by 1.
(3) We accept the rearrangement proposed above with acceptance probability [4]: (4) Repeat steps 1-3. Output: the posterior probabilities P(k a | A) and Rearrangements of any type are performed, and we always have the following equation (see Table 1): Taking into consideration equations (22) and (24), the detailed balance condition can be written as erefore, our algorithm satisfies the detailed balance with acceptance probability of equation (25) and will sample correctly from the distribution P(g, k, T | A).

Output of BMCA.
In our implementation, a given number of steps t i per node is performed in a Monte Carlo run on a bipartite network, and then we write approximate posterior probabilities: the times k showed out in a Monte Carlo run S .

k′, T′))/(P(g, k, T))
(using equation (23)) Scientific Programming P k a | A � the times k a showed out in a Monte Carlo run S , and for type-b, e most likely number of type-a communities in a bipartite network is k a with the biggest value P(k a | A), and the number of type-b communities k b can be given in the same way.
In order to avoid any bias in the results and improve the correctness of BMCA, instead of just using P(k a | A), we performed a given number M of Monte Carlo runs for a bipartite network. us, we obtained the average value of each P i (k a | A) as the final posterior probabilities: for type-a and for type-b. We take O (N + N a  *  N b ) to calculate the acceptance probability of each node move, so the time complexity of our

Example Application
Here, we demonstrate our algorithm on synthetic bipartite networks generated by the bipartite stochastic block model [6] and find that it works well in most cases. ere is often some noise in empirically observed networks because of both errors in the measurements and missing data [26]. erefore, we employ a mixing model to generate noisy synthetic networks for testing the robustness of our algorithm. In this model, we specify g, and the expected value of an edge is given by where the parameters ω planted and ω random are used to generate a pure planted community structure and no community structure, respectively; and the mixing parameter λ ∈ [0, 1] is used to control various levels of uniformly random noise. Following Larremore et al. [6], we let ω planted rs � m rs and ω random rs � κ r κ s /m, where θ i � d i /κ g i . We employ this model to create synthetic networks of an easy case with a homogeneous degree distribution and a difficult case with a heterogeneous degree distribution.
We performed M � 10 Monte Carlo runs for each network of S � 10000 steps per node and found that BMCA can determine the correct number of communities for synthetic networks and outperform the projection method in every case.

An Easy Case.
In the easy case, we use the model above to create the synthetic networks including four type-a communities and four type-b communities, i.e., k a � k b � 4, and each node has the same degree (i.e., the network with planted community structure has a homogeneous degree distribution). We let the number of each type of node equal to 1000, and all communities are equally sized as 250. en, we let m 1,5 � m 2,6 � m 3,7 � m 4,8 � 2500 (i.e., the total number of edges is 10000), and let the block structure matrix ω planted be defined with ω � 2500 (the symmetric entry has the same value). Moreover, the random structure matrix ω random can be defined with ω random i,j � 625 (i � 1, 2, 3, 4; j � 5, 6, 7, 8). Finally, with these specifications, we create networks of the easy case to test our algorithm (the code to create these networks can be downloaded from [27]).
As the mixing parameter λ increases, i.e., the level of noise is decreased, BMCA begins to estimate the correct number of communities for the network in an easy case when λ � 0.5; in addition, the fraction of correct communities number of the network calculated by BMCA increases as a whole (blue line in Figure 1). en, we use our method to derive the synthetic mixing networks generated with λ � 0.6 and λ � 0.65, indicated by red circles in Figure 1(a) and Figure 1(b) and showing the posterior probabilities of the number of communities in the network in Figure 2(a) and Figure 2(b). As shown in Figures 1 and 2, we are able to estimate the correct number of different types of communities when λ ≥ 0.65 with P BMCA ′ (k a � 4 | A) � 0.6 (P BMCA ′ (k b � 4 | A) � 0.6); then, when λ ≥ 0.8, the proposal probability of the correct number ) is always bigger than P PM ′ (k a � 4 | A)(P PM ′ (k b � 4 | A)), as shown in Figure 3.

A Difficult Case.
In the difficult case, the synthetic networks are created with two type-a communities (k a � 2) and three type-b communities (k a � 3), and the degree of each node is different (i.e., the network with planted community structure has a heterogeneous degree distribution). e communities are set with different sizes, and we divide 700 type-a nodes evenly into 2 communities {350, 350} and 300 type-b nodes into 3 communities {100, 150, 150}. Let m 1,3 � m 2,4 � 2500 and m 1,5 � m 2,5 � 1500; i.e., the total number of edges is 8000. en, the block structure matrix ω planted can be defined with ω Planted � 1500. e symmetric entry has the same value. Finally, with the specification above, we create networks of the difficult case to test our algorithm (the code to create these networks can be downloaded from [27]).
As shown by the blue line in Figure 3, when the level of noise is decreased, BMCA begins to estimate the correct number of communities for the network in a difficult case when λ � 0.5, and the posterior probabilities P ′ (k a � 2 | A) and P ′ (k b � 3 | A) increase sharply after λ ≥ 0.55. Especially, when λ ≥ 0.75, the proposal probability i.e., we are always able to determine the correct number of communities. We used our method to calculate the probability of the number of communities in the synthetic mixing networks generated with λ � 0.55 and λ � 0.6, indicated as red circles in Figure 3, and show the result in Figure 4. As seen from Figures 3 and  4, our method can estimate the correct number of communities k a � 2 and k b � 3 in the synthetic network when λ ≥ 0.6 for the difficult case. However, the projection method fails to estimate the correct number of communities even when λ � 1 and there is no noise, as shown by the red line in Figure 3.

Further Testing.
We tested our method on synthetic networks of two different sizes of community as the number of network communities increases. e networks were generated using the bipartite stochastic block model with   Table 2, where θ(i � 1, . . . , N) is heterogeneous as real-world networks and the mean node degree of the network for the figures is 10. e number of communities k a or k b estimated using BMCA was correct until the actual number of communities increased to about 7, which is about 14 for k. e results are shown in Figure 5.
e results show a tendency to underestimate the number of communities for higher actual numbers of communities, especially when the size of the community changes from 250 (Figures 5(a) and 5(b)) to 500 (Figures 5(c) and 5(d)) and that of the networks increase correspondingly. However, these calculations appear when BMCA is run with a random initialization partition of nodes to communities. When BMCA is started on the same network with community partitions corresponding exactly to the planted community structure (yellow triangles), we always find the accurate number of communities.
Even so, the underestimation of k a (or k b ) occurs not because the correct community partition fails to maximize the posterior probability P(k a | A) but rather because BMCA has not run for long enough to find the maximum. e method is theoretically sound, but when the number of possible community partitions k N increases very rapidly with k, the numerical calculation becomes too demanding [4]. We can possibly design a more efficient Monte Carlo algorithm to

Conclusions
In our paper, a new projection-free Bayesian inference method for determining the number of pure-type communities in a bipartite network has been introduced. First, we present the first principle derivation of a practical method, using the degree-corrected bipartite stochastic block model that is able to deal with networks with broad degree distributions, for estimating the number of pure-type communities of bipartite networks. Second, we propose a prior probability distribution over the partition of a bipartite network, with type parameter T ensuring that each community is pure type. ird, we design a Monte Carlo algorithm incorporated with our proposed method and prior probability distribution. We have illustrated the performance of the method with applications to a wide range of synthetic bipartite networks, including an easy case with homogeneous degree distributions and a difficult case with heterogeneous degree distributions. e results show that the proposed algorithm can determine the correct number of communities and perform better than the projection method, especially in networks with heterogeneous degree distributions.
However, our method underestimates the number of communities when the number of communities becomes large. e reason for this is due to the number of possible community partitions increasing very rapidly with the increase in the number of communities and because our algorithm has not run for long enough to find the posterior probability. us, our future work will focus on (i) finding a method that can more efficiently sample the posterior distribution over community partitions to correctly estimate a large number of communities in the network and (ii) extending applications on real-world data sets.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request. Our code and test data are available at http://my.shu.edu.cn/ Web_LWZZ.aspx?TID=2887

Conflicts of Interest
e authors declare that there are no conflicts of interest.  Figure 5: Tests of our algorithm on synthetic bipartite networks. In each subgraph, triangles represent results derived from Monte Carlo runs with the known correct partitions (ground truth initial conditions), while circles represent runs started with a random initial partition of nodes to community (random initial conditions).
Scientific Programming 11