Detecting groups of similar components in complex networks

We study how to detect groups in a complex network each of which consists of component nodes sharing a similar connection pattern. Based on the mixture models and the exploratory analysis set up by Newman and Leicht (Newman and Leicht 2007 {\it Proc. Natl. Acad. Sci. USA} {\bf 104} 9564), we develop an algorithm that is applicable to a network with any degree distribution. The partition of a network suggested by this algorithm also applies to its complementary network. In general, groups of similar components are not necessarily identical to the communities in a community network; thus partitioning a network into groups of similar components provides additional information about the network structure. The proposed algorithm can also be used for community detection when the groups and the communities overlap. By introducing a tunable parameter that controls the involved effects of the heterogeneity, we can also conveniently investigate how the group structure is coupled with the heterogeneity characteristics. In particular, an interesting example shows that a group partition can evolve into a community partition in some situations when the involved heterogeneity effects are tuned. The extension of this algorithm to weighted networks is discussed as well.


I. INTRODUCTION
As a concise abstract model, the concept of a network captures the most essential ingredients of a complex system, namely, its basic component units and their interaction configuration. This advantage (simple in form but powerful in modelling) has attracted intensive studies of complex networks in a wide spectrum of contexts, ranging from natural sciences to engineering problems and human societies [1,2,3]. Roughly speaking, the investigations fall mainly into two categories: one seeks the topological characteristics and their origins, and the other seeks to understand how they interact with the dynamical processes supported by the networks. It has been found that topological characteristics, such as the small-world [4] and scale-free [5] properties, are quite general; they are common features in a large set of networks from various fields. Moreover, they are closely related to the dynamical processes on the networks. Illuminating examples among many others include epidemic spreading, for which the surprising implications of the scale-free property have been well illustrated [6,7], and network synchronization, where the role played by the topology can be separated and appreciated by analyzing the master stability function [8]. Such progress has greatly enhanced our belief in the significance of identifying and detecting these important topological characteristics [1,2,3].
Community is another common topological feature that exists in many complex networks. Intuitively, a community refers to a set of nodes whose connections among themselves are denser than their connections to the nodes outside the set [9,10,11,12]. Community detection is very important in network studies, because communities usually govern certain functions, as seen in many biochemical networks [13] and social networks [14]. Communities also have important implications for the dynamical processes based on the networks, such as synchronization [15,16,17,18], percolation and diffusion [19,20,21,22]. In addition, in networks of large size, community structure may serve as a crucial guide for reducing the network, which is believed to be helpful in shedding light on the most essential properties of a complex system [23,24]. In view of the importance of the community structure, many studies have been devoted to the issue of community detection. (See Ref. [25] for a recent and comprehensive review.) Recently, attempts have also been made to extend the community detection methods developed in these studies to weighted networks [26,27] and directed networks [28,29].
However, community is not the only perspective for partitioning a network. For example, in a bipartite network, the best justified partition is to separate all the nodes into two groups such that nodes in one group only link to the nodes in the other. Indeed, partition perspectives other than that of community are necessary in order to have a better understanding of both the structures of complex networks and the dynamical processes they support, as shown in [30] by the study of synchronous motions on bipartite networks.
An insightful idea is to partition a network into groups where nodes in each group share a similar connection pattern. As the connection patterns are various and can vary from group to group, this group model is very general and powerful in representing many different types of structures in a network. This idea has a long history. It was first introduced in social science by Lorrain and White [31], where the nodes of similar connection pattern are referred to as being structurally equivalent. This idea has fruitfully led to the analysis of networks in social [32] and computer science based on block modelling. A recent review can be found in Ref. [33].
In a recent study [34], Newman and Leicht came up with a novel and general partition scheme based on this idea. It divides a network into groups of similar connection pattern. The most striking advantage of their scheme is that it can be used to seek a very broad range of types of structures in networks without any prior knowledge of the structures to be detected. In addition, the algorithm thus developed is ready to be used for both directed and undirected networks, and it is straightforward to generalize it to analyze weighted networks [35]. The efficiency of the algorithm is also high in terms of computational complexity. Recently, Ramasco and Mungan [36] have analyzed this method in detail and devised a generalized Newman and Leicht algorithm based on their study. Other than the Newman and Leicht algorithm and its variant [36], another intriguing and insightful scheme for partitioning a network into groups of similar connection pattern has also been developed based on information theory [37].
The Newman and Leicht theory assumes that in a group the total outgoing degree must be larger than zero [36]. This assumption limits the application of their theory. In order to overcome this limitation, it has been suggested in [36] to deal with the incoming degrees, outgoing degrees, and bidirectional degrees separately. In this paper, we show that by assuming that all nodes in a group share the same a priori probability to connect unidirectionally to a given node (see analysis in Sec. III), this problem can be solved straightforwardly. The algorithm we develop based on this assumption can be applied without any restriction on the degree distribution. Moreover, the partition of a network given by our algorithm can be shown to be exactly the same as that of its complementary network (see Sec. III). This is required by the definition of a group of similar connection pattern. Another advantage of our algorithm is that it allows an analysis of the heterogeneity effects, which reveals further useful information about the network structure. In addition to all of these, our algorithm shows clearly that it is the information on whether there is a link between two given nodes, rather than the link exclusively (if it exists between the two nodes), that contributes to the partition. The information that there is no link between two given nodes is equally important. This insight provides a new and different view for partitioning weighted networks. Our algorithm also inherits all the advantages of that by Newman and Leicht.
In the next section, we first review briefly the theory by Newman and Leicht, and then point out the extent of its applicability. Next, in Sec. III, we develop our algorithm based on the a priori probability assumption and discuss its properties. After that we present examples of various types of groups together with the analysis of two real networks. We discuss in Sec. IV the role played by the involved heterogeneity effects, and show how a group partition can depend on it by the example of the karate network [38]. Finally, before summarizing the results of this paper, we discuss in Sec. V how to extend our algorithm to weighted networks.

II. THE NEWMAN-LEICHT ALGORITHM (NLA)
In searching for structures in a network, a dilemma we often encounter is that we must specify in advance what structures we intend to look for, yet this information is usually unavailable before the structures have been found. As a result, what we eventually find may depend strongly on whether we have enough prior knowledge of the structures to be detected. To overcome this difficulty, Newman and Leicht [34] insightfully focused on the groups of similar connection pattern. In their theory, the connection pattern for a group is specified by sets of parameters to be determined. The information on these connection patterns is not required as input to the search algorithm thus designed; rather, the patterns are shaped during the search process (the running of the algorithm) and produced as outputs. What the algorithm finally provides is therefore not only the best way of grouping the nodes, but also the common connection pattern that nodes in each group share. They made this possible by harnessing probabilistic mixture models and the expectation-maximization algorithm [34]. As the groups of similar connection pattern are effective in modelling various structures in networks, their algorithm is very general and has a wide application spectrum.
The main points of the Newman and Leicht theory are as follows. (For the sake of convenience and clarity, we take the same notation as in [34] throughout this paper.) Let us consider a network of $n$ nodes belonging to $c$ groups. Its connection configuration is given by the adjacency matrix $A$: if there is a link from node $i$ to node $j$ then $A_{ij} = 1$; otherwise $A_{ij} = 0$. In the Newman and Leicht theory, $n$, $c$ and $A$ are assumed to be known and used as the input for their algorithm. Here the number of groups $c$ is the only information needed in advance about the partition. If it is unavailable, it should be assumed or estimated based on other known information about the network.
Next, the connection configuration $A$ is assumed to be a realization of an underlying statistical model defined by two sets of probabilities denoted by $\pi \equiv \{\pi_r\}$ and $\theta \equiv \{\theta_{rj}\}$, respectively, with $r = 1, \cdots, c$ and $j = 1, \cdots, n$. This statistical model assumes that each node has probability $\pi_r$ to fall in group $r$, and that all nodes in that group have the same probability, closely related to $\theta_{rj}$, to connect to a given node $j$. Here $\theta_{rj}$ is equivalent to the portion of the outgoing links of group $r$ that connect to node $j$. The outgoing links of group $r$ refer to the outgoing links that all nodes in group $r$ have. In this sense $\theta_r \equiv \{\theta_{rj}, j = 1, \cdots, n\}$ defines the connection pattern shared by all nodes in group $r$. As long as $\pi$ and $\theta$ are known, together with the adjacency matrix $A$ as measured data, one can obtain the probability for observing node $i$ in group $r$, namely $q_{ir} \equiv \Pr(g_i = r | A, \pi, \theta)$, and thus all the information about the group partition. Here $g_i$ represents the group to which node $i$ is regarded to belong in a certain partition; we use $q$ and $g$ to denote $\{q_{ir}\}$ and $\{g_i\}$ respectively.
Hence the key is to specify $\pi$ and $\theta$. Newman and Leicht assumed that the right values of the elements of $\pi$ and $\theta$ are those that maximize the likelihood to observe the connection configuration $A$ and a certain partition $g$, namely $\Pr(A, g | \pi, \theta)$, or equivalently those that maximize its logarithm
$$L = \ln \Pr(A, g | \pi, \theta). \qquad (2.1)$$
In this way, the problem is converted to a solvable model-fitting problem with the help of the maximum likelihood method [34]. The next task is then reduced to finding $\pi$ and $\theta$ that satisfy this requirement.
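To proceed further, Newman and Leicht adopted a crucial simplification: they suggested instead to maximize the average of $L$ over all possible partitions,
$$\bar{L} = \sum_g \Pr(g | A, \pi, \theta) \ln \Pr(A, g | \pi, \theta) = \sum_{i,r} q_{ir} \Big[ \ln \pi_r + \sum_j A_{ij} \ln \theta_{rj} \Big],$$
with
$$q_{ir} = \frac{\pi_r \prod_j \theta_{rj}^{A_{ij}}}{\sum_s \pi_s \prod_j \theta_{sj}^{A_{ij}}}. \qquad (2.7)$$
FIG. 1: (a) A directed bipartite network [36] and (b) a directed star. Of the two groups in the left network (a), one contains the left two nodes and the other contains the right two; of the two groups in the right network (b), one consists of the center node and the other consists of the rest. However, because one group in each network (the right two nodes in (a) and the peripheral nodes in (b)) has no outgoing links, the Newman and Leicht algorithm (NLA) fails to partition them correctly. By comparison, the APBEMA has no restriction on the degree distribution; it partitions these two networks without any ambiguity.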
Then $\pi$ and $\theta$ that maximize $\bar{L}$ were deduced in terms of $A$ and $q$ as
$$\pi_r = \frac{1}{n} \sum_i q_{ir}, \qquad (2.8)$$
$$\theta_{rj} = \frac{\sum_i A_{ij} q_{ir}}{\sum_i k_i q_{ir}}, \qquad (2.9)$$
where $k_i \equiv \sum_j A_{ij}$ denotes the outgoing degree of node $i$. Eqs. (2.7), (2.8) and (2.9) thus define the Newman-Leicht algorithm (NLA). It runs in an iterative way: at each step, the old values of the elements of $q$, $\pi$ and $\theta$ are substituted into the right hand side of these equations to generate their updated values. The converged result for $\theta$ then defines the connection patterns of the groups and that for $q$ suggests the grouping. In practice, the calculation converges rapidly. (We found that the convergence time goes as $\sim O(n^2)$ in all the networks we have analyzed with the NLA, including those that are not presented in this paper.) It should be noted that in getting Eqs. (2.8) and (2.9) the following constraints imposed on $\pi$ and $\theta$ have been taken into consideration:
$$\sum_r \pi_r = 1, \qquad (2.10)$$
$$\sum_j \theta_{rj} = 1. \qquad (2.11)$$
Indeed, the results given by Eqs. (2.8) and (2.9) satisfy these requirements. In addition, the results of Eq. (2.8) and Eq. (2.9) are consistent with the definitions of $\pi_r$ and $\theta_{rj}$. In particular, Eq. (2.9) makes it clear that $\theta_{rj}$ is the expected portion of the outgoing links of group $r$ that connect to node $j$.
The definition of $\theta_{rj}$ and the corresponding normalization condition imposed by Eq. (2.11) imply that the partition given by the NLA must be such that each group has at least one outgoing link [36]. This constraint limits the application range of the NLA. An example cited in [36] (see Fig. 2 in [36]) is a directed bipartite network, which is reproduced in Fig. 1(a). According to the definition of a group of similar connection pattern, this network should be partitioned into two groups such that one contains the left two nodes and the other contains the right two nodes. However, as the right group has no outgoing links, the NLA would suggest instead a partition into the upper two nodes and the lower two nodes, or the whole network as a single group [36]. Another example is the directed star shown in Fig. 1(b); the NLA partitions all nodes into one group, though from the viewpoint of similar connection pattern or symmetry we expect the center node to be in one group and the peripheral nodes in another.
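The limitation can be made concrete with a minimal numerical sketch. It assumes that the NLA updates take the standard Newman-Leicht form, $\pi_r = (1/n)\sum_i q_{ir}$ and $\theta_{rj} = \sum_i A_{ij} q_{ir} / \sum_i k_i q_{ir}$, and that the links of the directed star point from the center to the periphery. The function name `nla_m_step` and the five-node example are illustrative choices, not part of the original work.

```python
import numpy as np

def nla_m_step(A, q):
    """M-step of the NLA for a fixed soft assignment q (shape n x c)."""
    k = A.sum(axis=1)                  # outgoing degree k_i of each node
    pi = q.mean(axis=0)                # pi_r = (1/n) sum_i q_ir
    num = q.T @ A                      # sum_i q_ir A_ij, shape (c, n)
    den = q.T @ k                      # sum_i q_ir k_i, shape (c,)
    with np.errstate(divide="ignore", invalid="ignore"):
        theta = num / den[:, None]     # undefined (nan) when den == 0
    return pi, theta

# Directed star: node 0 (center) links to nodes 1..4 (periphery).
A = np.zeros((5, 5))
A[0, 1:] = 1.0

# The partition expected from the connection pattern: center vs. periphery.
q = np.zeros((5, 2))
q[0, 0] = 1.0
q[1:, 1] = 1.0

pi, theta = nla_m_step(A, q)
print(theta[0])   # well defined: the center's links spread over the periphery
print(theta[1])   # all nan: the peripheral group has no outgoing links
```

For the expected partition the denominator of $\theta_{rj}$ vanishes for the peripheral group, so the normalization $\sum_j \theta_{rj} = 1$ cannot be met and this partition is not an admissible solution of the NLA.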

III. A PRIORI PROBABILITY BASED EXPECTATION MAXIMIZATION ALGORITHM (APBEMA)
In this section we present an expectation maximization algorithm that does not have any restriction on the degree distribution of a group. In addition, it also has many other advantages which will be discussed in the following sections. Our method is in the same spirit as the NLA, but the statistical model of the group is different.
First let us suppose the network under consideration has n nodes that belong to c groups, and the connection configuration is given by the adjacency matrix A. Similarly, we assume n, c and A are known and serve as the input.
Next, as in the NLA, we assume that each node has probability $\pi_r$ to fall in group $r$. $\pi_r$ in effect reflects the size of group $r$, which is expected to be $n\pi_r$. As any node must be in the network, we have
$$\sum_r \pi_r = 1. \qquad (3.1)$$
However, to specify the connection pattern of a group, we take the a priori probability assumption instead. We assume that in a given group $r$ all its nodes share the same a priori probability, denoted by $\rho_{rj}$, to connect unidirectionally to a given node $j$. As such $\rho_{rj}$ should satisfy $0 \le \rho_{rj} \le 1$. We also assume that $\rho_{ri}$ is independent of $\rho_{rj}$ for $i \neq j$; namely, the probabilities for a node (in group $r$) to connect to two different nodes are completely independent. The normalization condition for $\rho_{rj}$ can be expressed as $\rho_{rj} + (1 - \rho_{rj}) = 1$, where $(1 - \rho_{rj})$ stands for the probability with which a node in group $r$ does not connect to node $j$. In contrast to the NLA, here we need not introduce a normalization condition like Eq. (2.11); $\rho_{rj}$ can take any allowed value ($0 \le \rho_{rj} \le 1$) independently. It is this flexibility and adaptability that makes our algorithm applicable in principle to any network.
Now we follow the NLA to develop the algorithm based on $\pi \equiv \{\pi_r\}$ and $\rho \equiv \{\rho_{rj}\}$. To avoid introducing new notation, we adopt all other symbols used in the NLA except $\theta$ and maintain their original meaning (with $\theta$ being replaced by $\rho$ where necessary). We refer to our algorithm as the a priori probability based expectation maximization algorithm (APBEMA) in the following. Our starting point is the conditional probabilities
$$\Pr(g | \pi, \rho) = \prod_i \pi_{g_i} \qquad (3.2)$$
and
$$\Pr(A | g, \pi, \rho) = \prod_{ij} \rho_{g_i,j}^{A_{ij}} (1 - \rho_{g_i,j})^{1 - A_{ij}}. \qquad (3.3)$$
It should be stressed that the right hand side of Eq. (3.3) accounts for not only the probability for the presence of a link ($A_{ij} = 1$) but also that for a null link ($A_{ij} = 0$), hence it faithfully reflects the conditional probability for observing the configuration given by $A$. As can be seen in the following, it also implies that null links are as important as links for partitioning a network, which agrees well with our intuition. Our next task is to find $\pi$ and $\rho$ that maximize
$$\bar{L} = \sum_g \Pr(g | A, \pi, \rho) \ln \Pr(A, g | \pi, \rho). \qquad (3.4)$$
It can be rewritten as
$$\bar{L} = \sum_{i,r} q_{ir} \Big\{ \ln \pi_r + \sum_j \big[ A_{ij} \ln \rho_{rj} + (1 - A_{ij}) \ln (1 - \rho_{rj}) \big] \Big\}. \qquad (3.5)$$
Here $q_{ir} \equiv \Pr(g_i = r | A, \pi, \rho)$, which is given by
$$q_{ir} = \frac{\pi_r \prod_j \rho_{rj}^{A_{ij}} (1 - \rho_{rj})^{1 - A_{ij}}}{\sum_s \pi_s \prod_j \rho_{sj}^{A_{ij}} (1 - \rho_{sj})^{1 - A_{ij}}}. \qquad (3.6)$$
Evidently, it satisfies the normalization condition $\sum_r q_{ir} = 1$ as required. Now we are ready to obtain $\pi$ and $\rho$ that maximize $\bar{L}$ with the only constraint $\sum_r \pi_r = 1$. We set
$$f = \bar{L} + \alpha \Big( 1 - \sum_r \pi_r \Big), \qquad (3.7)$$
with $\bar{L}$ being given by Eq. (3.5) and $\alpha$ the Lagrange multiplier introduced. By solving the equations
$$\frac{\partial f}{\partial \alpha} = 0, \quad \frac{\partial f}{\partial \pi_r} = 0, \quad \mbox{and} \quad \frac{\partial f}{\partial \rho_{rj}} = 0, \qquad (3.8)$$
we obtain
$$\pi_r = \frac{1}{n} \sum_i q_{ir} \qquad (3.9)$$
and
$$\rho_{rj} = \frac{\sum_i A_{ij} q_{ir}}{\sum_i q_{ir}}. \qquad (3.10)$$
The APBEMA is then defined by Eqs. (3.6), (3.9) and (3.10). Its iterative implementation is the same as that for the NLA, hence it has the same efficiency in terms of computational complexity. Also as in the NLA, the converged values of $\{q_{ir}\}$ suggest the partition, and those of $\{\rho_{rj}\}$ describe the connection patterns of the groups. It is worth noting that according to Eq. (3.10) $0 \le \rho_{rj} \le 1$, as expected. In addition, Eq. (3.10) is consistent with the meaning of $\rho_{rj}$, namely, the probability with which a node in group $r$ is unidirectionally linked to node $j$. This can be seen further from $\sum_j \rho_{rj}$, which represents the average outgoing degree of a node in group $r$. Indeed, according to Eq. (3.10)
$$\sum_j \rho_{rj} = \frac{\sum_i k_i q_{ir}}{\sum_i q_{ir}}. \qquad (3.11)$$
The right hand side of Eq. (3.11) is exactly the expected outgoing degree of a node in group $r$.
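For completeness, the step from the Lagrangian to the update rules can be written out. The sketch below assumes that the averaged log-likelihood has the form $\bar{L} = \sum_{i,r} q_{ir} \{ \ln \pi_r + \sum_j [A_{ij} \ln \rho_{rj} + (1 - A_{ij}) \ln (1 - \rho_{rj})] \}$ and that the constraint enters as $f = \bar{L} + \alpha (1 - \sum_r \pi_r)$; the sign convention chosen for $\alpha$ is an assumption and does not affect the result.

```latex
\begin{align*}
\frac{\partial f}{\partial \pi_r} &= \frac{1}{\pi_r}\sum_i q_{ir} - \alpha = 0
  \;\Rightarrow\; \pi_r = \frac{1}{\alpha}\sum_i q_{ir}, \\
\sum_r \pi_r = 1,\ \sum_r q_{ir} = 1 &\;\Rightarrow\; \alpha = n,
  \quad \pi_r = \frac{1}{n}\sum_i q_{ir}, \\
\frac{\partial f}{\partial \rho_{rj}} &= \sum_i q_{ir}
  \left[\frac{A_{ij}}{\rho_{rj}} - \frac{1 - A_{ij}}{1 - \rho_{rj}}\right] = 0
  \;\Rightarrow\; \rho_{rj} = \frac{\sum_i A_{ij}\, q_{ir}}{\sum_i q_{ir}}.
\end{align*}
```

Because each $\rho_{rj}$ appears in its own pair of terms, no normalization over $j$ is needed, which is where the derivation departs from that of the NLA.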
To summarize, our algorithm is based on the a priori probability assumption. It is this difference in meaning between $\rho_{rj}$ and $\theta_{rj}$ that makes the APBEMA radically different from the NLA despite their similarity in form.
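The iteration can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes the updates $q_{ir} \propto \pi_r \prod_j \rho_{rj}^{A_{ij}} (1 - \rho_{rj})^{1 - A_{ij}}$, $\pi_r = (1/n) \sum_i q_{ir}$ and $\rho_{rj} = \sum_i A_{ij} q_{ir} / \sum_i q_{ir}$, and the function name `apbema`, the random initialization, the fixed iteration count and the clipping constant `tiny` are all choices made here for illustration.

```python
import numpy as np

def apbema(A, c, n_iter=200, seed=0, tiny=1e-12):
    """Iterate the APBEMA updates from a random soft assignment."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.dirichlet(np.ones(c), size=n)      # q[i, r], rows sum to one
    for _ in range(n_iter):
        # Group weights: pi_r = (1/n) sum_i q_ir.
        pi = q.mean(axis=0)
        # Connection patterns: rho[r, j] = sum_i A_ij q_ir / sum_i q_ir.
        rho = (q.T @ A) / np.maximum(q.sum(axis=0), tiny)[:, None]
        rho = np.clip(rho, tiny, 1.0 - tiny)   # keep the logarithms finite
        # Assignment update in log space: links and null links both contribute.
        log_q = (np.log(np.maximum(pi, tiny))[None, :]
                 + A @ np.log(rho).T
                 + (1.0 - A) @ np.log(1.0 - rho).T)
        log_q -= log_q.max(axis=1, keepdims=True)
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)
    return q, pi, rho

# Directed bipartite network of Fig. 1(a): nodes 0, 1 link to nodes 2, 3.
A = np.zeros((4, 4))
A[:2, 2:] = 1.0
q, pi, rho = apbema(A, c=2)
labels = q.argmax(axis=1)
print(labels)
```

Like any expectation-maximization scheme, the iteration may end in a local maximum, so in practice one would probably repeat it from several initial conditions and keep the run with the largest $\bar{L}$.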

A. Properties of the APBEMA
The APBEMA developed previously has the following properties:
(i) Applicable without any restriction on the degree distribution. Even in the trivial and less meaningful example where the network contains some isolated nodes, the APBEMA can successfully assign them to one group, say group $r$, that is characterized by $\rho_{rj} = 0$. For the examples shown in Fig. 1, the APBEMA partitions them without any ambiguity in the sense that the output values of $\rho_{rj}$ and $q_{ir}$ are all virtually zero or one. For the directed bipartite network shown in Fig. 1(a) it suggests the left two nodes in one group and the right two in another, while for the directed star (Fig. 1(b)) it separates the center node from the rest, just as expected. (To apply the APBEMA to these two networks, the number of groups has been assumed to be $c = 2$.)
(ii) Suggesting the same partition for the complementary network. By the complementary network of a network specified by the adjacency matrix $A$, we mean the network which has the same nodes but whose adjacency matrix $A'$ is related to $A$ via $A'_{ij} = 1 - A_{ij}$. Namely, a link in network $A$ is a null link in its complementary network $A'$ and vice versa. Obviously, a group $r$ in $A$ characterized by $\{\rho_{rj}\}$ ($j = 1, \cdots, n$) is still a group in $A'$ with $\{\rho'_{rj} = 1 - \rho_{rj}\}$ according to the definition of a group. Hence an algorithm aiming at identifying the groups of similar connection pattern should suggest the same partition for both a network and its complementary network. This is the case for the APBEMA, which is guaranteed by Eqs. (3.9) and (3.10). This symmetry also implies that null links play the same important role as links in partitioning a network. A further discussion will be given in Sec. V.
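(iii) Applicable to both directed and undirected networks. Although the APBEMA we obtain here is for directed networks, it can be extended without any modification in form to undirected networks. The argument is similar to that given in [34]: In an undirected network, $\rho_{rj}$ is still the probability for a node in group $r$ to connect to node $j$; the probabilities that there is and that there is no link between node $i$ and node $j$ are $\rho_{g_i,j} \rho_{g_j,i}$ and $(1 - \rho_{g_i,j})(1 - \rho_{g_j,i})$, respectively, so that
$$\Pr(A | g, \pi, \rho) = \prod_{i<j} \big[ \rho_{g_i,j} \rho_{g_j,i} \big]^{A_{ij}} \big[ (1 - \rho_{g_i,j})(1 - \rho_{g_j,i}) \big]^{1 - A_{ij}} = \prod_{i \neq j} \rho_{g_i,j}^{A_{ij}} (1 - \rho_{g_i,j})^{1 - A_{ij}}, \qquad (3.12)$$
which has the same form as Eq. (3.3). ($A_{ij} = A_{ji}$ has been used.) Other derivations are then exactly the same as in the directed case.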
(iv) Powerful in accounting for the heterogeneity effects on grouping. The APBEMA allows us to prescribe the involved heterogeneity effects of the outgoing degree distribution. This can be done by introducing a tunable parameter into the APBEMA. With this extension, we can study in a controlled way how the degree heterogeneity may affect the grouping results. In situations where we wish to bias the heterogeneity effects on the grouping, this extended algorithm would be superior. This algorithm will be discussed in detail in Sec. IV.
(v) Applicable to weighted networks. With a straightforward extension, the APBEMA can also be used to analyze weighted networks. A detailed discussion will be presented in Sec. V.
(vi) The same efficiency as the NLA in terms of computational complexity.
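Property (ii) can be checked numerically for a single iteration. The sketch below assumes the APBEMA updates $\pi_r = (1/n)\sum_i q_{ir}$, $\rho_{rj} = \sum_i A_{ij} q_{ir} / \sum_i q_{ir}$ and $q_{ir} \propto \pi_r \prod_j \rho_{rj}^{A_{ij}} (1 - \rho_{rj})^{1 - A_{ij}}$; the helper names `m_step` and `e_step`, the random test network and the clipping constant are illustrative assumptions. Following the definition $A'_{ij} = 1 - A_{ij}$ literally, the diagonal entries are complemented as well.

```python
import numpy as np

def m_step(A, q):
    """Update pi and rho for a given soft assignment q (n x c)."""
    pi = q.mean(axis=0)
    rho = (q.T @ A) / q.sum(axis=0)[:, None]
    return pi, rho

def e_step(A, pi, rho, tiny=1e-12):
    """Update q, evaluated in log space for numerical stability."""
    rho = np.clip(rho, tiny, 1.0 - tiny)
    log_q = (np.log(pi)[None, :]
             + A @ np.log(rho).T
             + (1.0 - A) @ np.log(1.0 - rho).T)
    log_q -= log_q.max(axis=1, keepdims=True)
    q = np.exp(log_q)
    return q / q.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
n, c = 8, 2
A = (rng.random((n, n)) < 0.3).astype(float)   # arbitrary directed network
A_comp = 1.0 - A                               # A'_ij = 1 - A_ij
q0 = rng.dirichlet(np.ones(c), size=n)         # arbitrary starting point

pi, rho = m_step(A, q0)
pi_c, rho_c = m_step(A_comp, q0)
q1 = e_step(A, pi, rho)
q1_c = e_step(A_comp, pi_c, rho_c)

print(np.allclose(rho_c, 1.0 - rho))   # rho' = 1 - rho
print(np.allclose(q1, q1_c))           # identical updated assignment
```

Since one full update maps the same $q$ to the same new $q$ for $A$ and $A'$, the two iterations stay identical at every step when started from the same initial condition.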

B. Examples
To show how well the APBEMA works, we present in this subsection several typical examples. Just as in the NLA, besides the adjacency matrix $A$ we also need to set the number of groups, $c$, as another input. For all the examples throughout this paper we assume that this information is known. In particular, we set $c = 2$ in all examples except for the American college football teams, where $c = 12$ is assumed.
FIG. 2: Error rate of the APBEMA for a network consisting of two sets of nodes. In each set the nodes are randomly connected with the average intra-group degree $k_{\rm intra} = 13$, and between the two sets the links are randomly placed with the average inter-group degree $k_{\rm inter}$. The error rate of the APBEMA is shown as a function of $k_{\rm inter}$. The two sets are successfully recognized for $k_{\rm intra} \gg k_{\rm inter}$ and $k_{\rm intra} \ll k_{\rm inter}$, when the group structure is clear.
The first example is a homogeneous undirected network. We simply divide $n$ nodes into two sets of equal size, and in each of them nodes are randomly connected with the average intra-group degree $k_{\rm intra}$. After that the inter-group links are randomly added with the average inter-group degree $k_{\rm inter}$. Obviously, these two sets are two groups according to the definition, and when $k_{\rm intra} \gg k_{\rm inter}$ ($k_{\rm intra} \ll k_{\rm inter}$) they are assortatively (disassortatively) connected. In practice, the larger the difference between $k_{\rm inter}$ and $k_{\rm intra}$, the clearer the group structure and the easier it should be to detect the groups.
The results for $n = 60$ and $k_{\rm intra} = 13$ as a function of $k_{\rm inter}$ are summarized in Fig. 2. We find that the APBEMA works well: it successfully identifies both the assortatively and the disassortatively linked groups when their structures are clear. If $k_{\rm intra}$ and $k_{\rm inter}$ are too close it fails, just as expected.
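A test network of this kind can be generated as follows. The sketch assumes that "randomly connected" means that every pair of nodes is linked independently, with probabilities chosen so that the expected intra-group and inter-group degrees equal the prescribed values; the function name `two_set_network` and the value $k_{\rm inter} = 2$ are illustrative.

```python
import numpy as np

def two_set_network(n, k_intra, k_inter, seed=0):
    """Undirected network of two equal sets with prescribed mean degrees."""
    rng = np.random.default_rng(seed)
    half = n // 2                                  # n is assumed to be even
    labels = np.repeat([0, 1], half)
    same = labels[:, None] == labels[None, :]
    # Link probabilities chosen so the expected degrees match the targets.
    p = np.where(same, k_intra / (half - 1), k_inter / half)
    upper = np.triu(rng.random((n, n)) < p, k=1)   # draw each pair once
    A = (upper | upper.T).astype(int)
    return A, labels

A, labels = two_set_network(n=60, k_intra=13, k_inter=2)
same = labels[:, None] == labels[None, :]
print("mean intra-degree:", (A * same).sum() / 60)
print("mean inter-degree:", (A * ~same).sum() / 60)
```

The measured mean degrees fluctuate around the targets from one realization to the next, which is why averaging over many realizations is useful when an error rate is estimated.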
It is interesting to note that when $k_{\rm intra} \gg k_{\rm inter}$ the two groups can be seen as two communities. This fact suggests that in the cases when groups and communities overlap with each other in a network the APBEMA can be used to detect communities as well. Given this, it is expected that for $k_{\rm intra} \ll k_{\rm inter}$, when the network becomes bipartite-like, the APBEMA works equally well. This is because the complementary network in this case is a community network and, as pointed out in the last subsection, the APBEMA is symmetric for a network and its complementary network. Indeed, this symmetry manifests itself clearly in the error rate curve presented in Fig. 2.
FIG. 3: The error rate (solid dots) of the group detection by the APBEMA in identifying a fully connected clique of $n_c = 7$ nodes immersed in a randomly connected background of 63 nodes, as a function of the average degree $k_{\rm BG}$ of the background. For $k_{\rm BG} < n_c$ the APBEMA works very well (the error rate is smaller than 10%), and the error rate due to wrongly partitioning the clique nodes into the background (open squares) is small and can be neglected. In this case the error rate is mainly contributed by wrongly partitioning the background nodes into the clique as a result of fluctuations in building the network.
To measure the error of group detection, we define the error rate $\epsilon$ as the sum of the fractions of nodes wrongly partitioned into the opposite group:
$$\epsilon = \frac{\delta n_{12}}{n_1} + \frac{\delta n_{21}}{n_2}, \qquad (3.13)$$
where $n_1$ ($n_2$) is the number of nodes in the first (second) group and $\delta n_{12}$ ($\delta n_{21}$) is the number of nodes that belong to group 1 (2) but are assigned to group 2 (1) by the algorithm. If the nodes are randomly assigned to each group, or all nodes are simply regarded as belonging to a single group, the error rate so defined takes the value one and implies a complete detection failure. It is zero only when all the nodes are correctly grouped. To suppress the fluctuations, for every data point presented in Fig. 2 we have averaged the error rates evaluated over 1000 realizations of the network. We have also checked that with other definitions of the detection error, for example that used in Refs. [39,40,41], which is based on the normalized mutual information, the results are qualitatively the same. This is also the case for all other examples throughout this paper where the error rate is evaluated.
In our second example the groups are connected in a way that is neither purely assortative nor purely disassortative. First we build a random homogeneous and undirected network of $n$ nodes with the average degree $k_{\rm BG}$; then we choose from them $n_c \ll n$ nodes randomly and fully connect them to form a clique. We then have two sets of nodes: the clique, whose nodes have an average degree $(n_c - 1) + (1 - n_c/n) k_{\rm BG}$, and the set consisting of the remaining nodes, which we call the background, whose nodes have an average degree $k_{\rm BG}$. We restrict ourselves to the case $k_{\rm BG} \ll n_c$, namely, the degrees of the nodes in the clique are much larger than those in the background, thus making the clique stand out clearly from the background. Hence the network under consideration is in fact highly heterogeneous.
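The construction of the clique example, together with the quoted average degree of the clique nodes, can be illustrated with a short sketch. It assumes that the random background is built by linking every pair of nodes independently with probability $k_{\rm BG}/(n - 1)$; the function name `clique_in_background`, the value $k_{\rm BG} = 3$ and the number of realizations are illustrative choices.

```python
import numpy as np

def clique_in_background(n, n_c, k_bg, seed=0):
    """Random undirected background with a fully connected clique of n_c nodes."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < k_bg / (n - 1), k=1)
    A = (upper | upper.T).astype(int)
    clique = rng.choice(n, size=n_c, replace=False)
    A[np.ix_(clique, clique)] = 1      # fully connect the chosen nodes
    np.fill_diagonal(A, 0)             # no self-links
    is_clique = np.zeros(n, dtype=bool)
    is_clique[clique] = True
    return A, is_clique

n, n_c, k_bg = 70, 7, 3
degrees = []
for seed in range(200):
    A, is_clique = clique_in_background(n, n_c, k_bg, seed)
    degrees.append(A[is_clique].sum(axis=1).mean())
print("measured :", np.mean(degrees))
print("estimate :", (n_c - 1) + (1 - n_c / n) * k_bg)
```

A clique node keeps its $n_c - 1$ links inside the clique and, on average, the fraction $1 - n_c/n$ of its background links that point outside the clique, which is the origin of the expression $(n_c - 1) + (1 - n_c/n) k_{\rm BG}$.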
It should be pointed out that in this case the communities occasionally formed in the background due to fluctuations [42] can be neglected, and according to the definition the clique and the background are two groups, since the nodes within each of them share the same connection pattern, which can be appropriately specified in terms of $\{\rho_{rj}\}$. Furthermore, this network is neither assortative nor disassortative; it is not a community network either, because the background nodes are connected among themselves as densely as they are connected to the clique nodes. In Fig. 3 the partition results by the APBEMA for $n = 70$ and $n_c = 7$ are shown against the average degree of the background nodes, $k_{\rm BG}$. It can be seen that for $k_{\rm BG} \ll n_c$ it gives the correct partition perfectly. In fact, the APBEMA works well all the way up to $k_{\rm BG} \sim n_c$ with the error rate smaller than 10%. As $k_{\rm BG}$ is increased further the clique becomes less distinct from the background, and the fluctuations in the background begin to play a role. As a result the error rate starts to increase quickly. Further investigations show that for $k_{\rm BG} < n_c$ the detection error due to wrongly partitioning the clique nodes into the background (open squares in Fig. 3), namely $\delta n_{12}/n_1$ in Eq. (3.13) (subscript 1 (2) indicates the clique (background)), is very small and can be safely neglected. The detection error is mainly contributed by wrongly partitioning background nodes into the clique in certain network realizations where, due to fluctuations, these background nodes happen to have a higher degree and more links connecting to the clique nodes. On average the total number of wrongly partitioned nodes (mainly from the background to the clique) is about 0.11, 0.39, 0.88 and 1.6 for $k_{\rm BG} = 1$, 2, 3 and 4 respectively. In this calculation 1000 realizations of the network are again used to average the error rate.
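The two contributions to the detection error can be separated with a small helper. The sketch assumes the error rate is defined as $\epsilon = \delta n_{12}/n_1 + \delta n_{21}/n_2$ and that the group labels are 0 and 1; because the labels returned by a partitioning algorithm are arbitrary, both labelings are tried and the smaller total is kept, which is an assumption of this sketch rather than something stated in the text. The function name `error_rate` is illustrative.

```python
import numpy as np

def error_rate(true_labels, found_labels):
    """Return (epsilon, dn12/n1, dn21/n2) for a two-group partition."""
    true_labels = np.asarray(true_labels)
    found_labels = np.asarray(found_labels)
    in1 = true_labels == 0
    in2 = true_labels == 1
    best = None
    # Try both ways of matching the found labels to the true groups.
    for found in (found_labels, 1 - found_labels):
        e12 = np.mean(found[in1] == 1)   # group-1 nodes placed in group 2
        e21 = np.mean(found[in2] == 0)   # group-2 nodes placed in group 1
        if best is None or e12 + e21 < best[0]:
            best = (e12 + e21, e12, e21)
    return best

truth = np.array([0] * 7 + [1] * 63)          # clique first, then background
found = truth.copy()
found[10] = 0                                  # one background node misplaced
print(error_rate(truth, found))                # only the second term is nonzero
print(error_rate(truth, np.zeros(70, int)))    # single group: epsilon = 1
```

With the clique as group 1, the second returned value corresponds to the open squares of Fig. 3 and the third to background nodes wrongly placed in the clique.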
The network studied in this example could be relevant for studying some real networks containing cliques. The success of the APBEMA is a good indication of the flexibility and adaptability of the a priori probability assumption, and suggests that the APBEMA may find some unique applications in certain partition problems.
In general, in a community network the nodes in a community may not share the same connection pattern. In such cases the group partition can be different from the community partition. Such an example will be discussed in the next section. However, in the cases where they do share the same connection pattern, or approximately do, our algorithm can be used to find the community structure. This has been seen in the first example (Fig. 2) when the two groups are assortatively connected. In the following we show two examples of real community networks where the partition result given by our algorithm is in good agreement with the community partition.
FIG. 4: The dolphin social network [43,44]. Nodes denoted by solid squares and solid dots represent the two disjoint subdivisions into which the network split [45] after the departure of a key member, SN100 (open dot). The dashed line is the group partition suggested by the APBEMA corresponding to the largest value of $\bar{L}$; it assigns nodes SN89 and PL to the opposite subdivision but all other nodes to their own subdivisions. This is one real network example where the APBEMA can be used to detect the community structure.
The first one is a network of bottlenose dolphins [47] living in Doubtful Sound, New Zealand [43,44,45], which is composed of 62 dolphins (nodes) and 159 social ties (edges). It was assembled by researchers over a number of years (Fig. 4). During the course of the investigation of this network, it split into two disjoint subdivisions [45] of unequal size (represented by solid squares and solid dots in Fig. 4 respectively) following the departure of a key member named SN100 (denoted by the open dot in Fig. 4). The group partition provided by the APBEMA corresponding to the largest value of $\bar{L}$ agrees very well with the natural splitting, except for two nodes named PL and SN89.
The second example is the network of the American college football teams [46]. The network is a map of the schedule of Division I games for the 2000 season, where 115 nodes represent the teams and 616 edges represent regular-season games between the two teams they connect [46]. All 115 teams are organized into 12 conferences, each of which contains about 8-12 teams. As games are usually more frequent between members of the same conference than between members of different conferences, most conferences can be seen as communities. But because there are a few conferences whose teams played more, or nearly as many, games against teams in other conferences than against teams in their own conference, the network structure does not reflect the genuine conference structure perfectly [46]. The partition suggested by the APBEMA is presented in Fig. 5. (The number of groups is assumed to be $c = 12$ as input.) It can be seen that the suggested group structure coincides fairly accurately with the conference structure. In particular, five groups (the top five) are completely the same as the corresponding conferences without any nodes wrongly assigned to/from other conferences, and five others have only one or two nodes assigned to/from other conferences. The most obvious mismatch lies in the partition of the conference "IA independence". Its members, Central Florida, Connecticut, Navy, Notre Dame and Utah State (denoted by stars in Fig. 5), are assigned to other groups rather than to their own. Considering the fact that they have more games in the conferences they are assigned to than in their own, this is reasonable and to some extent expected.
FIG. 5: The network of the American college football teams [46]. The nodes denoted by the same symbols belong to the same conference. The grouping result produced by the APBEMA with the assumed group number $c = 12$ is represented by the clusters. Stars stand for the members of the "IA independence" conference, which are scattered due to their sparser internal connections. In this case the groups given by the APBEMA coincide with the communities very well despite the scattering of the "IA independence" conference. This is another example, in addition to the dolphin network (see Fig. 4), where the APBEMA can be used to detect the community structure.
To summarize this subsection, the APBEMA performs well in identifying various structures in a network. More examples and further discussions of the presented ones will be given in the following sections.

IV. EFFECTS OF HETEROGENEITY ON GROUPING
In this section we study how the degree heterogeneity may affect the grouping results. Theoretically this problem is interesting as it is related to a general issue in network study, namely, whether and how two different types of topological characteristics are coupled. Obviously, in the APBEMA the coupling between the degree distribution and the group structure is inherent: the APBEMA suggests the grouping based on the connection patterns it recognizes, but the connection patterns are in turn evaluated based on the outgoing degrees. The close relation between the connection patterns (given by $\{\rho_{rj}\}$) and the outgoing degrees, $\{k_i\}$, can be seen clearly in Eq. (3.11).
The next question is then how the APBEMA captures the degree heterogeneity. A key observation is that the APBEMA models the network in a coarse-grained way. It uses the groups as 'patches' to represent different parts of the network; hence in effect the network is characterized at two different levels. At the lower level, namely inside each group, the APBEMA assumes that all nodes are identical and statistically independent. Therefore the structure of a group, and its degree distribution as well, is assumed to be homogeneous. So at this level the heterogeneity is not captured by the APBEMA, which can be seen as a simplification adopted by the algorithm. The deviation of the outgoing degree of a node from its expected value in a group (i.e. $\sum_j \rho_{rj}$, see Eq. (3.11)) is treated by the APBEMA as a result of statistical fluctuations.
However, at the level of groups the APBEMA is flexible. It allows the statistical characteristics of the groups to vary from group to group so that the local structures of the network are given the best matching. Therefore it is at this level that the heterogeneity is taken into account by the APBEMA. With this understanding we may imagine that the APBEMA tries to mimic the degree distribution function with a series of peak-like functions. Each peak-like function corresponds to a homogeneous degree distribution in a group, and its position represents the average outgoing degree of the group.
Hence if the network is heterogeneous, then the heterogeneity would be characterized by the distances between these peaks. A good example is the network studied in Fig. 3; its degree distribution function happens to consist of two narrow peaks representing the clique and the background. The distance between them tells directly how heterogeneous the whole network is. For a more general degree distribution function, though it is hard to infer all the information about the heterogeneity from the distances between these peak-like functions, they are still a good indicator of it. The opposite extreme is the case of homogeneous networks, see for example the one presented in Fig. 2, where all these peak-like functions overlap with each other and the distances between them are all zero.
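The picture of group-wise peaks can be made concrete with a small sketch. The code below is illustrative only: it assumes a hard assignment of nodes to groups (the APBEMA itself works with soft memberships), and the function name `group_mean_out_degrees` and the toy adjacency matrix are hypothetical. It computes the average outgoing degree of each group, i.e. the position of each peak, and the largest gap between peaks as a crude indicator of heterogeneity.

```python
import numpy as np

def group_mean_out_degrees(A, labels):
    """Average outgoing degree of each group (the 'peak positions').

    A      : (n, n) 0/1 adjacency matrix, A[i, j] = 1 for a link i -> j
    labels : length-n integer array, labels[i] = group of node i
             (a hard assignment, assumed here for simplicity)
    """
    A = np.asarray(A)
    labels = np.asarray(labels)
    k_out = A.sum(axis=1)                      # outgoing degree of each node
    groups = np.unique(labels)
    return {int(r): float(k_out[labels == r].mean()) for r in groups}

# Toy example: a 3-node clique (group 0) plus 3 sparsely linked nodes (group 1).
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
])
labels = np.array([0, 0, 0, 1, 1, 1])

peaks = group_mean_out_degrees(A, labels)
# Distance between the highest and lowest peak: zero for a homogeneous network.
peak_gap = max(peaks.values()) - min(peaks.values())
print(peaks, peak_gap)
```

For a homogeneous network all group averages coincide and `peak_gap` vanishes, which corresponds to the overlapping peaks described above.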
What we have learned here implies that if we can appropriately preset the positions of these peak-like functions, namely the average outgoing degrees of the groups, then we can influence the way the APBEMA takes the heterogeneity effects into account. Our aim in this section is to develop such an algorithm. For example, if all the average outgoing degrees are taken to be equal, then we have in effect completely suppressed the heterogeneity effects. This extreme case will be discussed in the first subsection below. The APBEMA discussed in Sec. III takes the heterogeneity effects into account as fully as it can, so it stands as the other extreme. In the second subsection we will discuss how to introduce a control parameter to build an interpolating algorithm such that the heterogeneity effects involved can be tuned continuously between these two extremes. Then we will show in the third subsection, using the example of the karate network [38], how the heterogeneity plays its role in grouping. A comparison with the dolphin network will reveal an interesting underlying structural difference between the two networks.

A. The heterogeneity suppressed algorithm (HSA)
As discussed in Sec. III, $\sum_j \rho_{rj}$ gives the expected outgoing degree for a node in group r. If we assume that all the nodes, regardless of which group they belong to, have the same expected outgoing degree, then $\sum_j \rho_{rj}$ should satisfy

$$\sum_j \rho_{rj} = d^{\rm out}, \qquad (4.1)$$

where $d^{\rm out} \equiv \frac{1}{n}\sum_{i,j} A_{ij}$ is the average outgoing degree over the whole network. With this consideration, we can build up a grouping algorithm in which the effect of heterogeneity is completely suppressed. First we start from Eqs. (3.2) and (3.3) and again obtain L as in Eq. (3.5) and $q_{ir}$ as in Eq. (3.6). Then we can get π and ρ under the constraint $\sum_r \pi_r = 1$ and those imposed by Eq. (4.1) by setting

$$f(\pi, \rho, \alpha, \beta) = L - \alpha\Big(\sum_r \pi_r - 1\Big) - \sum_r \beta_r \Big(\sum_j \rho_{rj} - d^{\rm out}\Big)$$

and requiring the partial derivatives of f with respect to its variables to be zero. Here α and $\beta \equiv \{\beta_r\}$ serve as the Lagrange multipliers of the constraints. This leads to Eqs. (4.2)-(4.5). We refer to the algorithm defined by Eqs. (4.2)-(4.5) as the heterogeneity suppressed algorithm (HSA). As expected, if we set all $\beta_r$ to zero, then the APBEMA is retrieved. Compared with the APBEMA, the change in the form of the HSA caused by β makes its implementation different: here two cycles of iteration, an outer one and an inner one, are involved. At each step of the outer cycle, we update q and π via Eqs. [...]

B. The heterogeneity weighted algorithm (HWA)
Now we have two extreme algorithms at hand: in one (the APBEMA) the heterogeneity is given full consideration and in the other (the HSA) it is completely suppressed. Inspired by the way we constructed the HSA, we realize that an 'interpolating' algorithm bridging the two extremes can be created by introducing a tunable parameter w into Eq. (4.1) such that

$$\sum_j \rho_{rj} = \xi_r(w) \equiv (1-w)\, d^{\rm out} + w\, d^{\rm out}_r. \qquad (4.6)$$

Now $\xi_r(w)$ is the average outgoing degree we impose on group r, and the parameter w prescribes the weight of the heterogeneity. For w = 0, $\xi_r(w=0) = d^{\rm out}$, so no difference in the expected outgoing degrees between the groups is considered; Eq. (4.6) is then reduced to Eq. (4.1). For w = 1, $\xi_r(w=1) = d^{\rm out}_r$, which is exactly the average outgoing degree of group r when the heterogeneity is fully considered; Eq. (4.6) is then reduced to Eq. (3.11). For other values of w (0 < w < 1) the average outgoing degree $\xi_r(w)$ takes the linearly interpolated value between $\xi_r(w=0)$ and $\xi_r(w=1)$.
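A minimal numerical sketch of the imposed degrees may help here. It assumes that the imposed average outgoing degree of group r is the linear interpolation between the network-wide average and the group average, as described in the text, and that the group average is computed from soft memberships as $\sum_i q_{ir} k_i / \sum_i q_{ir}$, which is our reading of Eq. (3.11). The function name `target_degrees` and the toy data are hypothetical.

```python
import numpy as np

def target_degrees(A, q, w):
    """Imposed average outgoing degree xi_r(w) for every group r.

    A : (n, n) adjacency matrix
    q : (n, c) soft memberships, q[i, r] = probability that node i is in group r
    w : weight of the heterogeneity, 0 <= w <= 1
    """
    A = np.asarray(A, dtype=float)
    q = np.asarray(q, dtype=float)
    k_out = A.sum(axis=1)                       # outgoing degrees k_i
    d_out = A.sum() / A.shape[0]                # network-wide average degree
    # Group averages; assumed form of Eq. (3.11): sum_i q_ir k_i / sum_i q_ir
    d_out_r = (q * k_out[:, None]).sum(axis=0) / q.sum(axis=0)
    # Linear interpolation between the two extremes.
    return (1.0 - w) * d_out + w * d_out_r

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
q = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

xi_0 = target_degrees(A, q, 0.0)     # heterogeneity suppressed: all groups equal
xi_1 = target_degrees(A, q, 1.0)     # heterogeneity fully considered
xi_half = target_degrees(A, q, 0.5)  # halfway between the two
```

At w = 0 every group is assigned the same target, and at w = 1 each group keeps its own average, so the two limiting algorithms are recovered at the ends of the interval.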
Following the same derivation as for the HSA, the solutions for π and ρ under the constraints $\sum_r \pi_r = 1$ and $\sum_j \rho_{rj} = \xi_r(w)$ are still given by Eqs. (4.2)-(4.4), but $\beta_r$ now takes a modified, w-dependent form instead. It is easy to show that for w = 0 it reduces to Eq. (4.5) and the HSA is retrieved, and that for w = 1, as $\beta_r = 0$, we have the APBEMA again. For 0 < w < 1 we thus have an intermediate algorithm in which only partial effects of heterogeneity are considered; hence in effect it is a heterogeneity weighted algorithm (HWA). By changing w one can therefore conveniently adjust the degree of heterogeneity involved and investigate how it may affect the grouping results. The numerical implementation of this algorithm is the same as that of the HSA.

As a trivial test, this heterogeneity weighted algorithm has been applied to the example in Fig. 2. As it is a homogeneous network, we can expect that weighting the heterogeneity will not produce any effect; namely, the partition results shown in Fig. 2 do not depend on w. Another trivial test is the clique-background network studied in Fig. 3. As in this example the groups are characterized by their own average degrees, we may expect that suppressing the heterogeneity effects blurs the distinction between the two groups and hence causes the detection to deteriorate. These conjectures have been fully verified by our simulations (the data are not shown here).
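The constrained update of ρ for a single group, with its Lagrange multiplier, can be sketched as follows. This is not a transcription of Eqs. (4.2)-(4.5). It assumes that, for fixed memberships, the part of L that depends on the connection pattern of group r has the Bernoulli-type form $\sum_j [a_j \ln \rho_{rj} + (S - a_j)\ln(1-\rho_{rj})]$ with $a_j = \sum_i q_{ir} A_{ij}$ and $S = \sum_i q_{ir}$. The bisection on the multiplier is our own choice for the inner cycle and may differ from the authors' implementation; `constrained_rho` is a hypothetical name.

```python
import numpy as np

def constrained_rho(a, S, xi, tol=1e-12, max_iter=200):
    """Maximize sum_j [a_j ln rho_j + (S - a_j) ln(1 - rho_j)]
    subject to sum_j rho_j = xi, using a Lagrange multiplier beta.

    a  : length-n array, a_j = sum_i q_ir A_ij (assumed definition)
    S  : float, S = sum_i q_ir (effective size of group r)
    xi : float, imposed average outgoing degree, 0 < xi < n
    Returns (rho, beta).
    """
    eps = 1e-9 * S
    a = np.clip(np.asarray(a, dtype=float), eps, S - eps)  # keep rho in (0, 1)

    def rho_of(beta):
        # Stationarity a/rho - (S - a)/(1 - rho) = beta is the quadratic
        # beta*rho^2 - (S + beta)*rho + a = 0; this is its root in (0, 1),
        # written so that it stays well defined at beta = 0.
        disc = np.sqrt((S + beta) ** 2 - 4.0 * beta * a)
        return 2.0 * a / (S + beta + disc)

    # sum_j rho_j(beta) decreases monotonically with beta: bracket, then bisect.
    lo, hi = -1.0, 1.0
    while rho_of(lo).sum() < xi:
        lo *= 2.0
    while rho_of(hi).sum() > xi:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if rho_of(mid).sum() > xi:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    beta = 0.5 * (lo + hi)
    return rho_of(beta), beta

a = np.array([2.0, 1.0, 0.5, 0.5])
S = 4.0
# Imposing the group's own average degree leaves the update unconstrained.
rho_free, beta_free = constrained_rho(a, S, xi=a.sum() / S)
# Imposing a smaller degree requires a positive multiplier.
rho_low, beta_low = constrained_rho(a, S, xi=0.6)
```

In this sketch the multiplier vanishes when the imposed degree equals the group's own average, which mirrors the statement that the APBEMA is retrieved when $\beta_r = 0$.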
In the following we will consider some more meaningful and inspiring examples. In particular we will apply the HWA to two real social networks. Interesting results will be discussed in detail.

C. Analysis of the karate club
In Ref. [38], Zachary reported an anthropological study of a karate club in a university. During the development of the club, two groups led by the instructor and the president formed gradually and in the end, due to the lack of a solution to a dispute, the club split. In recent years, the network of this karate club has been widely used for testing various community finding techniques, including the NLA in [34] where it has been found that the result of the NLA is in good agreement with the true splitting.
Applying our heterogeneity weighted algorithm, we find that for w = 0, namely when the heterogeneity effects are completely suppressed, the partition result is the same as that given by the NLA (Fig. 6(a)). But for w = 1 (Fig. 6(b)), when the heterogeneity effects are fully considered, the algorithm suggests that the dominant nodes (open dots in Fig. 6(b)) belong to one group and the others belong to another group. Such a result (Fig. 6(b)) is not surprising, because nodes in each group are indeed much more similar, which agrees better with our definition of a group. For example, nodes in each group have more similar degrees, and they have a similar connection pattern as well: in the dominant group nodes are weakly connected to each other and serve as the branches of the whole network, while in the other group nodes are only sparsely connected among themselves and look like leaves attached to the dominant group. This partition is also meaningful in reality: it distinguishes the leaders and coordinators from the other members. It is important to note that, from a different viewpoint based on information theory [37], a similar partition has been obtained (see Fig. 4B in [37]). This example shows clearly that the groups of similar components may not be the same as the communities in a community network. In order to have a better understanding of the network structure, analysis of both is necessary.

Now let us look at what happens if the weight of the heterogeneity is changed. Starting from w = 0, each time we increase w by a small step Δw and then iterate, starting from the stabilized values of q, π and ρ obtained at the previous w, until they converge again. In this way, we can trace the partition shown in Fig. 6(a) up to w = 1. Similarly, starting from w = 1, the partition shown in Fig. 6(b) can be traced back down to w close to zero. The values of L evaluated by Eq. (3.5) that correspond to these two groupings are presented in Fig. 7. We find that the L value for the partition in Fig. 6(a) changes only very slightly during this process, whereas that for the partition in Fig. 6(b) is smaller when w is close to zero, increases continuously with w, and becomes the larger of the two at $w_c \approx 0.37$. For $w > w_c$, the fact that the partition of Fig. 6(a) can still be traced suggests that the corresponding value of L, though no longer the global maximum, is still a local maximum. (As both partitions coexist in our algorithm as maxima of L, we believe that a network analysis by the expectation maximization method would be more powerful if local maxima other than the global maximum were considered in addition.) Fig. 7 shows clearly the important role played by the heterogeneity in the definition and detection of the groups and communities. In this example we have both groups and communities. As they are identical for $w < w_c$, that is where our algorithm can be used to detect the communities. If we insist that only the solution corresponding to the global maximum of L defines the groups, then they are different from the communities when $w > w_c$.

FIG. 7: Study on how the grouping of the karate club network [38] depends on the degree heterogeneity by using the heterogeneity weighted algorithm (HWA). The L values corresponding to the two groupings shown in Fig. 6 are presented as functions of w, the weight of the heterogeneity. They are two maxima and intersect at $w_c \approx 0.37$. This suggests that when the heterogeneity effects are suppressed ($w < w_c$) the partition of Fig. 6(a) is preferred, but when the heterogeneity effects are more fully considered ($w > w_c$) the partition of Fig. 6(b) is recommended instead. It shows that a group partition can depend strongly on the heterogeneity effects.
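The tracing procedure and the location of the crossing point can be sketched generically. The code below is an illustration under stated assumptions: `iterate_to_convergence` is a hypothetical callback standing for whatever routine iterates q, π and ρ to convergence at a given w, and the two L curves are synthetic numbers chosen only to demonstrate the crossing search; they are not the data of Fig. 7.

```python
import numpy as np

def trace_branch(iterate_to_convergence, state, w_values):
    """Follow one solution branch: at each w, restart the iteration from the
    converged state obtained at the previous w (a warm start)."""
    states = []
    for w in w_values:
        state = iterate_to_convergence(state, w)
        states.append(state)
    return states

def crossing_point(w, L_a, L_b):
    """First w at which L_b - L_a changes sign (linear interpolation between
    grid points), or None if the curves do not cross on the grid."""
    w = np.asarray(w, dtype=float)
    diff = np.asarray(L_b, dtype=float) - np.asarray(L_a, dtype=float)
    for k in range(len(w) - 1):
        if diff[k] == 0.0:
            return float(w[k])
        if diff[k] * diff[k + 1] < 0.0:
            t = diff[k] / (diff[k] - diff[k + 1])
            return float(w[k] + t * (w[k + 1] - w[k]))
    return float(w[-1]) if diff[-1] == 0.0 else None

# Synthetic curves for illustration only (NOT the values plotted in Fig. 7).
w = np.linspace(0.0, 1.0, 11)
L_a = np.full_like(w, -100.0)      # a branch whose L barely changes with w
L_b = -102.5 + 10.0 * w            # a branch whose L rises with w
w_c = crossing_point(w, L_a, L_b)
```

Below the crossing the first branch has the larger L and above it the second one does, while both branches remain traceable as local maxima on either side.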
On the other hand, as w sets the weight of the heterogeneity to be considered, this tunable algorithm is quite flexible and may find some interesting applications in practice, in particular in those situations where we wish to stress or weaken the effects of the heterogeneity on purpose.
Next let us consider the dolphin social network for comparison. In Fig. 8 the three largest maxima of L are shown as functions of the weight of the heterogeneity. There are no intersections between them. This fact suggests that we have a unique grouping and that it is robust to the heterogeneity. This is verified by a careful investigation showing that the partitions corresponding to these curves indeed do not change with w. The groupings corresponding to the two largest maxima of L are given by Fig. 4 and Fig. 9, respectively. A comparison between these two partitions is interesting: the only difference lies in the node PL. On the one hand, the small difference between their L values may be a sign that our algorithm lacks confidence in assigning node PL, due to its special role in between the two subdivisions; on the other hand, their overwhelming agreement suggests that our algorithm is quite confident in assigning all nodes other than PL. This is consistent with the big gap between the second and the third maxima of L, which indicates that our algorithm would discard any groupings other than those shown in Fig. 4 and Fig. 9. These results may be an indication that, from the viewpoint of group partition, the natural subdivisions formed after the splitting are the only main topological structure in this network. Unlike the karate network, where different structures may coexist, the dolphin network lacks a 'core' of dominant nodes around which the other nodes are organized. This topological difference may have implications for understanding the different social behaviors of the two societies.

FIG. 8: Study on how the grouping of the dolphin network [43,44,45] depends on the degree heterogeneity by using the heterogeneity weighted algorithm (HWA). The three largest maxima of the L value against the weight of heterogeneity, w, are shown. The grouping of the network corresponding to the top (middle) curve is given in Fig. 4 (Fig. 9). This suggests that in this example the group structure is insensitive to the heterogeneity effects.

V. EXTENSION TO THE WEIGHTED NETWORKS
As the expectation maximization algorithms have so many advantages, it is desirable to extend them to weighted networks. In fact the Newman and Leicht scheme favors such an extension. A straightforward method was suggested in [35], where the weight of each link was related to its contribution to the L value. In this section we discuss this problem based on the APBEMA, but the derivations are similar and straightforward for the heterogeneity suppressed and the heterogeneity weighted algorithms. The radical difference between our scheme and that in [35] is that in our algorithm it is the information provided by each entry of the adjacency matrix that is weighted.

FIG. 9: The dolphin network [43,44,45]. Same as in Fig. 4, but the partition represented by the dashed line, given by both the APBEMA and the HWA, corresponds to the second maximum of L (see Fig. 8) instead. In this partition only the node SN89 is not classified into the natural subdivision it belongs to [45]. A comparison with the partition corresponding to the first maximum of L (see Fig. 4) indicates a special role that node PL may play.
We rewrite Eq. (3.5) in a form from which we can tell that the term between the square brackets represents the contribution to the L value given by $A_{ij}$, namely the information on the connection state between node i and node j. Obviously, whether $A_{ij} = 1$ or $A_{ij} = 0$, its contribution is equally important and counts. Hence if we attach a weight $\omega_{ij}$ to the information provided by $A_{ij}$, then the L value used for grouping should naturally be replaced by a weighted one, $L_\omega$. Next, we assume that the right grouping is the one that maximizes $L_\omega$ under the constraint $\sum_r \pi_r = 1$. The deduction is then the same as in the APBEMA, and finally we obtain the update equations for the model parameters, including Eq. (5.4) for $\rho_{rj}$, where $q_{ir}$ is still given by Eq. (3.6). It is apparent that, for an unweighted network where $\omega_{ij}$ = const, this algorithm reduces to the APBEMA as expected.
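For orientation, the displayed equations referred to in this paragraph can be sketched as follows. This is a reconstruction, not a copy of the original equations: it assumes that Eq. (3.5) has the Bernoulli-type form suggested by the a priori probability model, in which links and null links enter symmetrically, and the equation numbers are omitted for that reason.

```latex
% Assumed form of Eq. (3.5), written entry by entry:
L = \sum_{i,r} q_{ir} \Big\{ \ln \pi_r
    + \sum_j \big[ A_{ij} \ln \rho_{rj} + (1 - A_{ij}) \ln (1 - \rho_{rj}) \big] \Big\}

% Weighted counterpart: the contribution of each entry A_{ij} is multiplied by \omega_{ij}
L_\omega = \sum_{i,r} q_{ir} \Big\{ \ln \pi_r
    + \sum_j \omega_{ij} \big[ A_{ij} \ln \rho_{rj} + (1 - A_{ij}) \ln (1 - \rho_{rj}) \big] \Big\}

% Maximizing L_\omega under \sum_r \pi_r = 1 then gives
\pi_r = \frac{1}{n} \sum_i q_{ir}, \qquad
\rho_{rj} = \frac{\sum_i \omega_{ij}\, q_{ir}\, A_{ij}}{\sum_i \omega_{ij}\, q_{ir}}
```

Under this assumed form a constant $\omega_{ij}$ cancels between the numerator and the denominator of $\rho_{rj}$, which is the reduction to the unweighted case noted in the text.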
Similarly, if the constraints of Eq. (4.1) or Eq. (4.6) are taken into account, we can get the heterogeneity suppressed or heterogeneity weighted algorithm for the weighted network as well.
It is important to note that $\omega_{ij}$ is the weight of the information provided by $A_{ij}$ rather than of the link between node i and node j. (Note that in calculating $\rho_{rj}$ (Eq. (5.4)), although $\omega_{ij}$ does not count in evaluating the numerator if $A_{ij} = 0$, it does count in evaluating the denominator.) In other words, even if there is no link between node i and node j, this piece of information ($A_{ij} = 0$) is equally important for recognizing the group structure. This result is consistent with our intuition and experience.
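The role of the weights in the numerator and the denominator can be illustrated numerically. The sketch below assumes that the weighted update has the form $\rho_{rj} = \sum_i \omega_{ij} q_{ir} A_{ij} / \sum_i \omega_{ij} q_{ir}$, which is consistent with the remark above but is not copied from Eq. (5.4). The function name `weighted_rho` and the toy data are hypothetical, and diagonal entries are included for simplicity.

```python
import numpy as np

def weighted_rho(A, q, omega):
    """rho[r, j] = sum_i omega_ij q_ir A_ij / sum_i omega_ij q_ir (assumed form).

    A     : (n, n) 0/1 adjacency matrix
    q     : (n, c) soft memberships
    omega : (n, n) positive weights attached to the entries of A
    """
    A = np.asarray(A, dtype=float)
    q = np.asarray(q, dtype=float)
    omega = np.asarray(omega, dtype=float)
    num = q.T @ (omega * A)      # only entries with A_ij = 1 contribute
    den = q.T @ omega            # every entry contributes, link or null link
    return num / den

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
q = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Unweighted reference: fraction of group members linking to node j.
rho_unweighted = (q.T @ A) / q.sum(axis=0)[:, None]

# Halve the weight of the null entries (A_ij = 0) only.
omega_null = np.where(A == 1, 1.0, 0.5)
rho_w = weighted_rho(A, q, omega_null)
```

Down-weighting the null entries leaves the numerator untouched but shrinks the denominator, so the estimated connection probabilities increase. Any constant weight reproduces `rho_unweighted`.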
To appreciate the implications of this algorithm, let us take the network studied in Fig. 3 as an illustration. For the sake of simplicity, we assume that all the weights take only two values: 1 and ω. Here ω is a constant, 0 ≤ ω ≤ 1, used to weight a selected portion of the entries of the adjacency matrix; it is introduced to control how much of the information in that portion the algorithm can use, so that we can investigate how the grouping results depend on it. We consider the following three cases: (i) $\omega_{ij} = 1$ for $A_{ij} = 0$ and $\omega_{ij} = \omega$ for $A_{ij} = 1$; (ii) $\omega_{ij} = 1$ for $A_{ij} = 1$ and $\omega_{ij} = \omega$ for $A_{ij} = 0$; (iii) $\omega_{ij} = \omega$ if both node i and node j are in the clique and $\omega_{ij} = 1$ otherwise. For ω = 0, since a crucial part of the information on the network topology is missing, we may expect the grouping to fail. As ω is increased, more and more information is taken into account, and the grouping should become more and more accurate. Finally, as ω = 1 is approached, all the topological information is considered, and our algorithm should suggest the grouping as well as the APBEMA does. This conjecture has been well verified by the simulations. In Fig. 10 the grouping error rate against ω is summarized for the case where the network has n = 70 nodes, the clique size is $n_c = 7$ and the average degree of the background nodes is $k_{\rm BG} = 3$. Each data point represents the error rate averaged over 1000 realizations of the network.
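The three weighting schemes can be written down directly. The helper below is a sketch: the function name `weight_matrix`, the case labels passed as strings and the five-node toy network are hypothetical, and diagonal entries are treated like any other entry for simplicity.

```python
import numpy as np

def weight_matrix(A, in_clique, omega, case):
    """Weights omega_ij for the three cases described in the text.

    A         : (n, n) 0/1 adjacency matrix
    in_clique : length-n boolean array, True for clique members
    omega     : float in [0, 1]
    case      : "i", "ii" or "iii"
    """
    A = np.asarray(A)
    in_clique = np.asarray(in_clique, dtype=bool)
    W = np.ones(A.shape, dtype=float)
    if case == "i":        # down-weight the links (A_ij = 1)
        W[A == 1] = omega
    elif case == "ii":     # down-weight the null links (A_ij = 0)
        W[A == 0] = omega
    elif case == "iii":    # down-weight entries with both nodes in the clique
        W[in_clique[:, None] & in_clique[None, :]] = omega
    else:
        raise ValueError("case must be 'i', 'ii' or 'iii'")
    return W

# Toy network: a 3-node clique (nodes 0-2) and one background link (3-4).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]])
in_clique = np.array([True, True, True, False, False])

W_i = weight_matrix(A, in_clique, 0.2, "i")
W_ii = weight_matrix(A, in_clique, 0.2, "ii")
W_iii = weight_matrix(A, in_clique, 0.2, "iii")
```

Even in this toy network case (ii) touches about twice as many entries as case (i), and the imbalance grows quickly for larger sparse networks.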
In the first case (solid squares in Fig. 10), the information associated with the null links is fully considered but that associated with the links is controlled by ω. For ω = 0 the contributions of the links are completely ignored; as a consequence the algorithm 'sees' all the nodes as isolated from each other and classifies them into a single group. Increasing ω from zero, even slightly, stops the algorithm from classifying all the nodes into a single group, but the error rate is still high. As ω is increased further, more and more information on the links becomes available and the partition becomes more and more accurate. At ω ∼ 0.7, the information seems to be sufficient for the algorithm to distinguish the clique well from the background. This phenomenon is interesting: it suggests that there is in fact a redundancy in the information used for partitioning the network under study.

FIG. 10: Error rates for the grouping results suggested by the weighted APBEMA in identifying a fully connected clique of $n_c = 7$ nodes immersed in a randomly connected background of 63 nodes whose average degree is $k_{\rm BG} = 3$. The information contained in each entry of the adjacency matrix is weighted by either ω or 1, and solid squares, open squares and solid dots represent three different ways of assigning the weights among the entries, which correspond to the cases (i), (ii) and (iii) described in the text. In all the cases, as ω is increased the grouping becomes more accurate, which supports the viewpoint that the information of links and null links is equally important.
In the second case (open squares in Fig. 10), the information associated with the links is fully considered but that associated with the null links is tuned by ω. Similarly, for ω = 0 the algorithm cannot 'see' the null links and thus the background. All nodes are regarded as belonging to one well connected group. This result shows clearly that the information of the null links is a requisite for a correct partition. As ω is increased from zero, the error rate undergoes an abrupt drop. This is because here we have many more null links than links, and hence even a small value of ω releases much more information than in case (i). Increasing ω further improves the grouping correspondingly, just as expected.
In the last case (solid dots), the weight of the information associated with the clique is varied instead, but again we have qualitatively the same result as in the first two cases. These results are consistent with our discussion of the weighted APBEMA from the information perspective.
Weighting the information contained in $\{A_{ij}\}$ can be particularly relevant in practice. Constructing a network representation of a real complex system unavoidably involves measuring the connection state between any two nodes. In general, the measurement does not generate a definite zero/one output; rather, errors and uncertainties are intrinsically entangled with it. In many cases, such as in some biological systems, biochemical systems and human societies, the relations between the elements can be numerous and of various types, and these relations can themselves be coupled with each other, so that the problem of measurement is even more subtle and difficult. Hence for any network abstracted in the end, evaluations of the confidence in the measured connection states are important and necessary. These evaluations of the confidence are the ideal measures of the weights considered here.

VI. SUMMARY
In this work we have studied how to detect the groups in a complex network that consist of nodes having a similar connection pattern. Our algorithm is based on the mixture models and the exploratory analysis suggested by Newman and Leicht, but significant differences exist. In our algorithm the connection pattern is modelled by the a priori probability assumption instead. The main advantages of our algorithm are that (i) it can be applied without any restriction on the degree distribution; (ii) it possesses the symmetry between the links and the null links; (iii) it is flexible in dealing with the heterogeneity effects; and (iv) it can be extended to the connection information weighted networks. These advantages have been illustrated by various network examples.
With our algorithm we have studied the role played by the heterogeneity. We find that the grouping result may depend on the heterogeneity effects involved. This finding suggests that in order to have a thorough knowledge of the network structure, this dependence should be analyzed. For this reason all the groupings found (at various values of w, see Sec. IV) are justified. This can be seen as an extension to the definition of group formally defined at w = 1 when the heterogeneity effects are fully considered.
Based on our analysis, it is natural to extend our algorithm to the connection information weighted networks. This result is a direct implication of our a priori probability based group connection pattern model. As the connection information weighted networks can be closely related to the measurement of networks, we expect our extended algorithm may find wide applications.
Finally, our study has also suggested that groupings associated with other top maxima of the merit function (L) could be meaningful and useful as well. This may be a common feature among the expectation maximization algorithms. How to interpret these groupings is an interesting and potentially important question that deserves further investigation.