Community Structure in Small-world and Scale-free Networks

The community structure of small-world and scale-free networks is explored in the framework of modularity using computer simulations. We find: first, community structure in small-world networks is clear, but in scale-free networks it is not; second, community structure is affected by the density of the network, namely, the sparser the network connections, the clearer the community structure; third, modularity in small-world networks varies non-monotonically with the probability of rewiring connections.


Introduction
Networks consist of a set of nodes connected by ties indicating interaction. Network analysis "makes it possible to bridge the 'micro-macro gap', the theoretical gulf between microsociology, which examines the interaction of individuals, and macrosociology, which studies the interaction of groups or institutions." (Mustafa and Jeff, 1994).
Mathematicians and physicists have found a number of distinctive structural properties in "real-world" networks, and the study of complex networks is active in many fields (Wasserman and Faust, 1994; Scott, 2002; Dorogovtsev and Mendes, 2001; Albert and Barabàsi, 2002; Watts et al., 2002). The best-known characteristics of complex networks (Xiao and Chen, 2003; Zhou et al., 2005) are "small-world phenomena" (Watts and Strogatz, 1998; Dorogovtsev and Mendes, 2002) and "scale-free properties" (Barabàsi, 2002).
The small-world phenomenon allows information to diffuse effectively through a society, in that every node can be reached through a short chain of acquaintances (Kleinberg, 2000). Compared with regular and random networks, a small-world network consists of many local links and fewer long range "shortcuts". In spite of the lack of an accurate definition, a network is usually regarded as having the small-world property if the average path length is small and the average clustering coefficient is large (Watts and Strogatz, 1998). On the other hand, the scale-free property reflects that the network's degree distribution obeys a power law (Albert and Barabàsi, 2002); that is, a few nodes with high degree act as hubs while the degree of most nodes is low.
The main difference between a Poisson (or normal) distribution and a power-law distribution is in their tails. Unlike Poisson and normal distributions, whose tails decay exponentially, power-law degree distributions decay more slowly, probably as a result of the existence of hubs.
It has been suggested that the scale-free network is one type of small-world network (Amaral et al., 2000).
Further studies have revealed that many actual networks are highly inhomogeneous (Strogatz, 2001); they do not consist of an undifferentiated mass of nodes but have some subgroup structure (Flake et al., 2002; Girvan and Newman, 2002; Holme et al., 2003; Guimerà and Amaral, 2005; Mercedes and Jennifer, 2006). A network can often be divided into subgroups, within which the network connections are dense, but between which they are sparser (Newman, 2004a, 2004b). This seems to be a common feature of many networks, especially social networks (Luo, 2005). Each of these subgroups is called a "community", and the complete structure of these subgroups within a network is referred to as "community structure" (Newman, 2004a), which has been identified as one of the main characteristics of complex networks (Wang and Dai, 2005).
In order to evaluate community structure, several measures which are based on density have been proposed (Wasserman and Faust, 1994). Recently, Newman and Girvan (2003) proposed a new measure they call "modularity", which is a numerical index of how good a particular partition is, and overcomes limitations of previous measures. In contrast to most other measures, it not only directly measures the quality of a particular clustering of nodes in a network, but also can be used to automatically select the optimal number of clusters (Scott and Smyth, 2005).
Although community structure has attracted some attention (Wasserman and Faust, 1994; Scott, 2002; Newman, 2004a; Wang and Dai, 2005), few studies focus on comparing the community structure of small-world networks and scale-free networks.
The main purposes of this paper are to detect the community structure in Watts and Strogatz's small-world network and Barabàsi's scale-free network, and to compare these two standard complex networks from the point of view of the community structure. In section 2, our analysis strategy will be briefly introduced. Results of simulation experiments will be shown in section 3. We conclude with some discussion of our results in section 4.

Analytic Strategy
Although more complex models have been developed, Watts and Strogatz's small-world network and Barabàsi's scale-free network, which will be denoted as W-S and B-A in the remainder of this note, are still the most common models for complex networks. These two types of networks are the focus of this paper (the W-S and B-A models are described in Appendices A and B). The main structural parameters of W-S are the size n, the number of nearest neighbors k and the rewiring probability p. For the B-A model, the main parameters are the starting size $m_0$, the number of edges m ($\le m_0$) added with each new node, and the total number of time steps t; the B-A network size is $n = m_0 + t$.
Despite the many parameters that could be involved in social network analysis, such as degree, centrality, centralization, transitivity, reciprocity, etc. (Wasserman and Faust, 1994), complex network analysis generally focuses on the clustering coefficient C, the average path length L and the degree distribution. The clustering coefficient C and average path length L are the most important properties in assessing small-world phenomena (Watts and Strogatz, 1998), and the degree distribution is used in describing the scale-free property. It is difficult to test for a power-law distribution statistically; the conventional strategy is to draw a simple histogram of the degree distribution and plot it on log scales to see whether it produces a straight line (Newman, 2005). The degree distribution will not be discussed further here. Instead we focus on the modularity Q together with C and L (these three statistics are described in Appendix C).
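The conventional log-log check mentioned above can be sketched in a few lines. This is only an illustration in Python (the analyses in this paper were carried out in Matlab); the function names `degree_histogram` and `loglog_slope` are hypothetical, and a least-squares line through the log-log histogram is a crude diagnostic rather than a statistical test.

```python
import math
from collections import Counter

def degree_histogram(degrees):
    """Map each observed degree k >= 1 to the fraction of nodes with that degree."""
    counts = Counter(d for d in degrees if d > 0)
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}

def loglog_slope(hist):
    """Least-squares slope of log P(k) against log k (needs two or more distinct degrees)."""
    xs = [math.log(k) for k in hist]
    ys = [math.log(p) for p in hist.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic degree sequence whose counts fall off exactly as k^-3 (illustrative data only).
degrees = [1] * 1000 + [2] * 125 + [10] * 1
slope = loglog_slope(degree_histogram(degrees))
print("estimated exponent:", -slope)
```

A roughly straight line on log scales is consistent with a power law but does not establish one, which is one reason the degree distribution is set aside in what follows.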
To explore community structure in small-world networks, we simulate many W-S networks with different p, k and n, and calculate L, C and Q for each network. We then examine how the average values of L, C and Q change as p, k and n are varied separately. In this way, we can assess the main effects of these parameters on a small-world network's community structure. Our strategy for studying the community structure of scale-free networks is similar, but m and n are the two major parameters considered for B-A. Since $m_0 \ll n$ and $m_0$ has little effect on the B-A structure, we do not consider the role of $m_0$ in community structure.
L and C cannot be calculated strictly according to the theoretical formulas because real networks are usually sparse and have a few isolated nodes. We only calculate C for nodes with more than 1 neighbor (namely $n_v > 1$, where $n_v$ is the number of nodes directly connected to node v) and L for pairs of nodes joined by a finite path (namely $d(i,j) \le n - 1$, where $d(i,j)$ is the minimum distance between nodes i and j). Our results agree with those of the Ucinet software. Q is computed according to the algorithm of Aaron et al (2004). All analyses were carried out using Matlab.
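The two restrictions just described (C only over nodes with more than one neighbor, L only over pairs joined by a finite path) can be sketched as follows. This is a minimal Python illustration, not the Matlab code used for the analyses; the function names and the small example graph are hypothetical.

```python
from collections import deque

def adjacency(n, edges):
    """Undirected adjacency sets for nodes 0..n-1; self-loops are ignored."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_clustering(adj):
    """Mean local clustering over nodes with more than one neighbor (n_v > 1)."""
    values = []
    for v, nbrs in adj.items():
        nv = len(nbrs)
        if nv < 2:
            continue  # C_v is undefined for isolated and pendant nodes
        nb = sorted(nbrs)
        links = sum(1 for i, x in enumerate(nb) for y in nb[i + 1:] if y in adj[x])
        values.append(links / (nv * (nv - 1) / 2))
    return sum(values) / len(values) if values else 0.0

def average_path_length(adj):
    """Mean shortest-path distance over ordered pairs joined by a finite path."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # unreachable nodes are simply not counted
    return total / pairs if pairs else 0.0

# Triangle 0-1-2, pendant node 3 attached to node 2, and isolated node 4.
example = adjacency(5, [(0, 1), (1, 2), (0, 2), (2, 3)])
print(average_clustering(example), average_path_length(example))
```

In the example, nodes 3 and 4 are skipped when averaging C, and node 4 contributes no pairs to L.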

Community structure in the small-world network
For each rewiring probability p, we construct 100 networks according to the W-S model and denote the average clustering coefficient C as C(p), the average path length L as L(p), and the modularity Q as Q(p). Hence, C(0), L(0) and Q(0) are the values for the regular lattice network and C(1), L(1) and Q(1) are those for the random network.
For n = 100 and k = 12, figure 1 shows L(p), C(p) and Q(p) for the family of randomly rewired graphs described in Appendix A. For convenience, the three functions are normalized by the values L(0), C(0) and Q(0), respectively.

Figure 1
From figure 1, we see that L and C decrease monotonically as the rewiring probability p increases. This is one of the small-world phenomena, and is intermediate between the properties of regular lattices (p = 0) and random networks (p = 1). The change in modularity Q is very different from that of L and C. For a fixed value of k and network size n varying from 60 to 160 in increments of 20, Figure 2 shows the modularity Q for the family of W-S models.

Figure 2
Analysis of figure 2 shows:
(1) For different k and n, there is clearly a peak of modularity Q as a function of the rewiring probability p. Generally, when 0.01 < p < 0.1, the Q value for W-S is greater than that of the corresponding regular lattice network (p = 0) and the random network (p = 1). The modularity Q increases with network size n, but is not strongly affected by k, the number of nearest neighbors.
(2) The density of the network is $\frac{nk}{n(n-1)} = \frac{k}{n-1}$. For fixed k, the network becomes sparser as n increases. Hence, the sparser the connections in W-S, the greater the value of Q and the clearer the community structure.

Community structure in the scale-free network
From figures 3 and 4, we find:
(1) Compared with the corresponding random network, B-A has a similar average path length L and a greater average clustering coefficient C, and most of the C values of B-A are about double those of the random network. Hence, B-A networks display small-world phenomena in a "weak" sense.
(2) The modularity Q for B-A is similar to that of the random networks, and is less than 0.3 (shown in figure 4), indicating that there is no clear community structure in B-A networks.
(3) m is the key parameter for B-A. Among the B-A family with different sizes, C increases while L and Q decrease with increasing m.
(4) Modularity Q is also affected by the density of the network. For each curve in figure 4, the B-A network becomes sparser as n increases, while the modularity Q increases.
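The dependence of density on network size can be illustrated with a short calculation. The sketch below assumes that a W-S network has nk/2 edges and that a B-A network gains m edges at each of its t growth steps, with no edges among the $m_0$ seed nodes; that last assumption, the function names, and the values $m_0 = m = 5$ are ours for illustration only.

```python
def ws_density(n, k):
    """Density of a W-S network: nk/2 edges out of n(n-1)/2 possible pairs."""
    return k / (n - 1)

def ba_density(m0, m, t):
    """Density of a B-A network after t steps, assuming the m0 seed nodes
    start with no edges among them (an assumption of this sketch)."""
    n = m0 + t
    return 2 * m * t / (n * (n - 1))

# Fixed k (or m) with n growing from 60 to 160 in steps of 20: both densities fall.
for n in range(60, 161, 20):
    print(n, round(ws_density(n, 12), 4), round(ba_density(5, 5, n - 5), 4))
```

In both models the number of edges grows linearly in n while the number of possible pairs grows quadratically, so holding k or m fixed makes larger networks sparser.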

Discussion
Our main findings are the following:
(1) Community structure is a basic characteristic of the networks considered here. Compared with the random network, the regular lattice network and W-S have clear community structures, while B-A does not. The rewiring probability p of W-S has a remarkably strong effect on community structure. Our simulations reveal that the community structure in W-S is clearer than that of the corresponding regular lattice network when 0.01 < p < 0.1. Cowan et al (2004) found that the long-run average level of knowledge diffusing through a W-S network is also a non-monotonic function of the rewiring probability p, with its absolute peak at values of p around 0.06, which are clearly in the small-world region (approximately between p = 0.01 and p = 0.1; Morone and Taylor, 2004). Our findings agree with these results. The criterion for small-world phenomena suggested by Watts and Strogatz (1998) is qualitative. Using community structure, however, we obtain a more quantitative criterion: a network with small-world phenomena should have a large average clustering coefficient, a small average path length and clear community structure.
The community structure of the B-A network is clearly different from that of the W-S network. Table 1 summarizes the main differences in the properties of the different types of networks.

Table 1
(2) The density of the network affects the community structure. Our simulations suggest that a sparse network may have clear community structure. The structure of networks can vary a great deal and a network can have many different types of community structure, so detecting community structure is difficult. Hence, a general theoretical analysis of the relationship between density and community structure is not feasible.
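However, the situation for regular networks is comparatively simple. For a regular lattice network (as described in Appendix A) divided into t communities, taken here as equal blocks of consecutive nodes that each contain at least k/2 nodes, the modularity Q is
$$Q = 1 - \frac{1}{t} - \frac{t(k+2)}{4n}.$$
If $k/n$ is small, the network is sparse, and Q becomes larger. Obviously, t is the other major contribution to Q. Furthermore, if we take $t = \sqrt{4n/(k+2)}$, which gives the greatest Q, we have
$$Q_{\max} = 1 - \sqrt{\frac{k+2}{n}},$$
where in practice t is rounded to one of the neighboring integers $\lfloor t \rfloor$ or $\lceil t \rceil$; $\lfloor x \rfloor$ rounds x down to the nearest integer and $\lceil x \rceil$ rounds it up to the nearest integer.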
If $k \le 0.49n - 2$, then $Q_{\max} \ge 0.3$, and the regular lattice network has clear community structure. This condition is easily satisfied for regular lattice networks, and W-S is constructed from the regular lattice network by rewiring a few connections. Hence, if the rewiring probability p is small, the regular structure of the lattice is retained in W-S. With a few random long range "shortcuts", the community structure may be clearer than for the regular lattice network. If p is large, W-S looks more like a random network and the regular structure of the lattice is no longer visible, with the result that the community structure is less noticeable. For the B-A network, the connections become denser with increasing m, but the connection between each pair of nodes is random because of preferential attachment. Consequently, there is no clear community structure in the B-A network.
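The regular-lattice argument can be checked numerically. The sketch below builds a ring lattice, splits it into t equal blocks of consecutive nodes, and evaluates $Q = \sum_p (e_{pp} - a_p^2)$ directly. The expression $1 - 1/t - t(k+2)/(4n)$ it is compared against is a reconstruction that holds under those block assumptions (t divides n and each block has at least k/2 nodes); the function names and the values n = 120, k = 12 are hypothetical.

```python
import math

def ring_lattice_edges(n, k):
    """Undirected ring lattice: each node linked to k/2 neighbors on each side."""
    return {(min(i, (i + d) % n), max(i, (i + d) % n))
            for i in range(n) for d in range(1, k // 2 + 1)}

def block_modularity(n, k, t):
    """Q = sum_p (e_pp - a_p^2) for t equal blocks of consecutive nodes."""
    size = n // t                      # assumes t divides n
    edges = ring_lattice_edges(n, k)
    m = len(edges)
    inside = [0] * t                   # edges with both ends in block p
    ends = [0] * t                     # edge ends attached to block p
    for u, v in edges:
        pu, pv = u // size, v // size
        ends[pu] += 1
        ends[pv] += 1
        if pu == pv:
            inside[pu] += 1
    return sum(inside[p] / m - (ends[p] / (2 * m)) ** 2 for p in range(t))

def closed_form(n, k, t):
    """Reconstructed expression, valid when every block has at least k/2 nodes."""
    return 1 - 1 / t - t * (k + 2) / (4 * n)

n, k = 120, 12
for t in (4, 6, 8, 10):
    print(t, round(block_modularity(n, k, t), 6), round(closed_form(n, k, t), 6))

t_star = math.sqrt(4 * n / (k + 2))        # continuous maximizer of the closed form
q_bound = 1 - math.sqrt((k + 2) / n)       # value of the closed form at t_star
print(t_star, q_bound)
```

Because t must be an integer, the best achievable Q among integer t is at most the continuous bound `q_bound`, and it is attained near `t_star`.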
(3) It may not be accurate to claim that the scale-free network is one of the classes of small-world networks. Compared to the corresponding random network, B-A has a larger average clustering coefficient and a similar average path length. On this basis, the scale-free network might be regarded as one class of small-world networks (Amaral et al., 2000). However, in our simulations the average clustering coefficient of the B-A network is no more than 3 times that of the random network, so it does not satisfy the criterion "$C \gg C_{rand}$" suggested by Watts and Strogatz (1998), where $C_{rand}$ is the average clustering coefficient of the random network. From the point of view of community structure, the B-A network is closer to the random network than to the small-world network. Together with the difference in their degree distributions, these observations lead us to believe that small-world networks and scale-free networks should be regarded as two classes of complex networks that should be evaluated separately.
(4) Community structure vs. small-world phenomena, and community structure vs. scale-free properties. Many local links and a few long range "shortcuts" are structural characteristics of networks with small-world phenomena. The "local links" may be considered as connections among nodes in the same community, and the long range "shortcuts" as connections among different communities. Conversely, a network with clear community structure may exhibit small-world phenomena. For example, we can form a network as follows: s sub-networks with $\{n_1, n_2, \ldots, n_s\}$ nodes, respectively, are generated. Within sub-network i, each pair of nodes is connected with probability $p_i$, where $i = 1, 2, \ldots, s$, and each pair of nodes in two different sub-networks is connected with probability $p_c$. These sub-networks are combined to form a new larger network.
If $p_i \gg p_c$ for $i = 1, 2, \ldots, s$, then the whole network has an obvious community structure (Du et al., 2006) and we denote such a network as R-R. For the following three cases, we calculate C, L and Q for R-R networks and the corresponding random network and list them separately in Table 2.
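The R-R construction can be sketched as follows. The code is an illustration only: the block sizes, the probabilities, and the function names are hypothetical choices, not the three cases reported in Table 2, and Q is evaluated for the planted partition rather than for a partition found by a detection algorithm.

```python
import random

def rr_network(sizes, p_in, p_c, seed=0):
    """Sub-networks of the given sizes: a pair inside sub-network i is linked
    with probability p_in[i], a pair in different sub-networks with p_c."""
    rng = random.Random(seed)
    block = [i for i, size in enumerate(sizes) for _ in range(size)]
    n = len(block)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in[block[u]] if block[u] == block[v] else p_c
            if rng.random() < p:
                edges.append((u, v))
    return edges, block

def modularity(edges, block):
    """Q = sum_p (e_pp - a_p^2) for the partition given by block labels."""
    m = len(edges)
    t = max(block) + 1
    inside = [0] * t
    ends = [0] * t
    for u, v in edges:
        ends[block[u]] += 1
        ends[block[v]] += 1
        if block[u] == block[v]:
            inside[block[u]] += 1
    return sum(inside[p] / m - (ends[p] / (2 * m)) ** 2 for p in range(t))

# Four sub-networks of 25 nodes with p_i = 0.5 >> p_c = 0.01 (illustrative values).
edges, block = rr_network([25] * 4, [0.5] * 4, 0.01, seed=1)
q_planted = modularity(edges, block)

# Comparison: the same labels on a network with one uniform connection probability.
edges_u, block_u = rr_network([25] * 4, [0.13] * 4, 0.13, seed=1)
q_uniform = modularity(edges_u, block_u)
print(q_planted, q_uniform)
```

With $p_i \gg p_c$ almost all edges fall inside the sub-networks, so Q for the planted partition is well above the 0.3 level, whereas the same labels on a uniformly random network give Q close to zero.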
The relationship between community structure and scale-free properties is difficult to describe. On the one hand, the scale-free property suggests that the degree distribution of a network should obey a power law, and a few nodes named "hubs" control the whole network. Thus, community structure may not be obvious because most of the nodes should be connected to the "hubs" simultaneously. On the other hand, community structure provides a bridge between the single node and the whole network, because there are few connections between these sub-networks.
Real-world social networks, such as the rural-urban migrants' social networks obtained from the Shenzhen survey (Du et al., 2006), are sparse and complex. These real social networks exhibit clear community structure, small-world phenomena and scale-free properties simultaneously. None of the current network models, such as the random network, W-S, or B-A model, fits these social networks very well. A new social network structure may be necessary.

Appendix A Watts and Strogatz Small-World Network Model
Figure 5 shows how to construct the network: we start with a ring of n nodes, each connected to its k nearest neighbors by undirected edges, making a regular network as in figure 5(a). With probability p, we reconnect each edge to a destination node chosen uniformly at random over the entire ring, with duplicate edges forbidden; the small-world network shown in figure 5(b) is obtained for p = 0.1. For p = 1 the network is a random network, as shown in figure 5(c). Following Watts and Strogatz's small-world network model, we require n >> k >> ln n >> 1, where k >> ln n guarantees that a random graph will be connected. Here, n = 20 and k = 4 in figure 5(a).
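The ring-and-rewire construction can be sketched in Python as follows. This is a minimal illustration, not the Matlab code used for the simulations; the function name `ws_network` is hypothetical, and we assume undirected edges, with one endpoint of each rewired edge kept fixed and the other drawn uniformly from the nodes that would create neither a self-loop nor a duplicate edge.

```python
import random

def ws_network(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbors (k/2 per side);
    every lattice edge is then rewired with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for d in range(1, k // 2 + 1):
            w = (v + d) % n
            adj[v].add(w)
            adj[w].add(v)
    for d in range(1, k // 2 + 1):       # visit lattice edges lap by lap
        for v in range(n):
            w = (v + d) % n
            if w in adj[v] and rng.random() < p:
                # candidates exclude v itself and current neighbors (no duplicates)
                choices = [x for x in range(n) if x != v and x not in adj[v]]
                if choices:
                    new = rng.choice(choices)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj

# Same size as figure 5: n = 20, k = 4, for the regular, small-world and random cases.
for p in (0.0, 0.1, 1.0):
    g = ws_network(20, 4, p, seed=1)
    print(p, sum(len(s) for s in g.values()) // 2, sorted(len(s) for s in g.values()))
```

Rewiring moves edges without adding or deleting any, so the number of edges stays at nk/2 for every p, while the degrees spread out as p grows.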

Appendix B Barabàsi Scale-Free Network Model
A network with a power-law degree distribution can be constructed using the Barabási and Albert model in two steps:
(a) Growth: Starting with a small number ($m_0$) of nodes, at every time step we add a new node with m ($\le m_0$) edges (i.e., a new individual, who will be connected to m of the nodes already present in the system).
(b) Preferential attachment: When choosing the nodes to which the new node connects, we assume that the probability $\Pi$ that a new node will be connected to node i depends on the connectivity $k_i$ of that node:
$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}.$$
The time dependence of the connectivity of a given node can be calculated analytically using a mean-field approach. The degree distribution of the network obeys a power law, namely,
$$P(k) \sim k^{-\gamma}, \qquad \gamma = 3.$$
Growth and preferential attachment are essential ingredients of the B-A model. A network with a power-law degree distribution is scale-free (Barabàsi, 2002).
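The growth and preferential attachment steps can be sketched as follows. Since the attachment probability is undefined while all degrees are zero, this sketch assumes that the first new node links to m seed nodes chosen uniformly; that choice, like the function name `ba_network`, is ours and is not specified by the model description. Seed nodes that receive no edge in that first step stay isolated, so the example uses $m = m_0$.

```python
import random

def ba_network(m0, m, t, seed=0):
    """Grow a B-A network: m0 seed nodes, then t steps each adding one node
    with m edges to distinct existing nodes."""
    if not 1 <= m <= m0:
        raise ValueError("need 1 <= m <= m0")
    rng = random.Random(seed)
    edges = []
    stubs = []  # node i appears k_i times, so a uniform draw follows k_i / sum_j k_j
    for new in range(m0, m0 + t):
        if not stubs:
            # assumption of this sketch: the first step picks seed nodes uniformly
            targets = set(rng.sample(range(m0), m))
        else:
            targets = set()
            while len(targets) < m:      # redraw until m distinct targets are found
                targets.add(rng.choice(stubs))
        for old in sorted(targets):
            edges.append((old, new))
            stubs.extend((old, new))
    return edges

edges = ba_network(m0=5, m=5, t=95, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges), max(degree.values()), min(degree.values()))
```

Every node added during growth has degree at least m, while early nodes accumulate many more links and act as hubs.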

Appendix C Parameters
A social network may be defined as $G = (V, E)$, where V is the set of nodes or individuals in the network and E is the set of edges, namely the relationships between each pair of nodes.
The clustering coefficient, C, represents the degree of local clustering, namely, the probability that a pair of nodes connected to a common node are also connected to each other. Suppose there are n nodes. For a node v, $V_v$ is the set of nodes directly connected to v, and $n_v$ is the number of elements in $V_v$. The clustering coefficient for node v may be defined as:
$$C_v = \frac{\sum_{l \in E} \sum_{\{x, y\} \subseteq V_v} \delta_l^x \delta_l^y}{n_v (n_v - 1)/2},$$
where the inner sum runs over unordered pairs of distinct neighbors of v, and $\delta_l^x = 1$ if the edge l contains node x and $\delta_l^x = 0$ otherwise. The average clustering coefficient of the network is therefore
$$C = \frac{1}{n} \sum_{v \in V} C_v.$$
The greater the value of C, the more local clustering occurs in the network.
The average path length, L, measures the efficiency of communication or passage time between nodes, namely, the average number of links in the shortest path between a pair of nodes in the network. If $d(i,j)$ is the minimum distance between nodes i and j, L can be defined as:
$$L = \frac{1}{n(n-1)} \sum_{i \ne j} d(i,j).$$
The smaller the value of the average path length L, the more efficient the communication through the network.
For a partition of a network into t communities, we define a t × t matrix E whose entry $e_{pq}$ is the fraction of edges in the original network that connect vertices in community p to those in community q. Here V represents the set of nodes of the whole network and $V_p$ is the set of nodes of community p. The modularity is defined as (Girvan and Newman, 2002):
$$Q = \sum_{p=1}^{t} \left( e_{pp} - a_p^2 \right), \qquad a_p = \sum_{q=1}^{t} e_{pq}.$$
The greater the value of Q, the stronger the community structure of the network (Newman, 2004a, 2004b). In practice, Q values for networks with strong community structure typically fall in the range from about 0.3 to 0.7 (Newman and Girvan, 2004).
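For a given partition, the modularity can be evaluated directly from an edge list. The sketch below is an illustration with hypothetical names; it uses the convention that an edge between communities p and q contributes half to $e_{pq}$ and half to $e_{qp}$, so that $a_p$ equals the fraction of edge ends attached to community p. It evaluates Q for a supplied partition only and is not the detection algorithm used in the main text.

```python
def modularity(edges, community):
    """Q = sum_p (e_pp - a_p^2) for an undirected edge list and a
    mapping from node to community label."""
    m = len(edges)
    e_in = {}   # p -> number of edges with both ends in community p
    ends = {}   # p -> number of edge ends attached to community p
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            e_in[cu] = e_in.get(cu, 0) + 1
    return sum(e_in.get(p, 0) / m - (ends[p] / (2 * m)) ** 2 for p in ends)

# Two triangles joined by a single edge (illustrative example).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {v: 0 for v in range(6)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the example into its two triangles gives Q = 6/7 − 1/2, just above 0.3, while placing all nodes in one community gives Q = 0.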