Modelling weighted networks using connection count

Weight-driven and degree-driven evolutionary models for weighted networks are proposed, together with a distance-dependent mechanism that increases the clustering coefficient. Compared with the well-known BA model, there are two generalizations. Firstly, both new and old vertices can attempt to build up connections, and the resulting connection count is converted into edge weight: when a new edge is inserted between already connected nodes, the connection count (and thereby the weight) of the corresponding link increases. Secondly, the distribution of local path distances is also used as a reference in the preferential attachment. The models show several interesting results, including scale-free distributions of degree, vertex weight and edge weight. Even the pure distance-dependent preferential attachment model shows all these typical behaviours together with a high clustering coefficient. The concepts of reconnecting and of converting connection count into weight are not new and have been used in empirical studies of weighted networks, but they are used here for the first time to model weighted networks.


Institute of Physics ⌽ DEUTSCHE PHYSIKALISCHE GESELLSCHAFT
weighted network (of scientists only) [13]. For example, the connection between two scientists through different papers in the original bipartite network can naturally be regarded as a repeated connection between them, which yields connection counts. From this point of view, converting connection count into weight seems quite general for any one-mode network projected from a bipartite network. Furthermore, models constructed in this way do not require any additional dynamics for the weights.
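The projection from a bipartite author-paper network onto a one-mode network with connection counts can be sketched as follows. This is a minimal Python illustration, not the procedure used for any data set in this work: papers are assumed to be given as plain lists of author names, and the helper names `project_connection_counts` and `counts_to_weights`, as well as the value α = 0.5 in the tanh conversion, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def project_connection_counts(papers):
    """One-mode projection of an author-paper bipartite network.

    Every paper adds one to the connection count T_ij of each pair of its
    co-authors, so repeated collaboration gives T_ij > 1.
    """
    counts = Counter()
    for authors in papers:
        # set() drops duplicate names; sorted() fixes a canonical pair order
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return dict(counts)

def counts_to_weights(counts, f=lambda t: t):
    """Convert counts to weights w_ij = f(T_ij); linear f(T) = T by default."""
    return {pair: f(t) for pair, t in counts.items()}

papers = [["Ana", "Bo"], ["Ana", "Bo", "Cy"], ["Bo", "Cy"], ["Ana", "Bo"]]
T = project_connection_counts(papers)
print(T)  # {('Ana', 'Bo'): 3, ('Ana', 'Cy'): 1, ('Bo', 'Cy'): 2}
W = counts_to_weights(T, f=lambda t: math.tanh(0.5 * t))  # alpha = 0.5, illustrative only
```

Ana and Bo share three papers, so their link carries a connection count of 3 and, under the linear conversion, a weight of 3.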
For binary networks, the preferential attachment introduced in the BA model [14] is believed to be a good mechanism for generating a power-law degree distribution. It is natural and simple to incorporate the above relation between connection count and weight into the modelling of weighted networks. To do so, we add two new ingredients to the degree-driven BA model. Firstly, old vertices can be activated, as done for example in [17,20,21]. Secondly, connection counts are converted into weights; this has already been used in empirical studies [2,4,5] but is applied here to network models for the first time. Furthermore, if this works well in the degree-driven model, it is also interesting to apply the same idea to the weight-driven model. These two models make up the first part of this work. In the second part, we introduce a distance-dependent preferential attachment (DDPA) mechanism, which helps to increase the clustering coefficient of the networks. The original BA model and Barrat's model produce quite low clustering coefficients. Some evolutionary models do give high clustering coefficients but require additional structures or quantities [15]–[19]. In our DDPA model, new connections are added to the network according to the distribution of local path distances. In the final part, we show that a pure DDPA model acting on second-nearest neighbours reproduces all these scale-free distributions together with a high clustering coefficient.

The model
An $N$-vertex weighted network is defined by an $N \times N$ matrix $(w_{ij})_{N\times N}$, where $w_{ij}$ is the weight on the link between vertices $i$ and $j$. Weight is used here in the sense of similarity: the larger the weight, the closer the relation between the two end nodes, and $w_{lm} = 0$ indicates no relation between vertices $l$ and $m$. When the link between vertices $i$ and $j$ represents an event occurring between them, the number of occurrences of that event may be defined as the connection count $T_{ij}$. Such a weighted network can therefore also be characterized by an $N \times N$ matrix $(T_{ij})_{N\times N}$. The link weight $w_{ij}$ is related to the connection count $T_{ij}$, for example through the tanh function $w_{ij} = \tanh(\alpha T_{ij})$, which we used in [5], or simply through the linear relation $w_{ij} = T_{ij}$ used in [2,4]. Based on this similarity weight, a similarity distance is used to measure how close two vertices are, which differs from the usual notion of distance in a binary network. For example, the similarity distance between vertices 1 and 3 along a path of two edges with weights $w_{12}$ and $w_{23}$ is $l_{13} = 1/[(1/w_{12}) + (1/w_{23})]$, which is smaller than both $w_{12}$ and $w_{23}$ (see also [5]). Finding the shortest path therefore amounts to finding the path with the maximum similarity distance.
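Because the similarity distance along a path is $1/\sum_e (1/w_e)$, maximising it is the same as minimising $\sum_e 1/w_e$, so the path of maximum similarity distance can be found with an ordinary shortest-path algorithm using edge cost $1/w_e$. The Python sketch below assumes this reduction and uses Dijkstra's algorithm, which is our choice rather than a method prescribed here; the function names and the adjacency format (a dict of `{neighbour: weight}` dicts holding only existing links, all with positive weight) are hypothetical.

```python
import heapq

def path_similarity(weights):
    """Similarity distance along one path: l = 1 / sum(1 / w_e) over its edges."""
    return 1.0 / sum(1.0 / w for w in weights)

def max_similarity(adj, source):
    """Largest similarity distance from `source` to every reachable vertex.

    Maximising 1 / sum(1 / w_e) equals minimising sum(1 / w_e), so this is
    plain Dijkstra with edge cost 1 / w_e. Weights must be positive.
    """
    cost = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        c, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u].items():
            nc = c + 1.0 / w
            if nc < cost.get(v, float("inf")):
                cost[v] = nc
                heapq.heappush(heap, (nc, v))
    # the source is left out: its similarity distance to itself would be infinite
    return {v: 1.0 / c for v, c in cost.items() if v != source}

print(path_similarity([2.0, 2.0]))  # 1.0, smaller than both edge weights
adj = {1: {2: 2.0, 3: 0.5}, 2: {1: 2.0, 3: 2.0}, 3: {1: 0.5, 2: 2.0}}
print(max_similarity(adj, 1))  # {2: 2.0, 3: 1.0}
```

In the small example, the two-step route from vertex 1 to vertex 3 through vertex 2 (similarity distance 1.0) beats the weak direct link of weight 0.5.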

The evolution is given as follows. Starting from a fully connected initial network of $n_0$ vertices, with initial connection counts $T_{ij} = 1$ (and initial weights $w_{ij} = f(1)$), at every time step:
1. One new vertex is added to the network, and $l$ old vertices are chosen at random from the existing network.
2. Each of these $l + 1$ vertices (denoted vertex $n$) builds up $m$ connections. The probability for each link from $n$ to attach to vertex $i$ is given by equation (2), where $k_i$ is the degree of vertex $i$, $w_i = \sum_j w_{ji}$ is the 'onto' vertex weight of vertex $i$, $l_{ni}$ is the similarity distance from $n$ to $i$, and $\partial^d n$ is the set of nodes within the $d$th neighbours of vertex $n$. For example, $\partial^1 n$ is the set of nearest neighbours, $\partial^2 n$ includes both the nearest and the second-nearest neighbours, and so on. The first two terms are the usual degree-based and weight-based preferential attachment. The last term means that activated old vertices prefer to connect to their neighbours rather than to remote vertices.
3. After an end node $i^*$ has been chosen from all vertices of the existing network with the probability in equation (2), the connection count between vertex $n$ and $i^*$ increases by one:
$$T_{ni^*} \to T_{ni^*} + 1. \qquad (3)$$
4. The weight of the edge changes accordingly:
$$w_{ni^*} = f(T_{ni^*}). \qquad (4)$$
Although the general model defined above can also be applied to directed networks, we assume $w_{ij} = w_{ji}$ in the following analysis. A general form of $f(T_{ij})$ can be used to convert connection count into weight. We mostly consider the special case
$$f(T_{ij}) = T_{ij}, \qquad (5)$$
and in one simulation we use the tanh function,
$$f(T_{ij}) = \tanh(\alpha T_{ij}). \qquad (6)$$
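Steps 1-4 can be sketched in Python for the two pure cases only: weight-driven (end vertex chosen with probability proportional to $w_i$) and degree-driven (proportional to $k_i$). The δ term and the mixing coefficients of equation (2) are deliberately left out. Avoiding self-links, drawing the $l$ old vertices without replacement and updating counts immediately after every link are our assumptions, and the function `grow` and its parameters are hypothetical.

```python
import random

def grow(n0=4, m=2, l=1, steps=200, mode="weight", f=lambda t: t, seed=0):
    """Grow a network by steps 1-4 for the pure weight- or degree-driven case.

    mode="weight": end vertex i chosen with probability proportional to w_i.
    mode="degree": end vertex i chosen with probability proportional to k_i.
    Assumptions (not fixed by the text): no self-links, the l old vertices
    are drawn without replacement, and state is updated after every link.
    f must satisfy f(0) = 0. Returns (T, k, s): counts, degrees, strengths.
    """
    rng = random.Random(seed)
    # fully connected start with T_ij = 1 for every pair
    T = [{j: 1 for j in range(n0) if j != i} for i in range(n0)]
    k = [n0 - 1] * n0                 # degrees k_i
    s = [(n0 - 1) * f(1)] * n0        # strengths w_i = sum_j f(T_ij)

    for _ in range(steps):
        old = rng.sample(range(len(T)), l)   # step 1: l random old vertices
        T.append({})                         # step 1: one new vertex
        k.append(0)
        s.append(0.0)
        for n in [len(T) - 1] + old:         # step 2: each builds m links
            for _ in range(m):
                scores = list(s) if mode == "weight" else list(k)
                scores[n] = 0                # exclude the link n -> n
                i = rng.choices(range(len(T)), weights=scores)[0]
                t = T[n].get(i, 0)
                if t == 0:                   # a brand-new edge
                    k[n] += 1
                    k[i] += 1
                T[n][i] = T[i][n] = t + 1    # step 3: T_ni* increases by one
                dw = f(t + 1) - f(t)         # step 4: w_ni* = f(T_ni*)
                s[n] += dw
                s[i] += dw
    return T, k, s

T, k, s = grow()
E0 = 4 * 3 // 2
print(sum(s) == 2 * E0 + 2 * 2 * (1 + 1) * 200)  # True with linear f
```

With the linear relation $w_{ij} = T_{ij}$, every link adds one to the strength of both ends, so the total strength after $t$ steps is $2E_0 + 2m(1+l)t$, the bookkeeping that enters the rate equation for the vertex weight distribution. Passing, for example, `f=lambda t: math.tanh(alpha * t)` gives the tanh variant.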

Pure weight-driven and pure degree-driven models
Let us first consider the pure weight-driven case with $p = 1$ and $\delta = 0$. In the collaboration picture, this means that scientists choose their collaborators according to weight rather than degree. The same weight-driven mechanism was used in Barrat's model [11]; the only difference between Barrat's model and ours is the evolution of weight. In Barrat's model, an increment of weight is distributed among the neighbours of the chosen vertex, whereas in our model weight evolves through the connection count. Analytical results for the vertex weight distribution can be obtained in this case if we use the linear relation between weight and connection count, equation (5). Vertex weight, or strength, is $w_i = \sum_j w_{ij}$. The number $N(w,t)$ of vertices with strength $w$ at time $t$ obeys the rate equation (8), where $\sum_w wN(w, t-1) = 2E_0 + 2m(1+l)t$ is the total weight and $N = n_0 + t$ is the size of the system at time $t$, starting from $E_0$ edges and $n_0$ nodes. The first term reflects the preferential attachment used to select the other end of the link, while the following two terms correspond to the random selection of $l$ old vertices. When $E_0$ and $n_0$ are much smaller than $t$, the size of the network $N$ is approximately the time step $t$. The rate equation (8) can then be written as equation (9), where $p(w,t) \equiv N(w,t)/t$ is the density of vertices with strength $w$ at time $t$. For large enough $t$, we obtain the asymptotic distribution of vertex weight, equation (10). The analytical result of equation (10) agrees very well with the computer simulations in figure 1. The low-weight end of the distribution is clearly affected by the parameter $l$ and departs from a power law, while the upper end still follows a power law. Figure 2(a) shows the typical degree and vertex weight distributions obtained from simulations of the pure weight-driven model. The distribution of link weight also obeys a power law, as shown in figure 3.
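The density $p(w,t) \equiv N(w,t)/t$ that is compared with the analytical distribution can be estimated from simulated strengths by plain frequency counting or, alternatively, by logarithmic binning. This Python sketch is illustrative only; the helper names, the base-2 bins and the requirement that values be at least 1 (true for integer strengths under the linear relation $w_{ij} = T_{ij}$) are assumptions.

```python
import bisect
import math
from collections import Counter

def linear_density(strengths, t):
    """p(w, t) = N(w, t) / t by plain frequency counting of strength values."""
    return {w: c / t for w, c in sorted(Counter(strengths).items())}

def log_binned_density(values, base=2.0):
    """Density on bins [1, base), [base, base**2), ...; all values must be >= 1.

    Each count is divided by the bin width and the number of values, so the
    estimate integrates to one and can be set against a continuous power law.
    Returns (geometric bin centre, density) pairs for non-empty bins.
    """
    if min(values) < 1:
        raise ValueError("values must be >= 1")
    top = max(values)
    edges = [1.0]
    while edges[-1] <= top:
        edges.append(edges[-1] * base)
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[bisect.bisect_right(edges, v) - 1] += 1
    n = len(values)
    return [(math.sqrt(lo * hi), c / ((hi - lo) * n))
            for lo, hi, c in zip(edges, edges[1:], counts) if c > 0]

print([d for _, d in log_binned_density([1, 1, 2, 3, 4, 5, 6, 7])])  # [0.25, 0.125, 0.125]
```

Dividing by the bin width matters: without it, the wider bins at large $w$ would collect more vertices and distort the apparent power-law slope.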
Logarithmic binning would smooth the noisy tails caused by the finite size of the simulations; however, linear frequency-count plots such as these are sufficient to show the general form of the distributions. The insets of the figures show a very strong, but nonlinear, correlation between vertex weight and vertex degree, which can be described by $w \sim k^\eta$, as fitted in the insets. This indicates that the information stored in these weighted networks is not fully captured by the corresponding binary networks. Such correlations have been observed in empirical work, for example $\eta = 1.5$ for a worldwide airport network [2]. For the correlation analysis, the weights of vertices with the same degree are averaged. In the pure degree-driven case, our model is similar to the BA model, except that old vertices can also be activated [17,20,21] and connections can be repeated. The typical degree and vertex weight distributions from simulations of the degree-driven model are shown in figure 2(b), and the edge weight distribution is shown in figure 3. There is another important reason for considering both degree-based and weight-based preferential attachment: we want to investigate the structural role of weight. Is weight a fundamental quantity at the same level as connection, or is it a higher-level structure that can be defined in terms of fundamental ones? The difference between the two models, and which of them behaves better, bear on the question of whether the weight of an edge is as fundamental as the connection itself. However, as far as can be seen from figure 2, both models show the typical behaviours, even with the same exponent. In the weight-driven model, we also tried the nonlinear tanh weight, equation (6), to couple connection count to weight. The results in figure 4 do not differ much from those for linear weight.
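The correlation analysis described above (averaging the weights of vertices with the same degree, then fitting $w \sim k^\eta$) might be sketched as follows. An unweighted least-squares fit of $\log \langle w \rangle$ against $\log k$ is our assumption, since the fitting procedure is not specified, and `fit_eta` is a hypothetical name.

```python
import math
from collections import defaultdict

def fit_eta(degrees, strengths):
    """Estimate eta in w ~ k**eta.

    Strengths are first averaged over vertices with the same degree; eta is
    then the least-squares slope of log(mean w) against log(k).
    Needs at least two distinct positive degrees.
    """
    groups = defaultdict(list)
    for k, w in zip(degrees, strengths):
        groups[k].append(w)
    ks = sorted(groups)
    xs = [math.log(k) for k in ks]
    ys = [math.log(sum(groups[k]) / len(groups[k])) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# synthetic check with an exact relation w = k**1.5
ks = [1, 2, 2, 3, 4, 4, 5]
print(round(fit_eta(ks, [k ** 1.5 for k in ks]), 6))  # 1.5
```

Averaging per degree before fitting keeps the many low-degree vertices from dominating the fit.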

Distance-dependent preferential attachment
According to the mechanism represented by the δ term in equation (2), for a given start vertex, the probability of attaching to an end vertex is proportional to the similarity distance between them. A newly added vertex has no connections yet, so its similarity distance to every existing vertex is 0 and the end vertex is chosen at random. When a connection is built up from a vertex already connected to the network, the existing connections within $\partial^d n$ define a distribution of similarity distances, and a vertex is more likely to be chosen if it has a closer relation (larger similarity distance) with the active one. This term can act in addition to the other two mechanisms, or it can drive the network evolution alone. First, we keep only the degree term and the δ term in our model, so that $p = \delta \in (0, 1)$ in equation (2). In figure 5, simulations show that this δ mechanism significantly increases the clustering coefficient, while the degree and weight distributions still follow power laws. Similar results are obtained when the δ term is added to the weight-driven model. In figure 6, we set $p = \delta = 1$, keeping only the δ term, with $d = 2$. The model is then a pure DDPA model, and the simulation shows that even this case displays all the typical properties together with a high clustering coefficient.
[Figure caption: Clustering coefficient when the δ mechanism is added to the pure degree-driven model, where the first line above is the one from the pure δ mechanism. Weight and degree distributions for these mixed models are omitted because they are very similar to those of the pure models.]
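The pure δ mechanism with $d = 2$, and the clustering coefficient used to assess it, can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the figures: $l_{ni}$ is taken as the maximum similarity distance over all paths (Dijkstra on edge cost $1/w$), candidates are restricted to $\partial^d n$, the end vertex is drawn with probability proportional to $l_{ni}$, a vertex without neighbours picks uniformly among all other vertices, and the clustering coefficient is the usual unweighted average with $C_i = 0$ for vertices of degree below 2. All function names are hypothetical.

```python
import heapq
import random

def within_d(adj, n, d):
    """The set of vertices at most d hops from n, excluding n itself."""
    seen, frontier = {n}, [n]
    for _ in range(d):
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    seen.discard(n)
    return seen

def similarity_from(adj, n):
    """Maximum similarity distance l_ni to every reachable i (Dijkstra, cost 1/w)."""
    cost, heap, done = {n: 0.0}, [(0.0, n)], set()
    while heap:
        c, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u].items():
            nc = c + 1.0 / w
            if nc < cost.get(v, float("inf")):
                cost[v] = nc
                heapq.heappush(heap, (nc, v))
    return {v: 1.0 / c for v, c in cost.items() if v != n}

def ddpa_target(adj, n, d, rng):
    """Choose the end vertex of a link from n by the pure delta mechanism."""
    hood = within_d(adj, n, d)
    if not hood:
        # no neighbours yet: every l_ni is 0, so choose uniformly at random
        return rng.choice([v for v in adj if v != n])
    sim = similarity_from(adj, n)
    cand = sorted(hood)
    return rng.choices(cand, weights=[sim[v] for v in cand])[0]

def average_clustering(adj):
    """Unweighted average clustering coefficient, with C_i = 0 when k_i < 2."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
print(sorted(within_d(path, 0, 2)))  # [1, 2]
print(ddpa_target(path, 0, 2, random.Random(0)) in {1, 2})  # True
```

Because second-nearest neighbours are reachable only through shared neighbours, linking to them closes triangles, which is consistent with the rise in clustering coefficient reported for the δ mechanism.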


Conclusion
In this paper, we presented evolutionary models for weighted networks that integrate the contributions of both new and old vertices. Three mechanisms, namely degree-driven, weight-driven and distance-dependent preferential attachment, are discussed and compared. Weight is assigned to each link according to its connection count, which increases whenever the connection is repeated, so the weight of a link changes as the network evolves. Including the activity of old vertices, recording connection counts and converting them into weight are the essential ideas of this work. Incorporating local distance into network evolution through the δ mechanism is another highlight. Although converting connection counts into edge weights is not a new idea, this is the first time it has been the key element of network modelling. Because of this link between weight and connection count, no separate mechanism is needed for the evolution of weights. The emergence of typical results in agreement with empirical studies makes this simple idea valuable.