Triadic closure dynamics drives scaling-laws in social multiplex networks

Social networks exhibit scaling laws for several structural characteristics, such as the degree distribution, the scaling of the attachment kernel, and the clustering coefficients as a function of node degree. A detailed understanding of whether and how these scaling laws are inter-related is still missing, let alone whether they can be understood through a common dynamical principle. We propose a simple model for stationary network formation and show that the three scaling relations follow as natural consequences of triadic closure. The validity of the model is tested on multiplex data from a well-studied massive multiplayer online game. We find that the three scaling exponents observed in the multiplex data for the friendship, communication and trading networks can be explained simultaneously by the model. These results suggest that triadic closure could be identified as one of the fundamental dynamical principles in social multiplex network formation.

Social networks often exhibit statistical structures that manifest themselves in scaling laws, which can be quantified through a set of characteristic exponents. Perhaps the three most relevant scaling laws for network formation are the linking probability for new nodes joining the network as a function of the degree of the existing (linked-to) node, the degree distribution, and the clustering coefficient of nodes as a function of their degree. In particular, the probability for a node to acquire a new link, the attachment kernel Π(k), often scales with the node degree k [1,2] as
$$\Pi(k) \propto k^{\gamma} . \tag{1}$$
The degree distribution of social networks, i.e. the probability P(k) to find a node with degree k, often shows features of exponential or fat-tailed distributions [3,4], or something in between, depending on the type of social interaction [5,6]. Such distributions can be parameterized conveniently by the q-exponential [7,8],
$$P(k) \propto \left[1 - (1 - q)\,\frac{k}{\kappa}\right]^{\frac{1}{1-q}} , \tag{2}$$
where κ sets the degree scale and q determines the asymptotic scaling exponent 1/(1 − q). A third scaling law, which is ubiquitous in social networks [5,6,9,10], is observed for the clustering coefficients c(k) as a function of node degree,
$$c(k) \propto k^{\beta} . \tag{3}$$
Despite the overwhelming empirical evidence for the scaling laws in equations (1–3), it is still undecided whether they share a common dynamical origin, and whether and how the characteristic exponents are related to each other. For example, for growing network models, where new nodes are constantly added and link to existing nodes through a preferential attachment rule [3], a relation between the scaling exponent of the degree distribution and the exponent γ of the attachment kernel has been found [11]. However, these models cannot explain the observed scaling of the clustering coefficients. Moreover, the preferential attachment process [3] requires global information (the degrees of all nodes in the network) to establish a new social tie, which is clearly an unrealistic assumption for most social networks.
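As a quick numerical check of how q controls the tail of the degree distribution, the sketch below evaluates the q-exponential in its common form [1 − (1 − q)k/κ]^{1/(1−q)}, with κ a scale parameter, and measures the local log-log slope far out in the tail. It is an illustrative Python sketch; the function names and the values q = 1.5, κ = 10 are our own choices.

```python
import math


def q_exponential(k: float, q: float, kappa: float) -> float:
    """Unnormalized q-exponential [1 - (1 - q) k / kappa]^(1 / (1 - q)).

    Reduces to exp(-k / kappa) for q = 1. For q < 1 the base reaches zero
    at k = kappa / (1 - q); beyond that cut-off we return 0.0.
    """
    if abs(q - 1.0) < 1e-12:
        return math.exp(-k / kappa)
    base = 1.0 - (1.0 - q) * k / kappa
    if base <= 0.0:
        return 0.0
    return base ** (1.0 / (1.0 - q))


def local_log_slope(q: float, kappa: float, k: float, factor: float = 2.0) -> float:
    """Slope of log P versus log k, estimated between k and factor * k."""
    p1 = q_exponential(k, q, kappa)
    p2 = q_exponential(factor * k, q, kappa)
    return math.log(p2 / p1) / math.log(factor)


if __name__ == "__main__":
    # For q = 1.5 the tail exponent 1 / (1 - q) equals -2.
    print(local_log_slope(1.5, 10.0, k=1e6))  # close to -2
    # Close to q = 1 the q-exponential approaches the exponential.
    print(q_exponential(5.0, 1.0 + 1e-6, 10.0), math.exp(-0.5))
```

For q > 1 the measured slope approaches 1/(1 − q), the fat-tailed case; for q < 1 the function vanishes beyond k = κ/(1 − q), so the distribution decays faster than the exponential.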
To overcome this problem, growth and preferential attachment mechanisms have been extended by local network formation rules [12,13,14,16], where a node's linking dynamics depends only on its neighbors or second neighbors. One such local rule that is extremely relevant for social network formation is the principle of triadic closure [17,18]: the probability that a new link closes a triad is higher than the probability of linking two arbitrary nodes. Scaling laws for the degree distribution [13], for the degree distribution and clustering coefficients [14,15], and for preferential attachment [16] have been reproduced in specific models using triadic closure. While it is instructive to see how a combination of growth, preferential attachment and clustering processes gives rise to the three scaling laws above, this does not tell us whether the three exponents and their possible inter-relations can emerge from a single underlying dynamical origin, nor to what extent such a common origin is an actual feature of real social network formation processes. Less is known about relations between characteristic exponents in non-growing, stationary networks [7,19]. It has been shown that triadic closure is related to scaling laws for the degree distribution and clustering coefficients in the stationary case [20,21,22,23].
Figure 1. Node i (with at least two links) and one of its neighbors j are randomly selected. With probability r the process of triadic closure takes place (the triad consists of i, j, k); with probability 1 − r, j links to a random node.
Here we study a simple model that simultaneously explains the three scaling laws in equations (1–3) on the basis of triadic closure in non-growing networks. This process provides a mechanism from which preferential attachment emerges, leads to fat-tailed degree distributions, and induces scaling of the clustering coefficients with node degree. The model is validated with data from a social multiplex, i.e. a superposition of several social networks, labeled by α and with adjacency matrices M_α, defined on the same set of nodes [24]. The model can be fully calibrated with the multiplex data and explains the three observed characteristic exponents for three different sub-networks of the multiplex.

Model specification
The model is built around the process of triadic closure, the principle that links tend to be created between nodes that share a neighbor. The model includes the addition and removal of nodes. The network is initialized with N nodes, each having one link to a randomly chosen node. The dynamics is completely specified by iterating the following steps, starting at time-step t.
(i) Pick a node i at random. If i has fewer than two links, create a link between i and a randomly chosen node, and continue with step (iii). If i has two or more links, choose one of its neighbors at random, say node j, and continue with step (ii).
(ii) With probability r (triadic closure parameter), create a link between j and another randomly chosen neighbor of i, say k. With probability 1 − r, create a link between j and a node randomly chosen from the entire network, see figure 1.
(iii) With probability p (node-turnover parameter) remove a randomly chosen node from the network along with all its links, and introduce a new node linking to m randomly chosen nodes. Then continue with time-step t + 1.
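The protocol in steps (i)–(iii) can be sketched as a short Python simulation. This is a minimal sketch under our own assumptions, not the authors' implementation: the network is kept simple (no self-loops or duplicate links, and a step that would create either simply adds no link), a removed node's index is reused for the newcomer so that N stays fixed, and the name `simulate` is hypothetical.

```python
from __future__ import annotations

import random


def simulate(N: int, r: float, p: float, m: int, steps: int,
             seed: int | None = None) -> tuple[list[set[int]], list[int]]:
    """Run the triadic-closure protocol; return adjacency sets and L(t)."""
    rng = random.Random(seed)
    adj: list[set[int]] = [set() for _ in range(N)]
    n_links = 0

    def add_link(a: int, b: int) -> None:
        # Assumption: self-loops and already existing links are not created.
        nonlocal n_links
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            n_links += 1

    def random_other(a: int) -> int:
        # Uniformly random node different from a.
        b = rng.randrange(N - 1)
        return b if b < a else b + 1

    # Initialization: every node links to one randomly chosen node.
    for a in range(N):
        add_link(a, random_other(a))

    link_counts = []
    for _ in range(steps):
        i = rng.randrange(N)
        if len(adj[i]) < 2:
            # Step (i): i has fewer than two links, so link i to a random node.
            add_link(i, random_other(i))
        else:
            j = rng.choice(tuple(adj[i]))
            if rng.random() < r:
                # Step (ii), triadic closure: j links to another neighbor k of i.
                k = rng.choice(tuple(adj[i] - {j}))
                add_link(j, k)
            else:
                # Step (ii), random linking: j links to any node.
                add_link(j, random_other(j))
        # Step (iii): with probability p, replace a random node by a new one.
        if rng.random() < p:
            x = rng.randrange(N)
            n_links -= len(adj[x])
            for y in adj[x]:
                adj[y].discard(x)
            adj[x] = set()  # index x is reused for the new node
            for y in rng.sample(range(N - 1), m):
                add_link(x, y if y < x else y + 1)  # m distinct nodes other than x
        link_counts.append(n_links)
    return adj, link_counts
```

For example, `adj, L = simulate(N=1000, r=0.9, p=0.1, m=0, steps=200_000, seed=1)` returns the final adjacency sets and the link count after every time-step, so the approach of L(t) to a stationary level can be inspected directly. If every attempted link were created, the balance Δl⁺ = Δl⁻ would put the mean degree near m + 1/p = 10 for these parameters; attempts rejected as duplicates push it somewhat lower.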
For p > 0, nodes have a finite lifetime, so the network reaches a stationary state in which the total number of links L(t) and the network measures Π(k), P(k), and c(k) fluctuate around steady-state levels. The model is a variant of the model proposed in [20], which is contained as the special case r = 1 in the above protocol. Our model can also be seen as a stationary version of the connecting-nearest-neighbors model in [14].
Combinations of triadic closure and random edge attachment have also been studied in growing [13,15] and weighted [22] networks. Whether a stationary state is reached does not depend on m. The model is completely specified by four parameters: N, r, p, and m.
Table 1. Summary of network measures and model results. For the Pardus friendship (α = 1), communication (comm., α = 2), and trade (α = 3) networks, the number of nodes N_α, links L_α, average degree k̄_α, and the average numbers of nodes entering and leaving the network per day, Δn⁺_α and Δn⁻_α, are shown. The results of the calibration of the model to the empirical networks, r and p, are given, together with the fit results for the exponents γ, q, and β for the data and the model.

Estimation of model parameters
Social ties are often established when two individuals are introduced by a mutual acquaintance. Other modes of social tie formation, such as random encounters, need not lead to triadic closure.
Step (ii) in the above protocol captures these two linking processes. Ties also change because people enter and leave social circles, for example when they change workplaces, move to different cities, or change their hobbies. This is incorporated in step (iii). To calibrate the model to a sub-network M_α of a real social multiplex, with N_α nodes and L_α links, the stationarity assumption has to be checked, and the parameters for triadic closure, r, and node turnover, p, have to be estimated. Consider the average numbers of nodes entering (Δn⁺_α) and leaving (Δn⁻_α) the network M_α per time unit. For stationarity to hold we demand
$$\left|\Delta n^{+}_{\alpha} - \Delta n^{-}_{\alpha}\right| \ll \Delta n^{+}_{\alpha},\ \Delta n^{-}_{\alpha} , \tag{4}$$
i.e. the net growth rate is much smaller than the rates at which nodes enter or leave the network. The triadic closure parameter r_α can be measured directly as the ratio between the number of links in M_α that close at least one triangle at their creation and the total number of created links. The node-turnover parameter p can be estimated by requiring the number of links in the model to equal that in the real network. To see this, note that on average Δl⁺ links are added and Δl⁻ links are removed per time-step; stationarity means Δl⁺ = Δl⁻. Since one link is created at each time-step in either step (i) or (ii), and m links are added with probability p in step (iii), we have Δl⁺ = 1 + pm. Denoting the average degree by k̄ = 2L/N, step (iii) removes on average k̄ links with probability p, so Δl⁻ = p k̄. Equating the two gives the turnover parameter used to calibrate the model to a network M_α,
$$p_{\alpha} = \frac{1}{\bar{k}_{\alpha} - m} . \tag{5}$$
The model is initialized with N_α nodes, and the dynamics follows the protocol with parameters r_α and p_α. After a transient phase the number of links fluctuates around L_α, and the scaling exponents γ, q, β approach stationary values. Calibration of the model requires complete, time-resolved topological information M_α(t) over a large number of link-creation processes.
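The two calibration steps and the stationarity check can be sketched in Python. The event format `(kind, u, v)`, the function names, and the tolerance `tol` that stands in for "≪" are hypothetical choices made for illustration; the formulas follow the text: r is the fraction of created links whose endpoints already share a neighbor, and p = 1/(k̄ − m) with k̄ = 2L/N.

```python
from collections import defaultdict


def triadic_closure_ratio(events):
    """Estimate r as the fraction of created links that close >= 1 triangle.

    events: chronological iterable of (kind, u, v) with kind "add" or
    "remove" (a hypothetical format for time-resolved link data).
    """
    adj = defaultdict(set)
    created = closing = 0
    for kind, u, v in events:
        if kind == "add":
            if u == v or v in adj[u]:
                continue  # self-loop or existing link: not a link creation
            created += 1
            if adj[u] & adj[v]:  # a common neighbor: the new link closes a triangle
                closing += 1
            adj[u].add(v)
            adj[v].add(u)
        elif kind == "remove":
            adj[u].discard(v)
            adj[v].discard(u)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return closing / created if created else float("nan")


def turnover_parameter(n_nodes, n_links, m):
    """p = 1 / (k_mean - m) with k_mean = 2 L / N, from Δl+ = Δl-."""
    k_mean = 2.0 * n_links / n_nodes
    if k_mean <= m:
        raise ValueError("a stationary state requires k_mean > m")
    return 1.0 / (k_mean - m)


def is_stationary(dn_plus, dn_minus, tol=0.1):
    """Stationarity check; tol is an arbitrary stand-in for '<<'."""
    return abs(dn_plus - dn_minus) <= tol * min(dn_plus, dn_minus)
```

For instance, a network with N = 1000 nodes and L = 5000 links has k̄ = 10, so `turnover_parameter(1000, 5000, 0)` gives p = 0.1.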
Suitable data are available, for example, for the social multiplex network of the online game 'Pardus' [6,25,26,27,28]; see the Methods section. Table 1 summarizes key features of M_α, including the number of nodes N_α and links L_α for the Pardus friendship (α = 1), communication (α = 2), and trade (α = 3) networks. Table 1 also lists the average degree k̄_α, as measured on the last day of the observation record, and the average numbers of nodes entering (Δn⁺_α) and leaving (Δn⁻_α) per day, confirming that the networks are indeed stationary in the sense of equation (4). Estimates for r and p are also shown in table 1.

Characteristic exponents
The characteristic exponents γ, q, and β obtained from model simulations depend on the parameters p and r, as shown in figure 2. We fix N = 10^3 and m = 0. Results are averaged over 500 realizations for each parameter pair (p, r). All three scaling laws, equations (1–3), are reproduced by the model.
Model exponents γ fall in the range 0 < γ < 1, depending on p and r (figure 2(a)); the preferential attachment associated with triadic closure is therefore sub-linear. γ is close to one for high p and high r. The dependence of the exponent q on both p and r is shown in figure 2(b). Note that for q = 1 the q-exponential reduces to the exponential; values of q above (below) one indicate that the distribution decays more slowly (faster) than the exponential. For small p and large r, q is significantly larger than one and degree distributions are fat-tailed. For large p the values of q approach one, independent of r. Values of β are close to zero for r = 0 or as p approaches 0. β approaches a plateau at β = −1 for high values of p and r, see figure 2(c).
For the empirical validation of the model, figure 3 shows the attachment kernel Π_α(k_α), degree distribution P_α(k_α), and clustering coefficients c_α(k_α) for the three sub-networks M_α of the empirical multiplex data. They are compared to the respective distributions of the calibrated model (results averaged over 20 realizations). Data and model results are logarithmically binned; a version of figure 3 showing the raw data can be found in the supplementary information.
The observed preferential attachment in the data is in good agreement with the model results for each network M_α; see the top row of figure 3. We find exponents of γ = 0.88 (4) for the data and γ_mod = 0.77 (2) (3), respectively. The model results for c_α(k_α) are curved rather than straight lines. Comparing the curves for α = 1, 2, 3 suggests that this curvature increases with the average degree k̄_α. Values of β_mod should therefore be interpreted as first-order approximations of the slopes of these curves. Results for the exponents γ, q, β for data and model are summarized in table 1.

Discussion
We have reported strong evidence that triadic closure may play an even more fundamental role in social network formation than previously anticipated [17,18]. Given that all model parameters can be measured in the data, it is remarkable that three important scaling laws are explained simultaneously by this simple triadic closure model. Since the exponents γ, q, and β are sensitive to the choice of the model parameters p and r, the agreement between data and model is all the more remarkable.
The Pardus multiplex data contain three further social networks, in which links express negative relationships between players, such as enmity, attacks, and revenge [6]. Triadic closure is known to be a poor network formation mechanism for negative ties: "the enemy of my enemy is in general not my enemy" [29]. It was shown that the probability of triadic closure between three players is one order of magnitude smaller for enmity links than for friendship links in the Pardus multiplex [25,6]. The model is therefore not suited to describe the formation of links expressing negative sentiments.
Figure 3. Top row: The attachment kernels scale sub-linearly with node degree in each case, for data (γ) and model (γ_mod). Curves for data and model are barely distinguishable from each other. Middle row: Degree distributions for α = 1, 2, 3 and best fits of a q-exponential, for data (q) and model (q_mod). Bottom row: The scaling of the average clustering coefficients as a function of degree is compared between data and model. Fits for β and β_mod yield almost the same results for friends and trades, with comparatively larger deviations for the communication network. The model results for c_α(k_α) show an upward curvature for high k_α.
The findings of the model also agree with several properties of real-world social networks. Sub-linear preferential attachment has been reported in scientific collaboration networks and the actor co-starring network (Π(k) ∝ k^0.79 and ∝ k^0.81, respectively [2]). Degree distributions of many social networks fall between exponential and power-law distributions [3,4,5,25,30], and scaling of the average clustering coefficients with degree has been observed in the scientific collaboration and actor networks, with c(k) ∝ k^(−0.77) and ∝ k^(−0.31), respectively (when the same fitting procedure as in figure 3 is applied). Mobile phone and communication networks give c(k) ∝ k^(−1) [31].
In the Pardus dataset, players are removed if they choose to leave the game or if they are inactive for some time [25]. In the mobile communication, actor, and collaboration networks, a link is established by a single action (phone call, movie, or publication) and persists from then on. Note that our model addresses the empirically relevant case where node-turnover rates (Δn⁺_α, Δn⁻_α) are significantly larger than the effective network growth rate (Δn⁺_α − Δn⁻_α). For growing networks (without node deletion) it has been shown that sub-linear preferential attachment (γ < 1) leads to degree distributions with a power-law tail whose exponent is proportional to γ [11]. Something similar can be observed in the present model. If we keep the node-turnover parameter p fixed and decrease the triadic closure parameter r, figures 2(a) and (b) show that γ decreases and q approaches one; the network is then dominated by randomly created links. However, if we fix r = 1 (only triadic closure, no random links) and increase p, figures 2(a) and (b) show that q approaches one despite an increase in γ. An increase of the node-turnover parameter p implies a shorter lifetime for individual nodes and hence a shorter time in which they may acquire new links. Consequently, the degree distribution has a substantial right skew only if both p ≲ 0.25 and r ≳ 0.5 hold.

Multiplex data
The Pardus dataset allows one to continuously track all actions of more than 370,000 players in an open-ended, virtual, futuristic game universe, where players interact in a multitude of ways to achieve their self-posed goals, such as accumulating wealth and influence. Players can establish friendship links, exchange one-to-one messages (similar to phone calls), and trade with each other. We focus on three sub-networks of the multiplex (friendship, communication, trade) over one year, from Sep 2007 to Sep 2008. Network label α = 1 refers to the friendship network, α = 2 to communication, and α = 3 to trade. In the friendship network, a node is present on a given day if at least one friendship link to another node exists on that day. A node is removed if the player either leaves the game or has no friendship link. The same holds for the message and trade networks, where a link exists between two nodes on day t if at least one message (trade) is exchanged within the six-day period [t − 6, t]. For details of structural and dynamical properties of the Pardus multiplex, see [6,25,26,27,28].
To measure the degree distributions P_α(k_α) and clustering coefficients c_α(k_α), we use the adjacency matrix of the networks M_α on the last day of the data record. The preferential attachment probability Π_α(k_α) is measured by counting, over the entire observation period, the number of link-creation events in which a node with degree k acquires a new link, and dividing this number by the average number of nodes with degree k, where the average is again taken over the observation period.
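For the message and trade networks, the construction of daily networks from timestamped interactions, and the resulting entry and exit counts Δn⁺_α and Δn⁻_α, can be sketched as follows. This sketch assumes a hypothetical input of `(day, u, v)` interaction records and treats the window [t − 6, t] as inclusive at both ends, as written; the function names are illustrative.

```python
def window_edges(interactions, t, window=6):
    """Links present on day t: pairs with >= 1 interaction in [t - window, t].

    interactions: iterable of (day, u, v) messages or trades, with integer
    days (a hypothetical input format).
    """
    edges = set()
    for day, u, v in interactions:
        if t - window <= day <= t and u != v:
            edges.add((min(u, v), max(u, v)))  # undirected link
    return edges


def node_turnover(interactions, days, window=6):
    """Average numbers of nodes entering and leaving per day (Δn+, Δn-).

    A node is present on a day if it has at least one link on that day.
    """
    if len(days) < 2:
        raise ValueError("need at least two days")
    previous = None
    entered = left = 0
    for t in days:
        present = {x for edge in window_edges(interactions, t, window) for x in edge}
        if previous is not None:
            entered += len(present - previous)
            left += len(previous - present)
        previous = present
    n_transitions = len(days) - 1
    return entered / n_transitions, left / n_transitions
```

Rebuilding each window from scratch keeps the sketch simple; for a year of data one would instead update the window incrementally.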
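A sketch of the two measurements follows, assuming hypothetical inputs: daily degree snapshots plus the links created on each day for Π_α(k_α), and a last-day adjacency dictionary for c_α(k_α). Counting both endpoints of a new link at their start-of-day degree is our reading of "a node with degree k acquires a new link", not a detail given in the text.

```python
from collections import Counter, defaultdict


def attachment_kernel(degree_snapshots, created_links):
    """Pi(k): link acquisitions at degree k / average number of nodes of degree k.

    degree_snapshots[t]: dict node -> degree at the start of day t.
    created_links[t]: list of (u, v) links created during day t.
    """
    acquisitions, node_days = Counter(), Counter()
    for degrees, links in zip(degree_snapshots, created_links):
        node_days.update(degrees.values())  # nodes of each degree on this day
        for u, v in links:
            # Both endpoints acquire a link; newcomers count at degree 0.
            acquisitions[degrees.get(u, 0)] += 1
            acquisitions[degrees.get(v, 0)] += 1
    n_days = len(degree_snapshots)
    return {k: acquisitions[k] / (node_days[k] / n_days)
            for k in sorted(acquisitions) if node_days[k] > 0}


def clustering_by_degree(adj):
    """c(k): mean local clustering coefficient of nodes with degree k >= 2.

    adj: dict node -> set of neighbors (e.g. the last-day network).
    """
    sums, counts = defaultdict(float), Counter()
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # local clustering is undefined for k < 2
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        sums[k] += 2.0 * links / (k * (k - 1))
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}
```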

Fitting procedures
Power-law fits (least squares) to the logarithms of the logarithmically binned data in figure 3 are shown for γ over the range 2 < k_α < 100 and for β over the range 5 < k_α < 100, for each α, for data and model. The reported errors are the standard deviations of the coefficients. For the degree distributions, the data are also logarithmically binned and fitted with equation (2) over the entire range k_α > 0 in figure 3. These coefficients are obtained as maximum likelihood estimates; the reported errors correspond to 95% confidence intervals. For better comparison and to diminish the effect of outliers, data and model results for Π_α(k_α) are normalized over the range k_α ≤ 100. Higher degrees correspond to outliers in the data, often due to the behavior of non-serious players.
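The least-squares part of this procedure can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions: the bin count, using the within-bin mean degree as the bin position, and the names `log_bin` and `power_law_fit` are illustrative choices rather than the authors' code, the reported error is the ordinary least-squares standard error of the slope, and the maximum-likelihood q-exponential fit is not shown.

```python
import numpy as np


def log_bin(k, y, n_bins=20):
    """Average k and y within logarithmically spaced bins of k (k > 0)."""
    k = np.asarray(k, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.logspace(np.log10(k.min()), np.log10(k.max()), n_bins + 1)
    # Bin index of each point; the maximum value falls into the last bin.
    idx = np.clip(np.digitize(k, edges) - 1, 0, n_bins - 1)
    k_binned, y_binned = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():  # skip empty bins (common at small integer k)
            k_binned.append(k[mask].mean())
            y_binned.append(y[mask].mean())
    return np.array(k_binned), np.array(y_binned)


def power_law_fit(x, y, x_min, x_max):
    """Least-squares fit of log10 y = a + b log10 x for x_min < x < x_max.

    Returns the exponent b and its standard error.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = (x > x_min) & (x < x_max) & (y > 0)
    lx, ly = np.log10(x[mask]), np.log10(y[mask])
    if lx.size < 3:
        raise ValueError("need at least three points in the fit range")
    b, a = np.polyfit(lx, ly, 1)  # returns [slope, intercept]
    residuals = ly - (a + b * lx)
    se = np.sqrt(residuals @ residuals / (lx.size - 2)
                 / np.sum((lx - lx.mean()) ** 2))
    return b, se
```

For example, `power_law_fit(*log_bin(k, pi_k), 2, 100)` would return an estimate of γ for hypothetical arrays `k` and `pi_k`, and the range 5 < k < 100 would be used for β.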

Acknowledgments
This work was supported by Austrian science fund FWF P23378 and EU FP7 projects CRISIS no. 288501 and LASAGNE no. 318132. We thank B. Fuchs and M. Szell for data issues.