Asymptotic theory of time-varying social networks with heterogeneous activity and tie allocation

The dynamic of social networks is driven by the interplay between diverse mechanisms that still challenge our theoretical and modelling efforts. Amongst them, two are known to play a central role in shaping the networks evolution, namely the heterogeneous propensity of individuals to i) be socially active and ii) establish a new social relationships with their alters. Here, we empirically characterise these two mechanisms in seven real networks describing temporal human interactions in three different settings: scientific collaborations, Twitter mentions, and mobile phone calls. We find that the individuals’ social activity and their strategy in choosing ties where to allocate their social interactions can be quantitatively described and encoded in a simple stochastic network modelling framework. The Master Equation of the model can be solved in the asymptotic limit. The analytical solutions provide an explicit description of both the system dynamic and the dynamical scaling laws characterising crucial aspects about the evolution of the networks. The analytical predictions match with accuracy the empirical observations, thus validating the theoretical approach. Our results provide a rigorous dynamical system framework that can be extended to include other processes shaping social dynamics and to generate data driven predictions for the asymptotic behaviour of social networks.

The formation of social networks requires invest-ments in time and energy by each individual actor with the anticipation that collective benefits can arise for individuals and groups.Individuals however invest in developing social interactions heterogeneously and according to very diverse strategies.In the first place not all individuals are equally active in a given social network.Furthermore, individuals may allocate their social capital in very diverse way, for instance by favoring the strengthening of a limited number of strong ties (bonding capital) as opposed to favor the exploration of weak ties opening access to new information and communities (bridging capital) [1,2,3,4,5,6,7,8].The origins of such heterogeneities are rooted in the trade off between competing factors such as the need for close relationships [9], the efforts required to keep social ties [10], temporal and cognitive constraints [11,12,13], and have long been acknowledged as key elements in the description of social networks' properties [14,15,16], dynamical features [17,18,19,20,21,22,23,24,15,25], and the the behavior of processes unfolding in social systems [14,15,16,17,26,27,28,29,30,31,32].However, it is still lacking a general dynamical system framework able to relate the emerging connectivity pattern of social networks to the combined action of social actors activity and their heterogeneity in distributing resources in social capital allocation.
Here we analyze seven time-resolved datasets describing three different types of social interactions: scientific collaborations, Twitter mentions, and mobile phone calls.For all network datasets we define two functions statistically encoding the instantaneous activity of nodes and the allocation of social capital, respectively.The latter function is regulated by two parameters -system dependent-that define a simple reinforcement mechanism.In particular we observe in all datasets that the larger the number of social ties already activated by each node and the smaller is the probability of creating a new tie.We provide a thorough statistical characterization of the activity and reinforcement dynamics at play in each network and identify the basic parameters defining the dynamic of ties evolution.
Prompted by this statistical analysis, we propose a dynamic network model that includes the heterogenous activity of nodes and the the tie formation mechanisms.This model allows the definition of a formal Master Equation (ME) describing the evolution of the network connectivity structure that can be solved in the asymptotic regime (large network size and long time evolution).The solution of the ME provides the asymptotic form the degree distribution and the scaling relations relating degree, activity and and the functions characterizing the social capital allocation.The analytical solutions are capturing very well the empirical behavior measured in the analyzed datasets, connecting explicitly the evolution of social networks to the parameter regulating the emergence of heterogenous social ties.The proposed analytical framework is remarkably general and it can be solved for statistically different activity patterns.The presented results have the potential to open the path to a general asymptotic theory of the dynamic of social networks by progressively integrating further social capital allocation strategies for the formation of social ties.

Results
We analyze seven datasets containing time-stamped information about three different type of social interactions: scientific collaborations, Twitter mentions, and mobile phone calls.While we refer the reader to the Material and Methods section for the details of each data set, we represent all datasets as timevarying networks.Each node describes an individual.Each time-resolved link describes a social act.The nature of connections is different according to the specific dataset.Links might represent a collaboration resulting in a publication in a scientific journal, a Twitter mention, or a mobile phone call.We considered five scientific collaborations networks obtained from five different journals (P RA, P RB, P RD, P RE, and P RL) of the American Physical Society (APS), one Twitter mentions network (T M N ), and one mobile phone network (M P N ).
In order to characterize the time-varying properties of such networks we first measure the activity a i .Formally, a i is defined as the fraction of interactions in which node i is engaged per unit of time with respect to all the interactions per unit time occurring in the network.This quantity describes the propensity of nodes i to be involved in social interactions.Empirical measurements in a wide set of social networks show broad distributions of activity [23,28,29,16,33].As shown in Figure 1 [A-D], we confirm these observations in our datasets.In particular we find that in the APS and MPN datasets the activity is well fitted by a truncated power law, while in the TMN we find a Log-Normal distribution (see Material and Methods and Supplementary Online Materials for details).

Social capital allocation
The activity a i sets the clock for the activation of each node, however it does not provide any information on how each node invests its social capital in exploring new ties or reinforcing already established ties [34].In order to measure the formation of new ties, we group nodes in classes with similar activity a and final degree k, so that each class b contains actors with statistically equivalent characteristics (see SI for details).We then measure the probability p b (k) that the next social act for the nodes in the class b that have already contacted k nodes will result in the establishment of a new, k + 1-th, tie.As shown in Figure 1 [E-H] p b (k) is in general a decreasing function of k.This observation resonates with previous research and empirical findings suggesting that our social interactions are bounded by cognitive and temporal constrains [10,11,12,13].Indeed, the larger the number of alters in our social circle, the smaller the probability that the next social act will be towards a new tie.
The above empirical findings suggest that the mechanism governing the allocation of social capital follow a general form that in its simplest analytical form can be written as: In this expression, β b modulate the tendency to explore new connections, while c b define the intrinsic characteristic limit of the individual to maintain multiple ties.Although one could imagine more complicate analytical forms, we use this parsimonious approach to characterize the different data sets.Interestingly, we find that in the five co-authorship networks and Twitter, the exponent β is the same regardless of the class b.Furthermore, the values of c b are typically peaked around a well defined value (see SI for details).More in detail, we can rescale the proposed functional form in each class b by defining the variable In the presence of a single exponent β characterizing the system, as shown in Figure 1 [I-K], all empirical curves do collapse on the reference function (1+x) −1 .The data collapse however is not occurring in the case of the MPN dataset.In the latter we find a more heterogeneous scenario in which different nodes' classes are characterized by different values of β b and c b , see Figure 1 L. In the Supplementary Online Material we provide further evidence for the evidence of a single or distirbuted value of β in different datasets.

Stochastic model for the network dynamic
By leveraging on the empirical evidence gathered here, it is possible to define a basic generative model of network formation based on two stochastic processes.Defined the network G containing N nodes, at each time step a node i is active according to a probability a i drawn from distribution F (a). [23,28,29,16,33].Once active, the node i that has already contacted k different agents will contact a new, randomly chosen node with probability Otherwise, with probability 1 − p i (k), it will interact with an already contacted node chosen at random.Interactions are considered to last one single time step.For this model it is possible to write explicitly the master equation (ME) describing the evolution of the probability distribution P i (k, t) that a node i has degree k at time t: In the above equation j ∼ i and j i are the sum over the nodes already contacted and not yet contacted by i, respectively.Within these sums, we use k j as the degree of the node j.The first two terms on the right hand side of Eq. ( 3) account for the creation of nodes of degree k which occurs when a node of degree k − 1 gets active and contacts a new node, or when it gets in contact with a new node of previous degree k j that activates and attaches to node i.
The third and fourth terms of the r.h.s. of the equation account for the conservation of nodes of degree k, i.e. nodes that either get active and contact one of their neighbors with probability a(1 − p(k)) or get contacted by one of their neighbors.The last line of Eq. (3) takes into account for the case in which no node gets active in the current evolution time step, thus conserving the P i (k, t).

Asymptotic theory for network with
In the case of networks characterized by a single exponent β it is possible to consider for the ME the large time and large k limit, so that k can be approximated by a continuous variable.By neglecting the subleading terms of order 1\t we can thus write the continuous asymptotic version of Eq. ( 3) as This equation can be solved explicitly (see SI for details), yielding the asymptotic form: where A is a normalization constant, C a constant and B(a i , c i ) a multiplicative factor of the t 1/(1+β) term that depends on the activity a i and c i of the considered agent.Its implicit expression is given in the SI.
A first general result concerns the evolution in time of the average degree k(a, t) of nodes belonging to a given activity class that follows the scaling laws k(a, t) ∝ (at) The growth of the system is thus modulated by the parameter β that sets the strength of the reinforcement process in the process ruling the establishment of new social ties.In the limit case β = 0 the growth would be linear.Indeed, the reinforcement of previously activated ties would be zero and nodes would keep connecting randomly to other vertices, thus increasing indefinitely their social circle.In the opposite limit β → ∞ each node would invest is social capital on just one single connection, i.e. the first established.In the six datasets described by a single β value, we observe the range 0.13 ≤ β ≤ 0.47 that indicates a sub-linear growth of the social system.In Figure S12 we find a very good agreement between the analytical prediction of Eq. ( 6) and the empirical k(a, t) curves, obtaining the first empirical validation of the modeling framework proposed and its ability at capturing the network formation dynamic.Furthermore, Eq. ( 6) connects, at a given time t, the degree k and the activity a of a given node, as k ∝ a 1 1+β .Thus, given any specific activity distribution F (a), we can infer the functional form of the degree distribution ρ(k) by substituting a → k 1 1+β , finding: It is important stressing that the analytical framework is not limited to a specific functional form of the activity.Indeed, with an arbitrary functional form of F (a), Eq. ( 6) gives us the possibility to predict the behavior and parameters of the corresponding degree distribution.In Table 3 we report the degree distribution predicted by Eq. ( 6) for activities following a common set of heavy-tailed distributions, i.e. power-laws, truncated power-law, stretched exponentials, and log-normal, that are usually find in empirical data.In Figure S12[E-G] we compare the degree distributions ρ(k) predicted by Eq. (S22) with real data.Interestingly, also in this case the functional form obtained from the analytical solution of the model fit remarkably well the empirical evidence.
It is important to notice that ρ(k) is also function of the parameter β.In other words, the connectivity patterns emerging from social interactions can be inferred knowing the propensity of individuals to be involved in social acts, the activity, and the strength of the reinforcement towards previously establish ties, β.Finally it is worth remarking that Eqs.(6, S22) are not affected by the distribution of c i .This is an important result as it reduces the number of relevant parameters necessary to define the temporal evolution of the system.

Asymptotic theory for networks with distributed β
As we already mentioned, in the M P N dataset we find the evolution of social ties described by a distribution of β rather than a single value of it.This observation points to a more heterogeneous distribu-tion of social attitudes with respect to the other six analyzed datasets.Arguably, such tendency might be driven by the different functions phone calls serve enabling us to communicate with relatives, friends or rather to companies, clients etc..The need to introduce different values of β in the system complicates the model beyond analytical tractability (see SI for details).Nevertheless, we find that the leading term of the evolving average degree can be described by introducing a simplified model, in which the nodes of the system feature different values of β and undergo a simplified dynamics (see SI for further information) that neglects, for every node, the effects of links established by others.In these settings we can solve the ME and show that the minimum value of β, β min , rules the leading term of the evolving average degree.
In other words, we find that even in this case k(a, t) evolves as in Eq. ( 6) but with β substituted by β min .As shown in Figure S12-D the analytical predictions coming from the simplified model find good agreement with the empirical evidence.It is interesting to notice that the nodes characterized by β min are those with the weak tendency to reinforce already established social ties.They are social explorers [34].Notably, our results, indicate that they lead the growth of average connectivity of the network.

Discussion
The empirical finding presented here shows clearly that the "cost" associated to the establishment of a new social tie is not constant but is function of the number of already activated ties, thus supporting the idea that social capabilities are limited by cognitive, temporal or other forms of constraints [11,12,13].
Framing this empirical finding in a simple stochastic model of network formation, we can derive a general asymptotic theory of the network dynamic and derive the general scaling laws for the behavior in time of the node's degree and degree distribution.
The model comes with some shortcomings.Indeed, it does not capture the modular structure or, more in general, correlations beyond the nearest neighborhood that are typical of many social networks [35].In fact, individuals tend to organize their social cir-cles in tight, often hierarchical, communities.The model does not capture the burstiness typical of social acts [36,37].We consider a simplified Poissonian scheme of nodes activation.A recent extension of the activity driven framework, without the reinforcement mechanism acting on social ties, has been proposed to account for non Poissonian node dynamics [38].This is the natural starting point to generalize our model to bursty activities.Furthermore, the model does not consider the turnover of social ties [34].Indeed, in our framework once a social connection has been established it cannot be eliminated in favor of others.Clearly, this feature is of particular importance when considering social systems evolving on longer time scales, as the scientific journals we studied here, and might influence the measurement of the parameters describing evolution of the ego-networks.
Notwithstanding these limitations, the modeling framework we propose pave the way to a deeper understanding of the emergence and evolution of social ties.The agreement between the analytical predictions and observed behaviors in seven real datasets, describing different types of social interactions, are encouraging steps in this direction.Finally, our results are a starting point for the development of predictive tools able to forecast the growth and evolution of social systems based not just of regression models or simplified toy models but on a more rigorous analysis of ego-network dynamics.

Datasets
We analyzed seven large-scale and time resolved networks describing three different types of social interactions.
• Five networks from the APS datasets takes into account the co-authorship networks found in the Journals of the American Physical Society.Specifically, the PRA dataset covers the period from Jan.

Asymptotic solution of the ME for distributed β i values
The solution of Eq. ( 4) found in Eq. ( 5) holds if the system feature a single value of β.As already discussed in the MPN dataset we find multiple values of β ranging from a minimum value, βmin to a maximum one β max .To find a prediction of the long time behavior of such a system, let us propose a simplified model in which we focus on a single agent whose parameters are a i , β i and c i .In this simplified version the agent can only call other nodes in the network, i.e. we neglect the contribution coming from the incoming calls).In this approximation we have to solve a modified version of Eq. ( 3), obtained by discarding all the terms containing the activity a j of the nodes j = i.By repeating the same procedure above, we get to the continuum limit that reads: whose solution is similar to Eq. ( 5), the only differences being the value of β = β i and the behavior of the B(a i , c i ) constant (see Materials and Methods and the SI for details).Interestingly, even in this case we find an average degree k(a, t) growing accordingly to the exponent β i , i.e. k(a, t) ∝ (at) 1 1+β i .Now, let us create a reservoir of N distinct nodes of equal activity a and assign to each of them a different value of βi drawn from an arbitrary distribution P (βi).Let us also group these nodes in B classes, defined so that each class i contains all the nodes featuring a similar value of β ∼ βi.If we now let these N nodes evolve following the simplified model above, the average degree of each class i will grow as k i (a, t) ∝ t 1 1+β i .Then, in the long time limit, the minimum value of β i , i.e. β min , will lead the growth of the ensemble's average degree (see SI for further details), i.e.

F (a) and ρ(k) distributions from real data
We implement the method found in [39] to determine the most likely functional form of both the activity and degree distributions.The fitting procedure is as follows: for each functional form of the distribution considered (power law, log-normal, truncated power law and stretched exponential) we first determine the x min value, i.e. the lower bound to the functional form behavior.The x min value is defined as the value that minimizes the Kolmgorov-Smirnov (KS) distance between the analytical complementary cumulative distribution (CDF) and the CDF of the data.The latter are found for each value of x min by computing the optimal parameters of the distribution using the maximum-likelihood estimator (MLE).Then, comparing the CDF(x ≥ x min ) of the data S(x) with the analytical one S(x), we compute the KS-distance as the maximum distance between the two CDF, i.e.KS d (x min ) = max xi≥xmin |S(x i ) − P (x i )|.Once all the distances are computed we determine x min as the values at which the minimum distance is recorded, i.e. x min = min x KS d (x) (see SI and [39] for details).Once we compute all the parameters for all the functional forms analyzed we compare them with the likelihood ratio test R combined with the p-value that gives the statistical significance of R (see SI for details).The result of this procedure gives us the best candidate for the F (a) for each dataset.We find that a truncated power law is the best candidate for all the APS datasets together with the MPN one.The only exception is the TMN that displays a Log-Normal distribution of activity (see Fig. 1 and SI for details).
After we estimate the functional form and the parameters of the activity distribution F (a), Eq. (S22) gives us the possibility to predict both the functional form of the degree distribution ρ(k) and the values of the parameters of such a distribution (e.g. the α exponent in a power-law with cutoff, see Table 3 for details).The degree distribution can then be fitted by optimizing over the non-scale-free parameters for whose values we do not have an analytical or numerical prediction (e.g. the cut-off τ in a power-law with cutoff).Indeed, we are missing the value of the constant in front of the (at) term in the growth of the average degree k(a, t) in Eq. ( 6).  3) is shown for comparison (light blue solid lines).As in Fig. 1 we show the data ranging from the lower bound of the degree distribution to the 99.9% of the data range, thus excluding from the plot area the higher 0.1% of the degree values.

PDF
F (a) ρ(k) Log-Normal Fig. 3: The functional form of the activity PDF F (a) and the predicted functional form of the ρ(k) degree distribution as found in Eq. (S22), i.e. by replacing a → k 1+β .This substitution fixes the scale free parameters of the resulting distribution, i.e. the exponent of the power-law and of the k term at the exponent in the first three cases, and the STD σ k = σa 1+β in the Log-Normal case.The free parameters over which we fit the degree distribution are: (i) the cut-off τ in the stretched exponential and power-law with cut-off and (ii) the γ average value in the Log-Normal case.The selected PDF are, from top to bottom: power law, stretched exponential (Stret.Exp.), power law with cutoff (Trunc.PL) and the Log-Normal distribution.
The APS dataset contains the five co-authorship networks of five journals of the American P hysical S ociety, i.e., Physical Review A, B, D, E and Letters (L).
The various datasets contains the data referring to all the issues of the single journals from their first issue up to a certain edition, specifically: Each dataset is composed by several files (one per month).Each file has as many lines as the number of papers published in that month.Finally, each line contains the IDs of the authors of the specific paper.For instance, the typical head of a file is: The data are cleaned so as to not take into account the papers with a single author.
When analyzing this dataset we define the user's activity a i as the number of engaged collaborations (e.g. an author i that publish two papers, the first with 3 co-authors and the second with a single co-author, has activity a i = 4).

S1.2 Twitter Mention Network
The dataset of Twitter is composed by 273 daily files covering the period between January the 1 st to September the 30 th 2008.The dataset contains the so called fire-hose, i.e., all the 16, 329, 466 citations done by all the 536, 210 users in the given period.The nodes in the network are connected via 2, 620, 764 edges.
Each file contains the daily events with the structure: This dataset is not cleaned, as we have all the events that happened on the platform in the selected period.
When analyzing this dataset we define the user's activity a i as the number of citation made by i, i.e. the number of events actually engaged by the node i.

S1.3 Mobile Phone Network
The dataset of the M obile P hone C alls (MPC ) is composed by a single file containing the 1, 949, 624, 446 time ordered events with 1 second resolution covering the period between January and July of 2008 for 6, 779, 063 users of a single operator with 20% market share in an undisclosed European country.
The dataset contains all the events from and toward users of the company (so that even the calls from non-company users to company users and vice-versa are taken into account).As a result, we have 33, 160, 589 nodes (of which 6, 779, 063 are users of the selected company) that are connected via 92, 784, 825 edges.
We split the huge list of events in 98 files (each of them containing more or less the same number of events) for computing convenience.Each file contains events with the structure: When analyzing this dataset we define the user's activity a i as the number of calls done by the node, i.e. the number of calls actually engaged by the node i.

S2 Data analysis S2.1 Activity distribution and the nodes binning
For the datasets presented in Section S1 we first evaluate, for each node i, the total number u i of events engaged by the node itself.For instance u i is the number of calls made by the node i in the MPC dataset or the number of citations done by i in the Twitter dataset.
We then define the node activity a i as the ratio between the i-th node's number of events and the total number of events observed in the dataset, i.e. a i = u i /u tot where u tot = j u j .Thus, a i falls in the range a i ∈ [ , 1.0) with = min i (u i )/u tot .We then introduce and compute the activity distribution F (a).In Fig. S1 we show the resulting activity distribution for each analyzed dataset, while in Table (1) we show the best candidate functional form for the F (a) distribution of each dataset.The latter is estimated using the methods found in [39].
In particular, we compare the goodness of fit on the F (a) distribution of the functional forms found in Table [1] of the main paper, i.e. power-law, truncated power-law, stretched exponential and log-normal distribution.The procedure for each dataset and each functional form reads as follows: -we fit the F (a) taking into account all the nodes featuring a i ≥ x min , where x min is the lower bound of the distribution.The fit is performed using the maximum likelihood estimators (MLE) that return the optimal values of the parameters; -once the optimal parameters are found we compute the Kolmogorov-Smirnov distance (KD d (x min )) between the analytical and experimental complementary cumulative distribution function (CDF); -we then apply this procedure for different x min and set the a min = min xmin KS d (x min ) lower bound value as the one that minimizes the KS d .TMN and (G) MPN (blue points).We also show the best candidate fit of the F (a) distribution (blue solid lines) featuring the functional form and parameters found in Table (1).In all the plots we show the data and fit ranging from the lower bound xmin to the 99.9% of the measured data, thus excluding from the visible area the top 0.1% of the activity values (see Table (1) for the lower bound details).Tab.1: The candidate functional form of the activity distribution for each analyzed dataset, the evaluated parameters (see Table [1] in the main for the analytical expressions), the Kolmgorov-Smirnov distance KS d , the percent % of nodes in the dataset that have activity ai ≥ amin and the normalized log-likelihood L .In the parameters we include amin that is the value of the activity that minimizes the KS distance.This is the lower bound for the functional form behavior, i.e. the point at which data behave as the functional form.

Dataset Distribution
We then repeat this procedure for all the functional forms of the F (a) and we then compare them with the likelihood ratio test R combined with the p-value that gives the statistical significance of R [39,40].The result of this procedure gives us the best candidate for the F (a) for each dataset as shown in Table (1).We find that a truncated power law is the best candidate for all the APS datasets together with the MPN one.
On the other hand, in the TMN we find a log-normal distribution as the best candidate for the dataset.
Our datasets provide evidence that nodes within the same activity class (i.e.node with similar values of activity a i ) can feature very different memory behavior.In particular agents with large activity may connect to very few different nodes (strong reinforcement) or establish new links at almost every step (weak reinforcement).For this reasons each node i of the network is naturally classified according to her activity a i and her final degree k i , i.e. the total number of different agents that have been connected to i in the considered time window.
We then define a binning procedure that let us group together the similar nodes, i.e. nodes with similar activity and final degree.We divide the nodes in N act activity classes so that within each activity class the most active node performs at most 1.5 times the events of the least active node.Then, with the same procedure, we further group the nodes within each activity class a according to their final degree, thus defining N deg (a) final degree classes.The nodes are therefore divided in N b = Nact a=1 N deg (a) activity-degree classes.From now on, unless differently stated, whenever we mention the nodes' class or bin b we will be referring to one of these N b classes.

S2.2 The reinforcement process
To measure the reinforcement process of each system, we count all the communication events e b (k) engaged by every node i of the b-th class when it has degree k i = k.In other words, e b (k) is the total number of events engaged by the nodes of the b-th class at degree k.
Each time an event engaged by a node i of the b-th class results in a degree increase k i = k → k i = k + 1, we increment the counter n b (k) by 1.In other words, n b (k) is the total number of events that the nodes belonging to the b-th and featuring degree k perform toward a new node.Of course, if a node i of the b-th class with degree k i = k increases its degree to k i = k + 1 because it gets called by a new node, the n b (k) counter is not incremented.
The best estimate of the probability for a new node to get establish a new connection at degree k then reads: where n b (k) and e b (k) are the event counters as defined above.We can give an estimate of the uncertainty on f b (k), by assuming that at a given degree k the events are independent (i.e.there are no correlations between users) and by checking that 1 We then fit f b (k) with the proposed reinforcement function p b (k, β): where c(b) is the social propensity of the b-th bin, k is the cumulative degree and β is the reinforcement strength, that will be kept fixed for all the nodes in the system.In particular, for each class b and with a fixed β, we optimize the parameter c(b), by minimizing the function χ 2 b (β): where the index k runs over the K b points of the b-th bin's curve and σ b (k) is as defined in Eq. (S2).By repeating this procedure for each value of β ∈ [0, 5.0] we find, for each class b, a χ 2 b (β) curve.In Fig. S2 we show the behavior of χ 2 b (β).For each class b we find a minimum of χ 2 b (β) at a certain β opt (b) (see the horizontal lines in the heat-map-like panels of Fig. S2).
Moreover, Fig. S2 shows that there are two different behaviors.Specifically, in the TMN case (see Fig. S2  (a)), one value of β opt = 0.47 fits most of the curves, exception made for some outsiders: the value of β opt (b) that maximizes the 1/χ 2 b (β) is practically the same for all the bins.On the contrary, in the MPC case the In the multi-β case instead, we compute the different values of the exponent β opt (b) found in the system by grouping the memory classes b accordingly to their final degree as shown in Fig. S2.The optimal value of β opt (b) is found to be minimum for the bins featuring a large final degree, i.e. β min ≡ β opt ∼ 1.2, which, as we will show in Section S3.2.3, is the exponent driving the evolution of the network.
To corroborate the results just outlined, we show in Fig. S4 the box plot of the β opt (b) distribution for different groups of nodes classes b grouped by their final degree.We note that the APS and TWT datasets are well approximated by a single β opt as the distribution of β opt (b) within each sub-group of nodes is compatible with the global optimal value β opt .On the other hand, in the MPN case we see that the large final-degree classes have their β opt (b) distribution centered around a smaller value of β min ∼ 1.2.As already anticipated, this value will lead the asymptotic growth of the system as we will show in Section S3.2.3.
As a last remark we present in Fig. S5 (a, b) the measured distribution of the constant c(b) for the MPN and TMN datasets.We show the distribution for all the nodes in the network and for each activity class a, i.e. the group of nodes featuring similar activity.The values of this constants are distributed but peaked around an average value.Moreover, the distribution of the c(b) parameter within each activity closely follows the global one.The distribution of the social attitude c(b) then appears to be a global, activity independent feature of the nodes in the system.Finally, in Fig. S5 (c  S3 The model

S3.1 Activity driven networks with no memory
The activity driven networks are an effective framework to describe time varying networks.The simplest memory-less model is defined as follows: the network consists of N nodes featuring an activity potential a i , i.e. the probability for a node i to get active in a certain time interval dt reads a i dt.The evolution rules are: (i) at each time step we start with N disconnected nodes; (ii) each node i whether gets active with probability a i dt or does not activate with probability (1 − a i )dt.If a node gets active it calls a randomly selected node j in the network, thus creating an edge e ij .(iii) At the end of the time step all the created connection are deleted and we start again from the initial step (i).
These evolution rules define the Master Equation (ME) for P i (k, t), i.e. the probability that a node i of activity a i has degree k at time t, where the degree k is the number of nodes that contacted i up to time t.We also set, without losing generality, dt = 1.The discrete time equation for P i (k, t) then reads: The equation is obtained in the approximation where a i 1, so that between two consecutive times t i = t and t i+1 = t + 1 only one site can be active.We will assume that the activity a i of a node i is small, i.e. 0 < a i 1, and we wil also consider the approximation 1 k N i.e. the integrated number of neighbors of a site is much larger than 1 but much smaller than the total number of agents N .The first term of the sum represents the probability that the site i is active and a new link is added to the system.The second term is the probability that the site i is active but this site connects to a site that has been already linked.In the third and fourth terms, the symbol j i denotes the sum over the sites that are not yet connected to i.In particular, the third term represents the probability that one of these sites is active and that it connects to i.The fourth term is the probability that one of these sites is active but no link between j and i is established.The fifth term is the probability that one of the sites already connected to i is active (being j∼i the sum over the nodes already connected to i); in this case no new link is added to i. Finally, the last term represents the probability that at time t all the sites are not active.For k N , the second term can be neglected.After some algebra we obtain the equation: given that h P j (h, t) = 1.For k N , we assume that 1 N j i a j = a i.e. the average value of the activity.In the limit of large time and large k we can write a continuous equation in t and k obtaining: The solution of Eq. (S9) is straightforward: ). (S10) In the large time limit this solution reduces to a delta function: P (a, k, t) = δ(k − (a + a )t) Therefore, the average degree k(a, t) of the nodes of activity a grows as: as already found in [23,41].Moreover the asymptotic degree distribution ρ(k) of a network with activity distribution F (a) ∝ a −ν is:

S3.2 Plugging in the reinforcement process
The model presented in Sec.S3.1 is a basic model as it contains no correlations on an agent's story at all.In particular, the probability for a node i to re-call an already contacted node is independent of the node degree.While simple to describe and solve analytically, this model is not realistic, as there are no correlations in the each agent's history.Moreover, the probability to call an already contacted node is always small as k/N 1 (and thus the probability to call a new node remains ∼ 1 even at large degree k).However, as shown in Sec.S2.2, real-world systems features a strong reinforcement process, as the probability p i (k) to call a new node at degree k decreases as the degree k increases.
For this reason we introduce an extended version of the model described in [16] et al. which includes a reinforcement function p i (k) that measures the probability for an active node i, that has already contacted k different nodes in the network, to call a new node instead of an already contacted one.

S3.2.1 The single β case
As already shown in Sec.S2.2 the functional form for the reinforcement process p i (k), i.e. the probability of adding a new link for the node i of degree k, reads: By plugging Eq. (S13) into Eq.(S8) for node i, we get: where N is the number of nodes in the network, i j is the sum over the nodes not yet connected to i and j is the sum over all the N nodes of the network.Each term of Eq. (S14) corresponds to a particular event that may take place in the system, as already presented in the paper.For instance, the first term of the l.h.s. of Eq. (S14) takes into account the increment of the node i's degree from k − 1 to k.This may happen whether because node i gets active and contacts a new node in the system with probability a i p i (k − 1) or because a node j never contacted before gets active and calls exactly node i with probability a j p j (h)/(N − h), being h the degree of j.In the same way, the second line takes into account that node i does not change degree k whether because it calls an already contacted node or because the non contacted nodes call other nodes in the network.The last line of Eq. (S14) considers the possibility that no node in the network gets active.
If we now substitute Eq. (S13) in Eq. (S14), after some algebra we get: Then, by applying the same approximations of large degree k and time t we obtain the continuous equation: where ρ(c j |a j ) is the probability for a node j of activity a j to have reinforcement constant c j .
The long time asymptotic solution of Eq. (S16) is of the form: Moreover, C(a, c) is a constant depending on the activity a and the reinforcement constant c that follows the: C(a, c) We do not have an exact solution for C(a, c), however C(a, c) (ac β ) 1/(1+β) for large a.
Let us note that Eq. (S17) can be obtained setting the variable x = k − C(a)t 1 1+β and substituting it in Eq. (S16) and imposing that |x| t 1 1+β from Eq. (S16): The solution of the latter equation is of the form thus confirming that x can be considered much smaller than t 1 1+β .
An important consequence of equations (S17) and (S18) is that, for a system featuring a reinforcement strength β, the average degree of the nodes belonging to a class b of activity a and constant c grows as: In particular, k(a, c, t) ∝ (at) for large values of the activity a.As expected, the average degree grows slower than in the memoryless case (β = 0) where the average degree grows linearly in time, as found in Eq. (S11).Moreover, the presence of a reinforcement process also affects the asymptotic behavior of ρ(k).Indeed, as already shown in the main paper, Eq. (S21) gives us the relation between the degree k and the activity a at a given time t, as k ∝ a 1 1+β .Thus, given an activity distribution F (a), we can infer the functional form of the degree distribution ρ(k) by substituting a → k 1 1+β , finding: Specifically, by supposing a power-law activity distribution F (a) ∝ a −ν and considering that the degree distribution for a class b is described by Eq. (S17), we obtain where we integrated over time t and reinforcement constant c(b) and we considered the asymptotic regime of large time and activity.
We start with no edge in the system and we draw for each node the activity a i from the distribution F (a).At each step a randomly chosen node gets active with probability a i .An active node then connects with probability p i (k) with a randomly chosen node which have not been yet connected to i or, with probability 1 − p i (k), the node calls an already contacted node and no new connection is added to the system.An evolution step corresponds to N of these elementary steps, i.e. for each evolution step we give, on average, the possibility to make a call to every node in the network.
The results are in excellent agreement with the analytical predictions.First, in Fig. S6 we show that the analysis presented in Section S2.1 correctly recovers the reinforcement exponent β opt .Indeed the minimum of the χ 2 b (β) are vertically aligned with the value of β fixed in the simulations.Then, in Fig. S7 we present the asymptotic growth of the average degree for an activity class (i.e. a collection of nodes bins b featuring similar activity values) and we compare it with the analytical prediction .In Fig. S8 we show that the shape and the evolution of the P i (k, t) distribution follows the predicted form of Eq. (S17).

S3.2.3 The multi-β case
As shown in Section S2 a single value of the reinforcement exponent β is found to fit most of the p b (k) curves in both the APS and TMN datasets, while for M P N a single value of β cannot fit all the p b (k) curves for each activity-degree class b at once.For this reason we further develop the model, letting each node i to feature three parameters: the activity a i and the reinforcement constant c i together with the exponent β i of the underlying reinforcement process.
Since the model with three parameters per node is difficult to handle, we apply some approximations in order to get analytical insight.In particular, we work in simplified single-agent framework, where we focus on a single agent that can only connect to other nodes and never get called.Within this approximation the master equation for the node i reads: (S24) The continuum limit for large degree k and time t of Eq. (S24) is:  The solution for P i (k, t) is: where the C i now reads: Again, the average degree k i (t) grows as: The result found in Eq. (S28) holds for a single class of nodes with a given set of activity a i and reinforcement constant c i and strength β i .
The average degree k(a, t) for the activity class a can be computed by integrating over the different values of β i and c i : k(a, t) = dc dβ ρ(β , c |a)C(a, c , β )(t) where ρ(β, c|a) is the probability for a node of activity a to have a reinforcement exponent and constant equal to β and c.By assuming that the distribution of the exponent β is independent from a and c we can factor out the time-depend term obtaining for the activity class a: k(a, t) ∝ dβ ρ(β )t     We compare the data for k(a, t) with the expected behavior (dashed lines) (at) 1/(1+β opt ) : in panels (a-e) βopt has been evaluated according Eq. (S6), while in the (f) case we use βopt = βmin = 1.2.We also plot the power-law fit k(a, t) ∝ (at) 1/(1+β * ) (solid lines) for comparison.

Fig. 1 :Fig. 2 :
Fig. 1: (A − D) The activity distribution F (a) for PRB (A), PRL (B4, TMN (C) and the MPN (D) dataset.The solid lines represent the fit F (a) with the best functional form for each dataset.The latter are a truncated power law for the PRB, PRL and MPN case, while we find a lognormal for the TWT case (see SI and supplementary materials for details).In these plots we show the experimental data ranging from the lower bound of the fit to the 99.9% of the total amount, thus excluding the higer 0.1% of the measured activity values from the visible area (see materials and SI for details).(E − H) The measured p b (k) curves for selected nodes classes belonging to the PRB (E), PRL (F ), TMN (G) and MPC (H) datasets.Each data sequence (different colors and markers) corresponds to a selected nodes class of the system.As one can see different nodes classes feature a differently behaving attachment rate function p b (k): for some nodes the probability to attach to a new node quickly drops to 0 at degree 10 while for some others the attachment probability is still 0.1 even at very large degree (k ∼ 10 2 ).(I − K) We rescale the attachment rate curves of all the nodes classes of the PRB (I), PRL (J) and TMN (K) datasets by sending k → x b = k/c b and then plotting the p b (x b ) 1/β , where β has the same value for every curve.For the MPC dataset (L) we show the original p b (k) curves belonging to a single nodes class with their fit.The resulting values of β b are shown in the legend.The latter are found to fall in the 1.0 βi 2.5 range.

-
PRA from January 1970 to December 2006; -PRB and PRD from January 1970 to December 2007; -PRE from January 1993 to December 2006; -PRL from February 1960 to December 2006.

Fig. S1 :
Fig. S1: The experimental activity distribution F (a) for (A) PRA, (B) PRB, (C) PRD, (D) PRE, (E) PRL, (F)TMN and (G) MPN (blue points).We also show the best candidate fit of the F (a) distribution (blue solid lines) featuring the functional form and parameters found in Table(1).In all the plots we show the data and fit ranging from the lower bound xmin to the 99.9% of the measured data, thus excluding from the visible area the top 0.1% of the activity values (see Table(1) for the lower bound details).

Fig. S2 :
Fig. S2: The heat-map-like value of − ln [χ 2 b (β)] (bottom plots).We plot the exponent β on the x-axes and the different bins b sorted by their final degree on the y-axes.The color-map is proportional to − ln [χ 2b (β)] representing the goodness of fit: the darker, the higher.The cyan vertical line is the value of βopt defined in Eq. (S6), while the other vertical lines represent the same quantity evaluated in the three black boxes corresponding to different final degree intervals.(Top plots) The curve χ 2 (β) as defined in Eq. (S5) (upfilled curve) and the same quantity for the three final degree intervals.For Twitter (a): a single value of βopt = 0.47 fits most of the curves and only some bins b deviate from the average behavior.(b) MPC: in this case we observe different behaviors depending on the final degree.Thus, a single βopt = 2.14 does not fit all the curves.We also show a "guide-to-the-eye" to highlight this feature (yellow dashed line).

Fig
Fig. S4: The box plot representing the distribution for different range of nodes classes b of the βopt(b) for (a) PRB, (b) TMN and (c) MPN.We also show the global optimal value βopt (horizontal red line) as found in Eq. (S6).The height of the box corresponds to the lower and upper quartile values of the distribution and the horizontal solid line corresponds to the distributions median, while the dashed lines indicates the average value for each range of final degree.The whiskers extend from the box to values that are within 1.5x the quartile range.As one can see, in both the PRB and TMN datasets the optimal values βopt is compatible with the distribution found in all the nodes class ranges (we find the same result for all the other APS datasets analyzed).On the other hand, in the MPN the distribution of βopt(b) lowers as the final degree of the class increases.The last group of nodes classes is no more compatible with the overall optimal βopt, being the distribution centered around βopt ∼ 1.1, in agreement with our estimation of βmin = 1.2.
) we show how the average value of the c(b) constants, c = c(b) b , differs from one dataset to the other varying from c = 0.8 in PRB to c = 1.7 in TMN and c = 4.6 in the MPN case, respectively.

Fig
Fig. S5: The P (c) distribution of the constant c(b) for the (a) TMN, and (b) MPN case (solid green line).We also compare the global P (c) distribution with the distribution of the c(b) values found within each activity class (solid lines and points): we find that the distribution of the c(b) parameter is more or less activity independent as most of the distribution of the single activity classes follows the same functional form of the total distribution P (c).We then report the average value c of the c(b) constant for each dataset (vertical cyan line).The latter reads 1.71 for TMN and 4.62 for MPN.Note that in the single β case we evaluate c(b) as the values of c(b) that best fits the b-th p b (k, β) curve fixing β at its optimal value (β = βopt).On the other hand, in the multi-β case we evaluate c(b) as the ones that best fits the p b (k, β(b)) curve, where the exponent is now foxed to the β(b) value for the memory class b, i.e. to the optimal value for the class b.(c) The box plot showing the global distribution of the constant c(b) in all the datasets analyzed.The height of the box corresponds to the lower and upper quartile values of the distribution and the horizontal solid line corresponds to the distributions median, while the dashed lines indicates the average value for each range of final degree.The whiskers extend from the box to values that are within 1.5x the quartile range.(d) In this table we report all the values of the reinforcement exponent βopt and the average reinforcement constant c(b) .For the MPN case we report the βopt = βmin and the constant values are evaluated for each nodes class b using its optimal value of β, βopt(b).

Fig. S6 :
Fig. S6: The heat-map of − ln(χ 2 b (β)) obtained from the simulation in the same way as in Figure S2 from real data.As one can see the recovered βopt is in excellent agreement with the value used in the simulation: 1.0 in the (a) panel and 2.06 in the in the (b) panel.

)Fig. S7 :
Fig. S7: The average degree k(at) for different activity classes in the β = 0.5 (a), β = 1.0 (b), β = 1.5 (c) and β = 2.0 (d) case.The time is rescaled with activity t → at, so that all the curves collapse on a single behavior.We also fit k(at) ∝ (t/A) 1 1+β * (cyan solid line) and compare the simulation with the analytical result k(at) = A • t 1 1+β (blue dashed line).

Fig
Fig. S8: The probability distribution Pa(k, t) for a selected activity class a in the simulations with exponent β = 1.0(a) and β = 2.0 (b).We compare different evolution times (see legend) by rescaling the degree k → k = (k − k(a, t))/ k(a, t) 1/2 on the x-axis and the distribution itself Pa(k, t) → k(a, t) 1/2 P (a, k, t) on the y-axis, where k(a, t) is the average degree at time t for the nodes belonging to the activity class a.We also show the fit of the large time P (a, k, t) with a Gaussian curve (black dashed line) as predicted in Eq. (S17).
ρ(β) is the probability distribution of the β parameter.

Fig. S13 :
Fig. S13: The degree distribution ρ(k) for: (a) PRA, (b) PRB, (c) PRD, (d) PRE, (e) PRL and (f) TMN (blue circles).We compare the results with the predicted behavior of Table(1) of main paper given the parameters of Table (1) (red solid lines).We use the single value of βopt defined by Eq. (S6) in all the cases.As in Fig.S1we show the data and fit ranging from the lower bound to the 99.9% of the measured data, thus excluding from the visible area the top 0.1% of the degrees values.
1970 to Dec. 2006 and contains 36,880 papers written by 34,093 authors and connected by 100,683 edges.The PRB dataset refers to the Jan. 1970 to Dec. 2007 period and con-tains 104,047 papers published by 84,367 authors which are connected by 416,048 links.The PRD datasets covers the same period as the PRB one and it is composed by 33,376 papers, 21,202 authors and 60,033 edges.The PRE dataset refers to the Jan. 1993 to Dec. 2006 period with 24,204 papers published by 28,188 authors connected by 68,029 edges.Finally, the PRL dataset contains all the 66,422 papers published between Jan. 1960 to Dec. 2006 and written by 78,763 authors forming 299,017 edges.
• One network dataset describing Twitter mentions (TMN), exchanged by users from January to September 2008.The network has 536,210 nodes performing about 160M events and connected by 2.6M edges.• One Network dataset describing the mobile phone calls network (MPN) of 6,779,063 users of a single operator with about 20% market share in an undisclosed European country from January to July 2008.The datasets contains all the phone calls to and from company users thus including the calls towards or from 33,160,589 users in the country connected by 92,784,825 edges.