Reconstructing irreducible links in temporal networks: which tool to choose depends on the network size

Filtering information in complex networks entails the process of removing interactions explained by a proper null hypothesis and retaining the remaining interactions, which form the backbone network. The reconstructed backbone network depends upon the accuracy and reliability of the available tools, which, in turn, are affected by the specific features of the available dataset. Here, we examine the performance of three approaches for the discovery of backbone networks, in the presence of heterogeneous, time-varying node properties. In addition to the recently proposed evolving activity driven model, we extend two existing approaches (the disparity filter and the temporal fitness model) to tackle time-varying phenomena. Our analysis focuses on the influence of the network size, which was previously shown to be a determining factor for the performance of the evolving activity driven model. Through mathematical and numerical analysis, we propose general guidelines for the use of these three approaches based on the available dataset. For small networks, the evolving temporal fitness model offers a more reasonable trade-off between the number of links assigned to the backbone network and the accuracy of their inference. The main limitation of this methodology lies in its computational cost, which becomes excessively high for large networks. In this case, the evolving activity driven model could be a valid substitute to the evolving temporal fitness model. If one seeks to minimize the number of links inaccurately included in the backbone network at the risk of dismissing many links that could belong to it, then the temporal disparity filter would be the approach-of-choice. Overall, our contribution expands the toolbox of network discovery in the technical literature and should help users in choosing the right network discovery instrument, depending on the problem considered.

does not identify any true positive, thus failing to detect the known backbone network. The EADM has adequate performance in finding the known backbone network only when the network is large, while the ETFM can tackle the analysis of systems of any size. The main issue of the ETFM is that it detects several false positives when the known backbone network is present almost at any time step. In other words, the TDF detects the least amount of links (counted as the sum of false and true positives), the ETFM an intermediate value, and the EADM the largest amount. Further, when the network size is large, the ETFM and EADM detect the same backbone network. This is because, for large networks, the activity estimation of the ETFM is equivalent to the one of the EADM.
Finally, we detect the backbone network of two real systems freely available as part of the SocioPatterns project [30]. We study the Primary School and Workplace datasets, where nodes represent students (or teachers) and workers, respectively, interacting over successive days. We detect the backbone network in the entire school and workplace, but also in each class and department. The reconstructed backbone networks are coherent with expectation from the analysis of artificial networks: the TDF finds the least amount of links, the ETFM an intermediate value, and the EADM the largest number. Interestingly, we also observe that the EADM and ETFM detect the same backbone network when the real systems are sufficiently large.
Overall, we identify a strong finite-size effect in backbone detection, which suggests that care should be placed when selecting an approach. Further, the TDF minimizes the number of false positives, but often it does not detect any true positive, challenging the proper reconstruction of the backbone network. The ETFM is especially useful for small systems, offering a reasonable trade-off between the number of true and false positives detected. The EADM detects the same backbone network as the ETFM when the system is large, and should be preferred for the analysis of large systems due to its reduced computational burden.
The rest of the paper is organized as follows. In table 1, we summarize the acronyms used to indicate the new methodologies, the nomenclature of the article, and the parameters employed in the generation of the artificial networks. In section 2, we summarize the EADM and introduce the new approaches, namely, the ETFM (an extension of the temporal fitness model [5]) and the TDF (an extension of the disparity filter [6]). In section 3, we discuss our analytical and numerical findings for both artificial and real systems. Finally, in section 4, we draw our main conclusions and propose potential lines of further inquiry.

Significant links
In this section, we summarize the EADM to detect significant links [19] and introduce the ETFM, our extension of the temporal fitness model [5], and the TDF, our extension of the well-established disparity filter [6].
For all the approaches under consideration, we consider a temporal undirected network of N nodes that evolves in time over an observation window composed of T 1 time steps, labeled by the discrete time index t = 1, ..., T . At each time t, nodes interact among themselves and form a time-varying network of interactions, whose interconnection links are encoded in a binary adjacency matrix A(t) that varies in time. The generic entry of A(t) is defined as A ij (t), where i, j = 1, . . . , N are the indices of the nodes. Through the paper, we adopt the following notation: given a quantity '·', we define · as its statistical average and E [·] as its expected value across multiple, independent realizations.
The EADM, ETFM, and TDF are similar in some aspects and different in others; a summary is provided in table 2. The approaches follow three sequential steps: (i) determining the interval partition, (ii) estimating model parameters, which are evolving over successive intervals, and (iii) designing a statistical filter, which removes all links explained by the null hypothesis and retains the backbone network. Details on the implementation of each of these steps and key differences among the methodologies are detailed in what follows.

Interval partition
In our previous work [19], we demonstrated that dividing the overall observation window into successive intervals improves the accuracy of the reconstruction. Therein, we compared a naïve supervised partitioning method, which divides the time series at random, with an unsupervised one (the Bayesian Block (BB) method [31]). We found that the latter should be preferred for its performance over artificial network benchmarks, its freedom from the need of prior information about the interval partition, and its implementability on relatively short time series.
The BB method [31] employs maximum likelihood and marginal posterior functions to distinguish statistically significant features from random observational errors. It starts with considering the first time step t = 1, and then it adds and analyzes the subsequent time instants t = 2, . . . , T, approximating the time series in piecewise constant segments, where a homogeneous number of links is observed. The algorithm coherently builds on iterations from past time steps to efficiently compute future interval partition.
In our application, the BB method analyzes the total number of temporal links created in the entire network at time t and returns the interval partition, which divides the overall time window T into I disjoint intervals indexed by ∆ = 1, . . . , I, containing approximately a uniform total number of connections [31]. The superscript 'ts' indicates that these variables are estimated from the time series of the individual links. From the knowledge of the interval partition, we determine the length, τ (∆), of the generic ∆th interval under the closure relation  [31] Weighted configuration model [19,32] Poisson distribution [5,19] ETFM BB method [31] Maximum likelihood [5] Poisson distribution [5,19] TDF BB method [31] Normalized weights [6] Disparity filter [6] 2.1.2. Parameter estimation In order to detect the backbone network, we should determine the extent by which each pair of nodes i and j will interact by simple chance. The EADM is based on a heterogeneously distributed parameter, the activity, which represents the propensity of generating links over time [21]. The probability that node i interacts with node j at time t is the product of the two activities [5,19], Individual activities are estimated according to the weighted configuration model [19,32].
Since the I intervals found by the BB method are disjoint, we can analyze each of them separately. For instance, we consider the generic ∆th time interval that starts at t = t in (∆) and ends at t = t in (∆) + τ (∆) − 1. The activity of node i at time t ∈ [t in (∆), t in (∆) + τ (∆) − 1] is computed through the following formula: where s ts i (∆) and W ts (∆) 1 are the node strength and the total number of temporal links generated in the network in the ∆th interval, respectively. Specifically, s ts i (∆) and W ts (∆) are computed from A ts (t) through A ts ij (t) and Once the activities are estimated according to equation (3), the probability in equation (2) can be computed as

Statistical filter
The typical objective of a statistical filter is to test whether a specific link ij can be explained by the null hypothesis. If it were explained by the null model, it should be filtered out, otherwise it should be included in the backbone network. The same statistical test has to be applied to each of the N E links that appear at least once over the entire observation window.
The EADM requires to compare the total number of connections between node i and node j , w ts ij , observed in the entire observation window with its expected quantity, E w ij . The observed number of connections can be computed from A ts ij (t) as The expected number of connections between nodes i and j in the time window T is calculated as the sum of the binomial random variables given in equation (2) [19] The sum of non-identical binomial random variables in equation (7) is a Poisson binomial distribution, which is usually approximated by the Poisson distribution [33][34][35], due to a lack of closed-form expression. The confidence that the observed weight, w ts ij , could be explained by its corresponding expected weight, E w ij in equation (7), is computed according to the cumulative function of the Poisson distribution where P x; E w ij indicates the Poisson distribution with random variable x and expected value E w ij . Equation (8) represents the p-value, α ij , that the observed link ij can be explained by the null hypothesis of being extracted from a Poisson distribution. If the p-value is below a pre-defined threshold, then the link is deemed as significant and retained in the backbone network.
From equation (8), a total of N E p-values are computed, one for each link created at least once in the overall network evolution. The interval partition begets better estimation of the expected number of links between nodes i and j , in equation (7), thereby improving on the estimation of the p-values through equation (8).

Evolving temporal fitness model (ETFM)
As detailed in table 2, the interval partition is carried using the BB method [31], thereby following section 2.1.1. Similarly, the statistical filter is equivalent to the one of the EADM, explained in section 2.1.3. Here, we detail the main difference between the EADM and ETFM, which pertains to the activity estimation.
Similar to the EADM, the probability that node i interacts with node j at time t is the product of the two individual activities, as given in equation (2). However, the ETFM estimates the activities in a maximum likelihood sense, based on the approach introduced in [5]. Specifically, given that the intervals returned from the BB method are disjoint, we can apply the maximum likelihood presented in [5] to each of the I intervals. The input parameters of the maximum likelihood are computed according to the EADM in equation (3).
For instance, in the ∆th interval, starting at t = t in (∆) and ending at t = t in (∆) + τ (∆) − 1, individual activities are assumed to be constant and estimated according to equation (3). Some nodes in the network, however, do not generate nor receive connections. These nodes have an activity equal to zero and they should be discarded in the maximum likelihood computation. Hence, from the set of all possible activities at time t, a(t) = (a 1 (t), . . . , a N (t)), the maximum likelihood estimation is applied only to those entries with a i (t) > 0. Assuming that there are n(∆) N of these nodes, we indicate the reduced set of optimal activities as a * (t) = a * 1 (t), . . . , a * n(∆) (t) . The estimation of the optimal activity values is carried similar to [5], whereby All the individual activities are constant (between zero and one) within the ∆th interval, and w ts ij (∆) is given by equation (6). Once the optimal values for the activities are estimated from equation (9), they can be used to estimate the probability in equation (2).

Temporal disparity filter (TDF)
According to table 2 and similar to the EADM and ETFM, the interval partition is carried out through the BB method implemented on the time series of the individual links [31]. The parameter estimation and the statistical filter are different from the EADM and they are explained in the following.

Parameter estimation
In the TDF, parameters are estimated directly from the time series, without relying on any expected quantity like the activities for the EADM and ETFM. For instance, in the ∆th interval that starts at t = t in (∆) and ends at t = t in (∆) + τ (∆) − 1, a number of connections between node i and node j are observed, w ts ij (∆), in equation (6). Similar to the disparity filter [6], two normalized quantities are computed by dividing w ts ij (∆) either by the strength of node i, s ts i (∆), or node j , s ts j (∆), in equation (4), respectively, such that These quantities must satisfy the closure condition j r ts Note that while w ts ij (∆) = w ts ji (∆), in general r ts ij (t) = r ts ji (t) since the strengths of nodes i and j can be different. Over the entire time window T, the normalized weight, r ts ij , is computed using a weighted time average starting from the results of equation (10), yielding Here, the longer is the interval, the greater is its contribution to the overall average. If only one interval is returned by applying the BB method to the time series, equation (11) reduces to the estimation of the classical disparity filter [6].

Statistical filter
The TDF statistical filter takes two quantities as input: the normalized weights computed according to equation (11) and the observed degree of either node i, k ts i , or node j , k ts j , over the entire network evolution. The node degree is computed from equation (6) as where H is the Heaviside function that takes a value equal to one when w ts ij 1, and zero otherwise. The TDF null hypothesis is that a node connects uniformly at random with all its neighbours. For node i with degree k ts i , the TDF considers as a baseline that node i exchanges an equal number of connections with all its neighbors, which results in all normalized weights from equation (11) being equal to 1/k ts i . Irreducible links included in the backbone network have a normalized weight r ts i greater than the expectation from the null hypothesis.
The advantage of the TDF with respect to the original disparity filter [6] is that it accounts for time-variations in node properties, through equation (10). The disparity filter in its original incarnation can be only applied to weighted networks. A p-value has to be associated with any link in the network. We use the same formula proposed for the disparity filter [6], namely, where γ ij is the p-value. In general, equation (13) may yield a different result if it is specialized to node i or node j , even for undirected networks. Therefore, it is necessary to consider an additional p-value for the undirected link ij, by focusing on node j rather than i, such that we compute γ ji in lieu of γ ij . If at least one of the two p-values γ ij or γ ji is lower than a pre-defined threshold, the undirected link ij is included in the backbone network. If only one interval is returned by applying the BB method to the time series, the TDF becomes equivalent to the disparity filter [6]. Similar to equations (7) and (8), the temporal partition is used to estimate the network weights and a p-value is associated with each link of the network that appeared at least once during the overall network evolution. Graphical representations of the ETFM and TDF are given in figure 1.

Multiple tests comparison
In all of the three approaches under consideration, a statistical test is carried out for all the N E links observed at least once in the evolving network. The EADM and ETFM use equation (8), while the TDF employs equation (13). Since multiple hypotheses are tested, a multiple hypothesis test correction is required [36]. Here, we choose the Bonferroni correction [37]. It computes the multivariate threshold equal to β = β/N E , where β is the univariate threshold, usually set equal to 0.01. This correction assures that no false positives will be included in the backbone network with probability 1 − β. An alternative to the Bonferroni correction might be the false discovery rate (FDR) [38], which is a less restrictive option because it ensures that the fraction of false positive is less than β.

Results
Here, we discuss how finite-size effects influence the performance of the EADM, ETFM, and TDF, toward the ultimate objective of informing the selection of the most appropriate approach as a function of the size of the network.
Since the main goal of these approaches is to infer the backbone network in real datasets, we begin by briefly summarizing the features of the Primary School dataset under consideration, freely available as part of the Socio-Patterns project [30]. In appendix C, we analyze the Workplace dataset, also part of the SocioPatterns project [30], in a similar way.
Upon concluding the analysis of the Primary School dataset, we illustrate the features of the generative algorithm for the synthetic networks examined in this study. Our generative mechanism entails the creation of artificial networks with arbitrary distributions of both the activity of the network nodes and of the link weight in the network backbone. In the main text, we consider the case of a heterogeneous activity distribution and homogeneous backbone, similar to our previous work [19]. In appendix A, we contemplate three additional cases: (i) heterogeneous activity distribution and heterogeneous backbone, (ii) homogeneous activity distribution and homogeneous backbone, and (iii) homogeneous activity distribution and heterogeneous backbone.
Later, we study the computational complexity of the EADM, ETFM, and TDF, concluding that the ETFM is too demanding for application to large networks composed of several hundreds or more nodes, while the EADM and TDF are scalable approaches, suitable for large networks.
Next, we evaluate the error of the EADM and ETFM, by comparing expected with observed quantities. This analysis cannot be carried out for the TDF, because it does not compute any expected quantity and its filter takes as input only observed quantities (such as the node strength and degree). We determine that the ETFM optimizes the activities in a maximum likelihood sense and its results are always more accurate than those of the EADM. However, the error of the EADM decreases with the inverse of the size of the network so that the error is reasonably low for many applications on large systems.
In order to more precisely estimate how finite-size effects influence the EADM, ETFM, and TDF performance, we consider an artificial network, where the backbone network is a-priori known [19]. We find that the TDF brings to zero the false positives (links incorrectly identified as part of the backbone network). However, most of the links that truly belongs to the backbone network are not identified by the TDF. The ETFM offers a more flexible framework that allows for detecting most of the links belonging to the backbone network, without yielding an excess of false positives. The EADM has equivalent performance to the ETFM for networks with a hundred nodes or more, but performs poorly for smaller networks. Overall, we find that the TDF detects the least amount of true backbone links among the considered approaches. The ETFM finds an intermediate value and the EADM begets the largest number.
Finally, we illustrate our numerical arguments by studying the Primary School and Workplace datasets. We find that the TDF has the most restrictive null hypothesis, which allows for including in the backbone network only a few links. This approach should be used only if we must avoid false positives at all costs. The ETFM detects more links than the TDF and affords to study features of the backbone network that would have been lost otherwise, in agreement with prior studies [5]. When the network contains at least one hundred of nodes, the ETFM and EADM perform similarly, so that the EADM should be preferred from its reduced computational burden.
Backbone network In the ETFM individual activities are evaluated in a maximum likelihood sense (top), according to equation (9). In the TDF normalized weights are estimated from the time series (bottom), according to equation (10). (iii) Statistically verify whether links observed from the time series can be explained by the null hypothesis. In the ETFM the link is significant if its weight, w ts ij , is above the threshold w ts ij (β ) (top), according to equation (8). In the TDF the link is significant if the observed normalized weight, r ts ij , is above the threshold r ts ij (β ) (bottom), and according to equation (13). As a result, the backbone networks are extracted from the temporal network (right graphics).
Further, in appendix D, we compare the TDF with its basic version, the disparity filter, showing that the two approaches yield different predictions for the Primary School and Workplace datasets.

Data processing
Here, we describe in detail our approach to the analysis of the freely available Primary School dataset [30], a similar analysis is carried out for the Workplace dataset [30] in appendix C. Our approach is inspired by previous efforts [5,19]. In the original dataset, person to person interactions are recorded every 20 seconds; however, in reality, interactions might last longer than this pre-defined temporal resolution. In order to remove redundancies in the collected data and confounds associated with a particular resolution [5,19], we study two temporal resolutions of 5 and 15 min, and remove multiple connections at the same temporal resolution. We remove the time interval between the last connection of a day and the first one of the following day.
In order to explore the role of network size, we study different temporal networks from the Primary School dataset and detect a backbone network for each of them. Specifically, we focus on the temporal interactions occurring in each classroom (10 in total), the body of the teachers, and the entire school. A data summary is provided in table 3.

Generative mechanisms of artificial networks
We present a generative mechanism for temporal networks with known backbone network and arbitrary distribution of reducible and irreducible links. The mechanism extends the one introduced in [19], where temporal networks with heterogeneous activity distribution and homogeneous backbone are generated.
Artificial temporal networks are generated from the superposition of reducible and irreducible links. Reducible links are created according to equation (2) and evolve over the whole observation window composed by T time steps, partitioned into I intervals. Nodes have interval-dependent (piece-wise constant) activities a i (t), drawn from the activity distribution F(a). Between two consecutive intervals, , the activity values vary according to where p is an autocorrelation parameter and y a random number extracted from F(a). Among all the links observed at least once, a fraction of them is arbitrarily assigned to the backbone network. The preponderance of a link between nodes i and j is λ ij , such that if λ ij = 0, the link appears according to the product of activity values, as in equation (2), while increasing λ ij makes the backbone network easier to discover. The values of λ ij are extracted from the distribution G(λ). More details about the generative algorithm can be found in appendix A1.
In the main text, we consider the case of a heterogeneous activity distribution, F(a) ∼ a −2.1 and a ∈ [a min , 1], and homogeneous backbone network, G(λ) ∼ δ λ −λ , that is, λ ij =λ for all link in the backbone network, where δ (·) is the Dirac delta distribution. In appendix A, we examine three more cases: (i) heterogeneous activity distribution and heterogeneous backbone, (ii) homogeneous activity distribution and homogeneous backbone, and (iii) homogeneous activity distribution and heterogeneous backbone. Table 3. Data summary of the Primary School dataset. The temporal networks evolve over successive time steps of length equal to either 5 or 15 min and all multiple links within that temporal resolution are removed. The '# temporal links' column indicates the number of temporal links retained in the dataset for their respective temporal resolutions, in brackets. The '# static links' column indicates the number of aggregated links, which corresponds to links appearing at least once in the overall temporal evolution. The 'All' dataset corresponds to the whole Primary School, the 'Teachers' dataset correspond to the interactions between teachers, and all the other datasets are named with the respective class-section in the Primary School. We always remove the time interval between the last connection of a day and the first one of the following day.

Computational complexity
All approaches determine the interval partition using the BB method, which requires a computational time of O(I 2 max ). Here, I max represents the number of times in which the total number of temporal links in the entire network at time t, as computed in equation (1), is different from what observed at t + 1, Ω(t) = Ω(t + 1) for t = 1, . . . , T − 1, and O is the Landau symbol. Similarly, the statistical filters in equations (8) and (13), share a similar computational cost that scales as O(N E ), where N E is the number of links observed at least once in the temporal evolution.
The estimation of model parameters is different across the EADM, ETFM, and TDF. In the EADM, activities are estimated through equation (3), which scales linearly with the number of intervals, I, and the number of links created in each interval. The complexity of this approach is O (cNI), where c N for sparse networks and c ∼ N for dense networks. A similar reasoning applies for the estimation of the TDF's parameters in equations (10) and (11), yielding a complexity of O (cNI).
In the ETFM, activity values are optimized in a maximum likelihood sense, which requires to recursively solve equation (9). Hence, the ETFM computational cost depends on three factors: the number I of intervals, the number of nodes in the ∆th interval, n(∆), and the number m of recursions. The computational cost grows linearly with the number of intervals I, since it has to be applied independently to each interval, and with m, which is the number of iterations for the maximum likelihood to converge. On the other hand, it is quadratic in n (∆), because of equation (9). Overall, the ETFM complexity hits O (mIN 2 ) in the worst case scenario, while in the best case scenario, when only one interval is present, it is O (mN 2 ), like in the temporal fitness model [5].
By comparing the EADM and TDF complexities, O (cNI), with the ETFM complexity, O (mN 2 I), we observe that the bottleneck is the size of the network. For large networks, the EADM and TDF are much faster procedures than the ETFM.

Accuracy
Our study involves the mathematical analysis of the model accuracy, which is verified through numerical simulations using the artificial networks described in section 3.2 and appendix A.1. In the main text, we focus on the case of heterogeneous activity distribution and homogeneous backbone network; in appendix A.2 we explore the case of heterogeneous backbone and in appendix A.3 the case of homogeneous activities.
The TDF statistical filter in equation (13) uses quantities that are computed directly from the time series, without relying on any expected value, which is otherwise needed for the EADM and ETFM. Therefore, the following analysis can be carried out only for the EADM and ETFM, since it involves comparison between expected and observed quantities. We focus on both microscopic and macroscopic quantities, including the observed node strength, s ts i , compared to its expected value, E[s ts i ]; and the observed average strength, s ts , compared with its expected value, E[ s ts ].

Mathematical analysis
The analysis of the ETFM is trivial, since, by definition, the maximum likelihood in equation (9) returns optimal activity values, a * i (t) for i = 1, . . . , N , for an optimal representation of observed values. Hence, we expect that the ETFM provides the most accurate estimation of the backbone network. We focus on both global and local quantities, which we report here, and successively compare to the results of the EADM. In the ETFM, the expected individual strength of node i is computed by summing over the entire time-window and over all possible connections that node i may have In equation (15), we use the product of activities to be consistent with the probability that two nodes randomly connect, given in equation (2). The expected average strength is the simple average of equation (15) among all possible nodes in the network and reads The analysis of the EADM is more challenging. The node strength is a key quantity to compute the individual activities, in equation (3), to estimate the link weights, in equation (7), and to determine whether links are explained by our null model or they belong to the backbone network, in equation (8). From equation (7), we compute the expected individual strength for node i in the whole observation window by summing over all the possible connections that node i can make, excluding self-loops that are not allowed in the model. This reads By changing the order of the two summands and using the definition of the total number of links in the ∆th interval, W ts (∆) in equation (4), we obtain By summing and subtracting s ts i (∆) from the numerator, we get which can be rewritten as where E [s i ] and s ts i represent the expected and observed individual strengths, respectively. We observe that the EADM underestimates the observed strength, s ts i , by a quantity that only depends on the self-loops. The contribution of self-loops is inversely proportional to the size of the network and scales as 1/N. This result generalizes the proof in [39], based on the static configuration model, to temporal networks.
As a consequence, macroscopic quantities, such as the expected average strength, should become more accurate in the EADM as the size of the network increases. By averaging equation (20) for all nodes in the network we obtain which scales as 1/N. In other words, since the EADM error comes from self-loops only, the observed quantities, s ts i and s ts , become equivalent to expected quantities if self-loops are included in both equation (20) and equation (21). We indicate these adjusted quantities as where the superscript 'sl' indicates that these sums include self-loops into equations (20) and (21). In order to properly assess the error of the EADM, we consider two metrics: the relative error and the normalized root mean squared error. The relative error determines whether our estimators are unbiased, while the normalized root mean squared error is needed to ensure that the variance of our estimators is bounded. The theoretical relative error and normalized root mean squared error of the EADM are obtained by properly combining equations (20)-(22)

Numerics
The mathematical analysis above suggests that the ETFM error is minimized, while the error of the EADM should decay as the inverse of the size of the network, 1/N, and coincide with predictions from equation (23). The numerical relative error and normalized root mean squared error are computed as follows: where we substitute the expected quantities, E sl [s i ] and E sl [ s ], in equation (23), with their observed values, s ts i and s ts . The remaining expected quantities in equation (24) can be either equations (15) and (16) for the ETFM numerics, or equations (20) and (21) for the EADM numerics.
In our numerics, shown in figure 2, we consider artificial networks having a homogeneous backbone with λ = 0.7, as introduced in appendix A.1, and real systems, as presented in table 3. The cases of artificial networks having either a homogeneous backbone with λ = 0.02 or heterogeneous backbone are explored in appendix A.2. We determine that, for both artificial and real systems, our theoretical arguments hold. Specifically, we find that the theoretical error of the EADM comes from self-loops only and it scales as the inverse of the network size. Further, we show that the ETFM improves the EADM estimation, lowering the relative and normalized root mean squared errors. However, the ETFM estimation is challenged by the excessive preponderance of backbone network, in terms of the normalized root mean squared error. This is due to the fact that the maximum likelihood approach of the ETFM in equation (9) estimates the activity values without considering the presence of the backbone network.
Interestingly, we find that in both artificial and real systems, the relative EADM errors are about 1% when networks reach one hundred of nodes. This amount of error is acceptable for many practical applications. On the contrary, when the artificial and real systems are smaller, the relative EADM error may reach up to 10%, challenging its usability for many practical purposes.

Performance comparison on artificial networks
Going further in the comparison between the EADM, ETFM, and TDF, we examine the performance of the three approaches in detecting backbone networks as a function of the size of the network. We use the artificial network introduced in our prior work [19]. We analyze two different cases, resembling two realistic situations: one in which the backbone network is barely present, where we set λ = 0.02, and another in which the backbone network appears almost at every time step, where we set λ = 0.7. In this analysis, we set the univariate threshold β equal to 0.01 [15,19], and then correct it according to the Bonferroni correction as detailed in section 2.4, to obtain the β used to filter out links explained by the null models.
The performance of the three approaches is determined through the analysis of the algorithms' precision and recall [40]. The former corresponds to the ratio between the number of true positives (all the links discovered   that truly belong to the backbone network) divided by the number of all the detected links (true and false positives). The latter metric corresponds to the ratio between the true positives divided by the sum of true positives and false negatives (all the links in the irreducible backbone). As shown in figures 3(a) and (c), when the backbone network is barely present, the TDF does not detect any link (recall is zero and precision is not defined). The ETFM outperforms the EADM, while it falls behind the EADM in terms of recall. In other words, the EADM detects more links than the ETFM and, some of them are false positive that belong to the known backbone network. When the size of the network reaches about one hundred nodes, the EADM and ETFM are equivalent. When the backbone network is present almost in every time step, the TDF has the highest precision among the three methodologies, but its recall goes rapidly to zero when the size of the network increases, as shown in figures 3(b) and (d). In this case, the EADM and ETFM perform similarly in terms of precision and recall, and they are accurate only when the size of the system reaches about one hundred nodes. If links in the backbone network appear in almost all time steps, they significantly modify the activity estimation in equations (3) and (9), which lowers the performance of the EADM and ETFM.
The analysis of artificial networks suggests that the TDF does not detect false positive, while often missing major parts of the backbone network. Failing to detect most of the backbone network may be due to two main aspects: (i) the use of the Bonferroni correction [37], which increases the methodology's precision and lowers its recall; and (ii) the heterogeneous distribution of activity values, whereby in appendix A.3 we determine an improvement in the number of true positives when dealing with a homogeneous activity distribution. The ETFM offers adequate performance in many situations, scoring poorly only when links in the backbone network are present at almost every time step. The EADM becomes equivalent to the ETFM when the size of the network reaches about one hundred nodes. Since the EADM is faster than the ETFM, it should be preferred for the study of large networks.

Network backbone discovery in real systems
Detecting the backbone in real systems yields more challenges than the study of artificial networks. The main limitation is that the backbone network is not known a-priori and performance can only be assessed through comparison. A possible way to circumvent such a limitation is to observe whether or not results from real systems are coherent with findings on artificial networks. According to our results on artificial networks, the TDF has the most restrictive null hypothesis, which leads to the least number of links detected in the backbone network. The ETFM finds smaller backbone networks than the EADM when the networks under examination have only tens of nodes, while they tend to become equivalent for larger systems. To confirm this claim in real systems, we compare the set of irreducible links detected by the three methodologies, using the EADM as reference. We compare the sets of links discovered by the TDF, L TDF , and by the ETFM, L ETFM , with that of the EADM, L EADM . The dimension of a set is indicated with | · |. We use two metrics: the ratio of the number of links |L x |/|L EADM | and the Overlap index [41], defined as where x = {ETFM, TDF}. The ratio |L x |/|L EADM | is useful to quantify whether the EADM detects more links than the ETFM and TDF. The Overlap coefficient in equation (25) measures the fraction of shared links between the two methodologies, thereby helping ascertain whether the ETFM and TDF find a subset of the links detected by the EADM. In order to properly compare the EADM with the ETFM, we consider a series of values for the univariate threshold β and indicate with a vertical black line the multivariate threshold β , properly modified according to the Bonferroni correction.
In figure 4, we show that the TDF detects less links than the ETFM, which, in turn, finds less links than the EADM, as expected. The fact that the TDF detects the least amount of links is due to its more restrictive null hypothesis, which ensures that no false positives are identified, but it misses many links that belong to the backbone network. ETFM and EADM results are affected by the network size; but when the entire school (formed by a few hundreds of nodes) is analyzed, the two methodologies become equivalent. This result is in line with the analysis of artificial networks.
In figure 5, we show that, for most of the real datasets studied herein, the Overlap coefficient is close to one. This result implies that the ETFM and TDF share the same core of the backbone networks with the EADM. The few cases in which the Overlap coefficient is lower than one pertain to the comparison between the EADM and TDF, due to the difference in their null hypotheses.  By combining the results of figures 4 and 5 we draw two main conclusions. First, the TDF has the most restrictive null hypothesis and detects the smallest backbone network. Second, the ETFM finds a subset of the links detected by the EADM only when a classroom (composed of only tens of nodes) is studied, while it identifies the same backbone network when the entire school (composed of only hundreds of nodes) is analyzed. Since the EADM is faster than the ETFM, it should be used to discover backbone networks in larger systems.
In table 4, we synoptically present the take-home message of the comparisons conducted throughout this paper. While, the EADM should be used for large networks and ETFM for small networks, the TDF can be applied to any network. The EADM and ETFM can be successful in identifying many irreducible links, but this may come at the cost of some of the reducible links being inaccurately assigned to the backbone. The TDF allows to filter out almost all the reducible links, but it tends to miss most of the irreducible links.

Discussion
In this paper, we introduce two new approaches that account for time-varying individual properties: an extension of the temporal fitness model [5], the evolving temporal fitness model (ETFM), and an extension of the disparity filter [6], the temporal disparity filter (TDF). We critically compare the performance of these approaches with  the evolving activity driven model (EADM) [19], an approach we recently proposed to account for time-varying individual properties. Our analysis focuses on the network size, toward an improved understanding of finite-size effects in backbone detection.
Our study involves the analysis of both artificial and real systems. From the analysis of artificial networks, we find that the TDF proposes the most restrictive null hypothesis and the discovered backbone networks contain the smallest amount of links among the three methodologies. This approach ensures that no false positives (links incorrectly identified as part of the backbone network) are detected, but most of the times it misses links that truly belong to the backbone network. The ETFM offers a trade-off between the number of detected links and the likelihood that they belong to the backbone network. From our study, we determine that false positives are detected only when links in the backbone network are present in almost any time step and the size of the network is of the order of ten nodes. When the network contains hundreds of nodes, the number of false positives is reduced to zero. Interestingly, for networks with at least one hundred nodes, the ETFM has the same performance of the EADM. Given that the EADM has a smaller computational burden, it should be preferred to the ETFM in this case. Comparing the three approaches, we observe that the TDF has the most restrictive null hypothesis, and it detects the least amount of irreducible links. The ETFM identifies an intermediate number of irreducible links, which becomes equivalent to the EADM when the networks have at least one hundred nodes.
Our results on real systems confirm the insight gathered from artificial networks. When detecting the backbone network for the Primary School and Workplace datasets [30], we determine that the EADM, ETFM, and TDF detect a different number of irreducible links. In most cases, the TDF finds a subset of the links of the ETFM, which detects a subset of the links of the EADM, thereby validating the same core of the backbone network.
The size of the network plays a critical role on the proper determination of the backbone network. On the one hand, large systems can only be studied by the EADM and TDF. On the other hand, the ETFM requires a large computational power, but it offers the best trade-off between the number of detected links and the fraction of them that actually belongs to the backbone network.
In summary, we suggest to use the TDF if the objective of the study is to minimize the number of false positives, at the cost of failing to detect many irreducible links. Employing the Bonferroni correction, the TDF could detect most of the irreducible links only if the distribution of activity values is homogeneous, a scenario that is in contrast with several empirical observations [42][43][44]. Instead, if finding as many irreducible links as possible is the main goal, then the ETFM is preferable for small networks. When networks count several hundreds, thousands, or even more nodes, the ETFM and EADM become equivalent; thus, we propose to use the latter, since it is faster.
These methodologies are not free of limitations and our work identifies possible choices depending on the problem considered. For instance, an interesting line of research may entail improving the estimate of the interval partition, which now is common to all the nodes in the network. It is possible that each node in the network evolves according to its own pattern, which would result in considering different interval partitions, one for each node in the network.
The TDF can be applied to systems of any size, if the main objective of the study is to minimize the number of false positives. Instead, the other two approaches are highly affected by the size of the network. Real examples of human social systems in which the ETFM should be used are: classrooms and small departments (as done in our work), small organizations, bars, or small conferences. Instead, real examples of social systems that would benefit of the EADM are: entire schools or work environments (as in our work), entire universities, large companies, or large conferences. These approaches do not have to be limited to social systems but they can be applied to animal groups [45], the world wide web [46], financial markets [47], brain circuits [48], and many others.

A.1. Generation of artificial temporal networks with arbitrary distributions of activity values and links in the backbone
We present here an extension of the generative algorithm proposed in our previous work [19]. Specifically, we introduce a mechanism to create a temporal network with known backbone network and arbitrary distribution backbone network G(λ) ∼ δ λ −λ is already described in our previous work [19]. The procedure is as follows.λ    (i) We generate a temporal network unfolding over an observation window of length T, partitioned into I consecutive intervals. We manually set the initial time of the first interval at the beginning of the time series, t in (1) = 1, and randomly draw without replacement the I − 1 initial time of the remaining intervals in {2, ..., T}, which we sort as t in (2) < t in (3) < · · · < t in (I). The generic interval ∆ has length τ (∆), and the average length of the intervals is τ (∆) = T/I. (ii) Each node in the network has a time-varying, piece-wise constant, individual activity, a i (t), which is extracted according to the following procedure.

S DS DM DI A Dataset
Theory: EADM Numerics: EADM Numerics: ETFM Figure C1. Comparison between the theoretical relative error (RE) and normalized root mean squared error (NRMSE) of the EADM, in equation (23), with the numerical results of both the EADM and ETFM, in equation (24). The Workplace dataset evolving with a temporal resolution set equal to 5 min are studied in panels (a) and (c), while systems evolving with a temporal resolution of 15 min are examined in panels (b) and (d). The 'S' label stands for SRH, the 'DS' for DSE, the 'DM' for DMCT, the 'DI' for DISQ, and the 'A' for the entire dataset.
• For the first interval, ∆ = 1, we extract N activities, one for each node in the network, from the activity distribution F(a). These values are held constant in the interval. • When 2 ∆ I, we may observe correlations between the activities in the present interval and the previous one, that is, between [t in (∆ − 1), t in (∆ − 1) + τ (∆ − 1) − 1] and [t in (∆), t in (∆) + τ (∆) − 1], respectively. Activities vary according to a i (t 2 ) = a i (t 1 )ρ + y(1 − ρ), where y is a random number extracted from F(a), and ρ is an autocorrelation parameter.
(iii) In the whole observation window [1, T], we generate a temporal network where both reducible and irreducible links are present.
• Reducible links are created according to equation (2).
• Once reducible links are formed, we appoint a fraction of them that appeared at least once in the overall temporal evolution as the irreducible links. For each of these irreducible links ij of the backbone network, we record the time instants where the adjacency matrix is zero and with probability λ ij , we create a temporal link in that time instant, that is, we switch A ij to 1. The parameter λ ij , extracted from G(λ), measures the preponderance of the link during the whole observation window, such that larger values of λ ij favor the association between the link and the backbone network.  1] in panels (c) and (f). We find that the ETFM provides better estimations than the EADM, whose error scales as the inverse of the size of network. Further, the ETFM is challenged by the excessive preponderance of the backbone network, whereby λ = 0.70 yields a normalized root mean squared error that is comparable to the one of the EADM. We corroborate the results in the main text regarding precision and recall of the EADM, ETFM, and TDF, by analyzing the case of a heterogeneous backbone network with G(λ) ∼ λ −2.3 and λ ∈ [0.02, 1], in figure A2. For small systems, the highest precision is given by the TDF, then by the ETFM, and finally by the EADM; with respect to recall, we find the opposite scenario. For large systems, all methodologies yield the same precision; the recalls of EADM and ETFM are similar, and they are both much higher than the TDF.

A.3. Results for artificial temporal networks with homogeneous activity distribution
To shed some light into the reasons for the small recall in the TDF, we consider the case a homogeneous activity distribution, F(a) ∼ δ a − √ 0.001 . Activities are time invariant as we consider a single interval, I = 1, with  Following the main text, we analyze another real network from the Sociopattern project [30]: the Workplace dataset. We provide a data summary in table C1; analyze the EADM and ETFM accuracy in figure C1; compare the number of links detected by the ETFM or TDF with those found by the EADM in figure C2; and display the Overlap coefficient according to equation (25) in figure C3. All the results support the claims in the main text: (i) the TDF has the most restrictive null hypothesis and finds the smallest amount of links in the backbone network; (ii) the ETFM detects less links than the EADM when the system is composed of few tens of nodes, and (iii) the ETFM and EADM yield equivalent results when the system reaches one hundred nodes.