Probabilistic network sparsification with ego betweenness

Sparsification is the process of decreasing the number of edges in a network while one or more topological properties are preserved. For probabilistic networks, sparsification has only been studied to preserve the expected degree of the nodes. In this work we introduce a sparsification method to preserve ego betweenness. Moreover, we study the effect of backboning and density on the resulting sparsified networks. Our experimental results show that the sparsification of high density networks can be used to efficiently and accurately estimate measures from the original network, with the choice of backboning algorithm only partially affecting the result.

information loss (De Choudhury et al. 2010). The second approach is sampling, in which a percentage of possible worlds is sampled (Jin et al. 2011; Li et al. 2015; Maniu et al. 2017). Because the entropy¹ of probabilistic graphs is high, the variance of measures across samples and the required number of samples are also high (Parchas et al. 2018; Potamias et al. 2010; Dang et al. 2015; Kahn and Marshall 1953). Therefore, to decrease the required number of samples, Parchas et al. proposed probabilistic network sparsification, in which the number of edges and the graph's entropy are reduced while nodes' expected degrees are preserved by modifying the probabilities of the remaining edges (Parchas et al. 2018).
In this work we generalize the definition of probabilistic network sparsification. More specifically, we define sparsification as a method to decrease a probabilistic graph's entropy while any specific measure is preserved. This paves the way to examine various topological properties in sparsification. We focus on ego betweenness as a fundamental path-based measure to show the broader applicability of sparsification.
The sparsification procedure consists of two steps: (i) we extract a backbone of the original graph, which decreases the original graph's density², and (ii) we modify the probabilities of the backbone's edges. This raises three questions. First, which backboning method leads to a better sparsified graph in terms of preserving the original graph's property? Second, to what extent can we decrease the original graph's density? Third, does the original graph's density have any impact on the quality of the sparsified graph? Figure 1 shows a probabilistic graph and the two steps of sparsification. Figure 1a is a probabilistic graph with 10 edges. Figure 1b shows a backbone extracted from the original graph as the result of the first step. The second column in Table 1 shows the mean relative error (MRE) of expected degree and expected ego betweenness in the backbone. Figure 1c illustrates the second step of sparsification, in which edge probabilities are modified such that nodes' ego betweenness is preserved. The third column in Table 1 shows the MRE values of expected degree and expected ego betweenness in this network.

Fig. 1 Procedure of sparsifying a probabilistic graph: a input graph to be sparsified, b backbone, and c sparsified graph with our proposed method, whose goal is to preserve nodes' ego betweenness

¹ Entropy of network G is $H(G) = \sum_{e \in E} \left( -p_e \log p_e - (1 - p_e) \log (1 - p_e) \right)$ (Parchas et al. 2018).
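The entropy in footnote 1 is a sum of per-edge binary entropies, so edges with probabilities near 0.5 contribute most and certain edges contribute nothing. A minimal Python sketch, assuming base-2 logarithms (the footnote does not fix the base) and a plain list of edge probabilities as input; the function names are illustrative. The ratio H(G′)/H(G) is the "relative entropy" reported in the experiments.

```python
import math


def edge_entropy(p: float) -> float:
    """Binary entropy of one edge with existence probability p (base-2 log is an assumption)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain edge carries no uncertainty
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def graph_entropy(probabilities) -> float:
    """H(G): sum of the edge entropies over all edges of the probabilistic graph."""
    return sum(edge_entropy(p) for p in probabilities)


def relative_entropy(original, sparsified) -> float:
    """H(G') / H(G) for two lists of edge probabilities."""
    return graph_entropy(sparsified) / graph_entropy(original)
```

For example, two edges with probability 0.5 give H = 2 bits, whereas a graph whose edges all have probability 1 is deterministic and has H = 0.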

Problem statement
A probabilistic network G = (V, E, p) is a graph (V, E) whose edges are associated with a probability of existence p : E → (0, 1]. One of the most widely used approaches to analyzing probabilistic networks is Monte-Carlo sampling. However, the number of required samples increases with the entropy of the probabilistic graph. To decrease the required number of samples, Parchas et al. (2018) defined probabilistic graph sparsification as follows. Definition 1 Given a probabilistic graph G = (V, E, p) and a sparsification ratio α, … where $ExD^*_v$ is the expected degree of node v in the sparsified graph $G^*$.
We change this definition to be more inclusive, so that it is not limited to expected degree. As a result, we can consider other structural properties and examine the effect of changing edge probabilities on them. Problem 1 Given a probabilistic network G = (V, E, p) and a sparsification ratio 0 < α < 1, sparsification is the process of extracting a new probabilistic network … where M and M* are the values of a structural property in the original graph G and the sparsified graph G*, respectively.
In this paper, ego betweenness is the structural property that we aim to preserve. The betweenness of node u is

$$B(u) = \sum_{s \neq u \neq t} \frac{g_{st}(u)}{g_{st}},$$

where $g_{st}(u)$ is the number of shortest paths between s and t passing through u and $g_{st}$ is the total number of shortest paths between s and t (Freeman 1978). In probabilistic networks, the probability of a path decreases as the number of its edges increases. Unlike deterministic networks, in which shortest paths of any length contribute equally to betweenness, long shortest paths contribute less to betweenness in probabilistic networks. Therefore, ego betweenness is a reasonable alternative to betweenness in probabilistic networks. The ego betweenness of node u is the betweenness of u among its immediate neighbors (Everett and Borgatti 2005). Although computing ego betweenness in probabilistic networks is computationally expensive, it can be estimated as … where $p_{uv}$ is the probability of the edge between nodes u and v, N(u) is the set of nodes adjacent to u in the probabilistic graph G, and $p_{vw} = 0$ if $(v, w) \notin E$. The computational complexity is O(L²), where L is the number of edges incident to node u (Kaveh et al. 2020).
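The closed-form estimator of Kaveh et al. (2020) is not shown here. On small graphs, any such estimate can be sanity-checked against a brute-force Monte-Carlo baseline: sample possible worlds, compute the deterministic ego betweenness of Everett and Borgatti (2005) in each world, and average. This is a swapped-in technique, not the O(L²) estimator, and its cost grows with the number of samples, which is exactly what the estimator avoids. The edge-dict representation and all function names are assumptions.

```python
import itertools
import random


def ego_betweenness(adj, u):
    """Deterministic ego betweenness of u; adj maps every node to the set of its neighbours."""
    nbrs = adj[u]
    total = 0.0
    for v, w in itertools.combinations(sorted(nbrs), 2):
        if w in adj[v]:
            continue  # v and w are adjacent, so u lies on no geodesic between them
        # Length-2 paths between v and w inside u's ego network: one via u, plus one via
        # every other neighbour k of u that is adjacent to both v and w.
        paths = 1 + sum(1 for k in nbrs if k not in (v, w) and v in adj[k] and w in adj[k])
        total += 1.0 / paths
    return total


def sample_world(nodes, probs, rng):
    """One possible world: edge (a, b) is kept independently with probability probs[(a, b)]."""
    adj = {n: set() for n in nodes}
    for (a, b), p in probs.items():
        if rng.random() < p:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def expected_ego_betweenness_mc(nodes, probs, u, n_samples=1000, seed=0):
    """Monte-Carlo estimate of the expected ego betweenness of u."""
    rng = random.Random(seed)
    total = sum(ego_betweenness(sample_world(nodes, probs, rng), u) for _ in range(n_samples))
    return total / n_samples
```

For a path a–u–b whose two edges each have probability 0.5, u is a broker only when both edges exist, so the expected ego betweenness is 0.25.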

Solution framework
The solution of Problem 1 consists of two steps: (1) extract a backbone graph $G_b$ with edge set $E_b \subset E$ and $|E_b| = \alpha|E|$, where α is the sparsification ratio, and (2) modify the probabilities of the edges in $E_b$ such that nodes' ego betweenness is as close as possible to its value in the original graph. The resulting graph is a sparsified graph $G' = (V, E', p')$, where $|E'| = |E_b| = \alpha|E|$. As we will see in the following sections, $E' = E_b$ in the output of the Gradient-Descent algorithm (see "Gradient-descent (GD)" section), whereas $E' \subset E$ in the output of the Expectation-Maximization algorithm, with $E'$ and $E_b$ not necessarily equal (see "Expectation-maximization (EM)" section).
To quantify how well ego betweenness is preserved, we define the ego betweenness discrepancy as follows. Definition 2 (ego betweenness discrepancy) Given a probabilistic network G and a sparsified network G′, the ego betweenness discrepancy of node u is

$$\delta(u) = EB_G(u) - EB_{G'}(u),$$

where $EB_G(u)$ is the ego betweenness of u in G.
Formally, in the second step of the solution we aim to minimize $D = \sum_{v \in V} |\delta(v)|$. Linear programming (LP) can be used to find the global minimum of D. However, LP has been shown to be not only inefficient on large graphs but also unable to explicitly reduce entropy (Parchas et al. 2018). Therefore, we adapt the Gradient-Descent (GD) and Expectation-Maximization (EM) algorithms of Parchas et al. (2018) to approximate the optimal probability adjustment in a small fraction of the time required by LP while decreasing the entropy. Both algorithms require a differentiable objective function, and $\sum_{v \in V} |\delta(v)|$ is not differentiable where $\delta(v) = 0$; we therefore use $D = \sum_{v \in V} \delta^2(v)$ (Parchas et al. 2018) as the objective function hereinafter.

Backboning
In this section, we briefly introduce the four backboning methods used in this research.
1 Noise corrected (NC): The first backboning method is a simplified version of the noise corrected method (Coscia and Neffke 2017; Coscia and Rossi 2019), in which an edge is kept if its probability is higher than the sum of the expected degrees of its incident nodes divided by the total number of edges incident to these two nodes. In the NC backboning algorithm, the edge under consideration is excluded from the calculation of this ratio.
2 Maximum Spanning Tree (MST): The second method is the iterative spanning tree method (Nagamochi and Ibaraki 1992; Parchas et al. 2018). First, we initialize the backbone graph with the same set of nodes as the original graph and an empty set of edges. Then, in the first iteration of the algorithm, we remove the edges of the spanning tree of G from G and add them to the backbone. In each subsequent iteration, we compute the spanning tree/forest of the remaining graph and move the selected edges from the remaining graph to the backbone. This procedure is repeated until the backbone includes α|E| edges.
3 Monte Carlo (MC): The third method is Monte-Carlo sampling, through which α|E| edges of the input graph are sampled.
4 Hybrid (MST/MC): The fourth method combines the second and the third methods (Parchas et al. 2018). First, α′|E| edges, where α′ < α, are selected via the iterative spanning tree method, and then (α − α′)|E| edges are sampled via the Monte-Carlo sampling method.
We illustrate the differences between the backboning methods on the example of a complete graph K10, shown in Fig. 2. For all backboning methods, we repeat the procedure until the backbone forms a single connected component. The first column in Fig. 3 shows the four backbones with α = 0.31 resulting from the NC, MST, MST/MC (α′ = 0.155), and MC methods, respectively.
Although the edges with the highest probabilities are the most likely to be included, there are still considerable differences among the four resulting backbones (e.g., the edge with probability 0.95 is present in only three of the four backbones).
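The backboning procedures above can be sketched as follows. This is a minimal illustration under stated assumptions: edges are stored in a dict mapping (u, v) tuples to probabilities; NC follows our reading of the simplified rule (keep an edge if its probability exceeds the mean probability of the other edges incident to its two endpoints, and keep an edge that has no other incident edges); the Monte-Carlo step picks a random leftover edge and accepts it with its existence probability, which is only one possible reading of "sampling α|E| edges"; and all helper names are hypothetical. Pure MC corresponds to `hybrid_backbone` with `alpha_prime=0`, and the connectivity check used in the experiments is omitted.

```python
import random


def noise_corrected_backbone(probs):
    """Simplified NC (our reading): keep (u, v) if p_uv exceeds the mean probability of the
    other edges incident to u or v; the edge itself is excluded from that mean."""
    exp_deg, deg = {}, {}
    for (a, b), p in probs.items():
        for n in (a, b):
            exp_deg[n] = exp_deg.get(n, 0.0) + p
            deg[n] = deg.get(n, 0) + 1
    kept = {}
    for (a, b), p in probs.items():
        other_mass = exp_deg[a] + exp_deg[b] - 2 * p
        other_count = deg[a] + deg[b] - 2
        if other_count == 0 or p > other_mass / other_count:  # isolated edge kept (assumption)
            kept[(a, b)] = p
    return kept


def _max_spanning_forest(nodes, edges):
    """Kruskal's algorithm on edges sorted by decreasing probability."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for (a, b), p in sorted(edges.items(), key=lambda item: -item[1]):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            forest.append((a, b))
    return forest


def iterative_mst_backbone(nodes, probs, n_edges):
    """Move maximum spanning forests of the remaining graph into the backbone until it has n_edges edges."""
    remaining = dict(probs)
    backbone = {}
    while len(backbone) < n_edges and remaining:
        for e in _max_spanning_forest(nodes, remaining):
            if len(backbone) == n_edges:
                break
            backbone[e] = remaining.pop(e)
    return backbone


def hybrid_backbone(nodes, probs, alpha, alpha_prime, seed=0):
    """MST/MC: alpha'|E| edges from iterative spanning forests, the rest by Monte-Carlo sampling."""
    rng = random.Random(seed)
    target = round(alpha * len(probs))
    backbone = iterative_mst_backbone(nodes, probs, round(alpha_prime * len(probs)))
    leftover = [e for e in probs if e not in backbone]
    while len(backbone) < target and leftover:
        e = rng.choice(leftover)
        if rng.random() < probs[e]:  # accept the edge with its existence probability (assumption)
            backbone[e] = probs[e]
            leftover.remove(e)
    return backbone
```

Because every method favours high-probability edges (deterministically for NC and MST, in expectation for MC), the resulting backbones tend to overlap heavily.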
In the following two sections, we describe the Gradient-Descent and Expectation-Maximization algorithms: the former modifies edges' probabilities, while the latter both rewires the backbone and modifies edges' probabilities.

Gradient-descent (GD)
Starting from the backbone graph $G_b$, the Gradient-Descent algorithm picks one edge $e \in E_b$ in each iteration and optimizes that edge's probability. To achieve this goal, each iteration has to reduce the objective function $D = \sum_{v \in V} \delta^2(v)$. According to Eq. 3, if the probability of the edge e = (u, v) changes by $\partial p^{i+1}_e$ at iteration i + 1, then the discrepancies of two groups of nodes change: first, the discrepancies of the nodes incident to that edge, i.e., u and v, and second, the discrepancies of the common neighbors of u and v. The discrepancies of nodes outside these two groups are not affected by $\partial p^{i+1}_e$. Then the derivative of the objective function at iteration i + 1 with respect to the change of $p_e$ at that iteration is

$$\frac{\partial D}{\partial p_e^{i+1}} = 2\,\delta(u)\,\frac{\partial \delta(u)}{\partial p_e^{i+1}} + 2\,\delta(v)\,\frac{\partial \delta(v)}{\partial p_e^{i+1}} + 2\sum_{w \in W(u,v)} \delta(w)\,\frac{\partial \delta(w)}{\partial p_e^{i+1}},$$

where W(u, v) is the set of common neighbors of nodes u and v. In Fig. 4, if the probability of edge e = (u, v) increases, the ego betweenness of node u increases, because, first, nodes x1 and x2 rely more on u to be connected to v, and second, node v relies relatively more on node u to connect to the common neighbors w1 and w2 compared with the direct connections (v, w1) and (v, w2). At the same time, if the probability of e increases, the ego betweenness of the common neighbors w1 and w2 decreases, as nodes v and u rely relatively more on their shared edge (u, v) than on the two-hop paths through w1 and w2.
In the following, we express the change of discrepancies as a function of the change of $p_e$ (the probability of edge (u, v) in Fig. 4) at iteration i + 1, based on the aforementioned intuitions. Equations 5 and 6 represent the change of discrepancies of the nodes incident to the edge e, and Eq. 7 shows the change of discrepancies of the common neighbors of u and v.
where N(u) is the set of neighbors of u, $W(u, v) = N(u) \cap N(v)$ is the set of common neighbors of u and v, and $p^i_{uv}$ is the probability of edge (u, v) at iteration i. Hence, by calculating $p^{i+1}_{uv}$ as follows, we are assured that $\sum_{v \in V} \delta^2(v)$ moves one step closer to a local minimum: … where 0 < h ≤ 1 is the gradient descent step size. For the proof, see Appendix 1.
Algorithm 1 illustrates the Gradient-Descent algorithm, in which the objective function converges to a local minimum. In line 1, the sparsified graph is initialized with the backbone graph ($G_b$). Then, the algorithm takes iterative steps toward a local minimum of D. In each iteration, it picks an edge and assigns a new probability to it according to Eq. 8 (line 5). Lines 6-10 ensure that the new probability value does not violate the constraint 0 ≤ p ≤ 1. The objective function is calculated at the beginning and at the end of each iteration, in lines 3 and 12, respectively. If the difference between these two values is equal to or lower than the input threshold $\tau_{GD}$, the algorithm terminates.
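The control flow of Algorithm 1 can be sketched as below, under stated assumptions: the derivative of D is approximated by a central finite difference instead of the analytic Eqs. 5-7, every edge is updated once per iteration (a sweep) rather than one edge per iteration, and the objective is passed in as a callable, so the same loop works with the expected-degree objective of Parchas et al. (2018) or with an ego-betweenness objective. The parameter names `h`, `tau` and `eps` are illustrative.

```python
def gradient_descent(probs, objective, h=0.1, tau=1e-12, max_iter=10_000, eps=1e-6):
    """Coordinate-wise gradient descent on edge probabilities.

    probs: dict edge -> probability (the backbone G_b).
    objective: callable returning D for such a dict.
    The derivative is a central finite difference (an assumption standing in for Eqs. 5-7).
    """
    p = dict(probs)  # initialise G' with the backbone (cf. line 1)
    for _ in range(max_iter):
        d_before = objective(p)
        for e in list(p):  # one sweep: one update per edge
            old = p[e]
            p[e] = old + eps
            d_up = objective(p)
            p[e] = old - eps
            d_down = objective(p)
            grad = (d_up - d_down) / (2 * eps)
            # Descent step, then clamp to the feasible range 0 <= p <= 1 (cf. lines 6-10).
            p[e] = min(1.0, max(0.0, old - h * grad))
        d_after = objective(p)
        if d_before - d_after <= tau:  # stop once D no longer decreases by more than tau
            break
    return p
```

For instance, with the squared expected-degree discrepancy of a path a–b–c whose target expected degrees are 0.5, 1.0 and 0.5, starting from probabilities 0.9 and 0.1, the loop moves both probabilities to 0.5. Edges whose probability ends at 0 would then be dropped, as described in the next section.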
The second column in Fig. 3 shows the sparsified graphs obtained by applying GD to the four backbones.

Expectation-maximization (EM)
Algorithm 1 only modifies the probabilities of the edges of the input backbone graph $G_b$. Therefore, the output sparsified graph G′ depends not only on the probability modification performed by the Gradient-Descent (GD) algorithm but also on $G_b$. Parchas et al. (2018) proposed an Expectation-Maximization (EM) algorithm that both rewires $G_b$ and modifies edge probabilities.
The objective function of the EM algorithm is $D = \sum_{v \in V} \delta^2(v)$. Algorithm 2 illustrates the EM algorithm. The EM algorithm first initializes G′ with the input backbone graph $G_b$. In lines 2-20, the algorithm replaces each edge in E′ with an edge in E \ E′ that yields a lower D. In more detail, in lines 5-6 the selected edge is removed from G′ and the discrepancies of all affected nodes are updated. Line 7 selects the node with the highest discrepancy, $v_c$. In lines 8-15, all edges incident to $v_c$ that are not in the current G′ are examined, and the one with the maximum gain is added to G′. Line 18 runs the GD algorithm to modify the edge probabilities of G′ in that iteration. The objective function is calculated at the beginning and at the end of each iteration, and if the difference between these two values is equal to or lower than the input threshold $\tau_{EM}$, the algorithm terminates.
The gain of a candidate edge $e_c$ is calculated as

$$\mathrm{gain}(e_c) = D(G') - D(G' + e_c),$$

where D(G′) is the objective function computed on G′ and D(G′ + e_c) is the objective function computed on G′ after adding $e_c$. Note that if the probability of an edge becomes zero in the final output of the Gradient-Descent or Expectation-Maximization algorithm, that edge is removed from the sparsified graph, because by the definition of probabilistic networks, edge probabilities must lie in the range (0, 1]. As a result, the condition |E′| = α|E| becomes |E′| ≈ α|E|. The condition |E′| = α|E| cannot be met exactly in practice anyway, because α|E| may not be an integer. Hence, the sparsified graph will contain only approximately α|E| edges.
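The edge-selection step of lines 7-15 can be sketched with the gain defined above. Assumptions, all hypothetical: "highest discrepancy" is taken as the largest |δ|, a candidate enters G′ with its original probability, and the objective and discrepancy functions are passed in as callables, so the example below uses expected degree for brevity.

```python
def select_replacement(current, original, objective, discrepancy):
    """Pick the edge to add in one EM step (a sketch of lines 7-15 of Algorithm 2).

    current: dict edge -> probability of G'; original: dict edge -> probability of G.
    objective: callable D(probs); discrepancy: callable returning {node: delta(node)}.
    Returns the candidate incident to the node with the largest |delta| that maximises
    gain(e_c) = D(G') - D(G' + e_c), or None if no candidate exists.
    """
    deltas = discrepancy(current)
    v_c = max(deltas, key=lambda v: abs(deltas[v]))  # |delta| is an assumption
    d_current = objective(current)
    best_edge, best_gain = None, float("-inf")
    for e, p in original.items():
        if v_c in e and e not in current:
            # The candidate enters with its original probability (assumption).
            gain = d_current - objective({**current, e: p})
            if gain > best_gain:
                best_edge, best_gain = e, gain
    return best_edge
```

On a triangle with probabilities p_ab = 0.5, p_bc = 0.5 and p_ac = 0.8, starting from G′ = {(a, b)}, node c has the largest expected-degree discrepancy (1.3), and adding (a, c) reduces D from 2.58 to 0.5, a larger gain than adding (b, c).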
An easy way to prevent probabilities from reaching 0 would be to set a minimum probability ε > 0 in line 9 of Algorithm 1. A small ε would keep the edge without any significant impact on the measures. However, the objective of sparsification is to reduce the size and entropy of the network, so in our opinion an algorithm that may lead to a slightly lower size and entropy than requested is practically reasonable, and it avoids ad hoc fine-tuning.

Experiments
We evaluate the effect of the input graph's density ("Density" section), the impact of the backboning method used ("Backbone" section), and discuss performance ("Performance" section).

Datasets
To evaluate the proposed method, we use three real datasets and six synthetic datasets. While the real datasets give general insights into the scalability and performance on realistic networks where sparsification will potentially play an important role, we use the synthetic datasets to specifically study the impact of density and network structure.

Brain network
The first dataset is a brain network in which nodes are regions of interest (ROIs). Based on a modified version of the standard AAL³ scheme, the network has 89 nodes (Termenon et al. 2016). The graph is complete, and each edge probability is the absolute value of the Pearson correlation between the activity time series of the incident ROIs. A probability value indicates the likelihood that the two incident nodes (ROIs) will be functional in the next scanning experiment.

Enron
The second dataset is a snowball sample of the Enron email network, in which nodes represent employees and there is an edge between two nodes if at least one email has been exchanged between them. Edge probabilities quantify the likelihood that a new email will be exchanged between a pair of nodes at time t, $p_{i,j} = 1 - \prod_{k} \left(1 - \exp(-\mu (t - t_k))\right)$, where µ is the scaling parameter and $t_k$ is the time when message k was exchanged between nodes i and j (Pfeiffer and Neville 2011).
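The Enron edge probability can be evaluated directly from the message timestamps. A short sketch; the function name and argument layout are illustrative. A message sent exactly at time t gives a factor of 0 and hence a probability of 1, while older messages give factors closer to 1 and therefore lower probabilities.

```python
import math


def enron_edge_probability(message_times, t, mu):
    """p_ij = 1 - prod_k (1 - exp(-mu * (t - t_k))) over the times t_k of messages between i and j."""
    product = 1.0
    for t_k in message_times:
        product *= 1.0 - math.exp(-mu * (t - t_k))
    return 1.0 - product
```

For example, two messages that are each ln 2 / µ time units old contribute factors of 0.5, giving p = 1 − 0.25 = 0.75.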

FriendFeed
A snowball sample of the FriendFeed online social network (Magnani et al. 2010) with 9894 nodes and 172567 edges is the third dataset. There is an edge between two nodes if they follow each other mutually, and the probability of that edge is the likelihood that the two incident nodes will exchange a message in the future. This probability is quantified by the exponential function $p_{ij} = 1 - \exp(-\mu n)$, where n is the number of messages exchanged between them in either direction and µ is the scaling parameter, set to 0.2.
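With µ = 0.2, the FriendFeed probabilities saturate quickly: one message gives p ≈ 0.181, five give p ≈ 0.632, and ten give p ≈ 0.865. A one-function sketch (the name is illustrative):

```python
import math


def friendfeed_edge_probability(n_messages, mu=0.2):
    """p_ij = 1 - exp(-mu * n): more exchanged messages push the probability towards 1."""
    return 1.0 - math.exp(-mu * n_messages)
```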

Synthetic networks
In addition to the real networks, we also assess six synthetic networks: three Erdős-Rényi and three Forest-Fire networks (Leskovec et al. 2005) with densities ρ ∈ {0.1, 0.5, 0.9}. While Erdős-Rényi is a simple random network, the Forest-Fire network is characterized by a heavy-tailed degree distribution and community structure. Edge probabilities are drawn from a uniform distribution between 0 and 1. All datasets are summarized in Table 2.

Density
In this section, we study the impact of density and average degree on sparsification. Figure 5 shows the properties of Erdős-Rényi networks that have been sparsified with the proposed GD and EM algorithms, which preserve nodes' ego betweenness (btw). It also shows GD and EM as proposed in Parchas et al. (2018), which preserve nodes' expected degree (deg). Columns 1-3 show the relative entropy, i.e., $H(G')/H(G)$, the mean relative error (MRE) of expected degree (deg), and the mean relative error of ego betweenness (btw), respectively. Rows 1-3 show the results for Erdős-Rényi networks with average degree D = 899.1, 499.5, and 99.9, respectively. Figure 5 shows that all algorithms obtain better results for networks with high density ρ and high average degree D. We have repeated the same experiments on Forest-Fire networks, and the results support the same conclusion; for the sake of brevity, we only include figures for Erdős-Rényi networks. The same evaluations have been performed on the real datasets in Fig. 6. All methods extract low-entropy sparsified graphs from the brain network and the FriendFeed network. In contrast, all methods show poor results for the Enron network. This suggests that a higher average degree of the input graph gives more choices for optimizing edge probabilities, and we conclude from these results that sparsification algorithms work better for graphs with high average degree.

Fig. 5 Impact of density and backboning methods on sparsification - synthetic data: Relative entropy (column one), mean relative error of expected degree (column two) and mean relative error of expected ego betweenness (column three) of sparsified Erdős-Rényi graphs of different density and average degree. The results compare different sparsification methods over various backboning methods. In all graphs the sparsification ratio is α = 0.45. In the MST/MC backboning method α′ = 0.5α

Backbone
The structure of the backbone does not seem to have a significant impact on the final sparsified graph. All experiments, including Erdős-Rényi (Fig. 5), Forest-Fire, and the real networks (Fig. 6), show only little variation among the backboning algorithms. This happens because all backboning methods pick high-probability edges as the constituent edges of the backbones (or, in the case of the Monte-Carlo (MC) method, high-probability edges are more likely to be picked). As a result, the majority of the edges in the backbones are common: our experiments show that between 60 and 80% of the edges are the same across all backboning methods.

Fig. 6 Impact of density and backboning methods on sparsification - real data: Relative entropy (column one), mean relative error of expected degree (column two) and mean relative error of expected ego betweenness (column three) after sparsification of a-c the brain network with α = 0.55, d-f the FriendFeed network with α = 0.1, and g-i the Enron network with α = 0.65. In the MST/MC backboning method α′ = 0.5α

Performance
In this section, we evaluate the performance of the proposed sparsification methods in terms of computational time ("Time complexity" section), the required number of samples ("Number of samples" section), and precision ("Precision" section).

Time complexity
The computational complexities of a node's expected degree and approximated ego betweenness are O(L) and O(L²), respectively, where L is the number of edges incident to that node. Therefore, sparsification based on approximated ego betweenness is computationally more expensive than sparsification based on expected degree. Figure 7 shows the time spent sparsifying the Erdős-Rényi, brain, and FriendFeed networks depending on the backboning method. As expected, Gradient-Descent (GD) and Expectation-Maximization (EM) with approximated ego betweenness (btw) take more time than GD and EM with expected degree (deg). However, EM (deg) takes more time than GD (btw). While EM (deg) is slower than GD (btw), Fig. 5 demonstrates that GD (btw) outperforms EM (deg) in reducing entropy. Figure 5c shows that if we use the (btw) algorithms instead of the (deg) algorithms, the MRE (btw) decreases from around 0.2 to around 0.1, while the MRE (deg) in Fig. 5b increases from around 0.1 to around 0.12. This pattern can be seen for the other networks in Figs. 5 and 6.

Number of samples
In this section, we determine the number of samples required to evaluate measures on sparsified graphs. To estimate the value of measure M, we start with $N_1 = 100$ samples and gradually increase the number of samples until the mean of the measure converges, i.e., $|M_{N_i} - M_{N_{i-1}}| < \tau_{error}$, where $M_{N_i}$ is the mean value of measure M over $N_i$ samples. We examine betweenness, normalized closeness (harmonic closeness; Marchiori and Latora 2000), and eigenvector centrality. Figure 8 shows that the measures computed on all sparsified graphs converge with fewer samples than on the original graph. This happens because sparsification decreases the graph's entropy and, consequently, the variance of the measures; as a result, fewer samples are needed than for the original graph. Figure 8 shows that measures converge with fewer samples if the networks are sparsified with Gradient-Descent (btw). Eigenvector centrality, however, converges faster when the Erdős-Rényi network has been sparsified with the Expectation-Maximization (btw) method. For the brain network, all measures converge faster in the graphs sparsified with GD (btw) and EM (deg). Note, however, that Gradient-Descent (btw) takes less time than Expectation-Maximization (deg) for the brain network (see Fig. 7).
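The stopping rule can be sketched as follows. The text does not specify how the number of samples is increased; this sketch assumes a fixed step of 100 samples and an upper limit `n_max`, both hypothetical parameters, and `sample_measure` is a placeholder for computing a measure on one sampled possible world. A zero-variance measure stops after the second round, which mirrors why low-entropy graphs need fewer samples.

```python
import random


def samples_until_convergence(sample_measure, tau, n_start=100, n_step=100, n_max=100_000, seed=0):
    """Grow the sample until the running mean changes by less than tau between two rounds.

    sample_measure(rng) returns the measure computed on one sampled possible world.
    Returns (number of samples used, estimated mean).
    """
    rng = random.Random(seed)
    values = [sample_measure(rng) for _ in range(n_start)]
    mean = sum(values) / len(values)
    while len(values) < n_max:
        values.extend(sample_measure(rng) for _ in range(n_step))  # fixed step (assumption)
        new_mean = sum(values) / len(values)
        if abs(new_mean - mean) < tau:  # |M_{N_i} - M_{N_{i-1}}| < tau_error
            return len(values), new_mean
        mean = new_mean
    return len(values), mean
```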

Precision
To examine the precision of measures on sparsified graphs, we first compute the measures over 50000 samples of the original graph and treat them as the actual values of those measures (although these are estimates, calculating the exact measures is computationally prohibitive). Then, we compute the same measures over 1000 samples of each sparsified graph. Finally, we quantify the precision of the measures on the sparsified graphs by the MRE between these two values.
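The MRE used throughout the experiments can be computed as below. This is a sketch assuming per-node values stored in dicts; nodes whose reference value is 0 are skipped because the relative error is undefined for them (the text does not say how such nodes are treated).

```python
def mean_relative_error(reference, estimate):
    """MRE = mean over nodes of |estimate - reference| / |reference|.

    reference: {node: value} from the original graph (e.g. 50000 samples).
    estimate: {node: value} from the sparsified graph (e.g. 1000 samples).
    Nodes with a reference value of 0 are skipped; returns 0.0 if every node is skipped.
    """
    errors = [abs(estimate[v] - ref) / abs(ref) for v, ref in reference.items() if ref != 0]
    return sum(errors) / len(errors) if errors else 0.0
```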
Fig. 8 Required number of samples to calculate measures over sparsified graphs with errors lower than a specific threshold: a-c the original graph is Erdős-Rényi with ρ = 0.1, the sparsification ratio is α = 0.45, and the backboning method is the maximum spanning tree method; d-f the original graph is the brain network, the sparsification ratio is α = 0.55, and the backboning method is the noise corrected method

As mentioned in the introduction, the majority of measures on probabilistic networks are represented as probability distributions. One of the most fundamental measures is the shortest path length distribution between a pair of nodes. Therefore, instead of comparing the mean values of the shortest path length distributions between two nodes, we calculate the distance between the two distributions. To do so, we require a method that calculates the minimum change needed to convert one distribution into another (Parchas et al. 2018). Earth mover's distance ($D_{em}$) is an appropriate option for this purpose (Rubner et al. 2000).
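For one-dimensional distributions such as shortest path lengths, the earth mover's distance reduces to the area between the two cumulative distribution functions. A minimal sketch, assuming both distributions are given as dicts mapping finite path lengths to probabilities with equal total mass; how unreachable node pairs are handled is not stated in the text and is left out. SciPy's `scipy.stats.wasserstein_distance` computes the same one-dimensional quantity from values and weights.

```python
def earth_movers_distance_1d(p, q):
    """Earth mover's distance between two distributions on the real line.

    p, q: dicts mapping a value (e.g. a shortest path length) to its probability;
    both must have the same total mass. In 1-D the EMD is the area between the CDFs.
    """
    support = sorted(set(p) | set(q))
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for x, x_next in zip(support, support[1:]):
        cdf_p += p.get(x, 0.0)
        cdf_q += q.get(x, 0.0)
        distance += abs(cdf_p - cdf_q) * (x_next - x)
    return distance
```

For example, moving all mass from path length 1 to path length 2 costs 1, and splitting a pair's mass equally between lengths 1 and 3 is at distance 1 from a pair that is always at length 2.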
Rows 1 and 2 in Fig. 9 represent the Erdős-Rényi and brain networks, respectively. The first column in Fig. 9 shows the earth mover's distance for graphs sparsified with various values of α. All methods show a similar precision in terms of earth mover's distance. For small α, the earth mover's distance is lower if networks are sparsified with the Expectation-Maximization (btw) method. Similarly, betweenness can be estimated with lower errors if the networks are sparsified with the Gradient-Descent (btw) and Expectation-Maximization (btw) methods (see Fig. 9).
Gradient-Descent (btw) and Expectation-Maximization (btw) are likewise good methods for sparsifying networks if we aim to estimate nodes' closeness with fewer samples. However, Fig. 9f shows that Gradient-Descent (deg) outperforms the other methods for high values of α. This can be explained by the often high correlation between closeness and degree in networks: as these two measures are highly correlated, sparsifying a graph while preserving expected degree also tends to preserve expected closeness.

Fig. 9 Error of calculation of measures over sparsified graphs with 1000 samples: In all figures the y-axis shows the difference between the value of measures in the original graphs and the estimated value (over 1000 samples) in the sparsified graphs. The x-axis shows sparsification ratios. a-c Erdős-Rényi network with density ρ = 0.1 and MST/MC backbone (α′ = 0.5α), and d-f the brain network with noise corrected backbone

Discussion and conclusion
In this paper, we generalized the definition of probabilistic network sparsification proposed in Parchas et al. (2018). Our generalized definition is more inclusive and can incorporate any topological measure in the process of sparsification. In particular, we examined the estimated expected ego betweenness and derived equations that describe the change of nodes' discrepancies as a function of edge probabilities.
However, we should note that using other topological measures naively may lead to scalability issues. A major challenge in probabilistic network analysis is that all measures are represented as probability distributions, whose calculation is expensive. Therefore, developing a closed-form expression that calculates or estimates each measure may require more research before it can be used in an efficient sparsification process. Among all measures, expected degree can be calculated exactly in O(L) and has been used for sparsification in Parchas et al. (2018), and ego betweenness can be estimated in O(L²), as done in this paper.
Therefore, using other measures in sparsification algorithms requires (1) calculating or estimating the measure with an efficient time complexity, and (2) finding how the measure changes when the edges' probabilities change.
To evaluate the proposed sparsification methods, we examined various backboning methods (iterative MST, noise corrected, and Monte-Carlo sampling) on multiple synthetic and real datasets. Our experimental results show that the denser a graph is, the better the sparsified graphs it yields, regardless of which sparsification method is used. Better here means lower discrepancies and lower MRE when we compare measures on the original and sparsified graphs. This can be explained by the fact that probabilities are real numbers between 0 and 1, which limits how much edge probabilities can vary. More precisely, we cannot increase edge probabilities above 1 to compensate for positive discrepancies, or decrease them below 0 to compensate for negative discrepancies. If the graph is too sparse, the sparsification process may result in all edges having extreme probabilities, which can no longer be updated in subsequent iterations.
Moreover, note that the edge probabilities of the synthetic datasets used in the experiments reported in this paper are uniformly distributed between 0 and 1. We have repeated our experiments with non-skewed (Normal) and skewed (Beta) distributions. While for distributions with a mean lower than 0.5 we have not observed a significant difference from the current experiments, for distributions with higher mean values we have observed lower entropy for the sparsified graphs as well as higher mean relative errors. These errors increase further for small values of α. The intuition behind this finding is that when edge probabilities are high on average and we remove (1 − α)|E| edges in the backboning stage, discrepancies are high compared to the case where edge probabilities are low on average. Therefore, the Gradient-Descent and Expectation-Maximization algorithms have to minimize discrepancies by increasing the probabilities of the edges remaining in the sparsified graph. However, as those edges' probabilities are initially high, (1) their probabilities will increase to one, and as a result the entropy of the sparsified network will be low (or even zero, in which case the sparsified graph is a deterministic graph), and (2) the minimum discrepancies remain high, as edge probabilities cannot exceed one, and as a result the mean relative errors will be high (see Figs. 10 and 11 in Appendix 2). This can considerably affect the estimation of measures such as closeness and shortest path length. For example, if the resulting sparsified graph is deterministic and we aim to estimate the reliability between two nodes, it will be estimated as either 0 or 1, which is not a reasonable estimate.
Finally, our experiments show that no sparsification method is consistently outperforming the others. While one method may accurately preserve shortest path length distributions on one network, it does not necessarily have satisfying results for other measures.

Fig. 11 Beta distributions: relative entropy (column one), mean relative error of expected degree (column two) and mean relative error of expected ego betweenness (column three) of sparsified Erdős-Rényi graphs with 500 nodes and density ρ = 0.5. Edge probabilities are assigned according to a Beta distribution with (row 1) B(1, 4) and mean = 0.2, and (row 2) B(7, 3) and mean = 0.7