Journal of Graph Algorithms and Applications Fast Approximation of Centrality

Social scientists use graphs to model group activities in social networks. An important property in this context is the centrality of a vertex: the inverse of the average distance to each other vertex. We describe a randomized approximation algorithm for centrality in weighted graphs. For graphs exhibiting the small world phenomenon, our method estimates the centrality of all vertices with high probability within a $(1 + \epsilon)$ factor in $\tilde{O}(m)$ time. A preliminary version of this paper appeared in the proceedings of SODA 2001.


Introduction
In social network analysis, the vertices of a graph represent agents in a group and the edges represent relationships, such as communication or friendship. The idea of applying graph theory to analyze the connection between structural centrality and group processes was introduced by Bavelas [4]. Various measures of centrality [7,15,16] have been proposed for analyzing communication activity, control, or independence within a social network.
We are particularly interested in closeness centrality [5,6,25], which is used to measure the independence and efficiency of an agent [15,16]. Beauchamp [6] defined the closeness centrality of agent $a_j$ as
\[ c_j = \frac{n-1}{\sum_{i=1}^{n} d(i,j)}, \]
where $d(i, j)$ is the distance between agents $i$ and $j$. We are interested in computing centrality values for all agents. To compute the centrality for each agent, it is sufficient to solve the all-pairs shortest-paths (APSP) problem. No faster exact method is known.
The APSP problem can be solved by various algorithms in time $O(nm + n^2 \log n)$ [14,20], $O(n^3)$ [13], or more quickly using fast matrix multiplication techniques [2,11,26,28], where $n$ is the number of vertices and $m$ is the number of edges in a graph. Faster specialized algorithms are known for graph classes such as interval graphs [3,9,24] and chordal graphs [8,18], and the APSP problem can be solved in average-case time $O(n^2 \log n)$ for various types of random graph [10,17,21,23]. Because these results are slow, specialized, or (with fast matrix multiplication) complicated and impractical, and because recent applications of social network theory to the internet may involve graphs with millions of vertices, it is of interest to consider faster approximations. Aingworth et al. [1] proposed an algorithm with an additive error of 2 for the unweighted APSP problem that runs in time $O(n^{2.5} \sqrt{\log n})$. Dor et al. [12] improved the time to $\tilde{O}(n^{7/3})$. However, this is still slow, and an additive error of 2 does not provide a good approximation when the distances are small.
In this paper, we consider a method for fast approximation of centrality. We apply a random sampling technique to approximate the inverse centrality of all vertices in a weighted graph to within an additive error of $\epsilon\Delta$ with high probability in time $O(\frac{\log n}{\epsilon^2}(n \log n + m))$, where $\epsilon > 0$ and $\Delta$ is the diameter of the graph.
It has been observed empirically that many social networks exhibit the small world phenomenon [22,27]. That is, their diameter is $O(\log n)$ instead of $O(n)$. For such networks, our method provides a near-linear time $(1+\epsilon)$-approximation to the centrality of all vertices.
We are given a graph $G(V, E)$ with $n$ vertices and $m$ edges. The distance $d(u, v)$ between two vertices $u$ and $v$ is the length of the shortest path between them. The diameter $\Delta$ of a graph $G$ is defined as $\max_{u,v \in V} d(u, v)$. We define the centrality $c_v$ of vertex $v$ as follows:
\[ c_v = \frac{n-1}{\sum_{u \in V} d(u,v)}. \]
If $G$ is not connected, then $c_v = 0$. Hence we will assume $G$ is connected.
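As a point of comparison for the approximation developed below, the definition can be evaluated exactly from all-pairs distances. The following is a minimal sketch, assuming an undirected graph with positive edge weights and at least two vertices; the function name `exact_centrality` and the edge-list input format are illustrative choices, and the cubic-time Floyd-Warshall recurrence is used only for brevity.

```python
import math

def exact_centrality(vertices, edges):
    """Exact closeness centrality c_v = (n - 1) / sum_u d(u, v).

    vertices: list of hashable vertex labels (assumed n >= 2).
    edges: iterable of (u, v, weight) triples of an undirected graph
           with positive weights (an assumption of this sketch).
    """
    n = len(vertices)
    # dist[u][v] starts as 0 on the diagonal and infinity elsewhere.
    dist = {u: {v: (0.0 if u == v else math.inf) for v in vertices}
            for u in vertices}
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
    # Floyd-Warshall: allow each vertex in turn as an intermediate stop.
    for mid in vertices:
        for u in vertices:
            for v in vertices:
                via = dist[u][mid] + dist[mid][v]
                if via < dist[u][v]:
                    dist[u][v] = via
    centrality = {}
    for v in vertices:
        total = sum(dist[u][v] for u in vertices)
        # An unreachable vertex makes the sum infinite; c_v = 0 by convention.
        centrality[v] = 0.0 if math.isinf(total) else (n - 1) / total
    return centrality
```

On the unit-weight path with vertices 0, 1, 2, the middle vertex has total distance 2 and the end vertices have total distance 3, so the middle vertex is the most central.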

The Algorithm
We now describe a randomized approximation algorithm RAND for estimating centrality. RAND randomly chooses $k$ sample vertices and computes single-source shortest-paths (SSSP) from each sample vertex to all other vertices. The estimated centrality of a vertex is defined in terms of the average distance to the sample vertices.

Algorithm RAND:
1. Let $k$ be the number of iterations needed to obtain the desired error bound.
2. In iteration $i$, pick vertex $v_i$ uniformly at random from $G$ and solve the SSSP problem with $v_i$ as the source.
3. Let
\[ \hat{c}_u = \frac{1}{\sum_{i=1}^{k} \frac{n \, d(v_i, u)}{k(n-1)}} \]
be the centrality estimator for vertex $u$.
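The steps of RAND can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the graph is assumed undirected and connected with positive weights, stored as a dictionary mapping each vertex to a list of `(neighbor, weight)` pairs, with vertex labels that can be compared to one another; the names `sssp` and `rand_inverse_centrality` are hypothetical; and Dijkstra's algorithm with a binary heap stands in for whichever SSSP routine is preferred. The function returns the inverse estimates $1/\hat{c}_u$, since a vertex whose only samples are itself has estimate 0 and therefore an unbounded $\hat{c}_u$.

```python
import heapq
import random

def sssp(adj, source):
    """Dijkstra's algorithm with a binary heap (a stand-in SSSP solver)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry, a shorter path was already found
        for w, weight in adj[v]:
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

def rand_inverse_centrality(adj, k, rng=None):
    """Estimate 1/c_u for every vertex u from k uniformly sampled sources."""
    if rng is None:
        rng = random.Random()
    vertices = list(adj)
    n = len(vertices)
    total = {u: 0.0 for u in vertices}
    for _ in range(k):
        v_i = rng.choice(vertices)  # uniform sample, with replacement
        dist = sssp(adj, v_i)       # distances from v_i to all vertices
        for u in vertices:
            total[u] += dist[u]
    # 1/c_hat_u = sum_i n * d(v_i, u) / (k * (n - 1))
    return {u: n * total[u] / (k * (n - 1)) for u in vertices}
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing the estimates against exact values on small graphs.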
It is not hard to see that, for any $k$ and $u$, the expected value of $1/\hat{c}_u$ is equal to $1/c_u$.
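Proof: Each vertex has equal probability $1/n$ of being picked in each round. By linearity of expectation, the expected value of $1/\hat{c}_u$ is
\[ E\left[\frac{1}{\hat{c}_u}\right] = \frac{n}{k(n-1)} \sum_{i=1}^{k} E[d(v_i, u)] = \frac{n}{k(n-1)} \cdot k \cdot \frac{1}{n} \sum_{v \in V} d(v, u) = \frac{\sum_{v \in V} d(v, u)}{n-1} = \frac{1}{c_u}. \]
□

In 1963, Hoeffding [19] gave the following theorem on probability bounds for sums of independent random variables.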

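Lemma 2 (Hoeffding [19]) If $x_1, x_2, \ldots, x_k$ are independent, $a_i \le x_i \le b_i$, and $\mu = E[\sum_{i=1}^{k} x_i / k]$ is the expected mean, then for $\xi > 0$,
\[ \Pr\left\{ \left| \frac{\sum_{i=1}^{k} x_i}{k} - \mu \right| \ge \xi \right\} \le 2 e^{-2k^2\xi^2 / \sum_{i=1}^{k} (b_i - a_i)^2}. \]

Theorem 3 Let $G$ be a connected graph with $n$ vertices and diameter $\Delta$. With high probability, algorithm RAND computes the inverse centrality estimator $1/\hat{c}_u$ to within $\xi = \epsilon\Delta$ of the inverse centrality $1/c_u$ for all vertices $u$ of $G$, using $\Theta(\frac{\log n}{\epsilon^2})$ samples, for $\epsilon > 0$.

Proof: We need to show that, with high probability, the error in estimating the inverse centrality of every vertex $u$ is at most $\xi$. This is done by applying Hoeffding's bound with $x_i = \frac{n \, d(v_i, u)}{n-1}$, $\mu = 1/c_u$, $a_i = 0$, and $b_i = \frac{n\Delta}{n-1}$. We know $E[1/\hat{c}_u] = 1/c_u$. Thus the probability that the difference between the estimated inverse centrality $1/\hat{c}_u$ and the actual inverse centrality $1/c_u$ is more than $\xi$ is
\[ \Pr\left\{ \left| \frac{1}{\hat{c}_u} - \frac{1}{c_u} \right| \ge \xi \right\} \le 2 e^{-2k^2\xi^2 / \left( k \left( \frac{n\Delta}{n-1} \right)^2 \right)} = 2 e^{-\Omega(k\xi^2/\Delta^2)}. \]
For $\xi = \epsilon\Delta$, using $k = \alpha \cdot \frac{\log n}{\epsilon^2}$ samples, $\alpha \ge 1$, will cause the probability of error at any vertex to be bounded above by e.g. $1/n^2$, giving at most $1/n$ probability of having greater than $\epsilon\Delta$ error anywhere in the graph. □

Fredman and Tarjan [14] gave an algorithm for solving the SSSP problem in time $O(n \log n + m)$. Thus, the total running time of our algorithm is $O(k \cdot m)$ for unweighted graphs and $O(k(n \log n + m))$ for weighted graphs. Thus, for $k = \Theta(\frac{\log n}{\epsilon^2})$, we have an $O(\frac{\log n}{\epsilon^2}(n \log n + m))$ algorithm for approximation of the inverse centrality within an additive error of $\epsilon\Delta$ with high probability.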
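To get a feel for the sample sizes involved, the Hoeffding bound can be evaluated numerically. The sketch below assumes the natural logarithm in $k = \alpha \log n / \epsilon^2$ and rounds up to an integer; the function names are illustrative. Substituting $\xi = \epsilon\Delta$ and $b_i - a_i = n\Delta/(n-1)$ into Hoeffding's bound makes $\Delta$ cancel, so the per-vertex failure probability depends only on $k$, $\epsilon$, and $n$.

```python
import math

def sample_count(n, eps, alpha=1.0):
    """Number of samples k = ceil(alpha * ln(n) / eps^2)."""
    return math.ceil(alpha * math.log(n) / eps ** 2)

def hoeffding_failure_bound(k, eps, n):
    """Hoeffding upper bound on Pr[|1/c_hat_u - 1/c_u| >= eps * Delta]
    for a single vertex u: 2 * exp(-2 * k * eps^2 * ((n - 1) / n)^2)."""
    return 2.0 * math.exp(-2.0 * k * eps ** 2 * ((n - 1) / n) ** 2)

def union_failure_bound(k, eps, n):
    """Union bound over all n vertices, capped at 1."""
    return min(1.0, n * hoeffding_failure_bound(k, eps, n))
```

For example, `sample_count(10**6, 0.1)` gives the number of SSSP computations this choice of $k$ would need for a million-vertex graph at $\epsilon = 0.1$, and `union_failure_bound` with that $k$ bounds the chance that any vertex exceeds the $\epsilon\Delta$ error.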

Conclusion
We gave an $O(\frac{\log n}{\epsilon^2}(n \log n + m))$ randomized algorithm with additive error of $\epsilon\Delta$ for approximating the inverse centrality of weighted graphs. Many graph classes, such as unweighted paths, cycles, and balanced trees, have inverse centrality proportional to $\Delta$. More interestingly, Milgram [22] showed that many social networks have bounded diameter and inverse centrality. For such networks, our method provides a near-linear time $(1 + \epsilon)$-approximation to the centrality of all vertices.