Parallel Computing

Volume 47, August 2015, Pages 3-18

Incremental closeness centrality in distributed memory

https://doi.org/10.1016/j.parco.2015.01.003

Highlights

  • We propose a distributed memory framework for incremental closeness centrality computation.

  • We parallelize different components of the framework for a faster solution.

  • Vectorization is applied to make the computation faster.

  • All the algorithms and techniques are experimentally validated.

  • Our framework proves to be practical for real-time scenarios.

Abstract

Networks are commonly used to model traffic patterns, social interactions, or web pages. The vertices in a network do not all possess the same characteristics: some vertices are naturally more connected, and some vertices can be more important. Closeness centrality (CC) is a global metric that quantifies how important a given vertex is in the network. When the network is dynamic and keeps changing, the relative importance of the vertices also changes. The complexity of the best known algorithm for computing the CC scores makes it impractical to recompute them from scratch after each modification. In this paper, we propose Streamer, a distributed memory framework for incrementally maintaining the closeness centrality scores of a network upon changes. It leverages pipelined and replicated parallelism as well as SpMM-based BFSs, and it takes NUMA effects into account. It makes maintaining the closeness centrality values of real-life networks with millions of interactions significantly faster and obtains almost linear speedups on a cluster of 64 nodes with 8 threads per node.

Introduction

How central is a vertex in a network? Which vertices are more important during an entity dissemination? Centrality metrics have been used to answer such questions. They have been successfully used to carry out analyses for various purposes such as power grid contingency analysis [16], quantifying importance in social networks [23], analysis of covert networks [18], decision/action networks [8], and even finding the best store locations in cities [26]. As networks become large, efficiency becomes a crucial concern in their analysis. The algorithm with the best asymptotic complexity to compute the closeness and betweenness metrics [4] is believed to be asymptotically optimal [17]. Research on fast centrality computation has therefore focused on approximation algorithms [7], [9], [24] and high performance computing techniques [22], [32], [20]. Today, the networks to be analyzed can be quite large, and there is a constant quest for faster techniques that help perform centrality-based analyses.

Many of today’s networks are dynamic, and for such networks maintaining the exact centrality scores is a challenging problem that has been studied in the literature [10], [19], [27]. The problem can also arise for applications involving static networks, such as power grid contingency analysis and robustness evaluation of a network. The findings of such analyses and evaluations can be very useful for preparing and taking proactive measures, for instance when a natural risk or a possible adversarial attack could yield undesirable changes to the network topology in the future. Similarly, in some applications, one might be interested in finding the minimal topology modifications on a network that set the centrality scores in a controlled manner (applications include speeding up or containing entity dissemination, and making the network immune to adversarial attacks).

Offline closeness centrality (CC) computation can be expensive for large-scale networks. Yet, one could hope that incremental graph modifications can be handled inexpensively. Unfortunately, as Fig. 1 shows, the effect of a local topology modification can be global. In a previous study, we proposed a sequential incremental closeness centrality algorithm which is orders of magnitude faster than the best offline algorithm [27]. Still, that algorithm was not fast enough to be used in practice, and we subsequently proposed Streamer [29] to parallelize these incremental algorithms. In this paper, we present an improved version of Streamer that efficiently parallelizes the incremental CC computation on high-performance clusters.

The best available algorithm for the offline centrality computation is pleasingly parallel (and scalable if enough memory is available) since it involves n independent executions of the single-source shortest path (SSSP) algorithm [4]. In a naive distributed framework for the offline case, one can distribute the SSSPs to the nodes and gather their results. Here the computation is static, i.e., when the graph changes, the previous results are ignored and the same n SSSPs are re-executed. In the online approach, on the other hand, graph modifications can arrive at any time, even while the centrality scores for a previous modification are still being computed. Furthermore, the scores which need to be recomputed (the SSSPs that need to be executed) change depending on the modification of the graph. Finding these SSSPs and distributing them to the nodes is not a straightforward task: to do so, the incremental algorithms maintain complex information such as the biconnected component decomposition of the current graph [27], and after each edge insertion/deletion this information needs to be updated. The online approach therefore involves several (synchronous and asynchronous) blocks, and obtaining an efficient parallelization of the incremental algorithm is not trivial. As our experiments will show, the dataflow programming model and pipelined parallelism are very useful to achieve a significant overlap among these computation/communication blocks and yield a scalable solution for the incremental centrality computation.
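For reference, the offline baseline that such a framework parallelizes can be sketched as follows: one breadth-first search per source vertex on an unweighted graph, with the closeness of a vertex taken as the inverse of the sum of its distances to the vertices it can reach. This is a minimal illustrative sketch (the adjacency-list representation and helper names are ours, not taken from the paper); the n iterations of the loop in offline_closeness are the independent SSSPs that a naive distributed framework would spread over the nodes.

```python
from collections import deque

def bfs_farness(adj, source):
    """Sum of shortest-path distances from `source` to every vertex it reaches."""
    dist = {source: 0}
    far = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                far += dist[w]
                frontier.append(w)
    return far

def offline_closeness(adj):
    """One independent BFS per vertex: the pleasingly parallel offline baseline."""
    cc = {}
    for v in adj:
        far = bfs_farness(adj, v)
        cc[v] = 1.0 / far if far > 0 else 0.0
    return cc

# Toy example: a path graph 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(offline_closeness(adj))  # endpoints have farness 6, inner vertices 4
```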

In this paper, we extend Streamer, which we introduced in [29]. Our contributions in [29] can be summarized as follows:

  • 1.

    We proposed the first distributed-memory framework Streamer for the incremental closeness centrality computation problem which employs pipelined parallelism to achieve computation–computation and computation–communication overlap [29].

  • 2.

    The worker nodes we used in the experiments have 8 cores. In addition to the distributed-memory parallelization, we also leveraged shared-memory parallelization and took NUMA effects into account [29].

  • 3.

    The framework scales linearly: when 63 worker nodes (8 cores/node) are used, Streamer obtains almost linear speedups compared to a single worker node-single thread execution [29].

In addition to the above contributions, we introduce the following new extensions in this paper:

  • 1.

    The Streamer framework is modular, which makes it easy to extend. When the number of nodes used increases, the computation inevitably hits a bottleneck at the extremities of the analysis pipeline, which are not parallel. In [29], this effect appeared on one of the graphs (web-NotreDame). Here, we show how this part of the computation can be made parallel by leveraging the modularity of the dataflow middleware.

  • 2.

    Using an SpMM-based BFS formulation (see the sketch after this list), we significantly improve the performance of the incremental CC computation and show that the dataflow programming model makes Streamer highly modular and easy to enhance with novel algorithmic techniques.

  • 3.

    These new techniques provide an improvement by a factor of 2.2 to 9.3 over the techniques presented in [29].
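The SpMM-based BFS idea can be illustrated with a short sketch: a batch of BFSs is kept as the columns of a sparse matrix, and one sparse matrix-matrix multiply with the adjacency matrix advances every search by one level. The code below is only an illustration under our own assumptions (SciPy for the sparse kernels, farness accumulated per source); it is not the implementation used inside Streamer.

```python
import numpy as np
import scipy.sparse as sp

def batched_bfs_farness(A, sources):
    """Level-synchronous BFS from several sources at once via SpMM.

    A       : n x n sparse adjacency matrix (CSR, 0/1 entries, symmetric).
    sources : source vertex ids, one BFS per column of the frontier matrix.
    Returns : farness (sum of distances to reached vertices) of each source.
    """
    n, k = A.shape[0], len(sources)
    visited = np.zeros((n, k), dtype=bool)
    visited[sources, np.arange(k)] = True
    frontier = sp.csr_matrix(visited.astype(np.int32))  # level-0 frontiers
    farness = np.zeros(k, dtype=np.int64)
    level = 0
    while frontier.nnz > 0:
        level += 1
        # One SpMM advances all k searches by one BFS level.
        reached = (A @ frontier).toarray() > 0
        new = reached & ~visited
        visited |= new
        farness += level * new.sum(axis=0)
        frontier = sp.csr_matrix(new.astype(np.int32))
    return farness

# Toy example: path graph 0-1-2-3, BFS batched from sources 0 and 3.
rows, cols = [0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]
A = sp.csr_matrix((np.ones(len(rows), dtype=np.int32), (rows, cols)), shape=(4, 4))
print(batched_bfs_farness(A, [0, 3]))  # both endpoints have farness 1+2+3 = 6
```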

The paper is organized as follows: Section 2 introduces the notation, formally defines the closeness centrality metric, and describes the incremental approach in [27]. Section 3 presents DataCutter [3], our in-house distributed memory dataflow middleware leveraged in this work. Section 4 describes the proposed distributed framework for incremental centrality computations in detail. The experimental analysis is given in Section 5, and Section 6 concludes the paper.

Section snippets

Incremental closeness centrality

Let G=(V,E) be a network modeled as a simple undirected graph with n=|V| vertices and m=|E| edges, where each node is represented by a vertex in V and a node–node interaction is represented by an edge in E. Let Γ_G(v) be the set of vertices which share an edge with v.

A graph G′=(V′,E′) is a subgraph of G if V′ ⊆ V and E′ ⊆ E. A path is a sequence of vertices such that there exists an edge between consecutive vertices. Two vertices u, v ∈ V are connected if there is a path from u to v. If all vertex
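The snippet is truncated before the metric itself is defined; for readability, the standard definitions the incremental approach of [27] builds on can be stated as follows (a reconstruction in the usual notation, not a quotation from the paper):

```latex
% Shortest-path distance, farness, and closeness centrality.
% By convention, cc(v) is taken as 0 when v cannot reach any other vertex.
\[
  d_G(u,v) = \text{length of a shortest path between } u \text{ and } v, \qquad
  \mathrm{far}(v) = \sum_{\substack{u \in V \\ u \text{ reachable from } v}} d_G(v,u), \qquad
  \mathrm{cc}(v) = \frac{1}{\mathrm{far}(v)}.
\]
```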

DataCutter

Streamer employs DataCutter [3], our in-house dataflow programming framework for distributed memory systems. In DataCutter, the computations are carried out by independent computing elements, called filters, that have different responsibilities and operate on data passing through them. DataCutter follows the component-based programming paradigm which has been used to describe and implement complex applications [11], [12], [13], [29] by way of components – distinct tasks with well-defined

Streamer

Streamer is implemented in the DataCutter framework. We propose to use the four-filter layout shown in Fig. 5. InstanceGenerator is responsible for sending the updates to all the other components. StreamingMaster does the work filtering for each update, explained in Section 2, and generates the workload for the following components. The ComputeCC component executes the actual work and computes the updated CC scores for each incoming update. Aggregator does the necessary adjustments related to identical
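The filter-stream structure behind this layout can be illustrated with a small, self-contained sketch: each filter runs concurrently and consumes and produces items over streams, which is what yields the computation–computation and computation–communication overlap of the pipeline. The names below mirror the four filters above, but the plumbing (threads and queues) and the stubbed per-filter work are our own illustration, not the DataCutter API.

```python
import queue
import threading

def run_filter(work, inbox, outbox):
    """Generic filter loop: consume items, transform them, forward downstream."""
    for item in iter(inbox.get, None):        # None marks the end of the stream
        result = work(item)
        if outbox is not None and result is not None:
            outbox.put(result)
    if outbox is not None:
        outbox.put(None)                      # propagate end-of-stream

# Streams connecting the four filters of the layout in Fig. 5.
updates, filtered, jobs, results = (queue.Queue() for _ in range(4))

# Stubbed per-filter work; the real filters broadcast updates, perform work
# filtering, run the (SpMM-based) BFS kernels, and aggregate the scores.
filters = [
    (lambda u: u,                       updates,  filtered),  # InstanceGenerator
    (lambda u: ("sources to rerun", u), filtered, jobs),      # StreamingMaster
    (lambda j: ("updated cc", j),       jobs,     results),   # ComputeCC (replicated in Streamer)
    (lambda r: print("aggregate", r),   results,  None),      # Aggregator
]
threads = [threading.Thread(target=run_filter, args=f) for f in filters]
for t in threads:
    t.start()

for update in [("insert", 1, 5), ("insert", 2, 7)]:  # hypothetical edge insertions
    updates.put(update)
updates.put(None)
for t in threads:
    t.join()
```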

Experiments

Streamer runs on the owens cluster in the Department of Biomedical Informatics at The Ohio State University. For the experiments, we used all 64 computational nodes, each with dual Intel Xeon E5520 quad-core CPUs (with 2-way Simultaneous Multithreading and 8 MB of L3 cache per processor) and 48 GB of main memory. The nodes are interconnected with 20 Gbps InfiniBand. The algorithms were run on CentOS 6 and compiled with GCC 4.8.1 using the -O3 optimization flag. DataCutter uses an

Conclusion

Maintaining the correctness of a graph analysis is important in today’s dynamic networks. Computing the closeness centrality scores from scratch after each graph modification is prohibitive, and even sequential incremental algorithms are too expensive for networks of practical relevance. In this paper, we proposed Streamer, a distributed memory framework which guarantees the correctness of the CC scores, exploits replicated and pipelined parallelism, and takes the hierarchical architecture of

Acknowledgments

This work was supported in part by NSF Grant OCI-0904809 and Defense Threat Reduction Agency Grant HDTRA1-14-C-0007.

References (33)

  • M.D. Beynon et al., Distributed processing of very large datasets with DataCutter, Parallel Comput. (2001)
  • T.D.R. Hartley et al., Improving performance of adaptive component-based dataflow middleware, Parallel Comput. (2012)
  • V. Agarwal, F. Petrini, D. Pasetto, D.A. Bader, Scalable graph exploration on multicore processors, in: SuperComputing...
  • M. Belgin, G. Back, C.J. Ribbens, Pattern-based sparse matrix representation for memory-efficient SMVM kernels, in:...
  • U. Brandes, A faster algorithm for betweenness centrality, J. Math. Sociol. (2001)
  • A. Buluç et al., The combinatorial BLAS: design, implementation, and applications, Int. J. High Perform. Comput. Appl. (IJHPCA) (2011)
  • A. Buluç, S. Williams, L. Oliker, J. Demmel, Reduced-bandwidth multithreaded algorithms for sparse matrix-vector...
  • S.Y. Chan, I.X.Y. Leung, P. Liò, Fast centrality approximation in modular networks, in: Proc. of the 1st ACM...
  • Ö. Şimşek, A.G. Barto, Skill characterization based on betweenness, in: Proc. of Advances in Neural Information...
  • D. Eppstein, J. Wang, Fast approximation of centrality, in: Proceedings of the Twelfth Annual ACM-SIAM Symposium on...
  • O. Green, R. McColl, D.A. Bader, A fast algorithm for streaming betweenness centrality, in: Proc. of SocialCom,...
  • T.D.R. Hartley, Ü.V. Çatalyürek, A. Ruiz, F. Igual, R. Mayo, M. Ujaldon, Biomedical image analysis on a cooperative...
  • T.D.R. Hartley, A.R. Fasih, C.A. Berdanier, F. Özgüner, Ü.V. Çatalyürek, Investigating the use of GPU-accelerated...
  • J. Hopcroft et al., Algorithm 447: efficient algorithms for graph manipulation, Commun. ACM (1973)
  • Y. Jia et al., Edge vs. node parallelism for graph centrality metrics
  • S. Jin, Z. Huang, Y. Chen, D.G. Chavarría-Miranda, J. Feo, P.C. Wong, A novel application of parallel betweenness...