Abstract
“[A]llain Gersten, Hopfen, und Wasser” (“only barley, hops, and water”), 1516 Reinheitsgebot
We present Bavarian, a collection of sampling-based algorithms for approximating the Betweenness Centrality (BC) of all vertices in a graph. Our algorithms use Monte-Carlo Empirical Rademacher Averages (MCERAs), a concept from statistical learning theory, to efficiently compute tight bounds on the maximum deviation of the estimates from the exact values. The MCERAs provide a sample-dependent approximation guarantee much stronger than the state of the art, thanks to their use of variance-aware probabilistic tail bounds. The flexibility of the MCERAs allows us to introduce a unifying framework that can be instantiated with existing sampling-based estimators of BC, thus allowing a fair comparison between them, decoupled from the sample-complexity results with which they were originally introduced. Additionally, we prove novel sample-complexity results showing that, for all estimators, the sample size sufficient to achieve a desired approximation guarantee depends on the vertex-diameter of the graph, an easy-to-bound characteristic quantity. We also present progressive-sampling algorithms and extensions to other centrality measures, such as percolation centrality. Our extensive experimental evaluation of Bavarian shows the improvement over the state of the art made possible by the MCERAs (a 2–4× reduction in the error bound), and it allows us to assess the different trade-offs between sample size and accuracy guarantees offered by the different estimators.
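To make the central quantity concrete, the following is a minimal sketch of the textbook n-trial MCERA, (1/n) Σ_j sup_f (1/m) Σ_i σ_{j,i} f(s_i), where the σ_{j,i} are independent uniform ±1 (Rademacher) variables. It is not the Bavarian implementation: the function name `mcera`, the `values` layout (one row per function, here assumed to correspond to one vertex, and one column per sample drawn by the chosen estimator), and the toy numbers are illustrative assumptions, and the variance-aware tail bounds that turn the MCERA into a deviation bound are not shown.

```python
import random


def mcera(values, n_trials, rng):
    """Sketch of an n-trial Monte-Carlo Empirical Rademacher Average.

    values[f][i] is the value of function f on the i-th sample
    (assumed layout: one row per vertex, one column per sample).
    """
    m = len(values[0])  # sample size
    total = 0.0
    for _ in range(n_trials):
        # One vector of independent Rademacher variables, uniform on {-1, +1}.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the family of the sigma-weighted sample average.
        total += max(
            sum(s * v for s, v in zip(sigma, row)) / m
            for row in values
        )
    # Average the suprema over the n Monte-Carlo trials.
    return total / n_trials


if __name__ == "__main__":
    # Hypothetical toy data: 3 functions evaluated on 4 samples, values in [0, 1].
    toy_values = [
        [0.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0],
        [0.5, 0.5, 0.5, 0.5],
    ]
    print(mcera(toy_values, n_trials=50, rng=random.Random(1)))
```

Because the supremum is taken on the observed sample, the resulting value depends on the data actually drawn, which is what makes the guarantee sample-dependent rather than worst-case.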