research-article

Bavarian: Betweenness Centrality Approximation with Variance-aware Rademacher Averages

Published: 06 March 2023

Abstract

“[A]llain Gersten, Hopfen, und Wasser” — 1516 Reinheitsgebot

We present Bavarian, a collection of sampling-based algorithms for approximating the Betweenness Centrality (BC) of all vertices in a graph. Our algorithms use Monte-Carlo Empirical Rademacher Averages (MCERAs), a concept from statistical learning theory, to efficiently compute tight bounds on the maximum deviation of the estimates from the exact values. The MCERAs provide a sample-dependent approximation guarantee much stronger than the state of the art, thanks to their use of variance-aware probabilistic tail bounds. The flexibility of the MCERAs allows us to introduce a unifying framework that can be instantiated with existing sampling-based estimators of BC, thus allowing a fair comparison between them, decoupled from the sample-complexity results with which they were originally introduced. Additionally, we prove novel sample-complexity results showing that, for all estimators, the sample size sufficient to achieve a desired approximation guarantee depends on the vertex-diameter of the graph, an easy-to-bound characteristic quantity. We also present progressive-sampling algorithms and extensions to other centrality measures, such as percolation centrality. Our extensive experimental evaluation of Bavarian shows the improvement over the state of the art made possible by the MCERAs (a 2–4× reduction in the error bound), and it allows us to assess the different trade-offs between sample size and accuracy guarantees offered by the different estimators.
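To make the central quantity concrete, the following is a minimal sketch of the textbook n-trial MCERA for a finite family of functions: for each of n independent draws of Rademacher signs (uniform in {-1, +1}), take the largest sign-weighted sample average over the family, then average those maxima over the n draws. It is not the Bavarian implementation. The function name `mcera`, the `values` layout (one row per function, one column per sample), and the toy inputs are assumptions made for illustration; in the BC setting each row would plausibly hold one vertex's per-sample estimator values.

```python
import random


def mcera(values, n_trials, rng):
    """Monte-Carlo Empirical Rademacher Average of a finite function family.

    values[f][i] is the value of function f on the i-th sample (hypothetical
    layout). n_trials is the number of independent Rademacher sign vectors.
    """
    m = len(values[0])  # sample size
    total = 0.0
    for _ in range(n_trials):
        # One vector of m independent Rademacher signs.
        signs = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the family of the sign-weighted sample average.
        total += max(
            sum(s * v for s, v in zip(signs, row)) / m for row in values
        )
    # Average of the suprema over the Monte-Carlo trials.
    return total / n_trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    toy_values = [
        [0.2, 0.0, 1.0, 0.0],  # toy per-sample values for one function
        [0.0, 0.5, 0.1, 0.0],  # toy per-sample values for another
    ]
    print(mcera(toy_values, n_trials=100, rng=rng))
```

The returned value is only one ingredient: turning it into a bound on the maximum deviation requires combining it with probabilistic tail bounds, which this sketch does not attempt.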


Published in

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 6
July 2023
392 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3582889

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 6 March 2023
• Online AM: 20 December 2022
• Accepted: 9 December 2022
• Revised: 24 September 2022
• Received: 2 December 2021
