Bavarian: Betweenness Centrality Approximation with Variance-aware Rademacher Averages

Authors:
Cyrus Cousins, University of Massachusetts Amherst, Amherst, MA (ORCID: 0000-0002-1691-0282)
Chloe Wohlgemuth, Amherst College, Amherst, MA (ORCID: 0000-0003-2353-3646)
Matteo Riondato, Amherst College, Amherst, MA (ORCID: 0000-0003-2523-4420)

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 6, Article No. 78, pp. 1–47. https://doi.org/10.1145/3577021

Published: 06 March 2023


Abstract

“[A]llain Gersten, Hopfen, und Wasser” — 1516 Reinheitsgebot

We present Bavarian, a collection of sampling-based algorithms for approximating the Betweenness Centrality (BC) of all vertices in a graph. Our algorithms use Monte-Carlo Empirical Rademacher Averages (MCERAs), a concept from statistical learning theory, to efficiently compute tight bounds on the maximum deviation of the estimates from the exact values. The MCERAs provide a sample-dependent approximation guarantee much stronger than the state of the art, thanks to their use of variance-aware probabilistic tail bounds. The flexibility of the MCERAs allows us to introduce a unifying framework that can be instantiated with existing sampling-based estimators of BC, thus allowing a fair comparison between them, decoupled from the sample-complexity results with which they were originally introduced. Additionally, we prove novel sample-complexity results showing that, for all estimators, the sample size sufficient to achieve a desired approximation guarantee depends on the vertex-diameter of the graph, an easy-to-bound characteristic quantity. We also present progressive-sampling algorithms and extensions to other centrality measures, such as percolation centrality. Our extensive experimental evaluation of Bavarian shows the improvement over the state of the art made possible by the MCERAs (a 2–4× reduction in the error bound), and it allows us to assess the trade-offs between sample size and accuracy guarantees offered by the different estimators.
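
To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes an unweighted, undirected graph given as an adjacency dictionary, and it illustrates only one possible estimator: sample a uniformly random ordered pair of vertices, then a uniformly random shortest path between them, and record which vertices are internal to that path. The function names (`sample_shortest_path`, `estimate_bc_with_mcera`) and parameters (`m` samples, `n_trials` Rademacher vectors) are hypothetical. The sketch computes the n-trial MCERA, (1/n) Σ_j max_v (1/m) Σ_i λ_{j,i} f_v(s_i), and stops there; turning that quantity into the variance-aware deviation bound described in the abstract is not shown.

```python
import random
from collections import deque


def sample_shortest_path(adj, s, t, rng):
    """Return a uniformly random shortest s-t path as a vertex list, or None."""
    dist = {s: 0}
    sigma = {s: 1}   # sigma[v] = number of shortest paths from s to v
    preds = {s: []}  # predecessors of v on shortest paths from s
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    if t not in dist:
        return None
    # Walk back from t, picking each predecessor with probability
    # proportional to the number of shortest paths that reach it.
    path = [t]
    while path[-1] != s:
        ps = preds[path[-1]]
        path.append(rng.choices(ps, weights=[sigma[p] for p in ps])[0])
    return path[::-1]


def estimate_bc_with_mcera(adj, m, n_trials, seed=0):
    """Estimate normalized BC from m sampled paths and compute the n-MCERA."""
    rng = random.Random(seed)
    vertices = sorted(adj)
    # inner[i] = set of internal vertices of the i-th sampled shortest path
    inner = []
    for _ in range(m):
        s, t = rng.sample(vertices, 2)  # uniform ordered pair, s != t
        path = sample_shortest_path(adj, s, t, rng)
        inner.append(set(path[1:-1]) if path else set())
    # f_v(sample) is 1 if v is internal to the path, so the estimate is a mean.
    estimates = {v: sum(v in p for p in inner) / m for v in vertices}
    # n-MCERA: average, over n_trials sign vectors, of the largest signed mean.
    total = 0.0
    for _ in range(n_trials):
        signs = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(
            sum(sg for sg, p in zip(signs, inner) if v in p) / m
            for v in vertices
        )
    return estimates, total / n_trials


if __name__ == "__main__":
    # Star graph: vertex 0 is the hub; its exact normalized BC is 12/20 = 0.6.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    est, mcera = estimate_bc_with_mcera(star, m=2000, n_trials=10)
    print(est, mcera)
```

On the star graph in the usage example, every leaf has BC exactly 0 and the hub has normalized BC 0.6, so the hub's estimate should land close to 0.6 while the MCERA stays small. Because the family contains one indicator function per vertex, the inner maximum is a plain scan over vertices; the sketch makes no attempt at the efficiency that the abstract claims for the actual algorithms.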


Published in
ACM Transactions on Knowledge Discovery from Data Volume 17, Issue 6
July 2023
392 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3582889
Editor: Charu Aggarwal, IBM T. J. Watson Research, USA
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 March 2023
- Online AM: 20 December 2022
- Accepted: 9 December 2022
- Revised: 24 September 2022
- Received: 2 December 2021
Author Tags
Concentration bounds
dynamic graphs
percolation centrality
random sampling
sample complexity
statistical learning theory