Abstract
We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe \(\{1,\ldots ,n \}\), find the k sets that cover the most number of distinct elements. The problem can be approximated up to a factor \(1-1/e\) in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to \(1-1/e\), that use sublinear space \(o(mn)\). Our main results are:
-
Two \((1-1/e-\epsilon )\) approximation algorithms: One uses \(O(\epsilon ^{-1})\) passes and \(\tilde {O}(\epsilon ^{-2} k)\) space whereas the other uses only a single pass but \(\tilde {O}(\epsilon ^{-2} m)\) space. \(\tilde {O}(\cdot )\) suppresses polylog factors.
-
We show that any approximation factor better than \((1-(1-1/k)^{k})\approx 1-1/e\) in constant passes requires \({\Omega }(m)\) space for constant k even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, \((1-\epsilon )\) approximation algorithm using \(\tilde {O}\left (\epsilon ^{-2} m \cdot \min (k,\epsilon ^{-1})\right )\) space.
We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the most number of distinct edges.
-
We show that any constant approximation in constant passes requires \({\Omega }(N)\) space for constant k whereas \(\tilde {O}(\epsilon ^{-2}N)\) space is sufficient for a \((1-\epsilon )\) approximation and arbitrary k in a single pass.
-
For regular graphs, we show that \(\tilde {O}(\epsilon ^{-3}k)\) space is sufficient for a \((1-\epsilon )\) approximation in a single pass. We generalize this to a \((\kappa -\epsilon )\) approximation when the ratio between the minimum and maximum degree is bounded below by \(\kappa \).
Similar content being viewed by others
Notes
Note that their work addressed the more general problem of maximizing sub-modular functions.
We consider \(1-1/poly(m)\) or \(1-1/poly(n)\) as high probability.
The number of guesses can be reduced to \(\lceil \log _{2} k \rceil \) if the size of the largest set is known since this gives a k approximation of OPT. The size of the large set can be computed in one additional pass if necessary.
References
Ageev, A.A., Sviridenko, M.: Approximation algorithms for maximum coverage and max cut with given sizes of parts. In: IPCO, volume 1610 of Lecture Notes in Computer Science, pp. 17–30. Springer (1999)
Ageev, A.A., Sviridenko, M.: Pipage rounding: A new method of constructing algorithms with proven performance guarantee. J. Comb. Optim. 8(3), 307–328 (2004)
Ahn, K.J, Cormode, G., Guha, S., McGregor, A., Wirth, A.: Correlation clustering in data streams. In: ICML, volume 37 of JMLR Workshop and Conference Proceedings, pp. 2237–2246, JMLR.org (2015)
Ahn, K.J., Guha, S., McGregor, A.: Analyzing graph structure via linear measurements. In: SODA, pp. 459–467. SIAM (2012)
Ahn, K.J., Guha, S., McGregor, A.: Graph sketches: sparsification, spanners, and subgraphs. In: PODS, pp. 5–14. ACM (2012)
Ahn, K.J., Guha, S., McGregor, A.: Spectral sparsification in dynamic graph streams. In: APPROX-RANDOM, volume 8096 of Lecture Notes in Computer Science, pp. 1–10. Springer (2013)
Anagnostopoulos, A., Becchetti, L., Bordino, I., Leonardi, S., Mele, I., Sankowski, P.: Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst. 33(3), 11:1–11:35 (2015)
Assadi, S.: Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In: PODS, pp. 321–335. ACM (2017)
Assadi, S., Khanna, S., Li, Y.: Tight bounds for single-pass streaming complexity of the set cover problem. In: STOC, pp. 698–711. ACM (2016)
Assadi, S., Khanna, S., Li, Y., Yaroslavtsev, G.: Maximum matchings in dynamic graph streams and the simultaneous communication model. In: SODA, pp. 1345–1364. SIAM (2016)
Ausiello, G., Boria, N., Giannakos, A., Lucarelli, G., Paschos, V.Th.: Online maximum k-coverage. Discret. Appl. Math. 160(13-14), 1901–1913 (2012)
Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: KDD, pp. 671–680. ACM (2014)
Badanidiyuru, A., Vondrák, J.: Fast algorithms for maximizing submodular functions. In: SODA, pp. 1497–1514. SIAM (2014)
Bateni, M.H., Esfandiari, H., Mirrokni, V.S.: Almost optimal streaming algorithms for coverage problems CoRR, arXiv:1610.08096 (2016)
Bhattacharya, S., Henzinger, M., Nanongkai, D., Tsourakakis, C.E.: Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In: STOC, pp. 173–182. ACM (2015)
Bonnet, É., Escoffier, B., Paschos, V.Th., Stamoulis, G.: A 0.821-ratio purely combinatorial algorithm for maximum k-vertex cover in bipartite graphs. In: LATIN, volume 9644 of Lecture Notes in Computer Science, pp. 235–248. Springer (2016)
Caskurlu, B., Mkrtchyan, V., Parekh, O., Subramani, K.: On partial vertex cover and budgeted maximum coverage problems in bipartite graphs. In: IFIP TCS, volume 8705 of Lecture Notes in Computer Science, pp. 13–26. Springer (2014)
Chakrabarti, A., Cormode, G., McGregor, A.: Robust lower bounds for communication and stream computation. Electronic Colloquium on Computational Complexity (ECCC) 18, 62 (2011)
Chakrabarti, A., Kale, S.: Submodular maximization meets streaming: Matchings, matroids, and more. In: IPCO, volume 8494 of Lecture Notes in Computer Science, pp. 210–221. Springer (2014)
Chakrabarti, A., Khot, S., Sun, X.: Near-optimal lower bounds on the multi-party communication complexity of set disjointness. IEEE Computer Society (2003)
Chakrabarti, A., Wirth, A.: Incidence geometries and the pass complexity of semi-streaming set cover. In: SODA, pp. 1365–1373. SIAM (2016)
Chekuri, C., Gupta, S., Quanrud, K.: Streaming algorithms for submodular function maximization. In: ICALP (1), volume 9134 of Lecture Notes in Computer Science, pp. 318–330. Springer (2015)
Chekuri, C., Kumar, A.: Maximum coverage problem with group budget constraints and applications. In: APPROX-RANDOM, volume 3122 of Lecture Notes in Computer Science, pp. 72–83. Springer (2004)
Chitnis, R., Cormode, G., Esfandiari, H., Hajiaghayi, M.T., McGregor, A., Monemizadeh, M., Vorotnikova, S.: Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In: SODA, pp. 1326–1344. SIAM (2016)
Cormode, G., Datar, M., Indyk, P., Muthukrishnan, S.: Comparing data streams using hamming (how to zero in). IEEE Trans. Knowl. Data Eng. 15(3), 529–540 (2003)
Cormode, G., Karloff, H.J., Wirth, A.: Set cover algorithms for very large datasets. In: CIKM, pp. 479–488. ACM (2010)
Emek, Y., Rosén, A.: Semi-streaming set cover - (extended abstract). In: ICALP (1), volume 8572 of Lecture Notes in Computer Science, pp. 453–464. Springer (2014)
Feige, U.: A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)
Guha, S., McGregor, A., Tench, D.: Vertex and hyperedge connectivity in dynamic graph streams. In: PODS, pp. 241–247. ACM (2015)
Har-Peled, S., Indyk, P., Mahabadi, S., Vakilian, A.: Towards tight bounds for the streaming set cover problem. In: PODS, pp. 371–383. ACM (2016)
Kapralov, M., Lee, Y.T., Musco, C., Musco, C., Sidford, A.: Single pass spectral sparsification in dynamic streams. IEEE Computer Society (2014)
Kapralov, M., Woodruff, D.P.: Spanners and sparsifiers in dynamic streams. In: PODC, pp. 272–281. ACM (2014)
Kempe, D., Kleinberg, J.M., Tardos, É.: Maximizing the spread of influence through a social network. Theory of Computing 11, 105–147 (2015)
Khuller, S., Moss, A., Naor, J.: The budgeted maximum coverage problem. Inf. Process. Lett. 70(1), 39–45 (1999)
Kogan, D., Krauthgamer, R.: Sketching cuts in graphs and hypergraphs. In 6th Innovations in Theoretical Computer Science (2015)
Konrad, C.: Maximum matching in turnstile streams. In: ESA, volume 9294 of Lecture Notes in Computer Science, pp. 840–852. Springer (2015)
Krause, A., Guestrin, C.: Near-optimal observation selection using submodular functions. In: AAAI, pp. 1650–1654. AAAI Press (2007)
Kumar, R., Moseley, B., Vassilvitskii, S., Vattani, A.: Fast greedy algorithms in mapreduce and streaming. TOPC 2(3), 14 (2015)
McGregor, A.: Graph stream algorithms: a survey. SIGMOD Record 43(1), 9–20 (2014)
McGregor, A., Tench, D., Vorotnikova, S., Vu, H.T.: Densest subgraph in dynamic graph streams. In: MFCS (2), volume 9235 of Lecture Notes in Computer Science, pp. 472–482. Springer (2015)
McGregor, A., Vorotnikova, S., Vu, H.T.: Better algorithms for counting triangles in data streams. In: PODS, pp. 401–411. ACM (2016)
McGregor, A., Vu, H.T.: Better streaming algorithms for the maximum coverage problem. In: ICDT, volume 68 of LIPIcs, pp. 22:1–22:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017)
Panconesi, A., Srinivasan, A.: Randomized distributed edge coloring via an extension of the chernoff-hoeffding bounds. SIAM J. Comput. 26(2), 350–368 (1997)
Radhakrishnan, J., Shannigrahi, S.: Streaming algorithms for 2-coloring uniform hypergraphs. In: Algorithms and Data Structures - 12th International Symposium, WADS 2011, New york. Proceedings, pp. 667–678 (2011)
Saha, B., Getoor, L.: On maximum coverage in the streaming model & application to multi-topic blog-watch. In: SDM, pp. 697–708. SIAM (2009)
Schmidt, J.P., Siegel, A., Srinivasan, A.: Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8(2), 223–250 (1995)
Spielman, D.A., Teng, S.H.: Spectral sparsification of graphs. SIAM J. Comput. 40(4), 981–1025 (2011)
Srinivasan, A.: Distributions on level-sets with applications to approximation algorithms. In: FOCS, pp. 588–597. IEEE Computer Society (2001)
Sun, H.: Counting hypergraphs in data streams. CoRR, arXiv:1304.7456 (2013)
Yu, H., Yuan, D.: Set coverage problems in a one-pass data stream. In: SDM, pp. 758–766. SIAM (2013)
Acknowledgements
We thank Sagar Kale for discussions of related work.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection on Special Issue on Database Theory
This work was supported by NSF Awards CCF-0953754, IIS-1251110, CCF-1320719, and a Google Research Award.
Rights and permissions
About this article
Cite this article
McGregor, A., Vu, H.T. Better Streaming Algorithms for the Maximum Coverage Problem. Theory Comput Syst 63, 1595–1619 (2019). https://doi.org/10.1007/s00224-018-9878-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00224-018-9878-x