Skip to main content
Log in

Better Streaming Algorithms for the Maximum Coverage Problem

  • Published:
Theory of Computing Systems Aims and scope Submit manuscript

Abstract

We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe \(\{1,\ldots ,n \}\), find the k sets that cover the most number of distinct elements. The problem can be approximated up to a factor \(1-1/e\) in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to \(1-1/e\), that use sublinear space \(o(mn)\). Our main results are:

  • Two \((1-1/e-\epsilon )\) approximation algorithms: One uses \(O(\epsilon ^{-1})\) passes and \(\tilde {O}(\epsilon ^{-2} k)\) space whereas the other uses only a single pass but \(\tilde {O}(\epsilon ^{-2} m)\) space. \(\tilde {O}(\cdot )\) suppresses polylog factors.

  • We show that any approximation factor better than \((1-(1-1/k)^{k})\approx 1-1/e\) in constant passes requires \({\Omega }(m)\) space for constant k even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, \((1-\epsilon )\) approximation algorithm using \(\tilde {O}\left (\epsilon ^{-2} m \cdot \min (k,\epsilon ^{-1})\right )\) space.

We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the most number of distinct edges.

  • We show that any constant approximation in constant passes requires \({\Omega }(N)\) space for constant k whereas \(\tilde {O}(\epsilon ^{-2}N)\) space is sufficient for a \((1-\epsilon )\) approximation and arbitrary k in a single pass.

  • For regular graphs, we show that \(\tilde {O}(\epsilon ^{-3}k)\) space is sufficient for a \((1-\epsilon )\) approximation in a single pass. We generalize this to a \((\kappa -\epsilon )\) approximation when the ratio between the minimum and maximum degree is bounded below by \(\kappa \).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2

Similar content being viewed by others

Notes

  1. Note that their work addressed the more general problem of maximizing sub-modular functions.

  2. We consider \(1-1/poly(m)\) or \(1-1/poly(n)\) as high probability.

  3. The number of guesses can be reduced to \(\lceil \log _{2} k \rceil \) if the size of the largest set is known since this gives a k approximation of OPT. The size of the large set can be computed in one additional pass if necessary.

References

  1. Ageev, A.A., Sviridenko, M.: Approximation algorithms for maximum coverage and max cut with given sizes of parts. In: IPCO, volume 1610 of Lecture Notes in Computer Science, pp. 17–30. Springer (1999)

  2. Ageev, A.A., Sviridenko, M.: Pipage rounding: A new method of constructing algorithms with proven performance guarantee. J. Comb. Optim. 8(3), 307–328 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  3. Ahn, K.J, Cormode, G., Guha, S., McGregor, A., Wirth, A.: Correlation clustering in data streams. In: ICML, volume 37 of JMLR Workshop and Conference Proceedings, pp. 2237–2246, JMLR.org (2015)

  4. Ahn, K.J., Guha, S., McGregor, A.: Analyzing graph structure via linear measurements. In: SODA, pp. 459–467. SIAM (2012)

  5. Ahn, K.J., Guha, S., McGregor, A.: Graph sketches: sparsification, spanners, and subgraphs. In: PODS, pp. 5–14. ACM (2012)

  6. Ahn, K.J., Guha, S., McGregor, A.: Spectral sparsification in dynamic graph streams. In: APPROX-RANDOM, volume 8096 of Lecture Notes in Computer Science, pp. 1–10. Springer (2013)

  7. Anagnostopoulos, A., Becchetti, L., Bordino, I., Leonardi, S., Mele, I., Sankowski, P.: Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst. 33(3), 11:1–11:35 (2015)

    Article  Google Scholar 

  8. Assadi, S.: Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In: PODS, pp. 321–335. ACM (2017)

  9. Assadi, S., Khanna, S., Li, Y.: Tight bounds for single-pass streaming complexity of the set cover problem. In: STOC, pp. 698–711. ACM (2016)

  10. Assadi, S., Khanna, S., Li, Y., Yaroslavtsev, G.: Maximum matchings in dynamic graph streams and the simultaneous communication model. In: SODA, pp. 1345–1364. SIAM (2016)

  11. Ausiello, G., Boria, N., Giannakos, A., Lucarelli, G., Paschos, V.Th.: Online maximum k-coverage. Discret. Appl. Math. 160(13-14), 1901–1913 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  12. Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: KDD, pp. 671–680. ACM (2014)

  13. Badanidiyuru, A., Vondrák, J.: Fast algorithms for maximizing submodular functions. In: SODA, pp. 1497–1514. SIAM (2014)

  14. Bateni, M.H., Esfandiari, H., Mirrokni, V.S.: Almost optimal streaming algorithms for coverage problems CoRR, arXiv:1610.08096 (2016)

  15. Bhattacharya, S., Henzinger, M., Nanongkai, D., Tsourakakis, C.E.: Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In: STOC, pp. 173–182. ACM (2015)

  16. Bonnet, É., Escoffier, B., Paschos, V.Th., Stamoulis, G.: A 0.821-ratio purely combinatorial algorithm for maximum k-vertex cover in bipartite graphs. In: LATIN, volume 9644 of Lecture Notes in Computer Science, pp. 235–248. Springer (2016)

  17. Caskurlu, B., Mkrtchyan, V., Parekh, O., Subramani, K.: On partial vertex cover and budgeted maximum coverage problems in bipartite graphs. In: IFIP TCS, volume 8705 of Lecture Notes in Computer Science, pp. 13–26. Springer (2014)

  18. Chakrabarti, A., Cormode, G., McGregor, A.: Robust lower bounds for communication and stream computation. Electronic Colloquium on Computational Complexity (ECCC) 18, 62 (2011)

    MATH  Google Scholar 

  19. Chakrabarti, A., Kale, S.: Submodular maximization meets streaming: Matchings, matroids, and more. In: IPCO, volume 8494 of Lecture Notes in Computer Science, pp. 210–221. Springer (2014)

  20. Chakrabarti, A., Khot, S., Sun, X.: Near-optimal lower bounds on the multi-party communication complexity of set disjointness. IEEE Computer Society (2003)

  21. Chakrabarti, A., Wirth, A.: Incidence geometries and the pass complexity of semi-streaming set cover. In: SODA, pp. 1365–1373. SIAM (2016)

  22. Chekuri, C., Gupta, S., Quanrud, K.: Streaming algorithms for submodular function maximization. In: ICALP (1), volume 9134 of Lecture Notes in Computer Science, pp. 318–330. Springer (2015)

  23. Chekuri, C., Kumar, A.: Maximum coverage problem with group budget constraints and applications. In: APPROX-RANDOM, volume 3122 of Lecture Notes in Computer Science, pp. 72–83. Springer (2004)

  24. Chitnis, R., Cormode, G., Esfandiari, H., Hajiaghayi, M.T., McGregor, A., Monemizadeh, M., Vorotnikova, S.: Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In: SODA, pp. 1326–1344. SIAM (2016)

  25. Cormode, G., Datar, M., Indyk, P., Muthukrishnan, S.: Comparing data streams using hamming (how to zero in). IEEE Trans. Knowl. Data Eng. 15(3), 529–540 (2003)

    Article  Google Scholar 

  26. Cormode, G., Karloff, H.J., Wirth, A.: Set cover algorithms for very large datasets. In: CIKM, pp. 479–488. ACM (2010)

  27. Emek, Y., Rosén, A.: Semi-streaming set cover - (extended abstract). In: ICALP (1), volume 8572 of Lecture Notes in Computer Science, pp. 453–464. Springer (2014)

  28. Feige, U.: A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  29. Guha, S., McGregor, A., Tench, D.: Vertex and hyperedge connectivity in dynamic graph streams. In: PODS, pp. 241–247. ACM (2015)

  30. Har-Peled, S., Indyk, P., Mahabadi, S., Vakilian, A.: Towards tight bounds for the streaming set cover problem. In: PODS, pp. 371–383. ACM (2016)

  31. Kapralov, M., Lee, Y.T., Musco, C., Musco, C., Sidford, A.: Single pass spectral sparsification in dynamic streams. IEEE Computer Society (2014)

  32. Kapralov, M., Woodruff, D.P.: Spanners and sparsifiers in dynamic streams. In: PODC, pp. 272–281. ACM (2014)

  33. Kempe, D., Kleinberg, J.M., Tardos, É.: Maximizing the spread of influence through a social network. Theory of Computing 11, 105–147 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  34. Khuller, S., Moss, A., Naor, J.: The budgeted maximum coverage problem. Inf. Process. Lett. 70(1), 39–45 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  35. Kogan, D., Krauthgamer, R.: Sketching cuts in graphs and hypergraphs. In 6th Innovations in Theoretical Computer Science (2015)

  36. Konrad, C.: Maximum matching in turnstile streams. In: ESA, volume 9294 of Lecture Notes in Computer Science, pp. 840–852. Springer (2015)

  37. Krause, A., Guestrin, C.: Near-optimal observation selection using submodular functions. In: AAAI, pp. 1650–1654. AAAI Press (2007)

  38. Kumar, R., Moseley, B., Vassilvitskii, S., Vattani, A.: Fast greedy algorithms in mapreduce and streaming. TOPC 2(3), 14 (2015)

    Article  Google Scholar 

  39. McGregor, A.: Graph stream algorithms: a survey. SIGMOD Record 43(1), 9–20 (2014)

    Article  MathSciNet  Google Scholar 

  40. McGregor, A., Tench, D., Vorotnikova, S., Vu, H.T.: Densest subgraph in dynamic graph streams. In: MFCS (2), volume 9235 of Lecture Notes in Computer Science, pp. 472–482. Springer (2015)

  41. McGregor, A., Vorotnikova, S., Vu, H.T.: Better algorithms for counting triangles in data streams. In: PODS, pp. 401–411. ACM (2016)

  42. McGregor, A., Vu, H.T.: Better streaming algorithms for the maximum coverage problem. In: ICDT, volume 68 of LIPIcs, pp. 22:1–22:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017)

  43. Panconesi, A., Srinivasan, A.: Randomized distributed edge coloring via an extension of the chernoff-hoeffding bounds. SIAM J. Comput. 26(2), 350–368 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  44. Radhakrishnan, J., Shannigrahi, S.: Streaming algorithms for 2-coloring uniform hypergraphs. In: Algorithms and Data Structures - 12th International Symposium, WADS 2011, New york. Proceedings, pp. 667–678 (2011)

  45. Saha, B., Getoor, L.: On maximum coverage in the streaming model & application to multi-topic blog-watch. In: SDM, pp. 697–708. SIAM (2009)

  46. Schmidt, J.P., Siegel, A., Srinivasan, A.: Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8(2), 223–250 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  47. Spielman, D.A., Teng, S.H.: Spectral sparsification of graphs. SIAM J. Comput. 40(4), 981–1025 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  48. Srinivasan, A.: Distributions on level-sets with applications to approximation algorithms. In: FOCS, pp. 588–597. IEEE Computer Society (2001)

  49. Sun, H.: Counting hypergraphs in data streams. CoRR, arXiv:1304.7456 (2013)

  50. Yu, H., Yuan, D.: Set coverage problems in a one-pass data stream. In: SDM, pp. 758–766. SIAM (2013)

Download references

Acknowledgements

We thank Sagar Kale for discussions of related work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hoa T. Vu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Special Issue on Database Theory

This work was supported by NSF Awards CCF-0953754, IIS-1251110, CCF-1320719, and a Google Research Award.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

McGregor, A., Vu, H.T. Better Streaming Algorithms for the Maximum Coverage Problem. Theory Comput Syst 63, 1595–1619 (2019). https://doi.org/10.1007/s00224-018-9878-x

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00224-018-9878-x

Keywords

Navigation