research-article
Open Access

Model Counting Meets Distinct Elements

Published: 23 August 2023

Abstract

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions that capture a wide variety of problems arising across computer science. Developments in the two communities have mostly occurred independently, with little interaction between them. In this work, we investigate whether bridging the apparent gap between the two communities can lead to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs, and computing the number of distinct elements in a data stream, also known as the zeroth frequency moment (F0) of the stream.
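As an illustration of hashing on the streaming side, here is a minimal Python sketch of one classical F0 estimator based on adaptive sampling; it is not necessarily the variant the article analyzes. It assumes stream items are n-bit nonnegative integers and, for illustration, draws the hash from the family of random affine maps over GF(2). An item is kept only if the first m hash bits are zero, and m is raised whenever more than thresh items are kept, so the estimate is the sample size scaled by 2^m. The names make_affine_hash and estimate_distinct and the parameters thresh and seed are hypothetical.

```python
import random


def make_affine_hash(n, rng):
    """Draw a random affine map h(x) = A x XOR b over GF(2) on n-bit integers.

    Returns in_cell(x, m), which is True iff the first m output bits of h(x)
    are all zero. Cells are nested: cell m+1 is a subset of cell m.
    (Illustrative hash family, chosen as an assumption for this sketch.)
    """
    rows = [rng.getrandbits(n) for _ in range(n)]  # row i of A as an n-bit mask
    b = [rng.getrandbits(1) for _ in range(n)]

    def in_cell(x, m):
        # Output bit i is the parity of (row_i AND x), flipped by b_i.
        return all((bin(rows[i] & x).count("1") + b[i]) % 2 == 0 for i in range(m))

    return in_cell


def estimate_distinct(stream, n, thresh=64, seed=0):
    """Hypothetical helper: estimate F0, the number of distinct n-bit integers in stream."""
    in_cell = make_affine_hash(n, random.Random(seed))
    m = 0          # sampling level: an item is kept only if it lies in cell m
    sample = set()  # duplicates collapse automatically
    for x in stream:
        if in_cell(x, m):
            sample.add(x)
            # Too many items kept: move to the next (roughly half-size) cell
            # and discard kept items that fall outside it.
            while len(sample) > thresh and m < n:
                m += 1
                sample = {y for y in sample if in_cell(y, m)}
    # Each distinct item lies in cell m with probability 2^-m.
    return len(sample) * 2 ** m
```

Memory stays at roughly thresh items regardless of stream length. A single run gives only a constant-probability guarantee; standard practice is to run independent copies and report the median, which this sketch leaves out.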

Our investigations reveal a striking similarity in the core techniques of the algorithmic frameworks that evolved separately for model counting and for distinct elements computation. We design a recipe for translating distinct elements estimation algorithms into model counting algorithms, which yields new algorithms for model counting. We then observe that algorithms for distributed streaming can likewise be transformed into distributed algorithms for model counting. Finally, we view streaming through the lens of counting and show that framing distinct elements estimation as a special case of #DNF counting yields a general recipe for a rich class of streaming problems that prior work had handled with case-specific analyses.
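The hashing idea carries over to model counting, which can be sketched on a small CNF formula: hash each assignment with a random affine map over GF(2), count only the models whose first m hash bits are zero (that is, models that also satisfy m random XOR constraints), and increase m until at most thresh models remain; that count times 2^m estimates the number of models. Hashing-based counters such as ApproxMC answer the "at most thresh models?" question with a SAT solver on the formula conjoined with the XOR constraints. In this minimal sketch, brute-force enumeration over all 2^n assignments stands in for the solver, an assumption that limits it to small n. The DIMACS-style literal encoding and the names satisfies, estimate_model_count, thresh, and seed are illustrative choices, not the article's.

```python
import random


def satisfies(cnf, x):
    """Check a CNF formula against an assignment packed into the integer x.

    Clauses use DIMACS-style literals: k means variable k is true, -k false;
    variable k is stored in bit k-1 of x.
    """
    return all(any(((x >> (abs(lit) - 1)) & 1) == (lit > 0) for lit in clause)
               for clause in cnf)


def estimate_model_count(cnf, n, thresh=64, seed=0):
    """Hypothetical helper: hashing-based estimate of the model count of cnf over n variables."""
    rng = random.Random(seed)
    # Random affine hash h(x) = A x XOR b over GF(2) (illustrative choice).
    rows = [rng.getrandbits(n) for _ in range(n)]
    b = [rng.getrandbits(1) for _ in range(n)]

    def in_cell(x, m):
        # True iff x satisfies the first m random XOR constraints.
        return all((bin(rows[i] & x).count("1") + b[i]) % 2 == 0 for i in range(m))

    def cell_models(m, limit):
        # Brute-force stand-in for a SAT-solver call: count models in cell m,
        # stopping early once `limit` have been found.
        found = 0
        for x in range(2 ** n):
            if in_cell(x, m) and satisfies(cnf, x):
                found += 1
                if found >= limit:
                    break
        return found

    m = 0
    while True:
        count = cell_models(m, thresh + 1)
        if count <= thresh or m == n:
            # Each model lies in cell m with probability 2^-m.
            return count * 2 ** m
        m += 1
```

The structure mirrors hashing-based distinct-elements estimation: there, stream items that fall into the current cell are stored as they arrive; here, the models in the cell are obtained by querying the formula.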



Published in

Communications of the ACM, Volume 66, Issue 9, September 2023, 97 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3617556
Editor: James Larus

Copyright © 2023 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher: Association for Computing Machinery, New York, NY, United States
