Model Counting Meets Distinct Elements

Authors:
A. Pavan, Iowa State University, Ames, Iowa, USA
N. V. Vinodchandran, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
Arnab Bhattacharyya, National University of Singapore, Singapore
Kuldeep S. Meel, National University of Singapore, Singapore

Communications of the ACM, Volume 66, Issue 9, September 2023, pp. 95–102. https://doi.org/10.1145/3607824

Published: 23 August 2023

Abstract

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and the computation of the number of distinct elements in a data stream, also known as the zeroth frequency moment (F₀) of a data stream.
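To make the two problems concrete, here is a minimal Python sketch that is not taken from the article. The names `f0` and `model_count`, the sample stream, and the sample formula are illustrative assumptions. The brute-force enumeration takes exponential time and is only meant to pin down the definitions, not to suggest a practical method.

```python
from itertools import product

def f0(stream):
    """Exact number of distinct elements (zeroth frequency moment F0)."""
    return len(set(stream))

def model_count(formula, num_vars):
    """Brute-force model count: how many of the 2^num_vars Boolean
    assignments satisfy `formula` (a Python predicate, assumed here)."""
    return sum(
        1
        for bits in product([False, True], repeat=num_vars)
        if formula(*bits)
    )

# A stream with repeats: the distinct elements are {1, 3, 7, 9}.
stream = [3, 1, 3, 7, 1, 1, 9]
print(f0(stream))  # 4

# (x OR y) AND (NOT z) has exactly three satisfying assignments.
phi = lambda x, y, z: (x or y) and (not z)
print(model_count(phi, 3))  # 3
```

Both functions count the size of a set that is only given implicitly, either by a stream with repetitions or by a formula.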

Our investigations lead us to observe a striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and distinct elements computation. We design a recipe for translating algorithms developed for distinct elements estimation into algorithms for model counting, resulting in new algorithms for model counting. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming through the lens of counting and show that framing distinct elements estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to case-specific analysis in prior works.
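One way to picture the connection is the hedged sketch below, which is not the authors' algorithm. It uses a generic hashing-and-bucketing estimator of the kind found in the streaming literature and applies it both to an ordinary stream and to the "stream" of assignments generated by the terms of a DNF formula. The hash family, the modulus `P`, the default threshold, and all function names are illustrative assumptions.

```python
import random

P = (1 << 61) - 1  # prime modulus for the hash (an illustrative choice)

def make_hash(seed):
    """Draw h(x) = (a*x + b) mod P from a seeded generator."""
    rng = random.Random(seed)
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: (a * x + b) % P

def trailing_zeros(v):
    """Number of trailing zero bits of v; 61 is used as a cap for v == 0."""
    return 61 if v == 0 else (v & -v).bit_length() - 1

def estimate_distinct(items, thresh=64, seed=0):
    """Bucketing estimator: keep items whose hash has >= `level` trailing
    zeros, and raise `level` whenever the bucket outgrows `thresh`."""
    h = make_hash(seed)
    level, bucket = 0, set()
    for x in items:
        if trailing_zeros(h(x)) >= level:
            bucket.add(x)
            while len(bucket) > thresh:
                level += 1
                bucket = {y for y in bucket if trailing_zeros(h(y)) >= level}
    # Each surviving item stands for roughly 2^level distinct items.
    return len(bucket) * (1 << level)

def term_solutions(term, n):
    """Yield, as n-bit integers, all assignments satisfying one DNF term.
    `term` maps a variable index to the value it is fixed to."""
    free = [i for i in range(n) if i not in term]
    base = sum(1 << i for i, val in term.items() if val)
    for mask in range(1 << len(free)):
        x = base
        for j, i in enumerate(free):
            if (mask >> j) & 1:
                x |= 1 << i
        yield x

def dnf_as_stream(terms, n):
    """Concatenate the solutions of every term; overlaps appear as repeats."""
    for t in terms:
        yield from term_solutions(t, n)

# Plain stream: fewer distinct items than `thresh`, so the count is exact.
print(estimate_distinct([5, 5, 7, 9, 7]))  # 3

# DNF (x0 OR x1) over 3 variables: 8 stream items, 6 distinct assignments.
print(estimate_distinct(dnf_as_stream([{0: True}, {1: True}], 3)))  # 6
```

In this picture, a stream element is a DNF term that fixes every variable, so the number of distinct elements equals the model count of the corresponding DNF. A term that fixes fewer variables stands for a whole block of assignments. The sketch enumerates such blocks explicitly, which a space-efficient algorithm would have to avoid.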


Index Terms

1. Theory of computation
  1. Design and analysis of algorithms
    1. Streaming, sublinear and near linear time algorithms
      1. Sketching and sampling
  2. Models of computation
    1. Streaming models


Published in

Communications of the ACM, Volume 66, Issue 9
September 2023
97 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3617556
Editor: James Larus
Association for Computing Machinery, New York, NY
Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifiers: research-article
