Abstract
Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions for capturing a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently, with little interaction between them. In this work, we investigate whether bridging this communication gap can pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computing the number of distinct elements in a data stream, also known as the zeroth frequency moment (F0) of the stream.
Our investigations reveal a striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and distinct elements computation. We design a recipe for translating algorithms developed for distinct elements estimation into algorithms for model counting, which yields new model counting algorithms. We then observe that algorithms for distributed streaming can be transformed into distributed algorithms for model counting. Finally, we view streaming through the lens of counting and show that framing distinct elements estimation as a special case of #DNF counting yields a general recipe for a rich class of streaming problems that prior work had subjected to case-specific analysis.
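The last observation can be made concrete with a small sketch. Assuming stream elements are n-bit integers (an assumption made here for illustration), each element can be encoded as a DNF term that fixes all n variables, so the term has exactly one satisfying assignment and the number of models of the disjunction equals F0. The helper names `element_to_term` and `count_dnf_models` are hypothetical, and the counter is a brute-force check for tiny n, not the estimation algorithms of the paper.

```python
from itertools import product


def element_to_term(x, n):
    """Encode an n-bit stream element as a DNF term fixing every variable.

    The term is a dict mapping variable index -> required truth value.
    """
    return {i: bool((x >> i) & 1) for i in range(n)}


def count_dnf_models(terms, n):
    """Brute-force #DNF: count assignments satisfying at least one term.

    Exponential in n; only meant to check the encoding on tiny inputs.
    """
    count = 0
    for bits in product([False, True], repeat=n):
        if any(all(bits[v] == val for v, val in term.items()) for term in terms):
            count += 1
    return count


# A stream with repeated elements over the universe {0, ..., 7}.
n = 3
stream = [5, 3, 5, 7, 3, 0]
terms = [element_to_term(x, n) for x in stream]

f0 = len(set(stream))                 # exact number of distinct elements
models = count_dnf_models(terms, n)   # models of the disjunction of all terms

# Terms that leave variables free cover several elements at once. This is
# one plausible reading of how set-valued stream items fit the same picture.
range_terms = [{2: True}, {0: True, 1: True}]   # {4,5,6,7} and {3,7}
range_models = count_dnf_models(range_terms, n)
```

Here `models` agrees with `f0`, since duplicate elements map to identical terms and add no new models. The `range_terms` example counts the union of two overlapping sets without enumerating their members one by one in the stream.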