skip to main content
research-article

ChainedFilter: Combining Membership Filters by Chain Rule

Published:12 December 2023Publication History
Skip Abstract Section

Abstract

Membership (membership query/membership testing) is a fundamental problem across databases, networks and security. However, previous research has primarily focused on either approximate solutions, such as Bloom Filters, or exact methods, like perfect hashing and dictionaries, without attempting to develop an integral theory. In this paper, we propose a unified and complete theory, namely chain rule, for general membership problems, which encompasses both approximate and exact membership as extreme cases. Building upon the chain rule, we introduce a straightforward yet versatile algorithm framework, namely ChainedFilter, to combine different elementary filters without losing information. Our evaluation results demonstrate that ChainedFilter improves performance of many applications including static dictionary, lossless data compression, Cuckoo Hashing, LSM-Tree and Learned Filters.

References

  1. Sasu Tarkoma, Christian Esteve Rothenberg, and Eemil Lagerspetz. Theory and practice of bloom filters for distributed systems. IEEE Communications Surveys & Tutorials, 14(1):131--155, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  2. Andrei Broder and Michael Mitzenmacher. Network applications of bloom filters: A survey. Internet mathematics, 1(4):485--509, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  3. Shahabeddin Geravand and Mahmood Ahmadi. Bloom filter applications in network security: A state-of-the-art survey. Computer Networks, 57(18):4047--4064, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):1--26, 2008.Google ScholarGoogle Scholar
  5. Niv Dayan, Manos Athanassoulis, and Stratos Idreos. Optimal bloom filters and adaptive merging for lsm-trees. ACM Transactions on Database Systems (TODS), 43(4):1--48, 2018.Google ScholarGoogle Scholar
  6. Yoshinori Matsunobu, Siying Dong, and Herman Lee. Myrocks: Lsm-tree database storage engine serving facebook's social graph. Proceedings of the VLDB Endowment, 13(12):3217--3230, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Sarang Dharmapurikar, Haoyu Song, Jonathan Turner, and John Lockwood. Fast packet classification using bloom filters. In Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems, pages 61--70, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Dan Li, Henggang Cui, Yan Hu, Yong Xia, and Xin Wang. Scalable data center multicast using multi-class bloom filter. In 2011 19th IEEE international conference on network protocols, pages 266--275. IEEE, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Pedro Reviriego, Jorge Martínez, David Larrabeiti, and Salvatore Pontarelli. Cuckoo filters and bloom filters: Comparison and application to packet classification. IEEE Transactions on Network and Service Management, 17(4):2690--2701, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Michael T Goodrich and Michael Mitzenmacher. Invertible bloom lookup tables. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 792--799. IEEE, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  11. A Pinar Ozisik, Gavin Andresen, Brian N Levine, Darren Tapp, George Bissias, and Sunny Katkuri. Graphene: efficient interactive set reconciliation applied to blockchain propagation. In Proceedings of the ACM Special Interest Group on Data Communication, pages 303--317. 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Muhammad Anas Imtiaz, David Starobinski, Ari Trachtenberg, and Nabeel Younis. Churn in the bitcoin network: Characterization and impact. In 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 431--439. IEEE, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  13. Larry Carter, Robert Floyd, John Gill, George Markowsky, and Mark Wegman. Exact and approximate membership testers. In Proceedings of the tenth annual ACM symposium on Theory of computing, pages 59--65, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Bin Fan, Dave G Andersen, Michael Kaminsky, and Michael D Mitzenmacher. Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, pages 75--88, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. Rocksdb: Evolution of development priorities in a key-value store serving large-scale applications. ACM Transactions on Storage (TOS), 17(4):1--32, 2021.Google ScholarGoogle Scholar
  16. Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 international conference on management of data, pages 489--504, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Michael Mitzenmacher. A model for learned bloom filters and optimizing by sandwiching. Advances in Neural Information Processing Systems, 31, 2018.Google ScholarGoogle Scholar
  18. Qiyu Liu, Libin Zheng, Yanyan Shen, and Lei Chen. Stable learned bloom filters for data streams. Proceedings of the VLDB Endowment, 13(12):2355--2367, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Zhenwei Dai and Anshumali Shrivastava. Adaptive learned bloom filter (ada-bf): efficient utilization of the classifier with application to real-time information filtering on the web. Advances in Neural Information Processing Systems, 33:11700--11710, 2020.Google ScholarGoogle Scholar
  20. The open source code of chainedfilter. https://github.com/ChainedFilter.Google ScholarGoogle Scholar
  21. Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The bloomier filter: an efficient data structure for static support lookup tables. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 30--39. Citeseer, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Denis Charles and Kumar Chellapilla. Bloomier filters: A second look. In Algorithms-ESA 2008: 16th Annual European Symposium, Karlsruhe, Germany, September 15--17, 2008. Proceedings 16, pages 259--270. Springer, 2008.Google ScholarGoogle Scholar
  23. Thomas Mueller Graf and Daniel Lemire. Xor filters: Faster and smaller than bloom and cuckoo filters. Journal of Experimental Algorithmics (JEA), 25:1--16, 2020.Google ScholarGoogle Scholar
  24. Thomas Mueller Graf and Daniel Lemire. Binary fuse filters: Fast and smaller than xor filters. Journal of Experimental Algorithmics (JEA), 27(1):1--15, 2022.Google ScholarGoogle Scholar
  25. Stefan Walzer. Peeling close to the orientability threshold--spatial coupling in hashing-based data structures. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2194--2211. SIAM, 2021.Google ScholarGoogle ScholarCross RefCross Ref
  26. Ye Yu, Djamal Belazzougui, Chen Qian, and Qin Zhang. Memory-efficient and ultra-fast network lookup and forwarding using othello hashing. IEEE/ACM Transactions on Networking, 26(3):1151--1164, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Yang Tong, Dongsheng Yang, Jie Jiang, Siang Gao, Bin Cui, Lei Shi, and Xiaoming Li. Coloring embedder: A memory efficient data structure for answering multi-set query. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1142--1153. IEEE, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  28. Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, 1970.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Yuriy Arbitman, Moni Naor, and Gil Segev. Backyard cuckoo hashing: Constant worst-case operations with a succinct representation. In 2010 IEEE 51st Annual symposium on foundations of computer science, pages 787--796. IEEE, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Shachar Lovett and Ely Porat. A space lower bound for dynamic approximate membership data structures. SIAM Journal on Computing, 42(6):2182--2196, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Murmur hashing source code. https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp.Google ScholarGoogle Scholar
  32. David A Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, 40(9):1098--1101, 1952.Google ScholarGoogle ScholarCross RefCross Ref
  33. Jarek Duda. Asymmetric numeral systems: entropy coding combining speed of huffman coding with compression rate of arithmetic coding. arXiv preprint arXiv:1311.2540, 2013.Google ScholarGoogle Scholar
  34. Jóhannes B Hreinsson, Morten Krøyer, and Rasmus Pagh. Storing a compressed function with constant time access. In Algorithms-ESA 2009: 17th Annual European Symposium, Copenhagen, Denmark, September 7--9, 2009. Proceedings 17, pages 730--741. Springer, 2009.Google ScholarGoogle Scholar
  35. Pedro Reviriego, Alfonso Sánchez-Macián, Stefan Walzer, and Peter C Dillinger. Approximate membership query filters with a false positive free set. arXiv preprint arXiv:2111.06856, 2021.Google ScholarGoogle Scholar
  36. Rasmus Pagh and Flemming Friche Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122--144, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems, 12(10):1094--1104, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Udi Manber and Sun Wu. An algorithm for approximate membership checking with application to password security. Information Processing Letters, 50(4):191--197, 1994.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Li Fan, Pei Cao, Jussara Almeida, and Andrei Z Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM transactions on networking, 8(3):281--293, 2000.Google ScholarGoogle Scholar
  40. Salvatore Pontarelli, Pedro Reviriego, and Michael Mitzenmacher. Emoma: Exact match in one memory access. IEEE Transactions on Knowledge and Data Engineering, 30(11):2120--2133, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Yoav Freund, Robert Schapire, and Naoki Abe. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771--780):1612, 1999.Google ScholarGoogle Scholar
  42. The open source code of learned filter. https://github.com/karthikeya20/Learned-Bloom-Filters.Google ScholarGoogle Scholar
  43. Yuhan Wu, Jintao He, Shen Yan, Jianyu Wu, Tong Yang, Olivier Ruas, Gong Zhang, and Bin Cui. Elastic bloom filter: deletable and expandable filter using elastic fingerprints. IEEE Transactions on Computers, 71(4):984--991, 2021.Google ScholarGoogle ScholarCross RefCross Ref
  44. Zhuochen Fan, Yubo Zhang, Siyuan Dong, Yi Zhou, Fangyi Liu, Tong Yang, Steve Uhlig, and Bin Cui. Hoppingsketch: More accurate temporal membership query and frequency query. IEEE Transactions on Knowledge and Data Engineering, 35(9):9067--9072, 2023.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Haoyu Li, Qizhi Chen, Yixin Zhang, Tong Yang, and Bin Cui. Stingy sketch: A sketch framework for accurate and fast frequency estimation. Proc. VLDB Endow., 15(7):1426--1438, mar 2022.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Yuanpeng Li, Feiyu Wang, Xiang Yu, Yilong Yang, Kaicheng Yang, Tong Yang, Zhuo Ma, Bin Cui, and Steve Uhlig. Ladderfilter: Filtering infrequent items with small memory and time overhead. Proceedings of the ACM on Management of Data, 1(1):1--21, 2023.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Jiarui Guo, Yisen Hong, YuhanWu, Yunfei Liu, Tong Yang, and Bin Cui. Sketchpolymer: Estimate per-item tail quantile using one sketch. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 590--601, 2023.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Ziwei Wang, Zheng Zhong, Jiarui Guo, Yuhan Wu, Haoyu Li, Tong Yang, Yaofeng Tu, Huanchen Zhang, and Bin Cui. Rencoder: A space-time efficient range filter with local encoder. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2036--2049, 2023.Google ScholarGoogle ScholarCross RefCross Ref
  49. Hun Namkung, Zaoxing Liu, Daehyeok Kim, Vyas Sekar, and Peter Steenkiste. {SketchLib}: Enabling efficient sketch-based monitoring on programmable switches. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 743--759, 2022.Google ScholarGoogle Scholar
  50. Xiaodong Li, Zhuochen Fan, Haoyu Li, Zheng Zhong, Jiarui Guo, Sheng Long, Tong Yang, and Bin Cui. Steadysketch: Finding steady flows in data streams. In 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS), pages 01--09. IEEE, 2023.Google ScholarGoogle ScholarCross RefCross Ref
  51. Zhuochen Fan, Gang Wen, Zhipeng Huang, Yang Zhou, Qiaobin Fu, Tong Yang, Alex X Liu, and Bin Cui. On the evolutionary of bloom filter false positives-an information theoretical approach to optimizing bloom filter parameters. IEEE Transactions on Knowledge and Data Engineering, 2022.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Hun Namkung, Zaoxing Liu, Daehyeok Kim, Vyas Sekar, and Peter Steenkiste. Sketchovsky: Enabling ensembles of sketches on programmable switches. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 1273--1292, 2023.Google ScholarGoogle Scholar
  53. Zirui Liu, Chaozhe Kong, Kaicheng Yang, Tong Yang, Ruijie Miao, Qizhi Chen, Yikai Zhao, Yaofeng Tu, and Bin Cui. Hypercalm sketch: One-pass mining periodic batches in data streams.Google ScholarGoogle Scholar
  54. Martin Dietzfelbinger and Rasmus Pagh. Succinct data structures for retrieval and approximate membership. In Automata, Languages and Programming: 35th International Colloquium, ICALP 2008, Reykjavik, Iceland, July 7--11, 2008, Proceedings, Part I 35, pages 385--396. Springer, 2008.Google ScholarGoogle Scholar
  55. Ely Porat. An optimal bloom filter replacement based on matrix solving. In Computer Science-Theory and Applications: Fourth International Computer Science Symposium in Russia, CSR 2009, Novosibirsk, Russia, August 18--23, 2009. Proceedings 4, pages 263--273. Springer, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Bohdan S Majewski, Nicholas C Wormald, George Havas, and Zbigniew J Czech. A family of perfect hashing methods. The Computer Journal, 39(6):547--554, 1996.Google ScholarGoogle ScholarCross RefCross Ref
  57. Michael L Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with 0 (1) worst case access time. Journal of the ACM (JACM), 31(3):538--544, 1984.Google ScholarGoogle Scholar
  58. Andrej Brodnik and J Ian Munro. Membership in constant time and minimum space. In Algorithms-ESA'94: Second Annual European Symposium Utrecht, The Netherlands, September 26--28, 1994 Proceedings 2, pages 72--81. Springer, 1994.Google ScholarGoogle ScholarCross RefCross Ref
  59. Andrej Brodnik and J Ian Munro. Membership in constant time and almost-minimum space. SIAM Journal on computing, 28(5):1627--1640, 1999.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Rasmus Pagh. Low redundancy in static dictionaries with constant query time. SIAM Journal on Computing, 31(2):353--363, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Ioana Oriana Bercea and Guy Even. A space-efficient dynamic dictionary for multisets with constant time operations. arXiv preprint arXiv:2005.02143, 2020.Google ScholarGoogle Scholar
  62. Anna Pagh, Rasmus Pagh, and S Srinivasa Rao. An optimal bloom filter replacement. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pages 823--829, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. ChainedFilter: Combining Membership Filters by Chain Rule

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Proceedings of the ACM on Management of Data
            Proceedings of the ACM on Management of Data  Volume 1, Issue 4
            PACMMOD
            December 2023
            1317 pages
            EISSN:2836-6573
            DOI:10.1145/3637468
            • Editor:
            • Divyakant Agrawal
            Issue’s Table of Contents

            Copyright © 2023 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 12 December 2023
            Published in pacmmod Volume 1, Issue 4

            Permissions

            Request permissions about this article.

            Request Permissions

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader