Abstract
Membership (membership query/membership testing) is a fundamental problem across databases, networks and security. However, previous research has primarily focused on either approximate solutions, such as Bloom Filters, or exact methods, like perfect hashing and dictionaries, without attempting to develop an integral theory. In this paper, we propose a unified and complete theory, namely chain rule, for general membership problems, which encompasses both approximate and exact membership as extreme cases. Building upon the chain rule, we introduce a straightforward yet versatile algorithm framework, namely ChainedFilter, to combine different elementary filters without losing information. Our evaluation results demonstrate that ChainedFilter improves performance of many applications including static dictionary, lossless data compression, Cuckoo Hashing, LSM-Tree and Learned Filters.
- Sasu Tarkoma, Christian Esteve Rothenberg, and Eemil Lagerspetz. Theory and practice of bloom filters for distributed systems. IEEE Communications Surveys & Tutorials, 14(1):131--155, 2011.Google ScholarCross Ref
- Andrei Broder and Michael Mitzenmacher. Network applications of bloom filters: A survey. Internet mathematics, 1(4):485--509, 2004.Google ScholarCross Ref
- Shahabeddin Geravand and Mahmood Ahmadi. Bloom filter applications in network security: A state-of-the-art survey. Computer Networks, 57(18):4047--4064, 2013.Google ScholarDigital Library
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):1--26, 2008.Google Scholar
- Niv Dayan, Manos Athanassoulis, and Stratos Idreos. Optimal bloom filters and adaptive merging for lsm-trees. ACM Transactions on Database Systems (TODS), 43(4):1--48, 2018.Google Scholar
- Yoshinori Matsunobu, Siying Dong, and Herman Lee. Myrocks: Lsm-tree database storage engine serving facebook's social graph. Proceedings of the VLDB Endowment, 13(12):3217--3230, 2020.Google ScholarDigital Library
- Sarang Dharmapurikar, Haoyu Song, Jonathan Turner, and John Lockwood. Fast packet classification using bloom filters. In Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems, pages 61--70, 2006.Google ScholarDigital Library
- Dan Li, Henggang Cui, Yan Hu, Yong Xia, and Xin Wang. Scalable data center multicast using multi-class bloom filter. In 2011 19th IEEE international conference on network protocols, pages 266--275. IEEE, 2011.Google ScholarDigital Library
- Pedro Reviriego, Jorge Martínez, David Larrabeiti, and Salvatore Pontarelli. Cuckoo filters and bloom filters: Comparison and application to packet classification. IEEE Transactions on Network and Service Management, 17(4):2690--2701, 2020.Google ScholarDigital Library
- Michael T Goodrich and Michael Mitzenmacher. Invertible bloom lookup tables. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 792--799. IEEE, 2011.Google ScholarCross Ref
- A Pinar Ozisik, Gavin Andresen, Brian N Levine, Darren Tapp, George Bissias, and Sunny Katkuri. Graphene: efficient interactive set reconciliation applied to blockchain propagation. In Proceedings of the ACM Special Interest Group on Data Communication, pages 303--317. 2019.Google ScholarDigital Library
- Muhammad Anas Imtiaz, David Starobinski, Ari Trachtenberg, and Nabeel Younis. Churn in the bitcoin network: Characterization and impact. In 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 431--439. IEEE, 2019.Google ScholarCross Ref
- Larry Carter, Robert Floyd, John Gill, George Markowsky, and Mark Wegman. Exact and approximate membership testers. In Proceedings of the tenth annual ACM symposium on Theory of computing, pages 59--65, 1978.Google ScholarDigital Library
- Bin Fan, Dave G Andersen, Michael Kaminsky, and Michael D Mitzenmacher. Cuckoo filter: Practically better than bloom. In Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, pages 75--88, 2014.Google ScholarDigital Library
- Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. Rocksdb: Evolution of development priorities in a key-value store serving large-scale applications. ACM Transactions on Storage (TOS), 17(4):1--32, 2021.Google Scholar
- Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 international conference on management of data, pages 489--504, 2018.Google ScholarDigital Library
- Michael Mitzenmacher. A model for learned bloom filters and optimizing by sandwiching. Advances in Neural Information Processing Systems, 31, 2018.Google Scholar
- Qiyu Liu, Libin Zheng, Yanyan Shen, and Lei Chen. Stable learned bloom filters for data streams. Proceedings of the VLDB Endowment, 13(12):2355--2367, 2020.Google ScholarDigital Library
- Zhenwei Dai and Anshumali Shrivastava. Adaptive learned bloom filter (ada-bf): efficient utilization of the classifier with application to real-time information filtering on the web. Advances in Neural Information Processing Systems, 33:11700--11710, 2020.Google Scholar
- The open source code of chainedfilter. https://github.com/ChainedFilter.Google Scholar
- Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The bloomier filter: an efficient data structure for static support lookup tables. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 30--39. Citeseer, 2004.Google ScholarDigital Library
- Denis Charles and Kumar Chellapilla. Bloomier filters: A second look. In Algorithms-ESA 2008: 16th Annual European Symposium, Karlsruhe, Germany, September 15--17, 2008. Proceedings 16, pages 259--270. Springer, 2008.Google Scholar
- Thomas Mueller Graf and Daniel Lemire. Xor filters: Faster and smaller than bloom and cuckoo filters. Journal of Experimental Algorithmics (JEA), 25:1--16, 2020.Google Scholar
- Thomas Mueller Graf and Daniel Lemire. Binary fuse filters: Fast and smaller than xor filters. Journal of Experimental Algorithmics (JEA), 27(1):1--15, 2022.Google Scholar
- Stefan Walzer. Peeling close to the orientability threshold--spatial coupling in hashing-based data structures. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2194--2211. SIAM, 2021.Google ScholarCross Ref
- Ye Yu, Djamal Belazzougui, Chen Qian, and Qin Zhang. Memory-efficient and ultra-fast network lookup and forwarding using othello hashing. IEEE/ACM Transactions on Networking, 26(3):1151--1164, 2018.Google ScholarDigital Library
- Yang Tong, Dongsheng Yang, Jie Jiang, Siang Gao, Bin Cui, Lei Shi, and Xiaoming Li. Coloring embedder: A memory efficient data structure for answering multi-set query. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1142--1153. IEEE, 2019.Google ScholarCross Ref
- Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, 1970.Google ScholarDigital Library
- Yuriy Arbitman, Moni Naor, and Gil Segev. Backyard cuckoo hashing: Constant worst-case operations with a succinct representation. In 2010 IEEE 51st Annual symposium on foundations of computer science, pages 787--796. IEEE, 2010.Google ScholarDigital Library
- Shachar Lovett and Ely Porat. A space lower bound for dynamic approximate membership data structures. SIAM Journal on Computing, 42(6):2182--2196, 2013.Google ScholarDigital Library
- Murmur hashing source code. https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp.Google Scholar
- David A Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, 40(9):1098--1101, 1952.Google ScholarCross Ref
- Jarek Duda. Asymmetric numeral systems: entropy coding combining speed of huffman coding with compression rate of arithmetic coding. arXiv preprint arXiv:1311.2540, 2013.Google Scholar
- Jóhannes B Hreinsson, Morten Krøyer, and Rasmus Pagh. Storing a compressed function with constant time access. In Algorithms-ESA 2009: 17th Annual European Symposium, Copenhagen, Denmark, September 7--9, 2009. Proceedings 17, pages 730--741. Springer, 2009.Google Scholar
- Pedro Reviriego, Alfonso Sánchez-Macián, Stefan Walzer, and Peter C Dillinger. Approximate membership query filters with a false positive free set. arXiv preprint arXiv:2111.06856, 2021.Google Scholar
- Rasmus Pagh and Flemming Friche Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122--144, 2004.Google ScholarDigital Library
- Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Transactions on Parallel and Distributed Systems, 12(10):1094--1104, 2001.Google ScholarDigital Library
- Udi Manber and Sun Wu. An algorithm for approximate membership checking with application to password security. Information Processing Letters, 50(4):191--197, 1994.Google ScholarDigital Library
- Li Fan, Pei Cao, Jussara Almeida, and Andrei Z Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM transactions on networking, 8(3):281--293, 2000.Google Scholar
- Salvatore Pontarelli, Pedro Reviriego, and Michael Mitzenmacher. Emoma: Exact match in one memory access. IEEE Transactions on Knowledge and Data Engineering, 30(11):2120--2133, 2018.Google ScholarDigital Library
- Yoav Freund, Robert Schapire, and Naoki Abe. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771--780):1612, 1999.Google Scholar
- The open source code of learned filter. https://github.com/karthikeya20/Learned-Bloom-Filters.Google Scholar
- Yuhan Wu, Jintao He, Shen Yan, Jianyu Wu, Tong Yang, Olivier Ruas, Gong Zhang, and Bin Cui. Elastic bloom filter: deletable and expandable filter using elastic fingerprints. IEEE Transactions on Computers, 71(4):984--991, 2021.Google ScholarCross Ref
- Zhuochen Fan, Yubo Zhang, Siyuan Dong, Yi Zhou, Fangyi Liu, Tong Yang, Steve Uhlig, and Bin Cui. Hoppingsketch: More accurate temporal membership query and frequency query. IEEE Transactions on Knowledge and Data Engineering, 35(9):9067--9072, 2023.Google ScholarDigital Library
- Haoyu Li, Qizhi Chen, Yixin Zhang, Tong Yang, and Bin Cui. Stingy sketch: A sketch framework for accurate and fast frequency estimation. Proc. VLDB Endow., 15(7):1426--1438, mar 2022.Google ScholarDigital Library
- Yuanpeng Li, Feiyu Wang, Xiang Yu, Yilong Yang, Kaicheng Yang, Tong Yang, Zhuo Ma, Bin Cui, and Steve Uhlig. Ladderfilter: Filtering infrequent items with small memory and time overhead. Proceedings of the ACM on Management of Data, 1(1):1--21, 2023.Google ScholarDigital Library
- Jiarui Guo, Yisen Hong, YuhanWu, Yunfei Liu, Tong Yang, and Bin Cui. Sketchpolymer: Estimate per-item tail quantile using one sketch. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 590--601, 2023.Google ScholarDigital Library
- Ziwei Wang, Zheng Zhong, Jiarui Guo, Yuhan Wu, Haoyu Li, Tong Yang, Yaofeng Tu, Huanchen Zhang, and Bin Cui. Rencoder: A space-time efficient range filter with local encoder. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2036--2049, 2023.Google ScholarCross Ref
- Hun Namkung, Zaoxing Liu, Daehyeok Kim, Vyas Sekar, and Peter Steenkiste. {SketchLib}: Enabling efficient sketch-based monitoring on programmable switches. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 743--759, 2022.Google Scholar
- Xiaodong Li, Zhuochen Fan, Haoyu Li, Zheng Zhong, Jiarui Guo, Sheng Long, Tong Yang, and Bin Cui. Steadysketch: Finding steady flows in data streams. In 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS), pages 01--09. IEEE, 2023.Google ScholarCross Ref
- Zhuochen Fan, Gang Wen, Zhipeng Huang, Yang Zhou, Qiaobin Fu, Tong Yang, Alex X Liu, and Bin Cui. On the evolutionary of bloom filter false positives-an information theoretical approach to optimizing bloom filter parameters. IEEE Transactions on Knowledge and Data Engineering, 2022.Google ScholarDigital Library
- Hun Namkung, Zaoxing Liu, Daehyeok Kim, Vyas Sekar, and Peter Steenkiste. Sketchovsky: Enabling ensembles of sketches on programmable switches. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 1273--1292, 2023.Google Scholar
- Zirui Liu, Chaozhe Kong, Kaicheng Yang, Tong Yang, Ruijie Miao, Qizhi Chen, Yikai Zhao, Yaofeng Tu, and Bin Cui. Hypercalm sketch: One-pass mining periodic batches in data streams.Google Scholar
- Martin Dietzfelbinger and Rasmus Pagh. Succinct data structures for retrieval and approximate membership. In Automata, Languages and Programming: 35th International Colloquium, ICALP 2008, Reykjavik, Iceland, July 7--11, 2008, Proceedings, Part I 35, pages 385--396. Springer, 2008.Google Scholar
- Ely Porat. An optimal bloom filter replacement based on matrix solving. In Computer Science-Theory and Applications: Fourth International Computer Science Symposium in Russia, CSR 2009, Novosibirsk, Russia, August 18--23, 2009. Proceedings 4, pages 263--273. Springer, 2009.Google ScholarDigital Library
- Bohdan S Majewski, Nicholas C Wormald, George Havas, and Zbigniew J Czech. A family of perfect hashing methods. The Computer Journal, 39(6):547--554, 1996.Google ScholarCross Ref
- Michael L Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with 0 (1) worst case access time. Journal of the ACM (JACM), 31(3):538--544, 1984.Google Scholar
- Andrej Brodnik and J Ian Munro. Membership in constant time and minimum space. In Algorithms-ESA'94: Second Annual European Symposium Utrecht, The Netherlands, September 26--28, 1994 Proceedings 2, pages 72--81. Springer, 1994.Google ScholarCross Ref
- Andrej Brodnik and J Ian Munro. Membership in constant time and almost-minimum space. SIAM Journal on computing, 28(5):1627--1640, 1999.Google ScholarDigital Library
- Rasmus Pagh. Low redundancy in static dictionaries with constant query time. SIAM Journal on Computing, 31(2):353--363, 2001.Google ScholarDigital Library
- Ioana Oriana Bercea and Guy Even. A space-efficient dynamic dictionary for multisets with constant time operations. arXiv preprint arXiv:2005.02143, 2020.Google Scholar
- Anna Pagh, Rasmus Pagh, and S Srinivasa Rao. An optimal bloom filter replacement. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, pages 823--829, 2005.Google ScholarDigital Library
Index Terms
- ChainedFilter: Combining Membership Filters by Chain Rule
Recommendations
A linear ordering on the class of Trapezoidal intuitionistic fuzzy numbers
A complete ranking on the class of trapezoidal intuitionistic fuzzy numbers is achieved.In this paper our proposed method is compared with all familiar existing methods.Proposed ranking method is the best one when compare with all existing ...
Ranking of incomplete trapezoidal information
Any information system or decision model which consists of combinations of quantitative, qualitative, imprecise and incomplete informations can be modelled better using trapezoidal intuitionistic fuzzy numbers (TrIFNs) than interval valued ...
Fuzzy rule interpolation based on principle membership functions and uncertainty grade functions of interval type-2 fuzzy sets
Fuzzy rule interpolation is an important research topic in sparse fuzzy rule-based systems. In this paper, we present a new method for dealing with fuzzy rule interpolation in sparse fuzzy rule-based systems based on the principle membership functions ...
Comments