Abstract
Hybrid transactional/analytical processing (HTAP) can overload database systems. To alleviate performance interference between transactions and analytics, recent research explores in-storage processing (ISP) using commodity computational storage devices (CSDs). However, in-storage query processing faces technical challenges in HTAP environments. Continuously updated data versions pose two hurdles: (1) data items keep changing, and (2) finding visible data versions incurs excessive data access in CSDs. Such access patterns dominate the cost of query processing, which may hinder the active deployment of CSDs.
This paper addresses these core issues by proposing an analytic offload engine (AIDE) that transforms engine-specific query execution logic into vendor-neutral computation through a canonical interface. At the core of AIDE are the canonical representation of vendor-specific data and the separate management of data locators. Together, they enable any CSD to execute vendor-neutral operations on canonical tuples with separate indexes, regardless of the host database. To eliminate excessive data access, we prescreen the indexes on the host before offloading, which obviates costly version searching in CSDs and speeds up analytics. We implemented our prototype for PostgreSQL and MyRocks, demonstrating that AIDE supports efficient ISP for both databases using the same FPGA logic. Evaluation results show that AIDE improves query latency by up to 42× on PostgreSQL and 34× on MyRocks.
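The host-side prescreening idea can be illustrated with a minimal sketch. It is not AIDE's implementation: the timestamp-interval visibility rule, the `VersionEntry` layout, and the function names `prescreen` and `device_scan` are all assumptions chosen for illustration. The host filters index entries down to the locators of versions visible to a snapshot, so the device-side scan touches only those canonical tuples and never searches version chains.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VersionEntry:
    """Hypothetical index entry kept separately from the tuple data."""
    key: int
    locator: int            # assumed: position of the canonical tuple on the device
    begin_ts: int           # timestamp at which this version became valid
    end_ts: Optional[int]   # timestamp at which it was superseded; None if still live


def is_visible(entry: VersionEntry, snapshot_ts: int) -> bool:
    # Assumed visibility rule: the version is valid during [begin_ts, end_ts).
    return entry.begin_ts <= snapshot_ts and (
        entry.end_ts is None or entry.end_ts > snapshot_ts
    )


def prescreen(index: List[VersionEntry], snapshot_ts: int) -> List[int]:
    """Host side: keep only the locators of versions visible to the snapshot."""
    return [e.locator for e in index if is_visible(e, snapshot_ts)]


def device_scan(
    storage: Dict[int, Tuple[int, int]],
    locators: List[int],
    predicate: Callable[[Tuple[int, int]], bool],
) -> List[Tuple[int, int]]:
    """Device side (simulated): evaluate a predicate on prescreened tuples only."""
    return [storage[loc] for loc in locators if predicate(storage[loc])]


# Toy data: key 1 has two versions, key 2 is created late, key 3 is deleted early.
index = [
    VersionEntry(key=1, locator=0, begin_ts=5, end_ts=10),
    VersionEntry(key=1, locator=1, begin_ts=10, end_ts=None),
    VersionEntry(key=2, locator=2, begin_ts=12, end_ts=None),
    VersionEntry(key=3, locator=3, begin_ts=3, end_ts=8),
]
storage = {0: (1, 100), 1: (1, 150), 2: (2, 70), 3: (3, 40)}

visible = prescreen(index, snapshot_ts=7)
result = device_scan(storage, visible, lambda t: t[1] > 50)
```

In this toy setup the device receives a flat list of locators and needs no knowledge of the host database's versioning scheme, which mirrors the separation between canonical tuples and data locators described in the abstract.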