Artifacts Available / v1.1

Deploying Computational Storage for HTAP DBMSs Takes More Than Just Computation Offloading

Published: 01 February 2023

Abstract

Hybrid transactional/analytical processing (HTAP) can overload database systems. To alleviate performance interference between transactions and analytics, recent research explores in-storage processing (ISP) on commodity computational storage devices (CSDs). However, in-storage query processing faces technical challenges in HTAP environments. Continuously updated data versions pose two hurdles: (1) data items keep changing, and (2) finding visible data versions incurs excessive data access in CSDs. Such access patterns dominate the cost of query processing, which may hinder the wider deployment of CSDs.
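To make hurdle (2) concrete, here is a minimal sketch of why version searching is access-heavy. It assumes a simplified newest-to-oldest version chain with begin/end commit timestamps and a simple snapshot visibility rule; this is not the exact visibility logic of PostgreSQL or MyRocks, and the names `Version` and `find_visible` are hypothetical. Each hop along the chain stands in for a separate data access, which a CSD would pay as a storage read.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Version:
    value: str
    begin_ts: int                       # commit timestamp of the creating writer (assumed model)
    end_ts: Optional[int]               # commit timestamp of the superseding writer; None = still live
    older: Optional["Version"] = None   # pointer to the next-older version


def find_visible(newest: Optional[Version], snapshot_ts: int) -> Tuple[Optional[Version], int]:
    """Walk the chain from newest to oldest; return (visible version, versions touched)."""
    touched = 0
    v = newest
    while v is not None:
        touched += 1  # each hop is one more data access (e.g. a flash read inside a CSD)
        if v.begin_ts <= snapshot_ts and (v.end_ts is None or v.end_ts > snapshot_ts):
            return v, touched
        v = v.older
    return None, touched
```

With a chain of four versions committed at timestamps 10, 20, 30, and 40, a snapshot taken at 15 must touch all four versions to find the oldest one, while a snapshot at 45 finds the newest one in a single access. Under frequent updates, long-running analytic snapshots fall into the expensive case.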

This paper addresses these core issues by proposing an analytic offload engine (AIDE) that transforms engine-specific query execution logic into vendor-neutral computation through a canonical interface. At the core of AIDE are a canonical representation of vendor-specific data and separate management of data locators. Together, these allow any CSD to execute vendor-neutral operations on canonical tuples with separate indexes, regardless of the host database. To eliminate excessive data access, AIDE prescreens the indexes on the host before offloading; host-side prescreening can thus obviate costly version searching in CSDs and speed up analytics. We implemented an AIDE prototype for PostgreSQL and MyRocks, demonstrating that AIDE supports efficient ISP for both databases using the same FPGA logic. Evaluation results show that AIDE improves query latency by up to 42× on PostgreSQL and 34× on MyRocks.
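The division of labor between host-side prescreening and in-storage filtering can be sketched as follows. This is an illustrative model, not AIDE's implementation: the canonical tuple is a plain Python tuple standing in for a canonical layout that the abstract does not specify, and `IndexEntry`, `prescreen`, and `device_filter` are hypothetical names. The host resolves one visible version locator per key using only its index (reusing a simplified begin/end timestamp visibility rule), so the device never walks version chains; it reads just the listed locators and applies an engine-neutral predicate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical canonical tuple: engine-neutral column values (id, name, price).
CanonicalTuple = Tuple[int, str, float]


@dataclass
class IndexEntry:
    locator: int           # position of this version in device storage (assumed model)
    begin_ts: int          # commit timestamp of the creating writer
    end_ts: Optional[int]  # commit timestamp of the superseding writer; None = still live


def prescreen(index: Dict[int, List[IndexEntry]], snapshot_ts: int) -> List[int]:
    """Host side: choose the single visible version locator per key using only the index."""
    locators = []
    for entries in index.values():
        for e in entries:
            if e.begin_ts <= snapshot_ts and (e.end_ts is None or e.end_ts > snapshot_ts):
                locators.append(e.locator)
                break  # at most one version per key is visible to a snapshot
    return sorted(locators)


def device_filter(storage: List[CanonicalTuple], locators: List[int],
                  predicate: Callable[[CanonicalTuple], bool]) -> List[CanonicalTuple]:
    """Device side: read only the prescreened locators and apply a vendor-neutral predicate."""
    return [storage[loc] for loc in locators if predicate(storage[loc])]


# Usage: two keys, each with an old and a new version stored side by side.
storage = [(1, "a", 5.0), (1, "a", 9.0), (2, "b", 3.0), (2, "b", 7.0)]
index = {
    1: [IndexEntry(1, 20, None), IndexEntry(0, 10, 20)],
    2: [IndexEntry(3, 25, None), IndexEntry(2, 10, 25)],
}
old_view = device_filter(storage, prescreen(index, 15), lambda t: t[2] > 4.0)
new_view = device_filter(storage, prescreen(index, 30), lambda t: t[2] > 4.0)
```

Here `old_view` is `[(1, "a", 5.0)]` and `new_view` is `[(1, "a", 9.0), (2, "b", 7.0)]`. Because the device only ever sees locators and canonical tuples, the same filtering logic serves any host engine that can produce them, which mirrors the vendor-neutral goal described above.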

