Abstract
In this paper, we propose HyBench, a new benchmark for HTAP databases. First, we generate the testing data by simulating a representative HTAP application. In particular, we develop a time-dependent generation phase and an anomaly generation phase for testing HTAP databases with large cardinality and various anomalies. Second, we propose a set of hybrid workloads. Specifically, we design 18 read/write transactions, 13 analytical queries, and a mixed workload of 6 analytical transactions and 6 interactive queries. We also develop a graph-based parameter curation method to control the access patterns of the hybrid workload, including skewed access and data contention. Third, we propose a unified metric for quantifying the overall HTAP performance. In particular, we introduce a query-driven method that evaluates data freshness (the lag time between analytics and transactions). We then introduce a three-phase execution rule to compute a unified metric that combines the performance of OLTP (TPS), OLAP (QPS), and OLXP (XPS) with data freshness. To verify the effectiveness of HyBench and to debunk the myths surrounding different HTAP architectures, we conduct extensive experiments over five HTAP databases.