research-article

Asymptotically Better Query Optimization Using Indexed Algebra

Authors:
Philipp Fent

Technische Universität München

Technische Universität München
View Profile

,
Guido Moerkotte

Universität Mannheim

Universität Mannheim
View Profile

,
Thomas Neumann

Technische Universität München

Technische Universität München
View Profile

Authors Info & Claims

Proceedings of the VLDB Endowment Volume 16 Issue 11pp 3018–3030https://doi.org/10.14778/3611479.3611505

Published:01 July 2023Publication History

Proceedings of the VLDB Endowment

Abstract

Query optimization is essential for the efficient execution of queries. The necessary analysis, if we can and should apply optimizations and transform the query plan, is already challenging. Traditional techniques focus on the availability of columns at individual operators, which does not scale for analysis of data flow through the query. Tracking available columns per operator takes quadratic space, which can result in multi-second optimization time for deep algebra trees. Instead, we need to re-think the naïve algebra representation to efficiently support data flow analysis.

In this paper, we introduce Indexed Algebra, a novel representation of relational algebra that makes common optimization tasks efficient. Indexed Algebra enables efficient reasoning with an auxiliary index structure based on link/cut trees that support dynamic updates and queries in O(log n). This approach not only improves the asymptotic complexity, but also allows elegant and concise formulations for the data flow questions needed for query optimization. While large queries see theoretically unbounded improvements, Indexed Algebra also improves optimization time of the relatively harmless queries of TPC-H and TPC-DS by more than 1.8×.

References

Yanif Ahmad, Oliver Kennedy, Christoph Koch, and Milos Nikolic. 2012. DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. Proc. VLDB Endow. 5, 10 (2012), 968--979.Google ScholarDigital Library
Darshana Balakrishnan, Carl Nuessle, Oliver Kennedy, and Lukasz Ziarek. 2021. TreeToaster: Towards an IVM-Optimized Compiler. In SIGMOD Conference. ACM, 155--167.Google ScholarDigital Library
Edmon Begoli, Jesús Camacho-Rodríguez, Julian Hyde, Michael J. Mior, and Daniel Lemire. 2018. Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources. In SIGMOD. ACM, 221--230.Google ScholarDigital Library
Peter A. Boncz, Thomas Neumann, and Orri Erling. 2013. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. In TPCTC (Lecture Notes in Computer Science), Vol. 8391. Springer, 61--76.Google Scholar
Nicolas Dieu, Adrian Dragusanu, Françoise Fabret, François Llirbat, and Eric Simon. 2009. 1,000 Tables Under the From. Proc. VLDB Endow. 2, 2 (2009), 1450--1461.Google ScholarDigital Library
Philipp Fent, Altan Birler, and Thomas Neumann. 2022. Practical planning and execution of groupjoin and nested aggregates. VLDB J. (2022).Google Scholar
Michael J. Freitag and Thomas Neumann. 2019. Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates. In CIDR. www.cidrdb.org.Google Scholar
Goetz Graefe. 1994. Volcano - An Extensible and Parallel Query Evaluation System. IEEE Trans. Knowl. Data Eng. 6, 1 (1994), 120--135.Google ScholarDigital Library
Goetz Graefe. 1995. The Cascades Framework for Query Optimization. IEEE Data Eng. Bull. 18, 3 (1995), 19--29.Google Scholar
Goetz Graefe and David J. DeWitt. 1987. The EXODUS Optimizer Generator. In SIGMOD. ACM Press, 160--172.Google Scholar
Torsten Grust. 2002. Accelerating XPath location steps. In SIGMOD Conference. ACM, 109--120.Google ScholarDigital Library
Dov Harel and Robert Endre Tarjan. 1984. Fast Algorithms for Finding Nearest Common Ancestors. SIAM J. Comput. 13, 2 (1984), 338--355.Google ScholarDigital Library
Michael Jungmair, André Kohn, and Jana Giceva. 2022. Designing an Open Framework for Query Optimization and Compilation. Proc. VLDB Endow. 15, 11 (2022), 2389--2401.Google ScholarDigital Library
Timo Kersten, Viktor Leis, and Thomas Neumann. 2021. Tidy Tuples and Flying Start: fast compilation and fast execution of relational queries in Umbra. VLDB J. 30, 5 (2021), 883--905.Google ScholarDigital Library
Philip N. Klein and Shay Mozes. 2021. Optimization Algorithms for Planar Graphs. https://planarity.org.Google Scholar
Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, and Alfons Kemper. 2016. Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. In SIGMOD. ACM, 311--326.Google ScholarDigital Library
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? Proc. VLDB Endow. 9, 3 (2015), 204--215.Google ScholarDigital Library
Zhenghua Lyu, Huan Hubert Zhang, Gang Xiong, Gang Guo, Haozhou Wang, Jinbao Chen, Asim Praveen, Yu Yang, Xiaoming Gao, Alexandra Wang, Wen Lin, Ashwin Agrawal, Junfeng Yang, Hao Wu, Xiaoliang Li, Feng Guo, Jiang Wu, Jesse Zhang, and Venkatesh Raghavan. 2021. Greenplum: A Hybrid Database for Transactional and Analytical Workloads. In SIGMOD. ACM, 2530--2542.Google Scholar
Norman May, Alexander Böhm, and Wolfgang Lehner. 2017. SAP HANA - The Evolution of an In-Memory DBMS from Pure OLAP Processing Towards Mixed Workloads. In BTW (LNI), Vol. P-265. GI, 545--563.Google Scholar
Guido Moerkotte. 2020. Building Query Compilers. http://pi3.informatik.uni-mannheim.de/~moer/querycompiler.pdf.Google Scholar
Raghunath Othayoth Nambiar and Meikel Poess. 2006. The Making of TPC-DS. In VLDB. ACM, 1049--1058.Google Scholar
Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. Proc. VLDB Endow. 4, 9 (2011), 539--550.Google ScholarDigital Library
Thomas Neumann. 2018. Reasoning in the Presence of NULLs. In ICDE. IEEE Computer Society, 1682--1683.Google Scholar
Thomas Neumann. 2020. Linear Time Liveness Analysis. https://databasearchitects.blogspot.com/2020/04/linear-time-liveness-analysis.html.Google Scholar
Thomas Neumann. 2020. Taming Deep Recursion. https://databasearchitects.blogspot.com/2020/11/taming-deep-recursion.html.Google Scholar
Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In CIDR. www.cidrdb.org.Google Scholar
Thomas Neumann and Alfons Kemper. 2015. Unnesting Arbitrary Queries. In BTW (LNI), Vol. P-241. GI, 383--402.Google Scholar
Thomas Neumann and Bernhard Radke. 2018. Adaptive Optimization of Very Large Join Queries. In SIGMOD. ACM, 677--692.Google Scholar
Patrick E. O'Neil, Elizabeth J. O'Neil, Shankar Pal, Istvan Cseri, Gideon Schaller, and Nigel Westbury. 2004. ORDPATHs: Insert-Friendly XML Node Labels. In SIGMOD Conference. ACM, 903--908.Google ScholarDigital Library
Ravindra Pindikura. 2018. Gandiva Initiative: Improving SQL Performance by 70x. https://www.dremio.com/gandiva-performance-improvements-production-query/.Google Scholar
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. 1979. Access Path Selection in a Relational Database Management System. In SIGMOD Conference. ACM, 23--34.Google Scholar
Hesam Shahrokhi and Amir Shaikhha. 2023. Building a Compiled Query Engine in Python. In CC. ACM, 180--190.Google Scholar
Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, and Christoph Koch. 2016. How to Architect a Query Compiler. In SIGMOD. ACM, 1907--1922.Google ScholarDigital Library
Daniel Dominic Sleator. 2011. Submission #860934 - Codeforces. https://codeforces.com/contest/117/submission/860934.Google Scholar
Daniel Dominic Sleator and Robert Endre Tarjan. 1983. A Data Structure for Dynamic Trees. J. Comput. Syst. Sci. 26, 3 (1983), 362--391.Google ScholarDigital Library
Daniel Dominic Sleator and Robert Endre Tarjan. 1985. Self-Adjusting Binary Search Trees. J. ACM 32, 3 (1985), 652--686.Google ScholarDigital Library
Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, and Rhonda Baldwin. 2014. Orca: a modular query optimizer architecture for big data. In SIGMOD. ACM, 337--348.Google Scholar
Ruby Y. Tahboub, Grégory M. Essertel, and Tiark Rompf. 2018. How to Architect a Query Compiler, Revisited. In SIGMOD Conference. ACM, 307--322.Google Scholar
Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, Alfons Kemper, Viktor Leis, Tobias Mühlbauer, Thomas Neumann, and Manuel Then. 2018. Get Real: How Benchmarks Fail to Represent the Real World. In DBTest@SIGMOD. ACM, 1:1--1:6.Google Scholar
Adrian Vogelsgesang, Tobias Mühlbauer, Viktor Leis, Thomas Neumann, and Alfons Kemper. 2019. Domain Query Optimization: Adapting the General-Purpose Database System Hyper for Tableau Workloads. In BTW (LNI), Vol. P-289. Gesellschaft für Informatik, Bonn, 313--333.Google Scholar

Recommendations

Distributed Query Plan Generation using Ant Colony Optimization

Query processing is a critical performance evaluation parameter and has received a considerable amount of attention especially in the context of distributed database systems. The aim of distributed query processing is to effectively and efficiently ...
Read More
Distributed Query Plan Generation using Particle Swarm Optimization

A large number of queries are posed on databases spread across the globe. In order to process these queries efficiently, optimal query processing strategies that generate efficient query processing plans are being devised. In distributed relational ...
Read More
Query optimization using views in semistructured databases
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 16, Issue 11
July 2023
789 pages
ISSN:2150-8097
Editors:
Georgia Koutrika
Athena Research Center
,
Jun Yang
Duke University
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 July 2023
Published in pvldb Volume 16, Issue 11

Check for updates
Badges
- Artifacts Available / v1.1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 95
  Total Downloads
- Downloads (Last 12 months)95
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Asymptotically Better Query Optimization Using Indexed Algebra

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Distributed Query Plan Generation using Ant Colony Optimization

Distributed Query Plan Generation using Particle Swarm Optimization

Query optimization using views in semistructured databases