skip to main content
research-article
Artifacts Available / v1.1

Asymptotically Better Query Optimization Using Indexed Algebra

Published:01 July 2023Publication History
Skip Abstract Section

Abstract

Query optimization is essential for the efficient execution of queries. The necessary analysis, if we can and should apply optimizations and transform the query plan, is already challenging. Traditional techniques focus on the availability of columns at individual operators, which does not scale for analysis of data flow through the query. Tracking available columns per operator takes quadratic space, which can result in multi-second optimization time for deep algebra trees. Instead, we need to re-think the naïve algebra representation to efficiently support data flow analysis.

In this paper, we introduce Indexed Algebra, a novel representation of relational algebra that makes common optimization tasks efficient. Indexed Algebra enables efficient reasoning with an auxiliary index structure based on link/cut trees that support dynamic updates and queries in O(log n). This approach not only improves the asymptotic complexity, but also allows elegant and concise formulations for the data flow questions needed for query optimization. While large queries see theoretically unbounded improvements, Indexed Algebra also improves optimization time of the relatively harmless queries of TPC-H and TPC-DS by more than 1.8×.

References

  1. Yanif Ahmad, Oliver Kennedy, Christoph Koch, and Milos Nikolic. 2012. DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. Proc. VLDB Endow. 5, 10 (2012), 968--979.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Darshana Balakrishnan, Carl Nuessle, Oliver Kennedy, and Lukasz Ziarek. 2021. TreeToaster: Towards an IVM-Optimized Compiler. In SIGMOD Conference. ACM, 155--167.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Edmon Begoli, Jesús Camacho-Rodríguez, Julian Hyde, Michael J. Mior, and Daniel Lemire. 2018. Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources. In SIGMOD. ACM, 221--230.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Peter A. Boncz, Thomas Neumann, and Orri Erling. 2013. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. In TPCTC (Lecture Notes in Computer Science), Vol. 8391. Springer, 61--76.Google ScholarGoogle Scholar
  5. Nicolas Dieu, Adrian Dragusanu, Françoise Fabret, François Llirbat, and Eric Simon. 2009. 1,000 Tables Under the From. Proc. VLDB Endow. 2, 2 (2009), 1450--1461.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Philipp Fent, Altan Birler, and Thomas Neumann. 2022. Practical planning and execution of groupjoin and nested aggregates. VLDB J. (2022).Google ScholarGoogle Scholar
  7. Michael J. Freitag and Thomas Neumann. 2019. Every Row Counts: Combining Sketches and Sampling for Accurate Group-By Result Estimates. In CIDR. www.cidrdb.org.Google ScholarGoogle Scholar
  8. Goetz Graefe. 1994. Volcano - An Extensible and Parallel Query Evaluation System. IEEE Trans. Knowl. Data Eng. 6, 1 (1994), 120--135.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Goetz Graefe. 1995. The Cascades Framework for Query Optimization. IEEE Data Eng. Bull. 18, 3 (1995), 19--29.Google ScholarGoogle Scholar
  10. Goetz Graefe and David J. DeWitt. 1987. The EXODUS Optimizer Generator. In SIGMOD. ACM Press, 160--172.Google ScholarGoogle Scholar
  11. Torsten Grust. 2002. Accelerating XPath location steps. In SIGMOD Conference. ACM, 109--120.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Dov Harel and Robert Endre Tarjan. 1984. Fast Algorithms for Finding Nearest Common Ancestors. SIAM J. Comput. 13, 2 (1984), 338--355.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Michael Jungmair, André Kohn, and Jana Giceva. 2022. Designing an Open Framework for Query Optimization and Compilation. Proc. VLDB Endow. 15, 11 (2022), 2389--2401.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Timo Kersten, Viktor Leis, and Thomas Neumann. 2021. Tidy Tuples and Flying Start: fast compilation and fast execution of relational queries in Umbra. VLDB J. 30, 5 (2021), 883--905.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Philip N. Klein and Shay Mozes. 2021. Optimization Algorithms for Planar Graphs. https://planarity.org.Google ScholarGoogle Scholar
  16. Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, and Alfons Kemper. 2016. Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. In SIGMOD. ACM, 311--326.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? Proc. VLDB Endow. 9, 3 (2015), 204--215.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Zhenghua Lyu, Huan Hubert Zhang, Gang Xiong, Gang Guo, Haozhou Wang, Jinbao Chen, Asim Praveen, Yu Yang, Xiaoming Gao, Alexandra Wang, Wen Lin, Ashwin Agrawal, Junfeng Yang, Hao Wu, Xiaoliang Li, Feng Guo, Jiang Wu, Jesse Zhang, and Venkatesh Raghavan. 2021. Greenplum: A Hybrid Database for Transactional and Analytical Workloads. In SIGMOD. ACM, 2530--2542.Google ScholarGoogle Scholar
  19. Norman May, Alexander Böhm, and Wolfgang Lehner. 2017. SAP HANA - The Evolution of an In-Memory DBMS from Pure OLAP Processing Towards Mixed Workloads. In BTW (LNI), Vol. P-265. GI, 545--563.Google ScholarGoogle Scholar
  20. Guido Moerkotte. 2020. Building Query Compilers. http://pi3.informatik.uni-mannheim.de/~moer/querycompiler.pdf.Google ScholarGoogle Scholar
  21. Raghunath Othayoth Nambiar and Meikel Poess. 2006. The Making of TPC-DS. In VLDB. ACM, 1049--1058.Google ScholarGoogle Scholar
  22. Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. Proc. VLDB Endow. 4, 9 (2011), 539--550.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Thomas Neumann. 2018. Reasoning in the Presence of NULLs. In ICDE. IEEE Computer Society, 1682--1683.Google ScholarGoogle Scholar
  24. Thomas Neumann. 2020. Linear Time Liveness Analysis. https://databasearchitects.blogspot.com/2020/04/linear-time-liveness-analysis.html.Google ScholarGoogle Scholar
  25. Thomas Neumann. 2020. Taming Deep Recursion. https://databasearchitects.blogspot.com/2020/11/taming-deep-recursion.html.Google ScholarGoogle Scholar
  26. Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In CIDR. www.cidrdb.org.Google ScholarGoogle Scholar
  27. Thomas Neumann and Alfons Kemper. 2015. Unnesting Arbitrary Queries. In BTW (LNI), Vol. P-241. GI, 383--402.Google ScholarGoogle Scholar
  28. Thomas Neumann and Bernhard Radke. 2018. Adaptive Optimization of Very Large Join Queries. In SIGMOD. ACM, 677--692.Google ScholarGoogle Scholar
  29. Patrick E. O'Neil, Elizabeth J. O'Neil, Shankar Pal, Istvan Cseri, Gideon Schaller, and Nigel Westbury. 2004. ORDPATHs: Insert-Friendly XML Node Labels. In SIGMOD Conference. ACM, 903--908.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Ravindra Pindikura. 2018. Gandiva Initiative: Improving SQL Performance by 70x. https://www.dremio.com/gandiva-performance-improvements-production-query/.Google ScholarGoogle Scholar
  31. Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. 1979. Access Path Selection in a Relational Database Management System. In SIGMOD Conference. ACM, 23--34.Google ScholarGoogle Scholar
  32. Hesam Shahrokhi and Amir Shaikhha. 2023. Building a Compiled Query Engine in Python. In CC. ACM, 180--190.Google ScholarGoogle Scholar
  33. Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, and Christoph Koch. 2016. How to Architect a Query Compiler. In SIGMOD. ACM, 1907--1922.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Daniel Dominic Sleator. 2011. Submission #860934 - Codeforces. https://codeforces.com/contest/117/submission/860934.Google ScholarGoogle Scholar
  35. Daniel Dominic Sleator and Robert Endre Tarjan. 1983. A Data Structure for Dynamic Trees. J. Comput. Syst. Sci. 26, 3 (1983), 362--391.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Daniel Dominic Sleator and Robert Endre Tarjan. 1985. Self-Adjusting Binary Search Trees. J. ACM 32, 3 (1985), 652--686.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, and Rhonda Baldwin. 2014. Orca: a modular query optimizer architecture for big data. In SIGMOD. ACM, 337--348.Google ScholarGoogle Scholar
  38. Ruby Y. Tahboub, Grégory M. Essertel, and Tiark Rompf. 2018. How to Architect a Query Compiler, Revisited. In SIGMOD Conference. ACM, 307--322.Google ScholarGoogle Scholar
  39. Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, Alfons Kemper, Viktor Leis, Tobias Mühlbauer, Thomas Neumann, and Manuel Then. 2018. Get Real: How Benchmarks Fail to Represent the Real World. In DBTest@SIGMOD. ACM, 1:1--1:6.Google ScholarGoogle Scholar
  40. Adrian Vogelsgesang, Tobias Mühlbauer, Viktor Leis, Thomas Neumann, and Alfons Kemper. 2019. Domain Query Optimization: Adapting the General-Purpose Database System Hyper for Tableau Workloads. In BTW (LNI), Vol. P-289. Gesellschaft für Informatik, Bonn, 313--333.Google ScholarGoogle Scholar

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Article Metrics

    • Downloads (Last 12 months)95
    • Downloads (Last 6 weeks)10

    Other Metrics

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader