Abstract
We demonstrate the most important new feature of SQL:2023, namely SQL/PGQ, which makes graph queries easier to express in SQL by introducing new syntax for pattern matching and (shortest) path-finding. We show how support for SQL/PGQ can be integrated into an RDBMS, specifically DuckDB, through an extension module called DuckPGQ. In doing so, we also demonstrate DuckDB's extensibility mechanism, which allows new functions, data types, operators, optimizer rules, storage systems, and even parsers to be added to DuckDB. Finally, we describe the new data structures and algorithms on which the DuckPGQ module is based, and how they are injected into SQL query plans.
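The abstract does not name the data structures involved. As a hedged illustration only, assume a compressed sparse row (CSR) adjacency index, a common layout for in-memory graph traversal. The Python sketch below builds a CSR from an edge table and answers an unweighted shortest-path query with breadth-first search. It is not DuckPGQ's code. The function names `build_csr` and `shortest_path_length` are hypothetical, and vertex IDs are assumed to be dense integers `0..n-1`. In a relational setting, row identifiers would first have to be mapped to such IDs.

```python
from collections import deque
from typing import List, Optional, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a compressed sparse row (CSR) index from a directed edge list.

    Assumes dense vertex IDs 0..num_vertices-1 (illustrative assumption).
    The neighbours of vertex v are targets[offsets[v]:offsets[v + 1]].
    """
    # Count the out-degree of every source vertex (shifted by one slot).
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # A prefix sum turns the degrees into start offsets.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Scatter each edge target into the next free slot of its source.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so offsets itself is untouched
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def shortest_path_length(offsets: List[int], targets: List[int],
                         source: int, destination: int) -> Optional[int]:
    """Unweighted shortest-path length via breadth-first search; None if unreachable."""
    if source == destination:
        return 0
    distance = [-1] * (len(offsets) - 1)  # -1 marks "not yet visited"
    distance[source] = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for w in targets[offsets[v]:offsets[v + 1]]:
            if distance[w] == -1:
                distance[w] = distance[v] + 1
                if w == destination:
                    return distance[w]
                frontier.append(w)
    return None


# Toy edge table: 0->1, 1->2, 2->3, 0->2
offsets, targets = build_csr(4, [(0, 1), (1, 2), (2, 3), (0, 2)])
print(offsets, targets)                              # [0, 2, 3, 4, 4] [1, 2, 2, 3]
print(shortest_path_length(offsets, targets, 0, 3))  # 2
print(shortest_path_length(offsets, targets, 3, 0))  # None
```

The CSR keeps each vertex's neighbours contiguous in one array. Each BFS expansion therefore becomes a sequential scan over a slice rather than a probe into a join hash table. This is the main appeal of building such an index once per query over the edge table.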
Although the demonstrated DuckPGQ extension module is already lean and efficient, we sketch a roadmap to (i) improve its performance through new algorithms (factorized algorithms and worst-case optimal joins, WCOJ) and better parallelism, and (ii) extend its functionality to scenarios beyond SQL, e.g., building and analyzing Graph Neural Networks.
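For a cyclic pattern such as a triangle, a binary join plan first materializes every two-hop path, and those paths can be far more numerous than the triangles. Worst-case optimal join algorithms instead bind one pattern variable at a time and intersect the candidate sets from every relation that mentions that variable. The Python sketch below applies this idea in a generic, Generic-Join-style form to the triangle pattern E(a, b), E(b, c), E(a, c), with the edge relation held as sorted adjacency lists. It is not taken from DuckPGQ, and the helper names `intersect` and `triangles` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def intersect(xs: List[int], xs_set: Set[int], ys: List[int], ys_set: Set[int]) -> List[int]:
    """Intersect two sorted neighbour lists by scanning the smaller one and probing the other."""
    if len(xs) <= len(ys):
        return [x for x in xs if x in ys_set]
    return [y for y in ys if y in xs_set]


def triangles(edges: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Enumerate all (a, b, c) with a->b, b->c and a->c, binding one variable at a time."""
    adj: Dict[int, List[int]] = {}
    for src, dst in set(edges):  # deduplicate the edge relation
        adj.setdefault(src, []).append(dst)
    for nbrs in adj.values():
        nbrs.sort()
    adj_set = {v: set(nbrs) for v, nbrs in adj.items()}

    result = []
    for a in sorted(adj):            # bind a: any vertex with outgoing edges
        for b in adj[a]:             # bind b: candidates come from E(a, b)
            # bind c: it must satisfy E(b, c) AND E(a, c), so intersect both
            # candidate sets instead of materialising all paths a->b->c first.
            for c in intersect(adj.get(b, []), adj_set.get(b, set()), adj[a], adj_set[a]):
                result.append((a, b, c))
    return result


print(triangles([(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]))  # [(0, 1, 2), (1, 2, 3)]
```

Scanning the smaller list and probing a hash set of the larger one keeps each intersection proportional to the smaller candidate set. Bounding intersection cost this way is what lets this family of algorithms avoid the blow-up of pairwise joins on cyclic patterns.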