research-article
Artifacts Available / v1.1

DuckPGQ: Bringing SQL/PGQ to DuckDB

Published: 01 August 2023
Abstract

We demonstrate the most important new feature of SQL:2023, namely SQL/PGQ, which eases querying graphs using SQL by introducing new syntax for pattern matching and (shortest) path-finding. We show how support for SQL/PGQ can be integrated into an RDBMS, specifically in the DuckDB system, using an extension module called DuckPGQ. As such, we also demonstrate the use of the DuckDB extensibility mechanism, which allows us to add new functions, data types, operators, optimizer rules, storage systems, and even parsers to DuckDB. We also describe the new data structures and algorithms that the DuckPGQ module is based on, and how they are injected into SQL plans.

While the demonstrated DuckPGQ extension module is lean and efficient, we sketch a roadmap to (i) improve its performance through new algorithms (factorized and WCOJ) and better parallelism and (ii) extend its functionality to scenarios beyond SQL, e.g., building and analyzing Graph Neural Networks.
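
To make concrete what pattern matching and shortest path-finding replace in plain SQL, the following is a minimal sketch and not DuckPGQ code. It uses Python's standard-library sqlite3 module rather than DuckDB. The tables `person` and `knows`, the graph name `social`, and the helper function names are all hypothetical. The SQL/PGQ query shown in the comment follows the general GRAPH_TABLE ... MATCH ... COLUMNS form of SQL:2023 and is not executed or checked against DuckPGQ.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory graph stored as ordinary tables (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE knows (src INTEGER, dst INTEGER);
        INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
        INSERT INTO knows VALUES (1, 2), (2, 3), (3, 4), (1, 3);
    """)
    return con


# For comparison only (not executed here): in SQL/PGQ the one-hop pattern
# below would be written roughly as
#   SELECT * FROM GRAPH_TABLE (social
#     MATCH (a:person)-[k:knows]->(b:person)
#     COLUMNS (a.name AS a_name, b.name AS b_name));


def one_hop_pairs(con):
    """The pattern (a)-[knows]->(b) spelled out as explicit joins."""
    return con.execute("""
        SELECT a.name, b.name
        FROM person a
        JOIN knows k ON k.src = a.id
        JOIN person b ON b.id = k.dst
        ORDER BY a.name, b.name
    """).fetchall()


def shortest_hops(con, src, dst, max_hops=10):
    """Unweighted shortest path length via a recursive CTE.

    Returns None when dst is not reachable from src within max_hops edges.
    """
    row = con.execute("""
        WITH RECURSIVE reach(node, hops) AS (
            SELECT ?, 0
            UNION
            SELECT k.dst, r.hops + 1
            FROM reach r JOIN knows k ON k.src = r.node
            WHERE r.hops < ?
        )
        SELECT MIN(hops) FROM reach WHERE node = ?
    """, (src, max_hops, dst)).fetchone()
    return row[0]


if __name__ == "__main__":
    con = build_demo_db()
    print(one_hop_pairs(con))
    print(shortest_hops(con, 1, 4))
```

The recursive CTE enumerates every reachable (node, hops) pair up to the bound before taking the minimum, which is the kind of verbose and potentially wasteful formulation that dedicated path-finding syntax is meant to avoid.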
