research-article

Optimization of conjunctive predicates for main memory column stores

Published: 1 August 2016

Abstract

Optimization of queries with conjunctive predicates for main memory databases remains a challenging task. The traditional way of optimizing this class of queries relies on predicate ordering based on selectivities or ranks. However, the problem is considerably harder than predicate ordering alone suggests: it requires a holistic approach that combines (1) an accurate cost model that is aware of CPU architectural characteristics such as branch (mis)prediction, (2) a storage layer that allows for streamlined query execution, (3) a common subexpression elimination technique that minimizes column access costs, and (4) an optimization algorithm able to pick the optimal plan even in the presence of a small (bounded) estimation error. In this work, we embrace the holistic approach and show its superiority experimentally.
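As a point of reference, the traditional rank-based ordering can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes short-circuit evaluation, independent selectivities, and constant per-tuple costs (the very assumptions the paper rejects), and uses the classic rank (selectivity - 1) / cost, sorted ascending. The names `expected_cost`, `rank_order`, and `brute_force_order` and the sample predicates are hypothetical. Under these assumptions, rank ordering agrees with an exhaustive search over all orders.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical predicate descriptor: (name, selectivity, cost per tuple),
# where selectivity is the fraction of tuples that pass the predicate.
Predicate = Tuple[str, float, float]


def expected_cost(order: Sequence[Predicate]) -> float:
    """Expected per-tuple cost of short-circuit evaluation in the given order,
    assuming independent selectivities and constant per-predicate costs."""
    total = 0.0
    reach = 1.0  # fraction of tuples that reach the current predicate
    for _name, sel, cost in order:
        total += reach * cost
        reach *= sel
    return total


def rank_order(preds: Sequence[Predicate]) -> List[Predicate]:
    """Classic rank ordering: sort by (selectivity - 1) / cost, ascending."""
    return sorted(preds, key=lambda p: (p[1] - 1.0) / p[2])


def brute_force_order(preds: Sequence[Predicate]) -> List[Predicate]:
    """Exhaustive search over all n! orders (only feasible for small n)."""
    return list(min(permutations(preds), key=expected_cost))


preds = [("a", 0.5, 1.0), ("b", 0.1, 4.0), ("c", 0.9, 0.5)]
best = rank_order(preds)
print([p[0] for p in best], f"{expected_cost(best):.3f}")  # ['a', 'b', 'c'] 3.025
```

Once per-predicate costs depend on selectivity (for example through branch misprediction) or selectivities are correlated, the simple expected-cost formula above no longer describes actual execution, which is why the paper argues for a richer cost model.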

Current approaches typically base their optimization algorithms on at least one of two assumptions: (1) that predicate selectivities are independent, and (2) that predicate costs are constant. Our approach relies on neither assumption, since in general neither holds.
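To illustrate why assumption (1) can fail, the sketch below compares the independence-based estimate of a conjunction's selectivity (the product of the individual selectivities) with the actual fraction of qualifying rows, evaluating each predicate column-at-a-time into a boolean mask. The two columns are hypothetical and deliberately correlated, and the function names are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple


def selectivity(mask: Sequence[bool]) -> float:
    """Fraction of rows whose entry in the boolean mask is True."""
    return sum(mask) / len(mask)


def independence_check(
    col_a: Sequence[int], pred_a: Callable[[int], bool],
    col_b: Sequence[int], pred_b: Callable[[int], bool],
) -> Tuple[float, float]:
    """Return (estimate under independence, actual selectivity) of pred_a AND pred_b."""
    # Evaluate each predicate on its own column, as a column store would.
    mask_a: List[bool] = [pred_a(v) for v in col_a]
    mask_b: List[bool] = [pred_b(v) for v in col_b]
    joint = [x and y for x, y in zip(mask_a, mask_b)]
    estimated = selectivity(mask_a) * selectivity(mask_b)
    actual = selectivity(joint)
    return estimated, actual


# Hypothetical columns: b is a scaled copy of a, so the predicates are fully correlated.
a = list(range(1, 11))
b = [10 * v for v in a]
est, act = independence_check(a, lambda v: v > 5, b, lambda v: v > 50)
print(f"estimated={est:.2f} actual={act:.2f}")  # estimated=0.25 actual=0.50
```

Here an optimizer trusting the independence estimate would underestimate the conjunction's output by a factor of two; with anti-correlated predicates the error goes the other way.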

Published in: Proceedings of the VLDB Endowment, Volume 9, Issue 12, August 2016, 345 pages. ISSN 2150-8097.
Publisher: VLDB Endowment
