A scalable, predictable join operator for highly concurrent data warehouses

Published: 01 August 2009

Abstract

Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention, because the physical plans, unaware of each other, compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly when multiple complex queries run at the same time.

We describe an augmentation of traditional query engines that improves join throughput in large-scale concurrent data warehouses. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that can share I/O, computation, and tuple storage across all in-flight join queries. We use an "always-on" pipeline of non-blocking operators, coupled with a controller that continuously examines the current query mix and performs run-time optimizations. Our design allows the query engine to scale gracefully to large data sets, provide predictable execution times, and reduce contention. In our empirical evaluation, we found that our prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.
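
The idea of a single plan shared by all in-flight join queries can be illustrated with a small sketch. It is not the authors' implementation: it assumes a star-schema layout, tags each fact tuple with a bitmask of the queries that still want it, and answers only COUNT(*) queries. The names `Query`, `build_dim_filters`, and `shared_scan`, the `<dimension>_key` column convention, and the sample data are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Query:
    """Hypothetical star query: optional predicate per dimension, COUNT(*) result."""
    qid: int
    dim_predicates: Dict[str, Callable[[Row], bool]]


def build_dim_filters(dims: Dict[str, Dict[int, Row]],
                      queries: List[Query]) -> Dict[str, Dict[int, int]]:
    """For each dimension, map key -> bitmask of queries that accept that row."""
    filters: Dict[str, Dict[int, int]] = {}
    for name, rows in dims.items():
        table: Dict[int, int] = {}
        for key, row in rows.items():
            mask = 0
            for q in queries:
                pred = q.dim_predicates.get(name)
                # A query with no predicate on this dimension accepts every row.
                if pred is None or pred(row):
                    mask |= 1 << q.qid
            table[key] = mask
        filters[name] = table
    return filters


def shared_scan(fact_rows: List[Row],
                dims: Dict[str, Dict[int, Row]],
                queries: List[Query]) -> Dict[int, int]:
    """Answer every query with one pass over the fact table."""
    filters = build_dim_filters(dims, queries)
    all_queries = 0
    for q in queries:
        all_queries |= 1 << q.qid
    counts = {q.qid: 0 for q in queries}
    for fact in fact_rows:  # the single scan shared by all queries
        mask = all_queries
        for name, table in filters.items():
            # Keep only the queries whose predicate the joined dimension row satisfies.
            mask &= table.get(fact[name + "_key"], 0)
            if mask == 0:  # no query wants this tuple any more
                break
        for q in queries:
            if mask & (1 << q.qid):
                counts[q.qid] += 1
    return counts


# Illustrative data (assumed, not from the paper).
dims = {
    "store": {1: {"region": "EU"}, 2: {"region": "US"}},
    "product": {10: {"category": "toys"}, 11: {"category": "books"}},
}
fact_rows = [
    {"store_key": 1, "product_key": 10},
    {"store_key": 1, "product_key": 11},
    {"store_key": 2, "product_key": 10},
    {"store_key": 2, "product_key": 11},
    {"store_key": 1, "product_key": 10},
]
queries = [
    Query(0, {"store": lambda r: r["region"] == "EU"}),
    Query(1, {"store": lambda r: r["region"] == "US",
              "product": lambda r: r["category"] == "toys"}),
    Query(2, {}),
]
counts = shared_scan(fact_rows, dims, queries)
print(counts)  # {0: 3, 1: 1, 2: 5}
```

In this sketch the fact table is read once regardless of how many queries are registered, and each dimension lookup is performed once per tuple rather than once per query. That is the kind of sharing of I/O and computation the abstract describes, although the actual operator pipeline and run-time controller are not modelled here.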
