Workload-Based Placement and Join Processing in Node-Partitioned Data Warehouses

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3181)

Abstract

Data warehouses (DW) holding enormous quantities of data pose major performance and scalability challenges. The Node-Partitioned Data Warehouse (NPDW) divides the DW across inexpensive computer nodes for scalability. Partitioning and data placement strategies affect the performance of complex queries on the NPDW. In this paper we propose a partitioning, placement, and join processing strategy to boost the performance of costly joins in the NPDW, compare alternative strategies using the TPC-H performance evaluation benchmark, and draw conclusions.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Furtado, P. (2004). Workload-Based Placement and Join Processing in Node-Partitioned Data Warehouses. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2004. Lecture Notes in Computer Science, vol 3181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30076-2_4

  • DOI: https://doi.org/10.1007/978-3-540-30076-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22937-7

  • Online ISBN: 978-3-540-30076-2

  • eBook Packages: Springer Book Archive
