Workload-Based Placement and Join Processing in Node-Partitioned Data Warehouses

Furtado, Pedro

doi:10.1007/978-3-540-30076-2_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3181)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Data warehouses (DW) holding enormous quantities of data pose major performance and scalability challenges. The Node-Partitioned Data Warehouse (NPDW) divides the DW across low-cost computer nodes for scalability. Partitioning and data placement strategies affect the performance of complex queries on the NPDW. In this paper we propose a partitioning, placement, and join processing strategy to boost the performance of costly joins in the NPDW, compare alternative strategies using the TPC-H performance evaluation benchmark, and draw conclusions.
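
To make the idea of partitioning and placement concrete, the following is a minimal sketch of a generic technique, not the strategy proposed in the paper: two relations are hash-partitioned on their shared join key so that matching rows land on the same node, and each node then joins its own fragments without exchanging data. The function names, the modulo placement rule, the integer keys, and the toy `orders`/`lineitems` rows are all assumptions made for illustration.

```python
from collections import defaultdict


def hash_partition(rows, key, num_nodes):
    """Place each row on a node chosen by key modulo num_nodes.

    Assumes integer key values; modulo is an illustrative placement rule.
    """
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key] % num_nodes].append(row)
    return nodes


def local_hash_join(left, right, key):
    """Equi-join two row lists on `key` with an in-memory hash index."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def partitioned_join(left_parts, right_parts, key):
    """Join co-located fragments node by node; no rows move between nodes."""
    result = []
    for left, right in zip(left_parts, right_parts):
        result.extend(local_hash_join(left, right, key))
    return result


# Hypothetical toy data, loosely named after TPC-H tables.
orders = [{"orderkey": k, "status": "O"} for k in range(6)]
lineitems = [{"orderkey": k % 6, "qty": k} for k in range(12)]

order_parts = hash_partition(orders, "orderkey", 3)
lineitem_parts = hash_partition(lineitems, "orderkey", 3)
joined = partitioned_join(order_parts, lineitem_parts, "orderkey")
```

Because both relations are placed by the same function of the join key, the union of the per-node joins contains the same rows as a join over the unpartitioned relations. When two relations are not partitioned on the join key, some rows must be repartitioned or replicated before joining.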

Author information

Authors and Affiliations

Centro de Informática e Sistemas, Engenharia Informática, Univ Coimbra (DEI/CISUC)
Pedro Furtado

Authors

Pedro Furtado

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, 606-8501, Sakyo, Kyoto, Japan
Yahiko Kambayashi
I.B.M. India Research Lab, India
Mukesh Mohania
Institute for Application Oriented Knowledge Processing (FAW), Johannes Kepler University Linz, Austria
Wolfram Wöß

About this paper

Cite this paper

Furtado, P. (2004). Workload-Based Placement and Join Processing in Node-Partitioned Data Warehouses. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2004. Lecture Notes in Computer Science, vol 3181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30076-2_4

DOI: https://doi.org/10.1007/978-3-540-30076-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22937-7
Online ISBN: 978-3-540-30076-2
eBook Packages: Springer Book Archive