Abstract
Data warehouses (DW) holding enormous quantities of data pose major performance and scalability challenges. The Node-Partitioned Data Warehouse (NPDW) distributes the DW across low-cost computer nodes for scalability. Partitioning and data placement strategies affect the performance of complex queries on the NPDW. In this paper we propose a partitioning placement and join processing strategy to boost the performance of costly joins in the NPDW, compare alternative strategies using the TPC-H performance benchmark, and draw conclusions.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Furtado, P. (2004). Workload-Based Placement and Join Processing in Node-Partitioned Data Warehouses. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2004. Lecture Notes in Computer Science, vol 3181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30076-2_4
DOI: https://doi.org/10.1007/978-3-540-30076-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22937-7
Online ISBN: 978-3-540-30076-2
eBook Packages: Springer Book Archive