An efficient processing of a chain join with the minimum communication cost in distributed database systems

Distributed and Parallel Databases

Abstract

This paper investigates the optimization problem of executing a join in a distributed database environment. The optimization criterion is the minimization of the communication cost of sending data through links. We explore the approach of judiciously using join operations as reducers in distributed query processing. In general, this problem is computationally intractable. Restricting the execution of a join to a pre-defined combinatorial order leads to a possible solution in polynomial time. An algorithm for chain query computation was proposed in [21]. The time complexity of the algorithm is $O(m^2 n^2 + m^3 n)$, where $n$ is the number of sites in the network and $m$ is the number of relations (fragments) involved in the join. In this paper, we first present a proof of the intuitively well understood fact that the “eigenorder” of a “chain” join is the best pre-defined combinatorial order for implementing the algorithm in [21]. Second, we show a necessary and sufficient condition for a chain query with the eigenordering to be a “simple” query. For processing the class of simple queries, we show a significant reduction of the time complexity from $O(m^2 n^2 + m^3 n)$ to $O(mn + m^2)$. It is encouraging that, in practice, the most frequent queries belong to the category of simple queries.
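To give a feel for the size of the reduction stated in the abstract, the following minimal sketch evaluates the two growth terms for a few sample values of m (relations) and n (sites). The function names and the sample values are illustrative assumptions, not taken from the paper, and constant factors hidden by the O-notation are ignored, so the numbers indicate growth only, not running times. Since m^2 n^2 + m^3 n = m^2 n (m + n) and mn + m^2 = m (m + n), the ratio of the two terms is exactly mn.

```python
def general_bound(m: int, n: int) -> int:
    """Growth term m^2 n^2 + m^3 n quoted for the algorithm of [21] (constants ignored)."""
    return m**2 * n**2 + m**3 * n


def simple_bound(m: int, n: int) -> int:
    """Growth term mn + m^2 quoted for simple queries (constants ignored)."""
    return m * n + m**2


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    for m, n in [(5, 10), (10, 20), (20, 50)]:
        g, s = general_bound(m, n), simple_bound(m, n)
        # The ratio g / s equals m * n, as the factorisation above shows.
        print(f"m={m:>2} n={n:>2}  general={g:>9,}  simple={s:>5,}  ratio={g // s:,}")
```

For example, with m = 10 relations and n = 20 sites the two terms are 60,000 and 300, a factor of mn = 200.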

References

  1. P.M.G. Apers, A. Hevner, and S.B. Yao, “Optimization Algorithms for Distributed Queries,” IEEE Transactions on Software Engineering, SE-9(1), pp. 57–68, 1983.

  2. Y. Bartal, A. Fiat, and Y. Rabani, “Competitive Algorithms for Distributed Data Management,” 24th Annual ACM Symposium on the Theory of Computing, pp. 39–49, 1992.

  3. P.A. Bernstein and D. Chiu, “Using Semi-Joins to Solve Relational Queries,” Journal of the ACM, 28(1), pp. 25–40, 1981.

  4. P.A. Bernstein, N. Goodman, E. Wong, C.L. Reeve, and J.B. Rothe, “Query Processing in a System for Distributed Databases (SDD-1),” ACM Transactions on Database Systems, 6(4), pp. 602–625, 1981.

  5. J.A. Bondy and U.S.R. Murty, Graph Theory with Applications, Macmillan, 1978.

  6. D. Chiu, P.A. Bernstein, and Y. Ho, “Optimizing Chain Queries in a Distributed Database System,” SIAM Journal on Computing, 13(1), pp. 116–134, 1984.

  7. M.-S. Chen and P.S. Yu, “Interleaving a Join Sequence with Semijoins in Distributed Query Processing,” IEEE Transactions on Parallel and Distributed Systems, 3(5), pp. 611–621, 1992.

  8. M.-S. Chen and P.S. Yu, “Using Join Operations as Reducers in Distributed Query Processing,” Databases in Parallel and Distributed Systems, pp. 116–123, 1990.

  9. T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, The MIT Press, 1990.

  10. C.J. Date, An Introduction to Database Systems, 2, Addison-Wesley, 1982.

  11. S. Ganguly, W. Hasan, and R. Krishnamurthy, “Query Optimization for Parallel Execution,” SIGMOD Record, 21(2), pp. 9–18, 1992.

  12. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, 1978.

  13. A.R. Hevner and S.B. Yao, “Query Processing in Distributed Database Systems,” IEEE Transactions on Software Engineering, SE-5(3), pp. 177–187, 1979.

  14. T. Ibaraki and T. Kameda, “On the Optimal Nesting Order for Computing N-Relational Joins,” ACM Transactions on Database Systems, 9, pp. 482–502, 1984.

  15. Y.E. Ioannidis and S. Christodoulakis, “On the Propagation of Errors in the Size of Join Results,” Proceedings of the 1991 SIGMOD International Conference on Management of Data, pp. 268–277, 1991.

  16. R. Krishnamurthy, H. Boral, and C. Zaniolo, “Optimization of Nonrecursive Queries,” Proceedings of VLDB 86, pp. 128–137, 1986.

  17. H. Lu, M.C. Shan, and K.L. Tan, “Optimization of Multi-Way Join Queries for Parallel Execution,” Proceedings of VLDB 91, pp. 549–560, 1991.

  18. S. Pramanik and D. Vineyard, “Optimizing Join Queries in Distributed Databases,” IEEE Transactions on Software Engineering, 14(9), pp. 1319–1326, 1988.

  19. D. Maier, Theory of Relational Databases, Computer Science Press, 1993.

  20. Y. Mansour and B. Patt-Shamir, “Greedy Packet Scheduling on Shortest Paths,” Proceedings of the 10th Annual ACM Symposium on Principles of Distributed Computing, pp. 165–176, 1991.

  21. M.W. Orlowski, “On Optimisation of Joins in Distributed Database Systems,” Future Databases 92, World Scientific, pp. 106–114, 1992.

  22. M.W. Orlowski, Private Communication.

  23. D. Shasha and T.L. Wang, “Optimizing Equijoin Queries in Distributed Databases Where Relations are Hash Partitioned,” ACM Transactions on Database Systems, 16(2), pp. 279–308, 1991.

  24. A. Swami and A. Gupta, “Optimization of Large Join Queries: Combining Heuristics and Combinatorial Techniques,” Proceedings of SIGMOD 89, pp. 367–376, 1989.

  25. A.E. Taylor, Advanced Calculus, Ginn, 1955.

  26. M. Templeton, et al., “Mermaid-Experiences with network operation,” Proceedings of IEEE Data Engineering Conference, 1986.

  27. J.D. Ullman, Principles of Database Systems, Computer Science Press, Rockville, MD, 1982.

  28. C.P. Wang, “The Complexity of Processing Tree Queries in Distributed Databases,” 2nd IEEE Symposium on Parallel and Distributed Processing, pp. 604–611, 1990.

  29. C.P. Wang, V.O.K. Li, and A.L.P. Chen, “One-shot Semi-Join execution strategies for processing distributed queries,” 7th IEEE Data Engineering Conference, pp. 756–763, 1991.

  30. C.P. Wang, A.L.P. Chen, and S.-C. Shyu, “A Parallel Execution Method for Minimizing Distributed Query Response Time,” IEEE Transactions on Parallel and Distributed Systems, 3(3), pp. 325–333, 1992.

  31. E. Wong, “Dynamic Rematerialization: Processing Distributed Queries Using Redundant Data,” IEEE Transactions on Software Engineering, SE-9(3), pp. 228–232, 1983.

  32. C.T. Yu and C.C. Chang, “Distributed Query Processing,” ACM Computing Surveys, 16(4), 1984.

  33. C.T. Yu, Z.M. Ozsoyoglu and K. Lam, “Optimization of Distributed Tree Queries,” Journal of Computer and System Sciences, 29, pp. 399–433, 1984.

Additional information

Editor: Peter Apers

About this article

Cite this article

Lin, X., Orlowska, M.E. An efficient processing of a chain join with the minimum communication cost in distributed database systems. Distrib Parallel Databases 3, 69–83 (1995). https://doi.org/10.1007/BF01263657

  • DOI: https://doi.org/10.1007/BF01263657
