Abstract
This paper investigates the optimization problem that arises when executing a join in a distributed database environment. The optimization criterion is the minimization of the communication cost of sending data through links. We explore the approach of judiciously using join operations as reducers in distributed query processing. In general, this problem is computationally intractable. Restricting the execution of a join to a pre-defined combinatorial order makes a polynomial-time solution possible. An algorithm for chain query computation was proposed in [21]. Its time complexity is $O(m^2 n^2 + m^3 n)$, where $n$ is the number of sites in the network and $m$ is the number of relations (fragments) involved in the join. In this paper, we first present a proof of the intuitively well understood fact that the “eigenorder” of a “chain” join is the best pre-defined combinatorial order in which to implement the algorithm in [21]. Second, we show a necessary and sufficient condition for a chain query with the eigenordering to be a “simple” query. For processing the class of simple queries, we show a significant reduction of the time complexity from $O(m^2 n^2 + m^3 n)$ to $O(mn + m^2)$. It is encouraging that, in practice, the most frequent queries belong to the category of simple queries.
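To give a feel for the size of the reduction stated above, the following minimal sketch evaluates the two growth terms for a few sample values of m (relations) and n (sites). The function names and sample values are illustrative assumptions, not taken from the paper, and because big-O notation hides constant factors, the numbers are growth terms rather than actual operation counts.

```python
def general_bound(m: int, n: int) -> int:
    """Growth term m^2 n^2 + m^3 n quoted for the general chain-query algorithm."""
    return m**2 * n**2 + m**3 * n


def simple_bound(m: int, n: int) -> int:
    """Growth term mn + m^2 quoted for the class of simple queries."""
    return m * n + m**2


if __name__ == "__main__":
    # Hypothetical sample sizes, chosen only for illustration.
    for m, n in [(4, 8), (8, 16), (16, 32)]:
        g, s = general_bound(m, n), simple_bound(m, n)
        print(f"m={m:2d} n={n:2d} general={g} simple={s} ratio={g // s}")
```

Since m^2 n^2 + m^3 n = mn(mn + m^2), the two growth terms differ by exactly a factor of mn, which is what the printed ratio column shows.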
References
P.M.G. Apers, A. Hevner, and S.B. Yao, “Optimization Algorithms for Distributed Queries,” IEEE Transactions on Software Engineering, SE-9(1), pp. 57–68, 1983.
Y. Bartal, A. Fiat, and Y. Rabani, “Competitive Algorithms for Distributed Data Management,” 24th Annual ACM Symposium on the Theory of Computing, pp. 39–49, 1992.
P.A. Bernstein and D. Chiu, “Using Semi-Joins to Solve Relational Queries,” Journal of the ACM, 28(1), pp. 25–40, 1981.
P.A. Bernstein, N. Goodman, E. Wong, C.L. Reeve, and J.B. Rothe, “Query Processing in a System for Distributed Database (SDD-1),” ACM Transactions on Database Systems, 6(4), pp. 602–625, 1981.
J.A. Bondy and U.S.R. Murty, Graph Theory with Applications, The Macmillan, 1978.
D. Chiu, P.A. Bernstein, and Y. Ho, “Optimizing Chain Queries in a Distributed Database System,” SIAM Journal on Computing, 13(1), pp. 116–134, 1984.
M.-S. Chen and P.S. Yu, “Interleaving a Join Sequence with Semijoins in Distributed Query Processing,” IEEE Transactions on Parallel and Distributed Systems, 3(5), pp. 611–621, 1992.
M.-S. Chen and P.S. Yu, “Using Join Operations as Reducers in Distributed Query Processing,” Databases in Parallel and Distributed Systems, pp. 116–123, 1990.
T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, The MIT Press, 1990.
C.J. Date, An Introduction to Database Systems, 2, Addison-Wesley, 1982.
S. Ganguly, W. Hasan, and R. Krishnamurthy, “Query Optimization for Parallel Execution,” SIGMOD Record, 21(2), pp. 9–18, 1992.
M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, 1978.
A.R. Hevner and S.B. Yao, “Query Processing in Distributed Database Systems,” IEEE Transactions on Software Engineering, SE-5(3), pp. 177–187, 1979.
T. Ibaraki and T. Kameda, “On the Optimal Nesting Order for Computing N-Relations Joins,” ACM Transactions on Database Systems, 9, pp. 482–502, 1984.
Y.E. Ioannidis and S. Christodoulakis, “On the Propagation of Errors in the Size of Join Results,” Proceedings of the 1991 SIGMOD International Conference on Management of Data, pp. 268–277, 1991.
R. Krishnamurthy, H. Boral, and C. Zaniolo, “Optimization of Nonrecursive Queries,” Proceedings of VLDB 86, pp. 128–137, 1986.
H. Lu, M.C. Shan, and K.L. Tan, “Optimization of Multi-Way Join Queries for Parallel Execution,” Proceedings of VLDB 91, pp. 549–560, 1991.
S. Pramanik and D. Vineyard, “Optimizing Join Queries in Distributed Databases,” IEEE Transactions on Software Engineering, 14(9), pp. 1319–1326, 1988.
D. Maier, Theory of Relational Databases, Computer Science Press, 1993.
Y. Mansour and B. Patt-Shamir, “Greedy Packet Scheduling on Shortest Paths,” Proceedings of the 10th Annual ACM Symposium on Principles of Distributed Computing, pp. 165–176, 1991.
M.W. Orlowski, “On Optimisation of Joins in Distributed Database Systems,” Future Databases 92, World Scientific, pp. 106–114, 1992.
M.W. Orlowski, Private Communication.
D. Shasha and T.L. Wang, “Optimizing Equijoin Queries in Distributed Databases Where Relations are Hash Partitioned,” ACM Transactions on Database Systems, 16(2), pp. 279–308, 1991.
A. Swami and A. Gupta, “Optimization of Large Join Queries: Combining Heuristics and Combinatorial Techniques,” Proceedings of SIGMOD 89, pp. 367–376, 1989.
A.E. Taylor, Advanced Calculus, Ginn, 1955.
M. Templeton, et al., “Mermaid-Experiences with network operation,” Proceedings of IEEE Data Engineering Conference, 1986.
J.D. Ullman, Principles of Database Systems, Computer Science Press, Rockville, MD, 1982.
C.P. Wang, “The Complexity of Processing Tree Queries in Distributed Databases,” 2nd IEEE Symposium on Parallel and Distributed Processing, pp. 604–611, 1990.
C.P. Wang, V.O.K. Li, and A.L.P. Chen, “One-shot Semi-Join execution strategies for processing distributed queries,” 7th IEEE Data Engineering Conference, pp. 756–763, 1991.
C.P. Wang, A.L.P. Chen, and S.-C. Shyu, “A Parallel Execution Method for Minimizing Distributed Query Response Time,” IEEE Transactions on Parallel and Distributed Systems, 3(3), pp. 325–333, 1992.
E. Wong, “Dynamic Rematerialization: Processing Distributed Queries Using Redundant Data,” IEEE Transactions on Software Engineering, SE-9(3), pp. 228–232, 1983.
C.T. Yu and C.C. Chang, “Distributed Query Processing,” ACM Computing Surveys, 16(4), 1984.
C.T. Yu, Z.M. Ozsoyoglu, and K. Lam, “Optimization of Distributed Tree Queries,” Journal of Computer and System Sciences, 29, pp. 399–433, 1984.
Additional information
Editor: Peter Apers
About this article
Cite this article
Lin, X., Orlowska, M.E. An efficient processing of a chain join with the minimum communication cost in distributed database systems. Distrib Parallel Databases 3, 69–83 (1995). https://doi.org/10.1007/BF01263657
DOI: https://doi.org/10.1007/BF01263657