Abstract
This paper studies the performance and characteristics of executing various join-trees on a parallel DBMS. The results are a step toward the design of a query optimization strategy suited to the parallel execution of complex queries.
Synchronization issues, among other factors, are identified as limiting the performance gain from parallelism. A new hash-join algorithm, the Pipelining hash-join, is introduced that has fewer synchronization constraints than the known hash-join algorithms. The behavior of individual join operations in a join-tree is also studied in a simulation experiment. The results show that the Pipelining hash-join algorithm yields better performance for multi-join queries. The shape of the optimal join-tree appears to depend on the size of the join operands: a multi-join between small operands performs best with a bushy schedule, whereas larger operands are better off with a linear schedule. The results of the simulation study are confirmed by an analytic model for dataflow query execution.
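The abstract does not spell out how the Pipelining hash-join reduces synchronization, so the following is only a minimal sketch under an assumption: that it behaves like the symmetric hash-join commonly associated with this name, in which a hash table is kept for each operand and every arriving tuple first probes the other operand's table and is then stored in its own. The function name `pipelining_hash_join`, the `("L", row)` / `("R", row)` stream encoding, and the single-process setting are illustrative choices, not taken from the paper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]


def pipelining_hash_join(
    stream: Iterable[Tuple[str, Row]],
    left_key: str,
    right_key: str,
) -> Iterator[Tuple[Row, Row]]:
    """Join two operands whose tuples arrive interleaved on one stream.

    Each stream item is ("L", row) or ("R", row) (a hypothetical encoding).
    Results are emitted as soon as both matching tuples have arrived, so the
    join never has to wait for one operand to be read completely.
    """
    left_table: Dict[Any, list] = defaultdict(list)
    right_table: Dict[Any, list] = defaultdict(list)

    for side, row in stream:
        if side == "L":
            key = row[left_key]
            # Probe the tuples of the right operand seen so far.
            for match in right_table.get(key, ()):
                yield (row, match)
            # Store the tuple so later right tuples can find it.
            left_table[key].append(row)
        elif side == "R":
            key = row[right_key]
            # Probe the tuples of the left operand seen so far.
            for match in left_table.get(key, ()):
                yield (match, row)
            right_table[key].append(row)
        else:
            raise ValueError(f"unknown operand tag: {side!r}")
```

Each matching pair is produced exactly once, at the moment the later of its two tuples arrives. A conventional hash-join, by contrast, must finish building the table for one operand before it can probe with the other, which is the kind of synchronization constraint the abstract refers to. The price in this sketch is that both operands are held in memory.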
Cite this article
Wilschut, A.N., Apers, P.M.G. Dataflow query execution in a parallel main-memory environment. Distrib Parallel Databases 1, 103–128 (1993). https://doi.org/10.1007/BF01277522
DOI: https://doi.org/10.1007/BF01277522