Abstract
In emerging data stream applications, data sources are typically distributed. Evaluating multi-join queries over streams from different sources may incur a large communication cost. Because the queries run continuously, precious bandwidth would be consumed aggressively without careful optimization of operator ordering and placement. In this paper, we focus on the optimization of continuous multi-join queries over distributed streams. We observe that partitioning streams into substreams can significantly reduce the communication cost, and hence propose a novel partition-based join scheme, PMJoin. Several partitioning techniques are studied. To generate the query plan for each substream, we propose a heuristic algorithm based on a rate-based model. Results from an extensive experimental study show that our techniques can sufficiently reduce the communication cost.
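The intuition that partitioning can lower communication cost can be illustrated with a deliberately simplified, rate-based toy model. This is a sketch under stated assumptions, not the PMJoin heuristic: we assume (hypothetically) that three streams A, B and C each originate at their own site, that all streams are partitioned on the same equi-join key (so matching tuples always fall in the same partition), and that a plan simply ships every substream of a partition to one join site. Communication cost is then the total rate (tuples/sec) of tuples that cross the network. The functions `centralized_cost` and `partitioned_cost` and all rate values are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-partition arrival rates (tuples/sec):
# rates[stream][partition] = rate of that substream at the stream's home site.
Rates = Dict[str, List[float]]


def centralized_cost(rates: Rates) -> Tuple[str, float]:
    """Ship every stream in full to one site (no partitioning).

    The cheapest single site hosts the highest-rate stream, because that
    stream's tuples never cross the network.
    """
    totals = {s: sum(r) for s, r in rates.items()}
    site = max(totals, key=totals.get)
    cost = sum(t for s, t in totals.items() if s != site)
    return site, cost


def partitioned_cost(rates: Rates) -> Tuple[List[str], float]:
    """Pick a join site independently for each partition (substream).

    Assumes all streams share the same partitioning on the join key, so
    joining partition p only needs the p-th substream of every stream.
    """
    n_parts = len(next(iter(rates.values())))
    sites: List[str] = []
    cost = 0.0
    for p in range(n_parts):
        per_stream = {s: r[p] for s, r in rates.items()}
        site = max(per_stream, key=per_stream.get)  # keep the heaviest substream local
        sites.append(site)
        cost += sum(v for s, v in per_stream.items() if s != site)
    return sites, cost


if __name__ == "__main__":
    rates = {
        "A": [90.0, 10.0, 10.0],
        "B": [10.0, 80.0, 10.0],
        "C": [10.0, 10.0, 70.0],
    }
    print(centralized_cost(rates))  # ('A', 190.0)
    print(partitioned_cost(rates))  # (['A', 'B', 'C'], 60.0)
```

In this toy setting each partition's cost can never exceed that partition's share of the centralized plan, so the per-partition choice is never worse; the gain is largest when different streams dominate different key ranges. Real plans for multi-way joins also ship intermediate results, which this sketch ignores.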
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhou, Y., Yan, Y., Yu, F., Zhou, A. (2006). PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_24
DOI: https://doi.org/10.1007/11733836_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33337-1
Online ISBN: 978-3-540-33338-8
eBook Packages: Computer Science, Computer Science (R0)