PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3882)

Abstract

In emerging data stream applications, data sources are typically distributed. Evaluating multi-join queries over streams from different sources may incur a large communication cost. Because queries run continuously, scarce bandwidth is quickly consumed unless operator ordering and placement are carefully optimized. In this paper, we focus on the optimization of continuous multi-join queries over distributed streams. We observe that partitioning streams into substreams can significantly reduce the communication cost, and hence propose PMJoin, a novel partition-based join scheme. Several partitioning techniques are studied. To generate the query plan for each substream, a heuristic algorithm based on a rate-based model is proposed. Results from an extensive experimental study show that our techniques can significantly reduce the communication cost.
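
The intuition that partitioning can lower communication cost can be illustrated with a toy model. The sketch below is not the PMJoin algorithm, which this page does not describe. It assumes a two-way join between two streams held at different nodes, uses tuple counts as a stand-in for arrival rates, counts only the tuples shipped to the join site, and ignores the cost of delivering results. All function names and the workload are hypothetical.

```python
from collections import Counter


def partition_rates(keys, num_partitions):
    """Count tuples per partition (a stand-in for per-substream rates).

    Assumption: integer join keys, partitioned by key modulo num_partitions.
    """
    counts = Counter(k % num_partitions for k in keys)
    return [counts[p] for p in range(num_partitions)]


def unpartitioned_cost(rates_a, rates_b):
    """One join site for the whole query: ship the smaller stream in full."""
    return min(sum(rates_a), sum(rates_b))


def partitioned_cost(rates_a, rates_b):
    """One join site per substream pair: ship the smaller substream of each pair."""
    return sum(min(a, b) for a, b in zip(rates_a, rates_b))


# Hypothetical skewed workload: node A mostly sees key 0, node B mostly key 1.
stream_a = [0] * 90 + [1] * 10
stream_b = [0] * 10 + [1] * 90

rates_a = partition_rates(stream_a, 2)
rates_b = partition_rates(stream_b, 2)

whole = unpartitioned_cost(rates_a, rates_b)
split = partitioned_cost(rates_a, rates_b)
print(f"tuples shipped without partitioning: {whole}")
print(f"tuples shipped with partitioning:    {split}")
```

In this model the partitioned cost can never exceed the unpartitioned one, because a sum of per-partition minima is at most the minimum of the sums. The gap grows with the skew between the two sites and vanishes when every partition has the same rate ratio.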

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, Y., Yan, Y., Yu, F., Zhou, A. (2006). PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_24

  • DOI: https://doi.org/10.1007/11733836_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33337-1

  • Online ISBN: 978-3-540-33338-8

  • eBook Packages: Computer Science (R0)
