A Cache-Based Semi-Stream Join to deal with Unmatched Stream Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9093)

Abstract

In Data Stream Management Systems (DSMS), semi-stream processing has become a popular area of research due to the high demand from applications (e.g. real-time data warehousing) for up-to-date information. One common operation in semi-stream processing is joining an incoming stream with disk-based master data. A recent algorithm called CACHEJOIN was proposed to implement this join operation. However, CACHEJOIN loads the entire stream into the join module, where it consumes resources, without eliminating those stream tuples that have no matching tuples in the disk-based master data. As a result, the performance of CACHEJOIN remains suboptimal. In this paper we present a revised version of CACHEJOIN, called Improved CACHEJOIN, which removes this limitation. This reduces the processing cost of the new algorithm and, as a consequence, the new algorithm significantly outperforms the existing CACHEJOIN. To quantify the performance differences, we compare both algorithms using synthetic and real datasets with a known skewed distribution. We also present the cost model for our new algorithm.
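The idea of dropping unmatched stream tuples before they reach the join module can be illustrated with a small sketch. The abstract does not say how Improved CACHEJOIN detects unmatched tuples, so everything here is an assumption made for illustration: the function name `semi_stream_join`, the in-memory key set `known_keys`, the first-come cache policy, and the dictionary `master` standing in for the disk-based master data. It is not the authors' algorithm.

```python
def semi_stream_join(stream, master, cache_capacity=2, prefilter=True):
    """Join (key, payload) stream tuples with master data (hypothetical sketch).

    `master` is a dict standing in for the disk-based relation. `known_keys`
    is an assumed in-memory key index; the paper's actual filtering mechanism
    is not described in the abstract.
    """
    known_keys = set(master) if prefilter else set()
    cache = {}  # small cache of master records, filled first-come (assumed policy)
    results = []
    stats = {"loaded": 0, "dropped": 0, "disk_probes": 0}

    for key, payload in stream:
        # With the pre-filter on, unmatched tuples never enter the join module.
        if prefilter and key not in known_keys:
            stats["dropped"] += 1
            continue

        stats["loaded"] += 1
        record = cache.get(key)
        if record is None:
            # Cache miss: probe the (simulated) disk-based master data.
            stats["disk_probes"] += 1
            record = master.get(key)
            if record is None:
                continue  # unmatched tuple that still cost a probe
            if len(cache) < cache_capacity:
                cache[key] = record
        results.append((key, payload, record))

    return results, stats


# Hypothetical sample data: "p9" and "p8" have no match in the master data.
master = {"p1": "Widget", "p2": "Gadget"}
stream = [("p1", 1), ("p9", 2), ("p1", 3), ("p2", 4), ("p8", 5)]

filtered_results, filtered_stats = semi_stream_join(stream, master, prefilter=True)
plain_results, plain_stats = semi_stream_join(stream, master, prefilter=False)
```

Both calls produce the same join output. In this toy setup the pre-filtered run loads fewer tuples into the join and issues fewer disk probes, which is the kind of saving the abstract attributes to eliminating unmatched stream data.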

Author information

Corresponding author

Correspondence to M. Asif Naeem.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Naeem, M.A., Bajwa, I.S., Jamil, N. (2015). A Cache-Based Semi-Stream Join to deal with Unmatched Stream Data. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science, vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_5

  • DOI: https://doi.org/10.1007/978-3-319-19548-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19547-6

  • Online ISBN: 978-3-319-19548-3

  • eBook Packages: Computer Science, Computer Science (R0)
