Cut-and-Rewind: Extending Query Engine for Continuous Stream Analytics

Chen, Qiming; Hsu, Meichun

doi:10.1007/978-3-662-47804-2_5

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9260)

Abstract

Combining data warehousing and stream processing technologies has great potential in offering low-latency data-intensive analytics. Unfortunately, such convergence has not been properly addressed so far. The current generation of stream processing systems is in general built separately from the data warehouse and query engine, which can cause significant overhead in data access and data movement, and is unable to take advantage of the functionalities already offered by the existing data warehouse systems.

In this work we tackle some hard problems in integrating stream analytics capability into an existing query engine. We define an extended SQL query model that unifies queries over both static relations and dynamic streaming data, and develop techniques to extend query engines to support the unified model. We propose the cut-and-rewind query execution model, which allows a query with full SQL expressive power to be applied to stream data by converting the stream into a sequence of “chunks” and executing the query over each chunk sequentially. The query instance is not shut down between chunks, so the application context is maintained continuously across execution cycles, as sliding-window operators require. We also propose the cycle-based transaction model to support Continuous Querying with Continuous Persisting (CQCP) with cycle-based isolation and visibility.
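The chunk-by-chunk execution described above can be illustrated with a small toy sketch. It is not the authors' PostgreSQL extension. It only mimics the idea in plain Python, under these assumptions: the stream is a time-ordered list of (timestamp, value) pairs, the "query" is a simple average, and a list stands in for results persisted at the end of each cycle. The names `cut`, `run_continuous_query`, `chunk_seconds`, and `window_cycles` are hypothetical.

```python
from collections import deque
from itertools import groupby


def cut(stream, chunk_seconds):
    """Cut a time-ordered stream of (timestamp, value) pairs into chunks.

    Each chunk holds the rows whose timestamps fall in one cycle.
    """
    for cycle, rows in groupby(stream, key=lambda r: r[0] // chunk_seconds):
        yield cycle, list(rows)


def run_continuous_query(stream, chunk_seconds=60, window_cycles=3):
    """Run the same per-chunk aggregate once per cycle, keeping state between cycles."""
    # State that survives each "rewind": the last few per-chunk results,
    # which a sliding-window aggregate needs.
    window = deque(maxlen=window_cycles)
    # Stand-in for persisted results (assumption): one entry is appended
    # per completed cycle, so results become visible cycle by cycle.
    committed = []
    for cycle, rows in cut(stream, chunk_seconds):
        chunk_avg = sum(v for _, v in rows) / len(rows)  # the "query" over one chunk
        window.append(chunk_avg)
        sliding_avg = sum(window) / len(window)
        committed.append((cycle, chunk_avg, sliding_avg))
    return committed
```

For example, `run_continuous_query([(0, 10), (30, 20), (60, 30), (90, 50), (120, 60)], chunk_seconds=60, window_cycles=2)` processes three one-minute chunks and carries the sliding-window state from each cycle into the next.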

We have prototyped our approach by extending PostgreSQL. This work has resulted in a new kind of tightly integrated, highly efficient system with advanced stream processing capability as well as full DBMS functionality. We demonstrate the system with the popular Linear Road benchmark and report its performance. By leveraging the mature code base of a query engine to the maximal extent, we can significantly reduce the engineering investment needed for developing the streaming technology. Providing this capability on a proprietary parallel analytics engine is work in progress.

Author information

Authors and Affiliations

HP Labs, Hewlett Packard Co., Palo Alto, CA, USA
Qiming Chen & Meichun Hsu

Corresponding author

Correspondence to Qiming Chen.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
ICAR-CNR and University of Calabria, Rende, Italy
Alfredo Cuzzocrea
Hewlett-Packard Laboratories, Palo Alto, California, USA
Umeshwar Dayal

About this chapter

Cite this chapter

Chen, Q., Hsu, M. (2015). Cut-and-Rewind: Extending Query Engine for Continuous Stream Analytics. In: Hameurlain, A., Küng, J., Wagner, R., Cuzzocrea, A., Dayal, U. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXI. Lecture Notes in Computer Science, vol 9260. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47804-2_5

DOI: https://doi.org/10.1007/978-3-662-47804-2_5
Published: 17 July 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-47803-5
Online ISBN: 978-3-662-47804-2
eBook Packages: Computer Science, Computer Science (R0)
