Abstract
Continuous queries often require significant run-time state over arbitrary data streams. However, streams may exhibit certain data or arrival patterns (constraints) that can be detected and exploited to reduce state considerably without compromising correctness. Rather than requiring constraints to be satisfied precisely, which can be unrealistic in a data streams environment, we introduce k-constraints, where k is an adherence parameter specifying how closely a stream adheres to the constraint. Smaller values of k are closer to strict adherence and offer better memory reduction. We present a query processing architecture, called k-Mon, that detects useful k-constraints automatically and exploits them to reduce run-time state for a wide range of continuous queries. Experimental results show dramatic state reduction, with only modest computational overhead incurred by our constraint monitoring and query execution algorithms.
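To make the role of the adherence parameter concrete, the following is a minimal sketch of one possible k-constraint, an ordering constraint. The abstract does not define specific constraints, so the definition used here is an assumption: a stream is treated as k-ordered if no value arrives more than k positions after a larger value. The function names `min_ordering_k` and `reorder_with_bounded_buffer` are hypothetical and are not part of k-Mon.

```python
import heapq
from typing import Iterable, Iterator, List


def min_ordering_k(values: Iterable[int]) -> int:
    """Return the smallest k for which the sequence is k-ordered.

    Assumed definition (illustrative only): every value that arrives
    more than k positions after a value v is >= v.
    """
    vals = list(values)
    k = 0
    for i, v in enumerate(vals):
        # Scan from the end to find the last later value smaller than v.
        for j in range(len(vals) - 1, i, -1):
            if vals[j] < v:
                k = max(k, j - i)
                break
    return k


def reorder_with_bounded_buffer(values: Iterable[int], k: int) -> Iterator[int]:
    """Emit a k-ordered stream in sorted order, buffering at most k + 1 values."""
    heap: List[int] = []
    for v in values:
        heapq.heappush(heap, v)
        if len(heap) > k:
            # Under the assumed constraint, no future value can be
            # smaller than the current minimum, so it is safe to emit.
            yield heapq.heappop(heap)
    # Drain whatever remains once the input ends.
    while heap:
        yield heapq.heappop(heap)


stream = [3, 1, 2, 4]
k = min_ordering_k(stream)
print(k, list(reorder_with_bounded_buffer(stream, k)))
```

Under this assumed definition, the buffer holds at most k + 1 values regardless of stream length, so a smaller k means less state. The sketch only illustrates that relationship and does not reproduce the paper's monitoring or query execution algorithms.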
Supplemental Material
Appendix for Exploiting k-constraints to reduce memory overhead in continuous queries over data streams by Babu, Srivastava, and Widom