Exploiting k-constraints to reduce memory overhead in continuous queries over data streams

Authors:
Shivnath Babu (Stanford University, Stanford, CA),
Utkarsh Srivastava (Stanford University, Stanford, CA),
Jennifer Widom (Stanford University, Stanford, CA)

ACM Transactions on Database Systems, Volume 29, Issue 3, pp 545–580
https://doi.org/10.1145/1016028.1016032

Published: 01 September 2004

Abstract

Continuous queries often require significant run-time state over arbitrary data streams. However, streams may exhibit certain data or arrival patterns, or constraints, that can be detected and exploited to reduce state considerably without compromising correctness. Rather than requiring constraints to be satisfied precisely, which can be unrealistic in a data streams environment, we introduce k-constraints, where k is an adherence parameter specifying how closely a stream adheres to the constraint. (Smaller k's are closer to strict adherence and offer better memory reduction.) We present a query processing architecture, called k-Mon, that detects useful k-constraints automatically and exploits the constraints to reduce run-time state for a wide range of continuous queries. Experimental results showed dramatic state reduction, while only modest computational overhead was incurred for our constraint monitoring and query execution algorithms.
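
To make the role of the adherence parameter concrete, here is a minimal Python sketch. It assumes one hypothetical kind of constraint that is not spelled out in the abstract: an ordering constraint under which any element arriving at least k+1 positions after another is never smaller than it. The function names `min_adherence_k` and `reorder_with_bounded_buffer` are illustrative only and are not part of k-Mon. The sketch shows the general idea that a smaller k permits a smaller amount of run-time state, not the authors' algorithms.

```python
import heapq
from bisect import bisect_right


def min_adherence_k(values):
    """Smallest k such that every out-of-order pair is at most k positions apart.

    Hypothetical constraint: for positions i < j with j - i >= k + 1,
    values[j] >= values[i]. A fully sorted stream gives k = 0.
    """
    prefix_max = []  # running maximum of the values seen so far (nondecreasing)
    k = 0
    for j, v in enumerate(values):
        # First earlier position whose running maximum exceeds v; the value
        # at that position is itself greater than v, so it is the farthest
        # earlier element that v is out of order with.
        i = bisect_right(prefix_max, v)
        if i < j:
            k = max(k, j - i)
        prefix_max.append(v if not prefix_max else max(prefix_max[-1], v))
    return k


def reorder_with_bounded_buffer(values, k):
    """Emit values in sorted order while buffering at most k + 1 of them.

    Correct only if the input satisfies the hypothetical constraint above
    for this k; the buffer size, and hence the state, grows with k.
    """
    heap = []
    for v in values:
        heapq.heappush(heap, v)
        if len(heap) > k:
            yield heapq.heappop(heap)
    while heap:
        yield heapq.heappop(heap)


timestamps = [1, 3, 2, 5, 4, 6]          # made-up, slightly disordered stream
k = min_adherence_k(timestamps)           # 1
ordered = list(reorder_with_bounded_buffer(timestamps, k))
print(k, ordered)
```

In this toy setting, a stream that adheres with k = 1 needs only a two-element buffer to be restored to order, whereas an arbitrary stream would need unbounded state.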

Supplemental Material

p1-babu-appendix.pdf (157.4 KB): Appendix for "Exploiting k-constraints to reduce memory overhead in continuous queries over data streams" by Babu, Srivastava, and Widom

Index Terms

Exploiting k-constraints to reduce memory overhead in continuous queries over data streams
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Reviews

Reviewer: Shannon Jacobs

This paper considers analytic techniques for data streams, but I'm going to start this review with a simpler kind of analytic problem that is comparable to the primary example used in the paper, a kind of logical puzzle you've surely played with. This kind of problem might describe the results of a footrace with (deliberately) minimal clues, such as "Alice finished behind Bill and Cathy, but ahead of Elizabeth," and "Doug was right behind Bill." The goal might be to deduce who finished third. The clues are designed to be just sufficient to eliminate all of the possibilities except for the desired answer.

This paper describes something similar, but for streamed data where the race never ends. The authors use the example of network traffic analysis, with a primary focus on reducing memory usage by eliminating unneeded intermediate data, rapidly sorting the pending packets, and emitting significant results into their output streams as quickly as possible. Their k-constraints can be regarded as a kind of annotation to describe the minimal clues effectively. For example, they may describe constraints on sequencing, latency, or routing, and packets that violate those constraints can be discarded immediately.

The problem sounds relatively simple in that form, but there are plenty of complexities that fill the 36 pages and overflow into an eight-page appendix. For example, there are indirect effects when pending packets are affected by decisions made for other packets, and the authors consider some situations where probabilistic constraints are justified. Actually, the paper promises even more, specifically that the authors' system will help recognize and identify the minimal but useful clues (minimal patterns) to be expressed in their k-constraint notation.

Unfortunately, this long paper still doesn't make it sufficiently clear how their system can contribute to that higher-level task of recognizing significant patterns; rather, it remains focused primarily on the much lower-level task of efficient analysis of data within predetermined patterns. The paper doesn't go all the way to the lowest levels of the proofs, which were relegated to an electronic appendix. (The appendix also contains brief descriptions of several other network analysis problems using the authors' system, and some related algorithms.) The paper also includes some analyses of the resource savings, and a discussion of the relationship between the savings and constraint selection.

Without being an expert in this field, it's hard to assess the significance of this work. It would be very useful for certain researchers analyzing data streams, but the scope of relevance and the degree to which the system can be generalized are not clear.

Online Computing Reviews Service

Published in

ACM Transactions on Database Systems Volume 29, Issue 3
September 2004
136 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1016028

Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Continuous queries
constraints
data streams
Qualifiers
- article
Article Metrics
- Total Citations: 79
- Total Downloads: 1,251
- Downloads (Last 12 months): 15
- Downloads (Last 6 weeks): 2