Exploiting k-constraints to reduce memory overhead in continuous queries over data streams

Published: 01 September 2004

Abstract

Continuous queries often require significant run-time state over arbitrary data streams. However, streams may exhibit certain data or arrival patterns, or constraints, that can be detected and exploited to reduce state considerably without compromising correctness. Rather than requiring constraints to be satisfied precisely, which can be unrealistic in a data streams environment, we introduce k-constraints, where k is an adherence parameter specifying how closely a stream adheres to the constraint. (Smaller k's are closer to strict adherence and offer better memory reduction.) We present a query processing architecture, called k-Mon, that detects useful k-constraints automatically and exploits the constraints to reduce run-time state for a wide range of continuous queries. Experimental results showed dramatic state reduction, while only modest computational overhead was incurred for our constraint monitoring and query execution algorithms.
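To make the role of the adherence parameter concrete, here is a minimal Python sketch of one possible ordering constraint with slack k. The abstract does not define specific constraint types, so the definition used here (any value arriving more than k positions after an earlier value x is at least x) and the names `min_ordering_slack` and `emit_in_order` are assumptions made for illustration only. This is not the k-Mon algorithm.

```python
import heapq
from bisect import bisect_right
from typing import Iterable, Iterator, List


def min_ordering_slack(values: List[int]) -> int:
    """Smallest k such that any value arriving more than k positions
    after an earlier value x is >= x (k = 0 means fully sorted).
    Assumed definition, chosen only for this illustration."""
    k = 0
    prefix_max: List[int] = []  # running maximum of values seen so far
    for j, v in enumerate(values):
        # Earliest position whose running maximum exceeds v. The value
        # stored there is the earliest one that v is out of order with.
        i = bisect_right(prefix_max, v)
        if i < j:
            k = max(k, j - i)
        prefix_max.append(v if not prefix_max else max(prefix_max[-1], v))
    return k


def emit_in_order(stream: Iterable[int], k: int) -> Iterator[int]:
    """Emit values in sorted order while buffering at most k + 1 of them,
    assuming the stream satisfies the slack-k ordering constraint above."""
    heap: List[int] = []
    for v in stream:
        heapq.heappush(heap, v)
        if len(heap) > k:
            # Nothing that arrives later can be smaller than the minimum
            # of k + 1 buffered values, so it is safe to release it.
            yield heapq.heappop(heap)
    while heap:
        yield heapq.heappop(heap)


arrivals = [1, 3, 2, 5, 4, 6]
k = min_ordering_slack(arrivals)
ordered = list(emit_in_order(arrivals, k))
```

Under this assumed definition, the buffer size depends on k rather than on the length of the stream, which is one way to read the abstract's remark that smaller k's offer better memory reduction.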

      Reviews

      Shannon Jacobs

      This paper considers analytic techniques for data streams, but I'm going to start this review with a simpler analytic problem that is comparable to the paper's primary example: a kind of logic puzzle you've surely played with. Such a puzzle might describe the results of a footrace with (deliberately) minimal clues, such as "Alice finished behind Bill and Cathy, but ahead of Elizabeth," and "Doug was right behind Bill." The goal might be to deduce who finished third. The clues are designed to be just sufficient to eliminate every possibility except the desired answer.

      This paper describes something similar, but for streamed data where the race never ends. The authors use the example of network traffic analysis, with a primary focus on reducing memory usage by eliminating unneeded intermediate data, rapidly sorting the pending packets, and emitting significant results into their output streams as quickly as possible. Their k-constraints can be regarded as a kind of annotation that describes the minimal clues effectively. For example, they may describe constraints on sequencing, latency, or routing, and packets that violate those constraints can be discarded immediately.

      The problem sounds relatively simple in that form, but there are plenty of complexities that fill the 36 pages and overflow into an eight-page appendix. For example, there are indirect effects when pending packets are affected by decisions made for other packets, and the authors consider some situations where probabilistic constraints are justified. The paper actually promises even more: that the authors' system will help recognize and identify the minimal but useful clues (minimal patterns) to be expressed in their k-constraint notation.

      Unfortunately, this long paper does not make it sufficiently clear how the system contributes to that higher-level task of recognizing significant patterns. Instead, it remains focused primarily on the much lower-level task of efficiently analyzing data within predetermined patterns. The paper does not go all the way down to the proofs, which are relegated to an electronic appendix. (The appendix also contains brief descriptions of several other network analysis problems that use the authors' system, and some related algorithms.) The paper also includes some analyses of the resource savings, and a discussion of how the savings relate to constraint selection.

      Without being an expert in this field, it's hard to assess the significance of this work. It would be very useful for certain researchers analyzing data streams, but the scope of relevance and the degree to which the system can be generalized are not clear.

      Online Computing Reviews Service

      • Published in

        ACM Transactions on Database Systems, Volume 29, Issue 3
        September 2004
        136 pages
        ISSN: 0362-5915
        EISSN: 1557-4644
        DOI: 10.1145/1016028

        Copyright © 2004 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
