ABSTRACT
Recent data stream systems such as TelegraphCQ have employed the well-known property of duality between data and queries. In these systems, query processing methods are classified into two dual categories, data-initiative and query-initiative, depending on whether query processing is initiated by selecting a data element or a query. Although the duality property has been widely recognized, previous data stream systems do not take full advantage of it, since they use the two dual methods independently: data-initiative methods only for continuous queries and query-initiative methods only for ad-hoc queries. We contend that continuous query processing can be better optimized by adopting an approach that integrates the two dual methods. Our primary contribution is based on the observation that spatial join is a powerful tool for achieving this objective. In this paper, we first present a new viewpoint that transforms the continuous query processing problem into a multi-dimensional spatial join problem. We then present a continuous query processing algorithm based on spatial join, which we name Spatial Join CQ. This algorithm processes continuous queries by finding the pairs of overlapping regions from a set of data elements and a set of queries, both defined as regions in the multi-dimensional space. The algorithm achieves the advantages of the two dual methods simultaneously. Experimental results show that the proposed algorithm outperforms earlier algorithms by up to 36 times for simple selection continuous queries and by up to 7 times for sliding window join queries.
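The formulation in the abstract, matching data elements against queries by finding overlapping regions, can be illustrated with a minimal Python sketch. This is not the Spatial Join CQ algorithm itself, whose details the abstract does not give; it is a naive nested-loop overlap join under assumed conventions. The names `Region`, `overlaps`, `point_region`, and `naive_spatial_join`, and the two example attributes (temperature, humidity), are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a region is an axis-aligned box, given as one
# (low, high) interval per dimension (one dimension per attribute).
Region = List[Tuple[float, float]]


def overlaps(a: Region, b: Region) -> bool:
    """Return True if the two boxes intersect in every dimension."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def point_region(values: List[float]) -> Region:
    """Represent a data element (a tuple of attribute values) as a degenerate box."""
    return [(v, v) for v in values]


def naive_spatial_join(data: Dict[str, Region],
                       queries: Dict[str, Region]) -> List[Tuple[str, str]]:
    """Nested-loop join: every (data id, query id) pair whose regions overlap.

    This treats data and queries symmetrically, which is the point of the
    duality view; a real system would use a spatial index instead of
    comparing all pairs.
    """
    return sorted((d_id, q_id)
                  for d_id, d in data.items()
                  for q_id, q in queries.items()
                  if overlaps(d, q))


# Hypothetical selection queries over (temperature, humidity).
queries = {
    "q1": [(20.0, 30.0), (0.0, 100.0)],   # 20 <= temperature <= 30
    "q2": [(25.0, 40.0), (50.0, 60.0)],   # 25 <= temperature <= 40 and 50 <= humidity <= 60
}

# Hypothetical data elements arriving on the stream.
data = {
    "d1": point_region([22.0, 55.0]),
    "d2": point_region([28.0, 55.0]),
    "d3": point_region([35.0, 10.0]),
}

matches = naive_spatial_join(data, queries)
print(matches)  # [('d1', 'q1'), ('d2', 'q1'), ('d2', 'q2')]
```

Each resulting pair says that a data element satisfies a continuous query. The nested loop costs one comparison per (data, query) pair; it is shown only to make the problem statement concrete, not to reflect the performance reported in the abstract.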
Index Terms
- Continuous query processing in data streams using duality of data and queries