Abstract
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this article, we introduce a new dimensionality reduction technique, which we call Adaptive Piecewise Constant Approximation (APCA). Whereas previous techniques (e.g., SVD, DFT, and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series individually by a set of constant-value segments of varying lengths, chosen to minimize that series' own reconstruction error. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower-bounding approximation of the Euclidean distance, and a non-lower-bounding but very tight approximation of it. We show how these measures support fast exact searching and even faster approximate searching on the same index structure. We compare APCA theoretically and empirically to all the other techniques and demonstrate its superiority.
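The abstract describes APCA and its two distance measures only in words. The following is a minimal Python sketch under stated assumptions, not the article's implementation. Each segment is stored as a (mean value, right endpoint) pair. The segmentation uses a plain greedy bottom-up merge, which is quadratic in the series length and is a stand-in rather than the construction the article uses. The names `apca`, `reconstruct`, `lower_bound_distance`, and `approx_distance` are hypothetical. The lower-bounding measure compares the query's mean over each candidate segment with the stored segment mean, weighted by segment length. It can never exceed the true Euclidean distance, because within any segment the sum of squared pointwise differences is at least the segment length times the squared difference of the means. The tight measure instead compares the raw query with the candidate's piecewise-constant reconstruction, and it is not guaranteed to be a lower bound.

```python
import math
from typing import List, Tuple

# Hypothetical representation for this sketch: one (mean value, exclusive
# right endpoint) pair per segment, ordered left to right.
Segment = Tuple[float, int]


def apca(series: List[float], m: int) -> List[Segment]:
    """Reduce `series` to at most m constant segments of varying length.

    Stand-in algorithm (an assumption, not the article's): greedy bottom-up
    merging. Start with one segment per point, then repeatedly merge the
    adjacent pair whose merge adds the least squared reconstruction error.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    # Working segments are (start, end, sum of values, sum of squares).
    segs = [(i, i + 1, x, x * x) for i, x in enumerate(series)]

    def sse(start, end, total, squares):
        # Squared error of replacing points [start, end) by their mean.
        return squares - total * total / (end - start)

    while len(segs) > m:
        best_k, best_cost = 0, math.inf
        for k in range(len(segs) - 1):
            a, b = segs[k], segs[k + 1]
            merged = sse(a[0], b[1], a[2] + b[2], a[3] + b[3])
            cost = merged - sse(*a) - sse(*b)
            if cost < best_cost:
                best_k, best_cost = k, cost
        a, b = segs[best_k], segs[best_k + 1]
        segs[best_k:best_k + 2] = [(a[0], b[1], a[2] + b[2], a[3] + b[3])]

    # Keep only what an index would store: each segment's mean and end.
    return [(total / (end - start), end) for start, end, total, _ in segs]


def reconstruct(segs: List[Segment]) -> List[float]:
    """Expand segments back into a full-length piecewise-constant series."""
    out: List[float] = []
    start = 0
    for value, end in segs:
        out.extend([value] * (end - start))
        start = end
    return out


def euclidean(q: List[float], c: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def lower_bound_distance(query: List[float], segs: List[Segment]) -> float:
    """Lower-bounding estimate: compare the query's mean over each candidate
    segment with the segment's stored mean, weighted by segment length."""
    total, start = 0.0, 0
    for value, end in segs:
        q_mean = sum(query[start:end]) / (end - start)
        total += (end - start) * (q_mean - value) ** 2
        start = end
    return math.sqrt(total)


def approx_distance(query: List[float], segs: List[Segment]) -> float:
    """Tight but non-lower-bounding estimate: Euclidean distance from the
    raw query to the candidate's reconstruction."""
    return euclidean(query, reconstruct(segs))


if __name__ == "__main__":
    c = [0.0, 0.0, 0.0, 5.0, 5.0, 1.0, 1.0, 1.0]
    q = [0.0, 1.0, 0.0, 4.0, 6.0, 1.0, 2.0, 1.0]
    segs = apca(c, 3)
    print(segs)                                       # [(0.0, 3), (5.0, 5), (1.0, 8)]
    print(round(lower_bound_distance(q, segs), 3))    # 0.816
    print(approx_distance(q, segs), euclidean(q, c))  # 2.0 2.0
```

In this usage example the candidate is already piecewise constant, so its reconstruction is exact and the tight estimate equals the true distance, while the lower bound is looser. In general, only the lower-bounding measure lets an index prune candidates without risking false dismissals; the tighter estimate gives up that guarantee in exchange for fewer candidates to check.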