Locally adaptive dimensionality reduction for indexing large time series databases

Published: 01 June 2002
Abstract

Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this article, we introduce a new dimensionality reduction technique, which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT, and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant-value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower-bounding Euclidean distance approximation and a non-lower-bounding, but very tight, Euclidean distance approximation. We show how they can support fast exact searching and even faster approximate searching on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
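To make the idea of constant-value segments of varying lengths concrete, the following is a minimal Python sketch. It is not the algorithm from the article: it assumes a simple greedy bottom-up merge, and the function names `apca_greedy` and `lower_bound_distance`, as well as the (segment mean, right endpoint) output format, are illustrative choices. The second function shows one way a lower bound on the Euclidean distance might be computed from such a summary, assuming each stored value is the exact mean of its segment.

```python
import math


def apca_greedy(series, num_segments):
    """Summarize `series` with `num_segments` constant segments of varying length.

    Hypothetical greedy sketch: start with one segment per point and repeatedly
    merge the adjacent pair whose merge adds the least squared error.
    Returns a list of (segment_mean, inclusive_right_end_index) pairs.
    """
    if not 1 <= num_segments <= len(series):
        raise ValueError("num_segments must be between 1 and len(series)")

    # Each working segment is [sum_of_values, point_count, right_end_index].
    segs = [[float(v), 1, i] for i, v in enumerate(series)]

    def merge_cost(a, b):
        # Increase in squared reconstruction error if a and b share one mean.
        mean_a, mean_b = a[0] / a[1], b[0] / b[1]
        return a[1] * b[1] / (a[1] + b[1]) * (mean_a - mean_b) ** 2

    while len(segs) > num_segments:
        i = min(range(len(segs) - 1),
                key=lambda k: merge_cost(segs[k], segs[k + 1]))
        left, right = segs[i], segs[i + 1]
        segs[i:i + 2] = [[left[0] + right[0], left[1] + right[1], right[2]]]

    return [(total / count, end) for total, count, end in segs]


def lower_bound_distance(query, segments):
    """Lower bound on the Euclidean distance between `query` and the series
    that `segments` summarizes (assumes each stored value is the segment mean
    and that `query` has the same length as the summarized series)."""
    total = 0.0
    start = 0
    for mean, end in segments:
        length = end - start + 1
        query_mean = sum(query[start:end + 1]) / length
        total += length * (query_mean - mean) ** 2
        start = end + 1
    return math.sqrt(total)


series = [1, 1, 1, 1, 5, 5, 9, 9, 9, 9, 9, 9]
segments = apca_greedy(series, 3)
print(segments)  # [(1.0, 3), (5.0, 5), (9.0, 11)]
```

Because the segment boundaries are chosen per series, the flat runs of length 4, 2, and 6 in this toy input are each captured by a single segment, which a fixed-width segmentation with three segments could not do.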
