Abstract
Due to the ever growing presence of data streams there has been a considerable amount of research on stream data mining over the past years. Anytime algorithms are particularly well suited for stream mining, since they flexibly use all available time on streams of varying data rates, and are also shown to outperform traditional budget approaches on constant streams. In this article we present an index-inspired algorithm for Bayesian anytime classification on evolving data streams and show its performance on benchmark data sets.
Similar content being viewed by others
Notes
Similarity search on large data sets can be sped up to some extend by indexes and other appropriate methods.
Refining 26 classes for the letter data set takes less than 26 pages due a root compression technique employed in [22].
References
Arai B, Das G, Gunopulos D, Koudas N (2009) Anytime measures for top-k algorithms on exact and fuzzy data sets. VLDB J 18(2):407–427
Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. In: SIGMOD, pp 322–331
Boddy MS (1991) Anytime problem solving using dynamic programming. In: AAAI, pp 738–743
Borodin A, El-Yaniv R (1998) Online computation and competitive analysis. Cambridge University Press, Cambridge
Dean T, Boddy MS (1988) An analysis of time-dependent planning. In: AAAI, pp 49–54
DeCoste D (2002) Anytime interval-valued outputs for kernel machines: fast support vector machine classification via distance geometry. In: ICML, pp 99–106
DeCoste D (2003) Anytime query-tuned kernel machines via Cholesky factorization. In: SDM, pp 186–193
DeCoste D, Mazzoni D (2003) Fast query-optimized kernel machine classification via incremental approximate nearest support vectors. In: ICML, pp 115–122
Dempster AP, Laird NML, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc B 39(1):1–38
Esmeir S, Markovitch S (2011) Anytime learning of anycost classifiers. Mach Learn (25th Anniversary) 82(3):445–473
Flores MJ, Gámez JA, Martínez AM, Puerta JM (2009) Gaode and haode: two proposals based on aode to deal with continuous variables. In: ICML, pp 40–47
Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml
Grass J, Zilberstein S (1996) Anytime algorithm development tools. SIGART Bull 7(2):20–27
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD, pp 47–57
Keogh EJ, Pazzani MJ (2002) Learning the structure of augmented Bayesian classifiers. Int J Artif Intell Tools 11(4):587–601
Kranen P, Assent I, Baldauf C, Seidl T (2011) The clustree: indexing micro-clusters for anytime stream mining. Knowl Inf Syst J 29:249–272
Kranen P, Günnemann S, Fries S, Seidl T (2010) MC-tree: improving Bayesian anytime classification. In: SSDBM. Lecture notes in computer science, pp 252–269
Kranen P, Krieger R, Denker S, Seidl T (2010) Bulk loading hierarchical mixture models for efficient stream classification. In: PAKDD, pp 325–334
Kranen P, Seidl T (2009) Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Discov 19(2):245–260
Likhachev M, Ferguson D, Gordon GJ, Stentz A, Thrun S (2008) Anytime search in dynamic graphs. Artif Intell 172(14):1613–1643
Likhachev M, Gordon GJ, Thrun S (2003) ARA*: anytime A* with provable bounds on sub-optimality. In: NIPS.
Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. In: EDBT/ICDT, pp 311–322
Shieh J, Keogh E (2010) Polishing the right apple: anytime classification also benefits data streams with constant arrival times. In: ICDM, pp 461–470
Turaga DS, Verscheure O, Chaudhari UV, Amini L (2006) Resource management for networked classifiers in distributed stream mining systems. In: ICDM, pp 1102–1107
Ueno K, Xi X, Keogh EJ, Lee DJ (2006) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM, pp 623–632
Vlachos M, Lin J, Keogh EJ, Gunopulos D (2003) A wavelet-based anytime algorithm for k-means clustering of time series. In: Workshop on clustering high dimensionality data and its applications.
Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: KDD, pp 226–235
Webb GI, Boughton JR, Wang Z (2005) Not so naive Bayes: aggregating one-dependence estimators. Mach Learn 58(1):5–24
Yang Y, Webb GI, Cerquides J, Korb KB, Boughton JR, Ting KM (2007) To select or to weigh: a comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Trans Knowl Data Eng 19(12):1652–1665
Yang Y, Webb GI, Korb KB, Ting KM (2007) Classifying under computational resource constraints: anytime classification using probabilistic estimators. Mach Learn 69(1):35–53
Zheng F, Webb GI (2006) Efficient lazy elimination for averaged one-dependence estimators. In: ICML, pp 1113–1120
Zheng F, Webb GI (2007) Finding the right family: parent and child selection for averaged one-dependence estimators. In: ECML PKDD, pp 490–501
Zilberstein S (1996) Using anytime algorithms in intelligent systems. AI Mag 17(3):73–83
Acknowledgements
This work has been supported by the UMIC Research Centre, RWTH Aachen University, Germany.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Kranen, P., Assent, I. & Seidl, T. An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams. Datenbank Spektrum 12, 43–50 (2012). https://doi.org/10.1007/s13222-012-0083-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13222-012-0083-9