An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams

  • Fachbeitrag (technical contribution)
  • Datenbank-Spektrum

Abstract

Due to the ever-growing presence of data streams, there has been a considerable amount of research on stream data mining in recent years. Anytime algorithms are particularly well suited for stream mining, since they flexibly use all available time on streams of varying data rates, and they have also been shown to outperform traditional budget approaches on constant streams. In this article, we present an index-inspired algorithm for Bayesian anytime classification on evolving data streams and show its performance on benchmark data sets.
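To make the idea of Bayesian anytime classification concrete, the following is a minimal sketch and not the algorithm from the article. It assumes each class has a hypothetical coarse-to-fine list of 1-D Gaussian mixture models, and that each refinement step replaces the model of the currently most probable class with its next finer level. The names `anytime_classify`, `MODELS`, `PRIORS`, and the step-based `budget` are illustrative assumptions; a real stream setting would interrupt on available time rather than on a step count.

```python
import math
from typing import Dict, List, Tuple

# Assumed representation: a component is (weight, mean, std); a level is a
# mixture of components; a class model is a list of levels, coarse to fine.
Component = Tuple[float, float, float]
Level = List[Component]


def density(x: float, level: Level) -> float:
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for w, mu, sd in level
    )


def anytime_classify(
    x: float,
    models: Dict[str, List[Level]],
    priors: Dict[str, float],
    budget: int,
) -> Tuple[str, int]:
    """Classify x using at most `budget` refinement steps.

    A valid answer exists after the initial coarse evaluation, so the loop
    can stop at any point and still return the current best label.
    """
    depth = {c: 0 for c in models}
    score = {c: priors[c] * density(x, models[c][0]) for c in models}
    used = 0
    while used < budget:
        # Assumed strategy: refine the most probable class that still has
        # a finer level available.
        candidates = [c for c in models if depth[c] + 1 < len(models[c])]
        if not candidates:
            break
        c = max(candidates, key=lambda k: score[k])
        depth[c] += 1
        score[c] = priors[c] * density(x, models[c][depth[c]])
        used += 1
    return max(score, key=score.get), used


# Hypothetical toy data: class "a" is bimodal, so its coarse single-Gaussian
# model underestimates the density near x = 2.
MODELS: Dict[str, List[Level]] = {
    "a": [[(1.0, 0.0, 2.1)], [(0.5, -2.0, 0.5), (0.5, 2.0, 0.5)]],
    "b": [[(1.0, 0.5, 1.5)]],
}
PRIORS: Dict[str, float] = {"a": 0.5, "b": 0.5}

quick_label, _ = anytime_classify(2.0, MODELS, PRIORS, budget=0)
refined_label, steps = anytime_classify(2.0, MODELS, PRIORS, budget=1)
```

With no refinement budget the coarse models favour class "b"; a single refinement step of class "a" changes the decision to "a". This illustrates why extra time between stream items can improve accuracy.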

Notes

  1. Similarity search on large data sets can be sped up to some extent by indexes and other appropriate methods.

  2. Refining 26 classes for the letter data set takes fewer than 26 pages due to a root compression technique employed in [22].

References

  1. Arai B, Das G, Gunopulos D, Koudas N (2009) Anytime measures for top-k algorithms on exact and fuzzy data sets. VLDB J 18(2):407–427

  2. Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. In: SIGMOD, pp 322–331

  3. Boddy MS (1991) Anytime problem solving using dynamic programming. In: AAAI, pp 738–743

  4. Borodin A, El-Yaniv R (1998) Online computation and competitive analysis. Cambridge University Press, Cambridge

  5. Dean T, Boddy MS (1988) An analysis of time-dependent planning. In: AAAI, pp 49–54

  6. DeCoste D (2002) Anytime interval-valued outputs for kernel machines: fast support vector machine classification via distance geometry. In: ICML, pp 99–106

  7. DeCoste D (2003) Anytime query-tuned kernel machines via Cholesky factorization. In: SDM, pp 186–193

  8. DeCoste D, Mazzoni D (2003) Fast query-optimized kernel machine classification via incremental approximate nearest support vectors. In: ICML, pp 115–122

  9. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1):1–38

  10. Esmeir S, Markovitch S (2011) Anytime learning of anycost classifiers. Mach Learn (25th Anniversary) 82(3):445–473

  11. Flores MJ, Gámez JA, Martínez AM, Puerta JM (2009) GAODE and HAODE: two proposals based on AODE to deal with continuous variables. In: ICML, pp 40–47

  12. Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml

  13. Grass J, Zilberstein S (1996) Anytime algorithm development tools. SIGART Bull 7(2):20–27

  14. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD, pp 47–57

  15. Keogh EJ, Pazzani MJ (2002) Learning the structure of augmented Bayesian classifiers. Int J Artif Intell Tools 11(4):587–601

  16. Kranen P, Assent I, Baldauf C, Seidl T (2011) The ClusTree: indexing micro-clusters for anytime stream mining. Knowl Inf Syst J 29:249–272

  17. Kranen P, Günnemann S, Fries S, Seidl T (2010) MC-tree: improving Bayesian anytime classification. In: SSDBM. Lecture notes in computer science, pp 252–269

  18. Kranen P, Krieger R, Denker S, Seidl T (2010) Bulk loading hierarchical mixture models for efficient stream classification. In: PAKDD, pp 325–334

  19. Kranen P, Seidl T (2009) Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Discov 19(2):245–260

  20. Likhachev M, Ferguson D, Gordon GJ, Stentz A, Thrun S (2008) Anytime search in dynamic graphs. Artif Intell 172(14):1613–1643

  21. Likhachev M, Gordon GJ, Thrun S (2003) ARA*: anytime A* with provable bounds on sub-optimality. In: NIPS

  22. Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. In: EDBT/ICDT, pp 311–322

  23. Shieh J, Keogh E (2010) Polishing the right apple: anytime classification also benefits data streams with constant arrival times. In: ICDM, pp 461–470

  24. Turaga DS, Verscheure O, Chaudhari UV, Amini L (2006) Resource management for networked classifiers in distributed stream mining systems. In: ICDM, pp 1102–1107

  25. Ueno K, Xi X, Keogh EJ, Lee DJ (2006) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM, pp 623–632

  26. Vlachos M, Lin J, Keogh EJ, Gunopulos D (2003) A wavelet-based anytime algorithm for k-means clustering of time series. In: Workshop on clustering high dimensionality data and its applications

  27. Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: KDD, pp 226–235

  28. Webb GI, Boughton JR, Wang Z (2005) Not so naive Bayes: aggregating one-dependence estimators. Mach Learn 58(1):5–24

  29. Yang Y, Webb GI, Cerquides J, Korb KB, Boughton JR, Ting KM (2007) To select or to weigh: a comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Trans Knowl Data Eng 19(12):1652–1665

  30. Yang Y, Webb GI, Korb KB, Ting KM (2007) Classifying under computational resource constraints: anytime classification using probabilistic estimators. Mach Learn 69(1):35–53

  31. Zheng F, Webb GI (2006) Efficient lazy elimination for averaged one-dependence estimators. In: ICML, pp 1113–1120

  32. Zheng F, Webb GI (2007) Finding the right family: parent and child selection for averaged one-dependence estimators. In: ECML PKDD, pp 490–501

  33. Zilberstein S (1996) Using anytime algorithms in intelligent systems. AI Mag 17(3):73–83

Acknowledgements

This work has been supported by the UMIC Research Centre, RWTH Aachen University, Germany.

Author information

Corresponding author

Correspondence to Philipp Kranen.

About this article

Cite this article

Kranen, P., Assent, I. & Seidl, T. An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams. Datenbank Spektrum 12, 43–50 (2012). https://doi.org/10.1007/s13222-012-0083-9

  • DOI: https://doi.org/10.1007/s13222-012-0083-9
