An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams

Kranen, Philipp; Assent, Ira; Seidl, Thomas

doi:10.1007/s13222-012-0083-9

Technical contribution (Fachbeitrag)
Published: 01 February 2012

Volume 12, pages 43–50, (2012)
Datenbank-Spektrum

Philipp Kranen¹,
Ira Assent² &
Thomas Seidl¹

Abstract

Due to the ever-growing presence of data streams, there has been a considerable amount of research on stream data mining over the past years. Anytime algorithms are particularly well suited for stream mining, since they flexibly use all available time on streams of varying data rates, and they have also been shown to outperform traditional budget approaches on constant streams. In this article we present an index-inspired algorithm for Bayesian anytime classification on evolving data streams and show its performance on benchmark data sets.

Notes

Similarity search on large data sets can be sped up to some extent by indexes and other appropriate methods.
Refining 26 classes for the letter data set takes fewer than 26 pages due to a root compression technique employed in [22].

References

1. Arai B, Das G, Gunopulos D, Koudas N (2009) Anytime measures for top-k algorithms on exact and fuzzy data sets. VLDB J 18(2):407–427
2. Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. In: SIGMOD, pp 322–331
3. Boddy MS (1991) Anytime problem solving using dynamic programming. In: AAAI, pp 738–743
4. Borodin A, El-Yaniv R (1998) Online computation and competitive analysis. Cambridge University Press, Cambridge
5. Dean T, Boddy MS (1988) An analysis of time-dependent planning. In: AAAI, pp 49–54
6. DeCoste D (2002) Anytime interval-valued outputs for kernel machines: fast support vector machine classification via distance geometry. In: ICML, pp 99–106
7. DeCoste D (2003) Anytime query-tuned kernel machines via Cholesky factorization. In: SDM, pp 186–193
8. DeCoste D, Mazzoni D (2003) Fast query-optimized kernel machine classification via incremental approximate nearest support vectors. In: ICML, pp 115–122
9. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1):1–38
10. Esmeir S, Markovitch S (2011) Anytime learning of anycost classifiers. Mach Learn (25th Anniversary) 82(3):445–473
11. Flores MJ, Gámez JA, Martínez AM, Puerta JM (2009) GAODE and HAODE: two proposals based on AODE to deal with continuous variables. In: ICML, pp 40–47
12. Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml
13. Grass J, Zilberstein S (1996) Anytime algorithm development tools. SIGART Bull 7(2):20–27
14. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD, pp 47–57
15. Keogh EJ, Pazzani MJ (2002) Learning the structure of augmented Bayesian classifiers. Int J Artif Intell Tools 11(4):587–601
16. Kranen P, Assent I, Baldauf C, Seidl T (2011) The ClusTree: indexing micro-clusters for anytime stream mining. Knowl Inf Syst J 29:249–272
17. Kranen P, Günnemann S, Fries S, Seidl T (2010) MC-tree: improving Bayesian anytime classification. In: SSDBM. Lecture notes in computer science, pp 252–269
18. Kranen P, Krieger R, Denker S, Seidl T (2010) Bulk loading hierarchical mixture models for efficient stream classification. In: PAKDD, pp 325–334
19. Kranen P, Seidl T (2009) Harnessing the strengths of anytime algorithms for constant data streams. Data Min Knowl Discov 19(2):245–260
20. Likhachev M, Ferguson D, Gordon GJ, Stentz A, Thrun S (2008) Anytime search in dynamic graphs. Artif Intell 172(14):1613–1643
21. Likhachev M, Gordon GJ, Thrun S (2003) ARA*: anytime A* with provable bounds on sub-optimality. In: NIPS
22. Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. In: EDBT/ICDT, pp 311–322
23. Shieh J, Keogh E (2010) Polishing the right apple: anytime classification also benefits data streams with constant arrival times. In: ICDM, pp 461–470
24. Turaga DS, Verscheure O, Chaudhari UV, Amini L (2006) Resource management for networked classifiers in distributed stream mining systems. In: ICDM, pp 1102–1107
25. Ueno K, Xi X, Keogh EJ, Lee DJ (2006) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM, pp 623–632
26. Vlachos M, Lin J, Keogh EJ, Gunopulos D (2003) A wavelet-based anytime algorithm for k-means clustering of time series. In: Workshop on clustering high dimensionality data and its applications
27. Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: KDD, pp 226–235
28. Webb GI, Boughton JR, Wang Z (2005) Not so naive Bayes: aggregating one-dependence estimators. Mach Learn 58(1):5–24
29. Yang Y, Webb GI, Cerquides J, Korb KB, Boughton JR, Ting KM (2007) To select or to weigh: a comparative study of linear combination schemes for superparent-one-dependence estimators. IEEE Trans Knowl Data Eng 19(12):1652–1665
30. Yang Y, Webb GI, Korb KB, Ting KM (2007) Classifying under computational resource constraints: anytime classification using probabilistic estimators. Mach Learn 69(1):35–53
31. Zheng F, Webb GI (2006) Efficient lazy elimination for averaged one-dependence estimators. In: ICML, pp 1113–1120
32. Zheng F, Webb GI (2007) Finding the right family: parent and child selection for averaged one-dependence estimators. In: ECML PKDD, pp 490–501
33. Zilberstein S (1996) Using anytime algorithms in intelligent systems. AI Mag 17(3):73–83

Acknowledgements

This work has been supported by the UMIC Research Centre, RWTH Aachen University, Germany.

Author information

Authors and Affiliations

Data Management and Data Exploration Group, RWTH Aachen University, Aachen, Germany
Philipp Kranen & Thomas Seidl
Department of Computer Science, Aarhus University, Aarhus, Denmark
Ira Assent

Authors

Philipp Kranen
Ira Assent
Thomas Seidl

Corresponding author

Correspondence to Philipp Kranen.

About this article

Cite this article

Kranen, P., Assent, I. & Seidl, T. An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams. Datenbank Spektrum 12, 43–50 (2012). https://doi.org/10.1007/s13222-012-0083-9

Received: 28 October 2011
Accepted: 19 January 2012
Published: 01 February 2012
Issue Date: March 2012
DOI: https://doi.org/10.1007/s13222-012-0083-9
