Recurrent concepts in data streams classification

  • Regular Paper
Knowledge and Information Systems

Abstract

This work addresses the problem of mining data streams generated in dynamic environments where the distribution underlying the observations may change over time. We present a system that monitors the evolution of the learning process. The system is able to self-diagnose degradations of this process, using change detection mechanisms, and self-repair the decision models. The system uses meta-learning techniques that characterize the domain of applicability of previously learned models. The meta-learner can detect recurrence of contexts, using unlabeled examples, and take pro-active actions by activating previously learned models. The experimental evaluation on three text mining problems demonstrates the main advantages of the proposed system: it provides information about the recurrence of concepts and rapidly adapts decision models when drift occurs.
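The abstract does not say which change detection mechanism the system relies on. Purely as an illustration of how a learner might self-diagnose degradation, the sketch below assumes a rule in the style of the drift detection method (DDM) from Gama, Medas, Castillo and Rodrigues, "Learning with drift detection" (2004): it tracks the running error rate p and its standard deviation s, remembers the lowest p + s seen so far, and signals a warning at two and a drift at three minimum standard deviations above that level. The class name, method names, and parameters are hypothetical, and the strict inequalities are a choice made here so that a stream with zero errors never triggers an alarm.

```python
import math


class DriftDetector:
    """DDM-style monitor over a stream of 0/1 prediction errors (illustrative only)."""

    def __init__(self, min_examples=30, warn_factor=2.0, drift_factor=3.0):
        # Assumed defaults: 30 examples before testing, 2-sigma warning, 3-sigma drift.
        self.min_examples = min_examples
        self.warn_factor = warn_factor
        self.drift_factor = drift_factor
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Consume one outcome (1 = misclassified, 0 = correct) and return a state."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its binomial standard deviation
        if self.n < self.min_examples:
            return "stable"
        # Remember the best (lowest) level of p + s observed so far.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warn_factor * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    # Synthetic stream: 10% errors for 200 examples, then every prediction is wrong.
    stream = [1 if i % 10 == 0 else 0 for i in range(200)] + [1] * 100
    detector = DriftDetector()
    for i, err in enumerate(stream):
        state = detector.update(err)
        if state != "stable":
            print(i, state)
```

In a system like the one the abstract describes, a "drift" signal would be the point at which the current decision model is repaired or a previously learned model is reactivated; that part is not shown here.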

Notes

  1. The default value is 250 examples.

Acknowledgments

This work is part-funded by the ERDF (European Regional Development Fund) through the COMPETE Programme (operational programme for competitiveness) and by Portuguese funds through the FCT (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-022701. The authors acknowledge the financial support given by the project Knowledge Discovery from Ubiquitous Data Streams (PTDC/EIA/098355/2008), funded by FCT. Petr Kosina acknowledges the support of Masaryk University, Faculty of Informatics.

Author information

Corresponding author

Correspondence to João Gama.

About this article

Cite this article

Gama, J., Kosina, P. Recurrent concepts in data streams classification. Knowl Inf Syst 40, 489–507 (2014). https://doi.org/10.1007/s10115-013-0654-6

  • DOI: https://doi.org/10.1007/s10115-013-0654-6
