
Multi-label classification from high-speed data streams with adaptive model rules and random rules

  • Regular Paper
Progress in Artificial Intelligence

Abstract

Multi-label classification addresses problems in which each data example is associated with multiple class labels. Data streams pose new challenges to this task because of the massive volume of structured data being produced, and most existing batch-mode methods cannot operate under this condition. This paper therefore proposes two multi-label classification methods, based on rule learning and ensemble learning, that learn from a continuous flow of data. Both methods are derived from a multi-target regression algorithm. The main contribution of this work is rule specialization on subsets of class labels, in contrast to the usual local (an individual model for each output) or global (one model for all outputs) approaches. In a prequential evaluation on six real-world data sets, the global, local and subset operation modes were compared against other online classifiers from the literature. The evaluation showed that subset specialization achieves competitive performance compared with the local and global approaches and with those online classifiers.
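As an illustration of the prequential (test-then-train) protocol mentioned above, the following minimal Python sketch predicts each incoming example before learning from it and accumulates the Hamming loss over all labels. It is not the authors' implementation: the `MajorityLabelLearner` baseline, the function name `prequential_hamming_loss`, the choice of Hamming loss as the metric, and the toy stream are all assumptions made for this example.

```python
from typing import Iterable, List, Sequence, Tuple


class MajorityLabelLearner:
    """Hypothetical baseline (an assumption for this sketch): for each label,
    predict 1 if it has been positive in more than half of the examples seen."""

    def __init__(self, n_labels: int) -> None:
        self.pos = [0] * n_labels  # positive count per label
        self.seen = 0              # number of examples learned so far

    def predict(self, x: Sequence[float]) -> List[int]:
        # The features are ignored by this trivial baseline.
        return [1 if 2 * p > self.seen else 0 for p in self.pos]

    def learn(self, x: Sequence[float], y: Sequence[int]) -> None:
        self.seen += 1
        for j, value in enumerate(y):
            self.pos[j] += value


def prequential_hamming_loss(
    stream: Iterable[Tuple[Sequence[float], Sequence[int]]],
    learner: MajorityLabelLearner,
    n_labels: int,
) -> float:
    """Test-then-train: each example is first used for testing, then for training."""
    errors = 0
    n_examples = 0
    for x, y in stream:
        y_hat = learner.predict(x)                       # 1) test
        errors += sum(a != b for a, b in zip(y_hat, y))  # count wrong labels
        learner.learn(x, y)                              # 2) train
        n_examples += 1
    return errors / (n_examples * n_labels) if n_examples else 0.0


# Toy stream with two labels (made-up data, for illustration only).
toy_stream = [((0.0,), [1, 0]), ((1.0,), [1, 0]), ((2.0,), [1, 1]), ((3.0,), [1, 0])]
loss = prequential_hamming_loss(toy_stream, MajorityLabelLearner(2), n_labels=2)
print(loss)
```

Because every example is scored before the model sees its labels, the running loss reflects performance on unseen data without holding out a separate test set, which suits an unbounded stream.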

Acknowledgements

This work is financed by the ERDF European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme within project POCI-01-0145-FEDER-006961.

Author information

Correspondence to Ricardo Sousa.

Cite this article

Sousa, R., Gama, J. Multi-label classification from high-speed data streams with adaptive model rules and random rules. Prog Artif Intell 7, 177–187 (2018). https://doi.org/10.1007/s13748-018-0142-z

  • DOI: https://doi.org/10.1007/s13748-018-0142-z
