research-article
Just Accepted

Imbalance-Robust Multi-Label Self-Adjusting kNN

Online AM: 11 May 2024

Abstract

In multi-label classification on data streams, instances arriving in real time must be associated with multiple labels simultaneously. Various methods based on the k Nearest Neighbors (kNN) algorithm have been proposed for this task. However, these methods face limitations when dealing with imbalanced data streams, a problem that has received limited attention in existing work. To address this gap, this paper introduces the Imbalance-Robust Multi-Label Self-Adjusting kNN (IRMLSAkNN), designed to tackle multi-label imbalanced data streams. IRMLSAkNN's strength lies in retaining relevant instances with imbalanced labels through a discarding mechanism that considers the imbalance ratio per label. In addition, it evaluates subwindows with an imbalance-aware measure to discard older instances that degrade performance. We conducted statistical experiments on 32 benchmark data streams, evaluating IRMLSAkNN against eight multi-label classification algorithms using common accuracy-aware and imbalance-aware measures. The results demonstrate that IRMLSAkNN consistently outperforms these algorithms in terms of predictive capacity and time cost across various levels of imbalance.
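
As an illustration of the "imbalance ratio per label" mentioned above, the following is a minimal sketch under stated assumptions, not the authors' implementation. It uses one common definition from the multi-label literature, in which the ratio for a label is the count of the most frequent label divided by that label's own count. The function name, the window size, and the toy label vectors are hypothetical; the abstract does not specify how IRMLSAkNN uses the ratio when discarding instances.

```python
from collections import deque


def imbalance_ratio_per_label(window, n_labels):
    """Return one ratio per label: (count of most frequent label) / (count of this label).

    `window` is an iterable of binary label vectors. A label that never
    appears in the window gets an infinite ratio. This is an assumed,
    commonly used definition, not necessarily the one used in the paper.
    """
    counts = [0] * n_labels
    for labels in window:
        for j, y in enumerate(labels):
            counts[j] += y
    max_count = max(counts)
    return [max_count / c if c > 0 else float("inf") for c in counts]


# Hypothetical sliding window holding the label vectors of recent instances.
window = deque(maxlen=1000)
for labels in [(1, 0, 0), (1, 1, 0), (1, 0, 0), (1, 0, 1)]:
    window.append(labels)

ratios = imbalance_ratio_per_label(window, n_labels=3)
print(ratios)
```

Under this definition, the most frequent label has a ratio of 1 and rarer labels have larger ratios, so a discarding rule could, for example, prefer to retain instances carrying labels with high ratios.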


Published in

ACM Transactions on Knowledge Discovery from Data, Just Accepted
ISSN: 1556-4681
EISSN: 1556-472X

Copyright © 2024 Copyright held by the owner/author(s).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Online AM: 11 May 2024
• Accepted: 23 April 2024
• Revised: 9 April 2024
• Received: 20 November 2023
