Abstract
In multi-label classification on data streams, instances arriving in real time must be associated with multiple labels simultaneously. Various methods based on the k Nearest Neighbors (kNN) algorithm have been proposed for this task. However, these methods struggle with imbalanced data streams, a problem that has received limited attention in existing work. To address this gap, this paper introduces the Imbalance-Robust Multi-Label Self-Adjusting kNN (IRMLSAkNN), designed to handle multi-label imbalanced data streams. The strength of IRMLSAkNN lies in retaining relevant instances of minority labels through a discarding mechanism that takes the per-label imbalance ratio into account. In addition, it evaluates subwindows with an imbalance-aware measure so that older, underperforming instances can be discarded. We conducted statistical experiments on 32 benchmark data streams, evaluating IRMLSAkNN against eight multi-label classification algorithms using common accuracy-aware and imbalance-aware measures. The results show that IRMLSAkNN consistently outperforms these algorithms in predictive capacity and time cost across various levels of imbalance.
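The abstract does not specify the discarding mechanism in detail, but the core idea (a per-label imbalance ratio guiding which window instance to discard first) can be illustrated with a minimal, hypothetical sketch. All function names and the minority threshold below are assumptions for illustration, not the paper's actual algorithm:

```python
def imbalance_ratio_per_label(window, n_labels):
    """Fraction of instances in the window carrying each label.
    Lower values indicate minority (more imbalanced) labels.
    `window` is a list of (instance, labels) pairs, where `labels`
    is a list of label indices active for that instance."""
    counts = [0] * n_labels
    for _, labels in window:
        for label in labels:
            counts[label] += 1
    n = max(len(window), 1)
    return [c / n for c in counts]

def discard_candidate(window, n_labels, minority_threshold=0.2):
    """Return the index of the oldest instance carrying no minority
    label, so minority-label instances are preferentially retained.
    Falls back to the oldest instance overall.
    The threshold value is an arbitrary illustrative choice."""
    ratios = imbalance_ratio_per_label(window, n_labels)
    minority = {l for l, r in enumerate(ratios) if r < minority_threshold}
    for i, (_, labels) in enumerate(window):
        if not (set(labels) & minority):
            return i  # oldest instance with only majority labels
    return 0  # every instance carries a minority label
```

Under this sketch, when the window must shrink, an instance whose labels are all well represented is discarded before one that carries a rare label, which is the retention bias the abstract attributes to IRMLSAkNN.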
Index Terms
- Imbalance-Robust Multi-Label Self-Adjusting kNN