Abstract
In multi-label classification on data streams, instances arriving in real time must be associated with multiple labels simultaneously. Various methods based on the k Nearest Neighbors (kNN) algorithm have been proposed for this task. However, these methods struggle with imbalanced data streams, a problem that has received limited attention in existing work. To address this gap, this paper introduces the Imbalance-Robust Multi-Label Self-Adjusting kNN (IRMLSAkNN), designed to handle multi-label imbalanced data streams. The strength of IRMLSAkNN lies in retaining relevant instances of minority labels through a discarding mechanism that takes the per-label imbalance ratio into account. In addition, it evaluates subwindows with an imbalance-aware measure so that older, underperforming instances can be discarded. We conducted statistical experiments on 32 benchmark data streams, evaluating IRMLSAkNN against eight multi-label classification algorithms using common accuracy-aware and imbalance-aware measures. The results show that IRMLSAkNN consistently outperforms these algorithms in predictive capacity and time cost across various levels of imbalance.
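The abstract does not specify the discarding mechanism in detail, but the core idea (a per-label imbalance ratio guiding which window instance to discard first) can be illustrated with a minimal, hypothetical sketch. All function names and the minority threshold below are assumptions for illustration, not the paper's actual algorithm:

```python
def imbalance_ratio_per_label(window, n_labels):
    """Fraction of instances in the window carrying each label.
    Lower values indicate minority (more imbalanced) labels.
    `window` is a list of (instance, labels) pairs, where `labels`
    is a list of label indices active for that instance."""
    counts = [0] * n_labels
    for _, labels in window:
        for label in labels:
            counts[label] += 1
    n = max(len(window), 1)
    return [c / n for c in counts]

def discard_candidate(window, n_labels, minority_threshold=0.2):
    """Return the index of the oldest instance carrying no minority
    label, so minority-label instances are preferentially retained.
    Falls back to the oldest instance overall.
    The threshold value is an arbitrary illustrative choice."""
    ratios = imbalance_ratio_per_label(window, n_labels)
    minority = {l for l, r in enumerate(ratios) if r < minority_threshold}
    for i, (_, labels) in enumerate(window):
        if not (set(labels) & minority):
            return i  # oldest instance with only majority labels
    return 0  # every instance carries a minority label
```

Under this sketch, when the window must shrink, an instance whose labels are all well represented is discarded before one that carries a rare label, which is the retention bias the abstract attributes to IRMLSAkNN.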
Index Terms
- Imbalance-Robust Multi-Label Self-Adjusting kNN