Abstract
Many data mining applications generate data in the form of streams, called streaming data, which arrive continuously and whose underlying distribution changes over time. This change in the underlying distribution is called concept drift. Online ensemble learning methods are used to handle it because they respond in a timely way to incoming data instances. Many methods have been proposed for concept drift detection. However, learning from data streams that exhibit both concept drift and class imbalance is still difficult, and this combination occurs in real-world applications such as intrusion detection and fault detection. Learning from drifting, imbalanced data streams is therefore a significant challenge for the machine learning community. This paper proposes an Ensemble Based Enhanced Early Drift Detection Model and Random Resampling Technique for online learning, which detects drift based on the average error rate and its standard deviation. Class imbalance in the data stream is handled by generating synthetic data with the random resampling technique, and concept drift adaptation is performed with ensemble classifiers. The proposed ensemble method handles both concept drift and class imbalance, achieving 98.52% accuracy.
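The two mechanisms named in the abstract, drift detection from the average error rate and its standard deviation and random resampling for class imbalance, can be illustrated with a minimal sketch. This is not the authors' Enhanced Early Drift Detection Model, whose exact rules are not given here. As an assumption, the detector follows the classic DDM rule (Gama et al., 2004): track the running error rate p and s = sqrt(p(1 - p)/n), and signal drift when p + s exceeds the best observed p_min + 3·s_min. The resampler assumes plain random oversampling with replacement, which duplicates minority rows rather than interpolating them as SMOTE does. The names `ErrorRateDriftDetector` and `random_oversample`, and the 2.0/3.0/30 thresholds, are hypothetical illustration choices.

```python
import math
import random
from collections import Counter


class ErrorRateDriftDetector:
    """Hypothetical DDM-style detector: monitors the running error rate p
    and its standard deviation s = sqrt(p * (1 - p) / n)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_instances=30):
        self.warning_level = warning_level  # assumed warning multiplier
        self.drift_level = drift_level      # assumed drift multiplier
        self.min_instances = min_instances  # ignore the noisy start of a concept
        self.reset()

    def reset(self):
        """Forget the current concept's statistics."""
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        # Remember where p + s was lowest: the best-fitting state of the concept.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start tracking the new concept from scratch
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def random_oversample(X, y, rng=None):
    """Duplicate randomly chosen rows of each smaller class (with replacement)
    until every class has as many rows as the majority class."""
    rng = rng or random.Random(0)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    det = ErrorRateDriftDetector()
    for i in range(1, 1001):
        # Simulated stream: 10% error rate, rising to 50% after instance 500.
        err = (i % 10 == 0) if i <= 500 else (i % 2 == 0)
        if det.update(err) == "drift":
            print("drift detected at instance", i)
            break
    Xb, yb = random_oversample([[0.0]] * 95 + [[1.0]] * 5, [0] * 95 + [1] * 5)
    print(Counter(yb))
```

In an ensemble setting, a common design (again an assumption, not necessarily the paper's) is to start training a background learner on a "warning" signal and, on "drift", replace the weakest ensemble member with it. The current window would be rebalanced with the resampler before each update.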
Data availability
All the data were collected from the simulation reports of the software and tools used by the authors. The authors are working on implementing the same approach with real-world data, subject to appropriate permissions.
Funding
No funding was received for this project.
Ethics declarations
Ethical approval and human participation
No ethics approval is required.
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
S, P., R, A.U. Ensemble framework for concept drift detection and class imbalance in data streams. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18349-y
DOI: https://doi.org/10.1007/s11042-024-18349-y