Ensemble framework for concept drift detection and class imbalance in data streams

Multimedia Tools and Applications

Abstract

Many data mining applications generate data as streams that arrive continuously, and the distribution of this streaming data changes over time. Online ensemble learning is used to handle such changes in the underlying data distribution, known as concept drift, and to give a timely response to incoming data instances. Although many methods have been proposed for concept drift detection, learning from concept drift combined with class imbalance remains a challenge in data streams; the two occur together in real-world applications such as intrusion detection and fault detection. Learning from drifting, imbalanced data streams is therefore a significant challenge for the machine learning community. This paper proposes an Ensemble Based Enhanced Early Drift Detection Model and Random Resampling Technique for online learning, which detects drift based on the average error rate and its standard deviation. Class imbalance in the data stream is handled by generating synthetic data with the random resampling technique, and concept drift adaptation is performed using ensemble classifiers. The proposed ensemble method handles both concept drift and class imbalance with 98.52% accuracy.
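The abstract does not give the details of the enhanced drift detection model or of the resampling step, so the following is only a minimal sketch of the two general ideas under stated assumptions. The detector uses the classic DDM-style rule (running error rate p, standard deviation s = sqrt(p(1 - p)/n), warning and drift thresholds at 2 and 3 standard deviations above the best level seen), which is a stand-in and not the authors' model. The resampler simply duplicates randomly chosen minority samples rather than generating synthetic ones. All class, function, and parameter names here are hypothetical.

```python
import math
import random


class ErrorRateDriftDetector:
    """DDM-style stand-in: monitors the running error rate and its std."""

    def __init__(self, warn_k=2.0, drift_k=3.0, min_samples=30):
        # Threshold multipliers and warm-up length are assumed defaults.
        self.warn_k = warn_k
        self.drift_k = drift_k
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.best = math.inf  # lowest p + s observed so far
        self.best_p = 0.0
        self.best_s = 0.0

    def update(self, is_error):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.best:
            self.best, self.best_p, self.best_s = p + s, p, s
        if p + s > self.best_p + self.drift_k * self.best_s:
            self.reset()  # start fresh statistics for the new concept
            return "drift"
        if p + s > self.best_p + self.warn_k * self.best_s:
            return "warning"
        return "stable"


def random_oversample(samples, labels, rng=None):
    """Duplicate random minority samples until every class has equal size."""
    rng = rng or random.Random(0)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    return balanced
```

In an online ensemble, a "drift" signal like this would typically trigger replacing or retraining ensemble members, and the resampler would be applied to each incoming chunk before training; how the paper combines the two is not stated in the abstract.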

Data availability

All data were collected from the simulation reports of the software and tools used by the authors. The authors are working on applying the same approach to real-world data with appropriate permissions.

Funding

No funding was received for this project.

Author information

Corresponding author

Correspondence to Priya S.

Ethics declarations

Ethical approval and human participation

No ethics approval is required.

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

S, P., R, A.U. Ensemble framework for concept drift detection and class imbalance in data streams. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18349-y

  • DOI: https://doi.org/10.1007/s11042-024-18349-y
