Learning from evolving data streams through ensembles of random patches

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

Ensemble methods are an effective way to solve supervised learning problems, and they are prevalent in learning from evolving data streams. One of the main reasons for this popularity is the possibility of incorporating concept drift detection and recovery strategies in conjunction with the ensemble algorithm. On top of that, successful ensemble strategies, such as bagging and random forest, can be easily adapted to a streaming setting. In this work, we analyse a novel ensemble method designed specifically to cope with evolving data streams: the streaming random patches (SRP) algorithm. SRP combines random subspaces and online bagging to achieve competitive predictive performance in comparison with other methods. We significantly extend previous theoretical insights and empirical results illustrating different aspects of SRP. In particular, we explain how the widely adopted incremental Hoeffding trees are not, in fact, unstable learners, unlike their batch counterparts, and how this fact significantly influences ensemble design and performance. We compare SRP against state-of-the-art ensemble variants for streaming data on a multitude of datasets. The results show that SRP produces high predictive performance on both real and synthetic datasets. We also show that ensembles of random subspaces can be an efficient and accurate alternative to SRP and leveraging bagging as the number of base learners increases. Besides, we analyse diversity over time and average tree depth, which provides insights into the differences between local subspace randomization (as in random forest) and global subspace randomization (as in random subspaces). Finally, we analyse the behaviour of SRP when using Naive Bayes as its base learner instead of Hoeffding trees.
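
The abstract describes SRP as the combination of random subspaces (each base learner trains on a random subset of the features, drawn globally rather than per tree node) with online bagging (each base learner sees each arriving instance a Poisson-distributed number of times). The following minimal Python sketch illustrates only that combination; it is not the authors' MOA implementation, it omits SRP's per-learner drift detection and recovery, and the class and function names, the Poisson parameter lam, and the base learner's partial_fit/predict interface are assumptions made for illustration.

    import numpy as np
    from collections import Counter

    class RandomPatchMember:
        """One ensemble member trained on a fixed random subset of features."""

        def __init__(self, base_learner, n_features, subspace_size, rng):
            self.learner = base_learner
            # Global subspace randomization: the feature subset is drawn once,
            # at construction, and kept for the lifetime of this member.
            self.features = rng.choice(n_features, size=subspace_size, replace=False)

        def partial_fit(self, x, y, weight):
            x_sub = x[self.features]                 # project onto this member's patch
            for _ in range(weight):
                self.learner.partial_fit(x_sub, y)   # assumed incremental-learner API

        def predict(self, x):
            return self.learner.predict(x[self.features])

    def train_on_instance(ensemble, x, y, rng, lam=6.0):
        # Online bagging: each member sees the instance k ~ Poisson(lam) times.
        for member in ensemble:
            k = rng.poisson(lam)
            if k > 0:
                member.partial_fit(x, y, k)

    def predict_on_instance(ensemble, x):
        # Majority vote over the members' per-subspace predictions.
        votes = Counter(member.predict(x) for member in ensemble)
        return votes.most_common(1)[0][0]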


Notes

  1. The implementation and instructions are available at https://github.com/hmgomes/StreamingRandomPatches.

  2. A formal definition of concept drift can be found in [48].

  3. The implication goes only one way: algorithmic stability is sufficient, but not necessary, for learning.

  4. \(\Lambda \) could comprise values of multiple types; for example, the integer ensemble size M and the real-valued weights \(w_j\) could both be hyperparameters.

  5. GP and c were originally denoted \(n_{\min }\) and \(\delta \) by Domingos and Hulten [15]; however, we keep these acronyms, as used in the Massive Online Analysis (MOA) framework, to facilitate reproducibility (a sketch of the underlying Hoeffding bound is given after these notes).

  6. Results for AGR(A) and AGR(G) with \(k=50\%\) and \(k=60\%\) are identical, since \(0.5\times 9=4.5\) and \(0.6\times 9=5.4\) both round to 5 features when rounded to the nearest integer.

  7. In DWM [30], we can only set the maximum number of base learners, since DWM dynamically changes the ensemble size during execution.

  8. The results in Figs. 10 and 11 exclude the SPAM CPU time and RAM-hours measurements for all algorithms, since BAG and LB did not finish executing on that dataset.
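
To make the parameters in note 5 concrete: GP is the grace period \(n_{\min }\) (how many instances a leaf accumulates between split attempts) and c is the split confidence \(\delta \) used in the Hoeffding bound of Domingos and Hulten [15]. With probability \(1-\delta \), the true mean of a quantity with range R lies within \(\epsilon = \sqrt{R^2 \ln (1/\delta ) / (2n)}\) of its sample mean after n observations, and a leaf splits when the merit gap between its two best attributes exceeds \(\epsilon \). The sketch below illustrates this; the function names are illustrative assumptions, not MOA's actual API.

    import math

    def hoeffding_bound(value_range, delta, n):
        # epsilon = sqrt(R^2 * ln(1/delta) / (2n)); with probability 1 - delta the
        # true mean of a quantity with range R is within epsilon of the sample mean.
        return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

    def should_split(best_merit, second_best_merit, value_range, delta, n_seen, grace_period):
        # Split attempts are only made every grace_period (GP) instances;
        # delta plays the role of the confidence parameter c.
        if n_seen % grace_period != 0:
            return False
        eps = hoeffding_bound(value_range, delta, n_seen)
        return (best_merit - second_best_merit) > eps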

References

  1. Abdulsalam H, Skillicorn DB, Martin P (2008) Classifying evolving data streams using dynamic streaming random forests. In: International conference on database and expert systems applications. Springer, pp 643–651

  2. Bifet A, Frank E, Holmes G, Pfahringer B (2012) Ensembles of restricted Hoeffding trees. ACM TIST 3(2):30:1–30:20. https://doi.org/10.1145/2089094.2089106

  3. Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: SIAM international conference on data mining (SDM)

  4. Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604

  5. Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: PKDD, pp 135–150

  6. Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526

  7. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140. https://doi.org/10.1023/A:1018054314350

  8. Breiman L (1999) Pasting small votes for classification in large databases and on-line. Mach Learn 36(1–2):85–103

  9. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  10. Brown G, Wyatt J, Harris R, Yao X (2005) Diversity creation methods: a survey and categorisation. J Inf Fusion 6:5–20

  11. Brzezinski D, Stefanowski J (2014) Combining block-based and online methods in learning ensembles from concept drifting data streams. Inf Sci 265:50–67. https://doi.org/10.1016/j.ins.2013.12.011

  12. Chen ST, Lin HT, Lu CJ (2012) An online boosting algorithm with theoretical justifications. In: Proceedings of the international conference on machine learning (ICML)

  13. Da Xu L, He W, Li S (2014) Internet of things in industries: a survey. IEEE Trans Ind Inform 10(4):2233–2243

  14. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  15. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM SIGKDD, pp 71–80

  16. Domingos PM (2000) A unified bias-variance decomposition for zero-one and squared loss. AAAI 2000:564–569

  17. Freund Y, Schapire RE et al (1996) Experiments with a new boosting algorithm. ICML 96:148–156

  18. Gama J, Zliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4):44:1–44:37. https://doi.org/10.1145/2523813

  19. Gomes HM, Barddal JP, Enembreck F, Bifet A (2017) A survey on ensemble learning for data stream classification. ACM Comput Surv 50(2):23:1–23:36. https://doi.org/10.1145/3054925

  20. Gomes HM, Barddal JP, Ferreira LEB, Bifet A (2018) Adaptive random forests for data stream regression. In: ESANN

  21. Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfharinger B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 6:1–27. https://doi.org/10.1007/s10994-017-5642-8

  22. Gomes HM, Montiel J, Mastelini SM, Pfahringer B, Bifet A (2020) On ensemble techniques for data stream regression. In: 2020 International joint conference on neural networks (IJCNN). IEEE, pp 1–8

  23. Gomes HM, Read J, Bifet A (2019) Streaming random patches for evolving data stream classification. In: IEEE international conference on data mining. IEEE

  24. Gomes HM, Read J, Bifet A, Barddal JP, Gama J (2019) Machine learning for streaming data: state of the art, challenges, and opportunities. ACM SIGKDD Explor Newsl 21(2):6–22

  25. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New York

  26. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844

  27. Hoens TR, Chawla NV, Polikar R (2011) Heuristic updatable weighted random subspaces for non-stationary environments. In: 2011 IEEE 11th international conference on data mining (ICDM). IEEE, pp 241–250

  28. Holmes G, Kirkby R, Pfahringer B (2005) Stress-testing Hoeffding trees. Knowl Discov Databases PKDD 2005:495–502. https://doi.org/10.1007/11564126_50

  29. Ikonomovska E, Gama J, Džeroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Discov 23(1):128–168

  30. Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting concepts. J Mach Learn Res 8:2755–2790

  31. Kuncheva LI (2003) That elusive diversity in classifier ensembles. In: Iberian conference on pattern recognition and image analysis. Springer, pp 1126–1138

  32. Kuncheva LI, Rodríguez JJ, Plumpton CO, Linden DE, Johnston SJ (2010) Random subspace ensembles for FMRI classification. IEEE Trans Med Imaging 29(2):531–542

  33. Kutin S, Niyogi P (2002) Almost-everywhere algorithmic stability and generalization error. In: Proceedings of the eighteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann, pp 275–282

  34. Kutin S, Niyogi P (2002) Almost-everywhere algorithmic stability and generalization error. Tech. Rep. TR-2002-03, University of Chicago

  35. Lim N, Durrant RJ (2017) Linear dimensionality reduction in linear time: Johnson-Lindenstrauss-type guarantees for random subspace. arXiv:1705.06408

  36. Lim N, Durrant RJ (2020) A diversity-aware model for majority vote ensemble accuracy. In: International conference on artificial intelligence and statistics. PMLR, pp 4078–4087

  37. Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101(474):578–590

  38. Littlestone N, Warmuth MK (1994) The weighted majority algorithm. Inf Comput 108(2):212–261

  39. Liu Y, Yao X (1999) Ensemble learning via negative correlation. Neural Netw 12:1399–1404

  40. Louppe G, Geurts P (2012) Ensembles on random patches. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 346–361

  41. Minku LL, White AP, Yao X (2010) The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans Knowl Data Eng 22(5):730–742

  42. Oza N, Russell S (2001) Online bagging and boosting. In: Artificial intelligence and statistics 2001, pp 105–112. Morgan Kaufmann

  43. Panov P, Džeroski S (2007) Combining bagging and random subspaces to create better ensembles. In: International symposium on intelligent data analysis. Springer, pp 118–129

  44. Plumpton CO, Kuncheva LI, Oosterhof NN, Johnston SJ (2012) Naive random subspace ensemble with linear classifiers for real-time classification of FMRI data. Pattern Recognit 45(6):2101–2108

  45. Servedio RA (2003) Smooth boosting and learning with malicious noise. J Mach Learn Res 4:633–648

  46. Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958

  47. Stapenhurst RJ (2012) Diversity, margins and non-stationary learning. Ph.D. thesis, University of Manchester, UK

  48. Webb GI, Hyde R, Cao H, Nguyen HL, Petitjean F (2016) Characterizing concept drift. Data Min Knowl Discov 30(4):964–994

  49. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101. https://doi.org/10.1023/A:1018046501280

  50. Žliobaite I (2010) Change with delayed labeling: when is it detectable? In: 2010 IEEE international conference on data mining workshops (ICDMW). IEEE, pp 843–850

Author information

Corresponding author

Correspondence to Heitor Murilo Gomes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gomes, H.M., Read, J., Bifet, A. et al. Learning from evolving data streams through ensembles of random patches. Knowl Inf Syst 63, 1597–1625 (2021). https://doi.org/10.1007/s10115-021-01579-z
