Abstract
The growing volume of data generated by devices, people and systems creates the need to process non-stationary data streams, whose distribution changes continuously over time. Compared with data stream classification, data stream regression has received little study. This work proposes AFXGBReg-D, an Adaptive Fast regression algorithm that uses XGBoost and active concept drift detectors. AFXGBReg uses an alternate model training strategy to obtain lean models adapted to concept drift, combined with a set of drift detector algorithms: ADWIN, KSWIN and DDM. We compared two AFXGBReg variants against other regressors and data stream regressors in simulations on synthetic datasets with different kinds of concept drift. The AFXGBReg models achieve an MSE similar to that of ARFReg, and statistical tests confirm that these models outperform the others. AFXGBReg is also 33 times faster than ARFReg, so it keeps the same MSE level at a much lower computational cost. In addition, it recovers faster from concept drifts, with a smaller MSE peak.
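The general idea of active drift handling summarized above can be illustrated with a minimal sketch. It is not the AFXGBReg implementation: a one-dimensional least-squares line stands in for XGBoost, and `WindowMeanShiftDetector` is a hypothetical toy detector standing in for ADWIN, KSWIN or DDM. All names, window sizes and thresholds are assumptions chosen for illustration. The stream is evaluated test-then-train, and when the detector fires the model is discarded and refit on samples collected after the alarm.

```python
import random
from collections import deque


class WindowMeanShiftDetector:
    """Toy detector (NOT ADWIN, KSWIN or DDM): signals drift when the mean of
    the newer half of the error window exceeds the older half by `threshold`."""

    def __init__(self, window=60, threshold=1.0):
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def update(self, error):
        self.errors.append(error)
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        values = list(self.errors)
        half = len(values) // 2
        old_mean = sum(values[:half]) / half
        new_mean = sum(values[half:]) / (len(values) - half)
        return new_mean - old_mean > self.threshold

    def reset(self):
        self.errors.clear()


def fit_line(points):
    """Ordinary least squares for y = a * x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def run_stream(n_samples=600, drift_at=300, buffer_size=30, seed=0):
    """Synthetic stream with one abrupt drift (slope 2.0 -> -1.0)."""
    rng = random.Random(seed)
    detector = WindowMeanShiftDetector()
    buffer = deque(maxlen=buffer_size)
    slope_hat, intercept_hat = 0.0, 0.0
    trained = False
    drifts, squared_errors = [], []
    for t in range(n_samples):
        x = rng.uniform(0.0, 10.0)
        true_slope = 2.0 if t < drift_at else -1.0
        y = true_slope * x + rng.gauss(0.0, 0.1)
        # Test-then-train: predict first, then use the label.
        prediction = slope_hat * x + intercept_hat
        error = abs(y - prediction)
        squared_errors.append(error ** 2)
        buffer.append((x, y))
        if not trained:
            # (Re)fit once enough fresh samples have been collected.
            if len(buffer) == buffer_size:
                slope_hat, intercept_hat = fit_line(buffer)
                trained = True
                detector.reset()
        elif detector.update(error):
            # Drift signalled: drop old samples and wait for fresh ones.
            drifts.append(t)
            buffer.clear()
            trained = False
    tail = squared_errors[-100:]
    return drifts, sum(tail) / len(tail)
```

Calling `drifts, tail_mse = run_stream()` returns the time steps at which the toy detector fired and the MSE over the last 100 samples. Discarding the buffer on an alarm keeps the refit model free of pre-drift samples, at the cost of a short period in which the stale model is still used for prediction.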
Acknowledgments
We would especially like to thank FAPESC (Fundação de Amparo à Pesquisa e Inovação do Estado de Santa Catarina) for partially funding this research work.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
de Souza, F.M., Grando, J., Baldo, F. (2022). Adaptive Fast XGBoost for Regression. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_7
DOI: https://doi.org/10.1007/978-3-031-21686-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21685-5
Online ISBN: 978-3-031-21686-2
eBook Packages: Computer Science (R0)