Interval forecasts based on regression trees for streaming data

Zhao, Xin; Barber, Stuart; Taylor, Charles C.; Milan, Zoka

doi:10.1007/s11634-019-00382-7

Regular Article
Published: 18 December 2019

Volume 15, pages 5–36 (2021)
Advances in Data Analysis and Classification

Xin Zhao¹,², Stuart Barber², Charles C. Taylor² & Zoka Milan³

Abstract

In forecasting, we often require interval forecasts rather than a single point forecast. To track streaming data effectively, the interval forecast should reliably cover the observed data and yet be as narrow as possible. To achieve this, we propose two methods based on regression trees: one ensemble method and one method based on a single tree. For the ensemble method, we use weighted results from the most recent models; for the single-tree method, we retain one model until it becomes necessary to train a new one. We propose a novel method to update the interval forecast adaptively using root mean square prediction errors calculated from the latest data batch. We use wavelet-transformed data to capture information about the variables over long time spans, and conditional inference trees as the underlying regression tree model. Results show that both methods perform well, achieving good coverage without excessively wide intervals. When the underlying data-generating mechanism changes, their performance is initially affected but recovers relatively quickly as time proceeds. The single-tree method requires less computational (CPU) time than the ensemble method. Compared with ARIMA and GARCH modelling, our methods achieve better or similar coverage and width but require considerably less CPU time.
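
As a rough illustration of the adaptive interval update described in the abstract, the following Python sketch rescales a symmetric interval using the root mean square prediction error (RMSPE) of the most recent batch. The abstract does not give the exact update formula, so the symmetric form forecast ± k × RMSPE, the multiplier k, the initial scale, and the persistence forecaster standing in for the regression tree are all assumptions made for illustration, not the authors' method.

```python
import math


def rmspe(actual, predicted):
    """Root mean square prediction error over one batch."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("batches must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def interval_forecasts(point_forecasts, scale, k=2.0):
    """Symmetric intervals: forecast +/- k * scale (assumed form)."""
    return [(f - k * scale, f + k * scale) for f in point_forecasts]


def coverage(actual, intervals):
    """Fraction of observations that fall inside their interval."""
    hits = sum(lo <= a <= hi for a, (lo, hi) in zip(actual, intervals))
    return hits / len(actual)


def mean_width(intervals):
    """Average interval width over one batch."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


def run_stream(batches, k=2.0, initial_scale=1.0):
    """Process batches in order, updating the interval scale after each one.

    The point forecaster here is a hypothetical placeholder (it repeats the
    last value of the previous batch); a fitted regression tree would take
    its place in a real setting.
    """
    scale = initial_scale
    last_value = batches[0][-1]
    results = []
    for batch in batches[1:]:
        preds = [last_value] * len(batch)
        intervals = interval_forecasts(preds, scale, k)
        results.append((coverage(batch, intervals), mean_width(intervals)))
        # Adapt the interval using errors from the batch just observed.
        scale = rmspe(batch, preds)
        last_value = batch[-1]
    return results


if __name__ == "__main__":
    stream = [[1.0, 1.0], [1.0, 3.0], [3.0, 3.0]]
    for cov, width in run_stream(stream):
        print(f"coverage={cov:.2f} mean width={width:.3f}")
```

Coverage and mean width are reported per batch because the abstract judges the methods on exactly these two quantities: intervals should cover the data while staying as narrow as possible.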

Acknowledgements

Xin Zhao is grateful for the financial support of the China Scholarship Council (CSC) (Grant No. 201506270135) during this research, which was completed during her Ph.D. studies at the University of Leeds.

Author information

Authors and Affiliations

School of Mathematics, Southeast University, Nanjing, 210096, China
Xin Zhao
School of Mathematics, University of Leeds, Leeds, LS2 9JT, UK
Xin Zhao, Stuart Barber & Charles C. Taylor
King’s College Hospital Trust, London, SE5 9RS, UK
Zoka Milan

Corresponding author

Correspondence to Xin Zhao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (r 24 KB)

Supplementary material 2 (r 23 KB)

About this article

Cite this article

Zhao, X., Barber, S., Taylor, C.C. et al. Interval forecasts based on regression trees for streaming data. Adv Data Anal Classif 15, 5–36 (2021). https://doi.org/10.1007/s11634-019-00382-7

Received: 04 May 2018
Revised: 05 December 2019
Accepted: 09 December 2019
Published: 18 December 2019
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11634-019-00382-7

Keywords

Mathematics Subject Classification

62M10
