
Interval forecasts based on regression trees for streaming data

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

In forecasting, we often require interval forecasts rather than a single point forecast. To track streaming data effectively, an interval forecast should reliably cover the observed data while being as narrow as possible. To achieve this, we propose two methods based on regression trees: an ensemble method and a method based on a single tree. The ensemble method uses weighted results from the most recent models, whereas the single-tree method retains one model until it becomes necessary to train a new one. We propose a novel method to update the interval forecast adaptively using root mean square prediction errors calculated from the latest data batch. We use wavelet-transformed data to capture information about the variables over long time spans, and conditional inference trees as the underlying regression tree model. Results show that both methods perform well, achieving good coverage without excessively wide intervals. When the underlying data generation mechanism changes, their performance is initially affected but can recover relatively quickly as time proceeds. The single-tree method requires less computational (CPU) time than the ensemble method. Compared with ARIMA and GARCH modelling, our methods achieve better or similar coverage and width but require considerably less CPU time.
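The two evaluation criteria named in the abstract, coverage and width, and the idea of sizing an interval from the latest batch's root mean square prediction error (RMSPE) can be illustrated with a minimal sketch. The symmetric form (point forecast plus or minus a multiple of the RMSPE), the `multiplier` value, the function names, and all numbers below are assumptions made for illustration; the abstract does not state the exact update rule used in the article.

```python
import math

def rmspe(actual, predicted):
    """Root mean square prediction error over one batch."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def interval_forecast(point_forecasts, batch_rmspe, multiplier=2.0):
    """Symmetric intervals: point forecast +/- multiplier * RMSPE (assumed form)."""
    half = multiplier * batch_rmspe
    return [(f - half, f + half) for f in point_forecasts]

def coverage_and_width(intervals, observed):
    """Fraction of observations inside their interval, and mean interval width."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, observed))
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return hits / len(observed), mean_width

# Latest completed batch: observed values and point forecasts (made-up numbers).
prev_actual = [10.0, 12.0, 11.0, 13.0]
prev_pred = [11.0, 11.0, 12.0, 12.0]
err = rmspe(prev_actual, prev_pred)  # every error is +/-1, so the RMSPE is 1.0

# Next batch: point forecasts first, then the values that actually arrive.
next_pred = [12.0, 14.0, 13.0]
next_actual = [12.5, 17.0, 13.0]
intervals = interval_forecast(next_pred, err)
coverage, width = coverage_and_width(intervals, next_actual)
```

With these toy numbers, two of the three new observations fall inside their intervals, and each interval has width 4. A larger `multiplier` would raise coverage at the cost of wider intervals, which is the trade-off the abstract describes.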





Acknowledgements

Xin Zhao is grateful for the financial support of the China Scholarship Council (CSC) (Grant No. 201506270135) during this research, which was completed during her Ph.D. studies at the University of Leeds.

Author information

Corresponding author

Correspondence to Xin Zhao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (r 24 KB)

Supplementary material 2 (r 23 KB)

About this article

Cite this article

Zhao, X., Barber, S., Taylor, C.C. et al. Interval forecasts based on regression trees for streaming data. Adv Data Anal Classif 15, 5–36 (2021). https://doi.org/10.1007/s11634-019-00382-7

  • DOI: https://doi.org/10.1007/s11634-019-00382-7
