Probabilistic forecasting with temporal convolutional neural network
Introduction
Time series forecasting plays a key role in many business decision-making scenarios, such as managing limited resources and optimizing operational processes. Most existing forecasting methods focus on point forecasting, i.e., forecasting the conditional mean or median of future observations. However, probabilistic forecasting is becoming increasingly important, as it extracts richer information from historical data and better captures the uncertainty of the future. In the retail business, probabilistic forecasting of product supply and demand is fundamental for a successful procurement process and optimal inventory planning. Likewise, probabilistic shipment forecasting, i.e., generating probability distributions of package delivery volumes, is a key component of downstream logistics operations such as labor resource planning and delivery vehicle deployment.
In such circumstances, instead of predicting an individual or a small number of time series, one needs to predict thousands or millions of related series. Moreover, real-world applications pose many further challenges. For instance, new products emerge weekly on retail platforms, and one often needs to forecast their demand without historical shopping-festival data (e.g., Black Friday in North America, the “11.11” shopping festival in China). Furthermore, forecasting often requires the consideration of exogenous variables that have significant influence on future demand (e.g., promotion plans provided by operations teams, or accurate weather forecasts for brick-and-mortar retailers). Such forecasting problems extend to a variety of domains, including web traffic for internet companies [16], energy consumption of individual households, server load in a data center [33] and traffic flows in the transportation domain [20].
Classical forecasting methods, such as ARIMA [AutoRegressive Integrated Moving Average, 5] and exponential smoothing [13], are widely employed for univariate base-level forecasting. To incorporate exogenous covariates, several extensions of these methods have been proposed, such as ARIMAX (AutoRegressive Integrated Moving Average with Explanatory Variable) and dynamic regression models [14]. These models are well-suited for applications in which the structure of the data is well understood and there is sufficient historical data. However, working with thousands or millions of series requires prohibitive labor and computing resources for parameter estimation. Moreover, they are not applicable in situations where historical data is sparse or unavailable.
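To make the per-series cost of classical methods concrete, the following is a minimal sketch of simple exponential smoothing; the smoothing parameter and toy demand series are illustrative, not from the paper. Each series requires its own fitted model, which is what becomes prohibitive at the scale of millions of series.

```python
# Minimal sketch of simple exponential smoothing, one of the classical
# univariate baselines discussed above. `alpha` and the toy series are
# illustrative values, not estimates from the paper's data.

def ses_forecast(y, alpha=0.3):
    """Return the flat one-step-ahead forecast after smoothing the series y."""
    level = y[0]                                   # initialize level at the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level  # exponentially weighted update
    return level

demand = [10, 12, 11, 13, 12, 14]
print(ses_forecast(demand, alpha=0.5))  # → 13.0
```

In practice one would also estimate `alpha` per series (e.g., by minimizing in-sample squared error), which is exactly the per-series parameter estimation that does not scale to large collections.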
Models based on recurrent neural networks (RNNs) [9] and the sequence-to-sequence (Seq2Seq) framework [7], [40] have achieved great success in many sequential tasks such as machine translation [40], language modeling [25] and, more recently, time series forecasting [19], [31], [32], [33], [34], [43]. For example, in the forecasting competition community, a Seq2Seq model based on the gated recurrent unit (GRU) [7] won the Kaggle web traffic forecasting competition [39]. A hybrid model that combines the exponential smoothing method with an RNN won the M4 forecasting competition, which consists of 100,000 series with different seasonal patterns [22]. However, training with the back-propagation through time (BPTT) algorithm often hampers efficient computation. In addition, training RNNs can be remarkably difficult [30], [44]. Dilated causal convolutional architectures, e.g., Wavenet [28], offer an alternative for modeling sequential data. By stacking layers of dilated causal convolutions, receptive fields can be increased and long-term correlations captured without violating the temporal order. In addition, in dilated causal convolutional architectures, training can be performed in parallel, which ensures computational efficiency.
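The receptive-field growth mentioned above can be sketched with a back-of-the-envelope calculation. The WaveNet-style doubling dilation schedule below is an assumption for illustration, not the paper's exact configuration.

```python
# Why stacked dilated causal convolutions capture long-range structure:
# each layer with dilation d and kernel size k adds (k - 1) * d past steps
# to the receptive field, so a doubling schedule grows it exponentially
# in depth. The schedule here is an illustrative assumption.

def receptive_field(kernel_size, dilations):
    """Number of past steps visible to one output after stacking the layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

for n_layers in range(1, 6):
    dilations = [2 ** i for i in range(n_layers)]  # 1, 2, 4, 8, ...
    print(n_layers, receptive_field(2, dilations))
```

With kernel size 2, five layers already cover 32 past steps, whereas an undilated stack of the same depth would cover only 6; this is why depth buys long memory cheaply in such architectures.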
Most Seq2Seq frameworks, as well as Wavenet [28], are autoregressive generative models that factorize the joint distribution as a product of conditionals. In this setting, a one-step-ahead prediction approach is adopted: a prediction is first generated using past observations, and the generated result is then fed back as if it were ground truth to make further forecasts. More recent research shows that non-autoregressive approaches, or the direct prediction strategy of predicting all time steps in the horizon at once, can achieve better performance [1], [10], [43]. In particular, non-autoregressive models are more robust to mis-specification because they avoid error accumulation, and thus yield better prediction accuracy. Moreover, training over all prediction horizons can be parallelized.
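The contrast between the two decoding strategies can be sketched with a toy model; the AR(1)-style "predictor" with coefficient 0.5 is a stand-in for illustration, not DeepTCN itself.

```python
# Toy contrast between the two decoding strategies discussed above:
# recursive one-step-ahead prediction (outputs fed back as inputs) versus
# direct multi-horizon prediction (one dedicated output per step).
# The 0.5 coefficient is an illustrative assumption.

def recursive_forecast(last_obs, steps, coef=0.5):
    preds, x = [], last_obs
    for _ in range(steps):
        x = coef * x          # the prediction is fed back as the next input,
        preds.append(x)       # so an early error compounds over the horizon
    return preds

def direct_forecast(last_obs, steps):
    # one predictor per horizon step; no feedback loop, trainable in parallel
    return [(0.5 ** (h + 1)) * last_obs for h in range(steps)]

print(recursive_forecast(100.0, 3))  # [50.0, 25.0, 12.5]
print(direct_forecast(100.0, 3))     # identical here, but each step is independent
```

With a perfectly specified model the two coincide; the practical difference appears under mis-specification, where the recursive scheme feeds its own errors back in while the direct scheme does not.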
Having reviewed these challenges and developments, in this paper we propose the Deep Temporal Convolutional Network (DeepTCN), a non-autoregressive probabilistic forecasting framework for large collections of related time series. The main contributions of the paper are as follows:
- We propose a CNN-based forecasting framework that provides both parametric and non-parametric approaches for probability density estimation.
- The framework learns latent correlations among series and handles complex real-world forecasting situations such as data sparsity and cold starts, showing high scalability and extensibility.
- The model is flexible and can include exogenous covariates such as promotion plans or weather forecasts.
- Extensive empirical studies show that our framework compares favorably to state-of-the-art methods in both point forecasting and probabilistic forecasting tasks.
The rest of this paper is organized as follows. Section 2 provides a brief review of related work on time series forecasting and deep learning methods for forecasting. In Section 3, we describe the proposed forecasting method, including the neural network architectures, the probabilistic forecasting framework, and the input features. We demonstrate the superiority of the proposed approach via extensive experiments in Section 4 and conclude the paper in Section 5.
Related work
Earlier studies on time series forecasting are mostly based on statistical models, mainly generative models built on the state space framework, such as exponential smoothing, ARIMA and several extensions thereof. For these methods, Hyndman et al. [13] and Box et al. [5] provide comprehensive overviews in the context of univariate forecasting.
In recent years, large numbers of related series have been emerging in the routine functioning of many companies. Not sharing information from other
Method
A general probabilistic forecasting problem for multiple related time series can be described as follows: given a set of historical time series $\{y^{(i)}_{1:t}\}_{i=1}^{N}$, we denote the future time series as $\{y^{(i)}_{(t+1):(t+\Omega)}\}_{i=1}^{N}$, where $N$ is the number of series, $t$ is the length of the historical observations and $\Omega$ is the length of the forecasting horizon. Our goal is to model the conditional distribution of the future time series, $P\big(\{y^{(i)}_{(t+1):(t+\Omega)}\}_{i=1}^{N} \mid \{y^{(i)}_{1:t}\}_{i=1}^{N}\big)$.
Classical generative models are often used to
Datasets
We evaluate the performance of DeepTCN on five datasets. More specifically, within the DeepTCN framework, two models – the non-parametric model that predicts the quantiles and the parametric Gaussian likelihood model – are applied for the forecasting performance evaluation. We refer to them as DeepTCN-Quantile and DeepTCN-Gaussian, respectively, for the rest of the paper.
Table 1 shows the details of the five datasets. JD-demand and JD-shipment are from JD.com, which correspond to two
Conclusion
We present a convolution-based probabilistic forecasting framework for multiple related time series and show both non-parametric and parametric approaches to model the probability distribution with neural networks. Our solution can help in the design of practical large-scale forecasting applications, which involve situations such as cold starts and data sparsity. Results from both industrial datasets and public datasets show that the framework yields superior performance compared to
CRediT authorship contribution statement
Yitian Chen: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Yanfei Kang: Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Writing - original draft, Writing - review & editing. Yixiong Chen: Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Visualization, Writing -
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Yanfei Kang’s research was supported by the National Natural Science Foundation of China (No. 11701022).
References (45)
- K. Bandara, C. Bergmeir, S. Smyl, Forecasting across time series databases using recurrent neural networks on groups of similar series: a clustering approach, Expert Syst. Appl. (2020)
- S. Makridakis, E. Spiliotis, V. Assimakopoulos, The M4 competition: results, findings, conclusion and way forward, Int. J. Forecast. (2018)
- A. Sagheer, M. Kotb, Time series forecasting of petroleum production using deep LSTM recurrent networks, Neurocomputing (2019)
- A. Suilin, 1st Place Solution of Kaggle Web Traffic Time Series Forecasting, 2017, ...
- R. Wen, K. Torkkola, B. Narayanaswamy, A Multi-Horizon Quantile Recurrent Forecaster, arXiv preprint: 1711.11053 ...
- P.J. Werbos, Backpropagation through time: what it does and how to do it, Proc. IEEE (1990)
- S. Bai, J.Z. Kolter, V. Koltun, An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence ...
- F.M. Bianchi, E. Maiorino, M.C. Kampffmeyer, A. Rizzi, R. Jenssen, An Overview and Comparative Analysis of Recurrent ...
- A. Borovykh, S. Bohte, C.W. Oosterlee, Conditional Time Series Forecasting With Convolutional Neural Networks, 2017, ...
- G.E.P. Box, G.M. Jenkins, G.C. Reinsel, G.M. Ljung, Time Series Analysis: Forecasting and Control (2015)
- K. Cho et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
- J. Gasthaus et al., Probabilistic forecasting with spline quantile function RNNs, in: Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
- K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in: Proceedings of the European Conference on Computer Vision
- R.J. Hyndman, A.B. Koehler, J.K. Ord, R.D. Snyder, Forecasting with Exponential Smoothing: The State Space Approach
- R.J. Hyndman, G. Athanasopoulos, Forecasting: Principles and Practice
- S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: Proceedings of the International Conference on Machine Learning
- G. Ke et al., LightGBM: a highly efficient gradient boosting decision tree, in: Proceedings of the Advances in Neural Information Processing Systems
Yitian Chen is a senior algorithm engineer at Bigo, Bigo Beijing R&D center. Prior to that, he was a senior algorithm engineer in the intelligent supply chain unit at JD.com and an R&D engineer at Yahoo!. Yitian holds a master’s degree in data science from Johns Hopkins University, and his research interests include statistical learning, optimization and their applications to real industrial problems.
Yanfei Kang is Associate Professor of Statistics in the School of Economics and Management at Beihang University in China. Prior to that, she was a Senior R&D Engineer in the Big Data Group of Baidu Inc. Yanfei obtained her Ph.D. in Applied and Computational Mathematics at Monash University in 2014, and worked there as a postdoctoral research fellow in feature-based time series forecasting from 2014 to 2015. Her research interests include time series forecasting, time series visualization, statistical computing and machine learning.
Yixiong Chen is a data scientist in IBM CIC China. He obtained his PhD in Applied Mathematics from King’s College London in 2015. His research interests include deep learning and quantum physics.
Zizhuo Wang is an associate professor in the Department of Industrial and Systems Engineering at the University of Minnesota, and at the Institute for Data and Decision Analytics at the Chinese University of Hong Kong, Shenzhen. He obtained his PhD in Operations Research from Stanford University in 2012. His research interests include machine learning, optimization, and applications in operations management and revenue management.