Elsevier

Neurocomputing

Volume 399, 25 July 2020, Pages 491-501

Probabilistic forecasting with temporal convolutional neural network

https://doi.org/10.1016/j.neucom.2020.03.011

Abstract

We present a probabilistic forecasting framework based on convolutional neural networks (CNNs) for multiple related time series. The framework can estimate probability densities under both parametric and non-parametric settings. More specifically, stacked residual blocks based on dilated causal convolutions are constructed to capture the temporal dependencies of the series. Combined with representation learning, our approach is able to learn complex patterns, such as seasonality and holiday effects, within and across series, and to leverage those patterns for more accurate forecasts, especially when historical data are sparse or unavailable. Extensive empirical studies are performed on several real-world datasets, including datasets from JD.com, China’s largest online retailer. The results show that our framework compares favorably to the state-of-the-art in both point and probabilistic forecasting.

Introduction

Time series forecasting plays a key role in many business decision-making scenarios, such as managing limited resources and optimizing operational processes. Most existing forecasting methods focus on point forecasting, i.e., forecasting the conditional mean or median of future observations. However, probabilistic forecasting is becoming increasingly important, as it is able to extract richer information from historical data and better capture the uncertainty of the future. In the retail business, probabilistic forecasting of product supply and demand is fundamental for a successful procurement process and optimal inventory planning. Likewise, probabilistic shipment forecasting, i.e., generating probability distributions of the delivery volumes of packages, is the key component of consequent logistics operations, such as labor resource planning and delivery vehicle deployment.

In such circumstances, instead of predicting an individual series or a small number of time series, one needs to predict thousands or millions of related series. Moreover, real-world applications bring many additional challenges. For instance, new products emerge weekly on retail platforms, and one often needs to forecast their demand without historical shopping-festival data (e.g., Black Friday in North America, the “11.11” shopping festival in China). Furthermore, forecasting often requires the consideration of exogenous variables that have a significant influence on future demand (e.g., promotion plans provided by operations teams, accurate weather forecasts for brick-and-mortar retailers). Such forecasting problems extend to a variety of domains. Examples include forecasting web traffic for internet companies [16], energy consumption for individual households, the load of servers in a data center [33] and traffic flows in the transportation domain [20].

Classical forecasting methods, such as ARIMA (AutoRegressive Integrated Moving Average) [5] and exponential smoothing [13], are widely employed for univariate base-level forecasting. To incorporate exogenous covariates, several extensions of these methods have been proposed, such as ARIMAX (AutoRegressive Integrated Moving Average with eXplanatory variables) and dynamic regression models [14]. These models are well suited to applications in which the structure of the data is well understood and sufficient historical data are available. However, working with thousands or millions of series requires prohibitive labor and computing resources for parameter estimation. Moreover, these models are not applicable when historical data are sparse or unavailable.
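To make the exponential smoothing idea concrete, here is a minimal sketch of simple exponential smoothing, the simplest member of the family covered in [13]; the series and smoothing parameter below are illustrative, not from the paper:

```python
def simple_exp_smoothing(y, alpha):
    """Simple exponential smoothing:
    level_t = alpha * y_t + (1 - alpha) * level_{t-1};
    the one-step-ahead forecast is the current level."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level  # forecast for the next period

# With alpha = 0.5 the level halves the weight of each older observation.
forecast = simple_exp_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
```

Each series gets its own smoothing parameter, which is exactly why fitting millions of such models independently becomes expensive.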

Models based on recurrent neural networks (RNNs) [9] and the sequence-to-sequence (Seq2Seq) framework [7], [40] have achieved great success in many sequential tasks such as machine translation [40], language modeling [25] and, recently, time series forecasting [19], [31], [32], [33], [34], [43]. For example, in the forecasting competition community, a Seq2Seq model based on the gated recurrent unit (GRU) [7] won the Kaggle web traffic forecasting competition [39]. A hybrid model that combines exponential smoothing and an RNN won the M4 forecasting competition, which consists of 100,000 series with different seasonal patterns [22]. However, training with the back-propagation through time (BPTT) algorithm often hampers efficient computation. In addition, training RNNs can be remarkably difficult [30], [44]. Dilated causal convolutional architectures, e.g., WaveNet [28], offer an alternative for modeling sequential data. By stacking layers of dilated causal convolutions, the receptive field can be increased and long-term correlations can be captured without violating the temporal order. In addition, in dilated causal convolutional architectures the training process can be performed in parallel, which improves computational efficiency.
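As a rough illustration of the dilated causal convolution idea (a sketch, not the paper's implementation), the following left-pads the input so that each output depends only on current and past values, then stacks layers with doubling dilations to grow the receptive field; the kernel weights and series are arbitrary:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: the output at time t depends only on
    x[t], x[t - d], x[t - 2d], ... (no future leakage)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so len(output) == len(input)
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Stacking layers with dilations 1, 2, 4 (kernel size 2) gives a receptive
# field of 1 + (2 - 1) * (1 + 2 + 4) = 8 past steps, while each layer is
# computed over all time steps at once (hence parallelizable training).
x = np.arange(8, dtype=float)
h = x
for d in (1, 2, 4):
    h = causal_dilated_conv(h, np.array([0.5, 0.5]), d)
```

Doubling the dilation per layer makes the receptive field grow exponentially with depth, which is what allows such architectures to capture long-term correlations without very deep stacks.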

Most Seq2Seq frameworks, as well as WaveNet [28], are autoregressive generative models that factorize the joint distribution as a product of conditionals. In this setting, a one-step-ahead prediction approach is adopted: a prediction is first generated using past observations, and the generated result is then fed back as if it were ground truth to make further forecasts. More recent research shows that non-autoregressive approaches, i.e., a direct prediction strategy that predicts the observations at all time steps at once, can achieve better performance [1], [10], [43]. In particular, non-autoregressive models are more robust to mis-specification because they avoid error accumulation, and thus yield better prediction accuracy. Moreover, training over all prediction horizons can be parallelized.
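The contrast between the two strategies can be sketched with toy predictors (the naive last-value functions below are placeholders, not the paper's models):

```python
import numpy as np

rng = np.random.default_rng(0)
history = np.cumsum(rng.normal(size=50))  # a toy random-walk series
horizon = 4

def recursive_forecast(series, steps, one_step):
    """Autoregressive strategy: each forecast is fed back as an input for the
    next step, so a one-step error propagates through the rest of the horizon."""
    buf = list(series)
    out = []
    for _ in range(steps):
        yhat = one_step(buf)
        out.append(yhat)
        buf.append(yhat)  # prediction treated as ground truth
    return np.array(out)

def direct_forecast(series, steps, multi_step):
    """Non-autoregressive (direct) strategy: every horizon is predicted from the
    observed history only, so the steps are independent and parallelizable."""
    return np.array([multi_step(series, h) for h in range(1, steps + 1)])

naive_one_step = lambda buf: buf[-1]       # toy one-step-ahead predictor
naive_direct = lambda series, h: series[-1]  # toy per-horizon predictor
rec = recursive_forecast(history, horizon, naive_one_step)
direct = direct_forecast(history, horizon, naive_direct)
```

With these trivial predictors the two strategies coincide; with a learned model they generally do not, because the recursive loop conditions later steps on its own (possibly wrong) earlier outputs.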

Having reviewed these challenges and developments, in this paper we propose the Deep Temporal Convolutional Network (DeepTCN), a non-autoregressive probabilistic forecasting framework for large collections of related time series. The main contributions of this paper are as follows:

  • We propose a CNN-based forecasting framework that provides both parametric and non-parametric approaches for probability density estimation.

  • The framework is able to learn latent correlations among series and to handle complex real-world forecasting situations, such as data sparsity and cold starts, while remaining highly scalable and extensible.

  • The model is very flexible and can include exogenous covariates such as additional promotion plans or weather forecasts.

  • Extensive empirical studies show our framework compares favorably to state-of-the-art methods in both point forecasting and probabilistic forecasting tasks.

The rest of this paper is organized as follows. Section 2 provides a brief review of related work on time series forecasting and deep learning methods for forecasting. In Section 3, we describe the proposed forecasting method, including the neural network architectures, the probabilistic forecasting framework, and the input features. We demonstrate the superiority of the proposed approach via extensive experiments in Section 4 and conclude the paper in Section 5.


Related work

Earlier studies on time series forecasting are mostly based on statistical models, mainly generative models in the state-space framework, such as exponential smoothing, ARIMA models and several other extensions. For these methods, Hyndman et al. [13] and Box et al. [5] provide comprehensive overviews in the context of univariate forecasting.

In recent years, a large number of related series have emerged in the routine functioning of many companies. Not sharing information from other

Method

A general probabilistic forecasting problem for multiple related time series can be described as follows: given a set of time series $\mathbf{y}_{1:t} = \{y^{(i)}_{1:t}\}_{i=1}^{N}$, we denote the future time series as $\mathbf{y}_{(t+1):(t+\Omega)} = \{y^{(i)}_{(t+1):(t+\Omega)}\}_{i=1}^{N}$, where $N$ is the number of series, $t$ is the length of the historical observations and $\Omega$ is the length of the forecasting horizon. Our goal is to model the conditional distribution of the future time series, $P(\mathbf{y}_{(t+1):(t+\Omega)} \mid \mathbf{y}_{1:t})$.
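In array terms, and purely as an illustration of this notation (the Gaussian "forecaster" below is a stand-in, not DeepTCN), a probabilistic forecast attaches a predictive distribution to every series and every future step:

```python
import numpy as np

N, t, omega = 3, 24, 6  # number of series, history length, forecast horizon
rng = np.random.default_rng(1)
y_hist = rng.normal(size=(N, t))  # y_{1:t}^{(i)}, i = 1..N

# A probabilistic forecaster outputs a predictive distribution per series and
# per future step; here we fake one with per-series Gaussian parameters.
mu = y_hist.mean(axis=1, keepdims=True) * np.ones((N, omega))
sigma = y_hist.std(axis=1, keepdims=True) * np.ones((N, omega))

# Monte Carlo draws from the (fake) P(y_{(t+1):(t+Omega)} | y_{1:t}) ...
samples = rng.normal(mu, sigma, size=(100, N, omega))
# ... from which any quantile can be read off, e.g. an 80% prediction interval:
q10, q90 = np.quantile(samples, [0.1, 0.9], axis=0)
```

This is what distinguishes the setting from point forecasting: the output is a distribution (or its quantiles) of shape (series, horizon), not a single number per step.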

Classical generative models are often used to

Datasets

We evaluate the performance of DeepTCN on five datasets. More specifically, within the DeepTCN framework, two models are applied for the forecasting performance evaluation: a non-parametric model that predicts quantiles and a parametric model with a Gaussian likelihood. We refer to them as DeepTCN-Quantile and DeepTCN-Gaussian, respectively, in the rest of the paper.
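Models of these two kinds are typically trained with, respectively, the quantile (pinball) loss and the Gaussian negative log-likelihood; the following is a minimal sketch of these standard losses, with illustrative inputs (not the paper's exact training code):

```python
import numpy as np

def quantile_loss(y, yhat, q):
    """Pinball loss for quantile level q: under-prediction is penalized with
    weight q, over-prediction with weight (1 - q)."""
    e = y - yhat
    return np.mean(np.maximum(q * e, (q - 1) * e))

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2), averaged over points."""
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2)
                   + (y - mu) ** 2 / (2 * sigma**2))

y = np.array([1.0, 2.0, 3.0])
# For the median (q = 0.5) the pinball loss is half the mean absolute error.
l50 = quantile_loss(y, np.full(3, 1.5), 0.5)
nll = gaussian_nll(y, mu=np.full(3, 2.0), sigma=np.full(3, 1.0))
```

Minimizing the pinball loss at several levels q yields the quantile forecasts directly, while minimizing the Gaussian NLL yields the parameters (mu, sigma) of a full predictive density.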

Table 1 shows the details of the five datasets. JD-demand and JD-shipment are from JD.com, which correspond to two

Conclusion

We present a convolutional-based probabilistic forecasting framework for multiple related time series and show both non-parametric and parametric approaches to model the probabilistic distribution based on neural networks. Our solution can help in the design of practical large-scale forecasting applications, which involves situations such as cold-starts and data sparsity. Results from both industrial datasets and public datasets show that the framework yields superior performance compared to

CRediT authorship contribution statement

Yitian Chen: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing. Yanfei Kang: Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Writing - original draft, Writing - review & editing. Yixiong Chen: Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Visualization, Writing -

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Yanfei Kang’s research was supported by the National Natural Science Foundation of China (No. 11701022).

Yitian Chen is a senior algorithm engineer at Bigo’s Beijing R&D center. Prior to that, he was a senior algorithm engineer in the intelligent supply chain unit of JD.com and an R&D engineer at Yahoo!. Yitian holds a master’s degree in data science from Johns Hopkins University, and his research interests include statistical learning, optimization and their applications to real industrial problems.

References (45)

  • T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, Z. Zhang, MXNet: A Flexible and Efficient...
  • K. Cho et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014)
  • J. Gasthaus et al., Probabilistic forecasting with spline quantile function RNNs, Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (2019)
  • A. Graves, Generating Sequences with Recurrent Neural Networks, arXiv preprint: 1308.0850...
  • J. Gu, J. Bradbury, C. Xiong, V.O. Li, R. Socher, Non-Autoregressive Neural Machine Translation, arXiv preprint:...
  • K. He et al., Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • K. He et al., Identity mappings in deep residual networks, Proceedings of the European Conference on Computer Vision (2016)
  • R. Hyndman et al., Forecasting with Exponential Smoothing: The State Space Approach (2008)
  • R.J. Hyndman et al., Forecasting: Principles and Practice (2018)
  • S. Ioffe et al., Batch normalization: accelerating deep network training by reducing internal covariate shift, Proceedings of the International Conference on Machine Learning (2015)
  • Kaggle, Web Traffic Time Series Forecasting, 2017,...
  • G. Ke et al., LightGBM: a highly efficient gradient boosting decision tree, Proceedings of the Advances in Neural Information Processing Systems (2017)

    Yanfei Kang is Associate Professor of Statistics in the School of Economics and Management at Beihang University in China. Prior to that, she was Senior R&D Engineer in Big Data Group of Baidu Inc. Yanfei obtained her Ph.D. degree of Applied and Computational Mathematics at Monash University in 2014. She worked as a postdoctoral research fellow in feature-based time series forecasting during 2014 and 2015 at Monash University. Her research interests include time series forecasting, time series visualization, statistical computing and machine learning.

    Yixiong Chen is a data scientist in IBM CIC China. He obtained his PhD in Applied Mathematics from King’s College London in 2015. His research interests include deep learning and quantum physics.

    Zizhuo Wang is an associate professor at the Department of Industrial and Systems Engineering at University of Minnesota, and the Institute for Data and Decision Analytics at the Chinese University of Hong Kong, Shenzhen. He obtained his PhD in Operations Research from Stanford University in 2012. His research interests include machine learning, optimization, and applications in operations management and revenue management.
