Bayesian Estimation and Forecasting of Time Series in statsmodels

— Statsmodels , a Python library for statistical and econometric analysis, has traditionally focused on frequentist inference, including in its models for time series data. This paper introduces the powerful features for Bayesian inference of time series models that exist in statsmodels , with applications to model ﬁtting, forecasting, time series decomposition, data simulation, and impulse response functions.


Introduction
Statsmodels [SP10] is a well-established Python library for statistical and econometric analysis, with support for a wide range of important model classes, including linear regression, ANOVA, generalized linear models (GLM), generalized additive models (GAM), mixed effects models, and time series models, among many others.In most cases, model fitting proceeds by using frequentist inference, such as maximum likelihood estimation (MLE).In this paper, we focus on the class of time series models [MPS11], support for which has grown substantially in statsmodels over the last decade.After introducing several of the most important new model classes -which are by default fitted using MLE -and their features -which include forecasting, time series decomposition and seasonal adjustment, data simulation, and impulse response analysis -we describe the powerful functions that enable users to apply Bayesian methods to a wide range of time series models.
Support for Bayesian inference in Python outside of statsmodels has also grown tremendously, particularly in the realm of probabilistic programming, and includes powerful libraries such as PyMC3 [SWF16], PyStan [CGH + 17], and TensorFlow Probability [DLT + 17].Meanwhile, ArviZ [KCHM19] provides many excellent tools for associated diagnostics and vizualisations.The aim of these libraries is to provide support for Bayesian analysis of a large class of models, and they make available both advanced techniques, including autotuning algorithms, and flexible model specification.By contrast, here we focus on simpler techniques.However, while the libraries above do include some support for time series models, this has not been their primary focus.As a result, introducing Bayesian inference for the well-developed stable of time series models in statsmodels, and providing access to the rich associated feature set already mentioned, presents a complementary option to these more general-purpose libraries. 1

Time series analysis in statsmodels
A time series is a sequence of observations ordered in time, and time series data appear commonly in statistics, economics, finance, climate science, control systems, and signal processing, among many other fields.One distinguishing characteristic of many time series is that observations that are close in time tend to be more correlated, a feature known as autocorrelation.While successful analyses of time series data must account for this, statistical models can harness it to decompose a time series into trend, seasonal, and cyclical components, produce forecasts of future data, and study the propagation of shocks over time.
We now briefly review the models for time series data that are available in statsmodels and describe their features. 2

Exponential smoothing models
Exponential smoothing models are constructed by combining one or more simple equations that each describe some aspect of the evolution of univariate time series data.While originally somewhat ad hoc, these models can be defined in terms of a proper statistical model (for example, see [HKOS08]).They have enjoyed considerable popularity in forecasting (for example, see the implementation in R described by [HA18]).A prototypical example that allows for trending data and a seasonal component -often known as the additive "Holt-Winters' method" -can be written as where l t is the level of the series, b t is the trend, s t is the seasonal component of period m, and α, β , γ are parameters of the model.When augmented with an error term with some given probability distribution (usually Gaussian), likelihood-based inference can be used to estimate the parameters.In statsmodels, 1.In addition, it is possible to combine the sampling algorithms of PyMC3 with the time series models of statsmodels, although we will not discuss this approach in detail here.See, for example, https://www.statsmodels.org/v0.13.0/examples/notebooks/generated/statespace_sarimax_pymc3.html.
2. In addition to statistical models, statsmodels also provides a number of tools for exploratory data analysis, diagnostics, and hypothesis testing related to time series data; see https://www.statsmodels.org/stable/tsa.html.
additive exponential smoothing models can be constructed using the statespace.ExponentialSmoothing class. 3 These models have become popular for time series analysis and forecasting, as they are flexible and the estimated components are intuitive.Indeed, Google's Causal Impact library [BGK + 15] uses a Bayesian structural time series approach directly, and Facebook's Prophet library [TL17] uses a conceptually similar framework and is estimated using PyStan.
Autoregressive moving-average models Autoregressive moving-average (ARMA) models, ubiquitous in time series applications, are well-supported in statsmodels, including their generalizations, abbreviated as "SARIMAX", that allow for integrated time series data, explanatory variables, and seasonal effects. 4A general version of this model, excluding integration, can be written as where ε t ∼ N(0, σ 2 ).These are constructed in statsmodels with the ARIMA class; the following code shows how to construct a variety of autoregressive moving-average models for consumer price data: # AR(2) model model_ar2 = sm.tsa.ARIMA(y, order=(2, 0, 0))

3.
A second class, ETSModel, can also be used for both additive and multiplicative models, and can exhibit superior performance with maximum likelihood estimation.However, it lacks some of the features relevant for Bayesian inference discussed in this paper.

Vector autoregressive models
While the SARIMAX models above handle univariate series, statsmodels also has support for the multivariate generalization to vector autoregressive (VAR) models. 5These models are written where y t is now considered as an m × 1 vector.As a result, the intercept ν is also an m × 1 vector, the coefficients Φ i are each m × m matrices, and the error term is ε t ∼ N(0 m , Ω), with Ω an m×m matrix.These models can be constructed in statsmodels using the VARMAX class, as follows6 Dynamic factor models statsmodels also supports a second model for multivariate time series: the dynamic factor model (DFM).These models, often used for dimension reduction, posit a few unobserved factors, with autoregressive dynamics, that are used to explain the variation in the observed dataset.In statsmodels, there are two model classes, DynamicFactor`and DynamicFactorMQ, that can fit versions of the DFM.Here we focus on the DynamicFactor class, for which the model can be written Here again, the observation is assumed to be m × 1, but the factors are k × 1, where it is possible that k << m.As before, we assume conformable coefficient matrices and Gaussian errors.
The following code shows how to construct a DFM in statsmodels DynamicFactor, and DynamicFactorMQ) are implemented as part of a broader class of models, referred to as linear Gaussian state space models (hereafter for brevity, simply "state space models" or SSM).This class of models can be written as where α t represents an unobserved vector containing the "state" of the dynamic system.In general, the model is multivariate, with y t and ε t m × 1 vector, α t k × 1, and η t r times 1.
Powerful tools exist for state space models to estimate the values of the unobserved state vector, compute the value of the likelihood function for frequentist inference, and perform posterior sampling for Bayesian inference.These tools include the celebrated Kalman filter and smoother and a simulation smoother, all of which are important for conducting Bayesian inference for these models. 7The implementation in statsmodels largely follows the treatment in [DK12], and is described in more detail in [Ful15].
In addition to these key tools, state space models also admit general implementations of useful features such as forecasting, data simulation, time series decomposition, and impulse response analysis.As a consequence, each of these features extends to each of the time series models described above.Figure 1  Nearly identical code could be used for any of the model classes introduced above, since they are all implemented as part of the same state space model framework.In the next section, we show how these features can be used to perform Bayesian inference with these models.

Bayesian inference via Markov chain Monte Carlo
We begin by giving a cursory overview of the key elements of Bayesian inference required for our purposes here. 8In brief, the Bayesian approach stems from Bayes' theorem, in which the posterior distribution for an object of interest is derived as proportional to the combination of a prior distribution and the likelihood function prior Here, we will be interested in the posterior distribution of the parameters of our model and of the unobserved states, conditional on the chosen model specification and the observed time series data.While in most cases the form of the posterior cannot be derived analytically, simulation-based methods such as Markov chain Monte Carlo (MCMC) can be used to draw samples that approximate the posterior distribution nonetheless.While PyMC3, PyStan, and TensorFlow Probability emphasize Hamiltonian Monte Carlo (HMC) and no-U-turn sampling (NUTS) MCMC methods, we focus on the simpler random walk Metropolis-Hastings (MH) and Gibbs sampling (GS) methods.These are standard MCMC methods that have enjoyed great success in time series applications and which are simple to implement, given the state space framework already available in statsmodels.In addition, the ArviZ library is designed to work with MCMC output from any source, and we can easily adapt it to our use.
With either Metropolis-Hastings or Gibbs sampling, our procedure will produce a sequence of sample values (of parameters and / or the unobserved state vector) that approximate draws from the posterior distribution arbitrarily well, as the number of length of the chain of samples becomes very large.

Random walk Metropolis-Hastings
In random walk Metropolis-Hastings (MH), we begin with an arbitrary point as the initial sample, and then iteratively construct new samples in the chain as follows.At each iteration, (a) construct a proposal by perturbing the previous sample by a Gaussian random variable, and then (b) accept the proposal with some probability.If a proposal is accepted, it becomes the next sample in the chain, while if it is rejected then the previous sample value is carried over.Here, we show how to implement Metropolis-Hastings estimation of the variance parameter in a simple model, which only requires the use of the log-likelihood computation introduced above.The approximate posterior distribution, constructed from the sample chain, is shown in Figure 2.
8. While a detailed description of these issues is out of the scope of this paper, there are many superb references on this topic.We refer the interested reader to [WH99], which provides a book-length treatment of Bayesian inference for state space models, and [KN99], which provides many examples and applications.

Gibbs sampling
Gibbs sampling (GS) is a special case of Metropolis-Hastings (MH) that is applicable when it is possible to produce draws directly from the conditional distributions of every variable, even though it is still not possible to derive the general form of the joint posterior.While this approach can be superior to random walk MH when it is applicable, the ability to derive the conditional distributions typically requires the use of a "conjugate" prior -i.e., a prior from some specific family of distributions.For example, above we specified a uniform distribution as the prior when sampling via MH, but that is not possible with Gibbs sampling.Here, we show how to implement Gibbs sampling estimation of the variance parameter, now making use of an inverse Gamma prior, and the simulation smoother introduced above.The approximate posterior distribution, constructed from the sample chain, is shown in Figure 3.

Illustrative examples
For clarity and brevity, the examples in the previous section gave results for simple cases.However, these basic methods carry through to each of the models introduced earlier, including in cases with multivariate data and hundreds of parameters.Moreover, the Metropolis-Hastings approach can be combined with the Gibbs sampling approach, so that if the end user wishes to use Gibbs sampling for some parameters, they are not restricted to choose only conjugate priors for all parameters.
In addition to sampling the posterior distributions of the parameters, this method allows sampling other objects of interest, including forecasts of observed variables, impulse response functions, and the unobserved state vector.This last possibility is especially useful in cases such as the structural time series model, in which the unobserved states correspond to interpretable elements such as the trend and seasonal components.We provide several illustrative examples of the various types of analysis that are possible.

Forecasting and Time Series Decomposition
In our first example, we apply the Gibbs sampling approach to a structural time series model in order to forecast U.S. Industrial Production and to produce a decomposition of the series into level, trend, and seasonal components.The model is Here, we set the seasonal periodicity to s=12, since Industrial Production is a monthly variable.We can construct this model in Statsmodels as 9 To produce the time-series decomposition into level, trend, and seasonal components, we will use samples from the posterior of the state vector (µ t , β t , γ t ) for each time period t.These are immediately available when using the Gibbs sampling approach; in the earlier example, the draw at each iteration was assigned to the variable sample_state.To produce forecasts, we need to draw from the posterior predictive distribution for horizons h = 1, 2, . . .H.This can be easily accomplished by using the simulate method introduced earlier.To be concrete, we can accomplish these tasks by modifying section (b) of our Gibbs sampler iterations as follows: 9. This model is often referred to as a "local linear trend" model (with additionally a seasonal component); lltrend is an abbreviation of this name.These forecasts and the decomposition into level, trend, and seasonal components are summarized in Figures 4 and 5, which show the median values along with 80% credible intervals.Notably, the intervals shown incorporate for both the uncertainty arising from the stochastic terms in the model as well as the need to estimate the models' parameters. 10

Casual impacts
A closely related procedure described in [BGK + 15] uses a Bayesian structural time series model to estimate the "causal impact" of some event on some observed variable.This approach stops estimation of the model just before the date of an event and produces a forecast by drawing from the posterior predictive density, using the procedure described just above.It then uses the difference between the actual path of the data and the forecast to estimate impact of the event.
An example of this approach is shown in Figure 6, in which we use this method to illustrate the effect of the COVID-19 pandemic 10.The popular Prophet library, [TL17], similarly uses an additive model combined with Bayesian sampling methods to produce forecasts and decompositions, although its underlying model is a GAM rather than a state space model.

Extensions
There are many extensions to the time series models presented here that are made possible when using Bayesian inference.First, it is easy to create custom state space models within the statsmodels framework.As one example, the statsmodels documentation describes how to create a model that extends the typical VAR described above with time-varying parameters. 12These custom state space models automatically inherit all the functionality described above, so that Bayesian inference can be conducted in exactly the same way.
Second, because the general state space model available in statsmodels and introduced above allows for time-varying system matrices, it is possible using Gibbs sampling methods to introduce support for automatic outlier handling, stochastic volatility, and regime switching models, even though these are largely infeasible in statsmodels when using frequentist methods such as maximum likelihood estimation. 13

Conclusion
This paper introduces the suite of time series models available in statsmodels and shows how Bayesian inference using Markov chain Monte Carlo methods can be applied to estimate their parameters and produce analyses of interest, including time series decompositions and forecasts.11.In this example, we used a local linear trend model with no seasonal component.
13. See, for example, [SW16] for an application of these techniques that handles outliers, [KSC98] for stochastic volatility, and [KN98] for an application to dynamic factor models with regime switching.

import statsmodels.api as sm
The following code shows how to apply the additive Holt-Winters model above to model quarterly data on consumer prices: