State Space Methods in Stata

We illustrate how to estimate parameters of linear state-space models using the Stata program sspace. We provide examples of how to use sspace to estimate the parameters of unobserved-component models, vector autoregressive moving-average models, and dynamic-factor models. We also show how to compute one-step, filtered, and smoothed estimates of the series and the states; dynamic forecasts and their confidence intervals; and residuals.


Introduction
Stata is a general-purpose package for statistics, graphics, data management, and matrix language programming. Stata's coverage of statistical areas is one of the most complete available, with many commands for regression analysis (StataCorp 2009k,l,m), multivariate statistics (StataCorp 2009i), panel-data analysis (StataCorp 2009h), survey data analysis (StataCorp 2009n), survival analysis and epidemiology statistics (StataCorp 2009o), and time-series analysis (StataCorp 2009p). It is used for data management (Mitchell 2010), health research (Juul and Frydenberg 2010; Cleves, Gould, Gutierrez, and Marchenko 2010), as well as in economic analysis (Cameron and Trivedi 2009; Baum 2006). Stata is also a programming language used by researchers to implement and disseminate their methods; see any of the more than 40 issues of The Stata Journal for examples of peer-reviewed user-written programs and see StataCorp (2009j,f,g) for Stata's programming capabilities.
The Stata command sspace, released in version 11, estimates the parameters of linear state-space models by maximum likelihood (StataCorp 2009e). As demonstrated by Harvey (1989) and Commandeur, Koopman, and Ooms (2011), linear state-space models are very flexible, and many linear time-series models can be written as linear state-space models. In this article, we show how to use sspace to estimate the parameters of linear state-space models. We also note that Stata has some additional commands, such as dfactor, which provide simpler syntaxes for estimating the parameters of particular linear state-space models.
Because of this flexibility, sspace has two syntaxes; we call them the covariance-form syntax and the error-form syntax. They are illustrated by estimating the parameters of a local-linear-trend model with a seasonal component and a vector autoregressive moving-average (VARMA) model, respectively. In each syntax, the user must specify one or more state equations, one or more observation equations, and the stochastic components.

Case 1: The local-level model
The local-level model is described by Commandeur et al. (2011, Section 2.1) and we briefly review it here. The observation and state equations of this model are, respectively,

$$
\begin{aligned}
y_t &= \mu_t + \varepsilon_t \\
\mu_t &= \mu_{t-1} + \xi_t
\end{aligned}
\tag{1}
$$

where $\varepsilon_t \sim N(0, \sigma^2_\varepsilon)$ and $\xi_t \sim N(0, \sigma^2_\xi)$ and both are independent. We express the level component at time $t$, $\mu_t$, as a function of that at time $t-1$. This notation is a subtle change from that in Commandeur et al. (2011), but it is more consistent with the syntax of Stata's sspace for describing the model and how sspace executes the state-space recursions by starting with index 0 instead of 1. The parameters in this model are $\sigma^2_\varepsilon$, $\sigma^2_\xi$, and $\mu_0$.

Covariance-form syntax
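The covariance-form syntax of sspace is as follows:
sspace state_eq [state_eq ... state_eq] obs_eq [obs_eq ... obs_eq] [if] [in] [, options]
where state_eq are state equations of the form
(statevar [lagged_statevars] [indepvars], state [noerror noconstant covstate(covform)])
and obs_eq are observation equations of the form
(depvar [statevars] [indepvars] [, noerror noconstant covobserved(covform)])
The options include model, optimization, and display options. A popular optimization option is technique(), for example technique(bhhh) for the Berndt-Hall-Hall-Hausman (BHHH) method or technique(nr) for Newton-Raphson (NR). Optimization techniques may be mixed; such is the default, technique(bhhh 5 nr), which specifies the BHHH method for the first 5 iterations and NR for the remaining iterations. An example of a display option is level(), which allows you to set the confidence level to something other than the default of 95%.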
We clarify this syntax in the following example.

Estimating the variances of a local-level model using sspace
Here we illustrate the sspace syntax by estimating the parameters of the local-level model. The describe command will display a dataset's size, its variables, their storage type and format, any labels associated with the variables, sorting information, and any descriptive information that you have added to document your data.
Sorted by: year
Stata computes time-series operators of variables using a time variable specified by the tsset command. Below we specify year to be our time variable; we tsset the data, in Stata parlance.
. tsset year
        time variable:  year, 1871 to 1970
                delta:  1 year
We could now use sspace to estimate the parameters with five lines of code: two constraint definitions followed by an sspace command that spans three lines. While this code is transparent to Stata users, we discuss it in some detail for readers who are unaccustomed to Stata.
The first two lines define constraints on the model parameters, as discussed below. The third line begins with the command sspace and is followed by the definition of the state equation, which is best understood from right to left. The option noconstant specifies that there is no constant term in the equation; the option state specifies the equation as a state equation; and the comma separates the options from the equation specification. By specifying the equation as level L.level, we specify level as the name for the unobserved state and we specify that the state equation is $\text{level}_t = \alpha\,\text{level}_{t-1}$, to which sspace adds an error term by default. We use Stata's lag operator, L. in this example, to model level as a linear function of the lagged level.
At the end of the third line, the three slashes, ///, denote a line continuation in Stata. In this example, we see that lines 3, 4, and 5 compose a single Stata command.
The fourth line specifies that the observation equation in the model is $y_t = \beta\,\text{level}_t + \varepsilon_t$, where the $\varepsilon_t$ are independent and identically distributed (IID) normal errors. As in the state equation above, we used the noconstant option to suppress the constant term.
The model in Equation (1) requires that $\alpha = \beta = 1$. Lines 1 and 2 declare these constraints; on line 4, the option constraints(1 2) applies them to this model.
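The five lines discussed above are not reproduced in this text, so the block below is a reconstruction from the prose rather than the authors' verbatim listing. It assumes that the flow series is stored in a variable named nile (a hypothetical name; the text names only the state, level) and that the data have already been tsset on year. The exact line on which constraints() appears may differ from the original listing.

```stata
* Sketch for a do-file; nile is an assumed variable name.
constraint 1 [level]L.level = 1          // alpha = 1 in the state equation
constraint 2 [nile]level = 1             // beta = 1 in the observation equation
sspace (level L.level, state noconstant) /// state equation
       (nile level, noconstant),         /// observation equation
       constraints(1 2)                  // apply both constraints
```

With both coefficients constrained to 1, the only free parameters are the two error variances, which matches the local-level model in Equation (1).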
optimize() will not declare convergence until the length of the scaled gradient is smaller than $10^{-6}$, that is, until $g_k' H_k^{-1} g_k < 10^{-6}$, where $g_k$ is the gradient on the $k$-th step and $H_k$ is the approximated negative Hessian. The requirement that $H_k$ be nonsingular prevents sspace from declaring convergence when the parameters are not identified, as discussed in Drukker and Wiggins (2004).
The standard errors are computed from the negative Hessian unless the variance-covariance option, vce(), specifies otherwise. The OIM in the table header for the standard errors indicates that the standard errors are computed from the observed information matrix. If nonnormal errors are suspected, use vce(robust) to obtain the Huber-White robust standard errors (StataCorp 2009q, robust).
Stata estimation commands store their results in a memory region called ereturn. The results may be accessed by the user and are used by other Stata commands, which are referred to as postestimation commands in Stata parlance. Typing . ereturn list lists the results saved in e(). You may view or access any e() result by identifying the object as e(name), where name is the name of the object.
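As a brief illustration of accessing e() results, the lines below could be typed after any sspace fit. This is a minimal sketch; it uses only the generic results e(b), e(V), and e(ll), and the full list of names is whatever ereturn list reports for the model in memory.

```stata
ereturn list          // list every result stored in e()
matrix list e(b)      // estimated parameter vector
matrix list e(V)      // estimated variance-covariance matrix of the estimates
display e(ll)         // log likelihood at the estimates
```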
The matrices saved off by sspace are listed in Table 1 along with the Commandeur et al.
Mixing both notations, a linear state-space model includes the regression terms $Bx_t$ in the state equations and $Fw_t$ in the observation equations, where $x_t$ and $w_t$ are column vectors of covariates. The vector $w_t$ may contain lagged independent variables specified on the left-hand side of observation equations. Commandeur et al. (2011) incorporate the regression coefficient matrices $B$ and $F$ into the state transition matrix $T$ and the observation equation matrix $Z$, respectively.
Stata's sspace uses the square-root filter to numerically implement the Kalman filter recursions (DeJong 1991b; Durbin and Koopman 2001, Section 6.3). Moreover, when the model is not stationary, as is the case here, the filter is augmented as described by DeJong (1991a), DeJong and Chu-Chun-Lin (1994), and Durbin and Koopman (2001, Section 5.7). The two techniques are used together to evaluate the likelihood (DeJong 1988) and to provide maximum likelihood (ML) estimates of the parameters of the state-space model. The techniques also provide an estimate of the initial state. The initial state, $\alpha_0 = \mu_0$, is diffuse and is modeled as $\text{var}(\mu_0) \to \infty$ and $E[\mu_0] = \delta$. The ML estimate of $\delta$ is 1120.0. This quantity is not reported by sspace, but is stored as e(d).
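For intuition, the Kalman filter recursions for the local-level model can be written out in scalar form. The block below is the textbook (covariance) form of the filter, not the square-root or augmented algorithm that sspace implements; the symbols $a_{t|t-1}$, $P_{t|t-1}$, $v_t$, $F_t$, and $K_t$ are notation introduced here for illustration.

```latex
% Prediction step, using the state equation mu_t = mu_{t-1} + xi_t
a_{t|t-1} = a_{t-1|t-1}, \qquad
P_{t|t-1} = P_{t-1|t-1} + \sigma^2_\xi

% Innovation and its variance, using y_t = mu_t + epsilon_t
v_t = y_t - a_{t|t-1}, \qquad
F_t = P_{t|t-1} + \sigma^2_\varepsilon

% Update step
K_t = P_{t|t-1} / F_t, \qquad
a_{t|t} = a_{t|t-1} + K_t v_t, \qquad
P_{t|t} = P_{t|t-1} (1 - K_t)

% Gaussian log likelihood built from the innovations
\log L = -\frac{n}{2}\log(2\pi)
         - \frac{1}{2}\sum_{t} \left( \log F_t + \frac{v_t^2}{F_t} \right)
```

Under a diffuse initial state, letting $P_{0|0} \to \infty$ gives $K_1 \to 1$, so the first filtered level equals the first observation and $P_{1|1} = \sigma^2_\varepsilon$; the diffuse likelihood then sums over the remaining observations.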
We can obtain predictions using the predict command, after estimating the parameters.All the standard objects and their standard errors can be predicted using predict after sspace.These objects and the syntax for predict after sspace are discussed in StataCorp (2009d).

Case 1 postestimation
With the local-level model estimates still in memory, we predict the smoothed trend of the Nile annual flow volume using the DeJong (1989) diffuse Kalman filter. Here we use the rmse option to obtain the smoothed trend root-mean-square error (RMSE) that is subsequently used to compute 90% confidence intervals. A second call to predict obtains the standardized residuals. We graph the series, trend, and trend confidence intervals in one graph and the standardized residuals in a second graph. We then combine the two graphs into one and allow it to render. This graph is displayed in Figure 1.
Next, we demonstrate forecasting. First we use the preserve command to save the original dataset. We then extend the data by 10 years using the tsappend command. We compute
. restore
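The steps described above can be sketched as follows. The listing is an illustration under stated assumptions, not the authors' code: the series name nile and all new variable and graph names are hypothetical, and the forecast start year 1971 is inferred from the sample ending in 1970.

```stata
* Smoothed trend and its RMSE (nile and all new names are assumed)
predict trend, states smethod(smooth) rmse(rmse)
generate lb = trend - invnormal(0.95)*rmse   // lower 90% limit
generate ub = trend + invnormal(0.95)*rmse   // upper 90% limit
predict stdres, rstandard                    // standardized residuals

* Two graphs, combined into one
twoway (tsline nile trend) (tsrline lb ub), nodraw name(TREND)
tsline stdres, yline(0) nodraw name(RES)
graph combine TREND RES, rows(2)

* Dynamic forecasts for 10 added years, then restore the original data
preserve
tsappend, add(10)
predict fnile, dynamic(1971) rmse(frmse)
restore
```

The multiplier invnormal(0.95) gives two-sided 90% limits; a different coverage only changes that quantile.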

Case 2: A local-linear-trend model
In this section we review the structure of a local-linear-trend model with an autoregressive component, AR(1), and a seasonal component. The state-space form of a time-domain seasonal component is described in Commandeur et al. (2011, Section 2.1). The disturbances in our state-space model are normally and independently distributed (NID); in particular, $\omega_t \sim NID(0, \sigma^2_\omega)$. Equation (8) is the observation equation and it depends on the states $\mu$ (the linear trend), $\eta$ (the AR(1) term), and $\gamma_1$ (the seasonal component). The observation equation has no error term. The model has six state equations: two for the linear trend, one for the AR(1) component, and three for the seasonal component.

Estimating parameters of the local-linear-trend model using sspace
We now use sspace to estimate the parameters of a local-linear-trend model with an AR(1) component and a seasonal component. We fit this model to quarterly data on the food and tobacco production (FTP) in the United States for the years 1947 to 2000. Cox (2009) uses the dataset to demonstrate graphing seasonal time-series data in Stata. The dataset contains the series, labeled "food and tobacco production", and a quarterly time variable, date (float, %tq), by which the data are sorted.
As before we tsset the data:
. tsset date
        time variable:  date, 1947q1 to 2000q4
                delta:  1 quarter
The code to estimate the parameters of the model has the same basic structure as in the previous example. After defining some constraints, we issue the sspace command. The structure of the sspace command is also similar to the previous example. After specifying the state equations, we specify the observation equation, and then we specify the model-level options. The syntaxes for the state equations and for the observation equation are similar to those in the previous example. The model-level option covstate(diagonal) is new; it specifies that the covariance matrix of the state errors has a diagonal structure. Each error has its own variance, but the errors are independent of one another.

Case 2 postestimation
After estimation, we can use the predict command to compute estimates of the observables or unobservables using the one-step, filter, or smoothed methods (Durbin and Koopman 2001, Chapter 4; DeJong 1989). The observation equation residuals or standardized residuals may be computed using the one-step or smoothed methods.
Below we compute the one-step estimates of the food and tobacco production:
. predict ftp1
(option xb assumed; fitted values)
Now we predict the one-step trend:
. predict trend, state equation(trend)
Finally, we compute the residuals:
. predict res, residuals
Now we perform some computations to produce more informative graphs. We store the index that marks the last quarter of the sample in a local macro, n, and generate a new variable q containing the quarter per annum of each observation. The next block produces the graphs shown in Figure 3. Figure 3 shows the time-series plots using plotting tips by Cox (2009) with the smoothed series and filtered trend. We only graph the last quarter of the sample. (The growth in the series covers up the seasonal detail when we graph the series over the entire sample.)
. twoway (scatter ftp date in `n'/L, msymbol(none) mlabel(q) ///
> mlabposition(0) ytitle(production) ylabel(#3)) ///
> (tsline ftp1 trend in `n'/L), nodraw name(FTP1)
. tsline res in `n'/L, nodraw name(RES) yline(0)
. graph combine FTP1 RES, name(FTP2) rows(2)
. graph drop FTP1 RES
Next we illustrate how to forecast estimates. We begin by extending the data, adding two years starting at Q1 of year 2001.
. tsappend, add(8)
The next code block predicts ftp, specifying that dynamic forecasts should begin on quarter Q1 of 2001. The function tq(2001q1) translates the string "2001q1" to the appropriate

The state equations and the observation equations for the state-space form of this VARMA(1,1) model may be written in vector form as Equations (10) and (11), respectively, where $\alpha_{t,1} = \Delta c_t$, $\alpha_{t,2} = \theta_1 \eta_{t,1}$, $\alpha_{t,3} = \Delta h_t$, and the $2 \times 2$ covariance matrix $\text{cov}(\eta_t)$ is diagonal.
Next we use the sspace error-form syntax to estimate the parameters of this model.

Error-form syntax
The error-form syntax of sspace has the same overall structure as the covariance form, but it has an extra component in the state equation:
(statevar [lagged_statevars] [indepvars] [state_errors], state [noconstant])
The optional [state_errors] lists state-equation errors that enter a state equation. Each state error has the form e.statevar, where statevar is the name of a state in the model. The state_errors define the covariance structure, so the option covstate() is not necessary. Also, the noerror option has no meaning in this style of syntax.
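To make the e.statevar notation concrete, the sketch below fits a univariate ARMA(1,1) model to D.caputil in error form. It is a simplified illustration, not the VARMA(1,1) code discussed in this article: it assumes a tsset dataset containing caputil, and the state names u1 and u2 and the constraint numbers are chosen for the example.

```stata
* u1 carries D.caputil; u2 carries the scaled lagged shock (the MA term)
constraint 1 [u1]L.u2 = 1            // lagged MA state enters u1 with coefficient 1
constraint 2 [u1]e.u1 = 1            // the shock enters u1 with coefficient 1
constraint 3 [D.caputil]u1 = 1       // the observable equals the first state
sspace (u1 L.u1 L.u2 e.u1, state noconstant) ///
       (u2 e.u1, state noconstant)           ///
       (D.caputil u1, noconstant noerror),   ///
       constraints(1/3) covstate(diagonal)
```

The free coefficient on L.u1 is the AR parameter and the free coefficient on e.u1 in the u2 equation is the MA parameter; the same error, e.u1, appears in two state equations, which is what the covariance-form syntax cannot express directly.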

Estimation of the VARMA(1,1)
We now use the error-form syntax of sspace to estimate the parameters of the VARMA(1,1) model whose state-space form is given in Equations (10) and (11).
The code for estimating the model parameters has the same structure as the previous examples. After defining the constraints, we use them in the sspace command. The sspace command itself has two parts: first come the equations that define the state-space form of the model; second, we specify model-level options.
The code specifies the five equations that define the state-space form of the model. The first three equations are the state equations whose algebraic counterparts are in Equation (10). The only difference between the two versions is that the states are named $\alpha_1$, $\alpha_2$, and $\alpha_3$ in the algebra and named u1, u2, and u3 in the code. The last two equations are the observation equations whose algebraic equivalent is given in Equation (11).
We have already discussed the model-level options constraints() and covstate(). The model-level option vce(robust) specifies that the standard errors should be estimated using the Huber-White robust estimator, which is robust to nonnormal errors in this case.
Below we read in the dataset and run the code.

Case 3 postestimation
We now predict the differenced capacity utilization using the one-step predictions and the standardized residuals:
. predict pcaputil, equation(D.caputil)
(option xb assumed; fitted values)
. predict stdres, rstandard equation(D.caputil)
predict computes predicted values for both D.caputil and D.hours by default because there are two observation equations in our model. To override the default behavior, we specify the equation() option.

A dynamic-factor example
State-space models have been used to formulate estimators for popular models such as ARMA and VARMA models and to formulate estimators for new models suggested by the state-space framework. The unobserved-components (UC) model discussed in Harvey (1989) and the dynamic-factor model are two of the most important models that naturally fit into a state-space framework. Above we considered UC models and a VARMA model; now we consider a dynamic-factor model.
Dynamic-factor models are VAR models augmented by unobserved factors that may also have an autoregressive structure. Dynamic-factor models have been applied in macroeconomics; see Geweke (1977), Sargent and Sims (1977), Stock and Watson (1989), Stock and Watson (1991), and Watson and Engle (1983). Lütkepohl (2005) provides a good introduction to dynamic-factor models and their state-space formulation. StataCorp (2009a) provides a quick introduction to these models and has several examples including the one discussed below.
In this example, we consider a dynamic-factor model without exogenous variables in which the dynamic factor follows an AR(2) process, and the disturbances in the equations for the observable variables follow AR(1) processes. This example illustrates how to specify a dynamic-factor model and how to specify an AR(2) process. The dfactor command is an easy-to-use alternative to sspace for dynamic-factor models.
In the state-space form of the model we consider, $f_t$ is an unobserved factor that follows an AR(2) process; $\mu_t$ is a $4 \times 1$ vector of errors, each of which follows an AR(1) process; and $y_t$ is a $4 \times 1$ vector of dependent variables. Equations (12), (13), and (14) are the state equations. Equation (15) is the vector of observation equations.
The system is driven by $\eta_t$ (a scalar IID error) and by $\varepsilon_t$ (a $4 \times 1$ vector of IID errors). By restricting $\Psi$ to be a $4 \times 4$ diagonal matrix, we specify that the unobserved factor is the only source of correlation between the dependent variables. $\phi_1$ and $\phi_2$ are the coefficients of the AR(2) process for the dynamic factor. $b$ is a $4 \times 1$ vector of coefficients.
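The displayed equations are missing from this text. One way to write a system that is consistent with the definitions above is sketched below; the pairing with equation numbers (12) to (15) and the symbol $Lf_t$ for the state that carries the lagged factor are assumptions made for illustration.

```latex
% State equations (assumed to correspond to (12), (13), and (14))
f_t   = \phi_1 f_{t-1} + \phi_2\, Lf_{t-1} + \eta_t   % AR(2) factor, with Lf_{t-1} = f_{t-2}
Lf_t  = f_{t-1}                                       % trivial state that carries the lag
\mu_t = \Psi \mu_{t-1} + \varepsilon_t                % AR(1) disturbances, \Psi diagonal

% Observation equations (assumed to correspond to (15))
y_t = b f_t + \mu_t
```

Substituting the second line into the first gives $f_t = \phi_1 f_{t-1} + \phi_2 f_{t-2} + \eta_t$, so the pair of first-order state equations reproduces the AR(2) process.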
We downloaded some US macroeconomic data from the FRED database of the St. Louis Federal Reserve using the freduse command discussed in Drukker (2006).Specifically, we have data on the y t variables industrial production index, ipman; real disposable income, income; an aggregate weekly hours index, hours; and aggregate unemployment, unemp.These data were used in the Stata manuals, so we use the webuse command to download the dataset and read it into memory.
The code is similar to what we have seen in previous examples. Equation (13), which is the second equation specified in the sspace command, is the new element in this example. This method of including lags as additional trivial state equations is a standard trick in state-space modeling; see Lütkepohl (2005, Chapter 18.2) and Hamilton (1994, Chapter 13.1).
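The trick of carrying a lag as a trivial state can be illustrated with a smaller model than the one in this section. The sketch below keeps the AR(2) factor but drops the AR(1) disturbances, so it is not the model estimated here. It assumes the Stata example dataset dfex (loaded with webuse), and the state name lf and the constraint number are chosen for the example.

```stata
webuse dfex, clear
constraint 1 [lf]L.f = 1                     // lf_t = f_{t-1}, the trivial state
sspace (f L.f L.lf, state noconstant)        /// AR(2) factor: f on f_{t-1} and f_{t-2}
       (lf L.f, state noconstant noerror)    /// lag carrier, no error term
       (D.ipman f, noconstant)               ///
       (D.income f, noconstant)              ///
       (D.hours f, noconstant)               ///
       (D.unemp f, noconstant),              ///
       covstate(identity) constraints(1)
```

Here covstate(identity) fixes the factor's error variance at 1, a common normalization because the scale of an unobserved factor is not separately identified from its loadings.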
The Stata command dfactor provides an easy-to-use syntax for estimating the parameters of dynamic-factor models. For example, the command
. dfactor (D.(ipman income hours unemp) = , noconstant ar(1)) (f = , ar(1/2))
produces the same parameter estimates as the above sspace command.
predict after sspace and after dfactor provide all the standard options to forecast the observed dependent variables or to extract unobserved factors. We have already illustrated the use of predict after sspace; see StataCorp (2009b) for further examples and a detailed discussion of the underlying mathematics.

Conclusion
We have illustrated how to estimate the parameters of UC models, VARMA models, and dynamic-factor models using Stata's sspace command. Stata's sspace command can estimate the parameters of many other linear state-space models.
Using Stata's ADO programming language (StataCorp 2009q), the sspace command could be used as a computational engine for new estimation commands. The dfactor command is an example. These commands would be easy-to-use versions of sspace, presenting a simplified syntax unique to the target model. Because Stata is such a popular platform among applied researchers, sspace provides an opportunity for theoretical researchers to easily make their methods available to a huge audience of applied researchers. More complicated estimators could combine Stata's byte-compiled matrix language Mata (see StataCorp 2009f and StataCorp 2009g) and sspace to implement new estimators.

(statevar [lagged_statevars] [indepvars], state [noerror noconstant covstate(covform)])
and obs_eq are observation equations of the form
(depvar [statevars] [indepvars] [, noerror noconstant covobserved(covform)])
A list of state equations, observation equations, and options specifies an sspace model. The square brackets indicate optional arguments, so the syntax diagram indicates that at least one state equation and one observation equation are required. Each equation must be enclosed in parentheses. In Stata parlance, a comma in the command toggles the parser from model specification mode to options specification mode. Options included within an equation are applied to that equation. Options specified outside the individual equations are applied to the model as a whole.

Figure 1: In the upper panel we display the Nile annual flow volume time-series (blue) with smoothed trend estimates (red) and trend 90% confidence intervals. The lower panel displays the standardized residuals.

Figure 2: The Nile river annual flow volume (blue), one-step predictions and dynamic forecasts (red), and forecast 50% confidence intervals.

Figure 3: Quarterly data on food and tobacco production in the United States with smoothed series and the filtered trend in the top panel. The one-step residuals are in the bottom panel.

Figure 4: One-step predictions of US food and tobacco production with dynamic predictions starting at Q1 of year 2001. Approximate 95% confidence bounds are also given.

. dfactor (D.(ipman income hours unemp) = , noconstant ar(1)) (f = , ar(1/2))
Each state equation specifies the name of a latent variable and must have the state option specified. A state equation optionally contains a list of lagged state variables and a list of exogenous covariates. By default, a constant is included in the equation unless the noconstant option is specified. By default, an error term is included in the equation unless the noerror option is specified. The option covstate() allows you to specify the covariance structure of the state equations. The covform in the syntax diagram may be identity, dscalar, diagonal, or unstructured. The default is diagonal. The option dscalar states that the covariance is diagonal and that all the variance terms are equal. Each observation equation specifies the name of an observed dependent variable. An observation equation optionally contains a list of contemporaneous state variables and a list of exogenous covariates. By default, a constant is included in the equation unless the noconstant option is specified. By default, an error term is included in the equation unless the noerror option is specified. The option covobserved() allows you to specify the covariance structure of the observation equations. The covariance forms are the same as for the option covstate().
The [if] and the [in] specifications allow you to estimate the parameters using a subsample of the observations. The options in the main syntax diagram include model, optimization, and display options. An important model option is constraints(), which applies parameter constraints that identify the model. A popular optimization option is the technique() option. Two good techniques for sspace are technique(bhhh), the Berndt-Hall-Hall-Hausman technique, and technique(nr), for Newton-Raphson.
uses the dataset to demonstrate graphing seasonal time-series data in Stata. First we read the dataset into memory and describe it:
. use http://www.stata.com/ddrukker/ftp.dta
(Food and tobacco production in the United States for
We use freduse (see Drukker 2006) to obtain Federal Reserve data on the capacity utilization rate, caputil, and manufacturing hours, hours, for the US economy (http://research.stlouisfed.org/fred2). Here we model the differenced series, D.caputil and D.hours, as a first-order vector autoregressive moving-average (VARMA(1,1)) process. In this model, we allow the lag of D.caputil to affect D.hours, but we do not allow the lag of D.hours to affect the lag of D.caputil, as was done in StataCorp (2009e, Example 4).